fix(devin-review): stale-token override + status sessions counter + lock comment

Three Devin Review findings on PR #173 addressed in one commit since
they're in adjacent code paths:

1. cli/commands/init.py:99 (\u{1F534}): `agnes init --token NEW` ran
   step 2 verify against the OLD on-disk token because `get_token()`
   read `~/.config/agnes/token.json` before the env var, and
   `_override_server_env` only set the env var. So `agnes init --force`
   on a machine with a stale token.json failed 401 with a confusing
   'token expired' even though the --token arg was valid.

   Fix: ContextVar-based override in `cli.config._token_override`
   checked by `get_token()` BEFORE the on-disk read.
   `_with_token_override` context manager scopes the override.
   `_override_server_env` now also sets the contextvar via
   `_with_token_override(token)`, so both env var and contextvar
   carry the override (env for back-compat with anything bypassing
   get_token; contextvar is the authoritative source).
   Async-safe (each task sees its own override) and leak-proof
   (resets on context exit).
   2 new tests: regression on stale-disk-token + scope leak guard.

2. cli/commands/status.py:43 (\u{1F7E1}): sessions_pending_upload only
   checked legacy `<workspace>/user/sessions/` and always reported 0
   in workspaces bootstrapped with `agnes init` (Claude Code writes
   to `~/.claude/projects/`, not the legacy path). Same bug we fixed
   for `agnes push` in 08e49591.

   Fix: route through `cli.lib.claude_sessions.list_session_files()`
   so status and push agree on what counts as a pending session.

3. connectors/bigquery/extractor.py:111 (\u{1F7E1}): docstring claimed
   "a live holder still wins the second flock attempt" — incorrect on
   Linux. After `unlink()` + `open()`, the new file is a new inode;
   fcntl.flock keys per-inode, so the old holder's lock does NOT block
   the new acquisition. In a genuine TTL-overrun scenario two writers
   CAN race the parquet.tmp.

   Fix: documentation only. Comment now honestly describes the
   inode-recreation behavior, names the threading.Lock as the actual
   in-process guard, and flags pid-gating as the next-iteration fix
   if real corruption surfaces. The 24h default TTL is well above
   typical COPY durations so the practical risk is low.

Tests: 17/17 across test_cli_init.py + test_lib_pull.py + the broader
regression set.
This commit is contained in:
ZdenekSrotyr 2026-05-04 21:26:30 +02:00
parent 8233c3e3f9
commit 8784f10a6b
6 changed files with 176 additions and 20 deletions

View file

@ -38,6 +38,9 @@ End-to-end clean-analyst-bootstrap rewrite. The web `/setup?role=analyst` page n
- `agnes snapshot create` (formerly `da fetch`) no longer materializes an empty `user/duckdb/analytics.duckdb` when run before any `agnes pull`. Friendly hint redirects to `agnes pull`. - `agnes snapshot create` (formerly `da fetch`) no longer materializes an empty `user/duckdb/analytics.duckdb` when run before any `agnes pull`. Friendly hint redirects to `agnes pull`.
- Workspace `agnes status` reads from the canonical `server/parquet/` and `user/duckdb/analytics.duckdb` paths (was reading legacy `data/parquet/`, `data/metadata/last_sync.json`). - Workspace `agnes status` reads from the canonical `server/parquet/` and `user/duckdb/analytics.duckdb` paths (was reading legacy `data/parquet/`, `data/metadata/last_sync.json`).
- `agnes init` and `agnes pull` errors now use the `cli/error_render.py` typed-error renderer (added in 0.32.0), so analyst-facing error UX matches the structured shape `agnes query --remote` already produces. - `agnes init` and `agnes pull` errors now use the `cli/error_render.py` typed-error renderer (added in 0.32.0), so analyst-facing error UX matches the structured shape `agnes query --remote` already produces.
- **`agnes init --token X` now correctly uses the explicit token in the verify call**, even when `~/.config/agnes/token.json` already holds a stale token from a prior install. Pre-fix `cli.config.get_token()` read the on-disk file first and only fell back to env vars, so step 2 (PAT-verify) ran with the stale token and failed with a confusing 401 — even though the `--token` arg was valid (Devin Review on `init.py:99`). Fix: a `ContextVar`-based override in `cli.config` short-circuits `get_token()` before the file read; `_override_server_env` (used by both `agnes init` and `agnes pull`'s `run_pull`) sets it for the duration of the call. Async-safe (each task sees its own override) and leak-proof (resets on context exit).
- **`agnes status` sessions counter now reads the same source as `agnes push`** — `~/.claude/projects/<encoded-cwd>/` (Claude Code's actual write path) with the legacy `<workspace>/user/sessions/` as a fallback, via `cli.lib.claude_sessions.list_session_files()`. Pre-fix the counter only checked the legacy dir and always reported 0 in workspaces bootstrapped with `agnes init` (since Claude Code never writes there).
- **BigQuery materialize lock-reclaim docstring** at `connectors/bigquery/extractor.py:_try_acquire_file_lock` corrected: a still-running holder's `fcntl.flock` does NOT block the post-unlink reacquisition (new file = new inode = independent lock). The in-process `threading.Lock` keyed on `table_id` is the actual concurrency guard; cross-process protection (two schedulers on one workspace) relies on operators not running multiple concurrent schedulers AND on the TTL being well above the longest plausible COPY (24 h default). Documenting the residual risk so it isn't masked by a misleading "we're safe" comment (Devin Review on extractor.py:111).
- **`agnes pull` now re-downloads parquets when the local file is missing, even if the recorded hash matches the server.** Pre-fix the download set was computed from `sync_state.json` hash equality alone — if the parquet had been deleted (manual `rm`, disk cleanup, a different workspace sharing the same global `~/.config/agnes/sync_state.json` writing one workspace's parquets while another reads sync_state and assumes "I already have these"), the hash-equal check would short-circuit the download and the next DuckDB view rebuild would fail on a missing file. Now the existence check on `<workspace>/server/parquet/<tid>.parquet` runs alongside the hash compare; missing file → forced re-download regardless of hash. - **`agnes pull` now re-downloads parquets when the local file is missing, even if the recorded hash matches the server.** Pre-fix the download set was computed from `sync_state.json` hash equality alone — if the parquet had been deleted (manual `rm`, disk cleanup, a different workspace sharing the same global `~/.config/agnes/sync_state.json` writing one workspace's parquets while another reads sync_state and assumes "I already have these"), the hash-equal check would short-circuit the download and the next DuckDB view rebuild would fail on a missing file. Now the existence check on `<workspace>/server/parquet/<tid>.parquet` runs alongside the hash compare; missing file → forced re-download regardless of hash.
- **`agnes query --remote` no longer over-rejects narrow queries on partitioned/clustered BigQuery tables.** Closes #171. Pre-fix the `/api/query` cost guardrail dry-ran a synthetic `SELECT * FROM <table>` per registered remote-BQ row referenced by the user SQL, which forced BQ to estimate "full table scan" — column projection, predicate pushdown, and partition pruning were all ignored, producing scan-byte estimates up to ~30,000× larger than the actual query would scan. Narrow queries on big partitioned tables (the documented happy-path use case) were rejected with 400 `remote_scan_too_large` even when BQ's own dry-run reported single-digit MB. Now the guardrail rewrites the user SQL from DuckDB-flavor (bare registered names + `bq."<ds>"."<tbl>"`) to BQ-native (`` `<project>.<ds>.<tbl>` ``) and runs ONE dry-run on the EXACT user SQL — partition pruning, column projection, and predicate pushdown all engage. Cap check uses the real estimate. Fallback: if BQ rejects the rewritten SQL with `bq_bad_request` (DuckDB-only syntax that doesn't translate, e.g. `::INT` casts), the guardrail falls back to the pre-fix per-table SELECT * estimate so a non-portable query still gets bounded; non-parse errors (forbidden / upstream) propagate as 502. Helpers exported as `_rewrite_user_sql_for_bq_dry_run` (test seam). - **`agnes query --remote` no longer over-rejects narrow queries on partitioned/clustered BigQuery tables.** Closes #171. Pre-fix the `/api/query` cost guardrail dry-ran a synthetic `SELECT * FROM <table>` per registered remote-BQ row referenced by the user SQL, which forced BQ to estimate "full table scan" — column projection, predicate pushdown, and partition pruning were all ignored, producing scan-byte estimates up to ~30,000× larger than the actual query would scan. Narrow queries on big partitioned tables (the documented happy-path use case) were rejected with 400 `remote_scan_too_large` even when BQ's own dry-run reported single-digit MB. Now the guardrail rewrites the user SQL from DuckDB-flavor (bare registered names + `bq."<ds>"."<tbl>"`) to BQ-native (`` `<project>.<ds>.<tbl>` ``) and runs ONE dry-run on the EXACT user SQL — partition pruning, column projection, and predicate pushdown all engage. Cap check uses the real estimate. Fallback: if BQ rejects the rewritten SQL with `bq_bad_request` (DuckDB-only syntax that doesn't translate, e.g. `::INT` casts), the guardrail falls back to the pre-fix per-table SELECT * estimate so a non-portable query still gets bounded; non-parse errors (forbidden / upstream) propagate as 502. Helpers exported as `_rewrite_user_sql_for_bq_dry_run` (test seam).
- **Windows: `agnes` CLI no longer crashes on cs-CZ / non-UTF-8 consoles.** Two failure modes addressed (originally reported in #172 against the pre-rename `da` CLI; ported and broadened here): (1) `agnes pull` and any other Rich-progress-bar codepath crashed with `UnicodeEncodeError` because cp1250 / cp1252 cannot encode Rich's Braille spinner glyphs — `cli/main.py` now reconfigures `sys.stdout` / `sys.stderr` to UTF-8 with `errors="replace"` at import time when `sys.platform == "win32"`. (2) `agnes skills list` and `agnes skills show` crashed with `UnicodeDecodeError` reading skill markdown that contains em-dashes / accents — every `Path.read_text()` / `Path.write_text()` / `open()` call site in `cli/` (including ones not touched by #172, since several files were renamed in the bootstrap rewrite) now passes `encoding="utf-8"` explicitly. Defensive: also covers JSON / YAML config files that were ASCII-only in practice but were one non-ASCII value away from the same failure mode. - **Windows: `agnes` CLI no longer crashes on cs-CZ / non-UTF-8 consoles.** Two failure modes addressed (originally reported in #172 against the pre-rename `da` CLI; ported and broadened here): (1) `agnes pull` and any other Rich-progress-bar codepath crashed with `UnicodeEncodeError` because cp1250 / cp1252 cannot encode Rich's Braille spinner glyphs — `cli/main.py` now reconfigures `sys.stdout` / `sys.stderr` to UTF-8 with `errors="replace"` at import time when `sys.platform == "win32"`. (2) `agnes skills list` and `agnes skills show` crashed with `UnicodeDecodeError` reading skill markdown that contains em-dashes / accents — every `Path.read_text()` / `Path.write_text()` / `open()` call site in `cli/` (including ones not touched by #172, since several files were renamed in the bootstrap rewrite) now passes `encoding="utf-8"` explicitly. Defensive: also covers JSON / YAML config files that were ASCII-only in practice but were one non-ASCII value away from the same failure mode.

View file

@ -39,8 +39,11 @@ def status(
if db_path.exists(): if db_path.exists():
last_synced = datetime.fromtimestamp(db_path.stat().st_mtime, tz=timezone.utc).isoformat() last_synced = datetime.fromtimestamp(db_path.stat().st_mtime, tz=timezone.utc).isoformat()
sessions_dir = workspace / "user" / "sessions" # Sessions live in ~/.claude/projects/<encoded-cwd>/ (where Claude Code
session_count = len(list(sessions_dir.glob("*.jsonl"))) if sessions_dir.exists() else 0 # writes them), with `<workspace>/user/sessions/` as a legacy fallback.
# The helper unions both — same source of truth as `agnes push`.
from cli.lib.claude_sessions import list_session_files
session_count = len(list_session_files(workspace))
info = { info = {
"workspace": str(workspace), "workspace": str(workspace),

View file

@ -2,8 +2,45 @@
import json import json
import os import os
from contextlib import contextmanager
from contextvars import ContextVar
from pathlib import Path from pathlib import Path
from typing import Optional from typing import Iterator, Optional
# In-process override for `get_token()`. Used by `agnes init --token X` and
# `agnes auth import-token` to force a specific token for the duration of a
# scoped block, EVEN WHEN `~/.config/agnes/token.json` already holds a
# different (possibly stale) token. Without this override, `get_token()`
# reads the on-disk token first and the explicit `--token` argument is
# silently ignored — the bug Devin Review caught at cli/commands/init.py:99.
#
# A ContextVar is used (not a plain global) so concurrent callers — async
# tasks, threads — each see their own override, and a leaked override in
# one task can't corrupt another. `_token_override.set(...)` returns a
# token used to reset; the `_with_token_override` context manager scopes it.
_token_override: ContextVar[Optional[str]] = ContextVar(
"agnes_cli_token_override", default=None,
)
@contextmanager
def _with_token_override(token: Optional[str]) -> Iterator[None]:
"""Set `_token_override` for the duration of the block.
`get_token()` checks the override BEFORE reading `token.json`, so any
in-block call returns the supplied token regardless of on-disk state.
Restores the prior override (if any) on exit so nested overrides nest
correctly.
"""
if not token:
yield
return
reset_token = _token_override.set(token)
try:
yield
finally:
_token_override.reset(reset_token)
def _config_dir() -> Path: def _config_dir() -> Path:
@ -18,6 +55,12 @@ def get_server_url() -> str:
def get_token() -> Optional[str]: def get_token() -> Optional[str]:
# In-process override wins over BOTH the on-disk file and the env var.
# Set by `_with_token_override(...)`; used by `agnes init --token X`
# to force the explicit arg through the verify call even when a stale
# `~/.config/agnes/token.json` exists.
if (override := _token_override.get()) is not None:
return override
token_file = _config_dir() / "token.json" token_file = _config_dir() / "token.json"
if token_file.exists(): if token_file.exists():
data = json.loads(token_file.read_text(encoding="utf-8")) data = json.loads(token_file.read_text(encoding="utf-8"))

View file

@ -67,28 +67,34 @@ _SAFE_ID_RE = re.compile(r"^[a-zA-Z0-9_\-]{1,128}$")
@contextmanager @contextmanager
def _override_server_env(server_url: str, token: str) -> Iterator[None]: def _override_server_env(server_url: str, token: str) -> Iterator[None]:
"""Set AGNES_SERVER / AGNES_TOKEN for the duration of the call. """Set AGNES_SERVER + scoped token override for the duration of the call.
`cli.config.get_server_url` / `get_token` already honor these env vars, `cli.config.get_server_url` honors `AGNES_SERVER`, so the server URL is
which is the same mechanism used in production. Restores prior values swapped via env-var. The TOKEN override is routed through
on exit so the caller's environment isn't mutated permanently. `cli.config._with_token_override` (a ContextVar), which is checked by
`get_token()` BEFORE the on-disk `~/.config/agnes/token.json`. This is
load-bearing: `agnes init --token NEW` runs the verify call in step 2
while the file still holds an OLD token from a prior install without
the override, the verify uses the stale on-disk token and fails 401.
Caveats: `AGNES_TOKEN` env var is also set as a back-compat hint for any code
- **Token override is honored only when no `~/.config/agnes/token.json` path that bypasses `get_token()` (none in `cli/` at last audit, but
exists.** `get_token()` reads the file first and only falls back to third-party hooks may), but the contextvar is the authoritative source.
`AGNES_TOKEN`. `agnes init` writes `token.json` before calling
`run_pull` so the values agree in production; isolated tests/callers Restores prior values on exit so the caller's environment isn't
that pass a different token must clear the on-disk token first. mutated permanently. Not safe for concurrent invocation across threads;
- **Not safe for concurrent invocation in the same process** env-var single-threaded use only.
swap is global. Single-threaded use only.
""" """
from cli.config import _with_token_override
prev_server = os.environ.get("AGNES_SERVER") prev_server = os.environ.get("AGNES_SERVER")
prev_token = os.environ.get("AGNES_TOKEN") prev_token = os.environ.get("AGNES_TOKEN")
os.environ["AGNES_SERVER"] = server_url os.environ["AGNES_SERVER"] = server_url
if token: if token:
os.environ["AGNES_TOKEN"] = token os.environ["AGNES_TOKEN"] = token
try: try:
yield with _with_token_override(token):
yield
finally: finally:
if prev_server is None: if prev_server is None:
os.environ.pop("AGNES_SERVER", None) os.environ.pop("AGNES_SERVER", None)

View file

@ -106,10 +106,19 @@ def _try_acquire_file_lock(lock_path: Path):
Stale-lock reclaim: if the lock_path exists and its mtime is older Stale-lock reclaim: if the lock_path exists and its mtime is older
than the configured TTL, log a warning and unlink before retrying. than the configured TTL, log a warning and unlink before retrying.
A live holder still wins the second flock attempt (kernel-level
flock isn't tied to mtime), so the reclaim doesn't break correctness Caveat: ``lock_path.unlink()`` + the subsequent ``open()`` creates a
it just unblocks the case where a holder process was hard-killed NEW inode fcntl.flock keys on inode, so a still-running holder's
before the kernel released the lock.""" lock on the (now-orphan) old inode does NOT block the new acquisition.
A genuine overrunning materialize past TTL therefore CAN race a
fresh attempt and both can write to ``<id>.parquet.tmp``. The
in-process ``threading.Lock`` keyed on ``table_id`` blocks that race
within one scheduler process; cross-process protection (two schedulers
on the same workspace) relies on operators not running multiple
concurrent schedulers AND on the TTL being well above the longest
plausible COPY (24 h default). If real corruption surfaces in
production, the next iteration should attach a pid to the lock file
and skip reclaim while the holder pid is alive."""
lock_path.parent.mkdir(parents=True, exist_ok=True) lock_path.parent.mkdir(parents=True, exist_ok=True)
def _try_open_and_flock(): def _try_open_and_flock():

View file

@ -224,3 +224,95 @@ def test_init_manifest_unauthorized_when_pull_records_manifest_error(tmp_path, m
output = result.output + (result.stderr or "") output = result.output + (result.stderr or "")
assert "Traceback" not in output assert "Traceback" not in output
assert ("manifest_unauthorized" in output) or ("Manifest fetch failed" in output) assert ("manifest_unauthorized" in output) or ("Manifest fetch failed" in output)
def test_init_uses_explicit_token_arg_not_stale_disk_token(tmp_path, monkeypatch):
"""Regression for Devin Review finding on init.py:99.
Repro: a prior `agnes init` left a stale token in
`~/.config/agnes/token.json`. The new run passes a fresh token via
`--token`. Pre-fix, step 2's PAT-verify call read the on-disk token
first and only fell back to the env var so the explicit `--token`
arg was silently ignored, the verify ran with the stale token, and
init failed 401 with a confusing 'token expired' error even though
the supplied token was valid.
Fix: a ContextVar-based override (set by `_override_server_env`)
short-circuits `get_token()` BEFORE the on-disk read.
"""
import json
from unittest.mock import MagicMock
cfg_dir = tmp_path / "_cfg"
cfg_dir.mkdir()
monkeypatch.setenv("AGNES_CONFIG_DIR", str(cfg_dir))
# Seed a stale token on disk — this is what the bug exposed: the verify
# call would prefer this over the --token arg.
token_file = cfg_dir / "token.json"
token_file.write_text(json.dumps({
"access_token": "STALE-DO-NOT-USE",
"email": "old@example.com",
}), encoding="utf-8")
captured = {"verify_token": None}
def _api_get(path, *args, **kwargs):
# Verify endpoint: snapshot whatever token cli.config.get_token()
# returns at the moment of the call. If the override is wired
# correctly, this will be the --token arg, not the stale disk
# value.
if path == "/api/catalog/tables":
from cli.config import get_token
captured["verify_token"] = get_token()
resp = MagicMock()
resp.status_code = 200
if path == "/api/welcome":
resp.json.return_value = {"content": "# Test\n"}
elif path == "/api/sync/manifest":
resp.json.return_value = {"tables": {}}
elif path == "/api/memory/bundle":
resp.json.return_value = {"mandatory": [], "approved": []}
else:
resp.json.return_value = []
return resp
monkeypatch.setattr("cli.commands.init.api_get", _api_get, raising=False)
monkeypatch.setattr("cli.lib.pull.api_get", _api_get, raising=False)
result = runner.invoke(init_app, [
"--server-url", "http://x",
"--token", "FRESH-PAT-FROM-USER",
"--workspace", str(tmp_path / "ws"),
"--force",
])
assert captured["verify_token"] == "FRESH-PAT-FROM-USER", (
"Step 2 verify call must use the explicit --token arg, "
f"not the stale on-disk token. Got: {captured['verify_token']!r}"
)
output = result.output + (result.stderr or "")
assert "Traceback" not in output
def test_token_override_contextvar_does_not_leak_outside_block():
"""The override must be scoped to the `with` block — leaking it would
poison subsequent `get_token()` calls (e.g. a long-running daemon
that runs `agnes init` once and then `agnes pull` later in the same
process)."""
from cli.config import _with_token_override, get_token
import os
# Sandbox AGNES_CONFIG_DIR so the test's own config dir doesn't muddy
# the assertion (get_token would fall through to AGNES_TOKEN env or
# to None depending on host state).
prior_env = os.environ.pop("AGNES_TOKEN", None)
try:
with _with_token_override("INSIDE"):
assert get_token() == "INSIDE"
# Outside the block: override cleared, falls through to file/env.
# Without a config file or AGNES_TOKEN set, returns None.
assert get_token() != "INSIDE"
finally:
if prior_env is not None:
os.environ["AGNES_TOKEN"] = prior_env