Three Devin Review findings on PR #173 addressed in one commit since
they're in adjacent code paths:
1. cli/commands/init.py:99 (\u{1F534}): `agnes init --token NEW` ran
step 2 verify against the OLD on-disk token because `get_token()`
read `~/.config/agnes/token.json` before the env var, and
`_override_server_env` only set the env var. So `agnes init --force`
on a machine with a stale token.json failed 401 with a confusing
'token expired' even though the --token arg was valid.
Fix: ContextVar-based override in `cli.config._token_override`
checked by `get_token()` BEFORE the on-disk read.
`_with_token_override` context manager scopes the override.
`_override_server_env` now also sets the contextvar via
`_with_token_override(token)`, so both env var and contextvar
carry the override (env for back-compat with anything bypassing
get_token; contextvar is the authoritative source).
Async-safe (each task sees its own override) and leak-proof
(resets on context exit).
2 new tests: regression on stale-disk-token + scope leak guard.
2. cli/commands/status.py:43 (\u{1F7E1}): sessions_pending_upload only
checked legacy `<workspace>/user/sessions/` and always reported 0
in workspaces bootstrapped with `agnes init` (Claude Code writes
to `~/.claude/projects/`, not the legacy path). Same bug we fixed
for `agnes push` in 08e49591.
Fix: route through `cli.lib.claude_sessions.list_session_files()`
so status and push agree on what counts as a pending session.
3. connectors/bigquery/extractor.py:111 (\u{1F7E1}): docstring claimed
"a live holder still wins the second flock attempt" — incorrect on
Linux. After `unlink()` + `open()`, the new file is a new inode;
fcntl.flock keys per-inode, so the old holder's lock does NOT block
the new acquisition. In a genuine TTL-overrun scenario two writers
CAN race the parquet.tmp.
Fix: documentation only. Comment now honestly describes the
inode-recreation behavior, names the threading.Lock as the actual
in-process guard, and flags pid-gating as the next-iteration fix
if real corruption surfaces. The 24h default TTL is well above
typical COPY durations so the practical risk is low.
Tests: 17/17 across test_cli_init.py + test_lib_pull.py + the broader
regression set.
Typer/rich emits ANSI styling in CI's --help output (e.g. `--metrics`
becomes `-\x1b[0m\x1b[1;36m-metrics`), so literal substring asserts
like `assert "--metrics" in result.output` fail. Locally the test runner
auto-detects no-TTY and produces plain text, masking the issue.
Add a small `_clean()` helper per test file that strips ANSI escape
codes (`\x1b\[[0-9;]*m`) before substring containment checks.