Commit graph

5 commits

Author SHA1 Message Date
ZdenekSrotyr
2ae486bc5d feat(pull): parallel parquet downloads (AGNES_PULL_PARALLELISM=4 default)
The download loop in cli/lib/pull.py was strictly serial — N tables took
Σ stream_download(t_i). With the Caddy file_server change in this PR,
the server can now sustain many parallel sendfile transfers without
blocking app workers, so the client-side serialization became the new
bottleneck.

Switch to ThreadPoolExecutor capped by AGNES_PULL_PARALLELISM (default 4,
set 1 to restore pre-PR serial). 4 matches typical home-broadband
saturation without over-subscribing the analyst's NIC. Drops to serial
when len(to_download) <= 1 to avoid executor overhead in the common
single-table case.

Per-table error semantics preserved via (tid, entry, err) tuple — a
failure on one parquet doesn't abort the rest of the batch.

Verified end-to-end against a dev VM with the new Caddy file_server
deployed: 2-table pull through agnes CLI works under the new concurrency.
2026-05-05 16:42:55 +02:00
ZdenekSrotyr
8784f10a6b fix(devin-review): stale-token override + status sessions counter + lock comment
Three Devin Review findings on PR #173 addressed in one commit since
they're in adjacent code paths:

1. cli/commands/init.py:99 (\u{1F534}): `agnes init --token NEW` ran
   step 2 verify against the OLD on-disk token because `get_token()`
   read `~/.config/agnes/token.json` before the env var, and
   `_override_server_env` only set the env var. So `agnes init --force`
   on a machine with a stale token.json failed 401 with a confusing
   'token expired' even though the --token arg was valid.

   Fix: ContextVar-based override in `cli.config._token_override`
   checked by `get_token()` BEFORE the on-disk read.
   `_with_token_override` context manager scopes the override.
   `_override_server_env` now also sets the contextvar via
   `_with_token_override(token)`, so both env var and contextvar
   carry the override (env for back-compat with anything bypassing
   get_token; contextvar is the authoritative source).
   Async-safe (each task sees its own override) and leak-proof
   (resets on context exit).
   2 new tests: regression on stale-disk-token + scope leak guard.

2. cli/commands/status.py:43 (\u{1F7E1}): sessions_pending_upload only
   checked legacy `<workspace>/user/sessions/` and always reported 0
   in workspaces bootstrapped with `agnes init` (Claude Code writes
   to `~/.claude/projects/`, not the legacy path). Same bug we fixed
   for `agnes push` in 08e49591.

   Fix: route through `cli.lib.claude_sessions.list_session_files()`
   so status and push agree on what counts as a pending session.

3. connectors/bigquery/extractor.py:111 (\u{1F7E1}): docstring claimed
   "a live holder still wins the second flock attempt" — incorrect on
   Linux. After `unlink()` + `open()`, the new file is a new inode;
   fcntl.flock keys per-inode, so the old holder's lock does NOT block
   the new acquisition. In a genuine TTL-overrun scenario two writers
   CAN race the parquet.tmp.

   Fix: documentation only. Comment now honestly describes the
   inode-recreation behavior, names the threading.Lock as the actual
   in-process guard, and flags pid-gating as the next-iteration fix
   if real corruption surfaces. The 24h default TTL is well above
   typical COPY durations so the practical risk is low.

Tests: 17/17 across test_cli_init.py + test_lib_pull.py + the broader
regression set.
2026-05-04 21:26:30 +02:00
ZdenekSrotyr
976d0c7160 fix(pull): re-download parquet when file missing despite matching hash
Pre-fix `agnes pull` decided what to download from sync_state hash
equality alone:

    if server_hash != local_hash or tid not in local_tables or not server_hash:
        to_download.append(tid)

If the recorded local hash matched server but the actual parquet had
been deleted from disk, the download was skipped. The next DuckDB
view rebuild then fails on a missing file. Repro: `rm
server/parquet/X.parquet && agnes pull` → 'Updated 0 tables', X
still missing.

Failure modes that produce hash-equal-but-file-missing:
- manual `rm` of a single parquet
- operator-side cleanup of `server/parquet/`
- two workspaces sharing one user's
  `~/.config/agnes/sync_state.json` (TODO(workspace-scoped-sync-state)
  in pull.py): one workspace writes its parquets, the other reads
  sync_state and concludes 'I already have these'
- disk corruption / partial restore from backup

Fix: existence check runs alongside the hash compare. Missing file
forces a re-download regardless of hash equality. `parquet_dir` is
hoisted above the loop so the existence check is in scope when the
download set is built.

Tests: regression test for the hash-equal-but-missing-file case +
counterpart for the fast-path (hash-equal-and-file-present must
still skip).
2026-05-04 21:12:06 +02:00
ZdenekSrotyr
15004126de fix(cli-lib): I1+I2+I3 review fixes — token-precedence note, sync-state TODO, dry-run hermeticity test 2026-05-04 18:04:56 +02:00
ZdenekSrotyr
37da602060 feat(cli-lib): cli/lib/pull.py:run_pull primitive with lazy mkdir 2026-05-04 18:00:57 +02:00