Commit graph

3 commits

Author SHA1 Message Date
ZdenekSrotyr
976d0c7160 fix(pull): re-download parquet when file missing despite matching hash
Pre-fix `agnes pull` decided what to download from sync_state hash
equality alone:

    if server_hash != local_hash or tid not in local_tables or not server_hash:
        to_download.append(tid)

If the recorded local hash matched server but the actual parquet had
been deleted from disk, the download was skipped. The next DuckDB
view rebuild then fails on a missing file. Repro: `rm
server/parquet/X.parquet && agnes pull` → 'Updated 0 tables', X
still missing.

Failure modes that produce hash-equal-but-file-missing:
- manual `rm` of a single parquet
- operator-side cleanup of `server/parquet/`
- two workspaces sharing one user's
  `~/.config/agnes/sync_state.json` (TODO(workspace-scoped-sync-state)
  in pull.py): one workspace writes its parquets, the other reads
  sync_state and concludes 'I already have these'
- disk corruption / partial restore from backup

Fix: existence check runs alongside the hash compare. Missing file
forces a re-download regardless of hash equality. `parquet_dir` is
hoisted above the loop so the existence check is in scope when the
download set is built.

Tests: regression test for the hash-equal-but-missing-file case +
counterpart for the fast-path (hash-equal-and-file-present must
still skip).
2026-05-04 21:12:06 +02:00
ZdenekSrotyr
15004126de fix(cli-lib): I1+I2+I3 review fixes — token-precedence note, sync-state TODO, dry-run hermeticity test 2026-05-04 18:04:56 +02:00
ZdenekSrotyr
37da602060 feat(cli-lib): cli/lib/pull.py:run_pull primitive with lazy mkdir 2026-05-04 18:00:57 +02:00