agnes-the-ai-analyst

Author SHA1 Message Date

Author	SHA1	Message	Date
ZdenekSrotyr	dc7e27082d	fix(bq-materialize): code-review follow-ups for `16eaf7a3` - extractor._try_acquire_file_lock: close fd and re-raise on non- BlockingIOError from fcntl.flock (read-only fs, unsupported flock, fd exhaustion). Pre-fix the fd leaked silently and the underlying OSError still propagated past the caller. - extractor: reorder module-level layout so logger is bound before the new lock-related helpers reference it. Deferred import of app.instance_config inside _get_lock_ttl_seconds documented inline. - extractor: comment _table_locks unbounded-by-design rationale. - tests: docstring + monkeypatch-target rationale for the two concurrency tests where the contract isn't obvious from the body.	2026-05-04 17:59:21 +02:00
ZdenekSrotyr	16eaf7a399	feat(bq-materialize): per-table mutex + file lock with TTL reclaim Two layers of concurrency control. Layer 1 is a per-table_id threading.Lock keyed on table_id; Layer 2 is fcntl.flock on a sibling <id>.parquet.lock file. Overlapping calls for the same id raise MaterializeInFlightError, which the caller treats as 'skipped, in_flight' instead of a hard error. Stale file locks (mtime older than materialize.lock_ttl_seconds, default 86400) are reclaimed on the next attempt — covers the rare case where a holder was hard-killed before kernel-level flock release. Pre-fix, when a materialize ran longer than the scheduler tick interval (15 min), the next tick called materialize_query for the same id, hit the unconditional tmp_path.unlink() at function entry, and started a second COPY against the same path. Both writers interleaved bytes; the original COPY's read_parquet validation then failed with 'No magic bytes found at end of file'.	2026-05-04 17:40:21 +02:00

ZdenekSrotyr

dc7e27082d

fix(bq-materialize): code-review follow-ups for 16eaf7a3

- extractor._try_acquire_file_lock: close fd and re-raise on non-
  BlockingIOError from fcntl.flock (read-only fs, unsupported flock,
  fd exhaustion). Pre-fix the fd leaked silently and the underlying
  OSError still propagated past the caller.
- extractor: reorder module-level layout so logger is bound before
  the new lock-related helpers reference it. Deferred import of
  app.instance_config inside _get_lock_ttl_seconds documented inline.
- extractor: comment _table_locks unbounded-by-design rationale.
- tests: docstring + monkeypatch-target rationale for the two
  concurrency tests where the contract isn't obvious from the body.

2026-05-04 17:59:21 +02:00

ZdenekSrotyr

16eaf7a399

feat(bq-materialize): per-table mutex + file lock with TTL reclaim

Two layers of concurrency control. Layer 1 is a per-table_id
threading.Lock keyed on table_id; Layer 2 is fcntl.flock on a sibling
<id>.parquet.lock file. Overlapping calls for the same id raise
MaterializeInFlightError, which the caller treats as 'skipped,
in_flight' instead of a hard error. Stale file locks (mtime older
than materialize.lock_ttl_seconds, default 86400) are reclaimed on
the next attempt — covers the rare case where a holder was hard-killed
before kernel-level flock release.

Pre-fix, when a materialize ran longer than the scheduler tick interval
(15 min), the next tick called materialize_query for the same id, hit
the unconditional tmp_path.unlink() at function entry, and started a
second COPY against the same path. Both writers interleaved bytes;
the original COPY's read_parquet validation then failed with
'No magic bytes found at end of file'.

2026-05-04 17:40:21 +02:00

2 commits