agnes-the-ai-analyst/connectors/bigquery
ZdenekSrotyr e5645fd280 fix: devil's advocate R1 — chunked probe, parse-error heuristic narrow, pool settings refresh, content-length sanity, multi-project skip
R1 adversarial review surfaced 5 issues, all addressed:

#1 chunked download silently disabled in non-Caddy deployments (HEAD on
GET-only FastAPI route returns 405). _probe_range_support now falls back
to GET with Range: bytes=0-0 when HEAD fails — works against both
Caddy file_server (HEAD-friendly) and dev FastAPI direct (GET-only).

#2 parse-error fallback heuristic too broad — matched on Unrecognized
name / Function not found / No matching signature / Invalid cast,
which BQ surfaces for ordinary user-column typos. That triggered slow
ATTACH-catalog retry on every typo (2× latency tax). Narrowed to just
'Syntax error' / 'syntax error' which are the genuine DuckDB-vs-BQ
dialect mismatch markers.

#3 apply_bq_session_settings was only run on fresh-built pool entries,
not on reuse. An operator's /admin/server-config change to bq_query
_timeout_ms wouldn't propagate to long-lived pooled sessions until
restart. Fixed: re-apply on every pool acquire (idempotent + fail-soft).

#4 content-length sanity bound — a misconfigured proxy returning a
wildly inflated Content-Length would cause overlapping chunked Range
requests against the actual file → corrupt assembled output (caught
by manifest hash check, but only after wasted bandwidth). Cap at 100
GiB; above that, drop to single-stream.

#5 rewriter assumed every BQ row resolves under the single
bq.projects.data project. Bucket containing '.' suggests a project-
qualified bucket (multi-project deployment); rewriter would silently
target the wrong project. Conservative skip with regression test.
2026-05-06 13:50:46 +02:00
..
__init__.py Add BigQuery data source adapter 2026-03-11 13:56:12 +01:00
access.py fix: devil's advocate R1 — chunked probe, parse-error heuristic narrow, pool settings refresh, content-length sanity, multi-project skip 2026-05-06 13:50:46 +02:00
auth.py feat(v2): claude-driven fetch primitives + 0.14.0 (#102) 2026-04-29 01:07:19 +02:00
extractor.py feat(bigquery): bq_query_timeout_ms knob; default 600s (was 90s) 2026-05-05 16:40:40 +02:00