agnes-the-ai-analyst

AI-Cognitive-Leap/agnes-the-ai-analyst

Fork 0

Commit graph

Author	SHA1	Message	Date
ZdenekSrotyr	8e56d45c68	fix(query): code-review fixes — outer LIMIT wrap, dollar-quoting, parse-error fallback Address code-reviewer findings on the bigquery_query() rewrite path: 1. Outer LIMIT wrap — bigquery_query() materialises BQ result into DuckDB before fetchmany sees it (vs ATTACH-catalog Storage Read API streaming). A user 'SELECT *' against a billion-row remote table would buffer the entire result before request.limit applied. Wrap rewritten SQL in an outer 'LIMIT N+1' so the cap pushes into the BQ job itself. 2. Dollar-quoted inner SQL — naive replace("'", "''") doubling missed DuckDB backslash-escape sequences (\\, \\n, \\t, …). A predicate like 'WHERE name = ''O\\'Brien''' was unsafe under the doubling path. DuckDB $bqq_inner$ … $bqq_inner$ form takes the inner SQL verbatim with no escapes whatsoever. Falls back to legacy doubling if user SQL improbably contains the literal tag. 3. Parse-error fallback — when the rewritten path fails with a BQ-side parse / validation error (DuckDB-only syntax like ::INT cast that survives identifier rewrite but BQ refuses), retry the user's original SQL via the legacy ATTACH-catalog path so the request still succeeds. Mirrors the existing dry-run fallback contract. 4. CHANGELOG — delete duplicate CLI bullets that landed under already-released [0.38.1] (file corruption from merge — entries are correctly under [0.39.0]).	2026-05-06 13:29:45 +02:00
ZdenekSrotyr	b2c1ff143c	fix(query): rewrite BQ-backed user SQL via bigquery_query() to enable predicate pushdown User SQL hitting query_mode='remote' BigQuery rows was 50-100x slower than the equivalent direct bigquery_query() call because DuckDB's master view (CREATE VIEW … AS SELECT * FROM bigquery.<ds>.<tbl>) does not push WHERE/SELECT/LIMIT into BQ in ATTACH-catalog mode. The BQ extension opens a Storage Read API session over the entire upstream table; on >100M-row sources this was 70-150s and frequently failed with 'Response too large to return'. Extract the existing dry-run rewriter's core (table-name → BQ-native backtick path) into a shared helper. Add an execution-path rewriter that wraps the whole user SQL in bigquery_query('<project>', '<inner>') so the BQ planner sees the full query and engages partition pruning + projection pushdown server-side. Conservative fall-through: cross-source JOINs (BQ ↔ Keboola/Jira local), queries already containing bigquery_query(, and unconfigured BQ project all skip the rewrite and run the original SQL via ATTACH-catalog so behavior degrades gracefully.	2026-05-06 13:02:34 +02:00

Author

SHA1

Message

Date

ZdenekSrotyr

8e56d45c68

fix(query): code-review fixes — outer LIMIT wrap, dollar-quoting, parse-error fallback

Address code-reviewer findings on the bigquery_query() rewrite path:

1. Outer LIMIT wrap — bigquery_query() materialises BQ result into DuckDB
   before fetchmany sees it (vs ATTACH-catalog Storage Read API streaming).
   A user 'SELECT *' against a billion-row remote table would buffer the
   entire result before request.limit applied. Wrap rewritten SQL in an
   outer 'LIMIT N+1' so the cap pushes into the BQ job itself.

2. Dollar-quoted inner SQL — naive replace("'", "''") doubling missed
   DuckDB backslash-escape sequences (\\, \\n, \\t, …). A predicate
   like 'WHERE name = ''O\\'Brien''' was unsafe under the doubling
   path. DuckDB $bqq_inner$ … $bqq_inner$ form takes the inner SQL
   verbatim with no escapes whatsoever. Falls back to legacy doubling
   if user SQL improbably contains the literal tag.

3. Parse-error fallback — when the rewritten path fails with a BQ-side
   parse / validation error (DuckDB-only syntax like ::INT cast that
   survives identifier rewrite but BQ refuses), retry the user's
   original SQL via the legacy ATTACH-catalog path so the request still
   succeeds. Mirrors the existing dry-run fallback contract.

4. CHANGELOG — delete duplicate CLI bullets that landed under
   already-released [0.38.1] (file corruption from merge — entries are
   correctly under [0.39.0]).

2026-05-06 13:29:45 +02:00

ZdenekSrotyr

b2c1ff143c

fix(query): rewrite BQ-backed user SQL via bigquery_query() to enable predicate pushdown

User SQL hitting query_mode='remote' BigQuery rows was 50-100x slower
than the equivalent direct bigquery_query() call because DuckDB's master
view (CREATE VIEW … AS SELECT * FROM bigquery.<ds>.<tbl>) does not push
WHERE/SELECT/LIMIT into BQ in ATTACH-catalog mode. The BQ extension opens
a Storage Read API session over the entire upstream table; on >100M-row
sources this was 70-150s and frequently failed with 'Response too large
to return'.

Extract the existing dry-run rewriter's core (table-name → BQ-native
backtick path) into a shared helper. Add an execution-path rewriter
that wraps the whole user SQL in bigquery_query('<project>', '<inner>')
so the BQ planner sees the full query and engages partition pruning +
projection pushdown server-side.

Conservative fall-through: cross-source JOINs (BQ ↔ Keboola/Jira local),
queries already containing bigquery_query(, and unconfigured BQ project
all skip the rewrite and run the original SQL via ATTACH-catalog so
behavior degrades gracefully.

2026-05-06 13:02:34 +02:00

2 commits