feat(bigquery): bq_query_timeout_ms knob; default 600s (was 90s)
DuckDB BigQuery extension defaults `bq_query_timeout_ms` to 90 s, which is too tight for analyst-scale queries against view-backed BQ datasets. `agnes query --remote` HTTP 400'd with `Binder Error: Query execution exceeded the timeout. Job ID: ...` whenever the underlying BQ job ran longer than 90 s, even though the job itself was healthy. Add `data_source.bigquery.query_timeout_ms` (default 600 000 ms = 10 min, sentinel 0 falls through to the extension default). Applied via `SET bq_query_timeout_ms` after every `LOAD bigquery` on every BQ-touching DuckDB session: orchestrator's `_remote_attach` ATTACH path, BqAccess session factory, and the standalone extractor. Configurable via `/admin/server-config` UI. Fail-soft: extension versions that don't recognise the setting silently keep the default rather than poisoning the session.
This commit is contained in:
parent
4751094e1c
commit
4f04235502
7 changed files with 193 additions and 0 deletions
|
|
@ -10,6 +10,9 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C
|
||||||
|
|
||||||
## [Unreleased]
|
## [Unreleased]
|
||||||
|
|
||||||
|
### Added
|
||||||
|
- **`data_source.bigquery.query_timeout_ms` config knob** (default 600 000 ms = 10 min). The DuckDB BigQuery extension's built-in default of 90 s was too tight for analyst-scale queries against view-backed BQ datasets — `agnes query --remote` would HTTP 400 with `Binder Error: Query execution exceeded the timeout. Job ID: …` whenever the underlying BQ job took longer than 90 s, even though the BQ job itself was healthy. The new knob is applied via `SET bq_query_timeout_ms` after every `LOAD bigquery` on every BQ-touching DuckDB session — the orchestrator's `_remote_attach` ATTACH path (`src/orchestrator.py`), the analytics-DB read-only reattach path (`src/db.py:_reattach_remote_extensions` — the primary `agnes query --remote` request path), the `BqAccess` session factory (`connectors/bigquery/access.py`), and the standalone extractor (`connectors/bigquery/extractor.py`). Sentinel `0` (or non-numeric / unparseable values) leaves the extension default in place so operators on legacy extension versions that don't recognise the setting aren't broken. Configurable via `/admin/server-config` UI. Note: BigQuery's `jobs.query` RPC caps the wait at ~200 s per call regardless of this setting; the extension polls on top so the effective ceiling is the value here but each poll is ~200 s. DuckDB emits an informational warning when the value is set above the BQ RPC cap — operators can safely ignore it.
|
||||||
|
|
||||||
### Fixed
|
### Fixed
|
||||||
|
|
||||||
- Keboola sync now falls back to the legacy Storage-API client when the DuckDB Keboola extension's per-table scan fails, not just when the initial `ATTACH` fails. Two changes:
|
- Keboola sync now falls back to the legacy Storage-API client when the DuckDB Keboola extension's per-table scan fails, not just when the initial `ATTACH` fails. Two changes:
|
||||||
|
|
|
||||||
|
|
@ -285,6 +285,24 @@ _KNOWN_FIELDS: dict[str, dict[str, dict]] = {
|
||||||
"`agnes snapshot create` suggestion. 0 disables the gate. Default 5368709120 = 5 GiB."
|
"`agnes snapshot create` suggestion. 0 disables the gate. Default 5368709120 = 5 GiB."
|
||||||
),
|
),
|
||||||
},
|
},
|
||||||
|
"query_timeout_ms": {
|
||||||
|
"kind": "int",
|
||||||
|
"default": 600000,
|
||||||
|
"hint": (
|
||||||
|
"DuckDB BigQuery extension query timeout (milliseconds). Applied "
|
||||||
|
"via `SET bq_query_timeout_ms` after every `LOAD bigquery` on "
|
||||||
|
"every BQ-touching DuckDB session (orchestrator remote-view "
|
||||||
|
"ATTACH, BqAccess factory, standalone extractor). Extension "
|
||||||
|
"default is 90 000 ms = 90 s, which is too tight for analyst "
|
||||||
|
"queries against view-backed datasets — bumped to 600 000 ms = "
|
||||||
|
"10 min by default. Set 0 to fall through to the extension "
|
||||||
|
"default. Note: the underlying BQ jobs.query RPC caps the wait "
|
||||||
|
"at ~200 s per call; the extension polls on top, so the "
|
||||||
|
"effective ceiling is this value but each poll round-trip is "
|
||||||
|
"~200 s. DuckDB itself emits a warning when this is set above "
|
||||||
|
"~200 s — that warning is informational, not an error."
|
||||||
|
),
|
||||||
|
},
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
"keboola": {
|
"keboola": {
|
||||||
|
|
|
||||||
|
|
@ -127,6 +127,14 @@ data_source:
|
||||||
# # Dry-run check before running; exceeding -> registration / sync
|
# # Dry-run check before running; exceeding -> registration / sync
|
||||||
# # rejected. Default 10 GiB (10737418240). Set 0 to disable.
|
# # rejected. Default 10 GiB (10737418240). Set 0 to disable.
|
||||||
# # null falls through to default. Configurable via /admin/server-config UI.
|
# # null falls through to default. Configurable via /admin/server-config UI.
|
||||||
|
# query_timeout_ms: 600000
|
||||||
|
# # DuckDB BigQuery extension query timeout (milliseconds).
|
||||||
|
# # Applied via `SET bq_query_timeout_ms` after every LOAD bigquery
|
||||||
|
# # on every BQ-touching DuckDB session. Extension default is
|
||||||
|
# # 90 000 ms = 90 s, which is too tight for analyst queries against
|
||||||
|
# # view-backed datasets -- bumped to 600 000 ms = 10 min by default.
|
||||||
|
# # Set 0 to fall through to the extension default. Configurable via
|
||||||
|
# # /admin/server-config UI.
|
||||||
|
|
||||||
# --- OpenMetadata catalog (optional) ---
|
# --- OpenMetadata catalog (optional) ---
|
||||||
# Enriches table and column metadata from OpenMetadata REST API.
|
# Enriches table and column metadata from OpenMetadata REST API.
|
||||||
|
|
|
||||||
|
|
@ -232,11 +232,48 @@ def _default_duckdb_session_factory(projects: BqProjects):
|
||||||
f"failed to install/load BigQuery DuckDB extension: {e}",
|
f"failed to install/load BigQuery DuckDB extension: {e}",
|
||||||
details={"original": str(e)},
|
details={"original": str(e)},
|
||||||
)
|
)
|
||||||
|
apply_bq_session_settings(conn)
|
||||||
yield conn
|
yield conn
|
||||||
finally:
|
finally:
|
||||||
conn.close()
|
conn.close()
|
||||||
|
|
||||||
|
|
||||||
|
def apply_bq_session_settings(conn) -> None:
|
||||||
|
"""Apply per-session DuckDB BigQuery-extension settings from instance config.
|
||||||
|
|
||||||
|
Currently sets ``bq_query_timeout_ms`` from
|
||||||
|
``data_source.bigquery.query_timeout_ms``. The extension default is 90 s,
|
||||||
|
which is too tight for analyst-scale queries against view-backed BQ
|
||||||
|
datasets — bumping the default to 600 s here. Sentinel ``0`` (or a
|
||||||
|
non-numeric / unparseable value) leaves the extension default in place.
|
||||||
|
|
||||||
|
Call AFTER ``LOAD bigquery`` on every DuckDB session that touches BQ:
|
||||||
|
BqAccess's session factory, the standalone extractor in
|
||||||
|
``connectors/bigquery/extractor.py``, and the orchestrator's
|
||||||
|
``_remote_attach`` path in ``src/orchestrator.py``.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
from app.instance_config import get_value
|
||||||
|
except Exception:
|
||||||
|
return
|
||||||
|
raw = get_value(
|
||||||
|
"data_source", "bigquery", "query_timeout_ms", default=600_000,
|
||||||
|
)
|
||||||
|
try:
|
||||||
|
ms = int(raw) if raw is not None else 0
|
||||||
|
except (TypeError, ValueError):
|
||||||
|
return
|
||||||
|
if ms <= 0:
|
||||||
|
return
|
||||||
|
try:
|
||||||
|
conn.execute(f"SET bq_query_timeout_ms = {int(ms)}")
|
||||||
|
except Exception:
|
||||||
|
# Fail-soft: extension version may not support the setting, or the
|
||||||
|
# session may already have been frozen — leave the default rather
|
||||||
|
# than poisoning the whole session.
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
class BqAccess:
|
class BqAccess:
|
||||||
"""Single entry point for BigQuery access. Stateless after construction.
|
"""Single entry point for BigQuery access. Stateless after construction.
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -359,6 +359,8 @@ def _init_extract_locked(
|
||||||
conn.execute(
|
conn.execute(
|
||||||
f"CREATE SECRET bq_session (TYPE bigquery, ACCESS_TOKEN '{escaped_token}')"
|
f"CREATE SECRET bq_session (TYPE bigquery, ACCESS_TOKEN '{escaped_token}')"
|
||||||
)
|
)
|
||||||
|
from connectors.bigquery.access import apply_bq_session_settings
|
||||||
|
apply_bq_session_settings(conn)
|
||||||
conn.execute(
|
conn.execute(
|
||||||
f"ATTACH 'project={project_id}' AS bq (TYPE bigquery, READ_ONLY)"
|
f"ATTACH 'project={project_id}' AS bq (TYPE bigquery, READ_ONLY)"
|
||||||
)
|
)
|
||||||
|
|
|
||||||
|
|
@ -502,6 +502,8 @@ class SyncOrchestrator:
|
||||||
f"CREATE OR REPLACE SECRET {secret_name} "
|
f"CREATE OR REPLACE SECRET {secret_name} "
|
||||||
f"(TYPE bigquery, ACCESS_TOKEN '{escaped}')"
|
f"(TYPE bigquery, ACCESS_TOKEN '{escaped}')"
|
||||||
)
|
)
|
||||||
|
from connectors.bigquery.access import apply_bq_session_settings
|
||||||
|
apply_bq_session_settings(conn)
|
||||||
conn.execute(
|
conn.execute(
|
||||||
f"ATTACH '{safe_url}' AS {alias} (TYPE {extension}, READ_ONLY)"
|
f"ATTACH '{safe_url}' AS {alias} (TYPE {extension}, READ_ONLY)"
|
||||||
)
|
)
|
||||||
|
|
|
||||||
123
tests/test_bq_query_timeout.py
Normal file
123
tests/test_bq_query_timeout.py
Normal file
|
|
@ -0,0 +1,123 @@
|
||||||
|
"""Unit tests for apply_bq_session_settings.
|
||||||
|
|
||||||
|
Covers the data_source.bigquery.query_timeout_ms knob added so that
|
||||||
|
agnes query --remote no longer trips the DuckDB BigQuery extension's
|
||||||
|
built-in 90 s wait timeout when the underlying BQ job takes longer.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from unittest.mock import patch
|
||||||
|
|
||||||
|
from connectors.bigquery.access import apply_bq_session_settings
|
||||||
|
|
||||||
|
|
||||||
|
class _RecordingConn:
|
||||||
|
"""Minimal DuckDB-conn stand-in that records execute() calls.
|
||||||
|
|
||||||
|
apply_bq_session_settings only calls .execute(); we don't need a
|
||||||
|
real DuckDB to verify the SET command shape.
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self, raise_on=None):
|
||||||
|
self.calls: list[str] = []
|
||||||
|
self.raise_on = raise_on
|
||||||
|
|
||||||
|
def execute(self, sql: str):
|
||||||
|
self.calls.append(sql)
|
||||||
|
if self.raise_on and self.raise_on in sql:
|
||||||
|
raise RuntimeError(f"simulated failure on: {sql}")
|
||||||
|
|
||||||
|
|
||||||
|
def _patched_get_value(value):
|
||||||
|
"""Helper: build a patch target that returns *value* for the
|
||||||
|
data_source.bigquery.query_timeout_ms key and propagates the
|
||||||
|
`default=` kwarg for any other lookup so we don't accidentally
|
||||||
|
break tests that read other keys via the same module."""
|
||||||
|
def fake(*keys, default=None):
|
||||||
|
if keys == ("data_source", "bigquery", "query_timeout_ms"):
|
||||||
|
return value
|
||||||
|
return default
|
||||||
|
return patch("app.instance_config.get_value", side_effect=fake)
|
||||||
|
|
||||||
|
|
||||||
|
def test_default_when_config_missing():
|
||||||
|
"""When get_value returns the default (None passed through, default arg
|
||||||
|
used), apply_bq_session_settings should fall back to the bumped
|
||||||
|
600 000 ms default and emit the SET."""
|
||||||
|
conn = _RecordingConn()
|
||||||
|
# Simulate get_value returning the default we passed (600_000) by
|
||||||
|
# echoing the default kwarg.
|
||||||
|
def fake(*keys, default=None):
|
||||||
|
return default
|
||||||
|
with patch("app.instance_config.get_value", side_effect=fake):
|
||||||
|
apply_bq_session_settings(conn)
|
||||||
|
assert conn.calls == ["SET bq_query_timeout_ms = 600000"]
|
||||||
|
|
||||||
|
|
||||||
|
def test_explicit_value():
|
||||||
|
conn = _RecordingConn()
|
||||||
|
with _patched_get_value(900_000):
|
||||||
|
apply_bq_session_settings(conn)
|
||||||
|
assert conn.calls == ["SET bq_query_timeout_ms = 900000"]
|
||||||
|
|
||||||
|
|
||||||
|
def test_zero_sentinel_leaves_extension_default():
|
||||||
|
"""0 means 'use the DuckDB BQ extension's built-in default' — no SET
|
||||||
|
must be emitted so a non-zero default doesn't override an operator's
|
||||||
|
explicit opt-out."""
|
||||||
|
conn = _RecordingConn()
|
||||||
|
with _patched_get_value(0):
|
||||||
|
apply_bq_session_settings(conn)
|
||||||
|
assert conn.calls == []
|
||||||
|
|
||||||
|
|
||||||
|
def test_negative_value_treated_as_zero():
|
||||||
|
"""Negative is nonsensical for a timeout; treat as 'extension default'
|
||||||
|
rather than emitting a negative SET that the extension might reject
|
||||||
|
or interpret unexpectedly."""
|
||||||
|
conn = _RecordingConn()
|
||||||
|
with _patched_get_value(-1):
|
||||||
|
apply_bq_session_settings(conn)
|
||||||
|
assert conn.calls == []
|
||||||
|
|
||||||
|
|
||||||
|
def test_non_numeric_silently_skipped():
|
||||||
|
"""A string-typed YAML value (e.g. operator typo) shouldn't crash
|
||||||
|
the BQ session — fall through to the extension default."""
|
||||||
|
conn = _RecordingConn()
|
||||||
|
with _patched_get_value("notanumber"):
|
||||||
|
apply_bq_session_settings(conn)
|
||||||
|
assert conn.calls == []
|
||||||
|
|
||||||
|
|
||||||
|
def test_string_numeric_is_coerced():
|
||||||
|
"""YAML loaders sometimes deliver int-like values as strings; accept
|
||||||
|
those rather than failing."""
|
||||||
|
conn = _RecordingConn()
|
||||||
|
with _patched_get_value("750000"):
|
||||||
|
apply_bq_session_settings(conn)
|
||||||
|
assert conn.calls == ["SET bq_query_timeout_ms = 750000"]
|
||||||
|
|
||||||
|
|
||||||
|
def test_set_failure_does_not_propagate():
|
||||||
|
"""Older DuckDB BQ extension versions may not recognise the setting.
|
||||||
|
The function must fail-soft so a session that was otherwise healthy
|
||||||
|
keeps working — just with the extension's built-in default timeout."""
|
||||||
|
conn = _RecordingConn(raise_on="SET bq_query_timeout_ms")
|
||||||
|
with _patched_get_value(600_000):
|
||||||
|
# Must not raise.
|
||||||
|
apply_bq_session_settings(conn)
|
||||||
|
# The SET was attempted (recorded before the exception).
|
||||||
|
assert conn.calls == ["SET bq_query_timeout_ms = 600000"]
|
||||||
|
|
||||||
|
|
||||||
|
def test_no_app_config_module_silently_skipped():
|
||||||
|
"""Unit-test contexts that don't bring up the app config layer must
|
||||||
|
still be able to construct BQ sessions for narrow tests; an
|
||||||
|
ImportError on app.instance_config means we can't read the knob,
|
||||||
|
so we leave the extension default in place."""
|
||||||
|
conn = _RecordingConn()
|
||||||
|
with patch.dict(
|
||||||
|
"sys.modules", {"app.instance_config": None},
|
||||||
|
):
|
||||||
|
apply_bq_session_settings(conn)
|
||||||
|
assert conn.calls == []
|
||||||
Loading…
Reference in a new issue