fix(query): #168 review — concurrent-slot wraps execute, doc/JS fixes

Devin Review on PR #168 found 5 issues — all real, all addressed.

🚩 ANALYSIS_001 (architectural): concurrent-slot guard didn't protect
actual BQ query execution. Earlier `_enforce_remote_bq_quota_and_cap`
ran dry-run + cap check inside `with quota.acquire(user_id):`, then
returned — releasing the slot BEFORE `analytics.execute(...)` ran. Spec
§4.3.3 explicitly designs the slot to wrap execute so the per-user
concurrent cap limits BQ scans, not just dry-runs.

Refactor to a context manager `_bq_quota_and_cap_guard`. Caller's `with`
block now holds the slot through dry-run, cap check, the actual
`analytics.execute(...)` (which is what triggers the BQ scan when DuckDB
resolves the master view), AND the post-flight record_bytes. Slot
released only when caller's `with` body exits.

🟡 BUG_001: placeholder JS walked `original` (full GET payload root)
instead of `original.sections`. `placeholder_from: ["data_source",
"bigquery", "project"]` is a section-relative path, so billing_project
placeholder NEVER rendered. Fix: walk `original.sections` (with fallback
to `original` for safety).

🟡 BUG_002 + BUG_003: admin_tables.html register and edit modals'
operator help text referenced `max_bytes_per_remote_query` (the old
name from the spec) but the actual config key is `bq_max_scan_bytes`
after the fix-up commit `6423888d` moved it. Replace both occurrences.

🟡 BUG_004: CHANGELOG entry said `api.query.bq_max_scan_bytes` (the
old path) but the read at app/api/query.py:53 is
`get_value("data_source", "bigquery", "bq_max_scan_bytes", ...)`. An
operator who set it under `api.query` in their yaml would have no
effect. Correct path in CHANGELOG.

All 95 #160-affected tests pass after the changes.
This commit is contained in:
ZdenekSrotyr 2026-05-04 13:28:03 +02:00
parent 6423888d02
commit 1263b80726
4 changed files with 64 additions and 32 deletions

View file

@ -20,7 +20,7 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C
to the data_source Save button; on failure the inline result uses the to the data_source Save button; on failure the inline result uses the
same structured shape as the CLI renderer so operators see the same same structured shape as the CLI renderer so operators see the same
hint format admins do. hint format admins do.
- **`api.query.bq_max_scan_bytes` server-config knob** (default 5 GiB): - **`data_source.bigquery.bq_max_scan_bytes` server-config knob** (default 5 GiB):
caps the BigQuery scan that `da query --remote` will issue against caps the BigQuery scan that `da query --remote` will issue against
`query_mode='remote'` BQ rows. Exceeded queries are rejected with a `query_mode='remote'` BQ rows. Exceeded queries are rejected with a
structured `400 remote_scan_too_large` detail naming the bytes, structured `400 remote_scan_too_large` detail naming the bytes,

View file

@ -1,5 +1,6 @@
"""Query endpoint — execute SQL against server DuckDB.""" """Query endpoint — execute SQL against server DuckDB."""
import contextlib
import logging import logging
import os import os
import re import re
@ -141,31 +142,41 @@ async def execute_query(
if blocked_bq_path is not None: if blocked_bq_path is not None:
raise HTTPException(status_code=403, detail=blocked_bq_path) raise HTTPException(status_code=403, detail=blocked_bq_path)
if dry_run_set: # Issue #160 §4.3.3 — concurrent-slot guard MUST wrap the actual
_enforce_remote_bq_quota_and_cap( # `analytics.execute(request.sql)` call (which is what triggers the
user_id=user.get("id") or user.get("email") or "anon", # BQ scan when DuckDB resolves the master view), not just the
dry_run_set=dry_run_set, # dry-run. Devin Review on PR #168 caught this — earlier
sql=request.sql, # implementation released the slot before execute. Use a context
# manager so dry-run + cap check + execute + record_bytes all run
# inside the slot.
user_id = user.get("id") or user.get("email") or "anon"
guard = (
_bq_quota_and_cap_guard(
user_id=user_id, dry_run_set=dry_run_set, sql=request.sql,
) )
if dry_run_set
else contextlib.nullcontext()
)
with guard:
# Open in read-only mode for extra safety
result = analytics.execute(request.sql).fetchmany(request.limit + 1)
columns = [desc[0] for desc in analytics.description] if analytics.description else []
truncated = len(result) > request.limit
rows = result[:request.limit]
# Open in read-only mode for extra safety # Post-flight: bill the dry-run estimate against the user's daily
result = analytics.execute(request.sql).fetchmany(request.limit + 1) # quota. Do this AFTER execute so a downstream failure (e.g. BQ
columns = [desc[0] for desc in analytics.description] if analytics.description else [] # outage) doesn't strand the user with charged-but-unrun bytes.
truncated = len(result) > request.limit # Stays inside the `with quota.acquire(...)` block so the slot
rows = result[:request.limit] # release happens after record_bytes completes.
if dry_run_set:
# Post-flight: bill the dry-run estimate against the user's daily try:
# quota. Do this AFTER execute so a downstream failure (e.g. BQ _build_quota_tracker().record_bytes(
# outage) doesn't strand the user with charged-but-unrun bytes. user_id, sum(b for _, _, b in dry_run_set),
if dry_run_set: )
user_id = user.get("id") or user.get("email") or "anon" except Exception:
try: # record_bytes is documented as never-raising; defensive guard.
_build_quota_tracker().record_bytes( logger.warning("quota record_bytes failed for user=%s", user_id)
user_id, sum(b for _, _, b in dry_run_set),
)
except Exception:
# record_bytes is documented as never-raising; defensive guard.
logger.warning("quota record_bytes failed for user=%s", user_id)
# Convert to serializable types # Convert to serializable types
serializable_rows = [] serializable_rows = []
@ -360,14 +371,27 @@ def _bq_guardrail_inputs(
return dry_run, None return dry_run, None
def _enforce_remote_bq_quota_and_cap(*, user_id: str, dry_run_set: list, sql: str) -> None: @contextlib.contextmanager
def _bq_quota_and_cap_guard(*, user_id: str, dry_run_set: list, sql: str):
"""Pre-flight check + dry-run + cap enforcement for /api/query BQ paths. """Pre-flight check + dry-run + cap enforcement for /api/query BQ paths.
Context-manager shape (Devin Review #5 on PR #168). Earlier implementation
ran the dry-run + cap check inside `with quota.acquire(user_id):`, then
returned releasing the concurrent slot BEFORE the actual BQ-touching
`analytics.execute(...)` ran. Spec §4.3.3 wants execute to be inside the
slot so the per-user concurrent cap actually limits BQ scans, not just
dry-runs.
Now: the helper is a context manager that yields after the cap check.
The caller's `with` block holds the slot through both dry-run AND the
subsequent `analytics.execute(...)` until the body exits.
1. `check_daily_budget` over-cap users get 429 BEFORE any BQ work. 1. `check_daily_budget` over-cap users get 429 BEFORE any BQ work.
2. `with quota.acquire(user_id)` concurrent slot guard. 2. `quota.acquire(user_id)` opened concurrent-slot held throughout.
3. Dry-run each `(bucket, source_table)` via the existing 3. Dry-run each `(bucket, source_table)` via `_bq_dry_run_bytes`.
`_bq_dry_run_bytes` helper. Sum bytes. 4. If sum > cap 400 `remote_scan_too_large`.
4. If sum > cap 400 `remote_scan_too_large` with structured detail. 5. Yield. Caller runs `analytics.execute(...)` + `record_bytes(...)`.
6. On exit, slot released.
Mutates `dry_run_set` in place: the third tuple element (bytes) is Mutates `dry_run_set` in place: the third tuple element (bytes) is
populated with the per-path dry-run result so the caller can sum and populated with the per-path dry-run result so the caller can sum and
@ -424,3 +448,7 @@ def _enforce_remote_bq_quota_and_cap(*, user_id: str, dry_run_set: list, sql: st
"the snapshot locally." "the snapshot locally."
), ),
}) })
# Yield control to the handler — slot stays acquired while the
# caller runs analytics.execute() + record_bytes().
yield total_bytes

View file

@ -322,9 +322,13 @@ function renderLeafInput(fieldId, section, pathSegments, kind, value, opts, isUn
// surface the access.py:339-340 billing→data fallback in the UI. // surface the access.py:339-340 billing→data fallback in the UI.
let placeholderAttr = ""; let placeholderAttr = "";
if (isUnset && opts && opts.spec && Array.isArray(opts.spec.placeholder_from)) { if (isUnset && opts && opts.spec && Array.isArray(opts.spec.placeholder_from)) {
// `original` is the full GET /api/admin/server-config response shape:
// {sections: {data_source: ...}, editable_sections: [...], ...}.
// `placeholder_from` is a section-relative path (e.g. ["data_source",
// "bigquery", "project"]) so walk `original.sections` not `original`.
const resolved = opts.spec.placeholder_from.reduce( const resolved = opts.spec.placeholder_from.reduce(
(cur, k) => (cur && typeof cur === "object" ? cur[k] : undefined), (cur, k) => (cur && typeof cur === "object" ? cur[k] : undefined),
original, original && original.sections ? original.sections : original,
); );
if (resolved !== undefined && resolved !== null && resolved !== "") { if (resolved !== undefined && resolved !== null && resolved !== "") {
placeholderAttr = ` placeholder="(defaults to ${escHtml(String(resolved))})"`; placeholderAttr = ` placeholder="(defaults to ${escHtml(String(resolved))})"`;

View file

@ -894,7 +894,7 @@
<br><strong>Live access:</strong> BASE TABLEs query via <br><strong>Live access:</strong> BASE TABLEs query via
<code>bq."dataset"."table"</code> (Storage Read API; predicate pushdown). <code>bq."dataset"."table"</code> (Storage Read API; predicate pushdown).
VIEWs and MATERIALIZED_VIEWs query via the BQ jobs API (full-scan estimate; VIEWs and MATERIALIZED_VIEWs query via the BQ jobs API (full-scan estimate;
cost-guarded by <code>max_bytes_per_remote_query</code>). cost-guarded by <code>bq_max_scan_bytes</code>).
<code>da query --remote</code> works for both. <code>da query --remote</code> works for both.
<br><strong>Synced access:</strong> handles both table and view transparently <br><strong>Synced access:</strong> handles both table and view transparently
— the scheduler runs <code>SELECT *</code> through the jobs API and writes a — the scheduler runs <code>SELECT *</code> through the jobs API and writes a
@ -1051,7 +1051,7 @@
<br><strong>Live access:</strong> BASE TABLEs query via <br><strong>Live access:</strong> BASE TABLEs query via
<code>bq."dataset"."table"</code> (Storage Read API; predicate pushdown). <code>bq."dataset"."table"</code> (Storage Read API; predicate pushdown).
VIEWs and MATERIALIZED_VIEWs query via the BQ jobs API (full-scan estimate; VIEWs and MATERIALIZED_VIEWs query via the BQ jobs API (full-scan estimate;
cost-guarded by <code>max_bytes_per_remote_query</code>). cost-guarded by <code>bq_max_scan_bytes</code>).
<code>da query --remote</code> works for both. <code>da query --remote</code> works for both.
<br><strong>Synced access:</strong> handles both transparently — the <br><strong>Synced access:</strong> handles both transparently — the
scheduler runs <code>SELECT *</code> through the jobs API and writes a scheduler runs <code>SELECT *</code> through the jobs API and writes a