* docs(spec): #134 unify BigQuery access behind BqAccess facade Brainstorm output for issue #134. Captures: - root cause (incl. correction of the issue's hypothesis about commit 33a9964) - BqAccess facade API + project resolution rules - error contract — typed BqAccessError mapped to HTTP 502 for upstream BQ failures, 500 for deployment/config bugs - migration plan for v2_scan, v2_sample, RemoteQueryEngine - test rewrite eliminating _bq_client_factory injection point - E2E verification protocol on agnes-development as success criterion * docs(spec): #134 revise after first review Incorporates code-reviewer findings: Must-fix: - Add v2_schema (2 copies of INSTALL/LOAD/SECRET dance) to migration scope. - Reframe v2_scan headline: missing try/except around BQ calls is the actual cause of bare 500s, not project resolution (which 33a9964 fixed). - List two more deferred call sites (extractor.py, register_bq_table) with explicit rationale. Important: - Drop billing != data clause from cross_project_forbidden heuristic; rely only on 'serviceusage' substring. billing != data is normal for cross-project setup, was over-classifying. - Split bq_bad_request into _user (400) and _server (502) variants; add sql_origin parameter to translate_bq_error so call sites declare whether SQL contains user input. - Add @functools.cache to BqAccess.from_config; document tests bypass via dependency_overrides. - Replace monkey-patched-classmethod test pattern with BqAccess(client_factory=...) injection at construction time. Cleaner than today's _bq_client_factory and 1:1 migration shape. - Keep BqProjects.data (reviewer assumed registry has source_project; it doesn't). Multi-project explicitly listed as non-goal with note. Nice-to-have: - Add 'Implementation strategy' section: 2 staged commits (bug fix alone is revertable; refactor follows). - Extend E2E protocol to cover all three endpoints, not just /sample. - Note removal of stale docstring at src/remote_query.py:204. * docs(spec): #134 revision 3 — incorporates second-round review Must-fix from second review: - v2_schema split into two migration cases: _fetch_bq_schema translates errors via translate_bq_error; _fetch_bq_table_options preserves its swallow-all 'except Exception → return {}' so /schema doesn't 502 on partition-info failures. - RemoteQueryEngine.__init__ now resolves BqAccess lazily (in _get_bq_client, not in __init__). Without this, ~7 DuckDB-only tests in test_remote_query.py would suddenly fail with not_configured. - translate_bq_error pass-through for BqAccessError is now load-bearing (clause 1, before any Google-API branch). bq.client() raises BqAccessError for bq_lib_missing/auth_failed; without explicit pass-through those fall to 'unknown' and re-raise as bare 500. - Commit 1 now emits the SAME structured response shape as commit 2 to avoid contract churn between commits. - BIGQUERY_PROJECT env-var precedence is BREAKING for env-only deployments — flagged in CHANGELOG ### Changed. Editorial: - sql_origin renamed to bad_request_status with values 'client_error' / 'upstream_error' (clearer about what the parameter actually decides). bq_bad_request_user/_server kinds collapsed to bq_bad_request (400) and bq_upstream_error (502). - CLI (cli/commands/query.py) noted as external RemoteQueryEngine caller; unaffected because new bq_access kwarg has default None. - Added unit/integration tests for the new contracts: test_translate_passes_through_BqAccessError, test_v2_scan_returns_500_on_bq_lib_missing, test_v2_schema_returns_200_with_empty_partition_on_bq_failure, test_resolve_succeeds_after_config_set. - E2E protocol now covers /schema as the fourth endpoint. - Documented functools.cache-doesn't-cache-exceptions semantics and fixture nullcontext-doesn't-close caveat for nested sessions. * docs(spec): #134 revision 4 — incorporates third-round review Third reviewer verdict: 'implementation-ready with two trivial edits'; explicitly noted prior rounds did the heavy lifting. Edits: 1. get_bq_access() module-level function instead of @classmethod @functools.cache from_config. Removes the classmethod-cache stacking footgun (different Python versions wrap differently) and gives FastAPI's dependency introspection a clean function signature. Drops the 'Do not subclass BqAccess' caveat that no longer applies. 2. Commit 1 strategy explicitly: wrap _fetch_bq_sample (v2_sample), _bq_dry_run_bytes + _run_bq_scan (v2_scan), and _fetch_bq_schema (v2_schema strict block). Do NOT touch _fetch_bq_table_options swallow-all in commit 1 — preserved as-is, then migrated (still preserved) in commit 2. All three endpoints emit the same structured body shape so client parsers see one consistent contract throughout the staged rollout. No more half-rolled-out window where /sample is bare 500 while /scan is structured 502. * docs(plan): #134 implementation plan — Phase 1 (atomic bug fix) + Phase 2 (BqAccess refactor) + Phase 3 (verification) Bite-sized TDD tasks. 3 phases, 16 tasks total: Phase 1 (Commit 1) — atomic bug fix across all four v2 endpoints: Tasks 1.1-1.5 wrap _fetch_bq_sample, _bq_dry_run_bytes, _run_bq_scan, _fetch_bq_schema with structured 502/400 try/except. _fetch_bq_table_options preserved untouched. CHANGELOG Fixed entries. Phase 2 (Commit 2) — BqAccess facade extraction + migration: Tasks 2.1-2.5 build connectors/bigquery/access.py bottom-up (BqProjects, BqAccessError, translate_bq_error, default factories, BqAccess class, get_bq_access module-level cached). Task 2.6 adds conftest.py fixture. Tasks 2.7-2.9 migrate v2_scan, v2_sample, v2_schema to BqAccess. Tasks 2.10-2.11 migrate RemoteQueryEngine + tests (lazy bq_access, drop _bq_client_factory). Task 2.12 CHANGELOG Changed BREAKING + Internal. Phase 3 — Verification: 3.1 full pytest. 3.2 squash into two PR-shape commits. 3.3 manual E2E on agnes-development per spec protocol → close #134. Self-review table maps spec sections to implementing tasks; no gaps. * fix(v2): #134 structured 502/400 on BQ errors across /scan, /scan/estimate, /sample, /schema Wraps the BigQuery call sites in v2_scan, v2_sample, and v2_schema (strict block only) with try/except for google.api_core exceptions, translating to HTTPException with a structured body shape: {error, message, details}. Fixes Pavel's report (#134) where these endpoints returned bare HTTP 500 with no body when the SA on agnes-development hit cross-project Forbidden on serviceusage.services.use. Also fixes /sample's missing billing_project fallback (the bug 33a9964 fixed for /scan never landed here). Status code split: - /scan, /scan/estimate: BadRequest -> 400 (bq_bad_request) since SQL is user-derived from req.select/where/order_by. - /sample, /schema: BadRequest -> 502 (bq_upstream_error) since SQL is server-constructed from validated identifiers. - All Forbidden -> 502 with cross_project_forbidden if 'serviceusage' in error message (with hint pointing at data_source.bigquery.billing_project), else bq_forbidden. Body shape matches what the upcoming BqAccess refactor (next commit) will produce, so client-side parsers see one consistent contract throughout the staged rollout. _fetch_bq_table_options preserved exactly as-is — its swallow-all-and-return-empty contract is intentional and survives into the refactor; /schema continues to return 200 with empty partition info when partition queries fail. Outer wraps in scan_endpoint, scan_estimate_endpoint, sample, and schema endpoints exist only to make the test pattern (monkeypatching whole _fetch_* functions) work, and are tagged TODO(#134 Phase 2) for removal once BqAccess centralizes translation. * refactor(bq): #134 BqAccess facade — unify v2_scan, v2_sample, v2_schema, RemoteQueryEngine Extracts the duplicated BigQuery-access pattern (project resolution + client construction + DuckDB-extension session + Google-API error translation) into connectors/bigquery/access.py. Migrates four call sites to use it: - app/api/v2_scan.py — _bq_dry_run_bytes, _run_bq_scan - app/api/v2_sample.py — _fetch_bq_sample - app/api/v2_schema.py — _fetch_bq_schema (strict translation), _fetch_bq_table_options (preserves swallow-all best-effort contract) - src/remote_query.py — RemoteQueryEngine, lazy bq_access kwarg The new module exposes: - BqProjects (frozen dataclass: billing + data project IDs) - BqAccessError (typed exception with HTTP_STATUS class mapping) - BqAccess (facade with injectable client_factory/duckdb_session_factory for tests; defaults call the real google-cloud-bigquery + DuckDB extension) - get_bq_access (module-level @functools.cache; FastAPI Depends target) - translate_bq_error (Google API exception → BqAccessError mapper, with BqAccessError pass-through, 'serviceusage'-substring heuristic for cross_project_forbidden, and bad_request_status param distinguishing user-derived (400) from server-constructed (502) SQL) - _default_client_factory, _default_duckdb_session_factory RemoteQueryEngine.__init__ no longer accepts _bq_client_factory; tests migrate to bq_access=BqAccess(projects, client_factory=...). DuckDB-only RemoteQueryEngine tests need no changes — bq_access defaults to None and get_bq_access() is only invoked on first BQ call (lazy resolution). BqAccessError raised internally is translated to RemoteQueryError( error_type="bq_error") in _get_bq_client to preserve the engine's existing public contract — CLI and /api/query/hybrid callers see no change. Endpoint tests (test_v2_scan, test_v2_scan_estimate, test_v2_sample, test_v2_schema) migrate from monkey-patching whole _fetch_* functions to using the new bq_access fixture in tests/conftest.py — which exercises the REAL translation path through BqAccess + translate_bq_error, closing the test gap flagged in Task 1.1's review. Side-effect behavior change: v2_sample's FROM clause now uses the data project (instance.yaml data_source.bigquery.project), not the conflated billing_project from Phase 1. Documented in CHANGELOG ### Internal. BREAKING for deployments combining BIGQUERY_PROJECT env var with data_source.bigquery.project in instance.yaml — env var now overrides data project too. See CHANGELOG ### Changed. Two known-duplicate BQ-access sites (connectors/bigquery/extractor.py, scripts/duckdb_manager.register_bq_table) explicitly out of scope; tracked as follow-up. Removed stale docstring at the previous src/remote_query.py:204 that referenced scripts.duckdb_manager._create_bq_client as the default BQ client factory (RemoteQueryEngine never actually used that function). Test counts: tests/test_bq_access.py +27 (new), tests/test_v2_*.py + tests/test_remote_query.py migrated to bq_access fixture (counts unchanged or +1-2 per file). Full suite: 2086 passed, 8 pre-existing failures (DB migration tests with unrelated internal_roles DependencyException — not introduced by this PR). * fix(bq_access): translate DefaultCredentialsError to BqAccessError(auth_failed) CI on PR #138 caught: bigquery.Client(...) resolves Application Default Credentials at construction time; without ADC (CI without SA key, dev laptop without 'gcloud auth application-default login') it raises google.auth.exceptions.DefaultCredentialsError synchronously. Pre-fix _default_client_factory only caught ImportError, so DefaultCredentialsError propagated as raw exception — and from production endpoints would surface as bare 500 (the exact failure mode #134 sets out to fix). Now translates to BqAccessError(kind='auth_failed', details.hint='Run gcloud auth application-default login...'). Endpoint catch chain returns HTTP 502 with structured body. Adds unit test test_raises_auth_failed_on_default_credentials_error. Third-round spec review flagged this case in passing; the fix didn't land. CI's auth-less environment surfaced it. * fix(bq_access): get_bq_access() returns sentinel instead of raising when not configured Devin BUG_0001 on PR #138 review: 'get_bq_access() as FastAPI Depends breaks all v2 endpoints for non-BigQuery instances'. Pre-fix: get_bq_access() raised BqAccessError(not_configured) when neither BIGQUERY_PROJECT env nor data_source.bigquery.project was set. Because FastAPI resolves Depends() BEFORE the endpoint body runs, this exception fires during dep-injection — the endpoint's try/except BqAccessError clause never gets a chance to catch it. Result: every v2 request on Keboola-only or CSV-only instances returned bare HTTP 500, even for local-source tables that never touch BigQuery. Fix: get_bq_access() now returns a sentinel BqAccess with empty BqProjects and factories that raise BqAccessError(not_configured) on actual use. Construction succeeds, FastAPI's dep-injection cleanly yields the sentinel, the endpoint runs. The local-source code path in build_sample / build_schema / etc. never calls bq.client() or bq.duckdb_session() (it reads parquet directly), so non-BQ tables return 200 as before. Only when an endpoint actually tries to query BQ (source_type == 'bigquery') does the sentinel raise — and the endpoint's existing except BqAccessError catches it normally, returning structured 502 with hint. Test get_bq_access::test_raises_not_configured_when_neither_set renamed and rewritten to test_returns_sentinel_when_neither_set: asserts BqAccess is returned, then asserts client() and duckdb_session() each raise BqAccessError(not_configured) on call. Test test_does_not_cache_exceptions removed (no longer applicable) and replaced with test_sentinel_is_cached_per_process documenting the operator-restart-on-config-change contract. * docs(spec+plan): #134 genericize customer-specific tokens (CLAUDE.md OSS rule) Devin BUG_0001/0002 round 3 on PR #138: spec and plan docs contained customer-specific deployment hostnames, deployment names, and a GCP project ID that violated CLAUDE.md's vendor-agnostic OSS rule ('Nothing customer-specific belongs in code, configuration defaults, comments, docs, commit messages, PR titles, or PR bodies'). Replacements: agnes-development.groupondev.com -> <your-agnes-host> agnes-development -> <your-dev-instance> prj-grp-dataview-prod-1ff9 -> <your-data-project> s1_session_landings -> <bq_table_id> E2E verification semantics unchanged — operators still run the same four curls + config flip + retry, just substituting their own host / deployment name / project / table. * fix(bq_access): hook get_bq_access.cache_clear into instance_config.reset_cache Devin ANALYSIS_0004 on PR #138: get_bq_access is @functools.cache'd at process level, so it captures BigQuery project IDs at first call and ignores subsequent instance.yaml changes. Pre-Phase-2 the v2 endpoints re-read get_value() on every request, so admin /api/admin/server-config saves (which call instance_config.reset_cache()) hot-reloaded the BQ project. Without this fix, my refactor silently regresses that contract — operators editing instance.yaml via the admin UI would see no effect on v2 endpoints until container restart. instance_config.reset_cache() now also calls connectors.bigquery.access.get_bq_access.cache_clear() (lazy import, swallowed if connectors module isn't loaded — keeps instance_config usable in isolated unit tests). Adds test_instance_config_reset_cache_invalidates_get_bq_access as regression guard. Updates CHANGELOG Internal entry to mention the hot-reload contract + the not-configured sentinel behavior (round-3 fix from Devin BUG_0001 was previously only in commit message). * fix(bq_access): surface not_configured before identifier validation + plan path genericize Devin BUG_0001 + BUG_0002 round 5 on PR #138. BUG_0001 (plan doc): personal filesystem path violated CLAUDE.md vendor-agnostic rule. Replaced with '<worktree-root>' placeholder. BUG_0002 (sentinel error path): when get_bq_access() returns the sentinel BqAccess (BQ not configured), the empty bq.projects.data was reaching validate_quoted_identifier first and raising ValueError -> endpoint mapped to HTTP 400 'unsafe_identifier' instead of structured 500 'not_configured' with hint. Each fetch helper now checks 'if not bq.projects.data: bq.client()' as the first step, which triggers the sentinel's BqAccessError(not_configured). Endpoint catches the typed error and returns HTTP 500 with hint pointing at data_source.bigquery.project. Best-effort _fetch_bq_table_options returns {} silently in this case (preserves the swallow-all contract). * fix(bq_access): classify DuckDB-native exceptions from bigquery_query() via string match Devin ANALYSIS on PR #138 review (latest round). The DuckDB bigquery extension is a C++ plugin making its own HTTP calls — when BQ returns 403, it throws duckdb.IOException with the BQ error embedded as text, not gax.Forbidden. translate_bq_error's isinstance checks would miss these, falling to case 7 → bare 500 in production for v2_scan, v2_sample, and v2_schema (the bigquery_query() paths). Fix: last-resort string-match heuristic before the re-raise. 'Forbidden' / '403' / 'Bad Request' / '400' in the lowercased message classifies via the same kind hierarchy. The 'serviceusage' substring still distinguishes cross_project_forbidden from bq_forbidden. Specific enough that random exceptions without HTTP-error keywords still re-raise. Adds 4 unit tests covering the new heuristic + the 'don't swallow random exceptions' invariant. * chore(release): cut 0.22.0 PR #138 contains issue #134 user-visible behavior changes: - BREAKING: BIGQUERY_PROJECT env var now overrides instance.yaml data_source.bigquery.project for v2 endpoints (previously RemoteQueryEngine billing only). - Fixed: structured 502/400 on /api/v2/sample, /scan, /scan/estimate, /schema when BigQuery raises Forbidden/BadRequest (was bare 500). - Internal: BqAccess facade refactor unifying four duplicate BQ-access call sites; instance_config.reset_cache() now invalidates BqAccess cache too so admin server-config saves hot-reload BQ project IDs. Bumps to 0.22.0 because PR #137 merged first and took 0.21.0.
39 KiB
Issue #134 — Unify BigQuery access behind BqAccess, fix v2_sample + 502 contract
Date: 2026-04-29 (revision 4 — incorporates three rounds of code review)
Issue: keboola/agnes-the-ai-analyst#134
Branch: fix/134-bq-access-unify
Problem
Issue #134 reports that v2 BigQuery endpoints on <your-dev-instance> (0.18.0) return HTTP 500 with no body:
POST /api/v2/scan/estimatePOST /api/v2/scanGET /api/v2/sample/{table_id}
POST /api/query/hybrid against the same tables works correctly.
Root cause analysis
Two distinct bugs cause the symptom:
Bug A — v2_scan.py has correct project resolution but no error translation
Commit 33a9964 added the billing_project parameter to app/api/v2_scan.py. Today both endpoints (scan_endpoint line 385, scan_estimate_endpoint line 221) read data_source.bigquery.billing_project from instance.yaml, fall back to project, and pass it to the BQ client constructor.
However, _bq_dry_run_bytes (lines 43-55) and _run_bq_scan (lines 266-282) have no try/except at all. The endpoint-level handlers in scan_endpoint and scan_estimate_endpoint catch WhereValidationError, QuotaExceededError, FileNotFoundError, PermissionError, ValueError — but google.api_core.exceptions.Forbidden and BadRequest propagate as bare HTTP 500 with no body.
This is the headline cause of the v2_scan/estimate 500s. When the SA on <your-dev-instance> lacks serviceusage.services.use on whatever project resolves to billing, BQ raises Forbidden, which propagates uncaught. The config fix landed in 33a9964; the error translation didn't.
Bug B — v2_sample.py is missing the billing_project split entirely
app/api/v2_sample.py:104 reads only data_source.bigquery.project, never billing_project. It then passes that project to bigquery_query() as the billing target. This is the same bug 33a9964 fixed in v2_scan.py, just in a sibling file that didn't get the same patch.
v2_sample.py also has no structured error handling — it catches only FileNotFoundError (404) and PermissionError (403). Anything else (Google API errors, identifier ValueError, ImportError) bubbles up as bare HTTP 500.
Bug C — v2_schema.py has the same shape (one strict block, one best-effort block)
app/api/v2_schema.py contains two separate blocks of the INSTALL/LOAD/SECRET/bigquery_query() dance with different error semantics:
_fetch_bq_schema(lines 48-73): hard-required for the endpoint to return a schema. Today swallows nothing; would benefit from structured translation._fetch_bq_table_options(lines 90-129): best-effort partition/cluster info. Wraps everything intry/except Exception → return {}and alogger.warning. Endpoint returns successfully even if partition info fails.
Schema reportedly works for Pavel today because both queries hit INFORMATION_SCHEMA, which doesn't trip the serviceusage permission check on the billing project. But the same code path is one query change away from the same 500.
Operator-config dimension (out of scope for code fix)
If instance.yaml on <your-dev-instance> does not set data_source.bigquery.billing_project, the fallback or project_id puts the call right back in the broken state. The fix surfaces this to the operator via a structured error body containing a hint pointing at the missing config key.
Five+ duplicate code paths today
The BQ-access pattern is duplicated across:
| File | Function | Shape | Status in this PR |
|---|---|---|---|
app/api/v2_scan.py |
_bq_dry_run_bytes, _run_bq_scan |
bigquery.Client (Python SDK) |
In scope |
app/api/v2_sample.py |
_fetch_bq_sample |
DuckDB bigquery_query() |
In scope |
app/api/v2_schema.py |
_fetch_bq_schema |
DuckDB bigquery_query() |
In scope (strict translation) |
app/api/v2_schema.py |
_fetch_bq_table_options |
DuckDB bigquery_query() |
In scope (preserve swallow-all except Exception → {}) |
src/remote_query.py |
RemoteQueryEngine._get_bq_client |
bigquery.Client (Python SDK) |
In scope |
connectors/bigquery/extractor.py |
sync-time extractor | mixed (ATTACH 'project=...' + bigquery_query()) |
Deferred (see below) |
scripts/duckdb_manager.py |
register_bq_table |
bigquery.Client (Python SDK) |
Deferred (see below) |
Deferred sites — rationale:
extractor.pyruns at sync time, async, behind the scheduler. Errors surface in logs /sync_history, not as HTTP responses. Different lifecycle, different control flow (usesATTACHnotbigquery.Client.query). Migrating it doubles the PR size for benefit not in #134's scope. Track as follow-up issue.register_bq_tableis admin-only, runs once at table registration time (M1 from #108). Its project resolution isbq_project or BIGQUERY_PROJECT env— noinstance.yamlfallback, different semantics from the runtime path. Different concern. Track as follow-up issue.
The stale docstring at src/remote_query.py:204 claims _bq_client_factory defaults to scripts.duckdb_manager._create_bq_client. It doesn't — _get_bq_client constructs _bq_module.Client(project=project) inline at line 450. Will self-correct when _bq_client_factory is removed.
Goals
- Fix the v2_sample billing_project bug so cross-project BQ reads work when the operator sets
billing_project. - Fix the v2_scan/estimate error translation so cross-project Forbidden surfaces as a structured 502 instead of bare 500.
- Translate Google API errors into structured responses with actionable bodies across all three v2 endpoints (the user / CLI gets an error shape they can reason about, not bare 500).
- Eliminate four duplicate BQ-access call sites behind a single facade so the fix lives in one place. (Two deferred — see above.)
- Preserve test invasiveness — existing tests that mock the BQ client must remain straightforward to write, ideally cleaner than today's
_bq_client_factoryinjection point.
Non-goals
- Changing the operator-facing config schema.
data_source.bigquery.billing_projectalready exists; we route everything through it. - Auto-detecting cross-project misconfiguration at startup (rejected for scope; would require a real BQ call at boot).
- Touching the
/api/query/hybridendpoint behavior —RemoteQueryEngineinternals change, but the HTTP contract does not. - Migrating
extractor.pyorregister_bq_tabletoBqAccess(deferred — see above). - Adding per-table multi-project support. Today's
table_registryschema has nosource_projectcolumn; every BQ table usesinstance.yaml'sprojectas the data project. Future multi-project is a separate feature; spec notes the constraint so it can be lifted cleanly later. - Schema migration / data migration of any kind.
- Hot-reload of
instance.yaml.get_bq_access()is@functools.cache'd at process level. Changing config requires a container restart. (Today's behavior is identical —instance.yamlis loaded once at boot.)
Design
Architecture — new module connectors/bigquery/access.py
connectors/bigquery/
├── auth.py (existing — get_metadata_token, unchanged)
├── extractor.py (existing — unchanged in this PR)
└── access.py (NEW)
├── BqProjects (frozen dataclass)
├── BqAccessError (typed exception with HTTP_STATUS class mapping)
├── BqAccess (facade with injectable factories)
├── get_bq_access() -> BqAccess (module-level, @functools.cache; FastAPI Depends target)
├── translate_bq_error(e, projects, *, bad_request_status) -> BqAccessError
├── _default_client_factory (real bigquery.Client construction)
└── _default_duckdb_session_factory (real INSTALL/LOAD/SECRET dance)
BqAccess public API
@dataclass(frozen=True)
class BqProjects:
billing: str # billing/quota target — used as `project=` and `quota_project_id=`
data: str # data project for FROM clauses (today: instance.yaml `project`).
# Note: locked to a single project per instance until table_registry
# grows a per-table source_project column. See "Non-goals".
class BqAccessError(Exception):
HTTP_STATUS = {
"not_configured": 500, # admin/config bug — page on-call
"bq_lib_missing": 500, # deployment bug
"auth_failed": 502, # GCP metadata server unreachable
"cross_project_forbidden": 502, # SA lacks serviceusage.services.use on billing project
"bq_forbidden": 502, # other Forbidden from BQ
"bq_bad_request": 400, # 400 from BQ when caller flagged it as client-derived
"bq_upstream_error": 502, # all other upstream BQ failures (incl. server-derived BadRequest)
}
def __init__(self, kind: str, message: str, details: dict | None = None):
self.kind = kind
self.message = message
self.details = details or {}
super().__init__(message)
class BqAccess:
"""Single entry point for BigQuery access — config resolution, client construction,
DuckDB-extension session, and error translation. Stateless after construction.
Factories are injectable for tests:
bq = BqAccess(
BqProjects(billing="test-billing", data="test-data"),
client_factory=lambda projects: mock_client,
)
"""
def __init__(
self,
projects: BqProjects,
*,
client_factory: Callable[[BqProjects], "bigquery.Client"] | None = None,
duckdb_session_factory: Callable[[BqProjects], "AbstractContextManager"] | None = None,
):
self._projects = projects
self._client_factory = client_factory or _default_client_factory
self._duckdb_session_factory = duckdb_session_factory or _default_duckdb_session_factory
@property
def projects(self) -> BqProjects: ...
def client(self) -> "bigquery.Client":
"""Construct (or retrieve from injected factory) a BigQuery client billed to
projects.billing. Raises BqAccessError(kind='bq_lib_missing') if
google-cloud-bigquery is not installed; raises BqAccessError(kind='auth_failed')
on credential resolution failure."""
@contextmanager
def duckdb_session(self) -> Iterator["duckdb.DuckDBPyConnection"]:
"""Yield in-memory DuckDB conn with bigquery extension loaded + SECRET set
from get_metadata_token(). Auto-cleanup. Translates INSTALL/LOAD/SECRET failures
and metadata-token failures to BqAccessError(kind='auth_failed' or 'bq_lib_missing')."""
@functools.cache
def get_bq_access() -> "BqAccess":
"""Module-level factory used as the FastAPI Depends target.
Resolves projects from BIGQUERY_PROJECT env → instance.yaml billing_project →
instance.yaml project. Returns a BqAccess instance with default factories.
Process-cached: config is loaded at boot and doesn't change at runtime. Hot-reload
of instance.yaml is explicitly out of scope. Note that functools.cache does NOT
cache exceptions, so a failed call (BqAccessError(kind='not_configured')) is retried
on the next invocation — useful when the operator fixes config and restarts a
request without restarting the container.
Tests inject via FastAPI's dependency_overrides[get_bq_access] = lambda: bq, or
construct BqAccess(...) directly for non-endpoint code (e.g. RemoteQueryEngine).
Tests do NOT mutate this cache.
Module-level (not BqAccess.from_config classmethod) to avoid the
@classmethod + @functools.cache stacking footgun and to give FastAPI's
dependency introspection a clean function signature."""
def translate_bq_error(
e: Exception,
projects: BqProjects,
*,
bad_request_status: Literal["client_error", "upstream_error"],
) -> BqAccessError:
"""Convert Google API exceptions into a typed BqAccessError.
Mapping (FIRST match wins):
1. BqAccessError -> pass through unchanged (this is critical:
bq.client() and bq.duckdb_session() can raise
BqAccessError directly for bq_lib_missing /
auth_failed, and those must round-trip through
translate_bq_error without being reclassified
as 'unknown' and re-raised).
2. Forbidden + 'serviceusage' in str(e).lower()
-> cross_project_forbidden (with hint)
3. Forbidden -> bq_forbidden
4. BadRequest, bad_request_status='client_error'
-> bq_bad_request (HTTP 400)
5. BadRequest, bad_request_status='upstream_error'
-> bq_upstream_error (HTTP 502)
6. GoogleAPICallError (other) -> bq_upstream_error
7. Anything else -> RE-RAISED unchanged (don't swallow programmer errors)
`bad_request_status` MUST be supplied by the caller. It distinguishes:
- 'client_error': SQL contains user input (select/where/order_by/limit). BQ rejecting
it is plausibly the user's fault — return 400 with the BQ message.
- 'upstream_error': SQL is fully built server-side from validated identifiers.
BQ rejecting it is server-side corruption — return 502."""
Project resolution rules (single source of truth)
get_bq_access resolves projects in this order (matching today's RemoteQueryEngine._get_bq_client behavior):
BIGQUERY_PROJECTenv var → if set, used as both billing and data (legacy override).data_source.bigquery.billing_projectfrominstance.yaml→ billing.data_source.bigquery.projectfrominstance.yaml→ data, and billing if (2) is unset.
If neither (1) nor (3) yields a value: BqAccessError(kind='not_configured', details={"hint": "set data_source.bigquery.project in instance.yaml"}).
BREAKING for env-only deployments. Today
RemoteQueryEngine._get_bq_clientusesBIGQUERY_PROJECTonly as the billing project; data project for FROM clauses comes from elsewhere (e.g. SQL strings the user wrote). After this change,BIGQUERY_PROJECTsets both billing and data, overridingdata_source.bigquery.projectfor FROM-clause construction inv2_scan/v2_sample/v2_schema. Deployments that combine env-var-for-billing + yaml-for-data must migrate by settingdata_source.bigquery.billing_projectininstance.yamland clearing the env var. Flagged in CHANGELOG.
Cross-project Forbidden detection
'serviceusage' in str(e).lower() is the only reliable signal of the cross-project quota issue. Other cross-project failures (revoked SA, table-level ACL, dataset-level deny) degrade to generic bq_forbidden — still 502, still structured body, just less specific in the hint. billing != data is the normal cross-project setup, not a signal of failure; using it to classify would over-trigger and misdirect operators.
billing != data MAY enrich details.hint ("Note: billing and data projects differ — verify SA has both BQ Read on data project AND serviceusage.services.use on billing project") but does not alter kind.
Status code mapping rationale
- 400 for
bq_bad_request— BQ rejecting a SQL string built from user input is plausibly the user's fault. Returns it back to the user as a 4xx with the BQ message. - 502 for
bq_forbidden,cross_project_forbidden,bq_upstream_error,auth_failed— upstream BQ refused or was unreachable. Operationally distinguishable from 500 in dashboards: "integration with BQ broken" vs "Agnes itself broken". - 500 for
not_configured,bq_lib_missing— deployment/admin-config bugs that should page on-call, not transient upstream errors.
Migration of four call sites
A. app/api/v2_scan.py
_bq_dry_run_bytes and _run_bq_scan change signature from (project: str, sql: str) to (bq: BqAccess, sql: str). Body (note: bq.client() is called outside the try/except so a BqAccessError from it still propagates correctly through the endpoint's except BqAccessError clause, even before the translator's pass-through would catch it):
def _bq_dry_run_bytes(bq: BqAccess, sql: str) -> int:
from google.cloud import bigquery
client = bq.client() # may raise BqAccessError(bq_lib_missing/auth_failed); propagates as-is
try:
job = client.query(
sql, job_config=bigquery.QueryJobConfig(dry_run=True, use_query_cache=False),
)
return int(job.total_bytes_processed or 0)
except Exception as e:
raise translate_bq_error(e, bq.projects, bad_request_status="client_error")
_run_bq_scan mirrors the same shape with bad_request_status="client_error" (SQL contains user's select/where/order_by).
scan_endpoint, scan_estimate_endpoint, estimate, and run_scan lose the project_id and billing_project parameters in favor of bq: BqAccess. Endpoints inject via FastAPI:
@router.post("/scan")
async def scan_endpoint(
raw: dict,
user: dict = Depends(get_current_user),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
bq: BqAccess = Depends(get_bq_access),
):
try:
ipc = run_scan(conn, user, raw, bq=bq, quota=...)
return Response(...)
except WhereValidationError as e:
raise HTTPException(status_code=400, detail=...)
except QuotaExceededError as e:
raise HTTPException(status_code=429, detail=...)
except BqAccessError as e:
raise HTTPException(
status_code=BqAccessError.HTTP_STATUS.get(e.kind, 500),
detail={"error": e.kind, "message": e.message, "details": e.details},
)
_build_bq_sql(table_row, project_id, req) keeps project_id as a parameter (it's the data project for the FROM clause). Call sites pass bq.projects.data. Forward-compat note in code comments: when table_registry grows a per-table source_project column, callers should prefer table_row.get('source_project') or bq.projects.data. tests/test_v2_scan.py:206-207 imports _build_bq_sql directly — the signature is unchanged, no test update needed.
B. app/api/v2_sample.py
_fetch_bq_sample changes signature to (bq: BqAccess, dataset: str, table: str, n: int):
def _fetch_bq_sample(bq: BqAccess, dataset: str, table: str, n: int) -> list[dict]:
from src.identifier_validation import validate_quoted_identifier
if not (validate_quoted_identifier(bq.projects.data, "BQ project")
and validate_quoted_identifier(dataset, "BQ dataset")
and validate_quoted_identifier(table, "BQ source_table")):
raise ValueError("unsafe BQ identifier in registry — refusing to query")
bq_sql = f"SELECT * FROM `{bq.projects.data}.{dataset}.{table}` LIMIT {int(n)}"
with bq.duckdb_session() as conn:
try:
df = conn.execute(
"SELECT * FROM bigquery_query(?, ?)",
[bq.projects.billing, bq_sql],
).fetchdf()
return df.to_dict(orient="records")
except Exception as e:
raise translate_bq_error(e, bq.projects, bad_request_status="upstream_error")
build_sample and sample endpoint signatures lose project_id for bq: BqAccess = Depends(get_bq_access). Endpoint catch chain adds BqAccessError and ValueError → 400 (kind='unsafe_identifier').
C. app/api/v2_schema.py — two blocks, two semantics
The two blocks have different error contracts — preserve them.
_fetch_bq_schema (lines 48-73) — strict; the endpoint can't return without it. Migrate to bq.duckdb_session() + translate_bq_error(..., bad_request_status="upstream_error"). Endpoint wraps in try/except BqAccessError and returns the structured 502.
_fetch_bq_table_options (lines 90-129) — best-effort; current code has except Exception as e: logger.warning(...); return {}. Preserve this contract. Migrate to use bq.duckdb_session() for the connection but keep the outer try/except Exception → return {}. Do NOT translate via translate_bq_error; the /schema endpoint must continue to return successfully with empty partition info when BQ is unreachable, not 502.
def _fetch_bq_table_options(bq: BqAccess, dataset: str, table: str) -> dict:
from src.identifier_validation import validate_quoted_identifier
if not (validate_quoted_identifier(bq.projects.data, "BQ project")
and validate_quoted_identifier(dataset, "BQ dataset")
and validate_quoted_identifier(table, "BQ source_table")):
return {}
try:
with bq.duckdb_session() as conn:
bq_sql = (
f"SELECT column_name, is_partitioning_column, clustering_ordinal_position "
f"FROM `{bq.projects.data}.{dataset}.INFORMATION_SCHEMA.COLUMNS` "
f"WHERE table_name = ? "
f"ORDER BY clustering_ordinal_position NULLS LAST"
)
rows = conn.execute(
"SELECT * FROM bigquery_query(?, ?, ?)",
[bq.projects.billing, bq_sql, table],
).fetchall()
if not rows:
return {}
partition_by = next(
(r[0] for r in rows if (r[1] or "").upper() == "YES"),
None,
)
clustered_by = [r[0] for r in rows if r[2] is not None]
return {"partition_by": partition_by, "clustered_by": clustered_by}
except Exception as e:
logger.warning("BQ table options fetch failed for %s.%s.%s: %s",
bq.projects.data, dataset, table, e)
return {}
Endpoint signature gains bq: BqAccess = Depends(get_bq_access).
D. src/remote_query.py — lazy BqAccess construction
RemoteQueryEngine.__init__ signature changes:
def __init__(
self,
..., # existing args
bq_access: BqAccess | None = None,
):
...
self._bq = bq_access # may stay None — resolved lazily
Lazy resolution is critical. Many existing tests in tests/test_remote_query.py (lines 106, 148, 196, 417, 520, 529, 538) construct RemoteQueryEngine(analytics_conn) for DuckDB-only paths that never touch BQ. After this change, eager get_bq_access() at construction would fail those tests with not_configured in any environment without instance.yaml configured for BQ. Resolve only when actually needed:
def _get_bq_client(self):
if self._bq is None:
from connectors.bigquery.access import get_bq_access
self._bq = get_bq_access() # may raise BqAccessError; that's fine
return self._bq.client()
_bq_client_factory parameter, the docstring at line 204 (which referenced the stale scripts.duckdb_manager._create_bq_client default), and lines 407-450 of _get_bq_client all delete. The fallback chain logic moves to get_bq_access.
External caller note: cli/commands/query.py:120 constructs RemoteQueryEngine(conn, **engine_kwargs). The new bq_access kwarg has default None, so the CLI continues to work via the lazy get_bq_access() path. No CLI change needed.
Test rewrite
The existing _bq_client_factory injection point in RemoteQueryEngine and tests like tests/test_remote_query.py currently look like:
engine = RemoteQueryEngine(_bq_client_factory=lambda project: mock_client)
Migrates to direct BqAccess injection — no monkey-patching, no classmethod gymnastics:
def test_remote_query_x():
bq = BqAccess(
BqProjects(billing="test-billing", data="test-data"),
client_factory=lambda projects: mock_client,
)
engine = RemoteQueryEngine(..., bq_access=bq)
engine.execute(...)
DuckDB-only RemoteQueryEngine tests (the ~7 sites listed above) need NO change — bq_access=None defaults preserve today's behavior; get_bq_access() is never called.
For FastAPI endpoint tests (tests/test_v2_*.py), use FastAPI's dependency_overrides:
def test_v2_scan_x(client):
bq = BqAccess(
BqProjects(billing="test-billing", data="test-data"),
client_factory=lambda projects: mock_client,
)
app.dependency_overrides[get_bq_access] = lambda: bq
try:
response = client.post("/api/v2/scan", json={...})
assert response.status_code == 200
finally:
app.dependency_overrides.clear()
Optional shared fixture in tests/conftest.py:
@pytest.fixture
def bq_access():
"""Build a BqAccess with pluggable factories. Returns a callable that overrides
the FastAPI Depends and yields the BqAccess instance."""
def _build(*, client=None, duckdb_conn=None,
billing="test-billing", data="test-data"):
bq = BqAccess(
BqProjects(billing=billing, data=data),
client_factory=(lambda projects: client) if client else None,
duckdb_session_factory=(
lambda projects: contextlib.nullcontext(duckdb_conn)
) if duckdb_conn else None,
)
app.dependency_overrides[get_bq_access] = lambda: bq
return bq
yield _build
app.dependency_overrides.clear()
Fixture caveat for nested sessions.
contextlib.nullcontext(duckdb_conn)does NOT close the conn on__exit__. The production path closes it via_default_duckdb_session_factory. Tests that exercise multiple sequentialbq.duckdb_session()calls within a single test function will see the same conn object both times — fine for assertion purposes, but won't catch close-and-reopen regressions. The unit testtest_duckdb_session_closes_on_exitcovers the close behavior on the production factory; the fixture is for endpoint integration tests that don't care.
Tests
Unit tests — tests/test_bq_access.py (new)
| Test | Asserts |
|---|---|
test_resolve_env_var_wins |
BIGQUERY_PROJECT=foo overrides instance.yaml; both billing+data = foo |
test_resolve_billing_falls_back_to_project |
unset billing_project → both billing and data = project |
test_resolve_billing_distinct_from_project |
both set → billing and data differ |
test_resolve_raises_when_neither_set |
BqAccessError(kind='not_configured') with hint |
test_resolve_succeeds_after_config_set |
call once → raises; set config; call again → succeeds (functools.cache doesn't cache exceptions) |
test_get_bq_access_is_cached |
two successful calls return the same instance |
test_translate_forbidden_serviceusage |
gax.Forbidden('serviceusage.services.use') → kind='cross_project_forbidden' + hint |
test_translate_forbidden_no_serviceusage_diff_projects |
gax.Forbidden('table-level perm denied') + billing≠data → kind='bq_forbidden' (NOT cross_project) |
test_translate_forbidden_same_project |
gax.Forbidden + billing==data → kind='bq_forbidden' |
test_translate_bad_request_client_error |
gax.BadRequest, bad_request_status='client_error' → kind='bq_bad_request', status 400 |
test_translate_bad_request_upstream_error |
gax.BadRequest, bad_request_status='upstream_error' → kind='bq_upstream_error', status 502 |
test_translate_passes_through_BqAccessError |
BqAccessError('bq_lib_missing', ...) in → identical out (CRITICAL — guards against bq.client() raising it inside try/except) |
test_translate_unknown_reraises |
RuntimeError("oops") is re-raised, NOT silently wrapped |
test_client_uses_billing_as_quota_project |
_default_client_factory constructs with quota_project_id=projects.billing |
test_default_client_factory_raises_bq_lib_missing_on_importerror |
mock missing google-cloud-bigquery → BqAccessError(bq_lib_missing) |
test_default_duckdb_session_raises_auth_failed_on_metadata_error |
mock get_metadata_token raising BQMetadataAuthError → BqAccessError(auth_failed) |
test_duckdb_session_closes_on_exit |
mock token + duckdb conn, assert conn.close() called |
test_duckdb_session_closes_on_exception |
exception inside with block still triggers conn.close() |
test_injected_client_factory_overrides_default |
BqAccess(..., client_factory=...) skips _default_client_factory |
Integration tests — extending tests/test_v2_*.py
| Test | Asserts |
|---|---|
test_v2_scan_returns_502_on_bq_forbidden_serviceusage |
mock client raises gax.Forbidden('...serviceusage...'); response 502 + body error=cross_project_forbidden + hint mentions billing_project |
test_v2_scan_returns_400_on_bq_bad_request |
mock client raises gax.BadRequest('invalid syntax'); response 400 + body error=bq_bad_request |
test_v2_scan_estimate_returns_502_on_bq_forbidden |
same pattern for /scan/estimate |
test_v2_scan_returns_500_on_bq_lib_missing |
mock client raises BqAccessError(bq_lib_missing); response 500 + structured body |
test_v2_sample_returns_502_on_bq_forbidden |
mock duckdb_session raises via bigquery_query; response 502 + structured body |
test_v2_sample_returns_400_on_unsafe_identifier |
registry row with backtick in source_table → 400 + body error=unsafe_identifier |
test_v2_sample_returns_404_on_unknown_table |
unchanged behavior (regression guard) |
test_v2_sample_returns_403_on_unauthorized |
unchanged behavior (regression guard) |
test_v2_schema_returns_502_on_bq_forbidden |
strict block (_fetch_bq_schema) failure → 502 |
test_v2_schema_returns_200_with_empty_partition_on_bq_failure |
best-effort block (_fetch_bq_table_options) failure → 200, schema returned, partition_by/clustered_by absent. Regression guard for the swallow-all preservation. |
test_v2_schema_returns_200_on_success |
regression guard (the existing happy path) |
E2E manual verification (post-deploy on <your-dev-instance>)
For each of /sample, /scan/estimate, /scan, /schema:
PAT=...
# Pre-deploy reference (current production behavior — bare 500 with no body for /sample/scan*)
curl -k -i -H "Authorization: Bearer $PAT" \
https://<your-agnes-host>/api/v2/sample/<bq_table_id>?n=2
curl -k -i -X POST -H "Authorization: Bearer $PAT" -H "Content-Type: application/json" \
https://<your-agnes-host>/api/v2/scan/estimate \
-d '{"table_id":"<bq_table_id>","select":["event_date"],"where":"event_date = DATE \"2026-04-21\""}'
curl -k -i -X POST -H "Authorization: Bearer $PAT" -H "Content-Type: application/json" \
https://<your-agnes-host>/api/v2/scan \
-d '{"table_id":"<bq_table_id>","select":["event_date"],"where":"event_date = DATE \"2026-04-21\"","limit":50}'
curl -k -i -H "Authorization: Bearer $PAT" \
https://<your-agnes-host>/api/v2/schema/<bq_table_id>
# Post-deploy, BEFORE fixing instance.yaml — expect:
# /sample, /scan, /scan/estimate: 502 + structured JSON body with
# error=cross_project_forbidden and a hint.
# /schema: 200 (because INFORMATION_SCHEMA queries don't fail today; if
# something else fails on the strict block, expect 502).
# Operator action: set data_source.bigquery.billing_project in instance.yaml,
# restart the container.
# Post-config-fix — expect 200 on all four endpoints.
This four-endpoint × three-state matrix is the success criterion for closing #134. Without it, "fixed" is unverifiable.
Implementation strategy — staged commits
Per first-round review: stage as two commits so the user-visible bug fix is independently reviewable / revertable from the refactor. Per second-round review: both commits emit the same structured response shape so client-side parsers (CLI, UI) don't see contract churn between commits.
Commit 1 — Minimal bug fix (revertable, atomic across all three v2 endpoints):
app/api/v2_sample.py:- Read
billing_projectwith same fallback asv2_scan.py:385; pass tobigquery_query(). - Wrap
_fetch_bq_sampleintry/except google.api_core.exceptions.*translating toHTTPExceptionwith the structured body shape (see below).
- Read
app/api/v2_scan.py: wrap_bq_dry_run_bytesand_run_bq_scanin the sametry/exceptshape.app/api/v2_schema.py: wrap_fetch_bq_schema(strict block) in the sametry/except. Do NOT touch_fetch_bq_table_options— itsexcept Exception → return {}swallow-all is preserved unchanged in commit 1, then migrated to usebq.duckdb_session()in commit 2 (still preserved).
All three endpoints emit the same structured body shape that commit 2 will produce, so client-side parsers (CLI, UI) see one consistent contract throughout the rollout:
except gax.Forbidden as e:
kind = "cross_project_forbidden" if "serviceusage" in str(e).lower() else "bq_forbidden"
raise HTTPException(
status_code=502,
detail={"error": kind, "message": str(e), "details": {...}},
)
except gax.BadRequest as e:
# v2_scan: bad_request_status='client_error' → kind='bq_bad_request', 400
# v2_sample, v2_schema: bad_request_status='upstream_error' → kind='bq_upstream_error', 502
raise HTTPException(
status_code=400 if <user_derived> else 502,
detail={"error": "bq_bad_request" if <user_derived> else "bq_upstream_error",
"message": str(e), "details": {}},
)
- Tests: regression tests for the structured body shape on all three endpoints.
This commit alone closes the user-visible part of #134 atomically — no half-rolled-out window where /sample returns bare 500 while /scan returns structured 502. If commit 2 needs another review round, commit 1 still ships and the response contract is forward-compatible.
Commit 2 — BqAccess extraction + migration:
- Create
connectors/bigquery/access.pywith the design above. - Migrate
v2_scan,v2_sample,v2_schema(both blocks, with separate semantics),RemoteQueryEngine(lazybq_access) toBqAccess. - Remove
_bq_client_factoryfromRemoteQueryEngine.__init__and the stale docstring atsrc/remote_query.py:204. - Migrate tests to the new
BqAccess(client_factory=...)+dependency_overridespattern. - Delete inline
try/except gax.*blocks added in commit 1; route throughtranslate_bq_errorinstead.
Risks
- Test rewrite breaks something subtle.
tests/test_remote_query.pyand possiblytests/test_duckdb_manager.pyhave many_bq_client_factorycall sites. The new fixture pattern must cover every shape they exercise. Mitigation: convert tests one-by-one in commit 2, run pytest after each, before deleting the old injection point. DuckDB-only tests that don't touch BQ are protected by lazybq_accessresolution inRemoteQueryEngine. - Cross-project Forbidden detection heuristic is narrow but principled. Relies on Google's error message containing
'serviceusage'(case-insensitive). False positives are unlikely (the substring is specific). False negatives are possible — those degrade tobq_forbiddenwith a generic message, still a 502 with structured body. Acceptable. get_bq_access()is@functools.cache'd. Cheap and process-lifetime-safe (config is loaded at boot and immutable). Tests usedependency_overridesand directBqAccess(...)construction, never the cached path — no cache invalidation needed in tests. Hot-reload ofinstance.yamlis explicitly out of scope.bq_bad_request → 400could leak BQ error messages. BQ'sBadRequesttext typically describes the SQL problem. We surface it indetails.message. Operators who don't want this can filter at a reverse-proxy layer; this matches behavior of any 4xx-with-detail elsewhere in the app.BIGQUERY_PROJECTenv-var precedence is BREAKING for env-only deployments. Deployments that combine env-var-for-billing + yaml-for-data must migrate. See the project-resolution rules section. Flag in CHANGELOG and release notes.- Two duplicate sites left behind (
extractor.py,register_bq_table). Explicit follow-up issue should be filed at PR-merge time. translate_bq_errorpass-through ordering is load-bearing.bq.client()andbq.duckdb_session()raiseBqAccessErrordirectly forbq_lib_missing/auth_failed. The translator's first clause MUST beif isinstance(e, BqAccessError): return e. Otherwise those typed errors fall through to "unknown" and get re-raised as bare 500. Unit-tested viatest_translate_passes_through_BqAccessError.- In-memory DB reload cost on each
duckdb_session()(pre-existing). Every BQ-via-DuckDB call runsINSTALL bigqueryfresh. If the extension binary isn't already cached on disk, this is an HTTPS download. Today's code has the same cost; this PR doesn't fix it. Future optimization: long-lived in-memory conn with extension pre-loaded, behind a thread-safe pool.
CHANGELOG entry (for the implementation PR)
Under ## [Unreleased]:
### Fixed
- v2
/sampleendpoint: BigQuery cross-project queries now respectdata_source.bigquery.billing_projectfrominstance.yaml(mirrors v2/scanfix from33a9964). Closes #134 for/sample. - v2
/scan,/scan/estimate: BigQuery upstream errors no longer return bare HTTP 500 with empty body.Forbiddenfrom BQ now returns HTTP 502 with structured JSON body ({"error": "cross_project_forbidden", "message": "...", "details": {"hint": "..."}});BadRequeston user-derived SQL returns HTTP 400 withkind=bq_bad_request. Closes #134 for/scan*. - v2
/schema: same error translation applied to the strict path (_fetch_bq_schema); the best-effort partition-info path (_fetch_bq_table_options) preserves its swallow-all-and-return-empty behavior, so/schemastill returns 200 with empty partition info if BQ partition-info queries fail.
### Changed
- BREAKING for deployments using
BIGQUERY_PROJECTenv var alongsidedata_source.bigquery.projectininstance.yaml. The env var now sets BOTH billing and data project, overridingdata_source.bigquery.projectfor FROM-clause construction inv2_scan/v2_sample/v2_schema. Migrate by clearingBIGQUERY_PROJECTand usingdata_source.bigquery.billing_project+data_source.bigquery.projectininstance.yaml. (PreviouslyBIGQUERY_PROJECTonly affectedRemoteQueryEnginebilling.)
### Internal
- New shared module
connectors/bigquery/access.py—BqAccessfacade unifies BQ project resolution, client construction, DuckDB-extension session, and Google-API error translation acrossv2_scan,v2_sample,v2_schema, andRemoteQueryEngine. - Internal API change:
RemoteQueryEngine.__init__no longer accepts_bq_client_factory. Callers that injected it migrate toRemoteQueryEngine(..., bq_access=BqAccess(projects, client_factory=...)). The CLI (cli/commands/query.py) is unaffected — it never injected the factory and the newbq_accesskwarg defaults toNone(lazyget_bq_access()on first BQ call). - Stale docstring at
src/remote_query.py:204(referencingscripts.duckdb_manager._create_bq_clientas the default factory) removed. - Two known-duplicate BQ-access sites (
connectors/bigquery/extractor.py,scripts/duckdb_manager.register_bq_table) explicitly out of scope; tracked as follow-up.