* feat(bq): decouple table_registry bucket from BQ dataset name (#343)

  Adds optional `bq_fqn` column (schema v51) carrying the fully-qualified BigQuery path (project.dataset.table) so the rebuild path no longer has to reconstruct it from the dual-purpose `bucket` field (which is also a UX/RBAC label).

  - Schema v51 migration + _SYSTEM_SCHEMA carry the nullable column; rows without it keep using the legacy bucket+source_table+remote_attach.project path (backwards compat).
  - BQ extractor honors bq_fqn per row when present: dataset/table override on same-project rows; cross-project VIEW path works via bigquery_query(billing, ...); cross-project BASE TABLE skipped with a clear warning (multi-ATTACH per project deferred to follow-up).
  - Orchestrator pre-pass detects drift between extract.duckdb _remote_attach.url and overlay data_source.bigquery.project, calls rebuild_from_registry to regenerate when they differ. Closes the operational hazard where /admin/server-config edits silently left the on-disk extract pointing at the old project until the next manual sync.
  - Startup config check warns when project ≠ billing_project without location set (the on-disk symptom is "provider returned no data" silently in metadata cache), and when a warehouse-like data project has no billing_project override (silent 403 serviceusage path).
  - _resolve_bq_location warning now points at the location config key explicitly so operators see the actionable fix in the log.
  - POST /api/admin/register-table and PUT /api/admin/registry/{id} accept bq_fqn; malformed values rejected at the API boundary (422).
  - 25 tests covering parse_bq_fqn matrix, extractor override paths (same-project + cross-project VIEW + cross-project BASE TABLE skip), orchestrator drift sync, startup-validator heuristic, admin models.

  UI surface for bq_fqn input in /admin/tables intentionally omitted from this PR (3.5k-line template change); admins can register through the REST API or `agnes admin` CLI in the meantime. Multi-project ATTACH support is the same scope deferral as the cross-project BASE TABLE skip; both ride a follow-up PR.

* review fixes: abstract CHANGELOG, merge duplicate Changed, bump docs schema version

  - CHANGELOG.md: remove customer-specific hostname + incident date range from the orchestrator drift-sync entry (vendor-agnostic OSS rule), fold the entry into the existing [Unreleased] ### Changed section instead of opening a duplicate heading.
  - docs/architecture.md: bump 'Current schema version' from 19 to 51 to match SCHEMA_VERSION (per agnes-orchestrator skill rule #4).

* review fixes: vendor-agnostic test fixture + Schema v51 internal bullet

  - tests/test_bq_fqn.py: replace customer GCP project ID with generic 'my-warehouse-project' placeholder (vendor-agnostic OSS rule). Test asserts on the warehouse-like heuristic, not the literal project name, so the rename is behavior-neutral.
  - CHANGELOG.md: add explicit '**Schema v51**' bullet under `### Internal` naming the new version + summarizing the additive nullable column (matches the convention from v47/v48 bullets).

* fix(bq): cross-project _detect_table_type bills against extractor project

  Addresses Devin review on #346: before the fix, _detect_table_type passed the data project as BOTH the FROM-clause target AND the bigquery_query() first arg (billing project). For cross-project bq_fqn rows where fqn_project != project_id, the data SA holds bigquery.dataViewer on fqn_project but the serviceusage.services.use permission only on project_id, so the call 403'd. init_extract's broad except Exception swallowed the error and silently skipped the row, meaning the cross-project VIEW path at extractor.py:~696 (the PR's primary cross-project use case) never executed.

  - Add optional billing_project kwarg to _detect_table_type; defaults to project for backwards compat (same-project callers unaffected).
  - Update the init_extract call site to pass billing_project=project_id explicitly. Same-project rows (fqn_project == project_id) are a no-op; cross-project rows now route billing to the project where the SA actually has services.use.
  - 2 new tests in TestDetectTableTypeBilling cover (a) explicit billing_project routing to bigquery_query 1st arg + data project staying in FROM, and (b) the backwards-compat default. Plus test_cross_project_detect_call_bills_against_extractor_project pins the call-site wiring: it captures the (project, billing_project) pair the extractor passes for a cross-project bq_fqn row.

* release: 0.54.29 - bq_fqn decoupling + marketplace refactor + setup-script UX

  Accumulated [Unreleased] content from #342 (flea marketplace refactor), #344 (setup script step-2 cwd check), and #346 (this PR: bq_fqn column + orchestrator drift sync + startup config check). Schema v51.
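The parsing and fallback rules described in the first commit can be sketched roughly as follows. This is an illustrative stand-in only, not the project's code: the names `parse_bq_fqn_sketch` and `resolve_bq_path_sketch` and the project-id regex are assumptions, chosen to be consistent with the behavior the tests in this file assert (None/empty means "no override", three non-empty segments required, unsafe project ids rejected).

```python
import re
from typing import Optional, Tuple

# Assumed project-id grammar: 6-30 chars, lowercase letters, digits, dashes.
# The real validator may be stricter or looser than this.
_PROJECT_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


def parse_bq_fqn_sketch(value: Optional[str]) -> Optional[Tuple[str, str, str]]:
    """Hypothetical stand-in for parse_bq_fqn (project.dataset.table)."""
    if not value:  # None and "" both mean "no override"
        return None
    parts = value.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"malformed bq_fqn: {value!r}")
    project, dataset, table = parts
    if not _PROJECT_RE.match(project):
        raise ValueError(
            f"project {project!r} does not match the project-id grammar"
        )
    return project, dataset, table


def resolve_bq_path_sketch(row: dict, attach_project: str) -> Tuple[str, str, str]:
    """Prefer bq_fqn when present, else the legacy bucket + source_table path."""
    parsed = parse_bq_fqn_sketch(row.get("bq_fqn"))
    if parsed is not None:
        return parsed
    # Legacy rows: the bucket doubles as the dataset name.
    return attach_project, row["bucket"], row["source_table"]
```

A row without `bq_fqn` resolves against the ATTACH project exactly as before, which is the backwards-compat property the commit message calls out; a row with `bq_fqn` ignores `bucket` and `source_table` for path purposes.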
533 lines
22 KiB
Python
"""Tests for the v51 ``bq_fqn`` decoupling work (issue #343).

Covers:

- ``parse_bq_fqn`` unit cases (valid / empty / malformed shapes).
- Extractor honors ``bq_fqn`` in registry rows: dataset/table override
  for same-project rows; cross-project VIEW path works; cross-project
  BASE TABLE skipped with warning; malformed rejected per-row.
- Orchestrator drift sync: ``_remote_attach.url`` mismatch with overlay
  triggers ``rebuild_from_registry``.
- ``validate_bigquery_startup_config`` warning matrix.
- ``RegisterTableRequest`` accepts ``bq_fqn`` field; register handler
  rejects malformed / non-BQ-source bq_fqn at the API boundary.
"""

import re
from pathlib import Path
from unittest.mock import MagicMock, patch

import duckdb
import pytest

from connectors.bigquery.extractor import parse_bq_fqn


class _CapturingProxy:
    """Lightweight DuckDB proxy: intercepts BigQuery extension SQL and
    records every CREATE VIEW we would have emitted against the real
    BQ extension. The extension itself isn't loaded (offline tests),
    so view SQL referencing ``bq.*`` or ``bigquery_query(...)`` would
    fail at create-time; the proxy substitutes a no-op CREATE TABLE
    placeholder so downstream INSERT / verification still works.

    Captured SQL is exposed as ``proxy.create_view_sqls`` for tests
    that need to assert on the path the extractor constructed."""

    def __init__(self, real_conn):
        self._real = real_conn
        self.create_view_sqls: list[str] = []

    def execute(self, sql, *args, **kwargs):
        upper = sql.strip().upper()
        if upper.startswith("INSTALL BIGQUERY") or upper.startswith("LOAD BIGQUERY"):
            return MagicMock()
        if upper.startswith("CREATE SECRET") or upper.startswith("CREATE OR REPLACE SECRET"):
            return MagicMock()
        if "ATTACH" in upper and "BIGQUERY" in upper:
            return MagicMock()
        if upper.startswith("DETACH BQ"):
            return MagicMock()
        if upper.startswith("SET BQ_") or upper.startswith("SELECT CURRENT_SETTING"):
            return MagicMock()
        # View bodies that reference the BQ extension (`bq."ds"."t"` for
        # BASE TABLE or `bigquery_query(...)` for VIEW) would error
        # without a live extension. Capture the SQL for the test, then
        # substitute a placeholder TABLE so subsequent INSERT INTO _meta
        # paths keep working.
        if ("FROM BQ." in upper or "BIGQUERY_QUERY(" in upper) and "CREATE" in upper:
            self.create_view_sqls.append(sql)
            m = re.search(r'VIEW\s+"?(\w+)"?', sql, re.IGNORECASE)
            if m:
                self._real.execute(
                    f'CREATE OR REPLACE TABLE "{m.group(1)}" (dummy INTEGER)'
                )
            return MagicMock()
        return self._real.execute(sql, *args, **kwargs)

    def close(self):
        return self._real.close()

    def __getattr__(self, name):
        return getattr(self._real, name)


# ----------------------------------------------------------------------
# parse_bq_fqn: pure unit
# ----------------------------------------------------------------------

class TestParseBqFqn:
    def test_none_returns_none(self):
        assert parse_bq_fqn(None) is None

    def test_empty_string_returns_none(self):
        # Treat "" the same as None: the registry persists '' for
        # cleared values in some paths, and the extractor's fallback
        # branch is the right behavior in both cases.
        assert parse_bq_fqn("") is None

    def test_well_formed_three_segments(self):
        assert parse_bq_fqn("my-proj.my_ds.my_tbl") == (
            "my-proj", "my_ds", "my_tbl",
        )

    @pytest.mark.parametrize("bad", [
        "just_a_table",  # one segment
        "ds.table",      # two segments
        "p.d.t.extra",   # four segments
        ".d.t",          # empty project
        "p..t",          # empty dataset
        "p.d.",          # empty table
    ])
    def test_malformed_raises(self, bad):
        with pytest.raises(ValueError, match="malformed bq_fqn"):
            parse_bq_fqn(bad)

    def test_unsafe_project_rejected(self):
        # `_validate_project_id` accepts the canonical BQ project-id
        # grammar (6-30 lowercase letters/digits/dashes). A space
        # would let an attacker break out of the inline backtick path
        # at view-create time; reject upfront.
        with pytest.raises(ValueError, match="project.*grammar"):
            parse_bq_fqn("bad project.ds.tbl")


# ----------------------------------------------------------------------
# Extractor honors bq_fqn
# ----------------------------------------------------------------------

@pytest.fixture
def output_dir(tmp_path):
    d = tmp_path / "extracts" / "bigquery"
    d.mkdir(parents=True)
    return str(d)


def _run_init_extract(output_dir, project_id, tcs, detect_returns):
    """Run init_extract with mocked auth + entity-type detection through
    the capturing proxy. Returns ``(stats, captured_sqls)`` so tests can
    assert on both the per-row outcome AND the SQL the extractor would
    have sent to the live BQ extension."""
    from connectors.bigquery.extractor import init_extract

    detector = (
        detect_returns if callable(detect_returns)
        else (lambda *a, **kw: detect_returns)
    )

    captured: list[str] = []

    def proxy_connect(path=None, **kwargs):
        real_conn = duckdb.connect(path)
        proxy = _CapturingProxy(real_conn)
        proxy.create_view_sqls = captured  # share list across calls
        return proxy

    with patch("connectors.bigquery.extractor.get_metadata_token", lambda: "x"), \
         patch("connectors.bigquery.extractor._detect_table_type", detector), \
         patch("connectors.bigquery.extractor.duckdb") as mock_mod:
        mock_mod.connect = proxy_connect
        result = init_extract(output_dir, project_id, tcs)
    return result, captured


def _meta_rows(output_dir):
    conn = duckdb.connect(str(Path(output_dir) / "extract.duckdb"))
    try:
        return conn.execute(
            "SELECT table_name FROM _meta ORDER BY table_name"
        ).fetchall()
    finally:
        conn.close()


class TestExtractorRespectsBqFqn:
    def test_bq_fqn_overrides_bucket_for_same_project_view(self, output_dir):
        """A row with bq_fqn whose project matches the extractor's ATTACH
        project should use the bq_fqn's dataset/table in the inner view.

        Concretely: bucket='Sessions' (UX label) and bq_fqn=
        'my-project.product_analytics.S2_pageviews'; the bigquery_query
        FROM clause should reference product_analytics.S2_pageviews, NOT
        Sessions.S2_pageviews."""
        tcs = [{
            "id": "s2",
            "name": "s2_session_pageviews",
            "source_type": "bigquery",
            "bucket": "Sessions",  # UX label: must NOT leak into BQ path
            "source_table": "ignored_st",  # should also be overridden
            "bq_fqn": "my-project.product_analytics.S2_pageviews",
            "query_mode": "remote",
            "description": "",
        }]
        result, sqls = _run_init_extract(output_dir, "my-project", tcs, "VIEW")
        assert result["tables_registered"] == 1
        joined = "\n".join(sqls)
        assert "product_analytics" in joined, joined
        assert "S2_pageviews" in joined, joined
        # The UX label must not leak into the BQ path
        assert "Sessions" not in joined, joined

    def test_bq_fqn_view_cross_project_succeeds(self, output_dir):
        """VIEW path uses bigquery_query(billing, ...), which can read across
        projects. A bq_fqn with project ≠ extractor project should still
        register the master view (cross-project SA permissions assumed)."""
        tcs = [{
            "id": "rfm",
            "name": "rfm",
            "source_type": "bigquery",
            "bucket": "RFM",
            "source_table": "ignored",
            "bq_fqn": "other-project.revenue.bk_rfm",
            "query_mode": "remote",
            "description": "",
        }]
        result, sqls = _run_init_extract(output_dir, "my-project", tcs, "VIEW")
        assert result["tables_registered"] == 1
        joined = "\n".join(sqls)
        # Verify the FROM clause carries the cross-project FQN
        assert "other-project.revenue.bk_rfm" in joined, joined
        # Billing project for the BQ job is still the ATTACH project
        assert "bigquery_query('my-project'" in joined, joined

    def test_bq_fqn_base_table_cross_project_skipped(self, output_dir):
        """BASE TABLE path goes through the bq ATTACH alias, which is bound
        to the extractor's project. Cross-project BASE TABLE would silently
        route to the wrong project (data not found there); skip with a
        warning and do NOT insert _meta so the master view isn't created
        against missing data."""
        tcs = [{
            "id": "xp",
            "name": "xp",
            "source_type": "bigquery",
            "bucket": "OtherDs",
            "source_table": "tbl",
            "bq_fqn": "other-project.OtherDs.tbl",
            "query_mode": "remote",
            "description": "",
        }]
        result, _ = _run_init_extract(output_dir, "my-project", tcs, "BASE TABLE")
        assert result["tables_registered"] == 0
        # No _meta row → orchestrator won't create a master view that
        # would resolve to a nonexistent inner view.
        assert _meta_rows(output_dir) == []

    def test_malformed_bq_fqn_records_per_row_error(self, output_dir):
        tcs = [{
            "id": "ok", "name": "ok", "source_type": "bigquery",
            "bucket": "ds", "source_table": "t",
            "query_mode": "remote", "description": "",
        }, {
            "id": "bad", "name": "bad", "source_type": "bigquery",
            "bucket": "ds", "source_table": "t",
            "bq_fqn": "not.enough",  # malformed
            "query_mode": "remote", "description": "",
        }]
        result, _ = _run_init_extract(output_dir, "my-project", tcs, "BASE TABLE")
        # Good row goes through; bad row recorded as per-row error and
        # does NOT abort the whole extract.
        assert result["tables_registered"] == 1
        assert any("malformed bq_fqn" in e["error"] for e in result["errors"])
        # Only the good row landed in _meta
        rows = _meta_rows(output_dir)
        assert rows == [("ok",)]

    def test_no_bq_fqn_falls_back_to_legacy(self, output_dir):
        """A row without bq_fqn must keep using bucket+source_table+
        ATTACH project, exactly as pre-v51. Backwards-compat guarantee."""
        tcs = [{
            "id": "legacy",
            "name": "legacy",
            "source_type": "bigquery",
            "bucket": "legacy_ds",
            "source_table": "legacy_tbl",
            # bq_fqn intentionally absent
            "query_mode": "remote",
            "description": "",
        }]
        result, sqls = _run_init_extract(output_dir, "my-project", tcs, "BASE TABLE")
        assert result["tables_registered"] == 1
        assert any('bq."legacy_ds"."legacy_tbl"' in s for s in sqls), sqls

    def test_cross_project_detect_call_bills_against_extractor_project(self, output_dir):
        """Regression: cross-project rows must call _detect_table_type with
        billing_project=project_id (the extractor's billing project), not
        just the bq_fqn data project. The data SA typically has
        bigquery.dataViewer on the data project but only holds
        serviceusage.services.use on the billing project; reusing the
        data project as billing 403s and the broad except Exception in
        init_extract silently drops the row, so the cross-project VIEW
        path never executes."""
        captured_calls: list[dict] = []

        def capturing_detector(conn, project, dataset, table, billing_project=None):
            captured_calls.append({
                "project": project,
                "billing_project": billing_project,
            })
            return "VIEW"

        tcs = [{
            "id": "rfm",
            "name": "rfm",
            "source_type": "bigquery",
            "bucket": "RFM",
            "source_table": "ignored",
            "bq_fqn": "other-project.revenue.bk_rfm",
            "query_mode": "remote",
            "description": "",
        }]
        _run_init_extract(output_dir, "my-project", tcs, capturing_detector)
        assert len(captured_calls) == 1
        call = captured_calls[0]
        # Data project (FROM clause / INFORMATION_SCHEMA target)
        assert call["project"] == "other-project"
        # Billing project (bigquery_query 1st arg + serviceusage.services.use
        # check): must be the extractor's billing project, NOT the data project.
        assert call["billing_project"] == "my-project"


# ----------------------------------------------------------------------
# _detect_table_type: direct unit
# ----------------------------------------------------------------------

class TestDetectTableTypeBilling:
    """Verify that _detect_table_type wires billing_project into the
    bigquery_query() 1st positional arg, the only knob that controls
    which project the BQ jobs API charges + checks services.use on."""

    def _make_fake_conn(self, captured: list, return_value):
        class _FakeCursor:
            def fetchone(self_inner):
                return return_value

        class _FakeConn:
            def execute(self_inner, sql, params):
                captured.append(list(params))
                return _FakeCursor()

        return _FakeConn()

    def test_explicit_billing_project_used_for_bigquery_query_first_arg(self):
        from connectors.bigquery.extractor import _detect_table_type
        captured: list = []
        conn = self._make_fake_conn(captured, ("VIEW",))
        result = _detect_table_type(
            conn, "data-proj", "ds", "tbl",
            billing_project="billing-proj",
        )
        assert result == "VIEW"
        # bigquery_query(billing_project, bq_sql, table_predicate)
        params = captured[0]
        assert params[0] == "billing-proj"
        # FROM clause still references the data project
        assert "`data-proj.ds.INFORMATION_SCHEMA.TABLES`" in params[1]
        assert params[2] == "tbl"

    def test_omitted_billing_project_defaults_to_data_project(self):
        """Backwards-compat: existing same-project callers omit
        billing_project and bill against the data project (no-op since
        the two projects are equal in same-project lookups)."""
        from connectors.bigquery.extractor import _detect_table_type
        captured: list = []
        conn = self._make_fake_conn(captured, None)
        _detect_table_type(conn, "same-proj", "ds", "tbl")
        assert captured[0][0] == "same-proj"


# ----------------------------------------------------------------------
# Orchestrator drift sync
# ----------------------------------------------------------------------

class TestOrchestratorBqDriftSync:
    def test_drift_triggers_rebuild_from_registry(self, tmp_path, monkeypatch):
        """When extract.duckdb's _remote_attach.url disagrees with the
        overlay's data_source.bigquery.project, the orchestrator's
        pre-pass should call rebuild_from_registry to regenerate the
        extract before the main scan loop."""
        from src.orchestrator import SyncOrchestrator

        bq_dir = tmp_path / "extracts" / "bigquery"
        bq_dir.mkdir(parents=True)
        extract_path = bq_dir / "extract.duckdb"

        # Create a minimal _remote_attach pointing at the OLD project.
        conn = duckdb.connect(str(extract_path))
        try:
            conn.execute(
                "CREATE TABLE _remote_attach ("
                "alias VARCHAR, extension VARCHAR, url VARCHAR, "
                "token_env VARCHAR)"
            )
            conn.execute(
                "INSERT INTO _remote_attach VALUES (?, ?, ?, ?)",
                ["bq", "bigquery", "project=stale-project", ""],
            )
        finally:
            conn.close()

        # Overlay says the project is now `fresh-project`.
        monkeypatch.setattr(
            "app.instance_config.get_value",
            lambda *a, **kw: "fresh-project" if a[-1] == "project" else "",
        )

        called = []
        monkeypatch.setattr(
            "connectors.bigquery.extractor.rebuild_from_registry",
            lambda *a, **kw: (called.append(1), {"tables_registered": 0, "errors": []})[1],
        )

        orch = SyncOrchestrator(analytics_db_path=str(tmp_path / "analytics.duckdb"))
        orch._sync_bq_remote_attach_with_overlay(tmp_path / "extracts")
        assert called == [1], "drift detected but rebuild_from_registry was not invoked"

    def test_no_drift_is_noop(self, tmp_path, monkeypatch):
        from src.orchestrator import SyncOrchestrator

        bq_dir = tmp_path / "extracts" / "bigquery"
        bq_dir.mkdir(parents=True)
        extract_path = bq_dir / "extract.duckdb"

        conn = duckdb.connect(str(extract_path))
        try:
            conn.execute(
                "CREATE TABLE _remote_attach ("
                "alias VARCHAR, extension VARCHAR, url VARCHAR, "
                "token_env VARCHAR)"
            )
            conn.execute(
                "INSERT INTO _remote_attach VALUES (?, ?, ?, ?)",
                ["bq", "bigquery", "project=same-project", ""],
            )
        finally:
            conn.close()

        monkeypatch.setattr(
            "app.instance_config.get_value",
            lambda *a, **kw: "same-project" if a[-1] == "project" else "",
        )
        called = []
        monkeypatch.setattr(
            "connectors.bigquery.extractor.rebuild_from_registry",
            lambda *a, **kw: called.append(1) or {},
        )
        orch = SyncOrchestrator(analytics_db_path=str(tmp_path / "analytics.duckdb"))
        orch._sync_bq_remote_attach_with_overlay(tmp_path / "extracts")
        assert called == [], "no drift but rebuild_from_registry was still called"

    def test_missing_extract_is_noop(self, tmp_path, monkeypatch):
        """Pre-pass on an instance with no BQ extract at all must not
        try to read or rewrite anything. Soft-fails silently."""
        from src.orchestrator import SyncOrchestrator
        called = []
        monkeypatch.setattr(
            "connectors.bigquery.extractor.rebuild_from_registry",
            lambda *a, **kw: called.append(1) or {},
        )
        orch = SyncOrchestrator(analytics_db_path=str(tmp_path / "analytics.duckdb"))
        orch._sync_bq_remote_attach_with_overlay(tmp_path / "extracts")
        assert called == []


# ----------------------------------------------------------------------
# validate_bigquery_startup_config
# ----------------------------------------------------------------------

class TestStartupValidation:
    def test_empty_config_no_warnings(self, monkeypatch):
        from connectors.bigquery.access import validate_bigquery_startup_config
        monkeypatch.setattr("app.instance_config.get_value", lambda *a, **kw: "")
        assert validate_bigquery_startup_config() == []

    def test_same_billing_and_data_project_no_warnings(self, monkeypatch):
        from connectors.bigquery.access import validate_bigquery_startup_config

        def fake_get_value(*args, **kwargs):
            key = args[-1]
            return {
                "project": "my-proj",
                "billing_project": "my-proj",
                "location": "",  # location unset is OK when same project
            }.get(key, "")

        monkeypatch.setattr("app.instance_config.get_value", fake_get_value)
        assert validate_bigquery_startup_config() == []

    def test_cross_project_without_location_warns(self, monkeypatch):
        from connectors.bigquery.access import validate_bigquery_startup_config

        def fake_get_value(*args, **kwargs):
            key = args[-1]
            return {
                "project": "data-project",
                "billing_project": "billing-project",
                "location": "",
            }.get(key, "")

        monkeypatch.setattr("app.instance_config.get_value", fake_get_value)
        warnings = validate_bigquery_startup_config()
        assert len(warnings) == 1
        assert "location is not set" in warnings[0]
        assert "issue #343" in warnings[0]

    def test_warehouse_like_project_without_billing_warns(self, monkeypatch):
        from connectors.bigquery.access import validate_bigquery_startup_config

        def fake_get_value(*args, **kwargs):
            key = args[-1]
            return {
                "project": "my-warehouse-project",
                "billing_project": "",
                "location": "us-central1",
            }.get(key, "")

        monkeypatch.setattr("app.instance_config.get_value", fake_get_value)
        warnings = validate_bigquery_startup_config()
        # Only the warehouse-like heuristic fires (cross-project warning
        # is suppressed because effective_billing == project when billing
        # is unset, regardless of location).
        assert any("warehouse" in w or "serviceusage" in w for w in warnings)


# ----------------------------------------------------------------------
# Admin API surface
# ----------------------------------------------------------------------

class TestRegisterRequestAcceptsBqFqn:
    def test_pydantic_accepts_well_formed(self):
        from app.api.admin import RegisterTableRequest
        r = RegisterTableRequest(
            name="t", source_type="bigquery",
            bucket="ds", source_table="t",
            bq_fqn="proj.ds.t",
        )
        assert r.bq_fqn == "proj.ds.t"

    def test_pydantic_accepts_omitted(self):
        from app.api.admin import RegisterTableRequest
        r = RegisterTableRequest(name="t", source_type="bigquery", bucket="ds", source_table="t")
        assert r.bq_fqn is None

    def test_update_request_accepts_bq_fqn(self):
        from app.api.admin import UpdateTableRequest
        u = UpdateTableRequest(bq_fqn="p.d.t")
        assert u.bq_fqn == "p.d.t"