merge: pull #174 (BQ materialize view fix + concurrency, 0.33.0) into bootstrap branch

Brings in zs/materialize-sync-fix (PR #174):
- BigQuery view materialize works (wrap admin SQL in bigquery_query())
- Per-table mutex + fcntl.flock for concurrent COPY corruption
- Cost guardrail dry-run engages on materialized rows
- Schema v23 -> v24 migration: rewrite source_query to BQ-native
- Server-generated trivial source_query from bucket+source_table
- Validator backtick relaxation for materialized rows
- 0.33.0 release cut

Conflict resolution:
- CHANGELOG.md: keep our [Unreleased] (bootstrap rewrite content) ABOVE the new [0.33.0] section from #174. The bootstrap rewrite remains unreleased; it'll cut 0.34.0 (or later) when this PR merges to main.
- tests/conftest.py: union. Keep our analyst-bootstrap fixture re-export AND #174's bq_instance / stub_bq_extractor fixtures.
- pyproject.toml auto-merged to 0.33.0 (matches the cut), correct.
- src/db.py auto-merged: SCHEMA_VERSION = 24, _v23_to_v24_finalize added. No overlap with our work, which left schema at v23.
- CLAUDE.md auto-merged: schema-history paragraph extended with v24.

Verified: 79/79 across CLI bootstrap suite + materialize suite + schema v24 migration tests pass locally on Python 3.13/macOS.
2026-05-04 20:53:00 +02:00 · e438170ade
commit e438170ade
parent ee83cebbda e6a2c4c51d
23 changed files with 1607 additions and 259 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -54,6 +54,89 @@ End-to-end clean-analyst-bootstrap rewrite. The web `/setup?role=analyst` page n
 - `tests/test_clean_install_integration.py` — end-to-end happy-path tests (minimal grants, zero grants, force preserves CLAUDE.local.md, readers in pre-init dir).
 - `docs/RELEASE_CHECKLIST.md` — manual clean-install protocol mandated for any PR touching the bootstrap path.
 ## [0.33.0] — 2026-05-04
 Closes #162. Headline fix: `query_mode='materialized'` BigQuery rows now
 materialize correctly for views and materialized views, with per-table
 concurrency control preventing parquet corruption on overlapping scheduler
 ticks. Plus a source_query server-generation convenience, a
 `materialize.lock_ttl_seconds` config knob, and a schema v24 migration that
 converts existing DuckDB-flavor source_query values to BQ-native SQL.
 ### Fixed
 - BigQuery materialize now works for views and materialized views. Pre-fix,
  `materialize_query` ran admin's `source_query` as `COPY (sql) TO parquet`
  through the DuckDB BigQuery extension session, which routed through the BQ
  Storage Read API for `bq."<ds>"."<tbl>"` references. Storage Read API
  rejects non-base entities (`Binder Error: Error while creating read session:
  ... non-table entities cannot be read with the storage API`). Fixed by
  always wrapping admin SQL into `bigquery_query('<billing-project>',
  '<inner-sql>')` so COPY uses the BQ jobs API uniformly for tables, views,
  and materialized views.
 - `materialize_query` no longer corrupts its parquet under concurrent
  invocations for the same `table_id`. Pre-fix, two overlapping
  `_run_materialized_pass` calls (e.g. a long-running COPY + the next
  scheduler tick) both hit the unconditional `if tmp_path.exists():
  tmp_path.unlink()` at function entry and started parallel COPYs against the
  same path, interleaving bytes and producing a parquet file with no valid
  footer. Now each call acquires a per-table_id `threading.Lock` plus an
  advisory `fcntl.flock` on `<id>.parquet.lock`; the second caller raises
  `MaterializeInFlightError` and the scheduler treats it as
  `skipped, in_flight` — never as an error.
 - Cost guardrail dry-run now engages for materialized rows. Pre-fix, the
  BigQuery Python client returned 400 (`Table-valued function not found:
  bigquery_query`) on the wrapped SQL and the dry-run silently failed open.
  The dry-run now operates on the inner BQ-native SQL (admin's `source_query`
  directly), which the client parses cleanly.
 ### Changed
 - **BREAKING** `query_mode='materialized'` rows MUST register `source_query`
  as BigQuery-native SQL (backticks for dashed identifiers, native
  joins/CTEs). DuckDB-flavor (`bq."<ds>"."<tbl>"`) is no longer accepted on
  register/PUT. The schema v24 migration converts existing rows automatically;
  operators with custom-written `source_query` should review the migrated form
  on first deploy. The validator's prior backtick-rejection rule is now scoped
  to `query_mode IN ('remote', 'local')` only.
 - `_run_materialized_pass` summary `skipped` field changes from `list[str]`
  to `list[dict]` with shape
  `{"table": str, "reason": Literal["due_check", "in_flight"]}`. Downstream
  consumers that asserted the old string form must update.
 ### Added
 - `POST /api/admin/register-table` for `query_mode='materialized'` rows with
  `bucket`+`source_table` but no `source_query` now server-generates
  `` SELECT * FROM `<project>.<bucket>.<source_table>` `` from the configured
  BigQuery project. The same fallback fires on `PUT /api/admin/registry/{id}`
  when flipping to materialized. Operators only need to know
  `bigquery_query()` semantics for non-trivial queries.
 - New top-level `materialize` config section in `instance.yaml`. Single field
  — `materialize.lock_ttl_seconds` (default `86400`, 24 h) — controls how
  long a stale `<id>.parquet.lock` file lives before a sibling materialize
  attempt reclaims it. Editable via `/admin/server-config` API and UI.
 ### Internal
 - Schema v24 migration: rewrites `table_registry.source_query` for
  materialized BigQuery rows from DuckDB-flavor (`bq."<ds>"."<tbl>"`) to
  BQ-native (`` `<project>.<ds>.<tbl>` ``) using the configured BQ project.
  Idempotent on already-converted rows; logs a warning and skips when the
  project isn't configured (operator can configure + restart for retry).
  Wrapped in `BEGIN TRANSACTION` / `COMMIT` to match the project's
  transactional-finalizer pattern.
 - `connectors/bigquery/extractor.py` exports `MaterializeInFlightError` and
  the `_get_table_lock` / `_get_lock_ttl_seconds` /
  `_wrap_admin_sql_for_jobs_api` / `_escape_sql_string_literal` helpers as
  test seams. Underscore-prefixed; not part of the public API.
 - `tests/conftest.py` lifts `bq_instance` and `stub_bq_extractor` fixtures
  from `tests/test_api_admin_materialized.py` so subsequent test modules in
  this PR can resolve them via pytest's auto-discovery.
 - `app/api/sync.py:is_table_due` hoisted to module-level import (was deferred
  inside `_run_materialized_pass`) so monkeypatching `app.api.sync.is_table_due`
  actually intercepts the call — the deferred form made test patches a no-op.
 ## [0.32.0] — 2026-05-04
 Closes #160. Headline fix: `da query --remote` now resolves
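The 0.33.0 "Fixed" entry above describes wrapping admin SQL into `bigquery_query('<billing-project>', '<inner-sql>')` so that COPY goes through the BQ jobs API. A minimal sketch of that wrapping is given below, assuming the inner SQL is embedded as a DuckDB string literal with single quotes doubled. The function names here are hypothetical stand-ins; the repository's own `_wrap_admin_sql_for_jobs_api` and `_escape_sql_string_literal` helpers are not shown in this diff and may differ.

```python
def escape_sql_string_literal(value: str) -> str:
    """Escape a value for use inside a single-quoted SQL string literal
    by doubling every single quote (hypothetical helper)."""
    return value.replace("'", "''")


def wrap_for_jobs_api(billing_project: str, inner_sql: str) -> str:
    """Wrap BigQuery-native SQL in the DuckDB bigquery_query() table
    function so the query is executed by BigQuery itself rather than
    read through the Storage Read API (hypothetical helper)."""
    project = escape_sql_string_literal(billing_project)
    sql = escape_sql_string_literal(inner_sql)
    return f"SELECT * FROM bigquery_query('{project}', '{sql}')"


def build_copy_statement(billing_project: str, inner_sql: str, parquet_path: str) -> str:
    """Build the COPY ... TO parquet statement around the wrapped query."""
    wrapped = wrap_for_jobs_api(billing_project, inner_sql)
    path = escape_sql_string_literal(parquet_path)
    return f"COPY ({wrapped}) TO '{path}' (FORMAT PARQUET)"


if __name__ == "__main__":
    # Backticks and quotes in the inner SQL survive as part of the literal.
    print(build_copy_statement(
        "my-billing-prj",
        "SELECT * FROM `my-billing-prj.sales.orders_view` WHERE region = 'EU'",
        "/data/extracts/bigquery/orders.parquet.tmp",
    ))
```

Because the inner SQL travels as an opaque string literal, DuckDB never parses the backtick identifiers, which is consistent with the validator change that allows backticks for materialized rows.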
--- a/CLAUDE.md
+++ b/CLAUDE.md
@ -443,7 +443,7 @@ Module sets `lifecycle { ignore_changes = [metadata_startup_script] }` on `googl
 ## Key Implementation Details
 ### DuckDB Schema (src/db.py)
- Schema v23 with auto-migration v1→…→v23 (v5 adds `users.active`, v6 adds `personal_access_tokens`, v7 adds `personal_access_tokens.last_used_ip`, v8/v9 added the legacy internal_roles/role-grants tables, v10 added `view_ownership` for cross-connector view-name collision detection (issue #81 Group C), v11 added marketplace_registry + marketplace_plugins + user_groups + plugin_access, v12 added users.groups JSON + user_groups.is_system, **v13 replaces internal_roles/group_mappings/user_role_grants/plugin_access with user_group_members + resource_grants and drops users.groups JSON**, v14 adds FK constraints on user_group_members + resource_grants after orphan cleanup, v15 adds knowledge_items context-engineering columns + contradictions + session_extraction_state, v16 adds verification_evidence, v17 adds knowledge_item_relations, v18 drops stranded non-google memberships from google-managed groups, **v19 drops legacy `dataset_permissions`, `access_requests` tables and `users.role`, `table_registry.is_public` columns — table access is now exclusively per-group via `resource_grants(resource_type='table')`**, **v20 adds `source_query` TEXT to `table_registry` to back `query_mode='materialized'` (BigQuery scheduled-query parquet path)**, **v21 adds `welcome_template` singleton table backing the Agent Setup Prompt admin override (`/admin/agent-prompt`)**, **v22 reserves the `setup_banner` table — feature dropped mid-development; table retained for forward compatibility with already-migrated instances**, **v23 adds `claude_md_template` singleton table backing the Agent Workspace Prompt admin override (`/admin/workspace-prompt`)** — see CHANGELOG and docs/RBAC.md)
+- Schema v24 with auto-migration v1→…→v24 (v5 adds `users.active`, v6 adds `personal_access_tokens`, v7 adds `personal_access_tokens.last_used_ip`, v8/v9 added the legacy internal_roles/role-grants tables, v10 added `view_ownership` for cross-connector view-name collision detection (issue #81 Group C), v11 added marketplace_registry + marketplace_plugins + user_groups + plugin_access, v12 added users.groups JSON + user_groups.is_system, **v13 replaces internal_roles/group_mappings/user_role_grants/plugin_access with user_group_members + resource_grants and drops users.groups JSON**, v14 adds FK constraints on user_group_members + resource_grants after orphan cleanup, v15 adds knowledge_items context-engineering columns + contradictions + session_extraction_state, v16 adds verification_evidence, v17 adds knowledge_item_relations, v18 drops stranded non-google memberships from google-managed groups, **v19 drops legacy `dataset_permissions`, `access_requests` tables and `users.role`, `table_registry.is_public` columns — table access is now exclusively per-group via `resource_grants(resource_type='table')`**, **v20 adds `source_query` TEXT to `table_registry` to back `query_mode='materialized'` (BigQuery scheduled-query parquet path)**, **v21 adds `welcome_template` singleton table backing the Agent Setup Prompt admin override (`/admin/agent-prompt`)**, **v22 reserves the `setup_banner` table — feature dropped mid-development; table retained for forward compatibility with already-migrated instances**, **v23 adds `claude_md_template` singleton table backing the Agent Workspace Prompt admin override (`/admin/workspace-prompt`)**, **v24 rewrites materialized BQ `source_query` from DuckDB-flavor `bq."ds"."t"` to BQ-native `` `<project>.ds.t` `` so the new wrapping path accepts them; idempotent + warns when project unconfigured** — see CHANGELOG and docs/RBAC.md)
 - `table_registry`: id, name, source_type, bucket, source_table, query_mode, sync_schedule, etc.
 - `sync_state`, `sync_history`: track extraction progress
 - `users`, `audit_log`: account state + audit trail. RBAC lives in `user_groups` + `user_group_members` + `resource_grants`.
--- a/app/api/admin.py
+++ b/app/api/admin.py
@ -146,6 +146,38 @@ def _validate_urls_in_patch(sections: Dict[str, Dict[str, Any]]) -> None:
                _validate_url_not_private(value, field_name=".".join(path))
 _LOCK_TTL_MIN = 60
 _LOCK_TTL_MAX = 7 * 24 * 3600  # 604800 — one week
 def _validate_materialize_section(sections: Dict[str, Dict[str, Any]]) -> None:
    """Validate the materialize section patch when present.
    Checks field-level constraints that the Pydantic envelope can't enforce
    (it only validates the outer shape, not nested leaf values).
    """
    mat = sections.get("materialize")
    if not isinstance(mat, dict):
        return
    ttl = mat.get("lock_ttl_seconds")
    if ttl is None:
        return
    if not isinstance(ttl, int) or isinstance(ttl, bool):
        raise HTTPException(
            status_code=422,
            detail="materialize.lock_ttl_seconds must be an integer",
        )
    if ttl < _LOCK_TTL_MIN or ttl > _LOCK_TTL_MAX:
        raise HTTPException(
            status_code=422,
            detail=(
                f"materialize.lock_ttl_seconds must be between "
                f"{_LOCK_TTL_MIN} and {_LOCK_TTL_MAX} "
                f"(got {ttl})"
            ),
        )
 # --- Server-config (instance.yaml) editor -----------------------------------
 #
 # The /admin/server-config UI POSTs a partial dict here keyed by section
@ -175,6 +207,7 @@ _EDITABLE_SECTIONS: tuple[str, ...] = (
    "openmetadata",
    "desktop",
    "corporate_memory",
    "materialize",
 )
 # "Danger-zone" sections — flipping these can lock operators out (auth.*) or
@ -585,6 +618,23 @@ _KNOWN_FIELDS: dict[str, dict[str, dict]] = {
            ),
        },
    },
    # materialize — file-lock TTL for the concurrent-materialize safety net.
    # A single field; more knobs may follow as the feature matures.
    "materialize": {
        "lock_ttl_seconds": {
            "kind": "int",
            "default": 86400,
            "hint": (
                "How long (seconds) before a stale materialize lock file is "
                "reclaimed. The lock is a .parquet.lock sibling file; if the "
                "holder process is hard-killed, the next attempt reclaims the "
                "lock once the file's mtime is older than this TTL. "
                "Default 86400 (24 h). Min 60, max 604800 (7 days). "
                "Lower only if you know materializes never exceed the new value "
                "and your host regularly hard-kills processes."
            ),
        },
    },
 }
 # Keys whose values must be redacted from the audit diff. We match
@ -913,6 +963,9 @@ async def update_server_config(
    # the per-section patch (e.g. data_source.keboola.stack_url).
    _validate_urls_in_patch(request.sections)
    # Field-level constraints for sections whose values have documented ranges.
    _validate_materialize_section(request.sections)
    # Defense-in-depth: scrub redaction sentinels (`***` / `<empty>`) out of
    # secret-keyed leaves in the patch before they reach the deep-merge.
    # The client form does the same scrub, but an API caller round-tripping
@ -1169,27 +1222,28 @@ class RegisterTableRequest(BaseModel):
    @model_validator(mode="after")
    def _check_mode_query_coherence(self):
        """Enforce query_mode ↔ source_query invariants up front so an admin
-        can't persist a remote/local row carrying an orphan source_query, and
+        can't persist a remote/local row carrying an orphan source_query.
-        materialized rows can't be registered without a SQL body."""
+
        For BigQuery materialized rows, an empty source_query is allowed here
        because _validate_bigquery_register_payload generates it from
        bucket+source_table after this validator runs. For all other source
        types (e.g. Keboola), source_query is still required for materialized.
        """
        sq = (self.source_query or "").strip() or None
        if self.query_mode == "materialized" and not sq:
            raise ValueError(
                "query_mode='materialized' requires a non-empty source_query"
            )
        if self.query_mode != "materialized" and sq:
            raise ValueError(
                "source_query is only valid when query_mode='materialized'"
            )
-        # The materialize path runs the SQL through DuckDB's parser (BigQuery
+        # Non-BQ materialized rows must supply source_query explicitly — there
-        # extension's COPY pushes it through DuckDB first, and the Keboola
+        # is no server-generate fallback for Keboola materialized.
-        # path COPYs the raw SQL through a DuckDB session too). DuckDB does
+        if self.query_mode == "materialized" and not sq and self.source_type != "bigquery":
-        # NOT understand BigQuery-native backtick identifiers — those parse-
+            raise ValueError(
-        # error or silently match no rows, leaving no parquet at the
+                "query_mode='materialized' requires a non-empty source_query"
-        # canonical path and no operator-visible failure. Reject at register
+            )
-        # time with an actionable message so the bad SQL never lands in
+        # Backtick guard stays for non-materialized rows (DuckDB-flavor SQL
-        # `table_registry.source_query`. See `_run_materialized_pass` for
+        # contract); materialized SQL is BigQuery-native and MUST allow
-        # the runtime path that would otherwise eat the error.
+        # backticks for dashed identifiers (e.g. `prj-org.dataset.table`).
-        if sq and "`" in sq:
+        if self.query_mode != "materialized" and sq and "`" in sq:
            raise ValueError(_BACKTICK_REJECTION_MESSAGE)
        # Normalise: stash the trimmed-or-None form so the persisted column
        # never carries surrounding whitespace or empty-string sentinels.
@ -1232,6 +1286,31 @@ class RegisterTableRequest(BaseModel):
        return v
 def _generate_materialized_source_query(
    bucket: str, source_table: str, project_id: str,
 ) -> str:
    """Build the canonical full-table-dump source_query for a materialized
    BQ row when admin only supplies dataset + table. The result is
    BigQuery-native SQL — wrapped at materialize time into
    bigquery_query(...) by connectors.bigquery.extractor.materialize_query."""
    if not _is_safe_quoted_identifier(bucket):
        raise HTTPException(
            status_code=400,
            detail=f"bigquery: dataset {bucket!r} is unsafe",
        )
    if not _is_safe_quoted_identifier(source_table):
        raise HTTPException(
            status_code=400,
            detail=f"bigquery: source_table {source_table!r} is unsafe",
        )
    if not _is_safe_project_id(project_id):
        raise HTTPException(
            status_code=400,
            detail=f"bigquery: data_source.bigquery.project {project_id!r} is malformed",
        )
    return f"SELECT * FROM `{project_id}.{bucket}.{source_table}`"
 def _validate_bigquery_register_payload(req: "RegisterTableRequest") -> None:
    """Enforce BQ-specific shape on a register/precheck request.
@ -1253,13 +1332,8 @@ def _validate_bigquery_register_payload(req: "RegisterTableRequest") -> None:
    """
    if req.query_mode == "materialized":
        # Materialized BQ rows: the SQL body replaces dataset+table refs.
-        # Pydantic model_validator already verified source_query is non-empty;
+        # source_query may be empty if admin supplied bucket+source_table —
-        # all we still need is a valid project_id and a safe view name.
+        # in that case the server generates a full-table-dump SQL below.
        if not req.source_query or not req.source_query.strip():
            raise HTTPException(
                status_code=422,
                detail="bigquery materialized: 'source_query' is required",
            )
        raw_name = req.name or ""
        if raw_name.strip() != raw_name or not _is_safe_identifier(raw_name):
            raise HTTPException(
@ -1271,7 +1345,7 @@ def _validate_bigquery_register_payload(req: "RegisterTableRequest") -> None:
                ),
            )
        from app.instance_config import get_value
-        project_id = get_value("data_source", "bigquery", "project", default="")
+        project_id = get_value("data_source", "bigquery", "project", default="") or ""
        if not project_id:
            raise HTTPException(
                status_code=400,
@ -1290,6 +1364,24 @@ def _validate_bigquery_register_payload(req: "RegisterTableRequest") -> None:
                    "^[a-z][a-z0-9-]{4,28}[a-z0-9]$"
                ),
            )
        if not (req.source_query and req.source_query.strip()):
            # Server-generate from bucket+source_table. Trivial full-table
            # dump path; admin only sets dataset+table and the server
            # builds BQ-native SQL from instance.yaml's configured project.
            if not (req.bucket and req.source_table):
                raise HTTPException(
                    status_code=422,
                    detail=(
                        "bigquery materialized requires either source_query "
                        "(custom SQL) or bucket+source_table (server-generates "
                        "the full-table-dump SQL)"
                    ),
                )
            req.source_query = _generate_materialized_source_query(
                req.bucket, req.source_table, project_id,
            )
        # Phase C: profile_after_sync is now inert (Pydantic field marked
        # deprecated; not read by app/api/sync.py:410-438). The runtime
        # profiles every synced table unconditionally, so we no longer
@ -2283,35 +2375,32 @@ async def update_table(
        # Cross-source coherence: query_mode='materialized' requires a
        # non-empty source_query for ALL source types, not just BigQuery.
-        # Pre-fix, only the BQ-specific synthetic-RegisterTableRequest below
+        # BQ rows without source_query can be server-generated from
-        # caught this — Keboola materialized rows could be PUT without
+        # bucket+source_table (handled by _validate_bigquery_register_payload
-        # source_query and persisted with source_query=None, then crash at
+        # via the synthetic RegisterTableRequest below). Non-BQ rows (e.g.
-        # the next sync tick when kb_materialize_query received `sql=None`
+        # Keboola) still require an explicit source_query at PUT time.
        # and DuckDB rejected `COPY (None) TO ...`. Devin finding 2026-05-01:
        # BUG_pr-review-job-58ae3148_0001.
        if merged.get("query_mode") == "materialized":
            sq = merged.get("source_query")
            if not sq or not str(sq).strip():
-                raise HTTPException(
+                # BQ rows: let _validate_bigquery_register_payload generate
-                    status_code=422,
+                # source_query from bucket+source_table (falls through below).
-                    detail=(
+                # Non-BQ rows: no server-generate fallback; raise 422.
-                        "query_mode='materialized' requires a non-empty "
+                if merged.get("source_type") != "bigquery":
-                        "source_query. To revert to a non-materialized mode, "
+                    raise HTTPException(
-                        "PATCH query_mode='local' (Keboola) or 'remote' "
+                        status_code=422,
-                        "(BigQuery) and the stale source_query is cleared "
+                        detail=(
-                        "automatically."
+                            "query_mode='materialized' requires a non-empty "
-                    ),
+                            "source_query. To revert to a non-materialized mode, "
-                )
+                            "PATCH query_mode='local' (Keboola) or 'remote' "
-            # Backtick rejection on the merged record — see
+                            "(BigQuery) and the stale source_query is cleared "
-            # `_BACKTICK_REJECTION_MESSAGE` for the rationale. Catches PATCHes
+                            "automatically."
-            # that flip `source_query` to a backtick form on an already-
+                        ),
-            # materialized row, which the synthetic-RegisterTableRequest below
+                    )
-            # only re-validates for BQ rows. Apply uniformly so Keboola
+            # Backtick guard removed for materialized rows: the Task 2 wrapping
-            # materialized rows can't carry one either.
+            # path (connectors.bigquery.extractor.materialize_query) now runs
-            if "`" in str(sq):
+            # admin SQL through the BQ jobs API using BQ-native syntax, which
-                raise HTTPException(
+            # requires backticks for dashed project/dataset identifiers.
-                    status_code=422, detail=_BACKTICK_REJECTION_MESSAGE,
+            # Non-materialized rows still reject backticks in the model validator.
                )
        if merged.get("source_type") == "bigquery":
            # Reuse the register-time validator. It mutates the request to
--- a/app/api/sync.py
+++ b/app/api/sync.py
@ -20,7 +20,7 @@ from src.repositories.sync_state import SyncStateRepository
 from src.repositories.sync_settings import SyncSettingsRepository
 from src.repositories.table_registry import TableRegistryRepository
 from src.rbac import can_access_table
-from src.scheduler import filter_due_tables
+from src.scheduler import filter_due_tables, is_table_due
 logger = logging.getLogger(__name__)
 router = APIRouter(prefix="/api/sync", tags=["sync"])
@ -74,9 +74,8 @@ def _run_materialized_pass(conn: duckdb.DuckDBPyConnection, bq) -> dict:
    its structured fields so operator alerting can pick out the cap-vs-actual
    bytes from the log line.
    """
    from src.scheduler import is_table_due
    from app.instance_config import get_value
-    from connectors.bigquery.extractor import MaterializeBudgetError
+    from connectors.bigquery.extractor import MaterializeBudgetError, MaterializeInFlightError
    bq_output_dir = str(Path(_get_data_dir()) / "extracts" / "bigquery")
    kb_output_dir = Path(_get_data_dir()) / "extracts" / "keboola" / "data"
@ -125,7 +124,7 @@ def _run_materialized_pass(conn: duckdb.DuckDBPyConnection, bq) -> dict:
        last_iso = last.isoformat() if last else None
        schedule = row.get("sync_schedule") or "every 1h"
        if not is_table_due(schedule, last_iso):
-            summary["skipped"].append(ref_name)
+            summary["skipped"].append({"table": ref_name, "reason": "due_check"})
            continue
        source_type = row.get("source_type") or "bigquery"  # legacy default
@ -195,6 +194,13 @@ def _run_materialized_pass(conn: duckdb.DuckDBPyConnection, bq) -> dict:
                    ),
                })
                continue
        except MaterializeInFlightError:
            # In-flight on a sibling worker / scheduler tick — treat as
            # 'skipped, in-flight'. Do NOT call state.set_error: that
            # would flip status='error' on a healthy concurrent run and
            # the registry UI would surface a false-positive failure.
            summary["skipped"].append({"table": ref_name, "reason": "in_flight"})
            continue
        except MaterializeBudgetError as e:
            logger.warning(
                "Materialize cap exceeded for %s: %s bytes > %s bytes",
@ -466,9 +472,13 @@ sys.exit(compute_exit_code(result, len(configs)))
                mat_summary = _run_materialized_pass(mat_conn, bq_access)
            finally:
                mat_conn.close()
            skipped_count = len(mat_summary["skipped"])
            in_flight_count = sum(
                1 for s in mat_summary["skipped"] if s.get("reason") == "in_flight"
            )
            print(
                f"[SYNC] Materialized SQL: {len(mat_summary['materialized'])} ok, "
-                f"{len(mat_summary['skipped'])} skipped, "
+                f"{skipped_count} skipped (in_flight={in_flight_count}), "
                f"{len(mat_summary['errors'])} errors",
                file=_sys.stderr, flush=True,
            )
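The `skipped` field of the materialized-pass summary changes in this merge from a list of table names to a list of `{"table", "reason"}` dicts, so downstream consumers need to read `reason` instead of treating each entry as a string. A small sketch of one way to consume the new shape follows; `SkippedEntry` and `summarize_skipped` are hypothetical names introduced for illustration and do not exist in the repository.

```python
from collections import Counter
from typing import Literal, TypedDict


class SkippedEntry(TypedDict):
    """Shape of one entry in summary["skipped"] after this change."""
    table: str
    reason: Literal["due_check", "in_flight"]


def summarize_skipped(skipped: list[SkippedEntry]) -> dict[str, int]:
    """Count skipped tables per reason, always reporting both reasons
    so callers can log a stable line even when a count is zero."""
    counts = Counter(entry["reason"] for entry in skipped)
    return {
        "due_check": counts["due_check"],
        "in_flight": counts["in_flight"],
    }


if __name__ == "__main__":
    sample: list[SkippedEntry] = [
        {"table": "orders", "reason": "due_check"},
        {"table": "events", "reason": "in_flight"},
    ]
    print(summarize_skipped(sample))
```

Keeping `in_flight` separate from `due_check` lets alerting ignore routine schedule skips while still tracking how often ticks overlap a running COPY.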
--- a/app/web/templates/admin_server_config.html
+++ b/app/web/templates/admin_server_config.html
@ -218,6 +218,10 @@ const SECTION_META = {
    title: "Corporate Memory",
    help: "Optional governance for AI-extracted knowledge. When the section is unset, the system runs in legacy democratic-wiki mode with no admin review.",
  },
  materialize: {
    title: "Materialize",
    help: "Concurrency safety net for the materialize path. Controls the file-lock TTL used to detect and reclaim stale locks from hard-killed processes.",
  },
 };
 const DANGER_SECTIONS = new Set(["auth", "server"]);
--- a/config/instance.yaml.example
+++ b/config/instance.yaml.example
@ -403,3 +403,19 @@ catalog:
 #   schema_cache_ttl_seconds: 3600    # /api/v2/schema/{table_id} cache lifetime (default: 1 h)
 #   sample_cache_ttl_seconds: 3600    # /api/v2/sample/{table_id} cache lifetime (default: 1 h)
 #                                     # Admins can force-refresh via POST /api/v2/sample/{id}?refresh=true
 # --- Materialize concurrency safety (optional) ---
 # Concurrency safety net for the materialize path (BQ + Keboola). When
 # two materialize attempts race for the same table_id, the second one
 # raises MaterializeInFlightError and skips. The lock is held in a
 # .parquet.lock sibling file; if a holder process is hard-killed before
 # kernel-level flock release, the next attempt reclaims the lock once
 # the file's mtime is older than this TTL.
 #
 # Default 86400 (24h) is generous on purpose — anything shorter risks
 # a long-running COPY being interrupted by its own scheduler successor.
 # Lower it only if you know your materialize never exceeds the new
 # value AND your host has a habit of hard-killing processes.
 # Min 60 (1 minute), max 604800 (7 days). Configurable via /admin/server-config UI.
 materialize:
  lock_ttl_seconds: 86400
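The comment block above describes a second materialize attempt being turned away while a `.parquet.lock` sibling file is held. A minimal sketch of that non-blocking acquisition with `fcntl.flock` is shown here, under the assumption of a POSIX host. The name `try_acquire_file_lock` is illustrative and is not the repository's `_try_acquire_file_lock`; the TTL-based stale-lock reclaim is deliberately left out of this sketch.

```python
import fcntl
from pathlib import Path
from typing import IO, Optional


def try_acquire_file_lock(lock_path: Path) -> Optional[IO[str]]:
    """Try to take an exclusive advisory flock on lock_path without blocking.

    Returns the open file object on success (closing it releases the lock)
    or None when another open file description already holds the lock.
    """
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    # Opening in "w" mode creates the file if needed and refreshes its mtime.
    handle = open(lock_path, "w")
    try:
        # LOCK_NB makes flock raise instead of waiting for the holder.
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


if __name__ == "__main__":
    import tempfile

    lock_file = Path(tempfile.mkdtemp()) / "orders.parquet.lock"
    first = try_acquire_file_lock(lock_file)
    second = try_acquire_file_lock(lock_file)
    print("first acquired:", first is not None)
    print("second acquired:", second is not None)
    if first is not None:
        first.close()
```

flock locks belong to the open file description, so a second `open()` of the same path conflicts even inside one process, which makes the behaviour easy to exercise in a unit test.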
--- a/connectors/bigquery/extractor.py
+++ b/connectors/bigquery/extractor.py
@ -3,16 +3,29 @@
 No data is downloaded. All queries go directly to BigQuery via DuckDB extension ATTACH.
 """
 import fcntl
 import hashlib
 import logging
 import os
 import re
 import shutil
 import threading
 import time
 from datetime import datetime, timezone
 from pathlib import Path
 from typing import List, Dict, Any, Optional
 import duckdb
 from connectors.bigquery.auth import get_metadata_token, BQMetadataAuthError
 from src.sql_safe import (
    validate_identifier as _validate_identifier,
    validate_project_id as _validate_project_id,
 )
 from src.identifier_validation import validate_identifier, validate_quoted_identifier
 logger = logging.getLogger(__name__)
 # Serializes the body of `init_extract` across threads so two concurrent
 # materialize calls (e.g. the synchronous timeout-fallback BackgroundTask
 # kicking in while the original daemon thread is still running) can't both
@ -21,15 +34,127 @@ import duckdb
 # not the per-source extract-file write, so we need a dedicated lock here.
 _INIT_EXTRACT_LOCK = threading.Lock()
-from connectors.bigquery.auth import get_metadata_token, BQMetadataAuthError
+_LOCK_TTL_DEFAULT_SECONDS: int = 86400  # 24h — overridable via materialize.lock_ttl_seconds
 from app.instance_config import get_value
 from src.sql_safe import (
    validate_identifier as _validate_identifier,
    validate_project_id as _validate_project_id,
 )
 from src.identifier_validation import validate_identifier, validate_quoted_identifier
-logger = logging.getLogger(__name__)
+
 class MaterializeInFlightError(Exception):
    """Raised when a per-table_id materialize is already running.
    Caller (`_run_materialized_pass`) should treat this as a 'skipped,
    in-flight' outcome — the in-flight worker will finish and write
    sync_state on its own. Critically, this is NOT an error condition;
    `state.set_error` MUST NOT be called for this exception or the
    registry would surface a false-positive failure to the operator
    on every overlap."""
    def __init__(self, table_id: str, layer: str = "process"):
        self.table_id = table_id
        self.layer = layer
        super().__init__(
            f"materialize for {table_id!r} already in flight ({layer} lock held)"
        )
 # Unbounded by design — each registered table_id gets one Lock for the
 # process lifetime. Per-Lock cost is ~56 bytes; a deployment with even
 # 10k registered tables holds <1 MB. No cleanup logic: cleanup would
 # need ref-counting and risks freeing a Lock currently held by a worker.
 _table_locks: dict[str, threading.Lock] = {}
 _table_locks_registry: threading.Lock = threading.Lock()
 def _get_table_lock(table_id: str) -> threading.Lock:
    """Return the process-wide mutex for a given table_id, creating it
    on first reference. The registry mutex serializes the dict mutation
    only — once the per-id Lock is returned, contention between callers
    happens on that lock alone."""
    with _table_locks_registry:
        lock = _table_locks.get(table_id)
        if lock is None:
            lock = threading.Lock()
            _table_locks[table_id] = lock
        return lock
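The registry pattern above can be exercised in isolation. The sketch below uses hypothetical names (`get_lock`, `_locks`, `_registry_guard`) in place of the diff's `_get_table_lock`, `_table_locks` and `_table_locks_registry`, and shows how a non-blocking acquire on the per-key lock behaves for the same key versus a different key.

```python
import threading

# Hypothetical names; the diff's equivalents are _table_locks,
# _table_locks_registry and _get_table_lock.
_locks: dict[str, threading.Lock] = {}
_registry_guard = threading.Lock()


def get_lock(key: str) -> threading.Lock:
    """Return the single Lock for `key`, creating it on first use."""
    with _registry_guard:  # guards only the dict lookup and insert
        lock = _locks.get(key)
        if lock is None:
            lock = threading.Lock()
            _locks[key] = lock
        return lock


same_object = get_lock("orders") is get_lock("orders")

first = get_lock("orders").acquire(blocking=False)     # takes the lock
second = get_lock("orders").acquire(blocking=False)    # refused, already held
other = get_lock("customers").acquire(blocking=False)  # different key, unaffected

get_lock("orders").release()
get_lock("customers").release()
after_release = get_lock("orders").acquire(blocking=False)
get_lock("orders").release()
```

Because `threading.Lock` is not reentrant, the second non-blocking acquire is refused even from the same thread, which is the property the overlap check relies on.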
 def _get_lock_ttl_seconds() -> int:
    """Read the configured stale-lock TTL with fallback to the default.
    Operator override lives at instance.yaml `materialize.lock_ttl_seconds`
    (also editable via /admin/server-config). Default 86400 s = 24 h
    matches the upper bound of any healthy BQ COPY in practice — anything
    longer is a stuck process or a hung BQ session, both of which warrant
    reclaim on next attempt."""
    try:
        # Deferred import: keeps the connectors module importable in
        # contexts where the app layer isn't bootstrapped (e.g. unit tests
        # that exercise extractor helpers without the FastAPI app).
        from app.instance_config import get_value
        v = get_value(
            "materialize", "lock_ttl_seconds",
            default=_LOCK_TTL_DEFAULT_SECONDS,
        )
        n = int(v) if v is not None else _LOCK_TTL_DEFAULT_SECONDS
        return n if n > 0 else _LOCK_TTL_DEFAULT_SECONDS
    except Exception:
        return _LOCK_TTL_DEFAULT_SECONDS
 def _try_acquire_file_lock(lock_path: Path):
    """Try to acquire an advisory exclusive flock on `lock_path`. Returns
    the open file object on success (caller must close to release); None
    on conflict.
    Stale-lock reclaim: if the lock_path exists and its mtime is older
    than the configured TTL, log a warning and unlink before retrying.
    Caveat: flock is held on the open file (inode), not on the path, so
    after the unlink the retry opens a fresh inode and succeeds even if
    the previous holder is still alive. The reclaim is therefore only
    safe when the TTL exceeds the longest healthy COPY; its purpose is
    to unblock a lock whose holder is hung rather than making progress."""
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    def _try_open_and_flock():
        # Open in 'w' mode so the file's mtime updates on every successful
        # acquisition — the mtime is the TTL signal for the next caller.
        # Content is intentionally empty; the fd exists only to anchor flock.
        f = open(lock_path, "w")
        try:
            fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
            return f
        except BlockingIOError:
            # Another holder owns the lock — return None so the caller can
            # decide between TTL-reclaim and propagating MaterializeInFlightError.
            f.close()
            return None
        except OSError:
            # Anything else (read-only fs, unsupported, fd exhaustion) is a
            # platform / config error, not a contention signal. Close the fd
            # and re-raise so the caller (and operator) sees the real failure
            # instead of a silent leak.
            f.close()
            raise
    holder = _try_open_and_flock()
    if holder is not None:
        return holder
    # Conflict. If the file is older than TTL, reclaim and retry once.
    try:
        age = time.time() - lock_path.stat().st_mtime
    except FileNotFoundError:
        return _try_open_and_flock()
    if age > _get_lock_ttl_seconds():
        logger.warning(
            "Reclaiming stale materialize lock at %s (age %.1fs > TTL)",
            lock_path, age,
        )
        try:
            lock_path.unlink()
        except FileNotFoundError:
            pass
        return _try_open_and_flock()
    return None
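One property worth keeping in mind when reading the reclaim path: an advisory `flock` belongs to the open file, not to the path name. The sketch below demonstrates this on a temporary directory, assuming a Linux local filesystem (network filesystems may behave differently). `try_flock` is a simplified, hypothetical stand-in for `_try_open_and_flock`.

```python
import fcntl
import os
import tempfile


def try_flock(path):
    """Open `path` and try a non-blocking exclusive flock.

    Returns the open file on success, None if another open file holds it."""
    f = open(path, "w")
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        return f
    except BlockingIOError:
        f.close()
        return None


with tempfile.TemporaryDirectory() as tmp:
    lock_path = os.path.join(tmp, "orders.parquet.lock")

    holder = try_flock(lock_path)        # first caller gets the lock
    contender = try_flock(lock_path)     # same inode: refused

    os.unlink(lock_path)                 # what the stale-lock reclaim does
    after_unlink = try_flock(lock_path)  # new inode at the same path: granted

    holder_still_open = not holder.closed
    holder.close()
    after_unlink.close()
```

After the unlink, both `holder` and `after_unlink` hold an exclusive lock at the same time, each on a different inode, so the TTL is the only thing separating a reclaim from a live holder.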
 def _detect_table_type(
@ -59,6 +184,56 @@ def _detect_table_type(
    return row[0] if row else None
 _BILLING_PROJECT_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")
 def _escape_sql_string_literal(s: str) -> str:
    """Double every single quote so the result is safe to embed inside a
    single-quoted SQL string literal. DuckDB and BigQuery both honor the
    SQL standard `''` escape inside `'...'`. Used to wrap admin
    source_query into bigquery_query()'s second arg without breaking
    the literal envelope."""
    return s.replace("'", "''")
 def _wrap_admin_sql_for_jobs_api(billing_project: str, inner_sql: str) -> str:
    """Build the COPY-source SQL that runs admin's `inner_sql` through
    the BigQuery jobs API via the DuckDB BQ extension's
    ``bigquery_query()`` table function.
    Why: the default `bq."ds"."t"` reference path uses the BQ Storage
    Read API which rejects non-base entities (views, materialized views).
    Routing through `bigquery_query()` uses the jobs API which accepts
    every entity type uniformly.
    Args:
        billing_project: GCP project ID that bills the BQ job. Must
            match the GCP project_id grammar — anything else is rejected
            as a defense-in-depth check (admin is trusted, but a typo
            should fail closed not silently lose budget to the wrong
            project).
        inner_sql: BigQuery-flavor SQL the admin registered as
            ``source_query``. Should be BigQuery-native; DuckDB-flavor
            `bq."ds"."t"` references are not rejected here but will fail at
            COPY time inside the BQ jobs API. Existing rows are converted by
            the v24 schema migration; new rows are validated upstream at
            register/PUT.
    Returns:
        A DuckDB-parseable SQL fragment suitable as the operand of
        ``COPY (...) TO 'path' (FORMAT PARQUET)``.
    """
    if not _BILLING_PROJECT_RE.match(billing_project):
        raise ValueError(
            f"billing_project {billing_project!r} is not a valid GCP project_id "
            "(grammar: ^[a-z][a-z0-9-]{4,28}[a-z0-9]$)"
        )
    return (
        f"SELECT * FROM bigquery_query('{billing_project}', "
        f"'{_escape_sql_string_literal(inner_sql)}')"
    )
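To make the wrapping concrete, the sketch below re-implements the two helpers under hypothetical names (`escape_sql_string_literal`, `wrap_for_jobs_api`) and runs them on an inner query that contains a single-quoted literal. The project and table names are illustrative only.

```python
import re

# Same grammar as _BILLING_PROJECT_RE in the diff.
BILLING_PROJECT_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


def escape_sql_string_literal(s: str) -> str:
    """Double every single quote so `s` can sit inside '...'."""
    return s.replace("'", "''")


def wrap_for_jobs_api(billing_project: str, inner_sql: str) -> str:
    """Build the SELECT that hands `inner_sql` to bigquery_query()."""
    if not BILLING_PROJECT_RE.match(billing_project):
        raise ValueError(f"invalid GCP project_id: {billing_project!r}")
    return (
        f"SELECT * FROM bigquery_query('{billing_project}', "
        f"'{escape_sql_string_literal(inner_sql)}')"
    )


inner = "SELECT id FROM `my-test-project.analytics.orders_v2` WHERE status = 'open'"
wrapped = wrap_for_jobs_api("my-test-project", inner)
print(wrapped)
# SELECT * FROM bigquery_query('my-test-project', 'SELECT id FROM
# `my-test-project.analytics.orders_v2` WHERE status = ''open''')
# (printed on one line; wrapped here for readability)
```

The doubled quotes around `open` are what keep the inner statement inside the outer string literal; an ID such as `Bad_Project` raises `ValueError` before any SQL is built.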
 def _create_meta_table(conn: duckdb.DuckDBPyConnection) -> None:
    """Create the _meta table required by the extract.duckdb contract."""
    conn.execute("DROP TABLE IF EXISTS _meta")
@ -321,33 +496,42 @@ def materialize_query(
    to `<output_dir>/data/<table_id>.parquet` atomically.
    Designed for `query_mode='materialized'` table_registry rows. The SQL
-    is admin-registered (validated upstream) and may reference DuckDB
-    three-part identifiers (`bq."dataset"."table"`) resolved by the
-    in-session ATTACH, OR native BQ identifiers via the `bigquery_query()`
-    table function — both work because the session has the bigquery
-    extension loaded with a SECRET token.
+    is admin-registered BQ-native SQL (DuckDB-flavor `bq."ds"."t"` refs are
+    validated upstream). The SQL is wrapped in `bigquery_query('<billing>',
+    '<inner>')` before the COPY so the BQ extension routes through the BQ
+    jobs API — the default Storage Read API path rejects non-base entities
+    (views, materialized views) with "non-table entities cannot be read with
+    the storage API". Routing through `bigquery_query()` works uniformly for
+    base tables and views alike.
    Cost guardrail: when `max_bytes` is a positive int, run a BQ dry-run
    via `bq.client()` first; raise `MaterializeBudgetError` if the
    estimate exceeds the cap. `max_bytes=None` or `max_bytes <= 0`
    disables the guardrail (config sentinel, see
-    `data_source.bigquery.max_bytes_per_materialize`).
-    Dry-run is best-effort and fail-open: if the SQL uses DuckDB syntax
-    that the native BQ client can't parse (e.g. `bq."ds"."t"`), the
-    dry-run raises and we log a warning; the COPY still runs. This
-    matches the BqAccess facade's "client is for native BQ SQL only"
-    contract — operators who need the cap to engage write the registered
-    SQL using native BQ identifiers (`\\`project.ds.t\\``).
+    `data_source.bigquery.max_bytes_per_materialize`). The dry-run operates
+    on the inner `sql` (BQ-native), not the wrapped form.
+    Dry-run is best-effort and fail-open: if the dry-run errors (transient
+    upstream failure, missing google lib), we log a warning and proceed
+    with the wrapped COPY.
    Atomic write: result lands in `<id>.parquet.tmp` first, then
    `os.replace` swaps it in. A failed COPY leaves no partial file behind.
    Concurrency: per-``table_id`` in-process mutex + advisory file lock
    on ``<table_id>.parquet.lock``. Overlapping calls for the same id
    raise ``MaterializeInFlightError`` immediately so the caller can
    skip cleanly without consuming the COPY budget twice. Stale file
    locks (mtime > ``materialize.lock_ttl_seconds``, default 24 h) are
    reclaimed automatically.
    Args:
        table_id: Logical id from table_registry; becomes the parquet
            filename. Must pass `validate_identifier()` so it can't
            inject path traversal.
-        sql: SELECT statement, no trailing semicolon.
+        sql: BQ-native SELECT statement, no trailing semicolon. Wrapped
            in `bigquery_query()` before the COPY — must not itself
            contain a `bigquery_query()` call.
        bq: A `BqAccess` instance — provides `duckdb_session()` for the
            COPY and `client()` for the dry-run.
        output_dir: Connector root, e.g. `/data/extracts/bigquery`.
@ -358,7 +542,10 @@ def materialize_query(
        {"rows": int, "size_bytes": int, "query_mode": "materialized"}
    Raises:
-        ValueError: if `table_id` is unsafe.
+        ValueError: if `table_id` is unsafe or `bq.projects.billing` fails
            the GCP project_id grammar check.
        MaterializeInFlightError: if a concurrent call for the same table_id
            is already in progress (in-process or cross-process).
        MaterializeBudgetError: if `max_bytes > 0` and dry-run estimate exceeds it.
        BqAccessError: from `bq.duckdb_session()` (auth_failed / bq_lib_missing /
            not_configured) — caller catches and aggregates into the trigger
@ -374,99 +561,114 @@ def materialize_query(
    parquet_path = data_dir / f"{table_id}.parquet"
    tmp_path = data_dir / f"{table_id}.parquet.tmp"
-    if tmp_path.exists():
-        tmp_path.unlink()
-    # Cost guardrail (best-effort — fail-open if dry-run can't parse the SQL).
-    if max_bytes is not None and max_bytes > 0:
+    lock_path = data_dir / f"{table_id}.parquet.lock"
+    proc_lock = _get_table_lock(table_id)
+    if not proc_lock.acquire(blocking=False):
+        raise MaterializeInFlightError(table_id, layer="process")
+    try:
+        file_lock = _try_acquire_file_lock(lock_path)
+        if file_lock is None:
+            raise MaterializeInFlightError(table_id, layer="file")
        try:
-            from app.api.v2_scan import _bq_dry_run_bytes  # reuse main's impl
-            estimated = _bq_dry_run_bytes(bq, sql)
-        except Exception as e:
-            logger.warning(
-                "BQ dry-run failed for materialize cost guardrail (fail-open): %s. "
-                "If the SQL uses DuckDB three-part names like bq.\"ds\".\"t\", "
-                "rewrite to native BQ identifiers (`project.ds.t`) for the "
-                "guardrail to engage. Proceeding with COPY.",
-                e,
-            )
-            estimated = 0
-        if estimated > max_bytes:
-            raise MaterializeBudgetError(
-                f"dry-run estimate {estimated:,} bytes exceeds cap "
-                f"{max_bytes:,} for table {table_id!r}",
-                table_id=table_id,
-                current=estimated,
-                limit=max_bytes,
-            )
-    # COPY through a BqAccess-managed session.
-    with bq.duckdb_session() as conn:
-        # ATTACH the data project — but only when no `bq` catalog is
-        # already attached. Production sessions (real BqAccess) come with
-        # only `:memory:` and need the ATTACH; test sessions pre-populate
-        # `bq` as a fixture catalog and would error on a redundant ATTACH
-        # (alias already in use) AND on the bigquery extension load when
-        # the test runner has no cached extension. Detecting via
-        # `duckdb_databases()` keeps the ATTACH path idempotent without
-        # swallowing real errors (auth, cross-project permission,
-        # malformed project_id) — those still propagate from the actual
-        # ATTACH call.
-        attached = {
-            r[0] for r in conn.execute(
-                "SELECT database_name FROM duckdb_databases()"
-            ).fetchall()
-        }
-        if "bq" not in attached:
-            conn.execute(
-                f"ATTACH 'project={bq.projects.data}' AS bq (TYPE bigquery, READ_ONLY)"
-            )
-        try:
-            safe_path = str(tmp_path).replace("'", "''")
-            conn.execute(f"COPY ({sql}) TO '{safe_path}' (FORMAT PARQUET)")
-            rows = conn.execute(
-                f"SELECT count(*) FROM read_parquet('{safe_path}')"
-            ).fetchone()[0]
-        except Exception:
-            if tmp_path.exists():
-                tmp_path.unlink()
-            raise
-    # Compute the parquet hash inline before the atomic swap. The caller used
-    # to re-read the file in `_run_materialized_pass` to hash it via
-    # `_file_hash`, but that's a synchronous full-read on the FastAPI worker
-    # thread — a 10 GiB parquet means 50+ seconds of disk I/O blocking other
-    # requests. Hashing here keeps the open-file handle hot from the COPY
-    # round and removes the second read. Devil's-advocate review item.
-    import hashlib
-    h = hashlib.md5()
-    with open(tmp_path, "rb") as f:
-        for chunk in iter(lambda: f.read(8192), b""):
-            h.update(chunk)
-    parquet_hash = h.hexdigest()
-    size_bytes = tmp_path.stat().st_size
-    os.replace(tmp_path, parquet_path)
-    rows = int(rows)
-    if rows == 0:
-        # 0 rows is indistinguishable from "the SQL is wrong and nobody
-        # noticed" — surface it loudly so operators see it in the scheduler
-        # log line and the per-row error aggregation. Caller decides whether
-        # to alert.
-        logger.warning(
-            "Materialized %s produced 0 rows — verify the SQL filter is "
-            "intentional. Parquet written: %s",
-            table_id, parquet_path,
-        )
-    return {
-        "rows": rows,
-        "size_bytes": size_bytes,
-        "query_mode": "materialized",
-        "hash": parquet_hash,
-    }
+            # Build the wrapped SQL once — both the cost guardrail dry-run and
+            # the COPY operate on `sql` (the inner BQ SQL); only the COPY needs
+            # the DuckDB-side bigquery_query() envelope.
+            billing_project = bq.projects.billing
+            wrapped_sql = _wrap_admin_sql_for_jobs_api(billing_project, sql)
+            if max_bytes is not None and max_bytes > 0:
+                try:
+                    from app.api.v2_scan import _bq_dry_run_bytes  # reuse main's impl
+                    estimated = _bq_dry_run_bytes(bq, sql)  # NB: pass inner SQL (BQ-native)
+                except Exception as e:
+                    logger.warning(
+                        "BQ dry-run failed for materialize cost guardrail (fail-open): %s. "
+                        "Proceeding with COPY against `bigquery_query()` wrapping.",
+                        e,
+                    )
+                    estimated = 0
+                if estimated > max_bytes:
+                    raise MaterializeBudgetError(
+                        f"dry-run estimate {estimated:,} bytes exceeds cap "
+                        f"{max_bytes:,} for table {table_id!r}",
+                        table_id=table_id,
+                        current=estimated,
+                        limit=max_bytes,
+                    )
+            # COPY through a BqAccess-managed session. The session has the BQ
+            # extension loaded with a SECRET token; bigquery_query() reuses that
+            # auth path against the billing_project for the jobs API call.
+            with bq.duckdb_session() as conn:
+                attached = {
+                    r[0] for r in conn.execute(
+                        "SELECT database_name FROM duckdb_databases()"
+                    ).fetchall()
+                }
+                if "bq" not in attached:
+                    conn.execute(
+                        f"ATTACH 'project={bq.projects.data}' AS bq (TYPE bigquery, READ_ONLY)"
+                    )
+                try:
+                    safe_path = _escape_sql_string_literal(str(tmp_path))
+                    conn.execute(
+                        f"COPY ({wrapped_sql}) TO '{safe_path}' (FORMAT PARQUET)"
+                    )
+                    rows = conn.execute(
+                        f"SELECT count(*) FROM read_parquet('{safe_path}')"
+                    ).fetchone()[0]
+                except Exception:
+                    if tmp_path.exists():
+                        tmp_path.unlink()
+                    raise
+            # Compute the parquet hash inline before the atomic swap. The caller used
+            # to re-read the file in `_run_materialized_pass` to hash it via
+            # `_file_hash`, but that's a synchronous full-read on the FastAPI worker
+            # thread — a 10 GiB parquet means 50+ seconds of disk I/O blocking other
+            # requests. Hashing here keeps the open-file handle hot from the COPY
+            # round and removes the second read. Devil's-advocate review item.
+            h = hashlib.md5()
+            with open(tmp_path, "rb") as f:
+                for chunk in iter(lambda: f.read(8192), b""):
+                    h.update(chunk)
+            parquet_hash = h.hexdigest()
+            size_bytes = tmp_path.stat().st_size
+            os.replace(tmp_path, parquet_path)
+            rows = int(rows)
+            if rows == 0:
+                # 0 rows is indistinguishable from "the SQL is wrong and nobody
+                # noticed" — surface it loudly so operators see it in the scheduler
+                # log line and the per-row error aggregation. Caller decides whether
+                # to alert.
+                logger.warning(
+                    "Materialized %s produced 0 rows — verify the SQL filter is "
+                    "intentional. Parquet written: %s",
+                    table_id, parquet_path,
+                )
+            return {
+                "rows": rows,
+                "size_bytes": size_bytes,
+                "query_mode": "materialized",
+                "hash": parquet_hash,
+            }
+        finally:
+            try:
+                file_lock.close()  # releases flock
+            except Exception:
+                pass
+            # Don't unlink lock_path — its mtime is the TTL signal for
+            # the next reclaim. Leaving it in place is intentional.
+    finally:
+        proc_lock.release()
 def _resolve_bq_project_id() -> str:
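The hash-then-swap step at the end of `materialize_query` can be tried without DuckDB or BigQuery. The sketch below assumes a hypothetical helper `publish_atomically` and uses stand-in bytes instead of a real parquet file. It shows the chunked MD5 read followed by `os.replace`, which is atomic when both paths are on the same filesystem.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def publish_atomically(tmp_path: Path, final_path: Path) -> dict:
    """Hash `tmp_path` in 8 KiB chunks, then swap it into place."""
    h = hashlib.md5()
    with open(tmp_path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    size_bytes = tmp_path.stat().st_size
    os.replace(tmp_path, final_path)  # readers see the old or new file, never a partial one
    return {"size_bytes": size_bytes, "hash": h.hexdigest()}


with tempfile.TemporaryDirectory() as d:
    tmp = Path(d) / "orders.parquet.tmp"
    final = Path(d) / "orders.parquet"
    payload = b"x" * 20000  # stand-in bytes, not a real parquet file
    tmp.write_bytes(payload)

    info = publish_atomically(tmp, final)
    tmp_gone = not tmp.exists()
    final_bytes = final.read_bytes()
```

Reading in fixed-size chunks keeps memory flat regardless of file size, and hashing before the swap means the returned hash always describes the file that was published.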
--- a/pyproject.toml
+++ b/pyproject.toml
@ -1,6 +1,6 @@
 [project]
 name = "agnes-the-ai-analyst"
-version = "0.32.0"
+version = "0.33.0"
 description = "Agnes — AI Data Analyst platform for AI analytical systems"
 requires-python = ">=3.11,<3.14"
 license = "MIT"
--- a/src/db.py
+++ b/src/db.py
@ -39,7 +39,7 @@ def _maybe_instrument(con, db_tag: str):
 _SAFE_IDENTIFIER = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]{0,63}$")
-SCHEMA_VERSION = 23
+SCHEMA_VERSION = 24
 _SYSTEM_SCHEMA = """
 CREATE TABLE IF NOT EXISTS schema_version (
@ -1682,6 +1682,81 @@ _V22_TO_V23_MIGRATIONS = [
 ]
 # v24: rewrite materialized BQ source_query from DuckDB-flavor
 # (bq."<dataset>"."<table>") to BigQuery-native (`<project>.<dataset>.<table>`)
 # so the new connectors.bigquery.extractor.materialize_query wrapping
 # path (which routes through bigquery_query() / BQ jobs API) accepts
 # them. Pre-v24, materialize used Storage Read API for the bq.<ds>.<tbl>
 # form, which fails for views — see PR for full motivation.
 #
 # This migration is implemented in Python (not pure SQL) because the
 # rewrite is a regex-and-replace per row: the project_id comes from
 # instance_config (file/env), not the DB. SQL alone can't pull the
 # project_id and substitute it. If the project isn't configured at
 # migration time, log a warning per affected row and leave them — the
 # operator must configure data_source.bigquery.project, restart, and
 # the migration will fire on next start (idempotent).
 def _replace_for_v24(project_id: str):
    """Build a re.sub replacement function (not a string) so backslash
    sequences in `project_id` aren't interpreted as group references.
    GCP project IDs can't actually contain backslashes, but using a
    function-form replacement is the defensive idiom — it makes the
    intent explicit and removes the dependency on re.sub's replacement-
    string escaping rules."""
    def _repl(m):
        return f"`{project_id}.{m.group(1)}.{m.group(2)}`"
    return _repl
 def _v23_to_v24_finalize(conn: duckdb.DuckDBPyConnection) -> None:
    import re as _re
    try:
        from app.instance_config import get_value
        project_id = get_value("data_source", "bigquery", "project", default="") or ""
    except Exception:
        project_id = ""
    pattern = _re.compile(r'bq\."([^"]+)"\."([^"]+)"')
    rows = conn.execute(
        "SELECT id, source_query FROM table_registry "
        "WHERE query_mode = 'materialized' "
        "AND source_query LIKE '%bq.\"%' "
        "AND source_type = 'bigquery'"
    ).fetchall()
    if not rows:
        return  # Nothing to migrate; skip the transaction.
    conn.execute("BEGIN TRANSACTION")
    try:
        for row_id, sq in rows:
            if sq is None:
                continue
            if not project_id:
                logger.warning(
                    "v24 migration: skipping rewrite of source_query for row %r — "
                    "data_source.bigquery.project is not configured. Set it via "
                    "/admin/server-config and restart the app to retry the "
                    "migration.", row_id,
                )
                continue
            new_sq = pattern.sub(_replace_for_v24(project_id), sq)
            if new_sq != sq:
                conn.execute(
                    "UPDATE table_registry SET source_query = ? WHERE id = ?",
                    [new_sq, row_id],
                )
                logger.info(
                    "v24 migration: rewrote source_query for row %r", row_id,
                )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
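The per-row rewrite can be checked on a sample string without a database. The sketch below reuses the same regular expression and the function-form replacement, under the hypothetical names `PATTERN` and `make_replacement`; the query and project ID are illustrative only.

```python
import re

# Same pattern as in _v23_to_v24_finalize: bq."<dataset>"."<table>"
PATTERN = re.compile(r'bq\."([^"]+)"\."([^"]+)"')


def make_replacement(project_id: str):
    """Return a re.sub callback that emits `project.dataset.table`."""
    def _repl(m: re.Match) -> str:
        return f"`{project_id}.{m.group(1)}.{m.group(2)}`"
    return _repl


old_sq = (
    'SELECT a.id FROM bq."analytics"."orders_v2" a '
    'JOIN bq."crm"."users" u ON a.uid = u.id'
)
new_sq = PATTERN.sub(make_replacement("my-test-project"), old_sq)
print(new_sq)

# Running the rewrite again changes nothing, which is what makes the
# migration safe to retry.
again = PATTERN.sub(make_replacement("my-test-project"), new_sq)
```

Every `bq."ds"."t"` occurrence in the statement is rewritten, not only the first, and the rewritten form no longer matches the pattern.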
 def _ensure_schema(conn: duckdb.DuckDBPyConnection) -> None:
    """Create tables if they don't exist. Apply migrations if schema version changed.
@ -1837,6 +1912,8 @@ def _ensure_schema(conn: duckdb.DuckDBPyConnection) -> None:
            if current < 23:
                for sql in _V22_TO_V23_MIGRATIONS:
                    conn.execute(sql)
            if current < 24:
                _v23_to_v24_finalize(conn)
            conn.execute(
                "UPDATE schema_version SET version = ?, applied_at = current_timestamp",
                [SCHEMA_VERSION],
--- a/tests/conftest.py
+++ b/tests/conftest.py
@ -2,6 +2,7 @@
 import os
 from pathlib import Path
 from unittest.mock import MagicMock
 import duckdb
 import pytest
@ -319,3 +320,68 @@ from tests.fixtures.analyst_bootstrap import (  # noqa: E402,F401
    web_session,
    zero_grants_workspace,
 )
@pytest.fixture
 def bq_instance(monkeypatch):
    """Force instance.yaml to look like a BigQuery deployment for the
    duration of one test. Patches the cached load_instance_config so that
    /admin/server-config reads and
    get_value("data_source", "bigquery", "project") return the fake values
    without touching the on-disk instance.yaml.
    Tests that need BigQuery-specific admin API behaviour (project_id
    validation, materialized source_query checks, etc.) depend on this
    fixture. Yields the fake config dict so callers can inspect it.
    Note: several test files (test_admin_bq_register.py,
    test_admin_tables_ui_materialized.py, …) define their own local
    ``bq_instance`` fixture. Those local definitions shadow this one
    inside those files; the conftest copy is the canonical provider for
    any new test file, which picks it up through pytest's conftest
    discovery without an import."""
    fake_cfg = {
        "data_source": {
            "type": "bigquery",
            "bigquery": {"project": "my-test-project", "location": "us"},
        },
    }
    monkeypatch.setattr(
        "app.instance_config.load_instance_config",
        lambda: fake_cfg,
        raising=False,
    )
    from app.instance_config import reset_cache
    reset_cache()
    yield fake_cfg
    reset_cache()
@pytest.fixture
 def stub_bq_extractor(monkeypatch):
    """Mirror tests/test_admin_bq_register.py — bypasses real-BQ traffic
    in the post-register rebuild path so the test stays offline. Required
    whenever the test seeds a remote-mode BQ row via the HTTP API.
    Patches:
    - ``connectors.bigquery.extractor.rebuild_from_registry`` — returns a
      minimal success dict so the admin register endpoint's 200/201 path
      completes without touching a real BQ project.
    - ``src.orchestrator.SyncOrchestrator`` — replaced with a no-op mock so
      the post-register orchestrator.rebuild() call doesn't scan the
      (empty) extracts directory during tests.
    Returns the ``rebuild_from_registry`` MagicMock directly so callers
    that only need the side-effect patcher can ignore the return value,
    and callers that want to assert call args can inspect it."""
    rebuild_mock = MagicMock(return_value={
        "project_id": "my-test-project",
        "tables_registered": 1, "errors": [], "skipped": False,
    })
    monkeypatch.setattr(
        "connectors.bigquery.extractor.rebuild_from_registry",
        rebuild_mock,
    )
    monkeypatch.setattr(
        "src.orchestrator.SyncOrchestrator",
        lambda *a, **kw: MagicMock(),
    )
    return rebuild_mock
--- a/tests/test_admin_register_materialized_server_generated_sql.py
+++ b/tests/test_admin_register_materialized_server_generated_sql.py
@ -0,0 +1,90 @@
 """When admin registers a materialized BQ row with bucket+source_table
 but NO source_query, the server generates the source_query from the
 configured BQ project + the supplied bucket/source_table. Admin never
 has to know about bigquery_query() syntax for the trivial full-table
 dump case.
 Fixtures `seeded_app`, `bq_instance`, `stub_bq_extractor` are auto-
 discovered from `tests/conftest.py` — DO NOT import. `seeded_app`
 is a dict: `{"client": TestClient, "admin_token": str, ...}`.
 """
 from __future__ import annotations
 import pytest
 def _auth(token: str) -> dict:
    """Mirror the project's local _auth helper used in every materialized
    test file (e.g. test_api_admin_materialized.py)."""
    return {"Authorization": f"Bearer {token}"}
 def test_register_materialized_with_bucket_only_generates_source_query(
    seeded_app, bq_instance, stub_bq_extractor,
 ):
    client = seeded_app["client"]
    headers = _auth(seeded_app["admin_token"])
    payload = {
        "name": "trivial_full_dump",
        "source_type": "bigquery",
        "query_mode": "materialized",
        "bucket": "analytics",
        "source_table": "orders_v2",
    }
    resp = client.post("/api/admin/register-table", json=payload, headers=headers)
    assert resp.status_code in (200, 201, 202), resp.text
    reg = client.get("/api/admin/registry", headers=headers).json()
    row = next(t for t in reg["tables"] if t["id"] == "trivial_full_dump")
    expected_project = bq_instance["data_source"]["bigquery"]["project"]
    assert row["source_query"] == (
        f"SELECT * FROM `{expected_project}.analytics.orders_v2`"
    )
 def test_register_materialized_with_explicit_source_query_persists_verbatim(
    seeded_app, bq_instance, stub_bq_extractor,
 ):
    client = seeded_app["client"]
    headers = _auth(seeded_app["admin_token"])
    custom = "SELECT col1, col2 FROM `analytics.orders_v2` WHERE col3 = 'x'"
    payload = {
        "name": "explicit_sql",
        "source_type": "bigquery",
        "query_mode": "materialized",
        "source_query": custom,
    }
    resp = client.post("/api/admin/register-table", json=payload, headers=headers)
    assert resp.status_code in (200, 201, 202), resp.text
    reg = client.get("/api/admin/registry", headers=headers).json()
    row = next(t for t in reg["tables"] if t["id"] == "explicit_sql")
    assert row["source_query"] == custom
 def test_put_flip_to_materialized_with_bucket_generates_source_query(
    seeded_app, bq_instance, stub_bq_extractor,
 ):
    client = seeded_app["client"]
    headers = _auth(seeded_app["admin_token"])
    # First register as remote.
    client.post(
        "/api/admin/register-table",
        json={"name": "flip_t", "source_type": "bigquery",
              "bucket": "analytics", "source_table": "orders_v2"},
        headers=headers,
    )
    # PUT to flip to materialized without supplying source_query.
    resp = client.put(
        "/api/admin/registry/flip_t",
        json={"query_mode": "materialized"},
        headers=headers,
    )
    assert resp.status_code == 200, resp.text
    reg = client.get("/api/admin/registry", headers=headers).json()
    row = next(t for t in reg["tables"] if t["id"] == "flip_t")
    expected_project = bq_instance["data_source"]["bigquery"]["project"]
    assert row["source_query"] == (
        f"SELECT * FROM `{expected_project}.analytics.orders_v2`"
    )
--- a/tests/test_admin_server_config_materialize_section.py
+++ b/tests/test_admin_server_config_materialize_section.py
@ -0,0 +1,179 @@
 """/api/admin/server-config exposes materialize.lock_ttl_seconds and
 accepts updates. Default is 86400 (24h).
 Fixture `seeded_app` is auto-discovered from `tests/conftest.py` —
 DO NOT import. It returns a dict: `{"client": TestClient,
 "admin_token": str, ...}`. Auth helper `_auth(token)` mirrors the
 project's local pattern (also used in test_api_admin_materialized.py).
 Behaviour contract:
  - GET returns `materialize` section in `sections` (empty dict when no
    override is set, since the endpoint surfaces every editable section).
  - GET also exposes the known_fields registry entry for `materialize`
    with `lock_ttl_seconds` spec (kind=int, default=86400).
  - POST with a valid value persists it and GET returns the new value.
  - POST with lock_ttl_seconds < 60 or > 604800 is rejected with 422.
 """
 from __future__ import annotations
 import pytest
 import yaml
 def _auth(token: str) -> dict:
    return {"Authorization": f"Bearer {token}"}
 # ---------------------------------------------------------------------------
 # GET — default state
 # ---------------------------------------------------------------------------
 def test_get_returns_materialize_in_editable_sections(seeded_app):
    """materialize must appear in editable_sections."""
    client = seeded_app["client"]
    headers = _auth(seeded_app["admin_token"])
    resp = client.get("/api/admin/server-config", headers=headers)
    assert resp.status_code == 200
    body = resp.json()
    assert "materialize" in body["editable_sections"]
 def test_get_returns_materialize_section_key(seeded_app):
    """materialize key appears in sections (empty dict when no override set)."""
    client = seeded_app["client"]
    headers = _auth(seeded_app["admin_token"])
    resp = client.get("/api/admin/server-config", headers=headers)
    assert resp.status_code == 200
    body = resp.json()
    # The endpoint surfaces every editable section so the UI can render it.
    assert "materialize" in body["sections"]
 def test_get_returns_materialize_known_fields(seeded_app):
    """known_fields must have a materialize.lock_ttl_seconds entry."""
    client = seeded_app["client"]
    headers = _auth(seeded_app["admin_token"])
    resp = client.get("/api/admin/server-config", headers=headers)
    assert resp.status_code == 200
    body = resp.json()
    mat_fields = body.get("known_fields", {}).get("materialize", {})
    assert "lock_ttl_seconds" in mat_fields, body.get("known_fields", {})
    spec = mat_fields["lock_ttl_seconds"]
    assert spec["kind"] == "int"
    assert spec["default"] == 86400
 # ---------------------------------------------------------------------------
 # POST — update and read back
 # ---------------------------------------------------------------------------
 def test_put_updates_materialize_lock_ttl(seeded_app, tmp_path, monkeypatch):
    """POST with a valid value persists; GET reflects the new value."""
    monkeypatch.setenv("DATA_DIR", str(tmp_path))
    state = tmp_path / "state"
    state.mkdir(parents=True, exist_ok=True)
    import app.instance_config as ic
    ic._instance_config = None
    try:
        client = seeded_app["client"]
        headers = _auth(seeded_app["admin_token"])
        resp = client.post(
            "/api/admin/server-config",
            json={"sections": {"materialize": {"lock_ttl_seconds": 3600}}},
            headers=headers,
        )
        assert resp.status_code == 200, resp.text
        # Verify on disk.
        loaded = yaml.safe_load((state / "instance.yaml").read_text())
        assert loaded["materialize"]["lock_ttl_seconds"] == 3600
        # Verify GET reflects the new value.
        ic._instance_config = None
        resp2 = client.get("/api/admin/server-config", headers=headers)
        assert resp2.json()["sections"]["materialize"]["lock_ttl_seconds"] == 3600
    finally:
        ic._instance_config = None
 # ---------------------------------------------------------------------------
 # POST — validation
 # ---------------------------------------------------------------------------
 def test_invalid_lock_ttl_below_min_rejected(seeded_app):
    """A negative lock_ttl_seconds (below the 60s floor) is rejected with 422."""
    client = seeded_app["client"]
    headers = _auth(seeded_app["admin_token"])
    resp = client.post(
        "/api/admin/server-config",
        json={"sections": {"materialize": {"lock_ttl_seconds": -5}}},
        headers=headers,
    )
    assert resp.status_code == 422, resp.text
 def test_invalid_lock_ttl_zero_rejected(seeded_app):
    """lock_ttl_seconds=0 is rejected with 422 (below the 60s floor)."""
    client = seeded_app["client"]
    headers = _auth(seeded_app["admin_token"])
    resp = client.post(
        "/api/admin/server-config",
        json={"sections": {"materialize": {"lock_ttl_seconds": 0}}},
        headers=headers,
    )
    assert resp.status_code == 422, resp.text
 def test_invalid_lock_ttl_above_max_rejected(seeded_app):
    """lock_ttl_seconds > 604800 (1 week) is rejected with 422."""
    client = seeded_app["client"]
    headers = _auth(seeded_app["admin_token"])
    resp = client.post(
        "/api/admin/server-config",
        json={"sections": {"materialize": {"lock_ttl_seconds": 604801}}},
        headers=headers,
    )
    assert resp.status_code == 422, resp.text
 def test_valid_lock_ttl_boundary_min_accepted(seeded_app, tmp_path, monkeypatch):
    """lock_ttl_seconds=60 (minimum) is accepted."""
    monkeypatch.setenv("DATA_DIR", str(tmp_path))
    state = tmp_path / "state"
    state.mkdir(parents=True, exist_ok=True)
    import app.instance_config as ic
    ic._instance_config = None
    try:
        client = seeded_app["client"]
        headers = _auth(seeded_app["admin_token"])
        resp = client.post(
            "/api/admin/server-config",
            json={"sections": {"materialize": {"lock_ttl_seconds": 60}}},
            headers=headers,
        )
        assert resp.status_code == 200, resp.text
    finally:
        ic._instance_config = None
 def test_valid_lock_ttl_boundary_max_accepted(seeded_app, tmp_path, monkeypatch):
    """lock_ttl_seconds=604800 (maximum) is accepted."""
    monkeypatch.setenv("DATA_DIR", str(tmp_path))
    state = tmp_path / "state"
    state.mkdir(parents=True, exist_ok=True)
    import app.instance_config as ic
    ic._instance_config = None
    try:
        client = seeded_app["client"]
        headers = _auth(seeded_app["admin_token"])
        resp = client.post(
            "/api/admin/server-config",
            json={"sections": {"materialize": {"lock_ttl_seconds": 604800}}},
            headers=headers,
        )
        assert resp.status_code == 200, resp.text
    finally:
        ic._instance_config = None
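Reviewer annotation (not part of the commit): the boundary tests above pin the accepted range for `materialize.lock_ttl_seconds` at 60 to 604800 seconds inclusive. A minimal sketch of an equivalent check follows. `validate_lock_ttl_seconds` and both constant names are hypothetical; the real endpoint presumably enforces the range in its own request model and maps a failure to HTTP 422.

```python
# Hypothetical names; only the 60 / 604800 bounds come from the tests above.
LOCK_TTL_MIN_SECONDS = 60          # floor asserted by the boundary tests
LOCK_TTL_MAX_SECONDS = 604_800     # one week, the ceiling asserted by the tests


def validate_lock_ttl_seconds(value: int) -> int:
    """Return value unchanged if it is an int within the inclusive range."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("lock_ttl_seconds must be an integer")
    if not LOCK_TTL_MIN_SECONDS <= value <= LOCK_TTL_MAX_SECONDS:
        raise ValueError(
            f"lock_ttl_seconds must be between {LOCK_TTL_MIN_SECONDS} "
            f"and {LOCK_TTL_MAX_SECONDS}, got {value}"
        )
    return value
```

Both endpoints of the range are accepted, which mirrors the `boundary_min_accepted` and `boundary_max_accepted` tests.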
--- a/tests/test_admin_validator_backtick_relaxed_for_materialized.py
+++ b/tests/test_admin_validator_backtick_relaxed_for_materialized.py
@ -0,0 +1,39 @@
 """Backtick-quoted identifiers are required in a materialized BQ source_query
 when the project, dataset, or table name contains a dash. The validator must
 allow them on materialized rows but still reject them on remote/local rows."""
 from __future__ import annotations
 import pytest
 from pydantic import ValidationError
 from app.api.admin import RegisterTableRequest
 def test_materialized_accepts_backticks():
    req = RegisterTableRequest(
        name="b1",
        source_type="bigquery",
        query_mode="materialized",
        source_query="SELECT * FROM `my-project.ds.tbl`",
    )
    assert req.source_query == "SELECT * FROM `my-project.ds.tbl`"
 def test_remote_rejects_backticks():
    with pytest.raises(ValidationError):
        RegisterTableRequest(
            name="r1",
            source_type="bigquery",
            query_mode="remote",
            bucket="ds", source_table="tbl",
            source_query="SELECT * FROM `prj.ds.tbl`",
        )
 def test_local_rejects_backticks():
    with pytest.raises(ValidationError):
        RegisterTableRequest(
            name="l1",
            source_type="keboola",
            query_mode="local",
            source_query="SELECT * FROM `kbc.ds.tbl`",
        )
--- a/tests/test_api_admin_materialized.py
+++ b/tests/test_api_admin_materialized.py
@ -18,8 +18,6 @@ Covers PR #145 (re-implementation against 0.24.0 base):
 Shares the seeded_app + bq_instance fixtures from conftest /
 test_admin_bq_register.py for parity with the existing BQ test surface.
 """
 from unittest.mock import MagicMock
 import pytest
@ -27,59 +25,15 @@ def _auth(token):
    return {"Authorization": f"Bearer {token}"}
@pytest.fixture
 def stub_bq_extractor(monkeypatch):
    """Mirror tests/test_admin_bq_register.py — bypasses real-BQ traffic
    in the post-register rebuild path so the test stays offline. Required
    whenever the test seeds a remote-mode BQ row via the HTTP API."""
    rebuild_mock = MagicMock(return_value={
        "project_id": "my-test-project",
        "tables_registered": 1, "errors": [], "skipped": False,
    })
    monkeypatch.setattr(
        "connectors.bigquery.extractor.rebuild_from_registry",
        rebuild_mock,
    )
    monkeypatch.setattr(
        "src.orchestrator.SyncOrchestrator",
        lambda *a, **kw: MagicMock(),
    )
    return rebuild_mock
@pytest.fixture
 def bq_instance(monkeypatch):
    """Force instance.yaml to look like a BigQuery deployment.
    Mirrors tests/test_admin_bq_register.py::bq_instance so the
    project_id read inside _validate_bigquery_register_payload succeeds.
    """
    fake_cfg = {
        "data_source": {
            "type": "bigquery",
            "bigquery": {"project": "my-test-project", "location": "us"},
        },
    }
    monkeypatch.setattr(
        "app.instance_config.load_instance_config",
        lambda: fake_cfg,
        raising=False,
    )
    from app.instance_config import reset_cache
    reset_cache()
    yield fake_cfg
    reset_cache()
 def _materialized_payload(**overrides):
    p = {
        "name": "orders_90d",
        "source_type": "bigquery",
        "query_mode": "materialized",
-        # DuckDB-flavor SQL (not BQ-native backticks) — the materialize path
+        # BQ-native or DuckDB-flavor SQL — both accepted since Task 2 wraps
-        # runs the SQL through the DuckDB BQ extension's COPY which uses
+        # materialized SQL in bigquery_query() (BQ jobs API path). Backtick
-        # double-quoted identifiers. Backticks are now rejected at register
+        # identifiers are now allowed for materialized rows; remote/local rows
-        # time. See `_BACKTICK_REJECTION_MESSAGE` in app/api/admin.py.
+        # still require DuckDB-flavor (double-quoted) identifiers.
        "source_query": 'SELECT date FROM bq."ds"."orders"',
        "sync_schedule": "every 6h",
    }
@ -326,36 +280,44 @@ def test_register_materialized_persists_source_query_in_registry(seeded_app, bq_
    assert "WHERE x = 1" in row["source_query"]
-# --- Backtick (BigQuery-native) source_query rejection -----------------------
+# --- Backtick (BigQuery-native) source_query handling ------------------------
 #
-# DuckDB BQ extension's COPY path interprets the SQL through DuckDB's parser,
+# Task 2 (materialize-sync-fix) changed the BQ materialization path to run
-# which does NOT understand backtick-quoted identifiers (it uses double quotes
+# admin SQL through the BQ jobs API (bigquery_query() wrapper) rather than
-# for quoted identifiers). A registered backtick-style source_query like
+# through DuckDB's BQ extension COPY path. BQ-native SQL requires backticks
-# `SELECT * FROM \`prj.ds.t\`` either parse-errors or returns 0 rows at next
+# for dashed project/dataset/table identifiers. The backtick guard has been
-# materialize tick — silently — and no parquet ends up at the canonical path.
+# relaxed for ALL materialized rows: the validator now only rejects backticks
-# Reject at registration time with an actionable message.
+# for remote/local rows (DuckDB-flavor SQL contract). Materialized rows must
 # be allowed to carry backticks so operators can reference dashed identifiers.
 # See test_admin_validator_backtick_relaxed_for_materialized.py for the
 # model-layer unit tests.
-def test_register_materialized_rejects_backtick_source_query(seeded_app, bq_instance):
+def test_register_materialized_accepts_backtick_source_query(seeded_app, bq_instance, stub_bq_extractor):
    """BQ materialized rows now accept BQ-native backtick syntax; the
    materialize path (Task 2) wraps them in bigquery_query() which uses
    the BQ jobs API — not DuckDB's COPY — so backticks are valid."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    r = c.post(
        "/api/admin/register-table",
        json=_materialized_payload(
            name="bt_native",
-            source_query="SELECT * FROM `prj-grp.ds.product_inventory`",
+            source_query="SELECT * FROM `my-project.ds.product_inventory`",
        ),
        headers=_auth(token),
    )
-    assert r.status_code == 422, r.json()
+    assert r.status_code in (200, 201, 202), r.json()
-    detail = str(r.json().get("detail", "")).lower()
+    reg = c.get("/api/admin/registry", headers=_auth(token)).json()
-    assert "backtick" in detail
+    row = next(t for t in reg["tables"] if t["id"] == "bt_native")
-    assert 'bq."' in detail or "duckdb" in detail
+    assert row["source_query"] == "SELECT * FROM `my-project.ds.product_inventory`"
-def test_update_materialized_rejects_backtick_source_query(
+def test_update_materialized_accepts_backtick_source_query(
    seeded_app, bq_instance, stub_bq_extractor,
 ):
    """PUT to a materialized BQ row may switch source_query to BQ-native
    backtick form — accepted now that Task 2 wraps via jobs API."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
@ -370,7 +332,7 @@ def test_update_materialized_rejects_backtick_source_query(
    assert r.status_code == 201, r.json()
    table_id = r.json()["id"]
-    # PATCH the source_query to a backtick form — must be rejected.
+    # PUT the source_query in BQ-native backtick form; this is now accepted.
    r2 = c.put(
        f"/api/admin/registry/{table_id}",
        json={
@ -379,14 +341,17 @@ def test_update_materialized_rejects_backtick_source_query(
        },
        headers=_auth(token),
    )
-    assert r2.status_code == 422, r2.json()
+    assert r2.status_code == 200, r2.json()
-    detail = str(r2.json().get("detail", "")).lower()
+    reg = c.get("/api/admin/registry", headers=_auth(token)).json()
-    assert "backtick" in detail
+    row = next(t for t in reg["tables"] if t["id"] == table_id)
    assert row["source_query"] == "SELECT * FROM `prj.ds.t`"
-def test_register_materialized_keboola_rejects_backtick_source_query(seeded_app):
+def test_register_materialized_keboola_accepts_backtick_source_query(seeded_app):
-    """The check is generic, not BQ-only — Keboola materialized rows that
+    """Keboola materialized rows also accept backtick source_query at register
-    include backticks would also be silently skipped at materialize time."""
+    time — the backtick guard now only applies to remote/local rows. If the
    SQL is invalid at runtime (DuckDB parse error), that surfaces as a sync
    error, not a registration error."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    r = c.post(
@ -399,9 +364,7 @@ def test_register_materialized_keboola_rejects_backtick_source_query(seeded_app)
        },
        headers=_auth(token),
    )
-    assert r.status_code == 422, r.json()
+    assert r.status_code == 201, r.json()
-    detail = str(r.json().get("detail", "")).lower()
-    assert "backtick" in detail
 # --- Surface materialize errors per-row ---------------------------------------
--- a/tests/test_bq_cost_guardrail.py
+++ b/tests/test_bq_cost_guardrail.py
@ -18,7 +18,13 @@ from connectors.bigquery.extractor import materialize_query, MaterializeBudgetEr
 def _bq_with_seed(tables: dict[str, str] | None = None) -> BqAccess:
    """Stub BqAccess seeded with in-memory tables (same recipe as
-    test_bq_materialize)."""
+    test_bq_materialize).
    A `bigquery_query(project, sql_text)` table macro is registered so the
    wrapping added by `_wrap_admin_sql_for_jobs_api` (Task 2 — routes COPY
    through the BQ jobs API for views) resolves against the in-memory tables
    without needing the real BQ extension.
    """
    tables = tables or {}
    @contextmanager
@ -30,6 +36,12 @@ def _bq_with_seed(tables: dict[str, str] | None = None) -> BqAccess:
                conn.execute(f"CREATE SCHEMA IF NOT EXISTS {s}")
            for ref, body in tables.items():
                conn.execute(f"CREATE OR REPLACE TABLE {ref} AS {body}")
            # Stub bigquery_query() so materialize_query's wrapped COPY works
            # against the in-memory bq catalog without the real BQ extension.
            conn.execute(
                "CREATE OR REPLACE MACRO bigquery_query(project, sql_text) "
                "AS TABLE SELECT * FROM query(sql_text)"
            )
            yield conn
        finally:
            conn.close()
@ -116,22 +128,26 @@ def test_zero_max_bytes_skips_dry_run(tmp_path):
    assert stats["rows"] == 1
-def test_dry_run_failure_is_fail_open(tmp_path):
+def test_dry_run_failure_is_fail_open(tmp_path, caplog):
    """If the dry-run errors (DuckDB syntax, missing google lib, transient
    upstream failure) we don't block — log + proceed with COPY. Operators
    who need hard-fail watch logs for the warning."""
    import logging
    out = tmp_path / "extracts" / "bigquery"
    out.mkdir(parents=True)
    bq = _bq_with_seed({"bq.test.tiny": "SELECT 1 AS n"})
-    with patch(
+    with caplog.at_level(logging.WARNING, logger="connectors.bigquery.extractor"):
-        "app.api.v2_scan._bq_dry_run_bytes", side_effect=RuntimeError("boom")
+        with patch(
-    ):
+            "app.api.v2_scan._bq_dry_run_bytes", side_effect=RuntimeError("boom")
-        stats = materialize_query(
+        ):
-            table_id="t1",
+            stats = materialize_query(
-            sql="SELECT * FROM bq.test.tiny",
+                table_id="t1",
-            bq=bq,
+                sql="SELECT * FROM bq.test.tiny",
-            output_dir=str(out),
+                bq=bq,
-            max_bytes=10 * 2**30,
+                output_dir=str(out),
-        )
+                max_bytes=10 * 2**30,
            )
    assert stats["rows"] == 1
    assert "fail-open" in caplog.text
--- a/tests/test_bq_materialize.py
+++ b/tests/test_bq_materialize.py
@ -21,6 +21,11 @@ def _make_stub_bq(tables: dict[str, str] | None = None) -> BqAccess:
    with a pretend `bq` catalog containing test tables. `tables` maps
    DuckDB-three-part references like `'bq.test.orders'` to a SELECT
    expression to seed them with.
    A `bigquery_query(project, sql_text)` table macro is registered so the
    wrapping added by `_wrap_admin_sql_for_jobs_api` (Task 2 — routes COPY
    through the BQ jobs API for views) resolves against the in-memory tables
    without needing the real BQ extension.
    """
    tables = tables or {}
@ -34,6 +39,12 @@ def _make_stub_bq(tables: dict[str, str] | None = None) -> BqAccess:
                conn.execute(f"CREATE SCHEMA IF NOT EXISTS {s}")
            for ref, body in tables.items():
                conn.execute(f"CREATE OR REPLACE TABLE {ref} AS {body}")
            # Stub bigquery_query() so materialize_query's wrapped COPY works
            # against the in-memory bq catalog without the real BQ extension.
            conn.execute(
                "CREATE OR REPLACE MACRO bigquery_query(project, sql_text) "
                "AS TABLE SELECT * FROM query(sql_text)"
            )
            yield conn
        finally:
            conn.close()
--- a/tests/test_bq_materialize_concurrency.py
+++ b/tests/test_bq_materialize_concurrency.py
@ -0,0 +1,204 @@
 """Per-table_id concurrency: in-process mutex + advisory file lock with
 TTL reclaim. Two overlapping materialize_query calls for the same id
 must NOT corrupt each other's parquet."""
 from __future__ import annotations
 import os
 import threading
 import time
 from pathlib import Path
 from unittest.mock import MagicMock, patch
 import pytest
 from connectors.bigquery.extractor import (
    materialize_query,
    MaterializeInFlightError,
    _get_table_lock,
    _LOCK_TTL_DEFAULT_SECONDS,
 )
@pytest.fixture(autouse=True)
 def reset_locks(monkeypatch):
    # Tests must not share lock state across runs.
    import connectors.bigquery.extractor as mod
    monkeypatch.setattr(mod, "_table_locks", {})
    yield
 def _slow_bq(stall_seconds: float = 1.0):
    """Build a fake BqAccess whose duckdb_session COPY blocks for
    `stall_seconds` so we can race a second call against it."""
    bq = MagicMock()
    bq.projects.billing = "prj-billing"
    bq.projects.data = "prj-data"
    class _Session:
        def __enter__(self):
            return self
        def __exit__(self, *a):
            return False
        def execute(self, sql):
            if sql.startswith("SELECT database_name"):
                class _R:
                    def fetchall(self):
                        return [("memory",)]
                return _R()
            if sql.startswith("ATTACH"):
                return MagicMock()
            if sql.startswith("COPY"):
                # Simulate a long-running COPY by writing a stub parquet
                # then sleeping so a second call can race us.
                # Extract the path from the COPY statement.
                import re
                m = re.search(r"TO '([^']+)'", sql)
                assert m
                Path(m.group(1)).write_bytes(b"PARQUET_STUB_HEADER" + b"\x00" * 200)
                time.sleep(stall_seconds)
                return MagicMock()
            if sql.startswith("SELECT count"):
                class _R:
                    def fetchone(self):
                        return (42,)
                return _R()
            return MagicMock()
    bq.duckdb_session.return_value = _Session()
    return bq
 def test_concurrent_calls_for_same_id_raise_in_flight(tmp_path):
    bq = _slow_bq(stall_seconds=2.0)
    out_dir = str(tmp_path)
    captured: list = []
    def runner(tag):
        try:
            r = materialize_query(
                table_id="t1", sql="SELECT 1",
                bq=bq, output_dir=out_dir, max_bytes=None,
            )
            captured.append(("ok", tag, r))
        except MaterializeInFlightError as e:
            captured.append(("in_flight", tag, str(e)))
        except Exception as e:
            captured.append(("err", tag, str(e)))
    t1 = threading.Thread(target=runner, args=("first",))
    t2 = threading.Thread(target=runner, args=("second",))
    t1.start()
    time.sleep(0.2)  # let t1 acquire the lock
    t2.start()
    t1.join()
    t2.join()
    outcomes = [c[0] for c in captured]
    assert outcomes.count("ok") == 1, f"expected exactly one success, got {captured}"
    assert outcomes.count("in_flight") == 1
 def test_sequential_calls_for_same_id_both_succeed(tmp_path):
    bq = _slow_bq(stall_seconds=0.05)
    out_dir = str(tmp_path)
    r1 = materialize_query(
        table_id="t1", sql="SELECT 1",
        bq=bq, output_dir=out_dir, max_bytes=None,
    )
    r2 = materialize_query(
        table_id="t1", sql="SELECT 1",
        bq=bq, output_dir=out_dir, max_bytes=None,
    )
    assert r1["rows"] == 42
    assert r2["rows"] == 42
 def test_different_ids_run_in_parallel(tmp_path):
    bq = _slow_bq(stall_seconds=1.0)
    out_dir = str(tmp_path)
    captured: list = []
    def runner(tid):
        try:
            r = materialize_query(
                table_id=tid, sql="SELECT 1",
                bq=bq, output_dir=out_dir, max_bytes=None,
            )
            captured.append((tid, r["rows"]))
        except Exception as e:
            captured.append((tid, "ERROR"))
    threads = [threading.Thread(target=runner, args=(f"tab_{i}",)) for i in range(3)]
    start = time.time()
    for t in threads: t.start()
    for t in threads: t.join()
    elapsed = time.time() - start
    # If they were serialized, would take >= 3s. Parallel: ~1s.
    assert elapsed < 2.0, f"expected parallel, elapsed={elapsed:.2f}s"
    assert len(captured) == 3
    assert all(c[1] == 42 for c in captured)
 def test_stale_file_lock_is_reclaimed_after_ttl(tmp_path, monkeypatch):
    """A stale, unheld .lock file (old mtime, no live flock holder) must NOT
    raise `MaterializeInFlightError`. The TTL-reclaim branch in
    `_try_acquire_file_lock` is not actually reached here, because the first
    `_try_open_and_flock` succeeds when nobody holds the lock. What this test
    guards against is deciding that a run is in flight from the lock file's
    mtime alone."""
    bq = _slow_bq(stall_seconds=0.05)
    lock_path = Path(tmp_path) / "data" / "t1.parquet.lock"
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    lock_path.write_text("")
    # Set mtime to 25h ago (> default 24h TTL).
    old_ts = time.time() - 25 * 3600
    os.utime(lock_path, (old_ts, old_ts))
    r = materialize_query(
        table_id="t1", sql="SELECT 1",
        bq=bq, output_dir=str(tmp_path), max_bytes=None,
    )
    assert r["rows"] == 42
 def test_fresh_file_lock_blocks_with_in_flight_error(tmp_path, monkeypatch):
    """Force a fresh .lock file (mtime within TTL) and verify a new
    call raises rather than reclaims."""
    bq = _slow_bq(stall_seconds=0.05)
    lock_path = Path(tmp_path) / "data" / "t1.parquet.lock"
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    # Open the lock file and HOLD a fcntl exclusive lock so the materialize
    # call's flock(LOCK_NB) sees a real conflicting lock — relying on
    # mtime-only would let the test pass even if flock acquisition was
    # broken.
    import fcntl
    holder = open(lock_path, "w")
    fcntl.flock(holder.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    try:
        with pytest.raises(MaterializeInFlightError):
            materialize_query(
                table_id="t1", sql="SELECT 1",
                bq=bq, output_dir=str(tmp_path), max_bytes=None,
            )
    finally:
        fcntl.flock(holder.fileno(), fcntl.LOCK_UN)
        holder.close()
 def test_lock_ttl_reads_from_instance_config(tmp_path, monkeypatch):
    """When `materialize.lock_ttl_seconds` is set in instance.yaml, that
    value overrides the default."""
    # Patches `app.instance_config.get_value` directly. This works because
    # `_get_lock_ttl_seconds` re-imports `get_value` on every call (see
    # extractor.py for the deferred-import rationale). If a future change
    # hoists the import to module-level, this patch must change to target
    # `connectors.bigquery.extractor.get_value` instead.
    monkeypatch.setattr(
        "app.instance_config.get_value",
        lambda *args, **kw: 60 if args == ("materialize", "lock_ttl_seconds") else kw.get("default"),
    )
    from connectors.bigquery.extractor import _get_lock_ttl_seconds
    assert _get_lock_ttl_seconds() == 60
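Reviewer annotation (not part of the commit): the concurrency tests above describe two layers, an in-process mutex per `table_id` and a non-blocking advisory `fcntl.flock` on a `.lock` file. A minimal sketch of how such a guard could be composed follows. `table_guard`, `InFlightError`, and `_lock_for` are hypothetical names, the TTL reclaim step is left out, and the real `materialize_query` may structure this differently. It is Unix-only because it relies on `fcntl`.

```python
import fcntl
import os
import threading
from contextlib import contextmanager


class InFlightError(RuntimeError):
    """Hypothetical stand-in for MaterializeInFlightError."""


_registry_guard = threading.Lock()
_table_locks: dict[str, threading.Lock] = {}


def _lock_for(table_id: str) -> threading.Lock:
    # One mutex per table_id, created lazily under a registry lock.
    with _registry_guard:
        return _table_locks.setdefault(table_id, threading.Lock())


@contextmanager
def table_guard(table_id: str, lock_path: str):
    """Hold both layers for the duration of the block, or raise InFlightError."""
    mutex = _lock_for(table_id)
    # Layer 1: same-process callers. Fail fast instead of queueing.
    if not mutex.acquire(blocking=False):
        raise InFlightError(f"{table_id}: already running in this process")
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
        try:
            # Layer 2: other processes. LOCK_NB makes a held lock raise
            # BlockingIOError rather than block.
            try:
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except BlockingIOError:
                raise InFlightError(f"{table_id}: lock file is held") from None
            try:
                yield
            finally:
                fcntl.flock(fd, fcntl.LOCK_UN)
        finally:
            os.close(fd)
    finally:
        mutex.release()
```

Because the file layer asks the kernel whether the lock is held, a leftover `.lock` file with no live holder does not block a new run, which is the behaviour `test_stale_file_lock_is_reclaimed_after_ttl` checks.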
--- a/tests/test_bq_materialize_query_wrapping.py
+++ b/tests/test_bq_materialize_query_wrapping.py
@ -0,0 +1,54 @@
 """materialize_query must always wrap admin source_query in
 bigquery_query('<billing>', '<admin>') so the COPY uses BQ jobs API,
 which works for base tables AND views — Storage Read API does not."""
 from __future__ import annotations
 from pathlib import Path
 from unittest.mock import MagicMock, patch
 import pytest
 from connectors.bigquery.extractor import (
    _wrap_admin_sql_for_jobs_api,
    _escape_sql_string_literal,
 )
 def test_wrap_simple_select():
    out = _wrap_admin_sql_for_jobs_api(
        billing_project="prj-billing",
        inner_sql="SELECT * FROM `ds.tbl`",
    )
    assert out == (
        "SELECT * FROM bigquery_query('prj-billing', "
        "'SELECT * FROM `ds.tbl`')"
    )
 def test_escape_single_quotes_in_inner_sql():
    inner = "SELECT name FROM `ds.tbl` WHERE country = 'CZ'"
    escaped = _escape_sql_string_literal(inner)
    assert escaped == "SELECT name FROM `ds.tbl` WHERE country = ''CZ''"
 def test_wrap_with_inner_quotes_round_trips():
    inner = "SELECT * FROM `ds.tbl` WHERE col = 'foo''bar'"
    out = _wrap_admin_sql_for_jobs_api("myproject", inner)
    # Outer string-literal envelope must double the inner single quotes
    # so DuckDB's parser sees a balanced literal.
    assert out.count("'") % 2 == 0
    # Round-trip: stripping the wrapper gives back the original inner exactly.
    prefix = "SELECT * FROM bigquery_query('myproject', '"
    assert out.startswith(prefix)
    assert out.endswith("')")
    middle = out[len(prefix):-2]
    # DuckDB string literal escape: '' → '. Reverse it.
    decoded = middle.replace("''", "'")
    assert decoded == inner
 def test_billing_project_validates_format():
    with pytest.raises(ValueError, match="billing_project"):
        _wrap_admin_sql_for_jobs_api(
            billing_project="bad project'; DROP",
            inner_sql="SELECT 1",
        )
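Reviewer annotation (not part of the commit): the wrapping tests above fix the output shape of `_wrap_admin_sql_for_jobs_api` and the quote-doubling rule of `_escape_sql_string_literal`. A minimal sketch that satisfies those same expectations follows. The function names are hypothetical stand-ins, and the project-id pattern is an assumption based on the usual Google Cloud project ID shape; the validator in `extractor.py` may be stricter or looser.

```python
import re

# Assumption: lowercase letter first, then letters/digits/hyphens, 6-30 chars,
# no trailing hyphen. The real check in extractor.py may differ.
_PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def escape_sql_literal(text: str) -> str:
    """Double single quotes so text can sit inside a '...' SQL literal."""
    return text.replace("'", "''")


def wrap_for_jobs_api(billing_project: str, inner_sql: str) -> str:
    """Wrap admin SQL in bigquery_query('<billing>', '<sql>')."""
    # The project id is interpolated into SQL, so reject anything unexpected
    # rather than trying to escape it.
    if not _PROJECT_ID_RE.fullmatch(billing_project):
        raise ValueError(
            f"billing_project has an unexpected format: {billing_project!r}"
        )
    return (
        f"SELECT * FROM bigquery_query('{billing_project}', "
        f"'{escape_sql_literal(inner_sql)}')"
    )
```

Validating the project id and escaping only the inner SQL keeps the two inputs on separate paths: one is allow-listed, the other is treated as an opaque string literal.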
--- a/tests/test_db_schema_version.py
+++ b/tests/test_db_schema_version.py
@ -13,8 +13,9 @@ import duckdb
 from src.db import SCHEMA_VERSION, _ensure_schema, get_schema_version
-def test_schema_version_is_23():
+def test_schema_version_is_24():
-    assert SCHEMA_VERSION == 23
+    # bumped from 23→24 for the materialized BQ source_query rewrite migration
    assert SCHEMA_VERSION == 24
 def test_v20_adds_source_query(tmp_path):
@ -29,7 +30,7 @@ def test_v20_adds_source_query(tmp_path):
        ).fetchall()
    }
    assert "source_query" in cols, f"source_query missing from {cols}"
-    assert get_schema_version(conn) == 23
+    assert get_schema_version(conn) == SCHEMA_VERSION
    conn.close()
@ -83,7 +84,7 @@ def test_v19_db_migrates_to_v20(tmp_path):
    _ensure_schema(conn)
-    assert get_schema_version(conn) == 23
+    assert get_schema_version(conn) == SCHEMA_VERSION  # bumped 23→24
    cols = {
        r[0] for r in conn.execute(
            "SELECT column_name FROM information_schema.columns "
--- a/tests/test_materialized_e2e.py
+++ b/tests/test_materialized_e2e.py
@ -75,7 +75,13 @@ def stub_bq_extractor(monkeypatch):
@pytest.fixture
 def stub_bq():
    """Real-shape BqAccess wired to in-memory DuckDB factories so the
-    materialize_query path can run end-to-end without GCP."""
+    materialize_query path can run end-to-end without GCP.
    A `bigquery_query(project, sql_text)` table macro is registered so the
    wrapping added by `_wrap_admin_sql_for_jobs_api` (Task 2 — routes COPY
    through the BQ jobs API for views) resolves against the in-memory tables
    without needing the real BQ extension.
    """
    @contextmanager
    def _session(_p):
        conn = duckdb.connect(":memory:")
@ -87,6 +93,12 @@ def stub_bq():
                "SELECT 'EU' AS region, 100 AS revenue UNION ALL "
                "SELECT 'US' AS region, 250 AS revenue"
            )
            # Stub bigquery_query() so materialize_query's wrapped COPY works
            # against the in-memory bq catalog without the real BQ extension.
            conn.execute(
                "CREATE OR REPLACE MACRO bigquery_query(project, sql_text) "
                "AS TABLE SELECT * FROM query(sql_text)"
            )
            yield conn
        finally:
            conn.close()
@ -265,12 +277,18 @@ def test_materialized_zero_rows_logs_warning(stub_bq, tmp_path, caplog):
            conn.execute("CREATE SCHEMA bq.test")
            conn.execute("CREATE OR REPLACE TABLE bq.test.empty AS "
                         "SELECT 1 AS n WHERE FALSE")
            # Stub bigquery_query() so materialize_query's wrapped COPY works
            # against the in-memory bq catalog without the real BQ extension.
            conn.execute(
                "CREATE OR REPLACE MACRO bigquery_query(project, sql_text) "
                "AS TABLE SELECT * FROM query(sql_text)"
            )
            yield conn
        finally:
            conn.close()
    bq_empty = BqAccess(
-        BqProjects(billing="t", data="t"),
+        BqProjects(billing="test-project", data="test-project"),
        client_factory=lambda _p: MagicMock(),
        duckdb_session_factory=_session_empty,
    )
@ -323,7 +341,7 @@ def test_attach_real_error_propagates(stub_bq, tmp_path):
            conn.close()
    bq_bad = BqAccess(
-        BqProjects(billing="t", data="t"),
+        BqProjects(billing="test-project", data="test-project"),
        client_factory=lambda _p: MagicMock(),
        duckdb_session_factory=_session_attach_fails,
    )
--- a/tests/test_run_materialized_pass_in_flight_skip.py
+++ b/tests/test_run_materialized_pass_in_flight_skip.py
@ -0,0 +1,66 @@
 """When materialize_query raises MaterializeInFlightError, _run_materialized_pass
 must record it as a 'skipped, in_flight' outcome and NOT call state.set_error
 (otherwise sync_state surfaces a false-positive 'failure' for a healthy
 in-progress run)."""
 from __future__ import annotations
 from unittest.mock import MagicMock, patch
 import pytest
 from app.api.sync import _run_materialized_pass
 from connectors.bigquery.extractor import MaterializeInFlightError
@pytest.fixture
 def fake_registry_with_one_materialized(monkeypatch, tmp_path):
    monkeypatch.setenv("DATA_DIR", str(tmp_path))
    rows = [{
        "id": "in_flight_t",
        "name": "in_flight_t",
        "query_mode": "materialized",
        "source_type": "bigquery",
        "source_query": "SELECT * FROM `ds.t`",
        "sync_schedule": None,
    }]
    class _Repo:
        def __init__(self, conn): pass
        def list_all(self): return rows
    class _State:
        def __init__(self, conn):
            self.set_error_calls = []
            self.update_sync_calls = []
        def get_last_sync(self, _id): return None
        def set_error(self, table_id, msg): self.set_error_calls.append((table_id, msg))
        def update_sync(self, **kw): self.update_sync_calls.append(kw)
    state = _State(None)
    monkeypatch.setattr("app.api.sync.TableRegistryRepository", _Repo)
    monkeypatch.setattr("app.api.sync.SyncStateRepository", lambda c: state)
    return state
 def test_in_flight_recorded_as_skipped_not_error(fake_registry_with_one_materialized):
    state = fake_registry_with_one_materialized
    with patch(
        "app.api.sync._materialize_table",
        side_effect=MaterializeInFlightError("in_flight_t", layer="process"),
    ):
        summary = _run_materialized_pass(MagicMock(), MagicMock())
    assert summary["materialized"] == []
    assert summary["errors"] == []
    assert len(summary["skipped"]) == 1
    skipped = summary["skipped"][0]
    assert skipped == {"table": "in_flight_t", "reason": "in_flight"}
    assert state.set_error_calls == []
    assert state.update_sync_calls == []
 def test_due_check_skipped_uses_due_check_reason(fake_registry_with_one_materialized, monkeypatch):
    monkeypatch.setattr("app.api.sync.is_table_due", lambda *a, **k: False)
    summary = _run_materialized_pass(MagicMock(), MagicMock())
    assert summary["skipped"] == [{"table": "in_flight_t", "reason": "due_check"}]
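The two tests above pin how a materialized pass classifies outcomes: a table whose materialize call is already running is reported as skipped with reason `in_flight`, and a table that is not yet due is skipped with reason `due_check`; neither records an error. A minimal sketch of that classification follows. It is not the body of `_run_materialized_pass`: `run_pass`, `InFlightError`, the `is_due` callback, and the shapes of the `materialized` and `errors` entries are assumptions made for illustration only.

```python
from typing import Callable, Iterable


class InFlightError(Exception):
    """Hypothetical stand-in for MaterializeInFlightError."""


def run_pass(
    table_ids: Iterable[str],
    materialize: Callable[[str], None],
    is_due: Callable[[str], bool] = lambda _id: True,
) -> dict:
    """Classify each table as materialized, skipped, or errored."""
    summary = {"materialized": [], "skipped": [], "errors": []}
    for table_id in table_ids:
        if not is_due(table_id):
            summary["skipped"].append({"table": table_id, "reason": "due_check"})
            continue
        try:
            materialize(table_id)
        except InFlightError:
            # Another run already holds the table: this is healthy, so it is
            # reported as skipped and no error state is written.
            summary["skipped"].append({"table": table_id, "reason": "in_flight"})
        except Exception as exc:
            # Assumed error shape; the real summary may differ.
            summary["errors"].append({"table": table_id, "error": str(exc)})
        else:
            summary["materialized"].append(table_id)
    return summary
```

Catching the in-flight exception before the generic `Exception` handler is what keeps a healthy concurrent run out of the error list.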
--- a/tests/test_schema_v24_source_query_rewrite.py
+++ b/tests/test_schema_v24_source_query_rewrite.py
@ -0,0 +1,159 @@
 """v24: rewrites table_registry.source_query for materialized BQ rows
 from DuckDB-flavor (bq.\"ds\".\"tbl\") to BQ-native (`<project>.ds.tbl`).
 The wrapping path (connectors.bigquery.extractor.materialize_query) only
 accepts BQ-native; pre-v24 rows would fail at materialize time without
 this conversion."""
 from __future__ import annotations
 import os
 import tempfile
 from pathlib import Path
 import duckdb
 import pytest
 from src.db import _ensure_schema, get_schema_version, SCHEMA_VERSION
 def _seed_v23(conn, project_id: str = "prj-data"):
    conn.execute(
        "CREATE TABLE schema_version (version INTEGER, applied_at TIMESTAMP DEFAULT current_timestamp)"
    )
    conn.execute("INSERT INTO schema_version (version) VALUES (23)")
    conn.execute(
        "CREATE TABLE table_registry ("
        "id VARCHAR PRIMARY KEY, name VARCHAR, source_type VARCHAR, "
        "query_mode VARCHAR, bucket VARCHAR, source_table VARCHAR, source_query VARCHAR)"
    )
 def test_v24_rewrites_duckdb_flavor_to_bq_native(monkeypatch):
    with tempfile.TemporaryDirectory() as tmp:
        monkeypatch.setenv("DATA_DIR", tmp)
        monkeypatch.setattr(
            "app.instance_config.get_value",
            lambda *args, **kw: "prj-data" if args == ("data_source", "bigquery", "project") else kw.get("default"),
        )
        Path(tmp, "state").mkdir(parents=True, exist_ok=True)
        db_path = Path(tmp, "state", "system.duckdb")
        conn = duckdb.connect(str(db_path))
        try:
            _seed_v23(conn)
            conn.execute(
                'INSERT INTO table_registry VALUES (?, ?, ?, ?, ?, ?, ?)',
                ["t1", "t1", "bigquery", "materialized", "ds", "tbl",
                 'SELECT * FROM bq."ds"."tbl"'],
            )
            conn.execute(
                'INSERT INTO table_registry VALUES (?, ?, ?, ?, ?, ?, ?)',
                ["t2", "t2", "bigquery", "materialized", "analytics", "orders",
                 'SELECT col1 FROM bq."analytics"."orders" WHERE col2 > 10'],
            )
            conn.execute(
                'INSERT INTO table_registry VALUES (?, ?, ?, ?, ?, ?, ?)',
                ["r1", "r1", "bigquery", "remote", "ds", "tbl", None],
            )
            _ensure_schema(conn)
            assert get_schema_version(conn) == SCHEMA_VERSION
            assert SCHEMA_VERSION >= 24
            rows = {r[0]: r[1] for r in conn.execute(
                "SELECT id, source_query FROM table_registry"
            ).fetchall()}
            assert rows["t1"] == "SELECT * FROM `prj-data.ds.tbl`"
            assert rows["t2"] == (
                "SELECT col1 FROM `prj-data.analytics.orders` WHERE col2 > 10"
            )
            assert rows["r1"] is None  # remote row untouched
        finally:
            conn.close()
 def test_v24_idempotent_when_already_bq_native(monkeypatch):
    with tempfile.TemporaryDirectory() as tmp:
        monkeypatch.setenv("DATA_DIR", tmp)
        monkeypatch.setattr(
            "app.instance_config.get_value",
            lambda *args, **kw: "prj-data" if args == ("data_source", "bigquery", "project") else kw.get("default"),
        )
        Path(tmp, "state").mkdir(parents=True, exist_ok=True)
        db_path = Path(tmp, "state", "system.duckdb")
        conn = duckdb.connect(str(db_path))
        try:
            _seed_v23(conn)
            conn.execute(
                'INSERT INTO table_registry VALUES (?, ?, ?, ?, ?, ?, ?)',
                ["t1", "t1", "bigquery", "materialized", "ds", "tbl",
                 "SELECT * FROM `prj-data.ds.tbl`"],
            )
            _ensure_schema(conn)
            row = conn.execute(
                "SELECT source_query FROM table_registry WHERE id='t1'"
            ).fetchone()
            assert row[0] == "SELECT * FROM `prj-data.ds.tbl`"
        finally:
            conn.close()
 def test_v24_logs_warning_when_project_not_configured(monkeypatch, caplog):
    with tempfile.TemporaryDirectory() as tmp:
        monkeypatch.setenv("DATA_DIR", tmp)
        monkeypatch.setattr(
            "app.instance_config.get_value",
            lambda *args, **kw: kw.get("default", ""),  # no project configured
        )
        Path(tmp, "state").mkdir(parents=True, exist_ok=True)
        db_path = Path(tmp, "state", "system.duckdb")
        conn = duckdb.connect(str(db_path))
        try:
            _seed_v23(conn)
            conn.execute(
                'INSERT INTO table_registry VALUES (?, ?, ?, ?, ?, ?, ?)',
                ["t1", "t1", "bigquery", "materialized", "ds", "tbl",
                 'SELECT * FROM bq."ds"."tbl"'],
            )
            with caplog.at_level("WARNING"):
                _ensure_schema(conn)
            row = conn.execute(
                "SELECT source_query FROM table_registry WHERE id='t1'"
            ).fetchone()
            assert row[0] == 'SELECT * FROM bq."ds"."tbl"'
            assert any(
                "v24" in r.message.lower() or "project" in r.message.lower()
                for r in caplog.records
            )
        finally:
            conn.close()
 def test_v24_keboola_materialized_row_not_rewritten(monkeypatch):
    """Materialized rows with source_type != 'bigquery' must not be touched
    by v24. Keboola materialized has no notion of bq."ds"."tbl" syntax;
    the SELECT's source_type filter pins this contract.
    """
    with tempfile.TemporaryDirectory() as tmp:
        monkeypatch.setenv("DATA_DIR", tmp)
        monkeypatch.setattr(
            "app.instance_config.get_value",
            lambda *args, **kw: "prj-data" if args == ("data_source", "bigquery", "project") else kw.get("default"),
        )
        Path(tmp, "state").mkdir(parents=True, exist_ok=True)
        db_path = Path(tmp, "state", "system.duckdb")
        conn = duckdb.connect(str(db_path))
        try:
            _seed_v23(conn)
            # Keboola row that happens to contain `bq."..."` in its SQL
            # (admin error or copy-paste from a BQ row). Migration must
            # leave it alone — this is not the v24 contract.
            conn.execute(
                'INSERT INTO table_registry VALUES (?, ?, ?, ?, ?, ?, ?)',
                ["kb1", "kb1", "keboola", "materialized", "ds", "tbl",
                 'SELECT * FROM bq."ds"."tbl"'],
            )
            _ensure_schema(conn)
            row = conn.execute(
                "SELECT source_query FROM table_registry WHERE id='kb1'"
            ).fetchone()
            assert row[0] == 'SELECT * FROM bq."ds"."tbl"'
        finally:
            conn.close()
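The v24 tests above describe a rewrite from the DuckDB-flavor reference `bq."ds"."tbl"` to the BQ-native form `` `<project>.ds.tbl` ``, leaving queries unchanged when they are already BQ-native or when no project is configured. One way such a rewrite could be done is a single regular-expression substitution, sketched below. This is not the code of `_v23_to_v24_finalize`: the function name `rewrite_source_query` and the pattern are assumptions, and the sketch leaves out the `source_type` / `query_mode` row filter and the warning log that the tests also cover.

```python
import re
from typing import Optional

# Matches the DuckDB-flavor reference bq."<dataset>"."<table>".
_DUCKDB_REF = re.compile(r'bq\."([^"]+)"\."([^"]+)"')


def rewrite_source_query(sql: Optional[str], project: str) -> Optional[str]:
    """Return sql with bq."ds"."tbl" references rewritten to `project.ds.tbl`.

    Hypothetical helper: returns the input unchanged when there is no SQL
    or no configured project.
    """
    if not sql or not project:
        return sql
    # A function replacement avoids any escape handling of the project id.
    return _DUCKDB_REF.sub(
        lambda m: f"`{project}.{m.group(1)}.{m.group(2)}`", sql
    )
```

Because the pattern only matches the double-quoted `bq.` form, running the function on an already rewritten query is a no-op, which is the idempotence property the second test checks.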
--- a/tests/test_sync_trigger_materialized.py
+++ b/tests/test_sync_trigger_materialized.py
@ -102,7 +102,8 @@ def test_materialized_pass_skips_undue_rows(system_db, stub_bq):
        summary = sync_mod._run_materialized_pass(system_db, stub_bq)
    mock_mat.assert_not_called()
-    assert "orders_daily" in summary["skipped"]
+    # summary["skipped"] is now list[dict] — see PR zs/materialize-sync-fix
+    assert {"table": "orders_daily", "reason": "due_check"} in summary["skipped"]
 def test_materialized_pass_skips_non_materialized_rows(system_db, stub_bq):