ZdenekSrotyr
c7c42de0f0
feat(sync): treat MaterializeInFlightError as 'skipped, in_flight'
...
_run_materialized_pass distinguishes due-check skips from in-flight
skips and never calls state.set_error for either. summary['skipped']
becomes a list of {table, reason} dicts; the end-of-pass log line
breaks out the in_flight subcount.
Hoists is_table_due to module-level import so test monkeypatching of
the symbol intercepts the call (the previous local import made
patches a no-op).
2026-05-04 18:11:38 +02:00
ZdenekSrotyr
85d3810535
feat(materialized): query_mode='materialized' for BigQuery + Keboola — admin SELECT → parquet → analyst
...
Closes the 'admin pre-stages a curated table/view for analysts' use case end-to-end across both supported source connectors.
Backend (BigQuery + Keboola, schema v20):
- schema v20 adds source_query TEXT to table_registry (renumbered from v19 after main's #150 RBAC migration also bumped to v19)
- connectors/bigquery/extractor.py adds materialize_query(table_id, sql, *, bq, output_dir, max_bytes=...) — BqAccess session, dry-run cost guardrail (default 10 GiB, configurable via data_source.bigquery.max_bytes_per_materialize), idempotent ATTACH, rows/bytes/md5 metadata for sync_state
- connectors/keboola/access.py — new KeboolaAccess facade (parallel of BqAccess) wrapping ATTACH 'keboola://...' AS kbc
- connectors/keboola/extractor.py adds materialize_query — same shape, no dry-run analog (Keboola Storage API has different cost model); legacy bucket-download path skips query_mode='materialized' rows
- app/api/sync.py:_run_materialized_pass dispatches by source_type to the right materialize_query
- app/api/admin.py: RegisterTableRequest accepts source_query; model_validator coheres mode↔source_query↔bucket; PUT preserves omitted fields; deprecation marks (Field(deprecated=True)) on sync_strategy + profile_after_sync (no extractor reads them; profile_after_sync becomes inert — bug from earlier work where /api/sync/trigger never honored the flag); _BQ_OPTIONAL_FIELD_DEFAULTS injects defaults into GET /server-config payload
Operator + CLI surface:
- da admin register-table --query / --query-mode materialized
- scripts/smoke-test-materialized-bq.sh — end-to-end smoke for operators
Tests (incl. spike + integration + regression):
- test_db_migration_v20, test_table_registry_source_query
- test_bq_materialize, test_bq_cost_guardrail, test_bq_init_extract_skips
- test_keboola_access, test_keboola_extension_query_passthrough (lock-in for the DuckDB extension capability), test_keboola_materialize, test_keboola_init_extract_skips, test_keboola_materialized_e2e (skipped without KBC_TEST_* creds)
- test_sync_trigger_materialized, test_sync_trigger_keboola_materialized
- test_api_admin_materialized, test_cli_admin_materialized
- test_admin_bq_register, test_admin_discover_bigquery, test_admin_keboola_materialized, test_admin_phase_c_deprecation, test_admin_put_preservation, test_materialized_e2e
Cost: BQ uses bigquery_query() (jobs API, view-aware) — works on tables, views, materialized views uniformly. Keboola uses ATTACH+COPY parquet through the DuckDB extension.
2026-05-01 20:25:56 +02:00