# `/admin/tables` Unified Tab UI + Keboola Materialized Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Unify the `/admin/tables` operator UX with per-connector tabs (BigQuery / Keboola / Jira), bring Keboola to capability parity with BigQuery for the materialized SQL path, clean up the misleading Keboola form fields, and resolve the `profile_after_sync` dead-code bug. All of this ships in **one PR** because the four concerns are tightly interconnected and splitting them would mean redoing the form layout multiple times.

**Architecture:** New branch on top of merged `main` (after PR #148 lands BQ-materialized + analyst auto-sync). The PR adds:

1. Per-connector tab navigation in `/admin/tables`.
2. A generalized materialized path in `app/api/sync.py` that dispatches by `source_type` to either `connectors/bigquery/extractor.py:materialize_query` (existing) or the new `connectors/keboola/extractor.py:materialize_query`.
3. A Keboola tab form with the new "Custom SQL" mode mirroring BQ's two-question radio.
4. Keboola form cleanup (drop Strategy, add Schedule, hide PK).
5. Pydantic deprecation marks and inert behavior for `profile_after_sync` (column stays in DB, removed from runtime).

No DB schema migration. The existing BQ `#148` form structure is preserved verbatim, just relocated into the BQ tab.

**Tech Stack:** Jinja2 templates + vanilla JS (admin UI), FastAPI + Pydantic v2 (admin API), DuckDB BigQuery + Keboola extensions (extract path), pytest + TestClient (test suite).

---

## Brief: the problem and the design decision

### Today's `/admin/tables` UX has four interconnected problems

1. **Single mixed form.** One Jinja `{% if data_source.type == 'bigquery' %}` switch picks the Keboola or the BigQuery branch.
   Result: an instance configured for Keboola can't register BigQuery rows from the UI, and vice versa. Multi-source instances have no UI surface for the secondary source. The Edit modal compounds the mix with `keboola-edit-only` / `bq-edit-only` show/hide classes.

2. **Capability asymmetry: Keboola is a subset of BigQuery.** In our current model:

   | Mode | BigQuery | Keboola |
   |---|---|---|
   | Live (queries hit source) | ✅ `query_mode='remote'` via DuckDB BQ extension | ❌ DuckDB Keboola extension has no live-attach mode |
   | Synced / Whole (full table → parquet) | ✅ `query_mode='materialized'` with auto `SELECT *` | ✅ legacy path: extractor downloads bucket/table |
   | Synced / Custom SQL (filtered/aggregated → parquet) | ✅ `query_mode='materialized'` with admin SELECT | ❌ not implemented |

   Only one of the three modes (Synced / Whole) works for both sources today, and the asymmetry is hidden in code where the operator can't see it. **Verified spike (2026-05-01):** the DuckDB Keboola extension supports `COPY (SELECT * FROM kbc."bucket"."table" WHERE …) TO 'parquet'`, the same pattern the existing extractor already uses at `connectors/keboola/extractor.py:209`. The Keboola materialized path is a clean parallel of the BigQuery one.

3. **Misleading Keboola form fields.** Two independent agent reviews (2026-05-01) found:
   - The `Sync Strategy` dropdown's hint claims it controls extraction, but no extractor reads the field; only `src/profiler.py:222 is_partitioned()` consumes it for parquet-layout detection. Every Keboola sync is a full overwrite regardless of value. Operators picking "Incremental" expect deltas and get a full refresh.
   - `Primary Key` looks like an upsert key but is decorative metadata only. There is no upsert/dedup anywhere; every sync is a full overwrite. The profiler reads it for catalog annotation.
   - The `Sync Schedule` input is **missing entirely** from the Keboola branch even though `src/scheduler.py:248 filter_due_tables` honors per-table cron for every source.
     Operators have to use the API/CLI to set per-table cadence; there is no UI surface.

4. **`profile_after_sync` is dead code.** Agent 1 finding: the BQ register endpoint at `app/api/admin.py:791,881` forces the field to `False` "as a signal," but the profiler block at `app/api/sync.py:410-438` **never reads the flag**. The profiler runs unconditionally on every synced table. The field is inert.

### Why one PR

These four problems cluster around the same template (`app/web/templates/admin_tables.html`) and the same backend dispatch (`app/api/admin.py` + `app/api/sync.py`). Splitting into four sequential PRs would mean:

- The form-cleanup PR touches the Keboola form as it is today, then the tab-split PR redoes the same layout inside a tab → throwaway work.
- The Keboola-materialized PR adds a Custom SQL textarea to the Keboola form, but the form layout is still the mixed flat one → confusing partial state.
- The `profile_after_sync` PR is its own concern but loosely tied to the Pydantic models touched by the form changes.

Doing them as one PR lets us land a coherent operator-facing change: **"`/admin/tables` is now per-connector tabs, with Keboola at full capability parity with BigQuery."** The internal cleanup (form labels, dead code) comes along naturally.

### What this PR is NOT

- Not a schema migration. The `table_registry.profile_after_sync` and `sync_strategy` columns stay in DB (back-compat for external API consumers + profiler keeps reading `sync_strategy`). Both are marked `Field(deprecated=True)` in Pydantic. A future PR can drop the columns once external consumers migrate.
- Not a Live mode for Keboola. The Keboola DuckDB extension doesn't support remote view passthrough; adding it is upstream extension work outside this scope.
- Not a refactor of the orchestrator or analytics view layer. Materialized parquets land in `data/extracts//data/` and the existing `SyncOrchestrator.rebuild()` local-parquet walk picks them up unchanged.
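
The "columns stay, runtime ignores the flag" stance can be sketched in plain Python. This is an illustration only, not the project's code: `INERT_FIELDS`, `merge_update`, and the sample field values are hypothetical, and the real endpoint works on Pydantic models rather than dicts. The sketch assumes the merge rule this plan describes for `update_table`: `None` values are dropped before merging, and `profile_after_sync` is accepted but never persisted.

```python
# Hypothetical names for illustration; the real endpoint uses Pydantic models.
INERT_FIELDS = frozenset({"profile_after_sync"})


def merge_update(stored_row: dict, payload: dict) -> dict:
    """Return the row to persist after applying an update payload.

    - Fields listed in INERT_FIELDS are accepted but never written.
    - None means "not supplied", so the stored value survives.
    """
    merged = dict(stored_row)
    for key, value in payload.items():
        if key in INERT_FIELDS or value is None:
            continue
        merged[key] = value
    return merged


stored = {"name": "orders", "sync_strategy": "partitioned", "sync_schedule": None}
# The new Edit modal omits sync_strategy (it arrives as None after model_dump);
# an older API client still sends the inert flag.
payload = {
    "sync_schedule": "0 * * * *",
    "sync_strategy": None,
    "profile_after_sync": True,
}
updated = merge_update(stored, payload)
print(updated)
```

The point of the design is that an omitted `sync_strategy` cannot overwrite a stored `'partitioned'` value, while a client that still sends `profile_after_sync` gets no error and leaves no trace in the row.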
---

## E2E safety contract

User feedback (2026-05-01): "**Our changes must ultimately work E2E, so we cannot leave anything out.**" The plan must protect these invariants; every task that could violate one has an explicit gating test.

1. **PUT preservation invariant.** When the Edit modal stops sending `sync_strategy` in the payload, an existing row's stored value (especially `'partitioned'`, used by `profiler.is_partitioned()` for parquet-directory layout) must survive. Verified: `app/api/admin.py:1623` uses `request.model_dump()` (without `exclude_unset=True`) plus an `if v is not None` filter, so omitted Optional fields drop out before the merge. Phase F locks this with a regression test.
2. **Existing partitioned rows still profile correctly.** `sync_strategy` stays alive in DB + Pydantic. The profiler at `src/profiler.py:222` keeps reading it. Existing rows with `sync_strategy='partitioned'` keep their parquet-directory layout. No DB migration. No behavioral change for legacy rows.
3. **Existing #148 BQ form behavior preserved verbatim.** The two-question radio (Live × Synced × Whole | Custom), the Discover/List tables/Use-as-base buttons, and the table-vs-view auto-detection hint are all lifted into the BigQuery tab unchanged. The tests in `tests/test_admin_tables_ui_materialized.py` and `tests/test_admin_bq_register.py` that assert form structure must still pass.
4. **External API back-compat.** `tests/test_migration.py:44`, `tests/test_repositories.py:277`, and `tests/test_api_complete.py:117` POST `sync_strategy='incremental'` to the API. These must keep passing: `RegisterTableRequest` still accepts the field; only the UI omits it.
5. **`profile_after_sync` becomes inert, not breaking.** Pydantic still accepts the field (with `deprecated=True`). External API clients that send it get no error and no warning; the server silently ignores it.
   Existing tests at `tests/test_admin_bq_register.py:247,648,1371,1430` are updated: assertions of `profile_after_sync == False` are removed (the field is no longer persisted), but request payloads with the field still work.
6. **Materialized Keboola dispatch is conservative.** The new `_run_materialized_pass` Keboola branch only fires for rows with `source_type='keboola' AND query_mode='materialized'`. Existing Keboola rows (`query_mode='local'`, the default) keep going through the legacy `connectors/keboola/extractor.py` download path unchanged. No silent rerouting.
7. **Tab navigation degrades gracefully.** The page works without JS (the server renders all three tabs visible, JS just hides the inactive ones). If only one source type is configured, the relevant tab is auto-active and the other tabs render with a "no [source] configured" notice instead of an empty form.

---

## File Structure

**Created:**

- `connectors/keboola/extractor.py:materialize_query`: new top-level function (parallel to `connectors/bigquery/extractor.py:materialize_query`). Takes `(table_id, sql, *, keboola_access, output_dir)`, runs `COPY (sql) TO '...' (FORMAT PARQUET)` on a DuckDB session supplied by `KeboolaAccess`, and returns a dict with rows / bytes / md5 / path.
- `connectors/keboola/access.py`: thin facade analogous to `connectors/bigquery/access.py:BqAccess`. Provides the `KeboolaAccess.duckdb_session()` context manager, which yields a DuckDB connection with the Keboola extension loaded + ATTACHed. Encapsulates token handling so `_run_materialized_pass` doesn't need to know extension wiring details.
- `tests/test_keboola_extension_query_passthrough.py`: lock-in test for the extension's query passthrough (Task A1).
- `tests/test_keboola_access.py`: tests for the `KeboolaAccess` facade (Task B1).
- `tests/test_keboola_materialize.py`: unit + integration tests for `materialize_query`. Mocks the Keboola extension where possible; uses a real fixture extract.duckdb otherwise.
- `tests/test_keboola_init_extract_skips.py`: verifies that the legacy extract path skips materialized rows (Task B3).
- `tests/test_admin_keboola_materialized.py`: admin API tests for registering/updating Keboola-materialized rows.
- `tests/test_sync_trigger_keboola_materialized.py`: scheduler-level integration test asserting that `_run_materialized_pass` dispatches to Keboola for Keboola-materialized rows.
- `tests/test_admin_tables_tab_ui.py`: UI tests for the new tab structure.
- `tests/test_admin_put_preservation.py`: regression guard for the PUT field-preservation invariant (item 1 of the E2E safety contract).

**Modified:**

- `app/web/templates/admin_tables.html`: substantial restructure (tab nav, per-tab content panels, per-tab Register modal triggers, per-tab listing filter). Existing BQ form contents are preserved verbatim and relocated into the BQ tab. The Keboola form is rebuilt with the same two-question radio model + the new Custom SQL textarea. The Jira tab is a read-only listing.
- `app/api/admin.py`: extend the `RegisterTableRequest._check_mode_query_coherence` model_validator to allow `query_mode='materialized'` for `source_type='keboola'` (today the validator implicitly assumes BQ for materialized). Mark `sync_strategy` and `profile_after_sync` as `Field(deprecated=True)`. Stop reading `profile_after_sync` from the request in BQ register / `update_table` (no longer persisted, but the field is accepted for back-compat).
- `app/api/sync.py`: `_run_materialized_pass` dispatches by `source_type`. The existing BQ branch keeps `BqAccess` + `connectors.bigquery.extractor.materialize_query`; the new Keboola branch uses `KeboolaAccess` + `connectors.keboola.extractor.materialize_query`. The cost guardrail (BQ dry-run) only runs for BQ rows; Keboola has no analogous dry-run primitive in the extension and the Storage API has a different cost shape, so the guardrail is skipped there with a TODO comment for future work.
- `connectors/keboola/extractor.py`: `init_extract` (the legacy full-download path) skips `query_mode='materialized'` rows so they aren't double-extracted. Mirror of the BQ extractor's existing skip at `connectors/bigquery/extractor.py:188`.
- `tests/test_admin_bq_register.py`: remove assertions of `row["profile_after_sync"] is False` (the field is no longer persisted); request payloads keep the field for back-compat verification. Existing form-structure tests are adjusted for the tab restructure (selectors prefixed with tab container ids).
- `tests/test_admin_tables_ui_materialized.py`: assertions adjusted for the tab restructure.
- `CHANGELOG.md`: `## [Unreleased]` block with `### Added`, `### Changed`, `### Fixed`, `### Deprecated` entries.

**Deleted:**

- Nothing (Pydantic fields stay alive with `deprecated=True`).

**Untouched:**

- `src/db.py`: schema stays at v20. Columns survive.
- `src/profiler.py`: keeps reading `sync_strategy` for partition detection.
- `src/orchestrator.py`: the local-parquet walk picks up Keboola materialized parquets the same way it picks up BQ ones today.
- `connectors/jira/**`: the Jira tab is read-only; no register form, no backend change.
- `cli/**`: the analyst-side `da sync` / `da query` / `da fetch` flow is unchanged. Materialized Keboola parquets show up in the manifest with `source_type='keboola'` + `query_mode='local'` (because the result is a local parquet), so the analyst-side rails (`CLAUDE.md`) treat them like any other Keboola table.

---

## Phase A: Spike (lock down the Keboola extension query passthrough)

**Goal:** Phase B and onward depend on the Keboola DuckDB extension supporting `COPY (admin SELECT) TO 'parquet'`. The grep at planning time confirmed the existing extractor already uses this pattern, but we want a dedicated test that pins the capability so a future extension upgrade doesn't silently break the Keboola materialized path.

### Task A1: Lock-in test for Keboola extension query passthrough

**Files:**

- Create: `tests/test_keboola_extension_query_passthrough.py`

- [ ] **Step 1: Write the failing test**

```python
"""Lock-in test for the DuckDB Keboola extension's query-passthrough
capability that the Keboola materialized path depends on.
Run only when the KBC_TEST_URL, KBC_TEST_TOKEN, KBC_TEST_BUCKET and
KBC_TEST_TABLE env vars are set (CI without real Keboola credentials skips).
Local dev with a real Storage API token exercises the path.
"""
import os

import duckdb
import pytest

KBC_URL = os.environ.get("KBC_TEST_URL")
KBC_TOKEN = os.environ.get("KBC_TEST_TOKEN")
KBC_BUCKET = os.environ.get("KBC_TEST_BUCKET")
KBC_TABLE = os.environ.get("KBC_TEST_TABLE")

pytestmark = pytest.mark.skipif(
    not all([KBC_URL, KBC_TOKEN, KBC_BUCKET, KBC_TABLE]),
    reason="Keboola integration creds not provided",
)


def test_extension_supports_attach_and_select(tmp_path):
    """Keboola extension must support: ATTACH 'keboola://...' AS kbc, then
    SELECT * FROM kbc.bucket.table. The Keboola materialized path uses this
    primitive at runtime (just like connectors/keboola/extractor.py:133)."""
    conn = duckdb.connect(str(tmp_path / "spike.duckdb"))
    conn.execute("INSTALL keboola FROM community")
    conn.execute("LOAD keboola")
    escaped_token = KBC_TOKEN.replace("'", "''")
    conn.execute(f"ATTACH '{KBC_URL}' AS kbc (TYPE keboola, TOKEN '{escaped_token}')")
    rows = conn.execute(
        f'SELECT COUNT(*) FROM kbc."{KBC_BUCKET}"."{KBC_TABLE}"'
    ).fetchone()
    assert rows[0] >= 0  # any non-negative count is fine; we're testing the path works


def test_extension_supports_copy_to_parquet(tmp_path):
    """Keboola materialized writes the SELECT result via
    `COPY (...) TO '...' (FORMAT PARQUET)`.
    Lock that primitive."""
    conn = duckdb.connect(str(tmp_path / "spike.duckdb"))
    conn.execute("INSTALL keboola FROM community")
    conn.execute("LOAD keboola")
    escaped_token = KBC_TOKEN.replace("'", "''")
    conn.execute(f"ATTACH '{KBC_URL}' AS kbc (TYPE keboola, TOKEN '{escaped_token}')")
    parquet_path = tmp_path / "out.parquet"
    safe_lit = str(parquet_path).replace("'", "''")
    conn.execute(
        f'COPY (SELECT * FROM kbc."{KBC_BUCKET}"."{KBC_TABLE}" LIMIT 5) '
        f"TO '{safe_lit}' (FORMAT PARQUET)"
    )
    assert parquet_path.exists() and parquet_path.stat().st_size > 0


def test_extension_supports_filtered_query(tmp_path):
    """Most important capability: a query with a WHERE clause and a
    projection survives. This is what 'Custom SQL' mode actually relies on."""
    conn = duckdb.connect(str(tmp_path / "spike.duckdb"))
    conn.execute("INSTALL keboola FROM community")
    conn.execute("LOAD keboola")
    escaped_token = KBC_TOKEN.replace("'", "''")
    conn.execute(f"ATTACH '{KBC_URL}' AS kbc (TYPE keboola, TOKEN '{escaped_token}')")
    parquet_path = tmp_path / "filtered.parquet"
    safe_lit = str(parquet_path).replace("'", "''")
    # Trivially filterable SELECT: the extension must push the WHERE down or
    # at minimum execute it client-side. Either is acceptable for our
    # materialized path. The filter is column-agnostic (1 = 1) because the
    # test table's schema is not known here.
    conn.execute(
        f'COPY (SELECT 1 AS marker FROM kbc."{KBC_BUCKET}"."{KBC_TABLE}" '
        f"WHERE 1 = 1 LIMIT 3) "
        f"TO '{safe_lit}' (FORMAT PARQUET)"
    )
    assert parquet_path.exists()
```

- [ ] **Step 2: Run the test to verify it skips (no creds in dev) or passes (creds present)**

```
pytest tests/test_keboola_extension_query_passthrough.py -v
```

Expected: SKIP if no creds, PASS if `KBC_TEST_URL` etc. are set. Both outcomes confirm the test is well-formed; it gates Phase B but doesn't block dev work.
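
The three tests above each rebuild the same ATTACH statement by hand and escape only the token. As a minimal sketch of how that quoting could be centralized, the helper below builds the statement from a URL and a token and escapes both. `sql_literal` and `build_attach_sql` are hypothetical names, not part of the codebase; only the statement shape is taken from the tests.

```python
def sql_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal (single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def build_attach_sql(url: str, token: str, alias: str = "kbc") -> str:
    """Build the ATTACH statement used by the lock-in tests.

    Hypothetical helper: the statement shape is copied from the tests,
    but this function itself is an illustration only.
    """
    # The alias is interpolated unquoted, so only allow a plain identifier.
    if not alias.isidentifier():
        raise ValueError(f"unsafe alias: {alias!r}")
    return (
        f"ATTACH {sql_literal(url)} AS {alias} "
        f"(TYPE keboola, TOKEN {sql_literal(token)})"
    )


stmt = build_attach_sql("https://connection.keboola.com/", "abc'def")
print(stmt)
```

Keeping the quoting in one place means a token or URL containing a single quote cannot break the statement in one test while passing in another.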
- [ ] **Step 3: Commit**

```bash
git add tests/test_keboola_extension_query_passthrough.py
git commit -m "test(keboola): lock-in Keboola extension query passthrough capability

The upcoming Keboola materialized path depends on the DuckDB Keboola
extension supporting:

  ATTACH 'keboola://...' AS kbc (TYPE keboola, TOKEN '...');
  COPY (SELECT * FROM kbc.bucket.table WHERE ...) TO 'parquet';

The existing extractor already uses this pattern (extractor.py:209), so
the capability is verified; this test pins it so a future extension
upgrade doesn't silently regress the materialized path.

Skips in CI without KBC_TEST_* env vars; passes locally with a real
Storage API token."
```

---

## Phase B: Backend (Keboola materialized path)

### Task B1: `KeboolaAccess` facade

**Files:**

- Create: `connectors/keboola/access.py`
- Create: `tests/test_keboola_access.py`

- [ ] **Step 1: Write the failing test**

```python
"""Tests for KeboolaAccess facade."""
import os

import pytest

from connectors.keboola.access import KeboolaAccess


def test_access_session_yields_attached_duckdb(tmp_path, monkeypatch):
    """Mock-mode test: the facade should accept a token, install+load the
    Keboola extension, and ATTACH it as 'kbc'. We verify the SQL issued by
    intercepting the duckdb.connect call.
    """
    issued = []

    class FakeConn:
        def execute(self, sql, *args, **kwargs):
            issued.append(sql)

            class R:
                def fetchall(s): return []
                def fetchone(s): return (0,)

            return R()

        def close(self): pass

    import duckdb
    monkeypatch.setattr(duckdb, "connect", lambda *a, **kw: FakeConn())

    acc = KeboolaAccess(
        url="https://connection.keboola.com/",
        token="fake-token-xyz",
    )
    with acc.duckdb_session() as conn:
        assert conn is not None

    # Verify the install + load + attach sequence happened.
    joined = "\n".join(issued)
    assert "INSTALL keboola" in joined
    assert "LOAD keboola" in joined
    assert "ATTACH" in joined and "TYPE keboola" in joined
    # The token must be embedded in the ATTACH literal.
    assert "fake-token-xyz" in joined


def test_access_escapes_single_quote_in_token(monkeypatch):
    """Defense against a token containing a single quote breaking the ATTACH
    literal. SQL injection risk here is low because the token is
    admin-supplied at instance config time, but escape it anyway."""
    issued = []

    class FakeConn:
        def execute(self, sql, *args, **kwargs):
            issued.append(sql)

            class R:
                def fetchall(s): return []
                def fetchone(s): return (0,)

            return R()

        def close(self): pass

    import duckdb
    monkeypatch.setattr(duckdb, "connect", lambda *a, **kw: FakeConn())

    acc = KeboolaAccess(url="x", token="bad'token")
    with acc.duckdb_session() as conn:
        pass

    attach_sql = next(s for s in issued if "ATTACH" in s)
    # Doubled single-quote per SQL string-literal escaping.
    assert "bad''token" in attach_sql


def test_access_real_attach_when_creds_present(tmp_path):
    """Smoke when KBC_TEST_URL + KBC_TEST_TOKEN are present."""
    url = os.environ.get("KBC_TEST_URL")
    token = os.environ.get("KBC_TEST_TOKEN")
    if not (url and token):
        pytest.skip("Keboola creds not provided")
    acc = KeboolaAccess(url=url, token=token)
    with acc.duckdb_session() as conn:
        # ATTACH must have succeeded: querying duckdb_databases() should
        # show the 'kbc' alias.
        rows = [r[0] for r in conn.execute("SELECT name FROM duckdb_databases()").fetchall()]
        assert "kbc" in rows
```

- [ ] **Step 2: Run, verify failure**

```
pytest tests/test_keboola_access.py -v
```

Expected: ImportError on `connectors.keboola.access` (module not yet created).

- [ ] **Step 3: Implement `KeboolaAccess`**

Write `connectors/keboola/access.py`:

```python
"""DuckDB session facade for the Keboola Storage API extension.

Parallel of `connectors/bigquery/access.py:BqAccess`.
The materialized Keboola SQL path needs a one-shot DuckDB connection with
the Keboola extension installed, loaded, and ATTACHed; this facade
encapsulates that wiring so `_run_materialized_pass` doesn't need to know
the extension name, the ATTACH alias, or how the URL and token get quoted
into the ATTACH statement.
"""
from __future__ import annotations

from contextlib import contextmanager
from typing import Iterator

import duckdb


class KeboolaAccess:
    """Lazy DuckDB session manager for the Keboola Storage API extension.

    Single-use: call `.duckdb_session()` as a context manager once per
    materialized job.
    """

    def __init__(self, *, url: str, token: str) -> None:
        if not url or not token:
            raise ValueError("KeboolaAccess requires url and token")
        self._url = url
        self._token = token

    @contextmanager
    def duckdb_session(self) -> Iterator[duckdb.DuckDBPyConnection]:
        conn = duckdb.connect(":memory:")
        try:
            conn.execute("INSTALL keboola FROM community")
            conn.execute("LOAD keboola")
            # Both values are embedded in single-quoted SQL literals, so
            # double any single quote in either of them.
            escaped_url = self._url.replace("'", "''")
            escaped_token = self._token.replace("'", "''")
            conn.execute(
                f"ATTACH '{escaped_url}' AS kbc "
                f"(TYPE keboola, TOKEN '{escaped_token}')"
            )
            yield conn
        finally:
            conn.close()
```

- [ ] **Step 4: Run, verify pass**

```
pytest tests/test_keboola_access.py -v
```

Expected: 2 PASS (mock tests), 1 SKIP (real-creds test).

- [ ] **Step 5: Commit**

```bash
git add connectors/keboola/access.py tests/test_keboola_access.py
git commit -m "feat(keboola): add KeboolaAccess facade for DuckDB-extension session

Parallel of connectors/bigquery/access.py:BqAccess. Encapsulates the
INSTALL + LOAD + ATTACH sequence the Keboola materialized SQL path needs,
with single-quote-escaped token interpolation. Single-use context
manager: the caller wraps 'with acc.duckdb_session() as conn:' around one
materialized job.

Mock tests verify the SQL sequence; a real-creds test exercises the
ATTACH end-to-end when KBC_TEST_URL + KBC_TEST_TOKEN are set."
```

---

### Task B2: `connectors.keboola.extractor.materialize_query`

**Files:**

- Modify: `connectors/keboola/extractor.py`
- Create: `tests/test_keboola_materialize.py`

- [ ] **Step 1: Write the failing test**

```python
"""Tests for the Keboola materialize_query path."""
import hashlib

import pytest

from connectors.keboola import extractor as kbe


def test_materialize_query_writes_parquet_and_returns_metadata(tmp_path, monkeypatch):
    """Mock-mode: feed in a fake KeboolaAccess that yields a real in-memory
    DuckDB connection (no Keboola extension), so `COPY ... TO '...'
    (FORMAT PARQUET)` writes a small parquet through DuckDB's own primitive.
    """
    import duckdb
    real_conn = duckdb.connect(":memory:")
    # Pre-create a small relation the fake materialize "copies".
    real_conn.execute("CREATE TABLE t AS SELECT 1 AS x, 'hello' AS y UNION ALL SELECT 2, 'world'")

    class FakeAccess:
        def duckdb_session(self):
            from contextlib import contextmanager

            @contextmanager
            def _cm():
                yield real_conn

            return _cm()

    fake_access = FakeAccess()
    output_dir = tmp_path / "out"
    output_dir.mkdir()

    # Submit a query that selects from the in-memory table (not a real
    # Keboola bucket; the test verifies the COPY/parquet/hash path,
    # not the extension behavior).
    result = kbe.materialize_query(
        table_id="example_subset",
        sql="SELECT * FROM t",
        keboola_access=fake_access,
        output_dir=output_dir,
    )

    parquet_path = output_dir / "example_subset.parquet"
    assert parquet_path.exists()
    assert result["table_id"] == "example_subset"
    assert result["path"] == str(parquet_path)
    assert result["rows"] == 2
    assert result["bytes"] > 0
    # MD5 of the bytes should match what we recompute.
    expected_md5 = hashlib.md5(parquet_path.read_bytes()).hexdigest()
    assert result["md5"] == expected_md5


def test_materialize_query_zero_rows_logs_warning(tmp_path, caplog):
    import duckdb
    real_conn = duckdb.connect(":memory:")
    real_conn.execute("CREATE TABLE t AS SELECT 1 AS x WHERE FALSE")

    class FakeAccess:
        def duckdb_session(self):
            from contextlib import contextmanager

            @contextmanager
            def _cm():
                yield real_conn

            return _cm()

    output_dir = tmp_path / "out"
    output_dir.mkdir()

    with caplog.at_level("WARNING"):
        result = kbe.materialize_query(
            table_id="empty_subset",
            sql="SELECT * FROM t",
            keboola_access=FakeAccess(),
            output_dir=output_dir,
        )
    assert result["rows"] == 0
    assert "0 rows" in caplog.text or "empty" in caplog.text.lower()


def test_materialize_query_rejects_unsafe_table_id(tmp_path):
    """Defense: table_id is interpolated into the parquet filename. Values
    that are unsafe for SQL or allow path traversal must be rejected
    up-front (mirror of BQ materialize_query's validation)."""

    class FakeAccess:
        def duckdb_session(self):
            raise AssertionError("should not be called")

    output_dir = tmp_path / "out"
    output_dir.mkdir()
    with pytest.raises(ValueError, match="table_id"):
        kbe.materialize_query(
            table_id="../../etc/passwd",
            sql="SELECT 1",
            keboola_access=FakeAccess(),
            output_dir=output_dir,
        )
```

- [ ] **Step 2: Run, verify fail**

```
pytest tests/test_keboola_materialize.py -v
```

Expected: AttributeError on `kbe.materialize_query` (function not yet defined).

- [ ] **Step 3: Implement**

Add to `connectors/keboola/extractor.py` (before any existing top-level helpers). The signature assumes `Path` is already imported from `pathlib` at module level; add the import if it is not.

```python
def materialize_query(
    table_id: str,
    sql: str,
    *,
    keboola_access,  # KeboolaAccess (avoid circular import)
    output_dir: Path,
) -> dict:
    """Materialize an admin-registered SELECT against the Keboola Storage API
    extension into a parquet file.

    Parallel of `connectors/bigquery/extractor.py:materialize_query`.
    Cost guardrail: the Keboola extension has no analog of BQ dry-run;
    Storage API cost is download-shaped (per-byte egress + Storage API job).
    Phase B ships without a guardrail and logs the byte count; a future PR
    can add a configurable `max_bytes_per_keboola_materialize` gate similar
    to BQ's `max_bytes_per_materialize`.
    """
    import re
    import hashlib
    import logging

    logger = logging.getLogger(__name__)

    # Defense: table_id is interpolated into the parquet filename.
    # Reject anything that's not a safe identifier.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table_id):
        raise ValueError(f"unsafe table_id for materialize: {table_id!r}")

    parquet_path = output_dir / f"{table_id}.parquet"
    safe_pq_lit = str(parquet_path).replace("'", "''")
    # A trailing semicolon would break the COPY (...) wrapper, so strip it.
    select_sql = sql.strip().rstrip(";")

    with keboola_access.duckdb_session() as conn:
        # Run the admin SELECT and copy the result to parquet.
        # The COPY wrapper is identical to the existing legacy extract
        # path at extractor.py:209; the only difference is the SELECT is
        # admin-supplied rather than `SELECT * FROM kbc.bucket.table`.
        conn.execute(f"COPY ({select_sql}) TO '{safe_pq_lit}' (FORMAT PARQUET)")
        # Read back row count.
        row_count = conn.execute(
            f"SELECT COUNT(*) FROM read_parquet('{safe_pq_lit}')"
        ).fetchone()[0]

    file_bytes = parquet_path.read_bytes()
    md5 = hashlib.md5(file_bytes).hexdigest()
    size = len(file_bytes)

    # The docstring promises byte-count logging in lieu of a guardrail.
    logger.info(
        "Materialized Keboola query for %s: %d rows, %d bytes",
        table_id, row_count, size,
    )
    if row_count == 0:
        logger.warning(
            "Materialized Keboola query for %s wrote 0 rows; verify the "
            "SQL filters and that the source bucket has data.",
            table_id,
        )

    return {
        "table_id": table_id,
        "path": str(parquet_path),
        "rows": row_count,
        "bytes": size,
        "md5": md5,
    }
```

- [ ] **Step 4: Run, verify pass**

```
pytest tests/test_keboola_materialize.py -v
```

Expected: 3 PASS.

- [ ] **Step 5: Commit**

```bash
git add connectors/keboola/extractor.py tests/test_keboola_materialize.py
git commit -m "feat(keboola): add materialize_query (admin SELECT → parquet)

Parallel of connectors/bigquery/extractor.py:materialize_query.
Runs an admin-registered SELECT through the Keboola DuckDB extension
via KeboolaAccess.duckdb_session(), wraps it in COPY ... TO '...'
(FORMAT PARQUET), and returns rows/bytes/md5/path metadata for
sync_state bookkeeping.

Cost guardrail intentionally omitted in this iteration: the Keboola
extension has no dry-run analog and Storage API cost shape is
download-byte-based, not scan-byte-based. Phase B ships with byte-count
logging; a follow-up can add a configurable max_bytes gate if needed.

table_id is validated as a safe identifier (mirror of BQ
implementation) because it's interpolated into the parquet filename."
```

---

### Task B3: `init_extract` skips materialized rows

**Files:**

- Modify: `connectors/keboola/extractor.py`
- Create: `tests/test_keboola_init_extract_skips.py`

- [ ] **Step 1: Failing test**

```python
"""Verify the legacy Keboola download path skips materialized rows.

Materialized rows are handled by `_run_materialized_pass` in
`app/api/sync.py`, not by the legacy extractor. Mirror of the BQ
extractor's existing skip behavior at line 188."""
from unittest.mock import patch

from connectors.keboola import extractor as kbe


def test_init_extract_skips_materialized_rows(tmp_path):
    """Given a registry with one local row + one materialized row, the
    legacy init_extract path must process only the local row."""
    extracts = tmp_path / "extracts" / "keboola"
    extracts.mkdir(parents=True)
    (extracts / "data").mkdir()

    table_configs = [
        {
            "id": "orders",
            "name": "orders",
            "bucket": "in.c-sales",
            "source_table": "orders",
            "query_mode": "local",
        },
        {
            "id": "orders_recent",
            "name": "orders_recent",
            "source_query": "SELECT * FROM kbc.\"in.c-sales\".\"orders\" WHERE date > '2026-01-01'",
            "query_mode": "materialized",
        },
    ]

    # Patch the actual ATTACH/COPY path so the test doesn't need real Keboola.
    seen = []

    def fake_run_one(conn, tc, *a, **kw):
        seen.append(tc["id"])

    with patch.object(kbe, "_extract_one_table", fake_run_one, create=True):
        kbe.init_extract(
            extracts_dir=extracts,
            table_configs=table_configs,
            keboola_url="https://x/",
            keboola_token="t",
        )

    assert seen == ["orders"]  # materialized row skipped


def test_init_extract_logs_skip_reason(tmp_path, caplog):
    """When skipping a materialized row, log the reason for ops visibility."""
    extracts = tmp_path / "extracts" / "keboola"
    extracts.mkdir(parents=True)
    (extracts / "data").mkdir()

    table_configs = [
        {
            "id": "orders_recent",
            "name": "orders_recent",
            "source_query": "SELECT 1",
            "query_mode": "materialized",
        },
    ]

    with caplog.at_level("INFO"):
        with patch.object(kbe, "_extract_one_table", lambda *a, **kw: None, create=True):
            kbe.init_extract(
                extracts_dir=extracts,
                table_configs=table_configs,
                keboola_url="https://x/",
                keboola_token="t",
            )
    assert "Skipping" in caplog.text and "materialized" in caplog.text
```

- [ ] **Step 2: Run, verify fail**

```
pytest tests/test_keboola_init_extract_skips.py -v
```

Expected: FAIL, because the current `init_extract` does not skip.

- [ ] **Step 3: Implement skip**

Find the existing iteration loop in `connectors/keboola/extractor.py` (around lines 100–135 where each table_config is processed). Add at the top of the per-table-config loop:

```python
for tc in table_configs:
    if tc.get("query_mode") == "materialized":
        logger.info(
            "Skipping legacy extract for %s: query_mode='materialized', "
            "handled by _run_materialized_pass instead",
            tc.get("id") or tc.get("name"),
        )
        continue
    ...  # existing per-table extract logic
```

(Refactoring note: if the existing loop body is monolithic, optionally extract it into `_extract_one_table(conn, tc, ...)` so the test can patch it cleanly. The first test above assumes that helper exists; if you keep the body inline, write the test to assert by directly observing parquet outputs instead.)
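
A third way to keep the skip testable, independent of how the loop body is factored, is to express the decision as a pure function over the config dicts. The sketch below is an illustration under that assumption: `split_by_query_mode` is a hypothetical helper, not an existing function in the extractor, and it treats a missing `query_mode` as the legacy default, matching the safety contract's statement that `'local'` is the default.

```python
from typing import Iterable


def split_by_query_mode(table_configs: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Split table configs into (legacy, materialized) by query_mode.

    Hypothetical helper for illustration. Rows without a query_mode are
    treated as legacy, i.e. they stay on the bucket-download path.
    """
    legacy: list[dict] = []
    materialized: list[dict] = []
    for tc in table_configs:
        if tc.get("query_mode") == "materialized":
            materialized.append(tc)
        else:
            legacy.append(tc)
    return legacy, materialized


configs = [
    {"id": "orders", "query_mode": "local"},
    {"id": "orders_recent", "query_mode": "materialized"},
    {"id": "customers"},  # no query_mode: stays on the legacy path
]
legacy, materialized = split_by_query_mode(configs)
print([tc["id"] for tc in legacy])
print([tc["id"] for tc in materialized])
```

With a helper like this, the skip rule can be unit-tested with plain dicts and no patching, while `init_extract` would only iterate the legacy half and log the materialized ids it leaves alone.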
- [ ] **Step 4: Run, verify pass**

```
pytest tests/test_keboola_init_extract_skips.py -v
```

Expected: PASS.

- [ ] **Step 5: Commit**

```bash
git add connectors/keboola/extractor.py tests/test_keboola_init_extract_skips.py
git commit -m "feat(keboola): legacy extract skips query_mode='materialized' rows

Mirror of the BQ extractor's existing skip at line 188. Materialized
Keboola rows are handled by _run_materialized_pass (wired up in Task B4)
rather than by the legacy bucket-download path.

Without this skip, a materialized row would get full-extracted via its
source bucket reference, double-writing data and confusing the
sync_state bookkeeping."
```

---

### Task B4: `_run_materialized_pass` dispatches by `source_type`

**Files:**

- Modify: `app/api/sync.py`
- Create: `tests/test_sync_trigger_keboola_materialized.py`

- [ ] **Step 1: Failing test**

```python
"""Scheduler-level test: when a Keboola row has query_mode='materialized',
_run_materialized_pass uses connectors.keboola.extractor.materialize_query
(not BQ's). Existing BQ-materialized rows continue using BqAccess."""
from unittest.mock import patch, MagicMock

import pytest


def test_run_materialized_pass_dispatches_keboola_to_keboola_extractor(seeded_app, tmp_path):
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    auth = {"Authorization": f"Bearer {token}"}

    # Register a Keboola materialized row.
    r = c.post(
        "/api/admin/register-table",
        headers=auth,
        json={
            "name": "orders_recent",
            "source_type": "keboola",
            "query_mode": "materialized",
            "source_query": (
                "SELECT * FROM kbc.\"in.c-sales\".\"orders\" "
                "WHERE date > '2026-01-01'"
            ),
        },
    )
    assert r.status_code == 201, r.text

    # Patch the two extractor entry points so we can observe which fires.
    bq_called = MagicMock()
    kb_called = MagicMock()
    with patch(
        "connectors.bigquery.extractor.materialize_query", bq_called
    ), patch(
        "connectors.keboola.extractor.materialize_query", kb_called
    ):
        # Trigger sync.
        r = c.post("/api/sync/trigger", headers=auth)
        # Allow background tasks to drain (depends on test client setup).

    assert kb_called.called, "Keboola materialize_query was not invoked"
    assert not bq_called.called, "BQ materialize_query was wrongly invoked for a Keboola row"


def test_run_materialized_pass_dispatches_bigquery_to_bq_extractor(seeded_app):
    """Regression: existing BQ-materialized path keeps working unchanged."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    auth = {"Authorization": f"Bearer {token}"}

    r = c.post(
        "/api/admin/register-table",
        headers=auth,
        json={
            "name": "events_summary",
            "source_type": "bigquery",
            "query_mode": "materialized",
            "source_query": "SELECT date, COUNT(*) FROM `proj.dataset.events` GROUP BY 1",
        },
    )
    assert r.status_code == 201, r.text

    bq_called = MagicMock()
    kb_called = MagicMock()
    with patch(
        "connectors.bigquery.extractor.materialize_query", bq_called
    ), patch(
        "connectors.keboola.extractor.materialize_query", kb_called
    ):
        c.post("/api/sync/trigger", headers=auth)

    assert bq_called.called
    assert not kb_called.called
```

- [ ] **Step 2: Run, verify fail**

```
pytest tests/test_sync_trigger_keboola_materialized.py -v
```

Expected: FAIL — `_run_materialized_pass` doesn't yet dispatch by `source_type` for Keboola.

- [ ] **Step 3: Implement dispatch**

Find `_run_materialized_pass` in `app/api/sync.py` (around line 57). The current body iterates rows and calls `_materialize_table` (which wraps BQ's `materialize_query`). Refactor:

```python
def _run_materialized_pass(conn, bq=None) -> dict:
    """Run all materialized rows that are due, dispatching by source_type
    to the correct connector's materialize_query.

    BigQuery rows go through BqAccess + bigquery_query() (jobs API),
    optionally cost-guarded by max_bytes_per_materialize.
    Keboola rows go through KeboolaAccess + ATTACH-and-COPY, no guardrail
    (extension has no dry-run primitive)."""
    from connectors.bigquery.extractor import materialize_query as bq_materialize
    from connectors.keboola.extractor import materialize_query as kb_materialize
    from connectors.keboola.access import KeboolaAccess
    from src.repositories.table_registry import TableRegistryRepository
    from src.scheduler import is_table_due
    # ... existing imports

    repo = TableRegistryRepository(conn)
    rows = repo.list_materialized_due()  # or however the existing iteration looks

    stats = {"materialized": 0, "skipped": 0, "errors": []}
    keboola_access = None  # lazy

    for row in rows:
        source_type = row.get("source_type") or "bigquery"  # legacy default

        if source_type == "bigquery":
            try:
                bq_materialize(
                    table_id=row["id"],
                    sql=row["source_query"],
                    bq=bq,           # existing BqAccess instance
                    output_dir=...,  # existing path
                    max_bytes=...,   # existing guardrail config
                )
                stats["materialized"] += 1
            except Exception as e:
                stats["errors"].append({"id": row["id"], "error": str(e)})
        elif source_type == "keboola":
            if keboola_access is None:
                # Lazy-init using instance config.
                from app.instance_config import get_value
                keboola_url = get_value("data_source", "keboola", "url")
                # Guard against a missing token_env: os.environ.get(None)
                # raises TypeError instead of returning None.
                token_env = get_value("data_source", "keboola", "token_env")
                keboola_token = os.environ.get(token_env) if token_env else None
                if not (keboola_url and keboola_token):
                    stats["errors"].append({
                        "id": row["id"],
                        "error": "Keboola URL/token not configured for materialized path",
                    })
                    continue
                keboola_access = KeboolaAccess(url=keboola_url, token=keboola_token)
            try:
                kb_materialize(
                    table_id=row["id"],
                    sql=row["source_query"],
                    keboola_access=keboola_access,
                    output_dir=...,  # /data/extracts/keboola/data/
                )
                stats["materialized"] += 1
            except Exception as e:
                stats["errors"].append({"id": row["id"], "error": str(e)})
        else:
            stats["skipped"] += 1
            stats["errors"].append({
                "id": row["id"],
                "error": f"materialized path not supported for source_type={source_type!r}",
            })

    return stats
```

(Adapt to the actual existing `_run_materialized_pass` shape — the snippet above is the structural change; concrete details like output_dir path and existing helper names are read from the file at implementation time.)

- [ ] **Step 4: Run, verify pass**

```
pytest tests/test_sync_trigger_keboola_materialized.py -v
pytest tests/test_sync_trigger_materialized.py -v  # existing BQ test must still pass
```

Expected: both files pass.

- [ ] **Step 5: Commit**

```bash
git add app/api/sync.py tests/test_sync_trigger_keboola_materialized.py
git commit -m "feat(sync): _run_materialized_pass dispatches by source_type

BQ materialized rows continue using BqAccess + bigquery_query() with
the cost guardrail. New Keboola materialized rows go through
KeboolaAccess + ATTACH-and-COPY (no guardrail — Keboola extension has
no dry-run primitive; download-byte-shaped cost is logged).

Existing tests for BQ dispatch keep passing (regression test
explicitly added). New tests verify Keboola dispatch fires for
source_type='keboola' rows and stays silent for BQ rows."
```

---

## Phase C — Backend: Pydantic deprecation + `profile_after_sync` becomes inert

### Task C1: Mark `sync_strategy` and `profile_after_sync` deprecated, stop persisting `profile_after_sync`

**Files:**
- Modify: `app/api/admin.py` (Pydantic models around lines 654–728 and 880–895; BQ register endpoint around line 791; `update_table` around line 1623)
- Modify: `tests/test_admin_bq_register.py` (assertions of `row["profile_after_sync"] is False` → drop, replace with assertion that the field-being-sent doesn't error)
- Create: `tests/test_admin_phase_c_deprecation.py`

- [ ] **Step 1: Failing test (deprecation visible in OpenAPI + field becomes inert)**

```python
"""Verify Phase C deprecation marks + profile_after_sync becomes inert."""
import pytest

from app.api.admin import RegisterTableRequest, UpdateTableRequest


def test_register_request_marks_sync_strategy_deprecated():
    schema = RegisterTableRequest.model_json_schema()
    field = schema["properties"]["sync_strategy"]
    assert field.get("deprecated") is True


def test_register_request_marks_profile_after_sync_deprecated():
    schema = RegisterTableRequest.model_json_schema()
    field = schema["properties"]["profile_after_sync"]
    assert field.get("deprecated") is True


def test_register_endpoint_accepts_profile_after_sync_for_backcompat(seeded_app):
    """External clients sending profile_after_sync get no error — the
    field is silently ignored."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    auth = {"Authorization": f"Bearer {token}"}
    r = c.post(
        "/api/admin/register-table",
        headers=auth,
        json={
            "name": "x",
            "source_type": "keboola",
            "bucket": "in.c-foo",
            "source_table": "y",
            "query_mode": "local",
            "profile_after_sync": True,  # legacy client may send this
        },
    )
    assert r.status_code == 201


def test_register_endpoint_does_not_persist_profile_after_sync(seeded_app):
    """The persisted row no longer carries the old profile_after_sync value
    (column may still exist in DB for back-compat, but admin path never
    writes a non-default value)."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    auth = {"Authorization": f"Bearer {token}"}
    r = c.post(
        "/api/admin/register-table",
        headers=auth,
        json={
            "name": "y",
            "source_type": "keboola",
            "bucket": "in.c-foo",
            "source_table": "y",
            "query_mode": "local",
            # Send the non-default value: if the request were still
            # persisted, the row would come back as False.
            "profile_after_sync": False,
        },
    )
    assert r.status_code == 201
    r = c.get("/api/admin/registry", headers=auth)
    rows = r.json()["tables"]
    row = next(t for t in rows if t["id"] == "y")
    # The field's value in the registry response is now whatever the DB
    # default is (True per current schema). Critical: the request value
    # is NOT echoed back.
    # If the value is in the response at all (legacy back-compat in the
    # GET serializer), it's the schema default, not the request value.
    # If the value is absent (deprecated and stripped), that's also fine.
    if "profile_after_sync" in row:
        # Whatever this is, it's the schema default, not request-driven.
        assert row["profile_after_sync"] is True or row["profile_after_sync"] is None
```

- [ ] **Step 2: Run, verify fail**

```
pytest tests/test_admin_phase_c_deprecation.py -v
```

Expected: deprecation-mark assertions FAIL (no `deprecated=True` yet).

- [ ] **Step 3: Implement Pydantic deprecation marks**

In `app/api/admin.py` `RegisterTableRequest` definition (import `Field` from `pydantic` if the module does not already), change:

```python
sync_strategy: str = "full_refresh"
```

to:

```python
sync_strategy: str = Field(
    default="full_refresh",
    deprecated=True,
    description=(
        "DEPRECATED: catalog/profiler metadata only. No extractor reads "
        "this field; every sync is a full overwrite regardless of value. "
        "profiler.is_partitioned() consumes it for parquet-layout "
        "detection. Field stays for back-compat; will be removed in a "
        "future major release."
    ),
)
```

Same treatment for `profile_after_sync`:

```python
profile_after_sync: bool = Field(
    default=True,
    deprecated=True,
    description=(
        "DEPRECATED: not consumed by the runtime (Agent 1 finding "
        "2026-05-01). Profiler runs unconditionally on every synced "
        "table; this flag has no effect. Field stays for back-compat."
    ),
)
```

In the BQ register endpoint at `app/api/admin.py:791`, find the line that sets `request.profile_after_sync = False` and remove it (the field is now inert, no need to force a value).

In `update_table` at `app/api/admin.py:1657`, the synthetic `RegisterTableRequest` carries `profile_after_sync=bool(merged.get("profile_after_sync") or False)` — keep this for back-compat but understand it's now decorative; the synthetic-validate path doesn't need to change.

In `register_table` at `app/api/admin.py:1362` (the actual repo.register call), drop `profile_after_sync=request.profile_after_sync` from the kwargs if it's there. The DB column has its default (`True` per schema) and stays consistent.

- [ ] **Step 4: Run**

```
pytest tests/test_admin_phase_c_deprecation.py -v
pytest tests/test_admin_bq_register.py -v
```

Expected: new tests pass; existing BQ register tests need updates where they assert `row["profile_after_sync"] is False`.

- [ ] **Step 5: Update existing assertions in `tests/test_admin_bq_register.py`**

Find lines 247, 648, 1371, 1430 where `assert row["profile_after_sync"] is False` exists. Replace with a comment + back-compat assertion:

```python
# Phase C: profile_after_sync is now inert. The field is accepted in
# the request for back-compat but no longer overrides the DB default.
# Was: assert row["profile_after_sync"] is False (when BQ register
# forced it to False as a "signal"). Now the row carries the schema
# default (True). Profiler runs unconditionally regardless.
assert row.get("profile_after_sync") in (True, None)
```

- [ ] **Step 6: Run full sweep**

```
pytest tests/test_admin_bq_register.py tests/test_admin_phase_c_deprecation.py -v
```

Expected: all pass.
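The "accepted but inert" behaviour that Steps 3 and 5 aim for can be sketched in isolation. This is a hedged illustration only: `INERT_REQUEST_FIELDS` and `build_register_kwargs` are hypothetical names, not functions in `app/api/admin.py`, and the real endpoint forwards a Pydantic model rather than a plain dict. The point is that an inert field passes validation yet never reaches storage, so the DB default wins.

```python
# Hypothetical names throughout; the real logic lives in app/api/admin.py.
INERT_REQUEST_FIELDS = frozenset({"profile_after_sync"})


def build_register_kwargs(payload: dict) -> dict:
    """Return the subset of a register payload that is forwarded to storage.

    Inert fields are accepted (no validation error) but never forwarded,
    so the DB column keeps its own default.
    """
    return {k: v for k, v in payload.items() if k not in INERT_REQUEST_FIELDS}


# A legacy client still sends the flag; it is dropped before persistence.
legacy_payload = {"name": "x", "source_type": "keboola", "profile_after_sync": False}
kwargs = build_register_kwargs(legacy_payload)
```

Keeping the inert names in one set makes it easy to retire further fields later without touching the endpoint body.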
- [ ] **Step 7: Commit**

```bash
git add app/api/admin.py tests/test_admin_bq_register.py tests/test_admin_phase_c_deprecation.py
git commit -m "feat(admin-api): mark sync_strategy + profile_after_sync deprecated; profile_after_sync becomes inert

OpenAPI schema now flags both fields with deprecated=true. External
API clients see the signal during their next regen but get no runtime
error — back-compat preserved.

profile_after_sync was previously force-set to False by the BQ register
endpoint as a 'signal,' but app/api/sync.py:410-438 never reads the
flag (Agent 1 finding 2026-05-01). The runtime profiles every synced
table unconditionally. Phase C removes the force-False line and stops
the field from overriding the DB default — it's now decorative-only
in both directions.

sync_strategy stays alive in DB and Pydantic because
profiler.is_partitioned() at src/profiler.py:222 still consumes it
for parquet-directory-layout detection on existing partitioned rows.
Phase F (UI) hides the field from the form; Phase C just labels it
for external consumers.

Existing BQ register tests asserting row['profile_after_sync'] is False
updated to back-compat-tolerant form."
```

---

### Task C2: `RegisterTableRequest` validator allows Keboola materialized

**Files:**
- Modify: `app/api/admin.py` (`_check_mode_query_coherence` model validator, around lines 681–692)
- Create: `tests/test_admin_keboola_materialized.py`

- [ ] **Step 1: Failing test**

```python
"""Tests for Keboola materialized registration."""
import pytest


def test_register_keboola_materialized_accepts_source_query(seeded_app):
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    auth = {"Authorization": f"Bearer {token}"}
    r = c.post(
        "/api/admin/register-table",
        headers=auth,
        json={
            "name": "orders_recent",
            "source_type": "keboola",
            "query_mode": "materialized",
            "source_query": "SELECT * FROM kbc.\"in.c-sales\".\"orders\" WHERE date > '2026-01-01'",
            "sync_schedule": "daily 03:00",
        },
    )
    assert r.status_code == 201, r.text


def test_register_keboola_materialized_rejects_missing_source_query(seeded_app):
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    auth = {"Authorization": f"Bearer {token}"}
    r = c.post(
        "/api/admin/register-table",
        headers=auth,
        json={
            "name": "orders_recent",
            "source_type": "keboola",
            "query_mode": "materialized",
            # source_query missing
        },
    )
    assert r.status_code == 422
    assert "source_query" in r.text


def test_register_keboola_materialized_skips_bucket_check(seeded_app):
    """Materialized rows don't need bucket/source_table — the SELECT
    inlines the references. Mirror of BQ materialized validator behavior."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    auth = {"Authorization": f"Bearer {token}"}
    r = c.post(
        "/api/admin/register-table",
        headers=auth,
        json={
            "name": "x",
            "source_type": "keboola",
            "query_mode": "materialized",
            "source_query": "SELECT 1",
            # No bucket / source_table — must still succeed.
        },
    )
    assert r.status_code == 201, r.text


def test_update_keboola_materialized_clears_stale_source_query_on_mode_switch(seeded_app):
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    auth = {"Authorization": f"Bearer {token}"}
    # Register materialized.
    r = c.post(
        "/api/admin/register-table",
        headers=auth,
        json={
            "name": "x",
            "source_type": "keboola",
            "query_mode": "materialized",
            "source_query": "SELECT 1",
        },
    )
    assert r.status_code == 201

    # PUT to switch back to local — source_query must clear.
    r = c.put(
        "/api/admin/registry/x",
        headers=auth,
        json={
            "source_type": "keboola",
            "query_mode": "local",
            "bucket": "in.c-foo",
            "source_table": "y",
        },
    )
    assert r.status_code == 200
    r = c.get("/api/admin/registry", headers=auth)
    rows = r.json()["tables"]
    row = next(t for t in rows if t["id"] == "x")
    assert row.get("source_query") in (None, "")
```

- [ ] **Step 2: Run, verify fail**

```
pytest tests/test_admin_keboola_materialized.py -v
```

Expected: at least the first test fails — current validator rejects materialized for non-BQ source_type, or accepts but the storage path bombs.

- [ ] **Step 3: Implement validator update**

Find `_check_mode_query_coherence` in `app/api/admin.py` (around lines 681–692). It currently enforces `source_query` IFF `query_mode='materialized'`. Verify it doesn't gate by `source_type`. If it does, remove the gate. If it doesn't, the validator itself needs no change and the tests should already pass; if they still fail, the cause is downstream of validation (for example the storage path), so investigate there.

Also check `_validate_bigquery_register_payload` (around line 794) and make sure it isn't called for non-BQ rows. The dispatch at `register_table` line 1354 should already be `source_type == 'bigquery'`-gated.

For the `update_table` PUT semantics test, verify that `update_table` at line 1642 already has the "switching away from materialized → drop source_query" logic. Mirror it for the reverse (switching INTO materialized → drop bucket/source_table) if needed.
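The two rules in Step 3 (the coherence check, and clearing stale fields on a mode switch) can be sketched without Pydantic. This is a minimal illustration under stated assumptions: `check_mode_query_coherence` and `merge_update` are hypothetical stand-ins for the real `_check_mode_query_coherence` validator and the merge inside `update_table`, and dropping `bucket`/`source_table` on a switch into materialized is the optional "reverse" behaviour, which the real code may or may not need.

```python
from typing import Optional


def check_mode_query_coherence(query_mode: Optional[str], source_query: Optional[str]) -> None:
    """Raise ValueError unless source_query is present exactly when materialized."""
    has_query = bool(source_query and source_query.strip())
    if query_mode == "materialized" and not has_query:
        raise ValueError("source_query is required when query_mode='materialized'")
    if query_mode != "materialized" and has_query:
        raise ValueError("source_query is only valid when query_mode='materialized'")


def merge_update(existing: dict, patch: dict) -> dict:
    """Merge a PUT body into the stored row, dropping fields a mode switch made stale."""
    merged = {**existing, **patch}
    if merged.get("query_mode") == "materialized":
        # Assumption: the SELECT inlines its table references, so these go.
        merged["bucket"] = None
        merged["source_table"] = None
    else:
        # Switching away from materialized: the old SELECT must not linger.
        merged["source_query"] = None
    check_mode_query_coherence(merged.get("query_mode"), merged.get("source_query"))
    return merged
```

Note that neither function looks at `source_type`, which is the property the Keboola tests rely on.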
- [ ] **Step 4: Run, verify pass**

```
pytest tests/test_admin_keboola_materialized.py -v
```

- [ ] **Step 5: Commit**

```bash
git add app/api/admin.py tests/test_admin_keboola_materialized.py
git commit -m "feat(admin-api): allow query_mode='materialized' for Keboola source_type

The model validator already only gates materialized↔source_query
coherence (no source_type-specific check). Phase B made the runtime
materialized path source_type-aware. This commit pins the API contract
with end-to-end tests that:

- Keboola+materialized POST with source_query succeeds
- Keboola+materialized POST without source_query is rejected (422)
- Keboola+materialized POST without bucket/source_table succeeds
  (the SELECT inlines references — same as BQ)
- PUT switching a materialized row back to local clears the stale
  source_query (mirror of BQ behavior at admin.py:1642)"
```

---

## Phase D — UI: tab-split scaffold

### Task D1: Tab nav structure + routing

**Files:**
- Modify: `app/web/templates/admin_tables.html` (top of `` around the existing single form area)
- Create: `tests/test_admin_tables_tab_ui.py`

- [ ] **Step 1: Failing test**

```python
"""UI tests for the per-connector tab layout."""
import pytest


def _auth(token):
    return {"Authorization": f"Bearer {token}"}


def test_admin_tables_renders_tab_nav(seeded_app):
    """Page has tab nav with at least the source types configured for the
    instance plus Jira (always shown when any Jira rows exist)."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    r = c.get("/admin/tables", headers=_auth(token))
    assert r.status_code == 200
    html = r.text
    assert 'role="tablist"' in html or 'class="tab-nav"' in html
    assert 'data-tab="bigquery"' in html or 'id="tab-bigquery"' in html
    assert 'data-tab="keboola"' in html or 'id="tab-keboola"' in html


def test_admin_tables_active_tab_matches_instance_type(seeded_app, monkeypatch):
    """When data_source.type='bigquery', the BigQuery tab is the initially-active one.
    Operator can still switch to Keboola tab if they want to register a
    secondary source."""
    fake_cfg = {"data_source": {"type": "bigquery", "bigquery": {"project": "p"}}}
    monkeypatch.setattr(
        "app.instance_config.load_instance_config",
        lambda: fake_cfg,
        raising=False,
    )
    from app.instance_config import reset_cache
    reset_cache()
    try:
        c = seeded_app["client"]
        token = seeded_app["admin_token"]
        r = c.get("/admin/tables", headers=_auth(token))
        html = r.text
        # The BQ tab content is the visible one initially.
        # Either a class="active" on the BQ tab button, or aria-selected="true".
        assert (
            'data-tab="bigquery" class="tab active"' in html
            or 'data-tab="bigquery" aria-selected="true"' in html
        )
    finally:
        reset_cache()


def test_admin_tables_each_tab_has_register_button(seeded_app):
    """Each writable source tab has its own Register button. Jira is
    read-only (no Register)."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    r = c.get("/admin/tables", headers=_auth(token))
    html = r.text
    # Each Register button is scoped to its tab — id distinguishes.
    # We check presence of the registration trigger elements.
    assert 'id="bqRegisterBtn"' in html or 'data-register-source="bigquery"' in html
    assert 'id="kbRegisterBtn"' in html or 'data-register-source="keboola"' in html
    # No Jira register button (Jira is webhook-driven).
    assert 'data-register-source="jira"' not in html


def test_admin_tables_listing_per_tab(seeded_app):
    """The registry table is rendered per tab — each tab has its own
    listing, filtered by source_type.
    Listing JS reads tables from the catalog API and routes each row into
    the matching tab's listing."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    r = c.get("/admin/tables", headers=_auth(token))
    html = r.text
    assert 'id="bqTableListing"' in html
    assert 'id="kbTableListing"' in html
    assert 'id="jiraTableListing"' in html


def test_admin_tables_tab_persists_in_url_hash(seeded_app):
    """Tab switching updates window.location.hash so refresh keeps the
    operator on the right tab. Verify the JS hooks for it are present."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    r = c.get("/admin/tables", headers=_auth(token))
    html = r.text
    assert "location.hash" in html or "history.replaceState" in html
    # And initial-tab pickup from hash on load.
    assert "window.location.hash" in html or "getActiveTabFromHash" in html
```

- [ ] **Step 2: Run, verify fail**

```
pytest tests/test_admin_tables_tab_ui.py -v
```

Expected: all FAIL — tab structure not yet in template.

- [ ] **Step 3: Implement tab nav + content panels**

Restructure `app/web/templates/admin_tables.html`. The existing single content area becomes three tab panels. Outline of the new top-level structure (replace the existing single page-content area):

```html
{# Determine the initial active tab from the data source type + any registered rows.
   Operator can still switch tabs to register in another source. #}
{% set initial_tab = data_source_type %}
{# BQ tab: Register button, listing, modals — Phase E moves existing content here. #}

BigQuery tables

{# Existing BQ register/edit modals get scoped here in Phase E. #}

Keboola tables

{# Phase F builds the Keboola form here. #}
```

- [ ] **Step 4: Run, verify pass**

```
pytest tests/test_admin_tables_tab_ui.py -v
```

- [ ] **Step 5: Commit**

```bash
git add app/web/templates/admin_tables.html tests/test_admin_tables_tab_ui.py
git commit -m "feat(admin-ui): tab-split scaffold for /admin/tables

Per-connector tabs (BigQuery / Keboola / Jira) replace the single
mixed form. Each tab has its own Register button + listing div +
(later) form modals. Initial active tab matches data_source.type from
instance.yaml; operator can switch tabs to manage a secondary source.

Tab state persists in window.location.hash so refresh keeps the
operator on the right tab. No JS framework — vanilla JS toggles
display on .tab-content sections.

Listing divs (bqTableListing / kbTableListing / jiraTableListing) are
wired in Phase H (per-tab listing filter)."
```

---

## Phase E — UI: BigQuery tab content (relocate existing #148 form)

### Task E1: Move BQ Register modal + listing logic into BQ tab

**Files:**
- Modify: `app/web/templates/admin_tables.html`
- Modify: `tests/test_admin_tables_ui_materialized.py` (selector adjustments)

- [ ] **Step 1: Failing test (existing tests must pass against new tab structure)**

The existing `test_admin_tables_renders_two_question_radio_form` and `test_edit_modal_has_bq_parity_fields` already assert the BQ form exists. Update them to assert the form is **inside** the BQ tab:

```python
def test_admin_tables_renders_two_question_radio_form(seeded_app, bq_instance):
    """[Phase E] BQ form moved into tab-content-bigquery section."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    r = c.get("/admin/tables", headers=_auth(token))
    assert r.status_code == 200, r.text
    html = r.text

    # Existing assertions (preserved):
    assert 'name="bqAccessMode"' in html
    assert 'value="live"' in html
    # ... (all the original assertions stay)

    # NEW: form fields are inside the BQ tab content area.
    bq_tab_content = html[html.index('id="tab-content-bigquery"'):]
    bq_tab_end = bq_tab_content.index('')
    bq_section = bq_tab_content[:bq_tab_end]
    assert 'name="bqAccessMode"' in bq_section
    assert 'id="bqDataset"' in bq_section
    assert 'id="bqSourceQuery"' in bq_section
```

- [ ] **Step 2: Run, verify fail**

```
pytest tests/test_admin_tables_ui_materialized.py::test_admin_tables_renders_two_question_radio_form -v
```

Expected: FAIL on the `tab-content-bigquery` slice — form not yet inside the tab.

- [ ] **Step 3: Move BQ form into BQ tab content section**

Take the existing BQ register form block (currently inside the `{% if data_source.type == 'bigquery' %}` Jinja branch) and physically relocate it inside the `#tab-content-bigquery` element added in Phase D. Remove the outer `{% if %}` branch — the form is always rendered, just inside its tab.

Same for the BQ Edit modal block — relocate inside the BQ tab section.

Adjust the open/close modal trigger functions:

```javascript
// Old: openRegisterModal() — assumed single source
// New: openRegisterModal(source)
function openRegisterModal(source) {
  if (source === 'bigquery') {
    document.getElementById('registerBqModal').style.display = 'block';
  } else if (source === 'keboola') {
    document.getElementById('registerKeboolaModal').style.display = 'block';
  }
}
```

(The Keboola modal id is added in Phase F.)

- [ ] **Step 4: Run, verify pass**

```
pytest tests/test_admin_tables_ui_materialized.py -v
pytest tests/test_admin_bq_register.py -v
pytest tests/test_admin_discover_bigquery.py -v
```

Expected: all existing BQ-form tests pass — the form behaves identically, just from inside a tab.

- [ ] **Step 5: Commit**

```bash
git add app/web/templates/admin_tables.html tests/test_admin_tables_ui_materialized.py
git commit -m "refactor(admin-ui): relocate BigQuery form into BigQuery tab

Phase E of the tab-split. Existing BQ register/edit modals + Discover/
List-tables/Use-as-base buttons + two-question radio model preserved
verbatim — only the parent
changed. The Jinja {% if data_source.type == 'bigquery' %} branch is
gone; the form is always rendered, just inside #tab-content-bigquery.

openRegisterModal() now takes a source argument. Existing tests for
form structure adjusted to slice on the BQ tab content; no behavior
change."
```

---

## Phase F — UI: Keboola tab content (with Custom SQL + form cleanup)

### Task F1: Keboola Register modal — full rebuild with two-question radio + form cleanup

**Files:**
- Modify: `app/web/templates/admin_tables.html`
- Create test: extend `tests/test_admin_tables_ui_materialized.py`

- [ ] **Step 1: Failing test**

```python
def test_keboola_register_form_has_two_question_radio(seeded_app, monkeypatch):
    """Phase F: Keboola tab Register form mirrors BQ's two-question radio model,
    but Q1 (access mode) is forced to 'synced' (no Live mode for Keboola),
    so visually only Q2 (sync mode = whole | custom) is exposed.

    Q2.whole → query_mode='materialized' with auto SELECT * FROM kbc.bucket.table
    Q2.custom → query_mode='materialized' with admin SELECT

    Both create materialized rows; the legacy 'local' mode is no longer
    user-selectable (it would be exactly equivalent to whole)."""
    fake_cfg = {"data_source": {"type": "keboola", "keboola": {}}}
    monkeypatch.setattr(
        "app.instance_config.load_instance_config",
        lambda: fake_cfg,
        raising=False,
    )
    from app.instance_config import reset_cache
    reset_cache()
    try:
        c = seeded_app["client"]
        token = seeded_app["admin_token"]
        r = c.get("/admin/tables", headers=_auth(token))
        html = r.text
        kb_tab = html[html.index('id="tab-content-keboola"'):]
        kb_tab = kb_tab[:kb_tab.index('')]

        # Q2 radio — Whole vs Custom.
        assert 'name="kbSyncMode"' in kb_tab
        assert 'value="whole"' in kb_tab
        assert 'value="custom"' in kb_tab

        # Bucket + source-table inputs reused for whole mode.
        assert 'id="kbBucket"' in kb_tab
        assert 'id="kbSourceTable"' in kb_tab

        # Custom-SQL textarea + Use-table-as-base prefill button.
        assert 'id="kbSourceQuery"' in kb_tab
        assert 'kbPrefillFromTable' in html or 'prefillFromTable(\'kbSourceQuery\')' in html

        # Sync Schedule input — was missing from old Keboola form.
        assert 'id="kbSyncSchedule"' in kb_tab

        # Sync Strategy dropdown — gone.
        assert 'id="kbStrategy"' not in kb_tab
        assert 'id="regStrategy"' not in html  # leftover sanity

        # Primary Key — under
        # Advanced.
        assert 'id="kbPrimaryKey"' in kb_tab
        assert "Advanced" in kb_tab

        # Discover datasets / List tables buttons.
        assert 'kbDiscoverBuckets' in html or "discoverKeboolaBuckets(" in html
        assert 'kbListTables' in html or "discoverKeboolaTables(" in html
    finally:
        # Close the try opened above and restore the config cache,
        # as the Phase D active-tab test does.
        reset_cache()


def test_keboola_register_payload_maps_to_materialized(seeded_app, monkeypatch):
    """The form's whole-table mode posts query_mode='materialized' with a
    synthetic SELECT * SQL — same pattern as BQ Synced/Whole."""
    # Exercising the JS payload via a parameterized fetch shim is harder
    # than necessary; instead, verify the API endpoint accepts the
    # payload shape the form is going to send.
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    auth = {"Authorization": f"Bearer {token}"}
    r = c.post(
        "/api/admin/register-table",
        headers=auth,
        json={
            "name": "orders",
            "source_type": "keboola",
            "query_mode": "materialized",
            "source_query": 'SELECT * FROM kbc."in.c-sales"."orders"',
            "sync_schedule": "every 6h",
        },
    )
    assert r.status_code == 201, r.text
```

- [ ] **Step 2: Run, verify fail**

```
pytest tests/test_admin_tables_ui_materialized.py::test_keboola_register_form_has_two_question_radio -v
```

Expected: FAIL — Keboola form not yet built.

- [ ] **Step 3: Build the Keboola Register modal**

Inside the `#tab-content-keboola` element from Phase D, add the modal:

```html
```

Plus the JS for the form (in the `