/admin/tables Unified Tab UI + Keboola Materialized Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Unify /admin/tables operator UX with per-connector tabs (BigQuery / Keboola / Jira), bring Keboola to capability-parity with BigQuery for the materialized SQL path, clean up the misleading Keboola form fields, and resolve the profile_after_sync dead-code bug — all in one PR because the four concerns are tightly interconnected and splitting them would mean re-doing the form layout multiple times.
Architecture: New branch on top of merged main (after PR #148 lands BQ-materialized + analyst auto-sync). The PR adds: (1) per-connector tab navigation in /admin/tables, (2) generalized materialized path in app/api/sync.py that dispatches by source_type to either connectors/bigquery/extractor.py:materialize_query (existing) or new connectors/keboola/extractor.py:materialize_query, (3) Keboola tab form with the new "Custom SQL" mode mirroring BQ's two-question radio, (4) Keboola form cleanup (drop Strategy, add Schedule, hide PK), (5) Pydantic deprecation marks and inert behavior for profile_after_sync (column stays in DB, removed from runtime). No DB schema migration. Existing BQ #148 form structure preserved verbatim, just relocated into the BQ tab.
Tech Stack: Jinja2 templates + vanilla JS (admin UI), FastAPI + Pydantic v2 (admin API), DuckDB BigQuery + Keboola extensions (extract path), pytest + TestClient (test suite).
Brief — the problem and the design decision
Today's /admin/tables UX has four interconnected problems
1. Single mixed form: one Jinja {% if data_source.type == 'bigquery' %} switch picks the Keboola vs. BigQuery branch. Result: an instance configured for Keboola can't register BigQuery rows from the UI, and vice versa. Multi-source instances have no UI surface for the secondary source. The Edit modal compounds the mix with keboola-edit-only / bq-edit-only show/hide classes.
2. Capability asymmetry: Keboola is a subset of BigQuery. In our current model:

| Mode | BigQuery | Keboola |
| --- | --- | --- |
| Live (queries hit source) | ✅ query_mode='remote' via DuckDB BQ extension | ❌ DuckDB Keboola extension has no live-attach mode |
| Synced / Whole (full table → parquet) | ✅ query_mode='materialized' with auto SELECT * | ✅ legacy path: extractor downloads bucket/table |
| Synced / Custom SQL (filtered/aggregated → parquet) | ✅ query_mode='materialized' with admin SELECT | ❌ not implemented |

   Only one of the three modes (Synced / Whole) works for both sources today, and the asymmetry is hidden in code where the operator can't see it. Verified spike (2026-05-01): the DuckDB Keboola extension supports COPY (SELECT * FROM kbc."bucket"."table" WHERE …) TO 'parquet', the same pattern the existing extractor already uses at connectors/keboola/extractor.py:209. The Keboola materialized path is a clean parallel of the BigQuery one.
3. Misleading Keboola form fields: two independent agent reviews (2026-05-01) found:
   - The Sync Strategy dropdown's hint claims it controls extraction, but no extractor reads the field; only src/profiler.py:222 is_partitioned() consumes it for parquet-layout detection. Every Keboola sync is a full overwrite regardless of value. Operators picking "Incremental" expect deltas and get a full refresh.
   - Primary Key looks like an upsert key but is decorative metadata only. There is no upsert/dedup anywhere; every sync is a full overwrite. The profiler reads it for catalog annotation.
   - The Sync Schedule input is missing entirely from the Keboola branch even though src/scheduler.py:248 filter_due_tables honors per-table cron for every source. Operators have to use the API/CLI to set per-table cadence; there is no UI surface.
4. profile_after_sync is dead code (Agent 1 finding): the BQ register endpoint at app/api/admin.py:791,881 forces the field to False "as a signal," but the profiler block at app/api/sync.py:410-438 never reads the flag. The profiler runs unconditionally on every synced table, so the field is inert.
Why one PR
These four problems cluster around the same template (app/web/templates/admin_tables.html) and the same backend dispatch (app/api/admin.py + app/api/sync.py). Splitting into four sequential PRs would mean:
- Form-cleanup PR touches Keboola-form-as-it-is-today, then tab-split PR re-does the same layout inside a tab → throwaway work
- Keboola-materialized PR adds a Custom SQL textarea to the Keboola form, but the form layout is still the mixed flat one → confusing partial state
- profile_after_sync PR is its own concern but loosely tied to the Pydantic models touched by the form changes
Doing them as one PR lets us land a coherent operator-facing change: "/admin/tables is now per-connector tabs, with Keboola at full capability parity with BigQuery." The internal cleanup (form labels, dead code) comes along naturally.
What this PR is NOT
- Not a schema migration. The table_registry.profile_after_sync and sync_strategy columns stay in the DB (back-compat for external API consumers, and the profiler keeps reading sync_strategy). Both are marked Field(deprecated=True) in Pydantic. A future PR can drop the columns once external consumers migrate.
- Not a Live mode for Keboola. The Keboola DuckDB extension doesn't support remote view passthrough; adding it is upstream extension work outside this scope.
- Not a refactor of the orchestrator or analytics view layer. Materialized parquets land in data/extracts/<source>/data/ and the existing SyncOrchestrator.rebuild() local-parquet walk picks them up unchanged.
E2E safety contract
User feedback (2026-05-01): "Our changes must ultimately work E2E, so we can't leave anything out." The plan must protect these invariants; every task that could violate one has an explicit gating test.
1. PUT preservation invariant: when the Edit modal stops sending sync_strategy in the payload, an existing row's stored value (especially 'partitioned', used by profiler.is_partitioned() for parquet-directory layout) must survive. Verified: app/api/admin.py:1623 uses request.model_dump() (without exclude_unset=True) plus an "if v is not None" filter, so omitted Optional fields drop out before the merge. Phase F locks this with a regression test.
2. Existing partitioned rows still profile correctly: sync_strategy stays alive in DB + Pydantic. The profiler at src/profiler.py:222 keeps reading it. Existing rows with sync_strategy='partitioned' keep their parquet-directory layout. No DB migration. No behavioral change for legacy rows.
3. Existing #148 BQ form behavior preserved verbatim: the two-question radio (Live × Synced × Whole | Custom), the Discover / List tables / Use-as-base buttons, and the table-vs-view auto-detection hint are all lifted into the BigQuery tab unchanged. The tests in tests/test_admin_tables_ui_materialized.py and tests/test_admin_bq_register.py that assert form structure must still pass.
4. External API back-compat: tests/test_migration.py:44, tests/test_repositories.py:277, and tests/test_api_complete.py:117 POST sync_strategy='incremental' to the API. These must keep passing: RegisterTableRequest still accepts the field; only the UI omits it.
5. profile_after_sync becomes inert, not breaking: Pydantic still accepts the field (with deprecated=True). External API clients that send it get no error and no warning; the server silently ignores it. The existing tests at tests/test_admin_bq_register.py:247,648,1371,1430 are updated: assertions of profile_after_sync == False are removed (the field is no longer persisted), but request payloads with the field still work.
6. Materialized Keboola dispatch is conservative: the new _run_materialized_pass Keboola branch only fires for rows with source_type='keboola' AND query_mode='materialized'. Existing Keboola rows (query_mode='local', the default) keep going through the legacy connectors/keboola/extractor.py download path unchanged. No silent rerouting.
7. Tab navigation degrades gracefully: the page works without JS (the server renders all three tabs visible; JS just hides the inactive ones). If only one source type is configured, the relevant tab is auto-active and the other tabs render with a "no [source] configured" notice instead of an empty form.
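The PUT preservation invariant (item 1) can be illustrated with plain dictionaries. This is a minimal sketch under stated assumptions, not the code at app/api/admin.py:1623: merge_update and the sample rows are hypothetical, and a dict stands in for the output of request.model_dump().

```python
from typing import Any


def merge_update(stored: dict[str, Any], payload: dict[str, Any]) -> dict[str, Any]:
    """Merge a PUT payload into a stored row, skipping None values.

    Hypothetical helper: `payload` stands in for request.model_dump(), where
    Optional fields the client omitted show up as None.
    """
    updates = {key: value for key, value in payload.items() if value is not None}
    return {**stored, **updates}


# A legacy row whose parquet layout depends on sync_strategy.
stored_row = {"name": "orders", "sync_strategy": "partitioned", "sync_schedule": None}

# The Edit modal no longer sends sync_strategy, so the dumped model carries None.
put_payload = {"name": "orders", "sync_strategy": None, "sync_schedule": "0 6 * * *"}

merged = merge_update(stored_row, put_payload)
print(merged["sync_strategy"])  # partitioned
print(merged["sync_schedule"])  # 0 6 * * *
```

The trade-off of a None filter is that a client cannot reset a field to null through this path; that is acceptable here because the invariant is about not losing stored values.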
File Structure
Created:
- connectors/keboola/extractor.py:materialize_query: new top-level function (parallel to connectors/bigquery/extractor.py:materialize_query). Takes (table_id, sql, *, keboola_access, output_dir), runs COPY (sql) TO 'parquet' inside a KeboolaAccess session, and returns a dict with rows / bytes / md5 / path.
- connectors/keboola/access.py: thin facade analogous to connectors/bigquery/access.py:BqAccess. Provides the KeboolaAccess.duckdb_session() context manager, which yields a DuckDB connection with the Keboola extension loaded + ATTACHed. Encapsulates token handling so _run_materialized_pass doesn't need to know extension wiring details.
- tests/test_keboola_materialize.py: unit + integration tests for materialize_query. Mocks the Keboola extension where possible; uses a real fixture extract.duckdb otherwise.
- tests/test_admin_keboola_materialized.py: admin API tests for registering/updating Keboola-materialized rows.
- tests/test_sync_trigger_keboola_materialized.py: scheduler-level integration test asserting that _run_materialized_pass dispatches to Keboola for Keboola-materialized rows.
- tests/test_admin_tables_tab_ui.py: UI tests for the new tab structure.
- tests/test_admin_put_preservation.py: regression guard for the PUT field-preservation invariant (item 1 of the E2E safety contract).
Modified:
- app/web/templates/admin_tables.html: substantial restructure: tab nav, per-tab content panels, per-tab Register modal triggers, per-tab listing filter. Existing BQ form contents preserved verbatim, relocated into the BQ tab. Keboola form rebuilt with the same two-question radio model + new Custom SQL textarea. The Jira tab is a read-only listing.
- app/api/admin.py: extend the RegisterTableRequest._check_mode_query_coherence model_validator to allow query_mode='materialized' for source_type='keboola' (today the validator implicitly assumes BQ for materialized). Mark sync_strategy and profile_after_sync as Field(deprecated=True). Stop reading profile_after_sync from the request in BQ register / update_table (no longer persisted, but the field is accepted for back-compat).
- app/api/sync.py: _run_materialized_pass dispatches by source_type: the existing BQ branch keeps BqAccess + connectors.bigquery.extractor.materialize_query; the new Keboola branch uses KeboolaAccess + connectors.keboola.extractor.materialize_query. The cost guardrail (BQ dry-run) only runs for BQ rows; Keboola has no analogous dry-run primitive in the extension and the Storage API has a different cost shape, so it is skipped with a TODO comment for future work.
- connectors/keboola/extractor.py: init_extract (the legacy full-download path) skips query_mode='materialized' rows so they aren't double-extracted. Mirror of the BQ extractor's existing skip at connectors/bigquery/extractor.py:188.
- tests/test_admin_bq_register.py: remove assertions of row["profile_after_sync"] is False (the field is no longer persisted); request payloads keep the field for back-compat verification. Existing form-structure tests adjusted for the tab restructure (selectors prefixed with tab container ids).
- tests/test_admin_tables_ui_materialized.py: assertions adjusted for the tab restructure.
- CHANGELOG.md: ## [Unreleased] block with ### Added, ### Changed, ### Fixed, ### Deprecated entries.
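The source_type dispatch described for app/api/sync.py can be sketched as a lookup table. This is an illustration only: the function names, the row shape, and the guardrail_checked flag are hypothetical stand-ins, not the real _run_materialized_pass or connector entry points.

```python
from typing import Callable


# Hypothetical stand-ins for the two connector entry points.
def bq_materialize(table_id: str, sql: str) -> dict:
    return {"table_id": table_id, "engine": "bigquery"}


def keboola_materialize(table_id: str, sql: str) -> dict:
    return {"table_id": table_id, "engine": "keboola"}


MATERIALIZERS: dict[str, Callable[[str, str], dict]] = {
    "bigquery": bq_materialize,
    "keboola": keboola_materialize,
}


def run_materialized_pass(rows: list[dict]) -> list[dict]:
    """Dispatch only materialized rows; everything else is left to other paths."""
    results = []
    for row in rows:
        if row.get("query_mode") != "materialized":
            continue  # local/remote rows are not handled here
        source_type = row.get("source_type")
        handler = MATERIALIZERS.get(source_type)
        if handler is None:
            continue  # e.g. Jira has no materialized path
        result = handler(row["id"], row["source_query"])
        # Assumption for illustration: only BigQuery has a dry-run cost check.
        result["guardrail_checked"] = source_type == "bigquery"
        results.append(result)
    return results


registry = [
    {"id": "orders", "source_type": "keboola", "query_mode": "local"},
    {"id": "orders_recent", "source_type": "keboola", "query_mode": "materialized",
     "source_query": "SELECT 1"},
    {"id": "events_summary", "source_type": "bigquery", "query_mode": "materialized",
     "source_query": "SELECT 1"},
    {"id": "issues", "source_type": "jira", "query_mode": "materialized",
     "source_query": "SELECT 1"},
]
outcome = run_materialized_pass(registry)
engines = [(r["table_id"], r["engine"]) for r in outcome]
```

Requiring both the source_type match and query_mode='materialized' is what keeps existing query_mode='local' Keboola rows on the legacy download path.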
Deleted:
- Nothing (Pydantic fields stay alive with deprecated=True).
Untouched:
- src/db.py: schema stays at v20. Columns survive.
- src/profiler.py: keeps reading sync_strategy for partition detection.
- src/orchestrator.py: the local-parquet walk picks up Keboola materialized parquets the same way it picks up BQ ones today.
- connectors/jira/**: the Jira tab is read-only; no register form, no backend change.
- cli/**: the analyst-side da sync / da query / da fetch flow is unchanged. Materialized Keboola parquets show up in the manifest with source_type='keboola' + query_mode='local' (because the result is a local parquet), and the analyst-side rails (CLAUDE.md) treat them like any other Keboola table.
Phase A — Spike: lock down the Keboola extension query passthrough
Goal: Phase B and onward depend on the Keboola DuckDB extension supporting COPY (admin SELECT) TO 'parquet'. The grep at planning time confirmed the existing extractor already uses this pattern, but we want a dedicated test that pins the capability so a future extension upgrade doesn't silently break the Keboola materialized path.
Task A1: Lock-in test for Keboola extension query passthrough
Files:
- Create: tests/test_keboola_extension_query_passthrough.py
- Step 1: Write the failing test
"""Lock-in test for the DuckDB Keboola extension's query-passthrough
capability that the Keboola materialized path depends on.

Runs only when the KBC_TEST_URL, KBC_TEST_TOKEN, KBC_TEST_BUCKET and
KBC_TEST_TABLE env vars are all set (CI without real Keboola credentials
skips). Local dev with a real Storage API token exercises the path.
"""
import os

import duckdb
import pytest

KBC_URL = os.environ.get("KBC_TEST_URL")
KBC_TOKEN = os.environ.get("KBC_TEST_TOKEN")
KBC_BUCKET = os.environ.get("KBC_TEST_BUCKET")
KBC_TABLE = os.environ.get("KBC_TEST_TABLE")

pytestmark = pytest.mark.skipif(
    not all([KBC_URL, KBC_TOKEN, KBC_BUCKET, KBC_TABLE]),
    reason="Keboola integration creds not provided",
)


def test_extension_supports_attach_and_select(tmp_path):
    """Keboola extension must support: ATTACH 'keboola://...' AS kbc, then
    SELECT * FROM kbc.bucket.table. The Keboola materialized path uses this
    primitive at runtime (just like connectors/keboola/extractor.py:133)."""
    conn = duckdb.connect(str(tmp_path / "spike.duckdb"))
    conn.execute("INSTALL keboola FROM community")
    conn.execute("LOAD keboola")
    escaped_token = KBC_TOKEN.replace("'", "''")
    conn.execute(f"ATTACH '{KBC_URL}' AS kbc (TYPE keboola, TOKEN '{escaped_token}')")
    rows = conn.execute(
        f'SELECT COUNT(*) FROM kbc."{KBC_BUCKET}"."{KBC_TABLE}"'
    ).fetchone()
    # Any non-negative count is fine; we're testing that the path works.
    assert rows[0] >= 0


def test_extension_supports_copy_to_parquet(tmp_path):
    """Keboola materialized writes the SELECT result via
    `COPY (...) TO '...' (FORMAT PARQUET)`. Lock that primitive."""
    conn = duckdb.connect(str(tmp_path / "spike.duckdb"))
    conn.execute("INSTALL keboola FROM community")
    conn.execute("LOAD keboola")
    escaped_token = KBC_TOKEN.replace("'", "''")
    conn.execute(f"ATTACH '{KBC_URL}' AS kbc (TYPE keboola, TOKEN '{escaped_token}')")
    parquet_path = tmp_path / "out.parquet"
    safe_lit = str(parquet_path).replace("'", "''")
    conn.execute(
        f'COPY (SELECT * FROM kbc."{KBC_BUCKET}"."{KBC_TABLE}" LIMIT 5) '
        f"TO '{safe_lit}' (FORMAT PARQUET)"
    )
    assert parquet_path.exists() and parquet_path.stat().st_size > 0


def test_extension_supports_filtered_query(tmp_path):
    """Most important capability: a query other than plain SELECT * survives
    the COPY wrapper. This is what 'Custom SQL' mode actually relies on."""
    conn = duckdb.connect(str(tmp_path / "spike.duckdb"))
    conn.execute("INSTALL keboola FROM community")
    conn.execute("LOAD keboola")
    escaped_token = KBC_TOKEN.replace("'", "''")
    conn.execute(f"ATTACH '{KBC_URL}' AS kbc (TYPE keboola, TOKEN '{escaped_token}')")
    parquet_path = tmp_path / "filtered.parquet"
    safe_lit = str(parquet_path).replace("'", "''")
    # Column-agnostic query (constant projection + LIMIT) because the fixture
    # table's schema is unknown; it does not exercise a column-specific WHERE.
    # The extension may push the query down or execute it client-side; either
    # is acceptable for our materialized path.
    conn.execute(
        f'COPY (SELECT 1 AS marker FROM kbc."{KBC_BUCKET}"."{KBC_TABLE}" LIMIT 3) '
        f"TO '{safe_lit}' (FORMAT PARQUET)"
    )
    assert parquet_path.exists()
- Step 2: Run the test to verify it skips (no creds in dev) or passes (creds present)
pytest tests/test_keboola_extension_query_passthrough.py -v
Expected: SKIP if no creds, PASS if KBC_TEST_URL etc. are set. Both outcomes confirm the test is well-formed; it gates Phase B but doesn't block dev work.
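When the test skips, it helps to know which variable is missing. The sketch below is one possible way to compute that; missing_env is a hypothetical helper, not part of the test file above, and it takes an optional mapping so it can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional, Sequence

REQUIRED_VARS = ("KBC_TEST_URL", "KBC_TEST_TOKEN", "KBC_TEST_BUCKET", "KBC_TEST_TABLE")


def missing_env(
    names: Sequence[str], environ: Optional[Mapping[str, str]] = None
) -> list[str]:
    """Return the names that are unset or empty, in the order given."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
partial_env = {"KBC_TEST_URL": "https://connection.keboola.com/", "KBC_TEST_TOKEN": ""}
missing = missing_env(REQUIRED_VARS, partial_env)
reason = "Keboola integration creds not provided: missing " + ", ".join(missing)
print(reason)
```

The resulting list could feed pytest.mark.skipif(bool(missing), reason=reason) so the skip message names the gap.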
- Step 3: Commit
git add tests/test_keboola_extension_query_passthrough.py
git commit -m "test(keboola): lock-in Keboola extension query passthrough capability
The upcoming Keboola materialized path depends on the DuckDB Keboola
extension supporting:
ATTACH 'keboola://...' AS kbc (TYPE keboola, TOKEN '...');
COPY (SELECT * FROM kbc.bucket.table WHERE ...) TO 'parquet';
The existing extractor already uses this pattern (extractor.py:209), so
the capability is verified; this test pins it so a future extension
upgrade doesn't silently regress the materialized path. Skips in CI
without KBC_TEST_* env vars; passes locally with a real Storage API
token."
Phase B — Backend: Keboola materialized path
Task B1: KeboolaAccess facade
Files:
- Create: connectors/keboola/access.py
- Create: tests/test_keboola_access.py
- Step 1: Write the failing test
"""Tests for KeboolaAccess facade."""
import os

import pytest

from connectors.keboola.access import KeboolaAccess


def test_access_session_yields_attached_duckdb(tmp_path, monkeypatch):
    """Mock-mode test: the facade should accept a token, install+load
    the Keboola extension, and ATTACH it as 'kbc'. We verify the SQL
    issued by intercepting the duckdb.connect call.
    """
    issued = []

    class FakeConn:
        def execute(self, sql, *args, **kwargs):
            issued.append(sql)

            class R:
                def fetchall(s): return []
                def fetchone(s): return (0,)

            return R()

        def close(self): pass

    import duckdb
    monkeypatch.setattr(duckdb, "connect", lambda *a, **kw: FakeConn())

    acc = KeboolaAccess(
        url="https://connection.keboola.com/",
        token="fake-token-xyz",
    )
    with acc.duckdb_session() as conn:
        assert conn is not None

    # Verify the install + load + attach sequence happened.
    joined = "\n".join(issued)
    assert "INSTALL keboola" in joined
    assert "LOAD keboola" in joined
    assert "ATTACH" in joined and "TYPE keboola" in joined
    # The token must reach the ATTACH literal (escaping is covered below).
    assert "fake-token-xyz" in joined


def test_access_escapes_single_quote_in_token(monkeypatch):
    """Defense against a token containing a single quote breaking the
    ATTACH literal. SQL injection here is non-trivial because the token
    is admin-supplied at instance config time, but escape it anyway."""
    issued = []

    class FakeConn:
        def execute(self, sql, *args, **kwargs):
            issued.append(sql)

            class R:
                def fetchall(s): return []
                def fetchone(s): return (0,)

            return R()

        def close(self): pass

    import duckdb
    monkeypatch.setattr(duckdb, "connect", lambda *a, **kw: FakeConn())

    acc = KeboolaAccess(url="x", token="bad'token")
    with acc.duckdb_session() as conn:
        pass

    attach_sql = next(s for s in issued if "ATTACH" in s)
    # Doubled single-quote per SQL string-literal escaping.
    assert "bad''token" in attach_sql


def test_access_real_attach_when_creds_present(tmp_path):
    """Smoke when KBC_TEST_URL + KBC_TEST_TOKEN are present."""
    url = os.environ.get("KBC_TEST_URL")
    token = os.environ.get("KBC_TEST_TOKEN")
    if not (url and token):
        pytest.skip("Keboola creds not provided")
    acc = KeboolaAccess(url=url, token=token)
    with acc.duckdb_session() as conn:
        # ATTACH must have succeeded: querying duckdb_databases() should
        # show the 'kbc' alias.
        rows = [r[0] for r in conn.execute("SELECT name FROM duckdb_databases()").fetchall()]
        assert "kbc" in rows
- Step 2: Run, verify failure
pytest tests/test_keboola_access.py -v
Expected: ImportError on connectors.keboola.access — module not yet created.
- Step 3: Implement KeboolaAccess
Write connectors/keboola/access.py:
"""DuckDB session facade for the Keboola Storage API extension.

Parallel of `connectors/bigquery/access.py:BqAccess`. The materialized
Keboola SQL path needs a one-shot DuckDB connection with the Keboola
extension installed, loaded, and ATTACHed; this facade encapsulates
that wiring so `_run_materialized_pass` doesn't need to know the
extension name, the ATTACH alias, or how the URL and token get quoted
into the ATTACH statement.
"""
from __future__ import annotations

from contextlib import contextmanager
from typing import Iterator

import duckdb


class KeboolaAccess:
    """Lazy DuckDB session manager for the Keboola Storage API extension.

    Single-use: call `.duckdb_session()` as a context manager once per
    materialized job.
    """

    def __init__(self, *, url: str, token: str) -> None:
        if not url or not token:
            raise ValueError("KeboolaAccess requires url and token")
        self._url = url
        self._token = token

    @contextmanager
    def duckdb_session(self) -> Iterator[duckdb.DuckDBPyConnection]:
        conn = duckdb.connect(":memory:")
        try:
            conn.execute("INSTALL keboola FROM community")
            conn.execute("LOAD keboola")
            # Both values are embedded in single-quoted SQL literals, so
            # double any single quote they contain.
            escaped_url = self._url.replace("'", "''")
            escaped_token = self._token.replace("'", "''")
            conn.execute(
                f"ATTACH '{escaped_url}' AS kbc "
                f"(TYPE keboola, TOKEN '{escaped_token}')"
            )
            yield conn
        finally:
            conn.close()
- Step 4: Run, verify pass
pytest tests/test_keboola_access.py -v
Expected: 2 PASS (mock tests), 1 SKIP (real-creds test).
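The quote-doubling that the second mock test checks can be isolated into a tiny helper. This is a sketch for illustration: sql_string_literal and build_attach_sql are hypothetical names, and the ATTACH shape is copied from the statement used in this plan rather than from the extension's documentation.

```python
def sql_string_literal(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def build_attach_sql(url: str, token: str, alias: str = "kbc") -> str:
    """Build the ATTACH statement with both literals escaped.

    The alias is interpolated as-is, so it must come from trusted code.
    """
    return (
        f"ATTACH {sql_string_literal(url)} AS {alias} "
        f"(TYPE keboola, TOKEN {sql_string_literal(token)})"
    )


attach_sql = build_attach_sql("https://connection.keboola.com/", "bad'token")
print(attach_sql)
```

Escaping the URL as well as the token costs nothing and keeps the statement well-formed even if a misconfigured URL contains a quote.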
- Step 5: Commit
git add connectors/keboola/access.py tests/test_keboola_access.py
git commit -m "feat(keboola): add KeboolaAccess facade for DuckDB-extension session
Parallel of connectors/bigquery/access.py:BqAccess. Encapsulates the
INSTALL + LOAD + ATTACH sequence the Keboola materialized SQL path
needs, with single-quote-escaped token interpolation. Single-use
context manager — caller wraps `with acc.duckdb_session() as conn:`
around one materialized job.
Mock tests verify the SQL sequence; a real-creds test exercises the
ATTACH end-to-end when KBC_TEST_URL + KBC_TEST_TOKEN are set."
Task B2: connectors.keboola.extractor.materialize_query
Files:
- Modify: connectors/keboola/extractor.py
- Create: tests/test_keboola_materialize.py
- Step 1: Write the failing test
"""Tests for the Keboola materialize_query path."""
import hashlib
from contextlib import contextmanager

import duckdb
import pytest

from connectors.keboola import extractor as kbe


def test_materialize_query_writes_parquet_and_returns_metadata(tmp_path, monkeypatch):
    """Mock-mode: feed in a fake KeboolaAccess that yields a real in-memory
    DuckDB connection, so `COPY ... TO '...' (FORMAT PARQUET)` writes a
    small parquet via duckdb's own primitive.
    """
    real_conn = duckdb.connect(":memory:")
    # Pre-create a small relation the fake materialize "copies".
    real_conn.execute("CREATE TABLE t AS SELECT 1 AS x, 'hello' AS y UNION ALL SELECT 2, 'world'")

    class FakeAccess:
        def duckdb_session(self):
            @contextmanager
            def _cm():
                yield real_conn
            return _cm()

    fake_access = FakeAccess()
    output_dir = tmp_path / "out"
    output_dir.mkdir()

    # Submit a query that selects from the in-memory table (not a real
    # Keboola bucket; the test verifies the COPY/parquet/hash path,
    # not the extension behavior).
    result = kbe.materialize_query(
        table_id="example_subset",
        sql="SELECT * FROM t",
        keboola_access=fake_access,
        output_dir=output_dir,
    )

    parquet_path = output_dir / "example_subset.parquet"
    assert parquet_path.exists()
    assert result["table_id"] == "example_subset"
    assert result["path"] == str(parquet_path)
    assert result["rows"] == 2
    assert result["bytes"] > 0
    # MD5 of the bytes should match what we recompute.
    expected_md5 = hashlib.md5(parquet_path.read_bytes()).hexdigest()
    assert result["md5"] == expected_md5


def test_materialize_query_zero_rows_logs_warning(tmp_path, caplog):
    real_conn = duckdb.connect(":memory:")
    real_conn.execute("CREATE TABLE t AS SELECT 1 AS x WHERE FALSE")

    class FakeAccess:
        def duckdb_session(self):
            @contextmanager
            def _cm():
                yield real_conn
            return _cm()

    output_dir = tmp_path / "out"
    output_dir.mkdir()

    with caplog.at_level("WARNING"):
        result = kbe.materialize_query(
            table_id="empty_subset",
            sql="SELECT * FROM t",
            keboola_access=FakeAccess(),
            output_dir=output_dir,
        )

    assert result["rows"] == 0
    assert "0 rows" in caplog.text or "empty" in caplog.text.lower()


def test_materialize_query_rejects_unsafe_table_id(tmp_path):
    """Defense: table_id is interpolated into the parquet filename. SQL/
    path-traversal-unsafe values must be rejected up-front (mirror of BQ
    materialize_query's validation)."""
    class FakeAccess:
        def duckdb_session(self):
            raise AssertionError("should not be called")

    output_dir = tmp_path / "out"
    output_dir.mkdir()

    with pytest.raises(ValueError, match="table_id"):
        kbe.materialize_query(
            table_id="../../etc/passwd",
            sql="SELECT 1",
            keboola_access=FakeAccess(),
            output_dir=output_dir,
        )
- Step 2: Run, verify fail
pytest tests/test_keboola_materialize.py -v
Expected: AttributeError on kbe.materialize_query — function not yet defined.
- Step 3: Implement
Add to connectors/keboola/extractor.py (before any existing top-level helpers):
def materialize_query(
    table_id: str,
    sql: str,
    *,
    keboola_access,  # KeboolaAccess (avoid circular import)
    output_dir: Path,
) -> dict:
    """Materialize an admin-registered SELECT against the Keboola Storage
    API extension into a parquet file.

    Parallel of `connectors/bigquery/extractor.py:materialize_query`.

    Cost guardrail: the Keboola extension has no analog of BQ dry-run;
    Storage API cost is download-shaped (per-byte egress + Storage API
    job). Phase B ships without a guardrail and logs the byte count;
    a future PR can add a configurable `max_bytes_per_keboola_materialize`
    gate similar to BQ's `max_bytes_per_materialize`.
    """
    import hashlib
    import logging
    import re

    logger = logging.getLogger(__name__)

    # Defense: table_id is interpolated into the parquet filename.
    # Reject anything that's not a safe identifier.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table_id):
        raise ValueError(f"unsafe table_id for materialize: {table_id!r}")

    parquet_path = output_dir / f"{table_id}.parquet"
    safe_pq_lit = str(parquet_path).replace("'", "''")

    with keboola_access.duckdb_session() as conn:
        # Run the admin SELECT and copy the result to parquet.
        # The COPY wrapper is identical to the existing legacy extract
        # path at extractor.py:209; the only difference is the SELECT is
        # admin-supplied rather than `SELECT * FROM kbc.bucket.table`.
        conn.execute(f"COPY ({sql}) TO '{safe_pq_lit}' (FORMAT PARQUET)")
        # Read back row count.
        row_count = conn.execute(
            f"SELECT COUNT(*) FROM read_parquet('{safe_pq_lit}')"
        ).fetchone()[0]

    file_bytes = parquet_path.read_bytes()
    md5 = hashlib.md5(file_bytes).hexdigest()
    size = len(file_bytes)

    # Byte-count logging stands in for the missing cost guardrail.
    logger.info(
        "Materialized Keboola query for %s: %d rows, %d bytes",
        table_id, row_count, size,
    )
    if row_count == 0:
        logger.warning(
            "Materialized Keboola query for %s wrote 0 rows; verify the "
            "SQL filters and that the source bucket has data.",
            table_id,
        )

    return {
        "table_id": table_id,
        "path": str(parquet_path),
        "rows": row_count,
        "bytes": size,
        "md5": md5,
    }
- Step 4: Run, verify pass
pytest tests/test_keboola_materialize.py -v
Expected: 3 PASS.
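The implementation above reads the whole parquet into memory to hash it, which is fine for small extracts. If large materialized results become a concern, a chunked variant is one option. The sketch below is an alternative under that assumption, not part of the plan; file_md5_and_size is a hypothetical helper.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5_and_size(path: Path, chunk_size: int = 1 << 20) -> tuple[str, int]:
    """Compute the MD5 hex digest and byte size of a file in fixed-size chunks."""
    digest = hashlib.md5()
    size = 0
    with path.open("rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size


# Demo on a throwaway file; the content is arbitrary bytes, not real parquet.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.parquet"
    sample.write_bytes(b"not a real parquet file" * 1000)
    md5_hex, size = file_md5_and_size(sample, chunk_size=4096)
    matches_whole_file = md5_hex == hashlib.md5(sample.read_bytes()).hexdigest()
```

The digest is identical to hashing the full byte string, so the returned metadata would not change shape.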
- Step 5: Commit
git add connectors/keboola/extractor.py tests/test_keboola_materialize.py
git commit -m "feat(keboola): add materialize_query — admin SELECT → parquet
Parallel of connectors/bigquery/extractor.py:materialize_query. Runs an
admin-registered SELECT through the Keboola DuckDB extension via
KeboolaAccess.duckdb_session(), wraps it in COPY ... TO '...'
(FORMAT PARQUET), and returns rows/bytes/md5/path metadata for
sync_state bookkeeping.
Cost guardrail intentionally omitted in this iteration — the Keboola
extension has no dry-run analog and Storage API cost shape is
download-byte-based, not scan-byte-based. Phase B ships with byte-count
logging; a follow-up can add a configurable max_bytes gate if needed.
table_id is validated as a safe identifier (mirror of BQ implementation)
because it's interpolated into the parquet filename."
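To make the identifier rule concrete, here is the same regular expression applied to a few inputs. is_safe_table_id is a hypothetical wrapper for illustration; the pattern is the one used in materialize_query above.

```python
import re

_SAFE_TABLE_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_safe_table_id(table_id: str) -> bool:
    """True only if the whole string is a plain identifier."""
    return _SAFE_TABLE_ID.fullmatch(table_id) is not None


checks = {
    "orders_recent": is_safe_table_id("orders_recent"),        # plain identifier
    "../../etc/passwd": is_safe_table_id("../../etc/passwd"),  # path traversal
    "in.c-sales": is_safe_table_id("in.c-sales"),              # dots and dashes
    "1orders": is_safe_table_id("1orders"),                    # leading digit
    "": is_safe_table_id(""),                                  # empty string
}
for value, ok in checks.items():
    print(repr(value), ok)
```

Note that a Keboola bucket id such as in.c-sales is rejected, so table_id has to be the registry name rather than the source bucket path.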
Task B3: init_extract skips materialized rows
Files:
- Modify: connectors/keboola/extractor.py
- Create: tests/test_keboola_init_extract_skips.py
- Step 1: Failing test
"""Verify the legacy Keboola download path skips materialized rows.

Materialized rows are handled by `_run_materialized_pass` in
`app/api/sync.py`, not by the legacy extractor. Mirror of the BQ
extractor's existing skip behavior at line 188."""
from unittest.mock import patch

from connectors.keboola import extractor as kbe


def test_init_extract_skips_materialized_rows(tmp_path):
    """Given a registry with one local row + one materialized row, the
    legacy init_extract path must process only the local row."""
    extracts = tmp_path / "extracts" / "keboola"
    extracts.mkdir(parents=True)
    (extracts / "data").mkdir()

    table_configs = [
        {
            "id": "orders",
            "name": "orders",
            "bucket": "in.c-sales",
            "source_table": "orders",
            "query_mode": "local",
        },
        {
            "id": "orders_recent",
            "name": "orders_recent",
            "source_query": "SELECT * FROM kbc.\"in.c-sales\".\"orders\" WHERE date > '2026-01-01'",
            "query_mode": "materialized",
        },
    ]

    # Patch the actual ATTACH/COPY path so the test doesn't need real Keboola.
    seen = []

    def fake_run_one(conn, tc, *a, **kw):
        seen.append(tc["id"])

    with patch.object(kbe, "_extract_one_table", fake_run_one, create=True):
        kbe.init_extract(
            extracts_dir=extracts,
            table_configs=table_configs,
            keboola_url="https://x/",
            keboola_token="t",
        )

    assert seen == ["orders"]  # materialized row skipped


def test_init_extract_logs_skip_reason(tmp_path, caplog):
    """When skipping a materialized row, log the reason for ops visibility."""
    extracts = tmp_path / "extracts" / "keboola"
    extracts.mkdir(parents=True)
    (extracts / "data").mkdir()

    table_configs = [
        {
            "id": "orders_recent",
            "name": "orders_recent",
            "source_query": "SELECT 1",
            "query_mode": "materialized",
        },
    ]

    with caplog.at_level("INFO"):
        with patch.object(kbe, "_extract_one_table", lambda *a, **kw: None, create=True):
            kbe.init_extract(
                extracts_dir=extracts,
                table_configs=table_configs,
                keboola_url="https://x/",
                keboola_token="t",
            )

    assert "Skipping" in caplog.text and "materialized" in caplog.text
- Step 2: Run, verify fail
pytest tests/test_keboola_init_extract_skips.py -v
Expected: FAIL — current init_extract does not skip.
- Step 3: Implement skip
Find the existing iteration loop in connectors/keboola/extractor.py (around lines 100–135 where each table_config is processed). Add at the top of the per-table-config loop:
for tc in table_configs:
    if tc.get("query_mode") == "materialized":
        logger.info(
            "Skipping legacy extract for %s: query_mode='materialized', "
            "handled by _run_materialized_pass instead",
            tc.get("id") or tc.get("name"),
        )
        continue
    ...  # existing per-table extract logic
(Refactoring note: if the existing loop body is monolithic, optionally extract it into _extract_one_table(conn, tc, ...) so the test can patch it cleanly. The first test above assumes that helper exists; if you keep the body inline, write the test to assert by directly observing parquet outputs instead.)
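The refactoring note can be sketched as a loop that delegates to an injected per-table callable, which is what makes the skip observable in a test. Everything here is hypothetical: run_legacy_extract is not the real init_extract, and extract_one stands in for the suggested _extract_one_table helper.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger("keboola_extract_sketch")


def run_legacy_extract(
    table_configs: Iterable[dict],
    extract_one: Callable[[dict], None],
) -> list[str]:
    """Run the per-table extract for every non-materialized config.

    Returns the ids that were processed, in order.
    """
    processed = []
    for tc in table_configs:
        table_id = tc.get("id") or tc.get("name")
        if tc.get("query_mode") == "materialized":
            logger.info("Skipping legacy extract for %s (materialized)", table_id)
            continue
        extract_one(tc)
        processed.append(table_id)
    return processed


seen: list[str] = []
configs = [
    {"id": "orders", "query_mode": "local"},
    {"id": "orders_recent", "query_mode": "materialized", "source_query": "SELECT 1"},
    {"name": "customers"},  # no query_mode: treated as a legacy row
]
processed = run_legacy_extract(configs, lambda tc: seen.append(tc.get("id") or tc["name"]))
```

Passing the per-table step in as a parameter is one way to avoid patching module attributes; the plan's tests use patch.object instead, which works equally well once the helper exists.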
- Step 4: Run, verify pass
pytest tests/test_keboola_init_extract_skips.py -v
Expected: PASS.
- Step 5: Commit
git add connectors/keboola/extractor.py tests/test_keboola_init_extract_skips.py
git commit -m "feat(keboola): legacy extract skips query_mode='materialized' rows
Mirror of the BQ extractor's existing skip at line 188. Materialized
Keboola rows are handled by _run_materialized_pass (post-Phase-B
implementation) rather than by the legacy bucket-download path. Without
this skip, a materialized row would get full-extracted via its source
bucket reference, double-writing data and confusing the sync_state
bookkeeping."
Task B4: _run_materialized_pass dispatches by source_type
Files:
-
Modify:
app/api/sync.py -
Create:
tests/test_sync_trigger_keboola_materialized.py -
Step 1: Failing test
"""Scheduler-level test: when a Keboola row has query_mode='materialized',
_run_materialized_pass uses connectors.keboola.extractor.materialize_query
(not BQ's). Existing BQ-materialized rows continue using BqAccess."""
from unittest.mock import patch, MagicMock

import pytest


def test_run_materialized_pass_dispatches_keboola_to_keboola_extractor(seeded_app, tmp_path):
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    auth = {"Authorization": f"Bearer {token}"}

    # Register a Keboola materialized row.
    r = c.post(
        "/api/admin/register-table",
        headers=auth,
        json={
            "name": "orders_recent",
            "source_type": "keboola",
            "query_mode": "materialized",
            "source_query": (
                "SELECT * FROM kbc.\"in.c-sales\".\"orders\" "
                "WHERE date > '2026-01-01'"
            ),
        },
    )
    assert r.status_code == 201, r.text

    # Patch the two extractor entry points so we can observe which fires.
    bq_called = MagicMock()
    kb_called = MagicMock()
    with patch(
        "connectors.bigquery.extractor.materialize_query", bq_called
    ), patch(
        "connectors.keboola.extractor.materialize_query", kb_called
    ):
        # Trigger sync.
        r = c.post("/api/sync/trigger", headers=auth)
        # Allow background tasks to drain (depends on test client setup).

    assert kb_called.called, "Keboola materialize_query was not invoked"
    assert not bq_called.called, "BQ materialize_query was wrongly invoked for a Keboola row"


def test_run_materialized_pass_dispatches_bigquery_to_bq_extractor(seeded_app):
    """Regression: existing BQ-materialized path keeps working unchanged."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    auth = {"Authorization": f"Bearer {token}"}

    r = c.post(
        "/api/admin/register-table",
        headers=auth,
        json={
            "name": "events_summary",
            "source_type": "bigquery",
            "query_mode": "materialized",
            "source_query": "SELECT date, COUNT(*) FROM `proj.dataset.events` GROUP BY 1",
        },
    )
    assert r.status_code == 201, r.text

    bq_called = MagicMock()
    kb_called = MagicMock()
    with patch(
        "connectors.bigquery.extractor.materialize_query", bq_called
    ), patch(
        "connectors.keboola.extractor.materialize_query", kb_called
    ):
        c.post("/api/sync/trigger", headers=auth)

    assert bq_called.called
    assert not kb_called.called
- Step 2: Run, verify fail
pytest tests/test_sync_trigger_keboola_materialized.py -v
Expected: FAIL — _run_materialized_pass doesn't yet dispatch by source_type for Keboola.
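These tests patch `materialize_query` on the module that defines it. That only takes effect if the caller looks the name up on that module at call time, for example through a function-local import. The sketch below demonstrates the mechanism with a throwaway module; `fake_extractor`, `run_pass`, and `patched_call` are hypothetical names used purely for illustration.

```python
import sys
import types
from unittest.mock import MagicMock, patch

# Hypothetical stand-in for a connector's extractor module.
fake_extractor = types.ModuleType("fake_extractor")
fake_extractor.materialize_query = lambda **kwargs: "real"
sys.modules["fake_extractor"] = fake_extractor


def run_pass():
    # Function-local import: the attribute is read from the module each
    # time run_pass() executes, so an active patch is picked up.
    from fake_extractor import materialize_query
    return materialize_query(table_id="t1", sql="SELECT 1")


def patched_call():
    """Call run_pass() while the module attribute is replaced by a mock."""
    spy = MagicMock(return_value="patched")
    with patch("fake_extractor.materialize_query", spy):
        result = run_pass()
    return result, spy
```

If the caller instead imported the function at module top level, the name would be bound once at import time and the patch target would have to be the caller's own module attribute.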
- Step 3: Implement dispatch
Find _run_materialized_pass in app/api/sync.py (around line 57). The current body iterates rows and calls _materialize_table (which wraps BQ's materialize_query). Refactor:
def _run_materialized_pass(conn, bq=None) -> dict:
    """Run all materialized rows that are due, dispatching by source_type
    to the correct connector's materialize_query.

    BigQuery rows go through BqAccess + bigquery_query() (jobs API),
    optionally cost-guarded by max_bytes_per_materialize.
    Keboola rows go through KeboolaAccess + ATTACH-and-COPY, no
    guardrail (extension has no dry-run primitive)."""
    import os

    from connectors.bigquery.extractor import materialize_query as bq_materialize
    from connectors.keboola.extractor import materialize_query as kb_materialize
    from connectors.keboola.access import KeboolaAccess
    from src.repositories.table_registry import TableRegistryRepository
    from src.scheduler import is_table_due
    # ... existing imports

    repo = TableRegistryRepository(conn)
    rows = repo.list_materialized_due()  # or however the existing iteration looks
    stats = {"materialized": 0, "skipped": 0, "errors": []}
    keboola_access = None  # lazy

    for row in rows:
        source_type = row.get("source_type") or "bigquery"  # legacy default
        if source_type == "bigquery":
            try:
                bq_materialize(
                    table_id=row["id"],
                    sql=row["source_query"],
                    bq=bq,  # existing BqAccess instance
                    output_dir=...,  # existing path
                    max_bytes=...,  # existing guardrail config
                )
                stats["materialized"] += 1
            except Exception as e:
                stats["errors"].append({"id": row["id"], "error": str(e)})
        elif source_type == "keboola":
            if keboola_access is None:
                # Lazy-init using instance config.
                from app.instance_config import get_value
                keboola_url = get_value("data_source", "keboola", "url")
                token_env = get_value("data_source", "keboola", "token_env")
                # os.environ.get(None) raises TypeError, so guard an unset token_env.
                keboola_token = os.environ.get(token_env) if token_env else None
                if not (keboola_url and keboola_token):
                    stats["errors"].append({
                        "id": row["id"],
                        "error": "Keboola URL/token not configured for materialized path",
                    })
                    continue
                keboola_access = KeboolaAccess(url=keboola_url, token=keboola_token)
            try:
                kb_materialize(
                    table_id=row["id"],
                    sql=row["source_query"],
                    keboola_access=keboola_access,
                    output_dir=...,  # /data/extracts/keboola/data/
                )
                stats["materialized"] += 1
            except Exception as e:
                stats["errors"].append({"id": row["id"], "error": str(e)})
        else:
            stats["skipped"] += 1
            stats["errors"].append({
                "id": row["id"],
                "error": f"materialized path not supported for source_type={source_type!r}",
            })
    return stats
(Adapt to the actual existing _run_materialized_pass shape — the snippet above is the structural change; concrete details like output_dir path and existing helper names are read from the file at implementation time.)
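The lazy-init branch reads the Keboola token indirectly: instance config stores the name of an environment variable, and the token is looked up under that name. A sketch of that lookup as a standalone helper follows. `resolve_keboola_credentials` is a hypothetical name, and `get_value` is injected as a callable on the assumption that it returns `None` for missing keys; the real `app.instance_config.get_value` may behave differently.

```python
import os
from typing import Callable, Mapping, Optional, Tuple


def resolve_keboola_credentials(
    get_value: Callable[..., Optional[str]],
    environ: Mapping[str, str] = os.environ,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (url, token); either may be None when not configured.

    token_env holds the NAME of the environment variable carrying the
    token. When it is unset, the environment lookup is skipped instead
    of passing None as a key.
    """
    url = get_value("data_source", "keboola", "url")
    token_env = get_value("data_source", "keboola", "token_env")
    token = environ.get(token_env) if token_env else None
    # Normalize empty strings to None so callers can test truthiness once.
    return url or None, token or None
```

Passing `environ` as a parameter lets a test supply a plain dict rather than mutating the process environment.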
- Step 4: Run, verify pass
pytest tests/test_sync_trigger_keboola_materialized.py -v
pytest tests/test_sync_trigger_materialized.py -v # existing BQ test must still pass
Expected: both files pass.
- Step 5: Commit
git add app/api/sync.py tests/test_sync_trigger_keboola_materialized.py
git commit -m "feat(sync): _run_materialized_pass dispatches by source_type
BQ materialized rows continue using BqAccess + bigquery_query() with
the cost guardrail. New Keboola materialized rows go through
KeboolaAccess + ATTACH-and-COPY (no guardrail — Keboola extension has
no dry-run primitive; download-byte-shaped cost is logged).
Existing tests for BQ dispatch keep passing (regression test
explicitly added). New tests verify Keboola dispatch fires for
source_type='keboola' rows and stays silent for BQ rows."
Phase C — Backend: Pydantic deprecation + profile_after_sync becomes inert
Task C1: Mark sync_strategy and profile_after_sync deprecated, stop persisting profile_after_sync
Files:
- Modify: app/api/admin.py (Pydantic models around lines 654–728 and 880–895; BQ register endpoint around line 791; update_table around line 1623)
- Modify: tests/test_admin_bq_register.py (drop the assertions of row["profile_after_sync"] is False; replace them with an assertion that sending the field doesn't error)
- Create: tests/test_admin_phase_c_deprecation.py
- Step 1: Failing test (deprecation visible in OpenAPI + field becomes inert)
"""Verify Phase C deprecation marks + profile_after_sync becomes inert."""
import pytest

from app.api.admin import RegisterTableRequest, UpdateTableRequest


def test_register_request_marks_sync_strategy_deprecated():
    schema = RegisterTableRequest.model_json_schema()
    field = schema["properties"]["sync_strategy"]
    assert field.get("deprecated") is True


def test_register_request_marks_profile_after_sync_deprecated():
    schema = RegisterTableRequest.model_json_schema()
    field = schema["properties"]["profile_after_sync"]
    assert field.get("deprecated") is True


def test_register_endpoint_accepts_profile_after_sync_for_backcompat(seeded_app):
    """External clients sending profile_after_sync get no error; the
    field is silently ignored."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    auth = {"Authorization": f"Bearer {token}"}
    r = c.post(
        "/api/admin/register-table",
        headers=auth,
        json={
            "name": "x",
            "source_type": "keboola",
            "bucket": "in.c-foo",
            "source_table": "y",
            "query_mode": "local",
            "profile_after_sync": True,  # legacy client may send this
        },
    )
    assert r.status_code == 201


def test_register_endpoint_does_not_persist_profile_after_sync(seeded_app):
    """The persisted row no longer carries the old profile_after_sync
    value (column may still exist in DB for back-compat, but admin path
    never writes a non-default value)."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    auth = {"Authorization": f"Bearer {token}"}
    r = c.post(
        "/api/admin/register-table",
        headers=auth,
        json={
            "name": "y",
            "source_type": "keboola",
            "bucket": "in.c-foo",
            "source_table": "y",
            "query_mode": "local",
            "profile_after_sync": True,
        },
    )
    assert r.status_code == 201
    r = c.get("/api/admin/registry", headers=auth)
    rows = r.json()["tables"]
    row = next(t for t in rows if t["id"] == "y")
    # The field's value in the registry response is now whatever the DB
    # default is (True per current schema). Critical: the request value
    # is NOT echoed back.
    # If the value is in the response at all (legacy back-compat in the
    # GET serializer), it's the schema default, not the request value.
    # If the value is absent (deprecated and stripped), that's also fine.
    if "profile_after_sync" in row:
        # Whatever this is, it's the schema default, not request-driven.
        assert row["profile_after_sync"] is True or row["profile_after_sync"] is None
- Step 2: Run, verify fail
pytest tests/test_admin_phase_c_deprecation.py -v
Expected: deprecation-mark assertions FAIL (no deprecated=True yet).
- Step 3: Implement Pydantic deprecation marks
In app/api/admin.py RegisterTableRequest definition, change:
sync_strategy: str = "full_refresh"
to:
sync_strategy: str = Field(
    default="full_refresh",
    deprecated=True,
    description=(
        "DEPRECATED: catalog/profiler metadata only. No extractor reads "
        "this field; every sync is a full overwrite regardless of value. "
        "profiler.is_partitioned() consumes it for parquet-layout "
        "detection. Field stays for back-compat; will be removed in a "
        "future major release."
    ),
)
Same treatment for profile_after_sync:
profile_after_sync: bool = Field(
    default=True,
    deprecated=True,
    description=(
        "DEPRECATED: not consumed by the runtime (Agent 1 finding "
        "2026-05-01). Profiler runs unconditionally on every synced "
        "table; this flag has no effect. Field stays for back-compat."
    ),
)
In the BQ register endpoint at app/api/admin.py:791, find the line that sets request.profile_after_sync = False and remove it (the field is now inert, no need to force a value).
In update_table at app/api/admin.py:1657, the synthetic RegisterTableRequest carries profile_after_sync=bool(merged.get("profile_after_sync") or False) — keep this for back-compat but understand it's now decorative; the synthetic-validate path doesn't need to change.
In register_table at app/api/admin.py:1362 (the actual repo.register call), drop profile_after_sync=request.profile_after_sync from the kwargs if it's there. The DB column has its default (True per schema) and stays consistent.
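"Accepted but inert" can be pictured as a filter between the validated request and the storage call. The sketch below uses plain dicts, assuming nothing about the real `repo.register` signature; `INERT_FIELDS` and `build_register_kwargs` are hypothetical names.

```python
# Fields the API still accepts for back-compat but never persists.
# Hypothetical constant; the plan only names profile_after_sync.
INERT_FIELDS = frozenset({"profile_after_sync"})


def build_register_kwargs(payload: dict) -> dict:
    """Return the subset of a register payload that reaches storage.

    Inert fields are dropped rather than rejected, so a legacy client
    that still sends them gets no error and the DB default stays in force.
    """
    return {k: v for k, v in payload.items() if k not in INERT_FIELDS}
```

Dropping rather than rejecting is what keeps the change invisible to existing API clients.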
- Step 4: Run
pytest tests/test_admin_phase_c_deprecation.py -v
pytest tests/test_admin_bq_register.py -v
Expected: new tests pass; existing BQ register tests need updates where they assert row["profile_after_sync"] is False.
- Step 5: Update existing assertions in tests/test_admin_bq_register.py
Find lines 247, 648, 1371, 1430 where assert row["profile_after_sync"] is False exists. Replace with a comment + back-compat assertion:
# Phase C: profile_after_sync is now inert. The field is accepted in
# the request for back-compat but no longer overrides the DB default.
# Was: assert row["profile_after_sync"] is False (when BQ register
# forced it to False as a "signal"). Now the row carries the schema
# default (True). Profiler runs unconditionally regardless.
assert row.get("profile_after_sync") in (True, None)
- Step 6: Run full sweep
pytest tests/test_admin_bq_register.py tests/test_admin_phase_c_deprecation.py -v
Expected: all pass.
- Step 7: Commit
git add app/api/admin.py tests/test_admin_bq_register.py tests/test_admin_phase_c_deprecation.py
git commit -m "feat(admin-api): mark sync_strategy + profile_after_sync deprecated; profile_after_sync becomes inert
OpenAPI schema now flags both fields with deprecated=true. External API
clients see the signal during their next regen but get no runtime
error — back-compat preserved.
profile_after_sync was previously force-set to False by the BQ register
endpoint as a 'signal,' but app/api/sync.py:410-438 never reads the
flag (Agent 1 finding 2026-05-01). The runtime profiles every synced
table unconditionally. Phase C removes the force-False line and stops
the field from overriding the DB default — it's now decorative-only
in both directions.
sync_strategy stays alive in DB and Pydantic because
profiler.is_partitioned() at src/profiler.py:222 still consumes it for
parquet-directory-layout detection on existing partitioned rows. Phase
F (UI) hides the field from the form; Phase C just labels it for
external consumers.
Existing BQ register tests asserting row['profile_after_sync'] is False
updated to back-compat-tolerant form."
Task C2: RegisterTableRequest validator allows Keboola materialized
Files:
- Modify: app/api/admin.py (_check_mode_query_coherence model validator, around lines 681–692)
- Create: tests/test_admin_keboola_materialized.py
- Step 1: Failing test
"""Tests for Keboola materialized registration."""
import pytest


def test_register_keboola_materialized_accepts_source_query(seeded_app):
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    auth = {"Authorization": f"Bearer {token}"}
    r = c.post(
        "/api/admin/register-table",
        headers=auth,
        json={
            "name": "orders_recent",
            "source_type": "keboola",
            "query_mode": "materialized",
            "source_query": "SELECT * FROM kbc.\"in.c-sales\".\"orders\" WHERE date > '2026-01-01'",
            "sync_schedule": "daily 03:00",
        },
    )
    assert r.status_code == 201, r.text


def test_register_keboola_materialized_rejects_missing_source_query(seeded_app):
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    auth = {"Authorization": f"Bearer {token}"}
    r = c.post(
        "/api/admin/register-table",
        headers=auth,
        json={
            "name": "orders_recent",
            "source_type": "keboola",
            "query_mode": "materialized",
            # source_query missing
        },
    )
    assert r.status_code == 422
    assert "source_query" in r.text


def test_register_keboola_materialized_skips_bucket_check(seeded_app):
    """Materialized rows don't need bucket/source_table; the SELECT inlines
    the references. Mirror of BQ materialized validator behavior."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    auth = {"Authorization": f"Bearer {token}"}
    r = c.post(
        "/api/admin/register-table",
        headers=auth,
        json={
            "name": "x",
            "source_type": "keboola",
            "query_mode": "materialized",
            "source_query": "SELECT 1",
            # No bucket / source_table: must still succeed.
        },
    )
    assert r.status_code == 201, r.text


def test_update_keboola_materialized_clears_stale_source_query_on_mode_switch(seeded_app):
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    auth = {"Authorization": f"Bearer {token}"}
    # Register materialized.
    r = c.post(
        "/api/admin/register-table",
        headers=auth,
        json={
            "name": "x",
            "source_type": "keboola",
            "query_mode": "materialized",
            "source_query": "SELECT 1",
        },
    )
    assert r.status_code == 201
    # PUT to switch back to local: source_query must clear.
    r = c.put(
        "/api/admin/registry/x",
        headers=auth,
        json={
            "source_type": "keboola",
            "query_mode": "local",
            "bucket": "in.c-foo",
            "source_table": "y",
        },
    )
    assert r.status_code == 200
    r = c.get("/api/admin/registry", headers=auth)
    rows = r.json()["tables"]
    row = next(t for t in rows if t["id"] == "x")
    assert row.get("source_query") in (None, "")
- Step 2: Run, verify fail
pytest tests/test_admin_keboola_materialized.py -v
Expected: at least the first test fails, either because the current validator rejects materialized for a non-BQ source_type, or because it accepts the row and the storage path then fails.
- Step 3: Implement validator update
Find _check_mode_query_coherence in app/api/admin.py (around lines 681–692). It currently enforces source_query IFF query_mode='materialized'. Verify it doesn't gate by source_type. If it does, remove the gate. If it doesn't, the test should already pass — investigate.
Also check _validate_bigquery_register_payload (around line 794) — make sure it isn't called for non-BQ rows. The dispatch at register_table line 1354 should already be source_type == 'bigquery'-gated.
For the update_table PUT semantics test, verify that update_table at line 1642 already has the "switching away from materialized → drop source_query" logic. Mirror it for the reverse (switching INTO materialized → drop bucket/source_table) if needed.
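The two rules this step checks, coherence between mode and query, and clearing stale fields on a mode switch, can be sketched without Pydantic. Both function names are hypothetical, and clearing bucket/source_table when switching into materialized is the "if needed" variant, shown here as an assumption rather than confirmed behavior.

```python
from typing import Optional


def coherence_error(query_mode: str, source_query: Optional[str]) -> Optional[str]:
    """Return an error message unless source_query is present exactly
    when query_mode is 'materialized'. source_type is deliberately ignored."""
    is_materialized = query_mode == "materialized"
    has_query = bool(source_query and source_query.strip())
    if is_materialized and not has_query:
        return "source_query is required when query_mode='materialized'"
    if has_query and not is_materialized:
        return "source_query is only allowed when query_mode='materialized'"
    return None


def merge_update(existing: dict, changes: dict) -> dict:
    """Merge a PUT body over the stored row, clearing fields that a
    mode switch makes stale."""
    merged = {**existing, **changes}
    was_materialized = existing.get("query_mode") == "materialized"
    is_materialized = merged.get("query_mode") == "materialized"
    if was_materialized and not is_materialized:
        merged["source_query"] = None
    elif is_materialized and not was_materialized:
        # Assumed variant: materialized rows inline references in the SELECT.
        merged["bucket"] = None
        merged["source_table"] = None
    return merged
```

Running `coherence_error` on the merged row, not on the raw PUT body, is what makes a partial update safe to validate.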
- Step 4: Run, verify pass
pytest tests/test_admin_keboola_materialized.py -v
- Step 5: Commit
git add app/api/admin.py tests/test_admin_keboola_materialized.py
git commit -m "feat(admin-api): allow query_mode='materialized' for Keboola source_type
The model validator already only gates materialized↔source_query
coherence (no source_type-specific check). Phase B made the runtime
materialized path source_type-aware. This commit pins the API contract
with end-to-end tests that:
- Keboola+materialized POST with source_query succeeds
- Keboola+materialized POST without source_query is rejected (422)
- Keboola+materialized POST without bucket/source_table succeeds (the
SELECT inlines references — same as BQ)
- PUT switching a materialized row back to local clears the stale
source_query (mirror of BQ behavior at admin.py:1642)"
Phase D — UI: tab-split scaffold
Task D1: Tab nav structure + routing
Files:
- Modify: app/web/templates/admin_tables.html (top of <body> around the existing single form area)
- Create: tests/test_admin_tables_tab_ui.py
- Step 1: Failing test
"""UI tests for the per-connector tab layout."""
import re

import pytest


def _auth(token):
    return {"Authorization": f"Bearer {token}"}


def test_admin_tables_renders_tab_nav(seeded_app):
    """Page has tab nav with at least the source types configured for
    the instance plus Jira (always shown when any Jira rows exist)."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    r = c.get("/admin/tables", headers=_auth(token))
    assert r.status_code == 200
    html = r.text
    assert 'role="tablist"' in html or 'class="tab-nav"' in html
    assert 'data-tab="bigquery"' in html or 'id="tab-bigquery"' in html
    assert 'data-tab="keboola"' in html or 'id="tab-keboola"' in html


def test_admin_tables_active_tab_matches_instance_type(seeded_app, monkeypatch):
    """When data_source.type='bigquery', the BigQuery tab is the
    initially-active one. Operator can still switch to Keboola tab if
    they want to register a secondary source."""
    fake_cfg = {"data_source": {"type": "bigquery", "bigquery": {"project": "p"}}}
    monkeypatch.setattr(
        "app.instance_config.load_instance_config",
        lambda: fake_cfg, raising=False,
    )
    from app.instance_config import reset_cache
    reset_cache()
    try:
        c = seeded_app["client"]
        token = seeded_app["admin_token"]
        r = c.get("/admin/tables", headers=_auth(token))
        html = r.text
        # The BQ tab is the selected one initially: either an "active"
        # class on the BQ tab button, or aria-selected="true". Attributes
        # may wrap across lines in the template, so match the whole
        # opening <button> tag instead of a fixed substring.
        bq_button = re.search(r'<button\b[^>]*data-tab="bigquery"[^>]*>', html)
        assert bq_button is not None
        tag = bq_button.group(0)
        assert (
            'aria-selected="true"' in tag
            or re.search(r'class="[^"]*\bactive\b', tag)
        )
    finally:
        reset_cache()


def test_admin_tables_each_tab_has_register_button(seeded_app):
    """Each writable source tab has its own Register button. Jira is
    read-only (no Register)."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    r = c.get("/admin/tables", headers=_auth(token))
    html = r.text
    # Each Register button is scoped to its tab; the id distinguishes them.
    # We check presence of the registration trigger elements.
    assert 'id="bqRegisterBtn"' in html or 'data-register-source="bigquery"' in html
    assert 'id="kbRegisterBtn"' in html or 'data-register-source="keboola"' in html
    # No Jira register button (Jira is webhook-driven).
    assert 'data-register-source="jira"' not in html


def test_admin_tables_listing_per_tab(seeded_app):
    """The registry table is rendered per tab: each tab has its own
    <tbody> filtered by source_type. Listing JS reads tables from the
    catalog API and routes each row into the matching tab's <tbody>."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    r = c.get("/admin/tables", headers=_auth(token))
    html = r.text
    assert 'id="bqTableListing"' in html
    assert 'id="kbTableListing"' in html
    assert 'id="jiraTableListing"' in html


def test_admin_tables_tab_persists_in_url_hash(seeded_app):
    """Tab switching updates window.location.hash so refresh keeps the
    operator on the right tab. Verify the JS hooks for it are present."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    r = c.get("/admin/tables", headers=_auth(token))
    html = r.text
    assert "location.hash" in html or "history.replaceState" in html
    # And initial-tab pickup from hash on load.
    assert "window.location.hash" in html or "getActiveTabFromHash" in html
- Step 2: Run, verify fail
pytest tests/test_admin_tables_tab_ui.py -v
Expected: all FAIL — tab structure not yet in template.
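String assertions on rendered HTML are sensitive to attribute order and to line breaks inside a tag. One way to make such checks more robust is to extract the opening tag first and then inspect it. The helpers below are a hypothetical sketch for test code only; they assume attribute values never contain a `>` character and are not a general HTML parser.

```python
import re
from typing import Optional


def find_open_tag(html: str, tag: str, attr: str, value: str) -> Optional[str]:
    """Return the first opening <tag ...> whose attr equals value, or None.

    [^>]* also matches newlines, so attributes may wrap across lines.
    """
    pattern = r"<%s\b[^>]*\b%s=\"%s\"[^>]*>" % (
        re.escape(tag), re.escape(attr), re.escape(value),
    )
    match = re.search(pattern, html)
    return match.group(0) if match else None


def tab_is_selected(html: str, tab: str) -> bool:
    """True when the tab button for `tab` carries aria-selected="true"."""
    button = find_open_tag(html, "button", "data-tab", tab)
    return button is not None and 'aria-selected="true"' in button
```

A test can then assert `tab_is_selected(r.text, "bigquery")` regardless of how the template wraps its attributes.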
- Step 3: Implement tab nav + content panels
Restructure app/web/templates/admin_tables.html. The existing single content area becomes three tab panels. Outline of the new top-level structure (replace the existing single page-content area):
{# Determine the initial active tab from the data source type +
any registered rows. Operator can still switch tabs to register
in another source. #}
{% set initial_tab = data_source.type %}
<nav class="tab-nav" role="tablist">
<button class="tab" data-tab="bigquery"
aria-selected="{{ 'true' if initial_tab == 'bigquery' else 'false' }}"
onclick="switchTab('bigquery')">BigQuery</button>
<button class="tab" data-tab="keboola"
aria-selected="{{ 'true' if initial_tab == 'keboola' else 'false' }}"
onclick="switchTab('keboola')">Keboola</button>
<button class="tab" data-tab="jira"
aria-selected="false"
onclick="switchTab('jira')">Jira</button>
</nav>
<section id="tab-content-bigquery" class="tab-content"
style="display: {% if initial_tab == 'bigquery' %}block{% else %}none{% endif %};">
{# BQ tab: Register button, listing, modals — Phase E moves
existing content here. #}
<div class="tab-header">
<h2>BigQuery tables</h2>
<button id="bqRegisterBtn" class="btn btn-primary"
onclick="openRegisterModal('bigquery')">Register BigQuery table</button>
</div>
<div id="bqTableListing"></div>
{# Existing BQ register/edit modals get scoped here in Phase E. #}
</section>
<section id="tab-content-keboola" class="tab-content"
style="display: {% if initial_tab == 'keboola' %}block{% else %}none{% endif %};">
<div class="tab-header">
<h2>Keboola tables</h2>
<button id="kbRegisterBtn" class="btn btn-primary"
onclick="openRegisterModal('keboola')">Register Keboola table</button>
</div>
<div id="kbTableListing"></div>
{# Phase F builds the Keboola form here. #}
</section>
<section id="tab-content-jira" class="tab-content" style="display: none;">
<div class="tab-header">
<h2>Jira tables</h2>
<p class="hint">Jira tables are populated by webhooks. To register a new
Jira webhook integration, see <code>docs/connectors/jira.md</code>.</p>
</div>
<div id="jiraTableListing"></div>
</section>
<script>
function switchTab(tab) {
document.querySelectorAll('.tab').forEach(function(b) {
b.setAttribute('aria-selected', b.dataset.tab === tab ? 'true' : 'false');
});
document.querySelectorAll('.tab-content').forEach(function(c) {
c.style.display = c.id === ('tab-content-' + tab) ? 'block' : 'none';
});
history.replaceState(null, '', '#' + tab);
}
(function initTabFromHash() {
var hash = window.location.hash.replace(/^#/, '');
if (hash === 'bigquery' || hash === 'keboola' || hash === 'jira') {
switchTab(hash);
}
})();
</script>
<style>
.tab-nav { display: flex; gap: 4px; border-bottom: 1px solid #e0e0e0; margin-bottom: 16px; }
.tab { padding: 8px 16px; background: transparent; border: 0; cursor: pointer; }
.tab[aria-selected="true"] { border-bottom: 2px solid #4a8cff; font-weight: 600; }
.tab-content { padding: 16px 0; }
.tab-header { display: flex; justify-content: space-between; align-items: center; margin-bottom: 16px; }
</style>
- Step 4: Run, verify pass
pytest tests/test_admin_tables_tab_ui.py -v
- Step 5: Commit
git add app/web/templates/admin_tables.html tests/test_admin_tables_tab_ui.py
git commit -m "feat(admin-ui): tab-split scaffold for /admin/tables
Per-connector tabs (BigQuery / Keboola / Jira) replace the single
mixed form. Each tab has its own Register button + listing div +
(later) form modals. Initial active tab matches data_source.type
from instance.yaml; operator can switch tabs to manage a secondary
source.
Tab state persists in window.location.hash so refresh keeps the
operator on the right tab. No JS framework — vanilla JS toggles
display on .tab-content sections.
Listing divs (bqTableListing / kbTableListing / jiraTableListing)
are wired in Phase H (per-tab listing filter)."
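The per-tab listing amounts to grouping registry rows by source_type and sending each group to its listing element. The real wiring is browser-side JavaScript in a later phase; the Python sketch below only illustrates the routing rule. `LISTING_IDS` and `group_rows_by_listing` are hypothetical names, and the handling of unknown source types is an assumption.

```python
from collections import defaultdict

# Listing element ids from the tab scaffold, keyed by source_type.
LISTING_IDS = {
    "bigquery": "bqTableListing",
    "keboola": "kbTableListing",
    "jira": "jiraTableListing",
}


def group_rows_by_listing(rows):
    """Bucket registry rows by the listing element that should show them.

    Rows whose source_type has no tab are returned under the key None so
    the caller can decide whether to log or ignore them.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[LISTING_IDS.get(row.get("source_type"))].append(row)
    return dict(grouped)
```

Surfacing unroutable rows under `None`, rather than dropping them silently, keeps a future fourth source type from disappearing out of the UI unnoticed.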
Phase E — UI: BigQuery tab content (relocate existing #148 form)
Task E1: Move BQ Register modal + listing logic into BQ tab
Files:
- Modify: app/web/templates/admin_tables.html
- Modify: tests/test_admin_tables_ui_materialized.py (selector adjustments)
- Step 1: Failing test (existing tests must pass against new tab structure)
The existing test_admin_tables_renders_two_question_radio_form and test_edit_modal_has_bq_parity_fields already assert the BQ form exists. Update them to assert the form is inside the BQ tab:
def test_admin_tables_renders_two_question_radio_form(seeded_app, bq_instance):
    """[Phase E] BQ form moved into tab-content-bigquery section."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    r = c.get("/admin/tables", headers=_auth(token))
    assert r.status_code == 200, r.text
    html = r.text
    # Existing assertions (preserved):
    assert 'name="bqAccessMode"' in html
    assert 'value="live"' in html
    # ... (all the original assertions stay)
    # NEW: form fields are inside the BQ tab content area.
    bq_tab_content = html[html.index('id="tab-content-bigquery"'):]
    bq_tab_end = bq_tab_content.index('</section>')
    bq_section = bq_tab_content[:bq_tab_end]
    assert 'name="bqAccessMode"' in bq_section
    assert 'id="bqDataset"' in bq_section
    assert 'id="bqSourceQuery"' in bq_section
- Step 2: Run, verify fail
pytest tests/test_admin_tables_ui_materialized.py::test_admin_tables_renders_two_question_radio_form -v
Expected: FAIL on the tab-content-bigquery slice — form not yet inside the tab.
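The slice-by-section pattern used in this test recurs for the Keboola tab, so it may be worth a shared helper. A minimal sketch follows. `section_html` is a hypothetical name, and it assumes tab sections contain no nested `<section>` elements, which holds for the scaffold shown in this plan but is not guaranteed for later markup.

```python
def section_html(html: str, section_id: str) -> str:
    """Return the markup from the section's id attribute up to the first
    closing </section> tag after it.

    Raises ValueError (from str.index) when either marker is missing,
    which gives a clearer test failure than an empty slice would.
    """
    start = html.index('id="%s"' % section_id)
    end = html.index("</section>", start)
    return html[start:end]
```

With it, a scoped assertion reads `assert 'id="bqDataset"' in section_html(html, "tab-content-bigquery")`.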
- Step 3: Move BQ form into BQ tab content section
Take the existing BQ register form block (currently inside the {% if data_source.type == 'bigquery' %} Jinja branch) and physically relocate it inside the <section id="tab-content-bigquery"> element added in Phase D. Remove the outer {% if %} branch — the form is always rendered, just inside its tab. Same for the BQ Edit modal block — relocate inside the BQ tab section.
Adjust the open/close modal trigger functions:
// Old: openRegisterModal() — assumed single source
// New: openRegisterModal(source)
function openRegisterModal(source) {
if (source === 'bigquery') {
document.getElementById('registerBqModal').style.display = 'block';
} else if (source === 'keboola') {
document.getElementById('registerKeboolaModal').style.display = 'block';
}
}
(The Keboola modal id is added in Phase F.)
- Step 4: Run, verify pass
pytest tests/test_admin_tables_ui_materialized.py -v
pytest tests/test_admin_bq_register.py -v
pytest tests/test_admin_discover_bigquery.py -v
Expected: all existing BQ-form tests pass — the form behaves identically, just from inside a tab.
- Step 5: Commit
git add app/web/templates/admin_tables.html tests/test_admin_tables_ui_materialized.py
git commit -m "refactor(admin-ui): relocate BigQuery form into BigQuery tab
Phase E of the tab-split. Existing BQ register/edit modals + Discover/
List-tables/Use-as-base buttons + two-question radio model preserved
verbatim — only the parent <section> changed. The Jinja
{% if data_source.type == 'bigquery' %} branch is gone; the form is
always rendered, just inside #tab-content-bigquery.
openRegisterModal() now takes a source argument. Existing tests for
form structure adjusted to slice on the BQ tab content; no behavior
change."
Phase F — UI: Keboola tab content (with Custom SQL + form cleanup)
Task F1: Keboola Register modal — full rebuild with two-question radio + form cleanup
Files:
- Modify: app/web/templates/admin_tables.html
- Create test: extend tests/test_admin_tables_ui_materialized.py
- Step 1: Failing test
def test_keboola_register_form_has_two_question_radio(seeded_app, monkeypatch):
    """Phase F: Keboola tab Register form mirrors BQ's two-question
    radio model, but Q1 (access mode) is forced to 'synced' (no Live
    mode for Keboola), so visually only Q2 (sync mode = whole | custom)
    is exposed.

    Q2.whole → query_mode='materialized' with auto SELECT * FROM kbc.bucket.table
    Q2.custom → query_mode='materialized' with admin SELECT

    Both create materialized rows; the legacy 'local' mode is no longer
    user-selectable (it would be exactly equivalent to whole)."""
    fake_cfg = {"data_source": {"type": "keboola", "keboola": {}}}
    monkeypatch.setattr(
        "app.instance_config.load_instance_config",
        lambda: fake_cfg, raising=False,
    )
    from app.instance_config import reset_cache
    reset_cache()
    try:
        c = seeded_app["client"]
        token = seeded_app["admin_token"]
        r = c.get("/admin/tables", headers=_auth(token))
        html = r.text
        kb_tab = html[html.index('id="tab-content-keboola"'):]
        kb_tab = kb_tab[:kb_tab.index('</section>')]
        # Q2 radio: Whole vs Custom.
        assert 'name="kbSyncMode"' in kb_tab
        assert 'value="whole"' in kb_tab
        assert 'value="custom"' in kb_tab
        # Bucket + source-table inputs reused for whole mode.
        assert 'id="kbBucket"' in kb_tab
        assert 'id="kbSourceTable"' in kb_tab
        # Custom-SQL textarea + Use-table-as-base prefill button. The
        # modal markup calls prefillFromKeboolaTable('kbSourceQuery').
        assert 'id="kbSourceQuery"' in kb_tab
        assert (
            'kbPrefillFromTable' in html
            or "prefillFromKeboolaTable('kbSourceQuery')" in html
        )
        # Sync Schedule input: was missing from the old Keboola form.
        assert 'id="kbSyncSchedule"' in kb_tab
        # Sync Strategy dropdown: gone.
        assert 'id="kbStrategy"' not in kb_tab
        assert 'id="regStrategy"' not in html  # leftover sanity
        # Primary Key: under <details>Advanced.
        assert 'id="kbPrimaryKey"' in kb_tab
        assert "<details" in kb_tab
        assert ">Advanced" in kb_tab
        # Discover buckets / List tables buttons.
        assert 'kbDiscoverBuckets' in html or "discoverKeboolaBuckets(" in html
        assert 'kbListTables' in html or "discoverKeboolaTables(" in html
    finally:
        reset_cache()


def test_keboola_register_payload_maps_to_materialized(seeded_app, monkeypatch):
    """The form's whole-table mode posts query_mode='materialized' with
    a synthetic SELECT * SQL, the same pattern as BQ Synced/Whole."""
    # Exercising the JS payload via a parameterized fetch shim is harder
    # than necessary; instead, verify that the API endpoint accepts the
    # payload shape the form is going to send.
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    auth = {"Authorization": f"Bearer {token}"}
    r = c.post(
        "/api/admin/register-table",
        headers=auth,
        json={
            "name": "orders",
            "source_type": "keboola",
            "query_mode": "materialized",
            "source_query": 'SELECT * FROM kbc."in.c-sales"."orders"',
            "sync_schedule": "every 6h",
        },
    )
    assert r.status_code == 201, r.text
- Step 2: Run, verify fail
pytest tests/test_admin_tables_ui_materialized.py::test_keboola_register_form_has_two_question_radio -v
Expected: FAIL — Keboola form not yet built.
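Both radio choices end in the same payload shape; only the origin of the SQL differs. The real mapping will live in the form's JavaScript, so the Python sketch below is only a model of the rule. `build_keboola_payload` and `quote_ident` are hypothetical names, and double-quote identifier escaping is an assumption based on the `kbc."bucket"."table"` form used in the tests.

```python
from typing import Optional


def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_keboola_payload(
    sync_mode: str,
    view_name: str,
    bucket: str = "",
    source_table: str = "",
    source_query: str = "",
    sync_schedule: Optional[str] = None,
) -> dict:
    """Map the form's 'whole' / 'custom' choice to a register payload."""
    if sync_mode == "whole":
        if not (bucket and source_table):
            raise ValueError("whole-table mode needs bucket and source_table")
        sql = "SELECT * FROM kbc.%s.%s" % (
            quote_ident(bucket), quote_ident(source_table),
        )
    elif sync_mode == "custom":
        if not source_query.strip():
            raise ValueError("custom mode needs a SELECT")
        sql = source_query.strip()
    else:
        raise ValueError("unknown sync_mode: %r" % sync_mode)
    payload = {
        "name": view_name,
        "source_type": "keboola",
        "query_mode": "materialized",
        "source_query": sql,
    }
    # Omit the schedule when blank so the server-side default applies.
    if sync_schedule:
        payload["sync_schedule"] = sync_schedule
    return payload
```

Quoting the bucket matters because Keboola bucket ids such as `in.c-sales` contain dots and hyphens that would otherwise be parsed as separators or operators.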
- Step 3: Build the Keboola Register modal
Inside the <section id="tab-content-keboola"> from Phase D, add the modal:
<div id="registerKeboolaModal" class="modal" style="display:none;">
<div class="modal-content">
<div class="modal-header">
<h3>Register Keboola table</h3>
<button class="btn-close" onclick="closeRegisterKeboolaModal()">×</button>
</div>
<div class="modal-body">
{# Q2 radio — Sync mode. (Q1 is implicitly 'synced'; Keboola
has no Live mode.) #}
<div class="form-group">
<label class="form-label">What to sync?</label>
<div class="radio-row">
<label>
<input type="radio" name="kbSyncMode" value="whole" checked
onchange="onKbSyncModeChange()">
<strong>Whole table</strong> — pull everything in the
bucket/table on each schedule tick
</label>
</div>
<div class="radio-row">
<label>
<input type="radio" name="kbSyncMode" value="custom"
onchange="onKbSyncModeChange()">
<strong>Custom SQL</strong> — pre-aggregate or filter
with your own SELECT (e.g. last 30 days only,
per-day rollup)
</label>
</div>
</div>
<div class="form-group">
<label class="form-label" for="kbViewName">View name (analyst-visible)</label>
<input type="text" class="form-input" id="kbViewName"
placeholder="e.g. orders_recent">
</div>
<div class="form-group kb-source-table">
<label class="form-label" for="kbBucket">
Bucket
<button type="button" class="btn btn-secondary btn-sm"
onclick="discoverKeboolaBuckets('kbBucketList')"
style="float:right;">Discover</button>
</label>
<input type="text" class="form-input" id="kbBucket"
list="kbBucketList" placeholder="e.g. in.c-sales">
<datalist id="kbBucketList"></datalist>
</div>
<div class="form-group kb-source-table">
<label class="form-label" for="kbSourceTable">
Source Table
<button type="button" class="btn btn-secondary btn-sm"
onclick="discoverKeboolaTables('kbBucket', 'kbTableList')"
style="float:right;">List tables</button>
</label>
<input type="text" class="form-input" id="kbSourceTable"
list="kbTableList" placeholder="e.g. orders">
<datalist id="kbTableList"></datalist>
</div>
<div class="form-group kb-source-custom" style="display:none;">
<label class="form-label" for="kbSourceQuery">
SQL
<button type="button" class="btn btn-secondary btn-sm"
onclick="prefillFromKeboolaTable('kbSourceQuery')"
style="float:right;"
title="Prefill SELECT * FROM kbc.bucket.table so you only edit the WHERE / projection">
Use table as base
</button>
</label>
<textarea class="form-textarea" id="kbSourceQuery" rows="8"></textarea>
<div class="form-hint">SELECT against <code>kbc."bucket"."table"</code>.
Result is materialized to parquet and distributed via <code>da sync</code>.</div>
</div>
<div class="form-group">
<label class="form-label" for="kbSyncSchedule">Sync Schedule
<span class="optional">(optional, default <code>every 1h</code>)</span></label>
<input type="text" class="form-input" id="kbSyncSchedule" placeholder="every 6h">
<div class="form-hint">
How often Agnes refreshes the local copy. Examples:
<code>every 15m</code>, <code>every 6h</code>,
<code>daily 03:00</code>, <code>daily 07:00,13:00,18:00</code> (UTC).
</div>
</div>
<div class="form-group">
<label class="form-label" for="kbDescription">Description
<span class="optional">(optional)</span></label>
<textarea class="form-textarea" id="kbDescription"
placeholder="Brief description of the table contents..."></textarea>
</div>
<div class="form-group">
<label class="form-label" for="kbFolder">Folder
<span class="optional">(optional)</span></label>
<input type="text" class="form-input" id="kbFolder"
placeholder="e.g. crm, finance, marketing">
</div>
<details class="form-group">
<summary>Advanced (optional)</summary>
<div class="form-group" style="margin-top:8px;">
<label class="form-label" for="kbPrimaryKey">Primary Key</label>
<input type="text" class="form-input" id="kbPrimaryKey"
placeholder="e.g. id">
<div class="form-hint">Comma-separated list. <strong>Catalog
metadata only</strong> — Agnes always does full-overwrite
sync; no upsert/dedup. Auto-filled from the Keboola source
when available.</div>
</div>
</details>
</div>
<div class="modal-footer">
<button class="btn btn-secondary" onclick="closeRegisterKeboolaModal()">Cancel</button>
<button class="btn btn-primary" onclick="registerKeboolaTable()">Register</button>
</div>
</div>
</div>
Plus the JS for the form (in the <script> block):
function _getKbSyncMode() {
var el = document.querySelector('input[name="kbSyncMode"]:checked');
return el ? el.value : 'whole';
}
function onKbSyncModeChange() {
var mode = _getKbSyncMode();
document.querySelectorAll('.kb-source-table').forEach(function(el) {
el.style.display = (mode === 'whole') ? '' : 'none';
});
document.querySelectorAll('.kb-source-custom').forEach(function(el) {
el.style.display = (mode === 'custom') ? '' : 'none';
});
}
function _buildKeboolaPayload() {
var mode = _getKbSyncMode();
var viewName = document.getElementById('kbViewName').value.trim();
var bucket = document.getElementById('kbBucket').value.trim();
var sourceTable = document.getElementById('kbSourceTable').value.trim();
var pk = document.getElementById('kbPrimaryKey').value.trim();
var primaryKey = pk
? pk.split(',').map(function(s) { return s.trim(); }).filter(Boolean)
: [];
var common = {
name: viewName || sourceTable,
source_type: 'keboola',
query_mode: 'materialized',
primary_key: primaryKey,
sync_schedule: document.getElementById('kbSyncSchedule').value.trim() || null,
description: document.getElementById('kbDescription').value.trim() || null,
folder: document.getElementById('kbFolder').value.trim() || null,
};
if (mode === 'custom') {
return Object.assign({}, common, {
source_query: document.getElementById('kbSourceQuery').value.trim(),
});
}
// Whole — synthesize SELECT *.
return Object.assign({}, common, {
bucket: bucket,
source_table: sourceTable,
source_query: 'SELECT * FROM kbc."' + bucket + '"."' + sourceTable + '"',
});
}
function registerKeboolaTable() {
var payload = _buildKeboolaPayload();
fetch('/api/admin/register-table', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(payload),
})
.then(function(r) {
if (!r.ok) {
return r.json().then(function(d) {
throw new Error(d.detail || d.error || 'Registration failed');
});
}
return r.json();
})
.then(function() {
closeRegisterKeboolaModal();
showToast('Table registered', 'success');
loadRegistry(); // existing function; will route the new row into the right tab
})
.catch(function(err) {
showToast(err.message, 'error');
});
}
// Discovery shims: reuse generic helpers if /api/admin/discover-tables
// supports both BQ and Keboola; otherwise add a /api/admin/discover-keboola-tables
// endpoint as a follow-up task (skipped if discovery is already source-aware).
function discoverKeboolaBuckets(datalistId) {
fetch('/api/admin/discover-tables?source=keboola')
.then(function(r) { return r.json(); })
.then(function(data) {
var dl = document.getElementById(datalistId);
dl.innerHTML = '';
(data.buckets || data.datasets || []).forEach(function(b) {
var o = document.createElement('option');
o.value = b;
dl.appendChild(o);
});
});
}
function discoverKeboolaTables(bucketInputId, tablesDatalistId) {
var bucket = document.getElementById(bucketInputId).value.trim();
if (!bucket) {
showToast('Fill bucket first', 'error');
return;
}
fetch('/api/admin/discover-tables?source=keboola&bucket=' + encodeURIComponent(bucket))
.then(function(r) { return r.json(); })
.then(function(data) {
var dl = document.getElementById(tablesDatalistId);
dl.innerHTML = '';
(data.tables || []).forEach(function(t) {
var o = document.createElement('option');
o.value = t;
dl.appendChild(o);
});
});
}
function prefillFromKeboolaTable(textareaId) {
var bucket = document.getElementById('kbBucket').value.trim();
var sourceTable = document.getElementById('kbSourceTable').value.trim();
if (!bucket || !sourceTable) {
showToast('Fill bucket + source table first', 'error');
return;
}
var ta = document.getElementById(textareaId);
if (ta.value.trim()) {
if (!confirm('Replace existing SQL?')) return;
}
ta.value = 'SELECT *\nFROM kbc."' + bucket + '"."' + sourceTable + '"\nWHERE -- your filter here';
}
- Step 4: Run, verify pass
pytest tests/test_admin_tables_ui_materialized.py -v
pytest tests/test_admin_keboola_materialized.py -v
- Step 5: Commit
git add app/web/templates/admin_tables.html tests/test_admin_tables_ui_materialized.py
git commit -m "feat(admin-ui): Keboola tab Register modal with Custom SQL + cleanup
Phase F. The Keboola tab now exposes the same two-question radio
model as BigQuery (minus Live, which Keboola doesn't support):
Q2 = Whole table | Custom SQL
Whole → query_mode='materialized', auto SELECT * FROM kbc.bucket.table
Custom → query_mode='materialized', admin-supplied SELECT
This unifies the operator mental model across sources and brings
Keboola to capability parity for the materialized path. The legacy
'local' mode (extractor-driven full-table download) remains supported
by the API but is no longer the default — Whole mode is functionally
equivalent and follows the same materialized pipeline.
Form cleanup baked into the rebuild:
- Sync Strategy dropdown gone (UI lied; runtime never read it)
- Primary Key under <details>Advanced with catalog-only hint
- Sync Schedule input present (was missing from old Keboola form)
Discovery (List buckets / List tables / Use-table-as-base) parallels
the BQ tab's Discover/List tables/Use-as-base buttons via the
existing /api/admin/discover-tables endpoint with source=keboola
parameter."
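The Whole / Custom branching in _buildKeboolaPayload can be mirrored in Python, which is handy when writing API-level tests that need the same payload shape the form sends. This is a minimal sketch, not code from the repository: build_keboola_payload and quote_ident are hypothetical names, and the two safety checks (rejecting an empty bucket or table, doubling embedded double quotes per standard SQL identifier quoting) are additions that the JS above does not perform.

```python
from typing import Optional


def quote_ident(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_keboola_payload(
    mode: str,
    view_name: str,
    bucket: str = "",
    source_table: str = "",
    source_query: str = "",
    sync_schedule: Optional[str] = None,
) -> dict:
    """Return a register-table payload for the 'whole' or 'custom' sync mode."""
    payload = {
        "name": view_name or source_table,
        "source_type": "keboola",
        "query_mode": "materialized",
        "sync_schedule": sync_schedule or None,
    }
    if mode == "custom":
        # Custom SQL: the admin-supplied SELECT is sent as-is.
        payload["source_query"] = source_query.strip()
        return payload
    # Whole table: refuse to synthesize SQL from empty identifiers
    # (assumption: the form JS above does not validate this).
    if not bucket or not source_table:
        raise ValueError("whole-table mode needs both bucket and source table")
    payload["bucket"] = bucket
    payload["source_table"] = source_table
    payload["source_query"] = (
        f"SELECT * FROM kbc.{quote_ident(bucket)}.{quote_ident(source_table)}"
    )
    return payload
```

Both modes send query_mode='materialized'; the only difference is who writes the SELECT.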
Task F2: Keboola Edit modal — same parity
Files:
- Modify: app/web/templates/admin_tables.html
- Modify: tests/test_admin_tables_ui_materialized.py
- Step 1: Failing test
def test_keboola_edit_modal_parity(seeded_app, monkeypatch):
    """Phase F: Edit modal mirrors Register's two-question structure
    for Keboola rows."""
    fake_cfg = {"data_source": {"type": "keboola", "keboola": {}}}
    monkeypatch.setattr(
        "app.instance_config.load_instance_config",
        lambda: fake_cfg, raising=False,
    )
    from app.instance_config import reset_cache
    reset_cache()
    try:
        c = seeded_app["client"]
        token = seeded_app["admin_token"]
        r = c.get("/admin/tables", headers=_auth(token))
        html = r.text
        # Q2 radio in edit.
        assert 'name="editKbSyncMode"' in html
        assert 'id="editKbBucket"' in html
        assert 'id="editKbSourceTable"' in html
        assert 'id="editKbSourceQuery"' in html
        assert 'id="editKbSyncSchedule"' in html
        # Discover/List/Use-as-base buttons mirror Register.
        assert "discoverKeboolaBuckets('editKbBucketList')" in html
        assert "discoverKeboolaTables('editKbBucket', 'editKbTableList')" in html
        assert "prefillFromKeboolaTable('editKbSourceQuery')" in html
        # Strategy gone, PK under details.
        assert 'id="editStrategy"' not in html
        assert 'id="editKbPrimaryKey"' in html
    finally:
        reset_cache()
- Step 2: Run, verify fail
pytest tests/test_admin_tables_ui_materialized.py::test_keboola_edit_modal_parity -v
- Step 3: Build Edit modal (mirror Register; reuse the helper functions which already accept ids).
(Concrete HTML omitted for brevity — mirror the Register modal with editKb* ids and add editKbSyncMode radios. Use existing helpers discoverKeboolaBuckets(datalistId), discoverKeboolaTables(inputId, datalistId), prefillFromKeboolaTable(textareaId) with the editKb* ids.)
- Step 4: Run, verify pass
pytest tests/test_admin_tables_ui_materialized.py -v
- Step 5: Commit
git add app/web/templates/admin_tables.html tests/test_admin_tables_ui_materialized.py
git commit -m "feat(admin-ui): Keboola tab Edit modal — parity with Register
Mirror of the Phase F Register modal in the Edit flow. Same Q2 radio,
same Discover/List tables/Use-as-base buttons via the parameterized
helpers, same Sync Schedule input, same Advanced disclosure for PK."
Phase G — UI: Jira tab (read-only listing)
Task G1: Jira tab listing
Files:
- Modify: app/web/templates/admin_tables.html
- Extend: tests/test_admin_tables_tab_ui.py
- Step 1: Failing test
def test_jira_tab_is_read_only(seeded_app):
    """Jira tables are populated by webhooks, not by admin registration.
    Tab shows the listing + a hint pointing to docs; no Register button."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    r = c.get("/admin/tables", headers=_auth(token))
    html = r.text
    jira_tab = html[html.index('id="tab-content-jira"'):]
    jira_tab = jira_tab[:jira_tab.index('</section>')]
    # No Register button.
    assert 'data-register-source="jira"' not in jira_tab
    assert 'jiraRegisterBtn' not in jira_tab
    # Hint pointing to docs.
    assert "webhooks" in jira_tab.lower()
    # Listing div present.
    assert 'id="jiraTableListing"' in jira_tab
- Step 2: Run, verify pass
The Phase D scaffold already created the section with the hint — this test should already pass against the Phase D template. If it doesn't, adjust the Phase D HTML to match.
- Step 3: Commit the test (if the Phase D scaffold was sufficient, the test file is the only change)
git add tests/test_admin_tables_tab_ui.py
git commit -m "test(admin-ui): assert Jira tab is read-only listing"
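The Jira test scopes its assertions to one tab by slicing the rendered HTML between the section's id attribute and the next closing tag. If more per-tab tests follow the same pattern, the slicing could be factored into a helper. A minimal sketch, assuming tab sections are never nested (extract_section is a hypothetical name, not an existing test utility):

```python
def extract_section(html: str, section_id: str) -> str:
    """Return the HTML from the id attribute up to the next </section>.

    Assumes <section> elements are not nested; with nesting, the slice
    would end at the inner section's closing tag.
    Raises ValueError if the id or the closing tag is missing.
    """
    start = html.index(f'id="{section_id}"')
    end = html.index("</section>", start)
    return html[start:end]
```

Because str.index raises ValueError on a miss, a test that uses this helper fails loudly when the section is absent instead of silently asserting against an empty string.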
Phase H — UI: per-tab listing filter, drop Strategy column
Task H1: Listing routes rows into the matching tab's <tbody>
Files:
- Modify: app/web/templates/admin_tables.html (the existing loadRegistry JS or its renderer)
- Step 1: Failing test
def test_listing_partitions_rows_by_source_type(seeded_app):
    """When the operator has registered tables across all three sources,
    each tab's listing shows only the rows matching its source_type.
    JS-driven so we test by inspecting the JS branching logic indirectly:
    the renderer function takes a source filter and emits rows accordingly."""
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    auth = {"Authorization": f"Bearer {token}"}
    c.post("/api/admin/register-table", headers=auth, json={
        "name": "kb_table", "source_type": "keboola", "bucket": "in.c-x",
        "source_table": "y", "query_mode": "local",
    })
    c.post("/api/admin/register-table", headers=auth, json={
        "name": "bq_table", "source_type": "bigquery",
        "query_mode": "materialized", "source_query": "SELECT 1",
    })
    r = c.get("/admin/tables", headers=auth)
    html = r.text
    # The renderer function is dispatched per tab. The test verifies the
    # JS code paths exist (we don't run JS in tests, just confirm the
    # template provides the wiring).
    assert "renderRegistryListing" in html or "loadRegistry" in html
    # Each tab's listing id is passed to the renderer as a single-quoted JS
    # string (the Step 2 renderer resolves it via getElementById(targetId)).
    # The id="..." attributes in the markup use double quotes, so these
    # checks match only the JS call sites.
    assert "'bqTableListing'" in html
    assert "'kbTableListing'" in html
    assert "'jiraTableListing'" in html
- Step 2: Implement renderer dispatch
In the existing loadRegistry (or whatever the listing fetch is named), branch by source_type:
function loadRegistry() {
fetch('/api/admin/registry').then(function(r) { return r.json(); })
.then(function(data) {
var tables = data.tables || [];
renderRegistryListing(
'bqTableListing',
tables.filter(function(t) { return t.source_type === 'bigquery'; })
);
renderRegistryListing(
'kbTableListing',
tables.filter(function(t) { return t.source_type === 'keboola'; })
);
renderRegistryListing(
'jiraTableListing',
tables.filter(function(t) { return t.source_type === 'jira'; })
);
});
}
function renderRegistryListing(targetId, tables) {
var target = document.getElementById(targetId);
if (!target) return;
if (tables.length === 0) {
target.innerHTML = '<p class="empty-hint">No tables registered yet.</p>';
return;
}
var html = '<table class="registry-table">';
html += '<thead><tr>';
html += '<th>Table ID</th>';
html += '<th>Mode</th>'; // NEW: replaces Strategy column
html += '<th>Primary Key</th>';
html += '<th>Description</th>';
html += '<th class="col-actions">Actions</th>';
html += '</tr></thead><tbody>';
tables.forEach(function(table) {
html += '<tr>';
html += '<td class="col-id" title="' + escapeHtml(table.id) + '">' + escapeHtml(table.id) + '</td>';
html += '<td>' + escapeHtml(table.query_mode || 'local') + '</td>';
html += '<td>' + escapeHtml((table.primary_key || []).join(', ') || '-') + '</td>';
html += '<td>' + escapeHtml(table.description || '-') + '</td>';
html += '<td class="col-actions">';
html += '<button class="btn-icon" title="Edit" onclick=\'openEditModal(' + JSON.stringify(table).replace(/\'/g, "\\'") + ')\'>...</button>';
html += '<button class="btn-icon danger" title="Delete" onclick="deleteTable(\'' + escapeHtml(table.id).replace(/\'/g, "\\'") + '\')">...</button>';
html += '</td></tr>';
});
html += '</tbody></table>';
target.innerHTML = html;
}
Drop the legacy CSS .col-strategy and .strategy-badge from the <style> block (lines 514, 523 of the original file).
- Step 3: Run, verify pass
pytest tests/test_admin_tables_tab_ui.py -v
pytest tests/test_admin_tables_ui_materialized.py -v
- Step 4: Commit
git add app/web/templates/admin_tables.html tests/test_admin_tables_tab_ui.py
git commit -m "feat(admin-ui): per-tab listing filter; drop Strategy column
loadRegistry partitions the tables by source_type and dispatches each
slice to its own tab's listing div via renderRegistryListing(target, rows).
The Strategy column is replaced with a Mode column showing query_mode
(live / synced / materialized) — far more meaningful information.
.col-strategy and .strategy-badge CSS rules removed (no consumers left)."
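The partitioning that loadRegistry performs in the browser can also be expressed in Python, for example to check the expected per-tab slices against the /api/admin/registry response in a test. This is a sketch under assumptions: partition_by_tab and TAB_TARGETS are hypothetical names, and the mapping simply restates the three listing ids used above.

```python
from typing import Dict, List

# source_type -> id of the listing div that renders it (ids from the plan).
TAB_TARGETS = {
    "bigquery": "bqTableListing",
    "keboola": "kbTableListing",
    "jira": "jiraTableListing",
}


def partition_by_tab(tables: List[dict]) -> Dict[str, List[dict]]:
    """Group registry rows by the tab listing that should display them.

    Rows with an unknown or missing source_type are dropped, which matches
    the three explicit filter() calls in loadRegistry.
    """
    slices: Dict[str, List[dict]] = {target: [] for target in TAB_TARGETS.values()}
    for table in tables:
        target = TAB_TARGETS.get(table.get("source_type"))
        if target is not None:
            slices[target].append(table)
    return slices
```

Every tab gets an entry even when it has no rows, so the empty-state hint in renderRegistryListing still has a slice to render from.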
Phase I — E2E integration tests + manual smoke + CHANGELOG + push
Task I1: PUT preservation regression guard
(Same as the prior plan iteration — re-stated here for completeness.)
Files:
- Create: tests/test_admin_put_preservation.py
- Step 1: Lock in the invariant
def test_put_preserves_omitted_sync_strategy(seeded_app):
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    auth = {"Authorization": f"Bearer {token}"}
    r = c.post("/api/admin/register-table", headers=auth, json={
        "name": "events_partitioned",
        "source_type": "keboola",
        "bucket": "in.c-events",
        "source_table": "events",
        "query_mode": "local",
        "sync_strategy": "partitioned",
    })
    assert r.status_code == 201, r.text
    r = c.put("/api/admin/registry/events_partitioned", headers=auth, json={
        "sync_schedule": "daily 03:00",
        "description": "now daily",
    })
    assert r.status_code == 200
    r = c.get("/api/admin/registry", headers=auth)
    rows = r.json()["tables"]
    row = next(t for t in rows if t["id"] == "events_partitioned")
    assert row["sync_strategy"] == "partitioned"
(Plus a parallel test_put_preserves_omitted_primary_key.)
- Step 2: Run, verify pass on current code
pytest tests/test_admin_put_preservation.py -v
Expected: PASS — the invariant holds today; we're locking it.
- Step 3: Commit
git add tests/test_admin_put_preservation.py
git commit -m "test(admin-api): regression guard for PUT field preservation
Locks the update semantics that the Phase F form cleanup relies on:
update_table skips fields whose value is None after model_dump(). If a
future maintainer drops that guard, or gives sync_strategy / primary_key
a non-None default, this fires before partitioned rows silently regress."
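The invariant this task locks is the "dump every field, skip the None ones" update pattern. The sketch below illustrates it with a stdlib dataclass standing in for the Pydantic request model, so it runs without third-party packages. UpdateRequest and apply_update are hypothetical names, and the real update_table handler may be structured differently.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class UpdateRequest:
    """Stand-in for the PUT body: every field is optional, default None."""
    sync_schedule: Optional[str] = None
    description: Optional[str] = None
    sync_strategy: Optional[str] = None
    primary_key: Optional[List[str]] = None


def apply_update(stored: dict, request: UpdateRequest) -> dict:
    """Merge a request into a stored row, keeping fields the client omitted."""
    updated = dict(stored)
    # asdict() plays the role of model_dump(): omitted fields appear as None.
    for key, value in asdict(request).items():
        if value is not None:  # the guard that preserves omitted fields
            updated[key] = value
    return updated
```

One consequence of this pattern is that a client cannot reset a field to None through the same path, because an explicit null is indistinguishable from an omitted field.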
Task I2: E2E integration for Keboola materialized
Files:
- Create: tests/test_keboola_materialized_e2e.py (skipped without real Keboola creds)
- Step 1: Write the test
"""End-to-end: register a Keboola materialized row → trigger sync →
parquet appears → manifest serves it → CLI da sync would download it.
Skipped unless KBC_TEST_URL + KBC_TEST_TOKEN + KBC_TEST_BUCKET +
KBC_TEST_TABLE are present."""
import os
import pytest
from pathlib import Path

KBC_URL = os.environ.get("KBC_TEST_URL")
KBC_TOKEN = os.environ.get("KBC_TEST_TOKEN")
KBC_BUCKET = os.environ.get("KBC_TEST_BUCKET")
KBC_TABLE = os.environ.get("KBC_TEST_TABLE")

pytestmark = pytest.mark.skipif(
    not all([KBC_URL, KBC_TOKEN, KBC_BUCKET, KBC_TABLE]),
    reason="Keboola creds not provided",
)


def test_register_trigger_manifest_path(seeded_app, monkeypatch, tmp_path):
    monkeypatch.setenv("DATA_DIR", str(tmp_path))
    monkeypatch.setenv("KEBOOLA_TOKEN", KBC_TOKEN)
    monkeypatch.setattr(
        "app.instance_config.load_instance_config",
        lambda: {
            "data_source": {
                "type": "keboola",
                "keboola": {
                    "url": KBC_URL,
                    "token_env": "KEBOOLA_TOKEN",
                },
            },
        },
        raising=False,
    )
    c = seeded_app["client"]
    token = seeded_app["admin_token"]
    auth = {"Authorization": f"Bearer {token}"}
    # Register.
    r = c.post("/api/admin/register-table", headers=auth, json={
        "name": "smoke_subset",
        "source_type": "keboola",
        "query_mode": "materialized",
        "source_query": (
            f'SELECT * FROM kbc."{KBC_BUCKET}"."{KBC_TABLE}" LIMIT 5'
        ),
    })
    assert r.status_code == 201
    # Trigger sync.
    r = c.post("/api/sync/trigger", headers=auth)
    assert r.status_code in (200, 202)
    # Parquet must exist.
    parquet = Path(tmp_path) / "extracts" / "keboola" / "data" / "smoke_subset.parquet"
    assert parquet.exists() and parquet.stat().st_size > 0
    # Manifest serves it.
    r = c.get("/api/sync/manifest", headers=auth)
    rows = r.json()["tables"]
    smoke = next((t for t in rows if t["id"] == "smoke_subset"), None)
    assert smoke is not None
    assert smoke["source_type"] == "keboola"
    assert smoke["query_mode"] == "local"  # materialized parquets surface as local
    assert smoke["md5"]  # has a hash for da sync delta detection
- Step 2: Run
KBC_TEST_URL=... KBC_TEST_TOKEN=... KBC_TEST_BUCKET=... KBC_TEST_TABLE=... \
pytest tests/test_keboola_materialized_e2e.py -v
Expected: PASS with creds; SKIP without.
- Step 3: Commit
git add tests/test_keboola_materialized_e2e.py
git commit -m "test(keboola): E2E — register materialized → trigger → manifest
Full pipeline test. Skipped without KBC_TEST_* creds; passes locally
with a real Storage API token. Verifies parquet lands at the expected
path and the manifest exposes the row to da sync with the right
source_type / query_mode / md5 shape."
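The E2E test only asserts that the manifest row carries a non-empty md5. A stricter variant could recompute the digest of the parquet on disk and compare. The sketch below shows a chunked MD5 helper for that purpose; file_md5 is a hypothetical name, and whether the manifest hashes the raw file bytes in exactly this way is an assumption that should be checked against the manifest code before relying on it.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 of a file, reading it in 1 MiB chunks.

    Chunked reads keep memory flat for large parquet files.
    """
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # iter() with a sentinel stops at the first empty read (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

In the test this would read as a comparison between file_md5(parquet) and smoke["md5"], provided the assumption about the manifest's hashing holds.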
Task I3: CHANGELOG
Files:
- Modify: CHANGELOG.md
- Step 1: Add ## [Unreleased] block
## [Unreleased]
### Added
- **admin UI**: `/admin/tables` is now a per-connector tab interface
(BigQuery / Keboola / Jira). Each tab has its own Register modal +
listing scoped to its source_type. Active tab persists in
`window.location.hash` so refresh keeps the operator in place.
- **Keboola materialized SQL**: `query_mode='materialized'` now works
for `source_type='keboola'` — admin registers a SELECT against
`kbc."bucket"."table"` and the scheduler writes the result to
`/data/extracts/keboola/data/<id>.parquet`. Same flow as BigQuery
materialized; same `da sync` distribution; same RBAC. Cost guardrail
(BQ-style dry-run) intentionally omitted — Keboola extension has no
dry-run analog and Storage API cost is download-byte-shaped, not
scan-byte-shaped. A future PR can add a configurable byte cap if
operators ask for it.
- **Keboola Sync Schedule**: per-table schedule input added to the Keboola
tab Register and Edit modals. The scheduler has always honored
per-table `sync_schedule` for every source via `is_table_due()`,
but the Keboola UI had no surface for it — operators had to use the
`/api/admin/registry/{id}` PUT endpoint or `da admin` CLI. Now they
can type `every 6h` / `daily 03:00` directly.
### Changed
- **admin UI**: Keboola Register and Edit modals adopt the same
two-question radio model as BigQuery — *What to sync?* (Whole table
/ Custom SQL). Whole-table mode synthesizes a `SELECT *` and writes
it through the materialized path; Custom mode lets the admin filter
/ aggregate / project. The legacy `query_mode='local'` extractor
path remains supported for back-compat but is no longer the default
for new Keboola registrations — Whole mode is functionally
equivalent and follows the unified materialized pipeline.
- **admin UI**: `Sync Strategy` dropdown removed from the Keboola form
(Register and Edit). Two independent agent reviews (2026-05-01) found
the field's hint claimed it controlled extraction but no extractor
reads it; only `profiler.is_partitioned()` consumes it for parquet-
layout detection. Field stays in the DB and Pydantic model for
back-compat (marked `Field(deprecated=True)`); just hidden from the
primary form.
- **admin UI**: `Primary Key` input moved under `<details>Advanced` in
both Keboola Register and Edit modals, with a clarifying hint that
it's catalog metadata only — Agnes always does full-overwrite sync;
no upsert / dedup. Auto-fill from Keboola discovery still works.
- **admin UI**: Registry listing column "Strategy" replaced with "Mode"
(showing `query_mode` instead of decorative `sync_strategy`). The
`.col-strategy` / `.strategy-badge` CSS rules removed.
### Deprecated
- `RegisterTableRequest.sync_strategy` — catalog/profiler metadata only;
no extractor reads it. Marked `Field(deprecated=True)`. External API
consumers see the signal in OpenAPI; back-compat preserved.
- `RegisterTableRequest.profile_after_sync` — runtime never read this
flag (Agent 1 finding 2026-05-01); profiler runs unconditionally on
every synced table. Marked `Field(deprecated=True)` and made inert
(the BQ register endpoint no longer force-sets it to `False`).
Back-compat preserved — external clients sending the field get no
error, no warning, no effect.
### Fixed
- **admin API**: `update_table` PUT preserves `sync_strategy` and
`primary_key` when the Edit modal omits them from the payload (this
invariant always held via `request.model_dump()` + `if v is not None`,
but Phase I now has an explicit regression-guard test).
- Step 2: Commit
git add CHANGELOG.md
git commit -m "docs(changelog): unified tab UI + Keboola materialized + form cleanup"
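The schedule strings quoted in the form hint and the changelog (every 15m, every 6h, daily 03:00, daily 07:00,13:00,18:00) follow two shapes: a fixed interval or a list of daily UTC times. The sketch below parses only those example shapes. parse_schedule is a hypothetical helper for illustration; it is not the scheduler's is_table_due() and the real grammar may accept more forms.

```python
import re
from datetime import time, timedelta
from typing import List, Union

_EVERY = re.compile(r"^every (\d+)([mh])$")
_DAILY = re.compile(r"^daily (\d{2}:\d{2}(?:,\d{2}:\d{2})*)$")


def parse_schedule(text: str) -> Union[timedelta, List[time]]:
    """Parse 'every <N>m|h' into a timedelta, or 'daily HH:MM[,HH:MM...]'
    into a list of times (interpreted as UTC by the caller)."""
    text = text.strip()
    match = _EVERY.match(text)
    if match:
        amount = int(match.group(1))
        if amount <= 0:
            raise ValueError(f"interval must be positive: {text!r}")
        if match.group(2) == "m":
            return timedelta(minutes=amount)
        return timedelta(hours=amount)
    match = _DAILY.match(text)
    if match:
        # time.fromisoformat rejects out-of-range values such as 25:00.
        return [time.fromisoformat(part) for part in match.group(1).split(",")]
    raise ValueError(f"unrecognized schedule: {text!r}")
```

Rejecting unknown strings with ValueError, rather than falling back to a default, would let a form-side validator surface typos before a row is registered.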
Task I4: Manual smoke + push + CI poll
- Step 1: Full test sweep
pytest tests/ -q
Expected: all passing. Investigate any unrelated regression.
- Step 2: Manual smoke on dev server
Start uvicorn app.main:app --reload and walk through:
  - /admin/tables: verify tab nav renders, switching between tabs works, hash persists.
  - BigQuery tab: Register a materialized row using Custom SQL; verify it lands in the BQ tab's listing only.
  - Keboola tab: Register a Whole-mode row, verify it lands in the Keboola tab's listing.
  - Keboola tab: Register a Custom-SQL-mode row with a real BUCKET.TABLE filter; verify the parquet appears at data/extracts/keboola/data/<id>.parquet after the next scheduler tick.
  - Jira tab: listing only, no Register button.
  - Edit any row in any tab; verify the right modal opens and the source-specific fields populate.
- Step 3: Push branch
git push -u origin <branch-name>
- Step 4: Open PR with body summarizing:
  - The four bundled concerns and why they're one PR
  - Backward-compat strategy (Pydantic deprecation, no DB migration)
  - Spike result confirming Keboola extension supports query passthrough
  - Manual smoke checklist
- Step 5: Poll CI
gh pr checks <PR#>
Iterate on Devin Review feedback if any. The PR should land green: test, build-and-push, Devin Review.
Self-review checklist
- Spec coverage: Goal covers (1) tab-split, (2) Keboola materialized parity, (3) Keboola form cleanup, (4) profile_after_sync resolution. Phases A–I implement all four. E2E safety contract enumerates the seven invariants the plan must protect; each has at least one explicit task.
- Placeholder scan: Every step has the actual code or command. The few "(adapt to existing function shape)" notes apply where the existing code's exact line numbers can drift between planning and implementation; in those spots the task explicitly says "read the file at implementation time."
- Type / identifier consistency:
  - KeboolaAccess.duckdb_session(): used in Tasks B1, B2, B4.
  - materialize_query(table_id, sql, *, keboola_access, output_dir): Task B2 signature, called from B4.
  - kb* ids in Keboola Register form (kbBucket, kbSourceTable, kbSourceQuery, kbSyncSchedule, kbPrimaryKey, kbViewName); editKb* ids in Keboola Edit form. Consistent across tasks F1, F2, H1.
  - bqTableListing / kbTableListing / jiraTableListing: Phase D scaffold, referenced in Phase H renderer.
- TDD discipline: every behavior task starts with a failing test before implementation. Verification tasks (Task A1, Task I1) lock invariants that already hold.
- Commit cadence: 17 commits across the plan; each is scoped and reviewable on its own.
- Back-compat: No DB migration. All Pydantic fields stay alive (deprecated). External API clients sending legacy payloads get no error. Existing BQ form moves verbatim into a tab. Existing Keboola legacy query_mode='local' rows continue to work.
Execution
Plan complete and saved to docs/superpowers/plans/2026-05-01-admin-tables-form-cleanup.md.
Two execution options:
- Subagent-driven (recommended) — fresh implementer subagent per task, two-stage review (spec compliance + code quality) between tasks. Same session, fast iteration. Plan has 17 commit-scoped tasks and a few sub-steps; expect ~3–4h of agentic work plus review iterations.
- Inline execution — execute tasks sequentially in this session with explicit checkpoints for human review.
Phase A is a 30-min spike that gates everything else (Keboola extension capability lock-in). Phases B–I run sequentially within their own constraints; Phase D (tab scaffold) must precede Phases E–G (tab content). Phase I (regression + E2E + CHANGELOG) wraps everything.