fix: #81 Group C — view-name collision detection (schema v10, squashed) (#100)

Schema v10 + view_ownership table. Cross-connector view name
collisions are detected and refused with an actionable ERROR rather
than silently last-write-wins. Pre-scan reconcile releases stale
ownerships in the same rebuild as a rename — but only when ALL
sources' pre-scans succeed (transient-IO defense; partial pre-scan
skips reconcile to avoid silently stealing a name).

26/26 view collision + orchestrator tests pass.
Refs #81 Group C.
This commit is contained in:
ZdenekSrotyr 2026-04-27 22:09:49 +02:00 committed by GitHub
parent ef74ec010c
commit 72230c3b51
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
6 changed files with 620 additions and 9 deletions

View file

@ -13,6 +13,26 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C
<!-- Add bullets here. Group: Added / Changed / Fixed / Removed / Internal. <!-- Add bullets here. Group: Added / Changed / Fixed / Removed / Internal.
Mark breaking changes with **BREAKING** at the start of the bullet. --> Mark breaking changes with **BREAKING** at the start of the bullet. -->
### Added
- **Schema v10** introduces `view_ownership` to detect cross-connector
view-name collisions in the master analytics DB (issue #81 Group C).
When two connectors register the same `_meta.table_name`, the
orchestrator now refuses to silently overwrite the prior owner's view —
it logs a `view_ownership collision` ERROR identifying both sources
and the colliding name, and the second source's view is NOT created.
Previously this was last-write-wins, which depended on directory
iteration order and could change deployment-to-deployment. Operators
resolve a collision by renaming `name` in `table_registry` on one side
(registry-side aliasing — `source_table` stays unchanged, only the
view name changes). The orchestrator pre-scans every connector's
`_meta` at the start of each rebuild and releases stale ownerships
immediately (when ALL pre-scans succeed; if any fail, reconcile is
skipped to avoid silently stealing a transient-IO source's name),
so a renamed table frees its name in the SAME rebuild that introduces
the rename — no two-step waits needed. New module
`src/repositories/view_ownership.py` exposes the repository.
### Changed ### Changed
- **BREAKING (ops)**: Keboola extractor now exits with three distinct - **BREAKING (ops)**: Keboola extractor now exits with three distinct
@ -194,6 +214,14 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C
- `tests/conftest.py::seeded_app` extended with `viewer_token` and - `tests/conftest.py::seeded_app` extended with `viewer_token` and
`km_admin_token` so role-gating tests cover all four core roles. `km_admin_token` so role-gating tests cover all four core roles.
### Migrated
- **Schema bumped from v9 to v10**. Auto-migration applies on next start
(creates the `view_ownership` table; data on disk is unaffected). The
pre-migration snapshot machinery (added at v8→v9) covers v9→v10 too —
if anything goes wrong during the migration, the snapshot at
`<DATA_DIR>/state/system.duckdb.pre-migrate` lets you roll back.
## [0.11.5] — 2026-04-27 ## [0.11.5] — 2026-04-27
Follow-up release for PR #73: addresses four rounds of Devin AI review on the role-management-complete branch. No new public-API surface; the user-visible payoff is that v8→v9-migrated installations now work end-to-end (login flows, user list, admin nav, privilege revocation), and `make local-dev` startup is finally quiet. Follow-up release for PR #73: addresses four rounds of Devin AI review on the role-management-complete branch. No new public-API surface; the user-visible payoff is that v8→v9-migrated installations now work end-to-end (login flows, user list, admin nav, privilege revocation), and `make local-dev` startup is finally quiet.

View file

@ -16,7 +16,7 @@ logger = logging.getLogger(__name__)
_SAFE_IDENTIFIER = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]{0,63}$") _SAFE_IDENTIFIER = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]{0,63}$")
SCHEMA_VERSION = 9 SCHEMA_VERSION = 10
_SYSTEM_SCHEMA = """ _SYSTEM_SCHEMA = """
CREATE TABLE IF NOT EXISTS schema_version ( CREATE TABLE IF NOT EXISTS schema_version (
@ -64,6 +64,19 @@ CREATE TABLE IF NOT EXISTS sync_state (
error TEXT error TEXT
); );
-- v10: view-name collision detection across connectors. The orchestrator
-- writes views into the master analytics.duckdb under a flat namespace; two
-- connectors with the same `_meta.table_name` would otherwise silently
-- overwrite each other (last-write-wins). This table records the FIRST
-- source to register a given view name; subsequent attempts from a different
-- source are refused with a `name_collision` log line until the operator
-- renames one side. Issue #81 Group C.
CREATE TABLE IF NOT EXISTS view_ownership (
view_name VARCHAR PRIMARY KEY,
source_name VARCHAR NOT NULL,
registered_at TIMESTAMP NOT NULL DEFAULT current_timestamp
);
CREATE TABLE IF NOT EXISTS sync_history ( CREATE TABLE IF NOT EXISTS sync_history (
id VARCHAR PRIMARY KEY, id VARCHAR PRIMARY KEY,
table_id VARCHAR NOT NULL, table_id VARCHAR NOT NULL,
@ -585,6 +598,18 @@ _V8_TO_V9_MIGRATIONS = [
""", """,
] ]
# Issue #81 Group C — view-name collision detection. New table records the
# first source to register a given view name in the master analytics DB.
_V9_TO_V10_MIGRATIONS = [
"""
CREATE TABLE IF NOT EXISTS view_ownership (
view_name VARCHAR PRIMARY KEY,
source_name VARCHAR NOT NULL,
registered_at TIMESTAMP NOT NULL DEFAULT current_timestamp
)
""",
]
# Core role seed data — single source of truth. Used by both _seed_core_roles # Core role seed data — single source of truth. Used by both _seed_core_roles
# (idempotent insert) and the v8→v9 backfill. Order matters: lowest privilege # (idempotent insert) and the v8→v9 backfill. Order matters: lowest privilege
@ -798,6 +823,13 @@ def _ensure_schema(conn: duckdb.DuckDBPyConnection) -> None:
if current < 9: if current < 9:
for sql in _V8_TO_V9_MIGRATIONS: for sql in _V8_TO_V9_MIGRATIONS:
conn.execute(sql) conn.execute(sql)
_did_v9_finalize = True
else:
_did_v9_finalize = False
if current < 10:
for sql in _V9_TO_V10_MIGRATIONS:
conn.execute(sql)
if _did_v9_finalize:
# v9 finalize: seed core.* roles, backfill grants from # v9 finalize: seed core.* roles, backfill grants from
# legacy users.role, then drop the column. Order matters — # legacy users.role, then drop the column. Order matters —
# backfill needs the seed rows to exist; drop must be last. # backfill needs the seed rows to exist; drop must be last.

View file

@ -23,7 +23,7 @@ import os
import re import re
import threading import threading
from pathlib import Path from pathlib import Path
from typing import Dict, List from typing import Dict, List, Optional
import duckdb import duckdb
@ -99,12 +99,110 @@ class SyncOrchestrator:
with _rebuild_lock: with _rebuild_lock:
return self._do_rebuild_source(source_name) return self._do_rebuild_source(source_name)
def _scan_meta_pairs(self, extracts_dir: Path) -> tuple:
"""Read every connector's `_meta` and return (pairs, clean) where:
- ``pairs`` list of (source_name, table_name) tuples successfully
gathered from `_meta`.
- ``clean`` True iff every source's pre-scan succeeded. False if
any source's `_meta` couldn't be read (transient I/O, mid-write,
missing/corrupt extract.duckdb).
Used by view_ownership.reconcile to release stale claims before
the main rebuild loop tries to claim new names. The ``clean`` flag
guards against a correctness bug: if source B's pre-scan fails
and we naively reconcile against an incomplete `pairs` list, B's
prior ownership is dropped, and another source could claim B's
name in the same rebuild a silent overwrite, exactly what
Group C is meant to prevent. Callers MUST skip reconcile when
``clean`` is False; per-row claim-time collision detection still
catches actual collisions.
"""
pairs: List[tuple] = []
clean = True
for ext_dir in sorted(extracts_dir.iterdir()):
if not ext_dir.is_dir():
continue
db_file = ext_dir / "extract.duckdb"
if not db_file.exists():
continue
if not _validate_identifier(ext_dir.name, "source_name"):
continue
try:
ro_conn = duckdb.connect(str(db_file), read_only=True)
try:
rows = ro_conn.execute(
"SELECT table_name FROM _meta"
).fetchall()
for (table_name,) in rows:
if _validate_identifier(table_name, "table_name"):
pairs.append((ext_dir.name, table_name))
finally:
ro_conn.close()
except Exception as e:
logger.warning(
"scan_meta_pairs: failed to read %s (%s) — "
"skipping reconcile this rebuild to avoid releasing "
"ownerships prematurely",
ext_dir.name, e,
)
clean = False
return pairs, clean
def _do_rebuild(self) -> Dict[str, List[str]]: def _do_rebuild(self) -> Dict[str, List[str]]:
extracts_dir = _get_extracts_dir() extracts_dir = _get_extracts_dir()
if not extracts_dir.exists(): if not extracts_dir.exists():
logger.warning("Extracts directory %s does not exist", extracts_dir) logger.warning("Extracts directory %s does not exist", extracts_dir)
return {} return {}
# Issue #81 Group C — load view ownership map from system DB so we
# can detect cross-connector view-name collisions during this
# rebuild and refuse to silently overwrite a previously-claimed
# name. The map is kept in system.duckdb (analytics.duckdb is
# rebuilt fresh each time and would not survive).
from src.db import get_system_db
from src.repositories.view_ownership import ViewOwnershipRepository
sys_conn_for_views = get_system_db()
view_repo = None
try:
view_repo = ViewOwnershipRepository(sys_conn_for_views)
# Pre-scan every connector's _meta so we can run the reconcile
# pass BEFORE claims are evaluated. This makes "owner stopped
# publishing → name freed → another source can claim" work in
# the SAME rebuild rather than requiring two consecutive runs.
#
# Correctness: only reconcile when EVERY source's pre-scan
# succeeded. Otherwise a transient I/O failure on source B
# would drop B's prior ownership and let another source steal
# B's name — silent overwrite, exactly the bug Group C
# prevents. Per-row claim-time collision detection still
# catches actual collisions even without reconcile this run.
current_pairs, pre_scan_clean = self._scan_meta_pairs(extracts_dir)
if pre_scan_clean:
view_repo.reconcile(current_pairs)
else:
logger.warning(
"view_ownership: skipping reconcile this rebuild — "
"pre-scan was incomplete; renamed tables will release "
"their names on the next clean rebuild instead"
)
existing_owners = view_repo.get_all()
except Exception as e:
logger.warning(
"view_ownership pre-scan failed: %s — proceeding without "
"collision detection", e,
)
existing_owners = {}
view_repo = None
try:
sys_conn_for_views.close()
except Exception:
pass
sys_conn_for_views = None
# Track every (source, view) pair this rebuild successfully claims.
claimed_pairs: List[tuple] = []
result = {} result = {}
# Write to temp file then rename — avoids lock conflict with query endpoint # Write to temp file then rename — avoids lock conflict with query endpoint
tmp_path = self._db_path + ".tmp" tmp_path = self._db_path + ".tmp"
@ -139,14 +237,32 @@ class SyncOrchestrator:
continue continue
tables = self._attach_and_create_views( tables = self._attach_and_create_views(
conn, ext_dir.name, str(db_file) conn, ext_dir.name, str(db_file),
existing_owners=existing_owners,
claimed_pairs=claimed_pairs,
view_repo=view_repo if sys_conn_for_views else None,
) )
if tables: if tables:
result[ext_dir.name] = tables result[ext_dir.name] = tables
logger.info("Attached %s: %d tables", ext_dir.name, len(tables)) logger.info("Attached %s: %d tables", ext_dir.name, len(tables))
# No end-of-rebuild reconcile: the pre-scan reconcile above
# already released stale ownerships using a complete view of
# every source's `_meta`. Reconciling again here against
# `claimed_pairs` (which excludes refused collisions and any
# source that failed to attach) would incorrectly drop the
# legitimate prior owner of a name when its DB happens to be
# transiently unreadable. See test
# `test_pre_scan_failure_does_not_release_ownership` for the
# contract.
finally: finally:
conn.execute("CHECKPOINT") conn.execute("CHECKPOINT")
conn.close() conn.close()
if sys_conn_for_views is not None:
try:
sys_conn_for_views.close()
except Exception:
pass
# Atomic swap: replace analytics.duckdb with new version # Atomic swap: replace analytics.duckdb with new version
_atomic_swap_db(tmp_path, self._db_path) _atomic_swap_db(tmp_path, self._db_path)
@ -170,9 +286,25 @@ class SyncOrchestrator:
return result.get(source_name, []) return result.get(source_name, [])
def _attach_and_create_views( def _attach_and_create_views(
self, conn: duckdb.DuckDBPyConnection, source_name: str, db_path: str self,
conn: duckdb.DuckDBPyConnection,
source_name: str,
db_path: str,
existing_owners: Optional[Dict[str, str]] = None,
claimed_pairs: Optional[List[tuple]] = None,
view_repo=None,
) -> List[str]: ) -> List[str]:
"""ATTACH extract.duckdb, read _meta, create views in master.""" """ATTACH extract.duckdb, read _meta, create views in master.
Issue #81 Group C — when ``existing_owners`` and ``view_repo`` are
provided, the orchestrator checks for cross-connector view-name
collisions and refuses to overwrite a name owned by another source.
``claimed_pairs`` accumulates the (source, view) tuples this
rebuild successfully claims; the caller uses it for end-of-rebuild
reconcile.
"""
if existing_owners is None:
existing_owners = {}
tables = [] tables = []
try: try:
conn.execute(f"ATTACH '{db_path}' AS {source_name} (READ_ONLY)") conn.execute(f"ATTACH '{db_path}' AS {source_name} (READ_ONLY)")
@ -189,6 +321,33 @@ class SyncOrchestrator:
for table_name, rows, size_bytes, query_mode in meta_rows: for table_name, rows, size_bytes, query_mode in meta_rows:
if not _validate_identifier(table_name, "table_name"): if not _validate_identifier(table_name, "table_name"):
continue continue
# Issue #81 Group C — refuse cross-connector collisions.
# First-come-first-served: the source already in
# view_ownership keeps the name; any other source that
# tries to claim it gets logged + skipped until the
# operator renames one side. Re-claim by the same source
# is fine (idempotent rebuild).
if view_repo is not None:
if not view_repo.claim(table_name, source_name):
# Query live owner — covers two cases:
# (1) stale snapshot from rebuild start (existing_owners),
# (2) two sources both first-time-claim the same name
# in this rebuild — the loser sees the winner here.
prior_owner = (
view_repo.get_owner(table_name)
or existing_owners.get(table_name, "<unknown>")
)
logger.error(
"view_ownership collision: %s already owns view %r; "
"%s.%s will NOT be exposed. Rename `name` in the "
"table_registry on one side to resolve.",
prior_owner, table_name, source_name, table_name,
)
continue
if claimed_pairs is not None:
claimed_pairs.append((source_name, table_name))
conn.execute( conn.execute(
f"CREATE OR REPLACE VIEW \"{table_name}\" AS " f"CREATE OR REPLACE VIEW \"{table_name}\" AS "
f"SELECT * FROM {source_name}.\"{table_name}\"" f"SELECT * FROM {source_name}.\"{table_name}\""

View file

@ -0,0 +1,109 @@
"""Repository for view-name ownership across connectors.
Issue #81 Group C — when two connectors register the same view name in the
master analytics DB, the second one used to silently overwrite the first
(last-write-wins). With this repository the orchestrator records the FIRST
source to claim a name and refuses subsequent collisions until the operator
renames one side.
"""
from __future__ import annotations
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple
import duckdb
class ViewOwnershipRepository:
def __init__(self, conn: duckdb.DuckDBPyConnection):
self.conn = conn
def get_owner(self, view_name: str) -> Optional[str]:
"""Return the source_name that owns ``view_name``, or None."""
row = self.conn.execute(
"SELECT source_name FROM view_ownership WHERE view_name = ?",
[view_name],
).fetchone()
return row[0] if row else None
def get_all(self) -> Dict[str, str]:
"""Return {view_name: source_name} for every registered ownership."""
rows = self.conn.execute(
"SELECT view_name, source_name FROM view_ownership"
).fetchall()
return {r[0]: r[1] for r in rows}
def claim(self, view_name: str, source_name: str) -> bool:
"""Register ``source_name`` as the owner of ``view_name``.
Returns True if the claim succeeds (either fresh registration or
re-claim by the same source). Returns False if a different source
already owns the name the caller MUST then refuse to create the
view and surface the collision to operators.
"""
existing = self.get_owner(view_name)
if existing is None:
self.conn.execute(
"INSERT INTO view_ownership (view_name, source_name, registered_at) "
"VALUES (?, ?, ?)",
[view_name, source_name, datetime.now(timezone.utc)],
)
return True
return existing == source_name
def release(self, view_name: str, source_name: str) -> bool:
"""Drop ownership of ``view_name`` if held by ``source_name``.
Used during rebuild cleanup when a connector no longer publishes a
previously-claimed name (e.g. operator renamed the table on the
upstream side). Returns True if a row was removed.
"""
before = self.conn.execute(
"SELECT COUNT(*) FROM view_ownership "
"WHERE view_name = ? AND source_name = ?",
[view_name, source_name],
).fetchone()[0]
if before == 0:
return False
self.conn.execute(
"DELETE FROM view_ownership "
"WHERE view_name = ? AND source_name = ?",
[view_name, source_name],
)
return True
def reconcile(
self, current_pairs: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
"""Drop ownerships for (source_name, view_name) pairs no longer
present in ``current_pairs``. Returns the list of dropped pairs.
Called at the end of `SyncOrchestrator.rebuild()` so a renamed or
removed table immediately releases its name; the next rebuild can
let a different source claim it without operator intervention.
"""
live = set(current_pairs)
all_rows = self.conn.execute(
"SELECT source_name, view_name FROM view_ownership"
).fetchall()
dropped = [
(src, view) for src, view in all_rows
if (src, view) not in live
]
for src, view in dropped:
self.conn.execute(
"DELETE FROM view_ownership "
"WHERE source_name = ? AND view_name = ?",
[src, view],
)
return dropped
def list_for_source(self, source_name: str) -> List[str]:
"""Return all view names owned by ``source_name``."""
rows = self.conn.execute(
"SELECT view_name FROM view_ownership "
"WHERE source_name = ? ORDER BY view_name",
[source_name],
).fetchall()
return [r[0] for r in rows]

View file

@ -77,10 +77,12 @@ def _v8_state(db_path) -> duckdb.DuckDBPyConnection:
class TestFreshInstall: class TestFreshInstall:
"""Fresh DB → v9 directly via _SYSTEM_SCHEMA + INSERT version + seed.""" """Fresh DB → v9 directly via _SYSTEM_SCHEMA + INSERT version + seed."""
def test_schema_version_is_9(self, fresh_data_dir): def test_schema_version_is_current(self, fresh_data_dir):
from src.db import get_system_db, get_schema_version from src.db import get_system_db, get_schema_version, SCHEMA_VERSION
conn = get_system_db() conn = get_system_db()
assert get_schema_version(conn) == 9 # Was hard-coded to 9 before #81 Group C bumped SCHEMA_VERSION to 10.
# Compare against the constant so future bumps don't churn this test.
assert get_schema_version(conn) == SCHEMA_VERSION
def test_core_roles_seeded_with_implies_hierarchy(self, fresh_data_dir): def test_core_roles_seeded_with_implies_hierarchy(self, fresh_data_dir):
from src.db import get_system_db from src.db import get_system_db
@ -132,7 +134,8 @@ class TestV8ToV9Migration:
# Trigger migration. # Trigger migration.
from src.db import get_system_db, get_schema_version from src.db import get_system_db, get_schema_version
conn = get_system_db() conn = get_system_db()
assert get_schema_version(conn) == 9 from src.db import SCHEMA_VERSION
assert get_schema_version(conn) == SCHEMA_VERSION
rows = conn.execute( rows = conn.execute(
"""SELECT u.email, r.key, g.source """SELECT u.email, r.key, g.source

View file

@ -0,0 +1,280 @@
"""Issue #81 Group C — view-name collision detection across connectors.
Two connectors with overlapping `_meta.table_name` used to silently
overwrite each other in the master analytics DB. This file exercises:
- Schema v10's `view_ownership` table exists after migration.
- `ViewOwnershipRepository.claim` is first-come-first-served.
- `ViewOwnershipRepository.reconcile` releases stale ownerships.
- The orchestrator refuses to overwrite a view owned by a different
source, logs an ERROR, but keeps publishing views for the winner.
"""
from __future__ import annotations
import os
import duckdb
import pytest
from src.repositories.view_ownership import ViewOwnershipRepository
# --------------------------------------------------------------------------
# Repository unit tests
# --------------------------------------------------------------------------
@pytest.fixture
def fresh_system_db(tmp_path, monkeypatch):
"""Set DATA_DIR to a fresh temp dir and trigger a v10 schema build."""
monkeypatch.setenv("DATA_DIR", str(tmp_path))
from src.db import get_system_db
conn = get_system_db()
yield conn
conn.close()
class TestViewOwnershipRepository:
def test_claim_first_succeeds(self, fresh_system_db):
repo = ViewOwnershipRepository(fresh_system_db)
assert repo.claim("orders", "keboola") is True
assert repo.get_owner("orders") == "keboola"
def test_claim_same_source_idempotent(self, fresh_system_db):
repo = ViewOwnershipRepository(fresh_system_db)
repo.claim("orders", "keboola")
# Re-claiming by the same source is fine — rebuild is idempotent.
assert repo.claim("orders", "keboola") is True
def test_claim_different_source_refused(self, fresh_system_db):
repo = ViewOwnershipRepository(fresh_system_db)
repo.claim("orders", "keboola")
# Second source asks for the same name — refused.
assert repo.claim("orders", "bigquery") is False
# Original owner unchanged.
assert repo.get_owner("orders") == "keboola"
def test_release(self, fresh_system_db):
repo = ViewOwnershipRepository(fresh_system_db)
repo.claim("orders", "keboola")
assert repo.release("orders", "keboola") is True
assert repo.get_owner("orders") is None
def test_release_wrong_source_no_op(self, fresh_system_db):
repo = ViewOwnershipRepository(fresh_system_db)
repo.claim("orders", "keboola")
# Release should not delete a row owned by a different source.
assert repo.release("orders", "bigquery") is False
assert repo.get_owner("orders") == "keboola"
def test_reconcile_drops_stale_pairs(self, fresh_system_db):
repo = ViewOwnershipRepository(fresh_system_db)
repo.claim("orders", "keboola")
repo.claim("users", "keboola")
repo.claim("traffic", "bigquery")
# Next rebuild claims orders + traffic only — users should be released.
live = [("keboola", "orders"), ("bigquery", "traffic")]
dropped = repo.reconcile(live)
assert dropped == [("keboola", "users")]
assert repo.get_owner("users") is None
assert repo.get_owner("orders") == "keboola"
assert repo.get_owner("traffic") == "bigquery"
def test_list_for_source(self, fresh_system_db):
repo = ViewOwnershipRepository(fresh_system_db)
repo.claim("orders", "keboola")
repo.claim("users", "keboola")
repo.claim("traffic", "bigquery")
assert repo.list_for_source("keboola") == ["orders", "users"]
assert repo.list_for_source("bigquery") == ["traffic"]
# --------------------------------------------------------------------------
# Orchestrator behaviour test
# --------------------------------------------------------------------------
def _make_extract_db(path: str, table_names: list[str]) -> None:
"""Create a minimal extract.duckdb with `_meta` rows + a view per table.
Returns nothing the file at `path` is the connector's output.
"""
os.makedirs(os.path.dirname(path), exist_ok=True)
# Recreate from scratch — round-2 tests rebuild the same connector dir
# with different tables, so wipe any prior file.
if os.path.exists(path):
os.unlink(path)
wal = path + ".wal"
if os.path.exists(wal):
os.unlink(wal)
conn = duckdb.connect(path)
conn.execute(
"CREATE TABLE _meta ("
"table_name VARCHAR, description VARCHAR, rows BIGINT, "
"size_bytes BIGINT, extracted_at TIMESTAMP, query_mode VARCHAR)"
)
for t in table_names:
# Each table is a tiny in-memory CTAS — query_mode='local' so the
# orchestrator picks it up the same way it does parquet-backed views.
conn.execute(f'CREATE TABLE "{t}" AS SELECT 1 AS x')
conn.execute(
"INSERT INTO _meta VALUES (?, ?, 1, 0, current_timestamp, 'local')",
[t, ""],
)
conn.close()
class TestOrchestratorCollisionRefusal:
def test_first_source_wins_second_source_skipped(
self, tmp_path, monkeypatch
):
"""Two sources both publish a view named `orders`. The first one
the orchestrator visits (alphabetical order) keeps the name; the
other is logged as a collision and its view is NOT created."""
monkeypatch.setenv("DATA_DIR", str(tmp_path))
# Sources visited alphabetically: alpha first, beta second.
_make_extract_db(
str(tmp_path / "extracts" / "alpha" / "extract.duckdb"),
["orders", "alpha_only"],
)
_make_extract_db(
str(tmp_path / "extracts" / "beta" / "extract.duckdb"),
["orders", "beta_only"],
)
from src.orchestrator import SyncOrchestrator
orch = SyncOrchestrator()
result = orch.rebuild()
# alpha got both its views (it ran first); beta only got its non-colliding one.
assert "orders" in result["alpha"]
assert "alpha_only" in result["alpha"]
assert "orders" not in result["beta"], (
"beta should NOT have published a colliding `orders` view"
)
assert "beta_only" in result["beta"]
# Ownership records persisted in system DB.
from src.db import get_system_db
sys_conn = get_system_db()
try:
repo = ViewOwnershipRepository(sys_conn)
assert repo.get_owner("orders") == "alpha"
assert repo.get_owner("alpha_only") == "alpha"
assert repo.get_owner("beta_only") == "beta"
finally:
sys_conn.close()
def test_partial_collision_does_not_block_other_tables(
self, tmp_path, monkeypatch
):
"""Source A publishes [orders, alpha_only_a, alpha_only_b]; source
B publishes [orders, beta_only]. The collision on `orders` (A wins,
first by alphabet) must NOT prevent B from publishing `beta_only`,
nor prevent A from publishing its other two."""
monkeypatch.setenv("DATA_DIR", str(tmp_path))
_make_extract_db(
str(tmp_path / "extracts" / "alpha" / "extract.duckdb"),
["orders", "alpha_only_a", "alpha_only_b"],
)
_make_extract_db(
str(tmp_path / "extracts" / "beta" / "extract.duckdb"),
["orders", "beta_only"],
)
from src.orchestrator import SyncOrchestrator
result = SyncOrchestrator().rebuild()
assert set(result["alpha"]) == {"orders", "alpha_only_a", "alpha_only_b"}
assert set(result["beta"]) == {"beta_only"}
def test_pre_scan_failure_does_not_release_ownership(
self, tmp_path, monkeypatch
):
"""When `_scan_meta_pairs` cannot read source B (corrupt
extract.duckdb, transient I/O), the orchestrator must SKIP
reconcile this rebuild otherwise B's name would be released and
another source could silently steal it. Issue #81 Group C
review-2."""
monkeypatch.setenv("DATA_DIR", str(tmp_path))
# Round 1: alpha owns orders.
_make_extract_db(
str(tmp_path / "extracts" / "alpha" / "extract.duckdb"),
["orders"],
)
from src.orchestrator import SyncOrchestrator
SyncOrchestrator().rebuild()
# Round 2: alpha's extract.duckdb is unreadable (simulate corrupt
# file by writing garbage). beta now publishes `orders`. The
# reconcile should be SKIPPED because the scan was incomplete;
# alpha keeps ownership; beta is refused.
alpha_db = tmp_path / "extracts" / "alpha" / "extract.duckdb"
alpha_wal = tmp_path / "extracts" / "alpha" / "extract.duckdb.wal"
alpha_db.write_bytes(b"NOT A REAL DUCKDB FILE")
if alpha_wal.exists():
alpha_wal.unlink()
_make_extract_db(
str(tmp_path / "extracts" / "beta" / "extract.duckdb"),
["orders"],
)
result = SyncOrchestrator().rebuild()
# alpha did not contribute a view (file unreadable); beta did NOT
# get to claim `orders` because reconcile was skipped.
assert "orders" not in result.get("beta", []), (
"beta should have been refused `orders` — reconcile must skip "
"when pre-scan is incomplete"
)
# Ownership unchanged.
from src.db import get_system_db
sys_conn = get_system_db()
try:
repo = ViewOwnershipRepository(sys_conn)
assert repo.get_owner("orders") == "alpha"
finally:
sys_conn.close()
def test_owner_releases_name_after_rename(self, tmp_path, monkeypatch):
"""If the previous owner of a name no longer publishes it, the
next rebuild releases the name a different source can then
claim it without operator intervention."""
monkeypatch.setenv("DATA_DIR", str(tmp_path))
# Round 1: alpha owns orders.
_make_extract_db(
str(tmp_path / "extracts" / "alpha" / "extract.duckdb"),
["orders"],
)
from src.orchestrator import SyncOrchestrator
SyncOrchestrator().rebuild()
# Round 2: alpha renames its table to alpha_orders, beta wants `orders`.
_make_extract_db(
str(tmp_path / "extracts" / "alpha" / "extract.duckdb"),
["alpha_orders"],
)
_make_extract_db(
str(tmp_path / "extracts" / "beta" / "extract.duckdb"),
["orders"],
)
result = SyncOrchestrator().rebuild()
assert "alpha_orders" in result["alpha"]
assert "orders" in result["beta"], (
"beta should now own `orders` after alpha released it"
)
from src.db import get_system_db
sys_conn = get_system_db()
try:
repo = ViewOwnershipRepository(sys_conn)
assert repo.get_owner("orders") == "beta"
assert repo.get_owner("alpha_orders") == "alpha"
finally:
sys_conn.close()