fix(schema-v24): raise on deferred migration so retry path actually runs (Devin Review on db.py:1757)

Pre-fix: when v24 migration found rows to migrate but
data_source.bigquery.project was empty, it logged a warning per row
and returned normally. Schema_version then bumped to 24 unconditionally
→ next start's 'if current < 24:' gate skipped _v23_to_v24_finalize
forever, leaving rows in DuckDB-flavor SQL that the new
_wrap_admin_sql_for_jobs_api wrapping path rejects.

Devin escalated this from advisory ("idempotent retry") to critical
on rescan after my reply. The reply was wrong — the LIKE filter inside
the function gives idempotency IF the function is called again, but
the schema-version gate prevents that call from happening.

Fix (Devin's recommended Approach 1): raise RuntimeError BEFORE the
schema-version bump when rows need migration but project_id is empty.
The schema_version stays at 23, so on next start the 'if current < 24:'
gate fires and the migration runs again — this time with project_id
configured.

Side effect: a BQ-using deployment that hasn't set the project blocks
startup until they do. That's the right call for a config error that
would otherwise silently break all materialized tables. The error
message points at the right knob (data_source.bigquery.project +
restart).

No-rows-no-block invariant preserved: the early 'if not rows: return'
at the top of _v23_to_v24_finalize means non-BQ deployments are
unaffected.

Tests:
- test_v24_raises_when_project_not_configured_and_rows_need_migration:
  asserts raise + schema_version stays at 23 (the load-bearing
  invariant for retry-on-next-start to work)
- test_v24_skips_clean_when_no_rows_match_even_without_project:
  asserts non-BQ deployments don't block startup
- Existing 3 tests still pass
This commit is contained in:
ZdenekSrotyr 2026-05-04 23:11:34 +02:00
parent 36012e0833
commit 0612c1e1a1
3 changed files with 93 additions and 13 deletions

View file

@ -39,6 +39,7 @@ End-to-end clean-analyst-bootstrap rewrite. The web `/setup` page now produces a
- `agnes snapshot create` (formerly `da fetch`) no longer materializes an empty `user/duckdb/analytics.duckdb` when run before any `agnes pull`. Friendly hint redirects to `agnes pull`. - `agnes snapshot create` (formerly `da fetch`) no longer materializes an empty `user/duckdb/analytics.duckdb` when run before any `agnes pull`. Friendly hint redirects to `agnes pull`.
- Workspace `agnes status` reads from the canonical `server/parquet/` and `user/duckdb/analytics.duckdb` paths (was reading legacy `data/parquet/`, `data/metadata/last_sync.json`). - Workspace `agnes status` reads from the canonical `server/parquet/` and `user/duckdb/analytics.duckdb` paths (was reading legacy `data/parquet/`, `data/metadata/last_sync.json`).
- `agnes init` and `agnes pull` errors now use the `cli/error_render.py` typed-error renderer (added in 0.32.0), so analyst-facing error UX matches the structured shape `agnes query --remote` already produces. - `agnes init` and `agnes pull` errors now use the `cli/error_render.py` typed-error renderer (added in 0.32.0), so analyst-facing error UX matches the structured shape `agnes query --remote` already produces.
- **Schema v24 migration retry path is no longer dead** (Devin Review on `db.py:1757`, escalated from advisory to critical on rescan). Pre-fix: when `_v23_to_v24_finalize` had materialized BQ rows to migrate but `data_source.bigquery.project` was not configured, it logged a warning per row and returned normally. The schema_version then bumped to 24 unconditionally, the `if current < 24:` gate in `_ensure_schema` skipped the function on every subsequent startup, and the affected rows kept their DuckDB-flavor `bq."ds"."tbl"` source_query forever — which the new `_wrap_admin_sql_for_jobs_api` wrapping path rejects as unparseable BQ SQL with no automatic recovery. The "set the project and restart to retry" log hint pointed at a code path that no longer ran. Fix: the migration now raises `RuntimeError` BEFORE the schema_version bump when it has rows to migrate but no project_id, blocking startup with a clear actionable error pointing at `data_source.bigquery.project`. Operator configures the project, restarts, and the migration completes (schema_version is still at 23, so the `if current < 24:` gate fires). Side effect: a BQ-using deployment that hasn't set the project blocks startup until they do — that's the right call for a config error that would otherwise silently break materialized tables. Two regression tests in `test_schema_v24_source_query_rewrite.py`: `test_v24_raises_when_project_not_configured_and_rows_need_migration` (raise + version-stays-at-23) and `test_v24_skips_clean_when_no_rows_match_even_without_project` (no-rows-no-block invariant).
- **`agnes admin register-table` UX**: three real-world feedback items addressed. - **`agnes admin register-table` UX**: three real-world feedback items addressed.
- **`--query-mode materialized` now requires `--bucket`** (client-side validation; exits with a clear error before hitting the server). The previous help docstring claimed `--bucket` was *ignored* for materialized rows, but the value is actually load-bearing — `agnes schema <name>` builds the BQ identifier as `bq.<bucket>.<source_table>`, so an empty bucket registered the row but broke subsequent schema/describe with HTTP 400 "unsafe BQ identifier in registry". Docstring rewritten to reflect reality. - **`--query-mode materialized` now requires `--bucket`** (client-side validation; exits with a clear error before hitting the server). The previous help docstring claimed `--bucket` was *ignored* for materialized rows, but the value is actually load-bearing — `agnes schema <name>` builds the BQ identifier as `bq.<bucket>.<source_table>`, so an empty bucket registered the row but broke subsequent schema/describe with HTTP 400 "unsafe BQ identifier in registry". Docstring rewritten to reflect reality.
- **Post-success hints**: after a successful registration the CLI now points operators at the two follow-ups they routinely miss: (a) `agnes setup first-sync` to materialize the parquet (registration alone doesn't trigger a build; `agnes pull` reports "Updated 0 tables" until the scheduler tick), and (b) `agnes admin grant create <group> table <name>` to make the row visible in `agnes catalog` for non-admin users (catalog is RBAC-filtered). - **Post-success hints**: after a successful registration the CLI now points operators at the two follow-ups they routinely miss: (a) `agnes setup first-sync` to materialize the parquet (registration alone doesn't trigger a build; `agnes pull` reports "Updated 0 tables" until the scheduler tick), and (b) `agnes admin grant create <group> table <name>` to make the row visible in `agnes catalog` for non-admin users (catalog is RBAC-filtered).

View file

@ -1729,19 +1729,37 @@ def _v23_to_v24_finalize(conn: duckdb.DuckDBPyConnection) -> None:
if not rows: if not rows:
return # Nothing to migrate; skip the transaction. return # Nothing to migrate; skip the transaction.
# If we have rows to migrate AND project_id isn't configured, we cannot
# rewrite their source_query. Raise BEFORE the schema_version bump so
# the migration re-runs on the NEXT startup (after the operator
# configures the project). Pre-fix the function logged a warning per
# row and returned normally — the schema_version then bumped to 24
# unconditionally, the `if current < 24:` gate skipped this function
# forever after, and rows stayed in DuckDB-flavor SQL. The new
# `_wrap_admin_sql_for_jobs_api` wrapping path then rejected those
# rows at materialize time as unparseable BQ SQL with no automatic
# recovery (Devin Review on db.py:1757). Side effect: a BQ-using
# deployment that hasn't set the project blocks startup until they
# do — that's the right call for a config error that would otherwise
# silently break materialized tables.
if not project_id:
raise RuntimeError(
f"v24 migration cannot complete: {len(rows)} materialized "
f"BigQuery row(s) need their source_query rewritten from "
f"DuckDB-flavor `bq.\"ds\".\"tbl\"` to BQ-native "
f"`<project>.ds.tbl`, but `data_source.bigquery.project` is "
f"not configured. Set it via /admin/server-config (or "
f"`instance.yaml: data_source.bigquery.project`) and restart "
f"the app to retry the migration. The schema version is NOT "
f"bumped to 24 until this completes; pre-migration DB "
f"snapshot is at `{{DATA_DIR}}/state/system.duckdb.pre-migrate`."
)
conn.execute("BEGIN TRANSACTION") conn.execute("BEGIN TRANSACTION")
try: try:
for row_id, sq in rows: for row_id, sq in rows:
if sq is None: if sq is None:
continue continue
if not project_id:
logger.warning(
"v24 migration: skipping rewrite of source_query for row %r"
"data_source.bigquery.project is not configured. Set it via "
"/admin/server-config and restart the app to retry the "
"migration.", row_id,
)
continue
new_sq = pattern.sub(_replace_for_v24(project_id), sq) new_sq = pattern.sub(_replace_for_v24(project_id), sq)
if new_sq != sq: if new_sq != sq:
conn.execute( conn.execute(

View file

@ -95,7 +95,24 @@ def test_v24_idempotent_when_already_bq_native(monkeypatch):
conn.close() conn.close()
def test_v24_logs_warning_when_project_not_configured(monkeypatch, caplog): def test_v24_raises_when_project_not_configured_and_rows_need_migration(monkeypatch):
"""Regression for Devin Review on db.py:1757.
Pre-fix: when v24 migration found rows to migrate but
`data_source.bigquery.project` was empty, it logged a warning per
row and returned normally. The schema_version then bumped to 24
unconditionally next start's `if current < 24:` gate skipped
`_v23_to_v24_finalize` forever, leaving rows in DuckDB-flavor SQL
that the new `_wrap_admin_sql_for_jobs_api` rejects as unparseable.
The "set the project and restart to retry" log hint pointed at a
code path that no longer ran.
Post-fix: the migration raises a RuntimeError BEFORE the
schema_version bump. The exception propagates out of `_ensure_schema`,
blocking app startup with a clear actionable error. Operator
configures the project, restarts, and the migration completes
because schema_version is still 23.
"""
with tempfile.TemporaryDirectory() as tmp: with tempfile.TemporaryDirectory() as tmp:
monkeypatch.setenv("DATA_DIR", tmp) monkeypatch.setenv("DATA_DIR", tmp)
monkeypatch.setattr( monkeypatch.setattr(
@ -112,20 +129,64 @@ def test_v24_logs_warning_when_project_not_configured(monkeypatch, caplog):
["t1", "t1", "bigquery", "materialized", "ds", "tbl", ["t1", "t1", "bigquery", "materialized", "ds", "tbl",
'SELECT * FROM bq."ds"."tbl"'], 'SELECT * FROM bq."ds"."tbl"'],
) )
with caplog.at_level("WARNING"):
# Migration must REFUSE to bump schema_version when it can't
# complete the row-rewrite. The error message must point the
# operator at the right knob (`data_source.bigquery.project`).
with pytest.raises(RuntimeError) as exc:
_ensure_schema(conn) _ensure_schema(conn)
msg = str(exc.value)
assert "data_source.bigquery.project" in msg
assert "restart" in msg.lower()
# Row stays in DuckDB-flavor (we couldn't rewrite it).
row = conn.execute( row = conn.execute(
"SELECT source_query FROM table_registry WHERE id='t1'" "SELECT source_query FROM table_registry WHERE id='t1'"
).fetchone() ).fetchone()
assert row[0] == 'SELECT * FROM bq."ds"."tbl"' assert row[0] == 'SELECT * FROM bq."ds"."tbl"'
assert any(
"v24" in r.message.lower() or "project" in r.message.lower() # Critical: schema_version stays at 23 so the migration retries
for r in caplog.records # on the next startup once the operator configures the project.
from src.db import get_schema_version
assert get_schema_version(conn) == 23, (
"schema_version must NOT bump to 24 when v24 migration is "
"deferred — otherwise the documented retry path is dead"
) )
finally: finally:
conn.close() conn.close()
def test_v24_skips_clean_when_no_rows_match_even_without_project(monkeypatch):
"""Counterpart to the raise-on-deferred test: a deployment with NO
materialized BQ rows (typical Keboola-only or remote-only install)
must NOT block startup just because `data_source.bigquery.project`
isn't configured. The `if not rows: return` early-out at the top of
`_v23_to_v24_finalize` handles this the raise only fires when
there's actual work to defer."""
import pytest as _pytest # noqa: F401 (referenced if test extends later)
with tempfile.TemporaryDirectory() as tmp:
monkeypatch.setenv("DATA_DIR", tmp)
monkeypatch.setattr(
"app.instance_config.get_value",
lambda *args, **kw: kw.get("default", ""),
)
Path(tmp, "state").mkdir(parents=True, exist_ok=True)
db_path = Path(tmp, "state", "system.duckdb")
conn = duckdb.connect(str(db_path))
try:
_seed_v23(conn)
# No materialized BQ rows seeded.
# Must NOT raise — there's nothing to migrate.
_ensure_schema(conn)
from src.db import get_schema_version, SCHEMA_VERSION
assert get_schema_version(conn) == SCHEMA_VERSION
finally:
conn.close()
def test_v24_keboola_materialized_row_not_rewritten(monkeypatch): def test_v24_keboola_materialized_row_not_rewritten(monkeypatch):
"""Materialized rows with source_type != 'bigquery' must not be touched """Materialized rows with source_type != 'bigquery' must not be touched
by v24. Keboola materialized has no notion of bq."ds"."tbl" syntax; by v24. Keboola materialized has no notion of bq."ds"."tbl" syntax;