agnes-the-ai-analyst/tests/test_db_schema_version.py
minasarustamyan dc5e0e0d11
Marketplace UX overhaul: rich plugin/skill/agent detail + filename rename (#251)
* Rename agnes-metadata.json to marketplace-metadata.json

Curated marketplace enrichment file (.claude-plugin/agnes-metadata.json)
becomes marketplace-metadata.json. Clean cut, no fallback — curators of
upstream marketplace repos must rename the file on their side.

Python API renames mirror the file rename: read_agnes_metadata →
read_marketplace_metadata, AGNES_METADATA_REL → MARKETPLACE_METADATA_REL,
AGNES_METADATA_MAX_BYTES → MARKETPLACE_METADATA_MAX_BYTES. Synth Claude
Code marketplace strip rule (.agnes/** + the metadata file) follows the
new filename.

* Marketplace detail polish: window cover + 715:310 aspect + helper alignment

- Plugin & item (skill/agent) detail hero: 160x160 square cover replaced
  with a macOS-style window frame (3 traffic-light dots + titlebar label
  showing the entity name). Body is constrained to 715:310 so curator-
  uploaded covers no longer crop to a square. Window is 380px wide; meta
  column and absolutely-positioned top-right install/remove actions stay
  put. Fallback when no cover_photo_url (translucent gradient + PL/SK/AG
  initials) is unchanged, just inside the window body.

- Inner skill/agent cards in the plugin detail's Internal structure
  section adopt the same 715:310 aspect (was fixed 78px tall). No window
  chrome on inner cards — just the matching proportions so covers read
  consistently across hero, grid tiles, and listing cards.

- Curated nested item helper text ("This skill is part of ... — add the
  bundle to your stack to use it") now stacks UNDER the "Open parent
  plugin" button instead of being a side-by-side flex sibling in the
  actions-row. Added align-self: flex-end so the 260px helper box
  anchors at the right edge of the 300px actions column, matching the
  button's right edge.

* Marketplace My tab: surface the same category + type filters as Flea

- Frontend: mp-cat-row and mp-type-row now show on tab=my (previously
  hidden — type was flea-only, category was flea/curated-only). Curated
  browse stays plugin-only and continues to hide the type pills.
  fetchOne() sends the `type` param for tab=my too, so the items
  endpoint's existing my-branch filter actually receives it.

- Backend categories endpoint, tab=my branch: when the type filter is
  set to skill/agent, skip counting curated subscriptions. Curated
  plugins are always type='plugin', so they wouldn't survive the items
  endpoint's type filter; including them in the category counts made
  the pill numbers overstate what users could actually see in the
  grid. type=None or type='plugin' keeps the previous behaviour.

- CHANGELOG entry under [Unreleased].

* Marketplace plugin detail: render rich content from marketplace-metadata.json

Adds five optional plugin-level fields to marketplace-metadata.json and
renders them on the curated plugin detail page + listing card:

* display_name — friendly h1 / listing-card name / mac-window titlebar
  label (overrides the technical plugin id)
* tagline — punchy 1-line value prop for the hero subtitle and the
  listing card description (replacing the verbose marketplace.json
  description on cards)
* description — multi-paragraph markdown body, server-side rendered
  through markdown-it-py and sanitized through nh3 with a
  description-scoped allowlist (no iframes / no raw HTML / no
  javascript: links). Powers the "What it does" panel.
* use_cases[] — {title, description, prompt} entries that render as a
  3-column "When to use it" card grid; each card shows the literal
  prompt as a code chip so users can copy-paste into Claude Code.
* sample_interaction — {user, assistant} dialog rendered in a Claude
  Code-style dark Catppuccin Mocha transcript panel: monospace user
  row with a green ">" prompt indicator + sans-serif assistant body
  with markdown formatting (peach bold, yellow italic, pink inline
  code, mantle-dark fenced code blocks).

All five fields are optional; UI sections only render when populated,
so plugins without enrichment look identical to before. Fields are
read on-demand from the working tree (cached by mtime per marketplace
slug) so curator edits land at the next request without waiting for
a sync cycle — same pattern as the existing inner-skill/agent
enrichment path. No DB schema bump.
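
A minimal sketch of the mtime-keyed read described above, assuming the
cache key is the pair (marketplace slug, file mtime in nanoseconds). The
function name and the module-level dict are hypothetical, not the
project's actual helpers, and the sketch leaves the cache unbounded.

```python
import json
import os
from pathlib import Path

# Hypothetical cache: (marketplace slug, mtime_ns) -> parsed metadata.
_metadata_cache = {}


def read_metadata_cached_sketch(slug, path):
    """Return parsed metadata, re-parsing only when the file's mtime changes."""
    try:
        mtime_ns = os.stat(path).st_mtime_ns
    except FileNotFoundError:
        # No enrichment file: behave as if no rich fields were set.
        return {}
    key = (slug, mtime_ns)
    if key not in _metadata_cache:
        _metadata_cache[key] = json.loads(Path(path).read_text(encoding="utf-8"))
    return _metadata_cache[key]
```

A curator edit changes the mtime, so the next request misses the cache
and picks up the new content without waiting for a sync cycle.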

Skill / agent rich-content rendering is deferred to a later phase
(needs a source-of-truth decision: extend plugin.yml? LLM-generate
from SKILL.md / agent.md?). The schema accepts the same fields at
skill/agent level today for forward compatibility but the UI ignores
them for now.

Also: stripped a stale `background-color: var(--bg)` from the global
`code` rule in style.css (was making inline code visually disappear
on the page background).

* Skill / agent detail: render rich content from marketplace-metadata.json

Brings the skill/agent detail pages to parity with the plugin detail
page. Same rich-content schema (display_name, tagline, description as
markdown, use_cases[], sample_interaction) plus two per-item additions:

* invocation — curator-provided literal command string. When set,
  overrides the computed "<manifest_name>:<inner_name>" chip and
  cleanly supports both "/" skill prefix and "@" agent prefix (the
  hardcoded "/" in the chip markup is hidden when the curator provides
  the invocation, so /grpn-eng:query <q> and @grpn-eng:cto-architect
  both render correctly).
* when_to_use — markdown disambiguation block ("Use this for X. For
  similar Y, see /other-skill") rendered into a new "When to use this"
  panel below the Example section.

Skill / agent category is now per-item overridable in
marketplace-metadata.json. When absent, the API keeps the parent
plugin's category as the badge so existing items don't lose their
category until curators opt in to per-item categorization.

The new "Example" Q&A panel uses the same Claude Code-style dark
Catppuccin Mocha transcript treatment as the plugin detail —
monospace user row with a green ">" prompt indicator + sans-serif
assistant body with markdown formatting.

All new fields are optional and read on-demand from the working tree.
Skills / agents whose marketplace-metadata.json doesn't carry rich
content render exactly the same way they did before (frontmatter
description + computed slash command + cover from existing v32
enrichment). No DB schema bump.

* Fix TypeError in skill / agent detail when curator sets per-item category

`curated_skill_detail` and `curated_agent_detail` were passing both
`**parent` (from `_curated_inner_parent_fields`, which returns the
parent plugin's category as a fallback) and `**enrichment` (from
`_curated_inner_enrichment`, which returns the per-item category
override when the curator set one) into `InnerDetailResponse(...)`.

Python function-call kwargs unpacking with overlapping keys raises
`TypeError: got multiple values for keyword argument 'category'`
— it doesn't merge like a literal dict does. The bug only surfaced
when the marketplace-metadata.json carried a `category` field at
skill / agent level (curator opting into per-item categorization);
items without that override hit the endpoint cleanly because only
parent provided the key.

Fix: build `merged = {**parent, **enrichment}` first (literal-dict
syntax DOES merge, with the right-hand-side winning) and unpack the
merged dict. Curator override still wins via the merge order, and
the same pattern is future-proof for any other field that lands in
both layers later.

Plus a regression test in test_marketplace_metadata.py asserting
that the inner-resolver carries `category` for downstream merging.
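
The difference between the two unpacking forms can be reproduced in
isolation. The sketch below uses a hypothetical `build_response`
stand-in for `InnerDetailResponse(...)` and made-up field values; only
the `category` collision mirrors the bug described above.

```python
def build_response(**fields):
    """Hypothetical stand-in for a response constructor taking keyword args."""
    return fields


parent = {"category": "analytics", "plugin": "demo"}        # parent fallback
enrichment = {"category": "reporting", "tagline": "hello"}  # curator override

# Unpacking both dicts into one call fails on the shared key.
try:
    build_response(**parent, **enrichment)
    error = None
except TypeError as exc:
    error = str(exc)

# Merging first lets the right-hand dict win, then unpacks a single dict.
merged = {**parent, **enrichment}
response = build_response(**merged)
```

Here `response["category"]` is the curator's value, and `error` carries
the "multiple values for keyword argument" message.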

* Marketplace detail: tolerate partial curator JSON

Server constructed UseCase / SampleInteraction via raw dict indexing
(uc["title"], sample["assistant"]), so a curator commit missing any
required Pydantic field crashed the whole plugin / skill / agent detail
endpoint with a 500. Route both constructions through _safe_use_case /
_safe_sample_interaction helpers — partial input silently drops the
malformed card / section instead of breaking the page.

Regression test in test_marketplace_api.py covers the three shapes:
use_case missing a key, use_case with an empty string, and
sample_interaction with only user (no assistant). Sibling rich fields
still render.
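
One way such tolerant helpers could look, as a sketch only: the real
models are Pydantic classes, while this version uses plain dataclasses
to stay self-contained, and the function names here are illustrative
rather than the project's `_safe_use_case` / `_safe_sample_interaction`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class UseCase:
    title: str
    description: str
    prompt: str


@dataclass
class SampleInteraction:
    user: str
    assistant: str


def _required_strings(raw: Any, keys: Tuple[str, ...]) -> Optional[Dict[str, str]]:
    """Return the required string fields, or None if any is missing or empty."""
    if not isinstance(raw, dict):
        return None
    values = {}
    for key in keys:
        value = raw.get(key)
        if not isinstance(value, str) or not value.strip():
            return None
        values[key] = value
    return values


def safe_use_cases(raw_items: Any) -> List[UseCase]:
    """Keep well-formed use cases and drop malformed ones instead of raising."""
    if not isinstance(raw_items, list):
        return []
    parsed = (_required_strings(r, ("title", "description", "prompt")) for r in raw_items)
    return [UseCase(**fields) for fields in parsed if fields is not None]


def safe_sample_interaction(raw: Any) -> Optional[SampleInteraction]:
    """Return the dialog only when both sides are present."""
    fields = _required_strings(raw, ("user", "assistant"))
    return SampleInteraction(**fields) if fields is not None else None
```

A malformed card is skipped and its siblings still render, which is the
behaviour the regression test checks.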

* Address PR-251 review (must-fixes + S2/S3 polish) + release-cut 0.50.0

Five must-fixes from the review pass (3 from @cvrysanek's two-stage
review, 2 from my independent pass), plus the 0.50.0 release-cut as the
last commit on this PR, per the CLAUDE.md "Release-cut belongs to the
PR" rule added in v0.49.1.

Must-fixes
----------

1. Cache eviction: bounded LRU instead of per-marketplace predicate.
   The previous predicate (`k[0] == marketplace_id and k[1] != mtime_ns`)
   only swept stale entries for the CURRENT marketplace; with N>100
   distinct marketplaces each holding one mtime key, the cap silently
   failed and memory grew linearly. Replaced with OrderedDict-backed
   bounded LRU at cap=256, drop oldest insert on overflow.
   Cache stress test pinned in test_marketplace_metadata.py.

2. Render CPU cap: per-field byte cap on description / when_to_use /
   sample_interaction.assistant via MARKETPLACE_METADATA_FIELD_MAX_BYTES
   (= 64 KiB). Without this, a 1 MiB curator markdown body × QPS =
   curator-controlled CPU burn through pure-Python markdown-it-py.
   Truncation respects UTF-8 boundaries and logs a warning so the
   curator sees the cap fire on the next sync. Test for cap +
   UTF-8-boundary preservation.

3. Inner-detail bypassed the metadata cache. _curated_inner_enrichment,
   _curated_inner_cover, and curated_detail all called
   read_marketplace_metadata directly, defeating the mtime cache the
   plugin listing already shared. Routed all three through
   _read_metadata_cached so skill/agent detail requests re-parse the
   file at most once per marketplace per mtime instead of once per
   request.

4. Truthy-vs-presence trap in plugin/inner enrichment merge. API-layer
   writers used `if resolved.get(k):` which silently dropped any
   future falsy-but-valid resolver field (bool featured=False, int
   priority=0, str category=''). Switched to presence check
   (`if k in resolved`) so the resolver is the authority on field
   presence; `{**parent, **enrichment}` merge respects whatever the
   resolver decided to ship.

5. Vendor-agnostic OSS cleanup. Removed operator-specific token
   references (/grpn-eng:, @grpn-eng:, .foundryai/) from
   src/marketplace_metadata.py docstring, app/web/templates/
   marketplace_item_detail.html JS comment, docs/curated-marketplace-
   format.md, and tests/test_marketplace_metadata.py fixtures. Replaced
   with generic /my-plugin:tool / @my-agent:role / .example/ placeholders.
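
For must-fix 1, a bounded cache on top of OrderedDict can be sketched as
follows. The class name is hypothetical, and refreshing recency on read
is an assumption of this sketch (the commit only states that the oldest
insert is dropped on overflow).

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical bounded cache: evicts the oldest entry once over the cap."""

    def __init__(self, cap=256):
        self._cap = cap
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        # Assumption: a read marks the entry as recently used.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        # Evict from the oldest end until the size is back under the cap.
        while len(self._entries) > self._cap:
            self._entries.popitem(last=False)

    def __len__(self):
        return len(self._entries)
```

Because eviction depends only on total size, the bound holds no matter
how many distinct marketplaces contribute keys.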

CHANGELOG
---------
- New "### Fixed (PR #251 follow-ups)" section documenting all 4
  code-side must-fixes
- New "### Internal" section noting the vendor cleanup + new tests
- BREAKING bullet for the file rename now covers operator-side
  migration: running instances see plugin enrichment disappear from
  the UI until upstream curator renames + nightly sync overwrites the
  working tree; POST /api/marketplaces/{id}/sync forces refresh sooner
- Stripped /grpn-eng: leaks from the existing skill/agent rich-content
  bullet

Tests
-----
128 targeted tests pass (test_marketplace_metadata, test_marketplace_api,
test_marketplace, test_markdown_render, test_marketplace_synth_strip,
test_marketplace_filter). New tests added:
- 6 XSS regression tests on render_safe (javascript:/data:/vbscript:
  schemes via autolink, reference link, and mixed-case + positive
  http/https/mailto + noopener noreferrer rel)
- 3 byte-cap tests (truncation + UTF-8 boundary + under-cap pass-through)
- 1 cache eviction stress test (>256 marketplaces -> bounded at cap)
- 1 truthy-vs-presence resolver-contract test
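
The byte-cap behaviour covered by the truncation tests can be sketched
like this. The function name is hypothetical and the 64 KiB constant is
taken from the must-fix description; the project's helper also logs a
warning, which this sketch omits.

```python
FIELD_MAX_BYTES = 64 * 1024  # 64 KiB, per the must-fix description


def truncate_utf8(text, max_bytes=FIELD_MAX_BYTES):
    """Cap text at max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text  # under the cap: pass through unchanged
    # Cutting raw bytes may leave a partial multi-byte sequence at the
    # end; errors="ignore" drops that incomplete tail.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

With a 2-byte cap, "héllo" truncates to "h": the two-byte "é" would
straddle the boundary, so it is dropped whole.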

Release-cut
-----------
- pyproject.toml 0.49.1 -> 0.50.0 (minor; BREAKING file rename per
  pre-1.0 CHANGELOG note: "breaking changes called out under Changed
  or Removed with the BREAKING marker")
- CHANGELOG [Unreleased] -> [0.50.0] - 2026-05-12, new empty
  [Unreleased] on top.

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-12 08:38:39 +00:00


"""v20 adds source_query column to table_registry.
Backs query_mode='materialized' for BigQuery: admin registers a SQL body
that the scheduler runs through the DuckDB BQ extension and writes as a
parquet to /data/extracts/bigquery/data/<id>.parquet.
The v19 step (#150) drops dataset_permissions, access_requests tables and
users.role, table_registry.is_public columns; v20 then ALTERs the post-v19
table_registry to add the source_query column.
"""
import duckdb

from src.db import SCHEMA_VERSION, _ensure_schema, get_schema_version


def test_schema_version_is_39():
    # v27 → v28: explicit-install (Model B) for curated marketplace plugins.
    #   user_plugin_optouts row presence flips meaning from "excluded" to
    #   "subscribed"; migration wipes existing rows so the inverted reading
    #   starts from a clean baseline. Also adds marketplace_plugins.created_at
    #   (per-plugin "newest first" sort on /marketplace), backfilled from
    #   parent marketplace_registry.registered_at.
    # v28 → v29: /home page rollout - instance_templates singleton
    #   consolidation (welcome_template + claude_md_template merged) + new
    #   users.onboarded column. See tests/test_v29_home_migration.py for
    #   the exhaustive coverage of that step.
    # v29 → v30: news_template - single versioned table for the /home
    #   news perex + /news permalink page. See
    #   tests/test_news_template_repository.py.
    # v30 → v31: session-pipeline framework - session_processor_state
    #   replaces session_extraction_state with composite PK.
    # v31 → v32 (PR #233): flea-market upload guardrails - adds
    #   store_entities.visibility_status + creates store_submissions.
    # v32 → v33 (PR #233): forensic columns on store_submissions -
    #   file_size, bundle_sha256, bundle_purged_at. Underpins the
    #   persist-blocked-bundle behavior so admins can Rescan /
    #   Override / Download; 30-day TTL purge clears bytes while
    #   keeping the row + sha intact. See docs/STORE_GUARDRAILS.md.
    # v33 → v34: drop store_submissions.retry_count - counter mixed LLM
    #   error count + admin rescan count, redundant with audit_log.
    # v34 → v35 (PR #233): store_entities gains 'archived' visibility
    #   state + archived_at + archived_by audit columns. Owner
    #   soft-delete writes 'archived'; existing user_store_installs
    #   keep serving the bundle through marketplace.zip / .git.
    #   Hard delete (DELETE ?hard=true) remains admin-only.
    # v35 → v36 (PR #233 follow-up): re-apply NOT NULL + DEFAULT 'pending'
    #   on store_entities.visibility_status. Lost in the v34→v35
    #   column rebuild. Without this, an INSERT that omits the
    #   column lands NULL → repo reads None → undefined behavior
    #   in the visibility gates. Value-list invariant remains
    #   enforced application-side (DuckDB ADD CHECK on existing
    #   column not supported).
    # v36 → v37: curated marketplace enrichment from
    #   `.claude-plugin/marketplace-metadata.json` plus mandatory curator
    #   identity on marketplace_registry. Adds curator_name +
    #   curator_email to marketplace_registry, and
    #   cover_photo_url + video_url + doc_links to
    #   marketplace_plugins.
    # v37 → v38: flea-market edit feature with version
    #   history. Adds store_entities.version_no INTEGER and
    #   version_history JSON. Each new bundle upload via
    #   PUT bumps version_no and appends to version_history;
    #   metadata-only edits don't bump. Existing rows backfill
    #   to version_no=1 with a single-entry history seeded
    #   from the row's current `version` (hash). Bundle bytes
    #   for each version live on disk under
    #   ${DATA_DIR}/store/<id>/versions/v<N>/plugin/.
    # v38 → v39 (this PR): system plugin tier - admin-toggleable
    #   mandatory plugin set. Adds marketplace_plugins.is_system
    #   BOOLEAN DEFAULT FALSE. The flag drives a fanout that
    #   materializes resource_grants + user_plugin_optouts rows
    #   for every existing user_groups + users row, so the
    #   resolver's existing (rbac ∩ subscriptions) computation
    #   naturally pulls system plugins into every user's stack.
    #   UI then locks the corresponding controls so users can't
    #   unsubscribe and admins can't revoke per-group grants.
    assert SCHEMA_VERSION == 39


def test_v37_marketplace_curator_columns(tmp_path):
    """Fresh install reaches the current schema with the v37 marketplace
    columns present."""
    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    _ensure_schema(conn)
    registry_cols = {
        r[0] for r in conn.execute(
            "SELECT column_name FROM information_schema.columns "
            "WHERE table_name = 'marketplace_registry'"
        ).fetchall()
    }
    assert {"curator_name", "curator_email"} <= registry_cols, (
        f"curator columns missing from marketplace_registry: {registry_cols}"
    )
    plugin_cols = {
        r[0] for r in conn.execute(
            "SELECT column_name FROM information_schema.columns "
            "WHERE table_name = 'marketplace_plugins'"
        ).fetchall()
    }
    assert {"cover_photo_url", "video_url", "doc_links"} <= plugin_cols, (
        f"enrichment columns missing from marketplace_plugins: {plugin_cols}"
    )
    conn.close()


def test_v36_db_migrates_to_current(tmp_path):
    """Pre-existing v36 DB upgrades cleanly through v37 (curator
    enrichment) and v38 (flea edit version history) without losing
    existing rows."""
    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    # Stand up a minimal v36-shape registry + plugin row, plus the
    # schema_version row that pins us to 36.
    conn.execute(
        "CREATE TABLE schema_version (version INTEGER, "
        "applied_at TIMESTAMP DEFAULT current_timestamp)"
    )
    conn.execute("INSERT INTO schema_version (version) VALUES (36)")
    conn.execute("""CREATE TABLE marketplace_registry (
        id VARCHAR PRIMARY KEY, name VARCHAR NOT NULL,
        url VARCHAR NOT NULL, branch VARCHAR, token_env VARCHAR,
        description TEXT, registered_by VARCHAR,
        registered_at TIMESTAMP DEFAULT current_timestamp,
        last_synced_at TIMESTAMP, last_commit_sha VARCHAR, last_error TEXT
    )""")
    conn.execute("""CREATE TABLE marketplace_plugins (
        marketplace_id VARCHAR NOT NULL, name VARCHAR NOT NULL,
        description TEXT, version VARCHAR, author_name VARCHAR,
        homepage VARCHAR, category VARCHAR, source_type VARCHAR,
        source_spec JSON, raw JSON,
        created_at TIMESTAMP DEFAULT current_timestamp,
        updated_at TIMESTAMP DEFAULT current_timestamp,
        PRIMARY KEY (marketplace_id, name)
    )""")
    conn.execute(
        "INSERT INTO marketplace_registry (id, name, url) "
        "VALUES ('legacy', 'Legacy', 'https://example.com/repo.git')"
    )
    conn.execute(
        "INSERT INTO marketplace_plugins (marketplace_id, name) "
        "VALUES ('legacy', 'foo')"
    )
    _ensure_schema(conn)
    assert get_schema_version(conn) == SCHEMA_VERSION
    # v37 enrichment columns exist; existing rows preserved with NULL.
    row = conn.execute(
        "SELECT curator_name, curator_email FROM marketplace_registry "
        "WHERE id = 'legacy'"
    ).fetchone()
    assert row == (None, None)
    row = conn.execute(
        "SELECT cover_photo_url, video_url, doc_links FROM marketplace_plugins "
        "WHERE marketplace_id = 'legacy' AND name = 'foo'"
    ).fetchone()
    assert row == (None, None, None)
    conn.close()


def test_v39_adds_marketplace_plugins_is_system(tmp_path):
    """Fresh install reaches the current schema with the v39 is_system
    column on marketplace_plugins. Default value is FALSE (not NULL) so
    the fanout helpers don't need to special-case absent rows."""
    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    _ensure_schema(conn)
    cols = {
        r[0] for r in conn.execute(
            "SELECT column_name FROM information_schema.columns "
            "WHERE table_name = 'marketplace_plugins'"
        ).fetchall()
    }
    assert "is_system" in cols, f"is_system missing from {cols}"
    # New rows default to FALSE, required so a freshly-synced plugin
    # doesn't accidentally land in everyone's stack.
    conn.execute(
        "INSERT INTO marketplace_registry (id, name, url) "
        "VALUES ('m', 'M', 'https://example.com/repo.git')"
    )
    conn.execute(
        "INSERT INTO marketplace_plugins (marketplace_id, name) "
        "VALUES ('m', 'p')"
    )
    row = conn.execute(
        "SELECT is_system FROM marketplace_plugins "
        "WHERE marketplace_id = 'm' AND name = 'p'"
    ).fetchone()
    assert row[0] is False, f"new plugin defaulted to {row[0]!r}, expected False"
    conn.close()


def test_v38_db_migrates_to_v39(tmp_path):
    """Pre-existing v38 DB upgrades to v39 cleanly: adds is_system
    column, existing rows backfill to FALSE, schema_version updates."""
    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    # Stand up the v38 minimal shape: schema_version row + the two
    # marketplace tables + a pre-existing plugin row that must survive
    # the migration with is_system = FALSE.
    conn.execute(
        "CREATE TABLE schema_version (version INTEGER, "
        "applied_at TIMESTAMP DEFAULT current_timestamp)"
    )
    conn.execute("INSERT INTO schema_version (version) VALUES (38)")
    conn.execute("""CREATE TABLE marketplace_registry (
        id VARCHAR PRIMARY KEY, name VARCHAR NOT NULL,
        url VARCHAR NOT NULL, branch VARCHAR, token_env VARCHAR,
        description TEXT, registered_by VARCHAR,
        registered_at TIMESTAMP DEFAULT current_timestamp,
        last_synced_at TIMESTAMP, last_commit_sha VARCHAR, last_error TEXT,
        curator_name VARCHAR, curator_email VARCHAR
    )""")
    conn.execute("""CREATE TABLE marketplace_plugins (
        marketplace_id VARCHAR NOT NULL, name VARCHAR NOT NULL,
        description TEXT, version VARCHAR, author_name VARCHAR,
        homepage VARCHAR, category VARCHAR, source_type VARCHAR,
        source_spec JSON, raw JSON,
        created_at TIMESTAMP DEFAULT current_timestamp,
        updated_at TIMESTAMP DEFAULT current_timestamp,
        cover_photo_url VARCHAR, video_url VARCHAR, doc_links JSON,
        PRIMARY KEY (marketplace_id, name)
    )""")
    conn.execute(
        "INSERT INTO marketplace_registry (id, name, url) "
        "VALUES ('legacy', 'Legacy', 'https://example.com/repo.git')"
    )
    conn.execute(
        "INSERT INTO marketplace_plugins (marketplace_id, name) "
        "VALUES ('legacy', 'foo')"
    )
    _ensure_schema(conn)
    assert get_schema_version(conn) == SCHEMA_VERSION
    cols = {
        r[0] for r in conn.execute(
            "SELECT column_name FROM information_schema.columns "
            "WHERE table_name = 'marketplace_plugins'"
        ).fetchall()
    }
    assert "is_system" in cols
    # Existing pre-v39 row backfilled to FALSE: no plugin lands in
    # everyone's stack just because we ran the migration.
    row = conn.execute(
        "SELECT is_system FROM marketplace_plugins "
        "WHERE marketplace_id = 'legacy' AND name = 'foo'"
    ).fetchone()
    assert row[0] is False, f"pre-existing row backfilled to {row[0]!r}"
    conn.close()


def test_v20_adds_source_query(tmp_path):
    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    _ensure_schema(conn)
    cols = {
        r[0] for r in conn.execute(
            "SELECT column_name FROM information_schema.columns "
            "WHERE table_name = 'table_registry'"
        ).fetchall()
    }
    assert "source_query" in cols, f"source_query missing from {cols}"
    assert get_schema_version(conn) == SCHEMA_VERSION
    conn.close()


def test_claude_md_template_seeded_in_instance_templates(tmp_path):
    """v23 introduced claude_md_template as a singleton table; v28 consolidates
    it into instance_templates keyed 'claude_md'. Post-v28 the legacy table is
    dropped; the canonical lookup is `instance_templates WHERE key='claude_md'`.
    See tests/test_v28_migration.py for the migration path coverage. This test
    just verifies the seeded row is present on a fresh install.
    """
    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    _ensure_schema(conn)
    tables = {
        r[0] for r in conn.execute(
            "SELECT table_name FROM information_schema.tables "
            "WHERE table_schema = 'main'"
        ).fetchall()
    }
    assert "instance_templates" in tables
    assert "claude_md_template" not in tables, (
        "claude_md_template should be consolidated away post-v28"
    )
    row = conn.execute(
        "SELECT key, content FROM instance_templates WHERE key = 'claude_md'"
    ).fetchone()
    assert row is not None
    assert row[0] == "claude_md"
    assert row[1] is None  # default = no override
    conn.close()


def test_v19_db_migrates_to_v20(tmp_path):
    """Pre-existing v19 DB (post-RBAC-drop) without source_query upgrades
    cleanly without losing data."""
    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    # Simulate a v19 DB at minimal but realistic shape: schema_version row +
    # a table_registry row in the post-v19 column shape (no is_public column,
    # since v19 finalize dropped it via the table-rebuild idiom).
    conn.execute(
        "CREATE TABLE schema_version (version INTEGER, "
        "applied_at TIMESTAMP DEFAULT current_timestamp)"
    )
    conn.execute("INSERT INTO schema_version (version) VALUES (19)")
    conn.execute("""CREATE TABLE table_registry (
        id VARCHAR PRIMARY KEY, name VARCHAR NOT NULL,
        source_type VARCHAR, bucket VARCHAR, source_table VARCHAR,
        sync_strategy VARCHAR DEFAULT 'full_refresh',
        query_mode VARCHAR DEFAULT 'local',
        sync_schedule VARCHAR, profile_after_sync BOOLEAN DEFAULT true,
        primary_key VARCHAR, folder VARCHAR, description TEXT,
        registered_by VARCHAR,
        registered_at TIMESTAMP DEFAULT current_timestamp
    )""")
    conn.execute("INSERT INTO table_registry (id, name) VALUES ('foo', 'foo')")
    _ensure_schema(conn)
    # Bumped forward from 19 to the current SCHEMA_VERSION.
    assert get_schema_version(conn) == SCHEMA_VERSION
    cols = {
        r[0] for r in conn.execute(
            "SELECT column_name FROM information_schema.columns "
            "WHERE table_name = 'table_registry'"
        ).fetchall()
    }
    assert "source_query" in cols
    # Existing row preserved, new column NULL
    row = conn.execute(
        "SELECT id, source_query FROM table_registry WHERE id='foo'"
    ).fetchone()
    assert row == ("foo", None)
    conn.close()


def _make_v34_store_entities(conn):
    """Build a minimal v34-shape store_entities table for v34→v35 path tests.
    Only includes the columns the v34→v35 migration touches; the rest of
    the schema isn't needed because the function operates only on
    store_entities's column set.
    """
    conn.execute("""
        CREATE TABLE store_entities (
            id VARCHAR PRIMARY KEY,
            visibility_status VARCHAR DEFAULT 'pending'
        )
    """)
    conn.execute(
        "INSERT INTO store_entities (id, visibility_status) VALUES "
        "('a', 'approved'), ('b', 'pending'), ('c', 'hidden')"
    )


def test_v34_to_v35_clean_path_rebuilds_visibility_column(tmp_path):
    """Standard v34 → v35 path: ``visibility_status`` is present, no temp
    column. Migration rebuilds the column without the legacy CHECK so
    'archived' becomes a valid value, preserves all row values, and adds
    the audit columns.
    """
    from src.db import _v34_to_v35_migrate
    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    _make_v34_store_entities(conn)
    _v34_to_v35_migrate(conn)
    cols = {
        r[0] for r in conn.execute(
            "SELECT column_name FROM information_schema.columns "
            "WHERE table_name = 'store_entities'"
        ).fetchall()
    }
    assert "visibility_status" in cols
    assert "_vis_v35" not in cols, "temp column must be cleaned up"
    assert "archived_at" in cols
    assert "archived_by" in cols
    rows = dict(conn.execute(
        "SELECT id, visibility_status FROM store_entities ORDER BY id"
    ).fetchall())
    assert rows == {"a": "approved", "b": "pending", "c": "hidden"}, (
        f"row values must survive the rebuild: {rows}"
    )
    conn.close()


def test_v34_to_v35_recovers_from_partial_rebuild_missing_visibility(tmp_path):
    """Partial-rebuild recovery: a previous migration attempt completed
    steps 3-5 (added _vis_v35, copied values, dropped visibility_status)
    but failed before step 6 (RENAME). Subsequent restarts hit
    DROP visibility_status (no IF EXISTS guard) and looped on the same
    error, leaving the DB stranded with schema_version stuck pre-v35.
    The new code detects this state (_vis_v35 present, visibility_status
    absent) and finishes the rebuild with the RENAME alone instead of
    re-running the full destructive sequence.
    """
    from src.db import _v34_to_v35_migrate
    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    # Hand-build the broken state: store_entities with _vis_v35 instead of
    # visibility_status, populated with the canonical values.
    conn.execute("""
        CREATE TABLE store_entities (
            id VARCHAR PRIMARY KEY,
            _vis_v35 VARCHAR
        )
    """)
    conn.execute(
        "INSERT INTO store_entities (id, _vis_v35) VALUES "
        "('a', 'approved'), ('b', 'pending'), ('c', 'hidden')"
    )
    _v34_to_v35_migrate(conn)
    cols = {
        r[0] for r in conn.execute(
            "SELECT column_name FROM information_schema.columns "
            "WHERE table_name = 'store_entities'"
        ).fetchall()
    }
    assert "visibility_status" in cols
    assert "_vis_v35" not in cols
    assert "archived_at" in cols
    assert "archived_by" in cols
    rows = dict(conn.execute(
        "SELECT id, visibility_status FROM store_entities ORDER BY id"
    ).fetchall())
    assert rows == {"a": "approved", "b": "pending", "c": "hidden"}, (
        f"row values must come back via RENAME, not be lost: {rows}"
    )
    conn.close()


def test_v34_to_v35_recovers_from_partial_rebuild_both_columns(tmp_path):
    """Edge state: a prior attempt aborted before the DROP, leaving both
    visibility_status (canonical) and _vis_v35 (temp) on the table.
    The recovery path drops _vis_v35 and keeps visibility_status; the
    rest of the schema expects that name.
    """
    from src.db import _v34_to_v35_migrate
    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    conn.execute("""
        CREATE TABLE store_entities (
            id VARCHAR PRIMARY KEY,
            visibility_status VARCHAR,
            _vis_v35 VARCHAR
        )
    """)
    conn.execute(
        "INSERT INTO store_entities (id, visibility_status, _vis_v35) VALUES "
        "('a', 'approved', 'approved')"
    )
    _v34_to_v35_migrate(conn)
    cols = {
        r[0] for r in conn.execute(
            "SELECT column_name FROM information_schema.columns "
            "WHERE table_name = 'store_entities'"
        ).fetchall()
    }
    assert "visibility_status" in cols
    assert "_vis_v35" not in cols, "temp column must be dropped"
    row = conn.execute(
        "SELECT id, visibility_status FROM store_entities WHERE id = 'a'"
    ).fetchone()
    assert row == ("a", "approved")
    conn.close()


def test_v32_db_with_partial_v35_recovers_through_full_ladder(tmp_path):
    """End-to-end: a DB stranded at schema_version=32 with the half-applied
    v34→v35 state (visibility_status dropped, _vis_v35 left behind) must
    upgrade cleanly through the full ladder when ``_ensure_schema`` runs.
    This is the production scenario observed in operator instances after
    the original list-form ``_V34_TO_V35_MIGRATIONS`` failed mid-run on
    a fresh restart.
    """
    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    # Stand up the broken state. We only need enough of the schema for the
    # migration ladder to run; ``_ensure_schema`` will create the rest
    # via ``_SYSTEM_SCHEMA``'s IF NOT EXISTS guards.
    conn.execute(
        "CREATE TABLE schema_version (version INTEGER, "
        "applied_at TIMESTAMP DEFAULT current_timestamp)"
    )
    conn.execute("INSERT INTO schema_version (version) VALUES (32)")
    conn.execute("""
        CREATE TABLE store_entities (
            id VARCHAR PRIMARY KEY,
            owner_user_id VARCHAR,
            owner_username VARCHAR,
            type VARCHAR,
            name VARCHAR,
            archived_at TIMESTAMP,
            archived_by VARCHAR,
            _vis_v35 VARCHAR
        )
    """)
    conn.execute(
        "INSERT INTO store_entities (id, type, name, _vis_v35) "
        "VALUES ('a', 'skill', 'alpha', 'approved')"
    )
    _ensure_schema(conn)
    assert get_schema_version(conn) == SCHEMA_VERSION
    cols = {
        r[0] for r in conn.execute(
            "SELECT column_name FROM information_schema.columns "
            "WHERE table_name = 'store_entities'"
        ).fetchall()
    }
    assert "visibility_status" in cols
    assert "_vis_v35" not in cols
    # Existing row preserved, value carried over from _vis_v35.
    row = conn.execute(
        "SELECT id, visibility_status FROM store_entities WHERE id = 'a'"
    ).fetchone()
    assert row == ("a", "approved")
    conn.close()


def test_v35_to_v36_reapplies_visibility_constraints(tmp_path):
    """v34→v35 dropped NOT NULL + DEFAULT when rebuilding the column to
    drop the legacy CHECK; v35→v36 re-applies them. Verifies that on a
    freshly migrated DB, an INSERT omitting visibility_status either
    inherits the default 'pending' or fails; it never lands NULL.
    """
    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    _ensure_schema(conn)
    assert get_schema_version(conn) == SCHEMA_VERSION
    cols = conn.execute(
        "SELECT column_name, is_nullable, column_default "
        "FROM information_schema.columns "
        "WHERE table_name = 'store_entities' "
        "  AND column_name = 'visibility_status'"
    ).fetchall()
    assert cols, "visibility_status column missing from store_entities"
    name, is_nullable, default_expr = cols[0]
    assert is_nullable == "NO", (
        f"visibility_status must be NOT NULL after v36; got is_nullable={is_nullable!r}"
    )
    # DuckDB renders the default as a quoted literal; match either form.
    assert default_expr is not None, "visibility_status DEFAULT must be set"
    assert "pending" in str(default_expr).lower(), (
        f"visibility_status DEFAULT must be 'pending'; got {default_expr!r}"
    )
    conn.close()