agnes-the-ai-analyst/tests/test_db_schema_version.py
minasarustamyan c6c72b9c00
feat(flea): marketplace refactor — data model, attribution, UI unification (#342)
* feat(flea): phase-1 — title, tagline, synthetic_name columns + upload UX

Schema v49 adds three user-facing metadata columns to store_entities:

- title (NOT NULL) — humanized display name shown on marketplace
  surfaces in later phases. Acronym-aware humanizer in
  src/store_naming.py (27 entries: MCP, API, OAuth, S3, …) shared
  with the frontend via Jinja-injected dict so JS pre-fill and
  Python backfill produce identical output.
- tagline (NULL, ≤200 chars) — optional short description for card
  listings. Long-form `description` stays.
- synthetic_name (NOT NULL) — deterministic `<name>-by-<owner_username>`
  stored as a column for indexing and as the single source of truth
  for attribution lookups in later phases. Today's bundle bake still
  uses suffixed_name() at the same call sites.

Migration (_v48_to_v49_migrate, Python function — humanize has no
SQL equivalent) backfills existing rows: title from
humanize_name(strip_archive_suffix(name)), synthetic from the concat
formula; tagline stays NULL. Idempotent (ADD COLUMN IF NOT EXISTS +
SET NOT NULL no-op on re-run).
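
A minimal sketch of the two derivations the backfill relies on, assuming a small illustrative subset of the acronym table (the real one in src/store_naming.py is described as having 27 entries). The function bodies are an approximation for illustration, not the repository's implementation, and `strip_archive_suffix` is left out because its suffix format is not described here.

```python
import re

# Hypothetical subset of the acronym dict; keys are lowercase tokens.
ACRONYMS = {"mcp": "MCP", "api": "API", "oauth": "OAuth", "s3": "S3"}


def humanize_name(name: str) -> str:
    """Turn a kebab/snake-case slug into a display title (sketch)."""
    words = [w for w in re.split(r"[-_\s]+", name) if w]
    # Known acronyms keep their canonical casing; other words are capitalized.
    return " ".join(ACRONYMS.get(w.lower(), w.capitalize()) for w in words)


def synthetic_name(name: str, owner_username: str) -> str:
    """Deterministic <name>-by-<owner_username> identifier."""
    return f"{name}-by-{owner_username}"


print(humanize_name("mcp-api-helper"))
print(synthetic_name("xlsx", "c-marustamyan"))
```

Sharing one acronym table between Python and the frontend is what keeps the JS pre-fill and the backfill from drifting apart.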

Upload form (store_upload.html step 2) reorders fields: Title
(pre-filled from server-side humanize, JS keeps it in sync until
the user edits manually) → Name + dark synthetic preview on one
row (matches marketplace_item_detail.html dark code styling, no
copy button — preview only) → Short description with character
counter → Description (unchanged). Edit form (store_edit.html)
mirrors the layout with pre-filled values from the entity row.

API:

- POST /api/store/entities/preview returns `title` (humanized
  fallback) for upload form pre-fill.
- POST + PUT /api/store/entities accept `title` and `tagline` form
  fields with 100/200-char validation; PUT recomputes
  synthetic_name when `name` changes (caller responsibility per
  repo contract).
- StoreEntityResponse exposes all three new fields.
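
The length checks on `title` and `tagline` could look roughly like the sketch below. The helper name, the error type, and the choice to return `None` for a blank title (so the caller can fall back to the humanized default) are assumptions for illustration; only the 100/200-character limits come from the text above.

```python
from typing import Optional, Tuple

TITLE_MAX = 100
TAGLINE_MAX = 200


def validate_metadata(
    title: Optional[str], tagline: Optional[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Normalize and length-check the two form fields (hypothetical helper)."""
    title = (title or "").strip() or None      # blank -> caller uses fallback
    tagline = (tagline or "").strip() or None  # blank -> stored as NULL
    if title is not None and len(title) > TITLE_MAX:
        raise ValueError(f"title exceeds {TITLE_MAX} characters")
    if tagline is not None and len(tagline) > TAGLINE_MAX:
        raise ValueError(f"tagline exceeds {TAGLINE_MAX} characters")
    return title, tagline
```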

Repository:

- create() takes title + tagline + synthetic_name as optional
  kwargs with derived defaults (humanize_name(name) / concat) so
  existing test fixtures don't need to thread them.
- update() supports partial updates on all three; tagline empty
  string clears via NULL sentinel.
- archive() recomputes synthetic_name on rename to the archived
  slug so the column stays consistent with name.

Tests:

- New test_schema_v48_to_v49_migration.py: fresh install,
  populated-row backfill (incl. archived row strip), idempotence,
  NOT NULL constraint verification.
- test_store_naming.py: 14 humanize parametrize cases + acronym
  dict invariants.
- test_store_api.py::TestStoreV49Metadata: preview humanize, POST
  with explicit + fallback title, 100/200-char rejects, PUT
  partial update + synthetic recompute on rename.
- Schema version assertion bumps (48 → 49) in test_db_schema_version,
  test_home_stats, test_schema_v42_migration, test_schema_v46_migration.

Phase 1 only — surface rendering on cards / detail pages and
Claude Code bundle propagation come in later phases.

* feat(flea): phase-2 — wire title/tagline/owner through marketplace cards + detail pages

Phase 1 (7f4cfcbb) populated the three new columns on store_entities;
phase 2 surfaces them across the web presentation layer so the
kebab-case slug + bare username no longer leak into user-facing copy.

API:

- `_flea_to_item` now takes `conn` (both callsites updated) and sets
  `display_name=entity.title`, `tagline=entity.tagline`, `owner=
  _resolve_owner_display(conn, owner_user_id, owner_username)` —
  matches the chain the curated path already uses (users.name →
  users.email → fallback). The card JS chain `it.display_name ||
  it.name` then renders the friendly form; `name` stays at the
  suffixed slug as the technical identifier JS uses for fallbacks.
- `flea_detail` adds `display_name` + `tagline` to PluginDetailResponse
  so the standalone skill/agent + plugin detail heroes pick them up
  through the existing `d.display_name` / `d.tagline` chains.
- `_flea_inner_parent_fields` swaps `parent_display_name` from
  `strip_archive_suffix(name)` to `entity.title or strip_archive_suffix(
  name)`. Drives parent-plugin label in four surfaces at once:
  breadcrumb 3rd segment, hero "part of <plugin>" meta-row,
  helper "This skill is part of <plugin>" panel, and the Details
  sidebar's "Parent plugin" row.
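
The owner fallback chain (users.name → users.email → fallback) can be sketched as a pure function. This is a simplification: the real `_resolve_owner_display` takes `conn` and `owner_user_id` and looks the user row up itself, whereas the hypothetical helper below assumes the name and email have already been fetched.

```python
from typing import Optional


def resolve_owner_display(
    user_name: Optional[str],
    user_email: Optional[str],
    owner_username: str,
) -> str:
    """Return the first non-blank value in name -> email -> username order."""
    for candidate in (user_name, user_email, owner_username):
        if candidate and candidate.strip():
            return candidate.strip()
    return ""


print(resolve_owner_display(None, "alice@example.com", "alice"))
```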

Templates — `marketplace_item_detail.html`:

- Pre-render: browser title, hero h1, and hero-window-label read
  `(entity.title if entity else None) or inner_name or item_name or
  plugin_name` so the SSR shell shows the friendly title before the
  JS fetch lands (no flash of kebab-case).
- Breadcrumb last segment for flea standalone drops the `d.manifest_name
  || heroTitle` fallback in favour of just `heroTitle` — manifest_name
  is the suffixed slug and users explicitly didn't want it in the path.
- Hero meta-row for flea standalone is now hidden. The prior "by
  <author> · N installed · <size>" line duplicated install count
  (hero telemetry chip below), owner + bundle size (Details sidebar).

Templates — `marketplace_plugin_detail.html`:

- Same SSR pre-render swap (title, h1, window-label, crumb-name).
- Hero tagline element starts hidden; JS shows it only when
  `d.tagline` is truthy. Pre-fix it fell back to `d.description`
  (long-form text), which read awkwardly under the h1 and pulled the
  hero too tall. Description still renders in the "What it does"
  panel below the hero.
- Initial "Loading…" placeholder removed so entities without a
  tagline don't flash that text mid-fetch.

Tests:

- New `TestFleaPhase2Presentation` class in test_marketplace_api.py
  (6 cases): card title + tagline + full-name owner, owner fallback
  chain when users.name is NULL, flea_detail exposes title + tagline,
  tagline null when omitted, inner skill parent_display_name uses
  entity.title (explicit + humanize-fallback variants).
- Updated `TestListItems.test_flea_lists_uploads` to assert both
  `display_name == "Alpha"` (humanized) and `name ==
  "alpha-by-alice"` (suffixed slug compat).
- Updated `TestWebPages.test_marketplace_flea_detail_page_renders`
  to look for the humanized title ("Page Skill") in the SSR shell
  instead of the kebab-case `page-skill`.

* feat(flea): phase-3 — read synthetic_name from DB, suffixed_name() only on write

Phase 1 added the column + backfill, repo write paths keep it in sync.
Phase 3 routes every READ callsite through `store_entities.synthetic_name`
directly instead of recomputing `<name>-by-<owner_username>` on the fly,
and switches the collision query off the inline string concat. The
`suffixed_name()` primitive now lives exclusively in write flows.

Read callsites updated (all read `entity["synthetic_name"]` directly,
no fallback — the column is NOT NULL and a missing value would be a
real bug worth surfacing as KeyError):

- app/api/marketplace.py:_flea_to_item — card MarketplaceItem.name.
- app/api/marketplace.py:flea_detail — PluginDetailResponse.manifest_name.
- app/api/store.py:_entity_to_response — StoreEntityResponse.invocation_name.
- app/api/store.py PUT bundle re-bake — `suffixed` passed to
  `_bake_plugin_tree`; entity is loaded pre-rename, so its
  synthetic_name is the OLD value `_bake_plugin_tree` expects.
- app/api/store.py PUT rename — `old_suffix` for `_rename_baked_tree`.
- app/api/my_stack.py — StoreInstallEntry.invocation_name.
- src/marketplace_filter.py — manifest_name in served plugin entry.

`suffixed_name` imports removed from marketplace.py, my_stack.py, and
marketplace_filter.py (no remaining callsites). store.py keeps the
import for its write paths:

- POST create (`suffixed = suffixed_name(final_name, username)` →
  passed to `_bake_plugin_tree` and `repo.create(synthetic_name=...)`).
- PUT rename collision check (`new_suffixed`).
- PUT rename `new_suffix` for `_rename_baked_tree` (proposed value).
- PUT rename `new_synthetic` for `repo.update(synthetic_name=...)`.
- Archive `old_suffix` + `new_suffix` for `_rename_baked_tree`
  (retro-compute pre-archive value after `repo.archive` already
  overwrote the DB row with the post-archive synthetic).

Collision SQL — `_suffixed_already_taken`:

  WHERE name || '-by-' || owner_username = ?   (before)
  WHERE synthetic_name = ?                     (after)

Same matches today (phase 1 backfill + NOT NULL invariant + write
paths in sync); indexable + single source of truth going forward.
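
A small sketch of why the two predicates match as long as the column is kept in sync. It uses the standard-library `sqlite3` module purely as a stand-in for DuckDB so it runs without extra dependencies; the table is trimmed to three columns and the helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE store_entities ("
    "name TEXT NOT NULL, owner_username TEXT NOT NULL, "
    "synthetic_name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO store_entities VALUES (?, ?, ?)",
    [("xlsx", "alice", "xlsx-by-alice"), ("pdf", "bob", "pdf-by-bob")],
)


def taken_before(conn, suffixed: str) -> bool:
    """Old predicate: recompute the concat for every row."""
    row = conn.execute(
        "SELECT 1 FROM store_entities "
        "WHERE name || '-by-' || owner_username = ? LIMIT 1",
        (suffixed,),
    ).fetchone()
    return row is not None


def taken_after(conn, suffixed: str) -> bool:
    """New predicate: compare against the stored column."""
    row = conn.execute(
        "SELECT 1 FROM store_entities WHERE synthetic_name = ? LIMIT 1",
        (suffixed,),
    ).fetchone()
    return row is not None
```

The second form is the one an index on `synthetic_name` can serve directly.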

Repository:

- UserStoreInstallsRepository.list_for_user explicit SELECT extended
  with `se.title`, `se.tagline`, `se.synthetic_name` so my_stack and
  marketplace_filter callers can read them off the joined row.

Tests:

- test_store_api.py::test_invocation_name_reads_from_synthetic_column —
  upload entity, manually override the column with a non-canonical
  value, verify GET response returns the override (proves read path
  consumes the column, not recomputes).
- test_marketplace_api.py::test_flea_card_and_detail_read_synthetic_name_from_db —
  same proof for `MarketplaceItem.name` (card) and
  `PluginDetailResponse.manifest_name` (detail).

* feat(flea): phase-4 — rename agnes-store-bundle → flea (synthetic plugin)

The synthetic plugin that wraps loose flea-market skills + agents into
one Claude Code plugin is renamed from `agnes-store-bundle` to `flea`.
Plugin-type flea uploads (their own standalone plugin entry) are
unaffected.

Constants:
- src/marketplace_filter.py:
  - BUNDLE_PLUGIN_NAME: "agnes-store-bundle" → "flea"  (Claude Code
    plugin manifest name + .claude-plugin/plugin.json name)
  - BUNDLE_PREFIXED_NAME: "store-bundle" → "flea"      (on-disk ZIP /
    git tree path, now plugins/flea/...)

Attribution layer (services/session_processors/usage_lib.py):
- FLEA_BUNDLE_PREFIX: "agnes-store-bundle" → "flea". The JSONL
  invocation identifier going forward is `flea:<skill-name>`.
- New `_LEGACY_FLEA_BUNDLE_PREFIXES = ("agnes-store-bundle",)`.
  `MarketplaceItemLookup.resolve()` + `_attribute_event()` accept BOTH
  the new and the legacy prefix so historic usage_events (~90-day
  retention) continue attributing to source='flea'. The tuple becomes
  a no-op once the rename has been live past the retention window —
  a follow-up commit can drop it then.
- USAGE_PROCESSOR_VERSION bumped 6 → 7 so the session-pipeline reprocess
  loop re-runs attribution with the new + legacy prefix branches.
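
As a rough illustration of the dual-prefix acceptance described for this phase, the check could be as small as the following. The constant names mirror the ones listed above; the two helper functions are hypothetical, and the curated and builtin branches of the real resolver are omitted.

```python
from typing import Optional, Tuple

FLEA_BUNDLE_PREFIX = "flea"
_LEGACY_FLEA_BUNDLE_PREFIXES = ("agnes-store-bundle",)


def split_identifier(identifier: str) -> Tuple[str, str]:
    """Split '<prefix>:<local>'; identifiers without ':' get an empty prefix."""
    prefix, sep, local = identifier.partition(":")
    return (prefix, local) if sep else ("", identifier)


def flea_source(identifier: str) -> Optional[str]:
    """Return 'flea' for the new or a legacy bundle prefix, else None."""
    prefix, _local = split_identifier(identifier)
    if prefix == FLEA_BUNDLE_PREFIX or prefix in _LEGACY_FLEA_BUNDLE_PREFIXES:
        return "flea"
    return None
```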

User-facing copy:
- /api/store/bundle.zip Content-Disposition filename: agnes-store-bundle.zip → flea.zip
- `agnes admin store pull` default --out: agnes-store-bundle.zip → flea.zip
- Docstrings + JS comment + welcome template comment updated.

Tests:
- skill_flea.jsonl fixture identifier updated to flea:flea-skill.
- New skill_flea_legacy.jsonl with the legacy prefix for backward-compat
  coverage.
- New test `test_legacy_agnes_store_bundle_prefix_resolves` replays the
  legacy fixture and asserts source='flea' attribution still lands.
- All other test assertions / mocks substituted mechanically:
  test_session_processor_usage.py, test_usage_rollups.py,
  test_marketplace_filter_store.py, test_store_api.py,
  test_cli_refresh_marketplace.py.
- `_seed_flea_entity` (test_usage_rollups.py) + `_seed_attribution`
  (test_session_processor_usage.py) helpers now supply the NOT NULL
  `title` + `synthetic_name` columns from phase 1, since they INSERT
  directly bypassing the repo's create() fallback.

Client rollover note (CHANGELOG): `agnes refresh-marketplace` will
install the new `flea@agnes` plugin and the local marketplace clone's
`plugins/store-bundle/` source folder is removed via `git reset --hard`.
Whether Claude Code itself auto-prunes the orphan
`agnes-store-bundle@agnes` registry entry is undocumented and still has
to be verified empirically on the dev VM. If the orphan entry lingers,
a follow-up will add targeted
cleanup; until then users can manually run
`claude plugin uninstall agnes-store-bundle@agnes`.

Verified locally: 98 passed (session_processor_usage + usage_rollups +
marketplace_filter_store + cli_refresh_marketplace) + 228 passed/2
skipped (store_api + marketplace_api + admin_store_submissions +
store_entity_versions + store_repositories).

* fix(flea): phase-5 — attribution keyspace mismatch (closes #335)

Pre-fix every flea skill/agent invocation silently fell through to
`usage_events.source = 'builtin'`. Root cause: lookup tables in
`services/session_processors/usage_lib.py` keyed `_flea_entities` (and
the derived `_flea_plugins` set) by `store_entities.name` — the
un-suffixed display name. Claude Code writes invocations as
`flea:<synthetic_name>` (e.g. `flea:xlsx-by-c-marustamyan`), so
`dict.get(local)` always missed and the resolver fell through to
builtin. Result: marketplace cards, detail telemetry chips, admin
group-by-source all showed 0 flea invocations even when the raw
JSONL stream was correct.

Phase 1 added the `synthetic_name` column + backfill; phase 4 renamed
the bundle prefix to `flea`; phase 5 finally flips the lookup
keyspace to match what JSONL writes.

usage_lib.py:
- `MarketplaceItemLookup.__init__` preload: `SELECT synthetic_name,
  type FROM store_entities` (was `SELECT name, type`). `_flea_plugins`
  set derived from those keys, so it now carries synthetic_names
  too — matches what Claude Code writes when invoking a skill nested
  inside a flea plugin (`<synthetic>:<inner>`).
- `rebuild_rollups` preload: same SELECT change; also derives
  `flea_plugins` and threads it through `_aggregate_events` /
  `_rebuild_window`.
- `_attribute_event`: signature extended with `flea_plugins`; new
  branch `if prefix in flea_plugins: return ("flea", default_type,
  prefix, local)` for flea-plugin-nested skills/agents. This branch
  was added to `MarketplaceItemLookup.resolve()` in v6 (commit
  e076ebbe) but the rollup builder's helper was never updated to
  match, so nested skills inside flea plugins silently dropped out
  of the daily/window fact tables.
- `USAGE_PROCESSOR_VERSION`: 7 → 8. Forces the session-pipeline
  reprocess loop to re-attribute existing usage_events rows with
  the corrected lookup so rollup tables fill correctly on the next
  tick.
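
A simplified sketch of the corrected lookup, under the assumption that `flea_entities` maps `synthetic_name` to entity type and `flea_plugins` is the subset of keys whose type is `plugin`. The curated branch is left out, and returning the entity's stored type for standalone hits is an illustrative choice rather than something confirmed by the text above.

```python
from typing import Dict, Set, Tuple


def attribute_event(
    identifier: str,
    default_type: str,
    flea_entities: Dict[str, str],
    flea_plugins: Set[str],
) -> Tuple[str, str, str, str]:
    """Return (source, type, parent_plugin, name) for one invocation."""
    prefix, sep, local = identifier.partition(":")
    if not sep:
        return ("builtin", default_type, "", identifier)
    # Standalone flea skill/agent: 'flea:<synthetic_name>'.
    if prefix == "flea" and local in flea_entities:
        return ("flea", flea_entities[local], "", local)
    # Skill/agent nested in a flea plugin: '<synthetic_name>:<inner>'.
    if prefix in flea_plugins:
        return ("flea", default_type, prefix, local)
    return ("builtin", default_type, "", identifier)
```

With the tables keyed by the un-suffixed name instead, the first branch never matches an identifier such as `flea:xlsx-by-c-marustamyan`, which is the miss this phase fixes.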

marketplace.py — 4 API stats lookup callsites switched from
`entity["name"]` to `entity["synthetic_name"]`:
- `_flea_to_item` (card stats lookup)
- `flea_detail` (`_build_telemetry` + `_load_inner_items_stats_by_parent`)
- `flea_skill_detail` (inner detail `parent_plugin` key)
- `flea_agent_detail` (inner detail `parent_plugin` key)

Tests:
- `skill_flea.jsonl` invocation: `flea:flea-skill` →
  `flea:flea-skill-by-alice` (mirrors what Claude Code writes after
  phase 1/4 — the suffixed synthetic_name).
- `test_flea_skill_attributed_with_empty_parent` assertion: rollup
  `name` column now carries the synthetic_name.

No legacy `agnes-store-bundle` prefix backward compat — clean cut per
user direction (dev phase, no production data worth preserving).

Verified locally: 53 passed targeted (session_processor_usage +
usage_rollups + marketplace_filter_store) + 215 passed/2 skipped
broader (store_api + marketplace_api + admin_store_submissions +
store_entity_versions).

* fix(flea): phase-6 — plugin-level rollup aggregation parity for flea

Flea plugin entity cards + detail pages showed 0 invocations even
though nested skills had correct rollup rows. Root cause: the
plugin-level aggregation pass in `_aggregate_events` was hardcoded
to `source='curated'` only:

    if source != "curated" or not parent:
        continue
    if group_by_day:
        pkey = (day, "curated", "plugin", "", parent)
    else:
        pkey = ("curated", "plugin", "", parent)

So flea plugin entities never got a synthetic
`(source='flea', type='plugin', parent_plugin='', name=<synth>)`
row aggregating nested invocations. `_load_invocation_stats('flea')`
filters `parent_plugin = ''` and returned no row for flea plugin
entity cards, so `stats.get(entity["synthetic_name"])` missed and
the API exposed 0/0.

Triggered by empirical observation on the dev VM —
`codex-second-opinion-by-c-marustamyan` plugin showed 0 calls in
the listing card while its three inner skills (codex-setup ×3,
codex-review ×1, codex-second-opinion ×1) had the expected child
rollup rows.

Fix:

- Extend the guard to `source in ("curated", "flea")`.
- Replace the hardcoded `"curated"` in the `pkey` tuple with the
  loop's `source` variable, so flea aggregation lands as `source=
  'flea'` and curated aggregation continues landing as
  `source='curated'`.
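
The adjusted plugin-level pass can be sketched as follows. The event tuple layout and the function name are assumptions made for the example; only the guard and the `pkey` shape follow the snippet quoted above (the non-daily variant).

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[str, str, str, str, str]  # (source, type, parent_plugin, name, user_id)


def aggregate_plugin_rows(events: Iterable[Event]) -> Dict[tuple, dict]:
    """Roll nested invocations up into one synthetic row per parent plugin."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for source, _type, parent, _name, user_id in events:
        if source not in ("curated", "flea") or not parent:
            continue
        pkey = (source, "plugin", "", parent)  # source follows the event
        counts[pkey] += 1
        users[pkey].add(user_id)
    return {
        key: {"count": counts[key], "distinct_users": len(users[key])}
        for key in counts
    }
```

Collecting user ids in a set per key is what makes `distinct_users` a union across child skills rather than a sum.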

API path unchanged: `_load_invocation_stats('flea')` already filters on
`parent_plugin = ''`, so it picks up the new aggregated row
alongside standalone skill/agent rows. Rollup `name` field carries
the synthetic_name keyspace; no collision between standalone entity
synthetic and plugin entity synthetic (global suffix uniqueness
enforced by `_suffixed_already_taken`).

`USAGE_PROCESSOR_VERSION` bumped 8 → 9 to force a reprocess pass so
historic nested-invocation data fills the new plugin-level rows on
the next tick (instead of waiting for the next live invocation).

Tests:

- New `test_flea_plugin_row_aggregates_children` mirrors the existing
  `test_curated_plugin_row_aggregates_children`: seeds a flea plugin
  entity, three nested events (one user invoking two skills, a
  second user invoking one) → asserts the aggregated plugin row
  carries count=3, distinct_users=2 (union, not sum), plus the child
  rows survive alongside.

Verified locally: 43 passed (session_processor_usage + usage_rollups)
+ 82 passed/2 skipped broader (+ marketplace_filter_store +
marketplace_api).

* refactor(marketplace): phase-7 — unify Details sidebar across detail surfaces

Five marketplace detail surfaces (curated plugin, flea plugin, curated
inner skill/agent, flea inner skill/agent, flea standalone skill/agent)
had drifted on which Details rows they show and what order — the same
field landed in different positions, some fields duplicated hero info,
and the flea plugin Owner row leaked the kebab-case `owner_username`
slug instead of the user's real name. This commit aligns all five
surfaces on a single scan order driven by UX priority:

  identity → life-stage → telemetry → debug-tier

Concretely:

  1. Curator / Owner          (first scan signal — trust)
  2. Parent plugin            (inner skill/agent only)
  3. Released                 (top-level only — plugins + flea standalone)
  4. Last used                (recency)
  5. Active days              (engagement consistency)
  6. Version                  (flea standalone only — content hash)
  7. Bundle size              (debug-tier)

Dropped:

  - Slug field on plugin detail surfaces (`marketplace_id` for curated,
    `entity_id` for flea). Pure debug info, never user-relevant; URL
    already carries it.
  - Category + Installs on flea standalone skill/agent detail.
    Category is already shown as a hero badge; install count is in
    the hero telemetry chip — sidebar duplication added noise.

Owner display:

  - Flea plugin Owner row now reads `d.owner_display` (resolved through
    `users.name → users.email → owner_username` by `_resolve_owner_display`
    in `app/api/marketplace.py:1491`) instead of the raw `d.author_name`
    (which is `owner_username`, the kebab-case slug). API field already
    populated from phase 2; templates just consume it.
  - Curated Curator row continues to read `d.author_name` from
    marketplace-metadata.json; `owner_todo` placeholder behavior
    preserved.

Files:

  - app/web/templates/marketplace_plugin_detail.html — rewrote the
    Details render loop (lines 1364-1427 area). Slug row removed,
    rows reordered, Owner branch reads `d.owner_display`.
  - app/web/templates/marketplace_item_detail.html — both branches of
    the Details sidebar (inner skill/agent + flea standalone) re-laid
    around the same scan order. Telemetry helper unchanged, just
    repositioned. Category + Installs rows removed from the
    standalone branch.

No new tests — no existing test asserts the precise order of Details
rows or references the dropped fields in a sidebar context (grep
confirmed). API surface unchanged.

Verified locally: 84 passed / 2 skipped on `test_marketplace_api.py`
+ `test_store_api.py`.

* fix(flea): post-review hardening — N+1, v50 UNIQUE, docs, test cleanup

Addresses 5 critical findings from PR #342 code review:

1. N+1 query in `_flea_to_item` — owner-display resolution previously
   ran one `SELECT … FROM users WHERE id = ?` per item in the listing
   comprehension. Now batched via `_load_users_display` IN-query
   prefetch; 50 items drops 51 user queries to 2. Regression-guarded
   by `TestFleaOwnerDisplayBatched` (spies `_resolve_owner_display`
   and asserts it's not called inside the list path).

2. Misleading comment in `src/marketplace_filter.py` claimed the
   attribution layer accepts both `agnes-store-bundle` and `flea`
   prefixes — it doesn't (clean cut per CHANGELOG). Rewrote to match
   reality.

3. CHANGELOG `[Unreleased]` had two `### Changed` blocks. Merged into
   one (BREAKING bullet first).

4. New v49→v50 migration adds `UNIQUE INDEX
   idx_store_entities_synthetic_name`. v49 made `synthetic_name` the
   canonical attribution key but uniqueness was only app-enforced;
   v50 promotes the invariant to the DB layer. Migration pre-checks
   for existing duplicates and raises `RuntimeError` listing them
   rather than letting `CREATE UNIQUE INDEX` fail mid-way. v48→v49
   migration gained an `is_nullable='YES'` guard on its `SET NOT NULL`
   ALTERs so re-runs on a fully-migrated DB don't trip DuckDB's
   "cannot alter entry … entries depend on it" block (the new index
   counts as such an entry). Index is created by the migration only —
   keeping it out of `_SYSTEM_SCHEMA` preserves fresh-install ordering
   (CREATE TABLE → v49 ALTERs → v50 CREATE INDEX).

5. Deleted three redundant version-pinned schema asserts whose names
   lied about their bodies (`test_schema_version_is_42` asserting
   `== 49`, etc.). Canonical assert lives in
   `test_db_schema_version.py`, renamed to
   `test_schema_version_matches_constant`.
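
The duplicate pre-check from item 4 might look roughly like this. The sketch uses the standard-library `sqlite3` module as a stand-in for DuckDB so it is self-contained; the function name and the error message format are hypothetical, while the index name is the one given above.

```python
import sqlite3


def add_unique_synthetic_name_index(conn) -> None:
    """Fail loudly on duplicates, then create the unique index."""
    dupes = conn.execute(
        "SELECT synthetic_name, COUNT(*) FROM store_entities "
        "GROUP BY synthetic_name HAVING COUNT(*) > 1 "
        "ORDER BY synthetic_name"
    ).fetchall()
    if dupes:
        listing = ", ".join(f"{name!r} x{count}" for name, count in dupes)
        raise RuntimeError(f"duplicate synthetic_name values: {listing}")
    conn.execute(
        "CREATE UNIQUE INDEX IF NOT EXISTS idx_store_entities_synthetic_name "
        "ON store_entities (synthetic_name)"
    )
```

Listing the offending values up front gives the operator something actionable, instead of a bare constraint error from the index build.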

* fix(db): gate v34→v38 store_entities ALTER COLUMN steps on column state

CI on Linux failed `test_v17_to_v18_drops_*` after the v50 UNIQUE INDEX
landed. Root cause: those tests open a DB at the full target version,
seed fixtures, then reset `schema_version` to 17 and reopen — forcing
the ladder to re-run from 17 → current. With the v50 index now in place,
DuckDB blocks intermediate `ALTER COLUMN` steps on `store_entities`
("Cannot drop this column: an index depends on a column after it!" /
"Cannot alter entry because there are entries that depend on it"),
because `synthetic_name` (the indexed column) sits positionally after
the columns those steps touch.

Fix: convert the three SQL-list migrations that hit store_entities into
defensive Python functions:

- `_v34_to_v35_migrate` short-circuits when `synthetic_name` already
  exists (post-v49 shape — the visibility_status rebuild is moot and
  the DROP COLUMN would be blocked by the index).
- `_v35_to_v36_migrate` gates the `visibility_status SET NOT NULL` +
  `SET DEFAULT` on `is_nullable='YES'` so it's a true no-op when the
  column is already constrained.
- `_v37_to_v38_migrate` gates the `version_no SET NOT NULL` step the
  same way.

Forward-roll path (real installs that never reset schema_version) is
unchanged: the gates fire `YES` → ALTERs run. The fix only changes
behavior for the "DB is already at v50 shape but version row says 17"
scenario the tests construct.
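
The `is_nullable='YES'` gate can be sketched as below, assuming a DuckDB-style connection whose `execute()` returns an object with `fetchone()`. Both helper names are hypothetical; the real migrations are `_v35_to_v36_migrate` and `_v37_to_v38_migrate`, whose bodies are not reproduced here.

```python
def column_is_nullable(conn, table: str, column: str) -> bool:
    """True when information_schema reports the column as nullable."""
    row = conn.execute(
        "SELECT is_nullable FROM information_schema.columns "
        "WHERE table_name = ? AND column_name = ?",
        [table, column],
    ).fetchone()
    return row is not None and row[0] == "YES"


def set_not_null_if_needed(conn, table: str, column: str) -> bool:
    """Run the ALTER only when it would change something."""
    if not column_is_nullable(conn, table, column):
        return False  # already constrained (or missing): true no-op
    # Identifiers come from migration code, never from user input.
    conn.execute(f"ALTER TABLE {table} ALTER COLUMN {column} SET NOT NULL")
    return True
```

On a forward roll the column is still nullable, so the gate passes and the ALTER runs as before; on a re-run against an already-constrained table nothing is issued, which avoids the dependency error.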

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
2026-05-19 02:32:41 +02:00


"""v20 adds source_query column to table_registry.
Backs query_mode='materialized' for BigQuery: admin registers a SQL body
that the scheduler runs through the DuckDB BQ extension and writes as a
parquet to /data/extracts/bigquery/data/<id>.parquet.
The v19 step (#150) drops dataset_permissions, access_requests tables and
users.role, table_registry.is_public columns; v20 then ALTERs the post-v19
table_registry to add the source_query column.
"""
import duckdb
from src.db import SCHEMA_VERSION, _ensure_schema, get_schema_version


def test_schema_version_matches_constant():
    # v27 → v28: explicit-install (Model B) for curated marketplace plugins.
    # user_plugin_optouts row presence flips meaning from "excluded" to
    # "subscribed"; migration wipes existing rows so the inverted reading
    # starts from a clean baseline. Also adds marketplace_plugins.created_at
    # (per-plugin "newest first" sort on /marketplace), backfilled from
    # parent marketplace_registry.registered_at.
    # v28 → v29: /home page rollout — instance_templates singleton
    # consolidation (welcome_template + claude_md_template merged) + new
    # users.onboarded column. See tests/test_v29_home_migration.py for
    # the exhaustive coverage of that step.
    # v29 → v30: news_template — single versioned table for the /home
    # news perex + /news permalink page. See
    # tests/test_news_template_repository.py.
    # v30 → v31: session-pipeline framework — session_processor_state
    # replaces session_extraction_state with composite PK.
    # v31 → v32 (PR #233): flea-market upload guardrails — adds
    # store_entities.visibility_status + creates store_submissions.
    # v32 → v33 (PR #233): forensic columns on store_submissions —
    # file_size, bundle_sha256, bundle_purged_at. Underpins the
    # persist-blocked-bundle behavior so admins can Rescan /
    # Override / Download; 30-day TTL purge clears bytes while
    # keeping the row + sha intact. See docs/STORE_GUARDRAILS.md.
    # v33 → v34: drop store_submissions.retry_count — counter mixed LLM
    # error count + admin rescan count, redundant with audit_log.
    # v34 → v35 (PR #233): store_entities gains 'archived' visibility
    # state + archived_at + archived_by audit columns. Owner
    # soft-delete writes 'archived'; existing user_store_installs
    # keep serving the bundle through marketplace.zip / .git.
    # Hard delete (DELETE ?hard=true) remains admin-only.
    # v35 → v36 (PR #233 follow-up): re-apply NOT NULL + DEFAULT 'pending'
    # on store_entities.visibility_status. Lost in the v34→v35
    # column rebuild. Without this, an INSERT that omits the
    # column lands NULL → repo reads None → undefined behavior
    # in the visibility gates. Value-list invariant remains
    # enforced application-side (DuckDB ADD CHECK on existing
    # column not supported).
    # v36 → v37: curated marketplace enrichment from
    # `.claude-plugin/marketplace-metadata.json` plus mandatory curator
    # identity on marketplace_registry. Adds curator_name +
    # curator_email to marketplace_registry, and
    # cover_photo_url + video_url + doc_links to
    # marketplace_plugins.
    # v37 → v38: flea-market edit feature with version
    # history. Adds store_entities.version_no INTEGER and
    # version_history JSON. Each new bundle upload via
    # PUT bumps version_no and appends to version_history;
    # metadata-only edits don't bump. Existing rows backfill
    # to version_no=1 with a single-entry history seeded
    # from the row's current `version` (hash). Bundle bytes
    # for each version live on disk under
    # ${DATA_DIR}/store/<id>/versions/v<N>/plugin/.
    # v38 → v39: system plugin tier — admin-toggleable mandatory plugin
    # set. Adds marketplace_plugins.is_system BOOLEAN DEFAULT
    # FALSE. The flag drives a fanout that materializes
    # resource_grants + user_plugin_optouts rows for every
    # existing user_groups + users row, so the resolver's
    # existing (rbac ∩ subscriptions) computation naturally
    # pulls system plugins into every user's stack. UI then
    # locks the corresponding controls so users can't
    # unsubscribe and admins can't revoke per-group grants.
    # v39 → v40: persistent BigQuery metadata cache. Adds
    # bq_metadata_cache(table_id PK, rows, size_bytes,
    # partition_by, clustered_by, refreshed_at, error_at,
    # error_msg).
    # v40 → v41: Activity Center schema — audit_log gains params_before
    # (JSON), client_ip (VARCHAR), client_kind (VARCHAR),
    # correlation_id (VARCHAR). Three indices on (timestamp),
    # (user_id, timestamp), (action, timestamp).
    # v41 → v42 (this PR): platform telemetry schema — 7 new usage_*
    # tables: usage_events (per-event log), usage_session_summary
    # (per-session aggregate), usage_tool_daily + usage_plugin_daily
    # (daily rollups), usage_attribution_skills/agents/commands
    # (plugin manifest attribution). 10 indices for fast queries.
    # v42 → v43: user_observability_views — per-user saved
    # filter combinations backing the unified /admin/activity
    # page (UNIQUE(user_id, name)). Schema is intentionally
    # opaque JSON because the UI evolves faster than DB.
    # v43 → v44: homepage status frame backing columns —
    # users.last_pull_at (per-user manifest fetch timestamp,
    # bumped by GET /api/sync/manifest) plus four BIGINT token
    # counters on usage_session_summary (input_tokens,
    # output_tokens, cache_read_tokens, cache_creation_tokens).
    # USAGE_PROCESSOR_VERSION simultaneously bumps 1→2 so the
    # reprocess loop backfills tokens on next tick.
    # v44 → v45: user_id column on usage_session_summary + usage_events
    # (stable RBAC filter — replaces the unstable email-local-part
    # ``username`` column) plus matching indices.
    # v45 → v46: per-user opt-out (dismiss) for curated memory
    # items. New table ``knowledge_item_user_dismissed``
    # ((user_id, item_id) PK, dismissed_at) + index on user_id
    # for the EXISTS subquery used by list_items / search /
    # count_items / bundle. Mandatory items are governance-
    # protected: the API rejects POSTs against them, and the
    # SQL filter exempts ``status = 'mandatory'`` so any stale
    # row from before an item was mandated is silently ignored.
    # v46 → v47: DuckDB FTS BM25 index over knowledge_items(title, content).
    # Replaces ``ILIKE '%q%'`` ranking-by-insertion-order in
    # ``KnowledgeRepository.search`` with BM25 relevance scoring.
    # Migration is soft-fail: a missing fts extension leaves the
    # DB at v46 (search falls back to ILIKE).
    # v47 → v48 (this PR): marketplace telemetry refactor. Drops 4 legacy
    # tables (usage_attribution_skills/_agents/_commands,
    # usage_plugin_daily — all verified empty or derivable).
    # Adds usage_marketplace_item_daily (per-day fact with
    # count + distinct_users + error_count) and
    # usage_marketplace_item_window (sliding-window snapshot,
    # labels 'last_7d' refreshed every tick, 'last_30d' hourly).
    # New attribution logic = prefix split on `<plugin>:<local>`
    # identifier + live lookup against marketplace_plugins /
    # store_entities — no mapping tables needed.
    # v49 (#TBD): phase-1 Flea refactor — adds title, tagline,
    # synthetic_name columns to store_entities. title is
    # user-friendly display name (acronym-aware), tagline is
    # an optional 200-char short description, synthetic_name is
    # the deterministic <name>-by-<owner_username> string baked
    # into served bundles. Migration backfills existing rows
    # via humanize_name(strip_archive_suffix(name)) for title
    # and the concat formula for synthetic_name.
    # v50 (#TBD): UNIQUE INDEX on store_entities.synthetic_name. v49 made
    # it the canonical attribution key (rollup keyspace, JSONL
    # prefix, marketplace bundle naming) but uniqueness was
    # only app-enforced; v50 adds DB-level uniqueness via
    # CREATE UNIQUE INDEX. Migration pre-checks for existing
    # duplicates and raises RuntimeError listing them rather
    # than letting the index create fail mid-way.
    assert SCHEMA_VERSION == 50


def test_v37_marketplace_curator_columns(tmp_path):
    """Fresh install reaches the current schema with the v37 marketplace
    columns present."""
    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    _ensure_schema(conn)
    registry_cols = {
        r[0]
        for r in conn.execute(
            "SELECT column_name FROM information_schema.columns WHERE table_name = 'marketplace_registry'"
        ).fetchall()
    }
    assert {"curator_name", "curator_email"} <= registry_cols, (
        f"curator columns missing from marketplace_registry: {registry_cols}"
    )
    plugin_cols = {
        r[0]
        for r in conn.execute(
            "SELECT column_name FROM information_schema.columns WHERE table_name = 'marketplace_plugins'"
        ).fetchall()
    }
    assert {"cover_photo_url", "video_url", "doc_links"} <= plugin_cols, (
        f"enrichment columns missing from marketplace_plugins: {plugin_cols}"
    )
    conn.close()


def test_v36_db_migrates_to_current(tmp_path):
    """Pre-existing v36 DB upgrades cleanly through v37 (curator
    enrichment) and v38 (flea edit version history) without losing
    existing rows."""
    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    # Stand up a minimal v36-shape registry + plugin row, plus the
    # schema_version row that pins us to 36.
    conn.execute("CREATE TABLE schema_version (version INTEGER, applied_at TIMESTAMP DEFAULT current_timestamp)")
    conn.execute("INSERT INTO schema_version (version) VALUES (36)")
    conn.execute("""CREATE TABLE marketplace_registry (
        id VARCHAR PRIMARY KEY, name VARCHAR NOT NULL,
        url VARCHAR NOT NULL, branch VARCHAR, token_env VARCHAR,
        description TEXT, registered_by VARCHAR,
        registered_at TIMESTAMP DEFAULT current_timestamp,
        last_synced_at TIMESTAMP, last_commit_sha VARCHAR, last_error TEXT
    )""")
    conn.execute("""CREATE TABLE marketplace_plugins (
        marketplace_id VARCHAR NOT NULL, name VARCHAR NOT NULL,
        description TEXT, version VARCHAR, author_name VARCHAR,
        homepage VARCHAR, category VARCHAR, source_type VARCHAR,
        source_spec JSON, raw JSON,
        created_at TIMESTAMP DEFAULT current_timestamp,
        updated_at TIMESTAMP DEFAULT current_timestamp,
        PRIMARY KEY (marketplace_id, name)
    )""")
    conn.execute(
        "INSERT INTO marketplace_registry (id, name, url) VALUES ('legacy', 'Legacy', 'https://example.com/repo.git')"
    )
    conn.execute("INSERT INTO marketplace_plugins (marketplace_id, name) VALUES ('legacy', 'foo')")

    _ensure_schema(conn)
    assert get_schema_version(conn) == SCHEMA_VERSION

    # v37 enrichment columns exist; existing rows preserved with NULL.
    row = conn.execute("SELECT curator_name, curator_email FROM marketplace_registry WHERE id = 'legacy'").fetchone()
    assert row == (None, None)
    row = conn.execute(
        "SELECT cover_photo_url, video_url, doc_links FROM marketplace_plugins "
        "WHERE marketplace_id = 'legacy' AND name = 'foo'"
    ).fetchone()
    assert row == (None, None, None)
    conn.close()


def test_v39_adds_marketplace_plugins_is_system(tmp_path):
    """Fresh install reaches the current schema with the v39 is_system
    column on marketplace_plugins. Default value is FALSE (not NULL) so
    the fanout helpers don't need to special-case absent rows."""
    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    _ensure_schema(conn)

    cols = {
        r[0]
        for r in conn.execute(
            "SELECT column_name FROM information_schema.columns WHERE table_name = 'marketplace_plugins'"
        ).fetchall()
    }
    assert "is_system" in cols, f"is_system missing from {cols}"

    # New rows default to FALSE, which is required so a freshly-synced
    # plugin doesn't accidentally land in everyone's stack.
    conn.execute("INSERT INTO marketplace_registry (id, name, url) VALUES ('m', 'M', 'https://example.com/repo.git')")
    conn.execute("INSERT INTO marketplace_plugins (marketplace_id, name) VALUES ('m', 'p')")
    row = conn.execute("SELECT is_system FROM marketplace_plugins WHERE marketplace_id = 'm' AND name = 'p'").fetchone()
    assert row[0] is False, f"new plugin defaulted to {row[0]!r}, expected False"
    conn.close()


def test_v38_db_migrates_to_v39(tmp_path):
    """Pre-existing v38 DB upgrades to v39 cleanly: adds is_system
    column, existing rows backfill to FALSE, schema_version updates."""
    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    # Stand up the v38 minimal shape: schema_version row + the two
    # marketplace tables + a pre-existing plugin row that must survive
    # the migration with is_system = FALSE.
    conn.execute("CREATE TABLE schema_version (version INTEGER, applied_at TIMESTAMP DEFAULT current_timestamp)")
    conn.execute("INSERT INTO schema_version (version) VALUES (38)")
    conn.execute("""CREATE TABLE marketplace_registry (
        id VARCHAR PRIMARY KEY, name VARCHAR NOT NULL,
        url VARCHAR NOT NULL, branch VARCHAR, token_env VARCHAR,
        description TEXT, registered_by VARCHAR,
        registered_at TIMESTAMP DEFAULT current_timestamp,
        last_synced_at TIMESTAMP, last_commit_sha VARCHAR, last_error TEXT,
        curator_name VARCHAR, curator_email VARCHAR
    )""")
    conn.execute("""CREATE TABLE marketplace_plugins (
        marketplace_id VARCHAR NOT NULL, name VARCHAR NOT NULL,
        description TEXT, version VARCHAR, author_name VARCHAR,
        homepage VARCHAR, category VARCHAR, source_type VARCHAR,
        source_spec JSON, raw JSON,
        created_at TIMESTAMP DEFAULT current_timestamp,
        updated_at TIMESTAMP DEFAULT current_timestamp,
        cover_photo_url VARCHAR, video_url VARCHAR, doc_links JSON,
        PRIMARY KEY (marketplace_id, name)
    )""")
    conn.execute(
        "INSERT INTO marketplace_registry (id, name, url) VALUES ('legacy', 'Legacy', 'https://example.com/repo.git')"
    )
    conn.execute("INSERT INTO marketplace_plugins (marketplace_id, name) VALUES ('legacy', 'foo')")

    _ensure_schema(conn)
    assert get_schema_version(conn) == SCHEMA_VERSION

    cols = {
        r[0]
        for r in conn.execute(
            "SELECT column_name FROM information_schema.columns WHERE table_name = 'marketplace_plugins'"
        ).fetchall()
    }
    assert "is_system" in cols

    # Existing pre-v39 row backfilled to FALSE, so no plugin lands in
    # everyone's stack just because we ran the migration.
    row = conn.execute(
        "SELECT is_system FROM marketplace_plugins WHERE marketplace_id = 'legacy' AND name = 'foo'"
    ).fetchone()
    assert row[0] is False, f"pre-existing row backfilled to {row[0]!r}"
    conn.close()


def test_v20_adds_source_query(tmp_path):
    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    _ensure_schema(conn)
    cols = {
        r[0]
        for r in conn.execute(
            "SELECT column_name FROM information_schema.columns WHERE table_name = 'table_registry'"
        ).fetchall()
    }
    assert "source_query" in cols, f"source_query missing from {cols}"
    assert get_schema_version(conn) == SCHEMA_VERSION
    conn.close()


def test_claude_md_template_seeded_in_instance_templates(tmp_path):
    """v23 introduced claude_md_template as a singleton table; v28 consolidates
    it into instance_templates keyed 'claude_md'. Post-v28 the legacy table is
    dropped; the canonical lookup is `instance_templates WHERE key='claude_md'`.

    See tests/test_v28_migration.py for the migration path coverage. This test
    just verifies the seeded row is present on a fresh install.
    """
    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    _ensure_schema(conn)
    tables = {
        r[0]
        for r in conn.execute("SELECT table_name FROM information_schema.tables WHERE table_schema = 'main'").fetchall()
    }
    assert "instance_templates" in tables
    assert "claude_md_template" not in tables, "claude_md_template should be consolidated away post-v28"
    row = conn.execute("SELECT key, content FROM instance_templates WHERE key = 'claude_md'").fetchone()
    assert row is not None
    assert row[0] == "claude_md"
    assert row[1] is None  # default = no override
    conn.close()


def test_v19_db_migrates_to_v20(tmp_path):
    """Pre-existing v19 DB (post-RBAC-drop) without source_query upgrades
    cleanly without losing data."""
    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    # Simulate a v19 DB at minimal but realistic shape: schema_version row +
    # a table_registry row in the post-v19 column shape (no is_public column,
    # since v19 finalize dropped it via the table-rebuild idiom).
    conn.execute("CREATE TABLE schema_version (version INTEGER, applied_at TIMESTAMP DEFAULT current_timestamp)")
    conn.execute("INSERT INTO schema_version (version) VALUES (19)")
    conn.execute("""CREATE TABLE table_registry (
        id VARCHAR PRIMARY KEY, name VARCHAR NOT NULL,
        source_type VARCHAR, bucket VARCHAR, source_table VARCHAR,
        sync_strategy VARCHAR DEFAULT 'full_refresh',
        query_mode VARCHAR DEFAULT 'local',
        sync_schedule VARCHAR, profile_after_sync BOOLEAN DEFAULT true,
        primary_key VARCHAR, folder VARCHAR, description TEXT,
        registered_by VARCHAR,
        registered_at TIMESTAMP DEFAULT current_timestamp
    )""")
    conn.execute("INSERT INTO table_registry (id, name) VALUES ('foo', 'foo')")

    _ensure_schema(conn)
    assert get_schema_version(conn) == SCHEMA_VERSION  # migrated from v19 up to the current version
    cols = {
        r[0]
        for r in conn.execute(
            "SELECT column_name FROM information_schema.columns WHERE table_name = 'table_registry'"
        ).fetchall()
    }
    assert "source_query" in cols
    # Existing row preserved, new column NULL
    row = conn.execute("SELECT id, source_query FROM table_registry WHERE id='foo'").fetchone()
    assert row == ("foo", None)
    conn.close()


def _make_v34_store_entities(conn):
    """Build a minimal v34-shape store_entities table for v34→v35 path tests.

    Only includes the columns the v34→v35 migration touches; the rest of
    the schema isn't needed because the function operates only on
    store_entities's column set.
    """
    conn.execute("""
        CREATE TABLE store_entities (
            id VARCHAR PRIMARY KEY,
            visibility_status VARCHAR DEFAULT 'pending'
        )
    """)
    conn.execute(
        "INSERT INTO store_entities (id, visibility_status) VALUES ('a', 'approved'), ('b', 'pending'), ('c', 'hidden')"
    )


def test_v34_to_v35_clean_path_rebuilds_visibility_column(tmp_path):
    """Standard v34 → v35 path: ``visibility_status`` is present, no temp
    column. Migration rebuilds the column without the legacy CHECK so
    'archived' becomes a valid value, preserves all row values, and adds
    the audit columns.
    """
    from src.db import _v34_to_v35_migrate

    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    _make_v34_store_entities(conn)

    _v34_to_v35_migrate(conn)

    cols = {
        r[0]
        for r in conn.execute(
            "SELECT column_name FROM information_schema.columns WHERE table_name = 'store_entities'"
        ).fetchall()
    }
    assert "visibility_status" in cols
    assert "_vis_v35" not in cols, "temp column must be cleaned up"
    assert "archived_at" in cols
    assert "archived_by" in cols
    rows = dict(conn.execute("SELECT id, visibility_status FROM store_entities ORDER BY id").fetchall())
    assert rows == {"a": "approved", "b": "pending", "c": "hidden"}, f"row values must survive the rebuild: {rows}"
    conn.close()


def test_v34_to_v35_recovers_from_partial_rebuild_missing_visibility(tmp_path):
    """Partial-rebuild recovery: a previous migration attempt completed
    steps 3-5 (added _vis_v35, copied values, dropped visibility_status)
    but failed before step 6 (RENAME). Subsequent restarts hit
    DROP visibility_status (no IF EXISTS guard) and looped on the same
    error, leaving the DB stranded with schema_version stuck pre-v35.

    The new code detects this state (_vis_v35 present, visibility_status
    absent) and finishes the rebuild with the RENAME alone instead of
    re-running the full destructive sequence.
    """
    from src.db import _v34_to_v35_migrate

    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    # Hand-build the broken state: store_entities with _vis_v35 instead of
    # visibility_status, populated with the canonical values.
    conn.execute("""
        CREATE TABLE store_entities (
            id VARCHAR PRIMARY KEY,
            _vis_v35 VARCHAR
        )
    """)
    conn.execute(
        "INSERT INTO store_entities (id, _vis_v35) VALUES ('a', 'approved'), ('b', 'pending'), ('c', 'hidden')"
    )

    _v34_to_v35_migrate(conn)

    cols = {
        r[0]
        for r in conn.execute(
            "SELECT column_name FROM information_schema.columns WHERE table_name = 'store_entities'"
        ).fetchall()
    }
    assert "visibility_status" in cols
    assert "_vis_v35" not in cols
    assert "archived_at" in cols
    assert "archived_by" in cols
    rows = dict(conn.execute("SELECT id, visibility_status FROM store_entities ORDER BY id").fetchall())
    assert rows == {"a": "approved", "b": "pending", "c": "hidden"}, (
        f"row values must come back via RENAME, not be lost: {rows}"
    )
    conn.close()


def test_v34_to_v35_recovers_from_partial_rebuild_both_columns(tmp_path):
    """Edge state: a prior attempt aborted before the DROP, leaving both
    visibility_status (canonical) and _vis_v35 (temp) on the table.
    The recovery path drops _vis_v35 and keeps visibility_status; the
    rest of the schema expects that name.
    """
    from src.db import _v34_to_v35_migrate

    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    conn.execute("""
        CREATE TABLE store_entities (
            id VARCHAR PRIMARY KEY,
            visibility_status VARCHAR,
            _vis_v35 VARCHAR
        )
    """)
    conn.execute("INSERT INTO store_entities (id, visibility_status, _vis_v35) VALUES ('a', 'approved', 'approved')")

    _v34_to_v35_migrate(conn)

    cols = {
        r[0]
        for r in conn.execute(
            "SELECT column_name FROM information_schema.columns WHERE table_name = 'store_entities'"
        ).fetchall()
    }
    assert "visibility_status" in cols
    assert "_vis_v35" not in cols, "temp column must be dropped"
    row = conn.execute("SELECT id, visibility_status FROM store_entities WHERE id = 'a'").fetchone()
    assert row == ("a", "approved")
    conn.close()


def test_v32_db_with_partial_v35_recovers_through_full_ladder(tmp_path):
    """End-to-end: a DB stranded at schema_version=32 with the half-applied
    v34→v35 state (visibility_status dropped, _vis_v35 left behind) must
    upgrade cleanly through the full ladder when ``_ensure_schema`` runs.

    This is the production scenario observed in operator instances after
    the original list-form ``_V34_TO_V35_MIGRATIONS`` failed mid-run on
    a fresh restart.
    """
    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    # Stand up the broken state. We only need enough of the schema for the
    # migration ladder to run; ``_ensure_schema`` will create the rest
    # via ``_SYSTEM_SCHEMA``'s IF NOT EXISTS guards.
    conn.execute("CREATE TABLE schema_version (version INTEGER, applied_at TIMESTAMP DEFAULT current_timestamp)")
    conn.execute("INSERT INTO schema_version (version) VALUES (32)")
    conn.execute("""
        CREATE TABLE store_entities (
            id VARCHAR PRIMARY KEY,
            owner_user_id VARCHAR,
            owner_username VARCHAR,
            type VARCHAR,
            name VARCHAR,
            archived_at TIMESTAMP,
            archived_by VARCHAR,
            _vis_v35 VARCHAR
        )
    """)
    conn.execute("INSERT INTO store_entities (id, type, name, _vis_v35) VALUES ('a', 'skill', 'alpha', 'approved')")

    _ensure_schema(conn)

    assert get_schema_version(conn) == SCHEMA_VERSION
    cols = {
        r[0]
        for r in conn.execute(
            "SELECT column_name FROM information_schema.columns WHERE table_name = 'store_entities'"
        ).fetchall()
    }
    assert "visibility_status" in cols
    assert "_vis_v35" not in cols
    # Existing row preserved, value carried over from _vis_v35.
    row = conn.execute("SELECT id, visibility_status FROM store_entities WHERE id = 'a'").fetchone()
    assert row == ("a", "approved")
    conn.close()


def test_v35_to_v36_reapplies_visibility_constraints(tmp_path):
    """v34→v35 dropped NOT NULL + DEFAULT when rebuilding the column to
    drop the legacy CHECK; v35→v36 re-applies them. Verifies through
    information_schema that, on a DB brought to the current schema by
    ``_ensure_schema``, visibility_status is NOT NULL with DEFAULT
    'pending', so an INSERT omitting it can never land NULL.
    """
    db_path = tmp_path / "system.duckdb"
    conn = duckdb.connect(str(db_path))
    _ensure_schema(conn)

    assert get_schema_version(conn) == SCHEMA_VERSION
    cols = conn.execute(
        "SELECT column_name, is_nullable, column_default "
        "FROM information_schema.columns "
        "WHERE table_name = 'store_entities' "
        "  AND column_name = 'visibility_status'"
    ).fetchall()
    assert cols, "visibility_status column missing from store_entities"
    name, is_nullable, default_expr = cols[0]
    assert is_nullable == "NO", f"visibility_status must be NOT NULL after v36; got is_nullable={is_nullable!r}"
    # DuckDB may render the default as a quoted literal, so match on the
    # substring (case-insensitively) rather than on the exact form.
    assert default_expr is not None, "visibility_status DEFAULT must be set"
    assert "pending" in str(default_expr).lower(), f"visibility_status DEFAULT must be 'pending'; got {default_expr!r}"
    conn.close()