* Curated marketplace enrichment via agnes-metadata.json + curator metadata
Adds a second well-defined metadata file `.claude-plugin/agnes-metadata.json`
that upstream marketplace repos can opt into, providing per-plugin (and
per-skill / per-agent) cover photo, demo video URL, doc links, and
category override. The Claude Code marketplace contract is untouched —
agnes-metadata.json + the convention `.agnes/` directory are stripped
from the synthetic Claude Code marketplace served via /marketplace.zip
and /marketplace.git/*, so user instances see a clean Claude Code repo
with no Agnes-only metadata.
Highlights:
- DB schema v32 — adds curator_name + curator_email on marketplace_registry,
cover_photo_url + video_url + doc_links on marketplace_plugins.
- Mandatory curator at marketplace registration, editable later through
the admin UI; surfaces on cards + detail pages in place of owner_todo.
- External-asset mirror cache at ${DATA_DIR}/marketplace-cache/<slug>/
with conditional GET, 60s timeout, 10 MB body cap, SSRF guards, and
Wikipedia-policy-compliant User-Agent.
- Strict drop semantics — anything Agnes can't deliver as a real PDF /
Markdown / plain text doc, or a real PNG / JPEG / WebP cover, is
dropped from the served metadata; UI looks identical to no-entry case
(gradient placeholder for missing covers, no row in the doc list).
- Doc allowlist + image allowlist enforced on both the curated mirror
flow and the Flea upload flow (/store/new); shared module
src/marketplace_assets.py.
- New /api/marketplace/curated/{mp}/{plugin}/{asset,doc,mirrored}/...
endpoints with path-traversal guards + RBAC + Content-Disposition
attachment for docs.
- Curator-focused format guide at /marketplace/format-guide; canonical
source is docs/curated-marketplace-format.md, also linked from the
admin /admin/marketplaces page next to + Add Marketplace.
See CHANGELOG.md under [Unreleased] for the full breakdown.
* Fix format-guide test assertion to match shortened disclaimer
The 'Flea Market' phrase was trimmed out of the disclaimer in
docs/curated-marketplace-format.md after the curator-focused rewrite.
Update the rendered-HTML test to assert the channel-scoping phrase
that's actually present ('Curated Marketplace channel only') rather
than the 'Flea Market' contrast that's no longer in the doc.
* Drop unused 'version' field from agnes-metadata.json schema
The parser never read it; it was a YAGNI placeholder for future
schema evolution. Curators don't need to wonder what to put there
when adding the file for the first time. Will be re-added if and
when we actually introduce a backwards-incompatible schema change.
* Harden asset mirror against SSRF via redirect + DNS rebinding
The pre-flight _is_safe_url check validated only the initial URL;
urllib.request.urlopen then followed redirects and re-resolved DNS for
the actual connection — both bypassable. Attacker-controlled origin
could 302 to http://169.254.169.254/... and exfil cloud metadata;
attacker-controlled DNS could return public IP first / 127.0.0.1 second.
Replace urlopen call with a shared OpenerDirector wired through three
custom handlers: _SafeRedirectHandler re-runs SSRF allowlist on every
redirect Location (max 5 hops, down from urllib's 10), and
_PinnedHTTPHandler / _PinnedHTTPSHandler connect to the IP that passed
validation rather than re-resolving the hostname. TLS SNI + cert verify
stay bound to the original hostname.
_resolve_safe returns the validated IP (the existing _is_safe_url
2-tuple wrapper stays for backwards compatibility) and rejects round-
robin DNS that mixes a public + private record. _UnsafeRedirectError
is a typed exception so _fetch_url can map redirect blocks to terminal
'rejected' status (not transient 'failed'). _http_open is the single
call site so tests can mock at one well-defined seam.
Tests cover redirect blocking (link-local, loopback), redirect-error
unwrapping inside URLError, pinned-IP connection target, and the
end-to-end DNS-rebinding scenario. Existing tests that mocked
urllib.request.urlopen are migrated to mock _http_open.
* Harden /asset/ endpoint against stored XSS
The endpoint served any file in the cloned marketplace repo with
stdlib-detected Content-Type, so a curator who landed evil.html (or a
renamed evil.png carrying HTML bytes) in the working tree got a
same-origin XSS — the response shares cookie scope with /admin and
/api/me/*.
The asset endpoint is image-only by contract (cover photos referenced
from agnes-metadata.json + inner skill / agent cards), so applying the
same allowlist + magic-bytes pattern that /doc/ already uses closes
the gap without breaking any legitimate use case. Three layered
checks: extension in IMAGE_EXTENSIONS (.png/.jpg/.jpeg/.webp; SVG
excluded — <script> inside SVG executes), validate_image_file magic
bytes (defeats rename-extension attack), Content-Type pinned from the
validated extension (never stdlib mimetypes).
Defense-in-depth: X-Content-Type-Options: nosniff stops browser MIME
sniffing; Content-Security-Policy: default-src 'none' blocks script /
iframe execution even if a future regression let HTML through.
Tests cover the .html extension reject, the renamed-HTML-as-PNG magic-
bytes reject, the .svg reject, and the happy-path PNG with security
headers attached. The pre-existing path-traversal test seeds a real
PNG instead of ok.txt now that the endpoint is image-only.
* Enforce mandatory curator on marketplace PATCH
The POST handler enforced curator_name + curator_email at create time,
but PATCH treated empty / missing curator inputs as 'no change'. Legacy
rows that pre-date v32 (curator_name=NULL) could be edited indefinitely
without ever filling the curator gap, and OWNER_TODO_PLACEHOLDER lingered
on every /marketplace card.
Reject the PATCH with 400 when the post-merge row would persist with
empty curator. The check fires after the existing field-merge logic, so
once-filled rows that don't touch curator still pass through (their
existing values fall through from the DB row). DB column stays nullable
so untouched legacy rows continue to coexist — the gate fires only the
moment an admin opens the edit modal.
Existing PATCH semantics preserved: empty-string input still means 'leave
existing value alone', and once-filled curator can't be cleared (those
test cases pass unchanged). New test seeds a legacy row directly via the
repository, then exercises url-only PATCH (rejected), partial-fill PATCH
(rejected), and full-fill PATCH (succeeds); a follow-up no-curator PATCH
on the now-formed row also passes.
* Drop unused curated-marketplace helpers (PR #234 review)
* build_db_payload — imported by src/marketplace.py but never called.
The strict-drop semantics it would have implemented were re-written
inline in _refresh_plugin_cache (see the comment block there). The
standalone helper still carried the old fall-back-to-original-external-
URL-on-mirror-failure behaviour, which contradicts the documented
drop-when-can't-deliver contract — a future contributor who re-wired
it would have introduced a silent regression. Delete with the helper
+ the import + the comment that referenced it.
* _resolve_marketplace_name — one-line shim with no remaining call
sites. Callers use _resolve_marketplace_meta which returns name +
curator together, avoiding the double DB hit the shim exists to
hide.
* '# noqa: F401 Optional kept for forward-compat' was wrong — Optional
IS used in src/marketplace.py (line 70 and line 238). Drop the noqa
comment so a future ruff run doesn't try to remove a real import.
Removing build_db_payload also drops the only remaining use of Optional
in src/marketplace_metadata.py, so the import comes out there too.
* Cap agnes-metadata.json size + catch RecursionError on parse
The reader is invoked once per marketplace per sync and the file is
curator-controlled. Two failure modes were unguarded:
* Multi-GB JSON: path.read_text() pulled the whole file into memory
before json.loads even ran. A curator with commit access to an
upstream repo could OOM the sync worker.
* Deeply-nested JSON under any size cap: cpython's recursive object /
array parser raises RecursionError at ~1000 levels of depth.
RecursionError is a RuntimeError, not ValueError, so the existing
catch let it propagate up and abort the entire sync — every other
marketplace in the same pass got skipped.
Add AGNES_METADATA_MAX_BYTES = 1 MiB (a real metadata file with covers,
docs, categories for ~50 plugins fits in <100 KB so the cap is
generous) and gate the size check on path.stat().st_size before the
body read. Broaden the parse except to (ValueError, RecursionError)
with a unified log line. Both failure modes degrade to the same
empty-dict fall-back the malformed-JSON path already used, so one bad
upstream never aborts the rest of the sync.
Tests cover the size cap firing before json.loads (whitespace-padded
valid JSON exceeding the cap) and the recursion path (5000 nested
arrays — past cpython's default recursion limit but well under the
size cap).
* Persist asset-mirror manifest per body write, before unlink
sync_assets wrote each body atomically (tmp + rename) but persisted
the manifest only at the end of the batch. A kill -9 mid-Phase 2 left
on-disk files the manifest never referenced. Once a curator dropped
that URL from agnes-metadata.json, Phase 3's cleanup had no record of
the file and the orphan stayed forever — there's no GC pass walking
the cache dir today, so disk would slowly bloat.
Phase 2 (body-write iteration): after the in-memory manifest mutation,
persist BEFORE unlinking the previous body. The crash window narrows
from 'all of Phase 2' to 'between persist and unlink' (microseconds).
A persist failure mid-batch keeps the previous body on disk — the on-
disk manifest still references it, and a stale-but-existing file beats
a 404. Cost: one extra tmp+rename per body write; manifest is a few KB
so the overhead is negligible vs. the HTTP fetches.
Phase 3 (curator-removed URLs): same discipline. Collect the to-delete
relpaths, persist the manifest with the entries already gone, THEN
unlink. A crash mid-cleanup leaves at most a microsecond window where
files exist despite the manifest no longer naming them. The next sync
reads the (correct) manifest and the orphan stays orphaned, but the
served state is consistent.
Tests cover per-body persist call count, the post-update on-disk
manifest content, and Phase 3 ordering verified by reading the on-disk
manifest from inside Path.unlink.
* Consolidate marketplace video embeds + format-guide CSS
The YouTube nocookie / Vimeo / <video> / link-fallback detection logic
was duplicated verbatim in marketplace_plugin_detail.html and
marketplace_item_detail.html (~40 JS lines each, with subtly-different
inline styles). Both templates now {% include %} a single
_marketplace_video_embed.html partial inside their IIFE so the regex,
the nocookie attribute set, and the unknown-host link fallback live in
ONE place — future tweaks (new host, new attribute, fixed sandbox flag)
no longer need to be applied twice in lockstep.
The .video-wrap selectors (one inline <style> rule in plugin_detail,
one inline style='...' attribute in item_detail) are replaced by the
existing .video-embed 16:9 wrapper in style-custom.css, with new
.video-embed video / .video-embed a child rules added so the wrapper
handles all four embed shapes uniformly without per-template
positioning.
The 60-line inline <style> block in marketplace_format_guide.html
moves verbatim to style-custom.css under a new 'Marketplace format
guide page' section, scoped to .format-guide so other pages aren't
affected.
No user-visible behaviour change: the rendered HTML for valid
YouTube / Vimeo / mp4 / external links is byte-identical to before,
and the format-guide page renders the same.
* Maintainability cleanup batch (PR #234 review)
#10: drop _path_under from app/api/marketplace.py — it was a byte-
equivalent clone of _safe_join (same Path.resolve(strict=True) +
relative_to() containment check). The three v32 endpoint handlers
(/asset, /doc, /mirrored) now share the existing helper.
#14: rename src/marketplace_assets.py → src/marketplace_asset_validation.py
so the file's purpose is obvious from the name and the previous
overlap with src/marketplace_asset_mirror.py is gone. Six call-site
imports updated in lockstep; CHANGELOG references under [Unreleased]
updated to track the new path.
#11: consolidate the URL builders that resolve
/api/marketplace/curated/<slug>/<plugin>/{asset,doc,mirrored}/...
paths. _internal_asset_url / _internal_doc_url / _mirrored_asset_url
lived in src/marketplace.py, while a copy named _mirrored_url lived
in app/api/marketplace.py with a 'must stay aligned' comment. New
module src/marketplace_urls.py is the single source of truth — both
call sites import from it and a future URL-format tweak only needs
to change one file. The _ROUTE_PREFIX constant collapses the per-
function f-string repetition. The route-handler endpoints themselves
still own the path string literals (keeping the builders identical
to the route declarations remains a checklist item, not a runtime
guarantee).
* Re-key asset-mirror manifest by (plugin, url) + dedup HTTP fetches
The manifest used to be keyed by URL alone, so two plugins in the
same marketplace referencing the same external image (a shared CDN
icon, a common cover) collided on entry.plugin_name — last writer
won. The DB row for the losing plugin then stored a served URL
pointing under the winning plugin's tree, and require_resource_access
denied legitimate access on one side and let the other plugin's user
reach the wrong asset.
In-memory: Dict[Tuple[str, str], MirrorEntry] keyed (plugin_name, url).
On disk: format flips from {url: entry} dict to [entry, ...] list of
self-describing entries (each carries plugin_name + url + the
previous fields). JSON keys can't be tuples; encoding 'plugin::url'
would just shift the parsing burden.
Phase 1 of sync_assets deduplicates fetches by URL — three plugins
sharing one URL share one HTTP request. The conditional-GET prior is
picked from any owning plugin's prior entry; if their etags diverge
(rare) we miss one 304 and pay for a full re-download instead.
Phase 2 still creates a per-(plugin, url) manifest entry pointing
under the plugin's own subdir, and Phase 3 cleanup is keyed the same
way so dropping a URL from one plugin's metadata doesn't disturb
another plugin still referencing it.
Body files stay per plugin (RBAC-clean isolation: deleting plugin A's
cache can't strand plugin B). Bandwidth saved by fetch dedup.
Consumer code re-keyed: src.marketplace._refresh_plugin_cache rebuilt
served_url_for / mirror_status as composite-keyed maps;
app.api.marketplace._resolve_external_via_mirror /
_curated_inner_cover / _curated_inner_enrichment look up by
(plugin_name, url).
Tests cover per-plugin manifest entries with shared URL, the single
HTTP fetch for N plugins, and Phase 3 drop-one-keep-other. All
existing tests migrated to composite key access; v2 list format
assertions verify on-disk shape.
* Migrate asset mirror from urllib.request to httpx
The asset mirror was the only HTTP call site in Agnes still using
urllib.request; every other module (CLI, Jira / OpenMetadata / OpenAI
connectors, scheduler, Telegram bot) already used httpx. The asset
mirror was added in this PR's base commit, so this is the only chance
to bring it into convention before someone copies it as 'the pattern
for HTTP fetches in Agnes'.
Three concrete benefits beyond consistency:
* SSRF defence collapses from five urllib classes
(_PinnedHTTPConnection, _PinnedHTTPSConnection, _PinnedHTTPHandler,
_PinnedHTTPSHandler, _SafeRedirectHandler) into one
_SSRFGuardTransport. httpx invokes handle_request() on every redirect
hop, so re-validation is free — we don't need a custom redirect
handler at all.
* DNS-rebinding defence: the transport rewrites request.url.host to the
SSRF-validated IP before delegating to super().handle_request().
httpcore connects to whatever URL.host says, so this pins the
connection without subclassing HTTPSConnection. The original hostname
goes into the Host header + the sni_hostname extension so TLS / vhost
routing still bind to the curator-supplied hostname.
* Error handling: one httpx.HTTPError catch-all for transport errors,
plus specific httpx.TimeoutException / httpx.TooManyRedirects branches
for clearer diagnostics. Matches the _translate_transport_error shape
in cli/client.py.
The shared httpx.Client is built lazily at module load (same pattern as
cli/client.py:_get_shared_client) with follow_redirects=True,
max_redirects=5, timeout=HTTP_TIMEOUT_SEC, and our custom transport.
Externally observable behaviour is unchanged: same FetchOutcome
statuses, same manifest format, same conditional GET semantics, same
body-size cap.
Tests migrated from urllib-shaped fakes to httpx-shaped (status_code,
iter_bytes, context manager). Five urllib-specific tests replaced with
httpx equivalents — three transport unit tests + one DNS-rebinding
integration test that verifies host rewrite via monkey-patched
super().handle_request. One test deleted without replacement
(unwrap-URLError-wrapping-an-_UnsafeRedirectError — urllib-specific,
not applicable to httpx).
* Surface curated agnes-metadata enrichment on My Stack tab
GET /api/marketplace/items?tab=my built each curated row from the
on-disk marketplace.json by way of resolve_allowed_plugins, which
doesn't carry the agnes-metadata enrichment columns
(cover_photo_url, video_url, category override, doc_links). The
handler then hard-coded cover_photo_url=None on the synthetic row.
Result: once a user clicked '+ Add to my stack' on a curated card,
the same plugin in tab=my rendered with the gradient placeholder
instead of its cover photo — confusing parity break vs. the curated
tab where the same row goes through MarketplacePluginsRepository
and gets the enriched columns.
Pre-load the enriched marketplace_plugins rows for every marketplace
the user is subscribed to, then look each granted+subscribed plugin
up by (marketplace_id, plugin_name). Fall back to the on-disk
synthetic shape only when the DB row is missing — happens during
the rare race where RBAC is granted before the first sync cycle
ingests the plugin. RBAC gating (granted set from
resolve_allowed_plugins) is unchanged so this fix can't widen
visibility; it just upgrades the data shape behind cards the user
was already going to see.
Per-marketplace list_for_marketplace beats N gets — typical user is
subscribed to <5 marketplaces, so this is at most a handful of
queries vs. one per subscribed plugin.
Regression test seeds a plugin with cover_photo_url + category
override, subscribes the user, hits /api/marketplace/items?tab=my,
and asserts photo_url + category come through. The misleading
'fall through to gradient until the user re-visits the curated tab'
comment is gone.
---------
Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
465 lines
18 KiB
Python
465 lines
18 KiB
Python
"""v20 adds source_query column to table_registry.
|
|
|
|
Backs query_mode='materialized' for BigQuery: admin registers a SQL body
|
|
that the scheduler runs through the DuckDB BQ extension and writes as a
|
|
parquet to /data/extracts/bigquery/data/<id>.parquet.
|
|
|
|
The v19 step (#150) drops dataset_permissions, access_requests tables and
|
|
users.role, table_registry.is_public columns; v20 then ALTERs the post-v19
|
|
table_registry to add the source_query column.
|
|
"""
|
|
import duckdb
|
|
|
|
from src.db import SCHEMA_VERSION, _ensure_schema, get_schema_version
|
|
|
|
|
|
def test_schema_version_is_37():
|
|
# v27 → v28: explicit-install (Model B) for curated marketplace plugins.
|
|
# user_plugin_optouts row presence flips meaning from "excluded" to
|
|
# "subscribed"; migration wipes existing rows so the inverted reading
|
|
# starts from a clean baseline. Also adds marketplace_plugins.created_at
|
|
# (per-plugin "newest first" sort on /marketplace), backfilled from
|
|
# parent marketplace_registry.registered_at.
|
|
# v28 → v29: /home page rollout — instance_templates singleton
|
|
# consolidation (welcome_template + claude_md_template merged) + new
|
|
# users.onboarded column. See tests/test_v29_home_migration.py for
|
|
# the exhaustive coverage of that step.
|
|
# v29 → v30: news_template — single versioned table for the /home
|
|
# news perex + /news permalink page. See
|
|
# tests/test_news_template_repository.py.
|
|
# v30 → v31: session-pipeline framework — session_processor_state
|
|
# replaces session_extraction_state with composite PK.
|
|
# v31 → v32 (PR #233): flea-market upload guardrails — adds
|
|
# store_entities.visibility_status + creates store_submissions.
|
|
# v32 → v33 (PR #233): forensic columns on store_submissions —
|
|
# file_size, bundle_sha256, bundle_purged_at. Underpins the
|
|
# persist-blocked-bundle behavior so admins can Rescan /
|
|
# Override / Download; 30-day TTL purge clears bytes while
|
|
# keeping the row + sha intact. See docs/STORE_GUARDRAILS.md.
|
|
# v33 → v34: drop store_submissions.retry_count — counter mixed LLM
|
|
# error count + admin rescan count, redundant with audit_log.
|
|
# v34 → v35 (PR #233): store_entities gains 'archived' visibility
|
|
# state + archived_at + archived_by audit columns. Owner
|
|
# soft-delete writes 'archived'; existing user_store_installs
|
|
# keep serving the bundle through marketplace.zip / .git.
|
|
# Hard delete (DELETE ?hard=true) remains admin-only.
|
|
# v35 → v36 (PR #233 follow-up): re-apply NOT NULL + DEFAULT 'pending'
|
|
# on store_entities.visibility_status. Lost in the v34→v35
|
|
# column rebuild. Without this, an INSERT that omits the
|
|
# column lands NULL → repo reads None → undefined behavior
|
|
# in the visibility gates. Value-list invariant remains
|
|
# enforced application-side (DuckDB ADD CHECK on existing
|
|
# column not supported).
|
|
# v36 → v37 (this PR): curated marketplace enrichment from
|
|
# `.claude-plugin/agnes-metadata.json` plus mandatory curator
|
|
# identity on marketplace_registry. Adds curator_name +
|
|
# curator_email to marketplace_registry, and
|
|
# cover_photo_url + video_url + doc_links to
|
|
# marketplace_plugins.
|
|
assert SCHEMA_VERSION == 37
|
|
|
|
|
|
def test_v37_marketplace_curator_columns(tmp_path):
|
|
"""Fresh install reaches v37 with the new marketplace columns present."""
|
|
db_path = tmp_path / "system.duckdb"
|
|
conn = duckdb.connect(str(db_path))
|
|
_ensure_schema(conn)
|
|
|
|
registry_cols = {
|
|
r[0] for r in conn.execute(
|
|
"SELECT column_name FROM information_schema.columns "
|
|
"WHERE table_name = 'marketplace_registry'"
|
|
).fetchall()
|
|
}
|
|
assert {"curator_name", "curator_email"} <= registry_cols, (
|
|
f"curator columns missing from marketplace_registry: {registry_cols}"
|
|
)
|
|
|
|
plugin_cols = {
|
|
r[0] for r in conn.execute(
|
|
"SELECT column_name FROM information_schema.columns "
|
|
"WHERE table_name = 'marketplace_plugins'"
|
|
).fetchall()
|
|
}
|
|
assert {"cover_photo_url", "video_url", "doc_links"} <= plugin_cols, (
|
|
f"enrichment columns missing from marketplace_plugins: {plugin_cols}"
|
|
)
|
|
conn.close()
|
|
|
|
|
|
def test_v36_db_migrates_to_v37(tmp_path):
|
|
"""Pre-existing v36 DB (with the v36 schema) upgrades cleanly to v37 without
|
|
losing existing marketplace_registry / marketplace_plugins rows."""
|
|
db_path = tmp_path / "system.duckdb"
|
|
conn = duckdb.connect(str(db_path))
|
|
|
|
# Stand up a minimal v36-shape registry + plugin row, plus the
|
|
# schema_version row that pins us to 36.
|
|
conn.execute(
|
|
"CREATE TABLE schema_version (version INTEGER, "
|
|
"applied_at TIMESTAMP DEFAULT current_timestamp)"
|
|
)
|
|
conn.execute("INSERT INTO schema_version (version) VALUES (36)")
|
|
conn.execute("""CREATE TABLE marketplace_registry (
|
|
id VARCHAR PRIMARY KEY, name VARCHAR NOT NULL,
|
|
url VARCHAR NOT NULL, branch VARCHAR, token_env VARCHAR,
|
|
description TEXT, registered_by VARCHAR,
|
|
registered_at TIMESTAMP DEFAULT current_timestamp,
|
|
last_synced_at TIMESTAMP, last_commit_sha VARCHAR, last_error TEXT
|
|
)""")
|
|
conn.execute("""CREATE TABLE marketplace_plugins (
|
|
marketplace_id VARCHAR NOT NULL, name VARCHAR NOT NULL,
|
|
description TEXT, version VARCHAR, author_name VARCHAR,
|
|
homepage VARCHAR, category VARCHAR, source_type VARCHAR,
|
|
source_spec JSON, raw JSON,
|
|
created_at TIMESTAMP DEFAULT current_timestamp,
|
|
updated_at TIMESTAMP DEFAULT current_timestamp,
|
|
PRIMARY KEY (marketplace_id, name)
|
|
)""")
|
|
conn.execute(
|
|
"INSERT INTO marketplace_registry (id, name, url) "
|
|
"VALUES ('legacy', 'Legacy', 'https://example.com/repo.git')"
|
|
)
|
|
conn.execute(
|
|
"INSERT INTO marketplace_plugins (marketplace_id, name) "
|
|
"VALUES ('legacy', 'foo')"
|
|
)
|
|
|
|
_ensure_schema(conn)
|
|
assert get_schema_version(conn) == SCHEMA_VERSION
|
|
|
|
# New columns exist and existing rows preserved with NULL enrichment.
|
|
row = conn.execute(
|
|
"SELECT curator_name, curator_email FROM marketplace_registry "
|
|
"WHERE id = 'legacy'"
|
|
).fetchone()
|
|
assert row == (None, None)
|
|
|
|
row = conn.execute(
|
|
"SELECT cover_photo_url, video_url, doc_links FROM marketplace_plugins "
|
|
"WHERE marketplace_id = 'legacy' AND name = 'foo'"
|
|
).fetchone()
|
|
assert row == (None, None, None)
|
|
conn.close()
|
|
|
|
|
|
def test_v20_adds_source_query(tmp_path):
|
|
db_path = tmp_path / "system.duckdb"
|
|
conn = duckdb.connect(str(db_path))
|
|
_ensure_schema(conn)
|
|
|
|
cols = {
|
|
r[0] for r in conn.execute(
|
|
"SELECT column_name FROM information_schema.columns "
|
|
"WHERE table_name = 'table_registry'"
|
|
).fetchall()
|
|
}
|
|
assert "source_query" in cols, f"source_query missing from {cols}"
|
|
assert get_schema_version(conn) == SCHEMA_VERSION
|
|
conn.close()
|
|
|
|
|
|
def test_claude_md_template_seeded_in_instance_templates(tmp_path):
|
|
"""v23 introduced claude_md_template as a singleton table; v28 consolidates
|
|
it into instance_templates keyed 'claude_md'. Post-v28 the legacy table is
|
|
dropped — the canonical lookup is `instance_templates WHERE key='claude_md'`.
|
|
|
|
See tests/test_v28_migration.py for the migration path coverage. This test
|
|
just verifies the seeded row is present on a fresh install.
|
|
"""
|
|
db_path = tmp_path / "system.duckdb"
|
|
conn = duckdb.connect(str(db_path))
|
|
_ensure_schema(conn)
|
|
|
|
tables = {
|
|
r[0] for r in conn.execute(
|
|
"SELECT table_name FROM information_schema.tables "
|
|
"WHERE table_schema = 'main'"
|
|
).fetchall()
|
|
}
|
|
assert "instance_templates" in tables
|
|
assert "claude_md_template" not in tables, (
|
|
"claude_md_template should be consolidated away post-v28"
|
|
)
|
|
|
|
row = conn.execute(
|
|
"SELECT key, content FROM instance_templates WHERE key = 'claude_md'"
|
|
).fetchone()
|
|
assert row is not None
|
|
assert row[0] == "claude_md"
|
|
assert row[1] is None # default = no override
|
|
conn.close()
|
|
|
|
|
|
def test_v19_db_migrates_to_v20(tmp_path):
|
|
"""Pre-existing v19 DB (post-RBAC-drop) without source_query upgrades
|
|
cleanly without losing data."""
|
|
db_path = tmp_path / "system.duckdb"
|
|
conn = duckdb.connect(str(db_path))
|
|
|
|
# Simulate a v19 DB at minimal but realistic shape: schema_version row +
|
|
# a table_registry row in the post-v19 column shape (no is_public column,
|
|
# since v19 finalize dropped it via the table-rebuild idiom).
|
|
conn.execute(
|
|
"CREATE TABLE schema_version (version INTEGER, "
|
|
"applied_at TIMESTAMP DEFAULT current_timestamp)"
|
|
)
|
|
conn.execute("INSERT INTO schema_version (version) VALUES (19)")
|
|
conn.execute("""CREATE TABLE table_registry (
|
|
id VARCHAR PRIMARY KEY, name VARCHAR NOT NULL,
|
|
source_type VARCHAR, bucket VARCHAR, source_table VARCHAR,
|
|
sync_strategy VARCHAR DEFAULT 'full_refresh',
|
|
query_mode VARCHAR DEFAULT 'local',
|
|
sync_schedule VARCHAR, profile_after_sync BOOLEAN DEFAULT true,
|
|
primary_key VARCHAR, folder VARCHAR, description TEXT,
|
|
registered_by VARCHAR,
|
|
registered_at TIMESTAMP DEFAULT current_timestamp
|
|
)""")
|
|
conn.execute("INSERT INTO table_registry (id, name) VALUES ('foo', 'foo')")
|
|
|
|
_ensure_schema(conn)
|
|
|
|
assert get_schema_version(conn) == SCHEMA_VERSION # bumped 19→28 forward
|
|
cols = {
|
|
r[0] for r in conn.execute(
|
|
"SELECT column_name FROM information_schema.columns "
|
|
"WHERE table_name = 'table_registry'"
|
|
).fetchall()
|
|
}
|
|
assert "source_query" in cols
|
|
# Existing row preserved, new column NULL
|
|
row = conn.execute(
|
|
"SELECT id, source_query FROM table_registry WHERE id='foo'"
|
|
).fetchone()
|
|
assert row == ("foo", None)
|
|
conn.close()
|
|
|
|
|
|
def _make_v34_store_entities(conn):
|
|
"""Build a minimal v34-shape store_entities table for v34→v35 path tests.
|
|
|
|
Only includes the columns the v34→v35 migration touches; the rest of
|
|
the schema isn't needed because the function operates only on
|
|
store_entities's column set.
|
|
"""
|
|
conn.execute("""
|
|
CREATE TABLE store_entities (
|
|
id VARCHAR PRIMARY KEY,
|
|
visibility_status VARCHAR DEFAULT 'pending'
|
|
)
|
|
""")
|
|
conn.execute(
|
|
"INSERT INTO store_entities (id, visibility_status) VALUES "
|
|
"('a', 'approved'), ('b', 'pending'), ('c', 'hidden')"
|
|
)
|
|
|
|
|
|
def test_v34_to_v35_clean_path_rebuilds_visibility_column(tmp_path):
|
|
"""Standard v34 → v35 path: ``visibility_status`` is present, no temp
|
|
column. Migration rebuilds the column without the legacy CHECK so
|
|
'archived' becomes a valid value, preserves all row values, and adds
|
|
the audit columns.
|
|
"""
|
|
from src.db import _v34_to_v35_migrate
|
|
|
|
db_path = tmp_path / "system.duckdb"
|
|
conn = duckdb.connect(str(db_path))
|
|
_make_v34_store_entities(conn)
|
|
|
|
_v34_to_v35_migrate(conn)
|
|
|
|
cols = {
|
|
r[0] for r in conn.execute(
|
|
"SELECT column_name FROM information_schema.columns "
|
|
"WHERE table_name = 'store_entities'"
|
|
).fetchall()
|
|
}
|
|
assert "visibility_status" in cols
|
|
assert "_vis_v35" not in cols, "temp column must be cleaned up"
|
|
assert "archived_at" in cols
|
|
assert "archived_by" in cols
|
|
|
|
rows = dict(conn.execute(
|
|
"SELECT id, visibility_status FROM store_entities ORDER BY id"
|
|
).fetchall())
|
|
assert rows == {"a": "approved", "b": "pending", "c": "hidden"}, (
|
|
f"row values must survive the rebuild: {rows}"
|
|
)
|
|
conn.close()
|
|
|
|
|
|
def test_v34_to_v35_recovers_from_partial_rebuild_missing_visibility(tmp_path):
|
|
"""Partial-rebuild recovery: a previous migration attempt completed
|
|
steps 3-5 (added _vis_v35, copied values, dropped visibility_status)
|
|
but failed before step 6 (RENAME). Subsequent restarts hit
|
|
DROP visibility_status (no IF EXISTS guard) and looped on the same
|
|
error, leaving the DB stranded with schema_version stuck pre-v35.
|
|
|
|
The new code detects this state — _vis_v35 present, visibility_status
|
|
absent — and finishes the rebuild with the RENAME alone instead of
|
|
re-running the full destructive sequence.
|
|
"""
|
|
from src.db import _v34_to_v35_migrate
|
|
|
|
db_path = tmp_path / "system.duckdb"
|
|
conn = duckdb.connect(str(db_path))
|
|
# Hand-build the broken state: store_entities with _vis_v35 instead of
|
|
# visibility_status, populated with the canonical values.
|
|
conn.execute("""
|
|
CREATE TABLE store_entities (
|
|
id VARCHAR PRIMARY KEY,
|
|
_vis_v35 VARCHAR
|
|
)
|
|
""")
|
|
conn.execute(
|
|
"INSERT INTO store_entities (id, _vis_v35) VALUES "
|
|
"('a', 'approved'), ('b', 'pending'), ('c', 'hidden')"
|
|
)
|
|
|
|
_v34_to_v35_migrate(conn)
|
|
|
|
cols = {
|
|
r[0] for r in conn.execute(
|
|
"SELECT column_name FROM information_schema.columns "
|
|
"WHERE table_name = 'store_entities'"
|
|
).fetchall()
|
|
}
|
|
assert "visibility_status" in cols
|
|
assert "_vis_v35" not in cols
|
|
assert "archived_at" in cols
|
|
assert "archived_by" in cols
|
|
|
|
rows = dict(conn.execute(
|
|
"SELECT id, visibility_status FROM store_entities ORDER BY id"
|
|
).fetchall())
|
|
assert rows == {"a": "approved", "b": "pending", "c": "hidden"}, (
|
|
f"row values must come back via RENAME, not be lost: {rows}"
|
|
)
|
|
conn.close()
|
|
|
|
|
|
def test_v34_to_v35_recovers_from_partial_rebuild_both_columns(tmp_path):
|
|
"""Edge state: a prior attempt aborted before the DROP, leaving both
|
|
visibility_status (canonical) and _vis_v35 (temp) on the table.
|
|
The recovery path drops _vis_v35 and keeps visibility_status — the
|
|
rest of the schema expects that name.
|
|
"""
|
|
from src.db import _v34_to_v35_migrate
|
|
|
|
db_path = tmp_path / "system.duckdb"
|
|
conn = duckdb.connect(str(db_path))
|
|
conn.execute("""
|
|
CREATE TABLE store_entities (
|
|
id VARCHAR PRIMARY KEY,
|
|
visibility_status VARCHAR,
|
|
_vis_v35 VARCHAR
|
|
)
|
|
""")
|
|
conn.execute(
|
|
"INSERT INTO store_entities (id, visibility_status, _vis_v35) VALUES "
|
|
"('a', 'approved', 'approved')"
|
|
)
|
|
|
|
_v34_to_v35_migrate(conn)
|
|
|
|
cols = {
|
|
r[0] for r in conn.execute(
|
|
"SELECT column_name FROM information_schema.columns "
|
|
"WHERE table_name = 'store_entities'"
|
|
).fetchall()
|
|
}
|
|
assert "visibility_status" in cols
|
|
assert "_vis_v35" not in cols, "temp column must be dropped"
|
|
|
|
row = conn.execute(
|
|
"SELECT id, visibility_status FROM store_entities WHERE id = 'a'"
|
|
).fetchone()
|
|
assert row == ("a", "approved")
|
|
conn.close()
|
|
|
|
|
|
def test_v32_db_with_partial_v35_recovers_through_full_ladder(tmp_path):
|
|
"""End-to-end: a DB stranded at schema_version=32 with the half-applied
|
|
v34→v35 state (visibility_status dropped, _vis_v35 left behind) must
|
|
upgrade cleanly through the full ladder when ``_ensure_schema`` runs.
|
|
|
|
This is the production scenario observed in operator instances after
|
|
the original list-form ``_V34_TO_V35_MIGRATIONS`` failed mid-run on
|
|
a fresh restart.
|
|
"""
|
|
db_path = tmp_path / "system.duckdb"
|
|
conn = duckdb.connect(str(db_path))
|
|
|
|
# Stand up the broken state. We only need enough of the schema for the
|
|
# migration ladder to run — ``_ensure_schema`` will create the rest
|
|
# via ``_SYSTEM_SCHEMA``'s IF NOT EXISTS guards.
|
|
conn.execute(
|
|
"CREATE TABLE schema_version (version INTEGER, "
|
|
"applied_at TIMESTAMP DEFAULT current_timestamp)"
|
|
)
|
|
conn.execute("INSERT INTO schema_version (version) VALUES (32)")
|
|
conn.execute("""
|
|
CREATE TABLE store_entities (
|
|
id VARCHAR PRIMARY KEY,
|
|
owner_user_id VARCHAR,
|
|
owner_username VARCHAR,
|
|
type VARCHAR,
|
|
name VARCHAR,
|
|
archived_at TIMESTAMP,
|
|
archived_by VARCHAR,
|
|
_vis_v35 VARCHAR
|
|
)
|
|
""")
|
|
conn.execute(
|
|
"INSERT INTO store_entities (id, type, name, _vis_v35) "
|
|
"VALUES ('a', 'skill', 'alpha', 'approved')"
|
|
)
|
|
|
|
_ensure_schema(conn)
|
|
|
|
assert get_schema_version(conn) == SCHEMA_VERSION
|
|
cols = {
|
|
r[0] for r in conn.execute(
|
|
"SELECT column_name FROM information_schema.columns "
|
|
"WHERE table_name = 'store_entities'"
|
|
).fetchall()
|
|
}
|
|
assert "visibility_status" in cols
|
|
assert "_vis_v35" not in cols
|
|
# Existing row preserved, value carried over from _vis_v35.
|
|
row = conn.execute(
|
|
"SELECT id, visibility_status FROM store_entities WHERE id = 'a'"
|
|
).fetchone()
|
|
assert row == ("a", "approved")
|
|
conn.close()
|
|
|
|
|
|
def test_v35_to_v36_reapplies_visibility_constraints(tmp_path):
|
|
"""v34→v35 dropped NOT NULL + DEFAULT when rebuilding the column to
|
|
drop the legacy CHECK; v35→v36 re-applies them. Verifies that on a
|
|
freshly migrated DB, an INSERT omitting visibility_status either
|
|
inherits the default 'pending' or fails — never lands NULL.
|
|
"""
|
|
db_path = tmp_path / "system.duckdb"
|
|
conn = duckdb.connect(str(db_path))
|
|
_ensure_schema(conn)
|
|
assert get_schema_version(conn) == SCHEMA_VERSION
|
|
|
|
cols = conn.execute(
|
|
"SELECT column_name, is_nullable, column_default "
|
|
"FROM information_schema.columns "
|
|
"WHERE table_name = 'store_entities' "
|
|
" AND column_name = 'visibility_status'"
|
|
).fetchall()
|
|
assert cols, "visibility_status column missing from store_entities"
|
|
name, is_nullable, default_expr = cols[0]
|
|
assert is_nullable == "NO", (
|
|
f"visibility_status must be NOT NULL after v36; got is_nullable={is_nullable!r}"
|
|
)
|
|
# DuckDB renders the default as a quoted literal — match either form.
|
|
assert default_expr is not None, "visibility_status DEFAULT must be set"
|
|
assert "pending" in str(default_expr).lower(), (
|
|
f"visibility_status DEFAULT must be 'pending'; got {default_expr!r}"
|
|
)
|
|
|
|
conn.close()
|