* Curated marketplace enrichment via agnes-metadata.json + curator metadata
Adds a second well-defined metadata file `.claude-plugin/agnes-metadata.json`
that upstream marketplace repos can opt into, providing per-plugin (and
per-skill / per-agent) cover photo, demo video URL, doc links, and
category override. The Claude Code marketplace contract is untouched —
agnes-metadata.json + the convention `.agnes/` directory are stripped
from the synthetic Claude Code marketplace served via /marketplace.zip
and /marketplace.git/*, so user instances see a clean Claude Code repo
with no Agnes-only metadata.
Highlights:
- DB schema v32 — adds curator_name + curator_email on marketplace_registry,
cover_photo_url + video_url + doc_links on marketplace_plugins.
- Mandatory curator at marketplace registration, editable later through
the admin UI; surfaces on cards + detail pages in place of owner_todo.
- External-asset mirror cache at ${DATA_DIR}/marketplace-cache/<slug>/
with conditional GET, 60s timeout, 10 MB body cap, SSRF guards, and
Wikipedia-policy-compliant User-Agent.
- Strict drop semantics — anything Agnes can't deliver as a real PDF /
Markdown / plain text doc, or a real PNG / JPEG / WebP cover, is
dropped from the served metadata; UI looks identical to no-entry case
(gradient placeholder for missing covers, no row in the doc list).
- Doc allowlist + image allowlist enforced on both the curated mirror
flow and the Flea upload flow (/store/new); shared module
src/marketplace_assets.py.
- New /api/marketplace/curated/{mp}/{plugin}/{asset,doc,mirrored}/...
endpoints with path-traversal guards + RBAC + Content-Disposition
attachment for docs.
- Curator-focused format guide at /marketplace/format-guide; canonical
source is docs/curated-marketplace-format.md, also linked from the
admin /admin/marketplaces page next to + Add Marketplace.
See CHANGELOG.md under [Unreleased] for the full breakdown.
* Fix format-guide test assertion to match shortened disclaimer
The 'Flea Market' phrase was trimmed out of the disclaimer in
docs/curated-marketplace-format.md after the curator-focused rewrite.
Update the rendered-HTML test to assert the channel-scoping phrase
that's actually present ('Curated Marketplace channel only') rather
than the 'Flea Market' contrast that's no longer in the doc.
* Drop unused 'version' field from agnes-metadata.json schema
The parser never read it; it was a YAGNI placeholder for future
schema evolution. Curators don't need to wonder what to put there
when adding the file for the first time. Will be re-added if and
when we actually introduce a backwards-incompatible schema change.
* Harden asset mirror against SSRF via redirect + DNS rebinding
The pre-flight _is_safe_url check validated only the initial URL;
urllib.request.urlopen then followed redirects and re-resolved DNS for
the actual connection — both bypassable. Attacker-controlled origin
could 302 to http://169.254.169.254/... and exfil cloud metadata;
attacker-controlled DNS could return public IP first / 127.0.0.1 second.
Replace urlopen call with a shared OpenerDirector wired through three
custom handlers: _SafeRedirectHandler re-runs SSRF allowlist on every
redirect Location (max 5 hops, down from urllib's 10), and
_PinnedHTTPHandler / _PinnedHTTPSHandler connect to the IP that passed
validation rather than re-resolving the hostname. TLS SNI + cert verify
stay bound to the original hostname.
_resolve_safe returns the validated IP (the existing _is_safe_url
2-tuple wrapper stays for backwards compatibility) and rejects round-
robin DNS that mixes a public + private record. _UnsafeRedirectError
is a typed exception so _fetch_url can map redirect blocks to terminal
'rejected' status (not transient 'failed'). _http_open is the single
call site so tests can mock at one well-defined seam.
Tests cover redirect blocking (link-local, loopback), redirect-error
unwrapping inside URLError, pinned-IP connection target, and the
end-to-end DNS-rebinding scenario. Existing tests that mocked
urllib.request.urlopen are migrated to mock _http_open.
* Harden /asset/ endpoint against stored XSS
The endpoint served any file in the cloned marketplace repo with
stdlib-detected Content-Type, so a curator who landed evil.html (or a
renamed evil.png carrying HTML bytes) in the working tree got a
same-origin XSS — the response shares cookie scope with /admin and
/api/me/*.
The asset endpoint is image-only by contract (cover photos referenced
from agnes-metadata.json + inner skill / agent cards), so applying the
same allowlist + magic-bytes pattern that /doc/ already uses closes
the gap without breaking any legitimate use case. Three layered
checks: extension in IMAGE_EXTENSIONS (.png/.jpg/.jpeg/.webp; SVG
excluded — <script> inside SVG executes), validate_image_file magic
bytes (defeats rename-extension attack), Content-Type pinned from the
validated extension (never stdlib mimetypes).
Defense-in-depth: X-Content-Type-Options: nosniff stops browser MIME
sniffing; Content-Security-Policy: default-src 'none' blocks script /
iframe execution even if a future regression let HTML through.
Tests cover the .html extension reject, the renamed-HTML-as-PNG magic-
bytes reject, the .svg reject, and the happy-path PNG with security
headers attached. The pre-existing path-traversal test seeds a real
PNG instead of ok.txt now that the endpoint is image-only.
* Enforce mandatory curator on marketplace PATCH
The POST handler enforced curator_name + curator_email at create time,
but PATCH treated empty / missing curator inputs as 'no change'. Legacy
rows that pre-date v32 (curator_name=NULL) could be edited indefinitely
without ever filling the curator gap, and OWNER_TODO_PLACEHOLDER lingered
on every /marketplace card.
Reject the PATCH with 400 when the post-merge row would persist with
empty curator. The check fires after the existing field-merge logic, so
once-filled rows that don't touch curator still pass through (their
existing values fall through from the DB row). DB column stays nullable
so untouched legacy rows continue to coexist — the gate fires only the
moment an admin opens the edit modal.
Existing PATCH semantics preserved: empty-string input still means 'leave
existing value alone', and once-filled curator can't be cleared (those
test cases pass unchanged). New test seeds a legacy row directly via the
repository, then exercises url-only PATCH (rejected), partial-fill PATCH
(rejected), and full-fill PATCH (succeeds); a follow-up no-curator PATCH
on the now-formed row also passes.
* Drop unused curated-marketplace helpers (PR #234 review)
* build_db_payload — imported by src/marketplace.py but never called.
The strict-drop semantics it would have implemented were re-written
inline in _refresh_plugin_cache (see the comment block there). The
standalone helper still carried the old fall-back-to-original-external-
URL-on-mirror-failure behaviour, which contradicts the documented
drop-when-can't-deliver contract — a future contributor who re-wired
it would have introduced a silent regression. Delete with the helper
+ the import + the comment that referenced it.
* _resolve_marketplace_name — one-line shim with no remaining call
sites. Callers use _resolve_marketplace_meta which returns name +
curator together, avoiding the double DB hit the shim exists to
hide.
* '# noqa: F401 Optional kept for forward-compat' was wrong — Optional
IS used in src/marketplace.py (line 70 and line 238). Drop the noqa
comment so a future ruff run doesn't try to remove a real import.
Removing build_db_payload also drops the only remaining use of Optional
in src/marketplace_metadata.py, so the import comes out there too.
* Cap agnes-metadata.json size + catch RecursionError on parse
The reader is invoked once per marketplace per sync and the file is
curator-controlled. Two failure modes were unguarded:
* Multi-GB JSON: path.read_text() pulled the whole file into memory
before json.loads even ran. A curator with commit access to an
upstream repo could OOM the sync worker.
* Deeply-nested JSON under any size cap: cpython's recursive object /
array parser raises RecursionError at ~1000 levels of depth.
RecursionError is a RuntimeError, not ValueError, so the existing
catch let it propagate up and abort the entire sync — every other
marketplace in the same pass got skipped.
Add AGNES_METADATA_MAX_BYTES = 1 MiB (a real metadata file with covers,
docs, categories for ~50 plugins fits in <100 KB so the cap is
generous) and gate the size check on path.stat().st_size before the
body read. Broaden the parse except to (ValueError, RecursionError)
with a unified log line. Both failure modes degrade to the same
empty-dict fall-back the malformed-JSON path already used, so one bad
upstream never aborts the rest of the sync.
Tests cover the size cap firing before json.loads (whitespace-padded
valid JSON exceeding the cap) and the recursion path (5000 nested
arrays — past cpython's default recursion limit but well under the
size cap).
* Persist asset-mirror manifest per body write, before unlink
sync_assets wrote each body atomically (tmp + rename) but persisted
the manifest only at the end of the batch. A kill -9 mid-Phase 2 left
on-disk files the manifest never referenced. Once a curator dropped
that URL from agnes-metadata.json, Phase 3's cleanup had no record of
the file and the orphan stayed forever — there's no GC pass walking
the cache dir today, so disk would slowly bloat.
Phase 2 (body-write iteration): after the in-memory manifest mutation,
persist BEFORE unlinking the previous body. The crash window narrows
from 'all of Phase 2' to 'between persist and unlink' (microseconds).
A persist failure mid-batch keeps the previous body on disk — the on-
disk manifest still references it, and a stale-but-existing file beats
a 404. Cost: one extra tmp+rename per body write; manifest is a few KB
so the overhead is negligible vs. the HTTP fetches.
Phase 3 (curator-removed URLs): same discipline. Collect the to-delete
relpaths, persist the manifest with the entries already gone, THEN
unlink. A crash mid-cleanup leaves at most a microsecond window where
files exist despite the manifest no longer naming them. The next sync
reads the (correct) manifest and the orphan stays orphaned, but the
served state is consistent.
Tests cover per-body persist call count, the post-update on-disk
manifest content, and Phase 3 ordering verified by reading the on-disk
manifest from inside Path.unlink.
* Consolidate marketplace video embeds + format-guide CSS
The YouTube nocookie / Vimeo / <video> / link-fallback detection logic
was duplicated verbatim in marketplace_plugin_detail.html and
marketplace_item_detail.html (~40 JS lines each, with subtly-different
inline styles). Both templates now {% include %} a single
_marketplace_video_embed.html partial inside their IIFE so the regex,
the nocookie attribute set, and the unknown-host link fallback live in
ONE place — future tweaks (new host, new attribute, fixed sandbox flag)
no longer need to be applied twice in lockstep.
The .video-wrap selectors (one inline <style> rule in plugin_detail,
one inline style='...' attribute in item_detail) are replaced by the
existing .video-embed 16:9 wrapper in style-custom.css, with new
.video-embed video / .video-embed a child rules added so the wrapper
handles all four embed shapes uniformly without per-template
positioning.
The 60-line inline <style> block in marketplace_format_guide.html
moves verbatim to style-custom.css under a new 'Marketplace format
guide page' section, scoped to .format-guide so other pages aren't
affected.
No user-visible behaviour change: the rendered HTML for valid
YouTube / Vimeo / mp4 / external links is byte-identical to before,
and the format-guide page renders the same.
* Maintainability cleanup batch (PR #234 review)
#10: drop _path_under from app/api/marketplace.py — it was a byte-
equivalent clone of _safe_join (same Path.resolve(strict=True) +
relative_to() containment check). The three v32 endpoint handlers
(/asset, /doc, /mirrored) now share the existing helper.
#14: rename src/marketplace_assets.py → src/marketplace_asset_validation.py
so the file's purpose is obvious from the name and the previous
overlap with src/marketplace_asset_mirror.py is gone. Six call-site
imports updated in lockstep; CHANGELOG references under [Unreleased]
updated to track the new path.
#11: consolidate the URL builders that resolve
/api/marketplace/curated/<slug>/<plugin>/{asset,doc,mirrored}/...
paths. _internal_asset_url / _internal_doc_url / _mirrored_asset_url
lived in src/marketplace.py, while a copy named _mirrored_url lived
in app/api/marketplace.py with a 'must stay aligned' comment. New
module src/marketplace_urls.py is the single source of truth — both
call sites import from it and a future URL-format tweak only needs
to change one file. The _ROUTE_PREFIX constant collapses the per-
function f-string repetition. The route-handler endpoints themselves
still own the path string literals (keeping the builders identical
to the route declarations remains a checklist item, not a runtime
guarantee).
* Re-key asset-mirror manifest by (plugin, url) + dedup HTTP fetches
The manifest used to be keyed by URL alone, so two plugins in the
same marketplace referencing the same external image (a shared CDN
icon, a common cover) collided on entry.plugin_name — last writer
won. The DB row for the losing plugin then stored a served URL
pointing under the winning plugin's tree, and require_resource_access
denied legitimate access on one side and let the other plugin's user
reach the wrong asset.
In-memory: Dict[Tuple[str, str], MirrorEntry] keyed (plugin_name, url).
On disk: format flips from {url: entry} dict to [entry, ...] list of
self-describing entries (each carries plugin_name + url + the
previous fields). JSON keys can't be tuples; encoding 'plugin::url'
would just shift the parsing burden.
Phase 1 of sync_assets deduplicates fetches by URL — three plugins
sharing one URL share one HTTP request. The conditional-GET prior is
picked from any owning plugin's prior entry; if their etags diverge
(rare) we miss one 304 and pay for a full re-download instead.
Phase 2 still creates a per-(plugin, url) manifest entry pointing
under the plugin's own subdir, and Phase 3 cleanup is keyed the same
way so dropping a URL from one plugin's metadata doesn't disturb
another plugin still referencing it.
Body files stay per plugin (RBAC-clean isolation: deleting plugin A's
cache can't strand plugin B). Bandwidth saved by fetch dedup.
Consumer code re-keyed: src.marketplace._refresh_plugin_cache rebuilt
served_url_for / mirror_status as composite-keyed maps;
app.api.marketplace._resolve_external_via_mirror /
_curated_inner_cover / _curated_inner_enrichment look up by
(plugin_name, url).
Tests cover per-plugin manifest entries with shared URL, the single
HTTP fetch for N plugins, and Phase 3 drop-one-keep-other. All
existing tests migrated to composite key access; v2 list format
assertions verify on-disk shape.
* Migrate asset mirror from urllib.request to httpx
The asset mirror was the only HTTP call site in Agnes still using
urllib.request; every other module (CLI, Jira / OpenMetadata / OpenAI
connectors, scheduler, Telegram bot) already used httpx. The asset
mirror was added in this PR's base commit, so this is the only chance
to bring it into convention before someone copies it as 'the pattern
for HTTP fetches in Agnes'.
Three concrete benefits beyond consistency:
* SSRF defence collapses from five urllib classes
(_PinnedHTTPConnection, _PinnedHTTPSConnection, _PinnedHTTPHandler,
_PinnedHTTPSHandler, _SafeRedirectHandler) into one
_SSRFGuardTransport. httpx invokes handle_request() on every redirect
hop, so re-validation is free — we don't need a custom redirect
handler at all.
* DNS-rebinding defence: the transport rewrites request.url.host to the
SSRF-validated IP before delegating to super().handle_request().
httpcore connects to whatever URL.host says, so this pins the
connection without subclassing HTTPSConnection. The original hostname
goes into the Host header + the sni_hostname extension so TLS / vhost
routing still bind to the curator-supplied hostname.
* Error handling: one httpx.HTTPError catch-all for transport errors,
plus specific httpx.TimeoutException / httpx.TooManyRedirects branches
for clearer diagnostics. Matches the _translate_transport_error shape
in cli/client.py.
The shared httpx.Client is built lazily at module load (same pattern as
cli/client.py:_get_shared_client) with follow_redirects=True,
max_redirects=5, timeout=HTTP_TIMEOUT_SEC, and our custom transport.
Externally observable behaviour is unchanged: same FetchOutcome
statuses, same manifest format, same conditional GET semantics, same
body-size cap.
Tests migrated from urllib-shaped fakes to httpx-shaped (status_code,
iter_bytes, context manager). Five urllib-specific tests replaced with
httpx equivalents — three transport unit tests + one DNS-rebinding
integration test that verifies host rewrite via monkey-patched
super().handle_request. One test deleted without replacement
(unwrap-URLError-wrapping-an-_UnsafeRedirectError — urllib-specific,
not applicable to httpx).
* Surface curated agnes-metadata enrichment on My Stack tab
GET /api/marketplace/items?tab=my built each curated row from the
on-disk marketplace.json by way of resolve_allowed_plugins, which
doesn't carry the agnes-metadata enrichment columns
(cover_photo_url, video_url, category override, doc_links). The
handler then hard-coded cover_photo_url=None on the synthetic row.
Result: once a user clicked '+ Add to my stack' on a curated card,
the same plugin in tab=my rendered with the gradient placeholder
instead of its cover photo — confusing parity break vs. the curated
tab where the same row goes through MarketplacePluginsRepository
and gets the enriched columns.
Pre-load the enriched marketplace_plugins rows for every marketplace
the user is subscribed to, then look each granted+subscribed plugin
up by (marketplace_id, plugin_name). Fall back to the on-disk
synthetic shape only when the DB row is missing — happens during
the rare race where RBAC is granted before the first sync cycle
ingests the plugin. RBAC gating (granted set from
resolve_allowed_plugins) is unchanged so this fix can't widen
visibility; it just upgrades the data shape behind cards the user
was already going to see.
Per-marketplace list_for_marketplace beats N gets — typical user is
subscribed to <5 marketplaces, so this is at most a handful of
queries vs. one per subscribed plugin.
Regression test seeds a plugin with cover_photo_url + category
override, subscribes the user, hits /api/marketplace/items?tab=my,
and asserts photo_url + category come through. The misleading
'fall through to gradient until the user re-visits the curated tab'
comment is gone.
---------
Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
437 lines
17 KiB
Python
437 lines
17 KiB
Python
"""End-to-end smoke tests for v32 endpoints + Flea allowlist behavior.
|
|
|
|
* GET /marketplace/format-guide — auth-gated (any logged-in user, no admin),
|
|
renders the markdown source.
|
|
* /api/marketplace/curated/.../asset/{path} — path-traversal guard, RBAC.
|
|
* Flea upload allowlist on /api/store/entities (photo + doc).
|
|
"""
|
|
|
|
from __future__ import annotations
|
|
|
|
import io
|
|
from pathlib import Path
|
|
|
|
import pytest
|
|
|
|
|
|
# --- /marketplace/format-guide -------------------------------------------
|
|
|
|
|
|
def test_format_guide_requires_login(seeded_app):
|
|
"""Anonymous user gets redirected (302) to /login — no public access."""
|
|
client = seeded_app["client"]
|
|
r = client.get("/marketplace/format-guide", follow_redirects=False)
|
|
# The guide endpoint is wrapped by the same auth dependency the rest of
|
|
# /marketplace/* uses; the exact response is a 302/303 to /login or a
|
|
# 401 depending on the auth provider. Either way it's not a 200.
|
|
assert r.status_code in (302, 303, 307, 401)
|
|
|
|
|
|
def test_format_guide_renders_for_logged_in_user(seeded_app):
|
|
"""Any logged-in user (admin in this seeded fixture) sees the rendered page.
|
|
|
|
The rendered HTML must:
|
|
* carry the curator-focused title (post-walkthrough rewrite),
|
|
* include the disclaimer scoping the guide to the Curated Marketplace
|
|
channel only (so a curator doesn't think it applies to the Flea
|
|
upload wizard),
|
|
* render the fenced JSON example as a <pre><code> block.
|
|
"""
|
|
client = seeded_app["client"]
|
|
headers = {"Authorization": f"Bearer {seeded_app['admin_token']}"}
|
|
r = client.get("/marketplace/format-guide", headers=headers)
|
|
assert r.status_code == 200
|
|
body = r.text
|
|
# Title points at curators (rewrite from the original "format guide").
|
|
assert "Curated Marketplace" in body
|
|
# Channel-scoping disclaimer near the top of the page.
|
|
assert "Curated Marketplace channel only" in body
|
|
# JSON examples render as code blocks.
|
|
assert "<pre>" in body or "<code" in body
|
|
|
|
|
|
# --- /api/store/entities — Flea allowlist enforcement --------------------
|
|
|
|
|
|
@pytest.fixture
|
|
def _flea_zip_for_skill(tmp_path):
|
|
"""Return the bytes of a minimal valid skill .zip for /entities upload.
|
|
|
|
Mirrors the layout `app/api/store.py::_validate_and_extract_metadata`
|
|
expects: a single SKILL.md at the top of the archive with frontmatter
|
|
declaring `name` + `description`.
|
|
"""
|
|
import zipfile
|
|
buf = io.BytesIO()
|
|
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
|
|
zf.writestr(
|
|
"SKILL.md",
|
|
"---\nname: testskill\ndescription: A test skill\n---\nbody\n",
|
|
)
|
|
return buf.getvalue()
|
|
|
|
|
|
def test_flea_doc_upload_rejects_docx(seeded_app, _flea_zip_for_skill):
|
|
"""v32: Flea doc upload allowlist — .docx is rejected with 415."""
|
|
client = seeded_app["client"]
|
|
headers = {"Authorization": f"Bearer {seeded_app['admin_token']}"}
|
|
r = client.post(
|
|
"/api/store/entities",
|
|
headers=headers,
|
|
files=[
|
|
("file", ("skill.zip", _flea_zip_for_skill, "application/zip")),
|
|
("docs", ("notes.docx", b"PKfake-docx-content", "application/vnd.openxmlformats")),
|
|
],
|
|
data={"type": "skill", "version": "1.0"},
|
|
)
|
|
assert r.status_code == 415
|
|
assert "unsupported_doc_type" in r.text or "doc_extension" in r.text
|
|
|
|
|
|
def test_flea_doc_upload_accepts_pdf(seeded_app, _flea_zip_for_skill):
|
|
"""A real PDF (with %PDF magic bytes) makes it through both extension +
|
|
body validation."""
|
|
client = seeded_app["client"]
|
|
headers = {"Authorization": f"Bearer {seeded_app['admin_token']}"}
|
|
pdf_body = b"%PDF-1.4\n% comment\n"
|
|
r = client.post(
|
|
"/api/store/entities",
|
|
headers=headers,
|
|
files=[
|
|
("file", ("skill.zip", _flea_zip_for_skill, "application/zip")),
|
|
("docs", ("setup.pdf", pdf_body, "application/pdf")),
|
|
],
|
|
data={"type": "skill", "version": "1.0"},
|
|
)
|
|
assert r.status_code == 201, r.text
|
|
|
|
|
|
def test_flea_doc_upload_rejects_pdf_with_bad_magic_bytes(seeded_app, _flea_zip_for_skill):
|
|
"""Defense in depth: a file named .pdf but with non-PDF body bytes is
|
|
rejected by the magic-bytes check."""
|
|
client = seeded_app["client"]
|
|
headers = {"Authorization": f"Bearer {seeded_app['admin_token']}"}
|
|
r = client.post(
|
|
"/api/store/entities",
|
|
headers=headers,
|
|
files=[
|
|
("file", ("skill.zip", _flea_zip_for_skill, "application/zip")),
|
|
("docs", ("evil.pdf", b"not a pdf at all", "application/pdf")),
|
|
],
|
|
data={"type": "skill", "version": "1.0"},
|
|
)
|
|
assert r.status_code == 415
|
|
assert "magic_bytes" in r.text or "unsupported" in r.text
|
|
|
|
|
|
def test_flea_photo_upload_rejects_svg(seeded_app, _flea_zip_for_skill):
|
|
"""SVG photos are rejected — extension check (svg not in allowlist) fires
|
|
first and returns 415 with photo_unsupported_format."""
|
|
client = seeded_app["client"]
|
|
headers = {"Authorization": f"Bearer {seeded_app['admin_token']}"}
|
|
r = client.post(
|
|
"/api/store/entities",
|
|
headers=headers,
|
|
files=[
|
|
("file", ("skill.zip", _flea_zip_for_skill, "application/zip")),
|
|
("photo", ("logo.svg", b"<svg></svg>", "image/svg+xml")),
|
|
],
|
|
data={"type": "skill", "version": "1.0"},
|
|
)
|
|
assert r.status_code == 415
|
|
assert "photo_unsupported_format" in r.text
|
|
|
|
|
|
# --- /api/marketplace/curated/.../asset path-traversal guard -------------
|
|
|
|
|
|
_PNG_1x1 = (
|
|
b"\x89PNG\r\n\x1a\n"
|
|
b"\x00\x00\x00\rIHDR"
|
|
b"\x00\x00\x00\x01\x00\x00\x00\x01\x08\x06\x00\x00\x00"
|
|
b"\x1f\x15\xc4\x89"
|
|
b"\x00\x00\x00\nIDATx\x9cc\x00\x01\x00\x00\x05\x00\x01"
|
|
b"\r\n-\xb4"
|
|
b"\x00\x00\x00\x00IEND\xaeB`\x82"
|
|
)
|
|
|
|
|
|
def test_curated_asset_endpoint_blocks_path_traversal(seeded_app, monkeypatch, tmp_path):
|
|
"""Hitting the asset endpoint with `../` segments must return 404 (not the
|
|
file outside the marketplace dir). The route's ``Path.resolve(strict=True)
|
|
+ is_relative_to`` guard is the load-bearing check."""
|
|
from src.db import get_system_db
|
|
from src.repositories.marketplace_registry import MarketplaceRegistryRepository
|
|
from src.repositories.resource_grants import ResourceGrantsRepository
|
|
|
|
# Stand up a tiny on-disk marketplace at the configured marketplaces dir.
|
|
# The asset endpoint is image-only post-#234-review; the seeded fixture
|
|
# carries a real PNG (with magic bytes) so the sanity-200 case still
|
|
# works alongside the path-traversal regression check.
|
|
data_dir = Path(seeded_app["env"]["data_dir"])
|
|
repo_root = data_dir / "marketplaces" / "test-mp"
|
|
repo_root.mkdir(parents=True, exist_ok=True)
|
|
(repo_root / "ok.png").write_bytes(_PNG_1x1)
|
|
|
|
# Register the marketplace (with curator) + grant Admin access to its
|
|
# synthetic plugin so the resource_grants check passes.
|
|
conn = get_system_db()
|
|
try:
|
|
MarketplaceRegistryRepository(conn).register(
|
|
id="test-mp", name="Test", url="https://example.com/x.git",
|
|
curator_name="C", curator_email="c@example.com",
|
|
)
|
|
# Insert a plugin row so the RBAC check has a target. Resource id is
|
|
# `<slug>/<plugin>`.
|
|
conn.execute(
|
|
"INSERT OR REPLACE INTO marketplace_plugins "
|
|
"(marketplace_id, name) VALUES ('test-mp', 'demo')"
|
|
)
|
|
admin_group = conn.execute(
|
|
"SELECT id FROM user_groups WHERE name = 'Admin'"
|
|
).fetchone()
|
|
if admin_group:
|
|
try:
|
|
ResourceGrantsRepository(conn).create(
|
|
group_id=admin_group[0],
|
|
resource_type="marketplace_plugin",
|
|
resource_id="test-mp/demo",
|
|
assigned_by="test",
|
|
)
|
|
except Exception:
|
|
pass # already granted from a prior test run
|
|
finally:
|
|
conn.close()
|
|
|
|
client = seeded_app["client"]
|
|
headers = {"Authorization": f"Bearer {seeded_app['admin_token']}"}
|
|
|
|
# Sanity: a normal in-tree image fetch returns 200.
|
|
r = client.get(
|
|
"/api/marketplace/curated/test-mp/demo/asset/ok.png",
|
|
headers=headers,
|
|
)
|
|
assert r.status_code == 200
|
|
assert r.content == _PNG_1x1
|
|
|
|
# Traversal attempt: `..` segments. FastAPI's path param accepts the
|
|
# value; our ``_safe_join`` resolves it then checks containment, so the
|
|
# result is 404 (not 200, not 500).
|
|
r = client.get(
|
|
"/api/marketplace/curated/test-mp/demo/asset/../escape.txt",
|
|
headers=headers,
|
|
)
|
|
assert r.status_code == 404
|
|
|
|
|
|
# --- /asset/{path} XSS hardening (#234 review #3) -------------------------
|
|
|
|
|
|
def _seed_asset_marketplace(seeded_app, *, slug="xss-mp", plugin="demo"):
|
|
"""Helper: register marketplace + grant Admin RBAC. Returns repo_root."""
|
|
from src.db import get_system_db
|
|
from src.repositories.marketplace_registry import MarketplaceRegistryRepository
|
|
from src.repositories.resource_grants import ResourceGrantsRepository
|
|
|
|
data_dir = Path(seeded_app["env"]["data_dir"])
|
|
repo_root = data_dir / "marketplaces" / slug
|
|
repo_root.mkdir(parents=True, exist_ok=True)
|
|
|
|
conn = get_system_db()
|
|
try:
|
|
try:
|
|
MarketplaceRegistryRepository(conn).register(
|
|
id=slug, name=slug, url=f"https://example.com/{slug}.git",
|
|
curator_name="C", curator_email="c@example.com",
|
|
)
|
|
except Exception:
|
|
pass # already registered
|
|
conn.execute(
|
|
"INSERT OR REPLACE INTO marketplace_plugins "
|
|
"(marketplace_id, name) VALUES (?, ?)", [slug, plugin],
|
|
)
|
|
admin_group = conn.execute(
|
|
"SELECT id FROM user_groups WHERE name = 'Admin'"
|
|
).fetchone()
|
|
if admin_group:
|
|
try:
|
|
ResourceGrantsRepository(conn).create(
|
|
group_id=admin_group[0],
|
|
resource_type="marketplace_plugin",
|
|
resource_id=f"{slug}/{plugin}",
|
|
assigned_by="test",
|
|
)
|
|
except Exception:
|
|
pass
|
|
finally:
|
|
conn.close()
|
|
return repo_root
|
|
|
|
|
|
def test_curated_asset_rejects_html_extension(seeded_app):
|
|
"""A curator-planted ``.html`` in the cloned repo MUST NOT be served as
|
|
``text/html`` — extension allowlist denies any non-image extension."""
|
|
repo_root = _seed_asset_marketplace(seeded_app, slug="xss-html")
|
|
(repo_root / "evil.html").write_text(
|
|
"<script>alert('xss')</script>", encoding="utf-8",
|
|
)
|
|
client = seeded_app["client"]
|
|
headers = {"Authorization": f"Bearer {seeded_app['admin_token']}"}
|
|
|
|
r = client.get(
|
|
"/api/marketplace/curated/xss-html/demo/asset/evil.html",
|
|
headers=headers,
|
|
)
|
|
assert r.status_code == 415
|
|
assert "unsupported_asset_extension" in r.text
|
|
|
|
|
|
def test_curated_asset_rejects_renamed_html_as_png(seeded_app):
|
|
"""A curator who renames ``evil.html`` to ``evil.png`` must still be
|
|
blocked — magic-bytes validation catches the missing PNG signature."""
|
|
repo_root = _seed_asset_marketplace(seeded_app, slug="xss-rename")
|
|
(repo_root / "evil.png").write_text(
|
|
"<script>alert('xss')</script>", encoding="utf-8",
|
|
)
|
|
client = seeded_app["client"]
|
|
headers = {"Authorization": f"Bearer {seeded_app['admin_token']}"}
|
|
|
|
r = client.get(
|
|
"/api/marketplace/curated/xss-rename/demo/asset/evil.png",
|
|
headers=headers,
|
|
)
|
|
assert r.status_code == 415
|
|
assert "asset_validation_failed" in r.text
|
|
assert "png_magic_bytes_mismatch" in r.text
|
|
|
|
|
|
def test_curated_asset_rejects_svg_extension(seeded_app):
|
|
"""SVG is intentionally OUT of the allowlist — ``<script>`` inside SVG
|
|
executes in the browser, so even valid SVG would carry XSS risk."""
|
|
repo_root = _seed_asset_marketplace(seeded_app, slug="xss-svg")
|
|
(repo_root / "logo.svg").write_text(
|
|
'<svg xmlns="http://www.w3.org/2000/svg">'
|
|
'<script>alert("xss")</script></svg>',
|
|
encoding="utf-8",
|
|
)
|
|
client = seeded_app["client"]
|
|
headers = {"Authorization": f"Bearer {seeded_app['admin_token']}"}
|
|
|
|
r = client.get(
|
|
"/api/marketplace/curated/xss-svg/demo/asset/logo.svg",
|
|
headers=headers,
|
|
)
|
|
assert r.status_code == 415
|
|
|
|
|
|
def test_curated_asset_serves_valid_png_with_security_headers(seeded_app):
|
|
"""Happy path: a real PNG returns 200, ``Content-Type: image/png``,
|
|
plus the defense-in-depth headers ``X-Content-Type-Options: nosniff``
|
|
and a strict ``Content-Security-Policy`` that blocks script execution
|
|
even if a future regression let HTML through.
|
|
"""
|
|
repo_root = _seed_asset_marketplace(seeded_app, slug="xss-ok")
|
|
(repo_root / "cover.png").write_bytes(_PNG_1x1)
|
|
client = seeded_app["client"]
|
|
headers = {"Authorization": f"Bearer {seeded_app['admin_token']}"}
|
|
|
|
r = client.get(
|
|
"/api/marketplace/curated/xss-ok/demo/asset/cover.png",
|
|
headers=headers,
|
|
)
|
|
assert r.status_code == 200
|
|
assert r.headers.get("content-type", "").startswith("image/png")
|
|
assert r.headers.get("x-content-type-options") == "nosniff"
|
|
csp = r.headers.get("content-security-policy", "")
|
|
assert "default-src 'none'" in csp
|
|
assert "script-src" not in csp or "'none'" in csp # no script source allowed
|
|
|
|
|
|
# --- Content-Disposition force-download on /doc and /mirrored/docs --------
|
|
|
|
|
|
def _seed_marketplace_with_doc(seeded_app, *, slug="dl-mp", plugin="demo"):
|
|
"""Helper: register a marketplace, drop a sample doc into its working
|
|
tree, and grant Admin RBAC. Returns the data_dir for further fixturing."""
|
|
from src.db import get_system_db
|
|
from src.repositories.marketplace_registry import MarketplaceRegistryRepository
|
|
from src.repositories.resource_grants import ResourceGrantsRepository
|
|
|
|
data_dir = Path(seeded_app["env"]["data_dir"])
|
|
repo_root = data_dir / "marketplaces" / slug
|
|
repo_root.mkdir(parents=True, exist_ok=True)
|
|
(repo_root / "docs").mkdir(parents=True, exist_ok=True)
|
|
(repo_root / "docs" / "guide.md").write_text("# Guide\n", encoding="utf-8")
|
|
(repo_root / "logo.png").write_bytes(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8)
|
|
|
|
conn = get_system_db()
|
|
try:
|
|
MarketplaceRegistryRepository(conn).register(
|
|
id=slug, name=slug, url=f"https://example.com/{slug}.git",
|
|
curator_name="C", curator_email="c@example.com",
|
|
)
|
|
conn.execute(
|
|
"INSERT OR REPLACE INTO marketplace_plugins "
|
|
"(marketplace_id, name) VALUES (?, ?)", [slug, plugin],
|
|
)
|
|
admin_group = conn.execute(
|
|
"SELECT id FROM user_groups WHERE name = 'Admin'"
|
|
).fetchone()
|
|
if admin_group:
|
|
try:
|
|
ResourceGrantsRepository(conn).create(
|
|
group_id=admin_group[0],
|
|
resource_type="marketplace_plugin",
|
|
resource_id=f"{slug}/{plugin}",
|
|
assigned_by="test",
|
|
)
|
|
except Exception:
|
|
pass
|
|
finally:
|
|
conn.close()
|
|
return data_dir
|
|
|
|
|
|
def test_curated_doc_endpoint_sets_attachment_disposition(seeded_app):
|
|
"""/doc serves as `Content-Disposition: attachment` so clicks download
|
|
the file rather than open it inline. /asset stays inline (covers belong
|
|
in <img>, not as downloads)."""
|
|
_seed_marketplace_with_doc(seeded_app)
|
|
client = seeded_app["client"]
|
|
headers = {"Authorization": f"Bearer {seeded_app['admin_token']}"}
|
|
|
|
# Doc → attachment + filename hint
|
|
r = client.get(
|
|
"/api/marketplace/curated/dl-mp/demo/doc/docs/guide.md",
|
|
headers=headers,
|
|
)
|
|
assert r.status_code == 200
|
|
assert "attachment" in r.headers.get("content-disposition", "").lower()
|
|
assert "guide.md" in r.headers.get("content-disposition", "")
|
|
|
|
# Asset (cover image) → no attachment header (renders inline)
|
|
r = client.get(
|
|
"/api/marketplace/curated/dl-mp/demo/asset/logo.png",
|
|
headers=headers,
|
|
)
|
|
assert r.status_code == 200
|
|
assert "attachment" not in r.headers.get("content-disposition", "").lower()
|
|
|
|
|
|
def test_curated_doc_endpoint_rejects_non_allowlist_extension(seeded_app):
|
|
"""The /doc endpoint refuses to serve a file whose extension isn't in
|
|
the allowlist (PDF / Markdown / plain text). Defense-in-depth even when
|
|
the parser lets a curator's exotic path slip through."""
|
|
data_dir = _seed_marketplace_with_doc(seeded_app)
|
|
# Plant a .docx file in the seeded marketplace
|
|
(data_dir / "marketplaces" / "dl-mp" / "docs" / "evil.docx").write_text(
|
|
"PKfake", encoding="utf-8",
|
|
)
|
|
client = seeded_app["client"]
|
|
headers = {"Authorization": f"Bearer {seeded_app['admin_token']}"}
|
|
r = client.get(
|
|
"/api/marketplace/curated/dl-mp/demo/doc/docs/evil.docx",
|
|
headers=headers,
|
|
)
|
|
assert r.status_code == 415
|
|
assert "unsupported_doc_extension" in r.text
|