* Curated marketplace enrichment via agnes-metadata.json + curator metadata
Adds a second well-defined metadata file `.claude-plugin/agnes-metadata.json`
that upstream marketplace repos can opt into, providing per-plugin (and
per-skill / per-agent) cover photo, demo video URL, doc links, and
category override. The Claude Code marketplace contract is untouched —
agnes-metadata.json + the convention `.agnes/` directory are stripped
from the synthetic Claude Code marketplace served via /marketplace.zip
and /marketplace.git/*, so user instances see a clean Claude Code repo
with no Agnes-only metadata.
Highlights:
- DB schema v32 — adds curator_name + curator_email on marketplace_registry,
cover_photo_url + video_url + doc_links on marketplace_plugins.
- Mandatory curator at marketplace registration, editable later through
the admin UI; surfaces on cards + detail pages in place of owner_todo.
- External-asset mirror cache at ${DATA_DIR}/marketplace-cache/<slug>/
with conditional GET, 60s timeout, 10 MB body cap, SSRF guards, and
Wikipedia-policy-compliant User-Agent.
- Strict drop semantics — anything Agnes can't deliver as a real PDF /
Markdown / plain text doc, or a real PNG / JPEG / WebP cover, is
dropped from the served metadata; UI looks identical to no-entry case
(gradient placeholder for missing covers, no row in the doc list).
- Doc allowlist + image allowlist enforced on both the curated mirror
flow and the Flea upload flow (/store/new); shared module
src/marketplace_assets.py.
- New /api/marketplace/curated/{mp}/{plugin}/{asset,doc,mirrored}/...
endpoints with path-traversal guards + RBAC + Content-Disposition
attachment for docs.
- Curator-focused format guide at /marketplace/format-guide; canonical
source is docs/curated-marketplace-format.md, also linked from the
admin /admin/marketplaces page next to + Add Marketplace.
See CHANGELOG.md under [Unreleased] for the full breakdown.
* Fix format-guide test assertion to match shortened disclaimer
The 'Flea Market' phrase was trimmed out of the disclaimer in
docs/curated-marketplace-format.md after the curator-focused rewrite.
Update the rendered-HTML test to assert the channel-scoping phrase
that's actually present ('Curated Marketplace channel only') rather
than the 'Flea Market' contrast that's no longer in the doc.
* Drop unused 'version' field from agnes-metadata.json schema
The parser never read it; it was a YAGNI placeholder for future
schema evolution. Curators don't need to wonder what to put there
when adding the file for the first time. Will be re-added if and
when we actually introduce a backwards-incompatible schema change.
* Harden asset mirror against SSRF via redirect + DNS rebinding
The pre-flight _is_safe_url check validated only the initial URL;
urllib.request.urlopen then followed redirects and re-resolved DNS for
the actual connection — both bypassable. Attacker-controlled origin
could 302 to http://169.254.169.254/... and exfil cloud metadata;
attacker-controlled DNS could return public IP first / 127.0.0.1 second.
Replace urlopen call with a shared OpenerDirector wired through three
custom handlers: _SafeRedirectHandler re-runs SSRF allowlist on every
redirect Location (max 5 hops, down from urllib's 10), and
_PinnedHTTPHandler / _PinnedHTTPSHandler connect to the IP that passed
validation rather than re-resolving the hostname. TLS SNI + cert verify
stay bound to the original hostname.
_resolve_safe returns the validated IP (the existing _is_safe_url
2-tuple wrapper stays for backwards compatibility) and rejects round-
robin DNS that mixes a public + private record. _UnsafeRedirectError
is a typed exception so _fetch_url can map redirect blocks to terminal
'rejected' status (not transient 'failed'). _http_open is the single
call site so tests can mock at one well-defined seam.
Tests cover redirect blocking (link-local, loopback), redirect-error
unwrapping inside URLError, pinned-IP connection target, and the
end-to-end DNS-rebinding scenario. Existing tests that mocked
urllib.request.urlopen are migrated to mock _http_open.
* Harden /asset/ endpoint against stored XSS
The endpoint served any file in the cloned marketplace repo with
stdlib-detected Content-Type, so a curator who landed evil.html (or a
renamed evil.png carrying HTML bytes) in the working tree got a
same-origin XSS — the response shares cookie scope with /admin and
/api/me/*.
The asset endpoint is image-only by contract (cover photos referenced
from agnes-metadata.json + inner skill / agent cards), so applying the
same allowlist + magic-bytes pattern that /doc/ already uses closes
the gap without breaking any legitimate use case. Three layered
checks: extension in IMAGE_EXTENSIONS (.png/.jpg/.jpeg/.webp; SVG
excluded — <script> inside SVG executes), validate_image_file magic
bytes (defeats rename-extension attack), Content-Type pinned from the
validated extension (never stdlib mimetypes).
Defense-in-depth: X-Content-Type-Options: nosniff stops browser MIME
sniffing; Content-Security-Policy: default-src 'none' blocks script /
iframe execution even if a future regression let HTML through.
Tests cover the .html extension reject, the renamed-HTML-as-PNG magic-
bytes reject, the .svg reject, and the happy-path PNG with security
headers attached. The pre-existing path-traversal test seeds a real
PNG instead of ok.txt now that the endpoint is image-only.
* Enforce mandatory curator on marketplace PATCH
The POST handler enforced curator_name + curator_email at create time,
but PATCH treated empty / missing curator inputs as 'no change'. Legacy
rows that pre-date v32 (curator_name=NULL) could be edited indefinitely
without ever filling the curator gap, and OWNER_TODO_PLACEHOLDER lingered
on every /marketplace card.
Reject the PATCH with 400 when the post-merge row would persist with
empty curator. The check fires after the existing field-merge logic, so
once-filled rows that don't touch curator still pass through (their
existing values fall through from the DB row). DB column stays nullable
so untouched legacy rows continue to coexist — the gate fires only the
moment an admin opens the edit modal.
Existing PATCH semantics preserved: empty-string input still means 'leave
existing value alone', and once-filled curator can't be cleared (those
test cases pass unchanged). New test seeds a legacy row directly via the
repository, then exercises url-only PATCH (rejected), partial-fill PATCH
(rejected), and full-fill PATCH (succeeds); a follow-up no-curator PATCH
on the now-formed row also passes.
* Drop unused curated-marketplace helpers (PR #234 review)
* build_db_payload — imported by src/marketplace.py but never called.
The strict-drop semantics it would have implemented were re-written
inline in _refresh_plugin_cache (see the comment block there). The
standalone helper still carried the old fall-back-to-original-external-
URL-on-mirror-failure behaviour, which contradicts the documented
drop-when-can't-deliver contract — a future contributor who re-wired
it would have introduced a silent regression. Delete with the helper
+ the import + the comment that referenced it.
* _resolve_marketplace_name — one-line shim with no remaining call
sites. Callers use _resolve_marketplace_meta which returns name +
curator together, avoiding the double DB hit the shim exists to
hide.
* '# noqa: F401 Optional kept for forward-compat' was wrong — Optional
IS used in src/marketplace.py (line 70 and line 238). Drop the noqa
comment so a future ruff run doesn't try to remove a real import.
Removing build_db_payload also drops the only remaining use of Optional
in src/marketplace_metadata.py, so the import comes out there too.
* Cap agnes-metadata.json size + catch RecursionError on parse
The reader is invoked once per marketplace per sync and the file is
curator-controlled. Two failure modes were unguarded:
* Multi-GB JSON: path.read_text() pulled the whole file into memory
before json.loads even ran. A curator with commit access to an
upstream repo could OOM the sync worker.
* Deeply-nested JSON under any size cap: cpython's recursive object /
array parser raises RecursionError at ~1000 levels of depth.
RecursionError is a RuntimeError, not ValueError, so the existing
catch let it propagate up and abort the entire sync — every other
marketplace in the same pass got skipped.
Add AGNES_METADATA_MAX_BYTES = 1 MiB (a real metadata file with covers,
docs, categories for ~50 plugins fits in <100 KB so the cap is
generous) and gate the size check on path.stat().st_size before the
body read. Broaden the parse except to (ValueError, RecursionError)
with a unified log line. Both failure modes degrade to the same
empty-dict fall-back the malformed-JSON path already used, so one bad
upstream never aborts the rest of the sync.
Tests cover the size cap firing before json.loads (whitespace-padded
valid JSON exceeding the cap) and the recursion path (5000 nested
arrays — past cpython's default recursion limit but well under the
size cap).
* Persist asset-mirror manifest per body write, before unlink
sync_assets wrote each body atomically (tmp + rename) but persisted
the manifest only at the end of the batch. A kill -9 mid-Phase 2 left
on-disk files the manifest never referenced. Once a curator dropped
that URL from agnes-metadata.json, Phase 3's cleanup had no record of
the file and the orphan stayed forever — there's no GC pass walking
the cache dir today, so disk would slowly bloat.
Phase 2 (body-write iteration): after the in-memory manifest mutation,
persist BEFORE unlinking the previous body. The crash window narrows
from 'all of Phase 2' to 'between persist and unlink' (microseconds).
A persist failure mid-batch keeps the previous body on disk — the on-
disk manifest still references it, and a stale-but-existing file beats
a 404. Cost: one extra tmp+rename per body write; manifest is a few KB
so the overhead is negligible vs. the HTTP fetches.
Phase 3 (curator-removed URLs): same discipline. Collect the to-delete
relpaths, persist the manifest with the entries already gone, THEN
unlink. A crash mid-cleanup leaves at most a microsecond window where
files exist despite the manifest no longer naming them. The next sync
reads the (correct) manifest and the orphan stays orphaned, but the
served state is consistent.
Tests cover per-body persist call count, the post-update on-disk
manifest content, and Phase 3 ordering verified by reading the on-disk
manifest from inside Path.unlink.
* Consolidate marketplace video embeds + format-guide CSS
The YouTube nocookie / Vimeo / <video> / link-fallback detection logic
was duplicated verbatim in marketplace_plugin_detail.html and
marketplace_item_detail.html (~40 JS lines each, with subtly-different
inline styles). Both templates now {% include %} a single
_marketplace_video_embed.html partial inside their IIFE so the regex,
the nocookie attribute set, and the unknown-host link fallback live in
ONE place — future tweaks (new host, new attribute, fixed sandbox flag)
no longer need to be applied twice in lockstep.
The .video-wrap selectors (one inline <style> rule in plugin_detail,
one inline style='...' attribute in item_detail) are replaced by the
existing .video-embed 16:9 wrapper in style-custom.css, with new
.video-embed video / .video-embed a child rules added so the wrapper
handles all four embed shapes uniformly without per-template
positioning.
The 60-line inline <style> block in marketplace_format_guide.html
moves verbatim to style-custom.css under a new 'Marketplace format
guide page' section, scoped to .format-guide so other pages aren't
affected.
No user-visible behaviour change: the rendered HTML for valid
YouTube / Vimeo / mp4 / external links is byte-identical to before,
and the format-guide page renders the same.
* Maintainability cleanup batch (PR #234 review)
#10: drop _path_under from app/api/marketplace.py — it was a byte-
equivalent clone of _safe_join (same Path.resolve(strict=True) +
relative_to() containment check). The three v32 endpoint handlers
(/asset, /doc, /mirrored) now share the existing helper.
#14: rename src/marketplace_assets.py → src/marketplace_asset_validation.py
so the file's purpose is obvious from the name and the previous
overlap with src/marketplace_asset_mirror.py is gone. Six call-site
imports updated in lockstep; CHANGELOG references under [Unreleased]
updated to track the new path.
#11: consolidate the URL builders that resolve
/api/marketplace/curated/<slug>/<plugin>/{asset,doc,mirrored}/...
paths. _internal_asset_url / _internal_doc_url / _mirrored_asset_url
lived in src/marketplace.py, while a copy named _mirrored_url lived
in app/api/marketplace.py with a 'must stay aligned' comment. New
module src/marketplace_urls.py is the single source of truth — both
call sites import from it and a future URL-format tweak only needs
to change one file. The _ROUTE_PREFIX constant collapses the per-
function f-string repetition. The route-handler endpoints themselves
still own the path string literals (keeping the builders identical
to the route declarations remains a checklist item, not a runtime
guarantee).
* Re-key asset-mirror manifest by (plugin, url) + dedup HTTP fetches
The manifest used to be keyed by URL alone, so two plugins in the
same marketplace referencing the same external image (a shared CDN
icon, a common cover) collided on entry.plugin_name — last writer
won. The DB row for the losing plugin then stored a served URL
pointing under the winning plugin's tree, and require_resource_access
denied legitimate access on one side and let the other plugin's user
reach the wrong asset.
In-memory: Dict[Tuple[str, str], MirrorEntry] keyed (plugin_name, url).
On disk: format flips from {url: entry} dict to [entry, ...] list of
self-describing entries (each carries plugin_name + url + the
previous fields). JSON keys can't be tuples; encoding 'plugin::url'
would just shift the parsing burden.
Phase 1 of sync_assets deduplicates fetches by URL — three plugins
sharing one URL share one HTTP request. The conditional-GET prior is
picked from any owning plugin's prior entry; if their etags diverge
(rare) we miss one 304 and pay for a full re-download instead.
Phase 2 still creates a per-(plugin, url) manifest entry pointing
under the plugin's own subdir, and Phase 3 cleanup is keyed the same
way so dropping a URL from one plugin's metadata doesn't disturb
another plugin still referencing it.
Body files stay per plugin (RBAC-clean isolation: deleting plugin A's
cache can't strand plugin B). Bandwidth saved by fetch dedup.
Consumer code re-keyed: src.marketplace._refresh_plugin_cache rebuilt
served_url_for / mirror_status as composite-keyed maps;
app.api.marketplace._resolve_external_via_mirror /
_curated_inner_cover / _curated_inner_enrichment look up by
(plugin_name, url).
Tests cover per-plugin manifest entries with shared URL, the single
HTTP fetch for N plugins, and Phase 3 drop-one-keep-other. All
existing tests migrated to composite key access; v2 list format
assertions verify on-disk shape.
* Migrate asset mirror from urllib.request to httpx
The asset mirror was the only HTTP call site in Agnes still using
urllib.request; every other module (CLI, Jira / OpenMetadata / OpenAI
connectors, scheduler, Telegram bot) already used httpx. The asset
mirror was added in this PR's base commit, so this is the only chance
to bring it into convention before someone copies it as 'the pattern
for HTTP fetches in Agnes'.
Three concrete benefits beyond consistency:
* SSRF defence collapses from five urllib classes
(_PinnedHTTPConnection, _PinnedHTTPSConnection, _PinnedHTTPHandler,
_PinnedHTTPSHandler, _SafeRedirectHandler) into one
_SSRFGuardTransport. httpx invokes handle_request() on every redirect
hop, so re-validation is free — we don't need a custom redirect
handler at all.
* DNS-rebinding defence: the transport rewrites request.url.host to the
SSRF-validated IP before delegating to super().handle_request().
httpcore connects to whatever URL.host says, so this pins the
connection without subclassing HTTPSConnection. The original hostname
goes into the Host header + the sni_hostname extension so TLS / vhost
routing still bind to the curator-supplied hostname.
* Error handling: one httpx.HTTPError catch-all for transport errors,
plus specific httpx.TimeoutException / httpx.TooManyRedirects branches
for clearer diagnostics. Matches the _translate_transport_error shape
in cli/client.py.
The shared httpx.Client is built lazily at module load (same pattern as
cli/client.py:_get_shared_client) with follow_redirects=True,
max_redirects=5, timeout=HTTP_TIMEOUT_SEC, and our custom transport.
Externally observable behaviour is unchanged: same FetchOutcome
statuses, same manifest format, same conditional GET semantics, same
body-size cap.
Tests migrated from urllib-shaped fakes to httpx-shaped (status_code,
iter_bytes, context manager). Five urllib-specific tests replaced with
httpx equivalents — three transport unit tests + one DNS-rebinding
integration test that verifies host rewrite via monkey-patched
super().handle_request. One test deleted without replacement
(unwrap-URLError-wrapping-an-_UnsafeRedirectError — urllib-specific,
not applicable to httpx).
* Surface curated agnes-metadata enrichment on My Stack tab
GET /api/marketplace/items?tab=my built each curated row from the
on-disk marketplace.json by way of resolve_allowed_plugins, which
doesn't carry the agnes-metadata enrichment columns
(cover_photo_url, video_url, category override, doc_links). The
handler then hard-coded cover_photo_url=None on the synthetic row.
Result: once a user clicked '+ Add to my stack' on a curated card,
the same plugin in tab=my rendered with the gradient placeholder
instead of its cover photo — confusing parity break vs. the curated
tab where the same row goes through MarketplacePluginsRepository
and gets the enriched columns.
Pre-load the enriched marketplace_plugins rows for every marketplace
the user is subscribed to, then look each granted+subscribed plugin
up by (marketplace_id, plugin_name). Fall back to the on-disk
synthetic shape only when the DB row is missing — happens during
the rare race where RBAC is granted before the first sync cycle
ingests the plugin. RBAC gating (granted set from
resolve_allowed_plugins) is unchanged so this fix can't widen
visibility; it just upgrades the data shape behind cards the user
was already going to see.
Per-marketplace list_for_marketplace beats N gets — typical user is
subscribed to <5 marketplaces, so this is at most a handful of
queries vs. one per subscribed plugin.
Regression test seeds a plugin with cover_photo_url + category
override, subscribes the user, hits /api/marketplace/items?tab=my,
and asserts photo_url + category come through. The misleading
'fall through to gradient until the user re-visits the curated tab'
comment is gone.
---------
Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
2187 lines
84 KiB
Python
2187 lines
84 KiB
Python
"""Community-driven Store endpoints.
|
||
|
||
Any authenticated user can upload skills, agents, or plugins as ZIP archives.
|
||
Uploaded entities live under ``${DATA_DIR}/store/<entity_id>/`` and are surfaced
|
||
into each user's served Claude Code marketplace via ``user_store_installs``.
|
||
|
||
Per the per-user marketplace composition (see
|
||
``src/marketplace_filter.py:resolve_user_marketplace``):
|
||
|
||
served_set = (admin_granted ∖ opt_outs) ∪ store_installs
|
||
|
||
Entities owned by user X have their plugin/skill/agent name suffixed with
|
||
``-by-<owner-username>`` at upload time so two owners uploading the same
|
||
display name don't collide in Claude Code's flat namespace.
|
||
"""
|
||
|
||
from __future__ import annotations
|
||
|
||
import io
|
||
import json
|
||
import logging
|
||
import os
|
||
import re
|
||
import shutil
|
||
import tempfile
|
||
import uuid
|
||
import zipfile
|
||
from datetime import datetime, timezone
|
||
from pathlib import Path
|
||
from typing import Any, List, Optional
|
||
from urllib.parse import urlparse
|
||
|
||
import duckdb
|
||
from fastapi import (
|
||
APIRouter,
|
||
BackgroundTasks,
|
||
Depends,
|
||
File,
|
||
Form,
|
||
HTTPException,
|
||
Query,
|
||
UploadFile,
|
||
)
|
||
from fastapi.responses import FileResponse, Response
|
||
from pydantic import BaseModel
|
||
|
||
from app.auth.access import is_user_admin, require_admin
|
||
from app.auth.dependencies import _get_db, get_current_user
|
||
from app.instance_config import get_guardrails_enabled
|
||
from app.utils import get_store_dir
|
||
from src.db import get_system_db
|
||
from src.repositories.audit import AuditRepository
|
||
from src.repositories.store_entities import StoreEntitiesRepository
|
||
from src.repositories.store_submissions import StoreSubmissionsRepository
|
||
from src.repositories.user_store_installs import UserStoreInstallsRepository
|
||
from src.repositories.users import UserRepository
|
||
from src.store_categories import STORE_CATEGORIES, is_valid_category
|
||
from src.store_guardrails import InlineResult, run_inline_checks, run_llm_review
|
||
from src.store_guardrails.runner import (
|
||
default_api_key_loader,
|
||
default_model_loader,
|
||
)
|
||
from src.store_naming import (
|
||
compute_entity_version,
|
||
sanitize_username,
|
||
suffixed_name,
|
||
)
|
||
|
||
logger = logging.getLogger(__name__)
|
||
router = APIRouter(prefix="/api/store", tags=["store"])
|
||
|
||
|
||
MAX_ZIP_SIZE = 50 * 1024 * 1024 # 50 MB — matches app/api/upload.py
|
||
MAX_PHOTO_SIZE = 5 * 1024 * 1024 # 5 MB
|
||
MAX_DOC_SIZE = 10 * 1024 * 1024 # 10 MB per uploaded doc
|
||
ALLOWED_PHOTO_EXT = {".jpg", ".jpeg", ".png", ".webp"}
|
||
_CHUNK_SIZE = 64 * 1024
|
||
_VALID_TYPES = {"skill", "agent", "plugin"}
|
||
_NAME_RE = re.compile(r"^[a-z][a-z0-9-]{0,63}$")
|
||
_FRONTMATTER_RE = re.compile(r"^---\s*\n(.*?)\n---", re.DOTALL)
|
||
_ALLOWED_VIDEO_SCHEMES = {"http", "https"}
|
||
|
||
# Cap on uncompressed total size of an uploaded ZIP. The compressed-side cap
|
||
# is MAX_ZIP_SIZE; an attacker could craft a 50 MB ZIP that decompresses to
|
||
# >>10 GB and DOS the host disk via _safe_zip_extract. We sum infolist()
|
||
# file_size before extracting and refuse anything above this bound.
|
||
MAX_ZIP_UNCOMPRESSED = 200 * 1024 * 1024 # 200 MB
|
||
|
||
|
||
def _suffixed_already_taken(
|
||
conn: duckdb.DuckDBPyConnection,
|
||
suffixed: str,
|
||
*,
|
||
exclude_entity_id: Optional[str] = None,
|
||
exclude_archived: bool = False,
|
||
) -> bool:
|
||
"""Whether any existing entity ships the same display+invocation name.
|
||
|
||
The Store namespace is **flat** in Claude Code — two plugins/skills/agents
|
||
that share a ``name`` collide in the served marketplace catalog (the
|
||
``manifest_name`` is unique-key for ``/plugin`` lookup) and on-disk inside
|
||
the ``agnes-store-bundle`` (skills/<suffixed>/SKILL.md is the dir name).
|
||
|
||
``sanitize_username`` is many-to-one (``alice.smith`` and ``alice_smith``
|
||
both → ``alice-smith``), so the per-owner UNIQUE on
|
||
``(owner_user_id, name)`` does NOT prevent the cross-owner collision.
|
||
We enforce global uniqueness on ``name || '-by-' || owner_username``
|
||
here, at upload time, with a clear 409.
|
||
|
||
``exclude_archived=True`` skips rows whose
|
||
``visibility_status='archived'`` — required by the upload conflict
|
||
check so the same owner can re-upload under the original name after
|
||
archive. The archive path renames the row to free the slug, so this
|
||
flag is belt-and-braces.
|
||
"""
|
||
sql = (
|
||
"SELECT id FROM store_entities "
|
||
"WHERE name || '-by-' || owner_username = ?"
|
||
)
|
||
params: List[Any] = [suffixed]
|
||
if exclude_entity_id:
|
||
sql += " AND id != ?"
|
||
params.append(exclude_entity_id)
|
||
if exclude_archived:
|
||
sql += " AND visibility_status != 'archived'"
|
||
return bool(conn.execute(sql, params).fetchone())
|
||
|
||
|
||
def _validate_video_url(value: Optional[str]) -> Optional[str]:
|
||
"""Return the URL if it is a safe http(s) URL, raise 400 otherwise.
|
||
|
||
Empty / None passes through as None — video_url is optional. Defends
|
||
against ``javascript:``, ``data:``, ``vbscript:`` (etc.) URIs that
|
||
would execute in the viewer's session if rendered inside an ``href``.
|
||
Jinja2 autoescape only HTML-escapes characters; it does not block URI
|
||
schemes inside attribute values.
|
||
"""
|
||
if value is None:
|
||
return None
|
||
s = value.strip()
|
||
if not s:
|
||
return None
|
||
parsed = urlparse(s)
|
||
if parsed.scheme.lower() not in _ALLOWED_VIDEO_SCHEMES or not parsed.netloc:
|
||
raise HTTPException(status_code=400, detail="invalid_video_url")
|
||
return s
|
||
|
||
|
||
# ---------------------------------------------------------------------------
|
||
# Pydantic response models
|
||
# ---------------------------------------------------------------------------
|
||
|
||
|
||
class StoreEntityResponse(BaseModel):
|
||
id: str
|
||
type: str
|
||
name: str
|
||
description: Optional[str] = None
|
||
category: Optional[str] = None
|
||
version: str
|
||
owner_user_id: str
|
||
owner_username: str
|
||
owner_display_name: Optional[str] = None
|
||
install_count: int = 0
|
||
file_size: int = 0
|
||
photo_url: Optional[str] = None
|
||
video_url: Optional[str] = None
|
||
doc_paths: List[str] = []
|
||
created_at: Optional[str] = None
|
||
updated_at: Optional[str] = None
|
||
invocation_name: str # what the user types in Claude Code
|
||
# v32+ quarantine: surface visibility so /store browse can render
|
||
# the corner badge on the submitter's own non-approved cards.
|
||
visibility_status: Optional[str] = None
|
||
|
||
|
||
class StoreEntityListResponse(BaseModel):
|
||
items: List[StoreEntityResponse]
|
||
total: int
|
||
skip: int
|
||
limit: int
|
||
|
||
|
||
class InstallResponse(BaseModel):
|
||
entity_id: str
|
||
installed: bool
|
||
|
||
|
||
class PreviewResponse(BaseModel):
|
||
type: str
|
||
name: Optional[str] = None
|
||
description: Optional[str] = None
|
||
|
||
|
||
class OkResponse(BaseModel):
|
||
ok: bool = True
|
||
|
||
|
||
# ---------------------------------------------------------------------------
|
||
# Helpers
|
||
# ---------------------------------------------------------------------------
|
||
|
||
|
||
def _audit(
|
||
conn: duckdb.DuckDBPyConnection,
|
||
actor_id: str,
|
||
action: str,
|
||
entity_id: str,
|
||
params: Optional[dict] = None,
|
||
) -> None:
|
||
try:
|
||
AuditRepository(conn).log(
|
||
user_id=actor_id,
|
||
action=action,
|
||
resource=f"store_entity:{entity_id}",
|
||
params=params,
|
||
)
|
||
except Exception:
|
||
pass
|
||
|
||
|
||
def _schedule_llm_review(
|
||
background_tasks: BackgroundTasks,
|
||
submission_id: str,
|
||
plugin_dir: Path,
|
||
) -> None:
|
||
"""Defer the LLM security review to FastAPI's BackgroundTasks queue.
|
||
|
||
Runs after the response has been sent so the uploader sees the 202
|
||
immediately while the (slow) Anthropic call happens in the background.
|
||
Pulls a *fresh* DuckDB cursor inside the task — sharing the request's
|
||
cursor across the background path would close it before the task
|
||
fires (FastAPI yields cursors from ``_get_db`` and the cleanup runs
|
||
on the response yield).
|
||
"""
|
||
background_tasks.add_task(
|
||
run_llm_review,
|
||
submission_id,
|
||
plugin_dir=plugin_dir,
|
||
conn_factory=get_system_db,
|
||
api_key_loader=default_api_key_loader,
|
||
model_loader=default_model_loader,
|
||
)
|
||
|
||
|
||
def _categories_for_user(conn: duckdb.DuckDBPyConnection, user_id: str) -> List[str]:
|
||
"""Return the Store-wide category taxonomy.
|
||
|
||
Used to be user-scoped (sourced from user_groups), but groups are an RBAC
|
||
construct — they describe *who* can do what, not what an entity is about.
|
||
Categories are now a fixed taxonomy in ``src.store_categories`` so the
|
||
Store's organizational axis is independent of permissions.
|
||
"""
|
||
return list(STORE_CATEGORIES)
|
||
|
||
|
||
def _entity_dir(entity_id: str) -> Path:
|
||
return get_store_dir() / entity_id
|
||
|
||
|
||
def _plugin_dir(entity_id: str) -> Path:
|
||
return _entity_dir(entity_id) / "plugin"
|
||
|
||
|
||
def _assets_dir(entity_id: str) -> Path:
|
||
return _entity_dir(entity_id) / "assets"
|
||
|
||
|
||
def _to_iso(value: Any) -> Optional[str]:
|
||
if value is None:
|
||
return None
|
||
if isinstance(value, datetime):
|
||
return value.isoformat()
|
||
return str(value)
|
||
|
||
|
||
def _resolve_owner_display(
|
||
conn: duckdb.DuckDBPyConnection, user_id: str
|
||
) -> Optional[str]:
|
||
row = conn.execute(
|
||
"SELECT name, email FROM users WHERE id = ?", [user_id]
|
||
).fetchone()
|
||
if not row:
|
||
return None
|
||
name, email = row
|
||
if name and str(name).strip():
|
||
return str(name).strip()
|
||
return str(email) if email else None
|
||
|
||
|
||
def _entity_to_response(
|
||
conn: duckdb.DuckDBPyConnection, entity: dict
|
||
) -> StoreEntityResponse:
|
||
photo_url = (
|
||
f"/api/store/entities/{entity['id']}/photo" if entity.get("photo_path") else None
|
||
)
|
||
return StoreEntityResponse(
|
||
id=entity["id"],
|
||
type=entity["type"],
|
||
name=entity["name"],
|
||
description=entity.get("description"),
|
||
category=entity.get("category"),
|
||
version=entity["version"],
|
||
owner_user_id=entity["owner_user_id"],
|
||
owner_username=entity["owner_username"],
|
||
owner_display_name=_resolve_owner_display(conn, entity["owner_user_id"]),
|
||
install_count=int(entity.get("install_count") or 0),
|
||
file_size=int(entity.get("file_size") or 0),
|
||
photo_url=photo_url,
|
||
video_url=entity.get("video_url"),
|
||
doc_paths=entity.get("doc_paths") or [],
|
||
created_at=_to_iso(entity.get("created_at")),
|
||
updated_at=_to_iso(entity.get("updated_at")),
|
||
invocation_name=suffixed_name(entity["name"], entity["owner_username"]),
|
||
visibility_status=entity.get("visibility_status") or "approved",
|
||
)
|
||
|
||
|
||
# ---------------------------------------------------------------------------
|
||
# Streaming helpers (reuse pattern from app/api/upload.py)
|
||
# ---------------------------------------------------------------------------
|
||
|
||
|
||
async def _stream_to_temp(
|
||
file: UploadFile, max_size: int, suffix: str = ".tmp"
|
||
) -> tuple[tempfile._TemporaryFileWrapper, int]:
|
||
tmp = tempfile.NamedTemporaryFile(delete=False, suffix=suffix)
|
||
total = 0
|
||
try:
|
||
while True:
|
||
chunk = await file.read(_CHUNK_SIZE)
|
||
if not chunk:
|
||
break
|
||
total += len(chunk)
|
||
if total > max_size:
|
||
tmp.close()
|
||
Path(tmp.name).unlink(missing_ok=True)
|
||
raise HTTPException(
|
||
status_code=413,
|
||
detail=f"file_too_large (max {max_size // 1024 // 1024}MB)",
|
||
)
|
||
tmp.write(chunk)
|
||
tmp.flush()
|
||
except HTTPException:
|
||
raise
|
||
except Exception:
|
||
tmp.close()
|
||
Path(tmp.name).unlink(missing_ok=True)
|
||
raise
|
||
tmp.seek(0)
|
||
return tmp, total
|
||
|
||
|
||
# ---------------------------------------------------------------------------
|
||
# ZIP validation + bake
|
||
# ---------------------------------------------------------------------------
|
||
|
||
|
||
def _safe_zip_extract(zf: zipfile.ZipFile, dest: Path) -> None:
|
||
"""Extract ``zf`` into ``dest`` while rejecting unsafe members.
|
||
|
||
Three guards:
|
||
|
||
1. Path traversal (zip-slip) — refuse absolute paths or ``..`` segments.
|
||
2. Decompression bomb — reject if the sum of declared uncompressed sizes
|
||
exceeds ``MAX_ZIP_UNCOMPRESSED``. The compressed-side cap
|
||
(``MAX_ZIP_SIZE``) does not bound the decompressed footprint; a 50 MB
|
||
ZIP at ratio 1:1000 expands to 50 GB on disk.
|
||
3. (Note) Python's stdlib ``ZipFile.extractall`` does NOT honor symlink
|
||
mode bits — symlink entries are written as regular files containing
|
||
the link target text, not as actual symlinks. So no extra symlink
|
||
guard is needed for the stdlib path.
|
||
"""
|
||
dest_resolved = dest.resolve()
|
||
total_uncompressed = 0
|
||
for member in zf.infolist():
|
||
# Path traversal.
|
||
member_path = Path(member.filename)
|
||
if member_path.is_absolute() or any(part == ".." for part in member_path.parts):
|
||
raise HTTPException(status_code=422, detail="zip_unsafe_path")
|
||
target = (dest / member_path).resolve()
|
||
try:
|
||
target.relative_to(dest_resolved)
|
||
except ValueError:
|
||
raise HTTPException(status_code=422, detail="zip_unsafe_path")
|
||
|
||
# Decompression bomb. We sum declared sizes — these are advisory
|
||
# (an attacker can lie) but mismatched values trip Python's own
|
||
# CRC/size check during read. The pre-extract sum catches the
|
||
# honest-malicious case (large declared sizes) and is the last
|
||
# cheap fence before extractall touches the disk.
|
||
total_uncompressed += int(member.file_size or 0)
|
||
if total_uncompressed > MAX_ZIP_UNCOMPRESSED:
|
||
raise HTTPException(
|
||
status_code=413,
|
||
detail=f"zip_too_large_uncompressed (max {MAX_ZIP_UNCOMPRESSED // 1024 // 1024}MB)",
|
||
)
|
||
|
||
zf.extractall(dest)
|
||
|
||
|
||
def _parse_frontmatter(text: str) -> dict:
|
||
m = _FRONTMATTER_RE.match(text)
|
||
if not m:
|
||
return {}
|
||
body = m.group(1)
|
||
out: dict = {}
|
||
for line in body.splitlines():
|
||
if not line.strip() or line.lstrip().startswith("#"):
|
||
continue
|
||
if ":" in line:
|
||
k, v = line.split(":", 1)
|
||
out[k.strip()] = v.strip().strip('"').strip("'")
|
||
return out
|
||
|
||
|
||
def _set_frontmatter_name(text: str, new_name: str) -> str:
|
||
"""Rewrite the ``name`` field in YAML-ish frontmatter, inserting one if
|
||
the document has frontmatter without a ``name`` key. If there is no
|
||
frontmatter at all, prepend a minimal block.
|
||
"""
|
||
m = _FRONTMATTER_RE.match(text)
|
||
if not m:
|
||
return f"---\nname: {new_name}\n---\n\n{text}"
|
||
|
||
body = m.group(1)
|
||
new_body_lines: List[str] = []
|
||
found = False
|
||
for line in body.splitlines():
|
||
if not found and re.match(r"^\s*name\s*:", line):
|
||
new_body_lines.append(f"name: {new_name}")
|
||
found = True
|
||
else:
|
||
new_body_lines.append(line)
|
||
if not found:
|
||
new_body_lines.insert(0, f"name: {new_name}")
|
||
new_body = "\n".join(new_body_lines)
|
||
return f"---\n{new_body}\n---" + text[m.end():]
|
||
|
||
|
||
def _find_skill_md(root: Path) -> Optional[Path]:
|
||
for p in sorted(root.rglob("SKILL.md")):
|
||
if p.is_file():
|
||
return p
|
||
return None
|
||
|
||
|
||
def _find_agent_md(root: Path) -> Optional[Path]:
|
||
"""An agent is any *.md (with name + description frontmatter) that is
|
||
NOT a SKILL.md and is NOT located inside a ``.claude-plugin/`` directory.
|
||
The two exclusions are how we keep skill / plugin ZIPs from
|
||
accidentally validating as agents — SKILL.md has the same frontmatter
|
||
fields as an agent definition, and plugin.json siblings live under
|
||
``.claude-plugin/``.
|
||
"""
|
||
for p in sorted(root.rglob("*.md")):
|
||
if not p.is_file():
|
||
continue
|
||
if p.name.lower() == "skill.md":
|
||
continue
|
||
# Reject anything below a .claude-plugin/ ancestor.
|
||
if any(part == ".claude-plugin" for part in p.parts):
|
||
continue
|
||
try:
|
||
text = p.read_text(encoding="utf-8", errors="replace")
|
||
except OSError:
|
||
continue
|
||
fm = _parse_frontmatter(text)
|
||
if "name" in fm and "description" in fm:
|
||
return p
|
||
return None
|
||
|
||
|
||
def _find_plugin_json(root: Path) -> Optional[Path]:
|
||
for p in sorted(root.rglob("plugin.json")):
|
||
if p.is_file() and p.parent.name == ".claude-plugin":
|
||
return p
|
||
return None
|
||
|
||
|
||
def _validate_and_extract_metadata(
|
||
type_: str, extracted_root: Path
|
||
) -> dict:
|
||
"""Return ``{"name": str | None, "description": str | None}`` parsed from
|
||
the ZIP for pre-fill. Raises 422 if the ZIP layout doesn't match ``type``.
|
||
|
||
Cross-type guards reject obvious mismatches so a skill ZIP can't be
|
||
smuggled in as an agent (same frontmatter shape) and a plugin ZIP can't
|
||
masquerade as a skill or agent (plugin trees often contain inner skills
|
||
+ agents that share their signatures):
|
||
|
||
* skill → must have SKILL.md and must NOT have a top-level
|
||
.claude-plugin/plugin.json (would be a plugin)
|
||
* agent → must have a non-SKILL .md with name+description frontmatter
|
||
outside any .claude-plugin/ folder, must NOT have SKILL.md
|
||
(would be a skill), and must NOT have .claude-plugin/plugin.json
|
||
(would be a plugin)
|
||
* plugin → must have .claude-plugin/plugin.json
|
||
"""
|
||
has_skill_md = _find_skill_md(extracted_root) is not None
|
||
has_plugin_json = _find_plugin_json(extracted_root) is not None
|
||
|
||
if type_ == "skill":
|
||
if has_plugin_json:
|
||
raise HTTPException(status_code=422, detail="zip_looks_like_plugin")
|
||
if not has_skill_md:
|
||
raise HTTPException(status_code=422, detail="zip_missing_skill_md")
|
||
skill_md = _find_skill_md(extracted_root)
|
||
text = skill_md.read_text(encoding="utf-8", errors="replace") # type: ignore[union-attr]
|
||
fm = _parse_frontmatter(text)
|
||
return {"name": fm.get("name"), "description": fm.get("description")}
|
||
|
||
if type_ == "agent":
|
||
if has_skill_md:
|
||
raise HTTPException(status_code=422, detail="zip_looks_like_skill")
|
||
if has_plugin_json:
|
||
raise HTTPException(status_code=422, detail="zip_looks_like_plugin")
|
||
agent_md = _find_agent_md(extracted_root)
|
||
if agent_md is None:
|
||
raise HTTPException(
|
||
status_code=422,
|
||
detail="zip_missing_agent_md_with_frontmatter",
|
||
)
|
||
fm = _parse_frontmatter(agent_md.read_text(encoding="utf-8", errors="replace"))
|
||
return {"name": fm.get("name"), "description": fm.get("description")}
|
||
|
||
if type_ == "plugin":
|
||
if not has_plugin_json:
|
||
raise HTTPException(
|
||
status_code=422,
|
||
detail="zip_missing_claude_plugin_json",
|
||
)
|
||
pj = _find_plugin_json(extracted_root)
|
||
try:
|
||
data = json.loads(pj.read_text(encoding="utf-8")) # type: ignore[union-attr]
|
||
except (OSError, ValueError):
|
||
raise HTTPException(status_code=422, detail="plugin_json_invalid")
|
||
if not isinstance(data, dict):
|
||
raise HTTPException(status_code=422, detail="plugin_json_invalid")
|
||
return {"name": data.get("name"), "description": data.get("description")}
|
||
|
||
raise HTTPException(status_code=422, detail=f"unknown_type:{type_}")
|
||
|
||
|
||
def _bake_plugin_tree(
|
||
*,
|
||
type_: str,
|
||
extracted_root: Path,
|
||
plugin_dir: Path,
|
||
final_name: str,
|
||
suffixed: str,
|
||
description: Optional[str],
|
||
) -> int:
|
||
"""Materialize the canonical Claude Code plugin tree at ``plugin_dir``.
|
||
|
||
Returns the total bytes written. Caller computes the version hash *after*
|
||
this finishes (so the hash covers exactly what's served on disk).
|
||
"""
|
||
if plugin_dir.exists():
|
||
shutil.rmtree(plugin_dir)
|
||
plugin_dir.mkdir(parents=True, exist_ok=True)
|
||
|
||
if type_ == "skill":
|
||
skill_md = _find_skill_md(extracted_root)
|
||
if skill_md is None:
|
||
raise HTTPException(status_code=422, detail="zip_missing_skill_md")
|
||
skill_root = skill_md.parent
|
||
target = plugin_dir / "skills" / suffixed
|
||
target.mkdir(parents=True, exist_ok=True)
|
||
for f in skill_root.rglob("*"):
|
||
if not f.is_file():
|
||
continue
|
||
rel = f.relative_to(skill_root)
|
||
dest = target / rel
|
||
dest.parent.mkdir(parents=True, exist_ok=True)
|
||
if f == skill_md:
|
||
# Rewrite frontmatter `name` to the suffixed value.
|
||
text = f.read_text(encoding="utf-8", errors="replace")
|
||
dest.write_text(_set_frontmatter_name(text, suffixed), encoding="utf-8")
|
||
else:
|
||
shutil.copy2(f, dest)
|
||
_write_synth_plugin_json(plugin_dir, suffixed, description)
|
||
|
||
elif type_ == "agent":
|
||
agent_md = _find_agent_md(extracted_root)
|
||
if agent_md is None:
|
||
raise HTTPException(status_code=422, detail="zip_missing_agent_md_with_frontmatter")
|
||
target_dir = plugin_dir / "agents"
|
||
target_dir.mkdir(parents=True, exist_ok=True)
|
||
text = agent_md.read_text(encoding="utf-8", errors="replace")
|
||
rewritten = _set_frontmatter_name(text, suffixed)
|
||
(target_dir / f"{suffixed}.md").write_text(rewritten, encoding="utf-8")
|
||
_write_synth_plugin_json(plugin_dir, suffixed, description)
|
||
|
||
elif type_ == "plugin":
|
||
# Mirror the upload as-is, then rewrite plugin.json `name` to suffixed.
|
||
for f in extracted_root.rglob("*"):
|
||
if not f.is_file():
|
||
continue
|
||
rel = f.relative_to(extracted_root)
|
||
dest = plugin_dir / rel
|
||
dest.parent.mkdir(parents=True, exist_ok=True)
|
||
shutil.copy2(f, dest)
|
||
pj_path = plugin_dir / ".claude-plugin" / "plugin.json"
|
||
if not pj_path.is_file():
|
||
raise HTTPException(status_code=422, detail="zip_missing_claude_plugin_json")
|
||
try:
|
||
data = json.loads(pj_path.read_text(encoding="utf-8"))
|
||
except (OSError, ValueError):
|
||
raise HTTPException(status_code=422, detail="plugin_json_invalid")
|
||
if not isinstance(data, dict):
|
||
raise HTTPException(status_code=422, detail="plugin_json_invalid")
|
||
data["name"] = suffixed
|
||
pj_path.write_text(json.dumps(data, indent=2), encoding="utf-8")
|
||
|
||
else:
|
||
raise HTTPException(status_code=422, detail=f"unknown_type:{type_}")
|
||
|
||
total = 0
|
||
for f in plugin_dir.rglob("*"):
|
||
if f.is_file():
|
||
total += f.stat().st_size
|
||
return total
|
||
|
||
|
||
def _write_synth_plugin_json(
|
||
plugin_dir: Path, suffixed: str, description: Optional[str]
|
||
) -> None:
|
||
target = plugin_dir / ".claude-plugin"
|
||
target.mkdir(parents=True, exist_ok=True)
|
||
payload = {
|
||
"name": suffixed,
|
||
"description": description or "",
|
||
}
|
||
(target / "plugin.json").write_text(
|
||
json.dumps(payload, indent=2), encoding="utf-8"
|
||
)
|
||
|
||
|
||
def _rename_baked_tree(
|
||
*,
|
||
type_: str,
|
||
plugin_dir: Path,
|
||
old_suffix: str,
|
||
new_suffix: str,
|
||
description: Optional[str],
|
||
) -> None:
|
||
"""Rename a baked entity's on-disk slug so a re-uploader can take
|
||
the original name.
|
||
|
||
Per-type layout (see :func:`_bake_plugin_tree`):
|
||
* ``skill`` — rename ``skills/<old>/`` → ``skills/<new>/`` and
|
||
rewrite ``SKILL.md`` frontmatter ``name``.
|
||
* ``agent`` — rename ``agents/<old>.md`` → ``agents/<new>.md``
|
||
and rewrite the file's frontmatter ``name``.
|
||
* ``plugin`` — tree paths don't carry the suffix; only the
|
||
synth ``.claude-plugin/plugin.json`` ``name`` field changes.
|
||
|
||
Always rewrites the synth ``plugin.json`` ``name`` to the new
|
||
suffix. Idempotent — old==new is a no-op.
|
||
"""
|
||
if old_suffix == new_suffix or not plugin_dir.is_dir():
|
||
return
|
||
|
||
if type_ == "skill":
|
||
old_dir = plugin_dir / "skills" / old_suffix
|
||
new_dir = plugin_dir / "skills" / new_suffix
|
||
if old_dir.is_dir() and not new_dir.exists():
|
||
old_dir.rename(new_dir)
|
||
skill_md = new_dir / "SKILL.md"
|
||
if skill_md.is_file():
|
||
text = skill_md.read_text(encoding="utf-8", errors="replace")
|
||
skill_md.write_text(
|
||
_set_frontmatter_name(text, new_suffix), encoding="utf-8"
|
||
)
|
||
elif type_ == "agent":
|
||
old_md = plugin_dir / "agents" / f"{old_suffix}.md"
|
||
new_md = plugin_dir / "agents" / f"{new_suffix}.md"
|
||
if old_md.is_file() and not new_md.exists():
|
||
old_md.rename(new_md)
|
||
if new_md.is_file():
|
||
text = new_md.read_text(encoding="utf-8", errors="replace")
|
||
new_md.write_text(
|
||
_set_frontmatter_name(text, new_suffix), encoding="utf-8"
|
||
)
|
||
elif type_ == "plugin":
|
||
pj_path = plugin_dir / ".claude-plugin" / "plugin.json"
|
||
if pj_path.is_file():
|
||
try:
|
||
data = json.loads(pj_path.read_text(encoding="utf-8"))
|
||
if isinstance(data, dict):
|
||
data["name"] = new_suffix
|
||
pj_path.write_text(
|
||
json.dumps(data, indent=2), encoding="utf-8"
|
||
)
|
||
except (OSError, ValueError):
|
||
pass
|
||
|
||
# Always rewrite the synth plugin.json (skill + agent path) so the
|
||
# name aligns with the renamed slug.
|
||
if type_ in ("skill", "agent"):
|
||
_write_synth_plugin_json(plugin_dir, new_suffix, description)
|
||
|
||
|
||
# ---------------------------------------------------------------------------
|
||
# Listing + detail endpoints
|
||
# ---------------------------------------------------------------------------
|
||
|
||
|
||
@router.get("/categories", response_model=List[str])
|
||
async def my_categories(
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
"""Caller's group names — populates the upload form's category select."""
|
||
return _categories_for_user(conn, user["id"])
|
||
|
||
|
||
class OwnerOption(BaseModel):
|
||
user_id: str
|
||
display_name: str
|
||
entity_count: int
|
||
|
||
|
||
@router.get("/owners", response_model=List[OwnerOption])
|
||
async def list_owners(
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
"""Owners who have at least one entity in the Store — populates the
|
||
listing-page owner filter. Sorted by display name (name fallbacks to
|
||
email then to username) so the dropdown is alphabetical and stable.
|
||
|
||
Visibility filter (v32+/v35): non-admin sees owners-of-approved
|
||
only (a submitter with N quarantined uploads must not surface in
|
||
the public dropdown until at least one is approved). Admin sees
|
||
every owner regardless of state.
|
||
"""
|
||
if is_user_admin(user["id"], conn):
|
||
where_sql = ""
|
||
params: list = []
|
||
else:
|
||
# 'approved' is the public set. Owners of only-archived /
|
||
# only-pending / only-blocked entries don't appear in the
|
||
# public dropdown — they have nothing to filter to.
|
||
where_sql = "WHERE se.visibility_status = 'approved'"
|
||
params = []
|
||
rows = conn.execute(
|
||
f"""SELECT
|
||
se.owner_user_id,
|
||
COALESCE(NULLIF(TRIM(u.name), ''), u.email, se.owner_username) AS display_name,
|
||
COUNT(*) AS entity_count
|
||
FROM store_entities se
|
||
LEFT JOIN users u ON u.id = se.owner_user_id
|
||
{where_sql}
|
||
GROUP BY se.owner_user_id, display_name
|
||
ORDER BY display_name""",
|
||
params,
|
||
).fetchall()
|
||
return [
|
||
OwnerOption(
|
||
user_id=r[0],
|
||
display_name=str(r[1]),
|
||
entity_count=int(r[2]),
|
||
)
|
||
for r in rows
|
||
]
|
||
|
||
|
||
@router.get("/entities", response_model=StoreEntityListResponse)
|
||
async def list_entities(
|
||
skip: int = Query(0, ge=0),
|
||
limit: int = Query(24, ge=1, le=100),
|
||
type: Optional[str] = Query(None),
|
||
category: Optional[str] = Query(None),
|
||
search: Optional[str] = Query(None),
|
||
owner: Optional[str] = Query(None),
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
if type and type not in _VALID_TYPES:
|
||
raise HTTPException(status_code=400, detail="invalid_type")
|
||
repo = StoreEntitiesRepository(conn)
|
||
# Visibility filter: hide pending/blocked from the public flea browse.
|
||
# An owner viewing their own uploads (`owner=<self_id>`) sees their
|
||
# whole catalogue regardless of guardrail status — same goes for
|
||
# admins via the existing /admin path. Anyone else only sees approved.
|
||
visibility_filter: Optional[List[str]]
|
||
include_owner_id: Optional[str] = None
|
||
is_admin = is_user_admin(user["id"], conn)
|
||
is_self_owner = bool(owner and owner == user["id"])
|
||
if is_admin or is_self_owner:
|
||
visibility_filter = None
|
||
else:
|
||
visibility_filter = ["approved"]
|
||
# Owner sees their own non-approved entries in the listing too
|
||
# so they spot what they uploaded that's still under review or
|
||
# quarantined. The card template renders a status badge for
|
||
# those rows; without this an upload silently disappears from
|
||
# the grid the moment it's quarantined.
|
||
include_owner_id = user["id"]
|
||
items, total = repo.list(
|
||
skip=skip,
|
||
limit=limit,
|
||
type=type,
|
||
category=category,
|
||
search=search,
|
||
owner_user_id=owner,
|
||
visibility_status=visibility_filter,
|
||
include_owner_id=include_owner_id,
|
||
)
|
||
return StoreEntityListResponse(
|
||
items=[_entity_to_response(conn, e) for e in items],
|
||
total=total,
|
||
skip=skip,
|
||
limit=limit,
|
||
)
|
||
|
||
|
||
def _enforce_visibility(entity: dict, user: dict, conn) -> None:
|
||
"""Refuse asset reads on quarantined entities for non-owner non-admin.
|
||
|
||
Returns 404 (not 403) so the existence of the entity is not leaked
|
||
via timing / status-code differences. Owner + admin always pass.
|
||
"""
|
||
if entity.get("visibility_status") == "approved":
|
||
return
|
||
if entity.get("owner_user_id") == user.get("id"):
|
||
return
|
||
if is_user_admin(user["id"], conn):
|
||
return
|
||
raise HTTPException(status_code=404, detail="entity_not_found")
|
||
|
||
|
||
@router.get("/entities/{entity_id}", response_model=StoreEntityResponse)
|
||
async def get_entity(
|
||
entity_id: str,
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
entity = StoreEntitiesRepository(conn).get(entity_id)
|
||
if not entity:
|
||
raise HTTPException(status_code=404, detail="entity_not_found")
|
||
_enforce_visibility(entity, user, conn)
|
||
return _entity_to_response(conn, entity)
|
||
|
||
|
||
@router.get("/entities/{entity_id}/files")
|
||
async def list_entity_files(
|
||
entity_id: str,
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
entity = StoreEntitiesRepository(conn).get(entity_id)
|
||
if not entity:
|
||
raise HTTPException(status_code=404, detail="entity_not_found")
|
||
_enforce_visibility(entity, user, conn)
|
||
plugin_dir = _plugin_dir(entity_id)
|
||
if not plugin_dir.is_dir():
|
||
return {"files": []}
|
||
files = []
|
||
for f in sorted(plugin_dir.rglob("*")):
|
||
if f.is_file():
|
||
files.append(
|
||
{
|
||
"path": f.relative_to(plugin_dir).as_posix(),
|
||
"size": f.stat().st_size,
|
||
}
|
||
)
|
||
return {"files": files}
|
||
|
||
|
||
@router.get("/entities/{entity_id}/photo")
|
||
async def get_entity_photo(
|
||
entity_id: str,
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
entity = StoreEntitiesRepository(conn).get(entity_id)
|
||
if not entity or not entity.get("photo_path"):
|
||
raise HTTPException(status_code=404, detail="photo_not_found")
|
||
_enforce_visibility(entity, user, conn)
|
||
abs_path = _entity_dir(entity_id) / entity["photo_path"]
|
||
if not abs_path.is_file():
|
||
raise HTTPException(status_code=404, detail="photo_not_found")
|
||
return FileResponse(abs_path)
|
||
|
||
|
||
@router.get("/entities/{entity_id}/docs/{filename}")
|
||
async def get_entity_doc(
|
||
entity_id: str,
|
||
filename: str,
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
"""Stream an attached doc — directory-traversal-guarded."""
|
||
entity = StoreEntitiesRepository(conn).get(entity_id)
|
||
if not entity:
|
||
raise HTTPException(status_code=404, detail="entity_not_found")
|
||
_enforce_visibility(entity, user, conn)
|
||
docs_dir = (_assets_dir(entity_id) / "docs").resolve()
|
||
abs_path = (docs_dir / filename).resolve()
|
||
try:
|
||
abs_path.relative_to(docs_dir)
|
||
except ValueError:
|
||
raise HTTPException(status_code=400, detail="invalid_path")
|
||
if not abs_path.is_file():
|
||
raise HTTPException(status_code=404, detail="doc_not_found")
|
||
return FileResponse(abs_path, filename=filename)
|
||
|
||
|
||
# ---------------------------------------------------------------------------
|
||
# Preview — POST /api/store/entities/preview
|
||
# ---------------------------------------------------------------------------
|
||
|
||
|
||
@router.post("/entities/preview", response_model=PreviewResponse)
|
||
async def preview_entity(
|
||
file: UploadFile = File(...),
|
||
type: str = Form(...),
|
||
user: dict = Depends(get_current_user),
|
||
):
|
||
"""Wizard step 1 — validate the uploaded ZIP and parse frontmatter for
|
||
pre-fill on step 2. Does **not** persist anything: tmp dir is wiped before
|
||
the response returns. The browser must hold the same File and re-submit
|
||
it on step 2 (POST /entities) for the actual create.
|
||
"""
|
||
if type not in _VALID_TYPES:
|
||
raise HTTPException(status_code=400, detail="invalid_type")
|
||
|
||
tmp, _ = await _stream_to_temp(file, MAX_ZIP_SIZE, suffix=".zip")
|
||
try:
|
||
tmp.close()
|
||
scratch = Path(tempfile.mkdtemp(prefix="agnes_store_preview_"))
|
||
try:
|
||
try:
|
||
with zipfile.ZipFile(tmp.name, "r") as zf:
|
||
_safe_zip_extract(zf, scratch)
|
||
except zipfile.BadZipFile:
|
||
raise HTTPException(status_code=422, detail="zip_invalid")
|
||
meta = _validate_and_extract_metadata(type, scratch)
|
||
finally:
|
||
shutil.rmtree(scratch, ignore_errors=True)
|
||
finally:
|
||
Path(tmp.name).unlink(missing_ok=True)
|
||
|
||
return PreviewResponse(
|
||
type=type,
|
||
name=meta.get("name"),
|
||
description=meta.get("description"),
|
||
)
|
||
|
||
|
||
# ---------------------------------------------------------------------------
|
||
# Create (upload) — POST /api/store/entities
|
||
# ---------------------------------------------------------------------------
|
||
|
||
|
||
@router.post("/entities", response_model=StoreEntityResponse, status_code=201)
|
||
async def create_entity(
|
||
background_tasks: BackgroundTasks,
|
||
file: UploadFile = File(...),
|
||
type: str = Form(...),
|
||
name: Optional[str] = Form(None),
|
||
description: Optional[str] = Form(None),
|
||
category: Optional[str] = Form(None),
|
||
video_url: Optional[str] = Form(None),
|
||
photo: Optional[UploadFile] = File(None),
|
||
docs: List[UploadFile] = File(default=[]),
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
if type not in _VALID_TYPES:
|
||
raise HTTPException(status_code=400, detail="invalid_type")
|
||
|
||
try:
|
||
username = sanitize_username(user["email"])
|
||
except (KeyError, ValueError):
|
||
raise HTTPException(status_code=400, detail="invalid_email")
|
||
|
||
# Category: must be one of caller's groups.
|
||
if category and not is_valid_category(category):
|
||
raise HTTPException(status_code=400, detail="invalid_category")
|
||
|
||
video_url = _validate_video_url(video_url)
|
||
|
||
# Per-submitter spam quota (v30). When persisted-bundle is on, a bot
|
||
# looping on malformed ZIPs would otherwise fill disk + the admin
|
||
# queue with noise. Cap rejected uploads per submitter per 24h;
|
||
# operator can disable by setting the knob to 0.
|
||
#
|
||
# #9 widening: counts blocked_inline + blocked_llm + review_error so
|
||
# a submitter triggering only LLM-blocked verdicts is bounded too.
|
||
#
|
||
# Race note (#5, deferred): two parallel uploads from the same
|
||
# submitter can both pass the SELECT before either INSERT — the cap
|
||
# may be exceeded by the number of in-flight requests. A
|
||
# threading.Lock would block the asyncio event loop; a proper fix
|
||
# needs an asyncio.Lock held across the entire pipeline (extract +
|
||
# check + insert) which is a larger restructure tracked separately.
|
||
# API-level rate limiting (slowapi) bounds the worst case until then.
|
||
from datetime import datetime as _dt, timezone as _tz, timedelta as _td
|
||
from app.instance_config import get_guardrails_blocked_quota_per_day
|
||
quota = get_guardrails_blocked_quota_per_day()
|
||
if quota > 0:
|
||
since = _dt.now(_tz.utc) - _td(hours=24)
|
||
recent_blocked = StoreSubmissionsRepository(conn) \
|
||
.count_blocked_for_submitter_since(user["id"], since)
|
||
if recent_blocked >= quota:
|
||
raise HTTPException(
|
||
status_code=429,
|
||
detail={
|
||
"code": "quota_exceeded",
|
||
"blocked_in_last_24h": recent_blocked,
|
||
"limit": quota,
|
||
"hint": "Fix the previous upload errors before retrying. "
|
||
"Quota resets 24h after each blocked attempt.",
|
||
},
|
||
)
|
||
|
||
# Stream + extract ZIP into a scratch dir. Both the temp-file (`tmp`)
|
||
# AND the scratch dir need cleanup on every exit path, including
|
||
# validation HTTPExceptions raised inside _safe_zip_extract
|
||
# (zip_unsafe_path, zip_too_large_uncompressed) and the BadZipFile→422
|
||
# conversion. Pre-fix the scratch was created in one try/finally and
|
||
# cleaned up in a SEPARATE one — when extraction raised, control
|
||
# exited the first scope and the second never ran, leaking the dir.
|
||
# Single try/finally fixes both.
|
||
tmp, size = await _stream_to_temp(file, MAX_ZIP_SIZE, suffix=".zip")
|
||
tmp.close()
|
||
scratch = Path(tempfile.mkdtemp(prefix="agnes_store_"))
|
||
try:
|
||
try:
|
||
with zipfile.ZipFile(tmp.name, "r") as zf:
|
||
_safe_zip_extract(zf, scratch)
|
||
except zipfile.BadZipFile:
|
||
raise HTTPException(status_code=422, detail="zip_invalid")
|
||
finally:
|
||
Path(tmp.name).unlink(missing_ok=True)
|
||
|
||
meta = _validate_and_extract_metadata(type, scratch)
|
||
final_name = (name or meta.get("name") or "").strip()
|
||
if not final_name:
|
||
raise HTTPException(status_code=400, detail="missing_name")
|
||
if not _NAME_RE.match(final_name):
|
||
raise HTTPException(status_code=400, detail="invalid_name_format")
|
||
final_description = description or meta.get("description")
|
||
|
||
repo = StoreEntitiesRepository(conn)
|
||
# Skip archived rows: archive renames the row to free the slot,
|
||
# so a same-name re-upload after archive succeeds. Active rows
|
||
# (approved / pending / hidden) still 409 on collision.
|
||
if repo.get_by_owner_and_name(user["id"], final_name, exclude_archived=True):
|
||
raise HTTPException(status_code=409, detail="conflict_owner_name")
|
||
|
||
entity_id = uuid.uuid4().hex
|
||
suffixed = suffixed_name(final_name, username)
|
||
# Global cross-owner check — sanitize_username is many-to-one, so
|
||
# two emails (alice.smith / alice_smith) can resolve to the same
|
||
# username and produce the same `<name>-by-<username>` suffix even
|
||
# when the per-owner UNIQUE passes. The suffixed value drives both
|
||
# the bundle on-disk dir and the served plugin.json `name`, so a
|
||
# collision silently last-write-wins. Refuse upfront — but skip
|
||
# archived rows since archive renames their slug.
|
||
if _suffixed_already_taken(conn, suffixed, exclude_archived=True):
|
||
raise HTTPException(status_code=409, detail="conflict_global_suffix")
|
||
plugin_dir = _plugin_dir(entity_id)
|
||
file_size = _bake_plugin_tree(
|
||
type_=type,
|
||
extracted_root=scratch,
|
||
plugin_dir=plugin_dir,
|
||
final_name=final_name,
|
||
suffixed=suffixed,
|
||
description=final_description,
|
||
)
|
||
version = compute_entity_version(plugin_dir)
|
||
|
||
# ---- Guardrail pipeline ------------------------------------------
|
||
#
|
||
# Inline checks (manifest, static security, quality+templating)
|
||
# run synchronously against the BAKED plugin tree.
|
||
#
|
||
# On fail (v30): we KEEP the bundle on disk, create the entity
|
||
# row at ``visibility_status='hidden'`` (invisible in flea
|
||
# browse + served marketplace), persist the submission with
|
||
# entity_id + sha + size. Admin can Rescan / Override /
|
||
# Download from /admin/store/submissions. The 30-day TTL job
|
||
# purges bundle bytes later but keeps the audit row.
|
||
#
|
||
# On pass: create the entity in 'pending' visibility, schedule
|
||
# the (slow) LLM review on BackgroundTasks. Entity only
|
||
# becomes visible after the review approves it (or admin
|
||
# overrides). See docs/STORE_GUARDRAILS.md.
|
||
from src.store_guardrails.bundle_meta import compute_bundle_meta
|
||
bundle_meta = compute_bundle_meta(plugin_dir)
|
||
inline = run_inline_checks(
|
||
plugin_dir, type_=type, description=final_description,
|
||
)
|
||
subs_repo = StoreSubmissionsRepository(conn)
|
||
if not inline.passed:
|
||
# Persist the bundle on disk so admins can Rescan/Download.
|
||
# No rmtree here — TTL purge owns deletion now.
|
||
photo_rel = await _save_photo(photo, entity_id) if photo else None
|
||
doc_rels = await _save_docs(docs, entity_id)
|
||
repo.create(
|
||
id=entity_id,
|
||
owner_user_id=user["id"],
|
||
owner_username=username,
|
||
type=type,
|
||
name=final_name,
|
||
description=final_description,
|
||
category=category,
|
||
version=version,
|
||
photo_path=photo_rel,
|
||
video_url=video_url,
|
||
doc_paths=doc_rels,
|
||
file_size=file_size,
|
||
visibility_status="hidden",
|
||
)
|
||
sub_id = subs_repo.create(
|
||
submitter_id=user["id"],
|
||
submitter_email=user.get("email"),
|
||
type=type,
|
||
name=final_name,
|
||
version=version,
|
||
status="blocked_inline",
|
||
entity_id=entity_id,
|
||
inline_checks=inline.to_response_dict(),
|
||
file_size=bundle_meta.file_size,
|
||
bundle_sha256=bundle_meta.sha256,
|
||
)
|
||
_audit(
|
||
conn, user["id"], "store.submission.blocked_inline",
|
||
sub_id, {
|
||
"type": type, "name": final_name,
|
||
"entity_id": entity_id,
|
||
"manifest": inline.manifest.get("status"),
|
||
"static_security": inline.static_security.get("status"),
|
||
"static_findings": len(inline.static_security.get("findings") or []),
|
||
"sha256": bundle_meta.sha256,
|
||
"file_size": bundle_meta.file_size,
|
||
},
|
||
)
|
||
raise HTTPException(
|
||
status_code=422,
|
||
detail={
|
||
"code": "submission_blocked",
|
||
"submission_id": sub_id,
|
||
"entity_id": entity_id,
|
||
"checks": inline.to_response_dict(),
|
||
},
|
||
)
|
||
|
||
guardrails_on = get_guardrails_enabled()
|
||
# When the pipeline is disabled (local dev / no ANTHROPIC_API_KEY)
|
||
# we approve immediately. Inline checks ran above and recorded a
|
||
# passing verdict; skip the async LLM step and the 'pending' hold.
|
||
initial_visibility = "approved" if not guardrails_on else "pending"
|
||
photo_rel = await _save_photo(photo, entity_id) if photo else None
|
||
doc_rels = await _save_docs(docs, entity_id)
|
||
|
||
repo.create(
|
||
id=entity_id,
|
||
owner_user_id=user["id"],
|
||
owner_username=username,
|
||
type=type,
|
||
name=final_name,
|
||
description=final_description,
|
||
category=category,
|
||
version=version,
|
||
photo_path=photo_rel,
|
||
video_url=video_url,
|
||
doc_paths=doc_rels,
|
||
file_size=file_size,
|
||
visibility_status=initial_visibility,
|
||
)
|
||
_audit(
|
||
conn,
|
||
user["id"],
|
||
"store.entity.create",
|
||
entity_id,
|
||
{"type": type, "name": final_name, "version": version, "size": file_size},
|
||
)
|
||
|
||
sub_id = subs_repo.create(
|
||
submitter_id=user["id"],
|
||
submitter_email=user.get("email"),
|
||
type=type,
|
||
name=final_name,
|
||
version=version,
|
||
status="approved" if not guardrails_on else "pending_llm",
|
||
entity_id=entity_id,
|
||
inline_checks=inline.to_response_dict(),
|
||
file_size=bundle_meta.file_size,
|
||
bundle_sha256=bundle_meta.sha256,
|
||
)
|
||
_audit(
|
||
conn, user["id"],
|
||
"store.submission.accepted" if guardrails_on else "store.submission.approved",
|
||
sub_id, {
|
||
"entity_id": entity_id,
|
||
"guardrails_enabled": guardrails_on,
|
||
},
|
||
)
|
||
if guardrails_on:
|
||
_schedule_llm_review(background_tasks, sub_id, plugin_dir)
|
||
finally:
|
||
shutil.rmtree(scratch, ignore_errors=True)
|
||
|
||
_invalidate_etag()
|
||
entity = StoreEntitiesRepository(conn).get(entity_id)
|
||
return _entity_to_response(conn, entity) # type: ignore[arg-type]
|
||
|
||
|
||
# ---------------------------------------------------------------------------
|
||
# Update — PUT /api/store/entities/{id}
|
||
# ---------------------------------------------------------------------------
|
||
|
||
|
||
@router.put("/entities/{entity_id}", response_model=StoreEntityResponse)
|
||
async def update_entity(
|
||
entity_id: str,
|
||
background_tasks: BackgroundTasks,
|
||
file: Optional[UploadFile] = File(None),
|
||
description: Optional[str] = Form(None),
|
||
category: Optional[str] = Form(None),
|
||
video_url: Optional[str] = Form(None),
|
||
photo: Optional[UploadFile] = File(None),
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
repo = StoreEntitiesRepository(conn)
|
||
entity = repo.get(entity_id)
|
||
if not entity:
|
||
raise HTTPException(status_code=404, detail="entity_not_found")
|
||
if entity["owner_user_id"] != user["id"] and not is_user_admin(user["id"], conn):
|
||
raise HTTPException(status_code=403, detail="not_owner")
|
||
|
||
if category and not is_valid_category(category):
|
||
raise HTTPException(status_code=400, detail="invalid_category")
|
||
|
||
video_url = _validate_video_url(video_url)
|
||
|
||
new_version: Optional[str] = None
|
||
new_size: Optional[int] = None
|
||
inline_after_update: Optional[InlineResult] = None
|
||
if file is not None:
|
||
# PUT atomicity: bake the new bundle into a staging dir alongside
|
||
# the live one, run checks against the staging copy, then atomic-
|
||
# rename onto the live path on success. Pre-fix, the bake wrote
|
||
# directly into the live `plugin/` dir BEFORE checks ran — a
|
||
# concurrent GET during the window saw partial / unverified
|
||
# content. Staging closes that window: failed checks leave the
|
||
# live tree byte-for-byte intact.
|
||
tmp, size = await _stream_to_temp(file, MAX_ZIP_SIZE, suffix=".zip")
|
||
tmp.close()
|
||
scratch = Path(tempfile.mkdtemp(prefix="agnes_store_"))
|
||
# Sibling of the live `plugin/` dir so the eventual rename is on
|
||
# the same filesystem (atomic on POSIX). Random suffix in case
|
||
# multiple PUTs land back-to-back.
|
||
existing_plugin = _plugin_dir(entity_id)
|
||
staging_plugin = existing_plugin.with_name(
|
||
f"plugin.staging-{os.urandom(4).hex()}",
|
||
)
|
||
backup_plugin: Optional[Path] = None # set if the swap starts
|
||
try:
|
||
try:
|
||
with zipfile.ZipFile(tmp.name, "r") as zf:
|
||
_safe_zip_extract(zf, scratch)
|
||
except zipfile.BadZipFile:
|
||
raise HTTPException(status_code=422, detail="zip_invalid")
|
||
finally:
|
||
Path(tmp.name).unlink(missing_ok=True)
|
||
|
||
_validate_and_extract_metadata(entity["type"], scratch)
|
||
suffixed = suffixed_name(entity["name"], entity["owner_username"])
|
||
# Bake into the staging dir — _bake_plugin_tree creates the
|
||
# target if missing and does its own rmtree on existing
|
||
# children, so the staging path being fresh is fine.
|
||
new_size = _bake_plugin_tree(
|
||
type_=entity["type"],
|
||
extracted_root=scratch,
|
||
plugin_dir=staging_plugin,
|
||
final_name=entity["name"],
|
||
suffixed=suffixed,
|
||
description=description if description is not None else entity.get("description"),
|
||
)
|
||
new_version = compute_entity_version(staging_plugin)
|
||
|
||
inline_after_update = run_inline_checks(
|
||
staging_plugin,
|
||
type_=entity["type"],
|
||
description=description if description is not None
|
||
else entity.get("description"),
|
||
)
|
||
if not inline_after_update.passed:
|
||
# Compute the rejected bundle's sha+size BEFORE the
|
||
# staging dir is removed so the submission row carries
|
||
# forensics about what was tried.
|
||
from src.store_guardrails.bundle_meta import compute_bundle_meta
|
||
rejected_meta = compute_bundle_meta(staging_plugin)
|
||
# Live tree was never touched — no rollback work to do.
|
||
# The `finally` below cleans the staging dir.
|
||
subs_repo = StoreSubmissionsRepository(conn)
|
||
sub_id = subs_repo.create(
|
||
submitter_id=user["id"],
|
||
submitter_email=user.get("email"),
|
||
type=entity["type"],
|
||
name=entity["name"],
|
||
version=new_version,
|
||
status="blocked_inline",
|
||
entity_id=entity_id,
|
||
inline_checks=inline_after_update.to_response_dict(),
|
||
file_size=rejected_meta.file_size,
|
||
bundle_sha256=rejected_meta.sha256,
|
||
)
|
||
_audit(
|
||
conn, user["id"], "store.submission.blocked_inline",
|
||
sub_id, {"entity_id": entity_id, "on": "update",
|
||
"sha256": rejected_meta.sha256,
|
||
"file_size": rejected_meta.file_size},
|
||
)
|
||
raise HTTPException(
|
||
status_code=422,
|
||
detail={
|
||
"code": "submission_blocked",
|
||
"submission_id": sub_id,
|
||
"entity_id": entity_id,
|
||
"checks": inline_after_update.to_response_dict(),
|
||
},
|
||
)
|
||
|
||
# Checks passed — swap staging onto live. Three steps:
|
||
# rename live → backup, rename staging → live, rmtree backup.
|
||
# If the second rename fails, restore by renaming backup
|
||
# back. Same-FS rename is atomic on POSIX.
|
||
if existing_plugin.exists():
|
||
backup_plugin = existing_plugin.with_name(
|
||
f"plugin.backup-{os.urandom(4).hex()}",
|
||
)
|
||
os.rename(existing_plugin, backup_plugin)
|
||
try:
|
||
os.rename(staging_plugin, existing_plugin)
|
||
except OSError:
|
||
# Restore live from backup if the swap failed mid-way.
|
||
if backup_plugin is not None and backup_plugin.exists():
|
||
try:
|
||
os.rename(backup_plugin, existing_plugin)
|
||
except OSError:
|
||
pass
|
||
raise
|
||
# Live now points at the new bundle; drop the backup copy.
|
||
if backup_plugin is not None:
|
||
shutil.rmtree(backup_plugin, ignore_errors=True)
|
||
backup_plugin = None
|
||
finally:
|
||
shutil.rmtree(scratch, ignore_errors=True)
|
||
# If staging never made it through the swap (failed check or
|
||
# exception), remove it.
|
||
if staging_plugin.exists():
|
||
shutil.rmtree(staging_plugin, ignore_errors=True)
|
||
# If a swap failed leaving the backup orphaned without a
|
||
# restore, leave it on disk for the operator to recover —
|
||
# better to log and inspect than silently drop.
|
||
if backup_plugin is not None and backup_plugin.exists():
|
||
logger.error(
|
||
"PUT atomic-swap left orphan backup at %s — "
|
||
"operator should reconcile manually",
|
||
backup_plugin,
|
||
)
|
||
|
||
photo_rel: Optional[str] = None
|
||
if photo is not None:
|
||
# Replace existing photo if present.
|
||
existing_photo = entity.get("photo_path")
|
||
if existing_photo:
|
||
try:
|
||
(_entity_dir(entity_id) / existing_photo).unlink(missing_ok=True)
|
||
except OSError:
|
||
pass
|
||
photo_rel = await _save_photo(photo, entity_id)
|
||
|
||
repo.update(
|
||
entity_id,
|
||
description=description,
|
||
category=category,
|
||
version=new_version,
|
||
photo_path=photo_rel,
|
||
video_url=video_url,
|
||
file_size=new_size,
|
||
)
|
||
|
||
# Re-run the LLM review whenever the bundle bytes changed. New
|
||
# version = new submission row; visibility flips back to 'pending'
|
||
# until the review approves the new tree. Description-only / photo-
|
||
# only edits don't re-trigger.
|
||
if file is not None:
|
||
guardrails_on = get_guardrails_enabled()
|
||
if guardrails_on:
|
||
repo.set_visibility(entity_id, "pending")
|
||
subs_repo = StoreSubmissionsRepository(conn)
|
||
# Compute sha256 for the accepted re-upload so the submission row
|
||
# carries forensics for retry-chain correlation.
|
||
from src.store_guardrails.bundle_meta import compute_bundle_meta
|
||
accepted_meta = compute_bundle_meta(_plugin_dir(entity_id))
|
||
sub_id = subs_repo.create(
|
||
submitter_id=user["id"],
|
||
submitter_email=user.get("email"),
|
||
type=entity["type"],
|
||
name=entity["name"],
|
||
version=new_version,
|
||
status="approved" if not guardrails_on else "pending_llm",
|
||
entity_id=entity_id,
|
||
inline_checks=inline_after_update.to_response_dict()
|
||
if inline_after_update else None,
|
||
file_size=accepted_meta.file_size,
|
||
bundle_sha256=accepted_meta.sha256,
|
||
)
|
||
_audit(
|
||
conn, user["id"],
|
||
"store.submission.accepted" if guardrails_on else "store.submission.approved",
|
||
sub_id, {"entity_id": entity_id, "on": "update",
|
||
"guardrails_enabled": guardrails_on},
|
||
)
|
||
if guardrails_on:
|
||
_schedule_llm_review(background_tasks, sub_id, _plugin_dir(entity_id))
|
||
|
||
_audit(
|
||
conn,
|
||
user["id"],
|
||
"store.entity.update",
|
||
entity_id,
|
||
{"version": new_version, "rebuilt": file is not None},
|
||
)
|
||
_invalidate_etag()
|
||
return _entity_to_response(conn, repo.get(entity_id)) # type: ignore[arg-type]
|
||
|
||
|
||
# ---------------------------------------------------------------------------
|
||
# Delete — DELETE /api/store/entities/{id}
|
||
# ---------------------------------------------------------------------------
|
||
|
||
|
||
@router.delete("/entities/{entity_id}", response_model=OkResponse)
|
||
async def delete_entity(
|
||
entity_id: str,
|
||
hard: bool = False,
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
"""Soft-archive (default) or hard-delete (admin-only).
|
||
|
||
* **Soft (default)** — flips `visibility_status='archived'`. Bundle
|
||
stays on disk; existing user_store_installs continue serving the
|
||
bundle through marketplace.zip / .git so already-installed users
|
||
don't lose the plugin. Browse listings hide it; install endpoint
|
||
refuses new installs. Owner + admin can soft-archive an
|
||
approved entity.
|
||
* **Hard (`?hard=true`)** — admin-only. Drops the row, removes the
|
||
bundle from disk, deletes user_store_installs (existing users
|
||
lose the plugin). Use for legal / privacy removals where the
|
||
bytes have to go.
|
||
|
||
Quarantined (pending / blocked / hidden) entities: only admins can
|
||
archive or hard-delete; owner is refused so they can't erase the
|
||
evidence of a flagged upload before triage.
|
||
"""
|
||
entity = StoreEntitiesRepository(conn).get(entity_id)
|
||
if not entity:
|
||
raise HTTPException(status_code=404, detail="entity_not_found")
|
||
is_admin_caller = is_user_admin(user["id"], conn)
|
||
if entity["owner_user_id"] != user["id"] and not is_admin_caller:
|
||
raise HTTPException(status_code=403, detail="not_owner")
|
||
|
||
# Hard delete is admin-only. Owners (or admins without ?hard=true)
|
||
# take the soft archive path below.
|
||
if hard and not is_admin_caller:
|
||
raise HTTPException(
|
||
status_code=403,
|
||
detail={
|
||
"code": "hard_delete_admin_only",
|
||
"hint": "Hard delete is admin-only — it drops the bundle "
|
||
"from disk and removes existing installs. Use the "
|
||
"default Archive button to soft-delete (keeps "
|
||
"existing installs working).",
|
||
},
|
||
)
|
||
|
||
# Quarantined (non-approved + non-archived): owner can't touch.
|
||
# Admin can either Archive (soft) or Hard Delete from here.
|
||
if (
|
||
entity.get("visibility_status") not in ("approved", "archived")
|
||
and not is_admin_caller
|
||
):
|
||
raise HTTPException(
|
||
status_code=403,
|
||
detail={
|
||
"code": "quarantined_owner_cannot_delete",
|
||
"hint": "This submission is under quarantine while admins "
|
||
"review it. Edit and re-upload to fix the issues, "
|
||
"or wait for an admin to resolve the quarantine.",
|
||
},
|
||
)
|
||
|
||
if hard:
|
||
# Mark linked submissions before dropping the entity row so
|
||
# mark_deleted_for_entity can find them by entity_id.
|
||
StoreSubmissionsRepository(conn).mark_deleted_for_entity(entity_id)
|
||
UserStoreInstallsRepository(conn).delete_all_for_entity(entity_id)
|
||
StoreEntitiesRepository(conn).delete(entity_id)
|
||
shutil.rmtree(_entity_dir(entity_id), ignore_errors=True)
|
||
_audit(
|
||
conn,
|
||
user["id"],
|
||
"store.entity.hard_delete",
|
||
entity_id,
|
||
{"name": entity.get("name"),
|
||
"owner_user_id": entity.get("owner_user_id")},
|
||
)
|
||
_invalidate_etag()
|
||
return OkResponse()
|
||
|
||
# Soft archive — preserves disk + installs + audit chain.
|
||
# v36+: archive renames the entity row's `name` (appends
|
||
# `__archived__<epoch>`) so the (owner, name) UNIQUE slot AND
|
||
# the global `<name>-by-<owner_username>` slug slot free up for
|
||
# re-upload. The on-disk skill/agent/plugin subdir is renamed
|
||
# in lockstep + frontmatter rewritten so consumers see the
|
||
# plugin under the new slug on their next sync.
|
||
rename_info = StoreEntitiesRepository(conn).archive(
|
||
entity_id, by_user_id=user["id"],
|
||
)
|
||
original_name = rename_info["original_name"]
|
||
new_name = rename_info["new_name"]
|
||
if original_name and new_name and original_name != new_name:
|
||
owner_username = entity.get("owner_username") or ""
|
||
old_suffix = suffixed_name(original_name, owner_username)
|
||
new_suffix = suffixed_name(new_name, owner_username)
|
||
try:
|
||
_rename_baked_tree(
|
||
type_=entity["type"],
|
||
plugin_dir=_plugin_dir(entity_id),
|
||
old_suffix=old_suffix,
|
||
new_suffix=new_suffix,
|
||
description=entity.get("description"),
|
||
)
|
||
except Exception:
|
||
# On-disk rename failure leaves the row pointing at a
|
||
# stale slug. Revert the DB row so the system stays
|
||
# consistent (operator can retry archive).
|
||
logger.exception(
|
||
"archive on-disk rename failed for entity %s — "
|
||
"reverting DB",
|
||
entity_id,
|
||
)
|
||
conn.execute(
|
||
"""UPDATE store_entities
|
||
SET visibility_status = 'approved',
|
||
name = ?,
|
||
archived_at = NULL,
|
||
archived_by = NULL,
|
||
updated_at = ?
|
||
WHERE id = ?""",
|
||
[original_name, datetime.now(timezone.utc), entity_id],
|
||
)
|
||
raise HTTPException(
|
||
status_code=500, detail="archive_rename_failed",
|
||
)
|
||
_audit(
|
||
conn,
|
||
user["id"],
|
||
"store.entity.archive",
|
||
entity_id,
|
||
{"name": new_name,
|
||
"original_name": original_name,
|
||
"owner_user_id": entity.get("owner_user_id"),
|
||
"by_admin": is_admin_caller and entity["owner_user_id"] != user["id"]},
|
||
)
|
||
_invalidate_etag()
|
||
return OkResponse()
|
||
|
||
|
||
# ---------------------------------------------------------------------------
|
||
# Install / uninstall
|
||
# ---------------------------------------------------------------------------
|
||
|
||
|
||
@router.post("/entities/{entity_id}/install", response_model=InstallResponse)
|
||
async def install_entity(
|
||
entity_id: str,
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
repo = StoreEntitiesRepository(conn)
|
||
entity = repo.get(entity_id)
|
||
if not entity:
|
||
raise HTTPException(status_code=404, detail="entity_not_found")
|
||
# Block installs against entities still in guardrail review or that
|
||
# have been hidden — they are not visible in the public flea browse,
|
||
# but a user with the entity_id in hand could otherwise install
|
||
# directly. Owner installing their own pending entity gets the same
|
||
# 409 — they preview before publishing via /api/store/entities/preview
|
||
# or wait for approval.
|
||
if entity.get("visibility_status") != "approved" and not is_user_admin(user["id"], conn):
|
||
raise HTTPException(status_code=409, detail="entity_not_approved")
|
||
installs = UserStoreInstallsRepository(conn)
|
||
inserted = installs.install(user["id"], entity_id)
|
||
if inserted:
|
||
repo.bump_install_count(entity_id, +1)
|
||
_audit(conn, user["id"], "store.entity.install", entity_id)
|
||
_invalidate_etag()
|
||
return InstallResponse(entity_id=entity_id, installed=True)
|
||
|
||
|
||
@router.delete("/entities/{entity_id}/install", response_model=InstallResponse)
|
||
async def uninstall_entity(
|
||
entity_id: str,
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
installs = UserStoreInstallsRepository(conn)
|
||
deleted = installs.uninstall(user["id"], entity_id)
|
||
if deleted:
|
||
StoreEntitiesRepository(conn).bump_install_count(entity_id, -1)
|
||
_audit(conn, user["id"], "store.entity.uninstall", entity_id)
|
||
_invalidate_etag()
|
||
return InstallResponse(entity_id=entity_id, installed=False)
|
||
|
||
|
||
# ---------------------------------------------------------------------------
|
||
# Bundle: GET /api/store/bundle.zip + POST /api/store/import-bundle
|
||
# ---------------------------------------------------------------------------
|
||
#
|
||
# Whole-Store backup/restore primitive. Operationally consumed by the
|
||
# `agnes admin store {pull,push}` CLI commands which back up the Store to a
|
||
# git repo (or restore from one). Bundle format:
|
||
#
|
||
# agnes-store-bundle.zip
|
||
# ├── manifest.json ← {"format":1,"generated_at":..., "entries":[...]}
|
||
# └── entities/<entity_id>/
|
||
# ├── plugin/... ← canonical Claude Code plugin tree
|
||
# └── assets/... ← photo + docs
|
||
#
|
||
# Each manifest entry carries `owner_email` (resolved at export time from the
|
||
# users table) — when `import-bundle` lands on a different Agnes instance,
|
||
# the importer matches by email rather than by `owner_user_id` (the latter
|
||
# is per-instance and won't match). If the email is unknown on the target,
|
||
# we create a stub user (active=False, password_hash=NULL) so the historical
|
||
# owner is preserved; an admin can later activate or reassign.
|
||
#
|
||
# Bundle ordering is deterministic (entries sorted by entity_id, files within
|
||
# each entity sorted by relpath, fixed mtime) so that diffs of two
|
||
# successive snapshots stay clean when committed to git.
|
||
|
||
BUNDLE_FORMAT_VERSION = 1
|
||
BUNDLE_DETERMINISTIC_TIMESTAMP = (1980, 1, 1, 0, 0, 0)
|
||
|
||
|
||
class BundleEntry(BaseModel):
|
||
entity_id: str
|
||
type: str
|
||
name: str
|
||
description: Optional[str] = None
|
||
category: Optional[str] = None
|
||
version: str
|
||
owner_user_id: str
|
||
owner_email: Optional[str] = None
|
||
owner_username: str
|
||
install_count: int = 0
|
||
file_size: int = 0
|
||
photo_path: Optional[str] = None
|
||
video_url: Optional[str] = None
|
||
doc_paths: List[str] = []
|
||
created_at: Optional[str] = None
|
||
updated_at: Optional[str] = None
|
||
|
||
|
||
class BundleManifest(BaseModel):
|
||
format: int = BUNDLE_FORMAT_VERSION
|
||
generated_at: str
|
||
entry_count: int
|
||
entries: List[BundleEntry]
|
||
|
||
|
||
class ImportBundleResponse(BaseModel):
|
||
imported: int
|
||
replaced: int
|
||
skipped: int
|
||
stub_users_created: int
|
||
errors: List[dict] = []
|
||
|
||
|
||
def _resolve_owner_emails(
|
||
conn: duckdb.DuckDBPyConnection, owner_ids: List[str]
|
||
) -> dict:
|
||
"""Bulk-fetch user_id → email map for the given owners.
|
||
|
||
Empty list short-circuits to {} so the caller doesn't need a guard.
|
||
Missing rows are simply absent from the returned dict — the caller
|
||
falls back to the row's stored ``owner_username`` for diagnostics.
|
||
"""
|
||
if not owner_ids:
|
||
return {}
|
||
placeholders = ",".join(["?"] * len(owner_ids))
|
||
rows = conn.execute(
|
||
f"SELECT id, email FROM users WHERE id IN ({placeholders})",
|
||
list(owner_ids),
|
||
).fetchall()
|
||
return {r[0]: r[1] for r in rows}
|
||
|
||
|
||
def _walk_entity_files(entity_id: str) -> List[tuple[str, Path]]:
|
||
"""Return [(arcname, abs_path)] for every file under
|
||
``${DATA_DIR}/store/<entity_id>/`` that should land in the bundle.
|
||
|
||
Both ``plugin/`` and ``assets/`` subtrees are included. Output is
|
||
sorted by arcname so the resulting ZIP is byte-deterministic.
|
||
"""
|
||
out: list[tuple[str, Path]] = []
|
||
root = _entity_dir(entity_id)
|
||
if not root.is_dir():
|
||
return out
|
||
for f in sorted(p for p in root.rglob("*") if p.is_file()):
|
||
rel = f.relative_to(root).as_posix()
|
||
# Only ship plugin/ and assets/ subtrees — anything else under the
|
||
# entity dir is internal scratch and shouldn't enter the bundle.
|
||
first = rel.split("/", 1)[0] if "/" in rel else rel
|
||
if first not in ("plugin", "assets"):
|
||
continue
|
||
arc = f"entities/{entity_id}/{rel}"
|
||
out.append((arc, f))
|
||
return sorted(out, key=lambda t: t[0])
|
||
|
||
|
||
def _build_bundle_zip(
|
||
conn: duckdb.DuckDBPyConnection,
|
||
entries: List[dict],
|
||
) -> bytes:
|
||
"""Build the deterministic ZIP from a list of store_entities rows.
|
||
|
||
Entries arrive already filtered (per the caller's query). We resolve
|
||
owner_email in one bulk roundtrip to keep the export path off the
|
||
O(N) per-row query path.
|
||
"""
|
||
owner_emails = _resolve_owner_emails(
|
||
conn, list({e["owner_user_id"] for e in entries})
|
||
)
|
||
bundle_entries: List[dict] = []
|
||
for e in sorted(entries, key=lambda r: r["id"]):
|
||
bundle_entries.append(
|
||
{
|
||
"entity_id": e["id"],
|
||
"type": e["type"],
|
||
"name": e["name"],
|
||
"description": e.get("description"),
|
||
"category": e.get("category"),
|
||
"version": e["version"],
|
||
"owner_user_id": e["owner_user_id"],
|
||
"owner_email": owner_emails.get(e["owner_user_id"]),
|
||
"owner_username": e["owner_username"],
|
||
"install_count": int(e.get("install_count") or 0),
|
||
"file_size": int(e.get("file_size") or 0),
|
||
"photo_path": e.get("photo_path"),
|
||
"video_url": e.get("video_url"),
|
||
"doc_paths": e.get("doc_paths") or [],
|
||
"created_at": _to_iso(e.get("created_at")),
|
||
"updated_at": _to_iso(e.get("updated_at")),
|
||
}
|
||
)
|
||
|
||
manifest = {
|
||
"format": BUNDLE_FORMAT_VERSION,
|
||
"generated_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
|
||
"entry_count": len(bundle_entries),
|
||
"entries": bundle_entries,
|
||
}
|
||
|
||
members: list[tuple[str, bytes]] = [
|
||
("manifest.json", json.dumps(manifest, indent=2, sort_keys=False).encode("utf-8"))
|
||
]
|
||
for entry in bundle_entries:
|
||
for arc, abs_path in _walk_entity_files(entry["entity_id"]):
|
||
members.append((arc, abs_path.read_bytes()))
|
||
members.sort(key=lambda m: m[0])
|
||
|
||
buf = io.BytesIO()
|
||
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
|
||
for arc, data in members:
|
||
info = zipfile.ZipInfo(filename=arc, date_time=BUNDLE_DETERMINISTIC_TIMESTAMP)
|
||
info.compress_type = zipfile.ZIP_DEFLATED
|
||
info.external_attr = 0o644 << 16
|
||
zf.writestr(info, data)
|
||
return buf.getvalue()
|
||
|
||
|
||
@router.get("/bundle.zip")
|
||
async def export_bundle(
|
||
type: Optional[str] = Query(None, description="skill | agent | plugin"),
|
||
category: Optional[str] = Query(None),
|
||
search: Optional[str] = Query(None),
|
||
owner: Optional[str] = Query(None, description="Filter by owner user_id"),
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
"""Stream a ZIP of all (filtered) Store entities.
|
||
|
||
Auth: any authenticated user — the Store is community-open, the same
|
||
set is already visible via ``GET /api/store/entities``. The bundle is
|
||
deterministic so two consecutive pulls without state changes produce
|
||
byte-identical ZIPs (modulo the manifest's ``generated_at`` timestamp).
|
||
Filters mirror the listing endpoint so a backup workflow can scope by
|
||
type/owner if needed.
|
||
"""
|
||
if type and type not in _VALID_TYPES:
|
||
raise HTTPException(status_code=400, detail="invalid_type")
|
||
# `owner=me` magic value resolves to the caller's user id — used by
|
||
# `agnes store mine` so analysts can pull a bundle of just their own
|
||
# uploads without needing to look up their own user_id first.
|
||
if owner == "me":
|
||
owner = user["id"]
|
||
repo = StoreEntitiesRepository(conn)
|
||
# Page through everything. The 100/req limit on `list` is a UI
|
||
# pagination affordance, not a backup constraint — for a bulk export
|
||
# we want all matches.
|
||
items: list[dict] = []
|
||
skip = 0
|
||
page = 200
|
||
while True:
|
||
page_items, _total = repo.list(
|
||
skip=skip, limit=page, type=type, category=category,
|
||
search=search, owner_user_id=owner,
|
||
)
|
||
if not page_items:
|
||
break
|
||
items.extend(page_items)
|
||
if len(page_items) < page:
|
||
break
|
||
skip += page
|
||
|
||
payload = _build_bundle_zip(conn, items)
|
||
return Response(
|
||
content=payload,
|
||
media_type="application/zip",
|
||
headers={
|
||
"Content-Disposition": 'attachment; filename="agnes-store-bundle.zip"',
|
||
"X-Bundle-Entry-Count": str(len(items)),
|
||
},
|
||
)
|
||
|
||
|
||
def _import_one_entry(
|
||
conn: duckdb.DuckDBPyConnection,
|
||
entry: dict,
|
||
extract_root: Path,
|
||
*,
|
||
mode: str,
|
||
actor_user_id: str,
|
||
) -> tuple[str, int]:
|
||
"""Apply a single manifest entry. Returns ``(outcome, stub_users_created)``
|
||
where outcome is one of ``imported``, ``replaced``, ``skipped``.
|
||
|
||
Owner resolution: we match the bundle's ``owner_email`` against
|
||
``users.email``. Missing → create a stub (active=False, no password)
|
||
so the historical owner stays attached; an admin can activate or
|
||
reassign in /admin/users. The stub gets ``id = "imported-" +
|
||
sha256(email)[:12]`` to make it idempotent across repeated imports.
|
||
"""
|
||
entity_id = entry["entity_id"]
|
||
repo = StoreEntitiesRepository(conn)
|
||
existing = repo.get(entity_id)
|
||
|
||
if existing:
|
||
if mode == "skip":
|
||
return ("skipped", 0)
|
||
if mode == "merge":
|
||
# Keep newer version (content-hash). If equal, skip.
|
||
if (existing.get("version") or "") == (entry.get("version") or ""):
|
||
return ("skipped", 0)
|
||
# mode='replace' OR mode='merge' with newer version → fall through.
|
||
|
||
# Resolve owner.
|
||
user_repo = UserRepository(conn)
|
||
owner_email = (entry.get("owner_email") or "").strip().lower()
|
||
stub_created = 0
|
||
owner_user_id: Optional[str] = None
|
||
if owner_email:
|
||
existing_user = user_repo.get_by_email(owner_email)
|
||
if existing_user:
|
||
owner_user_id = existing_user["id"]
|
||
else:
|
||
import hashlib as _hl
|
||
stub_id = "imported-" + _hl.sha256(owner_email.encode("utf-8")).hexdigest()[:12]
|
||
if not user_repo.get_by_id(stub_id):
|
||
user_repo.create(
|
||
id=stub_id, email=owner_email, name=owner_email,
|
||
password_hash=None,
|
||
)
|
||
user_repo.update(stub_id, active=False)
|
||
stub_created = 1
|
||
owner_user_id = stub_id
|
||
if owner_user_id is None:
|
||
# Fallback: use the importer (admin) so the row has a valid owner.
|
||
owner_user_id = actor_user_id
|
||
|
||
# Materialize files.
|
||
src_dir = extract_root / "entities" / entity_id
|
||
if not src_dir.is_dir():
|
||
raise HTTPException(
|
||
status_code=422,
|
||
detail=f"manifest entry {entity_id!r} has no entities/<id>/ directory in the bundle",
|
||
)
|
||
target_dir = _entity_dir(entity_id)
|
||
if existing and target_dir.exists():
|
||
shutil.rmtree(target_dir)
|
||
target_dir.mkdir(parents=True, exist_ok=True)
|
||
for f in src_dir.rglob("*"):
|
||
if not f.is_file():
|
||
continue
|
||
rel = f.relative_to(src_dir)
|
||
dest = target_dir / rel
|
||
dest.parent.mkdir(parents=True, exist_ok=True)
|
||
shutil.copy2(f, dest)
|
||
|
||
# Upsert DB row.
|
||
if existing:
|
||
repo.update(
|
||
entity_id,
|
||
description=entry.get("description"),
|
||
category=entry.get("category"),
|
||
version=entry["version"],
|
||
photo_path=entry.get("photo_path"),
|
||
video_url=entry.get("video_url"),
|
||
doc_paths=entry.get("doc_paths") or [],
|
||
file_size=int(entry.get("file_size") or 0),
|
||
)
|
||
return ("replaced", stub_created)
|
||
|
||
repo.create(
|
||
id=entity_id,
|
||
owner_user_id=owner_user_id,
|
||
owner_username=entry.get("owner_username") or owner_email.split("@")[0],
|
||
type=entry["type"],
|
||
name=entry["name"],
|
||
description=entry.get("description"),
|
||
category=entry.get("category"),
|
||
version=entry["version"],
|
||
photo_path=entry.get("photo_path"),
|
||
video_url=entry.get("video_url"),
|
||
doc_paths=entry.get("doc_paths") or [],
|
||
file_size=int(entry.get("file_size") or 0),
|
||
)
|
||
return ("imported", stub_created)
|
||
|
||
|
||
@router.post("/import-bundle", response_model=ImportBundleResponse)
|
||
async def import_bundle(
|
||
file: UploadFile = File(...),
|
||
mode: str = Form("merge"),
|
||
user: dict = Depends(require_admin),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
"""Restore a Store bundle ZIP — admin only.
|
||
|
||
Modes:
|
||
* ``merge`` (default) — upsert by ``entity_id``; existing entities
|
||
are replaced when the bundle's ``version`` differs, otherwise
|
||
skipped. Safe default for nightly cron round-trips.
|
||
* ``replace`` — every entity in the bundle overwrites the existing
|
||
row + on-disk tree. Bundle-not-in-target rows are NOT deleted.
|
||
* ``skip`` — only entities NOT already present are imported.
|
||
|
||
Owner resolution by ``owner_email``; missing emails get a stub
|
||
disabled user so the row references an existing ``users.id`` (no
|
||
foreign key, but app code joins).
|
||
"""
|
||
if mode not in {"merge", "replace", "skip"}:
|
||
raise HTTPException(status_code=400, detail="invalid_mode")
|
||
|
||
tmp, _ = await _stream_to_temp(file, MAX_ZIP_SIZE * 4, suffix=".zip")
|
||
tmp.close()
|
||
extract_root = Path(tempfile.mkdtemp(prefix="agnes_store_import_"))
|
||
try:
|
||
try:
|
||
with zipfile.ZipFile(tmp.name, "r") as zf:
|
||
_safe_zip_extract(zf, extract_root)
|
||
except zipfile.BadZipFile:
|
||
raise HTTPException(status_code=422, detail="zip_invalid")
|
||
finally:
|
||
Path(tmp.name).unlink(missing_ok=True)
|
||
|
||
manifest_path = extract_root / "manifest.json"
|
||
if not manifest_path.is_file():
|
||
raise HTTPException(status_code=422, detail="manifest_missing")
|
||
try:
|
||
manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
|
||
except (OSError, ValueError):
|
||
raise HTTPException(status_code=422, detail="manifest_invalid")
|
||
if not isinstance(manifest, dict) or manifest.get("format") != BUNDLE_FORMAT_VERSION:
|
||
raise HTTPException(
|
||
status_code=422,
|
||
detail=f"manifest_unsupported_format (expected {BUNDLE_FORMAT_VERSION})",
|
||
)
|
||
entries = manifest.get("entries") or []
|
||
if not isinstance(entries, list):
|
||
raise HTTPException(status_code=422, detail="manifest_entries_invalid")
|
||
|
||
imported = replaced = skipped = stubs = 0
|
||
errors: list[dict] = []
|
||
for entry in entries:
|
||
if not isinstance(entry, dict) or not entry.get("entity_id"):
|
||
errors.append({"entry": entry, "error": "entry_missing_id"})
|
||
continue
|
||
try:
|
||
outcome, sc = _import_one_entry(
|
||
conn, entry, extract_root, mode=mode, actor_user_id=user["id"],
|
||
)
|
||
except HTTPException:
|
||
raise
|
||
except Exception as exc:
|
||
errors.append({"entity_id": entry.get("entity_id"), "error": str(exc)})
|
||
continue
|
||
stubs += sc
|
||
if outcome == "imported":
|
||
imported += 1
|
||
elif outcome == "replaced":
|
||
replaced += 1
|
||
elif outcome == "skipped":
|
||
skipped += 1
|
||
|
||
_audit(
|
||
conn, user["id"], "store.bundle.import", "bundle",
|
||
{
|
||
"mode": mode,
|
||
"imported": imported,
|
||
"replaced": replaced,
|
||
"skipped": skipped,
|
||
"stub_users_created": stubs,
|
||
"errors": len(errors),
|
||
},
|
||
)
|
||
_invalidate_etag()
|
||
return ImportBundleResponse(
|
||
imported=imported, replaced=replaced, skipped=skipped,
|
||
stub_users_created=stubs, errors=errors,
|
||
)
|
||
finally:
|
||
shutil.rmtree(extract_root, ignore_errors=True)
|
||
|
||
|
||
# ---------------------------------------------------------------------------
|
||
# Photo upload helper
|
||
# ---------------------------------------------------------------------------
|
||
|
||
|
||
async def _save_docs(docs: List[UploadFile], entity_id: str) -> List[str]:
|
||
"""Save user-uploaded auxiliary docs into ``assets/docs/``.
|
||
|
||
v32: file types restricted to PDF / Markdown / plain text via the shared
|
||
allowlist in ``src.marketplace_asset_validation``. Anything outside the allowlist
|
||
(DOCX, HTML, images, archives, …) returns HTTP 415 so the wizard can
|
||
surface a precise rejection message. Same allowlist is enforced on the
|
||
Curated mirror side so the two surfaces stay aligned.
|
||
|
||
Filenames are sanitized (basename only — strips any directory
|
||
component sent by the browser). On collision the file is suffixed
|
||
with a counter so two ``readme.md`` uploads don't overwrite each
|
||
other.
|
||
"""
|
||
if not docs:
|
||
return []
|
||
from src.marketplace_asset_validation import validate_doc_file
|
||
|
||
docs_dir = _assets_dir(entity_id) / "docs"
|
||
docs_dir.mkdir(parents=True, exist_ok=True)
|
||
saved: List[str] = []
|
||
for upload in docs:
|
||
if upload is None or not upload.filename:
|
||
continue
|
||
safe_name = Path(upload.filename).name
|
||
if not safe_name or safe_name.startswith("."):
|
||
continue
|
||
target = _unique_doc_path(docs_dir, safe_name)
|
||
tmp, _ = await _stream_to_temp(
|
||
upload, MAX_DOC_SIZE, suffix=Path(safe_name).suffix or ".tmp",
|
||
)
|
||
try:
|
||
tmp.close()
|
||
# Validate body+extension against the allowlist BEFORE moving the
|
||
# temp file into the assets dir. Reading the first 8 bytes is
|
||
# enough for the PDF magic-byte check; Markdown / plain text
|
||
# validators don't sniff body, so we don't pay any cost for them.
|
||
with open(tmp.name, "rb") as fh:
|
||
head = fh.read(8)
|
||
check = validate_doc_file(safe_name, head)
|
||
if not check.ok:
|
||
Path(tmp.name).unlink(missing_ok=True)
|
||
raise HTTPException(
|
||
status_code=415,
|
||
detail=(
|
||
f"unsupported_doc_type: {check.reason}. "
|
||
"Allowed: PDF (.pdf), Markdown (.md, .markdown), "
|
||
"plain text (.txt)."
|
||
),
|
||
)
|
||
shutil.move(tmp.name, str(target))
|
||
except HTTPException:
|
||
raise
|
||
except Exception:
|
||
Path(tmp.name).unlink(missing_ok=True)
|
||
raise
|
||
saved.append(f"assets/docs/{target.name}")
|
||
return saved
|
||
|
||
|
||
def _unique_doc_path(docs_dir: Path, filename: str) -> Path:
|
||
target = docs_dir / filename
|
||
if not target.exists():
|
||
return target
|
||
stem = target.stem
|
||
suffix = target.suffix
|
||
i = 2
|
||
while True:
|
||
candidate = docs_dir / f"{stem}-{i}{suffix}"
|
||
if not candidate.exists():
|
||
return candidate
|
||
i += 1
|
||
|
||
|
||
async def _save_photo(photo: UploadFile, entity_id: str) -> str:
|
||
"""Save the uploaded photo into the entity's assets dir. Returns the path
|
||
relative to the entity dir (what gets stored on the row).
|
||
|
||
v32: extension allowlist (PNG / JPEG / WEBP) is now backed by a
|
||
body-level magic-bytes check so a renamed ``payload.png`` carrying SVG
|
||
XML or arbitrary bytes can't smuggle through. Source: ``src.marketplace_asset_validation``.
|
||
"""
|
||
from src.marketplace_asset_validation import validate_image_file
|
||
|
||
raw_name = photo.filename or "photo"
|
||
ext = Path(raw_name).suffix.lower()
|
||
if ext not in ALLOWED_PHOTO_EXT:
|
||
raise HTTPException(status_code=415, detail="photo_unsupported_format")
|
||
tmp, size = await _stream_to_temp(photo, MAX_PHOTO_SIZE, suffix=ext)
|
||
try:
|
||
tmp.close()
|
||
# Magic-bytes verification on the saved temp file. Reading the first
|
||
# 16 bytes covers PNG (8), JPEG (3), and WEBP (12 — RIFF header).
|
||
with open(tmp.name, "rb") as fh:
|
||
head = fh.read(16)
|
||
check = validate_image_file(raw_name, head)
|
||
if not check.ok:
|
||
Path(tmp.name).unlink(missing_ok=True)
|
||
raise HTTPException(
|
||
status_code=415,
|
||
detail=f"photo_validation_failed: {check.reason}",
|
||
)
|
||
assets = _assets_dir(entity_id)
|
||
assets.mkdir(parents=True, exist_ok=True)
|
||
target_name = f"photo{ext}"
|
||
target = assets / target_name
|
||
# Replace any pre-existing photo of a different ext.
|
||
for existing in assets.glob("photo.*"):
|
||
try:
|
||
existing.unlink()
|
||
except OSError:
|
||
pass
|
||
shutil.move(tmp.name, str(target))
|
||
except HTTPException:
|
||
raise
|
||
except Exception:
|
||
Path(tmp.name).unlink(missing_ok=True)
|
||
raise
|
||
return f"assets/{target_name}"
|
||
|
||
|
||
# ---------------------------------------------------------------------------
|
||
# ETag invalidation — pulled into a module-level helper so we can mock it in
|
||
# tests if needed.
|
||
# ---------------------------------------------------------------------------
|
||
|
||
|
||
def _invalidate_etag() -> None:
|
||
try:
|
||
from app.marketplace_server import packager
|
||
packager.invalidate_etag_cache()
|
||
except Exception:
|
||
logger.exception("failed to invalidate marketplace etag cache")
|