Three remaining findings from Codex's adversarial review of PR #316
(issue #318), plus a pre-existing version-numbering bug surfaced while
fixing the atomic-promote ordering.
M1 — Prompt sentinel escape now covers file PATHS, not just file
BODIES. Pre-fix the per-file `--- FILE: {rel} ---` header inlined the
untrusted relative path unescaped. A ZIP whose relative path
concatenated to `</bundle>` (a `<` directory plus a `bundle>` child)
could forge the trust-boundary close tag from inside the path slot
and inject apparent system instructions after the boundary. Same
`_escape_sentinels` helper now runs on both rel and body.
M2 — Live-bundle swap + DB promote is now atomic-ish. The runner /
override / inline-promote paths previously called
`repo.promote_version(...)` then `_swap_live_to_version(...)`. A
missing `versions/v<N>/plugin/` made the swap silently return False
— leaving the DB ahead of live. New `promote_to_version` helper in
`app/api/store.py` swaps FIRST (with the existing
staging → backup → live rename chain) and only advances the DB row
after the on-disk swap succeeds; rolls live back to prior on DB
write failure.
While wiring up M2, the strict source check exposed a pre-existing
bug: `update_entity` and `restore_version` derived
`new_version_no = entity.version_no + 1`. Under deferred promotion
that's wrong — entity.version_no stays at the last approved version
while version_history grows with blocked / pending entries.
Subsequent PUTs would overwrite an in-flight blocked v2 dir's bytes,
then the runner's hash-match promotion in `runner.run_llm_review`
would load bytes that didn't match the recorded submission hash.
Fixed by deriving from `max(version_history.n) + 1`.
L1 — Admin forensic download now serves STAGED bundle bytes per
submission, not live. Pre-fix downloading a blocked v2 streamed
live's prior approved v1 bytes — admins reviewing whether to
override saw the wrong bytes. Resolves staged `versions/v<N>/plugin/`
via `_version_no_for_submission`; falls back to live for legacy rows
without history linkage.
Tests:
- test_filename_with_bundle_sentinel_is_escaped
- TestAtomicPromote::test_missing_source_dir_does_not_advance_db
- TestAdminBundleDownload::test_download_v2_blocked_returns_staged_bundle_not_live
* fix(store): close 1 critical + 2 high adversarial-review findings
Three findings from Codex's adversarial review of PR #316 (issue #318).
C2 — `/api/store/bundle.zip` leaked quarantined entities. The export
endpoint called `repo.list(...)` with no `visibility_status` filter,
so any authenticated non-admin could download pending / blocked v1
bytes — bypassing the publish gate. Mirrored the browse-listing gate:
non-admin sees only `approved` (plus their own non-approved entries
via `include_owner_id`); admins skip the filter.
H2 — concurrent PUTs on the same entity could both pass the
`latest_for_entity` pending gate. The `update_entity` and
`restore_version` handlers now wrap their critical section in a
per-entity asyncio.Lock (`_hold_entity_write_lock`). Single-process
deployments are now serialized; multi-worker deployments still have
a residual window (tracked in issue #318).
H3 — `StoreSubmissionsRepository.update_status` blindly overwrote any
current status. A late BG-task LLM verdict could clobber an
`overridden` row back to `approved` / `blocked_llm` after the admin
had already force-published. Added compare-and-swap on terminal
statuses (`approved`, `overridden`, `blocked_inline`); callers that
legitimately need to overwrite (admin rescan etc.) pass
`allow_terminal_overwrite=True`. Returns bool indicating whether the
write landed; BG callers no-op on terminal rows.
Tests:
- TestStoreBundle::test_bundle_zip_filters_quarantined_for_non_owner
- TestStoreBundle::test_bundle_zip_owner_sees_own_pending
- TestStoreBundle::test_bundle_zip_admin_sees_all
- TestConcurrentPutSerialization::test_per_entity_lock_serializes
- TestConcurrentPutSerialization::test_per_entity_lock_does_not_serialize_across_entities
- TestBgTaskIdempotency::test_late_verdict_does_not_clobber_overridden
- TestBgTaskIdempotency::test_explicit_allow_terminal_overwrite_works
* review fix: runner.run_llm_review honors update_status CAS bool
Codex's CAS in update_status closes the DB-level race correctly, but
runner.run_llm_review was still discarding the new bool return on both
its `approved` and `blocked_llm` branches. When the CAS no-op'd
(submission already at terminal status — most commonly an admin
override fired mid-review), the runner kept running the downstream
cascade:
- set_visibility_if_pending (no-op on approved, but still ran)
- promote_version + _swap_live_to_version (forward-only check
mitigated worst case)
- update_flea_attribution
- audit.log(action="store.submission.approved" / "blocked_llm")
— this is the operator-visible damage: the audit trail would
show a verdict that contradicts the row's actual `overridden`
status.
Fix: capture the bool, skip the cascade on no-op, log a single
`store.submission.bg_verdict_skipped` audit row instead. Mirrors the
existing `superseded_reason` path the runner already has for the
archive-during-review case (TestPRReviewFixes::
test_bg_verdict_skipped_when_admin_archives_during_review).
Test: TestBgTaskIdempotency::test_runner_late_verdict_logs_skipped_not_approved
sets up the v1-approved + v2-pending + admin-override sequence, fires
run_llm_review directly with a mocked "approved" verdict, asserts row
stays overridden AND audit has bg_verdict_skipped AND audit does NOT
have a contradictory approved entry.
CHANGELOG H3 bullet expanded to acknowledge the bg_verdict_skipped
audit-row behavior — operator reviewing the queue now sees dropped
verdicts explicitly rather than via row-vs-audit contradiction.
---------
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
* fix(store): surface review failures + harden publish gate
Four independent fixes to the flea-market submission pipeline, all surfaced
by an admin upload that landed at status='approved' without an LLM review.
1. LLM truncation no longer pins submissions in review_error.
- Raised MAX_RESPONSE_TOKENS 2500 → 6000 in llm_review.py
- Added one-shot retry-with-doubled-budget in anthropic_provider.py
(capped at 4× initial)
2. Flea detail page surfaces the latest submission's failure verdict even
when a previously-approved version is still serving (deferred-promotion
path). The _quarantine_banner gate widened from `visibility != approved`
to also fire on `blocked_inline / blocked_llm / review_error`, with copy
that distinguishes the v2+ edit case ("Latest edit failed review —
previously approved version (vN) keeps serving") from the initial-upload
quarantine wording.
3. Restore button + endpoint no longer allow restoring a version that was
never approved. Added StoreEntitiesRepository.get_with_version_approvals
joining store_submissions, gated the UI button on submission_status in
('approved', None), rendered status pills for non-restorable rows, and
added a 400 version_not_approved guard in POST /restore.
4. **BREAKING (operator-facing)**: publish gate is now fail-CLOSED on
misconfig. The previous get_guardrails_enabled() silently fell back to
"disabled, auto-approve everything" when guardrails.enabled=true in YAML
but no ANTHROPIC_API_KEY was in env. Split into:
- get_guardrails_enabled() (intent — YAML)
- get_guardrails_llm_provider_ready() (readiness — env)
Three-state matrix:
enabled=false → auto-approve (unchanged)
enabled=true + ready=true → normal pipeline (unchanged)
enabled=true + ready=false (NEW) → submissions hold at pending_llm
awaiting admin retry or override
(was: silent auto-approve)
Admin "Retry review" eligibility broadened to include pending_llm.
Boot-time WARNING banner surfaces the misconfig in app/main.py.
docs/STORE_GUARDRAILS.md updated with the three-state matrix.
Operators relying on the auto-fallback for local-dev no-LLM setups must
now explicitly set `guardrails.enabled: false` in instance.yaml.
Tests: 4623 passed. Added TestPublishGateFailClosed (4 tests) and
TestRestoreVersion::test_restore_rejects_* (3 tests). conftest.py adds an
autouse fixture defaulting guardrails OFF so legacy tests don't need to
know about the new toggle.
* fix(store): admin override promotes v2+ edits to current
The override handler at app/api/admin.py:3708 only flipped submission
status → 'overridden' and entity visibility → 'approved'. Under the v37+
deferred-promotion model that's insufficient for v2+ edits / restores:
the new bundle sits in versions/v<N>/plugin/ and the entity row stays at
the prior approved version_no + hash + on-disk live bundle. Installers
kept getting the OLD bytes the admin had just intended to replace.
Mirror the runner.run_llm_review auto-approval branch: look up the
submission's version_hash in entity.version_history, and if its `n`
differs from entity.version_no, promote_version + _swap_live_to_version.
Initial v1 overrides are unaffected — the loop finds n=1 == version_no
and skips promotion.
Tests:
- test_override_v2_edit_promotes_to_current: stage v1 approved + v2
blocked_llm; override the v2 sub; assert entity.version_no=2,
entity.version flips off the v1 hash, and the live plugin/ dir
mirrors versions/v2/plugin/.
- test_override_v1_initial_upload_no_promote: regression guard so the
promote loop doesn't accidentally bump a v1 override.
Audit log gains a promoted_to_version_no field on the override action.
* fix(store): retry/rescan review staged bundle; override forward-only
Two adversarial-review findings from a Codex pass on the publish-gate
work.
C1. Admin retry + rescan were passing live `plugin/` to the LLM. For a
v2+ submission held at `pending_llm` / `blocked_llm` / `review_error`,
live still holds the prior approved version's bytes — so the LLM
reviewed the WRONG bytes, and the runner's hash-match promotion in
`run_llm_review` would then advance the entity to staged bytes that
were never actually reviewed. Resolve the staged
`<entity>/versions/v<N>/plugin/` from the submission's
`version_history` entry, with a fall-back to live for legacy pre-v37
rows that never seeded a versions/ dir. Helpers
`_submission_plugin_dir` and `_version_no_for_submission` added to
`app/api/store.py` so override / retry / rescan share one path.
H1. Override's promote loop used `target != current`, which would
silently demote the live bundle when admin overrode a stale v2
submission while v3 was already approved + live. Changed to
`target > current` so override flips status + visibility on the row
regardless, but on-disk promotion only fires forward. Same `>`
defensive guard applied in `runner.run_llm_review` so a late LLM
verdict racing with a newer approval can't demote either.
Tests:
- TestAdminRetryReviewsStagedBundle::test_retry_v2_blocked_passes_staged_dir_not_live
- TestAdminRetryReviewsStagedBundle::test_rescan_v2_blocked_passes_staged_dir_not_live
- TestOverrideForwardOnly::test_override_stale_v2_does_not_demote_when_v3_current
* review polish: CHANGELOG drift, override eligibility, defensive copy
Three small additions on top of the retry/rescan staged-bundle fix:
1. CHANGELOG: the PR's bullets had drifted into the released
[0.54.17] section during rebase (context-match landed them next
to already-released content). Moved them up to [Unreleased] where
they belong; [0.54.17] now holds only what was actually released
(refresh-marketplace ls-remote, /me/activity hero, CI sharding +
workflow polish).
2. app/api/admin.py: admin override eligibility now accepts
pending_llm alongside blocked_inline + blocked_llm + review_error.
Closes a UX gap from the new fail-CLOSED behavior: under
enabled-but-not-ready, a known-good submission would otherwise
sit indefinitely until the admin set credentials AND clicked
Retry. Override already routes through version_history (and is
now forward-only on promote), so it stays safe for v2+ deferred-
promotion submissions.
3. src/repositories/store_entities.py: get_with_version_approvals
defensively copies each version_history entry before annotating
with submission_status. self.get() re-parses JSON each call today
so this is belt-and-suspenders against any future caching layer
leaking the annotated key into a subsequent plain get() call.
Tests: 112 passed (focused on test_store_entity_versions +
test_admin_store_submissions, covering the retry/rescan staged-
bundle fix the author shipped + this polish).
---------
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
* feat(store-guardrails): enforce per-component description quality
Two-tier hard guardrail on flea-market submissions. Empty / placeholder /
single-word descriptions now block before any LLM call; vague-but-passes-
floor descriptions block on the substantive LLM review layer.
Tier 1 — inline mechanical check (src/store_guardrails/content_check.py).
Walks the baked plugin tree, evaluates each component (plugin manifest,
agents, skills, commands) plus the submission-level form description
against a 60-char / 25-char (commands) / 5-distinct-word / 200-char-body
floor with a placeholder denylist (TODO, TBD, {{var}}, etc.). Floors
calibrated against real ecosystem norms: Claude / superpowers /
compound-engineering skill packs cluster 150–220 chars, npm / Docker /
VS Code at 100–120. InlineResult.passed now ANDs in content.status.
Tier 2 — LLM review extension (prompts.py + llm_review.py). System
prompt gains a content-quality criterion; REVIEW_JSON_SCHEMA carries a
content_quality {verdict, issues[]} object alongside the existing
security findings. is_safe() requires content_quality.verdict == 'pass'.
Single LLM call covers both dimensions. MAX_RESPONSE_TOKENS bumped
2000 → 2500 for the extra payload. Verdicts missing content_quality
treated as pass (backwards compat with already-recorded rows).
Submitter UX:
- /store/new wizard now carries a "Before you upload — what passes
review" collapsible disclosure on both step 1 and step 2 with the
bar + patterns that work. Live char counter on the description
field. Per-component preview table (green/red dots from the new
summarize_for_preview helper) renders after the ZIP /preview round
trip, scoping each finding to its file.
- New /store/examples page with rejected/passes pairs for skill /
agent / plugin / command plus a "Why these limits" research table.
Anchored sections (#skill / #agent / #plugin / #command) so the
rejection banner can deep-link by component_type.
- Quarantine banner _content_findings.html groups findings by file
(one "See <type> example ↗" per component, not per field) and
translates field codes (frontmatter.description / body / etc.) to
plain-English labels. _content_howto_fix.html surfaces a static
"Re-upload as new version" + "See examples" action row beneath any
content failure on the entity detail page.
- _parse_frontmatter moved to src/store_guardrails/_frontmatter.py so
the new check module shares the parser without inverting the
app → src dependency direction.
Tests:
- New tests/test_store_guardrails_content.py (29 cases) covering
every failure code per component type plus submission-level checks
and the summarize_components / summarize_for_preview helpers.
- Extended test_store_guardrails_inline.py for the new
InlineResult.content field + aggregate behaviour.
- Extended test_store_guardrails_llm.py for the new
content_quality verdict pathways (fail blocks, missing field passes).
- Backfilled fixture descriptions across test_store_api.py,
test_store_entity_versions.py, test_store_put_atomic.py,
test_admin_store_submissions.py, test_marketplace_api.py,
test_marketplace_v32_endpoints.py so existing happy-path tests
clear the new 60-char floor.
* fix(content-guardrail): align agents walker with preview + drop import-time .format()
Two cleanups from the takeover review on #276 (vr/guardrails-content).
1) `_iter_components` for agents now skips files lacking frontmatter
(no `name` AND no `description`). Pre-fix the walker greedily
evaluated every `*.md` under `agents/` — `agents/README.md` and
helper docs got flagged as "frontmatter.description empty"
rejections. Worse: `summarize_for_preview` for `type=agent` ALREADY
filters the same shape, so the upload preview gave a green dot
while the post-bake check gave a red rejection on submit. Two new
regression tests in TestAgentsWalkerSkipsNonAgentFiles pin both
shapes (README + _NOTES.md) so the preview/check parity stays
aligned.
2) `body_too_short` hints now use the same runtime-kwarg substitution
pattern as every other hint in the table. Pre-fix the skill +
agent body_too_short hints called `.format(min_chars=_MIN_BODY_CHARS)`
at module-load time, but the call site `_hint_for(type_,
"body_too_short")` didn't pass `min_chars=`, so the format() was
just baking the constant at import. Cosmetic inconsistency; pass
`min_chars=_MIN_BODY_CHARS` at the call site instead and let
`_hint_for` do the substitution like it does for `too_short`.
Verified end-to-end:
- New TestAgentsWalkerSkipsNonAgentFiles cases fail on the unfixed
walker (verified by reverting to the pre-fix file and re-running);
pass cleanly after the fix.
- Full content-guardrail suite: 25/25 (23 existing + 2 new).
- Full pytest: 4189 passed, 25 skipped.
* release: 0.53.5 — content guardrail (flea-market submitter UX) + catalog ENTITY column + BQ hint dispatch
Bundles three threads landed in [Unreleased]:
- Vojta's flea-market content guardrail (two-tier mechanical + LLM)
- Zdeněk's `agnes catalog` ENTITY column replacement for FLAVOR
- Zdeněk's `/api/query` remote_estimate_failed hint dispatch fix
Plus the takeover hygiene from #276 review (agents walker preview/check
parity + body_too_short hint runtime kwarg consistency) and the
backslash-escape fix follow-up to v0.53.4 #275.
No DB migration; no API change. Patch upgrade lands transparently.
Upload form's new "Before you upload" disclosure + per-component preview
table appear on the next dev-VM auto-pull. Quarantine banner now groups
findings by file with "See <type> example ↗" deep-links to the new
/store/examples reference page.
---------
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
* feat(store): flea-market entity edit feature with version history (schema v38)
Owner + admin can now edit a store entity from a real Edit page at
/marketplace/flea/{id}/edit, replacing the prior "coming soon"
placeholder. Editable: display name, description, category, video
URL, cover photo, and an optional new bundle. Type is locked (400
type_locked). Display-name change renames the on-disk slug for both
live plugin/ and version dirs (reuses rename-on-archive helper).
Schema v38 (originally drafted as v37; renumbered after rebase onto
main where v37 was taken by the curated marketplace enrichment).
Versioning model:
* Each bundle update bakes into ${DATA_DIR}/store/<id>/versions/v<N+1>/plugin/
and runs the standard guardrails pipeline.
* DEFERRED PROMOTION: live plugin/ + entity.version_no stay at the
prior approved version through the LLM review window so existing
installers keep receiving the previously approved bundle. Live swap
+ version_no/version/file_size bump happen only on LLM approval.
Blocked verdicts leave the prior version serving forever.
* store_entities gains version_no INTEGER + version_history JSON.
Each version_history entry carries hash, sha256, size, submission_id,
created_at, created_by.
* Existing entities backfill to v1 with a single-entry history seeded
from the row's current `version` hash. Initial create also seeds
versions/v1/plugin/ so future restore can copy v1 bytes forward.
Concurrency:
* Block-while-pending: an in-flight LLM review blocks any further edit
with 409 prior_version_pending. Owner waits 5-30s; Edit button on
detail page renders disabled in the same window via the new
edit_in_flight flag (decoupled from quarantine_sub since the
deferred-promotion flow keeps visibility='approved').
Rollback:
* New endpoint POST /api/store/entities/{id}/versions/{n}/restore
(owner + admin). Copies vN bundle forward as v<max+1> and re-runs
guardrails (rules tighten over time; pre-approved bundles re-validate).
Forward-only history. Same deferred-promotion semantics — live stays
at prior version until LLM approves the restored copy.
UI:
* New /marketplace/flea/{id}/edit page (owner + admin gated).
* Versions card on plugin + item detail templates (owner/admin only)
via shared _flea_versions.html partial.
* Admin queue gains v# column with current badge + separate Hash
column. Submission detail surfaces Version + Bundle hash rows.
* Activity timeline split into per-submission + entity-wide cards;
entity-wide rows render vN chips when audit row params reference
a specific version.
* Section headers (Manifest / Static / Quality / LLM review) tag
with vN chip via shared macro.
* Reviewed-by-model field surfaces explanatory text per status.
* Banner upload-failure now redirects to detail page on
submission_blocked instead of staying stuck.
Tests: 24 in tests/test_store_entity_versions.py covering metadata-
only edit, bundle-edit version bump, type lock, block-while-pending,
name change disk rename, restore flow + 404/400/403 paths, edit page
404 for non-owner, versions card visibility gating, admin queue v#
column, admin detail Version/Hash rows, deferred-promotion installer
contract (pending review doesn't break installer / blocked verdict
keeps prior / approved promotes), admin can edit/restore non-owned,
restore deferred promotion, audit log per-version params. 214 tests
green across guardrails + edit + admin + repo + schema suites.
* docs(store): refresh update_entity docstring to match deferred-promotion + submission-status gate
Bring the docstring in sync with the actual fixes from the prior
commit. The pre-fix wording said the gate read
visibility_status='pending' AND submission status — under deferred
promotion that would never fire for v2+ edits. Now describes:
- Block-while-pending gates on submission.status DIRECTLY,
independent of visibility (so v2+ deferred-promotion edits don't
slip through).
- Display-name + bundle change defers the live rename to promotion;
metadata-only renames stay immediate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>