Commit graph

4 commits

Author SHA1 Message Date
Vojtech
6fb11a137b
fix(store): close 1 critical + 2 high adversarial-review findings (C2/H2/H3 from #318) (#320)
* fix(store): close 1 critical + 2 high adversarial-review findings

Three findings from Codex's adversarial review of PR #316 (issue #318).

C2 — `/api/store/bundle.zip` leaked quarantined entities. The export
endpoint called `repo.list(...)` with no `visibility_status` filter,
so any authenticated non-admin could download pending / blocked v1
bytes — bypassing the publish gate. Mirrored the browse-listing gate:
non-admin sees only `approved` (plus their own non-approved entries
via `include_owner_id`); admins skip the filter.

H2 — concurrent PUTs on the same entity could both pass the
`latest_for_entity` pending gate. The `update_entity` and
`restore_version` handlers now wrap their critical section in a
per-entity asyncio.Lock (`_hold_entity_write_lock`). Single-process
deployments are now serialized; multi-worker deployments still have
a residual window (tracked in issue #318).

H3 — `StoreSubmissionsRepository.update_status` blindly overwrote any
current status. A late BG-task LLM verdict could clobber an
`overridden` row back to `approved` / `blocked_llm` after the admin
had already force-published. Added compare-and-swap on terminal
statuses (`approved`, `overridden`, `blocked_inline`); callers that
legitimately need to overwrite (admin rescan etc.) pass
`allow_terminal_overwrite=True`. Returns bool indicating whether the
write landed; BG callers no-op on terminal rows.

Tests:
- TestStoreBundle::test_bundle_zip_filters_quarantined_for_non_owner
- TestStoreBundle::test_bundle_zip_owner_sees_own_pending
- TestStoreBundle::test_bundle_zip_admin_sees_all
- TestConcurrentPutSerialization::test_per_entity_lock_serializes
- TestConcurrentPutSerialization::test_per_entity_lock_does_not_serialize_across_entities
- TestBgTaskIdempotency::test_late_verdict_does_not_clobber_overridden
- TestBgTaskIdempotency::test_explicit_allow_terminal_overwrite_works

* review fix: runner.run_llm_review honors update_status CAS bool

Codex's CAS in update_status closes the DB-level race correctly, but
runner.run_llm_review was still discarding the new bool return on both
its `approved` and `blocked_llm` branches. When the CAS no-op'd
(submission already at terminal status — most commonly an admin
override fired mid-review), the runner kept running the downstream
cascade:
  - set_visibility_if_pending (no-op on approved, but still ran)
  - promote_version + _swap_live_to_version (forward-only check
    mitigated worst case)
  - update_flea_attribution
  - audit.log(action="store.submission.approved" / "blocked_llm")
    — this is the operator-visible damage: the audit trail would
    show a verdict that contradicts the row's actual `overridden`
    status.

Fix: capture the bool, skip the cascade on no-op, log a single
`store.submission.bg_verdict_skipped` audit row instead. Mirrors the
existing `superseded_reason` path the runner already has for the
archive-during-review case (TestPRReviewFixes::
test_bg_verdict_skipped_when_admin_archives_during_review).

Test: TestBgTaskIdempotency::test_runner_late_verdict_logs_skipped_not_approved
sets up the v1-approved + v2-pending + admin-override sequence, fires
run_llm_review directly with a mocked "approved" verdict, asserts row
stays overridden AND audit has bg_verdict_skipped AND audit does NOT
have a contradictory approved entry.

CHANGELOG H3 bullet expanded to acknowledge the bg_verdict_skipped
audit-row behavior — operator reviewing the queue now sees dropped
verdicts explicitly rather than via row-vs-audit contradiction.

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-15 17:45:43 +02:00
Vojtech
a694a30a5e
fix(store): surface review failures + harden publish gate (#316)
* fix(store): surface review failures + harden publish gate

Four independent fixes to the flea-market submission pipeline, all surfaced
by an admin upload that landed at status='approved' without an LLM review.

1. LLM truncation no longer pins submissions in review_error.
   - Raised MAX_RESPONSE_TOKENS 2500 → 6000 in llm_review.py
   - Added one-shot retry-with-doubled-budget in anthropic_provider.py
     (capped at 4× initial)

2. Flea detail page surfaces the latest submission's failure verdict even
   when a previously-approved version is still serving (deferred-promotion
   path). The _quarantine_banner gate widened from `visibility != approved`
   to also fire on `blocked_inline / blocked_llm / review_error`, with copy
   that distinguishes the v2+ edit case ("Latest edit failed review —
   previously approved version (vN) keeps serving") from the initial-upload
   quarantine wording.

3. Restore button + endpoint no longer allow restoring a version that was
   never approved. Added StoreEntitiesRepository.get_with_version_approvals
   joining store_submissions, gated the UI button on submission_status in
   ('approved', None), rendered status pills for non-restorable rows, and
   added a 400 version_not_approved guard in POST /restore.

4. **BREAKING (operator-facing)**: publish gate is now fail-CLOSED on
   misconfig. The previous get_guardrails_enabled() silently fell back to
   "disabled, auto-approve everything" when guardrails.enabled=true in YAML
   but no ANTHROPIC_API_KEY was in env. Split into:
     - get_guardrails_enabled()              (intent — YAML)
     - get_guardrails_llm_provider_ready()   (readiness — env)
   Three-state matrix:
     enabled=false                       → auto-approve (unchanged)
     enabled=true + ready=true           → normal pipeline (unchanged)
     enabled=true + ready=false (NEW)    → submissions hold at pending_llm
                                           awaiting admin retry or override
                                           (was: silent auto-approve)
   Admin "Retry review" eligibility broadened to include pending_llm.
   Boot-time WARNING banner surfaces the misconfig in app/main.py.
   docs/STORE_GUARDRAILS.md updated with the three-state matrix.
   Operators relying on the auto-fallback for local-dev no-LLM setups must
   now explicitly set `guardrails.enabled: false` in instance.yaml.

Tests: 4623 passed. Added TestPublishGateFailClosed (4 tests) and
TestRestoreVersion::test_restore_rejects_* (3 tests). conftest.py adds an
autouse fixture defaulting guardrails OFF so legacy tests don't need to
know about the new toggle.

* fix(store): admin override promotes v2+ edits to current

The override handler at app/api/admin.py:3708 only flipped submission
status → 'overridden' and entity visibility → 'approved'. Under the v37+
deferred-promotion model that's insufficient for v2+ edits / restores:
the new bundle sits in versions/v<N>/plugin/ and the entity row stays at
the prior approved version_no + hash + on-disk live bundle. Installers
kept getting the OLD bytes the admin had just intended to replace.

Mirror the runner.run_llm_review auto-approval branch: look up the
submission's version_hash in entity.version_history, and if its `n`
differs from entity.version_no, promote_version + _swap_live_to_version.
Initial v1 overrides are unaffected — the loop finds n=1 == version_no
and skips promotion.

Tests:
- test_override_v2_edit_promotes_to_current: stage v1 approved + v2
  blocked_llm; override the v2 sub; assert entity.version_no=2,
  entity.version flips off the v1 hash, and the live plugin/ dir
  mirrors versions/v2/plugin/.
- test_override_v1_initial_upload_no_promote: regression guard so the
  promote loop doesn't accidentally bump a v1 override.

Audit log gains a promoted_to_version_no field on the override action.

* fix(store): retry/rescan review staged bundle; override forward-only

Two adversarial-review findings from a Codex pass on the publish-gate
work.

C1. Admin retry + rescan were passing live `plugin/` to the LLM. For a
v2+ submission held at `pending_llm` / `blocked_llm` / `review_error`,
live still holds the prior approved version's bytes — so the LLM
reviewed the WRONG bytes, and the runner's hash-match promotion in
`run_llm_review` would then advance the entity to staged bytes that
were never actually reviewed. Resolve the staged
`<entity>/versions/v<N>/plugin/` from the submission's
`version_history` entry, with a fall-back to live for legacy pre-v37
rows that never seeded a versions/ dir. Helpers
`_submission_plugin_dir` and `_version_no_for_submission` added to
`app/api/store.py` so override / retry / rescan share one path.

H1. Override's promote loop used `target != current`, which would
silently demote the live bundle when admin overrode a stale v2
submission while v3 was already approved + live. Changed to
`target > current` so override flips status + visibility on the row
regardless, but on-disk promotion only fires forward. Same `>`
defensive guard applied in `runner.run_llm_review` so a late LLM
verdict racing with a newer approval can't demote either.

Tests:
- TestAdminRetryReviewsStagedBundle::test_retry_v2_blocked_passes_staged_dir_not_live
- TestAdminRetryReviewsStagedBundle::test_rescan_v2_blocked_passes_staged_dir_not_live
- TestOverrideForwardOnly::test_override_stale_v2_does_not_demote_when_v3_current

* review polish: CHANGELOG drift, override eligibility, defensive copy

Three small additions on top of the retry/rescan staged-bundle fix:

1. CHANGELOG: the PR's bullets had drifted into the released
   [0.54.17] section during rebase (context-match landed them next
   to already-released content). Moved them up to [Unreleased] where
   they belong; [0.54.17] now holds only what was actually released
   (refresh-marketplace ls-remote, /me/activity hero, CI sharding +
   workflow polish).

2. app/api/admin.py: admin override eligibility now accepts
   pending_llm alongside blocked_inline + blocked_llm + review_error.
   Closes a UX gap from the new fail-CLOSED behavior: under
   enabled-but-not-ready, a known-good submission would otherwise
   sit indefinitely until the admin set credentials AND clicked
   Retry. Override already routes through version_history (and is
   now forward-only on promote), so it stays safe for v2+ deferred-
   promotion submissions.

3. src/repositories/store_entities.py: get_with_version_approvals
   defensively copies each version_history entry before annotating
   with submission_status. self.get() re-parses JSON each call today
   so this is belt-and-suspenders against any future caching layer
   leaking the annotated key into a subsequent plain get() call.

Tests: 112 passed (focused on test_store_entity_versions +
test_admin_store_submissions, covering the retry/rescan staged-
bundle fix the author shipped + this polish).

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-15 15:52:07 +02:00
Vojtech
fb6e930bc9
feat(store-guardrails): per-component description quality + plain-language UX (#276)
* feat(store-guardrails): enforce per-component description quality

Two-tier hard guardrail on flea-market submissions. Empty / placeholder /
single-word descriptions now block before any LLM call; vague-but-passes-
floor descriptions block on the substantive LLM review layer.

Tier 1 — inline mechanical check (src/store_guardrails/content_check.py).
Walks the baked plugin tree, evaluates each component (plugin manifest,
agents, skills, commands) plus the submission-level form description
against a 60-char / 25-char (commands) / 5-distinct-word / 200-char-body
floor with a placeholder denylist (TODO, TBD, {{var}}, etc.). Floors
calibrated against real ecosystem norms: Claude / superpowers /
compound-engineering skill packs cluster 150–220 chars, npm / Docker /
VS Code at 100–120. InlineResult.passed now ANDs in content.status.

Tier 2 — LLM review extension (prompts.py + llm_review.py). System
prompt gains a content-quality criterion; REVIEW_JSON_SCHEMA carries a
content_quality {verdict, issues[]} object alongside the existing
security findings. is_safe() requires content_quality.verdict == 'pass'.
Single LLM call covers both dimensions. MAX_RESPONSE_TOKENS bumped
2000 → 2500 for the extra payload. Verdicts missing content_quality
treated as pass (backwards compat with already-recorded rows).

Submitter UX:
- /store/new wizard now carries a "Before you upload — what passes
  review" collapsible disclosure on both step 1 and step 2 with the
  bar + patterns that work. Live char counter on the description
  field. Per-component preview table (green/red dots from the new
  summarize_for_preview helper) renders after the ZIP /preview round
  trip, scoping each finding to its file.
- New /store/examples page with rejected/passes pairs for skill /
  agent / plugin / command plus a "Why these limits" research table.
  Anchored sections (#skill / #agent / #plugin / #command) so the
  rejection banner can deep-link by component_type.
- Quarantine banner _content_findings.html groups findings by file
  (one "See <type> example ↗" per component, not per field) and
  translates field codes (frontmatter.description / body / etc.) to
  plain-English labels. _content_howto_fix.html surfaces a static
  "Re-upload as new version" + "See examples" action row beneath any
  content failure on the entity detail page.
- _parse_frontmatter moved to src/store_guardrails/_frontmatter.py so
  the new check module shares the parser without inverting the
  app → src dependency direction.

Tests:
- New tests/test_store_guardrails_content.py (29 cases) covering
  every failure code per component type plus submission-level checks
  and the summarize_components / summarize_for_preview helpers.
- Extended test_store_guardrails_inline.py for the new
  InlineResult.content field + aggregate behaviour.
- Extended test_store_guardrails_llm.py for the new
  content_quality verdict pathways (fail blocks, missing field passes).
- Backfilled fixture descriptions across test_store_api.py,
  test_store_entity_versions.py, test_store_put_atomic.py,
  test_admin_store_submissions.py, test_marketplace_api.py,
  test_marketplace_v32_endpoints.py so existing happy-path tests
  clear the new 60-char floor.

* fix(content-guardrail): align agents walker with preview + drop import-time .format()

Two cleanups from the takeover review on #276 (vr/guardrails-content).

1) `_iter_components` for agents now skips files lacking frontmatter
   (no `name` AND no `description`). Pre-fix the walker greedily
   evaluated every `*.md` under `agents/` — `agents/README.md` and
   helper docs got flagged as "frontmatter.description empty"
   rejections. Worse: `summarize_for_preview` for `type=agent` ALREADY
   filters the same shape, so the upload preview gave a green dot
   while the post-bake check gave a red rejection on submit. Two new
   regression tests in TestAgentsWalkerSkipsNonAgentFiles pin both
   shapes (README + _NOTES.md) so the preview/check parity stays
   aligned.

2) `body_too_short` hints now use the same runtime-kwarg substitution
   pattern as every other hint in the table. Pre-fix the skill +
   agent body_too_short hints called `.format(min_chars=_MIN_BODY_CHARS)`
   at module-load time, but the call site `_hint_for(type_,
   "body_too_short")` didn't pass `min_chars=`, so the format() was
   just baking the constant at import. Cosmetic inconsistency; pass
   `min_chars=_MIN_BODY_CHARS` at the call site instead and let
   `_hint_for` do the substitution like it does for `too_short`.

Verified end-to-end:
- New TestAgentsWalkerSkipsNonAgentFiles cases fail on the unfixed
  walker (verified by reverting to the pre-fix file and re-running);
  pass cleanly after the fix.
- Full content-guardrail suite: 25/25 (23 existing + 2 new).
- Full pytest: 4189 passed, 25 skipped.

* release: 0.53.5 — content guardrail (flea-market submitter UX) + catalog ENTITY column + BQ hint dispatch

Bundles three threads landed in [Unreleased]:
- Vojta's flea-market content guardrail (two-tier mechanical + LLM)
- Zdeněk's `agnes catalog` ENTITY column replacement for FLAVOR
- Zdeněk's `/api/query` remote_estimate_failed hint dispatch fix

Plus the takeover hygiene from #276 review (agents walker preview/check
parity + body_too_short hint runtime kwarg consistency) and the
backslash-escape fix follow-up to v0.53.4 #275.

No DB migration; no API change. Patch upgrade lands transparently.
Upload form's new "Before you upload" disclosure + per-component preview
table appear on the next dev-VM auto-pull. Quarantine banner now groups
findings by file with "See <type> example ↗" deep-links to the new
/store/examples reference page.

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-12 21:48:27 +02:00
Vojtech
929520f5e1
Flea-market edit feature with version history (schema v37) (#239)
* feat(store): flea-market entity edit feature with version history (schema v38)

Owner + admin can now edit a store entity from a real Edit page at
/marketplace/flea/{id}/edit, replacing the prior "coming soon"
placeholder. Editable: display name, description, category, video
URL, cover photo, and an optional new bundle. Type is locked (400
type_locked). Display-name change renames the on-disk slug for both
live plugin/ and version dirs (reuses rename-on-archive helper).

Schema v38 (originally drafted as v37; renumbered after rebase onto
main where v37 was taken by the curated marketplace enrichment).

Versioning model:
* Each bundle update bakes into ${DATA_DIR}/store/<id>/versions/v<N+1>/plugin/
  and runs the standard guardrails pipeline.
* DEFERRED PROMOTION: live plugin/ + entity.version_no stay at the
  prior approved version through the LLM review window so existing
  installers keep receiving the previously approved bundle. Live swap
  + version_no/version/file_size bump happen only on LLM approval.
  Blocked verdicts leave the prior version serving forever.
* store_entities gains version_no INTEGER + version_history JSON.
  Each version_history entry carries hash, sha256, size, submission_id,
  created_at, created_by.
* Existing entities backfill to v1 with a single-entry history seeded
  from the row's current `version` hash. Initial create also seeds
  versions/v1/plugin/ so future restore can copy v1 bytes forward.

Concurrency:
* Block-while-pending: an in-flight LLM review blocks any further edit
  with 409 prior_version_pending. Owner waits 5-30s; Edit button on
  detail page renders disabled in the same window via the new
  edit_in_flight flag (decoupled from quarantine_sub since the
  deferred-promotion flow keeps visibility='approved').

Rollback:
* New endpoint POST /api/store/entities/{id}/versions/{n}/restore
  (owner + admin). Copies vN bundle forward as v<max+1> and re-runs
  guardrails (rules tighten over time; pre-approved bundles re-validate).
  Forward-only history. Same deferred-promotion semantics — live stays
  at prior version until LLM approves the restored copy.

UI:
* New /marketplace/flea/{id}/edit page (owner + admin gated).
* Versions card on plugin + item detail templates (owner/admin only)
  via shared _flea_versions.html partial.
* Admin queue gains v# column with current badge + separate Hash
  column. Submission detail surfaces Version + Bundle hash rows.
* Activity timeline split into per-submission + entity-wide cards;
  entity-wide rows render vN chips when audit row params reference
  a specific version.
* Section headers (Manifest / Static / Quality / LLM review) tag
  with vN chip via shared macro.
* Reviewed-by-model field surfaces explanatory text per status.
* Banner upload-failure now redirects to detail page on
  submission_blocked instead of staying stuck.

Tests: 24 in tests/test_store_entity_versions.py covering metadata-
only edit, bundle-edit version bump, type lock, block-while-pending,
name change disk rename, restore flow + 404/400/403 paths, edit page
404 for non-owner, versions card visibility gating, admin queue v#
column, admin detail Version/Hash rows, deferred-promotion installer
contract (pending review doesn't break installer / blocked verdict
keeps prior / approved promotes), admin can edit/restore non-owned,
restore deferred promotion, audit log per-version params. 214 tests
green across guardrails + edit + admin + repo + schema suites.

* docs(store): refresh update_entity docstring to match deferred-promotion + submission-status gate

Bring the docstring in sync with the actual fixes from the prior
commit. The pre-fix wording said the gate read
visibility_status='pending' AND submission status — under deferred
promotion that would never fire for v2+ edits. Now describes:

- Block-while-pending gates on submission.status DIRECTLY,
  independent of visibility (so v2+ deferred-promotion edits don't
  slip through).
- Display-name + bundle change defers the live rename to promotion;
  metadata-only renames stay immediate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 00:14:33 +04:00