* fix(store): surface review failures + harden publish gate
Four independent fixes to the flea-market submission pipeline, all surfaced
by an admin upload that landed at status='approved' without an LLM review.
1. LLM truncation no longer pins submissions in review_error.
- Raised MAX_RESPONSE_TOKENS 2500 → 6000 in llm_review.py
- Added one-shot retry-with-doubled-budget in anthropic_provider.py
(capped at 4× initial)
2. Flea detail page surfaces the latest submission's failure verdict even
when a previously-approved version is still serving (deferred-promotion
path). The _quarantine_banner gate widened from `visibility != approved`
to also fire on `blocked_inline / blocked_llm / review_error`, with copy
that distinguishes the v2+ edit case ("Latest edit failed review —
previously approved version (vN) keeps serving") from the initial-upload
quarantine wording.
3. Restore button + endpoint no longer allow restoring a version that was
never approved. Added StoreEntitiesRepository.get_with_version_approvals
joining store_submissions, gated the UI button on submission_status in
('approved', None), rendered status pills for non-restorable rows, and
added a 400 version_not_approved guard in POST /restore.
4. **BREAKING (operator-facing)**: publish gate is now fail-CLOSED on
misconfig. The previous get_guardrails_enabled() silently fell back to
"disabled, auto-approve everything" when guardrails.enabled=true in YAML
but no ANTHROPIC_API_KEY was in env. Split into:
- get_guardrails_enabled() (intent — YAML)
- get_guardrails_llm_provider_ready() (readiness — env)
Three-state matrix:
enabled=false → auto-approve (unchanged)
enabled=true + ready=true → normal pipeline (unchanged)
enabled=true + ready=false (NEW) → submissions hold at pending_llm
awaiting admin retry or override
(was: silent auto-approve)
Admin "Retry review" eligibility broadened to include pending_llm.
Boot-time WARNING banner surfaces the misconfig in app/main.py.
docs/STORE_GUARDRAILS.md updated with the three-state matrix.
Operators relying on the auto-fallback for local-dev no-LLM setups must
now explicitly set `guardrails.enabled: false` in instance.yaml.
Tests: 4623 passed. Added TestPublishGateFailClosed (4 tests) and
TestRestoreVersion::test_restore_rejects_* (3 tests). conftest.py adds an
autouse fixture defaulting guardrails OFF so legacy tests don't need to
know about the new toggle.
* fix(store): admin override promotes v2+ edits to current
The override handler at app/api/admin.py:3708 only flipped submission
status → 'overridden' and entity visibility → 'approved'. Under the v37+
deferred-promotion model that's insufficient for v2+ edits / restores:
the new bundle sits in versions/v<N>/plugin/ and the entity row stays at
the prior approved version_no + hash + on-disk live bundle. Installers
kept getting the OLD bytes the admin had just intended to replace.
Mirror the runner.run_llm_review auto-approval branch: look up the
submission's version_hash in entity.version_history, and if its `n`
differs from entity.version_no, promote_version + _swap_live_to_version.
Initial v1 overrides are unaffected — the loop finds n=1 == version_no
and skips promotion.
Tests:
- test_override_v2_edit_promotes_to_current: stage v1 approved + v2
blocked_llm; override the v2 sub; assert entity.version_no=2,
entity.version flips off the v1 hash, and the live plugin/ dir
mirrors versions/v2/plugin/.
- test_override_v1_initial_upload_no_promote: regression guard so the
promote loop doesn't accidentally bump a v1 override.
Audit log gains a promoted_to_version_no field on the override action.
* fix(store): retry/rescan review staged bundle; override forward-only
Two adversarial-review findings from a Codex pass on the publish-gate
work.
C1. Admin retry + rescan were passing live `plugin/` to the LLM. For a
v2+ submission held at `pending_llm` / `blocked_llm` / `review_error`,
live still holds the prior approved version's bytes — so the LLM
reviewed the WRONG bytes, and the runner's hash-match promotion in
`run_llm_review` would then advance the entity to staged bytes that
were never actually reviewed. Resolve the staged
`<entity>/versions/v<N>/plugin/` from the submission's
`version_history` entry, with a fall-back to live for legacy pre-v37
rows that never seeded a versions/ dir. Helpers
`_submission_plugin_dir` and `_version_no_for_submission` added to
`app/api/store.py` so override / retry / rescan share one path.
H1. Override's promote loop used `target != current`, which would
silently demote the live bundle when admin overrode a stale v2
submission while v3 was already approved + live. Changed to
`target > current` so override flips status + visibility on the row
regardless, but on-disk promotion only fires forward. Same `>`
defensive guard applied in `runner.run_llm_review` so a late LLM
verdict racing with a newer approval can't demote either.
Tests:
- TestAdminRetryReviewsStagedBundle::test_retry_v2_blocked_passes_staged_dir_not_live
- TestAdminRetryReviewsStagedBundle::test_rescan_v2_blocked_passes_staged_dir_not_live
- TestOverrideForwardOnly::test_override_stale_v2_does_not_demote_when_v3_current
* review polish: CHANGELOG drift, override eligibility, defensive copy
Three small additions on top of the retry/rescan staged-bundle fix:
1. CHANGELOG: the PR's bullets had drifted into the released
[0.54.17] section during rebase (context-match landed them next
to already-released content). Moved them up to [Unreleased] where
they belong; [0.54.17] now holds only what was actually released
(refresh-marketplace ls-remote, /me/activity hero, CI sharding +
workflow polish).
2. app/api/admin.py: admin override eligibility now accepts
pending_llm alongside blocked_inline + blocked_llm + review_error.
Closes a UX gap from the new fail-CLOSED behavior: under
enabled-but-not-ready, a known-good submission would otherwise
sit indefinitely until the admin set credentials AND clicked
Retry. Override already routes through version_history (and is
now forward-only on promote), so it stays safe for v2+ deferred-
promotion submissions.
3. src/repositories/store_entities.py: get_with_version_approvals
defensively copies each version_history entry before annotating
with submission_status. self.get() re-parses JSON each call today
so this is belt-and-suspenders against any future caching layer
leaking the annotated key into a subsequent plain get() call.
Tests: 112 passed (focused on test_store_entity_versions +
test_admin_store_submissions, covering the retry/rescan staged-
bundle fix the author shipped + this polish).
---------
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
223 lines
8.8 KiB
Python
223 lines
8.8 KiB
Python
"""LLM security review — agentic verdict over the uploaded bundle.
|
|
|
|
Mirrors the corporate-memory extraction pattern: builds a prompt, calls
|
|
``StructuredExtractor.extract_json`` against a strict JSON schema, parses
|
|
the result. The model tier is configurable per-instance via
|
|
``guardrails.review_model`` (Haiku / Sonnet / Opus) — see
|
|
``app/instance_config.get_guardrails_review_model``.
|
|
|
|
Cost cap: single-shot, MAX_REVIEW_BYTES content payload, max_tokens
|
|
budget tuned for the schema. Retries inside the extractor handle
|
|
transient errors; a hard timeout (configured at the connector level)
|
|
bounds wall-clock cost.
|
|
"""
|
|
|
|
from __future__ import annotations
|
|
|
|
import logging
|
|
from pathlib import Path
|
|
from typing import Any, Dict, Optional
|
|
|
|
from connectors.llm.anthropic_provider import AnthropicExtractor
|
|
from connectors.llm.exceptions import (
|
|
LLMError,
|
|
)
|
|
from .prompts import (
|
|
REVIEW_JSON_SCHEMA,
|
|
SYSTEM_PROMPT,
|
|
build_review_prompt,
|
|
)
|
|
|
|
logger = logging.getLogger(__name__)
|
|
|
|
# Bound the response budget. The schema's two arrays (findings +
|
|
# content_quality.issues) are individually capped at maxItems=20, but
|
|
# each item is ~120-180 tokens (severity/category/file/explanation/
|
|
# fix_hint or file/field/issue/hint). A bundle with many weak
|
|
# descriptions can easily hit 4-5k output tokens. Stay generous on
|
|
# Haiku/Sonnet — output cost is negligible compared to the cost of a
|
|
# truncated verdict pinning the submission in `review_error`.
|
|
MAX_RESPONSE_TOKENS = 6000
|
|
|
|
|
|
def review_bundle(
|
|
plugin_dir: Path,
|
|
*,
|
|
type_: str,
|
|
name: str,
|
|
version: str,
|
|
description: Optional[str],
|
|
api_key: str,
|
|
model: str,
|
|
) -> Dict[str, Any]:
|
|
"""Run the LLM review against the baked plugin tree.
|
|
|
|
Returns a dict with the schema:
|
|
``{risk_level, summary, findings[], template_placeholders_found,
|
|
reviewed_by_model, error}``
|
|
|
|
``error`` is set only when the LLM call itself failed — the runner
|
|
surfaces this as ``status='review_error'`` and exposes a retry path
|
|
in the admin UI. On success ``error`` is None and the schema fields
|
|
are populated by the model.
|
|
"""
|
|
user_payload = build_review_prompt(
|
|
plugin_dir,
|
|
type_=type_,
|
|
name=name,
|
|
version=version,
|
|
description=description,
|
|
)
|
|
|
|
extractor = AnthropicExtractor(api_key=api_key, model=model)
|
|
try:
|
|
# Pass SYSTEM_PROMPT via the SDK's separate ``system=`` parameter
|
|
# so a crafted README inside the uploaded bundle cannot override
|
|
# the reviewer rules. The user-content payload wraps the bundle
|
|
# files in <bundle>...</bundle> sentinels per the trust-boundary
|
|
# paragraph in SYSTEM_PROMPT.
|
|
result = extractor.extract_json(
|
|
prompt=user_payload,
|
|
system=SYSTEM_PROMPT,
|
|
max_tokens=MAX_RESPONSE_TOKENS,
|
|
json_schema=REVIEW_JSON_SCHEMA,
|
|
schema_name="store_guardrails_review",
|
|
)
|
|
except LLMError as e:
|
|
# Bubble up as a structured error the runner can persist into
|
|
# store_submissions.llm_findings — operators see the failure
|
|
# type without having to grep logs.
|
|
logger.warning(
|
|
"LLM guardrail review failed for %s/%s@%s: %s",
|
|
type_, name, version, type(e).__name__,
|
|
)
|
|
return {
|
|
"risk_level": None,
|
|
"summary": None,
|
|
"findings": [],
|
|
"template_placeholders_found": 0,
|
|
"reviewed_by_model": model,
|
|
"error": f"{type(e).__name__}: {e}",
|
|
}
|
|
except Exception as e:
|
|
# Catch-all: AnthropicExtractor only translates the four transient
|
|
# error classes into LLMError. Anything else (BadRequestError on
|
|
# schema mismatch, NotFoundError on a stale model ID, library
|
|
# bugs) would otherwise bubble through BackgroundTasks and land
|
|
# the request in the unhandled-error path while the submission
|
|
# row stays stuck in 'pending_llm' forever. Convert here so the
|
|
# admin queue surfaces a real review_error with a retry button.
|
|
logger.exception(
|
|
"LLM guardrail review unexpected error for %s/%s@%s",
|
|
type_, name, version,
|
|
)
|
|
return {
|
|
"risk_level": None,
|
|
"summary": None,
|
|
"findings": [],
|
|
"template_placeholders_found": 0,
|
|
"reviewed_by_model": model,
|
|
"error": f"{type(e).__name__}: {e}",
|
|
}
|
|
|
|
# Defensive: ensure the keys we rely on exist with sane defaults even
|
|
# if the model returns the optional ones empty. ``risk_level`` is
|
|
# special-cased — defaulting it to "medium" would silently look like
|
|
# a model decision and trigger an implicit block. Surface as an error
|
|
# so the runner persists `status='review_error'` and the admin sees
|
|
# a retry button.
|
|
risk_level = result.get("risk_level")
|
|
content_quality = _normalize_content_quality(result.get("content_quality"))
|
|
if not risk_level:
|
|
return {
|
|
"risk_level": None,
|
|
"summary": result.get("summary") or "",
|
|
"findings": result.get("findings") or [],
|
|
"template_placeholders_found": int(result.get("template_placeholders_found") or 0),
|
|
"content_quality": content_quality,
|
|
"reviewed_by_model": model,
|
|
"error": "missing_risk_level",
|
|
}
|
|
return {
|
|
"risk_level": risk_level,
|
|
"summary": result.get("summary") or "",
|
|
"findings": result.get("findings") or [],
|
|
"template_placeholders_found": int(result.get("template_placeholders_found") or 0),
|
|
"content_quality": content_quality,
|
|
"reviewed_by_model": model,
|
|
"error": None,
|
|
}
|
|
|
|
|
|
def _normalize_content_quality(value: Any) -> Dict[str, Any]:
|
|
"""Coerce the model's content_quality output to a stable shape.
|
|
|
|
Missing or malformed content_quality is treated as pass — keeps
|
|
backwards compatibility with older recorded verdicts and ensures a
|
|
weird LLM response can't accidentally block all submissions. The
|
|
safe-by-default-on-empty stance is intentional: hard blocking is
|
|
the mechanical tier's job; the LLM tier is the substantive
|
|
judgement layer.
|
|
|
|
The verdict is treated as an aggregate of the evidence: if the
|
|
model said `fail` with empty issues we downgrade to `pass` (no
|
|
visible reason to block); if it said `pass` with non-empty issues
|
|
we promote to `fail` (defense in depth — a compromised or
|
|
prompt-injected model that flipped the verdict without zeroing the
|
|
issues would otherwise sneak through). #277 LOW #2.
|
|
"""
|
|
if not isinstance(value, dict):
|
|
return {"verdict": "pass", "issues": []}
|
|
verdict = value.get("verdict")
|
|
if verdict not in {"pass", "fail"}:
|
|
verdict = "pass"
|
|
issues_raw = value.get("issues") or []
|
|
issues: list = []
|
|
if isinstance(issues_raw, list):
|
|
for item in issues_raw:
|
|
if not isinstance(item, dict):
|
|
continue
|
|
issues.append({
|
|
"file": str(item.get("file") or ""),
|
|
"field": str(item.get("field") or "frontmatter.description"),
|
|
"issue": str(item.get("issue") or ""),
|
|
"hint": str(item.get("hint") or ""),
|
|
})
|
|
# If verdict claims fail but no issues were enumerated, downgrade to
|
|
# pass — we'd otherwise block a submission with no rendered reason.
|
|
if verdict == "fail" and not issues:
|
|
verdict = "pass"
|
|
# Symmetric defense: if the model emitted issues but said pass,
|
|
# promote to fail. The verdict must aggregate the evidence; a
|
|
# compromised or prompt-injected model that flips the verdict
|
|
# without zeroing the issues list would otherwise sneak through.
|
|
# Issue #277 LOW finding #2.
|
|
if verdict == "pass" and issues:
|
|
verdict = "fail"
|
|
return {"verdict": verdict, "issues": issues}
|
|
|
|
|
|
def is_safe(verdict: Dict[str, Any]) -> bool:
|
|
"""Decide whether a review verdict permits publication.
|
|
|
|
Pass condition: ``risk_level IN ('safe','low')`` AND no individual
|
|
finding has severity ``high|critical`` AND ``content_quality.verdict
|
|
== 'pass'``. We intentionally let ``medium`` findings through with a
|
|
low-risk verdict — the model uses medium for "would benefit from
|
|
review but no immediate exploit". Operators escalate to
|
|
Sonnet/Opus if they want a stricter floor. Content quality is a
|
|
hard gate: weak descriptions block, no severity scale.
|
|
"""
|
|
if verdict.get("error"):
|
|
return False
|
|
risk = (verdict.get("risk_level") or "").lower()
|
|
if risk not in {"safe", "low"}:
|
|
return False
|
|
for finding in verdict.get("findings") or []:
|
|
sev = (finding.get("severity") or "").lower()
|
|
if sev in {"high", "critical"}:
|
|
return False
|
|
content_quality = verdict.get("content_quality") or {}
|
|
if (content_quality.get("verdict") or "pass").lower() == "fail":
|
|
return False
|
|
return True
|