fix(store): surface review failures + harden publish gate (#316)
* fix(store): surface review failures + harden publish gate
Four independent fixes to the flea-market submission pipeline, all surfaced
by an admin upload that landed at status='approved' without an LLM review.
1. LLM truncation no longer pins submissions in review_error.
- Raised MAX_RESPONSE_TOKENS 2500 → 6000 in llm_review.py
- Added one-shot retry-with-doubled-budget in anthropic_provider.py
(capped at 4× initial)
2. Flea detail page surfaces the latest submission's failure verdict even
when a previously-approved version is still serving (deferred-promotion
path). The _quarantine_banner gate widened from `visibility != approved`
to also fire on `blocked_inline / blocked_llm / review_error`, with copy
that distinguishes the v2+ edit case ("Latest edit failed review —
previously approved version (vN) keeps serving") from the initial-upload
quarantine wording.
3. Restore button + endpoint no longer allow restoring a version that was
never approved. Added StoreEntitiesRepository.get_with_version_approvals
joining store_submissions, gated the UI button on submission_status in
('approved', None), rendered status pills for non-restorable rows, and
added a 400 version_not_approved guard in POST /restore.
4. **BREAKING (operator-facing)**: publish gate is now fail-CLOSED on
misconfig. The previous get_guardrails_enabled() silently fell back to
"disabled, auto-approve everything" when guardrails.enabled=true in YAML
but no ANTHROPIC_API_KEY was in env. Split into:
- get_guardrails_enabled() (intent — YAML)
- get_guardrails_llm_provider_ready() (readiness — env)
Three-state matrix:
enabled=false → auto-approve (unchanged)
enabled=true + ready=true → normal pipeline (unchanged)
enabled=true + ready=false (NEW) → submissions hold at pending_llm
awaiting admin retry or override
(was: silent auto-approve)
Admin "Retry review" eligibility broadened to include pending_llm.
Boot-time WARNING banner surfaces the misconfig in app/main.py.
docs/STORE_GUARDRAILS.md updated with the three-state matrix.
Operators relying on the auto-fallback for local-dev no-LLM setups must
now explicitly set `guardrails.enabled: false` in instance.yaml.
Tests: 4623 passed. Added TestPublishGateFailClosed (4 tests) and
TestRestoreVersion::test_restore_rejects_* (3 tests). conftest.py adds an
autouse fixture defaulting guardrails OFF so legacy tests don't need to
know about the new toggle.
* fix(store): admin override promotes v2+ edits to current
The override handler at app/api/admin.py:3708 only flipped submission
status → 'overridden' and entity visibility → 'approved'. Under the v37+
deferred-promotion model that's insufficient for v2+ edits / restores:
the new bundle sits in versions/v<N>/plugin/ and the entity row stays at
the prior approved version_no + hash + on-disk live bundle. Installers
kept getting the OLD bytes the admin had just intended to replace.
Mirror the runner.run_llm_review auto-approval branch: look up the
submission's version_hash in entity.version_history, and if its `n`
differs from entity.version_no, promote_version + _swap_live_to_version.
Initial v1 overrides are unaffected — the loop finds n=1 == version_no
and skips promotion.
Tests:
- test_override_v2_edit_promotes_to_current: stage v1 approved + v2
blocked_llm; override the v2 sub; assert entity.version_no=2,
entity.version flips off the v1 hash, and the live plugin/ dir
mirrors versions/v2/plugin/.
- test_override_v1_initial_upload_no_promote: regression guard so the
promote loop doesn't accidentally bump a v1 override.
Audit log gains a promoted_to_version_no field on the override action.
* fix(store): retry/rescan review staged bundle; override forward-only
Two adversarial-review findings from a Codex pass on the publish-gate
work.
C1. Admin retry + rescan were passing live `plugin/` to the LLM. For a
v2+ submission held at `pending_llm` / `blocked_llm` / `review_error`,
live still holds the prior approved version's bytes — so the LLM
reviewed the WRONG bytes, and the runner's hash-match promotion in
`run_llm_review` would then advance the entity to staged bytes that
were never actually reviewed. Resolve the staged
`<entity>/versions/v<N>/plugin/` from the submission's
`version_history` entry, with a fall-back to live for legacy pre-v37
rows that never seeded a versions/ dir. Helpers
`_submission_plugin_dir` and `_version_no_for_submission` added to
`app/api/store.py` so override / retry / rescan share one path.
H1. Override's promote loop used `target != current`, which would
silently demote the live bundle when admin overrode a stale v2
submission while v3 was already approved + live. Changed to
`target > current` so override flips status + visibility on the row
regardless, but on-disk promotion only fires forward. Same `>`
defensive guard applied in `runner.run_llm_review` so a late LLM
verdict racing with a newer approval can't demote either.
Tests:
- TestAdminRetryReviewsStagedBundle::test_retry_v2_blocked_passes_staged_dir_not_live
- TestAdminRetryReviewsStagedBundle::test_rescan_v2_blocked_passes_staged_dir_not_live
- TestOverrideForwardOnly::test_override_stale_v2_does_not_demote_when_v3_current
* review polish: CHANGELOG drift, override eligibility, defensive copy
Three small additions on top of the retry/rescan staged-bundle fix:
1. CHANGELOG: the PR's bullets had drifted into the released
[0.54.17] section during rebase (context-match landed them next
to already-released content). Moved them up to [Unreleased] where
they belong; [0.54.17] now holds only what was actually released
(refresh-marketplace ls-remote, /me/activity hero, CI sharding +
workflow polish).
2. app/api/admin.py: admin override eligibility now accepts
pending_llm alongside blocked_inline + blocked_llm + review_error.
Closes a UX gap from the new fail-CLOSED behavior: under
enabled-but-not-ready, a known-good submission would otherwise
sit indefinitely until the admin set credentials AND clicked
Retry. Override already routes through version_history (and is
now forward-only on promote), so it stays safe for v2+ deferred-
promotion submissions.
3. src/repositories/store_entities.py: get_with_version_approvals
defensively copies each version_history entry before annotating
with submission_status. self.get() re-parses JSON each call today
so this is belt-and-suspenders against any future caching layer
leaking the annotated key into a subsequent plain get() call.
Tests: 112 passed (focused on test_store_entity_versions +
test_admin_store_submissions, covering the retry/rescan staged-
bundle fix the author shipped + this polish).
---------
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
This commit is contained in:
parent
fbe756685b
commit
a694a30a5e
17 changed files with 1597 additions and 94 deletions
97
CHANGELOG.md
97
CHANGELOG.md
|
|
@ -10,8 +10,105 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C
|
|||
|
||||
## [Unreleased]
|
||||
|
||||
### Changed
|
||||
- **BREAKING (operator-facing)**: flea-market guardrail pipeline now
|
||||
fail-CLOSED on misconfig. `get_guardrails_enabled()` previously
|
||||
conflated operator intent (`guardrails.enabled` YAML) with provider
|
||||
readiness (`ANTHROPIC_API_KEY` env) — when intent was True but the
|
||||
env var was missing, the pipeline silently auto-fell-back to disabled
|
||||
and every upload landed `approved` without an LLM review. Split into
|
||||
`get_guardrails_enabled()` (intent only) and a new
|
||||
`get_guardrails_llm_provider_ready()` (env only). Three-state matrix:
|
||||
`enabled=false → auto-approve` (unchanged), `enabled=true + ready →
|
||||
normal hold-for-review` (unchanged), `enabled=true + not-ready →
|
||||
submissions sit at `pending_llm`, no auto-approval` (new — was
|
||||
silent auto-approval). Admin **Retry review** action on
|
||||
`/admin/store/submissions/<id>` now covers `pending_llm` too (was
|
||||
`review_error` + `blocked_llm`). Boot-time `WARNING` banner surfaces
|
||||
the misconfig in `app/main.py`. Operators who relied on the
|
||||
auto-fallback for local-dev no-LLM setups must now explicitly set
|
||||
`guardrails.enabled: false` in `instance.yaml` — same outcome,
|
||||
explicit intent.
|
||||
- Flea-market admin **Override** action on
|
||||
`/admin/store/submissions/<id>` now covers `pending_llm` submissions
|
||||
too (was `blocked_inline` + `blocked_llm` + `review_error`). Closes
|
||||
a UX gap created by the new fail-CLOSED behavior above: under
|
||||
enabled-but-not-ready, a known-good submission would otherwise sit
|
||||
indefinitely until the admin set credentials AND clicked **Retry
|
||||
review**. Override already routes through `entity.version_history`
|
||||
to resolve the correct version dir (and is now forward-only on
|
||||
promote — see the Fixed bullet below), so it stays safe for v2+
|
||||
deferred-promotion submissions.
|
||||
|
||||
### Fixed
|
||||
- **Unauthenticated browser requests to `GET /api/initial-workspace.zip` now redirect to `/login?next=/api/initial-workspace.zip` instead of returning a raw JSON 401** (#315). This is the one `/api/*` endpoint that's designed to be hit directly from a browser bookmark (the analyst clean-install zip), so it intentionally opts out of the global `_API_PATH_PREFIXES` "never redirect /api/*" contract in `app/main.py`. CLI / curl / other API clients (any `Accept` without `text/html` — including the `*/*` default) keep getting the 401 they can handle.
|
||||
- Flea-market LLM security review failed with `LLMFormatError: Response
|
||||
truncated (max_tokens) for schema store_guardrails_review` when the
|
||||
reviewer emitted many findings or content-quality issues. Raised the
|
||||
output budget (2500 → 6000 tokens) and added a one-shot
|
||||
retry-with-doubled-budget in the Anthropic provider (up to 4× initial)
|
||||
so the verdict survives an occasional verbose response instead of
|
||||
pinning the submission in `review_error`. (We initially added
|
||||
`maxItems=20` to the schema's `findings` and `content_quality.issues`
|
||||
arrays, but Anthropic's structured-output validator rejects `maxItems`
|
||||
on array types — removed.)
|
||||
- Flea-market entity detail page surfaces the latest submission's
|
||||
failure verdict even when a previously-approved version is still
|
||||
serving (deferred-promotion path). The owner / admin used to see no
|
||||
banner at all when a v2+ edit landed in `review_error`,
|
||||
`blocked_llm`, or `blocked_inline` because the `_quarantine_banner`
|
||||
partial gated on `entity.visibility_status != 'approved'`. The
|
||||
banner now renders for those failure statuses too, with copy that
|
||||
acknowledges the prior version is still live ("Latest edit failed
|
||||
review — previously approved version (vN) keeps serving …").
|
||||
- Flea-market admin **Retry review** and **Rescan** now review the
|
||||
STAGED version's bundle, not the live `plugin/` directory. For a
|
||||
v2+ edit held at `pending_llm` / `blocked_llm` / `review_error`,
|
||||
live still holds the prior approved version. Reviewing live would
|
||||
produce a verdict against the WRONG bytes; the runner's hash-match
|
||||
promotion would then advance the entity to staged bytes that were
|
||||
never actually reviewed. Now resolves the staged
|
||||
`versions/v<N>/plugin/` from the submission's `version_history`
|
||||
entry. (Critical — surfaced by adversarial review.)
|
||||
- Flea-market admin **Override** is now forward-only on promotion.
|
||||
Previously `target_version_no != current` would happily demote the
|
||||
live bundle when admin overrode a stale v2 submission while v3 was
|
||||
already approved + live. Changed to `target > current` so override
|
||||
flips status + visibility on the row regardless, but on-disk
|
||||
promotion only fires for newer versions. The same `> current`
|
||||
guard is now applied defensively in `runner.run_llm_review` so a
|
||||
late LLM verdict can't demote past a more recent approval either.
|
||||
(High — surfaced by adversarial review.)
|
||||
- Flea-market admin **Override** on a v2+ edit/restore submission now
|
||||
promotes the entity to the overridden version + swaps the on-disk
|
||||
live bundle. Pre-fix the override only flipped
|
||||
`visibility_status='approved'` and `submission.status='overridden'`,
|
||||
leaving `entity.version_no` at the prior approved version — so
|
||||
installers (and the marketplace UI) kept serving the OLD bytes the
|
||||
admin just intended to replace. Mirrors the auto-approval branch in
|
||||
`runner.run_llm_review`: look up the submission's version in
|
||||
`version_history`, `promote_version` + `_swap_live_to_version` when
|
||||
it differs from current. Initial-v1 overrides unchanged (no
|
||||
promotion needed).
|
||||
- Flea-market Restore button + endpoint no longer allow restoring a
|
||||
version that was never approved. The versions card hid the gate
|
||||
entirely (showed Restore on any non-current row), and the backend
|
||||
blocked only while the latest submission was `pending_*` — so a
|
||||
`blocked_llm` / `review_error` version could be restored anyway.
|
||||
Added `submission_status` decoration on `version_history` via a new
|
||||
`StoreEntitiesRepository.get_with_version_approvals` helper, gated
|
||||
the UI button on `submission_status in ('approved', None)`
|
||||
(`None` is the legacy v1 seed, back-compat-treated as approved),
|
||||
rendered status pills for blocked / errored / pending rows, and
|
||||
added a 400 `version_not_approved` guard in
|
||||
`POST /api/store/entities/{id}/versions/{n}/restore`.
|
||||
|
||||
### Internal
|
||||
- `StoreEntitiesRepository.get_with_version_approvals` now defensively
|
||||
copies each `version_history` entry before annotating with
|
||||
`submission_status`. `self.get()` re-parses JSON each call today so
|
||||
this is belt-and-suspenders, but it protects any future caching layer
|
||||
from leaking the annotated key into a subsequent plain `get()` call.
|
||||
|
||||
## [0.54.17] — 2026-05-15
|
||||
|
||||
|
|
|
|||
130
app/api/admin.py
130
app/api/admin.py
|
|
@ -3726,7 +3726,7 @@ async def admin_override_store_submission(
|
|||
sub = subs.get(submission_id)
|
||||
if sub is None:
|
||||
raise HTTPException(status_code=404, detail="submission_not_found")
|
||||
if sub["status"] not in {"blocked_inline", "blocked_llm", "review_error"}:
|
||||
if sub["status"] not in {"blocked_inline", "blocked_llm", "review_error", "pending_llm"}:
|
||||
raise HTTPException(
|
||||
status_code=409,
|
||||
detail=f"cannot_override_status:{sub['status']}",
|
||||
|
|
@ -3747,8 +3747,47 @@ async def admin_override_store_submission(
|
|||
ents_repo = StoreEntitiesRepository(conn)
|
||||
ents_repo.set_visibility(entity_id, "approved")
|
||||
|
||||
# Update usage-attribution rows now that the entity is live.
|
||||
# Mirror the runner's deferred-promotion path. An override on a
|
||||
# v2+ edit/restore must promote the overridden version + swap the
|
||||
# on-disk live bundle, otherwise the entity stays at the prior
|
||||
# approved version and installers keep receiving stale bytes the
|
||||
# admin just told us to replace. For an initial v1 submission
|
||||
# (no prior approved) the version_no already matches — the loop
|
||||
# just no-ops and we skip promotion harmlessly.
|
||||
entity_row = ents_repo.get(entity_id) or {}
|
||||
promoted_to: Optional[int] = None
|
||||
sub_hash = sub.get("version")
|
||||
target_version_no: Optional[int] = None
|
||||
for entry in (entity_row.get("version_history") or []):
|
||||
if entry.get("hash") == sub_hash:
|
||||
try:
|
||||
target_version_no = int(entry.get("n"))
|
||||
except (TypeError, ValueError):
|
||||
target_version_no = None
|
||||
break
|
||||
# Forward-only: refuse to promote backwards. An admin overriding a
|
||||
# stale v2 submission when v3 is already approved + live must NOT
|
||||
# demote the live bundle back to v2's bytes. Override flips the
|
||||
# row's status + visibility regardless; only the version-promote
|
||||
# is gated. Forward (target > current) is the only motion the
|
||||
# publish-gate model is designed to express.
|
||||
if (target_version_no is not None
|
||||
and target_version_no > int(entity_row.get("version_no") or 0)):
|
||||
if ents_repo.promote_version(entity_id, target_version_no):
|
||||
try:
|
||||
from app.api.store import _swap_live_to_version
|
||||
_swap_live_to_version(entity_id, target_version_no)
|
||||
promoted_to = target_version_no
|
||||
# Re-read after promotion so attribution picks up the
|
||||
# new version's name/type if a rename was bundled in.
|
||||
entity_row = ents_repo.get(entity_id) or entity_row
|
||||
except Exception:
|
||||
logger.exception(
|
||||
"override: live swap failed for entity %s v%d",
|
||||
entity_id, target_version_no,
|
||||
)
|
||||
|
||||
# Update usage-attribution rows now that the entity is live.
|
||||
update_flea_attribution(
|
||||
conn, entity_id,
|
||||
entity_row.get("type", ""),
|
||||
|
|
@ -3765,6 +3804,7 @@ async def admin_override_store_submission(
|
|||
"prior_status": sub["status"],
|
||||
"prior_findings": sub.get("llm_findings"),
|
||||
"prior_inline": sub.get("inline_checks"),
|
||||
"promoted_to_version_no": promoted_to,
|
||||
},
|
||||
result="ok",
|
||||
)
|
||||
|
|
@ -3799,7 +3839,11 @@ async def admin_rescan_store_submission(
|
|||
whose bundle was rolled back (no ``entity_id``) cannot be rescanned —
|
||||
nothing to scan.
|
||||
"""
|
||||
from app.api.store import _plugin_dir
|
||||
from app.api.store import (
|
||||
_plugin_dir,
|
||||
_submission_plugin_dir,
|
||||
_version_no_for_submission,
|
||||
)
|
||||
from src.db import get_system_db
|
||||
from src.repositories.store_entities import StoreEntitiesRepository
|
||||
from src.repositories.store_submissions import StoreSubmissionsRepository
|
||||
|
|
@ -3809,7 +3853,10 @@ async def admin_rescan_store_submission(
|
|||
default_model_loader,
|
||||
run_llm_review,
|
||||
)
|
||||
from app.instance_config import get_guardrails_enabled
|
||||
from app.instance_config import (
|
||||
get_guardrails_enabled,
|
||||
get_guardrails_llm_provider_ready,
|
||||
)
|
||||
|
||||
subs = StoreSubmissionsRepository(conn)
|
||||
sub = subs.get(submission_id)
|
||||
|
|
@ -3819,12 +3866,21 @@ async def admin_rescan_store_submission(
|
|||
if not entity_id:
|
||||
raise HTTPException(status_code=409, detail="cannot_rescan_without_entity")
|
||||
|
||||
plugin_dir = _plugin_dir(entity_id)
|
||||
ents = StoreEntitiesRepository(conn)
|
||||
entity = ents.get(entity_id)
|
||||
# Rescan the bundle this submission represents — not live. See the
|
||||
# equivalent fix in /retry for the full reasoning. Same fall-back
|
||||
# to live for legacy rows that never seeded a versions/v<N>/plugin/.
|
||||
target_n = _version_no_for_submission(entity or {}, submission_id)
|
||||
if target_n is not None:
|
||||
plugin_dir = _submission_plugin_dir(entity_id, target_n)
|
||||
if not plugin_dir.exists():
|
||||
plugin_dir = _plugin_dir(entity_id)
|
||||
else:
|
||||
plugin_dir = _plugin_dir(entity_id)
|
||||
if not plugin_dir.exists():
|
||||
raise HTTPException(status_code=410, detail="bundle_missing")
|
||||
|
||||
ents = StoreEntitiesRepository(conn)
|
||||
entity = ents.get(entity_id)
|
||||
description = (entity or {}).get("description")
|
||||
|
||||
inline = run_inline_checks(
|
||||
|
|
@ -3849,20 +3905,28 @@ async def admin_rescan_store_submission(
|
|||
)
|
||||
return {"ok": True, "submission_id": submission_id, "status": "blocked_inline"}
|
||||
|
||||
# Inline passes — schedule LLM if enabled, else auto-approve.
|
||||
guardrails_on = get_guardrails_enabled()
|
||||
new_status = "pending_llm" if guardrails_on else "approved"
|
||||
# Inline passes. Three-state matrix:
|
||||
# - intent False → auto-approve (operator opt-out)
|
||||
# - intent True + ready → pending_llm, schedule LLM
|
||||
# - intent True + not-ready → pending_llm, DO NOT schedule (admin
|
||||
# retries from the same endpoint after providing credentials)
|
||||
guardrails_enabled = get_guardrails_enabled()
|
||||
provider_ready = get_guardrails_llm_provider_ready()
|
||||
hold_for_review = guardrails_enabled
|
||||
schedule_async_llm = guardrails_enabled and provider_ready
|
||||
guardrails_on = hold_for_review # retained for audit-log compat
|
||||
new_status = "pending_llm" if hold_for_review else "approved"
|
||||
subs.conn.execute(
|
||||
"UPDATE store_submissions SET inline_checks = ?, llm_findings = NULL, "
|
||||
"status = ?, updated_at = current_timestamp "
|
||||
"WHERE id = ?",
|
||||
[__import__("json").dumps(inline.to_response_dict()), new_status, submission_id],
|
||||
)
|
||||
if guardrails_on:
|
||||
if hold_for_review:
|
||||
ents.set_visibility(entity_id, "pending")
|
||||
else:
|
||||
ents.set_visibility(entity_id, "approved")
|
||||
# Guardrails off — immediately live; write attribution.
|
||||
# Guardrails explicitly disabled — immediately live; write attribution.
|
||||
entity_row = ents.get(entity_id) or {}
|
||||
update_flea_attribution(
|
||||
conn, entity_id,
|
||||
|
|
@ -3874,9 +3938,10 @@ async def admin_rescan_store_submission(
|
|||
action="store.submission.rescan",
|
||||
resource=f"store_submission:{submission_id}",
|
||||
params={"entity_id": entity_id, "outcome": new_status,
|
||||
"guardrails_enabled": guardrails_on},
|
||||
"guardrails_enabled": guardrails_on,
|
||||
"provider_ready": provider_ready},
|
||||
)
|
||||
if guardrails_on:
|
||||
if schedule_async_llm:
|
||||
background.add_task(
|
||||
run_llm_review,
|
||||
submission_id,
|
||||
|
|
@ -3895,13 +3960,27 @@ async def admin_retry_store_submission(
|
|||
user: dict = Depends(require_admin),
|
||||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||||
):
|
||||
"""Re-queue the LLM review for a submission stuck in ``review_error``.
|
||||
"""Re-queue the LLM review for a submission.
|
||||
|
||||
Eligible statuses:
|
||||
* ``review_error`` — LLM call failed, admin retrying after the
|
||||
underlying issue (rate limit, timeout, transient outage) clears.
|
||||
* ``blocked_llm`` — admin disagrees with the prior verdict; rerun
|
||||
from a clean slate (review rules may have shifted since).
|
||||
* ``pending_llm`` — submission was held when the LLM provider had
|
||||
no credentials in env (fail-CLOSED matrix: intent True + not
|
||||
ready). Admin sets the key and re-fires from here.
|
||||
|
||||
Only valid when the original submission's plugin tree is still on
|
||||
disk — for inline-blocked rows the bundle was deleted at POST time.
|
||||
"""
|
||||
from app.api.store import _plugin_dir
|
||||
from app.api.store import (
|
||||
_plugin_dir,
|
||||
_submission_plugin_dir,
|
||||
_version_no_for_submission,
|
||||
)
|
||||
from src.db import get_system_db
|
||||
from src.repositories.store_entities import StoreEntitiesRepository
|
||||
from src.repositories.store_submissions import StoreSubmissionsRepository
|
||||
from src.store_guardrails.runner import (
|
||||
default_api_key_loader,
|
||||
|
|
@ -3913,7 +3992,7 @@ async def admin_retry_store_submission(
|
|||
sub = subs.get(submission_id)
|
||||
if sub is None:
|
||||
raise HTTPException(status_code=404, detail="submission_not_found")
|
||||
if sub["status"] not in {"review_error", "blocked_llm"}:
|
||||
if sub["status"] not in {"review_error", "blocked_llm", "pending_llm"}:
|
||||
raise HTTPException(
|
||||
status_code=409, detail=f"cannot_retry_status:{sub['status']}",
|
||||
)
|
||||
|
|
@ -3923,7 +4002,22 @@ async def admin_retry_store_submission(
|
|||
status_code=409, detail="cannot_retry_without_entity",
|
||||
)
|
||||
|
||||
plugin_dir = _plugin_dir(entity_id)
|
||||
# Review the STAGED version's bytes — not live. For a v2+ edit
|
||||
# held at pending_llm or blocked_llm, live `plugin/` still holds
|
||||
# the prior approved version. Reviewing live would produce a
|
||||
# verdict against the wrong bytes; the runner's hash-match
|
||||
# promotion would then advance the entity to staged bytes that
|
||||
# were never actually reviewed.
|
||||
ent = StoreEntitiesRepository(conn).get(entity_id) or {}
|
||||
target_n = _version_no_for_submission(ent, submission_id)
|
||||
if target_n is not None:
|
||||
plugin_dir = _submission_plugin_dir(entity_id, target_n)
|
||||
# Fall back to live for legacy pre-v37 rows where the version
|
||||
# dir was never seeded.
|
||||
if not plugin_dir.exists():
|
||||
plugin_dir = _plugin_dir(entity_id)
|
||||
else:
|
||||
plugin_dir = _plugin_dir(entity_id)
|
||||
if not plugin_dir.exists():
|
||||
raise HTTPException(status_code=410, detail="bundle_missing")
|
||||
|
||||
|
|
|
|||
135
app/api/store.py
135
app/api/store.py
|
|
@ -27,7 +27,7 @@ import uuid
|
|||
import zipfile
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
from typing import Any, List, Optional
|
||||
from typing import Any, Dict, List, Optional
|
||||
from urllib.parse import urlparse
|
||||
|
||||
import duckdb
|
||||
|
|
@ -46,7 +46,10 @@ from pydantic import BaseModel
|
|||
|
||||
from app.auth.access import is_user_admin, require_admin
|
||||
from app.auth.dependencies import _get_db, get_current_user
|
||||
from app.instance_config import get_guardrails_enabled
|
||||
from app.instance_config import (
|
||||
get_guardrails_enabled,
|
||||
get_guardrails_llm_provider_ready,
|
||||
)
|
||||
from app.utils import get_store_dir
|
||||
from src.db import get_system_db
|
||||
from src.repositories.audit import AuditRepository
|
||||
|
|
@ -381,6 +384,37 @@ def _plugin_dir(entity_id: str) -> Path:
|
|||
return _entity_dir(entity_id) / "plugin"
|
||||
|
||||
|
||||
def _submission_plugin_dir(
|
||||
entity_id: str, version_no: int,
|
||||
) -> Path:
|
||||
"""On-disk path of the bundle a particular submission represents.
|
||||
|
||||
v37+ writes each version's bytes under
|
||||
``<entity_dir>/versions/v<N>/plugin/``. Live ``plugin/`` mirrors
|
||||
whichever ``v<N>`` is currently promoted. Admin retry / rescan
|
||||
flows MUST review the staged version dir, not live — otherwise a
|
||||
pending v2 retry would re-review v1's bytes, a clean verdict
|
||||
would land, and the runner's hash-match promotion would advance
|
||||
the entity to v2 bytes that were never actually reviewed.
|
||||
"""
|
||||
return _entity_dir(entity_id) / "versions" / f"v{int(version_no)}" / "plugin"
|
||||
|
||||
|
||||
def _version_no_for_submission(
|
||||
entity_row: Dict[str, Any], submission_id: str,
|
||||
) -> Optional[int]:
|
||||
"""Locate the version_history entry produced by `submission_id`
|
||||
and return its ``n``. Used by admin retry / rescan / override to
|
||||
pick the right ``versions/v<N>/plugin/`` directory."""
|
||||
for entry in (entity_row.get("version_history") or []):
|
||||
if entry.get("submission_id") == submission_id:
|
||||
try:
|
||||
return int(entry.get("n"))
|
||||
except (TypeError, ValueError):
|
||||
return None
|
||||
return None
|
||||
|
||||
|
||||
def _assets_dir(entity_id: str) -> Path:
|
||||
return _entity_dir(entity_id) / "assets"
|
||||
|
||||
|
|
@ -1318,11 +1352,22 @@ async def create_entity(
|
|||
bundle_meta = compute_bundle_meta(plugin_dir)
|
||||
subs_repo = StoreSubmissionsRepository(conn)
|
||||
|
||||
guardrails_on = get_guardrails_enabled()
|
||||
# When the pipeline is disabled (local dev / no ANTHROPIC_API_KEY)
|
||||
# we approve immediately. Inline checks ran above and recorded a
|
||||
# passing verdict; skip the async LLM step and the 'pending' hold.
|
||||
initial_visibility = "approved" if not guardrails_on else "pending"
|
||||
# Three-state matrix (fail-CLOSED on misconfig):
|
||||
# - intent False → auto-approve (operator opt-out, e.g. local dev)
|
||||
# - intent True + ready → hold for review, schedule LLM async
|
||||
# - intent True + NOT ready → hold for review, DO NOT auto-approve
|
||||
# (submission sits at pending_llm; admin can set the key + click
|
||||
# Retry review or override-publish manually). The previous
|
||||
# auto-fallback silently approved everything when the env-var
|
||||
# was missing — a fail-OPEN hole.
|
||||
guardrails_enabled = get_guardrails_enabled()
|
||||
provider_ready = get_guardrails_llm_provider_ready()
|
||||
hold_for_review = guardrails_enabled # intent drives the hold
|
||||
schedule_async_llm = guardrails_enabled and provider_ready
|
||||
# `guardrails_on` retained for downstream audit-log compat —
|
||||
# historical column meaning is "did the pipeline gate this row".
|
||||
guardrails_on = hold_for_review
|
||||
initial_visibility = "pending" if hold_for_review else "approved"
|
||||
photo_rel = await _save_photo(photo, entity_id) if photo else None
|
||||
doc_rels = await _save_docs(docs, entity_id)
|
||||
|
||||
|
|
@ -1369,11 +1414,16 @@ async def create_entity(
|
|||
"guardrails_enabled": guardrails_on,
|
||||
},
|
||||
)
|
||||
if guardrails_on:
|
||||
if schedule_async_llm:
|
||||
_schedule_llm_review(background_tasks, sub_id, plugin_dir)
|
||||
elif initial_visibility == "approved":
|
||||
# Guardrails off — entity is immediately live; write attribution now.
|
||||
# Guardrails explicitly disabled in YAML — entity is
|
||||
# immediately live; write attribution now.
|
||||
update_flea_attribution(conn, entity_id, type, final_name)
|
||||
# Else: enabled-but-not-ready. Submission sits at pending_llm;
|
||||
# entity stays at visibility=pending. Admin retries from the
|
||||
# admin UI once credentials are present, OR overrides + publishes
|
||||
# the row manually. No silent auto-approval.
|
||||
finally:
|
||||
shutil.rmtree(scratch, ignore_errors=True)
|
||||
|
||||
|
|
@ -1703,7 +1753,17 @@ async def update_entity(
|
|||
# guardrails disabled the path collapses: submission lands at
|
||||
# 'approved' and we promote synchronously below.
|
||||
if file is not None and new_version_no is not None and new_version_dir is not None:
|
||||
guardrails_on = get_guardrails_enabled()
|
||||
# Same three-state matrix as the initial-upload path. Hold the
|
||||
# new version (defer promotion) whenever guardrails are enabled
|
||||
# — even when the provider isn't ready. Promotion only fires on
|
||||
# an actual LLM approval OR when the operator explicitly opted
|
||||
# out via `guardrails.enabled: false`. Misconfig (enabled +
|
||||
# no key) sits at pending_llm awaiting admin action.
|
||||
guardrails_enabled = get_guardrails_enabled()
|
||||
provider_ready = get_guardrails_llm_provider_ready()
|
||||
hold_for_review = guardrails_enabled
|
||||
schedule_async_llm = guardrails_enabled and provider_ready
|
||||
guardrails_on = hold_for_review
|
||||
subs_repo = StoreSubmissionsRepository(conn)
|
||||
from src.store_guardrails.bundle_meta import compute_bundle_meta
|
||||
# Hash the NEW version dir, not live (which still holds the
|
||||
|
|
@ -1715,7 +1775,7 @@ async def update_entity(
|
|||
type=entity["type"],
|
||||
name=rename_to or entity["name"],
|
||||
version=new_version,
|
||||
status="approved" if not guardrails_on else "pending_llm",
|
||||
status="approved" if not hold_for_review else "pending_llm",
|
||||
entity_id=entity_id,
|
||||
inline_checks=inline_after_update.to_response_dict()
|
||||
if inline_after_update else None,
|
||||
|
|
@ -1734,20 +1794,21 @@ async def update_entity(
|
|||
)
|
||||
_audit(
|
||||
conn, user["id"],
|
||||
"store.submission.accepted" if guardrails_on else "store.submission.approved",
|
||||
"store.submission.accepted" if hold_for_review else "store.submission.approved",
|
||||
sub_id, {"entity_id": entity_id, "on": "update",
|
||||
"version_no": appended_n,
|
||||
"guardrails_enabled": guardrails_on},
|
||||
)
|
||||
if guardrails_on:
|
||||
if schedule_async_llm:
|
||||
# Live remains at prior approved bundle. LLM reviews the
|
||||
# new version dir; runner promotes on approval.
|
||||
_schedule_llm_review(
|
||||
background_tasks, sub_id, new_version_dir / "plugin",
|
||||
)
|
||||
else:
|
||||
# Guardrails off → implicit approval. Promote inline:
|
||||
# update entity columns + swap live to new version.
|
||||
elif not hold_for_review:
|
||||
# Guardrails explicitly disabled → implicit approval.
|
||||
# Promote inline: update entity columns + swap live to new
|
||||
# version.
|
||||
repo.promote_version(entity_id, appended_n)
|
||||
_swap_live_to_version(entity_id, appended_n)
|
||||
# Live bundle is now the new version; refresh attribution.
|
||||
|
|
@ -1757,6 +1818,10 @@ async def update_entity(
|
|||
ent_after_swap.get("type") or entity["type"],
|
||||
ent_after_swap.get("name") or (rename_to or entity["name"]),
|
||||
)
|
||||
# Else (enabled + not-ready): submission sits at pending_llm,
|
||||
# live continues serving the prior approved version. Admin
|
||||
# retries from /admin/store/submissions once credentials are
|
||||
# provided.
|
||||
|
||||
# Use the freshly-appended version number when a bundle change
|
||||
# produced one, falling back to the planned new_version_no for
|
||||
|
|
@ -1847,6 +1912,30 @@ async def restore_version(
|
|||
},
|
||||
)
|
||||
|
||||
# Refuse to restore a version that was never approved. Look up the
|
||||
# submission that produced version `v<version_no>` and gate on its
|
||||
# status. Legacy v1 (no submission_id — seeded pre-v37) is
|
||||
# back-compat treated as approved. The UI also hides the Restore
|
||||
# button for these statuses, but defense in depth: a direct API
|
||||
# caller bypasses the template.
|
||||
src_sub_id = next(
|
||||
(entry.get("submission_id") for entry in
|
||||
(entity.get("version_history") or [])
|
||||
if int(entry.get("n") or 0) == int(version_no)),
|
||||
None,
|
||||
)
|
||||
if src_sub_id:
|
||||
src_sub = StoreSubmissionsRepository(conn).get(src_sub_id)
|
||||
if src_sub and src_sub.get("status") not in ("approved",):
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail={
|
||||
"code": "version_not_approved",
|
||||
"version_no": version_no,
|
||||
"source_status": src_sub.get("status"),
|
||||
},
|
||||
)
|
||||
|
||||
# Locate the source version dir.
|
||||
source_dir = (
|
||||
_entity_dir(entity_id) / "versions" / f"v{version_no}" / "plugin"
|
||||
|
|
@ -1904,7 +1993,11 @@ async def restore_version(
|
|||
# swap + version_no/version/file_size bump) waits on LLM approval.
|
||||
from src.store_guardrails.bundle_meta import compute_bundle_meta
|
||||
target_meta = compute_bundle_meta(target_plugin)
|
||||
guardrails_on = get_guardrails_enabled()
|
||||
# Same three-state hold-for-review matrix as create/edit.
|
||||
guardrails_enabled = get_guardrails_enabled()
|
||||
provider_ready = get_guardrails_llm_provider_ready()
|
||||
hold_for_review = guardrails_enabled
|
||||
schedule_async_llm = guardrails_enabled and provider_ready
|
||||
subs_repo = StoreSubmissionsRepository(conn)
|
||||
sub_id = subs_repo.create(
|
||||
submitter_id=user["id"],
|
||||
|
|
@ -1912,7 +2005,7 @@ async def restore_version(
|
|||
type=entity["type"],
|
||||
name=entity["name"],
|
||||
version=new_version,
|
||||
status="approved" if not guardrails_on else "pending_llm",
|
||||
status="approved" if not hold_for_review else "pending_llm",
|
||||
entity_id=entity_id,
|
||||
inline_checks=inline.to_response_dict(),
|
||||
file_size=target_meta.file_size,
|
||||
|
|
@ -1932,11 +2025,13 @@ async def restore_version(
|
|||
"new_version_no": appended_n,
|
||||
"submission_id": sub_id},
|
||||
)
|
||||
if guardrails_on:
|
||||
if schedule_async_llm:
|
||||
_schedule_llm_review(background_tasks, sub_id, target_plugin)
|
||||
else:
|
||||
elif not hold_for_review:
|
||||
# Guardrails explicitly disabled — inline-promote.
|
||||
repo.promote_version(entity_id, appended_n)
|
||||
_swap_live_to_version(entity_id, appended_n)
|
||||
# Else (enabled + not-ready): defer promotion, await admin retry.
|
||||
|
||||
_invalidate_etag()
|
||||
return _entity_to_response(conn, repo.get(entity_id)) # type: ignore[arg-type]
|
||||
|
|
|
|||
|
|
@ -569,21 +569,32 @@ def get_guardrails_min_body_chars() -> int:
|
|||
|
||||
|
||||
def get_guardrails_enabled() -> bool:
|
||||
"""Master kill-switch for the guardrail pipeline.
|
||||
"""Operator's stated intent for the guardrail pipeline.
|
||||
|
||||
Defaults to True. Operators can disable by setting ``guardrails.enabled:
|
||||
false`` in instance.yaml — useful for local development against the
|
||||
UI without burning Anthropic tokens. Inline checks always run; this
|
||||
flag only gates the LLM step (and skips the pending → approved hold).
|
||||
Reads ``guardrails.enabled`` from instance.yaml. Defaults to True.
|
||||
Operators can explicitly disable by setting ``guardrails.enabled:
|
||||
false`` — useful for local development against the UI without
|
||||
burning Anthropic tokens.
|
||||
|
||||
Auto-fallback: when the YAML says enabled but no ANTHROPIC_API_KEY /
|
||||
LLM_API_KEY is set in the environment, behave as disabled. This
|
||||
keeps the test suite + first-boot operator experience sane — uploads
|
||||
auto-approve until the operator wires up an LLM provider rather than
|
||||
silently piling up in ``review_error``.
|
||||
Note: this returns intent ONLY. Whether the LLM provider has
|
||||
working credentials is a separate concern — see
|
||||
:func:`get_guardrails_llm_provider_ready`. The two are kept apart
|
||||
so callers can implement fail-CLOSED behavior: hold submissions at
|
||||
``pending_llm`` (instead of silently auto-approving) when intent is
|
||||
True but credentials are missing.
|
||||
"""
|
||||
return bool(get_value("guardrails", "enabled", default=True))
|
||||
|
||||
|
||||
def get_guardrails_llm_provider_ready() -> bool:
|
||||
"""Whether the LLM provider has credentials present in the
|
||||
environment.
|
||||
|
||||
Independent from :func:`get_guardrails_enabled` (operator intent).
|
||||
A False return here when intent is True is a misconfiguration —
|
||||
the caller should hold submissions at ``pending_llm`` and surface
|
||||
a loud boot-time warning rather than silently auto-approving.
|
||||
"""
|
||||
if not bool(get_value("guardrails", "enabled", default=True)):
|
||||
return False
|
||||
if os.environ.get("ANTHROPIC_API_KEY", "").strip():
|
||||
return True
|
||||
if os.environ.get("LLM_API_KEY", "").strip():
|
||||
|
|
|
|||
27
app/main.py
27
app/main.py
|
|
@ -480,6 +480,33 @@ def create_app() -> FastAPI:
|
|||
logger.warning("NEVER enable this in a deployment reachable from the internet.")
|
||||
logger.warning("=" * 60)
|
||||
|
||||
# Guardrails misconfig surface — fail-CLOSED matrix means an enabled
|
||||
# pipeline with no LLM credentials in env will hold every submission
|
||||
# at `pending_llm` indefinitely. Surface this LOUDLY at boot so the
|
||||
# operator finds the cause before the submission queue piles up.
|
||||
try:
|
||||
from app.instance_config import (
|
||||
get_guardrails_enabled,
|
||||
get_guardrails_llm_provider_ready,
|
||||
)
|
||||
if get_guardrails_enabled() and not get_guardrails_llm_provider_ready():
|
||||
logger.warning("=" * 60)
|
||||
logger.warning(
|
||||
"GUARDRAILS ENABLED BUT NO LLM PROVIDER CREDENTIALS FOUND.",
|
||||
)
|
||||
logger.warning(
|
||||
"Set ANTHROPIC_API_KEY (or LLM_API_KEY) in the environment, "
|
||||
"or disable guardrails in instance.yaml.",
|
||||
)
|
||||
logger.warning(
|
||||
"Until then, every flea-market upload will sit at "
|
||||
"status='pending_llm' awaiting admin retry — the LLM "
|
||||
"review step cannot run.",
|
||||
)
|
||||
logger.warning("=" * 60)
|
||||
except Exception:
|
||||
logger.exception("guardrails readiness probe failed at boot")
|
||||
|
||||
# Seed admin user (SEED_ADMIN_EMAIL) and add them to the Admin user_group.
|
||||
# Optional SEED_ADMIN_PASSWORD lets the seeded user sign in immediately
|
||||
# without going through bootstrap; never overwritten if already set.
|
||||
|
|
|
|||
|
|
@ -1368,40 +1368,44 @@ async def marketplace_flea_detail(
|
|||
from src.repositories.store_entities import StoreEntitiesRepository
|
||||
from src.repositories.store_submissions import StoreSubmissionsRepository
|
||||
|
||||
entity = StoreEntitiesRepository(conn).get(entity_id)
|
||||
if not entity:
|
||||
repo = StoreEntitiesRepository(conn)
|
||||
# Owner/admin get a version-status decorated entity so the versions
|
||||
# card can gate the Restore button on past-version approval state.
|
||||
# Plain viewers don't see the versions card at all, so the cheaper
|
||||
# plain get() suffices.
|
||||
base_entity = repo.get(entity_id)
|
||||
if not base_entity:
|
||||
raise HTTPException(status_code=404, detail="Entity not found")
|
||||
|
||||
# Refuse early — same gate as the API + the asset endpoints. 404
|
||||
# (not 403) so the entity's existence isn't leaked.
|
||||
_enforce_visibility(entity, user, conn)
|
||||
_enforce_visibility(base_entity, user, conn)
|
||||
|
||||
is_owner = entity.get("owner_user_id") == user.get("id")
|
||||
is_owner = base_entity.get("owner_user_id") == user.get("id")
|
||||
is_admin = is_user_admin(user["id"], conn)
|
||||
|
||||
entity = (
|
||||
repo.get_with_version_approvals(entity_id)
|
||||
if (is_owner or is_admin) else base_entity
|
||||
)
|
||||
|
||||
# Pull the latest submission so the quarantine banner can render
|
||||
# the most recent verdict (inline_checks + llm_findings). Skipped
|
||||
# for plain non-owner non-admin viewers since they only see
|
||||
# approved entities and don't need the diagnostic.
|
||||
# the most recent verdict (inline_checks + llm_findings). v37:
|
||||
# always load for owner/admin, even when the entity itself is
|
||||
# approved at a prior version — under deferred promotion, a v2+
|
||||
# edit can leave the latest submission in `review_error` /
|
||||
# `blocked_llm` while the entity row stays approved. Gating the
|
||||
# fetch on `visibility_status != 'approved'` silently hid the
|
||||
# failure from the owner.
|
||||
quarantine_sub = None
|
||||
if (is_owner or is_admin) and entity.get("visibility_status") != "approved":
|
||||
if is_owner or is_admin:
|
||||
quarantine_sub = StoreSubmissionsRepository(conn).latest_for_entity(entity_id)
|
||||
|
||||
# v37: even when entity is 'approved' (deferred promotion path —
|
||||
# existing installers continue receiving the prior version),
|
||||
# owner/admin needs to see if there's an edit-review in flight so
|
||||
# the Edit button can lock + a small status surfaces. Look it up
|
||||
# separately from quarantine_sub to keep the banner partial's
|
||||
# gates intact.
|
||||
edit_in_flight = False
|
||||
if (is_owner or is_admin):
|
||||
latest = (
|
||||
StoreSubmissionsRepository(conn).latest_for_entity(entity_id)
|
||||
)
|
||||
if latest and latest.get("status") in (
|
||||
"pending_inline", "pending_llm",
|
||||
):
|
||||
edit_in_flight = True
|
||||
# v37: the Edit button locks while a submission is under review.
|
||||
edit_in_flight = bool(
|
||||
quarantine_sub
|
||||
and quarantine_sub.get("status") in ("pending_inline", "pending_llm")
|
||||
)
|
||||
|
||||
common = dict(
|
||||
source="flea",
|
||||
|
|
|
|||
|
|
@ -55,6 +55,14 @@
|
|||
}
|
||||
.versions-card button.restore:hover { background: var(--surface-muted, #f9fafb); }
|
||||
.versions-card button.restore:disabled { opacity: 0.5; cursor: not-allowed; }
|
||||
.versions-card .status-pill {
|
||||
display: inline-block; padding: 1px 7px; border-radius: 999px;
|
||||
font-size: 10px; font-weight: 600; text-transform: uppercase;
|
||||
letter-spacing: 0.3px; vertical-align: middle;
|
||||
}
|
||||
.versions-card .status-pill.blocked { background: #fee2e2; color: #991b1b; }
|
||||
.versions-card .status-pill.errored { background: #fef3c7; color: #92400e; }
|
||||
.versions-card .status-pill.pending { background: #e0e7ff; color: #3730a3; }
|
||||
</style>
|
||||
|
||||
<div class="versions-card">
|
||||
|
|
@ -91,12 +99,37 @@
|
|||
{{ v.created_at[:10] if v.created_at else '' }}
|
||||
</td>
|
||||
<td>
|
||||
{# Restore only when the version was actually approved.
|
||||
``submission_status=None`` is legacy v1 (seeded pre-v37
|
||||
before submission_id backfill) — treat as approved.
|
||||
Anything else (blocked_inline / blocked_llm /
|
||||
review_error / pending_*) renders a pill instead so the
|
||||
owner understands why no button. #}
|
||||
{% set _sub_status = v.submission_status %}
|
||||
{% set _is_approvable = _sub_status in ['approved', None] %}
|
||||
{% if v.n != entity.version_no %}
|
||||
<button class="restore" type="button"
|
||||
data-version-no="{{ v.n }}"
|
||||
onclick="restoreVersion({{ v.n }})">
|
||||
Restore
|
||||
</button>
|
||||
{% if _is_approvable %}
|
||||
<button class="restore" type="button"
|
||||
data-version-no="{{ v.n }}"
|
||||
onclick="restoreVersion({{ v.n }})">
|
||||
Restore
|
||||
</button>
|
||||
{% elif _sub_status in ['blocked_inline', 'blocked_llm'] %}
|
||||
<span class="status-pill blocked"
|
||||
title="This version was blocked by guardrails — restore disabled.">
|
||||
blocked
|
||||
</span>
|
||||
{% elif _sub_status == 'review_error' %}
|
||||
<span class="status-pill errored"
|
||||
title="The security reviewer errored on this version — restore disabled.">
|
||||
errored
|
||||
</span>
|
||||
{% elif _sub_status in ['pending_inline', 'pending_llm'] %}
|
||||
<span class="status-pill pending"
|
||||
title="This version is still under review.">
|
||||
pending
|
||||
</span>
|
||||
{% endif %}
|
||||
{% endif %}
|
||||
</td>
|
||||
</tr>
|
||||
|
|
@ -155,6 +188,10 @@ async function restoreVersion(versionNo) {
|
|||
return;
|
||||
} else if (code === 'prior_version_pending') {
|
||||
msg = 'A previous edit is still under review. Wait for the verdict before restoring.';
|
||||
} else if (code === 'version_not_approved') {
|
||||
const src = j.detail.source_status || 'not approved';
|
||||
msg = `Restore blocked: v${versionNo} was never approved `
|
||||
+ `(status: ${src}). Pick a version that passed review.`;
|
||||
} else if (code === 'version_not_found') {
|
||||
msg = 'That version is no longer on disk.';
|
||||
} else if (code === 'already_current') {
|
||||
|
|
|
|||
|
|
@ -18,7 +18,21 @@
|
|||
approved earlier — only the rendering location changed.
|
||||
#}
|
||||
|
||||
{% if entity.visibility_status != 'approved' and (is_owner or is_admin) %}
|
||||
{# Gate widened for failure surfacing: under deferred promotion
|
||||
(v37+), a v2+ edit can leave the entity approved at the prior
|
||||
version while the latest submission landed in `review_error` /
|
||||
`blocked_llm` / `blocked_inline`. The original
|
||||
`visibility_status != 'approved'` gate silently hid those
|
||||
failures from the owner. Render the banner whenever EITHER the
|
||||
entity itself is non-approved OR the latest submission carries
|
||||
a *failure* verdict the owner needs to see. Pending edits keep
|
||||
the original behavior — Edit button locks instead, no banner. #}
|
||||
{% if (is_owner or is_admin) and (
|
||||
entity.visibility_status != 'approved'
|
||||
or (quarantine_sub and quarantine_sub.status in [
|
||||
'blocked_inline', 'blocked_llm', 'review_error',
|
||||
])
|
||||
) %}
|
||||
<style>
|
||||
.vis-banner {
|
||||
margin: 12px 0 16px 0;
|
||||
|
|
@ -69,6 +83,16 @@
|
|||
{% endif %}
|
||||
|
||||
{% elif st == 'blocked_inline' %}
|
||||
{% set _is_edit_review = entity.visibility_status == 'approved' %}
|
||||
{% if _is_edit_review %}
|
||||
<h3>⚠ Latest edit failed automated checks</h3>
|
||||
<div>
|
||||
Your latest edit failed at least one automated check. The
|
||||
previously approved version (v{{ entity.version_no }}) keeps
|
||||
serving to existing installers. Fix the issues below and
|
||||
re-upload, or wait for an admin to resolve the quarantine.
|
||||
</div>
|
||||
{% else %}
|
||||
<h3>⚠ Quarantined — automated checks failed</h3>
|
||||
<div>
|
||||
Your submission failed at least one automated check and has been
|
||||
|
|
@ -77,6 +101,7 @@
|
|||
re-upload to retry, or wait for an admin to resolve the
|
||||
quarantine.
|
||||
</div>
|
||||
{% endif %}
|
||||
{% if sub and sub.inline_checks %}
|
||||
{% set ic = sub.inline_checks %}
|
||||
{% if ic.manifest and ic.manifest.issues %}
|
||||
|
|
@ -100,6 +125,17 @@
|
|||
{% endif %}
|
||||
|
||||
{% elif st == 'blocked_llm' %}
|
||||
{% set _is_edit_review = entity.visibility_status == 'approved' %}
|
||||
{% if _is_edit_review %}
|
||||
<h3>⚠ Latest edit failed review</h3>
|
||||
<div>
|
||||
The reviewer flagged your latest edit for security risk and/or
|
||||
weak component descriptions. The previously approved version
|
||||
(v{{ entity.version_no }}) keeps serving to existing installers.
|
||||
Address the findings below and re-upload, or wait for an admin
|
||||
to resolve the quarantine.
|
||||
</div>
|
||||
{% else %}
|
||||
<h3>⚠ Quarantined — review flagged issues</h3>
|
||||
<div>
|
||||
The reviewer flagged this submission for security risk and/or
|
||||
|
|
@ -108,6 +144,7 @@
|
|||
findings below and re-upload, or wait for an admin to resolve
|
||||
the quarantine.
|
||||
</div>
|
||||
{% endif %}
|
||||
{% if sub and sub.llm_findings %}
|
||||
{% if sub.llm_findings.summary %}
|
||||
<div style="margin-top: 6px;"><em>{{ sub.llm_findings.summary }}</em></div>
|
||||
|
|
@ -138,11 +175,22 @@
|
|||
{% endif %}
|
||||
|
||||
{% elif st == 'review_error' %}
|
||||
{% set _is_edit_review = entity.visibility_status == 'approved' %}
|
||||
{% if _is_edit_review %}
|
||||
<h3>⚠ Latest edit failed review</h3>
|
||||
<div>
|
||||
The security reviewer couldn't complete its check on your latest
|
||||
edit. The previously approved version (v{{ entity.version_no }})
|
||||
keeps serving to existing installers. No action needed from you —
|
||||
an admin will retry.
|
||||
</div>
|
||||
{% else %}
|
||||
<h3>⚠ Under review — security check errored</h3>
|
||||
<div>
|
||||
The security reviewer couldn't complete its check. The submission
|
||||
stays hidden until an admin retries. No action needed from you.
|
||||
</div>
|
||||
{% endif %}
|
||||
{% if sub and sub.llm_findings and sub.llm_findings.error %}
|
||||
<div style="margin-top: 6px; font-family: ui-monospace, SFMono-Regular, Menlo, monospace; font-size: 12px; word-break: break-word;">
|
||||
Error: {{ sub.llm_findings.error }}
|
||||
|
|
|
|||
|
|
@ -25,6 +25,12 @@ MAX_RETRIES = 3
|
|||
INITIAL_BACKOFF_SECONDS = 2
|
||||
BACKOFF_MULTIPLIER = 2
|
||||
|
||||
# Truncation retry: when the model hits max_tokens we retry with a
|
||||
# doubled budget. Caps the multiplier at 4x the caller's original
|
||||
# value so a runaway can't drain the per-call budget.
|
||||
MAX_TRUNCATION_RETRIES = 2 # 2x then 4x
|
||||
TRUNCATION_BUDGET_MULTIPLIER = 2
|
||||
|
||||
|
||||
def _strict_json_schema(schema):
|
||||
"""Return a copy of the schema with additionalProperties=False on every object type.
|
||||
|
|
@ -94,17 +100,40 @@ class AnthropicExtractor:
|
|||
LLMRefusalError: Model refused to respond.
|
||||
"""
|
||||
last_exception: Exception | None = None
|
||||
# Truncation retries bump max_tokens; transient retries bump
|
||||
# backoff. Accounted separately so a verbose response under
|
||||
# rate-limit doesn't burn both budgets at once.
|
||||
truncation_retries = 0
|
||||
current_max_tokens = max_tokens
|
||||
|
||||
for attempt in range(1, MAX_RETRIES + 1):
|
||||
try:
|
||||
return self._attempt_extraction(
|
||||
prompt, max_tokens, json_schema, schema_name, attempt,
|
||||
system=system,
|
||||
prompt, current_max_tokens, json_schema, schema_name,
|
||||
attempt, system=system,
|
||||
)
|
||||
except LLMAuthError:
|
||||
raise
|
||||
except LLMRefusalError:
|
||||
raise
|
||||
except LLMFormatError as e:
|
||||
# Truncation is a special case: same prompt + schema,
|
||||
# but the model didn't have room to finish. Retry with
|
||||
# a doubled budget — capped — instead of giving up.
|
||||
# Other format errors (bad JSON, schema mismatch) won't
|
||||
# benefit from more tokens, so re-raise immediately.
|
||||
if (str(e).startswith("Response truncated")
|
||||
and truncation_retries < MAX_TRUNCATION_RETRIES):
|
||||
truncation_retries += 1
|
||||
current_max_tokens *= TRUNCATION_BUDGET_MULTIPLIER
|
||||
logger.warning(
|
||||
"Response truncated on attempt %d for model %s, "
|
||||
"retrying with max_tokens=%d (%dx initial)",
|
||||
attempt, self._model, current_max_tokens,
|
||||
TRUNCATION_BUDGET_MULTIPLIER ** truncation_retries,
|
||||
)
|
||||
continue
|
||||
raise
|
||||
except (LLMRateLimitError, LLMTimeoutError) as e:
|
||||
last_exception = e
|
||||
if attempt < MAX_RETRIES:
|
||||
|
|
|
|||
|
|
@ -101,12 +101,25 @@ Required environment variable (when guardrails enabled):
|
|||
ANTHROPIC_API_KEY=sk-ant-… # or LLM_API_KEY for the proxy case
|
||||
```
|
||||
|
||||
If `guardrails.enabled: true` but neither key is set, the pipeline
|
||||
**auto-falls back to disabled** with a warning at startup — uploads
|
||||
auto-approve until the operator wires up the LLM. This keeps tests +
|
||||
first-boot sane; it does NOT silently let unreviewed uploads through
|
||||
when the operator clearly intended review (the env var is missing, not
|
||||
the YAML).
|
||||
### Three-state publish-gate matrix (fail-CLOSED)
|
||||
|
||||
The pipeline distinguishes **operator intent** (the YAML toggle) from
|
||||
**provider readiness** (whether `ANTHROPIC_API_KEY` / `LLM_API_KEY` is
|
||||
in the environment). The two are deliberately separate so a missing
|
||||
env var can't silently flip an intended-on pipeline into auto-approve.
|
||||
|
||||
| `guardrails.enabled` | Provider key in env | Behavior |
|
||||
|---|---|---|
|
||||
| `false` | (any) | Pipeline OFF. Inline checks still run. Uploads auto-approve. Operator's explicit opt-out — local dev / no-LLM deployments. |
|
||||
| `true` | yes | Normal hold-for-review. Inline + LLM both run. |
|
||||
| `true` | **no** | **Hold-for-review, but no async worker fires.** Submissions land at `status='pending_llm'` and stay there until an admin either provides the key and clicks **Retry review** on `/admin/store/submissions/<id>`, or overrides + publishes the row manually. The entity stays at `visibility_status='pending'` (initial v1) or at the prior approved version (v2+ edits/restores). No silent auto-approval. A loud boot-time warning surfaces the misconfig in the logs. |
|
||||
|
||||
This is the **fail-CLOSED** policy. Before v45 the third row silently
|
||||
auto-approved every upload as a "first-boot sanity" affordance — which
|
||||
also meant a deployment whose operator forgot to set the key published
|
||||
every upload without security review. The split was introduced after a
|
||||
prod incident where an admin uploaded a skill containing a `curl … | sh`
|
||||
exfiltration script and the system happily marked it `approved`.
|
||||
|
||||
---
|
||||
|
||||
|
|
|
|||
|
|
@ -485,6 +485,55 @@ class StoreEntitiesRepository:
|
|||
columns = [d[0] for d in self.conn.description]
|
||||
return self._row_to_dict(columns, rows[0])
|
||||
|
||||
def get_with_version_approvals(
|
||||
self, id: str,
|
||||
) -> Optional[Dict[str, Any]]:
|
||||
"""Same as :meth:`get` but each ``version_history`` entry gets
|
||||
an additional ``submission_status`` field populated from
|
||||
``store_submissions``.
|
||||
|
||||
Used by the detail page + restore endpoint to gate which
|
||||
versions are restorable. Legacy v1 rows created pre-v37 carry
|
||||
``submission_id=None`` (the v1 seed predates the
|
||||
backfill) — those map to ``submission_status=None`` and the
|
||||
consumer treats them as approved (back-compat).
|
||||
"""
|
||||
entity = self.get(id)
|
||||
if entity is None:
|
||||
return None
|
||||
history = list(entity.get("version_history") or [])
|
||||
if not history:
|
||||
return entity
|
||||
sub_ids = [
|
||||
entry.get("submission_id") for entry in history
|
||||
if entry.get("submission_id")
|
||||
]
|
||||
status_by_id: Dict[str, str] = {}
|
||||
if sub_ids:
|
||||
placeholders = ",".join("?" for _ in sub_ids)
|
||||
rows = self.conn.execute(
|
||||
f"SELECT id, status FROM store_submissions "
|
||||
f"WHERE id IN ({placeholders})",
|
||||
sub_ids,
|
||||
).fetchall()
|
||||
for sub_id, status in rows:
|
||||
status_by_id[sub_id] = status
|
||||
# Defensive copy of each history entry before mutating — today
|
||||
# ``self.get()`` re-parses JSON each call so the mutation can't
|
||||
# leak across calls, but copying costs nothing and protects any
|
||||
# future caching layer from carrying the annotated
|
||||
# ``submission_status`` key into a subsequent plain ``get()``.
|
||||
annotated: List[Dict[str, Any]] = []
|
||||
for entry in history:
|
||||
entry = dict(entry)
|
||||
sid = entry.get("submission_id")
|
||||
entry["submission_status"] = (
|
||||
status_by_id.get(sid) if sid else None
|
||||
)
|
||||
annotated.append(entry)
|
||||
entity["version_history"] = annotated
|
||||
return entity
|
||||
|
||||
def get_by_owner_and_name(
|
||||
self,
|
||||
owner_user_id: str,
|
||||
|
|
|
|||
|
|
@ -30,10 +30,14 @@ from .prompts import (
|
|||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Bound the response budget. The schema is small — findings + content_quality
|
||||
# issues typically have 0–3 items each — but allow headroom so the model
|
||||
# doesn't truncate when both lists fire.
|
||||
MAX_RESPONSE_TOKENS = 2500
|
||||
# Bound the response budget. The schema's two arrays (findings +
|
||||
# content_quality.issues) are individually capped at maxItems=20, but
|
||||
# each item is ~120-180 tokens (severity/category/file/explanation/
|
||||
# fix_hint or file/field/issue/hint). A bundle with many weak
|
||||
# descriptions can easily hit 4-5k output tokens. Stay generous on
|
||||
# Haiku/Sonnet — output cost is negligible compared to the cost of a
|
||||
# truncated verdict pinning the submission in `review_error`.
|
||||
MAX_RESPONSE_TOKENS = 6000
|
||||
|
||||
|
||||
def review_bundle(
|
||||
|
|
|
|||
|
|
@ -280,8 +280,11 @@ def run_llm_review(
|
|||
except (TypeError, ValueError):
|
||||
target_version_no = None
|
||||
break
|
||||
# Forward-only promotion. A late verdict landing for
|
||||
# an older submission must NOT demote the live bundle
|
||||
# past a version that was approved more recently.
|
||||
if (target_version_no is not None
|
||||
and target_version_no != int(ent_row.get("version_no") or 0)):
|
||||
and target_version_no > int(ent_row.get("version_no") or 0)):
|
||||
if ents_repo.promote_version(entity_id, target_version_no):
|
||||
try:
|
||||
from app.api.store import _swap_live_to_version
|
||||
|
|
|
|||
|
|
@ -25,6 +25,44 @@ os.makedirs(os.path.join(os.environ["DATA_DIR"], "notifications"), exist_ok=True
|
|||
os.makedirs(os.path.join(os.environ["DATA_DIR"], "state"), exist_ok=True)
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def _flea_guardrails_disabled_by_default(monkeypatch):
|
||||
"""Default flea-market upload pipeline to OFF for every test.
|
||||
|
||||
Post-v45 publish-gate refactor split operator intent
|
||||
(``guardrails.enabled`` in instance.yaml) from provider readiness
|
||||
(``ANTHROPIC_API_KEY`` in env). Both default to True/False in a
|
||||
test env that has no instance.yaml + no key — so the gate is now
|
||||
``enabled=True, ready=False`` and every upload sits at
|
||||
``visibility_status='pending'`` waiting on a non-existent LLM
|
||||
call. That breaks every legacy test that uploads a bundle and
|
||||
expects v1 to be live.
|
||||
|
||||
Default both to False here so legacy tests keep working. Tests
|
||||
that exercise the guardrail-on path override per-test with
|
||||
``monkeypatch.setattr("app.api.store.get_guardrails_enabled",
|
||||
lambda: True)`` + the matching ``..._llm_provider_ready`` line.
|
||||
"""
|
||||
try:
|
||||
# `app.api.store` does a top-level import — patch the bound
|
||||
# symbol there. Existing per-test overrides target the same path.
|
||||
monkeypatch.setattr(
|
||||
"app.api.store.get_guardrails_enabled", lambda: False,
|
||||
)
|
||||
except (AttributeError, ImportError):
|
||||
# app.api.store may not be importable in some test contexts
|
||||
# (e.g. tests that exercise migrations without the full app).
|
||||
pass
|
||||
try:
|
||||
# `app.api.admin` does a function-local import — patch the
|
||||
# source so per-call lookups see the override.
|
||||
monkeypatch.setattr(
|
||||
"app.instance_config.get_guardrails_enabled", lambda: False,
|
||||
)
|
||||
except (AttributeError, ImportError):
|
||||
pass
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def _disable_auth_rate_limit_in_tests():
|
||||
"""Disable the slowapi auth rate limiter for every test by default.
|
||||
|
|
|
|||
|
|
@ -412,6 +412,172 @@ class TestAdminOverride:
|
|||
assert sub["override_reason"].startswith("false positive")
|
||||
conn.close()
|
||||
|
||||
def test_override_v2_edit_promotes_to_current(self, web_client, monkeypatch):
|
||||
"""When an admin overrides a v2+ edit/restore submission, the
|
||||
entity must be promoted to that version — same end state as
|
||||
an LLM auto-approval. Pre-fix the override only flipped
|
||||
visibility, leaving entity.version_no at the prior approved
|
||||
version + live bundle bytes unchanged. Installers kept getting
|
||||
the old version."""
|
||||
from pathlib import Path
|
||||
from app.utils import get_store_dir
|
||||
from src.repositories.store_entities import StoreEntitiesRepository
|
||||
from src.repositories.store_submissions import StoreSubmissionsRepository
|
||||
from src.store_guardrails.runner import run_llm_review
|
||||
|
||||
user_id, user_cookies = _create_user(web_client, "v2over@x.com")
|
||||
|
||||
# Phase 1: clean v1 upload (guardrails off by default in tests) →
|
||||
# entity approved at version_no=1.
|
||||
r = web_client.post(
|
||||
"/api/store/entities",
|
||||
files={"file": ("s.zip", _make_skill_zip("v2over"), "application/zip")},
|
||||
data={"type": "skill",
|
||||
"description": (
|
||||
"Use when verifying admin override on a v2+ edit "
|
||||
"promotes the entity to the overridden version "
|
||||
"across the deferred-promotion path."
|
||||
)},
|
||||
cookies=user_cookies,
|
||||
)
|
||||
assert r.status_code == 201, r.text
|
||||
eid = r.json()["id"]
|
||||
|
||||
# Phase 2: flip guardrails on (LLM mocked to BLOCK), PUT v2.
|
||||
def mock_block(*args, **kwargs):
|
||||
return {
|
||||
"risk_level": "high",
|
||||
"summary": "mock block",
|
||||
"findings": [{"severity": "high", "category": "test",
|
||||
"file": "x", "explanation": "mock"}],
|
||||
"template_placeholders_found": 0,
|
||||
"reviewed_by_model": "mock-model", "error": None,
|
||||
}
|
||||
monkeypatch.setattr(
|
||||
"src.store_guardrails.llm_review.review_bundle", mock_block,
|
||||
)
|
||||
monkeypatch.setattr(
|
||||
"app.api.store.get_guardrails_enabled", lambda: True,
|
||||
)
|
||||
monkeypatch.setattr(
|
||||
"app.api.store.get_guardrails_llm_provider_ready", lambda: True,
|
||||
)
|
||||
|
||||
# Build a v2 zip inline (slightly different body so the hash diverges).
|
||||
import io as _io
|
||||
import zipfile as _zip
|
||||
buf = _io.BytesIO()
|
||||
with _zip.ZipFile(buf, "w") as zf:
|
||||
zf.writestr(
|
||||
"v2over/SKILL.md",
|
||||
"---\nname: v2over\ndescription: "
|
||||
"Use when verifying admin override v2 promote behaviour after edit\n---\n\n"
|
||||
+ ("V2 BODY text that is intentionally different from v1. " * 8),
|
||||
)
|
||||
r = web_client.put(
|
||||
f"/api/store/entities/{eid}",
|
||||
files={"file": ("v2.zip", buf.getvalue(), "application/zip")},
|
||||
cookies=user_cookies,
|
||||
)
|
||||
assert r.status_code == 200, r.text
|
||||
|
||||
# Drive the BG review synchronously so v2 lands at blocked_llm.
|
||||
conn = get_system_db()
|
||||
v2_sub_id = StoreSubmissionsRepository(conn).latest_for_entity(eid)["id"]
|
||||
conn.close()
|
||||
run_llm_review(
|
||||
v2_sub_id,
|
||||
plugin_dir=Path(get_store_dir()) / eid / "versions" / "v2" / "plugin",
|
||||
conn_factory=get_system_db,
|
||||
api_key_loader=lambda: "sk-test",
|
||||
model_loader=lambda: "mock-model",
|
||||
)
|
||||
|
||||
conn = get_system_db()
|
||||
ent_before = StoreEntitiesRepository(conn).get(eid)
|
||||
sub_before = StoreSubmissionsRepository(conn).get(v2_sub_id)
|
||||
conn.close()
|
||||
# Pre-condition: blocked at v2, entity stayed at v1.
|
||||
assert sub_before["status"] == "blocked_llm"
|
||||
assert ent_before["version_no"] == 1
|
||||
v1_hash = ent_before["version"]
|
||||
|
||||
# Phase 3: admin overrides v2. Entity must promote to v2 and
|
||||
# the on-disk live bundle must reflect v2's bytes.
|
||||
_, admin_cookies = _create_admin(web_client)
|
||||
r = web_client.post(
|
||||
f"/api/admin/store/submissions/{v2_sub_id}/override",
|
||||
json={"reason": "false positive — verified clean offline"},
|
||||
cookies=admin_cookies,
|
||||
)
|
||||
assert r.status_code == 200, r.text
|
||||
|
||||
conn = get_system_db()
|
||||
ent_after = StoreEntitiesRepository(conn).get(eid)
|
||||
conn.close()
|
||||
assert ent_after["version_no"] == 2, (
|
||||
f"override must promote entity to v2; got version_no={ent_after['version_no']}"
|
||||
)
|
||||
assert ent_after["version"] != v1_hash, (
|
||||
"entity.version (hash) must move to v2 — stayed at v1 hash"
|
||||
)
|
||||
# Live plugin/ dir must hold v2's bytes (compare to v2 source dir).
|
||||
v2_plugin = Path(get_store_dir()) / eid / "versions" / "v2" / "plugin"
|
||||
live_plugin = Path(get_store_dir()) / eid / "plugin"
|
||||
v2_files = sorted(p.name for p in v2_plugin.rglob("*") if p.is_file())
|
||||
live_files = sorted(p.name for p in live_plugin.rglob("*") if p.is_file())
|
||||
assert v2_files == live_files, (
|
||||
f"live plugin/ must mirror v2 dir after override; "
|
||||
f"v2_files={v2_files} live_files={live_files}"
|
||||
)
|
||||
|
||||
def test_override_v1_initial_upload_no_promote(self, web_client):
|
||||
"""Override on an initial v1 (no prior approved version) must
|
||||
still work: entity already at version_no=1, no promotion needed.
|
||||
Regression guard so the v2+ promote logic doesn't break v1
|
||||
overrides (the audit log used to be the only signal here)."""
|
||||
from src.repositories.store_entities import StoreEntitiesRepository
|
||||
from src.repositories.store_submissions import StoreSubmissionsRepository
|
||||
|
||||
user_id, _ = _create_user(web_client, "v1over@x.com")
|
||||
conn = get_system_db()
|
||||
ents = StoreEntitiesRepository(conn)
|
||||
ents.create(
|
||||
id="ent-v1-over", owner_user_id=user_id, owner_username="v1over",
|
||||
type="skill", name="v1-blocked", description="x" * 40,
|
||||
category=None, version="aaaaaaaaaaaaaaaa", file_size=10,
|
||||
visibility_status="pending",
|
||||
)
|
||||
subs = StoreSubmissionsRepository(conn)
|
||||
sid = subs.create(
|
||||
submitter_id=user_id, submitter_email="v1over@x.com",
|
||||
type="skill", name="v1-blocked", version="aaaaaaaaaaaaaaaa",
|
||||
status="blocked_llm", entity_id="ent-v1-over",
|
||||
llm_findings={"risk_level": "high", "summary": "x"},
|
||||
)
|
||||
# Backfill the v1 history entry submission_id so the promote
|
||||
# loop has a target to find.
|
||||
ents.update_history_submission_id("ent-v1-over", 1, sid)
|
||||
conn.close()
|
||||
|
||||
_, admin_cookies = _create_admin(web_client)
|
||||
r = web_client.post(
|
||||
f"/api/admin/store/submissions/{sid}/override",
|
||||
json={"reason": "false positive — internal-only constants"},
|
||||
cookies=admin_cookies,
|
||||
)
|
||||
assert r.status_code == 200, r.text
|
||||
|
||||
conn = get_system_db()
|
||||
ent = StoreEntitiesRepository(conn).get("ent-v1-over")
|
||||
sub = StoreSubmissionsRepository(conn).get(sid)
|
||||
conn.close()
|
||||
assert ent["visibility_status"] == "approved"
|
||||
assert ent["version_no"] == 1, (
|
||||
"v1 override must NOT trigger phantom promotion"
|
||||
)
|
||||
assert sub["status"] == "overridden"
|
||||
|
||||
def test_override_short_reason_rejected(self, web_client):
|
||||
from src.repositories.store_entities import StoreEntitiesRepository
|
||||
from src.repositories.store_submissions import StoreSubmissionsRepository
|
||||
|
|
@ -781,6 +947,363 @@ class TestAdminRescan:
|
|||
assert r.status_code == 403
|
||||
|
||||
|
||||
class TestAdminRetryReviewsStagedBundle:
|
||||
"""Codex adversarial finding [CRITICAL C1]: admin retry was passing
|
||||
live `plugin/` to the LLM. For a v2+ pending_llm / blocked_llm /
|
||||
review_error submission, live still holds the prior approved
|
||||
version. A retry would review the WRONG bytes and the runner's
|
||||
hash-match promotion would then advance the entity to staged
|
||||
bytes that were never reviewed. Fixed by resolving the staged
|
||||
`versions/v<N>/plugin/` from the submission's version_history
|
||||
entry."""
|
||||
|
||||
def test_retry_v2_blocked_passes_staged_dir_not_live(
|
||||
self, web_client, monkeypatch,
|
||||
):
|
||||
from pathlib import Path
|
||||
from app.utils import get_store_dir
|
||||
from src.repositories.store_entities import StoreEntitiesRepository
|
||||
from src.repositories.store_submissions import StoreSubmissionsRepository
|
||||
|
||||
# v1 clean upload (guardrails off by default in tests).
|
||||
monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-fake-for-retry-test")
|
||||
user_id, user_cookies = _create_user(web_client, "retrystaged@x.com")
|
||||
r = web_client.post(
|
||||
"/api/store/entities",
|
||||
files={"file": ("s.zip", _make_skill_zip("retrystaged"), "application/zip")},
|
||||
data={"type": "skill",
|
||||
"description": (
|
||||
"Use when verifying retry reads staged version "
|
||||
"bundle bytes not live for v2 blocked submissions"
|
||||
)},
|
||||
cookies=user_cookies,
|
||||
)
|
||||
assert r.status_code == 201, r.text
|
||||
eid = r.json()["id"]
|
||||
|
||||
# v2 PUT with guardrails on, LLM mocked to BLOCK → blocked_llm.
|
||||
def mock_block(*a, **kw):
|
||||
return {
|
||||
"risk_level": "high", "summary": "mock block",
|
||||
"findings": [{"severity": "high", "category": "test",
|
||||
"file": "x", "explanation": "mock"}],
|
||||
"template_placeholders_found": 0,
|
||||
"reviewed_by_model": "mock", "error": None,
|
||||
}
|
||||
monkeypatch.setattr(
|
||||
"src.store_guardrails.llm_review.review_bundle", mock_block,
|
||||
)
|
||||
monkeypatch.setattr(
|
||||
"app.api.store.get_guardrails_enabled", lambda: True,
|
||||
)
|
||||
monkeypatch.setattr(
|
||||
"app.api.store.get_guardrails_llm_provider_ready", lambda: True,
|
||||
)
|
||||
import io as _io
|
||||
import zipfile as _zip
|
||||
buf = _io.BytesIO()
|
||||
with _zip.ZipFile(buf, "w") as zf:
|
||||
zf.writestr(
|
||||
"retrystaged/SKILL.md",
|
||||
"---\nname: retrystaged\ndescription: "
|
||||
"Use when verifying that admin retry hits the staged dir bundle bytes for v2 blocked submissions\n---\n\n"
|
||||
+ ("V2-only body text that is intentionally long enough to clear the inline content-quality threshold. " * 4),
|
||||
)
|
||||
r = web_client.put(
|
||||
f"/api/store/entities/{eid}",
|
||||
files={"file": ("v2.zip", buf.getvalue(), "application/zip")},
|
||||
cookies=user_cookies,
|
||||
)
|
||||
assert r.status_code == 200, r.text
|
||||
from src.store_guardrails.runner import run_llm_review
|
||||
conn = get_system_db()
|
||||
sub_id = StoreSubmissionsRepository(conn).latest_for_entity(eid)["id"]
|
||||
conn.close()
|
||||
run_llm_review(
|
||||
sub_id,
|
||||
plugin_dir=Path(get_store_dir()) / eid / "versions" / "v2" / "plugin",
|
||||
conn_factory=get_system_db,
|
||||
api_key_loader=lambda: "sk", model_loader=lambda: "mock",
|
||||
)
|
||||
|
||||
# Capture what review_bundle is called with on retry.
|
||||
seen = {}
|
||||
def spy_review_bundle(plugin_dir, **kw):
|
||||
seen["plugin_dir"] = plugin_dir
|
||||
return {
|
||||
"risk_level": "safe", "summary": "ok",
|
||||
"findings": [], "template_placeholders_found": 0,
|
||||
"reviewed_by_model": "mock", "error": None,
|
||||
"content_quality": {"verdict": "pass", "issues": []},
|
||||
}
|
||||
monkeypatch.setattr(
|
||||
"src.store_guardrails.llm_review.review_bundle", spy_review_bundle,
|
||||
)
|
||||
monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-fake-for-retry-loader")
|
||||
|
||||
# Admin retry.
|
||||
_, admin_cookies = _create_admin(web_client)
|
||||
r = web_client.post(
|
||||
f"/api/admin/store/submissions/{sub_id}/retry",
|
||||
cookies=admin_cookies,
|
||||
)
|
||||
assert r.status_code == 200, r.text
|
||||
|
||||
v2_dir = Path(get_store_dir()) / eid / "versions" / "v2" / "plugin"
|
||||
live_dir = Path(get_store_dir()) / eid / "plugin"
|
||||
assert seen["plugin_dir"] == v2_dir, (
|
||||
f"retry must review STAGED bytes ({v2_dir}); "
|
||||
f"instead reviewed {seen['plugin_dir']} (live={live_dir})"
|
||||
)
|
||||
|
||||
def test_rescan_v2_blocked_passes_staged_dir_not_live(
|
||||
self, web_client, monkeypatch,
|
||||
):
|
||||
"""Same invariant for rescan."""
|
||||
from pathlib import Path
|
||||
from app.utils import get_store_dir
|
||||
from src.repositories.store_submissions import StoreSubmissionsRepository
|
||||
|
||||
monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-fake-for-rescan-test")
|
||||
user_id, user_cookies = _create_user(web_client, "rescanstaged@x.com")
|
||||
r = web_client.post(
|
||||
"/api/store/entities",
|
||||
files={"file": ("s.zip", _make_skill_zip("rescanstaged"), "application/zip")},
|
||||
data={"type": "skill",
|
||||
"description": (
|
||||
"Use when verifying rescan reads staged version "
|
||||
"bundle bytes not live for v2 blocked submissions"
|
||||
)},
|
||||
cookies=user_cookies,
|
||||
)
|
||||
assert r.status_code == 201
|
||||
eid = r.json()["id"]
|
||||
|
||||
def mock_block(*a, **kw):
|
||||
return {
|
||||
"risk_level": "high", "summary": "mock block",
|
||||
"findings": [{"severity": "high", "category": "test",
|
||||
"file": "x", "explanation": "mock"}],
|
||||
"template_placeholders_found": 0,
|
||||
"reviewed_by_model": "mock", "error": None,
|
||||
}
|
||||
monkeypatch.setattr(
|
||||
"src.store_guardrails.llm_review.review_bundle", mock_block,
|
||||
)
|
||||
monkeypatch.setattr(
|
||||
"app.api.store.get_guardrails_enabled", lambda: True,
|
||||
)
|
||||
monkeypatch.setattr(
|
||||
"app.api.store.get_guardrails_llm_provider_ready", lambda: True,
|
||||
)
|
||||
import io as _io
|
||||
import zipfile as _zip
|
||||
buf = _io.BytesIO()
|
||||
with _zip.ZipFile(buf, "w") as zf:
|
||||
zf.writestr(
|
||||
"rescanstaged/SKILL.md",
|
||||
"---\nname: rescanstaged\ndescription: "
|
||||
"Use when verifying that admin rescan hits the staged dir bundle bytes for v2 blocked submissions\n---\n\n"
|
||||
+ ("V2 unique payload body line that is intentionally long enough to clear the inline content-quality threshold. " * 4),
|
||||
)
|
||||
r = web_client.put(
|
||||
f"/api/store/entities/{eid}",
|
||||
files={"file": ("v2.zip", buf.getvalue(), "application/zip")},
|
||||
cookies=user_cookies,
|
||||
)
|
||||
assert r.status_code == 200, r.text
|
||||
from src.store_guardrails.runner import run_llm_review
|
||||
conn = get_system_db()
|
||||
sub_id = StoreSubmissionsRepository(conn).latest_for_entity(eid)["id"]
|
||||
conn.close()
|
||||
run_llm_review(
|
||||
sub_id,
|
||||
plugin_dir=Path(get_store_dir()) / eid / "versions" / "v2" / "plugin",
|
||||
conn_factory=get_system_db,
|
||||
api_key_loader=lambda: "sk", model_loader=lambda: "mock",
|
||||
)
|
||||
|
||||
# Spy on the inline pipeline. admin.py imports
|
||||
# `run_inline_checks` function-locally — patch at the source
|
||||
# module so the lookup at call time sees the spy.
|
||||
seen_inline: list = []
|
||||
from src.store_guardrails import run_inline_checks as orig_run_inline
|
||||
def spy_inline(plugin_dir, **kw):
|
||||
seen_inline.append(plugin_dir)
|
||||
return orig_run_inline(plugin_dir, **kw)
|
||||
monkeypatch.setattr(
|
||||
"src.store_guardrails.run_inline_checks", spy_inline,
|
||||
)
|
||||
monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-fake-for-rescan")
|
||||
|
||||
_, admin_cookies = _create_admin(web_client)
|
||||
r = web_client.post(
|
||||
f"/api/admin/store/submissions/{sub_id}/rescan",
|
||||
cookies=admin_cookies,
|
||||
)
|
||||
assert r.status_code == 200, r.text
|
||||
|
||||
v2_dir = Path(get_store_dir()) / eid / "versions" / "v2" / "plugin"
|
||||
assert v2_dir in seen_inline, (
|
||||
f"rescan must run inline against STAGED bytes ({v2_dir}); "
|
||||
f"got {seen_inline}"
|
||||
)
|
||||
|
||||
|
||||
class TestOverrideForwardOnly:
|
||||
"""Codex adversarial finding [HIGH H1]: override's promote loop
|
||||
used `target != current`, which would happily DEMOTE the live
|
||||
bundle when admin overrode a stale v2 submission while v3 was
|
||||
already approved + live. Forward-only: refuse the promote step
|
||||
when target_n <= current.
|
||||
|
||||
Override of the row still flips status + visibility (admin's
|
||||
intent on the row itself is preserved); the on-disk live bundle
|
||||
just isn't rolled back."""
|
||||
|
||||
def test_override_stale_v2_does_not_demote_when_v3_current(
|
||||
self, web_client, monkeypatch,
|
||||
):
|
||||
from src.repositories.store_entities import StoreEntitiesRepository
|
||||
from src.repositories.store_submissions import StoreSubmissionsRepository
|
||||
|
||||
# v1 clean, v2 PUT (mocked block) → blocked_llm, entity stays v1.
|
||||
# Set a fake API key so the BG task `run_llm_review` (scheduled
|
||||
# by the API after each PUT) can pass `default_api_key_loader`
|
||||
# — it then calls the mocked `review_bundle`, not the network.
|
||||
monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-fake-for-override-test")
|
||||
user_id, user_cookies = _create_user(web_client, "demote@x.com")
|
||||
r = web_client.post(
|
||||
"/api/store/entities",
|
||||
files={"file": ("s.zip", _make_skill_zip("demote"), "application/zip")},
|
||||
data={"type": "skill",
|
||||
"description": (
|
||||
"Use when verifying override never demotes a "
|
||||
"currently-live newer version of the same entity"
|
||||
)},
|
||||
cookies=user_cookies,
|
||||
)
|
||||
assert r.status_code == 201
|
||||
eid = r.json()["id"]
|
||||
|
||||
def mock_block(*a, **kw):
|
||||
return {
|
||||
"risk_level": "high", "summary": "mock block",
|
||||
"findings": [{"severity": "high", "category": "test",
|
||||
"file": "x", "explanation": "mock"}],
|
||||
"template_placeholders_found": 0,
|
||||
"reviewed_by_model": "mock", "error": None,
|
||||
}
|
||||
monkeypatch.setattr(
|
||||
"src.store_guardrails.llm_review.review_bundle", mock_block,
|
||||
)
|
||||
monkeypatch.setattr(
|
||||
"app.api.store.get_guardrails_enabled", lambda: True,
|
||||
)
|
||||
monkeypatch.setattr(
|
||||
"app.api.store.get_guardrails_llm_provider_ready", lambda: True,
|
||||
)
|
||||
|
||||
# PUT v2 → blocked_llm.
|
||||
import io as _io
|
||||
import zipfile as _zip
|
||||
def _v2_zip():
|
||||
buf = _io.BytesIO()
|
||||
with _zip.ZipFile(buf, "w") as zf:
|
||||
zf.writestr(
|
||||
"demote/SKILL.md",
|
||||
"---\nname: demote\ndescription: Use when v2 blocked path is exercised by admin override forward-only tests for the demote-protection invariant\n---\n\n"
|
||||
+ ("V2 BODY content line long enough to clear the inline content-quality threshold for skill bodies. " * 4),
|
||||
)
|
||||
return buf.getvalue()
|
||||
r = web_client.put(
|
||||
f"/api/store/entities/{eid}",
|
||||
files={"file": ("v2.zip", _v2_zip(), "application/zip")},
|
||||
cookies=user_cookies,
|
||||
)
|
||||
assert r.status_code == 200
|
||||
from pathlib import Path
|
||||
from app.utils import get_store_dir
|
||||
from src.store_guardrails.runner import run_llm_review
|
||||
conn = get_system_db()
|
||||
v2_sub_id = StoreSubmissionsRepository(conn).latest_for_entity(eid)["id"]
|
||||
conn.close()
|
||||
run_llm_review(
|
||||
v2_sub_id,
|
||||
plugin_dir=Path(get_store_dir()) / eid / "versions" / "v2" / "plugin",
|
||||
conn_factory=get_system_db,
|
||||
api_key_loader=lambda: "sk", model_loader=lambda: "mock",
|
||||
)
|
||||
|
||||
# PUT v3 with mocked-APPROVE → v3 becomes current.
|
||||
def mock_approve(*a, **kw):
|
||||
return {
|
||||
"risk_level": "safe", "summary": "ok",
|
||||
"findings": [], "template_placeholders_found": 0,
|
||||
"reviewed_by_model": "mock", "error": None,
|
||||
"content_quality": {"verdict": "pass", "issues": []},
|
||||
}
|
||||
monkeypatch.setattr(
|
||||
"src.store_guardrails.llm_review.review_bundle", mock_approve,
|
||||
)
|
||||
def _v3_zip():
|
||||
buf = _io.BytesIO()
|
||||
with _zip.ZipFile(buf, "w") as zf:
|
||||
zf.writestr(
|
||||
"demote/SKILL.md",
|
||||
"---\nname: demote\ndescription: Use when v3 approved path is exercised by admin override forward-only tests for the demote-protection invariant\n---\n\n"
|
||||
+ ("V3 BODY content line long enough to clear the inline content-quality threshold for skill bodies. " * 4),
|
||||
)
|
||||
return buf.getvalue()
|
||||
r = web_client.put(
|
||||
f"/api/store/entities/{eid}",
|
||||
files={"file": ("v3.zip", _v3_zip(), "application/zip")},
|
||||
cookies=user_cookies,
|
||||
)
|
||||
assert r.status_code == 200
|
||||
conn = get_system_db()
|
||||
v3_sub_id = StoreSubmissionsRepository(conn).latest_for_entity(eid)["id"]
|
||||
conn.close()
|
||||
run_llm_review(
|
||||
v3_sub_id,
|
||||
plugin_dir=Path(get_store_dir()) / eid / "versions" / "v3" / "plugin",
|
||||
conn_factory=get_system_db,
|
||||
api_key_loader=lambda: "sk", model_loader=lambda: "mock",
|
||||
)
|
||||
conn = get_system_db()
|
||||
ent_at_v3 = StoreEntitiesRepository(conn).get(eid)
|
||||
conn.close()
|
||||
assert ent_at_v3["version_no"] == 3, (
|
||||
f"v3 must auto-promote to current; got {ent_at_v3['version_no']}"
|
||||
)
|
||||
v3_hash = ent_at_v3["version"]
|
||||
|
||||
# Admin overrides stale v2 (status was blocked_llm). MUST NOT
|
||||
# demote live from v3 back to v2 bytes.
|
||||
_, admin_cookies = _create_admin(web_client)
|
||||
r = web_client.post(
|
||||
f"/api/admin/store/submissions/{v2_sub_id}/override",
|
||||
json={"reason": "cleared in offline review — leaving stale row"},
|
||||
cookies=admin_cookies,
|
||||
)
|
||||
assert r.status_code == 200, r.text
|
||||
|
||||
conn = get_system_db()
|
||||
ent_after = StoreEntitiesRepository(conn).get(eid)
|
||||
v2_sub_after = StoreSubmissionsRepository(conn).get(v2_sub_id)
|
||||
conn.close()
|
||||
assert ent_after["version_no"] == 3, (
|
||||
f"override of stale v2 must NOT demote live; "
|
||||
f"got version_no={ent_after['version_no']} (expected 3)"
|
||||
)
|
||||
assert ent_after["version"] == v3_hash, (
|
||||
"entity.version hash must stay at v3 — got demoted"
|
||||
)
|
||||
# Submission row still flips to overridden (admin intent on the
|
||||
# row itself is preserved; only the on-disk roll-back is gated).
|
||||
assert v2_sub_after["status"] == "overridden"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# v30: Download bundle, Sort by size, Quota
|
||||
# ---------------------------------------------------------------------------
|
||||
|
|
|
|||
|
|
@ -330,8 +330,29 @@ class TestAnthropicExtractor:
|
|||
mock_sleep.assert_called_once_with(2) # INITIAL_BACKOFF_SECONDS
|
||||
|
||||
@patch("connectors.llm.anthropic_provider.anthropic.Anthropic")
|
||||
def test_truncation_raises_format_error(self, mock_client_cls):
|
||||
"""stop_reason='max_tokens' raises LLMFormatError immediately (no recursion)."""
|
||||
def test_truncation_retries_with_doubled_budget(self, mock_client_cls):
|
||||
"""stop_reason='max_tokens' on first attempt retries with 2x budget."""
|
||||
mock_client = MagicMock()
|
||||
mock_client_cls.return_value = mock_client
|
||||
|
||||
truncated = _anthropic_response('{"items": [', stop_reason="max_tokens")
|
||||
success = _anthropic_response('{"items": ["ok"]}')
|
||||
mock_client.messages.create.side_effect = [truncated, success]
|
||||
|
||||
ext = AnthropicExtractor(api_key="sk-ant-test", model="claude-haiku-4-5-20251001")
|
||||
result = ext.extract_json("test", 1024, self.SCHEMA, "test_schema")
|
||||
|
||||
assert result == {"items": ["ok"]}
|
||||
assert mock_client.messages.create.call_count == 2
|
||||
# First call used the caller's budget; second call doubled.
|
||||
first_kwargs = mock_client.messages.create.call_args_list[0].kwargs
|
||||
second_kwargs = mock_client.messages.create.call_args_list[1].kwargs
|
||||
assert first_kwargs["max_tokens"] == 1024
|
||||
assert second_kwargs["max_tokens"] == 2048
|
||||
|
||||
@patch("connectors.llm.anthropic_provider.anthropic.Anthropic")
|
||||
def test_truncation_persistent_raises_format_error(self, mock_client_cls):
|
||||
"""Truncation on every attempt eventually surfaces LLMFormatError."""
|
||||
mock_client = MagicMock()
|
||||
mock_client_cls.return_value = mock_client
|
||||
|
||||
|
|
@ -342,6 +363,13 @@ class TestAnthropicExtractor:
|
|||
with pytest.raises(LLMFormatError, match="truncated"):
|
||||
ext.extract_json("test", 1024, self.SCHEMA, "test_schema")
|
||||
|
||||
# MAX_TRUNCATION_RETRIES=2 → up to 3 calls (initial + 2 retries
|
||||
# at 2x and 4x), bounded by MAX_RETRIES=3 loop slots.
|
||||
assert mock_client.messages.create.call_count == 3
|
||||
# Final call hit the 4x cap.
|
||||
last_kwargs = mock_client.messages.create.call_args_list[-1].kwargs
|
||||
assert last_kwargs["max_tokens"] == 4096
|
||||
|
||||
@patch("connectors.llm.anthropic_provider.anthropic.Anthropic")
|
||||
def test_invalid_json_raises_format_error(self, mock_client_cls):
|
||||
"""Non-JSON response raises LLMFormatError."""
|
||||
|
|
|
|||
|
|
@ -288,6 +288,155 @@ class TestRestoreVersion:
|
|||
)
|
||||
assert r.status_code in (403, 404), r.text
|
||||
|
||||
def test_restore_rejects_blocked_llm_version(self, web_client, monkeypatch):
|
||||
"""A v2 that LLM-blocked sits in version_history with
|
||||
submission.status='blocked_llm'. The restore endpoint must
|
||||
refuse to roll forward from that bundle — defense in depth
|
||||
against the UI being bypassed by direct POST."""
|
||||
# Mock LLM to BLOCK v2.
|
||||
def mock_review_bundle(*args, **kwargs):
|
||||
return {
|
||||
"risk_level": "high",
|
||||
"summary": "mock block",
|
||||
"findings": [{"severity": "high", "category": "test",
|
||||
"file": "x", "explanation": "mock"}],
|
||||
"template_placeholders_found": 0,
|
||||
"reviewed_by_model": "mock-model",
|
||||
"error": None,
|
||||
}
|
||||
monkeypatch.setattr(
|
||||
"src.store_guardrails.llm_review.review_bundle",
|
||||
mock_review_bundle,
|
||||
)
|
||||
|
||||
owner_id, owner_cookies = _create_user(web_client, "blockrestore@x.com")
|
||||
# Phase 1: guardrails OFF — v1 lands approved.
|
||||
eid = _upload_clean(web_client, owner_cookies, name="blockrestore")
|
||||
# Phase 2: guardrails ON → v2 blocked, entity stays at v1.
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_enabled", lambda: True)
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_llm_provider_ready", lambda: True)
|
||||
v2 = _make_skill_zip("blockrestore", body="V2 BODY " * 80)
|
||||
r = web_client.put(
|
||||
f"/api/store/entities/{eid}",
|
||||
files={"file": ("v2.zip", v2, "application/zip")},
|
||||
cookies=owner_cookies,
|
||||
)
|
||||
assert r.status_code == 200, r.text
|
||||
|
||||
# Drive the BG review synchronously.
|
||||
from src.repositories.store_submissions import StoreSubmissionsRepository
|
||||
from src.store_guardrails.runner import run_llm_review
|
||||
conn = get_system_db()
|
||||
sub_id = StoreSubmissionsRepository(conn).latest_for_entity(eid)["id"]
|
||||
conn.close()
|
||||
run_llm_review(
|
||||
sub_id,
|
||||
plugin_dir=Path(get_store_dir()) / eid / "versions" / "v2" / "plugin",
|
||||
conn_factory=get_system_db,
|
||||
api_key_loader=lambda: "sk-test",
|
||||
model_loader=lambda: "claude-haiku-4-5-20251001",
|
||||
)
|
||||
|
||||
# Now POST a restore /versions/2/restore. Must 400 because v2
|
||||
# was never approved.
|
||||
r = web_client.post(
|
||||
f"/api/store/entities/{eid}/versions/2/restore",
|
||||
cookies=owner_cookies,
|
||||
)
|
||||
assert r.status_code == 400, r.text
|
||||
body = r.json()
|
||||
assert body["detail"]["code"] == "version_not_approved"
|
||||
assert body["detail"]["source_status"] == "blocked_llm"
|
||||
|
||||
def test_restore_rejects_review_error_version(self, web_client, monkeypatch):
|
||||
"""Same as blocked_llm but the LLM call errored — the
|
||||
submission row lands at 'review_error' and the version is
|
||||
equally not-approvable."""
|
||||
def mock_review_bundle(*args, **kwargs):
|
||||
return {
|
||||
"risk_level": None,
|
||||
"summary": None,
|
||||
"findings": [],
|
||||
"template_placeholders_found": 0,
|
||||
"reviewed_by_model": "mock-model",
|
||||
"error": "LLMFormatError: mock truncation",
|
||||
}
|
||||
monkeypatch.setattr(
|
||||
"src.store_guardrails.llm_review.review_bundle",
|
||||
mock_review_bundle,
|
||||
)
|
||||
|
||||
owner_id, owner_cookies = _create_user(web_client, "errrestore@x.com")
|
||||
eid = _upload_clean(web_client, owner_cookies, name="errrestore")
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_enabled", lambda: True)
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_llm_provider_ready", lambda: True)
|
||||
v2 = _make_skill_zip("errrestore", body="V2 BODY " * 80)
|
||||
r = web_client.put(
|
||||
f"/api/store/entities/{eid}",
|
||||
files={"file": ("v2.zip", v2, "application/zip")},
|
||||
cookies=owner_cookies,
|
||||
)
|
||||
assert r.status_code == 200, r.text
|
||||
|
||||
from src.repositories.store_submissions import StoreSubmissionsRepository
|
||||
from src.store_guardrails.runner import run_llm_review
|
||||
conn = get_system_db()
|
||||
sub_id = StoreSubmissionsRepository(conn).latest_for_entity(eid)["id"]
|
||||
conn.close()
|
||||
run_llm_review(
|
||||
sub_id,
|
||||
plugin_dir=Path(get_store_dir()) / eid / "versions" / "v2" / "plugin",
|
||||
conn_factory=get_system_db,
|
||||
api_key_loader=lambda: "sk-test",
|
||||
model_loader=lambda: "claude-haiku-4-5-20251001",
|
||||
)
|
||||
|
||||
r = web_client.post(
|
||||
f"/api/store/entities/{eid}/versions/2/restore",
|
||||
cookies=owner_cookies,
|
||||
)
|
||||
assert r.status_code == 400, r.text
|
||||
body = r.json()
|
||||
assert body["detail"]["code"] == "version_not_approved"
|
||||
assert body["detail"]["source_status"] == "review_error"
|
||||
|
||||
def test_restore_allows_legacy_v1_without_submission_id(self, web_client):
|
||||
"""The v1 seed entry created by ``StoreEntitiesRepository.create``
|
||||
carries ``submission_id=None`` until the API layer backfills.
|
||||
A restore targeting v1 must NOT be rejected just because the
|
||||
join can't find a submission status — back-compat for entities
|
||||
created before v37."""
|
||||
owner_id, owner_cookies = _create_user(web_client, "legv1@x.com")
|
||||
eid = _upload_clean(web_client, owner_cookies, name="legv1")
|
||||
# Manually clear v1's submission_id to simulate the legacy seed.
|
||||
conn = get_system_db()
|
||||
repo = StoreEntitiesRepository(conn)
|
||||
ent = repo.get(eid)
|
||||
history = ent["version_history"]
|
||||
history[0]["submission_id"] = None
|
||||
conn.execute(
|
||||
"UPDATE store_entities SET version_history = ? WHERE id = ?",
|
||||
[json.dumps(history), eid],
|
||||
)
|
||||
conn.close()
|
||||
|
||||
# PUT a v2 so v1 is no longer current.
|
||||
v2 = _make_skill_zip("legv1", body="V2 BODY " * 80)
|
||||
r = web_client.put(
|
||||
f"/api/store/entities/{eid}",
|
||||
files={"file": ("v2.zip", v2, "application/zip")},
|
||||
cookies=owner_cookies,
|
||||
)
|
||||
assert r.status_code == 200, r.text
|
||||
|
||||
# Restore v1 → must succeed (200), because legacy v1 has
|
||||
# submission_id=None which the guard treats as approved.
|
||||
r = web_client.post(
|
||||
f"/api/store/entities/{eid}/versions/1/restore",
|
||||
cookies=owner_cookies,
|
||||
)
|
||||
assert r.status_code == 200, r.text
|
||||
|
||||
|
||||
class TestEditPage:
|
||||
def test_edit_page_renders_for_owner(self, web_client):
|
||||
|
|
@ -386,6 +535,7 @@ class TestInstallerAlwaysGetsLatestApproved:
|
|||
# import get_guardrails_enabled` binds the symbol into app.api.store,
|
||||
# so patching the source module isn't enough.
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_enabled", lambda: True)
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_llm_provider_ready", lambda: True)
|
||||
|
||||
# PUT a new bundle. Guardrails on → submission lands at
|
||||
# pending_llm; promotion deferred.
|
||||
|
|
@ -461,6 +611,7 @@ class TestInstallerAlwaysGetsLatestApproved:
|
|||
# import get_guardrails_enabled` binds the symbol into app.api.store,
|
||||
# so patching the source module isn't enough.
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_enabled", lambda: True)
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_llm_provider_ready", lambda: True)
|
||||
conn = get_system_db()
|
||||
v1_hash = StoreEntitiesRepository(conn).get(eid)["version"]
|
||||
conn.close()
|
||||
|
|
@ -600,6 +751,7 @@ class TestRestoreDeferredPromotion:
|
|||
)
|
||||
# v2 promoted (guardrails off). Now flip on for the restore.
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_enabled", lambda: True)
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_llm_provider_ready", lambda: True)
|
||||
monkeypatch.setattr(
|
||||
"app.api.store._schedule_llm_review",
|
||||
lambda *a, **kw: None,
|
||||
|
|
@ -626,6 +778,91 @@ class TestEditPageBanner:
|
|||
"""Detail page banner during pending edit review must surface
|
||||
the version under review + the prior version still serving."""
|
||||
|
||||
def test_banner_shows_review_error_when_prior_version_still_serving(
|
||||
self, web_client, monkeypatch,
|
||||
):
|
||||
"""v2+ edit landing in review_error must surface a banner to
|
||||
owner/admin even though entity stays at visibility=approved.
|
||||
The original gate (visibility != approved) silently hid the
|
||||
failure — see Bug #2 in plan
|
||||
when-i-submitted-new-delightful-russell.md."""
|
||||
from src.repositories.store_submissions import StoreSubmissionsRepository
|
||||
|
||||
# Mock LLM to ERROR on v2.
|
||||
def mock_review_bundle(*args, **kwargs):
|
||||
return {
|
||||
"risk_level": None,
|
||||
"summary": None,
|
||||
"findings": [],
|
||||
"template_placeholders_found": 0,
|
||||
"reviewed_by_model": "mock-model",
|
||||
"error": "LLMFormatError: mock truncation",
|
||||
}
|
||||
monkeypatch.setattr(
|
||||
"src.store_guardrails.llm_review.review_bundle",
|
||||
mock_review_bundle,
|
||||
)
|
||||
|
||||
owner_id, owner_cookies = _create_user(web_client, "errbanner@x.com")
|
||||
eid = _upload_clean(web_client, owner_cookies, name="errbanner")
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_enabled", lambda: True)
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_llm_provider_ready", lambda: True)
|
||||
|
||||
v2 = _make_skill_zip("errbanner", body="V2 BODY " * 80)
|
||||
r = web_client.put(
|
||||
f"/api/store/entities/{eid}",
|
||||
files={"file": ("v2.zip", v2, "application/zip")},
|
||||
cookies=owner_cookies,
|
||||
)
|
||||
assert r.status_code == 200, r.text
|
||||
|
||||
# Drive the BG review synchronously so the submission row
|
||||
# lands at review_error.
|
||||
from src.store_guardrails.runner import run_llm_review
|
||||
conn = get_system_db()
|
||||
sub_id = StoreSubmissionsRepository(conn).latest_for_entity(eid)["id"]
|
||||
ent = StoreEntitiesRepository(conn).get(eid)
|
||||
conn.close()
|
||||
assert ent["visibility_status"] == "approved", (
|
||||
"deferred promotion: entity stays approved at prior version"
|
||||
)
|
||||
run_llm_review(
|
||||
sub_id,
|
||||
plugin_dir=Path(get_store_dir()) / eid / "versions" / "v2" / "plugin",
|
||||
conn_factory=get_system_db,
|
||||
api_key_loader=lambda: "sk-test",
|
||||
model_loader=lambda: "claude-haiku-4-5-20251001",
|
||||
)
|
||||
|
||||
# Confirm the submission really landed at review_error.
|
||||
conn = get_system_db()
|
||||
sub = StoreSubmissionsRepository(conn).get(sub_id)
|
||||
ent = StoreEntitiesRepository(conn).get(eid)
|
||||
conn.close()
|
||||
assert sub["status"] == "review_error"
|
||||
assert ent["visibility_status"] == "approved", (
|
||||
"entity must remain approved — prior version still serving"
|
||||
)
|
||||
|
||||
# Detail page MUST surface the failure.
|
||||
r = web_client.get(
|
||||
f"/marketplace/flea/{eid}", cookies=owner_cookies,
|
||||
)
|
||||
assert r.status_code == 200
|
||||
body = r.text
|
||||
# The widened v2+ review_error copy mentions the prior version
|
||||
# still serving — that's the user-visible signal we just added.
|
||||
assert "Latest edit failed review" in body, (
|
||||
"banner partial must render review_error H3 for v2+ edit "
|
||||
"when prior version still serves"
|
||||
)
|
||||
assert "previously approved version (v1)" in body, (
|
||||
"banner copy must explain why the entity still appears live"
|
||||
)
|
||||
# The model's error string must reach the page so the owner
|
||||
# can see what went wrong.
|
||||
assert "LLMFormatError" in body
|
||||
|
||||
def test_banner_shows_version_n_under_review(self, web_client, monkeypatch):
|
||||
from src.repositories.store_submissions import StoreSubmissionsRepository
|
||||
owner_id, owner_cookies = _create_user(web_client, "bannerowner@x.com")
|
||||
|
|
@ -633,6 +870,7 @@ class TestEditPageBanner:
|
|||
|
||||
# Switch guardrails on; stub LLM scheduler so v2 stays pending.
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_enabled", lambda: True)
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_llm_provider_ready", lambda: True)
|
||||
monkeypatch.setattr(
|
||||
"app.api.store._schedule_llm_review",
|
||||
lambda *a, **kw: None,
|
||||
|
|
@ -742,6 +980,7 @@ class TestPRReviewFixes:
|
|||
|
||||
# Switch guardrails on for the edit so promotion defers.
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_enabled", lambda: True)
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_llm_provider_ready", lambda: True)
|
||||
monkeypatch.setattr(
|
||||
"app.api.store._schedule_llm_review",
|
||||
lambda *a, **kw: None,
|
||||
|
|
@ -789,6 +1028,7 @@ class TestPRReviewFixes:
|
|||
|
||||
# Enable guardrails so promotion defers.
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_enabled", lambda: True)
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_llm_provider_ready", lambda: True)
|
||||
monkeypatch.setattr(
|
||||
"app.api.store._schedule_llm_review",
|
||||
lambda *a, **kw: None,
|
||||
|
|
@ -840,6 +1080,7 @@ class TestPRReviewFixes:
|
|||
# Enable guardrails for the edit. Stub LLM scheduler so we
|
||||
# control when the runner fires.
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_enabled", lambda: True)
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_llm_provider_ready", lambda: True)
|
||||
monkeypatch.setattr(
|
||||
"app.api.store._schedule_llm_review",
|
||||
lambda *a, **kw: None,
|
||||
|
|
@ -908,6 +1149,7 @@ class TestPRReviewFixes:
|
|||
eid = _upload_clean(web_client, owner_cookies, name="archmid")
|
||||
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_enabled", lambda: True)
|
||||
monkeypatch.setattr("app.api.store.get_guardrails_llm_provider_ready", lambda: True)
|
||||
monkeypatch.setattr(
|
||||
"app.api.store._schedule_llm_review",
|
||||
lambda *a, **kw: None,
|
||||
|
|
@ -1005,3 +1247,164 @@ class TestAdminQueueShowsVersion:
|
|||
assert "<dt>Version</dt>" in body
|
||||
assert "<dt>Bundle hash</dt>" in body
|
||||
assert "v1" in body
|
||||
|
||||
|
||||
class TestPublishGateFailClosed:
|
||||
"""Hold-for-review when ``guardrails.enabled: true`` but no LLM
|
||||
provider credentials are present in env. The pre-v45 fall-back
|
||||
silently auto-approved every upload — a fail-OPEN hole the
|
||||
operator couldn't notice. New behavior: submissions sit at
|
||||
``pending_llm``, entity stays at ``visibility_status='pending'``,
|
||||
admin retries from /admin/store/submissions after providing
|
||||
credentials."""
|
||||
|
||||
def test_v1_upload_enabled_but_not_ready_holds_at_pending(
|
||||
self, web_client, monkeypatch,
|
||||
):
|
||||
from src.repositories.store_submissions import StoreSubmissionsRepository
|
||||
# Flip guardrails ON but leave provider_ready as False.
|
||||
monkeypatch.setattr(
|
||||
"app.api.store.get_guardrails_enabled", lambda: True,
|
||||
)
|
||||
monkeypatch.setattr(
|
||||
"app.api.store.get_guardrails_llm_provider_ready", lambda: False,
|
||||
)
|
||||
# No mock review_bundle — we should never call the LLM.
|
||||
# If we did, the lack of patching would surface as a real
|
||||
# network call attempt, easy to catch as a hang.
|
||||
|
||||
owner_id, owner_cookies = _create_user(web_client, "holdv1@x.com")
|
||||
eid = _upload_clean(web_client, owner_cookies, name="holdv1")
|
||||
|
||||
conn = get_system_db()
|
||||
ent = StoreEntitiesRepository(conn).get(eid)
|
||||
sub = StoreSubmissionsRepository(conn).latest_for_entity(eid)
|
||||
conn.close()
|
||||
assert ent["visibility_status"] == "pending", (
|
||||
"enabled-but-not-ready must NOT publish — entity stays pending"
|
||||
)
|
||||
assert sub["status"] == "pending_llm", (
|
||||
"submission must hold at pending_llm awaiting admin retry"
|
||||
)
|
||||
assert sub["llm_findings"] is None, (
|
||||
"no LLM call was made — findings must be empty"
|
||||
)
|
||||
|
||||
def test_admin_retry_pending_llm_fires_review(
|
||||
self, web_client, monkeypatch,
|
||||
):
|
||||
"""After the operator sets the API key, admin Retry-review on a
|
||||
held pending_llm row schedules + runs the LLM."""
|
||||
from src.repositories.store_submissions import StoreSubmissionsRepository
|
||||
|
||||
# Phase 1: upload with provider not-ready → held at pending_llm.
|
||||
monkeypatch.setattr(
|
||||
"app.api.store.get_guardrails_enabled", lambda: True,
|
||||
)
|
||||
monkeypatch.setattr(
|
||||
"app.api.store.get_guardrails_llm_provider_ready", lambda: False,
|
||||
)
|
||||
owner_id, owner_cookies = _create_user(web_client, "retryholder@x.com")
|
||||
eid = _upload_clean(web_client, owner_cookies, name="retryholder")
|
||||
|
||||
conn = get_system_db()
|
||||
sub_id = StoreSubmissionsRepository(conn).latest_for_entity(eid)["id"]
|
||||
conn.close()
|
||||
|
||||
# Phase 2: operator adds credentials, admin retries.
|
||||
# Inject a fake env var so default_api_key_loader doesn't raise
|
||||
# before the mock review_bundle runs.
|
||||
monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-test-fake-key-for-retry")
|
||||
# Mock review_bundle so the retry resolves to approved without
|
||||
# touching the network.
|
||||
def mock_review_bundle(*args, **kwargs):
|
||||
return {
|
||||
"risk_level": "safe", "summary": "ok",
|
||||
"findings": [], "template_placeholders_found": 0,
|
||||
"reviewed_by_model": "mock-model", "error": None,
|
||||
"content_quality": {"verdict": "pass", "issues": []},
|
||||
}
|
||||
monkeypatch.setattr(
|
||||
"src.store_guardrails.llm_review.review_bundle",
|
||||
mock_review_bundle,
|
||||
)
|
||||
|
||||
_, admin_cookies = _create_admin(web_client)
|
||||
r = web_client.post(
|
||||
f"/api/admin/store/submissions/{sub_id}/retry",
|
||||
cookies=admin_cookies,
|
||||
)
|
||||
assert r.status_code == 200, r.text
|
||||
# After retry, BG task runs synchronously in TestClient (it
|
||||
# blocks the response). Verify the row moved to approved.
|
||||
conn = get_system_db()
|
||||
sub = StoreSubmissionsRepository(conn).get(sub_id)
|
||||
ent = StoreEntitiesRepository(conn).get(eid)
|
||||
conn.close()
|
||||
assert sub["status"] == "approved", (
|
||||
f"retry must drive submission to approved; got {sub['status']}"
|
||||
)
|
||||
assert ent["visibility_status"] == "approved", (
|
||||
"entity must flip to approved after LLM ok"
|
||||
)
|
||||
|
||||
def test_edit_enabled_but_not_ready_holds_prior_serving(
|
||||
self, web_client, monkeypatch,
|
||||
):
|
||||
"""v2+ edit under enabled-but-not-ready: v1 keeps serving,
|
||||
v2 submission held at pending_llm. Critical safety property:
|
||||
no silent promotion."""
|
||||
from src.repositories.store_submissions import StoreSubmissionsRepository
|
||||
# Initial upload runs with guardrails OFF (autouse default) →
|
||||
# v1 approved. Then flip to enabled-but-not-ready for PUT.
|
||||
owner_id, owner_cookies = _create_user(web_client, "holdedit@x.com")
|
||||
eid = _upload_clean(web_client, owner_cookies, name="holdedit")
|
||||
conn = get_system_db()
|
||||
v1_hash = StoreEntitiesRepository(conn).get(eid)["version"]
|
||||
conn.close()
|
||||
|
||||
monkeypatch.setattr(
|
||||
"app.api.store.get_guardrails_enabled", lambda: True,
|
||||
)
|
||||
monkeypatch.setattr(
|
||||
"app.api.store.get_guardrails_llm_provider_ready", lambda: False,
|
||||
)
|
||||
v2 = _make_skill_zip("holdedit", body="V2 BODY " * 80)
|
||||
r = web_client.put(
|
||||
f"/api/store/entities/{eid}",
|
||||
files={"file": ("v2.zip", v2, "application/zip")},
|
||||
cookies=owner_cookies,
|
||||
)
|
||||
assert r.status_code == 200, r.text
|
||||
|
||||
conn = get_system_db()
|
||||
ent = StoreEntitiesRepository(conn).get(eid)
|
||||
sub = StoreSubmissionsRepository(conn).latest_for_entity(eid)
|
||||
conn.close()
|
||||
# Entity stays approved at v1, v2 sits at pending_llm.
|
||||
assert ent["visibility_status"] == "approved"
|
||||
assert ent["version_no"] == 1
|
||||
assert ent["version"] == v1_hash, (
|
||||
"live bundle must remain v1 — no silent promotion of v2"
|
||||
)
|
||||
assert sub["status"] == "pending_llm"
|
||||
assert sub["llm_findings"] is None
|
||||
|
||||
def test_disabled_intent_still_auto_approves(
|
||||
self, web_client, monkeypatch,
|
||||
):
|
||||
"""Operator explicitly opting out (``enabled: false``) keeps
|
||||
the prior auto-approve behavior — local dev / no-LLM
|
||||
deployments aren't blocked."""
|
||||
from src.repositories.store_submissions import StoreSubmissionsRepository
|
||||
# autouse fixture already sets enabled=False. Just confirm
|
||||
# behavior end-to-end.
|
||||
owner_id, owner_cookies = _create_user(web_client, "offowner@x.com")
|
||||
eid = _upload_clean(web_client, owner_cookies, name="offowner")
|
||||
|
||||
conn = get_system_db()
|
||||
ent = StoreEntitiesRepository(conn).get(eid)
|
||||
sub = StoreSubmissionsRepository(conn).latest_for_entity(eid)
|
||||
conn.close()
|
||||
assert ent["visibility_status"] == "approved"
|
||||
assert sub["status"] == "approved"
|
||||
|
|
|
|||
Loading…
Reference in a new issue