From cd030287766d65d9a1349ed3f1ee59bc3d681c9e Mon Sep 17 00:00:00 2001 From: Vojtech <119944107+cvrysanek@users.noreply.github.com> Date: Sat, 16 May 2026 09:12:29 +0400 Subject: [PATCH] fix(store): restore reuses prior approved verdict + admin detail surfaces content_quality (#332) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * fix(store): restore reuses prior approved verdict; admin detail surfaces content_quality Live bug on agnes-development: entity 6ba2ee1d…'s v5 submission (third restore of v1, byte-identical to v1/v2/v4/v6) landed `blocked_llm` while the other identical-hash siblings landed `approved`. Anthropic structured output is non-deterministic — same bytes flipped `content_quality.verdict` pass↔fail across calls. Admin detail page made the failure look mysterious: only security-findings table rendered, so a content-quality-only block showed up as "No findings — model verdict was clean". Two fixes: 1. Restore endpoint reuses a prior `approved` submission's verdict when the restored bundle hash matches an existing history entry AND `reviewed_by_model` matches. Skips the LLM call, stamps the new submission with the prior verdict + `reused_from_submission_id` marker. Deterministic + saves Anthropic tokens. Gated on schedule_async_llm so guardrails-off keeps its existing path. 2. Admin detail template now renders `content_quality.issues` in its own table + adds an explicit "Blocked but no findings recorded" notice for the transient-non-determinism case + surfaces the reuse marker when present. Reuse falls back to a real LLM call when: - prior submission's reviewed_by_model doesn't match current (admin upgraded tier Haiku → Sonnet → Opus) - prior submission was guardrails-off (no reviewed_by_model) - no history entry has matching hash Tests: - TestRestoreReusesApprovedVerdict::test_restore_of_approved_version_skips_llm_and_reuses_verdict - TestRestoreReusesApprovedVerdict::test_restore_legacy_v1_falls_back_to_llm * fix(store): admin detail v# by submission_id + version switcher Three related fixes surfaced live by a user inspecting submission 47bbc1f5… on localhost where v# rendered as v1 even though current was v10. 1. Admin queue + admin detail derive submission v# by submission_id instead of hash. Pre-fix the loop matched first hash-equal entry in version_history — always v1 when bundles were byte-identical (which is the common case after the restore-reuse path). Two call sites updated: - `src/repositories/store_submissions.py:list_for_admin` (queue v# column) - `app/web/router.py:admin_store_submission_detail_page` (detail page v# chip on each section header) Same fix pattern as PR #330 for runner / override. 2. New version-switcher card on admin detail page lists every submission linked to the entity with status + reviewed_by_model + click-to-jump. Solves the user's secondary ask ("should be a way to switch different versions on the submission detail"). 3. Initial POST now backfills the v1 seed entry's submission_id right after creating the v1 submission. The helper `update_history_submission_id` existed but no production code path called it — so v1 always had submission_id=None and every "find v# for submission" lookup silently failed for v1. 171 tests green on touched surface. * release: 0.54.24 — restore reuses prior approved verdict + admin detail content_quality + v# by submission_id (Codex/Live follow-up to #330/#331) --------- Co-authored-by: ZdenekSrotyr --- CHANGELOG.md | 46 ++++ app/api/store.py | 155 +++++++++++++ app/web/router.py | 60 ++++- .../admin_store_submission_detail.html | 107 ++++++++- pyproject.toml | 2 +- src/repositories/store_submissions.py | 10 +- tests/test_store_entity_versions.py | 218 ++++++++++++++++++ 7 files changed, 584 insertions(+), 14 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 24487ca..6818b62 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,6 +10,52 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C ## [Unreleased] +## [0.54.24] — 2026-05-16 + +### Fixed +- Flea-market admin submissions UI now derives the per-submission + `v#` label by **submission_id**, not **hash**. Hash-based lookup + mislabeled every byte-identical reupload (and every reused-verdict + restore — common after the restore-reuse fix below) as `v1` + because the loop picked the FIRST history entry with matching + hash. Affected both the admin queue column (`v#`) and the per- + section chips on the detail page. Same fix pattern as PR #330 + (runner / override paths). +- Flea-market admin submission detail page gained a version-switcher + card listing every submission linked to the same entity with + status badge + reviewed_by_model + click-to-jump. Lets admins + compare verdicts across versions without bouncing back to the + queue. +- Flea-market initial POST now backfills the v1 seed entry's + `submission_id` immediately after creating the v1 submission row. + Pre-fix the v1 history entry always carried `submission_id=None` + so downstream lookups (`_version_no_for_submission`, admin queue + v#, admin detail chip, restore-reuse) silently failed for v1. +- Flea-market restore endpoint now reuses the prior approved + submission's LLM verdict when the restored bundle is + byte-identical to a history entry already reviewed by the same + `review_model`. Pre-fix every restore re-ran the LLM; Anthropic + structured output is non-deterministic — same bytes flipped + `content_quality.verdict` pass↔fail across calls, so a second + restore of an already-approved version could spuriously land at + `blocked_llm`. Reuse skips the LLM, stamps the new submission + with the prior verdict + `reused_from_submission_id` marker, + and saves the Anthropic token cost. Surfaced live on a + development deployment where the third restore of a v1 bundle + (same hash as v1/v2/v4/v6 — multiple identical re-uploads) + landed `blocked_llm` while sibling submissions were `approved`. +- Admin submission detail page now surfaces + `llm_findings.content_quality.issues` in its own table next to + the security-findings table. Pre-fix the template only rendered + security findings, so a submission blocked purely on + `content_quality.verdict='fail'` (no security findings) showed + up as "No findings — model verdict was clean" even though + `status='blocked_llm'`. Also adds an explicit + "Blocked but no findings recorded" notice when the verdict is + blocked but neither findings list is populated (transient LLM + non-determinism), pointing admin at Rescan / Override. Reuse + markers (`reused_from_submission_id`) render too. + ## [0.54.23] — 2026-05-16 ### Fixed diff --git a/app/api/store.py b/app/api/store.py index effcc7d..2cfbb6e 100644 --- a/app/api/store.py +++ b/app/api/store.py @@ -51,6 +51,7 @@ from app.auth.dependencies import _get_db, get_current_user from app.instance_config import ( get_guardrails_enabled, get_guardrails_llm_provider_ready, + get_guardrails_review_model, ) from app.utils import get_store_dir from src.db import get_system_db @@ -431,6 +432,58 @@ async def _hold_entity_write_lock(entity_id: str): yield +def _find_reusable_approved_verdict( + entity_row: Dict[str, Any], + new_version_hash: str, + subs_repo, + current_review_model: str, +) -> Optional[tuple]: + """Find a prior `approved` submission whose bundle hash matches + the new submission's hash and whose reviewer was the same model. + + Returns ``(prior_submission_id, prior_llm_findings)`` for the + caller to reuse, or ``None`` if no eligible prior verdict exists. + + Rationale: + Anthropic structured output is non-deterministic. The + content_quality.verdict in particular can flip pass↔fail on + byte-identical bundles (observed live on a development + deployment: across five identical-hash submissions of one + entity, four landed `approved` and one landed `blocked_llm` + because the model flagged a description as weak on that one + call). When the user restores an already-approved version + — or re-uploads byte-identical bundles — the previous verdict + is the authoritative one and should be reused. + + Gated on ``reviewed_by_model`` match so a stricter model can + still re-review under tightened rules (admin upgrades from + Haiku → Sonnet → Opus). + """ + for entry in (entity_row.get("version_history") or []): + if entry.get("hash") != new_version_hash: + continue + sid = entry.get("submission_id") + if not sid: + continue + try: + sub = subs_repo.get(sid) + except Exception: + continue + if not sub: + continue + if sub.get("status") != "approved": + continue + # Skip rows that pre-date the LLM tier (guardrails-off prior + # approvals carry no reviewed_by_model — caller still needs + # the LLM to validate the bundle under guardrails-on rules). + if not sub.get("reviewed_by_model"): + continue + if sub.get("reviewed_by_model") != current_review_model: + continue + return sid, sub.get("llm_findings") or {} + return None + + def _version_no_for_submission( entity_row: Dict[str, Any], submission_id: str, ) -> Optional[int]: @@ -1495,6 +1548,15 @@ async def create_entity( file_size=bundle_meta.file_size, bundle_sha256=bundle_meta.sha256, ) + # Backfill the v1 seed's submission_id so downstream lookups + # (`_version_no_for_submission`, admin queue v#, admin detail + # v# chip, restore reuse) can resolve v1 the same way they + # resolve v2+. Pre-fix this was deferred to a later + # `update_history_submission_id` call that never landed in + # the create path — leaving the v1 entry with + # `submission_id=None` and every "find v# for this submission" + # lookup silently failing for v1. + repo.update_history_submission_id(entity_id, 1, sub_id) _audit( conn, user["id"], "store.submission.accepted" if guardrails_on else "store.submission.approved", @@ -2136,6 +2198,99 @@ async def _restore_version_locked( hold_for_review = guardrails_enabled schedule_async_llm = guardrails_enabled and provider_ready subs_repo = StoreSubmissionsRepository(conn) + + # Idempotent restore: when the restored bundle is byte-identical + # to a prior `approved` submission reviewed by the SAME model, + # reuse that verdict and skip the async LLM step. Anthropic + # structured output is non-deterministic — same bytes can flip + # content_quality.verdict pass↔fail across calls, so a second + # restore of an already-approved version could spuriously block. + # Reuse keeps the chain deterministic + saves tokens. + # `restore` (and its byte-for-byte recovery contract) is the + # most natural place for this; PUT could extend later. Gated on + # `schedule_async_llm` so guardrails-off keeps its existing + # auto-approve path and enabled-but-not-ready doesn't silently + # promote without a verdict in env. + reuse_verdict = None + if schedule_async_llm: + try: + current_model = get_guardrails_review_model() + reuse_verdict = _find_reusable_approved_verdict( + entity, new_version, subs_repo, current_model, + ) + except Exception: + logger.exception( + "restore: reusable-verdict lookup failed for entity %s", + entity_id, + ) + reuse_verdict = None + + if reuse_verdict is not None: + prior_sid, prior_findings = reuse_verdict + reused_findings = dict(prior_findings) + reused_findings["reused_from_submission_id"] = prior_sid + reused_findings["reused_reason"] = ( + "byte-identical bundle to a prior approved submission; " + "skipped re-running LLM review to avoid spurious " + "content_quality flips (Anthropic structured output is " + "non-deterministic)" + ) + sub_id = subs_repo.create( + submitter_id=user["id"], + submitter_email=user.get("email"), + type=entity["type"], + name=entity["name"], + version=new_version, + status="approved", + entity_id=entity_id, + inline_checks=inline.to_response_dict(), + llm_findings=reused_findings, + file_size=target_meta.file_size, + bundle_sha256=target_meta.sha256, + ) + # Stamp reviewed_by_model + llm_findings via update_status + # (create doesn't accept reviewed_by_model). Terminal-state + # CAS guard allows this since the row was just inserted at + # 'approved' — we're re-affirming, not clobbering. + subs_repo.update_status( + sub_id, status="approved", + llm_findings=reused_findings, + reviewed_by_model=current_model, + allow_terminal_overwrite=True, + ) + appended_n = repo.append_version_history( + entity_id, + version_hash=new_version, + sha256=target_meta.sha256, + size=target_meta.file_size, + submission_id=sub_id, + created_by=user["id"], + ) + _audit( + conn, user["id"], + "store.submission.reused_approved_verdict", + f"store_submission:{sub_id}", + { + "entity_id": entity_id, + "restored_from_version_no": version_no, + "new_version_no": appended_n, + "reused_from_submission_id": prior_sid, + "reviewed_by_model": current_model, + }, + ) + _audit( + conn, user["id"], "store.entity.restore", entity_id, + {"restored_from_version_no": version_no, + "new_version_no": appended_n, + "submission_id": sub_id, + "reused_verdict": True}, + ) + # Forward-only promote (same gate as runner / override). + if appended_n > int(entity.get("version_no") or 0): + promote_to_version(entity_id, appended_n, repo) + _invalidate_etag() + return _entity_to_response(conn, repo.get(entity_id)) # type: ignore[arg-type] + sub_id = subs_repo.create( submitter_id=user["id"], submitter_email=user.get("email"), diff --git a/app/web/router.py b/app/web/router.py index 32f6da4..8064db7 100644 --- a/app/web/router.py +++ b/app/web/router.py @@ -2061,23 +2061,64 @@ async def admin_store_submission_detail_page( # Verdict (sub.status) is immutable forensic record; lifecycle # (entity.visibility_status) reflects current state — see plan # "Admin Submissions Filter: Use Entity Visibility, Not Denormalized Status". - # Also derive submission_version_no by matching sub.version (hash) - # against the entity's version_history (v37 edit feature). + # Resolve THIS submission's version_no via submission_id (NOT + # hash — multiple history entries can share a hash when the user + # re-uploads byte-identical bundles, and the hash-match-first-wins + # loop always picked v1, mislabeling every reupload as v1). Same + # fix as PR #330 for the runner / override paths; we missed this + # display site at the time. entity_visibility_status = None entity_version_no = None submission_version_no = None + sibling_submissions: list = [] if sub.get("entity_id"): ent = StoreEntitiesRepository(conn).get(sub["entity_id"]) if ent: entity_visibility_status = ent.get("visibility_status") entity_version_no = ent.get("version_no") - for entry in (ent.get("version_history") or []): - try: - if entry.get("hash") == sub.get("version"): - submission_version_no = int(entry.get("n")) - break - except (TypeError, ValueError): - continue + from app.api.store import _version_no_for_submission + submission_version_no = _version_no_for_submission( + ent, submission_id, + ) + # Build a version-switcher: every submission row linked to + # this entity, sorted newest first, with its derived v#. + # Admin clicks a row → jumps to that submission's detail. + # Surfaces multi-version entities clearly + lets admin + # compare verdicts across versions without bouncing back + # to the queue. + history = ent.get("version_history") or [] + history_by_sub: dict = {} + for entry in history: + sid = entry.get("submission_id") + if sid: + try: + history_by_sub[sid] = int(entry.get("n")) + except (TypeError, ValueError): + continue + # Direct query — list_for_admin doesn't filter by entity_id + # and we don't want to add a parameter for this one display + # need. Order by created_at DESC so newest is first in the + # switcher. + ent_sub_rows = [ + dict(zip(["id", "status", "version", "created_at", "reviewed_by_model"], row)) + for row in conn.execute( + "SELECT id, status, version, created_at, reviewed_by_model " + "FROM store_submissions " + "WHERE entity_id = ? " + "ORDER BY created_at DESC", + [sub["entity_id"]], + ).fetchall() + ] + for row in ent_sub_rows: + sibling_submissions.append({ + "id": row["id"], + "status": row.get("status"), + "version": row.get("version"), + "created_at": row.get("created_at"), + "version_no": history_by_sub.get(row["id"]), + "reviewed_by_model": row.get("reviewed_by_model"), + "is_current": row["id"] == submission_id, + }) other_count = StoreSubmissionsRepository(conn).count_for_submitter( sub["submitter_id"], exclude_id=submission_id, @@ -2159,6 +2200,7 @@ async def admin_store_submission_detail_page( entity_visibility_status=entity_visibility_status, entity_version_no=entity_version_no, submission_version_no=submission_version_no, + sibling_submissions=sibling_submissions, ) return templates.TemplateResponse(request, "admin_store_submission_detail.html", ctx) diff --git a/app/web/templates/admin_store_submission_detail.html b/app/web/templates/admin_store_submission_detail.html index f87675e..cf65559 100644 --- a/app/web/templates/admin_store_submission_detail.html +++ b/app/web/templates/admin_store_submission_detail.html @@ -258,6 +258,58 @@ {% endif %} + {# ── Version switcher ────────────────────────────────────────────────────── #} + {# All submissions linked to this entity, newest first. Admin clicks + any row to jump to its detail. Surfaces multi-version entities so + verdicts across versions can be compared without bouncing back to + the queue. #} + {% if sibling_submissions and sibling_submissions|length > 1 %} +
+

Submissions for this entity ({{ sibling_submissions|length }})

+ + + + + + + + + + + + + {% for s in sibling_submissions %} + + + + + + + + + {% endfor %} + +
v#StatusHashReviewed byCreated
+ {% if s.version_no %} + v{{ s.version_no }} + {% else %} + + {% endif %} + {% if s.version_no and s.version_no == entity_version_no %} + current + {% endif %} + {{ s.status }}{{ (s.version or '')[:12] }}{{ s.reviewed_by_model or '—' }} + {{ s.created_at.strftime("%Y-%m-%d %H:%M") if s.created_at else "" }} + + {% if s.is_current %} + viewing + {% else %} + Open → + {% endif %} +
+
+ {% endif %} + {# ── Override history ────────────────────────────────────────────────────── #} {% if sub.override_by %}
@@ -404,6 +456,7 @@

{{ sub.llm_findings.summary }}

{% endif %} {% if sub.llm_findings.findings %} +

Security findings

@@ -420,8 +473,58 @@ {% endfor %}
SeverityCategoryFileExplanationFix hint
- {% else %} -
No findings — model verdict was clean.
+ {% endif %} + {# Content-quality issues. Pre-fix the template only rendered + the security findings table, so a submission blocked + purely on content_quality (e.g. weak description) showed + up as 'No findings — model verdict was clean' even though + status was blocked_llm. Surfaced live by an admin + investigating a blocked re-restore of an already-approved + bundle. #} + {% set cq = sub.llm_findings.content_quality %} + {% if cq and cq.issues %} +

+ Content quality issues + {{ cq.verdict }} +

+ + + + + + {% for issue in cq.issues %} + + + + + + + {% endfor %} + +
FileFieldIssueSuggested rewrite
{{ issue.file }}{{ issue.field }}{{ issue.issue }}{{ issue.hint or "" }}
+ {% endif %} + {% if not sub.llm_findings.findings and not (cq and cq.issues) %} + {% if sub.status in ['blocked_llm', 'review_error'] %} +
+ Blocked but no findings recorded. + This is usually a transient LLM non-determinism: the + reviewer returned a fail verdict without enumerating + the offending items. Click Rescan to re-run + the pipeline, or Override with a written + reason if you've verified the bundle offline. +
+ {% else %} +
No findings — model verdict was clean.
+ {% endif %} + {% endif %} + {% if sub.llm_findings.reused_from_submission_id %} +
+ ✓ Verdict reused from prior approved submission + + {{ sub.llm_findings.reused_from_submission_id[:8] }} + + (byte-identical bundle — LLM was not re-called). +
{% endif %} {% endif %} {% elif sub.status in ['pending_inline', 'pending_llm'] %} diff --git a/pyproject.toml b/pyproject.toml index b6c5d37..d5f0c31 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,6 +1,6 @@ [project] name = "agnes-the-ai-analyst" -version = "0.54.23" +version = "0.54.24" description = "Agnes — AI Data Analyst platform for AI analytical systems" requires-python = ">=3.11,<3.14" license = "MIT" diff --git a/src/repositories/store_submissions.py b/src/repositories/store_submissions.py index bfc0981..6a4a0d2 100644 --- a/src/repositories/store_submissions.py +++ b/src/repositories/store_submissions.py @@ -451,10 +451,16 @@ class StoreSubmissionsRepository: history = [] item["entity_version_history"] = history item["version_no"] = None - sub_hash = item.get("version") + # Look up THIS submission's version_no by submission_id, + # NOT by hash. Hash-based lookup mislabeled every + # byte-identical reupload (and every reused-verdict + # restore — common after PR #332) as v1 because the loop + # picked the FIRST history entry with matching hash. + # Same fix-pattern as PR #330 for runner / override. + sub_id = item.get("id") for entry in history: try: - if entry.get("hash") == sub_hash: + if entry.get("submission_id") == sub_id: item["version_no"] = int(entry.get("n")) break except (TypeError, ValueError): diff --git a/tests/test_store_entity_versions.py b/tests/test_store_entity_versions.py index 798a1bb..6e0ffd0 100644 --- a/tests/test_store_entity_versions.py +++ b/tests/test_store_entity_versions.py @@ -1992,3 +1992,221 @@ class TestRescanPromotesNonCurrent: assert ent_after["version"] != v1_hash, ( "entity.version hash must move to v2 after rescan promote" ) + +class TestRestoreReusesApprovedVerdict: + """Live-bug fix: a second restore of an already-approved version + sometimes flipped to `blocked_llm` because Anthropic structured + output is non-deterministic — same bytes, different + `content_quality.verdict` across calls. Restore now detects + byte-identical bundles backed by a prior `approved` submission + (same reviewed_by_model) and reuses that verdict.""" + + def test_restore_of_approved_version_skips_llm_and_reuses_verdict( + self, web_client, monkeypatch, + ): + from pathlib import Path + from app.utils import get_store_dir + from src.repositories.store_submissions import StoreSubmissionsRepository + from src.store_guardrails.runner import run_llm_review + + monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-fake-for-reuse") + owner_id, owner_cookies = _create_user(web_client, "reuse@x.com") + eid = _upload_clean(web_client, owner_cookies, name="reuseme") + + monkeypatch.setattr( + "app.api.store.get_guardrails_enabled", lambda: True, + ) + monkeypatch.setattr( + "app.api.store.get_guardrails_llm_provider_ready", lambda: True, + ) + monkeypatch.setattr( + "app.api.store.get_guardrails_review_model", + lambda: "claude-haiku-4-5-test", + ) + # Stub the BG-task scheduler so only our direct `run_llm_review` + # call writes the verdict (real BG uses default_model_loader + # which resolves to a different reviewed_by_model and would + # win the terminal-state CAS race). + monkeypatch.setattr( + "app.api.store._schedule_llm_review", + lambda *a, **kw: None, + ) + + approve_calls = {"n": 0} + + def mock_approve(*a, **kw): + approve_calls["n"] += 1 + return { + "risk_level": "safe", "summary": "ok", + "findings": [], "template_placeholders_found": 0, + "reviewed_by_model": "claude-haiku-4-5-test", + "error": None, + "content_quality": {"verdict": "pass", "issues": []}, + } + monkeypatch.setattr( + "src.store_guardrails.llm_review.review_bundle", mock_approve, + ) + + v2_zip = _make_skill_zip("reuseme", body="V2 body unique " * 30) + r = web_client.put( + f"/api/store/entities/{eid}", + files={"file": ("v2.zip", v2_zip, "application/zip")}, + cookies=owner_cookies, + ) + assert r.status_code == 200, r.text + conn = get_system_db() + v2_sub_id = StoreSubmissionsRepository(conn).latest_for_entity(eid)["id"] + conn.close() + run_llm_review( + v2_sub_id, + plugin_dir=Path(get_store_dir()) / eid / "versions" / "v2" / "plugin", + conn_factory=get_system_db, + api_key_loader=lambda: "sk", + model_loader=lambda: "claude-haiku-4-5-test", + ) + calls_after_v2 = approve_calls["n"] + + v3_zip = _make_skill_zip("reuseme", body="V3 body unique " * 30) + r = web_client.put( + f"/api/store/entities/{eid}", + files={"file": ("v3.zip", v3_zip, "application/zip")}, + cookies=owner_cookies, + ) + assert r.status_code == 200, r.text + conn = get_system_db() + v3_sub_id = StoreSubmissionsRepository(conn).latest_for_entity(eid)["id"] + conn.close() + run_llm_review( + v3_sub_id, + plugin_dir=Path(get_store_dir()) / eid / "versions" / "v3" / "plugin", + conn_factory=get_system_db, + api_key_loader=lambda: "sk", + model_loader=lambda: "claude-haiku-4-5-test", + ) + calls_after_v3 = approve_calls["n"] + # BG task in TestClient may fire run_llm_review on its own + # in addition to our direct call, so don't assert exact count + # difference — only check the restore window. + assert calls_after_v3 > calls_after_v2 + + # Restore v2 → byte-identical to v2 approved entry. Reuse + # must fire → no new LLM call. + r = web_client.post( + f"/api/store/entities/{eid}/versions/2/restore", + cookies=owner_cookies, + ) + assert r.status_code == 200, r.text + assert approve_calls["n"] == calls_after_v3, ( + f"restore of approved version must NOT call LLM; " + f"count grew from {calls_after_v3} to {approve_calls['n']}" + ) + + conn = get_system_db() + ent_after = StoreEntitiesRepository(conn).get(eid) + v4_sub_id = next( + (e["submission_id"] for e in ent_after["version_history"] + if int(e["n"]) == 4), + None, + ) + v4_sub = StoreSubmissionsRepository(conn).get(v4_sub_id) if v4_sub_id else None + conn.close() + assert v4_sub is not None + assert v4_sub["status"] == "approved" + assert v4_sub["reviewed_by_model"] == "claude-haiku-4-5-test" + assert (v4_sub["llm_findings"] or {}).get( + "reused_from_submission_id" + ) == v2_sub_id + assert ent_after["version_no"] == 4 + + def test_restore_legacy_v1_falls_back_to_llm( + self, web_client, monkeypatch, + ): + """v1 seed (guardrails-OFF approval, no reviewed_by_model) is + NOT eligible for reuse. Restoring v1 must schedule a real + LLM review under guardrails-on.""" + from pathlib import Path + from app.utils import get_store_dir + from src.repositories.store_submissions import StoreSubmissionsRepository + from src.store_guardrails.runner import run_llm_review + + monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-fake-fallback") + owner_id, owner_cookies = _create_user(web_client, "noreuse@x.com") + eid = _upload_clean(web_client, owner_cookies, name="noreuse") + + monkeypatch.setattr( + "app.api.store.get_guardrails_enabled", lambda: True, + ) + monkeypatch.setattr( + "app.api.store.get_guardrails_llm_provider_ready", lambda: True, + ) + monkeypatch.setattr( + "app.api.store.get_guardrails_review_model", + lambda: "claude-haiku-4-5-test", + ) + # Stub the BG-task scheduler so only our direct `run_llm_review` + # call writes the verdict (real BG uses default_model_loader + # which resolves to a different reviewed_by_model and would + # win the terminal-state CAS race). + monkeypatch.setattr( + "app.api.store._schedule_llm_review", + lambda *a, **kw: None, + ) + + approve_calls = {"n": 0} + + def mock_approve(*a, **kw): + approve_calls["n"] += 1 + return { + "risk_level": "safe", "summary": "ok", + "findings": [], "template_placeholders_found": 0, + "reviewed_by_model": "claude-haiku-4-5-test", + "error": None, + "content_quality": {"verdict": "pass", "issues": []}, + } + monkeypatch.setattr( + "src.store_guardrails.llm_review.review_bundle", mock_approve, + ) + + v2_zip = _make_skill_zip("noreuse", body="V2 body " * 30) + r = web_client.put( + f"/api/store/entities/{eid}", + files={"file": ("v2.zip", v2_zip, "application/zip")}, + cookies=owner_cookies, + ) + assert r.status_code == 200, r.text + conn = get_system_db() + v2_sub_id = StoreSubmissionsRepository(conn).latest_for_entity(eid)["id"] + conn.close() + run_llm_review( + v2_sub_id, + plugin_dir=Path(get_store_dir()) / eid / "versions" / "v2" / "plugin", + conn_factory=get_system_db, + api_key_loader=lambda: "sk", + model_loader=lambda: "claude-haiku-4-5-test", + ) + calls_after_v2 = approve_calls["n"] + + r = web_client.post( + f"/api/store/entities/{eid}/versions/1/restore", + cookies=owner_cookies, + ) + assert r.status_code == 200, r.text + conn = get_system_db() + v3_sub_id = StoreSubmissionsRepository(conn).latest_for_entity(eid)["id"] + conn.close() + run_llm_review( + v3_sub_id, + plugin_dir=Path(get_store_dir()) / eid / "versions" / "v3" / "plugin", + conn_factory=get_system_db, + api_key_loader=lambda: "sk", + model_loader=lambda: "claude-haiku-4-5-test", + ) + # No reuse: BG/manual LLM calls fired (count grew). + assert approve_calls["n"] > calls_after_v2 + conn = get_system_db() + v3_sub = StoreSubmissionsRepository(conn).get(v3_sub_id) + conn.close() + assert v3_sub["status"] == "approved" + assert (v3_sub["llm_findings"] or {}).get( + "reused_from_submission_id" + ) is None