fix(store): restore reuses prior approved verdict + admin detail surfaces content_quality (#332)
* fix(store): restore reuses prior approved verdict; admin detail surfaces content_quality
Live bug on agnes-development: entity 6ba2ee1d…'s v5 submission (third
restore of v1, byte-identical to v1/v2/v4/v6) landed `blocked_llm`
while the other identical-hash siblings landed `approved`. Anthropic
structured output is non-deterministic — same bytes flipped
`content_quality.verdict` pass↔fail across calls. Admin detail page
made the failure look mysterious: only security-findings table
rendered, so a content-quality-only block showed up as
"No findings — model verdict was clean".
Two fixes:
1. Restore endpoint reuses a prior `approved` submission's verdict
when the restored bundle hash matches an existing history entry
AND `reviewed_by_model` matches. Skips the LLM call, stamps the
new submission with the prior verdict + `reused_from_submission_id`
marker. Deterministic + saves Anthropic tokens. Gated on
schedule_async_llm so guardrails-off keeps its existing path.
2. Admin detail template now renders `content_quality.issues` in its
own table + adds an explicit "Blocked but no findings recorded"
notice for the transient-non-determinism case + surfaces the
reuse marker when present.
Reuse falls back to a real LLM call when:
- prior submission's reviewed_by_model doesn't match current (admin
upgraded tier Haiku → Sonnet → Opus)
- prior submission was guardrails-off (no reviewed_by_model)
- no history entry has matching hash
Tests:
- TestRestoreReusesApprovedVerdict::test_restore_of_approved_version_skips_llm_and_reuses_verdict
- TestRestoreReusesApprovedVerdict::test_restore_legacy_v1_falls_back_to_llm
* fix(store): admin detail v# by submission_id + version switcher
Three related fixes surfaced live by a user inspecting submission
47bbc1f5… on localhost where v# rendered as v1 even though current
was v10.
1. Admin queue + admin detail derive submission v# by submission_id
instead of hash. Pre-fix the loop matched first hash-equal entry
in version_history — always v1 when bundles were byte-identical
(which is the common case after the restore-reuse path). Two
call sites updated:
- `src/repositories/store_submissions.py:list_for_admin` (queue
v# column)
- `app/web/router.py:admin_store_submission_detail_page` (detail
page v# chip on each section header)
Same fix pattern as PR #330 for runner / override.
2. New version-switcher card on admin detail page lists every
submission linked to the entity with status + reviewed_by_model +
click-to-jump. Solves the user's secondary ask ("should be a way
to switch different versions on the submission detail").
3. Initial POST now backfills the v1 seed entry's submission_id
right after creating the v1 submission. The helper
`update_history_submission_id` existed but no production code
path called it — so v1 always had submission_id=None and every
"find v# for submission" lookup silently failed for v1.
171 tests green on touched surface.
* release: 0.54.24 — restore reuses prior approved verdict + admin detail content_quality + v# by submission_id (Codex/Live follow-up to #330/#331)
---------
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
This commit is contained in:
parent
9eaa1dc53c
commit
cd03028776
7 changed files with 584 additions and 14 deletions
46
CHANGELOG.md
46
CHANGELOG.md
|
|
@ -10,6 +10,52 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C
|
||||||
|
|
||||||
## [Unreleased]
|
## [Unreleased]
|
||||||
|
|
||||||
|
## [0.54.24] — 2026-05-16
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- Flea-market admin submissions UI now derives the per-submission
|
||||||
|
`v#` label by **submission_id**, not **hash**. Hash-based lookup
|
||||||
|
mislabeled every byte-identical reupload (and every reused-verdict
|
||||||
|
restore — common after the restore-reuse fix below) as `v1`
|
||||||
|
because the loop picked the FIRST history entry with matching
|
||||||
|
hash. Affected both the admin queue column (`v#`) and the per-
|
||||||
|
section chips on the detail page. Same fix pattern as PR #330
|
||||||
|
(runner / override paths).
|
||||||
|
- Flea-market admin submission detail page gained a version-switcher
|
||||||
|
card listing every submission linked to the same entity with
|
||||||
|
status badge + reviewed_by_model + click-to-jump. Lets admins
|
||||||
|
compare verdicts across versions without bouncing back to the
|
||||||
|
queue.
|
||||||
|
- Flea-market initial POST now backfills the v1 seed entry's
|
||||||
|
`submission_id` immediately after creating the v1 submission row.
|
||||||
|
Pre-fix the v1 history entry always carried `submission_id=None`
|
||||||
|
so downstream lookups (`_version_no_for_submission`, admin queue
|
||||||
|
v#, admin detail chip, restore-reuse) silently failed for v1.
|
||||||
|
- Flea-market restore endpoint now reuses the prior approved
|
||||||
|
submission's LLM verdict when the restored bundle is
|
||||||
|
byte-identical to a history entry already reviewed by the same
|
||||||
|
`review_model`. Pre-fix every restore re-ran the LLM; Anthropic
|
||||||
|
structured output is non-deterministic — same bytes flipped
|
||||||
|
`content_quality.verdict` pass↔fail across calls, so a second
|
||||||
|
restore of an already-approved version could spuriously land at
|
||||||
|
`blocked_llm`. Reuse skips the LLM, stamps the new submission
|
||||||
|
with the prior verdict + `reused_from_submission_id` marker,
|
||||||
|
and saves the Anthropic token cost. Surfaced live on a
|
||||||
|
development deployment where the third restore of a v1 bundle
|
||||||
|
(same hash as v1/v2/v4/v6 — multiple identical re-uploads)
|
||||||
|
landed `blocked_llm` while sibling submissions were `approved`.
|
||||||
|
- Admin submission detail page now surfaces
|
||||||
|
`llm_findings.content_quality.issues` in its own table next to
|
||||||
|
the security-findings table. Pre-fix the template only rendered
|
||||||
|
security findings, so a submission blocked purely on
|
||||||
|
`content_quality.verdict='fail'` (no security findings) showed
|
||||||
|
up as "No findings — model verdict was clean" even though
|
||||||
|
`status='blocked_llm'`. Also adds an explicit
|
||||||
|
"Blocked but no findings recorded" notice when the verdict is
|
||||||
|
blocked but neither findings list is populated (transient LLM
|
||||||
|
non-determinism), pointing admin at Rescan / Override. Reuse
|
||||||
|
markers (`reused_from_submission_id`) render too.
|
||||||
|
|
||||||
## [0.54.23] — 2026-05-16
|
## [0.54.23] — 2026-05-16
|
||||||
|
|
||||||
### Fixed
|
### Fixed
|
||||||
|
|
|
||||||
155
app/api/store.py
155
app/api/store.py
|
|
@ -51,6 +51,7 @@ from app.auth.dependencies import _get_db, get_current_user
|
||||||
from app.instance_config import (
|
from app.instance_config import (
|
||||||
get_guardrails_enabled,
|
get_guardrails_enabled,
|
||||||
get_guardrails_llm_provider_ready,
|
get_guardrails_llm_provider_ready,
|
||||||
|
get_guardrails_review_model,
|
||||||
)
|
)
|
||||||
from app.utils import get_store_dir
|
from app.utils import get_store_dir
|
||||||
from src.db import get_system_db
|
from src.db import get_system_db
|
||||||
|
|
@ -431,6 +432,58 @@ async def _hold_entity_write_lock(entity_id: str):
|
||||||
yield
|
yield
|
||||||
|
|
||||||
|
|
||||||
|
def _find_reusable_approved_verdict(
|
||||||
|
entity_row: Dict[str, Any],
|
||||||
|
new_version_hash: str,
|
||||||
|
subs_repo,
|
||||||
|
current_review_model: str,
|
||||||
|
) -> Optional[tuple]:
|
||||||
|
"""Find a prior `approved` submission whose bundle hash matches
|
||||||
|
the new submission's hash and whose reviewer was the same model.
|
||||||
|
|
||||||
|
Returns ``(prior_submission_id, prior_llm_findings)`` for the
|
||||||
|
caller to reuse, or ``None`` if no eligible prior verdict exists.
|
||||||
|
|
||||||
|
Rationale:
|
||||||
|
Anthropic structured output is non-deterministic. The
|
||||||
|
content_quality.verdict in particular can flip pass↔fail on
|
||||||
|
byte-identical bundles (observed live on a development
|
||||||
|
deployment: across five identical-hash submissions of one
|
||||||
|
entity, four landed `approved` and one landed `blocked_llm`
|
||||||
|
because the model flagged a description as weak on that one
|
||||||
|
call). When the user restores an already-approved version
|
||||||
|
— or re-uploads byte-identical bundles — the previous verdict
|
||||||
|
is the authoritative one and should be reused.
|
||||||
|
|
||||||
|
Gated on ``reviewed_by_model`` match so a stricter model can
|
||||||
|
still re-review under tightened rules (admin upgrades from
|
||||||
|
Haiku → Sonnet → Opus).
|
||||||
|
"""
|
||||||
|
for entry in (entity_row.get("version_history") or []):
|
||||||
|
if entry.get("hash") != new_version_hash:
|
||||||
|
continue
|
||||||
|
sid = entry.get("submission_id")
|
||||||
|
if not sid:
|
||||||
|
continue
|
||||||
|
try:
|
||||||
|
sub = subs_repo.get(sid)
|
||||||
|
except Exception:
|
||||||
|
continue
|
||||||
|
if not sub:
|
||||||
|
continue
|
||||||
|
if sub.get("status") != "approved":
|
||||||
|
continue
|
||||||
|
# Skip rows that pre-date the LLM tier (guardrails-off prior
|
||||||
|
# approvals carry no reviewed_by_model — caller still needs
|
||||||
|
# the LLM to validate the bundle under guardrails-on rules).
|
||||||
|
if not sub.get("reviewed_by_model"):
|
||||||
|
continue
|
||||||
|
if sub.get("reviewed_by_model") != current_review_model:
|
||||||
|
continue
|
||||||
|
return sid, sub.get("llm_findings") or {}
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
def _version_no_for_submission(
|
def _version_no_for_submission(
|
||||||
entity_row: Dict[str, Any], submission_id: str,
|
entity_row: Dict[str, Any], submission_id: str,
|
||||||
) -> Optional[int]:
|
) -> Optional[int]:
|
||||||
|
|
@ -1495,6 +1548,15 @@ async def create_entity(
|
||||||
file_size=bundle_meta.file_size,
|
file_size=bundle_meta.file_size,
|
||||||
bundle_sha256=bundle_meta.sha256,
|
bundle_sha256=bundle_meta.sha256,
|
||||||
)
|
)
|
||||||
|
# Backfill the v1 seed's submission_id so downstream lookups
|
||||||
|
# (`_version_no_for_submission`, admin queue v#, admin detail
|
||||||
|
# v# chip, restore reuse) can resolve v1 the same way they
|
||||||
|
# resolve v2+. Pre-fix this was deferred to a later
|
||||||
|
# `update_history_submission_id` call that never landed in
|
||||||
|
# the create path — leaving the v1 entry with
|
||||||
|
# `submission_id=None` and every "find v# for this submission"
|
||||||
|
# lookup silently failing for v1.
|
||||||
|
repo.update_history_submission_id(entity_id, 1, sub_id)
|
||||||
_audit(
|
_audit(
|
||||||
conn, user["id"],
|
conn, user["id"],
|
||||||
"store.submission.accepted" if guardrails_on else "store.submission.approved",
|
"store.submission.accepted" if guardrails_on else "store.submission.approved",
|
||||||
|
|
@ -2136,6 +2198,99 @@ async def _restore_version_locked(
|
||||||
hold_for_review = guardrails_enabled
|
hold_for_review = guardrails_enabled
|
||||||
schedule_async_llm = guardrails_enabled and provider_ready
|
schedule_async_llm = guardrails_enabled and provider_ready
|
||||||
subs_repo = StoreSubmissionsRepository(conn)
|
subs_repo = StoreSubmissionsRepository(conn)
|
||||||
|
|
||||||
|
# Idempotent restore: when the restored bundle is byte-identical
|
||||||
|
# to a prior `approved` submission reviewed by the SAME model,
|
||||||
|
# reuse that verdict and skip the async LLM step. Anthropic
|
||||||
|
# structured output is non-deterministic — same bytes can flip
|
||||||
|
# content_quality.verdict pass↔fail across calls, so a second
|
||||||
|
# restore of an already-approved version could spuriously block.
|
||||||
|
# Reuse keeps the chain deterministic + saves tokens.
|
||||||
|
# `restore` (and its byte-for-byte recovery contract) is the
|
||||||
|
# most natural place for this; PUT could extend later. Gated on
|
||||||
|
# `schedule_async_llm` so guardrails-off keeps its existing
|
||||||
|
# auto-approve path and enabled-but-not-ready doesn't silently
|
||||||
|
# promote without a verdict in env.
|
||||||
|
reuse_verdict = None
|
||||||
|
if schedule_async_llm:
|
||||||
|
try:
|
||||||
|
current_model = get_guardrails_review_model()
|
||||||
|
reuse_verdict = _find_reusable_approved_verdict(
|
||||||
|
entity, new_version, subs_repo, current_model,
|
||||||
|
)
|
||||||
|
except Exception:
|
||||||
|
logger.exception(
|
||||||
|
"restore: reusable-verdict lookup failed for entity %s",
|
||||||
|
entity_id,
|
||||||
|
)
|
||||||
|
reuse_verdict = None
|
||||||
|
|
||||||
|
if reuse_verdict is not None:
|
||||||
|
prior_sid, prior_findings = reuse_verdict
|
||||||
|
reused_findings = dict(prior_findings)
|
||||||
|
reused_findings["reused_from_submission_id"] = prior_sid
|
||||||
|
reused_findings["reused_reason"] = (
|
||||||
|
"byte-identical bundle to a prior approved submission; "
|
||||||
|
"skipped re-running LLM review to avoid spurious "
|
||||||
|
"content_quality flips (Anthropic structured output is "
|
||||||
|
"non-deterministic)"
|
||||||
|
)
|
||||||
|
sub_id = subs_repo.create(
|
||||||
|
submitter_id=user["id"],
|
||||||
|
submitter_email=user.get("email"),
|
||||||
|
type=entity["type"],
|
||||||
|
name=entity["name"],
|
||||||
|
version=new_version,
|
||||||
|
status="approved",
|
||||||
|
entity_id=entity_id,
|
||||||
|
inline_checks=inline.to_response_dict(),
|
||||||
|
llm_findings=reused_findings,
|
||||||
|
file_size=target_meta.file_size,
|
||||||
|
bundle_sha256=target_meta.sha256,
|
||||||
|
)
|
||||||
|
# Stamp reviewed_by_model + llm_findings via update_status
|
||||||
|
# (create doesn't accept reviewed_by_model). Terminal-state
|
||||||
|
# CAS guard allows this since the row was just inserted at
|
||||||
|
# 'approved' — we're re-affirming, not clobbering.
|
||||||
|
subs_repo.update_status(
|
||||||
|
sub_id, status="approved",
|
||||||
|
llm_findings=reused_findings,
|
||||||
|
reviewed_by_model=current_model,
|
||||||
|
allow_terminal_overwrite=True,
|
||||||
|
)
|
||||||
|
appended_n = repo.append_version_history(
|
||||||
|
entity_id,
|
||||||
|
version_hash=new_version,
|
||||||
|
sha256=target_meta.sha256,
|
||||||
|
size=target_meta.file_size,
|
||||||
|
submission_id=sub_id,
|
||||||
|
created_by=user["id"],
|
||||||
|
)
|
||||||
|
_audit(
|
||||||
|
conn, user["id"],
|
||||||
|
"store.submission.reused_approved_verdict",
|
||||||
|
f"store_submission:{sub_id}",
|
||||||
|
{
|
||||||
|
"entity_id": entity_id,
|
||||||
|
"restored_from_version_no": version_no,
|
||||||
|
"new_version_no": appended_n,
|
||||||
|
"reused_from_submission_id": prior_sid,
|
||||||
|
"reviewed_by_model": current_model,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
_audit(
|
||||||
|
conn, user["id"], "store.entity.restore", entity_id,
|
||||||
|
{"restored_from_version_no": version_no,
|
||||||
|
"new_version_no": appended_n,
|
||||||
|
"submission_id": sub_id,
|
||||||
|
"reused_verdict": True},
|
||||||
|
)
|
||||||
|
# Forward-only promote (same gate as runner / override).
|
||||||
|
if appended_n > int(entity.get("version_no") or 0):
|
||||||
|
promote_to_version(entity_id, appended_n, repo)
|
||||||
|
_invalidate_etag()
|
||||||
|
return _entity_to_response(conn, repo.get(entity_id)) # type: ignore[arg-type]
|
||||||
|
|
||||||
sub_id = subs_repo.create(
|
sub_id = subs_repo.create(
|
||||||
submitter_id=user["id"],
|
submitter_id=user["id"],
|
||||||
submitter_email=user.get("email"),
|
submitter_email=user.get("email"),
|
||||||
|
|
|
||||||
|
|
@ -2061,23 +2061,64 @@ async def admin_store_submission_detail_page(
|
||||||
# Verdict (sub.status) is immutable forensic record; lifecycle
|
# Verdict (sub.status) is immutable forensic record; lifecycle
|
||||||
# (entity.visibility_status) reflects current state — see plan
|
# (entity.visibility_status) reflects current state — see plan
|
||||||
# "Admin Submissions Filter: Use Entity Visibility, Not Denormalized Status".
|
# "Admin Submissions Filter: Use Entity Visibility, Not Denormalized Status".
|
||||||
# Also derive submission_version_no by matching sub.version (hash)
|
# Resolve THIS submission's version_no via submission_id (NOT
|
||||||
# against the entity's version_history (v37 edit feature).
|
# hash — multiple history entries can share a hash when the user
|
||||||
|
# re-uploads byte-identical bundles, and the hash-match-first-wins
|
||||||
|
# loop always picked v1, mislabeling every reupload as v1). Same
|
||||||
|
# fix as PR #330 for the runner / override paths; we missed this
|
||||||
|
# display site at the time.
|
||||||
entity_visibility_status = None
|
entity_visibility_status = None
|
||||||
entity_version_no = None
|
entity_version_no = None
|
||||||
submission_version_no = None
|
submission_version_no = None
|
||||||
|
sibling_submissions: list = []
|
||||||
if sub.get("entity_id"):
|
if sub.get("entity_id"):
|
||||||
ent = StoreEntitiesRepository(conn).get(sub["entity_id"])
|
ent = StoreEntitiesRepository(conn).get(sub["entity_id"])
|
||||||
if ent:
|
if ent:
|
||||||
entity_visibility_status = ent.get("visibility_status")
|
entity_visibility_status = ent.get("visibility_status")
|
||||||
entity_version_no = ent.get("version_no")
|
entity_version_no = ent.get("version_no")
|
||||||
for entry in (ent.get("version_history") or []):
|
from app.api.store import _version_no_for_submission
|
||||||
|
submission_version_no = _version_no_for_submission(
|
||||||
|
ent, submission_id,
|
||||||
|
)
|
||||||
|
# Build a version-switcher: every submission row linked to
|
||||||
|
# this entity, sorted newest first, with its derived v#.
|
||||||
|
# Admin clicks a row → jumps to that submission's detail.
|
||||||
|
# Surfaces multi-version entities clearly + lets admin
|
||||||
|
# compare verdicts across versions without bouncing back
|
||||||
|
# to the queue.
|
||||||
|
history = ent.get("version_history") or []
|
||||||
|
history_by_sub: dict = {}
|
||||||
|
for entry in history:
|
||||||
|
sid = entry.get("submission_id")
|
||||||
|
if sid:
|
||||||
try:
|
try:
|
||||||
if entry.get("hash") == sub.get("version"):
|
history_by_sub[sid] = int(entry.get("n"))
|
||||||
submission_version_no = int(entry.get("n"))
|
|
||||||
break
|
|
||||||
except (TypeError, ValueError):
|
except (TypeError, ValueError):
|
||||||
continue
|
continue
|
||||||
|
# Direct query — list_for_admin doesn't filter by entity_id
|
||||||
|
# and we don't want to add a parameter for this one display
|
||||||
|
# need. Order by created_at DESC so newest is first in the
|
||||||
|
# switcher.
|
||||||
|
ent_sub_rows = [
|
||||||
|
dict(zip(["id", "status", "version", "created_at", "reviewed_by_model"], row))
|
||||||
|
for row in conn.execute(
|
||||||
|
"SELECT id, status, version, created_at, reviewed_by_model "
|
||||||
|
"FROM store_submissions "
|
||||||
|
"WHERE entity_id = ? "
|
||||||
|
"ORDER BY created_at DESC",
|
||||||
|
[sub["entity_id"]],
|
||||||
|
).fetchall()
|
||||||
|
]
|
||||||
|
for row in ent_sub_rows:
|
||||||
|
sibling_submissions.append({
|
||||||
|
"id": row["id"],
|
||||||
|
"status": row.get("status"),
|
||||||
|
"version": row.get("version"),
|
||||||
|
"created_at": row.get("created_at"),
|
||||||
|
"version_no": history_by_sub.get(row["id"]),
|
||||||
|
"reviewed_by_model": row.get("reviewed_by_model"),
|
||||||
|
"is_current": row["id"] == submission_id,
|
||||||
|
})
|
||||||
|
|
||||||
other_count = StoreSubmissionsRepository(conn).count_for_submitter(
|
other_count = StoreSubmissionsRepository(conn).count_for_submitter(
|
||||||
sub["submitter_id"], exclude_id=submission_id,
|
sub["submitter_id"], exclude_id=submission_id,
|
||||||
|
|
@ -2159,6 +2200,7 @@ async def admin_store_submission_detail_page(
|
||||||
entity_visibility_status=entity_visibility_status,
|
entity_visibility_status=entity_visibility_status,
|
||||||
entity_version_no=entity_version_no,
|
entity_version_no=entity_version_no,
|
||||||
submission_version_no=submission_version_no,
|
submission_version_no=submission_version_no,
|
||||||
|
sibling_submissions=sibling_submissions,
|
||||||
)
|
)
|
||||||
return templates.TemplateResponse(request, "admin_store_submission_detail.html", ctx)
|
return templates.TemplateResponse(request, "admin_store_submission_detail.html", ctx)
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -258,6 +258,58 @@
|
||||||
{% endif %}
|
{% endif %}
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
|
{# ── Version switcher ────────────────────────────────────────────────────── #}
|
||||||
|
{# All submissions linked to this entity, newest first. Admin clicks
|
||||||
|
any row to jump to its detail. Surfaces multi-version entities so
|
||||||
|
verdicts across versions can be compared without bouncing back to
|
||||||
|
the queue. #}
|
||||||
|
{% if sibling_submissions and sibling_submissions|length > 1 %}
|
||||||
|
<div class="det-card">
|
||||||
|
<h2>Submissions for this entity ({{ sibling_submissions|length }})</h2>
|
||||||
|
<table class="data-table">
|
||||||
|
<thead>
|
||||||
|
<tr>
|
||||||
|
<th>v#</th>
|
||||||
|
<th>Status</th>
|
||||||
|
<th>Hash</th>
|
||||||
|
<th>Reviewed by</th>
|
||||||
|
<th>Created</th>
|
||||||
|
<th></th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
{% for s in sibling_submissions %}
|
||||||
|
<tr {% if s.is_current %}style="background: var(--surface-muted, #f9fafb);"{% endif %}>
|
||||||
|
<td>
|
||||||
|
{% if s.version_no %}
|
||||||
|
<strong>v{{ s.version_no }}</strong>
|
||||||
|
{% else %}
|
||||||
|
<span class="empty-msg">—</span>
|
||||||
|
{% endif %}
|
||||||
|
{% if s.version_no and s.version_no == entity_version_no %}
|
||||||
|
<span class="badge approved" style="margin-left:4px;font-size:9px;">current</span>
|
||||||
|
{% endif %}
|
||||||
|
</td>
|
||||||
|
<td><span class="badge {{ s.status }}">{{ s.status }}</span></td>
|
||||||
|
<td><code>{{ (s.version or '')[:12] }}</code></td>
|
||||||
|
<td style="font-size:11px;color:var(--text-secondary,#6b7280);">{{ s.reviewed_by_model or '—' }}</td>
|
||||||
|
<td style="font-size:11px;color:var(--text-secondary,#6b7280);">
|
||||||
|
{{ s.created_at.strftime("%Y-%m-%d %H:%M") if s.created_at else "" }}
|
||||||
|
</td>
|
||||||
|
<td>
|
||||||
|
{% if s.is_current %}
|
||||||
|
<span class="empty-msg">viewing</span>
|
||||||
|
{% else %}
|
||||||
|
<a href="/admin/store/submissions/{{ s.id }}">Open →</a>
|
||||||
|
{% endif %}
|
||||||
|
</td>
|
||||||
|
</tr>
|
||||||
|
{% endfor %}
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
</div>
|
||||||
|
{% endif %}
|
||||||
|
|
||||||
{# ── Override history ────────────────────────────────────────────────────── #}
|
{# ── Override history ────────────────────────────────────────────────────── #}
|
||||||
{% if sub.override_by %}
|
{% if sub.override_by %}
|
||||||
<div class="det-card">
|
<div class="det-card">
|
||||||
|
|
@ -404,6 +456,7 @@
|
||||||
<p class="det-summary">{{ sub.llm_findings.summary }}</p>
|
<p class="det-summary">{{ sub.llm_findings.summary }}</p>
|
||||||
{% endif %}
|
{% endif %}
|
||||||
{% if sub.llm_findings.findings %}
|
{% if sub.llm_findings.findings %}
|
||||||
|
<h3 style="margin: 18px 0 6px 0; font-size: 13px; color: var(--text-secondary, #6b7280); text-transform: uppercase; letter-spacing: 0.4px;">Security findings</h3>
|
||||||
<table class="data-table">
|
<table class="data-table">
|
||||||
<thead>
|
<thead>
|
||||||
<tr><th>Severity</th><th>Category</th><th>File</th><th>Explanation</th><th>Fix hint</th></tr>
|
<tr><th>Severity</th><th>Category</th><th>File</th><th>Explanation</th><th>Fix hint</th></tr>
|
||||||
|
|
@ -420,10 +473,60 @@
|
||||||
{% endfor %}
|
{% endfor %}
|
||||||
</tbody>
|
</tbody>
|
||||||
</table>
|
</table>
|
||||||
|
{% endif %}
|
||||||
|
{# Content-quality issues. Pre-fix the template only rendered
|
||||||
|
the security findings table, so a submission blocked
|
||||||
|
purely on content_quality (e.g. weak description) showed
|
||||||
|
up as 'No findings — model verdict was clean' even though
|
||||||
|
status was blocked_llm. Surfaced live by an admin
|
||||||
|
investigating a blocked re-restore of an already-approved
|
||||||
|
bundle. #}
|
||||||
|
{% set cq = sub.llm_findings.content_quality %}
|
||||||
|
{% if cq and cq.issues %}
|
||||||
|
<h3 style="margin: 18px 0 6px 0; font-size: 13px; color: var(--text-secondary, #6b7280); text-transform: uppercase; letter-spacing: 0.4px;">
|
||||||
|
Content quality issues
|
||||||
|
<span class="badge {{ 'fail' if cq.verdict == 'fail' else 'safe' }}" style="margin-left: 6px;">{{ cq.verdict }}</span>
|
||||||
|
</h3>
|
||||||
|
<table class="data-table">
|
||||||
|
<thead>
|
||||||
|
<tr><th>File</th><th>Field</th><th>Issue</th><th>Suggested rewrite</th></tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
{% for issue in cq.issues %}
|
||||||
|
<tr>
|
||||||
|
<td><code>{{ issue.file }}</code></td>
|
||||||
|
<td><code>{{ issue.field }}</code></td>
|
||||||
|
<td>{{ issue.issue }}</td>
|
||||||
|
<td>{{ issue.hint or "" }}</td>
|
||||||
|
</tr>
|
||||||
|
{% endfor %}
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
{% endif %}
|
||||||
|
{% if not sub.llm_findings.findings and not (cq and cq.issues) %}
|
||||||
|
{% if sub.status in ['blocked_llm', 'review_error'] %}
|
||||||
|
<div class="det-error-box">
|
||||||
|
<strong>Blocked but no findings recorded.</strong>
|
||||||
|
This is usually a transient LLM non-determinism: the
|
||||||
|
reviewer returned a fail verdict without enumerating
|
||||||
|
the offending items. Click <em>Rescan</em> to re-run
|
||||||
|
the pipeline, or <em>Override</em> with a written
|
||||||
|
reason if you've verified the bundle offline.
|
||||||
|
</div>
|
||||||
{% else %}
|
{% else %}
|
||||||
<div class="empty-msg">No findings — model verdict was clean.</div>
|
<div class="empty-msg">No findings — model verdict was clean.</div>
|
||||||
{% endif %}
|
{% endif %}
|
||||||
{% endif %}
|
{% endif %}
|
||||||
|
{% if sub.llm_findings.reused_from_submission_id %}
|
||||||
|
<div class="empty-msg" style="margin-top: 12px;">
|
||||||
|
✓ Verdict reused from prior approved submission
|
||||||
|
<a href="/admin/store/submissions/{{ sub.llm_findings.reused_from_submission_id }}">
|
||||||
|
{{ sub.llm_findings.reused_from_submission_id[:8] }}
|
||||||
|
</a>
|
||||||
|
(byte-identical bundle — LLM was not re-called).
|
||||||
|
</div>
|
||||||
|
{% endif %}
|
||||||
|
{% endif %}
|
||||||
{% elif sub.status in ['pending_inline', 'pending_llm'] %}
|
{% elif sub.status in ['pending_inline', 'pending_llm'] %}
|
||||||
<div class="empty-msg">Review still in progress…</div>
|
<div class="empty-msg">Review still in progress…</div>
|
||||||
{% else %}
|
{% else %}
|
||||||
|
|
|
||||||
|
|
@ -1,6 +1,6 @@
|
||||||
[project]
|
[project]
|
||||||
name = "agnes-the-ai-analyst"
|
name = "agnes-the-ai-analyst"
|
||||||
version = "0.54.23"
|
version = "0.54.24"
|
||||||
description = "Agnes — AI Data Analyst platform for AI analytical systems"
|
description = "Agnes — AI Data Analyst platform for AI analytical systems"
|
||||||
requires-python = ">=3.11,<3.14"
|
requires-python = ">=3.11,<3.14"
|
||||||
license = "MIT"
|
license = "MIT"
|
||||||
|
|
|
||||||
|
|
@ -451,10 +451,16 @@ class StoreSubmissionsRepository:
|
||||||
history = []
|
history = []
|
||||||
item["entity_version_history"] = history
|
item["entity_version_history"] = history
|
||||||
item["version_no"] = None
|
item["version_no"] = None
|
||||||
sub_hash = item.get("version")
|
# Look up THIS submission's version_no by submission_id,
|
||||||
|
# NOT by hash. Hash-based lookup mislabeled every
|
||||||
|
# byte-identical reupload (and every reused-verdict
|
||||||
|
# restore — common after PR #332) as v1 because the loop
|
||||||
|
# picked the FIRST history entry with matching hash.
|
||||||
|
# Same fix-pattern as PR #330 for runner / override.
|
||||||
|
sub_id = item.get("id")
|
||||||
for entry in history:
|
for entry in history:
|
||||||
try:
|
try:
|
||||||
if entry.get("hash") == sub_hash:
|
if entry.get("submission_id") == sub_id:
|
||||||
item["version_no"] = int(entry.get("n"))
|
item["version_no"] = int(entry.get("n"))
|
||||||
break
|
break
|
||||||
except (TypeError, ValueError):
|
except (TypeError, ValueError):
|
||||||
|
|
|
||||||
|
|
@ -1992,3 +1992,221 @@ class TestRescanPromotesNonCurrent:
|
||||||
assert ent_after["version"] != v1_hash, (
|
assert ent_after["version"] != v1_hash, (
|
||||||
"entity.version hash must move to v2 after rescan promote"
|
"entity.version hash must move to v2 after rescan promote"
|
||||||
)
|
)
|
||||||
|
|
||||||
|
class TestRestoreReusesApprovedVerdict:
|
||||||
|
"""Live-bug fix: a second restore of an already-approved version
|
||||||
|
sometimes flipped to `blocked_llm` because Anthropic structured
|
||||||
|
output is non-deterministic — same bytes, different
|
||||||
|
`content_quality.verdict` across calls. Restore now detects
|
||||||
|
byte-identical bundles backed by a prior `approved` submission
|
||||||
|
(same reviewed_by_model) and reuses that verdict."""
|
||||||
|
|
||||||
|
def test_restore_of_approved_version_skips_llm_and_reuses_verdict(
|
||||||
|
self, web_client, monkeypatch,
|
||||||
|
):
|
||||||
|
from pathlib import Path
|
||||||
|
from app.utils import get_store_dir
|
||||||
|
from src.repositories.store_submissions import StoreSubmissionsRepository
|
||||||
|
from src.store_guardrails.runner import run_llm_review
|
||||||
|
|
||||||
|
monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-fake-for-reuse")
|
||||||
|
owner_id, owner_cookies = _create_user(web_client, "reuse@x.com")
|
||||||
|
eid = _upload_clean(web_client, owner_cookies, name="reuseme")
|
||||||
|
|
||||||
|
monkeypatch.setattr(
|
||||||
|
"app.api.store.get_guardrails_enabled", lambda: True,
|
||||||
|
)
|
||||||
|
monkeypatch.setattr(
|
||||||
|
"app.api.store.get_guardrails_llm_provider_ready", lambda: True,
|
||||||
|
)
|
||||||
|
monkeypatch.setattr(
|
||||||
|
"app.api.store.get_guardrails_review_model",
|
||||||
|
lambda: "claude-haiku-4-5-test",
|
||||||
|
)
|
||||||
|
# Stub the BG-task scheduler so only our direct `run_llm_review`
|
||||||
|
# call writes the verdict (real BG uses default_model_loader
|
||||||
|
# which resolves to a different reviewed_by_model and would
|
||||||
|
# win the terminal-state CAS race).
|
||||||
|
monkeypatch.setattr(
|
||||||
|
"app.api.store._schedule_llm_review",
|
||||||
|
lambda *a, **kw: None,
|
||||||
|
)
|
||||||
|
|
||||||
|
approve_calls = {"n": 0}
|
||||||
|
|
||||||
|
def mock_approve(*a, **kw):
|
||||||
|
approve_calls["n"] += 1
|
||||||
|
return {
|
||||||
|
"risk_level": "safe", "summary": "ok",
|
||||||
|
"findings": [], "template_placeholders_found": 0,
|
||||||
|
"reviewed_by_model": "claude-haiku-4-5-test",
|
||||||
|
"error": None,
|
||||||
|
"content_quality": {"verdict": "pass", "issues": []},
|
||||||
|
}
|
||||||
|
monkeypatch.setattr(
|
||||||
|
"src.store_guardrails.llm_review.review_bundle", mock_approve,
|
||||||
|
)
|
||||||
|
|
||||||
|
v2_zip = _make_skill_zip("reuseme", body="V2 body unique " * 30)
|
||||||
|
r = web_client.put(
|
||||||
|
f"/api/store/entities/{eid}",
|
||||||
|
files={"file": ("v2.zip", v2_zip, "application/zip")},
|
||||||
|
cookies=owner_cookies,
|
||||||
|
)
|
||||||
|
assert r.status_code == 200, r.text
|
||||||
|
conn = get_system_db()
|
||||||
|
v2_sub_id = StoreSubmissionsRepository(conn).latest_for_entity(eid)["id"]
|
||||||
|
conn.close()
|
||||||
|
run_llm_review(
|
||||||
|
v2_sub_id,
|
||||||
|
plugin_dir=Path(get_store_dir()) / eid / "versions" / "v2" / "plugin",
|
||||||
|
conn_factory=get_system_db,
|
||||||
|
api_key_loader=lambda: "sk",
|
||||||
|
model_loader=lambda: "claude-haiku-4-5-test",
|
||||||
|
)
|
||||||
|
calls_after_v2 = approve_calls["n"]
|
||||||
|
|
||||||
|
v3_zip = _make_skill_zip("reuseme", body="V3 body unique " * 30)
|
||||||
|
r = web_client.put(
|
||||||
|
f"/api/store/entities/{eid}",
|
||||||
|
files={"file": ("v3.zip", v3_zip, "application/zip")},
|
||||||
|
cookies=owner_cookies,
|
||||||
|
)
|
||||||
|
assert r.status_code == 200, r.text
|
||||||
|
conn = get_system_db()
|
||||||
|
v3_sub_id = StoreSubmissionsRepository(conn).latest_for_entity(eid)["id"]
|
||||||
|
conn.close()
|
||||||
|
run_llm_review(
|
||||||
|
v3_sub_id,
|
||||||
|
plugin_dir=Path(get_store_dir()) / eid / "versions" / "v3" / "plugin",
|
||||||
|
conn_factory=get_system_db,
|
||||||
|
api_key_loader=lambda: "sk",
|
||||||
|
model_loader=lambda: "claude-haiku-4-5-test",
|
||||||
|
)
|
||||||
|
calls_after_v3 = approve_calls["n"]
|
||||||
|
# BG task in TestClient may fire run_llm_review on its own
|
||||||
|
# in addition to our direct call, so don't assert exact count
|
||||||
|
# difference — only check the restore window.
|
||||||
|
assert calls_after_v3 > calls_after_v2
|
||||||
|
|
||||||
|
# Restore v2 → byte-identical to v2 approved entry. Reuse
|
||||||
|
# must fire → no new LLM call.
|
||||||
|
r = web_client.post(
|
||||||
|
f"/api/store/entities/{eid}/versions/2/restore",
|
||||||
|
cookies=owner_cookies,
|
||||||
|
)
|
||||||
|
assert r.status_code == 200, r.text
|
||||||
|
assert approve_calls["n"] == calls_after_v3, (
|
||||||
|
f"restore of approved version must NOT call LLM; "
|
||||||
|
f"count grew from {calls_after_v3} to {approve_calls['n']}"
|
||||||
|
)
|
||||||
|
|
||||||
|
conn = get_system_db()
|
||||||
|
ent_after = StoreEntitiesRepository(conn).get(eid)
|
||||||
|
v4_sub_id = next(
|
||||||
|
(e["submission_id"] for e in ent_after["version_history"]
|
||||||
|
if int(e["n"]) == 4),
|
||||||
|
None,
|
||||||
|
)
|
||||||
|
v4_sub = StoreSubmissionsRepository(conn).get(v4_sub_id) if v4_sub_id else None
|
||||||
|
conn.close()
|
||||||
|
assert v4_sub is not None
|
||||||
|
assert v4_sub["status"] == "approved"
|
||||||
|
assert v4_sub["reviewed_by_model"] == "claude-haiku-4-5-test"
|
||||||
|
assert (v4_sub["llm_findings"] or {}).get(
|
||||||
|
"reused_from_submission_id"
|
||||||
|
) == v2_sub_id
|
||||||
|
assert ent_after["version_no"] == 4
|
||||||
|
|
||||||
|
def test_restore_legacy_v1_falls_back_to_llm(
|
||||||
|
self, web_client, monkeypatch,
|
||||||
|
):
|
||||||
|
"""v1 seed (guardrails-OFF approval, no reviewed_by_model) is
|
||||||
|
NOT eligible for reuse. Restoring v1 must schedule a real
|
||||||
|
LLM review under guardrails-on."""
|
||||||
|
from pathlib import Path
|
||||||
|
from app.utils import get_store_dir
|
||||||
|
from src.repositories.store_submissions import StoreSubmissionsRepository
|
||||||
|
from src.store_guardrails.runner import run_llm_review
|
||||||
|
|
||||||
|
monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-fake-fallback")
|
||||||
|
owner_id, owner_cookies = _create_user(web_client, "noreuse@x.com")
|
||||||
|
eid = _upload_clean(web_client, owner_cookies, name="noreuse")
|
||||||
|
|
||||||
|
monkeypatch.setattr(
|
||||||
|
"app.api.store.get_guardrails_enabled", lambda: True,
|
||||||
|
)
|
||||||
|
monkeypatch.setattr(
|
||||||
|
"app.api.store.get_guardrails_llm_provider_ready", lambda: True,
|
||||||
|
)
|
||||||
|
monkeypatch.setattr(
|
||||||
|
"app.api.store.get_guardrails_review_model",
|
||||||
|
lambda: "claude-haiku-4-5-test",
|
||||||
|
)
|
||||||
|
# Stub the BG-task scheduler so only our direct `run_llm_review`
|
||||||
|
# call writes the verdict (real BG uses default_model_loader
|
||||||
|
# which resolves to a different reviewed_by_model and would
|
||||||
|
# win the terminal-state CAS race).
|
||||||
|
monkeypatch.setattr(
|
||||||
|
"app.api.store._schedule_llm_review",
|
||||||
|
lambda *a, **kw: None,
|
||||||
|
)
|
||||||
|
|
||||||
|
approve_calls = {"n": 0}
|
||||||
|
|
||||||
|
def mock_approve(*a, **kw):
|
||||||
|
approve_calls["n"] += 1
|
||||||
|
return {
|
||||||
|
"risk_level": "safe", "summary": "ok",
|
||||||
|
"findings": [], "template_placeholders_found": 0,
|
||||||
|
"reviewed_by_model": "claude-haiku-4-5-test",
|
||||||
|
"error": None,
|
||||||
|
"content_quality": {"verdict": "pass", "issues": []},
|
||||||
|
}
|
||||||
|
monkeypatch.setattr(
|
||||||
|
"src.store_guardrails.llm_review.review_bundle", mock_approve,
|
||||||
|
)
|
||||||
|
|
||||||
|
v2_zip = _make_skill_zip("noreuse", body="V2 body " * 30)
|
||||||
|
r = web_client.put(
|
||||||
|
f"/api/store/entities/{eid}",
|
||||||
|
files={"file": ("v2.zip", v2_zip, "application/zip")},
|
||||||
|
cookies=owner_cookies,
|
||||||
|
)
|
||||||
|
assert r.status_code == 200, r.text
|
||||||
|
conn = get_system_db()
|
||||||
|
v2_sub_id = StoreSubmissionsRepository(conn).latest_for_entity(eid)["id"]
|
||||||
|
conn.close()
|
||||||
|
run_llm_review(
|
||||||
|
v2_sub_id,
|
||||||
|
plugin_dir=Path(get_store_dir()) / eid / "versions" / "v2" / "plugin",
|
||||||
|
conn_factory=get_system_db,
|
||||||
|
api_key_loader=lambda: "sk",
|
||||||
|
model_loader=lambda: "claude-haiku-4-5-test",
|
||||||
|
)
|
||||||
|
calls_after_v2 = approve_calls["n"]
|
||||||
|
|
||||||
|
r = web_client.post(
|
||||||
|
f"/api/store/entities/{eid}/versions/1/restore",
|
||||||
|
cookies=owner_cookies,
|
||||||
|
)
|
||||||
|
assert r.status_code == 200, r.text
|
||||||
|
conn = get_system_db()
|
||||||
|
v3_sub_id = StoreSubmissionsRepository(conn).latest_for_entity(eid)["id"]
|
||||||
|
conn.close()
|
||||||
|
run_llm_review(
|
||||||
|
v3_sub_id,
|
||||||
|
plugin_dir=Path(get_store_dir()) / eid / "versions" / "v3" / "plugin",
|
||||||
|
conn_factory=get_system_db,
|
||||||
|
api_key_loader=lambda: "sk",
|
||||||
|
model_loader=lambda: "claude-haiku-4-5-test",
|
||||||
|
)
|
||||||
|
# No reuse: BG/manual LLM calls fired (count grew).
|
||||||
|
assert approve_calls["n"] > calls_after_v2
|
||||||
|
conn = get_system_db()
|
||||||
|
v3_sub = StoreSubmissionsRepository(conn).get(v3_sub_id)
|
||||||
|
conn.close()
|
||||||
|
assert v3_sub["status"] == "approved"
|
||||||
|
assert (v3_sub["llm_findings"] or {}).get(
|
||||||
|
"reused_from_submission_id"
|
||||||
|
) is None
|
||||||
|
|
|
||||||
Loading…
Reference in a new issue