agnes-the-ai-analyst/app/web/templates/_flea_versions.html
Vojtech a694a30a5e
fix(store): surface review failures + harden publish gate (#316)
* fix(store): surface review failures + harden publish gate

Four independent fixes to the flea-market submission pipeline, all surfaced
by an admin upload that landed at status='approved' without an LLM review.

1. LLM truncation no longer pins submissions in review_error.
   - Raised MAX_RESPONSE_TOKENS 2500 → 6000 in llm_review.py
   - Added one-shot retry-with-doubled-budget in anthropic_provider.py
     (capped at 4× initial)

2. Flea detail page surfaces the latest submission's failure verdict even
   when a previously-approved version is still serving (deferred-promotion
   path). The _quarantine_banner gate widened from `visibility != approved`
   to also fire on `blocked_inline / blocked_llm / review_error`, with copy
   that distinguishes the v2+ edit case ("Latest edit failed review —
   previously approved version (vN) keeps serving") from the initial-upload
   quarantine wording.

3. Restore button + endpoint no longer allow restoring a version that was
   never approved. Added StoreEntitiesRepository.get_with_version_approvals
   joining store_submissions, gated the UI button on submission_status in
   ('approved', None), rendered status pills for non-restorable rows, and
   added a 400 version_not_approved guard in POST /restore.

4. **BREAKING (operator-facing)**: publish gate is now fail-CLOSED on
   misconfig. The previous get_guardrails_enabled() silently fell back to
   "disabled, auto-approve everything" when guardrails.enabled=true in YAML
   but no ANTHROPIC_API_KEY was in env. Split into:
     - get_guardrails_enabled()              (intent — YAML)
     - get_guardrails_llm_provider_ready()   (readiness — env)
   Three-state matrix:
     enabled=false                       → auto-approve (unchanged)
     enabled=true + ready=true           → normal pipeline (unchanged)
     enabled=true + ready=false (NEW)    → submissions hold at pending_llm
                                           awaiting admin retry or override
                                           (was: silent auto-approve)
   Admin "Retry review" eligibility broadened to include pending_llm.
   Boot-time WARNING banner surfaces the misconfig in app/main.py.
   docs/STORE_GUARDRAILS.md updated with the three-state matrix.
   Operators relying on the auto-fallback for local-dev no-LLM setups must
   now explicitly set `guardrails.enabled: false` in instance.yaml.
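
The three-state matrix above can be sketched as a pure function. This is
an illustration only, not the code in the repo: `GateDecision` and
`decide_publish_gate` are hypothetical names, and the two booleans stand
in for the return values of get_guardrails_enabled() and
get_guardrails_llm_provider_ready().

```python
from enum import Enum


class GateDecision(Enum):
    """Hypothetical outcome labels for the publish gate."""

    AUTO_APPROVE = "auto_approve"      # guardrails off: publish without review
    RUN_PIPELINE = "run_pipeline"      # guardrails on and provider ready
    HOLD_PENDING_LLM = "pending_llm"   # guardrails on, provider missing


def decide_publish_gate(enabled: bool, provider_ready: bool) -> GateDecision:
    """Map (intent, readiness) to a gate decision.

    `enabled` is the YAML intent and `provider_ready` the env readiness;
    both are plain booleans here so the sketch needs no config loader.
    """
    if not enabled:
        return GateDecision.AUTO_APPROVE
    if provider_ready:
        return GateDecision.RUN_PIPELINE
    # Fail closed: a misconfigured provider never falls back to auto-approve.
    return GateDecision.HOLD_PENDING_LLM
```

Keeping intent and readiness as separate inputs is what makes the third
state expressible; a single combined flag cannot tell "off on purpose"
apart from "on but not ready".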

Tests: 4623 passed. Added TestPublishGateFailClosed (4 tests) and
TestRestoreVersion::test_restore_rejects_* (3 tests). conftest.py adds an
autouse fixture defaulting guardrails OFF so legacy tests don't need to
know about the new toggle.

* fix(store): admin override promotes v2+ edits to current

The override handler at app/api/admin.py:3708 only flipped submission
status → 'overridden' and entity visibility → 'approved'. Under the v37+
deferred-promotion model that's insufficient for v2+ edits / restores:
the new bundle sits in versions/v<N>/plugin/ and the entity row stays at
the prior approved version_no + hash + on-disk live bundle. Installers
kept getting the OLD bytes the admin had just intended to replace.

Mirror the runner.run_llm_review auto-approval branch: look up the
submission's version_hash in entity.version_history, and if its `n`
differs from entity.version_no, promote_version + _swap_live_to_version.
Initial v1 overrides are unaffected — the loop finds n=1 == version_no
and skips promotion.
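
A minimal sketch of the hash lookup described above, assuming each
version_history entry is a dict carrying `n` and `hash` (the keys the
versions template reads). `version_no_for_hash` is a hypothetical name
and not necessarily how the handler spells it.

```python
from typing import Optional


def version_no_for_hash(version_history: list[dict], version_hash: str) -> Optional[int]:
    """Return the version number whose hash matches, or None if absent."""
    for entry in version_history:
        if entry.get("hash") == version_hash:
            return entry.get("n")
    return None


# Example: v1 is live, v2 is the staged edit being overridden.
history = [{"n": 1, "hash": "aaa111"}, {"n": 2, "hash": "bbb222"}]
target = version_no_for_hash(history, "bbb222")
current_version_no = 1
# The override only has promotion work to do when the numbers differ.
needs_promotion = target is not None and target != current_version_no
```

For an initial v1 override the lookup returns 1, which equals
version_no, so no promotion is attempted.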

Tests:
- test_override_v2_edit_promotes_to_current: stage v1 approved + v2
  blocked_llm; override the v2 sub; assert entity.version_no=2,
  entity.version flips off the v1 hash, and the live plugin/ dir
  mirrors versions/v2/plugin/.
- test_override_v1_initial_upload_no_promote: regression guard so the
  promote loop doesn't accidentally bump a v1 override.

Audit log gains a promoted_to_version_no field on the override action.

* fix(store): retry/rescan review staged bundle; override forward-only

Two adversarial-review findings from a Codex pass on the publish-gate
work.

C1. Admin retry + rescan were passing live `plugin/` to the LLM. For a
v2+ submission held at `pending_llm` / `blocked_llm` / `review_error`,
live still holds the prior approved version's bytes — so the LLM
reviewed the WRONG bytes, and the runner's hash-match promotion in
`run_llm_review` would then advance the entity to staged bytes that
were never actually reviewed. Resolve the staged
`<entity>/versions/v<N>/plugin/` from the submission's
`version_history` entry, with a fall-back to live for legacy pre-v37
rows that never seeded a versions/ dir. Helpers
`_submission_plugin_dir` and `_version_no_for_submission` added to
`app/api/store.py` so override / retry / rescan share one path.
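
The staged-versus-live resolution can be sketched as below. This is an
assumption-laden illustration: `resolve_plugin_dir` is a hypothetical
name (the real helper is `_submission_plugin_dir`, whose signature is
not shown here), and the directory layout is taken from the paths quoted
above.

```python
from pathlib import Path
from typing import Optional


def resolve_plugin_dir(entity_dir: Path, version_no: Optional[int]) -> Path:
    """Pick the bundle directory a reviewer should read for a submission.

    `version_no` is the `n` found for the submission in version_history,
    or None when no entry matched (assumed to mean a legacy row).
    """
    live = entity_dir / "plugin"
    if version_no is None:
        return live
    staged = entity_dir / "versions" / f"v{version_no}" / "plugin"
    # Legacy pre-v37 rows never seeded versions/, so fall back to live.
    return staged if staged.is_dir() else live
```

Routing override, retry, and rescan through one resolver means the bytes
that get reviewed are the same bytes that a later promotion makes live.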

H1. Override's promote loop used `target != current`, which would
silently demote the live bundle when admin overrode a stale v2
submission while v3 was already approved + live. Changed to
`target > current` so override flips status + visibility on the row
regardless, but on-disk promotion only fires forward. Same `>`
defensive guard applied in `runner.run_llm_review` so a late LLM
verdict racing with a newer approval can't demote either.
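
The forward-only rule can be sketched as a pure function over the entity
row. The dict shape and the name `apply_override` are assumptions for
illustration; the real handler also touches the on-disk bundle, which is
omitted here.

```python
def apply_override(entity: dict, target_version_no: int) -> dict:
    """Return the entity state after an admin override (no I/O).

    `entity` is assumed to carry `version_no` and `visibility` keys.
    """
    updated = dict(entity)
    # Visibility flips regardless of which version was overridden.
    updated["visibility"] = "approved"
    # Forward-only: overriding a stale v2 must not demote a live v3.
    if target_version_no > entity["version_no"]:
        updated["version_no"] = target_version_no
    return updated
```

With the earlier `!=` comparison, overriding v2 while v3 is current
would have set version_no back to 2; with `>` the row keeps v3.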

Tests:
- TestAdminRetryReviewsStagedBundle::test_retry_v2_blocked_passes_staged_dir_not_live
- TestAdminRetryReviewsStagedBundle::test_rescan_v2_blocked_passes_staged_dir_not_live
- TestOverrideForwardOnly::test_override_stale_v2_does_not_demote_when_v3_current

* review polish: CHANGELOG drift, override eligibility, defensive copy

Three small additions on top of the retry/rescan staged-bundle fix:

1. CHANGELOG: the PR's bullets had drifted into the released
   [0.54.17] section during rebase (context-match landed them next
   to already-released content). Moved them up to [Unreleased] where
   they belong; [0.54.17] now holds only what was actually released
   (refresh-marketplace ls-remote, /me/activity hero, CI sharding +
   workflow polish).

2. app/api/admin.py: admin override eligibility now accepts
   pending_llm alongside blocked_inline + blocked_llm + review_error.
   Closes a UX gap from the new fail-CLOSED behavior: under
   enabled-but-not-ready, a known-good submission would otherwise
   sit indefinitely until the admin set credentials AND clicked
   Retry. Override already routes through version_history (and is
   now forward-only on promote), so it stays safe for v2+ deferred-
   promotion submissions.

3. src/repositories/store_entities.py: get_with_version_approvals
   defensively copies each version_history entry before annotating
   with submission_status. self.get() re-parses JSON each call today
   so this is belt-and-suspenders against any future caching layer
   leaking the annotated key into a subsequent plain get() call.
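
The defensive copy in item 3 amounts to the pattern below. This is a
sketch under assumptions: `annotate_with_status` is a hypothetical name,
and keying the status lookup by version hash is a guess at how the
store_submissions join is matched.

```python
from typing import Optional


def annotate_with_status(
    version_history: list[dict],
    status_by_hash: dict[str, Optional[str]],
) -> list[dict]:
    """Return copies of the history entries with submission_status added."""
    annotated = []
    for entry in version_history:
        row = dict(entry)  # shallow copy so the caller's entry is untouched
        # Missing hashes map to None, which the UI treats as legacy-approved.
        row["submission_status"] = status_by_hash.get(entry.get("hash"))
        annotated.append(row)
    return annotated
```

Because each row is a fresh dict, the source entries never gain a
submission_status key, even if a cache later hands out the same list.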

Tests: 112 passed (focused on test_store_entity_versions +
test_admin_store_submissions, covering the retry/rescan staged-
bundle fix the author shipped + this polish).

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-15 15:52:07 +02:00


{# Versions card — owner + admin only.
Renders entity.version_history (oldest-first) reversed so newest
appears at top. Each row gets:
* version label (vN, "current" badge for the active one)
* short hash + size + created_at
* Restore button (owner + admin) for non-current versions whose
submission was approved; other non-current rows show a status pill
* Download button (admin only) — links to admin submission bundle
Required scope:
entity — store_entities row carrying version_no + version_history
is_owner — bool
is_admin — bool
Self-guards on visibility (only renders for owner/admin) and on
history length (>= 1). Plain entities created post-v37 always have
at least a v1 entry.
#}
{% if (is_owner or is_admin) and entity and entity.version_history and entity.version_history|length >= 1 %}
<style>
.versions-card {
margin: 16px 0; padding: 14px 18px; background: var(--surface, #fff);
border: 1px solid var(--border, #e5e7eb); border-radius: 10px;
font-size: 13px;
}
.versions-card h3 {
margin: 0 0 10px 0; font-size: 14px; font-weight: 600;
color: var(--text, #111827);
}
.versions-card table { width: 100%; border-collapse: collapse; }
.versions-card th, .versions-card td {
text-align: left; padding: 6px 10px; border-bottom: 1px solid var(--border-light, #f3f4f6);
vertical-align: middle;
}
.versions-card th {
font-size: 11px; text-transform: uppercase; letter-spacing: 0.4px;
color: var(--text-secondary, #6b7280); font-weight: 500;
}
.versions-card .vn { font-weight: 600; }
.versions-card .current-badge {
display: inline-block; padding: 1px 6px; border-radius: 999px;
background: #d1fae5; color: #065f46; font-size: 10px;
font-weight: 600; text-transform: uppercase; letter-spacing: 0.3px;
margin-left: 6px;
}
.versions-card code {
font-family: ui-monospace, SFMono-Regular, Menlo, monospace;
font-size: 11px; background: rgba(0,0,0,0.04); padding: 1px 4px; border-radius: 3px;
}
.versions-card button.restore {
padding: 4px 10px; border-radius: 5px;
border: 1px solid var(--border, #d1d5db); background: var(--surface, #fff);
font-size: 12px; cursor: pointer;
}
.versions-card button.restore:hover { background: var(--surface-muted, #f9fafb); }
.versions-card button.restore:disabled { opacity: 0.5; cursor: not-allowed; }
.versions-card .status-pill {
display: inline-block; padding: 1px 7px; border-radius: 999px;
font-size: 10px; font-weight: 600; text-transform: uppercase;
letter-spacing: 0.3px; vertical-align: middle;
}
.versions-card .status-pill.blocked { background: #fee2e2; color: #991b1b; }
.versions-card .status-pill.errored { background: #fef3c7; color: #92400e; }
.versions-card .status-pill.pending { background: #e0e7ff; color: #3730a3; }
</style>
<div class="versions-card">
<h3>Versions ({{ entity.version_history|length }})</h3>
<table>
<thead>
<tr>
<th>Version</th>
<th>Hash</th>
<th>Size</th>
<th>Created</th>
<th></th>
</tr>
</thead>
<tbody>
{# Render newest-first: sort by version number, descending. #}
{% set ordered = entity.version_history | sort(attribute='n', reverse=true) %}
{% for v in ordered %}
<tr>
<td class="vn">
v{{ v.n }}
{% if v.n == entity.version_no %}<span class="current-badge">current</span>{% endif %}
</td>
<td><code>{{ v.hash[:12] if v.hash else '—' }}</code></td>
<td>
{%- if v.size is not none -%}
{%- if v.size < 1024 -%}{{ v.size }} B
{%- elif v.size < 1048576 -%}{{ "%.1f"|format(v.size / 1024) }} KB
{%- else -%}{{ "%.1f"|format(v.size / 1048576) }} MB
{%- endif -%}
{%- else -%}{%- endif -%}
</td>
<td style="white-space: nowrap; color: var(--text-secondary, #6b7280);">
{{ v.created_at[:10] if v.created_at else '' }}
</td>
<td>
{# Restore only when the version was actually approved.
``submission_status=None`` is legacy v1 (seeded pre-v37
before submission_id backfill) — treat as approved.
Anything else (blocked_inline / blocked_llm /
review_error / pending_*) renders a pill instead so the
owner understands why no button. #}
{% set _sub_status = v.submission_status %}
{% set _is_restorable = _sub_status in ['approved', None] %}
{% if v.n != entity.version_no %}
{% if _is_restorable %}
<button class="restore" type="button"
data-version-no="{{ v.n }}"
onclick="restoreVersion({{ v.n }})">
Restore
</button>
{% elif _sub_status in ['blocked_inline', 'blocked_llm'] %}
<span class="status-pill blocked"
title="This version was blocked by guardrails — restore disabled.">
blocked
</span>
{% elif _sub_status == 'review_error' %}
<span class="status-pill errored"
title="The security reviewer errored on this version — restore disabled.">
errored
</span>
{% elif _sub_status in ['pending_inline', 'pending_llm'] %}
<span class="status-pill pending"
title="This version is still under review.">
pending
</span>
{% endif %}
{% endif %}
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
<script>
async function restoreVersion(versionNo) {
const currentVersionNo = {{ entity.version_no | tojson }};
const ok = confirm(
`Restore version ${versionNo}?\n\n` +
`This will create a new version (v${currentVersionNo + 1}) with v${versionNo}'s ` +
`bundle and re-run security checks. The current version stays in history.`
);
if (!ok) return;
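const entityId = {{ entity.id | tojson }};
let r;
try {
r = await fetch(
`/api/store/entities/${encodeURIComponent(entityId)}/versions/${versionNo}/restore`,
{method: 'POST', credentials: 'same-origin'}
);
} catch (_) {
// Network-level failure: fetch rejected before any HTTP response arrived.
alert('Restore failed: could not reach the server.');
return;
}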
if (r.ok) {
window.location.reload();
return;
}
let msg = 'Restore failed.';
try {
const j = await r.json();
if (j.detail) {
const code = j.detail.code || '';
const checks = j.detail.checks || {};
if (code === 'validation_failed') {
// Stay on the detail page; surface manifest/content issues.
// The restored version's source bundle predates today's
// rules — admin or owner can either fix the source version
// or restore a different one.
const issues = (checks.manifest?.issues || [])
.concat((checks.content?.issues || []).map(i => i.code || 'issue'));
msg = 'Restore blocked: today\'s validation rules reject the v'
+ versionNo + ' bundle.';
if (issues.length) msg += '\n• ' + issues.slice(0, 5).join('\n• ');
} else if (code === 'security_blocked') {
const findings = (checks.static_security?.findings) || [];
msg = 'Restore blocked: security review found risky patterns in the v'
+ versionNo + ' bundle.';
if (findings.length) {
msg += '\n' + findings.slice(0, 5).map(f =>
'• ' + (f.file || '?') + ':' + (f.line || '?') + ' — ' + (f.reason || f.category || '')
).join('\n');
}
} else if (code === 'submission_blocked') {
// Legacy server response (pre-cutover). Land on the detail
// page so the existing quarantine banner UX still works.
const eid = j.detail.entity_id || {{ entity.id | tojson }};
window.location = `/marketplace/flea/${encodeURIComponent(eid)}`;
return;
} else if (code === 'prior_version_pending') {
msg = 'A previous edit is still under review. Wait for the verdict before restoring.';
} else if (code === 'version_not_approved') {
const src = j.detail.source_status || 'not approved';
msg = `Restore blocked: v${versionNo} was never approved `
+ `(status: ${src}). Pick a version that passed review.`;
} else if (code === 'version_not_found') {
msg = 'That version is no longer on disk.';
} else if (code === 'already_current') {
msg = 'Already on that version.';
} else if (typeof j.detail === 'object') {
msg = `Restore failed: ${code || JSON.stringify(j.detail)}`;
} else {
msg = `Restore failed: ${j.detail}`;
}
}
} catch (_) { /* Non-JSON error body: keep the generic message. */ }
alert(msg);
}
</script>
{% endif %}