agnes-the-ai-analyst/app/web/templates/_quarantine_banner.html
Vojtech fb6e930bc9
feat(store-guardrails): per-component description quality + plain-language UX (#276)
* feat(store-guardrails): enforce per-component description quality

Two-tier hard guardrail on flea-market submissions. Empty / placeholder /
single-word descriptions now block before any LLM call; vague-but-passes-
floor descriptions block on the substantive LLM review layer.

Tier 1 — inline mechanical check (src/store_guardrails/content_check.py).
Walks the baked plugin tree, evaluates each component (plugin manifest,
agents, skills, commands) plus the submission-level form description
against a 60-char / 25-char (commands) / 5-distinct-word / 200-char-body
floor with a placeholder denylist (TODO, TBD, {{var}}, etc.). Floors
calibrated against real ecosystem norms: Claude / superpowers /
compound-engineering skill packs cluster 150–220 chars, npm / Docker /
VS Code at 100–120. InlineResult.passed now ANDs in content.status.
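
A minimal sketch of the Tier 1 floor for a single description, assuming the
thresholds quoted above. The function name `check_description`, the constant
names, the denylist regex, and every failure code except `too_short` are
illustrative assumptions, not the contents of content_check.py:

```python
import re

# Thresholds quoted in the commit message; the constant names are hypothetical.
MIN_DESC_CHARS = 60
MIN_COMMAND_DESC_CHARS = 25
MIN_DISTINCT_WORDS = 5

# Illustrative denylist only; the real list lives in content_check.py.
_PLACEHOLDER_RE = re.compile(r"\b(TODO|TBD)\b|\{\{.*?\}\}", re.IGNORECASE)


def check_description(text, component_type="skill"):
    """Return a list of failure codes; an empty list means the floor is met."""
    text = (text or "").strip()
    if not text:
        return ["empty"]
    issues = []
    if _PLACEHOLDER_RE.search(text):
        issues.append("placeholder")
    # Commands get the shorter floor, every other component the 60-char one.
    floor = MIN_COMMAND_DESC_CHARS if component_type == "command" else MIN_DESC_CHARS
    if len(text) < floor:
        issues.append("too_short")
    distinct = {w.lower() for w in re.findall(r"[A-Za-z0-9']+", text)}
    if len(distinct) < MIN_DISTINCT_WORDS:
        issues.append("too_few_words")
    return issues
```

Under these assumptions a 31-character description passes as a command but
fails as a skill, which is the asymmetry the 60-char / 25-char split implies.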

Tier 2 — LLM review extension (prompts.py + llm_review.py). System
prompt gains a content-quality criterion; REVIEW_JSON_SCHEMA carries a
content_quality {verdict, issues[]} object alongside the existing
security findings. is_safe() requires content_quality.verdict == 'pass'.
Single LLM call covers both dimensions. MAX_RESPONSE_TOKENS bumped
2000 → 2500 for the extra payload. Verdicts missing content_quality
treated as pass (backwards compat with already-recorded rows).
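
The backwards-compat rule can be sketched as below. This is an assumption-laden
illustration: `content_quality_passes` and `is_safe_sketch` are hypothetical
names, and `security_ok` stands in for the existing security criterion, which
this message does not spell out:

```python
def content_quality_passes(review: dict) -> bool:
    """True when content_quality.verdict is 'pass' or the object is absent."""
    cq = review.get("content_quality")
    if cq is None:
        # Rows recorded before this change carry no content_quality object.
        return True
    return cq.get("verdict") == "pass"


def is_safe_sketch(review: dict, security_ok: bool) -> bool:
    # security_ok is a placeholder for the pre-existing security decision.
    return security_ok and content_quality_passes(review)
```

Treating the missing field as a pass keeps already-recorded verdicts readable
without a data backfill, at the cost of not re-judging old rows.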

Submitter UX:
- /store/new wizard now carries a "Before you upload — what passes
  review" collapsible disclosure on both step 1 and step 2 with the
  bar + patterns that work. Live char counter on the description
  field. Per-component preview table (green/red dots from the new
  summarize_for_preview helper) renders after the ZIP /preview round
  trip, scoping each finding to its file.
- New /store/examples page with rejected/passes pairs for skill /
  agent / plugin / command plus a "Why these limits" research table.
  Anchored sections (#skill / #agent / #plugin / #command) so the
  rejection banner can deep-link by component_type.
- Quarantine banner _content_findings.html groups findings by file
  (one "See <type> example ↗" per component, not per field) and
  translates field codes (frontmatter.description / body / etc.) to
  plain-English labels. _content_howto_fix.html surfaces a static
  "Re-upload as new version" + "See examples" action row beneath any
  content failure on the entity detail page.
- _parse_frontmatter moved to src/store_guardrails/_frontmatter.py so
  the new check module shares the parser without inverting the
  app → src dependency direction.

Tests:
- New tests/test_store_guardrails_content.py (29 cases) covering
  every failure code per component type plus submission-level checks
  and the summarize_components / summarize_for_preview helpers.
- Extended test_store_guardrails_inline.py for the new
  InlineResult.content field + aggregate behaviour.
- Extended test_store_guardrails_llm.py for the new
  content_quality verdict pathways (fail blocks, missing field passes).
- Backfilled fixture descriptions across test_store_api.py,
  test_store_entity_versions.py, test_store_put_atomic.py,
  test_admin_store_submissions.py, test_marketplace_api.py,
  test_marketplace_v32_endpoints.py so existing happy-path tests
  clear the new 60-char floor.

* fix(content-guardrail): align agents walker with preview + drop import-time .format()

Two cleanups from the takeover review on #276 (vr/guardrails-content).

1) `_iter_components` for agents now skips files lacking frontmatter
   (no `name` AND no `description`). Pre-fix the walker greedily
   evaluated every `*.md` under `agents/` — `agents/README.md` and
   helper docs got flagged as "frontmatter.description empty"
   rejections. Worse: `summarize_for_preview` for `type=agent` ALREADY
   filters the same shape, so the upload preview gave a green dot
   while the post-bake check gave a red rejection on submit. Two new
   regression tests in TestAgentsWalkerSkipsNonAgentFiles pin both
   shapes (README + _NOTES.md) so the preview/check parity stays
   aligned.

2) `body_too_short` hints now use the same runtime-kwarg substitution
   pattern as every other hint in the table. Pre-fix the skill +
   agent body_too_short hints called `.format(min_chars=_MIN_BODY_CHARS)`
   at module-load time, but the call site `_hint_for(type_,
   "body_too_short")` didn't pass `min_chars=`, so the format() was
   just baking the constant at import. Cosmetic inconsistency; pass
   `min_chars=_MIN_BODY_CHARS` at the call site instead and let
   `_hint_for` do the substitution like it does for `too_short`.

Verified end-to-end:
- New TestAgentsWalkerSkipsNonAgentFiles cases fail on the unfixed
  walker (verified by reverting to the pre-fix file and re-running);
  pass cleanly after the fix.
- Full content-guardrail suite: 25/25 (23 existing + 2 new).
- Full pytest: 4189 passed, 25 skipped.

* release: 0.53.5 — content guardrail (flea-market submitter UX) + catalog ENTITY column + BQ hint dispatch

Bundles three threads landed in [Unreleased]:
- Vojta's flea-market content guardrail (two-tier mechanical + LLM)
- Zdeněk's `agnes catalog` ENTITY column replacement for FLAVOR
- Zdeněk's `/api/query` remote_estimate_failed hint dispatch fix

Plus the takeover hygiene from #276 review (agents walker preview/check
parity + body_too_short hint runtime kwarg consistency) and the
backslash-escape fix follow-up to v0.53.4 #275.

No DB migration; no API change. Patch upgrade lands transparently.
Upload form's new "Before you upload" disclosure + per-component preview
table appear on the next dev-VM auto-pull. Quarantine banner now groups
findings by file with "See <type> example ↗" deep-links to the new
/store/examples reference page.

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-12 21:48:27 +02:00

281 lines
11 KiB
HTML

{# Shared quarantine banner partial.
Surfaces submission status (under review / quarantined / hidden /
override-applied) to the entity owner + admins. Self-guarded so it's
safe to {% include %} from any detail page — renders nothing when
the entity is approved or the viewer isn't owner/admin.
Required scope:
entity — store_entities row (must carry visibility_status,
version_no; entity.id surfaces in admin
detail link)
quarantine_sub — latest store_submissions row for entity, or None
is_owner — bool, viewer == entity.owner_user_id
is_admin — bool, viewer is in Admin group
Mirror of the version that previously lived in store_detail.html.
Wording stays consistent with the per-status messaging the user
approved earlier — only the rendering location changed.
#}
{% if entity.visibility_status != 'approved' and (is_owner or is_admin) %}
<style>
.vis-banner {
margin: 12px 0 16px 0;
padding: 14px 18px;
border-radius: 10px;
font-size: 14px;
border: 1px solid;
}
.vis-banner.pending { background: #fef3c7; color: #92400e; border-color: #fde68a; }
.vis-banner.blocked { background: #fee2e2; color: #991b1b; border-color: #fecaca; }
.vis-banner.hidden { background: #e5e7eb; color: #374151; border-color: #d1d5db; }
.vis-banner h3 { margin: 0 0 6px 0; font-size: 15px; font-weight: 600; }
.vis-banner ul { margin: 6px 0 0 0; padding-left: 20px; font-size: 13px; }
.vis-banner code { background: rgba(0,0,0,0.06); padding: 1px 6px; border-radius: 4px; font-size: 12px; }
.vis-banner .actions { margin-top: 10px; }
.vis-banner .actions a {
display: inline-block; padding: 5px 12px; border-radius: 6px;
background: rgba(0,0,0,0.08); color: inherit; text-decoration: none;
font-size: 12px; font-weight: 500;
}
</style>
{% set sub = quarantine_sub %}
{% set st = sub.status if sub else entity.visibility_status %}
{% set bcls = 'pending' if st in ['pending_inline','pending_llm','pending']
else ('blocked' if st in ['blocked_inline','blocked_llm','review_error']
else 'hidden') %}
<div class="vis-banner {{ bcls }}">
{% if st == 'pending_llm' or st == 'pending_inline' or st == 'pending' %}
{% set _is_edit_review = entity.version_no and entity.version_no > 1 %}
{% if _is_edit_review %}
<h3>⟳ Version {{ entity.version_no }} under review</h3>
<div>
Your edit is being checked. The previously approved version
(v{{ entity.version_no - 1 }}) keeps serving to existing
installers until v{{ entity.version_no }} passes review. The
page refreshes automatically when the verdict lands.
</div>
{% else %}
<h3>⟳ Under review</h3>
<div>
Your submission is being checked. It is hidden from the public
Store and from anyone else's view until all checks pass. Page
refreshes automatically when the verdict lands — usually a few
seconds.
</div>
{% endif %}
{% elif st == 'blocked_inline' %}
<h3>⚠ Quarantined — automated checks failed</h3>
<div>
Your submission failed at least one automated check and has been
quarantined. It is hidden from the public Store and from every
other user; nobody can install it. Fix the issues below and
re-upload to retry, or wait for an admin to resolve the
quarantine.
</div>
{% if sub and sub.inline_checks %}
{% set ic = sub.inline_checks %}
{% if ic.manifest and ic.manifest.issues %}
<ul>
{% for issue in ic.manifest.issues %}<li>manifest: <code>{{ issue }}</code></li>{% endfor %}
</ul>
{% endif %}
{% if ic.static_security and ic.static_security.findings %}
<ul>
{% for f in ic.static_security.findings[:6] %}
<li>security: <code>{{ f.file }}:{{ f.line }}</code> — {{ f.reason }}</li>
{% endfor %}
{% if ic.static_security.findings|length > 6 %}
<li><em>… and {{ ic.static_security.findings|length - 6 }} more</em></li>
{% endif %}
</ul>
{% endif %}
{% if ic.content and ic.content.issues %}
{% include "_content_findings.html" with context %}
{% endif %}
{% endif %}
{% elif st == 'blocked_llm' %}
<h3>⚠ Quarantined — review flagged issues</h3>
<div>
The reviewer flagged this submission for security risk and/or
weak component descriptions. It is hidden from the public Store
and from every other user; nobody can install it. Address the
findings below and re-upload, or wait for an admin to resolve
the quarantine.
</div>
{% if sub and sub.llm_findings %}
{% if sub.llm_findings.summary %}
<div style="margin-top: 6px;"><em>{{ sub.llm_findings.summary }}</em></div>
{% endif %}
{% if sub.llm_findings.findings %}
<div style="margin-top: 8px; font-weight: 600;">Security findings</div>
<ul>
{% for f in sub.llm_findings.findings[:6] %}
<li>[{{ f.severity }}] <code>{{ f.file }}</code> — {{ f.explanation }}</li>
{% endfor %}
</ul>
{% endif %}
{% if sub.llm_findings.content_quality and sub.llm_findings.content_quality.issues %}
<div style="margin-top: 10px; font-weight: 600;">Description quality — reviewer suggestions</div>
<ul>
{% for issue in sub.llm_findings.content_quality.issues[:8] %}
<li>
<code>{{ issue.file }}</code> — {{ issue.issue }}
{% if issue.hint %}
<div style="margin: 4px 0 0 0; font-size: 12px; opacity: 0.85;">
<strong>Rewrite:</strong> {{ issue.hint }}
</div>
{% endif %}
</li>
{% endfor %}
</ul>
{% endif %}
{% endif %}
{% elif st == 'review_error' %}
<h3>⚠ Under review — security check errored</h3>
<div>
The security reviewer couldn't complete its check. The submission
stays hidden until an admin retries. No action needed from you.
</div>
{% if sub and sub.llm_findings and sub.llm_findings.error %}
<div style="margin-top: 6px; font-family: ui-monospace, SFMono-Regular, Menlo, monospace; font-size: 12px; word-break: break-word;">
Error: {{ sub.llm_findings.error }}
</div>
{% endif %}
{# Surface any inline-check findings that were captured before the
LLM step errored — gives the submitter something concrete to
look at instead of a bare "errored" message. #}
{% if sub and sub.inline_checks %}
{% set ic = sub.inline_checks %}
{% if ic.static_security and ic.static_security.findings %}
<ul>
{% for f in ic.static_security.findings[:6] %}
<li>security: <code>{{ f.file }}:{{ f.line }}</code> — {{ f.reason }}</li>
{% endfor %}
</ul>
{% endif %}
{% endif %}
{% elif st == 'overridden' %}
<h3>✓ Admin override applied</h3>
<div>This submission was force-published by an admin.</div>
{% if sub and sub.override_reason %}
<div style="margin-top: 6px; font-size: 13px;">
<em>Override reason:</em> {{ sub.override_reason }}
</div>
{% endif %}
{% else %}
{# Fallback for hidden / unexpected lifecycle states. Surface
whatever verdict context the submission row carries so an
admin doesn't see a bare "Hidden" with no actionable detail. #}
<h3>Hidden</h3>
<div>
This entity is not visible in the public Store
(<code>visibility_status = "{{ entity.visibility_status }}"</code>).
</div>
{% if sub and sub.inline_checks %}
{% set ic = sub.inline_checks %}
{% if ic.manifest and ic.manifest.issues %}
<ul>
{% for issue in ic.manifest.issues %}<li>manifest: <code>{{ issue }}</code></li>{% endfor %}
</ul>
{% endif %}
{% if ic.static_security and ic.static_security.findings %}
<ul>
{% for f in ic.static_security.findings[:6] %}
<li>security: <code>{{ f.file }}:{{ f.line }}</code> — {{ f.reason }}</li>
{% endfor %}
</ul>
{% endif %}
{% endif %}
{% if sub and sub.llm_findings %}
{% if sub.llm_findings.summary %}
<div style="margin-top: 6px;"><em>{{ sub.llm_findings.summary }}</em></div>
{% endif %}
{% if sub.llm_findings.findings %}
<ul>
{% for f in sub.llm_findings.findings[:6] %}
<li>[{{ f.severity }}] <code>{{ f.file }}</code> — {{ f.explanation }}</li>
{% endfor %}
</ul>
{% endif %}
{% endif %}
{% endif %}
{# How-to-fix panel — render once below the per-tier findings whenever
content-quality issues exist on either tier. Same guidance regardless
of whether the inline mechanical check or the LLM substantive check
rejected the submission. #}
{% if sub and ((sub.inline_checks and sub.inline_checks.content and sub.inline_checks.content.issues)
or (sub.llm_findings and sub.llm_findings.content_quality and sub.llm_findings.content_quality.issues)) %}
{% include "_content_howto_fix.html" with context %}
{% endif %}
{% if is_admin and sub %}
<div class="actions">
<a href="/admin/store/submissions/{{ sub.id }}">Open submission detail →</a>
</div>
{% endif %}
</div>
{# Auto-refresh while the verdict is pending. Banner copy promises
"page refreshes automatically when the verdict lands" — this is
what does it. Polls the owner-accessible flea detail endpoint and
reloads when EITHER visibility flips off 'pending' OR the
submission verdict flips off 'pending_inline' / 'pending_llm'.
Both signals are needed because `blocked_llm` keeps the entity at
`visibility_status='pending'` (admin can override → publish), so
visibility alone doesn't fire. Only emits the script while the
verdict itself is still pending; terminal states render the
final banner copy and don't need to reload. #}
{% if quarantine_sub and quarantine_sub.status in ['pending_inline', 'pending_llm'] %}
<script>
(function () {
const entityId = {{ entity.id|tojson }};
const initialSubStatus = {{ quarantine_sub.status|tojson }};
const initialVisibility = {{ entity.visibility_status|tojson }};
let attempts = 0;
async function tick() {
attempts++;
try {
const r = await fetch(`/api/marketplace/flea/${entityId}/detail`, {
credentials: 'same-origin',
headers: {'Accept': 'application/json'},
});
if (r.ok) {
const data = await r.json();
const subFlipped = data.submission_status
&& data.submission_status !== initialSubStatus
&& data.submission_status !== 'pending_inline'
&& data.submission_status !== 'pending_llm';
const visFlipped = data.visibility_status
&& data.visibility_status !== initialVisibility;
if (subFlipped || visFlipped) {
window.location.reload();
return;
}
} else if (r.status === 404) {
// Entity might have been archived/deleted — reload so the
// page refetches and renders the new state (or a 404).
window.location.reload();
return;
}
} catch (e) { /* network blip; keep polling */ }
// First 30 attempts at 3s = 90s of fast polling, then back off
// to 10s. Same cadence as admin detail polling so an LLM review
// on Sonnet/Opus has room to land.
const next = attempts < 30 ? 3000 : 10000;
setTimeout(tick, next);
}
setTimeout(tick, 3000);
})();
</script>
{% endif %}
{% endif %}