agnes-the-ai-analyst/app/web/templates/store_examples.html
Vojtech 50a974f196
feat(store-guardrails): admin-configurable content thresholds (#281)
* feat(store-guardrails): admin-configurable content thresholds

Adds the flea-market content guardrail floors to the /admin/server-config
editor so operators can tune the bar without code changes. Defaults are
unchanged (60 chars description, 25 chars command, 5 distinct words, 200
chars body) — patching guardrails.* in instance.yaml or via the admin UI
overrides any of them and the next inline check picks up the new value.

src/store_guardrails/content_check.py now resolves the four floors via
helper functions (_min_desc_chars / _min_command_desc_chars /
_min_distinct_words / _min_body_chars) that read app.instance_config at
call time. Module-level _DEFAULT_* constants remain as fallbacks if
the import fails (defensive — keeps the guardrail module loadable
without the app package on its path).

app/instance_config.py grows four matching getters returning the live
value with sane defaults + integer coercion.
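
The coercion rules can be sketched as a single helper. This is an
illustration only: `get_min_int` and the plain-dict config are stand-ins
(assumptions), not the actual getters in app/instance_config.py. The
`max(1, int(val))` floor and the (TypeError, ValueError) fallback follow
the behavior described elsewhere in this message:

```python
from typing import Any, Mapping


def get_min_int(config: Mapping[str, Any], key: str, default: int) -> int:
    """Return guardrails.<key> as an int >= 1, or `default` if unusable."""
    section = config.get("guardrails")
    if not isinstance(section, Mapping):
        return default  # section missing or malformed
    val = section.get(key)
    if val is None:
        return default  # knob not configured
    try:
        # Numeric-looking strings coerce; zero and negatives are pinned to 1.
        return max(1, int(val))
    except (TypeError, ValueError):
        # Garbage values fall back to the documented default.
        return default
```

Note that `int(True)` is 1 and `int(2.7)` is 2, so bools and floats
coerce silently under this scheme.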

app/api/admin.py registers 'guardrails' as an editable section + ships
nine known-fields entries (min_description_chars,
min_command_description_chars, min_distinct_words, min_body_chars,
enabled, review_model, blocked_quota_per_day, blocked_bundle_ttl_days,
stuck_review_grace_seconds) with operator-facing hint copy explaining
what each knob does.

app/web/templates/admin_server_config.html gets a SECTION_META entry
so the section renders as 'Flea-market guardrails' with a help string
instead of a bare section ID.

app/web/router.py threads the live thresholds into /store/new and
/store/examples via a small _guardrail_thresholds() helper so the
disclosure copy, char counter, and "Why these limits" table render
the configured value (not a hardcoded 60). End-to-end smoke verified:
PATCH guardrails.min_description_chars=90 → /store/new immediately
renders "90 characters" + JS DESC_MIN=90 on the next request, no
restart required (helpers read live config per call).

* chore(store-guardrails): address PR review safe-fix findings

Code-review safe_auto findings on PR #281 (review run
20260513-100126-64052520):

- CHANGELOG: add Unreleased entry covering the new
  /admin/server-config Flea-market guardrails section, the four live
  threshold getters, and the route-helper rendering knobs. Required by
  the project's non-negotiable "Changelog discipline" rule.
- content_check.py: narrow `except Exception` to `except ImportError`
  on the four `_min_*()` resolver helpers. Surface-level TypeError /
  ValueError on a malformed YAML value belongs to the
  instance_config getters' own try/except — the resolvers should only
  defend against the in-tree import itself failing, not silently
  swallow real bugs in the getters.
- store_upload.html: refresh the stale "30-char threshold" comment to
  reflect the configurable floor (default 60), and add `|default(60)`
  / `|default(25)` / `|default(5)` filters to the disclosure-copy
  bindings so the upload form matches store_examples.html's
  belt-and-suspenders rendering if a future route ever renders the
  template without populating the `guardrail` context.
- router.py: tighten `_guardrail_thresholds()` return annotation
  from bare `dict` to `dict[str, int]`.

Residual work (left for separate change after operator direction):
- Add round-trip test (PATCH guardrails -> next inline check uses
  new value) — primary testing gap.
- Decide policy on `min_*=0` (currently coerced to 1 via
  `max(1, int(val))`) vs treating 0 as a disable sentinel like
  neighbour getters (`blocked_quota_per_day`,
  `blocked_bundle_ttl_days`).
- Add POST-time integer validation for `guardrails.*` so a typo'd
  YAML value (bool / string / float) errors loudly instead of
  silently falling back to the default.
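
One possible shape for the POST-time validation listed above, as a
sketch only. `validate_guardrail_int` is a hypothetical name, and the
handling of 0 is left open because that policy is still undecided:

```python
def validate_guardrail_int(key: str, value: object) -> int:
    """Reject anything that is not a plain int before it reaches the config."""
    # bool is a subclass of int in Python, so it has to be excluded first.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(
            f"guardrails.{key} must be an integer, got {type(value).__name__}: {value!r}"
        )
    return value
```

With a check like this, `"90"`, `90.5`, and `true` would be rejected at
write time instead of being coerced or replaced by the default later.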

* test(store-guardrails): cover admin-configurable thresholds + PATCH round-trip

Closes the "primary testing gap" Vojta noted in the safe-fix commit
on PR #281 — the four new `get_guardrails_min_*` getters and the
PATCH-takes-effect-on-next-check live-config flow had no direct
coverage.

10 new tests in `tests/test_store_guardrails_admin_config.py`:

- TestGuardrailGetterDefaults (4 tests) — each new getter returns the
  documented default (60 / 25 / 5 / 200) when nothing is configured.
- TestGuardrailGetterOverlay (5 tests) — overlay-driven overrides win,
  string values that look numeric coerce via int(), garbage strings
  fall back to default via the (TypeError, ValueError) branch, and the
  `max(1, int(val))` floor pins zero/negative inputs to 1.
- TestPatchRoundTrip (1 test) — PATCH `/api/admin/server-config`
  `guardrails.min_description_chars=90`, then call content_check
  against a 75-char description that previously passed: must now fail
  with `too_short`. Then PATCH back to 60 and verify the next check
  passes again. Closes the cache-invalidation contract Vojta relies on
  for the "no app restart" claim — broken without the
  reset_cache() bracket in /api/admin/server-config.
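
The round-trip contract can be illustrated with an in-memory model. All
names here except `reset_cache` and `too_short` are stand-ins invented
for the sketch; the real test goes through the HTTP endpoint and
content_check:

```python
from typing import Optional

# Stands in for the persisted instance config (assumption for this sketch).
_stored = {"guardrails": {"min_description_chars": 60}}
_cache: Optional[dict] = None


def reset_cache() -> None:
    global _cache
    _cache = None


def load_config() -> dict:
    """Return a cached snapshot of the stored config."""
    global _cache
    if _cache is None:
        _cache = {name: dict(values) for name, values in _stored.items()}
    return _cache


def patch_server_config(section: str, values: dict) -> None:
    """Stand-in for PATCH /api/admin/server-config."""
    _stored.setdefault(section, {}).update(values)
    reset_cache()  # without this, the next check still sees the old floor


def check_description(description: str) -> Optional[str]:
    floor = load_config()["guardrails"]["min_description_chars"]
    return "too_short" if len(description) < floor else None
```

A 75-character description passes at the default floor of 60, fails
with `too_short` after patching the floor to 90, and passes again once
the floor is patched back to 60.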

The TestGuardrailGetterOverlay.test_zero_or_negative_floored_to_one
test pins the current `max(1, int(val))` policy. Vojta's safe-fix
commit explicitly left "policy on min_*=0 vs disable-sentinel" as
residual work — pinning the current behavior here ensures any future
change to use 0 as a disable sentinel must update this test (and the
reviewer sees the policy decision).

Verified: 4509 tests pass locally (4499 existing + 10 new).

* release: 0.54.2 — admin-configurable flea-market guardrail thresholds + tests

Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.1 →
0.54.2) bundling Vojta's admin-configurable thresholds for the
flea-market content guardrail (9 knobs in /admin/server-config) plus
the test coverage closing the "primary testing gap" he punted in the
safe-fix commit.

No DB migration; defaults unchanged from PR #276 — instances that
don't set `guardrails.*` keep the original bar transparently.

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
Co-authored-by: ZdenekSrotyr <139972147+ZdenekSrotyr@users.noreply.github.com>
2026-05-13 09:20:55 +00:00


{% extends "base.html" %}
{% block title %}Submission examples — {{ config.INSTANCE_NAME }}{% endblock %}
{% block content %}
<style>
.examples-page { max-width: 980px; }
.examples-page h1 {
font-size: 26px; font-weight: 700; margin: 0 0 6px;
color: var(--text-primary, #111827);
}
.examples-page .sub {
font-size: 14px; color: var(--text-secondary, #6b7280);
margin-bottom: 24px; line-height: 1.55;
}
.ex-section {
background: var(--surface, #fff);
border: 1px solid var(--border, #e5e7eb);
border-radius: 10px;
padding: 20px 22px;
margin-bottom: 18px;
}
.ex-section h2 {
margin: 0 0 4px; font-size: 17px; font-weight: 600;
color: var(--text-primary, #111827);
}
.ex-section .why {
font-size: 13px; color: var(--text-secondary, #6b7280);
margin-bottom: 12px; line-height: 1.6;
}
.ex-block {
font-family: var(--font-mono); font-size: 12.5px;
background: #f6f8fa;
border: 1px solid var(--border, #e5e7eb);
border-radius: 8px;
padding: 12px 14px;
line-height: 1.5;
overflow-x: auto;
white-space: pre-wrap;
word-break: break-word;
color: #1f2937;
}
.ex-label {
display: inline-block; font-size: 11px; font-weight: 600;
padding: 2px 8px; border-radius: 4px;
background: rgba(16, 185, 129, 0.12); color: #047857;
text-transform: uppercase; letter-spacing: 0.5px;
margin-bottom: 6px;
}
.ex-label.bad { background: rgba(220, 38, 38, 0.10); color: #b91c1c; }
.ex-compare { display: grid; grid-template-columns: 1fr 1fr; gap: 14px; }
@media (max-width: 720px) { .ex-compare { grid-template-columns: 1fr; } }
.ex-tips {
background: var(--background, #f9fafb);
border-left: 3px solid var(--primary, #0073D1);
padding: 10px 14px; border-radius: 4px;
font-size: 13px; margin-top: 12px; line-height: 1.55;
}
.ex-tips strong { color: var(--text-primary, #111827); }
.top-actions { margin-bottom: 16px; }
.top-actions a {
color: var(--primary, #0073D1); text-decoration: none;
font-size: 13px;
}
</style>
<div class="examples-page page-shell">
<div class="top-actions">
<a href="/store/new">← Back to upload</a>
</div>
<h1>Submission examples</h1>
<p class="sub">
Each component (plugin, agent, skill, command) needs a description
that names the trigger condition AND the action. Skills are the
strictest case because Claude reads the description verbatim when
deciding whether to invoke the skill. Examples below show the
minimum bar plus what a strong submission looks like.
</p>
<!-- ───── Why these limits ───────────────────────────────────────── -->
<div class="ex-section">
<h2>Why these limits?</h2>
<div class="why">
Descriptions that are too short don't give the assistant enough
to go on. Below is the minimum bar plus what a well-written
submission usually looks like.
</div>
<table style="width:100%; border-collapse: collapse; font-size: 13px;">
<thead>
<tr style="background: var(--background, #f9fafb); text-align: left;">
<th style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);">Field</th>
<th style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);">Minimum</th>
<th style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);">Recommended</th>
<th style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);">What it does</th>
</tr>
</thead>
<tbody>
<tr>
<td style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);">Skill / agent / plugin description</td>
<td style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);"><strong>{{ guardrail.min_description_chars|default(60) }} chars</strong> · {{ guardrail.min_distinct_words|default(5) }} distinct words</td>
<td style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);">120-220 chars (one full sentence)</td>
<td style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);">Tells the assistant when to use the component and what it does. Shown on the marketplace tile so others can pick it.</td>
</tr>
<tr>
<td style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);">Command description</td>
<td style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);"><strong>{{ guardrail.min_command_description_chars|default(25) }} chars</strong> · {{ guardrail.min_distinct_words|default(5) }} distinct words</td>
<td style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);">40-100 chars</td>
<td style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);">Commands are one-verb actions ("run tests", "format code"). A short clear sentence is enough.</td>
</tr>
<tr>
<td style="padding: 8px 10px;">Skill / agent content body</td>
<td style="padding: 8px 10px;"><strong>{{ guardrail.min_body_chars|default(200) }} chars</strong></td>
<td style="padding: 8px 10px;">500-2000 chars</td>
<td style="padding: 8px 10px;">The body explains what the component does once used: inputs it expects, outputs it produces, edge cases. The minimum is a "one paragraph" floor that catches stubs.</td>
</tr>
</tbody>
</table>
<div class="ex-tips" style="margin-top: 12px;">
<strong>Two-step review.</strong> First we check length and word
count to catch placeholders like <code>TODO</code> or
<code>description</code>. Submissions that pass the length check
then go to a substantive reviewer that judges whether the
description is genuinely useful or just padded filler that hit
the character count by accident.
</div>
</div>
<!-- ───── Skill ──────────────────────────────────────────────────── -->
<div class="ex-section" id="skill">
<h2>Skill</h2>
<div class="why">
<code>skills/&lt;name&gt;/SKILL.md</code> with YAML frontmatter and a
body. The frontmatter <code>description</code> IS the trigger
string Claude reads; the body explains how the skill works once
invoked.
</div>
<div class="ex-compare">
<div>
<span class="ex-label bad">Rejected</span>
<pre class="ex-block">---
name: code-review
description: A reviewer skill
---
Reviews code.
</pre>
<div class="ex-tips">
<strong>Why it fails:</strong> description is 16 chars
(floor 60), restates the name, body is 13 chars
(floor 200). Claude can't decide when to invoke it.
</div>
</div>
<div>
<span class="ex-label">Passes</span>
<pre class="ex-block">---
name: code-review
description: Use when reviewing pull requests to flag missing tests,
  weak assertions, brittle implementation-coupled tests, and edge
  cases the implementation forgot.
---
# Code review skill
Run this skill against the diff of an open pull request. It walks
the changed files and surfaces three categories of issues:
1. **Missing tests** — new functions / endpoints / migrations
without corresponding test coverage.
2. **Weak assertions** — tests that exist but only assert truthy
values, status codes, or shape without verifying the actual
behavior contract.
3. **Brittle coupling** — tests that depend on private state,
internal call counts, or implementation details that will break
on refactor without catching real regressions.
## Inputs
The skill expects to be invoked from a git repo with a PR branch
checked out. It reads `git diff <base>..HEAD` to scope the review.
## Output
Markdown comment grouped by file, with line refs and one fix
suggestion per finding.
</pre>
<div class="ex-tips">
<strong>Why it passes:</strong> description names WHEN (PR
review) and WHAT it does, body explains inputs/outputs.
</div>
</div>
</div>
</div>
<!-- ───── Agent ──────────────────────────────────────────────────── -->
<div class="ex-section" id="agent">
<h2>Agent</h2>
<div class="why">
Single <code>.md</code> file with YAML frontmatter. The
<code>description</code> drives dispatch — when a parent agent
decides which subagent to spawn, this is the string it reads.
</div>
<div class="ex-compare">
<div>
<span class="ex-label bad">Rejected</span>
<pre class="ex-block">---
name: debugger
description: A debugger
---
Helps with debugging.
</pre>
<div class="ex-tips">
<strong>Why it fails:</strong> description is 10 chars
(floor 60), body is 21 chars (floor 200), neither names a
dispatch criterion.
</div>
</div>
<div>
<span class="ex-label">Passes</span>
<pre class="ex-block">---
name: debugger
description: Diagnoses test failures, runtime errors, and unexpected
  behavior. Use when a build is failing, a stack trace lands in the
  conversation, or the user describes a symptom without a known
  root cause.
---
# Debugger subagent
Triage flow:
1. Reproduce the failure deterministically (capture the exact
command + working directory + relevant env).
2. Reduce — strip the case down to the smallest input that still
fails. Confirm the reduction reproduces.
3. Bisect or trace, depending on whether the regression is recent
or the behavior was never correct.
4. Propose ONE root-cause hypothesis and a minimal fix. Don't
shotgun-fix multiple symptoms.
Report the hypothesis + fix path back to the dispatcher; do not
apply the fix yourself unless explicitly asked.
</pre>
<div class="ex-tips">
<strong>Why it passes:</strong> description names WHAT it
does (diagnose) AND WHEN to dispatch (test failures, stack
traces, symptoms).
</div>
</div>
</div>
</div>
<!-- ───── Plugin ─────────────────────────────────────────────────── -->
<div class="ex-section" id="plugin">
<h2>Plugin</h2>
<div class="why">
<code>.claude-plugin/plugin.json</code> at the bundle root. The
<code>description</code> is the marketplace tile copy — first
thing a user sees when browsing.
</div>
<div class="ex-compare">
<div>
<span class="ex-label bad">Rejected</span>
<pre class="ex-block">{
"name": "review-tools",
"description": "Tools for code review",
"version": "0.1.0"
}
</pre>
<div class="ex-tips">
<strong>Why it fails:</strong> 21 chars (floor 60),
generic enough to apply to any plugin.
</div>
</div>
<div>
<span class="ex-label">Passes</span>
<pre class="ex-block">{
"name": "review-tools",
"description": "Code review automation: PR diff analysis, test
coverage gaps, and a configurable rule pack for Rails / Python /
TypeScript projects.",
"version": "0.1.0"
}
</pre>
<div class="ex-tips">
<strong>Why it passes:</strong> names the audience (PR
reviewers), the value (gaps + rule pack), and the scope
(3 named stacks).
</div>
</div>
</div>
</div>
<!-- ───── Command ────────────────────────────────────────────────── -->
<div class="ex-section" id="command">
<h2>Command</h2>
<div class="why">
<code>commands/&lt;name&gt;.md</code>. Shown in <code>/help</code>
and slash-command lists. Lower {{ guardrail.min_command_description_chars|default(25) }}-char floor since commands tend
to be one-verb actions.
</div>
<div class="ex-compare">
<div>
<span class="ex-label bad">Rejected</span>
<pre class="ex-block">---
name: run-tests
description: Runs tests
---
</pre>
<div class="ex-tips">
<strong>Why it fails:</strong> 10 chars (floor 25),
restates the name.
</div>
</div>
<div>
<span class="ex-label">Passes</span>
<pre class="ex-block">---
name: run-tests
description: Run the project test suite and print failures grouped
  by file with the first 5 lines of the traceback inline.
---
</pre>
<div class="ex-tips">
<strong>Why it passes:</strong> states the action, the
shape of the output, and the trimming behavior.
</div>
</div>
</div>
</div>
<div class="top-actions" style="margin-top: 20px;">
<a href="/store/new">← Back to upload</a>
</div>
</div>
{% endblock %}