feat(store-guardrails): admin-configurable content thresholds (#281)
* feat(store-guardrails): admin-configurable content thresholds Adds the flea-market content guardrail floors to the /admin/server-config editor so operators can tune the bar without code changes. Defaults are unchanged (60 chars description, 25 chars command, 5 distinct words, 200 chars body) — patching guardrails.* in instance.yaml or via the admin UI overrides any of them and the next inline check picks up the new value. src/store_guardrails/content_check.py now resolves the four floors via helper functions (_min_desc_chars / _min_command_desc_chars / _min_distinct_words / _min_body_chars) that read app.instance_config at call time. Module-level _DEFAULT_* constants remain as fallbacks if the import fails (defensive — keeps the guardrail module loadable without the app package on its path). app/instance_config.py grows four matching getters returning the live value with sane defaults + integer coercion. app/api/admin.py registers 'guardrails' as an editable section + ships nine known-fields entries (min_description_chars, min_command_description_chars, min_distinct_words, min_body_chars, enabled, review_model, blocked_quota_per_day, blocked_bundle_ttl_days, stuck_review_grace_seconds) with operator-facing hint copy explaining what each knob does. app/web/templates/admin_server_config.html gets a SECTION_META entry so the section renders as 'Flea-market guardrails' with a help string instead of a bare section ID. app/web/router.py threads the live thresholds into /store/new and /store/examples via a small _guardrail_thresholds() helper so the disclosure copy, char counter, and "Why these limits" table render the configured value (not a hardcoded 60). End-to-end smoke verified: PATCH guardrails.min_description_chars=90 → /store/new immediately renders "90 characters" + JS DESC_MIN=90 on the next request, no restart required (helpers read live config per call). * chore(store-guardrails): address PR review safe-fix findings Code-review safe_auto findings on PR #281 (review run 20260513-100126-64052520): - CHANGELOG: add Unreleased entry covering the new /admin/server-config Flea-market guardrails section, the four live threshold getters, and the route-helper rendering knobs. Required by the project's non-negotiable "Changelog discipline" rule. - content_check.py: narrow `except Exception` to `except ImportError` on the four `_min_*()` resolver helpers. Surface-level TypeError / ValueError on a malformed YAML value belongs to the instance_config getters' own try/except — the resolvers should only defend against the in-tree import itself failing, not silently swallow real bugs in the getters. - store_upload.html: refresh the stale "30-char threshold" comment to reflect the configurable floor (default 60), and add `|default(60)` / `|default(25)` / `|default(5)` filters to the disclosure-copy bindings so the upload form matches store_examples.html's belt-and-suspenders rendering if a future route ever renders the template without populating the `guardrail` context. - router.py: tighten `_guardrail_thresholds()` return annotation from bare `dict` to `dict[str, int]`. Residual work (left for separate change after operator direction): - Add round-trip test (PATCH guardrails -> next inline check uses new value) — primary testing gap. - Decide policy on `min_*=0` (currently coerced to 1 via `max(1, int(val))`) vs treating 0 as a disable sentinel like neighbour getters (`blocked_quota_per_day`, `blocked_bundle_ttl_days`). - Add POST-time integer validation for `guardrails.*` so a typo'd YAML value (bool / string / float) errors loudly instead of silently falling back to the default. * test(store-guardrails): cover admin-configurable thresholds + PATCH round-trip Closes the "primary testing gap" Vojta noted in the safe-fix commit on PR #281 — the four new `get_guardrails_min_*` getters and the PATCH-takes-effect-on-next-check live-config flow had no direct coverage. 10 new tests in `tests/test_store_guardrails_admin_config.py`: - TestGuardrailGetterDefaults (4 tests) — each new getter returns the documented default (60 / 25 / 5 / 200) when nothing is configured. - TestGuardrailGetterOverlay (5 tests) — overlay-driven overrides win, string values that look numeric coerce via int(), garbage strings fall back to default via the (TypeError, ValueError) branch, and the `max(1, int(val))` floor pins zero/negative inputs to 1. - TestPatchRoundTrip (1 test) — PATCH `/api/admin/server-config` `guardrails.min_description_chars=90`, then call content_check against a 75-char description that previously passed: must now fail with `too_short`. Then PATCH back to 60 and verify the next check passes again. Closes the cache-invalidation contract Vojta relies on for the "no app restart" claim — broken without the reset_cache() bracket in /api/admin/server-config. The TestGuardrailGetterOverlay.test_zero_or_negative_floored_to_one test pins the current `max(1, int(val))` policy. Vojta's safe-fix commit explicitly left "policy on min_*=0 vs disable-sentinel" as residual work — pinning the current behavior here ensures any future change to use 0 as a disable sentinel must update this test (and the reviewer sees the policy decision). Verified: 4509 tests pass locally (4499 existing + 10 new). * release: 0.54.2 — admin-configurable flea-market guardrail thresholds + tests Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.1 → 0.54.2) bundling Vojta's admin-configurable thresholds for the flea-market content guardrail (9 knobs in /admin/server-config) plus the test coverage closing the "primary testing gap" he punted in the safe-fix commit. No DB migration; defaults unchanged from PR #276 — instances that don't set `guardrails.*` keep the original bar transparently. --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com> Co-authored-by: ZdenekSrotyr <139972147+ZdenekSrotyr@users.noreply.github.com>
This commit is contained in:
parent
8f54ca3742
commit
50a974f196
10 changed files with 492 additions and 29 deletions
22
CHANGELOG.md
22
CHANGELOG.md
|
|
@ -10,6 +10,28 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C
|
|||
|
||||
## [Unreleased]
|
||||
|
||||
## [0.54.2] — 2026-05-13
|
||||
|
||||
### Added
|
||||
|
||||
- **Admin-configurable flea-market content guardrail thresholds.**
|
||||
`/admin/server-config` gains a new **Flea-market guardrails** section
|
||||
exposing nine knobs: `min_description_chars` (default 60),
|
||||
`min_command_description_chars` (default 25), `min_distinct_words`
|
||||
(default 5), `min_body_chars` (default 200), `enabled` (master
|
||||
kill-switch), `review_model` (haiku / sonnet / opus),
|
||||
`blocked_quota_per_day` (default 50), `blocked_bundle_ttl_days`
|
||||
(default 30), `stuck_review_grace_seconds` (default 1800). Each
|
||||
field carries an operator-facing hint string. The four mechanical
|
||||
floors are read from `app.instance_config` on every inline check,
|
||||
so a `/admin/server-config` PATCH takes effect on the next request
|
||||
without restarting uvicorn. `/store/new` (live char counter +
|
||||
disclosure copy) and `/store/examples` (the "Why these limits"
|
||||
table) render the configured values via a small
|
||||
`_guardrail_thresholds()` helper threaded into the route context.
|
||||
Defaults are unchanged — instances that don't set
|
||||
`guardrails.*` keep the original PR #276 bar.
|
||||
|
||||
## [0.54.1] — 2026-05-13
|
||||
|
||||
### Added
|
||||
|
|
|
|||
|
|
@ -258,6 +258,7 @@ _EDITABLE_SECTIONS: tuple[str, ...] = (
|
|||
"desktop",
|
||||
"corporate_memory",
|
||||
"materialize",
|
||||
"guardrails",
|
||||
)
|
||||
|
||||
# "Danger-zone" sections — flipping these can lock operators out (auth.*) or
|
||||
|
|
@ -703,6 +704,100 @@ _KNOWN_FIELDS: dict[str, dict[str, dict]] = {
|
|||
),
|
||||
},
|
||||
},
|
||||
"guardrails": {
|
||||
"min_description_chars": {
|
||||
"kind": "int",
|
||||
"default": 60,
|
||||
"hint": (
|
||||
"Minimum character floor for skill / agent / plugin "
|
||||
"descriptions on flea-market uploads (the inline content "
|
||||
"guardrail). Real-world Claude skill descriptions cluster "
|
||||
"150–220 chars; the default 60 is the bottom of the bar "
|
||||
"to catch placeholders. Bump to 100+ to push submitters "
|
||||
"closer to the ecosystem norm. Min 1."
|
||||
),
|
||||
},
|
||||
"min_command_description_chars": {
|
||||
"kind": "int",
|
||||
"default": 25,
|
||||
"hint": (
|
||||
"Minimum character floor for slash-command descriptions. "
|
||||
"Tighter than skills because commands are one-verb "
|
||||
"actions (\"run tests\", \"format code\"). Default 25. Min 1."
|
||||
),
|
||||
},
|
||||
"min_distinct_words": {
|
||||
"kind": "int",
|
||||
"default": 5,
|
||||
"hint": (
|
||||
"Minimum number of DISTINCT words in any description "
|
||||
"string. Defends against padding-only descriptions like "
|
||||
"\"description description description\" that hit the "
|
||||
"character count but say nothing. Default 5. Min 1."
|
||||
),
|
||||
},
|
||||
"min_body_chars": {
|
||||
"kind": "int",
|
||||
"default": 200,
|
||||
"hint": (
|
||||
"Minimum body-content floor for skill / agent files "
|
||||
"(the markdown after the YAML frontmatter). Real skill "
|
||||
"bodies run 500–2000 chars; the default 200 is a "
|
||||
"\"one paragraph\" floor that catches stubs. Min 1."
|
||||
),
|
||||
},
|
||||
"enabled": {
|
||||
"kind": "bool",
|
||||
"default": True,
|
||||
"hint": (
|
||||
"Master kill-switch for the LLM guardrail tier. When "
|
||||
"False (or when ANTHROPIC_API_KEY / LLM_API_KEY is "
|
||||
"absent), uploads still run the inline mechanical "
|
||||
"checks but skip the LLM security + content-quality "
|
||||
"review and auto-approve. Default True."
|
||||
),
|
||||
},
|
||||
"review_model": {
|
||||
"kind": "select",
|
||||
"default": "haiku",
|
||||
"options": ["haiku", "sonnet", "opus"],
|
||||
"hint": (
|
||||
"Anthropic model tier for the LLM security + content "
|
||||
"review. Haiku is the cheapest and fastest; Sonnet / "
|
||||
"Opus catch subtler prompt-injection + vague descriptions "
|
||||
"at proportionally higher per-upload cost."
|
||||
),
|
||||
},
|
||||
"blocked_quota_per_day": {
|
||||
"kind": "int",
|
||||
"default": 50,
|
||||
"hint": (
|
||||
"Per-submitter cap on `blocked_inline` rows in the "
|
||||
"trailing 24h. Bounds the worst case where a bot loops "
|
||||
"on malformed ZIPs. 0 disables the quota. Default 50."
|
||||
),
|
||||
},
|
||||
"blocked_bundle_ttl_days": {
|
||||
"kind": "int",
|
||||
"default": 30,
|
||||
"hint": (
|
||||
"How many days to keep a blocked bundle's bytes on disk. "
|
||||
"The submission row + sha256 + size always survive; only "
|
||||
"the bytes get removed. 0 disables the purge entirely. "
|
||||
"Default 30."
|
||||
),
|
||||
},
|
||||
"stuck_review_grace_seconds": {
|
||||
"kind": "int",
|
||||
"default": 1800,
|
||||
"hint": (
|
||||
"How long a submission may stay at `status='pending_llm'` "
|
||||
"before the reaper flips it to `review_error`. Default "
|
||||
"1800 (30 min) comfortably exceeds Sonnet / Opus p99 "
|
||||
"wall time. 0 disables the reaper."
|
||||
),
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
# Keys whose values must be redacted from the audit diff. We match
|
||||
|
|
|
|||
|
|
@ -449,6 +449,62 @@ def get_guardrails_stuck_review_grace_seconds() -> int:
|
|||
return 1800
|
||||
|
||||
|
||||
def get_guardrails_min_description_chars() -> int:
|
||||
"""Minimum character floor for skill / agent / plugin descriptions.
|
||||
|
||||
Reads ``guardrails.min_description_chars`` (default 60). Set the
|
||||
floor low (e.g. 30) to relax the inline content check; set high
|
||||
(e.g. 120) to push submitters closer to the Claude-skill-ecosystem
|
||||
norm of 150–220 chars per description.
|
||||
"""
|
||||
val = get_value("guardrails", "min_description_chars", default=60)
|
||||
try:
|
||||
return max(1, int(val))
|
||||
except (TypeError, ValueError):
|
||||
return 60
|
||||
|
||||
|
||||
def get_guardrails_min_command_description_chars() -> int:
|
||||
"""Minimum character floor for slash-command descriptions.
|
||||
|
||||
Reads ``guardrails.min_command_description_chars`` (default 25).
|
||||
Commands are typically one-verb actions — kept tighter than skills.
|
||||
"""
|
||||
val = get_value("guardrails", "min_command_description_chars", default=25)
|
||||
try:
|
||||
return max(1, int(val))
|
||||
except (TypeError, ValueError):
|
||||
return 25
|
||||
|
||||
|
||||
def get_guardrails_min_distinct_words() -> int:
|
||||
"""Minimum distinct-word count for any description string.
|
||||
|
||||
Reads ``guardrails.min_distinct_words`` (default 5). Defends against
|
||||
"padding hits the char count but says nothing" cases like
|
||||
`"description description description description"`.
|
||||
"""
|
||||
val = get_value("guardrails", "min_distinct_words", default=5)
|
||||
try:
|
||||
return max(1, int(val))
|
||||
except (TypeError, ValueError):
|
||||
return 5
|
||||
|
||||
|
||||
def get_guardrails_min_body_chars() -> int:
|
||||
"""Minimum body-content floor for skill / agent files.
|
||||
|
||||
Reads ``guardrails.min_body_chars`` (default 200). Body = the
|
||||
markdown after the YAML frontmatter. 200 chars is a "one paragraph"
|
||||
floor that catches stubs; real skill bodies run 500–2000 chars.
|
||||
"""
|
||||
val = get_value("guardrails", "min_body_chars", default=200)
|
||||
try:
|
||||
return max(1, int(val))
|
||||
except (TypeError, ValueError):
|
||||
return 200
|
||||
|
||||
|
||||
def get_guardrails_enabled() -> bool:
|
||||
"""Master kill-switch for the guardrail pipeline.
|
||||
|
||||
|
|
|
|||
|
|
@ -1150,6 +1150,27 @@ async def install_redirect(request: Request):
|
|||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _guardrail_thresholds() -> dict[str, int]:
|
||||
"""Live admin-configurable thresholds surfaced into the upload UI.
|
||||
|
||||
Each render reads the current value so the disclosure / counter /
|
||||
examples-table copy stays in lock-step with the
|
||||
/admin/server-config patch — no app restart required.
|
||||
"""
|
||||
from app.instance_config import (
|
||||
get_guardrails_min_body_chars,
|
||||
get_guardrails_min_command_description_chars,
|
||||
get_guardrails_min_description_chars,
|
||||
get_guardrails_min_distinct_words,
|
||||
)
|
||||
return {
|
||||
"min_description_chars": get_guardrails_min_description_chars(),
|
||||
"min_command_description_chars": get_guardrails_min_command_description_chars(),
|
||||
"min_distinct_words": get_guardrails_min_distinct_words(),
|
||||
"min_body_chars": get_guardrails_min_body_chars(),
|
||||
}
|
||||
|
||||
|
||||
@router.get("/store/new", response_class=HTMLResponse)
|
||||
async def store_new(
|
||||
request: Request,
|
||||
|
|
@ -1157,7 +1178,11 @@ async def store_new(
|
|||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||||
):
|
||||
from src.store_categories import STORE_CATEGORIES
|
||||
ctx = _build_context(request, user=user, categories=list(STORE_CATEGORIES))
|
||||
ctx = _build_context(
|
||||
request, user=user,
|
||||
categories=list(STORE_CATEGORIES),
|
||||
guardrail=_guardrail_thresholds(),
|
||||
)
|
||||
return templates.TemplateResponse(request, "store_upload.html", ctx)
|
||||
|
||||
|
||||
|
|
@ -1173,7 +1198,7 @@ async def store_examples(
|
|||
whose bundle failed review can see what 'good' looks like
|
||||
side-by-side with the rule that bit them.
|
||||
"""
|
||||
ctx = _build_context(request, user=user)
|
||||
ctx = _build_context(request, user=user, guardrail=_guardrail_thresholds())
|
||||
return templates.TemplateResponse(request, "store_examples.html", ctx)
|
||||
|
||||
|
||||
|
|
|
|||
|
|
@ -222,6 +222,10 @@ const SECTION_META = {
|
|||
title: "Materialize",
|
||||
help: "Concurrency safety net for the materialize path. Controls the file-lock TTL used to detect and reclaim stale locks from hard-killed processes.",
|
||||
},
|
||||
guardrails: {
|
||||
title: "Flea-market guardrails",
|
||||
help: "Per-component content quality thresholds for store uploads. Lower the min-* knobs to relax the bar; raise to push submitters toward longer, more useful descriptions. The LLM tier (review_model + enabled) governs the second-stage substantive review.",
|
||||
},
|
||||
};
|
||||
const DANGER_SECTIONS = new Set(["auth", "server"]);
|
||||
|
||||
|
|
|
|||
|
|
@ -97,21 +97,21 @@
|
|||
<tbody>
|
||||
<tr>
|
||||
<td style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);">Skill / agent / plugin description</td>
|
||||
<td style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);"><strong>60 chars</strong> · 5 distinct words</td>
|
||||
<td style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);"><strong>{{ guardrail.min_description_chars|default(60) }} chars</strong> · {{ guardrail.min_distinct_words|default(5) }} distinct words</td>
|
||||
<td style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);">120–220 chars (one full sentence)</td>
|
||||
<td style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);">Tells the assistant when to use the component and what it does. Showed on the marketplace tile so others can pick it.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);">Command description</td>
|
||||
<td style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);"><strong>25 chars</strong> · 5 distinct words</td>
|
||||
<td style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);"><strong>{{ guardrail.min_command_description_chars|default(25) }} chars</strong> · {{ guardrail.min_distinct_words|default(5) }} distinct words</td>
|
||||
<td style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);">40–100 chars</td>
|
||||
<td style="padding: 8px 10px; border-bottom: 1px solid var(--border, #e5e7eb);">Commands are one-verb actions ("run tests", "format code"). A short clear sentence is enough.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td style="padding: 8px 10px;">Skill / agent content body</td>
|
||||
<td style="padding: 8px 10px;"><strong>200 chars</strong></td>
|
||||
<td style="padding: 8px 10px;"><strong>{{ guardrail.min_body_chars|default(200) }} chars</strong></td>
|
||||
<td style="padding: 8px 10px;">500–2000 chars</td>
|
||||
<td style="padding: 8px 10px;">The body explains what the component does once used: inputs it expects, outputs it produces, edge cases. 200 chars is the bare-minimum "one paragraph" floor.</td>
|
||||
<td style="padding: 8px 10px;">The body explains what the component does once used: inputs it expects, outputs it produces, edge cases. The minimum is a "one paragraph" floor that catches stubs.</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
|
|
|
|||
|
|
@ -382,8 +382,8 @@
|
|||
</p>
|
||||
<p><strong>The bar:</strong></p>
|
||||
<ul>
|
||||
<li>Each description ≥ <strong>60 characters</strong> (commands ≥ 25). Aim for one complete sentence — too short and the assistant can't tell when to use it.</li>
|
||||
<li>At least 5 distinct words</li>
|
||||
<li>Each description ≥ <strong>{{ guardrail.min_description_chars|default(60) }} characters</strong> (commands ≥ {{ guardrail.min_command_description_chars|default(25) }}). Aim for one complete sentence — too short and the assistant can't tell when to use it.</li>
|
||||
<li>At least {{ guardrail.min_distinct_words|default(5) }} distinct words</li>
|
||||
<li>No <code>TODO</code> / <code>TBD</code> / template placeholders</li>
|
||||
<li>Action-oriented, names the trigger condition</li>
|
||||
</ul>
|
||||
|
|
@ -479,8 +479,8 @@
|
|||
</p>
|
||||
<p><strong>The bar:</strong></p>
|
||||
<ul>
|
||||
<li>Each description ≥ <strong>60 characters</strong> (commands ≥ 25). Aim for one complete sentence — too short and the assistant can't tell when to use it.</li>
|
||||
<li>At least 5 distinct words</li>
|
||||
<li>Each description ≥ <strong>{{ guardrail.min_description_chars|default(60) }} characters</strong> (commands ≥ {{ guardrail.min_command_description_chars|default(25) }}). Aim for one complete sentence — too short and the assistant can't tell when to use it.</li>
|
||||
<li>At least {{ guardrail.min_distinct_words|default(5) }} distinct words</li>
|
||||
<li>No <code>TODO</code> / <code>TBD</code> / template placeholders</li>
|
||||
<li>Action-oriented, names the trigger condition</li>
|
||||
</ul>
|
||||
|
|
@ -915,10 +915,11 @@ function renderDocs() {
|
|||
});
|
||||
}
|
||||
|
||||
// Description char counter — turns green at the 30-char threshold so the
|
||||
// submitter gets immediate feedback that they're past the floor. The
|
||||
// server is the source of truth; this is purely UX.
|
||||
const DESC_MIN = 60; // Matches src/store_guardrails/content_check.py
|
||||
// Description char counter — turns green at the configured floor so the
|
||||
// submitter gets immediate feedback that they're past the bar. The server
|
||||
// is the source of truth; this is purely UX. Floor is operator-configurable
|
||||
// via /admin/server-config (default 60).
|
||||
const DESC_MIN = {{ guardrail.min_description_chars|default(60) }}; // Live config from /admin/server-config
|
||||
const descField = document.getElementById('description');
|
||||
const descCounter = document.getElementById('desc-counter');
|
||||
function updateDescCounter() {
|
||||
|
|
|
|||
|
|
@ -1,6 +1,6 @@
|
|||
[project]
|
||||
name = "agnes-the-ai-analyst"
|
||||
version = "0.54.1"
|
||||
version = "0.54.2"
|
||||
description = "Agnes — AI Data Analyst platform for AI analytical systems"
|
||||
requires-python = ">=3.11,<3.14"
|
||||
license = "MIT"
|
||||
|
|
|
|||
|
|
@ -32,16 +32,51 @@ from ._frontmatter import frontmatter_body_offset, parse_frontmatter
|
|||
# Criteria constants
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
# Hard floors. Rationale: 60 chars + 5 distinct words is the bottom of
|
||||
# ecosystem norms (npm 60-120, Docker Hub 100, VS Code 120, GitHub
|
||||
# 80-200) and well below real Claude/Anthropic skill examples (mean
|
||||
# ~190 chars, range 159-224 across superpowers / compound-engineering
|
||||
# / octo skill packs). The 60-char floor catches the obvious "didn't
|
||||
# Hard-floor defaults. Calibrated against real ecosystem norms (Claude
|
||||
# skill packs cluster 150–220 chars per description; npm / Docker Hub /
|
||||
# VS Code 100–120). 60 chars + 5 distinct words catches the "didn't
|
||||
# bother" cases; the LLM tier judges substantive quality on top.
|
||||
_MIN_DESC_CHARS = 60
|
||||
_MIN_COMMAND_DESC_CHARS = 25
|
||||
_MIN_DISTINCT_WORDS = 5
|
||||
_MIN_BODY_CHARS = 200
|
||||
#
|
||||
# Defaults are overridable via `instance.yaml.guardrails.*` keys —
|
||||
# operators tune the floor without code changes. Resolution helpers
|
||||
# below read the live config on every call so /admin/server-config
|
||||
# patches take effect on the next request (no app restart needed).
|
||||
_DEFAULT_MIN_DESC_CHARS = 60
|
||||
_DEFAULT_MIN_COMMAND_DESC_CHARS = 25
|
||||
_DEFAULT_MIN_DISTINCT_WORDS = 5
|
||||
_DEFAULT_MIN_BODY_CHARS = 200
|
||||
|
||||
|
||||
def _min_desc_chars() -> int:
|
||||
try:
|
||||
from app.instance_config import get_guardrails_min_description_chars
|
||||
return get_guardrails_min_description_chars()
|
||||
except ImportError:
|
||||
return _DEFAULT_MIN_DESC_CHARS
|
||||
|
||||
|
||||
def _min_command_desc_chars() -> int:
|
||||
try:
|
||||
from app.instance_config import get_guardrails_min_command_description_chars
|
||||
return get_guardrails_min_command_description_chars()
|
||||
except ImportError:
|
||||
return _DEFAULT_MIN_COMMAND_DESC_CHARS
|
||||
|
||||
|
||||
def _min_distinct_words() -> int:
|
||||
try:
|
||||
from app.instance_config import get_guardrails_min_distinct_words
|
||||
return get_guardrails_min_distinct_words()
|
||||
except ImportError:
|
||||
return _DEFAULT_MIN_DISTINCT_WORDS
|
||||
|
||||
|
||||
def _min_body_chars() -> int:
|
||||
try:
|
||||
from app.instance_config import get_guardrails_min_body_chars
|
||||
return get_guardrails_min_body_chars()
|
||||
except ImportError:
|
||||
return _DEFAULT_MIN_BODY_CHARS
|
||||
|
||||
# Case-insensitive substring matches that mark an unfilled template.
|
||||
_PLACEHOLDER_PHRASES = (
|
||||
|
|
@ -201,7 +236,7 @@ def check_submission_description(description: Optional[str]) -> Dict[str, Any]:
|
|||
single synthetic ``file`` = ``<submission>``.
|
||||
"""
|
||||
issues = _evaluate_description_string(
|
||||
description, min_chars=_MIN_DESC_CHARS,
|
||||
description, min_chars=_min_desc_chars(),
|
||||
component_kind="submission",
|
||||
)
|
||||
if issues:
|
||||
|
|
@ -326,7 +361,7 @@ def _read_text(p: Path) -> str:
|
|||
def _evaluate(comp: Dict[str, Any]) -> List[Dict[str, Any]]:
|
||||
"""Return the issue list for one component (empty when it passes)."""
|
||||
type_ = comp["type"]
|
||||
min_chars = _MIN_COMMAND_DESC_CHARS if type_ == "command" else _MIN_DESC_CHARS
|
||||
min_chars = _min_command_desc_chars() if type_ == "command" else _min_desc_chars()
|
||||
|
||||
issues = _evaluate_description_string(
|
||||
comp.get("description"),
|
||||
|
|
@ -344,12 +379,12 @@ def _evaluate(comp: Dict[str, Any]) -> List[Dict[str, Any]]:
|
|||
# commands often legitimately have a one-line body).
|
||||
if type_ in {"skill", "agent"}:
|
||||
body = (comp.get("body") or "").strip()
|
||||
if len(body) < _MIN_BODY_CHARS:
|
||||
if len(body) < _min_body_chars():
|
||||
issues.append({
|
||||
"file": comp["file"],
|
||||
"field": "body",
|
||||
"code": "body_too_short",
|
||||
"hint": _hint_for(type_, "body_too_short", min_chars=_MIN_BODY_CHARS),
|
||||
"hint": _hint_for(type_, "body_too_short", min_chars=_min_body_chars()),
|
||||
"name": comp.get("name"),
|
||||
"component_type": type_,
|
||||
})
|
||||
|
|
@ -413,7 +448,7 @@ def _evaluate_description_string(
|
|||
for t in raw.split()
|
||||
]
|
||||
distinct = {t for t in tokens if t}
|
||||
if len(distinct) < _MIN_DISTINCT_WORDS:
|
||||
if len(distinct) < _min_distinct_words():
|
||||
return [{
|
||||
"code": "low_word_count",
|
||||
"hint": _hint_for(component_kind, "low_word_count"),
|
||||
|
|
|
|||
225
tests/test_store_guardrails_admin_config.py
Normal file
225
tests/test_store_guardrails_admin_config.py
Normal file
|
|
@ -0,0 +1,225 @@
|
|||
"""Tests for the admin-configurable flea-market content-guardrail thresholds (#281).
|
||||
|
||||
Covers:
|
||||
1. The four new `get_guardrails_min_*` getters in app/instance_config.py:
|
||||
defaults, overlay-driven overrides, type coercion, and the
|
||||
`max(1, int(val))` floor.
|
||||
2. The round-trip: POST to /api/admin/server-config patches
|
||||
`guardrails.min_description_chars`, the next inline content check
|
||||
uses the new floor (closes the "primary testing gap" Vojta noted in
|
||||
the PR #281 safe-fix commit message).
|
||||
|
||||
These tests close the only real gap surfaced in the PR #281 takeover
|
||||
review — every other reviewer finding was either already addressed in
|
||||
Vojta's safe-fix commit or intentionally deferred (operator-direction
|
||||
decisions on the `min_*=0` semantics + POST-time integer validation).
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import shutil
|
||||
import tempfile
|
||||
from pathlib import Path
|
||||
|
||||
import pytest
|
||||
import yaml as _yaml
|
||||
|
||||
|
||||
def _auth(token: str) -> dict:
|
||||
return {"Authorization": f"Bearer {token}"}
|
||||
|
||||
|
||||
def _reset_cache() -> None:
|
||||
import app.instance_config as ic
|
||||
ic._instance_config = None
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Unit tests for the four new getters
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestGuardrailGetterDefaults:
|
||||
"""Each getter returns the documented default when nothing is configured."""
|
||||
|
||||
def test_min_description_chars_default_60(self, tmp_path, monkeypatch):
|
||||
monkeypatch.setenv("DATA_DIR", str(tmp_path))
|
||||
monkeypatch.setenv("TESTING", "1")
|
||||
monkeypatch.setenv("JWT_SECRET_KEY", "test-secret-key-minimum-32-characters!!")
|
||||
_reset_cache()
|
||||
from app.instance_config import get_guardrails_min_description_chars
|
||||
assert get_guardrails_min_description_chars() == 60
|
||||
|
||||
def test_min_command_description_chars_default_25(self, tmp_path, monkeypatch):
|
||||
monkeypatch.setenv("DATA_DIR", str(tmp_path))
|
||||
monkeypatch.setenv("TESTING", "1")
|
||||
monkeypatch.setenv("JWT_SECRET_KEY", "test-secret-key-minimum-32-characters!!")
|
||||
_reset_cache()
|
||||
from app.instance_config import get_guardrails_min_command_description_chars
|
||||
assert get_guardrails_min_command_description_chars() == 25
|
||||
|
||||
def test_min_distinct_words_default_5(self, tmp_path, monkeypatch):
|
||||
monkeypatch.setenv("DATA_DIR", str(tmp_path))
|
||||
monkeypatch.setenv("TESTING", "1")
|
||||
monkeypatch.setenv("JWT_SECRET_KEY", "test-secret-key-minimum-32-characters!!")
|
||||
_reset_cache()
|
||||
from app.instance_config import get_guardrails_min_distinct_words
|
||||
assert get_guardrails_min_distinct_words() == 5
|
||||
|
||||
def test_min_body_chars_default_200(self, tmp_path, monkeypatch):
|
||||
monkeypatch.setenv("DATA_DIR", str(tmp_path))
|
||||
monkeypatch.setenv("TESTING", "1")
|
||||
monkeypatch.setenv("JWT_SECRET_KEY", "test-secret-key-minimum-32-characters!!")
|
||||
_reset_cache()
|
||||
from app.instance_config import get_guardrails_min_body_chars
|
||||
assert get_guardrails_min_body_chars() == 200
|
||||
|
||||
|
||||
class TestGuardrailGetterOverlay:
|
||||
"""Operator-supplied overlay values win over defaults."""
|
||||
|
||||
def _seed_overlay(self, tmp_path, monkeypatch, payload: dict) -> None:
|
||||
monkeypatch.setenv("DATA_DIR", str(tmp_path))
|
||||
monkeypatch.setenv("TESTING", "1")
|
||||
monkeypatch.setenv("JWT_SECRET_KEY", "test-secret-key-minimum-32-characters!!")
|
||||
state = tmp_path / "state"
|
||||
state.mkdir(parents=True, exist_ok=True)
|
||||
(state / "instance.yaml").write_text(_yaml.dump(payload))
|
||||
_reset_cache()
|
||||
|
||||
def test_overlay_overrides_min_description_chars(self, tmp_path, monkeypatch):
|
||||
self._seed_overlay(tmp_path, monkeypatch, {
|
||||
"guardrails": {"min_description_chars": 90},
|
||||
})
|
||||
from app.instance_config import get_guardrails_min_description_chars
|
||||
assert get_guardrails_min_description_chars() == 90
|
||||
|
||||
def test_overlay_overrides_min_body_chars(self, tmp_path, monkeypatch):
|
||||
self._seed_overlay(tmp_path, monkeypatch, {
|
||||
"guardrails": {"min_body_chars": 500},
|
||||
})
|
||||
from app.instance_config import get_guardrails_min_body_chars
|
||||
assert get_guardrails_min_body_chars() == 500
|
||||
|
||||
def test_string_value_coerced_to_int(self, tmp_path, monkeypatch):
|
||||
# An operator hand-editing the YAML can leave a string that's still
|
||||
# numeric — int() accepts it. Documented defensively in the getter.
|
||||
self._seed_overlay(tmp_path, monkeypatch, {
|
||||
"guardrails": {"min_distinct_words": "8"},
|
||||
})
|
||||
from app.instance_config import get_guardrails_min_distinct_words
|
||||
assert get_guardrails_min_distinct_words() == 8
|
||||
|
||||
def test_garbage_value_falls_back_to_default(self, tmp_path, monkeypatch):
|
||||
# Bool / non-numeric string / other garbage hits the
|
||||
# `(TypeError, ValueError)` branch and returns the documented default.
|
||||
self._seed_overlay(tmp_path, monkeypatch, {
|
||||
"guardrails": {"min_command_description_chars": "not-a-number"},
|
||||
})
|
||||
from app.instance_config import get_guardrails_min_command_description_chars
|
||||
assert get_guardrails_min_command_description_chars() == 25
|
||||
|
||||
def test_zero_or_negative_floored_to_one(self, tmp_path, monkeypatch):
|
||||
# `max(1, int(val))` — operator setting 0 to "disable" doesn't
|
||||
# actually disable; it's silently coerced to 1. Documented behavior;
|
||||
# this test pins the contract so a future change to use 0-as-sentinel
|
||||
# has to update this test (and reviewers see the policy decision).
|
||||
self._seed_overlay(tmp_path, monkeypatch, {
|
||||
"guardrails": {"min_description_chars": 0},
|
||||
})
|
||||
from app.instance_config import get_guardrails_min_description_chars
|
||||
assert get_guardrails_min_description_chars() == 1
|
||||
|
||||
# Negative integer hits the same floor.
|
||||
self._seed_overlay(tmp_path, monkeypatch, {
|
||||
"guardrails": {"min_body_chars": -50},
|
||||
})
|
||||
from app.instance_config import get_guardrails_min_body_chars
|
||||
assert get_guardrails_min_body_chars() == 1
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Round-trip: PATCH /api/admin/server-config → next inline check uses new floor
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class TestPatchRoundTrip:
|
||||
"""The "primary testing gap" Vojta flagged: an admin PATCH to
|
||||
`guardrails.min_description_chars` must take effect on the very next
|
||||
`content_check` call, with no app restart. The cache is invalidated
|
||||
by /api/admin/server-config's reset_cache() bracket.
|
||||
"""
|
||||
|
||||
def _write_skill(self, plugin_dir: Path, *, description: str) -> None:
|
||||
target = plugin_dir / "skills" / "test-skill"
|
||||
target.mkdir(parents=True, exist_ok=True)
|
||||
body = "Body content explaining the skill in enough words to clear the body floor. " * 4
|
||||
(target / "SKILL.md").write_text(
|
||||
f"---\nname: test-skill\ndescription: {description}\n---\n\n{body}\n",
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
def test_patch_min_description_chars_takes_effect_next_check(
|
||||
self, seeded_app, monkeypatch, tmp_path,
|
||||
):
|
||||
# 75-char description: passes default floor (60) but fails after
|
||||
# we PATCH the floor to 90.
|
||||
mid_length = "Use when validating the round-trip live config thresholds end to end now."
|
||||
assert 60 <= len(mid_length) < 90, len(mid_length)
|
||||
|
||||
monkeypatch.setenv("DATA_DIR", str(tmp_path))
|
||||
state = tmp_path / "state"
|
||||
state.mkdir(parents=True, exist_ok=True)
|
||||
_reset_cache()
|
||||
|
||||
plugin_dir = Path(tempfile.mkdtemp(prefix="agnes_admin_config_test_"))
|
||||
try:
|
||||
self._write_skill(plugin_dir, description=mid_length)
|
||||
|
||||
# Step 1: at default floor (60), the description passes.
|
||||
from src.store_guardrails.content_check import check as content_check
|
||||
result = content_check(plugin_dir)
|
||||
assert result["status"] == "pass", (
|
||||
f"description {len(mid_length)} chars should pass default floor 60, "
|
||||
f"got: {result}"
|
||||
)
|
||||
|
||||
# Step 2: PATCH the floor to 90 via the admin API.
|
||||
c = seeded_app["client"]
|
||||
token = seeded_app["admin_token"]
|
||||
r = c.post(
|
||||
"/api/admin/server-config",
|
||||
headers=_auth(token),
|
||||
json={"sections": {"guardrails": {"min_description_chars": 90}}},
|
||||
)
|
||||
assert r.status_code in (200, 204), r.text
|
||||
|
||||
# Step 3: same description, same content_check — must now fail
|
||||
# with too_short. Cache invalidation done inside the admin POST
|
||||
# handler; no test-side reset_cache() call is needed (or
|
||||
# acceptable — that would be testing the test, not the system).
|
||||
result_after = content_check(plugin_dir)
|
||||
assert result_after["status"] == "fail", (
|
||||
f"after PATCH to floor 90, {len(mid_length)}-char description "
|
||||
f"must fail; got: {result_after}"
|
||||
)
|
||||
codes = {issue["code"] for issue in result_after["issues"]}
|
||||
assert "too_short" in codes, (
|
||||
f"expected too_short in issue codes, got: {codes}"
|
||||
)
|
||||
|
||||
# Step 4: PATCH the floor back to 60 (fixture hygiene + extra
|
||||
# confirmation that subsequent PATCHes also propagate).
|
||||
r = c.post(
|
||||
"/api/admin/server-config",
|
||||
headers=_auth(token),
|
||||
json={"sections": {"guardrails": {"min_description_chars": 60}}},
|
||||
)
|
||||
assert r.status_code in (200, 204), r.text
|
||||
assert content_check(plugin_dir)["status"] == "pass", (
|
||||
"PATCH-back-to-default did not propagate"
|
||||
)
|
||||
finally:
|
||||
shutil.rmtree(plugin_dir, ignore_errors=True)
|
||||
_reset_cache()
|
||||
Loading…
Reference in a new issue