* feat(store-guardrails): enforce per-component description quality
Two-tier hard guardrail on flea-market submissions. Empty / placeholder /
single-word descriptions now block before any LLM call; vague-but-passes-
floor descriptions block on the substantive LLM review layer.
Tier 1 — inline mechanical check (src/store_guardrails/content_check.py).
Walks the baked plugin tree, evaluates each component (plugin manifest,
agents, skills, commands) plus the submission-level form description
against a 60-char / 25-char (commands) / 5-distinct-word / 200-char-body
floor with a placeholder denylist (TODO, TBD, {{var}}, etc.). Floors
calibrated against real ecosystem norms: Claude / superpowers /
compound-engineering skill packs cluster 150–220 chars, npm / Docker /
VS Code at 100–120. InlineResult.passed now ANDs in content.status.
Tier 2 — LLM review extension (prompts.py + llm_review.py). System
prompt gains a content-quality criterion; REVIEW_JSON_SCHEMA carries a
content_quality {verdict, issues[]} object alongside the existing
security findings. is_safe() requires content_quality.verdict == 'pass'.
Single LLM call covers both dimensions. MAX_RESPONSE_TOKENS bumped
2000 → 2500 for the extra payload. Verdicts missing content_quality
treated as pass (backwards compat with already-recorded rows).
Submitter UX:
- /store/new wizard now carries a "Before you upload — what passes
review" collapsible disclosure on both step 1 and step 2 with the
bar + patterns that work. Live char counter on the description
field. Per-component preview table (green/red dots from the new
summarize_for_preview helper) renders after the ZIP /preview round
trip, scoping each finding to its file.
- New /store/examples page with rejected/passes pairs for skill /
agent / plugin / command plus a "Why these limits" research table.
Anchored sections (#skill / #agent / #plugin / #command) so the
rejection banner can deep-link by component_type.
- Quarantine banner _content_findings.html groups findings by file
(one "See <type> example ↗" per component, not per field) and
translates field codes (frontmatter.description / body / etc.) to
plain-English labels. _content_howto_fix.html surfaces a static
"Re-upload as new version" + "See examples" action row beneath any
content failure on the entity detail page.
- _parse_frontmatter moved to src/store_guardrails/_frontmatter.py so
the new check module shares the parser without inverting the
app → src dependency direction.
Tests:
- New tests/test_store_guardrails_content.py (29 cases) covering
every failure code per component type plus submission-level checks
and the summarize_components / summarize_for_preview helpers.
- Extended test_store_guardrails_inline.py for the new
InlineResult.content field + aggregate behaviour.
- Extended test_store_guardrails_llm.py for the new
content_quality verdict pathways (fail blocks, missing field passes).
- Backfilled fixture descriptions across test_store_api.py,
test_store_entity_versions.py, test_store_put_atomic.py,
test_admin_store_submissions.py, test_marketplace_api.py,
test_marketplace_v32_endpoints.py so existing happy-path tests
clear the new 60-char floor.
* fix(content-guardrail): align agents walker with preview + drop import-time .format()
Two cleanups from the takeover review on #276 (vr/guardrails-content).
1) `_iter_components` for agents now skips files lacking frontmatter
(no `name` AND no `description`). Pre-fix the walker greedily
evaluated every `*.md` under `agents/` — `agents/README.md` and
helper docs got flagged as "frontmatter.description empty"
rejections. Worse: `summarize_for_preview` for `type=agent` ALREADY
filters the same shape, so the upload preview gave a green dot
while the post-bake check gave a red rejection on submit. Two new
regression tests in TestAgentsWalkerSkipsNonAgentFiles pin both
shapes (README + _NOTES.md) so the preview/check parity stays
aligned.
2) `body_too_short` hints now use the same runtime-kwarg substitution
pattern as every other hint in the table. Pre-fix the skill +
agent body_too_short hints called `.format(min_chars=_MIN_BODY_CHARS)`
at module-load time, but the call site `_hint_for(type_,
"body_too_short")` didn't pass `min_chars=`, so the format() was
just baking the constant at import. Cosmetic inconsistency; pass
`min_chars=_MIN_BODY_CHARS` at the call site instead and let
`_hint_for` do the substitution like it does for `too_short`.
Verified end-to-end:
- New TestAgentsWalkerSkipsNonAgentFiles cases fail on the unfixed
walker (verified by reverting to the pre-fix file and re-running);
pass cleanly after the fix.
- Full content-guardrail suite: 25/25 (23 existing + 2 new).
- Full pytest: 4189 passed, 25 skipped.
* release: 0.53.5 — content guardrail (flea-market submitter UX) + catalog ENTITY column + BQ hint dispatch
Bundles three threads landed in [Unreleased]:
- Vojta's flea-market content guardrail (two-tier mechanical + LLM)
- Zdeněk's `agnes catalog` ENTITY column replacement for FLAVOR
- Zdeněk's `/api/query` remote_estimate_failed hint dispatch fix
Plus the takeover hygiene from #276 review (agents walker preview/check
parity + body_too_short hint runtime kwarg consistency) and the
backslash-escape fix follow-up to v0.53.4 #275.
No DB migration; no API change. Patch upgrade lands transparently.
Upload form's new "Before you upload" disclosure + per-component preview
table appear on the next dev-VM auto-pull. Quarantine banner now groups
findings by file with "See <type> example ↗" deep-links to the new
/store/examples reference page.
---------
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
302 lines
12 KiB
Python
302 lines
12 KiB
Python
"""Content-guardrail tests — the mechanical per-component description check.
|
|
|
|
Exercises every failure-mode code on a synthetic baked plugin tree:
|
|
empty, placeholder_text, too_short, low_word_count, body_too_short.
|
|
Also verifies the aggregate ``InlineResult.passed`` flips false when the
|
|
content tier fails even with manifest + security passing.
|
|
"""
|
|
|
|
from __future__ import annotations
|
|
|
|
import json
|
|
import shutil
|
|
import tempfile
|
|
from pathlib import Path
|
|
|
|
import pytest
|
|
|
|
from src.store_guardrails import run_inline_checks
|
|
from src.store_guardrails.content_check import (
|
|
check as content_check,
|
|
check_submission_description,
|
|
summarize_components,
|
|
summarize_for_preview,
|
|
)
|
|
|
|
|
|
_OK_DESC = "Use when validating per-component description guardrails end to end"
|
|
_OK_BODY = "Body content explaining what this component does, when to use it, and the constraints. " * 4
|
|
|
|
|
|
@pytest.fixture
|
|
def plugin_dir():
|
|
d = Path(tempfile.mkdtemp(prefix="agnes_content_test_"))
|
|
yield d
|
|
shutil.rmtree(d, ignore_errors=True)
|
|
|
|
|
|
def _write_skill(plugin_dir: Path, *, description: str = _OK_DESC, body: str = _OK_BODY) -> None:
|
|
target = plugin_dir / "skills" / "test-skill"
|
|
target.mkdir(parents=True, exist_ok=True)
|
|
(target / "SKILL.md").write_text(
|
|
f"---\nname: test-skill\ndescription: {description}\n---\n\n{body}\n",
|
|
encoding="utf-8",
|
|
)
|
|
|
|
|
|
def _write_agent(plugin_dir: Path, *, name: str = "reviewer", description: str = _OK_DESC, body: str = _OK_BODY) -> None:
|
|
target = plugin_dir / "agents"
|
|
target.mkdir(parents=True, exist_ok=True)
|
|
(target / f"{name}.md").write_text(
|
|
f"---\nname: {name}\ndescription: {description}\n---\n\n{body}\n",
|
|
encoding="utf-8",
|
|
)
|
|
|
|
|
|
def _write_plugin_json(plugin_dir: Path, *, description: str = _OK_DESC) -> None:
|
|
target = plugin_dir / ".claude-plugin"
|
|
target.mkdir(parents=True, exist_ok=True)
|
|
(target / "plugin.json").write_text(
|
|
json.dumps({"name": "test-plugin", "description": description, "version": "0.1.0"}),
|
|
encoding="utf-8",
|
|
)
|
|
|
|
|
|
def _write_command(plugin_dir: Path, *, name: str = "run", description: str = "Run the test suite and report failures") -> None:
|
|
target = plugin_dir / "commands"
|
|
target.mkdir(parents=True, exist_ok=True)
|
|
(target / f"{name}.md").write_text(
|
|
f"---\nname: {name}\ndescription: {description}\n---\n\nrun it\n",
|
|
encoding="utf-8",
|
|
)
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
# Component-level failure codes
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
class TestSkillDescriptions:
|
|
def test_empty_description_fails(self, plugin_dir):
|
|
_write_skill(plugin_dir, description="")
|
|
out = content_check(plugin_dir)
|
|
assert out["status"] == "fail"
|
|
codes = {i["code"] for i in out["issues"]}
|
|
assert "empty" in codes
|
|
|
|
def test_todo_literal_fails_as_placeholder(self, plugin_dir):
|
|
_write_skill(plugin_dir, description="TODO")
|
|
out = content_check(plugin_dir)
|
|
codes = {i["code"] for i in out["issues"]}
|
|
assert "placeholder_text" in codes
|
|
|
|
def test_todo_prefix_fails(self, plugin_dir):
|
|
_write_skill(plugin_dir, description="TODO add the real description later")
|
|
out = content_check(plugin_dir)
|
|
codes = {i["code"] for i in out["issues"]}
|
|
assert "placeholder_text" in codes
|
|
|
|
def test_short_description_fails(self, plugin_dir):
|
|
_write_skill(plugin_dir, description="too short here") # 14 chars
|
|
out = content_check(plugin_dir)
|
|
codes = {i["code"] for i in out["issues"]}
|
|
assert "too_short" in codes
|
|
|
|
def test_unfilled_jinja_placeholder_fails(self, plugin_dir):
|
|
_write_skill(plugin_dir, description="Use when {{my_skill}} fires")
|
|
out = content_check(plugin_dir)
|
|
codes = {i["code"] for i in out["issues"]}
|
|
assert "placeholder_text" in codes
|
|
|
|
def test_length_floor_takes_precedence_over_word_count(self, plugin_dir):
|
|
# 4 words but well under 30 chars.
|
|
_write_skill(plugin_dir, description="foo bar baz quux")
|
|
out = content_check(plugin_dir)
|
|
codes = {i["code"] for i in out["issues"]}
|
|
assert "too_short" in codes
|
|
|
|
def test_low_distinct_words_fails(self, plugin_dir):
|
|
# 80 chars clears the length floor but only 1 distinct word
|
|
# after stripping punctuation — low_word_count fires.
|
|
_write_skill(plugin_dir, description=("foo " * 20).strip())
|
|
out = content_check(plugin_dir)
|
|
codes = {i["code"] for i in out["issues"]}
|
|
assert "low_word_count" in codes
|
|
|
|
def test_body_too_short_fails(self, plugin_dir):
|
|
_write_skill(plugin_dir, body="short body")
|
|
out = content_check(plugin_dir)
|
|
codes = {(i["field"], i["code"]) for i in out["issues"]}
|
|
assert ("body", "body_too_short") in codes
|
|
|
|
def test_well_formed_skill_passes(self, plugin_dir):
|
|
_write_skill(plugin_dir)
|
|
out = content_check(plugin_dir)
|
|
assert out["status"] == "pass"
|
|
assert out["issues"] == []
|
|
|
|
|
|
class TestPluginAndAgentDescriptions:
|
|
def test_plugin_description_empty_fails(self, plugin_dir):
|
|
_write_plugin_json(plugin_dir, description="")
|
|
out = content_check(plugin_dir)
|
|
files = {i["file"] for i in out["issues"]}
|
|
assert ".claude-plugin/plugin.json" in files
|
|
|
|
def test_one_bad_agent_among_many_is_isolated(self, plugin_dir):
|
|
_write_plugin_json(plugin_dir)
|
|
_write_agent(plugin_dir, name="good_one", description=_OK_DESC)
|
|
_write_agent(plugin_dir, name="bad_one", description="")
|
|
out = content_check(plugin_dir)
|
|
assert out["status"] == "fail"
|
|
# Only the bad agent's file shows in issues.
|
|
files = {i["file"] for i in out["issues"]}
|
|
assert "agents/bad_one.md" in files
|
|
assert "agents/good_one.md" not in files
|
|
|
|
def test_plugin_passes_when_descriptions_all_strong(self, plugin_dir):
|
|
_write_plugin_json(plugin_dir)
|
|
_write_agent(plugin_dir)
|
|
_write_skill(plugin_dir)
|
|
_write_command(plugin_dir)
|
|
out = content_check(plugin_dir)
|
|
assert out["status"] == "pass"
|
|
|
|
|
|
class TestCommands:
|
|
def test_command_short_description_fails(self, plugin_dir):
|
|
_write_command(plugin_dir, description="run") # 3 chars
|
|
out = content_check(plugin_dir)
|
|
codes = {i["code"] for i in out["issues"]}
|
|
assert "too_short" in codes
|
|
|
|
def test_command_lower_floor_still_enforced(self, plugin_dir):
|
|
# 38 chars + 6 distinct words — clears the 25/5 command floor.
|
|
_write_command(plugin_dir, description="Run tests, format output, report failures clearly")
|
|
out = content_check(plugin_dir)
|
|
assert out["status"] == "pass"
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
# Submission-level description
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
class TestSubmissionDescription:
|
|
def test_empty_submission_description_fails(self):
|
|
out = check_submission_description("")
|
|
assert out["status"] == "fail"
|
|
assert out["issues"][0]["code"] == "empty"
|
|
assert out["issues"][0]["file"] == "<submission>"
|
|
|
|
def test_placeholder_submission_description_fails(self):
|
|
out = check_submission_description("TBD")
|
|
codes = {i["code"] for i in out["issues"]}
|
|
assert "placeholder_text" in codes
|
|
|
|
def test_strong_submission_description_passes(self):
|
|
out = check_submission_description(_OK_DESC)
|
|
assert out["status"] == "pass"
|
|
assert out["issues"] == []
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
# Aggregation — InlineResult.passed flips on content failure
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
class TestInlineAggregate:
|
|
def test_content_failure_blocks_passed(self, plugin_dir):
|
|
# Skill with frontmatter description = TODO. Manifest + security
|
|
# pass; content fails; aggregate must be False.
|
|
_write_skill(plugin_dir, description="TODO")
|
|
r = run_inline_checks(plugin_dir, type_="skill", description=_OK_DESC)
|
|
assert r.manifest["status"] == "pass"
|
|
assert r.static_security["status"] == "pass"
|
|
assert r.content["status"] == "fail"
|
|
assert not r.passed
|
|
|
|
def test_submission_desc_failure_merges_into_content(self, plugin_dir):
|
|
_write_skill(plugin_dir)
|
|
r = run_inline_checks(plugin_dir, type_="skill", description="")
|
|
assert r.content["status"] == "fail"
|
|
files = {i["file"] for i in r.content["issues"]}
|
|
assert "<submission>" in files
|
|
|
|
def test_clean_bundle_passes(self, plugin_dir):
|
|
_write_skill(plugin_dir)
|
|
r = run_inline_checks(plugin_dir, type_="skill", description=_OK_DESC)
|
|
assert r.passed
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
# summarize_components + summarize_for_preview
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
class TestSummaries:
|
|
def test_summarize_components_baked_plugin_tree(self, plugin_dir):
|
|
_write_plugin_json(plugin_dir)
|
|
_write_agent(plugin_dir)
|
|
_write_skill(plugin_dir)
|
|
rows = summarize_components(plugin_dir)
|
|
types = {r["type"] for r in rows}
|
|
assert types == {"plugin", "agent", "skill"}
|
|
for r in rows:
|
|
assert r["ok"] is True
|
|
|
|
def test_summarize_for_preview_skill(self, plugin_dir):
|
|
# Single SKILL.md at root — preview should locate it without the
|
|
# `skills/<name>/` wrapper.
|
|
(plugin_dir / "SKILL.md").write_text(
|
|
f"---\nname: probe\ndescription: {_OK_DESC}\n---\n\n{_OK_BODY}\n",
|
|
encoding="utf-8",
|
|
)
|
|
rows = summarize_for_preview(plugin_dir, "skill")
|
|
assert len(rows) == 1
|
|
assert rows[0]["type"] == "skill"
|
|
assert rows[0]["ok"] is True
|
|
|
|
def test_summarize_for_preview_marks_bad_descriptions(self, plugin_dir):
|
|
(plugin_dir / "SKILL.md").write_text(
|
|
"---\nname: probe\ndescription: TODO\n---\n\n" + _OK_BODY + "\n",
|
|
encoding="utf-8",
|
|
)
|
|
rows = summarize_for_preview(plugin_dir, "skill")
|
|
assert len(rows) == 1
|
|
assert rows[0]["ok"] is False
|
|
codes = {i["code"] for i in rows[0]["issues"]}
|
|
assert "placeholder_text" in codes
|
|
|
|
|
|
class TestAgentsWalkerSkipsNonAgentFiles:
|
|
"""`agents/README.md` (and other helper files without frontmatter)
|
|
must not be evaluated as a missing-description agent. Pre-fix the
|
|
`_iter_components` walker greedily evaluated every `*.md` under
|
|
`agents/`, which gave a green dot in the upload preview (preview
|
|
walker correctly filtered) but a red rejection on submit (check
|
|
walker did not). Pin the parity here so the two stay aligned."""
|
|
|
|
def test_readme_under_agents_is_skipped(self, plugin_dir):
|
|
# One real agent + one README (no frontmatter at all).
|
|
_write_agent(plugin_dir, name="reviewer")
|
|
(plugin_dir / "agents" / "README.md").write_text(
|
|
"# How to author agents in this plugin\n\nA few notes for contributors.\n",
|
|
encoding="utf-8",
|
|
)
|
|
result = content_check(plugin_dir)
|
|
# README must NOT generate any issue. The lone real agent passes
|
|
# the floor, so the whole plugin passes.
|
|
assert result["status"] == "pass", result["issues"]
|
|
|
|
def test_helper_md_without_frontmatter_is_skipped(self, plugin_dir):
|
|
_write_agent(plugin_dir, name="reviewer")
|
|
(plugin_dir / "agents" / "_NOTES.md").write_text(
|
|
"Some helper notes — not an agent. No frontmatter, no agent shape.\n",
|
|
encoding="utf-8",
|
|
)
|
|
rows = summarize_components(plugin_dir)
|
|
types_files = {(r["type"], r["file"]) for r in rows}
|
|
# Only the real agent should appear; _NOTES.md is filtered out.
|
|
assert ("agent", "agents/reviewer.md") in types_files
|
|
assert ("agent", "agents/_NOTES.md") not in types_files
|