AI-Cognitive-Leap/agnes-the-ai-analyst

Commit graph

Author	SHA1	Message	Date

Vojtech

fb6e930bc9

feat(store-guardrails): per-component description quality + plain-language UX (#276)

* feat(store-guardrails): enforce per-component description quality

Two-tier hard guardrail on flea-market submissions. Empty / placeholder /
single-word descriptions now block before any LLM call; vague-but-passes-
floor descriptions block on the substantive LLM review layer.

Tier 1 — inline mechanical check (src/store_guardrails/content_check.py).
Walks the baked plugin tree, evaluates each component (plugin manifest,
agents, skills, commands) plus the submission-level form description
against a 60-char / 25-char (commands) / 5-distinct-word / 200-char-body
floor with a placeholder denylist (TODO, TBD, {{var}}, etc.). Floors
calibrated against real ecosystem norms: Claude / superpowers /
compound-engineering skill packs cluster 150–220 chars, npm / Docker /
VS Code at 100–120. InlineResult.passed now ANDs in content.status.
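
As a rough illustration of the Tier 1 floors, the sketch below applies the thresholds quoted above to a single description. It is not the code in content_check.py: the function name, the constant names, and the failure codes other than `too_short` are hypothetical, and applying the distinct-word floor to commands is an assumption.

```python
import re

# Thresholds quoted in the commit message; the constant names are hypothetical.
MIN_DESCRIPTION_CHARS = 60
MIN_COMMAND_DESCRIPTION_CHARS = 25
MIN_DISTINCT_WORDS = 5

# Partial placeholder denylist (the message lists TODO, TBD, {{var}}, "etc.").
PLACEHOLDER_PATTERNS = [
    re.compile(r"\bTODO\b", re.IGNORECASE),
    re.compile(r"\bTBD\b", re.IGNORECASE),
    re.compile(r"\{\{.*?\}\}"),
]


def check_description(text: str, component_type: str) -> list:
    """Return failure codes for one description; an empty list means it passes."""
    stripped = text.strip()
    if not stripped:
        return ["empty"]

    failures = []
    if any(p.search(stripped) for p in PLACEHOLDER_PATTERNS):
        failures.append("placeholder")

    # Commands get the shorter floor; every other component type gets 60 chars.
    floor = (
        MIN_COMMAND_DESCRIPTION_CHARS
        if component_type == "command"
        else MIN_DESCRIPTION_CHARS
    )
    if len(stripped) < floor:
        failures.append("too_short")

    # Count distinct case-insensitive words.
    distinct = {w.lower() for w in re.findall(r"[A-Za-z0-9']+", stripped)}
    if len(distinct) < MIN_DISTINCT_WORDS:
        failures.append("too_few_words")

    return failures
```

Returning a list of codes rather than a boolean keeps each finding attributable to a field, which is what the per-component preview table described below relies on.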

Tier 2 — LLM review extension (prompts.py + llm_review.py). System
prompt gains a content-quality criterion; REVIEW_JSON_SCHEMA carries a
content_quality {verdict, issues[]} object alongside the existing
security findings. is_safe() requires content_quality.verdict == 'pass'.
Single LLM call covers both dimensions. MAX_RESPONSE_TOKENS bumped
2000 → 2500 for the extra payload. Verdicts missing content_quality
treated as pass (backwards compat with already-recorded rows).
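
The backwards-compat rule can be sketched as follows. Only `content_quality`, `verdict`, and `is_safe` come from this message; the `findings` key used for the security side is an assumed shape, not the actual REVIEW_JSON_SCHEMA.

```python
def is_safe(verdict: dict) -> bool:
    """Hypothetical combined gate: security and content quality must both pass."""
    # Assumed shape: an empty or absent "findings" list means no security issue.
    security_ok = not verdict.get("findings")

    # Rows recorded before this change carry no content_quality object;
    # they are treated as passing so old verdicts keep their meaning.
    content = verdict.get("content_quality")
    content_ok = content is None or content.get("verdict") == "pass"

    return security_ok and content_ok
```

Usage: `is_safe({"findings": []})` returns True for a legacy row, while `is_safe({"findings": [], "content_quality": {"verdict": "fail", "issues": ["vague"]}})` returns False.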

Submitter UX:
- /store/new wizard now carries a "Before you upload — what passes
review" collapsible disclosure on both step 1 and step 2 with the
bar + patterns that work. Live char counter on the description
field. Per-component preview table (green/red dots from the new
summarize_for_preview helper) renders after the ZIP /preview round
trip, scoping each finding to its file.
- New /store/examples page with rejected/passes pairs for skill /
agent / plugin / command plus a "Why these limits" research table.
Anchored sections (#skill / #agent / #plugin / #command) so the
rejection banner can deep-link by component_type.
- Quarantine banner _content_findings.html groups findings by file
(one "See <type> example ↗" per component, not per field) and
translates field codes (frontmatter.description / body / etc.) to
plain-English labels. _content_howto_fix.html surfaces a static
"Re-upload as new version" + "See examples" action row beneath any
content failure on the entity detail page.
- _parse_frontmatter moved to src/store_guardrails/_frontmatter.py so
the new check module shares the parser without inverting the
app → src dependency direction.

Tests:
- New tests/test_store_guardrails_content.py (29 cases) covering
every failure code per component type plus submission-level checks
and the summarize_components / summarize_for_preview helpers.
- Extended test_store_guardrails_inline.py for the new
InlineResult.content field + aggregate behaviour.
- Extended test_store_guardrails_llm.py for the new
content_quality verdict pathways (fail blocks, missing field passes).
- Backfilled fixture descriptions across test_store_api.py,
test_store_entity_versions.py, test_store_put_atomic.py,
test_admin_store_submissions.py, test_marketplace_api.py,
test_marketplace_v32_endpoints.py so existing happy-path tests
clear the new 60-char floor.

* fix(content-guardrail): align agents walker with preview + drop import-time .format()

Two cleanups from the takeover review on #276 (vr/guardrails-content).

1) `_iter_components` for agents now skips files lacking frontmatter
(no `name` AND no `description`). Pre-fix the walker greedily
evaluated every `*.md` under `agents/` — `agents/README.md` and
helper docs got flagged as "frontmatter.description empty"
rejections. Worse: `summarize_for_preview` for `type=agent` ALREADY
filters the same shape, so the upload preview gave a green dot
while the post-bake check gave a red rejection on submit. Two new
regression tests in TestAgentsWalkerSkipsNonAgentFiles pin both
shapes (README + _NOTES.md) so the preview/check parity stays
aligned.

2) `body_too_short` hints now use the same runtime-kwarg substitution
pattern as every other hint in the table. Pre-fix the skill +
agent body_too_short hints called `.format(min_chars=_MIN_BODY_CHARS)`
at module-load time, but the call site `_hint_for(type_,
"body_too_short")` didn't pass `min_chars=`, so the format() was
just baking the constant at import. Cosmetic inconsistency; pass
`min_chars=_MIN_BODY_CHARS` at the call site instead and let
`_hint_for` do the substitution like it does for `too_short`.

Verified end-to-end:
- New TestAgentsWalkerSkipsNonAgentFiles cases fail on the unfixed
walker (verified by reverting to the pre-fix file and re-running);
pass cleanly after the fix.
- Full content-guardrail suite: 25/25 (23 existing + 2 new).
- Full pytest: 4189 passed, 25 skipped.

* release: 0.53.5 — content guardrail (flea-market submitter UX) + catalog ENTITY column + BQ hint dispatch

Bundles three threads landed in [Unreleased]:
- Vojta's flea-market content guardrail (two-tier mechanical + LLM)
- Zdeněk's `agnes catalog` ENTITY column replacement for FLAVOR
- Zdeněk's `/api/query` remote_estimate_failed hint dispatch fix

Plus the takeover hygiene from #276 review (agents walker preview/check
parity + body_too_short hint runtime kwarg consistency) and the
backslash-escape fix follow-up to v0.53.4 #275.

No DB migration; no API change. Patch upgrade lands transparently.
Upload form's new "Before you upload" disclosure + per-component preview
table appear on the next dev-VM auto-pull. Quarantine banner now groups
findings by file with "See <type> example ↗" deep-links to the new
/store/examples reference page.

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>

2026-05-12 21:48:27 +02:00

Vojtech

929520f5e1

Flea-market edit feature with version history (schema v37) (#239)

* feat(store): flea-market entity edit feature with version history (schema v38)

Owner + admin can now edit a store entity from a real Edit page at
/marketplace/flea/{id}/edit, replacing the prior "coming soon"
placeholder. Editable: display name, description, category, video
URL, cover photo, and an optional new bundle. Type is locked (400
type_locked). Display-name change renames the on-disk slug for both
live plugin/ and version dirs (reuses rename-on-archive helper).

Schema v38 (originally drafted as v37; renumbered after rebase onto
main where v37 was taken by the curated marketplace enrichment).

Versioning model:
* Each bundle update bakes into ${DATA_DIR}/store/<id>/versions/v<N+1>/plugin/
and runs the standard guardrails pipeline.
* DEFERRED PROMOTION: live plugin/ + entity.version_no stay at the
prior approved version through the LLM review window so existing
installers keep receiving the previously approved bundle. Live swap
+ version_no/version/file_size bump happen only on LLM approval.
Blocked verdicts leave the prior version serving forever.
* store_entities gains version_no INTEGER + version_history JSON.
Each version_history entry carries hash, sha256, size, submission_id,
created_at, created_by.
* Existing entities backfill to v1 with a single-entry history seeded
from the row's current `version` hash. Initial create also seeds
versions/v1/plugin/ so future restore can copy v1 bytes forward.
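
A minimal in-memory sketch of deferred promotion, assuming a history entry is appended when the new bundle is baked and that version N is the Nth entry. `StoreEntity`, `pending_version_no`, `submit_bundle`, and `apply_llm_verdict` are hypothetical names; only `version_no` and `version_history` come from this message.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StoreEntity:
    # Live, installer-facing version; changes only on LLM approval.
    version_no: int = 1
    # One entry per baked version (real entries also carry sha256, size, etc.).
    version_history: List[dict] = field(
        default_factory=lambda: [{"hash": "v1-hash"}]
    )
    # Hypothetical bookkeeping field for the version awaiting review.
    pending_version_no: Optional[int] = None


def submit_bundle(entity: StoreEntity, bundle_hash: str) -> int:
    """Record a new baked version without touching the live version."""
    entity.version_history.append({"hash": bundle_hash})
    entity.pending_version_no = len(entity.version_history)
    return entity.pending_version_no


def apply_llm_verdict(entity: StoreEntity, approved: bool) -> None:
    """Promote the pending version on approval; otherwise keep the prior one."""
    if approved and entity.pending_version_no is not None:
        entity.version_no = entity.pending_version_no
    entity.pending_version_no = None
```

In this sketch a blocked verdict leaves `version_no` untouched, so installers keep resolving the previously approved bundle.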

Concurrency:
* Block-while-pending: an in-flight LLM review blocks any further edit
with 409 prior_version_pending. Owner waits 5-30s; Edit button on
detail page renders disabled in the same window via the new
edit_in_flight flag (decoupled from quarantine_sub since the
deferred-promotion flow keeps visibility='approved').
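
The block-while-pending gate might look roughly like this. The `"pending"` status value, the `EditConflict` exception, and the function name are assumptions; only the 409 status and the `prior_version_pending` code come from this message.

```python
class EditConflict(Exception):
    """Hypothetical error that an API layer would map to an HTTP 409 response."""

    def __init__(self, code: str):
        super().__init__(code)
        self.status_code = 409
        self.code = code


def ensure_no_review_in_flight(submission_statuses: list) -> None:
    """Reject an edit while any submission for the entity is still in review.

    The check reads submission status directly and ignores entity visibility,
    because deferred promotion keeps visibility at 'approved' during review.
    """
    if "pending" in submission_statuses:
        raise EditConflict("prior_version_pending")
```

The same check can feed the `edit_in_flight` flag so the Edit button and the API agree on when edits are blocked.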

Rollback:
* New endpoint POST /api/store/entities/{id}/versions/{n}/restore
(owner + admin). Copies vN bundle forward as v<max+1> and re-runs
guardrails (rules tighten over time; pre-approved bundles re-validate).
Forward-only history. Same deferred-promotion semantics — live stays
at prior version until LLM approves the restored copy.
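
The forward-only copy can be sketched with the standard library alone. `restore_version` is a hypothetical helper operating on a `versions/` directory laid out as described above; the real endpoint also re-runs guardrails and waits for approval, which this sketch leaves out.

```python
import shutil
from pathlib import Path


def restore_version(versions_dir: Path, n: int) -> int:
    """Copy versions/v<n>/plugin forward as v<max+1>/plugin and return max+1."""
    source = versions_dir / f"v{n}" / "plugin"
    if not source.is_dir():
        raise FileNotFoundError(f"version v{n} has no plugin directory")

    # Highest existing version number among directories named v<digits>.
    existing = [
        int(p.name[1:])
        for p in versions_dir.iterdir()
        if p.is_dir() and p.name.startswith("v") and p.name[1:].isdigit()
    ]
    new_no = max(existing) + 1

    # copytree creates the missing v<new_no>/ parent directory itself.
    shutil.copytree(source, versions_dir / f"v{new_no}" / "plugin")
    return new_no
```

Because the restored bytes land in a new directory, earlier versions are never overwritten and the history stays append-only.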

UI:
* New /marketplace/flea/{id}/edit page (owner + admin gated).
* Versions card on plugin + item detail templates (owner/admin only)
via shared _flea_versions.html partial.
* Admin queue gains v# column with current badge + separate Hash
column. Submission detail surfaces Version + Bundle hash rows.
* Activity timeline split into per-submission + entity-wide cards;
entity-wide rows render vN chips when audit row params reference
a specific version.
* Section headers (Manifest / Static / Quality / LLM review) tag
with vN chip via shared macro.
* Reviewed-by-model field surfaces explanatory text per status.
* Banner upload-failure now redirects to detail page on
submission_blocked instead of staying stuck.

Tests: 24 in tests/test_store_entity_versions.py covering metadata-
only edit, bundle-edit version bump, type lock, block-while-pending,
name change disk rename, restore flow + 404/400/403 paths, edit page
404 for non-owner, versions card visibility gating, admin queue v#
column, admin detail Version/Hash rows, deferred-promotion installer
contract (pending review doesn't break installer / blocked verdict
keeps prior / approved promotes), admin can edit/restore non-owned,
restore deferred promotion, audit log per-version params. 214 tests
green across guardrails + edit + admin + repo + schema suites.

* docs(store): refresh update_entity docstring to match deferred-promotion + submission-status gate

Bring the docstring in sync with the actual fixes from the prior
commit. The pre-fix wording said the gate read
visibility_status='pending' AND submission status — under deferred
promotion that would never fire for v2+ edits. Now describes:

- Block-while-pending gates on submission.status DIRECTLY,
independent of visibility (so v2+ deferred-promotion edits don't
slip through).
- Display-name + bundle change defers the live rename to promotion;
metadata-only renames stay immediate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-10 00:14:33 +04:00

2 commits