agnes-the-ai-analyst

Vojtech fb6e930bc9 feat(store-guardrails): per-component description quality + plain-language UX (#276)

* feat(store-guardrails): enforce per-component description quality

Two-tier hard guardrail on flea-market submissions. Empty / placeholder / single-word descriptions now block before any LLM call; vague-but-passes-floor descriptions block on the substantive LLM review layer.

Tier 1: inline mechanical check (src/store_guardrails/content_check.py). Walks the baked plugin tree, evaluates each component (plugin manifest, agents, skills, commands) plus the submission-level form description against a 60-char / 25-char (commands) / 5-distinct-word / 200-char-body floor with a placeholder denylist (TODO, TBD, {{var}}, etc.). Floors calibrated against real ecosystem norms: Claude / superpowers / compound-engineering skill packs cluster 150–220 chars, npm / Docker / VS Code at 100–120. InlineResult.passed now ANDs in content.status.

Tier 2: LLM review extension (prompts.py + llm_review.py). System prompt gains a content-quality criterion; REVIEW_JSON_SCHEMA carries a content_quality {verdict, issues[]} object alongside the existing security findings. is_safe() requires content_quality.verdict == 'pass'. Single LLM call covers both dimensions. MAX_RESPONSE_TOKENS bumped 2000 → 2500 for the extra payload. Verdicts missing content_quality treated as pass (backwards compat with already-recorded rows).

Submitter UX:
- /store/new wizard now carries a "Before you upload — what passes review" collapsible disclosure on both step 1 and step 2 with the bar + patterns that work. Live char counter on the description field. Per-component preview table (green/red dots from the new summarize_for_preview helper) renders after the ZIP /preview round trip, scoping each finding to its file.
- New /store/examples page with rejected/passes pairs for skill / agent / plugin / command plus a "Why these limits" research table.
- Anchored sections (#skill / #agent / #plugin / #command) so the rejection banner can deep-link by component_type.
- Quarantine banner _content_findings.html groups findings by file (one "See <type> example ↗" per component, not per field) and translates field codes (frontmatter.description / body / etc.) to plain-English labels. _content_howto_fix.html surfaces a static "Re-upload as new version" + "See examples" action row beneath any content failure on the entity detail page.
- _parse_frontmatter moved to src/store_guardrails/_frontmatter.py so the new check module shares the parser without inverting the app → src dependency direction.

Tests:
- New tests/test_store_guardrails_content.py (29 cases) covering every failure code per component type plus submission-level checks and the summarize_components / summarize_for_preview helpers.
- Extended test_store_guardrails_inline.py for the new InlineResult.content field + aggregate behaviour.
- Extended test_store_guardrails_llm.py for the new content_quality verdict pathways (fail blocks, missing field passes).
- Backfilled fixture descriptions across test_store_api.py, test_store_entity_versions.py, test_store_put_atomic.py, test_admin_store_submissions.py, test_marketplace_api.py, test_marketplace_v32_endpoints.py so existing happy-path tests clear the new 60-char floor.

* fix(content-guardrail): align agents walker with preview + drop import-time .format()

Two cleanups from the takeover review on #276 (vr/guardrails-content).

1) `_iter_components` for agents now skips files lacking frontmatter (no `name` AND no `description`). Pre-fix the walker greedily evaluated every `.md` under `agents/`; `agents/README.md` and helper docs got flagged as "frontmatter.description empty" rejections. Worse: `summarize_for_preview` for `type=agent` ALREADY filters the same shape, so the upload preview gave a green dot while the post-bake check gave a red rejection on submit. Two new regression tests in TestAgentsWalkerSkipsNonAgentFiles pin both shapes (README + _NOTES.md) so the preview/check parity stays aligned.

2) `body_too_short` hints now use the same runtime-kwarg substitution pattern as every other hint in the table. Pre-fix the skill + agent body_too_short hints called `.format(min_chars=_MIN_BODY_CHARS)` at module-load time, but the call site `_hint_for(type_, "body_too_short")` didn't pass `min_chars=`, so the format() was just baking the constant at import. Cosmetic inconsistency; pass `min_chars=_MIN_BODY_CHARS` at the call site instead and let `_hint_for` do the substitution like it does for `too_short`.

Verified end-to-end:
- New TestAgentsWalkerSkipsNonAgentFiles cases fail on the unfixed walker (verified by reverting to the pre-fix file and re-running); pass cleanly after the fix.
- Full content-guardrail suite: 25/25 (23 existing + 2 new).
- Full pytest: 4189 passed, 25 skipped.

release: 0.53.5 — content guardrail (flea-market submitter UX) + catalog ENTITY column + BQ hint dispatch

Bundles three threads landed in [Unreleased]:
- Vojta's flea-market content guardrail (two-tier mechanical + LLM)
- Zdeněk's `agnes catalog` ENTITY column replacement for FLAVOR
- Zdeněk's `/api/query` remote_estimate_failed hint dispatch fix

Plus the takeover hygiene from #276 review (agents walker preview/check parity + body_too_short hint runtime kwarg consistency) and the backslash-escape fix follow-up to v0.53.4 #275.

No DB migration; no API change. Patch upgrade lands transparently. Upload form's new "Before you upload" disclosure + per-component preview table appear on the next dev-VM auto-pull. Quarantine banner now groups findings by file with "See <type> example ↗" deep-links to the new /store/examples reference page.

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>		2026-05-12 21:48:27 +02:00
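The Tier 1 floors and the Tier 2 backwards-compat rule described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code in src/store_guardrails/content_check.py or llm_review.py: the function names, the constant names, the failure codes other than too_short, the word-splitting rule, and the assumption that the distinct-word floor applies to commands as well are all hypothetical. Only the numeric floors, the denylist entries, and the "missing content_quality counts as pass" rule come from the commit message.

```python
import re

# Floors quoted in the commit message; the constant names are hypothetical.
MIN_DESCRIPTION_CHARS = 60
MIN_COMMAND_DESCRIPTION_CHARS = 25
MIN_DISTINCT_WORDS = 5

# Placeholder denylist: TODO, TBD, and {{var}}-style template residue.
_PLACEHOLDER_RE = re.compile(r"\b(?:TODO|TBD)\b|\{\{.*?\}\}")


def check_description(text: str, component_type: str) -> list[str]:
    """Return failure codes for one description; an empty list means it passes."""
    text = (text or "").strip()
    if not text:
        return ["empty"]
    failures = []
    if _PLACEHOLDER_RE.search(text):
        failures.append("placeholder")
    # Commands get the shorter character floor.
    floor = (
        MIN_COMMAND_DESCRIPTION_CHARS
        if component_type == "command"
        else MIN_DESCRIPTION_CHARS
    )
    if len(text) < floor:
        failures.append("too_short")
    # Assumed word rule: case-insensitive runs of letters, digits, apostrophes.
    distinct = {w.lower() for w in re.findall(r"[A-Za-z0-9']+", text)}
    if len(distinct) < MIN_DISTINCT_WORDS:
        failures.append("too_few_words")
    return failures


def content_quality_passes(review: dict) -> bool:
    """Content half of the LLM verdict; a missing field counts as pass."""
    content_quality = review.get("content_quality")
    if content_quality is None:
        # Rows recorded before the field existed stay valid.
        return True
    return content_quality.get("verdict") == "pass"
```

With these assumptions, a bare "TODO" description trips all three mechanical checks at once, while a 31-character sentence clears the command floor but not the 60-character floor used for other component types.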
..
__init__.py
_metadata_models.py	feat(catalog): entity_type + validated where_examples + view-aware cost-guard + scheduler hygiene	2026-05-12 10:37:35 +02:00
access.py	System plugins (schema v39) + marketplace UX polish + drop legacy pages (#241)	2026-05-10 19:15:41 +00:00
admin.py	Flea-market upload guardrails + soft delete + JOIN-based admin queue (#233)	2026-05-09 17:32:53 +04:00
admin_bigquery_test.py	feat(admin): #160 BQ test-connection endpoint + billing_project placeholder UI	2026-05-04 10:31:35 +02:00
bq_metadata_refresh.py	release: 0.52.0 — UX/hygiene round (5 fixes from 0.51.0 retro)	2026-05-12 15:09:14 +02:00
cache_warmup.py	release: 0.50.0 — persistent BQ metadata cache + scheduled refresh; catalog never blocks on BigQuery	2026-05-11 20:37:17 +02:00
catalog.py	feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150)	2026-04-30 22:02:16 +02:00
claude_md.py	chore(cli-rename): replace stale `da` verbs in active code paths	2026-05-04 21:10:43 +02:00
cli_artifacts.py	chore: rename stale 'da' references to 'agnes' + CHANGELOG	2026-05-06 23:23:59 +02:00
data.py	feat(caddy): file_server for parquet downloads — bypass uvicorn	2026-05-05 16:41:33 +02:00
health.py	Extract session-pipeline framework + UsageProcessor skeleton (#232)	2026-05-08 19:47:46 +02:00
jira_webhooks.py	fix(security): close Jira webhook fail-open + path traversal (#83) (#93)	2026-04-27 19:53:55 +02:00
marketplace.py	Marketplace UX overhaul: rich plugin/skill/agent detail + filename rename (#251)	2026-05-12 08:38:39 +00:00
marketplaces.py	System plugins (schema v39) + marketplace UX polish + drop legacy pages (#241)	2026-05-10 19:15:41 +00:00
me.py	feat(home): state-aware /home + /setup-advanced + schema v26 (#228)	2026-05-08 18:28:47 +02:00
me_debug.py	feat(auth): /me/debug self-only auth diagnostic page (#116)	2026-04-29 06:36:28 +02:00
memory.py	feat(memory): admin Edit + MEMORY_DOMAIN RBAC + ai-section UI (#141)	2026-04-30 11:04:41 +02:00
metadata.py	feat(rbac+marketplace): RBAC v13 + Claude Code marketplace + #81/#83/#44 hardening	2026-04-28 14:25:04 +02:00
metrics.py	feat(rbac+marketplace): RBAC v13 + Claude Code marketplace + #81/#83/#44 hardening	2026-04-28 14:25:04 +02:00
my_stack.py	System plugins (schema v39) + marketplace UX polish + drop legacy pages (#241)	2026-05-10 19:15:41 +00:00
news.py	feat(home): state-aware /home + /setup-advanced + schema v26 (#228)	2026-05-08 18:28:47 +02:00
query.py	fix(bq-hint): drop literal backslash escapes from syntax-error hint string (#275)	2026-05-12 18:57:46 +00:00
query_hybrid.py	feat(rbac+marketplace): RBAC v13 + Claude Code marketplace + #81/#83/#44 hardening	2026-04-28 14:25:04 +02:00
scripts.py	feat(scheduler): re-wire sync_schedule + script.schedule; tune via env; OpenMetadata TLS (#135)	2026-04-29 22:06:30 +02:00
settings.py	feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150)	2026-04-30 22:02:16 +02:00
store.py	feat(store-guardrails): per-component description quality + plain-language UX (#276)	2026-05-12 21:48:27 +02:00
sync.py	release: 0.47.1 — Keboola connector v27 (incremental, partitioned, where_filters, typed parquet) (#217)	2026-05-07 19:01:27 +02:00
telegram.py
tokens.py	chore(lint): final ruff fixes	2026-05-04 19:32:52 +02:00
upload.py	fix(security+ops) + release(0.12.1): #82 #85 #87 hardening + cut 0.12.1 (#104)	2026-04-28 19:57:30 +02:00
users.py	System plugins (schema v39) + marketplace UX polish + drop legacy pages (#241)	2026-05-10 19:15:41 +00:00
v2_arrow.py	feat(v2): claude-driven fetch primitives + 0.14.0 (#102)	2026-04-29 01:07:19 +02:00
v2_cache.py	feat(v2): claude-driven fetch primitives + 0.14.0 (#102)	2026-04-29 01:07:19 +02:00
v2_catalog.py	feat(catalog): entity_type + validated where_examples + view-aware cost-guard + scheduler hygiene	2026-05-12 10:37:35 +02:00
v2_quota.py	refactor(quota): #160 relocate _build_quota_tracker to v2_quota.py	2026-05-04 10:31:35 +02:00
v2_sample.py	release: 0.46.5 — agnes describe -n parses, server sanitizes NaN (#224)	2026-05-07 18:16:21 +02:00
v2_scan.py	perf: Tier 1 event-loop unblocking — async def → def on BQ-bound handlers	2026-05-05 17:44:08 +02:00
v2_schema.py	release: 0.53.0 — close Tier B trackers (#259-#261) + admin UI fix (#265) (#267)	2026-05-12 16:28:41 +02:00
welcome.py	fix(devin-review): dashboard CTA respects override; PUT validates anon path	2026-05-03 21:45:32 +02:00
where_validator.py	feat(v2): claude-driven fetch primitives + 0.14.0 (#102)	2026-04-29 01:07:19 +02:00