agnes-the-ai-analyst/app/api
Vojtech a694a30a5e
fix(store): surface review failures + harden publish gate (#316)
* fix(store): surface review failures + harden publish gate

Four independent fixes to the flea-market submission pipeline, all surfaced
by an admin upload that landed at status='approved' without an LLM review.

1. LLM truncation no longer pins submissions in review_error.
   - Raised MAX_RESPONSE_TOKENS 2500 → 6000 in llm_review.py
   - Added one-shot retry-with-doubled-budget in anthropic_provider.py
     (capped at 4× initial)

2. Flea detail page surfaces the latest submission's failure verdict even
   when a previously-approved version is still serving (deferred-promotion
   path). The _quarantine_banner gate widened from `visibility != approved`
   to also fire on `blocked_inline / blocked_llm / review_error`, with copy
   that distinguishes the v2+ edit case ("Latest edit failed review —
   previously approved version (vN) keeps serving") from the initial-upload
   quarantine wording.

3. Restore button + endpoint no longer allow restoring a version that was
   never approved. Added StoreEntitiesRepository.get_with_version_approvals
   joining store_submissions, gated the UI button on submission_status in
   ('approved', None), rendered status pills for non-restorable rows, and
   added a 400 version_not_approved guard in POST /restore.

4. **BREAKING (operator-facing)**: publish gate is now fail-CLOSED on
   misconfig. The previous get_guardrails_enabled() silently fell back to
   "disabled, auto-approve everything" when guardrails.enabled=true in YAML
   but no ANTHROPIC_API_KEY was in env. Split into:
     - get_guardrails_enabled()              (intent — YAML)
     - get_guardrails_llm_provider_ready()   (readiness — env)
   Three-state matrix:
     enabled=false                       → auto-approve (unchanged)
     enabled=true + ready=true           → normal pipeline (unchanged)
     enabled=true + ready=false (NEW)    → submissions hold at pending_llm
                                           awaiting admin retry or override
                                           (was: silent auto-approve)
   Admin "Retry review" eligibility broadened to include pending_llm.
   Boot-time WARNING banner surfaces the misconfig in app/main.py.
   docs/STORE_GUARDRAILS.md updated with the three-state matrix.
   Operators relying on the auto-fallback for local-dev no-LLM setups must
   now explicitly set `guardrails.enabled: false` in instance.yaml.
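
   The three-state matrix in fix 4 can be sketched as a small decision
   function. This is an illustration only: `GateDecision` and
   `decide_publish_gate` are hypothetical names, and the two booleans stand
   in for the results of get_guardrails_enabled() and
   get_guardrails_llm_provider_ready().

```python
from enum import Enum


class GateDecision(Enum):
    """Hypothetical outcomes of the publish gate (names are assumptions)."""

    AUTO_APPROVE = "auto_approve"
    RUN_PIPELINE = "run_pipeline"
    HOLD_PENDING_LLM = "hold_pending_llm"


def decide_publish_gate(enabled: bool, provider_ready: bool) -> GateDecision:
    """Map (intent, readiness) to a gate decision, failing closed on misconfig."""
    if not enabled:
        # Guardrails explicitly off in YAML: auto-approve, as before.
        return GateDecision.AUTO_APPROVE
    if provider_ready:
        # Guardrails on and the LLM provider is usable: normal review pipeline.
        return GateDecision.RUN_PIPELINE
    # Guardrails on but no usable provider: hold instead of silently approving.
    return GateDecision.HOLD_PENDING_LLM
```

   Keeping intent and readiness as separate inputs is what makes the third
   state expressible; a single boolean cannot tell "off" apart from
   "on but misconfigured".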

Tests: 4623 passed. Added TestPublishGateFailClosed (4 tests) and
TestRestoreVersion::test_restore_rejects_* (3 tests). conftest.py adds an
autouse fixture defaulting guardrails OFF so legacy tests don't need to
know about the new toggle.
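
The retry described in fix 1 might look roughly like the sketch below. It
is not the code in anthropic_provider.py: `review_with_retry` and the
`call_model` callback (which returns the response text plus a "was
truncated" flag) are hypothetical, and how the 4× cap interacts with a
single doubling is an assumption.

```python
from typing import Callable, Tuple

# Value taken from the commit message (MAX_RESPONSE_TOKENS after the bump).
MAX_RESPONSE_TOKENS = 6000


def review_with_retry(
    call_model: Callable[[int], Tuple[str, bool]],
    initial_budget: int = MAX_RESPONSE_TOKENS,
    cap_multiplier: int = 4,
) -> str:
    """Call the model once; on truncation retry once with a doubled budget."""
    text, truncated = call_model(initial_budget)
    if not truncated:
        return text
    # Double the budget, but never exceed cap_multiplier times the initial one.
    retry_budget = min(initial_budget * 2, initial_budget * cap_multiplier)
    text, truncated = call_model(retry_budget)
    if truncated:
        # Still truncated after the single retry: surface it as a review error.
        raise RuntimeError(f"response truncated at {retry_budget} tokens")
    return text
```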

* fix(store): admin override promotes v2+ edits to current

The override handler at app/api/admin.py:3708 only flipped submission
status → 'overridden' and entity visibility → 'approved'. Under the v37+
deferred-promotion model that's insufficient for v2+ edits / restores:
the new bundle sits in versions/v<N>/plugin/ and the entity row stays at
the prior approved version_no + hash + on-disk live bundle. Installers
kept getting the OLD bytes the admin had just intended to replace.

Mirror the runner.run_llm_review auto-approval branch: look up the
submission's version_hash in entity.version_history, and if its `n`
differs from entity.version_no, promote_version + _swap_live_to_version.
Initial v1 overrides are unaffected — the loop finds n=1 == version_no
and skips promotion.

Tests:
- test_override_v2_edit_promotes_to_current: stage v1 approved + v2
  blocked_llm; override the v2 sub; assert entity.version_no=2,
  entity.version flips off the v1 hash, and the live plugin/ dir
  mirrors versions/v2/plugin/.
- test_override_v1_initial_upload_no_promote: regression guard so the
  promote loop doesn't accidentally bump a v1 override.

Audit log gains a promoted_to_version_no field on the override action.

* fix(store): retry/rescan review staged bundle; override forward-only

Two adversarial-review findings from a Codex pass on the publish-gate
work.

C1. Admin retry + rescan were passing live `plugin/` to the LLM. For a
v2+ submission held at `pending_llm` / `blocked_llm` / `review_error`,
live still holds the prior approved version's bytes — so the LLM
reviewed the WRONG bytes, and the runner's hash-match promotion in
`run_llm_review` would then advance the entity to staged bytes that
were never actually reviewed. Resolve the staged
`<entity>/versions/v<N>/plugin/` from the submission's
`version_history` entry, with a fall-back to live for legacy pre-v37
rows that never seeded a versions/ dir. Helpers
`_submission_plugin_dir` and `_version_no_for_submission` added to
`app/api/store.py` so override / retry / rescan share one path.
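
A minimal sketch of the staged-bundle resolution described in C1, assuming
each `version_history` entry is a dict with an `n` key (from the text
above) and a `hash` key (an assumption). The function names mirror the
helpers mentioned but are not the actual code in `app/api/store.py`.

```python
from pathlib import Path
from typing import Optional


def version_no_for_hash(version_history: list[dict], version_hash: str) -> Optional[int]:
    """Return the version number whose history entry matches the hash, if any."""
    for entry in version_history:
        if entry.get("hash") == version_hash:
            return entry.get("n")
    return None


def submission_plugin_dir(
    entity_root: Path, version_history: list[dict], version_hash: str
) -> Path:
    """Prefer the staged versions/v<N>/plugin dir; fall back to live plugin/."""
    n = version_no_for_hash(version_history, version_hash)
    if n is not None:
        staged = entity_root / "versions" / f"v{n}" / "plugin"
        if staged.is_dir():
            return staged
    # Legacy rows without a versions/ dir (or an unknown hash) review live bytes.
    return entity_root / "plugin"
```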

H1. Override's promote loop used `target != current`, which would
silently demote the live bundle when an admin overrode a stale v2
submission while v3 was already approved + live. Changed to
`target > current` so override flips status + visibility on the row
regardless, but on-disk promotion only fires forward. Same `>`
defensive guard applied in `runner.run_llm_review` so a late LLM
verdict racing with a newer approval can't demote either.
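
The forward-only guard from H1 reduces to a single comparison. The helper
below is a hypothetical illustration (`should_promote` does not appear in
the commit); `None` stands for a submission whose hash is missing from
`version_history`.

```python
from typing import Optional


def should_promote(target_version_no: Optional[int], current_version_no: int) -> bool:
    """Promote on disk only when the target is strictly newer than current."""
    if target_version_no is None:
        # No matching history entry: nothing to promote.
        return False
    # `>` rather than `!=`, so a stale submission can never demote live.
    return target_version_no > current_version_no
```

With `!=`, overriding a stale v2 while v3 is live would return True and
swap v2 back in; with `>` the status and visibility can still change while
the on-disk bundle stays at v3.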

Tests:
- TestAdminRetryReviewsStagedBundle::test_retry_v2_blocked_passes_staged_dir_not_live
- TestAdminRetryReviewsStagedBundle::test_rescan_v2_blocked_passes_staged_dir_not_live
- TestOverrideForwardOnly::test_override_stale_v2_does_not_demote_when_v3_current

* review polish: CHANGELOG drift, override eligibility, defensive copy

Three small additions on top of the retry/rescan staged-bundle fix:

1. CHANGELOG: the PR's bullets had drifted into the released
   [0.54.17] section during rebase (context-match landed them next
   to already-released content). Moved them up to [Unreleased] where
   they belong; [0.54.17] now holds only what was actually released
   (refresh-marketplace ls-remote, /me/activity hero, CI sharding +
   workflow polish).

2. app/api/admin.py: admin override eligibility now accepts
   pending_llm alongside blocked_inline + blocked_llm + review_error.
   Closes a UX gap from the new fail-CLOSED behavior: under
   enabled-but-not-ready, a known-good submission would otherwise
   sit indefinitely until the admin set credentials AND clicked
   Retry. Override already routes through version_history (and is
   now forward-only on promote), so it stays safe for v2+ deferred-
   promotion submissions.

3. src/repositories/store_entities.py: get_with_version_approvals
   defensively copies each version_history entry before annotating
   with submission_status. self.get() re-parses JSON each call today
   so this is belt-and-suspenders against any future caching layer
   leaking the annotated key into a subsequent plain get() call.
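
The defensive copy in item 3 can be sketched as follows. This is not the
repository code: `annotate_version_history` and the `status_by_hash`
mapping are hypothetical, and the `hash` key on each entry is an
assumption.

```python
from typing import Optional


def annotate_version_history(
    version_history: list[dict],
    status_by_hash: dict[str, Optional[str]],
) -> list[dict]:
    """Return copies of the entries annotated with submission_status."""
    annotated = []
    for entry in version_history:
        row = dict(entry)  # shallow copy so the caller's entry is not mutated
        # None means no submission row was found for this version.
        row["submission_status"] = status_by_hash.get(entry.get("hash"))
        annotated.append(row)
    return annotated
```

A shallow copy is enough here because only a new top-level key is added;
nested values inside each entry are shared, not modified.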

Tests: 112 passed (focused on test_store_entity_versions +
test_admin_store_submissions, covering the retry/rescan staged-
bundle fix the author shipped + this polish).

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-15 15:52:07 +02:00
__init__.py feat: add FastAPI server with auth, RBAC, and all API endpoints 2026-03-27 15:19:18 +01:00
_metadata_models.py feat(catalog): entity_type + validated where_examples + view-aware cost-guard + scheduler hygiene 2026-05-12 10:37:35 +02:00
access.py feat(web): consolidate the personal /me/* surface — /me/activity + /me/profile (#304) 2026-05-14 21:29:51 +02:00
activity.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
admin.py fix(store): surface review failures + harden publish gate (#316) 2026-05-15 15:52:07 +02:00
admin_bigquery_test.py feat(admin): #160 BQ test-connection endpoint + billing_project placeholder UI 2026-05-04 10:31:35 +02:00
admin_sessions.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
admin_usage.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
admin_usage_summary.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
admin_user_sessions.py fix(security): RBAC filter uses stable user_id instead of mutable email local-part (#293) (#299) 2026-05-14 14:12:54 +00:00
bq_metadata_refresh.py release: 0.52.0 — UX/hygiene round (5 fixes from 0.51.0 retro) 2026-05-12 15:09:14 +02:00
cache_warmup.py release: 0.50.0 — persistent BQ metadata cache + scheduled refresh; catalog never blocks on BigQuery 2026-05-11 20:37:17 +02:00
catalog.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
claude_md.py chore(cli-rename): replace stale da verbs in active code paths 2026-05-04 21:10:43 +02:00
cli_artifacts.py feat(web): consolidate the personal /me/* surface — /me/activity + /me/profile (#304) 2026-05-14 21:29:51 +02:00
data.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
health.py Extract session-pipeline framework + UsageProcessor skeleton (#232) 2026-05-08 19:47:46 +02:00
initial_workspace.py fix(api): redirect unauthorized browser requests to login for initial workspace zip (#315) 2026-05-15 15:18:39 +02:00
jira_webhooks.py fix(security): close Jira webhook fail-open + path traversal (#83) (#93) 2026-04-27 19:53:55 +02:00
marketplace.py perf(marketplace): cache cover photos + restore Curated filter spacing (#294) 2026-05-14 10:09:32 +02:00
marketplaces.py feat(initial-workspace): per-instance agnes init override (#292) 2026-05-13 20:35:01 +00:00
me.py fix(security): RBAC filter uses stable user_id instead of mutable email local-part (#293) (#299) 2026-05-14 14:12:54 +00:00
me_debug.py feat(web): consolidate the personal /me/* surface — /me/activity + /me/profile (#304) 2026-05-14 21:29:51 +02:00
me_stats.py feat(web): consolidate the personal /me/* surface — /me/activity + /me/profile (#304) 2026-05-14 21:29:51 +02:00
memory.py feat(memory): admin Edit + MEMORY_DOMAIN RBAC + ai-section UI (#141) 2026-04-30 11:04:41 +02:00
metadata.py feat(rbac+marketplace): RBAC v13 + Claude Code marketplace + #81/#83/#44 hardening 2026-04-28 14:25:04 +02:00
metrics.py feat(rbac+marketplace): RBAC v13 + Claude Code marketplace + #81/#83/#44 hardening 2026-04-28 14:25:04 +02:00
my_stack.py perf(marketplace): cache cover photos + restore Curated filter spacing (#294) 2026-05-14 10:09:32 +02:00
news.py feat(home): state-aware /home + /setup-advanced + schema v26 (#228) 2026-05-08 18:28:47 +02:00
observability.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
query.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
query_hybrid.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
scripts.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
settings.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
store.py fix(store): surface review failures + harden publish gate (#316) 2026-05-15 15:52:07 +02:00
sync.py feat(me/stats): per-analyst Stats dashboard with 4 tabs (#298) 2026-05-14 10:27:58 +00:00
telegram.py feat: complete system — web UI, all API endpoints, governance, admin, CLI commands 2026-03-27 16:52:22 +01:00
tokens.py chore(lint): final ruff fixes 2026-05-04 19:32:52 +02:00
upload.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
users.py System plugins (schema v39) + marketplace UX polish + drop legacy pages (#241) 2026-05-10 19:15:41 +00:00
v2_arrow.py feat(v2): claude-driven fetch primitives + 0.14.0 (#102) 2026-04-29 01:07:19 +02:00
v2_cache.py feat(v2): claude-driven fetch primitives + 0.14.0 (#102) 2026-04-29 01:07:19 +02:00
v2_catalog.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
v2_quota.py refactor(quota): #160 relocate _build_quota_tracker to v2_quota.py 2026-05-04 10:31:35 +02:00
v2_sample.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
v2_scan.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
v2_schema.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
welcome.py fix(devin-review): dashboard CTA respects override; PUT validates anon path 2026-05-03 21:45:32 +02:00
where_validator.py feat(v2): claude-driven fetch primitives + 0.14.0 (#102) 2026-04-29 01:07:19 +02:00