* fix(store): surface review failures + harden publish gate
Four independent fixes to the flea-market submission pipeline, all surfaced
by an admin upload that landed at status='approved' without an LLM review.
1. LLM truncation no longer pins submissions in review_error.
- Raised MAX_RESPONSE_TOKENS 2500 → 6000 in llm_review.py
- Added retry-on-truncation with a doubled max_tokens budget in
anthropic_provider.py (up to two retries: 2× then 4× the initial budget)
2. Flea detail page surfaces the latest submission's failure verdict even
when a previously approved version is still serving (deferred-promotion
path). The _quarantine_banner gate was widened from `visibility != approved`
to also fire on `blocked_inline / blocked_llm / review_error`, with copy
that distinguishes the v2+ edit case ("Latest edit failed review —
previously approved version (vN) keeps serving") from the initial-upload
quarantine wording.
3. Restore button + endpoint no longer allow restoring a version that was
never approved. Added StoreEntitiesRepository.get_with_version_approvals
joining store_submissions, gated the UI button on submission_status in
('approved', None), rendered status pills for non-restorable rows, and
added a 400 version_not_approved guard in POST /restore.
4. **BREAKING (operator-facing)**: publish gate is now fail-CLOSED on
misconfig. The previous get_guardrails_enabled() silently fell back to
"disabled, auto-approve everything" when guardrails.enabled=true in YAML
but no ANTHROPIC_API_KEY was in env. Split into:
- get_guardrails_enabled() (intent — YAML)
- get_guardrails_llm_provider_ready() (readiness — env)
Three-state matrix:
- enabled=false → auto-approve (unchanged)
- enabled=true + ready=true → normal pipeline (unchanged)
- enabled=true + ready=false (NEW) → submissions hold at pending_llm
  awaiting admin retry or override (was: silent auto-approve)
Admin "Retry review" eligibility broadened to include pending_llm.
Boot-time WARNING banner surfaces the misconfig in app/main.py.
docs/STORE_GUARDRAILS.md updated with the three-state matrix.
Operators relying on the auto-fallback for local-dev no-LLM setups must
now explicitly set `guardrails.enabled: false` in instance.yaml.
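The three-state matrix in item 4 reduces to a small decision over two inputs: intent and readiness. Below is a minimal sketch. It assumes "ready" simply means a non-empty `ANTHROPIC_API_KEY` in the environment. `GateOutcome`, `llm_provider_ready`, and `publish_gate` are hypothetical names, not the project's `get_guardrails_enabled()` / `get_guardrails_llm_provider_ready()` implementations.

```python
from collections.abc import Mapping
from enum import Enum


class GateOutcome(Enum):
    """Hypothetical labels for the three rows of the publish-gate matrix."""

    AUTO_APPROVE = "auto_approve"      # enabled=false
    RUN_LLM_REVIEW = "run_llm_review"  # enabled=true + ready=true
    HOLD_PENDING_LLM = "pending_llm"   # enabled=true + ready=false (fail-closed)


def llm_provider_ready(env: Mapping[str, str]) -> bool:
    """Readiness from the environment (assumption: a non-empty API key)."""
    return bool(env.get("ANTHROPIC_API_KEY"))


def publish_gate(enabled: bool, ready: bool) -> GateOutcome:
    """Combine intent (YAML) with readiness (env); never auto-approve on misconfig."""
    if not enabled:
        return GateOutcome.AUTO_APPROVE
    if ready:
        return GateOutcome.RUN_LLM_REVIEW
    # Guardrails were requested but no provider is configured: hold the
    # submission for an admin retry or override instead of approving it.
    return GateOutcome.HOLD_PENDING_LLM
```

A caller would evaluate something like `publish_gate(enabled=True, ready=llm_provider_ready(os.environ))`. Keeping intent and readiness as separate inputs is what makes the fail-closed row expressible. A single merged flag cannot tell "operator turned guardrails off" apart from "operator forgot the key".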
Tests: 4623 passed. Added TestPublishGateFailClosed (4 tests) and
TestRestoreVersion::test_restore_rejects_* (3 tests). conftest.py adds an
autouse fixture defaulting guardrails OFF so legacy tests don't need to
know about the new toggle.
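Items 2 and 3 of this commit come down to two status predicates. The sketch below is illustrative only. The status strings come from the text above. The function names, the `(400, "version_not_approved")` tuple shape, and the reading of `None` as "no joined store_submissions row" are assumptions.

```python
from typing import Optional

# Status strings as named in items 2 and 3.
FAILED_REVIEW_STATUSES = frozenset({"blocked_inline", "blocked_llm", "review_error"})
# Assumption: None means the version has no joined submission row (legacy).
RESTORABLE_STATUSES = frozenset({"approved", None})


def show_quarantine_banner(visibility: str, latest_status: Optional[str]) -> bool:
    """Item 2: fire when the entity isn't approved OR the latest submission failed."""
    return visibility != "approved" or latest_status in FAILED_REVIEW_STATUSES


def restore_rejection(submission_status: Optional[str]) -> Optional[tuple[int, str]]:
    """Item 3: (http_status, error_code) if a restore must be refused, else None."""
    if submission_status in RESTORABLE_STATUSES:
        return None
    return (400, "version_not_approved")
```

The widened banner gate is an OR, not a replacement: the old `visibility != approved` condition still fires for initial uploads. The new clause catches failed edits while an older approved version keeps serving.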
* fix(store): admin override promotes v2+ edits to current
The override handler at app/api/admin.py:3708 only flipped submission
status → 'overridden' and entity visibility → 'approved'. Under the v37+
deferred-promotion model that's insufficient for v2+ edits / restores:
the new bundle sits in versions/v<N>/plugin/ and the entity row stays at
the prior approved version_no + hash + on-disk live bundle. Installers
kept getting the OLD bytes the admin had just intended to replace.
Mirror the runner.run_llm_review auto-approval branch: look up the
submission's version_hash in entity.version_history, and if its `n`
differs from entity.version_no, promote_version + _swap_live_to_version.
Initial v1 overrides are unaffected — the loop finds n=1 == version_no
and skips promotion.
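The promotion check described in this commit can be sketched as a lookup plus a comparison. This assumes each `version_history` entry is a dict carrying `n` and a `hash` key; the real key name is not stated here. `version_no_for_hash` and `promotion_target` are hypothetical helpers. They follow this commit's "differs from" rule, which the H1 fix in a later commit of this PR narrows to forward-only.

```python
from collections.abc import Mapping, Sequence
from typing import Any, Optional


def version_no_for_hash(
    version_history: Sequence[Mapping[str, Any]], version_hash: str
) -> Optional[int]:
    """Return `n` of the history entry matching the submission's hash, if any."""
    for entry in version_history:
        if entry.get("hash") == version_hash:  # "hash" key is an assumption
            return entry["n"]
    return None


def promotion_target(
    version_history: Sequence[Mapping[str, Any]],
    version_hash: str,
    current_version_no: int,
) -> Optional[int]:
    """Version to promote to on override, or None to leave the live bundle."""
    n = version_no_for_hash(version_history, version_hash)
    if n is None or n == current_version_no:
        return None  # e.g. a v1 initial-upload override: n == version_no
    return n
```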
Tests:
- test_override_v2_edit_promotes_to_current: stage v1 approved + v2
blocked_llm; override the v2 sub; assert entity.version_no=2,
entity.version flips off the v1 hash, and the live plugin/ dir
mirrors versions/v2/plugin/.
- test_override_v1_initial_upload_no_promote: regression guard so the
promote loop doesn't accidentally bump a v1 override.
Audit log gains a promoted_to_version_no field on the override action.
* fix(store): retry/rescan review staged bundle; override forward-only
Two adversarial-review findings from a Codex pass on the publish-gate
work.
C1. Admin retry + rescan were passing live `plugin/` to the LLM. For a
v2+ submission held at `pending_llm` / `blocked_llm` / `review_error`,
live still holds the prior approved version's bytes — so the LLM
reviewed the WRONG bytes, and the runner's hash-match promotion in
`run_llm_review` would then advance the entity to staged bytes that
were never actually reviewed. Resolve the staged
`<entity>/versions/v<N>/plugin/` from the submission's
`version_history` entry, with a fallback to live for legacy pre-v37
rows that never seeded a versions/ dir. Helpers
`_submission_plugin_dir` and `_version_no_for_submission` added to
`app/api/store.py` so override / retry / rescan share one path.
H1. Override's promote loop used `target != current`, which would
silently demote the live bundle when an admin overrode a stale v2
submission while v3 was already approved + live. Changed to
`target > current` so override flips status + visibility on the row
regardless, but on-disk promotion only fires forward. Same `>`
defensive guard applied in `runner.run_llm_review` so a late LLM
verdict racing with a newer approval can't demote either.
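H1's rule separates the row-level flips from the on-disk promotion. This is a hedged sketch: `OverridePlan` and `plan_override` are hypothetical. `target_version_no` stands for the `n` found for the submission's hash, or `None` when no entry matches. The field name borrows the `promoted_to_version_no` audit-log field mentioned earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OverridePlan:
    submission_status: str
    entity_visibility: str
    promoted_to_version_no: Optional[int]  # None: live bundle is left untouched


def plan_override(target_version_no: Optional[int], current_version_no: int) -> OverridePlan:
    """Row-level flips always happen; on-disk promotion only moves forward."""
    promote = (
        target_version_no
        if target_version_no is not None and target_version_no > current_version_no
        else None
    )
    return OverridePlan(
        submission_status="overridden",
        entity_visibility="approved",
        promoted_to_version_no=promote,
    )
```

Under the old `!=` comparison, `plan_override(2, 3)` would have promoted v2 over the live v3. That is exactly the demotion described above. With `>`, the stale override still flips status and visibility but leaves v3 serving.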
Tests:
- TestAdminRetryReviewsStagedBundle::test_retry_v2_blocked_passes_staged_dir_not_live
- TestAdminRetryReviewsStagedBundle::test_rescan_v2_blocked_passes_staged_dir_not_live
- TestOverrideForwardOnly::test_override_stale_v2_does_not_demote_when_v3_current
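The C1 path resolution can be sketched with `pathlib`. This is not the actual `_submission_plugin_dir` helper. `review_plugin_dir` is a hypothetical stand-in. It assumes the layout `<entity>/plugin/` (live) plus `<entity>/versions/v<N>/plugin/` (staged), and it treats a missing staged directory as the legacy pre-v37 case.

```python
from pathlib import Path
from typing import Optional


def review_plugin_dir(entity_dir: Path, version_no: Optional[int]) -> Path:
    """Pick the bundle an admin retry/rescan should send to the LLM.

    Prefer the staged versions/v<N>/plugin/ for the submission's version;
    fall back to live plugin/ only when no staged copy exists (assumed to
    mean a legacy row that never seeded versions/).
    """
    if version_no is not None:
        staged = entity_dir / "versions" / f"v{version_no}" / "plugin"
        if staged.is_dir():
            return staged
    return entity_dir / "plugin"
```

A stricter variant would fall back to live only when `versions/` is absent entirely. It would raise when `versions/` exists but `v<N>` does not, so a half-seeded tree could never silently send live bytes to the reviewer.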
* review polish: CHANGELOG drift, override eligibility, defensive copy
Three small additions on top of the retry/rescan staged-bundle fix:
1. CHANGELOG: the PR's bullets had drifted into the released
[0.54.17] section during rebase (context-match landed them next
to already-released content). Moved them up to [Unreleased] where
they belong; [0.54.17] now holds only what was actually released
(refresh-marketplace ls-remote, /me/activity hero, CI sharding +
workflow polish).
2. app/api/admin.py: admin override eligibility now accepts
pending_llm alongside blocked_inline + blocked_llm + review_error.
Closes a UX gap from the new fail-CLOSED behavior: under
enabled-but-not-ready, a known-good submission would otherwise
sit indefinitely until the admin set credentials AND clicked
Retry. Override already routes through version_history (and is
now forward-only on promote), so it stays safe for v2+ deferred-
promotion submissions.
3. src/repositories/store_entities.py: get_with_version_approvals
defensively copies each version_history entry before annotating
with submission_status. self.get() re-parses JSON on each call today,
so this is belt-and-suspenders against any future caching layer
leaking the annotated key into a subsequent plain get() call.
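The defensive copy in item 3 above amounts to annotating copies rather than the repository's own entries. This is a minimal sketch, assuming submission statuses can be keyed by a per-entry `hash` value (a hypothetical join key). `annotate_with_status` is likewise a hypothetical name. A shallow `dict(entry)` copy suffices because only a new top-level key is added.

```python
from collections.abc import Iterable, Mapping
from typing import Any, Optional


def annotate_with_status(
    version_history: Iterable[Mapping[str, Any]],
    status_by_hash: Mapping[Any, Optional[str]],
) -> list[dict[str, Any]]:
    """Return annotated copies; the caller's entries are never mutated."""
    annotated = []
    for entry in version_history:
        row = dict(entry)  # shallow copy: only a new top-level key is added
        # "hash" as the join key is an assumption for this sketch.
        row["submission_status"] = status_by_hash.get(entry.get("hash"))
        annotated.append(row)
    return annotated
```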
Tests: 112 passed (focused on test_store_entity_versions +
test_admin_store_submissions, covering the retry/rescan staged-
bundle fix the author shipped + this polish).
---------
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
215 lines
8.2 KiB
Python
"""Anthropic provider for structured JSON extraction.

Uses the Anthropic API with native structured output (json_schema)
for reliable JSON extraction. Includes retry logic for transient errors.
"""

import json
import logging
import time

import anthropic

from .exceptions import (
    LLMAuthError,
    LLMFormatError,
    LLMRateLimitError,
    LLMRefusalError,
    LLMTimeoutError,
)

logger = logging.getLogger(__name__)

# Retry configuration
MAX_RETRIES = 3
INITIAL_BACKOFF_SECONDS = 2
BACKOFF_MULTIPLIER = 2

# Truncation retry: when the model hits max_tokens we retry with a
# doubled budget. Caps the multiplier at 4x the caller's original
# value so a runaway can't drain the per-call budget.
MAX_TRUNCATION_RETRIES = 2  # 2x then 4x
TRUNCATION_BUDGET_MULTIPLIER = 2


def _strict_json_schema(schema):
    """Return a copy of the schema with additionalProperties=False on every object type.

    The Anthropic structured-output API rejects schemas where a `{"type": "object"}` node
    omits `additionalProperties` (HTTP 400 invalid_request_error). We walk the schema
    recursively and force the field where missing.
    """
    if isinstance(schema, dict):
        out = {k: _strict_json_schema(v) for k, v in schema.items()}
        if out.get("type") == "object" and "additionalProperties" not in out:
            out["additionalProperties"] = False
        return out
    if isinstance(schema, list):
        return [_strict_json_schema(item) for item in schema]
    return schema


class AnthropicExtractor:
    """Structured JSON extractor using the Anthropic API.

    Uses output_config with json_schema format for structured output.
    Retries transient errors (rate limit, timeout, connection) with
    exponential backoff, and retries truncated responses with a larger
    max_tokens budget.
    """

    def __init__(self, api_key: str, model: str) -> None:
        """Initialize the Anthropic extractor.

        Args:
            api_key: Anthropic API key.
            model: Model identifier (e.g., "claude-haiku-4-5-20251001").
        """
        self._client = anthropic.Anthropic(api_key=api_key)
        self._model = model

    def extract_json(
        self,
        prompt: str,
        max_tokens: int,
        json_schema: dict,
        schema_name: str,
        system: str | None = None,
    ) -> dict:
        """Extract structured JSON using the Anthropic API.

        Args:
            prompt: User-content prompt sent to the model.
            max_tokens: Initial maximum tokens in the response; doubled on
                each truncation retry.
            json_schema: JSON Schema that the response must conform to.
            schema_name: Human-readable name for the schema.
            system: Optional system prompt that keeps the trust boundary
                intact when the user content contains untrusted data (e.g.
                files uploaded by third parties). When the caller passes
                a system prompt here, the prompt-injection threat model
                relies on the SDK's separate ``system=`` parameter so a
                crafted user payload can't override the rules.

        Returns:
            Parsed JSON dictionary conforming to the provided schema.

        Raises:
            LLMAuthError: Invalid API key.
            LLMRateLimitError: Rate limited after all retries.
            LLMTimeoutError: Timeout/connection error after all retries.
            LLMFormatError: Response is not valid JSON, or is still
                truncated when the retries run out.
            LLMRefusalError: Model refused to respond.
        """
        last_exception: Exception | None = None
        # Truncation retries bump max_tokens; transient retries bump
        # backoff. The counters are tracked separately so a verbose
        # response under rate-limit doesn't burn both budgets at once,
        # but every retry of either kind consumes one of the MAX_RETRIES
        # attempts.
        truncation_retries = 0
        current_max_tokens = max_tokens

        for attempt in range(1, MAX_RETRIES + 1):
            try:
                return self._attempt_extraction(
                    prompt, current_max_tokens, json_schema, schema_name,
                    attempt, system=system,
                )
            except LLMAuthError:
                raise
            except LLMRefusalError:
                raise
            except LLMFormatError as e:
                # Truncation is a special case: same prompt + schema,
                # but the model didn't have room to finish. Retry with
                # a doubled (capped) budget instead of giving up.
                # Other format errors (bad JSON, schema mismatch) won't
                # benefit from more tokens, so re-raise immediately.
                if (str(e).startswith("Response truncated")
                        and truncation_retries < MAX_TRUNCATION_RETRIES):
                    # Record the truncation so that, if this was the final
                    # attempt, the loop exits with it instead of re-raising
                    # a stale transient error from an earlier attempt.
                    last_exception = e
                    truncation_retries += 1
                    current_max_tokens *= TRUNCATION_BUDGET_MULTIPLIER
                    logger.warning(
                        "Response truncated on attempt %d for model %s, "
                        "retrying with max_tokens=%d (%dx initial)",
                        attempt, self._model, current_max_tokens,
                        TRUNCATION_BUDGET_MULTIPLIER ** truncation_retries,
                    )
                    continue
                raise
            except (LLMRateLimitError, LLMTimeoutError) as e:
                last_exception = e
                if attempt < MAX_RETRIES:
                    delay = INITIAL_BACKOFF_SECONDS * (BACKOFF_MULTIPLIER ** (attempt - 1))
                    logger.warning(
                        "Transient error on attempt %d/%d for model %s, "
                        "retrying in %ds: %s",
                        attempt, MAX_RETRIES, self._model, delay,
                        type(e).__name__,
                    )
                    time.sleep(delay)

        raise last_exception  # type: ignore[misc]

    def _attempt_extraction(
        self,
        prompt: str,
        max_tokens: int,
        json_schema: dict,
        schema_name: str,
        attempt: int,
        system: str | None = None,
    ) -> dict:
        """Single extraction attempt against the Anthropic API."""
        logger.info(
            "Anthropic extraction attempt %d/%d, model=%s, schema=%s",
            attempt, MAX_RETRIES, self._model, schema_name,
        )

        from src.observability import trace_generation

        try:
            with trace_generation(provider="anthropic", model=self._model) as _trace:
                _trace.set_input(prompt)
                create_kwargs = {
                    "model": self._model,
                    "max_tokens": max_tokens,
                    "messages": [{"role": "user", "content": prompt}],
                    "output_config": {
                        "format": {
                            "type": "json_schema",
                            "schema": _strict_json_schema(json_schema),
                        },
                    },
                }
                if system:
                    create_kwargs["system"] = system
                response = self._client.messages.create(**create_kwargs)
                _trace.set_output_from_anthropic(response)
        except anthropic.AuthenticationError as e:
            raise LLMAuthError("Anthropic authentication failed (check API key)") from e
        except anthropic.RateLimitError as e:
            raise LLMRateLimitError("Anthropic rate limited") from e
        except (anthropic.APITimeoutError, anthropic.APIConnectionError) as e:
            raise LLMTimeoutError(
                f"Anthropic connection error ({type(e).__name__})"
            ) from e

        # Check for truncation - raise and let outer retry loop handle it
        if response.stop_reason == "max_tokens":
            raise LLMFormatError(
                f"Response truncated (max_tokens) for schema {schema_name}"
            )

        # Check for refusal
        if response.stop_reason == "end_turn" and not response.content:
            raise LLMRefusalError(
                f"Model refused to generate response for schema {schema_name}"
            )

        # Parse JSON from response
        try:
            text = response.content[0].text
            return json.loads(text)
        except (json.JSONDecodeError, IndexError, AttributeError) as e:
            raise LLMFormatError(
                f"Failed to parse Anthropic response as JSON for "
                f"schema {schema_name} ({type(e).__name__})"
            ) from e
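The control flow of `extract_json` can be replayed offline to see which budgets are actually tried. `replay_budgets` is a hypothetical helper. It scripts one outcome per attempt (`"ok"`, `"truncated"`, or `"transient"`), copies the module's constants as defaults, and skips the backoff sleep. It also reports the most recent failure kind when attempts run out. It is a model of the loop, not a test of the real client. Because truncation and transient retries share the `MAX_RETRIES` attempts, the 4× budget is tried only when the first two attempts are both truncated.

```python
from collections.abc import Sequence

# Defaults copied from anthropic_provider.py for this standalone model.
MAX_RETRIES = 3
MAX_TRUNCATION_RETRIES = 2
TRUNCATION_BUDGET_MULTIPLIER = 2


def replay_budgets(
    outcomes: Sequence[str],
    initial_max_tokens: int,
    max_retries: int = MAX_RETRIES,
    max_truncation_retries: int = MAX_TRUNCATION_RETRIES,
    multiplier: int = TRUNCATION_BUDGET_MULTIPLIER,
) -> tuple[list[int], str]:
    """Return (max_tokens used per attempt, final result) for scripted outcomes.

    ``outcomes`` needs one entry for every attempt that is reached.
    """
    budgets: list[int] = []
    truncation_retries = 0
    current = initial_max_tokens
    last_failure = "none"
    for attempt in range(max_retries):
        budgets.append(current)
        outcome = outcomes[attempt]
        if outcome == "ok":
            return budgets, "ok"
        if outcome == "truncated":
            last_failure = "truncated"
            if truncation_retries < max_truncation_retries:
                # Same prompt, bigger budget; this still consumes an attempt.
                truncation_retries += 1
                current *= multiplier
                continue
            # Budget cap reached: give up immediately.
            return budgets, "truncated"
        if outcome == "transient":
            # Rate limit / timeout: the real code sleeps; the budget is unchanged.
            last_failure = "transient"
            continue
        raise ValueError(f"unknown outcome: {outcome!r}")
    # Attempts exhausted: report the most recent failure kind.
    return budgets, last_failure
```

With the 6000-token `MAX_RESPONSE_TOKENS` from `llm_review.py`, `replay_budgets(["truncated", "truncated", "ok"], 6000)` tries 6000, 12000, then 24000 tokens. By contrast, `replay_budgets(["transient", "truncated", "truncated"], 6000)` stops after 6000, 6000, 12000 without ever reaching 24000.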