agnes-the-ai-analyst/app/instance_config.py
Vojtech a694a30a5e
fix(store): surface review failures + harden publish gate (#316)
* fix(store): surface review failures + harden publish gate

Four independent fixes to the flea-market submission pipeline, all surfaced
by an admin upload that landed at status='approved' without an LLM review.

1. LLM truncation no longer pins submissions in review_error.
   - Raised MAX_RESPONSE_TOKENS 2500 → 6000 in llm_review.py
   - Added one-shot retry-with-doubled-budget in anthropic_provider.py
     (capped at 4× initial)

2. Flea detail page surfaces the latest submission's failure verdict even
   when a previously-approved version is still serving (deferred-promotion
   path). The _quarantine_banner gate widened from `visibility != approved`
   to also fire on `blocked_inline / blocked_llm / review_error`, with copy
   that distinguishes the v2+ edit case ("Latest edit failed review —
   previously approved version (vN) keeps serving") from the initial-upload
   quarantine wording.

3. Restore button + endpoint no longer allow restoring a version that was
   never approved. Added StoreEntitiesRepository.get_with_version_approvals
   joining store_submissions, gated the UI button on submission_status in
   ('approved', None), rendered status pills for non-restorable rows, and
   added a 400 version_not_approved guard in POST /restore.

4. **BREAKING (operator-facing)**: publish gate is now fail-CLOSED on
   misconfig. The previous get_guardrails_enabled() silently fell back to
   "disabled, auto-approve everything" when guardrails.enabled=true in YAML
   but no ANTHROPIC_API_KEY was in env. Split into:
     - get_guardrails_enabled()              (intent — YAML)
     - get_guardrails_llm_provider_ready()   (readiness — env)
   Three-state matrix:
     enabled=false                       → auto-approve (unchanged)
     enabled=true + ready=true           → normal pipeline (unchanged)
     enabled=true + ready=false (NEW)    → submissions hold at pending_llm
                                           awaiting admin retry or override
                                           (was: silent auto-approve)
   Admin "Retry review" eligibility broadened to include pending_llm.
   Boot-time WARNING banner surfaces the misconfig in app/main.py.
   docs/STORE_GUARDRAILS.md updated with the three-state matrix.
   Operators relying on the auto-fallback for local-dev no-LLM setups must
   now explicitly set `guardrails.enabled: false` in instance.yaml.
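
The three-state matrix above can be sketched as a small pure function. This is only an illustration under assumed names: `GateDecision` and `decide_publish_gate` are hypothetical and do not appear in the codebase; the two boolean inputs stand in for the results of `get_guardrails_enabled()` and `get_guardrails_llm_provider_ready()`.

```python
from enum import Enum


class GateDecision(Enum):
    """Hypothetical labels for the three rows of the matrix."""
    AUTO_APPROVE = "auto_approve"          # enabled=false
    RUN_PIPELINE = "run_pipeline"          # enabled=true + ready=true
    HOLD_PENDING_LLM = "hold_pending_llm"  # enabled=true + ready=false


def decide_publish_gate(enabled: bool, provider_ready: bool) -> GateDecision:
    """Map (intent, readiness) onto one of the three states.

    The key property is fail-closed behavior: missing credentials never
    collapse into the auto-approve branch.
    """
    if not enabled:
        return GateDecision.AUTO_APPROVE
    if provider_ready:
        return GateDecision.RUN_PIPELINE
    return GateDecision.HOLD_PENDING_LLM
```

Keeping intent and readiness as separate inputs is what lets the third row exist at all; a single combined boolean cannot distinguish "operator turned it off" from "operator turned it on but forgot the key".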

Tests: 4623 passed. Added TestPublishGateFailClosed (4 tests) and
TestRestoreVersion::test_restore_rejects_* (3 tests). conftest.py adds an
autouse fixture defaulting guardrails OFF so legacy tests don't need to
know about the new toggle.
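
For item 1, a one-shot retry with a doubled, capped token budget might look roughly like the sketch below. It is not the code in anthropic_provider.py: `review_with_retry`, the `call_llm` callback, and its `(text, truncated)` return shape are assumptions made for illustration.

```python
from typing import Callable, Tuple

# Hypothetical callback shape: call_llm(max_tokens) -> (text, truncated)
ReviewCall = Callable[[int], Tuple[str, bool]]


def review_with_retry(
    call_llm: ReviewCall, budget: int, initial_budget: int
) -> Tuple[str, bool]:
    """Call once; on truncation retry a single time with a doubled budget.

    The retry budget is capped at 4x the initial budget. If the cap leaves
    no headroom above the current budget, the truncated result is returned
    as-is instead of repeating an identical call.
    """
    text, truncated = call_llm(budget)
    if not truncated:
        return text, False
    retry_budget = min(budget * 2, initial_budget * 4)
    if retry_budget <= budget:
        return text, True
    return call_llm(retry_budget)
```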

* fix(store): admin override promotes v2+ edits to current

The override handler at app/api/admin.py:3708 only flipped submission
status → 'overridden' and entity visibility → 'approved'. Under the v37+
deferred-promotion model that's insufficient for v2+ edits / restores:
the new bundle sits in versions/v<N>/plugin/ and the entity row stays at
the prior approved version_no + hash + on-disk live bundle. Installers
kept getting the OLD bytes the admin had just intended to replace.

Mirror the runner.run_llm_review auto-approval branch: look up the
submission's version_hash in entity.version_history, and if its `n`
differs from entity.version_no, promote_version + _swap_live_to_version.
Initial v1 overrides are unaffected — the loop finds n=1 == version_no
and skips promotion.

Tests:
- test_override_v2_edit_promotes_to_current: stage v1 approved + v2
  blocked_llm; override the v2 sub; assert entity.version_no=2,
  entity.version flips off the v1 hash, and the live plugin/ dir
  mirrors versions/v2/plugin/.
- test_override_v1_initial_upload_no_promote: regression guard so the
  promote loop doesn't accidentally bump a v1 override.

Audit log gains a promoted_to_version_no field on the override action.

* fix(store): retry/rescan review staged bundle; override forward-only

Two adversarial-review findings from a Codex pass on the publish-gate
work.

C1. Admin retry + rescan were passing live `plugin/` to the LLM. For a
v2+ submission held at `pending_llm` / `blocked_llm` / `review_error`,
live still holds the prior approved version's bytes — so the LLM
reviewed the WRONG bytes, and the runner's hash-match promotion in
`run_llm_review` would then advance the entity to staged bytes that
were never actually reviewed. Resolve the staged
`<entity>/versions/v<N>/plugin/` from the submission's
`version_history` entry, with a fall-back to live for legacy pre-v37
rows that never seeded a versions/ dir. Helpers
`_submission_plugin_dir` and `_version_no_for_submission` added to
`app/api/store.py` so override / retry / rescan share one path.
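
A minimal sketch of the staged-directory resolution described in C1, assuming each `version_history` entry is a dict with an `n` key and a `hash` key (the `hash` key name is a guess). The function names here are hypothetical stand-ins, not the real `_submission_plugin_dir` / `_version_no_for_submission` helpers.

```python
from pathlib import Path
from typing import Optional


def version_no_for_hash(version_history: list, version_hash: str) -> Optional[int]:
    """Return the version number whose (assumed) 'hash' matches, else None."""
    for entry in version_history:
        if entry.get("hash") == version_hash:
            return entry.get("n")
    return None


def resolve_plugin_dir(entity_dir: Path, version_history: list, version_hash: str) -> Path:
    """Prefer <entity>/versions/v<N>/plugin/; fall back to live plugin/.

    The fallback covers legacy rows that never seeded a versions/ dir.
    """
    n = version_no_for_hash(version_history, version_hash)
    if n is not None:
        staged = entity_dir / "versions" / f"v{n}" / "plugin"
        if staged.is_dir():
            return staged
    return entity_dir / "plugin"
```

Routing override, retry, and rescan through one resolver of this kind is what guarantees the reviewer sees the same bytes that a later promotion would publish.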

H1. Override's promote loop used `target != current`, which would
silently demote the live bundle when admin overrode a stale v2
submission while v3 was already approved + live. Changed to
`target > current` so override flips status + visibility on the row
regardless, but on-disk promotion only fires forward. Same `>`
defensive guard applied in `runner.run_llm_review` so a late LLM
verdict racing with a newer approval can't demote either.
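
The forward-only rule in H1 reduces to a single comparison. The sketch below uses a hypothetical `plan_override` helper that returns a plain dict; the real handler calls promote_version and _swap_live_to_version, which are not reproduced here.

```python
from typing import Optional


def plan_override(target_version_no: int, current_version_no: int) -> dict:
    """Describe what an admin override would do for a given submission.

    Status and visibility flip regardless; on-disk promotion only fires
    when the target is strictly newer than what is currently live.
    """
    promote = target_version_no > current_version_no
    promoted_to: Optional[int] = target_version_no if promote else None
    return {
        "submission_status": "overridden",
        "visibility": "approved",
        "promoted_to_version_no": promoted_to,
    }
```

With `!=` in place of `>`, overriding a stale v2 while v3 is live would yield a promotion back to 2; with `>` the same call leaves the live bundle untouched, and a v1 initial upload (1 vs 1) is skipped as before.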

Tests:
- TestAdminRetryReviewsStagedBundle::test_retry_v2_blocked_passes_staged_dir_not_live
- TestAdminRetryReviewsStagedBundle::test_rescan_v2_blocked_passes_staged_dir_not_live
- TestOverrideForwardOnly::test_override_stale_v2_does_not_demote_when_v3_current

* review polish: CHANGELOG drift, override eligibility, defensive copy

Three small additions on top of the retry/rescan staged-bundle fix:

1. CHANGELOG: the PR's bullets had drifted into the released
   [0.54.17] section during rebase (context-match landed them next
   to already-released content). Moved them up to [Unreleased] where
   they belong; [0.54.17] now holds only what was actually released
   (refresh-marketplace ls-remote, /me/activity hero, CI sharding +
   workflow polish).

2. app/api/admin.py: admin override eligibility now accepts
   pending_llm alongside blocked_inline + blocked_llm + review_error.
   Closes a UX gap from the new fail-CLOSED behavior: under
   enabled-but-not-ready, a known-good submission would otherwise
   sit indefinitely until the admin set credentials AND clicked
   Retry. Override already routes through version_history (and is
   now forward-only on promote), so it stays safe for v2+ deferred-
   promotion submissions.

3. src/repositories/store_entities.py: get_with_version_approvals
   defensively copies each version_history entry before annotating
   with submission_status. self.get() re-parses JSON each call today
   so this is belt-and-suspenders against any future caching layer
   leaking the annotated key into a subsequent plain get() call.
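
The defensive copy in item 3 can be illustrated as follows. This is a sketch with assumed shapes: `annotate_history` is a hypothetical name, entries are assumed to be dicts keyed by `hash`, and `status_by_hash` stands in for whatever the join against store_submissions produces.

```python
from typing import Dict, List, Optional


def annotate_history(
    version_history: List[dict], status_by_hash: Dict[str, str]
) -> List[dict]:
    """Return annotated copies; the caller's entries are never mutated."""
    annotated = []
    for entry in version_history:
        row = dict(entry)  # shallow copy so the annotation cannot leak back
        status: Optional[str] = status_by_hash.get(row.get("hash"))
        row["submission_status"] = status
        annotated.append(row)
    return annotated
```

A shallow copy is enough here because only a new top-level key is added; nested values are shared but not modified.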

Tests: 112 passed (focused on test_store_entity_versions +
test_admin_store_submissions, covering the retry/rescan staged-
bundle fix the author shipped + this polish).

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-15 15:52:07 +02:00


"""Instance configuration — loads instance.yaml and exposes to FastAPI."""

import logging
import os
from pathlib import Path
from typing import Any, Optional

logger = logging.getLogger(__name__)

_instance_config: Optional[dict] = None


def reset_cache() -> None:
    """Drop the in-process instance.yaml cache; the next ``load_instance_config``
    call re-reads from disk. Used by `/api/admin/server-config` after a save.
    Public alias so callers don't have to reach into the private global.

    Also clears ``connectors.bigquery.access.get_bq_access`` so the v2 endpoints
    pick up new BigQuery project IDs after an admin saves `instance.yaml` —
    without this, `get_bq_access`'s `@functools.cache` would freeze the projects
    at first call and require a container restart to pick up changes (Devin
    ANALYSIS_0004 on PR #138). Lazy-imported so this module stays usable in
    environments where the connectors package can't be imported (e.g. unit
    tests of instance_config in isolation)."""
    global _instance_config
    _instance_config = None
    try:
        from connectors.bigquery.access import get_bq_access
        get_bq_access.cache_clear()
    except Exception:
        # Connectors module not loaded yet, or BQ deps missing — both fine.
        pass


def _deep_merge(base: dict, patch: dict) -> dict:
    """Deep-merge `patch` into `base`, returning a new dict.

    Dict-into-dict recurses; everything else (scalars, lists, None) is
    replaced wholesale. Used so the writable overlay can hold only the
    sections an operator has touched, while everything else flows from
    the static file unchanged. Same semantics as the helper in
    `/api/admin/server-config`'s POST handler.
    """
    out = dict(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = _deep_merge(out[key], value)
        else:
            out[key] = value
    return out


def load_instance_config() -> dict:
    """Load instance.yaml as a deep-merge of the static file and the
    writable overlay.

    Resolution:
      1. Static base: ``CONFIG_DIR/instance.yaml`` via ``config.loader``
         (the source of truth for sections the editor doesn't expose —
         ``datasets``, ``corporate_memory``, ``openmetadata``, etc.).
      2. Overlay patch: ``DATA_DIR/state/instance.yaml`` (written by
         ``/api/admin/configure`` and ``/api/admin/server-config``;
         contains only the sections those endpoints accept).
      3. Overlay wins per-leaf via deep-merge — operator edits persist,
         static-only sections still flow through.

    Pre-2026-04-28 this function returned the overlay verbatim when it
    existed and only fell back to static when it didn't. That was a
    silent footgun: the moment someone saved any section through the
    new editor (which writes a narrow overlay by design), every
    consumer of static-only sections (corporate memory page, dataset
    list, OpenMetadata client) saw empty defaults. See PR #107.
    """
    global _instance_config
    if _instance_config is not None:
        return _instance_config

    import yaml

    # Static base — strict validation lives in config.loader.
    base: dict = {}
    try:
        from config.loader import load_instance_config as _load
        base = _load() or {}
        logger.info("Loaded instance.yaml base from config/")
    except Exception as e:
        logger.warning(f"Could not load static instance.yaml: {e}")

    # Overlay patch from the writable volume. Best-effort — a corrupt
    # overlay shouldn't take the app offline (we'd rather serve stale/base
    # config than 500 every request), but log loudly with a traceback so
    # the corruption surfaces in the operator's logs immediately. The
    # write-side endpoints (POST /api/admin/server-config and /configure)
    # refuse to overwrite a corrupt overlay with HTTP 500, so an admin
    # noticing the saves break is the second line of defence.
    #
    # ${ENV_VAR} interpolation: ``config.loader.load_instance_config`` runs
    # the static base through ``_resolve_env_refs`` already, but raw
    # ``yaml.safe_load`` here would leave overlay strings like
    # ``${ANTHROPIC_API_KEY}`` as literal placeholders. /api/admin/configure
    # writes exactly that string into the seeded ai: block (#176), so we
    # mirror the resolver here before the deep-merge — without it, the
    # LLM factory receives the literal placeholder and rejects it as an
    # invalid api key (#179 review fix).
    # Resolve via _state_dir() so the path matches the writer in
    # app/api/admin.py — under the flat-mount layout (STATE_DIR=/data-state)
    # both the configure-endpoint and the server-config-endpoint write
    # ``/data-state/instance.yaml``; reading from ``/data/state/...`` here
    # would silently load stale config from the regenerable data disk.
    from app.secrets import _state_dir
    overlay_path = _state_dir() / "instance.yaml"
    if overlay_path.exists():
        try:
            overlay = yaml.safe_load(overlay_path.read_text()) or {}
            from config.loader import _resolve_env_refs
            overlay = _resolve_env_refs(overlay)
            base = _deep_merge(base, overlay)
            logger.info("Merged overlay from %s", overlay_path)
        except Exception:
            logger.exception(
                "instance.yaml overlay at %s is corrupt — falling back to "
                "static base config; saves through the editor will refuse "
                "until the file is repaired", overlay_path,
            )

    _instance_config = base
    return _instance_config


def get_value(*keys, default=None) -> Any:
    """Get nested value from instance config."""
    config = load_instance_config()
    current = config
    for key in keys:
        if isinstance(current, dict):
            current = current.get(key)
        else:
            return default
        if current is None:
            return default
    return current


def get_data_source_type() -> str:
    return os.environ.get("DATA_SOURCE", get_value("data_source", "type", default="local"))


def get_home_route() -> str:
    """Path that ``/`` redirects to for an authenticated user.

    Resolution order: ``AGNES_HOME_ROUTE`` env var (Terraform-friendly,
    overrides everything) > ``instance.home_route`` in instance.yaml >
    default ``/dashboard``. The env-overrides-yaml shape mirrors
    :func:`get_data_source_type` (precedent in this file) so operators
    can flip a fork to ``/home`` per-deployment without forking the
    YAML.

    Validated to start with ``/`` and not ``//`` so a misconfigured
    value can't pivot the root redirect to an external host.
    """
    raw = os.environ.get("AGNES_HOME_ROUTE") or get_value(
        "instance", "home_route", default="/dashboard"
    )
    route = (raw or "").strip()
    if not route.startswith("/") or route.startswith("//"):
        return "/dashboard"
    return route


def get_gws_oauth_credentials() -> dict:
    """Pre-configured Google Workspace CLI OAuth client (client_id + secret).

    When set, /home renders a connector prompt that tells the analyst (and
    Claude) to export `GOOGLE_WORKSPACE_CLI_CLIENT_ID` and
    `GOOGLE_WORKSPACE_CLI_CLIENT_SECRET` and skip the "create your own GCP
    project" walkthrough — the operator has already provisioned a shared
    OAuth app for the instance. When unset, the prompt falls back to the
    manual `gws auth setup` flow.

    OAuth client_id + secret here are app identifiers for an installed
    "Desktop app" OAuth client, not a per-user secret. They're rendered
    into the public /home page on purpose — they identify the OAuth app,
    and the redirect-URI / scope guardrails on the GCP-side OAuth client
    are what enforce safety. Treat them like a publishable bundle ID,
    not a credential.

    Resolution order (env-overrides-yaml, mirrors :func:`get_home_route`):
      - ``AGNES_GWS_CLIENT_ID`` env > ``instance.gws.client_id`` YAML > None
      - ``AGNES_GWS_CLIENT_SECRET`` env > ``instance.gws.client_secret`` YAML > None
      - ``AGNES_GWS_OAUTHLIB_INSECURE_TRANSPORT`` env > ``instance.gws.oauthlib_insecure_transport`` YAML > "1"
        (kept as "1" by default because the gws CLI binds an HTTP loopback
        on 127.0.0.1:8080 for the OAuth redirect, and Google's oauthlib
        refuses non-HTTPS redirects without this flag).

    Both id and secret must be set for the configured branch to engage;
    a half-configured instance falls back to manual setup with a warning.
    """
    cid = os.environ.get("AGNES_GWS_CLIENT_ID") or get_value(
        "instance", "gws", "client_id", default=""
    )
    secret = os.environ.get("AGNES_GWS_CLIENT_SECRET") or get_value(
        "instance", "gws", "client_secret", default=""
    )
    insecure = os.environ.get("AGNES_GWS_OAUTHLIB_INSECURE_TRANSPORT") or get_value(
        "instance", "gws", "oauthlib_insecure_transport", default="1"
    )
    project_id = os.environ.get("AGNES_GWS_PROJECT_ID") or get_value(
        "instance", "gws", "project_id", default=""
    )
    cid = (cid or "").strip()
    secret = (secret or "").strip()
    project_id = (project_id or "").strip()
    # Derive project_id from the client_id when not explicitly set. Google's
    # OAuth client_id format is "<numeric-project-number>-<random>.apps.
    # googleusercontent.com"; the numeric prefix is required by the
    # client_secret.json schema (gws CLI's Rust struct treats it as
    # non-Option). Falls back to "" when the client_id is empty or
    # malformed; the configured branch in the template degrades gracefully.
    if not project_id and cid and "-" in cid:
        project_id = cid.split("-", 1)[0]
    return {
        "client_id": cid,
        "client_secret": secret,
        "project_id": project_id,
        "oauthlib_insecure_transport": str(insecure).strip() or "1",
        "configured": bool(cid and secret),
    }


def get_home_automode_visibility() -> bool:
    """Whether /home renders the "Step 3 — turn on auto-accept mode"
    install-block. Auto-accept mode is the recommended middle ground
    between default per-action prompting (slow) and full YOLO
    (`--dangerously-skip-permissions`, broad blast radius).

    Cautious-rollout instances can hide the section by setting
    ``AGNES_HOME_SHOW_AUTOMODE=0`` so users learn the permission flow
    first; the same content stays available on /setup-advanced.

    Resolution: env var > ``instance.home.show_automode`` YAML > True.
    Mirrors :func:`get_home_route` shape so Terraform overrides work
    the same way.
    """
    raw = os.environ.get("AGNES_HOME_SHOW_AUTOMODE")
    if raw is None:
        raw = get_value("instance", "home", "show_automode", default=True)
    if isinstance(raw, bool):
        return raw
    return str(raw).strip().lower() not in ("0", "false", "no", "off", "")


def get_home_status_frame_visibility() -> bool:
    """Whether /home renders the homepage status frame (Last sync,
    Sessions, Prompts, Tokens, Projects).

    The template ALSO gates rendering on ``users.onboarded`` so a
    fresh user sees a clean install-hero before the all-zero stat
    cards. This helper is the operator-level master switch; the
    onboarding gate is a UX coherence rule layered on top.

    Cautious-rollout instances that would rather not expose token
    counters to analysts yet can disable with
    ``AGNES_HOME_SHOW_STATUS_FRAME=0`` (or
    ``instance.home.show_status_frame: false`` in YAML).

    Resolution: env var > ``instance.home.show_status_frame`` YAML > True.
    Shape mirrors :func:`get_home_automode_visibility` so Terraform
    overrides land the same way.
    """
    raw = os.environ.get("AGNES_HOME_SHOW_STATUS_FRAME")
    if raw is None:
        raw = get_value("instance", "home", "show_status_frame", default=True)
    if isinstance(raw, bool):
        return raw
    return str(raw).strip().lower() not in ("0", "false", "no", "off", "")


def get_instance_name() -> str:
    return get_value("instance", "name", default="AI Data Analyst")


def get_instance_subtitle() -> str:
    return get_value("instance", "subtitle", default="")


def get_instance_brand() -> str:
    """Product-name brand string surfaced to end users in the analyst-facing
    UI (``/home`` hero copy, ``/setup``, ``/login``, the clipboard setup
    script, etc.). Defaults to ``"Agnes"`` — operators rebranding this OSS
    set it to e.g. ``"Foundry AI"`` without forking.

    Distinct from :func:`get_instance_name` which drives page titles and
    represents the deploying organization's display name ("AI Data Analyst").
    Brand is the *product*; name is the *deployment*.

    Resolution: ``AGNES_INSTANCE_BRAND`` env > ``instance.brand`` YAML > ``"Agnes"``.
    Mirrors :func:`get_home_route` shape so Terraform env overrides work.
    """
    raw = os.environ.get("AGNES_INSTANCE_BRAND")
    if raw is None:
        raw = get_value("instance", "brand", default="Agnes")
    value = (raw or "").strip()
    return value or "Agnes"


def get_instance_logo_svg() -> str:
    """Raw inline ``<svg>`` markup rendered into the header brand slot
    (``_app_header.html``). When non-empty, replaces the text brand in
    the header — typical use is a lockup that already contains the
    brand wordmark. When empty, the header falls back to
    :func:`get_instance_name` as text.

    Resolution: ``AGNES_INSTANCE_LOGO_SVG`` env > ``instance.logo_svg``
    YAML > ``""``. Mirrors :func:`get_instance_brand` so Terraform env
    overrides work the same way.
    """
    raw = os.environ.get("AGNES_INSTANCE_LOGO_SVG")
    if raw is None:
        raw = get_value("instance", "logo_svg", default="")
    return (raw or "").strip()


def get_instance_overview() -> str:
    """Operator-authored Overview body rendered on ``/home``. Markdown is
    NOT auto-converted — operators paste HTML (matches the existing
    ``news_intro`` ``| safe`` filter). Empty default = section hidden,
    keeping the OSS vendor-neutral when an instance ships without
    operator-specific framing.

    Resolution: ``AGNES_INSTANCE_OVERVIEW`` env > ``instance.overview``
    YAML > ``""``. Mirrors :func:`get_instance_logo_svg`.
    """
    raw = os.environ.get("AGNES_INSTANCE_OVERVIEW")
    if raw is None:
        raw = get_value("instance", "overview", default="")
    return (raw or "").strip()


def get_workspace_dir_name() -> str:
    """Filesystem-safe folder name for the analyst's local workspace
    (``~/<workspace_dir_name>``). Defaults to :func:`get_instance_brand`
    with every non-alphanumeric character stripped, so ``"Foundry AI"``
    becomes ``"FoundryAI"`` and ``"Agnes"`` stays ``"Agnes"``.

    An explicit override exists for operators who want a folder name that
    doesn't follow the strip-non-alphanumerics derivation.

    Resolution: ``AGNES_WORKSPACE_DIR_NAME`` env > ``instance.workspace_dir``
    YAML > derived from :func:`get_instance_brand`.
    """
    raw = os.environ.get("AGNES_WORKSPACE_DIR_NAME")
    if raw is None:
        raw = get_value("instance", "workspace_dir", default="")
    explicit = (raw or "").strip()
    if explicit:
        return explicit
    import re
    derived = re.sub(r"[^A-Za-z0-9]", "", get_instance_brand())
    return derived or "Agnes"


def get_instance_admin_email() -> str:
    """Operator-facing contact address shown in user-side prompts that
    suggest the user reach out to their Agnes admin (e.g. the /home GWS
    connector tile renders an "Email admin" mailto button when no shared
    OAuth app is provisioned). Empty string when unset — the template
    branches on the value being truthy, so an empty value hides the
    button rather than rendering a broken `mailto:` link.

    Resolution: ``AGNES_INSTANCE_ADMIN_EMAIL`` env > ``instance.admin_email`` YAML > "".
    Mirrors :func:`get_home_route` shape so Terraform overrides work.
    """
    raw = os.environ.get("AGNES_INSTANCE_ADMIN_EMAIL")
    if raw is None:
        raw = get_value("instance", "admin_email", default="")
    return (raw or "").strip()


def get_atlassian_base_url() -> str:
    """Operator-provisioned Atlassian Cloud site URL — baked into the
    Atlassian connector prompt so end users don't have to guess /
    paste their org's `https://<myorg>.atlassian.net`.

    When set, the connector prompt's "ask me for the site URL" step
    is replaced by a literal value the helper script substitutes
    directly. When unset (empty string), the prompt falls back to
    asking the user — same flow as today.

    Normalized: trailing slashes and a trailing ``/wiki`` are stripped
    so the value is always the bare site root. Matches the
    normalization the per-user helper script already does at storage
    time (see atlassian_prompt step 4 guard 2).

    Resolution: ``AGNES_ATLASSIAN_BASE_URL`` env > ``instance.atlassian.base_url`` YAML > "".
    Mirrors :func:`get_instance_admin_email` so Terraform overrides
    work the same way.
    """
    raw = os.environ.get("AGNES_ATLASSIAN_BASE_URL")
    if raw is None:
        raw = get_value("instance", "atlassian", "base_url", default="")
    value = (raw or "").strip().rstrip("/")
    if value.endswith("/wiki"):
        value = value[: -len("/wiki")]
    return value


def get_sync_interval() -> str:
    """Human-readable refresh cadence shown in the analyst welcome prompt."""
    return get_value("instance", "sync_interval", default="1 hour")


def get_allowed_domains() -> list:
    domain = get_value("auth", "allowed_domain", default="")
    if domain:
        return [d.strip() for d in domain.split(",") if d.strip()]
    return []


def get_datasets() -> dict:
    return get_value("datasets", default={})


def get_theme() -> dict:
    return get_value("theme", default={})


def get_auth_config() -> dict:
    return get_value("auth", default={})


def get_corporate_memory_config() -> dict:
    return get_value("corporate_memory", default={})


def get_guardrails_config() -> dict:
    """Flea-market upload-guardrail config (see docs/STORE_GUARDRAILS.md).

    Returns the ``guardrails:`` block from instance.yaml, or an empty dict
    when not configured. Call site: ``src/store_guardrails/runner.py``.
    """
    return get_value("guardrails", default={})


def get_guardrails_review_model() -> str:
    """Resolved Anthropic model ID used for the LLM security review.

    Reads ``guardrails.review_model`` (one of ``haiku``, ``sonnet``,
    ``opus``, or a concrete ``claude-*`` model ID) and returns the
    concrete model ID. Defaults to Haiku — the cheapest tier — when the
    operator hasn't set the key. Override per-instance for higher-stakes
    review at proportionally higher cost.
    """
    from connectors.llm.factory import resolve_model_tier
    raw = get_value("guardrails", "review_model", default="haiku")
    return resolve_model_tier(raw)


def get_guardrails_blocked_quota_per_day() -> int:
    """Per-submitter cap on `blocked_llm` + `review_error` rows in the
    trailing 24h.

    Defaults to 50. Set to 0 in instance.yaml to disable the quota
    entirely (useful for trusted single-tenant deployments). Bounds
    the worst case where a bot loops on bundles that pass inline
    checks but trip the async LLM reviewer. Inline failures are
    hard-rejected upstream (no row created) and not counted here;
    HTTP-level rate limiting + the
    ``store.upload.security_blocked`` audit trail cover that path.
    """
    val = get_value("guardrails", "blocked_quota_per_day", default=50)
    try:
        return max(0, int(val))
    except (TypeError, ValueError):
        return 50


def get_guardrails_blocked_bundle_ttl_days() -> int:
    """How many days to keep a blocked bundle's bytes on disk.

    Default 30. The submission row + sha256 + size always survive — only
    the bundle bytes get removed. ``bundle_purged_at`` is stamped so the
    detail UI renders *"Bundle purged on …"*. Set to 0 to disable the
    TTL purge entirely (bundles persist indefinitely until manual
    Delete).
    """
    val = get_value("guardrails", "blocked_bundle_ttl_days", default=30)
    try:
        return max(0, int(val))
    except (TypeError, ValueError):
        return 30


def get_guardrails_stuck_review_grace_seconds() -> int:
    """How long a submission may stay at ``status='pending_llm'`` before
    the reaper flips it to ``review_error``.

    The BackgroundTasks worker normally writes a verdict within a few
    seconds. If the worker crashes between status flip and verdict
    write, the row would otherwise sit at pending_llm forever — admin
    queue surfaces it indefinitely; submitter never gets a verdict.

    Default 1800s (30 min) comfortably exceeds the Sonnet/Opus p99
    wall time for the configured ``MAX_REVIEW_BYTES`` payload. Set to
    0 to disable the reaper entirely.
    """
    val = get_value("guardrails", "stuck_review_grace_seconds", default=1800)
    try:
        return max(0, int(val))
    except (TypeError, ValueError):
        return 1800


def get_guardrails_min_description_chars() -> int:
    """Minimum character floor for skill / agent / plugin descriptions.

    Reads ``guardrails.min_description_chars`` (default 60). Set the
    floor low (e.g. 30) to relax the inline content check; set high
    (e.g. 120) to push submitters closer to the Claude-skill-ecosystem
    norm of 150-220 chars per description.
    """
    val = get_value("guardrails", "min_description_chars", default=60)
    try:
        return max(1, int(val))
    except (TypeError, ValueError):
        return 60


def get_guardrails_min_command_description_chars() -> int:
    """Minimum character floor for slash-command descriptions.

    Reads ``guardrails.min_command_description_chars`` (default 25).
    Commands are typically one-verb actions — kept tighter than skills.
    """
    val = get_value("guardrails", "min_command_description_chars", default=25)
    try:
        return max(1, int(val))
    except (TypeError, ValueError):
        return 25


def get_guardrails_min_distinct_words() -> int:
    """Minimum distinct-word count for any description string.

    Reads ``guardrails.min_distinct_words`` (default 5). Defends against
    "padding hits the char count but says nothing" cases like
    `"description description description description"`.
    """
    val = get_value("guardrails", "min_distinct_words", default=5)
    try:
        return max(1, int(val))
    except (TypeError, ValueError):
        return 5


def get_guardrails_min_body_chars() -> int:
    """Minimum body-content floor for skill / agent files.

    Reads ``guardrails.min_body_chars`` (default 200). Body = the
    markdown after the YAML frontmatter. 200 chars is a "one paragraph"
    floor that catches stubs; real skill bodies run 500-2000 chars.
    """
    val = get_value("guardrails", "min_body_chars", default=200)
    try:
        return max(1, int(val))
    except (TypeError, ValueError):
        return 200


def get_guardrails_enabled() -> bool:
    """Operator's stated intent for the guardrail pipeline.

    Reads ``guardrails.enabled`` from instance.yaml. Defaults to True.
    Operators can explicitly disable by setting ``guardrails.enabled:
    false`` — useful for local development against the UI without
    burning Anthropic tokens.

    Note: this returns intent ONLY. Whether the LLM provider has
    working credentials is a separate concern — see
    :func:`get_guardrails_llm_provider_ready`. The two are kept apart
    so callers can implement fail-CLOSED behavior: hold submissions at
    ``pending_llm`` (instead of silently auto-approving) when intent is
    True but credentials are missing.
    """
    return bool(get_value("guardrails", "enabled", default=True))


def get_guardrails_llm_provider_ready() -> bool:
    """Whether the LLM provider has credentials present in the
    environment.

    Independent from :func:`get_guardrails_enabled` (operator intent).
    A False return here when intent is True is a misconfiguration —
    the caller should hold submissions at ``pending_llm`` and surface
    a loud boot-time warning rather than silently auto-approving.
    """
    if os.environ.get("ANTHROPIC_API_KEY", "").strip():
        return True
    if os.environ.get("LLM_API_KEY", "").strip():
        return True
    return False