agnes-the-ai-analyst/src
Vojtech Rysanek a303de0372 feat: STATE_DIR env var + flat-mount overlay (parallel disks)
Introduces STATE_DIR as the single source of truth for the writable
state directory path, with backward-compatible default of
${DATA_DIR}/state. Pairs with a new docker-compose.flat-mount.yml
overlay that mounts the state disk in PARALLEL to the data disk
(rather than nested under it).

Why
---
The default deployment topology nests state under data: sdb at /data,
sdc at /data/state. That layout has known fragility documented in
docs/state-dir.md — bind-propagation gotchas, two-writer collisions
on the same prefix, mount-order coupling. The 2026-05-05 incident in
the Groupon FoundryAI deployment was a manifestation of the
propagation gotcha.

The flat layout (sdb at /data, sdc at /data-state — parallel, not
nested) eliminates the nested-mount class entirely. Each disk is its
own bind mount, recursive by default in modern Docker. No volume
options to forget. No two-writer collision (host scripts and
container app share /data-state at the same path, single namespace).

What changes
------------
App code (Python):
- src/db.py:        new _get_state_dir() helper. get_system_db() and
                    schema migration snapshot use it.
- app/secrets.py:   new _state_dir() helper. _load_or_generate() uses
                    it for .session_secret and .jwt_secret.
- app/main.py:      .env_overlay loaded from _state_dir().

Host scripts:
- scripts/ops/agnes-auto-upgrade.sh: STATE_DIR drives mount-sanity
  check and cert detection. Defaults preserve existing behavior.
- scripts/ops/agnes-tls-rotate.sh:   STATE_DIR drives CERT_DIR.

New compose overlay:
- docker-compose.flat-mount.yml: parallel /data and /data-state binds
  per service. Mutually exclusive with docker-compose.host-mount.yml;
  pick one based on disk topology.

Documentation:
- docs/state-dir.md: layout choice (A nested vs B flat), pros/cons,
  migration steps, and which code paths read STATE_DIR.

Backward compatibility
----------------------
STATE_DIR defaults to ${DATA_DIR}/state — current behavior. Existing
deployers that don't set the var see no behavior change. Migration
to flat layout is opt-in per the runbook in docs/state-dir.md.

Validation
----------
- bash -n on both host scripts: pass
- docker compose config -f docker-compose.flat-mount.yml: resolves
  cleanly with all 6 services binding /data and /data-state directly
- python3 import + helper exercise: STATE_DIR override works,
  default falls back to ${DATA_DIR}/state

Companion to PR #191 (drop named-volume driver_opts in host-mount.yml).
That PR fixes the immutability footgun for Layout A; this PR offers
Layout B as the architectural alternative.
2026-05-05 19:28:07 +02:00
..
repositories fix(anthropic): strict json_schema (additionalProperties=false) + add /admin/scheduler-runs UI 2026-05-05 08:00:57 +02:00
__init__.py Extract Keboola into connectors/keboola module 2026-03-09 12:22:16 +01:00
catalog_export.py feat(observability): request_id end-to-end + dev debug toolbar + centralized logging (#136) 2026-04-29 22:54:21 +02:00
claude_md.py fix(claude_md): load default via importlib.resources — survives /app/config bind-mount 2026-05-04 06:53:47 +02:00
db.py feat: STATE_DIR env var + flat-mount overlay (parallel disks) 2026-05-05 19:28:07 +02:00
identifier_validation.py fix(security): #81 Group D — extractor-side identifier validation (squashed) (#97) 2026-04-27 21:46:17 +02:00
marketplace.py feat(rbac+marketplace): schema v14 FK + AGNES_ENABLE_TABLE_GRANTS + break-glass CLI 2026-04-28 14:25:13 +02:00
marketplace_filter.py fix(marketplace): use plugin.json name in synth marketplace.json (#133) 2026-04-29 19:25:57 +02:00
orchestrator.py feat(bigquery): bq_query_timeout_ms knob; default 600s (was 90s) 2026-05-05 16:40:40 +02:00
orchestrator_security.py fix(security): #81 Group A — orchestrator attach hardening (squashed) (#95) 2026-04-27 21:34:04 +02:00
profiler.py feat(observability): request_id end-to-end + dev debug toolbar + centralized logging (#136) 2026-04-29 22:54:21 +02:00
rbac.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
remote_query.py fix(v2): #134 BigQuery cross-project errors return structured 502/400 + BqAccess facade (#138) 2026-04-30 10:11:20 +02:00
scheduler.py feat(scheduler): re-wire sync_schedule + script.schedule; tune via env; OpenMetadata TLS (#135) 2026-04-29 22:06:30 +02:00
sql_safe.py feat(v2): claude-driven fetch primitives + 0.14.0 (#102) 2026-04-29 01:07:19 +02:00
welcome_template.py refactor(welcome-template): drop role param; resolve plugins per-user unconditionally 2026-05-04 22:13:46 +02:00