Introduces STATE_DIR as the single source of truth for the writable
state directory path, with a backward-compatible default of
${DATA_DIR}/state. Pairs with a new docker-compose.flat-mount.yml
overlay that mounts the state disk in PARALLEL to the data disk
(rather than nested under it).

Why
---
The default deployment topology nests state under data: sdb at /data,
sdc at /data/state. That layout has known fragilities, documented in
docs/state-dir.md: bind-propagation gotchas, two-writer collisions
on the same prefix, and mount-order coupling. The 2026-05-05 incident in
the Groupon FoundryAI deployment was a manifestation of the
propagation gotcha.

The flat layout (sdb at /data, sdc at /data-state; parallel, not
nested) eliminates the nested-mount failure class entirely. Each disk is its
own bind mount, recursive by default in modern Docker. No volume
options to forget. No two-writer collision (host scripts and
container app share /data-state at the same path, single namespace).

What changes
------------
App code (Python):
- src/db.py: new _get_state_dir() helper. get_system_db() and
  schema migration snapshot use it.
- app/secrets.py: new _state_dir() helper. _load_or_generate() uses
  it for .session_secret and .jwt_secret.
- app/main.py: .env_overlay loaded from _state_dir().

Host scripts:
- scripts/ops/agnes-auto-upgrade.sh: STATE_DIR drives mount-sanity
  check and cert detection. Defaults preserve existing behavior.
- scripts/ops/agnes-tls-rotate.sh: STATE_DIR drives CERT_DIR.

New compose overlay:
- docker-compose.flat-mount.yml: parallel /data and /data-state binds
  per service. Mutually exclusive with docker-compose.host-mount.yml;
  pick one based on disk topology.

Documentation:
- docs/state-dir.md: layout choice (A nested vs B flat), pros/cons,
  migration steps, and which code paths read STATE_DIR.

Backward compatibility
----------------------
STATE_DIR defaults to ${DATA_DIR}/state, the current behavior. Existing
deployments that do not set the variable see no behavior change. Migration
to the flat layout is opt-in per the runbook in docs/state-dir.md.
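Because the flat layout is opt-in, it can help to tell which layout a given pair of paths corresponds to. The following is a minimal sketch, not code from this PR: `classify_layout` is a hypothetical helper, and the comparison is purely lexical (it looks at path components, not at real mounts).

```python
from pathlib import Path


def classify_layout(data_dir: str, state_dir: str) -> str:
    """Return "nested" (Layout A) or "flat" (Layout B).

    Hypothetical helper for illustration only. It compares path
    components and never touches the filesystem or the mount table.
    """
    data = Path(data_dir)
    state = Path(state_dir)
    # State is "nested" when it is the data dir itself or lives below it.
    # Comparing components keeps /data-state from matching /data by prefix.
    if state == data or data in state.parents:
        return "nested"
    return "flat"


if __name__ == "__main__":
    print(classify_layout("/data", "/data/state"))   # nested
    print(classify_layout("/data", "/data-state"))   # flat
```

A string-prefix check would misclassify /data-state as living under /data, which is why the sketch compares whole path components.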

Validation
----------
- bash -n on both host scripts: pass
- docker compose -f docker-compose.flat-mount.yml config: resolves
  cleanly with all 6 services binding /data and /data-state directly
- python3 import + helper exercise: STATE_DIR override works,
  default falls back to ${DATA_DIR}/state
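The helper exercise in the last bullet could look roughly like the sketch below. It is an assumption about how such a check might be written, not the script that was actually run: `_state_dir` is a local copy of the helper from app/secrets.py so the block runs standalone, and `exercise_state_dir` is a hypothetical name.

```python
import os
from pathlib import Path
from unittest import mock


def _state_dir() -> Path:
    # Local copy of the helper from app/secrets.py, repeated here so the
    # sketch does not need the application package on the import path.
    state = os.environ.get("STATE_DIR", "")
    if state:
        return Path(state)
    return Path(os.environ.get("DATA_DIR", "./data")) / "state"


def exercise_state_dir() -> dict:
    """Resolve the state dir under three environments (hypothetical check)."""
    cases = {
        "override": {"DATA_DIR": "/data", "STATE_DIR": "/data-state"},
        "default": {"DATA_DIR": "/data"},
        # An empty STATE_DIR is treated the same as an unset one.
        "empty_override": {"DATA_DIR": "/data", "STATE_DIR": ""},
    }
    results = {}
    for name, env in cases.items():
        # clear=True hides the real environment inside the block;
        # patch.dict restores it on exit.
        with mock.patch.dict(os.environ, env, clear=True):
            results[name] = _state_dir()
    return results


if __name__ == "__main__":
    for name, path in exercise_state_dir().items():
        print(f"{name}: {path}")
```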

Companion to PR #191 (drop named-volume driver_opts in host-mount.yml).
That PR fixes the immutability footgun for Layout A; this PR offers
Layout B as the architectural alternative.
55 lines, 1.7 KiB, Python
"""Auto-generate and persist secrets that survive container restarts."""

import logging
import os
import secrets
from pathlib import Path

logger = logging.getLogger(__name__)


def _state_dir() -> Path:
    """Return path to writable state directory.

    STATE_DIR env var takes precedence; otherwise defaults to
    ${DATA_DIR}/state for backward compatibility with deployments
    that nest state under the data disk. See docs/state-dir.md.
    """
    state = os.environ.get("STATE_DIR", "")
    if state:
        return Path(state)
    return Path(os.environ.get("DATA_DIR", "./data")) / "state"


def _load_or_generate(env_var: str, file_name: str) -> str:
    """Load secret from env var, or from file, or generate and persist."""
    val = os.environ.get(env_var, "")
    if val:
        return val
    secret_path = _state_dir() / file_name
    if secret_path.exists():
        val = secret_path.read_text().strip()
        if val:
            return val
        logger.warning("Secret file %s is empty, regenerating", secret_path)
    secret_path.parent.mkdir(parents=True, exist_ok=True)
    val = secrets.token_hex(32)
    secret_path.write_text(val)
    try:
        secret_path.chmod(0o600)
    except OSError:
        pass  # chmod not supported on all platforms (e.g., Windows)
    logger.info(
        "Auto-generated %s -> %s (set %s in .env to use a fixed value)",
        file_name, secret_path, env_var,
    )
    return val


def get_jwt_secret() -> str:
    """Get JWT secret key from env, file, or auto-generate."""
    return _load_or_generate("JWT_SECRET_KEY", ".jwt_secret")


def get_session_secret() -> str:
    """Get session secret from env, file, or auto-generate."""
    return _load_or_generate("SESSION_SECRET", ".session_secret")
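
One design note on `_load_or_generate()`: it writes the secret first and tightens permissions afterwards, so the file briefly exists with default permissions. The sketch below shows one possible alternative, assuming a POSIX-style host. `write_secret_private` is a hypothetical helper that is not part of app/secrets.py, and it does not cover the "file exists but is empty" regeneration path that the original handles.

```python
import os
from pathlib import Path


def write_secret_private(path: Path, value: str) -> bool:
    """Create `path` with mode 0o600 from the start.

    Hypothetical helper for illustration. Returns False when the file
    already exists, so the caller can re-read it instead of overwriting.
    On Windows the mode argument is largely ignored.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    # O_EXCL makes creation fail if another process created the file first.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    try:
        fd = os.open(path, flags, 0o600)
    except FileExistsError:
        return False
    # fdopen takes ownership of the descriptor and closes it on exit.
    with os.fdopen(fd, "w") as handle:
        handle.write(value)
    return True
```

With this shape, two containers that start at the same time cannot both persist different secrets: the loser of the race gets `False` and can read the winner's file.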