agnes-the-ai-analyst/docker-compose.flat-mount.yml
Vojtech Rysanek a303de0372 feat: STATE_DIR env var + flat-mount overlay (parallel disks)
Introduces STATE_DIR as the single source of truth for the writable
state directory path, with backward-compatible default of
${DATA_DIR}/state. Pairs with a new docker-compose.flat-mount.yml
overlay that mounts the state disk in PARALLEL to the data disk
(rather than nested under it).

Why
---
The default deployment topology nests state under data: sdb at /data,
sdc at /data/state. That layout has known fragility documented in
docs/state-dir.md — bind-propagation gotchas, two-writer collisions
on the same prefix, mount-order coupling. The 2026-05-05 incident in
the Groupon FoundryAI deployment was a manifestation of the
propagation gotcha.

The flat layout (sdb at /data, sdc at /data-state — parallel, not
nested) eliminates the nested-mount class entirely. Each disk is its
own bind mount, recursive by default in modern Docker. No volume
options to forget. No two-writer collision (host scripts and
container app share /data-state at the same path, single namespace).

What changes
------------
App code (Python):
- src/db.py:        new _get_state_dir() helper. get_system_db() and
                    schema migration snapshot use it.
- app/secrets.py:   new _state_dir() helper. _load_or_generate() uses
                    it for .session_secret and .jwt_secret.
- app/main.py:      .env_overlay loaded from _state_dir().
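
A minimal sketch of the load-or-generate pattern the secrets bullet describes. The function name, signature, and hex token format here are illustrative assumptions, not the actual contents of app/secrets.py:

```python
import os
import secrets
from pathlib import Path


def load_or_generate(state_dir: Path, name: str, nbytes: int = 32) -> str:
    """Return the secret stored at state_dir/name, creating it if missing.

    Hypothetical helper: the real _load_or_generate() may differ.
    """
    path = state_dir / name
    if path.exists():
        value = path.read_text(encoding="utf-8").strip()
        if value:
            return value

    # First run (or empty file): generate a fresh secret and persist it.
    state_dir.mkdir(parents=True, exist_ok=True)
    value = secrets.token_hex(nbytes)

    # The 0o600 mode applies when the file is newly created, so a fresh
    # secret is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(value)
    return value
```

Called with the resolved state directory and ".session_secret" or ".jwt_secret", a helper like this returns the same value across restarts as long as the state disk is mounted at the expected path.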

Host scripts:
- scripts/ops/agnes-auto-upgrade.sh: STATE_DIR drives mount-sanity
  check and cert detection. Defaults preserve existing behavior.
- scripts/ops/agnes-tls-rotate.sh:   STATE_DIR drives CERT_DIR.

New compose overlay:
- docker-compose.flat-mount.yml: parallel /data and /data-state binds
  per service. Mutually exclusive with docker-compose.host-mount.yml;
  pick one based on disk topology.

Documentation:
- docs/state-dir.md: layout choice (A nested vs B flat), pros/cons,
  migration steps, and which code paths read STATE_DIR.

Backward compatibility
----------------------
STATE_DIR defaults to ${DATA_DIR}/state, which is the current behavior.
Existing deployments that don't set the variable see no behavior
change. Migration to the flat layout is opt-in; follow the runbook in
docs/state-dir.md.
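
The fallback rule can be sketched as follows. This is an illustration only: the function name, the injectable environ parameter, and the "/data" default for DATA_DIR are assumptions, not the code in src/db.py or app/secrets.py:

```python
import os
from pathlib import Path


def get_state_dir(environ=None) -> Path:
    """Resolve the writable state directory.

    STATE_DIR wins when set and non-empty; otherwise fall back to
    ${DATA_DIR}/state. The "/data" default for DATA_DIR is an assumption.
    """
    env = os.environ if environ is None else environ

    explicit = env.get("STATE_DIR", "").strip()
    if explicit:
        return Path(explicit)

    data_dir = env.get("DATA_DIR", "").strip() or "/data"
    return Path(data_dir) / "state"
```

With STATE_DIR=/data-state the helper returns /data-state (Layout B); with the variable unset it returns the nested path under DATA_DIR (Layout A).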

Validation
----------
- bash -n on both host scripts: pass
- docker compose -f docker-compose.flat-mount.yml config: resolves
  cleanly, with 5 services binding /data and /data-state directly and
  caddy binding /data-state/certs read-only
- python3 import + helper exercise: STATE_DIR override works,
  default falls back to ${DATA_DIR}/state

Companion to PR #191 (drop named-volume driver_opts in host-mount.yml).
That PR fixes the immutability footgun for Layout A; this PR offers
Layout B as the architectural alternative.
2026-05-05 19:28:07 +02:00


# Flat-mount overlay — parallel host binds for /data and /data-state.
#
# Why this overlay
# ----------------
# The default deployment topology nests state under data: sdb at /data,
# sdc at /data/state (i.e. /data/state is a separate disk mounted INSIDE
# the data disk). That layout works but has known fragility:
#
#   - Bind-mount propagation matters. A non-recursive bind hides the
#     nested mount, leading to silent shadow writes (the failure mode
#     behind the 2026-05-05 incident in the Groupon FoundryAI deployment).
#
#   - Two writers, one tree. Host-side timers (tls-rotate.timer)
#     write to /data/state/certs as root, while the container app
#     writes to /data/state/system.duckdb as uid 999. Same prefix,
#     different mount-namespace views, hence ownership conflicts.
#
#   - Resizing sdb requires unmounting sdc first: mount-order coupling.
#
# This overlay removes the nesting by mounting the state disk in
# PARALLEL to the data disk:
#
#   sdb at /data        (analytics, regenerable)
#   sdc at /data-state  (DuckDB, secrets, certs; irreplaceable)
#
# Both are direct service-level binds, recursive by default in modern
# Docker Engine. No volume options to forget. No nested propagation.
# No two-writer collision (app uses /data-state, host scripts also use
# /data-state — same path, single namespace).
#
# Usage
# -----
#   1. On the operator's host: mount the state disk (sdc) at /data-state
#      (instead of /data/state). Update fstab. Move existing state
#      contents from /data/state to /data-state.
#
#   2. In /opt/agnes/.env, set STATE_DIR=/data-state. The app's secrets
#      module and DuckDB code, plus the host-side agnes-tls-rotate.sh
#      and agnes-auto-upgrade.sh scripts, all read this var.
#
#   3. Compose invocation:
#
#        docker compose \
#          -f docker-compose.yml \
#          -f docker-compose.prod.yml \
#          -f docker-compose.flat-mount.yml \
#          up -d
#
# Note: this overlay is mutually exclusive with docker-compose.host-mount.yml.
# Pick one based on your disk topology.
#
# Do NOT use this overlay in CI — /data and /data-state do not exist
# on GitHub runners.
services:
  app:
    volumes: !override
      - /data:/data
      - /data-state:/data-state
      - ./config:/app/config:ro
  extract:
    volumes: !override
      - /data:/data
      - /data-state:/data-state
      - ./config:/app/config:ro
  scheduler:
    volumes: !override
      - /data:/data
      - /data-state:/data-state
      - ./config:/app/config:ro
  telegram-bot:
    volumes: !override
      - /data:/data
      - /data-state:/data-state
  ws-gateway:
    volumes: !override
      - /data:/data
      - /data-state:/data-state
  caddy:
    volumes: !override
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - /data-state/certs:/certs:ro
      - caddy_data:/data