agnes-the-ai-analyst/docker-compose.host-mount.yml
ZdenekSrotyr df2c33147c fix: Devin Review on #194 round 2 — 3 BUG-class findings
1. instance.yaml overlay path now matches read site under STATE_DIR.
   Three sites updated:
     - app/api/admin.py:1005 (server-config endpoint writer)
     - app/api/admin.py:2610 (configure endpoint writer)
     - app/instance_config.py:106 (overlay reader)
   All three now go through _state_dir() so under flat-mount layout
   (STATE_DIR=/data-state) the irreplaceable instance.yaml overlay
   lands on the state disk (sdc) instead of the regenerable data
   disk (sdb). Without this fix, .env_overlay correctly went to the
   state disk while instance.yaml went to the data disk — config
   would be lost if an operator wiped sdb.

2. Strip customer-specific tokens from OSS repo per CLAUDE.md
   vendor-agnostic rule:
     - docker-compose.host-mount.yml: 'a deployer (Groupon FoundryAI)'
       → 'a deployer in production'
     - docker-compose.flat-mount.yml: 'caused 2026-05-05 in the
       Groupon FoundryAI deployment' → generic 'production failure
       mode'
     - docs/state-dir.md: rewrote the incident reference to describe
       the failure mode abstractly without naming the deployment;
       updated the recommendation table to say 'shadow-mount class'
       instead of dating the specific incident.

3. Updated docs/state-dir.md 'What reads STATE_DIR' to list all
   read/write sites including the three migrated in this round
   (admin.py, instance_config.py, marketplaces.py).

ANALYSIS finding (tls-rotate.sh hardcoded host-mount.yml) deferred
— same operator-side class as auto-upgrade.sh hardcoded host-mount,
documented limitation per the PR body.
2026-05-05 20:02:50 +02:00


# Bind-mount overlay — replaces the `data` named volume with a direct
# host bind mount per service.
#
# Why direct service-level bind, not driver_opts on the named volume
# ------------------------------------------------------------------
# The previous version of this file modified the `data` named volume's
# `driver_opts` to point at /data with `o: bind,rbind`. Docker named
# volumes have an immutability footgun: once a volume is created, its
# driver options are fixed for the life of the volume. Editing this
# file and re-running `docker compose up -d` does NOT propagate the
# new options to existing volumes — they keep whatever options were
# in effect at create time.
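#
# A quick check for this condition, as a sketch: the volume name
# `<project>_data` is an assumption (Compose prefixes named volumes
# with the project name; `docker volume ls` shows the real one).
#   docker volume inspect <project>_data --format '{{ json .Options }}'
# If the printed `o` value lacks `rbind`, the volume still carries its
# create-time options.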
#
# This bit a deployer in production: the volume
# was created before this overlay had `bind,rbind`, kept the old
# `bind` (non-recursive) propagation, and containers wrote to a
# shadowed subdirectory of the parent disk instead of the nested
# child mount. DuckDB went FATAL on a root-owned WAL during a
# routine container recreate; sign-in broke.
#
# Direct service-level bind mounts (`/host/path:/container/path`)
# don't go through Docker's volume layer at all. They re-evaluate
# the mount options every container start, and modern Docker Engine
# (20.10+) defaults to recursive bind for these. No options to
# forget, no immutable state to migrate, no shadow-mount class.
#
# What this overlay does
# ----------------------
# `volumes: !override` on each service replaces the base
# `data:/data` named-volume mount with a direct `/data:/data` host
# bind. The named volume `data:` declared at the bottom of
# docker-compose.yml is left intact (still useful for local-dev
# `compose up` without this overlay) but is no longer referenced
# by any service when the overlay is active.
#
# When the operator's host has a nested mount under /data (e.g. a
# separate state disk mounted at /data/state), the recursive bind
# carries that nested mount into every container automatically.
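#
# One way to confirm this, as a sketch: it assumes the nested mount
# sits at /data/state, that `findmnt` is available on the host, and
# that the `app` image ships `grep`. Pass the same `-f` flags as in
# Usage below.
#   findmnt -R /data
#   docker compose exec app grep ' /data/state ' /proc/self/mountinfo
# An empty result from the second command would suggest the nested
# mount did not propagate into the container.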
#
# Usage (combined with docker-compose.prod.yml):
#   docker compose \
#     -f docker-compose.yml \
#     -f docker-compose.prod.yml \
#     -f docker-compose.host-mount.yml \
#     up -d
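#
# To preview the merged result before starting anything, the same
# file list can be rendered with `config` (a sketch; for the services
# listed in this overlay, `volumes` should then show only the mounts
# declared here):
#   docker compose \
#     -f docker-compose.yml \
#     -f docker-compose.prod.yml \
#     -f docker-compose.host-mount.yml \
#     config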
#
# Do NOT use this overlay in CI — /data does not exist on GitHub
# runners.
#
# Compose-spec version requirement: !override merge tag is part of
# the Compose Specification supported by Docker Compose v2.20+ and
# the compose-go library used by Compose v5+. If you need to support
# older clients, fork this overlay into per-service files.
services:
  app:
    volumes: !override
      - /data:/data
      - ./config:/app/config:ro
  extract:
    volumes: !override
      - /data:/data
      - ./config:/app/config:ro
  scheduler:
    volumes: !override
      - /data:/data
      - ./config:/app/config:ro
  telegram-bot:
    volumes: !override
      - /data:/data
  ws-gateway:
    volumes: !override
      - /data:/data
  caddy:
    # Caddy was originally inheriting `data:/srv:ro` from the base
    # service. Once the other services switch to direct binds and
    # nothing populates the `data:` named volume, that inherited
    # mount points at an empty Docker-managed volume — and the
    # @download `try_files /bigquery/data/<id>.parquet …` block
    # in Caddyfile finds nothing, so every parquet download falls
    # through to the app's uvicorn worker, defeating the v0.36.0
    # file_server bypass.
    #
    # Restate every mount the base caddy service depends on; mirror
    # the same caveat that lives in flat-mount.yml.
    volumes: !override
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - /data/state/certs:/certs:ro
      - caddy_data:/data
      - caddy_config:/config
      - /data:/srv:ro