1. instance.yaml overlay path now matches read site under STATE_DIR.
Three sites updated:
- app/api/admin.py:1005 (server-config endpoint writer)
- app/api/admin.py:2610 (configure endpoint writer)
- app/instance_config.py:106 (overlay reader)
All three now go through _state_dir() so under flat-mount layout
(STATE_DIR=/data-state) the irreplaceable instance.yaml overlay
lands on the state disk (sdc) instead of the regenerable data
disk (sdb). Without this fix, .env_overlay correctly went to the
state disk while instance.yaml went to the data disk, so that config
would be lost if an operator wiped sdb.
2. Strip customer-specific tokens from OSS repo per CLAUDE.md
vendor-agnostic rule:
- docker-compose.host-mount.yml: 'a deployer (Groupon FoundryAI)'
→ 'a deployer in production'
- docker-compose.flat-mount.yml: 'caused 2026-05-05 in the
Groupon FoundryAI deployment' → generic 'production failure
mode'
- docs/state-dir.md: rewrote the incident reference to describe
the failure mode abstractly without naming the deployment;
updated the recommendation table to say 'shadow-mount class'
instead of dating the specific incident.
3. Updated docs/state-dir.md 'What reads STATE_DIR' to list all
read/write sites including the three migrated in this round
(admin.py, instance_config.py, marketplaces.py).
ANALYSIS finding (tls-rotate.sh hardcodes host-mount.yml) deferred:
same operator-side class as the hardcoded host-mount in auto-upgrade.sh,
a documented limitation per the PR body.
96 lines
3.6 KiB
YAML
# Bind-mount overlay: replaces the `data` named volume with a direct
# host bind mount per service.
#
# Why direct service-level bind, not driver_opts on the named volume
# ------------------------------------------------------------------
# The previous version of this file modified the `data` named volume's
# `driver_opts` to point at /data with `o: bind,rbind`. Docker named
# volumes have an immutability footgun: once a volume is created, its
# driver options are fixed for the life of the volume. Editing this
# file and re-running `docker compose up -d` does NOT propagate the
# new options to existing volumes; they keep whatever options were
# in effect at create time.
#
# This bit a deployer in production: the volume
# was created before this overlay had `bind,rbind`, kept the old
# `bind` (non-recursive) propagation, and containers wrote to a
# shadowed subdirectory of the parent disk instead of the nested
# child mount. DuckDB went FATAL on a root-owned WAL during a
# routine container recreate; sign-in broke.
#
# Direct service-level bind mounts (`/host/path:/container/path`)
# don't go through Docker's volume layer at all. They re-evaluate
# the mount options every container start, and modern Docker Engine
# (20.10+) defaults to recursive bind for these. No options to
# forget, no immutable state to migrate, no shadow-mount class.
#
# What this overlay does
# ----------------------
# `volumes: !override` on each service replaces the base
# `data:/data` named-volume mount with a direct `/data:/data` host
# bind. The named volume `data:` declared at the bottom of
# docker-compose.yml is left intact (still useful for local-dev
# `compose up` without this overlay) but is no longer referenced
# by any service when the overlay is active.
#
# When the operator's host has a nested mount under /data (e.g. a
# separate state disk mounted at /data/state), the recursive bind
# carries that nested mount into every container automatically.
#
# Usage (combined with docker-compose.prod.yml):
#   docker compose \
#     -f docker-compose.yml \
#     -f docker-compose.prod.yml \
#     -f docker-compose.host-mount.yml \
#     up -d
#
# Do NOT use this overlay in CI: /data does not exist on GitHub
# runners.
#
# Compose-spec version requirement: !override merge tag is part of
# the Compose Specification supported by Docker Compose v2.20+ and
# the compose-go library used by Compose v5+. If you need to support
# older clients, fork this overlay into per-service files.

services:
  app:
    volumes: !override
      - /data:/data
      - ./config:/app/config:ro

  extract:
    volumes: !override
      - /data:/data
      - ./config:/app/config:ro

  scheduler:
    volumes: !override
      - /data:/data
      - ./config:/app/config:ro

  telegram-bot:
    volumes: !override
      - /data:/data

  ws-gateway:
    volumes: !override
      - /data:/data

  caddy:
    # Caddy was originally inheriting `data:/srv:ro` from the base
    # service. Once the other services switch to direct binds and
    # nothing populates the `data:` named volume, that inherited
    # mount points at an empty Docker-managed volume, and the
    # @download `try_files /bigquery/data/<id>.parquet …` block
    # in Caddyfile finds nothing, so every parquet download falls
    # through to the app's uvicorn worker, defeating the v0.36.0
    # file_server bypass.
    #
    # Restate every mount the base caddy service depends on; mirror
    # the same caveat that lives in flat-mount.yml.
    volumes: !override
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - /data/state/certs:/certs:ro
      - caddy_data:/data
      - caddy_config:/config
      - /data:/srv:ro