A single analyst's multi-GB `agnes pull` held the only uvicorn worker
for the duration of the stream, starving the UI, /api/health, and every
other API endpoint, and the container flipped to `unhealthy`. Observed on
prod on 2026-05-05 while a 6.8 GB `order_economics` pull was in flight.

Caddy now intercepts `GET /api/data/{table_id}/download` and serves
the parquet directly via sendfile from the data volume (mounted
read-only at /srv inside the caddy container). RBAC is enforced by
`forward_auth` to a new lightweight `GET /api/data/{table_id}/check-access`
endpoint (returns 204 / 403), so the bulk transfer never reaches uvicorn.

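The request flow described above can be sketched roughly as follows. This is an illustration only, not the app's actual code: `GRANTS`, `check_access`, and `route` are hypothetical names, and the real endpoint authenticates via the Authorization / Cookie headers rather than a bare user string. Only the URL pattern and the 204 / 403 contract come from this change.

```python
import re

# Same pattern text as the Caddyfile's `@download path_regexp tid ...` matcher.
DOWNLOAD_RE = re.compile(r"^/api/data/([^/]+)/download$")

# Hypothetical grants table: user -> set of table_ids the user may read.
GRANTS = {"analyst": {"order_economics"}}


def check_access(user, table_id, grants=GRANTS):
    """Return 204 if `user` may read `table_id`, else 403."""
    return 204 if table_id in grants.get(user, set()) else 403


def route(path, user):
    """Decide what the proxy does with a GET for `path`."""
    match = DOWNLOAD_RE.match(path)
    if match is None:
        return "proxy-to-app"  # not a download URL: normal reverse_proxy
    status = check_access(user, match.group(1))
    # forward_auth lets the request continue only on a 2xx answer.
    return "serve-file" if 200 <= status < 300 else "deny"
```

Under these assumptions, `route("/api/data/order_economics/download", "analyst")` yields `"serve-file"`, while the same path for a user without a grant yields `"deny"`.
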
Path discovery uses `try_files` over the known extract.duckdb v2 source
subdirs. Anything not at a static path falls through to the existing
app handler, so legacy `src_data/parquet` and future connectors still
work without a Caddyfile change. Non-Caddy deployments are unchanged.

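The probe-then-fall-through behaviour can be modelled in a few lines of Python. This is a sketch of the idea, not Caddy's implementation: `resolve_parquet` and `SOURCE_DIRS` are hypothetical names, and the subdir list simply mirrors the `try_files` line in the Caddyfile.

```python
from pathlib import Path

# Source subdirs probed by the Caddyfile's try_files line, in order.
SOURCE_DIRS = ("bigquery", "keboola", "jira")


def resolve_parquet(root, table_id, source_dirs=SOURCE_DIRS):
    """Return the first existing <root>/<source>/data/<table_id>.parquet.

    Returns None when no candidate exists, which corresponds to the
    request falling through to the app handler.
    """
    for source in source_dirs:
        candidate = Path(root) / source / "data" / f"{table_id}.parquet"
        if candidate.is_file():
            return candidate  # first hit wins
    return None
```

A `None` result is not an error in this model: it stands for "serve it the old way, through uvicorn".
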
Stage 1 (multi-worker uvicorn) was considered but is blocked by the
single-writer DuckDB lock on system.duckdb: with workers > 1 the extra
workers would crash at startup on "Could not set lock on file", the same
race that pushed the scheduler from in-process writes to HTTP-via-app.
Multi-reader workers + single-writer coordination is out of scope for
this PR.
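
The single-writer constraint can be illustrated with a plain advisory file lock. This is an analogy only, assuming a POSIX system: it uses `flock`, which is not necessarily the mechanism DuckDB uses internally, and `try_exclusive_lock` is a hypothetical helper. It shows why a second opener that needs write access fails immediately instead of waiting.

```python
import fcntl
import os


def try_exclusive_lock(path):
    """Open `path` and take a non-blocking exclusive lock.

    Returns the open file descriptor on success, or None if another
    open file description already holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None  # someone else is the writer
    return fd
```

The first caller gets a descriptor; a second caller gets `None` until the first descriptor is closed, which is the shape of the startup failure a second uvicorn worker would hit.
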
87 lines
3.9 KiB
Caddyfile
{$DOMAIN:localhost} {
    # Cert provisioning. Driven by env var CADDY_TLS:
    #   - unset (default) → cert-file mode for corporate PKI (rotated by
    #     scripts/ops/agnes-tls-rotate.sh into /data/state/certs/).
    #   - "tls <email>" → Let's Encrypt auto-issue, e.g. "tls ops@example.com"
    #     (used by public-internet deployments).
    #   - "tls internal" → Caddy-managed self-signed cert (lab/dev only,
    #     browser warning on every visit).
    #
    # The {$VAR:default} substitution lets one Caddyfile serve all three
    # regimes without per-deployment forks. Caddyfile parses the substituted
    # string as a directive, so the value MUST start with `tls `.
    {$CADDY_TLS:tls /certs/fullchain.pem /certs/privkey.pem} {
        # Modern TLS only. Caddy default already excludes 1.0/1.1 in
        # most builds, but pin explicitly so a future Caddy default
        # change can't silently weaken our posture.
        protocols tls1.2 tls1.3
    }

    # Security headers
    header {
        # HSTS: tell compliant browsers to refuse plain-HTTP for this host
        # for a year. Skipping `preload` so we keep an escape hatch (preload
        # submission is hard-bound and blocks rollback). Skipping
        # `includeSubDomains` because we don't control subdomains.
        Strict-Transport-Security "max-age=31536000"
        # Prevent clickjacking: dashboard is not embedded in iframes
        X-Frame-Options "DENY"
        # Prevent MIME-type sniffing: browser must honor declared Content-Type
        X-Content-Type-Options "nosniff"
        # Limit referrer leakage to origin on same-site navigations only
        Referrer-Policy "strict-origin-when-cross-origin"
        # Strip Server header to avoid fingerprinting the reverse proxy
        -Server
    }

    # Direct file_server for parquet downloads: bypasses uvicorn so a
    # multi-GB pull from one analyst can't starve the app workers and
    # block UI / health / API for everyone else. forward_auth calls the
    # app's lightweight ``/api/data/{id}/check-access`` (RBAC only,
    # ~1 ms) on every request; on 2xx Caddy serves the file directly
    # via sendfile/zero-copy from the data volume mounted read-only.
    #
    # Path layout matches `app/api/data.py`'s extract.duckdb v2 search:
    #   /data/extracts/<source_type>/data/<table_id>.parquet
    # try_files probes known source subdirs in order; first hit wins.
    # If a deployment adds a new connector and lands parquets at a fresh
    # subdir, extend the try_files list. Anything that misses falls
    # through to the app reverse_proxy below, so an unmapped source
    # degrades to "downloads work, just through uvicorn", never 404.
    @download path_regexp tid ^/api/data/([^/]+)/download$
    handle @download {
        forward_auth app:8000 {
            uri /api/data/{re.tid.1}/check-access
            # Bearer PAT or session cookie travels in Authorization
            # / Cookie; copy_headers ensures the upstream sees them.
            copy_headers Authorization Cookie
        }
        # Caddy's own /data is occupied by the caddy_data volume, so the
        # agnes data dir is mounted at /srv (read-only) instead; see the
        # `data:/srv:ro` line in docker-compose.yml's caddy service. The
        # root + try_files combo therefore probes /srv/extracts/...
        root * /srv/extracts
        try_files /bigquery/data/{re.tid.1}.parquet /keboola/data/{re.tid.1}.parquet /jira/data/{re.tid.1}.parquet
        @found file
        handle @found {
            header Content-Disposition "attachment; filename=\"{re.tid.1}.parquet\""
            file_server
        }
        # Fallback: parquet not at any known static path → defer to app
        # (handles legacy src_data/parquet layout + future connectors).
        reverse_proxy app:8000 {
            header_up X-Forwarded-Proto https
            header_up X-Forwarded-Host {host}
        }
    }

    reverse_proxy app:8000 {
        # App's uvicorn runs with --proxy-headers, so stamping these
        # ourselves makes OAuth callback URLs and Set-Cookie Secure
        # flags resolve to https consistently. X-Forwarded-Host is
        # also Caddy's default, but pinning it explicitly insures
        # against future default changes.
        header_up X-Forwarded-Proto https
        header_up X-Forwarded-Host {host}
    }
}