agnes-the-ai-analyst/Caddyfile
ZdenekSrotyr 1be997f6d4 feat(caddy): file_server for parquet downloads — bypass uvicorn
A single analyst's multi-GB `agnes pull` held the only uvicorn worker
for the duration of the stream, starving the UI, /api/health, and every
other API endpoint; the container flipped to `unhealthy`. Observed on
prod 2026-05-05 while a 6.8 GB `order_economics` pull was in flight.

Caddy now intercepts `GET /api/data/{table_id}/download` and serves
the parquet directly from the data volume (mounted read-only at /srv
inside the caddy container). RBAC is enforced by `forward_auth` calling
a new lightweight `GET /api/data/{table_id}/check-access` endpoint
(returns 204 / 403), so the bulk transfer never reaches uvicorn.
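
The flow above can be modelled in a few lines of Python. This is a minimal sketch of the per-request decision, not Caddy code. It reuses the matcher regex from the Caddyfile below. `route_download`, `fake_check`, `ALLOWED`, and the string results are hypothetical names for illustration. check-access is stubbed as a plain function that returns a status code, in place of a real HTTP subrequest.

```python
import re
from typing import Callable

# Same pattern as the Caddyfile's `@download path_regexp tid ...` matcher.
DOWNLOAD_RE = re.compile(r"^/api/data/([^/]+)/download$")


def route_download(path: str, check_access: Callable[[str], int]) -> str:
    """Hypothetical model of the intercept decision (not Caddy code).

    "app"              -> matcher missed; normal reverse_proxy to uvicorn
    "denied:<status>"  -> check-access answered non-2xx; Caddy relays that
                          response to the client
    "serve:<table_id>" -> auth passed; continue to the try_files stage
    """
    m = DOWNLOAD_RE.match(path)
    if m is None:
        return "app"
    table_id = m.group(1)  # the value Caddy exposes as {re.tid.1}
    status = check_access(f"/api/data/{table_id}/check-access")
    if not 200 <= status < 300:
        return f"denied:{status}"
    return f"serve:{table_id}"


# Usage with a stub RBAC table (hypothetical): 204 if allowed, else 403.
ALLOWED = {"order_economics"}


def fake_check(uri: str) -> int:
    table_id = uri.split("/")[3]  # uri is "/api/data/<id>/check-access"
    return 204 if table_id in ALLOWED else 403


print(route_download("/api/data/order_economics/download", fake_check))  # serve:order_economics
print(route_download("/api/data/secret/download", fake_check))           # denied:403
print(route_download("/api/data/a/b/download", fake_check))              # app
```

Because `[^/]+` cannot span a slash, a table id can never carry path separators into the later `{re.tid.1}.parquet` lookup.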

Path discovery via `try_files` over the known extract.duckdb v2 source
subdirs. Anything not at a static path falls through to the existing
app handler so legacy `src_data/parquet` and future connectors still
work without a Caddyfile change. Non-Caddy deployments are unchanged.
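
The try_files probe described above reduces to an ordered existence check. This is a sketch, assuming the three subdirs the Caddyfile currently lists (`bigquery`, `keboola`, `jira`) and a root directory standing in for /srv/extracts. `find_static_parquet` and `SOURCE_SUBDIRS` are hypothetical names. `Path.is_file()` only approximates Caddy's own existence check.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Source subdirs in the order the Caddyfile's try_files lists them.
SOURCE_SUBDIRS = ("bigquery", "keboola", "jira")


def find_static_parquet(root: Path, table_id: str) -> Optional[Path]:
    """Hypothetical model of the try_files probe (not Caddy code).

    Returns the first <root>/<source>/data/<table_id>.parquet that exists,
    or None, in which case Caddy leaves the path unrewritten and the request
    falls through to the app's reverse_proxy.
    """
    for source in SOURCE_SUBDIRS:
        candidate = root / source / "data" / f"{table_id}.parquet"
        # is_file() approximates Caddy's check: it must exist and not be a directory.
        if candidate.is_file():
            return candidate
    return None


# Usage: a throwaway tree standing in for /srv/extracts.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for source in ("keboola", "jira"):
        (root / source / "data").mkdir(parents=True)
        (root / source / "data" / "orders.parquet").write_bytes(b"")
    hit = find_static_parquet(root, "orders")
    print(hit.relative_to(root).as_posix())      # keboola/data/orders.parquet
    print(find_static_parquet(root, "missing"))  # None
```

Order matters because the first hit wins. A table present under two sources is served from whichever subdir is listed first.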

Stage 1 (multi-worker uvicorn) was considered but is blocked by the
single-writer DuckDB lock on system.duckdb: with workers > 1, startup
would crash on "Could not set lock on file", the same lock conflict that
pushed the scheduler from in-process writes to HTTP-via-app. Multi-reader
workers plus single-writer coordination is out of scope for this PR.
2026-05-05 16:41:33 +02:00

{$DOMAIN:localhost} {
	# Cert provisioning. Driven by env var CADDY_TLS:
	#   - unset (default) → cert-file mode for corporate PKI (rotated by
	#     scripts/ops/agnes-tls-rotate.sh into /data/state/certs/).
	#   - "tls <email>"   → Let's Encrypt auto-issue, e.g. "tls ops@example.com"
	#     (used by public-internet deployments).
	#   - "tls internal"  → Caddy-managed self-signed cert (lab/dev only;
	#     browsers warn on every visit).
	#
	# The {$VAR:default} substitution lets one Caddyfile serve all three
	# regimes without per-deployment forks. Caddy substitutes env vars
	# before parsing and then reads the result as a directive, so the
	# value MUST start with `tls `.
	{$CADDY_TLS:tls /certs/fullchain.pem /certs/privkey.pem} {
		# Modern TLS only. Caddy's default already excludes TLS 1.0/1.1,
		# but pin explicitly so a future default change can't silently
		# weaken our posture.
		protocols tls1.2 tls1.3
	}
	# Security headers
	header {
		# HSTS: tell compliant browsers to refuse plain HTTP for this host
		# for a year. Skipping `preload` to keep an escape hatch: once a
		# host is on the browsers' preload list, removal is slow and
		# effectively blocks rollback. Skipping `includeSubDomains`
		# because we don't control subdomains.
		Strict-Transport-Security "max-age=31536000"
		# Prevent clickjacking: the dashboard is never embedded in iframes
		X-Frame-Options "DENY"
		# Prevent MIME-type sniffing: the browser must honor the declared Content-Type
		X-Content-Type-Options "nosniff"
		# Send the full URL as referrer on same-origin requests, only the
		# origin on cross-origin requests, and nothing on HTTPS → HTTP
		# downgrades
		Referrer-Policy "strict-origin-when-cross-origin"
		# Strip the Server header to avoid fingerprinting the reverse proxy
		-Server
	}
	# Direct file_server for parquet downloads. Bypasses uvicorn so a
	# multi-GB pull from one analyst can't tie up the app worker and
	# block UI / health / API for everyone else. forward_auth calls the
	# app's lightweight `/api/data/{id}/check-access` (RBAC only,
	# ~1 ms) on every request; on 2xx Caddy serves the file itself from
	# the read-only data volume. (sendfile/zero-copy applies only to
	# plaintext connections; over TLS Caddy still copies through
	# userspace to encrypt, but no uvicorn worker is held.)
	#
	# Path layout matches `app/api/data.py`'s extract.duckdb v2 search:
	#   /data/extracts/<source_type>/data/<table_id>.parquet
	# try_files probes the known source subdirs in order; first hit wins.
	# If a deployment adds a connector that lands parquets in a new
	# subdir, extend the try_files list. Anything that misses falls
	# through to the app reverse_proxy below, so an unmapped source
	# degrades to "downloads work, just through uvicorn" rather than a
	# Caddy-side 404.
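	@download path_regexp tid ^/api/data/([^/]+)/download$
	handle @download {
		forward_auth app:8000 {
			uri /api/data/{re.tid.1}/check-access
			# forward_auth sends a copy of the client's request headers to
			# check-access, so the Bearer PAT (Authorization) or session
			# cookie (Cookie) reaches the RBAC check without extra config.
			# copy_headers is deliberately absent: it copies headers from
			# the auth *response* onto the original request, and
			# check-access answers with a bare 204 / 403.
		}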
		# Caddy's own /data is occupied by the caddy_data volume, so the
		# agnes data dir is mounted at /srv (read-only) instead; see the
		# `data:/srv:ro` line in docker-compose.yml's caddy service. The
		# root + try_files combo therefore probes /srv/extracts/...
		root * /srv/extracts
		try_files /bigquery/data/{re.tid.1}.parquet /keboola/data/{re.tid.1}.parquet /jira/data/{re.tid.1}.parquet
		@found file
		handle @found {
			header Content-Disposition "attachment; filename=\"{re.tid.1}.parquet\""
			file_server
		}
		# Fallback: parquet not at any known static path → defer to app
		# (handles legacy src_data/parquet layout + future connectors).
		reverse_proxy app:8000 {
			header_up X-Forwarded-Proto https
			header_up X-Forwarded-Host {host}
		}
	}
	reverse_proxy app:8000 {
		# Caddy already sets X-Forwarded-Proto (https, since TLS ends
		# here) and X-Forwarded-Host by default; pinning them explicitly
		# guards against future default changes. The app's uvicorn runs
		# with --proxy-headers, which honors X-Forwarded-Proto, so OAuth
		# callback URLs and Set-Cookie Secure flags resolve to https
		# consistently.
		header_up X-Forwarded-Proto https
		header_up X-Forwarded-Host {host}
	}
}