agnes-the-ai-analyst/Caddyfile
ZdenekSrotyr 7a72ea9c37 fix: Devin Review on #188 — try_files fallback + auto-upgrade ordering
Two bugs Devin caught:

1. Caddy `try_files A B C` rewrites the URI to its LAST entry when no
   file matches (per Caddy docs). Without an explicit "back to original
   URI" fallback, a parquet missing from all three known static paths
   would get rewritten to `/jira/data/<id>.parquet`, and the
   reverse_proxy below would forward THAT rewritten URI to app:8000 →
   404. The PR's documented "missed → falls through to app handler"
   promise didn't actually hold for legacy / future connectors. Append
   `/api/data/<id>/download` as the final try_files entry so the
   reverse_proxy receives the analyst-facing URI.

2. agnes-auto-upgrade.sh's TLS-overlay decision (which checks Caddyfile
   existence) ran BEFORE the config re-fetch loop. If a tick's fetch
   added a previously-missing Caddyfile, this tick's docker compose
   would still omit `--profile tls` until the next 5-min tick — a
   window where the recreate uses the wrong overlay set. Move the
   COMPOSE_FILES tls extension AFTER the fetch.

Also strip the workspace prompt of table-list / metric-count
enumerations (per user feedback): those are dynamic snapshots that go
stale; replace with explicit "use `agnes catalog` / `agnes schema` /
`agnes describe` to discover" guidance plus a note about
`rough_size_hint` semantics. The Available Datasets `{% for t in tables %}`
loop is gone — analysts use the live CLI instead.
2026-05-05 17:24:42 +02:00

{$DOMAIN:localhost} {
    # Cert provisioning. Driven by env var CADDY_TLS:
    #   - unset (default) → cert-file mode for corporate PKI (rotated by
    #     scripts/ops/agnes-tls-rotate.sh into /data/state/certs/).
    #   - "tls <email>" → Let's Encrypt auto-issue, e.g. "tls ops@example.com"
    #     (used by public-internet deployments).
    #   - "tls internal" → Caddy-managed self-signed cert (lab/dev only,
    #     browser warning on every visit).
    #
    # The {$VAR:default} substitution lets one Caddyfile serve all three
    # regimes without per-deployment forks. Caddyfile parses the substituted
    # string as a directive, so the value MUST start with `tls `.
    {$CADDY_TLS:tls /certs/fullchain.pem /certs/privkey.pem} {
        # Modern TLS only. Caddy default already excludes 1.0/1.1 in
        # most builds, but pin explicitly so a future Caddy default
        # change can't silently weaken our posture.
        protocols tls1.2 tls1.3
    }
    # Security headers
    header {
        # HSTS: tell compliant browsers to refuse plain-HTTP for this host
        # for a year. Skipping `preload` so we keep an escape hatch (preload
        # submission is hard-bound and blocks rollback). Skipping
        # `includeSubDomains` because we don't control subdomains.
        Strict-Transport-Security "max-age=31536000"
        # Prevent clickjacking — dashboard is not embedded in iframes
        X-Frame-Options "DENY"
        # Prevent MIME-type sniffing — browser must honor declared Content-Type
        X-Content-Type-Options "nosniff"
        # Limit referrer leakage to origin on same-site navigations only
        Referrer-Policy "strict-origin-when-cross-origin"
        # Strip Server header to avoid fingerprinting the reverse proxy
        -Server
    }
    # Direct file_server for parquet downloads — bypasses uvicorn so a
    # multi-GB pull from one analyst can't starve the app workers and
    # block UI / health / API for everyone else. forward_auth calls the
    # app's lightweight ``/api/data/{id}/check-access`` (RBAC only,
    # ~1 ms) on every request; on 2xx Caddy serves the file directly
    # via sendfile/zero-copy from the data volume mounted read-only.
    #
    # Path layout matches `app/api/data.py`'s extract.duckdb v2 search:
    #   /data/extracts/<source_type>/data/<table_id>.parquet
    # try_files probes known source subdirs in order; first hit wins.
    # If a deployment adds a new connector and lands parquets at a fresh
    # subdir, extend the try_files list. Anything that misses falls
    # through to the app reverse_proxy below — so an unmapped source
    # degrades to "downloads work, just through uvicorn" — never 404.
    @download path_regexp tid ^/api/data/([^/]+)/download$
    handle @download {
        forward_auth app:8000 {
            uri /api/data/{re.tid.1}/check-access
            # Bearer PAT or session cookie travels in Authorization
            # / Cookie; copy_headers ensures the upstream sees them.
            copy_headers Authorization Cookie
        }
        # Caddy's own /data is occupied by the caddy_data volume, so the
        # agnes data dir is mounted at /srv (read-only) instead — see the
        # `data:/srv:ro` line in docker-compose.yml's caddy service. The
        # root + try_files combo therefore probes /srv/extracts/...
        #
        # Devin Review caught: `try_files A B C` rewrites the URI to its
        # LAST entry when no file matches (per Caddy docs). Without an
        # explicit "rewrite back to original URI" fallback, a parquet
        # missing from all three known paths would get rewritten to the
        # last static candidate (`/jira/data/<id>.parquet`), and the
        # reverse_proxy below would forward THAT rewritten URI to
        # app:8000 → app has no such route → 404. To make the documented
        # "missed → falls through to app handler" promise hold, append
        # the original `/api/data/<id>/download` path as the final
        # try_files entry: when no file matches, the URI is rewritten
        # back to the analyst-facing path and the app's `download_table`
        # handler picks it up via the reverse_proxy fallback below.
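        #
        # Sketch of extending the list for a new connector, following the
        # convention described above. The subdir name `newsource` is
        # hypothetical (an assumption for illustration, not an existing
        # connector). Add its candidate before the final
        # `/api/data/{re.tid.1}/download` entry so the analyst-facing
        # path stays last:
        #   try_files /bigquery/data/{re.tid.1}.parquet /keboola/data/{re.tid.1}.parquet /jira/data/{re.tid.1}.parquet /newsource/data/{re.tid.1}.parquet /api/data/{re.tid.1}/download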
        root * /srv/extracts
        try_files /bigquery/data/{re.tid.1}.parquet /keboola/data/{re.tid.1}.parquet /jira/data/{re.tid.1}.parquet /api/data/{re.tid.1}/download
        @found file
        handle @found {
            header Content-Disposition "attachment; filename=\"{re.tid.1}.parquet\""
            file_server
        }
        # Fallback: parquet not at any known static path → defer to app
        # (handles legacy src_data/parquet layout + future connectors).
        reverse_proxy app:8000 {
            header_up X-Forwarded-Proto https
            header_up X-Forwarded-Host {host}
        }
    }
    reverse_proxy app:8000 {
        # App's uvicorn runs with --proxy-headers, so stamping these
        # ourselves makes OAuth callback URLs and Set-Cookie Secure
        # flags resolve to https consistently. X-Forwarded-Host is
        # also Caddy's default, but pinning it explicitly insures
        # against future default changes.
        header_up X-Forwarded-Proto https
        header_up X-Forwarded-Host {host}
    }
}