fix: Devin Review on #188 — try_files fallback + auto-upgrade ordering

Two bugs Devin caught:

1. Caddy `try_files A B C` rewrites the URI to its LAST entry when no
   file matches (per Caddy docs). Without an explicit "back to original
   URI" fallback, a parquet missing from all three known static paths
   would get rewritten to `/jira/data/<id>.parquet`, and the
   reverse_proxy below would forward THAT rewritten URI to app:8000 →
   404. The PR's documented "missed → falls through to app handler"
   promise didn't actually hold for legacy / future connectors. Append
   `/api/data/<id>/download` as the final try_files entry so the
   reverse_proxy receives the analyst-facing URI.

2. agnes-auto-upgrade.sh's TLS-overlay decision (which checks Caddyfile
   existence) ran BEFORE the config re-fetch loop. If a tick's fetch
   added a previously-missing Caddyfile, this tick's docker compose
   would still omit `--profile tls` until the next 5-min tick — a
   window where the recreate uses the wrong overlay set. Move the
   COMPOSE_FILES tls extension AFTER the fetch.

Also strip the workspace prompt of table-list / metric-count
enumerations (per user feedback): those are dynamic snapshots that go
stale; replace with explicit "use `agnes catalog` / `agnes schema` /
`agnes describe` to discover" guidance plus a note about
`rough_size_hint` semantics. The Available Datasets `{% for t in tables %}`
loop is gone — analysts use the live CLI instead.
This commit is contained in:
ZdenekSrotyr 2026-05-05 17:24:42 +02:00
parent 30e81a15b9
commit 7a72ea9c37
3 changed files with 64 additions and 28 deletions

View file

@ -60,8 +60,21 @@
# agnes data dir is mounted at /srv (read-only) instead see the # agnes data dir is mounted at /srv (read-only) instead see the
# `data:/srv:ro` line in docker-compose.yml's caddy service. The # `data:/srv:ro` line in docker-compose.yml's caddy service. The
# root + try_files combo therefore probes /srv/extracts/... # root + try_files combo therefore probes /srv/extracts/...
#
# Devin Review caught: `try_files A B C` rewrites the URI to its
# LAST entry when no file matches (per Caddy docs). Without an
# explicit "rewrite back to original URI" fallback, a parquet
# missing from all three known paths would get rewritten to the
# last static candidate (`/jira/data/<id>.parquet`), and the
# reverse_proxy below would forward THAT rewritten URI to
# app:8000 app has no such route 404. To make the documented
# "missed falls through to app handler" promise hold, append
# the original `/api/data/<id>/download` path as the final
# try_files entry: when no file matches, the URI is rewritten
# back to the analyst-facing path and the app's `download_table`
# handler picks it up via the reverse_proxy fallback below.
root * /srv/extracts root * /srv/extracts
try_files /bigquery/data/{re.tid.1}.parquet /keboola/data/{re.tid.1}.parquet /jira/data/{re.tid.1}.parquet try_files /bigquery/data/{re.tid.1}.parquet /keboola/data/{re.tid.1}.parquet /jira/data/{re.tid.1}.parquet /api/data/{re.tid.1}/download
@found file @found file
handle @found { handle @found {
header Content-Disposition "attachment; filename=\"{re.tid.1}.parquet\"" header Content-Disposition "attachment; filename=\"{re.tid.1}.parquet\""

View file

@ -28,22 +28,31 @@ This workspace is connected to {{ server.url }}.
- **Personal customizations go in `.claude/CLAUDE.local.md`, NOT here.** This file is regenerated by `agnes init --force`; edits here will be lost. CLAUDE.local.md is preserved across regeneration and uploaded on `agnes push`. - **Personal customizations go in `.claude/CLAUDE.local.md`, NOT here.** This file is regenerated by `agnes init --force`; edits here will be lost. CLAUDE.local.md is preserved across regeneration and uploaded on `agnes push`.
## Metrics Workflow ## Metrics Workflow
1. `agnes catalog --metrics` — find the relevant metric ({{ metrics.count }} available, categories: {{ metrics.categories | join(", ") or "none yet" }}) 1. `agnes catalog --metrics` — list registered metrics + categories
2. `agnes catalog --metrics --show <category>/<name>` — read SQL and business rules 2. `agnes catalog --metrics --show <category>/<name>` — read the canonical SQL + business rules
3. Use the canonical SQL from the metric definition, adapt to the question 3. Adapt the canonical SQL; never invent metric calculations
4. Never invent metric calculations — always check existing definitions first
## Data Sync ## Data Sync
- `agnes pull` — download current data from server - `agnes pull` — download current data from server
- `agnes push` — upload sessions and local notes to server - `agnes push` — upload sessions and local notes to server
- Data on the server refreshes every {{ sync_interval }} - Data on the server refreshes every {{ sync_interval }}
## Available Datasets ## Discovering tables — never enumerate from memory
{% for t in tables -%}
- `{{ t.name }}`{% if t.description %} — {{ t.description }}{% endif %}{% if t.query_mode == "remote" %} *(remote, queried on demand)*{% endif %} Tables, columns, sizes, and `query_mode` change as admins register / migrate /
{% else -%} drop entries. Always re-discover from the live server, never from this file:
- _No tables registered yet — ask an admin to register tables in the dashboard._
{% endfor %} ```
agnes catalog --json # canonical list with query_mode, sql_flavor,
# where_examples, fetch_via, rough_size_hint per table
agnes schema <table> # columns + types in the right SQL dialect
agnes describe <table> -n 5 # sample rows (local + materialized only)
```
`rough_size_hint` is server-populated for `local` and `materialized` tables
(`small` ≤100 MiB, `medium` ≤1 GiB, `large` ≤10 GiB, `very_large` >10 GiB) and
`null` for `remote` rows. When `null`, treat the table as potentially large
and use `agnes snapshot create --estimate` to size-check before fetching.
{% if marketplaces -%} {% if marketplaces -%}
## Plugins available to you ## Plugins available to you

View file

@ -53,25 +53,16 @@ IMAGE="ghcr.io/keboola/agnes-the-ai-analyst:${AGNES_TAG:-stable}"
# Array form (vs. word-split string) — quoted expansion survives paths # Array form (vs. word-split string) — quoted expansion survives paths
# with spaces and is the modern bash idiom. Functionally identical here # with spaces and is the modern bash idiom. Functionally identical here
# since /opt/agnes paths are tame, but it's a cheap habit to keep. # since /opt/agnes paths are tame, but it's a cheap habit to keep.
#
# The TLS-overlay decision deliberately runs BELOW the config re-fetch
# (Devin Review caught: this used to live here, evaluating Caddyfile
# existence against the PRE-fetch state. If the fetch added a
# previously-missing Caddyfile, this tick's docker compose would still
# omit `--profile tls` until the next 5-minute tick — a window where
# the recreate uses the wrong overlay set). Base file list is fine to
# initialise here because the tls overlay is the only conditional one.
COMPOSE_FILES=( -f docker-compose.yml -f docker-compose.prod.yml -f docker-compose.host-mount.yml ) COMPOSE_FILES=( -f docker-compose.yml -f docker-compose.prod.yml -f docker-compose.host-mount.yml )
PROFILE_ARGS=() PROFILE_ARGS=()
# `-s` (size > 0) instead of `-f` — guards against the corner case where
# rotate.sh wrote a 0-byte cert and exited (or got SIGKILLed mid-write).
# Bringing up the tls profile against an empty cert would just crash
# Caddy on start; better to fall back to plain :8000 until rotate
# regenerates real bytes. Same `-s` rule for Caddyfile: without it (or
# with an empty one) the caddy service crash-loops while the tls overlay
# has already closed :8000 — net effect is "app unreachable". Skipping
# the overlay keeps the app on plain :8000 until config lands.
if [ -s /data/state/certs/fullchain.pem ] && [ -s /data/state/certs/privkey.pem ] && [ -s Caddyfile ]; then
COMPOSE_FILES+=( -f docker-compose.tls.yml )
PROFILE_ARGS=( --profile tls )
elif [ -s /data/state/certs/fullchain.pem ] && [ -s /data/state/certs/privkey.pem ]; then
logger -t agnes-auto-upgrade "WARN: certs present but Caddyfile missing/empty — skipping tls overlay"
fi
BEFORE=$(docker images --no-trunc --format '{{.Digest}}' "$IMAGE" | head -1)
docker compose "${COMPOSE_FILES[@]}" pull >/dev/null 2>&1
AFTER=$(docker images --no-trunc --format '{{.Digest}}' "$IMAGE" | head -1)
# Re-fetch the bind-mounted config files (compose overlays + Caddyfile) # Re-fetch the bind-mounted config files (compose overlays + Caddyfile)
# from the OSS main branch on every tick. Without this, an image-only # from the OSS main branch on every tick. Without this, an image-only
@ -113,6 +104,29 @@ for f in "${CONFIG_FILES[@]}"; do
done done
CONFIG_AFTER=$(hash_config_files) CONFIG_AFTER=$(hash_config_files)
# `-s` (size > 0) instead of `-f` — guards against the corner case where
# rotate.sh wrote a 0-byte cert and exited (or got SIGKILLed mid-write).
# Bringing up the tls profile against an empty cert would just crash
# Caddy on start; better to fall back to plain :8000 until rotate
# regenerates real bytes. Same `-s` rule for Caddyfile: without it (or
# with an empty one) the caddy service crash-loops while the tls overlay
# has already closed :8000 — net effect is "app unreachable". Skipping
# the overlay keeps the app on plain :8000 until config lands.
#
# Evaluated AFTER the config re-fetch above so a freshly-added or
# freshly-removed Caddyfile is reflected in this tick's compose set,
# not the next one.
if [ -s /data/state/certs/fullchain.pem ] && [ -s /data/state/certs/privkey.pem ] && [ -s Caddyfile ]; then
COMPOSE_FILES+=( -f docker-compose.tls.yml )
PROFILE_ARGS=( --profile tls )
elif [ -s /data/state/certs/fullchain.pem ] && [ -s /data/state/certs/privkey.pem ]; then
logger -t agnes-auto-upgrade "WARN: certs present but Caddyfile missing/empty — skipping tls overlay"
fi
BEFORE=$(docker images --no-trunc --format '{{.Digest}}' "$IMAGE" | head -1)
docker compose "${COMPOSE_FILES[@]}" pull >/dev/null 2>&1
AFTER=$(docker images --no-trunc --format '{{.Digest}}' "$IMAGE" | head -1)
if [ "$BEFORE" != "$AFTER" ] || [ "$CONFIG_BEFORE" != "$CONFIG_AFTER" ]; then if [ "$BEFORE" != "$AFTER" ] || [ "$CONFIG_BEFORE" != "$CONFIG_AFTER" ]; then
REASON=() REASON=()
[ "$BEFORE" != "$AFTER" ] && REASON+=("image digest") [ "$BEFORE" != "$AFTER" ] && REASON+=("image digest")