Two bugs Devin caught:
1. Caddy `try_files A B C` rewrites the URI to its LAST entry when no
file matches (per Caddy docs). Without an explicit "back to original
URI" fallback, a parquet missing from all three known static paths
would get rewritten to `/jira/data/<id>.parquet`, and the
reverse_proxy below would forward THAT rewritten URI to app:8000 →
404. The PR's documented "missed → falls through to app handler"
promise didn't actually hold for legacy / future connectors. Append
`/api/data/<id>/download` as the final try_files entry so the
reverse_proxy receives the analyst-facing URI.
2. agnes-auto-upgrade.sh's TLS-overlay decision (which checks Caddyfile
existence) ran BEFORE the config re-fetch loop. If a tick's fetch
added a previously-missing Caddyfile, this tick's docker compose
would still omit `--profile tls` until the next 5-min tick — a
window where the recreate uses the wrong overlay set. Move the
COMPOSE_FILES tls extension AFTER the fetch.
Also strip the workspace prompt of table-list / metric-count
enumerations (per user feedback): those are dynamic snapshots that go
stale; replace with explicit "use `agnes catalog` / `agnes schema` /
`agnes describe` to discover" guidance plus a note about
`rough_size_hint` semantics. The Available Datasets `{% for t in tables %}`
loop is gone — analysts use the live CLI instead.
Three concrete changes addressing the "analyst Claude misuses the CLI"
class of bugs (image.png table — issues #3, #5, plus the recurrent
"how big is this table" guesswork):
1. config/claude_md_template.txt — the template agnes init writes to
<workspace>/CLAUDE.md. Surfaces every catalog-row field with a why,
adds a query_mode-based decision tree, explicit --estimate scoping
(snapshot create ONLY — was the #1 first-try error), an agnes fetch
→ agnes snapshot create rename note, and a 6-row failure-mode table
that maps each common error wording to its right next step.
2. app/api/v2_catalog.py — populate rough_size_hint for local +
materialized rows from the on-disk parquet size, bucketed
small/medium/large/very_large. Was hardcoded null with a TODO; AI
couldn't tell "is this 6.8 GB" without a failed --remote round-trip.
3. cli/update_check.py — the [update] banner survived the da→agnes
rename and printed "[update] da X is out of date" on every command,
training analysts to associate the binary with the old name.
Verified by rendering the template against representative contexts
(33/33 tests pass) and running every use case from the original
screenshot through the real CLI against a dev VM.
13 Devin findings across 10 files:
🔴 Critical:
- app/api/v2_catalog.py:42 — `_fetch_hint` returns `da fetch` in /api/v2/catalog
responses (user-visible in every catalog list)
- cli/skills/agnes-data-querying.md — 11 stale `da fetch`/`da sync` refs in the
bundled skill markdown
- config/claude_md_template.txt:38 — referenced `agnes pull --docs-only` flag
that does NOT exist in agnes pull (removed; spec only ships --quiet/--json/
--dry-run)
🟡 Important:
- app/api/admin.py:252 — `da fetch` in bq_max_scan_bytes hint
- cli/commands/auth.py:119 — `da sync` in import-token docstring (--help text)
- cli/commands/tokens.py:48 — "Export it so `da` can use it" prose
- ARCHITECTURE.md — 4 stale rows in CLI commands table
- README.md — stale paragraphs for analysts (da sync, da analyst setup)
🚩 Substantive observations addressed:
- app/api/query.py:249,302,489 — server-side error/help strings still said
`da sync`/`da fetch` (returned in API responses to clients)
- cli/commands/snapshot.py:235-241 — DuckDB existence guard incorrectly
blocked `--estimate` (server-side dry-run that never opens local DB).
Added test ensuring estimate path skips the guard.
Skipped (intentionally historical):
- app/api/admin.py:2377,2429,2437 — historical comments describing past
manifest-vs-sync_state bug; past tense, accurate to keep as `da sync`.
- Verb renames (da X -> agnes X for surviving verbs; legacy verbs already
absent from this default template — admin overrides with legacy verbs are
caught by Task 2's _LEGACY_STRINGS scan + Task 5's admin banner).
- Path renames: data/parquet/ -> server/parquet/, data/duckdb/ ->
user/duckdb/, data/metadata/ removed entirely (no longer exists per spec).
- Drop user/artifacts/ from directory structure (spec workspace layout
drops it; surviving paths: server/parquet/, user/duckdb/, user/snapshots/,
user/sessions/).
- Add AGNES_WORKSPACE.md pointer near top-of-template so analysts know
where to find human-readable docs.
Cleans Task 0.5's missed sweep on this file (was not in cli/ tree but is
user-visible via /api/welcome).
81 claude_md/welcome_template tests pass.
- admin_welcome.html: update subtitle, description, placeholder cheatsheet
(drop tables/metrics/marketplaces/sync_interval; add user-null note and
security note). Textarea initial value is now empty (no default template
to show). Preview pane uses innerHTML (HTML output). refreshStatus sets
editor to empty when no override. Preview pane styled as light surface.
Reset modal copy updated (no banner shown, not "OSS-shipped template").
- config/claude_md_template.txt: deleted (markdown template is gone;
default is now no banner).
- docs/agent-setup-prompt.md: rewritten for variant C — describes the
/setup banner, smaller placeholder table, security/sanitization notes,
anonymous-user guard, example HTML snippet.
* fix(analyst): document BigQuery remote-query capability in bootstrap CLAUDE.md template
Closes#153.
The CLAUDE.md template generated by `da analyst bootstrap` (config/claude_md_template.txt)
covered metrics, sync, corporate memory, and directory layout — but had ZERO
mention of query_mode: "remote", da fetch, da query --remote, or --register-bq.
Result: the AI analyst running in a freshly-bootstrapped workspace had no
idea BigQuery-backed tables existed, no path to fetch unsynced data, and no
fallback for tables not in the catalog.
Validated against /Users/<user>/foundry-ai/foundryai-data-analyst/CLAUDE.md
on 2026-05-01: section confirmed missing. Workspace-level (parent-dir)
CLAUDE.md carried legacy SSH-heredoc instructions but the analyst-level
file (which Claude reads as primary project context) had nothing.
## Changes
### config/claude_md_template.txt (+83)
Added a `## Remote Queries (BigQuery)` section covering:
- Discovery first — `da catalog --json | jq '...'` to see all tables
with their query_mode, then `da schema` and `da describe` for shape.
- Three query patterns:
- `da fetch` (preferred) — materialize a filtered subset locally,
query the snapshot, drop when done.
- `da query --remote` — one-shot server-side execution (cheap probes).
- `da query --register-bq` — hybrid joins between local + ad-hoc BQ.
- `da fetch` estimate-first discipline — rules of thumb on
--select / --where / --estimate / snapshot reuse.
- BigQuery SQL flavor cheat sheet for `--where` (DATE literal,
DATE_SUB, REGEXP_CONTAINS, CAST AS INT64).
- Unknown-table fallback: when a table isn't in `da catalog` at all,
use ad-hoc `--register-bq` if the agnes server SA has BQ access, or
ask admin to register with `query_mode: "remote"` for ongoing use.
- Pointer to `da skills show agnes-data-querying` for deeper guidance.
### docs/setup/claude_md_template.txt (deleted)
Stale 359-line template that documented the deprecated SSH-heredoc
remote_query.sh protocol. No code references it (verified via grep
across .py / .sh / .yml / .md). Removing eliminates two failure
modes:
1. A future refactor accidentally pulling it into a workspace and
shipping deprecated guidance to analyst Claude sessions.
2. Reviewer confusion over which template is canonical.
### CHANGELOG.md
`### Fixed` and `### Removed` entries under [Unreleased].
## Tested
- Manually walked the diff against `da skills show agnes-data-querying`
output on a live VM (foundryai-development) — patterns + flags
match the modern CLI exactly.
- Re-bootstrap test deferred: requires network round-trip; pattern
is identical to existing template substitution path so render is
not at risk.
## Out of scope
- The companion gap that data_description.md often only enumerates
query_mode: "local" tables (no signal that other modes exist) —
separate concern, fix likely belongs in the metadata generator
on the server side, not in the analyst template.
- Encouraging admins to register frequently-queried BQ tables as
`query_mode: "remote"` in the registry — workflow improvement, not
a code bug.
* chore(release): cut 0.28.0
---------
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>