feat(workspace-prompt): decision tree + size-hint so analyst Claude gets it right first try
Three concrete changes addressing the "analyst Claude misuses the CLI" class of bugs (image.png table — issues #3, #5, plus the recurrent "how big is this table" guesswork): 1. config/claude_md_template.txt — the template agnes init writes to <workspace>/CLAUDE.md. Surfaces every catalog-row field with a why, adds a query_mode-based decision tree, explicit --estimate scoping (snapshot create ONLY — was the #1 first-try error), an agnes fetch → agnes snapshot create rename note, and a 6-row failure-mode table that maps each common error wording to its right next step. 2. app/api/v2_catalog.py — populate rough_size_hint for local + materialized rows from the on-disk parquet size, bucketed small/medium/large/very_large. Was hardcoded null with a TODO; AI couldn't tell "is this 6.8 GB" without a failed --remote round-trip. 3. cli/update_check.py — the [update] banner survived the da→agnes rename and printed "[update] da X is out of date" on every command, training analysts to associate the binary with the old name. Verified by rendering the template against representative contexts (33/33 tests pass) and running every use case from the original screenshot through the real CLI against a dev VM.
This commit is contained in:
parent
2ae486bc5d
commit
30e81a15b9
4 changed files with 114 additions and 13 deletions
|
|
@ -15,6 +15,11 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C
|
||||||
- **Per-user parallel parquet downloads in `agnes pull`** — the download loop in `cli/lib/pull.py` now uses a `ThreadPoolExecutor` with concurrency capped by the new `AGNES_PULL_PARALLELISM` env var (default 4, set 1 to restore pre-PR serial behavior). On a registry of N tables the wall-clock time drops from `Σ stream_download_seconds(table_i)` to roughly `max × ceil(N/4)`. Works hand-in-hand with the Caddy `file_server` change below: without it parallel client-side downloads would still queue on the single uvicorn worker; with it each request is its own caddy goroutine + sendfile, so 4-way parallelism actually delivers throughput. Per-table error semantics preserved — a failure on one table no longer aborts the rest of the batch.
|
- **Per-user parallel parquet downloads in `agnes pull`** — the download loop in `cli/lib/pull.py` now uses a `ThreadPoolExecutor` with concurrency capped by the new `AGNES_PULL_PARALLELISM` env var (default 4, set 1 to restore pre-PR serial behavior). On a registry of N tables the wall-clock time drops from `Σ stream_download_seconds(table_i)` to roughly `max × ceil(N/4)`. Works hand-in-hand with the Caddy `file_server` change below: without it parallel client-side downloads would still queue on the single uvicorn worker; with it each request is its own caddy goroutine + sendfile, so 4-way parallelism actually delivers throughput. Per-table error semantics preserved — a failure on one table no longer aborts the rest of the batch.
|
||||||
- **`scripts/ops/agnes-auto-upgrade.sh` now re-fetches Caddyfile + every compose overlay** from `keboola/agnes-the-ai-analyst@main` on every tick, hashes them, and triggers a `docker compose up -d` recreation when the hash changes — same path as an image-digest change. Pre-fix the script only watched `docker images` digests, so a Caddyfile or compose change in main never reached running VMs (only fresh boots ran `startup.sh`'s file fetch). Without this, the new file_server downloads-path below would land in the image but stay inert against an old Caddyfile. The script also self-updates from the same path so the very fix that watches config files isn't itself stuck on running VMs. Fail-soft on curl errors — keeps the existing file rather than blanking it.
|
- **`scripts/ops/agnes-auto-upgrade.sh` now re-fetches Caddyfile + every compose overlay** from `keboola/agnes-the-ai-analyst@main` on every tick, hashes them, and triggers a `docker compose up -d` recreation when the hash changes — same path as an image-digest change. Pre-fix the script only watched `docker images` digests, so a Caddyfile or compose change in main never reached running VMs (only fresh boots ran `startup.sh`'s file fetch). Without this, the new file_server downloads-path below would land in the image but stay inert against an old Caddyfile. The script also self-updates from the same path so the very fix that watches config files isn't itself stuck on running VMs. Fail-soft on curl errors — keeps the existing file rather than blanking it.
|
||||||
- **Caddy `file_server` for parquet downloads** — `GET /api/data/{table_id}/download` is now intercepted at the Caddy layer (TLS profile only) and served directly via sendfile/zero-copy from the data volume mounted read-only at `/srv` inside the caddy container. Caddy authorises every request via a new lightweight RBAC probe `GET /api/data/{table_id}/check-access` (returns 204 when the caller has read access on the table, 403 otherwise) using the `forward_auth` directive — the bulk byte transfer never touches uvicorn workers. Resolves a real production failure mode where a single multi-GB analyst pull held the app's only uvicorn worker for the duration of the stream and starved the UI / `/api/health` / every other API endpoint, eventually flipping the container to `unhealthy`. Path discovery uses Caddy's `try_files` over the known `extract.duckdb` v2 source subdirs (`bigquery/data/<id>.parquet`, `keboola/data/<id>.parquet`, `jira/data/<id>.parquet`); a parquet not at any of those paths transparently falls through to the existing app handler so legacy `src_data/parquet` layouts and future connectors keep working with no Caddyfile change. Non-Caddy deployments (dev `docker compose up` without `--profile tls`) continue to use the app handler unchanged.
|
- **Caddy `file_server` for parquet downloads** — `GET /api/data/{table_id}/download` is now intercepted at the Caddy layer (TLS profile only) and served directly via sendfile/zero-copy from the data volume mounted read-only at `/srv` inside the caddy container. Caddy authorises every request via a new lightweight RBAC probe `GET /api/data/{table_id}/check-access` (returns 204 when the caller has read access on the table, 403 otherwise) using the `forward_auth` directive — the bulk byte transfer never touches uvicorn workers. Resolves a real production failure mode where a single multi-GB analyst pull held the app's only uvicorn worker for the duration of the stream and starved the UI / `/api/health` / every other API endpoint, eventually flipping the container to `unhealthy`. Path discovery uses Caddy's `try_files` over the known `extract.duckdb` v2 source subdirs (`bigquery/data/<id>.parquet`, `keboola/data/<id>.parquet`, `jira/data/<id>.parquet`); a parquet not at any of those paths transparently falls through to the existing app handler so legacy `src_data/parquet` layouts and future connectors keep working with no Caddyfile change. Non-Caddy deployments (dev `docker compose up` without `--profile tls`) continue to use the app handler unchanged.
|
||||||
|
- **Workspace prompt: decision tree, common-mistakes callout, failure-mode dictionary** in `config/claude_md_template.txt` (the template `agnes init` writes to `<workspace>/CLAUDE.md`). Surfaces every catalog-row field analyst Claude should read before deciding which command to use (`query_mode`, `sql_flavor`, `where_examples`, `fetch_via`, `rough_size_hint`); explicitly binds `--estimate` to `agnes snapshot create` ONLY (was the most-failed first-try misuse — fails with `No such option: --estimate` on `agnes query`); calls out the `agnes fetch` → `agnes snapshot create` rename so stale-doc analysts don't run a non-command; documents the BQ permission model (server SA, not personal Google identity) and a 6-row failure-mode table mapping each common error wording to its cause + the right next step.
|
||||||
|
- **`rough_size_hint` populated for `local` + `materialized` catalog rows** in `GET /api/v2/catalog` (was hardcoded `null` with a "Task 8" TODO). Reads the parquet file size at `${DATA_DIR}/extracts/<source_type>/data/<table_id>.parquet` and buckets into `small` (≤100 MiB), `medium` (≤1 GiB), `large` (≤10 GiB), `very_large` (>10 GiB). `remote` rows stay `null` for now (size requires a BQ INFORMATION_SCHEMA call; tracked separately). Lets analyst Claude pick `agnes snapshot create` over `agnes query --remote` by inspecting `agnes catalog --json` rather than discovering size empirically via a failed `--remote` round-trip.
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- **CLI update-banner now says `agnes` instead of `da`** (`cli/update_check.py:format_outdated_notice`). The string `[update] da X is out of date` had survived the `da` → `agnes` CLI rename and was the most-visible stale identifier in the analyst-facing surface — every CLI command printed it on stderr when a newer wheel was available.
|
||||||
|
|
||||||
### Fixed
|
### Fixed
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -2,10 +2,12 @@
|
||||||
|
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
from datetime import datetime, timezone
|
from datetime import datetime, timezone
|
||||||
|
from pathlib import Path
|
||||||
from fastapi import APIRouter, Depends
|
from fastapi import APIRouter, Depends
|
||||||
import duckdb
|
import duckdb
|
||||||
|
|
||||||
from app.auth.dependencies import get_current_user, _get_db
|
from app.auth.dependencies import get_current_user, _get_db
|
||||||
|
from app.utils import get_data_dir as _get_data_dir
|
||||||
from src.rbac import can_access_table
|
from src.rbac import can_access_table
|
||||||
from src.repositories.table_registry import TableRegistryRepository
|
from src.repositories.table_registry import TableRegistryRepository
|
||||||
from app.api.v2_cache import TTLCache
|
from app.api.v2_cache import TTLCache
|
||||||
|
|
@ -43,6 +45,52 @@ def _fetch_hint(table_id: str, source_type: str) -> str:
|
||||||
return "already local — query directly via `agnes query`"
|
return "already local — query directly via `agnes query`"
|
||||||
|
|
||||||
|
|
||||||
|
# Coarse size buckets for `rough_size_hint`. Boundaries chosen so an analyst
|
||||||
|
# Claude can decide tool by inspection: anything `large` or worse implies
|
||||||
|
# `agnes snapshot create` over `agnes query --remote`. Numbers reflect the
|
||||||
|
# default `bq_max_scan_bytes` 5 GiB ceiling — at "large" you're already at
|
||||||
|
# half the per-query gate and a naive `--remote` is likely to refuse.
|
||||||
|
_SIZE_BUCKETS = (
|
||||||
|
(10 * 2**20, "small"), # ≤10 MiB
|
||||||
|
(100 * 2**20, "small"), # ≤100 MiB still small (analyst-laptop scale)
|
||||||
|
(1 * 2**30, "medium"), # ≤1 GiB
|
||||||
|
(10 * 2**30, "large"), # ≤10 GiB
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _bucket_size(byte_count: int) -> str:
|
||||||
|
for cap, label in _SIZE_BUCKETS:
|
||||||
|
if byte_count <= cap:
|
||||||
|
return label
|
||||||
|
return "very_large"
|
||||||
|
|
||||||
|
|
||||||
|
def _materialized_size_hint(table_id: str, source_type: str, query_mode: str) -> str | None:
|
||||||
|
"""Return a rough size bucket for a row whose data is on the server's
|
||||||
|
local filesystem (any `query_mode` that produces a parquet — `local` and
|
||||||
|
`materialized`). Returns ``None`` for `remote` (size requires a BQ
|
||||||
|
INFORMATION_SCHEMA round-trip; tracked separately) and for tables whose
|
||||||
|
parquet hasn't been materialised yet so the AI gets ``null`` not a
|
||||||
|
misleading "small".
|
||||||
|
|
||||||
|
Layout matches the v2 extract.duckdb contract:
|
||||||
|
${DATA_DIR}/extracts/<source_type>/data/<table_id>.parquet
|
||||||
|
"""
|
||||||
|
if query_mode == "remote":
|
||||||
|
return None
|
||||||
|
if not source_type:
|
||||||
|
return None
|
||||||
|
try:
|
||||||
|
path = Path(_get_data_dir()) / "extracts" / source_type / "data" / f"{table_id}.parquet"
|
||||||
|
if not path.exists():
|
||||||
|
return None
|
||||||
|
return _bucket_size(path.stat().st_size)
|
||||||
|
except Exception:
|
||||||
|
# Filesystem stat() race / permissions / weird DATA_DIR — fall back
|
||||||
|
# to null rather than crash the whole catalog response.
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
def build_catalog(conn: duckdb.DuckDBPyConnection, user: dict) -> dict:
|
def build_catalog(conn: duckdb.DuckDBPyConnection, user: dict) -> dict:
|
||||||
rows = _table_rows_cache.get(_TABLE_ROWS_KEY)
|
rows = _table_rows_cache.get(_TABLE_ROWS_KEY)
|
||||||
if rows is None:
|
if rows is None:
|
||||||
|
|
@ -66,7 +114,10 @@ def build_catalog(conn: duckdb.DuckDBPyConnection, user: dict) -> dict:
|
||||||
"sql_flavor": _flavor_for(r.get("source_type") or ""),
|
"sql_flavor": _flavor_for(r.get("source_type") or ""),
|
||||||
"where_examples": _examples_for(r.get("source_type") or ""),
|
"where_examples": _examples_for(r.get("source_type") or ""),
|
||||||
"fetch_via": _fetch_hint(r["id"], r.get("source_type") or ""),
|
"fetch_via": _fetch_hint(r["id"], r.get("source_type") or ""),
|
||||||
"rough_size_hint": None, # populated by Task 8 schema endpoint when called
|
"rough_size_hint": _materialized_size_hint(
|
||||||
|
r["id"], r.get("source_type") or "",
|
||||||
|
r.get("query_mode") or "local",
|
||||||
|
),
|
||||||
})
|
})
|
||||||
|
|
||||||
return {
|
return {
|
||||||
|
|
|
||||||
|
|
@ -184,7 +184,7 @@ def format_outdated_notice(info: UpdateInfo) -> str:
|
||||||
literal string "None" into a copy-pasteable command — drop the upgrade
|
literal string "None" into a copy-pasteable command — drop the upgrade
|
||||||
snippet in that case.
|
snippet in that case.
|
||||||
"""
|
"""
|
||||||
msg = f"[update] da {info.installed} is out of date — latest on this server is {info.latest}."
|
msg = f"[update] agnes {info.installed} is out of date — latest on this server is {info.latest}."
|
||||||
if info.download_url:
|
if info.download_url:
|
||||||
msg += f" Upgrade: uv tool install --force {info.download_url}"
|
msg += f" Upgrade: uv tool install --force {info.download_url}"
|
||||||
return msg
|
return msg
|
||||||
|
|
|
||||||
|
|
@ -58,15 +58,43 @@ Not every table is synced. Tables registered with `query_mode: "remote"` live in
|
||||||
BigQuery, accessed server-side via DuckDB's BQ extension — no parquet on disk.
|
BigQuery, accessed server-side via DuckDB's BQ extension — no parquet on disk.
|
||||||
Tables you don't see in `server/parquet/` may still be queryable.
|
Tables you don't see in `server/parquet/` may still be queryable.
|
||||||
|
|
||||||
### Discovery first
|
### Discovery first — read `agnes catalog --json` BEFORE every cross-table decision
|
||||||
|
|
||||||
|
`agnes catalog --json` returns one row per table with these fields. Use them; don't guess:
|
||||||
|
|
||||||
|
| Field | What it tells you | How to use it |
|
||||||
|
|---|---|---|
|
||||||
|
| `query_mode` | `local` (parquet on laptop) / `remote` (BQ on demand) / `materialized` (synced parquet of a BQ result) | Picks the tool — see decision tree below |
|
||||||
|
| `source_type` | `keboola` / `bigquery` / `jira` | Determines SQL dialect |
|
||||||
|
| `sql_flavor` | `duckdb` for local sources, `bigquery` for `--remote` queries on BQ rows | What syntax `--where` expects |
|
||||||
|
| `where_examples` | 1–3 example WHERE predicates that are valid for this table's dialect | Copy as starting point for `--where` |
|
||||||
|
| `fetch_via` | Pre-formatted `agnes snapshot create …` template for this table | The canonical "how do I get a slice of this table" command |
|
||||||
|
| `rough_size_hint` | Coarse size hint (`small` / `medium` / `large` or null when unknown) | Bigger than `medium` → never `agnes query --remote` without a tight `--where`; use `agnes snapshot create` |
|
||||||
|
|
||||||
```
|
```
|
||||||
agnes catalog --json | jq '.[] | {name, source_type, query_mode}' # see all tables + their modes
|
agnes catalog --json # full structured view (use this in scripts)
|
||||||
agnes schema <table> # columns + types
|
agnes catalog # human-readable summary
|
||||||
agnes describe <table> -n 5 # sample rows
|
agnes schema <table> # columns + types (BIGQUERY/DUCKDB dialect printed in header)
|
||||||
|
agnes describe <table> -n 5 # sample rows (works on local & materialized only)
|
||||||
```
|
```
|
||||||
|
|
||||||
For local-mode tables, query directly with `agnes query "SELECT … FROM <table>"`.
|
### Decision tree — pick the right tool BEFORE writing SQL
|
||||||
|
|
||||||
|
```
|
||||||
|
┌─ local → agnes query "SELECT ..."
|
||||||
|
agnes catalog → ─────┤
|
||||||
|
query_mode of <table> ├─ materialized → agnes query (parquet was synced by agnes pull)
|
||||||
|
│ (if missing locally, run `agnes pull` first)
|
||||||
|
│
|
||||||
|
└─ remote → choose by table size + query shape:
|
||||||
|
- one cheap probe (COUNT, schema-confirm, single agg ≤200s)
|
||||||
|
→ agnes query --remote "..."
|
||||||
|
- repeated questions on same slice / large scan
|
||||||
|
→ agnes snapshot create <table> --select ... --where ... --as <name>
|
||||||
|
then agnes query "SELECT ... FROM <name>"
|
||||||
|
- join with a local table
|
||||||
|
→ agnes query --register-bq "alias=BQ_SQL" --sql "..."
|
||||||
|
```
|
||||||
|
|
||||||
### Three patterns for `query_mode: "remote"` tables
|
### Three patterns for `query_mode: "remote"` tables
|
||||||
|
|
||||||
|
|
@ -76,13 +104,30 @@ For local-mode tables, query directly with `agnes query "SELECT … FROM <table>
|
||||||
| **`agnes query --remote`** | one-shot, server-side execution against BigQuery (works for BASE TABLE rows directly + VIEW/MATERIALIZED_VIEW rows via the BQ jobs API; cost-guarded by a 5 GiB scan cap configurable in /admin/server-config) | single aggregate / cheap probe |
|
| **`agnes query --remote`** | one-shot, server-side execution against BigQuery (works for BASE TABLE rows directly + VIEW/MATERIALIZED_VIEW rows via the BQ jobs API; cost-guarded by a 5 GiB scan cap configurable in /admin/server-config) | single aggregate / cheap probe |
|
||||||
| **`agnes query --register-bq`** | hybrid joins between local snapshots and ad-hoc BQ subqueries | crossing local + remote |
|
| **`agnes query --register-bq`** | hybrid joins between local snapshots and ad-hoc BQ subqueries | crossing local + remote |
|
||||||
|
|
||||||
### Permission model + cost — important
|
### Common mistakes — avoid on first try
|
||||||
|
|
||||||
- BQ access goes through the **agnes server's GCE service account**, not your personal Google credentials. If a query fails with a permission error, the table is in a project the server SA cannot read — escalate to admin, do NOT try to authenticate yourself.
|
- **`--estimate` is on `agnes snapshot create` ONLY.** Do NOT pass it to `agnes query` — fails with `No such option: --estimate`. The estimate flow is a snapshot-creation cost gate, not a query primitive.
|
||||||
- Every BQ query bills the SA's GCP project for **bytes scanned**. A naive `SELECT * FROM <large_table>` can cost real money. ALWAYS:
|
- **Old `agnes fetch` / `da fetch` / `da query` references in stale docs** — the CLI is `agnes`; `agnes fetch` was renamed to `agnes snapshot create`. If you see those names, translate before running.
|
||||||
- filter via `--where` on the partition column (typically a date)
|
- **Don't attempt personal GCP auth** if a BQ query fails with permission errors. BQ access uses the **server's service account**, not your Google identity — escalate to admin instead.
|
||||||
- list specific columns in `--select` — column-store BQ skips the rest, cheaper
|
- **Don't `agnes query --remote "SELECT * FROM <large_table>"`** without a `--where`. Even if the scan-byte gate refuses, you've wasted the round-trip; gate yourself first by reading `rough_size_hint` and `where_examples` from `agnes catalog --json`.
|
||||||
- run `--estimate` first when unsure of the table size or partitioning
|
|
||||||
|
### Failure-mode dictionary — what each error means + the right response
|
||||||
|
|
||||||
|
| Error wording (substring) | Cause | Response |
|
||||||
|
|---|---|---|
|
||||||
|
| `Binder Error: Query execution exceeded the timeout. Job ID: ...` | BQ-side query took >~200 s wall-clock; the DuckDB BQ extension's `bq_query_timeout_ms` (default 90 s, server may bump to 600 s) elapsed | Narrow `--where` (especially partition column), drop unused columns from `--select`, or switch to `agnes snapshot create` to materialise once + query locally |
|
||||||
|
| `HTTP 400: remote_scan_too_large` | Server's `bq_max_scan_bytes` cost gate refused the query (default 5 GiB) | Tighten `--where`; consider `agnes snapshot create` so the cost is paid once, then local queries are free |
|
||||||
|
| `HTTP 401: ... unauthorized` | PAT expired or wrong | `agnes init --server-url ... --token <new-PAT>`; re-mint via the dashboard's "Personal Access Tokens" page |
|
||||||
|
| `HTTP 403: cross_project_forbidden` (with `serviceusage` mention) | Server SA lacks `serviceusage.services.use` on the BQ data project | Escalate to admin to set `data_source.bigquery.billing_project`; do NOT try personal auth |
|
||||||
|
| `ReadTimeout` (client-side) on `agnes query --remote` | CLI is older than 0.35.1 (had 30 s default) | `agnes --version`; if <0.35.1, upgrade with `uv tool install --force <wheel-from-server>` (the URL is in the `[update]` banner that prints on every command). Then retry. |
|
||||||
|
| `unknown columns: [...]` from `agnes snapshot create` | `--select` lists columns that don't exist | Run `agnes schema <table>` and copy column names verbatim |
|
||||||
|
|
||||||
|
### Cost discipline — every BQ query bills bytes scanned
|
||||||
|
|
||||||
|
A naive `SELECT * FROM <large_table>` can cost real money. ALWAYS:
|
||||||
|
- filter via `--where` on the partition column (typically a date) — read `where_examples` in `agnes catalog --json`
|
||||||
|
- list specific columns in `--select` — column-store BQ skips the rest
|
||||||
|
- run `--estimate` first (only valid on `agnes snapshot create`) when the table is partitioned/clustered or when `rough_size_hint` is unknown
|
||||||
|
|
||||||
### `agnes snapshot create` discipline
|
### `agnes snapshot create` discipline
|
||||||
|
|
||||||
|
|
|
||||||
Loading…
Reference in a new issue