address Devin Review on PR #264

BUG_0001 (red): config/claude_md_template.txt is the Jinja2 source for
every analyst workspace's CLAUDE.md served via /api/welcome
(src/claude_md.py). It still instructed the agent to use the removed
--register-bq flag in 6 places — defeating the point of the PR for
anyone who ran agnes init after merge. Rewritten:
- ASCII routing diagram: "join with a local table" now points to
  "agnes snapshot create the remote side, then join locally"
- "Three patterns" table → "Two patterns" (snapshot create + --remote)
- "Hybrid query example" rewritten as snapshot-create + local join,
  with --remote called out as the escape hatch when the remote side
  is too big to snapshot
- "When the table isn't in agnes catalog" — drop the ad-hoc
  --register-bq path; admins register, no analyst-side workaround
- Footer cross-ref drops "hybrid-query examples"

BUG_0002 (yellow): cli/error_render.py docstring line 7 said "All
three previously flattened..." after I had already reduced "Three CLI
paths" → "Two CLI paths" on line 3. "All three" → "Both".
This commit is contained in:
ZdenekSrotyr 2026-05-12 16:50:02 +02:00 committed by ZdenekSrotyr
parent 79b55b6ff3
commit 7d868159f2
2 changed files with 29 additions and 29 deletions

View file

@ -4,7 +4,7 @@ Two CLI paths surface BigQuery / guardrail / RBAC typed errors today:
- ``agnes query --remote`` (POST /api/query)
- ``agnes snapshot create`` / ``agnes schema`` etc. (cli.v2_client wrappers around v2 endpoints)
All three previously flattened the structured ``detail`` JSON to a
Both previously flattened the structured ``detail`` JSON to a
truncated single-line string, hiding the operator-facing hint that
explains how to fix ``USER_PROJECT_DENIED`` / cost-cap rejection /
unregistered ``bq.*`` paths. This module recognizes a few canonical

View file

@ -140,16 +140,16 @@ query_mode of <table> ├─ materialized → agnes query (parquet was synced b
→ agnes snapshot create <table> --select ... --where ... --as <name>
then agnes query "SELECT ... FROM <name>"
- join with a local table
→ agnes query --register-bq "alias=BQ_SQL" --sql "..."
→ agnes snapshot create <remote-table> --select ... --where ... --as <name>,
then agnes query "SELECT ... FROM <local_table> JOIN <name> ..."
```
### Three patterns for `query_mode: "remote"` tables
### Two patterns for `query_mode: "remote"` tables
| Pattern | Tool | Use when |
|---------|------|----------|
| **`agnes snapshot create`** (preferred) | materializes a filtered subset locally → query the snapshot | repeated questions on same slice |
| **`agnes snapshot create`** (preferred) | materializes a filtered subset locally → query the snapshot | repeated questions on same slice, OR joining a remote table with a local one (snapshot the remote side, join locally) |
| **`agnes query --remote`** | one-shot, server-side execution against BigQuery (works for BASE TABLE rows directly + VIEW/MATERIALIZED_VIEW rows via the BQ jobs API; cost-guarded by a 5 GiB scan cap configurable in /admin/server-config) | single aggregate / cheap probe |
| **`agnes query --register-bq`** | hybrid joins between local snapshots and ad-hoc BQ subqueries | crossing local + remote |
### Common mistakes — avoid on first try
@ -215,21 +215,31 @@ If the question is time-sensitive (e.g. "today's orders"), assume any snapshot o
### Hybrid query example — local + remote in one query
`agnes query --register-bq` lets a single SQL statement join a local table with an ad-hoc BQ subquery. The BQ subquery runs first (server-side), result registered as a DuckDB view, then the joined query runs locally.
To join a local table with remote BigQuery data: `agnes snapshot create` a
filtered slice of the remote table, then query the local join. Snapshot the
remote side aggressively (WHERE + only the columns you need) so the cached
parquet stays under ~100 MB; the join itself runs locally over the snapshot
+ the existing local table.
```
agnes query \
--register-bq "traffic=SELECT date, country, SUM(views) AS views \
FROM \`prj.web_analytics.sessions\` \
WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) \
GROUP BY 1, 2" \
--sql "SELECT o.date, o.country, o.revenue, t.views, o.revenue / NULLIF(t.views,0) AS rev_per_view \
FROM orders o \
JOIN traffic t ON o.date = t.date AND o.country = t.country \
ORDER BY 1 DESC"
# 1. snapshot the remote side, narrow + aggregated
agnes snapshot create web_sessions \
--select date,country,views \
--where "date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)" \
--as traffic
# 2. local join (orders is a local table, traffic is the snapshot from step 1)
agnes query "SELECT o.date, o.country, o.revenue, t.views,
o.revenue / NULLIF(t.views, 0) AS rev_per_view
FROM orders o
JOIN traffic t ON o.date = t.date AND o.country = t.country
ORDER BY 1 DESC"
```
The BQ subquery MUST contain `WHERE` and/or `GROUP BY` to keep the registered result manageable (target: under 500K rows, well under 100 MB). Multiple `--register-bq` flags can compose multiple BQ sources. For complex SQL, use `--stdin` mode (`echo '{"register_bq":{...},"sql":"..."}' | agnes query --stdin`).
If the remote side is too big to snapshot even after filtering, push the
whole join server-side: `agnes query --remote "<SQL>"` runs on the server
where local tables and remote tables are already in the same DuckDB
session.
### BigQuery SQL flavor for `--where`
@ -244,22 +254,12 @@ Source-typed `bigquery` tables use BigQuery dialect, not DuckDB:
### When the table you want isn't in `agnes catalog`
The table may exist in BigQuery but not be registered with Agnes yet. Two options:
1. **Ad-hoc one-shot** — register a BQ subquery as a view inline, no admin needed
if the agnes server SA has BQ access:
```
agnes query --register-bq "live=SELECT * FROM \`project.dataset.table\` WHERE date >= '...' LIMIT 1000" \
--sql "SELECT * FROM live"
```
2. **Ask admin to register** the table with `query_mode: "remote"` so it shows up
in `agnes catalog` and supports `agnes snapshot create` / `agnes query --remote`. This is the
right path for any table you'll query repeatedly.
The table may exist in BigQuery but not be registered with Agnes yet. Tell the user the table isn't registered and ask an admin to register it with `query_mode: "remote"` so it shows up in `agnes catalog` and supports `agnes snapshot create` / `agnes query --remote`. Direct `bq."<dataset>"."<table>"` paths are registry-gated server-side (403 `bq_path_not_registered`) — there's no analyst-side workaround for an unregistered table; registration is the right path.
### Deeper guidance
For the full protocol, including hybrid-query examples, snapshot hygiene, and
when NOT to use `agnes snapshot create`, run:
For the full protocol, including snapshot hygiene and when NOT to use
`agnes snapshot create`, run:
```
agnes skills show agnes-data-querying