address Devin Review on PR #264

BUG_0001 (red): config/claude_md_template.txt is the Jinja2 source for
every analyst workspace's CLAUDE.md served via /api/welcome
(src/claude_md.py). It still instructed the agent to use the removed
--register-bq flag in 6 places — defeating the point of the PR for
anyone who ran agnes init after merge. Rewritten:
- ASCII routing diagram: "join with a local table" now points to
  "agnes snapshot create the remote side, then join locally"
- "Three patterns" table → "Two patterns" (snapshot create + --remote)
- "Hybrid query example" rewritten as snapshot-create + local join,
  with --remote called out as the escape hatch when the remote side
  is too big to snapshot
- "When the table isn't in agnes catalog" — drop the ad-hoc
  --register-bq path; admins register, no analyst-side workaround
- Footer cross-ref drops "hybrid-query examples"

BUG_0002 (yellow): cli/error_render.py docstring line 7 said "All
three previously flattened..." after I had already reduced "Three CLI
paths" → "Two CLI paths" on line 3. "All three" → "Both".
This commit is contained in:
ZdenekSrotyr 2026-05-12 16:50:02 +02:00 committed by ZdenekSrotyr
parent 79b55b6ff3
commit 7d868159f2
2 changed files with 29 additions and 29 deletions

View file

@ -4,7 +4,7 @@ Two CLI paths surface BigQuery / guardrail / RBAC typed errors today:
- ``agnes query --remote`` (POST /api/query) - ``agnes query --remote`` (POST /api/query)
- ``agnes snapshot create`` / ``agnes schema`` etc. (cli.v2_client wrappers around v2 endpoints) - ``agnes snapshot create`` / ``agnes schema`` etc. (cli.v2_client wrappers around v2 endpoints)
All three previously flattened the structured ``detail`` JSON to a Both previously flattened the structured ``detail`` JSON to a
truncated single-line string, hiding the operator-facing hint that truncated single-line string, hiding the operator-facing hint that
explains how to fix ``USER_PROJECT_DENIED`` / cost-cap rejection / explains how to fix ``USER_PROJECT_DENIED`` / cost-cap rejection /
unregistered ``bq.*`` paths. This module recognizes a few canonical unregistered ``bq.*`` paths. This module recognizes a few canonical

View file

@ -140,16 +140,16 @@ query_mode of <table> ├─ materialized → agnes query (parquet was synced b
→ agnes snapshot create <table> --select ... --where ... --as <name> → agnes snapshot create <table> --select ... --where ... --as <name>
then agnes query "SELECT ... FROM <name>" then agnes query "SELECT ... FROM <name>"
- join with a local table - join with a local table
→ agnes query --register-bq "alias=BQ_SQL" --sql "..." → agnes snapshot create <remote-table> --select ... --where ... --as <name>,
then agnes query "SELECT ... FROM <local_table> JOIN <name> ..."
``` ```
### Three patterns for `query_mode: "remote"` tables ### Two patterns for `query_mode: "remote"` tables
| Pattern | Tool | Use when | | Pattern | Tool | Use when |
|---------|------|----------| |---------|------|----------|
| **`agnes snapshot create`** (preferred) | materializes a filtered subset locally → query the snapshot | repeated questions on same slice | | **`agnes snapshot create`** (preferred) | materializes a filtered subset locally → query the snapshot | repeated questions on same slice, OR joining a remote table with a local one (snapshot the remote side, join locally) |
| **`agnes query --remote`** | one-shot, server-side execution against BigQuery (works for BASE TABLE rows directly + VIEW/MATERIALIZED_VIEW rows via the BQ jobs API; cost-guarded by a 5 GiB scan cap configurable in /admin/server-config) | single aggregate / cheap probe | | **`agnes query --remote`** | one-shot, server-side execution against BigQuery (works for BASE TABLE rows directly + VIEW/MATERIALIZED_VIEW rows via the BQ jobs API; cost-guarded by a 5 GiB scan cap configurable in /admin/server-config) | single aggregate / cheap probe |
| **`agnes query --register-bq`** | hybrid joins between local snapshots and ad-hoc BQ subqueries | crossing local + remote |
### Common mistakes — avoid on first try ### Common mistakes — avoid on first try
@ -215,21 +215,31 @@ If the question is time-sensitive (e.g. "today's orders"), assume any snapshot o
### Hybrid query example — local + remote in one query ### Hybrid query example — local + remote in one query
`agnes query --register-bq` lets a single SQL statement join a local table with an ad-hoc BQ subquery. The BQ subquery runs first (server-side), result registered as a DuckDB view, then the joined query runs locally. To join a local table with remote BigQuery data: `agnes snapshot create` a
filtered slice of the remote table, then query the local join. Snapshot the
remote side aggressively (WHERE + only the columns you need) so the cached
parquet stays under ~100 MB; the join itself runs locally over the snapshot
+ the existing local table.
``` ```
agnes query \ # 1. snapshot the remote side, narrow + aggregated
--register-bq "traffic=SELECT date, country, SUM(views) AS views \ agnes snapshot create web_sessions \
FROM \`prj.web_analytics.sessions\` \ --select date,country,views \
WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) \ --where "date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)" \
GROUP BY 1, 2" \ --as traffic
--sql "SELECT o.date, o.country, o.revenue, t.views, o.revenue / NULLIF(t.views,0) AS rev_per_view \
FROM orders o \ # 2. local join (orders is a local table, traffic is the snapshot from step 1)
JOIN traffic t ON o.date = t.date AND o.country = t.country \ agnes query "SELECT o.date, o.country, o.revenue, t.views,
o.revenue / NULLIF(t.views, 0) AS rev_per_view
FROM orders o
JOIN traffic t ON o.date = t.date AND o.country = t.country
ORDER BY 1 DESC" ORDER BY 1 DESC"
``` ```
The BQ subquery MUST contain `WHERE` and/or `GROUP BY` to keep the registered result manageable (target: under 500K rows, well under 100 MB). Multiple `--register-bq` flags can compose multiple BQ sources. For complex SQL, use `--stdin` mode (`echo '{"register_bq":{...},"sql":"..."}' | agnes query --stdin`). If the remote side is too big to snapshot even after filtering, push the
whole join server-side: `agnes query --remote "<SQL>"` runs on the server
where local tables and remote tables are already in the same DuckDB
session.
### BigQuery SQL flavor for `--where` ### BigQuery SQL flavor for `--where`
@ -244,22 +254,12 @@ Source-typed `bigquery` tables use BigQuery dialect, not DuckDB:
### When the table you want isn't in `agnes catalog` ### When the table you want isn't in `agnes catalog`
The table may exist in BigQuery but not be registered with Agnes yet. Two options: The table may exist in BigQuery but not be registered with Agnes yet. Tell the user the table isn't registered and ask an admin to register it with `query_mode: "remote"` so it shows up in `agnes catalog` and supports `agnes snapshot create` / `agnes query --remote`. Direct `bq."<dataset>"."<table>"` paths are registry-gated server-side (403 `bq_path_not_registered`) — there's no analyst-side workaround for an unregistered table; registration is the right path.
1. **Ad-hoc one-shot** — register a BQ subquery as a view inline, no admin needed
if the agnes server SA has BQ access:
```
agnes query --register-bq "live=SELECT * FROM \`project.dataset.table\` WHERE date >= '...' LIMIT 1000" \
--sql "SELECT * FROM live"
```
2. **Ask admin to register** the table with `query_mode: "remote"` so it shows up
in `agnes catalog` and supports `agnes snapshot create` / `agnes query --remote`. This is the
right path for any table you'll query repeatedly.
### Deeper guidance ### Deeper guidance
For the full protocol, including hybrid-query examples, snapshot hygiene, and For the full protocol, including snapshot hygiene and when NOT to use
when NOT to use `agnes snapshot create`, run: `agnes snapshot create`, run:
``` ```
agnes skills show agnes-data-querying agnes skills show agnes-data-querying