From 7d868159f2c6b271b91cf9a9307a3f279bd66a63 Mon Sep 17 00:00:00 2001 From: ZdenekSrotyr Date: Tue, 12 May 2026 16:50:02 +0200 Subject: [PATCH] address Devin Review on PR #264 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit BUG_0001 (red): config/claude_md_template.txt is the Jinja2 source for every analyst workspace's CLAUDE.md served via /api/welcome (src/claude_md.py). It still instructed the agent to use the removed --register-bq flag in 6 places — defeating the point of the PR for anyone who ran agnes init after merge. Rewritten: - ASCII routing diagram: "join with a local table" now points to "agnes snapshot create the remote side, then join locally" - "Three patterns" table → "Two patterns" (snapshot create + --remote) - "Hybrid query example" rewritten as snapshot-create + local join, with --remote called out as the escape hatch when the remote side is too big to snapshot - "When the table isn't in agnes catalog" — drop the ad-hoc --register-bq path; admins register, no analyst-side workaround - Footer cross-ref drops "hybrid-query examples" BUG_0002 (yellow): cli/error_render.py docstring line 7 said "All three previously flattened..." after I had already reduced "Three CLI paths" → "Two CLI paths" on line 3. "All three" → "Both". --- cli/error_render.py | 2 +- config/claude_md_template.txt | 56 +++++++++++++++++------------------ 2 files changed, 29 insertions(+), 29 deletions(-) diff --git a/cli/error_render.py b/cli/error_render.py index 6399478..982e92d 100644 --- a/cli/error_render.py +++ b/cli/error_render.py @@ -4,7 +4,7 @@ Two CLI paths surface BigQuery / guardrail / RBAC typed errors today: - ``agnes query --remote`` (POST /api/query) - ``agnes snapshot create`` / ``agnes schema`` etc. (cli.v2_client wrappers around v2 endpoints) -All three previously flattened the structured ``detail`` JSON to a +Both previously flattened the structured ``detail`` JSON to a truncated single-line string, hiding the operator-facing hint that explains how to fix ``USER_PROJECT_DENIED`` / cost-cap rejection / unregistered ``bq.*`` paths. This module recognizes a few canonical diff --git a/config/claude_md_template.txt b/config/claude_md_template.txt index bdeee46..81ad6e2 100644 --- a/config/claude_md_template.txt +++ b/config/claude_md_template.txt @@ -140,16 +140,16 @@ query_mode of ├─ materialized → agnes query (parquet was synced b → agnes snapshot create
--select ... --where ... --as then agnes query "SELECT ... FROM " - join with a local table - → agnes query --register-bq "alias=BQ_SQL" --sql "..." + → agnes snapshot create --select ... --where ... --as , + then agnes query "SELECT ... FROM JOIN ..." ``` -### Three patterns for `query_mode: "remote"` tables +### Two patterns for `query_mode: "remote"` tables | Pattern | Tool | Use when | |---------|------|----------| -| **`agnes snapshot create`** (preferred) | materializes a filtered subset locally → query the snapshot | repeated questions on same slice | +| **`agnes snapshot create`** (preferred) | materializes a filtered subset locally → query the snapshot | repeated questions on same slice, OR joining a remote table with a local one (snapshot the remote side, join locally) | | **`agnes query --remote`** | one-shot, server-side execution against BigQuery (works for BASE TABLE rows directly + VIEW/MATERIALIZED_VIEW rows via the BQ jobs API; cost-guarded by a 5 GiB scan cap configurable in /admin/server-config) | single aggregate / cheap probe | -| **`agnes query --register-bq`** | hybrid joins between local snapshots and ad-hoc BQ subqueries | crossing local + remote | ### Common mistakes — avoid on first try @@ -215,21 +215,31 @@ If the question is time-sensitive (e.g. "today's orders"), assume any snapshot o ### Hybrid query example — local + remote in one query -`agnes query --register-bq` lets a single SQL statement join a local table with an ad-hoc BQ subquery. The BQ subquery runs first (server-side), result registered as a DuckDB view, then the joined query runs locally. +To join a local table with remote BigQuery data: `agnes snapshot create` a +filtered slice of the remote table, then query the local join. Snapshot the +remote side aggressively (WHERE + only the columns you need) so the cached +parquet stays under ~100 MB; the join itself runs locally over the snapshot ++ the existing local table. ``` -agnes query \ - --register-bq "traffic=SELECT date, country, SUM(views) AS views \ - FROM \`prj.web_analytics.sessions\` \ - WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) \ - GROUP BY 1, 2" \ - --sql "SELECT o.date, o.country, o.revenue, t.views, o.revenue / NULLIF(t.views,0) AS rev_per_view \ - FROM orders o \ - JOIN traffic t ON o.date = t.date AND o.country = t.country \ - ORDER BY 1 DESC" +# 1. snapshot the remote side, narrow + aggregated +agnes snapshot create web_sessions \ + --select date,country,views \ + --where "date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)" \ + --as traffic + +# 2. local join (orders is a local table, traffic is the snapshot from step 1) +agnes query "SELECT o.date, o.country, o.revenue, t.views, + o.revenue / NULLIF(t.views, 0) AS rev_per_view + FROM orders o + JOIN traffic t ON o.date = t.date AND o.country = t.country + ORDER BY 1 DESC" ``` -The BQ subquery MUST contain `WHERE` and/or `GROUP BY` to keep the registered result manageable (target: under 500K rows, well under 100 MB). Multiple `--register-bq` flags can compose multiple BQ sources. For complex SQL, use `--stdin` mode (`echo '{"register_bq":{...},"sql":"..."}' | agnes query --stdin`). +If the remote side is too big to snapshot even after filtering, push the +whole join server-side: `agnes query --remote ""` runs on the server +where local tables and remote tables are already in the same DuckDB +session. ### BigQuery SQL flavor for `--where` @@ -244,22 +254,12 @@ Source-typed `bigquery` tables use BigQuery dialect, not DuckDB: ### When the table you want isn't in `agnes catalog` -The table may exist in BigQuery but not be registered with Agnes yet. Two options: - -1. **Ad-hoc one-shot** — register a BQ subquery as a view inline, no admin needed - if the agnes server SA has BQ access: - ``` - agnes query --register-bq "live=SELECT * FROM \`project.dataset.table\` WHERE date >= '...' LIMIT 1000" \ - --sql "SELECT * FROM live" - ``` -2. **Ask admin to register** the table with `query_mode: "remote"` so it shows up - in `agnes catalog` and supports `agnes snapshot create` / `agnes query --remote`. This is the - right path for any table you'll query repeatedly. +The table may exist in BigQuery but not be registered with Agnes yet. Tell the user the table isn't registered and ask an admin to register it with `query_mode: "remote"` so it shows up in `agnes catalog` and supports `agnes snapshot create` / `agnes query --remote`. Direct `bq.""."
"` paths are registry-gated server-side (403 `bq_path_not_registered`) — there's no analyst-side workaround for an unregistered table; registration is the right path. ### Deeper guidance -For the full protocol, including hybrid-query examples, snapshot hygiene, and -when NOT to use `agnes snapshot create`, run: +For the full protocol, including snapshot hygiene and when NOT to use +`agnes snapshot create`, run: ``` agnes skills show agnes-data-querying