address Devin Review on PR #264
BUG_0001 (red): config/claude_md_template.txt is the Jinja2 source for every analyst workspace's CLAUDE.md served via /api/welcome (src/claude_md.py). It still instructed the agent to use the removed --register-bq flag in 6 places — defeating the point of the PR for anyone who ran agnes init after merge. Rewritten: - ASCII routing diagram: "join with a local table" now points to "agnes snapshot create the remote side, then join locally" - "Three patterns" table → "Two patterns" (snapshot create + --remote) - "Hybrid query example" rewritten as snapshot-create + local join, with --remote called out as the escape hatch when the remote side is too big to snapshot - "When the table isn't in agnes catalog" — drop the ad-hoc --register-bq path; admins register, no analyst-side workaround - Footer cross-ref drops "hybrid-query examples" BUG_0002 (yellow): cli/error_render.py docstring line 7 said "All three previously flattened..." after I had already reduced "Three CLI paths" → "Two CLI paths" on line 3. "All three" → "Both".
This commit is contained in:
parent
79b55b6ff3
commit
7d868159f2
2 changed files with 29 additions and 29 deletions
|
|
@ -4,7 +4,7 @@ Two CLI paths surface BigQuery / guardrail / RBAC typed errors today:
|
|||
- ``agnes query --remote`` (POST /api/query)
|
||||
- ``agnes snapshot create`` / ``agnes schema`` etc. (cli.v2_client wrappers around v2 endpoints)
|
||||
|
||||
All three previously flattened the structured ``detail`` JSON to a
|
||||
Both previously flattened the structured ``detail`` JSON to a
|
||||
truncated single-line string, hiding the operator-facing hint that
|
||||
explains how to fix ``USER_PROJECT_DENIED`` / cost-cap rejection /
|
||||
unregistered ``bq.*`` paths. This module recognizes a few canonical
|
||||
|
|
|
|||
|
|
@ -140,16 +140,16 @@ query_mode of <table> ├─ materialized → agnes query (parquet was synced b
|
|||
→ agnes snapshot create <table> --select ... --where ... --as <name>
|
||||
then agnes query "SELECT ... FROM <name>"
|
||||
- join with a local table
|
||||
→ agnes query --register-bq "alias=BQ_SQL" --sql "..."
|
||||
→ agnes snapshot create <remote-table> --select ... --where ... --as <name>,
|
||||
then agnes query "SELECT ... FROM <local_table> JOIN <name> ..."
|
||||
```
|
||||
|
||||
### Three patterns for `query_mode: "remote"` tables
|
||||
### Two patterns for `query_mode: "remote"` tables
|
||||
|
||||
| Pattern | Tool | Use when |
|
||||
|---------|------|----------|
|
||||
| **`agnes snapshot create`** (preferred) | materializes a filtered subset locally → query the snapshot | repeated questions on same slice |
|
||||
| **`agnes snapshot create`** (preferred) | materializes a filtered subset locally → query the snapshot | repeated questions on same slice, OR joining a remote table with a local one (snapshot the remote side, join locally) |
|
||||
| **`agnes query --remote`** | one-shot, server-side execution against BigQuery (works for BASE TABLE rows directly + VIEW/MATERIALIZED_VIEW rows via the BQ jobs API; cost-guarded by a 5 GiB scan cap configurable in /admin/server-config) | single aggregate / cheap probe |
|
||||
| **`agnes query --register-bq`** | hybrid joins between local snapshots and ad-hoc BQ subqueries | crossing local + remote |
|
||||
|
||||
### Common mistakes — avoid on first try
|
||||
|
||||
|
|
@ -215,21 +215,31 @@ If the question is time-sensitive (e.g. "today's orders"), assume any snapshot o
|
|||
|
||||
### Hybrid query example — local + remote in one query
|
||||
|
||||
`agnes query --register-bq` lets a single SQL statement join a local table with an ad-hoc BQ subquery. The BQ subquery runs first (server-side), result registered as a DuckDB view, then the joined query runs locally.
|
||||
To join a local table with remote BigQuery data: `agnes snapshot create` a
|
||||
filtered slice of the remote table, then query the local join. Snapshot the
|
||||
remote side aggressively (WHERE + only the columns you need) so the cached
|
||||
parquet stays under ~100 MB; the join itself runs locally over the snapshot
|
||||
+ the existing local table.
|
||||
|
||||
```
|
||||
agnes query \
|
||||
--register-bq "traffic=SELECT date, country, SUM(views) AS views \
|
||||
FROM \`prj.web_analytics.sessions\` \
|
||||
WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) \
|
||||
GROUP BY 1, 2" \
|
||||
--sql "SELECT o.date, o.country, o.revenue, t.views, o.revenue / NULLIF(t.views,0) AS rev_per_view \
|
||||
FROM orders o \
|
||||
JOIN traffic t ON o.date = t.date AND o.country = t.country \
|
||||
# 1. snapshot the remote side, narrow + aggregated
|
||||
agnes snapshot create web_sessions \
|
||||
--select date,country,views \
|
||||
--where "date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)" \
|
||||
--as traffic
|
||||
|
||||
# 2. local join (orders is a local table, traffic is the snapshot from step 1)
|
||||
agnes query "SELECT o.date, o.country, o.revenue, t.views,
|
||||
o.revenue / NULLIF(t.views, 0) AS rev_per_view
|
||||
FROM orders o
|
||||
JOIN traffic t ON o.date = t.date AND o.country = t.country
|
||||
ORDER BY 1 DESC"
|
||||
```
|
||||
|
||||
The BQ subquery MUST contain `WHERE` and/or `GROUP BY` to keep the registered result manageable (target: under 500K rows, well under 100 MB). Multiple `--register-bq` flags can compose multiple BQ sources. For complex SQL, use `--stdin` mode (`echo '{"register_bq":{...},"sql":"..."}' | agnes query --stdin`).
|
||||
If the remote side is too big to snapshot even after filtering, push the
|
||||
whole join server-side: `agnes query --remote "<SQL>"` runs on the server
|
||||
where local tables and remote tables are already in the same DuckDB
|
||||
session.
|
||||
|
||||
### BigQuery SQL flavor for `--where`
|
||||
|
||||
|
|
@ -244,22 +254,12 @@ Source-typed `bigquery` tables use BigQuery dialect, not DuckDB:
|
|||
|
||||
### When the table you want isn't in `agnes catalog`
|
||||
|
||||
The table may exist in BigQuery but not be registered with Agnes yet. Two options:
|
||||
|
||||
1. **Ad-hoc one-shot** — register a BQ subquery as a view inline, no admin needed
|
||||
if the agnes server SA has BQ access:
|
||||
```
|
||||
agnes query --register-bq "live=SELECT * FROM \`project.dataset.table\` WHERE date >= '...' LIMIT 1000" \
|
||||
--sql "SELECT * FROM live"
|
||||
```
|
||||
2. **Ask admin to register** the table with `query_mode: "remote"` so it shows up
|
||||
in `agnes catalog` and supports `agnes snapshot create` / `agnes query --remote`. This is the
|
||||
right path for any table you'll query repeatedly.
|
||||
The table may exist in BigQuery but not be registered with Agnes yet. Tell the user the table isn't registered and ask an admin to register it with `query_mode: "remote"` so it shows up in `agnes catalog` and supports `agnes snapshot create` / `agnes query --remote`. Direct `bq."<dataset>"."<table>"` paths are registry-gated server-side (403 `bq_path_not_registered`) — there's no analyst-side workaround for an unregistered table; registration is the right path.
|
||||
|
||||
### Deeper guidance
|
||||
|
||||
For the full protocol, including hybrid-query examples, snapshot hygiene, and
|
||||
when NOT to use `agnes snapshot create`, run:
|
||||
For the full protocol, including snapshot hygiene and when NOT to use
|
||||
`agnes snapshot create`, run:
|
||||
|
||||
```
|
||||
agnes skills show agnes-data-querying
|
||||
|
|
|
|||
Loading…
Reference in a new issue