From a2157ee80764a5c0c535b7d6e2a34a0ddf5943b4 Mon Sep 17 00:00:00 2001
From: ZdenekSrotyr
Date: Mon, 4 May 2026 05:48:04 +0200
Subject: [PATCH] fix(claude_md): restore full default content (BQ cost guard, hybrid example, ad-hoc table, deeper guidance)

---
 config/claude_md_template.txt | 47 +++++++++++++++++++++++++++++++++--
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/config/claude_md_template.txt b/config/claude_md_template.txt
index 77cb53c..688f02e 100644
--- a/config/claude_md_template.txt
+++ b/config/claude_md_template.txt
@@ -73,7 +73,7 @@ For local-mode tables, query directly with `da query "SELECT … FROM "`.
 | Pattern | Tool | Use when |
 |---------|------|----------|
 | **`da fetch`** (preferred) | materializes a filtered subset locally → query the snapshot | repeated questions on same slice |
-| **`da query --remote`** | one-shot, server-side execution against BigQuery | single aggregate / cheap probe |
+| **`da query --remote`** | one-shot, server-side execution against BigQuery (works for BASE TABLE rows directly + VIEW/MATERIALIZED_VIEW rows via the BQ jobs API; cost-guarded by a 5 GiB scan cap configurable in /admin/server-config) | single aggregate / cheap probe |
 | **`da query --register-bq`** | hybrid joins between local snapshots and ad-hoc BQ subqueries | crossing local + remote |
 
 ### Permission model + cost — important
@@ -111,7 +111,7 @@ Rules of thumb:
 
 ### Snapshot freshness — when to refresh
 
-Snapshots are point-in-time copies. They go stale as the source data updates. For each new conversation:
+Snapshots are point-in-time copies. They go stale as the source data updates (most BQ tables refresh daily; check `sync_schedule` per `da catalog`). For each new conversation:
 
 ```
 da snapshot list # see existing snapshots + their ages
@@ -119,6 +119,26 @@ da snapshot drop my_recent # drop stale ones
 da fetch --select ... --where ... --as my_recent # re-fetch
 ```
 
+If the question is time-sensitive (e.g. "today's orders"), assume any snapshot older than the table's `sync_schedule` is stale and refresh.
+
+### Hybrid query example — local + remote in one query
+
+`da query --register-bq` lets a single SQL statement join a local table with an ad-hoc BQ subquery. The BQ subquery runs first (server-side), result registered as a DuckDB view, then the joined query runs locally.
+
+```
+da query \
+  --register-bq "traffic=SELECT date, country, SUM(views) AS views \
+      FROM \`prj.web_analytics.sessions\` \
+      WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) \
+      GROUP BY 1, 2" \
+  --sql "SELECT o.date, o.country, o.revenue, t.views, o.revenue / NULLIF(t.views,0) AS rev_per_view \
+      FROM orders o \
+      JOIN traffic t ON o.date = t.date AND o.country = t.country \
+      ORDER BY 1 DESC"
+```
+
+The BQ subquery MUST contain `WHERE` and/or `GROUP BY` to keep the registered result manageable (target: under 500K rows, well under 100 MB). Multiple `--register-bq` flags can compose multiple BQ sources. For complex SQL, use `--stdin` mode (`echo '{"register_bq":{...},"sql":"..."}' | da query --stdin`).
+
 ### BigQuery SQL flavor for `--where`
 
 Source-typed `bigquery` tables use BigQuery dialect, not DuckDB:
@@ -130,6 +150,29 @@ Source-typed `bigquery` tables use BigQuery dialect, not DuckDB:
 - Regex: `REGEXP_CONTAINS(col, r'pattern')` (raw string!)
 - Cast: `CAST(x AS INT64)` (NOT `INT`)
 
+### When the table you want isn't in `da catalog`
+
+The table may exist in BigQuery but not be registered with Agnes yet. Two options:
+
+1. **Ad-hoc one-shot** — register a BQ subquery as a view inline, no admin needed
+   if the agnes server SA has BQ access:
+   ```
+   da query --register-bq "live=SELECT * FROM \`project.dataset.table\` WHERE date >= '...' LIMIT 1000" \
+     --sql "SELECT * FROM live"
+   ```
+2. **Ask admin to register** the table with `query_mode: "remote"` so it shows up
+   in `da catalog` and supports `da fetch` / `da query --remote`. This is the
+   right path for any table you'll query repeatedly.
+
+### Deeper guidance
+
+For the full protocol, including hybrid-query examples, snapshot hygiene, and
+when NOT to use `da fetch`, run:
+
+```
+da skills show agnes-data-querying
+```
+
 ## Corporate Memory
 
 Rules injected by `da sync` from the server's corporate knowledge base live in `.claude/rules/km_*.md`. They are automatically loaded by Claude Code on every session start.
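
For the `--stdin` form mentioned in the hybrid query section, hand-escaping backticks and quotes inside `echo '...'` is easy to get wrong. The following is a minimal sketch, not the tool's documented interface: it assumes `register_bq` maps each alias to its BigQuery SQL (the template elides that object's contents, so the inner shape is a guess), and the helper name `build_stdin_payload` is hypothetical. It reuses the table names from the patch's own example and builds the payload with Python's `json` module.

```python
import json


def build_stdin_payload(register_bq: dict[str, str], sql: str) -> str:
    """Serialize a hybrid query as one JSON line for `da query --stdin`.

    ASSUMPTION: `register_bq` maps alias -> BigQuery SQL. The template only
    shows `{"register_bq":{...},"sql":"..."}`, so the inner shape is a guess.
    """
    if not register_bq:
        raise ValueError("at least one BigQuery source is required")
    for alias, bq_sql in register_bq.items():
        upper = bq_sql.upper()
        # Mirror the template's rule that each subquery must filter or
        # aggregate. This is a naive substring check, not a SQL parser.
        if "WHERE" not in upper and "GROUP BY" not in upper:
            raise ValueError(f"subquery {alias!r} needs WHERE and/or GROUP BY")
    # json.dumps handles quoting, so backticks and quotes need no manual escapes.
    return json.dumps({"register_bq": register_bq, "sql": sql})


payload = build_stdin_payload(
    {
        "traffic": (
            "SELECT date, country, SUM(views) AS views "
            "FROM `prj.web_analytics.sessions` "
            "WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) "
            "GROUP BY 1, 2"
        )
    },
    "SELECT o.date, o.country, o.revenue, t.views "
    "FROM orders o "
    "JOIN traffic t ON o.date = t.date AND o.country = t.country "
    "ORDER BY 1 DESC",
)
print(payload)
```

If the assumed payload shape holds, the printed line could be piped straight into `da query --stdin`, which avoids the shell-level backslash escaping needed in the `--register-bq` flag form.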