{# Default analyst-onboarding workspace prompt for "agnes init". Rendered server-side by src/claude_md.py. Edit this file to change the OSS default; admins override per-instance via /admin/workspace-prompt. Available context (see docs/agent-workspace-prompt.md for the full reference): instance.name, instance.subtitle server.url, server.hostname sync_interval — string from instance.yaml data_source.type — keboola | bigquery | local tables — list of {name, description, query_mode} metrics.count, metrics.categories marketplaces — list of {slug, name, plugins:[{name}]} user.id, user.email, user.name, user.is_admin, user.groups now, today — datetime / date string #} # {{ instance.name }} — AI Data Analyst This workspace is connected to {{ server.url }}. {% if instance.subtitle %}Operated by **{{ instance.subtitle }}**.{% endif %} > Looking for human-readable workspace docs? Open `AGNES_WORKSPACE.md` in this directory — that file documents what `agnes init` installed, where files live, and how to uninstall. ## Rules - Before computing any business metric: run `agnes catalog --metrics --show /` - **For canonical table list with query modes: `agnes catalog`.** Treat `agnes catalog` as source of truth (covers all `query_mode` values: `local`, `remote`, `materialized`). - Do not use DESCRIBE/SHOW COLUMNS — use `agnes schema ` instead - Sync data regularly with `agnes pull` - **Personal customizations go in `.claude/CLAUDE.local.md`, NOT here.** This file is regenerated by `agnes init --force`; edits here will be lost. CLAUDE.local.md is preserved across regeneration and uploaded on `agnes push`. ## Metrics Workflow 1. `agnes catalog --metrics` — list registered metrics + categories 2. `agnes catalog --metrics --show /` — read the canonical SQL + business rules 3. Adapt the canonical SQL; never invent metric calculations ## Data Sync - `agnes pull` — download current data from server - `agnes push` — upload sessions and local notes to server - Data on the server refreshes every {{ sync_interval }} ## Private sessions If a Claude Code session covers something you do **not** want uploaded to the Agnes server (sensitive PII walk-through, exploratory work that shouldn't shape org-wide context, etc.), mark it private: - Run `/agnes-private` from inside the session — the deterministic `!`-prefix bash adds the current `CLAUDE_CODE_SESSION_ID` to `/.claude/agnes-sessions-private.txt`. Subsequent `agnes push` runs skip its transcript and audit-log the skip to `/.claude/agnes-sessions-private-skipped.txt`. - The Claude Code statusbar shows `🔒 agnes-private` while you are in a private session, so the marking is visible at a glance. - Marking before `capture-session` fires (the SessionStart hook that queues the transcript) means the session never enters the upload queue at all; marking after it fires is also safe — the push-side re-check on the private list is the authoritative filter. ## Discovering tables — never enumerate from memory Tables, columns, sizes, descriptions, and `query_mode` change as admins register / migrate / drop entries. Always re-discover from the live server, never from this file or your training data: ``` agnes catalog --json # all tables: id, query_mode, sql_flavor, # where_examples, fetch_via, rough_size_hint, description agnes catalog --json | jq '.tables[] | select(.id=="")' # single table — read its description in full BEFORE writing any SQL agnes schema
# columns + types in the right SQL dialect agnes describe
-n 5 # sample rows (local + materialized only) ``` The `description` field on each catalog row is the **authoritative business-rules text** for that table — it carries grain, partition column, join contracts, and column-level gotchas. Re-read it from the live `agnes catalog` for every cross-table decision; do **not** copy it into this workspace `CLAUDE.md` (it's a snapshot that goes stale, and `agnes init` will overwrite local edits — put personal notes into `.claude/CLAUDE.local.md` instead). The CLI is the source of truth. `rough_size_hint` is server-populated for `local` and `materialized` tables (`small` ≤100 MiB, `medium` ≤1 GiB, `large` ≤10 GiB, `very_large` >10 GiB) and `null` for `remote` rows. When `null`, treat the table as potentially large and use `agnes snapshot create --estimate` to size-check before fetching. {% if marketplaces -%} ## Agnes Marketplace — plugins available to you These plugins reach Claude Code through the per-user **`agnes`** marketplace served by this server (an aggregated, RBAC-filtered view of the upstream marketplaces below). When you install or invoke one of these plugins inside Claude Code, address it as `@agnes` regardless of which upstream it came from — e.g. `claude plugin install @agnes`. The `agnes refresh-marketplace` command (run by the SessionStart hook every session) keeps the local clone in sync. Upstream marketplaces folded into your `agnes` view: {% for mp in marketplaces -%} - **{{ mp.name }}** ({{ mp.slug }}): {{ mp.plugins | map(attribute="name") | join(", ") }} {% endfor %} {% endif -%} ## Remote Queries (BigQuery) — when data isn't on the laptop Not every table is synced. Tables registered with `query_mode: "remote"` live in BigQuery, accessed server-side via DuckDB's BQ extension — no parquet on disk. Tables you don't see in `server/parquet/` may still be queryable. ### Discovery first — read `agnes catalog --json` BEFORE every cross-table decision `agnes catalog --json` returns one row per table with these fields. Use them; don't guess: | Field | What it tells you | How to use it | |---|---|---| | `query_mode` | `local` (parquet on laptop) / `remote` (BQ on demand) / `materialized` (synced parquet of a BQ result) | Picks the tool — see decision tree below | | `source_type` | `keboola` / `bigquery` / `jira` | Determines SQL dialect | | `sql_flavor` | `duckdb` for local sources, `bigquery` for `--remote` queries on BQ rows | What syntax `--where` expects | | `where_examples` | 1–3 example WHERE predicates that are valid for this table's dialect | Copy as starting point for `--where` | | `fetch_via` | Pre-formatted `agnes snapshot create …` template for this table | The canonical "how do I get a slice of this table" command | | `rough_size_hint` | Coarse size hint (`small` / `medium` / `large` or null when unknown) | Bigger than `medium` → never `agnes query --remote` without a tight `--where`; use `agnes snapshot create` | ``` agnes catalog --json # full structured view (use this in scripts) agnes catalog # human-readable summary agnes schema
# columns + types (BIGQUERY/DUCKDB dialect printed in header) agnes describe
-n 5 # sample rows (works on local & materialized only) ``` ### Decision tree — pick the right tool BEFORE writing SQL ``` ┌─ local → agnes query "SELECT ..." agnes catalog → ─────┤ query_mode of
├─ materialized → agnes query (parquet was synced by agnes pull) │ (if missing locally, run `agnes pull` first) │ └─ remote → choose by table size + query shape: - one cheap probe (COUNT, schema-confirm, single agg ≤200s) → agnes query --remote "..." - repeated questions on same slice / large scan → agnes snapshot create
--select ... --where ... --as then agnes query "SELECT ... FROM " - join with a local table → agnes snapshot create --select ... --where ... --as , then agnes query "SELECT ... FROM JOIN ..." ``` ### Two patterns for `query_mode: "remote"` tables | Pattern | Tool | Use when | |---------|------|----------| | **`agnes snapshot create`** (preferred) | materializes a filtered subset locally → query the snapshot | repeated questions on same slice, OR joining a remote table with a local one (snapshot the remote side, join locally) | | **`agnes query --remote`** | one-shot, server-side execution against BigQuery (works for BASE TABLE rows directly + VIEW/MATERIALIZED_VIEW rows via the BQ jobs API; cost-guarded by a 5 GiB scan cap configurable in /admin/server-config) | single aggregate / cheap probe | ### Common mistakes — avoid on first try - **`--estimate` is on `agnes snapshot create` ONLY.** Do NOT pass it to `agnes query` — fails with `No such option: --estimate`. The estimate flow is a snapshot-creation cost gate, not a query primitive. - **Old `agnes fetch` / `da fetch` / `da query` references in stale docs** — the CLI is `agnes`; `agnes fetch` was renamed to `agnes snapshot create`. If you see those names, translate before running. - **Don't attempt personal GCP auth** if a BQ query fails with permission errors. BQ access uses the **server's service account**, not your Google identity — escalate to admin instead. - **Don't `agnes query --remote "SELECT * FROM "`** without a `--where`. Even if the scan-byte gate refuses, you've wasted the round-trip; gate yourself first by reading `rough_size_hint` and `where_examples` from `agnes catalog --json`. ### Failure-mode dictionary — what each error means + the right response | Error wording (substring) | Cause | Response | |---|---|---| | `Binder Error: Query execution exceeded the timeout. Job ID: ...` | BQ-side query took >~200 s wall-clock; the DuckDB BQ extension's `bq_query_timeout_ms` (default 90 s, server may bump to 600 s) elapsed | Narrow `--where` (especially partition column), drop unused columns from `--select`, or switch to `agnes snapshot create` to materialise once + query locally | | `HTTP 400: remote_scan_too_large` | Server's `bq_max_scan_bytes` cost gate refused the query (default 5 GiB) | Tighten `--where`; consider `agnes snapshot create` so the cost is paid once, then local queries are free | | `HTTP 401: ... unauthorized` | PAT expired or wrong | `agnes init --server-url ... --token `; re-mint via the dashboard's "Personal Access Tokens" page | | `HTTP 403: cross_project_forbidden` (with `serviceusage` mention) | Server SA lacks `serviceusage.services.use` on the BQ data project | Escalate to admin to set `data_source.bigquery.billing_project`; do NOT try personal auth | | `ReadTimeout` (client-side) on `agnes query --remote` | CLI is older than 0.35.1 (had 30 s default) | `agnes --version`; if <0.35.1, upgrade with `uv tool install --force ` (the URL is in the `[update]` banner that prints on every command). Then retry. | | `unknown columns: [...]` from `agnes snapshot create` | `--select` lists columns that don't exist | Run `agnes schema
` and copy column names verbatim | ### Cost discipline — every BQ query bills bytes scanned A naive `SELECT * FROM ` can cost real money. ALWAYS: - filter via `--where` on the partition column (typically a date) — read `where_examples` in `agnes catalog --json` - list specific columns in `--select` — column-store BQ skips the rest - run `--estimate` first (only valid on `agnes snapshot create`) when the table is partitioned/clustered or when `rough_size_hint` is unknown ### `agnes snapshot create` discipline ``` # 1. ESTIMATE first — refuses to fetch without knowing the cost agnes snapshot create
--select col1,col2 --where "date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)" --estimate # 2. If reasonable, fetch as a named snapshot agnes snapshot create
--select col1,col2 --where "..." --as my_recent # 3. Query the local snapshot agnes query "SELECT col1, COUNT(*) FROM my_recent GROUP BY 1" # 4. List + drop snapshots when done agnes snapshot list agnes snapshot drop my_recent ``` Rules of thumb: - ALWAYS list specific columns in `--select`. Avoid implicit SELECT *. - ALWAYS include a `--where` for remote tables; otherwise add `--limit`. - ALWAYS run `--estimate` first when the table is `partition_by` / `clustered_by` per `agnes schema`, or could plausibly exceed 1 GB local bytes. - Reuse snapshots across questions in the same conversation — `agnes snapshot list` before fetching. ### Snapshot freshness — when to refresh Snapshots are point-in-time copies. They go stale as the source data updates (most BQ tables refresh daily; check `sync_schedule` per `agnes catalog`). For each new conversation: ``` agnes snapshot list # see existing snapshots + their ages agnes snapshot drop my_recent # drop stale ones agnes snapshot create
--select ... --where ... --as my_recent # re-fetch ``` If the question is time-sensitive (e.g. "today's orders"), assume any snapshot older than the table's `sync_schedule` is stale and refresh. ### Hybrid query example — local + remote in one query To join a local table with remote BigQuery data: `agnes snapshot create` a filtered slice of the remote table, then query the local join. Snapshot the remote side aggressively (WHERE + only the columns you need) so the cached parquet stays under ~100 MB; the join itself runs locally over the snapshot + the existing local table. ``` # 1. snapshot the remote side, narrow + aggregated agnes snapshot create web_sessions \ --select date,country,views \ --where "date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)" \ --as traffic # 2. local join (orders is a local table, traffic is the snapshot from step 1) agnes query "SELECT o.date, o.country, o.revenue, t.views, o.revenue / NULLIF(t.views, 0) AS rev_per_view FROM orders o JOIN traffic t ON o.date = t.date AND o.country = t.country ORDER BY 1 DESC" ``` If the remote side is too big to snapshot even after filtering, push the whole join server-side: `agnes query --remote ""` runs on the server where local tables and remote tables are already in the same DuckDB session. ### BigQuery SQL flavor for `--where` Source-typed `bigquery` tables use BigQuery dialect, not DuckDB: - Date literal: `DATE '2026-01-01'` - Timestamp literal: `TIMESTAMP '2026-01-01 00:00:00 UTC'` - Now: `CURRENT_DATE()`, `CURRENT_TIMESTAMP()` - Date arithmetic: `DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)` - Regex: `REGEXP_CONTAINS(col, r'pattern')` (raw string!) - Cast: `CAST(x AS INT64)` (NOT `INT`) ### When the table you want isn't in `agnes catalog` The table may exist in BigQuery but not be registered with Agnes yet. Tell the user the table isn't registered and ask an admin to register it with `query_mode: "remote"` so it shows up in `agnes catalog` and supports `agnes snapshot create` / `agnes query --remote`. Direct `bq.""."
"` paths are registry-gated server-side (403 `bq_path_not_registered`) — there's no analyst-side workaround for an unregistered table; registration is the right path. ### Deeper guidance For the full protocol, including snapshot hygiene and when NOT to use `agnes snapshot create`, run: ``` agnes skills show agnes-data-querying ``` ## Corporate Memory Rules injected by `agnes pull` from the server's corporate knowledge base live in `.claude/rules/km_*.md`. They are automatically loaded by Claude Code on every session start. - `km_.md` — mandatory rules (always enforced) - `km_approved.md` — approved guidance (confidence × recency ranked) Run `agnes pull` to refresh. Rules are pruned automatically when items are revoked. ## Directory Structure - `server/parquet/*.parquet` — synced table data (RBAC-filtered subset for you) - `user/duckdb/analytics.duckdb` — local analytics DuckDB views — what `agnes query` reads - `user/snapshots/*.parquet` — ad-hoc materialized snapshots from `agnes snapshot create` - `user/sessions/*.jsonl` — Claude Code session logs (uploaded on `agnes push`) - `.claude/CLAUDE.local.md` — your personal notes + workspace customizations. **Never overwritten by `agnes init --force`.** Uploaded to the server on `agnes push`. Put any local-only Claude instructions, project-specific reminders, or temporary notes here — NOT in CLAUDE.md (this file is regenerated from a template). _Hello {{ user.name or user.email }} — generated {{ today }}._