docs(claude-md-template): rewrite verbs + paths for new CLI surface (Task 6)

- Verb renames (da X -> agnes X for surviving verbs; legacy verbs already
  absent from this default template — admin overrides with legacy verbs are
  caught by Task 2's _LEGACY_STRINGS scan + Task 5's admin banner).
- Path renames: data/parquet/ -> server/parquet/, data/duckdb/ ->
  user/duckdb/, data/metadata/ removed entirely (no longer exists per spec).
- Drop user/artifacts/ from directory structure (spec workspace layout
  drops it; surviving paths: server/parquet/, user/duckdb/, user/snapshots/,
  user/sessions/).
- Add AGNES_WORKSPACE.md pointer near top-of-template so analysts know
  where to find human-readable docs.

Cleans up what Task 0.5's sweep missed in this file (it is not in the cli/
tree but is user-visible via /api/welcome).

81 claude_md/welcome_template tests pass.
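
The renames above can be summarized as a lookup table. The following is a minimal Python sketch, not code from this commit: `LEGACY_TO_NEW` and `rewrite_legacy` are hypothetical names, and the mapping is assembled only from the old/new pairs visible in this diff. It shows one way to rewrite legacy strings in an admin override, trying longer commands first so that `da sync --upload-only` is not mistaken for `da sync`.

```python
import re

# Hypothetical mapping, assembled from the old/new pairs in this diff.
# data/metadata/ and user/artifacts/ were removed outright, so they have
# no replacement and are intentionally absent here.
LEGACY_TO_NEW = {
    "da analyst setup": "agnes init",
    "da metrics list": "agnes catalog --metrics",
    "da metrics show": "agnes catalog --metrics --show",
    "da sync --upload-only": "agnes push",
    "da sync --docs-only": "agnes pull --docs-only",
    "da sync": "agnes pull",
    "da fetch": "agnes snapshot create",
    "da catalog": "agnes catalog",
    "da schema": "agnes schema",
    "da describe": "agnes describe",
    "da query": "agnes query",
    "da snapshot": "agnes snapshot",
    "da skills": "agnes skills",
    "data/parquet/": "server/parquet/",
    "data/duckdb/": "user/duckdb/",
}

# Longest keys first, so the regex alternation prefers the most specific command.
_KEYS = sorted(LEGACY_TO_NEW, key=len, reverse=True)

# Keys ending in a word character get a trailing \b so "da sync" does not
# match inside "da syncing"; path keys end in "/" and need no boundary.
_PATTERN = re.compile(
    r"(?<![\w/])("
    + "|".join(re.escape(k) + (r"\b" if k[-1].isalnum() else "") for k in _KEYS)
    + r")"
)


def rewrite_legacy(text: str) -> str:
    """Replace every legacy verb or path in `text` with its new equivalent."""
    return _PATTERN.sub(lambda m: LEGACY_TO_NEW[m.group(1)], text)
```

The lookbehind keeps words that merely end in "da" (or paths nested under another directory) from being rewritten. Anything the mapping does not cover is left untouched.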
ZdenekSrotyr 2026-05-04 17:51:14 +02:00
parent a92c624dba
commit d25d075ed2


@@ -1,4 +1,4 @@
-{# Default analyst-onboarding workspace prompt for "da analyst setup".
+{# Default analyst-onboarding workspace prompt for "agnes init".
 Rendered server-side by src/claude_md.py. Edit this file to change
 the OSS default; admins override per-instance via /admin/workspace-prompt.
@@ -18,24 +18,25 @@
 This workspace is connected to {{ server.url }}.
 {% if instance.subtitle %}Operated by **{{ instance.subtitle }}**.{% endif %}
+> Looking for human-readable workspace docs? Open `AGNES_WORKSPACE.md` in this directory — that file documents what `agnes init` installed, where files live, and how to uninstall.
 ## Rules
-- Before computing any business metric: run `da metrics show <category>/<name>`
-- **For canonical table list with query modes: `da catalog`.** `data/metadata/schema.json` covers `query_mode: "local"` tables only — for remote/hybrid tables it's incomplete. Treat `da catalog` as source of truth.
-- Do not use DESCRIBE/SHOW COLUMNS — use `da schema <table>` instead
-- Save work output to `user/artifacts/`
-- Sync data regularly with `da sync`
-- **Personal customizations go in `.claude/CLAUDE.local.md`, NOT here.** This file is regenerated by `da analyst setup --force`; edits here will be lost. CLAUDE.local.md is preserved across regeneration and uploaded on `da sync --upload-only`.
+- Before computing any business metric: run `agnes catalog --metrics --show <category>/<name>`
+- **For canonical table list with query modes: `agnes catalog`.** Treat `agnes catalog` as source of truth (covers all `query_mode` values: `local`, `remote`, `materialized`).
+- Do not use DESCRIBE/SHOW COLUMNS — use `agnes schema <table>` instead
+- Sync data regularly with `agnes pull`
+- **Personal customizations go in `.claude/CLAUDE.local.md`, NOT here.** This file is regenerated by `agnes init --force`; edits here will be lost. CLAUDE.local.md is preserved across regeneration and uploaded on `agnes push`.
 ## Metrics Workflow
-1. `da metrics list` — find the relevant metric ({{ metrics.count }} available, categories: {{ metrics.categories | join(", ") or "none yet" }})
-2. `da metrics show <category>/<name>` — read SQL and business rules
+1. `agnes catalog --metrics` — find the relevant metric ({{ metrics.count }} available, categories: {{ metrics.categories | join(", ") or "none yet" }})
+2. `agnes catalog --metrics --show <category>/<name>` — read SQL and business rules
 3. Use the canonical SQL from the metric definition, adapt to the question
 4. Never invent metric calculations — always check existing definitions first
 ## Data Sync
-- `da sync` — download current data from server
-- `da sync --docs-only` — just metadata and metrics (fast refresh)
-- `da sync --upload-only` — upload sessions and local notes to server
+- `agnes pull` — download current data from server
+- `agnes pull --docs-only` — just metadata and metrics (fast refresh)
+- `agnes push` — upload sessions and local notes to server
 - Data on the server refreshes every {{ sync_interval }}
 ## Available Datasets
@@ -56,25 +57,25 @@ This workspace is connected to {{ server.url }}.
 Not every table is synced. Tables registered with `query_mode: "remote"` live in
 BigQuery, accessed server-side via DuckDB's BQ extension — no parquet on disk.
-Tables you don't see in `data/parquet/` may still be queryable.
+Tables you don't see in `server/parquet/` may still be queryable.
 ### Discovery first
 ```
-da catalog --json | jq '.[] | {name, source_type, query_mode}' # see all tables + their modes
-da schema <table> # columns + types
-da describe <table> -n 5 # sample rows
+agnes catalog --json | jq '.[] | {name, source_type, query_mode}' # see all tables + their modes
+agnes schema <table> # columns + types
+agnes describe <table> -n 5 # sample rows
 ```
-For local-mode tables, query directly with `da query "SELECT … FROM <table>"`.
+For local-mode tables, query directly with `agnes query "SELECT … FROM <table>"`.
 ### Three patterns for `query_mode: "remote"` tables
 | Pattern | Tool | Use when |
 |---------|------|----------|
-| **`da fetch`** (preferred) | materializes a filtered subset locally → query the snapshot | repeated questions on same slice |
-| **`da query --remote`** | one-shot, server-side execution against BigQuery (works for BASE TABLE rows directly + VIEW/MATERIALIZED_VIEW rows via the BQ jobs API; cost-guarded by a 5 GiB scan cap configurable in /admin/server-config) | single aggregate / cheap probe |
-| **`da query --register-bq`** | hybrid joins between local snapshots and ad-hoc BQ subqueries | crossing local + remote |
+| **`agnes snapshot create`** (preferred) | materializes a filtered subset locally → query the snapshot | repeated questions on same slice |
+| **`agnes query --remote`** | one-shot, server-side execution against BigQuery (works for BASE TABLE rows directly + VIEW/MATERIALIZED_VIEW rows via the BQ jobs API; cost-guarded by a 5 GiB scan cap configurable in /admin/server-config) | single aggregate / cheap probe |
+| **`agnes query --register-bq`** | hybrid joins between local snapshots and ad-hoc BQ subqueries | crossing local + remote |
 ### Permission model + cost — important
@@ -84,49 +85,49 @@ For local-mode tables, query directly with `da query "SELECT … FROM <table>"`.
 - list specific columns in `--select` — column-store BQ skips the rest, cheaper
 - run `--estimate` first when unsure of the table size or partitioning
-### `da fetch` discipline
+### `agnes snapshot create` discipline
 ```
 # 1. ESTIMATE first — refuses to fetch without knowing the cost
-da fetch <table> --select col1,col2 --where "date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)" --estimate
+agnes snapshot create <table> --select col1,col2 --where "date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)" --estimate
 # 2. If reasonable, fetch as a named snapshot
-da fetch <table> --select col1,col2 --where "..." --as my_recent
+agnes snapshot create <table> --select col1,col2 --where "..." --as my_recent
 # 3. Query the local snapshot
-da query "SELECT col1, COUNT(*) FROM my_recent GROUP BY 1"
+agnes query "SELECT col1, COUNT(*) FROM my_recent GROUP BY 1"
 # 4. List + drop snapshots when done
-da snapshot list
-da snapshot drop my_recent
+agnes snapshot list
+agnes snapshot drop my_recent
 ```
 Rules of thumb:
 - ALWAYS list specific columns in `--select`. Avoid implicit SELECT *.
 - ALWAYS include a `--where` for remote tables; otherwise add `--limit`.
 - ALWAYS run `--estimate` first when the table is `partition_by` / `clustered_by`
-per `da schema`, or could plausibly exceed 1 GB local bytes.
-- Reuse snapshots across questions in the same conversation — `da snapshot list`
+per `agnes schema`, or could plausibly exceed 1 GB local bytes.
+- Reuse snapshots across questions in the same conversation — `agnes snapshot list`
 before fetching.
 ### Snapshot freshness — when to refresh
-Snapshots are point-in-time copies. They go stale as the source data updates (most BQ tables refresh daily; check `sync_schedule` per `da catalog`). For each new conversation:
+Snapshots are point-in-time copies. They go stale as the source data updates (most BQ tables refresh daily; check `sync_schedule` per `agnes catalog`). For each new conversation:
 ```
-da snapshot list # see existing snapshots + their ages
-da snapshot drop my_recent # drop stale ones
-da fetch <table> --select ... --where ... --as my_recent # re-fetch
+agnes snapshot list # see existing snapshots + their ages
+agnes snapshot drop my_recent # drop stale ones
+agnes snapshot create <table> --select ... --where ... --as my_recent # re-fetch
 ```
 If the question is time-sensitive (e.g. "today's orders"), assume any snapshot older than the table's `sync_schedule` is stale and refresh.
 ### Hybrid query example — local + remote in one query
-`da query --register-bq` lets a single SQL statement join a local table with an ad-hoc BQ subquery. The BQ subquery runs first (server-side), result registered as a DuckDB view, then the joined query runs locally.
+`agnes query --register-bq` lets a single SQL statement join a local table with an ad-hoc BQ subquery. The BQ subquery runs first (server-side), result registered as a DuckDB view, then the joined query runs locally.
 ```
-da query \
+agnes query \
 --register-bq "traffic=SELECT date, country, SUM(views) AS views \
 FROM \`prj.web_analytics.sessions\` \
 WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) \
@@ -137,7 +138,7 @@ da query \
 ORDER BY 1 DESC"
 ```
-The BQ subquery MUST contain `WHERE` and/or `GROUP BY` to keep the registered result manageable (target: under 500K rows, well under 100 MB). Multiple `--register-bq` flags can compose multiple BQ sources. For complex SQL, use `--stdin` mode (`echo '{"register_bq":{...},"sql":"..."}' | da query --stdin`).
+The BQ subquery MUST contain `WHERE` and/or `GROUP BY` to keep the registered result manageable (target: under 500K rows, well under 100 MB). Multiple `--register-bq` flags can compose multiple BQ sources. For complex SQL, use `--stdin` mode (`echo '{"register_bq":{...},"sql":"..."}' | agnes query --stdin`).
 ### BigQuery SQL flavor for `--where`
@@ -150,46 +151,43 @@ Source-typed `bigquery` tables use BigQuery dialect, not DuckDB:
 - Regex: `REGEXP_CONTAINS(col, r'pattern')` (raw string!)
 - Cast: `CAST(x AS INT64)` (NOT `INT`)
-### When the table you want isn't in `da catalog`
+### When the table you want isn't in `agnes catalog`
 The table may exist in BigQuery but not be registered with Agnes yet. Two options:
 1. **Ad-hoc one-shot** — register a BQ subquery as a view inline, no admin needed
 if the agnes server SA has BQ access:
 ```
-da query --register-bq "live=SELECT * FROM \`project.dataset.table\` WHERE date >= '...' LIMIT 1000" \
+agnes query --register-bq "live=SELECT * FROM \`project.dataset.table\` WHERE date >= '...' LIMIT 1000" \
 --sql "SELECT * FROM live"
 ```
 2. **Ask admin to register** the table with `query_mode: "remote"` so it shows up
-in `da catalog` and supports `da fetch` / `da query --remote`. This is the
+in `agnes catalog` and supports `agnes snapshot create` / `agnes query --remote`. This is the
 right path for any table you'll query repeatedly.
 ### Deeper guidance
 For the full protocol, including hybrid-query examples, snapshot hygiene, and
-when NOT to use `da fetch`, run:
+when NOT to use `agnes snapshot create`, run:
 ```
-da skills show agnes-data-querying
+agnes skills show agnes-data-querying
 ```
 ## Corporate Memory
-Rules injected by `da sync` from the server's corporate knowledge base live in `.claude/rules/km_*.md`. They are automatically loaded by Claude Code on every session start.
+Rules injected by `agnes pull` from the server's corporate knowledge base live in `.claude/rules/km_*.md`. They are automatically loaded by Claude Code on every session start.
 - `km_<id>.md` — mandatory rules (always enforced)
 - `km_approved.md` — approved guidance (confidence × recency ranked)
-Run `da sync` to refresh. Rules are pruned automatically when items are revoked.
+Run `agnes pull` to refresh. Rules are pruned automatically when items are revoked.
 ## Directory Structure
-- `data/` — read-only data downloaded from server
-- `data/parquet/` — table data in Parquet format
-- `data/duckdb/` — local analytics DuckDB database
-- `data/metadata/` — profiles, schema, metrics cache
-- `user/` — your workspace (persistent across syncs)
-- `user/artifacts/` — analysis outputs, reports, charts
-- `user/sessions/` — Claude Code session logs
-- `.claude/CLAUDE.local.md` — your personal notes + workspace customizations. **Never overwritten by `da analyst setup --force`.** Uploaded to the server on `da sync --upload-only`. Put any local-only Claude instructions, project-specific reminders, or temporary notes here — NOT in CLAUDE.md (this file is regenerated from a template).
+- `server/parquet/*.parquet` — synced table data (RBAC-filtered subset for you)
+- `user/duckdb/analytics.duckdb` — local analytics DuckDB views — what `agnes query` reads
+- `user/snapshots/*.parquet` — ad-hoc materialized snapshots from `agnes snapshot create`
+- `user/sessions/*.jsonl` — Claude Code session logs (uploaded on `agnes push`)
+- `.claude/CLAUDE.local.md` — your personal notes + workspace customizations. **Never overwritten by `agnes init --force`.** Uploaded to the server on `agnes push`. Put any local-only Claude instructions, project-specific reminders, or temporary notes here — NOT in CLAUDE.md (this file is regenerated from a template).
 _Hello {{ user.name or user.email }} — generated {{ today }}._