From d25d075ed2fd9235d78c2b61b01702bf4dab9a44 Mon Sep 17 00:00:00 2001
From: ZdenekSrotyr <zdenek.srotyr@keboola.com>
Date: Mon, 4 May 2026 17:51:14 +0200
Subject: [PATCH] docs(claude-md-template): rewrite verbs + paths for new CLI
 surface (Task 6)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

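- Verb renames: da X -> agnes X for surviving verbs, plus remaps where the
  verb itself changed: da analyst setup -> agnes init, da sync -> agnes pull,
  da sync --upload-only -> agnes push, da metrics list/show -> agnes catalog
  --metrics [--show], da fetch -> agnes snapshot create. Legacy verbs are
  already absent from this default template; admin overrides that still use
  them are caught by Task 2's _LEGACY_STRINGS scan + Task 5's admin banner.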
- Path renames: data/parquet/ -> server/parquet/, data/duckdb/ ->
  user/duckdb/, data/metadata/ removed entirely (no longer exists per spec),
  so the Rules caveat about data/metadata/schema.json is dropped as well.
- Drop user/artifacts/ from the directory structure and from Rules (the spec
  workspace layout drops it; surviving paths: server/parquet/, user/duckdb/,
  user/snapshots/, user/sessions/).
- Add AGNES_WORKSPACE.md pointer near top-of-template so analysts know
  where to find human-readable docs.
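For reviewers, here is a minimal sketch of a regression check for the strings this patch removes. It is an illustration only and assumes Python 3.9+. `LEGACY_PATTERNS` and `find_legacy` are hypothetical names. They are not the project's Task 2 `_LEGACY_STRINGS` scan, whose real contents may differ.

```python
import re

# Hypothetical pattern list for illustration; NOT the project's _LEGACY_STRINGS.
# Each pattern matches a legacy verb or path that this patch removes.
LEGACY_PATTERNS = [
    r"\bda (sync|fetch|metrics|catalog|schema|describe|query|snapshot|skills|analyst setup)\b",
    r"data/(parquet|duckdb|metadata)/",
    r"user/artifacts/",
]


def find_legacy(text: str) -> list[str]:
    """Return every legacy verb/path occurrence in rendered template text."""
    hits: list[str] = []
    for pattern in LEGACY_PATTERNS:
        hits.extend(m.group(0) for m in re.finditer(pattern, text))
    return hits


old = "Sync data regularly with `da sync`; outputs go to `user/artifacts/`."
new = "Sync data regularly with `agnes pull`."
print(find_legacy(old))  # ['da sync', 'user/artifacts/']
print(find_legacy(new))  # []
```

Run against the rendered template before and after this change, the check should flag the old verbs and paths only in the former.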

Covers this file, which Task 0.5's sweep missed because it lives outside the
cli/ tree (it is still user-visible via /api/welcome).

81 claude_md/welcome_template tests pass.
---
 config/claude_md_template.txt | 98 +++++++++++++++++------------------
 1 file changed, 48 insertions(+), 50 deletions(-)
diff --git a/config/claude_md_template.txt b/config/claude_md_template.txt
index 688f02e..8bdda8e 100644
--- a/config/claude_md_template.txt
+++ b/config/claude_md_template.txt
@@ -1,4 +1,4 @@
-{# Default analyst-onboarding workspace prompt for "da analyst setup".
+{# Default analyst-onboarding workspace prompt for "agnes init".
    Rendered server-side by src/claude_md.py. Edit this file to change
    the OSS default; admins override per-instance via /admin/workspace-prompt.
 
@@ -18,24 +18,25 @@
 This workspace is connected to {{ server.url }}.
 {% if instance.subtitle %}Operated by **{{ instance.subtitle }}**.{% endif %}
 
+> Looking for human-readable workspace docs? Open `AGNES_WORKSPACE.md` in this directory — that file documents what `agnes init` installed, where files live, and how to uninstall.
+
 ## Rules
-- Before computing any business metric: run `da metrics show <category>/<name>`
-- **For canonical table list with query modes: `da catalog`.** `data/metadata/schema.json` covers `query_mode: "local"` tables only — for remote/hybrid tables it's incomplete. Treat `da catalog` as source of truth.
-- Do not use DESCRIBE/SHOW COLUMNS — use `da schema <table>` instead
-- Save work output to `user/artifacts/`
-- Sync data regularly with `da sync`
-- **Personal customizations go in `.claude/CLAUDE.local.md`, NOT here.** This file is regenerated by `da analyst setup --force`; edits here will be lost. CLAUDE.local.md is preserved across regeneration and uploaded on `da sync --upload-only`.
+- Before computing any business metric: run `agnes catalog --metrics --show <category>/<name>`
+- **For the canonical table list with query modes, use `agnes catalog`.** Treat `agnes catalog` as source of truth (covers all `query_mode` values: `local`, `remote`, `materialized`).
+- Do not use DESCRIBE/SHOW COLUMNS — use `agnes schema <table>` instead
+- Sync data regularly with `agnes pull`
+- **Personal customizations go in `.claude/CLAUDE.local.md`, NOT here.** This file is regenerated by `agnes init --force`; edits here will be lost. CLAUDE.local.md is preserved across regeneration and uploaded on `agnes push`.
 
 ## Metrics Workflow
-1. `da metrics list` — find the relevant metric ({{ metrics.count }} available, categories: {{ metrics.categories | join(", ") or "none yet" }})
-2. `da metrics show <category>/<name>` — read SQL and business rules
+1. `agnes catalog --metrics` — find the relevant metric ({{ metrics.count }} available, categories: {{ metrics.categories | join(", ") or "none yet" }})
+2. `agnes catalog --metrics --show <category>/<name>` — read SQL and business rules
 3. Use the canonical SQL from the metric definition, adapt to the question
 4. Never invent metric calculations — always check existing definitions first
 
 ## Data Sync
-- `da sync` — download current data from server
-- `da sync --docs-only` — just metadata and metrics (fast refresh)
-- `da sync --upload-only` — upload sessions and local notes to server
+- `agnes pull` — download current data from server
+- `agnes pull --docs-only` — just metadata and metrics (fast refresh)
+- `agnes push` — upload sessions and local notes to server
 - Data on the server refreshes every {{ sync_interval }}
 
 ## Available Datasets
@@ -56,25 +57,25 @@ This workspace is connected to {{ server.url }}.
 
 Not every table is synced. Tables registered with `query_mode: "remote"` live in
 BigQuery, accessed server-side via DuckDB's BQ extension — no parquet on disk.
-Tables you don't see in `data/parquet/` may still be queryable.
+Tables you don't see in `server/parquet/` may still be queryable.
 
 ### Discovery first
 
 ```
-da catalog --json | jq '.[] | {name, source_type, query_mode}'   # see all tables + their modes
-da schema <table>                                                # columns + types
-da describe <table> -n 5                                         # sample rows
+agnes catalog --json | jq '.[] | {name, source_type, query_mode}'   # see all tables + their modes
+agnes schema <table>                                                # columns + types
+agnes describe <table> -n 5                                         # sample rows
 ```
 
-For local-mode tables, query directly with `da query "SELECT … FROM <table>"`.
+For local-mode tables, query directly with `agnes query "SELECT … FROM <table>"`.
 
 ### Three patterns for `query_mode: "remote"` tables
 
 | Pattern | Tool | Use when |
 |---------|------|----------|
-| **`da fetch`** (preferred) | materializes a filtered subset locally → query the snapshot | repeated questions on same slice |
-| **`da query --remote`** | one-shot, server-side execution against BigQuery (works for BASE TABLE rows directly + VIEW/MATERIALIZED_VIEW rows via the BQ jobs API; cost-guarded by a 5 GiB scan cap configurable in /admin/server-config) | single aggregate / cheap probe |
-| **`da query --register-bq`** | hybrid joins between local snapshots and ad-hoc BQ subqueries | crossing local + remote |
+| **`agnes snapshot create`** (preferred) | materializes a filtered subset locally → query the snapshot | repeated questions on same slice |
+| **`agnes query --remote`** | one-shot, server-side execution against BigQuery (works for BASE TABLE rows directly + VIEW/MATERIALIZED_VIEW rows via the BQ jobs API; cost-guarded by a 5 GiB scan cap configurable in /admin/server-config) | single aggregate / cheap probe |
+| **`agnes query --register-bq`** | hybrid joins between local snapshots and ad-hoc BQ subqueries | crossing local + remote |
 
 ### Permission model + cost — important
 
@@ -84,49 +85,49 @@ For local-mode tables, query directly with `da query "SELECT … FROM <table>"`.
   - list specific columns in `--select` — column-store BQ skips the rest, cheaper
   - run `--estimate` first when unsure of the table size or partitioning
 
-### `da fetch` discipline
+### `agnes snapshot create` discipline
 
 ```
 # 1. ESTIMATE first — refuses to fetch without knowing the cost
-da fetch <table> --select col1,col2 --where "date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)" --estimate
+agnes snapshot create <table> --select col1,col2 --where "date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)" --estimate
 
 # 2. If reasonable, fetch as a named snapshot
-da fetch <table> --select col1,col2 --where "..." --as my_recent
+agnes snapshot create <table> --select col1,col2 --where "..." --as my_recent
 
 # 3. Query the local snapshot
-da query "SELECT col1, COUNT(*) FROM my_recent GROUP BY 1"
+agnes query "SELECT col1, COUNT(*) FROM my_recent GROUP BY 1"
 
 # 4. List + drop snapshots when done
-da snapshot list
-da snapshot drop my_recent
+agnes snapshot list
+agnes snapshot drop my_recent
 ```
 
 Rules of thumb:
 - ALWAYS list specific columns in `--select`. Avoid implicit SELECT *.
 - ALWAYS include a `--where` for remote tables; otherwise add `--limit`.
 - ALWAYS run `--estimate` first when the table is `partition_by` / `clustered_by`
-  per `da schema`, or could plausibly exceed 1 GB local bytes.
-- Reuse snapshots across questions in the same conversation — `da snapshot list`
+  per `agnes schema`, or could plausibly exceed 1 GB once materialized locally.
+- Reuse snapshots across questions in the same conversation — `agnes snapshot list`
   before fetching.
 
 ### Snapshot freshness — when to refresh
 
-Snapshots are point-in-time copies. They go stale as the source data updates (most BQ tables refresh daily; check `sync_schedule` per `da catalog`). For each new conversation:
+Snapshots are point-in-time copies. They go stale as the source data updates (most BQ tables refresh daily; check `sync_schedule` per `agnes catalog`). For each new conversation:
 
 ```
-da snapshot list                            # see existing snapshots + their ages
-da snapshot drop my_recent                  # drop stale ones
-da fetch <table> --select ... --where ... --as my_recent   # re-fetch
+agnes snapshot list                            # see existing snapshots + their ages
+agnes snapshot drop my_recent                  # drop stale ones
+agnes snapshot create <table> --select ... --where ... --as my_recent   # re-fetch
 ```
 
 If the question is time-sensitive (e.g. "today's orders"), assume any snapshot older than the table's `sync_schedule` is stale and refresh.
 
 ### Hybrid query example — local + remote in one query
 
-`da query --register-bq` lets a single SQL statement join a local table with an ad-hoc BQ subquery. The BQ subquery runs first (server-side), result registered as a DuckDB view, then the joined query runs locally.
+`agnes query --register-bq` lets a single SQL statement join a local table with an ad-hoc BQ subquery. The BQ subquery runs first (server-side), result registered as a DuckDB view, then the joined query runs locally.
 
 ```
-da query \
+agnes query \
   --register-bq "traffic=SELECT date, country, SUM(views) AS views \
                  FROM \`prj.web_analytics.sessions\` \
                  WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) \
@@ -137,7 +138,7 @@ da query \
          ORDER BY 1 DESC"
 ```
 
-The BQ subquery MUST contain `WHERE` and/or `GROUP BY` to keep the registered result manageable (target: under 500K rows, well under 100 MB). Multiple `--register-bq` flags can compose multiple BQ sources. For complex SQL, use `--stdin` mode (`echo '{"register_bq":{...},"sql":"..."}' | da query --stdin`).
+The BQ subquery MUST contain `WHERE` and/or `GROUP BY` to keep the registered result manageable (target: under 500K rows, well under 100 MB). Multiple `--register-bq` flags can compose multiple BQ sources. For complex SQL, use `--stdin` mode (`echo '{"register_bq":{...},"sql":"..."}' | agnes query --stdin`).
 
 ### BigQuery SQL flavor for `--where`
 
@@ -150,46 +151,43 @@ Source-typed `bigquery` tables use BigQuery dialect, not DuckDB:
 - Regex: `REGEXP_CONTAINS(col, r'pattern')` (raw string!)
 - Cast: `CAST(x AS INT64)` (NOT `INT`)
 
-### When the table you want isn't in `da catalog`
+### When the table you want isn't in `agnes catalog`
 
 The table may exist in BigQuery but not be registered with Agnes yet. Two options:
 
 1. **Ad-hoc one-shot** — register a BQ subquery as a view inline, no admin needed
    if the agnes server SA has BQ access:
    ```
-   da query --register-bq "live=SELECT * FROM \`project.dataset.table\` WHERE date >= '...' LIMIT 1000" \
+   agnes query --register-bq "live=SELECT * FROM \`project.dataset.table\` WHERE date >= '...' LIMIT 1000" \
             --sql "SELECT * FROM live"
    ```
 2. **Ask admin to register** the table with `query_mode: "remote"` so it shows up
-   in `da catalog` and supports `da fetch` / `da query --remote`. This is the
+   in `agnes catalog` and supports `agnes snapshot create` / `agnes query --remote`. This is the
    right path for any table you'll query repeatedly.
 
 ### Deeper guidance
 
 For the full protocol, including hybrid-query examples, snapshot hygiene, and
-when NOT to use `da fetch`, run:
+when NOT to use `agnes snapshot create`, run:
 
 ```
-da skills show agnes-data-querying
+agnes skills show agnes-data-querying
 ```
 
 ## Corporate Memory
 
-Rules injected by `da sync` from the server's corporate knowledge base live in `.claude/rules/km_*.md`. They are automatically loaded by Claude Code on every session start.
+Rules injected by `agnes pull` from the server's corporate knowledge base live in `.claude/rules/km_*.md`. They are automatically loaded by Claude Code on every session start.
 
 - `km_<id>.md` — mandatory rules (always enforced)
 - `km_approved.md` — approved guidance (confidence × recency ranked)
 
-Run `da sync` to refresh. Rules are pruned automatically when items are revoked.
+Run `agnes pull` to refresh. Rules are pruned automatically when items are revoked.
 
 ## Directory Structure
-- `data/` — read-only data downloaded from server
-  - `data/parquet/` — table data in Parquet format
-  - `data/duckdb/` — local analytics DuckDB database
-  - `data/metadata/` — profiles, schema, metrics cache
-- `user/` — your workspace (persistent across syncs)
-  - `user/artifacts/` — analysis outputs, reports, charts
-  - `user/sessions/` — Claude Code session logs
-- `.claude/CLAUDE.local.md` — your personal notes + workspace customizations. **Never overwritten by `da analyst setup --force`.** Uploaded to the server on `da sync --upload-only`. Put any local-only Claude instructions, project-specific reminders, or temporary notes here — NOT in CLAUDE.md (this file is regenerated from a template).
+- `server/parquet/*.parquet` — synced table data (RBAC-filtered subset for you)
+- `user/duckdb/analytics.duckdb` — local DuckDB database of analytics views (this is what `agnes query` reads)
+- `user/snapshots/*.parquet` — ad-hoc materialized snapshots from `agnes snapshot create`
+- `user/sessions/*.jsonl` — Claude Code session logs (uploaded on `agnes push`)
+- `.claude/CLAUDE.local.md` — your personal notes + workspace customizations. **Never overwritten by `agnes init --force`.** Uploaded to the server on `agnes push`. Put any local-only Claude instructions, project-specific reminders, or temporary notes here — NOT in CLAUDE.md (this file is regenerated from a template).
 
 _Hello {{ user.name or user.email }} — generated {{ today }}._