docs(testing): E2E verification plan for clean-analyst-bootstrap (PR #173)

2026-05-04 19:41:50 +02:00 · d311b07d5d
commit d311b07d5d
parent 5bffec641f
1 changed files with 349 additions and 0 deletions
--- a/docs/testing/e2e_clean_analyst_bootstrap.md
+++ b/docs/testing/e2e_clean_analyst_bootstrap.md
@@ -0,0 +1,349 @@
+# E2E Verification: clean-analyst-bootstrap (PR #173)
+
+End-to-end verification of the clean-analyst-bootstrap rewrite on a deployed
+VM. Designed for parallel sub-agent dispatch — Phase 0 prerequisites run
+sequentially, then 10 parallel slices in a single Claude Code message, then
+sequential hook test, then aggregation.
+
+**Estimated runtime:** ~15-20 min total (5 min Phase 0 + 5-10 min Phase 1
+parallel + 3 min Phase 2 + 1 min aggregation).
+
+**How to use:** open Claude Code on the VM in any cwd; paste this entire
+document into the conversation. Claude Code will execute Phase 0, then
+dispatch the 10 slices in parallel via `Agent` tool calls in a single
+message, then guide you through Phase 2.
+
+## Prerequisites — fill these in before starting
+
+- **Server URL:** `https://<your-agnes-host>`
+- **Active user:** confirm via web login (Google OAuth or password works)
+- **Test PAT:** mint via web `/setup?role=analyst` → click Generate prompt → copy clipboard → extract the PAT. Save as `$TEST_PAT`.
+- **Picked test tables:**
+  - `LOCAL_TABLE`: any `query_mode='local'` table from `agnes catalog --json`
+  - `REMOTE_TABLE`: any `query_mode='remote'` BigQuery table (preferably small or partitioned)
+  - `REMOTE_WHERE`: a simple WHERE clause that filters `REMOTE_TABLE` to a small subset (e.g., `event_date = DATE '2026-01-01'`)
+  - `REMOTE_SELECT`: 1-2 columns to select from `REMOTE_TABLE`
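+
+The values above are easy to leave unset, and an unset variable silently
+becomes an empty argument in later commands. A minimal sketch of a guard,
+assuming each value is exported under the name listed here; `require_vars` is
+a hypothetical helper, not part of the agnes CLI:
+
+```bash
+# Hypothetical helper: fail if any named variable is unset or empty.
+require_vars() {
+  local name value missing=0
+  for name in "$@"; do
+    # Indirect lookup; names must be plain shell identifiers.
+    eval "value=\${$name:-}"
+    if [ -z "$value" ]; then
+      echo "unset: $name" >&2
+      missing=1
+    fi
+  done
+  return "$missing"
+}
+```
+
+Before dispatching Phase 1, something like
+`require_vars SERVER_URL TEST_PAT LOCAL_TABLE REMOTE_TABLE REMOTE_WHERE REMOTE_SELECT || exit 1`
+would catch a missing value early.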
+
+---
+
+## Phase 0 — Prerequisites (sequential, ~5 min)
+
+```bash
+# 1. Export the server URL first; every later step reads $SERVER_URL
+export SERVER_URL="https://<your-agnes-host>"
+
+# 2. Confirm server reachable + on the new build
+curl -fsS "$SERVER_URL/api/health" | head -3
+curl -fsS "$SERVER_URL/cli/version" 2>/dev/null || true   # should mention 0.32.0+ post-rename
+
+# 3. Confirm /setup?role=analyst renders the role tiles
+curl -fsS "$SERVER_URL/setup?role=analyst" | grep -E "role-tiles|Analyst workspace|const ROLE" | head -5
+# Expect: at least 3 matches (CSS class, tile heading, JS const)
+
+# 4. Mint analyst PAT (see Prerequisites above) and export it
+export TEST_PAT="agnes_pat_..."
+
+# 5. Bootstrap the BASE workspace that read-only slices share
+mkdir -p /tmp/agnes-e2e-base
+cd /tmp/agnes-e2e-base
+# Paste the install prompt verbatim from /setup?role=analyst.
+# After it finishes:
+tree -aL 2 /tmp/agnes-e2e-base | head -30
+```
+
+**Gate:** if the base workspace doesn't have the expected shape (`CLAUDE.md`,
+`AGNES_WORKSPACE.md`, `.claude/settings.json`, `user/duckdb/analytics.duckdb`),
+STOP and report the failure mode — the rest of the plan assumes a working
+base workspace.
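+
+The gate can be checked mechanically rather than by eyeballing `tree` output.
+A minimal sketch, assuming only the four paths named above; the function name
+`check_base_workspace` is hypothetical:
+
+```bash
+# Hypothetical helper: return 0 only if every gate path exists under $1.
+check_base_workspace() {
+  local root="$1" missing=0 path
+  for path in CLAUDE.md AGNES_WORKSPACE.md .claude/settings.json \
+              user/duckdb/analytics.duckdb; do
+    if [ ! -e "$root/$path" ]; then
+      echo "MISSING: $path" >&2
+      missing=1
+    fi
+  done
+  return "$missing"
+}
+```
+
+Usage: `check_base_workspace /tmp/agnes-e2e-base || echo "STOP: base workspace incomplete"`.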
+
+---
+
+## Phase 1 — Parallel slices (dispatch ALL 10 in ONE message, ~5-10 min)
+
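+For each slice, dispatch a `general-purpose` Agent. **Send all 10 Agent tool
+calls in a single response message** so they run concurrently. Substitute
+`SERVER_URL` and `TEST_PAT` from Phase 0, plus the `LOCAL_TABLE` and `REMOTE_*`
+values from Prerequisites, into each prompt so every slice is self-contained.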
+
+### Slice 1 — Web UI + role tiles + paste-prompt content
+
+```text
+Verify the /setup?role=analyst and /setup?role=admin pages on $SERVER_URL.
+
+Required checks:
+- /setup?role=analyst renders 2 role tiles (Analyst workspace + Admin CLI);
+  Analyst tile has is-active class, Admin tile is inactive.
+- Page contains `const ROLE = "analyst"` (JSON-escaped form, with quotes).
+- The JS array `SETUP_INSTRUCTIONS_TEMPLATE` contains `agnes init` and
+  `agnes catalog`, does NOT contain `claude plugin marketplace add`,
+  `agnes auth import-token`, or `agnes diagnose`.
+- /setup?role=admin: admin tile active, prompt contains `agnes auth import-token`.
+- /install (no query) returns 302 to /setup.
+
+Tools: curl + grep/python regex.
+Report: PASS/FAIL per check + 5 lines of the rendered prompt for both roles.
+```
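+
+The contains / does-not-contain checks on the analyst prompt could be scripted
+along these lines. This is a sketch under one assumption: the rendered prompt
+text has already been extracted from the page into a local file. The helper
+name `check_analyst_prompt` is hypothetical.
+
+```bash
+# Hypothetical helper: $1 is a file holding the extracted analyst prompt text.
+check_analyst_prompt() {
+  local file="$1" rc=0 s
+  # Strings the analyst prompt must contain.
+  for s in 'agnes init' 'agnes catalog'; do
+    if ! grep -qF -- "$s" "$file"; then
+      echo "FAIL: missing '$s'" >&2
+      rc=1
+    fi
+  done
+  # Strings the analyst prompt must not contain.
+  for s in 'claude plugin marketplace add' 'agnes auth import-token' \
+           'agnes diagnose'; do
+    if grep -qF -- "$s" "$file"; then
+      echo "FAIL: forbidden '$s'" >&2
+      rc=1
+    fi
+  done
+  return "$rc"
+}
+```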
+
+### Slice 2 — `agnes init` workspace inventory
+
+```text
+In /tmp/agnes-e2e-init (NEW empty folder):
+  agnes init --server-url $SERVER_URL --token $TEST_PAT --workspace .
+
+Verify EXACTLY this file set is present:
+- CLAUDE.md (non-empty, contains "agnes pull")
+- AGNES_WORKSPACE.md (does NOT contain $TEST_PAT, no "{placeholder}" leaks,
+  contains $SERVER_URL, contains the absolute workspace path,
+  has 6 H2 sections matching ^## )
+- .claude/settings.json (hooks.SessionStart with `agnes pull --quiet`,
+  hooks.SessionEnd with `agnes push --quiet`, has `model` and `permissions.allow`)
+- .claude/CLAUDE.local.md (stub content "# My Notes")
+- user/duckdb/analytics.duckdb (file exists, non-zero size)
+
+Verify NONE of these exist:
+  data/parquet, data/duckdb, data/metadata, user/artifacts, .agnes
+
+Conditional dirs (per lazy-mkdir contract):
+- if server/parquet/ exists, must contain ≥1 .parquet
+- if .claude/rules/ exists, must contain ≥1 km_*.md
+
+Run: tree -aL 3 /tmp/agnes-e2e-init
+Report: full file inventory + PASS/FAIL per assertion.
+```
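+
+The "must not exist" list and the lazy-mkdir conditions lend themselves to a
+small script. A sketch, assuming the conditional directories may hold their
+files at any depth (the plan does not say); `check_init_dirs` is a
+hypothetical name:
+
+```bash
+# Hypothetical helper: $1 is the freshly initialized workspace root.
+check_init_dirs() {
+  local root="$1" rc=0 d
+  # Dead directories that the rewrite should no longer create.
+  for d in data/parquet data/duckdb data/metadata user/artifacts .agnes; do
+    if [ -e "$root/$d" ]; then
+      echo "FAIL: unexpected $d" >&2
+      rc=1
+    fi
+  done
+  # Lazy-mkdir contract: a conditional dir may be absent, but never empty.
+  if [ -d "$root/server/parquet" ] &&
+     [ -z "$(find "$root/server/parquet" -name '*.parquet' | head -n 1)" ]; then
+    echo "FAIL: server/parquet has no .parquet file" >&2
+    rc=1
+  fi
+  if [ -d "$root/.claude/rules" ] &&
+     [ -z "$(find "$root/.claude/rules" -name 'km_*.md' | head -n 1)" ]; then
+    echo "FAIL: .claude/rules has no km_*.md file" >&2
+    rc=1
+  fi
+  return "$rc"
+}
+```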
+
+### Slice 3 — Reader smoke matrix (read-only against base workspace)
+
+```text
+cd /tmp/agnes-e2e-base
+For each command, capture exit code + first stderr line.
+Forbidden: any "Traceback" in stderr.
+
+  agnes catalog
+  agnes catalog --json
+  agnes catalog --metrics
+  agnes schema $LOCAL_TABLE
+  agnes describe $LOCAL_TABLE -n 5
+  agnes status
+  agnes status --json
+  agnes diagnose
+  agnes diagnose system
+  agnes disk-info
+  agnes auth whoami
+  agnes skills list
+  agnes skills show agnes-data-querying
+
+Bad-input no-crash:
+  agnes schema __nonexistent__
+  agnes describe __nonexistent__
+  agnes explore __nonexistent__
+
+PASS = all rc ∈ {0, 1}, no Traceback.
+Report: command-by-command rc + traceback-Y/N + first stderr line.
+```
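+
+The PASS rule (rc in {0, 1}, no Traceback) can be applied uniformly with a
+small wrapper. A sketch; `smoke` is a hypothetical name and the report format
+is only a suggestion:
+
+```bash
+# Hypothetical wrapper: run a command, print one report line, and return 0
+# only if the exit code is 0 or 1 and stderr contains no "Traceback".
+smoke() {
+  local errfile rc tb=N first
+  errfile=$(mktemp)
+  "$@" >/dev/null 2>"$errfile" && rc=0 || rc=$?
+  if grep -q 'Traceback' "$errfile"; then tb=Y; fi
+  first=$(head -n 1 "$errfile")
+  rm -f "$errfile"
+  printf '%s | rc=%s | traceback=%s | %s\n' "$*" "$rc" "$tb" "$first"
+  [ "$rc" -le 1 ] && [ "$tb" = N ]
+}
+```
+
+Usage: `smoke agnes schema __nonexistent__ || echo "slice 3 FAIL"`.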
+
+### Slice 4 — Local + remote query paths
+
+```text
+cd /tmp/agnes-e2e-base
+
+Local (parquet/DuckDB) path:
+  agnes query "SELECT count(*) FROM $LOCAL_TABLE LIMIT 1"
+  agnes query "SELECT * FROM $LOCAL_TABLE LIMIT 3"
+  agnes query --json "SELECT count(*) FROM $LOCAL_TABLE"
+  agnes explore $LOCAL_TABLE   # friendly even in non-TTY
+
+Remote (BigQuery passthrough) path:
+  agnes query --remote "SELECT count(*) FROM $REMOTE_TABLE LIMIT 1"
+  agnes query --remote "SELECT count(*) FROM $REMOTE_TABLE" --limit 1
+
+Report: per-query rc, first 5 lines of stdout, traceback-Y/N.
+Flag any 502 / cost-guardrail / typed-error envelopes.
+```
+
+### Slice 5 — Snapshot lifecycle
+
+```text
+cd /tmp/agnes-e2e-base
+
+  agnes snapshot list                                            # baseline
+  agnes snapshot create $REMOTE_TABLE --select $REMOTE_SELECT --where "$REMOTE_WHERE" --as e2e_test_snap --estimate
+  agnes snapshot create $REMOTE_TABLE --select $REMOTE_SELECT --where "$REMOTE_WHERE" --as e2e_test_snap
+  agnes snapshot list                                            # should show e2e_test_snap
+  agnes query "SELECT count(*) FROM e2e_test_snap"
+  agnes snapshot refresh e2e_test_snap
+  agnes snapshot drop e2e_test_snap
+  agnes snapshot list                                            # back to baseline
+
+PASS = clean lifecycle, snapshot file present after create + absent after drop.
+Report: per-step rc + first stdout line.
+```
+
+### Slice 6 — Force / protection scenarios
+
+```text
+Setup:
+  cp -r /tmp/agnes-e2e-base /tmp/agnes-e2e-force
+  cd /tmp/agnes-e2e-force
+  echo "# my private edit" > .claude/CLAUDE.local.md
+
+Test 1: re-init without --force should refuse
+  agnes init --server-url $SERVER_URL --token $TEST_PAT --workspace .
+  Expect rc=1, stderr contains "already initialized" or "partial_state",
+  no Traceback.
+
+Test 2: --force regenerates CLAUDE.md but PRESERVES CLAUDE.local.md
+  agnes init --server-url $SERVER_URL --token $TEST_PAT --workspace . --force
+  Expect rc=0
+  Then: cat .claude/CLAUDE.local.md  → must still contain "# my private edit"
+
+Report: PASS/FAIL per test + grep result for the private edit marker.
+```
+
+### Slice 7 — Pre-init reader smoke (no-traceback contract)
+
+```text
+mkdir -p /tmp/agnes-e2e-pre && cd /tmp/agnes-e2e-pre
+
+For each, capture rc + stderr:
+  agnes query "SELECT 1"
+  agnes snapshot create __nope__ --as x --estimate
+  agnes explore foo
+  agnes snapshot list
+  agnes status
+  agnes catalog
+  agnes disk-info
+
+Forbidden: any "Traceback" in stderr.
+Expected: rc=1 with friendly hint mentioning "agnes init" or "agnes pull".
+
+Report: per-command rc + traceback-Y/N + did-hint-mention-init-or-pull.
+```
+
+### Slice 8 — Auth + token lifecycle
+
+```text
+agnes auth whoami     # should print email + role
+
+# Token CRUD
+agnes auth token list
+TID=$(agnes auth token create e2e-test --expires-in-days 1 | grep -oE 'tok_[a-zA-Z0-9_-]+' | head -1)
+agnes auth token list   # new token visible
+agnes auth token revoke "$TID"
+agnes auth token list   # revoked_at populated for $TID
+
+# Bad token → friendly 401
+AGNES_TOKEN=fake-pat agnes catalog 2>&1 | tail -5
+# Expect friendly hint, no Traceback
+
+Report: PASS/FAIL per step + the captured TID.
+```
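+
+If `token create` prints nothing that matches the `tok_` pattern, `$TID` ends
+up empty and the later `revoke` call runs with an empty argument. A sketch of a
+guarded extraction that reuses the same pattern; `extract_token_id` is a
+hypothetical helper and the shape of the CLI output is an assumption:
+
+```bash
+# Hypothetical helper: read CLI output on stdin, print the first tok_... ID,
+# and return non-zero if none is found.
+extract_token_id() {
+  local tid
+  tid=$(grep -oE 'tok_[a-zA-Z0-9_-]+' | head -n 1) || true
+  [ -n "$tid" ] || return 1
+  printf '%s\n' "$tid"
+}
+```
+
+Usage: `TID=$(agnes auth token create e2e-test --expires-in-days 1 | extract_token_id) || echo "no token id in output"`.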
+
+### Slice 9 — Admin metrics + catalog --metrics
+
+```text
+# Read paths (any analyst)
+agnes catalog --metrics
+agnes catalog --metrics --show revenue/mrr   # adjust to a real metric path
+
+# Write paths (admin only — skip if the test user isn't admin)
+agnes admin metrics --help    # surface check (import/export/validate listed)
+
+If admin:
+  agnes admin metrics validate
+  agnes admin metrics export /tmp/metrics-backup
+  ls /tmp/metrics-backup/ | head
+
+Report: per-command rc + first stdout line.
+```
+
+### Slice 10 — AGNES_WORKSPACE.md content quality
+
+```text
+cd /tmp/agnes-e2e-base
+head -100 AGNES_WORKSPACE.md
+
+Programmatic checks:
+- Contains "Created:" header line with ISO timestamp
+- Contains "Server:" line with $SERVER_URL
+- Contains "Workspace:" line with absolute path
+- $TEST_PAT does NOT appear anywhere in the file
+- No literal "{created_at}", "{server_url}", "{workspace_path}" substrings
+- Has exactly 6 H2 sections (^## )
+- Cheat sheet section mentions: agnes catalog, agnes query, agnes pull,
+  agnes snapshot create, agnes status
+- Uninstall section mentions: uv tool uninstall, ~/.config/agnes, ~/.agnes
+- For each path in the "Globally installed" table that's a real file path
+  (~/.local/bin/agnes, ~/.config/agnes/{config.yaml,token.json}):
+  test -e and report exists/missing.
+
+Report: each assertion PASS/FAIL.
+```
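+
+Three of the programmatic checks (PAT leak, unrendered placeholders, H2
+count) need nothing beyond `grep`. A minimal sketch; `check_workspace_md` is a
+hypothetical name, and the header and cheat-sheet checks are left out:
+
+```bash
+# Hypothetical helper: $1 = path to AGNES_WORKSPACE.md, $2 = the test PAT.
+check_workspace_md() {
+  local file="$1" pat="$2" rc=0 h2
+  if [ -z "$pat" ]; then
+    echo "usage: check_workspace_md FILE PAT" >&2
+    return 2
+  fi
+  if grep -qF -- "$pat" "$file"; then
+    echo "FAIL: PAT appears in file" >&2
+    rc=1
+  fi
+  if grep -qF -e '{created_at}' -e '{server_url}' -e '{workspace_path}' "$file"; then
+    echo "FAIL: unrendered placeholder" >&2
+    rc=1
+  fi
+  h2=$(grep -cE '^## ' "$file") || true
+  if [ "$h2" -ne 6 ]; then
+    echo "FAIL: expected 6 H2 sections, found $h2" >&2
+    rc=1
+  fi
+  return "$rc"
+}
+```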
+
+---
+
+## Phase 2 — Hook behavior (sequential, ~3 min)
+
+Sub-agents can't open Claude Code sessions, so this part is manual.
+
+```bash
+cd /tmp/agnes-e2e-base
+claude                    # opens Claude Code; SessionStart hook fires `agnes pull --quiet`
+
+# Inside Claude Code, ask: "show me 5 rows of $LOCAL_TABLE"  → should work without errors
+
+/exit                     # SessionEnd hook fires `agnes push --quiet`
+
+ls /tmp/agnes-e2e-base/user/sessions/    # should be non-empty (transcript captured)
+
+# Re-enter to verify SessionStart fires again
+claude
+/exit
+```
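+
+The manual session only proves something if both hooks are registered in the
+first place, so a quick pre-check of `.claude/settings.json` may save a wasted
+run. This sketch is a plain text match that assumes the hook names and
+commands appear literally in the JSON; it does not validate the structure.
+`check_hooks` is a hypothetical name.
+
+```bash
+# Hypothetical helper: $1 = path to .claude/settings.json.
+check_hooks() {
+  local settings="$1"
+  grep -q '"SessionStart"' "$settings" &&
+    grep -q 'agnes pull --quiet' "$settings" &&
+    grep -q '"SessionEnd"' "$settings" &&
+    grep -q 'agnes push --quiet' "$settings"
+}
+```
+
+Usage: `check_hooks /tmp/agnes-e2e-base/.claude/settings.json || echo "hooks not wired"`.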
+
+**Verify in server audit log:** 2× `agnes pull` GETs (one per session start)
+and 2× `agnes push` POSTs (one per session end). Tail the server audit endpoint
+or DB table to confirm.
+
+---
+
+## Phase 3 — Aggregation
+
+Compile a single PASS/FAIL table:
+
+| Slice | Status | Notes |
+|---|---|---|
+| 1 — Web UI role tiles | … | … |
+| 2 — agnes init inventory | … | … |
+| 3 — Reader smoke matrix | … | … |
+| 4 — Query paths | … | … |
+| 5 — Snapshot lifecycle | … | … |
+| 6 — Force / protection | … | … |
+| 7 — Pre-init no-traceback | … | … |
+| 8 — Auth + tokens | … | … |
+| 9 — Admin metrics | … | … |
+| 10 — AGNES_WORKSPACE.md | … | … |
+| Hooks (Phase 2) | … | … |
+
+For any FAIL: preserve the failing folder/output and report the exact command
+plus the first traceback or first stderr line. Don't fix in place; flag for follow-up.
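+
+If each slice report is reduced to one `slice|status|notes` line, the table
+can be generated instead of typed. A sketch, assuming that pipe-separated
+input format (it is not something the slices emit by themselves); `aggregate`
+is a hypothetical name:
+
+```bash
+# Hypothetical helper: read "slice|status|notes" lines on stdin, print the
+# Markdown table plus an overall verdict, return non-zero on any non-PASS row.
+aggregate() {
+  local overall=PASS slice status notes
+  echo '| Slice | Status | Notes |'
+  echo '|---|---|---|'
+  while IFS='|' read -r slice status notes; do
+    [ -n "$slice" ] || continue
+    printf '| %s | %s | %s |\n' "$slice" "$status" "$notes"
+    [ "$status" = PASS ] || overall=FAIL
+  done
+  echo "Overall: $overall"
+  [ "$overall" = PASS ]
+}
+```
+
+Usage: `aggregate < results.txt`, where `results.txt` is a file you assemble
+from the slice reports.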
+
+---
+
+## Cleanup after testing
+
+```bash
+rm -rf /tmp/agnes-e2e-base /tmp/agnes-e2e-init /tmp/agnes-e2e-force /tmp/agnes-e2e-pre /tmp/metrics-backup
+agnes auth token revoke "<e2e-test PAT id>"   # placeholder: substitute the real token ID
+```
+
+---
+
+## Slice priority
+
+If time is constrained, run these load-bearing slices first:
+
+1. **Slice 1** (Web UI role tiles) — proves the web entry point works.
+2. **Slice 2** (agnes init inventory) — proves the bootstrap creates the
+   exact expected file set with no dead dirs.
+3. **Slice 7** (pre-init no-traceback) — proves the reader contract holds.
+4. **Slice 10** (AGNES_WORKSPACE.md content) — proves the human-facing
+   docs render correctly with no PAT leak.
+
+Slices 3-6, 8-9 are breadth coverage — important but lower-priority.