# E2E Verification: clean-analyst-bootstrap (PR #173)

End-to-end verification of the clean-analyst-bootstrap rewrite on a deployed
VM. Designed for parallel sub-agent dispatch — Phase 0 prerequisites run
sequentially, then 10 parallel slices in a single Claude Code message, then
sequential hook test, then aggregation.

**Estimated runtime:** ~15-20 min total (5 min Phase 0 + 5-10 min Phase 1
parallel + 3 min Phase 2 + 1 min aggregation).

**How to use:** open Claude Code on the VM in any cwd; paste this entire
document into the conversation. Claude Code will execute Phase 0, then
dispatch the 10 slices in parallel via `Agent` tool calls in a single
message, then guide you through Phase 2.

## Prerequisites — fill these in before starting

- **Server URL:** `https://<your-agnes-host>`
- **Active user:** confirm via web login (Google OAuth or password works)
- **Test PAT:** mint via web `/setup?role=analyst` → click Generate prompt → copy to clipboard → extract the PAT. Save as `$TEST_PAT`.
- **Picked test tables:**
  - `LOCAL_TABLE`: any `query_mode='local'` table from `agnes catalog --json`
  - `REMOTE_TABLE`: any `query_mode='remote'` BigQuery table (preferably small or partitioned)
  - `REMOTE_WHERE`: a simple WHERE clause that filters `REMOTE_TABLE` to a small subset (e.g., `event_date = DATE '2026-01-01'`)
  - `REMOTE_SELECT`: 1-2 columns to select from `REMOTE_TABLE`

---

## Phase 0 — Prerequisites (sequential, ~5 min)

```bash
# 0. Export the server URL first; every step below reads it
export SERVER_URL="https://<your-agnes-host>"

# 1. Confirm server reachable + on the new build
curl -fsS "$SERVER_URL/api/health" | head -3
curl -fsS "$SERVER_URL/cli/version" 2>/dev/null || true   # should mention 0.32.0+ post-rename

# 2. Confirm /setup?role=analyst renders the role tiles
curl -fsS "$SERVER_URL/setup?role=analyst" | grep -E "role-tiles|Analyst workspace|const ROLE" | head -5
# Expect: at least 3 matches (CSS class, tile heading, JS const)

# 3. Mint analyst PAT (see Prerequisites above) and export it
export TEST_PAT="agnes_pat_..."

# 4. Bootstrap the BASE workspace that read-only slices share
mkdir -p /tmp/agnes-e2e-base
cd /tmp/agnes-e2e-base
# Paste the install prompt verbatim from /setup?role=analyst.
# After it finishes:
tree -aL 2 /tmp/agnes-e2e-base | head -30
```

**Gate:** if the base workspace doesn't have the expected shape (`CLAUDE.md`,
`AGNES_WORKSPACE.md`, `.claude/settings.json`, `user/duckdb/analytics.duckdb`),
STOP and report the failure mode — the rest of the plan assumes a working
base workspace.

---

## Phase 1 — Parallel slices (dispatch ALL 10 in ONE message, ~5-10 min)

For each slice, dispatch a `general-purpose` Agent. **Send all 10 Agent tool
calls in a single response message** so they run concurrently. Pass
`SERVER_URL` and `TEST_PAT` from Phase 0 into each prompt, along with the
`LOCAL_TABLE` / `REMOTE_*` values from Prerequisites for the slices that
reference them; sub-agents do not inherit your shell environment.

### Slice 1 — Web UI + role tiles + paste-prompt content

```text
Verify the /setup?role=analyst and /setup?role=admin pages on $SERVER_URL.

Required checks:
- /setup?role=analyst renders 2 role tiles (Analyst workspace + Admin CLI);
  Analyst tile has is-active class, Admin tile is inactive.
- Page contains `const ROLE = "analyst"` (JSON-escaped form, with quotes).
- The JS array `SETUP_INSTRUCTIONS_TEMPLATE` contains `agnes init` and
  `agnes catalog`, does NOT contain `claude plugin marketplace add`,
  `agnes auth import-token`, or `agnes diagnose`.
- /setup?role=admin: admin tile active, prompt contains `agnes auth import-token`.
- /install (no query) returns 302 to /setup.

Tools: curl + grep/python regex.
Report: PASS/FAIL per check + 5 lines of the rendered prompt for both roles.
```
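The string checks in Slice 1 can be sketched as a small shell function run against a saved copy of the analyst page. The function name `check_setup_page` and the save-to-file approach are assumptions made for illustration; only the strings being searched for come from the check list. Note that it greps the whole page instead of isolating the `SETUP_INSTRUCTIONS_TEMPLATE` array, so a forbidden string that legitimately appears elsewhere on the page would show up as a false FAIL.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of agnes): check a saved /setup?role=analyst
# page for the strings Slice 1 requires and forbids. Prints one PASS/FAIL line
# per check and returns non-zero if any check failed.
check_setup_page() {
  local page="$1" failed=0 s

  # Strings that must appear somewhere in the page.
  for s in 'const ROLE = "analyst"' 'agnes init' 'agnes catalog'; do
    if grep -qF -- "$s" "$page"; then
      echo "PASS present: $s"
    else
      echo "FAIL missing: $s"; failed=1
    fi
  done

  # Strings that must not appear (whole-page approximation of the template check).
  for s in 'claude plugin marketplace add' 'agnes auth import-token' 'agnes diagnose'; do
    if grep -qF -- "$s" "$page"; then
      echo "FAIL forbidden: $s"; failed=1
    else
      echo "PASS absent: $s"
    fi
  done

  return "$failed"
}

# Usage (assumes SERVER_URL is exported as in Phase 0):
#   curl -fsS "$SERVER_URL/setup?role=analyst" -o /tmp/setup-analyst.html
#   check_setup_page /tmp/setup-analyst.html
```

The tile-state and redirect checks are not covered here; they still need the curl and regex work the slice prompt describes.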

### Slice 2 — `agnes init` workspace inventory

```text
In /tmp/agnes-e2e-init (NEW empty folder):
  agnes init --server-url $SERVER_URL --token $TEST_PAT --workspace .

Verify EXACTLY this file set is present:
  - CLAUDE.md (non-empty, contains "agnes pull")
  - AGNES_WORKSPACE.md (does NOT contain $TEST_PAT, no "{placeholder}" leaks,
    contains $SERVER_URL, contains the absolute workspace path,
    has 6 H2 sections matching ^## )
  - .claude/settings.json (hooks.SessionStart with `agnes pull --quiet`,
    hooks.SessionEnd with `agnes push --quiet`, has `model` and `permissions.allow`)
  - .claude/CLAUDE.local.md (stub content "# My Notes")
  - user/duckdb/analytics.duckdb (file exists, non-zero size)

Verify NONE of these exist:
  data/parquet, data/duckdb, data/metadata, user/artifacts, .agnes

Conditional dirs (per lazy-mkdir contract):
  - if server/parquet/ exists, must contain ≥1 .parquet
  - if .claude/rules/ exists, must contain ≥1 km_*.md

Run: tree -aL 3 /tmp/agnes-e2e-init
Report: full file inventory + PASS/FAIL per assertion.
```
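The presence, absence, and conditional-directory assertions above lend themselves to a short script. The following is a sketch under assumptions: `check_inventory` is a made-up name, and it treats every required file as "must be non-empty", which is slightly stricter than the slice text for `settings.json`. It does not detect unexpected extra files, so the "EXACTLY" part still relies on reading the `tree` output, and it does not inspect file contents. The same function also covers the Phase 0 gate if pointed at `/tmp/agnes-e2e-base`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of agnes): verify the Slice 2 inventory in a
# workspace directory. Returns non-zero if any assertion failed.
check_inventory() {
  local ws="$1" failed=0 p

  # Required files: -s is true only if the file exists and is non-empty.
  for p in CLAUDE.md AGNES_WORKSPACE.md .claude/settings.json \
           .claude/CLAUDE.local.md user/duckdb/analytics.duckdb; do
    if [ -s "$ws/$p" ]; then
      echo "PASS present: $p"
    else
      echo "FAIL missing or empty: $p"; failed=1
    fi
  done

  # Paths that must not exist at all.
  for p in data/parquet data/duckdb data/metadata user/artifacts .agnes; do
    if [ -e "$ws/$p" ]; then
      echo "FAIL unexpected: $p"; failed=1
    else
      echo "PASS absent: $p"
    fi
  done

  # Conditional dirs: allowed to be missing, but must not exist empty.
  if [ -d "$ws/server/parquet" ] && \
     [ -z "$(find "$ws/server/parquet" -name '*.parquet' | head -n 1)" ]; then
    echo "FAIL server/parquet exists but holds no .parquet"; failed=1
  fi
  if [ -d "$ws/.claude/rules" ] && \
     [ -z "$(find "$ws/.claude/rules" -name 'km_*.md' | head -n 1)" ]; then
    echo "FAIL .claude/rules exists but holds no km_*.md"; failed=1
  fi

  return "$failed"
}

# Usage:
#   check_inventory /tmp/agnes-e2e-init
```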

### Slice 3 — Reader smoke matrix (read-only against base workspace)

```text
cd /tmp/agnes-e2e-base
For each command, capture exit code + first stderr line.
Forbidden: any "Traceback" in stderr.

  agnes catalog
  agnes catalog --json
  agnes catalog --metrics
  agnes schema $LOCAL_TABLE
  agnes describe $LOCAL_TABLE -n 5
  agnes status
  agnes status --json
  agnes diagnose
  agnes diagnose system
  agnes disk-info
  agnes auth whoami
  agnes skills list
  agnes skills show agnes-data-querying

Bad-input no-crash:
  agnes schema __nonexistent__
  agnes describe __nonexistent__
  agnes explore __nonexistent__

PASS = all rc ∈ {0, 1}, no Traceback.
Report: command-by-command rc + traceback-Y/N + first stderr line.
```
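The capture-and-classify loop this slice asks for (exit code, traceback Y/N, first stderr line) can be sketched as one wrapper function. `probe` is a hypothetical name, and the tab-separated report format is an arbitrary choice; the PASS rule it applies is the one stated above (rc of 0 or 1 and no "Traceback" in stderr). The same wrapper fits Slice 7, which uses the same no-traceback contract.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of agnes): run one command, print its exit
# code, whether stderr contained "Traceback", and the first stderr line.
# Returns 0 when the command meets the Slice 3 PASS rule, non-zero otherwise.
probe() {
  local errfile rc=0 tb first
  errfile=$(mktemp)

  # Discard stdout, keep stderr for inspection; remember the exit code.
  "$@" >/dev/null 2>"$errfile" || rc=$?

  if grep -q 'Traceback' "$errfile"; then tb=Y; else tb=N; fi
  first=$(head -n 1 "$errfile")
  rm -f "$errfile"

  printf '%s\trc=%s\ttraceback=%s\tstderr=%s\n' "$*" "$rc" "$tb" "$first"

  # PASS rule: rc is 0 or 1, and no Traceback.
  { [ "$rc" -eq 0 ] || [ "$rc" -eq 1 ]; } && [ "$tb" = N ]
}

# Usage:
#   probe agnes catalog --json || echo "FAIL"
#   probe agnes schema __nonexistent__ || echo "FAIL"
```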

### Slice 4 — Local + remote query paths

```text
cd /tmp/agnes-e2e-base

Local (parquet/DuckDB) path:
  agnes query "SELECT count(*) FROM $LOCAL_TABLE LIMIT 1"
  agnes query "SELECT * FROM $LOCAL_TABLE LIMIT 3"
  agnes query --json "SELECT count(*) FROM $LOCAL_TABLE"
  agnes explore $LOCAL_TABLE   # friendly even in non-TTY

Remote (BigQuery passthrough) path:
  agnes query --remote "SELECT count(*) FROM $REMOTE_TABLE LIMIT 1"
  agnes query --remote "SELECT count(*) FROM $REMOTE_TABLE" --limit 1

Report: per-query rc, first 5 lines of stdout, traceback-Y/N.
Flag any 502 / cost-guardrail / typed-error envelopes.
```

### Slice 5 — Snapshot lifecycle

```text
cd /tmp/agnes-e2e-base

agnes snapshot list                                   # baseline
agnes snapshot create $REMOTE_TABLE --select $REMOTE_SELECT --where "$REMOTE_WHERE" --as e2e_test_snap --estimate
agnes snapshot create $REMOTE_TABLE --select $REMOTE_SELECT --where "$REMOTE_WHERE" --as e2e_test_snap
agnes snapshot list                                   # should show e2e_test_snap
agnes query "SELECT count(*) FROM e2e_test_snap"
agnes snapshot refresh e2e_test_snap
agnes snapshot drop e2e_test_snap
agnes snapshot list                                   # back to baseline

PASS = clean lifecycle, snapshot file present after create + absent after drop.
Report: per-step rc + first stdout line.
```

### Slice 6 — Force / protection scenarios

```text
Setup:
  cp -r /tmp/agnes-e2e-base /tmp/agnes-e2e-force
  cd /tmp/agnes-e2e-force
  echo "# my private edit" > .claude/CLAUDE.local.md

Test 1: re-init without --force should refuse
  agnes init --server-url $SERVER_URL --token $TEST_PAT --workspace .
  Expect rc=1, stderr contains "already initialized" or "partial_state",
  no Traceback.

Test 2: --force regenerates CLAUDE.md but PRESERVES CLAUDE.local.md
  agnes init --server-url $SERVER_URL --token $TEST_PAT --workspace . --force
  Expect rc=0
  Then: cat .claude/CLAUDE.local.md → must still contain "# my private edit"

Report: PASS/FAIL per test + grep result for the private edit marker.
```

### Slice 7 — Pre-init reader smoke (no-traceback contract)

```text
mkdir -p /tmp/agnes-e2e-pre && cd /tmp/agnes-e2e-pre

For each, capture rc + stderr:
  agnes query "SELECT 1"
  agnes snapshot create __nope__ --as x --estimate
  agnes explore foo
  agnes snapshot list
  agnes status
  agnes catalog
  agnes disk-info

Forbidden: any "Traceback" in stderr.
Expected: rc=1 with friendly hint mentioning "agnes init" or "agnes pull".

Report: per-command rc + traceback-Y/N + did-hint-mention-init-or-pull.
```

### Slice 8 — Auth + token lifecycle

```text
agnes auth whoami                 # should print email + role

# Token CRUD
agnes auth token list
TID=$(agnes auth token create e2e-test --expires-in-days 1 | grep -oE 'tok_[a-zA-Z0-9_-]+' | head -1)
agnes auth token list             # new token visible
agnes auth token revoke "$TID"
agnes auth token list             # revoked_at populated for $TID

# Bad token → friendly 401
AGNES_TOKEN=fake-pat agnes catalog 2>&1 | tail -5
# Expect friendly hint, no Traceback

Report: PASS/FAIL per step + the captured TID.
```

### Slice 9 — Admin metrics + catalog --metrics

```text
# Read paths (any analyst)
agnes catalog --metrics
agnes catalog --metrics --show revenue/mrr   # adjust to a real metric path

# Write paths (admin only — skip if the test user isn't admin)
agnes admin metrics --help   # surface check (import/export/validate listed)

If admin:
  agnes admin metrics validate
  agnes admin metrics export /tmp/metrics-backup
  ls /tmp/metrics-backup/ | head

Report: per-command rc + first stdout line.
```

### Slice 10 — AGNES_WORKSPACE.md content quality

```text
cd /tmp/agnes-e2e-base
head -100 AGNES_WORKSPACE.md

Programmatic checks:
- Contains "Created:" header line with ISO timestamp
- Contains "Server:" line with $SERVER_URL
- Contains "Workspace:" line with absolute path
- $TEST_PAT does NOT appear anywhere in the file
- No literal "{created_at}", "{server_url}", "{workspace_path}" substrings
- Has exactly 6 H2 sections (^## )
- Cheat sheet section mentions: agnes catalog, agnes query, agnes pull,
  agnes snapshot create, agnes status
- Uninstall section mentions: uv tool uninstall, ~/.config/agnes, ~/.agnes
- For each path in the "Globally installed" table that's a real file path
  (~/.local/bin/agnes, ~/.config/agnes/{config.yaml,token.json}):
  test -e and report exists/missing.

Report: each assertion PASS/FAIL.
```
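Several of the programmatic checks above reduce to fixed-string greps. The sketch below covers the H2 count, the PAT-leak check, the placeholder check, and the server URL; `check_workspace_md` is a hypothetical name, and the URL check is looser than the slice text because it accepts the URL anywhere in the file instead of only on the "Server:" line. The section-content and path-existence checks are left out.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of agnes): file-level assertions from Slice 10.
# Arguments: path to AGNES_WORKSPACE.md, the PAT that must not leak, server URL.
check_workspace_md() {
  local md="$1" pat="$2" url="$3" failed=0 h2 s

  # An empty PAT would make the fixed-string grep below match every line.
  if [ -z "$pat" ]; then echo "FAIL empty PAT argument"; return 2; fi

  # Exactly 6 H2 sections: lines that start with "## ".
  h2=$(grep -c '^## ' "$md")
  if [ "$h2" -eq 6 ]; then
    echo "PASS h2 count: 6"
  else
    echo "FAIL h2 count: $h2"; failed=1
  fi

  # The PAT must not appear anywhere (-F treats it as a literal string).
  if grep -qF -- "$pat" "$md"; then
    echo "FAIL PAT leaked"; failed=1
  else
    echo "PASS no PAT"
  fi

  # No unrendered template placeholders.
  for s in '{created_at}' '{server_url}' '{workspace_path}'; do
    if grep -qF -- "$s" "$md"; then
      echo "FAIL placeholder: $s"; failed=1
    else
      echo "PASS no $s"
    fi
  done

  # The server URL must be present somewhere in the file.
  if grep -qF -- "$url" "$md"; then
    echo "PASS server URL"
  else
    echo "FAIL server URL missing"; failed=1
  fi

  return "$failed"
}

# Usage (assumes TEST_PAT and SERVER_URL are exported as in Phase 0):
#   check_workspace_md /tmp/agnes-e2e-base/AGNES_WORKSPACE.md "$TEST_PAT" "$SERVER_URL"
```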

---

## Phase 2 — Hook behavior (sequential, ~3 min)

Sub-agents can't open Claude Code sessions, so this part is manual.

```bash
cd /tmp/agnes-e2e-base
claude   # opens Claude Code; SessionStart hook fires `agnes pull --quiet`

# Inside Claude Code, ask: "show me 5 rows of $LOCAL_TABLE" → should work without errors

/exit    # SessionEnd hook fires `agnes push --quiet`

ls /tmp/agnes-e2e-base/user/sessions/   # should be non-empty (transcript captured)

# Re-enter to verify SessionStart fires again
claude
/exit
```

**Verify in server audit log:** 2× `agnes pull` GETs (one per session start)
and 2× `agnes push` POSTs (one per session end). Tail the server audit endpoint
or DB table to confirm.

---

## Phase 3 — Aggregation

Compile a single PASS/FAIL table:

| Slice | Status | Notes |
|---|---|---|
| 1 — Web UI role tiles | … | … |
| 2 — agnes init inventory | … | … |
| 3 — Reader smoke matrix | … | … |
| 4 — Query paths | … | … |
| 5 — Snapshot lifecycle | … | … |
| 6 — Force / protection | … | … |
| 7 — Pre-init no-traceback | … | … |
| 8 — Auth + tokens | … | … |
| 9 — Admin metrics | … | … |
| 10 — AGNES_WORKSPACE.md | … | … |
| Hooks (Phase 2) | … | … |

For any FAIL: preserve the failing folder/output and report the exact command
plus the first traceback / first stderr line. Don't fix in place — flag for follow-up.

---

## Cleanup after testing

```bash
rm -rf /tmp/agnes-e2e-base /tmp/agnes-e2e-init /tmp/agnes-e2e-force /tmp/agnes-e2e-pre /tmp/metrics-backup
agnes auth token revoke <e2e-test PAT id>
```

---

## Slice priority

If time is constrained, run these load-bearing slices first:

1. **Slice 1** (Web UI role tiles) — proves the web entry point works.
2. **Slice 2** (agnes init inventory) — proves the bootstrap creates the
   exact expected file set with no dead dirs.
3. **Slice 7** (pre-init no-traceback) — proves the reader contract holds.
4. **Slice 10** (AGNES_WORKSPACE.md content) — proves the human-facing
   docs render correctly with no PAT leak.

Slices 3-6, 8-9 are breadth coverage — important but lower-priority.