# Platform Telemetry Epic — 2026-05-12 > **For agentic workers:** REQUIRED SUB-SKILL: `superpowers:subagent-driven-development`. Each phase is one PR-ready unit. Tasks within a phase share state and ship together. > > **Branch:** `zs/platform-telemetry` (stacked on `zs/spec-activity-center` schema v40 + clean `/admin/activity` rebuild) > **Schema:** target v41 (one bump for the whole telemetry foundation) > **Goal:** Boss directive + Activity Monitoring Plan (Downloads, 2026-05-09) merged into one executable program. --- ## What this delivers Boss directive maps to: | Boss bullet | Where it lands here | Status before this PR | |---|---|---| | Platform setup, how-to guides | Phase D | scattered docs, no consolidated playbook | | Export telemetry | Phase C — `/api/admin/usage/export` + `agnes admin usage export` | nothing | | Admin access to telemetry (prompts, tool, usage) | Phase B (`/admin/users/` Sessions) + Phase C (export + ask) | `UsageProcessor` is no-op | | **Prompt the telemetry** (option C — CLI Text-to-SQL) | Phase C — `agnes admin ask "..."` | nothing | | Privacy mode | **Out of scope — Minas's `agnes mark-private` (#242, merged) already covers this** | shipped | | Flea market (anyone uploads, guardrails only, no ACL) | **Out of scope — shipped via #233 store_entities + #234 enrichment** | shipped | Plus the Activity Monitoring Plan (Downloads) substance: | Plan task | This epic phase | |---|---| | §0 Schema v38 | Phase A.1 — **renumber to v41** | | §1 Attribution explode | Phase A.2 | | §2 UsageProcessor real extraction | Phase A.3 | | §3 /marketplace stats wire | Phase B.1 | | §4 /admin/users sessions section | Phase B.2 | | §5 Reprocess + retention | Phase C.4 | | §6 Docs + CHANGELOG | Phase D | --- ## Architecture invariants - **Three-source taxonomy** for invocations: `curated` (curator-managed marketplace plugins), `flea` (analyst-uploaded store_entities), `builtin` (Anthropic-shipped Bash/Read/Edit/…). - **Per-event row + per-session summary + daily rollup** — events for forensics, summary for the user-detail page, rollups for marketplace popularity queries. - **Attribution explode at write time** (marketplace sync, store entity write) into `usage_attribution_*` tables. Processor does single-query attribution lookup, no scan-time. - **Privacy** = analyst's per-session `agnes mark-private` decision (Minas's #242). Sessions marked private never upload → never reach `UsageProcessor` → no telemetry. No new privacy code needed. - **Reprocess strategy** — `USAGE_PROCESSOR_VERSION` bump triggers `agnes admin usage reprocess` which DELETEs `session_processor_state` rows for `usage` only, leaves `verification` untouched (composite PK). - **Retention** — `USAGE_EVENTS_RETENTION_DAYS` env var, default `0` = forever. Daily prune in scheduler. - **Admin telemetry "ask"** — CLI sends the natural-language question + a schema digest + a few sample rows to Anthropic API (Claude Haiku, cheapest model that handles SQL well), gets SQL back, executes it read-only, prints both the SQL and the results. **Audit-logged.** No data leaves the server beyond the schema digest + the question itself. --- ## Phase A — Foundation (5 tasks) ### A.1: Schema v41 migration DDL identical to Activity Monitoring Plan §0 but bumped to v41: ```sql -- usage_events, usage_session_summary, usage_tool_daily, usage_plugin_daily, -- usage_attribution_skills, usage_attribution_agents, usage_attribution_commands -- See Activity Monitoring Plan — 2026-05-09 lines 92–199 for full DDL. ``` Files: - `src/db.py:43` — bump `SCHEMA_VERSION = 41` - `src/db.py` — add `_v40_to_v41(conn)` function with all 7 `CREATE TABLE IF NOT EXISTS` + indices, idempotent (`ADD COLUMN IF NOT EXISTS` not relevant here — these are new tables) - `src/db.py` — extend `_SYSTEM_SCHEMA` with the 7 new tables for fresh installs - `src/db.py` — ladder step `if current_version < 41: _v40_to_v41(conn)` - Test: `tests/test_schema_v41_migration.py` — 6 tests like v40 (version bump, columns/tables exist, indices exist, idempotent, v30→v41 evolved DB, v40→v41 direct) ### A.2: Attribution explode Activity Monitoring Plan §1 task 1.1–1.6 verbatim. Files: - Create `src/repositories/usage_attribution.py` (new repo) - Modify `src/marketplace.py` — call explode after `MarketplacePluginsRepository.replace_for_marketplace(...)` - Modify `app/api/store.py` — call explode after entity create/approve/soft-delete (transactional) - Create `scripts/backfill_usage_attribution.py` — first-deploy populator (idempotent) - Tests: `tests/test_usage_attribution.py` — curated + flea + re-sync replaces + lookup precedence ### A.3: UsageProcessor real extraction Activity Monitoring Plan §2 task 2.1–2.10 verbatim. Files: - Create `services/session_processors/usage_lib.py` — `iter_events`, `AttributionLookup`, `compute_active_seconds`, `compute_summary`, `rebuild_rollups` - Create `src/repositories/usage.py` — `UsageRepository` (`upsert_events`, `upsert_summary`, `purge_for_session`, `delete_older_than`) - Modify `services/session_processors/usage.py` — replace no-op with pipeline. Add `USAGE_PROCESSOR_VERSION = 1` constant - Modify `app/api/admin.py:3363` — `/api/admin/run-session-processor?processor=usage` already exists; after a successful `usage` run, call `rebuild_rollups` - Tests: `tests/test_session_processor_usage.py` — pure tool_use / mcp / skill curated / skill flea / slash / subagent / error / mixed / empty / re-grown file (10 fixtures under `tests/fixtures/sessions/usage/`) - Tests: `tests/test_usage_rollups.py` — seed events directly, call `rebuild_rollups`, assert rollup shapes ### A.4: (originally A.5) — skip cross-link with /admin/activity timeline **Decision: no cross-link in v1.** `/admin/activity` is server-ops timeline (audit_log). Usage events have separate semantics + surfaces (`/marketplace`, `/admin/users/`). Cross-link is a Phase 2 polish if operators ask. Documented under "Parked". --- ## Phase B — Telemetry surfaces (2 tasks) ### B.1: `/marketplace` Most Popular + stats Activity Monitoring Plan §3 task 3.1–3.8 verbatim. Net of: - `UnifiedItem` gains `invocations_30d`, `unique_users_30d`, `trend_pct` - `GET /api/marketplace/items?sort=most_used|trending|recent` - Uncomment Most Popular section in `marketplace.html`, render top-8 cards per tab - Sort dropdown in filter row - Per-card invocation chip + trend - Detail-page sparkline (server-rendered SVG) - Tests `tests/test_marketplace_telemetry.py` — card response shape, sort orders, hidden when empty, existing filters still work ### B.2: `/admin/users/` Sessions section Activity Monitoring Plan §4 task 4.1–4.7 verbatim. Net of: - `GET /api/admin/users/{user_id}/sessions` — paginated list (50 default, 200 max) - `GET /api/admin/users/{user_id}/sessions/{session_file}/download` — single JSONL stream, path-traversal guarded - `GET /api/admin/users/{user_id}/sessions/download-all` — chunked zip, single audit row with `file_count` + `total_bytes` - Sessions section in `admin_user_detail.html` — table + pagination + "Download all" button - Tests `tests/test_api_admin_user_sessions.py` — pagination cap, path-traversal rejection, zip integrity, audit row, admin-only --- ## Phase C — Admin telemetry access (4 tasks) ### C.1: Export endpoint ``` GET /api/admin/usage/export?format=csv|json|parquet&since=YYYY-MM-DD&until=…&user_id=…&source=… ``` Streamed response. CSV uses standard library. Parquet via duckdb COPY TO (already a dependency). - Files: `app/api/admin.py` (new endpoint) or new `app/api/admin_usage.py` - Audit-logged: `usage.export` action with `params={format, since, until, row_count}` - Tests: each format returns valid output, filters honored, admin-only ### C.2: `agnes admin usage export` CLI Mirrors `agnes admin activity` pattern from `zs/spec-activity-center`. Subcommand of `agnes admin usage`. - Files: `cli/commands/admin_usage.py`, register in `cli/commands/admin.py` - Options: `--format`, `--since`, `--until`, `--user`, `--source`, `--out FILE` (else stdout) - Tests: CLI runner with seeded server, valid output, error paths ### C.3: `agnes admin ask "..."` — Text-to-SQL Two-step: 1. Server endpoint `POST /api/admin/usage/ask` with body `{question: str}` returns `{sql: str, rows: list[dict], duration_ms: int}`. Server-side LLM call (Anthropic Claude Haiku via existing provider abstraction in `services/corporate_memory/`) with a system prompt that: - Embeds `usage_events` / `usage_session_summary` schemas - Lists a few sample rows for grounding - Demands SELECT-only SQL — refuses INSERT/UPDATE/DELETE/DROP - Returns the generated SQL even on guard-rail rejection so operator sees what the LLM tried 2. Server validates the SQL is SELECT-only (parse with `sqlglot` or string-prefix sanity check), executes against DuckDB read-only, returns rows 3. Server **audits the question + generated SQL + row count** to `audit_log` action `usage.ask` CLI: `cli/commands/admin_ask.py` — `agnes admin ask "kdo nejvíc používal /compound:ce-debug minulý týden?"` — sends to server, prints SQL + table of results. LLM availability: if no provider configured, endpoint returns 503 with hint. CLI shows the message clearly. - Files: - `app/api/admin_usage.py` — new endpoint `POST /api/admin/usage/ask` - `src/usage_ask.py` (or similar) — LLM prompt construction, SQL validator - `cli/commands/admin_ask.py` — CLI command - Tests: - Unit: SQL validator rejects mutations, accepts SELECT - Unit: prompt builder includes schema digest - Integration: end-to-end with a mocked LLM provider, assert returned SQL executes, rows returned - Admin-only ### C.4: Reprocess + prune Activity Monitoring Plan §5 task 5.1–5.4 verbatim. Net: - `POST /api/admin/usage/reprocess` — admin-only, DELETEs state + events, audit-logged - `POST /api/admin/usage/prune` — admin-only, deletes events older than `USAGE_EVENTS_RETENTION_DAYS` - Scheduler entry — `SCHEDULER_USAGE_PRUNE_INTERVAL` daily - Tests: reprocess clears only `usage` state, prune respects retention --- ## Phase D — Docs (3 tasks) ### D.1: `docs/PLATFORM_SETUP.md` — operator playbook Consolidates scattered setup docs into one ordered playbook: 1. First-time bootstrap (instance.yaml, OAuth, seed admin) 2. TLS / reverse-proxy (Caddy) 3. Marketplaces (curated + private repos, PAT secrets) 4. Scheduler (env vars per processor cadence) 5. Telemetry (UsageProcessor cadence, retention, export, ask) 6. Privacy posture (per-session `agnes mark-private`, server-side audit, PostHog opt-in) 7. Operator daily routine (`/admin/activity`, `/admin/users`, `agnes admin *` commands) Replaces the implicit knowledge in `docs/QUICKSTART.md`, `docs/ONBOARDING.md`, `docs/DEPLOYMENT.md`, `docs/HEADLESS_USAGE.md`. The old docs stay but get a "see PLATFORM_SETUP.md" pointer at the top. ### D.2: `docs/HOWTO/` index — analyst guides 5 cookbook-style guides: 1. `docs/HOWTO/01-first-query.md` — `agnes pull`, `agnes catalog`, first SQL 2. `docs/HOWTO/02-snapshots-for-remote.md` — `agnes snapshot create`, when not to 3. `docs/HOWTO/03-private-session.md` — `agnes mark-private` flow, what it does, what it doesn't 4. `docs/HOWTO/04-feedback-and-ask.md` — `agnes admin ask` (admin) + how to report a problem 5. `docs/HOWTO/05-customizing-skills.md` — install/uninstall, flea market upload, guardrails Plus `docs/HOWTO/README.md` as an index. ### D.3: CHANGELOG + spec doc Single `[Unreleased]` section with `### Added` / `### Changed` / `### Internal` covering everything in Phases A–C, plus Phase D as `### Documentation`. --- ## Rigor — double review + double tests User explicitly requested. Per task: 1. **Implementation subagent** writes the failing test FIRST, then implements, then verifies test passes (TDD). 2. **Spec compliance reviewer** subagent verifies the implementation matches what the spec asked for — nothing missing, nothing extra. 3. **Code quality reviewer** subagent checks for clarity, DRY, YAGNI, security smells, naming, file responsibility. 4. **E2E behavior reviewer** subagent — NEW step beyond `superpowers:subagent-driven-development` default. Runs the actual touchpoint end-to-end against a live test server (using existing fixtures from `tests/conftest.py`), confirms behavior matches the user-visible contract. Per phase: - After all tasks in a phase complete, dispatch one **phase integration reviewer** that exercises the full phase surface from outside (curl + CLI + DB inspection) and confirms inter-task coordination. Per epic (at the very end): - **Security review** across the whole diff (SQL injection, path traversal, LLM prompt injection in `agnes admin ask`) - **Code architecture review** across the whole diff (file responsibility, repo boundaries, no drift) - **End-to-end behavior review** — runs every CLI command + every web endpoint on a live server, screenshots `/admin/users/` and `/marketplace` Most Popular, verifies admin can answer 3 realistic questions via `agnes admin ask`. --- ## Acceptance criteria When all phases complete: 1. `usage_events` table populates from a single seeded session within one `run-session-processor` tick. 2. `/marketplace` shows a "Most Popular — last 30 days" section with at least one curated and one flea card after the seeded data is processed. 3. `/admin/users/` Sessions section lists the seeded user's sessions with single-file download + "Download all (.zip)" button. Both writes `audit_log` rows. 4. `GET /api/admin/usage/export?format=csv&since=2026-05-01` returns valid CSV stream with all seeded events. 5. `agnes admin usage export --format json --out /tmp/out.json` writes a valid JSON file with the same rows. 6. `agnes admin ask "how many times was Bash invoked yesterday"` returns a SELECT statement + the answer row. 7. `agnes admin usage reprocess` clears `usage_events` + `session_processor_state` rows for `usage` only; verification rows untouched. 8. New operator opening `docs/PLATFORM_SETUP.md` can bootstrap a fresh Agnes instance with telemetry enabled in under 30 min. 9. Per-session `agnes mark-private` (Minas's #242) prevents the session from reaching `usage_events` — verified by a regression test. 10. Full pytest suite green; double-review trail recorded in git commits per task. --- ## Phasing & PR strategy **One PR**, `zs/platform-telemetry`, stacked on `zs/spec-activity-center`. Each phase = one or more commits; tasks within a phase share state. Order: 1. Phase A.1 (schema) — 1 commit 2. Phase A.2 (attribution) — 1–2 commits 3. Phase A.3 (UsageProcessor) — 2–3 commits 4. Phase B.1 (marketplace stats) — 1–2 commits 5. Phase B.2 (per-user sessions) — 1–2 commits 6. Phase C.1+C.2 (export endpoint + CLI) — 2 commits 7. Phase C.3 (`agnes admin ask`) — 2–3 commits 8. Phase C.4 (reprocess + prune) — 1 commit 9. Phase D.1+D.2+D.3 (docs + CHANGELOG) — 2–3 commits Total: ~14–19 commits. Roughly 25–35 hours of subagent work. --- ## Out of scope (parked for v2) - `/admin/usage` drill-down dashboard (Activity Monitoring Plan §parked) - Cross-link of `usage_events` into `/admin/activity` timeline (decided in A.4) - LLM friction tagging on events - Bash regex error classification - Retry-loop detection - Per-plugin version stats - Per-user own-data dashboards (Resource type `OWN_USAGE`) - Real-time push (WebSocket) - "Users who used X also used Y" co-occurrence signal - Flea market team directory (Boss Q3 — skip per directive) - Anonymized telemetry mode (counts ok, content masked) — `agnes mark-private` covers the all-or-nothing case