Platform Telemetry Epic — 2026-05-12
For agentic workers: REQUIRED SUB-SKILL: superpowers:subagent-driven-development. Each phase is one PR-ready unit. Tasks within a phase share state and ship together.
Branch: zs/platform-telemetry (stacked on zs/spec-activity-center schema v40 + clean /admin/activity rebuild)
Schema: target v41 (one bump for the whole telemetry foundation)
Goal: Boss directive + Activity Monitoring Plan (Downloads, 2026-05-09) merged into one executable program.
What this delivers
Boss directive maps to:
| Boss bullet | Where it lands here | Status before this PR |
|---|---|---|
| Platform setup, how-to guides | Phase D | scattered docs, no consolidated playbook |
| Export telemetry | Phase C: /api/admin/usage/export + agnes admin usage export | nothing |
| Admin access to telemetry (prompts, tool, usage) | Phase B (/admin/users/<id> Sessions) + Phase C (export + ask) | UsageProcessor is no-op |
| Prompt the telemetry (option C: CLI Text-to-SQL) | Phase C: agnes admin ask "..." | nothing |
| Privacy mode | Out of scope: Minas's agnes mark-private (#242, merged) already covers this | shipped |
| Flea market (anyone uploads, guardrails only, no ACL) | Out of scope: shipped via #233 store_entities + #234 enrichment | shipped |
Plus the Activity Monitoring Plan (Downloads) substance:
| Plan task | This epic phase |
|---|---|
| §0 Schema v38 | Phase A.1 — renumber to v41 |
| §1 Attribution explode | Phase A.2 |
| §2 UsageProcessor real extraction | Phase A.3 |
| §3 /marketplace stats wire | Phase B.1 |
| §4 /admin/users sessions section | Phase B.2 |
| §5 Reprocess + retention | Phase C.4 |
| §6 Docs + CHANGELOG | Phase D |
Architecture invariants
- Three-source taxonomy for invocations: curated (curator-managed marketplace plugins), flea (analyst-uploaded store_entities), builtin (Anthropic-shipped Bash/Read/Edit/…).
- Per-event row + per-session summary + daily rollup: events for forensics, summary for the user-detail page, rollups for marketplace popularity queries.
- Attribution explode at write time (marketplace sync, store entity write) into usage_attribution_* tables. Processor does a single-query attribution lookup, no scan-time work.
- Privacy = analyst's per-session agnes mark-private decision (Minas's #242). Sessions marked private never upload → never reach UsageProcessor → no telemetry. No new privacy code needed.
- Reprocess strategy: a USAGE_PROCESSOR_VERSION bump triggers agnes admin usage reprocess, which DELETEs session_processor_state rows for usage only and leaves verification untouched (composite PK).
- Retention: USAGE_EVENTS_RETENTION_DAYS env var, default 0 = forever. Daily prune in scheduler.
- Admin telemetry "ask": the CLI sends the natural-language question to the server, which calls the Anthropic API (Claude Haiku, cheapest model that handles SQL well) with the question + a schema digest + a few sample rows, gets SQL back, executes it read-only, and returns both the SQL and the results for the CLI to print. Audit-logged. No data leaves the server beyond the schema digest, the sample rows, and the question itself.
Phase A — Foundation (5 tasks)
A.1: Schema v41 migration
DDL identical to Activity Monitoring Plan §0 but bumped to v41:
-- usage_events, usage_session_summary, usage_tool_daily, usage_plugin_daily,
-- usage_attribution_skills, usage_attribution_agents, usage_attribution_commands
-- See Activity Monitoring Plan — 2026-05-09 lines 92–199 for full DDL.
Files:
- src/db.py:43 - bump SCHEMA_VERSION = 41
- src/db.py - add _v40_to_v41(conn) function with all 7 CREATE TABLE IF NOT EXISTS + indices, idempotent (ADD COLUMN IF NOT EXISTS not relevant here, since these are new tables)
- src/db.py - extend _SYSTEM_SCHEMA with the 7 new tables for fresh installs
- src/db.py - ladder step if current_version < 41: _v40_to_v41(conn)
- Test: tests/test_schema_v41_migration.py - 6 tests like v40 (version bump, columns/tables exist, indices exist, idempotent, v30→v41 evolved DB, v40→v41 direct)
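A minimal sketch of the idempotent ladder step, under loud assumptions: it uses the standard-library sqlite3 module as a stand-in for the real database, a hypothetical schema_version table, and a single placeholder column per table. The seven table names come from this plan; the real columns and indices live in the Activity Monitoring Plan DDL and are not reproduced here.

```python
import sqlite3

SCHEMA_VERSION = 41

# Table names are from the plan. Columns are NOT: placeholder_id is a stand-in.
NEW_TABLES = [
    "usage_events",
    "usage_session_summary",
    "usage_tool_daily",
    "usage_plugin_daily",
    "usage_attribution_skills",
    "usage_attribution_agents",
    "usage_attribution_commands",
]


def _v40_to_v41(conn: sqlite3.Connection) -> None:
    """Create the seven telemetry tables; a second run is a no-op."""
    for table in NEW_TABLES:
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (placeholder_id INTEGER)")


def migrate(conn: sqlite3.Connection) -> int:
    """Run the ladder and return the resulting schema version."""
    # Hypothetical version bookkeeping; the real src/db.py may store this differently.
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT version FROM schema_version").fetchone()
    current_version = row[0] if row else 40  # this sketch assumes a v40 database
    if current_version < 41:
        _v40_to_v41(conn)
        conn.execute("DELETE FROM schema_version")
        conn.execute("INSERT INTO schema_version VALUES (?)", (SCHEMA_VERSION,))
        current_version = SCHEMA_VERSION
    return current_version


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(migrate(conn), migrate(conn))  # running twice must not raise
```

The idempotency test in the list above amounts to calling the ladder twice on the same connection and asserting that nothing changes on the second pass.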
A.2: Attribution explode
Activity Monitoring Plan §1 task 1.1–1.6 verbatim. Files:
- Create src/repositories/usage_attribution.py (new repo)
- Modify src/marketplace.py - call explode after MarketplacePluginsRepository.replace_for_marketplace(...)
- Modify app/api/store.py - call explode after entity create/approve/soft-delete (transactional)
- Create scripts/backfill_usage_attribution.py - first-deploy populator (idempotent)
- Tests: tests/test_usage_attribution.py - curated + flea + re-sync replaces + lookup precedence
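To make "re-sync replaces" and "lookup precedence" concrete, here is an in-memory sketch. Everything in it is hypothetical: the dict stands in for the usage_attribution_* tables, the function names explode and lookup are illustrative, and the rule that curated wins over flea on a name clash is an assumption, not something this plan states.

```python
from typing import Optional

# Stand-in for the usage_attribution_* tables: (kind, name) -> {source: owner}.
Index = dict[tuple[str, str], dict[str, str]]

# ASSUMPTION: curated beats flea when both define the same name.
PRECEDENCE = ("curated", "flea")


def explode(index: Index, source: str, owner: str, manifest: dict[str, list[str]]) -> None:
    """Replace every row owned by (source, owner) with the rows in manifest."""
    # Drop stale rows first so a re-sync replaces instead of accumulating.
    for key in list(index):
        if index[key].get(source) == owner:
            del index[key][source]
            if not index[key]:
                del index[key]
    for kind, names in manifest.items():
        for name in names:
            index.setdefault((kind, name), {})[source] = owner


def lookup(index: Index, kind: str, name: str) -> Optional[tuple[str, str]]:
    """Single dictionary hit at processing time; no scan of plugin contents."""
    owners = index.get((kind, name), {})
    for source in PRECEDENCE:
        if source in owners:
            return source, owners[source]
    return None


if __name__ == "__main__":
    index: Index = {}
    explode(index, "curated", "plugin-a", {"skill": ["debug", "review"]})
    explode(index, "flea", "entity-7", {"skill": ["debug"]})
    print(lookup(index, "skill", "debug"))
    # Re-sync plugin-a without "debug": the flea entry now answers the lookup.
    explode(index, "curated", "plugin-a", {"skill": ["review"]})
    print(lookup(index, "skill", "debug"))
```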
A.3: UsageProcessor real extraction
Activity Monitoring Plan §2 task 2.1–2.10 verbatim. Files:
- Create services/session_processors/usage_lib.py - iter_events, AttributionLookup, compute_active_seconds, compute_summary, rebuild_rollups
- Create src/repositories/usage.py - UsageRepository (upsert_events, upsert_summary, purge_for_session, delete_older_than)
- Modify services/session_processors/usage.py - replace no-op with pipeline. Add USAGE_PROCESSOR_VERSION = 1 constant
- Modify app/api/admin.py:3363 - /api/admin/run-session-processor?processor=usage already exists; after a successful usage run, call rebuild_rollups
- Tests: tests/test_session_processor_usage.py - pure tool_use / mcp / skill curated / skill flea / slash / subagent / error / mixed / empty / re-grown file (10 fixtures under tests/fixtures/sessions/usage/)
- Tests: tests/test_usage_rollups.py - seed events directly, call rebuild_rollups, assert rollup shapes
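The rollup test ("seed events directly, assert rollup shapes") could look roughly like the sketch below. The event keys ts, source and tool, and the helper name rollup_tool_daily, are hypothetical; the real rebuild_rollups and the usage_tool_daily columns are defined by the Activity Monitoring Plan, not here.

```python
from collections import Counter
from datetime import datetime


def rollup_tool_daily(events: list[dict]) -> dict[tuple[str, str, str], int]:
    """Aggregate raw events into (day, source, tool) -> invocation count."""
    counts: Counter = Counter()
    for event in events:
        # Bucket by calendar day of the event timestamp (ISO 8601 string).
        day = datetime.fromisoformat(event["ts"]).date().isoformat()
        counts[(day, event["source"], event["tool"])] += 1
    return dict(counts)


if __name__ == "__main__":
    seeded = [
        {"ts": "2026-05-11T09:00:00", "source": "builtin", "tool": "Bash"},
        {"ts": "2026-05-11T17:30:00", "source": "builtin", "tool": "Bash"},
        {"ts": "2026-05-12T08:15:00", "source": "flea", "tool": "my-skill"},
    ]
    for key, count in sorted(rollup_tool_daily(seeded).items()):
        print(key, count)
```

Because the rollup is a pure function of the event rows, rebuilding it after every successful usage run is safe to repeat.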
A.4: (originally A.5) — skip cross-link with /admin/activity timeline
Decision: no cross-link in v1. /admin/activity is the server-ops timeline (audit_log). Usage events have separate semantics and surfaces (/marketplace, /admin/users/<id>). A cross-link is v2 polish if operators ask. Documented under "Out of scope (parked for v2)".
Phase B — Telemetry surfaces (2 tasks)
B.1: /marketplace Most Popular + stats
Activity Monitoring Plan §3 task 3.1–3.8 verbatim. Net of:
- UnifiedItem gains invocations_30d, unique_users_30d, trend_pct
- GET /api/marketplace/items?sort=most_used|trending|recent
- Uncomment Most Popular section in marketplace.html, render top-8 cards per tab
- Sort dropdown in filter row
- Per-card invocation chip + trend
- Detail-page sparkline (server-rendered SVG)
- Tests tests/test_marketplace_telemetry.py - card response shape, sort orders, hidden when empty, existing filters still work
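For the server-rendered sparkline, one possible shape is a single SVG polyline built from daily counts. This is an illustrative sketch only: the function name sparkline_svg, the default 120x24 size and the styling attributes are assumptions, not the plan's design.

```python
def sparkline_svg(counts: list[int], width: int = 120, height: int = 24) -> str:
    """Render daily invocation counts as an inline SVG polyline."""
    header = (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
        f'height="{height}" viewBox="0 0 {width} {height}">'
    )
    if not counts:
        return header + "</svg>"  # nothing to draw, still valid SVG
    peak = max(counts) or 1  # avoid dividing by zero on an all-zero series
    step = width / max(len(counts) - 1, 1)
    # SVG y grows downward, so larger counts map to smaller y values.
    points = " ".join(
        f"{i * step:.1f},{height - (count / peak) * height:.1f}"
        for i, count in enumerate(counts)
    )
    return (
        header
        + f'<polyline fill="none" stroke="currentColor" points="{points}"/>'
        + "</svg>"
    )


if __name__ == "__main__":
    print(sparkline_svg([0, 5, 10]))
```

Rendering on the server keeps the detail page free of a charting dependency, since the markup can be dropped straight into the template.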
B.2: /admin/users/<id> Sessions section
Activity Monitoring Plan §4 task 4.1–4.7 verbatim. Net of:
- GET /api/admin/users/{user_id}/sessions - paginated list (50 default, 200 max)
- GET /api/admin/users/{user_id}/sessions/{session_file}/download - single JSONL stream, path-traversal guarded
- GET /api/admin/users/{user_id}/sessions/download-all - chunked zip, single audit row with file_count + total_bytes
- Sessions section in admin_user_detail.html - table + pagination + "Download all" button
- Tests tests/test_api_admin_user_sessions.py - pagination cap, path-traversal rejection, zip integrity, audit row, admin-only
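The pagination cap and the path-traversal guard are small enough to sketch with the standard library. The helper names clamp_page_size and resolve_session_path and the example directory are hypothetical; only the 50/200 limits come from the list above. Path.is_relative_to needs Python 3.9 or newer.

```python
from pathlib import Path
from typing import Optional

DEFAULT_PAGE_SIZE = 50
MAX_PAGE_SIZE = 200


def clamp_page_size(requested: Optional[int]) -> int:
    """Apply the default when missing or invalid, and never exceed the cap."""
    if requested is None or requested < 1:
        return DEFAULT_PAGE_SIZE
    return min(requested, MAX_PAGE_SIZE)


def resolve_session_path(user_dir: Path, session_file: str) -> Path:
    """Resolve session_file inside user_dir, rejecting anything that escapes it."""
    base = user_dir.resolve()
    candidate = (base / session_file).resolve()
    # resolve() collapses ".." segments; an absolute session_file replaces base.
    if candidate == base or not candidate.is_relative_to(base):
        raise ValueError("session_file escapes the user's session directory")
    return candidate


if __name__ == "__main__":
    base = Path("/srv/sessions/u1")  # hypothetical layout
    print(resolve_session_path(base, "2026-05-12.jsonl"))
    try:
        resolve_session_path(base, "../u2/secret.jsonl")
    except ValueError as exc:
        print("rejected:", exc)
```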
Phase C — Admin telemetry access (4 tasks)
C.1: Export endpoint
GET /api/admin/usage/export?format=csv|json|parquet&since=YYYY-MM-DD&until=…&user_id=…&source=…
Streamed response. CSV uses standard library. Parquet via duckdb COPY TO (already a dependency).
- Files: app/api/admin.py (new endpoint) or new app/api/admin_usage.py
- Audit-logged: usage.export action with params={format, since, until, row_count}
usage.exportaction withparams={format, since, until, row_count} - Tests: each format returns valid output, filters honored, admin-only
C.2: agnes admin usage export CLI
Mirrors agnes admin activity pattern from zs/spec-activity-center. Subcommand of agnes admin usage.
- Files: cli/commands/admin_usage.py, register in cli/commands/admin.py
- Options: --format, --since, --until, --user, --source, --out FILE (else stdout)
- Tests: CLI runner with seeded server, valid output, error paths
C.3: agnes admin ask "..." — Text-to-SQL
Two-step:
- Server endpoint POST /api/admin/usage/ask with body {question: str} returns {sql: str, rows: list[dict], duration_ms: int}. Server-side LLM call (Anthropic Claude Haiku via existing provider abstraction in services/corporate_memory/) with a system prompt that:
  - Embeds usage_events / usage_session_summary schemas
  - Lists a few sample rows for grounding
  - Demands SELECT-only SQL and refuses INSERT/UPDATE/DELETE/DROP
  - Returns the generated SQL even on guard-rail rejection so operator sees what the LLM tried
- Server validates the SQL is SELECT-only (parse with sqlglot or string-prefix sanity check), executes against DuckDB read-only, returns rows
- Server audits the question + generated SQL + row count to audit_log action usage.ask
CLI: cli/commands/admin_ask.py - agnes admin ask "who used /compound:ce-debug the most last week?" sends the question to the server and prints the SQL + a table of results.
LLM availability: if no provider configured, endpoint returns 503 with hint. CLI shows the message clearly.
- Files:
  - app/api/admin_usage.py - new endpoint POST /api/admin/usage/ask
  - src/usage_ask.py (or similar) - LLM prompt construction, SQL validator
  - cli/commands/admin_ask.py - CLI command
- Tests:
  - Unit: SQL validator rejects mutations, accepts SELECT
  - Unit: prompt builder includes schema digest
  - Integration: end-to-end with a mocked LLM provider, assert returned SQL executes, rows returned
  - Admin-only
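The "string-prefix sanity check" variant of the validator could look like the sketch below. It is deliberately conservative and not a parser: a keyword inside a string literal triggers a false rejection. The function name is_select_only is hypothetical, and the keywords beyond INSERT/UPDATE/DELETE/DROP are an added assumption. A read-only connection remains the real safety net.

```python
import re

# INSERT/UPDATE/DELETE/DROP come from the plan; the rest are an ASSUMPTION
# about other statements worth blocking.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|ATTACH|COPY)\b", re.IGNORECASE
)
ALLOWED_START = re.compile(r"(SELECT|WITH)\b", re.IGNORECASE)


def is_select_only(sql: str) -> bool:
    """Return True only for a single statement that looks like a plain query."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or more than one statement
    if not ALLOWED_START.match(statement):
        return False
    # Word boundaries keep column names such as deleted_at from matching DELETE.
    return FORBIDDEN.search(statement) is None


if __name__ == "__main__":
    for candidate in (
        "SELECT count(*) FROM usage_events;",
        "SELECT 1; DROP TABLE usage_events",
        "DELETE FROM usage_events",
    ):
        print(is_select_only(candidate), candidate)
```

On rejection the endpoint can still return the generated SQL alongside the error, matching the guard-rail bullet above.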
C.4: Reprocess + prune
Activity Monitoring Plan §5 task 5.1–5.4 verbatim. Net:
- POST /api/admin/usage/reprocess - admin-only, DELETEs state + events, audit-logged
- POST /api/admin/usage/prune - admin-only, deletes events older than USAGE_EVENTS_RETENTION_DAYS
- Scheduler entry: SCHEDULER_USAGE_PRUNE_INTERVAL daily
- Tests: reprocess clears only usage state, prune respects retention
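The retention rule (0 = keep forever) reduces to a cutoff computation. A sketch under assumptions: the helper names retention_cutoff and prune and the event key ts are hypothetical, and the list filter stands in for the repository's delete_older_than.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def retention_cutoff(retention_days: int, now: datetime) -> Optional[datetime]:
    """Return the oldest timestamp to keep, or None when retention is disabled."""
    if retention_days <= 0:
        return None  # USAGE_EVENTS_RETENTION_DAYS=0 means keep forever
    return now - timedelta(days=retention_days)


def prune(events: list[dict], retention_days: int, now: datetime) -> list[dict]:
    """Keep only events at or after the cutoff (in-memory stand-in for a DELETE)."""
    cutoff = retention_cutoff(retention_days, now)
    if cutoff is None:
        return list(events)
    return [event for event in events if event["ts"] >= cutoff]


if __name__ == "__main__":
    now = datetime(2026, 5, 12, tzinfo=timezone.utc)
    events = [
        {"id": 1, "ts": datetime(2026, 3, 1, tzinfo=timezone.utc)},
        {"id": 2, "ts": datetime(2026, 5, 10, tzinfo=timezone.utc)},
    ]
    print([e["id"] for e in prune(events, 30, now)])
    print([e["id"] for e in prune(events, 0, now)])
```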
Phase D — Docs (3 tasks)
D.1: docs/PLATFORM_SETUP.md — operator playbook
Consolidates scattered setup docs into one ordered playbook:
- First-time bootstrap (instance.yaml, OAuth, seed admin)
- TLS / reverse-proxy (Caddy)
- Marketplaces (curated + private repos, PAT secrets)
- Scheduler (env vars per processor cadence)
- Telemetry (UsageProcessor cadence, retention, export, ask)
- Privacy posture (per-session agnes mark-private, server-side audit, PostHog opt-in)
- Operator daily routine (/admin/activity, /admin/users, agnes admin * commands)
Replaces the implicit knowledge in docs/QUICKSTART.md, docs/ONBOARDING.md, docs/DEPLOYMENT.md, docs/HEADLESS_USAGE.md. The old docs stay but get a "see PLATFORM_SETUP.md" pointer at the top.
D.2: docs/HOWTO/ index — analyst guides
5 cookbook-style guides:
- docs/HOWTO/01-first-query.md - agnes pull, agnes catalog, first SQL
- docs/HOWTO/02-snapshots-for-remote.md - agnes snapshot create, when not to
- docs/HOWTO/03-private-session.md - agnes mark-private flow, what it does, what it doesn't
- docs/HOWTO/04-feedback-and-ask.md - agnes admin ask (admin) + how to report a problem
- docs/HOWTO/05-customizing-skills.md - install/uninstall, flea market upload, guardrails
Plus docs/HOWTO/README.md as an index.
D.3: CHANGELOG + spec doc
Single [Unreleased] section with ### Added / ### Changed / ### Internal covering everything in Phases A–C, plus Phase D as ### Documentation.
Rigor — double review + double tests
User explicitly requested. Per task:
- Implementation subagent writes the failing test FIRST, then implements, then verifies test passes (TDD).
- Spec compliance reviewer subagent verifies the implementation matches what the spec asked for — nothing missing, nothing extra.
- Code quality reviewer subagent checks for clarity, DRY, YAGNI, security smells, naming, file responsibility.
- E2E behavior reviewer subagent: NEW step beyond the superpowers:subagent-driven-development default. Runs the actual touchpoint end-to-end against a live test server (using existing fixtures from tests/conftest.py), confirms behavior matches the user-visible contract.
Per phase:
- After all tasks in a phase complete, dispatch one phase integration reviewer that exercises the full phase surface from outside (curl + CLI + DB inspection) and confirms inter-task coordination.
Per epic (at the very end):
- Security review across the whole diff (SQL injection, path traversal, LLM prompt injection in agnes admin ask)
- Code architecture review across the whole diff (file responsibility, repo boundaries, no drift)
- End-to-end behavior review: runs every CLI command + every web endpoint on a live server, screenshots /admin/users/<id> and /marketplace Most Popular, verifies admin can answer 3 realistic questions via agnes admin ask.
Acceptance criteria
When all phases complete:
- usage_events table populates from a single seeded session within one run-session-processor tick.
- /marketplace shows a "Most Popular - last 30 days" section with at least one curated and one flea card after the seeded data is processed.
- /admin/users/<id> Sessions section lists the seeded user's sessions with single-file download + "Download all (.zip)" button. Both write audit_log rows.
- GET /api/admin/usage/export?format=csv&since=2026-05-01 returns a valid CSV stream with all seeded events.
- agnes admin usage export --format json --out /tmp/out.json writes a valid JSON file with the same rows.
- agnes admin ask "how many times was Bash invoked yesterday" returns a SELECT statement + the answer row.
- agnes admin usage reprocess clears usage_events + session_processor_state rows for usage only; verification rows untouched.
- New operator opening docs/PLATFORM_SETUP.md can bootstrap a fresh Agnes instance with telemetry enabled in under 30 min.
- Per-session agnes mark-private (Minas's #242) prevents the session from reaching usage_events, verified by a regression test.
- Full pytest suite green; double-review trail recorded in git commits per task.
Phasing & PR strategy
One PR, zs/platform-telemetry, stacked on zs/spec-activity-center. Each phase = one or more commits; tasks within a phase share state.
Order:
- Phase A.1 (schema) — 1 commit
- Phase A.2 (attribution) — 1–2 commits
- Phase A.3 (UsageProcessor) — 2–3 commits
- Phase B.1 (marketplace stats) — 1–2 commits
- Phase B.2 (per-user sessions) — 1–2 commits
- Phase C.1+C.2 (export endpoint + CLI) — 2 commits
- Phase C.3 (agnes admin ask) - 2–3 commits
- Phase C.4 (reprocess + prune) - 1 commit
- Phase D.1+D.2+D.3 (docs + CHANGELOG) — 2–3 commits
Total: ~13–19 commits. Roughly 25–35 hours of subagent work.
Out of scope (parked for v2)
- /admin/usage drill-down dashboard (Activity Monitoring Plan §parked)
- Cross-link of usage_events into /admin/activity timeline (decided in A.4)
- LLM friction tagging on events
- Bash regex error classification
- Retry-loop detection
- Per-plugin version stats
- Per-user own-data dashboards (Resource type OWN_USAGE)
- Real-time push (WebSocket)
- "Users who used X also used Y" co-occurrence signal
- Flea market team directory (Boss Q3, skip per directive)
- Anonymized telemetry mode (counts ok, content masked): agnes mark-private covers the all-or-nothing case