# Admin Observability — parent spec

> **Status:** spec / discussion. Verified against `origin/main` at `65342cd1` (release 0.49.0).
> Schema v39. Worktree: `tmp_oss-activity-spec`.
>
> **Children (executable plans):**
> - `2026-05-11-activity-center-mvp.md` — Activity Center rebuild + audit gap closure (this PR)
> - (next) `2026-05-NN-admin-sessions.md` — `/admin/sessions` + `failure_scan` processor
> - (next) `2026-05-NN-feedback-inbox.md` — `agnes report` CLI + `/admin/feedback` + Claude skill

---

## 1. Why this exists

Agnes today has dozens of moving server-side processes — scheduler ticks, syncs, materialized BQ runs, marketplace clones, memory pipeline, RBAC mutations, PAT issuance, session uploads, queries. Some land in `audit_log`, some in `sync_history`, some only in container stdout, some nowhere.

An admin who asks **"is my Agnes instance healthy and what happened?"** today does one of three things:

1. SSHs into the VM and runs `docker logs` across containers.
2. Opens DuckDB directly with `duckdb /data/state/system.duckdb`.
3. Clicks through five separate admin pages (`/admin/scheduler-runs`, `/admin/tokens`, `/admin/access`, `/admin/marketplaces`, `/admin/users`) and stitches the picture together.

`/activity-center` was supposed to fix this. It doesn't — the template renders fake "Executive Pulse / Maturity Roadmap / Business Processes" sections fed by an empty handler context. Issue #206.

This spec rebuilds it as **`/admin/activity`** and adds two adjacent observability surfaces:

- **`/admin/sessions`** — admin browses Claude Code session transcripts across users, finds failure patterns ("where Claude got stuck so we can fix the CLI / setup prompt / skill"). New `failure_scan` processor in the existing `services/session_pipeline/` framework.
- **`/admin/feedback`** — inbox for explicit user-reported problems. New `agnes report` CLI command + Claude skill + new `feedback_reports` table.

The three surfaces together turn Agnes from a black box into a glass box for operators.

---

## 2. Audience model (no personas, just resources + RBAC)

Per v13 RBAC, the only hard distinction is `is_admin=true` (god-mode) vs. everyone else. We do not introduce new role bits. Instead we frame everything as **resources** that admins control via existing `resource_grants`. When the spec says "admin sees X" it means "the page is gated by `require_admin`; admin can later grant the underlying resource to other groups if the customer asks".

### Resources used / introduced

| Resource | Read-own surface | Manage-all surface | New / existing |
|---|---|---|---|
| Server operations (`audit_log` + `sync_history` + `session_processor_state`) | — | `/admin/activity` | rebuilt |
| Session transcripts (`${DATA_DIR}/user_sessions/<user>/*.jsonl`) | `/profile/sessions` | `/admin/sessions` | NEW page |
| Failure findings (new `session_findings` table) | — | tab in `/admin/sessions` | NEW table |
| User feedback (`feedback_reports` table — NEW) | (write-only via `agnes report`) | `/admin/feedback` | NEW |
| All others | various existing pages | various existing pages | unchanged |

---

## 3. Non-goals

- ❌ Replacing `/admin/diagnose`. Different question (current state vs. history).
- ❌ Strategic / exec value-reporting. The current template's "maturity roadmap" / "decisions supported" framing is deleted.
- ❌ Live streaming (SSE / WebSocket). Polling every 30s is enough.
- ❌ Cross-instance / fleet view.
- ❌ Mandatory LLM features. Activity Center works fully without PostHog or any external service.
- ❌ Analyst-side `/profile/activity`. Their existing `/profile/sessions` is already their personal audit trail in practice; adding a third profile page is not justified.

---

## 4. State on `origin/main` — verified facts the spec depends on

### 4.1 Schema (`src/db.py:43`)

```python
SCHEMA_VERSION = 39
```

Tables relevant to this work:

- `audit_log` (`id, timestamp, user_id, action, resource, params JSON, result, duration_ms`) — the primary event source. **30+ writer call sites today.**
- `sync_history` (`id, table_id, synced_at, rows, duration_ms, status, error`) — per-table sync events.
- `session_processor_state` (`processor_name, session_file, username, processed_at, items_extracted, file_hash`) — composite PK `(processor_name, session_file)`. **Per-processor checkpoint.**
- `verification_evidence`, `knowledge_items`, `knowledge_contradictions`, `knowledge_item_relations` — memory pipeline output (read-only for AC).
- `telegram_links` (`user_id PK, chat_id, linked_at`) — for admin notifications.
- `users`, `user_groups`, `user_group_members`, `resource_grants` — RBAC.
- `instance_templates` (singleton template store from earlier PRs; #246 proposes folding it into a unified content store, not yet built).

### 4.2 session_pipeline framework (`services/session_pipeline/contract.py`)

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol

import duckdb


@dataclass(frozen=True)
class ProcessorResult:
    items_count: int = 0


class SessionProcessor(Protocol):
    name: str
    cadence_minutes: int

    def process_session(
        self,
        session_path: Path,
        username: str,
        session_key: str,
        conn: duckdb.DuckDBPyConnection,
    ) -> ProcessorResult: ...
```

- Runner: `services/session_pipeline/runner.py`. Idempotent per `(processor_name, session_file, file_hash)`.
- Registry: `services/session_processors/__init__.py:PROCESSORS = {"verification": …, "usage": …}`.
- Scheduler invokes: `POST /api/admin/run-session-processor?processor=<name>` (env-overridable interval per processor, e.g. `SCHEDULER_USAGE_PROCESSOR_INTERVAL=600`).

**Implication:** failure-scan is a third processor following the same protocol. No new framework code.

### 4.3 Audit coverage gaps (verified)

These endpoints exist today and do **not** write `audit_log`:

| Endpoint | File | Reason needed in AC |
|---|---|---|
| `POST /api/sync/trigger` | `app/api/sync.py:772` | The dominant scheduler-fired action; today only the call to the scheduler endpoint is audited, not what actually ran. |
| `POST /api/scripts/run-due` | `app/api/scripts.py:138` | Custom user scripts running on-server with no trail. |
| `POST /api/query` + variants | `app/api/query.py:140+` | Analyst queries — invisible without #158. |
| `POST /api/query-hybrid` | `app/api/query_hybrid.py` | Same. |
| `POST /api/upload/sessions` | `app/api/upload.py:55` | Session push — invisible. |
| `GET /api/data/{table_id}/download` | `app/api/data.py:45` | Parquet pulls — invisible. |

The MVP closes the four non-query gaps. Query attribution (#158) is its own scope.

### 4.4 PostHog (`src/observability/posthog_client.py`)

Singleton `get_posthog()`, methods:

- `.capture(event: str, distinct_id: str, properties: dict | None) -> None`
- `.capture_exception(exc, distinct_id, request, properties) -> None`
- `.is_feature_enabled(key, distinct_id, default)` — usable for opt-in feature flags inside AC

Off by default (`POSTHOG_API_KEY` unset). All call sites must be no-op-safe.

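One way to keep call sites no-op-safe is to route them through a small wrapper. The sketch below is illustrative only: `safe_capture` is a hypothetical helper, the client is passed in so the snippet runs stand-alone, and treating a disabled PostHog as `None` is an assumption about what `get_posthog()` returns when `POSTHOG_API_KEY` is unset.

```python
from typing import Any, Optional


def safe_capture(
    client: Optional[Any],
    event: str,
    distinct_id: str,
    properties: Optional[dict] = None,
) -> bool:
    """Best-effort capture. Returns True only if the event was handed off."""
    if client is None:  # assumed shape of "PostHog disabled"
        return False
    try:
        # Same positional order as the .capture() signature listed above.
        client.capture(event, distinct_id, properties)
        return True
    except Exception:
        # Analytics must never break the user-visible flow.
        return False
```

Returning a boolean instead of raising lets callers ignore the result entirely, which is the point of the no-op-safe rule.
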
### 4.5 Telegram (`services/telegram_bot/sender.py`)

```python
async def send_message(chat_id: int, text: str, parse_mode: str = "Markdown") -> bool: ...
```

Look up the `telegram_links` row by `user_id`. No existing admin notification flow — feedback inbox is its first user.

---

## 5. Architecture decisions

### 5.1 Where the three surfaces live

```
/admin/activity ← rebuilt /activity-center (this PR)
/admin/sessions ← NEW (follow-up plan)
/admin/feedback ← NEW (follow-up plan)
```

All three:

- Gated by `Depends(require_admin)` — no new resource type for now.
- Listed in `_app_header.html` admin dropdown.
- Share a common drawer / detail-modal pattern (one Jinja partial reused).
- Share the same audit-recursive rule: reading from these endpoints itself writes one `audit_log` row.
- Each gets a top-of-page health micro-summary that links to the Activity Center health pulse.

### 5.2 Data — separate `change_log` vs. fattened `audit_log`

**Decision: fatten `audit_log`** with four new columns.

Rationale: Adding a separate `change_log` table requires every mutating endpoint to write to two places, doubling the failure modes. The audit_log row IS the change log entry, plus `params_before` for diff/rollback purposes. The vast majority of audit rows are non-mutations (reads, ticks, queries) where `params_before` is null — null storage cost in DuckDB is trivial.

Schema migration **v40**:

```sql
ALTER TABLE audit_log ADD COLUMN params_before JSON; -- prior state, null for non-mutations
ALTER TABLE audit_log ADD COLUMN client_ip VARCHAR; -- promoted from params for indexability
ALTER TABLE audit_log ADD COLUMN client_kind VARCHAR; -- 'cli' | 'web' | 'agent' | 'scheduler' | 'external'
ALTER TABLE audit_log ADD COLUMN correlation_id VARCHAR; -- groups multi-step operations
CREATE INDEX idx_audit_timestamp_desc ON audit_log(timestamp);
CREATE INDEX idx_audit_user_time ON audit_log(user_id, timestamp);
CREATE INDEX idx_audit_action_time ON audit_log(action, timestamp);
```

`AuditRepository.log()` gains the four new kwargs. Existing callers are unaffected because the new kwargs default to `None`.

**Operational note (reviewer pass):** DuckDB does **not** honor `DESC` in `CREATE INDEX` — the planner picks direction at query time. The `_desc` suffix in the index name is informative, not directive. Direction is enforced by `ORDER BY ... DESC` in `AuditRepository.query()`.

**Upgrade window (reviewer pass):** index creation on a populated `audit_log` (>100k rows) is single-threaded and may take 30–60s per index. Customers upgrading to v40 should expect a 30–120s startup window on first launch. CHANGELOG entry for v40 must call this out.

### 5.3 Filtering & pagination

`AuditRepository.query()` today supports `user_id`, `action`, `limit`. Rewrite to:

```python
def query(
    self,
    *,
    since: datetime | None = None,
    until: datetime | None = None,
    user_id: str | None = None,
    action_prefix: str | None = None,  # 'sync.', 'query.', 'auth.', …
    action_in: list[str] | None = None,
    resource: str | None = None,
    result_pattern: str | None = None,  # 'success', 'error.%'
    correlation_id: str | None = None,
    q: str | None = None,  # full-text over params JSON
    cursor: tuple[datetime, str] | None = None,  # (timestamp, id)
    limit: int = 100,
) -> tuple[list[dict], tuple[datetime, str] | None]:
    ...
```

Returns `(rows, next_cursor)`. Cursor encodes `(timestamp, id)` to make pagination stable under same-second writes. All filters AND together. `q` does `LIKE '%substring%'` on `params::TEXT` for v1; FTS upgrade is later.

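The `(timestamp, id)` cursor is a keyset pagination scheme. The sketch below shows only the cursor part of the query, under stated assumptions: `build_page_query` and `split_page` are hypothetical helper names, the other filters are left out, and fetching one look-ahead row to decide whether a next page exists is one possible approach, not necessarily what the repository does.

```python
from datetime import datetime
from typing import Optional


def build_page_query(
    cursor: Optional[tuple[datetime, str]] = None,
    limit: int = 100,
) -> tuple[str, list]:
    """Build a newest-first page query with a (timestamp, id) keyset cursor."""
    sql = "SELECT * FROM audit_log"
    params: list = []
    if cursor is not None:
        ts, row_id = cursor
        # Strictly older than the cursor row; id breaks same-timestamp ties.
        sql += " WHERE (timestamp < ? OR (timestamp = ? AND id < ?))"
        params.extend([ts, ts, row_id])
    sql += " ORDER BY timestamp DESC, id DESC LIMIT ?"
    params.append(limit + 1)  # one look-ahead row to detect a next page
    return sql, params


def split_page(rows: list[dict], limit: int):
    """Trim the look-ahead row and derive the next cursor, if any."""
    page = rows[:limit]
    has_more = len(rows) > limit
    next_cursor = (page[-1]["timestamp"], page[-1]["id"]) if has_more else None
    return page, next_cursor
```

Because the cursor names the last row actually returned, rows written in the same second as that row are neither skipped nor repeated on the following page.
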
### 5.4 Health pulse

Single endpoint `GET /api/admin/activity/health` returning a JSON dict cached server-side 30s:

```json
{
  "status": "green | yellow | red",
  "fields": [
    {"key": "scheduler", "value": "47s ago", "raw_seconds": 47, "color": "green", "click_filter": "action_prefix=run_"},
    {"key": "sync_24h", "value": "18 ok / 2 fail", "ok": 18, "fail": 2, "color": "yellow", "click_filter": "action_prefix=sync."},
    {"key": "active_users_today", "value": "12", "color": "green"},
    {"key": "memory_pipeline", "value": "ok (3 runs)", "color": "green", "click_filter": "action_prefix=run_session_processor"},
    {"key": "diagnose_warnings", "value": "0", "color": "green"}
  ],
  "sentence": "All systems nominal — 12 active users, last sync 4 min ago, no warnings."
}
```

Thresholds in code, not config. Acceptance: each field can be tested deterministically by seeding `audit_log` / `sync_history` and using frozen-clock fixtures.

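To illustrate what "thresholds in code" could look like, here is a minimal sketch of per-field coloring and the page-level roll-up. Every number in it is hypothetical (the spec does not fix the thresholds), and taking the worst field color as the overall `status` is an assumption.

```python
from typing import Iterable

_SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def scheduler_color(seconds_since_tick: float) -> str:
    """Color the scheduler field by staleness (thresholds are hypothetical)."""
    if seconds_since_tick <= 120:
        return "green"
    if seconds_since_tick <= 600:
        return "yellow"
    return "red"


def sync_color(ok: int, fail: int) -> str:
    """Any failure turns the field yellow; more failures than successes, red."""
    if fail == 0:
        return "green"
    return "red" if fail > ok else "yellow"


def overall_status(colors: Iterable[str]) -> str:
    """Assumed roll-up: the page status is the worst color among the fields."""
    return max(colors, key=_SEVERITY.__getitem__, default="green")
```

Pure functions like these take plain numbers, so the deterministic tests called for above only need seeded rows and a frozen clock to produce the inputs.
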
### 5.5 What gets MVP and what gets P2

| Activity Center tab | MVP (this PR) | Phase B | Phase C |
|---|---|---|---|
| Health pulse | ✓ | — | — |
| Timeline | ✓ | params_before diff | — |
| Sync (per-table grid) | ✓ | — | — |
| Changes (mutations) | — | ✓ (read-only diff) | rollback |
| Queries | — | — | ✓ (gated on #158) |
| Performance | — | — | ✓ |
| Usage (DAU/WAU) | — | ✓ | — |
| Costs | — | — | ✓ |

### 5.6 `/admin/sessions` — failure_scan processor

New file `services/session_processors/failure_scan.py`. Heuristics (deterministic, no LLM in v1):

| Signal | Detection |
|---|---|
| Tool error | turn with `tool_use` followed by tool result containing `is_error: true` / `exit code [1-9]` |
| Permission denied | tool result contains `permission denied` (case-insensitive) |
| User rejection | user turn matching regex `\b(no\|stop\|wrong\|not what\|incorrect\|broken)\b` AND length < 60 chars |
| Loop pattern | 3+ consecutive assistant turns with same `tool_use.name` and similar input hash |
| Abrupt end | last turn `role=user` (never closed by assistant) |

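A few of the signals above can be sketched as a single pass over parsed turns. This is a simplified illustration, not the planned processor: the turn shape (dicts with `role`, `text`, `is_error`) is an assumed, flattened view of the session JSONL, `scan_turns` is a hypothetical name, the loop-pattern signal is omitted, and matching the rejection regex case-insensitively is an assumption.

```python
import re

# Regex taken from the table; case-insensitive matching is an assumption.
_REJECTION = re.compile(
    r"\b(no|stop|wrong|not what|incorrect|broken)\b", re.IGNORECASE
)


def scan_turns(turns: list[dict]) -> list[tuple[int, str]]:
    """Return (turn_index, finding_type) pairs for one parsed session."""
    findings = []
    for i, turn in enumerate(turns):
        role, text = turn.get("role"), turn.get("text", "")
        if role == "tool":
            if turn.get("is_error"):
                findings.append((i, "tool_error"))
            if "permission denied" in text.lower():
                findings.append((i, "permission_denied"))
        elif role == "user" and len(text) < 60 and _REJECTION.search(text):
            findings.append((i, "user_rejection"))
    # A session whose last turn is from the user was never closed.
    if turns and turns[-1].get("role") == "user":
        findings.append((len(turns) - 1, "abrupt_end"))
    return findings
```

Returning plain `(turn_index, finding_type)` pairs keeps the heuristics testable without a database; a real processor would then map them onto findings rows.
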
Writes findings to NEW table `session_findings`:

```sql
CREATE TABLE session_findings (
    id VARCHAR PRIMARY KEY,
    session_file VARCHAR NOT NULL,
    username VARCHAR NOT NULL,
    finding_type VARCHAR NOT NULL, -- tool_error | permission_denied | user_rejection | loop | abrupt_end
    turn_index INTEGER NOT NULL,
    severity VARCHAR DEFAULT 'info', -- info | warning | error
    excerpt TEXT, -- short context for UI display
    detected_at TIMESTAMP DEFAULT current_timestamp
);
CREATE INDEX idx_session_findings_session ON session_findings(session_file);
CREATE INDEX idx_session_findings_type ON session_findings(finding_type);
```

Admin UI (`/admin/sessions`):

- List view: one row per session JSONL file, sortable by recency / # findings / user, filters: user, date range, has finding of type X
- Detail view: chronological replay of the session JSONL with finding markers inline; click a finding → highlights the relevant turn(s)
- Aggregated view: heatmap "finding type × week" across all users

### 5.7 `/admin/feedback` — feedback_reports + `agnes report`

NEW table:

```sql
CREATE TABLE feedback_reports (
    id VARCHAR PRIMARY KEY,
    created_at TIMESTAMP DEFAULT current_timestamp,
    reporter_user VARCHAR, -- nullable for anonymous (future)
    message TEXT NOT NULL,
    session_excerpt TEXT, -- last N turns of JSONL serialized
    session_file VARCHAR, -- pointer to full JSONL if uploaded
    environment JSON, -- agnes version, OS, claude code version
    fingerprint VARCHAR, -- sha256 over (message + last error excerpt) for dedup
    status VARCHAR DEFAULT 'open', -- open | triaged | resolved | wontfix
    assignee VARCHAR,
    tags JSON, -- ['cli', 'setup-prompt', 'skill-name', …]
    resolution TEXT,
    resolved_at TIMESTAMP,
    resolved_by VARCHAR
);
CREATE INDEX idx_feedback_status_created ON feedback_reports(status, created_at);
CREATE INDEX idx_feedback_fingerprint ON feedback_reports(fingerprint);
```

End-to-end flow:

1. Analyst (or Claude proactively) runs `agnes report --message "…"`.
2. CLI bundles last 50 turns of current session JSONL (via `cli/lib/claude_sessions.py:list_session_files`) + env info.
3. **CLI shows preview** ("This will be sent: …") and asks for confirmation. Mandatory — never silent submission.
4. `POST /api/feedback` with the bundle.
5. Server inserts row, computes `fingerprint`, writes `audit_log(action='feedback.report')`, returns `report_id`.
6. Server triggers Telegram notification to all admin users with linked `chat_id` (best-effort, swallowed errors).
7. Admin opens `/admin/feedback`, clicks row → modal with full message + session replay + env.
8. Admin actions: assign to self, tag, mark resolved (with resolution text), mark wontfix.

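The `fingerprint` computed in step 5 could look like the sketch below. Only "sha256 over message + last error excerpt" comes from the schema comment; the function name `feedback_fingerprint`, the whitespace and case normalization, and the NUL separator are all assumptions added for illustration.

```python
import hashlib


def feedback_fingerprint(message: str, last_error_excerpt: str = "") -> str:
    """sha256 hex digest over the normalized message and last error excerpt."""

    def norm(s: str) -> str:
        # Assumed normalization: lower-case and collapse runs of whitespace.
        return " ".join(s.lower().split())

    # The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    payload = norm(message) + "\x00" + norm(last_error_excerpt)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With some normalization in place, two reports that differ only in spacing or capitalization land on the same fingerprint, which is what makes the `idx_feedback_fingerprint` index useful for dedup.
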
Claude-side trigger: a first-party skill `agnes-report` (in the OSS marketplace) that bundles current session and invokes `agnes report`. Skill manifest lives in `services/marketplace/oss/agnes-report/` (sibling to existing system plugins from #241).

---

## 6. Static content (CLAUDE.md template, copy on the new pages)

Issue #246 proposes a unified content framework. The MVP does NOT block on it — new pages embed copy directly in templates. When #246 lands, those strings move to `instance_content` slugs. Tracked as P2 follow-up; no migration debt incurred because the templates are small.

---

## 7. Security & privacy

### 7.1 Access

- All `/admin/activity/*`, `/admin/sessions/*`, `/admin/feedback/*` endpoints: `Depends(require_admin)`.
- No new resource type. Admin god-mode for v1. Future: optional `audit:read` grant for a hypothetical "compliance" group.

### 7.2 PII in `params`

- Default UI render: literal values in SQL strings masked to `?` placeholders; literal strings elsewhere truncated to 128 chars.
- **"Show raw" toggle + `audit.reveal_raw` logging deferred to Phase B** (reviewer pass): MVP ships with truncation-only display. The toggle UI + its dedicated audit action land alongside the Changes/Diff tab. Until then, admins who need raw values open DuckDB directly — that path itself does not leave a trace, which is documented as a known v40 gap.
- Database always stores raw values. Masking is render-side, not storage-side.

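The render-side rules above can be approximated with a regex pass. This is a rough sketch under assumptions: `mask_sql_literals` and `truncate_for_display` are hypothetical names, and the pattern only covers single-quoted strings and bare numbers, so a real implementation would likely need a proper SQL tokenizer to handle comments, dollar-quoting, and similar cases.

```python
import re

# Single-quoted SQL string literals ('' is an escaped quote) or bare numbers.
_SQL_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")
MAX_RENDER_CHARS = 128  # truncation limit stated in the rule above


def mask_sql_literals(sql: str) -> str:
    """Replace literal values in a SQL string with ? placeholders."""
    return _SQL_LITERAL.sub("?", sql)


def truncate_for_display(value: str, limit: int = MAX_RENDER_CHARS) -> str:
    """Cut long strings for rendering; the stored value stays untouched."""
    return value[:limit]
```

Both helpers return new strings and never write back, in line with masking being render-side rather than storage-side.
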
### 7.3 Recursive audit

Every read of `/admin/activity` / `/admin/sessions` / `/admin/feedback` writes `audit_log(action='activity.read' | 'sessions.read' | 'feedback.read')`. Suppressed when:

- Endpoint is the polling health endpoint (high-frequency, low signal).
- Same actor + same filter combination within last 60s.

**Reviewer note — single-worker assumption (v40):** The suppression cache (`_RECENT_AUDITS`) and health-pulse cache (`_HEALTH_CACHE`) are per-process module-level dicts. v40 ships with the existing **single-worker uvicorn** default (no compose change required). When multi-worker uvicorn is later enabled, both caches move to a shared store — a separate plan tracks that. Until then, dedup is per-worker and a multi-worker deployment would let one bad actor produce N rows / minute instead of 1.

### 7.4 Feedback privacy

`session_excerpt` is included in the feedback payload. Skill / CLI **must show preview before submit** — this is a hard requirement, not a UX suggestion. Logged in `audit_log(action='feedback.report', params={ack_preview: true})`.

Server stores excerpts as text. Retention default unbounded; admin can purge `feedback_reports` row directly (still leaves audit_log trace).

---

## 8. Observability of observability

- All new endpoints emit PostHog events when PostHog is enabled:
  - `activity_health_viewed`
  - `activity_timeline_filtered` (with filter keys, not values)
  - `feedback_report_submitted`
  - `session_failure_detected`
- All swallowed errors are reported through `posthog.capture_exception()`.
- PostHog events are best-effort; never block the user-visible flow.

---

## 9. Phasing across subsystems

```
WEEK 1  ┌─ Activity Center MVP (this PR) ──────────────────────────┐
        │ - schema v40 (audit_log columns + indices)               │
        │ - AuditRepository.query() rewrite                        │
        │ - SyncHistoryRepository.list_recent()                    │
        │ - close 4 audit gaps (sync.trigger, scripts.run-due,     │
        │   upload.sessions, data.download)                        │
        │ - /admin/activity handler + template                     │
        │ - Health pulse + Timeline + Sync tabs                    │
        │ - redirect /activity-center → /admin/activity            │
        │ - delete demo template content (BREAKING)                │
        └──────────────────────────────────────────────────────────┘

WEEK 2  ┌─ Admin sessions (separate plan) ─────────────────────────┐
        │ - schema v41 (session_findings table)                    │
        │ - services/session_processors/failure_scan.py            │
        │ - register in PROCESSORS + scheduler JOBS                │
        │ - /admin/sessions list + detail                          │
        │ - integrate with Activity Center timeline                │
        └──────────────────────────────────────────────────────────┘

WEEK 3  ┌─ Feedback inbox (separate plan) ─────────────────────────┐
        │ - schema v42 (feedback_reports table)                    │
        │ - POST /api/feedback endpoint                            │
        │ - cli/commands/report.py                                 │
        │ - agnes-report skill in OSS marketplace                  │
        │ - /admin/feedback list + detail                          │
        │ - Telegram admin notifications                           │
        └──────────────────────────────────────────────────────────┘

WEEK 4+ ┌─ Phase B / C (separate plans) ───────────────────────────┐
        │ - params_before + Changes tab + Rollback (B)             │
        │ - Usage tab (B)                                          │
        │ - Queries tab gated on #158 (C)                          │
        │ - Performance tab (C)                                    │
        │ - LLM scoring in failure_scan (C)                        │
        │ - GitHub issue auto-file from feedback row (C)           │
        └──────────────────────────────────────────────────────────┘
```

Each weekly chunk is a separate PR with its own CHANGELOG entry. Order matters: Activity Center first because closing the audit gaps benefits the other two surfaces' timelines.

---

## 10. Open questions (decisions still owed)

1. **Rollback in Phase B — generic vs. allowlist?** Recommendation: allowlist of 9 specific actions (`instance_config.update`, `registry.update/create/delete`, `resource_grants.add/remove`, `user_groups.*`, `user_group_members.*`, `instance_templates.set`). Generic rollback is a footgun.
2. **Telegram admin notification volume.** Feedback reports could come fast. Recommendation: rate-limit per admin to 1 message / 5 min; daily digest for the rest. Configurable later.
3. **Session replay in feedback** — store full JSONL or last 50 turns only? Recommendation: last 50 turns inline + pointer to full file if it still exists. Avoids storing duplicate JSONLs in the DB.
4. **`agnes report` always uploads the session, or opt-in?** Recommendation: prompt every time. Power-users can add `--yes` to bypass; default is interactive.
5. **failure_scan LLM scoring in v1 or v2?** Recommendation: v1 deterministic heuristics only. LLM scoring is v2 once we have data to validate heuristic precision against.
6. **`/admin/scheduler-runs` deprecation timing.** Recommendation: keep as a redirect to `/admin/activity?action_prefix=run_session_processor` after MVP ships; remove after one release cycle.

---

## 11. What this displaces / replaces

- `/activity-center` → redirected to `/admin/activity`. Demo template content deleted (BREAKING per CHANGELOG).
- `/admin/scheduler-runs` → redirected to `/admin/activity?action_in=run_session_processor:verification,run_session_processor:usage,marketplace.sync_all` after week 1.
- Dashboard widget pointing at `/activity-center` → URL updated to `/admin/activity`.

Nothing else removed. `/admin/diagnose`, `/admin/tokens`, `/admin/access`, `/admin/marketplaces`, `/admin/registry`, `/admin/server-config` remain as mutating surfaces; Activity Center deep-links into them.

---

## 12. Acceptance criteria for the whole programme (across all three subsystems)

When all three subsystems have shipped:

1. An admin opening `/admin/activity` sees, within 500ms p95, a health pulse and a chronological timeline of every event on the instance for the last 24h.
2. Every audit-writing endpoint (incl. the 4 newly instrumented in week 1) appears in the timeline within the same admin session as the action.
3. An admin clicking on a `sync` event sees the sync_history detail; clicking on a `feedback.report` event sees the feedback row; clicking on a `run_session_processor` event sees the per-processor state row.
4. An analyst running `agnes report --message "test"` produces a `feedback_reports` row, an `audit_log` row, a Telegram message to any admin with a linked chat_id, and visible entries in both `/admin/feedback` and `/admin/activity`.
5. A Claude Code session that contains a tool error triggers a row in `session_findings` after the next `failure_scan` processor tick, surfaced in `/admin/sessions`.
6. Removing the broken demo content from `activity_center.html` lands in a single PR with CHANGELOG `**BREAKING**` marker.
7. All three pages render correctly with PostHog disabled (no events emitted, no client snippet injected for AC's own analytics, page works fully).
8. Every new admin page passes a smoke test that asserts: invoking an audit-writing endpoint surfaces the row in the page's API response within the same test.

---

## 12a. Reviewer pass — applied & deferred

Three sub-agent reviews (security, production resilience, code architecture) ran against the original draft. Consolidated outcomes:

**Applied to spec:**

- §5.2 — DuckDB DESC behavior + upgrade-window note
- §7.2 — `audit.reveal_raw` mechanism deferred to Phase B
- §7.3 — explicit single-worker uvicorn assumption for v40

**Applied to plan (`2026-05-11-activity-center-mvp.md`):**

- Import path corrected (`app.auth.dependencies._get_db`)
- Test fixtures aligned with `seeded_app` / `admin_user` / `get_system_db()` pattern from existing `tests/conftest.py`
- All new audit writes wrapped in `try/except + logger.exception`
- Filename sanitization on `POST /api/upload/sessions`
- 256-char length cap on logged strings
- 7-day cap when `q` filter used without explicit `since`
- Migration idempotency + representative evolved-DB test
- Conventions section added at the top of the plan

**Deferred with rationale (out of MVP):**

- `audit.reveal_raw` toggle + UI (Phase B)
- Shared-cache multi-worker support (separate plan)
- Health pulse threshold env config (P2 polish)
- `diagnose_warnings` real count (depends on diagnose endpoint expansion)
- Default audit retention policy (Phase B follow-up)
- PostHog SDK timeout knob (add if observed in prod)

Reviewer reports are not separately archived in the repo — their consolidated outputs landed as the inline edits above and the "Revisions applied" appendix in the plan doc.

---

## 13. Implementation plan documents

This spec is the parent. The executable plans are:

- **`2026-05-11-activity-center-mvp.md`** — full TDD task list for Week 1 work. **Start here.**
- (next) `2026-05-NN-admin-sessions.md` — failure_scan + /admin/sessions.
- (next) `2026-05-NN-feedback-inbox.md` — agnes report + /admin/feedback.

Each child plan refers back to this spec for cross-cutting decisions.