* docs(spec): admin observability spec + Activity Center MVP plan
Parent spec (480 lines) + executable plan (2295 lines, 14 TDD tasks).
Covers Activity Center rebuild (/admin/activity), with /admin/sessions
and /admin/feedback deferred to follow-up plans.
Already incorporates reviewer-pass revisions across three angles
(security, production resilience, code architecture):
- _get_db import path corrected to app.auth.dependencies
- Test fixtures aligned with seeded_app / admin_user / get_system_db
- All new audit writes wrapped in try/except + logger.exception
- Filename sanitization on session uploads
- DuckDB DESC index behavior documented; upgrade window flagged
- Migration idempotency + evolved-DB test cases
- reveal_raw + shared-cache multi-worker explicitly deferred
Targets schema v40 (audit_log gains params_before, client_ip,
client_kind, correlation_id + 3 indices).
* feat(db): schema v40 — audit_log gains params_before, client_ip, client_kind, correlation_id + 3 indices
* chore(test): clean up Task 1 — drop unused import, rename stale test
* feat(audit): AuditRepository.log() accepts params_before/client_ip/client_kind/correlation_id
* test(audit): strengthen params_before assertion to round-trip JSON content
* feat(audit): AuditRepository.query() rich filters + keyset cursor pagination
* feat(sync): SyncStateRepository.list_recent() cross-table feed
* feat(audit): POST /api/sync/trigger writes audit_log row
* feat(audit): POST /api/scripts/run-due writes audit_log row
* feat(audit): POST /api/upload/sessions writes audit_log row + sanitizes filename
* feat(audit): GET /api/data/{table_id}/download writes audit_log row
* feat(activity): /api/admin/activity timeline + /health + /sync endpoints
* feat(ui): /admin/activity rebuilt — health pulse, timeline, sync grid; /activity-center → 308 redirect
BREAKING: removed demo executive-pulse / maturity-roadmap content from activity_center.html.
The page now reflects real audit_log + sync_history data.
* feat(ui): admin nav + dashboard widget point at /admin/activity
* feat(activity): recursive-audit suppression for AC read endpoints (60s window per actor+filter)
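A minimal sketch of the 60s suppression window. It assumes an in-process dict keyed by (actor, filter) and an injectable clock. `AuditSuppressor`, `should_log` and `filter_key` are hypothetical names, not the shipped helper. Because the state lives in a single process, each worker keeps its own window, which is consistent with shared-cache multi-worker support being deferred.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


def filter_key(filters: dict) -> Tuple[Tuple[str, str], ...]:
    # Order-independent, hashable form of the request's filter set (hypothetical).
    return tuple(sorted((str(k), str(v)) for k, v in filters.items()))


class AuditSuppressor:
    """Hypothetical: skip repeat audit rows for the same (actor, filter) inside a window."""

    def __init__(self, window_s: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.window_s = window_s
        self._clock = clock
        # Entries are never evicted in this sketch; a real helper would prune old keys.
        self._last_logged: Dict[Tuple[str, Hashable], float] = {}

    def should_log(self, actor_id: str, key: Hashable) -> bool:
        now = self._clock()
        slot = (actor_id, key)
        last = self._last_logged.get(slot)
        if last is not None and now - last < self.window_s:
            return False  # same actor re-read the same view within the window
        self._last_logged[slot] = now
        return True
```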
* feat(activity): emit PostHog events when integration enabled (no-op default)
* fix(audit): move v40 indices out of _SYSTEM_SCHEMA + update test_repositories to unpack query() tuple
_SYSTEM_SCHEMA CREATE INDEX on audit_log(timestamp) failed when migration
tests hand-roll a bare audit_log (id, action) without the timestamp column.
Fix: remove indices from _SYSTEM_SCHEMA; add ADD COLUMN IF NOT EXISTS guards
for timestamp and other pre-v40 columns in _v39_to_v40() so the upgrade path
is safe on any hand-rolled schema; call _v39_to_v40 explicitly in the
fresh-install (current==0) path to restore index creation there.
Also unpack the (rows, next_cursor) tuple from AuditRepository.query() in
the three TestAuditRepository tests that still treated it as a list.
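Here is a sketch of keyset pagination that returns a `(rows, next_cursor)` pair. Assumptions: stdlib `sqlite3` stands in for DuckDB, the cursor is opaque base64 JSON holding the last row's `(timestamp, id)`, and `query_audit` is a hypothetical stand-in for `AuditRepository.query()`. Fetching `limit + 1` rows shows whether another page exists without running a separate COUNT.

```python
import base64
import json
import sqlite3
from typing import List, Optional, Tuple


def encode_cursor(ts: str, row_id: str) -> str:
    return base64.urlsafe_b64encode(json.dumps([ts, row_id]).encode()).decode()


def decode_cursor(cursor: str) -> Tuple[str, str]:
    ts, row_id = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return ts, row_id


def query_audit(conn: sqlite3.Connection, limit: int = 50,
                cursor: Optional[str] = None,
                action: Optional[str] = None) -> Tuple[List[tuple], Optional[str]]:
    where: List[str] = []
    params: list = []
    if action is not None:
        where.append("action = ?")
        params.append(action)
    if cursor is not None:
        ts, row_id = decode_cursor(cursor)
        # Strictly after the last row in (timestamp DESC, id DESC) order;
        # id breaks ties between rows that share a timestamp.
        where.append("(timestamp < ? OR (timestamp = ? AND id < ?))")
        params.extend([ts, ts, row_id])
    sql = "SELECT id, timestamp, action FROM audit_log"
    if where:
        sql += " WHERE " + " AND ".join(where)
    sql += " ORDER BY timestamp DESC, id DESC LIMIT ?"
    params.append(limit + 1)  # one extra row signals that another page exists
    rows = conn.execute(sql, params).fetchall()
    next_cursor = None
    if len(rows) > limit:
        rows = rows[:limit]
        last_id, last_ts, _ = rows[-1]
        next_cursor = encode_cursor(last_ts, last_id)
    return rows, next_cursor
```

Callers unpack the tuple (`rows, next_cursor = query_audit(...)`), which is the same shape the fixed tests now expect.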
* docs: CHANGELOG entry for Activity Center MVP
* chore: refresh stale module docstring in app/api/activity.py
* feat(cli): agnes admin activity — terminal access to Activity Center (timeline + health + sync)
* fix(db): _v39_to_v40 — add IF NOT EXISTS guard for 'action' column
The v39→v40 ladder step adds defensive ADD COLUMN IF NOT EXISTS for
every audit_log column so a hand-rolled bare audit_log (id only) is
safe through the ladder. 'action' was missing from the guard list,
causing CREATE INDEX idx_audit_action_time to fail on tests that
stub audit_log with only an id column (tests/test_e2e_extract.py::
TestSchemaMigration::test_migration_preserves_and_extends).
Local 6/6 schema tests + the previously-failing CI test pass.
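Below is a sketch of the defensive ladder step, with stdlib `sqlite3` in place of DuckDB. SQLite has no `ADD COLUMN IF NOT EXISTS`, so the guard reads `PRAGMA table_info` instead. The column list and the second index name are illustrative, not the real v40 set. What matters is the ordering: every column an index references exists before any `CREATE INDEX` runs, and re-running the step does nothing.

```python
import sqlite3
from typing import Set

# Illustrative subset of audit_log columns; the real step guards every column.
_AUDIT_COLUMNS = {
    "action": "TEXT",
    "timestamp": "TIMESTAMP",
    "params_before": "TEXT",
    "client_ip": "TEXT",
    "client_kind": "TEXT",
    "correlation_id": "TEXT",
}


def existing_columns(conn: sqlite3.Connection, table: str) -> Set[str]:
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def v39_to_v40(conn: sqlite3.Connection) -> None:
    present = existing_columns(conn, "audit_log")
    for name, sql_type in _AUDIT_COLUMNS.items():
        if name not in present:
            conn.execute(f"ALTER TABLE audit_log ADD COLUMN {name} {sql_type}")
    # Indices run last, once every referenced column is known to exist.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_audit_action_time ON audit_log(action, timestamp)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_audit_correlation ON audit_log(correlation_id)")
```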
* docs(spec): platform telemetry epic — Boss directive + Activity Monitoring plan rebased onto v40 (stacked on zs/spec-activity-center)
* feat(db): schema v41 — 7 usage_* tables for telemetry (events, summary, rollups, attribution)
* chore(db): tighten v41 — usage_session_summary.session_id NOT NULL + upgrade test asserts all 7 tables
* feat(usage): UsageAttributionRepository — replace/delete/lookup over usage_attribution_* tables
* refactor(marketplace): extract list_inner_skills/agents/commands to src/marketplace_listing.py for reuse
* feat(usage): explode plugin attribution on marketplace sync + store entity write; backfill script
* refactor(marketplace): finish src/marketplace_listing.py extraction — drop duplicate _list_inner_* + _parse_frontmatter from app/api/marketplace.py
* feat(usage): promote attribution helpers to src/usage_attribution_helpers.py; hook update_entity rename + bundle-swap; clarify best-effort semantics
* feat(usage): UsageProcessor real extraction + rollup rebuild + 10 fixture-driven tests
* fix(usage): include tool_id in event hash + executemany + rollup transaction (critical multi-tool-turn drop fix)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
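A minimal sketch of why `tool_id` belongs in the dedupe key. The field names (`session_id`, `message_id`, `tool_id`) and the `INSERT OR IGNORE` dedupe are assumptions about the pipeline, and `sqlite3` stands in for DuckDB. A single assistant turn can emit several tool calls that share a session and message id. If the hash covers only those two fields, every call after the first collides and is silently ignored.

```python
import hashlib
import json
import sqlite3
from typing import Iterable, Mapping, Optional


def event_hash(session_id: str, message_id: str, tool_id: Optional[str]) -> str:
    # JSON-encoding the tuple avoids ambiguity from naive string concatenation.
    payload = json.dumps([session_id, message_id, tool_id or ""])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def insert_events(conn: sqlite3.Connection, events: Iterable[Mapping]) -> None:
    rows = [
        (event_hash(e["session_id"], e["message_id"], e.get("tool_id")), e["tool_name"])
        for e in events
    ]
    with conn:  # single transaction: the batch lands whole or not at all
        conn.executemany(
            "INSERT OR IGNORE INTO usage_events (event_hash, tool_name) VALUES (?, ?)",
            rows,
        )
```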
* feat(marketplace): popularity stats — invocations_30d + trend + sort=most_used|trending + Most Popular section
* feat(admin): /admin/users/<id> Sessions section — list + single-file + bulk-zip downloads (audit-logged)
* feat(usage): admin export endpoint + CLI — csv/json/parquet streaming, filters, audit-logged
* feat(usage): agnes admin ask — LLM Text-to-SQL over usage_events with SELECT-only validator (audit-logged)
* feat(usage): reprocess + prune endpoints + scheduler daily prune job + CLI
* docs: PLATFORM_SETUP.md operator playbook + HOWTO/ cookbook (5 guides + index)
Adds docs/PLATFORM_SETUP.md as a consolidated operator playbook covering
bootstrap, TLS, marketplaces (curated + flea), scheduler env vars, telemetry
extraction/export/ask/prune, privacy posture, and daily routine.
Adds docs/HOWTO/ with 5 analyst cookbook guides: first query, snapshots for
remote tables, private sessions, feedback + admin ask, and customizing skills.
Existing setup docs (QUICKSTART, DEPLOYMENT, ONBOARDING, HEADLESS_USAGE)
get a one-line cross-reference at the top pointing to PLATFORM_SETUP.md.
* docs(changelog): platform telemetry epic — usage_* foundation + surfaces + admin access + docs
Comprehensive [Unreleased] entry covering: usage_events/session_summary/
tool_daily/plugin_daily tables (v41), attribution lookup tables, backfill
script, marketplace Most Popular + invocation chips + sort, admin Sessions
section, export/ask/reprocess/prune endpoints + CLI mirrors, Activity Center
(v40), PLATFORM_SETUP.md + HOWTO/ docs, and operations notes for v41 upgrade.
* fix(security): block DuckDB read_*/http_*/glob functions in usage_ask validator + symlink escape guard in session zip + clarify mark-private semantics
* fix(admin): parquet export tempfile cleanup on COPY failure + correct processed-first sort on /admin/users/<id>/sessions
* feat(audit): close 8 production audit gaps — query (local/remote/hybrid), catalog/schema/sample, snapshot estimate/create, check-access
* feat(ui): /admin/usage summary dashboard + per-user activity tab on /admin/users/<id>
* fix(audit): cap error messages at 200 chars + audit user_activity reads + recursion guard on usage.summary
* fix(audit): catalog.list audits on error path + clean up deferred json import
* fix(ux): client_kind=cli for PAT auth + timeline empty state + email-instead-of-uuid + nav reorder + help text + loading indicators + ask doc
* feat(observability): unify /admin/activity into single page with saved views
- KPI cards (events, users, error rate, p95) clickable as quick-filters
- Faceted filter dropdowns populated from audit_log in the current window
- Sortable audit table, cursor pagination, per-row JSON side panel
- Saved views (schema v43: user_observability_views) — per-user state
- Top bar: window selector + 30s Live toggle + saved views dropdown
- /admin/scheduler-runs → 308 redirect (source=scheduler filter)
- New endpoints: /api/admin/observability/{facets,kpis,views}
* test: update activity + scheduler-runs tests for unified page
- test_admin_activity_page_renders asserts new structural anchors
- test_admin_scheduler_runs_page_admin_only asserts 308 redirect
* fix(observability): respect [hidden] on modal + side panel
CSS `display: flex` on .obs-modal beat the [hidden] attribute's UA
display:none, so the save-view modal rendered on page load and Cancel
clicks couldn't dismiss it. Gate the modal's flex layout on
:not([hidden]); add the same display:none guard prophylactically to
.obs-panel and .obs-views-panel.
* feat(observability): user enrichment in audit + interactive /admin/usage
Activity:
- /api/admin/activity now joins users for user_email + user_name per row
- User column renders "name (id-prefix)" or "email (id-prefix)" instead
of an opaque truncated UUID; falls back to id when the user record is
missing
Usage:
- /admin/usage rewritten as the same filter/group-by/search pattern as
/admin/activity. Faceted dropdowns (User / Tool / Source / Event type)
populated from usage_events; debounced free-text search across
tool_name / skill_name / subagent_type / command_name
- New endpoints /api/admin/usage/{facets,kpis,query}; the query endpoint
supports group_by in {day, username, tool_name, source, ref_id} with
sort + offset pagination, plus an ungrouped raw-events mode
- 4 KPI cards (events, distinct users, distinct tools, error rate) are
clickable quick-filters; clicking a grouped row applies the bucket as
a filter
- Old static `?window=7d|30d|all` server preload removed; all state is
client-side via since_minutes + group_by + filters in the URL
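Allowlisting `group_by` keeps caller-controlled text out of the grouped SQL. The sketch assumes a `usage_events` table with the listed columns. `build_usage_query`, the `_SORT` keys and the DuckDB-style `day` expression are hypothetical, and filters are left out for brevity. Only LIMIT/OFFSET are bound parameters; every identifier comes from one of the two dicts.

```python
from typing import List, Optional, Tuple

# Allowlisted group-by keys mapped to SQL expressions (DuckDB-style CAST for "day").
_GROUP_BY = {
    "day": "CAST(timestamp AS DATE)",
    "username": "username",
    "tool_name": "tool_name",
    "source": "source",
    "ref_id": "ref_id",
}
_SORT = {"count": "n", "key": "bucket"}  # hypothetical sort keys


def build_usage_query(group_by: Optional[str], sort: str = "count", descending: bool = True,
                      limit: int = 50, offset: int = 0) -> Tuple[str, List[int]]:
    if group_by is None:
        # Ungrouped raw-events mode.
        return ("SELECT * FROM usage_events ORDER BY timestamp DESC LIMIT ? OFFSET ?",
                [limit, offset])
    if group_by not in _GROUP_BY:
        raise ValueError(f"unsupported group_by: {group_by!r}")
    if sort not in _SORT:
        raise ValueError(f"unsupported sort: {sort!r}")
    direction = "DESC" if descending else "ASC"
    sql = (
        f"SELECT {_GROUP_BY[group_by]} AS bucket, COUNT(*) AS n FROM usage_events "
        f"GROUP BY 1 ORDER BY {_SORT[sort]} {direction} LIMIT ? OFFSET ?"
    )
    return sql, [limit, offset]
```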
* fix(observability): clearer labels, all-column sort, drop saved views UI
- Rename page titles: "Activity" → "Server activity", "Usage" → "Tool usage"
with a one-line subtitle on each explaining what the page covers and
linking the other one. The two pages source different data (audit_log
vs usage_events) and the previous labels conflated them.
- Drop the saved-views dropdown + save modal from /admin/activity. The
modal pop-open bug was the trigger; the value wasn't there yet. The
/api/admin/observability/views CRUD + DuckDB table stay in place.
- Rename "Live (30s)" to "Auto-refresh (30s)" with a tooltip clarifying
that it's the re-fetch rate, not the time range. Time range now
labeled "Time range" instead of "Window".
- All audit-table columns are sortable (User, Source, Action, Resource,
Result added); sort is page-local with a Jinja comment explaining the
trade-off. Same for raw usage rows.
- Fix duplicate sort-arrow bug — the literal "▼" in the Time th HTML was
rendering alongside the CSS ::before arrow. Removed the literal; CSS
is the single source of truth.
* feat(observability): global Sessions browser + transcript viewer + CLI
Web:
- /admin/sessions — list every collected session JSONL across all users
with time-range, user, model, errors-only and free-text filters. Default
sort surfaces error-heavy sessions first. KPI cards (sessions, distinct
users, sessions w/ errors, tool error rate) clickable as quick-filters.
- /admin/sessions/<username>/<file> — transcript viewer rendering the
JSONL chronologically: user prompts, assistant text, tool calls (with
JSON input) and tool results (with flattened output). Errors get a red
border + chip and a "Next error" navigation button at the top.
- Admin dropdown gains a "Sessions" link.
API:
- GET /api/admin/sessions/{list,kpis,facets} — filtered cross-user reads
off usage_session_summary
- GET /api/admin/sessions/{username}/{file}/transcript — parses JSONL via
the existing services.session_pipeline.lib, returns chronological events
- GET /api/admin/sessions/{username}/{file}/download — JSONL stream, same
path-safety guards as the per-user endpoint, audit-logged
CLI:
- `agnes admin sessions list [--user X] [--errors] [--since 7d]` — table
output with `!` prefix on rows that hit a tool error
- `agnes admin sessions show <username> <file>` — transcript dump, with
`--errors` to print only the failed tool_result blocks
- `agnes admin sessions download <username> <file> [-o path]`
- `agnes admin sessions kpis` — top-level numbers
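This sketch shows the work behind `--errors`, under some assumptions about the JSONL shape. Each line is taken to be a record with `timestamp` and `message.content`, and failed tool calls are taken to appear as `tool_result` blocks carrying `is_error: true`. That mirrors Claude Code transcripts but is not copied from `services.session_pipeline.lib`. `failed_tool_results` is a hypothetical name. Lines that fail to parse are skipped, so a truncated final line does not abort the dump.

```python
import json
from typing import Iterable, List, Optional, Tuple


def _flatten(content) -> str:
    # tool_result content may be a plain string or a list of {"type": "text", ...} blocks.
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "\n".join(b.get("text", "") for b in content if isinstance(b, dict))
    return ""


def failed_tool_results(lines: Iterable[str]) -> List[Tuple[Optional[str], Optional[str], str]]:
    records = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # e.g. a half-written last line from a live session
        if isinstance(parsed, dict):
            records.append(parsed)
    records.sort(key=lambda rec: rec.get("timestamp") or "")  # chronological order
    failures = []
    for rec in records:
        content = (rec.get("message") or {}).get("content")
        if not isinstance(content, list):
            continue  # plain-text prompts carry no tool blocks
        for block in content:
            if (isinstance(block, dict) and block.get("type") == "tool_result"
                    and block.get("is_error")):
                failures.append((rec.get("timestamp"), block.get("tool_use_id"),
                                 _flatten(block.get("content"))))
    return failures
```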
* feat(internal): expose telemetry tables to agnes query with row-level RBAC
Three new registered tables backed by system.duckdb, queryable through
the same /api/query plumbing analysts use for Keboola / BigQuery /
local sources:
agnes_sessions → usage_session_summary (filter: username)
agnes_usage → usage_events (filter: username)
agnes_audit → audit_log (filter: user_id)
RBAC is per-row, not per-table: admins see every user's rows; non-admins
see only their own. The filter is built server-side from the auth user
dict; non-admin filter values are regex-validated before SQL interpolation.
Implementation:
- new connector connectors/internal/ with access (filter+exec) + registry
(idempotent table_registry seed at startup)
- /api/query detects internal table refs and short-circuits to a CTE
wrapper that prepends "WITH agnes_x AS (SELECT * FROM <src> WHERE …),
…" then "SELECT * FROM (<user_sql>) AS _q". DuckDB cursor on the
shared system.duckdb handle — opening parallel handles / ATTACH on the
same file is blocked process-wide.
- mixing internal + BQ / registered local tables in one SELECT is
rejected (v1 limitation)
- src.rbac.can_access_table waves internal tables through for all
authenticated users; row scoping is the actual security control
- /api/v2/schema and /api/v2/sample gained internal branches; sample
intentionally skips its cache because rows are RBAC-scoped per caller
- audit row written as action='query.internal' with is_admin flag
Tests: connectors/internal/access — RBAC, filter clause, schema, CTE
wrapper coexistence with user-supplied aggregations, unsafe-username
rejection. 16/16 passing.
Motivating queries this enables:
SELECT tool_name, COUNT(*) FROM agnes_usage
WHERE is_error GROUP BY 1 ORDER BY 2 DESC
-- analyst self-introspection: which tools fail for me?
SELECT user_id, COUNT(*) FROM agnes_audit
WHERE action = 'session.transcript_view' GROUP BY 1
-- admin: who's been looking at whose session transcripts?
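Here is a sketch of the CTE wrapper. It uses stdlib `sqlite3` as the engine and a hypothetical alias map; `wrap_internal_query` is illustrative and is not the code in `connectors/internal/`. Non-admin filter values must pass a character allowlist before interpolation, so they can never contain a quote. The wrapper scopes only the aliases. SQL that names the base tables directly still bypasses it, and later commits in this log close that gap.

```python
import re
import sqlite3
from typing import Iterable, Mapping

# Hypothetical alias map: alias -> (physical table, per-row owner column).
INTERNAL_TABLES = {
    "agnes_sessions": ("usage_session_summary", "username"),
    "agnes_usage": ("usage_events", "username"),
    "agnes_audit": ("audit_log", "user_id"),
}
# No quotes, spaces or semicolons can pass, so interpolating the value is safe.
_SAFE_VALUE_RE = re.compile(r"[A-Za-z0-9._@+-]+")


def wrap_internal_query(user_sql: str, refs: Iterable[str],
                        user: Mapping[str, str], is_admin: bool) -> str:
    refs = list(refs)
    if not refs:
        raise ValueError("no internal table referenced")
    ctes = []
    for alias in refs:
        source, owner_col = INTERNAL_TABLES[alias]
        where = ""
        if not is_admin:
            value = user["username"] if owner_col == "username" else user["id"]
            if not _SAFE_VALUE_RE.fullmatch(value):
                raise ValueError(f"unsafe filter value for {alias}")
            where = f" WHERE {owner_col} = '{value}'"
        ctes.append(f"{alias} AS (SELECT * FROM {source}{where})")
    # The user's query runs unchanged; its FROM agnes_x now resolves to the scoped CTE.
    return f"WITH {', '.join(ctes)} SELECT * FROM ({user_sql}) AS _q"
```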
* feat(admin): group dropdown into 5 named sections + internal tables in /catalog
Admin dropdown gains section headers so admins can land on the right
page without re-reading the full menu:
Activity Center Server activity / Tool usage / Sessions
Users & Access Users / Groups / Resource access / Tokens
Data Tables
Agent Experience Curated Marketplaces / Flea Submissions /
Agent Setup Prompt / Agent Workspace Prompt
Server Server config
"Agent Experience" frames the curated content + prompts as one cluster
— it's all admin-controlled material that shapes what an analyst's AI
agent encounters. "Configuration" → "Server" since only one item lives
there now.
Renamed the section's first two items:
"Activity" → "Server activity" (matches page H1)
"Usage" → "Tool usage"
Also fixes /catalog visibility of the internal tables (agnes_sessions /
_usage / _audit) for non-admin users: ``app.auth.access.can_access``
short-circuits to True for resource_type='table' + an internal-table id.
Without this, non-admins saw the tables in /api/v2/catalog (which uses
the same RBAC bypass) but not on the /catalog HTML page (which calls
can_access directly, requiring a resource_grants row internal tables
don't have).
CSS for `.app-nav-menu-section`: small caps, muted, non-clickable; first
section trims top padding so the panel doesn't open with an awkward gap.
* refactor(admin): move corporate memory into Admin > Agent Experience
Memory link was the only admin-only entry in the primary nav (gated by
session.user.is_admin). Moves it into the Admin dropdown under Agent
Experience, alongside Curated Marketplaces / Flea Submissions / Prompts
— all admin-curated content that shapes what an analyst's AI agent
encounters.
Renamed the nav label to "Shared Knowledge" to match what the page
actually is (admin-curated organisational knowledge from session
verification, surfaced to agents). URL stays at /corporate-memory; the
route still gates on require_admin per the existing comment.
Side effect: primary nav (Home / Marketplace / Data Packages) is now
uniform for every authenticated user — no conditional admin-only entry.
* ui: rename admin entries to Curated Knowledge / Init Prompt / Workspace Prompt
- "Shared Knowledge" → "Curated Knowledge" (parallel with "Curated
Marketplaces" in the same Agent Experience section; "curated" tells
the admin what they do there — review + approve)
- "Agent Setup Prompt" → "Init Prompt" (matches the `agnes init` flow
it actually drives)
- "Agent Workspace Prompt" → "Workspace Prompt" (the "Agent" prefix
was redundant — every item in the section is agent-facing)
Renames page titles + H1s on /admin/agent-prompt and
/admin/workspace-prompt to match.
* refactor: rename Usage → Telemetry across user-facing surfaces
External surfaces all switch; internal Python module / file names and the
physical DB tables (usage_events, usage_session_summary, usage_tool_daily,
usage_plugin_daily) stay — renaming them would force a schema migration
+ a redo of the LLM Text-to-SQL prompt for no analyst-visible win.
Changes:
- Admin dropdown: "Tool usage" → "Telemetry"
- Page H1 / <title>: same
- URL: /admin/usage → /admin/telemetry; old URL 308-redirects
- API prefix: /api/admin/usage/* → /api/admin/telemetry/*
- CLI: primary command `agnes admin telemetry …`; `agnes admin usage` kept
as a deprecated alias so existing operator scripts keep working
- Internal data-source table id: agnes_usage → agnes_telemetry. The
registry seed now evicts any stale internal-source row whose id no
longer matches INTERNAL_TABLES, so the old `agnes_usage` row is
removed from table_registry on next app boot
- All tests + JS endpoint paths updated
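A minimal sketch of seeding with an upsert followed by an eviction step. `sqlite3` stands in for DuckDB (both accept `ON CONFLICT (id) DO UPDATE`), and the three-column `table_registry` shape is an assumption. Running this on every boot refreshes labels in place. It also deletes internal rows whose id has left the roster, such as the old `agnes_usage`, and leaves rows from other source types alone.

```python
import sqlite3

# Hypothetical seed roster: id -> display name.
INTERNAL_TABLES = {
    "agnes_sessions": "Agnes sessions",
    "agnes_telemetry": "Agnes telemetry events",
    "agnes_audit": "Agnes audit log",
}


def seed_internal_tables(conn: sqlite3.Connection) -> None:
    with conn:  # upsert and eviction commit together
        conn.executemany(
            "INSERT INTO table_registry (id, name, source_type) VALUES (?, ?, 'internal') "
            "ON CONFLICT (id) DO UPDATE SET name = excluded.name, source_type = excluded.source_type",
            list(INTERNAL_TABLES.items()),
        )
        placeholders = ", ".join("?" for _ in INTERNAL_TABLES)
        # Evict stale internal rows only; other source types are untouched.
        conn.execute(
            f"DELETE FROM table_registry WHERE source_type = 'internal' AND id NOT IN ({placeholders})",
            list(INTERNAL_TABLES),
        )
```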
* test(rbac): include auto-appended internal tables in expectations
get_accessible_tables now appends agnes_sessions / agnes_telemetry /
agnes_audit to every authenticated user's accessible-tables list so the
internal data source shows up in /catalog. The two existing rbac tests
asserted hardcoded list shapes that pre-dated the change.
Rewritten to assert "granted tables + the canonical internal-table set"
instead of literal lists, so the test stays correct if the internal
table roster changes again later.
* ui: visual dividers between admin-dropdown sections
Adds a 1px top border + 6px top margin to every section header except
the first, so the five named groups (Activity Center, Users & Access,
Data, Agent Experience, Server) read as visually separated clusters.
The header itself stays small-caps + muted as before — the border is
additive.
* ui(memory): match obs-topbar visual on /corporate-memory
The Curated Knowledge page (linked from the admin dropdown's Agent
Experience section) opened straight into the stats bar — no title,
no subtitle, no shared chrome with the other admin pages. Adds an
obs-topbar-style header at the top of .container-memory:
- H1 "Curated Knowledge"
- subtitle explaining what the page is + how AI agents pull from it
The `.ck-*` class set duplicates the inline obs-* styles from
/admin/activity etc. for this one page; promoting the obs-* class set
to style-custom.css for shared reuse is the obvious next step (4 pages
already inline the same CSS), tracked as a follow-up.
Page <title> also renamed from "Corporate Memory" → "Curated Knowledge".
* ui(tables): list Agnes internal tables in /admin/tables + group in /catalog
/admin/tables previously rendered three per-source-type listings
(BQ / Keboola / Jira) and dropped any row whose source_type didn't
match — so the agnes_sessions / agnes_telemetry / agnes_audit rows
seeded into table_registry were invisible. Adds a fourth read-only
section "Agnes internal tables" that filters source_type === 'internal'
and renders the same registry-table layout the other sections use,
with two changes:
- no Register button (these rows are seeded on every app boot from
connectors/internal/registry.py)
- Edit + Delete actions hidden (any change would be reverted on the
next start). Manage access stays so admins can still inspect.
Mode badge picks up a new mode-internal CSS class (teal accent) so the
display doesn't lie and call it "local".
In /catalog, internal tables now group under an "agnes" accordion
section (bucket="agnes" on seed) instead of falling into the catch-all
"default". Single source of truth for which tables exist; admins find
them where they expect.
* ui(tables): Agnes internal as a 4th tab next to BQ/Keboola/Jira
Previous iteration mounted the internal-table listing as a separate
standalone card under the tab strip. Reshapes it to a proper
tab-content section so admins switch between data sources via one
consistent nav (BigQuery / Keboola / Jira / Agnes internal).
- New tab button "Agnes internal" in the tab-nav.
- The listing card becomes <section id="tab-content-internal"
class="tab-content">; switchTab() already routes by id so no JS
change beyond extending the hash allowlist for direct #internal
links.
- Tab content keeps the read-only treatment from the previous commit
(no Register button, no Edit / Delete in renderRegistryListing).
* ui: rename Curated Knowledge → Curated Memory
Settles the naming back on "Curated Memory" — parallel structure with
"Curated Marketplaces" in the same Agent Experience section, and zero
rename ripple: URL (/corporate-memory), API (/api/memory/*), CLI
(agnes admin memory), and Python modules all stay on "memory" so the
admin label finally lines up with the underlying surfaces.
The "Curated" prefix still tells admins what they do on the page
(review pending → approve / mandate / reject) and reads as a sibling
of "Curated Marketplaces" right next to it in the dropdown.
Touches: admin dropdown label, page <title>, page H1. DB tables stay
on knowledge_* (already the canonical naming for the data shape).
* ui: rename "Server activity" → "Audit log"
"Audit log" is what the page actually is — server-side audit_log table
rendered with KPI cards + filter bar + sortable table. The "Server
activity" label confused the term with Claude Code session telemetry
(Telemetry page) and didn't make the source/concept clear.
Touches:
- Admin dropdown nav label
- /admin/activity page H1 + subtitle
- /admin/telemetry subtitle cross-link
- test_activity_api page-renders assertion
URL (/admin/activity) and API (/api/admin/activity/*) stay — the
"activity" name has stuck at the route layer for a year; rerouting
those would churn dashboards/bookmarks for zero analyst-visible win.
* ui(admin-nav): gray band on each section header for clearer separation
Previous iteration used a 1px top border between section labels — the
labels still blended into the items above/below at a glance. Switches
to a light gray background band per section header, extended edge-to-
edge inside the panel via negative horizontal margins. Bolder
font-weight (700) reinforces the separation; bumping the font color
isn't needed because the band itself does the work.
First section's header tucks into the panel's top border-radius so the
band reaches the corners without a gap.
* ui(catalog): rename internal-table category to "Agnes Internal"
`bucket` is what /catalog renders as the accordion category header
verbatim — "agnes" lowercase didn't read as a real category name and
got confused with a system identifier. Bumps to "Agnes Internal".
Seed re-applies on every app boot so existing rows pick up the new
bucket value via `ON CONFLICT (id) DO UPDATE`.
* ui(catalog): split Agnes Internal into its own card on /catalog
Previously the three internal tables landed inside the "Core Business
Data" card under an "Agnes Internal" accordion alongside Keboola / BQ
buckets — readers conflated system telemetry with business datasets,
and the data_stats header counter ("3 tables · ~X rows total") only
ever counted synced rows so internal tables looked invisible.
Split the catalog page into two cards:
- Core Business Data: only non-internal source_types (Keboola, BQ,
Jira). Accordions group by bucket as before. Stats counter reflects
this card's tables.
- Agnes Internal: a dedicated card with its own visual treatment
(teal accent matching the mode-internal badge in /admin/tables).
Flat list (no accordion — only 3 rows, never grows here), each
row carries the canonical `agnes query` snippet. Read-only — no
profiler click, no In-stack toggle, no sync metadata.
Route adds `internal_card` context object; template renders the new
card only when it's non-None.
* fix(rbac): hide internal tables from /admin/access + drop "my" framing
Two related cleanups for the Agnes-internal tables:
1. /admin/access (resource grants) no longer lists them. The
`can_access` check has a hardcoded internal-table bypass — security
is row-level (per-request view filter), so a table-grain
`resource_grants` row would do nothing. Surfacing them in the UI
let admins set up grants that silently no-op. Filter at the
`_table_blocks` projection so the UI tree never sees them.
2. Display names drop the analyst-perspective "my" framing:
"Agnes — my sessions" → "Agnes sessions"
"Agnes — my telemetry events" → "Agnes telemetry events"
"Agnes — my audit log" → "Agnes audit log"
The "my" only makes sense from the querying analyst's seat
(`SELECT … FROM agnes_sessions` returns *their* rows); on /admin/*
pages where admin sees / configures them across users, the
pronoun was misleading. Description text now spells out the
row-level RBAC contract explicitly.
Display names update via TableRegistryRepository.register's ON CONFLICT
UPDATE on next app boot; no manual cleanup needed.
* ui: subtitle notes about agnes_* tables on each Activity Center page
The recursive observability story — Agnes serves its own audit /
telemetry / session data through the same `agnes query` plumbing
analysts use for business data — wasn't surfaced anywhere on the
admin pages that show that data. Three pages get a one-liner with
the canonical `agnes query` snippet + the RBAC contract (analysts
see their own rows, admin sees all):
- /admin/activity (Audit log) → agnes_audit
- /admin/telemetry (Tool usage) → agnes_telemetry
- /admin/sessions → agnes_sessions
Sets up the discovery moment for admins: they're reading the page,
they see "you can query this from Claude Code", they remember it
when an analyst asks "how do I find my own failed tool calls?".
* ui(tables): explain "Show log" empty-state on /admin/tables
Cache warmup log <pre> renders with a dark background and is only
populated by the SSE stream during a Re-warm all run. Opening the
page cold + clicking Show log just revealed a black bar with no
context — admins couldn't tell what they were looking at.
Adds an inline paragraph above the <pre> explaining what the log is,
the row format, when it fills in, and where to find the historical
audit trail (/admin/activity). The actual <pre> stays empty until
SSE events arrive, but the surrounding copy carries the meaning.
* ui(tables): auto-open cache-warmup log on Re-warm all click
A Re-warm all run takes ~24s per remote BQ row. With the <details>
collapsed by default, operators saw the button disable, watched a
quiet ~24s pass, and assumed nothing had happened — the streaming
log was hidden behind a closed disclosure.
Two small JS tweaks:
- cacheWarmupRun() opens the details on click, so streamed lines
appear without an extra interaction
- cacheWarmupOnStart() hides the inline hint paragraph the moment
real log content lands, so the dark log block isn't competing
with redundant context
Hint paragraph also clarifies that only `query_mode='remote'` BQ
rows are warmed — operators with only materialized/internal tables
would see total=0 and the page would "do nothing" by spec.
* ui: trim Agnes internal copy across surfaces
Descriptions had grown to explain the extraction pipeline ("parsed
out of session JSONLs"), the underlying table ("Backed by
usage_session_summary"), the RBAC mechanic ("row-level RBAC at query
time — analysts see their own; admin sees all"), and the SQL snippet.
Every implementation detail meant another rewrite on the next iter.
Strips to one stable line per surface: what the data is, plus
"Also available locally for analysis". Mechanics live in code +
docs; the page copy says what the user needs to know.
Touched:
- connectors/internal/access.py: INTERNAL_TABLES descriptions
- activity_center.html / admin_usage.html / admin_sessions.html
subtitles
- catalog.html Agnes Internal card description + row strip
- admin_tables.html "Agnes internal" tab hint
* fix(internal): is_user_admin arity bugs + `+` in username regex + saved-view payload cap
Round-1 code review (PR #278) caught two blocking bugs and three nits.
Blocking — both `is_user_admin(user)` (single dict arg) calls raised
TypeError. is_user_admin signature is `(user_id, conn)`. Affected:
- app/api/query.py:_run_internal_query — every POST /api/query that
references agnes_sessions / agnes_telemetry / agnes_audit blew up
with a 500. The headline analyst-facing feature of this PR was
unusable through the API.
- app/api/v2_sample.py — same shape; `GET /api/v2/sample/agnes_*`
returned 500.
Both fixed to call `is_user_admin(user.get("id"), conn)`. Added two
FastAPI-level tests in test_internal_data_source.py that go through
the TestClient — the existing unit tests on `execute_internal_query`
and `build_filter_clause` skipped the request-handler layer where the
bugs lived, which is why this landed.
Nits also closed:
- connectors/internal/access.py: `+` allowed in _USERNAME_RE /
_USER_ID_RE so RFC 5321 email local-parts (alice+test@x) resolve
correctly without hitting InternalAccessError.
- app/api/observability.py: saved-view payload capped at 64 KiB to
prevent an admin from bloating system.duckdb with a malformed save.
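The two nits, sketched. The pattern, the length bound and the function names are hypothetical and are not the shipped `_USERNAME_RE` or endpoint code. `fullmatch` anchors the whole value, so a trailing newline cannot slip through. The cap is measured on the UTF-8 encoded JSON, which approximates what gets stored.

```python
import json
import re
from typing import Any

# Hypothetical pattern: '+' admitted so RFC 5321 local-parts like alice+test@x pass.
_USERNAME_RE = re.compile(r"[A-Za-z0-9._@+-]{1,254}")
MAX_VIEW_PAYLOAD_BYTES = 64 * 1024  # 64 KiB


def validate_username(value: str) -> str:
    # fullmatch anchors both ends, unlike a bare search or an unanchored match.
    if not _USERNAME_RE.fullmatch(value):
        raise ValueError(f"unsafe username: {value!r}")
    return value


def check_view_payload(payload: Any) -> bytes:
    encoded = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    if len(encoded) > MAX_VIEW_PAYLOAD_BYTES:
        raise ValueError(
            f"saved view is {len(encoded)} bytes; limit is {MAX_VIEW_PAYLOAD_BYTES}"
        )
    return encoded
```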
* fix(security): close non-admin data-leak via underlying-table refs
PR #278 R2 review surfaced a non-admin-exploitable bypass: SQL whose
string literal contains 'agnes_sessions' routed into the privileged
internal-query path, then queried the underlying physical table
(usage_session_summary / usage_events / audit_log) directly, escaping
the CTE wrapper's row filter. Two reinforcing defenses, plus regression tests:
1. find_internal_refs() now strips single-quoted string literals
before scanning for alias names — a literal alone no longer
routes the request into the privileged code path.
2. execute_internal_query() rejects non-admin SQL that references
the underlying physical tables (usage_*, audit_log). The CTE
wrapper only scopes the agnes_* aliases; a direct FROM on the
base table — or a shadowing inner WITH that still has to read
the base table — bypasses RBAC. Block before execution with an
actionable error pointing to the agnes_* alias. Admins are
unaffected (god-mode short-circuit on the filter clause).
3. tests/test_internal_data_source.py — three new negative tests
covering literal-only matches, direct-table refs, and CTE
shadow attempts.
Also tightens usage_ask.py's SELECT-only validator: pragma_table_info,
pragma_storage_info, pragma_database_*, and duckdb_tables / columns /
views / indexes / schemas are reflection functions that leak metadata
the analyst question shouldn't reach. \bPRAGMA\b in _FORBIDDEN never
matched the function-call form: `A` and `_` are both word characters,
so there is no word boundary between them.
* fix(security): dynamic denylist for non-admin internal queries
R3 review (PR #278) caught a wider data-leak than R2: the underlying-
physical-table guard listed only the 7 usage_* + audit_log tables,
but system.duckdb has 30+ other sensitive tables — users (emails +
ids), personal_access_tokens, resource_grants, user_groups,
user_observability_views, store_*, marketplace_*, knowledge_*, etc.
A non-admin SQL like
SELECT * FROM agnes_sessions
UNION ALL SELECT email, id, … FROM users LIMIT 1
would leak every user's row.
Replaces the hardcoded denylist with a **dynamic allowlist**:
non-admin SQL may reference ONLY the registered agnes_* aliases.
Every other table in `information_schema.tables` (main schema) is
rejected, so the effective denylist is derived from the live schema.
Future migrations that add a new sensitive table are
automatically covered without re-editing this module.
Also strips SQL comments (`/* */` and `--`) before the identifier
scan so a comment-wrapped table name (`/**/users/**/`) can't slip
past the regex.
Four new negative tests pin: `users`, `personal_access_tokens`,
block-comment wrap, line-comment wrap.
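A minimal sketch of the allowlist check. It assumes the caller passes
in the main-schema table names (in the real module those come from
information_schema.tables) and the registered alias set. The names
scrub_sql and forbidden_tables are hypothetical. String literals are
blanked and comments are turned into whitespace in one left-to-right
pass, so a `--` inside a string can't hide the rest of the line. Every
remaining identifier is then checked, so the check fails closed. Because
this sketch checks every identifier rather than only the word after
FROM/JOIN, a comment-wrapped name is caught either way. Stripping up
front matters most for context-sensitive patterns.

```python
import re

# One pass over the SQL, left to right: the first alternative that
# matches at a position wins, so a `--` or `/*` inside a quoted string
# is never mistaken for a comment.
_LEXEME = re.compile(
    r"'(?:[^']|'')*'"  # single-quoted string literal ('' escapes a quote)
    r'|"(?:[^"]|"")*"'  # double-quoted identifier
    r"|/\*.*?\*/"  # block comment
    r"|--[^\n]*",  # line comment
    re.DOTALL,
)
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _scrub(match: re.Match) -> str:
    token = match.group(0)
    if token.startswith("'"):
        return "''"  # literal contents are data, not table names
    if token.startswith('"'):
        return token  # quoted identifiers can name tables: keep them
    return " "  # comments become whitespace so neighbours don't fuse


def scrub_sql(sql: str) -> str:
    return _LEXEME.sub(_scrub, sql)


def forbidden_tables(sql: str, all_tables: set[str], allowed: set[str]) -> set[str]:
    """Return the non-allowlisted table names a query mentions.

    all_tables: main-schema table names (stand-in for information_schema.tables).
    allowed: the registered agnes_* alias names.
    """
    denied = {t.lower() for t in all_tables} - {a.lower() for a in allowed}
    idents = {tok.lower() for tok in _IDENT.findall(scrub_sql(sql))}
    return idents & denied
```

The trade-off is false positives: a column that shares a name with a
system table is rejected too. For a guard in front of system.duckdb,
that is the safer direction to err.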
Plus: per-user view-count cap (100) on /api/admin/observability/views
so an admin can't fill system.duckdb with thousands of saved views.
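A minimal sketch of a per-user cap. It uses an in-memory SQLite
database as a stand-in for system.duckdb. The table name is borrowed
from the list above on the assumption that it holds the saved views.
The columns user_id and name, the function create_view, and
ViewLimitExceeded are hypothetical. Only the cap of 100 comes from this
log.

```python
import sqlite3

MAX_VIEWS_PER_USER = 100  # the cap named in the commit


class ViewLimitExceeded(Exception):
    """Raised when a user already owns the maximum number of saved views."""


def create_view(conn: sqlite3.Connection, user_id: str, name: str,
                max_views: int = MAX_VIEWS_PER_USER) -> None:
    # Count first, then insert. This check is not atomic: two concurrent
    # requests could both pass it before either insert lands.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM user_observability_views WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    if count >= max_views:
        raise ViewLimitExceeded(f"user {user_id} already has {count} saved views")
    conn.execute(
        "INSERT INTO user_observability_views (user_id, name) VALUES (?, ?)",
        (user_id, name),
    )
```

The cap is counted per user, so one admin hitting the limit does not
block another admin from saving views.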
* release: 0.54.0 — Activity Center + Telemetry + Sessions + internal datasource
Cuts the work shipped across this PR (Activity Center build, recursive
internal data source) into a versioned release. Bumps pyproject.toml
to 0.54.0; renames the top of CHANGELOG.md from [Unreleased] to
[0.54.0] — 2026-05-12 with a header summary; opens a fresh
[Unreleased] section for the next round.
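The two release-cut file edits can be sketched as pure string
transforms. This assumes a Keep a Changelog style "## [Unreleased]"
heading and a single top-level `version = "..."` line in
pyproject.toml. Both function names are hypothetical. The sketch uses
the " - " separator between version and date; match whatever the
project's existing headings use. The header summary is not modelled.

```python
import re


def bump_pyproject_version(text: str, version: str) -> str:
    """Rewrite the first `version = "..."` line (assumed to be [project]'s)."""
    new_text, n = re.subn(
        r'(?m)^version\s*=\s*"[^"]*"', f'version = "{version}"', text, count=1
    )
    if n != 1:
        raise ValueError("no version line found in pyproject.toml")
    return new_text


def cut_changelog(text: str, version: str, date: str) -> str:
    """Turn [Unreleased] into a dated release and open a fresh [Unreleased]."""
    heading = "## [Unreleased]"
    if heading not in text:
        raise ValueError("CHANGELOG has no [Unreleased] section to cut")
    # Everything under the old heading now sits under the release
    # heading; the new, empty [Unreleased] goes directly above it.
    return text.replace(heading, f"{heading}\n\n## [{version}] - {date}", 1)
```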
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
822 lines
38 KiB
Python
"""FastAPI main application — unified server for web UI + API."""

# Silence authlib's internal forward-compat note. Authlib emits an
# AuthlibDeprecationWarning from its own _joserfc_helpers when our
# `from authlib.integrations.starlette_client import OAuth` import
# touches `authlib.jose` paths. The warning is upstream-internal — it's
# telling authlib to migrate to joserfc before its 2.0; it's not
# actionable on our side until either authlib ships the fix or we
# rewrite OAuth handling on top of joserfc directly. Filtering here
# (before authlib gets imported transitively) keeps `make local-dev`
# stdout clean without hiding warnings from any other package.
import warnings as _warnings
try:
    from authlib.deprecate import AuthlibDeprecationWarning as _AuthlibDepr
    _warnings.filterwarnings("ignore", category=_AuthlibDepr)
except ImportError:
    # authlib too old / class moved — fall back to message-based match
    # so the filter still keeps startup clean.
    _warnings.filterwarnings(
        "ignore",
        message=r"authlib\.jose module is deprecated.*",
    )

import logging
from contextlib import asynccontextmanager
from pathlib import Path
from urllib.parse import quote

import os

# Initialise structured logging BEFORE any module that emits logs at import
# time. setup_logging is idempotent and safe to call once at process start.
from app.logging_config import setup_logging

setup_logging("app")

from app.version import APP_VERSION, MIN_COMPAT_CLI_VERSION

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import RedirectResponse
from fastapi.staticfiles import StaticFiles
from starlette.exceptions import HTTPException as StarletteHTTPException
from starlette.middleware.gzip import GZipMiddleware
from starlette.middleware.sessions import SessionMiddleware
from starlette.types import ASGIApp, Receive, Scope, Send

from app.middleware.request_id import RequestIdMiddleware


class _SelectiveGZipMiddleware:
    """GZipMiddleware wrapper that skips a set of path prefixes.

    Parquet-serving endpoints send responses that are already columnar-
    compressed (parquet's internal codec) and — for /api/data — can reach
    hundreds of MB. Gzipping them on the way out costs CPU and latency with
    no meaningful size reduction. Skip those paths; every other endpoint
    (JSON manifests, HTML previews, install.sh) still gets compressed.
    """

    def __init__(self, app: ASGIApp, minimum_size: int = 1024, skip_prefixes: tuple[str, ...] = ()) -> None:
        # `self.app` is the Starlette middleware convention — outer middleware
        # (e.g. fastapi-debug-toolbar's APIRouter walker) traverses the chain
        # via `.app` to find the inner FastAPI app. Keep `_raw` as the public
        # alias used by our own __call__ for the skip-path branch.
        self.app = app
        self._raw = app
        self._gzip = GZipMiddleware(app, minimum_size=minimum_size)
        self._skip_prefixes = skip_prefixes

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope.get("type") == "http":
            path = scope.get("path", "")
            if any(path.startswith(p) for p in self._skip_prefixes):
                await self._raw(scope, receive, send)
                return
        await self._gzip(scope, receive, send)

from app.auth.rate_limit import (
    SlowAPIMiddleware as _AuthRateLimitMiddleware,
    RateLimitExceeded as _AuthRateLimitExceeded,
    _rate_limit_exceeded_handler as _auth_rate_limit_handler,
    limiter as _auth_rate_limiter,
)
from app.auth.router import router as auth_router
from app.api.health import router as health_router
from app.api.sync import router as sync_router
from app.api.data import router as data_router
from app.api.query import router as query_router
from app.api.users import router as users_router
from app.api.memory import router as memory_router
from app.api.upload import router as upload_router
from app.api.scripts import router as scripts_router
from app.api.settings import router as settings_router
from app.api.catalog import router as catalog_router
from app.api.telegram import router as telegram_router
from app.api.access import router as access_router, me_router as me_access_router
from app.api.me_debug import router as me_debug_router
from app.api.me import router as me_router
from app.api.admin import router as admin_router
from app.api.admin_bigquery_test import router as admin_bigquery_test_router
from app.api.jira_webhooks import router as jira_webhooks_router
from app.api.metrics import router as metrics_router
from app.api.metadata import router as metadata_router
from app.api.query_hybrid import router as query_hybrid_router
from app.api.cli_artifacts import router as cli_artifacts_router
from app.api.tokens import router as tokens_router, admin_router as tokens_admin_router
from app.api.v2_catalog import router as v2_catalog_router
from app.api.v2_schema import router as v2_schema_router
from app.api.v2_sample import router as v2_sample_router
from app.api.v2_scan import router as v2_scan_router
from app.api.marketplaces import router as marketplaces_router
from app.api.store import router as store_router
from app.api.my_stack import router as my_stack_router
from app.api.marketplace import router as marketplace_router
from app.api.welcome import router as welcome_router
from app.api.claude_md import router as claude_md_router
from app.api.news import router as news_router
from app.api.cache_warmup import router as cache_warmup_router
from app.api.bq_metadata_refresh import router as bq_metadata_refresh_router
from app.api.activity import router as activity_router
from app.api.observability import router as observability_router
from app.api.admin_user_sessions import router as admin_user_sessions_router
from app.api.admin_sessions import router as admin_sessions_router
from app.api.admin_usage import router as admin_usage_router
from app.api.admin_usage_summary import router as admin_usage_summary_router
from app.marketplace_server.router import router as marketplace_server_router
from app.marketplace_server.git_router import make_git_wsgi_app
from app.web.router import router as web_router

logger = logging.getLogger(__name__)


@asynccontextmanager
async def lifespan(app):
    # Issue #81 Group A — log the effective remote_attach allowlist at
    # startup so an operator's typo in AGNES_REMOTE_ATTACH_EXTENSIONS
    # (which REPLACES, not extends, the default) is visible.
    try:
        from src.orchestrator_security import log_effective_policy
        log_effective_policy()
    except Exception:
        pass  # never block startup on a logging convenience

    # Bump anyio's default thread pool size from 40 → AGNES_THREADPOOL_SIZE
    # (default 200). FastAPI auto-runs every plain `def` route handler in
    # this pool — the Tier 1 endpoints converted in PR #188 (`/api/query`,
    # `/api/v2/scan`, `/api/v2/sample`, `/api/v2/schema`) all block on
    # synchronous DuckDB / BQ-extension calls inside the handler body and
    # would otherwise serialise once 40 are in flight. 200 keeps the per-
    # process working set well under the BQ extension's connection cap
    # while leaving headroom for concurrent UI / health probes.
    try:
        import anyio.to_thread
        size = int(os.environ.get("AGNES_THREADPOOL_SIZE", "200"))
        anyio.to_thread.current_default_thread_limiter().total_tokens = size
        logger.info("anyio thread pool capacity set to %d", size)
    except Exception as e:
        logger.warning("failed to bump anyio thread pool capacity: %s", e)

    from app.api.cache_warmup import maybe_schedule_startup_warmup
    maybe_schedule_startup_warmup()

    # Sweep stale materialize parquet locks left behind by previous runs
    # that were SIGKILL'd mid-materialize. Lazy reclaim at next acquire
    # already handles correctness, but an active sweep at startup keeps
    # the data directory tidy and gives operators a clear "swept N" log
    # line instead of zombie 0-byte files lingering for days (issue #260).
    try:
        from connectors.bigquery.extractor import sweep_stale_parquet_locks
        from src.db import _get_data_dir as _ddir
        sweep_stale_parquet_locks(_ddir() / "extracts")
    except Exception:
        logger.exception("startup parquet-lock sweep failed (non-fatal)")

    # Seed the internal data-source registry rows so `agnes_sessions /
    # agnes_telemetry / agnes_audit` show up in /admin/tables + `agnes
    # catalog` on every fresh install. Idempotent — re-applies canonical
    # name + description on every boot so operators can't drift them
    # away from the seed.
    try:
        from src.db import get_system_db
        from connectors.internal.registry import ensure_internal_tables_registered
        ensure_internal_tables_registered(get_system_db())
    except Exception:
        logger.exception("internal data-source seed failed; continuing")

    # Construct the PostHog client up front so its background flush thread
    # starts before the first request — and so a missing/invalid key fails
    # loud at boot rather than on first capture. No-op when disabled.
    try:
        from src.observability import get_posthog
        pc = get_posthog()
        if pc.enabled:
            logger.info("PostHog observability enabled (host=%s, identify=%s, replay=%s)",
                        pc.host, pc.identify_mode, pc.replay_enabled)
    except Exception:
        logger.exception("PostHog init at startup failed")
    yield
    try:
        from src.observability import get_posthog
        get_posthog().shutdown()
    except Exception:
        logger.exception("PostHog shutdown failed")
    from src.db import close_system_db
    close_system_db()


def _is_truthy_env(name: str) -> bool:
    return os.environ.get(name, "").lower() in ("1", "true", "yes")


# DEBUG turns the toolbar on; LOCAL_DEV_MODE implies it (auth-bypassed dev
# environment is by definition a debugging context — no point in making
# operators set both).
DEBUG = _is_truthy_env("DEBUG") or _is_truthy_env("LOCAL_DEV_MODE")


def _toolbar_show_callback(request, settings) -> bool:
    """Decide whether the debug toolbar shows on a request.

    Replaces the upstream default (which reads `request.app.debug`) — we keep
    `app.debug=False` so our @app.exception_handler(Exception) runs instead of
    Starlette's debug-only ServerErrorMiddleware, but we still want the
    toolbar mounted. Read DEBUG / LOCAL_DEV_MODE env directly so operators who
    flip the env at runtime (rare) see the change without re-import.
    """
    return _is_truthy_env("DEBUG") or _is_truthy_env("LOCAL_DEV_MODE")


def create_app() -> FastAPI:
    app = FastAPI(
        title="AI Data Analyst",
        description="Data distribution platform for AI analytical systems",
        version=APP_VERSION,
        lifespan=lifespan,
        # Intentionally NOT debug=DEBUG: FastAPI's debug=True installs
        # Starlette's ServerErrorMiddleware which intercepts unhandled
        # Exceptions and renders a plain-HTML traceback BEFORE our
        # @app.exception_handler(Exception) can run — robbing the 500 page
        # of its chrome and the debug toolbar. We get the toolbar back via
        # SHOW_TOOLBAR_CALLBACK below (reads DEBUG env directly instead of
        # request.app.debug).
        debug=False,
    )

    @app.middleware("http")
    async def _add_version_headers(request, call_next):
        response = await call_next(request)
        # /api/* only — headers are advisory to the agnes CLI; UI/docs/marketplace
        # traffic doesn't consume them.
        if request.url.path.startswith("/api/"):
            response.headers["X-Agnes-Latest-Version"] = APP_VERSION
            response.headers["X-Agnes-Min-Version"] = MIN_COMPAT_CLI_VERSION
        return response

    # FastAPI debug toolbar — only when DEBUG=1 in env. Injects per-request
    # HTML overlay (headers, routes, timer, profiling, logs) on any HTML
    # response; harmless on JSON. Inner try/except is for the import only:
    # if a developer sets DEBUG=1 without installing dev deps, log a warning
    # instead of crashing. The middleware mount itself fails loud if broken.
    #
    # Mounted FIRST (innermost on response) so it sees the raw HTML BEFORE
    # GZip compresses it — debug_toolbar.middleware decodes response bodies
    # as UTF-8 to inject markup, and a gzipped body fails that decode (the
    # toolbar's own `Accept-Encoding` skip-check reads response headers, not
    # request headers, so it never trips).
    if DEBUG:
        try:
            from debug_toolbar.middleware import DebugToolbarMiddleware
            from jinja2 import FileSystemLoader
            # debug_toolbar.middleware splats **kwargs into DebugToolbarSettings
            # (a pydantic-settings model with case-insensitive UPPERCASE fields).
            # Pass field names as kwargs to add_middleware — `panels` becomes
            # `PANELS`, etc. Do NOT wrap them in a `settings={...}` dict —
            # that hits the model's actual `SETTINGS` field (Sequence[BaseSettings])
            # and fails validation. Field reference:
            # https://github.com/mongkok/fastapi-debug-toolbar/blob/master/debug_toolbar/settings.py
            # ProfilingPanel (pyinstrument) is intentionally omitted: it
            # raises "There is already a profiler running" under uvicorn's
            # async context because pyinstrument's stack sampler can't be
            # nested per task. Re-enable per-developer if you really want it
            # via env override; the rest of the panels are async-safe.
            #
            # JINJA_LOADERS prepends our app/debug/templates so DuckDBPanel
            # can resolve `panels/duckdb.html`. The toolbar's built-in loader
            # (PackageLoader for debug_toolbar/templates) stays appended via
            # ChoiceLoader, so first-party panels still render.
            _debug_templates_dir = Path(__file__).parent / "debug" / "templates"
            _toolbar_settings = dict(
                panels=[
                    "debug_toolbar.panels.headers.HeadersPanel",
                    "debug_toolbar.panels.routes.RoutesPanel",
                    "debug_toolbar.panels.settings.SettingsPanel",
                    "debug_toolbar.panels.versions.VersionsPanel",
                    "debug_toolbar.panels.timer.TimerPanel",
                    "debug_toolbar.panels.logging.LoggingPanel",
                    "app.debug.duckdb_panel.DuckDBPanel",
                ],
                jinja_loaders=[FileSystemLoader(str(_debug_templates_dir))],
                show_toolbar_callback="app.main._toolbar_show_callback",
            )
            # Eagerly register the toolbar's own routes
            # (/_debug_toolbar/render_panel/ + /_debug_toolbar/static mount)
            # NOW, before app.web.router's /{full_path:path} catch-all gets
            # added by include_router(web_router). Otherwise the catch-all
            # swallows the toolbar's own GET requests and the panel scripts
            # render our 404 page. We can't construct DebugToolbarMiddleware
            # directly on the FastAPI app (its `while not isinstance(...,
            # APIRouter): self.router = self.router.app` walk fails — FastAPI
            # has `.router`, not `.app`), so call init_toolbar's body
            # ourselves on the APIRouter directly. add_middleware below still
            # works lazily; init_toolbar's NoMatchFound guard skips re-adding
            # routes when called the second time.
            from debug_toolbar.api import render_panel as _render_panel_view
            from debug_toolbar.middleware import show_toolbar as _show_toolbar
            from debug_toolbar.settings import DebugToolbarSettings
            from fastapi import HTTPException as _HTTPException, status as _status
            from fastapi.staticfiles import StaticFiles as _StaticFiles

            _eager_settings = DebugToolbarSettings(**_toolbar_settings)

            async def _require_show_toolbar(request, call_next=None):
                """Mirror DebugToolbarMiddleware.require_show_toolbar: 404 the
                toolbar API for clients that wouldn't see the toolbar."""
                if not _show_toolbar(request, _eager_settings):
                    raise _HTTPException(status_code=_status.HTTP_404_NOT_FOUND)
                return await _render_panel_view(request)

            app.router.get(
                _eager_settings.API_URL,
                name="debug_toolbar.render_panel",
                include_in_schema=False,
            )(_render_panel_view)
            app.router.mount(
                _eager_settings.STATIC_URL,
                _StaticFiles(packages=["debug_toolbar"]),
                name="debug_toolbar.static",
            )

            app.add_middleware(DebugToolbarMiddleware, **_toolbar_settings)
        except ImportError:
            logger.warning(
                "DEBUG=1 but fastapi-debug-toolbar not installed; toolbar disabled",
            )

    # PostHog HTML snippet injection — must run INSIDE the GZip layer so it
    # sees uncompressed HTML before compression. Starlette runs middleware
    # in reverse-registration order on the response, so registering this
    # before _SelectiveGZipMiddleware places it deeper in the stack and
    # therefore earlier in the response chain. Many of this app's templates
    # are standalone (their own <!DOCTYPE>) and never extend base.html, so
    # a per-template include would miss them; the middleware covers
    # everything in one place. No-op when POSTHOG_API_KEY is unset.
    from app.middleware.posthog_inject import PosthogInjectionMiddleware
    app.add_middleware(PosthogInjectionMiddleware)

    # Compress JSON / HTML responses on the wire. Parquet downloads are
    # excluded — they're already columnar-compressed and re-gzipping them
    # just burns CPU with no size win. minimum_size=1024 keeps tiny
    # responses uncompressed too (cheaper than the header overhead).
    app.add_middleware(
        _SelectiveGZipMiddleware,
        minimum_size=1024,
        skip_prefixes=(
            "/api/data/",
            "/cli/wheel/",
            "/cli/download",
            "/marketplace.git",  # git smart-HTTP is self-chunked; double-gzip bloats
        ),
    )

    # Per-IP rate limiting on auth endpoints (#45). Wired here so the
    # SlowAPIMiddleware sits in the standard middleware chain (above CORS,
    # below GZip — order doesn't affect correctness, only metric/log
    # ordering). The limiter singleton is created at import time in
    # app.auth.rate_limit; we just register state + middleware + handler.
    app.state.limiter = _auth_rate_limiter
    app.add_middleware(_AuthRateLimitMiddleware)
    app.add_exception_handler(_AuthRateLimitExceeded, _auth_rate_limit_handler)

    # Session middleware (required for OAuth state)
    from app.secrets import get_session_secret
    session_secret = get_session_secret()
    if len(session_secret) < 32:
        # Same gate JWT applies (app/auth/jwt.py:_get_secret_key) — keeps the
        # two HMAC surfaces consistent. session_internal_roles + google_groups
        # are trusted off the cookie signature; a weak SESSION_SECRET means
        # those gates are weak too.
        import warnings as _warnings
        _warnings.warn(
            f"SESSION_SECRET is {len(session_secret)} chars — minimum 32 recommended",
            UserWarning, stacklevel=2,
        )
    app.add_middleware(SessionMiddleware, secret_key=session_secret)

    # CORS for CLI and external clients
    cors_origins = os.environ.get("CORS_ORIGINS", "http://localhost:3000,http://localhost:8000").split(",")
    app.add_middleware(
        CORSMiddleware,
        allow_origins=[o.strip() for o in cors_origins],
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )

    # RequestIdMiddleware mounted LAST — Starlette inserts middleware at
    # index 0, so the last add_middleware call ends up OUTERMOST and runs
    # FIRST per request. The request_id ContextVar is set before any
    # downstream middleware or handler runs, and every response gets the
    # x-request-id header.
    app.add_middleware(RequestIdMiddleware)

    # Load .env_overlay (persisted by /api/admin/configure)
    from app.secrets import _state_dir
    _overlay = _state_dir() / ".env_overlay"
    if _overlay.exists():
        for line in _overlay.read_text().splitlines():
            if "=" in line and not line.startswith("#"):
                k, v = line.split("=", 1)
                os.environ.setdefault(k.strip(), v.strip())

    # Load instance config on startup
    try:
        from app.instance_config import load_instance_config
        load_instance_config()
        logger.info("Instance config loaded")
    except Exception as e:
        logger.warning(f"Could not load instance config: {e}")

    # Configure confidence scoring from instance config (corporate_memory.confidence section)
    try:
        from app.instance_config import get_corporate_memory_config
        from services.corporate_memory.confidence import configure as configure_confidence
        cm_config = get_corporate_memory_config()
        if cm_config and "confidence" in cm_config:
            configure_confidence(cm_config["confidence"])
            logger.info("Corporate memory confidence config applied")
    except Exception as e:
        logger.warning(f"Could not configure corporate memory confidence: {e}")

    # Startup banner
    from src.db import SCHEMA_VERSION
    logger.info(
        "Agnes %s | channel: %s | schema v%s",
        os.environ.get("AGNES_VERSION", "dev"),
        os.environ.get("RELEASE_CHANNEL", "dev"),
        SCHEMA_VERSION,
    )

    # LOCAL_DEV_MODE: bypass authentication for local development. DO NOT enable in prod.
    # When on, every protected route auto-logs in as a seeded admin user (default dev@localhost).
    from app.auth.dependencies import (
        is_local_dev_mode, get_local_dev_email, get_local_dev_groups,
    )
    if is_local_dev_mode():
        logger.warning("=" * 60)
        logger.warning("LOCAL_DEV_MODE is ON — authentication is bypassed.")
        logger.warning("All requests auto-authenticate as: %s", get_local_dev_email())
        # Validate + report LOCAL_DEV_GROUPS at startup so a malformed JSON
        # value gets surfaced loudly here instead of silently warning on the
        # first authenticated request. Empty when unset is fine — just say so.
        raw_groups_env = os.environ.get("LOCAL_DEV_GROUPS", "").strip()
        mocked_groups = get_local_dev_groups()
        if raw_groups_env and not mocked_groups:
            logger.warning(
                "LOCAL_DEV_GROUPS is set but produced no valid groups — "
                "check the WARNING above for the parse error.",
            )
        elif mocked_groups:
            logger.warning(
                "LOCAL_DEV_GROUPS: mocking %d group(s) into session: %s",
                len(mocked_groups),
                ", ".join(g["id"] for g in mocked_groups),
            )
        else:
            logger.warning("LOCAL_DEV_GROUPS is unset — session.google_groups will be empty.")
        logger.warning("NEVER enable this in a deployment reachable from the internet.")
        logger.warning("=" * 60)

    # Seed admin user (SEED_ADMIN_EMAIL) and add them to the Admin user_group.
    # Optional SEED_ADMIN_PASSWORD lets the seeded user sign in immediately
    # without going through bootstrap; never overwritten if already set.
    # The Admin/Everyone user_groups themselves are seeded inside
    # _ensure_schema (src.db._seed_system_groups), so this hook only has to
    # handle membership for the seed admin.
    seed_email = os.environ.get("SEED_ADMIN_EMAIL") or (get_local_dev_email() if is_local_dev_mode() else None)
    if seed_email:
        try:
            from src.db import SYSTEM_ADMIN_GROUP, get_system_db
            from src.repositories.user_group_members import UserGroupMembersRepository
            from src.repositories.users import UserRepository
            conn = get_system_db()
            repo = UserRepository(conn)
            seed_password = os.environ.get("SEED_ADMIN_PASSWORD") or None
            password_hash = None
            if seed_password:
                from argon2 import PasswordHasher
                password_hash = PasswordHasher().hash(seed_password)
            existing = repo.get_by_email(seed_email)
            if not existing:
                import uuid
                user_id = str(uuid.uuid4())
                repo.create(
                    id=user_id,
                    email=seed_email,
                    name="Admin",
                    password_hash=password_hash,
                )
                logger.info("Seeded admin user: %s (password=%s)", seed_email, "yes" if password_hash else "no")
            else:
                user_id = existing["id"]
                if password_hash and not existing.get("password_hash"):
                    repo.update(id=user_id, password_hash=password_hash)
                    logger.info("Set password on existing seed admin: %s", seed_email)
            # Make sure the seed admin is actually in the Admin group — this
            # is what gives them admin access in v12. Idempotent.
            admin_group = conn.execute(
                "SELECT id FROM user_groups WHERE name = ?", [SYSTEM_ADMIN_GROUP],
            ).fetchone()
            if admin_group:
                UserGroupMembersRepository(conn).add_member(
                    user_id=user_id,
                    group_id=admin_group[0],
                    source="system_seed",
                    added_by="app.main:seed_admin",
                )
            conn.close()
        except Exception as e:
            logger.warning(f"Could not seed admin: {e}")

    # Seed the synthetic scheduler user when SCHEDULER_API_TOKEN is configured,
    # so the very first cron tick after a fresh deploy already has a valid
    # actor to attribute audit-log entries to. The lazy seed in
    # `app.auth.scheduler_token.get_scheduler_user` covers the case where the
    # secret is rotated mid-life, but doing it here keeps startup observable.
    from app.auth.scheduler_token import get_scheduler_secret
    if get_scheduler_secret():
        try:
            from app.auth.scheduler_token import (
                SCHEDULER_TOKEN_MIN_LENGTH,
                ensure_scheduler_user,
            )
            from src.db import get_system_db
            secret = get_scheduler_secret()
            if len(secret) < SCHEDULER_TOKEN_MIN_LENGTH:
                logger.warning(
                    "SCHEDULER_API_TOKEN is set but only %d chars — auth path"
                    " disabled (minimum %d). Generate a longer secret in .env.",
                    len(secret), SCHEDULER_TOKEN_MIN_LENGTH,
                )
            else:
                conn = get_system_db()
                try:
                    ensure_scheduler_user(conn)
                finally:
                    conn.close()
        except Exception as e:
            logger.warning(f"Could not seed scheduler user: {e}")

    # C8: Warn when no user has a password_hash — bootstrap endpoint is open.
    # This is intentional UX (operator can claim seed admin), but the open
    # window should be visible in startup logs so it's not forgotten.
    if not is_local_dev_mode():
        try:
            from src.db import get_system_db
            from src.repositories.users import UserRepository
            conn = get_system_db()
            repo = UserRepository(conn)
            all_users = repo.list_all()
            has_password = any(u.get("password_hash") for u in all_users)
            if not has_password:
                logger.warning(
                    "No user has a password set — /auth/bootstrap is reachable. "
                    "Claim the seed admin (or set SEED_ADMIN_PASSWORD) to close this window."
                )
            conn.close()
        except Exception:
            pass  # never block startup on a logging convenience

    # Static files
    static_dir = Path(__file__).parent / "web" / "static"
    if static_dir.exists():
        app.mount("/static", StaticFiles(directory=str(static_dir)), name="static")

    # Auth providers (conditional registration)
    from app.auth.providers.google import router as google_auth_router, is_available as google_available
    from app.auth.providers.password import router as password_auth_router
    from app.auth.providers.email import router as email_auth_router, is_available as email_available

    # API routers
    app.include_router(auth_router)
    app.include_router(google_auth_router)
    app.include_router(password_auth_router)
    app.include_router(email_auth_router)  # Always register, check availability per-request
    app.include_router(health_router)
    app.include_router(sync_router)
    app.include_router(data_router)
    app.include_router(query_router)
    app.include_router(users_router)
    app.include_router(memory_router)
    app.include_router(upload_router)
    app.include_router(scripts_router)
    app.include_router(settings_router)
    app.include_router(catalog_router)
    app.include_router(telegram_router)
    app.include_router(admin_router)
    app.include_router(admin_bigquery_test_router)
    app.include_router(access_router)
    app.include_router(me_access_router)
    app.include_router(me_debug_router)
    app.include_router(me_router)
    app.include_router(jira_webhooks_router)
    app.include_router(metrics_router)
    app.include_router(metadata_router)
    app.include_router(query_hybrid_router)
    app.include_router(cli_artifacts_router)
    app.include_router(tokens_router)
    app.include_router(tokens_admin_router)
    app.include_router(v2_catalog_router)
    app.include_router(v2_schema_router)
    app.include_router(v2_sample_router)
    app.include_router(v2_scan_router)
    app.include_router(marketplaces_router)
    app.include_router(store_router)
    app.include_router(my_stack_router)
    app.include_router(marketplace_router)
    app.include_router(welcome_router)
    app.include_router(claude_md_router)
    app.include_router(news_router)
    app.include_router(cache_warmup_router)
    app.include_router(bq_metadata_refresh_router)
    app.include_router(activity_router)
    app.include_router(observability_router)
    app.include_router(admin_user_sessions_router)
    app.include_router(admin_sessions_router)
    app.include_router(admin_usage_router)
    app.include_router(admin_usage_summary_router)
    app.include_router(marketplace_server_router)

    # Git smart-HTTP endpoint for Claude Code: /marketplace.git/*
    # WSGI → ASGI bridge (dulwich is WSGI-native; FastAPI is ASGI).
    from a2wsgi import WSGIMiddleware
    app.mount("/marketplace.git", WSGIMiddleware(make_git_wsgi_app()))

    # Web UI router (must be last — has catch-all routes)
    app.include_router(web_router)

    # Paths served as API responses (JSON / ZIP / git smart-HTTP) — never
    # redirect a 401 here to the HTML login page; clients expect the raw 401.
    _API_PATH_PREFIXES: tuple[str, ...] = (
        "/api/",
        "/auth/",
        "/marketplace.zip",
        "/marketplace.git",
        "/marketplace/",
    )

    _ERROR_TITLES = {
        400: "Bad request",
        401: "Sign-in required",
        403: "Forbidden",
        404: "Page not found",
        405: "Method not allowed",
        408: "Request timeout",
        413: "Payload too large",
        422: "Unprocessable entity",
        429: "Too many requests",
        500: "Server error",
        502: "Bad gateway",
        503: "Service unavailable",
        504: "Gateway timeout",
    }

    def _wants_html(request) -> bool:
        """True when the client looks like a browser (non-API path, explicit html).

        We deliberately do NOT treat ``Accept: */*`` (curl's default) or an
        empty Accept header as wanting HTML. curl-using operators were
        getting JSON error bodies for non-API paths before this PR; matching
        ``*/*`` here would silently flip them to HTML and break tooling that
        parses ``{"detail": "..."}``. A real browser sends
        ``Accept: text/html,application/xhtml+xml,...`` so the explicit
        substring check below covers that case.
        Devin ANALYSIS_0003 on PR #136 review.
        """
        if request.url.path.startswith(_API_PATH_PREFIXES):
            return False
accept = request.headers.get("accept", "")
|
|
return "text/html" in accept
|
|
|
|
    async def _resolve_error_user(request) -> dict | None:
        """Best-effort user resolution for the error page header.

        Mirrors ``app.auth.dependencies.get_optional_user`` precedence
        (LOCAL_DEV_MODE → seeded dev user, else verify JWT from the
        Authorization header or ``access_token`` cookie). Returns None on
        any failure; the error page still renders, just without the user menu.
        """
        try:
            from app.auth.dependencies import get_current_user
            from src.db import get_system_db

            conn = get_system_db()
            try:
                authorization = request.headers.get("authorization")
                return await get_current_user(
                    request=request, authorization=authorization, conn=conn
                )
            finally:
                try:
                    conn.close()
                except Exception:
                    pass
        except Exception:
            return None

    async def _render_error(request, code: int, message: str, traceback_str: str | None = None):
        """Render error.html with the same chrome (header, theme, static_url)
        as any other web route.

        Reuses ``_build_context`` so the page picks up ConfigProxy, theme
        overrides, the session user, and the ``static_url`` / ``url_for``
        helpers. Without these, base.html + _app_header.html silently render
        an empty header and no stylesheets.
        """
        from app.logging_config import request_id_var
        from app.web.router import templates as _web_templates, _build_context

        title = _ERROR_TITLES.get(code, "Error")
        user = await _resolve_error_user(request)
        ctx = _build_context(
            request,
            user=user,
            code=code,
            title=title,
            message=message,
            path=request.url.path,
            traceback=traceback_str,
            request_id=request_id_var.get(),
        )
        return _web_templates.TemplateResponse(request, "error.html", ctx, status_code=code)

    @app.exception_handler(StarletteHTTPException)
    async def _html_auth_redirect_handler(request, exc: StarletteHTTPException):
        """Browser-friendly error rendering for HTML routes; JSON for API routes.

        - 401 GET on a non-API path → redirect to ``/login`` (existing contract).
        - Any other status code on a non-API path with an HTML-accepting client →
          render ``error.html`` (the toolbar middleware injects its panels
          because the ``_catch_all_404`` route at the end of ``app.web.router``
          provides a matched route for unrouted paths).
        - API prefixes (``/api/``, ``/auth/``, ``/marketplace.zip``,
          ``/marketplace.git``, ``/marketplace/``) and non-HTML clients → JSON
          ``{"detail": "..."}`` per the existing contract.
        """
        path_is_api = request.url.path.startswith(_API_PATH_PREFIXES)

        if (
            exc.status_code == 401
            and request.method == "GET"
            and not path_is_api
        ):
            next_param = quote(request.url.path, safe="")
            return RedirectResponse(url=f"/login?next={next_param}", status_code=302)

        if not path_is_api and _wants_html(request):
            return await _render_error(request, exc.status_code, exc.detail or "")

        from fastapi.exception_handlers import http_exception_handler
        return await http_exception_handler(request, exc)

    @app.exception_handler(Exception)
    async def _unhandled_exception_handler(request, exc: Exception):
        """Catch-all 500 handler: HTML for browsers, JSON for API clients."""
        import os as _os
        import traceback as _tb
        logger.exception("Unhandled exception on %s %s", request.method, request.url.path)

        # Best-effort: forward the exception to PostHog before rendering the
        # error page. When PostHog is disabled this is a cheap no-op. The call
        # is wrapped because a tracing failure must never replace the
        # user-visible 500 with a second exception.
        try:
            from src.observability import get_posthog
            from app.logging_config import request_id_var as _rid_var
            get_posthog().capture_exception(
                exc,
                request=request,
                properties={
                    "request_id": _rid_var.get(),
                    "path": request.url.path,
                    "method": request.method,
                },
            )
        except Exception:
            logger.exception("PostHog capture_exception failed in 500 handler")

        path_is_api = request.url.path.startswith(_API_PATH_PREFIXES)
        debug_on = _os.environ.get("DEBUG", "").lower() in ("1", "true", "yes")
        tb_str = _tb.format_exc() if debug_on else None

        if not path_is_api and _wants_html(request):
            # In production (DEBUG unset), never leak str(exc) to the
            # rendered page: exception messages routinely contain DB paths,
            # SQL fragments, internal hostnames, or credentials embedded in
            # connection strings. Match the JSON branch's debug_on guard.
            # Devin BUG_0001 on PR #136 (b1c6ee9 review).
            visible_message = str(exc) if debug_on else "Internal server error"
            return await _render_error(request, 500, visible_message, tb_str)

        from app.logging_config import request_id_var
        from fastapi.responses import JSONResponse
        body: dict[str, str | None] = {
            "detail": "Internal server error",
            "request_id": request_id_var.get(),
        }
        if debug_on:
            body["error"] = str(exc)
        return JSONResponse(body, status_code=500)

    return app


app = create_app()
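

# Illustrative sketch, not called by the app: the redirect / HTML / JSON
# decision made by _html_auth_redirect_handler and _wants_html above,
# restated as a pure function over (path, method, Accept header, status
# code) so the rules can be unit-tested without building a Starlette
# Request. The names _SKETCH_API_PREFIXES, _sketch_error_dispatch and
# _sketch_500_json_body are hypothetical; the prefix tuple is copied by
# hand from _API_PATH_PREFIXES inside create_app() and would have to be
# kept in sync with it.
from typing import Optional
from urllib.parse import quote as _sketch_quote

_SKETCH_API_PREFIXES: tuple[str, ...] = (
    "/api/",
    "/auth/",
    "/marketplace.zip",
    "/marketplace.git",
    "/marketplace/",
)


def _sketch_error_dispatch(
    path: str, method: str, accept: str, status_code: int
) -> tuple[str, Optional[str]]:
    """Return ("redirect", login_url), ("html", None) or ("json", None)."""
    path_is_api = path.startswith(_SKETCH_API_PREFIXES)
    # Browser GETs that hit a 401 go to the login page with the original
    # path percent-encoded (safe="" also encodes "/") into ?next=.
    if status_code == 401 and method == "GET" and not path_is_api:
        return "redirect", f"/login?next={_sketch_quote(path, safe='')}"
    # Only an explicit text/html in Accept counts as a browser; curl's
    # default "*/*" and an empty header both fall through to JSON.
    if not path_is_api and "text/html" in accept:
        return "html", None
    return "json", None


def _sketch_500_json_body(
    exc: Exception, request_id: Optional[str], debug_env: str = ""
) -> dict:
    """Hypothetical restatement of the 500 handler's JSON branch."""
    debug_on = debug_env.lower() in ("1", "true", "yes")
    body: dict = {"detail": "Internal server error", "request_id": request_id}
    # str(exc) can carry DB paths or credentials, so it is exposed only
    # when DEBUG is truthy.
    if debug_on:
        body["error"] = str(exc)
    return body


# Example: _sketch_error_dispatch("/admin/activity", "GET", "*/*", 404)
# returns ("json", None), while the same path with a 401 returns
# ("redirect", "/login?next=%2Fadmin%2Factivity"). Keeping the rules in a
# pure function is one way to pin the curl-compatibility contract in tests.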