* feat(unified-stack): Browse + My Stack + Recipes + RBAC matrix (v49–v55)
Squash of 94 commits spanning the v49 → v55 unified-stack rewrite.
Full per-feature breakdown lives in CHANGELOG.md under [Unreleased].
Major buckets:
* v49 schema — first-class user_groups + user_group_members +
resource_grants; admin can CRUD groups and grants; Google
Workspace nightly sync writes into the new tables.
* v49 data_packages — admin-curated bundles of tables, RBAC-gated,
first-class section on /catalog Browse + My Stack.
* v49 memory_domains — row-backed (replaces hardcoded VALID_DOMAINS
enum); admin can CRUD; grants follow the same shape as tables and
packages.
* v50 cover_image_url + admin sidebar collapsibles + per-row Mode
tooltip + admin queue domain badges + admin "+ New Item" seed flow.
* v51 lifecycle status (prod/poc/coming-soon/draft) + category +
palette swatches on admin modals.
* v52 per-table detail page /catalog/t/<id>.
* v53 Recipes — admin-curated SQL templates as a second tab on
/catalog with full Edit/Delete admin affordances.
* v54 soft-delete (deleted_at) + Undo toast for packages, memory
domains, and recipes; hard_delete() retained as escape hatch.
* v55 Recipes RBAC — ResourceType.RECIPE registered, inline Group
Access matrix on Create + Edit Recipe modals (mirrors the Memory
Domain pattern).
* Activity Center per-resource filter (resource_prefix LIKE-anchored
on audit_log.resource); admin nav g+letter keyboard shortcuts;
loadAdminTablesLayout N+1 → single endpoint; /api/memory 30s
page-level cache.
* CI hardening — Keboola legacy tests pytest.importorskip; perf-
smoke threshold widened to stop cold-cache flake.
5002 tests passing, 35 skipped.
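A minimal sketch of the LIKE-anchored `resource_prefix` filter mentioned above, using stdlib `sqlite3` as a stand-in for the real database. The helper names (`like_prefix`, `filter_audit_rows`) and the wildcard escaping are assumptions for illustration; only `audit_log.resource` and the prefix idea come from the commit text.

```python
import sqlite3


def like_prefix(prefix: str) -> str:
    """Escape LIKE wildcards in the prefix, then anchor it at the start."""
    escaped = (
        prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    return escaped + "%"


def filter_audit_rows(conn: sqlite3.Connection, resource_prefix: str) -> list[str]:
    """Return audit_log.resource values that start with resource_prefix."""
    rows = conn.execute(
        "SELECT resource FROM audit_log "
        "WHERE resource LIKE ? ESCAPE '\\' ORDER BY resource",
        (like_prefix(resource_prefix),),
    ).fetchall()
    return [r[0] for r in rows]


# Hypothetical sample data, not taken from the real audit log.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_log (resource TEXT)")
conn.executemany(
    "INSERT INTO audit_log VALUES (?)",
    [("data_package:sales",), ("dataXpackage:oops",), ("recipe:top-10",)],
)
```

Without the escaping step, the `_` in a prefix such as `data_package:` would act as a single-character wildcard and also match `dataXpackage:oops`.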
* feat(p2 backlog): Cmd-K palette + suggest-a-domain + nightly E2E + v55 schema
10-item P2 sweep on top of the unified-stack squash. New behaviour:
* Cmd-K admin command palette (base.html) — fuzzy-search overlay over
admin + user-facing routes. Arrows/Enter to navigate, Esc to close.
* Stack-tabs digit shortcuts — 1/2/3 switch Browse / My Stack /
Recipes on /catalog + /corporate-memory.
* Friendlier non-admin empty state on /corporate-memory, plus a
"Suggest a domain" CTA → POST /api/memory-domain-suggestions, admin
queue with approve/reject. Backed by a new memory_domain_suggestions
table (schema v55).
* /admin/corporate-memory 7-tab strip grouped under Moderation /
Catalog parent labels.
* Bulk-assign table → package dropdown annotates each option with
"(N of M tables already in)" so the existing distribution is visible
before picking a target.
* GET /api/memory + /tree accept is_required filter; admin status
dropdowns route the "Required" sentinel onto it (status no longer
holds 'mandatory' post-v49, so the old dropdown returned nothing).
* chip-input.js is now opt-in per template via {% block extra_scripts %}
instead of loaded globally on every page from base.html.
* Edit-modal close helpers consolidated onto _closeEditModalById();
documents the per-source-type modal architecture decision.
* New .github/workflows/e2e-nightly.yml runs agent-browser smoke
scripts (scripts/e2e/smoke_*.sh) against a docker-compose stack
nightly at 04:30 UTC; failures open an agent-browser-nightly issue.
5012 tests passing, 35 skipped.
* fix(visual audit): 6 page regressions on memory + data-package surfaces
agent-browser walkthrough of every memory + data-package page in the PR
turned up 6 real bugs. Fixes:
1. Admin memory modals were dead. Duplicate `let _cmdNewDomainId`
declarations from the deprecated step-2 RBAC stubs in
admin_corporate_memory.html collided with the live state vars
declared earlier in the same <script> → SyntaxError on parse →
the entire second script block silently failed → every inline
onclick= handler defined there (`+ New Memory Domain`, Edit, etc.)
was a no-op. Removed the duplicate stubs.
2. /catalog/t/<table_id> + /catalog/r/<slug> rendered unstyled.
Both templates injected their CSS via {% block head %} but
base.html exposes {% block head_extra %} — wrong block name
meant <style> rules never reached the rendered HTML. Renamed
to head_extra. Hero card, section cards, dark SQL block, proper
full-width inputs all now render as designed.
3. L49 leak — "MANDATORY" KPI label + "Make Mandatory" row buttons
on /admin/corporate-memory still used the old word. Renamed to
"Required" / "Mark as Required" so UI matches the data model
(v49 split moved the Required tier onto the orthogonal
is_required boolean; status no longer holds 'mandatory').
4. Activity Center Resource dropdown didn't know the v55
`memory_domain_suggestion:` namespace — added it.
5. Tab strip on /admin/corporate-memory wrapped text 2× per button
on narrow viewports after the L50 MODERATION/CATALOG group
labels pushed total width past most viewports. Switched the
strip to flex-wrap:nowrap + overflow-x:auto with
white-space:nowrap + flex-shrink:0 on every direct child so the
tabs stay one row and slide horizontally when they overflow.
5012 tests passing, 35 skipped.
* rebase-cleanup: align with main's 0.54.25-27 API design + comment fix
Three follow-on fixes after rebasing onto origin/main (0.54.27):
* admin_tables.html: dropped a stray nested ``{% if data_source_type
== 'keboola' %}`` around ``prefillFromKeboolaTable`` (main never had
it; the outer Phase F2 guard already covers it) and reworded a JS
comment that contained literal ``{% %}`` tokens which Jinja was
parsing as a real tag → unbalanced if/endif → 30 template render
failures across the suite.
* /api/stack/subscription/{type}/{id}: DELETE now returns 204 instead
of 200 per the 0.54.26 design rules. CLI client + parity tests
updated to accept 2xx / assert 204.
* Memory-domain suggestion approve/reject paths added to
``_VERB_PATH_ALLOWLIST`` — they are pending → approved/rejected
state-machine transitions (approve also creates the real
memory_domains row as a side effect), so the RPC shape is
intentional rather than a missed PATCH refactor.
5035 tests passing, 35 skipped.
* fix(catalog_table_detail): real polish pass — hero glyph, dedup pills, rows/size meta, scoped sync CTA
The previous fix only got the block-name typo so the existing CSS rendered.
The actual layout was still wireframe-tier on close inspection:
* No cover glyph in the hero (a flat white card with title + meta line);
data-package + memory-domain detail pages both have a colored icon
square. Restored parity — table.icon emoji if set, otherwise initials
on a colored square using table.color.
* "INTERNAL" pill rendered twice for agnes_audit etc. — the mode pill
and the source-type pill happened to be identical strings. Now skip
the source pill when it matches the mode (`internal == internal`).
* Bucket / source_table code chip showed `Agnes Internal.audit_log` for
internal rows — meaningless to a user. Hidden when source_type is
internal.
* `pairs_well_with` admin input was a comma-separated `<input>` always
visible. Wrapped all 4 sections in an Edit-on-demand toggle: read-
only display by default, "+ Add" / "Edit" button on the right edge
of each section header reveals the inline form, Cancel hides it.
* "Trigger sync now" was a cramped link squashed into the empty-state
flex row (visible as `Tr…` overflow before). Promoted to a proper
btn-primary button under the empty-state copy. Hidden entirely for
internal tables (which are server-managed — no upstream to pull).
* Hero meta now surfaces row count + payload size (when sync_state has
them) + last sync timestamp on a single line — was missing from the
original.
* Mode pills colored by tier (local=green, remote=amber, materialized=
blue, internal=gray) so the basic fact about a table reads at a
glance, not from ALL-CAPS text alone.
* tests(v56): TDD baseline for extended data-packages content + per-table docs
68 failing tests across 8 files spec the v56 surface before any
implementation lands:
* test_schema_v55_to_v56_migration.py — schema bump, additive ALTERs
on data_packages + table_registry, idempotency, sequential-upgrade
preservation
* test_data_packages_repo_v56.py — repo create/update/get/list for
owner_name, owner_team, tags, long_description, when_to_use,
when_not_to_use, example_questions (JSON list round-trip, empty
defaults, partial-update preservation)
* test_table_registry_v56_docs.py — update_docs for grain, platforms,
partition_col, history, gotchas; preserves v52 docs columns
* test_api_data_packages_v56.py — PUT/POST/GET for all new fields,
field-level validation (tag count, bullet length, description size),
virtual badge derivation (curated/new)
* test_api_registry_docs_v56.py — PATCH /api/admin/registry/{id}/docs
for v56 fields, validation, RBAC unchanged
* test_web_catalog_package_detail_v56.py — /catalog/p/<slug> rewrite
asserts on rendered owner line, tag pills, badges, What it is,
Use it when, Skip it when, Example questions, per-table extended
detail in collapsible row, key-gotcha distinctness, admin-only Edit
* test_web_stack_card_v56_metadata.py — Browse-grid card additions
(owner chip, tag chips, badges) without breaking back-compat for
rows missing the new fields
* test_data_packages_no_vendor_content.py — CI guard: scans app/ +
src/ + cli/ + config/ + scripts/ for Groupon-specific tokens from
the colleague's spec MD; fails if any leak into OSS surfaces
* test_db_schema_version.py — bumped 55 → 56 with rationale
Plus updates schema-version assertion to 56. Implementation lands in
subsequent commits (schema migration → repo → API → templates).
* feat(v56): schema + repo for extended data-packages content
Schema additions (ALTER ADD COLUMN IF NOT EXISTS — additive + idempotent):
* data_packages: owner_name, owner_team, tags, long_description,
when_to_use, when_not_to_use, example_questions (JSON-as-VARCHAR for
the lists)
* table_registry: grain, platforms, partition_col, history, gotchas
(extends the v52 sample_questions / things_to_know / pairs_well_with
docs surface with structured per-table content)
Repo extensions:
* DataPackagesRepository.create + update accept the new fields with
the same Optional-is-no-op contract as v51 (pass an empty list to
clear a JSON column)
* _decode_row decodes the new JSON-list columns to Python lists; NULL
decodes to [] so callers don't branch
* TableRegistryRepository.update_docs grew the v56 fields alongside
the existing v52 ones — single PATCH can write either tier
atomically
* TableRegistryRepository._decode_row picks up platforms + gotchas in
the same NULL-tolerant decoder
22 repo + migration tests passing. API + UI land in subsequent commits.
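The two repo contracts above (NULL-tolerant decoding and Optional-is-no-op updates) can be sketched roughly as follows. This is not the repository code: `decode_row`, `build_update`, and the exact set of JSON-list columns are assumptions made for illustration.

```python
import json
from typing import Any, Optional

# Assumed set of JSON-as-VARCHAR list columns on data_packages.
JSON_LIST_COLUMNS = ("tags", "when_to_use", "when_not_to_use", "example_questions")


def decode_row(row: dict[str, Any]) -> dict[str, Any]:
    """Decode JSON-list columns; NULL (None) becomes [] so callers don't branch."""
    decoded = dict(row)
    for col in JSON_LIST_COLUMNS:
        raw = decoded.get(col)
        decoded[col] = json.loads(raw) if raw else []
    return decoded


def build_update(
    tags: Optional[list[str]] = None,
    owner_name: Optional[str] = None,
) -> dict[str, Any]:
    """Collect only the fields the caller actually passed.

    None means "leave the column alone"; an empty list clears a JSON column.
    """
    changes: dict[str, Any] = {}
    if tags is not None:
        changes["tags"] = json.dumps(tags)
    if owner_name is not None:
        changes["owner_name"] = owner_name
    return changes
```

Distinguishing `None` from `[]` is what lets a partial update preserve existing values while still offering a way to clear a list.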
* feat(v56): API surface for extended data-packages + per-table docs
CreateDataPackageRequest + UpdateDataPackageRequest grew the v56 fields
(owner_name, owner_team, tags, long_description, when_to_use,
when_not_to_use, example_questions) with per-field validators that
match the Foundry spec checklist:
* tags: ≤8 entries × ≤30 chars
* long_description: ≤4000 chars
* use/skip: ≤8 bullets × ≤200 chars
* example_questions: ≤12 × ≤200 chars
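The limits listed above could be enforced with a small bounded-list check like the one below. It is a plain-Python sketch under stated assumptions: the real validators live on the Pydantic request models, and `validate_bounded_list`, `validate_package_fields`, and `LIMITS` are hypothetical names.

```python
# (max entries, max chars per entry), taken from the limits listed above.
LIMITS = {
    "tags": (8, 30),
    "when_to_use": (8, 200),
    "when_not_to_use": (8, 200),
    "example_questions": (12, 200),
}
MAX_LONG_DESCRIPTION = 4000


def validate_bounded_list(name: str, values: list[str], max_items: int, max_chars: int) -> None:
    """Raise ValueError if the list is too long or any entry is too long."""
    if len(values) > max_items:
        raise ValueError(f"{name}: at most {max_items} entries, got {len(values)}")
    for value in values:
        if len(value) > max_chars:
            raise ValueError(f"{name}: entries are limited to {max_chars} chars")


def validate_package_fields(payload: dict) -> dict:
    """Validate only the fields present in the payload; return it unchanged."""
    for field, (max_items, max_chars) in LIMITS.items():
        values = payload.get(field)
        if values is not None:
            validate_bounded_list(field, values, max_items, max_chars)
    description = payload.get("long_description")
    if description is not None and len(description) > MAX_LONG_DESCRIPTION:
        raise ValueError(f"long_description: at most {MAX_LONG_DESCRIPTION} chars")
    return payload
```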
_serialize emits all v56 fields plus a virtual ``badges`` list derived
server-side at render time (no DB column needed): "curated" when the
creator is in the Admin group, "new" within 30 days of created_at.
Backdating created_at or changing the creator's admin status is picked up automatically.
PATCH /api/admin/registry/{id}/docs extended with v56 structured
per-table fields (grain, platforms, partition_col, history, gotchas).
gotchas: list of {key: bool, body: str} Pydantic models with the same
≤8 cap; first key=true entry becomes the Key gotcha on the rendered
package detail page. PATCH echoes the fresh state so callers can
re-render without a second GET.
26 API tests passing (16 data-packages + 10 registry-docs).
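The virtual ``badges`` derivation described above can be sketched as a pure function. The name `derive_badges`, the `now` parameter, and the inclusive 30-day boundary are assumptions; the commit only states the two rules.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

NEW_BADGE_WINDOW = timedelta(days=30)


def derive_badges(
    created_at: datetime,
    creator_is_admin: bool,
    now: Optional[datetime] = None,
) -> list[str]:
    """Compute badges at render time; nothing is stored in the database."""
    now = now or datetime.now(timezone.utc)
    badges = []
    if creator_is_admin:
        badges.append("curated")
    if now - created_at <= NEW_BADGE_WINDOW:
        badges.append("new")
    return badges
```

Because the result depends only on current inputs, a backdated `created_at` or a changed group membership shows up on the next render without any migration.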
* feat(v56): /catalog/p/<slug> rewrite + Browse-grid card augmentation
The third (and final) v56 commit lights up the UI surfaces backed by
the schema + API commits earlier in this PR:
* /catalog/p/<slug> template rebuilt around the Foundry spec's
section ladder — hero (icon + name + badges + owner + tags +
description + meta + Add-to-stack), "What it is" markdown body,
paired "Use it when / Skip it when" panels, "Tables in this
package" with collapsible per-table extended detail (grain /
platforms / partition_col / history / gotchas + sample questions),
and an "Example questions you can ask Claude" prompt panel. Each
section guarded by ``{% if pkg.<field> %}`` — empty content fields
hide the section entirely (no "No X yet" placeholder noise on the
public-facing drilldown).
* router catalog_package_detail hydrates per-table v56 fields onto
the tables list + derives the virtual badges (curated / new)
server-side from creator-in-Admin + 30-day created_at.
* StackResolver.ResourceEntry grew owner_name / owner_team / tags /
badges; _fetch_entries pulls the v56 columns + computes badges
once per fetch using a single Admin-group SELECT.
* _data_package_entry_dict adapter passes the new fields through to
the macro; tags are merged source-type pills + admin-authored
category tags per the spec convention.
* _stack_card.html renders the v56 badges (top-left, data-badge=
hooks) + the owner chip (data-card-owner hook) without breaking
back-compat — pre-v56 rows render unchanged.
* Admin PUT handler strips the v56 docs fields from the
read-modify-write merged dict so register() doesn't blow up
with the now-larger row shape (same pattern as the v52 docs
fields stripping).
5115 tests passing (+98 v56 + 18 fixed regressions from the merged-
register PUT path), 35 skipped.
* fix(rbac): Edit-on-package + Group-access 'required' persistence + CI vendor guard
Three related bugs reported on the merged-with-main branch:
1. Clicking Edit on a Data Package card landed on /admin/tables with
a `#<pkg.id>` hash that nothing listened to — admin saw the global
table listing, not the editor for that specific package. Added a
`?edit_package=<pkg_id>` query-param handler in admin_tables.html
(analog to the existing `?edit=<table_id>` and `?assign_to=<pkg_id>`
patterns) that calls openEditDataPackageModal on DOMContentLoaded
after a 250ms layout settle. Updated the package-detail Edit link
to use the new query param.
2. Setting Group Access to 'required' didn't persist — re-opening
the modal showed 'available'. Root cause was the v49
``resource_grants.requirement`` enum existing in the DB but the
POST /api/admin/grants endpoint not surfacing it: ``CreateGrantRequest``
declared only group_id + resource_type + resource_id, so Pydantic
silently dropped the matrix's ``requirement: 'required'`` payload
and the new row landed at the DB column default ('available').
Plumbed ``requirement`` through ``CreateGrantRequest`` →
``ResourceGrantsRepository.create`` so the value persists in one
round-trip. Plus a UNIQUE-constraint race in the matrix
diff-apply: DELETE-old + POST-new ran in parallel via
``Promise.allSettled``, so POST could fire first and trip the
unique check before DELETE freed the slot. Switched to sequential
(await all deletes; then await all writes) across all three
matrices (Edit Data Package, Edit Memory Domain, Edit Recipe).
3. CI vendor-content guard ``test_no_groupon_specific_strings_in_oss``
tripped on two of my own docstrings: a "Foundry Data team" mention
in two src/db.py comments + an ``s1_session_landings`` example in
cli/skills/agnes-table-registration.md. Rephrased the comments to
"extended-descriptions admin spec" and replaced the example with
a generic ``events_daily`` table name.
5164 tests passing, 35 skipped (+4 regression tests pinning the POST
/api/admin/grants requirement contract). Vendor guard back to green.
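The sequential diff-apply from item 2 (all deletes first, then all writes) is implemented in browser JavaScript; the sketch below restates the same ordering with Python's `asyncio` purely for illustration. `FakeGrantsApi` and `apply_matrix_diff` are hypothetical stand-ins, and the in-memory dict only imitates the UNIQUE constraint.

```python
import asyncio


class FakeGrantsApi:
    """In-memory stand-in that allows one grant per (group, resource) slot."""

    def __init__(self) -> None:
        self.slots: dict[tuple[str, str], str] = {}

    async def delete(self, key: tuple[str, str]) -> None:
        await asyncio.sleep(0)  # yield to the event loop like a network call
        self.slots.pop(key, None)

    async def create(self, key: tuple[str, str], requirement: str) -> None:
        await asyncio.sleep(0)
        if key in self.slots:
            raise ValueError(f"UNIQUE violation for {key}")
        self.slots[key] = requirement


async def apply_matrix_diff(api, deletes, writes) -> None:
    """Free every slot first, then write; the two phases never overlap."""
    await asyncio.gather(*(api.delete(key) for key in deletes))
    await asyncio.gather(*(api.create(key, req) for key, req in writes))


api = FakeGrantsApi()
api.slots[("analysts", "pkg-1")] = "available"
asyncio.run(
    apply_matrix_diff(
        api,
        deletes=[("analysts", "pkg-1")],
        writes=[(("analysts", "pkg-1"), "required")],
    )
)
```

Deletes still run concurrently with each other, as do writes; only the boundary between the two phases is serialized, which is enough to keep a write from racing the delete that frees its slot.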
* fix(catalog): admin Browse path drops v58 card fields
The /catalog and /memory admin god-mode branch built ResourceEntry
instances inline from pkg_repo.list() / domains_repo.list() and skipped
owner_name, owner_team, tags, and derived badges (curated/new). Visible
symptom: a package with an owner + tags rendered with the v56 chrome
for non-admin viewers but as a bare card for admins.
Adds StackResolver.browse_admin(user_id, resource_type) — admin god-mode
Browse that walks the full table but routes through the same
_fetch_entries enrichment pass as browse(), so admin + non-admin Browse
stay visually consistent. Both /catalog and /corporate-memory routes
switch to it.
Regression test in tests/test_stack_resolver_browse_admin.py covers:
owner/tags propagation, new/curated badge derivation, in_stack from
admin subscriptions, all-packages-regardless-of-grants, and the
ValueError for unsupported resource types.
* fix(catalog): three /catalog tab-strip UX bugs
1. Required Remove → red toast
browse_admin passed empty required_ids to _fetch_entries, so the
admin's own required grants surfaced as 'available' and the macro
rendered an actionable Remove button that POST /unsubscribe 400'd
on. Now derives required_ids from the admin's own groups so
Required packages render with the disabled "In stack (required)"
button. Regression test in test_stack_resolver_browse_admin.py.
2. Remove green-toasts but card stays until refresh
The My-Stack empty-state placeholder was only emitted server-side
when stack_entries was empty at render time. Removing the last
card left the tab completely blank — users read that as "Remove
didn't work, let me refresh". Both grid + empty-state are now
always rendered with one of them initially hidden; the JS swaps
visibility on add/remove instead of injecting DOM. Same fix in
/corporate-memory.
3. "What are Recipes?" + ambiguous (admin) suffix
Recipes tab now carries its own curator-block explainer (the
shared one was moved inside Browse view so it doesn't bleed
across tabs). The grey "(admin)" suffix becomes a yellow
.admin-only-hint chip with a title tooltip — visibility hint is
now unambiguous: yellow chip = "only you see this", non-admins
don't see the affordance at all.
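For item 1, deriving `required_ids` from the admin's own groups might look like the following sketch. The grant dict shape and both function names are assumptions; the button labels mirror the ones quoted in these commits.

```python
def required_ids_for(user_groups: set[str], grants: list[dict]) -> set[str]:
    """Resource ids that are 'required' through any group the user belongs to."""
    return {
        grant["resource_id"]
        for grant in grants
        if grant["group_id"] in user_groups and grant["requirement"] == "required"
    }


def card_action(resource_id: str, required_ids: set[str], in_stack: bool) -> str:
    """Pick the footer button label for a Browse card."""
    if resource_id in required_ids:
        return "In stack (required)"  # rendered disabled, so no unsubscribe call
    return "Remove" if in_stack else "+ Add to stack"


# Hypothetical grants for illustration.
grants = [
    {"group_id": "Everyone", "resource_id": "pkg-onboarding", "requirement": "required"},
    {"group_id": "Finance", "resource_id": "pkg-ledger", "requirement": "required"},
    {"group_id": "Everyone", "resource_id": "pkg-sales", "requirement": "available"},
]
admin_required = required_ids_for({"Admin", "Everyone"}, grants)
```

Passing an empty `required_ids` set, as the pre-fix admin path effectively did, makes every in-stack card fall through to the actionable "Remove" branch.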
* schema: renumber v51..v58 → v52..v59 to make room for main's v51
Main 0.54.29 introduced a NEW v51 (table_registry.bq_fqn — issue #343)
that releases ahead of this branch. The unified-stack chain v51..v58
shifts up by one so main's v51 stays as the released schema and ours
become v52..v59. Function names, internal version bumps, dispatch
ladder thresholds, and the migration-test references all move
together. Subsequent merge with main lands the bq_fqn column at the
freed v51 slot.
* fix(seed): seed admin lands in BOTH Admin AND Everyone groups
The LOCAL_DEV_MODE / SEED_ADMIN_EMAIL bootstrap only added the seed
user to Admin. Everyone-scoped grants — the canonical "every-user-
sees-this" pattern for Required onboarding — didn't surface for the
seed admin's own /catalog because they weren't in Everyone. Symptom:
admin grants a Required-tier package to Everyone, then sees it on
/catalog still rendered with an "Add to stack" button (because the
admin's resolved required_ids was empty for that package).
The dual-membership keeps Admin (authorization) and Everyone
(default-grant target) intentionally separate per the design comment
on UserRepository.create — every membership remains traceable to a
concrete row, just now with a system_seed row in Everyone too. Both
INSERTs go through UserGroupMembersRepository.add_member which is
idempotent on (user_id, group_id), so re-fires on every lifespan
startup don't duplicate rows.
Regression test in test_main_seed_admin_everyone.py.
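A rough sketch of why re-running the seed on every startup is safe, using stdlib `sqlite3` in place of the real database. The table columns, the `source` value placement, and the `INSERT OR IGNORE` mechanism are assumptions; the commit only states that add_member is idempotent on (user_id, group_id).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_group_members ("
    " user_id TEXT, group_id TEXT, source TEXT,"
    " PRIMARY KEY (user_id, group_id))"
)


def add_member(conn: sqlite3.Connection, user_id: str, group_id: str, source: str) -> None:
    """Insert a membership row; a repeat call for the same pair is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO user_group_members (user_id, group_id, source) "
        "VALUES (?, ?, ?)",
        (user_id, group_id, source),
    )


def seed_admin(conn: sqlite3.Connection, user_id: str) -> None:
    """Put the seed user in both Admin (authorization) and Everyone (default grants)."""
    for group_id in ("Admin", "Everyone"):
        add_member(conn, user_id, group_id, "system_seed")


# Simulate two application startups.
seed_admin(conn, "seed-admin")
seed_admin(conn, "seed-admin")
member_rows = conn.execute(
    "SELECT group_id FROM user_group_members WHERE user_id = ? ORDER BY group_id",
    ("seed-admin",),
).fetchall()
```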
* style: unify admin-only hints across marketplace + memory detail pages
Replaces three stale ``(admin)`` parentheticals with the same yellow
``admin-only`` chip introduced for /catalog tab actions. Same tooltip
copy ("Visible only to admins — analysts won't see this …") so the
visibility hint is unmistakable wherever it appears:
- Hard delete on marketplace_plugin_detail (admin-only destructive
action — same gating as the original suffix conveyed).
- Hard delete on marketplace_item_detail (same).
- Edit link on memory_domain_detail (title-attr only before; now a
visible chip too).
Non-admin viewers never saw these affordances — the gates are
unchanged. Pure styling pass for consistency.
* fix(catalog): exclude soft-deleted data packages + memory domains from Browse
``StackResolver._fetch_entries`` and ``browse_admin`` were querying
data_packages / memory_domains without a ``deleted_at IS NULL`` guard.
A package soft-deleted via /admin/* (v54 soft-delete contract) stayed
visible on /catalog and /memory until either an Undo or a hard delete
— directly contradicting the soft-delete UX which is supposed to
remove the affordance immediately and only retain the row for the
Undo window.
The repository accessors (DataPackagesRepository.list,
MemoryDomainsRepository.list, list_packages_of_table, etc.) already
filter deleted rows; this commit brings the resolver's direct SQL in
line with that contract.
Regression test in test_stack_resolver_browse_admin.py.
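The guard itself is a one-line predicate. The sketch below shows its effect with stdlib `sqlite3` and a pared-down `data_packages` table; the helper names and the column set are illustrative assumptions, not the resolver's actual SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_packages (id TEXT PRIMARY KEY, deleted_at TEXT)")
conn.executemany(
    "INSERT INTO data_packages (id, deleted_at) VALUES (?, NULL)",
    [("pkg-a",), ("pkg-b",)],
)


def browse_packages(conn: sqlite3.Connection) -> list[str]:
    """List only live packages; soft-deleted rows stay in the table but are hidden."""
    rows = conn.execute(
        "SELECT id FROM data_packages WHERE deleted_at IS NULL ORDER BY id"
    ).fetchall()
    return [r[0] for r in rows]


def soft_delete(conn: sqlite3.Connection, package_id: str) -> None:
    conn.execute(
        "UPDATE data_packages SET deleted_at = CURRENT_TIMESTAMP WHERE id = ?",
        (package_id,),
    )


def undo_delete(conn: sqlite3.Connection, package_id: str) -> None:
    conn.execute(
        "UPDATE data_packages SET deleted_at = NULL WHERE id = ?", (package_id,)
    )
```

Undo only has to clear `deleted_at`, which is why the row is retained rather than removed.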
* fix(catalog): Add/Remove updates full card chrome, not just button
The previous _applyStackChange flipped only the footer button label —
the card border (.is-in-stack class), top-right "In stack" badge, and
button color class (--add / --remove) stayed at their server-rendered
state. After Add the user saw the button checkmark but the rest of
the card still looked like "available, not in stack". They read this
as "the change didn't take — let me refresh".
This commit makes the optimistic update mirror what the server-side
macro renders for the new state:
* ``c.classList.toggle('is-in-stack', becameInStack)`` — flips the
border + visual state class.
* Top-right ``.stack-card__req-badge--instack`` badge is injected on
Add, removed on Remove (skipped when ``data-requirement='required'``
— that slot is owned by the Required badge).
* Button text is "Remove" / "+ Add to stack" matching the macro
(was "✓ In stack" which was visually nice but inconsistent).
* Button color class --add / --remove swaps so the destructive Remove
tint kicks in immediately.
The clone-into-My-Stack path applies the same updates so the new card
in My Stack reads identically to a server-rendered in_stack card.
Mirrored in /corporate-memory.
* fix(memory): four Devin-review bugs on /memory drill-down + manifest
PR #333 Devin review surfaced four real bugs that ship a broken
/memory experience even though the unit tests passed.
1. Manifest md5 omits is_required + content (app/api/sync.py:836-840)
_build_memory_domains_section hashed only (id|title|status) per
item. _build_per_domain_markdown routes items between "## Required"
and "## Approved" by is_required and embeds full content — so an
admin edit of either dimension left the manifest md5 unchanged,
`agnes pull` skipped the re-fetch, and the analyst kept a stale
bundle.md. Now both fields participate in the hash.
2. required_count always 0 (src/repositories/memory_domains.py)
list_items_of_domain only SELECTed (id, title, status) so the
`it.get("is_required")` in the manifest builder always evaluated
to None → required_count = 0 regardless of actual state. The
manifest builder advertised a count it could never compute. Now
projects is_required + content too (required by fix 1 anyway).
3. Vote URL 404 (memory_domain_detail.html:289-290)
Constructed `/api/memory/items/{id}/vote` but the route is
`/api/memory/{id}/vote`. Every upvote/downvote button was a
silent no-op.
4. Dismiss/undismiss URL + method both wrong (memory_domain_detail.html:296-305)
Constructed `/api/memory/items/{id}/dismiss` (extra /items/) and
/undismiss (no such route — undismiss is DELETE on /dismiss).
Both buttons silently 404'd. Now POST + DELETE on
`/api/memory/{id}/dismiss` per app/api/memory.py:635/675.
* fix: multi-agent reviewer findings — vendor-token scrubs + manifest md5 predicate + soft-delete filter
Three reviewer findings from the multi-agent review on PR #333,
fixed in-place per CLAUDE.md issue-economy rule.
Reviewer-rules (Important — vendor-agnostic OSS):
- app/main.py:218 comment: replaced 'foundryai-prod' with generic
'a customer prod instance' phrasing. Public OSS repo must not
carry customer-specific tokens (CLAUDE.md § Project conventions).
- tests/test_table_registry_v56_docs.py:70 fixture string:
replaced "user_brand_affiliation = 'groupon'" with 'acme' on
the same rule.
Reviewer-architecture (closes still-unresolved Devin 🚩 ANALYSIS):
- app/api/sync.py _build_memory_domains_section: md5 hash loop now
filters items to the SAME predicate the bundle renderer uses
(is_required OR status='approved'). Pre-fix the hash iterated ALL
items but _build_per_domain_markdown only rendered the union of
required items + approved-non-required items — so an admin edit
to a pending/rejected non-required item flipped the md5 against
an identical-bytes bundle, triggering a wasteful re-fetch on
every analyst's next 'agnes pull'. The earlier commit fixed the
hash-input fields (is_required + content); this closes the
set-of-items asymmetry Devin separately flagged.
Reviewer-RBAC (minor cleanup):
- app/resource_types.py _data_package_blocks and _memory_domain_blocks
now filter 'WHERE deleted_at IS NULL' (v54 soft-delete column) so
the /admin/access UI doesn't surface soft-deleted entities as
grantable. Mirrors the existing filter on _recipe_blocks. No
security leak pre-fix (resolver double-filters and re-checks at
serve time), just UI cleanliness.
- app/services/stack_resolver.py add_to_stack: docstring note
added explaining that authorization is enforced at the API layer
(app/api/stack.py can_access gate), not at the resolver. The
initial review suggested adding a defensive 403 here, but that
broke 5 existing tests that legitimately call add_to_stack
directly without setting up grants first; the docstring captures
the contract instead. stack() already intersects subscriptions
with current available_ids on every read, so a 'zombie' row from
a misuse never leaks into the user-facing manifest.
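The manifest-hash fix (hash the same items the bundle renders, and include is_required + content) can be sketched like this. The field separator, the sort order, and the function names are assumptions; the real logic lives in _build_memory_domains_section.

```python
import hashlib


def is_rendered(item: dict) -> bool:
    """Same predicate the bundle renderer is described as using."""
    return bool(item.get("is_required")) or item.get("status") == "approved"


def domain_md5(items: list[dict]) -> str:
    """Hash only rendered items, over every field that affects the bundle bytes."""
    digest = hashlib.md5()
    for item in sorted(filter(is_rendered, items), key=lambda i: str(i["id"])):
        line = "|".join(
            [
                str(item["id"]),
                item["title"],
                item["status"],
                str(bool(item.get("is_required"))),
                item.get("content") or "",
            ]
        )
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# Hypothetical items for illustration.
items = [
    {"id": "a1", "title": "Naming", "status": "approved", "is_required": False, "content": "x"},
    {"id": "p1", "title": "Draft", "status": "pending", "is_required": False, "content": "y"},
]
base_md5 = domain_md5(items)
```

Keeping the hash inputs and the rendered output in lockstep means the md5 changes exactly when the bundle bytes can change, so `agnes pull` neither serves stale bundles nor re-fetches identical ones.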
* release: 0.55.0 — unified Browse + My Stack (Data Packages + Memory), schema v48→v59, 3 BREAKING
1013 lines
46 KiB
Python
"""FastAPI main application — unified server for web UI + API."""

# Silence authlib's internal forward-compat note. Authlib emits an
# AuthlibDeprecationWarning from its own _joserfc_helpers when our
# `from authlib.integrations.starlette_client import OAuth` import
# touches `authlib.jose` paths. The warning is upstream-internal — it's
# telling authlib to migrate to joserfc before its 2.0; it's not
# actionable on our side until either authlib ships the fix or we
# rewrite OAuth handling on top of joserfc directly. Filtering here
# (before authlib gets imported transitively) keeps `make local-dev`
# stdout clean without hiding warnings from any other package.
import warnings as _warnings
try:
    from authlib.deprecate import AuthlibDeprecationWarning as _AuthlibDepr
    _warnings.filterwarnings("ignore", category=_AuthlibDepr)
except ImportError:
    # authlib too old / class moved — fall back to message-based match
    # so the filter still keeps startup clean.
    _warnings.filterwarnings(
        "ignore",
        message=r"authlib\.jose module is deprecated.*",
    )

import logging
from contextlib import asynccontextmanager
from pathlib import Path
from urllib.parse import quote

import os

# Initialise structured logging BEFORE any module that emits logs at import
# time. setup_logging is idempotent and safe to call once at process start.
from app.logging_config import setup_logging

setup_logging("app")

from app.version import APP_VERSION, MIN_COMPAT_CLI_VERSION

from fastapi import Depends, FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import RedirectResponse
from fastapi.staticfiles import StaticFiles
from starlette.exceptions import HTTPException as StarletteHTTPException
from starlette.middleware.gzip import GZipMiddleware
from starlette.middleware.sessions import SessionMiddleware
from starlette.types import ASGIApp, Receive, Scope, Send

from app.middleware.request_id import RequestIdMiddleware


class _SelectiveGZipMiddleware:
    """GZipMiddleware wrapper that skips a set of path prefixes.

    Parquet-serving endpoints send responses that are already columnar-
    compressed (parquet's internal codec) and — for /api/data — can reach
    hundreds of MB. Gzipping them on the way out costs CPU and latency with
    no meaningful size reduction. Skip those paths; every other endpoint
    (JSON manifests, HTML previews, install.sh) still gets compressed.
    """

    def __init__(self, app: ASGIApp, minimum_size: int = 1024, skip_prefixes: tuple[str, ...] = ()) -> None:
        # `self.app` is the Starlette middleware convention — outer middleware
        # (e.g. fastapi-debug-toolbar's APIRouter walker) traverses the chain
        # via `.app` to find the inner FastAPI app. Keep `_raw` as the public
        # alias used by our own __call__ for the skip-path branch.
        self.app = app
        self._raw = app
        self._gzip = GZipMiddleware(app, minimum_size=minimum_size)
        self._skip_prefixes = skip_prefixes

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope.get("type") == "http":
            path = scope.get("path", "")
            if any(path.startswith(p) for p in self._skip_prefixes):
                await self._raw(scope, receive, send)
                return
        await self._gzip(scope, receive, send)

from app.auth.rate_limit import (
SlowAPIMiddleware as _AuthRateLimitMiddleware,
|
|
RateLimitExceeded as _AuthRateLimitExceeded,
|
|
_rate_limit_exceeded_handler as _auth_rate_limit_handler,
|
|
limiter as _auth_rate_limiter,
|
|
)
|
|
from app.auth.router import router as auth_router
|
|
from app.api.health import router as health_router
|
|
from app.api.sync import router as sync_router
|
|
from app.api.data import router as data_router
|
|
from app.api.query import router as query_router
|
|
from app.api.users import router as users_router
|
|
from app.api.memory import router as memory_router
|
|
from app.api.upload import router as upload_router
|
|
from app.api.scripts import router as scripts_router
from app.api.settings import router as settings_router
from app.api.catalog import router as catalog_router
from app.api.telegram import router as telegram_router
from app.api.access import router as access_router, me_router as me_access_router
from app.api.me import router as me_router
from app.api.me_stats import router as me_stats_router
from app.api.admin import router as admin_router
from app.api.admin_bigquery_test import router as admin_bigquery_test_router
from app.api.jira_webhooks import router as jira_webhooks_router
from app.api.metrics import router as metrics_router
from app.api.metadata import router as metadata_router
from app.api.query_hybrid import router as query_hybrid_router
from app.api.cli_artifacts import router as cli_artifacts_router
from app.api.tokens import router as tokens_router, admin_router as tokens_admin_router
from app.api.v2_catalog import router as v2_catalog_router
from app.api.v2_schema import router as v2_schema_router
from app.api.v2_sample import router as v2_sample_router
from app.api.v2_scan import router as v2_scan_router
from app.api.marketplaces import router as marketplaces_router
from app.api.data_packages import router as data_packages_router
from app.api.memory_domains import router as memory_domains_router
from app.api.recipes import (
    public_router as recipes_public_router,
    admin_router as recipes_admin_router,
)
from app.api.memory_domain_suggestions import (
    public_router as memory_domain_suggestions_public_router,
    admin_router as memory_domain_suggestions_admin_router,
)
from app.api.uploads import router as admin_uploads_router
from app.api.stack import router as stack_router
from app.api.stack_views import router as stack_views_router
from app.api.initial_workspace import router as initial_workspace_router
from app.api.store import router as store_router
from app.api.my_stack import router as my_stack_router
from app.api.marketplace import router as marketplace_router
from app.api.welcome import router as welcome_router
from app.api.claude_md import router as claude_md_router
from app.api.news import router as news_router
from app.api.cache_warmup import router as cache_warmup_router
from app.api.bq_metadata_refresh import router as bq_metadata_refresh_router
from app.api.activity import router as activity_router
from app.api.observability import router as observability_router
from app.api.admin_user_sessions import router as admin_user_sessions_router
from app.api.admin_sessions import router as admin_sessions_router
from app.api.admin_usage import router as admin_usage_router
from app.api.admin_usage_summary import router as admin_usage_summary_router
from app.marketplace_server.router import router as marketplace_server_router
from app.marketplace_server.git_router import make_git_wsgi_app
from app.web.router import router as web_router

logger = logging.getLogger(__name__)


@asynccontextmanager
async def lifespan(app):
    # Issue #81 Group A — log the effective remote_attach allowlist at
    # startup so an operator's typo in AGNES_REMOTE_ATTACH_EXTENSIONS
    # (which REPLACES, not extends, the default) is visible.
    try:
        from src.orchestrator_security import log_effective_policy
        log_effective_policy()
    except Exception:
        pass  # never block startup on a logging convenience

    # Bump anyio's default thread pool size from 40 → AGNES_THREADPOOL_SIZE
    # (default 200). FastAPI auto-runs every plain `def` route handler in
    # this pool — the Tier 1 endpoints converted in PR #188 (`/api/query`,
    # `/api/v2/scan`, `/api/v2/sample`, `/api/v2/schema`) all block on
    # synchronous DuckDB / BQ-extension calls inside the handler body and
    # would otherwise serialise once 40 are in flight. 200 keeps the per-
    # process working set well under the BQ extension's connection cap
    # while leaving headroom for concurrent UI / health probes.
    try:
        import anyio.to_thread
        size = int(os.environ.get("AGNES_THREADPOOL_SIZE", "200"))
        anyio.to_thread.current_default_thread_limiter().total_tokens = size
        logger.info("anyio thread pool capacity set to %d", size)
    except Exception as e:
        logger.warning("failed to bump anyio thread pool capacity: %s", e)

    from app.api.cache_warmup import maybe_schedule_startup_warmup
    maybe_schedule_startup_warmup()

    # Sweep stale materialize parquet locks left behind by previous runs
    # that were SIGKILL'd mid-materialize. Lazy reclaim at next acquire
    # already handles correctness, but an active sweep at startup keeps
    # the data directory tidy and gives operators a clear "swept N" log
    # line instead of zombie 0-byte files lingering for days (issue #260).
    try:
        from connectors.bigquery.extractor import sweep_stale_parquet_locks
        from src.db import _get_data_dir as _ddir
        sweep_stale_parquet_locks(_ddir() / "extracts")
    except Exception:
        logger.exception("startup parquet-lock sweep failed (non-fatal)")

    # Seed the internal data-source registry rows so `agnes_sessions /
    # agnes_telemetry / agnes_audit` show up in /admin/tables + `agnes
    # catalog` on every fresh install. Idempotent — re-applies canonical
    # name + description on every boot so operators can't drift them
    # away from the seed.
    try:
        from src.db import get_system_db
        from connectors.internal.registry import ensure_internal_tables_registered
        ensure_internal_tables_registered(get_system_db())
    except Exception:
        logger.exception("internal data-source seed failed; continuing")

    # Rebuild the FTS BM25 index over knowledge_items at boot (issue #121).
    # The migration to schema v47 already does this on first upgrade, but
    # for instances that have been on v47 across restarts the boot-time
    # rebuild guarantees the index reflects whatever mutations landed via
    # the BG-task / scheduler paths that bypass the per-mutation hook.
    # Soft-failure — logs WARNING and the repo falls back to ILIKE.
    try:
        from src.db import get_system_db
        from src.fts import ensure_knowledge_fts_index
        ensure_knowledge_fts_index(get_system_db())
    except Exception:
        logger.exception("startup FTS index rebuild failed; falling back to ILIKE on /api/memory?search=")

    # Surface BQ config gaps at startup so the operator sees them in
    # the boot log instead of as cryptic "provider returned no data" /
    # "403 serviceusage" later. Issue #343 — these are the same gaps
    # that silently failed every remote BQ query on a customer prod
    # instance for several days in mid-May 2026 before the cause was
    # traced. Non-fatal: warnings only, no startup abort.
    try:
        from connectors.bigquery.access import validate_bigquery_startup_config
        for warning in validate_bigquery_startup_config():
            logger.warning("BQ config check: %s", warning)
    except Exception:
        logger.exception("BQ startup config validation crashed (non-fatal)")

    # Seed admin user (SEED_ADMIN_EMAIL) and add them to the Admin user_group.
    # Optional SEED_ADMIN_PASSWORD lets the seeded user sign in immediately
    # without going through bootstrap; never overwritten if already set.
    # The Admin/Everyone user_groups themselves are seeded inside
    # _ensure_schema (src.db._seed_system_groups), so this hook only has to
    # handle membership for the seed admin.
    # Lives in lifespan (worker-only), NOT create_app(): the latter runs
    # in the uvicorn --reload master too, and duckdb >=1.5 holds an
    # exclusive per-process file lock on system.duckdb that would then
    # block the worker.
    from app.auth.dependencies import is_local_dev_mode, get_local_dev_email
    seed_email = os.environ.get("SEED_ADMIN_EMAIL") or (get_local_dev_email() if is_local_dev_mode() else None)
    if seed_email:
        try:
            from src.db import SYSTEM_ADMIN_GROUP, get_system_db
            from src.repositories.user_group_members import UserGroupMembersRepository
            from src.repositories.users import UserRepository
            conn = get_system_db()
            # try/finally so the connection is released even when seeding
            # raises part-way (same pattern as the scheduler seed below).
            try:
                repo = UserRepository(conn)
                seed_password = os.environ.get("SEED_ADMIN_PASSWORD") or None
                password_hash = None
                if seed_password:
                    from argon2 import PasswordHasher
                    password_hash = PasswordHasher().hash(seed_password)
                existing = repo.get_by_email(seed_email)
                if not existing:
                    import uuid
                    user_id = str(uuid.uuid4())
                    repo.create(
                        id=user_id,
                        email=seed_email,
                        name="Admin",
                        password_hash=password_hash,
                    )
                    logger.info("Seeded admin user: %s (password=%s)", seed_email, "yes" if password_hash else "no")
                else:
                    user_id = existing["id"]
                    if password_hash and not existing.get("password_hash"):
                        repo.update(id=user_id, password_hash=password_hash)
                        logger.info("Set password on existing seed admin: %s", seed_email)
                # Make sure the seed admin is actually in the Admin group — this
                # is what gives them admin access in v12. Idempotent.
                from src.db import SYSTEM_EVERYONE_GROUP
                admin_group = conn.execute(
                    "SELECT id FROM user_groups WHERE name = ?", [SYSTEM_ADMIN_GROUP],
                ).fetchone()
                if admin_group:
                    UserGroupMembersRepository(conn).add_member(
                        user_id=user_id,
                        group_id=admin_group[0],
                        source="system_seed",
                        added_by="app.main:seed_admin",
                    )
                # Also seed Everyone membership — Everyone-scoped grants are the
                # canonical "every-user-sees-this" pattern (Required onboarding,
                # default reference packages). The seed admin not being in
                # Everyone meant their own Required grants didn't surface on
                # /catalog as Required for them, which read as a bug.
                everyone_group = conn.execute(
                    "SELECT id FROM user_groups WHERE name = ?",
                    [SYSTEM_EVERYONE_GROUP],
                ).fetchone()
                if everyone_group:
                    UserGroupMembersRepository(conn).add_member(
                        user_id=user_id,
                        group_id=everyone_group[0],
                        source="system_seed",
                        added_by="app.main:seed_admin",
                    )
            finally:
                conn.close()
        except Exception as e:
            logger.warning("Could not seed admin: %s", e)

    # Seed the synthetic scheduler user when SCHEDULER_API_TOKEN is configured,
    # so the very first cron tick after a fresh deploy already has a valid
    # actor to attribute audit-log entries to. The lazy seed in
    # `app.auth.scheduler_token.get_scheduler_user` covers the case where the
    # secret is rotated mid-life, but doing it here keeps startup observable.
    from app.auth.scheduler_token import get_scheduler_secret
    if get_scheduler_secret():
        try:
            from app.auth.scheduler_token import (
                SCHEDULER_TOKEN_MIN_LENGTH,
                ensure_scheduler_user,
            )
            from src.db import get_system_db
            secret = get_scheduler_secret()
            if len(secret) < SCHEDULER_TOKEN_MIN_LENGTH:
                logger.warning(
                    "SCHEDULER_API_TOKEN is set but only %d chars — auth path"
                    " disabled (minimum %d). Generate a longer secret in .env.",
                    len(secret), SCHEDULER_TOKEN_MIN_LENGTH,
                )
            else:
                conn = get_system_db()
                try:
                    ensure_scheduler_user(conn)
                finally:
                    conn.close()
        except Exception as e:
            logger.warning(f"Could not seed scheduler user: {e}")

    # C8: Warn when no user has a password_hash — bootstrap endpoint is open.
    # This is intentional UX (operator can claim seed admin), but the open
    # window should be visible in startup logs so it's not forgotten.
    if not is_local_dev_mode():
        try:
            from src.db import get_system_db
            from src.repositories.users import UserRepository
            conn = get_system_db()
            try:
                repo = UserRepository(conn)
                all_users = repo.list_all()
                has_password = any(u.get("password_hash") for u in all_users)
                if not has_password:
                    logger.warning(
                        "No user has a password set — /auth/bootstrap is reachable. "
                        "Claim the seed admin (or set SEED_ADMIN_PASSWORD) to close this window."
                    )
            finally:
                conn.close()  # release the handle even if the lookup raises
        except Exception:
            pass  # never block startup on a logging convenience

    # Construct the PostHog client up front so its background flush thread
    # starts before the first request — and so a missing/invalid key fails
    # loud at boot rather than on first capture. No-op when disabled.
    try:
        from src.observability import get_posthog
        pc = get_posthog()
        if pc.enabled:
            logger.info("PostHog observability enabled (host=%s, identify=%s, replay=%s)",
                        pc.host, pc.identify_mode, pc.replay_enabled)
    except Exception:
        logger.exception("PostHog init at startup failed")
    yield
    try:
        from src.observability import get_posthog
        get_posthog().shutdown()
    except Exception:
        logger.exception("PostHog shutdown failed")
    from src.db import close_analytics_db, close_system_db
    close_system_db()
    close_analytics_db()


def _is_truthy_env(name: str) -> bool:
    return os.environ.get(name, "").lower() in ("1", "true", "yes")


# DEBUG turns the toolbar on; LOCAL_DEV_MODE implies it (auth-bypassed dev
# environment is by definition a debugging context — no point in making
# operators set both).
DEBUG = _is_truthy_env("DEBUG") or _is_truthy_env("LOCAL_DEV_MODE")


def _toolbar_show_callback(request, settings) -> bool:
    """Decide whether the debug toolbar shows on a request.

    Replaces the upstream default (which reads `request.app.debug`) — we keep
    `app.debug=False` so our @app.exception_handler(Exception) runs instead of
    Starlette's debug-only ServerErrorMiddleware, but we still want the
    toolbar mounted. Read DEBUG / LOCAL_DEV_MODE env directly so operators who
    flip the env at runtime (rare) see the change without re-import.
    """
    return _is_truthy_env("DEBUG") or _is_truthy_env("LOCAL_DEV_MODE")


def create_app() -> FastAPI:
    app = FastAPI(
        title="AI Data Analyst",
        description="Data distribution platform for AI analytical systems",
        version=APP_VERSION,
        lifespan=lifespan,
        # Swagger UI / OpenAPI JSON gated behind authentication — custom
        # routes added below before the web_router catch-all. Setting these
        # to None disables FastAPI's default unauthenticated endpoints.
        docs_url=None,
        redoc_url=None,
        openapi_url=None,
        # Intentionally NOT debug=DEBUG: FastAPI's debug=True installs
        # Starlette's ServerErrorMiddleware which intercepts unhandled
        # Exceptions and renders a plain-HTML traceback BEFORE our
        # @app.exception_handler(Exception) can run — robbing the 500 page
        # of its chrome and the debug toolbar. We get the toolbar back via
        # SHOW_TOOLBAR_CALLBACK below (reads DEBUG env directly instead of
        # request.app.debug).
        debug=False,
    )

    @app.middleware("http")
    async def _add_version_headers(request, call_next):
        response = await call_next(request)
        # /api/* only — headers are advisory to the agnes CLI; UI/docs/marketplace
        # traffic doesn't consume them.
        if request.url.path.startswith("/api/"):
            response.headers["X-Agnes-Latest-Version"] = APP_VERSION
            response.headers["X-Agnes-Min-Version"] = MIN_COMPAT_CLI_VERSION
        return response

    # FastAPI debug toolbar — only when DEBUG=1 in env. Injects per-request
    # HTML overlay (headers, routes, timer, profiling, logs) on any HTML
    # response; harmless on JSON. Inner try/except is for the import only:
    # if a developer sets DEBUG=1 without installing dev deps, log a warning
    # instead of crashing. The middleware mount itself fails loud if broken.
    #
    # Mounted FIRST (innermost on response) so it sees the raw HTML BEFORE
    # GZip compresses it — debug_toolbar.middleware decodes response bodies
    # as UTF-8 to inject markup, and a gzipped body fails that decode (the
    # toolbar's own `Accept-Encoding` skip-check reads response headers, not
    # request headers, so it never trips).
    if DEBUG:
        try:
            from debug_toolbar.middleware import DebugToolbarMiddleware
            from jinja2 import FileSystemLoader
            # debug_toolbar.middleware splats **kwargs into DebugToolbarSettings
            # (a pydantic-settings model with case-insensitive UPPERCASE fields).
            # Pass field names as kwargs to add_middleware — `panels` becomes
            # `PANELS`, etc. Do NOT wrap them in a `settings={...}` dict —
            # that hits the model's actual `SETTINGS` field (Sequence[BaseSettings])
            # and fails validation. Field reference:
            # https://github.com/mongkok/fastapi-debug-toolbar/blob/master/debug_toolbar/settings.py
            # ProfilingPanel (pyinstrument) is intentionally omitted: it
            # raises "There is already a profiler running" under uvicorn's
            # async context because pyinstrument's stack sampler can't be
            # nested per task. Re-enable per-developer if you really want it
            # via env override; the rest of the panels are async-safe.
            #
            # JINJA_LOADERS prepends our app/debug/templates so DuckDBPanel
            # can resolve `panels/duckdb.html`. The toolbar's built-in loader
            # (PackageLoader for debug_toolbar/templates) stays appended via
            # ChoiceLoader, so first-party panels still render.
            _debug_templates_dir = Path(__file__).parent / "debug" / "templates"
            _toolbar_settings = dict(
                panels=[
                    "debug_toolbar.panels.headers.HeadersPanel",
                    "debug_toolbar.panels.routes.RoutesPanel",
                    "debug_toolbar.panels.settings.SettingsPanel",
                    "debug_toolbar.panels.versions.VersionsPanel",
                    "debug_toolbar.panels.timer.TimerPanel",
                    "debug_toolbar.panels.logging.LoggingPanel",
                    "app.debug.duckdb_panel.DuckDBPanel",
                ],
                jinja_loaders=[FileSystemLoader(str(_debug_templates_dir))],
                show_toolbar_callback="app.main._toolbar_show_callback",
            )
            # Eagerly register the toolbar's own routes
            # (/_debug_toolbar/render_panel/ + /_debug_toolbar/static mount)
            # NOW, before app.web.router's /{full_path:path} catch-all gets
            # added by include_router(web_router). Otherwise the catch-all
            # swallows the toolbar's own GET requests and the panel scripts
            # render our 404 page. We can't construct DebugToolbarMiddleware
            # directly on the FastAPI app (its `while not isinstance(...,
            # APIRouter): self.router = self.router.app` walk fails — FastAPI
            # has `.router`, not `.app`), so call init_toolbar's body
            # ourselves on the APIRouter directly. add_middleware below still
            # works lazily; init_toolbar's NoMatchFound guard skips re-adding
            # routes when called the second time.
            from debug_toolbar.api import render_panel as _render_panel_view
            from debug_toolbar.middleware import show_toolbar as _show_toolbar
            from debug_toolbar.settings import DebugToolbarSettings
            from fastapi import HTTPException as _HTTPException, status as _status
            from fastapi.staticfiles import StaticFiles as _StaticFiles

            _eager_settings = DebugToolbarSettings(**_toolbar_settings)

            async def _require_show_toolbar(request, call_next=None):
                """Mirror DebugToolbarMiddleware.require_show_toolbar: 404 the
                toolbar API for clients that wouldn't see the toolbar."""
                if not _show_toolbar(request, _eager_settings):
                    raise _HTTPException(status_code=_status.HTTP_404_NOT_FOUND)
                return await _render_panel_view(request)

            app.router.get(
                _eager_settings.API_URL,
                name="debug_toolbar.render_panel",
                include_in_schema=False,
            )(_render_panel_view)
            app.router.mount(
                _eager_settings.STATIC_URL,
                _StaticFiles(packages=["debug_toolbar"]),
                name="debug_toolbar.static",
            )

            app.add_middleware(DebugToolbarMiddleware, **_toolbar_settings)
        except ImportError:
            logger.warning(
                "DEBUG=1 but fastapi-debug-toolbar not installed; toolbar disabled",
            )

    # PostHog HTML snippet injection — must run INSIDE the GZip layer so it
    # sees uncompressed HTML before compression. Starlette runs middleware
    # in reverse-registration order on the response, so registering this
    # before _SelectiveGZipMiddleware places it deeper in the stack and
    # therefore earlier in the response chain. Many of this app's templates
    # are standalone (their own <!DOCTYPE>) and never extend base.html, so
    # a per-template include would miss them; the middleware covers
    # everything in one place. No-op when POSTHOG_API_KEY is unset.
    from app.middleware.posthog_inject import PosthogInjectionMiddleware
    app.add_middleware(PosthogInjectionMiddleware)

    # Compress JSON / HTML responses on the wire. Parquet downloads are
    # excluded — they're already columnar-compressed and re-gzipping them
    # just burns CPU with no size win. minimum_size=1024 keeps tiny
    # responses uncompressed too (cheaper than the header overhead).
    app.add_middleware(
        _SelectiveGZipMiddleware,
        minimum_size=1024,
        skip_prefixes=(
            "/api/data/",
            "/cli/wheel/",
            "/cli/download",
            "/marketplace.git",  # git smart-HTTP is self-chunked; double-gzip bloats
        ),
    )

    # Per-IP rate limiting on auth endpoints (#45). Wired here so the
    # SlowAPIMiddleware sits in the standard middleware chain (above CORS,
    # below GZip — order doesn't affect correctness, only metric/log
    # ordering). The limiter singleton is created at import time in
    # app.auth.rate_limit; we just register state + middleware + handler.
    app.state.limiter = _auth_rate_limiter
    app.add_middleware(_AuthRateLimitMiddleware)
    app.add_exception_handler(_AuthRateLimitExceeded, _auth_rate_limit_handler)

    # Session middleware (required for OAuth state)
    from app.secrets import get_session_secret
    session_secret = get_session_secret()
    if len(session_secret) < 32:
        # Same gate JWT applies (app/auth/jwt.py:_get_secret_key) — keeps the
        # two HMAC surfaces consistent. session_internal_roles + google_groups
        # are trusted off the cookie signature; a weak SESSION_SECRET means
        # those gates are weak too.
        import warnings as _warnings
        _warnings.warn(
            f"SESSION_SECRET is {len(session_secret)} chars — minimum 32 recommended",
            UserWarning, stacklevel=2,
        )
    app.add_middleware(SessionMiddleware, secret_key=session_secret)

    # CORS for CLI and external clients
    cors_origins = os.environ.get("CORS_ORIGINS", "http://localhost:3000,http://localhost:8000").split(",")
    app.add_middleware(
        CORSMiddleware,
        allow_origins=[o.strip() for o in cors_origins],
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )

    # RequestIdMiddleware mounted LAST — Starlette inserts middleware at
    # index 0, so the last add_middleware call ends up OUTERMOST and runs
    # FIRST per request. The request_id ContextVar is set before any
    # downstream middleware or handler runs, and every response gets the
    # x-request-id header.
    app.add_middleware(RequestIdMiddleware)

    # Load .env_overlay (persisted by /api/admin/configure)
    from app.secrets import _state_dir
    _overlay = _state_dir() / ".env_overlay"
    if _overlay.exists():
        for line in _overlay.read_text().splitlines():
            # Strip first so indented comment lines are skipped as well.
            line = line.strip()
            if "=" in line and not line.startswith("#"):
                k, v = line.split("=", 1)
                os.environ.setdefault(k.strip(), v.strip())

    # Load instance config on startup
    try:
        from app.instance_config import load_instance_config
        load_instance_config()
        logger.info("Instance config loaded")
    except Exception as e:
        logger.warning(f"Could not load instance config: {e}")

    # Configure confidence scoring from instance config (corporate_memory.confidence section)
    try:
        from app.instance_config import get_corporate_memory_config
        from services.corporate_memory.confidence import configure as configure_confidence
        cm_config = get_corporate_memory_config()
        if cm_config and "confidence" in cm_config:
            configure_confidence(cm_config["confidence"])
            logger.info("Corporate memory confidence config applied")
    except Exception as e:
        logger.warning(f"Could not configure corporate memory confidence: {e}")

    # Startup banner
    from src.db import SCHEMA_VERSION
    logger.info(
        "Agnes %s | channel: %s | schema v%s",
        os.environ.get("AGNES_VERSION", "dev"),
        os.environ.get("RELEASE_CHANNEL", "dev"),
        SCHEMA_VERSION,
    )

    # LOCAL_DEV_MODE: bypass authentication for local development. DO NOT enable in prod.
    # When on, every protected route auto-logs in as a seeded admin user (default dev@localhost).
    from app.auth.dependencies import (
        is_local_dev_mode, get_local_dev_email, get_local_dev_groups,
    )
    if is_local_dev_mode():
        logger.warning("=" * 60)
        logger.warning("LOCAL_DEV_MODE is ON — authentication is bypassed.")
        logger.warning("All requests auto-authenticate as: %s", get_local_dev_email())
        # Validate + report LOCAL_DEV_GROUPS at startup so a malformed JSON
        # value gets surfaced loudly here instead of silently warning on the
        # first authenticated request. Empty when unset is fine — just say so.
        raw_groups_env = os.environ.get("LOCAL_DEV_GROUPS", "").strip()
        mocked_groups = get_local_dev_groups()
        if raw_groups_env and not mocked_groups:
            logger.warning(
                "LOCAL_DEV_GROUPS is set but produced no valid groups — "
                "check the WARNING above for the parse error.",
            )
        elif mocked_groups:
            logger.warning(
                "LOCAL_DEV_GROUPS: mocking %d group(s) into session: %s",
                len(mocked_groups),
                ", ".join(g["id"] for g in mocked_groups),
            )
        else:
            logger.warning("LOCAL_DEV_GROUPS is unset — session.google_groups will be empty.")
        logger.warning("NEVER enable this in a deployment reachable from the internet.")
        logger.warning("=" * 60)

    # Guardrails misconfig surface — fail-CLOSED matrix means an enabled
    # pipeline with no LLM credentials in env will hold every submission
    # at `pending_llm` indefinitely. Surface this LOUDLY at boot so the
    # operator finds the cause before the submission queue piles up.
    try:
        from app.instance_config import (
            get_guardrails_enabled,
            get_guardrails_llm_provider_ready,
        )
        if get_guardrails_enabled() and not get_guardrails_llm_provider_ready():
            logger.warning("=" * 60)
            logger.warning(
                "GUARDRAILS ENABLED BUT NO LLM PROVIDER CREDENTIALS FOUND.",
            )
            logger.warning(
                "Set ANTHROPIC_API_KEY (or LLM_API_KEY) in the environment, "
                "or disable guardrails in instance.yaml.",
            )
            logger.warning(
                "Until then, every flea-market upload will sit at "
                "status='pending_llm' awaiting admin retry — the LLM "
                "review step cannot run.",
            )
            logger.warning("=" * 60)
    except Exception:
        logger.exception("guardrails readiness probe failed at boot")

    # Static files
    static_dir = Path(__file__).parent / "web" / "static"
    if static_dir.exists():
        app.mount("/static", StaticFiles(directory=str(static_dir)), name="static")

    # v50 admin-uploaded cover images. Lives under ${DATA_DIR}/uploads so
    # it survives across deploys (the app/web/static dir gets bundled into
    # the container image and is treated as read-only). The directory is
    # lazily created by app/api/uploads.py — we mkdir here too so the
    # StaticFiles mount has a real directory on boot even before the first
    # upload (avoids the "directory does not exist" 500 on cold systems).
    from src.db import _get_data_dir as _ddir_uploads
    uploads_dir = _ddir_uploads() / "uploads"
    uploads_dir.mkdir(parents=True, exist_ok=True)
    app.mount(
        "/uploads",
        StaticFiles(directory=str(uploads_dir)),
        name="uploads",
    )

    # Auth providers (conditional registration)
    from app.auth.providers.google import router as google_auth_router, is_available as google_available
    from app.auth.providers.password import router as password_auth_router
    from app.auth.providers.email import router as email_auth_router, is_available as email_available

    # API routers
    app.include_router(auth_router)
    app.include_router(google_auth_router)
    app.include_router(password_auth_router)
    app.include_router(email_auth_router)  # Always register, check availability per-request
    app.include_router(health_router)
    app.include_router(sync_router)
    app.include_router(data_router)
    app.include_router(query_router)
    app.include_router(users_router)
    app.include_router(memory_router)
    app.include_router(upload_router)
    app.include_router(scripts_router)
    app.include_router(settings_router)
    app.include_router(catalog_router)
    app.include_router(telegram_router)
    app.include_router(admin_router)
    app.include_router(admin_bigquery_test_router)
    app.include_router(access_router)
    app.include_router(me_access_router)
    app.include_router(me_router)
    app.include_router(me_stats_router)
    app.include_router(jira_webhooks_router)
    app.include_router(metrics_router)
    app.include_router(metadata_router)
    app.include_router(query_hybrid_router)
    app.include_router(cli_artifacts_router)
    app.include_router(tokens_router)
    app.include_router(tokens_admin_router)
    app.include_router(v2_catalog_router)
    app.include_router(v2_schema_router)
    app.include_router(v2_sample_router)
    app.include_router(v2_scan_router)
    app.include_router(marketplaces_router)
    app.include_router(data_packages_router)
    app.include_router(memory_domains_router)
    app.include_router(recipes_public_router)
    app.include_router(recipes_admin_router)
    app.include_router(memory_domain_suggestions_public_router)
    app.include_router(memory_domain_suggestions_admin_router)
    app.include_router(admin_uploads_router)
    app.include_router(stack_router)
    app.include_router(stack_views_router)
    app.include_router(initial_workspace_router)
    app.include_router(store_router)
    app.include_router(my_stack_router)
    app.include_router(marketplace_router)
    app.include_router(welcome_router)
    app.include_router(claude_md_router)
    app.include_router(news_router)
    app.include_router(cache_warmup_router)
    app.include_router(bq_metadata_refresh_router)
    app.include_router(activity_router)
    app.include_router(observability_router)
    app.include_router(admin_user_sessions_router)
    app.include_router(admin_sessions_router)
    app.include_router(admin_usage_router)
    app.include_router(admin_usage_summary_router)
    app.include_router(marketplace_server_router)

    # Git smart-HTTP endpoint for Claude Code: /marketplace.git/*
    # WSGI → ASGI bridge (dulwich is WSGI-native; FastAPI is ASGI).
    from a2wsgi import WSGIMiddleware
    app.mount("/marketplace.git", WSGIMiddleware(make_git_wsgi_app()))

    # Authenticated Swagger / ReDoc / OpenAPI JSON — requires a valid session
    # so the full admin API surface is not visible to unauthenticated callers.
    # Must be registered before web_router (catch-all). /openapi.json is also
    # added to _API_PATH_PREFIXES below so auth failures return JSON 401
    # rather than an HTML redirect.
    from fastapi.openapi.docs import get_swagger_ui_html, get_redoc_html
    from fastapi.responses import HTMLResponse as _HTMLResponse
    from app.auth.dependencies import get_current_user as _get_current_user

    @app.get("/docs", include_in_schema=False, response_class=_HTMLResponse)
    async def swagger_ui(user: dict = Depends(_get_current_user)):
        return get_swagger_ui_html(openapi_url="/openapi.json", title="Agnes API")

    @app.get("/redoc", include_in_schema=False, response_class=_HTMLResponse)
    async def redoc_ui(user: dict = Depends(_get_current_user)):
        return get_redoc_html(openapi_url="/openapi.json", title="Agnes API — ReDoc")

    @app.get("/openapi.json", include_in_schema=False)
    async def openapi_spec(user: dict = Depends(_get_current_user)):
        return app.openapi()

    # Web UI router (must be last — has catch-all routes)
    app.include_router(web_router)

    # Paths served as API responses (JSON / ZIP / git smart-HTTP) — never
    # redirect a 401 here to the HTML login page; clients expect the raw 401.
    _API_PATH_PREFIXES: tuple[str, ...] = (
        "/api/",
        "/auth/",
        "/cli/",
        "/openapi.json",
        "/webhooks/",
        "/marketplace.zip",
        "/marketplace.git",
        "/marketplace/",
    )

    _ERROR_TITLES = {
        400: "Bad request",
        401: "Sign-in required",
        403: "Forbidden",
        404: "Page not found",
        405: "Method not allowed",
        408: "Request timeout",
        413: "Payload too large",
        422: "Unprocessable entity",
        429: "Too many requests",
        500: "Server error",
        502: "Bad gateway",
        503: "Service unavailable",
        504: "Gateway timeout",
    }

    def _wants_html(request) -> bool:
        """True when the client looks like a browser (non-API path, explicit html).

        We deliberately do NOT treat ``Accept: */*`` (curl's default) or an
        empty Accept header as wanting HTML. curl-using operators were
        getting JSON error bodies for non-API paths before this PR; matching
        ``*/*`` here would silently flip them to HTML and break tooling that
        parses ``{"detail": "..."}``. A real browser sends
        ``Accept: text/html,application/xhtml+xml,...`` so the explicit
        substring check below covers that case.
        Devin ANALYSIS_0003 on PR #136 review.
        """
        if request.url.path.startswith(_API_PATH_PREFIXES):
            return False
        accept = request.headers.get("accept", "")
        return "text/html" in accept

    async def _resolve_error_user(request) -> dict | None:
        """Best-effort user resolution for the error page header.

        Mirrors ``app.auth.dependencies.get_optional_user`` precedence
        (LOCAL_DEV_MODE → seeded dev user, else verify JWT from
        Authorization header or ``access_token`` cookie). Returns None on
        any failure — error page still renders, just without the user menu.
        """
        try:
            from app.auth.dependencies import get_current_user
            from src.db import get_system_db

            conn = get_system_db()
            try:
                authorization = request.headers.get("authorization")
                return await get_current_user(
                    request=request, authorization=authorization, conn=conn
                )
            finally:
                try:
                    conn.close()
                except Exception:
                    pass
        except Exception:
            return None

    async def _render_error(request, code: int, message: str, traceback_str: str | None = None):
        """Render error.html with the same chrome (header, theme, static_url)
        as any other web route. Reuses ``_build_context`` so the page picks up
        ConfigProxy, theme overrides, session user, and ``static_url`` /
        ``url_for`` helpers — without these, base.html + _app_header.html
        silently render empty header/stylesheets."""
        from app.logging_config import request_id_var
        from app.web.router import templates as _web_templates, _build_context

        title = _ERROR_TITLES.get(code, "Error")
        user = await _resolve_error_user(request)
        ctx = _build_context(
            request,
            user=user,
            code=code,
            title=title,
            message=message,
            path=request.url.path,
            traceback=traceback_str,
            request_id=request_id_var.get(),
        )
        return _web_templates.TemplateResponse(request, "error.html", ctx, status_code=code)

    @app.exception_handler(StarletteHTTPException)
    async def _html_auth_redirect_handler(request, exc: StarletteHTTPException):
        """Browser-friendly error rendering for HTML routes; JSON for API routes.

        - 401 GET on a non-API path → redirect to ``/login`` (existing contract).
        - Any other status code on a non-API path with HTML-accepting client →
          render ``error.html`` (toolbar middleware injects panels because the
          ``_catch_all_404`` route at the end of ``app.web.router`` provides a
          matched route for unrouted paths).
        - API prefixes (everything in ``_API_PATH_PREFIXES``, e.g. ``/api/``,
          ``/auth/``, ``/marketplace.zip``, ``/marketplace.git``,
          ``/marketplace/``) and non-HTML clients → JSON
          ``{"detail": "..."}`` per the existing contract.
        """
        path_is_api = request.url.path.startswith(_API_PATH_PREFIXES)

        if (
            exc.status_code == 401
            and request.method == "GET"
            and not path_is_api
        ):
            next_param = quote(request.url.path, safe="")
            return RedirectResponse(url=f"/login?next={next_param}", status_code=302)

        if not path_is_api and _wants_html(request):
            return await _render_error(request, exc.status_code, exc.detail or "")

        from fastapi.exception_handlers import http_exception_handler
        return await http_exception_handler(request, exc)

    @app.exception_handler(Exception)
    async def _unhandled_exception_handler(request, exc: Exception):
        """Catch-all 500 handler — HTML for browsers, JSON for API clients."""
        import os as _os
        import traceback as _tb
        logger.exception("Unhandled exception on %s %s", request.method, request.url.path)

        # Best-effort: forward the exception to PostHog before rendering the
        # error page. Disabled state is a cheap no-op. Wrapped because a
        # tracing failure must never replace the user-visible 500 with a
        # second exception.
        try:
            from src.observability import get_posthog
            from app.logging_config import request_id_var as _rid_var
            get_posthog().capture_exception(
                exc,
                request=request,
                properties={
                    "request_id": _rid_var.get(),
                    "path": request.url.path,
                    "method": request.method,
                },
            )
        except Exception:
            logger.exception("PostHog capture_exception failed in 500 handler")

        path_is_api = request.url.path.startswith(_API_PATH_PREFIXES)
        debug_on = _os.environ.get("DEBUG", "").lower() in ("1", "true", "yes")
        tb_str = _tb.format_exc() if debug_on else None

        if not path_is_api and _wants_html(request):
            # In production (DEBUG unset), never leak str(exc) to the
            # rendered page — exception messages routinely contain DB paths,
            # SQL fragments, internal hostnames, or credentials embedded in
            # connection strings. Match the JSON branch's debug_on guard.
            # Devin BUG_0001 on PR #136 (b1c6ee9 review).
            visible_message = str(exc) if debug_on else "Internal server error"
            return await _render_error(request, 500, visible_message, tb_str)

        from app.logging_config import request_id_var
        from fastapi.responses import JSONResponse
        body: dict[str, str | None] = {
            "detail": "Internal server error",
            "request_id": request_id_var.get(),
        }
        if debug_on:
            body["error"] = str(exc)
        return JSONResponse(body, status_code=500)

    _patch_openapi_auth_errors(app)

    return app


# ---------------------------------------------------------------------------
# OpenAPI schema post-processing
# ---------------------------------------------------------------------------

#: Paths that are intentionally unauthenticated. Every other /api/* route
#: gets 401 and 403 injected into its declared responses so the spec truthfully
#: reflects that auth errors are possible. FastAPI cannot derive these from
#: Depends() chains automatically.
_PUBLIC_API_PATHS = frozenset({
    "/api/health",
    "/api/health/detailed",
    "/api/version",
})

_HTTP_METHODS = frozenset({"get", "post", "put", "delete", "patch"})


def _add_auth_error_responses(schema: dict) -> dict:
    """Inject 401/403 into every protected /api/* operation."""
    _401 = {"description": "Not authenticated"}
    _403 = {"description": "Insufficient permissions"}
    for path, methods in schema.get("paths", {}).items():
        if not path.startswith("/api/") or path in _PUBLIC_API_PATHS:
            continue
        for method, op in methods.items():
            if method not in _HTTP_METHODS:
                continue
            responses = op.setdefault("responses", {})
            responses.setdefault("401", _401)
            responses.setdefault("403", _403)
    return schema


def _patch_openapi_auth_errors(app: "FastAPI") -> None:
    """Wrap app.openapi() to call _add_auth_error_responses on every generation."""
    original = app.openapi

    def patched() -> dict:
        schema = original()
        return _add_auth_error_responses(schema)

    app.openapi = patched  # type: ignore[method-assign]


app = create_app()