From 107195730d1353e8fc1c11c6d63386c927b10044 Mon Sep 17 00:00:00 2001 From: Vojtech <119944107+cvrysanek@users.noreply.github.com> Date: Fri, 8 May 2026 17:57:10 +0400 Subject: [PATCH] feat(observability): optional PostHog integration (#231) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * feat(observability): optional PostHog integration (errors, LLM traces, replay, flags) Off by default. Activates when POSTHOG_API_KEY is set in env. Defaults to PostHog Cloud EU; override host for US Cloud or self-hosted. Coverage: - FastAPI 500 handler captures unhandled exceptions - src/orchestrator.py rebuild + rebuild_source failures - services/scheduler/ HTTP-job failures - cli/main.py uncaught CLI errors (Typer.Exit/SystemExit/KeyboardInterrupt skipped; flushes before re-raise so short-lived CLI invocations don't drop events) - connectors/llm/anthropic_provider.py + openai_compat.py emit $ai_generation events with provider, model, latency, token counts (prompt/completion bodies stay off unless POSTHOG_LLM_PAYLOADS=1 because LLM prompts here routinely include customer SQL/data) - Browser snippet injected into every text/html response by PosthogInjectionMiddleware — registered inside the GZip layer so it sees uncompressed HTML before compression. Many templates are standalone (their own DOCTYPE) and never extend base.html, so a per-template include would miss them. - Frontend: $pageview, $pageleave, JS error capture via window.error and unhandledrejection handlers, masked session replay (maskAllInputs: true plus CSS-selector mask for known data surfaces), feature flags (browser posthog.isFeatureEnabled + server-side feature_enabled with fallback for older SDKs). Identification mode operator-configurable: none / id / email / full. Default email ships user.id + email but never name. CLI entry point moves from cli.main:app to cli.main:main (Typer wrapper). Files: - src/observability/posthog_client.py — lazy singleton, no network when disabled, single-process flush on shutdown - src/observability/llm_tracing.py — trace_generation context manager - app/middleware/posthog_inject.py — HTML rewrite middleware - app/web/templates/_posthog.html — browser snippet template - docs/observability.md — operator guide - config/.env.template — documented POSTHOG_* knobs - tests/test_posthog_disabled.py + tests/test_posthog_client.py + tests/test_llm_tracing.py — 18 tests covering disabled state, identify-mode payloads, $ai_generation shape, error variant. CHANGELOG entry under [Unreleased] Added. * feat(observability): tag every PostHog event with environment + release Splits PostHog dashboards cleanly between localhost / dev / staging / production without manual tagging on every capture call. - POSTHOG_ENVIRONMENT explicit override; auto-resolves to "local" when LOCAL_DEV_MODE=1, else RELEASE_CHANNEL, else AGNES_DEPLOYMENT_ENV, else "unknown". - AGNES_VERSION → RELEASE_CHANNEL fallback feeds the `release` property for "is this error new in this release?" cohorting. - Backend gets both via the PostHog SDK's super_properties constructor arg (every captured event picks them up automatically). - Browser snippet calls posthog.register({environment, release}) inside the loaded callback so $pageview, $exception, autocapture, etc. all carry the same labels. - request.state.user now populated by auth dependencies so the snippet can actually call posthog.identify(user_id, {email}) for logged-in users (previously the user block always resolved to None because nothing wrote to request.state.user). 4 new tests cover env resolution: explicit > LOCAL_DEV_MODE > channel > unknown, plus super-properties forwarding into the SDK constructor. * feat(observability): inline user attrs on every PostHog event + debug throw route PostHog's UI shows person properties on the Person profile page, not inline on each event — so a reviewer triaging an exception couldn't tell which user hit the bug without clicking through. Fix it on both sides. - Backend capture_exception merges user_id / user_email / user_name into the event properties (gated by POSTHOG_IDENTIFY_PII: none/id/email/full). Backed by a new _user_props_for_event helper on PosthogClient. - Browser snippet registers user_id + user_email + user_name as super- properties via posthog.register({...}) so every $exception, $pageview, and custom event coming from posthog.captureException() carries them inline. Mirrors the backend so cross-referencing client/server events doesn't require a person-profile lookup. - /api/debug/throw — debug-only endpoint gated by DEBUG=1 (404 in prod). Runs Depends(get_current_user) first so request.state.user is set when the unhandled-exception handler captures the event. Lets operators exercise the full observability path end-to-end without hand-rolling a TestClient script. Configurable via ?kind=ValueError&msg=... 7 new tests cover: backend user-attr merge across identify modes, anonymous request fall-through, browser snippet super-prop emission for logged-in / anonymous / id-only / full-name cases. * fix(observability): address minasarustamyan PR #231 review Two bugs caught in review. 1. PosthogInjectionMiddleware dropped Response.background on every return path. BaseHTTPMiddleware materialises the body and asks subclasses to return a fresh Response — three paths in dispatch() omitted background=, silently cancelling any BackgroundTask / BackgroundTasks the route attached (audit logging, async webhooks, email sends) with no log line. Fix: route every return through a _passthrough() helper that forwards background. Also adds a _MAX_BUFFER_BYTES (4 MB) cap so a streamed-HTML response can't balloon RSS during buffering. Bigger bodies short-circuit through with a warning rather than being injected. Regression tests in tests/test_posthog_inject_middleware.py exercise four return paths (snippet present, render-fail, double-injection guard, non-HTML passthrough) plus the streaming-guard short-circuit. 2. $ai_input / $ai_output_choices were emitted without truncation, so POSTHOG_LLM_PAYLOADS=1 silently dropped events past PostHog's ~32 KB per-event ingest limit — exactly the calls (large prompts with schemas / sample rows / SQL) an operator would want to inspect. Fix: clip both at POSTHOG_LLM_PAYLOAD_MAX_CHARS (default 30000) with an explicit "…[truncated N chars]" marker so readers don't mistake truncated captures for complete ones. Metadata (provider, model, tokens, latency, error) flows regardless. Three new tests cover default-cap clipping, env-override, and pass-through under the cap. 37 PostHog tests pass. --- CHANGELOG.md | 7 + app/api/health.py | 33 +++ app/auth/dependencies.py | 22 +- app/main.py | 46 +++ app/middleware/posthog_inject.py | 156 ++++++++++ app/web/router.py | 68 +++++ app/web/templates/_posthog.html | 77 +++++ cli/main.py | 34 ++- config/.env.template | 42 +++ connectors/llm/anthropic_provider.py | 25 +- connectors/llm/openai_compat.py | 7 +- docs/observability.md | 127 +++++++++ pyproject.toml | 12 +- services/scheduler/__main__.py | 9 + src/observability/__init__.py | 10 + src/observability/llm_tracing.py | 227 +++++++++++++++ src/observability/posthog_client.py | 361 ++++++++++++++++++++++++ src/orchestrator.py | 25 +- tests/test_llm_tracing.py | 192 +++++++++++++ tests/test_posthog_client.py | 350 +++++++++++++++++++++++ tests/test_posthog_disabled.py | 77 +++++ tests/test_posthog_inject_middleware.py | 180 ++++++++++++ uv.lock | 67 ++++- 23 files changed, 2123 insertions(+), 31 deletions(-) create mode 100644 app/middleware/posthog_inject.py create mode 100644 app/web/templates/_posthog.html create mode 100644 docs/observability.md create mode 100644 src/observability/__init__.py create mode 100644 src/observability/llm_tracing.py create mode 100644 src/observability/posthog_client.py create mode 100644 tests/test_llm_tracing.py create mode 100644 tests/test_posthog_client.py create mode 100644 tests/test_posthog_disabled.py create mode 100644 tests/test_posthog_inject_middleware.py diff --git a/CHANGELOG.md b/CHANGELOG.md index 075f2ff..5f6461e 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -12,6 +12,13 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C ### Added +- **PostHog snippet middleware preserves `Response.background`** on every return path so any `BackgroundTask` / `BackgroundTasks` attached to an HTML route still fires once the integration is enabled (PR #231 review by minasarustamyan). `BaseHTTPMiddleware` materialises the body and asks subclasses to return a fresh `Response`; the previous implementation dropped `background` on three paths, silently cancelling deferred audit logging / async webhooks / email sends with no log line. Also adds a `_MAX_BUFFER_BYTES` (4 MB) cap so a streamed-HTML response can't balloon RSS — bigger bodies short-circuit through with a warning instead of being buffered. Regression tests in `tests/test_posthog_inject_middleware.py` exercise the four return paths plus the streaming guard. +- **`POSTHOG_LLM_PAYLOAD_MAX_CHARS` (default 30000) clips `$ai_input` / `$ai_output_choices`** before they hit PostHog so oversized prompts don't get silently dropped at ingest. PostHog's per-event ceiling is ~32 KB and the SDK does not chunk; Agnes prompts routinely include sample rows / table schemas / analyst SQL that exceed it, and unbounded payloads landed *exactly* the calls operators wanted to inspect on the floor (PR #231 review by minasarustamyan). Truncated payloads carry an explicit `…[truncated N chars]` marker so a reader doesn't mistake them for a complete capture; metadata (provider, model, tokens, latency, error) flows regardless. Override the cap via the env var. +- **PostHog event-level user attributes** so a reviewer reading an event in PostHog sees who the user was inline, without clicking through to the person profile. Backend `capture_exception` merges `user_id` / `user_email` / `user_name` (per `POSTHOG_IDENTIFY_PII`) into the event properties; browser snippet registers the same keys as super-properties via `posthog.register({...})` so every client-side event including `posthog.captureException()` carries them. +- **`/api/debug/throw` debug-only endpoint** for verifying observability wiring end-to-end. Gated by `DEBUG=1` (404 in production), runs after `Depends(get_current_user)` so `request.state.user` is populated, then raises a configurable exception (`?kind=ValueError&msg=…`). Use to confirm PostHog receives the exception with full user context attached, not just `request_id`. +- **PostHog `environment` + `release` super-properties on every event.** Resolved at startup as `POSTHOG_ENVIRONMENT` (explicit) → `local` when `LOCAL_DEV_MODE=1` → `RELEASE_CHANNEL` → `AGNES_DEPLOYMENT_ENV` → `unknown`. Backend events get them via the SDK's `super_properties`; browser events get them via `posthog.register({...})` in the loaded callback. Filtering PostHog dashboards by `environment = production` cleanly hides traffic from developer laptops, CI, and staging deployments. `release` falls back from `AGNES_VERSION` to `RELEASE_CHANNEL`. +- **`request.state.user` populated by auth dependencies** so response-phase middleware (PostHog snippet injector, 500 handler) can identify the actor without re-running the auth dependency. Adds an `_stash_user` helper in `app/auth/dependencies.py` called from every successful resolution path (LOCAL_DEV_MODE seeded user, scheduler shared-secret, PAT/JWT). The browser `posthog.identify(user_id, {email})` call now actually fires for logged-in users. +- **Optional PostHog observability integration.** Off by default; activates only when `POSTHOG_API_KEY` is set in the environment. Covers backend exception capture (FastAPI 500s + `src/orchestrator.py` rebuild failures + `services/scheduler/` HTTP-job failures + `cli/main.py` uncaught CLI errors), LLM call tracing (`$ai_generation` events with provider, model, latency, and token counts; prompt / completion bodies stay off unless `POSTHOG_LLM_PAYLOADS=1` because LLM prompts in this product routinely include customer data), frontend errors + `$pageview` / `$pageleave`, masked session replay (`maskAllInputs: true` plus a CSS-selector mask for known data surfaces), and feature flags (server-side `is_feature_enabled` + browser `posthog.isFeatureEnabled`). Defaults to PostHog Cloud EU (`https://eu.i.posthog.com`) — override with `POSTHOG_HOST` for US Cloud or a self-hosted endpoint. Identification mode is operator-configurable (`none` / `id` / `email` / `full`); default `email` ships `user.id` + email but never name. The browser snippet is injected by an HTML-rewrite middleware (`app/middleware/posthog_inject.py`) so it reaches every `text/html` page including standalone templates that don't extend `base.html` — registered inside the GZip layer so it sees uncompressed HTML before compression. CLI entry point moved from `cli.main:app` to `cli.main:main` (Typer wrapper that captures uncaught exceptions, flushes, and re-raises). New file `src/observability/posthog_client.py` (lazy singleton, no network when disabled), `src/observability/llm_tracing.py` (`$ai_generation` context manager), `app/web/templates/_posthog.html` (browser snippet template). See `docs/observability.md` for the operator guide and `config/.env.template` for the env-var reference. - New `/marketplace` browse page combining curated marketplaces with the community Flea Market in a single discovery + install surface. Three tabs (Curated / Flea / My Stack), per-tab category filter with inline SVG icons (Heroicons MIT, no new dependency, in `src/category_icons.py`), Flea-only type filter, search across both sources with Curated/Flea scope checkboxes, numeric pagination — all with URL state via query string. Detail pages live at `/marketplace/flea/` and `/marketplace/curated//`. Curated detail returns 403 without the RBAC grant. Plugin detail surfaces inner skills/agents as clickable nested cards (`/marketplace/curated///{skill,agent}/`); commands/hooks/MCPs render as plain name lists. Guide pages at `/marketplace/guide/{curated,flea}` host the publication-flow placeholder for full copy to be authored separately. - New REST router under `/api/marketplace` (in `app/api/marketplace.py`): `GET /items` per-tab listing, `GET /categories` per-tab counts, `GET /curated/{slug}/{plugin}` detail, `POST/DELETE /curated/{slug}/{plugin}/install` subscribe/unsubscribe, `GET /curated/{slug}/{plugin}/{skill,agent}/{name}` for inner items. - `marketplace_plugins.created_at` column for "newest first" sorting on `/marketplace`. `MarketplacePluginsRepository.replace_for_marketplace` switched from delete-and-insert to upsert so `created_at` survives across syncs. diff --git a/app/api/health.py b/app/api/health.py index ebed4df..8cfbcf9 100644 --- a/app/api/health.py +++ b/app/api/health.py @@ -443,6 +443,39 @@ async def health_check_detailed( } +@router.get("/api/debug/throw") +async def debug_throw( + user: dict = Depends(get_current_user), + kind: str = "RuntimeError", + msg: str = "intentional debug throw", +): + """Deliberate-crash route for verifying observability wiring. + + Gated by ``DEBUG=1`` — returns 404 in production. Always raises after + the auth dependency resolves, so ``request.state.user`` is populated + by the time the unhandled-exception handler captures the event. Use + to confirm that PostHog receives the exception with full user context + (``distinct_id``, ``user_id``, ``user_email``) and not just + ``request_id``. + + Optional query params let you pick the exception type and message: + /api/debug/throw?kind=ValueError&msg=hello + """ + if os.environ.get("DEBUG", "").strip().lower() not in ("1", "true", "yes", "on"): + from fastapi import HTTPException + raise HTTPException(status_code=404) + + types = { + "RuntimeError": RuntimeError, + "ValueError": ValueError, + "ZeroDivisionError": ZeroDivisionError, + "KeyError": KeyError, + "TypeError": TypeError, + } + cls = types.get(kind, RuntimeError) + raise cls(msg) + + @router.get("/api/version") async def version_info(): """Lightweight version info — cacheable, no DB touch. Used by UI footer badge.""" diff --git a/app/auth/dependencies.py b/app/auth/dependencies.py index 4229c5e..0ab4d4d 100644 --- a/app/auth/dependencies.py +++ b/app/auth/dependencies.py @@ -145,6 +145,22 @@ def _get_local_dev_user(conn: duckdb.DuckDBPyConnection) -> Optional[dict]: return user +def _stash_user(request: Optional[Request], user: dict) -> dict: + """Park the resolved user on ``request.state.user``. + + Read by response-phase middleware (e.g. the PostHog snippet injector + and the 500 handler) so they can identify the actor without re-running + the auth dependency. Tolerant of ``None`` requests (background paths + that call this helper from non-HTTP contexts). + """ + if request is not None: + try: + request.state.user = user + except Exception: + pass + return user + + async def get_current_user( request: Request = None, authorization: Optional[str] = Header(None), @@ -159,7 +175,7 @@ async def get_current_user( user = _get_local_dev_user(conn) if user: _attach_admin_flag(user, conn) - return user + return _stash_user(request, user) # Fall through to normal auth if seed missing — surfaces the bug # instead of hiding it. @@ -188,7 +204,7 @@ async def get_current_user( scheduler_user = get_scheduler_user(conn) if scheduler_user: _attach_admin_flag(scheduler_user, conn) - return scheduler_user + return _stash_user(request, scheduler_user) raise HTTPException( status_code=status.HTTP_401_UNAUTHORIZED, detail="Scheduler user not provisioned", @@ -198,7 +214,7 @@ async def get_current_user( user, reason = resolve_token_to_user(conn, token, request) if user: _attach_admin_flag(user, conn) - return user + return _stash_user(request, user) raise HTTPException( status_code=status.HTTP_401_UNAUTHORIZED, detail=_AUTH_DETAIL_BY_REASON.get(reason, "Invalid or expired token"), diff --git a/app/main.py b/app/main.py index cf6c146..e3b329b 100644 --- a/app/main.py +++ b/app/main.py @@ -152,7 +152,23 @@ async def lifespan(app): from app.api.cache_warmup import maybe_schedule_startup_warmup maybe_schedule_startup_warmup() + # Construct the PostHog client up front so its background flush thread + # starts before the first request — and so a missing/invalid key fails + # loud at boot rather than on first capture. No-op when disabled. + try: + from src.observability import get_posthog + pc = get_posthog() + if pc.enabled: + logger.info("PostHog observability enabled (host=%s, identify=%s, replay=%s)", + pc.host, pc.identify_mode, pc.replay_enabled) + except Exception: + logger.exception("PostHog init at startup failed") yield + try: + from src.observability import get_posthog + get_posthog().shutdown() + except Exception: + logger.exception("PostHog shutdown failed") from src.db import close_system_db close_system_db() @@ -287,6 +303,17 @@ def create_app() -> FastAPI: "DEBUG=1 but fastapi-debug-toolbar not installed; toolbar disabled", ) + # PostHog HTML snippet injection — must run INSIDE the GZip layer so it + # sees uncompressed HTML before compression. Starlette runs middleware + # in reverse-registration order on the response, so registering this + # before _SelectiveGZipMiddleware places it deeper in the stack and + # therefore earlier in the response chain. Many of this app's templates + # are standalone (their own ) and never extend base.html, so + # a per-template include would miss them; the middleware covers + # everything in one place. No-op when POSTHOG_API_KEY is unset. + from app.middleware.posthog_inject import PosthogInjectionMiddleware + app.add_middleware(PosthogInjectionMiddleware) + # Compress JSON / HTML responses on the wire. Parquet downloads are # excluded — they're already columnar-compressed and re-gzipping them # just burns CPU with no size win. minimum_size=1024 keeps tiny @@ -697,6 +724,25 @@ def create_app() -> FastAPI: import traceback as _tb logger.exception("Unhandled exception on %s %s", request.method, request.url.path) + # Best-effort: forward the exception to PostHog before rendering the + # error page. Disabled state is a cheap no-op. Wrapped because a + # tracing failure must never replace the user-visible 500 with a + # second exception. + try: + from src.observability import get_posthog + from app.logging_config import request_id_var as _rid_var + get_posthog().capture_exception( + exc, + request=request, + properties={ + "request_id": _rid_var.get(), + "path": request.url.path, + "method": request.method, + }, + ) + except Exception: + logger.exception("PostHog capture_exception failed in 500 handler") + path_is_api = request.url.path.startswith(_API_PATH_PREFIXES) debug_on = _os.environ.get("DEBUG", "").lower() in ("1", "true", "yes") tb_str = _tb.format_exc() if debug_on else None diff --git a/app/middleware/posthog_inject.py b/app/middleware/posthog_inject.py new file mode 100644 index 0000000..76f1b4f --- /dev/null +++ b/app/middleware/posthog_inject.py @@ -0,0 +1,156 @@ +"""HTML-injection middleware that places the PostHog snippet in every page. + +Many of this app's Jinja templates are standalone (their own ````) and do not extend ``base.html`` / ``base_login.html`` — including +the dashboard, catalog, admin pages, and activity center. Adding +``{% include '_posthog.html' %}`` to each one is fragile and easy to miss. + +Instead, this middleware rewrites every HTML response to inject the +rendered snippet immediately before ````. When PostHog is disabled +(no ``POSTHOG_API_KEY``) the middleware is a no-op. + +Skips: + * Non-HTML responses (everything API, JSON, parquet, CSV). + * Responses larger than ``_MAX_BUFFER_BYTES`` — defends against + genuine HTML streams (rare but legal: large dashboards rendered + as chunked transfer) where buffering the entire body would balloon + memory. Snippet injection is best-effort. + * Responses that already contain ``posthog.init`` (defensive — keeps + base-extending templates from getting a double-injection if a + future change re-includes the partial there). + +Background tasks attached to a route via ``Response.background`` are +preserved on every return path. ``BaseHTTPMiddleware`` materialises the +body and asks subclasses to return a fresh ``Response``; forgetting to +forward ``background`` would silently cancel any deferred work the +handler scheduled (audit logging, async webhooks, deferred email sends), +with no log line. Caught in PR #231 review (minasarustamyan). +""" + +from __future__ import annotations + +import logging +from typing import Awaitable, Callable + +from starlette.middleware.base import BaseHTTPMiddleware +from starlette.requests import Request +from starlette.responses import Response +from starlette.types import ASGIApp + +logger = logging.getLogger(__name__) + + +_HEAD_CLOSE = b"" + +# Hard ceiling on how much body we're willing to buffer in memory just to +# inject ~3 KB of snippet. 4 MB covers every HTML page this app currently +# emits with ample headroom while preventing a pathological streamed-HTML +# response from ballooning RSS. Adjust if a legitimate page exceeds it. +_MAX_BUFFER_BYTES = 4 * 1024 * 1024 + + +def _passthrough(body: bytes, response: Response) -> Response: + """Return a fresh ``Response`` carrying ``body`` plus every attribute of + ``response`` that ``BaseHTTPMiddleware`` would otherwise drop — + importantly ``background`` so any ``BackgroundTask`` / + ``BackgroundTasks`` the handler attached still fires. + """ + return Response( + content=body, + status_code=response.status_code, + headers=dict(response.headers), + media_type=response.media_type, + background=response.background, + ) + + +class PosthogInjectionMiddleware(BaseHTTPMiddleware): + """Inject the PostHog snippet into every HTML response.""" + + def __init__(self, app: ASGIApp) -> None: + super().__init__(app) + + async def dispatch( + self, + request: Request, + call_next: Callable[[Request], Awaitable[Response]], + ) -> Response: + from src.observability import get_posthog + if not get_posthog().enabled: + return await call_next(request) + + response = await call_next(request) + + content_type = response.headers.get("content-type", "") + if "text/html" not in content_type.lower(): + return response + + # Buffer the body. ``BaseHTTPMiddleware`` consumes + # ``response.body_iterator`` here — once we iterate it, the only + # way to forward the response is to return a new one. Bail out + # past ``_MAX_BUFFER_BYTES`` so a streamed HTML response (rare but + # legal) doesn't balloon memory. + chunks: list[bytes] = [] + total = 0 + too_big = False + async for chunk in response.body_iterator: # type: ignore[attr-defined] + buf = chunk if isinstance(chunk, (bytes, bytearray)) else chunk.encode("utf-8") + total += len(buf) + if total > _MAX_BUFFER_BYTES: + too_big = True + # Still need to drain the iterator to avoid breaking the + # ASGI stream contract; but stop appending so we don't + # hold every chunk. + continue + chunks.append(buf) + if too_big: + logger.warning( + "PostHog snippet injection skipped: HTML response > %d bytes (path=%s)", + _MAX_BUFFER_BYTES, request.url.path, + ) + # We've consumed the iterator; rebuild from the chunks we + # captured before the cap. Better to serve a truncated body + # than to crash, but in practice the cap is set so this + # branch shouldn't fire for legitimate pages. + return _passthrough(b"".join(chunks), response) + + body = b"".join(chunks) + + if _HEAD_CLOSE not in body or b"posthog.init" in body: + return _passthrough(body, response) + + try: + snippet = _render_snippet(request) + except Exception: + logger.exception("PostHog snippet render failed; serving response unmodified") + return _passthrough(body, response) + + body = body.replace(_HEAD_CLOSE, snippet.encode("utf-8") + _HEAD_CLOSE, 1) + # content-length must reflect the rewritten body — Starlette's + # ``Response`` sets it for us when we drop the prior header. + new_headers = {k: v for k, v in response.headers.items() if k.lower() != "content-length"} + return Response( + content=body, + status_code=response.status_code, + headers=new_headers, + media_type=response.media_type, + background=response.background, + ) + + +def _render_snippet(request: Request) -> str: + """Render ``_posthog.html`` with the current request's identify state.""" + from app.web.router import templates, _posthog_user_block, _posthog_config_global + + cfg = _posthog_config_global() + user_block = _posthog_user_block(request) + + template = templates.get_template("_posthog.html") + return template.render( + request=request, + posthog_config=cfg, + # ``_posthog.html`` calls ``posthog_user_block(request)`` itself — + # provide the same callable so the template renders identically + # to the inline-include path. + posthog_user_block=lambda _r: user_block, + ) diff --git a/app/web/router.py b/app/web/router.py index e5c91f6..8e80b28 100644 --- a/app/web/router.py +++ b/app/web/router.py @@ -89,6 +89,74 @@ def _humanbytes(value) -> str: templates.env.filters["humanbytes"] = _humanbytes +# ---- PostHog template wiring ---- +# Two Jinja globals injected into every render so the `_posthog.html` partial +# (included from `base.html` and `base_login.html`) can render the browser +# snippet — or render nothing when the integration is disabled. +# +# posthog_config process-level static config (host, project key, +# replay flag, extra mask selector). Resolved +# once on first access. +# posthog_user_block(request) per-request identify payload honoring the +# operator-chosen identify mode. Returns None +# for anonymous renders. +def _posthog_config_global() -> dict: + from src.observability import get_posthog + pc = get_posthog() + if not pc.enabled: + return {"enabled": False} + return { + "enabled": True, + "host": pc.host, + "api_key_public": pc.api_key_public, + "replay_enabled": pc.replay_enabled, + "replay_mask_selector_extra": pc.replay_mask_selector_extra, + "environment": pc.environment, + "release": pc.release, + } + + +def _posthog_user_block(request: Optional[Request]) -> Optional[dict]: + from src.observability import get_posthog + pc = get_posthog() + if not pc.enabled: + return None + mode = pc.identify_mode + if mode == "none": + return None + user = None + if request is not None: + try: + user = getattr(request.state, "user", None) + except Exception: + user = None + if not user: + return None + + def _get(attr: str): + if isinstance(user, dict): + return user.get(attr) + return getattr(user, attr, None) + + distinct_id = _get("id") or _get("user_id") or _get("email") + if not distinct_id: + return None + props: dict = {} + if mode in ("email", "full"): + email = _get("email") + if email: + props["email"] = str(email) + if mode == "full": + name = _get("name") or _get("full_name") + if name: + props["name"] = str(name) + return {"distinct_id": str(distinct_id), "props": props} + + +templates.env.globals["posthog_config"] = _posthog_config_global() +templates.env.globals["posthog_user_block"] = _posthog_user_block + + class _FlexDict(dict): """Dict that returns empty _FlexDict for missing keys and attributes. Prevents Jinja2 UndefinedError when templates access missing nested values.""" diff --git a/app/web/templates/_posthog.html b/app/web/templates/_posthog.html new file mode 100644 index 0000000..2f54381 --- /dev/null +++ b/app/web/templates/_posthog.html @@ -0,0 +1,77 @@ +{# PostHog browser snippet — included from in base.html / base_login.html. + Renders nothing when the integration is disabled (no POSTHOG_API_KEY set on + the server). The `posthog_config` Jinja global is wired up once at app + startup in app/web/router.py from src.observability.get_posthog(). + + Privacy posture: + * Session replay is masked-by-default (`maskAllInputs: true` plus a CSS + selector covering data cells / inputs). Operator can append a custom + selector via POSTHOG_REPLAY_MASK_SELECTOR. + * `person_profiles: 'identified_only'` keeps anonymous visits out of the + people table. + * Identification respects POSTHOG_IDENTIFY_PII (none/id/email/full). +#} +{% if posthog_config and posthog_config.enabled %} + +{% endif %} diff --git a/cli/main.py b/cli/main.py index a3dd3e6..2a73610 100644 --- a/cli/main.py +++ b/cli/main.py @@ -131,7 +131,28 @@ app.add_typer(store_app, name="store") app.add_typer(my_stack_app, name="my-stack") -def _run_with_clean_errors() -> None: +def _capture_cli_exception(exc: BaseException, kind: str) -> None: + """Best-effort PostHog forward for CLI-level errors. No-op when off.""" + try: + from src.observability import get_posthog + argv = sys.argv[1:] + command = argv[0] if argv else "" + get_posthog().capture_exception( + exc, + distinct_id="cli", + properties={ + "component": "cli", + "command": command, + "argv": " ".join(argv)[:512], + "error_kind": kind, + }, + ) + get_posthog().shutdown() + except Exception: + pass # never replace the user-visible error with a tracing failure + + +def main() -> None: """Wrap ``app()`` so AgnesTransportError (and other typed CLI errors) surface as a one-line message + exit, never as a Python traceback. The full traceback is already logged to ``~/.config/agnes/last-error.log`` @@ -140,6 +161,11 @@ def _run_with_clean_errors() -> None: fixing — log + print "internal error" so the analyst doesn't see a Pythonist's traceback either. + Also forwards captured exceptions to PostHog (no-op when disabled) so + operators can see CLI-level failures alongside server-side ones. + Normal control-flow exits (typer.Exit / SystemExit / KeyboardInterrupt) + are never reported. + Pavel's #185 Phase 3B: previously a `httpx.ReadTimeout` from an `agnes query --remote` against a slow BQ view dumped a 30-frame traceback to the analyst's terminal. Now: one clean line + a hint, @@ -148,7 +174,8 @@ def _run_with_clean_errors() -> None: from cli.client import AgnesTransportError, _log_traceback, _LOG_FILE try: app() - except (AgnesTransportError) as exc: + except AgnesTransportError as exc: + _capture_cli_exception(exc, kind="transport") typer.echo(f"Error: {exc.user_message}", err=True) if exc.hint: typer.echo(exc.hint, err=True) @@ -158,6 +185,7 @@ def _run_with_clean_errors() -> None: except (KeyboardInterrupt, SystemExit): raise except Exception as exc: # last-resort net — escaped exceptions are bugs + _capture_cli_exception(exc, kind="unhandled") log = _log_traceback(exc, context="unhandled at CLI top-level") typer.echo( f"Error: internal CLI error ({type(exc).__name__}). " @@ -168,4 +196,4 @@ def _run_with_clean_errors() -> None: if __name__ == "__main__": - _run_with_clean_errors() + main() diff --git a/config/.env.template b/config/.env.template index 2673ad7..0e4e4cf 100644 --- a/config/.env.template +++ b/config/.env.template @@ -113,3 +113,45 @@ SESSION_SECRET= # python -c "import secrets; print(secrets.token_he # reads DEBUG at call time, so the toolbar's mount status is fixed once the # app starts, but per-connection instrumentation respects runtime env changes. # DEBUG=1 + +# === Optional observability: PostHog === +# Off by default. With POSTHOG_API_KEY unset the integration is fully disabled +# (no JS shipped to the browser, no client init, no network). Setting the key +# enables backend exception capture, LLM call tracing ($ai_generation events), +# frontend errors / pageviews, masked session replay, and feature flags. +# Operator guide: docs/observability.md. +# +# POSTHOG_API_KEY must be a PROJECT (publishable, "phc_...") key. The project +# key is embedded in the browser snippet — do NOT use a personal API key here. +# POSTHOG_API_KEY=phc_xxx +# +# Default points at PostHog's EU Cloud endpoint. Override for the US region or +# a self-hosted deployment. +# POSTHOG_HOST=https://eu.i.posthog.com +# +# Identification mode for logged-in users: +# none - never identify; distinct_id is a random cookie +# id - identify by user.id only (no PII) +# email - identify by user.id + email (default) +# full - id + email + name +# POSTHOG_IDENTIFY_PII=email +# +# Disable session replay even when the integration is on (errors / events / +# flags still flow). Default true. +# POSTHOG_REPLAY=true +# +# Append a CSS selector to the default replay mask list. Useful when a custom +# template introduces a new sensitive surface (e.g. .customer-pii). The default +# masks: [data-sensitive], .data-cell, .query-result, .sql-output, code, pre. +# POSTHOG_REPLAY_MASK_SELECTOR= + +# Ship prompt + completion bodies inside $ai_generation events. Off by default +# because LLM prompts in this product routinely include customer SQL / data. +# Token counts and latency always flow regardless. +# POSTHOG_LLM_PAYLOADS=0 + +# Environment label tagged on every captured event (super property). +# Use it in PostHog dashboards to split local / dev / staging / production. +# Resolution order when unset: LOCAL_DEV_MODE=1 -> "local"; else +# RELEASE_CHANNEL value; else AGNES_DEPLOYMENT_ENV; else "unknown". +# POSTHOG_ENVIRONMENT=production diff --git a/connectors/llm/anthropic_provider.py b/connectors/llm/anthropic_provider.py index ee1539c..cb77b2e 100644 --- a/connectors/llm/anthropic_provider.py +++ b/connectors/llm/anthropic_provider.py @@ -125,18 +125,23 @@ class AnthropicExtractor: attempt, MAX_RETRIES, self._model, schema_name, ) + from src.observability import trace_generation + try: - response = self._client.messages.create( - model=self._model, - max_tokens=max_tokens, - messages=[{"role": "user", "content": prompt}], - output_config={ - "format": { - "type": "json_schema", - "schema": _strict_json_schema(json_schema), + with trace_generation(provider="anthropic", model=self._model) as _trace: + _trace.set_input(prompt) + response = self._client.messages.create( + model=self._model, + max_tokens=max_tokens, + messages=[{"role": "user", "content": prompt}], + output_config={ + "format": { + "type": "json_schema", + "schema": _strict_json_schema(json_schema), + }, }, - }, - ) + ) + _trace.set_output_from_anthropic(response) except anthropic.AuthenticationError as e: raise LLMAuthError("Anthropic authentication failed (check API key)") from e except anthropic.RateLimitError as e: diff --git a/connectors/llm/openai_compat.py b/connectors/llm/openai_compat.py index e3579a9..5918cfc 100644 --- a/connectors/llm/openai_compat.py +++ b/connectors/llm/openai_compat.py @@ -262,8 +262,13 @@ class OpenAICompatExtractor: ] kwargs["messages"] = messages + from src.observability import trace_generation + try: - response = self._client.chat.completions.create(**kwargs) + with trace_generation(provider="openai_compat", model=self._model) as _trace: + _trace.set_input(prompt) + response = self._client.chat.completions.create(**kwargs) + _trace.set_output_from_openai(response) except openai.AuthenticationError as e: raise LLMAuthError( f"OpenAI-compat authentication failed at {self._safe_url} (check API key)" diff --git a/docs/observability.md b/docs/observability.md new file mode 100644 index 0000000..247a694 --- /dev/null +++ b/docs/observability.md @@ -0,0 +1,127 @@ +# Observability — PostHog integration + +Optional integration that wires four signals into a single PostHog project: + +1. **Backend exceptions** — every unhandled FastAPI exception, plus rebuild + failures from `src/orchestrator.py` and HTTP-job failures from + `services/scheduler/`. +2. **LLM tracing** — every Anthropic / OpenAI-compat call emits a + `$ai_generation` event with provider, model, latency, and token counts. +3. **Frontend errors + pageviews** — `window.error` / + `unhandledrejection` forwarded via `posthog.captureException`; automatic + `$pageview` and `$pageleave`. +4. **Session replay (masked) + feature flags** — both gated behind the same + single `POSTHOG_API_KEY`. + +The integration ships **off by default**. Setting one environment variable +turns everything on. + +## Enabling the integration + +```bash +# Required — the only switch that controls on/off. +# Use a PROJECT key (publishable phc_…), never a personal API key. +POSTHOG_API_KEY=phc_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +``` + +That's the entire minimum. Defaults will: + +- Send to `https://eu.i.posthog.com` (override with `POSTHOG_HOST`). +- Identify logged-in users by id + email (override with `POSTHOG_IDENTIFY_PII`). +- Record session replay with all inputs and known data surfaces masked + (override with `POSTHOG_REPLAY=false` or + `POSTHOG_REPLAY_MASK_SELECTOR=…`). +- Skip prompt / completion bodies in LLM events; emit token counts + latency + only (override with `POSTHOG_LLM_PAYLOADS=1` if you accept the privacy + trade-off — LLM prompts in this product routinely include customer SQL + and data). + +## All knobs + +| Variable | Default | Notes | +|---|---|---| +| `POSTHOG_API_KEY` | unset | **The on/off switch.** Unset = integration is fully off. Project key only. | +| `POSTHOG_HOST` | `https://eu.i.posthog.com` | Full URL. Use `https://us.i.posthog.com` for the US region or your own host. | +| `POSTHOG_IDENTIFY_PII` | `email` | `none` / `id` / `email` / `full`. | +| `POSTHOG_REPLAY` | `true` | Disable replay only, keeping errors / events / flags. | +| `POSTHOG_REPLAY_MASK_SELECTOR` | empty | CSS selector appended to the default mask list. | +| `POSTHOG_LLM_PAYLOADS` | `0` | `1` adds `$ai_input` + `$ai_output_choices` to LLM events. Off by default. | +| `POSTHOG_ENVIRONMENT` | auto | Tagged on every event as the `environment` super-property. Auto-resolves to `local` when `LOCAL_DEV_MODE=1`, else `RELEASE_CHANNEL`, else `AGNES_DEPLOYMENT_ENV`, else `unknown`. | + +## Splitting traffic by environment + +Every captured event — backend exceptions, `$ai_generation`, browser +`$pageview`, JS errors, custom events — is tagged with two super +properties so PostHog dashboards can slice cleanly: + +- `environment` — resolved at startup (see table above). Operators + typically set this to `local`, `staging`, or `production` explicitly, + or rely on the auto-resolver. +- `release` — the running `AGNES_VERSION`, falling back to + `RELEASE_CHANNEL`. Useful for "is this error new in this release?" + cohorting. + +Both apply to backend events via the SDK's `super_properties` and to +browser events via `posthog.register({...})` in the loaded callback, so +filtering by `environment = production` in PostHog hides every event +generated from a developer laptop, CI, or staging. + +## Privacy posture + +- The PostHog **project key** is publishable — it's safe in browser HTML. + PostHog uses a separate **personal API key** for admin operations. This + integration only ever exposes the project key. Treat the personal key like + any other secret and never set it as `POSTHOG_API_KEY`. +- Session replay defaults: `maskAllInputs: true`, plus a CSS-selector mask + for known data-bearing classes (`.data-cell`, `.query-result`, + `.sql-output`, plain `` and `
`, and any element marked
+  `data-sensitive`). Add your own with `POSTHOG_REPLAY_MASK_SELECTOR`.
+- LLM payloads are **off by default** because the prompts and completions
+  in this product include customer SQL, query results, and table samples.
+  Token counts and latency are always sent (no payload contents in them).
+- `person_profiles: 'identified_only'` — anonymous visits do not create
+  person records.
+
+## Where the events come from
+
+| Event | Code path |
+|---|---|
+| `$exception` (unhandled 500) | `app/main.py:_unhandled_exception_handler` |
+| `$exception` (orchestrator rebuild) | `src/orchestrator.py:_capture_orchestrator_exception` |
+| `$exception` (scheduler job) | `services/scheduler/__main__.py:_call_api` |
+| `$exception` (CLI uncaught) | `cli/main.py:main` |
+| `$ai_generation` | `src/observability/llm_tracing.py:trace_generation` wrapped at `connectors/llm/anthropic_provider.py:_attempt_extraction` and `connectors/llm/openai_compat.py` |
+| `$pageview`, `$pageleave`, JS errors | injected into every `text/html` response by `app/middleware/posthog_inject.py` |
+
+## CLI coverage
+
+The `da` CLI (`cli/main.py:main`) catches every uncaught exception from a
+command, forwards it to PostHog with `component=cli` and the invoked
+command name, then flushes the client before re-raising for Typer's
+default error printer. Normal Typer / Click exits, `SystemExit`, and
+`KeyboardInterrupt` are intentionally skipped.
+
+Operators must surface `POSTHOG_API_KEY` (and any other `POSTHOG_*` knob)
+into the shell that runs `da` — typically by sourcing the same `.env` the
+server uses, or by setting the variable in their shell profile. The CLI
+respects exactly the same env-var contract as the server.
+
+LLM calls made by CLI commands (`da query`, `da explore`, etc.) flow
+through the provider wrappers in `connectors/llm/` and therefore emit
+`$ai_generation` events via the same tracing path the server uses.
+
+## Testing the integration
+
+Boot the app with the key set, hit `/`, then provoke a 500 (e.g. via a
+debug-only route). One **Errors** event should arrive within seconds along
+with one `$pageview` per page load. Open **Session replay** and pick the
+session — every `` should show as a masked rectangle.
+
+The unit tests in `tests/test_posthog_*.py` cover the disabled and enabled
+configurations; `tests/test_llm_tracing.py` exercises the success and error
+variants of the LLM event.
+
+## Self-hosting note
+
+PostHog is itself open source — operators with a self-hosted PostHog instance
+just point `POSTHOG_HOST` at their endpoint. No code changes required.
diff --git a/pyproject.toml b/pyproject.toml
index b182468..a4a78d5 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -77,9 +77,19 @@ dependencies = [
     # SDK stays for the metadata reads.
     "kbcstorage>=0.9.0",
     "sse-starlette>=2.0",
+    # Optional observability — pure-Python, no compilation. Lazily initialized
+    # in src/observability/posthog_client.py and only emits events when
+    # POSTHOG_API_KEY is set in the environment. With the key unset the
+    # integration is fully off (no network, no init). See docs/observability.md.
+    "posthog>=3.7.0",
 ]
 
 [project.optional-dependencies]
+observability = [
+    # Already in base dependencies — listed here so operators who want to
+    # be explicit can `pip install -e ".[observability]"` and signal intent.
+    "posthog>=3.7.0",
+]
 dev = [
     "pytest>=9.0.0",
     "pytest-timeout>=2.0.0",
@@ -96,7 +106,7 @@ dev = [
 ]
 
 [project.scripts]
-agnes = "cli.main:_run_with_clean_errors"
+agnes = "cli.main:main"
 
 [build-system]
 requires = ["hatchling"]
diff --git a/services/scheduler/__main__.py b/services/scheduler/__main__.py
index a915dcb..24f2c4f 100644
--- a/services/scheduler/__main__.py
+++ b/services/scheduler/__main__.py
@@ -194,6 +194,15 @@ def _call_api(endpoint: str, method: str, timeout_sec: int) -> bool:
             return False
     except Exception as e:
         logger.error(f"Job {endpoint} failed: {e}")
+        try:
+            from src.observability import get_posthog
+            get_posthog().capture_exception(
+                e,
+                distinct_id="system",
+                properties={"job": endpoint, "method": method, "component": "scheduler"},
+            )
+        except Exception:
+            logger.exception("PostHog capture_exception failed in scheduler")
         return False
 
 
diff --git a/src/observability/__init__.py b/src/observability/__init__.py
new file mode 100644
index 0000000..eb552a6
--- /dev/null
+++ b/src/observability/__init__.py
@@ -0,0 +1,10 @@
+"""Optional observability layer (PostHog).
+
+Disabled by default. Enabled when ``POSTHOG_API_KEY`` is set in the
+environment. See ``docs/observability.md`` for the operator guide.
+"""
+
+from src.observability.posthog_client import get_posthog, reset_posthog
+from src.observability.llm_tracing import trace_generation
+
+__all__ = ["get_posthog", "reset_posthog", "trace_generation"]
diff --git a/src/observability/llm_tracing.py b/src/observability/llm_tracing.py
new file mode 100644
index 0000000..9aac080
--- /dev/null
+++ b/src/observability/llm_tracing.py
@@ -0,0 +1,227 @@
+"""LLM call instrumentation that emits PostHog ``$ai_generation`` events.
+
+PostHog's LLM Observability product consumes events with a documented
+property schema (``$ai_provider``, ``$ai_model``, ``$ai_input_tokens``,
+``$ai_output_tokens``, ``$ai_latency``, ``$ai_trace_id``, ``$ai_input``,
+``$ai_output_choices``, ``$ai_is_error``).
+
+Use the :func:`trace_generation` context manager around the synchronous
+provider call. The capture object lets the caller record token counts
+and (when ``POSTHOG_LLM_PAYLOADS=1``) prompt/completion content.
+
+Example::
+
+    from src.observability import trace_generation
+
+    with trace_generation(provider="anthropic", model="claude-opus-4") as cap:
+        cap.set_input(prompt)
+        response = client.messages.create(...)
+        cap.set_output_from_anthropic(response)
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+import time
+import uuid
+from contextlib import contextmanager
+from typing import Any, Iterator
+
+from src.observability.posthog_client import get_posthog
+
+logger = logging.getLogger(__name__)
+
+
+# Default character cap for LLM prompt / completion payloads. Sized to
+# leave ~2 KB of headroom under PostHog's ~32 KB per-event ingest limit
+# for the surrounding event envelope (provider, model, tokens, request
+# id, super-properties, etc.). Override via the env var below.
+_DEFAULT_LLM_PAYLOAD_MAX_CHARS = 30_000
+
+
+def _llm_payload_cap() -> int:
+    raw = os.environ.get("POSTHOG_LLM_PAYLOAD_MAX_CHARS", "").strip()
+    if not raw:
+        return _DEFAULT_LLM_PAYLOAD_MAX_CHARS
+    try:
+        value = int(raw)
+    except ValueError:
+        logger.warning(
+            "POSTHOG_LLM_PAYLOAD_MAX_CHARS=%r is not an int; falling back to %d",
+            raw, _DEFAULT_LLM_PAYLOAD_MAX_CHARS,
+        )
+        return _DEFAULT_LLM_PAYLOAD_MAX_CHARS
+    if value <= 0:
+        return _DEFAULT_LLM_PAYLOAD_MAX_CHARS
+    return value
+
+
+def _truncate(value: Any, max_chars: int) -> Any:
+    """Return ``value`` clipped to ``max_chars`` characters as a string.
+
+    Lists / dicts / non-string scalars get ``str()``-converted because
+    PostHog stores event properties as JSON and a multi-megabyte nested
+    structure would just silently get rejected at ingest. Truncated
+    payloads carry an explicit ``…[truncated N chars]`` suffix so a
+    reader doesn't mistake them for a complete capture.
+    """
+    if value is None:
+        return None
+    text = value if isinstance(value, str) else str(value)
+    if len(text) <= max_chars:
+        return text
+    dropped = len(text) - max_chars
+    return text[:max_chars] + f"…[truncated {dropped} chars]"
+
+
+class _Capture:
+    """Per-call mutable bag the wrapped code uses to report token counts.
+
+    All setters are best-effort — exceptions are swallowed so a tracing
+    bug never breaks the LLM call itself.
+    """
+
+    def __init__(self) -> None:
+        self.input_tokens: int | None = None
+        self.output_tokens: int | None = None
+        self.prompt: Any = None
+        self.output: Any = None
+        self.extra: dict[str, Any] = {}
+
+    def set_input(self, prompt: Any) -> None:
+        try:
+            self.prompt = prompt
+        except Exception:
+            pass
+
+    def set_output(self, output: Any) -> None:
+        try:
+            self.output = output
+        except Exception:
+            pass
+
+    def set_tokens(self, input_tokens: int | None, output_tokens: int | None) -> None:
+        try:
+            self.input_tokens = input_tokens
+            self.output_tokens = output_tokens
+        except Exception:
+            pass
+
+    def set_output_from_anthropic(self, response: Any) -> None:
+        """Pull token counts + completion text from an Anthropic response.
+
+        Tolerates SDK version differences and partial responses.
+        """
+        try:
+            usage = getattr(response, "usage", None)
+            if usage is not None:
+                self.input_tokens = getattr(usage, "input_tokens", None)
+                self.output_tokens = getattr(usage, "output_tokens", None)
+            content = getattr(response, "content", None)
+            if content is not None:
+                # Anthropic returns a list of blocks; capture text payloads.
+                texts: list[str] = []
+                for block in content:
+                    text = getattr(block, "text", None)
+                    if isinstance(text, str):
+                        texts.append(text)
+                if texts:
+                    self.output = texts if len(texts) > 1 else texts[0]
+        except Exception:
+            logger.debug("set_output_from_anthropic: extraction failed", exc_info=True)
+
+    def set_output_from_openai(self, response: Any) -> None:
+        """Pull token counts + completion text from an OpenAI-compat response."""
+        try:
+            usage = getattr(response, "usage", None)
+            if usage is not None:
+                self.input_tokens = getattr(usage, "prompt_tokens", None)
+                self.output_tokens = getattr(usage, "completion_tokens", None)
+            choices = getattr(response, "choices", None)
+            if choices:
+                first = choices[0]
+                msg = getattr(first, "message", None)
+                if msg is not None:
+                    self.output = getattr(msg, "content", None)
+        except Exception:
+            logger.debug("set_output_from_openai: extraction failed", exc_info=True)
+
+
+class _Noop:
+    """Same surface as :class:`_Capture`; all setters drop on the floor."""
+
+    def set_input(self, prompt: Any) -> None: ...
+    def set_output(self, output: Any) -> None: ...
+    def set_tokens(self, input_tokens: int | None, output_tokens: int | None) -> None: ...
+    def set_output_from_anthropic(self, response: Any) -> None: ...
+    def set_output_from_openai(self, response: Any) -> None: ...
+
+    extra: dict[str, Any] = {}
+
+
+@contextmanager
+def trace_generation(
+    provider: str,
+    model: str,
+    distinct_id: str | None = None,
+    parent_trace_id: str | None = None,
+) -> Iterator[Any]:
+    """Wrap an LLM call and emit one ``$ai_generation`` event on exit.
+
+    Yields a capture object; the caller fills it in from the response.
+    Disabled state yields a :class:`_Noop`. Exceptions in the wrapped
+    block are re-raised after emitting an error variant of the event.
+    """
+    pc = get_posthog()
+    if not pc.enabled:
+        yield _Noop()
+        return
+
+    cap = _Capture()
+    trace_id = parent_trace_id or uuid.uuid4().hex
+    started = time.perf_counter()
+    error: BaseException | None = None
+    try:
+        yield cap
+    except BaseException as exc:
+        error = exc
+        raise
+    finally:
+        latency_s = time.perf_counter() - started
+        props: dict[str, Any] = {
+            "$ai_provider": provider,
+            "$ai_model": model,
+            "$ai_trace_id": trace_id,
+            "$ai_latency": latency_s,
+        }
+        if cap.input_tokens is not None:
+            props["$ai_input_tokens"] = cap.input_tokens
+        if cap.output_tokens is not None:
+            props["$ai_output_tokens"] = cap.output_tokens
+        if pc.llm_payloads_enabled:
+            # PostHog drops events past its per-event ingest size limit
+            # (~32 KB by default; the SDK does not chunk and 413 responses
+            # are best-effort log lines only — oversized captures land on
+            # the floor with no signal). Agnes prompts routinely include
+            # sample rows / table schemas / analyst SQL that exceed this
+            # limit, which is *exactly* when an operator wants to inspect
+            # them. Truncate so the metadata (provider, model, tokens,
+            # latency, error) keeps flowing while bounding payload size.
+            # PR #231 review (minasarustamyan).
+            #
+            # Cap is overridable via ``POSTHOG_LLM_PAYLOAD_MAX_CHARS``; the
+            # default of 30_000 leaves headroom under the 32 KB ceiling
+            # for the rest of the event envelope.
+            cap_chars = _llm_payload_cap()
+            if cap.prompt is not None:
+                props["$ai_input"] = _truncate(cap.prompt, cap_chars)
+            if cap.output is not None:
+                props["$ai_output_choices"] = _truncate(cap.output, cap_chars)
+        for key, value in cap.extra.items():
+            props.setdefault(key, value)
+        if error is not None:
+            props["$ai_is_error"] = True
+            props["$ai_error"] = repr(error)
+
+        pc.capture("$ai_generation", distinct_id or "system", props)
diff --git a/src/observability/posthog_client.py b/src/observability/posthog_client.py
new file mode 100644
index 0000000..541d593
--- /dev/null
+++ b/src/observability/posthog_client.py
@@ -0,0 +1,361 @@
+"""PostHog client wrapper — env-gated, lazy, no-op when disabled.
+
+The integration is **off by default**. It activates only when the
+``POSTHOG_API_KEY`` environment variable holds a non-empty PostHog
+project key (the publishable ``phc_…`` key, never a personal API key).
+
+When disabled, every public method is a cheap no-op and the underlying
+``posthog`` package's ``Posthog(...)`` client is never instantiated, so
+no background flush thread starts and no network calls are made.
+
+Configuration (environment variables):
+
+    POSTHOG_API_KEY            phc_… project key. Unset = integration off.
+    POSTHOG_HOST               default ``https://eu.i.posthog.com``.
+    POSTHOG_IDENTIFY_PII       ``none`` | ``id`` | ``email`` | ``full``
+                               (default ``email``).
+    POSTHOG_REPLAY             ``true`` (default) | ``false`` — gates the
+                               JS-side ``session_recording`` opt-in.
+    POSTHOG_LLM_PAYLOADS       ``1`` ships prompt/completion bodies inside
+                               ``$ai_generation`` events. Default off — the
+                               LLM-tracing helper still emits the event with
+                               token counts and latency.
+    POSTHOG_REPLAY_MASK_SELECTOR
+                               extra CSS selector appended to the default
+                               replay mask list.
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+import threading
+from typing import Any
+
+logger = logging.getLogger(__name__)
+
+
+_VALID_IDENTIFY_MODES = ("none", "id", "email", "full")
+
+
+def _bool_env(name: str, default: bool) -> bool:
+    raw = os.environ.get(name)
+    if raw is None:
+        return default
+    return raw.strip().lower() in ("1", "true", "yes", "on")
+
+
+def _resolve_environment() -> str:
+    """Pick the environment label attached to every captured event.
+
+    Resolution order:
+        1. ``POSTHOG_ENVIRONMENT`` — explicit operator override.
+        2. ``local`` when ``LOCAL_DEV_MODE`` is on (dev laptops).
+        3. ``RELEASE_CHANNEL`` (the existing channel marker — typically
+           ``stable`` for production tags, ``dev`` for branch builds).
+        4. ``AGNES_DEPLOYMENT_ENV`` (free-form sister variable some
+           operator playbooks set).
+        5. ``unknown`` — final fallback so a missing label never silently
+           pollutes the production view.
+    """
+    explicit = os.environ.get("POSTHOG_ENVIRONMENT", "").strip()
+    if explicit:
+        return explicit
+    if os.environ.get("LOCAL_DEV_MODE", "").strip().lower() in ("1", "true", "yes", "on"):
+        return "local"
+    channel = os.environ.get("RELEASE_CHANNEL", "").strip()
+    if channel:
+        return channel
+    deployment = os.environ.get("AGNES_DEPLOYMENT_ENV", "").strip()
+    if deployment:
+        return deployment
+    return "unknown"
+
+
+class PosthogClient:
+    """Single-process PostHog client.
+
+    Construct via :func:`get_posthog`; do not instantiate directly outside
+    of tests.
+    """
+
+    def __init__(self) -> None:
+        api_key = os.environ.get("POSTHOG_API_KEY", "").strip()
+        self._enabled = bool(api_key)
+        self._api_key = api_key
+        self._host = os.environ.get("POSTHOG_HOST", "https://eu.i.posthog.com").strip() or "https://eu.i.posthog.com"
+
+        identify_mode = os.environ.get("POSTHOG_IDENTIFY_PII", "email").strip().lower()
+        if identify_mode not in _VALID_IDENTIFY_MODES:
+            logger.warning(
+                "POSTHOG_IDENTIFY_PII=%r is invalid; falling back to 'email'. "
+                "Valid: %s.",
+                identify_mode,
+                ", ".join(_VALID_IDENTIFY_MODES),
+            )
+            identify_mode = "email"
+        self._identify_mode = identify_mode
+
+        self._replay_enabled = _bool_env("POSTHOG_REPLAY", True)
+        self._llm_payloads_enabled = _bool_env("POSTHOG_LLM_PAYLOADS", False)
+        self._replay_extra_mask = os.environ.get("POSTHOG_REPLAY_MASK_SELECTOR", "").strip()
+        self._environment = _resolve_environment()
+        self._release = os.environ.get("AGNES_VERSION", "").strip() or os.environ.get("RELEASE_CHANNEL", "").strip() or None
+
+        self._client: Any = None
+
+        if not self._enabled:
+            return
+
+        try:
+            from posthog import Posthog
+        except ImportError:  # pragma: no cover — posthog is in base deps
+            logger.warning("POSTHOG_API_KEY is set but the `posthog` package is not installed; disabling integration.")
+            self._enabled = False
+            return
+
+        super_props: dict = {"environment": self._environment}
+        if self._release:
+            super_props["release"] = self._release
+
+        try:
+            self._client = Posthog(
+                project_api_key=api_key,
+                host=self._host,
+                feature_flags_request_timeout_seconds=2,
+                super_properties=super_props,
+            )
+        except Exception:
+            logger.exception("PostHog client init failed; disabling integration.")
+            self._enabled = False
+            self._client = None
+
+    # ----- introspection -----
+
+    @property
+    def enabled(self) -> bool:
+        return self._enabled
+
+    @property
+    def host(self) -> str:
+        return self._host
+
+    @property
+    def api_key_public(self) -> str:
+        """The project key. Safe to embed in browser-served HTML."""
+        return self._api_key
+
+    @property
+    def identify_mode(self) -> str:
+        return self._identify_mode
+
+    @property
+    def replay_enabled(self) -> bool:
+        return self._replay_enabled
+
+    @property
+    def llm_payloads_enabled(self) -> bool:
+        return self._llm_payloads_enabled
+
+    @property
+    def replay_mask_selector_extra(self) -> str:
+        return self._replay_extra_mask
+
+    @property
+    def environment(self) -> str:
+        return self._environment
+
+    @property
+    def release(self) -> str | None:
+        return self._release
+
+    # ----- capture API -----
+
+    def capture(self, event: str, distinct_id: str, properties: dict | None = None) -> None:
+        if not self._enabled or self._client is None:
+            return
+        try:
+            self._client.capture(
+                distinct_id=distinct_id,
+                event=event,
+                properties=properties or {},
+            )
+        except Exception:
+            logger.exception("PostHog capture failed (event=%s)", event)
+
+    def capture_exception(
+        self,
+        exc: BaseException,
+        distinct_id: str | None = None,
+        request: Any = None,
+        properties: dict | None = None,
+    ) -> None:
+        if not self._enabled or self._client is None:
+            return
+        props: dict = dict(properties or {})
+        if request is not None:
+            try:
+                props.setdefault("path", str(request.url.path))
+                props.setdefault("method", str(request.method))
+            except Exception:
+                pass
+            # Attach user attributes to the exception event itself so a
+            # reviewer can see *who* the user was inline on the event,
+            # without clicking through to the person profile. Honors
+            # POSTHOG_IDENTIFY_PII (none / id / email / full).
+            user_props = self._user_props_for_event(request)
+            for key, value in user_props.items():
+                props.setdefault(key, value)
+            if distinct_id is None:
+                distinct_id = self._distinct_id_from_request(request)
+
+        if distinct_id is None:
+            distinct_id = "anonymous"
+
+        try:
+            # PostHog SDK ≥ 3.7 exposes ``capture_exception``. Older
+            # builds don't — fall back to a manual ``$exception`` event.
+            cap = getattr(self._client, "capture_exception", None)
+            if callable(cap):
+                cap(exc, distinct_id=distinct_id, properties=props)
+            else:  # pragma: no cover — only triggered on old SDK
+                self._client.capture(
+                    distinct_id=distinct_id,
+                    event="$exception",
+                    properties={
+                        **props,
+                        "$exception_type": type(exc).__name__,
+                        "$exception_message": str(exc),
+                    },
+                )
+        except Exception:
+            logger.exception("PostHog capture_exception failed")
+
+    # ----- feature flags -----
+
+    def is_feature_enabled(self, key: str, distinct_id: str, default: bool = False) -> bool:
+        if not self._enabled or self._client is None:
+            return default
+        # PostHog SDK 3.x exposed ``is_feature_enabled``; 7.x renamed it to
+        # ``feature_enabled``. Try the new name first, fall back to the old.
+        method = getattr(self._client, "feature_enabled", None) or getattr(
+            self._client, "is_feature_enabled", None
+        )
+        if method is None:
+            return default
+        try:
+            value = method(key, distinct_id)
+            return bool(value) if value is not None else default
+        except Exception:
+            logger.exception("PostHog feature_enabled failed (key=%s)", key)
+            return default
+
+    def get_feature_flag_payload(self, key: str, distinct_id: str) -> Any:
+        if not self._enabled or self._client is None:
+            return None
+        try:
+            return self._client.get_feature_flag_payload(key, distinct_id)
+        except Exception:
+            logger.exception("PostHog get_feature_flag_payload failed (key=%s)", key)
+            return None
+
+    # ----- lifecycle -----
+
+    def shutdown(self) -> None:
+        if not self._enabled or self._client is None:
+            return
+        try:
+            self._client.shutdown()
+        except Exception:
+            logger.exception("PostHog shutdown failed")
+
+    # ----- helpers -----
+
+    def _user_props_for_event(self, request: Any) -> dict:
+        """Return the user attributes attached to a captured event.
+
+        Mirrors :func:`app.web.router._posthog_user_block` but for
+        backend-emitted events. Respects ``POSTHOG_IDENTIFY_PII``:
+
+            none  -> {}
+            id    -> {user_id}
+            email -> {user_id, user_email}
+            full  -> {user_id, user_email, user_name}
+
+        Keys are namespaced with ``user_`` so they don't collide with
+        anything PostHog inserts on its side. Returns ``{}`` for
+        anonymous requests (no ``request.state.user``) or when the
+        identify mode is ``none``.
+        """
+        if self._identify_mode == "none":
+            return {}
+        try:
+            user = getattr(request.state, "user", None)
+        except Exception:
+            user = None
+        if not user:
+            return {}
+
+        def _get(attr: str):
+            if isinstance(user, dict):
+                return user.get(attr)
+            return getattr(user, attr, None)
+
+        out: dict = {}
+        uid = _get("id") or _get("user_id")
+        if uid:
+            out["user_id"] = str(uid)
+        if self._identify_mode in ("email", "full"):
+            email = _get("email")
+            if email:
+                out["user_email"] = str(email)
+        if self._identify_mode == "full":
+            name = _get("name") or _get("full_name")
+            if name:
+                out["user_name"] = str(name)
+        return out
+
+    @staticmethod
+    def _distinct_id_from_request(request: Any) -> str | None:
+        try:
+            user = getattr(request.state, "user", None)
+        except Exception:
+            user = None
+        if user is None:
+            return None
+        for attr in ("id", "user_id", "email"):
+            value = getattr(user, attr, None)
+            if value:
+                return str(value)
+        if isinstance(user, dict):
+            for key in ("id", "user_id", "email"):
+                if user.get(key):
+                    return str(user[key])
+        return None
+
+
+_singleton_lock = threading.Lock()
+_singleton: PosthogClient | None = None
+
+
+def get_posthog() -> PosthogClient:
+    """Return the process-wide :class:`PosthogClient`, constructing on first call."""
+    global _singleton
+    if _singleton is not None:
+        return _singleton
+    with _singleton_lock:
+        if _singleton is None:
+            _singleton = PosthogClient()
+    return _singleton
+
+
+def reset_posthog() -> None:
+    """Drop the singleton (test hook only)."""
+    global _singleton
+    with _singleton_lock:
+        if _singleton is not None:
+            try:
+                _singleton.shutdown()
+            except Exception:
+                pass
+        _singleton = None
diff --git a/src/orchestrator.py b/src/orchestrator.py
index 32d9d46..1e95556 100644
--- a/src/orchestrator.py
+++ b/src/orchestrator.py
@@ -48,6 +48,19 @@ logger = logging.getLogger(__name__)
 
 _rebuild_lock = threading.Lock()
 
+
+def _capture_orchestrator_exception(exc: BaseException, **props) -> None:
+    """Best-effort PostHog forward for rebuild failures. No-op when disabled."""
+    try:
+        from src.observability import get_posthog
+        get_posthog().capture_exception(
+            exc,
+            distinct_id="system",
+            properties={"component": "orchestrator", **props},
+        )
+    except Exception:
+        logger.debug("PostHog capture_exception failed in orchestrator", exc_info=True)
+
 # Identifier validation lives in src/identifier_validation.py so the
 # orchestrator and the extractors share the same regex (#81 Group D).
 # The local names are kept as aliases so existing call sites need no
@@ -101,12 +114,20 @@ class SyncOrchestrator:
         Returns: {source_name: [table_names]} for logging.
         """
         with _rebuild_lock:
-            return self._do_rebuild()
+            try:
+                return self._do_rebuild()
+            except Exception as exc:
+                _capture_orchestrator_exception(exc, op="rebuild")
+                raise
 
     def rebuild_source(self, source_name: str) -> List[str]:
         """Rebuild views from a single source (e.g. after Jira webhook)."""
         with _rebuild_lock:
-            return self._do_rebuild_source(source_name)
+            try:
+                return self._do_rebuild_source(source_name)
+            except Exception as exc:
+                _capture_orchestrator_exception(exc, op="rebuild_source", source=source_name)
+                raise
 
     def _scan_meta_pairs(self, extracts_dir: Path) -> tuple:
         """Read every connector's `_meta` and return (pairs, clean) where:
diff --git a/tests/test_llm_tracing.py b/tests/test_llm_tracing.py
new file mode 100644
index 0000000..18142bd
--- /dev/null
+++ b/tests/test_llm_tracing.py
@@ -0,0 +1,192 @@
+"""LLM tracing emits well-formed $ai_generation events."""
+
+from __future__ import annotations
+
+from types import SimpleNamespace
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+
+@pytest.fixture
+def enabled_posthog(monkeypatch):
+    monkeypatch.setenv("POSTHOG_API_KEY", "phc_x")
+    monkeypatch.delenv("POSTHOG_LLM_PAYLOADS", raising=False)
+    from src.observability import reset_posthog
+    reset_posthog()
+    yield
+    reset_posthog()
+
+
+def test_success_emits_ai_generation_with_token_counts(enabled_posthog):
+    sdk = MagicMock()
+    with patch("posthog.Posthog", return_value=sdk):
+        from src.observability import trace_generation
+
+        with trace_generation(provider="anthropic", model="claude-test", distinct_id="u-1") as t:
+            t.set_input("hello")
+            t.set_tokens(input_tokens=5, output_tokens=10)
+
+        # The wrapper calls sdk.capture exactly once.
+        sdk.capture.assert_called_once()
+        kwargs = sdk.capture.call_args.kwargs
+        assert kwargs["event"] == "$ai_generation"
+        assert kwargs["distinct_id"] == "u-1"
+        props = kwargs["properties"]
+        assert props["$ai_provider"] == "anthropic"
+        assert props["$ai_model"] == "claude-test"
+        assert props["$ai_input_tokens"] == 5
+        assert props["$ai_output_tokens"] == 10
+        assert "$ai_latency" in props
+        assert "$ai_trace_id" in props
+        # Payloads off by default — neither input nor output bodies leak.
+        assert "$ai_input" not in props
+        assert "$ai_output_choices" not in props
+        assert "$ai_is_error" not in props
+
+
+def test_payloads_flag_enables_prompt_and_completion(enabled_posthog, monkeypatch):
+    monkeypatch.setenv("POSTHOG_LLM_PAYLOADS", "1")
+    from src.observability import reset_posthog
+    reset_posthog()
+    sdk = MagicMock()
+    with patch("posthog.Posthog", return_value=sdk):
+        from src.observability import trace_generation
+
+        with trace_generation(provider="openai_compat", model="gpt-x") as t:
+            t.set_input("the prompt")
+            t.set_output("the completion")
+            t.set_tokens(input_tokens=1, output_tokens=2)
+
+        kwargs = sdk.capture.call_args.kwargs
+        props = kwargs["properties"]
+        assert props["$ai_input"] == "the prompt"
+        assert props["$ai_output_choices"] == "the completion"
+
+
+def test_exception_emits_error_event_and_reraises(enabled_posthog):
+    sdk = MagicMock()
+    with patch("posthog.Posthog", return_value=sdk):
+        from src.observability import trace_generation
+
+        with pytest.raises(RuntimeError, match="api down"):
+            with trace_generation(provider="anthropic", model="claude-test") as t:
+                t.set_input("x")
+                raise RuntimeError("api down")
+
+        sdk.capture.assert_called_once()
+        props = sdk.capture.call_args.kwargs["properties"]
+        assert props["$ai_is_error"] is True
+        assert "api down" in props["$ai_error"]
+        assert props["$ai_provider"] == "anthropic"
+        assert "$ai_latency" in props
+
+
+def test_set_output_from_anthropic_extracts_tokens(enabled_posthog):
+    sdk = MagicMock()
+    with patch("posthog.Posthog", return_value=sdk):
+        from src.observability import trace_generation
+
+        # Build a fake Anthropic response object.
+        block = SimpleNamespace(text="some output text")
+        response = SimpleNamespace(
+            usage=SimpleNamespace(input_tokens=11, output_tokens=22),
+            content=[block],
+        )
+
+        with trace_generation(provider="anthropic", model="claude-test") as t:
+            t.set_output_from_anthropic(response)
+
+        props = sdk.capture.call_args.kwargs["properties"]
+        assert props["$ai_input_tokens"] == 11
+        assert props["$ai_output_tokens"] == 22
+
+
+def test_payload_truncation_under_default_cap(enabled_posthog, monkeypatch):
+    """Oversized prompt/output gets clipped so PostHog doesn't drop the event.
+
+    Agnes ships LLM prompts containing sample rows / SQL that routinely
+    exceed PostHog's ~32 KB per-event ingest cap. Without truncation the
+    interesting events vanish silently. PR #231 review (minasarustamyan).
+    """
+    monkeypatch.setenv("POSTHOG_LLM_PAYLOADS", "1")
+    from src.observability import reset_posthog
+    reset_posthog()
+
+    big_prompt = "P" * 50_000
+    big_output = "O" * 50_000
+    sdk = MagicMock()
+    with patch("posthog.Posthog", return_value=sdk):
+        from src.observability import trace_generation
+        with trace_generation(provider="anthropic", model="claude-x") as t:
+            t.set_input(big_prompt)
+            t.set_output(big_output)
+            t.set_tokens(input_tokens=1, output_tokens=2)
+
+        props = sdk.capture.call_args.kwargs["properties"]
+        assert len(props["$ai_input"]) < len(big_prompt)
+        assert len(props["$ai_output_choices"]) < len(big_output)
+        # Truncation marker present so reader knows it was clipped.
+        assert "[truncated " in props["$ai_input"]
+        assert "[truncated " in props["$ai_output_choices"]
+        # Cap stays well under PostHog's ~32 KB per-event limit.
+        assert len(props["$ai_input"]) < 32_000
+        assert len(props["$ai_output_choices"]) < 32_000
+
+
+def test_payload_truncation_respects_env_override(enabled_posthog, monkeypatch):
+    monkeypatch.setenv("POSTHOG_LLM_PAYLOADS", "1")
+    monkeypatch.setenv("POSTHOG_LLM_PAYLOAD_MAX_CHARS", "100")
+    from src.observability import reset_posthog
+    reset_posthog()
+
+    sdk = MagicMock()
+    with patch("posthog.Posthog", return_value=sdk):
+        from src.observability import trace_generation
+        with trace_generation(provider="anthropic", model="claude-x") as t:
+            t.set_input("X" * 500)
+            t.set_output("Y" * 500)
+
+        props = sdk.capture.call_args.kwargs["properties"]
+        # Cap honored — first 100 chars then the marker.
+        assert props["$ai_input"].startswith("X" * 100)
+        assert props["$ai_input"].endswith("[truncated 400 chars]")
+
+
+def test_payload_under_cap_is_passed_through_unchanged(enabled_posthog, monkeypatch):
+    monkeypatch.setenv("POSTHOG_LLM_PAYLOADS", "1")
+    from src.observability import reset_posthog
+    reset_posthog()
+
+    sdk = MagicMock()
+    with patch("posthog.Posthog", return_value=sdk):
+        from src.observability import trace_generation
+        small = "tiny prompt"
+        with trace_generation(provider="anthropic", model="claude-x") as t:
+            t.set_input(small)
+            t.set_output(small)
+
+        props = sdk.capture.call_args.kwargs["properties"]
+        assert props["$ai_input"] == small
+        assert props["$ai_output_choices"] == small
+        assert "[truncated" not in props["$ai_input"]
+
+
+def test_set_output_from_openai_extracts_tokens(enabled_posthog):
+    sdk = MagicMock()
+    with patch("posthog.Posthog", return_value=sdk):
+        from src.observability import trace_generation
+
+        msg = SimpleNamespace(content="hi")
+        choice = SimpleNamespace(message=msg)
+        response = SimpleNamespace(
+            usage=SimpleNamespace(prompt_tokens=3, completion_tokens=7),
+            choices=[choice],
+        )
+
+        with trace_generation(provider="openai_compat", model="gpt-x") as t:
+            t.set_output_from_openai(response)
+
+        props = sdk.capture.call_args.kwargs["properties"]
+        assert props["$ai_input_tokens"] == 3
+        assert props["$ai_output_tokens"] == 7
diff --git a/tests/test_posthog_client.py b/tests/test_posthog_client.py
new file mode 100644
index 0000000..2df1ce7
--- /dev/null
+++ b/tests/test_posthog_client.py
@@ -0,0 +1,350 @@
+"""PostHog client behavior when POSTHOG_API_KEY is set.
+
+The underlying ``posthog.Posthog`` class is patched so the suite runs
+without a network. We assert on the calls our wrapper forwards, plus
+shape of the identify-mode payloads.
+"""
+
+from __future__ import annotations
+
+import os
+from types import SimpleNamespace
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+
+@pytest.fixture
+def posthog_env(monkeypatch):
+    """Set up POSTHOG_API_KEY and reset the singleton."""
+    monkeypatch.setenv("POSTHOG_API_KEY", "phc_test_key")
+    monkeypatch.delenv("POSTHOG_HOST", raising=False)
+    monkeypatch.delenv("POSTHOG_IDENTIFY_PII", raising=False)
+    monkeypatch.delenv("POSTHOG_REPLAY", raising=False)
+    monkeypatch.delenv("POSTHOG_LLM_PAYLOADS", raising=False)
+    from src.observability import reset_posthog
+    reset_posthog()
+    yield
+    reset_posthog()
+
+
+def test_enabled_when_key_set(posthog_env):
+    with patch("posthog.Posthog") as posthog_ctor:
+        from src.observability import get_posthog
+
+        pc = get_posthog()
+
+        assert pc.enabled is True
+        assert pc.host == "https://eu.i.posthog.com"
+        assert pc.identify_mode == "email"
+        assert pc.replay_enabled is True
+        assert pc.llm_payloads_enabled is False
+        posthog_ctor.assert_called_once()
+        kwargs = posthog_ctor.call_args.kwargs
+        assert kwargs["project_api_key"] == "phc_test_key"
+        assert kwargs["host"] == "https://eu.i.posthog.com"
+
+
+def test_host_override(monkeypatch):
+    monkeypatch.setenv("POSTHOG_API_KEY", "phc_x")
+    monkeypatch.setenv("POSTHOG_HOST", "https://us.i.posthog.com")
+    from src.observability import reset_posthog, get_posthog
+    reset_posthog()
+    with patch("posthog.Posthog"):
+        assert get_posthog().host == "https://us.i.posthog.com"
+    reset_posthog()
+
+
+def test_capture_exception_forwards_to_sdk(posthog_env):
+    sdk = MagicMock()
+    with patch("posthog.Posthog", return_value=sdk):
+        from src.observability import get_posthog
+
+        pc = get_posthog()
+        request = SimpleNamespace(
+            url=SimpleNamespace(path="/dashboard"),
+            method="GET",
+            state=SimpleNamespace(user={"id": "u-42", "email": "a@example.com", "name": "Ada"}),
+        )
+        pc.capture_exception(RuntimeError("boom"), request=request, properties={"k": "v"})
+
+        sdk.capture_exception.assert_called_once()
+        args, kwargs = sdk.capture_exception.call_args
+        # Exception is positional (PostHog SDK ≥ 3.7).
+        assert isinstance(args[0], RuntimeError)
+        assert kwargs["distinct_id"] == "u-42"
+        props = kwargs["properties"]
+        assert props["path"] == "/dashboard"
+        assert props["method"] == "GET"
+        assert props["k"] == "v"
+        # User attributes inlined on the event itself per default identify mode (email).
+        assert props["user_id"] == "u-42"
+        assert props["user_email"] == "a@example.com"
+        # name only at identify mode 'full'.
+        assert "user_name" not in props
+
+
+def test_capture_exception_user_props_full_mode(monkeypatch):
+    monkeypatch.setenv("POSTHOG_API_KEY", "phc_x")
+    monkeypatch.setenv("POSTHOG_IDENTIFY_PII", "full")
+    from src.observability import reset_posthog, get_posthog
+    reset_posthog()
+    sdk = MagicMock()
+    with patch("posthog.Posthog", return_value=sdk):
+        pc = get_posthog()
+        request = SimpleNamespace(
+            url=SimpleNamespace(path="/x"), method="POST",
+            state=SimpleNamespace(user={"id": "u-1", "email": "a@b.test", "name": "Ada"}),
+        )
+        pc.capture_exception(RuntimeError("e"), request=request)
+        props = sdk.capture_exception.call_args.kwargs["properties"]
+        assert props["user_id"] == "u-1"
+        assert props["user_email"] == "a@b.test"
+        assert props["user_name"] == "Ada"
+    reset_posthog()
+
+
+def test_capture_exception_user_props_none_mode_emits_nothing(monkeypatch):
+    monkeypatch.setenv("POSTHOG_API_KEY", "phc_x")
+    monkeypatch.setenv("POSTHOG_IDENTIFY_PII", "none")
+    from src.observability import reset_posthog, get_posthog
+    reset_posthog()
+    sdk = MagicMock()
+    with patch("posthog.Posthog", return_value=sdk):
+        pc = get_posthog()
+        request = SimpleNamespace(
+            url=SimpleNamespace(path="/x"), method="POST",
+            state=SimpleNamespace(user={"id": "u-1", "email": "a@b.test"}),
+        )
+        pc.capture_exception(RuntimeError("e"), request=request)
+        props = sdk.capture_exception.call_args.kwargs["properties"]
+        assert "user_id" not in props
+        assert "user_email" not in props
+        assert "user_name" not in props
+    reset_posthog()
+
+
+def test_capture_exception_anonymous_request_no_user_props(posthog_env):
+    sdk = MagicMock()
+    with patch("posthog.Posthog", return_value=sdk):
+        from src.observability import get_posthog
+        pc = get_posthog()
+        request = SimpleNamespace(
+            url=SimpleNamespace(path="/x"), method="GET",
+            state=SimpleNamespace(),  # no .user attribute
+        )
+        pc.capture_exception(RuntimeError("e"), request=request)
+        kwargs = sdk.capture_exception.call_args.kwargs
+        assert kwargs["distinct_id"] == "anonymous"
+        props = kwargs["properties"]
+        assert "user_id" not in props
+        assert "user_email" not in props
+
+
+def test_capture_exception_falls_back_when_sdk_lacks_native(posthog_env):
+    """Older posthog SDKs miss capture_exception — wrapper sends $exception."""
+    sdk = MagicMock(spec=["capture", "shutdown"])
+    with patch("posthog.Posthog", return_value=sdk):
+        from src.observability import get_posthog
+
+        pc = get_posthog()
+        pc.capture_exception(ValueError("x"), distinct_id="u-1")
+
+        sdk.capture.assert_called_once()
+        kwargs = sdk.capture.call_args.kwargs
+        assert kwargs["event"] == "$exception"
+        assert kwargs["distinct_id"] == "u-1"
+        assert kwargs["properties"]["$exception_type"] == "ValueError"
+        assert kwargs["properties"]["$exception_message"] == "x"
+
+
+def test_is_feature_enabled_returns_default_on_sdk_error(posthog_env):
+    sdk = MagicMock()
+    # Wrapper prefers the v7 name `feature_enabled`. Patch both so either
+    # SDK version routes through the failing path.
+    sdk.feature_enabled.side_effect = RuntimeError("network down")
+    sdk.is_feature_enabled.side_effect = RuntimeError("network down")
+    with patch("posthog.Posthog", return_value=sdk):
+        from src.observability import get_posthog
+
+        assert get_posthog().is_feature_enabled("flag-x", "u-1", default=True) is True
+
+
+def test_invalid_identify_mode_falls_back_to_email(monkeypatch, caplog):
+    monkeypatch.setenv("POSTHOG_API_KEY", "phc_x")
+    monkeypatch.setenv("POSTHOG_IDENTIFY_PII", "completely-bogus")
+    from src.observability import reset_posthog, get_posthog
+    reset_posthog()
+    with patch("posthog.Posthog"):
+        assert get_posthog().identify_mode == "email"
+    reset_posthog()
+
+
+def test_template_user_block_respects_identify_modes(monkeypatch):
+    """The Jinja helper produces id-only / email / full payloads on demand."""
+    monkeypatch.setenv("POSTHOG_API_KEY", "phc_x")
+    from app.web.router import _posthog_user_block
+    from src.observability import reset_posthog
+    reset_posthog()
+    with patch("posthog.Posthog"):
+        request = SimpleNamespace(state=SimpleNamespace(
+            user={"id": "u-7", "email": "a@b.test", "name": "Ada"},
+        ))
+
+        monkeypatch.setenv("POSTHOG_IDENTIFY_PII", "id")
+        reset_posthog()
+        block = _posthog_user_block(request)
+        assert block == {"distinct_id": "u-7", "props": {}}
+
+        monkeypatch.setenv("POSTHOG_IDENTIFY_PII", "email")
+        reset_posthog()
+        block = _posthog_user_block(request)
+        assert block == {"distinct_id": "u-7", "props": {"email": "a@b.test"}}
+
+        monkeypatch.setenv("POSTHOG_IDENTIFY_PII", "full")
+        reset_posthog()
+        block = _posthog_user_block(request)
+        assert block == {"distinct_id": "u-7", "props": {"email": "a@b.test", "name": "Ada"}}
+
+        monkeypatch.setenv("POSTHOG_IDENTIFY_PII", "none")
+        reset_posthog()
+        assert _posthog_user_block(request) is None
+    reset_posthog()
+
+
+def test_environment_resolution_explicit_wins(monkeypatch):
+    monkeypatch.setenv("POSTHOG_API_KEY", "phc_x")
+    monkeypatch.setenv("POSTHOG_ENVIRONMENT", "qa-7")
+    monkeypatch.setenv("LOCAL_DEV_MODE", "1")  # would otherwise resolve to "local"
+    monkeypatch.setenv("RELEASE_CHANNEL", "stable")
+    from src.observability import reset_posthog, get_posthog
+    reset_posthog()
+    with patch("posthog.Posthog") as ctor:
+        pc = get_posthog()
+        assert pc.environment == "qa-7"
+        kwargs = ctor.call_args.kwargs
+        assert kwargs["super_properties"]["environment"] == "qa-7"
+    reset_posthog()
+
+
+def test_environment_resolution_local_dev_short_circuit(monkeypatch):
+    monkeypatch.setenv("POSTHOG_API_KEY", "phc_x")
+    monkeypatch.delenv("POSTHOG_ENVIRONMENT", raising=False)
+    monkeypatch.setenv("LOCAL_DEV_MODE", "1")
+    monkeypatch.setenv("RELEASE_CHANNEL", "stable")  # ignored when LOCAL_DEV_MODE wins
+    from src.observability import reset_posthog, get_posthog
+    reset_posthog()
+    with patch("posthog.Posthog"):
+        assert get_posthog().environment == "local"
+    reset_posthog()
+
+
+def test_environment_release_channel_fallback(monkeypatch):
+    monkeypatch.setenv("POSTHOG_API_KEY", "phc_x")
+    monkeypatch.delenv("POSTHOG_ENVIRONMENT", raising=False)
+    monkeypatch.delenv("LOCAL_DEV_MODE", raising=False)
+    monkeypatch.setenv("RELEASE_CHANNEL", "stable")
+    from src.observability import reset_posthog, get_posthog
+    reset_posthog()
+    with patch("posthog.Posthog") as ctor:
+        pc = get_posthog()
+        assert pc.environment == "stable"
+        # release also surfaces from AGNES_VERSION → RELEASE_CHANNEL fallback
+        assert ctor.call_args.kwargs["super_properties"]["release"] == "stable"
+    reset_posthog()
+
+
+def test_environment_unknown_when_nothing_set(monkeypatch):
+    monkeypatch.setenv("POSTHOG_API_KEY", "phc_x")
+    for var in ("POSTHOG_ENVIRONMENT", "LOCAL_DEV_MODE", "RELEASE_CHANNEL", "AGNES_DEPLOYMENT_ENV", "AGNES_VERSION"):
+        monkeypatch.delenv(var, raising=False)
+    from src.observability import reset_posthog, get_posthog
+    reset_posthog()
+    with patch("posthog.Posthog") as ctor:
+        pc = get_posthog()
+        assert pc.environment == "unknown"
+        assert pc.release is None
+        assert "release" not in ctor.call_args.kwargs["super_properties"]
+    reset_posthog()
+
+
+def _render_snippet(user_block):
+    """Render `_posthog.html` directly with stub Jinja globals.
+
+    Avoids spinning up the full TestClient for what is effectively a
+    template-output assertion.
+    """
+    from jinja2 import Environment, FileSystemLoader
+
+    env = Environment(loader=FileSystemLoader("app/web/templates"), autoescape=False)
+    return env.get_template("_posthog.html").render(
+        request=None,
+        posthog_config={
+            "enabled": True,
+            "host": "https://eu.i.posthog.com",
+            "api_key_public": "phc_test",
+            "replay_enabled": True,
+            "replay_mask_selector_extra": "",
+            "environment": "local",
+            "release": "0.99.0",
+        },
+        posthog_user_block=lambda _r: user_block,
+    )
+
+
+def test_browser_snippet_registers_user_id_and_email_when_logged_in():
+    out = _render_snippet({
+        "distinct_id": "u-99",
+        "props": {"email": "ada@example.com"},
+    })
+
+    # Super-properties: env + release always, plus user_id/email when logged in.
+    assert "_superProps.user_id = \"u-99\"" in out
+    assert "_superProps.user_email = \"ada@example.com\"" in out
+    # identify() still fires alongside register() so person profiles get linked.
+    assert "ph.identify(\"u-99\"" in out
+    assert "\"email\": \"ada@example.com\"" in out
+    # Environment + release land on the same super-prop bag.
+    assert "environment: \"local\"" in out
+    assert "release: \"0.99.0\"" in out
+
+
+def test_browser_snippet_includes_user_name_in_full_mode():
+    out = _render_snippet({
+        "distinct_id": "u-99",
+        "props": {"email": "ada@example.com", "name": "Ada Lovelace"},
+    })
+
+    assert "_superProps.user_name = \"Ada Lovelace\"" in out
+
+
+def test_browser_snippet_omits_user_props_when_anonymous():
+    out = _render_snippet(None)
+
+    assert "_superProps.user_id" not in out
+    assert "_superProps.user_email" not in out
+    assert "_superProps.user_name" not in out
+    assert "ph.identify(" not in out
+    # Environment still registers so anonymous events are tagged too.
+    assert "environment: \"local\"" in out
+
+
+def test_browser_snippet_omits_email_when_id_only_mode():
+    """Caller passes a block with only distinct_id → no email/name in output."""
+    out = _render_snippet({"distinct_id": "u-1", "props": {}})
+
+    assert "_superProps.user_id = \"u-1\"" in out
+    assert "_superProps.user_email" not in out
+    assert "_superProps.user_name" not in out
+
+
+def test_template_user_block_anonymous_returns_none(monkeypatch):
+    monkeypatch.setenv("POSTHOG_API_KEY", "phc_x")
+    from app.web.router import _posthog_user_block
+    from src.observability import reset_posthog
+    reset_posthog()
+    with patch("posthog.Posthog"):
+        request = SimpleNamespace(state=SimpleNamespace())  # no user attribute
+        # `getattr` falls back to None — block should be None.
+        assert _posthog_user_block(request) is None
+    reset_posthog()
diff --git a/tests/test_posthog_disabled.py b/tests/test_posthog_disabled.py
new file mode 100644
index 0000000..8b5f3e5
--- /dev/null
+++ b/tests/test_posthog_disabled.py
@@ -0,0 +1,77 @@
+"""PostHog integration is fully off when POSTHOG_API_KEY is unset.
+
+Pins the contract: zero network, zero side effects, snippet renders empty.
+"""
+
+from __future__ import annotations
+
+import os
+from unittest.mock import patch
+
+import pytest
+
+
+@pytest.fixture(autouse=True)
+def _clear_posthog_env(monkeypatch):
+    """Drop every POSTHOG_* var so each test starts from disabled state."""
+    for key in list(os.environ):
+        if key.startswith("POSTHOG_"):
+            monkeypatch.delenv(key, raising=False)
+    from src.observability import reset_posthog
+    reset_posthog()
+    yield
+    reset_posthog()
+
+
+def test_get_posthog_returns_disabled_singleton():
+    from src.observability import get_posthog
+
+    pc = get_posthog()
+
+    assert pc.enabled is False
+    # singleton — second access returns same object
+    assert get_posthog() is pc
+
+
+def test_disabled_client_never_constructs_underlying_sdk():
+    """With key unset, posthog.Posthog() must never be instantiated."""
+    with patch("posthog.Posthog") as posthog_ctor:
+        from src.observability import reset_posthog, get_posthog
+        reset_posthog()
+        pc = get_posthog()
+
+        # Trigger every public method that could touch the SDK.
+        pc.capture("custom_event", "user-1", {"k": "v"})
+        pc.capture_exception(RuntimeError("boom"), distinct_id="user-1")
+        assert pc.is_feature_enabled("flag", "user-1", default=False) is False
+        assert pc.get_feature_flag_payload("flag", "user-1") is None
+        pc.shutdown()
+
+        posthog_ctor.assert_not_called()
+
+
+def test_template_global_reports_disabled():
+    """The Jinja-side `posthog_config` flag must be False when off."""
+    from app.web.router import _posthog_config_global
+
+    cfg = _posthog_config_global()
+
+    assert cfg == {"enabled": False}
+
+
+def test_template_user_block_returns_none_when_disabled():
+    from app.web.router import _posthog_user_block
+
+    assert _posthog_user_block(None) is None
+
+
+def test_trace_generation_yields_noop_when_disabled():
+    """The LLM-tracing context manager must not call into PostHog when off."""
+    from src.observability import trace_generation
+
+    with patch("src.observability.posthog_client.PosthogClient.capture") as cap:
+        with trace_generation(provider="anthropic", model="claude-test") as t:
+            t.set_input("hello")
+            t.set_tokens(input_tokens=10, output_tokens=20)
+
+        cap.assert_not_called()
diff --git a/tests/test_posthog_inject_middleware.py b/tests/test_posthog_inject_middleware.py
new file mode 100644
index 0000000..94bbb68
--- /dev/null
+++ b/tests/test_posthog_inject_middleware.py
@@ -0,0 +1,180 @@
+"""Regression tests for ``app/middleware/posthog_inject.py``.
+
+Two narrow concerns from PR #231 review (minasarustamyan):
+
+1. ``Response.background`` MUST be forwarded on every return path.
+   ``BaseHTTPMiddleware`` materialises the body and asks subclasses to
+   return a fresh ``Response``; a missed ``background`` parameter cancels
+   any ``BackgroundTask`` / ``BackgroundTasks`` the route attached, with
+   no log line.
+2. Oversized HTML responses must short-circuit gracefully — the
+   middleware buffers in memory by design, so a streamed-HTML route
+   would blow up RSS without a cap.
+
+Tests boot a minimal FastAPI app (no DB, no auth, no real PostHog) and
+run via ``TestClient`` so they exercise the actual middleware stack.
+"""
+
+from __future__ import annotations
+
+import os
+from unittest.mock import patch
+
+import pytest
+from fastapi import FastAPI
+from fastapi.responses import HTMLResponse
+from fastapi.testclient import TestClient
+from starlette.background import BackgroundTask
+
+
+@pytest.fixture
+def posthog_enabled(monkeypatch):
+    monkeypatch.setenv("POSTHOG_API_KEY", "phc_test")
+    from src.observability import reset_posthog
+    reset_posthog()
+    yield
+    reset_posthog()
+
+
+def _make_app() -> FastAPI:
+    """Minimal FastAPI app with the injection middleware mounted.
+
+    Avoids importing ``app.main`` so the test stays fast and self-contained.
+    """
+    from app.middleware.posthog_inject import PosthogInjectionMiddleware
+
+    app = FastAPI()
+    app.add_middleware(PosthogInjectionMiddleware)
+    return app
+
+
+def test_background_task_runs_on_html_response(posthog_enabled):
+    """A BackgroundTask attached to an HTMLResponse must still fire after
+    the middleware rewrites the body. Was silently dropped before fix."""
+    fired: list[bool] = []
+
+    def _mark():
+        fired.append(True)
+
+    with patch("posthog.Posthog"):
+        # ``_render_snippet`` reaches into app.web.router; stub it so the
+        # middleware doesn't drag in the full app dependency tree.
+        with patch("app.middleware.posthog_inject._render_snippet", return_value=""):
+            app = _make_app()
+
+            @app.get("/page", response_class=HTMLResponse)
+            def page():
+                return HTMLResponse(
+                    "x",
+                    background=BackgroundTask(_mark),
+                )
+
+            client = TestClient(app)
+            res = client.get("/page")
+
+    assert res.status_code == 200
+    assert "" in res.text  # snippet injected
+    # Background task ran. Without the fix, fired stays [].
+    assert fired == [True]
+
+
+def test_background_task_runs_when_snippet_render_fails(posthog_enabled):
+    """If snippet rendering raises, the response still serves and the
+    background task still fires."""
+    fired: list[bool] = []
+
+    def _mark():
+        fired.append(True)
+
+    with patch("posthog.Posthog"):
+        with patch(
+            "app.middleware.posthog_inject._render_snippet",
+            side_effect=RuntimeError("template blew up"),
+        ):
+            app = _make_app()
+
+            @app.get("/page", response_class=HTMLResponse)
+            def page():
+                return HTMLResponse(
+                    "x",
+                    background=BackgroundTask(_mark),
+                )
+
+            client = TestClient(app)
+            res = client.get("/page")
+
+    assert res.status_code == 200
+    assert fired == [True]
+
+
+def test_background_task_runs_when_snippet_already_present(posthog_enabled):
+    """Defensive double-injection guard path — body unchanged but
+    background still forwarded."""
+    fired: list[bool] = []
+
+    def _mark():
+        fired.append(True)
+
+    with patch("posthog.Posthog"):
+        with patch("app.middleware.posthog_inject._render_snippet", return_value=""):
+            app = _make_app()
+
+            @app.get("/page", response_class=HTMLResponse)
+            def page():
+                # Body already contains posthog.init -> middleware skips re-injection.
+                return HTMLResponse(
+                    "",
+                    background=BackgroundTask(_mark),
+                )
+
+            client = TestClient(app)
+            res = client.get("/page")
+
+    assert res.status_code == 200
+    assert fired == [True]
+
+
+def test_non_html_response_passthrough_does_not_buffer(posthog_enabled):
+    """JSON / non-HTML responses must skip the middleware entirely —
+    no body materialisation, no background-task interference."""
+    fired: list[bool] = []
+
+    def _mark():
+        fired.append(True)
+
+    with patch("posthog.Posthog"):
+        app = _make_app()
+
+        @app.get("/api/health")
+        def health():
+            from fastapi.responses import JSONResponse
+            return JSONResponse({"ok": True}, background=BackgroundTask(_mark))
+
+        client = TestClient(app)
+        res = client.get("/api/health")
+
+    assert res.status_code == 200
+    assert res.json() == {"ok": True}
+    assert fired == [True]
+
+
+def test_oversized_html_response_short_circuits(posthog_enabled, monkeypatch):
+    """Body bigger than the buffer cap serves without injection rather
+    than buffering arbitrarily large streams in memory."""
+    monkeypatch.setattr("app.middleware.posthog_inject._MAX_BUFFER_BYTES", 1024)
+
+    with patch("posthog.Posthog"):
+        with patch("app.middleware.posthog_inject._render_snippet", return_value=""):
+            app = _make_app()
+
+            @app.get("/big", response_class=HTMLResponse)
+            def big():
+                # 2 KB body — twice the patched cap.
+                return HTMLResponse("" + ("X" * 2048) + "")
+
+            client = TestClient(app)
+            res = client.get("/big")
+
+    assert res.status_code == 200
+    # Snippet NOT injected — middleware bailed out at the cap.
+    assert "" not in res.text
diff --git a/uv.lock b/uv.lock
index 582e4e4..de1df10 100644
--- a/uv.lock
+++ b/uv.lock
@@ -21,7 +21,7 @@ wheels = [
 
 [[package]]
 name = "agnes-the-ai-analyst"
-version = "0.44.0"
+version = "0.47.4"
 source = { editable = "." }
 dependencies = [
     { name = "a2wsgi" },
@@ -44,6 +44,7 @@ dependencies = [
     { name = "numpy" },
     { name = "openai" },
     { name = "pandas" },
+    { name = "posthog" },
     { name = "pyarrow" },
     { name = "pyjwt" },
     { name = "python-dotenv" },
@@ -53,6 +54,7 @@ dependencies = [
     { name = "rich" },
     { name = "slowapi" },
     { name = "sqlglot" },
+    { name = "sse-starlette" },
     { name = "starlette" },
     { name = "typer" },
     { name = "uvicorn", extra = ["standard"] },
@@ -67,6 +69,9 @@ dev = [
     { name = "pytest-timeout" },
     { name = "pytest-xdist" },
 ]
+observability = [
+    { name = "posthog" },
+]
 
 [package.dev-dependencies]
 dev = [
@@ -104,6 +109,8 @@ requires-dist = [
     { name = "numpy", specifier = ">=1.24.0" },
     { name = "openai", specifier = ">=1.30.0" },
     { name = "pandas", specifier = ">=2.0.0" },
+    { name = "posthog", specifier = ">=3.7.0" },
+    { name = "posthog", marker = "extra == 'observability'", specifier = ">=3.7.0" },
     { name = "pyarrow", specifier = ">=12.0.0" },
     { name = "pyjwt", specifier = ">=2.8.0" },
     { name = "pytest", marker = "extra == 'dev'", specifier = ">=9.0.0" },
@@ -116,11 +123,12 @@ requires-dist = [
     { name = "rich", specifier = ">=13.0.0" },
     { name = "slowapi", specifier = ">=0.1.9" },
     { name = "sqlglot", specifier = ">=30.0.0" },
+    { name = "sse-starlette", specifier = ">=2.0" },
     { name = "starlette", specifier = ">=0.41.0" },
     { name = "typer", specifier = ">=0.12.0" },
     { name = "uvicorn", extras = ["standard"], specifier = ">=0.32.0" },
 ]
-provides-extras = ["dev"]
+provides-extras = ["observability", "dev"]
 
 [package.metadata.requires-dev]
 dev = [
@@ -239,15 +247,15 @@ wheels = [
 
 [[package]]
 name = "azure-core"
-version = "1.40.0"
+version = "1.41.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "requests" },
     { name = "typing-extensions" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/ce/d9/6f5972b44761277394527a3a76af5ae2ef82fc5f20ce351abf0c826eca67/azure_core-1.40.0.tar.gz", hash = "sha256:ecf5b6ddf2564471fae9d576147b7e77a4da285958b2d9f4fd6c3af104f3e9d7", size = 380057, upload-time = "2026-05-01T00:59:45.488Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/a6/f3/b416179e408990df5db0d516283022dde0f5d0111d98c1a848e41853e81c/azure_core-1.41.0.tar.gz", hash = "sha256:f46ff5dfcd230f25cf1c19e8a34b8dc08a337b2503e268bb600a16c00db8ad5a", size = 381042, upload-time = "2026-05-07T23:30:54.302Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/ec/c9/25edc67692fb17523c7d29c73898be649b4d3c7ae13cc0f74f5c91938022/azure_core-1.40.0-py3-none-any.whl", hash = "sha256:7f3ea02579b1bb1d34e45043423b650621d11d7c2ea3b05e5554010080b78dfd", size = 220450, upload-time = "2026-05-01T00:59:47.17Z" },
+    { url = "https://files.pythonhosted.org/packages/5b/db/325c6d7312d2200251c52323878281045aaffcb5586612296484e4280eaa/azure_core-1.41.0-py3-none-any.whl", hash = "sha256:522b4011e8180b1a3dcd2024396a4e7fe9ac37fb8597db47163d230b5efe892d", size = 220920, upload-time = "2026-05-07T23:30:56.357Z" },
 ]
 
 [[package]]
@@ -265,32 +273,41 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/d8/3a/6ef2047a072e54e1142718d433d50e9514c999a58f51abfff7902f3a72f8/azure_storage_blob-12.28.0-py3-none-any.whl", hash = "sha256:00fb1db28bf6a7b7ecaa48e3b1d5c83bfadacc5a678b77826081304bd87d6461", size = 431499, upload-time = "2026-01-06T23:48:58.995Z" },
 ]
 
+[[package]]
+name = "backoff"
+version = "2.2.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/47/d7/5bbeb12c44d7c4f2fb5b56abce497eb5ed9f34d85701de869acedd602619/backoff-2.2.1.tar.gz", hash = "sha256:03f829f5bb1923180821643f8753b0502c3b682293992485b0eef2807afa5cba", size = 17001, upload-time = "2022-10-05T19:19:32.061Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/df/73/b6e24bd22e6720ca8ee9a85a0c4a2971af8497d8f3193fa05390cbd46e09/backoff-2.2.1-py3-none-any.whl", hash = "sha256:63579f9a0628e06278f7e47b7d7d5b6ce20dc65c5e96a6f3ca99a6adca0396e8", size = 15148, upload-time = "2022-10-05T19:19:30.546Z" },
+]
+
 [[package]]
 name = "boto3"
-version = "1.43.5"
+version = "1.43.6"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "botocore" },
     { name = "jmespath" },
     { name = "s3transfer" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/71/b0/90ba01763dd483bb040d0815dc0ba893421e3f5926672ceab9acbb73b23f/boto3-1.43.5.tar.gz", hash = "sha256:414be7868f25c3b6a0232301c8ab40347911b6b191926b61f00a63f89b97b2bc", size = 113150, upload-time = "2026-05-06T19:56:49.629Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/0a/37/78c630d1308964aa9abf44951d9c4df776546ff37251ec2434944e205c4e/boto3-1.43.6.tar.gz", hash = "sha256:e6315effaf12b890b99956e6f8e2c3000a3f64e4ee91943cec3895ce9a836afb", size = 113153, upload-time = "2026-05-07T20:49:59.694Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/10/bb/347307758c2003783df1d9a9b07596928d05a6ca0e17790cea3b18105244/boto3-1.43.5-py3-none-any.whl", hash = "sha256:aa8a296c8db55d812767b282cfe4c7977f0b0eeaa709abdaeb368b9c738e901f", size = 140502, upload-time = "2026-05-06T19:56:46.626Z" },
+    { url = "https://files.pythonhosted.org/packages/c8/e2/3c2eef44f55eafab256836d1d9479bd6a74f70c26cbfdc0639a0e23e4327/boto3-1.43.6-py3-none-any.whl", hash = "sha256:179601ec2992726a718053bf41e43c223ceba397d31ceab11f64d9c910d9fc3a", size = 140502, upload-time = "2026-05-07T20:49:57.8Z" },
 ]
 
 [[package]]
 name = "botocore"
-version = "1.43.5"
+version = "1.43.6"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "jmespath" },
     { name = "python-dateutil" },
     { name = "urllib3" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/6d/a2/1285a22bf157f9e97a8fd236daea95d9b14cc8425ae5f8a616badf948408/botocore-1.43.5.tar.gz", hash = "sha256:5c7207816ab5e48382adcb2a64db388fa4abe9ee1d23f72c82ae62c51a0bc84e", size = 15321290, upload-time = "2026-05-06T19:56:35.658Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/79/a7/23d0f5028011455096a1eeac0ddf3cbe147b3e855e127342f8202552194d/botocore-1.43.6.tar.gz", hash = "sha256:b1e395b347356860398da42e61c808cf1e34b6fa7180cf2b9d87d986e1a06ba0", size = 15336070, upload-time = "2026-05-07T20:49:48.14Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/82/d2/99f1741b12e3cdba2e5370f6dafaab743a373c6f83592601ec75ff2cc47f/botocore-1.43.5-py3-none-any.whl", hash = "sha256:a1df6e0c6346735936f42e6b99f3b28f1e9397731c0bc2563c617df7965a0dc0", size = 15002116, upload-time = "2026-05-06T19:56:29.993Z" },
+    { url = "https://files.pythonhosted.org/packages/e5/c8/6f47223840e8d8cfa8c9f7c0ec1b77970417f257fc885169ff4f6326ce09/botocore-1.43.6-py3-none-any.whl", hash = "sha256:b6d1fdbc6f65a5fe0b7e947823aa37535d3f39f3ba4d21110fab1f55bbbcc04b", size = 15017094, upload-time = "2026-05-07T20:49:44.964Z" },
 ]
 
 [[package]]
@@ -1645,6 +1662,21 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/54/20/4d324d65cc6d9205fabedc306948156824eb9f0ee1633355a8f7ec5c66bf/pluggy-1.6.0-py3-none-any.whl", hash = "sha256:e920276dd6813095e9377c0bc5566d94c932c33b27a3e3945d8389c374dd4746", size = 20538, upload-time = "2025-05-15T12:30:06.134Z" },
 ]
 
+[[package]]
+name = "posthog"
+version = "7.14.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "backoff" },
+    { name = "distro" },
+    { name = "requests" },
+    { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/c4/0f/0e6578feaf0d4e670bc517b6da09ec147a65421c44e0cd687eba12f08743/posthog-7.14.0.tar.gz", hash = "sha256:3be5e513f07e4ee5119f98b0458cb640739b49cef7c96c3e18b1d65076b18239", size = 205083, upload-time = "2026-05-01T20:41:37.971Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/b3/c2/2dc3e08e481f45c0215da4325ccc9b5f368dcee504779d2f169ab567c766/posthog-7.14.0-py3-none-any.whl", hash = "sha256:76db6e3158e2c11ec9bbcf32a673efec4acc8078965d92e2d3055555220ee546", size = 240187, upload-time = "2026-05-01T20:41:36.022Z" },
+]
+
 [[package]]
 name = "proto-plus"
 version = "1.27.2"
@@ -2241,6 +2273,19 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/49/4b/359f28a903c13438ef59ebeee215fb25da53066db67b305c125f1c6d2a25/sqlparse-0.5.5-py3-none-any.whl", hash = "sha256:12a08b3bf3eec877c519589833aed092e2444e68240a3577e8e26148acc7b1ba", size = 46138, upload-time = "2025-12-19T07:17:46.573Z" },
 ]
 
+[[package]]
+name = "sse-starlette"
+version = "3.4.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "anyio" },
+    { name = "starlette" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/38/82/10cdfab4ab663a6b6bd624d33f55b2cfa41af5105be033a6d5d135a92c5f/sse_starlette-3.4.2.tar.gz", hash = "sha256:2f9a7f51ed84395a0427fb9f66cb1ec11f7899d977a72cbc9070b962a2e14489", size = 35236, upload-time = "2026-05-06T19:42:13.727Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/c1/27/351c71e803c56090d8d3bf9520422debeb8ed938871fd4f7ef519805a6c5/sse_starlette-3.4.2-py3-none-any.whl", hash = "sha256:6ea5d35b7ce979a3de5a0db5f77fe886b1616e4b3e1ad93fba502bd9b5fb662f", size = 16516, upload-time = "2026-05-06T19:42:12.201Z" },
+]
+
 [[package]]
 name = "starlette"
 version = "1.0.0"