* feat(observability): optional PostHog integration (errors, LLM traces, replay, flags)
Off by default. Activates when POSTHOG_API_KEY is set in the environment.
Defaults to PostHog EU Cloud; override POSTHOG_HOST for US Cloud or a
self-hosted instance.
Coverage:
- FastAPI 500 handler captures unhandled exceptions
- src/orchestrator.py rebuild + rebuild_source failures
- services/scheduler/ HTTP-job failures
- cli/main.py uncaught CLI errors (typer.Exit/SystemExit/KeyboardInterrupt
skipped; flushes before re-raise so short-lived CLI invocations don't
drop events)
- connectors/llm/anthropic_provider.py + openai_compat.py emit
$ai_generation events with provider, model, latency, token counts
(prompt/completion bodies stay off unless POSTHOG_LLM_PAYLOADS=1
because LLM prompts here routinely include customer SQL/data)
- Browser snippet injected into every text/html response by
PosthogInjectionMiddleware — registered inside the GZip layer so it
sees uncompressed HTML before compression. Many templates are
standalone (their own DOCTYPE) and never extend base.html, so a
per-template include would miss them.
- Frontend: $pageview, $pageleave, JS error capture via window 'error'
and 'unhandledrejection' handlers, masked session replay
(maskAllInputs: true plus a CSS-selector mask for known data surfaces),
feature flags (browser posthog.isFeatureEnabled + server-side
feature_enabled with a fallback for older SDKs).
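The string rewrite behind the HTML-injection bullet can be sketched as follows. This is a minimal sketch, not the code in app/middleware/posthog_inject.py: the insertion point (before `</head>`, falling back to `</body>`) and the `_MARKER` comment used as a double-injection guard are assumptions.

```python
import re

# Hypothetical marker written alongside the snippet; its presence means
# "already injected" (a template, or an earlier pass, added it).
_MARKER = "<!-- posthog-snippet -->"

# Preferred anchor is </head>; </body> is the fallback. Matching ignores case.
_ANCHORS = (
    re.compile(r"</head\s*>", re.IGNORECASE),
    re.compile(r"</body\s*>", re.IGNORECASE),
)


def inject_snippet(html: str, snippet: str) -> str:
    """Insert snippet once before the first anchor found; otherwise return html."""
    if _MARKER in html:
        return html  # double-injection guard
    block = f"{_MARKER}\n{snippet}\n"
    for pattern in _ANCHORS:
        match = pattern.search(html)
        if match:
            return html[: match.start()] + block + html[match.start():]
    return html  # fragment with no anchor: leave untouched
```

Because the rewrite works on the serialized page rather than on a template include, standalone templates with their own DOCTYPE get the snippet too, which is the reason given above for doing it in middleware.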
Identification mode is operator-configurable: none / id / email / full.
The default (email) sends user.id + email but never the name. The CLI entry
point moves from cli.main:app to cli.main:main (Typer wrapper).
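The CLI wrapper behaviour (skip control-flow exceptions, report everything else, flush before re-raising) might look like the sketch below. `ErrorSink`, `run_cli` and `DEFAULT_SKIP` are hypothetical names, not the real cli.main:main. As far as we know, Typer's `Exit` is Click's `Exit`, which subclasses `RuntimeError`, so it must be listed explicitly or a plain `except Exception` would report ordinary exits as errors.

```python
from typing import Callable, Protocol, Tuple, Type


class ErrorSink(Protocol):
    """Hypothetical interface: the two PostHog-client calls this wrapper needs."""

    def capture_exception(self, exc: BaseException) -> None: ...

    def flush(self) -> None: ...


# Control-flow exceptions that are never reported. A real Typer entry point
# would also pass typer.Exit (Click's Exit subclasses RuntimeError).
DEFAULT_SKIP: Tuple[Type[BaseException], ...] = (SystemExit, KeyboardInterrupt)


def run_cli(
    app: Callable[[], object],
    sink: ErrorSink,
    skip: Tuple[Type[BaseException], ...] = DEFAULT_SKIP,
) -> object:
    """Run app(); on an unexpected error, report it, flush, then re-raise."""
    try:
        return app()
    except skip:
        raise
    except Exception as exc:
        sink.capture_exception(exc)
        # The process exits right after the re-raise, so flush synchronously
        # or the queued event may never be sent.
        sink.flush()
        raise
```

A call site would then look something like `run_cli(app, client, skip=DEFAULT_SKIP + (typer.Exit,))`; the flush-before-re-raise ordering is what keeps short-lived invocations from dropping events.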
Files:
- src/observability/posthog_client.py — lazy singleton, no network
when disabled, single-process flush on shutdown
- src/observability/llm_tracing.py — trace_generation context manager
- app/middleware/posthog_inject.py — HTML rewrite middleware
- app/web/templates/_posthog.html — browser snippet template
- docs/observability.md — operator guide
- config/.env.template — documented POSTHOG_* knobs
- tests/test_posthog_disabled.py + tests/test_posthog_client.py +
tests/test_llm_tracing.py — 18 tests covering disabled state,
identify-mode payloads, $ai_generation shape, error variant.
CHANGELOG entry under [Unreleased] Added.
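A trace_generation-style context manager can be sketched as below, assuming a generic `capture(event, properties)` callable rather than the real PosthogClient API. Only `$ai_input` and `$ai_output_choices` are named in this change; the other `$ai_*` property names follow PostHog's LLM-observability naming as we understand it and should be checked against PostHog's docs.

```python
import os
import time
import uuid
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, Optional


@contextmanager
def trace_generation(
    capture: Callable[[str, Dict[str, Any]], None],
    provider: str,
    model: str,
    input_payload: Any = None,
) -> Iterator[Dict[str, Any]]:
    """Time one LLM call and emit a single $ai_generation event when it ends."""
    # The caller fills these in from the provider response inside the with-block.
    record: Dict[str, Any] = {"input_tokens": None, "output_tokens": None, "output": None}
    started = time.monotonic()
    error: Optional[Exception] = None
    try:
        yield record
    except Exception as exc:
        error = exc
        raise
    finally:
        props: Dict[str, Any] = {
            "$ai_trace_id": str(uuid.uuid4()),
            "$ai_provider": provider,
            "$ai_model": model,
            "$ai_latency": time.monotonic() - started,  # seconds
            "$ai_input_tokens": record["input_tokens"],
            "$ai_output_tokens": record["output_tokens"],
            "$ai_is_error": error is not None,
        }
        if error is not None:
            props["$ai_error"] = repr(error)
        # Bodies only with explicit opt-in: prompts here carry customer SQL/data.
        if os.environ.get("POSTHOG_LLM_PAYLOADS") == "1":
            props["$ai_input"] = input_payload
            props["$ai_output_choices"] = record["output"]
        capture("$ai_generation", props)
```

Emitting from `finally` means the error variant and the success variant share one code path: metadata always flows, and bodies are attached only when the operator opts in.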
* feat(observability): tag every PostHog event with environment + release
Splits PostHog dashboards cleanly between localhost / dev / staging /
production without manual tagging on every capture call.
- POSTHOG_ENVIRONMENT explicit override; auto-resolves to "local" when
LOCAL_DEV_MODE=1, else RELEASE_CHANNEL, else AGNES_DEPLOYMENT_ENV,
else "unknown".
- AGNES_VERSION → RELEASE_CHANNEL fallback feeds the `release` property
for "is this error new in this release?" cohorting.
- Backend gets both via the PostHog SDK's super_properties constructor
arg (every captured event picks them up automatically).
- Browser snippet calls posthog.register({environment, release}) inside
the loaded callback so $pageview, $exception, autocapture, etc. all
carry the same labels.
- request.state.user now populated by auth dependencies so the snippet
can actually call posthog.identify(user_id, {email}) for logged-in
users (previously the user block always resolved to None because
nothing wrote to request.state.user).
4 new tests cover env resolution: explicit > LOCAL_DEV_MODE > channel
> unknown, plus super-properties forwarding into the SDK constructor.
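The resolution order above reduces to two small pure functions. This sketch uses hypothetical names, treats only the literal value `1` as enabling LOCAL_DEV_MODE (an assumption), and reads from a mapping so it can be tested without touching `os.environ`.

```python
from typing import Dict, Mapping, Optional


def resolve_environment(env: Mapping[str, str]) -> str:
    """POSTHOG_ENVIRONMENT > LOCAL_DEV_MODE=1 > RELEASE_CHANNEL > AGNES_DEPLOYMENT_ENV > unknown."""
    explicit = env.get("POSTHOG_ENVIRONMENT")
    if explicit:
        return explicit
    if env.get("LOCAL_DEV_MODE") == "1":
        return "local"
    return env.get("RELEASE_CHANNEL") or env.get("AGNES_DEPLOYMENT_ENV") or "unknown"


def resolve_release(env: Mapping[str, str]) -> Optional[str]:
    """AGNES_VERSION, falling back to RELEASE_CHANNEL, else None."""
    return env.get("AGNES_VERSION") or env.get("RELEASE_CHANNEL") or None


def super_properties(env: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """The two labels every captured event should carry."""
    return {"environment": resolve_environment(env), "release": resolve_release(env)}
```

The resulting dict, e.g. `super_properties(os.environ)`, is what would be handed to the SDK's super_properties constructor argument on the backend and to `posthog.register(...)` in the browser.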
* feat(observability): inline user attrs on every PostHog event + debug throw route
PostHog's UI shows person properties on the Person profile page, not
inline on each event — so a reviewer triaging an exception couldn't tell
which user hit the bug without clicking through. Fix it on both sides.
- Backend capture_exception merges user_id / user_email / user_name into
the event properties (gated by POSTHOG_IDENTIFY_PII: none/id/email/full).
Backed by a new _user_props_for_event helper on PosthogClient.
- Browser snippet registers user_id + user_email + user_name as super-
properties via posthog.register({...}) so every browser event ($pageview,
$exception including those raised via posthog.captureException(), and
custom events) carries them inline. Mirrors the backend so
cross-referencing client/server events doesn't require a person-profile
lookup.
- /api/debug/throw — debug-only endpoint gated by DEBUG=1 (404 in prod).
Runs Depends(get_current_user) first so request.state.user is set when
the unhandled-exception handler captures the event. Lets operators
exercise the full observability path end-to-end without hand-rolling
a TestClient script. Configurable via ?kind=ValueError&msg=...
7 new tests cover: backend user-attr merge across identify modes,
anonymous request fall-through, browser snippet super-prop emission for
logged-in / anonymous / id-only / full-name cases.
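Both halves of the inline-user-attrs change can be driven by one table of which fields each POSTHOG_IDENTIFY_PII mode exposes. This is a sketch: `user_props_for_event` mirrors the described `_user_props_for_event` helper only in spirit, and the function names, the fallback to `none` for unknown modes, and the `<` escaping (so a value like `</script>` cannot close the snippet's script tag) are assumptions.

```python
import json
from typing import Dict, Optional, TypedDict


class User(TypedDict, total=False):
    id: str
    email: str
    name: str


# Which user fields each POSTHOG_IDENTIFY_PII mode may expose.
_FIELDS_BY_MODE = {
    "none": (),
    "id": ("id",),
    "email": ("id", "email"),
    "full": ("id", "email", "name"),
}


def user_props_for_event(user: Optional[User], mode: str = "email") -> Dict[str, str]:
    """Inline user_* properties for one event; empty for anonymous requests."""
    if not user:
        return {}
    # Unknown modes fall back to the most conservative setting.
    fields = _FIELDS_BY_MODE.get(mode, _FIELDS_BY_MODE["none"])
    return {f"user_{f}": user[f] for f in fields if user.get(f)}


def register_snippet(user: Optional[User], mode: str = "email") -> str:
    """JS line the browser snippet could emit; nothing for anonymous users."""
    props = user_props_for_event(user, mode)
    if not props:
        return ""
    # Escape "<" so user-controlled values cannot break out of the <script> tag.
    return f"posthog.register({json.dumps(props).replace('<', chr(92) + 'u003c')});"
```

Sharing the mode table between the server-side merge and the browser register call keeps the two sides from drifting, which is the point of mirroring the backend.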
* fix(observability): address minasarustamyan PR #231 review
Two bugs caught in review.
1. PosthogInjectionMiddleware dropped Response.background on every
return path that rebuilt the response. Under BaseHTTPMiddleware,
rewriting HTML means buffering the body and returning a fresh Response;
three paths in dispatch() omitted background=, silently cancelling any
BackgroundTask / BackgroundTasks the route attached (audit logging, async
webhooks, email sends) with no log line. Fix: route every return through
a _passthrough() helper that forwards background.
Also adds a _MAX_BUFFER_BYTES (4 MB) cap so a streamed-HTML response
can't balloon RSS during buffering. Bigger bodies short-circuit
through with a warning rather than being injected.
Regression tests in tests/test_posthog_inject_middleware.py exercise
four return paths (snippet present, render-fail, double-injection
guard, non-HTML passthrough) plus the streaming-guard short-circuit.
2. $ai_input / $ai_output_choices were emitted without truncation, so
POSTHOG_LLM_PAYLOADS=1 silently dropped events past PostHog's ~32 KB
per-event ingest limit — exactly the calls (large prompts with
schemas / sample rows / SQL) an operator would want to inspect.
Fix: clip both at POSTHOG_LLM_PAYLOAD_MAX_CHARS (default 30000) with
an explicit "…[truncated N chars]" marker so readers don't mistake
truncated captures for complete ones. Metadata (provider, model,
tokens, latency, error) flows regardless. Three new tests cover
default-cap clipping, env-override, and pass-through under the cap.
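The clipping rule from fix 2, as a sketch. `clip_payload` is a hypothetical name and assumes the payload has already been serialized to a string; falling back to 30000 for a missing, non-numeric, or non-positive override is also an assumption.

```python
import os
from typing import Optional

DEFAULT_MAX_CHARS = 30000


def _max_chars_from_env() -> int:
    """POSTHOG_LLM_PAYLOAD_MAX_CHARS if it is a positive integer, else 30000."""
    try:
        value = int(os.environ.get("POSTHOG_LLM_PAYLOAD_MAX_CHARS", ""))
    except ValueError:
        return DEFAULT_MAX_CHARS
    return value if value > 0 else DEFAULT_MAX_CHARS


def clip_payload(text: str, max_chars: Optional[int] = None) -> str:
    """Clip text to the cap and say how much was cut, so readers see it is partial."""
    limit = _max_chars_from_env() if max_chars is None else max_chars
    if len(text) <= limit:
        return text
    dropped = len(text) - limit
    return f"{text[:limit]}…[truncated {dropped} chars]"
```

The cap counts characters while the ingest limit is on event size, so payloads heavy in non-ASCII text can still be larger in bytes than the character count suggests.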
37 PostHog tests pass.
# Agnes AI Data Analyst - Environment Variables
# =============================================
# Copy to .env: cp config/.env.template .env
# .env is gitignored - NEVER commit it.

# ── REQUIRED ────────────────────────────────────────
JWT_SECRET_KEY= # python -c "import secrets; print(secrets.token_hex(32))"
SESSION_SECRET= # python -c "import secrets; print(secrets.token_hex(32))"

# ── GOOGLE OAUTH (required for Google login) ────────
# GOOGLE_CLIENT_ID=
# GOOGLE_CLIENT_SECRET=

# ── KEBOOLA (required for Keboola data source) ──────
# KEBOOLA_STORAGE_TOKEN=
# KEBOOLA_STACK_URL=https://connection.keboola.com

# ── BIGQUERY (required for BigQuery data source) ─────
# BIGQUERY_PROJECT=
# BIGQUERY_LOCATION=us

# ── BOOTSTRAP (first deploy only) ───────────────────
# SEED_ADMIN_EMAIL=admin@example.com
# SEED_ADMIN_PASSWORD= # Dev helper only — sets password_hash on seed.
#                      # Never overwrites an existing password.

# ── EMAIL / SMTP (required for magic link auth) ─────
# SMTP_HOST=smtp.gmail.com
# SMTP_PORT=587
# SMTP_USER=
# SMTP_PASSWORD=

# ── OPTIONAL SERVICES ───────────────────────────────
# TELEGRAM_BOT_TOKEN=
# JIRA_WEBHOOK_SECRET=
# JIRA_API_TOKEN=
# ANTHROPIC_API_KEY=
# LLM_API_KEY=

# ── DESKTOP APP ─────────────────────────────────────
# DESKTOP_JWT_SECRET= # Separate secret for desktop app tokens

# ── DEPLOYMENT ──────────────────────────────────────
# DATA_DIR=/data # Default: /data in Docker, ./data locally
# LOG_LEVEL=info # debug, info, warning, error
# CORS_ORIGINS=http://localhost:3000,http://localhost:8000

# ── SCHEDULER (sidecar tuning) ──────────────────────
# All values are in seconds and must be positive integers. SCHEDULER_TICK_SECONDS
# must be <= the smallest job interval below.
# SCHEDULER_DATA_REFRESH_INTERVAL=900 # default 15 min — POST /api/sync/trigger
# SCHEDULER_HEALTH_CHECK_INTERVAL=300 # default 5 min — GET /api/health
# SCHEDULER_SCRIPT_RUN_INTERVAL=60 # default 1 min — POST /api/scripts/run-due
# SCHEDULER_TICK_SECONDS=30 # default 30 s — loop polling cadence

# ── HTTPS / REVERSE PROXY ───────────────────────────
# Set these when the app runs behind a TLS terminator (Caddy, Cloudflare
# Tunnel, nginx, GCP LB, etc.). The app itself speaks plain HTTP on :8000;
# the terminator is responsible for TLS.
#
# DOMAIN: public hostname. When set, session cookies get the `Secure` flag
# (browser only sends them over HTTPS). Also used by the Caddy
# profile to auto-provision Let's Encrypt certs.
# DOMAIN=data.yourcompany.com
#
# SERVER_URL: absolute base URL used to build OAuth callback URLs and other
# external links. Set this to avoid relying on the incoming
# request's Host header (which a misconfigured proxy can get
# wrong). Must match the redirect URI registered in OAuth apps.
# SERVER_URL=https://data.yourcompany.com
#
# Uvicorn is started with `--proxy-headers --forwarded-allow-ips='*'` so it
# trusts X-Forwarded-Proto / X-Forwarded-For from the reverse proxy.

# ── TLS TERMINATION (Caddy in cert-file mode) ───────
# When TLS_FULLCHAIN_URL is set, scripts/ops/agnes-tls-rotate.sh fetches
# the cert daily from this URL and reloads Caddy on diff (zero downtime).
# Empty -> no TLS, app serves plain HTTP on :8000. See docs/DEPLOYMENT.md
# -> TLS for the full bring-up flow.
#
# Supported URL schemes (all four are resolved by scripts/tls-fetch.sh):
#   sm://<secret-name>   Google Secret Manager (latest version)
#   gs://<bucket>/<obj>  GCS object
#   https://<url>        Plain HTTPS download (no redirects allowed)
#   file://<path>        Local file (dev/testing only)
#
# TLS_FULLCHAIN_URL=
#
# TLS_PRIVKEY_URL: optional. Empty -> on-VM RSA-2048 key + CSR auto-
# generated on first rotate tick (key never leaves the host; CSR at
# /data/state/certs/cert.csr to submit to your CA). Set to a URL when
# you want VM-replace resilience (e.g. sm://<secret>).
# TLS_PRIVKEY_URL=
#
# TLS_CSR_SUBJECT: stamped on auto-generated CSRs and on the self-signed
# bring-up cert that Caddy serves until your CA publishes the real chain.
# Defaults to /CN=$DOMAIN when unset.
# TLS_CSR_SUBJECT=/C=US/ST=California/L=San Francisco/O=Your Org/CN=data.yourcompany.com

# === Local development ===
# DEBUG=1 enables:
# - rich.logging.RichHandler (colored, with tracebacks)
# - fastapi-debug-toolbar mounted at the right edge of HTML pages
# - DuckDB query capture in the toolbar
# Note: FastAPI's own debug=True flag is intentionally NOT toggled. The
# Starlette ServerErrorMiddleware it installs would intercept unhandled
# exceptions and render a plain-HTML traceback before the custom 500 page
# (with debug toolbar) can run. See the comment on `app = FastAPI(...)` in
# app/main.py for details.
# Never set in production. Keep separate from LOCAL_DEV_MODE (auth bypass).
# IMPORTANT: app/main.py reads DEBUG once at process start to decide whether
# to mount the toolbar middleware, so the toolbar's mount status is fixed for
# the life of the process. The DuckDB connection wrapper in src/db.py reads
# DEBUG at call time, so per-connection instrumentation respects runtime env
# changes.
# DEBUG=1

# === Optional observability: PostHog ===
# Off by default. With POSTHOG_API_KEY unset the integration is fully disabled
# (no JS shipped to the browser, no client init, no network). Setting the key
# enables backend exception capture, LLM call tracing ($ai_generation events),
# frontend errors / pageviews, masked session replay, and feature flags.
# Operator guide: docs/observability.md.
#
# POSTHOG_API_KEY must be a PROJECT (publishable, "phc_...") key. The project
# key is embedded in the browser snippet — do NOT use a personal API key here.
# POSTHOG_API_KEY=phc_xxx
#
# Default points at PostHog's EU Cloud endpoint. Override for the US region or
# a self-hosted deployment.
# POSTHOG_HOST=https://eu.i.posthog.com
#
# Identification mode for logged-in users:
#   none   - never identify; distinct_id is a random cookie
#   id     - identify by user.id only (no PII)
#   email  - identify by user.id + email (default)
#   full   - id + email + name
# POSTHOG_IDENTIFY_PII=email
#
# Set to false to disable session replay while keeping the rest of the
# integration on (errors / events / flags still flow). Default true.
# POSTHOG_REPLAY=true
#
# Append a CSS selector to the default replay mask list. Useful when a custom
# template introduces a new sensitive surface (e.g. .customer-pii). Default
# masks: [data-sensitive], .data-cell, .query-result, .sql-output, code, pre.
# POSTHOG_REPLAY_MASK_SELECTOR=

# Ship prompt + completion bodies inside $ai_generation events. Off by default
# because LLM prompts in this product routinely include customer SQL / data.
# Token counts and latency always flow regardless.
# POSTHOG_LLM_PAYLOADS=0
#
# With payloads on, $ai_input and $ai_output_choices are each clipped to this
# many characters and marked "…[truncated N chars]". Default 30000.
# POSTHOG_LLM_PAYLOAD_MAX_CHARS=30000

# Environment label tagged on every captured event (super property).
# Use it in PostHog dashboards to split local / dev / staging / production.
# Resolution order when unset: LOCAL_DEV_MODE=1 -> "local"; else
# RELEASE_CHANNEL value; else AGNES_DEPLOYMENT_ENV; else "unknown".
# POSTHOG_ENVIRONMENT=production