* feat(auth): internal roles + external→internal group mapping (foundation)
Two-layer authorization model: external Cloud Identity groups (org-managed)
get mapped onto internal Agnes-defined capabilities (app-managed) via an
admin-curated many-to-many table. Per-request permission checks read off
the session — no DB hit. Refresh requires re-login.
Schema v8 — new tables:
- internal_roles (id, key UNIQUE, display_name, description, owner_module, …)
— app-defined capabilities like 'context_admin'. Modules self-register at
import; the startup hook syncs the registry into this table (idempotent).
- group_mappings (id, external_group_id, internal_role_id FK, …)
— admin-managed bindings, UNIQUE(external_group_id, internal_role_id).
app/auth/role_resolver.py — new module:
- register_internal_role(key, display_name, description, owner_module)
Module-author entry point. lower_snake_case key, immutable, validated.
Same key + same fields = no-op (re-import safe); same key + different
fields = ValueError so two modules can't silently overwrite each other.
- sync_registered_roles_to_db(conn) — startup reconciliation. Inserts new
keys, updates drifted metadata, never deletes (preserves mappings).
- resolve_internal_roles(external_groups, conn) — joins group_mappings.
Sorted, deduplicated role-key list. Plugged into google_callback +
dev-bypass branch in get_current_user.
- require_internal_role('key') — FastAPI dependency factory; reads
session.internal_roles; 403 with explicit message when missing.
Resolution runs at sign-in only (Google callback + LOCAL_DEV_GROUPS change
in dev-bypass) — same semantics as session.google_groups. No admin UI yet;
mappings created via repository directly until follow-up PR ships UI.
21 new tests in tests/test_role_resolver.py: register/list, idempotency,
collision detection, key-format validation; sync insert/update/no-delete;
resolve empty/single/many-to-many/malformed-input; e2e via
LOCAL_DEV_GROUPS — gated endpoint allowed/denied + direct session-cookie
inspection. Full sweep: 178/178 passed across auth + db + repo tests.
(Two pre-existing test_catalog_export.py failures verified unrelated.)
* fix(auth): polish review feedback — first-request dev populate + PAT doc
Two follow-ups from a code-reviewer pass on the foundation commit before
opening the PR:
- Dev-bypass populates session["internal_roles"] on the first request
after sign-in, not just when external groups change. The previous
guard only resolved when groups_changed=True, which left a hole for
the LOCAL_DEV_GROUPS=`""` (explicit empty) flow: target=[],
current=None, neither write branch fires, internal_roles stays
unset, and require_internal_role then 403s with no roles to check
against. The OAuth callback writes session["internal_roles"]
unconditionally on sign-in (even []); dev-bypass now matches that
semantics. Adds a single-pass populate gated on the key being
absent from the session, so subsequent same-state requests still
no-op (cheap session lookup, no resolver call).
- Document that internal roles are session-scoped and PAT/headless
clients will get 403 from any require_internal_role(...) endpoint.
Same constraint already applies to session.google_groups (PAT JWTs
deliberately don't snapshot group memberships — they could change
after issuance with no way to re-sign), but the doc didn't surface
this — an operator pointing a CLI at a role-gated endpoint would
see 403 with no clue why. New "PAT and headless requests" section
spells out the constraint, the rationale, and the three escape
valves (use users.role for the gate; route through OAuth; wait for
the planned `da admin grant-role` CLI helper).
54 auth tests still pass locally (21 role-resolver + 33 existing
auth-provider).
* release(0.11.3): cut release for the internal-roles foundation
Bumps pyproject.toml 0.11.2 → 0.11.3 and renames CHANGELOG's
[Unreleased] section to [0.11.3] — 2026-04-26 (with a fresh
empty [Unreleased] skeleton appended). Adds the matching
[0.11.3] link reference at the bottom of CHANGELOG so the
section heading renders as a hyperlink to the GitHub release
page once the tag lands.
The bullet itself is unchanged content; the rephrasing of
"dev-bypass when external groups change" → "dev-bypass —
populates on first request and whenever external groups
change, mirroring the OAuth callback's always-write
semantics" reflects the polish committed in d590579, plus
the appended PAT/headless caveat pointing at the doc
section that landed in the same polish pass.
* fix(auth): address review feedback from Pavel — PAT-specific 403, audit logs, hardening
Round-2 polish over the internal-roles foundation, addressing Pavel's review
on PR #71. No behavior change for the happy path; tightens the safety rails
and makes the failure modes self-explanatory.
User-visible:
- require_internal_role now distinguishes "no session" (Bearer/PAT caller)
from "signed in but missing role" and surfaces a PAT-specific 403 detail
in the first case ("This endpoint needs an interactive (OAuth) session
— Bearer/PAT tokens do not carry session-resolved roles by design").
- docs/internal-roles.md documents deactivate+reactivate as the supported
"force re-resolve now" lever for users that can't be made to log out.
Internal hardening:
- INFO-level audit log on every successful resolve (OAuth callback +
dev-bypass) so a wrong-role complaint is debuggable from the log alone.
- Startup warning when SESSION_SECRET is shorter than 32 chars, matching
the existing JWT_SECRET_KEY gate — both HMAC surfaces sign trust-laden
state (session.internal_roles, session.google_groups, JWTs).
- _clear_registry_for_tests() now refuses to run unless TESTING=1 so a
stray import path in production can't drop the registered capabilities.
Tests:
- 4 new tests in tests/test_role_resolver.py covering: stale-session
contract after a mid-session mapping revoke (pin the documented
limitation), PAT 403 detail wording, OAuth pipeline data flow from
external groups to internal_roles, and the dev-bypass empty-list
fallback when the resolver raises.
CHANGELOG.md updated under [0.11.3] (### Changed + ### Internal).
CLAUDE.md schema doc bumped from v7 to v8.
---------
Co-authored-by: Claude <noreply@anthropic.com>
13 KiB
Changelog
All notable changes to Agnes AI Data Analyst.
Format: Keep a Changelog. Versions follow Semantic Versioning, pre-1.0 — public surface (CLI flags, REST endpoints, instance.yaml schema, extract.duckdb contract) may shift between minor versions; breaking changes called out under Changed or Removed with the BREAKING marker.
CalVer image tags (stable-YYYY.MM.N, dev-YYYY.MM.N) are produced for every CI build; semver tags (v0.X.Y) are cut at release boundaries and reference the same commit as a stable-* tag from the same day.
[Unreleased]
0.11.3 — 2026-04-26
Authorization-foundation release — adds the internal-roles layer between Cloud Identity groups and per-module capability checks. Schema v8 migration; no admin UI yet (follow-up).
Added
- Internal roles + group mapping (foundation). Schema v8 adds two tables:
internal_roles(app-defined capabilities likecontext_admin,agent_operator, registered by Agnes modules at import time) andgroup_mappings(many-to-many bindings of Cloud Identity group IDs to internal role keys, managed by admins). Newapp.auth.role_resolvermodule exposesregister_internal_role(...)for module authors,sync_registered_roles_to_db(...)(run once at startup, idempotent),resolve_internal_roles(external_groups, conn)(called at sign-in, writes resolved keys intosession["internal_roles"]), and arequire_internal_role("…")FastAPI dependency factory for permission checks. Resolution runs at sign-in (Google OAuth callback + dev-bypass — populates on first request and whenever external groups change, mirroring the OAuth callback's always-write semantics). No DB hit per request. Refresh requires re-login, same semantics assession.google_groups. No admin UI yet — mapping rows must be created via the repository directly until the management UI ships in a follow-up. PAT/headless clients carry no session and therefore cannot passrequire_internal_rolegates by design —require_internal_roledistinguishes "signed-in but missing role" from "no session at all" and surfaces a PAT-specific 403 detail in the second case so an API consumer hitting the wall sees what to fix. Seedocs/internal-roles.md→ PAT and headless requests.
Changed
docs/internal-roles.mddocumentsAdmin → Users → deactivate then reactivateas the supported "force re-resolve now" lever for users you can't get to log out (long-lived sessions, automated clients) — invalidates the existing session and forces a fresh sign-in on the next request.
Internal
- INFO-level audit log on every successful resolve (OAuth callback + dev-bypass) so a "wrong role" complaint is debuggable from the log alone — admin can correlate "user X claims they lost access" with the resolver output without replaying the request.
- Startup warning when
SESSION_SECRETis shorter than 32 chars, matching the existingJWT_SECRET_KEYgate. Both HMAC surfaces sign trust-laden state (session.internal_roles,session.google_groups, JWTs) — keeping the two gates consistent so a weak secret gets surfaced at boot, not after a quiet downgrade. _clear_registry_for_tests()now refuses to run unlessTESTING=1so a stray import path in production can't drop the registered capabilities.
0.11.2 — 2026-04-26
Dev-experience patch release — make LOCAL_DEV_MODE realistic enough to actually exercise group-aware code paths on localhost, and consolidate scattered dev-onboarding instructions into a single docs/local-development.md.
Added
LOCAL_DEV_GROUPSenv var mockssession.google_groupsfor the auto-logged-in dev user whenLOCAL_DEV_MODE=1. JSON array matching the production shape ([{"id":"…","name":"…"}]) so group-aware UI and access-control code paths can be exercised onlocalhostwithout a Google OAuth round-trip. Honored only underLOCAL_DEV_MODE=1. The startup banner reports the parsed group IDs (or warns loudly when the value is set but malformed), so a typo gets surfaced at boot rather than silently on the first authenticated request. Session injection mirrors the production OAuth callback's "always-write" semantics — including clearing stale groups when the operator unsetsLOCAL_DEV_GROUPSmid-session. Seedocs/auth-groups.md→ Local-dev mock.make local-devnow seeds two default mocked groups (Local Dev Engineers+Local Dev Adminsonexample.com) viascripts/run-local-dev.sh, so first-boot/profileis non-empty out of the box. Override withLOCAL_DEV_GROUPS='[…]' make local-dev; disable withLOCAL_DEV_GROUPS= make local-dev.docs/local-development.md— single onboarding doc for working on Agnes locally: TL;DR, whatLOCAL_DEV_MODEactually bypasses, group mocking, what isn't mocked, and the security-rails reminder that dev mode must never reach a production deploy.
Internal
- Fix nightly
docker-e2eCI failures: refresh two stale assertions that had drifted from the live API.tests/test_docker_full.py::test_app_returns_html_on_rootnow expects the auth-aware302 → /login(root has redirected since the auth middleware landed);tests/test_e2e_docker.py::TestDockerHealth::test_health_has_duckdbnow readsservices["duckdb_state"](current health-payload shape, already validated bytests/test_api.py). No application behavior change — these only ran in the scheduled nightly job, so the drift went unnoticed for several PRs.
0.11.1 — 2026-04-26
Patch release — hotfix the missed Caddy env passthrough that should have shipped with 0.11.0, plus codify changelog discipline so this kind of drift gets caught at PR review time next time.
Fixed
docker-compose.ymlcaddy service now passesCADDY_TLSthrough to the container (- CADDY_TLSbare-form passthrough). Without it theCaddyfile{$CADDY_TLS:default}substitution always falls back to cert-file mode regardless of what the operator wrote into.env, and Caddy crash-loops on Let's Encrypt / internal-CA deployments. Should have shipped with #52; first attempt was #55, accidentally closed before merging.
Internal
CLAUDE.md— non-negotiable changelog discipline: every PR touching user-visible behavior must updateCHANGELOG.mdunder## [Unreleased]in the same PR.
0.11.0 — 2026-04-26
First tagged semver release. The version = "2.x" strings that appeared in earlier pyproject.toml snapshots were arbitrary placeholders from the initial scaffold and never reflected actual API maturity — resetting to pre-1.0 to signal that things may still shift.
Added — Auth
- Google Workspace groups on
/profile. OAuth callback fetches the signed-in user's group memberships via Cloud Identity (searchTransitiveGroupswith thesecuritylabel — seedocs/auth-groups.mdfor the GCP setup checklist and thesecurity-vs-discussion_forumgotcha). Profile link added to the user dropdown. - Password reset + invite flows for web and admin (
/auth/password/reset,/admin/users/invite). - Personal access tokens (PAT) with separate
:typ=patJWT claim, per-token revoke, last-used IP tracking, "My tokens" + admin "All tokens" UI. - Email magic-link provider (itsdangerous-signed token).
- Optional
SEED_ADMIN_PASSWORDto pre-hash the seed admin (dev convenience).
Added — Deploy
keboola-deploy.ymlworkflow. Tag-triggered alternative torelease.ymlfor shared dev VMs that want explicit "deploy when I tag" semantics. Publishes immutable:keboola-deploy-<tag>+ floating:keboola-deploy-latestalias.- Caddy + Let's Encrypt + corporate-CA TLS.
Caddyfileparametrized via$CADDY_TLSenv var so a single file serves three regimes: cert-file (corp PKI), Let's Encrypt auto-issue, Caddy-internal-CA. URL-driven cert rotation with self-signed fallback (scripts/grpn/agnes-tls-rotate.sh).docker-compose.tls.ymloverlay closes host:8000when Caddy fronts. dev_instancesschema incustomer-instanceTerraform module gains optionaltls_mode+domain(mirrorsprod_instance).infra-v1.6.0tag.- Optional Google OAuth credentials from Secret Manager. Module reads
google-oauth-client-{id,secret}at boot if present; graceful fallback so non-Google deployments aren't affected. LOCAL_DEV_MODE+make local-dev-up/local-dev-downfor one-keystroke local stack with magic-link auth pre-wired.- Per-developer
dev-<prefix>-latestGHCR alias for branches matching<prefix>/<branch>— push-to-deploy on personal dev VMs. /setupweb wizard for first-time instance setup, plus headlessPOST /api/admin/configureandPOST /api/admin/discover-and-register.- Smoke-test job in CI (Docker-in-CI after every release) +
scripts/smoke-test.shfor post-deploy verification.
Added — CLI
- Wheel distribution + auto-update check on startup.
--versionflag,--dry-run+X/Nprogress onda sync, durable sync (atomic writes + manifest hash + retry on transient errors).- gzip on JSON/HTML responses (server-side).
Added — Data
- Remote query engine. Two-phase BigQuery + DuckDB engine for tables too large to sync locally (
--register-bqflag). - Business metrics. Standardized
metric_definitionstable in DuckDB with starter pack importer (da metrics import). /api/healthreturnsversion,channel,commit_sha,image_tag,schema_version.- Custom connector mount support (
connectors/custom/). - OpenAPI snapshot test for breaking-change detection.
Added — Docs / tooling
docs/auth-groups.md,docs/DEPLOYMENT.md,docs/HACKATHON.md,docs/ONBOARDING.mdrunbooks.scripts/debug/probe_google_groups.py— stdlib-only probe for diagnosing Cloud Identity API issues without a deploy cycle.- Schema migration safety tests (idempotency, data preservation, snapshot).
- Pre-migration snapshot of
system.duckdbbefore schema upgrades. - Auto-generated JWT and session secrets with file persistence (
/data/state/.jwt_secret). - Startup banner logging version, channel, and schema version.
Changed
- BREAKING (deployment) — Caddy compose profile renamed
production→tls. Existingdocker compose --profile production up -dinvocations need to switch. - BREAKING (deployment) — Default
Caddyfilemode is now cert-file (tls /certs/fullchain.pem /certs/privkey.pem); for the previous Let's Encrypt auto-issue behaviour setCADDY_TLS=tls <ops-email>in.env. Seedocs/auth-groups.mdandCaddyfileinline docs. - Schema migration v5→v6→v7: adds
users.active,personal_access_tokenstable,personal_access_tokens.last_used_ip. Auto-applied at boot. - Image-level
AGNES_VERSIONnow sourced frompyproject.tomlat build time (no more drift betweenda --versionand the package metadata). - Vendor-agnostic OSS rule codified in
CLAUDE.md— customer-specific names, hostnames, project IDs belong in consumer infra repos, not in this OSS distribution.
Fixed — Security
- Open-redirect guard for backslash in
safe_next_path. SessionMiddleware max_age=3600 + https_only(was browser-session forever, plain-HTTP-OK).- Timezone-aware datetimes in Keboola metadata cache.
- Atomic magic-link token consumption (closes double-use race under concurrent clicks).
- Bootstrap backdoor closed when passwordless seed admin exists.
- urllib3 1.26→2.6.3 (resolves 4 Dependabot security alerts).
- argon2-cffi adopted for password hashing.
- See docs/padak-security.md for the full audit.
Fixed — Other
uvicorn --proxy-headers --forwarded-allow-ips='*'so OAuth callbacks resolve to https when behind a TLS terminator.scripts/grpn/agnes-tls-rotate.shhardened:--max-redirs 0+--proto '=https'on cert fetch, post-fetch PEM validation (rejects HTML error pages from corp portals),ulimit -c 0to suppress coredumps that could leak the unencrypted privkey, POSIX-safe${arr[@]+"${arr[@]}"}array expansion.scripts/tls-fetch.sh— generic URL fetcher (sm://,gs://,https://,file://) with redirect refusal + PEM validation.kbcstoragemoved to optional dep — unblocks urllib3 security updates; primary Keboola path now uses the DuckDB Keboola extension.- Dependencies consolidated into
pyproject.toml(no morerequirements.txt).
Internal
- Test suite expanded to 1357+ tests (4 layers — unit, integration, web smoke, journey).