* feat(auth): internal roles + external→internal group mapping (foundation)
Two-layer authorization model: external Cloud Identity groups (org-managed)
get mapped onto internal Agnes-defined capabilities (app-managed) via an
admin-curated many-to-many table. Per-request permission checks read off
the session — no DB hit. Refresh requires re-login.
Schema v8 — new tables:
- internal_roles (id, key UNIQUE, display_name, description, owner_module, …)
— app-defined capabilities like 'context_admin'. Modules self-register at
import; the startup hook syncs the registry into this table (idempotent).
- group_mappings (id, external_group_id, internal_role_id FK, …)
— admin-managed bindings, UNIQUE(external_group_id, internal_role_id).
app/auth/role_resolver.py — new module:
- register_internal_role(key, display_name, description, owner_module)
Module-author entry point. lower_snake_case key, immutable, validated.
Same key + same fields = no-op (re-import safe); same key + different
fields = ValueError so two modules can't silently overwrite each other.
- sync_registered_roles_to_db(conn) — startup reconciliation. Inserts new
keys, updates drifted metadata, never deletes (preserves mappings).
- resolve_internal_roles(external_groups, conn) — joins group_mappings.
Sorted, deduplicated role-key list. Plugged into google_callback +
dev-bypass branch in get_current_user.
- require_internal_role('key') — FastAPI dependency factory; reads
session.internal_roles; 403 with explicit message when missing.
Resolution runs at sign-in only (Google callback + LOCAL_DEV_GROUPS change
in dev-bypass) — same semantics as session.google_groups. No admin UI yet;
mappings created via repository directly until follow-up PR ships UI.
21 new tests in tests/test_role_resolver.py: register/list, idempotency,
collision detection, key-format validation; sync insert/update/no-delete;
resolve empty/single/many-to-many/malformed-input; e2e via
LOCAL_DEV_GROUPS — gated endpoint allowed/denied + direct session-cookie
inspection. Full sweep: 178/178 passed across auth + db + repo tests.
(Two pre-existing test_catalog_export.py failures verified unrelated.)
* fix(auth): polish review feedback — first-request dev populate + PAT doc
Two follow-ups from a code-reviewer pass on the foundation commit before
opening the PR:
- Dev-bypass populates session["internal_roles"] on the first request
after sign-in, not just when external groups change. The previous
guard only resolved when groups_changed=True, which left a hole for
the LOCAL_DEV_GROUPS=`""` (explicit empty) flow: target=[],
current=None, neither write branch fires, internal_roles stays
unset, and require_internal_role then 403s with no roles to check
against. The OAuth callback writes session["internal_roles"]
unconditionally on sign-in (even []); dev-bypass now matches that
semantics. Adds a single-pass populate gated on the key being
absent from the session, so subsequent same-state requests still
no-op (cheap session lookup, no resolver call).
- Document that internal roles are session-scoped and PAT/headless
clients will get 403 from any require_internal_role(...) endpoint.
Same constraint already applies to session.google_groups (PAT JWTs
deliberately don't snapshot group memberships — they could change
after issuance with no way to re-sign), but the doc didn't surface
this — an operator pointing a CLI at a role-gated endpoint would
see 403 with no clue why. New "PAT and headless requests" section
spells out the constraint, the rationale, and the three escape
valves (use users.role for the gate; route through OAuth; wait for
the planned `da admin grant-role` CLI helper).
54 auth tests still pass locally (21 role-resolver + 33 existing
auth-provider).
* release(0.11.3): cut release for the internal-roles foundation
Bumps pyproject.toml 0.11.2 → 0.11.3 and renames CHANGELOG's
[Unreleased] section to [0.11.3] — 2026-04-26 (with a fresh
empty [Unreleased] skeleton appended). Adds the matching
[0.11.3] link reference at the bottom of CHANGELOG so the
section heading renders as a hyperlink to the GitHub release
page once the tag lands.
The bullet itself is unchanged content; the rephrasing of
"dev-bypass when external groups change" → "dev-bypass —
populates on first request and whenever external groups
change, mirroring the OAuth callback's always-write
semantics" reflects the polish committed in d590579, plus
the appended PAT/headless caveat pointing at the doc
section that landed in the same polish pass.
* fix(auth): address review feedback from Pavel — PAT-specific 403, audit logs, hardening
Round-2 polish over the internal-roles foundation, addressing Pavel's review
on PR #71. No behavior change for the happy path; tightens the safety rails
and makes the failure modes self-explanatory.
User-visible:
- require_internal_role now distinguishes "no session" (Bearer/PAT caller)
from "signed in but missing role" and surfaces a PAT-specific 403 detail
in the first case ("This endpoint needs an interactive (OAuth) session
— Bearer/PAT tokens do not carry session-resolved roles by design").
- docs/internal-roles.md documents deactivate+reactivate as the supported
"force re-resolve now" lever for users that can't be made to log out.
Internal hardening:
- INFO-level audit log on every successful resolve (OAuth callback +
dev-bypass) so a wrong-role complaint is debuggable from the log alone.
- Startup warning when SESSION_SECRET is shorter than 32 chars, matching
the existing JWT_SECRET_KEY gate — both HMAC surfaces sign trust-laden
state (session.internal_roles, session.google_groups, JWTs).
- _clear_registry_for_tests() now refuses to run unless TESTING=1 so a
stray import path in production can't drop the registered capabilities.
Tests:
- 4 new tests in tests/test_role_resolver.py covering: stale-session
contract after a mid-session mapping revoke (pin the documented
limitation), PAT 403 detail wording, OAuth pipeline data flow from
external groups to internal_roles, and the dev-bypass empty-list
fallback when the resolver raises.
CHANGELOG.md updated under [0.11.3] (### Changed + ### Internal).
CLAUDE.md schema doc bumped from v7 to v8.
---------
Co-authored-by: Claude <noreply@anthropic.com>
140 lines
13 KiB
Markdown
140 lines
13 KiB
Markdown
# Changelog
|
|
|
|
All notable changes to Agnes AI Data Analyst.
|
|
|
|
Format: [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Versions follow [Semantic Versioning](https://semver.org/spec/v2.0.0.html), pre-1.0 — public surface (CLI flags, REST endpoints, `instance.yaml` schema, `extract.duckdb` contract) may shift between minor versions; breaking changes called out under **Changed** or **Removed** with the **BREAKING** marker.
|
|
|
|
CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every CI build; semver tags (`v0.X.Y`) are cut at release boundaries and reference the same commit as a `stable-*` tag from the same day.
|
|
|
|
---
|
|
|
|
## [Unreleased]
|
|
|
|
<!-- Add bullets here. Group: Added / Changed / Fixed / Removed / Internal.
|
|
Mark breaking changes with **BREAKING** at the start of the bullet. -->
|
|
|
|
## [0.11.3] — 2026-04-26
|
|
|
|
Authorization-foundation release — adds the internal-roles layer between Cloud Identity groups and per-module capability checks. Schema v8 migration; no admin UI yet (follow-up).
|
|
|
|
### Added
|
|
|
|
- **Internal roles + group mapping (foundation).** Schema v8 adds two tables: `internal_roles` (app-defined capabilities like `context_admin`, `agent_operator`, registered by Agnes modules at import time) and `group_mappings` (many-to-many bindings of Cloud Identity group IDs to internal role keys, managed by admins). New `app.auth.role_resolver` module exposes `register_internal_role(...)` for module authors, `sync_registered_roles_to_db(...)` (run once at startup, idempotent), `resolve_internal_roles(external_groups, conn)` (called at sign-in, writes resolved keys into `session["internal_roles"]`), and a `require_internal_role("…")` FastAPI dependency factory for permission checks. Resolution runs at sign-in (Google OAuth callback + dev-bypass — populates on first request and whenever external groups change, mirroring the OAuth callback's always-write semantics). No DB hit per request. Refresh requires re-login, same semantics as `session.google_groups`. **No admin UI yet** — mapping rows must be created via the repository directly until the management UI ships in a follow-up. PAT/headless clients carry no session and therefore cannot pass `require_internal_role` gates by design — `require_internal_role` distinguishes "signed-in but missing role" from "no session at all" and surfaces a PAT-specific 403 detail in the second case so an API consumer hitting the wall sees what to fix. See `docs/internal-roles.md` → *PAT and headless requests*.
|
|
|
|
### Changed
|
|
|
|
- `docs/internal-roles.md` documents `Admin → Users → deactivate then reactivate` as the supported "force re-resolve now" lever for users you can't get to log out (long-lived sessions, automated clients) — invalidates the existing session and forces a fresh sign-in on the next request.
|
|
|
|
### Internal
|
|
|
|
- INFO-level audit log on every successful resolve (OAuth callback + dev-bypass) so a "wrong role" complaint is debuggable from the log alone — admin can correlate "user X claims they lost access" with the resolver output without replaying the request.
|
|
- Startup warning when `SESSION_SECRET` is shorter than 32 chars, matching the existing `JWT_SECRET_KEY` gate. Both HMAC surfaces sign trust-laden state (`session.internal_roles`, `session.google_groups`, JWTs) — keeping the two gates consistent so a weak secret gets surfaced at boot, not after a quiet downgrade.
|
|
- `_clear_registry_for_tests()` now refuses to run unless `TESTING=1` so a stray import path in production can't drop the registered capabilities.
|
|
|
|
## [0.11.2] — 2026-04-26
|
|
|
|
Dev-experience patch release — make `LOCAL_DEV_MODE` realistic enough to actually exercise group-aware code paths on `localhost`, and consolidate scattered dev-onboarding instructions into a single `docs/local-development.md`.
|
|
|
|
### Added
|
|
|
|
- **`LOCAL_DEV_GROUPS` env var** mocks `session.google_groups` for the auto-logged-in dev user when `LOCAL_DEV_MODE=1`. JSON array matching the production shape (`[{"id":"…","name":"…"}]`) so group-aware UI and access-control code paths can be exercised on `localhost` without a Google OAuth round-trip. Honored only under `LOCAL_DEV_MODE=1`. The startup banner reports the parsed group IDs (or warns loudly when the value is set but malformed), so a typo gets surfaced at boot rather than silently on the first authenticated request. Session injection mirrors the production OAuth callback's "always-write" semantics — including clearing stale groups when the operator unsets `LOCAL_DEV_GROUPS` mid-session. See `docs/auth-groups.md` → *Local-dev mock*.
|
|
- **`make local-dev` now seeds two default mocked groups** (`Local Dev Engineers` + `Local Dev Admins` on `example.com`) via `scripts/run-local-dev.sh`, so first-boot `/profile` is non-empty out of the box. Override with `LOCAL_DEV_GROUPS='[…]' make local-dev`; disable with `LOCAL_DEV_GROUPS= make local-dev`.
|
|
- **`docs/local-development.md`** — single onboarding doc for working on Agnes locally: TL;DR, what `LOCAL_DEV_MODE` actually bypasses, group mocking, what isn't mocked, and the security-rails reminder that dev mode must never reach a production deploy.
|
|
|
|
### Internal
|
|
|
|
- Fix nightly `docker-e2e` CI failures: refresh two stale assertions that had drifted from the live API. `tests/test_docker_full.py::test_app_returns_html_on_root` now expects the auth-aware `302 → /login` (root has redirected since the auth middleware landed); `tests/test_e2e_docker.py::TestDockerHealth::test_health_has_duckdb` now reads `services["duckdb_state"]` (current health-payload shape, already validated by `tests/test_api.py`). No application behavior change — these only ran in the scheduled nightly job, so the drift went unnoticed for several PRs.
|
|
|
|
## [0.11.1] — 2026-04-26
|
|
|
|
Patch release — hotfix the missed Caddy env passthrough that should have shipped with 0.11.0, plus codify changelog discipline so this kind of drift gets caught at PR review time next time.
|
|
|
|
### Fixed
|
|
|
|
- `docker-compose.yml` caddy service now passes `CADDY_TLS` through to the container (`- CADDY_TLS` bare-form passthrough). Without it the `Caddyfile` `{$CADDY_TLS:default}` substitution always falls back to cert-file mode regardless of what the operator wrote into `.env`, and Caddy crash-loops on Let's Encrypt / internal-CA deployments. Should have shipped with #52; first attempt was #55, accidentally closed before merging.
|
|
|
|
### Internal
|
|
|
|
- `CLAUDE.md` — non-negotiable changelog discipline: every PR touching user-visible behavior must update `CHANGELOG.md` under `## [Unreleased]` in the same PR.
|
|
|
|
## [0.11.0] — 2026-04-26
|
|
|
|
First tagged semver release. The `version = "2.x"` strings that appeared in earlier `pyproject.toml` snapshots were arbitrary placeholders from the initial scaffold and never reflected actual API maturity — resetting to pre-1.0 to signal that things may still shift.
|
|
|
|
### Added — Auth
|
|
|
|
- **Google Workspace groups on `/profile`.** OAuth callback fetches the signed-in user's group memberships via Cloud Identity (`searchTransitiveGroups` with the `security` label — see `docs/auth-groups.md` for the GCP setup checklist and the `security`-vs-`discussion_forum` gotcha). Profile link added to the user dropdown.
|
|
- **Password reset + invite flows** for web and admin (`/auth/password/reset`, `/admin/users/invite`).
|
|
- **Personal access tokens (PAT)** with separate `:typ=pat` JWT claim, per-token revoke, last-used IP tracking, "My tokens" + admin "All tokens" UI.
|
|
- **Email magic-link provider** (itsdangerous-signed token).
|
|
- **Optional `SEED_ADMIN_PASSWORD`** to pre-hash the seed admin (dev convenience).
|
|
|
|
### Added — Deploy
|
|
|
|
- **`keboola-deploy.yml` workflow.** Tag-triggered alternative to `release.yml` for shared dev VMs that want explicit "deploy when I tag" semantics. Publishes immutable `:keboola-deploy-<tag>` + floating `:keboola-deploy-latest` alias.
|
|
- **Caddy + Let's Encrypt + corporate-CA TLS.** `Caddyfile` parametrized via `$CADDY_TLS` env var so a single file serves three regimes: cert-file (corp PKI), Let's Encrypt auto-issue, Caddy-internal-CA. URL-driven cert rotation with self-signed fallback (`scripts/grpn/agnes-tls-rotate.sh`). `docker-compose.tls.yml` overlay closes host `:8000` when Caddy fronts.
|
|
- **`dev_instances` schema in `customer-instance` Terraform module** gains optional `tls_mode` + `domain` (mirrors `prod_instance`). `infra-v1.6.0` tag.
|
|
- **Optional Google OAuth credentials from Secret Manager.** Module reads `google-oauth-client-{id,secret}` at boot if present; graceful fallback so non-Google deployments aren't affected.
|
|
- **`LOCAL_DEV_MODE` + `make local-dev-up` / `local-dev-down`** for one-keystroke local stack with magic-link auth pre-wired.
|
|
- **Per-developer `dev-<prefix>-latest` GHCR alias** for branches matching `<prefix>/<branch>` — push-to-deploy on personal dev VMs.
|
|
- **`/setup` web wizard** for first-time instance setup, plus headless `POST /api/admin/configure` and `POST /api/admin/discover-and-register`.
|
|
- **Smoke-test job in CI** (Docker-in-CI after every release) + `scripts/smoke-test.sh` for post-deploy verification.
|
|
|
|
### Added — CLI
|
|
|
|
- **Wheel distribution** + auto-update check on startup.
|
|
- `--version` flag, `--dry-run` + `X/N` progress on `da sync`, durable sync (atomic writes + manifest hash + retry on transient errors).
|
|
- gzip on JSON/HTML responses (server-side).
|
|
|
|
### Added — Data
|
|
|
|
- **Remote query engine.** Two-phase BigQuery + DuckDB engine for tables too large to sync locally (`--register-bq` flag).
|
|
- **Business metrics.** Standardized `metric_definitions` table in DuckDB with starter pack importer (`da metrics import`).
|
|
- **`/api/health`** returns `version`, `channel`, `commit_sha`, `image_tag`, `schema_version`.
|
|
- **Custom connector mount support** (`connectors/custom/`).
|
|
- **OpenAPI snapshot test** for breaking-change detection.
|
|
|
|
### Added — Docs / tooling
|
|
|
|
- `docs/auth-groups.md`, `docs/DEPLOYMENT.md`, `docs/HACKATHON.md`, `docs/ONBOARDING.md` runbooks.
|
|
- `scripts/debug/probe_google_groups.py` — stdlib-only probe for diagnosing Cloud Identity API issues without a deploy cycle.
|
|
- Schema migration safety tests (idempotency, data preservation, snapshot).
|
|
- Pre-migration snapshot of `system.duckdb` before schema upgrades.
|
|
- Auto-generated JWT and session secrets with file persistence (`/data/state/.jwt_secret`).
|
|
- Startup banner logging version, channel, and schema version.
|
|
|
|
### Changed
|
|
|
|
- **BREAKING (deployment)** — Caddy compose profile renamed `production` → `tls`. Existing `docker compose --profile production up -d` invocations need to switch.
|
|
- **BREAKING (deployment)** — Default `Caddyfile` mode is now cert-file (`tls /certs/fullchain.pem /certs/privkey.pem`); for the previous Let's Encrypt auto-issue behaviour set `CADDY_TLS=tls <ops-email>` in `.env`. See `docs/auth-groups.md` and `Caddyfile` inline docs.
|
|
- Schema migration v5→v6→v7: adds `users.active`, `personal_access_tokens` table, `personal_access_tokens.last_used_ip`. Auto-applied at boot.
|
|
- Image-level `AGNES_VERSION` now sourced from `pyproject.toml` at build time (no more drift between `da --version` and the package metadata).
|
|
- **Vendor-agnostic OSS rule** codified in `CLAUDE.md` — customer-specific names, hostnames, project IDs belong in consumer infra repos, not in this OSS distribution.
|
|
|
|
### Fixed — Security
|
|
|
|
- Open-redirect guard for backslash in `safe_next_path`.
|
|
- `SessionMiddleware max_age=3600 + https_only` (was browser-session forever, plain-HTTP-OK).
|
|
- Timezone-aware datetimes in Keboola metadata cache.
|
|
- Atomic magic-link token consumption (closes double-use race under concurrent clicks).
|
|
- Bootstrap backdoor closed when passwordless seed admin exists.
|
|
- urllib3 1.26→2.6.3 (resolves 4 Dependabot security alerts).
|
|
- argon2-cffi adopted for password hashing.
|
|
- See [docs/padak-security.md](docs/padak-security.md) for the full audit.
|
|
|
|
### Fixed — Other
|
|
|
|
- `uvicorn --proxy-headers --forwarded-allow-ips='*'` so OAuth callbacks resolve to https when behind a TLS terminator.
|
|
- `scripts/grpn/agnes-tls-rotate.sh` hardened: `--max-redirs 0` + `--proto '=https'` on cert fetch, post-fetch PEM validation (rejects HTML error pages from corp portals), `ulimit -c 0` to suppress coredumps that could leak the unencrypted privkey, POSIX-safe `${arr[@]+"${arr[@]}"}` array expansion.
|
|
- `scripts/tls-fetch.sh` — generic URL fetcher (`sm://`, `gs://`, `https://`, `file://`) with redirect refusal + PEM validation.
|
|
- `kbcstorage` moved to optional dep — unblocks urllib3 security updates; primary Keboola path now uses the DuckDB Keboola extension.
|
|
- Dependencies consolidated into `pyproject.toml` (no more `requirements.txt`).
|
|
|
|
### Internal
|
|
|
|
- Test suite expanded to 1357+ tests (4 layers — unit, integration, web smoke, journey).
|
|
|
|
[0.11.3]: https://github.com/keboola/agnes-the-ai-analyst/releases/tag/v0.11.3
|
|
[0.11.2]: https://github.com/keboola/agnes-the-ai-analyst/releases/tag/v0.11.2
|
|
[0.11.1]: https://github.com/keboola/agnes-the-ai-analyst/releases/tag/v0.11.1
|
|
[0.11.0]: https://github.com/keboola/agnes-the-ai-analyst/releases/tag/v0.11.0
|