* feat(auth): internal roles + external→internal group mapping (foundation)
Two-layer authorization model: external Cloud Identity groups (org-managed)
get mapped onto internal Agnes-defined capabilities (app-managed) via an
admin-curated many-to-many table. Per-request permission checks read off
the session — no DB hit. Refresh requires re-login.

Schema v8 — new tables:
- internal_roles (id, key UNIQUE, display_name, description, owner_module, …)
  — app-defined capabilities like 'context_admin'. Modules self-register at
  import; the startup hook syncs the registry into this table (idempotent).
- group_mappings (id, external_group_id, internal_role_id FK, …)
  — admin-managed bindings, UNIQUE(external_group_id, internal_role_id).
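
The two tables and the many-to-many join behind role resolution can be sketched with the stdlib sqlite3 module standing in for DuckDB. This is a minimal sketch under assumptions. Only the columns named above are modelled, and the elided `…` columns are left out. The query is a guess at the shape of a "sorted, deduplicated role keys for these groups" lookup, not the actual SQL in GroupMappingsRepository. The group IDs and the `catalog_viewer` key are hypothetical.

```python
import sqlite3

# Assumption: sqlite3 stands in for DuckDB; only the listed columns are modelled.
SCHEMA = """
CREATE TABLE internal_roles (
    id           TEXT PRIMARY KEY,
    key          TEXT NOT NULL UNIQUE,
    display_name TEXT NOT NULL,
    description  TEXT,
    owner_module TEXT
);
CREATE TABLE group_mappings (
    id                TEXT PRIMARY KEY,
    external_group_id TEXT NOT NULL,
    internal_role_id  TEXT NOT NULL REFERENCES internal_roles(id),
    UNIQUE (external_group_id, internal_role_id)
);
"""


def resolve_role_keys(conn: sqlite3.Connection, group_ids: list[str]) -> list[str]:
    """Hypothetical resolver: sorted, de-duplicated role keys for the given groups."""
    if not group_ids:
        return []
    placeholders = ", ".join("?" for _ in group_ids)
    sql = (
        "SELECT DISTINCT r.key FROM group_mappings AS m "
        "JOIN internal_roles AS r ON r.id = m.internal_role_id "
        f"WHERE m.external_group_id IN ({placeholders}) "
        "ORDER BY r.key"
    )
    return [key for (key,) in conn.execute(sql, group_ids).fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO internal_roles (id, key, display_name) VALUES (?, ?, ?)",
    [("r1", "context_admin", "Context admin"), ("r2", "catalog_viewer", "Catalog viewer")],
)
# Many-to-many: two groups grant context_admin; groups/ops also grants catalog_viewer.
conn.executemany(
    "INSERT INTO group_mappings (id, external_group_id, internal_role_id) VALUES (?, ?, ?)",
    [("m1", "groups/eng", "r1"), ("m2", "groups/ops", "r1"), ("m3", "groups/ops", "r2")],
)
print(resolve_role_keys(conn, ["groups/eng", "groups/ops"]))
# ['catalog_viewer', 'context_admin']
```

DISTINCT collapses the duplicate `context_admin` that two groups grant. The UNIQUE pair constraint rejects a second identical binding at insert time.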

app/auth/role_resolver.py — new module:
- register_internal_role(key, display_name, description, owner_module)
  Module-author entry point. lower_snake_case key, immutable, validated.
  Same key + same fields = no-op (re-import safe); same key + different
  fields = ValueError so two modules can't silently overwrite each other.
- sync_registered_roles_to_db(conn) — startup reconciliation. Inserts new
  keys, updates drifted metadata, never deletes (preserves mappings).
- resolve_internal_roles(external_groups, conn) — joins group_mappings and
  returns a sorted, deduplicated role-key list. Called from google_callback
  and from the dev-bypass branch in get_current_user.
- require_internal_role('key') — FastAPI dependency factory; reads
  session.internal_roles; 403 with explicit message when missing.

Resolution runs at sign-in only (Google callback + LOCAL_DEV_GROUPS change
in dev-bypass), the same semantics as session.google_groups. No admin UI yet;
mappings are created via the repository directly until a follow-up PR ships
the UI.

21 new tests in tests/test_role_resolver.py: register/list, idempotency,
collision detection, key-format validation; sync insert/update/no-delete;
resolve empty/single/many-to-many/malformed-input; e2e via
LOCAL_DEV_GROUPS (gated endpoint allowed/denied + direct session-cookie
inspection). Full sweep: 178/178 passed across auth + db + repo tests.
(Two pre-existing failures in test_catalog_export.py were verified as
unrelated.)
* fix(auth): polish review feedback — first-request dev populate + PAT doc

Two follow-ups from a code-reviewer pass on the foundation commit before
opening the PR:
- Dev-bypass populates session["internal_roles"] on the first request
  after sign-in, not just when external groups change. The previous
  guard only resolved when groups_changed=True, which left a hole for
  the LOCAL_DEV_GROUPS=`""` (explicit empty) flow: target=[],
  current=None, neither write branch fires, internal_roles stays
  unset, and require_internal_role then 403s with no roles to check
  against. The OAuth callback writes session["internal_roles"]
  unconditionally on sign-in (even []); dev-bypass now matches those
  semantics. Adds a single-pass populate gated on the key being
  absent from the session, so subsequent same-state requests still
  no-op (cheap session lookup, no resolver call).
- Document that internal roles are session-scoped and PAT/headless
  clients will get 403 from any require_internal_role(...) endpoint.
  Same constraint already applies to session.google_groups (PAT JWTs
  deliberately don't snapshot group memberships — they could change
  after issuance with no way to re-sign), but the doc didn't surface
  this — an operator pointing a CLI at a role-gated endpoint would
  see 403 with no clue why. New "PAT and headless requests" section
  spells out the constraint, the rationale, and the three escape
  valves (use users.role for the gate; route through OAuth; wait for
  the planned `da admin grant-role` CLI helper).
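
The corrected dev-bypass guard in the first bullet can be sketched as a pure function over the session dict. This is an illustration under assumptions, not the get_current_user code. `sync_dev_session`, the injected `resolve` callable, and treating "never set" and "explicitly empty" groups as the same state are all hypothetical. Only the session keys google_groups and internal_roles come from the text.

```python
from typing import Callable


def sync_dev_session(
    session: dict,
    target_groups: list[dict],
    resolve: Callable[[list[dict]], list[str]],
) -> bool:
    """Hypothetical dev-bypass step; returns True when the resolver ran."""
    current = session.get("google_groups")
    # None (never set) and [] (explicitly empty) count as the same group
    # state here; that equivalence is what hid the original bug.
    groups_changed = (current or []) != target_groups
    if groups_changed:
        session["google_groups"] = target_groups
    # The fix: also populate when internal_roles was never written, matching
    # the OAuth callback's always-write (even []) semantics.
    if groups_changed or "internal_roles" not in session:
        session["internal_roles"] = resolve(target_groups)
        return True
    return False  # same state as the last request: no resolver call


def fake_resolve(groups: list[dict]) -> list[str]:
    # Stand-in resolver: only the hypothetical "groups/eng" maps to a role.
    return ["context_admin"] if any(g.get("id") == "groups/eng" for g in groups) else []


demo: dict = {}
print(sync_dev_session(demo, [], fake_resolve), demo)
# True {'internal_roles': []}
print(sync_dev_session(demo, [], fake_resolve))
# False
```

With the old groups_changed-only guard, the first call would have returned False and left internal_roles unset. That is the LOCAL_DEV_GROUPS=`""` hole described above.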
54 auth tests still pass locally (21 role-resolver + 33 existing
auth-provider).
* release(0.11.3): cut release for the internal-roles foundation
Bumps pyproject.toml 0.11.2 → 0.11.3 and renames CHANGELOG's
[Unreleased] section to [0.11.3] — 2026-04-26 (with a fresh
empty [Unreleased] skeleton appended). Adds the matching
[0.11.3] link reference at the bottom of CHANGELOG so the
section heading renders as a hyperlink to the GitHub release
page once the tag lands.

The bullet's content is otherwise unchanged: the rephrasing of
"dev-bypass when external groups change" → "dev-bypass —
populates on first request and whenever external groups
change, mirroring the OAuth callback's always-write
semantics" reflects the polish committed in d590579, and
the appended PAT/headless caveat points at the doc
section that landed in the same polish pass.
* fix(auth): address review feedback from Pavel — PAT-specific 403, audit logs, hardening
Round-2 polish over the internal-roles foundation, addressing Pavel's review
on PR #71. No behavior change for the happy path; tightens the safety rails
and makes the failure modes self-explanatory.

User-visible:
- require_internal_role now distinguishes "no session" (Bearer/PAT caller)
  from "signed in but missing role" and surfaces a PAT-specific 403 detail
  in the first case ("This endpoint needs an interactive (OAuth) session
  — Bearer/PAT tokens do not carry session-resolved roles by design").
- docs/internal-roles.md documents deactivate+reactivate as the supported
  "force re-resolve now" lever for users who can't be made to log out.

Internal hardening:
- INFO-level audit log on every successful resolve (OAuth callback +
  dev-bypass) so a wrong-role complaint is debuggable from the log alone.
- Startup warning when SESSION_SECRET is shorter than 32 chars, matching
  the existing JWT_SECRET_KEY gate — both HMAC surfaces sign trust-laden
  state (session.internal_roles, session.google_groups, JWTs).
- _clear_registry_for_tests() now refuses to run unless TESTING is set to
  1 or true, so a stray import path in production can't drop the
  registered capabilities.
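
The SESSION_SECRET check can be sketched as a small startup helper that applies the same 32-character threshold to both variables named above. Several parts are assumptions: the helper name `warn_on_short_secrets`, the log message, and treating an unset variable as empty. The commit does not say whether the existing JWT_SECRET_KEY gate warns or refuses to start, so this sketch only warns.

```python
import logging
from collections.abc import Mapping

logger = logging.getLogger("startup_checks")  # hypothetical logger name

MIN_SECRET_LEN = 32  # threshold stated in the commit message


def warn_on_short_secrets(
    env: Mapping[str, str],
    names: tuple[str, ...] = ("SESSION_SECRET", "JWT_SECRET_KEY"),
) -> list[str]:
    """Log a warning for each HMAC secret shorter than MIN_SECRET_LEN.

    Returns the offending variable names so callers (or tests) can act on them.
    """
    short = []
    for name in names:
        value = env.get(name, "")  # assumption: an unset variable counts as empty
        if len(value) < MIN_SECRET_LEN:
            # Log only the length, never the secret itself.
            logger.warning(
                "%s is %d chars; use at least %d for HMAC-signed state",
                name, len(value), MIN_SECRET_LEN,
            )
            short.append(name)
    return short


# At startup one would pass os.environ; a literal dict stands in here.
print(warn_on_short_secrets({"SESSION_SECRET": "dev", "JWT_SECRET_KEY": "x" * 48}))
# ['SESSION_SECRET']
```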

Tests:
- 4 new tests in tests/test_role_resolver.py covering: stale-session
  contract after a mid-session mapping revoke (pins the documented
  limitation), PAT 403 detail wording, OAuth pipeline data flow from
  external groups to internal_roles, and the dev-bypass empty-list
  fallback when the resolver raises.
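
The last test pins a fail-closed fallback. If the resolver raises during dev-bypass, the session gets an empty role list, so role-gated endpoints answer 403 instead of the request crashing. The sketch below shows that shape using a hypothetical `resolve_or_empty` wrapper; the real dev-bypass code may structure it differently.

```python
import logging
from typing import Callable

logger = logging.getLogger("dev_bypass")  # hypothetical logger name


def resolve_or_empty(
    resolve: Callable[[list[dict]], list[str]],
    groups: list[dict],
) -> list[str]:
    """Run the resolver; on any error, log it and grant no internal roles."""
    try:
        return resolve(groups)
    except Exception:
        # Fail closed: [] makes every require_internal_role gate return 403
        # rather than granting anything on a broken lookup.
        logger.exception("internal role resolution failed; using []")
        return []


def broken_resolver(groups: list[dict]) -> list[str]:
    raise RuntimeError("group_mappings table unavailable")


session = {"internal_roles": resolve_or_empty(broken_resolver, [{"id": "groups/eng"}])}
print(session)
# {'internal_roles': []}
```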
CHANGELOG.md updated under [0.11.3] (### Changed + ### Internal).
CLAUDE.md schema doc bumped from v7 to v8.
---------
Co-authored-by: Claude <noreply@anthropic.com>
226 lines
8.5 KiB
Python
"""Internal-role registry, resolver, and FastAPI dependency factory.

## Lifecycle

1. **Module import** — each Agnes module declares its internal roles via
   ``register_internal_role(...)``. The registry is module-level state, so
   the registration happens once per process.
2. **App startup** — ``sync_registered_roles_to_db(conn)`` inserts any
   newly-registered keys into ``internal_roles`` and refreshes the metadata
   (display_name, description, owner_module) on existing rows. Idempotent.
3. **Sign-in** — ``resolve_internal_roles(external_groups, conn)`` joins
   ``session.google_groups`` against ``group_mappings``; the sign-in flow
   (OAuth callback or dev-bypass) writes the resulting role-key list into
   ``session["internal_roles"]``.
4. **Request handling** — ``require_internal_role("context_admin")`` reads
   the cached list off the session; no DB hit per request.

## Refresh semantics

Resolution happens at sign-in, so a user with a stale session keeps stale
roles after an admin changes a mapping. ``Logout → sign in again`` is the
only refresh path today, the same semantics as Google's group cache and
the existing ``session.google_groups`` flow. For users who can't be made to
log out, ``docs/internal-roles.md`` documents deactivate + reactivate as the
supported way to force re-resolution.
"""

from __future__ import annotations

import logging
import os
import re
import uuid
from dataclasses import dataclass
from typing import Optional

import duckdb
from fastapi import Depends, HTTPException, Request, status

from app.auth.dependencies import get_current_user
from src.repositories.group_mappings import GroupMappingsRepository
from src.repositories.internal_roles import InternalRolesRepository

logger = logging.getLogger(__name__)


_ROLE_KEY_RE = re.compile(r"^[a-z][a-z0-9_]{0,63}$")


@dataclass(frozen=True)
class InternalRoleSpec:
    """Module-side declaration of an internal role.

    Mirrors the persisted shape minus ``id`` (assigned at sync time) and
    timestamps. Frozen so a stray mutation can't desync registry from DB.
    """
    key: str
    display_name: str
    description: str = ""
    owner_module: Optional[str] = None


# Module-level registry. Populated by register_internal_role() at import time;
# read (not cleared) by sync_registered_roles_to_db() at app startup. Kept
# module-level (not class-state) because role registration is conceptually
# per-process.
_REGISTRY: dict[str, InternalRoleSpec] = {}


def register_internal_role(
    key: str,
    *,
    display_name: str,
    description: str = "",
    owner_module: Optional[str] = None,
) -> None:
    """Declare an internal role at module-import time.

    ``key`` is the immutable identifier referenced from code (e.g.
    ``"context_admin"``); must match ``[a-z][a-z0-9_]{0,63}``. Calling twice
    with the same key + same fields is a no-op (re-import safe). Calling
    twice with conflicting fields raises ``ValueError`` — that almost always
    means two modules picked the same key, which would leave admins unable
    to tell which capability they're granting in the mapping UI.
    """
    # fullmatch(), not match(): with match(), "$" also matches just before a
    # trailing newline, so a key like "context_admin\n" would slip through.
    if not _ROLE_KEY_RE.fullmatch(key):
        raise ValueError(
            f"Invalid internal role key {key!r}: must be lower_snake_case, "
            "start with a letter, max 64 chars."
        )
    spec = InternalRoleSpec(
        key=key,
        display_name=display_name,
        description=description,
        owner_module=owner_module,
    )
    existing = _REGISTRY.get(key)
    if existing is not None and existing != spec:
        raise ValueError(
            f"Internal role {key!r} already registered with different fields "
            f"(existing={existing}, new={spec}). Pick a unique key."
        )
    _REGISTRY[key] = spec


def list_registered_roles() -> list[InternalRoleSpec]:
    """Snapshot of the current registry — sorted by key for stable output."""
    return sorted(_REGISTRY.values(), key=lambda s: s.key)


def _clear_registry_for_tests() -> None:
    """Reset the module-level registry. Tests only — never call from app code.

    Refuses to run unless ``TESTING`` is ``1`` or ``true`` (case-insensitive)
    so a stray import path in production can't accidentally drop the
    registered capabilities. Pytest sets this via conftest / pytest.ini;
    production never does.
    """
    if os.environ.get("TESTING", "").lower() not in ("1", "true"):
        raise RuntimeError(
            "_clear_registry_for_tests() called outside of TESTING — "
            "this drops every registered internal role and is never safe "
            "in app code. Set TESTING=1 if you really mean this.",
        )
    _REGISTRY.clear()


def sync_registered_roles_to_db(conn: duckdb.DuckDBPyConnection) -> None:
    """Reconcile registered roles into ``internal_roles``. Idempotent.

    Inserts new keys, updates display_name/description/owner_module for
    existing keys when they've changed. Never deletes — a role disappearing
    from code may just mean the module was unloaded; the DB row keeps the
    mappings safe until an admin explicitly removes it.
    """
    repo = InternalRolesRepository(conn)
    inserted = 0
    updated = 0
    for spec in _REGISTRY.values():
        existing = repo.get_by_key(spec.key)
        if existing is None:
            repo.create(
                id=str(uuid.uuid4()),
                key=spec.key,
                display_name=spec.display_name,
                description=spec.description,
                owner_module=spec.owner_module,
            )
            inserted += 1
        else:
            drift = (
                existing.get("display_name") != spec.display_name
                or (existing.get("description") or "") != spec.description
                or (existing.get("owner_module") or None) != spec.owner_module
            )
            if drift:
                repo.update(
                    id=existing["id"],
                    display_name=spec.display_name,
                    description=spec.description,
                    owner_module=spec.owner_module,
                )
                updated += 1
    if inserted or updated:
        logger.info(
            "internal_roles sync: %d inserted, %d updated, %d total registered",
            inserted, updated, len(_REGISTRY),
        )


def resolve_internal_roles(
    external_groups: list[dict],
    conn: duckdb.DuckDBPyConnection,
) -> list[str]:
    """Map ``session.google_groups`` to internal role keys via ``group_mappings``.

    Pure read of the mapping table — never mutates state. Returns a sorted,
    de-duplicated list of role keys. Empty list when no external groups are
    supplied or none of them are mapped.
    """
    ids = [g["id"] for g in external_groups if isinstance(g, dict) and g.get("id")]
    if not ids:
        return []
    return GroupMappingsRepository(conn).resolve_role_keys(ids)


def require_internal_role(role_key: str):
    """FastAPI dependency factory: 403 unless the user holds ``role_key``.

    Reads ``session["internal_roles"]`` populated at sign-in; no DB hit.
    The ``user`` dependency runs first so we still 401 unauthenticated
    requests with the standard message before checking role membership.

    PAT/headless callers carry no session-resolved roles (only the sign-in
    flows, i.e. the OAuth callback and dev-bypass, write
    ``session["internal_roles"]``), so any ``require_internal_role`` gate
    returns 403 for them by design — see ``docs/internal-roles.md`` →
    *PAT and headless requests*. The 403 detail distinguishes the two
    failure modes so an API consumer hitting it via a token sees an
    actionable message instead of a generic "missing role".
    """
    async def _check(
        request: Request,
        user: dict = Depends(get_current_user),
    ) -> dict:
        # "internal_roles in session" is the marker that this request went
        # through a sign-in flow (OAuth callback or dev-bypass). Bearer-token
        # callers leave the key absent — distinguish those from "signed in,
        # but lacks this specific role" so the API consumer sees what to fix.
        # Check the ASGI scope instead of hasattr(request, "session"):
        # Starlette's Request.session asserts that SessionMiddleware is
        # installed, and hasattr() only swallows AttributeError.
        has_session_roles = (
            "session" in request.scope
            and "internal_roles" in request.session
        )
        roles = (
            (request.session.get("internal_roles") or [])
            if has_session_roles else []
        )
        if role_key not in roles:
            if not has_session_roles:
                detail = (
                    f"Requires internal role '{role_key}'. This endpoint needs "
                    "an interactive (OAuth) session — Bearer/PAT tokens do not "
                    "carry session-resolved roles by design."
                )
            else:
                detail = f"Requires internal role '{role_key}'"
            raise HTTPException(
                status_code=status.HTTP_403_FORBIDDEN,
                detail=detail,
            )
        return user

    return _check