agnes-the-ai-analyst/app/api/users.py
Vojtech c5948f26fc
fix(api): harden API surface before Swagger (issue #336) (#339)
* fix(api): harden API surface before Swagger — 9 findings from issue #336

ADV-001: POST /api/sync/table-subscriptions now checks can_access() per
table entry, matching the gate already on POST /api/sync/settings.

ADV-002: GET /webhooks/jira/health gated behind require_admin; jira_domain
removed from response to prevent anonymous info disclosure.

ADV-003: GET /api/version no longer exposes commit_sha or schema_version.

ADV-005: /docs, /redoc, /openapi.json now require a valid session via custom
FastAPI routes (docs_url=None, redoc_url=None, openapi_url=None).

ADV-006: /cli/ and /webhooks/ added to _API_PATH_PREFIXES so future
auth-gated routes there return a JSON 401 rather than an HTML redirect.
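
A minimal sketch of the prefix check described in ADV-006, not the project's
actual middleware. Only /cli/ and /webhooks/ are named in this message; the
"/api/" entry, the unauthenticated_response() helper, and the 302 fallback
are assumptions made for illustration.

```python
from typing import Tuple

# Hypothetical prefix list: "/cli/" and "/webhooks/" come from the commit
# message, "/api/" is assumed for the sake of the example.
_API_PATH_PREFIXES = ("/api/", "/cli/", "/webhooks/")


def unauthenticated_response(path: str) -> Tuple[int, str]:
    """Return (status, kind) for a request that failed authentication."""
    # str.startswith accepts a tuple, so one call covers every prefix.
    if path.startswith(_API_PATH_PREFIXES):
        return 401, "json"
    # Browser-facing pages fall back to a redirect (assumed to be a login page).
    return 302, "html-redirect"


print(unauthenticated_response("/webhooks/jira/health"))  # (401, 'json')
print(unauthenticated_response("/admin/users"))           # (302, 'html-redirect')
```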

ADV-007: GET /api/catalog/tables wired to CatalogTablesResponse model.

ADV-008: TableSubscriptionUpdate.tables capped at max_length=500.

ADV-009: GET /api/users and GET /auth/admin/tokens accept limit/offset
(default 1000, max 10000); repositories updated accordingly.

Tests: 11 new regression tests in TestApiHardening336; test_jira_webhooks
fixture updated with seeded admin user; OpenAPI snapshot regenerated.

* fix(test): update test_journey_jira health check to use admin auth after ADV-002 gate

* fix(security): close /auth/bootstrap auth-bypass + BREAKING markers on ADV-002/003/005

Reviewer-flagged regression introduced by ADV-009's pagination on
UserRepository.list_all(): the silent default LIMIT 1000 broke the
bootstrap check at app/auth/router.py and the startup no-password
warning at app/main.py — both call list_all() with no args and depend
on exhaustive enumeration.

On an instance with >1000 users where no password-holder lands in
the email-sorted first page, [u for u in list_all() if
u.get('password_hash')] becomes empty → bootstrap re-opens → an
unauthenticated caller can claim admin via /auth/bootstrap. Real
auth-bypass on a security-sensitive boot path.
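
The failure mode can be reproduced with plain lists. This is a sketch under
stated assumptions, not the code in app/auth/router.py: list_first_page(),
bootstrap_is_open(), and the sample emails are hypothetical stand-ins for the
capped query and the bootstrap check described above.

```python
def list_first_page(users, limit=1000):
    """Buggy variant: email-sorted, silently capped at `limit` rows."""
    return sorted(users, key=lambda u: u["email"])[:limit]


def list_all(users):
    """Fixed variant: email-sorted, every row."""
    return sorted(users, key=lambda u: u["email"])


def bootstrap_is_open(rows):
    """Bootstrap stays open only while no row carries a password hash."""
    return not [u for u in rows if u.get("password_hash")]


# 1500 passwordless accounts sort ahead of the single password holder.
users = [{"email": f"a{i:04d}@example.com"} for i in range(1500)]
users.append({"email": "zoe@example.com", "password_hash": "placeholder-hash"})

print(bootstrap_is_open(list_first_page(users)))  # True: the capped page hides the password holder
print(bootstrap_is_open(list_all(users)))         # False: exhaustive enumeration finds it
```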

Fix:
- src/repositories/users.py: list_all() restored to no-arg, returns
  EVERY row (no LIMIT). Comment explicitly warns against re-adding
  pagination here. API-surface pagination moved to a new
  list_paginated(limit, offset) method with its own docstring.
- app/api/users.py: GET /api/users now calls list_paginated().
  Existing query-param validation (limit <= 10000) preserved.
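
A rough sketch of the split between the two methods. The real repository
talks to DuckDB; sqlite3 is swapped in here only so the example runs with the
standard library. The UserRepositorySketch class, the two-column table, and
the sample rows are all assumptions.

```python
import sqlite3


class UserRepositorySketch:
    """Hypothetical stand-in for UserRepository, backed by sqlite3."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def list_all(self):
        # Deliberately no LIMIT: security checks rely on seeing every row.
        return self.conn.execute(
            "SELECT id, email FROM users ORDER BY email"
        ).fetchall()

    def list_paginated(self, limit: int = 1000, offset: int = 0):
        # API-facing variant: bounded page, same ordering as list_all().
        return self.conn.execute(
            "SELECT id, email FROM users ORDER BY email LIMIT ? OFFSET ?",
            (limit, offset),
        ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(str(i), f"user{i:02d}@example.com") for i in range(25)],
)
repo = UserRepositorySketch(conn)

print(len(repo.list_all()))                           # 25
print(len(repo.list_paginated(limit=10, offset=20)))  # 5
```

Keeping the two as separate methods means a caller has to opt in to
truncation by name, so a security check cannot pick up a page limit by
accident.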

Regression guards in tests/test_security.py::TestApiHardening336:
- test_users_list_all_returns_every_row_no_silent_limit asserts
  list_all() takes no params other than self (via inspect.signature)
  so a future cleanup can't accidentally re-add limit/offset.
- test_users_list_paginated_is_separate_method asserts the
  paginated variant is a distinct method, not an overload.
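
The two guards can be approximated as follows. This is an illustrative
sketch, not the test code in tests/test_security.py: the UserRepository class
below is a hypothetical stub, and extra_params() is a helper invented for the
example.

```python
import inspect


class UserRepository:
    """Hypothetical stub with the same method names as the real repository."""

    def list_all(self):
        return []

    def list_paginated(self, limit=1000, offset=0):
        return []


def extra_params(func):
    """Names of every parameter in func's signature other than `self`."""
    return [name for name in inspect.signature(func).parameters if name != "self"]


# Guard 1: list_all() must stay argument-free, so nobody re-adds limit/offset.
assert extra_params(UserRepository.list_all) == []

# Guard 2: pagination lives in its own method, with its own parameters.
assert UserRepository.list_paginated is not UserRepository.list_all
assert extra_params(UserRepository.list_paginated) == ["limit", "offset"]
```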

CHANGELOG: added **BREAKING** markers per CLAUDE.md release
discipline to three pre-existing ADV bullets that are observable
breaking changes for external consumers:
- ADV-002 (webhook health going from anonymous to admin-only)
- ADV-003 (/api/version dropping commit_sha + schema_version)
- ADV-005 (/docs, /redoc, /openapi.json going from anonymous to
  session-required)

* release: 0.54.25 — API hardening before Swagger (ADV-001..009) + bootstrap-bypass regression fix

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-18 15:13:21 +02:00


"""User management endpoints (#11)."""
import logging
import os
import uuid
from datetime import datetime, timezone
from typing import Optional, List
import duckdb
from fastapi import APIRouter, Depends, HTTPException, Query, Request
from pydantic import BaseModel
from argon2 import PasswordHasher
from app.auth.access import is_user_admin, require_admin
from app.auth.dependencies import _get_db
from src.db import SYSTEM_ADMIN_GROUP, SYSTEM_EVERYONE_GROUP
from src.repositories.users import UserRepository
from src.repositories.user_group_members import UserGroupMembersRepository
from src.repositories.audit import AuditRepository
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/api/users", tags=["users"])


def _audit(conn: duckdb.DuckDBPyConnection, actor_id: str, action: str, target_id: str, params: Optional[dict] = None) -> None:
    try:
        # Convert non-JSON-serializable values (datetime) to strings first
        safe_params = None
        if params:
            safe_params = {}
            for k, v in params.items():
                if isinstance(v, datetime):
                    safe_params[k] = v.isoformat()
                else:
                    safe_params[k] = v
        AuditRepository(conn).log(
            user_id=actor_id,
            action=action,
            resource=f"user:{target_id}",
            params=safe_params,
        )
    except Exception:
        pass  # never block the endpoint on audit failure


class CreateUserRequest(BaseModel):
    email: str
    name: str
    send_invite: bool = False


class UpdateUserRequest(BaseModel):
    name: Optional[str] = None
    active: Optional[bool] = None


class SetPasswordRequest(BaseModel):
    password: str


class GroupBrief(BaseModel):
    id: str
    name: str
    is_system: bool = False
    # Same 'system' | 'custom' | 'google_sync' tag as /api/admin/groups —
    # the user list renders membership chips with color-coded backgrounds
    # (Admin yellow, Everyone gray, google_sync green, custom purple) and
    # needs the origin to pick the right swatch.
    origin: str = "custom"


class UserResponse(BaseModel):
    id: str
    email: str
    name: Optional[str]
    role: str
    is_admin: bool = False
    is_sso_user: bool = False
    groups: List[GroupBrief] = []
    active: bool = True
    created_at: Optional[str]
    deactivated_at: Optional[str] = None
    invite_url: Optional[str] = None
    invite_email_sent: Optional[bool] = None


def _resolve_role(u: dict, conn: duckdb.DuckDBPyConnection) -> str:
    """Derive a label for the response. ``admin`` if the user is in the Admin
    system group, otherwise ``user`` — the legacy 4-value enum collapsed to
    a binary in v12 (admin / non-admin). The DB column ``users.role`` is a
    deprecated artifact; we ignore it."""
    return "admin" if is_user_admin(u["id"], conn) else "user"


def _user_groups(user_id: str, conn: duckdb.DuckDBPyConnection) -> List[GroupBrief]:
    """Groups the user is a member of, sorted with system groups first.

    Inlined into ``/api/users`` responses so the admin list view can show
    membership chips per row without an N+1 fetch. ``origin`` is computed
    via the same ``_derive_origin`` helper /api/admin/groups uses, so
    chip colors stay in lock-step across the two surfaces.
    """
    from app.api.access import _derive_origin

    rows = conn.execute(
        """SELECT g.id, g.name, g.is_system, g.created_by
           FROM user_group_members m
           JOIN user_groups g ON g.id = m.group_id
           WHERE m.user_id = ?
           ORDER BY g.is_system DESC, g.name""",
        [user_id],
    ).fetchall()
    return [
        GroupBrief(
            id=r[0],
            name=r[1],
            is_system=bool(r[2]),
            origin=_derive_origin(
                {"is_system": bool(r[2]), "name": r[1], "created_by": r[3]}
            ),
        )
        for r in rows
    ]


def _is_sso_user(user_id: str, conn: duckdb.DuckDBPyConnection) -> bool:
    """Whether the user is sourced from an external SSO provider.

    Today the only SSO provider is Google Workspace, but the name is kept
    generic so a future provider (Cloudflare Access, Okta, …) can plug into
    the same flag without churning the API surface. The admin UI hides the
    password-reset / set-password / delete affordances when this is True —
    those accounts are managed upstream and editing them here would either
    be no-ops (password) or get reverted on next sync (delete).

    A user counts as SSO-managed if they are a member of any group where:

      1. ``user_groups.created_by = 'system:google-sync'`` — the OAuth
         callback auto-created this group from a Workspace claim, OR
      2. the group is the seeded ``Admin`` system row AND
         ``AGNES_GROUP_ADMIN_EMAIL`` is set (env-mapped to a Workspace
         admin group), OR
      3. the group is the seeded ``Everyone`` system row AND
         ``AGNES_GROUP_EVERYONE_EMAIL`` is set (env-mapped to a Workspace
         everyone group).

    Users with no groups, or only admin-created custom groups, are NOT
    SSO users — local accounts are unaffected.

    Env values are read per-request so operators flipping the mapping
    don't have to restart the process.
    """
    rows = conn.execute(
        """SELECT g.name, g.is_system, g.created_by, m.source
           FROM user_group_members m
           JOIN user_groups g ON g.id = m.group_id
           WHERE m.user_id = ?""",
        [user_id],
    ).fetchall()
    if not rows:
        return False
    admin_mapped = bool(os.environ.get("AGNES_GROUP_ADMIN_EMAIL", "").strip())
    everyone_mapped = bool(os.environ.get("AGNES_GROUP_EVERYONE_EMAIL", "").strip())
    for name, is_system, created_by, source in rows:
        if created_by == "system:google-sync":
            # google-sync groups are always SSO-managed regardless of how
            # the individual membership was created — the group itself
            # only exists because of Google sync.
            return True
        # System-group branches (Admin / Everyone): the group accepts
        # memberships from MULTIPLE sources (system_seed for v13 backfill,
        # admin for manual adds, google_sync from OAuth callback). The
        # group being env-mapped to Workspace tells us SSO is *configured*,
        # but only memberships whose source is 'google_sync' are actually
        # owned by the upstream IdP. system_seed / admin memberships in
        # the same group are local-only and must stay locally manageable.
        # (Devin BUG_0002 on PR #142: without this check, the v13 migration's
        # blanket Everyone backfill flips every local user to SSO the moment
        # AGNES_GROUP_EVERYONE_EMAIL is set, locking admins out of password
        # reset / delete on accounts the IdP doesn't actually own.)
        if is_system and name == SYSTEM_ADMIN_GROUP and admin_mapped and source == "google_sync":
            return True
        if is_system and name == SYSTEM_EVERYONE_GROUP and everyone_mapped and source == "google_sync":
            return True
    return False


def _to_response(
    u: dict,
    conn: duckdb.DuckDBPyConnection,
    invite_url: Optional[str] = None,
    invite_email_sent: Optional[bool] = None,
) -> UserResponse:
    groups = _user_groups(u["id"], conn)
    return UserResponse(
        id=u["id"],
        email=u["email"],
        name=u.get("name"),
        role=_resolve_role(u, conn),
        is_admin=any(g.name == SYSTEM_ADMIN_GROUP for g in groups),
        is_sso_user=_is_sso_user(u["id"], conn),
        groups=groups,
        active=bool(u.get("active", True)),
        created_at=str(u.get("created_at", "")),
        deactivated_at=str(u["deactivated_at"]) if u.get("deactivated_at") else None,
        invite_url=invite_url,
        invite_email_sent=invite_email_sent,
    )


def _set_admin_membership(
    user_id: str,
    is_admin: bool,
    actor_email: Optional[str],
    conn: duckdb.DuckDBPyConnection,
) -> None:
    """Add or remove the user's Admin group membership. Idempotent."""
    admin_group = conn.execute(
        "SELECT id FROM user_groups WHERE name = ?", [SYSTEM_ADMIN_GROUP],
    ).fetchone()
    if not admin_group:
        return
    members = UserGroupMembersRepository(conn)
    if is_admin:
        members.add_member(user_id, admin_group[0], "admin", actor_email)
    else:
        members.remove_member(user_id, admin_group[0])
@router.get("", response_model=List[UserResponse])
async def list_users(
limit: int = Query(default=1000, ge=1, le=10000),
offset: int = Query(default=0, ge=0),
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
return [_to_response(u, conn) for u in UserRepository(conn).list_paginated(limit=limit, offset=offset)]
@router.get("/{user_id}", response_model=UserResponse)
async def get_user(
user_id: str,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""Single-user payload used by the /admin/users/{id} detail page header
and the account-status block. Same shape as the list endpoint, so the
page can reuse the same response shape."""
target = UserRepository(conn).get_by_id(user_id)
if not target:
raise HTTPException(status_code=404, detail="User not found")
return _to_response(target, conn)
@router.post("", response_model=UserResponse, status_code=201)
async def create_user(
payload: CreateUserRequest,
request: Request,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
repo = UserRepository(conn)
if repo.get_by_email(payload.email):
raise HTTPException(status_code=409, detail="User with this email already exists")
import secrets
user_id = str(uuid.uuid4())
repo.create(id=user_id, email=payload.email, name=payload.name)
# New users start with no group memberships — admin promotion is an
# explicit follow-up step (POST /api/admin/users/{id}/memberships with
# the Admin group_id, or POST /api/admin/groups/{admin_id}/members).
# v39: subscribe to every system plugin so the mandatory tier
# reaches the new user on first sign-in without admin reconcile.
try:
from src.repositories.user_curated_subscriptions import (
UserCuratedSubscriptionsRepository,
)
UserCuratedSubscriptionsRepository(
conn
).fanout_system_for_user(user_id)
except Exception:
logger.exception(
"system-plugin fanout failed for new user %s", payload.email,
)
_audit(conn, user["id"], "user.create", user_id, {"email": payload.email})
invite_url: Optional[str] = None
invite_email_sent: Optional[bool] = None
if payload.send_invite:
token = secrets.token_urlsafe(32)
repo.update(
id=user_id,
setup_token=token,
setup_token_created=datetime.now(timezone.utc),
)
from app.auth.providers.password import build_setup_url, send_setup_email
invite_url = build_setup_url(request, payload.email, token)
invite_email_sent = send_setup_email(request, payload.email, token)
_audit(conn, user["id"], "user.invite", user_id, {"email": payload.email, "email_sent": invite_email_sent})
created = repo.get_by_id(user_id)
return _to_response(created, conn, invite_url=invite_url, invite_email_sent=invite_email_sent)
@router.patch("/{user_id}", response_model=UserResponse)
async def update_user(
user_id: str,
payload: UpdateUserRequest,
request: Request,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
repo = UserRepository(conn)
target = repo.get_by_id(user_id)
if not target:
raise HTTPException(status_code=404, detail="User not found")
target_is_admin = is_user_admin(target["id"], conn)
updates: dict = {}
if payload.name is not None:
updates["name"] = payload.name
if payload.active is not None:
if target["id"] == user["id"] and payload.active is False:
raise HTTPException(status_code=409, detail="Cannot deactivate yourself")
if (
target_is_admin
and payload.active is False
and repo.count_admins(active_only=True) <= 1
):
raise HTTPException(status_code=409, detail="Cannot deactivate the last active admin")
updates["active"] = payload.active
if payload.active is False:
updates["deactivated_at"] = datetime.now(timezone.utc)
updates["deactivated_by"] = user["id"]
else:
updates["deactivated_at"] = None
updates["deactivated_by"] = None
if updates:
repo.update(id=user_id, **updates)
_audit(conn, user["id"], "user.update", user_id, {k: v for k, v in updates.items() if k != "deactivated_at"})
return _to_response(repo.get_by_id(user_id), conn)


_SSO_LOCKED_DETAIL = (
    "User is managed by an external SSO provider; "
    "this operation must be performed in the upstream system"
)


def _reject_if_sso(target_id: str, conn: duckdb.DuckDBPyConnection) -> None:
    """409 if the target is SSO-managed.

    The admin UI hides the password / delete affordances for SSO users, but
    the UI-only guard is bypassable by anyone who calls /api/users/...
    directly with a valid admin token. This is the server-side enforcement
    that backs the UI: admins cannot reset / set / wipe a Google-Workspace
    account through Agnes — those mutations belong upstream.
    """
    if _is_sso_user(target_id, conn):
        raise HTTPException(status_code=409, detail=_SSO_LOCKED_DETAIL)
@router.delete("/{user_id}", status_code=204)
async def delete_user(
user_id: str,
request: Request,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
repo = UserRepository(conn)
target = repo.get_by_id(user_id)
if not target:
raise HTTPException(status_code=404, detail="User not found")
if target["id"] == user["id"]:
raise HTTPException(status_code=409, detail="Cannot delete yourself")
_reject_if_sso(target["id"], conn)
if is_user_admin(target["id"], conn) and repo.count_admins(active_only=True) <= 1:
raise HTTPException(status_code=409, detail="Cannot delete the last active admin")
repo.delete(user_id)
_audit(conn, user["id"], "user.delete", user_id, {"email": target["email"]})
@router.post("/{user_id}/reset-password")
async def reset_password(
user_id: str,
request: Request,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""Generate a reset token and (best-effort) email it to the user."""
import secrets
repo = UserRepository(conn)
target = repo.get_by_id(user_id)
if not target:
raise HTTPException(status_code=404, detail="User not found")
_reject_if_sso(target["id"], conn)
token = secrets.token_urlsafe(32)
repo.update(
id=user_id,
reset_token=token,
reset_token_created=datetime.now(timezone.utc),
)
_audit(conn, user["id"], "user.reset_password", user_id, {"email": target["email"]})
# Dedicated password-reset email/URL — points to /auth/password/reset where the
# user sets a new password, NOT to the magic-link verify endpoint (which would
# log them in without prompting for a new password).
from app.auth.providers.password import build_reset_url, send_reset_email
reset_url = build_reset_url(request, target["email"], token)
email_sent = send_reset_email(request, target["email"], token)
return {
"reset_url": reset_url,
"email_sent": email_sent,
}
@router.post("/{user_id}/set-password", status_code=204)
async def set_password(
user_id: str,
payload: SetPasswordRequest,
request: Request,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
if not payload.password or len(payload.password) < 8:
raise HTTPException(status_code=400, detail="Password must be at least 8 characters")
repo = UserRepository(conn)
target = repo.get_by_id(user_id)
if not target:
raise HTTPException(status_code=404, detail="User not found")
_reject_if_sso(target["id"], conn)
ph = PasswordHasher()
repo.update(id=user_id, password_hash=ph.hash(payload.password))
_audit(conn, user["id"], "user.set_password", user_id, {"email": target["email"]})
@router.post("/{user_id}/deactivate", response_model=UserResponse)
async def deactivate_user(
user_id: str,
request: Request,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
return await update_user(
user_id=user_id,
payload=UpdateUserRequest(active=False),
request=request, user=user, conn=conn,
)
@router.post("/{user_id}/activate", response_model=UserResponse)
async def activate_user(
user_id: str,
request: Request,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
return await update_user(
user_id=user_id,
payload=UpdateUserRequest(active=True),
request=request, user=user, conn=conn,
)