agnes-the-ai-analyst/tests/test_web_ui.py
Petr Simecek c25fd41bf7
feat(auth): Google Workspace groups on /profile + tag-triggered Keboola deploy workflow (#56)
* feat(auth): display Google Workspace groups on /profile

- Request cloud-identity.groups.readonly scope in Google OAuth
- Fetch groups via Cloud Identity API after callback; tolerate 4xx
  (non-Workspace tenants) and network errors — never break login
- Store result in Starlette session as google_groups
- Replace /profile redirect with a real profile page rendering
  account details (email, name, role) and the group list; show a
  friendly empty state when no groups are available
- Tests: helper parsing + 403 + exception paths; profile page
  smoke test; updated the old redirect test

* test: remove stale /profile redirect tests

Cherry-pick of Zdeněk's 4f7e4cd ("display Google Workspace groups on
/profile") replaces the /profile redirect with a real profile page —
but only updated one of three tests that expected the old behaviour.

These two tests in test_admin_tokens_ui.py and test_pat.py were left
asserting `/profile → 302 /tokens`, which now returns
`/profile → 302 /login?next=%2Fprofile` for unauth users (the standard
auth guard) or `/profile → 200 HTML` for authenticated users.

Removed both rather than patched — coverage for the new behaviour
already exists in tests/test_auth_providers.py (added by the same
commit). The /tokens render assertions in the deleted test_pat.py case
are redundant with test_admin_tokens_ui.py's own /tokens UI tests.

* fix(auth): Google groups search query needs parent + labels predicates

Cloud Identity Groups Search API returns 400 INVALID_ARGUMENT when the
CEL query lacks the required `parent == 'customers/<id>'` predicate AND
a `'<label>' in labels` membership predicate. Zdeněk's original 4f7e4cd
query had only `member_key_id == '<email>'` — every fetch silently
returned [] and the /profile groups list was always empty.

Fix: build the query with all three required pieces:
  parent == 'customers/my_customer'   (alias = caller's own Workspace
                                       org; no need to look up customer ID)
  member_key_id == '<email>'           (filter to this user's memberships)
  'cloudidentity.googleapis.com/groups.discussion_forum' in labels
                                       (Workspace mailing-list groups —
                                       the common case; security-group
                                       coverage is a follow-up)

Also: log the full error body (not truncated to 200 chars) and the
query string so the next time Google rejects something we can diagnose
in one log line instead of a re-deploy.

Caught when first agnes-dev login completed normally (HTTP 302) but app
log showed `Google groups fetch returned 400 for petr@keboola.com:
{"error":{"code":400,"message":"Request contains an invalid argument."}}`
on the same VM (kids-ai-data-analysis / agnes-dev.keboola.com).

Reference: https://cloud.google.com/identity/docs/reference/rest/v1/groups/search

* feat(web): add Profile link to user dropdown menu

The /profile page (Zdeněk's 4f7e4cd cherry-pick) renders a real profile
view including Google Workspace groups, but had no entry point in the
UI — users could only reach it by typing the URL manually. Add a
"Profile" menu item between the user header (email + role) and
"My tokens" so the page is discoverable.

Side effect: cleaned up the leftover `or _path.startswith('/profile')`
condition on the "My tokens" active class, which dated from the old
/profile → /tokens redirect (removed in c789617). Now each menu item
owns its own active state.

* fix: profile-link tests + .env quoting for CADDY_TLS

Two issues caught by Keboola's first agnes-dev deploy + agnes-auto-upgrade
cron run:

1. tests/test_web_ui.py — two negative assertions ("href=/profile" NOT in
   body) date from when /profile was a redirect-only stub. Now /profile
   is a real page (groups display) AND has a dropdown menu link, so the
   negative assertions flip to positive. Same for ">Profile<" text in
   the non-admin nav test.

2. startup-script.sh.tpl — CADDY_TLS line must be QUOTED in .env, because
   agnes-auto-upgrade.sh sources .env via `set -a; . .env; set +a` and
   bash treats `KEY=value with spaces` as `KEY=value` followed by `with`
   and `spaces` exec attempts. Symptom: cron log spam
   `/opt/agnes/.env: line 14: petr@keboola.com: command not found`,
   the cron exits non-zero, and no auto-upgrade ever happens. Caddy
   itself reads the value fine because docker-compose env_file=.env
   parses key=value properly without shell-evaluating the rest.

   Fix: emit `CADDY_TLS="tls <email>"` instead of `CADDY_TLS=tls <email>`.
   Both the cron source and docker-compose env_file accept the quoted
   form; cron stops failing.

* fix(auth): use searchTransitiveGroups + security label for non-admin user

Three bugs in the original cherry-pick + my prior fix attempt, all caught
by a stdlib probe script (scripts/debug/probe_google_groups.py) run
locally with a Playground-issued OAuth token:

1. Wrong endpoint. `groups:search` is the admin "find groups in org"
   endpoint and 400s for non-admin users regardless of query. Switched
   to `groups/-/memberships:searchTransitiveGroups` which is the
   user-perspective "what groups am I in" endpoint.

2. Wrong label. Querying with `cloudidentity.googleapis.com/groups.discussion_forum`
   returns 403 "Insufficient permissions to retrieve memberships" even
   on the new endpoint — Workspace policy denies non-admin reads of
   discussion-forum groups. Switching to `groups.security` returns 200
   with the actual membership list. Empirically every Workspace group
   at Keboola carries BOTH labels, so the security filter sees the full
   set anyway. Confirmed with the probe script.

3. Wrong response shape. `searchTransitiveGroups` returns
   {"memberships": [...]}, not {"groups": [...]}. Parser updated
   accordingly.

Also adds scripts/debug/probe_google_groups.py — stdlib-only standalone
probe that hits 6 candidate endpoints with a user OAuth token. Saved a
deploy cycle (~10 min) per query iteration; future API-syntax debugging
should start there.

Verified end-to-end: petr@keboola.com login on agnes-dev returns 5
groups (LIC-1PASSWORD, ROLE_ATLASSIAN_*, etc.) via the probe; once
deployed, the same will populate session["google_groups"] and render
on /profile.

* test(auth): update Google groups parser fixture to match searchTransitiveGroups shape

Mock payload was `{"groups": [...]}` (the shape `groups:search` returns).
After switching to `groups/-/memberships:searchTransitiveGroups` in the
prior commit, the actual response is `{"memberships": [...]}` and the
parser iterates that key. Test now mirrors the real shape.

The per-item structure (groupKey.id + displayName) is unchanged, so the
expected output dict stays the same: [{"id": "...", "name": "..."}].

* docs(auth): add docs/auth-groups.md — Google Workspace groups runbook

Captures the non-obvious bits: the GCP-side setup checklist (Cloud
Identity API + scope on consent screen + Internal user type), the
`security` vs `discussion_forum` label trap (the latter 403s for
non-admins, the former 200s — one of those is a 4-iteration debug
session and shouldn't have to be repeated), where groups are stored
(session, not DB) and how to refresh (re-login), plus how to use the
probe script for future API-syntax issues.

Deliberately stops short of explaining "what is Cloud Identity" or
"what is OAuth scope" — those belong in Google's own docs, not ours.

* docs(claude): document release workflows + module versioning + recreate trick

New "Release & deploy workflows" section in CLAUDE.md covers what didn't
exist anywhere in the repo before:

- Distinction between release.yml (auto-build per push) vs the new
  keboola-deploy.yml (tag-triggered, explicit deploy only) — plus when
  to use which (per-developer convenience vs shared dev VM safety)
- Module versioning (infra-vX.Y.Z) and the bump-after-merge dance
- The lifecycle.ignore_changes [metadata_startup_script] gotcha and how
  to force a recreate via workflow_dispatch's recreate_targets input

All generic — no customer hostnames, project IDs, IPs. Customer-specific
deploy steps belong in the consuming infra repo's README.

Also: cross-reference docs/auth-groups.md from the Authentication
section so future Claude sessions find the Workspace-groups runbook
without grepping.

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-04-26 00:56:44 +02:00

386 lines
17 KiB
Python

"""Smoke tests for web UI pages."""
import os
import pytest
from fastapi.testclient import TestClient
@pytest.fixture
def web_client(tmp_path, monkeypatch):
monkeypatch.setenv("DATA_DIR", str(tmp_path))
monkeypatch.setenv("TESTING", "1")
monkeypatch.setenv("JWT_SECRET_KEY", "test-secret-key-min-32-characters!!")
(tmp_path / "state").mkdir()
(tmp_path / "analytics").mkdir()
(tmp_path / "extracts").mkdir()
# Reset global DuckDB singleton to pick up new DATA_DIR
from src.db import close_system_db
close_system_db()
from app.main import create_app
app = create_app()
yield TestClient(app)
close_system_db()
@pytest.fixture
def admin_cookie(web_client, tmp_path, monkeypatch):
from argon2 import PasswordHasher
from src.db import get_system_db
from src.repositories.users import UserRepository
password = "AdminPass1!"
password_hash = PasswordHasher().hash(password)
conn = get_system_db()
UserRepository(conn).create(
id="admin1", email="admin@test.com", name="Admin", role="admin",
password_hash=password_hash,
)
conn.close()
resp = web_client.post("/auth/token", json={"email": "admin@test.com", "password": password})
assert resp.status_code == 200, f"Bootstrap failed: {resp.text}"
token = resp.json()["access_token"]
return {"access_token": token}
@pytest.fixture
def analyst_cookie(web_client, tmp_path, monkeypatch):
from argon2 import PasswordHasher
from src.db import get_system_db
from src.repositories.users import UserRepository
password = "AnalystPass1!"
password_hash = PasswordHasher().hash(password)
conn = get_system_db()
UserRepository(conn).create(
id="analyst1", email="analyst@test.com", name="Analyst", role="analyst",
password_hash=password_hash,
)
conn.close()
resp = web_client.post("/auth/token", json={"email": "analyst@test.com", "password": password})
assert resp.status_code == 200, f"Analyst token failed: {resp.text}"
token = resp.json()["access_token"]
return {"access_token": token}
class TestWebUISmoke:
def test_login_page(self, web_client):
resp = web_client.get("/login")
assert resp.status_code == 200
def test_dashboard(self, web_client, admin_cookie):
resp = web_client.get("/dashboard", cookies=admin_cookie)
assert resp.status_code in (200, 302)
def test_catalog(self, web_client, admin_cookie):
resp = web_client.get("/catalog", cookies=admin_cookie)
assert resp.status_code == 200
def test_corporate_memory(self, web_client, admin_cookie):
resp = web_client.get("/corporate-memory", cookies=admin_cookie)
assert resp.status_code == 200
def test_activity_center(self, web_client, admin_cookie):
resp = web_client.get("/activity-center", cookies=admin_cookie)
assert resp.status_code == 200
def test_admin_tables(self, web_client, admin_cookie):
resp = web_client.get("/admin/tables", cookies=admin_cookie)
if resp.status_code == 404:
pytest.skip("Route /admin/tables does not exist")
assert resp.status_code == 200
def test_admin_permissions(self, web_client, admin_cookie):
resp = web_client.get("/admin/permissions", cookies=admin_cookie)
if resp.status_code == 404:
pytest.skip("Route /admin/permissions does not exist")
assert resp.status_code == 200
def test_admin_users_renders_modern_ui(self, web_client, admin_cookie):
resp = web_client.get("/admin/users", cookies=admin_cookie)
assert resp.status_code == 200
body = resp.text
# New shared header chrome
assert "app-header" in body
# Nav after split: "Tokens" (own) for every signed-in user +
# admin-only "All tokens" link pointing at /admin/tokens.
# Profile link added with the Google-Workspace-groups feature
# (cherry-pick of zs/google-groups-display + dropdown wiring).
assert 'href="/tokens"' in body
assert 'href="/admin/tokens"' in body
assert 'href="/profile"' in body
assert 'href="/admin/users"' in body
# New modern UI markers
assert 'class="users-page"' in body
assert 'role-pill' in body
assert 'class="toggle"' in body
assert 'id="confirm-modal"' in body
def test_nav_shows_tokens_link_for_non_admin(self, web_client, analyst_cookie):
"""Non-admins see 'My tokens' + 'Profile' user-menu links — no 'All tokens'."""
resp = web_client.get("/dashboard", cookies=analyst_cookie)
assert resp.status_code in (200, 302)
if resp.status_code == 302:
# Dashboard may redirect in some flows; follow it for nav check.
resp = web_client.get(resp.headers["location"], cookies=analyst_cookie)
body = resp.text
assert 'href="/tokens"' in body
assert 'href="/profile"' in body
assert ">My tokens<" in body
assert ">Profile<" in body
# Non-admins must NOT see the admin "All tokens" link.
assert 'href="/admin/tokens"' not in body
assert ">All tokens<" not in body
def test_nav_shows_all_tokens_link_for_admin(self, web_client, admin_cookie):
"""Admins see the 'My tokens' user-menu link and the 'All tokens' nav link."""
resp = web_client.get("/dashboard", cookies=admin_cookie)
assert resp.status_code in (200, 302)
if resp.status_code == 302:
resp = web_client.get(resp.headers["location"], cookies=admin_cookie)
body = resp.text
assert 'href="/tokens"' in body
assert 'href="/admin/tokens"' in body
assert ">My tokens<" in body
assert ">All tokens<" in body
def test_profile_renders_account_details(self, web_client, admin_cookie):
"""/profile renders a real profile page with email, name, role."""
resp = web_client.get("/profile", cookies=admin_cookie)
assert resp.status_code == 200
body = resp.text
assert "admin@test.com" in body
# Role pill + link to /tokens for PAT management
assert 'class="role-pill"' in body
assert 'href="/tokens"' in body
# Empty-state copy when no Google groups in session
assert "No Google groups available" in body
def test_profile_requires_auth(self, web_client):
"""/profile requires auth (was a 302 back-compat redirect before)."""
resp = web_client.get("/profile", follow_redirects=False)
# Auth dep raises 401; some configs may redirect to /login — accept either.
assert resp.status_code in (401, 302)
class TestClaudeSetupPreview:
"""/install and /dashboard render a visible, read-only preview of the
'Setup a new Claude Code' clipboard payload. The real token is never
rendered into the HTML — only a styled placeholder is.
"""
def test_install_preview_visible_for_signed_in_user(self, web_client, admin_cookie):
resp = web_client.get("/install", cookies=admin_cookie)
assert resp.status_code == 200
body = resp.text
# Preview card + placeholder token render
assert "setup-preview-pre" in body
assert "What Claude Code will receive" in body
assert "&lt;will be generated on click&gt;" in body
assert 'class="placeholder-token"' in body
# Setup payload text substituted with real server URL. The wheel URL
# must be under /cli/wheel/ (uv tool install rejects a bare .whl alias
# because it validates the PEP 427 filename in the URL before fetch).
assert "/cli/wheel/" in body
assert "/cli/agnes.whl" not in body
# New numbered headers + da diagnose step
assert "1) Install the CLI" in body
assert "4) Run diagnostics" in body
assert "da diagnose" in body
assert "da auth whoami" in body
def test_dashboard_preview_visible(self, web_client, admin_cookie):
resp = web_client.get("/dashboard", cookies=admin_cookie)
assert resp.status_code == 200
body = resp.text
assert "env-setup-cta" in body
assert "setup-preview-pre" in body
assert "What Claude Code will receive" in body
assert "&lt;will be generated on click&gt;" in body
def test_install_mcp_card_removed(self, web_client):
"""The stale 'Use with Claude Code / MCP' card on /install has been
removed — there is no Agnes MCP server today.
"""
resp = web_client.get("/install")
assert resp.status_code == 200
body = resp.text
assert "Use with Claude Code / MCP" not in body
assert "MCP" not in body
class TestAdminRoleGuards:
def test_analyst_cannot_access_admin_tables(self, web_client, admin_cookie, analyst_cookie):
resp = web_client.get("/admin/tables", cookies=analyst_cookie)
assert resp.status_code == 403
def test_analyst_cannot_access_admin_permissions(self, web_client, admin_cookie, analyst_cookie):
resp = web_client.get("/admin/permissions", cookies=analyst_cookie)
assert resp.status_code == 403
def test_admin_can_access_admin_tables(self, web_client, admin_cookie):
resp = web_client.get("/admin/tables", cookies=admin_cookie)
assert resp.status_code == 200
def test_admin_can_access_admin_permissions(self, web_client, admin_cookie):
resp = web_client.get("/admin/permissions", cookies=admin_cookie)
assert resp.status_code == 200
def test_analyst_cannot_access_corporate_memory_admin(self, web_client, admin_cookie, analyst_cookie):
resp = web_client.get("/corporate-memory/admin", cookies=analyst_cookie)
assert resp.status_code == 403
class TestUnauthenticatedHtmlRedirects:
def test_dashboard_unauthenticated_redirects_to_login(self, web_client):
resp = web_client.get("/dashboard", follow_redirects=False)
assert resp.status_code == 302
assert resp.headers["location"].startswith("/login")
assert "next=%2Fdashboard" in resp.headers["location"]
def test_catalog_unauthenticated_redirects_to_login(self, web_client):
resp = web_client.get("/catalog", follow_redirects=False)
assert resp.status_code == 302
assert resp.headers["location"].startswith("/login")
assert "next=%2Fcatalog" in resp.headers["location"]
def test_api_route_still_returns_json_401(self, web_client):
# /api/sync/manifest requires auth; must keep JSON 401 (no redirect).
resp = web_client.get("/api/sync/manifest", follow_redirects=False)
assert resp.status_code == 401
assert resp.headers["content-type"].startswith("application/json")
def test_password_login_honors_next(self, web_client, tmp_path):
from argon2 import PasswordHasher
from src.db import get_system_db
from src.repositories.users import UserRepository
password = "TestPass1!"
conn = get_system_db()
UserRepository(conn).create(
id="u1", email="u1@test.com", name="U1", role="admin",
password_hash=PasswordHasher().hash(password),
)
conn.close()
resp = web_client.post(
"/auth/password/login/web",
data={"email": "u1@test.com", "password": password, "next": "/catalog"},
follow_redirects=False,
)
assert resp.status_code == 302
assert resp.headers["location"] == "/catalog"
def test_password_login_rejects_open_redirect(self, web_client, tmp_path):
from argon2 import PasswordHasher
from src.db import get_system_db
from src.repositories.users import UserRepository
password = "TestPass1!"
conn = get_system_db()
UserRepository(conn).create(
id="u2", email="u2@test.com", name="U2", role="admin",
password_hash=PasswordHasher().hash(password),
)
conn.close()
resp = web_client.post(
"/auth/password/login/web",
data={"email": "u2@test.com", "password": password, "next": "//evil.example/"},
follow_redirects=False,
)
assert resp.status_code == 302
assert resp.headers["location"] == "/dashboard"
@pytest.mark.parametrize("hostile_next,expected_location", [
("javascript:alert(1)", "/dashboard"),
("http://evil.example/", "/dashboard"),
("//evil.example/", "/dashboard"),
("dashboard", "/dashboard"), # missing leading slash
("/foo?bar=baz", "/foo?bar=baz"), # valid same-origin with query
])
def test_password_login_sanitizes_next(self, web_client, tmp_path, hostile_next, expected_location):
from argon2 import PasswordHasher
from src.db import get_system_db
from src.repositories.users import UserRepository
import uuid
password = "TestPass1!"
uid = f"u-{uuid.uuid4().hex[:8]}"
conn = get_system_db()
UserRepository(conn).create(
id=uid, email=f"{uid}@test.com", name=uid, role="admin",
password_hash=PasswordHasher().hash(password),
)
conn.close()
resp = web_client.post(
"/auth/password/login/web",
data={"email": f"{uid}@test.com", "password": password, "next": hostile_next},
follow_redirects=False,
)
assert resp.status_code == 302
assert resp.headers["location"] == expected_location
def test_non_api_post_still_returns_json_401(self, web_client):
# POST to a JSON auth endpoint that lives outside /api/ — must NOT be redirected.
resp = web_client.post("/auth/token", json={"email": "nope@x.com", "password": "wrong"},
follow_redirects=False)
assert resp.status_code == 401
assert resp.headers["content-type"].startswith("application/json")
def test_auth_json_get_still_returns_json_401(self, web_client):
# GET to a JSON endpoint under /auth/* (e.g. PAT CRUD) — must NOT be redirected,
# so CLI clients calling api_get("/auth/tokens") get JSON they can parse.
resp = web_client.get("/auth/tokens", follow_redirects=False)
assert resp.status_code == 401
assert resp.headers["content-type"].startswith("application/json")
def test_login_page_propagates_next_to_password_button(self, web_client):
resp = web_client.get("/login?next=/catalog")
assert resp.status_code == 200
body = resp.text
# Password button URL should carry next.
assert "/login/password?next=%2Fcatalog" in body, \
f"Expected /login/password?next=%2Fcatalog in login page HTML; got snippet: {body[:500]}"
def test_login_page_propagates_next_to_google_button(self, web_client, monkeypatch):
"""The Google OAuth button URL must also carry the ?next param so the
post-login redirect honors the requested destination."""
# Force Google provider to appear available so the button is rendered.
monkeypatch.setattr(
"app.auth.providers.google.is_available", lambda: True,
)
resp = web_client.get("/login?next=/catalog")
assert resp.status_code == 200
body = resp.text
assert "/auth/google/login?next=%2Fcatalog" in body, \
f"Expected google login URL with ?next in login page; snippet: {body[:800]}"
def test_login_email_page_extracts_and_renders_next(self, web_client):
"""/login/email (magic link) must extract ?next from the URL and
emit it into the hidden form field so it round-trips to the POST."""
resp = web_client.get("/login/email?next=/catalog")
assert resp.status_code == 200
body = resp.text
# The template renders <input type="hidden" name="next" value="/catalog">
assert 'name="next" value="/catalog"' in body, \
f"Expected /catalog in next hidden field; snippet: {body[:800]}"
def test_login_email_page_rejects_open_redirect_in_next(self, web_client):
"""Hostile ?next values (e.g. //evil) must be sanitized away before
the hidden field is rendered."""
resp = web_client.get("/login/email?next=//evil.example/")
assert resp.status_code == 200
body = resp.text
assert "evil.example" not in body
# Empty string is the sanitized default.
assert 'name="next" value=""' in body
def test_google_login_stashes_safe_next_in_session(self, web_client, monkeypatch):
"""google_login() must stash the sanitized next_path in the session.
We can't exercise the full OAuth flow without a Google mock, but we
can verify the helper applies the sanitizer correctly."""
from app.auth._common import safe_next_path
# Valid same-origin paths pass through.
assert safe_next_path("/catalog") == "/catalog"
assert safe_next_path("/foo?bar=baz") == "/foo?bar=baz"
# Open-redirect shapes get defaulted.
assert safe_next_path("//evil.example/") == "/dashboard"
assert safe_next_path("http://evil.example/") == "/dashboard"
assert safe_next_path("javascript:alert(1)") == "/dashboard"
assert safe_next_path("") == "/dashboard"
assert safe_next_path(None) == "/dashboard"
# Empty-default variant (used when computing query string).
assert safe_next_path(None, default="") == ""