agnes-the-ai-analyst/tests/test_api.py
minasarustamyan d4ac84dd46
feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150)
* feat(rbac): drop dataset_permissions + access_requests + users.role + is_public; v19 migration

BREAKING. Sjednocení datové RBAC vrstvy do per-group resource_grants modelu.
Před PR byla legacy data RBAC vrstva (dataset_permissions + is_public bypass)
de-facto neaktivní — is_public neměl API/UI/CLI surface, default true znamenal
že can_access_table vždycky bypassl. Dnes každý non-admin přístup vyžaduje
explicitní resource_grants(group, "table", id) řádek.

Schema v18 → v19 (src/db.py:_v18_to_v19_finalize):
- DROP TABLE dataset_permissions, access_requests
- DROP COLUMN users.role (NULL artifact since v13)
- DROP COLUMN table_registry.is_public
- Drops přes table-rebuild idiom (rename → create new → INSERT … SELECT
  → drop old) kvůli DuckDB ALTER DROP COLUMN limitacím na tabulkách
  s historic FK constraints. INSERT picks intersection sloupců, takže
  test fixtures s minimal pre-v19 schemou migrate cleanly.

Runtime:
- src/rbac.py:can_access_table → deleguje na app.auth.access.can_access
- DatasetPermissionRepository, AccessRequestRepository smazány
- AGNES_ENABLE_TABLE_GRANTS env-gate v app/resource_types.py odstraněn
  (TABLE je unconditionally enabled)

API drop:
- app/api/permissions.py, app/api/access_requests.py celé soubory
- /admin/permissions web route + admin_permissions.html
- "Request Access" modal v catalog.html + locked-row UI
- ~10 if user.get("role") != "admin" checků nahrazeno (admin shortcut
  je uvnitř can_access_table)
- /api/settings: drop permissions field z GET; PUT /api/settings/dataset
  gate přepnut na can_access(user_id, "table", dataset, conn)

Auth:
- app/auth/jwt.py:create_access_token: drop role parametr (claim zmizí
  z nově vydávaných JWT; staré tokeny zůstávají valid, claim ignored)
- app/api/users.py: drop role z CreateUserRequest / UpdateUserRequest
  (admin promotion = explicit add to Admin group via memberships API)
- src/repositories/users.py: drop role z create() / update()

CLI:
- da admin set-role smazán → hard-fail s replacement command
- da admin add-user --role flag pryč
- da auth import-token --role flag pryč
- da auth whoami: drop "Role:" výpis
- cli/config.py:save_token: role parametr now optional, no longer written
  (back-compat se starými token.json soubory zachována — pole se ignoruje)

Tests:
- DELETE: test_permissions.py, test_permissions_api.py, test_access_requests_api.py
- REWRITE: test_access_control.py (resource_grants flow), test_rbac.py
  (can_access_table over resource_grants), test_journey_rbac.py
  (drop access-request flow), test_resource_types.py (drop env-gate
  tests, drop is_public from helpers), test_v2_*.py (drop role-based
  user dicts in favor of id-based + Admin group membership),
  test_settings_api.py (no permissions field, can_access gate)
- TRIVIAL: ~30 souborů — drop role="admin" arg z UserRepository.create
  a 3rd positional role z create_access_token
- NEW: test_v18_to_v19 migration test (test_db.py),
  test_can_access_table_no_implicit_public (test_rbac.py),
  test_admin_set_role_returns_hardfail (test_cli_admin.py)
- OpenAPI snapshot regenerated

Docs:
- CHANGELOG: BREAKING entry pod [Unreleased]
- CLAUDE.md: schema v18 → v19
- docs/architecture.md: schema table + RBAC sekce přepsána
- docs/auth-google-oauth.md: admin promotion přes da admin break-glass
- cli/skills/security.md: kompletně přepsáno na group-based model
- docs/TODO-rbac-data-enforcement.md: smazáno (TODO splněn)

Test results: 2363 passed, 19 failed. Zbývající failures jsou pre-existing
Windows-specific issues (fcntl, charset) nesouvisející s tímto PR —
ověřeno git stash pop.

Plan: ~/.claude/plans/floofy-coalescing-parnas.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): cut 0.27.0

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-04-30 22:02:16 +02:00

517 lines
19 KiB
Python

"""Tests for FastAPI endpoints."""
import os
import pytest
from fastapi.testclient import TestClient
def _auth(token):
return {"Authorization": f"Bearer {token}"}
@pytest.fixture
def app_client(tmp_path, monkeypatch):
monkeypatch.setenv("DATA_DIR", str(tmp_path))
monkeypatch.setenv("JWT_SECRET_KEY", "test-secret")
from app.main import create_app
app = create_app()
return TestClient(app)
@pytest.fixture
def seeded_client(tmp_path, monkeypatch):
"""Client with a pre-created admin user and JWT token."""
monkeypatch.setenv("DATA_DIR", str(tmp_path))
monkeypatch.setenv("JWT_SECRET_KEY", "test-secret")
from app.main import create_app
from src.db import get_system_db
from src.repositories.users import UserRepository
from app.auth.jwt import create_access_token
from argon2 import PasswordHasher
ph = PasswordHasher()
from tests.helpers.auth import grant_admin
conn = get_system_db()
repo = UserRepository(conn)
repo.create(id="admin1", email="admin@acme.com", name="Admin",
password_hash=ph.hash("adminpass"))
repo.create(id="analyst1", email="analyst@acme.com", name="Analyst",
password_hash=ph.hash("analystpass"))
grant_admin(conn, "admin1")
conn.close()
app = create_app()
client = TestClient(app)
admin_token = create_access_token("admin1", "admin@acme.com")
analyst_token = create_access_token("analyst1", "analyst@acme.com")
return client, admin_token, analyst_token
# ---- Health ----
class TestHealth:
def test_health_no_auth(self, app_client):
resp = app_client.get("/api/health")
assert resp.status_code == 200
data = resp.json()
assert data["status"] == "ok"
def test_health_detailed_requires_auth(self, app_client):
resp = app_client.get("/api/health/detailed")
assert resp.status_code in (401, 403)
def test_health_detailed_has_duckdb_check(self, seeded_app):
c = seeded_app["client"]
resp = c.get("/api/health/detailed", headers=_auth(seeded_app["admin_token"]))
assert resp.status_code == 200
data = resp.json()
assert data["status"] in ("healthy", "degraded", "unhealthy")
assert "services" in data
assert "duckdb_state" in data["services"]
assert data["services"]["duckdb_state"]["status"] == "ok"
# ---- Auth ----
class TestAuth:
def test_token_for_existing_user(self, seeded_client):
client, _, _ = seeded_client
resp = client.post("/auth/token", json={"email": "admin@acme.com", "password": "adminpass"})
assert resp.status_code == 200
data = resp.json()
assert "access_token" in data
assert data["role"] == "admin"
def test_token_for_unknown_user(self, seeded_client):
client, _, _ = seeded_client
resp = client.post("/auth/token", json={"email": "nobody@acme.com"})
assert resp.status_code == 401
def test_protected_endpoint_without_token(self, seeded_client):
client, _, _ = seeded_client
resp = client.get("/api/users")
assert resp.status_code == 401
def test_protected_endpoint_with_token(self, seeded_client):
client, admin_token, _ = seeded_client
resp = client.get("/api/users", headers={"Authorization": f"Bearer {admin_token}"})
assert resp.status_code == 200
# ---- RBAC ----
class TestRBAC:
def test_admin_can_list_users(self, seeded_client):
client, admin_token, _ = seeded_client
resp = client.get("/api/users", headers={"Authorization": f"Bearer {admin_token}"})
assert resp.status_code == 200
assert len(resp.json()) == 2
def test_analyst_cannot_list_users(self, seeded_client):
client, _, analyst_token = seeded_client
resp = client.get("/api/users", headers={"Authorization": f"Bearer {analyst_token}"})
assert resp.status_code == 403
def test_analyst_cannot_trigger_sync(self, seeded_client):
client, _, analyst_token = seeded_client
resp = client.post("/api/sync/trigger", headers={"Authorization": f"Bearer {analyst_token}"})
assert resp.status_code == 403
# ---- Sync Manifest ----
class TestSyncManifest:
def test_manifest_returns_tables(self, seeded_client):
client, admin_token, _ = seeded_client
# Seed some sync state
from src.db import get_system_db
from src.repositories.sync_state import SyncStateRepository
conn = get_system_db()
repo = SyncStateRepository(conn)
repo.update_sync(table_id="orders", rows=1000, file_size_bytes=5000, hash="abc")
conn.close()
resp = client.get("/api/sync/manifest", headers={"Authorization": f"Bearer {admin_token}"})
assert resp.status_code == 200
data = resp.json()
assert "tables" in data
assert "orders" in data["tables"]
assert data["tables"]["orders"]["rows"] == 1000
# ---- Users CRUD ----
class TestUsersCRUD:
def test_create_user(self, seeded_client):
client, admin_token, _ = seeded_client
resp = client.post(
"/api/users",
json={"email": "new@acme.com", "name": "New User"},
headers={"Authorization": f"Bearer {admin_token}"},
)
assert resp.status_code == 201
assert resp.json()["email"] == "new@acme.com"
def test_create_duplicate_user(self, seeded_client):
client, admin_token, _ = seeded_client
resp = client.post(
"/api/users",
json={"email": "admin@acme.com", "name": "Duplicate"},
headers={"Authorization": f"Bearer {admin_token}"},
)
assert resp.status_code == 409
def test_delete_user(self, seeded_client):
client, admin_token, _ = seeded_client
resp = client.delete(
"/api/users/analyst1",
headers={"Authorization": f"Bearer {admin_token}"},
)
assert resp.status_code == 204
# ---- Knowledge / Memory ----
class TestMemory:
def test_create_and_list(self, seeded_client):
client, _, analyst_token = seeded_client
headers = {"Authorization": f"Bearer {analyst_token}"}
# Create
resp = client.post("/api/memory", json={
"title": "MRR Definition",
"content": "Monthly recurring revenue",
"category": "metrics",
}, headers=headers)
assert resp.status_code == 201
item_id = resp.json()["id"]
# List
resp = client.get("/api/memory", headers=headers)
assert resp.status_code == 200
assert resp.json()["count"] == 1
def test_vote(self, seeded_client):
client, _, analyst_token = seeded_client
headers = {"Authorization": f"Bearer {analyst_token}"}
resp = client.post("/api/memory", json={
"title": "Test", "content": "test", "category": "test",
}, headers=headers)
item_id = resp.json()["id"]
resp = client.post(f"/api/memory/{item_id}/vote", json={"vote": 1}, headers=headers)
assert resp.status_code == 200
assert resp.json()["upvotes"] == 1
def test_search(self, seeded_client):
client, _, analyst_token = seeded_client
headers = {"Authorization": f"Bearer {analyst_token}"}
client.post("/api/memory", json={
"title": "Revenue report", "content": "MRR trends", "category": "finance",
}, headers=headers)
client.post("/api/memory", json={
"title": "Support SLA", "content": "Response times", "category": "support",
}, headers=headers)
resp = client.get("/api/memory?search=revenue", headers=headers)
assert resp.json()["count"] == 1
# ---- Upload ----
class TestUpload:
def test_upload_session(self, seeded_client):
client, _, analyst_token = seeded_client
headers = {"Authorization": f"Bearer {analyst_token}"}
resp = client.post(
"/api/upload/sessions",
files={"file": ("session.jsonl", b'{"role":"user","content":"hello"}', "application/jsonl")},
headers=headers,
)
assert resp.status_code == 200
assert resp.json()["size"] > 0
def test_upload_local_md(self, seeded_client):
client, _, analyst_token = seeded_client
headers = {"Authorization": f"Bearer {analyst_token}"}
resp = client.post(
"/api/upload/local-md",
json={"content": "# My knowledge\n\nMRR = Monthly Recurring Revenue"},
headers=headers,
)
assert resp.status_code == 200
assert resp.json()["user"] == "analyst@acme.com"
# ---- Metrics API ----
SAMPLE_METRIC = {
"id": "finance/mrr",
"name": "mrr",
"display_name": "Monthly Recurring Revenue",
"category": "finance",
"sql": "SELECT SUM(amount) FROM subscriptions WHERE active = true",
}
class TestMetricsAPI:
def test_list_metrics_empty(self, seeded_client):
client, admin_token, _ = seeded_client
headers = {"Authorization": f"Bearer {admin_token}"}
resp = client.get("/api/metrics", headers=headers)
assert resp.status_code == 200
data = resp.json()
assert data["count"] == 0
assert data["metrics"] == []
def test_create_and_list_metric(self, seeded_client):
client, admin_token, _ = seeded_client
headers = {"Authorization": f"Bearer {admin_token}"}
resp = client.post("/api/admin/metrics", json=SAMPLE_METRIC, headers=headers)
assert resp.status_code == 201
resp = client.get("/api/metrics", headers=headers)
assert resp.status_code == 200
data = resp.json()
assert data["count"] == 1
assert data["metrics"][0]["id"] == "finance/mrr"
def test_get_metric_detail(self, seeded_client):
client, admin_token, _ = seeded_client
headers = {"Authorization": f"Bearer {admin_token}"}
client.post("/api/admin/metrics", json=SAMPLE_METRIC, headers=headers)
resp = client.get("/api/metrics/finance/mrr", headers=headers)
assert resp.status_code == 200
data = resp.json()
assert data["id"] == "finance/mrr"
assert data["display_name"] == "Monthly Recurring Revenue"
assert data["category"] == "finance"
def test_get_metric_not_found(self, seeded_client):
client, admin_token, _ = seeded_client
headers = {"Authorization": f"Bearer {admin_token}"}
resp = client.get("/api/metrics/nonexistent/metric", headers=headers)
assert resp.status_code == 404
def test_delete_metric(self, seeded_client):
client, admin_token, _ = seeded_client
headers = {"Authorization": f"Bearer {admin_token}"}
client.post("/api/admin/metrics", json=SAMPLE_METRIC, headers=headers)
resp = client.delete("/api/admin/metrics/finance/mrr", headers=headers)
assert resp.status_code == 200
assert resp.json()["status"] == "deleted"
resp = client.get("/api/metrics/finance/mrr", headers=headers)
assert resp.status_code == 404
def test_analyst_cannot_create_metric(self, seeded_client):
client, _, analyst_token = seeded_client
headers = {"Authorization": f"Bearer {analyst_token}"}
resp = client.post("/api/admin/metrics", json=SAMPLE_METRIC, headers=headers)
assert resp.status_code == 403
def test_list_metrics_filter_by_category(self, seeded_client):
client, admin_token, _ = seeded_client
headers = {"Authorization": f"Bearer {admin_token}"}
finance_metric = {**SAMPLE_METRIC}
support_metric = {
"id": "support/tickets",
"name": "tickets",
"display_name": "Open Tickets",
"category": "support",
"sql": "SELECT COUNT(*) FROM tickets WHERE status = 'open'",
}
client.post("/api/admin/metrics", json=finance_metric, headers=headers)
client.post("/api/admin/metrics", json=support_metric, headers=headers)
resp = client.get("/api/metrics?category=finance", headers=headers)
assert resp.status_code == 200
data = resp.json()
assert data["count"] == 1
assert data["metrics"][0]["category"] == "finance"
def test_import_metrics_yaml(self, seeded_client):
client, admin_token, _ = seeded_client
yaml_content = b"- name: test_metric\n display_name: Test\n category: test\n sql: SELECT 1\n"
resp = client.post(
"/api/admin/metrics/import",
files={"file": ("test.yml", yaml_content, "application/x-yaml")},
headers={"Authorization": f"Bearer {admin_token}"},
)
assert resp.status_code == 200
assert resp.json()["count"] == 1
class TestMetadataAPI:
def test_get_metadata_empty(self, seeded_client):
client, admin_token, _ = seeded_client
resp = client.get("/api/admin/metadata/orders", headers={"Authorization": f"Bearer {admin_token}"})
assert resp.status_code == 200
assert resp.json()["columns"] == []
def test_save_and_get_metadata(self, seeded_client):
client, admin_token, _ = seeded_client
resp = client.post(
"/api/admin/metadata/orders",
json={"columns": [
{"column_name": "id", "basetype": "STRING", "description": "Order ID"},
{"column_name": "total", "basetype": "NUMERIC", "description": "Total"},
]},
headers={"Authorization": f"Bearer {admin_token}"},
)
assert resp.status_code == 200
assert resp.json()["status"] == "ok"
assert resp.json()["count"] == 2
resp = client.get("/api/admin/metadata/orders", headers={"Authorization": f"Bearer {admin_token}"})
assert resp.status_code == 200
assert len(resp.json()["columns"]) == 2
def test_analyst_cannot_save_metadata(self, seeded_client):
client, _, analyst_token = seeded_client
resp = client.post(
"/api/admin/metadata/orders",
json={"columns": [{"column_name": "id", "basetype": "STRING"}]},
headers={"Authorization": f"Bearer {analyst_token}"},
)
assert resp.status_code == 403
def test_push_non_keboola_table_fails(self, seeded_client):
client, admin_token, _ = seeded_client
resp = client.post(
"/api/admin/metadata/orders/push",
headers={"Authorization": f"Bearer {admin_token}"},
)
# 'orders' is not in table_registry — expect 404 or 400
assert resp.status_code in (400, 404)
def test_push_keboola_table(self, seeded_client, monkeypatch): # noqa: F811
client, admin_token, _ = seeded_client
# 1. Register a keboola table
from src.db import get_system_db
from src.repositories.table_registry import TableRegistryRepository
conn = get_system_db()
TableRegistryRepository(conn).register(
id="kbc_orders",
name="Orders",
source_type="keboola",
source_table="in.c-main.orders",
)
conn.close()
# 2. Save column metadata
client.post(
"/api/admin/metadata/kbc_orders",
json={"columns": [
{"column_name": "id", "basetype": "STRING", "description": "Order ID"},
]},
headers={"Authorization": f"Bearer {admin_token}"},
)
# 3. Set required env vars
monkeypatch.setenv("KBC_STACK_URL", "https://connection.keboola.com")
monkeypatch.setenv("KBC_STORAGE_TOKEN", "test-token")
# 4. Mock httpx.AsyncClient so no real HTTP call is made
import httpx as _httpx
from unittest.mock import AsyncMock, MagicMock, patch
mock_response = _httpx.Response(200, json={"ok": True})
mock_post = AsyncMock(return_value=mock_response)
# AsyncClient is used as "async with httpx.AsyncClient() as client: await client.post(...)"
# We patch the class so the context-manager instance has our mock_post.
mock_async_client_instance = MagicMock()
mock_async_client_instance.post = mock_post
mock_async_client_instance.__aenter__ = AsyncMock(return_value=mock_async_client_instance)
mock_async_client_instance.__aexit__ = AsyncMock(return_value=False)
with patch("httpx.AsyncClient", return_value=mock_async_client_instance):
resp = client.post(
"/api/admin/metadata/kbc_orders/push",
headers={"Authorization": f"Bearer {admin_token}"},
)
# 5. Assertions
assert resp.status_code == 200
data = resp.json()
assert data.get("pushed") == 1
mock_post.assert_called_once()
call_args = mock_post.call_args
# Verify URL contains the source table name
called_url = call_args[0][0] if call_args[0] else call_args.kwargs.get("url", "")
assert "in.c-main.orders" in called_url
# Verify auth header
called_headers = call_args.kwargs.get("headers", {})
assert called_headers.get("X-StorageApi-Token") == "test-token"
# Verify payload structure
called_json = call_args.kwargs.get("json", {})
assert called_json.get("provider") == "ai-metadata-enrichment"
assert isinstance(called_json.get("metadata"), list)
# ---- Hybrid Query ----
class TestHybridQueryAPI:
def test_hybrid_query_requires_admin(self, seeded_client):
client, _, analyst_token = seeded_client
resp = client.post(
"/api/query/hybrid",
json={"sql": "SELECT 1 AS val", "register_bq": {}},
headers={"Authorization": f"Bearer {analyst_token}"},
)
assert resp.status_code == 403
def test_hybrid_query_local_only(self, seeded_client):
client, admin_token, _ = seeded_client
resp = client.post(
"/api/query/hybrid",
json={"sql": "SELECT 1 AS val", "register_bq": {}},
headers={"Authorization": f"Bearer {admin_token}"},
)
assert resp.status_code == 200
data = resp.json()
assert "columns" in data
assert "rows" in data
assert data["columns"] == ["val"]
assert data["rows"] == [[1]]
def test_hybrid_query_blocked_sql(self, seeded_client):
client, admin_token, _ = seeded_client
resp = client.post(
"/api/query/hybrid",
json={"sql": "DROP TABLE users", "register_bq": {}},
headers={"Authorization": f"Bearer {admin_token}"},
)
assert resp.status_code == 400
assert "query_error" in resp.json()["detail"]
def test_hybrid_query_blocked_bq_sql(self, seeded_client):
client, admin_token, _ = seeded_client
resp = client.post(
"/api/query/hybrid",
json={
"sql": "SELECT 1",
"register_bq": {"bad_alias": "DROP TABLE sensitive"},
},
headers={"Authorization": f"Bearer {admin_token}"},
)
assert resp.status_code == 400
assert "query_error" in resp.json()["detail"]