* fix(security+ops): #82 #85 #87 - auth hardening, API validation, deploy posture

Security and operational hardening across three issue groups:

- M23: docker-compose.override.yml → docker-compose.dev.yml (BREAKING, prod foot-gun)
- C13: Container runs as non-root user 'agnes' (USER directive in Dockerfile)
- M21: Docker resource limits (mem_limit, cpus) on app + scheduler
- M22: Caddyfile security headers (X-Frame-Options, X-Content-Type-Options, Referrer-Policy, -Server)
- M17: /api/health split into minimal (unauth) + /api/health/detailed (auth) (BREAKING)
- M26: release.yml restricts build-and-push to main + workflow_dispatch; paths-ignore for docs
- C2: table_id traversal validation on /api/data/{table_id}/download
- M4: Upload streaming (chunk-read + temp file) instead of full-buffer; /local-md hashed filename
- C5: reset_token removed from POST /api/users/{id}/reset-password response
- C8: Startup WARNING when no user has password_hash (bootstrap window visible)
- M9: Audit log on failed web form login (mirrors /auth/token endpoint)
- M10: Atomic magic-link consume via compare-and-swap (CONSUMED: marker + DuckDB conflict catch)

Also: SSRF protection on /api/admin/configure (#46), memory stats SQL aggregation (#90)

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* fix(review): SSRF 169.254.x.x + IPv6 multicast; M10 marker cleanup safety

Review fixes:

- Add 169.254.0.0/16 (link-local, cloud metadata) to SSRF regex - was missing, allowing requests to AWS/GCP/Azure metadata endpoints
- Add ff[0-9a-f]{2}: (IPv6 multicast) to SSRF regex
- M10: wrap Step 3 (CONSUMED marker cleanup) in try-except with warning log - prevents unhandled exception if DB write fails after successful token consumption
- Add test for 169.254.169.254 SSRF rejection

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* fix(review): SSRF IPv6 bypass, CLI health endpoint, upload FD leak

Address Devin Review findings on PR #104:

1. SSRF IPv6 bypass: Replace hostname regex with DNS resolution + ipaddress module checks. The old regex patterns like `fe80:` only matched up to the first colon, missing real IPv6 addresses like `fe80::1`, `fc00::1`, `ff02::1`. The new approach resolves the hostname via getaddrinfo and checks each resulting IP against ipaddress.is_private/is_loopback/is_link_local/is_reserved/is_multicast.
2. CLI commands broken: `da setup test-connection`, `da setup verify`, `da diagnose`, `da status` all called /api/health expecting the old format (status=="healthy", services dict). Now they call /api/health/detailed for service-level checks (with graceful fallback to the minimal endpoint when auth is not configured).
3. Temp file handle leak: _stream_to_temp returns an open NamedTemporaryFile; callers now close it before shutil.move() to prevent FD leaks until GC.

Also adds IPv6 SSRF test cases (loopback, link-local, unique-local, multicast) with mocked DNS resolution for test environment independence.

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* fix(review): download regex blocks hyphenated IDs; document health split

Address Devin Review round-3 findings on PR #104:

1. _SAFE_IDENTIFIER regex blocked hyphenated table IDs: The download endpoint used the strict SQL-identifier regex which does not allow dots or hyphens, but Keboola table IDs like in.c-crm.orders contain both. Switched to _SAFE_QUOTED_IDENTIFIER which allows dots and hyphens while still blocking path-traversal chars (/, .., \) and quote/control characters. Added test for hyphenated/dotted IDs.
2. Documented health endpoint split in DEPLOYMENT.md: Added Health checks & external monitoring section explaining both endpoints (minimal unauth /api/health vs authenticated /api/health/detailed) and how to wire external monitoring tools to the detailed endpoint with a PAT.

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* release(0.12.1): cut hotfix for snapshot integrity + #82/#85/#87 hardening

* fix(security): apply CAS pattern to password reset confirm (#82/M10 follow-up)

Devin review on the rebased PR flagged the asymmetry: magic-link verify got the atomic compare-and-swap pattern in the original M10 fix, but password reset confirm at /auth/password/reset/confirm was still using read-validate-clear. Two concurrent POSTs with the same valid reset token could both succeed in setting different new passwords (last-write-wins). Lower severity than the magic-link race because the attacker would need the reset token AND to race the legitimate user, but the asymmetry was a polish gap.

Mirrors app/auth/providers/email.py::_consume_token CAS exactly: write unique CONSUMED:<random> marker via UPDATE...WHERE token=old_token, then SELECT to verify our marker won, then proceed. Only the winner clears the marker and applies the password change.

New regression test_concurrent_reset_only_one_wins in tests/test_password_flows.py::TestResetConfirm pins the contract: two ThreadPoolExecutor workers + Barrier hit /reset/confirm with the same token; exactly one gets 302 (password applied), the other gets 200 with 'Invalid or expired'. Sanity-checked against the pre-CAS code - both POSTs got 302 (race confirmed).

---------

Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
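The SSRF item above replaces a hostname regex with DNS resolution plus `ipaddress` checks. A minimal sketch of that idea follows; it is not the project's implementation. The names `is_blocked_address` and `host_is_safe` and the injectable `resolver` parameter are hypothetical, chosen here so the check can run without network access.

```python
import ipaddress
import socket


def is_blocked_address(ip_text):
    """Return True if the IP literal is in a range an SSRF guard would refuse."""
    # Drop an IPv6 zone suffix such as "%eth0" before parsing.
    ip = ipaddress.ip_address(ip_text.split("%", 1)[0])
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
    )


def host_is_safe(hostname, resolver=socket.getaddrinfo):
    """Resolve the hostname and accept it only if every address is allowed."""
    try:
        infos = resolver(hostname, None)
    except socket.gaierror:
        return False  # assumption: unresolvable hosts are refused
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and not any(is_blocked_address(a) for a in addresses)
```

Checking every resolved address, rather than only the first, matters because one hostname can resolve to both a public and an internal address.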
681 lines
27 KiB
Python
"""E2E access control tests: verify data isolation between users."""

import os
import duckdb
import pytest
from tests.conftest import create_mock_extract


def _auth(token):
    return {"Authorization": f"Bearer {token}"}


class TestPublicTablesAccessible:
    """Default: is_public=True tables are accessible to everyone."""

    def test_analyst_sees_public_tables_in_manifest(self, seeded_app):
        """Analyst can see public tables in manifest."""
        c = seeded_app["client"]
        env = seeded_app["env"]

        # Create extract with data
        create_mock_extract(env["extracts_dir"], "keboola", [
            {"name": "orders", "data": [{"id": "1"}]},
        ])
        from src.orchestrator import SyncOrchestrator
        SyncOrchestrator().rebuild()

        # Register table as public (default)
        c.post("/api/admin/register-table", json={
            "name": "orders", "source_type": "keboola",
        }, headers=_auth(seeded_app["admin_token"]))

        # Analyst should see it
        resp = c.get("/api/sync/manifest", headers=_auth(seeded_app["analyst_token"]))
        assert resp.status_code == 200

    def test_analyst_can_download_public_table(self, seeded_app):
        env = seeded_app["env"]
        create_mock_extract(env["extracts_dir"], "keboola", [
            {"name": "orders", "data": [{"id": "1"}]},
        ])
        from src.orchestrator import SyncOrchestrator
        SyncOrchestrator().rebuild()

        c = seeded_app["client"]
        # Register table so access control recognizes it as public
        c.post("/api/admin/register-table", json={
            "name": "orders", "source_type": "keboola",
        }, headers=_auth(seeded_app["admin_token"]))

        resp = c.get("/api/data/orders/download", headers=_auth(seeded_app["analyst_token"]))
        assert resp.status_code == 200

    def test_admin_can_download_public_table(self, seeded_app):
        env = seeded_app["env"]
        create_mock_extract(env["extracts_dir"], "keboola", [
            {"name": "orders", "data": [{"id": "1"}]},
        ])
        from src.orchestrator import SyncOrchestrator
        SyncOrchestrator().rebuild()

        c = seeded_app["client"]
        resp = c.get("/api/data/orders/download", headers=_auth(seeded_app["admin_token"]))
        assert resp.status_code == 200

    def test_public_table_visible_in_catalog(self, seeded_app):
        c = seeded_app["client"]
        env = seeded_app["env"]

        create_mock_extract(env["extracts_dir"], "keboola", [
            {"name": "orders", "data": [{"id": "1"}]},
        ])
        from src.orchestrator import SyncOrchestrator
        SyncOrchestrator().rebuild()

        c.post("/api/admin/register-table", json={
            "name": "orders", "source_type": "keboola",
        }, headers=_auth(seeded_app["admin_token"]))

        resp = c.get("/api/catalog/tables", headers=_auth(seeded_app["analyst_token"]))
        assert resp.status_code == 200
        names = {t["name"] for t in resp.json()["tables"]}
        assert "orders" in names


class TestPrivateTablesRestricted:
    """Tables with is_public=False require explicit permission."""

    def test_analyst_cannot_see_private_table_in_manifest(self, seeded_app):
        """Private table hidden from manifest for unauthorized user."""
        c = seeded_app["client"]
        env = seeded_app["env"]

        # Create extract
        create_mock_extract(env["extracts_dir"], "keboola", [
            {"name": "salaries", "data": [{"id": "1", "amount": "100000"}]},
        ])
        from src.orchestrator import SyncOrchestrator
        SyncOrchestrator().rebuild()

        # Register as public first (default), then make private
        c.post("/api/admin/register-table", json={
            "name": "salaries", "source_type": "keboola",
        }, headers=_auth(seeded_app["admin_token"]))

        # Make private via direct DB update
        from src.db import get_system_db
        conn = get_system_db()
        conn.execute("UPDATE table_registry SET is_public = false WHERE name = 'salaries'")
        conn.close()

        # Analyst should NOT see it in manifest
        resp = c.get("/api/sync/manifest", headers=_auth(seeded_app["analyst_token"]))
        assert "salaries" not in resp.json().get("tables", {})

        # Admin SHOULD see it
        resp = c.get("/api/sync/manifest", headers=_auth(seeded_app["admin_token"]))
        assert resp.status_code == 200
        # Admin sees all; salaries should not be filtered out

    def test_analyst_blocked_from_downloading_private_table(self, seeded_app):
        c = seeded_app["client"]
        env = seeded_app["env"]

        create_mock_extract(env["extracts_dir"], "keboola", [
            {"name": "salaries", "data": [{"id": "1", "amount": "100000"}]},
        ])
        from src.orchestrator import SyncOrchestrator
        SyncOrchestrator().rebuild()

        # Make private
        from src.db import get_system_db
        conn = get_system_db()
        conn.execute(
            "INSERT INTO table_registry (id, name, is_public) VALUES ('salaries','salaries',false) "
            "ON CONFLICT(id) DO UPDATE SET is_public=false"
        )
        conn.close()

        resp = c.get("/api/data/salaries/download", headers=_auth(seeded_app["analyst_token"]))
        assert resp.status_code == 403

    def test_admin_can_download_private_table(self, seeded_app):
        c = seeded_app["client"]
        env = seeded_app["env"]

        create_mock_extract(env["extracts_dir"], "keboola", [
            {"name": "salaries", "data": [{"id": "1", "amount": "100000"}]},
        ])
        from src.orchestrator import SyncOrchestrator
        SyncOrchestrator().rebuild()

        from src.db import get_system_db
        conn = get_system_db()
        conn.execute(
            "INSERT INTO table_registry (id, name, is_public) VALUES ('salaries','salaries',false) "
            "ON CONFLICT(id) DO UPDATE SET is_public=false"
        )
        conn.close()

        resp = c.get("/api/data/salaries/download", headers=_auth(seeded_app["admin_token"]))
        assert resp.status_code == 200

    def test_mixed_public_private_manifest(self, seeded_app):
        """Manifest shows public tables but hides private ones for analyst."""
        c = seeded_app["client"]
        env = seeded_app["env"]

        create_mock_extract(env["extracts_dir"], "keboola", [
            {"name": "orders", "data": [{"id": "1"}]},
            {"name": "salaries", "data": [{"id": "1", "amount": "100000"}]},
        ])
        from src.orchestrator import SyncOrchestrator
        SyncOrchestrator().rebuild()

        # Register both
        c.post("/api/admin/register-table", json={
            "name": "orders", "source_type": "keboola",
        }, headers=_auth(seeded_app["admin_token"]))
        c.post("/api/admin/register-table", json={
            "name": "salaries", "source_type": "keboola",
        }, headers=_auth(seeded_app["admin_token"]))

        # Make salaries private
        from src.db import get_system_db
        conn = get_system_db()
        conn.execute("UPDATE table_registry SET is_public = false WHERE name = 'salaries'")
        conn.close()

        # Analyst sees orders but not salaries
        resp = c.get("/api/sync/manifest", headers=_auth(seeded_app["analyst_token"]))
        tables = resp.json().get("tables", {})
        assert "orders" in tables
        assert "salaries" not in tables


class TestExplicitPermissions:
    """Granting explicit access to private tables."""

    def test_grant_then_access(self, seeded_app):
        c = seeded_app["client"]
        env = seeded_app["env"]

        create_mock_extract(env["extracts_dir"], "keboola", [
            {"name": "salaries", "data": [{"id": "1"}]},
        ])
        from src.orchestrator import SyncOrchestrator
        SyncOrchestrator().rebuild()

        # Make private
        from src.db import get_system_db
        conn = get_system_db()
        conn.execute(
            "INSERT INTO table_registry (id, name, is_public) VALUES ('salaries','salaries',false) "
            "ON CONFLICT(id) DO UPDATE SET is_public=false"
        )
        conn.close()

        # Analyst blocked
        resp = c.get("/api/data/salaries/download", headers=_auth(seeded_app["analyst_token"]))
        assert resp.status_code == 403

        # Admin grants access
        resp = c.post("/api/admin/permissions", json={
            "user_id": "analyst1", "dataset": "salaries", "access": "read",
        }, headers=_auth(seeded_app["admin_token"]))
        assert resp.status_code == 201

        # Now analyst CAN download
        resp = c.get("/api/data/salaries/download", headers=_auth(seeded_app["analyst_token"]))
        assert resp.status_code == 200

    def test_revoke_removes_access(self, seeded_app):
        c = seeded_app["client"]
        env = seeded_app["env"]

        create_mock_extract(env["extracts_dir"], "keboola", [
            {"name": "salaries", "data": [{"id": "1"}]},
        ])
        from src.orchestrator import SyncOrchestrator
        SyncOrchestrator().rebuild()

        from src.db import get_system_db
        conn = get_system_db()
        conn.execute(
            "INSERT INTO table_registry (id, name, is_public) VALUES ('salaries','salaries',false) "
            "ON CONFLICT(id) DO UPDATE SET is_public=false"
        )
        conn.close()

        # Grant
        c.post("/api/admin/permissions", json={
            "user_id": "analyst1", "dataset": "salaries", "access": "read",
        }, headers=_auth(seeded_app["admin_token"]))

        # Verify access
        resp = c.get("/api/data/salaries/download", headers=_auth(seeded_app["analyst_token"]))
        assert resp.status_code == 200

        # Revoke
        c.request("DELETE", "/api/admin/permissions", json={
            "user_id": "analyst1", "dataset": "salaries",
        }, headers=_auth(seeded_app["admin_token"]))

        # Now blocked again
        resp = c.get("/api/data/salaries/download", headers=_auth(seeded_app["analyst_token"]))
        assert resp.status_code == 403

    def test_grant_makes_private_table_visible_in_manifest(self, seeded_app):
        """After granting access, analyst sees private table in manifest."""
        c = seeded_app["client"]
        env = seeded_app["env"]

        create_mock_extract(env["extracts_dir"], "keboola", [
            {"name": "salaries", "data": [{"id": "1"}]},
        ])
        from src.orchestrator import SyncOrchestrator
        SyncOrchestrator().rebuild()

        c.post("/api/admin/register-table", json={
            "name": "salaries", "source_type": "keboola",
        }, headers=_auth(seeded_app["admin_token"]))

        from src.db import get_system_db
        conn = get_system_db()
        conn.execute("UPDATE table_registry SET is_public = false WHERE name = 'salaries'")
        conn.close()

        # Not visible before grant
        resp = c.get("/api/sync/manifest", headers=_auth(seeded_app["analyst_token"]))
        assert "salaries" not in resp.json().get("tables", {})

        # Grant access
        c.post("/api/admin/permissions", json={
            "user_id": "analyst1", "dataset": "salaries", "access": "read",
        }, headers=_auth(seeded_app["admin_token"]))

        # Now visible
        resp = c.get("/api/sync/manifest", headers=_auth(seeded_app["analyst_token"]))
        assert "salaries" in resp.json().get("tables", {})


class TestCatalogFiltering:
    """Catalog only shows accessible tables."""

    def test_private_table_hidden_from_catalog(self, seeded_app):
        c = seeded_app["client"]

        # Register public + private
        c.post("/api/admin/register-table", json={"name": "public_table"}, headers=_auth(seeded_app["admin_token"]))
        c.post("/api/admin/register-table", json={"name": "private_table"}, headers=_auth(seeded_app["admin_token"]))

        from src.db import get_system_db
        conn = get_system_db()
        conn.execute("UPDATE table_registry SET is_public = false WHERE name = 'private_table'")
        conn.close()

        resp = c.get("/api/catalog/tables", headers=_auth(seeded_app["analyst_token"]))
        names = {t["name"] for t in resp.json()["tables"]}
        assert "public_table" in names
        assert "private_table" not in names

    def test_admin_sees_all_in_catalog(self, seeded_app):
        c = seeded_app["client"]

        c.post("/api/admin/register-table", json={"name": "public_table"}, headers=_auth(seeded_app["admin_token"]))
        c.post("/api/admin/register-table", json={"name": "private_table"}, headers=_auth(seeded_app["admin_token"]))

        from src.db import get_system_db
        conn = get_system_db()
        conn.execute("UPDATE table_registry SET is_public = false WHERE name = 'private_table'")
        conn.close()

        resp = c.get("/api/catalog/tables", headers=_auth(seeded_app["admin_token"]))
        names = {t["name"] for t in resp.json()["tables"]}
        assert "public_table" in names
        assert "private_table" in names

    def test_granted_private_table_shown_in_catalog(self, seeded_app):
        """After granting access, private table appears in catalog for that user."""
        c = seeded_app["client"]

        c.post("/api/admin/register-table", json={"name": "secret_data"}, headers=_auth(seeded_app["admin_token"]))

        from src.db import get_system_db
        conn = get_system_db()
        conn.execute("UPDATE table_registry SET is_public = false WHERE name = 'secret_data'")
        conn.close()

        # Not visible before grant
        resp = c.get("/api/catalog/tables", headers=_auth(seeded_app["analyst_token"]))
        names = {t["name"] for t in resp.json()["tables"]}
        assert "secret_data" not in names

        # Grant access
        c.post("/api/admin/permissions", json={
            "user_id": "analyst1", "dataset": "secret_data", "access": "read",
        }, headers=_auth(seeded_app["admin_token"]))

        # Now visible
        resp = c.get("/api/catalog/tables", headers=_auth(seeded_app["analyst_token"]))
        names = {t["name"] for t in resp.json()["tables"]}
        assert "secret_data" in names


class TestPermissionsAPI:
    """Admin permissions CRUD."""

    def test_grant_and_list(self, seeded_app):
        c = seeded_app["client"]
        h = _auth(seeded_app["admin_token"])

        resp = c.post("/api/admin/permissions", json={
            "user_id": "analyst1", "dataset": "secret_data", "access": "read",
        }, headers=h)
        assert resp.status_code == 201

        resp = c.get("/api/admin/permissions/analyst1", headers=h)
        assert resp.status_code == 200
        datasets = {p["dataset"] for p in resp.json()["permissions"]}
        assert "secret_data" in datasets

    def test_analyst_cannot_manage_permissions(self, seeded_app):
        c = seeded_app["client"]
        resp = c.post("/api/admin/permissions", json={
            "user_id": "analyst1", "dataset": "anything",
        }, headers=_auth(seeded_app["analyst_token"]))
        assert resp.status_code == 403

    def test_grant_multiple_datasets(self, seeded_app):
        c = seeded_app["client"]
        h = _auth(seeded_app["admin_token"])

        for ds in ["sales", "hr", "finance"]:
            resp = c.post("/api/admin/permissions", json={
                "user_id": "analyst1", "dataset": ds, "access": "read",
            }, headers=h)
            assert resp.status_code == 201

        resp = c.get("/api/admin/permissions/analyst1", headers=h)
        datasets = {p["dataset"] for p in resp.json()["permissions"]}
        assert datasets == {"sales", "hr", "finance"}

    def test_revoke_via_delete(self, seeded_app):
        c = seeded_app["client"]
        h = _auth(seeded_app["admin_token"])

        c.post("/api/admin/permissions", json={
            "user_id": "analyst1", "dataset": "secret_data", "access": "read",
        }, headers=h)

        resp = c.request("DELETE", "/api/admin/permissions", json={
            "user_id": "analyst1", "dataset": "secret_data",
        }, headers=h)
        assert resp.status_code == 200

        resp = c.get("/api/admin/permissions/analyst1", headers=h)
        datasets = {p["dataset"] for p in resp.json()["permissions"]}
        assert "secret_data" not in datasets

    def test_analyst_cannot_revoke_permissions(self, seeded_app):
        c = seeded_app["client"]
        resp = c.request("DELETE", "/api/admin/permissions", json={
            "user_id": "analyst1", "dataset": "anything",
        }, headers=_auth(seeded_app["analyst_token"]))
        assert resp.status_code == 403


class TestQueryFiltering:
    """Query endpoint respects access control."""

    def test_analyst_blocked_from_querying_private_table(self, seeded_app):
        c = seeded_app["client"]
        env = seeded_app["env"]

        # Create extract with private data
        create_mock_extract(env["extracts_dir"], "keboola", [
            {"name": "salaries", "data": [{"id": "1", "amount": "100000"}]},
        ])
        from src.orchestrator import SyncOrchestrator
        SyncOrchestrator().rebuild()

        from src.db import get_system_db
        conn = get_system_db()
        conn.execute(
            "INSERT INTO table_registry (id, name, is_public) VALUES ('salaries','salaries',false) "
            "ON CONFLICT(id) DO UPDATE SET is_public=false"
        )
        conn.close()

        resp = c.post("/api/query", json={"sql": "SELECT * FROM salaries"},
                      headers=_auth(seeded_app["analyst_token"]))
        assert resp.status_code == 403

    def test_admin_can_query_private_table(self, seeded_app):
        c = seeded_app["client"]
        env = seeded_app["env"]

        create_mock_extract(env["extracts_dir"], "keboola", [
            {"name": "salaries", "data": [{"id": "1", "amount": "100000"}]},
        ])
        from src.orchestrator import SyncOrchestrator
        SyncOrchestrator().rebuild()

        from src.db import get_system_db
        conn = get_system_db()
        conn.execute(
            "INSERT INTO table_registry (id, name, is_public) VALUES ('salaries','salaries',false) "
            "ON CONFLICT(id) DO UPDATE SET is_public=false"
        )
        conn.close()

        resp = c.post("/api/query", json={"sql": "SELECT * FROM salaries"},
                      headers=_auth(seeded_app["admin_token"]))
        # Admin should not be blocked by access control
        assert resp.status_code != 403

    def test_analyst_can_query_public_table(self, seeded_app):
        c = seeded_app["client"]
        env = seeded_app["env"]

        create_mock_extract(env["extracts_dir"], "keboola", [
            {"name": "orders", "data": [{"id": "1", "total": "99.99"}]},
        ])
        from src.orchestrator import SyncOrchestrator
        SyncOrchestrator().rebuild()

        # Register table so access control recognizes it as public
        c.post("/api/admin/register-table", json={
            "name": "orders", "source_type": "keboola",
        }, headers=_auth(seeded_app["admin_token"]))

        resp = c.post("/api/query", json={"sql": "SELECT * FROM orders"},
                      headers=_auth(seeded_app["analyst_token"]))
        # Public table should not be blocked
        assert resp.status_code != 403

    def test_granted_analyst_can_query_private_table(self, seeded_app):
        c = seeded_app["client"]
        env = seeded_app["env"]

        create_mock_extract(env["extracts_dir"], "keboola", [
            {"name": "salaries", "data": [{"id": "1", "amount": "100000"}]},
        ])
        from src.orchestrator import SyncOrchestrator
        SyncOrchestrator().rebuild()

        from src.db import get_system_db
        conn = get_system_db()
        conn.execute(
            "INSERT INTO table_registry (id, name, is_public) VALUES ('salaries','salaries',false) "
            "ON CONFLICT(id) DO UPDATE SET is_public=false"
        )
        conn.close()

        # Grant access
        c.post("/api/admin/permissions", json={
            "user_id": "analyst1", "dataset": "salaries", "access": "read",
        }, headers=_auth(seeded_app["admin_token"]))

        resp = c.post("/api/query", json={"sql": "SELECT * FROM salaries"},
                      headers=_auth(seeded_app["analyst_token"]))
        assert resp.status_code != 403


class TestAccessRequestFlow:
    """Full access request lifecycle: request -> approve -> access."""

    def test_request_approve_access(self, seeded_app):
        """Analyst requests -> admin approves -> analyst gets access."""
        c = seeded_app["client"]
        env = seeded_app["env"]
        admin_h = _auth(seeded_app["admin_token"])
        analyst_h = _auth(seeded_app["analyst_token"])

        # Create private table with data
        create_mock_extract(env["extracts_dir"], "keboola", [
            {"name": "secret", "data": [{"id": "1", "val": "hidden"}]},
        ])
        from src.db import get_system_db
        conn = get_system_db()
        conn.execute("INSERT INTO table_registry (id,name,is_public) VALUES ('secret','secret',false) ON CONFLICT(id) DO UPDATE SET is_public=false")
        conn.close()

        # Analyst blocked
        assert c.get("/api/data/secret/download", headers=analyst_h).status_code == 403

        # Analyst requests access
        resp = c.post("/api/access-requests", json={"table_id": "secret", "reason": "Need for analysis"},
                      headers=analyst_h)
        assert resp.status_code == 201
        req_id = resp.json()["id"]

        # Check pending
        resp = c.get("/api/access-requests/pending", headers=admin_h)
        assert resp.status_code == 200
        assert any(r["id"] == req_id for r in resp.json()["requests"])

        # Admin approves
        resp = c.post(f"/api/access-requests/{req_id}/approve", headers=admin_h)
        assert resp.status_code == 200

        # Analyst now has access
        assert c.get("/api/data/secret/download", headers=analyst_h).status_code == 200

    def test_request_deny(self, seeded_app):
        c = seeded_app["client"]
        admin_h = _auth(seeded_app["admin_token"])
        analyst_h = _auth(seeded_app["analyst_token"])

        from src.db import get_system_db
        conn = get_system_db()
        conn.execute("INSERT INTO table_registry (id,name,is_public) VALUES ('denied_tbl','denied_tbl',false) ON CONFLICT(id) DO UPDATE SET is_public=false")
        conn.close()

        resp = c.post("/api/access-requests", json={"table_id": "denied_tbl"}, headers=analyst_h)
        req_id = resp.json()["id"]

        resp = c.post(f"/api/access-requests/{req_id}/deny", headers=admin_h)
        assert resp.status_code == 200
        assert resp.json()["status"] == "denied"

    def test_duplicate_request_409(self, seeded_app):
        c = seeded_app["client"]
        analyst_h = _auth(seeded_app["analyst_token"])

        from src.db import get_system_db
        conn = get_system_db()
        conn.execute("INSERT INTO table_registry (id,name,is_public) VALUES ('dup_tbl','dup_tbl',false) ON CONFLICT(id) DO UPDATE SET is_public=false")
        conn.close()

        c.post("/api/access-requests", json={"table_id": "dup_tbl"}, headers=analyst_h)
        resp = c.post("/api/access-requests", json={"table_id": "dup_tbl"}, headers=analyst_h)
        assert resp.status_code == 409

    def test_my_requests(self, seeded_app):
        c = seeded_app["client"]
        analyst_h = _auth(seeded_app["analyst_token"])

        from src.db import get_system_db
        conn = get_system_db()
        conn.execute("INSERT INTO table_registry (id,name,is_public) VALUES ('my_req_tbl','my_req_tbl',false) ON CONFLICT(id) DO UPDATE SET is_public=false")
        conn.close()

        c.post("/api/access-requests", json={"table_id": "my_req_tbl"}, headers=analyst_h)
        resp = c.get("/api/access-requests/my", headers=analyst_h)
        assert resp.status_code == 200
        assert any(r["table_id"] == "my_req_tbl" for r in resp.json()["requests"])


class TestUnauthenticatedAccess:
    """Endpoints require authentication."""

    def test_manifest_requires_auth(self, seeded_app):
        c = seeded_app["client"]
        resp = c.get("/api/sync/manifest")
        assert resp.status_code in (401, 403)

    def test_download_requires_auth(self, seeded_app):
        c = seeded_app["client"]
        resp = c.get("/api/data/orders/download")
        assert resp.status_code in (401, 403)

    def test_catalog_requires_auth(self, seeded_app):
        c = seeded_app["client"]
        resp = c.get("/api/catalog/tables")
        assert resp.status_code in (401, 403)

    def test_query_requires_auth(self, seeded_app):
        c = seeded_app["client"]
        resp = c.post("/api/query", json={"sql": "SELECT 1"})
        assert resp.status_code in (401, 403)

    def test_permissions_api_requires_auth(self, seeded_app):
        c = seeded_app["client"]
        resp = c.post("/api/admin/permissions", json={
            "user_id": "analyst1", "dataset": "anything",
        })
        assert resp.status_code in (401, 403)


class TestDownloadPathTraversal:
    """Path-traversal protection: unsafe table_id values are rejected with 404."""

    def test_download_rejects_traversal_id(self, seeded_app):
        """URL-encoded path traversal in table_id returns 404."""
        c = seeded_app["client"]
        # ..%2F..%2Fstate%2Fsystem decodes to ../../state/system
        resp = c.get("/api/data/..%2F..%2Fstate%2Fsystem/download",
                     headers=_auth(seeded_app["admin_token"]))
        assert resp.status_code == 404

    def test_download_rejects_dotdot(self, seeded_app):
        """Literal ../../etc/passwd in table_id returns 404."""
        c = seeded_app["client"]
        resp = c.get('/api/data/../../etc/passwd/download',
                     headers=_auth(seeded_app["admin_token"]))
        assert resp.status_code == 404

    def test_download_rejects_special_chars(self, seeded_app):
        """table_id with spaces, slashes, or other dangerous chars returns 404."""
        c = seeded_app["client"]
        # Spaces
        resp = c.get("/api/data/my%20table/download",
                     headers=_auth(seeded_app["admin_token"]))
        assert resp.status_code == 404
        # Slashes
        resp = c.get("/api/data/foo/bar/download",
                     headers=_auth(seeded_app["admin_token"]))
        assert resp.status_code == 404

    def test_download_accepts_hyphenated_dotted_id(self, seeded_app):
        """Keboola-style table IDs with dots and hyphens (e.g. in.c-crm.orders)
        pass validation; they are safe for filesystem lookup and DB query."""
        c = seeded_app["client"]
        # No parquet file exists, so we expect 404 from "not found on disk",
        # NOT 404 from identifier validation rejection.
        resp = c.get("/api/data/in.c-crm.orders/download",
                     headers=_auth(seeded_app["admin_token"]))
        assert resp.status_code == 404
        assert "not found" in resp.json()["detail"].lower()