Commit graph

194 commits

Author SHA1 Message Date
ZdenekSrotyr
2043594670 fix: restrict script execution endpoints to analyst/admin roles
deploy, run, and run-deployed require analyst; undeploy requires admin.
Update test to use admin token for undeploy.
2026-04-09 16:31:42 +02:00
ZdenekSrotyr
449053bf8a fix: enforce per-table access control on catalog profile endpoints
Add can_access_table check to GET /api/catalog/profile/{table_name} and
POST /api/catalog/profile/{table_name}/refresh, returning 403 for
unauthorized tables. Update test_api_complete to cover new 403 behaviour
and fix the existing 404 test to use admin token.
2026-04-09 16:30:24 +02:00
ZdenekSrotyr
ad6b3a96e4 fix: enforce role guards on admin web pages
Add require_role(Role.ADMIN) to /admin/tables and /admin/permissions,
and require_role(Role.KM_ADMIN) to /corporate-memory/admin so that
non-admin users receive 403 instead of being served the page.

Fix admin_cookie test fixture to supply a password_hash (required since
the /auth/token endpoint blocks passwordless requests). Add analyst
fixture and TestAdminRoleGuards tests verifying analysts get 403 and
admins get 200 on the protected routes.
2026-04-09 16:30:13 +02:00
ZdenekSrotyr
3205a8d300 fix: block /auth/token for OAuth-only users without password_hash
Users without a password_hash (Google OAuth / magic-link accounts) could
obtain a JWT by simply posting their email to /auth/token. Add an else
clause that rejects such requests with 401, directing them to their
configured auth provider. Update and extend tests accordingly.
2026-04-09 16:29:47 +02:00
ZdenekSrotyr
55515266ea fix: block DuckDB metadata functions and relative paths in query endpoint
Add information_schema, duckdb_* introspection functions, pragma_* functions,
and relative path traversal patterns to the SQL blocklist so users cannot
enumerate schema metadata regardless of RBAC. Add six corresponding tests.
2026-04-09 16:29:11 +02:00
ZdenekSrotyr
8df8183a9f feat: add 50 MB upload size limit to session and artifact endpoints
Rejects files exceeding MAX_UPLOAD_SIZE with HTTP 413 before writing to disk.
2026-04-09 07:14:16 +02:00
ZdenekSrotyr
c56511f69f refactor: replace local _get_data_dir() with shared app.utils.get_data_dir()
Replace copy-pasted _get_data_dir() functions in catalog.py and upload.py
with import from app.utils.get_data_dir(). sync.py and data.py already use
the shared utility.
2026-04-09 07:05:50 +02:00
ZdenekSrotyr
53a9e838f9 feat: add graceful shutdown handler
- Add close_system_db() function in src/db.py to cleanly close shared DB connection
- Add lifespan context manager in app/main.py to trigger shutdown on app exit
- Integrate lifespan into FastAPI app initialization
- All API tests pass (77/77)
2026-04-09 07:03:45 +02:00
ZdenekSrotyr
1b3acce7e9 fix: replace substring table access check with word-boundary regex
Replace substring matching with word-boundary regex in query endpoint's
table access validation. Prevents false positives where short table names
like 'id' would block any query containing the word. Uses re.escape() to
safely handle special characters in table names.

- Import re module at top
- Use regex pattern with word boundaries (\b) for matching
- Add tests to verify no false positives and proper blocking
2026-04-09 07:00:48 +02:00
ZdenekSrotyr
1b219cabe9 fix: remove dead PRAGMA enable_wal code
DuckDB has used WAL by default since v0.8, so this pragma is not
valid DuckDB syntax. Removed obsolete try-except block that attempted
to enable WAL on system database initialization.
2026-04-09 06:59:57 +02:00
ZdenekSrotyr
535b5fb1bf security: strip VIRTUAL_ENV/PYTHONPATH from script sandbox and block httpx
Replace inherited env vars with a minimal env dict (PATH, DATA_DIR, HOME only),
omitting VIRTUAL_ENV and PYTHONPATH to prevent subprocess access to installed
packages. Switch subprocess invocation to sys.executable so the correct
interpreter is used with the restricted PATH. Add httpx to blocked_patterns
and BLOCKED_MODULES. Add test_sandbox_cannot_import_httpx to test_security.py.
2026-04-09 06:58:26 +02:00
ZdenekSrotyr
3321d2e266 security: reduce JWT expiry to 24h and add jti claim
Tokens previously lasted 30 days with no revocation path. Expiry is now
24 hours and every token carries a unique jti (UUID hex) to support future
revocation checks.
2026-04-09 06:57:23 +02:00
ZdenekSrotyr
23ae6a602c security: harden query endpoint SQL blocklist and disable external access
Expand blocked keywords to cover parquet_scan, read_csv_auto, query_table,
iceberg_scan, delta_scan, call, URL schemes (http/https/s3/gcs), and
additional file-scan functions. Set enable_external_access=false on the
non-read-only analytics connection path. Add three new tests covering
parquet_scan, read_csv_auto, and query_table blocking.
2026-04-09 06:54:58 +02:00
ZdenekSrotyr
4aa97c23d2 fix: raise RuntimeError on missing JWT_SECRET_KEY in non-test environments
Prevents production deployments from silently using a hardcoded default
secret. TESTING=1 still resolves to a built-in test key so the existing
test suite is unaffected. Adds a test that verifies the RuntimeError is
raised when neither JWT_SECRET_KEY nor TESTING is set.
2026-04-09 06:54:29 +02:00
ZdenekSrotyr
94c6b0f839 fix: require password verification when user has password_hash in /auth/token
Previously the password check was gated on both user.password_hash and
request.password being truthy, so an attacker could omit the password
field (which defaults to "") and receive a valid JWT. Now any user with a
stored hash must supply a non-empty password that passes argon2 verification.

Adds six TestTokenEndpoint tests covering empty, missing, wrong, and correct
password, plus no-hash user and unknown user cases.
2026-04-09 06:44:31 +02:00
ZdenekSrotyr
92fbb88c15 chore: Docker prod config (Python 3.13, no reload), fix utcnow deprecation, update docs 2026-04-08 12:10:47 +02:00
ZdenekSrotyr
05a1b452e9 security: harden query (read-only DB), uploads (path sanitization), scripts (AST validation) 2026-04-08 12:09:19 +02:00
ZdenekSrotyr
224635b88d security: fix auth (argon2, cookie, JWT), CORS, session middleware, pyproject.toml 2026-04-08 12:08:52 +02:00
ZdenekSrotyr
d5659d7091 fix: login page uses login_buttons format expected by template 2026-04-08 07:11:03 +02:00
ZdenekSrotyr
67a1e0bb45 feat: Jira webhook FastAPI adapter — replaces Flask Blueprint 2026-04-08 07:04:50 +02:00
ZdenekSrotyr
3e3f84a00e feat: dynamic login providers + profiler auto-trigger + refresh endpoint 2026-04-08 07:04:40 +02:00
ZdenekSrotyr
2b7348a773 fix: sync only extracts local tables, skips remote
Was using list_by_source() which returns all tables including remote.
Now uses list_local() to skip query_mode='remote' tables.
2026-03-31 15:35:49 +02:00
ZdenekSrotyr
8f3a342108 fix: sync logs via stderr for docker compose visibility 2026-03-31 14:05:01 +02:00
ZdenekSrotyr
7612385ed6 fix: extractor subprocess reads table configs via stdin, not DuckDB
Subprocess cannot open system.duckdb (main process holds lock).
Now main process reads table_registry and passes configs as JSON
via stdin to subprocess. Subprocess never touches system.duckdb.
2026-03-31 13:57:02 +02:00
ZdenekSrotyr
4d1acd014a refactor: remove legacy webapp + add missing tests + housekeeping
Phase A: Close fixed issues (#7, #8, #9), add server/ user/ to
.gitignore, increase extractor timeout to 30 min.

Phase B: Add 10 new tests — access request lifecycle (4), CLI admin
commands (5), sync subprocess trigger (1). 578 tests passing.

Phase C: Delete entire webapp/ directory (24,800 lines) — legacy Flask
app fully replaced by FastAPI app/. Fix auth providers to use
app.instance_config instead of webapp.config. Update CLAUDE.md.

Delete 6 webapp-only test files. Fix Jira service config imports.
2026-03-31 13:44:06 +02:00
ZdenekSrotyr
2d6a94fb6f fix: DuckDB concurrency — WAL mode, subprocess sync, temp+rename
Three-pronged fix for DuckDB lock conflicts:

1. WAL mode on system.duckdb — enables concurrent readers + writer
2. Sync trigger runs extractor as subprocess (not background task) —
   separate process = separate DuckDB connections, no lock conflict
3. Both extractor and orchestrator write to .tmp then atomic rename —
   avoids lock conflict with API reads on extract.duckdb/analytics.duckdb

Fixes #9 permanently.
2026-03-31 13:19:57 +02:00
ZdenekSrotyr
2e7d5d1fe9 feat: access request UI — catalog badges, request modal, admin approval page
Backend:
- access_requests table in DuckDB schema
- AccessRequestRepository with create/approve/deny/list
- API: POST/GET /api/access-requests (submit, my requests, pending, approve, deny)

UI:
- Catalog: lock icon on private tables, "Request Access" button + modal
- Catalog: "Pending" badge for tables with pending requests
- Admin permissions page (/admin/permissions): approve/deny requests,
  grant/revoke permissions, view all user permissions
- Cross-navigation between admin/tables and admin/permissions

733 tests passing.
2026-03-31 12:45:29 +02:00
ZdenekSrotyr
1074d5ec49 feat: implement data access control — table-level permissions
Schema v3: add is_public column to table_registry (default true).

src/rbac.py: can_access_table() checks admin bypass, public flag,
explicit permissions, wildcard bucket permissions.

API enforcement:
- manifest: filters tables by user access
- download: 403 if no access
- catalog: filters table list
- query: validates referenced tables against allowed list

New admin permissions API (/api/admin/permissions) for grant/revoke.

28 access control tests + 733 total tests passing.
2026-03-31 12:33:31 +02:00
ZdenekSrotyr
78f003f5b5 fix: reject empty table name in register-table endpoint
Fixes #8 — empty name created orphaned record that couldn't be deleted.
2026-03-31 12:18:58 +02:00
ZdenekSrotyr
e1e2d6d903 feat: add SEED_ADMIN_EMAIL for Docker test environments
app/main.py: seed admin user on startup when SEED_ADMIN_EMAIL is set.
docker-compose.test.yml: expose port 8000, add seed env var.
2026-03-31 09:48:12 +02:00
ZdenekSrotyr
617e724d21 feat: add E2E test suite — API, extractor, Docker
tests/conftest.py: shared fixtures (e2e_env, seeded_app, create_mock_extract)
tests/test_e2e_api.py: 11 tests — full sync flow, RBAC, table lifecycle
tests/test_e2e_extract.py: 6 tests — Keboola/BQ/Jira pipelines, multi-source, corrupt handling
tests/test_e2e_docker.py: 3 tests — Docker health + full flow (opt-in via -m docker)

Fix admin update route (duplicate id kwarg, .dict() → .model_dump()).

705 tests passing.
2026-03-31 08:18:54 +02:00
ZdenekSrotyr
caa60a507d feat: add centralized RBAC module — replace Linux group auth
New src/rbac.py: Role enum, hierarchy, get_user_role(), has_role(),
is_admin(), is_km_admin(), has_dataset_access(), set_user_role().

webapp/auth.py: admin_required + km_admin_required now use DuckDB
roles instead of Linux groups (pwd.getpwnam + sudo/data-ops check).

app/auth/dependencies.py: imports Role from src/rbac.py (single source).

11 RBAC tests passing.
2026-03-31 08:04:35 +02:00
ZdenekSrotyr
b502bd8bdd refactor: delete old sync pipeline — 9,500 lines removed
Phase 5 cleanup: remove all code replaced by extract.duckdb architecture.

Deleted modules:
- src/config.py (653) — replaced by DuckDB table_registry
- src/parquet_manager.py (755) — replaced by DuckDB COPY TO
- src/data_sync.py (734) — replaced by SyncOrchestrator
- src/remote_query.py (636) — replaced by DuckDB BigQuery ATTACH
- src/table_registry.py (464) — replaced by DuckDB repository
- connectors/keboola/adapter.py (820) — replaced by extractor.py
- connectors/bigquery/adapter.py (665) — replaced by extractor.py
- connectors/bigquery/client.py (644) — replaced by DuckDB BQ extension

Updated all imports in webapp, catalog_export, enricher, router,
sync_settings_service, generate_sample_data. Kept keboola/client.py
as fallback (removed src.config dependency).

704 tests passing.
2026-03-31 07:50:37 +02:00
ZdenekSrotyr
1bf97c725c feat: wire orchestrator into API — replace DataSyncManager
sync.py: _run_sync() now calls extractor + SyncOrchestrator.rebuild()
data.py: parquet lookup searches /data/extracts/ first, legacy fallback
catalog.py: list tables from DuckDB table_registry instead of src.config
admin.py: discover-tables uses KeboolaClient directly, remove old TableRegistry dep
2026-03-30 20:16:33 +02:00
ZdenekSrotyr
18e5f0b6e8 feat: implement extract.duckdb contract — orchestrator + extractors
Phase 0: extend table_registry schema (v1→v2 migration), add
source_type/bucket/source_table/query_mode columns.

Phase 1: SyncOrchestrator ATTACHes extract.duckdb files into master
analytics.duckdb. Keboola extractor uses DuckDB extension with
legacy client fallback. BigQuery extractor is remote-only via
DuckDB BQ extension (no data download).

62 tests passing.
2026-03-30 20:12:56 +02:00
ZdenekSrotyr
7b0a161d3d fix: handle timezone-naive timestamps in health check 2026-03-30 14:19:40 +02:00
ZdenekSrotyr
bca5e91826 feat: add bootstrap endpoint + deploy skill for AI agents
- POST /auth/bootstrap — creates first admin, self-deactivates after
- da setup bootstrap — CLI command for agent-driven setup
- da setup verify — structured health check (JSON output for agents)
- cli/skills/deploy.md — complete deployment guide for AI agents
- 6 bootstrap tests including full agent deployment flow simulation
- 156 total tests passing
2026-03-30 14:01:01 +02:00
ZdenekSrotyr
0b91d4ac47 feat: complete web UI + auth providers + template compatibility
All 7 web pages rendering (200):
  /login, /dashboard, /catalog, /corporate-memory,
  /corporate-memory/admin, /activity-center, /admin/tables

All 13 API endpoints working (200):
  health, sync, data, query, users, memory, scripts,
  settings, telegram, admin, catalog

Auth providers: Google OAuth, Password (argon2), Email magic link
Cookie-based JWT auth for web UI after OAuth redirect
FlexDict for Flask→FastAPI template compatibility
150 tests passing
2026-03-27 17:34:39 +01:00
ZdenekSrotyr
1a7939c594 feat: add auth providers (Google OAuth, Password, Email magic link) + web UI fixes
- Google OAuth with authlib + auto user creation + cookie-based JWT
- Password auth with argon2 hash + setup token flow
- Email magic link with SMTP/SendGrid support
- Cookie-based auth for web UI (after OAuth redirect)
- Dashboard template compatibility (user_info, activity, desktop status)
- 150 tests passing
2026-03-27 17:07:59 +01:00
ZdenekSrotyr
fb1e60d8e1 fix: fix TemplateResponse API for Starlette compatibility
Use new TemplateResponse(request, name, context) signature.
Add Flask compat shims (get_flashed_messages, url_for, session).
2026-03-27 16:59:04 +01:00
ZdenekSrotyr
1287e63ed9 feat: complete system — web UI, all API endpoints, governance, admin, CLI commands
Major additions:
- Web UI: Jinja2 templates in FastAPI (login, dashboard, catalog, corporate memory, admin)
- API: catalog profiles/metrics, telegram verify/unlink/status, admin table registry CRUD
- Corporate memory governance: approve/reject/mandate/revoke/edit/batch + audit log
- Sync: real DataSyncManager trigger, sync-settings, table-subscriptions
- CLI: setup (init/test/deploy/verify), server (logs/restart/deploy/backup), explore
- Instance config integration (instance.yaml loaded at startup)
- 140 tests passing (25 new)
2026-03-27 16:52:22 +01:00
ZdenekSrotyr
c5527ec153 fix: harden script sandbox and SQL query security
Fixes found by E2E QA agent:
- Script sandbox: block os, sys, socket, eval, exec, open, __import__,
  getattr, pathlib and 20+ other dangerous patterns
- SQL query: block COPY, ATTACH, read_csv, semicolons, non-SELECT
- Added 24 security tests covering all attack vectors
2026-03-27 16:11:05 +01:00
ZdenekSrotyr
e0ce91ddb9 feat: add dataset permissions, script execution, Kamal config, CI/CD
- SyncSettingsRepository + DatasetPermissionRepository with RBAC
- Script deploy/run/undeploy API with import sandboxing
- User sync settings API with permission checks
- 4 CLI skills (connectors, security, notifications, corporate-memory)
- Kamal production + staging configs
- GitHub Actions CI + deploy workflows
- 91 total tests passing
2026-03-27 15:40:11 +01:00
ZdenekSrotyr
a3918d3833 feat: add FastAPI server with auth, RBAC, and all API endpoints
- JWT auth with role-based access control (viewer/analyst/admin/km_admin)
- Endpoints: health, sync manifest, data download, query, users CRUD,
  corporate memory, session/artifact upload
- 18 API tests covering auth, RBAC, all endpoints
2026-03-27 15:19:18 +01:00