Commit graph

244 commits

Author SHA1 Message Date
ZdenekSrotyr
c8e232e43e docs: update stale v1 docs to v2 Docker/FastAPI/DuckDB architecture
- CONFIGURATION.md: remove Flask/SendGrid/WEBAPP_SECRET_KEY references,
  update env vars to JWT_SECRET_KEY and SESSION_SECRET, point to
  config/.env.template and config/instance.yaml.example
- disaster-recovery.md: rewrite for Docker volumes; cover GCP disk
  snapshot backup/restore and full VM rebuild; drop systemd/nginx/SSH
- server.md: strip rsync, systemd, nginx, Linux group, and sudo
  sections; keep Docker Compose operations, log viewing, health checks,
  sync/admin CLI, and Jira webhook procedures
2026-04-09 18:44:25 +02:00
ZdenekSrotyr
7d036760f5 fix: wrap Google OAuth DB connection in try/finally to ensure it is always closed
The system DB connection opened in google_callback is now closed in a
finally block, so it is released even when an exception occurs between
open and close.
2026-04-09 18:42:56 +02:00
ZdenekSrotyr
471982d3f9 fix: route admin_edit through KnowledgeRepository.update instead of raw SQL 2026-04-09 18:42:52 +02:00
ZdenekSrotyr
7e0cb80ed2 fix: move argon2 imports to top-level and catch VerifyMismatchError specifically
PasswordHasher and VerifyMismatchError are now imported at module level in
router.py and providers/password.py. Wrong-password errors are caught as
VerifyMismatchError (401); unexpected errors fall through to a 500 with logging.
2026-04-09 18:42:51 +02:00
ZdenekSrotyr
f6d2d1487f fix: remove duplicate Path alias in upload.py, replace _Path with Path 2026-04-09 18:42:48 +02:00
ZdenekSrotyr
5fe177c309 feat: add audit logging for authentication events
Log token_created, login_failed, and bootstrap_completed events via
AuditRepository. Extracts a shared _audit() helper that swallows
errors so audit failures never block auth. Also tightens password
verification to catch VerifyMismatchError specifically and log
unexpected errors at 500 rather than silently swallowing them.
2026-04-09 18:42:38 +02:00
ZdenekSrotyr
49cb940729 fix: strip HTML from table and column descriptions in OpenMetadata enricher
Imports strip_html from transformer and applies it to both table-level
and column-level descriptions parsed from the OpenMetadata API response.
2026-04-09 18:42:37 +02:00
ZdenekSrotyr
30987eef16 fix: add union_by_name=true to read_parquet calls in profiler
Handles schema evolution across partitions when profiling tables
with multiple parquet files that may have different column sets.
2026-04-09 18:42:33 +02:00
ZdenekSrotyr
5e0e4ceb9e fix: rewrite Makefile and scripts/README.md
Makefile simplified to four targets (test, dev, docker, lint) aligned
with the current FastAPI/Docker architecture. scripts/README.md rewritten
to list only the active and migration scripts that still exist.
2026-04-09 17:16:04 +02:00
ZdenekSrotyr
22cfbfe5fb docs: update references to deleted files
- QUICKSTART.md: replace data_description.md.example copy step with
  note that tables are registered via the admin API or web UI
- NOTIFICATIONS.md: replace examples/ section with planned-feature note
- telegram_bot.md: remove examples/notifications/ rows from deployment
  table and example scripts section; note feature is planned
- dev_docs/README.md: remove plan-corporate-memory.md entry
- duckdb_manager.py: update comment from remote_query.py to query API endpoint
2026-04-09 17:15:19 +02:00
ZdenekSrotyr
f3bd378b47 chore: remove 17 dead files from v1 architecture
Removes unused scripts (collect_session, generate_user_sync_configs,
standalone_profiler, remote_query, update, setup_views, test_sync,
activate_venv, backfill_gap, sync_config_template), legacy config
(data_description.md.example), llms.txt, completed planning docs
(plan-rsync-fix, plan_parquet_types_fix, plan-corporate-memory), and
notification examples/ directory.
2026-04-09 17:14:06 +02:00
ZdenekSrotyr
988cdb4320 docs: add production deployment sections to DEPLOYMENT.md
Add GHCR pre-built images, HTTPS/Caddy, multi-instance (Terraform + manual), and instance update procedures.
2026-04-09 16:41:26 +02:00
ZdenekSrotyr
fa30298589 fix: use DATA_DIR env var instead of hardcoded /data paths
- services/telegram_bot/config.py: NOTIFICATIONS_DIR now uses DATA_DIR fallback
- src/profiler.py: DATA_DIR now uses main DATA_DIR env var instead of PROFILER_DATA_DIR
- services/telegram_bot/dispatch.py: WS_GATEWAY_SOCKET_PATH now uses WS_GATEWAY_SOCKET env var
2026-04-09 16:39:44 +02:00
ZdenekSrotyr
d814eaa503 feat: add Caddy HTTPS reverse proxy and production compose override 2026-04-09 16:39:23 +02:00
ZdenekSrotyr
622c5005aa feat(infra): add GCS remote backend and instance.yaml generation to startup script 2026-04-09 16:39:07 +02:00
ZdenekSrotyr
510e1a8178 fix: add restart policy and config mount to app, scheduler, extract services 2026-04-09 16:38:58 +02:00
ZdenekSrotyr
816f217d8e feat: add commit SHA tag to Docker image push for rollback capability 2026-04-09 16:38:38 +02:00
ZdenekSrotyr
69e029fb53 docs: expand .env.template with all ~20 env vars, organize by section 2026-04-09 16:38:26 +02:00
ZdenekSrotyr
cf59abe6dd fix: update tests to provide password after OAuth token bypass fix 2026-04-09 16:35:15 +02:00
ZdenekSrotyr
7f523788c2 fix: correct YAML path for instance name and subtitle
get_instance_name and get_instance_subtitle now look up the nested
instance.name and instance.subtitle keys to match the YAML structure.
2026-04-09 16:31:56 +02:00
ZdenekSrotyr
7bada9f32b fix: force secure cookie in production, reduce max_age to 1 day
Use TESTING env var to detect production instead of relying on
request scheme, and align cookie max_age with JWT expiry (86400s).
2026-04-09 16:31:50 +02:00
ZdenekSrotyr
c55dd02196 fix: stop leaking server file paths in upload responses
Return filename instead of full server-side path in upload_session
and upload_artifact responses.
2026-04-09 16:31:46 +02:00
ZdenekSrotyr
2043594670 fix: restrict script execution endpoints to analyst/admin roles
deploy, run, and run-deployed require analyst; undeploy requires admin.
Update test to use admin token for undeploy.
2026-04-09 16:31:42 +02:00
ZdenekSrotyr
449053bf8a fix: enforce per-table access control on catalog profile endpoints
Add can_access_table check to GET /api/catalog/profile/{table_name} and
POST /api/catalog/profile/{table_name}/refresh, returning 403 for
unauthorized tables. Update test_api_complete to cover new 403 behaviour
and fix the existing 404 test to use admin token.
2026-04-09 16:30:24 +02:00
ZdenekSrotyr
ad6b3a96e4 fix: enforce role guards on admin web pages
Add require_role(Role.ADMIN) to /admin/tables and /admin/permissions,
and require_role(Role.KM_ADMIN) to /corporate-memory/admin so that
non-admin users receive 403 instead of being served the page.

Fix admin_cookie test fixture to supply a password_hash (required since
the /auth/token endpoint blocks passwordless requests). Add analyst
fixture and TestAdminRoleGuards tests verifying analysts get 403 and
admins get 200 on the protected routes.
2026-04-09 16:30:13 +02:00
ZdenekSrotyr
8e7913b93a fix: anchor auth/ gitignore rule to repo root only 2026-04-09 16:29:53 +02:00
ZdenekSrotyr
3205a8d300 fix: block /auth/token for OAuth-only users without password_hash
Users without a password_hash (Google OAuth / magic-link accounts) could
obtain a JWT by simply posting their email to /auth/token. Add an else
clause that rejects such requests with 401, directing them to their
configured auth provider. Update and extend tests accordingly.
2026-04-09 16:29:47 +02:00
ZdenekSrotyr
55515266ea fix: block DuckDB metadata functions and relative paths in query endpoint
Add information_schema, duckdb_* introspection functions, pragma_* functions,
and relative path traversal patterns to the SQL blocklist so users cannot
enumerate schema metadata regardless of RBAC. Add six corresponding tests.
2026-04-09 16:29:11 +02:00
ZdenekSrotyr
86fe4b411d fix: upgrade urllib3 1.26→2.6.3 — resolves all 4 Dependabot security alerts
Removed kbcstorage from all dependency groups (optional + dev) so urllib3
is no longer pinned to <2.0. Legacy Keboola client is available via
manual install: pip install kbcstorage
2026-04-09 14:53:30 +02:00
ZdenekSrotyr
809448e02b fix: move kbcstorage to optional dep — unblocks urllib3 security updates
kbcstorage pins urllib3<2.0.0 which blocks Dependabot security patches.
Moved to [project.optional-dependencies] keboola-legacy since the primary
extraction path uses the DuckDB Keboola extension, not kbcstorage.
Legacy fallback uses lazy import — app works without it installed.
2026-04-09 14:46:50 +02:00
ZdenekSrotyr
22b4d830e5 chore: upgrade docker actions to Node.js 24 (login-action@v4, build-push-action@v7) 2026-04-09 14:22:11 +02:00
ZdenekSrotyr
6ebfc15010 fix: setup-uv@v7 (v8 major tag doesn't exist yet) 2026-04-09 14:19:32 +02:00
ZdenekSrotyr
1ebf50bd78 fix: upgrade setup-uv@v5 → v8 (Node.js 24 native), add uv.lock
- setup-uv@v8 runs on Node.js 24 natively — no more deprecation warnings
- Removed FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 workaround (no longer needed)
- Added uv.lock for reproducible dependency resolution
2026-04-09 14:16:55 +02:00
ZdenekSrotyr
554ba0d9f2 fix: remove Kamal deploy job (no server configured), force Node.js 24 in CI
- Removed deploy-production job — Kamal config has placeholder IPs, no secrets
- Renamed workflow to "Build & Push" — test + Docker image to GHCR
- Added FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true to suppress Node.js 20 warnings
2026-04-09 14:10:37 +02:00
ZdenekSrotyr
0279cc06fa refactor: consolidate deps into pyproject.toml, remove requirements.txt
- All dependencies now in pyproject.toml [project.dependencies]
- Dev/test deps in [project.optional-dependencies] dev and [tool.uv]
- Dockerfile uses uv pip install . from pyproject.toml
- CI uses uv pip install ".[dev]"
- Deleted requirements.txt and requirements-dev.txt
- Updated README, CLAUDE.md install instructions
- Enhanced .dockerignore (exclude tests, docs, infra from image)
2026-04-09 13:17:59 +02:00
ZdenekSrotyr
fa3aef652f chore: update GitHub Actions to Node.js 24 compatible versions (checkout@v5, setup-python@v6, setup-uv@v5) 2026-04-09 12:48:14 +02:00
ZdenekSrotyr
eb7c548025 fix: add anthropic + openai to dev deps (needed by test_llm_connector) 2026-04-09 09:13:38 +02:00
ZdenekSrotyr
f9fae6e895 fix: CI installs requirements-dev.txt (faker needed for tests), set TESTING=1 2026-04-09 09:10:29 +02:00
ZdenekSrotyr
53f39bb38d chore: clean stale docs — rewrite architecture.md, remove old plans
- architecture.md rewritten for v2 (FastAPI, DuckDB, Docker) — removed
  all Flask/rsync/SSH/systemd references
- Deleted PLAN.md and REFACTORING_PLAN.md (completed, superseded)
- auto-install.md replaced with redirect to DEPLOYMENT.md
- Fixed absolute paths in superpowers plan doc
2026-04-09 09:06:13 +02:00
ZdenekSrotyr
afa84f6585 fix: web UI smoke tests — reset DuckDB singleton, get token via API 2026-04-09 07:18:17 +02:00
ZdenekSrotyr
5131816a5b test: add missing coverage for web UI, Jira extract, instance config, and concurrent rebuild
- tests/test_web_ui.py: smoke tests for all authenticated web pages (login, dashboard, catalog, corporate-memory, activity-center, admin/tables, admin/permissions)
- tests/test_jira_service.py: unit tests for extract_init and update_meta in the Jira connector
- tests/test_instance_config.py: verifies get_instance_name() returns a string when config file is absent
- tests/test_orchestrator.py: concurrent rebuild test asserting rebuild succeeds while a read-only connection holds the analytics DB
2026-04-09 07:15:14 +02:00
ZdenekSrotyr
8df8183a9f feat: add 50 MB upload size limit to session and artifact endpoints
Rejects files exceeding MAX_UPLOAD_SIZE with HTTP 413 before writing to disk.
2026-04-09 07:14:16 +02:00
ZdenekSrotyr
c20da6d744 Remove dead Flask Blueprint from Jira connector
The Flask-based webhook endpoint at connectors/jira/webhook.py is no longer used.
FastAPI handles webhooks via app/api/jira_webhooks.py which is already integrated
into the application. This removes the redundant Flask code.
2026-04-09 07:13:20 +02:00
ZdenekSrotyr
fbbb866879 Move faker to dev dependencies 2026-04-09 07:13:20 +02:00
ZdenekSrotyr
cb557baf36 chore: update .env.template to match actual code 2026-04-09 07:13:01 +02:00
ZdenekSrotyr
8e9a0c367a fix: replace os.environ direct assignment with monkeypatch.setenv in test fixtures
Prevents environment variable leaking between tests. All DATA_DIR,
JWT_SECRET_KEY, and SCRIPT_TIMEOUT assignments in fixtures now use
monkeypatch.setenv() which auto-reverts after each test. Removes
manual os.environ.pop() cleanup lines.
2026-04-09 07:11:36 +02:00
ZdenekSrotyr
c56511f69f refactor: replace local _get_data_dir() with shared app.utils.get_data_dir()
Replace copy-pasted _get_data_dir() functions in catalog.py and upload.py
with import from app.utils.get_data_dir(). sync.py and data.py already use
the shared utility.
2026-04-09 07:05:50 +02:00
ZdenekSrotyr
53a9e838f9 feat: add graceful shutdown handler
- Add close_system_db() function in src/db.py to cleanly close shared DB connection
- Add lifespan context manager in app/main.py to trigger shutdown on app exit
- Integrate lifespan into FastAPI app initialization
- All API tests pass (77/77)
2026-04-09 07:03:45 +02:00
ZdenekSrotyr
d6458a5b53 feat: add pytest timeout and strict markers
- Add --timeout=60 to pytest.ini to catch hanging tests
- Add --strict-markers to enforce declared markers only
- Install pytest-timeout>=2.0.0 dependency
- All 627 tests pass within 60s timeout
2026-04-09 07:03:44 +02:00
ZdenekSrotyr
9e5066cf1d feat: replace Docker healthcheck with curl 2026-04-09 07:03:12 +02:00