ZdenekSrotyr
4d1acd014a
refactor: remove legacy webapp + add missing tests + housekeeping
...
Phase A: Close fixed issues (#7, #8, #9), add server/ and user/ to
.gitignore, increase extractor timeout to 30 min.
Phase B: Add 10 new tests — access request lifecycle (4), CLI admin
commands (5), sync subprocess trigger (1). 578 tests passing.
Phase C: Delete entire webapp/ directory (24,800 lines) — legacy Flask
app fully replaced by FastAPI app/. Fix auth providers to use
app.instance_config instead of webapp.config. Update CLAUDE.md.
Delete 6 webapp-only test files. Fix Jira service config imports.
2026-03-31 13:44:06 +02:00
ZdenekSrotyr
6aee6cf454
fix: CLI sync downloads tables with empty hash (not yet computed)
...
Empty server hash was matching empty local hash, skipping all tables.
Now treats empty hash or missing local entry as 'needs download'.
2026-03-31 13:30:10 +02:00
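
The comparison described in 6aee6cf454 could look roughly like the sketch below. The function name and signature are hypothetical; the commit only states that an empty hash or a missing local entry now counts as "needs download".

```python
from typing import Optional


def needs_download(server_hash: str, local_hash: Optional[str]) -> bool:
    """Return True when a table should be (re)downloaded."""
    # An empty server hash means "not yet computed", so it cannot prove
    # that the local copy is current.
    if not server_hash:
        return True
    # A missing or empty local entry means the table was never downloaded.
    if not local_hash:
        return True
    # Otherwise download only when the content actually changed.
    return server_hash != local_hash
```

With this check, the old failure case (empty server hash equal to empty local hash) no longer skips the table.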
ZdenekSrotyr
2d6a94fb6f
fix: DuckDB concurrency — WAL mode, subprocess sync, temp+rename
...
Three-pronged fix for DuckDB lock conflicts:
1. WAL mode on system.duckdb — enables concurrent readers + writer
2. Sync trigger runs extractor as subprocess (not background task) —
separate process = separate DuckDB connections, no lock conflict
3. Both extractor and orchestrator write to .tmp then atomic rename —
avoids lock conflict with API reads on extract.duckdb/analytics.duckdb
Fixes #9 permanently.
2026-03-31 13:19:57 +02:00
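
A minimal sketch of the temp+rename step (item 3 in 2d6a94fb6f), assuming the writer can be expressed as a callback that fills a file at a given path. The helper name `write_atomically` is hypothetical, and plain bytes stand in for a DuckDB file so the example runs without DuckDB installed.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def write_atomically(target: Path, build: Callable[[Path], None]) -> None:
    """Build a file at '<target>.tmp', then swap it into place in one step."""
    tmp = target.with_name(target.name + ".tmp")
    if tmp.exists():
        tmp.unlink()  # discard leftovers from an interrupted run
    build(tmp)
    # os.replace overwrites the destination; on POSIX the rename is atomic
    # when both paths are on the same filesystem.
    os.replace(tmp, target)


# Demo: two successive "extractions" into the same target file.
with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "extract.duckdb"
    write_atomically(target, lambda p: p.write_bytes(b"v1"))
    write_atomically(target, lambda p: p.write_bytes(b"v2"))
    final_bytes = target.read_bytes()
    leftover_tmp = target.with_name("extract.duckdb.tmp").exists()
```

Readers of the target path see either the old file or the new one, never a partially written file.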
ZdenekSrotyr
10d9280ab5
fix: extractor writes to temp file to avoid lock with orchestrator
...
Writes extract.duckdb.tmp then renames atomically, avoiding DuckDB lock
conflict when orchestrator holds a read connection on extract.duckdb.
2026-03-31 13:09:51 +02:00
ZdenekSrotyr
675a29c1c7
fix: DuckDB connection pool — shared connection avoids lock conflicts
...
Fixes #9 — background sync tasks could not access system.duckdb
because FastAPI held an exclusive lock. Now uses single shared
connection per DATA_DIR with cursor() for thread safety.
2026-03-31 13:01:04 +02:00
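
The "single shared connection per DATA_DIR with cursor()" idea from 675a29c1c7 might be sketched as follows. The class name and the injected `connect` callable are assumptions made so the example runs without DuckDB; in the real code `connect` would presumably be `duckdb.connect`, whose connection objects provide `cursor()`.

```python
import threading
from typing import Any, Callable, Dict


class SharedConnections:
    """Keep one connection per data directory and hand out cursors from it."""

    def __init__(self, connect: Callable[[str], Any]) -> None:
        self._connect = connect
        self._lock = threading.Lock()
        self._by_dir: Dict[str, Any] = {}

    def cursor(self, data_dir: str) -> Any:
        # Open the database file at most once per directory, even when
        # several threads ask at the same time.
        with self._lock:
            conn = self._by_dir.get(data_dir)
            if conn is None:
                conn = self._connect(f"{data_dir}/system.duckdb")
                self._by_dir[data_dir] = conn
        # Each caller works on its own cursor of the shared connection.
        return conn.cursor()
```

Usage would be along the lines of `pool = SharedConnections(duckdb.connect)` followed by `pool.cursor(data_dir)` wherever a handle is needed.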
ZdenekSrotyr
04fa1402e4
feat: CLI admin commands — register-table, discover-and-register, list-tables
...
da admin register-table: register single table
da admin discover-and-register: auto-discover from Keboola API + bulk register
da admin list-tables: show all registered tables
Used to register all 142 Keboola tables on production.
2026-03-31 12:55:03 +02:00
ZdenekSrotyr
2e7d5d1fe9
feat: access request UI — catalog badges, request modal, admin approval page
...
Backend:
- access_requests table in DuckDB schema
- AccessRequestRepository with create/approve/deny/list
- API: POST/GET /api/access-requests (submit, my requests, pending, approve, deny)
UI:
- Catalog: lock icon on private tables, "Request Access" button + modal
- Catalog: "Pending" badge for tables with pending requests
- Admin permissions page (/admin/permissions): approve/deny requests,
grant/revoke permissions, view all user permissions
- Cross-navigation between admin/tables and admin/permissions
733 tests passing.
2026-03-31 12:45:29 +02:00
ZdenekSrotyr
1074d5ec49
feat: implement data access control — table-level permissions
...
Schema v3: add is_public column to table_registry (default true).
src/rbac.py: can_access_table() checks admin bypass, public flag,
explicit permissions, wildcard bucket permissions.
API enforcement:
- manifest: filters tables by user access
- download: 403 if no access
- catalog: filters table list
- query: validates referenced tables against allowed list
New admin permissions API (/api/admin/permissions) for grant/revoke.
28 access control tests + 733 total tests passing.
2026-03-31 12:33:31 +02:00
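
The check order listed for can_access_table() in 1074d5ec49 (admin bypass, public flag, explicit permissions, wildcard bucket permissions) can be illustrated with the sketch below. The `User` and `Table` dataclasses and the "<bucket>.*" wildcard format are assumptions for illustration; the actual src/rbac.py reads these from DuckDB and may differ.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class User:
    email: str
    is_admin: bool = False
    # Granted table IDs, plus hypothetical "<bucket>.*" wildcard entries.
    grants: Set[str] = field(default_factory=set)


@dataclass
class Table:
    table_id: str
    bucket: str
    is_public: bool = True  # schema v3 default


def can_access_table(user: User, table: Table) -> bool:
    if user.is_admin:                 # 1. admin bypass
        return True
    if table.is_public:               # 2. public flag
        return True
    if table.table_id in user.grants:  # 3. explicit permission
        return True
    return f"{table.bucket}.*" in user.grants  # 4. wildcard bucket permission
```

The API layers named in the commit (manifest, download, catalog, query) would each call this one function, so the policy lives in a single place.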
ZdenekSrotyr
78f003f5b5
fix: reject empty table name in register-table endpoint
...
Fixes #8 — empty name created orphaned record that couldn't be deleted.
2026-03-31 12:18:58 +02:00
ZdenekSrotyr
bd0b6d19c6
fix: legacy extractor constructs full Keboola table ID from bucket+source_table
...
Was using tc['id'] which is the registry ID (e.g. 'circle'), not the
full Keboola ID (e.g. 'in.c-finance.circle') needed by the API.
2026-03-31 12:06:38 +02:00
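
For illustration, the ID construction in bd0b6d19c6 probably amounts to something like the sketch below. The helper name and the dict keys `bucket` and `source_table` are assumptions based on the registry columns named in the commit subject; the "." separator is inferred from the example ID in the message.

```python
def full_keboola_table_id(table_config: dict) -> str:
    """Build the full Keboola table ID expected by the API."""
    # table_config["id"] is only the short registry ID (e.g. 'circle'),
    # so the full ID is assembled from the bucket and the source table name.
    bucket = table_config["bucket"]
    source_table = table_config["source_table"]
    return f"{bucket}.{source_table}"


tc = {"id": "circle", "bucket": "in.c-finance", "source_table": "circle"}
full_id = full_keboola_table_id(tc)
```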
ZdenekSrotyr
0084f80ff6
fix: legacy extractor passes Path to export_table, not str
...
Fixes 'str' object has no attribute 'parent' when Keboola DuckDB
extension falls back to legacy client.
2026-03-31 12:03:16 +02:00
ZdenekSrotyr
865d6d657e
fix: keboola client metadata_cache_path uses DATA_DIR instead of deleted config
...
Fixes #7 — NameError: name 'config' is not defined
2026-03-31 11:57:57 +02:00
ZdenekSrotyr
04c5aecc58
fix: update Terraform for extract.duckdb architecture
...
- Create /data/extracts instead of /data/src_data/parquet
- Add admin_email variable for SEED_ADMIN_EMAIL
2026-03-31 09:49:32 +02:00
ZdenekSrotyr
e1e2d6d903
feat: add SEED_ADMIN_EMAIL for Docker test environments
...
app/main.py: seed admin user on startup when SEED_ADMIN_EMAIL is set.
docker-compose.test.yml: expose port 8000, add seed env var.
2026-03-31 09:48:12 +02:00
ZdenekSrotyr
617e724d21
feat: add E2E test suite — API, extractor, Docker
...
tests/conftest.py: shared fixtures (e2e_env, seeded_app, create_mock_extract)
tests/test_e2e_api.py: 11 tests — full sync flow, RBAC, table lifecycle
tests/test_e2e_extract.py: 6 tests — Keboola/BQ/Jira pipelines, multi-source, corrupt handling
tests/test_e2e_docker.py: 3 tests — Docker health + full flow (opt-in via -m docker)
Fix admin update route (duplicate id kwarg, .dict() → .model_dump()).
705 tests passing.
2026-03-31 08:18:54 +02:00
ZdenekSrotyr
b0eaef88cc
refactor: delete old server infra — 4,200 lines removed
...
Remove all legacy deployment infrastructure replaced by Docker + Kamal:
- server/ directory (deploy.sh, setup.sh, webapp-setup.sh, sudoers,
nginx config, systemd units, bin scripts)
- scripts/sync_data.sh (replaced by da sync + API)
- All services/*/systemd/ files (replaced by docker-compose)
- tests/test_deploy_guard.py and tests/test_sync_data.py
688 tests passing.
2026-03-31 08:06:41 +02:00
ZdenekSrotyr
caa60a507d
feat: add centralized RBAC module — replace Linux group auth
...
New src/rbac.py: Role enum, hierarchy, get_user_role(), has_role(),
is_admin(), is_km_admin(), has_dataset_access(), set_user_role().
webapp/auth.py: admin_required + km_admin_required now use DuckDB
roles instead of Linux groups (pwd.getpwnam + sudo/data-ops check).
app/auth/dependencies.py: imports Role from src/rbac.py (single source).
11 RBAC tests passing.
2026-03-31 08:04:35 +02:00
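
A minimal sketch of a Role enum with a hierarchy check, in the spirit of caa60a507d. The role names come from the earlier FastAPI commit (viewer/analyst/admin/km_admin); the ranking below is an assumption, since the log does not state how the roles are ordered.

```python
from enum import Enum


class Role(str, Enum):
    VIEWER = "viewer"
    ANALYST = "analyst"
    KM_ADMIN = "km_admin"
    ADMIN = "admin"


# Assumed ordering: a higher rank includes the rights of every lower one.
_RANK = {Role.VIEWER: 0, Role.ANALYST: 1, Role.KM_ADMIN: 2, Role.ADMIN: 3}


def has_role(actual: Role, required: Role) -> bool:
    """Return True when 'actual' is at least as privileged as 'required'."""
    return _RANK[actual] >= _RANK[required]


def is_admin(actual: Role) -> bool:
    return has_role(actual, Role.ADMIN)
```

Keeping the enum and ranking in one module is what lets both webapp/auth.py and app/auth/dependencies.py share a single definition.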
ZdenekSrotyr
9fef90a729
docs: rewrite CLAUDE.md for extract.duckdb architecture
...
Update project structure, architecture diagram, key implementation
details, development commands, and extensibility docs.
Add extract service to docker-compose.yml for one-shot extraction.
2026-03-31 07:52:44 +02:00
ZdenekSrotyr
b502bd8bdd
refactor: delete old sync pipeline — 9,500 lines removed
...
Phase 5 cleanup: remove all code replaced by extract.duckdb architecture.
Deleted modules:
- src/config.py (653) — replaced by DuckDB table_registry
- src/parquet_manager.py (755) — replaced by DuckDB COPY TO
- src/data_sync.py (734) — replaced by SyncOrchestrator
- src/remote_query.py (636) — replaced by DuckDB BigQuery ATTACH
- src/table_registry.py (464) — replaced by DuckDB repository
- connectors/keboola/adapter.py (820) — replaced by extractor.py
- connectors/bigquery/adapter.py (665) — replaced by extractor.py
- connectors/bigquery/client.py (644) — replaced by DuckDB BQ extension
Updated all imports in webapp, catalog_export, enricher, router,
sync_settings_service, generate_sample_data. Kept keboola/client.py
as fallback (removed src.config dependency).
704 tests passing.
2026-03-31 07:50:37 +02:00
ZdenekSrotyr
9f20529f10
fix: resolve 7 preexisting test failures
...
- Remove iCloud duplicate files (test_db 2.py, src/db 2.py)
- Fix metrics expression fallback to top-level field in transformer + webapp
- Fix sync_data.sh rsync exception pattern for $SSH_HOST variable
- Fix deploy_guard cp regex to skip shell variable expansions
- Update sudoers-deploy with missing root:data-ops rules
- Update CRITICAL_DIRS ownership expectations to match deploy.sh reality
913 tests passing, 0 failures.
2026-03-30 20:36:00 +02:00
ZdenekSrotyr
e2a7ee21a2
fix: Jira extract_init handles empty parquet dirs gracefully
...
DuckDB read_parquet glob fails when no files match. Skip view creation
for tables without parquet files, create views only after first write.
2026-03-30 20:28:29 +02:00
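
The guard described in e2a7ee21a2 could be sketched as below: check for matching files before emitting a view over `read_parquet`. The one-subdirectory-per-table layout, the function name, and the table names in the demo are assumptions; only the generated SQL is DuckDB-specific, so the example runs without DuckDB.

```python
import tempfile
from pathlib import Path
from typing import List


def view_statements(data_dir: Path, tables: List[str]) -> List[str]:
    """Return CREATE VIEW statements only for tables that have parquet files."""
    statements = []
    for table in tables:
        # Skip tables with no files yet; a read_parquet glob that matches
        # nothing would fail when the view is created.
        if not any((data_dir / table).glob("*.parquet")):
            continue
        pattern = (data_dir / table / "*.parquet").as_posix()
        statements.append(
            f'CREATE OR REPLACE VIEW "{table}" AS '
            f"SELECT * FROM read_parquet('{pattern}')"
        )
    return statements


# Demo: one table with a file, one still empty.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "issues").mkdir()
    (root / "issues" / "part-0.parquet").write_bytes(b"")
    (root / "comments").mkdir()
    statements = view_statements(root, ["issues", "comments"])
```

Running the same function again after the first write picks up the tables that were skipped earlier.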
ZdenekSrotyr
8bc1fceb52
feat: add migration scripts for extract.duckdb transition
...
migrate_registry_to_duckdb.py: imports tables from data_description.md
or table_registry.json into DuckDB table_registry with source columns.
migrate_parquets_to_extracts.py: copies parquets to /data/extracts/
and creates extract.duckdb with _meta + views.
2026-03-30 20:21:12 +02:00
ZdenekSrotyr
e058c71777
feat: adapt Jira connector to extract.duckdb format
...
- New extract_init.py: creates extract.duckdb with _meta + views for 6 entity types
- Update default paths to /data/extracts/jira/data/ and /data/extracts/jira/raw/
- After parquet writes, update _meta table in extract.duckdb
- Trigger SyncOrchestrator.rebuild_source("jira") after successful transform
2026-03-30 20:19:27 +02:00
ZdenekSrotyr
1bf97c725c
feat: wire orchestrator into API — replace DataSyncManager
...
sync.py: _run_sync() now calls extractor + SyncOrchestrator.rebuild()
data.py: parquet lookup searches /data/extracts/ first, legacy fallback
catalog.py: list tables from DuckDB table_registry instead of src.config
admin.py: discover-tables uses KeboolaClient directly, remove old TableRegistry dep
2026-03-30 20:16:33 +02:00
ZdenekSrotyr
18e5f0b6e8
feat: implement extract.duckdb contract — orchestrator + extractors
...
Phase 0: extend table_registry schema (v1→v2 migration), add
source_type/bucket/source_table/query_mode columns.
Phase 1: SyncOrchestrator ATTACHes extract.duckdb files into master
analytics.duckdb. Keboola extractor uses DuckDB extension with
legacy client fallback. BigQuery extractor is remote-only via
DuckDB BQ extension (no data download).
62 tests passing.
2026-03-30 20:12:56 +02:00
ZdenekSrotyr
0b9720d090
docs: rewrite core refactoring spec v2 — simplified extract.duckdb contract
2026-03-30 19:24:19 +02:00
ZdenekSrotyr
9ee7b3bd09
docs: add core refactoring design spec — DuckDB-centric extract architecture
2026-03-30 18:15:52 +02:00
ZdenekSrotyr
a4944dba4a
feat: auto-generate JWT secret in Terraform, remove manual variable
2026-03-30 16:03:19 +02:00
ZdenekSrotyr
b6a94add67
feat: add Terraform config for GCP deployment
...
- GCE e2-small with Ubuntu 24.04 + Docker
- Static IP, firewall rules, SSD boot disk
- Startup script: installs Docker, clones repo, creates .env, starts compose
- Outputs: IP, SSH command, API URL, bootstrap command, CLI setup
- ~$7/month for always-on server
2026-03-30 15:55:26 +02:00
ZdenekSrotyr
7b0a161d3d
fix: handle timezone-naive timestamps in health check
2026-03-30 14:19:40 +02:00
ZdenekSrotyr
bca5e91826
feat: add bootstrap endpoint + deploy skill for AI agents
...
- POST /auth/bootstrap — creates first admin, self-deactivates after
- da setup bootstrap — CLI command for agent-driven setup
- da setup verify — structured health check (JSON output for agents)
- cli/skills/deploy.md — complete deployment guide for AI agents
- 6 bootstrap tests including full agent deployment flow simulation
- 156 total tests passing
2026-03-30 14:01:01 +02:00
ZdenekSrotyr
a74f69d6b1
chore: exclude CI workflow from push (needs workflow scope)
2026-03-27 17:41:27 +01:00
ZdenekSrotyr
0b91d4ac47
feat: complete web UI + auth providers + template compatibility
...
All 7 web pages rendering (200):
/login, /dashboard, /catalog, /corporate-memory,
/corporate-memory/admin, /activity-center, /admin/tables
All 13 API endpoints working (200):
health, sync, data, query, users, memory, scripts,
settings, telegram, admin, catalog
Auth providers: Google OAuth, Password (argon2), Email magic link
Cookie-based JWT auth for web UI after OAuth redirect
FlexDict for Flask→FastAPI template compatibility
150 tests passing
2026-03-27 17:34:39 +01:00
ZdenekSrotyr
1a7939c594
feat: add auth providers (Google OAuth, Password, Email magic link) + web UI fixes
...
- Google OAuth with authlib + auto user creation + cookie-based JWT
- Password auth with argon2 hash + setup token flow
- Email magic link with SMTP/SendGrid support
- Cookie-based auth for web UI (after OAuth redirect)
- Dashboard template compatibility (user_info, activity, desktop status)
- 150 tests passing
2026-03-27 17:07:59 +01:00
ZdenekSrotyr
fb1e60d8e1
fix: fix TemplateResponse API for Starlette compatibility
...
Use new TemplateResponse(request, name, context) signature.
Add Flask compat shims (get_flashed_messages, url_for, session).
2026-03-27 16:59:04 +01:00
ZdenekSrotyr
1287e63ed9
feat: complete system — web UI, all API endpoints, governance, admin, CLI commands
...
Major additions:
- Web UI: Jinja2 templates in FastAPI (login, dashboard, catalog, corporate memory, admin)
- API: catalog profiles/metrics, telegram verify/unlink/status, admin table registry CRUD
- Corporate memory governance: approve/reject/mandate/revoke/edit/batch + audit log
- Sync: real DataSyncManager trigger, sync-settings, table-subscriptions
- CLI: setup (init/test/deploy/verify), server (logs/restart/deploy/backup), explore
- Instance config integration (instance.yaml loaded at startup)
- 140 tests passing (25 new)
2026-03-27 16:52:22 +01:00
ZdenekSrotyr
c5527ec153
fix: harden script sandbox and SQL query security
...
Fixes found by E2E QA agent:
- Script sandbox: block os, sys, socket, eval, exec, open, __import__,
getattr, pathlib and 20+ other dangerous patterns
- SQL query: block COPY, ATTACH, read_csv, semicolons, non-SELECT
- Added 24 security tests covering all attack vectors
2026-03-27 16:11:05 +01:00
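
A rough sketch of the SQL checks listed in c5527ec153 (block COPY, ATTACH, read_csv, semicolons, non-SELECT). The function name and the exact patterns are assumptions; the real validator likely covers more cases, and a keyword blocklist like this is deliberately conservative (it also rejects a column literally named "copy").

```python
import re

# Keywords and functions that can touch the filesystem or other databases.
_BLOCKED = re.compile(r"\b(copy|attach|read_csv\w*)\b", re.IGNORECASE)


def is_query_allowed(sql: str) -> bool:
    """Accept a single SELECT statement without blocked keywords."""
    text = sql.strip()
    if ";" in text:
        return False  # no statement chaining
    if not text.lower().startswith("select"):
        return False  # read-only queries only
    return _BLOCKED.search(text) is None
```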
ZdenekSrotyr
07b396bfe2
docs: add refactoring plan, design spec, and gitignore updates
2026-03-27 15:42:57 +01:00
ZdenekSrotyr
e0ce91ddb9
feat: add dataset permissions, script execution, Kamal config, CI/CD
...
- SyncSettingsRepository + DatasetPermissionRepository with RBAC
- Script deploy/run/undeploy API with import sandboxing
- User sync settings API with permission checks
- 4 CLI skills (connectors, security, notifications, corporate-memory)
- Kamal production + staging configs
- GitHub Actions CI + deploy workflows
- 91 total tests passing
2026-03-27 15:40:11 +01:00
ZdenekSrotyr
3701130a11
feat: add Docker, CLI tool, scheduler, and agent skills
...
- Dockerfile (uv-based) + docker-compose.yml (3 services)
- CLI tool 'da' with commands: auth, sync, query, status, admin, diagnose, skills
- Scheduler sidecar service (replaces systemd timers)
- pyproject.toml for uv distribution
- Built-in skills (setup, troubleshoot) for AI agents
- 17 CLI tests, 75 total tests passing
2026-03-27 15:30:03 +01:00
ZdenekSrotyr
a3918d3833
feat: add FastAPI server with auth, RBAC, and all API endpoints
...
- JWT auth with role-based access control (viewer/analyst/admin/km_admin)
- Endpoints: health, sync manifest, data download, query, users CRUD,
corporate memory, session/artifact upload
- 18 API tests covering auth, RBAC, all endpoints
2026-03-27 15:19:18 +01:00
ZdenekSrotyr
64acc8d731
feat: add JSON to DuckDB migration script with tests
2026-03-27 15:09:06 +01:00
ZdenekSrotyr
79b0b66f2e
feat: add DuckDB state layer with all repository classes
...
- src/db.py: schema with 14 tables matching design spec
- 9 repository classes: SyncState, Users, Knowledge, Audit,
Telegram, PendingCode, Script, TableRegistry, Profiles
- 37 tests covering all CRUD operations
2026-03-27 15:06:55 +01:00
ZdenekSrotyr
f76411c603
feat: add DuckDB state layer with schema management
...
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 13:55:54 +01:00
Petr
eb7e5bdf8f
Add data freshness indicators and remote table visibility to UI
...
- Fix sync_state.json parsing: derive last_updated from table last_sync
timestamps when root-level field is missing (flat format support)
- Parse ALL YAML blocks from data_description.md (was only first block)
- Show remote tables (daily_deal_traffic) in catalog with "Live" badge
- Show per-table sync timestamps and Local/Live query mode badges
- Add data freshness note to Business Metrics section
- Dashboard: fix "Not yet synced" bug, show local/live table breakdown
2026-03-25 16:24:26 +01:00
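
The fallback in eb7e5bdf8f (derive last_updated from per-table last_sync values when the root-level field is missing) might look like this. The exact shape of sync_state.json is not shown in the log, so the flat layout below, the function name, and the `orders` table are assumptions; timestamps are assumed to be ISO 8601 strings in one consistent format, which sort correctly as text.

```python
from typing import Optional


def derive_last_updated(state: dict) -> Optional[str]:
    """Return the root-level last_updated, or the newest table last_sync."""
    root_value = state.get("last_updated")
    if root_value:
        return root_value
    # Flat format: every top-level entry is a table with its own last_sync.
    stamps = [
        entry["last_sync"]
        for entry in state.values()
        if isinstance(entry, dict) and entry.get("last_sync")
    ]
    return max(stamps) if stamps else None


flat_state = {
    "orders": {"last_sync": "2026-03-24T10:00:00"},
    "daily_deal_traffic": {"last_sync": "2026-03-25T09:30:00"},
}
latest = derive_last_updated(flat_state)
```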
Petr
a667b4e32f
Fix profiler crash for remote-only tables without primary_key
...
Same issue as config.py - profiler's TableInfo and parser required
primary_key and sync_strategy, breaking auto-profile after sync
when daily_deal_traffic (remote-only) is in config.
2026-03-25 14:47:00 +01:00
Petr
4ebb3fc7b2
Fix data sync crash: make primary_key and sync_strategy optional
...
Remote-only tables (query_mode="remote") like daily_deal_traffic
don't need primary_key or sync_strategy. The parser used hard
lookups (table_data["primary_key"]) causing KeyError and breaking
all data sync since 2026-03-21.
Changes:
- TableConfig: default primary_key="" and sync_strategy="none"
- Parser: use .get() with defaults instead of [] lookups
- Validator: add "none" as valid sync_strategy
2026-03-25 14:43:22 +01:00
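
The parser change in 4ebb3fc7b2 can be sketched as follows. The defaults for primary_key and sync_strategy are taken from the commit message; the `name` and `query_mode` fields, the "local" default, and the function name are assumptions.

```python
from dataclasses import dataclass


@dataclass
class TableConfig:
    name: str
    query_mode: str = "local"
    primary_key: str = ""        # optional for remote-only tables
    sync_strategy: str = "none"  # "none" is accepted by the validator


def parse_table(table_data: dict) -> TableConfig:
    # .get() with defaults instead of table_data["primary_key"], so a
    # remote-only entry no longer raises KeyError.
    return TableConfig(
        name=table_data["name"],
        query_mode=table_data.get("query_mode", "local"),
        primary_key=table_data.get("primary_key", ""),
        sync_strategy=table_data.get("sync_strategy", "none"),
    )


remote = parse_table({"name": "daily_deal_traffic", "query_mode": "remote"})
```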
Petr
74ecf66f80
Increase knowledge item content limit from 500 to 1000 chars
2026-03-24 00:12:15 +01:00
Petr
0560bbc127
Rename Mandate button to Make Mandatory
2026-03-23 19:44:08 +01:00
Petr
e85d296b0a
Add Corporate Memory admin review queue UI (Phase 2)
...
Admin page at /corporate-memory/admin with three tabs:
- Review Queue: pending items with approve/mandate/reject + batch ops
- All Items: status filter, promote/demote/revoke actions
- Audit Log: filterable action history table
Features:
- Keyboard shortcuts (j/k navigate, a/r/m = approve/reject/mandate)
- Inline mandate form (mandatory reason + audience targeting)
- Toast notifications on action success/error
- Pending count badge on main Corporate Memory page
- Matches existing visual design (CSS variables, card styles)
2026-03-23 19:32:33 +01:00