ZdenekSrotyr
e425d4baa5
fix: handle WAL files in atomic swap to prevent DB corruption
...
Add _atomic_swap_db helper that removes stale WAL files before and after
moving the temp DuckDB into place. Apply CHECKPOINT before close in both
orchestrator and Keboola extractor so DuckDB flushes WAL before the swap.
2026-04-09 06:57:29 +02:00
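A minimal sketch of the swap described in e425d4baa5, assuming DuckDB's usual convention of storing the write-ahead log next to the database as `<file>.wal`. The name `atomic_swap_db` and its signature are illustrative and are not taken from the repository; the caller is assumed to have already run CHECKPOINT and closed the connection to the temp database.

```python
import os
from pathlib import Path


def atomic_swap_db(tmp_path: Path, final_path: Path) -> None:
    """Move tmp_path over final_path, clearing stale DuckDB WAL files."""
    # Assumption: the WAL lives next to the database file as "<name>.wal".
    tmp_wal = Path(str(tmp_path) + ".wal")
    final_wal = Path(str(final_path) + ".wal")

    # A WAL left behind by the previous database must not be replayed
    # against the new file, so remove it before the swap.
    final_wal.unlink(missing_ok=True)

    # os.replace is atomic when both paths are on the same filesystem.
    os.replace(tmp_path, final_path)

    # After CHECKPOINT + close the temp WAL should already be gone;
    # remove any leftovers so nothing stale sits next to the new file.
    tmp_wal.unlink(missing_ok=True)
    final_wal.unlink(missing_ok=True)
```

Keeping the temp file in the same directory as the target is what makes the rename atomic; a temp file on another filesystem would fall back to a copy.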
ZdenekSrotyr
79443e0df4
fix: CSV all_varchar in legacy extractor, rewrite DEPLOYMENT.md from real deploy
...
- Legacy extractor now uses read_csv(all_varchar=true) to avoid type
inference errors (e.g. the seniority column inferred as DOUBLE although
it contains string values)
- DEPLOYMENT.md rewritten based on actual dev VM deployment experience:
deploy key setup, DuckDB write locking, env reload gotchas, bootstrap flow
2026-04-08 19:09:55 +02:00
ZdenekSrotyr
06e1cf0a8d
feat: generic _remote_attach contract for remote DuckDB extension views
...
Extractors with remote tables now write a _remote_attach table into
extract.duckdb so the orchestrator can re-ATTACH external extensions
at query time. The mechanism is source-agnostic — any connector can use it.
- Keboola extractor writes _remote_attach + creates views on kbc.*
- Orchestrator reads _remote_attach, installs extension, reads token from env
- Graceful degradation: missing token → warning, local tables still work
2026-04-08 18:10:12 +02:00
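The graceful-degradation rule from 06e1cf0a8d can be sketched as follows. The column names (`alias`, `extension`, `url`, `token_env`) are assumptions made for illustration; the commit message does not spell out the actual `_remote_attach` schema. The function only decides which remote sources can be attached and leaves the extension-specific ATTACH statement to the caller.

```python
import logging
import os
from dataclasses import dataclass
from typing import Iterable, Mapping

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class RemoteAttach:
    """One row of a _remote_attach table (hypothetical column names)."""
    alias: str       # schema alias the views refer to, e.g. "kbc"
    extension: str   # DuckDB extension that provides the remote source
    url: str         # endpoint or connection target
    token_env: str   # name of the env var holding the token


def resolve_attachments(
    rows: Iterable[RemoteAttach],
    env: Mapping[str, str] = os.environ,
) -> list[tuple[RemoteAttach, str]]:
    """Return (row, token) pairs for every source whose token is set."""
    ready = []
    for row in rows:
        token = env.get(row.token_env)
        if not token:
            # Missing token: warn and skip, local tables keep working.
            logger.warning(
                "skipping remote source %s: %s is not set",
                row.alias, row.token_env,
            )
            continue
        ready.append((row, token))
    return ready
```

Storing the name of the environment variable rather than the token itself keeps secrets out of extract.duckdb.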
ZdenekSrotyr
2d6a94fb6f
fix: DuckDB concurrency — WAL mode, subprocess sync, temp+rename
...
Three-pronged fix for DuckDB lock conflicts:
1. WAL mode on system.duckdb — enables concurrent readers + writer
2. Sync trigger runs extractor as subprocess (not background task) —
separate process = separate DuckDB connections, no lock conflict
3. Both extractor and orchestrator write to .tmp then atomic rename —
avoids lock conflict with API reads on extract.duckdb/analytics.duckdb
Fixes #9 permanently.
2026-03-31 13:19:57 +02:00
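Point 2 of 2d6a94fb6f (running the extractor as a subprocess) might look roughly like this. The helper name `run_extractor` and the way arguments are passed are assumptions; the commit only states that the sync trigger uses a separate process so that the extractor gets its own DuckDB connections.

```python
import subprocess
import sys
from typing import Sequence


def run_extractor(
    args: Sequence[str],
    timeout_s: float = 3600.0,
) -> subprocess.CompletedProcess:
    """Run the extractor with the current interpreter in a child process."""
    # A child process opens its own DuckDB connections, so it never
    # shares a connection or file lock with the API process.
    return subprocess.run(
        [sys.executable, *args],
        capture_output=True,  # collect stdout/stderr for logging
        text=True,
        timeout=timeout_s,    # raises subprocess.TimeoutExpired on overrun
        check=False,          # let the caller inspect returncode
    )
```

A caller would pass something like `["-m", "<extractor module>"]` and treat a non-zero `returncode` as a failed sync.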
ZdenekSrotyr
10d9280ab5
fix: extractor writes to temp file to avoid lock with orchestrator
...
Writes extract.duckdb.tmp then renames atomically, avoiding DuckDB lock
conflict when orchestrator holds a read connection on extract.duckdb.
2026-03-31 13:09:51 +02:00
ZdenekSrotyr
bd0b6d19c6
fix: legacy extractor constructs full Keboola table ID from bucket+source_table
...
The extractor was using tc['id'], which is the registry ID (e.g. 'circle'),
not the full Keboola ID (e.g. 'in.c-finance.circle') that the API needs.
2026-03-31 12:06:38 +02:00
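The ID construction from bd0b6d19c6, as a small sketch. The dictionary keys `bucket` and `source_table` follow the column names mentioned in the commit subject; the function name is hypothetical.

```python
def full_table_id(tc: dict) -> str:
    """Build the full Keboola table ID from a registry entry."""
    # tc["id"] is only the registry ID (e.g. "circle"); the API needs
    # "<bucket>.<source_table>", e.g. "in.c-finance.circle".
    return f"{tc['bucket']}.{tc['source_table']}"
```

For the example in the commit message, `{"id": "circle", "bucket": "in.c-finance", "source_table": "circle"}` yields `in.c-finance.circle`.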
ZdenekSrotyr
0084f80ff6
fix: legacy extractor passes Path to export_table, not str
...
Fixes the AttributeError "'str' object has no attribute 'parent'" raised
when the Keboola DuckDB extension falls back to the legacy client.
2026-03-31 12:03:16 +02:00
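The error fixed in 0084f80ff6 is easy to reproduce with a stand-in function. `export_table` below is a hypothetical stub, not the real client method; it only mirrors the fact that the callee uses `.parent` on its output path.

```python
from pathlib import Path


def export_table(table_id: str, output_path: Path) -> Path:
    """Hypothetical stand-in for the legacy client's export_table."""
    # .parent exists on Path objects but not on plain strings, so a
    # str argument fails here with AttributeError.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(f"exported {table_id}\n")
    return output_path
```

Wrapping the argument at the call site, as in `export_table(table_id, Path(target))`, is the kind of change the commit subject describes.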
ZdenekSrotyr
865d6d657e
fix: keboola client metadata_cache_path uses DATA_DIR instead of deleted config
...
Fixes #7 — NameError: name 'config' is not defined
2026-03-31 11:57:57 +02:00
ZdenekSrotyr
b502bd8bdd
refactor: delete old sync pipeline — 9,500 lines removed
...
Phase 5 cleanup: remove all code replaced by extract.duckdb architecture.
Deleted modules:
- src/config.py (653) — replaced by DuckDB table_registry
- src/parquet_manager.py (755) — replaced by DuckDB COPY TO
- src/data_sync.py (734) — replaced by SyncOrchestrator
- src/remote_query.py (636) — replaced by DuckDB BigQuery ATTACH
- src/table_registry.py (464) — replaced by DuckDB repository
- connectors/keboola/adapter.py (820) — replaced by extractor.py
- connectors/bigquery/adapter.py (665) — replaced by extractor.py
- connectors/bigquery/client.py (644) — replaced by DuckDB BQ extension
Updated all imports in webapp, catalog_export, enricher, router,
sync_settings_service, generate_sample_data. Kept keboola/client.py
as fallback (removed src.config dependency).
704 tests passing.
2026-03-31 07:50:37 +02:00
ZdenekSrotyr
18e5f0b6e8
feat: implement extract.duckdb contract — orchestrator + extractors
...
Phase 0: extend table_registry schema (v1→v2 migration), add
source_type/bucket/source_table/query_mode columns.
Phase 1: SyncOrchestrator ATTACHes extract.duckdb files into master
analytics.duckdb. Keboola extractor uses DuckDB extension with
legacy client fallback. BigQuery extractor is remote-only via
DuckDB BQ extension (no data download).
62 tests passing.
2026-03-30 20:12:56 +02:00
Petr
b99ec576ca
Add self-service data onboarding system
...
Table Registry as central source of truth (JSON) with atomic writes,
optimistic locking, audit logging, and data_description.md generation.
Existing readers (config.py, profiler.py) need zero changes.
Phase 1 - Discovery API:
- discover_tables() on DataSource ABC + Keboola implementation
- admin_required decorator with server-side recomputation
- GET /api/admin/discover-tables endpoint
Phase 2 - Table Registry:
- src/table_registry.py with CRUD, validation, migration from MD
- Admin API: register/update/unregister with version locking
- DELETE cascade cleans up per-user subscriptions
Phase 3 - Auto-Profiling:
- profile_changed_tables() for incremental profiling
- Non-fatal hook in sync_all() after successful sync
Phase 4 - Per-Table Subscriptions:
- table_mode (all/explicit) with per-table toggles
- GET/POST /api/table-subscriptions endpoints
- Subscription status in catalog and dashboard views
Phase 5 - Smart Sync:
- Python-generated rsync filter files (not shell YAML parsing)
- sync_data.sh uses --filter="merge ..." for explicit mode
Phase 6 - Admin UI:
- /admin/tables with discovery, registration modal, registry mgmt
- Vanilla JS, matching existing design system
2026-03-09 14:25:37 +01:00
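The combination of atomic writes and optimistic locking mentioned in b99ec576ca can be sketched like this, assuming the registry is a JSON file carrying an integer `version` field. The field name, the function name, and the exception type are illustrative and not taken from src/table_registry.py.

```python
import json
import os
import tempfile
from pathlib import Path


class VersionConflict(Exception):
    """Raised when the registry changed since the caller last read it."""


def save_registry(path: Path, registry: dict, expected_version: int) -> int:
    """Write the registry only if its stored version matches the caller's."""
    current = json.loads(path.read_text()) if path.exists() else {"version": 0}
    if current.get("version", 0) != expected_version:
        raise VersionConflict(
            f"expected version {expected_version}, found {current.get('version', 0)}"
        )

    new_version = expected_version + 1
    payload = {**registry, "version": new_version}

    # Write to a temp file in the same directory, then rename over the
    # target, so readers never observe a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)
        raise
    return new_version
```

The version check and the write are not a single atomic step here; with several writer processes a file lock around the whole function would still be needed.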
Petr
266e8573d3
Extract Keboola into connectors/keboola module
...
Move all Keboola-specific code out of src/ into connectors/keboola/:
- git mv src/keboola_client.py -> connectors/keboola/client.py
- Extract LocalKeboolaSource (855 lines) from data_sync.py -> connectors/keboola/adapter.py
- Rename to KeboolaDataSource with full env var validation
- Extend DataSource ABC with get_column_metadata() and get_source_name()
- Add dynamic connector registry via importlib in create_data_source()
- Refactor _generate_schema_yaml to use ABC methods (source_type, _schema_version: 2)
- Remove src/adapters/ (redundant facade layer)
- Remove Keboola validation from src/config.py (connector validates itself)
- Add 14 tests for factory, ABC defaults, env validation, dynamic lookup
2026-03-09 12:22:16 +01:00
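A dynamic connector lookup via importlib, as mentioned in 266e8573d3, could be sketched along these lines. The module layout `connectors.<source_type>.adapter` is inferred from the paths in the commit message; both helper names are hypothetical and the real create_data_source() may differ.

```python
import importlib
from typing import Any


def connector_module_path(source_type: str) -> str:
    """Map a source type to its adapter module (assumed layout)."""
    return f"connectors.{source_type}.adapter"


def load_attr(module_path: str, attr: str) -> Any:
    """Import module_path at runtime and return one attribute from it."""
    try:
        module = importlib.import_module(module_path)
    except ModuleNotFoundError as exc:
        raise ValueError(f"unknown connector module: {module_path}") from exc
    try:
        return getattr(module, attr)
    except AttributeError as exc:
        raise ValueError(f"{module_path} has no attribute {attr!r}") from exc
```

With this shape, adding a connector means adding a package under connectors/ rather than editing a hard-coded if/elif chain in the factory.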