ZdenekSrotyr
06e1cf0a8d
feat: generic _remote_attach contract for remote DuckDB extension views
...
Extractors with remote tables now write a _remote_attach table into
extract.duckdb so the orchestrator can re-ATTACH external extensions
at query time. The mechanism is source-agnostic — any connector can use it.
- Keboola extractor writes _remote_attach + creates views on kbc.*
- Orchestrator reads _remote_attach, installs extension, reads token from env
- Graceful degradation: missing token → warning, local tables still work
2026-04-08 18:10:12 +02:00
ZdenekSrotyr
2d6a94fb6f
fix: DuckDB concurrency — WAL mode, subprocess sync, temp+rename
...
Three-pronged fix for DuckDB lock conflicts:
1. WAL mode on system.duckdb — enables concurrent readers + writer
2. Sync trigger runs extractor as subprocess (not background task) —
separate process = separate DuckDB connections, no lock conflict
3. Both extractor and orchestrator write to .tmp then atomic rename —
avoids lock conflict with API reads on extract.duckdb/analytics.duckdb
Fixes #9 permanently.
2026-03-31 13:19:57 +02:00
ZdenekSrotyr
10d9280ab5
fix: extractor writes to temp file to avoid lock with orchestrator
...
Writes extract.duckdb.tmp then renames atomically, avoiding DuckDB lock
conflict when orchestrator holds a read connection on extract.duckdb.
2026-03-31 13:09:51 +02:00
ZdenekSrotyr
bd0b6d19c6
fix: legacy extractor constructs full Keboola table ID from bucket+source_table
...
Was using tc['id'] which is the registry ID (e.g. 'circle'), not the
full Keboola ID (e.g. 'in.c-finance.circle') needed by the API.
2026-03-31 12:06:38 +02:00
ZdenekSrotyr
0084f80ff6
fix: legacy extractor passes Path to export_table, not str
...
Fixes 'str' object has no attribute 'parent' when Keboola DuckDB
extension falls back to legacy client.
2026-03-31 12:03:16 +02:00
ZdenekSrotyr
865d6d657e
fix: keboola client metadata_cache_path uses DATA_DIR instead of deleted config
...
Fixes #7 — NameError: name 'config' is not defined
2026-03-31 11:57:57 +02:00
ZdenekSrotyr
b502bd8bdd
refactor: delete old sync pipeline — 9,500 lines removed
...
Phase 5 cleanup: remove all code replaced by extract.duckdb architecture.
Deleted modules:
- src/config.py (653) — replaced by DuckDB table_registry
- src/parquet_manager.py (755) — replaced by DuckDB COPY TO
- src/data_sync.py (734) — replaced by SyncOrchestrator
- src/remote_query.py (636) — replaced by DuckDB BigQuery ATTACH
- src/table_registry.py (464) — replaced by DuckDB repository
- connectors/keboola/adapter.py (820) — replaced by extractor.py
- connectors/bigquery/adapter.py (665) — replaced by extractor.py
- connectors/bigquery/client.py (644) — replaced by DuckDB BQ extension
Updated all imports in webapp, catalog_export, enricher, router,
sync_settings_service, generate_sample_data. Kept keboola/client.py
as fallback (removed src.config dependency).
704 tests passing.
2026-03-31 07:50:37 +02:00
ZdenekSrotyr
18e5f0b6e8
feat: implement extract.duckdb contract — orchestrator + extractors
...
Phase 0: extend table_registry schema (v1→v2 migration), add
source_type/bucket/source_table/query_mode columns.
Phase 1: SyncOrchestrator ATTACHes extract.duckdb files into master
analytics.duckdb. Keboola extractor uses DuckDB extension with
legacy client fallback. BigQuery extractor is remote-only via
DuckDB BQ extension (no data download).
62 tests passing.
2026-03-30 20:12:56 +02:00
Petr
b99ec576ca
Add self-service data onboarding system
...
Table Registry as central source of truth (JSON) with atomic writes,
optimistic locking, audit logging, and data_description.md generation.
Existing readers (config.py, profiler.py) need zero changes.
Phase 1 - Discovery API:
- discover_tables() on DataSource ABC + Keboola implementation
- admin_required decorator with server-side recomputation
- GET /api/admin/discover-tables endpoint
Phase 2 - Table Registry:
- src/table_registry.py with CRUD, validation, migration from MD
- Admin API: register/update/unregister with version locking
- DELETE cascade cleans up per-user subscriptions
Phase 3 - Auto-Profiling:
- profile_changed_tables() for incremental profiling
- Non-fatal hook in sync_all() after successful sync
Phase 4 - Per-Table Subscriptions:
- table_mode (all/explicit) with per-table toggles
- GET/POST /api/table-subscriptions endpoints
- Subscription status in catalog and dashboard views
Phase 5 - Smart Sync:
- Python-generated rsync filter files (not shell YAML parsing)
- sync_data.sh uses --filter="merge ..." for explicit mode
Phase 6 - Admin UI:
- /admin/tables with discovery, registration modal, registry mgmt
- Vanilla JS, matching existing design system
2026-03-09 14:25:37 +01:00
Petr
266e8573d3
Extract Keboola into connectors/keboola module
...
Move all Keboola-specific code out of src/ into connectors/keboola/:
- git mv src/keboola_client.py -> connectors/keboola/client.py
- Extract LocalKeboolaSource (855 lines) from data_sync.py -> connectors/keboola/adapter.py
- Rename to KeboolaDataSource with full env var validation
- Extend DataSource ABC with get_column_metadata() and get_source_name()
- Add dynamic connector registry via importlib in create_data_source()
- Refactor _generate_schema_yaml to use ABC methods (source_type, _schema_version: 2)
- Remove src/adapters/ (redundant facade layer)
- Remove Keboola validation from src/config.py (connector validates itself)
- Add 14 tests for factory, ABC defaults, env validation, dynamic lookup
2026-03-09 12:22:16 +01:00