ZdenekSrotyr
10d9280ab5
fix: extractor writes to temp file to avoid lock conflict with orchestrator
Writes extract.duckdb.tmp, then renames it atomically to extract.duckdb,
avoiding a DuckDB lock conflict when the orchestrator holds a read
connection on extract.duckdb.
2026-03-31 13:09:51 +02:00
|
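The write-then-rename pattern described in 10d9280ab5 can be sketched as a small helper. This is a minimal sketch that assumes a POSIX filesystem. `write_atomically` and its `build` callback are hypothetical names, not the extractor's actual API. In the extractor, `build` would be the DuckDB extraction writing extract.duckdb.tmp, but any callable that writes a file works for illustration.

```python
import os
from pathlib import Path
from typing import Callable


def write_atomically(target: Path, build: Callable[[Path], None]) -> Path:
    """Create `target` by building a sibling temp file and renaming it into place.

    Hypothetical helper: `build` receives the temp path (e.g. extract.duckdb.tmp)
    and must write the complete file there.
    """
    tmp = target.with_name(target.name + ".tmp")
    tmp.unlink(missing_ok=True)  # drop leftovers from an interrupted earlier run
    try:
        build(tmp)
        # os.replace is atomic on POSIX when both paths are on the same filesystem:
        # readers see either the old file or the finished new one, never a partial write.
        os.replace(tmp, target)
    except BaseException:
        tmp.unlink(missing_ok=True)  # never leave a half-written temp file behind
        raise
    return target
```

On POSIX, a process that already has the old extract.duckdb open keeps reading the old inode until it reconnects, so the rename does not disturb an existing read connection. On Windows, `os.replace` may fail while another process holds the destination open.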
ZdenekSrotyr
bd0b6d19c6
fix: legacy extractor constructs full Keboola table ID from bucket+source_table
Was using tc['id'], which is the registry ID (e.g. 'circle'), not the
full Keboola table ID (e.g. 'in.c-finance.circle') that the API requires.
2026-03-31 12:06:38 +02:00
|
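The ID construction in bd0b6d19c6 amounts to joining the bucket and the source table with a dot. This sketch assumes the registry row is a dict carrying `bucket` and `source_table` keys, matching the columns added in 18e5f0b6e8. `full_table_id` is a hypothetical helper name.

```python
from typing import Mapping


def full_table_id(tc: Mapping[str, str]) -> str:
    """Return the full Keboola table ID '<bucket>.<source_table>' for a registry row.

    Assumes the row carries 'bucket' (e.g. 'in.c-finance') and 'source_table'
    (e.g. 'circle'); tc['id'] is only the registry ID and is not a valid API ID.
    """
    bucket = tc.get("bucket")
    source_table = tc.get("source_table")
    if not bucket or not source_table:
        raise ValueError(f"registry entry {tc.get('id')!r} is missing bucket/source_table")
    return f"{bucket}.{source_table}"


if __name__ == "__main__":
    row = {"id": "circle", "bucket": "in.c-finance", "source_table": "circle"}
    print(full_table_id(row))  # in.c-finance.circle
```

Failing loudly on a missing bucket is a design choice in this sketch. Silently falling back to `tc['id']` would reintroduce the original bug.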
ZdenekSrotyr
0084f80ff6
fix: legacy extractor passes Path to export_table, not str
Fixes AttributeError: 'str' object has no attribute 'parent' when the
Keboola DuckDB extension falls back to the legacy client.
2026-03-31 12:03:16 +02:00
|
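The failure in 0084f80ff6 happens because a function expecting `pathlib.Path` calls `.parent` on its argument. The sketch below shows the call-site fix. `export_table` here is a hypothetical stand-in with an assumed signature, not the legacy client's real code. `legacy_export` is likewise an illustrative name.

```python
from pathlib import Path
from typing import Union


def export_table(table_id: str, output_path: Path) -> Path:
    """Hypothetical stand-in for the legacy client's export_table.

    It relies on Path methods, so passing a str raises
    AttributeError: 'str' object has no attribute 'parent'.
    """
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # ... download `table_id` into output_path (omitted in this sketch) ...
    return output_path


def legacy_export(table_id: str, output_path: Union[str, Path]) -> Path:
    # Normalize at the call site: Path(...) accepts both str and Path.
    return export_table(table_id, Path(output_path))
```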
ZdenekSrotyr
865d6d657e
fix: keboola client metadata_cache_path uses DATA_DIR instead of deleted config
Fixes #7 — NameError: name 'config' is not defined
2026-03-31 11:57:57 +02:00
|
ZdenekSrotyr
b502bd8bdd
refactor: delete old sync pipeline — 9,500 lines removed
Phase 5 cleanup: remove all code replaced by the extract.duckdb architecture.
Deleted modules (line counts in parentheses):
- src/config.py (653) — replaced by DuckDB table_registry
- src/parquet_manager.py (755) — replaced by DuckDB COPY TO
- src/data_sync.py (734) — replaced by SyncOrchestrator
- src/remote_query.py (636) — replaced by DuckDB BigQuery ATTACH
- src/table_registry.py (464) — replaced by DuckDB repository
- connectors/keboola/adapter.py (820) — replaced by extractor.py
- connectors/bigquery/adapter.py (665) — replaced by extractor.py
- connectors/bigquery/client.py (644) — replaced by DuckDB BQ extension
Updated all imports in webapp, catalog_export, enricher, router,
sync_settings_service and generate_sample_data. Kept keboola/client.py
as the fallback (removed its src.config dependency).
704 tests passing.
2026-03-31 07:50:37 +02:00
|
ZdenekSrotyr
18e5f0b6e8
feat: implement extract.duckdb contract — orchestrator + extractors
Phase 0: extend the table_registry schema (v1→v2 migration) with
source_type/bucket/source_table/query_mode columns.
Phase 1: SyncOrchestrator ATTACHes extract.duckdb files into the master
analytics.duckdb. The Keboola extractor uses the Keboola DuckDB extension
with a legacy-client fallback. The BigQuery extractor is remote-only via
the DuckDB BQ extension (no data download).
62 tests passing.
2026-03-30 20:12:56 +02:00
|
Petr
b99ec576ca
Add self-service data onboarding system
Table Registry (JSON) as the central source of truth, with atomic writes,
optimistic locking, audit logging, and data_description.md generation.
Existing readers (config.py, profiler.py) need zero changes.
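A JSON registry with atomic writes and optimistic (version) locking can be sketched as follows. Everything here is an assumption for illustration: the `version` and `tables` keys, `load_registry`, `save_registry` and `VersionConflict` are hypothetical and not taken from src/table_registry.py. A writer passes the version it read, and the save is rejected if someone else has written in between.

```python
import json
import os
import tempfile
from pathlib import Path


class VersionConflict(Exception):
    """The registry changed on disk since the caller read it."""


def load_registry(path: Path) -> dict:
    if not path.exists():
        return {"version": 0, "tables": {}}
    return json.loads(path.read_text(encoding="utf-8"))


def save_registry(path: Path, registry: dict, expected_version: int) -> int:
    """Persist `registry` only if the on-disk version is still `expected_version`."""
    current_version = load_registry(path)["version"]
    if current_version != expected_version:
        raise VersionConflict(f"expected v{expected_version}, found v{current_version}")
    new_version = expected_version + 1
    payload = dict(registry, version=new_version)
    # Atomic write: temp file in the same directory, then os.replace over the target.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2)
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return new_version
```

The version check and the replace are two separate steps. With several writer processes, a real implementation would also hold a file lock (for example `fcntl.flock`) around both.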
Phase 1 - Discovery API:
- discover_tables() on DataSource ABC + Keboola implementation
- admin_required decorator with server-side recomputation
- GET /api/admin/discover-tables endpoint
Phase 2 - Table Registry:
- src/table_registry.py with CRUD, validation, migration from MD
- Admin API: register/update/unregister with version locking
- DELETE cascade cleans up per-user subscriptions
Phase 3 - Auto-Profiling:
- profile_changed_tables() for incremental profiling
- Non-fatal hook in sync_all() after successful sync
Phase 4 - Per-Table Subscriptions:
- table_mode (all/explicit) with per-table toggles
- GET/POST /api/table-subscriptions endpoints
- Subscription status in catalog and dashboard views
Phase 5 - Smart Sync:
- Python-generated rsync filter files (not shell YAML parsing)
- sync_data.sh uses --filter="merge ..." for explicit mode
Phase 6 - Admin UI:
- /admin/tables with discovery, a registration modal, and registry management
- Vanilla JS, matching existing design system
2026-03-09 14:25:37 +01:00
|
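The Smart Sync phase of b99ec576ca generates rsync filter files in Python rather than parsing YAML in shell. A minimal sketch follows, assuming a hypothetical flat layout of one `<table>.parquet` file per table at the root of the synced directory. `rsync_filter_rules` and `write_filter_file` are illustrative names. The rule syntax (`+ pattern`, `- pattern`, first match wins, leading `/` anchoring to the transfer root) is standard rsync filter syntax.

```python
from pathlib import Path
from typing import Iterable, List

_RSYNC_SPECIAL = set("*?[]/\\")


def rsync_filter_rules(table_mode: str, subscribed: Iterable[str]) -> List[str]:
    """Return rsync filter rules for a user's table_mode.

    Hypothetical layout: one <table>.parquet file per table at the top of the
    synced directory. 'all' needs no filter; 'explicit' includes only the
    subscribed tables and excludes everything else.
    """
    if table_mode == "all":
        return []
    if table_mode != "explicit":
        raise ValueError(f"unknown table_mode {table_mode!r}")
    rules = []
    for name in sorted(set(subscribed)):
        if _RSYNC_SPECIAL & set(name):
            raise ValueError(f"table name {name!r} contains rsync pattern characters")
        rules.append(f"+ /{name}.parquet")  # leading '/' anchors to the transfer root
    rules.append("- *")  # first matching rule wins, so this excludes the rest
    return rules


def write_filter_file(path: Path, rules: List[str]) -> None:
    path.write_text("".join(rule + "\n" for rule in rules), encoding="utf-8")
```

The resulting file would be consumed the way the commit describes, e.g. `rsync -a --filter="merge <rules file>" <src>/ <dest>/`. Rejecting names with wildcard characters keeps a table name from silently turning into a broader pattern.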
Petr
266e8573d3
Extract Keboola into connectors/keboola module
Move all Keboola-specific code out of src/ into connectors/keboola/:
- git mv src/keboola_client.py -> connectors/keboola/client.py
- Extract LocalKeboolaSource (855 lines) from data_sync.py -> connectors/keboola/adapter.py
- Rename to KeboolaDataSource with full env var validation
- Extend DataSource ABC with get_column_metadata() and get_source_name()
- Add dynamic connector registry via importlib in create_data_source()
- Refactor _generate_schema_yaml to use ABC methods (source_type, _schema_version: 2)
- Remove src/adapters/ (redundant facade layer)
- Remove Keboola validation from src/config.py (connector validates itself)
- Add 14 tests for factory, ABC defaults, env validation, dynamic lookup
2026-03-09 12:22:16 +01:00
|
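The dynamic connector registry in 266e8573d3 resolves a connector module at runtime via `importlib`. This sketch assumes a hypothetical convention: `connectors/<source_type>/adapter.py` defines `<Source_type>DataSource`, for example `KeboolaDataSource`, with a no-argument constructor. The parameter names and the error handling are illustrative, not the project's actual `create_data_source()` signature.

```python
import importlib
from typing import Any, Optional


def create_data_source(source_type: str,
                       module_template: str = "connectors.{name}.adapter",
                       class_name: Optional[str] = None) -> Any:
    """Instantiate a connector by importing its module at runtime.

    Hypothetical convention: connectors/<source_type>/adapter.py defines
    <Source_type>DataSource (e.g. KeboolaDataSource); pass class_name when the
    class does not follow that pattern.
    """
    module_name = module_template.format(name=source_type)
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Only translate "connector not found"; a missing dependency inside
        # an existing connector should surface unchanged.
        if exc.name and (module_name == exc.name or module_name.startswith(exc.name + ".")):
            raise ValueError(f"unknown data source {source_type!r}") from exc
        raise
    cls = getattr(module, class_name or f"{source_type.capitalize()}DataSource")
    return cls()
```

With the default template, `create_data_source("keboola")` would import `connectors.keboola.adapter` and instantiate `KeboolaDataSource`. A name like "bigquery" would need `class_name` because `str.capitalize()` yields "Bigquery". Distinguishing a missing connector from a missing dependency keeps real import errors visible.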