ZdenekSrotyr
|
b502bd8bdd
|
refactor: delete old sync pipeline — 9,500 lines removed
Phase 5 cleanup: remove all code replaced by extract.duckdb architecture.
Deleted modules:
- src/config.py (653) — replaced by DuckDB table_registry
- src/parquet_manager.py (755) — replaced by DuckDB COPY TO
- src/data_sync.py (734) — replaced by SyncOrchestrator
- src/remote_query.py (636) — replaced by DuckDB BigQuery ATTACH
- src/table_registry.py (464) — replaced by DuckDB repository
- connectors/keboola/adapter.py (820) — replaced by extractor.py
- connectors/bigquery/adapter.py (665) — replaced by extractor.py
- connectors/bigquery/client.py (644) — replaced by DuckDB BQ extension
Updated all imports in webapp, catalog_export, enricher, router,
sync_settings_service, generate_sample_data. Kept keboola/client.py
as fallback (removed src.config dependency).
704 tests passing.
|
2026-03-31 07:50:37 +02:00 |
|
ZdenekSrotyr
|
18e5f0b6e8
|
feat: implement extract.duckdb contract — orchestrator + extractors
Phase 0: extend table_registry schema (v1→v2 migration), add
source_type/bucket/source_table/query_mode columns.
Phase 1: SyncOrchestrator ATTACHes extract.duckdb files into master
analytics.duckdb. Keboola extractor uses DuckDB extension with
legacy client fallback. BigQuery extractor is remote-only via
DuckDB BQ extension (no data download).
62 tests passing.
|
2026-03-30 20:12:56 +02:00 |
|
Petr
|
b99ec576ca
|
Add self-service data onboarding system
Table Registry as central source of truth (JSON) with atomic writes,
optimistic locking, audit logging, and data_description.md generation.
Existing readers (config.py, profiler.py) need zero changes.
Phase 1 - Discovery API:
- discover_tables() on DataSource ABC + Keboola implementation
- admin_required decorator with server-side recomputation
- GET /api/admin/discover-tables endpoint
Phase 2 - Table Registry:
- src/table_registry.py with CRUD, validation, migration from MD
- Admin API: register/update/unregister with version locking
- DELETE cascade cleans up per-user subscriptions
Phase 3 - Auto-Profiling:
- profile_changed_tables() for incremental profiling
- Non-fatal hook in sync_all() after successful sync
Phase 4 - Per-Table Subscriptions:
- table_mode (all/explicit) with per-table toggles
- GET/POST /api/table-subscriptions endpoints
- Subscription status in catalog and dashboard views
Phase 5 - Smart Sync:
- Python-generated rsync filter files (not shell YAML parsing)
- sync_data.sh uses --filter="merge ..." for explicit mode
Phase 6 - Admin UI:
- /admin/tables with discovery, registration modal, registry mgmt
- Vanilla JS, matching existing design system
|
2026-03-09 14:25:37 +01:00 |
|
Petr
|
266e8573d3
|
Extract Keboola into connectors/keboola module
Move all Keboola-specific code out of src/ into connectors/keboola/:
- git mv src/keboola_client.py -> connectors/keboola/client.py
- Extract LocalKeboolaSource (855 lines) from data_sync.py -> connectors/keboola/adapter.py
- Rename to KeboolaDataSource with full env var validation
- Extend DataSource ABC with get_column_metadata() and get_source_name()
- Add dynamic connector registry via importlib in create_data_source()
- Refactor _generate_schema_yaml to use ABC methods (source_type, _schema_version: 2)
- Remove src/adapters/ (redundant facade layer)
- Remove Keboola validation from src/config.py (connector validates itself)
- Add 14 tests for factory, ABC defaults, env validation, dynamic lookup
|
2026-03-09 12:22:16 +01:00 |
|