ZdenekSrotyr
b502bd8bdd
refactor: delete old sync pipeline — 9,500 lines removed
...
Phase 5 cleanup: remove all code replaced by extract.duckdb architecture.
Deleted modules:
- src/config.py (653) — replaced by DuckDB table_registry
- src/parquet_manager.py (755) — replaced by DuckDB COPY TO
- src/data_sync.py (734) — replaced by SyncOrchestrator
- src/remote_query.py (636) — replaced by DuckDB BigQuery ATTACH
- src/table_registry.py (464) — replaced by DuckDB repository
- connectors/keboola/adapter.py (820) — replaced by extractor.py
- connectors/bigquery/adapter.py (665) — replaced by extractor.py
- connectors/bigquery/client.py (644) — replaced by DuckDB BQ extension
Updated all imports in webapp, catalog_export, enricher, router,
sync_settings_service, generate_sample_data. Kept keboola/client.py
as fallback (removed src.config dependency).
704 tests passing.
2026-03-31 07:50:37 +02:00
Petr
302494b632
Add --format parquet using project's ParquetManager
...
Generator now supports --format {csv,parquet,both}. Parquet mode
uses src.parquet_manager.ParquetManager for snappy compression,
proper column types (DATE, TIMESTAMP, DOUBLE), and metadata.
No more ad-hoc pandas conversion needed on the server.
2026-03-10 21:46:20 +01:00
Petr
44bf43535b
Add sample data generator with 9 e-commerce tables
...
Synthetic data generator for demo/testing without real data adapter:
- 9 tables: customers, products, campaigns, web_sessions, web_leads,
orders, order_items, payments, support_tickets
- 4 size presets: xs (1MB), s (15MB), m (150MB), l (1.5GB)
- Realistic patterns: seasonality, Pareto customer distribution,
segment-based behavior, referential integrity
- Deterministic output via --seed parameter
Also: docs/sample-data.md, updated auto-install.md with Step 6,
updated CLAUDE.md (email auth provider, dual-repo architecture)
2026-03-10 12:31:14 +01:00