- Split Step 6 into 6a (Generate Parquet) and 6b (Configure Data Catalog)
- Document data_description.md + instance.yaml catalog categories
- Uncomment data_description.md symlink in Step 3c
- Add Data Catalog verification to Step 6 checklist
Generator now supports --format {csv,parquet,both}. Parquet mode uses src.parquet_manager.ParquetManager for snappy compression, proper column types (DATE, TIMESTAMP, DOUBLE), and metadata. No more ad-hoc pandas conversion needed on the server.
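A minimal sketch of how a `--format {csv,parquet,both}` flag could be parsed and expanded into the list of formats to write. Only the flag name and its three choices come from the message above; `build_parser`, `formats_to_write`, and the `csv` default are hypothetical and not taken from the generator's source.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing only the --format flag (illustrative)."""
    parser = argparse.ArgumentParser(description="Synthetic data generator (sketch)")
    parser.add_argument(
        "--format",
        dest="output_format",
        choices=["csv", "parquet", "both"],
        default="csv",  # assumption: the real default is not stated in the source
        help="Output format(s) to write",
    )
    return parser


def formats_to_write(output_format: str) -> list[str]:
    """Expand 'both' into the two concrete formats; pass others through."""
    if output_format == "both":
        return ["csv", "parquet"]
    return [output_format]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(formats_to_write(args.output_format))
```

Keeping the expansion in a separate function means the writing code only ever deals with concrete formats, never with the `both` shorthand.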
Synthetic data generator for demos and testing without a real data adapter:
- 9 tables: customers, products, campaigns, web_sessions, web_leads, orders, order_items, payments, support_tickets
- 4 size presets: xs (1MB), s (15MB), m (150MB), l (1.5GB)
- Realistic patterns: seasonality, Pareto customer distribution, segment-based behavior, referential integrity
- Deterministic output via --seed parameter

Also: docs/sample-data.md, updated auto-install.md with Step 6, updated CLAUDE.md (email auth provider, dual-repo architecture)
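To illustrate how seeded determinism, a Pareto customer distribution, and referential integrity can fit together, here is a small standard-library sketch. It is not the generator's actual code: `generate_orders`, its parameters, and the shape parameter 1.16 (a common approximation of the 80/20 rule) are assumptions chosen for illustration.

```python
import random


def generate_orders(seed: int, n_customers: int = 100, n_orders: int = 1000) -> list[dict]:
    """Return orders whose customer_id always refers to an existing customer."""
    # A private Random instance keeps output reproducible for a given seed
    # and independent of any other use of the global random module.
    rng = random.Random(seed)

    customer_ids = list(range(1, n_customers + 1))

    # Heavy-tailed activity weights: a few customers place most orders.
    # alpha = 1.16 is an illustrative choice, not a value from the source.
    weights = [rng.paretovariate(1.16) for _ in customer_ids]

    # Sampling only from existing IDs preserves referential integrity.
    chosen = rng.choices(customer_ids, weights=weights, k=n_orders)
    return [
        {"order_id": order_id, "customer_id": customer_id}
        for order_id, customer_id in enumerate(chosen, start=1)
    ]


if __name__ == "__main__":
    orders = generate_orders(seed=42)
    print(len(orders), orders[0])
```

Two calls with the same seed return identical lists, which is the property a `--seed` parameter relies on.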
Document the full end-to-end workflow: OSS repo (code) + private instance repo (config/secrets). Covers SSH key isolation per repo, symlink bridging, and ongoing deployment workflow.
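As a rough sketch of the symlink bridging idea, the function below links selected files from a private instance checkout into an OSS checkout. The function name, the directory layout, and the file list are hypothetical. The documented workflow may create these links differently, for example with `ln -s`.

```python
from pathlib import Path


def link_instance_files(instance_dir: Path, oss_dir: Path, names: list[str]) -> list[Path]:
    """Symlink each named file from instance_dir into oss_dir.

    Returns the links that were created. Missing sources are skipped, and
    regular files already present in oss_dir are never overwritten.
    """
    created = []
    for name in names:
        target = (instance_dir / name).resolve()
        link = oss_dir / name
        if not target.exists():
            continue  # nothing to bridge for this name
        if link.is_symlink():
            link.unlink()  # refresh a stale or outdated link
        elif link.exists():
            continue  # do not clobber a real file in the OSS checkout
        link.symlink_to(target)
        created.append(link)
    return created
```

A call such as `link_instance_files(Path("../instance"), Path("."), ["instance.yaml", "data_description.md"])` would keep config and secrets in the private repo while the code repo sees them at the expected paths. Both paths in that call are illustrative.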