Read SSH alias from .sync_connection file at script start (default: 'data-analyst' for backward compatibility). All 32 occurrences of hardcoded 'data-analyst:' and 'ssh data-analyst' replaced with $SSH_HOST.
Scripts
Helper scripts for working with the AI Data Analyst project.
These scripts are synced from the server into server/scripts/ on the analyst's machine.
Available Scripts
setup_views.sh
Initialize or refresh DuckDB views on Parquet files.
bash server/scripts/setup_views.sh
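As a rough illustration of what "views on Parquet files" can look like, the sketch below prints one `CREATE OR REPLACE VIEW` statement per Parquet file. It is not the actual contents of setup_views.sh: the function name `generate_view_sql` is hypothetical, and the layout of one `<table>.parquet` file per table is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: emit one CREATE OR REPLACE VIEW statement per Parquet file.
# Assumption: each table is stored as a single <table>.parquet file in one directory.
# generate_view_sql is a hypothetical helper, not part of the project.

generate_view_sql() {
    local parquet_dir="$1"
    local file table
    for file in "$parquet_dir"/*.parquet; do
        # Skip the unexpanded glob when the directory has no Parquet files.
        [ -e "$file" ] || continue
        table="$(basename "$file" .parquet)"
        # Paths containing a single quote would need escaping before use in SQL.
        printf "CREATE OR REPLACE VIEW \"%s\" AS SELECT * FROM read_parquet('%s');\n" "$table" "$file"
    done
}
```

If the duckdb command-line client is installed, the generated statements could be piped into it, for example `generate_view_sql server/parquet | duckdb user/duckdb/analytics.duckdb`.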
sync_data.sh
Synchronize data from the server, upload user files, and refresh DuckDB.
# Recommended: update scripts first, then sync
rsync -avz data-analyst:server/scripts/ ./server/scripts/ # Linux/macOS
scp -r data-analyst:server/scripts/* ./server/scripts/ # Windows fallback
bash server/scripts/sync_data.sh
# Other options:
bash server/scripts/sync_data.sh --dry-run # Preview what would be synced (no changes)
bash server/scripts/sync_data.sh --push # Only upload user/ to server
What sync does:
- Self-update check - detects if sync_data.sh changed, asks to re-run if so
- Downloads server/docs/, server/scripts/, server/metadata/ from server
- Updates CLAUDE.md from latest template
- Downloads server/parquet/ data files (with --delete to remove old files)
- Uploads user/ directory to server (backup, no --delete)
- Syncs Python environment to server
- Validates DuckDB - if corrupted, deletes and recreates from parquets
- Reinitializes DuckDB views (CREATE OR REPLACE VIEW for all tables)
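The transfer steps in this list can be sketched as a handful of rsync calls. This is a simplified illustration, not the real sync_data.sh: the `run` and `sync_steps` helpers and the `DRY_RUN` variable are hypothetical, the remote `user/` path is an assumption, and the CLAUDE.md, Python environment, and DuckDB steps are left out.

```shell
#!/usr/bin/env bash
# Sketch only: the download and upload order described in the list.
# SSH_HOST defaults to the 'data-analyst' alias; DRY_RUN=1 only prints the commands.
SSH_HOST="${SSH_HOST:-data-analyst}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

sync_steps() {
    local dir
    # Docs, scripts, and metadata are downloaded from the server.
    for dir in docs scripts metadata; do
        run rsync -avz "$SSH_HOST:server/$dir/" "./server/$dir/"
    done
    # Parquet data mirrors the server, so stale local files are removed.
    run rsync -avz --delete "$SSH_HOST:server/parquet/" ./server/parquet/
    # user/ is uploaded as a backup: no --delete, so nothing is removed remotely.
    # The remote path 'user/' is an assumption.
    run rsync -avz ./user/ "$SSH_HOST:user/"
}
```

Running `DRY_RUN=1 sync_steps` prints the five rsync commands without contacting the server, which makes the asymmetry between the mirrored download and the additive upload easy to see.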
Self-update mechanism: The script checks its own checksum before and after syncing scripts. If it detects it was updated, it exits with a message asking you to run sync again. This ensures you always run the latest sync logic.
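A minimal sketch of the before-and-after checksum comparison, assuming POSIX `cksum` as the hash (the real script may use a different tool). The names `file_checksum` and `self_update_check` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: detect whether a script file changed while another command ran.
# Assumption: cksum is good enough for change detection here.

file_checksum() {
    # Reading from stdin keeps the file name out of the output.
    cksum < "$1"
}

# Usage: self_update_check <script_path> <command that syncs scripts...>
# Returns 0 if the script is unchanged, 1 if it changed, 2 if the command failed.
self_update_check() {
    local script_path="$1"
    shift
    local before after
    before="$(file_checksum "$script_path")"
    "$@" || return 2
    after="$(file_checksum "$script_path")"
    if [ "$before" != "$after" ]; then
        echo "sync_data.sh was updated. Please run the sync again." >&2
        return 1
    fi
    return 0
}
```

A caller could wrap the script download step in it, for example `self_update_check "$0" rsync -avz data-analyst:server/scripts/ ./server/scripts/ || exit 0`, so that the remaining steps never run with outdated logic.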
DuckDB corruption recovery: If the DuckDB file is corrupted (e.g., by an interrupted sync), this is detected automatically and the file is recreated. All data is safe in the Parquet files; DuckDB contains only VIEW definitions.
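One way such a check could work is sketched below: try a trivial query, and if the database cannot be opened, delete the file so the views can be rebuilt. This assumes the duckdb command-line client is on PATH and exits non-zero when it cannot open the file; `ensure_valid_duckdb` and the `DUCKDB_BIN` override are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only: remove a DuckDB file that cannot be opened.
# Assumption: the duckdb CLI exits non-zero when the database file is unreadable.
# DUCKDB_BIN is a hypothetical override for the client binary.

ensure_valid_duckdb() {
    local db_path="$1"
    local duckdb_bin="${DUCKDB_BIN:-duckdb}"
    # A missing file is not corruption; it is created on the next view setup.
    [ -f "$db_path" ] || return 0
    # A trivial query is enough to force the file to be opened and read.
    if "$duckdb_bin" "$db_path" "SELECT 1;" >/dev/null 2>&1; then
        return 0
    fi
    echo "DuckDB file looks corrupted, removing: $db_path" >&2
    # Also drop a leftover write-ahead log next to the database, if any.
    rm -f "$db_path" "$db_path.wal"
    return 0
}
```

Deleting the file is safe in this setup only because it holds views rather than data; after `ensure_valid_duckdb user/duckdb/analytics.duckdb`, running setup_views.sh again would recreate the views.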
Development Scripts
dev_run.py
Flask development server with authentication bypass for local testing.
python3 scripts/dev_run.py
Starts a local Flask server at http://127.0.0.1:5000 with:
- Auth bypass routes (/dev-login, /dev-catalog) - no OAuth required
- Debug mode with hot reload
test_sync.sh
Test rsync reliability with the data server.
bash scripts/test_sync.sh # Full test sync
bash scripts/test_sync.sh --dry-run # Preview only
Typical Workflow
- First time setup: Follow bootstrap.yaml instructions
- Before analysis: Sync latest data
  bash server/scripts/sync_data.sh
- Analyze: Use DuckDB database at user/duckdb/analytics.duckdb