Petr 26c4e0934d OSS cleanup: remove internal references, harden deployment, add config env interpolation	2026-03-09 07:59:57 +01:00

Phase 1 - Internal reference cleanup:
- Delete dev_docs/meetings/ (internal meeting notes/transcripts)
- Replace hardcoded usernames (padak/matejkys/dasa) with deploy/generic
- Replace "Internal AI Data Analyst" with "AI Data Analyst"
- Replace keboola/internal_ai_data_analyst URLs with your-org/ai-data-analyst
- Replace /tmp/keboola_load/ with /tmp/data_analyst_staging/ in dev_docs

Phase 2 - Deployment hardening:
- Tighten sudoers wildcards to explicit paths (visudo, sudoers cp)
- setup.sh creates all groups (data-ops, dataread, data-private) and deploy user
- webapp-setup.sh copies sudoers-webapp from repo instead of inline definition
- deploy.sh conditional copy for data_description.md (not in git for OSS)
- deploy.sh ownership changed to deploy:data-ops for /data/{scripts,docs,examples}

Phase 3 - Config and misc:
- Add ${ENV_VAR} interpolation to config/loader.py
- Expand config/instance.yaml.example with all sections (admins, deployment, auth, etc.)
- Create config/.env.template for secret values
- Add MIT LICENSE
- Fix .gitignore: add .venv/, docs/data_description.md
- Fix README.md: CSV status Planned, remove metrics/, update license text
- Translate Czech comments in requirements.txt to English
- Fix test_account_service.py: mock username mapping instead of relying on instance config

All 118 tests pass.
activate_venv.sh	Initial commit: OSS data distribution platform	2026-03-08 23:31:28 +01:00
backfill_gap.sh	Initial commit: OSS data distribution platform	2026-03-08 23:31:28 +01:00
collect_session.py	Initial commit: OSS data distribution platform	2026-03-08 23:31:28 +01:00
duckdb_manager.py	Initial commit: OSS data distribution platform	2026-03-08 23:31:28 +01:00
generate_user_sync_configs.py	Initial commit: OSS data distribution platform	2026-03-08 23:31:28 +01:00
init.sh	Initial commit: OSS data distribution platform	2026-03-08 23:31:28 +01:00
jira_backfill.py	Initial commit: OSS data distribution platform	2026-03-08 23:31:28 +01:00
jira_backfill_remote_links.py	Initial commit: OSS data distribution platform	2026-03-08 23:31:28 +01:00
jira_backfill_sla.py	Initial commit: OSS data distribution platform	2026-03-08 23:31:28 +01:00
jira_consistency_check.py	Initial commit: OSS data distribution platform	2026-03-08 23:31:28 +01:00
jira_poll_sla.py	Initial commit: OSS data distribution platform	2026-03-08 23:31:28 +01:00
README.md	OSS cleanup: remove internal references, harden deployment, add config env interpolation	2026-03-09 07:59:57 +01:00
setup_views.sh	Initial commit: OSS data distribution platform	2026-03-08 23:31:28 +01:00
sync_config_template.yaml	Initial commit: OSS data distribution platform	2026-03-08 23:31:28 +01:00
sync_data.sh	Initial commit: OSS data distribution platform	2026-03-08 23:31:28 +01:00
sync_jira.sh	Initial commit: OSS data distribution platform	2026-03-08 23:31:28 +01:00
update.sh	Initial commit: OSS data distribution platform	2026-03-08 23:31:28 +01:00

README.md

Scripts

Helper scripts for working with the AI Data Analyst project.

These scripts are synced from the server into server/scripts/ on the analyst's machine.

Available Scripts

`setup_views.sh`

Initialize or refresh DuckDB views on Parquet files.

bash server/scripts/setup_views.sh
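
To illustrate what "views on Parquet files" can look like, here is a minimal sketch that prints one `CREATE OR REPLACE VIEW` statement per Parquet file. It is not the actual content of `setup_views.sh`: the function name `generate_view_sql` is hypothetical, and it assumes a flat directory where each `<name>.parquet` maps to a view called `<name>` and paths contain no single quotes.

```shell
# Hypothetical helper, not taken from setup_views.sh.
# Prints one CREATE OR REPLACE VIEW statement per *.parquet file in a directory.
# Assumes a flat layout and paths without single quotes.
generate_view_sql() {
    local parquet_dir="$1"
    local file name
    for file in "$parquet_dir"/*.parquet; do
        # When the glob matches nothing, $file is the literal pattern: skip it.
        [ -e "$file" ] || continue
        name="$(basename "$file" .parquet)"
        printf "CREATE OR REPLACE VIEW \"%s\" AS SELECT * FROM read_parquet('%s');\n" "$name" "$file"
    done
}
```

The output is plain SQL, so it can be reviewed before it is fed to DuckDB. Because every statement uses `CREATE OR REPLACE`, running the generated SQL twice should leave the same set of views.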

`sync_data.sh`

Synchronize data from server, upload user files, and refresh DuckDB.

# Recommended: update scripts first, then sync
rsync -avz data-analyst:server/scripts/ ./server/scripts/   # Linux/macOS
scp -r data-analyst:server/scripts/* ./server/scripts/      # Windows fallback
bash server/scripts/sync_data.sh

# Other options:
bash server/scripts/sync_data.sh --dry-run  # Preview what would be synced (no changes)
bash server/scripts/sync_data.sh --push     # Only upload user/ to server

What sync does:

1. Self-update check: detects whether sync_data.sh changed and, if so, asks you to re-run it
2. Downloads server/docs/, server/scripts/, server/metadata/ from the server
3. Updates CLAUDE.md from the latest template
4. Downloads server/parquet/ data files (with --delete to remove old files)
5. Uploads the user/ directory to the server (backup, no --delete)
6. Syncs Python environment to server
7. Validates DuckDB: if the file is corrupted, deletes it and recreates it from the Parquet files
8. Reinitializes DuckDB views (CREATE OR REPLACE VIEW for all tables)
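
The asymmetry between the download and upload steps (mirroring deletions on the way down, never deleting on the way up) can be sketched as follows. This is an illustration under assumptions, not the real script: `build_rsync_cmd` is a hypothetical name, and the remote paths `data-analyst:server/parquet/` and `data-analyst:user/` are guesses modeled on the `data-analyst:server/scripts/` path shown earlier. The function only prints the command, it does not run it.

```shell
# Hypothetical helper: prints the rsync command for one sync direction.
#   direction: "pull" (server -> local, mirrors deletions)
#              "push" (local -> server, never deletes remote files)
#   dry_run:   "1" adds --dry-run so rsync only reports what it would do
# Remote paths below are assumptions, not confirmed by the README.
build_rsync_cmd() {
    local direction="$1" dry_run="$2" flags="-avz"
    if [ "$dry_run" = "1" ]; then
        flags="$flags --dry-run"
    fi
    case "$direction" in
        pull) echo "rsync $flags --delete data-analyst:server/parquet/ ./server/parquet/" ;;
        push) echo "rsync $flags ./user/ data-analyst:user/" ;;
        *)    echo "unknown direction: $direction" >&2; return 2 ;;
    esac
}
```

Keeping `--delete` out of the push direction means a file removed locally by mistake still exists in the server copy, which matches the "backup" wording in the list above.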

Self-update mechanism: The script checks its own checksum before and after syncing scripts. If it detects it was updated, it exits with a message asking you to run sync again. This ensures you always run the latest sync logic.
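
A minimal sketch of such a before/after checksum comparison, assuming nothing about the real implementation: the function names are hypothetical, and `cksum` is used only because it is available on both Linux and macOS (the actual script may use a different hash).

```shell
# Hypothetical sketch of a self-update check; not copied from sync_data.sh.

# Print a checksum of the given file.
script_checksum() {
    cksum < "$1"
}

# Usage: sync_with_self_check <script_path> <command> [args...]
# Runs the command that refreshes the scripts directory, then compares the
# script's checksum from before and after.
# Returns 0 if the script is unchanged, 1 if it was updated by the command.
sync_with_self_check() {
    local script_path="$1"
    shift
    local before after
    before="$(script_checksum "$script_path")"
    "$@"
    after="$(script_checksum "$script_path")"
    if [ "$before" != "$after" ]; then
        echo "Sync script was updated. Please run the sync again." >&2
        return 1
    fi
    return 0
}
```

A caller would typically stop right after a non-zero return, so that the remaining steps are never executed by an outdated copy of the script.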

DuckDB corruption recovery: If the DuckDB file is corrupted (e.g., by an interrupted sync), the corruption is detected automatically and the file is recreated. All data is safe in the Parquet files; DuckDB only contains VIEW definitions.
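
The recovery logic can be sketched as a validate-then-rebuild step. This is a hedged illustration only: `recover_database` and its two hook arguments are hypothetical, and how the real script validates the file is not stated in this README. The sketch relies on the property described above, namely that the database holds only view definitions and can therefore be deleted without data loss.

```shell
# Hypothetical sketch; the hooks stand in for whatever the real script does.
# Usage: recover_database <db_path> <validate_cmd> <rebuild_cmd>
#   validate_cmd: called with db_path, must return non-zero if the file is unusable
#   rebuild_cmd:  called with db_path, must recreate the file and its views
recover_database() {
    local db_path="$1" validate_cmd="$2" rebuild_cmd="$3"
    # Delete the file only when it exists and fails validation.
    if [ -f "$db_path" ] && ! "$validate_cmd" "$db_path"; then
        echo "Database looks corrupted, recreating: $db_path" >&2
        rm -f "$db_path"
    fi
    # Rebuild when the file is missing (first run, or just deleted above).
    if [ ! -f "$db_path" ]; then
        "$rebuild_cmd" "$db_path"
    fi
}
```

In practice the validation hook might open the file and run a trivial query, and the rebuild hook might re-run the view setup; both are assumptions here.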

Typical Workflow

1. First-time setup: follow the bootstrap.yaml instructions
2. Before analysis: sync the latest data
```
bash server/scripts/sync_data.sh
```
3. Analyze: use the DuckDB database at user/duckdb/analytics.duckdb