ZdenekSrotyr c8e232e43e docs: update stale v1 docs to v2 Docker/FastAPI/DuckDB architecture

- CONFIGURATION.md: remove Flask/SendGrid/WEBAPP_SECRET_KEY references,
  update env vars to JWT_SECRET_KEY and SESSION_SECRET, point to
  config/.env.template and config/instance.yaml.example
- disaster-recovery.md: rewrite for Docker volumes; cover GCP disk
  snapshot backup/restore and full VM rebuild; drop systemd/nginx/SSH
- server.md: strip rsync, systemd, nginx, Linux group, and sudo
  sections; keep Docker Compose operations, log viewing, health checks,
  sync/admin CLI, and Jira webhook procedures

2026-04-09 18:44:25 +02:00

4.9 KiB

Raw Blame History

Disaster Recovery

Recovery procedures for the AI Data Analyst Docker deployment.

Overview

What lives where:
  Docker volumes  /data        DuckDB files, parquet extracts, state
  Git             repo/        Application code — rebuild from GitHub
  .env            secrets      Recreate from GitHub Secrets / 1Password

Key principle: the container is disposable. All unique data lives in the /data Docker volume (or a GCP persistent disk mounted at /data). Re-pulling the image and restoring /data brings the service back to full operation.

Data Layout

Path	Content	Backup
`/data/state/system.duckdb`	Table registry, users, sync state	Daily snapshot
`/data/analytics/server.duckdb`	Master analytics DB (views)	Regenerated on start
`/data/extracts/*/extract.duckdb`	Per-source extract DBs	Daily snapshot
`/data/extracts//data/.parquet`	Parquet files (local sources)	Daily snapshot

analytics/server.duckdb is rebuilt automatically by SyncOrchestrator.rebuild() on every startup, so it does not need to be backed up separately.

Scenario A: Container Crash / Bad Deploy

Impact: Service down, data intact.

Recovery time: ~2 minutes

# Pull latest image and restart
docker compose pull
docker compose up -d

# Check health
curl https://your-instance.example.com/health

If a bad image was pushed, roll back to the previous tag:

docker compose down
# Edit docker-compose.yml to pin the previous image tag
docker compose up -d

Scenario B: /data Volume Corruption or Loss

Impact: All DuckDB state and parquet data lost.

Recovery time: ~10 minutes (from snapshot) or ~30 minutes (regenerate from source)

Option 1: Restore from GCP disk snapshot (faster)

# Find latest snapshot
gcloud compute snapshots list --project=your-gcp-project \
  --filter="sourceDisk:data-disk" --sort-by=~creationTimestamp --limit=5

# Create new disk from snapshot
gcloud compute disks create data-disk \
  --project=your-gcp-project \
  --zone=europe-north1-a \
  --source-snapshot=SNAPSHOT_NAME \
  --type=pd-balanced

# Attach to VM and mount
gcloud compute instances attach-disk your-server \
  --project=your-gcp-project \
  --zone=europe-north1-a \
  --disk=data-disk

# Restart containers
docker compose up -d

Option 2: Regenerate from source

# Start with empty /data volume
docker compose up -d

# Trigger a full sync from the data source
curl -X POST http://localhost:8000/api/sync/trigger
# Or via CLI:
docker compose exec app da sync

DuckDB extract files and parquet will be repopulated from Keboola / BigQuery. system.duckdb (table registry, users) must be restored from snapshot if not regenerated — user accounts and table definitions are not recreated by sync.

Scenario C: Complete VM Loss

Recovery time: ~20 minutes

Create new VM (or use managed instance group):

gcloud compute instances create your-server \
  --project=your-gcp-project \
  --zone=europe-north1-a \
  --machine-type=e2-medium \
  --image-family=debian-12 \
  --image-project=debian-cloud

Install Docker:
```
curl -fsSL https://get.docker.com | sh
```

Attach and mount the data disk (or restore from snapshot per Scenario B):

gcloud compute instances attach-disk your-server \
  --project=your-gcp-project --zone=europe-north1-a --disk=data-disk
# Add mount to /etc/fstab and mount /data

Clone repo and create .env:

git clone git@github.com:your-org/ai-data-analyst.git /opt/data-analyst
cd /opt/data-analyst
cp config/.env.template .env
# Fill in secrets from GitHub Secrets / 1Password

Start the stack:
```
docker compose up -d
```
Update DNS if the external IP changed:
- A record for your-instance.example.com

Verification Checklist

After any recovery, verify:

docker compose ps — all services Up
https://your-instance.example.com/health returns {"status": "ok"}
Login works (Google OAuth or email magic link)
At least one table appears in the data catalog
docker compose logs app — no ERROR lines at startup

Preventive Measures

GCP snapshots: Daily automatic snapshots of the /data persistent disk (14-day retention). Configure via:

gcloud compute resource-policies create snapshot-schedule daily-backup \
  --project=your-gcp-project \
  --region=europe-north1 \
  --max-retention-days=14 \
  --on-source-disk-delete=keep-auto-snapshots \
  --daily-schedule \
  --start-time=03:00
gcloud compute disks add-resource-policies data-disk \
  --project=your-gcp-project --zone=europe-north1-a \
  --resource-policies=daily-backup

Secrets in GitHub / 1Password: .env is never committed; recreate from stored secrets
Image tags: Pin a known-good image tag in docker-compose.yml before each deploy

4.9 KiB Raw Blame History