diff --git a/docs/auto-install.md b/docs/auto-install.md new file mode 100644 index 0000000..6ff64e7 --- /dev/null +++ b/docs/auto-install.md @@ -0,0 +1,326 @@ +# Automated Installation Log + +Step-by-step record of deploying the platform on a clean Ubuntu 24.04 VM (DigitalOcean). + +## Infrastructure + +- **Provider**: DigitalOcean +- **Droplet**: s-1vcpu-2gb (1 vCPU, 2GB RAM, 50GB disk) +- **Region**: ams3 (Amsterdam) +- **OS**: Ubuntu 24.04.3 LTS +- **IP**: 165.22.199.226 + +## Prerequisites Discovered + +1. **DigitalOcean API token needs `ssh_key` scope** to register SSH keys + - Without `ssh_keys` field in droplet creation, DO forces password expiry + - Cloud-init `user_data` cannot override this (DO scripts run after cloud-init) + - Solution: register key via `/v2/account/keys`, then reference in `ssh_keys` array + +2. **`python3-venv` must be installed** before `server/setup.sh` + - Ubuntu 24.04 doesn't include it by default + - Fix: `apt install python3.12-venv` before running setup + +## Step 0: Create GitHub Repo & Push + +```bash +# Repo was created on GitHub: padak/tmp_oss (private) +git remote add origin https://github.com/padak/tmp_oss.git +git push -u origin main +``` + +## Step 1: VM Setup + +### 1a: Create Droplet via API + +```bash +# First: register SSH key (requires ssh_key scope) +curl -s -X POST -H 'Content-Type: application/json' \ + -H "Authorization: Bearer $DO_TOKEN" \ + -d '{"name":"my-key","public_key":"ssh-ed25519 AAAA..."}' \ + "https://api.digitalocean.com/v2/account/keys" + +# Get key ID from response, then create droplet +curl -s -X POST -H 'Content-Type: application/json' \ + -H "Authorization: Bearer $DO_TOKEN" \ + -d '{ + "name":"oss-devel-1", + "size":"s-1vcpu-2gb", + "region":"ams3", + "image":"ubuntu-24-04-x64", + "ssh_keys":["KEY_ID_OR_FINGERPRINT"] + }' \ + "https://api.digitalocean.com/v2/droplets" +``` + +### 1b: Install Prerequisites + +```bash +ssh root@DROPLET_IP + +# Wait for apt lock to release (auto-updates run on first boot) +# Then install python3-venv +apt install -y python3.12-venv python3-pip +``` + +### 1c: Clone Repo & Run Setup + +```bash +# Generate deploy key on VM +ssh-keygen -t ed25519 -f /root/.ssh/deploy_key -N "" +# Add deploy_key.pub to GitHub repo as deploy key + +# Configure SSH for GitHub +cat > /root/.ssh/config << 'EOF' +Host github.com + IdentityFile /root/.ssh/deploy_key + StrictHostKeyChecking no +EOF + +# Clone and setup +git clone git@github.com:YOUR_ORG/YOUR_REPO.git /opt/data-analyst/repo +cd /opt/data-analyst/repo +REPO_URL="git@github.com:YOUR_ORG/YOUR_REPO.git" bash server/setup.sh +``` + +### Step 1 Checklist Results + +| # | Check | Result | +|---|-------|--------| +| 1.1 | Groups created | data-ops, dataread, data-private - OK | +| 1.2 | Deploy user exists | uid=999(deploy), groups: deploy, data-ops - OK | +| 1.3 | Directory structure | /opt/data-analyst/{repo,.venv,logs} - OK | +| 1.4 | Python venv works | Flask 3.1.3 loaded - OK | +| 1.5 | Management scripts | add-analyst, list-analysts, add-admin, remove-analyst - OK | + +## Step 2: Webapp Setup + +### 2a: Run webapp-setup.sh + +```bash +export SERVER_HOSTNAME="165.22.199.226" # or your domain +bash server/webapp-setup.sh +``` + +**Issue**: Nginx config assumes SSL/domain. For IP-only testing, replace with HTTP-only config: + +```bash +cat > /etc/nginx/sites-available/webapp << 'NGINX' +server { + listen 80; + server_name _; + + location / { + proxy_pass http://unix:/run/webapp/webapp.sock; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + proxy_http_version 1.1; + proxy_set_header Upgrade $http_upgrade; + proxy_set_header Connection "upgrade"; + } + + location /static/ { + alias /opt/data-analyst/repo/webapp/static/; + expires 1d; + } + + location /health { + proxy_pass http://unix:/run/webapp/webapp.sock; + proxy_set_header Host $host; + access_log off; + } +} +NGINX + +rm -f /etc/nginx/sites-enabled/default +nginx -t && systemctl restart nginx +``` + +### 2b: Configure .env + +```bash +SECRET_KEY=$(python3 -c 'import secrets; print(secrets.token_hex(32))') + +cat > /opt/data-analyst/.env << EOF +WEBAPP_SECRET_KEY="${SECRET_KEY}" +SERVER_HOST="YOUR_IP" +SERVER_HOSTNAME="YOUR_IP_OR_DOMAIN" +GOOGLE_CLIENT_ID="your-google-client-id" +GOOGLE_CLIENT_SECRET="your-google-client-secret" +DATA_SOURCE="local" +DATA_DIR="/data/src_data" +EOF + +chown root:data-ops /opt/data-analyst/.env +chmod 640 /opt/data-analyst/.env +``` + +### 2c: Create Data Directories & Start + +```bash +mkdir -p /data/src_data/{parquet,metadata} /data/docs /data/scripts +chown -R root:data-ops /data +chmod -R 2775 /data + +mkdir -p /run/webapp +chown www-data:www-data /run/webapp + +systemctl daemon-reload +systemctl start webapp +systemctl enable webapp +``` + +### Step 2 Checklist Results + +| # | Check | Result | +|---|-------|--------| +| 2.1 | Nginx running | active - OK | +| 2.2 | Webapp running | active (gunicorn with 2 workers) - OK | +| 2.3 | SSL cert | SKIPPED (IP-only, no domain) | +| 2.4 | Health endpoint | Returns JSON with disk/load/services - OK | +| 2.5 | Login page loads | HTTP 200 - OK | + +## Issues Found & Fixes + +### Issue 1: `python3-venv` not installed +- **Symptom**: `server/setup.sh` fails at venv creation +- **Fix**: `apt install python3.12-venv` before running setup +- **TODO**: Add to `server/setup.sh` package list + +### Issue 2: Nginx SSL config with IP address +- **Symptom**: Nginx fails to start - no SSL cert for "YOUR_DOMAIN" +- **Fix**: Replace nginx config with HTTP-only version for IP-only deployments +- **TODO**: `webapp-setup.sh` should detect IP vs domain and generate appropriate config + +### Issue 3: DigitalOcean cloud-init limitations +- **Symptom**: `user_data` cloud-init cannot prevent password expiry +- **Fix**: Must use `ssh_keys` API field with registered key +- **Lesson**: DO initialization scripts run after cloud-init and override password settings + +## Step 3: Instance Configuration + +```bash +cat > /opt/data-analyst/repo/config/instance.yaml << 'YAML' +instance: + name: "OSS Data Analyst" + subtitle: "Test Deployment" + copyright: "Test" + +server: + hostname: "165.22.199.226" + host: "165.22.199.226" + app_dir: "/opt/data-analyst" + +auth: + allowed_domain: "test.com" # any domain for testing + webapp_secret_key: "${WEBAPP_SECRET_KEY}" + +data_source: + type: "local" + +catalog: + categories: {} +YAML + +systemctl restart webapp +``` + +### Step 3 Checklist Results + +| # | Check | Result | +|---|-------|--------| +| 3.1 | Config loads | OK - webapp starts without errors | +| 3.2 | Instance name shown | "OSS Data Analyst" on login page | + +## Step 4: Authentication (Email Magic Link) + +No Google OAuth needed! The email magic link provider works without any external service. + +### How it works + +1. Login page shows "Sign in with Email" button +2. User enters email with allowed domain (e.g., `user@test.com`) +3. System generates a signed magic link (valid 15 minutes) +4. Without SMTP: link shown directly in browser (development mode) +5. With SMTP: link sent via email +6. Click link -> logged in, redirected to dashboard + +### Test Results + +```bash +# Login page shows both providers +curl -s http://localhost/login | grep 'Sign in with' +# Sign in with Google +# Sign in with Email + +# Email form accessible +curl -s -o /dev/null -w "%{http_code}" http://localhost/login/email +# 200 + +# Magic link generated (dev mode - shown in browser) +curl -s -X POST -d "email=admin@test.com" http://localhost/login/email/send +# Shows magic link URL + +# Click magic link -> redirect to dashboard +curl -s -D - -L -c cookies.txt "http://localhost/login/email/verify/TOKEN" +# HTTP 302 -> /dashboard -> HTTP 200 +``` + +### Step 4 Checklist Results + +| # | Check | Result | +|---|-------|--------| +| 4.1 | Email auth available | "Sign in with Email" shown on login page | +| 4.2 | Magic link generated | Token URL generated for valid domain email | +| 4.3 | Domain restriction | Rejects emails from wrong domain | +| 4.4 | Login works | Magic link redirects to /dashboard with session | +| 4.5 | Dev mode works | Link shown in browser when no SMTP configured | + +## Issues Found & Fixes + +### Issue 1: `python3-venv` not installed +- **Symptom**: `server/setup.sh` fails at venv creation +- **Fix**: `apt install python3.12-venv` before running setup +- **TODO**: Add to `server/setup.sh` package list + +### Issue 2: Nginx SSL config with IP address +- **Symptom**: Nginx fails to start - no SSL cert for "YOUR_DOMAIN" +- **Fix**: Replace nginx config with HTTP-only version for IP-only deployments +- **TODO**: `webapp-setup.sh` should detect IP vs domain and generate appropriate config + +### Issue 3: DigitalOcean cloud-init limitations +- **Symptom**: `user_data` cloud-init cannot prevent password expiry +- **Fix**: Must use `ssh_keys` API field with registered key +- **Lesson**: DO initialization scripts run after cloud-init and override password settings + +## Current State + +- **Step 0**: GitHub repo created and pushed - DONE +- **Step 1**: VM setup (groups, users, venv, scripts) - DONE +- **Step 2**: Webapp (nginx, gunicorn, .env) - DONE +- **Step 3**: Instance configuration - DONE +- **Step 4**: Authentication (email magic link) - DONE +- **Step 5+**: Discovery API, Table Registry, Data Sync - NEXT (needs data source) + +## Server Access + +```bash +ssh -i ~/.ssh/id_ed25519 root@165.22.199.226 +``` + +## Quick Verification + +```bash +# Health check +curl http://165.22.199.226/health | python3 -m json.tool + +# Login page +curl -s -o /dev/null -w "%{http_code}" http://165.22.199.226/login +# Expected: 200 + +# Email auth form +curl -s -o /dev/null -w "%{http_code}" http://165.22.199.226/login/email +# Expected: 200 +```