Merge: docs sweep — DEPLOYMENT.md rewrite, ONBOARDING v1.4.0, README links

2026-04-21 20:08:23 +02:00 · 2026-04-21 20:08:23 +02:00 · c1227df990
commit c1227df990
parent 0643437ab8 0121354596
3 changed files with 131 additions and 247 deletions
--- a/README.md
+++ b/README.md
@ -133,7 +133,11 @@ See `config/instance.yaml.example` for all available options.
 ## Documentation
- [Deployment Guide](docs/DEPLOYMENT.md) — server provisioning, Docker, environment setup
+- [Onboarding Guide](docs/ONBOARDING.md) — end-to-end Terraform deployment into a GCP project (recommended for production)
 - [Deployment Guide](docs/DEPLOYMENT.md) — chooses between Terraform and Docker Compose; covers OSS self-host
 - [Configuration Reference](docs/CONFIGURATION.md) — `instance.yaml`, env vars, per-instance options
 - [Architecture](docs/architecture.md) — orchestrator, extractors, DB layout
 - [Quickstart](docs/QUICKSTART.md) — local development
 ## Contributing
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@ -1,260 +1,128 @@
 # Deployment Guide
-## Server Requirements
+Agnes supports two deployment paths. Pick the one that matches your use case.
- Ubuntu 24.04 LTS
+## 1. Terraform — managed, multi-customer (recommended)
 - e2-small (2 vCPU, 2 GB RAM) or larger
 - 30 GB SSD boot disk
 - Docker + Docker Compose
 - Public IP with port 8000 open
-## Quick Deploy (GCP)
+For Keboola-operated deployments and anyone running Agnes for multiple customers on GCP.
-### 1. Create VM
+**Follow:** [`ONBOARDING.md`](ONBOARDING.md)
 Highlights:
 - Per-customer GCP project + private infra repo cloned from [`keboola/agnes-infra-template`](https://github.com/keboola/agnes-infra-template)
 - Reusable Terraform module `infra/modules/customer-instance` (versioned — `infra-vX.Y.Z` tags)
 - Prod + optional branch-aware dev VMs
 - Persistent SSD data disk with daily snapshots
 - Secret Manager for tokens (no plaintext in VM metadata)
 - OS Login for SSH, dedicated VM service account with scoped `secretAccessor`
 - Cron-based auto-upgrade (pulls `:stable` image digest every 5 min)
 - Caddy + Let's Encrypt TLS (opt-in with domain)
 - Uptime check + alert policy per VM (wire a notification channel to be paged)
 - CI/CD in the private repo: PR → `terraform plan`, merge to main → `apply-dev` auto, `apply-prod` gated by reviewer
 - First-boot bootstrap via `POST /auth/bootstrap`
 Target onboarding time: **< 1 hour** per customer.
 ## 2. Docker Compose — OSS self-host
 For running Agnes on your own VM / bare metal without Terraform. You're responsible for provisioning and maintenance.
 ### Prerequisites
 - Ubuntu 24.04 (or any Linux with Docker)
 - 2 vCPU, 2 GB RAM, 30 GB SSD minimum
 - Docker Engine + Compose plugin
 - Public IP with ports 80/443 (if using Caddy TLS) or 8000 (plain HTTP) open
 - Data-source credentials (e.g., Keboola Storage token)
 ### Steps
 1. Clone the Agnes repository:
   ```bash
-gcloud compute instances create data-analyst-dev \
+   git clone https://github.com/keboola/agnes-the-ai-analyst.git /opt/agnes
-  --project=YOUR_PROJECT \
+   cd /opt/agnes
  --zone=europe-west1-b \
  --machine-type=e2-small \
  --image-family=ubuntu-2404-lts-amd64 \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=30GB \
  --boot-disk-type=pd-ssd \
  --tags=data-analyst-dev
   ```
-### 2. Install Docker
+2. Create `.env`:
 ```bash
 curl -fsSL https://get.docker.com | sh
 sudo usermod -aG docker $USER
 # Log out and back in for group change to take effect
 ```
 ### 3. Set up deploy key
 Generate an SSH key for GitHub access:
 ```bash
 ssh-keygen -t ed25519 -f ~/.ssh/agnes_deploy -N "" -C "agnes-deploy"
 cat ~/.ssh/agnes_deploy.pub
 # Add the public key as a deploy key on the GitHub repo
 ```
 Configure SSH to use it:
 ```bash
 cat > ~/.ssh/config << 'EOF'
 Host github.com
  IdentityFile ~/.ssh/agnes_deploy
  StrictHostKeyChecking no
 EOF
 chmod 600 ~/.ssh/config
 ```
 ### 4. Clone and configure
 ```bash
 sudo mkdir -p /opt/data-analyst
 sudo chown $USER:$USER /opt/data-analyst
 git clone git@github.com:keboola/agnes-the-ai-analyst.git /opt/data-analyst
 cd /opt/data-analyst
 ```
 Create `.env`:
   ```bash
   cat > .env <<'EOF'
-JWT_SECRET_KEY=<generate: python3 -c "import secrets; print(secrets.token_hex(32))">
+   JWT_SECRET_KEY=$(openssl rand -hex 32)
   DATA_DIR=/data
   DATA_SOURCE=keboola
   KEBOOLA_STORAGE_TOKEN=<your-token>
   KEBOOLA_STACK_URL=<your-stack-url>
   SEED_ADMIN_EMAIL=<your-email>
   LOG_LEVEL=info
-KEBOOLA_STORAGE_TOKEN=<your-keboola-token>
+   AGNES_TAG=stable
 KEBOOLA_STACK_URL=<your-keboola-stack-url>
 SEED_ADMIN_EMAIL=<admin-email>
   EOF
   chmod 600 .env
   ```
-Create `config/instance.yaml` (optional, for Keboola source config):
+3. Mount a persistent disk at `/data` (optional but recommended — survives host rebuild). If you do, use the overlay:
   ```bash
-cp config/instance.yaml.example config/instance.yaml
+   docker compose \
-# Edit with your values
+       -f docker-compose.yml \
       -f docker-compose.prod.yml \
       -f docker-compose.host-mount.yml \
       up -d
   ```
-### 5. Create data directories
+   Without a persistent disk (data on Docker named volume, tied to boot disk):
 ```bash
 sudo mkdir -p /data/state /data/analytics /data/extracts
 sudo chown -R $USER:$USER /data
 ```
 ### 6. Build and start
 ```bash
 cd /opt/data-analyst
 docker compose up -d
 ```
 Wait for health check:
 ```bash
 curl -s http://localhost:8000/api/health | python3 -m json.tool
 ```
 ### 7. Bootstrap admin user
 ```bash
 curl -X POST http://localhost:8000/auth/bootstrap
 ```
 This creates the first admin user using `SEED_ADMIN_EMAIL` from `.env`.
 ### 8. Register tables and run first extraction
 Register tables via the admin API, then:
 ```bash
 # Stop app first — DuckDB only supports one writer
 docker compose down
 docker compose run --rm extract
 docker compose up -d
 ```
 ### 9. Open firewall (GCP)
 ```bash
 gcloud compute firewall-rules create allow-data-analyst-dev \
  --allow tcp:8000 \
  --target-tags=data-analyst-dev \
  --project=YOUR_PROJECT
 ```
 ## Production Deployment (pre-built images)
 Instead of building locally, use pre-built images from GitHub Container Registry:
   ```bash
   docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
   ```
-Pin to a specific version for rollback:
+4. Bootstrap your admin password via `POST /auth/bootstrap`:
   ```bash
-# Edit docker-compose.prod.yml, change :latest to a commit SHA
+   curl -X POST http://<host>:8000/auth/bootstrap \
-image: ghcr.io/keboola/agnes-the-ai-analyst:abc1234def
+       -H "Content-Type: application/json" \
       -d '{"email":"<your-email>","password":"<strong-password>"}'
   ```
-## HTTPS with Caddy (production)
+5. Open `http://<host>:8000/login` and sign in.
 ### TLS (optional)
 Set `DOMAIN` in `.env` + point your DNS A-record at the host, then start with the `tls` profile:
 Set your domain in `.env`:
 ```bash
-DOMAIN=data.yourcompany.com
+AGNES_DOMAIN=agnes.example.com ACME_EMAIL=admin@example.com \
    docker compose -f docker-compose.yml -f docker-compose.prod.yml --profile tls up -d
 ```
-Start with the production profile:
+### Upgrades (manual)
 ```bash
 docker compose --profile production up -d
 ```
 Caddy automatically provisions Let's Encrypt TLS certificates. Ensure ports 80 and 443 are open.
 ## Multi-Instance Deployment
 Each customer gets a separate VM with isolated data and config.
 ### Using Terraform
 1. Configure remote state: `cd infra && terraform init` (uses GCS backend)
 2. Create per-customer tfvars: `cp infra/terraform.tfvars.example infra/instances/customer.tfvars`
 3. Apply: `terraform workspace new customer && terraform apply -var-file=instances/customer.tfvars`
 4. The startup script creates `.env`, `instance.yaml`, and starts Docker Compose
 ### Manual
 1. Create VM and install Docker
 2. Clone repo and create `.env` from `config/.env.template`
 3. Create `config/instance.yaml` from `config/instance.yaml.example`
 4. Start: `docker compose -f docker-compose.yml -f docker-compose.prod.yml --profile production up -d`
 5. Bootstrap admin: `curl -X POST http://IP:8000/auth/bootstrap -H 'Content-Type: application/json' -d '{"email":"admin@customer.com","password":"initial-password"}'`
 ## Updating an Instance
 ```bash
-# Pull latest image
+cd /opt/agnes
 git pull
 docker compose -f docker-compose.yml -f docker-compose.prod.yml pull
 # Restart with new image (zero-downtime for stateless services)
 docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
 # Rollback: edit docker-compose.prod.yml to pin previous commit SHA
 ```
-Database migrations run automatically on startup.
+Or set up a cron job — see `infra/modules/customer-instance/startup-script.sh.tpl` for the reference implementation.
-## Important Notes
+## Which path should I pick?
-### DuckDB Write Locking
+| | Terraform | Docker Compose |
 |---|---|---|
 | Setup time | ~45 min first customer, ~15 min each subsequent | ~30 min |
 | Infra-as-Code | Full (all resources in git) | Partial (compose.yml only) |
 | Secret storage | GCP Secret Manager | `.env` file on host |
 | Upgrades | Auto via cron, gated prod apply | Manual `docker compose pull` |
 | Backups | Daily GCP snapshots, 30-day retention | You set up yourself |
 | Monitoring / alerts | GCP Uptime Checks + alert policy | You set up yourself |
 | TLS | Auto Caddy + LE | Auto Caddy + LE (same) |
 | Best for | Multi-tenant SaaS, production | Single-instance self-host, learning |
-DuckDB only supports one writer at a time. When running extraction:
+## Related documentation
-```bash
+- [`ONBOARDING.md`](ONBOARDING.md) — end-to-end Terraform onboarding checklist
-docker compose down          # Stop app + scheduler
+- [`CONFIGURATION.md`](CONFIGURATION.md) — `instance.yaml`, env vars, per-instance config
-docker compose run --rm extract   # Run extraction
+- [`architecture.md`](architecture.md) — internal architecture (orchestrator, extractors, DB layout)
-docker compose up -d         # Restart
+- [`QUICKSTART.md`](QUICKSTART.md) — local development setup
-```
+- [`superpowers/specs/2026-04-21-multi-customer-deployment-spec.md`](superpowers/specs/2026-04-21-multi-customer-deployment-spec.md) — design rationale for the multi-customer model
 The scheduler triggers extraction via the API, which handles locking internally.
 ### Environment Variable Changes
 `docker compose restart` does NOT reload `.env`. Use:
 ```bash
 docker compose down && docker compose up -d
 ```
 ### Services
 | Service | Profile | Description |
 |---------|---------|-------------|
 | `app` | default | FastAPI server on port 8000 |
 | `scheduler` | default | Periodic sync + extraction |
 | `extract` | extract | One-shot data extraction |
 | `telegram-bot` | full | Telegram notifications |
 | `ws-gateway` | full | WebSocket gateway |
 | `corporate-memory` | full | Knowledge collector |
 | `session-collector` | full | Session collection |
 Start all services: `docker compose --profile full up -d`
 ### Directory Structure on Server
 ```
 /opt/data-analyst/          # Git repo
  .env                      # Secrets (chmod 600)
  config/instance.yaml      # Instance config
 /data/                      # Persistent data (Docker volume)
  state/system.duckdb       # System state (users, registry, sync)
  analytics/server.duckdb   # Analytics views
  extracts/                 # Per-source extract.duckdb + parquets
    keboola/
    bigquery/
    jira/
 ```
 ## CI/CD
 Push to `main` triggers GitHub Actions:
 1. Run test suite (607 tests)
 2. Build Docker image
 3. Push to GHCR (`ghcr.io/keboola/agnes-the-ai-analyst`)
 4. Deploy via Kamal
 ## Monitoring
 - Health: `GET /api/health`
 - Logs: `docker compose logs -f app`
 - Disk: `df -h /data`
 - Tables: `curl -s http://localhost:8000/api/catalog | python3 -m json.tool`
--- a/docs/ONBOARDING.md
+++ b/docs/ONBOARDING.md
@ -83,13 +83,21 @@ Copy the example and fill it in:
 ```bash
 cp terraform/terraform.tfvars.example terraform/terraform.tfvars
-# Edit:
+# Required:
 #   gcp_project_id    = "<GCP_PROJECT_ID>"
 #   customer_name     = "<customer>"
 #   seed_admin_email  = "...@customer.com"
-#   (optionally) keboola_stack_url, prod_instance, dev_instances
+#   keboola_stack_url = "https://connection.<region>.gcp.keboola.com/"
 #
 # Optional (module infra-v1.4.0+):
 #   runtime_secrets            = ["keboola-storage-token"]  # empty if non-keboola data_source
 #   firewall_ssh_source_ranges = ["35.235.240.0/20"]        # IAP range; "0.0.0.0/0" if public SSH
 #   notification_channel_ids   = ["projects/<p>/notificationChannels/<id>"]
 #   compose_ref                = "main"                     # or a "stable-YYYY.MM.N" tag
 ```
 See the [module README](https://github.com/keboola/agnes-the-ai-analyst/tree/main/infra/modules/customer-instance) for the full variable schema.
 ## 5. First apply
 ```bash
@ -111,16 +119,20 @@ Output: `prod_ip` = external IP.
 ## 6. Bootstrap admin user
-On the first deploy the `users` table is empty. Create the first admin via `POST /auth/bootstrap` (this endpoint auto-disables once ≥1 user exists):
+On first boot the app auto-seeds an admin user from `SEED_ADMIN_EMAIL` — but *without a password*, which means nobody can log in yet. Activate it via `POST /auth/bootstrap`:
 ```bash
 PROD_IP=$(terraform output -raw prod_ip)
 curl -X POST "http://$PROD_IP:8000/auth/bootstrap" \
    -H "Content-Type: application/json" \
-    -d '{"email":"admin@<customer>.com","name":"Admin","password":"<STRONG_PASSWORD>"}'
+    -d '{"email":"<seed_admin_email from tfvars>","password":"<STRONG_PASSWORD>"}'
 ```
-Log in: `http://<prod_ip>:8000/login`.
+If the email matches the seed user, the endpoint sets its password and promotes to admin. If it doesn't match, a new admin is created. The endpoint self-deactivates once any user has a password — **so do this before exposing the URL**.
 Log in: `http://<prod_ip>:8000/login` with the email + password you just set.
 **Security:** The bootstrap endpoint is only disabled by a real password being set. Running `terraform destroy` + `apply` recreates the seed user and re-opens bootstrap — so if you destroy/recreate, a new attacker window opens until you re-run bootstrap.
 ## 7. DNS + TLS (optional)