From 01213545967ca9bc18cc14717eef4b94676c2e4f Mon Sep 17 00:00:00 2001
From: ZdenekSrotyr
Date: Tue, 21 Apr 2026 20:07:43 +0200
Subject: [PATCH] docs: refresh DEPLOYMENT.md and ONBOARDING.md for infra-v1.4.0
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- docs/DEPLOYMENT.md: rewritten to pick between Terraform (managed) and Docker Compose (OSS self-host). Old manual SSH-key-and-git-clone flow replaced with compose-based instructions pointing at the persistent-disk overlay and bootstrap endpoint.
- docs/ONBOARDING.md: section 4 now documents the new v1.4.0 variables (runtime_secrets, firewall_ssh_source_ranges, notification_channel_ids, compose_ref). Section 6 explains the /auth/bootstrap seed-user fix and warns that destroy+apply reopens the bootstrap window until run again.
- README.md: Documentation list expanded — ONBOARDING.md first (recommended path), DEPLOYMENT.md as the branching point, plus links to CONFIGURATION, architecture, and QUICKSTART.
---
 README.md          |   6 +-
 docs/DEPLOYMENT.md | 350 ++++++++++++++-------------------------------
 docs/ONBOARDING.md |  22 ++-
 3 files changed, 131 insertions(+), 247 deletions(-)

diff --git a/README.md b/README.md
index 730f2fb..e8598f7 100644
--- a/README.md
+++ b/README.md
@@ -133,7 +133,11 @@ See `config/instance.yaml.example` for all available options.
 
 ## Documentation
 
-- [Deployment Guide](docs/DEPLOYMENT.md) — server provisioning, Docker, environment setup
+- [Onboarding Guide](docs/ONBOARDING.md) — end-to-end Terraform deployment into a GCP project (recommended for production)
+- [Deployment Guide](docs/DEPLOYMENT.md) — chooses between Terraform and Docker Compose; covers OSS self-host
+- [Configuration Reference](docs/CONFIGURATION.md) — `instance.yaml`, env vars, per-instance options
+- [Architecture](docs/architecture.md) — orchestrator, extractors, DB layout
+- [Quickstart](docs/QUICKSTART.md) — local development
 
 ## Contributing
 
diff --git a/docs/DEPLOYMENT.md b/docs/DEPLOYMENT.md
index 4859991..029dfd9 100644
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -1,260 +1,128 @@
 # Deployment Guide
 
-## Server Requirements
+Agnes supports two deployment paths. Pick the one that matches your use case.
 
-- Ubuntu 24.04 LTS
-- e2-small (2 vCPU, 2 GB RAM) or larger
-- 30 GB SSD boot disk
-- Docker + Docker Compose
-- Public IP with port 8000 open
+## 1. Terraform — managed, multi-customer (recommended)
 
-## Quick Deploy (GCP)
+For Keboola-operated deployments and anyone running Agnes for multiple customers on GCP.
 
-### 1. Create VM
+**Follow:** [`ONBOARDING.md`](ONBOARDING.md)
+
+Highlights:
+- Per-customer GCP project + private infra repo cloned from [`keboola/agnes-infra-template`](https://github.com/keboola/agnes-infra-template)
+- Reusable Terraform module `infra/modules/customer-instance` (versioned — `infra-vX.Y.Z` tags)
+- Prod + optional branch-aware dev VMs
+- Persistent SSD data disk with daily snapshots
+- Secret Manager for tokens (no plaintext in VM metadata)
+- OS Login for SSH, dedicated VM service account with scoped `secretAccessor`
+- Cron-based auto-upgrade (pulls `:stable` image digest every 5 min)
+- Caddy + Let's Encrypt TLS (opt-in with domain)
+- Uptime check + alert policy per VM (wire a notification channel to be paged)
+- CI/CD in the private repo: PR → `terraform plan`, merge to main → `apply-dev` auto, `apply-prod` gated by reviewer
+- First-boot bootstrap via `POST /auth/bootstrap`
+
+Target onboarding time: **< 1 hour** per customer.
+
+## 2. Docker Compose — OSS self-host
+
+For running Agnes on your own VM / bare metal without Terraform. You're responsible for provisioning and maintenance.
+
+### Prerequisites
+
+- Ubuntu 24.04 (or any Linux with Docker)
+- 2 vCPU, 2 GB RAM, 30 GB SSD minimum
+- Docker Engine + Compose plugin
+- Public IP with ports 80/443 (if using Caddy TLS) or 8000 (plain HTTP) open
+- Data-source credentials (e.g., Keboola Storage token)
+
+### Steps
+
+1. Clone the Agnes repository:
+
+   ```bash
+   git clone https://github.com/keboola/agnes-the-ai-analyst.git /opt/agnes
+   cd /opt/agnes
+   ```
+
+2. Create `.env`:
+
+   ```bash
+   cat > .env <<EOF
+   JWT_SECRET_KEY=$(openssl rand -hex 32)
+   DATA_DIR=/data
+   DATA_SOURCE=keboola
+   KEBOOLA_STORAGE_TOKEN=
+   KEBOOLA_STACK_URL=
+   SEED_ADMIN_EMAIL=
+   LOG_LEVEL=info
+   AGNES_TAG=stable
+   EOF
+   chmod 600 .env
+   ```
+
+3. Mount a persistent disk at `/data` (optional but recommended — survives host rebuild). If you do, use the overlay:
+
+   ```bash
+   docker compose \
+     -f docker-compose.yml \
+     -f docker-compose.prod.yml \
+     -f docker-compose.host-mount.yml \
+     up -d
+   ```
+
+   Without a persistent disk (data on Docker named volume, tied to boot disk):
+
+   ```bash
+   docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
+   ```
+
+4. Bootstrap your admin password via `POST /auth/bootstrap`:
+
+   ```bash
+   curl -X POST http://:8000/auth/bootstrap \
+     -H "Content-Type: application/json" \
+     -d '{"email":"","password":""}'
+   ```
+
+5. Open `http://:8000/login` and sign in.
+
+### TLS (optional)
+
+Set `DOMAIN` in `.env` + point your DNS A-record at the host, then start with the `tls` profile:
 
 ```bash
-gcloud compute instances create data-analyst-dev \
-  --project=YOUR_PROJECT \
-  --zone=europe-west1-b \
-  --machine-type=e2-small \
-  --image-family=ubuntu-2404-lts-amd64 \
-  --image-project=ubuntu-os-cloud \
-  --boot-disk-size=30GB \
-  --boot-disk-type=pd-ssd \
-  --tags=data-analyst-dev
+AGNES_DOMAIN=agnes.example.com ACME_EMAIL=admin@example.com \
+  docker compose -f docker-compose.yml -f docker-compose.prod.yml --profile tls up -d
 ```
 
-### 2. Install Docker
+### Upgrades (manual)
 
 ```bash
-curl -fsSL https://get.docker.com | sh
-sudo usermod -aG docker $USER
-# Log out and back in for group change to take effect
-```
-
-### 3. Set up deploy key
-
-Generate an SSH key for GitHub access:
-
-```bash
-ssh-keygen -t ed25519 -f ~/.ssh/agnes_deploy -N "" -C "agnes-deploy"
-cat ~/.ssh/agnes_deploy.pub
-# Add the public key as a deploy key on the GitHub repo
-```
-
-Configure SSH to use it:
-
-```bash
-cat > ~/.ssh/config << 'EOF'
-Host github.com
-  IdentityFile ~/.ssh/agnes_deploy
-  StrictHostKeyChecking no
-EOF
-chmod 600 ~/.ssh/config
-```
-
-### 4. Clone and configure
-
-```bash
-sudo mkdir -p /opt/data-analyst
-sudo chown $USER:$USER /opt/data-analyst
-git clone git@github.com:keboola/agnes-the-ai-analyst.git /opt/data-analyst
-cd /opt/data-analyst
-```
-
-Create `.env`:
-
-```bash
-cat > .env << 'EOF'
-JWT_SECRET_KEY=
-DATA_DIR=/data
-LOG_LEVEL=info
-KEBOOLA_STORAGE_TOKEN=
-KEBOOLA_STACK_URL=
-SEED_ADMIN_EMAIL=
-EOF
-chmod 600 .env
-```
-
-Create `config/instance.yaml` (optional, for Keboola source config):
-
-```bash
-cp config/instance.yaml.example config/instance.yaml
-# Edit with your values
-```
-
-### 5. Create data directories
-
-```bash
-sudo mkdir -p /data/state /data/analytics /data/extracts
-sudo chown -R $USER:$USER /data
-```
-
-### 6. Build and start
-
-```bash
-cd /opt/data-analyst
-docker compose up -d
-```
-
-Wait for health check:
-
-```bash
-curl -s http://localhost:8000/api/health | python3 -m json.tool
-```
-
-### 7. Bootstrap admin user
-
-```bash
-curl -X POST http://localhost:8000/auth/bootstrap
-```
-
-This creates the first admin user using `SEED_ADMIN_EMAIL` from `.env`.
-
-### 8. Register tables and run first extraction
-
-Register tables via the admin API, then:
-
-```bash
-# Stop app first — DuckDB only supports one writer
-docker compose down
-docker compose run --rm extract
-docker compose up -d
-```
-
-### 9. Open firewall (GCP)
-
-```bash
-gcloud compute firewall-rules create allow-data-analyst-dev \
-  --allow tcp:8000 \
-  --target-tags=data-analyst-dev \
-  --project=YOUR_PROJECT
-```
-
-## Production Deployment (pre-built images)
-
-Instead of building locally, use pre-built images from GitHub Container Registry:
-
-```bash
-docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
-```
-
-Pin to a specific version for rollback:
-```bash
-# Edit docker-compose.prod.yml, change :latest to a commit SHA
-image: ghcr.io/keboola/agnes-the-ai-analyst:abc1234def
-```
-
-## HTTPS with Caddy (production)
-
-Set your domain in `.env`:
-```bash
-DOMAIN=data.yourcompany.com
-```
-
-Start with the production profile:
-```bash
-docker compose --profile production up -d
-```
-
-Caddy automatically provisions Let's Encrypt TLS certificates. Ensure ports 80 and 443 are open.
-
-## Multi-Instance Deployment
-
-Each customer gets a separate VM with isolated data and config.
-
-### Using Terraform
-
-1. Configure remote state: `cd infra && terraform init` (uses GCS backend)
-2. Create per-customer tfvars: `cp infra/terraform.tfvars.example infra/instances/customer.tfvars`
-3. Apply: `terraform workspace new customer && terraform apply -var-file=instances/customer.tfvars`
-4. The startup script creates `.env`, `instance.yaml`, and starts Docker Compose
-
-### Manual
-
-1. Create VM and install Docker
-2. Clone repo and create `.env` from `config/.env.template`
-3. Create `config/instance.yaml` from `config/instance.yaml.example`
-4. Start: `docker compose -f docker-compose.yml -f docker-compose.prod.yml --profile production up -d`
-5. Bootstrap admin: `curl -X POST http://IP:8000/auth/bootstrap -H 'Content-Type: application/json' -d '{"email":"admin@customer.com","password":"initial-password"}'`
-
-## Updating an Instance
-
-```bash
-# Pull latest image
+cd /opt/agnes
+git pull
 docker compose -f docker-compose.yml -f docker-compose.prod.yml pull
-
-# Restart with new image (zero-downtime for stateless services)
 docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
-
-# Rollback: edit docker-compose.prod.yml to pin previous commit SHA
 ```
 
-Database migrations run automatically on startup.
+Or set up a cron job — see `infra/modules/customer-instance/startup-script.sh.tpl` for the reference implementation.
 
-## Important Notes
+## Which path should I pick?
 
-### DuckDB Write Locking
+| | Terraform | Docker Compose |
+|---|---|---|
+| Setup time | ~45 min first customer, ~15 min each subsequent | ~30 min |
+| Infra-as-Code | Full (all resources in git) | Partial (compose.yml only) |
+| Secret storage | GCP Secret Manager | `.env` file on host |
+| Upgrades | Auto via cron, gated prod apply | Manual `docker compose pull` |
+| Backups | Daily GCP snapshots, 30-day retention | You set up yourself |
+| Monitoring / alerts | GCP Uptime Checks + alert policy | You set up yourself |
+| TLS | Auto Caddy + LE | Auto Caddy + LE (same) |
+| Best for | Multi-tenant SaaS, production | Single-instance self-host, learning |
 
-DuckDB only supports one writer at a time. When running extraction:
+## Related documentation
 
-```bash
-docker compose down # Stop app + scheduler
-docker compose run --rm extract # Run extraction
-docker compose up -d # Restart
-```
-
-The scheduler triggers extraction via the API, which handles locking internally.
-
-### Environment Variable Changes
-
-`docker compose restart` does NOT reload `.env`. Use:
-
-```bash
-docker compose down && docker compose up -d
-```
-
-### Services
-
-| Service | Profile | Description |
-|---------|---------|-------------|
-| `app` | default | FastAPI server on port 8000 |
-| `scheduler` | default | Periodic sync + extraction |
-| `extract` | extract | One-shot data extraction |
-| `telegram-bot` | full | Telegram notifications |
-| `ws-gateway` | full | WebSocket gateway |
-| `corporate-memory` | full | Knowledge collector |
-| `session-collector` | full | Session collection |
-
-Start all services: `docker compose --profile full up -d`
-
-### Directory Structure on Server
-
-```
-/opt/data-analyst/ # Git repo
-  .env # Secrets (chmod 600)
-  config/instance.yaml # Instance config
-
-/data/ # Persistent data (Docker volume)
-  state/system.duckdb # System state (users, registry, sync)
-  analytics/server.duckdb # Analytics views
-  extracts/ # Per-source extract.duckdb + parquets
-    keboola/
-    bigquery/
-    jira/
-```
-
-## CI/CD
-
-Push to `main` triggers GitHub Actions:
-1. Run test suite (607 tests)
-2. Build Docker image
-3. Push to GHCR (`ghcr.io/keboola/agnes-the-ai-analyst`)
-4. Deploy via Kamal
-
-## Monitoring
-
-- Health: `GET /api/health`
-- Logs: `docker compose logs -f app`
-- Disk: `df -h /data`
-- Tables: `curl -s http://localhost:8000/api/catalog | python3 -m json.tool`
+- [`ONBOARDING.md`](ONBOARDING.md) — end-to-end Terraform onboarding checklist
+- [`CONFIGURATION.md`](CONFIGURATION.md) — `instance.yaml`, env vars, per-instance config
+- [`architecture.md`](architecture.md) — internal architecture (orchestrator, extractors, DB layout)
+- [`QUICKSTART.md`](QUICKSTART.md) — local development setup
+- [`superpowers/specs/2026-04-21-multi-customer-deployment-spec.md`](superpowers/specs/2026-04-21-multi-customer-deployment-spec.md) — design rationale for the multi-customer model
diff --git a/docs/ONBOARDING.md b/docs/ONBOARDING.md
index db30dc6..c01368f 100644
--- a/docs/ONBOARDING.md
+++ b/docs/ONBOARDING.md
@@ -83,13 +83,21 @@ Copy the example and fill it in:
 
 ```bash
 cp terraform/terraform.tfvars.example terraform/terraform.tfvars
-# Edit:
+# Required:
 # gcp_project_id = ""
 # customer_name = ""
 # seed_admin_email = "...@customer.com"
-# (optionally) keboola_stack_url, prod_instance, dev_instances
+# keboola_stack_url = "https://connection..gcp.keboola.com/"
+#
+# Optional (module infra-v1.4.0+):
+# runtime_secrets = ["keboola-storage-token"] # empty if non-keboola data_source
+# firewall_ssh_source_ranges = ["35.235.240.0/20"] # IAP range; "0.0.0.0/0" if public SSH
+# notification_channel_ids = ["projects//notificationChannels/"]
+# compose_ref = "main" # or a "stable-YYYY.MM.N" tag
 ```
 
+See the [module README](https://github.com/keboola/agnes-the-ai-analyst/tree/main/infra/modules/customer-instance) for the full variable schema.
+
 ## 5. First apply
 
 ```bash
@@ -111,16 +119,20 @@ Output: `prod_ip` = external IP.
 
 ## 6. Bootstrap admin user
 
-On the first deploy the `users` table is empty. Create the first admin via `POST /auth/bootstrap` (this endpoint auto-disables once ≥1 user exists):
+On first boot the app auto-seeds an admin user from `SEED_ADMIN_EMAIL` — but *without a password*, which means nobody can log in yet. Activate it via `POST /auth/bootstrap`:
 
 ```bash
 PROD_IP=$(terraform output -raw prod_ip)
 curl -X POST "http://$PROD_IP:8000/auth/bootstrap" \
   -H "Content-Type: application/json" \
-  -d '{"email":"admin@.com","name":"Admin","password":""}'
+  -d '{"email":"","password":""}'
 ```
 
-Log in: `http://:8000/login`.
+If the email matches the seed user, the endpoint sets its password and promotes it to admin. If it doesn't match, a new admin is created. The endpoint self-deactivates once any user has a password — **so do this before exposing the URL**.
+
+Log in: `http://:8000/login` with the email + password you just set.
+
+**Security:** The bootstrap endpoint is only disabled by a real password being set. Running `terraform destroy` + `apply` recreates the seed user and re-opens bootstrap — so if you destroy/recreate, a new attacker window opens until you re-run bootstrap.
 
 ## 7. DNS + TLS (optional)
 