Merge: docs sweep — DEPLOYMENT.md rewrite, ONBOARDING v1.4.0, README links
commit
c1227df990
3 changed files with 131 additions and 247 deletions
|
|
@ -133,7 +133,11 @@ See `config/instance.yaml.example` for all available options.
|
|||
|
||||
## Documentation
|
||||
|
||||
- [Deployment Guide](docs/DEPLOYMENT.md) — server provisioning, Docker, environment setup
|
||||
- [Onboarding Guide](docs/ONBOARDING.md) — end-to-end Terraform deployment into a GCP project (recommended for production)
|
||||
- [Deployment Guide](docs/DEPLOYMENT.md) — chooses between Terraform and Docker Compose; covers OSS self-host
|
||||
- [Configuration Reference](docs/CONFIGURATION.md) — `instance.yaml`, env vars, per-instance options
|
||||
- [Architecture](docs/architecture.md) — orchestrator, extractors, DB layout
|
||||
- [Quickstart](docs/QUICKSTART.md) — local development
|
||||
|
||||
## Contributing
|
||||
|
||||
|
|
|
|||
|
|
@ -1,260 +1,128 @@
|
|||
# Deployment Guide
|
||||
|
||||
## Server Requirements
|
||||
Agnes supports two deployment paths. Pick the one that matches your use case.
|
||||
|
||||
- Ubuntu 24.04 LTS
|
||||
- e2-small (2 vCPU, 2 GB RAM) or larger
|
||||
- 30 GB SSD boot disk
|
||||
- Docker + Docker Compose
|
||||
- Public IP with port 8000 open
|
||||
## 1. Terraform — managed, multi-customer (recommended)
|
||||
|
||||
## Quick Deploy (GCP)
|
||||
For Keboola-operated deployments and anyone running Agnes for multiple customers on GCP.
|
||||
|
||||
### 1. Create VM
|
||||
**Follow:** [`ONBOARDING.md`](ONBOARDING.md)
|
||||
|
||||
Highlights:
|
||||
- Per-customer GCP project + private infra repo cloned from [`keboola/agnes-infra-template`](https://github.com/keboola/agnes-infra-template)
|
||||
- Reusable Terraform module `infra/modules/customer-instance` (versioned — `infra-vX.Y.Z` tags)
|
||||
- Prod + optional branch-aware dev VMs
|
||||
- Persistent SSD data disk with daily snapshots
|
||||
- Secret Manager for tokens (no plaintext in VM metadata)
|
||||
- OS Login for SSH, dedicated VM service account with scoped `secretAccessor`
|
||||
- Cron-based auto-upgrade (pulls `:stable` image digest every 5 min)
|
||||
- Caddy + Let's Encrypt TLS (opt-in with domain)
|
||||
- Uptime check + alert policy per VM (wire a notification channel to be paged)
|
||||
- CI/CD in the private repo: PR → `terraform plan`, merge to main → `apply-dev` auto, `apply-prod` gated by reviewer
|
||||
- First-boot bootstrap via `POST /auth/bootstrap`
|
||||
|
||||
Target onboarding time: **< 1 hour** per customer.
|
||||
|
||||
## 2. Docker Compose — OSS self-host
|
||||
|
||||
For running Agnes on your own VM / bare metal without Terraform. You're responsible for provisioning and maintenance.
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- Ubuntu 24.04 (or any Linux with Docker)
|
||||
- 2 vCPU, 2 GB RAM, 30 GB SSD minimum
|
||||
- Docker Engine + Compose plugin
|
||||
- Public IP with ports 80/443 (if using Caddy TLS) or 8000 (plain HTTP) open
|
||||
- Data-source credentials (e.g., Keboola Storage token)
|
||||
|
||||
### Steps
|
||||
|
||||
1. Clone the Agnes repository:
|
||||
|
||||
```bash
|
||||
git clone https://github.com/keboola/agnes-the-ai-analyst.git /opt/agnes
|
||||
cd /opt/agnes
|
||||
```
|
||||
|
||||
2. Create `.env`:
|
||||
|
||||
```bash
|
||||
# Unquoted EOF so that $(openssl rand -hex 32) is expanded when the file is written
cat > .env <<EOF
|
||||
JWT_SECRET_KEY=$(openssl rand -hex 32)
|
||||
DATA_DIR=/data
|
||||
DATA_SOURCE=keboola
|
||||
KEBOOLA_STORAGE_TOKEN=<your-token>
|
||||
KEBOOLA_STACK_URL=<your-stack-url>
|
||||
SEED_ADMIN_EMAIL=<your-email>
|
||||
LOG_LEVEL=info
|
||||
AGNES_TAG=stable
|
||||
EOF
|
||||
chmod 600 .env
|
||||
```
|
||||
|
||||
3. Mount a persistent disk at `/data` (optional but recommended — survives host rebuild). If you do, use the overlay:
|
||||
|
||||
```bash
|
||||
docker compose \
|
||||
-f docker-compose.yml \
|
||||
-f docker-compose.prod.yml \
|
||||
-f docker-compose.host-mount.yml \
|
||||
up -d
|
||||
```
|
||||
|
||||
Without a persistent disk (data on Docker named volume, tied to boot disk):
|
||||
|
||||
```bash
|
||||
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
|
||||
```
|
||||
|
||||
4. Bootstrap your admin password via `POST /auth/bootstrap`:
|
||||
|
||||
```bash
|
||||
curl -X POST http://<host>:8000/auth/bootstrap \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"email":"<your-email>","password":"<strong-password>"}'
|
||||
```
|
||||
|
||||
5. Open `http://<host>:8000/login` and sign in.
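
The `.env` file from step 2 can be sanity-checked before the stack is started. The following is a minimal sketch, not part of Agnes: `check_env_file` is a hypothetical helper, and the 64-hex-character rule assumes the secret was generated with `openssl rand -hex 32`. It is meant to catch a placeholder or an unexpanded `$(...)` left in `JWT_SECRET_KEY`.

```bash
# Hypothetical helper (not shipped with Agnes): verify that JWT_SECRET_KEY
# in a .env file looks like the output of `openssl rand -hex 32`.
check_env_file() {
  local env_file="${1:-.env}"
  local secret

  if [ ! -f "$env_file" ]; then
    echo "missing: $env_file" >&2
    return 1
  fi

  # Take the value of the last JWT_SECRET_KEY= line, if any.
  secret="$(sed -n 's/^JWT_SECRET_KEY=//p' "$env_file" | tail -n 1)"

  # Reject empty values, placeholders, and unexpanded $(...) text.
  if ! printf '%s' "$secret" | grep -Eq '^[0-9a-f]{64}$'; then
    echo "JWT_SECRET_KEY in $env_file is not 64 lowercase hex characters" >&2
    return 1
  fi

  echo "ok: $env_file"
}
```

Running `check_env_file .env` from the repository directory prints `ok: .env` on success and returns a non-zero status otherwise, so it can gate the `docker compose ... up -d` call.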
|
||||
|
||||
### TLS (optional)
|
||||
|
||||
Set `DOMAIN` in `.env` and point your DNS A record at the host, then start with the `tls` profile:
|
||||
|
||||
```bash
|
||||
gcloud compute instances create data-analyst-dev \
|
||||
--project=YOUR_PROJECT \
|
||||
--zone=europe-west1-b \
|
||||
--machine-type=e2-small \
|
||||
--image-family=ubuntu-2404-lts-amd64 \
|
||||
--image-project=ubuntu-os-cloud \
|
||||
--boot-disk-size=30GB \
|
||||
--boot-disk-type=pd-ssd \
|
||||
--tags=data-analyst-dev
|
||||
AGNES_DOMAIN=agnes.example.com ACME_EMAIL=admin@example.com \
|
||||
docker compose -f docker-compose.yml -f docker-compose.prod.yml --profile tls up -d
|
||||
```
|
||||
|
||||
### 2. Install Docker
|
||||
### Upgrades (manual)
|
||||
|
||||
```bash
|
||||
curl -fsSL https://get.docker.com | sh
|
||||
sudo usermod -aG docker $USER
|
||||
# Log out and back in for group change to take effect
|
||||
```
|
||||
|
||||
### 3. Set up deploy key
|
||||
|
||||
Generate an SSH key for GitHub access:
|
||||
|
||||
```bash
|
||||
ssh-keygen -t ed25519 -f ~/.ssh/agnes_deploy -N "" -C "agnes-deploy"
|
||||
cat ~/.ssh/agnes_deploy.pub
|
||||
# Add the public key as a deploy key on the GitHub repo
|
||||
```
|
||||
|
||||
Configure SSH to use it:
|
||||
|
||||
```bash
|
||||
cat > ~/.ssh/config << 'EOF'
|
||||
Host github.com
|
||||
IdentityFile ~/.ssh/agnes_deploy
|
||||
StrictHostKeyChecking no
|
||||
EOF
|
||||
chmod 600 ~/.ssh/config
|
||||
```
|
||||
|
||||
### 4. Clone and configure
|
||||
|
||||
```bash
|
||||
sudo mkdir -p /opt/data-analyst
|
||||
sudo chown $USER:$USER /opt/data-analyst
|
||||
git clone git@github.com:keboola/agnes-the-ai-analyst.git /opt/data-analyst
|
||||
cd /opt/data-analyst
|
||||
```
|
||||
|
||||
Create `.env`:
|
||||
|
||||
```bash
|
||||
cat > .env << 'EOF'
|
||||
JWT_SECRET_KEY=<generate: python3 -c "import secrets; print(secrets.token_hex(32))">
|
||||
DATA_DIR=/data
|
||||
LOG_LEVEL=info
|
||||
KEBOOLA_STORAGE_TOKEN=<your-keboola-token>
|
||||
KEBOOLA_STACK_URL=<your-keboola-stack-url>
|
||||
SEED_ADMIN_EMAIL=<admin-email>
|
||||
EOF
|
||||
chmod 600 .env
|
||||
```
|
||||
|
||||
Create `config/instance.yaml` (optional, for Keboola source config):
|
||||
|
||||
```bash
|
||||
cp config/instance.yaml.example config/instance.yaml
|
||||
# Edit with your values
|
||||
```
|
||||
|
||||
### 5. Create data directories
|
||||
|
||||
```bash
|
||||
sudo mkdir -p /data/state /data/analytics /data/extracts
|
||||
sudo chown -R $USER:$USER /data
|
||||
```
|
||||
|
||||
### 6. Build and start
|
||||
|
||||
```bash
|
||||
cd /opt/data-analyst
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
Wait for health check:
|
||||
|
||||
```bash
|
||||
curl -s http://localhost:8000/api/health | python3 -m json.tool
|
||||
```
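
The health check above is a single request, so it can fail while the container is still starting. A small retry loop is one way to wait for it. This is a sketch under assumptions: `retry` is a hypothetical helper, and the attempt count and delay in the commented example are arbitrary.

```bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: retry <attempts> <delay-seconds> <command...>
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while true; do
    # Stop as soon as the command succeeds.
    if "$@"; then
      return 0
    fi
    if [ "$i" -ge "$attempts" ]; then
      echo "gave up after $attempts attempts: $*" >&2
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example (not run here): poll the health endpoint for up to about a minute.
# retry 30 2 curl -fsS -o /dev/null http://localhost:8000/api/health
```

Because `curl -f` exits non-zero on HTTP errors, the loop keeps waiting until the endpoint actually answers with a success status.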
|
||||
|
||||
### 7. Bootstrap admin user
|
||||
|
||||
```bash
|
||||
curl -X POST http://localhost:8000/auth/bootstrap
|
||||
```
|
||||
|
||||
This creates the first admin user using `SEED_ADMIN_EMAIL` from `.env`.
|
||||
|
||||
### 8. Register tables and run first extraction
|
||||
|
||||
Register tables via the admin API, then:
|
||||
|
||||
```bash
|
||||
# Stop app first — DuckDB only supports one writer
|
||||
docker compose down
|
||||
docker compose run --rm extract
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
### 9. Open firewall (GCP)
|
||||
|
||||
```bash
|
||||
gcloud compute firewall-rules create allow-data-analyst-dev \
|
||||
--allow tcp:8000 \
|
||||
--target-tags=data-analyst-dev \
|
||||
--project=YOUR_PROJECT
|
||||
```
|
||||
|
||||
## Production Deployment (pre-built images)
|
||||
|
||||
Instead of building locally, use pre-built images from GitHub Container Registry:
|
||||
|
||||
```bash
|
||||
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
|
||||
```
|
||||
|
||||
Pin to a specific version for rollback:
|
||||
```bash
|
||||
# Edit docker-compose.prod.yml, change :latest to a commit SHA
|
||||
image: ghcr.io/keboola/agnes-the-ai-analyst:abc1234def
|
||||
```
|
||||
|
||||
## HTTPS with Caddy (production)
|
||||
|
||||
Set your domain in `.env`:
|
||||
```bash
|
||||
DOMAIN=data.yourcompany.com
|
||||
```
|
||||
|
||||
Start with the production profile:
|
||||
```bash
|
||||
docker compose --profile production up -d
|
||||
```
|
||||
|
||||
Caddy automatically provisions Let's Encrypt TLS certificates. Ensure ports 80 and 443 are open.
|
||||
|
||||
## Multi-Instance Deployment
|
||||
|
||||
Each customer gets a separate VM with isolated data and config.
|
||||
|
||||
### Using Terraform
|
||||
|
||||
1. Configure remote state: `cd infra && terraform init` (uses GCS backend)
|
||||
2. Create per-customer tfvars: `cp infra/terraform.tfvars.example infra/instances/customer.tfvars`
|
||||
3. Apply: `terraform workspace new customer && terraform apply -var-file=instances/customer.tfvars`
|
||||
4. The startup script creates `.env`, `instance.yaml`, and starts Docker Compose
|
||||
|
||||
### Manual
|
||||
|
||||
1. Create VM and install Docker
|
||||
2. Clone repo and create `.env` from `config/.env.template`
|
||||
3. Create `config/instance.yaml` from `config/instance.yaml.example`
|
||||
4. Start: `docker compose -f docker-compose.yml -f docker-compose.prod.yml --profile production up -d`
|
||||
5. Bootstrap admin: `curl -X POST http://IP:8000/auth/bootstrap -H 'Content-Type: application/json' -d '{"email":"admin@customer.com","password":"initial-password"}'`
|
||||
|
||||
## Updating an Instance
|
||||
|
||||
```bash
|
||||
# Pull latest image
|
||||
cd /opt/agnes
|
||||
git pull
|
||||
docker compose -f docker-compose.yml -f docker-compose.prod.yml pull
|
||||
|
||||
# Restart with new image (zero-downtime for stateless services)
|
||||
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
|
||||
|
||||
# Rollback: edit docker-compose.prod.yml to pin previous commit SHA
|
||||
```
|
||||
|
||||
Database migrations run automatically on startup.
|
||||
Or set up a cron job — see `infra/modules/customer-instance/startup-script.sh.tpl` for the reference implementation.
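
For orientation, a cron-driven upgrade could look roughly like the sketch below. It is not the script from `startup-script.sh.tpl`: `upgrade_once`, `AGNES_DIR`, and `LOCK_DIR` are hypothetical names, and the crontab line in the comment is only an example. It relies on `docker compose up -d` recreating containers only when their image or configuration has changed.

```bash
# Hypothetical cron-driven upgrade sketch (not the reference implementation).
AGNES_DIR="${AGNES_DIR:-/opt/agnes}"
LOCK_DIR="${LOCK_DIR:-/tmp/agnes-upgrade.lock}"

upgrade_once() {
  local status=0

  # mkdir is atomic, so it doubles as a lock against overlapping cron runs.
  if ! mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "upgrade already running, skipping"
    return 0
  fi

  # Pull newer images, then let Compose recreate only what changed.
  (
    cd "$AGNES_DIR" &&
      docker compose -f docker-compose.yml -f docker-compose.prod.yml pull --quiet &&
      docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
  ) || status=$?

  rmdir "$LOCK_DIR"
  return "$status"
}

# Example crontab entry (path is an assumption), running every 5 minutes:
# */5 * * * * /opt/agnes/upgrade.sh >> /var/log/agnes-upgrade.log 2>&1
```

If the script is killed mid-run the lock directory stays behind and has to be removed by hand, which is the price of keeping the lock this simple.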
|
||||
|
||||
## Important Notes
|
||||
## Which path should I pick?
|
||||
|
||||
### DuckDB Write Locking
|
||||
| | Terraform | Docker Compose |
|
||||
|---|---|---|
|
||||
| Setup time | ~45 min first customer, ~15 min each subsequent | ~30 min |
|
||||
| Infra-as-Code | Full (all resources in git) | Partial (compose.yml only) |
|
||||
| Secret storage | GCP Secret Manager | `.env` file on host |
|
||||
| Upgrades | Auto via cron, gated prod apply | Manual `docker compose pull` |
|
||||
| Backups | Daily GCP snapshots, 30-day retention | You set up yourself |
|
||||
| Monitoring / alerts | GCP Uptime Checks + alert policy | You set up yourself |
|
||||
| TLS | Auto Caddy + LE | Auto Caddy + LE (same) |
|
||||
| Best for | Multi-tenant SaaS, production | Single-instance self-host, learning |
|
||||
|
||||
DuckDB only supports one writer at a time. When running extraction:
|
||||
## Related documentation
|
||||
|
||||
```bash
|
||||
docker compose down # Stop app + scheduler
|
||||
docker compose run --rm extract # Run extraction
|
||||
docker compose up -d # Restart
|
||||
```
|
||||
|
||||
The scheduler triggers extraction via the API, which handles locking internally.
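
When the stop, extract, restart sequence is run by hand, a failed extraction can leave the app stopped. The wrapper below is a minimal sketch of one way to guard against that. `run_extract_exclusive` is a hypothetical name, and the sketch assumes it is called from the directory that holds the Compose files.

```bash
# Hypothetical wrapper: run the one-shot extractor while the app is stopped,
# and bring the app back up even if extraction fails.
run_extract_exclusive() {
  local status=0

  # Stop app + scheduler so DuckDB has no other writer.
  docker compose down || return 1

  # Remember the extractor's exit code instead of aborting here.
  docker compose run --rm extract || status=$?

  # Always restart the stack, then report the extraction result.
  docker compose up -d || return 1
  return "$status"
}
```

The function returns the extractor's exit status, so a calling script or cron job can still detect a failed run after the app has been restarted.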
|
||||
|
||||
### Environment Variable Changes
|
||||
|
||||
`docker compose restart` does NOT reload `.env`. Use:
|
||||
|
||||
```bash
|
||||
docker compose down && docker compose up -d
|
||||
```
|
||||
|
||||
### Services
|
||||
|
||||
| Service | Profile | Description |
|
||||
|---------|---------|-------------|
|
||||
| `app` | default | FastAPI server on port 8000 |
|
||||
| `scheduler` | default | Periodic sync + extraction |
|
||||
| `extract` | extract | One-shot data extraction |
|
||||
| `telegram-bot` | full | Telegram notifications |
|
||||
| `ws-gateway` | full | WebSocket gateway |
|
||||
| `corporate-memory` | full | Knowledge collector |
|
||||
| `session-collector` | full | Session collection |
|
||||
|
||||
Start all services: `docker compose --profile full up -d`
|
||||
|
||||
### Directory Structure on Server
|
||||
|
||||
```
|
||||
/opt/data-analyst/ # Git repo
|
||||
.env # Secrets (chmod 600)
|
||||
config/instance.yaml # Instance config
|
||||
|
||||
/data/ # Persistent data (Docker volume)
|
||||
state/system.duckdb # System state (users, registry, sync)
|
||||
analytics/server.duckdb # Analytics views
|
||||
extracts/ # Per-source extract.duckdb + parquets
|
||||
keboola/
|
||||
bigquery/
|
||||
jira/
|
||||
```
|
||||
|
||||
## CI/CD
|
||||
|
||||
Push to `main` triggers GitHub Actions:
|
||||
1. Run test suite (607 tests)
|
||||
2. Build Docker image
|
||||
3. Push to GHCR (`ghcr.io/keboola/agnes-the-ai-analyst`)
|
||||
4. Deploy via Kamal
|
||||
|
||||
## Monitoring
|
||||
|
||||
- Health: `GET /api/health`
|
||||
- Logs: `docker compose logs -f app`
|
||||
- Disk: `df -h /data`
|
||||
- Tables: `curl -s http://localhost:8000/api/catalog | python3 -m json.tool`
|
||||
- [`ONBOARDING.md`](ONBOARDING.md) — end-to-end Terraform onboarding checklist
|
||||
- [`CONFIGURATION.md`](CONFIGURATION.md) — `instance.yaml`, env vars, per-instance config
|
||||
- [`architecture.md`](architecture.md) — internal architecture (orchestrator, extractors, DB layout)
|
||||
- [`QUICKSTART.md`](QUICKSTART.md) — local development setup
|
||||
- [`superpowers/specs/2026-04-21-multi-customer-deployment-spec.md`](superpowers/specs/2026-04-21-multi-customer-deployment-spec.md) — design rationale for the multi-customer model
|
||||
|
|
|
|||
|
|
@ -83,13 +83,21 @@ Copy the example and fill it in:
|
|||
|
||||
```bash
|
||||
cp terraform/terraform.tfvars.example terraform/terraform.tfvars
|
||||
# Edit:
|
||||
# Required:
|
||||
# gcp_project_id = "<GCP_PROJECT_ID>"
|
||||
# customer_name = "<customer>"
|
||||
# seed_admin_email = "...@customer.com"
|
||||
# (optionally) keboola_stack_url, prod_instance, dev_instances
|
||||
# keboola_stack_url = "https://connection.<region>.gcp.keboola.com/"
|
||||
#
|
||||
# Optional (module infra-v1.4.0+):
|
||||
# runtime_secrets = ["keboola-storage-token"] # empty if non-keboola data_source
|
||||
# firewall_ssh_source_ranges = ["35.235.240.0/20"] # IAP range; "0.0.0.0/0" if public SSH
|
||||
# notification_channel_ids = ["projects/<p>/notificationChannels/<id>"]
|
||||
# compose_ref = "main" # or a "stable-YYYY.MM.N" tag
|
||||
```
|
||||
|
||||
See the [module README](https://github.com/keboola/agnes-the-ai-analyst/tree/main/infra/modules/customer-instance) for the full variable schema.
|
||||
|
||||
## 5. First apply
|
||||
|
||||
```bash
|
||||
|
|
@ -111,16 +119,20 @@ Output: `prod_ip` = external IP.
|
|||
|
||||
## 6. Bootstrap admin user
|
||||
|
||||
On the first deploy the `users` table is empty. Create the first admin via `POST /auth/bootstrap` (this endpoint auto-disables once ≥1 user exists):
|
||||
On first boot the app auto-seeds an admin user from `SEED_ADMIN_EMAIL` — but *without a password*, which means nobody can log in yet. Activate it via `POST /auth/bootstrap`:
|
||||
|
||||
```bash
|
||||
PROD_IP=$(terraform output -raw prod_ip)
|
||||
curl -X POST "http://$PROD_IP:8000/auth/bootstrap" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"email":"admin@<customer>.com","name":"Admin","password":"<STRONG_PASSWORD>"}'
|
||||
-d '{"email":"<seed_admin_email from tfvars>","password":"<STRONG_PASSWORD>"}'
|
||||
```
|
||||
|
||||
Log in: `http://<prod_ip>:8000/login`.
|
||||
If the email matches the seed user, the endpoint sets its password and promotes it to admin. If it doesn't match, a new admin is created. The endpoint deactivates itself once any user has a password, **so do this before exposing the URL**.
|
||||
|
||||
Log in: `http://<prod_ip>:8000/login` with the email + password you just set.
|
||||
|
||||
**Security:** The bootstrap endpoint is disabled only once a real password has been set. Running `terraform destroy` followed by `apply` recreates the seed user and re-opens bootstrap, so every destroy/recreate cycle leaves an attack window open until you re-run bootstrap.
|
||||
|
||||
## 7. DNS + TLS (optional)
|
||||
|
||||
|
|
|
|||