# Deployment & Multi-Instance Readiness Plan

> **Historical note (2026-04-24):** This plan is a snapshot from 2026-04-09. Some details have evolved; most notably, the Caddy `production` profile referenced throughout this document was renamed to `tls` (see PR #51). For the current deployment guide, follow `docs/DEPLOYMENT.md`. This file is preserved as design history.

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Make the platform deployable to N customer instances with minimal manual effort.

**Architecture:** Docker image on GHCR + per-instance config (instance.yaml + .env) + Terraform provisioning. One image, many instances.

**Tech Stack:** Docker, Terraform (GCP), GitHub Actions, Caddy (TLS proxy)

**Source:** Deployment readiness + multi-instance architecture reviews 2026-04-09 (findings C5-C7, I4-I9)

---

## File Map

| File | Responsibility | Tasks |
|------|---------------|-------|
| `config/.env.template` | Complete env var reference | 1 |
| `docker-compose.yml` | Add restart policy, config mount, image ref, Caddy proxy | 2, 3 |
| `docker-compose.prod.yml` | Production override with GHCR image + Caddy | 2, 3 |
| `.github/workflows/deploy.yml` | Image versioning with SHA tag | 4 |
| `infra/main.tf` | Remote state backend, instance.yaml generation | 5 |
| `services/telegram_bot/config.py` | Fix hardcoded paths | 6 |
| `src/profiler.py` | Fix PROFILER_DATA_DIR | 6 |
| `docs/DEPLOYMENT.md` | Update for multi-instance | 7 |

---

### Task 1: Complete .env.template with all env vars

The template lists only 8 of ~15 needed variables.
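One way to audit that gap is to extract the variable names a template declares and compare them with the names the code reads. The sketch below covers only the extraction half and counts commented-out entries such as `# SMTP_HOST=`. The function name `template_vars` and the regex are illustrative assumptions, not part of the repository.

```python
import re

# An optional leading "#", then an UPPER_SNAKE_CASE name immediately followed
# by "=". Prose comments ("# Copy to .env: ...") and divider lines do not match.
_VAR_LINE = re.compile(r"^\s*#?\s*([A-Z][A-Z0-9_]*)=")


def template_vars(text: str) -> list[str]:
    """Return variable names declared in .env-template text, in file order."""
    names: list[str] = []
    for line in text.splitlines():
        match = _VAR_LINE.match(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


# Small made-up sample in the same style as the template in this task.
sample = """\
# Copy to .env: cp config/.env.template .env
JWT_SECRET_KEY=
# SMTP_HOST=smtp.gmail.com
# ── OPTIONAL SERVICES ──
# LOG_LEVEL=info  # debug, info, warning, error
"""
print(template_vars(sample))  # ['JWT_SECRET_KEY', 'SMTP_HOST', 'LOG_LEVEL']
```

Comparing this list against a grep of `os.environ` lookups in the codebase would show which variables the template still omits.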
**Files:**
- Modify: `config/.env.template`

- [ ] **Step 1: Rewrite .env.template**

```bash
# Agnes AI Data Analyst - Environment Variables
# =============================================
# Copy to .env: cp config/.env.template .env
# .env is gitignored - NEVER commit it.

# ── REQUIRED ────────────────────────────────────────
JWT_SECRET_KEY=        # python -c "import secrets; print(secrets.token_hex(32))"
SESSION_SECRET=        # python -c "import secrets; print(secrets.token_hex(32))"

# ── GOOGLE OAUTH (required for Google login) ────────
# GOOGLE_CLIENT_ID=
# GOOGLE_CLIENT_SECRET=

# ── KEBOOLA (required for Keboola data source) ──────
# KEBOOLA_STORAGE_TOKEN=
# KEBOOLA_STACK_URL=https://connection.keboola.com

# ── BIGQUERY (required for BigQuery data source) ─────
# BIGQUERY_PROJECT=
# BIGQUERY_LOCATION=us

# ── BOOTSTRAP (first deploy only) ───────────────────
# SEED_ADMIN_EMAIL=admin@example.com

# ── EMAIL / SMTP (required for magic link auth) ─────
# SMTP_HOST=smtp.gmail.com
# SMTP_PORT=587
# SMTP_USER=
# SMTP_PASSWORD=

# ── OPTIONAL SERVICES ───────────────────────────────
# TELEGRAM_BOT_TOKEN=
# JIRA_WEBHOOK_SECRET=
# JIRA_API_TOKEN=
# ANTHROPIC_API_KEY=
# LLM_API_KEY=

# ── DESKTOP APP ─────────────────────────────────────
# DESKTOP_JWT_SECRET=   # Separate secret for desktop app tokens

# ── DEPLOYMENT ──────────────────────────────────────
# DATA_DIR=/data        # Default: /data in Docker, ./data locally
# LOG_LEVEL=info        # debug, info, warning, error
# CORS_ORIGINS=http://localhost:3000,http://localhost:8000
```

- [ ] **Step 2: Commit**

```bash
git add config/.env.template
git commit -m "docs: complete .env.template with all 20+ env vars"
```

---

### Task 2: Fix docker-compose for production (I7, I5, I8)

Add restart policy to app, config volume mount, and GHCR image reference.
**Files:**
- Modify: `docker-compose.yml`
- Create: `docker-compose.prod.yml` (production override)

- [ ] **Step 1: Add restart policy and config mount to docker-compose.yml**

In `docker-compose.yml`, add to the `app` service:

```yaml
  app:
    build: .
    command: uvicorn app.main:app --host 0.0.0.0 --port 8000
    ports:
      - "8000:8000"
    volumes:
      - data:/data
      - ./config:/app/config:ro
    env_file: .env
    environment:
      - DATA_DIR=/data
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:8000/api/health"]
      interval: 30s
      timeout: 5s
      retries: 3
```

Key changes: `restart: unless-stopped` added, `./config:/app/config:ro` volume mount added.

- [ ] **Step 2: Create docker-compose.prod.yml**

```yaml
# Production override: uses pre-built GHCR image instead of local build.
# Usage: docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
services:
  app:
    image: ghcr.io/keboola/agnes-the-ai-analyst:latest
    build: !reset null
  scheduler:
    image: ghcr.io/keboola/agnes-the-ai-analyst:latest
    build: !reset null
  extract:
    image: ghcr.io/keboola/agnes-the-ai-analyst:latest
    build: !reset null
  telegram-bot:
    image: ghcr.io/keboola/agnes-the-ai-analyst:latest
    build: !reset null
  ws-gateway:
    image: ghcr.io/keboola/agnes-the-ai-analyst:latest
    build: !reset null
  corporate-memory:
    image: ghcr.io/keboola/agnes-the-ai-analyst:latest
    build: !reset null
  session-collector:
    image: ghcr.io/keboola/agnes-the-ai-analyst:latest
    build: !reset null
```

- [ ] **Step 3: Commit**

```bash
git add docker-compose.yml docker-compose.prod.yml
git commit -m "feat: add restart policy, config mount, production compose override with GHCR images"
```

---

### Task 3: Add Caddy reverse proxy for TLS (I4)

No HTTPS in Docker Compose: data transits in plaintext.
**Files:**
- Create: `Caddyfile`
- Modify: `docker-compose.yml` (add caddy service)

- [ ] **Step 1: Create Caddyfile**

```
{$DOMAIN:localhost} {
    reverse_proxy app:8000
}
```

- [ ] **Step 2: Add Caddy service to docker-compose.yml**

Add to services section:

```yaml
  caddy:
    image: caddy:2-alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
      - caddy_config:/config
    environment:
      - DOMAIN=${DOMAIN:-localhost}
    depends_on:
      app:
        condition: service_healthy
    restart: unless-stopped
    profiles:
      - production
```

Add volumes:

```yaml
volumes:
  data:
  caddy_data:
  caddy_config:
```

- [ ] **Step 3: Update DEPLOYMENT.md**

Add section:

````markdown
### HTTPS with Caddy (production)

Set `DOMAIN=data.yourcompany.com` in `.env`, then:

```bash
docker compose --profile production up -d
```

Caddy automatically provisions Let's Encrypt TLS certificates.
````

- [ ] **Step 4: Commit**

```bash
git add Caddyfile docker-compose.yml docs/DEPLOYMENT.md
git commit -m "feat: add Caddy reverse proxy for automatic HTTPS in production"
```

---

### Task 4: Add Docker image versioning with commit SHA (C7)

Images are only tagged `:latest`, so there is no versioning and no rollback.

**Files:**
- Modify: `.github/workflows/deploy.yml`

- [ ] **Step 1: Update image tagging**

In `.github/workflows/deploy.yml`, replace the build-and-push step:

```yaml
- name: Build and push
  uses: docker/build-push-action@v7
  with:
    push: true
    tags: |
      ghcr.io/${{ github.repository }}:latest
      ghcr.io/${{ github.repository }}:${{ github.sha }}
```

- [ ] **Step 2: Commit**

```bash
git add -f .github/workflows/deploy.yml
git commit -m "feat: tag Docker images with commit SHA for versioning and rollback"
```

---

### Task 5: Add Terraform remote state backend (I6)

Local tfstate blocks multi-operator and multi-instance Terraform.
**Files:**
- Modify: `infra/main.tf`
- Modify: `infra/variables.tf`

- [ ] **Step 1: Add GCS backend to main.tf**

In `infra/main.tf`, inside the `terraform {}` block:

```hcl
terraform {
  required_version = ">= 1.5"

  backend "gcs" {
    bucket = "agnes-terraform-state"
    prefix = "instances"
  }

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0"
    }
  }
}
```

- [ ] **Step 2: Add instance.yaml generation to startup script**

In `infra/main.tf`, in the `startup_script` local, after the `.env` generation:

```bash
echo "=== Creating instance.yaml ==="
cat > "$APP_DIR/config/instance.yaml" << 'YAMLEOF'
 instance:
   name: "${var.instance_name}"
   subtitle: "Data Analytics Platform"
 server:
   host: "${google_compute_address.data_analyst.address}"
   hostname: "${var.domain != "" ? var.domain : google_compute_address.data_analyst.address}"
   port: 8000
 auth:
   allowed_domain: "${var.admin_email != "" ? join("", [split("@", var.admin_email)[1]]) : ""}"
 data_source:
   type: "${var.keboola_token != "" ? "keboola" : "local"}"
YAMLEOF
sed -i 's/^ //' "$APP_DIR/config/instance.yaml"
```

- [ ] **Step 3: Update repo URL in startup script**

Replace line 73 `git clone https://github.com/padak/tmp_oss.git` with:

```bash
git clone https://github.com/keboola/agnes-the-ai-analyst.git "$APP_DIR"
```

And line 75 `git checkout feature/v2-fastapi-duckdb-docker-cli` with:

```bash
# main branch is default, no checkout needed
```

- [ ] **Step 4: Commit**

```bash
git add infra/main.tf infra/variables.tf
git commit -m "feat: add Terraform GCS remote state, instance.yaml generation, update repo URL"
```

---

### Task 6: Fix hardcoded paths in services (I9)

telegram_bot and profiler use hardcoded `/data/...` paths instead of `DATA_DIR`.
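The pattern the fixes in this task converge on can be sketched as a single helper that resolves every service path from `DATA_DIR`, falling back to `/data`. The helper name `data_path` and the constant `PROFILER_SRC_DIR` are hypothetical; the plan itself edits each module in place rather than introducing a shared helper.

```python
import os
from pathlib import Path


def data_path(*parts: str) -> Path:
    """Build a path under DATA_DIR (default /data, as in the Docker image)."""
    # Read the environment at call time so per-instance overrides and tests
    # take effect without re-importing the module.
    root = Path(os.environ.get("DATA_DIR", "/data"))
    return root.joinpath(*parts)


# Hypothetical usage mirroring the two directories named in this task.
NOTIFICATIONS_DIR = data_path("notifications")
PROFILER_SRC_DIR = data_path("src_data")
print(NOTIFICATIONS_DIR, PROFILER_SRC_DIR)
```

Resolving the root in one place keeps the default consistent across services, which is the property the hardcoded paths broke.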
**Files:**
- Modify: `services/telegram_bot/config.py:14`
- Modify: `src/profiler.py:87`
- Modify: `services/telegram_bot/dispatch.py:17`

- [ ] **Step 1: Fix telegram_bot config**

In `services/telegram_bot/config.py`, replace line 14:

```python
NOTIFICATIONS_DIR = os.path.join(os.environ.get("DATA_DIR", "/data"), "notifications")
```

- [ ] **Step 2: Fix profiler**

In `src/profiler.py`, replace line 87:

```python
DATA_DIR = Path(os.environ.get("DATA_DIR", "/data")) / "src_data"
```

Remove the `PROFILER_DATA_DIR` reference and use the standard `DATA_DIR`, as everywhere else.

- [ ] **Step 3: Fix dispatch.py**

In `services/telegram_bot/dispatch.py`, replace line 17:

```python
WS_GATEWAY_SOCKET_PATH = os.environ.get("WS_GATEWAY_SOCKET", "/run/ws-gateway/ws.sock")
```

- [ ] **Step 4: Run tests**

Run: `pytest tests/ -q --tb=short`
Expected: All pass

- [ ] **Step 5: Commit**

```bash
git add services/telegram_bot/config.py services/telegram_bot/dispatch.py src/profiler.py
git commit -m "fix: use DATA_DIR env var everywhere, remove hardcoded /data paths"
```

---

### Task 7: Update DEPLOYMENT.md for multi-instance

Add production deployment with GHCR images, Caddy TLS, and multi-instance guidance.

**Files:**
- Modify: `docs/DEPLOYMENT.md`

- [ ] **Step 1: Add sections to DEPLOYMENT.md**

Add these sections:

**Production with GHCR images:**

````markdown
### Production Deployment (pre-built images)

Instead of building locally, pull from GitHub Container Registry:

```bash
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
```

Pin to a specific version:

```yaml
# In docker-compose.prod.yml, change :latest to :COMMIT_SHA
image: ghcr.io/keboola/agnes-the-ai-analyst:abc1234
```
````

**Multi-instance:**

```markdown
## Multi-Instance Deployment

Each customer gets a separate VM with isolated data and config.

1. Copy `infra/terraform.tfvars.example` to `infra/instances/customer-name.tfvars`
2. Fill in customer-specific values
3. Apply: `cd infra && terraform workspace new customer-name && terraform apply -var-file=instances/customer-name.tfvars`
4. SSH in and create `config/instance.yaml` from `config/instance.yaml.example`
5. Start: `docker compose -f docker-compose.yml -f docker-compose.prod.yml --profile production up -d`
6. Bootstrap: `curl -X POST http://IP:8000/auth/bootstrap -H "Content-Type: application/json" -d '{"email":"admin@customer.com"}'`
```

**Update/rollback:**

````markdown
## Updating an Instance

```bash
# Pull latest image
docker compose -f docker-compose.yml -f docker-compose.prod.yml pull

# Restart with new image
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d

# Rollback to specific version
# Edit docker-compose.prod.yml: change :latest to :PREVIOUS_SHA
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
```
````

- [ ] **Step 2: Commit**

```bash
git add docs/DEPLOYMENT.md
git commit -m "docs: add multi-instance deployment, GHCR images, update/rollback procedures"
```

---

## Execution Order

Sequential recommended (some tasks depend on earlier ones):

1. **Task 1**: .env.template (no deps)
2. **Task 2**: docker-compose fixes (no deps)
3. **Task 3**: Caddy TLS (depends on Task 2)
4. **Task 4**: image versioning (no deps)
5. **Task 5**: Terraform remote state (no deps)
6. **Task 6**: hardcoded paths (no deps)
7. **Task 7**: documentation (depends on all above)

Tasks 1, 2, 4, 5, 6 can run in parallel.

**Verification after all tasks:**

```bash
# Tests still pass
pytest tests/ -v --tb=short

# Docker builds
docker compose build

# Production compose validates
docker compose -f docker-compose.yml -f docker-compose.prod.yml config
```
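Because the production override repeats the image reference for every service, pinning or rolling back by hand (Task 7) means editing each `image:` line. The following is a minimal sketch of automating that edit on the file's text, assuming the image name used in this plan. The function `pin_image_tag` is hypothetical and not part of the repository.

```python
import re

IMAGE = "ghcr.io/keboola/agnes-the-ai-analyst"  # image name used in this plan


def pin_image_tag(compose_text: str, tag: str, image: str = IMAGE) -> str:
    """Replace the tag on every `image: <image>:<tag>` line with `tag`."""
    pattern = re.compile(
        r"^([ \t]*image:[ \t]*" + re.escape(image) + r"):\S+[ \t]*$",
        re.MULTILINE,
    )
    # A function replacement avoids backreference surprises if the tag
    # happens to contain digits or backslashes.
    return pattern.sub(lambda m: f"{m.group(1)}:{tag}", compose_text)


# Shortened sample of the override file; only matching image lines change.
sample = """\
services:
  app:
    image: ghcr.io/keboola/agnes-the-ai-analyst:latest
    build: !reset null
  scheduler:
    image: ghcr.io/keboola/agnes-the-ai-analyst:latest
    build: !reset null
"""
print(pin_image_tag(sample, "abc1234"))
```

Running the same function with the previous commit SHA as `tag` gives the rollback edit; lines for other images, such as `caddy:2-alpine`, are left untouched.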