* feat(deploy): keboola-deploy tag-triggered workflow + Caddyfile LE/internal modes + dev_instances TLS support
Three coordinated changes that together free Keboola's internal Agnes
deployment from a foot-gun: the dev VM tracks `:dev` (= the last push
from anyone in the upstream repo).
1. .github/workflows/keboola-deploy.yml — new workflow
Triggered ONLY on `keboola-deploy-*` git tag pushes (not on every branch
push like release.yml). Builds an image and publishes two GHCR tags:
ghcr.io/keboola/agnes-the-ai-analyst:keboola-deploy-<git-tag-suffix>
ghcr.io/keboola/agnes-the-ai-analyst:keboola-deploy-latest
The Keboola dev VM pins to `keboola-deploy-latest`; an operator deploys
by `git tag keboola-deploy-foo && git push origin keboola-deploy-foo`.
The audit trail lives in git tags (immutable, who-tagged-what-when), so no
PR cycle is needed for each deploy.
This doesn't touch the Vojta/Minas/David workflow: release.yml still builds
`:dev-<slug>` for every branch push, as before.
2. Caddyfile — parametrize TLS directive via $CADDY_TLS env var
PR #51 hardcoded cert-file mode (`tls /certs/fullchain.pem ...`) for
Groupon's corporate CA flow. That broke the Let's Encrypt path the
module previously supported. Now:
CADDY_TLS unset (default) → cert-file mode (Groupon corp PKI)
CADDY_TLS="tls user@x.com" → Let's Encrypt auto-issue
CADDY_TLS="tls internal" → Caddy-managed self-signed (lab/dev)
Single Caddyfile, three regimes, no per-deployment fork. Validated with
`caddy validate` in all three modes.
3. customer-instance module — dev_instances TLS + auto-set CADDY_TLS
- variables.tf: dev_instances object schema gains optional tls_mode +
domain (mirroring prod_instance). Defaults to "none" + "" so existing
callers without those fields keep current behavior.
- startup-script.sh.tpl: when tls_mode="caddy" and DOMAIN is set, write
CADDY_TLS=tls <ACME_EMAIL> (or "tls internal" when ACME_EMAIL empty)
into /opt/agnes/.env. Caddy then picks it up and the Caddyfile
substitution flips the cert source.
For an LE deploy: set tls_mode="caddy", domain="agnes-dev.example.com",
ensure DNS A-record points at the VM, and acme_email is set on the
module (or seed_admin_email is, since acme_email defaults to it).
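The rule in the startup-script bullet can be sketched as a standalone shell function. This is an illustration only: the name `caddy_tls_line` and its positional arguments are hypothetical, and the real script sets a variable inline rather than calling a function.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the module): print the .env line that
# selects the Caddy certificate source, or nothing when TLS is not Caddy-managed.
caddy_tls_line() {
  local tls_mode="$1" domain="$2" acme_email="${3-}"
  if [ "$tls_mode" = "caddy" ] && [ -n "$domain" ]; then
    if [ -n "$acme_email" ]; then
      # Email present: Let's Encrypt auto-issue
      echo "CADDY_TLS=tls $acme_email"
    else
      # No email: Caddy-managed self-signed certificate
      echo "CADDY_TLS=tls internal"
    fi
  fi
  # Any other combination prints nothing, so the Caddyfile default applies.
}

# Placeholder values for illustration
caddy_tls_line caddy agnes-dev.example.com ops@example.com
caddy_tls_line caddy agnes-dev.example.com ""
```

An empty result is deliberate: leaving CADDY_TLS out of `.env` keeps the Caddyfile in its default cert-file mode.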
After this lands, tag it as infra-v1.6.0 so downstream infra repos can bump
their module ref without having to track upstream changes.
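To make the tag-to-image mapping from item 1 concrete, here is a minimal shell sketch. It is not taken from keboola-deploy.yml; the function name `image_tags_for` is hypothetical, and it assumes `<git-tag-suffix>` is whatever follows the `keboola-deploy-` prefix of the pushed git tag.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a pushed git tag to the two GHCR image tags.
# Assumption: <git-tag-suffix> is the git tag minus its "keboola-deploy-" prefix.
IMAGE="ghcr.io/keboola/agnes-the-ai-analyst"

image_tags_for() {
  local git_tag="$1"
  case "$git_tag" in
    keboola-deploy-?*) ;;  # accept keboola-deploy-<something>
    *) echo "not a keboola-deploy-* tag: $git_tag" >&2; return 1 ;;
  esac
  local suffix="${git_tag#keboola-deploy-}"
  printf '%s:keboola-deploy-%s\n' "$IMAGE" "$suffix"   # per-deploy tag
  printf '%s:keboola-deploy-latest\n' "$IMAGE"         # floating tag the dev VM pins to
}

image_tags_for "keboola-deploy-foo"
```

Under that assumption the per-deploy image tag is identical to the git tag, which keeps the git audit trail and the registry contents easy to correlate.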
* feat(deploy): fetch optional Google OAuth credentials from Secret Manager
Mirrors the existing keboola-storage-token / agnes-<customer>-jwt-secret
pattern: VM SA reads google-oauth-client-{id,secret} secrets at boot
(if they exist + IAM is wired by caller via runtime_secrets) and writes
them into /opt/agnes/.env. Empty / missing / 403 → silent fallback
to "" so password and email auth keep working untouched.
Pairs with a downstream change in agnes-infra-keboola that adds the two
secret names to runtime_secrets, granting the Keboola VM SA secretAccessor
on them. The operator pre-creates the SM containers via `gcloud secrets create
google-oauth-client-{id,secret}` (one-time, out of band); the values stay
in SM forever, and rotation = `gcloud secrets versions add`.
This unblocks the Keboola agnes-dev deploy from PR #3 (infra) — without
GOOGLE_CLIENT_{ID,SECRET} in .env, app/auth/providers/google.is_available()
returns False and the Google sign-in button never even appears.
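The optional-secret behaviour described in this commit amounts to "try to read, fall back to an empty string". A minimal sketch, assuming a hypothetical wrapper named `fetch_optional_secret` (the startup script inlines the same `gcloud` call rather than defining a function):

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the optional-secret pattern.
# Any failure (secret missing, 403, gcloud error) is swallowed and yields "".
fetch_optional_secret() {
  local name="$1"
  gcloud secrets versions access latest --secret="$name" 2>/dev/null || echo ""
}

# Usage (commented out so the sketch has no side effects when run as-is):
# GOOGLE_CLIENT_ID="$(fetch_optional_secret google-oauth-client-id)"
# GOOGLE_CLIENT_SECRET="$(fetch_optional_secret google-oauth-client-secret)"
```

The required secrets deliberately do not use this wrapper: for the storage token and JWT secret a failed read should abort the boot instead of producing an empty value.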
163 lines
6.9 KiB
Smarty
#!/bin/bash
# Agnes VM startup script, templated by Terraform.
# Idempotent: runs on every boot.
set -euo pipefail
exec > /var/log/agnes-startup.log 2>&1
chmod 640 /var/log/agnes-startup.log  # defense in depth: not readable by non-root

CUSTOMER_NAME="${customer_name}"
IMAGE_REPO="${image_repo}"
IMAGE_TAG="${image_tag}"
UPGRADE_MODE="${upgrade_mode}"
TLS_MODE="${tls_mode}"
DOMAIN="${domain}"
ACME_EMAIL="${acme_email}"
DATA_SOURCE="${data_source}"
KEBOOLA_STACK_URL="${keboola_stack_url}"
SEED_ADMIN_EMAIL="${seed_admin_email}"
SEED_ADMIN_PASSWORD="${seed_admin_password}"
ROLE="${role}"
COMPOSE_REF="${compose_ref}"

echo "=== [Agnes $CUSTOMER_NAME $ROLE] Startup at $(date) ==="

# --- 1. Docker (install if missing) ---
if ! command -v docker &>/dev/null; then
  curl -fsSL https://get.docker.com | sh
fi
if ! docker compose version &>/dev/null; then
  apt-get update && apt-get install -y docker-compose-plugin
fi

# --- 2. Persistent data disk mount ---
DATA_DEV="/dev/disk/by-id/google-data"
DATA_MNT="/data"
if [ -b "$DATA_DEV" ]; then
  if ! blkid "$DATA_DEV" | grep -q ext4; then
    mkfs.ext4 -F "$DATA_DEV"
  fi
  mkdir -p "$DATA_MNT"
  mountpoint -q "$DATA_MNT" || mount -o discard,defaults "$DATA_DEV" "$DATA_MNT"
  grep -qF "$DATA_DEV" /etc/fstab || echo "$DATA_DEV $DATA_MNT ext4 discard,defaults,nofail 0 2" >> /etc/fstab
  mkdir -p "$DATA_MNT/state" "$DATA_MNT/analytics" "$DATA_MNT/extracts"
fi

# --- 3. App directory + docker-compose files from public repo ---
APP_DIR="/opt/agnes"
mkdir -p "$APP_DIR"
cd "$APP_DIR"

# Fetch docker-compose files pinned to $COMPOSE_REF (defaults to `main`; pin to a
# stable-YYYY.MM.N tag for reproducibility across VM rebuilds).
RAW_BASE="https://raw.githubusercontent.com/keboola/agnes-the-ai-analyst/$${COMPOSE_REF}"
curl -fsSL "$${RAW_BASE}/docker-compose.yml" -o docker-compose.yml
curl -fsSL "$${RAW_BASE}/docker-compose.prod.yml" -o docker-compose.prod.yml
# Overlay which binds `data` volume to host /data (persistent disk mounted above)
curl -fsSL "$${RAW_BASE}/docker-compose.host-mount.yml" -o docker-compose.host-mount.yml

# TLS overlay (Caddy + Let's Encrypt): fetch only when actually needed; surface failures
if [ "$TLS_MODE" = "caddy" ] && [ -n "$DOMAIN" ]; then
  curl -fsSL "$${RAW_BASE}/Caddyfile" -o Caddyfile
fi

# --- 4. Fetch secrets from Secret Manager (fail loudly if missing) ---
KEBOOLA_TOKEN=""
if [ "$DATA_SOURCE" = "keboola" ]; then
  # No `|| echo ""` fallback: if the token secret is missing, boot should fail
  # loudly rather than silently start an app that will fail sync cryptically later.
  KEBOOLA_TOKEN=$(gcloud secrets versions access latest --secret=keboola-storage-token)
fi
JWT_KEY=$(gcloud secrets versions access latest --secret=agnes-$${CUSTOMER_NAME}-jwt-secret)

# Optional Google OAuth credentials. If the operator has created
# google-oauth-client-{id,secret} secrets in the project's Secret Manager
# AND wired them via runtime_secrets in the calling Terraform, the VM SA can
# read them and they are written into .env so the Google sign-in flow works.
# Missing / 403 / empty → silent fallback to "" so password + email auth keep working.
GOOGLE_CLIENT_ID=$(gcloud secrets versions access latest --secret=google-oauth-client-id 2>/dev/null || echo "")
GOOGLE_CLIENT_SECRET=$(gcloud secrets versions access latest --secret=google-oauth-client-secret 2>/dev/null || echo "")

# AGNES_VERSION, RELEASE_CHANNEL, AGNES_COMMIT_SHA are baked into the image
# itself as ENV (see Dockerfile ARG/ENV + release.yml build-args). We do NOT
# set them here; doing so would override the image-level values with the
# floating tag name ("stable"/"dev"), hiding the real CalVer / git SHA.
# The app picks them up from the image's runtime environment.

# CADDY_TLS controls Caddyfile cert provisioning (see Caddyfile inline docs).
#   - tls_mode=caddy + ACME_EMAIL set → Let's Encrypt auto-issue (public domain)
#   - tls_mode=caddy + no ACME_EMAIL  → Caddy-managed self-signed (lab use)
#   - any other tls_mode              → leave CADDY_TLS unset, Caddyfile default
#                                       (cert-file mode for corporate PKI) applies.
# Operators wanting cert-file mode shouldn't set tls_mode at all on the dev
# instance: leave it "none" and let the corp-PKI rotate scripts handle certs.
CADDY_TLS_LINE=""
if [ "$TLS_MODE" = "caddy" ] && [ -n "$DOMAIN" ]; then
  if [ -n "$ACME_EMAIL" ]; then
    CADDY_TLS_LINE="CADDY_TLS=tls $ACME_EMAIL"
  else
    CADDY_TLS_LINE="CADDY_TLS=tls internal"
  fi
fi

cat > "$APP_DIR/.env" <<ENVEOF
JWT_SECRET_KEY=$JWT_KEY
DATA_DIR=$DATA_MNT
DATA_SOURCE=$DATA_SOURCE
KEBOOLA_STORAGE_TOKEN=$KEBOOLA_TOKEN
KEBOOLA_STACK_URL=$KEBOOLA_STACK_URL
SEED_ADMIN_EMAIL=$SEED_ADMIN_EMAIL
SEED_ADMIN_PASSWORD=$SEED_ADMIN_PASSWORD
LOG_LEVEL=info
DOMAIN=$DOMAIN
AGNES_TAG=$IMAGE_TAG
ACME_EMAIL=$ACME_EMAIL
GOOGLE_CLIENT_ID=$GOOGLE_CLIENT_ID
GOOGLE_CLIENT_SECRET=$GOOGLE_CLIENT_SECRET
$CADDY_TLS_LINE
ENVEOF
chmod 600 "$APP_DIR/.env"

# --- 5. Start Agnes ---
COMPOSE_PROFILES_ARG=""
if [ "$TLS_MODE" = "caddy" ] && [ -n "$DOMAIN" ]; then
  COMPOSE_PROFILES_ARG="--profile tls"
fi

COMPOSE_FILES="-f docker-compose.yml -f docker-compose.prod.yml -f docker-compose.host-mount.yml"

docker compose $COMPOSE_FILES $COMPOSE_PROFILES_ARG pull
docker compose $COMPOSE_FILES $COMPOSE_PROFILES_ARG up -d

# --- 6. Auto-upgrade via cron (pulls new image digest every 5 min) ---
if [ "$UPGRADE_MODE" = "auto" ]; then
  # Cron script reads AGNES_TAG from /opt/agnes/.env, so if the operator edits .env
  # (e.g. to pin a specific stable-YYYY.MM.N), cron picks it up immediately. No
  # drift between what compose up reads and what the digest-check inspects.
  cat > /usr/local/bin/agnes-auto-upgrade.sh <<'SCRIPTEOF'
#!/bin/bash
# Runs from cron: pulls new image if one is available, restarts containers.
set -euo pipefail
cd /opt/agnes
# Read AGNES_TAG from .env so it reflects any operator edits since boot.
# The file is not sourced wholesale: a line such as `CADDY_TLS=tls internal`
# has an unquoted space, so bash would try to run `internal` as a command
# and abort under `set -e`. docker compose reads .env on its own.
AGNES_TAG=$(sed -n 's/^AGNES_TAG=//p' /opt/agnes/.env | tail -n 1)
IMAGE="ghcr.io/keboola/agnes-the-ai-analyst:$${AGNES_TAG:-stable}"
COMPOSE_FILES="-f docker-compose.yml -f docker-compose.prod.yml -f docker-compose.host-mount.yml"
BEFORE=$(docker images --no-trunc --format '{{.Digest}}' "$IMAGE" | head -1)
docker compose $COMPOSE_FILES pull >/dev/null 2>&1
AFTER=$(docker images --no-trunc --format '{{.Digest}}' "$IMAGE" | head -1)
if [ "$BEFORE" != "$AFTER" ]; then
  echo "$(date): new image digest for $IMAGE - recreating containers"
  docker compose $COMPOSE_FILES up -d
  docker image prune -f >/dev/null 2>&1
fi
SCRIPTEOF
  chmod +x /usr/local/bin/agnes-auto-upgrade.sh

  # Install cron entry idempotently: remove any prior agnes-auto-upgrade line, then append ours.
  CRON_LINE="*/5 * * * * /usr/local/bin/agnes-auto-upgrade.sh >> /var/log/agnes-auto-upgrade.log 2>&1"
  (crontab -l 2>/dev/null | grep -v agnes-auto-upgrade || true; echo "$CRON_LINE") | crontab -
fi

echo "=== [Agnes $CUSTOMER_NAME $ROLE] Startup complete at $(date) ==="
docker compose ps
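
The cron installation in step 6 uses a filter-then-append pattern: drop any earlier line mentioning the job, then add the current one, so repeated boots never accumulate duplicates. A minimal sketch of the same idea against a plain file, so it can be run without touching a real crontab; the function name `install_line` and the temp file are hypothetical and not part of the script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: make sure exactly one line containing MARKER exists in FILE.
install_line() {
  local file="$1" marker="$2" line="$3"
  local tmp
  tmp="$(mktemp)"
  # Keep existing lines that do not mention the marker, then append the new line.
  # grep exits non-zero when it selects nothing (or the file is missing),
  # hence the `|| true`.
  { grep -vF -- "$marker" "$file" 2>/dev/null || true; printf '%s\n' "$line"; } > "$tmp"
  mv "$tmp" "$file"
}

CRON_LINE="*/5 * * * * /usr/local/bin/agnes-auto-upgrade.sh >> /var/log/agnes-auto-upgrade.log 2>&1"
demo="$(mktemp)"
install_line "$demo" agnes-auto-upgrade "$CRON_LINE"
install_line "$demo" agnes-auto-upgrade "$CRON_LINE"   # second run: still one entry
cat "$demo"
rm -f "$demo"
```

The startup script applies the same pattern directly to `crontab -l` and `crontab -` instead of a file.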