From 4799119c813735aa51e416231f73ee80151938d7 Mon Sep 17 00:00:00 2001
From: Petr Simecek
Date: Sat, 25 Apr 2026 23:19:00 +0200
Subject: [PATCH] feat(deploy): keboola-deploy tag-triggered workflow +
 Caddyfile LE/internal modes + dev_instances TLS support (#52)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* feat(deploy): keboola-deploy tag-triggered workflow + Caddyfile LE/internal modes + dev_instances TLS support

Three coordinated changes that together unblock Keboola's internal Agnes
deployment from the foot-gun where the dev VM tracks `:dev` (= last push
from anyone in the upstream repo).

1. .github/workflows/keboola-deploy.yml — new workflow

   Triggered ONLY on `keboola-deploy-*` git tag pushes (not on every
   branch push like release.yml). Builds an image and publishes two GHCR
   tags:

     ghcr.io/keboola/agnes-the-ai-analyst:keboola-deploy-
     ghcr.io/keboola/agnes-the-ai-analyst:keboola-deploy-latest

   The Keboola dev VM pins to `keboola-deploy-latest`; an operator deploys
   by `git tag keboola-deploy-foo && git push origin keboola-deploy-foo`.
   Audit trail lives in git tags (immutable, who-tagged-what-when), no
   PR-cycle needed for each deploy.

   Doesn't touch Vojta/Minas/David workflow — release.yml still builds
   `:dev-` for every branch push as before.

2. Caddyfile — parametrize TLS directive via $CADDY_TLS env var

   PR #51 hardcoded cert-file mode (`tls /certs/fullchain.pem ...`) for
   Groupon's corporate CA flow. That broke the Let's Encrypt path the
   module previously supported. Now:

     CADDY_TLS unset (default)   → cert-file mode (Groupon corp PKI)
     CADDY_TLS="tls user@x.com"  → Let's Encrypt auto-issue
     CADDY_TLS="tls internal"    → Caddy-managed self-signed (lab/dev)

   Single Caddyfile, three regimes, no per-deployment fork. Validated
   with `caddy validate` in all three modes.

3. customer-instance module — dev_instances TLS + auto-set CADDY_TLS

   - variables.tf: dev_instances object schema gains optional tls_mode +
     domain (mirroring prod_instance). Defaults to "none" + "" so existing
     callers without those fields keep current behavior.
   - startup-script.sh.tpl: when tls_mode="caddy" and DOMAIN is set, write
     CADDY_TLS=tls (or "tls internal" when ACME_EMAIL empty) into
     /opt/agnes/.env. Caddy then picks it up and the Caddyfile
     substitution flips the cert source.

For an LE deploy: set tls_mode="caddy", domain="agnes-dev.example.com",
ensure DNS A-record points at the VM, and acme_email is set on the module
(or seed_admin_email is, since acme_email defaults to it).

After this lands, tag as infra-v1.6.0 so downstream infra repos can bump
their module ref without needing the upstream change tracking.

* feat(deploy): fetch optional Google OAuth credentials from Secret Manager

Mirrors the existing keboola-storage-token / agnes--jwt-secret pattern:
VM SA reads google-oauth-client-{id,secret} secrets at boot (if they exist
+ IAM is wired by caller via runtime_secrets) and writes them into
/opt/agnes/.env. Empty / missing / 403 → silent fallback to "" so password
and email auth keep working untouched.

Pairs with downstream change in agnes-infra-keboola which adds the two
secret names to runtime_secrets, granting the Keboola VM SA secretAccessor
on them. Operator pre-creates the SM containers via
gcloud secrets create google-oauth-client-{id,secret} (one-time, out of
band) — values stay in SM forever; rotation = `gcloud secrets versions add`.

This unblocks the Keboola agnes-dev deploy from PR #3 (infra) — without
GOOGLE_CLIENT_{ID,SECRET} in .env, app/auth/providers/google.is_available()
returns False and the Google sign-in button never even appears.
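
The silent-fallback behaviour described above can be sketched in isolation.
This is a minimal illustration, not the module's code: `fetch_secret` is a
hypothetical stand-in for `gcloud secrets versions access latest --secret=NAME`,
and `read_optional_secret` and the sample value are assumptions made for the
example.

```shell
#!/bin/sh
# Hypothetical stub standing in for the gcloud call. It succeeds for one
# secret and fails (as a missing secret or a 403 would) for any other.
fetch_secret() {
  case "$1" in
    google-oauth-client-id) printf '%s\n' "example-client-id" ;;
    *) echo "secret not found or access denied: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: discard stderr and the failing exit status so that an
# unavailable secret becomes an empty string instead of aborting the boot.
read_optional_secret() {
  fetch_secret "$1" 2>/dev/null || echo ""
}

GOOGLE_CLIENT_ID=$(read_optional_secret google-oauth-client-id)
GOOGLE_CLIENT_SECRET=$(read_optional_secret google-oauth-client-secret)

# Show what would be written into .env for each variable.
printf 'GOOGLE_CLIENT_ID=%s\n' "$GOOGLE_CLIENT_ID"
printf 'GOOGLE_CLIENT_SECRET=%s\n' "$GOOGLE_CLIENT_SECRET"
```

With the stub above, the first variable carries the sample value and the
second is empty, which matches the case where only one secret exists or the
VM SA lacks access to the other.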
---
 .github/workflows/keboola-deploy.yml          | 98 +++++++++++++++++++
 Caddyfile                                     | 15 ++-
 .../customer-instance/startup-script.sh.tpl   | 27 +++++
 infra/modules/customer-instance/variables.tf  | 10 +-
 4 files changed, 146 insertions(+), 4 deletions(-)
 create mode 100644 .github/workflows/keboola-deploy.yml

diff --git a/.github/workflows/keboola-deploy.yml b/.github/workflows/keboola-deploy.yml
new file mode 100644
index 0000000..d5bd49b
--- /dev/null
+++ b/.github/workflows/keboola-deploy.yml
@@ -0,0 +1,98 @@
+name: Keboola Deploy
+
+# Tag-triggered build for Keboola's internal dev instance.
+#
+# Why a separate workflow: the default release.yml builds an image for *every* push
+# to *every* branch, which means Keboola's `agnes-dev` VM (pinned to `:dev` or
+# similar floating tag) sees whoever pushed last — Vojta, Minas, anyone. That
+# convenience for Groupon-side dev VMs (per-developer `dev--latest` aliases)
+# is a footgun for shared instances.
+#
+# This workflow runs ONLY when an operator explicitly creates a `keboola-deploy-*`
+# git tag. The image is published with two tags:
+#   - keboola-deploy- (immutable, audit trail in git)
+#   - keboola-deploy-latest (floating alias the VM tracks)
+#
+# Operator workflow:
+#   git checkout
+#   git tag keboola-deploy-2026-04-25-groups-test
+#   git push origin keboola-deploy-2026-04-25-groups-test
+#   # → image built, alias updated, agnes-dev cron picks it up within 5 min
+on:
+  push:
+    tags:
+      - "keboola-deploy-*"
+
+permissions:
+  contents: read
+  packages: write
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v5
+
+      - uses: actions/setup-python@v6
+        with:
+          python-version: "3.13"
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v7
+
+      - name: Install dependencies
+        run: uv pip install --system ".[dev]"
+
+      - name: Run tests
+        run: pytest tests/ -v --tb=short
+        env:
+          TESTING: "1"
+
+  build-and-push:
+    needs: test
+    runs-on: ubuntu-latest
+    outputs:
+      image_tag: ${{ steps.meta.outputs.tag }}
+    steps:
+      - uses: actions/checkout@v5
+
+      - name: Resolve tag + version
+        id: meta
+        run: |
+          TAG="${GITHUB_REF#refs/tags/}"
+          # Sanity: tag must start with keboola-deploy- (the `on:` filter already
+          # enforces this, but cheap belt-and-braces against future workflow edits).
+          case "$TAG" in
+            keboola-deploy-*) ;;
+            *) echo "::error::Tag $TAG does not match keboola-deploy-* — refusing to build"; exit 1 ;;
+          esac
+          # Package version: source of truth is pyproject.toml (same convention as
+          # release.yml). The git tag is the *deploy identifier*, package version
+          # is the *product identifier*.
+          PKG_VERSION=$(grep '^version' pyproject.toml | head -1 | sed -E 's/^version\s*=\s*"([^"]+)".*/\1/')
+          if [ -z "$PKG_VERSION" ]; then
+            echo "::error::Could not extract version from pyproject.toml"; exit 1
+          fi
+          echo "tag=${TAG}" >> "$GITHUB_OUTPUT"
+          echo "pkg_version=${PKG_VERSION}" >> "$GITHUB_OUTPUT"
+          echo "Building image for git tag: ${TAG} (package version ${PKG_VERSION})"
+
+      - name: Log in to GHCR
+        uses: docker/login-action@v4
+        with:
+          registry: ghcr.io
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Build and push
+        uses: docker/build-push-action@v7
+        with:
+          push: true
+          build-args: |
+            AGNES_VERSION=${{ steps.meta.outputs.pkg_version }}
+            RELEASE_CHANNEL=keboola-deploy
+            AGNES_COMMIT_SHA=${{ github.sha }}
+            AGNES_TAG=${{ steps.meta.outputs.tag }}
+          tags: |
+            ghcr.io/${{ github.repository }}:${{ steps.meta.outputs.tag }}
+            ghcr.io/${{ github.repository }}:keboola-deploy-latest
diff --git a/Caddyfile b/Caddyfile
index 5caa4e4..b5a242d 100644
--- a/Caddyfile
+++ b/Caddyfile
@@ -1,7 +1,16 @@
 {$DOMAIN:localhost} {
-    # Cert-file mode (corporate CA path). For Let's Encrypt, drop the
-    # `tls` directive entirely so Caddy auto-issues. See docs/DEPLOYMENT.md.
-    tls /certs/fullchain.pem /certs/privkey.pem {
+    # Cert provisioning. Driven by env var CADDY_TLS:
+    #   - unset (default) → cert-file mode for corporate PKI (rotated by
+    #     scripts/grpn/agnes-tls-rotate.sh into /data/state/certs/).
+    #   - "tls " → Let's Encrypt auto-issue, e.g. "tls ops@example.com"
+    #     (used by public-internet deployments like Keboola dev).
+    #   - "tls internal" → Caddy-managed self-signed cert (lab/dev only,
+    #     browser warning on every visit).
+    #
+    # The {$VAR:default} substitution lets one Caddyfile serve all three
+    # regimes without per-deployment forks. Caddyfile parses the substituted
+    # string as a directive, so the value MUST start with `tls `.
+    {$CADDY_TLS:tls /certs/fullchain.pem /certs/privkey.pem} {
         # Modern TLS only. Caddy default already excludes 1.0/1.1 in
         # most builds, but pin explicitly so a future Caddy default
         # change can't silently weaken our posture.
diff --git a/infra/modules/customer-instance/startup-script.sh.tpl b/infra/modules/customer-instance/startup-script.sh.tpl
index 9d6b740..fb9e6c3 100644
--- a/infra/modules/customer-instance/startup-script.sh.tpl
+++ b/infra/modules/customer-instance/startup-script.sh.tpl
@@ -69,12 +69,36 @@ if [ "$DATA_SOURCE" = "keboola" ]; then
 fi
 JWT_KEY=$(gcloud secrets versions access latest --secret=agnes-$${CUSTOMER_NAME}-jwt-secret)
 
+# Optional Google OAuth credentials. If the operator has created
+# google-oauth-client-{id,secret} secrets in the project's Secret Manager
+# AND wired them via runtime_secrets in the calling Terraform, the VM SA can
+# read them — write into .env so the Google sign-in flow works. Missing /
+# 403 / empty → silent fallback to "" so password + email auth keep working.
+GOOGLE_CLIENT_ID=$(gcloud secrets versions access latest --secret=google-oauth-client-id 2>/dev/null || echo "")
+GOOGLE_CLIENT_SECRET=$(gcloud secrets versions access latest --secret=google-oauth-client-secret 2>/dev/null || echo "")
+
 # AGNES_VERSION, RELEASE_CHANNEL, AGNES_COMMIT_SHA are baked into the image
 # itself as ENV (see Dockerfile ARG/ENV + release.yml build-args). We do NOT
 # set them here — doing so would override the image-level values with the
 # floating tag name ("stable"/"dev"), hiding the real CalVer / git SHA.
 # The app picks them up from the image's runtime environment.
 
+# CADDY_TLS controls Caddyfile cert provisioning (see Caddyfile inline docs).
+#   - tls_mode=caddy + ACME_EMAIL set → Let's Encrypt auto-issue (public domain)
+#   - tls_mode=caddy + no ACME_EMAIL → Caddy-managed self-signed (lab use)
+#   - any other tls_mode → leave CADDY_TLS unset, Caddyfile default
+#     (cert-file mode for corporate PKI) applies.
+# Operators wanting cert-file mode shouldn't set tls_mode at all on the dev
+# instance — leave it "none" and let the corp-PKI rotate scripts handle certs.
+CADDY_TLS_LINE=""
+if [ "$TLS_MODE" = "caddy" ] && [ -n "$DOMAIN" ]; then
+  if [ -n "$ACME_EMAIL" ]; then
+    CADDY_TLS_LINE="CADDY_TLS=tls $ACME_EMAIL"
+  else
+    CADDY_TLS_LINE="CADDY_TLS=tls internal"
+  fi
+fi
+
 cat > "$APP_DIR/.env" <