From f6c2012d5ba5e106c7721f401a52465d7534b987 Mon Sep 17 00:00:00 2001
From: ZdenekSrotyr <139972147+ZdenekSrotyr@users.noreply.github.com>
Date: Thu, 7 May 2026 11:36:36 +0200
Subject: [PATCH] infra(customer-instance): preserve operator AGNES_TAG / AGNES_TEMP_DIR (#214)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The startup script runs on every boot, but the metadata_startup_script
field is in lifecycle.ignore_changes, so a TF apply that changed
image_tag does NOT reach a long-lived VM until someone explicitly
recreates it. Meanwhile, operators commonly hand-edit /opt/agnes/.env
to pin a specific image (custom branch builds, staged rollouts).
Pre-fix, every boot rewrote .env from the baked-in template and
clobbered the operator's choice. Concretely, a stop+start triggered by
a machine_type change would reset AGNES_TAG to whatever was in the
template at first provision, regardless of the operator's intervening
edit.

Now the script reads the existing .env (when present) for AGNES_TAG
and AGNES_TEMP_DIR; when those keys are set, the existing values win
over the template-computed ones. When the preserved AGNES_TAG
disagrees with $IMAGE_TAG, the script logs an INFO line on stdout so
the boot leaves an audit trail for operators. Fresh provisions are
unchanged (no .env yet, so the template values land).

To force a TF-driven reset on an existing VM: rm /opt/agnes/.env and
reboot.

Cut as infra-v1.8.0: additive; downstream consumers opt in by bumping
the module ref.
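
The override rule described above can be sketched in isolation as plain
POSIX shell. This is an illustration only, not the code in the patch: the
function names read_env_key and effective_tag and the sample tag values are
hypothetical, and the template's `$${...}` (Terraform's escape for a literal
`${...}`) is written here as ordinary `${...}`.

```shell
#!/bin/sh
# Hypothetical helpers; the real startup script inlines this logic.

# read_env_key FILE KEY: print the value of the first KEY=... line with
# double quotes stripped. Prints nothing if the file or key is missing.
read_env_key() {
  [ -f "$1" ] || return 0
  grep -E "^$2=" "$1" | head -1 | cut -d= -f2- | tr -d '"' || true
}

# effective_tag FILE TEMPLATE_TAG: an existing, non-empty AGNES_TAG wins;
# otherwise fall back to the template-provided value.
effective_tag() {
  existing=$(read_env_key "$1" AGNES_TAG)
  printf '%s\n' "${existing:-$2}"
}

demo_dir=$(mktemp -d)

# Fresh provision: no .env yet, so the template value lands.
effective_tag "$demo_dir/.env" "stable-2026.05.1"   # prints stable-2026.05.1

# Operator pinned a tag by hand: the existing value is preserved.
printf 'AGNES_TAG="dev-branch-build"\n' > "$demo_dir/.env"
effective_tag "$demo_dir/.env" "stable-2026.05.1"   # prints dev-branch-build

rm -rf "$demo_dir"
```

Deleting the .env file returns the sketch to the first case, which mirrors
the "rm /opt/agnes/.env and reboot" reset path.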
---
 CHANGELOG.md                                  |  4 ++-
 .../customer-instance/startup-script.sh.tpl   | 29 ++++++++++++++++++-
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index f4498d7..79d32c6 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,7 +10,9 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C
 
 ## [Unreleased]
 
-## [0.44.1] — 2026-05-07
+### Internal
+
+- `infra/modules/customer-instance` (tag `infra-v1.8.0`): `startup-script.sh.tpl` no longer overwrites operator-edited `AGNES_TAG` / `AGNES_TEMP_DIR` in `/opt/agnes/.env` on every boot. Reads the existing values when present and lets them win over the template-computed `$IMAGE_TAG`. Pre-fix, an in-place TF action that stopped/started the VM (e.g. `machine_type` change) would re-run the startup script and clobber any manually-pinned image tag — operators had to re-edit the file post-restart. Fresh provisions still get the TF-driven values; the `.env` file's existence is the disambiguator. To force a TF-driven reset, `rm /opt/agnes/.env` and reboot.
 
 ### Fixed
 
diff --git a/infra/modules/customer-instance/startup-script.sh.tpl b/infra/modules/customer-instance/startup-script.sh.tpl
index a31a585..e89efd5 100644
--- a/infra/modules/customer-instance/startup-script.sh.tpl
+++ b/infra/modules/customer-instance/startup-script.sh.tpl
@@ -142,6 +142,32 @@ if [ "$TLS_MODE" = "caddy" ] && [ -n "$DOMAIN" ]; then
   fi
 fi
 
+# Preserve operator overrides on AGNES_TAG. Rationale: this script
+# runs on every boot (and the `metadata_startup_script` is in
+# `lifecycle.ignore_changes` so a TF apply that changed the
+# `image_tag` variable does NOT propagate to a long-lived VM until
+# someone explicitly recreates it). Operators commonly hand-edit
+# `/opt/agnes/.env` to pin a custom image tag (e.g. for a dev branch
+# build, or a staged rollout) — overwriting that on every reboot
+# clobbers their decision. Read the existing AGNES_TAG and let it
+# win when it disagrees with $IMAGE_TAG; ditto for AGNES_TEMP_DIR
+# (a deployment-specific path tweak operators sometimes set to
+# steer tempdirs onto a larger volume).
+EXISTING_AGNES_TAG=""
+EXISTING_AGNES_TEMP_DIR=""
+if [ -f "$APP_DIR/.env" ]; then
+  EXISTING_AGNES_TAG=$(grep -E '^AGNES_TAG=' "$APP_DIR/.env" | head -1 | cut -d= -f2- | tr -d '"' || true)
+  EXISTING_AGNES_TEMP_DIR=$(grep -E '^AGNES_TEMP_DIR=' "$APP_DIR/.env" | head -1 | cut -d= -f2- | tr -d '"' || true)
+fi
+EFFECTIVE_AGNES_TAG="$${EXISTING_AGNES_TAG:-$IMAGE_TAG}"
+if [ -n "$EXISTING_AGNES_TAG" ] && [ "$EXISTING_AGNES_TAG" != "$IMAGE_TAG" ]; then
+  echo "INFO: preserving operator-edited AGNES_TAG=$EXISTING_AGNES_TAG (TF variable said $IMAGE_TAG; rm /opt/agnes/.env to reset)"
+fi
+AGNES_TEMP_DIR_LINE=""
+if [ -n "$EXISTING_AGNES_TEMP_DIR" ]; then
+  AGNES_TEMP_DIR_LINE="AGNES_TEMP_DIR=\"$EXISTING_AGNES_TEMP_DIR\""
+fi
+
 cat > "$APP_DIR/.env" <