diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0a67166..5a1e557 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,9 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C
 
 ## [Unreleased]
 
+### Fixed
+- **`scripts/ops/agnes-tls-rotate.sh` now chowns `/data/state/certs/` to UID 999 (the `agnes` user inside the app image) on every run.** Previously the script only `mkdir -p`'d and `chmod 700`'d the directory, so ownership depended on a race over who created it first: root when systemd fired the timer before docker-compose-up, or UID 999 when the container's volume init touched it first. When root won, the resulting `drwx------ root:root` directory was unreadable by the UID-999 container, `_read_agnes_ca_pem()` returned `None`, and the `/install` setup prompt silently dropped the cross-platform TLS trust block (Step 0 from #137). Operators on those VMs ended up with no client-side cert bootstrap and a broken `claude plugin marketplace add` against the self-signed host. The chown is unconditional and idempotent (`|| true` for hosts where the numeric GID can't be set), so re-running the timer self-heals existing VMs without a manual `chown` by the operator. Files inside the directory keep their existing modes: `fullchain.pem` is `0644` (world-readable, so root- or 999-owned both work for the agnes container) and `privkey.pem` is `0600` (only Caddy reads it, and Caddy's container runs as root).
+
 ## [0.23.0] — 2026-04-30
 
 ### Added
diff --git a/scripts/ops/agnes-tls-rotate.sh b/scripts/ops/agnes-tls-rotate.sh
index 4592d3c..b3482cf 100755
--- a/scripts/ops/agnes-tls-rotate.sh
+++ b/scripts/ops/agnes-tls-rotate.sh
@@ -36,6 +36,17 @@ set -a; . /opt/agnes/.env; set +a
 
 CERT_DIR=/data/state/certs
 mkdir -p "$CERT_DIR"
+# Match the agnes UID baked into the app image (Dockerfile: useradd --uid 999).
+# Without this, whoever happens to win the create race (this script as root
+# vs. the app container's first volume-init touch as 999) decides ownership;
+# when root wins, mode 700 leaves the container unable to read its own certs
+# and `_read_agnes_ca_pem()` silently returns None, suppressing the trust-
+# bootstrap block in the /install setup prompt. `|| true` keeps the script
+# resilient on hosts where the GID is reserved (chgrp on a non-existent
+# numeric GID is fine on Linux but pedantically fails on some BSD-derived
+# tooling); if the chown itself fails we keep going and surface the
+# resulting permission error from the next refetch step instead.
+chown 999:999 "$CERT_DIR" || true
 chmod 700 "$CERT_DIR"
 
 CHANGED=0