Critical fixes:
- C1: VM SA now gets secretmanager.secretAccessor only on specific secrets
(JWT + each entry in runtime_secrets). Previously project-wide.
- C3: chmod 640 on /var/log/agnes-startup.log (defense in depth)
- C4: Remove '|| echo ""' fallback on keboola-storage-token — boot now fails
loudly if the secret is missing instead of starting a broken app.
- C5: Cron auto-upgrade script sources /opt/agnes/.env for AGNES_TAG. If an
operator edits .env to pin a specific stable-YYYY.MM.N, cron picks it up
immediately with no drift. Removed AGNES_TAG from crontab entry.
- C7: explicit depends_on = [IAM bindings, secret_version] on VM — prevents
race where VM boots before IAM propagates.
Important fixes:
- I1: Split firewall into web (80/443 + conditional 8000) and ssh (port 22 with
configurable source_ranges, default IAP range only).
- I4: Fetch docker-compose files from compose_ref (default 'main'), so customers
can pin a specific tag for reproducibility.
- I5+I6: Merge order fixed — user-supplied dev_instances values now override
defaults (was the other way around). Dev tls_mode default flipped to 'none'.
- I7: Remove '|| true' on Caddyfile fetch; surface failures loudly.
- New acme_email variable (falls back to seed_admin_email if empty).
Out-of-module:
- Comments translated from Czech to English where applicable (M1).
- google_compute_resource_policy.daily_backup: daily snapshot at 02:00,
30-day retention, labels (app=agnes, customer=<name>)
- google_compute_disk_resource_policy_attachment.data_backup: attach policy
to each data disk (prod + dev)
- google_monitoring_uptime_check_config.health: per-VM /api/health uptime
check every 60s, 10s timeout
- google_monitoring_alert_policy.health_failure: alert when uptime check
fails for > 5 min
New opt-out: enable_monitoring = false (default true)
New opt-in: notification_channel_ids = [...] to wire alerts to email/Slack
Module API unchanged; existing customers pick up backups + monitoring on
next module upgrade. TF provider requirement unchanged.