Commit graph

2 commits

Author SHA1 Message Date
ZdenekSrotyr
0842debf8a feat(infra): add daily backup snapshot + monitoring alerts
- google_compute_resource_policy.daily_backup: daily snapshot at 02:00,
  30-day retention, labels (app=agnes, customer=<name>)
- google_compute_disk_resource_policy_attachment.data_backup: attach policy
  to each data disk (prod + dev)
- google_monitoring_uptime_check_config.health: per-VM /api/health uptime
  check every 60s, 10s timeout
- google_monitoring_alert_policy.health_failure: alert when uptime check
  fails for > 5 min
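
A minimal sketch of how the snapshot policy and disk attachment in the first two bullets might be written with the Google provider. The resource types, the 02:00 start, the 30-day retention and the label keys come from the commit message. Everything else is an assumption for illustration: the variables `var.customer_name` and `var.region`, the policy naming scheme, and a `google_compute_disk.data` resource declared with `for_each`.

```hcl
# Sketch only: anything marked "hypothetical" is not taken from the commit.
resource "google_compute_resource_policy" "daily_backup" {
  name   = "${var.customer_name}-daily-backup" # hypothetical naming scheme
  region = var.region                          # hypothetical variable

  snapshot_schedule_policy {
    schedule {
      daily_schedule {
        days_in_cycle = 1
        start_time    = "02:00" # interpreted as UTC by the provider
      }
    }

    retention_policy {
      max_retention_days = 30
    }

    snapshot_properties {
      labels = {
        app      = "agnes"
        customer = var.customer_name # must be a valid (lowercase) label value
      }
    }
  }
}

resource "google_compute_disk_resource_policy_attachment" "data_backup" {
  # hypothetical: assumes the prod and dev data disks are declared with for_each
  for_each = google_compute_disk.data

  # The attachment is keyed by the policy name, not its self link.
  name = google_compute_resource_policy.daily_backup.name
  disk = each.value.name
  zone = each.value.zone
}
```

The attachment resource takes the policy's `name`, and the policy must be in the region that contains each disk's zone.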

New opt-out: enable_monitoring = false (default true)
New opt-in:  notification_channel_ids = [...] to wire alerts to email/Slack
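
The two variables could gate and wire the monitoring resources roughly as sketched below. The variable names, the `/api/health` path, the 60s period, the 10s timeout and the 5-minute window come from the commit message. The display names, the port, `var.project_id`, `var.vm_ip` and the aggregation settings are assumptions. The sketch also covers a single VM for brevity, whereas the commit describes one check per VM.

```hcl
variable "enable_monitoring" {
  type    = bool
  default = true # opt-out: set to false to skip uptime checks and alerts
}

variable "notification_channel_ids" {
  type    = list(string)
  default = [] # opt-in: alerts fire but notify nobody until channels are listed
}

resource "google_monitoring_uptime_check_config" "health" {
  count        = var.enable_monitoring ? 1 : 0
  display_name = "agnes-health" # hypothetical
  period       = "60s"
  timeout      = "10s"

  http_check {
    path = "/api/health"
    port = 80 # hypothetical: the commit does not state the port or TLS use
  }

  monitored_resource {
    type = "uptime_url"
    labels = {
      project_id = var.project_id # hypothetical variable
      host       = var.vm_ip      # hypothetical variable
    }
  }
}

resource "google_monitoring_alert_policy" "health_failure" {
  count        = var.enable_monitoring ? 1 : 0
  display_name = "agnes-health-failure" # hypothetical
  combiner     = "OR"

  conditions {
    display_name = "Uptime check failing for more than 5 minutes"

    condition_threshold {
      filter = join(" AND ", [
        "metric.type=\"monitoring.googleapis.com/uptime_check/check_passed\"",
        "resource.type=\"uptime_url\"",
        "metric.label.check_id=\"${google_monitoring_uptime_check_config.health[0].uptime_check_id}\"",
      ])
      comparison      = "COMPARISON_GT"
      threshold_value = 1
      duration        = "300s" # the "> 5 min" window from the commit message

      # Assumed aggregation: count checker locations reporting a failure.
      aggregations {
        alignment_period     = "1200s"
        per_series_aligner   = "ALIGN_NEXT_OLDER"
        cross_series_reducer = "REDUCE_COUNT_FALSE"
        group_by_fields      = ["resource.label.*"]
      }

      trigger { count = 1 }
    }
  }

  notification_channels = var.notification_channel_ids
}
```

Using `count` on both resources keeps the opt-out in one boolean. With `enable_monitoring = false` neither resource is created, so the `[0]` reference is never evaluated against an empty list.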

Existing module inputs and outputs are unchanged and both new variables
are optional, so existing customers pick up backups and monitoring on
their next module upgrade. The Terraform provider requirement is also
unchanged.
2026-04-21 19:01:56 +02:00
ZdenekSrotyr
a2c05a5d97 infra: refactor Terraform into reusable customer-instance module
Breaking changes:
- infra/main.tf, variables.tf, outputs.tf, terraform.tfvars.example removed
- Single-file monolith replaced by reusable module + example

New structure:
- infra/modules/customer-instance/ — the module:
  - main.tf: VMs, disks, firewall, Secret Manager, dedicated VM SA
  - variables.tf: flexible schema for prod_instance and dev_instances
  - outputs.tf: IPs, SA email, JWT secret reference
  - startup-script.sh.tpl: bootstraps VM, fetches secrets, runs compose,
    adds Watchtower for auto-upgrade
- infra/examples/minimal/ — OSS self-host quickstart using the module

Supports:
- Per-customer GCP project isolation
- Branch-aware dev VMs via dev_instances list (any image_tag)
- Persistent /data disk (rebuild-safe)
- OS Login (no per-user SSH keys)
- Caddy TLS mode (opt-in via tls_mode="caddy" + domain)
- Watchtower auto-upgrade (opt-in via upgrade_mode="auto")
2026-04-21 16:18:35 +02:00
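
For orientation, a call to the module from infra/examples/minimal/ might look like the sketch below. The names `prod_instance`, `dev_instances`, `image_tag`, `tls_mode`, `domain` and `upgrade_mode` appear in the commit message. The object fields, `project_id`, `customer_name` and all values are hypothetical, because the commit does not show the variable schema.

```hcl
# Sketch only: the real variable schema lives in
# infra/modules/customer-instance/variables.tf and may differ.
module "agnes" {
  # Relative path from infra/examples/minimal/ to the module directory.
  source = "../../modules/customer-instance"

  project_id    = "acme-agnes-prod" # hypothetical: one GCP project per customer
  customer_name = "acme"            # hypothetical

  prod_instance = {
    image_tag = "stable" # hypothetical field and value
  }

  # Branch-aware dev VMs: one entry per VM, each with its own image tag.
  dev_instances = [
    {
      name      = "dev-feature-x" # hypothetical field
      image_tag = "feature-x"     # hypothetical value
    },
  ]

  tls_mode     = "caddy"             # opt-in TLS; needs a domain
  domain       = "agnes.example.com" # hypothetical
  upgrade_mode = "auto"              # opt-in Watchtower auto-upgrade
}
```

Leaving out `tls_mode` and `upgrade_mode` would presumably keep both opt-in features disabled, since the commit describes them as opt-in.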