The earlier base.html edit only affected templates that extend base.html
(login.html via base_login.html). Most pages (dashboard, catalog,
admin_tables, admin_permissions, activity_center, corporate_memory, ...)
are standalone templates with their own <body>, so the badge never showed.
Fix: extracted the badge + fetch script into _version_badge.html partial,
included it before </body> in every full-page template. Consistent across
login, dashboard, admin, catalog, etc.
GCP rejected the policy with 'REDUCE_COUNT_FALSE cannot be applied to
metrics with value type DOUBLE' — because ALIGN_FRACTION_TRUE already
produces a fraction 0..1 per series, no need for an additional cross-series
reducer. Simplified: alert when the per-series fraction < 1 for 5 min.
Review M4 predicted this — uptime check filters needed double-checking
against live GCP.
v1.3.0 added google_monitoring_uptime_check_config + alert policies to the
module, but bootstrap-gcp.sh was not updated. Fresh customers (and the
first apply after upgrading existing customers) hit 403 on
monitoring.uptimeCheckConfigs.create.
Fix: enable monitoring.googleapis.com + grant roles/monitoring.editor to
the deploy SA. Idempotent (safe to re-run on existing projects).
UI now shows a small footer badge with:
- release channel + CalVer version (e.g. 'stable-2026.04.47')
- floating image tag (e.g. 'stable')
- time since last container restart (proxy for 'last deployed')
Backend:
- app/api/health.py: /api/health returns image_tag, commit_sha, deployed_at
- app/api/health.py: new /api/version endpoint (lightweight, no DB hit, for
footer badge polling)
Infra:
- startup-script.sh.tpl: resolves image digest from ghcr pull, derives
channel + version from the tag name, and writes AGNES_VERSION /
RELEASE_CHANNEL / AGNES_COMMIT_SHA into .env so the app can surface them
to the UI.
UI:
- app/web/templates/base.html: footer loads /api/version asynchronously and
renders '<channel>-<version> · <tag> · deployed <relative> (<UTC>)'.
Tooltip shows full detail (commit sha, schema version).
- docs/DEPLOYMENT.md: rewritten to pick between Terraform (managed) and
Docker Compose (OSS self-host). Old manual SSH-key-and-git-clone flow
replaced with compose-based instructions pointing at the persistent-disk
overlay and bootstrap endpoint.
- docs/ONBOARDING.md: section 4 now documents the new v1.4.0 variables
(runtime_secrets, firewall_ssh_source_ranges, notification_channel_ids,
compose_ref). Section 6 explains the /auth/bootstrap seed-user fix and
warns that destroy+apply reopens the bootstrap window until run again.
- README.md: Documentation list expanded — ONBOARDING.md first (recommended
path), DEPLOYMENT.md as the branching point, plus links to CONFIGURATION,
architecture, and QUICKSTART.
Bug: SEED_ADMIN_EMAIL creates a password-less user at app startup, which made
/auth/bootstrap return 403 '1 users already exist' on a fresh deployment —
leaving the operator no way to log in (the seed user has no password, and
/auth/token requires one).
Fix: bootstrap is now disabled only when at least one user has a
password_hash set. On a fresh deploy with a seed user:
- POST /auth/bootstrap { email: <matches seed>, password: X } → sets the
password on the seed user, promotes to admin, returns token.
- With a non-matching email, a new admin is created alongside the seed user.
Lock semantics: bootstrap self-deactivates as soon as any password is set.
Tests: 8 passing, including new test_bootstrap_activates_seed_user and
test_bootstrap_disabled_when_password_user_exists covering the two halves.
Critical fixes:
- C1: VM SA now gets secretmanager.secretAccessor only on specific secrets
(JWT + each entry in runtime_secrets). Previously project-wide.
- C3: chmod 640 on /var/log/agnes-startup.log (defense in depth)
- C4: Remove '|| echo ""' fallback on keboola-storage-token — boot now fails
loudly if the secret is missing instead of starting a broken app.
- C5: Cron auto-upgrade script sources /opt/agnes/.env for AGNES_TAG. If an
operator edits .env to pin a specific stable-YYYY.MM.N, cron picks it up
immediately with no drift. Removed AGNES_TAG from crontab entry.
- C7: explicit depends_on = [IAM bindings, secret_version] on VM — prevents
race where VM boots before IAM propagates.
Important fixes:
- I1: Split firewall into web (80/443 + conditional 8000) and ssh (port 22 with
configurable source_ranges, default IAP range only).
- I4: Fetch docker-compose files from compose_ref (default 'main'), so customers
can pin a specific tag for reproducibility.
- I5+I6: Merge order fixed — user-supplied dev_instances values now override
defaults (was the other way around). Dev tls_mode default flipped to 'none'.
- I7: Remove '|| true' on Caddyfile fetch; surface failures loudly.
- New acme_email variable (falls back to seed_admin_email if empty).
Out-of-module:
- Comments translated from Czech to English where applicable (M1).
- google_compute_resource_policy.daily_backup: daily snapshot at 02:00,
30-day retention, labels (app=agnes, customer=<name>)
- google_compute_disk_resource_policy_attachment.data_backup: attach policy
to each data disk (prod + dev)
- google_monitoring_uptime_check_config.health: per-VM /api/health uptime
check every 60s, 10s timeout
- google_monitoring_alert_policy.health_failure: alert when uptime check
fails for > 5 min
New opt-out: enable_monitoring = false (default true)
New opt-in: notification_channel_ids = [...] to wire alerts to email/Slack
Module API unchanged; existing customers pick up backups + monitoring on
next module upgrade. TF provider requirement unchanged.
Extracts branch name from GITHUB_REF, slugifies it, and adds as extra tag
on feature branch builds. Main branch is unaffected (no branch_slug output).
Enables dev_instances tfvar with image_tag pinning specific feature branches.
The CI smoke test failed because docker-compose.prod.yml forced a bind mount
to /data on the host — which doesn't exist on GitHub runners.
Split the bind mount into docker-compose.host-mount.yml, which is only
composed by the VM startup script (/data exists there, mounted from the
persistent disk). CI continues to use the default named volume.
Module startup script + auto-upgrade cron now compose all three:
-f docker-compose.yml -f docker-compose.prod.yml -f docker-compose.host-mount.yml
Watchtower container has Docker API mismatch (client 1.25 vs daemon 1.54+)
that can't be worked around without upstream fix. Simple cron job does the
same thing more reliably:
- Every 5 min: docker compose pull + detect digest change + up -d if changed
- Logs to /var/log/agnes-auto-upgrade.log
This removes the watchtower container and a Docker daemon dependency.
Without this override, docker-compose creates a named volume 'agnes_data'
on the boot disk, ignoring any persistent disk mounted at /data by the
VM startup script. This override makes the 'data' volume a bind mount
to host /data, so persistent disks work as expected.
Reads JWT_SECRET_KEY and KEBOOLA_STORAGE_TOKEN from Secret Manager,
combines with non-secret config, writes .env with chmod 600.
Run as part of VM startup or manually for rotation.
Creates agnes-deploy SA with Terraform-scoped roles, GCS tfstate bucket,
and generates a JSON key. Idempotent — safe to re-run.
Expanded .gitignore to block *-key.json files from ever being committed.
- Spec: pure self-deploy model with per-customer GCP project
- Public upstream repo with TF module; private template + per-customer repos
- Branch-aware dev VMs via dev_instances list
- Caddy TLS, Secret Manager for tokens, SA JSON key for CI (WIF follow-up)
- 6-phase implementation plan with bite-sized tasks
- Create empty .env before docker compose up in CI (env_file: .env is required)
- Mock get_jira_service in webhook HMAC test to isolate signature check
from Jira API availability — strict assert 200 instead of permissive 500