agnes-the-ai-analyst

Author	SHA1	Message	Date
ZdenekSrotyr	e2eb51f657	ci(release): build image for all branches, not just feature/** (#19) * dryrun: intentional failing test (will be reverted) * feat(auth): optional SEED_ADMIN_PASSWORD to pre-hash seed admin (dev helper) Terraform gains enable_seed_password + seed_admin_password (sensitive) vars on the customer-instance module; when enabled the password is piped via startup-script into /opt/agnes/.env as SEED_ADMIN_PASSWORD. On first boot app/main.py argon2-hashes it onto the seed user so the admin can log in immediately without going through /auth/bootstrap. Never overwrites an existing password_hash, so it is safe against accidental reset on terraform apply. * ci(release): build :dev-<slug> on any branch, not just feature/ Before: only 'feature/' branches triggered release.yml, so pushing 'zs/my-edit' or 'fix/bug' did not publish an image. dev_instances entry pinning image_tag = 'dev-zs-my-edit' then crashed VM startup with 'image not found'. Now: any branch push (except main, which produces :stable) publishes :dev-<slug>. Slug strips a leading 'feature/' and replaces non-[a-z0-9-] with '-', keeping existing feature/** behavior identical. * Revert "dryrun: intentional failing test (will be reverted)" This reverts commit cf9cc06a7884bb401ff29fc5cb6d8baf84dc3daa.	2026-04-21 21:33:57 +02:00
ZdenekSrotyr	9a99a82e92	fix(infra): alert policy aggregation — drop cross_series_reducer GCP rejected the policy with 'REDUCE_COUNT_FALSE cannot be applied to metrics with value type DOUBLE' — because ALIGN_FRACTION_TRUE already produces a fraction 0..1 per series, no need for an additional cross-series reducer. Simplified: alert when the per-series fraction < 1 for 5 min. Review M4 predicted this — uptime check filters needed double-checking against live GCP.	2026-04-21 20:36:09 +02:00
ZdenekSrotyr	921094ae40	feat(infra): address code review — scoped SA, fail-fast secrets, firewall split, cron reads .env, merge fix Critical fixes: - C1: VM SA now gets secretmanager.secretAccessor only on specific secrets (JWT + each entry in runtime_secrets). Previously project-wide. - C3: chmod 640 on /var/log/agnes-startup.log (defense in depth) - C4: Remove '|| echo ""' fallback on keboola-storage-token; boot now fails loudly if the secret is missing instead of starting a broken app. - C5: Cron auto-upgrade script sources /opt/agnes/.env for AGNES_TAG. If an operator edits .env to pin a specific stable-YYYY.MM.N, cron picks it up immediately with no drift. Removed AGNES_TAG from crontab entry. - C7: explicit depends_on = [IAM bindings, secret_version] on VM; prevents race where VM boots before IAM propagates. Important fixes: - I1: Split firewall into web (80/443 + conditional 8000) and ssh (port 22 with configurable source_ranges, default IAP range only). - I4: Fetch docker-compose files from compose_ref (default 'main'), so customers can pin a specific tag for reproducibility. - I5+I6: Merge order fixed: user-supplied dev_instances values now override defaults (was the other way around). Dev tls_mode default flipped to 'none'. - I7: Remove '|| true' on Caddyfile fetch; surface failures loudly. - New acme_email variable (falls back to seed_admin_email if empty). Out-of-module: - Comments translated from Czech to English where applicable (M1).	2026-04-21 19:39:53 +02:00
ZdenekSrotyr	0842debf8a	feat(infra): add daily backup snapshot + monitoring alerts - google_compute_resource_policy.daily_backup: daily snapshot at 02:00, 30-day retention, labels (app=agnes, customer=<name>) - google_compute_disk_resource_policy_attachment.data_backup: attach policy to each data disk (prod + dev) - google_monitoring_uptime_check_config.health: per-VM /api/health uptime check every 60s, 10s timeout - google_monitoring_alert_policy.health_failure: alert when uptime check fails for > 5 min New opt-out: enable_monitoring = false (default true) New opt-in: notification_channel_ids = [...] to wire alerts to email/Slack Module API unchanged; existing customers pick up backups + monitoring on next module upgrade. TF provider requirement unchanged.	2026-04-21 19:01:56 +02:00
ZdenekSrotyr	a2c05a5d97	infra: refactor Terraform into reusable customer-instance module Breaking changes: - infra/main.tf, variables.tf, outputs.tf, terraform.tfvars.example removed - Single-file monolith replaced by reusable module + example New structure: - infra/modules/customer-instance/ — the module: - main.tf: VMs, disks, firewall, Secret Manager, dedicated VM SA - variables.tf: prod_instance + dev_instances flexible schema - outputs.tf: IPs, SA email, JWT secret reference - startup-script.sh.tpl: bootstraps VM, fetches secrets, runs compose, adds Watchtower for auto-upgrade - infra/examples/minimal/ — OSS self-host quickstart using the module Supports: - Per-customer GCP project isolation - Branch-aware dev VMs via dev_instances list (any image_tag) - Persistent /data disk (rebuild-safe) - OS Login (no per-user SSH keys) - Caddy TLS mode (opt-in via tls_mode="caddy" + domain) - Watchtower auto-upgrade (opt-in via upgrade_mode="auto")	2026-04-21 16:18:35 +02:00
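
Commit 921094ae40 (I5+I6) states that user-supplied dev_instances values now override the defaults, where previously the defaults won. The following is a minimal Python sketch of that merge order, not the module's actual Terraform. The names DEV_DEFAULTS and merge_dev_instance are hypothetical, and so is the "manual" value for upgrade_mode; only tls_mode = 'none' as the dev default comes from the log.

```python
# Hypothetical defaults for a dev instance. Only tls_mode="none" is stated
# in the commit log; the upgrade_mode value is an illustrative assumption.
DEV_DEFAULTS = {"tls_mode": "none", "upgrade_mode": "manual"}


def merge_dev_instance(user_values: dict) -> dict:
    """Merge one dev_instances entry over the defaults.

    In a dict literal, later keys win, so unpacking user_values last makes
    user-supplied settings override the defaults.
    """
    return {**DEV_DEFAULTS, **user_values}


# A user entry that pins an image tag and opts in to Caddy TLS.
merged = merge_dev_instance({"image_tag": "dev-zs-my-edit", "tls_mode": "caddy"})
```

Swapping the two unpacked dicts reproduces the old behavior described in the log, where a default silently replaced the value the user supplied. Terraform's built-in merge() follows the same rule: when a key appears in several arguments, the later argument takes precedence.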

5 commits
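
Commit e2eb51f657 describes the image tag rule in words: strip a leading 'feature/', replace every character outside [a-z0-9-] with '-', and publish as :dev-<slug>. The following is a minimal Python sketch of that rule as described, not the actual release.yml step. The function name dev_image_tag is hypothetical, and the sketch assumes no lowercasing happens first, since the log does not mention any.

```python
import re


def dev_image_tag(branch: str) -> str:
    """Hypothetical helper mirroring the slug rule from commit e2eb51f657."""
    # Strip a leading 'feature/' so existing feature/** tags stay identical.
    prefix = "feature/"
    slug = branch[len(prefix):] if branch.startswith(prefix) else branch
    # Replace every character outside [a-z0-9-] with '-'.
    # Assumption: uppercase letters are replaced too, not lowercased.
    slug = re.sub(r"[^a-z0-9-]", "-", slug)
    return "dev-" + slug


print(dev_image_tag("zs/my-edit"))  # dev-zs-my-edit
```

Under this rule 'feature/login' and 'login' would both map to dev-login, so two such branches would publish to the same tag.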