agnes-the-ai-analyst

Author	SHA1	Message	Date
ZdenekSrotyr	060335deba	docs(quickstart): add Hackathon section pointing to switch-dev-vm.sh and HACKATHON.md (#14) (#23)	2026-04-21 21:59:23 +02:00
ZdenekSrotyr	4f381dc103	fix(ci): propagate-infra-tag fail-soft on branch push / missing secret (#24) Job-level 'if: secrets.X != ""' did not prevent workflow from being scheduled on branch pushes (GitHub reports failure with 0 jobs in that case). Refactored: first step is a guard that checks both the tag ref pattern and the secret presence; downstream steps skip when the guard says no. Result: workflow now reports success with a clear warning annotation on branch pushes or when the secret is absent; only real infra-v* tag pushes with the secret set perform the bump.	2026-04-21 21:59:10 +02:00
ZdenekSrotyr	5c6a641de7	Merge pull request #22 from keboola/feature/seed-admin-password feat(auth): optional SEED_ADMIN_PASSWORD to pre-hash seed admin (dev helper)	2026-04-21 21:37:29 +02:00
ZdenekSrotyr	e2eb51f657	ci(release): build image for all branches, not just feature/** (#19) * dryrun: intentional failing test (will be reverted) * feat(auth): optional SEED_ADMIN_PASSWORD to pre-hash seed admin (dev helper) Terraform gains enable_seed_password + seed_admin_password (sensitive) vars on the customer-instance module; when enabled the password is piped via startup-script into /opt/agnes/.env as SEED_ADMIN_PASSWORD. On first boot app/main.py argon2-hashes it onto the seed user so the admin can log in immediately without going through /auth/bootstrap. Never overwrites an existing password_hash — safe against accidental reset on terraform apply. * ci(release): build :dev-<slug> on any branch, not just feature/ Before: only 'feature/' branches triggered release.yml, so pushing 'zs/my-edit' or 'fix/bug' did not publish an image. dev_instances entry pinning image_tag = 'dev-zs-my-edit' then crashed VM startup with 'image not found'. Now: any branch push (except main, which produces :stable) publishes :dev-<slug>. Slug strips a leading 'feature/' and replaces non-[a-z0-9-] with '-', keeping existing feature/** behavior identical. * Revert "dryrun: intentional failing test (will be reverted)" This reverts commit cf9cc06a7884bb401ff29fc5cb6d8baf84dc3daa.	2026-04-21 21:33:57 +02:00
ZdenekSrotyr	1ca5295d54	docs: add HACKATHON.md — condensed deploy + dev playbooks (#21) Written for both humans and AI agents — explicit commands, expected outputs, troubleshooting tables, 'safe to run anytime' vs 'requires thought' sections, pitfalls checklist. Three parts: 1. Deploy for a new customer (45 min target, 7 steps) 2. Develop against Agnes (branch → image → dev VM loop, common tasks) 3. AI agent checklist (guardrails, verification, common pitfalls) Complements the deep docs (ONBOARDING.md, DEPLOYMENT.md, architecture.md) with a practical quick-reference for hackathon-style deploys.	2026-04-21 21:33:06 +02:00
ZdenekSrotyr	ada9fb75f6	chore: add switch-dev-vm.sh helper for hackathon (#20)	2026-04-21 21:33:02 +02:00
ZdenekSrotyr	2cbffce85f	ci: propagate infra-v* tags to template repo + auto-merge rules (#17) * dryrun: verify per-branch GHCR tag * ci: propagate infra-v* tag bumps to template repo On push of any infra-v* tag, opens a PR in keboola/agnes-infra-template that bumps the module ref in terraform/main.tf. Auto-merge rules in the template (Renovate + CI validate + GitHub native auto-merge) land it without manual work on patch/minor bumps. Requires repo secret TEMPLATE_REPO_TOKEN (fine-grained PAT with Contents:write + Pull requests:write on keboola/agnes-infra-template). Fail-soft: if secret is missing the job is skipped and Renovate on the template repo picks up the new tag on its next cycle as a fallback. * docs(onboarding): 'Keeping the template up-to-date' maintainer section Documents the two mechanisms (upstream release hook + Renovate), the required repo settings (allow_auto_merge, validate.yml gate), the TOKEN secret setup, and the one-time setup checklist. Notes the difference between template repo (auto-merge on) and customer infra repos (human approval).	2026-04-21 21:32:58 +02:00
ZdenekSrotyr	96bd06ba00	feat(auth): optional SEED_ADMIN_PASSWORD to pre-hash seed admin (dev helper) Terraform gains enable_seed_password + seed_admin_password (sensitive) vars on the customer-instance module; when enabled the password is piped via startup-script into /opt/agnes/.env as SEED_ADMIN_PASSWORD. On first boot app/main.py argon2-hashes it onto the seed user so the admin can log in immediately without going through /auth/bootstrap. Never overwrites an existing password_hash — safe against accidental reset on terraform apply.	2026-04-21 21:32:22 +02:00
ZdenekSrotyr	e4f6910398	Merge: real CalVer + commit SHA in UI badge	2026-04-21 21:00:42 +02:00
ZdenekSrotyr	1c7cc8aa29	fix(image): add AGNES_COMMIT_SHA build-arg to Dockerfile + release.yml Completes the previous commit — bakes the full git SHA into the image ENV at build time so the UI badge shows a real commit, not a sha256 digest (which was the floating manifest digest and unhelpful for debugging).	2026-04-21 21:00:30 +02:00
ZdenekSrotyr	af6761f33e	fix(version): bake AGNES_VERSION/CHANNEL/COMMIT_SHA into image ENV Before: startup script wrote AGNES_VERSION=stable (the floating tag name) into .env, which overrode the image's build-time ENV AGNES_VERSION=2026.04.47. UI badge showed 'stable-stable' instead of 'stable-2026.04.47'. After: - Dockerfile ARG/ENV for AGNES_COMMIT_SHA (alongside existing VERSION + CHANNEL) - release.yml passes github.sha as AGNES_COMMIT_SHA build-arg - Startup script no longer writes these three into .env; the app reads them from the image ENV set at build time. Result: badge displays 'stable-2026.04.47 · stable · <time> ago' with the real CalVer, and the commit SHA tooltip points at an actual commit rather than the floating manifest digest.	2026-04-21 21:00:04 +02:00
ZdenekSrotyr	7553f77e55	Merge: version badge partial on all full-page templates	2026-04-21 20:52:04 +02:00
ZdenekSrotyr	432e7695b3	feat(ui): version badge as shared partial, injected into every full-page template The earlier base.html edit only affected templates that extend base.html (login.html via base_login.html). Most pages (dashboard, catalog, admin_tables, admin_permissions, activity_center, corporate_memory, ...) are standalone templates with their own <body>, so the badge never showed. Fix: extracted the badge + fetch script into _version_badge.html partial, included it before </body> in every full-page template. Consistent across login, dashboard, admin, catalog, etc.	2026-04-21 20:51:55 +02:00
ZdenekSrotyr	dbac3e698c	Merge: alert policy reducer fix	2026-04-21 20:36:21 +02:00
ZdenekSrotyr	9a99a82e92	fix(infra): alert policy aggregation — drop cross_series_reducer GCP rejected the policy with 'REDUCE_COUNT_FALSE cannot be applied to metrics with value type DOUBLE' — because ALIGN_FRACTION_TRUE already produces a fraction 0..1 per series, no need for an additional cross-series reducer. Simplified: alert when the per-series fraction < 1 for 5 min. Review M4 predicted this — uptime check filters needed double-checking against live GCP.	2026-04-21 20:36:09 +02:00
ZdenekSrotyr	717f40c218	Merge: bootstrap monitoring role fix	2026-04-21 20:32:59 +02:00
ZdenekSrotyr	4ab0838ba2	fix(bootstrap): grant monitoring.editor + enable monitoring API v1.3.0 added google_monitoring_uptime_check_config + alert policies to the module, but bootstrap-gcp.sh was not updated. Fresh customers (and the first apply after upgrading existing customers) hit 403 on monitoring.uptimeCheckConfigs.create. Fix: enable monitoring.googleapis.com + grant roles/monitoring.editor to the deploy SA. Idempotent (safe to re-run on existing projects).	2026-04-21 20:32:50 +02:00
ZdenekSrotyr	3fb17a13bb	Merge: workflow-driven recreate + docs	2026-04-21 20:24:40 +02:00
ZdenekSrotyr	1a55167234	docs: workflow-driven VM recreate for startup-script propagation - ONBOARDING.md: replace 'propagating module changes' section with two explicit options — workflow_dispatch with recreate_targets (recommended, CI audit trail), or local terraform apply -replace (emergency). Adds a 'do not' section banning manual .env edits on VMs. - deployment-log.md: iteration 4 summary (version badge + module v1.5.0 + workflow_dispatch).	2026-04-21 20:24:31 +02:00
ZdenekSrotyr	11c03f7235	Merge: version badge in footer + /api/version	2026-04-21 20:19:51 +02:00
ZdenekSrotyr	b091cf7003	feat(ui): version badge in footer + /api/version endpoint UI now shows a small footer badge with: - release channel + CalVer version (e.g. 'stable-2026.04.47') - floating image tag (e.g. 'stable') - time since last container restart (proxy for 'last deployed') Backend: - app/api/health.py: /api/health returns image_tag, commit_sha, deployed_at - app/api/health.py: new /api/version endpoint (lightweight, no DB hit, for footer badge polling) Infra: - startup-script.sh.tpl: resolves image digest from ghcr pull, derives channel + version from the tag name, and writes AGNES_VERSION / RELEASE_CHANNEL / AGNES_COMMIT_SHA into .env so the app can surface them to the UI. UI: - app/web/templates/base.html: footer loads /api/version asynchronously and renders '<channel>-<version> · <tag> · deployed <relative> (<UTC>)'. Tooltip shows full detail (commit sha, schema version).	2026-04-21 20:19:40 +02:00
ZdenekSrotyr	2743de6114	Merge: deployment log iteration 3	2026-04-21 20:09:27 +02:00
ZdenekSrotyr	cdd959b19f	docs(log): add iteration 3 — review, bootstrap fix, docs sweep, infra-v1.4.0	2026-04-21 20:09:13 +02:00
ZdenekSrotyr	c1227df990	Merge: docs sweep — DEPLOYMENT.md rewrite, ONBOARDING v1.4.0, README links	2026-04-21 20:08:23 +02:00
ZdenekSrotyr	0121354596	docs: refresh DEPLOYMENT.md and ONBOARDING.md for infra-v1.4.0 - docs/DEPLOYMENT.md: rewritten to pick between Terraform (managed) and Docker Compose (OSS self-host). Old manual SSH-key-and-git-clone flow replaced with compose-based instructions pointing at the persistent-disk overlay and bootstrap endpoint. - docs/ONBOARDING.md: section 4 now documents the new v1.4.0 variables (runtime_secrets, firewall_ssh_source_ranges, notification_channel_ids, compose_ref). Section 6 explains the /auth/bootstrap seed-user fix and warns that destroy+apply reopens the bootstrap window until run again. - README.md: Documentation list expanded — ONBOARDING.md first (recommended path), DEPLOYMENT.md as the branching point, plus links to CONFIGURATION, architecture, and QUICKSTART.	2026-04-21 20:07:43 +02:00
ZdenekSrotyr	0643437ab8	Merge: /auth/bootstrap seed-user fix	2026-04-21 20:01:38 +02:00
ZdenekSrotyr	2b17973796	fix(auth): /auth/bootstrap activates seed users, disabled only by real password Bug: SEED_ADMIN_EMAIL creates a password-less user at app startup, which made /auth/bootstrap return 403 '1 users already exist' on a fresh deployment — leaving the operator no way to log in (the seed user has no password, and /auth/token requires one). Fix: bootstrap is now disabled only when at least one user has a password_hash set. On a fresh deploy with a seed user: - POST /auth/bootstrap { email: <matches seed>, password: X } → sets the password on the seed user, promotes to admin, returns token. - With a non-matching email, a new admin is created alongside the seed user. Lock semantics: bootstrap self-deactivates as soon as any password is set. Tests: 8 passing, including new test_bootstrap_activates_seed_user and test_bootstrap_disabled_when_password_user_exists covering the two halves.	2026-04-21 20:01:20 +02:00
ZdenekSrotyr	7245eedd23	Merge: code review fixes — scoped SA, fail-fast, firewall split, cron .env	2026-04-21 19:40:07 +02:00
ZdenekSrotyr	921094ae40	feat(infra): address code review — scoped SA, fail-fast secrets, firewall split, cron reads .env, merge fix Critical fixes: - C1: VM SA now gets secretmanager.secretAccessor only on specific secrets (JWT + each entry in runtime_secrets). Previously project-wide. - C3: chmod 640 on /var/log/agnes-startup.log (defense in depth) - C4: Remove '|| echo ""' fallback on keboola-storage-token — boot now fails loudly if the secret is missing instead of starting a broken app. - C5: Cron auto-upgrade script sources /opt/agnes/.env for AGNES_TAG. If an operator edits .env to pin a specific stable-YYYY.MM.N, cron picks it up immediately with no drift. Removed AGNES_TAG from crontab entry. - C7: explicit depends_on = [IAM bindings, secret_version] on VM — prevents race where VM boots before IAM propagates. Important fixes: - I1: Split firewall into web (80/443 + conditional 8000) and ssh (port 22 with configurable source_ranges, default IAP range only). - I4: Fetch docker-compose files from compose_ref (default 'main'), so customers can pin a specific tag for reproducibility. - I5+I6: Merge order fixed — user-supplied dev_instances values now override defaults (was the other way around). Dev tls_mode default flipped to 'none'. - I7: Remove '|| true' on Caddyfile fetch; surface failures loudly. - New acme_email variable (falls back to seed_admin_email if empty). Out-of-module: - Comments translated from Czech to English where applicable (M1).	2026-04-21 19:39:53 +02:00
ZdenekSrotyr	9962fc4d40	Merge: final deployment log iteration 2	2026-04-21 19:11:14 +02:00
ZdenekSrotyr	6470e23df3	docs: finalize deployment log — iteration 2 summary	2026-04-21 19:11:07 +02:00
ZdenekSrotyr	1073517969	Merge: onboarding race-condition fix	2026-04-21 19:10:12 +02:00
ZdenekSrotyr	0b4807a836	docs(onboarding): use 'gh repo create --clone' to avoid template-copy race Separate 'gh repo create --clone=false' + 'git clone' races with GitHub's template content propagation. '--clone' waits for it in one step.	2026-04-21 19:10:04 +02:00
ZdenekSrotyr	4501840893	Merge: onboarding docs — propagation, restore, monitoring	2026-04-21 19:06:27 +02:00
ZdenekSrotyr	3e9213bfc4	docs(onboarding): add module propagation, backup restore, monitoring setup - 'Propagating module changes' — explains ignore_changes + -replace workflow - 'Restoring from backup' — step-by-step disk swap from daily snapshot - 'Monitoring alerts' — wiring notification channels	2026-04-21 19:06:20 +02:00
ZdenekSrotyr	85bca573a7	Merge: daily backup snapshot + monitoring alerts	2026-04-21 19:02:07 +02:00
ZdenekSrotyr	0842debf8a	feat(infra): add daily backup snapshot + monitoring alerts - google_compute_resource_policy.daily_backup: daily snapshot at 02:00, 30-day retention, labels (app=agnes, customer=<name>) - google_compute_disk_resource_policy_attachment.data_backup: attach policy to each data disk (prod + dev) - google_monitoring_uptime_check_config.health: per-VM /api/health uptime check every 60s, 10s timeout - google_monitoring_alert_policy.health_failure: alert when uptime check fails for > 5 min New opt-out: enable_monitoring = false (default true) New opt-in: notification_channel_ids = [...] to wire alerts to email/Slack Module API unchanged; existing customers pick up backups + monitoring on next module upgrade. TF provider requirement unchanged.	2026-04-21 19:01:56 +02:00
ZdenekSrotyr	0ca8ed2bce	Merge: per-branch image tag :dev-<slug> for branch-aware dev deploys	2026-04-21 18:47:16 +02:00
ZdenekSrotyr	5188bd9127	ci: add per-branch image tag :dev-<slug> for branch-aware dev deploys Extracts branch name from GITHUB_REF, slugifies it, and adds as extra tag on feature branch builds. Main branch is unaffected (no branch_slug output). Enables dev_instances tfvar with image_tag pinning specific feature branches.	2026-04-21 18:47:01 +02:00
ZdenekSrotyr	1811a408de	Merge: fix CI smoke test — split host bind mount to separate overlay	2026-04-21 16:54:27 +02:00
ZdenekSrotyr	1acc89c486	fix(ci): move bind-mount of /data to separate overlay, fix CI smoke test The CI smoke test failed because docker-compose.prod.yml forced a bind mount to /data on the host — which doesn't exist on GitHub runners. Split the bind mount into docker-compose.host-mount.yml, which is only composed by the VM startup script (/data exists there, mounted from the persistent disk). CI continues to use the default named volume. Module startup script + auto-upgrade cron now compose all three: -f docker-compose.yml -f docker-compose.prod.yml -f docker-compose.host-mount.yml	2026-04-21 16:54:18 +02:00
ZdenekSrotyr	a3b4b43e47	Merge: deployment log with final state	2026-04-21 16:51:28 +02:00
ZdenekSrotyr	03dd81c825	docs: update deployment log with final state and onboarding workflow - Volume fix documented (Docker named volume → bind mount /data) - Watchtower → cron-based auto-upgrade - Final state snapshot of VMs, repos, tags, secrets - Onboarding flow summary for 2nd customer	2026-04-21 16:51:20 +02:00
ZdenekSrotyr	85c6b114b0	Merge: add ONBOARDING.md	2026-04-21 16:49:54 +02:00
ZdenekSrotyr	a44e11a5e2	docs: add ONBOARDING.md — end-to-end per-customer deployment guide	2026-04-21 16:49:45 +02:00
ZdenekSrotyr	3dcdc52faf	Merge: replace watchtower with cron, bump infra module to v1.1.0	2026-04-21 16:47:05 +02:00
ZdenekSrotyr	cbd85c52ed	fix(infra): replace watchtower with cron for auto-upgrade Watchtower container has Docker API mismatch (client 1.25 vs daemon 1.54+) that can't be worked around without upstream fix. Simple cron job does the same thing more reliably: - Every 5 min: docker compose pull + detect digest change + up -d if changed - Logs to /var/log/agnes-auto-upgrade.log This removes the watchtower container and a Docker daemon dependency.	2026-04-21 16:46:55 +02:00
ZdenekSrotyr	94b6a8eff2	Merge feature/multi-customer-deployment: multi-customer deployment infra - infra/modules/customer-instance/ — reusable Terraform module (tag infra-v1.0.0) - infra/examples/minimal/ — OSS self-host quickstart - scripts/bootstrap-gcp.sh — per-customer GCP setup - scripts/fetch-env-from-secrets.sh — VM-side .env from Secret Manager - docker-compose.prod.yml — bind data volume to host /data for persistent disks - docs/superpowers/specs/2026-04-21-multi-customer-deployment-spec.md - docs/superpowers/plans/2026-04-21-multi-customer-deployment.md - docs/superpowers/plans/2026-04-21-deployment-log.md	2026-04-21 16:43:06 +02:00
ZdenekSrotyr	52d63457ff	fix(prod): bind docker data volume to host /data for persistent disk Without this override, docker-compose creates a named volume 'agnes_data' on the boot disk, ignoring any persistent disk mounted at /data by the VM startup script. This override makes the 'data' volume a bind mount to host /data, so persistent disks work as expected.	2026-04-21 16:42:23 +02:00
ZdenekSrotyr	a2c05a5d97	infra: refactor Terraform into reusable customer-instance module Breaking changes: - infra/main.tf, variables.tf, outputs.tf, terraform.tfvars.example removed - Single-file monolith replaced by reusable module + example New structure: - infra/modules/customer-instance/ — the module: - main.tf: VMs, disks, firewall, Secret Manager, dedicated VM SA - variables.tf: prod_instance + dev_instances flexible schema - outputs.tf: IPs, SA email, JWT secret reference - startup-script.sh.tpl: bootstraps VM, fetches secrets, runs compose, adds Watchtower for auto-upgrade - infra/examples/minimal/ — OSS self-host quickstart using the module Supports: - Per-customer GCP project isolation - Branch-aware dev VMs via dev_instances list (any image_tag) - Persistent /data disk (rebuild-safe) - OS Login (no per-user SSH keys) - Caddy TLS mode (opt-in via tls_mode="caddy" + domain) - Watchtower auto-upgrade (opt-in via upgrade_mode="auto")	2026-04-21 16:18:35 +02:00


589 commits
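Commits e2eb51f657 and 5188bd9127 describe how a branch name becomes a `:dev-<slug>` tag. The rule is to skip main, strip a leading 'feature/', and replace anything outside [a-z0-9-] with '-'. The Python below sketches that rule. It is not the release.yml code, and the function name `dev_image_tag` is hypothetical. The commits do not say whether the branch is lowercased first, so this sketch does not lowercase, which means uppercase letters also become '-'.

```python
import re
from typing import Optional


def dev_image_tag(github_ref: str) -> Optional[str]:
    """Return a ':dev-<slug>' tag name for a branch push, or None (hypothetical helper)."""
    prefix = "refs/heads/"
    if not github_ref.startswith(prefix):
        return None  # tag pushes and other refs get no dev image in this sketch
    branch = github_ref[len(prefix):]
    if branch == "main":
        return None  # main publishes :stable instead of a dev tag
    if branch.startswith("feature/"):
        branch = branch[len("feature/"):]  # keeps feature/** slugs unchanged
    # Anything outside lowercase letters, digits and '-' becomes '-'.
    slug = re.sub(r"[^a-z0-9-]", "-", branch)
    return f"dev-{slug}"


if __name__ == "__main__":
    print(dev_image_tag("refs/heads/zs/my-edit"))
```

The example branch from the commit message, 'zs/my-edit', maps to 'dev-zs-my-edit'. That is the image_tag a dev_instances entry would pin.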
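The fail-soft guard from 4f381dc103 reduces to two checks. The ref must be an infra-v* tag, and TEMPLATE_REPO_TOKEN must be non-empty. The Python below is a minimal rendering of that decision. The real guard is a workflow step whose code is not shown in the log, so the helper `propagate_guard` is hypothetical. The `::warning::` prefix is the GitHub Actions workflow command that produces a warning annotation.

```python
import fnmatch
import os
from typing import Tuple


def propagate_guard(github_ref: str, template_token: str) -> Tuple[bool, str]:
    """Return (should_run, message); a False result is a skip, never a failure."""
    # Only real infra-v* tag pushes are candidates for the template bump.
    if not fnmatch.fnmatchcase(github_ref, "refs/tags/infra-v*"):
        return False, "::warning::not an infra-v* tag push, skipping template bump"
    # Without the PAT, leave the bump to Renovate on the template repo.
    if not template_token:
        return False, "::warning::TEMPLATE_REPO_TOKEN not set, Renovate will pick up the tag"
    return True, "infra tag and token present, bumping template"


if __name__ == "__main__":
    run, message = propagate_guard(
        os.environ.get("GITHUB_REF", ""),
        os.environ.get("TEMPLATE_REPO_TOKEN", ""),
    )
    print(message)
```

In a workflow, the guard step would publish its decision through `$GITHUB_OUTPUT`. Later steps would then use a condition such as `if: steps.guard.outputs.run == 'true'`. The step id `guard` and output name `run` are placeholders. Because the guard only ever skips, the job still reports success on branch pushes.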
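In 96bd06ba00, SEED_ADMIN_PASSWORD is hashed onto the seed user on first boot, and an existing password_hash is never overwritten. The sketch below shows only that idempotency rule. `User`, `apply_seed_password` and the injected `hash_password` callable are hypothetical stand-ins for whatever app/main.py actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class User:
    email: str
    password_hash: Optional[str] = None


def apply_seed_password(
    seed_user: User,
    env: Mapping[str, str],
    hash_password: Callable[[str], str],
) -> bool:
    """Hash SEED_ADMIN_PASSWORD onto the seed user once; return True if it was set."""
    password = env.get("SEED_ADMIN_PASSWORD", "")
    if not password:
        return False  # feature disabled: nothing to apply
    if seed_user.password_hash:
        return False  # never overwrite an existing hash (safe on re-apply)
    seed_user.password_hash = hash_password(password)
    return True
```

If the argon2-cffi package is installed, `argon2.PasswordHasher().hash` is one candidate for `hash_password`, with `os.environ` as `env`. Calling the function on every boot is harmless because it returns False once a hash exists.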
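The lock semantics from 2b17973796 work as follows. Bootstrap stays open until some user has a password_hash. A matching email activates the seed user, and any other email creates a new admin. The sketch below models this in memory only. Token issuance and the HTTP layer are left out, `BootstrapClosed` stands in for the 403 response, and all names are hypothetical. Email matching here is an exact string comparison, which is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class User:
    email: str
    role: str = "user"
    password_hash: Optional[str] = None


class BootstrapClosed(Exception):
    """Bootstrap is locked (the real endpoint answers 403)."""


def bootstrap(
    users: List[User],
    email: str,
    password: str,
    hash_password: Callable[[str], str],
) -> User:
    # Lock: bootstrap self-deactivates once any user has a password set.
    # Password-less seed users do not count.
    if any(u.password_hash for u in users):
        raise BootstrapClosed("a user with a password already exists")
    target = next((u for u in users if u.email == email), None)
    if target is None:
        # Non-matching email: create a new admin alongside the seed user.
        target = User(email=email)
        users.append(target)
    target.password_hash = hash_password(password)
    target.role = "admin"  # seed user is promoted; new user starts as admin
    return target
```

After the first successful call, the chosen user has a password_hash, so every later call raises `BootstrapClosed`. This is the behaviour the commit's two new tests cover.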
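b091cf7003 describes the footer badge as '<channel>-<version> · <tag> · deployed <relative> (<UTC>)'. The real rendering is JavaScript in the templates. This Python sketch only illustrates the format. The relative-time wording ('2h ago', '5 min ago') is an assumption, and `badge_text` and `relative_age` are hypothetical names.

```python
from datetime import datetime, timezone


def relative_age(deployed_at: datetime, now: datetime) -> str:
    """Coarse time-since-deploy string; the exact wording is an assumption."""
    seconds = max(0, int((now - deployed_at).total_seconds()))
    if seconds < 60:
        return f"{seconds}s ago"
    if seconds < 3600:
        return f"{seconds // 60} min ago"
    if seconds < 86400:
        return f"{seconds // 3600}h ago"
    return f"{seconds // 86400}d ago"


def badge_text(channel: str, version: str, image_tag: str,
               deployed_at: datetime, now: datetime) -> str:
    """Compose '<channel>-<version> · <tag> · deployed <relative> (<UTC>)'."""
    utc = deployed_at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return (f"{channel}-{version} · {image_tag} · "
            f"deployed {relative_age(deployed_at, now)} ({utc})")


if __name__ == "__main__":
    deployed = datetime(2026, 4, 21, 19, 0, tzinfo=timezone.utc)
    now = datetime(2026, 4, 21, 21, 0, tzinfo=timezone.utc)
    print(badge_text("stable", "2026.04.47", "stable", deployed, now))
```

Passing version='stable' reproduces the 'stable-stable' badge that af6761f33e fixed. In that bug, the floating tag name written into .env overrode the CalVer baked into the image.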