agnes-the-ai-analyst

Author	SHA1	Message	Date
ZdenekSrotyr	0643437ab8	Merge: /auth/bootstrap seed-user fix	2026-04-21 20:01:38 +02:00
ZdenekSrotyr	2b17973796	fix(auth): /auth/bootstrap activates seed users, disabled only by real password Bug: SEED_ADMIN_EMAIL creates a password-less user at app startup, which made /auth/bootstrap return 403 '1 users already exist' on a fresh deployment — leaving the operator no way to log in (the seed user has no password, and /auth/token requires one). Fix: bootstrap is now disabled only when at least one user has a password_hash set. On a fresh deploy with a seed user: - POST /auth/bootstrap { email: <matches seed>, password: X } → sets the password on the seed user, promotes to admin, returns token. - With a non-matching email, a new admin is created alongside the seed user. Lock semantics: bootstrap self-deactivates as soon as any password is set. Tests: 8 passing, including new test_bootstrap_activates_seed_user and test_bootstrap_disabled_when_password_user_exists covering the two halves.	2026-04-21 20:01:20 +02:00
ZdenekSrotyr	7245eedd23	Merge: code review fixes — scoped SA, fail-fast, firewall split, cron .env	2026-04-21 19:40:07 +02:00
ZdenekSrotyr	921094ae40	feat(infra): address code review — scoped SA, fail-fast secrets, firewall split, cron reads .env, merge fix Critical fixes: - C1: VM SA now gets secretmanager.secretAccessor only on specific secrets (JWT + each entry in runtime_secrets). Previously project-wide. - C3: chmod 640 on /var/log/agnes-startup.log (defense in depth) - C4: Remove '|| echo ""' fallback on keboola-storage-token — boot now fails loudly if the secret is missing instead of starting a broken app. - C5: Cron auto-upgrade script sources /opt/agnes/.env for AGNES_TAG. If an operator edits .env to pin a specific stable-YYYY.MM.N, cron picks it up immediately with no drift. Removed AGNES_TAG from crontab entry. - C7: explicit depends_on = [IAM bindings, secret_version] on VM — prevents race where VM boots before IAM propagates. Important fixes: - I1: Split firewall into web (80/443 + conditional 8000) and ssh (port 22 with configurable source_ranges, default IAP range only). - I4: Fetch docker-compose files from compose_ref (default 'main'), so customers can pin a specific tag for reproducibility. - I5+I6: Merge order fixed — user-supplied dev_instances values now override defaults (was the other way around). Dev tls_mode default flipped to 'none'. - I7: Remove '|| true' on Caddyfile fetch; surface failures loudly. - New acme_email variable (falls back to seed_admin_email if empty). Out-of-module: - Comments translated from Czech to English where applicable (M1).	2026-04-21 19:39:53 +02:00
ZdenekSrotyr	9962fc4d40	Merge: final deployment log iteration 2	2026-04-21 19:11:14 +02:00
ZdenekSrotyr	6470e23df3	docs: finalize deployment log — iteration 2 summary	2026-04-21 19:11:07 +02:00
ZdenekSrotyr	1073517969	Merge: onboarding race-condition fix	2026-04-21 19:10:12 +02:00
ZdenekSrotyr	0b4807a836	docs(onboarding): use 'gh repo create --clone' to avoid template-copy race Separate 'gh repo create --clone=false' + 'git clone' races with GitHub's template content propagation. '--clone' waits for it in one step.	2026-04-21 19:10:04 +02:00
ZdenekSrotyr	4501840893	Merge: onboarding docs — propagation, restore, monitoring	2026-04-21 19:06:27 +02:00
ZdenekSrotyr	3e9213bfc4	docs(onboarding): add module propagation, backup restore, monitoring setup - 'Propagating module changes' — explains ignore_changes + -replace workflow - 'Restoring from backup' — step-by-step disk swap from daily snapshot - 'Monitoring alerts' — wiring notification channels	2026-04-21 19:06:20 +02:00
ZdenekSrotyr	85bca573a7	Merge: daily backup snapshot + monitoring alerts	2026-04-21 19:02:07 +02:00
ZdenekSrotyr	0842debf8a	feat(infra): add daily backup snapshot + monitoring alerts - google_compute_resource_policy.daily_backup: daily snapshot at 02:00, 30-day retention, labels (app=agnes, customer=<name>) - google_compute_disk_resource_policy_attachment.data_backup: attach policy to each data disk (prod + dev) - google_monitoring_uptime_check_config.health: per-VM /api/health uptime check every 60s, 10s timeout - google_monitoring_alert_policy.health_failure: alert when uptime check fails for > 5 min New opt-out: enable_monitoring = false (default true) New opt-in: notification_channel_ids = [...] to wire alerts to email/Slack Module API unchanged; existing customers pick up backups + monitoring on next module upgrade. TF provider requirement unchanged.	2026-04-21 19:01:56 +02:00
ZdenekSrotyr	0ca8ed2bce	Merge: per-branch image tag :dev-<slug> for branch-aware dev deploys	2026-04-21 18:47:16 +02:00
ZdenekSrotyr	5188bd9127	ci: add per-branch image tag :dev-<slug> for branch-aware dev deploys Extracts branch name from GITHUB_REF, slugifies it, and adds as extra tag on feature branch builds. Main branch is unaffected (no branch_slug output). Enables dev_instances tfvar with image_tag pinning specific feature branches.	2026-04-21 18:47:01 +02:00
ZdenekSrotyr	1811a408de	Merge: fix CI smoke test — split host bind mount to separate overlay	2026-04-21 16:54:27 +02:00
ZdenekSrotyr	1acc89c486	fix(ci): move bind-mount of /data to separate overlay, fix CI smoke test The CI smoke test failed because docker-compose.prod.yml forced a bind mount to /data on the host — which doesn't exist on GitHub runners. Split the bind mount into docker-compose.host-mount.yml, which is only composed by the VM startup script (/data exists there, mounted from the persistent disk). CI continues to use the default named volume. Module startup script + auto-upgrade cron now compose all three: -f docker-compose.yml -f docker-compose.prod.yml -f docker-compose.host-mount.yml	2026-04-21 16:54:18 +02:00
ZdenekSrotyr	a3b4b43e47	Merge: deployment log with final state	2026-04-21 16:51:28 +02:00
ZdenekSrotyr	03dd81c825	docs: update deployment log with final state and onboarding workflow - Volume fix documented (Docker named volume → bind mount /data) - Watchtower → cron-based auto-upgrade - Final state snapshot of VMs, repos, tags, secrets - Onboarding flow summary for 2nd customer	2026-04-21 16:51:20 +02:00
ZdenekSrotyr	85c6b114b0	Merge: add ONBOARDING.md	2026-04-21 16:49:54 +02:00
ZdenekSrotyr	a44e11a5e2	docs: add ONBOARDING.md — end-to-end per-customer deployment guide	2026-04-21 16:49:45 +02:00
ZdenekSrotyr	3dcdc52faf	Merge: replace watchtower with cron, bump infra module to v1.1.0	2026-04-21 16:47:05 +02:00
ZdenekSrotyr	cbd85c52ed	fix(infra): replace watchtower with cron for auto-upgrade Watchtower container has Docker API mismatch (client 1.25 vs daemon 1.54+) that can't be worked around without upstream fix. Simple cron job does the same thing more reliably: - Every 5 min: docker compose pull + detect digest change + up -d if changed - Logs to /var/log/agnes-auto-upgrade.log This removes the watchtower container and a Docker daemon dependency.	2026-04-21 16:46:55 +02:00
ZdenekSrotyr	94b6a8eff2	Merge feature/multi-customer-deployment: multi-customer deployment infra - infra/modules/customer-instance/ — reusable Terraform module (tag infra-v1.0.0) - infra/examples/minimal/ — OSS self-host quickstart - scripts/bootstrap-gcp.sh — per-customer GCP setup - scripts/fetch-env-from-secrets.sh — VM-side .env from Secret Manager - docker-compose.prod.yml — bind data volume to host /data for persistent disks - docs/superpowers/specs/2026-04-21-multi-customer-deployment-spec.md - docs/superpowers/plans/2026-04-21-multi-customer-deployment.md - docs/superpowers/plans/2026-04-21-deployment-log.md	2026-04-21 16:43:06 +02:00
ZdenekSrotyr	52d63457ff	fix(prod): bind docker data volume to host /data for persistent disk Without this override, docker-compose creates a named volume 'agnes_data' on the boot disk, ignoring any persistent disk mounted at /data by the VM startup script. This override makes the 'data' volume a bind mount to host /data, so persistent disks work as expected.	2026-04-21 16:42:23 +02:00
ZdenekSrotyr	a2c05a5d97	infra: refactor Terraform into reusable customer-instance module Breaking changes: - infra/main.tf, variables.tf, outputs.tf, terraform.tfvars.example removed - Single-file monolith replaced by reusable module + example New structure: - infra/modules/customer-instance/ — the module: - main.tf: VMs, disks, firewall, Secret Manager, dedicated VM SA - variables.tf: prod_instance + dev_instances flexible schema - outputs.tf: IPs, SA email, JWT secret reference - startup-script.sh.tpl: bootstraps VM, fetches secrets, runs compose, adds Watchtower for auto-upgrade - infra/examples/minimal/ — OSS self-host quickstart using the module Supports: - Per-customer GCP project isolation - Branch-aware dev VMs via dev_instances list (any image_tag) - Persistent /data disk (rebuild-safe) - OS Login (no per-user SSH keys) - Caddy TLS mode (opt-in via tls_mode="caddy" + domain) - Watchtower auto-upgrade (opt-in via upgrade_mode="auto")	2026-04-21 16:18:35 +02:00
ZdenekSrotyr	0dd8b13d62	infra: add fetch-env-from-secrets.sh for VM-side .env generation Reads JWT_SECRET_KEY and KEBOOLA_STORAGE_TOKEN from Secret Manager, combines with non-secret config, writes .env with chmod 600. Run as part of VM startup or manually for rotation.	2026-04-21 16:18:35 +02:00
ZdenekSrotyr	5ad96e5f86	infra: add bootstrap-gcp.sh for per-customer GCP setup Creates agnes-deploy SA with Terraform-scoped roles, GCS tfstate bucket, and generates a JSON key. Idempotent — safe to re-run. Expanded .gitignore to block *-key.json files from ever being committed.	2026-04-21 16:18:35 +02:00
ZdenekSrotyr	e514f57267	Merge pull request #6 from keboola/dependabot/uv/python-multipart-0.0.26 chore(deps): bump python-multipart from 0.0.24 to 0.0.26	2026-04-21 15:27:25 +02:00
dependabot[bot]	6e93461918	chore(deps): bump python-multipart from 0.0.24 to 0.0.26 Bumps [python-multipart](https://github.com/Kludex/python-multipart) from 0.0.24 to 0.0.26. - [Release notes](https://github.com/Kludex/python-multipart/releases) - [Changelog](https://github.com/Kludex/python-multipart/blob/master/CHANGELOG.md) - [Commits](https://github.com/Kludex/python-multipart/compare/0.0.24...0.0.26) --- updated-dependencies: - dependency-name: python-multipart dependency-version: 0.0.26 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-04-21 13:26:19 +00:00
ZdenekSrotyr	e53de59a42	docs: multi-customer deployment spec + implementation plan - Spec: pure self-deploy model with per-customer GCP project - Public upstream repo with TF module; private template + per-customer repos - Branch-aware dev VMs via dev_instances list - Caddy TLS, Secret Manager for tokens, SA JSON key for CI (WIF follow-up) - 6-phase implementation plan with bite-sized tasks	2026-04-21 15:25:17 +02:00
ZdenekSrotyr	cf8528b5cf	Merge pull request #7 from keboola/dependabot/uv/authlib-1.6.11 chore(deps): bump authlib from 1.6.9 to 1.6.11	2026-04-21 15:24:57 +02:00
ZdenekSrotyr	bd6921c4d5	docs,tests: anonymize customer references Replace identifying customer names and infrastructure URLs in documentation and test fixtures with generic placeholders. Test semantics preserved.	2026-04-21 11:56:19 +02:00
dependabot[bot]	043ae4b378	chore(deps): bump authlib from 1.6.9 to 1.6.11 Bumps [authlib](https://github.com/authlib/authlib) from 1.6.9 to 1.6.11. - [Release notes](https://github.com/authlib/authlib/releases) - [Changelog](https://github.com/authlib/authlib/blob/v1.6.11/docs/changelog.rst) - [Commits](https://github.com/authlib/authlib/compare/v1.6.9...v1.6.11) --- updated-dependencies: - dependency-name: authlib dependency-version: 1.6.11 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-04-17 00:41:27 +00:00
ZdenekSrotyr	c74a1fab53	Merge pull request #4 from keboola/feature/v2-fastapi-duckdb-docker-cli test: comprehensive test suite — 1169 tests, 4 layers	2026-04-13 16:14:11 +02:00
ZdenekSrotyr	5bbd82bacd	fix: address Devin review — docker-e2e .env, jira webhook test isolation - Create empty .env before docker compose up in CI (env_file: .env is required) - Mock get_jira_service in webhook HMAC test to isolate signature check from Jira API availability — strict assert 200 instead of permissive 500	2026-04-13 14:36:31 +02:00
ZdenekSrotyr	863453b2e2	fix: address code review findings — duplicate fixture, JWT key length, async deprecation - Remove duplicate mock_extract_factory fixture in conftest.py - Use 32+ char JWT_SECRET_KEY everywhere (was 15 chars, triggered warnings) - Replace deprecated asyncio.get_event_loop() with asyncio.run() - Unify WebhookEventFactory sign methods (consistent json.dumps)	2026-04-13 13:47:51 +02:00
ZdenekSrotyr	12480b8c35	fix: graceful skip for telegram bot tests when log dir unavailable in CI	2026-04-13 13:31:51 +02:00
ZdenekSrotyr	98af8e2df3	fix: make bot.py FileHandler resilient to missing log directory	2026-04-13 13:28:59 +02:00
ZdenekSrotyr	0045f5d324	fix: ensure DATA_DIR and notifications dir exist before bot.py import in CI	2026-04-13 13:26:18 +02:00
ZdenekSrotyr	1a68decd4e	fix: patch BOT_LOG_FILE at import time for CI/xdist compatibility	2026-04-13 13:21:04 +02:00
ZdenekSrotyr	9a144f8291	fix: unify JWT_SECRET_KEY across all test modules for xdist stability	2026-04-12 14:28:17 +02:00
ZdenekSrotyr	ed58075419	Merge branch 'worktree-agent-a417e289' into feature/v2-fastapi-duckdb-docker-cli	2026-04-12 14:24:39 +02:00
ZdenekSrotyr	325f785ef4	fix: get_instance_name reads nested instance.name from YAML	2026-04-12 14:23:54 +02:00
ZdenekSrotyr	209643becb	fix: return filename instead of absolute path in upload responses	2026-04-12 14:23:51 +02:00
ZdenekSrotyr	31e210c7e3	fix: require admin/km_admin role for web admin pages	2026-04-12 14:23:47 +02:00
ZdenekSrotyr	01b5f80ef9	fix: restrict script deploy/execute to analyst role, undeploy to admin	2026-04-12 14:23:44 +02:00
ZdenekSrotyr	5bfff6616c	ci: add parallel test execution and nightly Docker E2E job	2026-04-12 14:15:46 +02:00
ZdenekSrotyr	2ec50b4e4f	test: add telegram API endpoint tests (verify, unlink, status)	2026-04-12 14:12:28 +02:00
ZdenekSrotyr	e25a7aba7d	fix: resolve JWT secret key test isolation issue Replace module-level SECRET_KEY cache with lazy _get_cached_secret_key() that re-reads env vars in test mode. This fixes 20 test failures caused by JWT secret mismatch when test modules load in different orders.	2026-04-12 14:05:41 +02:00
ZdenekSrotyr	833de96cd7	merge: resolve Block E conflicts in pytest.ini and conftest.py	2026-04-12 11:17:26 +02:00
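
Note on commit 2b17973796: the lock semantics described there (bootstrap stays open until some user has a password_hash set, a matching seed user is activated in place, a non-matching email creates a new admin) can be sketched roughly as below. This is a minimal illustration, not the repository's code: the User dataclass, the function names, the default role, and the use of PermissionError in place of the HTTP 403 are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class User:
    # Hypothetical in-memory stand-in for the real user record.
    email: str
    password_hash: Optional[str] = None
    role: str = "analyst"  # assumed default role, for illustration only


def bootstrap_enabled(users: List[User]) -> bool:
    # Bootstrap stays open until at least one user has a password set.
    # A password-less seed user therefore does not lock it.
    return not any(u.password_hash for u in users)


def bootstrap(users: List[User], email: str, password_hash: str) -> User:
    if not bootstrap_enabled(users):
        # Stands in for the 403 response mentioned in the commit message.
        raise PermissionError("bootstrap disabled: a user already has a password")
    for user in users:
        if user.email == email:
            # Matching seed user: set the password and promote to admin.
            user.password_hash = password_hash
            user.role = "admin"
            return user
    # Non-matching email: create a new admin alongside the seed user.
    admin = User(email=email, password_hash=password_hash, role="admin")
    users.append(admin)
    return admin
```

Once any call succeeds, bootstrap_enabled returns False for that user list, which mirrors the "self-deactivates as soon as any password is set" behaviour stated in the commit message.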


414 commits
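
Note on commit 5188bd9127: the per-branch tag derivation (take the branch name from GITHUB_REF, slugify it, emit :dev-<slug> for feature branches and nothing for main) might look roughly like the sketch below. The exact slug rules used by the CI workflow are not shown in the log, so the lowercase-and-hyphenate rule here is an assumption, as are the function names.

```python
import re
from typing import Optional


def branch_slug(github_ref: str) -> Optional[str]:
    # GITHUB_REF looks like "refs/heads/<branch>" for branch builds.
    prefix = "refs/heads/"
    if not github_ref.startswith(prefix):
        return None  # tags and pull-request refs get no branch slug
    branch = github_ref[len(prefix):]
    if branch == "main":
        return None  # main is unaffected, per the commit message
    # Assumed slug rule: lowercase, collapse anything outside [a-z0-9]
    # into single hyphens, and trim hyphens from both ends.
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    return slug or None


def dev_tag(github_ref: str) -> Optional[str]:
    # Extra image tag for feature-branch builds, e.g. "dev-my-branch".
    slug = branch_slug(github_ref)
    return f"dev-{slug}" if slug else None
```

A dev_instances entry could then pin image_tag to the value returned by dev_tag for the branch under test.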