agnes-the-ai-analyst

Author	SHA1	Message	Date
Petr Simecek	d7bd710ca2	docs(claude): non-negotiable CHANGELOG.md update rule + [Unreleased] skeleton (#59) CLAUDE.md gains a "Changelog discipline — non-negotiable" section above "Git Commits & Pull Requests". Codifies the rule that every PR touching user-visible behavior must update CHANGELOG.md under [Unreleased] in the same PR, with concrete instructions for which sections to use, how to mark breaking changes, and what counts as user-visible. CHANGELOG.md gets an [Unreleased] skeleton above [0.11.0] so the next PR has somewhere obvious to land its bullet, plus the inaugural [Unreleased] entry documenting this very rule (eats its own dog food). The rule is intentionally strict ("no exceptions, no follow-ups"): soft "should" rules erode under pressure; binding rules survive PR churn. Reviewers should bounce PRs that violate it, same as they'd bounce a PR with no test changes for new logic.	2026-04-26 01:10:32 +02:00
Petr Simecek	598f186eb1	release(0.11.0): reset to pre-1.0 semver + first changelog (#58) The version = "2.x" strings in earlier pyproject.toml snapshots were arbitrary placeholders from the initial scaffold (cookiecutter default), not a reflection of API maturity. Resetting to 0.11.0 to signal pre-1.0 status: public surface (CLI flags, REST endpoints, instance.yaml schema, extract.duckdb contract) may still shift between minor versions. CalVer image tags (stable-YYYY.MM.N, dev-YYYY.MM.N) continue from CI; semver tags (v0.X.Y) are cut at release boundaries and reference the same commit as a stable-* tag from the same day. CHANGELOG.md replaces the old CalVer draft format with Keep a Changelog + semver. The 0.11.0 entry curates everything currently in main: - Auth: Workspace groups, password reset, PAT, magic-link, seed admin pwd - Deploy: keboola-deploy workflow, Caddy/LE/cert-file TLS, dev_instances TLS, optional Google OAuth from SM, LOCAL_DEV_MODE, /setup wizard - CLI: wheel distribution, auto-update, --version, --dry-run, gzip - Data: remote query (BQ+DuckDB), business metrics, OpenAPI snapshot test - Security: padak-security.md audit batch + urllib3 + argon2-cffi - Two BREAKING items called out (Caddy profile rename, Caddyfile default cert mode flipped to cert-file)	2026-04-26 01:05:55 +02:00
Petr Simecek	c25fd41bf7	feat(auth): Google Workspace groups on /profile + tag-triggered Keboola deploy workflow (#56 ) * feat(auth): display Google Workspace groups on /profile - Request cloud-identity.groups.readonly scope in Google OAuth - Fetch groups via Cloud Identity API after callback; tolerate 4xx (non-Workspace tenants) and network errors — never break login - Store result in Starlette session as google_groups - Replace /profile redirect with a real profile page rendering account details (email, name, role) and the group list; show a friendly empty state when no groups are available - Tests: helper parsing + 403 + exception paths; profile page smoke test; updated the old redirect test * test: remove stale /profile redirect tests Cherry-pick of Zdeněk's 4f7e4cd ("display Google Workspace groups on /profile") replaces the /profile redirect with a real profile page — but only updated one of three tests that expected the old behaviour. These two tests in test_admin_tokens_ui.py and test_pat.py were left asserting `/profile → 302 /tokens`, which now returns `/profile → 302 /login?next=%2Fprofile` for unauth users (the standard auth guard) or `/profile → 200 HTML` for authenticated users. Removed both rather than patched — coverage for the new behaviour already exists in tests/test_auth_providers.py (added by the same commit). The /tokens render assertions in the deleted test_pat.py case are redundant with test_admin_tokens_ui.py's own /tokens UI tests. * fix(auth): Google groups search query needs parent + labels predicates Cloud Identity Groups Search API returns 400 INVALID_ARGUMENT when the CEL query lacks the required `parent == 'customers/<id>'` predicate AND a `'<label>' in labels` membership predicate. Zdeněk's original 4f7e4cd query had only `member_key_id == '<email>'` — every fetch silently returned [] and the /profile groups list was always empty. 
Fix: build the query with all three required pieces: parent == 'customers/my_customer' (alias = caller's own Workspace org; no need to look up customer ID) member_key_id == '<email>' (filter to this user's memberships) 'cloudidentity.googleapis.com/groups.discussion_forum' in labels (Workspace mailing-list groups — the common case; security-group coverage is a follow-up) Also: log the full error body (not truncated to 200 chars) and the query string so the next time Google rejects something we can diagnose in one log line instead of a re-deploy. Caught when first agnes-dev login completed normally (HTTP 302) but app log showed `Google groups fetch returned 400 for petr@keboola.com: {"error":{"code":400,"message":"Request contains an invalid argument."}}` on the same VM (kids-ai-data-analysis / agnes-dev.keboola.com). Reference: https://cloud.google.com/identity/docs/reference/rest/v1/groups/search * feat(web): add Profile link to user dropdown menu The /profile page (Zdeněk's 4f7e4cd cherry-pick) renders a real profile view including Google Workspace groups, but had no entry point in the UI — users could only reach it by typing the URL manually. Add a "Profile" menu item between the user header (email + role) and "My tokens" so the page is discoverable. Side effect: cleaned up the leftover `or _path.startswith('/profile')` condition on the "My tokens" active class, which dated from the old /profile → /tokens redirect (removed in c789617). Now each menu item owns its own active state. * fix: profile-link tests + .env quoting for CADDY_TLS Two issues caught by Keboola's first agnes-dev deploy + agnes-auto-upgrade cron run: 1. tests/test_web_ui.py — two negative assertions ("href=/profile" NOT in body) date from when /profile was a redirect-only stub. Now /profile is a real page (groups display) AND has a dropdown menu link, so the negative assertions flip to positive. Same for ">Profile<" text in the non-admin nav test. 2. 
startup-script.sh.tpl — CADDY_TLS line must be QUOTED in .env, because agnes-auto-upgrade.sh sources .env via `set -a; . .env; set +a` and bash treats `KEY=value with spaces` as `KEY=value` followed by `with` and `spaces` exec attempts. Symptom: cron log spam `/opt/agnes/.env: line 14: petr@keboola.com: command not found`, the cron exits non-zero, and no auto-upgrade ever happens. Caddy itself reads the value fine because docker-compose env_file=.env parses key=value properly without shell-evaluating the rest. Fix: emit `CADDY_TLS="tls <email>"` instead of `CADDY_TLS=tls <email>`. Both the cron source and docker-compose env_file accept the quoted form; cron stops failing. * fix(auth): use searchTransitiveGroups + security label for non-admin user Three bugs in the original cherry-pick + my prior fix attempt, all caught by a stdlib probe script (scripts/debug/probe_google_groups.py) run locally with a Playground-issued OAuth token: 1. Wrong endpoint. `groups:search` is the admin "find groups in org" endpoint and 400s for non-admin users regardless of query. Switched to `groups/-/memberships:searchTransitiveGroups` which is the user-perspective "what groups am I in" endpoint. 2. Wrong label. Querying with `cloudidentity.googleapis.com/groups.discussion_forum` returns 403 "Insufficient permissions to retrieve memberships" even on the new endpoint — Workspace policy denies non-admin reads of discussion-forum groups. Switching to `groups.security` returns 200 with the actual membership list. Empirically every Workspace group at Keboola carries BOTH labels, so the security filter sees the full set anyway. Confirmed with the probe script. 3. Wrong response shape. `searchTransitiveGroups` returns {"memberships": [...]}, not {"groups": [...]}. Parser updated accordingly. Also adds scripts/debug/probe_google_groups.py — stdlib-only standalone probe that hits 6 candidate endpoints with a user OAuth token. 
Saved a deploy cycle (~10 min) per query iteration; future API-syntax debugging should start there. Verified end-to-end: petr@keboola.com login on agnes-dev returns 5 groups (LIC-1PASSWORD, ROLE_ATLASSIAN_, etc.) via the probe; once deployed, the same will populate session["google_groups"] and render on /profile. test(auth): update Google groups parser fixture to match searchTransitiveGroups shape Mock payload was `{"groups": [...]}` (the shape `groups:search` returns). After switching to `groups/-/memberships:searchTransitiveGroups` in the prior commit, the actual response is `{"memberships": [...]}` and the parser iterates that key. Test now mirrors the real shape. The per-item structure (groupKey.id + displayName) is unchanged, so the expected output dict stays the same: [{"id": "...", "name": "..."}]. * docs(auth): add docs/auth-groups.md — Google Workspace groups runbook Captures the non-obvious bits: the GCP-side setup checklist (Cloud Identity API + scope on consent screen + Internal user type), the `security` vs `discussion_forum` label trap (the latter 403s for non-admins, the former 200s — one of those is a 4-iteration debug session and shouldn't have to be repeated), where groups are stored (session, not DB) and how to refresh (re-login), plus how to use the probe script for future API-syntax issues. Deliberately stops short of explaining "what is Cloud Identity" or "what is OAuth scope" — those belong in Google's own docs, not ours. 
* docs(claude): document release workflows + module versioning + recreate trick New "Release & deploy workflows" section in CLAUDE.md covers what didn't exist anywhere in the repo before: - Distinction between release.yml (auto-build per push) vs the new keboola-deploy.yml (tag-triggered, explicit deploy only) — plus when to use which (per-developer convenience vs shared dev VM safety) - Module versioning (infra-vX.Y.Z) and the bump-after-merge dance - The lifecycle.ignore_changes [metadata_startup_script] gotcha and how to force a recreate via workflow_dispatch's recreate_targets input All generic — no customer hostnames, project IDs, IPs. Customer-specific deploy steps belong in the consuming infra repo's README. Also: cross-reference docs/auth-groups.md from the Authentication section so future Claude sessions find the Workspace-groups runbook without grepping. --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-04-26 00:56:44 +02:00
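The response-shape fix described in this commit (`searchTransitiveGroups` returns `{"memberships": [...]}`, each item carrying `groupKey.id` and `displayName`) can be sketched as below. This is a minimal illustration, not the repository's parser: the function name `parse_transitive_groups` and the fallback to the group id when `displayName` is missing are assumptions.

```python
def parse_transitive_groups(payload: dict) -> list:
    """Turn a searchTransitiveGroups-style payload into [{"id", "name"}, ...].

    Hypothetical helper: only the "memberships" key and the
    groupKey.id / displayName fields come from the commit message.
    """
    groups = []
    for item in payload.get("memberships", []):
        group_id = item.get("groupKey", {}).get("id")
        if not group_id:
            # Skip malformed entries instead of failing the whole login.
            continue
        # Assumption: fall back to the id when no display name is present.
        groups.append({"id": group_id, "name": item.get("displayName", group_id)})
    return groups


sample = {
    "memberships": [
        {"groupKey": {"id": "team@example.com"}, "displayName": "Team"},
        {"displayName": "No key"},
    ]
}
print(parse_transitive_groups(sample))
```

A payload in the old `{"groups": [...]}` shape yields an empty list here, which matches the symptom the commit describes before the parser was updated.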
Petr Simecek	4799119c81	feat(deploy): keboola-deploy tag-triggered workflow + Caddyfile LE/internal modes + dev_instances TLS support (#52 ) * feat(deploy): keboola-deploy tag-triggered workflow + Caddyfile LE/internal modes + dev_instances TLS support Three coordinated changes that together unblock Keboola's internal Agnes deployment from the foot-gun where the dev VM tracks `:dev` (= last push from anyone in the upstream repo). 1. .github/workflows/keboola-deploy.yml — new workflow Triggered ONLY on `keboola-deploy-` git tag pushes (not on every branch push like release.yml). Builds an image and publishes two GHCR tags: ghcr.io/keboola/agnes-the-ai-analyst:keboola-deploy-<git-tag-suffix> ghcr.io/keboola/agnes-the-ai-analyst:keboola-deploy-latest The Keboola dev VM pins to `keboola-deploy-latest`; an operator deploys by `git tag keboola-deploy-foo && git push origin keboola-deploy-foo`. Audit trail lives in git tags (immutable, who-tagged-what-when), no PR-cycle needed for each deploy. Doesn't touch Vojta/Minas/David workflow — release.yml still builds `:dev-<slug>` for every branch push as before. 2. Caddyfile — parametrize TLS directive via $CADDY_TLS env var PR #51 hardcoded cert-file mode (`tls /certs/fullchain.pem ...`) for Groupon's corporate CA flow. That broke the Let's Encrypt path the module previously supported. Now: CADDY_TLS unset (default) → cert-file mode (Groupon corp PKI) CADDY_TLS="tls user@x.com" → Let's Encrypt auto-issue CADDY_TLS="tls internal" → Caddy-managed self-signed (lab/dev) Single Caddyfile, three regimes, no per-deployment fork. Validated with `caddy validate` in all three modes. 3. customer-instance module — dev_instances TLS + auto-set CADDY_TLS - variables.tf: dev_instances object schema gains optional tls_mode + domain (mirroring prod_instance). Defaults to "none" + "" so existing callers without those fields keep current behavior. 
- startup-script.sh.tpl: when tls_mode="caddy" and DOMAIN is set, write CADDY_TLS=tls <ACME_EMAIL> (or "tls internal" when ACME_EMAIL empty) into /opt/agnes/.env. Caddy then picks it up and the Caddyfile substitution flips the cert source. For an LE deploy: set tls_mode="caddy", domain="agnes-dev.example.com", ensure DNS A-record points at the VM, and acme_email is set on the module (or seed_admin_email is, since acme_email defaults to it). After this lands, tag as infra-v1.6.0 so downstream infra repos can bump their module ref without needing the upstream change tracking. feat(deploy): fetch optional Google OAuth credentials from Secret Manager Mirrors the existing keboola-storage-token / agnes-<customer>-jwt-secret pattern: VM SA reads google-oauth-client-{id,secret} secrets at boot (if they exist + IAM is wired by caller via runtime_secrets) and writes them into /opt/agnes/.env. Empty / missing / 403 → silent fallback to "" so password and email auth keep working untouched. Pairs with downstream change in agnes-infra-keboola which adds the two secret names to runtime_secrets, granting the Keboola VM SA secretAccessor on them. Operator pre-creates the SM containers via gcloud secrets create google-oauth-client-{id,secret} (one-time, out of band) — values stay in SM forever; rotation = `gcloud secrets versions add`. This unblocks the Keboola agnes-dev deploy from PR #3 (infra) — without GOOGLE_CLIENT_{ID,SECRET} in .env, app/auth/providers/google.is_available() returns False and the Google sign-in button never even appears.	2026-04-25 23:19:00 +02:00
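The CADDY_TLS selection rule for dev_instances can be sketched in Python as follows. The real logic lives in a shell template (`startup-script.sh.tpl`); the function name `caddy_tls_env_line` is hypothetical, and the quoted output form follows the later `.env` quoting fix from #56 rather than the unquoted form written in this commit.

```python
from typing import Optional


def caddy_tls_env_line(tls_mode: str, domain: str, acme_email: str) -> Optional[str]:
    """Return the .env line for CADDY_TLS, or None to keep the default.

    Hypothetical sketch of the rule in the commit message:
    - tls_mode == "caddy" and a domain is set -> write CADDY_TLS
    - acme_email set   -> Let's Encrypt ("tls <email>")
    - acme_email empty -> Caddy-managed self-signed ("tls internal")
    Otherwise CADDY_TLS stays unset and the Caddyfile uses cert-file mode.
    """
    if tls_mode != "caddy" or not domain:
        return None
    directive = f"tls {acme_email}" if acme_email else "tls internal"
    # Quoted so that a shell sourcing .env does not split on the space.
    return f'CADDY_TLS="{directive}"'


print(caddy_tls_env_line("caddy", "agnes-dev.example.com", "ops@example.com"))
print(caddy_tls_env_line("caddy", "agnes-dev.example.com", ""))
print(caddy_tls_env_line("none", "", ""))
```

Returning `None` for the unset case keeps the three regimes distinct: cert-file (no line written), Let's Encrypt, and internal.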
Vojtech	0bbbf3e40b	feat(tls): corporate-CA HTTPS with URL-driven rotation, on-VM CSR gen, self-signed fallback (#51 ) Replaces the implicit Let's Encrypt flow with a general corporate-CA HTTPS path: - Caddy switches to cert-file mode (`tls /certs/fullchain.pem /certs/privkey.pem`) with HSTS + TLS 1.2/1.3 floor - New `docker-compose.tls.yml` overlay closes host `:8000` when Caddy fronts (no TLS bypass) - New `scripts/tls-fetch.sh` — generic URL fetcher for `sm://`, `gs://`, `https://`, `file://` with redirect refusal + PEM validation - New `scripts/grpn/agnes-tls-rotate.sh` — daily rotation, self-signed fallback against same key (zero key churn), on-VM RSA-2048 + CSR auto-gen, atomic swap, SIGUSR1 reload - `scripts/grpn/agnes-auto-upgrade.sh` becomes cert-aware (auto-enables tls overlay when certs present) - Compose profile `production` renamed to `tls` (aligns with DEPLOYMENT.md and infra startup) Pairs with FoundryAI/agnes-the-ai-analyst-infra#27 (merged) which wires per-VM `local.vm_tls`, writes `TLS_` env vars into `.env`, auto-creates Secret Manager containers for `sm://` privkey URLs, and installs `agnes-tls-rotate.{service,timer}` for daily polling. Includes hardening + docs follow-ups from code review: - `TLS_CSR_SUBJECT` env-var parametrisation applied to both CSR and self-signed cert paths - curl `--max-redirs 0 --proto '=https'` + post-fetch PEM validation in `tls-fetch.sh` - `ulimit -c 0` + array-form `COMPOSE_FILES` (POSIX-safe, bash 3.2 compatible) - TLS section added to `config/.env.template` - Historical-note headers in `docs/superpowers/{plans,specs}/2026-04-09-.md` flagging the profile rename	2026-04-25 19:51:25 +00:00
ZdenekSrotyr	1381770057	fix(auth): uvicorn --proxy-headers + Google OAuth doc + vendor-agnostic OSS rule in CLAUDE.md (#39 ) * fix(compose): pass --proxy-headers to uvicorn so OAuth callbacks resolve to https When the app runs behind a reverse proxy (Caddy, nginx, Cloudflare Tunnel), uvicorn's default policy of trusting X-Forwarded-* only from 127.0.0.1 means the request the container sees still looks like http://localhost:8000/..., even when the user is on https://. The OAuth provider then sends Google a callback URL Google has never seen — Error 400: redirect_uri_mismatch. --proxy-headers + --forwarded-allow-ips '' tell uvicorn to honor those headers from any source. The container only ever sees its own docker network anyway; trusting it everywhere is safe in this deployment shape. Adds docs/auth-google-oauth.md with the full operator gotcha list — env vars that have to be set, instance.yaml fields that silently fall back to defaults, and the DB workaround for ad-hoc role promotion when SEED_ADMIN_EMAIL was missed on first boot. docs(claude): codify vendor-agnostic OSS rule for AI agents and humans Adds a "Vendor-agnostic OSS" section to CLAUDE.md spelling out what cannot land in this repo (specific deployments, internal hostnames/projects, cross- references to private repos, customer-specific paths) and how to phrase abstractions instead. Plus a pre-PR grep checklist in the existing "Git Commits & Pull Requests" section. This trips up agents and humans alike — the previous version of #39 had private-deployment references in the body and a customer domain in a doc example. Surfacing the rule once in the file every Claude/Cursor/Aider session reads should prevent that on the next PR. * docs(oauth): cover DOMAIN + SERVER_URL env vars introduced by PR #48 PR #48 (merged) added DOMAIN-gated Secure cookie in google.py and documented SERVER_URL in .env.template, but this operator doc was drafted before that merge and didn't reference either variable. 
Adding both to the env table and extending the common-failure-modes table with a sticky-cookie / redirect-URI-mismatch entry that references SERVER_URL as the host-header-independent fix. Also aligns the compose command snippet with the `='*'` syntax that actually ships on main post-PR #48. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Vojtech Rysanek <vrysanek@groupon.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 07:07:33 +00:00
ZdenekSrotyr	9e19fb5219	chore(deploy): trust proxy headers + document HTTPS env vars (#48) * chore(deploy): trust proxy headers + document HTTPS env vars - uvicorn: add --proxy-headers --forwarded-allow-ips='' so the app honors X-Forwarded-Proto/Host from a TLS-terminating reverse proxy (Caddy, Cloudflare Tunnel, nginx, LB). Without this the app saw every request as plain HTTP and built redirect/OAuth URLs from the raw Host, which is fragile behind a proxy. - .env.template: document DOMAIN (enables Secure cookie flag) and new SERVER_URL (deterministic base URL for OAuth callbacks and external links). Grouped under a dedicated HTTPS / REVERSE PROXY section. chore(deploy): add proxy header flags to Dockerfile CMD and Kamal config Matches the docker-compose changes so non-compose deployments (docker run, Kubernetes, ECS, Kamal) also trust X-Forwarded-Proto/X-Forwarded-For. * fix(auth): align Google OAuth cookie Secure flag with password/email providers Google OAuth set the access_token cookie Secure flag based on the TESTING env var, while password and email providers use DOMAIN. This meant the DOMAIN env var (now documented in config/.env.template) did not actually control Secure for Google cookies. Align all three providers on DOMAIN so the documented behavior holds consistently.	2026-04-24 08:52:53 +02:00
Petr Simecek	1bbbe58ea0	release(2.1.0): durable sync, CLI auto-update, versioned wheel URL, version unification (#43 ) * fix(cli): versioned wheel URL in setup instructions; drop broken /cli/agnes.whl alias (#36) * fix(cli): inline PEP 427 wheel filename in setup instructions `uv tool install <server>/cli/agnes.whl` fails with error: The wheel filename "agnes.whl" is invalid: Must have a version because uv validates the filename in the URL path before fetching — so the server-side Content-Disposition header (which has the real versioned filename) is never consulted, and an HTTP redirect does not help either: uv resolves the filename from the initial URL. Fix the root cause by inlining the real PEP 427 filename into the setup snippet the dashboard copies to the clipboard. The wheel filename is resolved server-side via `_find_wheel()` and substituted into the lines returned from `setup_instructions.resolve_lines()`, so both the read-only HTML preview and the JS clipboard renderer get byte-identical output. Also added `/cli/wheel/{filename}` to serve wheels at their PEP 427 path, and kept `/cli/agnes.whl` as a 302 redirect for manual/legacy callers — though that redirect alone is NOT sufficient for `uv tool install` (uv validates before following redirects) and is there only as defense-in-depth. Verified locally: - `uv tool install <server>/cli/wheel/agnes_the_ai_analyst-2.0.0-py3-none-any.whl` succeeds - `/install` HTML now renders the versioned URL; `/cli/agnes.whl` no longer appears in the rendered snippet * fix(cli): remove /cli/agnes.whl alias entirely — it only confused users The bareword alias was never actually usable: - `uv tool install <server>/cli/agnes.whl` fails at filename validation before any HTTP fetch, so neither the Content-Disposition header nor a 302 redirect rescued it. 
- The 302-to-versioned-path fallback left a visibly "working" URL in browser / curl -L contexts, which is exactly how the original bug got reported in the first place ("the URL loads, why doesn't install work?"). Remove the endpoint and scrub all remaining references. The only CLI wheel URL is now `/cli/wheel/{filename}` with the real PEP 427 filename, which the setup-instructions template already generates server-side. Existing tests that referenced /cli/agnes.whl become negative tests ("must not appear") so we don't regress. * feat(cli): --version flag; sync --dry-run + progress indicator (#38) * feat(cli): add --version / -V flag Prints `da <version>` from package metadata (importlib.metadata). Falls back to "unknown" when the package is not installed (e.g. running from a source checkout without `uv pip install -e .`), instead of crashing. Eager typer callback, so `da --version` exits before subcommand resolution and does not require any auth/config. * feat(cli): da sync --dry-run + X/N progress indicator --dry-run reports what would be downloaded/uploaded without hitting the API or writing local state. Supports the full flag set (--table, --json, --upload-only); JSON shape is {"dry_run": true, "would_download": [...], "summary": {...}}. Progress bar now shows "[X/N] Downloading <table>..." with a Rich BarColumn + TaskProgressColumn + TimeElapsedColumn instead of a bare spinner — makes long syncs visible. * feat(cli): durable sync + server gzip + auto-update check (#41) * fix(sync): atomic writes + manifest hash verification + retry on transient errors Three durability hooks around stream_download and the sync command: 1. Atomic writes. stream_download now streams into `<target>.tmp` and calls os.replace() on success, so the real target file never exists in a half-written state. On failure the tmp is unlinked — no cleanup leftovers, no guard needed at read time. 2. Retry with backoff. 
Transient errors (ConnectError, ReadError, WriteError, RemoteProtocolError, TimeoutException, 5xx) are retried up to 3× with 0.3s / 1s / 3s backoff. 4xx (auth, 404) surfaces immediately — retrying those is pointless. 3. Manifest-hash verification. After download, sync.py computes MD5 of the target (same 8KiB chunking as app/api/sync.py:_file_hash) and compares against `server_tables[tid]["hash"]`. Mismatch ⇒ unlink, record error, skip state commit. The PAR1 structural check survives as a fallback for legacy manifests without a hash. Also makes _rebuild_duckdb_views tolerant: single broken parquet is skipped with a stderr warning instead of killing the whole rebuild. Supersedes #40 — this commit is a strict super-set (hash check + PAR1 fallback + atomic write + retry). #40 can be closed without merging. * perf(server): enable GZipMiddleware for JSON / HTML responses GZipMiddleware at minimum_size=1024 shaves bandwidth on manifest-style JSON endpoints (/api/sync/manifest, /api/version, …) and the /install HTML preview. Parquet file downloads are already columnar-compressed so the middleware sees limited benefit there — but it doesn't hurt, httpx on the client side decompresses transparently. Placed after session middleware so gzip wraps the session-Set-Cookie response too, and before CORSMiddleware so compression is applied to both cross-origin and same-origin responses. * feat(cli): auto-check for newer CLI version on startup Server side - GET /cli/latest returns {version, wheel_filename, download_url_path} for whatever wheel is currently in AGNES_CLI_DIST_DIR. Public, cacheable, no secrets — consumed by the CLI auto-update probe. Client side - New cli/update_check.py: reads /cli/latest with a 3s timeout, caches the result in $DA_CONFIG_DIR/update_check.json for 24h. Cache is invalidated when the installed version changes (e.g. after a fresh `uv tool install`) so stale "you're behind" warnings don't linger. 
- Root typer callback fires the probe before subcommand dispatch; any failure is swallowed so a bad network never blocks a working command. - Outdated → one-line stderr warning: [update] da 2.0.0 is out of date — latest on this server is 2.1.0. Upgrade: uv tool install --force <server>/cli/wheel/<…>.whl - Disable with DA_NO_UPDATE_CHECK=1. * fix(pr-review): None-guard the upgrade line + skip gzip on parquet paths Two follow-ups from Devin review on #41. 1. format_outdated_notice(UpdateInfo(download_url=None)) emitted literal "uv tool install --force None" — copy-pasting that fails. Drop the upgrade snippet when the URL is absent and keep only the version line. 2. GZipMiddleware compressed everything over 1024 bytes, including the parquet FileResponses served by /api/data/{tid}/download, /cli/wheel/{name}, and /cli/download. Parquet is already columnar- compressed — gzip there is pure CPU + latency with no size win, and /api/data bodies can reach hundreds of MB. Wrap GZipMiddleware in a small _SelectiveGZipMiddleware that skips those path prefixes and delegates the rest to the stock middleware. JSON / HTML endpoints (manifest, /install, /api/version, …) still get compressed. * release: bump to 2.1.0 — unify AGNES_VERSION with pyproject.toml version (#42) Before: two independent version systems. pyproject.toml carried semver (2.0.0 → wheel filename → `da --version`) while release.yml injected CalVer into AGNES_VERSION (e.g. 2026.04.155 → /api/version). Users saw different strings in the CLI vs. the /install page, and the CLI auto- update check couldn't tell "new deploy, same package version" apart from "new package version". Make pyproject.toml [project].version the single product-version source of truth. release.yml extracts it and feeds AGNES_VERSION, so every surface (/api/version, /api/health, /cli/latest, `da --version`) agrees on one number. The CalVer tag keeps doing what CalVer is for: release identity on the git tag and Docker image tag (versioned_tag). 
Also wires AGNES_TAG through the build: release.yml → Dockerfile ARG → env, so /api/version.image_tag finally reports the actual image tag instead of the "unknown" fallback. Bump to 2.1.0 to reflect the PRs shipped on ps/wheel-name-fix: durable sync (atomic writes + manifest MD5 + retry), server GZip, CLI auto- update probe, setup snippet PEP 427 URL. * fix(pr-review): directional version compare in is_outdated() UpdateInfo.is_outdated() used `self.latest != self.installed`, which fires in both directions. If the server is rolled back or the user connects to an older deployment, the CLI would warn "out of date" and — worse — the formatted notice would prompt uv tool install --force <older-version>.whl i.e. an unintended downgrade. Compare with packaging.version.Version (PEP 440 aware, handles pre- release tags). Fall back to dotted-int tuple compare if packaging is somehow missing, and return False on unparseable strings — better to miss an upgrade hint than to silently suggest a downgrade. Adds 4 test cases: installed older (True), installed newer (False), 10.0.0 vs 2.1.0 lexical-compare trap (correct), unparseable strings (False). Addresses Devin review on #43. * fix(pr-review): read FastAPI app version from package metadata app/main.py:80 hardcoded `version="2.0.0"` in the FastAPI constructor. After #42 bumped pyproject.toml to 2.1.0, /api/version, /cli/latest, and `da --version` all reported 2.1.0 while /openapi.json and the /docs UI still advertised 2.0.0. Read `agnes-the-ai-analyst` version via importlib.metadata (same pattern cli/main.py:_cli_version already uses), with a `"dev"` fallback when the package is not installed (source checkout). This way pyproject.toml stays the single source of truth across every version surface — /openapi.json now tracks the bump automatically. Adds a dedicated test file to pin this behavior so a future regression to a hardcoded literal fails at CI. Addresses second Devin finding on #43. 
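The directional version compare described for `is_outdated()` can be sketched as a standalone function. This is an illustration under assumptions, not the code in `cli/update_check.py`: the free-function signature and the helper `_as_int_tuple` are hypothetical, while the use of `packaging.version.Version`, the dotted-int fallback, and returning False on unparseable input follow the commit message.

```python
def _as_int_tuple(version: str) -> tuple:
    """Fallback parser: "2.1.0" -> (2, 1, 0). Raises ValueError on junk."""
    return tuple(int(part) for part in version.split("."))


def is_outdated(installed: str, latest: str) -> bool:
    """True only when `latest` is strictly newer than `installed`."""
    try:
        from packaging.version import Version  # PEP 440 aware
    except ImportError:
        Version = None  # fall back to the dotted-int comparison below
    try:
        if Version is not None:
            return Version(latest) > Version(installed)
        return _as_int_tuple(latest) > _as_int_tuple(installed)
    except ValueError:
        # packaging's InvalidVersion subclasses ValueError, so both paths
        # land here. Better to miss a hint than to suggest a downgrade.
        return False


print(is_outdated("2.0.0", "2.1.0"))
print(is_outdated("2.1.0", "2.0.0"))
print(is_outdated("2.1.0", "10.0.0"))
```

Comparing parsed versions rather than strings is what avoids the lexical trap the commit mentions, where "10.0.0" sorts before "2.1.0" as text.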
* fix(pr-review): _fmt_bytes PiB label + negative cache in update_check Two more follow-ups from Devin review on #43. 1. _fmt_bytes off-by-unit. The old loop exited at TiB but the fallback labelled PiB, so 1 PiB rendered as "1024.0 PiB". Restructure: put every unit inside the loop (KiB through EiB) so the division count always matches the label. Covers up to 1 ZiB cleanly; anything beyond renders as "<big>.0 EiB" rather than crashing. 2. Negative cache for failed /cli/latest probes. On a corporate firewall / VPN that silently drops packets, the 3s HTTP timeout fired on every `da` invocation. Writing a `latest=None` cache entry with a 5-minute TTL caps that at one probe per 5min. Successful probes still use the 24h TTL. Reading logic branches on whether the cached `latest` is None. Adds TestFmtBytes (2 cases: small/medium sizes and the PiB/EiB fallback regression), plus two TestSync update-check cases covering negative- cache reuse and TTL expiry.	2026-04-22 21:18:18 +02:00
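The `_fmt_bytes` restructuring (every unit inside the loop, so the number of divisions always matches the label) can be sketched as below. The name `fmt_bytes`, the one-decimal formatting, and the plain `B` rendering for values under 1 KiB are assumptions; only the KiB-through-EiB loop and the EiB overflow behavior come from the commit message.

```python
def fmt_bytes(n: int) -> str:
    """Human-readable size using binary units (hypothetical sketch)."""
    if n < 1024:
        # Assumption: small values are shown as whole bytes.
        return f"{n} B"
    units = ("KiB", "MiB", "GiB", "TiB", "PiB", "EiB")
    value = float(n)
    for unit in units:
        value /= 1024  # one division per unit, so label and scale agree
        if value < 1024 or unit == units[-1]:
            # The last unit absorbs anything larger instead of crashing.
            return f"{value:.1f} {unit}"
    raise AssertionError("unreachable: the last unit always returns")


print(fmt_bytes(1536))
print(fmt_bytes(1024 ** 5))
print(fmt_bytes(1024 ** 7))
```

With this shape, 1 PiB renders as "1.0 PiB" rather than the "1024.0 PiB" regression the commit describes.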
ZdenekSrotyr	7e4ddf0b01	feat(auth): password reset & invite flows for web + admin (#34 ) (#37 ) * feat(auth): password reset & invite flows for web + admin (#34) Wires end-to-end the previously orphaned password_reset.html and password_setup.html templates, adds the missing POST /auth/password/reset handler (closes #34), and restores the Reset action in the admin user UI (which origin/main had removed precisely because the flow was broken). Web flow - GET /auth/password/reset — renders the set-new-password form - POST /auth/password/reset — 'Forgot Password?' request; emails link, anti-enumeration (same response for unknown email) - POST /auth/password/reset/confirm — validates token + 24h TTL, sets new password, clears token, logs user in - GET /auth/password/setup — renders the setup form (invite link landing) - POST /auth/password/setup/request — signup-tab 'Request Access' (email-only) - POST /auth/password/setup/confirm — 7-day TTL, sets password + name, logs in - Reuses LOCAL_DEV_MODE pattern from email.py: logs the link loudly so developers can use the flow without an SMTP/SendGrid transport Admin flow - POST /api/users accepts send_invite → returns invite_url + invite_email_sent - POST /api/users/{id}/reset-password now returns a full reset_url pointing at the dedicated password-reset endpoint (NOT the magic-link verifier, which would log the user in without prompting for a new password) - admin_users.html: restored Reset row action, copyable reset/invite link modals, invite checkbox on create, reworded 'magic-link not wired' notes Backward compat - JSON POST /auth/password/setup kept unchanged (existing tests pass) - Active-account gate applied to reset/setup flows (matches password_login) Tests: 21 new cases (tests/test_password_flows.py) covering GET renders, request/confirm happy + error paths, TTLs, anti-enumeration, and admin invite/reset URL responses. Full suite: 1309 passed. 
Closes #34 * fix(admin-users): allow horizontal scroll when actions overflow Four action buttons (Tokens, Reset, Set pwd, Delete) can exceed the viewport on narrow screens. Switch .users-table-wrap from overflow: hidden to overflow-x: auto so the table scrolls instead of clipping, and lock row-actions buttons to a single nowrap line. * fix(admin-users): override base 800px container so table can use full width The base layout caps .container at 800px, so the table was always being clipped regardless of viewport. Unclamp the container on this page and widen the inner page cap to 1400px. * fix(auth): address Devin review — harden JSON setup, anti-enumeration, preserve email case Addresses findings from Devin review on PR #37: 1. JSON POST /auth/password/setup now enforces the same SETUP_TOKEN_TTL (7 days) and active-account check as the web flow. An expired token or a deactivated user can no longer bypass the gate by posting JSON. Existing test fixture seeds setup_token_created=now so backward-compat tests continue to pass. 2. GET /auth/password/setup no longer looks up the user to pre-fill name. The form renders identically regardless of whether the email exists, consistent with anti-enumeration in POST /setup/request. 3. reset_request / setup_request no longer lowercase the submitted email. The rest of the codebase (password_login, magic-link, admin create) uses case-sensitive lookups, so normalizing only here would silently fail for mixed-case accounts. Tests: 6 new cases covering expired-JSON-setup, missing-created-timestamp, deactivated-user-rejection, mixed-case email preservation, and the anti-enumeration property of GET /setup.	2026-04-22 17:43:57 +02:00
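The TTL and active-account gate shared by the reset and setup flows might look roughly like the sketch below. The function name and signature are hypothetical; the 24h and 7-day TTLs come from the commit message, and treating a missing creation timestamp as expired is an assumption about how the "missing-created-timestamp" case is handled.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RESET_TOKEN_TTL = timedelta(hours=24)
SETUP_TOKEN_TTL = timedelta(days=7)


def token_is_usable(
    created_at: Optional[datetime],
    ttl: timedelta,
    is_active: bool,
    now: Optional[datetime] = None,
) -> bool:
    """Return True only for an active account with an unexpired token."""
    if not is_active:
        return False  # deactivated users are rejected regardless of the token
    if created_at is None:
        return False  # assumption: no timestamp means the token cannot be trusted
    now = now or datetime.now(timezone.utc)
    if created_at.tzinfo is None:
        # Stored timestamps may be offset-naive; interpret them as UTC.
        created_at = created_at.replace(tzinfo=timezone.utc)
    return now - created_at <= ttl
```

Routing the web form and the JSON endpoint through one check like this is what stops an expired token from slipping through whichever path forgot the gate.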
Petr Simecek	f593a151fc	docs(security): add padak-security.md audit report (#35) * docs(security): add padak-security.md — full audit report from 2026-04-22 Four-agent audit (secrets/SQLi/authz/SSRF, auth flows, UI wiring, data layer) deduped into one document. Top 5 to fix first, second/third/fourth tier by real exploitability, verified non-issues so we don't re-open them, and coverage gaps where automated scanners / pytest / Jira connector / infra were not touched. Missing /auth/password/reset is already tracked in padak/keboola_agent_cli#206; other top items (script sandbox RCE, rate-limit, backslash open-redirect, SSRF) still need their own issues. * docs(security): rephrase methodology description Replace "four parallel agents" with "parallel review passes over four scope areas" — same meaning, removes the overlap with agentic-AI terminology.	2026-04-22 16:31:13 +02:00
Petr Simecek	cbb7733987	feat(dev): make local-dev targets for one-keystroke LOCAL_DEV_MODE startup (#33) * feat(dev): add make local-dev targets wrapping run-local-dev.sh `make local-dev` runs scripts/run-local-dev.sh so docker compose + the LOCAL_DEV_MODE overlay are one keystroke away. `make local-dev-down` and `make local-dev-logs` manage the same 3-file stack. * fix(make): ensure .env exists before local-dev-down / local-dev-logs docker-compose.yml declares `env_file: .env` for several services (extract, telegram-bot, ws-gateway, …). Compose validates those paths during config parsing even for profiled services that never start, so a missing .env breaks `make local-dev-down` and `make local-dev-logs`. Add .env as a file-target prerequisite — touched on demand, so no-op when already present. Addresses Devin review on #33.	2026-04-22 14:57:10 +02:00
Petr Simecek	9b5214ea6f	feat(dev): LOCAL_DEV_MODE for one-command local dev + magic-link fixes (#32 ) * feat(dev): add LOCAL_DEV_MODE for one-command local dev When LOCAL_DEV_MODE=1, every protected route auto-authenticates as a seeded admin user (default dev@localhost) — no login screen, no Google OAuth config, no magic-link roundtrip. Startup logs a loud warning to make misuse obvious. Also fixes two preexisting bugs in the magic-link flow that surfaced while wiring up the dev fallback: - /auth/email/verify only accepted POST, but the URL embedded in emails is a GET link — clicking from any mail client returned 405. Added a GET variant that consumes the token, sets the auth cookie, and redirects to /dashboard. - Token expiry check compared an offset-aware datetime.now(timezone.utc) against an offset-naive value from DuckDB, raising TypeError on every valid link. Normalize the stored timestamp to UTC before subtracting. Dev-only fallback (scoped strictly to LOCAL_DEV_MODE to keep test and production behavior identical): send-link logs the magic link to stderr and returns it as dev_link in the JSON response when no SMTP is configured. Usage: ./scripts/run-local-dev.sh open http://localhost:8000 # lands on /dashboard as admin * fix(dev): URL-encode magic-link email + avoid /login redirect loop Two issues surfaced by Devin review on PR #32. 1. _build_magic_link interpolated email into the URL unescaped. For addresses with '+' (e.g. user+tag@gmail.com) Starlette's query parser decoded '+' as a space on the GET /verify side, so repo.get_by_email returned None and every click yielded 401 "Invalid link". quote(email, safe='') fixes both the email transport and the dev_link fallback. 2. /login in LOCAL_DEV_MODE unconditionally redirected to /dashboard. If dev-user seeding failed at startup (main.py wraps seed in try/except), /dashboard 401'd, the HTML redirect handler bounced to /login, and the loop repeated until the browser aborted. 
Now /login checks the dev user actually exists before short-circuiting; otherwise it falls through to the normal login form so the missing seed is visible.	2026-04-22 14:47:33 +02:00
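The '+' decoding problem from item 1 is easy to reproduce with the standard library. In the sketch below, `build_magic_link` and the `email` / `token` query parameter names are hypothetical; only the use of `quote(email, safe='')` is taken from the commit message.

```python
from urllib.parse import parse_qs, quote, urlsplit


def build_magic_link(base_url: str, email: str, token: str) -> str:
    """Build a verify URL with every reserved character in the values percent-encoded."""
    return (
        f"{base_url}/auth/email/verify"
        f"?email={quote(email, safe='')}&token={quote(token, safe='')}"
    )


def query_email(url: str) -> str:
    """Decode the email parameter the way a form-style query parser would."""
    return parse_qs(urlsplit(url).query)["email"][0]


# Unescaped interpolation: '+' is decoded as a space on the receiving side.
naive = "http://localhost:8000/auth/email/verify?email=user+tag@gmail.com&token=abc"
fixed = build_magic_link("http://localhost:8000", "user+tag@gmail.com", "abc")
```

Here `query_email(naive)` yields `user tag@gmail.com`, which no lookup by email will find, while `query_email(fixed)` round-trips the original address.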
ZdenekSrotyr	d2c76cb221	User management + PAT + CLI distribution + HTML auth redirect (#9 #10 #11 #12) (#28) * fix: redirect unauthenticated HTML routes to /login (#10) * docs(plan): user mgmt + PAT + CLI distribution implementation plan (#9 #10 #11 #12) * build(docker): produce wheel artifact for /cli/download (#9) * feat(db): schema v5 — users.active + deactivated_at/by (#11) * feat(api): /cli/download wheel + /cli/install.sh with baked server URL (#9) * feat(users): repository supports active flag + count_admins (#11) * feat(ui): /install page with per-deployment install instructions (#9) * feat(api): user PATCH/reset-password/set-password/activate/deactivate (#11) * fix(cli): da login prompts for password and sends it in body (#9) * test(api): safeguard tests for self-deactivate and last admin (#11) * feat(auth): reject requests from deactivated users (#11) * fixup(#10): propagate next through /login buttons + lock down sanitizer tests * feat(cli): da admin set-role/activate/deactivate/reset-password/set-password (#11) * feat(ui): /admin/users management page (#11) * feat(db): schema v6 — personal_access_tokens (#12) * feat(users): access_tokens repository (#12) * feat(auth): JWT carries typ (session|pat) and explicit jti (#12) * feat(auth): reject revoked/expired PATs; update last_used_at (#12) * feat(api): /auth/tokens CRUD + admin revoke; session-only guard (#12) * feat(cli): da auth token create/list/revoke (#12) * feat(ui): /profile page with PAT create/list/revoke (#12) * docs: PAT usage and session/PAT TTL clarification (#12) * feat(auth): PAT first-use-from-new-IP audit + last_used_ip (schema v7) (#12) Closes remaining acceptance gap from issue #12: audit_log entry on first use of a PAT from an IP that differs from the recorded last_used_ip.
- schema v7: personal_access_tokens.last_used_ip column - AccessTokenRepository.mark_used now stores the client IP - get_current_user extracts client IP (X-Forwarded-For first hop, fallback to request.client.host) and emits a token.first_use_new_ip audit when the IP changes on a subsequent use (not the very first use) - tests: new-ip audit, same-ip no-op, first-ever-use no-op, schema v7 column * fix: address Devin review findings on PR #28 - app/main.py: exclude /auth/* from HTML redirect handler so JSON endpoints under /auth/ (PAT CRUD used by `da auth token` CLI) keep their 401 JSON contract (Devin #1, bug) - app/api/tokens.py: reject expires_in_days <= 0 explicitly; use `is not None` so 0 no longer silently creates a non-expiring token (Devin #2) - app/api/users.py: validate role against Role enum in create_user to match update_user and prevent 500 on role-protected requests later (Devin #3) - app/web/templates/admin_users.html: escape user-supplied strings before innerHTML; move onclick handlers to addEventListener via data attributes so emails with quotes / HTML no longer break the UI or enable stored XSS (Devin #4) - app/auth/router.py, app/auth/providers/{password,google}.py: reject deactivated users at login instead of issuing a JWT that would then fail on the next request — removes the confusing redirect loop (Devin #5) - CLAUDE.md: document schema v7 instead of stale v4 (Devin #6) - tests/test_web_ui.py: regression test for the /auth/* JSON 401 * feat(web): add /profile and /admin/users links to dashboard nav * feat(web): point setup banner at /install page * chore(web): drop unused setup_instructions context * fix: address Devin review round 2 on PR #28 - app/api/tokens.py: when expires_in_days is None (the "never" option), use a ~100-year JWT expiry so the token doesn't silently die in 24h via the session-default fallback in create_access_token. 
The real expiry enforcement stays in verify_token's DB-level check (Devin 🔴) - app/web/templates/profile.html: escape t.name and other user-supplied strings via esc() helper before innerHTML, same pattern as admin_users.html. Move revoke onclick to data-attribute + addEventListener (Devin 🟡) - app/api/cli_artifacts.py: use `mktemp -d` with X's at end of template for GNU/BSD portability, place wheel inside the temp dir and clean up with rm -rf (Devin 🚩) * feat(web): redesign /install page; make curl one-liner primary, collapse manual Rebuild the public /install page using the dashboard visual language (shared header, card layout, gradient hero, design tokens from style-custom.css). The page is now anchored on the one-liner install path: curl -fsSL <server>/cli/install.sh | bash is rendered as the primary, prominent step 1, while the old manual wheel-download flow is tucked behind a closed-by-default <details> block for users in restricted/offline environments. Information architecture: hero (server URL + version) -> step 1: quick install (one-liner, big Copy button) -> step 2: create PAT on /profile + export DA_TOKEN / da auth whoami -> step 3: Claude Code / MCP via ~/.config/da/token.json -> collapsed "Manual install" details for download-wheel flow -> footer link to docs/HEADLESS_USAGE.md Every shell snippet has a vanilla-JS "Copy" button that confirms visually ("Copied!" for 1.5s) and falls back to textarea+execCommand on non-secure contexts. No new dependencies, no bundler. The route now also pulls an optional user so the header shows the same nav (Dashboard / Profile / Logout) as dashboard.html when a session exists, while staying fully public when signed out. * fix(cli): use real wheel filename in install.sh (broken pip/uv install) The installer wrote the downloaded wheel as agnes_cli.whl, which lacks a PEP-427 version component — both pip and uv tool install reject it and abort the one-liner.
Use curl -OJ so Content-Disposition determines the on-disk filename, then resolve it via glob. Install an EXIT trap to remove the tmpdir even when install fails. * fix(web): correct manual install wheel glob and add PEP 668 / PATH hints - Wheel glob is agnes_the_ai_analyst-*.whl (not agnes-*.whl) — the old pattern never matched the real artefact name from the build. - Add — or — separator between uv tool install and pip install. - Warn that pip install --user is blocked on macOS Homebrew / modern Debian (PEP 668) and recommend uv tool install as the default path. - Both flows now show the ~/.local/bin PATH hint so a fresh shell can find the da binary after install. * fix(web): consistent session.user reference in install header The avatar-letter fallback inside {% if session.user %} was reading user.name / user.email directly, but the route dependency can pass user=None — those references resolved to an empty FlexDict and produced an empty avatar circle. Read everything through session.user to match the guard and the dashboard pattern. * fix(web): point headless usage link at GitHub source /docs/HEADLESS_USAGE.md 404s — no static route serves repo docs. Point the footer link at the rendered markdown on GitHub instead of adding a dedicated docs serving route just for one file. * feat(web): /install hero size, anon sign-in banner, step 2 copy polish - Bump hero h1 from 26px to 30px to match dashboard primary scale. - Anonymous visitors see a small sign-in banner above Step 2 (creating a token requires auth; without the banner the flow appears stuck). - Add an 'After generating your token' section label inside Step 2 so the /profile CTA button no longer looks wedged mid-sentence between adjacent paragraphs. * chore(web): /install a11y + version pill polish - aria-live='polite' on copy buttons so screen readers announce the 'Copied!' state change. - Replace redundant INSTANCE_NAME eyebrow (already in the header logo) with 'Getting started'.
- Hide the version pill when AGNES_VERSION is unset/'dev' — avoids the misleading 'vdev' label in local/unbuilt runs. - Manual summary focus-visible outline-offset +2px (was -2px which clipped inside the card), and mark the chevron as decorative. * fix(web): use session.user in dashboard avatar fallback Inside {% if session.user %} guard, the avatar fallback referenced (user.name or user.email). If user is None the block crashes when the profile picture is absent. Align with the guard variable. * fix: address Devin review round 3 on PR #28 - app/api/users.py: stop auto-sending email from reset_password. The magic-link sender would deliver a "Login Link" that — when clicked — consumes the reset_token via verify_magic_link and logs the user in WITHOUT prompting for a new password. Admins now share the raw reset_token from the API response manually, or use set-password directly. email_sent is always False. Documented inline. (Devin 🟡) - app/api/cli_artifacts.py: harden /cli/install.sh generation against shell injection via Host header or AGNES_VERSION. base_url is validated against a strict scheme+host+port regex; version against an alnum + dot/dash/underscore allowlist. Both values are also piped through shlex.quote() as defense in depth. (Devin 🟡) The shared users.reset_token column between magic-link and password- reset flows (Devin 🚩) remains an architectural gap; splitting into separate columns needs schema v8 and is tracked for a follow-up PR. 
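The install.sh hardening described in the round 3 fixes (allowlist first, then `shlex.quote()` as defense in depth) could be sketched as follows. The two patterns, the function name, and the `AGNES_SERVER` shell variable are illustrative assumptions, not the actual contents of app/api/cli_artifacts.py.

```python
import re
import shlex

# Hypothetical allowlists: scheme + host + optional port, and an
# alnum + dot/dash/underscore version string.
_URL_RE = re.compile(r"https?://[A-Za-z0-9.-]+(:[0-9]{1,5})?")
_VERSION_RE = re.compile(r"[A-Za-z0-9._-]+")


def render_install_env(base_url: str, version: str) -> str:
    """Return shell assignments that are safe to bake into a generated script."""
    # fullmatch() requires the whole string to match, trailing newline included.
    if not _URL_RE.fullmatch(base_url):
        raise ValueError("base_url rejected by allowlist")
    if not _VERSION_RE.fullmatch(version):
        raise ValueError("version rejected by allowlist")
    # Quoting is redundant for values that passed the allowlist, which is the
    # point: it still holds if the allowlist is widened later.
    return (
        f"AGNES_SERVER={shlex.quote(base_url)} "
        f"AGNES_VERSION={shlex.quote(version)}"
    )
```

A Host header such as `evil.example$(rm -rf /)` fails the first check and never reaches the template.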
* docs, chore(grpn): manual-deploy helpers + hackathon deploy learnings Adds scripts/grpn/ — Makefile + agnes-auto-upgrade.sh + README for operating Agnes on GRPN's existing foundryai-development VM when the full Terraform flow is blocked by org policies: - iam.disableServiceAccountKeyCreation (org constraint) forbids SA JSON keys, so GCP_SA_KEY-based CI is unavailable - No projectIamAdmin delegation → bootstrap-gcp.sh can't grant roles - Secret Manager IAM bindings require setIamPolicy which editor lacks Helper targets: deploy, deploy-tag, recreate, restart, stop, start, status, version, logs, ps, env, ssh, tunnel, open, bootstrap-admin, set-data-source, install-cron, uninstall-cron. docs/superpowers/plans/2026-04-22-grpn-deploy-learnings.md — running log of all org-policy constraints hit during the hackathon deploy, with workarounds and derived follow-ups (WIF support, external_ip variable, customer onboarding IAM checklist). Not a replacement for the TF flow — stopgap until WIF lands. * fix(web): make header logos clickable links to home * feat(web): one-click "Setup a new Claude Code" button Adds a single-button flow on the dashboard and /install page that generates a fresh personal access token via POST /auth/tokens and copies a complete, paste-ready setup script (server URL, token, install/verify commands) to the clipboard. Falls back to a modal textarea when the clipboard is blocked; redirects to /login on 401; surfaces backend errors inline. - dashboard.html: replaces the top "Set up your local environment" anchor with a real button wired to setupNewClaude(). Removes the duplicate bottom setup banner to keep a single entry point. - install.html: for signed-in users, Step 1 leads with the one-click button and demotes the curl one-liner into a collapsible "Or run manually" aside. Anonymous visitors still see the curl flow plus a sign-in hint. - No new deps. Vanilla JS. Token lives in memory/clipboard only — never rendered into persistent DOM. 
* feat(cli): add "da auth import-token" for non-interactive PAT login Writes a provided JWT into ~/.config/da/token.json using the canonical {access_token, email, role} shape expected by save_token(). Decodes the token locally to pull email/role claims, verifies it against the server via GET /api/catalog/tables, and refuses to overwrite an existing token file if the server returns 401. --email / --role overrides exist for tokens missing those claims; --skip-verify bypasses the server round-trip for offline / CI scenarios. * test(cli): cover da auth import-token success + 401 + claim-fallback paths Three new tests in TestAuthImportToken: - valid JWT + 200 -> canonical token.json written - 401 from /api/catalog/tables -> exit 1, existing token file untouched - JWT without email/role claims -> refused without overrides, accepted with --email / --role flags * feat(web): update one-click Claude setup instructions — explicit uv install, import-token, skills question Replaces the fragile `cat > token.json <<EOF` clipboard payload with an explicit, auditable sequence: 1. `curl -fsSL /cli/download` + `uv tool install --force` (no opaque `curl \| bash`). 2. `da auth import-token --token ...` instead of hand-written JSON. 3. Explicit PATH persistence for zsh/bash. 4. A required question to the user about whether to copy the bundled skills into ~/.claude/skills/agnes/ or pull them on-demand via `da skills show`. 5. A final confirmation step with whoami + version output. Factored both pages to include a shared partial (app/web/templates/_claude_setup_instructions.jinja) so dashboard.html and install.html can never drift apart again. {server_url} and {token} stay as runtime placeholders substituted by renderSetupInstructions(). * feat(ui): modernize /admin/users + unify header nav across pages - New shared partial app/web/templates/_app_header.html — single source of truth for the top navigation. Used by base.html and dashboard.html (which doesn't extend base.html). 
Active page highlighted via request.url.path. Admin "Users" link gated by session.user.role. - style-custom.css: add .app-header / .app-nav-link / .app-btn-logout / .app-avatar styles (mirrors dashboard's previous inline copy under app-* prefix). Mobile-friendly fallback at <720px. - base.html: include the new partial so every page extending base (admin_users, profile, login_email, error, …) gets the same chrome the dashboard has. - dashboard.html: replace its inline <header class="header"> markup with the shared partial. Inline .header CSS left in place as harmless dead code (separate cleanup PR). - admin_users.html: rewritten with avatars, role pills (color-coded per role), toggle switch for active, search/filter input, toast notifications, modal dialogs replacing alert/confirm/prompt, one-click copy for the reset token, empty / loading states. All XSS-safe via the existing esc() helper + data-attribute event delegation. - tests/test_web_ui.py: smoke test that /admin/users renders the new shared header chrome and the modernized markup. * feat(api): serve CLI wheel at /cli/agnes.whl for direct uv install uv tool install inspects the URL path suffix to recognise a wheel, so /cli/download (which has no .whl suffix) cannot be installed directly. Expose a stable /cli/agnes.whl alias over the same wheel lookup so users can run: uv tool install --force https://<server>/cli/agnes.whl * test(cli): cover da auth import-token --server persisting to config.yaml The server persistence was already implemented in the import-token command (save_config({server}) call) but not covered by tests. Add an explicit test so the one-step setup contract — single import-token call writes both token and server — cannot regress. * feat(web): simpler Claude setup — single uv install URL, single import-token call User feedback: the prior clipboard payload repeated the server URL and token across multiple steps (curl + tmpfile + install + rm + separate seed-config + import-token). 
Collapse to: 1. uv tool install --force {server_url}/cli/agnes.whl (single URL, direct) 2. da auth import-token --token ... --server ... (one call, persists both) 3. da auth whoami 4. skills (ask user first) 5. confirm uv accepts HTTPS URLs that end in .whl and installs them directly, so the tmpfile dance is unnecessary. import-token --server already persists the server to config.yaml, so no separate printf > config.yaml step. * fix(tests): update admin users heading assertion after template rename The admin_users.html template now uses <h2 class="users-title">Users</h2> instead of <h2>User management</h2>. Update the assertion to match. * feat(ui): unify header across remaining 7 standalone pages These 7 pages render their own full <html> and don't extend base.html, so the previous unification commit only covered base + dashboard. Each had its own ad-hoc <header> markup with inconsistent classes (.top-header / .header / .page-header), inconsistent nav-link sets, and inconsistent avatar/email styling. Replace each inline <header>...</header> block with the shared {% include '_app_header.html' %} so /activity-center, /admin/permissions, /admin/tables, /catalog, /corporate-memory, /corporate-memory/admin, and /install all show the same chrome (Dashboard / Install CLI / Profile / Users / email + avatar / Logout) with the active page highlighted via request.url.path. Old inline header CSS (.header, .top-header, .page-header, .nav-link, etc.) is left in place as harmless dead code; it can be cleaned up in a follow-up sweep. * feat(web): add readable preview of Claude setup payload on dashboard + /install Move the line-by-line setup instructions into app/web/setup_instructions.py as the single source of truth, then render them in two modes from the existing _claude_setup_instructions.jinja partial: - preview_mode=True → visible, read-only <pre><code> block with the real server URL and a clearly-styled placeholder token (never a real one). 
- preview_mode=False → the JS SETUP_INSTRUCTIONS_TEMPLATE used by the one-click flow (unchanged behaviour). Both /dashboard (env-setup-cta card) and /install (Step 1 card) now show the preview directly under the 'Setup a new Claude Code' button so users can see exactly what will land in their clipboard before they click. * feat(web): update setup instructions — `da diagnose` step, explicit section titles Rework the Claude Code setup payload to: - Give every numbered step an unambiguous verb header ("1) Install the CLI", "2) Log in", "3) Verify the login", "4) Run diagnostics", "5) Skills (ask the user first)", "6) Confirm"). - Add step 4 `da diagnose` as the post-login health check. The CLI already ships this command (cli/commands/diagnose.py); it prints "Overall: healthy" and a list of green checks that map cleanly to next actions. - Ask the skills copy-vs-on-demand question verbatim so Claude Code always prompts the user the same way. - Replace the terse "Confirm" line with a 4-bullet summary (version, whoami, skills choice, diagnose status) so the return message is structured and comparable across setups. * chore(web): remove stale MCP card from /install (no MCP server today) The 'Use with Claude Code / MCP' card (Step 3 on /install) referenced an MCP integration Agnes does not ship. Remove the whole card. The one-click 'Setup a new Claude Code' flow in Step 1 already covers the long-lived client use case and is less confusing than dangling persistence tips for a non-existent integration. * feat(api): include user_email + last_used_ip + user_id in admin tokens list response Adds AdminTokenItem response model (superset of TokenListItem) and AccessTokenRepository.list_all_with_user() joining personal_access_tokens with users to denormalize user_email. Needed for /admin/tokens UI where admins triage tokens across all users. 
* feat(web): /admin/tokens page — list, filter, search, revoke across all users Adds a new admin-only page with client-side filtering (status, user email, last-used window), column sorting, counts bar (active/revoked/expired), and an inline revoke action. Mirrors the /admin/users visual language. * feat(web): add Tokens nav link for admins + deep-link from admin/users row Admin-only nav entry to /admin/tokens, and a per-row Tokens button on /admin/users that prefills the token page's user filter via ?user=<email>. * test(admin): cover /admin/tokens rendering, filter state, non-admin denial, revoke Verifies admin can render the page (title + JS hooks present), a non-admin is blocked, unauthenticated users are redirected, the admin list response includes user_email / user_id / last_used_ip, and admin can revoke another user's token. * feat(web): modern redesign of /admin/tokens — hero, stat strip, refined table, responsive cards, a11y * feat(web): ditch the table — /admin/tokens as a card stack, modern GitHub-style list Replaces the table-based layout with a stack of self-contained token cards inside a <ul role=list>. Each card is a flex row: avatar + name/meta on the left, last-used block in the middle, status pill + outlined 'Revoke' button on the right. Status and sort controls are pill-shaped toggle chips; user email search has an inline search icon. No <table>/<tr>/<th>/<td> anywhere. Responsive below 720px (card stacks vertically) and 480px (stat chips 2x2). Preserves filter IDs (flt-status, flt-user, flt-last-used) and data-revoke for existing tests. * feat(web): add /tokens (role-aware) — single page for both user PAT CRUD and admin overview - Rename admin_tokens.html -> tokens.html with a new is_admin context flag. - New route GET /tokens: renders the same card-stack UI for everyone. * Admins: loads /auth/admin/tokens, shows owner column + stat strip, keeps the owner-email search box and sort-by-owner chip. 
* Non-admins: loads /auth/tokens (own tokens only), hides owner column + stat chips, adds a 'New token' CTA in the hero that opens a modal (name + expires_in_days) calling POST /auth/tokens. The raw token is revealed once in a dismissable banner and cleared from the DOM on Hide. - GET /admin/tokens now 302-redirects to /tokens, preserving query string (so the /admin/users deep-link ?user=foo still works). * feat(web): /tokens full-bleed layout to match dashboard width The hero, toolbar, and card list used to sit inside base.html's .container (max-width 800px). Break out with negative horizontal margins so the page spans the viewport like /dashboard does, capped at 1440px for readability on very wide screens with a 24px gutter on each side. - No change to base.html itself. The override is scoped to .tokens-page. - body { overflow-x: hidden; } guards against rare horizontal scrollbars. - < 808px viewport: reset to natural flow (mobile already narrower). - ≥ 1488px viewport: cap to 1440px and re-center. * chore(web): remove /profile template + nav link (redirect /profile -> /tokens) The old /profile PAT CRUD page is now redundant — the modern /tokens page covers both user and admin flows. Delete the template; the router's /profile handler already 302-redirects to /tokens. Nav cleanup: - Remove the 'Profile' link. - Show a single 'Tokens' link to every signed-in user (previously only admins saw it). - Active-state matches /tokens, /admin/tokens, and /profile so the highlight survives the redirect chain. /install CTA now points at /tokens instead of /profile. * test: cover /tokens for admin + non-admin flows, /profile redirect, nav update tests/test_admin_tokens_ui.py - Point admin rendering test at /tokens directly and tighten assertions (admin-only stat strip + owner search, non-admin CTA absent). - Add test_non_admin_can_render_tokens_page: personal body, New-token CTA, create-modal, reveal banner; stat strip + owner search absent. 
- Add test_admin_tokens_redirects_to_tokens: 302 to /tokens, query string (?user=...) preserved for the /admin/users deep-link. - Add test_profile_redirects_to_tokens: 302 to /tokens. - Add test_non_admin_can_create_pat_via_tokens_page_api: exercises the POST /auth/tokens call that the non-admin create-modal submits. tests/test_pat.py - test_profile_page_renders -> test_profile_page_redirects_to_tokens: assert the 302 + that /tokens lands on the unified non-admin body. tests/test_web_ui.py - admin_users nav assertion: 'Tokens' link present, 'Profile' link absent. - Add test_nav_shows_tokens_link_for_non_admin: non-admins see the same 'Tokens' link (previously only admins did). - Add test_profile_redirects_to_tokens back-compat check. * feat(web): collapse 'What Claude Code will receive' by default The preview block on /dashboard and /install now uses <details>/<summary> so it is hidden by default. Click the chevron/title to expand and review the clipboard payload. Markup stays in the DOM so existing tests that assert on content continue to pass. * fix(web): /tokens width — override .container to 1280px like dashboard The negative-margin full-bleed trick was fragile and pushed content past the right edge on deployed viewports. Replace with a simple max-width override of base.html's .container on this page only, matching /dashboard's 1280px center-column layout. * feat(web): split role-aware /tokens into my_tokens.html + admin_tokens.html * feat(web): router — separate handlers for /tokens (own) and /admin/tokens (all) * feat(web): nav — show Tokens for all, add All tokens for admins * test: cover split token pages (own vs all) + admin access gating * feat(web): move 'My tokens' into a user dropdown menu Replaces the separate Tokens/email/Logout nav trio with a rounded avatar trigger that opens a dropdown containing the user's email, role, a 'My tokens' link, and Logout. 
Admin-only 'All tokens' stays as a top-level nav item since it's an admin function, not a personal one. Click-outside and Escape close the panel; chevron rotates on open. * fix(api): allow PATs to list/get/revoke their own tokens (CLI flow) The documented 'da auth token list/revoke' CLI flow in docs/HEADLESS_USAGE.md uses a PAT, but the previous dependency (require_session_token) returned 403. Only create_token must be session-only to prevent PAT-spawning-PAT chains; listing and revoking your own tokens is safe with a PAT. * fix(api): cap expires_in_days at 3650 to avoid datetime overflow (500 to 400) Values above ~11 million days overflowed datetime.max in datetime.now(utc) + timedelta(days=...) and surfaced as an unhandled OverflowError → 500. Cap at 10 years with a clear 400 instead; the no-expiry code path is unaffected. * fix(api): relax _SAFE_URL_RE to allow path prefixes, underscores, and IPv6 The previous regex rejected legitimate reverse-proxy base_url values (https://host/agnes/), underscores in Docker Compose hostnames, and IPv6 literals (http://[::1]:8000). Widen the charset and allow an optional trailing path. shlex.quote continues to provide defense-in-depth against any metacharacter that slips through. * fix(web): /login/email and Google OAuth propagate next_path Previously, /login/email silently dropped the ?next=<path> query param so the hidden form field rendered empty and login always landed on /dashboard. Google's button was hard-coded to /auth/google/login, ignoring next entirely. - /login page now appends ?next to the Google button URL - /login/email reads + sanitizes next, passes as template context - google_login stashes sanitized next_path in session['login_next'] - google_callback pops + re-sanitizes and redirects there Sanitization factored into app/auth/_common.safe_next_path. 
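A sanitizer in the spirit of `safe_next_path` might look like the sketch below. The real implementation in app/auth/_common is not shown in this log, so the rules here (same-origin absolute paths only, no backslashes, no control characters) and the `/dashboard` default are assumptions chosen to cover the open-redirect cases mentioned elsewhere in the history.

```python
from typing import Optional
from urllib.parse import urlsplit


def safe_next_path(raw: Optional[str], default: str = "/dashboard") -> str:
    """Return raw only if it is a plain same-origin path; otherwise the default."""
    if not raw:
        return default
    # Must be an absolute path, and not a protocol-relative URL like //host.
    if not raw.startswith("/") or raw.startswith("//"):
        return default
    # Some browsers treat '\' like '/', so "/\host" can become "//host".
    if "\\" in raw or any(ord(ch) < 0x20 for ch in raw):
        return default
    parts = urlsplit(raw)
    if parts.scheme or parts.netloc:
        return default
    return raw
```

Calling it once when the value is stashed and again when it is popped mirrors the "pops + re-sanitizes" behaviour of the Google callback.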
* fix(auth): differentiate argon2 VerifyMismatchError from internal errors in web login The previous except (VerifyMismatchError, Exception) collapsed both cases into the generic 'invalid credentials' redirect, silently hiding corrupted-hash / library errors from ops. Split the two: bad password still gets ?error=invalid; anything else logs via logger.exception and redirects with ?err=auth_internal so ops have a visible signal and users don't retry forever against a broken password_hash column. * docs: correct CLAUDE.md table name (personal_access_tokens) v7 note referenced 'access_tokens.last_used_ip' but the real table is personal_access_tokens (as mentioned two tokens earlier in the same bullet). Same-file consistency fix. * chore(web): clarify admin user-reset UI — encourage Set password over the unused reset_token POST /api/users/{id}/reset-password stores and returns a token but no endpoint consumes it — the magic-link sender would log the user in without prompting for a new password, defeating the reset. - Drop the 'Reset' row action from admin_users so admins aren't pointed at a dead end. - Rewrite the reveal-modal copy to tell admins to use Set password and explicitly note that the magic-link flow isn't available for reset tokens in this build. The API endpoint stays for API-level future use. * test: cover PAT CLI flow, expires_in_days overflow, proxy base_url, next propagation - tests/test_pat.py: PAT can list own tokens (200, was 403); PAT can revoke own tokens (204); create_token returns 400 for expires_in_days > 3650 (was 500 via datetime overflow). - tests/test_cli_artifacts.py: _SAFE_URL_RE accepts reverse-proxy path prefixes, underscores, and IPv6 literals; end-to-end check of cli_install_script with a stubbed base_url that includes a path prefix (Agnes behind /agnes/). 
- tests/test_web_ui.py: /login propagates ?next to the Google button URL; /login/email renders next in the hidden form field and strips hostile values; unit coverage of safe_next_path. * fix(security): use \Z instead of $ in URL/version allowlists (trailing-\n bypass) Python regex `$` also matches just before a trailing newline, so a Host header or AGNES_VERSION value like "good.example.com\n$(rm -rf /)" would slip past the allowlist. `\Z` anchors to strict end-of-string. shlex.quote downstream remains as defense-in-depth, but the allowlist is now the tight gate it claims to be. * fix(auth): PAT with null expiry omits JWT exp claim (DB is the source of truth) Previously a PAT created with `expires_in_days=null` (user-requested "never expires") set the DB `expires_at` to NULL (correct) but still baked a ~100y `exp` claim into the JWT. That is misleading: the PAT silently did expire eventually, despite the UI and API promising "no expiry". `create_access_token` now accepts `omit_exp=True` to skip the `exp` claim entirely. `app/api/tokens.py` passes that when `expires_in_days is None`. The authoritative expiry check lives in `app/auth/dependencies.py`, which reads `expires_at` from the DB row — unchanged. PyJWT accepts claim-less JWTs indefinitely. * test: cover trailing-newline regex bypass + no-exp JWT for unbounded PAT - test_safe_url_re_rejects_trailing_newline_bypass: asserts both `_SAFE_URL_RE` and `_SAFE_VERSION_RE` reject values with a trailing `\n` (previously accepted because Python `$` matches before `\n`). - test_pat_null_expiry_jwt_has_no_exp_claim: POST /auth/tokens with `expires_in_days=null`, decode the returned JWT, assert `exp` is absent while `typ=pat`, `sub`, and `jti` are still present. - test_pat_with_null_expiry_is_accepted_by_verify_token: verify_token round-trips a claim-less JWT without ExpiredSignatureError. 
- test_pat_null_expiry_end_to_end_allows_authenticated_request: use the null-expiry PAT against /auth/tokens and confirm it authenticates. * docs(auth): document X-Forwarded-For trust model in _client_ip Deployment runs behind Caddy which strips incoming X-Forwarded-For and sets its own, so the leftmost hop is trustworthy. Clarify that the stored last_used_ip is audit-only and never used for access control — if the app is ever exposed directly, this value becomes client-settable. * docs: /profile → /tokens in install.sh next-steps, CLI error, HEADLESS_USAGE, security skill After splitting PAT management to /tokens (with /profile as a back-compat 302), stale references remained in user-facing text. Update them to the canonical /tokens URL so shell scripts, CLI error hints, docs, and the bundled security skill are all consistent.	2026-04-22 14:24:28 +02:00
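The `\Z` versus `$` fix described in the commit above can be reproduced in a few lines. This is a minimal sketch, not the project's code: the two patterns are hypothetical hostname allowlists standing in for `_SAFE_URL_RE`, and `is_allowed` is an illustrative helper name.

```python
import re

# Hypothetical hostname allowlists (assumptions, not the real _SAFE_URL_RE).
LOOSE = re.compile(r"^[a-z0-9.-]+$")    # `$` also matches just before a final "\n"
STRICT = re.compile(r"^[a-z0-9.-]+\Z")  # `\Z` matches only at the true end of string


def is_allowed(pattern: re.Pattern, value: str) -> bool:
    """Return True when the value passes the allowlist pattern."""
    return pattern.match(value) is not None


host_with_newline = "good.example.com\n"
print(is_allowed(LOOSE, host_with_newline))    # the trailing newline slips through
print(is_allowed(STRICT, host_with_newline))   # rejected
print(is_allowed(STRICT, "good.example.com"))  # a clean value still passes
```

`re.fullmatch` would be another way to get strict end-of-string behaviour; the sketch keeps `match` plus an explicit anchor to mirror the wording of the commit.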
ZdenekSrotyr	963db420fe	ci(release): push dev-<user-prefix>-latest alias for <user>/* branches (#31) Adds a second tag to dev-channel image builds: when a branch is in the form <prefix>/<whatever>, the image is also pushed as ghcr.io/keboola/agnes-the-ai-analyst:dev-<prefix>-latest. Enables per-developer dev VMs on GRPN (and elsewhere) to auto-deploy without knowing the specific branch slug. Each VM pins its .env to AGNES_TAG=dev-<prefix>-latest, and the auto-upgrade cron (5 min tick) picks up the newly pushed image on the next run. Common Git Flow prefixes are deliberately skipped so feature/, fix/, hotfix/* etc. don't create noise tags. Matched list: feature, fix, hotfix, bugfix, docs, chore, test, ci, ops, refactor, perf, style, build. Verified locally against several branch names: zs/my-feature -> dev-zs-latest; vr/foo -> dev-vr-latest; pc/bar-baz -> dev-pc-latest; feature/xyz -> (skipped); fix/bug -> (skipped); main -> (no-op, stable channel); test-no-slash -> (no-op, no slash)	2026-04-22 14:02:59 +02:00
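The alias rule in the commit above can be sketched as follows. The real logic lives in the release workflow, so this Python version is only an illustration; the function name `dev_alias_tag` is hypothetical, while the skipped prefixes and the sample branches are taken from the commit message.

```python
from typing import Optional

# Git Flow style prefixes that the commit message says are deliberately skipped.
SKIPPED_PREFIXES = {
    "feature", "fix", "hotfix", "bugfix", "docs", "chore", "test",
    "ci", "ops", "refactor", "perf", "style", "build",
}


def dev_alias_tag(branch: str) -> Optional[str]:
    """Return the dev-<prefix>-latest alias for a branch, or None if no alias applies."""
    if "/" not in branch:
        return None  # covers main and any branch name without a slash
    prefix = branch.split("/", 1)[0]
    if not prefix or prefix in SKIPPED_PREFIXES:
        return None
    return f"dev-{prefix}-latest"


for name in ["zs/my-feature", "vr/foo", "pc/bar-baz", "feature/xyz", "fix/bug", "main", "test-no-slash"]:
    print(name, "->", dev_alias_tag(name))
```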
ZdenekSrotyr	060335deba	docs(quickstart): add Hackathon section pointing to switch-dev-vm.sh and HACKATHON.md (#14) (#23)	2026-04-21 21:59:23 +02:00
ZdenekSrotyr	4f381dc103	fix(ci): propagate-infra-tag fail-soft on branch push / missing secret (#24) Job-level 'if: secrets.X != ""' did not prevent workflow from being scheduled on branch pushes (GitHub reports failure with 0 jobs in that case). Refactored: first step is a guard that checks both the tag ref pattern and the secret presence; downstream steps skip when the guard says no. Result: workflow now reports success with a clear warning annotation on branch pushes or when the secret is absent; only real infra-v* tag pushes with the secret set perform the bump.	2026-04-21 21:59:10 +02:00
ZdenekSrotyr	5c6a641de7	Merge pull request #22 from keboola/feature/seed-admin-password feat(auth): optional SEED_ADMIN_PASSWORD to pre-hash seed admin (dev helper)	2026-04-21 21:37:29 +02:00
ZdenekSrotyr	e2eb51f657	ci(release): build image for all branches, not just feature/** (#19 ) * dryrun: intentional failing test (will be reverted) * feat(auth): optional SEED_ADMIN_PASSWORD to pre-hash seed admin (dev helper) Terraform gains enable_seed_password + seed_admin_password (sensitive) vars on the customer-instance module; when enabled the password is piped via startup-script into /opt/agnes/.env as SEED_ADMIN_PASSWORD. On first boot app/main.py argon2-hashes it onto the seed user so the admin can log in immediately without going through /auth/bootstrap. Never overwrites an existing password_hash — safe against accidental reset on terraform apply. * ci(release): build :dev-<slug> on any branch, not just feature/ Before: only 'feature/' branches triggered release.yml, so pushing 'zs/my-edit' or 'fix/bug' did not publish an image. dev_instances entry pinning image_tag = 'dev-zs-my-edit' then crashed VM startup with 'image not found'. Now: any branch push (except main, which produces :stable) publishes :dev-<slug>. Slug strips a leading 'feature/' and replaces non-[a-z0-9-] with '-', keeping existing feature/** behavior identical. * Revert "dryrun: intentional failing test (will be reverted)" This reverts commit cf9cc06a7884bb401ff29fc5cb6d8baf84dc3daa.	2026-04-21 21:33:57 +02:00
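The slug rule described in the commit above (strip a leading 'feature/', replace anything outside [a-z0-9-] with '-') could look roughly like this. It is a sketch, not the workflow's actual shell: `dev_image_tag` is a hypothetical name, and lowercasing the branch first is an assumption the commit message does not state.

```python
import re
from typing import Optional


def dev_image_tag(branch: str) -> Optional[str]:
    """Return the :dev-<slug> tag for a branch, or None for main (stable channel)."""
    if branch == "main":
        return None
    slug = branch.lower()  # assumption: normalise case before slugging
    if slug.startswith("feature/"):
        slug = slug[len("feature/"):]
    slug = re.sub(r"[^a-z0-9-]", "-", slug)
    return f"dev-{slug}"


print(dev_image_tag("zs/my-edit"))   # the tag the commit message says a dev VM pins
print(dev_image_tag("feature/xyz"))  # same result as before the change
print(dev_image_tag("main"))
```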
ZdenekSrotyr	1ca5295d54	docs: add HACKATHON.md — condensed deploy + dev playbooks (#21 ) Written for both humans and AI agents — explicit commands, expected outputs, troubleshooting tables, 'safe to run anytime' vs 'requires thought' sections, pitfalls checklist. Three parts: 1. Deploy for a new customer (45 min target, 7 steps) 2. Develop against Agnes (branch → image → dev VM loop, common tasks) 3. AI agent checklist (guardrails, verification, common pitfalls) Complements the deep docs (ONBOARDING.md, DEPLOYMENT.md, architecture.md) with a practical quick-reference for hackathon-style deploys.	2026-04-21 21:33:06 +02:00
ZdenekSrotyr	ada9fb75f6	chore: add switch-dev-vm.sh helper for hackathon (#20)	2026-04-21 21:33:02 +02:00
ZdenekSrotyr	2cbffce85f	ci: propagate infra-v* tags to template repo + auto-merge rules (#17) * dryrun: verify per-branch GHCR tag * ci: propagate infra-v* tag bumps to template repo On push of any infra-v* tag, opens a PR in keboola/agnes-infra-template that bumps the module ref in terraform/main.tf. Auto-merge rules in the template (Renovate + CI validate + GitHub native auto-merge) land it without manual work on patch/minor bumps. Requires repo secret TEMPLATE_REPO_TOKEN (fine-grained PAT with Contents:write + Pull requests:write on keboola/agnes-infra-template). Fail-soft: if secret is missing the job is skipped and Renovate on the template repo picks up the new tag on its next cycle as a fallback. * docs(onboarding): 'Keeping the template up-to-date' maintainer section Documents the two mechanisms (upstream release hook + Renovate), the required repo settings (allow_auto_merge, validate.yml gate), the TOKEN secret setup, and the one-time setup checklist. Notes the difference between template repo (auto-merge on) and customer infra repos (human approval).	2026-04-21 21:32:58 +02:00
ZdenekSrotyr	96bd06ba00	feat(auth): optional SEED_ADMIN_PASSWORD to pre-hash seed admin (dev helper) Terraform gains enable_seed_password + seed_admin_password (sensitive) vars on the customer-instance module; when enabled the password is piped via startup-script into /opt/agnes/.env as SEED_ADMIN_PASSWORD. On first boot app/main.py argon2-hashes it onto the seed user so the admin can log in immediately without going through /auth/bootstrap. Never overwrites an existing password_hash — safe against accidental reset on terraform apply.	2026-04-21 21:32:22 +02:00
ZdenekSrotyr	e4f6910398	Merge: real CalVer + commit SHA in UI badge	2026-04-21 21:00:42 +02:00
ZdenekSrotyr	1c7cc8aa29	fix(image): add AGNES_COMMIT_SHA build-arg to Dockerfile + release.yml Completes the previous commit — bakes the full git SHA into the image ENV at build time so the UI badge shows a real commit, not a sha256 digest (which was the floating manifest digest and unhelpful for debugging).	2026-04-21 21:00:30 +02:00
ZdenekSrotyr	af6761f33e	fix(version): bake AGNES_VERSION/CHANNEL/COMMIT_SHA into image ENV Before: startup script wrote AGNES_VERSION=stable (the floating tag name) into .env, which overrode the image's build-time ENV AGNES_VERSION=2026.04.47. UI badge showed 'stable-stable' instead of 'stable-2026.04.47'. After: - Dockerfile ARG/ENV for AGNES_COMMIT_SHA (alongside existing VERSION + CHANNEL) - release.yml passes github.sha as AGNES_COMMIT_SHA build-arg - Startup script no longer writes these three into .env; the app reads them from the image ENV set at build time. Result: badge displays 'stable-2026.04.47 · stable · <time> ago' with the real CalVer, and the commit SHA tooltip points at an actual commit rather than the floating manifest digest.	2026-04-21 21:00:04 +02:00
ZdenekSrotyr	7553f77e55	Merge: version badge partial on all full-page templates	2026-04-21 20:52:04 +02:00
ZdenekSrotyr	432e7695b3	feat(ui): version badge as shared partial, injected into every full-page template The earlier base.html edit only affected templates that extend base.html (login.html via base_login.html). Most pages (dashboard, catalog, admin_tables, admin_permissions, activity_center, corporate_memory, ...) are standalone templates with their own <body>, so the badge never showed. Fix: extracted the badge + fetch script into _version_badge.html partial, included it before </body> in every full-page template. Consistent across login, dashboard, admin, catalog, etc.	2026-04-21 20:51:55 +02:00
ZdenekSrotyr	dbac3e698c	Merge: alert policy reducer fix	2026-04-21 20:36:21 +02:00
ZdenekSrotyr	9a99a82e92	fix(infra): alert policy aggregation — drop cross_series_reducer GCP rejected the policy with 'REDUCE_COUNT_FALSE cannot be applied to metrics with value type DOUBLE' — because ALIGN_FRACTION_TRUE already produces a fraction 0..1 per series, no need for an additional cross-series reducer. Simplified: alert when the per-series fraction < 1 for 5 min. Review M4 predicted this — uptime check filters needed double-checking against live GCP.	2026-04-21 20:36:09 +02:00
ZdenekSrotyr	717f40c218	Merge: bootstrap monitoring role fix	2026-04-21 20:32:59 +02:00
ZdenekSrotyr	4ab0838ba2	fix(bootstrap): grant monitoring.editor + enable monitoring API v1.3.0 added google_monitoring_uptime_check_config + alert policies to the module, but bootstrap-gcp.sh was not updated. Fresh customers (and the first apply after upgrading existing customers) hit 403 on monitoring.uptimeCheckConfigs.create. Fix: enable monitoring.googleapis.com + grant roles/monitoring.editor to the deploy SA. Idempotent (safe to re-run on existing projects).	2026-04-21 20:32:50 +02:00
ZdenekSrotyr	3fb17a13bb	Merge: workflow-driven recreate + docs	2026-04-21 20:24:40 +02:00
ZdenekSrotyr	1a55167234	docs: workflow-driven VM recreate for startup-script propagation - ONBOARDING.md: replace 'propagating module changes' section with two explicit options — workflow_dispatch with recreate_targets (recommended, CI audit trail), or local terraform apply -replace (emergency). Adds a 'do not' section banning manual .env edits on VMs. - deployment-log.md: iteration 4 summary (version badge + module v1.5.0 + workflow_dispatch).	2026-04-21 20:24:31 +02:00
ZdenekSrotyr	11c03f7235	Merge: version badge in footer + /api/version	2026-04-21 20:19:51 +02:00
ZdenekSrotyr	b091cf7003	feat(ui): version badge in footer + /api/version endpoint UI now shows a small footer badge with: - release channel + CalVer version (e.g. 'stable-2026.04.47') - floating image tag (e.g. 'stable') - time since last container restart (proxy for 'last deployed') Backend: - app/api/health.py: /api/health returns image_tag, commit_sha, deployed_at - app/api/health.py: new /api/version endpoint (lightweight, no DB hit, for footer badge polling) Infra: - startup-script.sh.tpl: resolves image digest from ghcr pull, derives channel + version from the tag name, and writes AGNES_VERSION / RELEASE_CHANNEL / AGNES_COMMIT_SHA into .env so the app can surface them to the UI. UI: - app/web/templates/base.html: footer loads /api/version asynchronously and renders '<channel>-<version> · <tag> · deployed <relative> (<UTC>)'. Tooltip shows full detail (commit sha, schema version).	2026-04-21 20:19:40 +02:00
ZdenekSrotyr	2743de6114	Merge: deployment log iteration 3	2026-04-21 20:09:27 +02:00
ZdenekSrotyr	cdd959b19f	docs(log): add iteration 3 — review, bootstrap fix, docs sweep, infra-v1.4.0	2026-04-21 20:09:13 +02:00
ZdenekSrotyr	c1227df990	Merge: docs sweep — DEPLOYMENT.md rewrite, ONBOARDING v1.4.0, README links	2026-04-21 20:08:23 +02:00
ZdenekSrotyr	0121354596	docs: refresh DEPLOYMENT.md and ONBOARDING.md for infra-v1.4.0 - docs/DEPLOYMENT.md: rewritten to pick between Terraform (managed) and Docker Compose (OSS self-host). Old manual SSH-key-and-git-clone flow replaced with compose-based instructions pointing at the persistent-disk overlay and bootstrap endpoint. - docs/ONBOARDING.md: section 4 now documents the new v1.4.0 variables (runtime_secrets, firewall_ssh_source_ranges, notification_channel_ids, compose_ref). Section 6 explains the /auth/bootstrap seed-user fix and warns that destroy+apply reopens the bootstrap window until run again. - README.md: Documentation list expanded — ONBOARDING.md first (recommended path), DEPLOYMENT.md as the branching point, plus links to CONFIGURATION, architecture, and QUICKSTART.	2026-04-21 20:07:43 +02:00
ZdenekSrotyr	0643437ab8	Merge: /auth/bootstrap seed-user fix	2026-04-21 20:01:38 +02:00
ZdenekSrotyr	2b17973796	fix(auth): /auth/bootstrap activates seed users, disabled only by real password Bug: SEED_ADMIN_EMAIL creates a password-less user at app startup, which made /auth/bootstrap return 403 '1 users already exist' on a fresh deployment — leaving the operator no way to log in (the seed user has no password, and /auth/token requires one). Fix: bootstrap is now disabled only when at least one user has a password_hash set. On a fresh deploy with a seed user: - POST /auth/bootstrap { email: <matches seed>, password: X } → sets the password on the seed user, promotes to admin, returns token. - With a non-matching email, a new admin is created alongside the seed user. Lock semantics: bootstrap self-deactivates as soon as any password is set. Tests: 8 passing, including new test_bootstrap_activates_seed_user and test_bootstrap_disabled_when_password_user_exists covering the two halves.	2026-04-21 20:01:20 +02:00
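The lock semantics in the commit above can be illustrated with an in-memory model. Everything here is a hypothetical sketch: the `User` dataclass, the `bootstrap` function, and the PBKDF2 stand-in hash are assumptions for illustration (other commits on this page indicate the project hashes passwords with argon2), and the real endpoint also returns a token.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class User:
    email: str
    role: str = "user"
    password_hash: Optional[bytes] = None


def _hash_password(password: str) -> bytes:
    """Stand-in hash for this sketch only; not the project's argon2 hashing."""
    salt = os.urandom(16)
    return salt + hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def bootstrap(users: Dict[str, User], email: str, password: str) -> User:
    """Set the first real password; refuse once any user already has one."""
    if any(u.password_hash for u in users.values()):
        raise PermissionError("bootstrap disabled: a user with a password exists")
    user = users.get(email)
    if user is None:
        # Non-matching email: a new admin is created alongside the seed user.
        user = User(email=email)
        users[email] = user
    user.password_hash = _hash_password(password)
    user.role = "admin"
    return user


users = {"seed@example.com": User("seed@example.com")}  # password-less seed user
admin = bootstrap(users, "seed@example.com", "s3cret")
print(admin.role, admin.password_hash is not None)
```

A second call with any email raises `PermissionError`, which models the endpoint deactivating itself as soon as one password is set.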
ZdenekSrotyr	7245eedd23	Merge: code review fixes — scoped SA, fail-fast, firewall split, cron .env	2026-04-21 19:40:07 +02:00
ZdenekSrotyr	921094ae40	feat(infra): address code review: scoped SA, fail-fast secrets, firewall split, cron reads .env, merge fix Critical fixes: - C1: VM SA now gets secretmanager.secretAccessor only on specific secrets (JWT + each entry in runtime_secrets). Previously project-wide. - C3: chmod 640 on /var/log/agnes-startup.log (defense in depth) - C4: Remove '|| echo ""' fallback on keboola-storage-token; boot now fails loudly if the secret is missing instead of starting a broken app. - C5: Cron auto-upgrade script sources /opt/agnes/.env for AGNES_TAG. If an operator edits .env to pin a specific stable-YYYY.MM.N, cron picks it up immediately with no drift. Removed AGNES_TAG from crontab entry. - C7: explicit depends_on = [IAM bindings, secret_version] on VM; prevents race where VM boots before IAM propagates. Important fixes: - I1: Split firewall into web (80/443 + conditional 8000) and ssh (port 22 with configurable source_ranges, default IAP range only). - I4: Fetch docker-compose files from compose_ref (default 'main'), so customers can pin a specific tag for reproducibility. - I5+I6: Merge order fixed: user-supplied dev_instances values now override defaults (was the other way around). Dev tls_mode default flipped to 'none'. - I7: Remove '|| true' on Caddyfile fetch; surface failures loudly. - New acme_email variable (falls back to seed_admin_email if empty). Out-of-module: - Comments translated from Czech to English where applicable (M1).	2026-04-21 19:39:53 +02:00
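The merge-order fix (I5+I6) in the commit above is easiest to see with a pair of dictionaries. The module itself is Terraform, so this Python sketch only mirrors the idea that the last operand of a merge wins; the `machine_type` key and its value are hypothetical, while `tls_mode` and `image_tag` appear elsewhere in this log.

```python
# Hypothetical defaults for a dev instance; only tls_mode = "none" comes from the commit.
DEFAULTS = {"tls_mode": "none", "machine_type": "example-small"}


def merge_wrong(user_values: dict) -> dict:
    """Old behaviour: defaults are applied last, so they clobber user input."""
    return {**user_values, **DEFAULTS}


def merge_fixed(user_values: dict) -> dict:
    """Fixed behaviour: user-supplied values are applied last and win."""
    return {**DEFAULTS, **user_values}


user = {"tls_mode": "caddy", "image_tag": "dev-zs-my-edit"}
print(merge_wrong(user)["tls_mode"])  # the default overrides the user's choice
print(merge_fixed(user)["tls_mode"])  # the user's choice survives
```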
ZdenekSrotyr	9962fc4d40	Merge: final deployment log iteration 2	2026-04-21 19:11:14 +02:00
ZdenekSrotyr	6470e23df3	docs: finalize deployment log — iteration 2 summary	2026-04-21 19:11:07 +02:00
ZdenekSrotyr	1073517969	Merge: onboarding race-condition fix	2026-04-21 19:10:12 +02:00
ZdenekSrotyr	0b4807a836	docs(onboarding): use 'gh repo create --clone' to avoid template-copy race Separate 'gh repo create --clone=false' + 'git clone' races with GitHub's template content propagation. '--clone' waits for it in one step.	2026-04-21 19:10:04 +02:00
ZdenekSrotyr	4501840893	Merge: onboarding docs — propagation, restore, monitoring	2026-04-21 19:06:27 +02:00
ZdenekSrotyr	3e9213bfc4	docs(onboarding): add module propagation, backup restore, monitoring setup - 'Propagating module changes' — explains ignore_changes + -replace workflow - 'Restoring from backup' — step-by-step disk swap from daily snapshot - 'Monitoring alerts' — wiring notification channels	2026-04-21 19:06:20 +02:00
ZdenekSrotyr	85bca573a7	Merge: daily backup snapshot + monitoring alerts	2026-04-21 19:02:07 +02:00

403 commits