Commit graph

395 commits

Author SHA1 Message Date
ZdenekSrotyr
7e4ddf0b01
feat(auth): password reset & invite flows for web + admin (#34) (#37)
* feat(auth): password reset & invite flows for web + admin (#34)

Wires end-to-end the previously orphaned password_reset.html and
password_setup.html templates, adds the missing POST /auth/password/reset
handler (closes #34), and restores the Reset action in the admin user UI
(which origin/main had removed precisely because the flow was broken).

Web flow
- GET  /auth/password/reset — renders the set-new-password form
- POST /auth/password/reset — 'Forgot Password?' request; emails link,
  anti-enumeration (same response for unknown email)
- POST /auth/password/reset/confirm — validates token + 24h TTL, sets new
  password, clears token, logs user in
- GET  /auth/password/setup — renders the setup form (invite link landing)
- POST /auth/password/setup/request — signup-tab 'Request Access' (email-only)
- POST /auth/password/setup/confirm — 7-day TTL, sets password + name, logs in
- Reuses LOCAL_DEV_MODE pattern from email.py: logs the link loudly so
  developers can use the flow without an SMTP/SendGrid transport

Admin flow
- POST /api/users accepts send_invite → returns invite_url + invite_email_sent
- POST /api/users/{id}/reset-password now returns a full reset_url pointing
  at the dedicated password-reset endpoint (NOT the magic-link verifier,
  which would log the user in without prompting for a new password)
- admin_users.html: restored Reset row action, copyable reset/invite link
  modals, invite checkbox on create, reworded 'magic-link not wired' notes

Backward compat
- JSON POST /auth/password/setup kept unchanged (existing tests pass)
- Active-account gate applied to reset/setup flows (matches password_login)

Tests: 21 new cases (tests/test_password_flows.py) covering GET renders,
request/confirm happy + error paths, TTLs, anti-enumeration, and admin
invite/reset URL responses. Full suite: 1309 passed.

Closes #34

* fix(admin-users): allow horizontal scroll when actions overflow

Four action buttons (Tokens, Reset, Set pwd, Delete) can exceed the
viewport on narrow screens. Switch .users-table-wrap from overflow: hidden
to overflow-x: auto so the table scrolls instead of clipping, and lock
row-actions buttons to a single nowrap line.

* fix(admin-users): override base 800px container so table can use full width

The base layout caps .container at 800px, so the table was always being
clipped regardless of viewport. Unclamp the container on this page and
widen the inner page cap to 1400px.

* fix(auth): address Devin review — harden JSON setup, anti-enumeration, preserve email case

Addresses findings from Devin review on PR #37:

1. JSON POST /auth/password/setup now enforces the same SETUP_TOKEN_TTL
   (7 days) and active-account check as the web flow. An expired token or
   a deactivated user can no longer bypass the gate by posting JSON.
   Existing test fixture seeds setup_token_created=now so backward-compat
   tests continue to pass.

2. GET /auth/password/setup no longer looks up the user to pre-fill name.
   The form renders identically regardless of whether the email exists,
   consistent with anti-enumeration in POST /setup/request.

3. reset_request / setup_request no longer lowercase the submitted email.
   The rest of the codebase (password_login, magic-link, admin create)
   uses case-sensitive lookups, so normalizing only here would silently
   fail for mixed-case accounts.

Tests: 6 new cases covering expired-JSON-setup, missing-created-timestamp,
deactivated-user-rejection, mixed-case email preservation, and the
anti-enumeration property of GET /setup.
2026-04-22 17:43:57 +02:00
Petr Simecek
f593a151fc
docs(security): add padak-security.md audit report (#35)
* docs(security): add padak-security.md — full audit report from 2026-04-22

Four-agent audit (secrets/SQLi/authz/SSRF, auth flows, UI wiring, data layer)
deduped into one document. Top 5 to fix first, second/third/fourth tier by
real exploitability, verified non-issues so we don't re-open them, and
coverage gaps where automated scanners / pytest / Jira connector / infra
were not touched.

Missing /auth/password/reset is already tracked in
padak/keboola_agent_cli#206; other top items (script sandbox RCE,
rate-limit, backslash open-redirect, SSRF) still need their own issues.

* docs(security): rephrase methodology description

Replace "four parallel agents" with "parallel review passes over four scope
areas" — same meaning, removes the overlap with agentic-AI terminology.
2026-04-22 16:31:13 +02:00
Petr Simecek
cbb7733987
feat(dev): make local-dev targets for one-keystroke LOCAL_DEV_MODE startup (#33)
* feat(dev): add make local-dev targets wrapping run-local-dev.sh

`make local-dev` runs scripts/run-local-dev.sh so docker compose + the
LOCAL_DEV_MODE overlay are one keystroke away. `make local-dev-down`
and `make local-dev-logs` manage the same 3-file stack.

* fix(make): ensure .env exists before local-dev-down / local-dev-logs

docker-compose.yml declares `env_file: .env` for several services (extract,
telegram-bot, ws-gateway, …). Compose validates those paths during config
parsing even for profiled services that never start, so a missing .env
breaks `make local-dev-down` and `make local-dev-logs`. Add .env as a
file-target prerequisite — touched on demand, so no-op when already
present.

Addresses Devin review on #33.
2026-04-22 14:57:10 +02:00
Petr Simecek
9b5214ea6f
feat(dev): LOCAL_DEV_MODE for one-command local dev + magic-link fixes (#32)
* feat(dev): add LOCAL_DEV_MODE for one-command local dev

When LOCAL_DEV_MODE=1, every protected route auto-authenticates as a seeded
admin user (default dev@localhost) — no login screen, no Google OAuth config,
no magic-link roundtrip. Startup logs a loud warning to make misuse obvious.

Also fixes two preexisting bugs in the magic-link flow that surfaced while
wiring up the dev fallback:

- /auth/email/verify only accepted POST, but the URL embedded in emails is
  a GET link — clicking from any mail client returned 405. Added a GET
  variant that consumes the token, sets the auth cookie, and redirects to
  /dashboard.
- Token expiry check compared an offset-aware datetime.now(timezone.utc)
  against an offset-naive value from DuckDB, raising TypeError on every
  valid link. Normalize the stored timestamp to UTC before subtracting.

Dev-only fallback (scoped strictly to LOCAL_DEV_MODE to keep test and
production behavior identical): send-link logs the magic link to stderr
and returns it as dev_link in the JSON response when no SMTP is configured.

Usage:
  ./scripts/run-local-dev.sh
  open http://localhost:8000  # lands on /dashboard as admin

* fix(dev): URL-encode magic-link email + avoid /login redirect loop

Two issues surfaced by Devin review on PR #32.

1. _build_magic_link interpolated email into the URL unescaped. For addresses
   with '+' (e.g. user+tag@gmail.com) Starlette's query parser decoded '+'
   as a space on the GET /verify side, so repo.get_by_email returned None
   and every click yielded 401 "Invalid link". quote(email, safe='') fixes
   both the email transport and the dev_link fallback.

2. /login in LOCAL_DEV_MODE unconditionally redirected to /dashboard. If
   dev-user seeding failed at startup (main.py wraps seed in try/except),
   /dashboard 401'd, the HTML redirect handler bounced to /login, and the
   loop repeated until the browser aborted. Now /login checks the dev user
   actually exists before short-circuiting; otherwise it falls through to
   the normal login form so the missing seed is visible.
2026-04-22 14:47:33 +02:00
ZdenekSrotyr
d2c76cb221
User management + PAT + CLI distribution + HTML auth redirect (#9 #10 #11 #12) (#28)
* fix: redirect unauthenticated HTML routes to /login (#10)

* docs(plan): user mgmt + PAT + CLI distribution implementation plan (#9 #10 #11 #12)

* build(docker): produce wheel artifact for /cli/download (#9)

* feat(db): schema v5 — users.active + deactivated_at/by (#11)

* feat(api): /cli/download wheel + /cli/install.sh with baked server URL (#9)

* feat(users): repository supports active flag + count_admins (#11)

* feat(ui): /install page with per-deployment install instructions (#9)

* feat(api): user PATCH/reset-password/set-password/activate/deactivate (#11)

* fix(cli): da login prompts for password and sends it in body (#9)

* test(api): safeguard tests for self-deactivate and last admin (#11)

* feat(auth): reject requests from deactivated users (#11)

* fixup(#10): propagate next through /login buttons + lock down sanitizer tests

* feat(cli): da admin set-role/activate/deactivate/reset-password/set-password (#11)

* feat(ui): /admin/users management page (#11)

* feat(db): schema v6 — personal_access_tokens (#12)

* feat(users): access_tokens repository (#12)

* feat(auth): JWT carries typ (session|pat) and explicit jti (#12)

* feat(auth): reject revoked/expired PATs; update last_used_at (#12)

* feat(api): /auth/tokens CRUD + admin revoke; session-only guard (#12)

* feat(cli): da auth token create/list/revoke (#12)

* feat(ui): /profile page with PAT create/list/revoke (#12)

* docs: PAT usage and session/PAT TTL clarification (#12)

* feat(auth): PAT first-use-from-new-IP audit + last_used_ip (schema v7) (#12)

Closes remaining acceptance gap from issue #12: audit_log entry on first use
of a PAT from an IP that differs from the recorded last_used_ip.

- schema v7: personal_access_tokens.last_used_ip column
- AccessTokenRepository.mark_used now stores the client IP
- get_current_user extracts client IP (X-Forwarded-For first hop, fallback
  to request.client.host) and emits a token.first_use_new_ip audit when the
  IP changes on a subsequent use (not the very first use)
- tests: new-ip audit, same-ip no-op, first-ever-use no-op, schema v7 column

* fix: address Devin review findings on PR #28

- app/main.py: exclude /auth/* from HTML redirect handler so JSON
  endpoints under /auth/ (PAT CRUD used by `da auth token` CLI) keep
  their 401 JSON contract (Devin #1, bug)
- app/api/tokens.py: reject expires_in_days <= 0 explicitly; use
  `is not None` so 0 no longer silently creates a non-expiring token
  (Devin #2)
- app/api/users.py: validate role against Role enum in create_user
  to match update_user and prevent 500 on role-protected requests
  later (Devin #3)
- app/web/templates/admin_users.html: escape user-supplied strings
  before innerHTML; move onclick handlers to addEventListener via
  data attributes so emails with quotes / HTML no longer break the UI
  or enable stored XSS (Devin #4)
- app/auth/router.py, app/auth/providers/{password,google}.py:
  reject deactivated users at login instead of issuing a JWT that
  would then fail on the next request — removes the confusing
  redirect loop (Devin #5)
- CLAUDE.md: document schema v7 instead of stale v4 (Devin #6)
- tests/test_web_ui.py: regression test for the /auth/* JSON 401

* feat(web): add /profile and /admin/users links to dashboard nav

* feat(web): point setup banner at /install page

* chore(web): drop unused setup_instructions context

* fix: address Devin review round 2 on PR #28

- app/api/tokens.py: when expires_in_days is None (the "never" option),
  use a ~100-year JWT expiry so the token doesn't silently die in 24h
  via the session-default fallback in create_access_token. The real
  expiry enforcement stays in verify_token's DB-level check (Devin 🔴)
- app/web/templates/profile.html: escape t.name and other user-supplied
  strings via esc() helper before innerHTML, same pattern as
  admin_users.html. Move revoke onclick to data-attribute +
  addEventListener (Devin 🟡)
- app/api/cli_artifacts.py: use `mktemp -d` with X's at end of template
  for GNU/BSD portability, place wheel inside the temp dir and
  clean up with rm -rf (Devin 🚩)

* feat(web): redesign /install page; make curl one-liner primary, collapse manual

Rebuild the public /install page using the dashboard visual language
(shared header, card layout, gradient hero, design tokens from
style-custom.css). The page is now anchored on the one-liner install
path: curl -fsSL <server>/cli/install.sh | bash is rendered as the
primary, prominent step 1, while the old manual wheel-download flow
is tucked behind a closed-by-default <details> block for users in
restricted/offline environments.

Information architecture:
  hero (server URL + version)
  -> step 1: quick install (one-liner, big Copy button)
  -> step 2: create PAT on /profile + export DA_TOKEN / da auth whoami
  -> step 3: Claude Code / MCP via ~/.config/da/token.json
  -> collapsed "Manual install" details for download-wheel flow
  -> footer link to docs/HEADLESS_USAGE.md

Every shell snippet has a vanilla-JS "Copy" button that confirms
visually ("Copied!" for 1.5s) and falls back to textarea+execCommand
on non-secure contexts. No new dependencies, no bundler.

The route now also pulls an optional user so the header shows the
same nav (Dashboard / Profile / Logout) as dashboard.html when a
session exists, while staying fully public when signed out.

* fix(cli): use real wheel filename in install.sh (broken pip/uv install)

The installer wrote the downloaded wheel as agnes_cli.whl, which lacks a
PEP-427 version component — both pip and uv tool install reject it and
abort the one-liner.

Use curl -OJ so Content-Disposition determines the on-disk filename, then
resolve it via glob. Install an EXIT trap to remove the tmpdir even when
install fails.

* fix(web): correct manual install wheel glob and add PEP 668 / PATH hints

- Wheel glob is agnes_the_ai_analyst-*.whl (not agnes-*.whl) — the old
  pattern never matched the real artefact name from the build.
- Add — or — separator between uv tool install and pip install.
- Warn that pip install --user is blocked on macOS Homebrew / modern
  Debian (PEP 668) and recommend uv tool install as the default path.
- Both flows now show the ~/.local/bin PATH hint so a fresh shell can
  find the da binary after install.

* fix(web): consistent session.user reference in install header

The avatar-letter fallback inside {% if session.user %} was reading
user.name / user.email directly, but the route dependency can pass
user=None — those references resolved to an empty FlexDict and produced
an empty avatar circle. Read everything through session.user to match
the guard and the dashboard pattern.

* fix(web): point headless usage link at GitHub source

/docs/HEADLESS_USAGE.md 404s — no static route serves repo docs. Point
the footer link at the rendered markdown on GitHub instead of adding a
dedicated docs serving route just for one file.

* feat(web): /install hero size, anon sign-in banner, step 2 copy polish

- Bump hero h1 from 26px to 30px to match dashboard primary scale.
- Anonymous visitors see a small sign-in banner above Step 2 (creating
  a token requires auth; without the banner the flow appears stuck).
- Add an 'After generating your token' section label inside Step 2 so
  the /profile CTA button no longer looks wedged mid-sentence between
  adjacent paragraphs.

* chore(web): /install a11y + version pill polish

- aria-live='polite' on copy buttons so screen readers announce the
  'Copied!' state change.
- Replace redundant INSTANCE_NAME eyebrow (already in the header logo)
  with 'Getting started'.
- Hide the version pill when AGNES_VERSION is unset/'dev' — avoids the
  misleading 'vdev' label in local/unbuilt runs.
- Manual summary focus-visible outline-offset +2px (was -2px which
  clipped inside the card), and mark the chevron as decorative.

* fix(web): use session.user in dashboard avatar fallback

Inside {% if session.user %} guard, the avatar fallback referenced
(user.name or user.email). If user is None the block crashes when
the profile picture is absent. Align with the guard variable.

* fix: address Devin review round 3 on PR #28

- app/api/users.py: stop auto-sending email from reset_password. The
  magic-link sender would deliver a "Login Link" that — when clicked —
  consumes the reset_token via verify_magic_link and logs the user in
  WITHOUT prompting for a new password. Admins now share the raw
  reset_token from the API response manually, or use set-password
  directly. email_sent is always False. Documented inline. (Devin 🟡)
- app/api/cli_artifacts.py: harden /cli/install.sh generation against
  shell injection via Host header or AGNES_VERSION. base_url is
  validated against a strict scheme+host+port regex; version against
  an alnum + dot/dash/underscore allowlist. Both values are also
  piped through shlex.quote() as defense in depth. (Devin 🟡)

The shared users.reset_token column between magic-link and password-
reset flows (Devin 🚩) remains an architectural gap; splitting into
separate columns needs schema v8 and is tracked for a follow-up PR.

* docs, chore(grpn): manual-deploy helpers + hackathon deploy learnings

Adds scripts/grpn/ — Makefile + agnes-auto-upgrade.sh + README for
operating Agnes on GRPN's existing foundryai-development VM when the
full Terraform flow is blocked by org policies:

- iam.disableServiceAccountKeyCreation (org constraint) forbids SA
  JSON keys, so GCP_SA_KEY-based CI is unavailable
- No projectIamAdmin delegation → bootstrap-gcp.sh can't grant roles
- Secret Manager IAM bindings require setIamPolicy which editor lacks

Helper targets: deploy, deploy-tag, recreate, restart, stop, start,
status, version, logs, ps, env, ssh, tunnel, open, bootstrap-admin,
set-data-source, install-cron, uninstall-cron.

docs/superpowers/plans/2026-04-22-grpn-deploy-learnings.md — running
log of all org-policy constraints hit during the hackathon deploy,
with workarounds and derived follow-ups (WIF support, external_ip
variable, customer onboarding IAM checklist).

Not a replacement for the TF flow — stopgap until WIF lands.

* fix(web): make header logos clickable links to home

* feat(web): one-click "Setup a new Claude Code" button

Adds a single-button flow on the dashboard and /install page that
generates a fresh personal access token via POST /auth/tokens and
copies a complete, paste-ready setup script (server URL, token,
install/verify commands) to the clipboard. Falls back to a modal
textarea when the clipboard is blocked; redirects to /login on 401;
surfaces backend errors inline.

- dashboard.html: replaces the top "Set up your local environment"
  anchor with a real button wired to setupNewClaude(). Removes the
  duplicate bottom setup banner to keep a single entry point.
- install.html: for signed-in users, Step 1 leads with the one-click
  button and demotes the curl one-liner into a collapsible "Or run
  manually" aside. Anonymous visitors still see the curl flow plus a
  sign-in hint.
- No new deps. Vanilla JS. Token lives in memory/clipboard only —
  never rendered into persistent DOM.

* feat(cli): add "da auth import-token" for non-interactive PAT login

Writes a provided JWT into ~/.config/da/token.json using the canonical
{access_token, email, role} shape expected by save_token(). Decodes the
token locally to pull email/role claims, verifies it against the server
via GET /api/catalog/tables, and refuses to overwrite an existing token
file if the server returns 401. --email / --role overrides exist for
tokens missing those claims; --skip-verify bypasses the server round-trip
for offline / CI scenarios.

* test(cli): cover da auth import-token success + 401 + claim-fallback paths

Three new tests in TestAuthImportToken:
- valid JWT + 200 -> canonical token.json written
- 401 from /api/catalog/tables -> exit 1, existing token file untouched
- JWT without email/role claims -> refused without overrides, accepted
  with --email / --role flags

* feat(web): update one-click Claude setup instructions — explicit uv install, import-token, skills question

Replaces the fragile `cat > token.json <<EOF` clipboard payload with an
explicit, auditable sequence:

  1. `curl -fsSL /cli/download` + `uv tool install --force` (no opaque
     `curl | bash`).
  2. `da auth import-token --token ...` instead of hand-written JSON.
  3. Explicit PATH persistence for zsh/bash.
  4. A required question to the user about whether to copy the bundled
     skills into ~/.claude/skills/agnes/ or pull them on-demand via
     `da skills show`.
  5. A final confirmation step with whoami + version output.

Factored both pages to include a shared partial
(app/web/templates/_claude_setup_instructions.jinja) so dashboard.html
and install.html can never drift apart again. {server_url} and {token}
stay as runtime placeholders substituted by renderSetupInstructions().

* feat(ui): modernize /admin/users + unify header nav across pages

- New shared partial app/web/templates/_app_header.html — single source
  of truth for the top navigation. Used by base.html and dashboard.html
  (which doesn't extend base.html). Active page highlighted via
  request.url.path. Admin "Users" link gated by session.user.role.
- style-custom.css: add .app-header / .app-nav-link / .app-btn-logout /
  .app-avatar styles (mirrors dashboard's previous inline copy under
  app-* prefix). Mobile-friendly fallback at <720px.
- base.html: include the new partial so every page extending base
  (admin_users, profile, login_email, error, …) gets the same chrome
  the dashboard has.
- dashboard.html: replace its inline <header class="header"> markup
  with the shared partial. Inline .header CSS left in place as
  harmless dead code (separate cleanup PR).
- admin_users.html: rewritten with avatars, role pills (color-coded
  per role), toggle switch for active, search/filter input, toast
  notifications, modal dialogs replacing alert/confirm/prompt,
  one-click copy for the reset token, empty / loading states.
  All XSS-safe via the existing esc() helper + data-attribute
  event delegation.
- tests/test_web_ui.py: smoke test that /admin/users renders the new
  shared header chrome and the modernized markup.

* feat(api): serve CLI wheel at /cli/agnes.whl for direct uv install

uv tool install inspects the URL path suffix to recognise a wheel, so
/cli/download (which has no .whl suffix) cannot be installed directly.
Expose a stable /cli/agnes.whl alias over the same wheel lookup so users
can run: uv tool install --force https://<server>/cli/agnes.whl

* test(cli): cover da auth import-token --server persisting to config.yaml

The server persistence was already implemented in the import-token command
(save_config({server}) call) but not covered by tests. Add an explicit test
so the one-step setup contract — single import-token call writes both token
and server — cannot regress.

* feat(web): simpler Claude setup — single uv install URL, single import-token call

User feedback: the prior clipboard payload repeated the server URL and
token across multiple steps (curl + tmpfile + install + rm + separate
seed-config + import-token). Collapse to:

 1. uv tool install --force {server_url}/cli/agnes.whl  (single URL, direct)
 2. da auth import-token --token ... --server ...        (one call, persists both)
 3. da auth whoami
 4. skills (ask user first)
 5. confirm

uv accepts HTTPS URLs that end in .whl and installs them directly, so
the tmpfile dance is unnecessary. import-token --server already persists
the server to config.yaml, so no separate printf > config.yaml step.

* fix(tests): update admin users heading assertion after template rename

The admin_users.html template now uses <h2 class="users-title">Users</h2>
instead of <h2>User management</h2>. Update the assertion to match.

* feat(ui): unify header across remaining 7 standalone pages

These 7 pages render their own full <html> and don't extend base.html,
so the previous unification commit only covered base + dashboard. Each
had its own ad-hoc <header> markup with inconsistent classes
(.top-header / .header / .page-header), inconsistent nav-link sets,
and inconsistent avatar/email styling.

Replace each inline <header>...</header> block with the shared
{% include '_app_header.html' %} so /activity-center, /admin/permissions,
/admin/tables, /catalog, /corporate-memory, /corporate-memory/admin,
and /install all show the same chrome (Dashboard / Install CLI /
Profile / Users / email + avatar / Logout) with the active page
highlighted via request.url.path.

Old inline header CSS (.header, .top-header, .page-header, .nav-link,
etc.) is left in place as harmless dead code; it can be cleaned up in
a follow-up sweep.

* feat(web): add readable preview of Claude setup payload on dashboard + /install

Move the line-by-line setup instructions into app/web/setup_instructions.py
as the single source of truth, then render them in two modes from the
existing _claude_setup_instructions.jinja partial:

- preview_mode=True  → visible, read-only <pre><code> block with the real
  server URL and a clearly-styled placeholder token (never a real one).
- preview_mode=False → the JS SETUP_INSTRUCTIONS_TEMPLATE used by the
  one-click flow (unchanged behaviour).

Both /dashboard (env-setup-cta card) and /install (Step 1 card) now show
the preview directly under the 'Setup a new Claude Code' button so users
can see exactly what will land in their clipboard before they click.

* feat(web): update setup instructions — `da diagnose` step, explicit section titles

Rework the Claude Code setup payload to:

- Give every numbered step an unambiguous verb header ("1) Install the CLI",
  "2) Log in", "3) Verify the login", "4) Run diagnostics", "5) Skills (ask
  the user first)", "6) Confirm").
- Add step 4 `da diagnose` as the post-login health check. The CLI already
  ships this command (cli/commands/diagnose.py); it prints "Overall:
  healthy" and a list of green checks that map cleanly to next actions.
- Ask the skills copy-vs-on-demand question verbatim so Claude Code always
  prompts the user the same way.
- Replace the terse "Confirm" line with a 4-bullet summary (version,
  whoami, skills choice, diagnose status) so the return message is
  structured and comparable across setups.

* chore(web): remove stale MCP card from /install (no MCP server today)

The 'Use with Claude Code / MCP' card (Step 3 on /install) referenced an
MCP integration Agnes does not ship. Remove the whole card. The one-click
'Setup a new Claude Code' flow in Step 1 already covers the long-lived
client use case and is less confusing than dangling persistence tips for
a non-existent integration.

* feat(api): include user_email + last_used_ip + user_id in admin tokens list response

Adds AdminTokenItem response model (superset of TokenListItem) and
AccessTokenRepository.list_all_with_user() joining personal_access_tokens
with users to denormalize user_email. Needed for /admin/tokens UI where
admins triage tokens across all users.

* feat(web): /admin/tokens page — list, filter, search, revoke across all users

Adds a new admin-only page with client-side filtering (status, user email,
last-used window), column sorting, counts bar (active/revoked/expired),
and an inline revoke action. Mirrors the /admin/users visual language.

* feat(web): add Tokens nav link for admins + deep-link from admin/users row

Admin-only nav entry to /admin/tokens, and a per-row Tokens button on
/admin/users that prefills the token page's user filter via ?user=<email>.

* test(admin): cover /admin/tokens rendering, filter state, non-admin denial, revoke

Verifies admin can render the page (title + JS hooks present), a non-admin
is blocked, unauthenticated users are redirected, the admin list response
includes user_email / user_id / last_used_ip, and admin can revoke another
user's token.

* feat(web): modern redesign of /admin/tokens — hero, stat strip, refined table, responsive cards, a11y

* feat(web): ditch the table — /admin/tokens as a card stack, modern GitHub-style list

Replaces the table-based layout with a stack of self-contained token cards
inside a <ul role=list>. Each card is a flex row: avatar + name/meta on the
left, last-used block in the middle, status pill + outlined 'Revoke' button
on the right. Status and sort controls are pill-shaped toggle chips; user
email search has an inline search icon. No <table>/<tr>/<th>/<td> anywhere.
Responsive below 720px (card stacks vertically) and 480px (stat chips 2x2).
Preserves filter IDs (flt-status, flt-user, flt-last-used) and data-revoke
for existing tests.

* feat(web): add /tokens (role-aware) — single page for both user PAT CRUD and admin overview

- Rename admin_tokens.html -> tokens.html with a new is_admin context flag.
- New route GET /tokens: renders the same card-stack UI for everyone.
  * Admins: loads /auth/admin/tokens, shows owner column + stat strip, keeps
    the owner-email search box and sort-by-owner chip.
  * Non-admins: loads /auth/tokens (own tokens only), hides owner column +
    stat chips, adds a 'New token' CTA in the hero that opens a modal
    (name + expires_in_days) calling POST /auth/tokens. The raw token is
    revealed once in a dismissable banner and cleared from the DOM on Hide.
- GET /admin/tokens now 302-redirects to /tokens, preserving query string
  (so the /admin/users deep-link ?user=foo still works).

* feat(web): /tokens full-bleed layout to match dashboard width

The hero, toolbar, and card list used to sit inside base.html's .container
(max-width 800px). Break out with negative horizontal margins so the page
spans the viewport like /dashboard does, capped at 1440px for readability
on very wide screens with a 24px gutter on each side.

- No change to base.html itself. The override is scoped to .tokens-page.
- body { overflow-x: hidden; } guards against rare horizontal scrollbars.
- < 808px viewport: reset to natural flow (mobile already narrower).
- ≥ 1488px viewport: cap to 1440px and re-center.

* chore(web): remove /profile template + nav link (redirect /profile -> /tokens)

The old /profile PAT CRUD page is now redundant — the modern /tokens page
covers both user and admin flows. Delete the template; the router's
/profile handler already 302-redirects to /tokens.

Nav cleanup:
- Remove the 'Profile' link.
- Show a single 'Tokens' link to every signed-in user (previously only
  admins saw it).
- Active-state matches /tokens, /admin/tokens, and /profile so the
  highlight survives the redirect chain.

/install CTA now points at /tokens instead of /profile.

* test: cover /tokens for admin + non-admin flows, /profile redirect, nav update

tests/test_admin_tokens_ui.py
- Point admin rendering test at /tokens directly and tighten assertions
  (admin-only stat strip + owner search, non-admin CTA absent).
- Add test_non_admin_can_render_tokens_page: personal body, New-token CTA,
  create-modal, reveal banner; stat strip + owner search absent.
- Add test_admin_tokens_redirects_to_tokens: 302 to /tokens, query string
  (?user=...) preserved for the /admin/users deep-link.
- Add test_profile_redirects_to_tokens: 302 to /tokens.
- Add test_non_admin_can_create_pat_via_tokens_page_api: exercises the
  POST /auth/tokens call that the non-admin create-modal submits.

tests/test_pat.py
- test_profile_page_renders -> test_profile_page_redirects_to_tokens:
  assert the 302 + that /tokens lands on the unified non-admin body.

tests/test_web_ui.py
- admin_users nav assertion: 'Tokens' link present, 'Profile' link absent.
- Add test_nav_shows_tokens_link_for_non_admin: non-admins see the same
  'Tokens' link (previously only admins did).
- Add test_profile_redirects_to_tokens back-compat check.

* feat(web): collapse 'What Claude Code will receive' by default

The preview block on /dashboard and /install now uses <details>/<summary>
so it is hidden by default. Click the chevron/title to expand and review
the clipboard payload. Markup stays in the DOM so existing tests that
assert on content continue to pass.

* fix(web): /tokens width — override .container to 1280px like dashboard

The negative-margin full-bleed trick was fragile and pushed content past
the right edge on deployed viewports. Replace with a simple max-width
override of base.html's .container on this page only, matching
/dashboard's 1280px center-column layout.

* feat(web): split role-aware /tokens into my_tokens.html + admin_tokens.html

* feat(web): router — separate handlers for /tokens (own) and /admin/tokens (all)

* feat(web): nav — show Tokens for all, add All tokens for admins

* test: cover split token pages (own vs all) + admin access gating

* feat(web): move 'My tokens' into a user dropdown menu

Replaces the separate Tokens/email/Logout nav trio with a rounded
avatar trigger that opens a dropdown containing the user's email,
role, a 'My tokens' link, and Logout. Admin-only 'All tokens' stays
as a top-level nav item since it's an admin function, not a personal
one. Click-outside and Escape close the panel; chevron rotates on
open.

* fix(api): allow PATs to list/get/revoke their own tokens (CLI flow)

The documented 'da auth token list/revoke' CLI flow in
docs/HEADLESS_USAGE.md uses a PAT, but the previous dependency
(require_session_token) returned 403. Only create_token must be
session-only to prevent PAT-spawning-PAT chains; listing and
revoking your own tokens is safe with a PAT.

* fix(api): cap expires_in_days at 3650 to avoid datetime overflow (500 to 400)

Values above ~11 million days overflowed datetime.max in
datetime.now(utc) + timedelta(days=...) and surfaced as an
unhandled OverflowError → 500. Cap at 10 years with a clear
400 instead; the no-expiry code path is unaffected.

* fix(api): relax _SAFE_URL_RE to allow path prefixes, underscores, and IPv6

The previous regex rejected legitimate reverse-proxy base_url values
(https://host/agnes/), underscores in Docker Compose hostnames, and
IPv6 literals (http://[::1]:8000). Widen the charset and allow an
optional trailing path. shlex.quote continues to provide
defense-in-depth against any metacharacter that slips through.

* fix(web): /login/email and Google OAuth propagate next_path

Previously, /login/email silently dropped the ?next=<path> query
param so the hidden form field rendered empty and login always
landed on /dashboard. Google's button was hard-coded to
/auth/google/login, ignoring next entirely.

- /login page now appends ?next to the Google button URL
- /login/email reads + sanitizes next, passes as template context
- google_login stashes sanitized next_path in session['login_next']
- google_callback pops + re-sanitizes and redirects there

Sanitization factored into app/auth/_common.safe_next_path.

* fix(auth): differentiate argon2 VerifyMismatchError from internal errors in web login

The previous except (VerifyMismatchError, Exception) collapsed both
cases into the generic 'invalid credentials' redirect, silently
hiding corrupted-hash / library errors from ops. Split the two:
bad password still gets ?error=invalid; anything else logs via
logger.exception and redirects with ?err=auth_internal so ops have
a visible signal and users don't retry forever against a broken
password_hash column.

* docs: correct CLAUDE.md table name (personal_access_tokens)

v7 note referenced 'access_tokens.last_used_ip' but the real table
is personal_access_tokens (as mentioned two tokens earlier in the
same bullet). Same-file consistency fix.

* chore(web): clarify admin user-reset UI — encourage Set password over the unused reset_token

POST /api/users/{id}/reset-password stores and returns a token
but no endpoint consumes it — the magic-link sender would log the
user in without prompting for a new password, defeating the reset.
- Drop the 'Reset' row action from admin_users so admins aren't
  pointed at a dead end.
- Rewrite the reveal-modal copy to tell admins to use Set password
  and explicitly note that the magic-link flow isn't available
  for reset tokens in this build.
The API endpoint stays for API-level future use.

* test: cover PAT CLI flow, expires_in_days overflow, proxy base_url, next propagation

- tests/test_pat.py: PAT can list own tokens (200, was 403);
  PAT can revoke own tokens (204); create_token returns 400 for
  expires_in_days > 3650 (was 500 via datetime overflow).
- tests/test_cli_artifacts.py: _SAFE_URL_RE accepts reverse-proxy
  path prefixes, underscores, and IPv6 literals; end-to-end check
  of cli_install_script with a stubbed base_url that includes
  a path prefix (Agnes behind /agnes/).
- tests/test_web_ui.py: /login propagates ?next to the Google
  button URL; /login/email renders next in the hidden form field
  and strips hostile values; unit coverage of safe_next_path.

* fix(security): use \Z instead of $ in URL/version allowlists (trailing-\n bypass)

Python regex `$` also matches just before a trailing newline, so a Host
header or AGNES_VERSION value like "good.example.com\n$(rm -rf /)"
would slip past the allowlist. `\Z` anchors to strict end-of-string.

shlex.quote downstream remains as defense-in-depth, but the allowlist
is now the tight gate it claims to be.

* fix(auth): PAT with null expiry omits JWT exp claim (DB is the source of truth)

Previously a PAT created with `expires_in_days=null` (user-requested
"never expires") set the DB `expires_at` to NULL (correct) but still
baked a ~100y `exp` claim into the JWT. That is misleading: the PAT
silently did expire eventually, despite the UI and API promising
"no expiry".

`create_access_token` now accepts `omit_exp=True` to skip the `exp`
claim entirely. `app/api/tokens.py` passes that when `expires_in_days
is None`. The authoritative expiry check lives in
`app/auth/dependencies.py`, which reads `expires_at` from the DB row —
unchanged. PyJWT accepts claim-less JWTs indefinitely.

* test: cover trailing-newline regex bypass + no-exp JWT for unbounded PAT

- test_safe_url_re_rejects_trailing_newline_bypass: asserts both
  `_SAFE_URL_RE` and `_SAFE_VERSION_RE` reject values with a trailing
  `\n` (previously accepted because Python `$` matches before `\n`).
- test_pat_null_expiry_jwt_has_no_exp_claim: POST /auth/tokens with
  `expires_in_days=null`, decode the returned JWT, assert `exp` is
  absent while `typ=pat`, `sub`, and `jti` are still present.
- test_pat_with_null_expiry_is_accepted_by_verify_token: verify_token
  round-trips a claim-less JWT without ExpiredSignatureError.
- test_pat_null_expiry_end_to_end_allows_authenticated_request: use
  the null-expiry PAT against /auth/tokens and confirm it authenticates.

* docs(auth): document X-Forwarded-For trust model in _client_ip

Deployment runs behind Caddy which strips incoming X-Forwarded-For
and sets its own, so the leftmost hop is trustworthy. Clarify that
the stored last_used_ip is audit-only and never used for access
control — if the app is ever exposed directly, this value becomes
client-settable.

* docs: /profile → /tokens in install.sh next-steps, CLI error, HEADLESS_USAGE, security skill

After splitting PAT management to /tokens (with /profile as a back-compat
302), stale references remained in user-facing text. Update them to the
canonical /tokens URL so shell scripts, CLI error hints, docs, and the
bundled security skill are all consistent.
2026-04-22 14:24:28 +02:00
ZdenekSrotyr
963db420fe
ci(release): push dev-<user-prefix>-latest alias for <user>/* branches (#31)
Adds a second tag to dev-channel image builds: when a branch is in the
form <prefix>/<whatever>, the image is also pushed as
ghcr.io/keboola/agnes-the-ai-analyst:dev-<prefix>-latest.

Enables per-developer dev VMs on GRPN (and elsewhere) to auto-deploy
without knowing the specific branch slug. Each VM pins its .env to
AGNES_TAG=dev-<prefix>-latest, and the auto-upgrade cron (5 min tick)
picks up the newly pushed image on the next run.

Common Git Flow prefixes are deliberately skipped so feature/*, fix/*,
hotfix/* etc. don't create noise tags. Matched list:
feature, fix, hotfix, bugfix, docs, chore, test, ci, ops, refactor,
perf, style, build.

Verified locally against several branch names:
  zs/my-feature     -> dev-zs-latest
  vr/foo            -> dev-vr-latest
  pc/bar-baz        -> dev-pc-latest
  feature/xyz       -> (skipped)
  fix/bug           -> (skipped)
  main              -> (no-op, stable channel)
  test-no-slash     -> (no-op, no slash)
2026-04-22 14:02:59 +02:00
ZdenekSrotyr
060335deba
docs(quickstart): add Hackathon section pointing to switch-dev-vm.sh and HACKATHON.md (#14) (#23) 2026-04-21 21:59:23 +02:00
ZdenekSrotyr
4f381dc103
fix(ci): propagate-infra-tag fail-soft on branch push / missing secret (#24)
Job-level 'if: secrets.X != ""' did not prevent workflow from being
scheduled on branch pushes (GitHub reports failure with 0 jobs in that
case). Refactored: first step is a guard that checks both the tag ref
pattern and the secret presence; downstream steps skip when the guard
says no.

Result: workflow now reports success with a clear warning annotation on
branch pushes or when the secret is absent; only real infra-v* tag
pushes with the secret set perform the bump.
2026-04-21 21:59:10 +02:00
ZdenekSrotyr
5c6a641de7
Merge pull request #22 from keboola/feature/seed-admin-password
feat(auth): optional SEED_ADMIN_PASSWORD to pre-hash seed admin (dev helper)
2026-04-21 21:37:29 +02:00
ZdenekSrotyr
e2eb51f657
ci(release): build image for all branches, not just feature/** (#19)
* dryrun: intentional failing test (will be reverted)

* feat(auth): optional SEED_ADMIN_PASSWORD to pre-hash seed admin (dev helper)

Terraform gains enable_seed_password + seed_admin_password (sensitive) vars
on the customer-instance module; when enabled the password is piped via
startup-script into /opt/agnes/.env as SEED_ADMIN_PASSWORD. On first boot
app/main.py argon2-hashes it onto the seed user so the admin can log in
immediately without going through /auth/bootstrap. Never overwrites an
existing password_hash — safe against accidental reset on terraform apply.

* ci(release): build :dev-<slug> on any branch, not just feature/**

Before: only 'feature/**' branches triggered release.yml, so pushing
'zs/my-edit' or 'fix/bug' did not publish an image. dev_instances entry
pinning image_tag = 'dev-zs-my-edit' then crashed VM startup with
'image not found'.

Now: any branch push (except main, which produces :stable) publishes
:dev-<slug>. Slug strips a leading 'feature/' and replaces non-[a-z0-9-]
with '-', keeping existing feature/** behavior identical.

* Revert "dryrun: intentional failing test (will be reverted)"

This reverts commit cf9cc06a7884bb401ff29fc5cb6d8baf84dc3daa.
2026-04-21 21:33:57 +02:00
ZdenekSrotyr
1ca5295d54
docs: add HACKATHON.md — condensed deploy + dev playbooks (#21)
Written for both humans and AI agents — explicit commands, expected
outputs, troubleshooting tables, 'safe to run anytime' vs 'requires
thought' sections, pitfalls checklist.

Three parts:
1. Deploy for a new customer (45 min target, 7 steps)
2. Develop against Agnes (branch → image → dev VM loop, common tasks)
3. AI agent checklist (guardrails, verification, common pitfalls)

Complements the deep docs (ONBOARDING.md, DEPLOYMENT.md, architecture.md)
with a practical quick-reference for hackathon-style deploys.
2026-04-21 21:33:06 +02:00
ZdenekSrotyr
ada9fb75f6
chore: add switch-dev-vm.sh helper for hackathon (#20) 2026-04-21 21:33:02 +02:00
ZdenekSrotyr
2cbffce85f
ci: propagate infra-v* tags to template repo + auto-merge rules (#17)
* dryrun: verify per-branch GHCR tag

* ci: propagate infra-v* tag bumps to template repo

On push of any infra-v* tag, opens a PR in keboola/agnes-infra-template
that bumps the module ref in terraform/main.tf. Auto-merge rules in the
template (Renovate + CI validate + GitHub native auto-merge) land it
without manual work on patch/minor bumps.

Requires repo secret TEMPLATE_REPO_TOKEN (fine-grained PAT with
Contents:write + Pull requests:write on keboola/agnes-infra-template).

Fail-soft: if secret is missing the job is skipped and Renovate on the
template repo picks up the new tag on its next cycle as a fallback.

* docs(onboarding): 'Keeping the template up-to-date' maintainer section

Documents the two mechanisms (upstream release hook + Renovate), the
required repo settings (allow_auto_merge, validate.yml gate), the TOKEN
secret setup, and the one-time setup checklist. Notes the difference
between template repo (auto-merge on) and customer infra repos
(human approval).
2026-04-21 21:32:58 +02:00
ZdenekSrotyr
96bd06ba00 feat(auth): optional SEED_ADMIN_PASSWORD to pre-hash seed admin (dev helper)
Terraform gains enable_seed_password + seed_admin_password (sensitive) vars
on the customer-instance module; when enabled the password is piped via
startup-script into /opt/agnes/.env as SEED_ADMIN_PASSWORD. On first boot
app/main.py argon2-hashes it onto the seed user so the admin can log in
immediately without going through /auth/bootstrap. Never overwrites an
existing password_hash — safe against accidental reset on terraform apply.
2026-04-21 21:32:22 +02:00
ZdenekSrotyr
e4f6910398 Merge: real CalVer + commit SHA in UI badge 2026-04-21 21:00:42 +02:00
ZdenekSrotyr
1c7cc8aa29 fix(image): add AGNES_COMMIT_SHA build-arg to Dockerfile + release.yml
Completes the previous commit — bakes the full git SHA into the image ENV
at build time so the UI badge shows a real commit, not a sha256 digest
(which was the floating manifest digest and unhelpful for debugging).
2026-04-21 21:00:30 +02:00
ZdenekSrotyr
af6761f33e fix(version): bake AGNES_VERSION/CHANNEL/COMMIT_SHA into image ENV
Before: startup script wrote AGNES_VERSION=stable (the floating tag name)
into .env, which overrode the image's build-time ENV AGNES_VERSION=2026.04.47.
UI badge showed 'stable-stable' instead of 'stable-2026.04.47'.

After:
- Dockerfile ARG/ENV for AGNES_COMMIT_SHA (alongside existing VERSION + CHANNEL)
- release.yml passes github.sha as AGNES_COMMIT_SHA build-arg
- Startup script no longer writes these three into .env; the app reads them
  from the image ENV set at build time.

Result: badge displays 'stable-2026.04.47 · stable · <time> ago' with the
real CalVer, and the commit SHA tooltip points at an actual commit rather
than the floating manifest digest.
2026-04-21 21:00:04 +02:00
ZdenekSrotyr
7553f77e55 Merge: version badge partial on all full-page templates 2026-04-21 20:52:04 +02:00
ZdenekSrotyr
432e7695b3 feat(ui): version badge as shared partial, injected into every full-page template
The earlier base.html edit only affected templates that extend base.html
(login.html via base_login.html). Most pages (dashboard, catalog,
admin_tables, admin_permissions, activity_center, corporate_memory, ...)
are standalone templates with their own <body>, so the badge never showed.

Fix: extracted the badge + fetch script into _version_badge.html partial,
included it before </body> in every full-page template. Consistent across
login, dashboard, admin, catalog, etc.
2026-04-21 20:51:55 +02:00
ZdenekSrotyr
dbac3e698c Merge: alert policy reducer fix 2026-04-21 20:36:21 +02:00
ZdenekSrotyr
9a99a82e92 fix(infra): alert policy aggregation — drop cross_series_reducer
GCP rejected the policy with 'REDUCE_COUNT_FALSE cannot be applied to
metrics with value type DOUBLE' — because ALIGN_FRACTION_TRUE already
produces a fraction 0..1 per series, no need for an additional cross-series
reducer. Simplified: alert when the per-series fraction < 1 for 5 min.

Review M4 predicted this — uptime check filters needed double-checking
against live GCP.
2026-04-21 20:36:09 +02:00
ZdenekSrotyr
717f40c218 Merge: bootstrap monitoring role fix 2026-04-21 20:32:59 +02:00
ZdenekSrotyr
4ab0838ba2 fix(bootstrap): grant monitoring.editor + enable monitoring API
v1.3.0 added google_monitoring_uptime_check_config + alert policies to the
module, but bootstrap-gcp.sh was not updated. Fresh customers (and the
first apply after upgrading existing customers) hit 403 on
monitoring.uptimeCheckConfigs.create.

Fix: enable monitoring.googleapis.com + grant roles/monitoring.editor to
the deploy SA. Idempotent (safe to re-run on existing projects).
2026-04-21 20:32:50 +02:00
ZdenekSrotyr
3fb17a13bb Merge: workflow-driven recreate + docs 2026-04-21 20:24:40 +02:00
ZdenekSrotyr
1a55167234 docs: workflow-driven VM recreate for startup-script propagation
- ONBOARDING.md: replace 'propagating module changes' section with two
  explicit options — workflow_dispatch with recreate_targets (recommended,
  CI audit trail), or local terraform apply -replace (emergency). Adds a
  'do not' section banning manual .env edits on VMs.
- deployment-log.md: iteration 4 summary (version badge + module v1.5.0 +
  workflow_dispatch).
2026-04-21 20:24:31 +02:00
ZdenekSrotyr
11c03f7235 Merge: version badge in footer + /api/version 2026-04-21 20:19:51 +02:00
ZdenekSrotyr
b091cf7003 feat(ui): version badge in footer + /api/version endpoint
UI now shows a small footer badge with:
- release channel + CalVer version (e.g. 'stable-2026.04.47')
- floating image tag (e.g. 'stable')
- time since last container restart (proxy for 'last deployed')

Backend:
- app/api/health.py: /api/health returns image_tag, commit_sha, deployed_at
- app/api/health.py: new /api/version endpoint (lightweight, no DB hit, for
  footer badge polling)

Infra:
- startup-script.sh.tpl: resolves image digest from ghcr pull, derives
  channel + version from the tag name, and writes AGNES_VERSION /
  RELEASE_CHANNEL / AGNES_COMMIT_SHA into .env so the app can surface them
  to the UI.

UI:
- app/web/templates/base.html: footer loads /api/version asynchronously and
  renders '<channel>-<version> · <tag> · deployed <relative> (<UTC>)'.
  Tooltip shows full detail (commit sha, schema version).
2026-04-21 20:19:40 +02:00
ZdenekSrotyr
2743de6114 Merge: deployment log iteration 3 2026-04-21 20:09:27 +02:00
ZdenekSrotyr
cdd959b19f docs(log): add iteration 3 — review, bootstrap fix, docs sweep, infra-v1.4.0 2026-04-21 20:09:13 +02:00
ZdenekSrotyr
c1227df990 Merge: docs sweep — DEPLOYMENT.md rewrite, ONBOARDING v1.4.0, README links 2026-04-21 20:08:23 +02:00
ZdenekSrotyr
0121354596 docs: refresh DEPLOYMENT.md and ONBOARDING.md for infra-v1.4.0
- docs/DEPLOYMENT.md: rewritten to pick between Terraform (managed) and
  Docker Compose (OSS self-host). Old manual SSH-key-and-git-clone flow
  replaced with compose-based instructions pointing at the persistent-disk
  overlay and bootstrap endpoint.
- docs/ONBOARDING.md: section 4 now documents the new v1.4.0 variables
  (runtime_secrets, firewall_ssh_source_ranges, notification_channel_ids,
  compose_ref). Section 6 explains the /auth/bootstrap seed-user fix and
  warns that destroy+apply reopens the bootstrap window until run again.
- README.md: Documentation list expanded — ONBOARDING.md first (recommended
  path), DEPLOYMENT.md as the branching point, plus links to CONFIGURATION,
  architecture, and QUICKSTART.
2026-04-21 20:07:43 +02:00
ZdenekSrotyr
0643437ab8 Merge: /auth/bootstrap seed-user fix 2026-04-21 20:01:38 +02:00
ZdenekSrotyr
2b17973796 fix(auth): /auth/bootstrap activates seed users, disabled only by real password
Bug: SEED_ADMIN_EMAIL creates a password-less user at app startup, which made
/auth/bootstrap return 403 '1 users already exist' on a fresh deployment —
leaving the operator no way to log in (the seed user has no password, and
/auth/token requires one).

Fix: bootstrap is now disabled only when at least one user has a
password_hash set. On a fresh deploy with a seed user:
- POST /auth/bootstrap { email: <matches seed>, password: X } → sets the
  password on the seed user, promotes to admin, returns token.
- With a non-matching email, a new admin is created alongside the seed user.

Lock semantics: bootstrap self-deactivates as soon as any password is set.

Tests: 8 passing, including new test_bootstrap_activates_seed_user and
test_bootstrap_disabled_when_password_user_exists covering the two halves.
2026-04-21 20:01:20 +02:00
ZdenekSrotyr
7245eedd23 Merge: code review fixes — scoped SA, fail-fast, firewall split, cron .env 2026-04-21 19:40:07 +02:00
ZdenekSrotyr
921094ae40 feat(infra): address code review — scoped SA, fail-fast secrets, firewall split, cron reads .env, merge fix
Critical fixes:
- C1: VM SA now gets secretmanager.secretAccessor only on specific secrets
  (JWT + each entry in runtime_secrets). Previously project-wide.
- C3: chmod 640 on /var/log/agnes-startup.log (defense in depth)
- C4: Remove '|| echo ""' fallback on keboola-storage-token — boot now fails
  loudly if the secret is missing instead of starting a broken app.
- C5: Cron auto-upgrade script sources /opt/agnes/.env for AGNES_TAG. If an
  operator edits .env to pin a specific stable-YYYY.MM.N, cron picks it up
  immediately with no drift. Removed AGNES_TAG from crontab entry.
- C7: explicit depends_on = [IAM bindings, secret_version] on VM — prevents
  race where VM boots before IAM propagates.

Important fixes:
- I1: Split firewall into web (80/443 + conditional 8000) and ssh (port 22 with
  configurable source_ranges, default IAP range only).
- I4: Fetch docker-compose files from compose_ref (default 'main'), so customers
  can pin a specific tag for reproducibility.
- I5+I6: Merge order fixed — user-supplied dev_instances values now override
  defaults (was the other way around). Dev tls_mode default flipped to 'none'.
- I7: Remove '|| true' on Caddyfile fetch; surface failures loudly.
- New acme_email variable (falls back to seed_admin_email if empty).

Out-of-module:
- Comments translated from Czech to English where applicable (M1).
2026-04-21 19:39:53 +02:00
ZdenekSrotyr
9962fc4d40 Merge: final deployment log iteration 2 2026-04-21 19:11:14 +02:00
ZdenekSrotyr
6470e23df3 docs: finalize deployment log — iteration 2 summary 2026-04-21 19:11:07 +02:00
ZdenekSrotyr
1073517969 Merge: onboarding race-condition fix 2026-04-21 19:10:12 +02:00
ZdenekSrotyr
0b4807a836 docs(onboarding): use 'gh repo create --clone' to avoid template-copy race
Separate 'gh repo create --clone=false' + 'git clone' races with GitHub's
template content propagation. '--clone' waits for it in one step.
2026-04-21 19:10:04 +02:00
ZdenekSrotyr
4501840893 Merge: onboarding docs — propagation, restore, monitoring 2026-04-21 19:06:27 +02:00
ZdenekSrotyr
3e9213bfc4 docs(onboarding): add module propagation, backup restore, monitoring setup
- 'Propagating module changes' — explains ignore_changes + -replace workflow
- 'Restoring from backup' — step-by-step disk swap from daily snapshot
- 'Monitoring alerts' — wiring notification channels
2026-04-21 19:06:20 +02:00
ZdenekSrotyr
85bca573a7 Merge: daily backup snapshot + monitoring alerts 2026-04-21 19:02:07 +02:00
ZdenekSrotyr
0842debf8a feat(infra): add daily backup snapshot + monitoring alerts
- google_compute_resource_policy.daily_backup: daily snapshot at 02:00,
  30-day retention, labels (app=agnes, customer=<name>)
- google_compute_disk_resource_policy_attachment.data_backup: attach policy
  to each data disk (prod + dev)
- google_monitoring_uptime_check_config.health: per-VM /api/health uptime
  check every 60s, 10s timeout
- google_monitoring_alert_policy.health_failure: alert when uptime check
  fails for > 5 min

New opt-out: enable_monitoring = false (default true)
New opt-in:  notification_channel_ids = [...] to wire alerts to email/Slack

Module API unchanged; existing customers pick up backups + monitoring on
next module upgrade. TF provider requirement unchanged.
2026-04-21 19:01:56 +02:00
ZdenekSrotyr
0ca8ed2bce Merge: per-branch image tag :dev-<slug> for branch-aware dev deploys 2026-04-21 18:47:16 +02:00
ZdenekSrotyr
5188bd9127 ci: add per-branch image tag :dev-<slug> for branch-aware dev deploys
Extracts branch name from GITHUB_REF, slugifies it, and adds as extra tag
on feature branch builds. Main branch is unaffected (no branch_slug output).

Enables dev_instances tfvar with image_tag pinning specific feature branches.
2026-04-21 18:47:01 +02:00
ZdenekSrotyr
1811a408de Merge: fix CI smoke test — split host bind mount to separate overlay 2026-04-21 16:54:27 +02:00
ZdenekSrotyr
1acc89c486 fix(ci): move bind-mount of /data to separate overlay, fix CI smoke test
The CI smoke test failed because docker-compose.prod.yml forced a bind mount
to /data on the host — which doesn't exist on GitHub runners.

Split the bind mount into docker-compose.host-mount.yml, which is only
composed by the VM startup script (/data exists there, mounted from the
persistent disk). CI continues to use the default named volume.

Module startup script + auto-upgrade cron now compose all three:
  -f docker-compose.yml -f docker-compose.prod.yml -f docker-compose.host-mount.yml
2026-04-21 16:54:18 +02:00
ZdenekSrotyr
a3b4b43e47 Merge: deployment log with final state 2026-04-21 16:51:28 +02:00
ZdenekSrotyr
03dd81c825 docs: update deployment log with final state and onboarding workflow
- Volume fix documented (Docker named volume → bind mount /data)
- Watchtower → cron-based auto-upgrade
- Final state snapshot of VMs, repos, tags, secrets
- Onboarding flow summary for 2nd customer
2026-04-21 16:51:20 +02:00
ZdenekSrotyr
85c6b114b0 Merge: add ONBOARDING.md 2026-04-21 16:49:54 +02:00