User management + PAT + CLI distribution + HTML auth redirect (#9 #10 #11 #12) (#28)

* fix: redirect unauthenticated HTML routes to /login (#10)

* docs(plan): user mgmt + PAT + CLI distribution implementation plan (#9 #10 #11 #12)

* build(docker): produce wheel artifact for /cli/download (#9)

* feat(db): schema v5 — users.active + deactivated_at/by (#11)

* feat(api): /cli/download wheel + /cli/install.sh with baked server URL (#9)

* feat(users): repository supports active flag + count_admins (#11)

* feat(ui): /install page with per-deployment install instructions (#9)

* feat(api): user PATCH/reset-password/set-password/activate/deactivate (#11)

* fix(cli): da login prompts for password and sends it in body (#9)

* test(api): safeguard tests for self-deactivate and last admin (#11)

* feat(auth): reject requests from deactivated users (#11)

* fixup(#10): propagate next through /login buttons + lock down sanitizer tests

* feat(cli): da admin set-role/activate/deactivate/reset-password/set-password (#11)

* feat(ui): /admin/users management page (#11)

* feat(db): schema v6 — personal_access_tokens (#12)

* feat(users): access_tokens repository (#12)

* feat(auth): JWT carries typ (session|pat) and explicit jti (#12)

* feat(auth): reject revoked/expired PATs; update last_used_at (#12)

* feat(api): /auth/tokens CRUD + admin revoke; session-only guard (#12)

* feat(cli): da auth token create/list/revoke (#12)

* feat(ui): /profile page with PAT create/list/revoke (#12)

* docs: PAT usage and session/PAT TTL clarification (#12)

* feat(auth): PAT first-use-from-new-IP audit + last_used_ip (schema v7) (#12)

Closes remaining acceptance gap from issue #12: audit_log entry on first use
of a PAT from an IP that differs from the recorded last_used_ip.

- schema v7: personal_access_tokens.last_used_ip column
- AccessTokenRepository.mark_used now stores the client IP
- get_current_user extracts client IP (X-Forwarded-For first hop, fallback
  to request.client.host) and emits a token.first_use_new_ip audit when the
  IP changes on a subsequent use (not the very first use)
- tests: new-ip audit, same-ip no-op, first-ever-use no-op, schema v7 column
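
A minimal sketch of the IP handling described above. It assumes a plain dict of lowercase header names and a socket peer address. `client_ip` and `should_audit_new_ip` are hypothetical names, not the actual helpers in the auth dependency.

```python
from typing import Mapping, Optional


def client_ip(headers: Mapping[str, str], peer_host: Optional[str]) -> Optional[str]:
    """Return the first X-Forwarded-For hop, falling back to the socket peer.

    Assumes lowercase header keys (Starlette's Headers is case-insensitive)
    and a trusted proxy that overwrites X-Forwarded-For; otherwise the value
    is client-settable and is only fit for auditing.
    """
    forwarded = headers.get("x-forwarded-for", "")
    first_hop = forwarded.split(",")[0].strip()
    return first_hop or peer_host


def should_audit_new_ip(last_used_ip: Optional[str], current_ip: Optional[str]) -> bool:
    """Audit only a *subsequent* use whose IP differs from the stored one."""
    if last_used_ip is None:  # very first use: nothing to compare against
        return False
    return current_ip is not None and current_ip != last_used_ip


ip = client_ip({"x-forwarded-for": "203.0.113.7, 10.0.0.2"}, "10.0.0.2")
print(ip)                                       # 203.0.113.7
print(should_audit_new_ip(None, ip))            # False: first-ever use
print(should_audit_new_ip("198.51.100.1", ip))  # True: IP changed
print(should_audit_new_ip(ip, ip))              # False: same IP
```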

* fix: address Devin review findings on PR #28

- app/main.py: exclude /auth/* from HTML redirect handler so JSON
  endpoints under /auth/ (PAT CRUD used by `da auth token` CLI) keep
  their 401 JSON contract (Devin #1, bug)
- app/api/tokens.py: reject expires_in_days <= 0 explicitly; use
  `is not None` so 0 no longer silently creates a non-expiring token
  (Devin #2)
- app/api/users.py: validate role against Role enum in create_user
  to match update_user and prevent 500 on role-protected requests
  later (Devin #3)
- app/web/templates/admin_users.html: escape user-supplied strings
  before innerHTML; move onclick handlers to addEventListener via
  data attributes so emails with quotes / HTML no longer break the UI
  or enable stored XSS (Devin #4)
- app/auth/router.py, app/auth/providers/{password,google}.py:
  reject deactivated users at login instead of issuing a JWT that
  would then fail on the next request — removes the confusing
  redirect loop (Devin #5)
- CLAUDE.md: document schema v7 instead of stale v4 (Devin #6)
- tests/test_web_ui.py: regression test for the /auth/* JSON 401
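
The expires_in_days fix in the list above comes down to truthiness versus an explicit `None` check. A minimal sketch, assuming a hypothetical `BadRequest` exception stands in for the API's HTTP 400:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


class BadRequest(ValueError):
    """Hypothetical stand-in for the API's HTTP 400 error."""


def expiry_buggy(expires_in_days: Optional[int]) -> Optional[datetime]:
    # Truthiness check: 0 is falsy, so it silently means "never expires".
    if expires_in_days:
        return datetime.now(timezone.utc) + timedelta(days=expires_in_days)
    return None


def expiry_fixed(expires_in_days: Optional[int]) -> Optional[datetime]:
    # Only an explicit None means "no expiry"; 0 and negatives are rejected.
    if expires_in_days is not None:
        if expires_in_days <= 0:
            raise BadRequest("expires_in_days must be positive")
        return datetime.now(timezone.utc) + timedelta(days=expires_in_days)
    return None


print(expiry_buggy(0))     # None: a non-expiring token, the reported bug
print(expiry_fixed(None))  # None: the intentional "never" option
try:
    expiry_fixed(0)
except BadRequest as exc:
    print("400:", exc)
```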

* feat(web): add /profile and /admin/users links to dashboard nav

* feat(web): point setup banner at /install page

* chore(web): drop unused setup_instructions context

* fix: address Devin review round 2 on PR #28

- app/api/tokens.py: when expires_in_days is None (the "never" option),
  use a ~100-year JWT expiry so the token doesn't silently die in 24h
  via the session-default fallback in create_access_token. The real
  expiry enforcement stays in verify_token's DB-level check (Devin 🔴)
- app/web/templates/profile.html: escape t.name and other user-supplied
  strings via esc() helper before innerHTML, same pattern as
  admin_users.html. Move revoke onclick to data-attribute +
  addEventListener (Devin 🟡)
- app/api/cli_artifacts.py: use `mktemp -d` with X's at end of template
  for GNU/BSD portability, place wheel inside the temp dir and
  clean up with rm -rf (Devin 🚩)

* feat(web): redesign /install page; make curl one-liner primary, collapse manual

Rebuild the public /install page using the dashboard visual language
(shared header, card layout, gradient hero, design tokens from
style-custom.css). The page is now anchored on the one-liner install
path: curl -fsSL <server>/cli/install.sh | bash is rendered as the
primary, prominent step 1, while the old manual wheel-download flow
is tucked behind a closed-by-default <details> block for users in
restricted/offline environments.

Information architecture:
  hero (server URL + version)
  -> step 1: quick install (one-liner, big Copy button)
  -> step 2: create PAT on /profile + export DA_TOKEN / da auth whoami
  -> step 3: Claude Code / MCP via ~/.config/da/token.json
  -> collapsed "Manual install" details for download-wheel flow
  -> footer link to docs/HEADLESS_USAGE.md

Every shell snippet has a vanilla-JS "Copy" button that confirms
visually ("Copied!" for 1.5s) and falls back to textarea+execCommand
on non-secure contexts. No new dependencies, no bundler.

The route now also pulls an optional user so the header shows the
same nav (Dashboard / Profile / Logout) as dashboard.html when a
session exists, while staying fully public when signed out.

* fix(cli): use real wheel filename in install.sh (broken pip/uv install)

The installer wrote the downloaded wheel as agnes_cli.whl, which lacks the
PEP 427 version and tag components, so both pip and uv tool install reject
it and abort the one-liner.

Use curl -OJ so Content-Disposition determines the on-disk filename, then
resolve it via glob. Install an EXIT trap to remove the tmpdir even when
install fails.
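
Why agnes_cli.whl is rejected can be seen from the PEP 427 filename layout `{name}-{version}(-{build})?-{python}-{abi}-{platform}.whl`. The check below is a simplified sketch, not what pip or uv actually run, and the version in the example filename is hypothetical:

```python
def looks_like_pep427_wheel(filename: str) -> bool:
    """Rough PEP 427 shape check: name-version[-build]-python-abi-platform.whl.

    Simplified sketch: real installers also validate each component.
    """
    if not filename.endswith(".whl"):
        return False
    parts = filename[: -len(".whl")].split("-")
    # 5 dash-separated parts without a build tag, 6 with one; none empty.
    return len(parts) in (5, 6) and all(parts)


print(looks_like_pep427_wheel("agnes_cli.whl"))  # False: no version or tags
print(looks_like_pep427_wheel("agnes_the_ai_analyst-0.1.0-py3-none-any.whl"))  # True
```

For a full parse, `packaging.utils.parse_wheel_filename` from the `packaging` library does the real work.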

* fix(web): correct manual install wheel glob and add PEP 668 / PATH hints

- Wheel glob is agnes_the_ai_analyst-*.whl (not agnes-*.whl) — the old
  pattern never matched the real artefact name from the build.
- Add — or — separator between uv tool install and pip install.
- Warn that pip install --user is blocked on macOS Homebrew / modern
  Debian (PEP 668) and recommend uv tool install as the default path.
- Both flows now show the ~/.local/bin PATH hint so a fresh shell can
  find the da binary after install.

* fix(web): consistent session.user reference in install header

The avatar-letter fallback inside {% if session.user %} was reading
user.name / user.email directly, but the route dependency can pass
user=None — those references resolved to an empty FlexDict and produced
an empty avatar circle. Read everything through session.user to match
the guard and the dashboard pattern.

* fix(web): point headless usage link at GitHub source

/docs/HEADLESS_USAGE.md 404s — no static route serves repo docs. Point
the footer link at the rendered markdown on GitHub instead of adding a
dedicated docs serving route just for one file.

* feat(web): /install hero size, anon sign-in banner, step 2 copy polish

- Bump hero h1 from 26px to 30px to match dashboard primary scale.
- Anonymous visitors see a small sign-in banner above Step 2 (creating
  a token requires auth; without the banner the flow appears stuck).
- Add an 'After generating your token' section label inside Step 2 so
  the /profile CTA button no longer looks wedged mid-sentence between
  adjacent paragraphs.

* chore(web): /install a11y + version pill polish

- aria-live='polite' on copy buttons so screen readers announce the
  'Copied!' state change.
- Replace redundant INSTANCE_NAME eyebrow (already in the header logo)
  with 'Getting started'.
- Hide the version pill when AGNES_VERSION is unset/'dev' — avoids the
  misleading 'vdev' label in local/unbuilt runs.
- Manual summary focus-visible outline-offset +2px (was -2px which
  clipped inside the card), and mark the chevron as decorative.

* fix(web): use session.user in dashboard avatar fallback

Inside the {% if session.user %} guard, the avatar fallback referenced
(user.name or user.email). If user is None, the block crashes when
the profile picture is absent. Align with the guard variable.

* fix: address Devin review round 3 on PR #28

- app/api/users.py: stop auto-sending email from reset_password. The
  magic-link sender would deliver a "Login Link" that — when clicked —
  consumes the reset_token via verify_magic_link and logs the user in
  WITHOUT prompting for a new password. Admins now share the raw
  reset_token from the API response manually, or use set-password
  directly. email_sent is always False. Documented inline. (Devin 🟡)
- app/api/cli_artifacts.py: harden /cli/install.sh generation against
  shell injection via Host header or AGNES_VERSION. base_url is
  validated against a strict scheme+host+port regex; version against
  an alnum + dot/dash/underscore allowlist. Both values are also
  piped through shlex.quote() as defense in depth. (Devin 🟡)
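
The two-gate pattern for /cli/install.sh (allowlist first, `shlex.quote()` second) might look roughly like this. The regexes and the `render_install_header` name are hypothetical, modelled on the strict scheme+host+port rule described here rather than copied from app/api/cli_artifacts.py:

```python
import re
import shlex

# Hypothetical allowlists: strict scheme+host+port, alnum + . _ - for versions.
_URL_RE = re.compile(r"https?://[A-Za-z0-9.-]+(:[0-9]{1,5})?")
_VERSION_RE = re.compile(r"[A-Za-z0-9._-]+")


def render_install_header(base_url: str, version: str) -> str:
    """Render the variable block at the top of a generated install.sh."""
    # Gate 1: fullmatch requires the whole string to match the allowlist.
    if not _URL_RE.fullmatch(base_url):
        raise ValueError(f"unsafe base_url: {base_url!r}")
    if not _VERSION_RE.fullmatch(version):
        raise ValueError(f"unsafe version: {version!r}")
    # Gate 2: shell-quote anyway, as defense in depth.
    return (
        f"SERVER_URL={shlex.quote(base_url)}\n"
        f"AGNES_VERSION={shlex.quote(version)}\n"
    )


print(render_install_header("https://agnes.example.com", "1.2.3"), end="")
try:
    render_install_header("https://x.example.com;$(id)", "1.2.3")
except ValueError as exc:
    print(exc)
```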

The shared users.reset_token column between magic-link and password-
reset flows (Devin 🚩) remains an architectural gap; splitting into
separate columns needs schema v8 and is tracked for a follow-up PR.

* docs, chore(grpn): manual-deploy helpers + hackathon deploy learnings

Adds scripts/grpn/ — Makefile + agnes-auto-upgrade.sh + README for
operating Agnes on GRPN's existing foundryai-development VM when the
full Terraform flow is blocked by org policies:

- iam.disableServiceAccountKeyCreation (org constraint) forbids SA
  JSON keys, so GCP_SA_KEY-based CI is unavailable
- No projectIamAdmin delegation → bootstrap-gcp.sh can't grant roles
- Secret Manager IAM bindings require setIamPolicy which editor lacks

Helper targets: deploy, deploy-tag, recreate, restart, stop, start,
status, version, logs, ps, env, ssh, tunnel, open, bootstrap-admin,
set-data-source, install-cron, uninstall-cron.

docs/superpowers/plans/2026-04-22-grpn-deploy-learnings.md — running
log of all org-policy constraints hit during the hackathon deploy,
with workarounds and derived follow-ups (WIF support, external_ip
variable, customer onboarding IAM checklist).

Not a replacement for the TF flow — stopgap until WIF lands.

* fix(web): make header logos clickable links to home

* feat(web): one-click "Setup a new Claude Code" button

Adds a single-button flow on the dashboard and /install page that
generates a fresh personal access token via POST /auth/tokens and
copies a complete, paste-ready setup script (server URL, token,
install/verify commands) to the clipboard. Falls back to a modal
textarea when the clipboard is blocked; redirects to /login on 401;
surfaces backend errors inline.

- dashboard.html: replaces the top "Set up your local environment"
  anchor with a real button wired to setupNewClaude(). Removes the
  duplicate bottom setup banner to keep a single entry point.
- install.html: for signed-in users, Step 1 leads with the one-click
  button and demotes the curl one-liner into a collapsible "Or run
  manually" aside. Anonymous visitors still see the curl flow plus a
  sign-in hint.
- No new deps. Vanilla JS. Token lives in memory/clipboard only —
  never rendered into persistent DOM.

* feat(cli): add "da auth import-token" for non-interactive PAT login

Writes a provided JWT into ~/.config/da/token.json using the canonical
{access_token, email, role} shape expected by save_token(). Decodes the
token locally to pull email/role claims, verifies it against the server
via GET /api/catalog/tables, and refuses to overwrite an existing token
file if the server returns 401. --email / --role overrides exist for
tokens missing those claims; --skip-verify bypasses the server round-trip
for offline / CI scenarios.

* test(cli): cover da auth import-token success + 401 + claim-fallback paths

Three new tests in TestAuthImportToken:
- valid JWT + 200 -> canonical token.json written
- 401 from /api/catalog/tables -> exit 1, existing token file untouched
- JWT without email/role claims -> refused without overrides, accepted
  with --email / --role flags

* feat(web): update one-click Claude setup instructions — explicit uv install, import-token, skills question

Replaces the fragile `cat > token.json <<EOF` clipboard payload with an
explicit, auditable sequence:

  1. `curl -fsSL /cli/download` + `uv tool install --force` (no opaque
     `curl | bash`).
  2. `da auth import-token --token ...` instead of hand-written JSON.
  3. Explicit PATH persistence for zsh/bash.
  4. A required question to the user about whether to copy the bundled
     skills into ~/.claude/skills/agnes/ or pull them on-demand via
     `da skills show`.
  5. A final confirmation step with whoami + version output.

Factored both pages to include a shared partial
(app/web/templates/_claude_setup_instructions.jinja) so dashboard.html
and install.html can never drift apart again. {server_url} and {token}
stay as runtime placeholders substituted by renderSetupInstructions().

* feat(ui): modernize /admin/users + unify header nav across pages

- New shared partial app/web/templates/_app_header.html — single source
  of truth for the top navigation. Used by base.html and dashboard.html
  (which doesn't extend base.html). Active page highlighted via
  request.url.path. Admin "Users" link gated by session.user.role.
- style-custom.css: add .app-header / .app-nav-link / .app-btn-logout /
  .app-avatar styles (mirrors dashboard's previous inline copy under
  app-* prefix). Mobile-friendly fallback at <720px.
- base.html: include the new partial so every page extending base
  (admin_users, profile, login_email, error, …) gets the same chrome
  the dashboard has.
- dashboard.html: replace its inline <header class="header"> markup
  with the shared partial. Inline .header CSS left in place as
  harmless dead code (separate cleanup PR).
- admin_users.html: rewritten with avatars, role pills (color-coded
  per role), toggle switch for active, search/filter input, toast
  notifications, modal dialogs replacing alert/confirm/prompt,
  one-click copy for the reset token, empty / loading states.
  All XSS-safe via the existing esc() helper + data-attribute
  event delegation.
- tests/test_web_ui.py: smoke test that /admin/users renders the new
  shared header chrome and the modernized markup.

* feat(api): serve CLI wheel at /cli/agnes.whl for direct uv install

uv tool install inspects the URL path suffix to recognise a wheel, so
/cli/download (which has no .whl suffix) cannot be installed directly.
Expose a stable /cli/agnes.whl alias over the same wheel lookup so users
can run: uv tool install --force https://<server>/cli/agnes.whl

* test(cli): cover da auth import-token --server persisting to config.yaml

The server persistence was already implemented in the import-token command
(save_config({server}) call) but not covered by tests. Add an explicit test
so the one-step setup contract — single import-token call writes both token
and server — cannot regress.

* feat(web): simpler Claude setup — single uv install URL, single import-token call

User feedback: the prior clipboard payload repeated the server URL and
token across multiple steps (curl + tmpfile + install + rm + separate
seed-config + import-token). Collapse to:

 1. uv tool install --force {server_url}/cli/agnes.whl  (single URL, direct)
 2. da auth import-token --token ... --server ...        (one call, persists both)
 3. da auth whoami
 4. skills (ask user first)
 5. confirm

uv accepts HTTPS URLs that end in .whl and installs them directly, so
the tmpfile dance is unnecessary. import-token --server already persists
the server to config.yaml, so no separate printf > config.yaml step.

* fix(tests): update admin users heading assertion after template rename

The admin_users.html template now uses <h2 class="users-title">Users</h2>
instead of <h2>User management</h2>. Update the assertion to match.

* feat(ui): unify header across remaining 7 standalone pages

These 7 pages render their own full <html> and don't extend base.html,
so the previous unification commit only covered base + dashboard. Each
had its own ad-hoc <header> markup with inconsistent classes
(.top-header / .header / .page-header), inconsistent nav-link sets,
and inconsistent avatar/email styling.

Replace each inline <header>...</header> block with the shared
{% include '_app_header.html' %} so /activity-center, /admin/permissions,
/admin/tables, /catalog, /corporate-memory, /corporate-memory/admin,
and /install all show the same chrome (Dashboard / Install CLI /
Profile / Users / email + avatar / Logout) with the active page
highlighted via request.url.path.

Old inline header CSS (.header, .top-header, .page-header, .nav-link,
etc.) is left in place as harmless dead code; it can be cleaned up in
a follow-up sweep.

* feat(web): add readable preview of Claude setup payload on dashboard + /install

Move the line-by-line setup instructions into app/web/setup_instructions.py
as the single source of truth, then render them in two modes from the
existing _claude_setup_instructions.jinja partial:

- preview_mode=True  → visible, read-only <pre><code> block with the real
  server URL and a clearly-styled placeholder token (never a real one).
- preview_mode=False → the JS SETUP_INSTRUCTIONS_TEMPLATE used by the
  one-click flow (unchanged behaviour).

Both /dashboard (env-setup-cta card) and /install (Step 1 card) now show
the preview directly under the 'Setup a new Claude Code' button so users
can see exactly what will land in their clipboard before they click.

* feat(web): update setup instructions — `da diagnose` step, explicit section titles

Rework the Claude Code setup payload to:

- Give every numbered step an unambiguous verb header ("1) Install the CLI",
  "2) Log in", "3) Verify the login", "4) Run diagnostics", "5) Skills (ask
  the user first)", "6) Confirm").
- Add step 4 `da diagnose` as the post-login health check. The CLI already
  ships this command (cli/commands/diagnose.py); it prints "Overall:
  healthy" and a list of green checks that map cleanly to next actions.
- Ask the skills copy-vs-on-demand question verbatim so Claude Code always
  prompts the user the same way.
- Replace the terse "Confirm" line with a 4-bullet summary (version,
  whoami, skills choice, diagnose status) so the return message is
  structured and comparable across setups.

* chore(web): remove stale MCP card from /install (no MCP server today)

The 'Use with Claude Code / MCP' card (Step 3 on /install) referenced an
MCP integration Agnes does not ship. Remove the whole card. The one-click
'Setup a new Claude Code' flow in Step 1 already covers the long-lived
client use case and is less confusing than dangling persistence tips for
a non-existent integration.

* feat(api): include user_email + last_used_ip + user_id in admin tokens list response

Adds AdminTokenItem response model (superset of TokenListItem) and
AccessTokenRepository.list_all_with_user() joining personal_access_tokens
with users to denormalize user_email. Needed for /admin/tokens UI where
admins triage tokens across all users.

* feat(web): /admin/tokens page — list, filter, search, revoke across all users

Adds a new admin-only page with client-side filtering (status, user email,
last-used window), column sorting, counts bar (active/revoked/expired),
and an inline revoke action. Mirrors the /admin/users visual language.

* feat(web): add Tokens nav link for admins + deep-link from admin/users row

Admin-only nav entry to /admin/tokens, and a per-row Tokens button on
/admin/users that prefills the token page's user filter via ?user=<email>.

* test(admin): cover /admin/tokens rendering, filter state, non-admin denial, revoke

Verifies admin can render the page (title + JS hooks present), a non-admin
is blocked, unauthenticated users are redirected, the admin list response
includes user_email / user_id / last_used_ip, and admin can revoke another
user's token.

* feat(web): modern redesign of /admin/tokens — hero, stat strip, refined table, responsive cards, a11y

* feat(web): ditch the table — /admin/tokens as a card stack, modern GitHub-style list

Replaces the table-based layout with a stack of self-contained token cards
inside a <ul role=list>. Each card is a flex row: avatar + name/meta on the
left, last-used block in the middle, status pill + outlined 'Revoke' button
on the right. Status and sort controls are pill-shaped toggle chips; user
email search has an inline search icon. No <table>/<tr>/<th>/<td> anywhere.
Responsive below 720px (card stacks vertically) and 480px (stat chips 2x2).
Preserves filter IDs (flt-status, flt-user, flt-last-used) and data-revoke
for existing tests.

* feat(web): add /tokens (role-aware) — single page for both user PAT CRUD and admin overview

- Rename admin_tokens.html -> tokens.html with a new is_admin context flag.
- New route GET /tokens: renders the same card-stack UI for everyone.
  * Admins: loads /auth/admin/tokens, shows owner column + stat strip, keeps
    the owner-email search box and sort-by-owner chip.
  * Non-admins: loads /auth/tokens (own tokens only), hides owner column +
    stat chips, adds a 'New token' CTA in the hero that opens a modal
    (name + expires_in_days) calling POST /auth/tokens. The raw token is
    revealed once in a dismissable banner and cleared from the DOM on Hide.
- GET /admin/tokens now 302-redirects to /tokens, preserving query string
  (so the /admin/users deep-link ?user=foo still works).
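
Preserving the query string on the /admin/tokens redirect amounts to copying it onto the new path. A minimal sketch with a hypothetical helper name; the real route builds its 302 response from the incoming request:

```python
from urllib.parse import urlsplit


def admin_tokens_redirect(request_url: str) -> str:
    """Map /admin/tokens?... to /tokens?... keeping the query string intact."""
    query = urlsplit(request_url).query
    return "/tokens" + (f"?{query}" if query else "")


print(admin_tokens_redirect("https://agnes.example.com/admin/tokens?user=ana%40example.com"))
# /tokens?user=ana%40example.com
```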

* feat(web): /tokens full-bleed layout to match dashboard width

The hero, toolbar, and card list used to sit inside base.html's .container
(max-width 800px). Break out with negative horizontal margins so the page
spans the viewport like /dashboard does, capped at 1440px for readability
on very wide screens with a 24px gutter on each side.

- No change to base.html itself. The override is scoped to .tokens-page.
- body { overflow-x: hidden; } guards against rare horizontal scrollbars.
- < 808px viewport: reset to natural flow (mobile already narrower).
- ≥ 1488px viewport: cap to 1440px and re-center.

* chore(web): remove /profile template + nav link (redirect /profile -> /tokens)

The old /profile PAT CRUD page is now redundant — the modern /tokens page
covers both user and admin flows. Delete the template; the router's
/profile handler already 302-redirects to /tokens.

Nav cleanup:
- Remove the 'Profile' link.
- Show a single 'Tokens' link to every signed-in user (previously only
  admins saw it).
- Active-state matches /tokens, /admin/tokens, and /profile so the
  highlight survives the redirect chain.

/install CTA now points at /tokens instead of /profile.

* test: cover /tokens for admin + non-admin flows, /profile redirect, nav update

tests/test_admin_tokens_ui.py
- Point admin rendering test at /tokens directly and tighten assertions
  (admin-only stat strip + owner search, non-admin CTA absent).
- Add test_non_admin_can_render_tokens_page: personal body, New-token CTA,
  create-modal, reveal banner; stat strip + owner search absent.
- Add test_admin_tokens_redirects_to_tokens: 302 to /tokens, query string
  (?user=...) preserved for the /admin/users deep-link.
- Add test_profile_redirects_to_tokens: 302 to /tokens.
- Add test_non_admin_can_create_pat_via_tokens_page_api: exercises the
  POST /auth/tokens call that the non-admin create-modal submits.

tests/test_pat.py
- test_profile_page_renders -> test_profile_page_redirects_to_tokens:
  assert the 302 + that /tokens lands on the unified non-admin body.

tests/test_web_ui.py
- admin_users nav assertion: 'Tokens' link present, 'Profile' link absent.
- Add test_nav_shows_tokens_link_for_non_admin: non-admins see the same
  'Tokens' link (previously only admins did).
- Add test_profile_redirects_to_tokens back-compat check.

* feat(web): collapse 'What Claude Code will receive' by default

The preview block on /dashboard and /install now uses <details>/<summary>
so it is hidden by default. Click the chevron/title to expand and review
the clipboard payload. Markup stays in the DOM so existing tests that
assert on content continue to pass.

* fix(web): /tokens width — override .container to 1280px like dashboard

The negative-margin full-bleed trick was fragile and pushed content past
the right edge on deployed viewports. Replace with a simple max-width
override of base.html's .container on this page only, matching
/dashboard's 1280px center-column layout.

* feat(web): split role-aware /tokens into my_tokens.html + admin_tokens.html

* feat(web): router — separate handlers for /tokens (own) and /admin/tokens (all)

* feat(web): nav — show Tokens for all, add All tokens for admins

* test: cover split token pages (own vs all) + admin access gating

* feat(web): move 'My tokens' into a user dropdown menu

Replaces the separate Tokens/email/Logout nav trio with a rounded
avatar trigger that opens a dropdown containing the user's email,
role, a 'My tokens' link, and Logout. Admin-only 'All tokens' stays
as a top-level nav item since it's an admin function, not a personal
one. Click-outside and Escape close the panel; chevron rotates on
open.

* fix(api): allow PATs to list/get/revoke their own tokens (CLI flow)

The documented 'da auth token list/revoke' CLI flow in
docs/HEADLESS_USAGE.md uses a PAT, but the previous dependency
(require_session_token) returned 403. Only create_token must be
session-only to prevent PAT-spawning-PAT chains; listing and
revoking your own tokens is safe with a PAT.
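
The split between a session-only create and PAT-friendly list/revoke could be sketched as follows. `Principal`, `Forbidden`, and both functions are hypothetical, and the admin-revokes-anyone path is left out for brevity:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    user_id: str
    typ: str  # "session" or "pat", taken from the JWT typ claim


class Forbidden(PermissionError):
    """Hypothetical stand-in for an HTTP 403 response."""


def require_session(principal: Principal) -> Principal:
    # Only token creation needs this guard: a PAT must not mint more PATs.
    if principal.typ != "session":
        raise Forbidden("creating a token requires a session token, not a PAT")
    return principal


def can_manage_token(principal: Principal, token_owner_id: str) -> bool:
    # list/get/revoke: any authenticated caller (session or PAT), own tokens only.
    return principal.user_id == token_owner_id


pat = Principal(user_id="u1", typ="pat")
print(can_manage_token(pat, "u1"))  # True
try:
    require_session(pat)
except Forbidden as exc:
    print("403:", exc)
```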

* fix(api): cap expires_in_days at 3650 to avoid datetime overflow (500 to 400)

Values above roughly 2.9 million days overflowed datetime.max (year 9999)
in datetime.now(utc) + timedelta(days=...) and surfaced as an
unhandled OverflowError → 500. Cap at 10 years with a clear
400 instead; the no-expiry code path is unaffected.
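
The overflow and the cap can be reproduced with the standard library. `checked_expiry` is a hypothetical helper, and mapping its `ValueError` to HTTP 400 is assumed to happen in the API layer:

```python
from datetime import datetime, timedelta, timezone

MAX_EXPIRES_IN_DAYS = 3650  # the 10-year cap


def overflows(days: int) -> bool:
    """True if now + days would pass datetime.max (year 9999)."""
    try:
        datetime.now(timezone.utc) + timedelta(days=days)
    except OverflowError:
        return True
    return False


def checked_expiry(days: int) -> datetime:
    # Reject out-of-range values before any datetime arithmetic can overflow.
    if not 0 < days <= MAX_EXPIRES_IN_DAYS:
        raise ValueError(f"expires_in_days must be in 1..{MAX_EXPIRES_IN_DAYS}")
    return datetime.now(timezone.utc) + timedelta(days=days)


print(overflows(MAX_EXPIRES_IN_DAYS))  # False
print(overflows(3_000_000))            # True: past datetime.max
```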

* fix(api): relax _SAFE_URL_RE to allow path prefixes, underscores, and IPv6

The previous regex rejected legitimate reverse-proxy base_url values
(https://host/agnes/), underscores in Docker Compose hostnames, and
IPv6 literals (http://[::1]:8000). Widen the charset and allow an
optional trailing path. shlex.quote continues to provide
defense-in-depth against any metacharacter that slips through.

* fix(web): /login/email and Google OAuth propagate next_path

Previously, /login/email silently dropped the ?next=<path> query
param so the hidden form field rendered empty and login always
landed on /dashboard. Google's button was hard-coded to
/auth/google/login, ignoring next entirely.

- /login page now appends ?next to the Google button URL
- /login/email reads + sanitizes next, passes as template context
- google_login stashes sanitized next_path in session['login_next']
- google_callback pops + re-sanitizes and redirects there

Sanitization factored into app/auth/_common.safe_next_path.
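
An open-redirect-safe `next` sanitizer generally accepts only same-origin relative paths. The version below is a minimal sketch of that idea, not the actual `safe_next_path`; the `/dashboard` default is assumed from the behaviour described above:

```python
from typing import Optional
from urllib.parse import urlsplit


def safe_next_path(value: Optional[str], default: str = "/dashboard") -> str:
    """Return value only if it is a same-origin relative path, else default."""
    if not value or not value.startswith("/"):
        return default
    # Browsers treat "//evil.com" and "/\evil.com" as links to another host.
    if value.startswith("//") or value.startswith("/\\"):
        return default
    parts = urlsplit(value)
    if parts.scheme or parts.netloc:
        return default
    if "\r" in value or "\n" in value:  # no header-splitting characters
        return default
    return value


print(safe_next_path("/catalog?table=orders"))  # /catalog?table=orders
print(safe_next_path("https://evil.example"))   # /dashboard
```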

* fix(auth): differentiate argon2 VerifyMismatchError from internal errors in web login

The previous except (VerifyMismatchError, Exception) collapsed both
cases into the generic 'invalid credentials' redirect, silently
hiding corrupted-hash / library errors from ops. Split the two:
bad password still gets ?error=invalid; anything else logs via
logger.exception and redirects with ?err=auth_internal so ops have
a visible signal and users don't retry forever against a broken
password_hash column.

* docs: correct CLAUDE.md table name (personal_access_tokens)

v7 note referenced 'access_tokens.last_used_ip', but the real table
is personal_access_tokens (as named a few words earlier in the
same bullet). Same-file consistency fix.

* chore(web): clarify admin user-reset UI — encourage Set password over the unused reset_token

POST /api/users/{id}/reset-password stores and returns a token,
but no endpoint consumes it; the magic-link sender would log the
user in without prompting for a new password, defeating the reset.
- Drop the 'Reset' row action from admin_users so admins aren't
  pointed at a dead end.
- Rewrite the reveal-modal copy to tell admins to use Set password
  and explicitly note that the magic-link flow isn't available
  for reset tokens in this build.
The API endpoint stays for API-level future use.

* test: cover PAT CLI flow, expires_in_days overflow, proxy base_url, next propagation

- tests/test_pat.py: PAT can list own tokens (200, was 403);
  PAT can revoke own tokens (204); create_token returns 400 for
  expires_in_days > 3650 (was 500 via datetime overflow).
- tests/test_cli_artifacts.py: _SAFE_URL_RE accepts reverse-proxy
  path prefixes, underscores, and IPv6 literals; end-to-end check
  of cli_install_script with a stubbed base_url that includes
  a path prefix (Agnes behind /agnes/).
- tests/test_web_ui.py: /login propagates ?next to the Google
  button URL; /login/email renders next in the hidden form field
  and strips hostile values; unit coverage of safe_next_path.

* fix(security): use \Z instead of $ in URL/version allowlists (trailing-\n bypass)

Python regex `$` also matches just before a trailing newline, so a Host
header or AGNES_VERSION value ending in one, such as "good.example.com\n",
would slip past the allowlist. `\Z` anchors to strict end-of-string.

shlex.quote downstream remains as defense-in-depth, but the allowlist
is now the tight gate it claims to be.
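
The difference between the two anchors is easy to demonstrate with the standard `re` module (non-MULTILINE mode); the hostname pattern here is illustrative only:

```python
import re

dollar = re.compile(r"^[A-Za-z0-9.-]+$")
strict = re.compile(r"^[A-Za-z0-9.-]+\Z")

value = "good.example.com\n"
print(bool(dollar.match(value)))  # True: $ matches just before the final \n
print(bool(strict.match(value)))  # False: \Z matches only at the very end
```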

* fix(auth): PAT with null expiry omits JWT exp claim (DB is the source of truth)

Previously a PAT created with `expires_in_days=null` (user-requested
"never expires") set the DB `expires_at` to NULL (correct) but still
baked a ~100y `exp` claim into the JWT. That is misleading: the PAT
silently did expire eventually, despite the UI and API promising
"no expiry".

`create_access_token` now accepts `omit_exp=True` to skip the `exp`
claim entirely. `app/api/tokens.py` passes that when `expires_in_days
is None`. The authoritative expiry check lives in
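`app/auth/dependencies.py`, which reads `expires_at` from the DB row; that
check is unchanged. PyJWT validates `exp` only when the claim is present
(unless a caller explicitly requires it), so such tokens never expire at
the JWT layer.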
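
Building the claim set conditionally is all `omit_exp` needs to do. The sketch below is hypothetical (it is not the real `create_access_token`) and stops before signing; with PyJWT the result would then go through `jwt.encode(claims, secret, algorithm="HS256")`:

```python
import uuid
from typing import Optional


def build_pat_claims(user_id: str, expires_at: Optional[float]) -> dict:
    """Claims for a PAT; no exp claim when the DB row has expires_at = NULL."""
    claims = {"sub": user_id, "typ": "pat", "jti": uuid.uuid4().hex}
    if expires_at is not None:  # bounded token: mirror the DB expiry
        claims["exp"] = int(expires_at)
    return claims


print("exp" in build_pat_claims("42", None))           # False
print("exp" in build_pat_claims("42", 2_000_000_000))  # True
```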

* test: cover trailing-newline regex bypass + no-exp JWT for unbounded PAT

- test_safe_url_re_rejects_trailing_newline_bypass: asserts both
  `_SAFE_URL_RE` and `_SAFE_VERSION_RE` reject values with a trailing
  `\n` (previously accepted because Python `$` matches before `\n`).
- test_pat_null_expiry_jwt_has_no_exp_claim: POST /auth/tokens with
  `expires_in_days=null`, decode the returned JWT, assert `exp` is
  absent while `typ=pat`, `sub`, and `jti` are still present.
- test_pat_with_null_expiry_is_accepted_by_verify_token: verify_token
  round-trips a claim-less JWT without ExpiredSignatureError.
- test_pat_null_expiry_end_to_end_allows_authenticated_request: use
  the null-expiry PAT against /auth/tokens and confirm it authenticates.

* docs(auth): document X-Forwarded-For trust model in _client_ip

Deployment runs behind Caddy which strips incoming X-Forwarded-For
and sets its own, so the leftmost hop is trustworthy. Clarify that
the stored last_used_ip is audit-only and never used for access
control — if the app is ever exposed directly, this value becomes
client-settable.

* docs: /profile → /tokens in install.sh next-steps, CLI error, HEADLESS_USAGE, security skill

After splitting PAT management to /tokens (with /profile as a back-compat
302), stale references remained in user-facing text. Update them to the
canonical /tokens URL so shell scripts, CLI error hints, docs, and the
bundled security skill are all consistent.

2026-04-22 14:24:28 +02:00

Hackathon E2E Dry-Run Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Validate the full developer→dev-VM→merge→prod flow end-to-end the day before a multi-developer hackathon, so any broken link is found and fixed before participants arrive.

Architecture: This is an operational dry-run, not a code feature. The executing agent pushes a throwaway feature branch to the public repo, verifies that CI produces a per-branch Docker image tag on GHCR, switches the shared agnes-dev VM onto that tag via the existing auto-upgrade cron, verifies that the CI test gate blocks a deliberately broken PR from reaching :stable, and produces a helper script and report. The plan is strictly non-destructive for prod: prod-pinning (point 6 of the original outline) is explicitly out of scope and left to the user.

Tech Stack: Bash / gcloud / gh / git / docker / curl / Python (pytest) / Terraform (plan only, no apply). No app code changes.

Out of Scope (do NOT do)

- Any terraform apply against real infrastructure. TF plan is allowed; TF apply is forbidden.
- Pinning prod_instance.image_tag in agnes-infra-keboola. The user will do this after the dry-run succeeds.
- Rotating admin passwords, Keboola tokens, or JWT secrets.
- Modifying the main branch of any repo. All changes happen on throwaway branches, which are deleted at the end. The one exception is the Task 7 deliverables branch, which stays open as a PR for review.
- Creating new GCP resources (VMs, disks, IPs, secrets, SAs).

If any step would require doing one of the above, STOP and ask the user.

Prerequisites

Before starting, the executing agent MUST verify all of the following. If any fails, abort and report which prerequisite is missing — do NOT try to fix it.

Working directory is the tmp_oss checkout at /Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss. Current branch can be anything; the plan will create a new branch.
gh auth status shows authenticated, with workflow scope. Run:
```
gh auth status 2>&1 | grep -E "(Logged in|Token scopes)"
```
Expected: line containing Logged in to github.com and a line listing scopes that include workflow. If workflow scope is missing, abort with message: Run: gh auth refresh -h github.com -s workflow.
gcloud authenticated to project kids-ai-data-analysis. Run:
```
gcloud config get-value project
gcloud auth list --filter=status:ACTIVE --format="value(account)"
```
Expected: project is kids-ai-data-analysis, at least one active account. If not, abort with message: Run: gcloud config set project kids-ai-data-analysis && gcloud auth login.
SSH to agnes-dev works (OS Login). Run:
```
gcloud compute ssh agnes-dev --zone=europe-west1-b --command="echo ok" --quiet
```
Expected: output contains ok. The first connection may take ~20s while OS Login provisions. If it fails with a permission error, abort with message: User needs compute.osLogin role on agnes-dev VM.
docker CLI available locally (for docker manifest inspect). Run: docker --version. Expected: version output. If missing, abort.
Public GHCR pull works. Run:
```
docker manifest inspect ghcr.io/keboola/agnes-the-ai-analyst:stable > /dev/null && echo ok
```
Expected: ok. If fails, abort — something is wrong with public image visibility.
Clone of agnes-infra-keboola exists or can be cloned at /tmp/agnes-infra-keboola. Run:
```
if [ ! -d /tmp/agnes-infra-keboola ]; then
  gh repo clone keboola/agnes-infra-keboola /tmp/agnes-infra-keboola
fi
cd /tmp/agnes-infra-keboola && git status --short
```
Expected: clone succeeds, git status is clean. If clone fails, skip Task 4 (TF plan verification) and note it in the final report.
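The checks above do not cover two tools that later steps rely on: jq (used from Step 1.6 onward) and terraform (Task 4). Below is a minimal sketch of an extra preflight that reports every missing CLI at once. The function name require_tools is hypothetical and is not part of either repo.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper: check each named CLI with `command -v`
# and report all missing ones together instead of failing one by one.
require_tools() {
  local missing=() tool
  for tool in "$@"; do
    command -v "$tool" > /dev/null 2>&1 || missing+=("$tool")
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "missing tools: ${missing[*]}" >&2
    return 1
  fi
}
```

Usage: `require_tools gh gcloud git docker curl jq python3 terraform || echo "abort: install the missing tools"`. If terraform is the only tool missing, it seems reasonable to treat that like a failed clone and skip Task 4.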

Gate: All 7 prerequisite checks pass, OR the agent has clearly reported which ones failed and reduced scope accordingly. Only then proceed to Task 1.

Task 1: Baseline Snapshot

Purpose: Record the current state of both VMs (health, running image, and the dev VM's AGNES_TAG) so the agent can detect drift at the end and prove it left everything as it found it.

Files:

- Create: /tmp/dryrun-baseline/prod-health.json
- Create: /tmp/dryrun-baseline/dev-health.json
- Create: /tmp/dryrun-baseline/prod-image.txt
- Create: /tmp/dryrun-baseline/dev-image.txt
- Create: /tmp/dryrun-baseline/dev-env.txt

Step 1.1: Create baseline directory
```
mkdir -p /tmp/dryrun-baseline
```
Step 1.2: Capture prod health
```
curl -sf --max-time 10 http://34.77.102.61:8000/api/health > /tmp/dryrun-baseline/prod-health.json
cat /tmp/dryrun-baseline/prod-health.json | python3 -m json.tool
```
Expected: JSON with "status" field equal to "healthy" or "degraded". If "unhealthy" or curl times out, abort with message: Prod is not in acceptable baseline state — investigate before dry-run.

Step 1.3: Capture dev health

```
curl -sf --max-time 10 http://34.77.94.14:8000/api/health > /tmp/dryrun-baseline/dev-health.json
cat /tmp/dryrun-baseline/dev-health.json | python3 -m json.tool
```

Expected: JSON with "status" in {healthy, degraded}. Same abort condition as 1.2.

Step 1.4: Capture current image tags on both VMs

```
gcloud compute ssh agnes-prod --zone=europe-west1-b --quiet --command \
  "docker inspect \$(docker ps -qf name=app) --format '{{.Config.Image}}'" \
  > /tmp/dryrun-baseline/prod-image.txt
gcloud compute ssh agnes-dev --zone=europe-west1-b --quiet --command \
  "docker inspect \$(docker ps -qf name=app) --format '{{.Config.Image}}'" \
  > /tmp/dryrun-baseline/dev-image.txt
cat /tmp/dryrun-baseline/prod-image.txt /tmp/dryrun-baseline/dev-image.txt
```

Expected: each file contains exactly one line like ghcr.io/keboola/agnes-the-ai-analyst:stable or :stable-2026.04.XX. Non-empty.

Step 1.5: Capture agnes-dev .env AGNES_TAG line

```
gcloud compute ssh agnes-dev --zone=europe-west1-b --quiet --command \
  "sudo grep -E '^AGNES_TAG=' /data/.env || echo 'AGNES_TAG_NOT_SET'" \
  > /tmp/dryrun-baseline/dev-env.txt
cat /tmp/dryrun-baseline/dev-env.txt
```

Expected: output is AGNES_TAG=dev or similar. Record exact value for restoration in Task 6. If AGNES_TAG_NOT_SET, abort — the VM is in an unknown config state.

Step 1.6: Record baseline to report buffer

Start the running report at /tmp/dryrun-report.md. This step writes it with > so that a report left over from an earlier run is overwritten. Every later task appends with >>.

```
cat > /tmp/dryrun-report.md <<EOF
# Hackathon Dry-Run Report

**Run at:** $(date -u +"%Y-%m-%dT%H:%M:%SZ")

## Baseline (Task 1)

- Prod health status: $(jq -r '.status' /tmp/dryrun-baseline/prod-health.json)
- Dev health status: $(jq -r '.status' /tmp/dryrun-baseline/dev-health.json)
- Prod image: $(cat /tmp/dryrun-baseline/prod-image.txt)
- Dev image: $(cat /tmp/dryrun-baseline/dev-image.txt)
- Dev AGNES_TAG: $(cat /tmp/dryrun-baseline/dev-env.txt)

EOF
cat /tmp/dryrun-report.md
```

Expected: report file exists, all fields populated (no empty values).

Task 1 gate: baseline directory has 5 non-empty files, report has 5 non-empty bullet lines. Proceed.
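The Task 1 gate can be checked mechanically rather than by eye. The sketch below assumes the five file names from the Files list above and the "- Label: value" bullet shape of the report. The name check_baseline is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical Task 1 gate check: all five baseline files must exist and be non-empty,
# and no report bullet may have an empty value (e.g. "- Dev image: ").
check_baseline() {
  local dir="${1:-/tmp/dryrun-baseline}" report="${2:-/tmp/dryrun-report.md}" f rc=0
  for f in prod-health.json dev-health.json prod-image.txt dev-image.txt dev-env.txt; do
    if [ ! -s "$dir/$f" ]; then
      echo "empty or missing: $dir/$f" >&2
      rc=1
    fi
  done
  if [ ! -s "$report" ]; then
    echo "report missing: $report" >&2
    rc=1
  elif grep -qE '^- [^:]+:[[:space:]]*$' "$report"; then
    # Nothing after the colon means a $(...) capture produced no output
    echo "report has an empty field" >&2
    rc=1
  fi
  return "$rc"
}
```

Usage: `check_baseline || echo "Task 1 gate not met"`. With no arguments, it checks the default paths used throughout this plan.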

Task 2: Verify Per-Branch GHCR Build

Purpose: Push a throwaway feature branch to the public repo, wait for the release workflow, and confirm that the per-branch :dev-<slug> tag appears on GHCR.

Files:

Create (throwaway): branch feature/hack-dryrun-<timestamp> in tmp_oss + one trivial commit touching docs/QUICKSTART.md

Branch naming: the agent MUST use feature/hack-dryrun-<epoch> (e.g. feature/hack-dryrun-1745254321) so the slug is unique per run and cleanup is deterministic.

Step 2.1: Compute branch name and expected slug

Per .github/workflows/release.yml:92-98 logic: strip feature/ prefix, sanitise [^a-zA-Z0-9-] to -, lowercase, cut 50 chars.

```
EPOCH=$(date +%s)
BRANCH="feature/hack-dryrun-${EPOCH}"
SLUG=$(echo "$BRANCH" | sed 's|^feature/||' | sed 's|[^a-zA-Z0-9-]|-|g' | tr '[:upper:]' '[:lower:]' | cut -c1-50)
echo "BRANCH=$BRANCH"
echo "SLUG=$SLUG"
echo "EXPECTED_TAG=ghcr.io/keboola/agnes-the-ai-analyst:dev-$SLUG"
# Persist for later steps
echo "$BRANCH" > /tmp/dryrun-baseline/branch-name.txt
echo "$SLUG" > /tmp/dryrun-baseline/slug.txt
```

Expected: BRANCH like feature/hack-dryrun-1745254321, SLUG like hack-dryrun-1745254321. Persisted.
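Hackathon participants will need the same slug to call scripts/switch-dev-vm.sh, so it may help to keep the Step 2.1 pipeline as a function. The sketch below repeats that pipeline verbatim. It is only as faithful to release.yml as Step 2.1 itself, which has not been re-checked here, and the name branch_slug is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring Step 2.1: strip "feature/", map every character
# outside [a-zA-Z0-9-] to "-", lowercase, keep at most 50 characters.
branch_slug() {
  printf '%s\n' "$1" \
    | sed 's|^feature/||' \
    | sed 's|[^a-zA-Z0-9-]|-|g' \
    | tr '[:upper:]' '[:lower:]' \
    | cut -c1-50
}
```

Usage: `branch_slug "$(git rev-parse --abbrev-ref HEAD)"`. For example, `branch_slug feature/hack-zs-metrics` prints `hack-zs-metrics`.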

Step 2.2: Create branch with trivial commit

```
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
# Save current branch so we can return
git rev-parse --abbrev-ref HEAD > /tmp/dryrun-baseline/starting-branch.txt
BRANCH=$(cat /tmp/dryrun-baseline/branch-name.txt)
git checkout -b "$BRANCH"
echo "<!-- dryrun $(date -u +%FT%TZ) -->" >> docs/QUICKSTART.md
git add docs/QUICKSTART.md
git commit -m "dryrun: verify per-branch GHCR tag"
git push -u origin "$BRANCH"
```

Expected: branch created, one commit, push succeeds with upstream tracking. If push is rejected (e.g. protection), abort.

Step 2.3: Wait for release workflow to complete

```
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
BRANCH=$(cat /tmp/dryrun-baseline/branch-name.txt)
# Get the most recent run id for this branch + workflow
sleep 10  # give GH a moment to register the run
RUN_ID=$(gh run list --branch "$BRANCH" --workflow release.yml --limit 1 --json databaseId --jq '.[0].databaseId')
# An empty or "null" RUN_ID means the run is not registered yet: wait and repeat the line above
[ -n "$RUN_ID" ] && [ "$RUN_ID" != "null" ] || echo "No release.yml run for $BRANCH yet; retry before watching"
echo "Watching run $RUN_ID"
gh run watch "$RUN_ID" --exit-status --interval 15
RC=$?
echo "Workflow exit: $RC"
```

Expected: exit status 0 after ~3-5 min. If exit != 0, print the logs:

```
gh run view "$RUN_ID" --log-failed | tail -100
```

and abort with message: Release workflow failed for throwaway branch — investigate before hackathon.
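The fixed `sleep 10` above (and `sleep 15` in Step 5.3) can race GitHub. If the run has not been registered yet, RUN_ID comes back empty or null, and gh run watch has nothing to watch. The sketch below polls instead of sleeping once. It uses only the gh run list flags the plan already uses. The name find_run_id and its retry defaults (12 tries, 5s apart) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until GitHub has registered a run of WORKFLOW for BRANCH,
# then print its database ID. Gives up after TRIES attempts, DELAY seconds apart.
find_run_id() {
  local branch="$1" workflow="$2" tries="${3:-12}" delay="${4:-5}" i id
  for ((i = 1; i <= tries; i++)); do
    # "// empty" turns the null from an empty run list into no output at all
    id=$(gh run list --branch "$branch" --workflow "$workflow" --limit 1 \
      --json databaseId --jq '.[0].databaseId // empty' 2>/dev/null)
    if [ -n "$id" ]; then
      echo "$id"
      return 0
    fi
    sleep "$delay"
  done
  echo "no $workflow run registered for $branch after $tries attempts" >&2
  return 1
}
```

Usage: `RUN_ID=$(find_run_id "$BRANCH" release.yml) && gh run watch "$RUN_ID" --exit-status --interval 15`. Like the original line, it takes the most recent run on the branch, which is unambiguous here because every dry-run branch name contains a fresh epoch.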

Step 2.4: Verify per-branch tag exists on GHCR

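```
SLUG=$(cat /tmp/dryrun-baseline/slug.txt)
EXPECTED="ghcr.io/keboola/agnes-the-ai-analyst:dev-$SLUG"
docker manifest inspect "$EXPECTED" > /tmp/dryrun-baseline/ghcr-manifest.json
DIGEST=$(jq -r '.config.digest // .manifests[0].digest' /tmp/dryrun-baseline/ghcr-manifest.json)
echo "Tag exists: $EXPECTED"
echo "Digest: $DIGEST"
echo "$DIGEST" > /tmp/dryrun-baseline/expected-digest.txt
```

Expected: docker manifest inspect returns JSON (exit 0), a non-empty digest is extracted. If the tag is missing, abort with message: release.yml did not produce :dev-<slug> tag — check build-and-push step logs.

A note on the digest. For a plain single-platform manifest, .config.digest is the image config digest, which equals the image ID that docker inspect --format '{{.Image}}' prints for the running container. For a multi-platform image or an OCI index (buildx produces one when it attaches attestations), .manifests[0].digest is the first listed platform manifest, which need not show up on the VM at all. Treat the digest as a secondary signal; the :dev-<slug> tag match in Step 3.4 is the reliable one.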

Step 2.5: Record Task 2 result

```
SLUG=$(cat /tmp/dryrun-baseline/slug.txt)
cat >> /tmp/dryrun-report.md <<EOF
## Task 2: Per-Branch GHCR Build — PASS

- Branch: $(cat /tmp/dryrun-baseline/branch-name.txt)
- Slug: $SLUG
- Tag: ghcr.io/keboola/agnes-the-ai-analyst:dev-$SLUG
- Digest: $(cat /tmp/dryrun-baseline/expected-digest.txt)

EOF
```

Task 2 gate: :dev-<slug> manifest exists. Proceed.

Task 3: Dev VM Switch Flow

Purpose: Simulate the hackathon developer path — have the shared agnes-dev VM pick up the per-branch image via the existing auto-upgrade cron, verify the new image is running, then (in Task 6) roll back.

Files touched (reversibly):

/data/.env on agnes-dev VM — one-line AGNES_TAG= change (rollback is captured in baseline from Step 1.5)

Step 3.1: Switch agnes-dev .env AGNES_TAG to the per-branch tag

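```
SLUG=$(cat /tmp/dryrun-baseline/slug.txt)
NEW_TAG="dev-$SLUG"
# $NEW_TAG expands locally; the grep -q guard stops the chain if no AGNES_TAG= line exists
gcloud compute ssh agnes-dev --zone=europe-west1-b --quiet --command "\
  sudo grep -q '^AGNES_TAG=' /data/.env && \
  sudo cp /data/.env /data/.env.dryrun-bak && \
  sudo sed -i 's|^AGNES_TAG=.*|AGNES_TAG=$NEW_TAG|' /data/.env && \
  sudo grep -E '^AGNES_TAG=' /data/.env"
```

Expected: the final line is AGNES_TAG=dev-<slug>. sed -i exits 0 even when its pattern matches nothing, which is why the grep -q guard comes first. If nothing is printed, there was no AGNES_TAG= line and the chain stopped before sed ran: abort and investigate manually.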

Step 3.2: Trigger auto-upgrade cron script immediately
```
gcloud compute ssh agnes-dev --zone=europe-west1-b --quiet --command \
  "sudo /usr/local/bin/agnes-auto-upgrade.sh 2>&1 | tail -30"
```
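Expected: output shows docker compose pull + docker compose up -d activity. If the script doesn't exist or errors, abort with message: auto-upgrade script missing or broken on agnes-dev. Because the output is piped through tail, the command's exit status is tail's rather than the script's, so judge success from the printed output.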

Step 3.3: Wait for app container to become healthy

```
# Poll /api/health for up to ~90s (30 attempts, 3s apart)
for i in $(seq 1 30); do
  STATUS=$(curl -s --max-time 5 http://34.77.94.14:8000/api/health | jq -r '.status' 2>/dev/null)
  STATUS=${STATUS:-down}  # a failed curl gives jq empty input, which yields an empty STATUS
  echo "[$i/30] status=$STATUS"
  if [ "$STATUS" = "healthy" ] || [ "$STATUS" = "degraded" ]; then
    break
  fi
  sleep 3
done
[ "$STATUS" = "healthy" ] || [ "$STATUS" = "degraded" ] || { echo "FAIL: dev never healthy"; exit 1; }
```

Expected: reaches healthy/degraded within 90s.
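Step 3.3, Step 6.2 and scripts/switch-dev-vm.sh all repeat the same polling loop. Below is a sketch of it as one function. It assumes jq is installed, which Step 1.6 already requires. The name wait_healthy and its arguments are hypothetical. The function prints the number of the attempt that succeeded, which is the figure Step 3.5 multiplies by 3 for the time-to-healthy line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll URL until .status is "healthy" or "degraded".
# Prints the successful attempt number on stdout; progress goes to stderr.
wait_healthy() {
  local url="$1" tries="${2:-30}" delay="${3:-3}" i status
  for ((i = 1; i <= tries; i++)); do
    # A failed curl (empty body) or unparsable JSON both leave status empty
    status=$(curl -s --max-time 5 "$url" | jq -r '.status // empty' 2>/dev/null)
    status=${status:-down}
    echo "[$i/$tries] status=$status" >&2
    case "$status" in
      healthy | degraded)
        echo "$i"
        return 0
        ;;
    esac
    sleep "$delay"
  done
  return 1
}
```

Usage: `ATTEMPT=$(wait_healthy http://34.77.94.14:8000/api/health) || echo "FAIL: dev never healthy"`.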

Step 3.4: Verify the running image is the per-branch one

```
SLUG=$(cat /tmp/dryrun-baseline/slug.txt)
EXPECTED_DIGEST=$(cat /tmp/dryrun-baseline/expected-digest.txt)
RUNNING_IMAGE=$(gcloud compute ssh agnes-dev --zone=europe-west1-b --quiet --command \
  "docker inspect \$(docker ps -qf name=app) --format '{{.Image}}'")
echo "Running image digest: $RUNNING_IMAGE"
# The running image line will be sha256:xxxxx. Compare to the manifest digest we recorded.
# They should match (or differ only by multi-arch manifest indirection — compare via docker inspect on remote)
gcloud compute ssh agnes-dev --zone=europe-west1-b --quiet --command \
  "docker inspect \$(docker ps -qf name=app) --format '{{.Config.Image}}' && \
   docker image inspect \$(docker ps -qf name=app --format '{{.Image}}' | head -1) --format '{{.RepoTags}}{{.RepoDigests}}'"
```

Expected: RepoTags or RepoDigests output includes either :dev-$SLUG or the digest from Step 2.4. If neither matches, the cron didn't pull the new tag — record as FAIL and continue (cleanup is still required).

Step 3.5: Record Task 3 result

The agent must judge PASS/FAIL based on Step 3.4 output: PASS iff RepoTags or RepoDigests contained :dev-$SLUG or the digest captured in Step 2.4.
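The rule above is mechanical enough to script. The sketch below uses a plain substring match. That is adequate because the epoch in the slug makes a prefix collision with another tag unlikely. The name image_verdict is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper for Step 3.5: PASS iff the Step 3.4 output mentions the
# per-branch tag ":dev-<slug>" or the digest recorded in Step 2.4, else FAIL.
image_verdict() {
  local output="$1" slug="$2" digest="${3:-}"
  if [[ "$output" == *":dev-$slug"* ]]; then
    echo PASS
  elif [ -n "$digest" ] && [ "$digest" != "null" ] && [[ "$output" == *"$digest"* ]]; then
    echo PASS
  else
    echo FAIL
  fi
}
```

Usage: `image_verdict "$STEP34_OUTPUT" "$SLUG" "$(cat /tmp/dryrun-baseline/expected-digest.txt)"`, where STEP34_OUTPUT is a hypothetical variable holding the text printed by the second gcloud command in Step 3.4.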

```
SLUG=$(cat /tmp/dryrun-baseline/slug.txt)
# Replace <RESULT> with PASS or FAIL based on the Step 3.4 output the agent observed.
# Replace <IMAGE_OUTPUT> with the RepoTags/RepoDigests line from Step 3.4.
# Replace <SECONDS> with the loop iteration count from Step 3.3 × 3.
cat >> /tmp/dryrun-report.md <<EOF
## Task 3: Dev VM Switch — <RESULT>

- Switched agnes-dev to AGNES_TAG=dev-$SLUG
- Health after switch: reached healthy/degraded within 90s
- Running image: <IMAGE_OUTPUT>
- Time from cron trigger to healthy: <SECONDS>s

EOF
```

Task 3 gate: health reached OK state; running image verified. Proceed even if image verification was inconclusive — rollback still required.

Task 4: Terraform Plan Verification (Private Repo)

Purpose: Validate that adding a new entry to dev_instances produces a clean terraform plan (not apply) in agnes-infra-keboola. This proves the TF module accepts the variable shape the hackathon docs will recommend.

Skip condition: If prerequisites check found that /tmp/agnes-infra-keboola clone failed, skip this entire task and record SKIPPED — repo unavailable in the report.

Files touched (throwaway branch only):

/tmp/agnes-infra-keboola/terraform/terraform.tfvars (throwaway edit)

Step 4.1: Create throwaway branch in private repo

```
cd /tmp/agnes-infra-keboola
git checkout main
git pull
EPOCH=$(date +%s)
BRANCH="dryrun-tfplan-${EPOCH}"
echo "$BRANCH" > /tmp/dryrun-baseline/tf-branch.txt
git checkout -b "$BRANCH"
```

Expected: clean checkout of main, new branch created.

Step 4.2: Add throwaway dev_instance entry

Read terraform/terraform.tfvars first to understand the current dev_instances shape. Then append a new entry.

The dev_instances variable schema (from infra/modules/customer-instance/variables.tf:41-49) is:
```
list(object({
  name         = string
  machine_type = optional(string, "e2-small")
  image_tag    = optional(string, "dev")
}))
```
Modify the dev_instances list to append:
```
{ name = "agnes-hack-dryrun", image_tag = "dev-<slug-from-task-2>" }
```
The agent should detect the current tfvars format and insert accordingly. If the file does not already contain dev_instances, abort and report format-mismatch.
```
SLUG=$(cat /tmp/dryrun-baseline/slug.txt)
# Show current tfvars for context
cat /tmp/agnes-infra-keboola/terraform/terraform.tfvars | grep -A 20 "dev_instances"
# Agent must edit the file to add the new entry — use the Edit tool rather than sed to be safe.
```
After editing, show the diff:
```
cd /tmp/agnes-infra-keboola
git diff terraform/terraform.tfvars
```
Expected: diff adds exactly one new entry to dev_instances list with name = "agnes-hack-dryrun" and image_tag = "dev-<slug>".

Step 4.3: Run terraform plan locally (no apply)

```
cd /tmp/agnes-infra-keboola/terraform
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/.agnes-keys/agnes-deploy-kids-ai-data-analysis-key.json"
[ -f "$GOOGLE_APPLICATION_CREDENTIALS" ] || { echo "SA key not found, skipping plan (record Task 4 as SKIPPED)"; exit 3; }
terraform init -input=false -upgrade=false
# -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present
terraform plan -input=false -no-color -detailed-exitcode -out=/tmp/dryrun-tfplan.bin > /tmp/dryrun-tfplan.txt 2>&1
RC=$?
echo "terraform plan exit: $RC"
tail -40 /tmp/dryrun-tfplan.txt
```

Expected:

- exit 2 (changes present, which is what we want). Without -detailed-exitcode, terraform plan exits 0 whether or not there are changes. With it, exit 0 means the plan saw no changes, so the new dev_instances entry was not picked up, and exit 1 is an error.
- output ends with Plan: N to add, M to change, K to destroy. where N >= 1 (at least the new VM + disk + IP) and K == 0 (we must NOT be destroying anything)

If K > 0 or terraform plan errors, abort and DO NOT proceed to Step 4.4. Report the plan output verbatim in the final report.
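These criteria (N >= 1 adds, K == 0 destroys) can be read straight off the summary line. The sketch below assumes the summary starts with `Plan: ` and contains `N to add` and `K to destroy`. It pulls each count out separately, so other counts that some Terraform versions add to the same line (for example imports) do not break it. The name plan_gate is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical Step 4.3 gate: read the saved plan log and require
# at least one add and exactly zero destroys in the "Plan:" summary line.
plan_gate() {
  local log="$1" line adds destroys
  line=$(grep -E '^Plan: ' "$log" 2>/dev/null | head -1)
  if [ -z "$line" ]; then
    echo "FAIL: no 'Plan:' summary (plan errored or found no changes)" >&2
    return 1
  fi
  adds=$(sed -nE 's/.* ([0-9]+) to add.*/\1/p' <<< "$line")
  destroys=$(sed -nE 's/.* ([0-9]+) to destroy.*/\1/p' <<< "$line")
  # Missing counts default to failing values
  if [ "${adds:-0}" -ge 1 ] && [ "${destroys:-1}" -eq 0 ]; then
    echo "PASS: $line"
  else
    echo "FAIL: $line"
    return 1
  fi
}
```

Usage: `plan_gate /tmp/dryrun-tfplan.txt`. Because each count is captured after a leading space, a line with `10 to destroy` yields 10 rather than 0.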

Step 4.4: Discard throwaway branch (no push, no apply)

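```
cd /tmp/agnes-infra-keboola
# The tfvars edit was never committed; discard it first, otherwise git checkout carries it over to main
git checkout -- terraform/terraform.tfvars
git checkout main
BRANCH=$(cat /tmp/dryrun-baseline/tf-branch.txt)
git branch -D "$BRANCH"
# Branch was never pushed, so nothing to clean up remotely.
```

Expected: branch deleted locally, main is current, working tree clean.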

Step 4.5: Record Task 4 result

```
ADDS=$(grep -E "Plan:" /tmp/dryrun-tfplan.txt | head -1)
# -q keeps the matched line out of DESTROYS_OK; ", 0 to destroy" does not match "10 to destroy"
DESTROYS_OK=$(grep -qE "Plan:.*, 0 to destroy" /tmp/dryrun-tfplan.txt && echo yes || echo no)
cat >> /tmp/dryrun-report.md <<EOF
## Task 4: TF Plan for New Dev VM — <PASS|SKIPPED|FAIL>

- Plan summary: $ADDS
- Zero destroys: $DESTROYS_OK
- Full plan output: see /tmp/dryrun-tfplan.txt

EOF
```

Task 4 gate: plan produced with 0 destroys and ≥1 add. Proceed.

Task 5: Verify Smoke-Test Gate Blocks Broken PR

Purpose: Confirm that a pull request with a deliberately-failing test does NOT produce a passing CI — which is the safety net that keeps :stable from auto-promoting broken images to prod.

Files touched (throwaway branch only):

tests/test_dryrun_should_fail.py (new file on throwaway branch)

Important: This task creates a PR (not a merge). The PR is closed without merging in Step 5.5.

Step 5.1: Create throwaway branch with failing test

```
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
git checkout main
git pull
EPOCH=$(date +%s)
BRANCH="dryrun-break-smoke-${EPOCH}"
echo "$BRANCH" > /tmp/dryrun-baseline/smoke-branch.txt
git checkout -b "$BRANCH"
cat > tests/test_dryrun_should_fail.py <<'PYEOF'
def test_intentional_fail_for_dryrun():
    """Intentional failure to verify CI gate blocks broken PRs. Remove after dryrun."""
    assert False, "dryrun: this test is supposed to fail"
PYEOF
git add tests/test_dryrun_should_fail.py
git commit -m "dryrun: intentional failing test (will be reverted)"
git push -u origin "$BRANCH"
```

Expected: push succeeds.

Step 5.2: Open PR

```
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
PR_URL=$(gh pr create --title "dryrun: verify CI gate (DO NOT MERGE)" \
  --body "Intentionally failing test to verify CI blocks bad merges. Will be closed immediately after CI result." \
  --base main)
echo "$PR_URL" > /tmp/dryrun-baseline/pr-url.txt
echo "Opened: $PR_URL"
```

Expected: PR URL returned.

Step 5.3: Wait for CI test job to complete (expected: FAIL)

```
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
BRANCH=$(cat /tmp/dryrun-baseline/smoke-branch.txt)
sleep 15
RUN_ID=$(gh run list --branch "$BRANCH" --workflow release.yml --limit 1 --json databaseId --jq '.[0].databaseId')
echo "Watching run $RUN_ID (expected to FAIL)"
# A non-zero exit is expected. Capture it with || instead of toggling set +e / set -e,
# which would leave errexit switched on for the rest of the shell session.
EXIT=0
gh run watch "$RUN_ID" --exit-status --interval 15 || EXIT=$?
echo "Exit code: $EXIT (non-zero is EXPECTED here)"
```

Expected: exit code != 0. If exit code IS 0, that means CI passed despite assert False → the test suite is not being run, or the file was excluded → record as FAIL — CI gate broken.

Step 5.4: Verify PR mergeability check shows failure

```
PR_URL=$(cat /tmp/dryrun-baseline/pr-url.txt)
PR_NUM=$(basename "$PR_URL")
STATE=$(gh pr view "$PR_NUM" --json statusCheckRollup --jq '.statusCheckRollup[] | select(.name=="test") | .conclusion')
echo "test job conclusion: $STATE"
```

Expected: FAILURE. If SUCCESS, the gate is broken.
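Task 5 passes only when both signals agree: the watched run exited non-zero (Step 5.3), and the PR's test check concluded FAILURE (Step 5.4). The sketch below combines them into the result for the Step 5.7 heading. The name ci_gate_verdict is hypothetical. It expects a single 'test' conclusion, so if STATE holds several lines, a human should look.

```shell
#!/usr/bin/env bash
# Hypothetical Task 5 verdict: the gate works only if the broken PR's run failed
# AND its 'test' check concluded FAILURE. Anything else, including an empty
# conclusion (no 'test' check found), counts as a broken gate.
ci_gate_verdict() {
  local watch_exit="$1" conclusion="$2"
  if [ "$watch_exit" -ne 0 ] && [ "$conclusion" = "FAILURE" ]; then
    echo PASS
  else
    echo FAIL
  fi
}
```

Usage: `ci_gate_verdict "$EXIT" "$STATE"`.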

Step 5.5: Close PR and delete branch

```
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
PR_URL=$(cat /tmp/dryrun-baseline/pr-url.txt)
PR_NUM=$(basename "$PR_URL")
gh pr close "$PR_NUM" --delete-branch --comment "dryrun complete — CI gate verified, closing without merge"
# Also delete locally
git checkout main
BRANCH=$(cat /tmp/dryrun-baseline/smoke-branch.txt)
git branch -D "$BRANCH" 2>/dev/null || true
```

Expected: PR closed, local branch gone.

Step 5.6: Check whether main has required status checks configured

```
gh api repos/keboola/agnes-the-ai-analyst/branches/main/protection 2>/tmp/dryrun-protection-err.txt > /tmp/dryrun-protection.json
RC=$?
if [ $RC -ne 0 ]; then
  echo "No branch protection on main (or insufficient permissions to read it)"
  cat /tmp/dryrun-protection-err.txt
  PROTECTION_NOTE="NONE — branch is unprotected; broken PRs can be merged. Recommend adding 'test' as required status check."
else
  REQUIRED=$(jq -r '.required_status_checks.contexts[]?' /tmp/dryrun-protection.json 2>/dev/null | tr '\n' ',')
  echo "Required checks: $REQUIRED"
  if echo "$REQUIRED" | grep -q "test"; then
    PROTECTION_NOTE="OK — 'test' is required."
  else
    PROTECTION_NOTE="PARTIAL — protection exists but 'test' is not required. Contexts: $REQUIRED"
  fi
fi
echo "$PROTECTION_NOTE" > /tmp/dryrun-baseline/protection-note.txt
```

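Expected: note written. This step does not abort; it is informational only. The endpoint covers classic branch protection only: a 'test' check required through a repository ruleset does not appear here. Before acting on a NONE or PARTIAL note, also check gh api repos/keboola/agnes-the-ai-analyst/rules/branches/main.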

Step 5.7: Record Task 5 result

```
cat >> /tmp/dryrun-report.md <<EOF
## Task 5: CI Gate — <PASS|FAIL>

- Throwaway PR: $(cat /tmp/dryrun-baseline/pr-url.txt) (closed)
- CI 'test' job result on broken code: <FAILURE expected>
- Branch protection on main: $(cat /tmp/dryrun-baseline/protection-note.txt)

EOF
```

Task 5 gate: broken PR's CI status is FAILURE. Proceed. If PROTECTION_NOTE says NONE/PARTIAL, the final report must flag this as a hackathon-blocking recommendation.

Task 6: Cleanup and Baseline Restoration

Purpose: Leave the system in exactly the state recorded in Task 1. This is the most important task — a dirty dry-run poisons the hackathon.

Step 6.1: Restore agnes-dev AGNES_TAG

```
ORIG_LINE=$(cat /tmp/dryrun-baseline/dev-env.txt)
# ORIG_LINE looks like: AGNES_TAG=dev
ORIG_VALUE=$(echo "$ORIG_LINE" | cut -d= -f2-)
gcloud compute ssh agnes-dev --zone=europe-west1-b --quiet --command "\
  sudo sed -i 's|^AGNES_TAG=.*|AGNES_TAG=$ORIG_VALUE|' /data/.env && \
  sudo rm -f /data/.env.dryrun-bak && \
  sudo grep -E '^AGNES_TAG=' /data/.env && \
  sudo /usr/local/bin/agnes-auto-upgrade.sh 2>&1 | tail -20"
```

Expected: AGNES_TAG line matches original, auto-upgrade pulls back to the original tag.

Step 6.2: Wait for dev VM to return to healthy state on original tag

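```
for i in $(seq 1 30); do
  STATUS=$(curl -s --max-time 5 http://34.77.94.14:8000/api/health | jq -r '.status' 2>/dev/null)
  STATUS=${STATUS:-down}  # a failed curl gives jq empty input, which yields an empty STATUS
  echo "[$i/30] status=$STATUS"
  [ "$STATUS" = "healthy" ] || [ "$STATUS" = "degraded" ] && break
  sleep 3
done
[ "$STATUS" = "healthy" ] || [ "$STATUS" = "degraded" ] || echo "FAIL: dev did not recover; follow the Task 6 failure procedure in Abort / Rollback Procedures"
```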

Expected: reaches healthy/degraded within 90s.

Step 6.3: Verify running image matches baseline

```
RESTORED=$(gcloud compute ssh agnes-dev --zone=europe-west1-b --quiet --command \
  "docker inspect \$(docker ps -qf name=app) --format '{{.Config.Image}}'")
ORIG=$(cat /tmp/dryrun-baseline/dev-image.txt)
echo "Restored: $RESTORED"
echo "Original: $ORIG"
[ "$RESTORED" = "$ORIG" ] && echo MATCH || echo "MISMATCH — investigate"
```

Expected: MATCH. .Config.Image records the image reference the container was created from (e.g. ghcr.io/keboola/agnes-the-ai-analyst:dev), not its digest. So if auto-upgrade pulled a newer :stable/:dev floating image during the run, the result is still MATCH. A MISMATCH therefore means the container is on a different tag than at baseline: record both values in the report and treat the Task 6 gate as not met.

Step 6.4: Delete throwaway branches in public repo

```
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
STARTING=$(cat /tmp/dryrun-baseline/starting-branch.txt)
git checkout "$STARTING"
FEAT_BRANCH=$(cat /tmp/dryrun-baseline/branch-name.txt)
SMOKE_BRANCH=$(cat /tmp/dryrun-baseline/smoke-branch.txt 2>/dev/null || echo "")
# Local delete
git branch -D "$FEAT_BRANCH" 2>/dev/null || true
[ -n "$SMOKE_BRANCH" ] && git branch -D "$SMOKE_BRANCH" 2>/dev/null || true
# Remote delete (smoke branch was already deleted via `gh pr close --delete-branch` in Step 5.5)
git push origin --delete "$FEAT_BRANCH" 2>/dev/null || echo "(feature branch already gone)"
```

Expected: local branches gone, remote feature branch deleted. QUICKSTART.md commit on throwaway branch vanishes from origin.

Step 6.5: Final health check on prod (must match baseline)

```
curl -sf --max-time 10 http://34.77.102.61:8000/api/health > /tmp/dryrun-baseline/prod-health-after.json
BEFORE=$(jq -r '.status' /tmp/dryrun-baseline/prod-health.json)
AFTER=$(jq -r '.status' /tmp/dryrun-baseline/prod-health-after.json)
echo "Prod status before: $BEFORE / after: $AFTER"
[ "$BEFORE" = "$AFTER" ] && echo UNCHANGED || echo DRIFT
```

Expected: UNCHANGED. (Note: prod was never touched, so this is sanity only.)

Step 6.6: Record Task 6 result

```
cat >> /tmp/dryrun-report.md <<EOF
## Task 6: Cleanup — <PASS|FAIL>

- agnes-dev AGNES_TAG restored to: $(cat /tmp/dryrun-baseline/dev-env.txt)
- agnes-dev health after restore: $(curl -s --max-time 5 http://34.77.94.14:8000/api/health | jq -r '.status')
- agnes-dev image: matches baseline? <MATCH|MISMATCH — paste both>
- Throwaway branches deleted: feature, smoke
- Prod status unchanged: <UNCHANGED|DRIFT>

EOF
```

Task 6 gate: dev VM back on its baseline tag, branches gone, prod untouched.

Task 7: Generate Deliverables

Purpose: Produce the artefacts the user needs tomorrow: a helper script for the hackathon team and a consolidated report.

Files:

- Create: scripts/switch-dev-vm.sh (new)
- Create (already being built): /tmp/dryrun-report.md

Step 7.1: Write scripts/switch-dev-vm.sh

Create file at scripts/switch-dev-vm.sh:

```
#!/usr/bin/env bash
# switch-dev-vm.sh — point the shared hackathon dev VM at the caller's branch image.
#
# Usage:
#   scripts/switch-dev-vm.sh <branch-slug>
#   scripts/switch-dev-vm.sh hack-zs-metrics
#
# Prerequisite: your branch has been pushed and the release.yml workflow has completed,
# producing ghcr.io/keboola/agnes-the-ai-analyst:dev-<slug>.
#
# The slug is derived from your branch name by stripping the leading "feature/",
# replacing every character outside [a-zA-Z0-9-] with "-", lowercasing, and cutting
# to 50 characters. For branch "feature/hack-zs-metrics" the slug is "hack-zs-metrics".
set -euo pipefail

if [ $# -ne 1 ]; then
  echo "Usage: $0 <branch-slug>" >&2
  echo "Example: $0 hack-zs-metrics" >&2
  exit 2
fi

SLUG="$1"
# SLUG ends up inside a remote shell command below, so accept only slug characters.
if ! [[ "$SLUG" =~ ^[a-z0-9-]{1,50}$ ]]; then
  echo "ERROR: '$SLUG' is not a valid slug (lowercase letters, digits and '-', max 50 chars)" >&2
  exit 2
fi
VM="agnes-dev"
ZONE="europe-west1-b"
TAG="dev-$SLUG"
IMAGE="ghcr.io/keboola/agnes-the-ai-analyst:$TAG"

echo "[1/4] Verifying $IMAGE exists on GHCR..."
docker manifest inspect "$IMAGE" > /dev/null || {
  echo "ERROR: $IMAGE not found on GHCR. Did your release.yml run finish?" >&2
  echo "Check: gh run list --branch feature/$SLUG --workflow release.yml" >&2
  exit 1
}

echo "[2/4] Updating AGNES_TAG on $VM to $TAG..."
# grep -q makes the remote command fail if .env has no AGNES_TAG= line (sed -i would silently succeed)
gcloud compute ssh "$VM" --zone="$ZONE" --quiet --command "\
  sudo grep -q '^AGNES_TAG=' /data/.env && \
  sudo sed -i 's|^AGNES_TAG=.*|AGNES_TAG=$TAG|' /data/.env && \
  sudo grep -E '^AGNES_TAG=' /data/.env"

echo "[3/4] Triggering auto-upgrade..."
gcloud compute ssh "$VM" --zone="$ZONE" --quiet --command \
  "sudo /usr/local/bin/agnes-auto-upgrade.sh 2>&1 | tail -10"

echo "[4/4] Waiting for app to become healthy..."
for i in $(seq 1 30); do
  STATUS=$(curl -s --max-time 5 http://34.77.94.14:8000/api/health | python3 -c 'import sys,json; print(json.load(sys.stdin).get("status","down"))' 2>/dev/null || echo down)
  echo "  [$i/30] status=$STATUS"
  if [ "$STATUS" = "healthy" ] || [ "$STATUS" = "degraded" ]; then
    echo "OK — agnes-dev now running $TAG. Open http://34.77.94.14:8000"
    exit 0
  fi
  sleep 3
done
echo "ERROR: agnes-dev did not become healthy in 90s. SSH in and check: docker compose logs" >&2
exit 1
```

Then make it executable and syntax-check it:

```
chmod +x scripts/switch-dev-vm.sh
bash -n scripts/switch-dev-vm.sh  # syntax check
```

Expected: syntax check passes, file is executable.

Step 7.2: Commit the script on a fresh branch and open PR

```
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
# Branch from an up-to-date main so the PR contains only the new script;
# the still-untracked scripts/switch-dev-vm.sh carries over the checkout.
git checkout main
git pull
git checkout -b feature/hackathon-dryrun-deliverables
git add scripts/switch-dev-vm.sh
git commit -m "chore: add switch-dev-vm.sh helper for hackathon"
git push -u origin HEAD
gh pr create --title "chore: add switch-dev-vm.sh helper for hackathon" \
  --body "Adds scripts/switch-dev-vm.sh. Produced by the 2026-04-21 hackathon dry-run. Reviewed by user before merge." \
  --base main > /tmp/dryrun-baseline/deliverable-pr.txt
cat /tmp/dryrun-baseline/deliverable-pr.txt
```

Expected: PR URL. Do not merge — leave for user review.

Step 7.3: Finalise report with overall verdict

Determine overall verdict by inspecting each Task's PASS/FAIL line in /tmp/dryrun-report.md. Overall is PASS only if all tasks PASS (SKIPPED Task 4 is acceptable — note it).
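The rule above can be applied to the report file directly. The sketch below assumes the result is the last word of each "## Task N: ..." heading, as in the report templates. Task 1 ("## Baseline (Task 1)") and Task 7 have no such heading, so only Tasks 2 to 6 are read. An unreplaced placeholder such as <PASS|FAIL> counts as a failure. The name overall_verdict is hypothetical, and it covers only PASS and FAIL; a PASS WITH GAPS call is left to the agent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the overall verdict from the per-task headings.
# PASS only if every task heading ends in PASS (Task 4 may be SKIPPED).
overall_verdict() {
  local report="$1" re='^## Task ([0-9]+):' line num result verdict=PASS seen=0
  while IFS= read -r line; do
    [[ $line =~ $re ]] || continue
    num="${BASH_REMATCH[1]}"
    result="${line##* }"  # last space-separated word of the heading
    seen=$((seen + 1))
    if [ "$result" = "PASS" ]; then
      continue
    elif [ "$num" = "4" ] && [ "$result" = "SKIPPED" ]; then
      echo "note: Task 4 was SKIPPED" >&2
    else
      echo "Task $num: $result" >&2
      verdict=FAIL
    fi
  done < "$report"
  [ "$seen" -gt 0 ] || verdict=FAIL  # no task headings at all
  echo "$verdict"
}
```

Usage: `overall_verdict /tmp/dryrun-report.md`. Anything printed on stderr belongs under the verdict as a note.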

Append to report:

```
cat >> /tmp/dryrun-report.md <<EOF
---

## Overall Verdict

<PASS | PASS WITH GAPS | FAIL>

## Recommendations for the User Before Hackathon Starts

1. <If protection-note said NONE/PARTIAL:> Configure required status check 'test' on main branch of keboola/agnes-the-ai-analyst.
2. Pin prod image_tag in agnes-infra-keboola/terraform/terraform.tfvars from "stable" to "stable-2026.04.XX" (current running version). Revert after hackathon.
3. Rotate admin password '1234' on prod (34.77.102.61:8000/login) and dev (34.77.94.14:8000/login).
4. Wire notification_channel_ids in tfvars so uptime alerts actually notify someone.
5. Share the hackathon 1-pager + switch-dev-vm.sh via the team Slack channel.
6. Review PR $(cat /tmp/dryrun-baseline/deliverable-pr.txt) and merge if switch-dev-vm.sh looks good.

## Artefacts

- Full report: /tmp/dryrun-report.md (this file)
- Baseline snapshots: /tmp/dryrun-baseline/*.{json,txt}
- TF plan output: /tmp/dryrun-tfplan.txt (if Task 4 ran)
- Deliverable PR: $(cat /tmp/dryrun-baseline/deliverable-pr.txt)

EOF
cat /tmp/dryrun-report.md
```

Expected: full report printed.

Step 7.4: Print final summary to chat

Agent should output, in its final message to the user:
- Overall verdict (one line)
- Each task's result (one line each)
- Any unresolved anomalies
- Link to deliverable PR
- Path to full report

Task 7 gate: report complete, PR open, all artefacts listed.

Abort / Rollback Procedures

If any task fails mid-execution, the agent must still perform Task 6 cleanup before reporting failure. Specifically:

- If Task 2 push succeeded but Task 3 failed → still run Task 6 Steps 6.1-6.4 to restore the dev VM and delete the branch.
- If Task 5 PR was opened but the workflow didn't finish → close the PR with gh pr close --delete-branch and log it.
- If Task 4 TF plan showed destroys → abort immediately, do NOT attempt apply, record in report, continue to Task 6.

If Task 6 itself fails (dev VM won't come back healthy on original tag), the agent must:

- Print the baseline values (from /tmp/dryrun-baseline/dev-env.txt, /tmp/dryrun-baseline/dev-image.txt) so the user can manually SSH and fix.
- Attempt gcloud compute ssh agnes-dev --zone=europe-west1-b --command "docker compose -f /opt/agnes/docker-compose.yml logs --tail 100" and include output in the report.
- Mark overall verdict as FAIL and stop.

What a Successful Run Looks Like

- Task 1 baseline: captured with prod+dev healthy/degraded
- Task 2: GHCR manifest exists for :dev-hack-dryrun-<epoch>
- Task 3: agnes-dev briefly running the per-branch image, healthy within 90s
- Task 4: terraform plan showed 1+ to add, 0 to destroy (or SKIPPED)
- Task 5: CI test job reported FAILURE on the broken PR, PR closed
- Task 6: agnes-dev back on its baseline AGNES_TAG, healthy, branches gone
- Task 7: scripts/switch-dev-vm.sh committed on a PR for user review, full report in /tmp/dryrun-report.md
- Final agent message: verdict + one-line result per task + deliverable PR link

Duration: ~45-75 minutes, bounded primarily by CI workflow runs (~3-5 min each, two runs) and TF init (~30s-2min cold).
