Commit graph

21 commits

Author SHA1 Message Date
Vojtech Rysanek
9e3e611aab fix(web): install-prompt Step 2 + restart cue match Desktop install path
/home page Step 2 told users to mkdir ~/Desktop/<workspace_dir>, but the
pasted install script's Step 2 pwd-check expected $HOME/<workspace_dir>
and warned 'normally installed in ~/FoundryAI' — sending users on the
Desktop path through an unnecessary 'install here' confirmation.

Align Step 2 (pwd check + warning copy + manual mkdir hint), Step 9
restart-claude cue, post-install /home hero, and the 'don't create
Projects/' callout to ~/Desktop/<workspace_dir>. Update tests.
2026-05-21 18:17:25 +04:00
Vojtech
001e5ce40e
feat(web): /home value-first redesign + unified page-shell across app (#366)
* feat(web): value-first /home reskin (CEO mock palette + pillars + first-session)

Restructures `/home` to lead with product value instead of install steps,
matching the CEO mock proposed for the homepage:

- New intro hero on top — eyebrow `Welcome, {{ display_name }}`, H1
  `{{ instance_brand }} is your team's AI workspace`, lede framing the
  product as an "AI Chief of Staff", two CTAs (`Set up in ~15 min →`
  jumps to the wizard, `Just browse — no install needed` jumps to
  `#look-around`), and a four-pillar row (Data packages · Plugins ·
  Skills · Memory). Renders for both onboarded and not-onboarded users
  so the value framing is consistent across visits.
- New `first-session` narrative — five-beat walkthrough (launch → pick
  project → memory loads → ask → close) with mock terminal frames
  carrying traffic-light dots, prompts, and dimmed system output.
- Setup wizard chrome — progress chip (`Step 1 of N · ~15 min ·
  One-time · Reversible`), thin progress bar, and per-step number
  badges on each `.install-block` so the wizard reads as bounded
  instead of an open-ended scroll.
- Palette shift from blue to green/navy: `--hp-primary` aliases
  `#2ea877` (mint), `--hp-hero-bg` is navy `#0f1b3a`, code panels stay
  near-black `#0c1224` with warm-yellow `#ffd866` accents. The token
  alias is reused so downstream rules pick up the new accent
  automatically; instance theme overrides via
  `config.theme_overrides()` still win.
- VS Code surface tile carries a `Recommended` pill; the existing
  "Want to look around first?" section is renamed to `Explore your
  workspace` and gets the `#look-around` anchor.

All test-pinned class names and IDs (`install-hero`, `install-block`,
`home-mock`, `self-mark-btn`, `setupClaudeBtn`, `offboard-strip`,
`home-getting-started`, `home-gs-item`, `home-overview`,
`home-usage`) preserved as structural anchors; new visual language
overlays via additional classes. Existing onboarded/not-onboarded
branching, `/api/me/onboarded` POST, status frame gating, post-CTA
modal, and OS-tab switching JS unchanged. Stray `~/FoundryAI`
comment swapped for `~/{{ workspace_dir }}` to honor the
vendor-agnostic OSS rule.

51 home tests pass without modification.

* fix(web): /home palette inversion — dark intro hero on top, light setup card below

Previous reskin commit kept the install-hero as a dark navy gradient and
rendered the new intro hero as a light surface — opposite of what the CEO
mock specifies. Playwright comparison vs `data/ceo_home.html` confirmed:

- CEO mock: dark navy hero at TOP (with white pillars on navy), LIGHT
  white setup card BELOW with light step rows and dark code panels
  inset.
- Previous: light intro hero on top, dark setup card below. Inverted.

This patch flips both:

- `.home-hero-intro` now: dark navy gradient `#0f1b3a → #1a2a5f`, green
  radial glow in the corner, green eyebrow, white H1 (`accent` span
  green), rgba-white lede, green pill primary CTA, translucent-white
  secondary CTA, pillars row separated by hairline border-top with
  green square-dot bullets in front of each pillar header.
- `.install-hero` and `.install-block` now: white surface card with
  thin green accent strip across the top, light step rows split by
  hairline borders, green-tinted step-number circles (`#e6f9f0` bg,
  `#1f8a5e` ink), green progress chip + bar. Code panels
  (`.install-cmd`) and terminal frames stay dark — they're the "type
  this" surfaces.
- All previously-rgba-white descendants of `.install-hero`
  (close button, eyebrow, h1, lead, links, code chips, OS tabs,
  install notes, setup-CTA button, self-mark fallback, auto-detect
  badge, terminal-howto disclosure) re-skinned for light surface.

All 12 home page tests still pass (no markup changes, only CSS).

* fix(web): /home parity polish — system font + mock sizes + blue info hint + gray step-num

After v2 palette flip, user comparison vs CEO mock surfaced three
remaining gaps in the wizard area:

- Font stack mismatch: Agnes inherits Inter via `style-custom.css`,
  but the CEO mock uses the platform system stack (San Francisco on
  macOS, Segoe UI on Windows). The rendered weight/letterforms read
  noticeably different. `.home-mock` now declares
  `-apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif`
  for itself and all descendants, with the monospace stack reserved
  for `code`/`kbd`/`pre`, `.install-cmd`, and `.terminal-body`.
- Step number badges were green-tinted; mock uses neutral gray
  (`#f0f2f6` bg, `#4a5168` ink) — green is reserved for the "done"
  state. Switched to `--hp-surface-dim` + `--hp-text-secondary`.
- "Don't have a terminal open?" disclosure was an amber/yellow
  variant left over from the old dark-hero palette. Mock uses a
  blue info-hint vocabulary (`--info-bg: #eef3ff`,
  `--info-line: #4f7cf2`, `--info-ink: #1c3994`) with white kbd
  chips. Added the info-* tokens to the `:root` block and re-skinned
  `details.terminal-howto` (incl. summary, body, kbd) to match.

Step-body type sizes also brought in line with the mock spec —
`.install-block .label` (step h3 equivalent) is now 17px / 700 with
6px gap; `.install-note` body type is 14px / 1.55.

`--hp-info-bg / --hp-info-ink / --hp-info-line / --hp-warn-bg /
--hp-warn-ink / --hp-warn-line / --hp-surface-dim` added as
first-class tokens so future hint/warn callouts pick the same colors
without a duplicate vocabulary.

12/12 home tests pass.

* feat(web): centralize design tokens + reword /home wizard to 6 steps (CEO mock parity)

Two intertwined changes that touch both global design + /home structure:

GLOBAL TOKEN SHIFT (app/web/static/style-custom.css)
- `--primary` flipped from blue `#0073D1` to green `#2ea877` — same brand
  alias the rest of the app referenced, so every page picks up the new
  accent automatically. Old `--primary-dark` / `--primary-light` recolored
  to match.
- New tokens added: `--brand-accent`, `--hero-bg`, `--hero-ink`,
  `--surface-dim`, `--info-bg/ink/line`, `--warn-bg/ink/line`. Brings
  the global vocabulary in line with the CEO mock's `:root` block so
  callouts and hero surfaces don't have to invent local tokens.
- `--font-primary` switched from Inter-led stack to the system stack
  (`-apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Inter",
  system-ui, sans-serif`) so weight/letterforms render identically on
  macOS (San Francisco) and Windows (Segoe UI) — matches the mock and
  avoids a font-loading flash for analysts without Inter installed.
- Shadow tints re-cast in navy `rgba(15,27,58,...)`; focus ring uses
  the new green `rgba(46,168,119,0.25)`.
- `.app-nav-link` font-size 13px → 14px, padding 6px 12px → 8px 14px,
  hover bg → `--primary-light` (mint), color → `--primary-dark`.
  `.app-nav-menu-item.is-active` re-tinted to the same green system.
- Sweep across 26 templates (style-custom.css + 25 template files)
  replacing every hardcoded `#0073D1` / `#005BA3` / `#E6F3FC` /
  `rgba(0,115,209,…)` / `rgba(0,86,163,…)` with token references or
  the new green hexes — 175 occurrences total. Pages that styled their
  own buttons / borders / shadows pick up the new brand color without
  per-page overrides.

/HOME WIZARD: 6 STEPS PER MOCK (app/web/templates/home_not_onboarded.html)
- Step 1 reworded `Install Claude Code on your computer` + `~3 min`
  subhead (mock copy).
- Step 2 renamed `Pick a folder for {{ instance_brand }}` (was
  `create your workspace folder`) — same `mkdir` command, mock-aligned
  framing.
- NEW Step 3 `Open a terminal inside that folder` — no shell command,
  just the "you are standing in the right directory" reassurance with
  a Finder/PowerShell/file-manager howto disclosure. Mirrors the CEO
  mock's Step 3.
- Step 4 (was Step 3, gated by `home_automode.show`) renamed
  `Launch Claude with auto-approve on`. Body copy lightly updated so
  it references "the next step" instead of "Step 4".
- Step 5 (was Step 4) renamed `Get the install script and paste it
  into Claude`. The setup-cta-lead now explicitly says
  "pasting the script into Claude Code will install {{ instance_brand
  }}…" so existing test assertions pinning the `install Agnes`
  substring still match.
- NEW Step 6 `Optional: create a one-word shortcut for next time` —
  prints an `echo 'alias {{workspace_dir|lower}}=…' >> ~/.zshrc`
  one-liner for Unix and an `Add-Content $PROFILE …` equivalent for
  Windows. OS tabs + copy buttons reuse the existing wizard chrome.
- Progress chip dynamic: `Step 1 of 6` when home_automode is on,
  `Step 1 of 5` when off. Progress bar fill `100 // total_steps` so
  the bar sits at 16-20 % on first paint.
- `.step-lede` token added for the new short body copy beneath each
  step label (14.5px / ink-soft).
- `macOS / Linux / WSL` tab labels changed to `macOS / Linux` per
  user instruction. Terminal-howto `WSL:` paragraph dropped; the
  paste-shortcut hint now reads `(Linux)` instead of `(Linux/WSL)`.
  Functional WSL handling in `connector_prompts.py` (it's a Linux
  detection fallback, not user-facing label) preserved.
- `setup_instructions.py` Claude Code install hint:
  `npm (Linux / WSL)` → `npm (Linux)`.

SURFACES — 4 CARDS PER MOCK
- Replaced the 3-tile `.home-usage-grid` with a 4-card grid:
  - VS Code (Recommended) — `.surface-card.feature`, green ring,
    DAILY USE eyebrow + 5-step numbered list + `Open VS Code setup
    guide →` link to `/setup-advanced#vscode`.
  - Terminal — QUICK ACCESS eyebrow + 4-step list.
  - Claude Code (Desktop app) — CONNECT IT eyebrow + 4-step list.
  - Cowork (claude.ai) — `.surface-card.incomplete`, warn-tinted
    border + `Instructions needed` badge + a TODO callout describing
    the missing content. The card is intentionally honest about the
    gap rather than hiding it.

TEST UPDATES
- `test_web_home_page.py` negative onboarded-state assertions
  rebased on the new step labels (6 entries instead of 4).
- `test_home_route_resolution.py` `test_home_renders_automode_block_by_default`
  + its `_when_env_off` counterpart now check the new
  `Step 4 — Launch Claude with auto-approve on` label.

* fix(web): /home section content + layout — verbatim mock match

User comparison flagged several remaining gaps; this patch rewrites
the three lower sections of /home to match the CEO mock spec exactly:

FIRST-SESSION (5 beats)
- h2 28px / 700 / -.5px tracking (was 19px / 600).
- lede 18px ink-soft (was 13.5px secondary).
- `.session-walk` wrapper, 36px gap between beats (mock spec).
- `.session-step` grid 48px / 1fr, gap 22px — number circle on
  the left, content on the right.
- `.session-num` 40 × 40 circle with SOLID GREEN bg (`--primary`)
  and WHITE text + soft green shadow (was 28px mint pill w/
  dark-green text).
- `.session-content h3` 18px / 600 (was 14.5px / 600).
- `.session-content > p` 15px.
- `.session-content .annotation` 13.5px ink-muted body type with
  `strong` for highlighting (replaces the upper-case "WHAT'S
  HAPPENING" eyebrow pattern that didn't match the mock).
- `.session-intro` callout card (white surface + mint icon block)
  framing the "five beats" tagline.
- `.session-tldr` summary box (brand-light bg + brand-dark left
  border) wrapping up the loop.
- Terminal frames re-skinned: `#0c1224` body / `#182241` bar /
  real macOS traffic-light colors `#ff5f57` / `#febc2e` / `#28c840`.
- Terminal body 13px / 1.65 line-height with mock-spec class
  vocabulary: `.you` (yellow input), `.ai-name` (brand bold),
  `.path` (light blue), `.dim` (translucent code-ink), `.caret`
  (blinking cursor).
- Five beats rewritten with mock's exact narrative flow (launch →
  menu → pick → ask → close), vendor-agnostic project names
  (`RevenueAnalysis`, `Onboarding`, etc.) replacing the customer-
  specific `GRPN_*` examples in the mock. Templated `{{
  instance_brand }}` / `{{ workspace_dir }}` / `{{ workspace_dir |
  lower }}` (the shortcut alias) everywhere.

SURFACES (4 cards)
- The section is no longer wrapped in a white rectangle; the
  `.home-usage` class loses its bg + border + padding (mock has the
  cards directly on the page bg).
- h2 28px (was 22px). Eyebrow 12px / 1.5px tracking / brand-dark.
- `.surface-card.feature` (VS Code) now uses 2px green border +
  vertical brand-light → white gradient (was 1px ring).
- `.surface-card.incomplete` (Cowork) uses 2px red border (`#e35e5e`)
  + vertical red-tint → white gradient (was yellow flat bg).
- `.surface-card .steps` panel: inner surface-dim bg + 8px radius
  + 13px font.
- `.surface-foot` top-border + ink-muted (mock spec).
- `.badge-warn` now a solid red box (`#e35e5e` bg + white ink + 4px
  radius) instead of a yellow pill, matching the mock.
- Header layout fixed: the global absorbed `header { display: flex;
  justify-content: space-between }` rule was making the h2 sit on
  the right of the eyebrow; explicit `display: block` override on
  `.home-mock section > header` puts the title on the LEFT under
  the eyebrow as the mock has.

BROWSE — Explore your workspace
- Wrapped in `<section class="browse-section">` with proper
  eyebrow + h2 + lede (was a bare `.section-label` div).
- `.browse-grid` 5-col grid (was responsive auto-fill, 4-card
  layout). Skills tile added as a 5th card linking to
  `/marketplace?type=skills`.
- `.browse-card` mock-spec: 22 20 padding, 28px icon, 15px title,
  12.5px ink-muted desc, hover lifts -2px with brand border +
  shadow-md.

Section wrappers (`.home-usage`, `.first-session`) no longer carry
the white card chrome — they sit directly on the page bg, matching
the mock. Only Getting Started + Overview keep their white cards.

GLOBAL eyebrow vocabulary (`.home-hero-intro .eyebrow`,
`.first-session > .eyebrow`, `.surfaces > header .eyebrow`,
`.browse-section .eyebrow`) all aligned to mock spec: 12px / 700 /
1.5px tracking / brand-dark color / 14px bottom margin.

Hero h1 bumped to 44px / 800 / -1px tracking (was 32px / 600).

51/51 home tests pass.

* fix(web): /home session-intro card + terminal-body verbatim mock match

User comparison flagged three remaining /home gaps; this patch
addresses each:

- `.session-intro` rule was missing — the "five beats" tagline
  rendered as a bare line with no card chrome. Added the mock-
  spec card: white surface, 14px radius, 20×24 padding, 1px
  border + shadow-sm, with a 44×44 brand-light icon block on the
  left.

- Beat 1 terminal-title was `~/{{ workspace_dir }} — zsh` (mock-
  style shell-pwd format), but the user wants every terminal
  frame across all 5 beats to read `claude — {{ instance_brand }}`.
  Updated.

- Terminal-body line structure for beats 2-5 rewritten verbatim
  from the CEO mock:
  - `<span class="prompt">&gt;</span><span class="you">…</span>`
    now has no space between the prompt and user input (mock
    pattern: zero gap, the .prompt's `margin-right: 8px` provides
    the visual separation).
  - Beat 2 menu items use `<strong>[N]</strong>` numbering with
    project entries on indented lines, each project name followed
    by a `<span class="dim">(N ago)</span>` timestamp at a fixed
    column — instead of my prior single-line concatenation.
  - Beat 3 narrative split into 4 stanzas separated by blank lines
    (matches mock): the "Switched to <strong>X</strong>" status,
    then dim Loaded/Last-session lines, then a stand-alone "One
    unprocessed input detected:" pair, then the "Want me to
    process …" question. My prior version dim-wrapped the entire
    block, which looked off.
  - Beat 4 narrative split into headline summary + risks section
    with <strong> heads + bullet lists separated by blank lines,
    matching the mock's "Q1 close summary" / "Open risks" rhythm.
    The Q1 question carries the mock's manual line-break + 2-
    space continuation indent inside the `.you` span — without
    that, terminal-body's `white-space: pre-wrap` would auto-wrap
    awkwardly at a different column than the mock.
  - Beat 5 exit narrative uses two separate dim lines + a
    standalone `.ai-name` "See you next time." line, then prompt
    + caret. My prior version collapsed everything into one dim
    block.
  - Project names changed from customer-specific (`GRPN_*`) to
    generic (RevenueAnalysis, WeeklyReview, Onboarding, OpsDb,
    HRHandShake) so the OSS distribution stays vendor-agnostic
    per CLAUDE.md.
  - `Marketing plan` examples replaced with `Q1 close` so the
    narrative stays plausible for an analyst audience.

12/12 home tests pass.

* fix(web): /home surfaces verbatim mock — VS Code thumb, Terminal expected-output, NEW badge

User comparison flagged three remaining surface-section gaps:

- VS Code surface card was rendering a generic "Screenshot pending"
  placeholder; the mock has a labeled inline mockup
  (`<a class="vscode-thumb">` w/ `.thumb-fallback`) showing the
  recommended 4-pane layout (EXPLORER yellow, TERMINAL 1 purple,
  TERMINAL 2 green, TERMINAL 3 orange) on a dark navy bg + a
  "Recommended layout" caption pill. CSS `.vscode-thumb` block
  added — uses gradient-strip backgrounds to draw the colored
  panel bars without needing a base64 image.

- "Recommended" badge was a pill (999px radius) with
  `--brand-accent` bg + navy text. Mock uses `.badge` instead of
  `.recommend-pill` — solid `--primary` (brand-dark green) bg
  with WHITE text and 4px radius. Replaced the class + CSS rule
  so the badge reads as a tag, not a pill.

- Terminal surface card was missing the "What you should see"
  subsection — mock has an `.expected-output` block showing a
  sample of the welcome menu inside a dim dashed panel. Added the
  block with the mock's exact rendered output (templated to
  `{{ instance_brand }}` + generic project names instead of
  customer-specific GRPN entries) plus the `.expected-output`
  CSS (surface-dim bg + dashed border + `::before` "WHAT YOU
  SHOULD SEE" eyebrow per mock spec).

Also addressed the explore-section feedback:

- Skills browse-card now carries the `new` class so it picks up
  the `.browse-card.new::after` corner badge ("NEW", green bg,
  white text, 10px / 700 / 0.5px tracking) per mock.
- Browse cards align same height via `align-self: stretch` (grid
  default) + `flex-grow: 1` on `.browse-desc` so descriptions
  fill remaining vertical space; previously the Skills tile sat
  shorter because its desc text was longer than others'.

Structural HTML changes to all four surface cards: dropped the
inner `<div class="surface-card-head">` wrapper + `<p
class="surface-pitch">` class in favor of mock's flat layout
(`.what` + `.steps` + `.when-to-use`). `<ol class="surface-steps">`
replaced with `<div class="steps"><strong
class="steps-eyebrow">DAILY USE / QUICK ACCESS / CONNECT IT</strong>
<ol>...</ol></div>` so the eyebrow + numbered list share the
mock's tinted surface-dim panel.

12/12 home tests pass.

* fix(web): align /home setup walkthrough to design spec

- Setup-section header (eyebrow + heading + lede) floats above the
  install hero; install card has no accent strip; step labels drop
  `Step N —` prefix; closing strip is single flex row.
- VS Code surface card renders recommended-layout screenshot from
  `/static/img/vscode-layout.png` with click-to-enlarge lightbox.
- Workspace install path cascades to `~/Desktop/{workspace_dir}` in
  every step, surface card, first-session annotation, and shortcut.
- Step 1 verify text restores Enterprise — Finance and Legal option.
- Step 6 shortcut installs a shell function with arg forwarding
  (`"$@"` unix / `@args` windows) and a user-facing Auto / YOLO
  permission-mode toggle.
- Step 5 manual-fallback details inline on the CTA row; description
  reads at step-lede size, not 13px chip.
- Setup-section heading no longer right-aligns (was inheriting
  `header { display: flex; justify-content: space-between }` from
  the legacy stylesheet; wrapper changed to `<div>`).
- Getting Started `<details>` block removed (duplicated links).

* test(web): align /home tests with restructured setup wizard

- Replace test_getting_started_card_renders_on_home with
  test_setup_section_renders_for_not_onboarded — asserts the new
  setup-section-header floats above the install hero and Getting
  Started markup is absent (block removed in the prior commit).
- Update automode-block test to match labels without the
  `Step N —` prefix.
- Update setup-CTA partial test to match the relabeled
  "Copy install script to clipboard" button.

Drop orphaned CSS for `.home-getting-started`, `.home-gs-summary*`,
and `.home-gs-item` — selectors had no matching markup after the
Getting Started block was removed.

Also: Step 3 `pwd` expected-output uses an absolute path
(`/Users/yourname/Desktop/{workspace_dir}`) instead of the
tilde-prefixed form, matching what the command actually prints.

* fix(web): repaint home_onboarded + setup_advanced; align CTA label

- home_onboarded + setup_advanced still carried the retired blue
  `#0056A3` as both `--hp-primary-dark` and the hero gradient
  endpoint. Both reference `var(--primary-dark)` now so the green
  palette cascades.
- setup_advanced YOLO snippet was the old `alias` form (no cd, no
  arg forwarding). Replaced with the shell function variant from
  /home Step 6 — drops into ~/Desktop/{workspace_dir} and forwards
  "\$@" (unix) / @args (Windows).
- setup_advanced ~/{workspace_dir} path references cascaded to
  ~/Desktop/{workspace_dir} so install story matches /home.
- Dashboard's "Setup a new Claude Code" button label aligned to the
  canonical "Copy install script to clipboard" — matches /home and
  the new docstring in _claude_setup_cta.jinja, which now mandates
  this label across consumers.

* fix(web): keep base brand blue; scope green palette to /home redesign

User noticed login + dashboard had turned green when the /home
redesign flipped --primary from blue (#0073D1) to green (#2ea877)
in commit 278f202e. The brand-wide flip went further than the
redesign needed — only /home, /home (onboarded), and /setup-advanced
intentionally use the green/navy spec; every other page (login,
dashboard, catalog, marketplace, admin, profile) was just inheriting
the green because --primary cascaded everywhere.

Revert the global brand colour to blue and lock the green into the
two outstanding redesign scopes:

- style-custom.css: --primary back to #0073D1, --primary-light back
  to rgba(0,115,209,0.1), --primary-dark back to #005BA3,
  --brand-accent back to a lighter blue.
- home_onboarded.html: .home-mock now sets --hp-primary,
  --hp-primary-dark, --hp-primary-light to explicit green hex
  (matching home_not_onboarded), so the hero stays green regardless
  of the global brand.
- setup_advanced.html: same lock — .advanced-mock pins the green
  palette in-scope.

Hero gradients on both pages now reference the local --hp-primary
chain (not the global --primary), so any future palette tweak inside
either scope cascades correctly without disturbing the rest of the app.

* refactor(web): hoist --hp-* into shared design-tokens.css (--ds-*)

PR 2 of the design-system extraction ladder. Pure mechanical rename
+ dedup; no visual diff on any rendered page (verified on /home,
/dashboard).

- New app/web/static/css/design-tokens.css declares the full token
  set on :root: brand surface (green primary, primary-dark, mint
  light, brand-accent), hero (navy bg + ink), code-panel (near-black
  bg + cool ink + warm-yellow), light surfaces (bg/surface/border),
  text (primary/secondary/muted), orange accent, info + warn
  callout vocabularies, navy-tinted elevation shadows, system font
  stack + mono.
- base.html loads it alongside style-custom.css so the tokens are
  globally available.
- Rename --hp-* -> --ds-* in home_not_onboarded (313 refs),
  home_onboarded (15), setup_advanced (39). 367 token references
  pointed at one of three local blocks; now all point at the
  global :root.
- Drop the three local token blocks. Each scope class
  (.home-mock / .advanced-mock) only keeps its base ink + font-size
  + line-height rules.

The legacy --primary family stays canonical for the blue base
brand — login, dashboard, catalog, marketplace, admin still read
blue. The design system is opt-in via the scope class.

* refactor(web): extract shared components.css; migrate /home markup

PR 3 of the design-system extraction ladder. First batch of
reusable components lifted out of home_not_onboarded.html into a
new shared stylesheet; markup migrated to consume them.

- New app/web/static/css/components.css with five components, all
  reusable on any page that loads design-tokens.css:
    .callout-rec        — amber lightbulb recommendation box
    .callout-hint       — blue info hint box
    .code-output        — "WHAT YOU SHOULD SEE" terminal output block
    .lightbox           — full-bleed image enlarge overlay
    .setup-section-header — wizard header (eyebrow + h2 + lede)
- base.html loads components.css after design-tokens.css.
- home_not_onboarded.html markup renamed:
    class="rec"             -> class="callout-rec"
    class="hint"            -> class="callout-hint"
    class="expected-output" -> class="code-output"
- Local CSS rules removed from home_not_onboarded.html for each of
  the extracted components — ~150 lines down to 5-line "extracted to
  components.css" comments. The bespoke wizard-specific styles
  (.install-cmd, .os-tabs, .mode-tabs, .terminal-frame) stay
  template-local for now since they only have one consumer.

Visual regression check: /home install hero renders the amber rec
callout, blue hint callout, dashed code-output block, green section
header, and click-to-enlarge VS Code thumb identically to the
pre-extraction render. 43 home tests pass.

* fix(web): unify page-headers — activity-center full-width, marketplace shares box

- /activity-center audit-log hero rendered as half-width because the
  _page_hero include was inside <header class="obs-topbar">, a flex
  row that pinned the time-range + auto-refresh controls next to it.
  The hero is now a sibling rendered before the <header>, so it
  spans the full container width like every other admin page; the
  controls keep their flex row underneath.
- Marketplace hero unified with .page-header--hero. Markup is now
  <section class="page-header page-header--hero mp-hero"> so the
  shared box drives padding/radius/gradient/max-width/shadow; the
  .mp-hero override block only carries the right-anchored cover
  image and the rules for the search row + scope checkboxes (which
  the canonical hero doesn't have). Inner text uses the canonical
  .page-header__eyebrow / __title / __subtitle classes.
- .page-header--hero shadow tint now follows the brand blue
  (rgba(0, 115, 209, 0.2)) instead of the leftover green from the
  prior palette flip; same depth highlight everywhere the gradient
  is blue.

* fix(web): unify remaining page heroes — admin, profile, install, store, stack

Sweep across pages that carried bespoke gradient hero markup so
every page-hero shares the canonical `.page-header--hero`
dimensions (padding 28/32/24, border-radius 14, max-width
var(--width-app), navy-tinted shadow, gradient with --primary →
--primary-dark). Inner text uses the .page-header__eyebrow /
__title / __subtitle classes so typography matches across the app.

- admin_tables: migrated to _page_hero.html include.
- admin_tokens: kept .tokens-hero wrapper for the counts-chip row
  but added the canonical class on the same element; stripped
  duplicate gradient + padding + typography rules.
- install: same pattern (kept hero-meta pill row).
- profile: migrated to _page_hero.html include.
- store_upload: kept .upload-hero wrapper for the .meta chip row;
  composite class with the canonical hero.
- setup_advanced: .advanced-mock .ad-hero now matches canonical
  dimensions; green palette retained via --ds-primary/dark.
- stack_card.css: .stack-hero (catalog + corporate-memory search
  hero) uses canonical gradient + padding + max-width.

The detail-page heroes (marketplace_plugin_detail,
marketplace_item_detail, catalog_*_detail, store_edit,
admin_group_detail, admin_store_submission_detail) stay bespoke
for now — they're rich detail headers with photos, badges, install
actions; converting them would lose contract context. Same applies
to dashboard.html env-setup-cta (it's a CTA card, not a page hero).

* fix(web): canonicalise .container — single page shell every page inherits

Previously each admin page set its own `.container:has(.<page>)
{max-width: none}` + `.<page>-page {max-width: 1400px}` override,
and per-page hero markup either nested inside flex toolbars (which
pinned the hero next to filter controls and squeezed it half-width)
or self-constrained with a different max-width than the page. /home,
/dashboard, /marketplace, and /admin/* ended up at different widths
with different nav-to-hero gaps.

- style-custom.css `.container` now carries the canonical 1280px
  max-width + `16px 32px 48px` padding so every page inherits the
  same nav-to-hero gap and side gutters. `.container > main` is
  margin/padding 0 so the container is the sole owner of gutters.
- `.page-header--hero` drops its self-constraining max-width and
  auto-centering margin — the container provides the width, so the
  hero sits flush with the table/toolbar below it.
- `.stack-hero` (catalog + corporate-memory) and `.advanced-mock
  .ad-hero` (/setup-advanced) follow the same pattern: container
  owns the width.
- Per-page max-width overrides stripped from admin_users,
  admin_access, admin_groups, admin_marketplaces, admin_welcome,
  admin_workspace_prompt.
- _page_hero include extracted from inside flex toolbars on
  admin_users, admin_access, admin_groups, admin_marketplaces,
  admin_server_config, admin_welcome, admin_workspace_prompt,
  admin_sessions, admin_session_detail, admin_usage,
  activity_center. The toolbar (`.users-toolbar`, `.gp-toolbar`,
  etc.) keeps only the filter + action controls; hero renders
  before it as a sibling.
- _page_chrome.html trimmed to just the page-background tint for
  the redesign scopes; the duplicate `.container` rules it carried
  are now redundant.

Verified: /home, /admin/marketplaces, /admin/users all render
container width 1280px with hero top at 88px (16px below the
72px-tall sticky nav). Same spacing as /home design spec.

* fix(web): admin_tables + admin_corporate_memory inherit canonical .container

Both pages were overriding `{% block layout %}` from base.html,
which bypasses the canonical `.container` wrapper. Result: hero
span the full viewport (1596px on a wide screen) while the inner
content sat at a narrower max-width — hero and content didn't
align, and the nav-to-hero gap differed from every other admin
page.

Switched both templates to `{% block content %}` so they render
inside the canonical `.container` from base.html — same path as
admin_groups, admin_users, admin_marketplaces, etc.

- admin_tables: dropped local `.page-title { max-width: 1600px }`
  + `.content { max-width: 1600px }` overrides (kept typography +
  inner gutter rules) and the mobile padding overrides that paired
  with them. Container now owns the gutters.
- admin_corporate_memory: only the block keyword needed changing;
  the template already had a clean inner structure (no max-width
  override on `.container-memory`).

Verified on /admin/tables and /admin/corporate-memory:
- .container width 1280, padding 16/32/48
- Hero top 88 (nav 72 + container padding-top 16)
- Hero + content both 1216px wide, both at left 190 — perfect
  alignment with /admin/groups.

* fix(web): drop .page-shell padding override + admin_tables stale :root

Two regressions discovered after the canonical-container unification:

1. `.container:has(.page-shell)` still set `padding: 28px 32px 48px`
   while the canonical `.container` had moved to `16px 32px 48px`.
   Every page-shell consumer (/admin/sessions, /admin/sessions/<id>,
   /admin/usage, /marketplace, /dashboard, marketplace detail pages,
   /me/activity, /store/*, /admin/store-submissions) was rendering
   with a 28px nav-to-hero gap while /admin/users + /admin/groups
   rendered with 16px. Same width, mismatched vertical rhythm.
   The opt-in rule is now a no-op marker: canonical container
   already provides 1280px + 16/32/48 + main margin/padding 0.

2. admin_tables.html had a stale `<style>` block that re-declared
   `:root { --primary: var(--primary); ... }`. The self-referential
   token resolved to empty, collapsing the page-header hero's
   `linear-gradient(135deg, var(--primary), var(--primary-dark))`
   to no background — the hero appeared as a pale ghost without
   colour. The entire shadow `:root` block was a stale copy of the
   design tokens that style-custom.css already provides. Dropped
   it; tokens now resolve from the global `:root`.

After both fixes /admin/sessions, /admin/tables, and every other
page-shell consumer match /admin/groups exactly: container 1280px,
container padding-top 16px, hero at top 88px / left 190px / width
1216px.

* fix(web): drop /admin/tokens .tokens-page width + padding override

`.tokens-page` carried its own `max-width: 1280px; margin: 0 auto;
padding: 28px 8px 48px` block — the canonical `.container` already
provides width + 16/32/48 padding, so the nested wrapper was
adding 28px on top of the container's 16px (= 44px nav-to-hero
gap, vs 16px on every other admin page) and shrinking the hero
sideways by 8px on each side (1200px vs the canonical 1216px).

After: container owns the layout; `.tokens-page` is just a
font-family scope. /admin/tokens hero now sits at top 88, left 190,
width 1216 — same numbers as /admin/groups / /admin/users.

* fix(web): hero links readable on blue; /admin/access Groups link href

- New `.page-header--hero a` rule in style-custom.css forces any
  anchor inside a gradient hero to render white + underlined so
  links stay readable on the blue background. Previously links
  inherited the global `var(--primary)` blue, which disappeared
  on top of the matching blue gradient. No per-page class needed —
  drop a plain `<a>` in any hero subtitle and it just works.
- /admin/access hero subtitle was Jinja-passing the inline link
  with HTML-entity-encoded quotes (`href=&quot;...&quot;`). The
  entities decoded to literal `"` characters inside the rendered
  href, producing `/admin/%22/admin/groups%22` — a 404. Switched
  the `set` to a block-set (`{% set page_hero_subtitle %}...{% endset %}`)
  so the inline `<a href="/admin/groups">Groups</a>` survives
  unescaped through `_page_hero.html`. Also stripped the now-redundant
  inline `style="color:#fff;text-decoration:underline;"` — the new
  shared rule handles it.

* fix(web): /dashboard top padding matches every other page

`.main` on /dashboard had `padding: 28px 32px 48px` while every
other page now uses `16px 32px 48px` via the canonical
`.container`. Dashboard bypasses `.container` (overrides
base.html's `layout` block to render a full-width `<main>`
directly), so the padding lives on `.main` itself — bumped the
top to 16px to match.

After: first child top = 88, left = 190, width = 1216 — same
numbers as /admin/groups / /admin/users / /admin/marketplaces.

* fix(web): green eyebrow + white title on .page-header--hero (matches /home)

`.page-header--hero .page-header__eyebrow` was faint white
(rgba(255,255,255,0.75)) — readable but unbranded against the blue
gradient. Changed to `var(--ds-brand-accent)` (mint green #54d3a0)
so every page hero pairs a green eyebrow with white title +
subtitle, echoing /home's setup-section header (green eyebrow,
dark heading combo). One CSS rule applies everywhere — no
per-page styling needed.

Also bumped the eyebrow to font-weight 700 / letter-spacing 1.2px
so the green stands out cleanly against the gradient.

* fix(web): page-header--hero + stack-hero use /home navy gradient

`.page-header--hero` and `.stack-hero` were on the brand-blue
gradient (`var(--primary)` → `var(--primary-dark)`) while
/home's hero (`.home-hero-intro`) sits on the deeper navy
gradient (`#0f1b3a` → `#1a2a5f`). Every other page-hero now
uses that same navy gradient so /home, /marketplace, /catalog,
/corporate-memory, /admin/*, /profile, /install, /dashboard,
/setup-advanced share one brand surface. Shadow tint adjusted
to the navy depth (rgba(15, 27, 58, 0.22)).

Brand blue stays the link/CTA colour everywhere else; only the
hero box itself is navy.

* fix(web): primary buttons green; marketplace tabs navy translucent

Two parity tweaks pulling the rest of the app toward /home's
visual language.

- `.btn-primary` (both rules in style-custom.css) now uses
  `var(--ds-primary)` / `var(--ds-primary-dark)` green fill,
  matching the "Copy install script to clipboard" button on
  /home. Brand-blue `--primary` still drives link colour and the
  accent surface; only the filled button background flipped to
  green. Every page with a `.btn-primary` (admin "+Add user",
  "+Add marketplace", catalog, marketplace actions, dashboard,
  modals) now reads as the same "do it" affordance.
- `.mp-tabs` (Curated Marketplace / Flea Market / My Stack tab
  group) now sits on the navy `--ds-hero-bg` with translucent
  white pills (rgba(255,255,255,0.10) inactive, 0.18 active) —
  same translucent-white-on-navy treatment as the "Just browse —
  no install needed" pill on /home. Icons render as soft white;
  per-tab colour-coding dropped in favour of the unified surface.

* fix(web): catalog/memory tabs + empty-state CTA + admin action buttons

Bring /catalog and /memory in line with /home + /marketplace:

- `.stack-tabs` (Browse / My Stack / Recipes on /catalog,
  Browse / My Stack on /memory) now uses the navy `--ds-hero-bg`
  container with translucent-white-on-navy pills, mirroring the
  `.mp-tabs` treatment and /home's "Just browse — no install
  needed" CTA pill. Per-tab icon colour-coding dropped — icons
  render as soft white on the navy fill.
- `.stack-tabs-row__actions .btn` (right-slot "+New Recipe",
  "+New Data Package" admin CTAs) now uses green primary fill
  (`--ds-primary`), matching `.btn-primary` and /home's
  "Copy install script to clipboard" button.
- `.stack-empty .cta a` (empty-state action button — the
  "Open /admin/tables →" CTA on /catalog and equivalent on
  /memory) flipped from blue `--primary` to green `--ds-primary`
  so the colour aligns with every other primary button in the app.

* fix(web): marketplace Search button green (--ds-primary) matching other CTAs

* fix(web): unify Search button + admin-action button across browse pages

- Added Search button (`<button class="stack-hero__search-btn">`)
  to /catalog and /memory heroes — same green pill as /marketplace.
  Wired to the existing live-filter pipeline (button click runs
  `applyFilters()` and refocuses the input). All three browse pages
  now wear the identical search bar UI.
- `.stack-hero__search-btn` shares `--ds-primary` fill with
  `.mp-hero .search-btn`.
- `.mp-actions .btn` ("Submit a skill or plugin" CTA on /marketplace)
  flipped from the legacy blue-outline to the same green primary
  fill + dimensions (`display: inline-flex; line-height: 1;
  padding: 9px 16px; gap: 6px`) as `.stack-tabs-row__actions .btn`
  on /catalog and /memory. All three right-slot action buttons
  render at identical height now.
- `.stack-tabs-row__actions .btn` got `inline-flex` + `line-height: 1`
  + `gap: 6px` so a `<button class="btn">` and a `<a class="btn">`
  both render at exactly 33px high — the embedded
  `.admin-only-hint` chip no longer pushes one variant taller
  than the other.

* fix(web): marketplace guide CTAs green (fastpath + primary); drop flea purple

* fix(web): dashboard CTA hero on navy; readable <code> chips in hero

- `.env-setup-cta` on /dashboard ("Set up a new Claude Code"
  card) flipped from the brand-blue gradient + green-tinted shadow
  to the canonical navy gradient (`--ds-hero-bg` → `#1a2a5f`) with
  navy-tinted shadow + 14px radius + 28/32/24 padding, matching
  `.page-header--hero` and /home's `.home-hero-intro`. Dashboard's
  top CTA now sits on the same brand surface as every other hero.
- Added `.page-header--hero code` rule — translucent white pill +
  warm-yellow ink (#ffd866) so `<code>` chips embedded in hero
  subtitles read as code samples against the navy gradient. The
  global `code` rule sets `color: var(--text-primary)` (dark),
  which turned in-hero chips into invisible dark-on-white-on-navy
  ghosts (e.g. the `-by-dev` suffix on /store/new).
- /store/new's `.page-header__subtitle code` dropped its inline
  style override — the shared rule handles it now.

* feat(web): two-theme switching via data-theme + admin toggle

Introduces a theme system that flips the entire UI palette between
"navy" (current design, default) and "blue" (pre-redesign palette)
via a single `<html data-theme="...">` attribute. Page markup, class
names, and component styles don't change — only the `--ds-*` token
values flip.

Backend
- New `app/instance_config.py::get_instance_theme()` resolves the
  active theme from `AGNES_INSTANCE_THEME` env > `instance.theme`
  in instance.yaml > default "navy". Unrecognised values clamp to
  "navy" so a typo doesn't break the page.
- `app/web/router.py::_build_context` injects `instance_theme`
  alongside `instance_brand` etc. so every template inherits it.
- `app/web/templates/base.html` renders
  `<html lang="en" data-theme="{{ instance_theme | default('navy') }}">`.

CSS
- `app/web/static/css/design-tokens.css` adds two new tokens to
  the default `:root` set: `--ds-hero-shadow` (drop-shadow tint
  on hero boxes) and `--ds-hero-eyebrow` (eyebrow accent colour).
  Plus a `:root[data-theme="blue"]` override block that flips
  seven tokens: `--ds-primary`, `--ds-primary-dark`,
  `--ds-primary-light`, `--ds-brand-accent`, `--ds-hero-bg`,
  `--ds-hero-bg-deep`, `--ds-hero-shadow`, `--ds-hero-eyebrow`.
  The blue theme aliases the brand surface tokens back to the
  legacy `--primary` family.
- `.page-header--hero`, `.stack-hero`, `.env-setup-cta`,
  `.home-mock .home-hero-intro` now reference the new
  `--ds-hero-shadow` and `--ds-hero-bg-deep` tokens instead of
  hard-coding `rgba(15, 27, 58, 0.22)` and `#1a2a5f` — gradient +
  shadow now flip with the theme.
- `.page-header--hero .page-header__eyebrow` uses
  `var(--ds-hero-eyebrow)` so the eyebrow goes mint-green on
  navy and translucent-white on blue (mint on blue reads poorly).

Admin
- `app/api/admin.py::_KNOWN_FIELDS["instance"]` now registers a
  `theme` field of kind `select` with options `["navy", "blue"]`
  and a `hint` explaining the trade-off. The existing
  /admin/server-config UI auto-renders a select for this — no
  template changes needed.

Defaults
- Default value is "navy" so existing instances see no visual
  change. Admins flip to "blue" via /admin/server-config to
  restore the pre-redesign look.

Restart note: uvicorn must reload to pick up the Python changes
(new getter, new template-context key, new known-field). CSS
changes hot-reload via browser refresh.

* fix(web): blue theme — home hero eyebrow + CTA contrast

`.home-hero-intro .eyebrow` and `.btn-intro-primary` referenced
`--ds-brand-accent` directly, which on the blue theme resolves to
the lighter brand-accent blue (#4F9DEB). Result: light-blue eyebrow
on the blue gradient ("WELCOME, ADMIN" barely readable) and a
light-blue button with darker-blue text ("Set up in ~15 min")
that all sat in the same hue range.

Introduces three new theme-aware tokens:
- `--ds-hero-eyebrow` already existed; blue theme bumped opacity
  to 0.92 so the eyebrow reads as full white.
- `--ds-hero-cta-bg` + `--ds-hero-cta-fg` + `--ds-hero-cta-bg-hover`
  flip the primary hero CTA: mint-green on navy (default), white-
  on-blue under `data-theme="blue"`.

`.home-hero-intro .eyebrow` now uses `--ds-hero-eyebrow` (mint on
navy / white on blue) and `.btn-intro-primary` uses the CTA token
trio.

Recommended palette on blue theme:
- Eyebrow: white at 92% opacity (clear on the blue gradient).
- Primary CTA pill: white background, brand-blue dark text
  (`--primary-dark` = #005BA3) for AAA-level contrast.
- Secondary CTA: translucent white pill (unchanged).

* fix(web): blue theme — callout-hint info bg/border/ink re-tinted to brand blue (was indigo, clashed with brand-blue hero)
2026-05-21 06:19:16 +00:00
Vojtech
ae67c40a81
fix(onboarding): /home install flow + agnes init UX hardening (#350)
* fix(web): /home Step 2 recommends --dangerously-skip-permissions for setup

The Step 4 paste runs ~20 shell commands (CLI install, workspace
bootstrap, marketplace clone, MCP register, connector logins). Previous
Step 2 recommended auto-accept-edits via Shift + Tab, which covers file
edits but not Bash — users still clicked ~20 Yes prompts during setup.

Step 2 now leads with `claude --dangerously-skip-permissions` as the
recommended session flag (Bash + edits both skip). Session-scoped, drops
on next plain `claude` — safe here because the pasted script is
generated by this server and ends after a fixed sequence; the flag does
not weaken future Claude sessions.

Auto-accept-edits via Shift + Tab kept as the strict-review fallback;
persistent YOLO allowlist link to /setup-advanced#yolo unchanged.

* fix(web): swap /home Steps 2↔3, claude --yolo as copy-button command

Folder creation moves to Step 2; Step 3 launches Claude from that
directory with `claude --dangerously-skip-permissions`. The YOLO flag
is rendered through the standard .install-cmd + copy-button affordance
(matching Step 1 + Step 2), not inline prose. Step 4 paste runs ~20
shell commands that auto-accept-edits would not cover (Bash still
prompts), so the YOLO flag is the default recommendation; session-
scoped, drops on next plain `claude`.

Setup script's pwd-check warning copy refreshed to reference "/home
Step 2" (the new folder-creation step number).

# Conflicts:
#	CHANGELOG.md

* fix(web): open YOLO setup-advanced link in new tab

Step 3 install-hero's persistent-YOLO link now opens /setup-advanced#yolo
in a new window so users don't lose their /home install context mid-
setup. target="_blank" + rel="noopener" (no reverse-tabnabbing).

* fix(web): merge /home Step 3 fallback prose into prior paragraph

Drop the <br><br> between the 'Session-scoped' line and the 'Prefer
reviewing each command' line so the strict-review fallback flows on
the same paragraph — less vertical space in the install-hero block.

* docs(web): add "What leaves your machine" privacy callout on /home

Install-hero lead now includes a short privacy paragraph: explains that
session telemetry (prompts / tool-calls / tool-responses) flows back to
the central catalog for failure-pattern analysis while raw data rows
the user queries locally stay on their machine. Points at /agnes-private
as the per-session opt-out.

Also collapses leftover cherry-pick conflict markers in CHANGELOG.md
into one clean [Unreleased] section.

* fix(init): harden agnes init UX — 5 issues from David's report

1. chmod +x hooks. agnes init + agnes refresh-marketplace --bootstrap
   now set the execute bit on every .sh they land on disk
   (`<workspace>/.claude/hooks/*.sh` after init; every `.sh` under the
   `~/.agnes/marketplace` clone after a bootstrap/pull). Git checkout
   doesn't always preserve filemode (filemode=false repos, ZIP
   extractions), so hooks were firing with "Permission denied" — silent
   SessionStart / PreToolUse breakage. Best-effort, no-op on Windows.

2. --token-file + AGNES_TOKEN. agnes init now accepts `--token-file
   <path>` and an `AGNES_TOKEN` env fallback alongside `--token`.
   Precedence: --token > --token-file > AGNES_TOKEN. The file / env-var
   paths dodge Claude Code's auto-classifier, which sometimes flags a
   long bearer token in `--token "eyJ..."` command line as a credential-
   exfil pattern. The pasted setup script now uses `--token-file
   ~/.agnes/token` (token written via single-quoted heredoc, umask 077)
   for the same reason.

3. Bash(agnes *) in allow. Default `.claude/settings.json` permissions.
   allow seeded by agnes init now includes `Bash(agnes *)` alongside the
   bare `Bash` entry, so Claude Code's classifier sees an explicit allow
   for subsequent `agnes <verb>` calls inside the workspace it just
   bootstrapped.

4. .zshrc PATH dedup. Setup-script step 1's PATH-persist snippet
   (no-CA install path) replaced with a `grep -qF + ||` idiom so a
   re-run doesn't append a duplicate `export PATH=...` line. Fixed-
   string match (not regex) per the dedup-bug report.

5. `!` prefix doc note. Setup-script step 3 now explicitly tells the
   user: if Claude Code blocks an `agnes` command, prefix it with `!`
   (e.g. `! agnes init …`) to run the command directly in the shell,
   bypassing the auto-classifier.

* release: 0.55.1 — /home onboarding install-hero rework + agnes init UX hardening

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-19 15:26:35 +02:00
Vojtech
bd90485dbd
fix(web): setup script step 2 checks pwd, no auto-mkdir (#344)
The /home onboarding page already has a visible manual "Step 3 — create
your workspace folder" instructing the user to `mkdir -p ~/<dir> && cd
~/<dir>` BEFORE pasting the install script into Claude Code. The pasted
script's step 2 then re-ran the same mkdir+cd, which silently overrode
an intentional alternate install path (e.g. user cd'd to
~/work/agnes-prod on purpose) and was redundant on the default path.

Step 2 now verifies the user is in `$HOME/<workspace_dir>` via `pwd`.
On mismatch it stops and asks the user to either re-paste from the
correct folder or reply `install here` to accept the current cwd.
Never auto-creates a folder.

Step 9 (restart Claude Code) references the install directory confirmed
in step 2 instead of a hardcoded `~/<workspace_dir>`, so users on a
custom path see accurate guidance.
2026-05-19 11:20:33 +02:00
Vojtech
79a958ec26
feat(setup): configurable instance brand + connector setup overhaul (#268)
- instance.brand (env AGNES_INSTANCE_BRAND, default "Agnes") +
  instance.workspace_dir replace hard-coded "Agnes" / "~/Agnes" across
  /home, /setup, /setup-advanced, /login, /install, /me/debug, and the
  Claude Code clipboard setup script. Terraform-friendly env override;
  defaults preserve existing Agnes branding.

- Explicit "create workspace folder" step on /home (OS-tabbed mkdir+cd)
  + same step baked into the clipboard script as step 2. Drops the
  implicit assumption that `agnes init --workspace .` lands in a
  sensibly-cd'd shell.

- Final "Restart Claude Code" step in the setup script (unconditional,
  between connectors and Confirm) so freshly-installed plugins, MCP
  servers, and SessionStart hooks load on the next Claude Code session.

- Asana reverted from hosted Remote MCP back to PAT + raw REST against
  app.asana.com/api/1.0. MCP envelope shape consumed ~5x tokens per
  call; the PAT path lets the agent read flat REST fields. Existing
  MCP registration is detected and the user is asked whether to remove
  it (default Y, with benefits listed: token cost, no third-party hop,
  no OAuth refresh dance, deterministic envelope shape).

- Atlassian connector instructs picking the longest API-token expiry
  (today "1 year") to cut re-mint friction. No public query-parameter
  hook exists on id.atlassian.com to pre-select expiry, so the prompt
  documents the manual click and acknowledges that limitation.

- Uniform  /  per-connector marker contract (Asana, GWS, Atlassian)
  for the Confirm summary to grep. Each connector now ends with a
  Claude-driven end-to-end test that uses Claude Code's own bash to
  exercise the stored credential and prints
  " <Connector> integration verified — ..." (or the failure variant).
2026-05-12 17:10:08 +02:00
Vojtech
c09c85d13a
fix(cta): clipboard fallback + fold Atlassian MCP into connectors (#249)
* fix(cta): fall back to textarea+execCommand when Clipboard API rejects

The "Setup a new Claude Code" CTA fetches /auth/tokens, parses the JSON
response, renders the setup script, THEN calls
`navigator.clipboard.writeText()`. Modern browsers (Safari, Firefox, and
Chrome on stricter configurations) reject `writeText` with
NotAllowedError when transient user activation has been consumed by an
intervening `await` — which is exactly the case here. Users perceived
this as "the browser blocked the copy" and got the manual-paste fallback
modal even though the textarea + `document.execCommand('copy')` path
WOULD have worked synchronously without needing fresh user activation.

`copyToClipboard` now:
- prefers the modern Clipboard API (unchanged for the happy path)
- on writeText rejection, falls back to `copyViaTextarea` instead of
  surfacing the rejection to the caller's catch block.

`copyViaTextarea` is the previously-inline textarea fallback factored
out into a named helper, with two small hardening touches:
- `readonly` + `tabindex=-1` so the hidden textarea doesn't steal
  focus or pop the virtual keyboard on mobile.
- explicit `setSelectionRange(0, text.length)` to belt-and-braces the
  selection on iOS Safari (where `.select()` alone sometimes selects
  zero chars on touch-focused textareas).

Only the CTA button needed this — the Step-1 install-command and the
connector-copy buttons all call `writeText` synchronously inside the
click handler (no awaits in between), so they keep their existing
user-gesture context and didn't hit the same rejection. No template
changes there.

* refactor(home): fold Atlassian MCP registration into connectors block

The standalone "Register the Atlassian MCP server" step (was step 6 in
the unified setup script) moves INTO the Atlassian connector's prompt
body so all Atlassian-related setup lives in one logical group. Same
intent that #247 carried for connectors, applied one level deeper:
the hosted Remote MCP registration is part of "set up Atlassian", not
its own ungrouped step.

What changed:
- `app/web/connector_prompts.py` — the Atlassian prompt's step 5
  replaces the speculative "Register the on-demand Atlassian MCP under
  .claude/mcp/atlassian" line with the actual hosted Remote MCP
  registration: `claude mcp add --transport sse atlassian
  https://mcp.atlassian.com/v1/sse || true`. The `|| true` keeps re-runs
  idempotent and the body explains the OAuth-on-first-use contract.
  Both /home's Atlassian tile and the inlined setup-script Atlassian
  sub-block emit this line — single source of truth holds.
- `app/web/setup_instructions.py` — `_mcp_servers_block` deleted; the
  `mcp_servers` step is removed from `_step_numbers`; resolve_lines no
  longer calls it.
- Renumbering: install (1), init (2), catalog (3), preflight (4),
  marketplace (5), diagnose (6), connectors (7), confirm (8). Was:
  6 = mcp_servers, 7 = diagnose, 8 = connectors, 9 = confirm.
- `tests/test_setup_instructions.py` — Confirm step 9→8, Connect 8→7,
  diagnose 7→6, mcp_servers references dropped.
  `test_step_numbering_with_connectors_step` now asserts
  `"mcp_servers" not in steps`. Stray-Confirm assertion lists shift
  by one position.
- `tests/test_setup_page_unified.py` + `tests/test_web_ui.py` — same
  step-number shifts in the rendered /setup preview assertions.

The `claude mcp add` line is still the Atlassian Remote-MCP path that
the 2026-05-10 init-report Fix C added — only its position in the
flow changes. /home Atlassian tile copying continues to install the
MCP too (the prompt body the tile pastes contains the same line).
112 tests pass.

* feat(atlassian): operator-overrideable base URL via AGNES_ATLASSIAN_BASE_URL

Adds an env var / YAML key the operator (Terraform module, customer-VM
template, OSS instance.yaml) can set to bake the Atlassian Cloud site
root into the connector prompt — so end users don't have to guess /
paste their org's `https://<myorg>.atlassian.net`.

When set, the Atlassian connector prompt (rendered on both /home tile
and inlined into the setup-script step 7 Atlassian sub-block) replaces
step 1's "Ask me for my Atlassian Cloud site URL and email" with a
one-line note that the URL is already provisioned by the operator and
asks only for the email. Step 4's helper-script body has the
`BASE_URL='<the site URL I gave you>'` placeholder substituted with
the literal value. When unset (empty), the existing "ask the user"
flow remains — no regression for OSS instances.

Resolution + normalization in `get_atlassian_base_url()`:
- env `AGNES_ATLASSIAN_BASE_URL` > yaml `instance.atlassian.base_url` > ""
- strips trailing slash + trailing `/wiki` so the canonical value is
  the bare site root. Matches the per-user helper script's
  normalization at storage time (atlassian_prompt step 4 guard 2), so
  the literal baked in by the operator stays consistent with what the
  user's helper script would have computed from their input.

Plumbing:
- `app/instance_config.py`: new `get_atlassian_base_url()` resolver.
- `app/web/connector_prompts.py`:
  - `atlassian_prompt(*, base_url: str = "")` — string-replace two
    explicit placeholder phrases when base_url is truthy; otherwise
    return the prompt unchanged.
  - `all_connector_prompts(..., atlassian_base_url: str = "")` —
    forwards the kwarg.
- `app/web/router.py` (`_build_context`): reads
  `get_atlassian_base_url()` and passes it through to
  `all_connector_prompts(...)` so both the /home tile context AND the
  inlined-script `resolve_lines(...)` call use the same value.
- `src/welcome_template.py` (`compute_default_agent_prompt`): same
  threading via the existing import-on-demand path.

Tests (`tests/test_home_route_resolution.py`):
- `get_atlassian_base_url` resolver: default empty, env override,
  trailing-slash strip, trailing-`/wiki` strip.
- `atlassian_prompt(base_url=...)`: literal URL baked in, ask-step
  removed, placeholder replaced, operator-baked-in copy appears.
- `atlassian_prompt(base_url="")`: existing ask-the-user flow
  unchanged.
- `all_connector_prompts(atlassian_base_url=...)`: kwarg threads
  through to the rendered atlassian prompt.

135 tests pass.

* feat(asana): register hosted Asana Remote MCP in connector prompt

The Asana connector prompt only stored a PAT in the OS keychain + ran
a curl verify against /api/1.0/users/me. That set Claude Code up for
direct `curl` calls but didn't actually wire Asana into Claude's tool
list — so the user couldn't ask Claude to "find my open Asana tasks"
and have it work. Symmetric oversight to the Atlassian connector's
original speculative `.claude/mcp/atlassian` line that this branch
already replaced with `claude mcp add --transport sse atlassian
https://mcp.atlassian.com/v1/sse`.

Adds a new step 5 that registers Asana's hosted Remote MCP:

  claude mcp add --transport http asana https://mcp.asana.com/mcp || true

This is the V2 endpoint (streamable HTTP transport, launched February
2026). The V1 SSE endpoint at https://mcp.asana.com/sse was deprecated
2026-05-11 (today) and must NOT be used — calling it out explicitly
in the prompt body so a future operator who finds an old reference
doesn't paste the dead URL. OAuth is handled by Claude Code at first
use, same model as the Atlassian MCP step.

The PAT stored in step 3 stays for direct `curl` calls (precheck +
ad-hoc scripts) — the MCP path uses its own OAuth grant, not the PAT.
Old step 5 (revoke instructions) renumbers to step 6 and adds the
`claude mcp remove asana` cleanup hint.

Same single-source-of-truth invariant holds: /home Asana tile + the
inlined Asana sub-block in the setup script (step 7 connectors) both
emit identical text from `asana_prompt()`.

71 tests pass.

* feat(asana): drive MCP OAuth login + end-to-end validation post-register

`claude mcp add --transport http asana ...` only registers the
server in Claude Code's local config — it does NOT trigger OAuth.
The browser tab opens the first time any `mcp__asana__*` tool gets
invoked. So the previous step 5 left a user looking at a "registered"
MCP that, in practice, hadn't authed yet and would fail on first
real use. Same blind spot Atlassian's prompt also has, but Asana was
the one called out in the latest review pass.

Adds a new step 6 between MCP registration (step 5) and the revoke
instructions (now step 7):

  a. Tell the user verbatim what's about to happen — a low-impact
     read through the MCP will pop the OAuth browser tab; sign in
     with the same account whose PAT they stored in step 3 and
     approve. Frames the OAuth as one-time so users don't wait
     for it on every later call.
  b. Drive an actual MCP read. Don't prescribe the exact tool name
     because the Asana MCP's exposed surface (`mcp__asana__*`) is
     versioned upstream and we don't want to pin to a name that
     gets renamed. Instead: tell Claude to pick the lightest read
     from its surfaced tool list (users-me / list-workspaces /
     equivalent). Document the recovery path when Claude Code
     times out waiting for the OAuth tool use: `claude mcp list`
     to confirm registration before retrying.
  c. Print a single one-line proof that combines wiring + auth:
     "Asana MCP connected as <name> — <N> workspace(s) visible."
     Explicit anti-echo callout for tokens, task content, comments.
     On failure, surface the exact Claude-Code error and stop —
     no silent pass.
  d. Sanity-check that the MCP OAuth identity and the PAT identity
     reference the same Asana account. Easy mistake to make when
     the user has multiple Asana accounts — flag only on mismatch,
     keep quiet when they match. Recovery: `claude mcp remove asana
     && claude logout asana` then redo step 5.

Step 7 (revoke) absorbs both the keychain delete + the
`claude logout asana` line so users have a single place to undo
everything.

43 tests pass.

* fix(init): clear stale CA env vars on Windows before any TLS handshake

Reported by the 2026-05-11 Windows test pass: after `agnes init` the
gws connector failed with `UnknownIssuer` TLS errors because
`SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE` were still set in Windows
User scope pointing at `C:\Users\localadmin\.config\agnes\ca-bundle.pem`
— a file that did not exist on the test host. Past Agnes installs
(the setup-prompt trust block + older bootstrap helpers) write those
pointers when they materialize a combined Agnes-CA bundle; when the
bundle file later disappears (re-init on a new VM, machine swap, the
~/.agnes dir wiped), the pointers go stale and every native Windows
TLS handshake fails before Agnes itself runs. SSL_CERT_FILE in
particular REPLACES (not appends to) the trust store, so a stale
pointer is silently catastrophic.

`agnes init` now clears stale pointers in two layers before the first
server roundtrip:

1. Current-process env (os.environ) — what the immediately-following
   `api_get` to /api/catalog/tables actually reads. Without this, init
   itself blows up before it gets to step 2.
2. Windows User-scope env via PowerShell
   `[Environment]::SetEnvironmentVariable(name, $null, 'User')` — what
   every future shell + every native tool (gws, claude.exe, pip, uv)
   inherits. The 2026-05-11 reporter expected this exact cleanup
   ("init was supposed to clear these but they persisted").

The cleanup is best-effort and conservative:
- Only deletes a var when its value points at a path that does NOT
  exist on disk. Intentional operator config (e.g. SSL_CERT_FILE
  pointing at a corp certifi bundle) stays put.
- PowerShell missing / restricted execution policy / WSL-without-pwsh:
  swallowed silently. The current-process leg still runs, which
  unblocks init even on hosts where the User-scope leg cannot fire.

Tests (`tests/test_init_ca_cleanup.py`, 6 cases):
- Stale pointers → removed from process env.
- Real-path pointers → preserved.
- Non-Windows hosts: PowerShell is not invoked.
- Windows hosts: PowerShell IS invoked with a script that checks
  all three vars + uses Test-Path + SetEnvironmentVariable.
- PowerShell FileNotFoundError: cleanup swallows it, does not raise.
- `_is_windows_host()` reflects sys.platform.

* refactor(asana): MCP-first flow — drop PAT storage, precheck via `claude mcp list`

The Asana hosted MCP at https://mcp.asana.com/mcp authenticates via
OAuth (Claude Code holds the grant; browser tab pops on first tool
use). The earlier prompt walked the user through creating + keychain-
storing an Asana Personal Access Token AND registering the MCP — two
parallel auth surfaces for one connector. Once the MCP works, the PAT
has no consumer: the precheck/verify steps that used `curl
$BASE/api/1.0/users/me` are just redundant proof that Asana itself is
reachable, which the OAuth handshake already establishes.

Removed:
- Step 0 keychain probe + curl verify against /users/me with PAT.
- Step 1 open developer-console / create PAT.
- Step 2 click "+ New access token", warn shown-ONCE.
- Step 3 helper-script for keychain-storage (per-OS bodies: macOS
  `security add-generic-password`, Linux `secret-tool store`, Windows
  `cmdkey /generic`).
- Step 4 PAT-side `users/me` verify.
- Step 5's split that kept the PAT around for direct curl scripts.
- Step 6d's "MCP vs PAT identity sanity check" — there is no PAT
  anymore, nothing to mismatch against.

New flow (3 steps total):
- Step 0 precheck: `claude mcp list | grep ^asana` — if found, the
  server is registered AND Claude Code is holding its OAuth grant
  (otherwise prior failure would have removed it); print
  "Asana MCP already registered — skipping setup" and stop. Tells the
  user the explicit reset command (`claude mcp remove asana && claude
  logout asana`) so a re-register stays one paste.
- Step 1: `claude mcp add --transport http asana
  https://mcp.asana.com/mcp` — no `|| true` because step 0 should have
  caught the "already exists" case. Step explains the V2-vs-V1
  endpoint distinction (V1 SSE deprecated 2026-05-11) and the
  abort-clean recovery if the precheck somehow missed the existing
  server.
- Step 2: same OAuth + low-impact-read validation pattern as before.
- Step 3: revoke instructions (mcp remove + logout + Asana-side app
  revoke at app.asana.com/Settings → Apps).

Both surfaces (the /home Asana tile and the inlined Asana sub-block
in the setup script's step 7) emit the new text from the same
asana_prompt() — single-source-of-truth invariant intact.

77 tests pass.
2026-05-11 21:54:51 +02:00
Vojtech
a46b9dc928
/home install-hero polish: license link contrast, auto-mode reorder, Shift+Tab guidance (#243)
* Make /home install-hero links readable against blue background

The Claude license-options link added in the previous commit inherited
the default `<a>` style (`var(--hp-primary)` blue), which renders as
blue-on-blue and is unreadable inside the blue install-hero. Add a
scoped `.install-hero a` rule that uses white with an underline
(matching the existing lead-paragraph contrast pattern) so any link
nested in the hero stays legible.

* Reorder /home install flow: auto-mode is now Step 2, Agnes install becomes Step 3

Step 3 (was Step 2) pastes a ~20-command bash bootstrap into a fresh
Claude Code session. Without auto-mode enabled first, each Bash/edit
command needs a manual approve click — bad UX for first-time users.

Move auto-mode from the outside-hero `<details>` reference block into
the install-hero as a real Step 2, between "install Claude Code" and
"install Agnes". Content is the persistent `acceptEdits` snippet
(write to ~/.claude/settings.json) plus a one-liner pointing at
Shift+Tab for users who are already inside a running Claude Code
session. YOLO mode for full Bash auto-approve stays on
/setup-advanced behind the existing link.

The outside-hero `setup-collapsible[data-section="step3"]` block is
dropped — auto-mode is no longer reference content, it's a real
install step, and duplicating it would just diverge over time.
Onboarded users no longer see the auto-mode block at all (consistent
with Steps 1 + 3 also hiding post-onboarding).

Completion banner copy updated: "Step 1, 2 & 3 done — Claude Code
installed, auto-mode set, Agnes ready". Dashboard CTA partial and
other templates don't reference step numbers for this flow, so no
adaptation needed there.

* Simplify /home Step 2 to Shift+Tab only — drop the JSON snippet

Operator pointed out two issues with the prior Step 2:

1. The settings.json snippet is redundant. Claude Code's first
   Shift+Tab cycle to auto-accept mode already prompts the user
   whether to persist it as default — Claude writes the config
   itself, no manual file edit needed.

2. The snippet only showed the POSIX path `~/.claude/settings.json`,
   which doesn't translate to native Windows.

Replace the snippet + copy button with a plain Shift+Tab instruction,
explicitly call out the first-time "make this the default?" prompt,
and note that Claude handles the config write itself — same flow on
macOS / Linux / WSL / Windows. Adds a fallback line for users who
already closed the post-OAuth session.

* Tighten /home Step 2 install-note to two paragraphs

Operator: drop the 'Claude writes the setting itself, so this works
the same on macOS / Linux / WSL / Windows...' line plus the
'auto-approves file edits going forward; Bash commands stay gated
— that's the safe default' line. Both were filler — the make-default
prompt already implies persistence, and gated Bash is the obvious
default users won't be surprised by.

Result: paragraph 1 carries Shift+Tab + first-time make-default
say-yes + closed-session fallback in one breath; paragraph 2 keeps
the verbatim YOLO link. Same affordances, less vertical space.
2026-05-11 16:46:58 +00:00
minasarustamyan
19c5a7592a
Session capture queue, private session, and setup-prompt fixes (#242)
* Capture session paths via SessionStart hook + lock parallel pushes

Replace the encoding-based scan of ~/.claude/projects/<encoded-cwd>/ with
a queue file populated by a new `agnes capture-session` SessionStart hook.
The hook reads the documented `transcript_path` field from Claude Code's
hook stdin JSON, sidestepping the cwd-to-folder encoding (which is an
internal implementation detail and varies by Claude Code version).

- New `agnes capture-session` subcommand appends transcript_path to
  <workspace>/.claude/agnes-sessions.txt. Silent on all malformed input
  so a hook chain failure doesn't clutter Claude Code startup.
- `agnes push` now consumes the queue: atomic snapshot rename guards
  against hooks writing during the push window, successful uploads land
  in agnes-sessions-uploaded.txt (TSV: timestamp + path), failed paths
  are requeued.
- Cross-platform single-instance lock via the filelock package (fcntl
  on POSIX, msvcrt on Windows). Concurrent SessionEnd hooks — common
  when the user closes several sessions at once — silent-exit on the
  losing side instead of all racing the upload.
- Recovery: pre-existing snapshot files from a crashed push are picked
  up and processed before the live queue.
- The SessionStart `agnes push` self-heal entry is dropped — it became
  redundant once the queue persists across runs (orphans from headless /
  crashed sessions ship out on the next interactive SessionEnd push).
  Existing workspaces auto-migrate via the marker-based replace logic.
- Legacy encoding scan stays available behind `--legacy-scan` for one-
  off backfills of sessions predating the queue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add /agnes-private + statusLine indicator for private sessions

Users handling sensitive data inside Claude Code can now opt a session
out of the Agnes upload pipeline, either proactively (right after session
start) or reactively (mid-session). The `/agnes-private` slash command
runs `agnes mark-private` deterministically via `!`-prefix direct bash —
no AI in the loop. A workspace-installed statusLine surfaces a
`🔒 agnes-private` indicator in Claude Code's status bar so the user
sees the state at a glance.

Authoritative source of "do not upload" is a separate file
`<workspace>/.claude/agnes-sessions-private.txt` (one session_id per
line). Both `capture-session` (queue writer) and `push` (queue reader)
consult the list. This makes the slash-command / SessionStart-hook race
impossible by construction: whichever runs first, the session is correctly
filtered out.

- `agnes mark-private` reads `CLAUDE_CODE_SESSION_ID` from env (set by
  Claude Code in every bash subprocess it spawns — stable documented API)
  and appends to the private list.
- `agnes statusline` reads the session JSON Claude Code pipes on stdin,
  checks the private list, and emits the indicator or nothing. Optimized
  for the high call frequency of statusLine renders.
- `capture-session` extracts session_id from hook stdin and skips queue
  write when the ID is already on the private list (race protection).
- `push` filters snapshot entries by the private list and appends to a
  per-workspace audit log `agnes-sessions-private-skipped.txt`.
- Queue format migrated from `<path>` to `<session_id>\t<path>`; legacy
  one-column lines still parse (empty session_id, still upload, can't be
  marked private retroactively — fine, they pre-date the feature).
- `install_claude_hooks` writes a workspace statusLine unless the user
  already has a custom one (warn + preserve). Idempotent re-init.
- `install_claude_commands` ships `agnes-private.md` alongside
  `update-agnes-plugins.md`. Per-template fallback so a missing template
  doesn't get clobbered with the wrong content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix setup-prompt + CLAUDE.md marketplace copy + drop skills step

Three issues against the post-PR-#240 / post-PR-#237 state:

1. Setup prompt's marketplace block trailer (both has-stack and
   empty-stack variants) claimed the SessionStart hook keeps the
   marketplace clone in sync via `agnes refresh-marketplace --quiet`
   on every session and that admin grants land automatically — both
   false since PR #237 (0.47.x) moved the install/update path out of
   the hook into the `/update-agnes-plugins` slash command. The hook
   is `--check`-only: detects server-side changes, prompts the user
   to run the slash command, which does the full reconcile
   interactively with output visible in the transcript.

2. The empty-stack variant framed composition as "admin grants only",
   missing the actual three-source served stack:
     (admin RBAC ∩ /marketplace subscriptions)
       ∪ system-mandatory plugins (admin-pinned, auto-applied)
       ∪ Flea market installs (skills/agents bundled, plugins standalone)
   Updated copy spells out all three sources so analysts know where
   their stack picks live, and what the SessionStart hook actually
   does on change detection.

3. CLAUDE.md template's "Agnes Marketplace" section conflated
   eligibility (`resolve_allowed_plugins` — what's listed) with served
   stack (`resolve_user_marketplace` — what actually reaches Claude
   Code). The two are different: a user can be RBAC-eligible for a
   plugin without having subscribed to it on /marketplace. Rewrote
   the section to distinguish the eligibility set from the served
   stack and to describe the `--check`-only hook accurately.

Plus: deleted the setup prompt's interactive Skills step (final step
before Confirm). The named-opinion question — "do you want me to
bulk-copy every skill into ~/.claude/skills/agnes/ or pull on-demand
via `agnes skills show <name>`?" — had no obvious right answer for
new users at the tail end of a wall of technical steps. On-demand
lookup is the one-size-fits-all default; `agnes skills list/show`
remain discoverable and the CLAUDE.md template references specific
skills inline (e.g. agnes-data-querying in the BigQuery section)
where they're relevant. Layout: Confirm shifts from step 9 to step 8.

Tests updated, full setup/marketplace/welcome surface green (115
passed). Remaining full-suite failures are pre-existing (BQ/Keboola
fixtures, Windows charmap collection error in test_v26_keboola_e2e)
— verified against a clean stash, unrelated to this diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix session-queue race + snapshot PID-reuse data loss

Two blocker fixes from the PR #242 review:

1. Concurrent SessionStart hooks could corrupt the queue file on
   Windows. Python's `open(path, "a")` is not atomic there — the CRT
   does not pass FILE_APPEND_DATA to CreateFile, so concurrent
   appenders (user opening several Claude Code windows simultaneously)
   could interleave bytes mid-line. The malformed lines then silently
   fail the parser and the entries are dropped.

   Fix: wrap append_to_queue, requeue_failed, and snapshot_queue in a
   short-lived FileLock on a dedicated `agnes-queue.lock`. Separate
   from `agnes-push.lock` so capture-session hooks don't block on the
   push command. New test_append_concurrent_threads_no_corruption
   reproduces the race with 4 threads x 50 appends.

2. Snapshot filenames embedded only the PID (`agnes-sessions.snapshot.
   <PID>.txt`). After a crashed push left a snapshot on disk and the
   OS recycled the PID for a new push, `os.rename` would atomically
   overwrite the recovery snapshot — every entry in it lost, silently.

   Fix: append a uuid8 hex tail (`agnes-sessions.snapshot.<PID>.
   <uuid8>.txt`). find_recovery_snapshots already globs the prefix
   so it picks up both old and new format. New
   test_snapshot_filename_is_unique_per_call asserts two consecutive
   snapshots under the same PID don't collide.

Targeted tests green (47/47 in session_queue/capture_session/cli_push).
Full suite failures unchanged from baseline (pre-existing BQ/Keboola
fixture issues per CLAUDE.md).

* Auto-refresh workspace hooks + bash-wrap all hook entries (Windows)

Fixes from PR #242 second review (ZdenekSrotyr):

1. `uv.lock` regenerated to include `filelock 3.29.0` (declared in
   pyproject.toml but missing from the lock file — CI's
   lockfile-consistency check would fail; `uv pip install` on a clean
   cache would silently miss the dep).

2. `agnes self-upgrade` now auto-refreshes the workspace Claude Code
   hooks via the new `cli.lib.hooks.maybe_refresh_claude_hooks`. Closes
   the silent-stop migration gap: a v0.48 workspace would auto-upgrade
   the CLI from its existing SessionStart self-upgrade entry but never
   pick up the new `agnes capture-session` SessionStart hook, leaving
   the queue empty and `agnes push` uploading nothing.

   The refresh fires on both the "info is None" fast path (CLI already
   current — catches the second SessionStart after a prior upgrade)
   and the install-success path. Guarded by `workspace_has_agnes_hooks`
   so it never writes `.claude/settings.json` into directories that
   aren't Agnes workspaces (e.g. `agnes self-upgrade` invoked from
   `~/`). Errors are surfaced on stderr but never flip the upgrade exit
   code.

3. All Agnes-managed hooks are now wrapped in `bash -c "..."`. The
   self-upgrade+pull chained SessionStart entry was the only one still
   shipping unwrapped — Claude Code on Windows runs hook commands
   directly without a shell, so the `;` chain + `2>/dev/null` +
   `|| true` shell syntax silently no-op'd on native Windows installs
   without Git Bash on PATH. Workspaces still on the old form
   auto-upgrade via the refresh path above.

Tests: +12 in test_lib_hooks.py (guard semantics, v0.48→v0.49
migration end-to-end, third-party-hook preservation, bash-wrap
invariant). +5 in test_self_upgrade.py (refresh fires on info=None,
fires on install success, skipped on failure, skipped on --check-only,
refresh failure never flips exit code).

130 targeted tests green. The 2 pre-existing Windows path-separator
failures in `test_smoke_test_detects_version_mismatch[uv|pip]` are
unrelated (path mismatch `\fake\uv\bin\agnes` vs `/fake/uv/bin/agnes`
in test asserts, pre-PR baseline).

* CHANGELOG: document PR-242 main features

Closes ZdenekSrotyr #4: the [Unreleased] block was missing entries for
the PR's primary surface — only the post-merge fix bullets and the
unrelated setup-prompt copy change were captured. Adds:

- ### Added: 6 bullets covering the session capture queue + new
  `agnes capture-session` subcommand, `/agnes-private` slash + `agnes
  mark-private`, `agnes statusline` + statusLine wiring, `--legacy-scan`
  opt-in fallback, single-instance push lock, and the new `filelock`
  runtime dep.

- ### Changed: BREAKING bullet on the SessionStart / SessionEnd hook
  wire format change (capture-session as first SessionStart entry,
  push self-heal removed, SessionEnd push detached via nohup, all
  entries bash-wrapped). Folds the prior standalone bash-wrap bullet
  into this consolidated entry — Z's review flagged the layout shift
  as BREAKING, and grouping the related sub-changes makes the
  migration story readable in one place.

- Operator migration is auto-handled by `maybe_refresh_claude_hooks`
  invoked from `agnes self-upgrade` (separate Changed entry below).
  No `agnes init` re-run required. Pre-queue session jsonls on
  upgrading workspaces still need a one-off `agnes push --legacy-scan`
  — flagged in the BREAKING bullet.

No code change; doc only.

* Drop permanent 4xx uploads instead of requeueing forever

Closes ZdenekSrotyr #5. Previously the push retry path requeued any
non-200 response except the literal "file not found on disk", so 401
(token expired), 403 (RBAC denial), 413 (payload too large), 400
(server-side validation) cycled through every push run forever — the
queue grew without bound and each run re-bombarded the server with the
same deterministically-failing upload.

Now 4xx (except 408 Request Timeout + 429 Too Many Requests, which the
HTTP spec marks as transient) is dropped and audit-logged to
`<workspace>/.claude/agnes-sessions-failed.txt`:

    <iso_ts>\t<session_id>\t<status>\t<transcript_path>

5xx and network errors continue to requeue — those reflect server /
transport state that can change between runs, so retry is the right
behavior.

The audit log piggybacks on the push single-instance lock
(agnes-push.lock) — push is the only writer to this file, same as the
existing `mark_uploaded` and `mark_private_skipped` paths, so no
separate filelock is needed.

`agnes push --json` surfaces a new `dropped_permanent` counter; non-
quiet stdout mentions the audit-log path so operators tailing the
output have a pointer to the forensic trail.

Tests: +7 in test_cli_push.py (401/400/403/413 → drop; 408/429 →
requeue; 500/502/503 → requeue; network exception → requeue;
--json `dropped_permanent` counter; stdout audit-log pointer). +1 in
test_session_queue.py (mark_failed_permanent TSV format).

127/129 targeted tests green. The 2 pre-existing Windows
path-separator failures in `test_smoke_test_detects_version_mismatch
[uv|pip]` are unrelated (path mismatch `\fake\uv\bin\agnes` vs
`/fake/uv/bin/agnes` in test asserts, pre-PR baseline).

* Catch OSError in push lock acquisition

Closes ZdenekSrotyr #8. `acquire_or_skip` in `cli/lib/push_lock.py`
previously caught only `filelock.Timeout`. Any `OSError` from
`FileLock.acquire` — read-only filesystem, permission denied on
`.claude/`, disk full, hardware I/O failure — propagated as an
unhandled traceback.

Two visible failure modes:
- SessionEnd hook: `|| true` in the wrapper swallowed the error, so
  daily pushes silently never ran. Operator had no signal.
- Manual `agnes push`: ugly Python traceback dumped to the terminal
  instead of a clean exit.

Now `OSError` is treated the same as `Timeout` — yield `None`, caller
returns cleanly with rc=0. The operator's environment in these
scenarios has bigger problems than missing session uploads, so we
swallow rather than retry-loop or surface a noisy warning.

Test: `test_push_silent_exit_when_filelock_raises_oserror` patches
the `FileLock` used inside `push_lock` to raise OSError on acquire,
verifies push exits 0 with no traceback and the queue is preserved
for the next attempt.

* Address remaining S2 items from PR-242 review

Four items from ZdenekSrotyr's S2 list:

S2.10 — `_install_statusline` truthy check (cli/lib/hooks.py): replace
`if existing:` with explicit `if existing is None or existing == "":`.
Documents and tests the behavior for both edge cases (explicit-null
and empty-string `statusLine`) — both treated as "not configured"
rather than "explicit user opt-out", so we install ours. Two new
tests in test_lib_hooks.py pin the contract.

S2.6 — onboarding docs for /agnes-private. New "Private sessions"
subsection in `config/claude_md_template.txt` (next to Data Sync)
covering the slash command, statusbar indicator, and audit-log
location. One-line tip in `app/web/setup_instructions.py` so the
feature is discoverable at onboarding.

S2.9 — e2e privacy test (tests/test_e2e_privacy.py). Wires
capture_session → mark_private → push against a recording fake
api_post and asserts zero session uploads for the marked one.
Three cases: mark-before-capture (queue write skipped),
mark-after-capture (push-side filter catches it + audit-logs),
control (unmarked sessions upload normally).

David #8 — `--legacy-scan` help text now documents the
private-list gap (legacy entries carry empty session_id, so
the filter is not consulted). The practical impact is bounded —
pre-queue sessions cannot have been marked private since the
private list is a queue-era feature — but the disclaimer in the
help text means an operator running a backfill is not surprised.

68 targeted tests green (3 new e2e + 2 new truthy edge tests +
existing). 2 pre-existing Windows path-separator failures in
test_smoke_test_detects_version_mismatch[uv|pip] unchanged.

Remaining S2 items (statusline mkdir push-back, capture-session
silent-fail follow-up) handled in PR comment + follow-up issue
respectively.

* Address remaining S2 follow-ups (David #8, S2.7, David #11)

Three items left over from Mina's bbf63472 batch — that commit
addressed S2.6/S2.9/S2.10 + documented David #8 in help text but
deferred the actual implementations of S2.7, David #11, and the real
David #8 fix to follow-ups. This commit closes them.

David #8 — `agnes push --legacy-scan` now consults the private list.
Claude Code names jsonls `<session-id>.jsonl`, so the file stem IS
the session id; the legacy-scan path can apply the same private filter
the queue path uses. Both the dry-run and live-upload code paths fixed.
Help text updated (no longer warns the filter is bypassed). Two new
tests in test_cli_push.py cover the upload-skip path + the dry-run
`would_skip_private` segregation.

S2.7 — `statusline`/`is_private` no longer mkdir-pollutes arbitrary
workdirs. Split `_claude_dir` into `_claude_dir_writable` (used only
from `add_private`) and `_claude_dir_readonly` (no mkdir). The
read-only public helpers (`private_list_path`, `read_all_private`,
`is_private`) compose the no-mkdir variant by default; `add_private`
opts in via `writable=True`. Added a process-local mtime-keyed cache
around `read_all_private` so in-process callers (push doing one stat
per upload candidate, future `agnes diagnose`) don't re-parse the
file on every check. Cache eviction on `add_private` so a sub-second
write+read sequence doesn't see stale data even on coarse-mtime
filesystems. Two new tests pin the no-mkdir contract + the
in-same-second add+read consistency.

David #11 — `agnes capture-session` writes a breadcrumb log on every
invocation. New `<workspace>/.claude/agnes-capture-session.log` TSV:
`<iso_ts>\t<outcome>\t<detail>` where outcome covers every silent-
exit path (`ok`, `private_skip`, `empty_stdin`, `bad_json`,
`not_object`, `no_transcript_path`, `stdin_read_error`,
`write_error`). Gives operators a signal to detect "hook fires but
queue stays empty" — without it, an upstream Claude Code stdin-
contract change is invisible because the hook always exits 0. Log
rolls at 256 KiB so it doesn't grow unbounded on long-lived
workspaces. Best-effort: a breadcrumb-write failure is itself
swallowed so the hook contract stays "exit 0 always". Skipped in
non-Agnes workdirs (no `.claude/` exists) so opening Claude Code
in `~/` doesn't pollute it. Five new tests in test_capture_session.py
cover the success / bad_json / no_transcript_path / private_skip /
no-pollute paths.

115 targeted tests green (test_cli_push, test_capture_session,
test_private_list, test_session_queue, test_e2e_privacy,
test_lib_hooks, test_statusline, test_mark_private).

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-11 13:31:16 +00:00
Vojtech
41829e8a45
Setup-prompt + bootstrap fixes from 2026-05-10 init report (#240)
* Setup-prompt + bootstrap fixes from David's 2026-05-10 init report

Three issues from clean-machine bootstrap evidence:

1. `agnes refresh-marketplace --bootstrap` failed to recover when the
   local clone existed but Claude Code's marketplace registry had lost
   the `agnes` entry. Bootstrap path now parses
   `claude plugin marketplace list`, re-runs
   `claude plugin marketplace add ~/.agnes/marketplace` when missing,
   and treats `add` failures as fatal (was warn-and-continue, root cause
   of the cascade into "Marketplace 'agnes' not found" plugin install
   errors).

2. Setup prompt now always emits the marketplace-registration block,
   even when the operator has zero plugin grants. Pre-wires the
   SessionStart hook so future admin grants land automatically without
   re-running setup. Block copy adapts: empty list shows
   "no plugins granted yet", populated list shows "install plugins".

3. Setup prompt registers the Atlassian Remote MCP server unattended
   (`claude mcp add --transport sse atlassian
   https://mcp.atlassian.com/v1/sse`). Hosted Remote MCP, OAuth handled
   automatically by Claude Code on first use. Asana / GWS stay on the
   /home connector cards (PAT/keychain flows don't fit unattended
   bootstrap).

Confirm step nudges the user toward the /home connector cards for the
PAT-flow services. CLAUDE.md template renames the marketplace section
to "Agnes Marketplace" and documents that all plugins are addressed as
`<plugin>@agnes` regardless of upstream slug.

Layout: Confirm shifts from step 6/8 to step 9 across all variants
(preflight, marketplace, MCP all unconditional). Tests updated.

* Link Claude license options from /home install pane

Step-1 Claude install on /home pointed users to  OAuth without
explaining what to do if they don't have a Pro/Max subscription. Add
a one-line follow-up link to the plan-tier section on /setup-advanced
(new `#claude-plan` anchor) so first-time users discover the
subscription tiers rather than bouncing on the OAuth screen.

* Add idempotent + no-TLS-bypass guardrails to /home connector prompts

The Asana / Google Workspace / Atlassian connector prompts on /home
already shipped a precheck step that short-circuits when the service
is already wired, but they didn't carry the same idempotency +
surface-errors-verbatim + don't-disable-TLS-verification guardrails
the bash bootstrap prompt has. Add a one-paragraph 'Ground rules'
block at the top of each prompt so a connector failure doesn't
tempt the model into bypass workarounds, matching the same posture
David's 2026-05-10 init report flagged for the bash flow.

* skip Source: lines in marketplace registry detector

`claude plugin marketplace list` prints a `Source: <local path>` line
under each registered marketplace; the local clone almost always lives
under a path containing the marketplace name itself
(`~/.agnes/marketplace`). A naive \\bagnes\\b match over the full
stdout therefore false-positives whenever ANY unrelated marketplace
sits under `~/.agnes-…/` or similar. Filter Source: lines out before
matching so the recovery path actually re-adds when needed instead of
silently falling through to a broken `marketplace update agnes`.
Adds regression test covering the substring-only case.

* drop customer-specific tokens from CHANGELOG entries

Per CLAUDE.md vendor-agnostic OSS rule ("nothing customer-specific
... in changelogs"):
- "agnes-vrysanek.groupondev.com" -> "a private-CA Agnes deployment"
- "Groupon Marketplace / groupon-marketplace" -> "<Org> Marketplace /
  <org>-marketplace" (placeholder example)
- Removed "David flagged" attribution language; init-report context
  stays intact, just stripped of the named host + brand

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-10 20:24:00 +02:00
minasarustamyan
d269c69359
Drop legacy sslVerify=false fallback from install setup prompt (#238)
The marketplace step (step 5) emitted `git config --global
http.<host>/.sslVerify false` on AGNES_DEBUG_AUTH=1 instances when no
ca_pem was readable from AGNES_TLS_FULLCHAIN_PATH. Two problems:

1. Claude Code auto-mode classifiers ("do not disable TLS verification"
   rule) block the line, breaking hands-free setup.
2. It silently masked operator misconfiguration — a debug-auth instance
   without a fullchain on disk fell through to a TLS-disabled clone
   instead of surfacing the missing cert.

After the cross-platform trust block (#137), self-signed and private-CA
servers are fully covered by step 0 reading the fullchain via
_read_agnes_ca_pem; publicly-trusted certs need no bootstrap at all.
The legacy fallback no longer covers a real scenario — verified by
running step 5 without sslVerify=false against a self-signed instance.

BREAKING: drops `self_signed_tls` parameter from
app.web.setup_instructions.resolve_lines and render_setup_instructions
(only consumed by the deleted block). The AGNES_DEBUG_AUTH env var
itself is unchanged — still gates /api/me_debug and the dropdown link.

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
2026-05-09 20:10:01 +02:00
Minas Arustamyan
50e0463501 feat(marketplace): clone-based plugin setup + auto-refresh SessionStart hook
Adds end-to-end flow for installing and keeping the per-user filtered
Claude Code marketplace in sync with the user's Agnes stack
(admin RBAC grants \ MyAIStack opt-outs U /store installs).

Setup (one-liner in install prompt step 5):
  `agnes refresh-marketplace --bootstrap` clones the per-user marketplace
  bare repo to ~/.agnes/marketplace, strips PAT from the cloned origin
  URL, registers the local path with Claude Code, and installs every
  plugin in the served manifest at --scope project. Replaces a 15-line
  inline shell sequence that tripped Claude Code's agent-driven `rm -rf`
  permission gate.

Auto-refresh (SessionStart hook installed by `agnes init`):
  `agnes refresh-marketplace --quiet` runs every Claude Code session,
  fetches+resets the clone (server rebuilds as orphan commits, so
  pull --ff-only is impossible), and version-aware reconciles:
    - missing in workspace -> claude plugin install <name>@agnes --scope project
    - version differs       -> claude plugin update <name>@agnes
    - matches               -> skip
  Don't auto-uninstall plugins that disappeared from the manifest --
  a transient empty manifest from the server would wipe the stack.

Hook output: when --quiet AND something actually changed, emits Claude
Code hook JSON on stdout -- `systemMessage` (transient toast) and
`hookSpecificOutput.additionalContext` (model-side system reminder),
both carrying the change summary plus a "/exit + restart Claude Code"
instruction (Claude only scans plugins at session start).

Windows hook compatibility: the refresh-marketplace hook command is
wrapped in `bash -c "..."` because Claude Code on Windows runs hook
commands directly without invoking a shell, so `2>/dev/null || true`
would otherwise be passed as literal argv tokens.

Cross-cutting:
  - cli/lib/marketplace.py: shared CLONE_DIR + MARKETPLACE_NAME constants.
  - cli/lib/hooks.py: SessionStart now has two independent entries
    (pull + refresh-marketplace) so a failure in one doesn't suppress
    the other; legacy `da sync` and prior single-pull layouts upgrade
    cleanly on re-init.
  - PAT injection on every git fetch via per-invocation credential
    helper (token in \$AGNES_TOKEN env, never in argv or .git/config).
  - Pre-snapshot of installed plugins captured BEFORE
    `claude plugin marketplace update` so silent auto-applied version
    bumps still fire notifications.
  - scripts/dev/agnes-client-reset.sh: cleans ~/.claude/plugins/marketplaces/agnes,
    ~/.claude/plugins/cache/agnes, drops uv build cache, documents
    workspace-scoped residue that can't be enumerated from the script.
  - app/web/setup_instructions.py: legacy AGNES_DEBUG_AUTH path also
    uses clone (direct HTTPS marketplace add is broken end-to-end on
    every Claude Code distribution -- stores response as single file,
    plugin source paths then 404).

28 new tests (test_cli_refresh_marketplace.py) + extended hook + setup
template tests cover bootstrap, fetch+reset ordering, version-aware
reconcile, project-path filtering, hook JSON shape, and the bash-c
Windows wrapper invariant.
2026-05-07 06:59:13 +02:00
ZdenekSrotyr
74b7f6e254 feat(setup-instructions): preflight checks both git and claude
Renames `_git_check_block` to `_preflight_block` and adds a
`claude --version` check beside `git --version`. Both binaries are
required by the marketplace step — git for the clone fallback,
claude for `claude plugin marketplace add` / `claude plugin install` —
so checking them together gives one clear failure instead of two
confusing downstream errors.

Install hints: `npm i -g @anthropic-ai/claude-code` for Linux / WSL
plus a doc URL (https://docs.claude.com/claude-code) for the native
macOS / Windows installers. We don't try to one-line a native
installer; the canonical instructions live upstream.

Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 3.
2026-05-04 22:11:38 +02:00
ZdenekSrotyr
e16698c3cc refactor(setup-instructions): unified layout with mandatory agnes init
Adds `_step_numbers(*, has_marketplace, has_skills)` so step numbering
lives in one place instead of being split across three branches in
`resolve_lines`. Pins the unified layout in the tests:

  No plugins:     1 install, 2 init, 3 catalog, 4 diagnose, 5 skills, 6 confirm
  With plugins:   1, 2, 3, 4 preflight, 5 marketplace, 6 diagnose, 7 skills, 8 confirm

`agnes auth import-token` / `agnes auth whoami` are now banned from the
rendered prompt — `agnes init` subsumes them. The renamed
`test_resolve_lines_no_plugins_unified_six_step_layout` asserts those
strings are absent and that the new step headers (`Bootstrap your Agnes
workspace`, `Verify the data is queryable`) are present.

Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 2.
2026-05-04 22:10:05 +02:00
ZdenekSrotyr
9334beed15 refactor(setup-instructions): drop role param; collapse analyst/admin into one layout
Removes the `role: Literal["analyst", "admin"]` parameter from
`resolve_lines` / `render_setup_instructions` and deletes the
`_resolve_analyst_lines`, `_analyst_init_lines`, `_analyst_finale_lines`
helpers. The unified flow now always emits `agnes init` (the
workspace-rails delivery mechanism) in place of the legacy
`agnes auth import-token` + `agnes auth whoami` pair, and uses
`agnes catalog` as the smoke-verify step.

`agnes init` already verifies the PAT internally, and `agnes catalog`
doubles as a data-plane smoke check, so dropping `agnes auth whoami`
costs no signal.

Drops the now-redundant `tests/test_setup_instructions_analyst.py` and
patches the one ordering test in `tests/test_setup_instructions.py` that
referenced the old "Log in" / "Verify the login" headers. Also strips
the `role=role` kwarg from `compute_default_agent_prompt`'s call into
`resolve_lines` so the welcome-template render path keeps working;
welcome_template.py's own role param is removed in a follow-up task.

Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 1.
2026-05-04 22:08:48 +02:00
ZdenekSrotyr
54f83c281c test(setup): I1+I2 review fixes — AGNES_WORKSPACE.md alignment + step-number pin 2026-05-04 17:23:15 +02:00
ZdenekSrotyr
ae00945cbf fix(setup): clean stale 'da' refs in setup_instructions.py (Task 0.5 missed sweep) 2026-05-04 17:19:55 +02:00
ZdenekSrotyr
29e28ccbd3 feat(setup): add analyst role to install-prompt renderer 2026-05-04 17:17:59 +02:00
ZdenekSrotyr
1563b05f2e refactor(cli): hard-cutover env vars + config dir to AGNES_*
Task 0.5 of clean-analyst-bootstrap. Greenfield rewrite — no fallback,
no aliases. Existing dev environments lose their cached PAT and must
re-authenticate.

Env var renames (hard cutover):
- DA_CONFIG_DIR    -> AGNES_CONFIG_DIR
- DA_SERVER        -> AGNES_SERVER
- DA_SERVER_URL    -> AGNES_SERVER_URL  (test-only stale ref, not in spec)
- DA_NO_UPDATE_CHECK -> AGNES_NO_UPDATE_CHECK
- DA_LOCAL_DIR     -> AGNES_LOCAL_DIR
- DA_TOKEN         -> AGNES_TOKEN
- DA_STREAM_RETRIES -> AGNES_STREAM_RETRIES

Config dir rename: ~/.config/da/ -> ~/.config/agnes/ (across code,
comments, docstrings, error messages, install templates, dev scripts).

Stale `da X` references in CLI source (and adjacent app/, tests/):
swept docstrings, comments, help text, and error messages where the
verb survives the rewrite (init, pull, push, catalog, status, diagnose,
auth, admin, skills, query, schema, describe, explore, disk-info,
snapshot, login, logout, whoami, server, setup) and replaced `da X`
with `agnes X`. Intentionally kept `da sync`, `da fetch`, `da analyst`,
`da metrics` — those verbs are removed in later tasks; the legacy
strings will be detected by `_LEGACY_STRINGS` (added in Task 2).

Test fixes:
- TestCLIVersion now asserts output starts with `agnes ` (was `da `).

Test results: 2675 passed, 25 skipped (full pytest run, excluding 9
pre-existing test_db.py / test_user_management.py / test_e2e_extract.py
/ test_cli_binary_rename.py failures unrelated to this rename).
2026-05-04 16:35:44 +02:00
minasarustamyan
4ec5ff44dd
feat(setup): cross-platform TLS bootstrap + marketplace plugin install (#137)
Bootstraps the Agnes Claude Code marketplace + RBAC-allowed plugins from
the dashboard CTA, and inlines the server's TLS cert when the chain isn't
publicly trusted (self-signed / private CA). Cross-platform setup prompt
covers Windows Git Bash, macOS, Linux. Includes Bun-compiled `claude` fix
(macOS goes via git-clone fallback, same as Windows), PAT stripping after
clone, explicit error handling, and four rounds of Devin Review fixes
(phantom step references, $PLATFORM re-detection, heredoc/awk line-count
sync). Cuts 0.21.0.

See CHANGELOG.md [0.21.0] section for details.
2026-04-30 08:56:45 +02:00
Petr Simecek
1bbbe58ea0
release(2.1.0): durable sync, CLI auto-update, versioned wheel URL, version unification (#43)
* fix(cli): versioned wheel URL in setup instructions; drop broken /cli/agnes.whl alias (#36)

* fix(cli): inline PEP 427 wheel filename in setup instructions

`uv tool install <server>/cli/agnes.whl` fails with

    error: The wheel filename "agnes.whl" is invalid: Must have a version

because uv validates the filename in the URL path *before* fetching — so
the server-side Content-Disposition header (which has the real versioned
filename) is never consulted, and an HTTP redirect does not help either:
uv resolves the filename from the initial URL.

Fix the root cause by inlining the real PEP 427 filename into the setup
snippet the dashboard copies to the clipboard. The wheel filename is
resolved server-side via `_find_wheel()` and substituted into the lines
returned from `setup_instructions.resolve_lines()`, so both the read-only
HTML preview and the JS clipboard renderer get byte-identical output.

Also added `/cli/wheel/{filename}` to serve wheels at their PEP 427 path,
and kept `/cli/agnes.whl` as a 302 redirect for manual/legacy callers —
though that redirect alone is NOT sufficient for `uv tool install` (uv
validates before following redirects) and is there only as defense-in-depth.

Verified locally:
- `uv tool install <server>/cli/wheel/agnes_the_ai_analyst-2.0.0-py3-none-any.whl` succeeds
- `/install` HTML now renders the versioned URL; `/cli/agnes.whl` no longer appears in the rendered snippet

* fix(cli): remove /cli/agnes.whl alias entirely — it only confused users

The bareword alias was never actually usable:

- `uv tool install <server>/cli/agnes.whl` fails at filename validation
  before any HTTP fetch, so neither the Content-Disposition header nor a
  302 redirect rescued it.
- The 302-to-versioned-path fallback left a visibly "working" URL in
  browser / curl -L contexts, which is exactly how the original bug got
  reported in the first place ("the URL loads, why doesn't install work?").

Remove the endpoint and scrub all remaining references. The only CLI wheel
URL is now `/cli/wheel/{filename}` with the real PEP 427 filename, which
the setup-instructions template already generates server-side.

Existing tests that referenced /cli/agnes.whl become negative tests
("must not appear") so we don't regress.

* feat(cli): --version flag; sync --dry-run + progress indicator (#38)

* feat(cli): add --version / -V flag

Prints `da <version>` from package metadata (importlib.metadata). Falls
back to "unknown" when the package is not installed (e.g. running from a
source checkout without `uv pip install -e .`), instead of crashing.

Eager typer callback, so `da --version` exits before subcommand
resolution and does not require any auth/config.

* feat(cli): da sync --dry-run + X/N progress indicator

--dry-run reports what would be downloaded/uploaded without hitting the
API or writing local state. Supports the full flag set (--table, --json,
--upload-only); JSON shape is {"dry_run": true, "would_download": [...],
"summary": {...}}.

Progress bar now shows "[X/N] Downloading <table>..." with a Rich
BarColumn + TaskProgressColumn + TimeElapsedColumn instead of a bare
spinner — makes long syncs visible.

* feat(cli): durable sync + server gzip + auto-update check (#41)

* fix(sync): atomic writes + manifest hash verification + retry on transient errors

Three durability hooks around stream_download and the sync command:

1. Atomic writes. stream_download now streams into `<target>.tmp` and
   calls os.replace() on success, so the real target file never exists
   in a half-written state. On failure the tmp is unlinked — no cleanup
   leftovers, no guard needed at read time.

2. Retry with backoff. Transient errors (ConnectError, ReadError,
   WriteError, RemoteProtocolError, TimeoutException, 5xx) are retried
   up to 3× with 0.3s / 1s / 3s backoff. 4xx (auth, 404) surfaces
   immediately — retrying those is pointless.

3. Manifest-hash verification. After download, sync.py computes MD5 of
   the target (same 8KiB chunking as app/api/sync.py:_file_hash) and
   compares against `server_tables[tid]["hash"]`. Mismatch ⇒ unlink,
   record error, skip state commit. The PAR1 structural check survives
   as a fallback for legacy manifests without a hash.

Also makes _rebuild_duckdb_views tolerant: single broken parquet is
skipped with a stderr warning instead of killing the whole rebuild.

Supersedes #40 — this commit is a strict super-set (hash check + PAR1
fallback + atomic write + retry). #40 can be closed without merging.

* perf(server): enable GZipMiddleware for JSON / HTML responses

GZipMiddleware at minimum_size=1024 shaves bandwidth on manifest-style
JSON endpoints (/api/sync/manifest, /api/version, …) and the /install
HTML preview. Parquet file downloads are already columnar-compressed so
the middleware sees limited benefit there — but it doesn't hurt, httpx
on the client side decompresses transparently.

Placed after session middleware so gzip wraps the session-Set-Cookie
response too, and before CORSMiddleware so compression is applied to
both cross-origin and same-origin responses.

* feat(cli): auto-check for newer CLI version on startup

Server side
- GET /cli/latest returns {version, wheel_filename, download_url_path}
  for whatever wheel is currently in AGNES_CLI_DIST_DIR. Public,
  cacheable, no secrets — consumed by the CLI auto-update probe.

Client side
- New cli/update_check.py: reads /cli/latest with a 3s timeout, caches
  the result in $DA_CONFIG_DIR/update_check.json for 24h. Cache is
  invalidated when the installed version changes (e.g. after a fresh
  `uv tool install`) so stale "you're behind" warnings don't linger.
- Root typer callback fires the probe before subcommand dispatch; any
  failure is swallowed so a bad network never blocks a working command.
- Outdated → one-line stderr warning:
    [update] da 2.0.0 is out of date — latest on this server is 2.1.0.
    Upgrade: uv tool install --force <server>/cli/wheel/<…>.whl
- Disable with DA_NO_UPDATE_CHECK=1.

* fix(pr-review): None-guard the upgrade line + skip gzip on parquet paths

Two follow-ups from Devin review on #41.

1. format_outdated_notice(UpdateInfo(download_url=None)) emitted literal
   "uv tool install --force None" — copy-pasting that fails. Drop the
   upgrade snippet when the URL is absent and keep only the version line.

2. GZipMiddleware compressed everything over 1024 bytes, including the
   parquet FileResponses served by /api/data/{tid}/download,
   /cli/wheel/{name}, and /cli/download. Parquet is already columnar-
   compressed — gzip there is pure CPU + latency with no size win, and
   /api/data bodies can reach hundreds of MB. Wrap GZipMiddleware in a
   small _SelectiveGZipMiddleware that skips those path prefixes and
   delegates the rest to the stock middleware. JSON / HTML endpoints
   (manifest, /install, /api/version, …) still get compressed.

* release: bump to 2.1.0 — unify AGNES_VERSION with pyproject.toml version (#42)

Before: two independent version systems. pyproject.toml carried semver
(2.0.0 → wheel filename → `da --version`) while release.yml injected
CalVer into AGNES_VERSION (e.g. 2026.04.155 → /api/version). Users saw
different strings in the CLI vs. the /install page, and the CLI auto-
update check couldn't tell "new deploy, same package version" apart
from "new package version".

Make pyproject.toml [project].version the single product-version source
of truth. release.yml extracts it and feeds AGNES_VERSION, so every
surface (/api/version, /api/health, /cli/latest, `da --version`) agrees
on one number. The CalVer tag keeps doing what CalVer is for: release
identity on the git tag and Docker image tag (versioned_tag).

Also wires AGNES_TAG through the build: release.yml → Dockerfile ARG →
env, so /api/version.image_tag finally reports the actual image tag
instead of the "unknown" fallback.

Bump to 2.1.0 to reflect the PRs shipped on ps/wheel-name-fix: durable
sync (atomic writes + manifest MD5 + retry), server GZip, CLI auto-
update probe, setup snippet PEP 427 URL.

* fix(pr-review): directional version compare in is_outdated()

UpdateInfo.is_outdated() used `self.latest != self.installed`, which
fires in both directions. If the server is rolled back or the user
connects to an older deployment, the CLI would warn "out of date"
and — worse — the formatted notice would prompt

    uv tool install --force <older-version>.whl

i.e. an unintended downgrade.

Compare with packaging.version.Version (PEP 440 aware, handles pre-
release tags). Fall back to dotted-int tuple compare if packaging is
somehow missing, and return False on unparseable strings — better to
miss an upgrade hint than to silently suggest a downgrade.

Adds 4 test cases: installed older (True), installed newer (False),
10.0.0 vs 2.1.0 lexical-compare trap (correct), unparseable strings
(False).

Addresses Devin review on #43.

* fix(pr-review): read FastAPI app version from package metadata

app/main.py:80 hardcoded `version="2.0.0"` in the FastAPI constructor.
After #42 bumped pyproject.toml to 2.1.0, /api/version, /cli/latest,
and `da --version` all reported 2.1.0 while /openapi.json and the
/docs UI still advertised 2.0.0.

Read `agnes-the-ai-analyst` version via importlib.metadata (same
pattern cli/main.py:_cli_version already uses), with a `"dev"`
fallback when the package is not installed (source checkout). This
way pyproject.toml stays the single source of truth across every
version surface — /openapi.json now tracks the bump automatically.

Adds a dedicated test file to pin this behavior so a future
regression to a hardcoded literal fails at CI.

Addresses second Devin finding on #43.

* fix(pr-review): _fmt_bytes PiB label + negative cache in update_check

Two more follow-ups from Devin review on #43.

1. _fmt_bytes off-by-unit. The old loop exited at TiB but the fallback
   labelled PiB, so 1 PiB rendered as "1024.0 PiB". Restructure: put
   every unit inside the loop (KiB through EiB) so the division count
   always matches the label. Covers up to 1 ZiB cleanly; anything
   beyond renders as "<big>.0 EiB" rather than crashing.

2. Negative cache for failed /cli/latest probes. On a corporate
   firewall / VPN that silently drops packets, the 3s HTTP timeout
   fired on *every* `da` invocation. Writing a `latest=None` cache
   entry with a 5-minute TTL caps that at one probe per 5min. Successful
   probes still use the 24h TTL. Reading logic branches on whether the
   cached `latest` is None.

Adds TestFmtBytes (2 cases: small/medium sizes and the PiB/EiB fallback
regression), plus two TestSync update-check cases covering negative-
cache reuse and TTL expiry.
2026-04-22 21:18:18 +02:00
ZdenekSrotyr
d2c76cb221
User management + PAT + CLI distribution + HTML auth redirect (#9 #10 #11 #12) (#28)
* fix: redirect unauthenticated HTML routes to /login (#10)

* docs(plan): user mgmt + PAT + CLI distribution implementation plan (#9 #10 #11 #12)

* build(docker): produce wheel artifact for /cli/download (#9)

* feat(db): schema v5 — users.active + deactivated_at/by (#11)

* feat(api): /cli/download wheel + /cli/install.sh with baked server URL (#9)

* feat(users): repository supports active flag + count_admins (#11)

* feat(ui): /install page with per-deployment install instructions (#9)

* feat(api): user PATCH/reset-password/set-password/activate/deactivate (#11)

* fix(cli): da login prompts for password and sends it in body (#9)

* test(api): safeguard tests for self-deactivate and last admin (#11)

* feat(auth): reject requests from deactivated users (#11)

* fixup(#10): propagate next through /login buttons + lock down sanitizer tests

* feat(cli): da admin set-role/activate/deactivate/reset-password/set-password (#11)

* feat(ui): /admin/users management page (#11)

* feat(db): schema v6 — personal_access_tokens (#12)

* feat(users): access_tokens repository (#12)

* feat(auth): JWT carries typ (session|pat) and explicit jti (#12)

* feat(auth): reject revoked/expired PATs; update last_used_at (#12)

* feat(api): /auth/tokens CRUD + admin revoke; session-only guard (#12)

* feat(cli): da auth token create/list/revoke (#12)

* feat(ui): /profile page with PAT create/list/revoke (#12)

* docs: PAT usage and session/PAT TTL clarification (#12)

* feat(auth): PAT first-use-from-new-IP audit + last_used_ip (schema v7) (#12)

Closes remaining acceptance gap from issue #12: audit_log entry on first use
of a PAT from an IP that differs from the recorded last_used_ip.

- schema v7: personal_access_tokens.last_used_ip column
- AccessTokenRepository.mark_used now stores the client IP
- get_current_user extracts client IP (X-Forwarded-For first hop, fallback
  to request.client.host) and emits a token.first_use_new_ip audit when the
  IP changes on a subsequent use (not the very first use)
- tests: new-ip audit, same-ip no-op, first-ever-use no-op, schema v7 column

* fix: address Devin review findings on PR #28

- app/main.py: exclude /auth/* from HTML redirect handler so JSON
  endpoints under /auth/ (PAT CRUD used by `da auth token` CLI) keep
  their 401 JSON contract (Devin #1, bug)
- app/api/tokens.py: reject expires_in_days <= 0 explicitly; use
  `is not None` so 0 no longer silently creates a non-expiring token
  (Devin #2)
- app/api/users.py: validate role against Role enum in create_user
  to match update_user and prevent 500 on role-protected requests
  later (Devin #3)
- app/web/templates/admin_users.html: escape user-supplied strings
  before innerHTML; move onclick handlers to addEventListener via
  data attributes so emails with quotes / HTML no longer break the UI
  or enable stored XSS (Devin #4)
- app/auth/router.py, app/auth/providers/{password,google}.py:
  reject deactivated users at login instead of issuing a JWT that
  would then fail on the next request — removes the confusing
  redirect loop (Devin #5)
- CLAUDE.md: document schema v7 instead of stale v4 (Devin #6)
- tests/test_web_ui.py: regression test for the /auth/* JSON 401

* feat(web): add /profile and /admin/users links to dashboard nav

* feat(web): point setup banner at /install page

* chore(web): drop unused setup_instructions context

* fix: address Devin review round 2 on PR #28

- app/api/tokens.py: when expires_in_days is None (the "never" option),
  use a ~100-year JWT expiry so the token doesn't silently die in 24h
  via the session-default fallback in create_access_token. The real
  expiry enforcement stays in verify_token's DB-level check (Devin 🔴)
- app/web/templates/profile.html: escape t.name and other user-supplied
  strings via esc() helper before innerHTML, same pattern as
  admin_users.html. Move revoke onclick to data-attribute +
  addEventListener (Devin 🟡)
- app/api/cli_artifacts.py: use `mktemp -d` with X's at end of template
  for GNU/BSD portability, place wheel inside the temp dir and
  clean up with rm -rf (Devin 🚩)

* feat(web): redesign /install page; make curl one-liner primary, collapse manual

Rebuild the public /install page using the dashboard visual language
(shared header, card layout, gradient hero, design tokens from
style-custom.css). The page is now anchored on the one-liner install
path: curl -fsSL <server>/cli/install.sh | bash is rendered as the
primary, prominent step 1, while the old manual wheel-download flow
is tucked behind a closed-by-default <details> block for users in
restricted/offline environments.

Information architecture:
  hero (server URL + version)
  -> step 1: quick install (one-liner, big Copy button)
  -> step 2: create PAT on /profile + export DA_TOKEN / da auth whoami
  -> step 3: Claude Code / MCP via ~/.config/da/token.json
  -> collapsed "Manual install" details for download-wheel flow
  -> footer link to docs/HEADLESS_USAGE.md

Every shell snippet has a vanilla-JS "Copy" button that confirms
visually ("Copied!" for 1.5s) and falls back to textarea+execCommand
on non-secure contexts. No new dependencies, no bundler.

The route now also pulls an optional user so the header shows the
same nav (Dashboard / Profile / Logout) as dashboard.html when a
session exists, while staying fully public when signed out.

* fix(cli): use real wheel filename in install.sh (broken pip/uv install)

The installer wrote the downloaded wheel as agnes_cli.whl, which lacks a
PEP-427 version component — both pip and uv tool install reject it and
abort the one-liner.

Use curl -OJ so Content-Disposition determines the on-disk filename, then
resolve it via glob. Install an EXIT trap to remove the tmpdir even when
install fails.

* fix(web): correct manual install wheel glob and add PEP 668 / PATH hints

- Wheel glob is agnes_the_ai_analyst-*.whl (not agnes-*.whl) — the old
  pattern never matched the real artefact name from the build.
- Add — or — separator between uv tool install and pip install.
- Warn that pip install --user is blocked on macOS Homebrew / modern
  Debian (PEP 668) and recommend uv tool install as the default path.
- Both flows now show the ~/.local/bin PATH hint so a fresh shell can
  find the da binary after install.

* fix(web): consistent session.user reference in install header

The avatar-letter fallback inside {% if session.user %} was reading
user.name / user.email directly, but the route dependency can pass
user=None — those references resolved to an empty FlexDict and produced
an empty avatar circle. Read everything through session.user to match
the guard and the dashboard pattern.

* fix(web): point headless usage link at GitHub source

/docs/HEADLESS_USAGE.md 404s — no static route serves repo docs. Point
the footer link at the rendered markdown on GitHub instead of adding a
dedicated docs serving route just for one file.

* feat(web): /install hero size, anon sign-in banner, step 2 copy polish

- Bump hero h1 from 26px to 30px to match dashboard primary scale.
- Anonymous visitors see a small sign-in banner above Step 2 (creating
  a token requires auth; without the banner the flow appears stuck).
- Add an 'After generating your token' section label inside Step 2 so
  the /profile CTA button no longer looks wedged mid-sentence between
  adjacent paragraphs.

* chore(web): /install a11y + version pill polish

- aria-live='polite' on copy buttons so screen readers announce the
  'Copied!' state change.
- Replace redundant INSTANCE_NAME eyebrow (already in the header logo)
  with 'Getting started'.
- Hide the version pill when AGNES_VERSION is unset/'dev' — avoids the
  misleading 'vdev' label in local/unbuilt runs.
- Manual summary focus-visible outline-offset +2px (was -2px which
  clipped inside the card), and mark the chevron as decorative.

* fix(web): use session.user in dashboard avatar fallback

Inside {% if session.user %} guard, the avatar fallback referenced
(user.name or user.email). If user is None the block crashes when
the profile picture is absent. Align with the guard variable.

* fix: address Devin review round 3 on PR #28

- app/api/users.py: stop auto-sending email from reset_password. The
  magic-link sender would deliver a "Login Link" that — when clicked —
  consumes the reset_token via verify_magic_link and logs the user in
  WITHOUT prompting for a new password. Admins now share the raw
  reset_token from the API response manually, or use set-password
  directly. email_sent is always False. Documented inline. (Devin 🟡)
- app/api/cli_artifacts.py: harden /cli/install.sh generation against
  shell injection via Host header or AGNES_VERSION. base_url is
  validated against a strict scheme+host+port regex; version against
  an alnum + dot/dash/underscore allowlist. Both values are also
  piped through shlex.quote() as defense in depth. (Devin 🟡)

The shared users.reset_token column between magic-link and password-
reset flows (Devin 🚩) remains an architectural gap; splitting into
separate columns needs schema v8 and is tracked for a follow-up PR.

* docs, chore(grpn): manual-deploy helpers + hackathon deploy learnings

Adds scripts/grpn/ — Makefile + agnes-auto-upgrade.sh + README for
operating Agnes on GRPN's existing foundryai-development VM when the
full Terraform flow is blocked by org policies:

- iam.disableServiceAccountKeyCreation (org constraint) forbids SA
  JSON keys, so GCP_SA_KEY-based CI is unavailable
- No projectIamAdmin delegation → bootstrap-gcp.sh can't grant roles
- Secret Manager IAM bindings require setIamPolicy which editor lacks

Helper targets: deploy, deploy-tag, recreate, restart, stop, start,
status, version, logs, ps, env, ssh, tunnel, open, bootstrap-admin,
set-data-source, install-cron, uninstall-cron.

docs/superpowers/plans/2026-04-22-grpn-deploy-learnings.md — running
log of all org-policy constraints hit during the hackathon deploy,
with workarounds and derived follow-ups (WIF support, external_ip
variable, customer onboarding IAM checklist).

Not a replacement for the TF flow — stopgap until WIF lands.

* fix(web): make header logos clickable links to home

* feat(web): one-click "Setup a new Claude Code" button

Adds a single-button flow on the dashboard and /install page that
generates a fresh personal access token via POST /auth/tokens and
copies a complete, paste-ready setup script (server URL, token,
install/verify commands) to the clipboard. Falls back to a modal
textarea when the clipboard is blocked; redirects to /login on 401;
surfaces backend errors inline.

- dashboard.html: replaces the top "Set up your local environment"
  anchor with a real button wired to setupNewClaude(). Removes the
  duplicate bottom setup banner to keep a single entry point.
- install.html: for signed-in users, Step 1 leads with the one-click
  button and demotes the curl one-liner into a collapsible "Or run
  manually" aside. Anonymous visitors still see the curl flow plus a
  sign-in hint.
- No new deps. Vanilla JS. Token lives in memory/clipboard only —
  never rendered into persistent DOM.

* feat(cli): add "da auth import-token" for non-interactive PAT login

Writes a provided JWT into ~/.config/da/token.json using the canonical
{access_token, email, role} shape expected by save_token(). Decodes the
token locally to pull email/role claims, verifies it against the server
via GET /api/catalog/tables, and refuses to overwrite an existing token
file if the server returns 401. --email / --role overrides exist for
tokens missing those claims; --skip-verify bypasses the server round-trip
for offline / CI scenarios.

* test(cli): cover da auth import-token success + 401 + claim-fallback paths

Three new tests in TestAuthImportToken:
- valid JWT + 200 -> canonical token.json written
- 401 from /api/catalog/tables -> exit 1, existing token file untouched
- JWT without email/role claims -> refused without overrides, accepted
  with --email / --role flags

* feat(web): update one-click Claude setup instructions — explicit uv install, import-token, skills question

Replaces the fragile `cat > token.json <<EOF` clipboard payload with an
explicit, auditable sequence:

  1. `curl -fsSL /cli/download` + `uv tool install --force` (no opaque
     `curl | bash`).
  2. `da auth import-token --token ...` instead of hand-written JSON.
  3. Explicit PATH persistence for zsh/bash.
  4. A required question to the user about whether to copy the bundled
     skills into ~/.claude/skills/agnes/ or pull them on-demand via
     `da skills show`.
  5. A final confirmation step with whoami + version output.

Factored both pages to include a shared partial
(app/web/templates/_claude_setup_instructions.jinja) so dashboard.html
and install.html can never drift apart again. {server_url} and {token}
stay as runtime placeholders substituted by renderSetupInstructions().

* feat(ui): modernize /admin/users + unify header nav across pages

- New shared partial app/web/templates/_app_header.html — single source
  of truth for the top navigation. Used by base.html and dashboard.html
  (which doesn't extend base.html). Active page highlighted via
  request.url.path. Admin "Users" link gated by session.user.role.
- style-custom.css: add .app-header / .app-nav-link / .app-btn-logout /
  .app-avatar styles (mirrors dashboard's previous inline copy under
  app-* prefix). Mobile-friendly fallback at <720px.
- base.html: include the new partial so every page extending base
  (admin_users, profile, login_email, error, …) gets the same chrome
  the dashboard has.
- dashboard.html: replace its inline <header class="header"> markup
  with the shared partial. Inline .header CSS left in place as
  harmless dead code (separate cleanup PR).
- admin_users.html: rewritten with avatars, role pills (color-coded
  per role), toggle switch for active, search/filter input, toast
  notifications, modal dialogs replacing alert/confirm/prompt,
  one-click copy for the reset token, empty / loading states.
  All XSS-safe via the existing esc() helper + data-attribute
  event delegation.
- tests/test_web_ui.py: smoke test that /admin/users renders the new
  shared header chrome and the modernized markup.

* feat(api): serve CLI wheel at /cli/agnes.whl for direct uv install

uv tool install inspects the URL path suffix to recognise a wheel, so
/cli/download (which has no .whl suffix) cannot be installed directly.
Expose a stable /cli/agnes.whl alias over the same wheel lookup so users
can run: uv tool install --force https://<server>/cli/agnes.whl

* test(cli): cover da auth import-token --server persisting to config.yaml

The server persistence was already implemented in the import-token command
(save_config({server}) call) but not covered by tests. Add an explicit test
so the one-step setup contract — single import-token call writes both token
and server — cannot regress.

* feat(web): simpler Claude setup — single uv install URL, single import-token call

User feedback: the prior clipboard payload repeated the server URL and
token across multiple steps (curl + tmpfile + install + rm + separate
seed-config + import-token). Collapse to:

 1. uv tool install --force {server_url}/cli/agnes.whl  (single URL, direct)
 2. da auth import-token --token ... --server ...        (one call, persists both)
 3. da auth whoami
 4. skills (ask user first)
 5. confirm

uv accepts HTTPS URLs that end in .whl and installs them directly, so
the tmpfile dance is unnecessary. import-token --server already persists
the server to config.yaml, so no separate printf > config.yaml step.

* fix(tests): update admin users heading assertion after template rename

The admin_users.html template now uses <h2 class="users-title">Users</h2>
instead of <h2>User management</h2>. Update the assertion to match.

* feat(ui): unify header across remaining 7 standalone pages

These 7 pages render their own full <html> and don't extend base.html,
so the previous unification commit only covered base + dashboard. Each
had its own ad-hoc <header> markup with inconsistent classes
(.top-header / .header / .page-header), inconsistent nav-link sets,
and inconsistent avatar/email styling.

Replace each inline <header>...</header> block with the shared
{% include '_app_header.html' %} so /activity-center, /admin/permissions,
/admin/tables, /catalog, /corporate-memory, /corporate-memory/admin,
and /install all show the same chrome (Dashboard / Install CLI /
Profile / Users / email + avatar / Logout) with the active page
highlighted via request.url.path.

Old inline header CSS (.header, .top-header, .page-header, .nav-link,
etc.) is left in place as harmless dead code; it can be cleaned up in
a follow-up sweep.

* feat(web): add readable preview of Claude setup payload on dashboard + /install

Move the line-by-line setup instructions into app/web/setup_instructions.py
as the single source of truth, then render them in two modes from the
existing _claude_setup_instructions.jinja partial:

- preview_mode=True  → visible, read-only <pre><code> block with the real
  server URL and a clearly-styled placeholder token (never a real one).
- preview_mode=False → the JS SETUP_INSTRUCTIONS_TEMPLATE used by the
  one-click flow (unchanged behaviour).

Both /dashboard (env-setup-cta card) and /install (Step 1 card) now show
the preview directly under the 'Setup a new Claude Code' button so users
can see exactly what will land in their clipboard before they click.

* feat(web): update setup instructions — `da diagnose` step, explicit section titles

Rework the Claude Code setup payload to:

- Give every numbered step an unambiguous verb header ("1) Install the CLI",
  "2) Log in", "3) Verify the login", "4) Run diagnostics", "5) Skills (ask
  the user first)", "6) Confirm").
- Add step 4 `da diagnose` as the post-login health check. The CLI already
  ships this command (cli/commands/diagnose.py); it prints "Overall:
  healthy" and a list of green checks that map cleanly to next actions.
- Ask the skills copy-vs-on-demand question verbatim so Claude Code always
  prompts the user the same way.
- Replace the terse "Confirm" line with a 4-bullet summary (version,
  whoami, skills choice, diagnose status) so the return message is
  structured and comparable across setups.

* chore(web): remove stale MCP card from /install (no MCP server today)

The 'Use with Claude Code / MCP' card (Step 3 on /install) referenced an
MCP integration Agnes does not ship. Remove the whole card. The one-click
'Setup a new Claude Code' flow in Step 1 already covers the long-lived
client use case and is less confusing than dangling persistence tips for
a non-existent integration.

* feat(api): include user_email + last_used_ip + user_id in admin tokens list response

Adds AdminTokenItem response model (superset of TokenListItem) and
AccessTokenRepository.list_all_with_user() joining personal_access_tokens
with users to denormalize user_email. Needed for /admin/tokens UI where
admins triage tokens across all users.

* feat(web): /admin/tokens page — list, filter, search, revoke across all users

Adds a new admin-only page with client-side filtering (status, user email,
last-used window), column sorting, counts bar (active/revoked/expired),
and an inline revoke action. Mirrors the /admin/users visual language.

* feat(web): add Tokens nav link for admins + deep-link from admin/users row

Admin-only nav entry to /admin/tokens, and a per-row Tokens button on
/admin/users that prefills the token page's user filter via ?user=<email>.

* test(admin): cover /admin/tokens rendering, filter state, non-admin denial, revoke

Verifies admin can render the page (title + JS hooks present), a non-admin
is blocked, unauthenticated users are redirected, the admin list response
includes user_email / user_id / last_used_ip, and admin can revoke another
user's token.

* feat(web): modern redesign of /admin/tokens — hero, stat strip, refined table, responsive cards, a11y

* feat(web): ditch the table — /admin/tokens as a card stack, modern GitHub-style list

Replaces the table-based layout with a stack of self-contained token cards
inside a <ul role=list>. Each card is a flex row: avatar + name/meta on the
left, last-used block in the middle, status pill + outlined 'Revoke' button
on the right. Status and sort controls are pill-shaped toggle chips; user
email search has an inline search icon. No <table>/<tr>/<th>/<td> anywhere.
Responsive below 720px (card stacks vertically) and 480px (stat chips 2x2).
Preserves filter IDs (flt-status, flt-user, flt-last-used) and data-revoke
for existing tests.

* feat(web): add /tokens (role-aware) — single page for both user PAT CRUD and admin overview

- Rename admin_tokens.html -> tokens.html with a new is_admin context flag.
- New route GET /tokens: renders the same card-stack UI for everyone.
  * Admins: loads /auth/admin/tokens, shows owner column + stat strip, keeps
    the owner-email search box and sort-by-owner chip.
  * Non-admins: loads /auth/tokens (own tokens only), hides owner column +
    stat chips, adds a 'New token' CTA in the hero that opens a modal
    (name + expires_in_days) calling POST /auth/tokens. The raw token is
    revealed once in a dismissable banner and cleared from the DOM on Hide.
- GET /admin/tokens now 302-redirects to /tokens, preserving query string
  (so the /admin/users deep-link ?user=foo still works).

* feat(web): /tokens full-bleed layout to match dashboard width

The hero, toolbar, and card list used to sit inside base.html's .container
(max-width 800px). Break out with negative horizontal margins so the page
spans the viewport like /dashboard does, capped at 1440px for readability
on very wide screens with a 24px gutter on each side.

- No change to base.html itself. The override is scoped to .tokens-page.
- body { overflow-x: hidden; } guards against rare horizontal scrollbars.
- < 808px viewport: reset to natural flow (mobile already narrower).
- ≥ 1488px viewport: cap to 1440px and re-center.

* chore(web): remove /profile template + nav link (redirect /profile -> /tokens)

The old /profile PAT CRUD page is now redundant — the modern /tokens page
covers both user and admin flows. Delete the template; the router's
/profile handler already 302-redirects to /tokens.

Nav cleanup:
- Remove the 'Profile' link.
- Show a single 'Tokens' link to every signed-in user (previously only
  admins saw it).
- Active-state matches /tokens, /admin/tokens, and /profile so the
  highlight survives the redirect chain.

/install CTA now points at /tokens instead of /profile.

* test: cover /tokens for admin + non-admin flows, /profile redirect, nav update

tests/test_admin_tokens_ui.py
- Point admin rendering test at /tokens directly and tighten assertions
  (admin-only stat strip + owner search, non-admin CTA absent).
- Add test_non_admin_can_render_tokens_page: personal body, New-token CTA,
  create-modal, reveal banner; stat strip + owner search absent.
- Add test_admin_tokens_redirects_to_tokens: 302 to /tokens, query string
  (?user=...) preserved for the /admin/users deep-link.
- Add test_profile_redirects_to_tokens: 302 to /tokens.
- Add test_non_admin_can_create_pat_via_tokens_page_api: exercises the
  POST /auth/tokens call that the non-admin create-modal submits.

tests/test_pat.py
- test_profile_page_renders -> test_profile_page_redirects_to_tokens:
  assert the 302 + that /tokens lands on the unified non-admin body.

tests/test_web_ui.py
- admin_users nav assertion: 'Tokens' link present, 'Profile' link absent.
- Add test_nav_shows_tokens_link_for_non_admin: non-admins see the same
  'Tokens' link (previously only admins did).
- Add test_profile_redirects_to_tokens back-compat check.

* feat(web): collapse 'What Claude Code will receive' by default

The preview block on /dashboard and /install now uses <details>/<summary>
so it is hidden by default. Click the chevron/title to expand and review
the clipboard payload. Markup stays in the DOM so existing tests that
assert on content continue to pass.

* fix(web): /tokens width — override .container to 1280px like dashboard

The negative-margin full-bleed trick was fragile and pushed content past
the right edge on deployed viewports. Replace with a simple max-width
override of base.html's .container on this page only, matching
/dashboard's 1280px center-column layout.

* feat(web): split role-aware /tokens into my_tokens.html + admin_tokens.html

* feat(web): router — separate handlers for /tokens (own) and /admin/tokens (all)

* feat(web): nav — show Tokens for all, add All tokens for admins

* test: cover split token pages (own vs all) + admin access gating

* feat(web): move 'My tokens' into a user dropdown menu

Replaces the separate Tokens/email/Logout nav trio with a rounded
avatar trigger that opens a dropdown containing the user's email,
role, a 'My tokens' link, and Logout. Admin-only 'All tokens' stays
as a top-level nav item since it's an admin function, not a personal
one. Click-outside and Escape close the panel; chevron rotates on
open.

* fix(api): allow PATs to list/get/revoke their own tokens (CLI flow)

The documented 'da auth token list/revoke' CLI flow in
docs/HEADLESS_USAGE.md uses a PAT, but the previous dependency
(require_session_token) returned 403. Only create_token must be
session-only to prevent PAT-spawning-PAT chains; listing and
revoking your own tokens is safe with a PAT.

* fix(api): cap expires_in_days at 3650 to avoid datetime overflow (500 to 400)

Values above ~11 million days overflowed datetime.max in
datetime.now(utc) + timedelta(days=...) and surfaced as an
unhandled OverflowError → 500. Cap at 10 years with a clear
400 instead; the no-expiry code path is unaffected.

* fix(api): relax _SAFE_URL_RE to allow path prefixes, underscores, and IPv6

The previous regex rejected legitimate reverse-proxy base_url values
(https://host/agnes/), underscores in Docker Compose hostnames, and
IPv6 literals (http://[::1]:8000). Widen the charset and allow an
optional trailing path. shlex.quote continues to provide
defense-in-depth against any metacharacter that slips through.

* fix(web): /login/email and Google OAuth propagate next_path

Previously, /login/email silently dropped the ?next=<path> query
param so the hidden form field rendered empty and login always
landed on /dashboard. Google's button was hard-coded to
/auth/google/login, ignoring next entirely.

- /login page now appends ?next to the Google button URL
- /login/email reads + sanitizes next, passes as template context
- google_login stashes sanitized next_path in session['login_next']
- google_callback pops + re-sanitizes and redirects there

Sanitization factored into app/auth/_common.safe_next_path.

* fix(auth): differentiate argon2 VerifyMismatchError from internal errors in web login

The previous except (VerifyMismatchError, Exception) collapsed both
cases into the generic 'invalid credentials' redirect, silently
hiding corrupted-hash / library errors from ops. Split the two:
bad password still gets ?error=invalid; anything else logs via
logger.exception and redirects with ?err=auth_internal so ops have
a visible signal and users don't retry forever against a broken
password_hash column.

* docs: correct CLAUDE.md table name (personal_access_tokens)

v7 note referenced 'access_tokens.last_used_ip' but the real table
is personal_access_tokens (as mentioned two tokens earlier in the
same bullet). Same-file consistency fix.

* chore(web): clarify admin user-reset UI — encourage Set password over the unused reset_token

POST /api/users/{id}/reset-password stores and returns a token
but no endpoint consumes it — the magic-link sender would log the
user in without prompting for a new password, defeating the reset.
- Drop the 'Reset' row action from admin_users so admins aren't
  pointed at a dead end.
- Rewrite the reveal-modal copy to tell admins to use Set password
  and explicitly note that the magic-link flow isn't available
  for reset tokens in this build.
The API endpoint stays for API-level future use.

* test: cover PAT CLI flow, expires_in_days overflow, proxy base_url, next propagation

- tests/test_pat.py: PAT can list own tokens (200, was 403);
  PAT can revoke own tokens (204); create_token returns 400 for
  expires_in_days > 3650 (was 500 via datetime overflow).
- tests/test_cli_artifacts.py: _SAFE_URL_RE accepts reverse-proxy
  path prefixes, underscores, and IPv6 literals; end-to-end check
  of cli_install_script with a stubbed base_url that includes
  a path prefix (Agnes behind /agnes/).
- tests/test_web_ui.py: /login propagates ?next to the Google
  button URL; /login/email renders next in the hidden form field
  and strips hostile values; unit coverage of safe_next_path.

* fix(security): use \Z instead of $ in URL/version allowlists (trailing-\n bypass)

Python regex `$` also matches just before a trailing newline, so a Host
header or AGNES_VERSION value like "good.example.com\n$(rm -rf /)"
would slip past the allowlist. `\Z` anchors to strict end-of-string.

shlex.quote downstream remains as defense-in-depth, but the allowlist
is now the tight gate it claims to be.

* fix(auth): PAT with null expiry omits JWT exp claim (DB is the source of truth)

Previously a PAT created with `expires_in_days=null` (user-requested
"never expires") set the DB `expires_at` to NULL (correct) but still
baked a ~100y `exp` claim into the JWT. That is misleading: the PAT
silently did expire eventually, despite the UI and API promising
"no expiry".

`create_access_token` now accepts `omit_exp=True` to skip the `exp`
claim entirely. `app/api/tokens.py` passes that when `expires_in_days
is None`. The authoritative expiry check lives in
`app/auth/dependencies.py`, which reads `expires_at` from the DB row —
unchanged. PyJWT accepts claim-less JWTs indefinitely.

* test: cover trailing-newline regex bypass + no-exp JWT for unbounded PAT

- test_safe_url_re_rejects_trailing_newline_bypass: asserts both
  `_SAFE_URL_RE` and `_SAFE_VERSION_RE` reject values with a trailing
  `\n` (previously accepted because Python `$` matches before `\n`).
- test_pat_null_expiry_jwt_has_no_exp_claim: POST /auth/tokens with
  `expires_in_days=null`, decode the returned JWT, assert `exp` is
  absent while `typ=pat`, `sub`, and `jti` are still present.
- test_pat_with_null_expiry_is_accepted_by_verify_token: verify_token
  round-trips a claim-less JWT without ExpiredSignatureError.
- test_pat_null_expiry_end_to_end_allows_authenticated_request: use
  the null-expiry PAT against /auth/tokens and confirm it authenticates.

* docs(auth): document X-Forwarded-For trust model in _client_ip

Deployment runs behind Caddy which strips incoming X-Forwarded-For
and sets its own, so the leftmost hop is trustworthy. Clarify that
the stored last_used_ip is audit-only and never used for access
control — if the app is ever exposed directly, this value becomes
client-settable.

* docs: /profile → /tokens in install.sh next-steps, CLI error, HEADLESS_USAGE, security skill

After splitting PAT management to /tokens (with /profile as a back-compat
302), stale references remained in user-facing text. Update them to the
canonical /tokens URL so shell scripts, CLI error hints, docs, and the
bundled security skill are all consistent.
2026-04-22 14:24:28 +02:00