Add a generic, placement-aware mechanism for operators to inject HTML/JS
into every page that extends base.html or base_login.html. Each entry
takes name, enabled, placement (head_start | head_end | body_end), and
html. Replaces the need for per-vendor helpers when shipping feedback
widgets, analytics, or error-capture snippets.
Trust boundary mirrors the existing instance.logo_svg / instance.overview
pattern — admin-only, rendered with `| safe`. Resolved by
app/instance_config.py::get_custom_scripts(), surfaced in
/admin/server-config via _KNOWN_FIELDS["instance"]. Empty default keeps
the OSS vendor-neutral; sample Marker.io block ships commented out in
config/instance.yaml.example as the canonical example.
* feat(web): value-first /home reskin (CEO mock palette + pillars + first-session)
Restructures `/home` to lead with product value instead of install steps,
matching the CEO mock proposed for the homepage:
- New intro hero on top — eyebrow `Welcome, {{ display_name }}`, H1
`{{ instance_brand }} is your team's AI workspace`, lede framing the
product as an "AI Chief of Staff", two CTAs (`Set up in ~15 min →`
jumps to the wizard, `Just browse — no install needed` jumps to
`#look-around`), and a four-pillar row (Data packages · Plugins ·
Skills · Memory). Renders for both onboarded and not-onboarded users
so the value framing is consistent across visits.
- New `first-session` narrative — five-beat walkthrough (launch → pick
project → memory loads → ask → close) with mock terminal frames
carrying traffic-light dots, prompts, and dimmed system output.
- Setup wizard chrome — progress chip (`Step 1 of N · ~15 min ·
One-time · Reversible`), thin progress bar, and per-step number
badges on each `.install-block` so the wizard reads as bounded
instead of an open-ended scroll.
- Palette shift from blue to green/navy: `--hp-primary` aliases
`#2ea877` (mint), `--hp-hero-bg` is navy `#0f1b3a`, code panels stay
near-black `#0c1224` with warm-yellow `#ffd866` accents. The token
alias is reused so downstream rules pick up the new accent
automatically; instance theme overrides via
`config.theme_overrides()` still win.
- VS Code surface tile carries a `Recommended` pill; the existing
"Want to look around first?" section is renamed to `Explore your
workspace` and gets the `#look-around` anchor.
All test-pinned class names and IDs (`install-hero`, `install-block`,
`home-mock`, `self-mark-btn`, `setupClaudeBtn`, `offboard-strip`,
`home-getting-started`, `home-gs-item`, `home-overview`,
`home-usage`) preserved as structural anchors; new visual language
overlays via additional classes. Existing onboarded/not-onboarded
branching, `/api/me/onboarded` POST, status frame gating, post-CTA
modal, and OS-tab switching JS unchanged. Stray `~/FoundryAI`
comment swapped for `~/{{ workspace_dir }}` to honor the
vendor-agnostic OSS rule.
51 home tests pass without modification.
* fix(web): /home palette inversion — dark intro hero on top, light setup card below
Previous reskin commit kept the install-hero as a dark navy gradient and
rendered the new intro hero as a light surface — opposite of what the CEO
mock specifies. Playwright comparison vs `data/ceo_home.html` confirmed:
- CEO mock: dark navy hero at TOP (with white pillars on navy), LIGHT
white setup card BELOW with light step rows and dark code panels
inset.
- Previous: light intro hero on top, dark setup card below. Inverted.
This patch flips both:
- `.home-hero-intro` now: dark navy gradient `#0f1b3a → #1a2a5f`, green
radial glow in the corner, green eyebrow, white H1 (`accent` span
green), rgba-white lede, green pill primary CTA, translucent-white
secondary CTA, pillars row separated by hairline border-top with
green square-dot bullets in front of each pillar header.
- `.install-hero` and `.install-block` now: white surface card with
thin green accent strip across the top, light step rows split by
hairline borders, green-tinted step-number circles (`#e6f9f0` bg,
`#1f8a5e` ink), green progress chip + bar. Code panels
(`.install-cmd`) and terminal frames stay dark — they're the "type
this" surfaces.
- All previously-rgba-white descendants of `.install-hero`
(close button, eyebrow, h1, lead, links, code chips, OS tabs,
install notes, setup-CTA button, self-mark fallback, auto-detect
badge, terminal-howto disclosure) re-skinned for light surface.
All 12 home page tests still pass (no markup changes, only CSS).
* fix(web): /home parity polish — system font + mock sizes + blue info hint + gray step-num
After v2 palette flip, user comparison vs CEO mock surfaced three
remaining gaps in the wizard area:
- Font stack mismatch: Agnes inherits Inter via `style-custom.css`,
but the CEO mock uses the platform system stack (San Francisco on
macOS, Segoe UI on Windows). The rendered weight/letterforms read
noticeably different. `.home-mock` now declares
`-apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif`
for itself and all descendants, with the monospace stack reserved
for `code`/`kbd`/`pre`, `.install-cmd`, and `.terminal-body`.
- Step number badges were green-tinted; mock uses neutral gray
(`#f0f2f6` bg, `#4a5168` ink) — green is reserved for the "done"
state. Switched to `--hp-surface-dim` + `--hp-text-secondary`.
- "Don't have a terminal open?" disclosure was an amber/yellow
variant left over from the old dark-hero palette. Mock uses a
blue info-hint vocabulary (`--info-bg: #eef3ff`,
`--info-line: #4f7cf2`, `--info-ink: #1c3994`) with white kbd
chips. Added the info-* tokens to the `:root` block and re-skinned
`details.terminal-howto` (incl. summary, body, kbd) to match.
Step-body type sizes also brought in line with the mock spec —
`.install-block .label` (step h3 equivalent) is now 17px / 700 with
6px gap; `.install-note` body type is 14px / 1.55.
`--hp-info-bg / --hp-info-ink / --hp-info-line / --hp-warn-bg /
--hp-warn-ink / --hp-warn-line / --hp-surface-dim` added as
first-class tokens so future hint/warn callouts pick the same colors
without a duplicate vocabulary.
12/12 home tests pass.
* feat(web): centralize design tokens + reword /home wizard to 6 steps (CEO mock parity)
Two intertwined changes that touch both global design + /home structure:
GLOBAL TOKEN SHIFT (app/web/static/style-custom.css)
- `--primary` flipped from blue `#0073D1` to green `#2ea877` — same brand
alias the rest of the app referenced, so every page picks up the new
accent automatically. Old `--primary-dark` / `--primary-light` recolored
to match.
- New tokens added: `--brand-accent`, `--hero-bg`, `--hero-ink`,
`--surface-dim`, `--info-bg/ink/line`, `--warn-bg/ink/line`. Brings
the global vocabulary in line with the CEO mock's `:root` block so
callouts and hero surfaces don't have to invent local tokens.
- `--font-primary` switched from Inter-led stack to the system stack
(`-apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Inter",
system-ui, sans-serif`) so weight/letterforms render identically on
macOS (San Francisco) and Windows (Segoe UI) — matches the mock and
avoids a font-loading flash for analysts without Inter installed.
- Shadow tints re-cast in navy `rgba(15,27,58,...)`; focus ring uses
the new green `rgba(46,168,119,0.25)`.
- `.app-nav-link` font-size 13px → 14px, padding 6px 12px → 8px 14px,
hover bg → `--primary-light` (mint), color → `--primary-dark`.
`.app-nav-menu-item.is-active` re-tinted to the same green system.
- Sweep across 26 templates (style-custom.css + 25 template files)
replacing every hardcoded `#0073D1` / `#005BA3` / `#E6F3FC` /
`rgba(0,115,209,…)` / `rgba(0,86,163,…)` with token references or
the new green hexes — 175 occurrences total. Pages that styled their
own buttons / borders / shadows pick up the new brand color without
per-page overrides.
/HOME WIZARD: 6 STEPS PER MOCK (app/web/templates/home_not_onboarded.html)
- Step 1 reworded `Install Claude Code on your computer` + `~3 min`
subhead (mock copy).
- Step 2 renamed `Pick a folder for {{ instance_brand }}` (was
`create your workspace folder`) — same `mkdir` command, mock-aligned
framing.
- NEW Step 3 `Open a terminal inside that folder` — no shell command,
just the "you are standing in the right directory" reassurance with
a Finder/PowerShell/file-manager howto disclosure. Mirrors the CEO
mock's Step 3.
- Step 4 (was Step 3, gated by `home_automode.show`) renamed
`Launch Claude with auto-approve on`. Body copy lightly updated so
it references "the next step" instead of "Step 4".
- Step 5 (was Step 4) renamed `Get the install script and paste it
into Claude`. The setup-cta-lead now explicitly says
"pasting the script into Claude Code will install {{ instance_brand
}}…" so existing test assertions pinning the `install Agnes`
substring still match.
- NEW Step 6 `Optional: create a one-word shortcut for next time` —
prints an `echo 'alias {{workspace_dir|lower}}=…' >> ~/.zshrc`
one-liner for Unix and an `Add-Content $PROFILE …` equivalent for
Windows. OS tabs + copy buttons reuse the existing wizard chrome.
- Progress chip dynamic: `Step 1 of 6` when home_automode is on,
`Step 1 of 5` when off. Progress bar fill `100 // total_steps` so
the bar sits at 16-20 % on first paint.
- `.step-lede` token added for the new short body copy beneath each
step label (14.5px / ink-soft).
- `macOS / Linux / WSL` tab labels changed to `macOS / Linux` per
user instruction. Terminal-howto `WSL:` paragraph dropped; the
paste-shortcut hint now reads `(Linux)` instead of `(Linux/WSL)`.
Functional WSL handling in `connector_prompts.py` (it's a Linux
detection fallback, not user-facing label) preserved.
- `setup_instructions.py` Claude Code install hint:
`npm (Linux / WSL)` → `npm (Linux)`.
SURFACES — 4 CARDS PER MOCK
- Replaced the 3-tile `.home-usage-grid` with a 4-card grid:
- VS Code (Recommended) — `.surface-card.feature`, green ring,
DAILY USE eyebrow + 5-step numbered list + `Open VS Code setup
guide →` link to `/setup-advanced#vscode`.
- Terminal — QUICK ACCESS eyebrow + 4-step list.
- Claude Code (Desktop app) — CONNECT IT eyebrow + 4-step list.
- Cowork (claude.ai) — `.surface-card.incomplete`, warn-tinted
border + `Instructions needed` badge + a TODO callout describing
the missing content. The card is intentionally honest about the
gap rather than hiding it.
TEST UPDATES
- `test_web_home_page.py` negative onboarded-state assertions
rebased on the new step labels (6 entries instead of 4).
- `test_home_route_resolution.py` `test_home_renders_automode_block_by_default`
+ its `_when_env_off` counterpart now check the new
`Step 4 — Launch Claude with auto-approve on` label.
* fix(web): /home section content + layout — verbatim mock match
User comparison flagged several remaining gaps; this patch rewrites
the three lower sections of /home to match the CEO mock spec exactly:
FIRST-SESSION (5 beats)
- h2 28px / 700 / -.5px tracking (was 19px / 600).
- lede 18px ink-soft (was 13.5px secondary).
- `.session-walk` wrapper, 36px gap between beats (mock spec).
- `.session-step` grid 48px / 1fr, gap 22px — number circle on
the left, content on the right.
- `.session-num` 40 × 40 circle with SOLID GREEN bg (`--primary`)
and WHITE text + soft green shadow (was 28px mint pill w/
dark-green text).
- `.session-content h3` 18px / 600 (was 14.5px / 600).
- `.session-content > p` 15px.
- `.session-content .annotation` 13.5px ink-muted body type with
`strong` for highlighting (replaces the upper-case "WHAT'S
HAPPENING" eyebrow pattern that didn't match the mock).
- `.session-intro` callout card (white surface + mint icon block)
framing the "five beats" tagline.
- `.session-tldr` summary box (brand-light bg + brand-dark left
border) wrapping up the loop.
- Terminal frames re-skinned: `#0c1224` body / `#182241` bar /
real macOS traffic-light colors `#ff5f57` / `#febc2e` / `#28c840`.
- Terminal body 13px / 1.65 line-height with mock-spec class
vocabulary: `.you` (yellow input), `.ai-name` (brand bold),
`.path` (light blue), `.dim` (translucent code-ink), `.caret`
(blinking cursor).
- Five beats rewritten with mock's exact narrative flow (launch →
menu → pick → ask → close), vendor-agnostic project names
(`RevenueAnalysis`, `Onboarding`, etc.) replacing the customer-
specific `GRPN_*` examples in the mock. Templated `{{
instance_brand }}` / `{{ workspace_dir }}` / `{{ workspace_dir |
lower }}` (the shortcut alias) everywhere.
SURFACES (4 cards)
- The section is no longer wrapped in a white rectangle; the
`.home-usage` class loses its bg + border + padding (mock has the
cards directly on the page bg).
- h2 28px (was 22px). Eyebrow 12px / 1.5px tracking / brand-dark.
- `.surface-card.feature` (VS Code) now uses 2px green border +
vertical brand-light → white gradient (was 1px ring).
- `.surface-card.incomplete` (Cowork) uses 2px red border (`#e35e5e`)
+ vertical red-tint → white gradient (was yellow flat bg).
- `.surface-card .steps` panel: inner surface-dim bg + 8px radius
+ 13px font.
- `.surface-foot` top-border + ink-muted (mock spec).
- `.badge-warn` now a solid red box (`#e35e5e` bg + white ink + 4px
radius) instead of a yellow pill, matching the mock.
- Header layout fixed: the global absorbed `header { display: flex;
justify-content: space-between }` rule was making the h2 sit on
the right of the eyebrow; explicit `display: block` override on
`.home-mock section > header` puts the title on the LEFT under
the eyebrow as the mock has.
BROWSE — Explore your workspace
- Wrapped in `<section class="browse-section">` with proper
eyebrow + h2 + lede (was a bare `.section-label` div).
- `.browse-grid` 5-col grid (was responsive auto-fill, 4-card
layout). Skills tile added as a 5th card linking to
`/marketplace?type=skills`.
- `.browse-card` mock-spec: 22 20 padding, 28px icon, 15px title,
12.5px ink-muted desc, hover lifts -2px with brand border +
shadow-md.
Section wrappers (`.home-usage`, `.first-session`) no longer carry
the white card chrome — they sit directly on the page bg, matching
the mock. Only Getting Started + Overview keep their white cards.
GLOBAL eyebrow vocabulary (`.home-hero-intro .eyebrow`,
`.first-session > .eyebrow`, `.surfaces > header .eyebrow`,
`.browse-section .eyebrow`) all aligned to mock spec: 12px / 700 /
1.5px tracking / brand-dark color / 14px bottom margin.
Hero h1 bumped to 44px / 800 / -1px tracking (was 32px / 600).
51/51 home tests pass.
* fix(web): /home session-intro card + terminal-body verbatim mock match
User comparison flagged three remaining /home gaps; this patch
addresses each:
- `.session-intro` rule was missing — the "five beats" tagline
rendered as a bare line with no card chrome. Added the mock-
spec card: white surface, 14px radius, 20×24 padding, 1px
border + shadow-sm, with a 44×44 brand-light icon block on the
left.
- Beat 1 terminal-title was `~/{{ workspace_dir }} — zsh` (mock-
style shell-pwd format), but the user wants every terminal
frame across all 5 beats to read `claude — {{ instance_brand }}`.
Updated.
- Terminal-body line structure for beats 2-5 rewritten verbatim
from the CEO mock:
- `<span class="prompt">></span><span class="you">…</span>`
now has no space between the prompt and user input (mock
pattern: zero gap, the .prompt's `margin-right: 8px` provides
the visual separation).
- Beat 2 menu items use `<strong>[N]</strong>` numbering with
project entries on indented lines, each project name followed
by a `<span class="dim">(N ago)</span>` timestamp at a fixed
column — instead of my prior single-line concatenation.
- Beat 3 narrative split into 4 stanzas separated by blank lines
(matches mock): the "Switched to <strong>X</strong>" status,
then dim Loaded/Last-session lines, then a stand-alone "One
unprocessed input detected:" pair, then the "Want me to
process …" question. My prior version dim-wrapped the entire
block, which looked off.
- Beat 4 narrative split into headline summary + risks section
with <strong> heads + bullet lists separated by blank lines,
matching the mock's "Q1 close summary" / "Open risks" rhythm.
The Q1 question carries the mock's manual line-break + 2-
space continuation indent inside the `.you` span — without
that, terminal-body's `white-space: pre-wrap` would auto-wrap
awkwardly at a different column than the mock.
- Beat 5 exit narrative uses two separate dim lines + a
standalone `.ai-name` "See you next time." line, then prompt
+ caret. My prior version collapsed everything into one dim
block.
- Project names changed from customer-specific (`GRPN_*`) to
generic (RevenueAnalysis, WeeklyReview, Onboarding, OpsDb,
HRHandShake) so the OSS distribution stays vendor-agnostic
per CLAUDE.md.
- `Marketing plan` examples replaced with `Q1 close` so the
narrative stays plausible for an analyst audience.
12/12 home tests pass.
* fix(web): /home surfaces verbatim mock — VS Code thumb, Terminal expected-output, NEW badge
User comparison flagged three remaining surface-section gaps:
- VS Code surface card was rendering a generic "Screenshot pending"
placeholder; the mock has a labeled inline mockup
(`<a class="vscode-thumb">` w/ `.thumb-fallback`) showing the
recommended 4-pane layout (EXPLORER yellow, TERMINAL 1 purple,
TERMINAL 2 green, TERMINAL 3 orange) on a dark navy bg + a
"Recommended layout" caption pill. CSS `.vscode-thumb` block
added — uses gradient-strip backgrounds to draw the colored
panel bars without needing a base64 image.
- "Recommended" badge was a pill (999px radius) with
`--brand-accent` bg + navy text. Mock uses `.badge` instead of
`.recommend-pill` — solid `--primary` (brand-dark green) bg
with WHITE text and 4px radius. Replaced the class + CSS rule
so the badge reads as a tag, not a pill.
- Terminal surface card was missing the "What you should see"
subsection — mock has an `.expected-output` block showing a
sample of the welcome menu inside a dim dashed panel. Added the
block with the mock's exact rendered output (templated to
`{{ instance_brand }}` + generic project names instead of
customer-specific GRPN entries) plus the `.expected-output`
CSS (surface-dim bg + dashed border + `::before` "WHAT YOU
SHOULD SEE" eyebrow per mock spec).
Also addressed the explore-section feedback:
- Skills browse-card now carries the `new` class so it picks up
the `.browse-card.new::after` corner badge ("NEW", green bg,
white text, 10px / 700 / 0.5px tracking) per mock.
- Browse cards align same height via `align-self: stretch` (grid
default) + `flex-grow: 1` on `.browse-desc` so descriptions
fill remaining vertical space; previously the Skills tile sat
shorter because its desc text was longer than others'.
Structural HTML changes to all four surface cards: dropped the
inner `<div class="surface-card-head">` wrapper + `<p
class="surface-pitch">` class in favor of mock's flat layout
(`.what` + `.steps` + `.when-to-use`). `<ol class="surface-steps">`
replaced with `<div class="steps"><strong
class="steps-eyebrow">DAILY USE / QUICK ACCESS / CONNECT IT</strong>
<ol>...</ol></div>` so the eyebrow + numbered list share the
mock's tinted surface-dim panel.
12/12 home tests pass.
* fix(web): align /home setup walkthrough to design spec
- Setup-section header (eyebrow + heading + lede) floats above the
install hero; install card has no accent strip; step labels drop
`Step N —` prefix; closing strip is single flex row.
- VS Code surface card renders recommended-layout screenshot from
`/static/img/vscode-layout.png` with click-to-enlarge lightbox.
- Workspace install path cascades to `~/Desktop/{workspace_dir}` in
every step, surface card, first-session annotation, and shortcut.
- Step 1 verify text restores Enterprise — Finance and Legal option.
- Step 6 shortcut installs a shell function with arg forwarding
(`"$@"` unix / `@args` windows) and a user-facing Auto / YOLO
permission-mode toggle.
- Step 5 manual-fallback details inline on the CTA row; description
reads at step-lede size, not 13px chip.
- Setup-section heading no longer right-aligns (was inheriting
`header { display: flex; justify-content: space-between }` from
the legacy stylesheet; wrapper changed to `<div>`).
- Getting Started `<details>` block removed (duplicated links).
* test(web): align /home tests with restructured setup wizard
- Replace test_getting_started_card_renders_on_home with
test_setup_section_renders_for_not_onboarded — asserts the new
setup-section-header floats above the install hero and Getting
Started markup is absent (block removed in the prior commit).
- Update automode-block test to match labels without the
`Step N —` prefix.
- Update setup-CTA partial test to match the relabeled
"Copy install script to clipboard" button.
Drop orphaned CSS for `.home-getting-started`, `.home-gs-summary*`,
and `.home-gs-item` — selectors had no matching markup after the
Getting Started block was removed.
Also: Step 3 `pwd` expected-output uses an absolute path
(`/Users/yourname/Desktop/{workspace_dir}`) instead of the
tilde-prefixed form, matching what the command actually prints.
* fix(web): repaint home_onboarded + setup_advanced; align CTA label
- home_onboarded + setup_advanced still carried the retired blue
`#0056A3` as both `--hp-primary-dark` and the hero gradient
endpoint. Both reference `var(--primary-dark)` now so the green
palette cascades.
- setup_advanced YOLO snippet was the old `alias` form (no cd, no
arg forwarding). Replaced with the shell function variant from
/home Step 6 — drops into ~/Desktop/{workspace_dir} and forwards
"\$@" (unix) / @args (Windows).
- setup_advanced ~/{workspace_dir} path references cascaded to
~/Desktop/{workspace_dir} so install story matches /home.
- Dashboard's "Setup a new Claude Code" button label aligned to the
canonical "Copy install script to clipboard" — matches /home and
the new docstring in _claude_setup_cta.jinja, which now mandates
this label across consumers.
* fix(web): keep base brand blue; scope green palette to /home redesign
User noticed login + dashboard had turned green when the /home
redesign flipped --primary from blue (#0073D1) to green (#2ea877)
in commit 278f202e. The brand-wide flip went further than the
redesign needed — only /home, /home (onboarded), and /setup-advanced
intentionally use the green/navy spec; every other page (login,
dashboard, catalog, marketplace, admin, profile) was just inheriting
the green because --primary cascaded everywhere.
Revert the global brand colour to blue and lock the green into the
two outstanding redesign scopes:
- style-custom.css: --primary back to #0073D1, --primary-light back
to rgba(0,115,209,0.1), --primary-dark back to #005BA3,
--brand-accent back to a lighter blue.
- home_onboarded.html: .home-mock now sets --hp-primary,
--hp-primary-dark, --hp-primary-light to explicit green hex
(matching home_not_onboarded), so the hero stays green regardless
of the global brand.
- setup_advanced.html: same lock — .advanced-mock pins the green
palette in-scope.
Hero gradients on both pages now reference the local --hp-primary
chain (not the global --primary), so any future palette tweak inside
either scope cascades correctly without disturbing the rest of the app.
* refactor(web): hoist --hp-* into shared design-tokens.css (--ds-*)
PR 2 of the design-system extraction ladder. Pure mechanical rename
+ dedup; no visual diff on any rendered page (verified on /home,
/dashboard).
- New app/web/static/css/design-tokens.css declares the full token
set on :root: brand surface (green primary, primary-dark, mint
light, brand-accent), hero (navy bg + ink), code-panel (near-black
bg + cool ink + warm-yellow), light surfaces (bg/surface/border),
text (primary/secondary/muted), orange accent, info + warn
callout vocabularies, navy-tinted elevation shadows, system font
stack + mono.
- base.html loads it alongside style-custom.css so the tokens are
globally available.
- Rename --hp-* -> --ds-* in home_not_onboarded (313 refs),
home_onboarded (15), setup_advanced (39). 367 token references
pointed at one of three local blocks; now all point at the
global :root.
- Drop the three local token blocks. Each scope class
(.home-mock / .advanced-mock) only keeps its base ink + font-size
+ line-height rules.
The legacy --primary family stays canonical for the blue base
brand — login, dashboard, catalog, marketplace, admin still read
blue. The design system is opt-in via the scope class.
* refactor(web): extract shared components.css; migrate /home markup
PR 3 of the design-system extraction ladder. First batch of
reusable components lifted out of home_not_onboarded.html into a
new shared stylesheet; markup migrated to consume them.
- New app/web/static/css/components.css with five components, all
reusable on any page that loads design-tokens.css:
.callout-rec — amber lightbulb recommendation box
.callout-hint — blue info hint box
.code-output — "WHAT YOU SHOULD SEE" terminal output block
.lightbox — full-bleed image enlarge overlay
.setup-section-header — wizard header (eyebrow + h2 + lede)
- base.html loads components.css after design-tokens.css.
- home_not_onboarded.html markup renamed:
class="rec" -> class="callout-rec"
class="hint" -> class="callout-hint"
class="expected-output" -> class="code-output"
- Local CSS rules removed from home_not_onboarded.html for each of
the extracted components — ~150 lines down to 5-line "extracted to
components.css" comments. The bespoke wizard-specific styles
(.install-cmd, .os-tabs, .mode-tabs, .terminal-frame) stay
template-local for now since they only have one consumer.
Visual regression check: /home install hero renders the amber rec
callout, blue hint callout, dashed code-output block, green section
header, and click-to-enlarge VS Code thumb identically to the
pre-extraction render. 43 home tests pass.
* fix(web): unify page-headers — activity-center full-width, marketplace shares box
- /activity-center audit-log hero rendered as half-width because the
_page_hero include was inside <header class="obs-topbar">, a flex
row that pinned the time-range + auto-refresh controls next to it.
The hero is now a sibling rendered before the <header>, so it
spans the full container width like every other admin page; the
controls keep their flex row underneath.
- Marketplace hero unified with .page-header--hero. Markup is now
<section class="page-header page-header--hero mp-hero"> so the
shared box drives padding/radius/gradient/max-width/shadow; the
.mp-hero override block only carries the right-anchored cover
image and the rules for the search row + scope checkboxes (which
the canonical hero doesn't have). Inner text uses the canonical
.page-header__eyebrow / __title / __subtitle classes.
- .page-header--hero shadow tint now follows the brand blue
(rgba(0, 115, 209, 0.2)) instead of the leftover green from the
prior palette flip; same depth highlight everywhere the gradient
is blue.
* fix(web): unify remaining page heroes — admin, profile, install, store, stack
Sweep across pages that carried bespoke gradient hero markup so
every page-hero shares the canonical `.page-header--hero`
dimensions (padding 28/32/24, border-radius 14, max-width
var(--width-app), navy-tinted shadow, gradient with --primary →
--primary-dark). Inner text uses the .page-header__eyebrow /
__title / __subtitle classes so typography matches across the app.
- admin_tables: migrated to _page_hero.html include.
- admin_tokens: kept .tokens-hero wrapper for the counts-chip row
but added the canonical class on the same element; stripped
duplicate gradient + padding + typography rules.
- install: same pattern (kept hero-meta pill row).
- profile: migrated to _page_hero.html include.
- store_upload: kept .upload-hero wrapper for the .meta chip row;
composite class with the canonical hero.
- setup_advanced: .advanced-mock .ad-hero now matches canonical
dimensions; green palette retained via --ds-primary/dark.
- stack_card.css: .stack-hero (catalog + corporate-memory search
hero) uses canonical gradient + padding + max-width.
The detail-page heroes (marketplace_plugin_detail,
marketplace_item_detail, catalog_*_detail, store_edit,
admin_group_detail, admin_store_submission_detail) stay bespoke
for now — they're rich detail headers with photos, badges, install
actions; converting them would lose contract context. Same applies
to dashboard.html env-setup-cta (it's a CTA card, not a page hero).
* fix(web): canonicalise .container — single page shell every page inherits
Previously each admin page set its own `.container:has(.<page>)
{max-width: none}` + `.<page>-page {max-width: 1400px}` override,
and per-page hero markup either nested inside flex toolbars (which
pinned the hero next to filter controls and squeezed it half-width)
or self-constrained with a different max-width than the page. /home,
/dashboard, /marketplace, and /admin/* ended up at different widths
with different nav-to-hero gaps.
- style-custom.css `.container` now carries the canonical 1280px
max-width + `16px 32px 48px` padding so every page inherits the
same nav-to-hero gap and side gutters. `.container > main` is
margin/padding 0 so the container is the sole owner of gutters.
- `.page-header--hero` drops its self-constraining max-width and
auto-centering margin — the container provides the width, so the
hero sits flush with the table/toolbar below it.
- `.stack-hero` (catalog + corporate-memory) and `.advanced-mock
.ad-hero` (/setup-advanced) follow the same pattern: container
owns the width.
- Per-page max-width overrides stripped from admin_users,
admin_access, admin_groups, admin_marketplaces, admin_welcome,
admin_workspace_prompt.
- _page_hero include extracted from inside flex toolbars on
admin_users, admin_access, admin_groups, admin_marketplaces,
admin_server_config, admin_welcome, admin_workspace_prompt,
admin_sessions, admin_session_detail, admin_usage,
activity_center. The toolbar (`.users-toolbar`, `.gp-toolbar`,
etc.) keeps only the filter + action controls; hero renders
before it as a sibling.
- _page_chrome.html trimmed to just the page-background tint for
the redesign scopes; the duplicate `.container` rules it carried
are now redundant.
Verified: /home, /admin/marketplaces, /admin/users all render
container width 1280px with hero top at 88px (16px below the
72px-tall sticky nav). Same spacing as /home design spec.
* fix(web): admin_tables + admin_corporate_memory inherit canonical .container
Both pages were overriding `{% block layout %}` from base.html,
which bypasses the canonical `.container` wrapper. Result: hero
span the full viewport (1596px on a wide screen) while the inner
content sat at a narrower max-width — hero and content didn't
align, and the nav-to-hero gap differed from every other admin
page.
Switched both templates to `{% block content %}` so they render
inside the canonical `.container` from base.html — same path as
admin_groups, admin_users, admin_marketplaces, etc.
- admin_tables: dropped local `.page-title { max-width: 1600px }`
+ `.content { max-width: 1600px }` overrides (kept typography +
inner gutter rules) and the mobile padding overrides that paired
with them. Container now owns the gutters.
- admin_corporate_memory: only the block keyword needed changing;
the template already had a clean inner structure (no max-width
override on `.container-memory`).
Verified on /admin/tables and /admin/corporate-memory:
- .container width 1280, padding 16/32/48
- Hero top 88 (nav 72 + container padding-top 16)
- Hero + content both 1216px wide, both at left 190 — perfect
alignment with /admin/groups.
* fix(web): drop .page-shell padding override + admin_tables stale :root
Two regressions discovered after the canonical-container unification:
1. `.container:has(.page-shell)` still set `padding: 28px 32px 48px`
while the canonical `.container` had moved to `16px 32px 48px`.
Every page-shell consumer (/admin/sessions, /admin/sessions/<id>,
/admin/usage, /marketplace, /dashboard, marketplace detail pages,
/me/activity, /store/*, /admin/store-submissions) was rendering
with a 28px nav-to-hero gap while /admin/users + /admin/groups
rendered with 16px. Same width, mismatched vertical rhythm.
The opt-in rule is now a no-op marker: canonical container
already provides 1280px + 16/32/48 + main margin/padding 0.
2. admin_tables.html had a stale `<style>` block that re-declared
`:root { --primary: var(--primary); ... }`. The self-referential
token resolved to empty, collapsing the page-header hero's
`linear-gradient(135deg, var(--primary), var(--primary-dark))`
to no background — the hero appeared as a pale ghost without
colour. The entire shadow `:root` block was a stale copy of the
design tokens that style-custom.css already provides. Dropped
it; tokens now resolve from the global `:root`.
After both fixes /admin/sessions, /admin/tables, and every other
page-shell consumer match /admin/groups exactly: container 1280px,
container padding-top 16px, hero at top 88px / left 190px / width
1216px.
* fix(web): drop /admin/tokens .tokens-page width + padding override
`.tokens-page` carried its own `max-width: 1280px; margin: 0 auto;
padding: 28px 8px 48px` block — the canonical `.container` already
provides width + 16/32/48 padding, so the nested wrapper was
adding 28px on top of the container's 16px (= 44px nav-to-hero
gap, vs 16px on every other admin page) and shrinking the hero
sideways by 8px on each side (1200px vs the canonical 1216px).
After: container owns the layout; `.tokens-page` is just a
font-family scope. /admin/tokens hero now sits at top 88, left 190,
width 1216 — same numbers as /admin/groups / /admin/users.
* fix(web): hero links readable on blue; /admin/access Groups link href
- New `.page-header--hero a` rule in style-custom.css forces any
anchor inside a gradient hero to render white + underlined so
links stay readable on the blue background. Previously links
inherited the global `var(--primary)` blue, which disappeared
on top of the matching blue gradient. No per-page class needed —
drop a plain `<a>` in any hero subtitle and it just works.
- /admin/access hero subtitle was Jinja-passing the inline link
with HTML-entity-encoded quotes (`href="..."`). The
entities decoded to literal `"` characters inside the rendered
href, producing `/admin/%22/admin/groups%22` — a 404. Switched
the `set` to a block-set (`{% set page_hero_subtitle %}...{% endset %}`)
so the inline `<a href="/admin/groups">Groups</a>` survives
unescaped through `_page_hero.html`. Also stripped the now-redundant
inline `style="color:#fff;text-decoration:underline;"` — the new
shared rule handles it.
* fix(web): /dashboard top padding matches every other page
`.main` on /dashboard had `padding: 28px 32px 48px` while every
other page now uses `16px 32px 48px` via the canonical
`.container`. Dashboard bypasses `.container` (overrides
base.html's `layout` block to render a full-width `<main>`
directly), so the padding lives on `.main` itself — bumped the
top to 16px to match.
After: first child top = 88, left = 190, width = 1216 — same
numbers as /admin/groups / /admin/users / /admin/marketplaces.
* fix(web): green eyebrow + white title on .page-header--hero (matches /home)
`.page-header--hero .page-header__eyebrow` was faint white
(rgba(255,255,255,0.75)) — readable but unbranded against the blue
gradient. Changed to `var(--ds-brand-accent)` (mint green #54d3a0)
so every page hero pairs a green eyebrow with white title +
subtitle, echoing /home's setup-section header (green eyebrow,
dark heading combo). One CSS rule applies everywhere — no
per-page styling needed.
Also bumped the eyebrow to font-weight 700 / letter-spacing 1.2px
so the green stands out cleanly against the gradient.
* fix(web): page-header--hero + stack-hero use /home navy gradient
`.page-header--hero` and `.stack-hero` were on the brand-blue
gradient (`var(--primary)` → `var(--primary-dark)`) while
/home's hero (`.home-hero-intro`) sits on the deeper navy
gradient (`#0f1b3a` → `#1a2a5f`). Every other page-hero now
uses that same navy gradient so /home, /marketplace, /catalog,
/corporate-memory, /admin/*, /profile, /install, /dashboard,
/setup-advanced share one brand surface. Shadow tint adjusted
to the navy depth (rgba(15, 27, 58, 0.22)).
Brand blue stays the link/CTA colour everywhere else; only the
hero box itself is navy.
* fix(web): primary buttons green; marketplace tabs navy translucent
Two parity tweaks pulling the rest of the app toward /home's
visual language.
- `.btn-primary` (both rules in style-custom.css) now uses
`var(--ds-primary)` / `var(--ds-primary-dark)` green fill,
matching the "Copy install script to clipboard" button on
/home. Brand-blue `--primary` still drives link colour and the
accent surface; only the filled button background flipped to
green. Every page with a `.btn-primary` (admin "+Add user",
"+Add marketplace", catalog, marketplace actions, dashboard,
modals) now reads as the same "do it" affordance.
- `.mp-tabs` (Curated Marketplace / Flea Market / My Stack tab
group) now sits on the navy `--ds-hero-bg` with translucent
white pills (rgba(255,255,255,0.10) inactive, 0.18 active) —
same translucent-white-on-navy treatment as the "Just browse —
no install needed" pill on /home. Icons render as soft white;
per-tab colour-coding dropped in favour of the unified surface.
* fix(web): catalog/memory tabs + empty-state CTA + admin action buttons
Bring /catalog and /memory in line with /home + /marketplace:
- `.stack-tabs` (Browse / My Stack / Recipes on /catalog,
Browse / My Stack on /memory) now uses the navy `--ds-hero-bg`
container with translucent-white-on-navy pills, mirroring the
`.mp-tabs` treatment and /home's "Just browse — no install
needed" CTA pill. Per-tab icon colour-coding dropped — icons
render as soft white on the navy fill.
- `.stack-tabs-row__actions .btn` (right-slot "+New Recipe",
"+New Data Package" admin CTAs) now uses green primary fill
(`--ds-primary`), matching `.btn-primary` and /home's
"Copy install script to clipboard" button.
- `.stack-empty .cta a` (empty-state action button — the
"Open /admin/tables →" CTA on /catalog and equivalent on
/memory) flipped from blue `--primary` to green `--ds-primary`
so the colour aligns with every other primary button in the app.
* fix(web): marketplace Search button green (--ds-primary) matching other CTAs
* fix(web): unify Search button + admin-action button across browse pages
- Added Search button (`<button class="stack-hero__search-btn">`)
to /catalog and /memory heroes — same green pill as /marketplace.
Wired to the existing live-filter pipeline (button click runs
`applyFilters()` and refocuses the input). All three browse pages
now wear the identical search bar UI.
- `.stack-hero__search-btn` shares `--ds-primary` fill with
`.mp-hero .search-btn`.
- `.mp-actions .btn` ("Submit a skill or plugin" CTA on /marketplace)
flipped from the legacy blue-outline to the same green primary
fill + dimensions (`display: inline-flex; line-height: 1;
padding: 9px 16px; gap: 6px`) as `.stack-tabs-row__actions .btn`
on /catalog and /memory. All three right-slot action buttons
render at identical height now.
- `.stack-tabs-row__actions .btn` got `inline-flex` + `line-height: 1`
+ `gap: 6px` so a `<button class="btn">` and a `<a class="btn">`
both render at exactly 33px high — the embedded
`.admin-only-hint` chip no longer pushes one variant taller
than the other.
* fix(web): marketplace guide CTAs green (fastpath + primary); drop flea purple
* fix(web): dashboard CTA hero on navy; readable <code> chips in hero
- `.env-setup-cta` on /dashboard ("Set up a new Claude Code"
card) flipped from the brand-blue gradient + green-tinted shadow
to the canonical navy gradient (`--ds-hero-bg` → `#1a2a5f`) with
navy-tinted shadow + 14px radius + 28/32/24 padding, matching
`.page-header--hero` and /home's `.home-hero-intro`. Dashboard's
top CTA now sits on the same brand surface as every other hero.
- Added `.page-header--hero code` rule — translucent white pill +
warm-yellow ink (#ffd866) so `<code>` chips embedded in hero
subtitles read as code samples against the navy gradient. The
global `code` rule sets `color: var(--text-primary)` (dark),
which turned in-hero chips into invisible dark-on-white-on-navy
ghosts (e.g. the `-by-dev` suffix on /store/new).
- /store/new's `.page-header__subtitle code` dropped its inline
style override — the shared rule handles it now.
* feat(web): two-theme switching via data-theme + admin toggle
Introduces a theme system that flips the entire UI palette between
"navy" (current design, default) and "blue" (pre-redesign palette)
via a single `<html data-theme="...">` attribute. Page markup, class
names, and component styles don't change — only the `--ds-*` token
values flip.
Backend
- New `app/instance_config.py::get_instance_theme()` resolves the
active theme from `AGNES_INSTANCE_THEME` env > `instance.theme`
in instance.yaml > default "navy". Unrecognised values clamp to
"navy" so a typo doesn't break the page.
- `app/web/router.py::_build_context` injects `instance_theme`
alongside `instance_brand` etc. so every template inherits it.
- `app/web/templates/base.html` renders
`<html lang="en" data-theme="{{ instance_theme | default('navy') }}">`.
CSS
- `app/web/static/css/design-tokens.css` adds two new tokens to
the default `:root` set: `--ds-hero-shadow` (drop-shadow tint
on hero boxes) and `--ds-hero-eyebrow` (eyebrow accent colour).
Plus a `:root[data-theme="blue"]` override block that flips
seven tokens: `--ds-primary`, `--ds-primary-dark`,
`--ds-primary-light`, `--ds-brand-accent`, `--ds-hero-bg`,
`--ds-hero-bg-deep`, `--ds-hero-shadow`, `--ds-hero-eyebrow`.
The blue theme aliases the brand surface tokens back to the
legacy `--primary` family.
- `.page-header--hero`, `.stack-hero`, `.env-setup-cta`,
`.home-mock .home-hero-intro` now reference the new
`--ds-hero-shadow` and `--ds-hero-bg-deep` tokens instead of
hard-coding `rgba(15, 27, 58, 0.22)` and `#1a2a5f` — gradient +
shadow now flip with the theme.
- `.page-header--hero .page-header__eyebrow` uses
`var(--ds-hero-eyebrow)` so the eyebrow goes mint-green on
navy and translucent-white on blue (mint on blue reads poorly).
Admin
- `app/api/admin.py::_KNOWN_FIELDS["instance"]` now registers a
`theme` field of kind `select` with options `["navy", "blue"]`
and a `hint` explaining the trade-off. The existing
/admin/server-config UI auto-renders a select for this — no
template changes needed.
Defaults
- Default value is "navy" so existing instances see no visual
change. Admins flip to "blue" via /admin/server-config to
restore the pre-redesign look.
Restart note: uvicorn must reload to pick up the Python changes
(new getter, new template-context key, new known-field). CSS
changes hot-reload via browser refresh.
* fix(web): blue theme — home hero eyebrow + CTA contrast
`.home-hero-intro .eyebrow` and `.btn-intro-primary` referenced
`--ds-brand-accent` directly, which on the blue theme resolves to
the lighter brand-accent blue (#4F9DEB). Result: light-blue eyebrow
on the blue gradient ("WELCOME, ADMIN" barely readable) and a
light-blue button with darker-blue text ("Set up in ~15 min")
that all sat in the same hue range.
Introduces three new theme-aware tokens:
- `--ds-hero-eyebrow` already existed; blue theme bumped opacity
to 0.92 so the eyebrow reads as full white.
- `--ds-hero-cta-bg` + `--ds-hero-cta-fg` + `--ds-hero-cta-bg-hover`
flip the primary hero CTA: mint-green on navy (default), white-
on-blue under `data-theme="blue"`.
`.home-hero-intro .eyebrow` now uses `--ds-hero-eyebrow` (mint on
navy / white on blue) and `.btn-intro-primary` uses the CTA token
trio.
Recommended palette on blue theme:
- Eyebrow: white at 92% opacity (clear on the blue gradient).
- Primary CTA pill: white background, brand-blue dark text
(`--primary-dark` = #005BA3) for AAA-level contrast.
- Secondary CTA: translucent white pill (unchanged).
* fix(web): blue theme — callout-hint info bg/border/ink re-tinted to brand blue (was indigo, clashed with brand-blue hero)
* fix(store): surface review failures + harden publish gate
Four independent fixes to the flea-market submission pipeline, all surfaced
by an admin upload that landed at status='approved' without an LLM review.
1. LLM truncation no longer pins submissions in review_error.
- Raised MAX_RESPONSE_TOKENS 2500 → 6000 in llm_review.py
- Added one-shot retry-with-doubled-budget in anthropic_provider.py
(capped at 4× initial)
2. Flea detail page surfaces the latest submission's failure verdict even
when a previously-approved version is still serving (deferred-promotion
path). The _quarantine_banner gate widened from `visibility != approved`
to also fire on `blocked_inline / blocked_llm / review_error`, with copy
that distinguishes the v2+ edit case ("Latest edit failed review —
previously approved version (vN) keeps serving") from the initial-upload
quarantine wording.
3. Restore button + endpoint no longer allow restoring a version that was
never approved. Added StoreEntitiesRepository.get_with_version_approvals
joining store_submissions, gated the UI button on submission_status in
('approved', None), rendered status pills for non-restorable rows, and
added a 400 version_not_approved guard in POST /restore.
4. **BREAKING (operator-facing)**: publish gate is now fail-CLOSED on
misconfig. The previous get_guardrails_enabled() silently fell back to
"disabled, auto-approve everything" when guardrails.enabled=true in YAML
but no ANTHROPIC_API_KEY was in env. Split into:
- get_guardrails_enabled() (intent — YAML)
- get_guardrails_llm_provider_ready() (readiness — env)
Three-state matrix:
enabled=false → auto-approve (unchanged)
enabled=true + ready=true → normal pipeline (unchanged)
enabled=true + ready=false (NEW) → submissions hold at pending_llm
awaiting admin retry or override
(was: silent auto-approve)
Admin "Retry review" eligibility broadened to include pending_llm.
Boot-time WARNING banner surfaces the misconfig in app/main.py.
docs/STORE_GUARDRAILS.md updated with the three-state matrix.
Operators relying on the auto-fallback for local-dev no-LLM setups must
now explicitly set `guardrails.enabled: false` in instance.yaml.
Tests: 4623 passed. Added TestPublishGateFailClosed (4 tests) and
TestRestoreVersion::test_restore_rejects_* (3 tests). conftest.py adds an
autouse fixture defaulting guardrails OFF so legacy tests don't need to
know about the new toggle.
* fix(store): admin override promotes v2+ edits to current
The override handler at app/api/admin.py:3708 only flipped submission
status → 'overridden' and entity visibility → 'approved'. Under the v37+
deferred-promotion model that's insufficient for v2+ edits / restores:
the new bundle sits in versions/v<N>/plugin/ and the entity row stays at
the prior approved version_no + hash + on-disk live bundle. Installers
kept getting the OLD bytes the admin had just intended to replace.
Mirror the runner.run_llm_review auto-approval branch: look up the
submission's version_hash in entity.version_history, and if its `n`
differs from entity.version_no, promote_version + _swap_live_to_version.
Initial v1 overrides are unaffected — the loop finds n=1 == version_no
and skips promotion.
Tests:
- test_override_v2_edit_promotes_to_current: stage v1 approved + v2
blocked_llm; override the v2 sub; assert entity.version_no=2,
entity.version flips off the v1 hash, and the live plugin/ dir
mirrors versions/v2/plugin/.
- test_override_v1_initial_upload_no_promote: regression guard so the
promote loop doesn't accidentally bump a v1 override.
Audit log gains a promoted_to_version_no field on the override action.
* fix(store): retry/rescan review staged bundle; override forward-only
Two adversarial-review findings from a Codex pass on the publish-gate
work.
C1. Admin retry + rescan were passing live `plugin/` to the LLM. For a
v2+ submission held at `pending_llm` / `blocked_llm` / `review_error`,
live still holds the prior approved version's bytes — so the LLM
reviewed the WRONG bytes, and the runner's hash-match promotion in
`run_llm_review` would then advance the entity to staged bytes that
were never actually reviewed. Resolve the staged
`<entity>/versions/v<N>/plugin/` from the submission's
`version_history` entry, with a fall-back to live for legacy pre-v37
rows that never seeded a versions/ dir. Helpers
`_submission_plugin_dir` and `_version_no_for_submission` added to
`app/api/store.py` so override / retry / rescan share one path.
H1. Override's promote loop used `target != current`, which would
silently demote the live bundle when admin overrode a stale v2
submission while v3 was already approved + live. Changed to
`target > current` so override flips status + visibility on the row
regardless, but on-disk promotion only fires forward. Same `>`
defensive guard applied in `runner.run_llm_review` so a late LLM
verdict racing with a newer approval can't demote either.
Tests:
- TestAdminRetryReviewsStagedBundle::test_retry_v2_blocked_passes_staged_dir_not_live
- TestAdminRetryReviewsStagedBundle::test_rescan_v2_blocked_passes_staged_dir_not_live
- TestOverrideForwardOnly::test_override_stale_v2_does_not_demote_when_v3_current
* review polish: CHANGELOG drift, override eligibility, defensive copy
Three small additions on top of the retry/rescan staged-bundle fix:
1. CHANGELOG: the PR's bullets had drifted into the released
[0.54.17] section during rebase (context-match landed them next
to already-released content). Moved them up to [Unreleased] where
they belong; [0.54.17] now holds only what was actually released
(refresh-marketplace ls-remote, /me/activity hero, CI sharding +
workflow polish).
2. app/api/admin.py: admin override eligibility now accepts
pending_llm alongside blocked_inline + blocked_llm + review_error.
Closes a UX gap from the new fail-CLOSED behavior: under
enabled-but-not-ready, a known-good submission would otherwise
sit indefinitely until the admin set credentials AND clicked
Retry. Override already routes through version_history (and is
now forward-only on promote), so it stays safe for v2+ deferred-
promotion submissions.
3. src/repositories/store_entities.py: get_with_version_approvals
defensively copies each version_history entry before annotating
with submission_status. self.get() re-parses JSON each call today
so this is belt-and-suspenders against any future caching layer
leaking the annotated key into a subsequent plain get() call.
Tests: 112 passed (focused on test_store_entity_versions +
test_admin_store_submissions, covering the retry/rescan staged-
bundle fix the author shipped + this polish).
---------
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
* feat(home): status frame on /home — last sync, sessions, prompts, tokens, projects
Adds the homepage status frame: a 5-card row above the install-hero /
offboard-strip on /home showing the calling user's Last sync (their
last `agnes pull`), Sessions, Prompts, Tokens used, and Projects worked
on, with a 24h/7d pill toggle.
Backed by `GET /api/me/home-stats?window=` (one DuckDB CTE joining
`users` + `usage_session_summary` + `usage_events`) and SSR'd from the
same `compute_home_stats` helper on initial paint so there's no
spinner. The window toggle is the only JS-driven path.
Side surfaces:
- `GET /api/sync/manifest` now stamps `users.last_pull_at` so
`agnes pull` (and the Claude Code SessionStart hook that wraps it)
imprints the analyst's last sync time for the new card.
- `usage_session_summary` gains four BIGINT token counters
(input_tokens, output_tokens, cache_read_tokens, cache_creation_tokens)
summed from JSONL `message.usage.*` per assistant turn.
- `USAGE_PROCESSOR_VERSION` bumps 1 → 2 so the session-pipeline
reprocess loop invalidates stale summaries and backfills tokens
on the next tick.
Schema migration v43 → v44 is idempotent ALTERs (last_pull_at +
4 token columns) — fresh installs receive them from `_SYSTEM_SCHEMA`,
upgrade path runs `_v43_to_v44`. Defaults (NULL / 0) backfill
existing rows cleanly.
9 new tests in tests/test_home_stats.py cover the migration,
endpoint shapes (24h/7d/unknown/empty/missing-user), and the
manifest-side last_pull_at bump.
* docs(CHANGELOG): homepage status frame entries under [Unreleased]
The post-rebase release-cut now belongs to whichever PR lands next
after main rolled to 0.54.9. This PR logs its bullets under
[Unreleased] (Added: homepage status frame, per-user pull tracking,
token counters; Changed: schema v43 → v44 migration) so they ride
out with the next release-cut.
* fix(tests): bump test_schema_v42_migration asserts to v44
CI failed because tests/test_schema_v42_migration.py hardcoded
`assert SCHEMA_VERSION == 43` and `assert v == 43` after init.
v44 (homepage stats frame backing columns) was introduced in the
preceding feat commit; this aligns the existing v42-era migration
tests with the new schema version.
* feat(home): gate status frame on operator flag + user.onboarded
Two gates on the homepage status frame:
1. **Operator master switch** — `get_home_status_frame_visibility()` in
app/instance_config.py mirrors the existing `get_home_automode_visibility()`
shape: env var `AGNES_HOME_SHOW_STATUS_FRAME` > yaml
`instance.home.show_status_frame` > default `True`. Cautious-rollout
instances can disable the frame without forking; the yaml example
documents both knobs.
2. **Onboarded gate** — the template only renders the frame when the
caller's `users.onboarded` is true. First-day users see a clean
install-hero before all-zero stat cards; the frame appears
automatically on the next render after `agnes init` POSTs
`/api/me/onboarded`.
Router skips the `compute_home_stats` DB read entirely when either
gate is closed; `home_stats` arrives at the template as None in that
branch and the `{% if %}` shortcuts the include.
Why both gates: PostHog feature flags evaluated and rejected — this
codebase uses PostHog for analytics capture only, not feature gating;
adding a per-user feature_enabled() call on the /home critical path
would couple the homepage render to a remote eval and still require
an admin master switch. The onboarded gate is a UX coherence rule
layered on top of the operator switch, not an A/B test signal.
3 new tests in test_home_stats.py cover the env-var resolution
(falsey values + default-true). The yaml example gets a `home:`
block documenting both `show_automode` (pre-existing flag, was
undocumented in the example) and `show_status_frame`.
Addresses post-merge review findings on #290:
- Admin Rescan is the only post-v30 producer of status='blocked_inline'.
Re-add it to admin queue 'Needs review' filter chip and to
TERMINAL_BLOCKED_STATUSES in the bundle-purge job so rescan-produced
rows surface in the default operator view and bundles get TTL-swept
instead of lingering indefinitely.
- Update three doc-drift sites still referring to the pre-#290 spam
counter scope (counted blocked_inline). The counter now narrows to
blocked_llm + review_error; fix the comment in app/api/store.py,
the docstring in get_guardrails_blocked_quota_per_day(), and the
operator-facing hint rendered on /admin/server-config.
- Add positive test for _reject_inline_or_continue validation branch
(code='validation_failed', checks payload shape, no-DB-write
contract). Locks the frontend wizard's detail.checks contract.
- Tighten test_quota_disabled_with_zero — assert (200, 201) explicitly
instead of !=429 so a 500 regression no longer passes.
- _reject_inline_or_continue takes plugin_dir and lazy-computes
bundle_meta only on the security branch. Validation rejects no
longer pay for a SHA256 walk on the bundle.
- Surface store.upload.security_blocked audit-log write failures via
logger.exception instead of swallowing — that audit row is the only
forensic trace by design.
* feat(home): Getting Started + Overview + Usage modes sections
Three new content cards rendered between the install-hero and the
existing connector tiles on /home. Order: Getting Started → Overview
→ Usage modes → connectors.
- Getting Started — dismissible card with two clickable rows linking
to /setup (install flow) and /setup-advanced (deeper reference).
Subsumes the legacy `.advanced-pointer` row that sat above the news
section. Per-device dismiss via a generic localStorage handler:
`.home-card-close[data-dismiss-key="..."]` inside a <section> wires
itself up at page load — drop in any future dismissible card without
per-card JS.
- Overview — operator-owned HTML body sourced from the new
`instance.overview` yaml field (env override
`AGNES_INSTANCE_OVERVIEW`). HTML in, HTML out via the same `| safe`
filter as news_intro. Empty default hides the section entirely,
keeping the OSS vendor-neutral; operators paste their product
framing / privacy posture into instance.yaml. New helper
`get_instance_overview()` in app/instance_config.py mirrors
`get_instance_logo_svg()`.
- Usage modes — three OSS-shipped tiles (Terminal / VS Code / Claude
Desktop · claude.ai) explaining each surface and linking to the
matching /setup-advanced anchors. Closes the gap for users
wondering "where do I actually run this".
Supporting changes:
- setup_advanced.html gains a new `#claude-app` section between
#vscode and #workspace, anchored by the Usage modes Claude Desktop
tile. Covers the marketplace registration paths and when to prefer
the terminal. Added to the table of contents.
- Three new tests in test_web_home_page.py pin the Getting Started
card markup, the Overview-on-when-yaml-set path, and the
Overview-off-by-default path. All 13 tests in the file pass.
Operator follow-up (separate infra PR — NOT this PR): paste the
Foundry-specific Overview body into instance.yaml's
`instance.overview` field. OSS ships with an empty default.
* fix(home): Overview is operator-owned content — drop dismiss button
Earlier iteration added a close X to the Overview section to match
the Getting Started card's dismiss UX. Wrong call: Overview is
operator-authored reference content (privacy posture, telemetry
policy, project framing) and a per-device localStorage hide means
returning users who want to re-read the policy can't recover it
without clearing storage.
Reverts the close button + the data-dismiss-key on the Overview
section. Test inverted to assert the dismiss key is absent (defends
against a future drive-by adding it back). Getting Started still
dismisses — that's procedural getting-started content users
legitimately stop needing once they've finished setup. Overview is
always reachable; whole section is still opt-in at the operator
level via the empty-yaml default.
* fix(home): Terminal usage-mode tile is informational (no click-through)
The setup hero above /home's Usage modes already walks the user
through the Claude Code CLI install — the Terminal tile click-through
to /setup just round-trips back to content the user already scrolled
past. Switch Terminal to a non-anchor <div> and scope the hover
affordance to a.home-usage-item so VS Code + Claude Desktop tiles
keep their click-through (those legitimately deep-link into
/setup-advanced anchors).
* fix(home): point Usage modes guidance at ~/{workspace}/Projects/ subfolder
The bundled plugin scopes the session-analysis loop and the
central-catalog sync to ~/<workspace>/Projects/, not the workspace
root itself — that convention already appears in the install hero's
Step 4 manual-fallback note ('Don't create ~/<workspace>/Projects/
manually — the bundled plugin offers to set it up after install').
Usage modes' footer guidance now matches: 'create every project
under ~/<workspace>/Projects/'. Also calls out that the
session-analysis loop is scoped to that root so users understand
why working outside the workspace dir is invisible to the platform.
* feat(brand): inline operator SVG logo + drop header subtitle (release 0.54.6)
Three header tweaks, one PR:
1. _app_header.html drops the small uppercase subtitle line below the
brand. instance.subtitle still flows into the CLAUDE.md preamble +
init welcome template ("Operated by …"); only the web header chrome
loses it.
2. get_instance_logo_svg() in app/instance_config.py reads
instance.logo_svg (yaml) / AGNES_INSTANCE_LOGO_SVG (env). The
yaml field was already documented in instance.yaml.example and the
template already supported inline <svg> via {{ config.LOGO_SVG |
safe }}, but router.py:344 hard-coded LOGO_SVG = "" — the middle
wire was missing. Now operators can paste a lockup directly into
their instance.yaml under instance.logo_svg: | and have it render
in the header. Resolution mirrors get_instance_brand (env > yaml >
""). instance.name remains independent: drives browser <title>
tags + page h1s + CLAUDE.md heading; the SVG is the web-header
visual only.
3. .app-header-logo svg gains max-height: 40px; width: auto; so any
operator's lockup scales via its viewBox to fit the 72px header
without per-asset width/height edits. Pairs with #2 — without the
clamp, raw artwork (e.g. a 1600x430 lockup) overflows the chrome.
Release-cut included per the same-PR rule (Unreleased contained only
these bullets after rebase onto 0.54.5).
* revert: keep app-header-subtitle span — out of scope for this PR
Initial commit dropped the subtitle line on the assumption that
the user wanted both the secondary header line AND the future-SVG
brand cleaned up. The actual ask was narrower: drop the hostname
suffix that renders inside instance.name ("Foundry AI (hostname)"),
which is a startup.sh concern, not a template one. Restore the
subtitle span and the CHANGELOG bullet that announced its removal.
PR scope narrows to LOGO_SVG wiring + CSS clamp only.
* fix(header): hide subtitle span when instance.subtitle is empty
Pre-fix the template fell back to the literal string 'Data Analyst
Portal' when INSTANCE_SUBTITLE was unset, so operators who left the
field empty saw a stray hardcoded label below their brand. Switched
to a Jinja {% if %} guard around the whole <span class="app-header-
subtitle"> so an empty subtitle produces no element at all — clean
header chrome instead of placeholder leak.
* feat(home): hide install-hero once onboarded + X close button
- Wrap the entire install-hero in `{% if not onboarded %}` so once
`users.onboarded=true` (auto-flipped by `agnes init` POSTing
/api/me/onboarded, or by the new X / existing fallback button) the
blue hero disappears entirely. Pre-PR the onboarded branch reused
the same shell with a "Welcome back" header + "Steps 1–4 done" badge
+ minimize toggle, which visually outweighed the actual nav hub.
- Add a circular × close button (top-right of the hero, rendered only
when not-onboarded). Click → window.confirm() asking the user to
acknowledge onboarding → POST /api/me/onboarded → reload. The
confirm string intentionally avoids the literal phrase
"Mark me as offboarded" because cli/commands/onboarded.py::status
scans /home's rendered HTML for that exact marker as a fallback for
the api/me/profile check.
- Lift the offboard escape hatch out of the hero into a discrete
`.offboard-strip` rendered below, gated `{% if onboarded %}`. Lets
the analyst flip back to the install view after wiping their
workspace folder.
- Centralize the /api/me/onboarded POST into a `postOnboarded()` JS
helper reused by the hero X, the existing "Mark me as onboarded"
fallback button, and the new offboard button.
Tests updated to match the new behavior:
- `test_home_onboarded_user_sees_nav_hub` — asserts the hero is gone
and the offboard strip is the only setup-flow remnant.
- `test_minimize_toggle_no_longer_rendered` (renamed) — asserts the
minimize toggle is absent in both states (was previously rendered
inside the now-hidden onboarded branch of the hero).
- `test_home_no_auto_transition_after_post_until_reload` — checks
offboard-strip presence post-flip instead of the removed
"Welcome back" hero copy.
* fix(home): X-close button used invalid source enum, hit 422
The X button's data-target-source was 'self_acknowledged_x' to give
audit_log a separate marker for X-vs-button-driven flips. But
app/api/me.py:38's OnboardedRequest pins source to a Literal of
['agnes_init', 'self_acknowledged', 'self_unmark'] — pydantic
returned 422 on every X click.
Confusing side effect: both buttons share self-mark-status as the
status element, so the failed X click rendered 'Failed (422)' next
to the still-functional 'Mark me as onboarded' button. Looked like
the button itself broke.
Fix: drop the _x suffix. Both surfaces now POST source='self_acknowledged'.
Distinction in audit_log is not load-bearing — the source field
captures user intent ('I'm onboarded'), not the specific UI affordance.
* feat(store-guardrails): admin-configurable content thresholds
Adds the flea-market content guardrail floors to the /admin/server-config
editor so operators can tune the bar without code changes. Defaults are
unchanged (60 chars description, 25 chars command, 5 distinct words, 200
chars body) — patching guardrails.* in instance.yaml or via the admin UI
overrides any of them and the next inline check picks up the new value.
src/store_guardrails/content_check.py now resolves the four floors via
helper functions (_min_desc_chars / _min_command_desc_chars /
_min_distinct_words / _min_body_chars) that read app.instance_config at
call time. Module-level _DEFAULT_* constants remain as fallbacks if
the import fails (defensive — keeps the guardrail module loadable
without the app package on its path).
app/instance_config.py grows four matching getters returning the live
value with sane defaults + integer coercion.
app/api/admin.py registers 'guardrails' as an editable section + ships
nine known-fields entries (min_description_chars,
min_command_description_chars, min_distinct_words, min_body_chars,
enabled, review_model, blocked_quota_per_day, blocked_bundle_ttl_days,
stuck_review_grace_seconds) with operator-facing hint copy explaining
what each knob does.
app/web/templates/admin_server_config.html gets a SECTION_META entry
so the section renders as 'Flea-market guardrails' with a help string
instead of a bare section ID.
app/web/router.py threads the live thresholds into /store/new and
/store/examples via a small _guardrail_thresholds() helper so the
disclosure copy, char counter, and "Why these limits" table render
the configured value (not a hardcoded 60). End-to-end smoke verified:
PATCH guardrails.min_description_chars=90 → /store/new immediately
renders "90 characters" + JS DESC_MIN=90 on the next request, no
restart required (helpers read live config per call).
* chore(store-guardrails): address PR review safe-fix findings
Code-review safe_auto findings on PR #281 (review run
20260513-100126-64052520):
- CHANGELOG: add Unreleased entry covering the new
/admin/server-config Flea-market guardrails section, the four live
threshold getters, and the route-helper rendering knobs. Required by
the project's non-negotiable "Changelog discipline" rule.
- content_check.py: narrow `except Exception` to `except ImportError`
on the four `_min_*()` resolver helpers. Surface-level TypeError /
ValueError on a malformed YAML value belongs to the
instance_config getters' own try/except — the resolvers should only
defend against the in-tree import itself failing, not silently
swallow real bugs in the getters.
- store_upload.html: refresh the stale "30-char threshold" comment to
reflect the configurable floor (default 60), and add `|default(60)`
/ `|default(25)` / `|default(5)` filters to the disclosure-copy
bindings so the upload form matches store_examples.html's
belt-and-suspenders rendering if a future route ever renders the
template without populating the `guardrail` context.
- router.py: tighten `_guardrail_thresholds()` return annotation
from bare `dict` to `dict[str, int]`.
Residual work (left for separate change after operator direction):
- Add round-trip test (PATCH guardrails -> next inline check uses
new value) — primary testing gap.
- Decide policy on `min_*=0` (currently coerced to 1 via
`max(1, int(val))`) vs treating 0 as a disable sentinel like
neighbour getters (`blocked_quota_per_day`,
`blocked_bundle_ttl_days`).
- Add POST-time integer validation for `guardrails.*` so a typo'd
YAML value (bool / string / float) errors loudly instead of
silently falling back to the default.
* test(store-guardrails): cover admin-configurable thresholds + PATCH round-trip
Closes the "primary testing gap" Vojta noted in the safe-fix commit
on PR #281 — the four new `get_guardrails_min_*` getters and the
PATCH-takes-effect-on-next-check live-config flow had no direct
coverage.
10 new tests in `tests/test_store_guardrails_admin_config.py`:
- TestGuardrailGetterDefaults (4 tests) — each new getter returns the
documented default (60 / 25 / 5 / 200) when nothing is configured.
- TestGuardrailGetterOverlay (5 tests) — overlay-driven overrides win,
string values that look numeric coerce via int(), garbage strings
fall back to default via the (TypeError, ValueError) branch, and the
`max(1, int(val))` floor pins zero/negative inputs to 1.
- TestPatchRoundTrip (1 test) — PATCH `/api/admin/server-config`
`guardrails.min_description_chars=90`, then call content_check
against a 75-char description that previously passed: must now fail
with `too_short`. Then PATCH back to 60 and verify the next check
passes again. Closes the cache-invalidation contract Vojta relies on
for the "no app restart" claim — broken without the
reset_cache() bracket in /api/admin/server-config.
The TestGuardrailGetterOverlay.test_zero_or_negative_floored_to_one
test pins the current `max(1, int(val))` policy. Vojta's safe-fix
commit explicitly left "policy on min_*=0 vs disable-sentinel" as
residual work — pinning the current behavior here ensures any future
change to use 0 as a disable sentinel must update this test (and the
reviewer sees the policy decision).
Verified: 4509 tests pass locally (4499 existing + 10 new).
* release: 0.54.2 — admin-configurable flea-market guardrail thresholds + tests
Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.1 →
0.54.2) bundling Vojta's admin-configurable thresholds for the
flea-market content guardrail (9 knobs in /admin/server-config) plus
the test coverage closing the "primary testing gap" he punted in the
safe-fix commit.
No DB migration; defaults unchanged from PR #276 — instances that
don't set `guardrails.*` keep the original bar transparently.
---------
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
Co-authored-by: ZdenekSrotyr <139972147+ZdenekSrotyr@users.noreply.github.com>
- instance.brand (env AGNES_INSTANCE_BRAND, default "Agnes") +
instance.workspace_dir replace hard-coded "Agnes" / "~/Agnes" across
/home, /setup, /setup-advanced, /login, /install, /me/debug, and the
Claude Code clipboard setup script. Terraform-friendly env override;
defaults preserve existing Agnes branding.
- Explicit "create workspace folder" step on /home (OS-tabbed mkdir+cd)
+ same step baked into the clipboard script as step 2. Drops the
implicit assumption that `agnes init --workspace .` lands in a
sensibly-cd'd shell.
- Final "Restart Claude Code" step in the setup script (unconditional,
between connectors and Confirm) so freshly-installed plugins, MCP
servers, and SessionStart hooks load on the next Claude Code session.
- Asana reverted from hosted Remote MCP back to PAT + raw REST against
app.asana.com/api/1.0. MCP envelope shape consumed ~5x tokens per
call; the PAT path lets the agent read flat REST fields. Existing
MCP registration is detected and the user is asked whether to remove
it (default Y, with benefits listed: token cost, no third-party hop,
no OAuth refresh dance, deterministic envelope shape).
- Atlassian connector instructs picking the longest API-token expiry
(today "1 year") to cut re-mint friction. No public query-parameter
hook exists on id.atlassian.com to pre-select expiry, so the prompt
documents the manual click and acknowledges that limitation.
- Uniform ✅ / ❌ per-connector marker contract (Asana, GWS, Atlassian)
for the Confirm summary to grep. Each connector now ends with a
Claude-driven end-to-end test that uses Claude Code's own bash to
exercise the stored credential and prints
"✅ <Connector> integration verified — ..." (or the failure variant).
* fix(cta): fall back to textarea+execCommand when Clipboard API rejects
The "Setup a new Claude Code" CTA fetches /auth/tokens, parses the JSON
response, renders the setup script, THEN calls
`navigator.clipboard.writeText()`. Modern browsers (Safari, Firefox, and
Chrome on stricter configurations) reject `writeText` with
NotAllowedError when transient user activation has been consumed by an
intervening `await` — which is exactly the case here. Users perceived
this as "the browser blocked the copy" and got the manual-paste fallback
modal even though the textarea + `document.execCommand('copy')` path
WOULD have worked synchronously without needing fresh user activation.
`copyToClipboard` now:
- prefers the modern Clipboard API (unchanged for the happy path)
- on writeText rejection, falls back to `copyViaTextarea` instead of
surfacing the rejection to the caller's catch block.
`copyViaTextarea` is the previously-inline textarea fallback factored
out into a named helper, with two small hardening touches:
- `readonly` + `tabindex=-1` so the hidden textarea doesn't steal
focus or pop the virtual keyboard on mobile.
- explicit `setSelectionRange(0, text.length)` to belt-and-braces the
selection on iOS Safari (where `.select()` alone sometimes selects
zero chars on touch-focused textareas).
Only the CTA button needed this — the Step-1 install-command and the
connector-copy buttons all call `writeText` synchronously inside the
click handler (no awaits in between), so they keep their existing
user-gesture context and didn't hit the same rejection. No template
changes there.
* refactor(home): fold Atlassian MCP registration into connectors block
The standalone "Register the Atlassian MCP server" step (was step 6 in
the unified setup script) moves INTO the Atlassian connector's prompt
body so all Atlassian-related setup lives in one logical group. Same
intent that #247 carried for connectors, applied one level deeper:
the hosted Remote MCP registration is part of "set up Atlassian", not
its own ungrouped step.
What changed:
- `app/web/connector_prompts.py` — the Atlassian prompt's step 5
replaces the speculative "Register the on-demand Atlassian MCP under
.claude/mcp/atlassian" line with the actual hosted Remote MCP
registration: `claude mcp add --transport sse atlassian
https://mcp.atlassian.com/v1/sse || true`. The `|| true` keeps re-runs
idempotent and the body explains the OAuth-on-first-use contract.
Both /home's Atlassian tile and the inlined setup-script Atlassian
sub-block emit this line — single source of truth holds.
- `app/web/setup_instructions.py` — `_mcp_servers_block` deleted; the
`mcp_servers` step is removed from `_step_numbers`; resolve_lines no
longer calls it.
- Renumbering: install (1), init (2), catalog (3), preflight (4),
marketplace (5), diagnose (6), connectors (7), confirm (8). Was:
6 = mcp_servers, 7 = diagnose, 8 = connectors, 9 = confirm.
- `tests/test_setup_instructions.py` — Confirm step 9→8, Connect 8→7,
diagnose 7→6, mcp_servers references dropped.
`test_step_numbering_with_connectors_step` now asserts
`"mcp_servers" not in steps`. Stray-Confirm assertion lists shift
by one position.
- `tests/test_setup_page_unified.py` + `tests/test_web_ui.py` — same
step-number shifts in the rendered /setup preview assertions.
The `claude mcp add` line is still the Atlassian Remote-MCP path that
the 2026-05-10 init-report Fix C added — only its position in the
flow changes. /home Atlassian tile copying continues to install the
MCP too (the prompt body the tile pastes contains the same line).
112 tests pass.
* feat(atlassian): operator-overrideable base URL via AGNES_ATLASSIAN_BASE_URL
Adds an env var / YAML key the operator (Terraform module, customer-VM
template, OSS instance.yaml) can set to bake the Atlassian Cloud site
root into the connector prompt — so end users don't have to guess /
paste their org's `https://<myorg>.atlassian.net`.
When set, the Atlassian connector prompt (rendered on both /home tile
and inlined into the setup-script step 7 Atlassian sub-block) replaces
step 1's "Ask me for my Atlassian Cloud site URL and email" with a
one-line note that the URL is already provisioned by the operator and
asks only for the email. Step 4's helper-script body has the
`BASE_URL='<the site URL I gave you>'` placeholder substituted with
the literal value. When unset (empty), the existing "ask the user"
flow remains — no regression for OSS instances.
Resolution + normalization in `get_atlassian_base_url()`:
- env `AGNES_ATLASSIAN_BASE_URL` > yaml `instance.atlassian.base_url` > ""
- strips trailing slash + trailing `/wiki` so the canonical value is
the bare site root. Matches the per-user helper script's
normalization at storage time (atlassian_prompt step 4 guard 2), so
the literal baked in by the operator stays consistent with what the
user's helper script would have computed from their input.
Plumbing:
- `app/instance_config.py`: new `get_atlassian_base_url()` resolver.
- `app/web/connector_prompts.py`:
- `atlassian_prompt(*, base_url: str = "")` — string-replace two
explicit placeholder phrases when base_url is truthy; otherwise
return the prompt unchanged.
- `all_connector_prompts(..., atlassian_base_url: str = "")` —
forwards the kwarg.
- `app/web/router.py` (`_build_context`): reads
`get_atlassian_base_url()` and passes it through to
`all_connector_prompts(...)` so both the /home tile context AND the
inlined-script `resolve_lines(...)` call use the same value.
- `src/welcome_template.py` (`compute_default_agent_prompt`): same
threading via the existing import-on-demand path.
Tests (`tests/test_home_route_resolution.py`):
- `get_atlassian_base_url` resolver: default empty, env override,
trailing-slash strip, trailing-`/wiki` strip.
- `atlassian_prompt(base_url=...)`: literal URL baked in, ask-step
removed, placeholder replaced, operator-baked-in copy appears.
- `atlassian_prompt(base_url="")`: existing ask-the-user flow
unchanged.
- `all_connector_prompts(atlassian_base_url=...)`: kwarg threads
through to the rendered atlassian prompt.
135 tests pass.
* feat(asana): register hosted Asana Remote MCP in connector prompt
The Asana connector prompt only stored a PAT in the OS keychain + ran
a curl verify against /api/1.0/users/me. That set Claude Code up for
direct `curl` calls but didn't actually wire Asana into Claude's tool
list — so the user couldn't ask Claude to "find my open Asana tasks"
and have it work. Symmetric oversight to the Atlassian connector's
original speculative `.claude/mcp/atlassian` line that this branch
already replaced with `claude mcp add --transport sse atlassian
https://mcp.atlassian.com/v1/sse`.
Adds a new step 5 that registers Asana's hosted Remote MCP:
claude mcp add --transport http asana https://mcp.asana.com/mcp || true
This is the V2 endpoint (streamable HTTP transport, launched February
2026). The V1 SSE endpoint at https://mcp.asana.com/sse was deprecated
2026-05-11 (today) and must NOT be used — calling it out explicitly
in the prompt body so a future operator who finds an old reference
doesn't paste the dead URL. OAuth is handled by Claude Code at first
use, same model as the Atlassian MCP step.
The PAT stored in step 3 stays for direct `curl` calls (precheck +
ad-hoc scripts) — the MCP path uses its own OAuth grant, not the PAT.
Old step 5 (revoke instructions) renumbers to step 6 and adds the
`claude mcp remove asana` cleanup hint.
Same single-source-of-truth invariant holds: /home Asana tile + the
inlined Asana sub-block in the setup script (step 7 connectors) both
emit identical text from `asana_prompt()`.
71 tests pass.
* feat(asana): drive MCP OAuth login + end-to-end validation post-register
`claude mcp add --transport http asana ...` only registers the
server in Claude Code's local config — it does NOT trigger OAuth.
The browser tab opens the first time any `mcp__asana__*` tool gets
invoked. So the previous step 5 left a user looking at a "registered"
MCP that, in practice, hadn't authed yet and would fail on first
real use. Same blind spot Atlassian's prompt also has, but Asana was
the one called out in the latest review pass.
Adds a new step 6 between MCP registration (step 5) and the revoke
instructions (now step 7):
a. Tell the user verbatim what's about to happen — a low-impact
read through the MCP will pop the OAuth browser tab; sign in
with the same account whose PAT they stored in step 3 and
approve. Frames the OAuth as one-time so users don't wait
for it on every later call.
b. Drive an actual MCP read. Don't prescribe the exact tool name
because the Asana MCP's exposed surface (`mcp__asana__*`) is
versioned upstream and we don't want to pin to a name that
gets renamed. Instead: tell Claude to pick the lightest read
from its surfaced tool list (users-me / list-workspaces /
equivalent). Document the recovery path when Claude Code
times out waiting for the OAuth tool use: `claude mcp list`
to confirm registration before retrying.
c. Print a single one-line proof that combines wiring + auth:
"Asana MCP connected as <name> — <N> workspace(s) visible."
Explicit anti-echo callout for tokens, task content, comments.
On failure, surface the exact Claude-Code error and stop —
no silent pass.
d. Sanity-check that the MCP OAuth identity and the PAT identity
reference the same Asana account. Easy mistake to make when
the user has multiple Asana accounts — flag only on mismatch,
keep quiet when they match. Recovery: `claude mcp remove asana
&& claude logout asana` then redo step 5.
Step 7 (revoke) absorbs both the keychain delete + the
`claude logout asana` line so users have a single place to undo
everything.
43 tests pass.
* fix(init): clear stale CA env vars on Windows before any TLS handshake
Reported by the 2026-05-11 Windows test pass: after `agnes init` the
gws connector failed with `UnknownIssuer` TLS errors because
`SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE` were still set in Windows
User scope pointing at `C:\Users\localadmin\.config\agnes\ca-bundle.pem`
— a file that did not exist on the test host. Past Agnes installs
(the setup-prompt trust block + older bootstrap helpers) write those
pointers when they materialize a combined Agnes-CA bundle; when the
bundle file later disappears (re-init on a new VM, machine swap, the
~/.agnes dir wiped), the pointers go stale and every native Windows
TLS handshake fails before Agnes itself runs. SSL_CERT_FILE in
particular REPLACES (not appends to) the trust store, so a stale
pointer is silently catastrophic.
`agnes init` now clears stale pointers in two layers before the first
server roundtrip:
1. Current-process env (os.environ) — what the immediately-following
`api_get` to /api/catalog/tables actually reads. Without this, init
itself blows up before it gets to step 2.
2. Windows User-scope env via PowerShell
`[Environment]::SetEnvironmentVariable(name, $null, 'User')` — what
every future shell + every native tool (gws, claude.exe, pip, uv)
inherits. The 2026-05-11 reporter expected this exact cleanup
("init was supposed to clear these but they persisted").
The cleanup is best-effort and conservative:
- Only deletes a var when its value points at a path that does NOT
exist on disk. Intentional operator config (e.g. SSL_CERT_FILE
pointing at a corp certifi bundle) stays put.
- PowerShell missing / restricted execution policy / WSL-without-pwsh:
swallowed silently. The current-process leg still runs, which
unblocks init even on hosts where the User-scope leg cannot fire.
Tests (`tests/test_init_ca_cleanup.py`, 6 cases):
- Stale pointers → removed from process env.
- Real-path pointers → preserved.
- Non-Windows hosts: PowerShell is not invoked.
- Windows hosts: PowerShell IS invoked with a script that checks
all three vars + uses Test-Path + SetEnvironmentVariable.
- PowerShell FileNotFoundError: cleanup swallows it, does not raise.
- `_is_windows_host()` reflects sys.platform.
* refactor(asana): MCP-first flow — drop PAT storage, precheck via `claude mcp list`
The Asana hosted MCP at https://mcp.asana.com/mcp authenticates via
OAuth (Claude Code holds the grant; browser tab pops on first tool
use). The earlier prompt walked the user through creating + keychain-
storing an Asana Personal Access Token AND registering the MCP — two
parallel auth surfaces for one connector. Once the MCP works, the PAT
has no consumer: the precheck/verify steps that used `curl
$BASE/api/1.0/users/me` are just redundant proof that Asana itself is
reachable, which the OAuth handshake already establishes.
Removed:
- Step 0 keychain probe + curl verify against /users/me with PAT.
- Step 1 open developer-console / create PAT.
- Step 2 click "+ New access token", warn shown-ONCE.
- Step 3 helper-script for keychain-storage (per-OS bodies: macOS
`security add-generic-password`, Linux `secret-tool store`, Windows
`cmdkey /generic`).
- Step 4 PAT-side `users/me` verify.
- Step 5's split that kept the PAT around for direct curl scripts.
- Step 6d's "MCP vs PAT identity sanity check" — there is no PAT
anymore, nothing to mismatch against.
New flow (3 steps total):
- Step 0 precheck: `claude mcp list | grep ^asana` — if found, the
server is registered AND Claude Code is holding its OAuth grant
(otherwise prior failure would have removed it); print
"Asana MCP already registered — skipping setup" and stop. Tells the
user the explicit reset command (`claude mcp remove asana && claude
logout asana`) so a re-register stays one paste.
- Step 1: `claude mcp add --transport http asana
https://mcp.asana.com/mcp` — no `|| true` because step 0 should have
caught the "already exists" case. Step explains the V2-vs-V1
endpoint distinction (V1 SSE deprecated 2026-05-11) and the
abort-clean recovery if the precheck somehow missed the existing
server.
- Step 2: same OAuth + low-impact-read validation pattern as before.
- Step 3: revoke instructions (mcp remove + logout + Asana-side app
revoke at app.asana.com/Settings → Apps).
Both surfaces (the /home Asana tile and the inlined Asana sub-block
in the setup script's step 7) emit the new text from the same
asana_prompt() — single-source-of-truth invariant intact.
77 tests pass.
* Make /home install-hero links readable against blue background
The Claude license-options link added in the previous commit inherited
the default `<a>` style (`var(--hp-primary)` blue), which renders as
blue-on-blue and is unreadable inside the blue install-hero. Add a
scoped `.install-hero a` rule that uses white with an underline
(matching the existing lead-paragraph contrast pattern) so any link
nested in the hero stays legible.
* Reorder /home install flow: auto-mode is now Step 2, Agnes install becomes Step 3
Step 3 (was Step 2) pastes a ~20-command bash bootstrap into a fresh
Claude Code session. Without auto-mode enabled first, each Bash/edit
command needs a manual approve click — bad UX for first-time users.
Move auto-mode from the outside-hero `<details>` reference block into
the install-hero as a real Step 2, between "install Claude Code" and
"install Agnes". Content is the persistent `acceptEdits` snippet
(write to ~/.claude/settings.json) plus a one-liner pointing at
Shift+Tab for users who are already inside a running Claude Code
session. YOLO mode for full Bash auto-approve stays on
/setup-advanced behind the existing link.
The outside-hero `setup-collapsible[data-section="step3"]` block is
dropped — auto-mode is no longer reference content, it's a real
install step, and duplicating it would just diverge over time.
Onboarded users no longer see the auto-mode block at all (consistent
with Steps 1 + 3 also hiding post-onboarding).
Completion banner copy updated: "Step 1, 2 & 3 done — Claude Code
installed, auto-mode set, Agnes ready". Dashboard CTA partial and
other templates don't reference step numbers for this flow, so no
adaptation needed there.
* Simplify /home Step 2 to Shift+Tab only — drop the JSON snippet
Operator pointed out two issues with the prior Step 2:
1. The settings.json snippet is redundant. Claude Code's first
Shift+Tab cycle to auto-accept mode already prompts the user
whether to persist it as default — Claude writes the config
itself, no manual file edit needed.
2. The snippet only showed the POSIX path `~/.claude/settings.json`,
which doesn't translate to native Windows.
Replace the snippet + copy button with a plain Shift+Tab instruction,
explicitly call out the first-time "make this the default?" prompt,
and note that Claude handles the config write itself — same flow on
macOS / Linux / WSL / Windows. Adds a fallback line for users who
already closed the post-OAuth session.
* Tighten /home Step 2 install-note to two paragraphs
Operator: drop the 'Claude writes the setting itself, so this works
the same on macOS / Linux / WSL / Windows...' line plus the
'auto-approves file edits going forward; Bash commands stay gated
— that's the safe default' line. Both were filler — the make-default
prompt already implies persistence, and gated Bash is the obvious
default users won't be surprised by.
Result: paragraph 1 carries Shift+Tab + first-time make-default
say-yes + closed-session fallback in one breath; paragraph 2 keeps
the verbatim YOLO link. Same affordances, less vertical space.
* feat(store): flea-market upload guardrails + soft delete + JOIN-based admin queue
Adds an end-to-end guardrails pipeline for store uploads (manifest +
static-security + LLM review), persists blocked bundles for forensics,
introduces soft-delete (Archive) semantics, consolidates the legacy
/store/{id} surface into /marketplace/flea/{id}, and reworks the admin
queue so lifecycle filters read live entity visibility via LEFT JOIN
rather than a denormalized submission column.
Schema v29 → v35:
* v29 store_submissions table + store_entities.visibility_status
* v30 file_size, bundle_sha256, bundle_purged_at on submissions
* v31 reshape store_submissions (drop legacy unique on entity_id)
* v32 store_entities.archived_at/by + 'archived' visibility value
* v33 drop store_submissions.retry_count (unused)
* v34 ensure idx_store_submissions_entity exists post column-drop
* v35 broaden visibility_status enum + JOIN architecture cutover
Pipeline (src/store_guardrails/):
* Inline checks: manifest_check, static_scan, quality_check
* LLM review configurable haiku|sonnet|opus (default haiku)
* BackgroundTasks-driven async path with structured-output JSON
* Per-submitter daily quota (default 50)
* 30-day TTL purge job (POST /api/admin/run-blocked-purge)
* Bundle SHA256 + size persisted; sha256 survives purge for forensics
Visibility model:
* pending | approved | hidden | archived
* _enforce_visibility returns 404 (no leak) for non-owner non-admin
* Owner sees own non-approved entries via include_owner_id widening
* Install refused with 409 entity_not_approved when not approved
Soft-delete (DELETE /api/store/entities/{id}):
* Default = soft (visibility_status='archived'); existing installs
keep getting served the bundle so users don't lose the plugin
* ?hard=true admin-only: drops bundle + cascades user_store_installs
* Hard-delete preserves entity_id on submission as tombstone so
audit_log linkage survives for the activity timeline
Admin queue lifecycle (the JOIN refactor):
* Verdict (store_submissions.status) is immutable forensic record
* Lifecycle (store_entities.visibility_status) is live state
* /admin/store/submissions Archived chip translates to
`e.visibility_status='archived'` via LEFT JOIN — any path that
flips visibility surfaces in the queue immediately
* Detail page renders Status (verdict) and Entity lifecycle side by
side so admins see "approved at review, now archived" at a glance
URL consolidation:
* /store/{id} deleted (no redirect, stale bookmarks 404)
* /marketplace/flea/{id} is the canonical detail surface
* Three in-tree callers (upload-success, my-stack card, store
listing card) updated to point at the new URL
* Quarantine banner extracted to _quarantine_banner.html partial,
self-guarded, included from both flea detail templates
* Banner JS auto-refreshes when the verdict lands by polling
/api/marketplace/flea/{id}/detail (visibility_status +
submission_status — the latter is needed because blocked_llm
keeps the entity at visibility_status='pending')
Audit log resource format:
* runner.py emits prefixed `store_submission:{id}` (post-fix)
* Detail-page timeline query handles three patterns: prefixed
submission, helper-emitted `store_entity:{sub_id}`, and bare-id
legacy rows — all surface in the activity timeline
UX fixes:
* Owner sees Under review / Quarantined / Hidden banner with status
* Install button gray-disabled (not blue) when non-approved
* Owner cannot delete quarantined entries (403); admin can
* Admin queue: filter chips, sortable columns, paging, page-size
* Auto-refresh queue every 5s while pending rows are visible
* Store upload page file picker no longer opens twice (label →
input default action collided with explicit JS handler)
Tests: 168 passed across the guardrails suites (admin submissions,
store API, inline / LLM / purge guardrails, store repositories,
marketplace filter, schema version). New regression coverage
includes: archive surfaces via JOIN even when API path is bypassed;
deleted submission renders activity timeline (tombstone); flea
detail surfaces submission_status only for owner/admin; detail page
renders Entity lifecycle row; audit log resource format covers both
helper and runner paths.
* fix(store-guardrails): PR #233 follow-up — prompt injection, atomic PUT, BG race, schema, reaper, sort whitelist
Addresses 9 of the 23 findings from the PR #233 review (spec at
docs/superpowers/specs/2026-05-09-pr233-guardrails-fixes-spec.md).
Merge-gate items #1-#6 plus high-value mediums #7, #9-#12, #23.
Architectural items (#8 enum split, #14 factory) and pure
maintainability (#15-#22) deferred to follow-ups.
Security:
* #1 prompt injection — SYSTEM_PROMPT now passed via the SDK's
dedicated system= parameter; bundle wrapped in <bundle>...</bundle>
sentinels declared data-only by the system prompt; literal
sentinel strings in user content are escaped so an adversarial
README can't forge a close tag.
* #6 static scan honesty — module docstring + admin copy + docs
declare static scan as signal not gate; .md/.txt/.rst/.html/.json/
.yaml/.yml/.toml skipped to avoid false positives on prose.
AST mode for Python deferred (separate flag, FP comparison work).
Correctness:
* #2 PUT atomicity — bundles bake into plugin.staging-<rand>/
alongside live, atomic-rename on success; failed checks leave
live tree byte-for-byte intact.
* #3 BG-task race — set_visibility_if_pending guards verdict flips
to the (pending, hidden) review window; admin archives during
review survive; skipped flips audit-logged.
* #4 v35 NOT NULL/DEFAULT — schema v35→v36 re-applies them on
store_entities.visibility_status. CHECK constraint enforced
application-side (DuckDB ADD CHECK on existing column unsupported).
* #7 stuck-review reaper — reap_stuck_llm_reviews flips pending_llm
rows older than guardrails.stuck_review_grace_seconds (default
1800) to review_error. Scheduler runs every 15 min via new
/api/admin/run-reap-stuck-reviews. Set knob to 0 to disable.
* #9 quota counter — count_blocked_for_submitter_since now counts
blocked_inline + blocked_llm + review_error so a submitter
triggering only LLM-blocked verdicts is bounded.
* #10 missing risk_level — surfaces as review_error with
error='missing_risk_level' instead of silently defaulting to
'medium' (which looked like a model-decided block).
* #11 archived_at clear — set_visibility nulls archived_at +
archived_by when transitioning out of 'archived' so a future
read doesn't show stale archive forensics on an approved row.
Maintainability:
* #12 FSM doc comment — accurate insert/transition/lifecycle
description in src/db.py near store_submissions schema.
* #23 sort-key whitelist — admin queue rejects unknown sort keys
with 400 invalid_sort_key; substring-replace footgun removed.
Deferred (separate PRs):
* #5 quota race — proper fix requires asyncio.Lock spanning the
full pipeline; threading.Lock blocks event loop, DuckDB MVCC
doesn't help. API-level slowapi bounds worst case for now.
* #6 part 3 (AST static scan), #8 (enum split), #13 (import
bundle docs), #14 (factory consolidation), #15-#22 (maint).
Tests:
* New: tests/test_store_guardrails_prompt_injection.py (corpus +
trust-boundary invariants), tests/test_store_put_atomic.py,
tests/test_store_guardrails_reaper.py.
* Extended: test_store_guardrails_llm.py (system param, missing
risk_level, BG race), test_admin_store_submissions.py (quota
counter widening, sort whitelist 400), test_store_repositories.py
(un-archive metadata clear), test_db_schema_version.py (v36).
* Full suite: 3738 passed; 17 pre-existing baseline failures
unchanged (db migration tests, cli binary rename, catalog export,
user mgmt v5 backfill — confirmed by stash + rerun on clean tree).
* feat(home+news): state-aware /home + /news + admin-edited news section
Squash of the vr/home-page feature work for clean rebase onto main.
Original 18-commit history preserved in branch backup/vr-home-page-pre-rebase.
What's in this PR:
**State-aware /home page**
- New `/home` route with hero + auto-mode + connectors (Asana / GWS /
Atlassian) + lookarounds. Onboarded vs not-onboarded state-machine
branches a single template (`home_not_onboarded.html`); the install
steps, "Setup a new Claude Code" CTA (90-day PAT mint), and per-
connector setup prompts hide once `users.onboarded=TRUE`. A
completion badge replaces them.
- "Mark me as offboarded" button reverses the flag without an SQL UPDATE.
- `users.onboarded BOOLEAN` column added; default FALSE; flipped by the
CLI's `agnes init` post-success POST and the `/admin/users` API.
- Connector setup prompts pre-check whether the tool is already
installed/connected before re-running setup.
- GWS scope set widened to include Google Chat (`chat.spaces`,
`chat.messages`).
**Single template + design tokens**
- `dashboard.html` now extends `base.html` via the new
`{% block layout %}` opt-out (full-width pages skip the 800px
`.container`). Net: every page shares one shell.
- `style-custom.css` `:root` extended with `--space-{7,9,10,12}`,
`--radius-2xl`, `--shadow-{card,elevated}`, `--text-{muted,disabled}`,
`--focus-ring`, `--transition-*`, `--width-{narrow,app,wide}` so
inline page styles can migrate incrementally.
**Auth redirects honor AGNES_HOME_ROUTE**
- `safe_next_path` resolves the configured home route when no `default=`
is passed; OAuth callbacks, magic-link clicks, password form, and
LOCAL_DEV_MODE shortcuts now land on `/home` (or whatever the operator
picked) instead of always /dashboard.
**News section + /news permalink + /admin/news editor**
- Schema-bumped `news_template` table (single versioned entity, draft +
publish gate). `published BOOLEAN` distinguishes draft from public;
monotonically-increasing `version` per save; rows >30d pruned on
save except the currently-displayed published version.
- `/home` bottom-of-page renders the latest published intro with a
"Read more →" link to `/news` (which renders the full body).
- `/admin/news` editor with sandboxed live preview, versions table,
per-row Unpublish, Format-help cheatsheet.
- `agnes admin news show / draft / edit / publish / unpublish /
versions / export` (CLI). Talks to the live server via the
`/api/admin/news/*` endpoints (PAT-authed) — no direct DB access
so it coexists with a running uvicorn.
- **Optimistic-lock guard**: `agnes admin news publish --version N` and
PUT/PATCH endpoints accept `expected_version` and 409 with structured
`{error: "version_conflict", expected, actual, actual_by}` when a
concurrent admin replaced the draft. Edit refuses to overwrite a
draft authored by someone else without `--force` or
`--expect-version`.
- nh3 (Rust-backed ammonia) HTML sanitizer; iframe pre-pass strips
any iframe whose src is not on the YouTube/Vimeo/Loom allowlist;
javascript:/data: schemes blocked everywhere.
- Author CSS vocabulary: `.news-hero` (blue gradient hero block),
`.callout`/`.callout-{info,warn,success,danger}`,
`.video-embed`, `.news-section`, `.news-grid-{2,3}`, `.news-cta` —
all consolidated in `style-custom.css` under "News content
vocabulary (shared)" so /home perex, /news body, and /admin/news
preview share one source of styling.
- Code-inside-`<pre>` contrast fix (was unreadable amber-on-silver).
- `.news-content` table styling (border, header band, row-hover).
**`scripts/dev/run-local.sh`** — local uvicorn launcher. Pulls Google
OAuth client id/secret from GCP Secret Manager
(`AGNES_OAUTH_GCP_PROJECT`-driven, no vendor defaults), points
`AGNES_CLI_DIST_DIR` at `./dist` so the wheel endpoint resolves, and
`--dev` flips `LOCAL_DEV_MODE=1` + `AGNES_HOME_ROUTE=/home` for one-
command iteration. `LOCAL_DEV_MODE=1` also enables the FastAPI debug
toolbar.
**CLAUDE.md "Run tests before every push" section** codifies
`pytest tests/ -n auto -q` as non-negotiable before each push.
**Tests**: 51 + 14 + 8 = 73 new tests across news-template repo,
sanitizer, API, web, CLI; plus updated home/auth/template tests for
the new shared-shell architecture.
Origin docs (gitignored, customer-fork content):
docs/brainstorms/home-page-requirements.md,
docs/plans/2026-05-07-001-feat-home-page-plan.md.
* feat(cli): agnes onboarded {on,off,status} — self-scoped flag toggle
User-facing equivalent of the in-page "Mark me as (off)boarded" button
on /home. POSTs /api/me/onboarded with {onboarded, source}; --source
overrides the audit-log marker so flips made from the CLI vs the web
button vs agnes init automation stay distinguishable.
`status` reads via /api/me/profile (when present); falls back to a
quick body-marker scan of /home so the read path doesn't write an
audit_log row. PAT-authed via cli.client.api_post — same convention
as agnes admin news / agnes admin add-user etc.
Tests: 5 covering on/off/status round-trip, idempotency, and
audit-log source recording. Full suite holds at 12 pre-existing
failures (same set as before).
* ui(nav+home): primary nav reorg + green What's new band + /marketplace link fix
Primary nav (post-rebase audit + per-user feedback):
- Items: Home → Marketplace → Data Packages → Memory. Admin dropdown
for admins only. The "Dashboard" label was renamed Home — point still
resolves through `home_route` so customer instances on /dashboard
still land there.
- Activity Center moved into the Admin dropdown. Per-team adoption
analytics is admin-consumed in practice; the route still allows
any authed user for direct deep-links so existing /home tile +
bookmarks keep working.
- Memory link added (→ /corporate-memory) — was previously buried in
the /home "Look around" tiles.
- Setup local agent + My Stack dropped from main nav. Setup is the
/home install flow's home now; My Stack lives as a tab inside
/marketplace.
/home tweaks:
- Plugin marketplace tile now points at /marketplace (was /store —
legacy from before the marketplace rebrand landed in #230).
- "What's new" section header gets a green band (success-flavored
D1FAE5 background, A7F3D0 border, darker green title) so the
bottom-of-page news block visibly distinguishes from the blue
install-hero at the top. Header strip only — body stays white.
Test fix: test_home_route_resolution renamed `dashboard_link_uses_home_route`
→ `home_link_uses_home_route` and asserts `href="/home">Home` instead
of `href="/home">Dashboard` after the label change.
* fix(home): decouple Step 3 + Connect-tools collapse from server onboarded flag
The server-side `users.onboarded` flip happens through two paths:
1. Explicit user click on "Mark me as onboarded" or `agnes onboarded on`.
2. Implicit `agnes init` POST → /api/me/onboarded on success.
Path 2 produced a UX surprise: an analyst running `agnes init` mid-flow
reloaded /home and saw Step 3 (auto-mode) + Connect-your-tools auto-
collapse to summary bars. They were actively working through those
sections — the install POST never signalled "I'm done with the rest
of setup", just "Agnes itself is installed".
Decouple the section-collapse decision from the server flag:
- Step 1 + Step 2 install blocks: still hidden on `onboarded=TRUE`
(their completion is a hard server signal — Agnes IS installed).
- Step 3 + Connect-your-tools: render flat by default in BOTH states.
Wrapped in `<details class="setup-collapsible" open>` so the
browser's native disclosure handles per-section toggle without JS,
but the `<summary>` is CSS-hidden until the page-level
`data-setup-minimized="1"` attribute is set on `.home-mock`.
- New "Minimize setup view" toggle inside the blue install-hero,
rendered only when onboarded. Click flips the data-attr on
`.home-mock` AND removes the `open` attribute from each
`<details>`. State persists in `localStorage["agnes_home_setup_minimized"]`
so the choice survives reloads but is per-device.
- "Show full setup view" (the same button when minimized) re-opens
both `<details>` and clears localStorage.
When minimized, each `<details>` still has its own native expand/
collapse — click the gray summary bar to peek at one section without
toggling the page-level minimize off.
Tests:
- test_step3_and_connectors_render_flat_when_onboarded_by_default —
asserts `<details class="setup-collapsible" ... open>` for both
sections post-onboarding and the absence of any server-rendered
`data-setup-minimized` attribute on the `.home-mock` root.
- test_minimize_toggle_visible_only_when_onboarded — toggle button
rendered only when onboarded.
Full pytest holds at 12 pre-existing failures (same set).
1. instance.yaml overlay path now matches read site under STATE_DIR.
Three sites updated:
- app/api/admin.py:1005 (server-config endpoint writer)
- app/api/admin.py:2610 (configure endpoint writer)
- app/instance_config.py:106 (overlay reader)
All three now go through _state_dir() so under flat-mount layout
(STATE_DIR=/data-state) the irreplaceable instance.yaml overlay
lands on the state disk (sdc) instead of the regenerable data
disk (sdb). Without this fix, .env_overlay correctly went to the
state disk while instance.yaml went to the data disk — config
would be lost if an operator wiped sdb.
2. Strip customer-specific tokens from OSS repo per CLAUDE.md
vendor-agnostic rule:
- docker-compose.host-mount.yml: 'a deployer (Groupon FoundryAI)'
→ 'a deployer in production'
- docker-compose.flat-mount.yml: 'caused 2026-05-05 in the
Groupon FoundryAI deployment' → generic 'production failure
mode'
- docs/state-dir.md: rewrote the incident reference to describe
the failure mode abstractly without naming the deployment;
updated the recommendation table to say 'shadow-mount class'
instead of dating the specific incident.
3. Updated docs/state-dir.md 'What reads STATE_DIR' to list all
read/write sites including the three migrated in this round
(admin.py, instance_config.py, marketplaces.py).
ANALYSIS finding (tls-rotate.sh hardcoded host-mount.yml) deferred
— same operator-side class as auto-upgrade.sh hardcoded host-mount,
documented limitation per the PR body.
Devin BUG: /api/admin/configure seeds an ai: block to the writable
overlay at DATA_DIR/state/instance.yaml, but the three LLM consumers
imported from config.loader.load_instance_config — which reads the
static config dir only. Even if they had read the overlay, the loader
ran yaml.safe_load directly without passing through _resolve_env_refs,
so '${ANTHROPIC_API_KEY}' would have stayed a literal placeholder. The
pipeline appeared to work because the factory falls back to the env
var directly, but the overlay path itself was dead code.
Two fixes, both required:
1. Switched the three LLM consumers to app.instance_config.load_instance_config:
- services/corporate_memory/collector.py:collect_all
- services/verification_detector/__main__.py:main
- app/api/admin.py:run_verification_detector
2. app/instance_config.py runs the loaded overlay through
config.loader._resolve_env_refs *before* the deep-merge, so
'${ANTHROPIC_API_KEY}' resolves at config-load time.
New regression suite tests/test_instance_config_overlay.py pins:
- env-ref resolution against the overlay (resolved when env set,
empty when env missing — never the literal placeholder)
- deep-merge still preserves static-only sections
- the three consumers reach app.instance_config (inspected via
inspect.getsource so a future refactor that reverts the import
fails the test)
- end-to-end: a seeded overlay + ANTHROPIC_API_KEY env reaches the
factory with a resolved api_key
* docs(spec): #134 unify BigQuery access behind BqAccess facade
Brainstorm output for issue #134. Captures:
- root cause (incl. correction of the issue's hypothesis about commit 33a9964)
- BqAccess facade API + project resolution rules
- error contract — typed BqAccessError mapped to HTTP 502 for upstream
BQ failures, 500 for deployment/config bugs
- migration plan for v2_scan, v2_sample, RemoteQueryEngine
- test rewrite eliminating _bq_client_factory injection point
- E2E verification protocol on agnes-development as success criterion
* docs(spec): #134 revise after first review
Incorporates code-reviewer findings:
Must-fix:
- Add v2_schema (2 copies of INSTALL/LOAD/SECRET dance) to migration scope.
- Reframe v2_scan headline: missing try/except around BQ calls is the
actual cause of bare 500s, not project resolution (which 33a9964 fixed).
- List two more deferred call sites (extractor.py, register_bq_table)
with explicit rationale.
Important:
- Drop billing != data clause from cross_project_forbidden heuristic;
rely only on 'serviceusage' substring. billing != data is normal
for cross-project setup, was over-classifying.
- Split bq_bad_request into _user (400) and _server (502) variants;
add sql_origin parameter to translate_bq_error so call sites declare
whether SQL contains user input.
- Add @functools.cache to BqAccess.from_config; document tests bypass
via dependency_overrides.
- Replace monkey-patched-classmethod test pattern with
BqAccess(client_factory=...) injection at construction time. Cleaner
than today's _bq_client_factory and 1:1 migration shape.
- Keep BqProjects.data (reviewer assumed registry has source_project;
it doesn't). Multi-project explicitly listed as non-goal with note.
Nice-to-have:
- Add 'Implementation strategy' section: 2 staged commits (bug fix
alone is revertable; refactor follows).
- Extend E2E protocol to cover all three endpoints, not just /sample.
- Note removal of stale docstring at src/remote_query.py:204.
* docs(spec): #134 revision 3 — incorporates second-round review
Must-fix from second review:
- v2_schema split into two migration cases: _fetch_bq_schema translates
errors via translate_bq_error; _fetch_bq_table_options preserves its
swallow-all 'except Exception → return {}' so /schema doesn't 502 on
partition-info failures.
- RemoteQueryEngine.__init__ now resolves BqAccess lazily (in
_get_bq_client, not in __init__). Without this, ~7 DuckDB-only tests
in test_remote_query.py would suddenly fail with not_configured.
- translate_bq_error pass-through for BqAccessError is now load-bearing
(clause 1, before any Google-API branch). bq.client() raises BqAccessError
for bq_lib_missing/auth_failed; without explicit pass-through those
fall to 'unknown' and re-raise as bare 500.
- Commit 1 now emits the SAME structured response shape as commit 2 to
avoid contract churn between commits.
- BIGQUERY_PROJECT env-var precedence is BREAKING for env-only deployments
— flagged in CHANGELOG ### Changed.
Editorial:
- sql_origin renamed to bad_request_status with values 'client_error' /
'upstream_error' (clearer about what the parameter actually decides).
bq_bad_request_user/_server kinds collapsed to bq_bad_request (400)
and bq_upstream_error (502).
- CLI (cli/commands/query.py) noted as external RemoteQueryEngine caller;
unaffected because new bq_access kwarg has default None.
- Added unit/integration tests for the new contracts:
test_translate_passes_through_BqAccessError,
test_v2_scan_returns_500_on_bq_lib_missing,
test_v2_schema_returns_200_with_empty_partition_on_bq_failure,
test_resolve_succeeds_after_config_set.
- E2E protocol now covers /schema as the fourth endpoint.
- Documented functools.cache-doesn't-cache-exceptions semantics and
fixture nullcontext-doesn't-close caveat for nested sessions.
* docs(spec): #134 revision 4 — incorporates third-round review
Third reviewer verdict: 'implementation-ready with two trivial edits';
explicitly noted prior rounds did the heavy lifting.
Edits:
1. get_bq_access() module-level function instead of @classmethod
@functools.cache from_config. Removes the classmethod-cache stacking
footgun (different Python versions wrap differently) and gives FastAPI's
dependency introspection a clean function signature. Drops the
'Do not subclass BqAccess' caveat that no longer applies.
2. Commit 1 strategy explicitly: wrap _fetch_bq_sample (v2_sample),
_bq_dry_run_bytes + _run_bq_scan (v2_scan), and _fetch_bq_schema
(v2_schema strict block). Do NOT touch _fetch_bq_table_options swallow-all
in commit 1 — preserved as-is, then migrated (still preserved) in commit 2.
All three endpoints emit the same structured body shape so client parsers
see one consistent contract throughout the staged rollout. No more
half-rolled-out window where /sample is bare 500 while /scan is
structured 502.
* docs(plan): #134 implementation plan — Phase 1 (atomic bug fix) + Phase 2 (BqAccess refactor) + Phase 3 (verification)
Bite-sized TDD tasks. 3 phases, 16 tasks total:
Phase 1 (Commit 1) — atomic bug fix across all four v2 endpoints:
Tasks 1.1-1.5 wrap _fetch_bq_sample, _bq_dry_run_bytes, _run_bq_scan,
_fetch_bq_schema with structured 502/400 try/except. _fetch_bq_table_options
preserved untouched. CHANGELOG Fixed entries.
Phase 2 (Commit 2) — BqAccess facade extraction + migration:
Tasks 2.1-2.5 build connectors/bigquery/access.py bottom-up
(BqProjects, BqAccessError, translate_bq_error, default factories,
BqAccess class, get_bq_access module-level cached). Task 2.6 adds
conftest.py fixture. Tasks 2.7-2.9 migrate v2_scan, v2_sample, v2_schema
to BqAccess. Tasks 2.10-2.11 migrate RemoteQueryEngine + tests
(lazy bq_access, drop _bq_client_factory). Task 2.12 CHANGELOG
Changed BREAKING + Internal.
Phase 3 — Verification:
3.1 full pytest. 3.2 squash into two PR-shape commits. 3.3 manual
E2E on agnes-development per spec protocol → close#134.
Self-review table maps spec sections to implementing tasks; no gaps.
* fix(v2): #134 structured 502/400 on BQ errors across /scan, /scan/estimate, /sample, /schema
Wraps the BigQuery call sites in v2_scan, v2_sample, and v2_schema (strict
block only) with try/except for google.api_core exceptions, translating to
HTTPException with a structured body shape: {error, message, details}.
Fixes Pavel's report (#134) where these endpoints returned bare HTTP 500
with no body when the SA on agnes-development hit cross-project Forbidden
on serviceusage.services.use.
Also fixes /sample's missing billing_project fallback (the bug 33a9964
fixed for /scan never landed here).
Status code split:
- /scan, /scan/estimate: BadRequest -> 400 (bq_bad_request) since SQL is
user-derived from req.select/where/order_by.
- /sample, /schema: BadRequest -> 502 (bq_upstream_error) since SQL is
server-constructed from validated identifiers.
- All Forbidden -> 502 with cross_project_forbidden if 'serviceusage' in
error message (with hint pointing at data_source.bigquery.billing_project),
else bq_forbidden.
Body shape matches what the upcoming BqAccess refactor (next commit) will
produce, so client-side parsers see one consistent contract throughout
the staged rollout.
_fetch_bq_table_options preserved exactly as-is — its swallow-all-and-return-empty
contract is intentional and survives into the refactor; /schema continues to
return 200 with empty partition info when partition queries fail.
Outer wraps in scan_endpoint, scan_estimate_endpoint, sample, and schema
endpoints exist only to make the test pattern (monkeypatching whole
_fetch_* functions) work, and are tagged TODO(#134 Phase 2) for removal
once BqAccess centralizes translation.
* refactor(bq): #134 BqAccess facade — unify v2_scan, v2_sample, v2_schema, RemoteQueryEngine
Extracts the duplicated BigQuery-access pattern (project resolution +
client construction + DuckDB-extension session + Google-API error
translation) into connectors/bigquery/access.py. Migrates four
call sites to use it:
- app/api/v2_scan.py — _bq_dry_run_bytes, _run_bq_scan
- app/api/v2_sample.py — _fetch_bq_sample
- app/api/v2_schema.py — _fetch_bq_schema (strict translation),
_fetch_bq_table_options (preserves swallow-all best-effort contract)
- src/remote_query.py — RemoteQueryEngine, lazy bq_access kwarg
The new module exposes:
- BqProjects (frozen dataclass: billing + data project IDs)
- BqAccessError (typed exception with HTTP_STATUS class mapping)
- BqAccess (facade with injectable client_factory/duckdb_session_factory
for tests; defaults call the real google-cloud-bigquery + DuckDB extension)
- get_bq_access (module-level @functools.cache; FastAPI Depends target)
- translate_bq_error (Google API exception → BqAccessError mapper, with
BqAccessError pass-through, 'serviceusage'-substring heuristic for
cross_project_forbidden, and bad_request_status param distinguishing
user-derived (400) from server-constructed (502) SQL)
- _default_client_factory, _default_duckdb_session_factory
RemoteQueryEngine.__init__ no longer accepts _bq_client_factory; tests
migrate to bq_access=BqAccess(projects, client_factory=...). DuckDB-only
RemoteQueryEngine tests need no changes — bq_access defaults to None and
get_bq_access() is only invoked on first BQ call (lazy resolution).
BqAccessError raised internally is translated to RemoteQueryError(
error_type="bq_error") in _get_bq_client to preserve the engine's
existing public contract — CLI and /api/query/hybrid callers see no change.
Endpoint tests (test_v2_scan, test_v2_scan_estimate, test_v2_sample,
test_v2_schema) migrate from monkey-patching whole _fetch_* functions
to using the new bq_access fixture in tests/conftest.py — which
exercises the REAL translation path through BqAccess + translate_bq_error,
closing the test gap flagged in Task 1.1's review.
Side-effect behavior change: v2_sample's FROM clause now uses the data
project (instance.yaml data_source.bigquery.project), not the conflated
billing_project from Phase 1. Documented in CHANGELOG ### Internal.
BREAKING for deployments combining BIGQUERY_PROJECT env var with
data_source.bigquery.project in instance.yaml — env var now overrides
data project too. See CHANGELOG ### Changed.
Two known-duplicate BQ-access sites (connectors/bigquery/extractor.py,
scripts/duckdb_manager.register_bq_table) explicitly out of scope;
tracked as follow-up.
Removed stale docstring at the previous src/remote_query.py:204
that referenced scripts.duckdb_manager._create_bq_client as the default
BQ client factory (RemoteQueryEngine never actually used that function).
Test counts: tests/test_bq_access.py +27 (new), tests/test_v2_*.py +
tests/test_remote_query.py migrated to bq_access fixture (counts unchanged
or +1-2 per file). Full suite: 2086 passed, 8 pre-existing failures
(DB migration tests with unrelated internal_roles DependencyException —
not introduced by this PR).
* fix(bq_access): translate DefaultCredentialsError to BqAccessError(auth_failed)
CI on PR #138 caught: bigquery.Client(...) resolves Application Default
Credentials at construction time; without ADC (CI without SA key, dev
laptop without 'gcloud auth application-default login') it raises
google.auth.exceptions.DefaultCredentialsError synchronously.
Pre-fix _default_client_factory only caught ImportError, so DefaultCredentialsError
propagated as raw exception — and from production endpoints would surface
as bare 500 (the exact failure mode #134 sets out to fix).
Now translates to BqAccessError(kind='auth_failed', details.hint='Run
gcloud auth application-default login...'). Endpoint catch chain returns
HTTP 502 with structured body. Adds unit test
test_raises_auth_failed_on_default_credentials_error.
Third-round spec review flagged this case in passing; the fix didn't land.
CI's auth-less environment surfaced it.
* fix(bq_access): get_bq_access() returns sentinel instead of raising when not configured
Devin BUG_0001 on PR #138 review: 'get_bq_access() as FastAPI Depends
breaks all v2 endpoints for non-BigQuery instances'.
Pre-fix: get_bq_access() raised BqAccessError(not_configured) when
neither BIGQUERY_PROJECT env nor data_source.bigquery.project was set.
Because FastAPI resolves Depends() BEFORE the endpoint body runs, this
exception fires during dep-injection — the endpoint's try/except
BqAccessError clause never gets a chance to catch it. Result: every
v2 request on Keboola-only or CSV-only instances returned bare HTTP
500, even for local-source tables that never touch BigQuery.
Fix: get_bq_access() now returns a sentinel BqAccess with empty
BqProjects and factories that raise BqAccessError(not_configured)
on actual use. Construction succeeds, FastAPI's dep-injection cleanly
yields the sentinel, the endpoint runs. The local-source code path
in build_sample / build_schema / etc. never calls bq.client() or
bq.duckdb_session() (it reads parquet directly), so non-BQ tables
return 200 as before. Only when an endpoint actually tries to query
BQ (source_type == 'bigquery') does the sentinel raise — and the
endpoint's existing except BqAccessError catches it normally,
returning structured 502 with hint.
Test get_bq_access::test_raises_not_configured_when_neither_set
renamed and rewritten to test_returns_sentinel_when_neither_set:
asserts BqAccess is returned, then asserts client() and
duckdb_session() each raise BqAccessError(not_configured) on call.
Test test_does_not_cache_exceptions removed (no longer applicable)
and replaced with test_sentinel_is_cached_per_process documenting
the operator-restart-on-config-change contract.
* docs(spec+plan): #134 genericize customer-specific tokens (CLAUDE.md OSS rule)
Devin BUG_0001/0002 round 3 on PR #138: spec and plan docs contained
customer-specific deployment hostnames, deployment names, and a GCP
project ID that violated CLAUDE.md's vendor-agnostic OSS rule
('Nothing customer-specific belongs in code, configuration defaults,
comments, docs, commit messages, PR titles, or PR bodies').
Replacements:
agnes-development.groupondev.com -> <your-agnes-host>
agnes-development -> <your-dev-instance>
prj-grp-dataview-prod-1ff9 -> <your-data-project>
s1_session_landings -> <bq_table_id>
E2E verification semantics unchanged — operators still run the same
four curls + config flip + retry, just substituting their own host /
deployment name / project / table.
* fix(bq_access): hook get_bq_access.cache_clear into instance_config.reset_cache
Devin ANALYSIS_0004 on PR #138: get_bq_access is @functools.cache'd at
process level, so it captures BigQuery project IDs at first call and
ignores subsequent instance.yaml changes. Pre-Phase-2 the v2 endpoints
re-read get_value() on every request, so admin /api/admin/server-config
saves (which call instance_config.reset_cache()) hot-reloaded the BQ
project. Without this fix, my refactor silently regresses that contract
— operators editing instance.yaml via the admin UI would see no effect
on v2 endpoints until container restart.
instance_config.reset_cache() now also calls
connectors.bigquery.access.get_bq_access.cache_clear() (lazy import,
swallowed if connectors module isn't loaded — keeps instance_config
usable in isolated unit tests).
Adds test_instance_config_reset_cache_invalidates_get_bq_access as
regression guard. Updates CHANGELOG Internal entry to mention the
hot-reload contract + the not-configured sentinel behavior (round-3
fix from Devin BUG_0001 was previously only in commit message).
* fix(bq_access): surface not_configured before identifier validation + plan path genericize
Devin BUG_0001 + BUG_0002 round 5 on PR #138.
BUG_0001 (plan doc): personal filesystem path violated CLAUDE.md
vendor-agnostic rule. Replaced with '<worktree-root>' placeholder.
BUG_0002 (sentinel error path): when get_bq_access() returns the sentinel
BqAccess (BQ not configured), the empty bq.projects.data was reaching
validate_quoted_identifier first and raising ValueError -> endpoint
mapped to HTTP 400 'unsafe_identifier' instead of structured 500
'not_configured' with hint.
Each fetch helper now checks 'if not bq.projects.data: bq.client()' as
the first step, which triggers the sentinel's BqAccessError(not_configured).
Endpoint catches the typed error and returns HTTP 500 with hint pointing
at data_source.bigquery.project. Best-effort _fetch_bq_table_options
returns {} silently in this case (preserves the swallow-all contract).
* fix(bq_access): classify DuckDB-native exceptions from bigquery_query() via string match
Devin ANALYSIS on PR #138 review (latest round). The DuckDB bigquery
extension is a C++ plugin making its own HTTP calls — when BQ returns
403, it throws duckdb.IOException with the BQ error embedded as text,
not gax.Forbidden. translate_bq_error's isinstance checks would miss
these, falling to case 7 → bare 500 in production for v2_scan, v2_sample,
and v2_schema (the bigquery_query() paths).
Fix: last-resort string-match heuristic before the re-raise. 'Forbidden'
/ '403' / 'Bad Request' / '400' in the lowercased message classifies via
the same kind hierarchy. The 'serviceusage' substring still distinguishes
cross_project_forbidden from bq_forbidden. Specific enough that random
exceptions without HTTP-error keywords still re-raise.
Adds 4 unit tests covering the new heuristic + the 'don't swallow random
exceptions' invariant.
* chore(release): cut 0.22.0
PR #138 contains issue #134 user-visible behavior changes:
- BREAKING: BIGQUERY_PROJECT env var now overrides instance.yaml
data_source.bigquery.project for v2 endpoints (previously
RemoteQueryEngine billing only).
- Fixed: structured 502/400 on /api/v2/sample, /scan, /scan/estimate,
/schema when BigQuery raises Forbidden/BadRequest (was bare 500).
- Internal: BqAccess facade refactor unifying four duplicate BQ-access
call sites; instance_config.reset_cache() now invalidates BqAccess
cache too so admin server-config saves hot-reload BQ project IDs.
Bumps to 0.22.0 because PR #137 merged first and took 0.21.0.
Adds /admin/server-config UI for editing instance.yaml from the web. Hardening: SSRF gate on data_source URLs, narrow-overlay write strategy, atomic writes, audit log with secret masking on shape changes, threading lock on read-modify-write, corrupt-overlay refusal on write side + louder log on read side, modal Promise resolution on backdrop dismiss, sentinel scrub on save (defense-in-depth client+server). Bundles Windows PowerShell wrapper from #80. Cuts release v0.13.0.
- Config writes to DATA_DIR/state/instance.yaml (writable) instead of
CONFIG_DIR (read-only :ro in Docker)
- instance_config.py checks DATA_DIR/state/ first, then falls back to
CONFIG_DIR for backward compat
- CalVer counter is now global across channels (*-YYYY.MM.*) per spec
- Keboola error messages sanitized — log full error, return generic msg
- chmod in secrets.py wrapped in try/except for Windows compat
- Setup wizard JS handles 401 (expired JWT) with user-facing message
- deploy.yml changed to workflow_dispatch only (no duplicate test runs)
- Smoke test uses docker-compose.prod.yml + AGNES_TAG instead of sed
- docker-compose.prod.yml uses ${AGNES_TAG:-stable} env var
663 tests pass. 8 E2E verification tests pass.