agnes-the-ai-analyst

Author	SHA1	Message	Date
Vojtech Rysanek	4b48377d44	feat(web): instance.custom_scripts — operator-injected HTML/JS into base.html Add a generic, placement-aware mechanism for operators to inject HTML/JS into every page that extends base.html or base_login.html. Each entry takes name, enabled, placement (head_start \| head_end \| body_end), and html. Replaces the need for per-vendor helpers when shipping feedback widgets, analytics, or error-capture snippets. Trust boundary mirrors the existing instance.logo_svg / instance.overview pattern — admin-only, rendered with `\| safe`. Resolved by app/instance_config.py::get_custom_scripts(), surfaced in /admin/server-config via _KNOWN_FIELDS["instance"]. Empty default keeps the OSS vendor-neutral; sample Marker.io block ships commented out in config/instance.yaml.example as the canonical example.	2026-05-21 13:22:27 +04:00
Vojtech	001e5ce40e	feat(web): /home value-first redesign + unified page-shell across app (#366 ) * feat(web): value-first /home reskin (CEO mock palette + pillars + first-session) Restructures `/home` to lead with product value instead of install steps, matching the CEO mock proposed for the homepage: - New intro hero on top — eyebrow `Welcome, {{ display_name }}`, H1 `{{ instance_brand }} is your team's AI workspace`, lede framing the product as an "AI Chief of Staff", two CTAs (`Set up in ~15 min →` jumps to the wizard, `Just browse — no install needed` jumps to `#look-around`), and a four-pillar row (Data packages · Plugins · Skills · Memory). Renders for both onboarded and not-onboarded users so the value framing is consistent across visits. - New `first-session` narrative — five-beat walkthrough (launch → pick project → memory loads → ask → close) with mock terminal frames carrying traffic-light dots, prompts, and dimmed system output. - Setup wizard chrome — progress chip (`Step 1 of N · ~15 min · One-time · Reversible`), thin progress bar, and per-step number badges on each `.install-block` so the wizard reads as bounded instead of an open-ended scroll. - Palette shift from blue to green/navy: `--hp-primary` aliases `#2ea877` (mint), `--hp-hero-bg` is navy `#0f1b3a`, code panels stay near-black `#0c1224` with warm-yellow `#ffd866` accents. The token alias is reused so downstream rules pick up the new accent automatically; instance theme overrides via `config.theme_overrides()` still win. - VS Code surface tile carries a `Recommended` pill; the existing "Want to look around first?" section is renamed to `Explore your workspace` and gets the `#look-around` anchor. All test-pinned class names and IDs (`install-hero`, `install-block`, `home-mock`, `self-mark-btn`, `setupClaudeBtn`, `offboard-strip`, `home-getting-started`, `home-gs-item`, `home-overview`, `home-usage`) preserved as structural anchors; new visual language overlays via additional classes. Existing onboarded/not-onboarded branching, `/api/me/onboarded` POST, status frame gating, post-CTA modal, and OS-tab switching JS unchanged. Stray `~/FoundryAI` comment swapped for `~/{{ workspace_dir }}` to honor the vendor-agnostic OSS rule. 51 home tests pass without modification. * fix(web): /home palette inversion — dark intro hero on top, light setup card below Previous reskin commit kept the install-hero as a dark navy gradient and rendered the new intro hero as a light surface — opposite of what the CEO mock specifies. Playwright comparison vs `data/ceo_home.html` confirmed: - CEO mock: dark navy hero at TOP (with white pillars on navy), LIGHT white setup card BELOW with light step rows and dark code panels inset. - Previous: light intro hero on top, dark setup card below. Inverted. This patch flips both: - `.home-hero-intro` now: dark navy gradient `#0f1b3a → #1a2a5f`, green radial glow in the corner, green eyebrow, white H1 (`accent` span green), rgba-white lede, green pill primary CTA, translucent-white secondary CTA, pillars row separated by hairline border-top with green square-dot bullets in front of each pillar header. - `.install-hero` and `.install-block` now: white surface card with thin green accent strip across the top, light step rows split by hairline borders, green-tinted step-number circles (`#e6f9f0` bg, `#1f8a5e` ink), green progress chip + bar. Code panels (`.install-cmd`) and terminal frames stay dark — they're the "type this" surfaces. - All previously-rgba-white descendants of `.install-hero` (close button, eyebrow, h1, lead, links, code chips, OS tabs, install notes, setup-CTA button, self-mark fallback, auto-detect badge, terminal-howto disclosure) re-skinned for light surface. All 12 home page tests still pass (no markup changes, only CSS). * fix(web): /home parity polish — system font + mock sizes + blue info hint + gray step-num After v2 palette flip, user comparison vs CEO mock surfaced three remaining gaps in the wizard area: - Font stack mismatch: Agnes inherits Inter via `style-custom.css`, but the CEO mock uses the platform system stack (San Francisco on macOS, Segoe UI on Windows). The rendered weight/letterforms read noticeably different. `.home-mock` now declares `-apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif` for itself and all descendants, with the monospace stack reserved for `code`/`kbd`/`pre`, `.install-cmd`, and `.terminal-body`. - Step number badges were green-tinted; mock uses neutral gray (`#f0f2f6` bg, `#4a5168` ink) — green is reserved for the "done" state. Switched to `--hp-surface-dim` + `--hp-text-secondary`. - "Don't have a terminal open?" disclosure was an amber/yellow variant left over from the old dark-hero palette. Mock uses a blue info-hint vocabulary (`--info-bg: #eef3ff`, `--info-line: #4f7cf2`, `--info-ink: #1c3994`) with white kbd chips. Added the info-* tokens to the `:root` block and re-skinned `details.terminal-howto` (incl. summary, body, kbd) to match. Step-body type sizes also brought in line with the mock spec — `.install-block .label` (step h3 equivalent) is now 17px / 700 with 6px gap; `.install-note` body type is 14px / 1.55. `--hp-info-bg / --hp-info-ink / --hp-info-line / --hp-warn-bg / --hp-warn-ink / --hp-warn-line / --hp-surface-dim` added as first-class tokens so future hint/warn callouts pick the same colors without a duplicate vocabulary. 12/12 home tests pass. * feat(web): centralize design tokens + reword /home wizard to 6 steps (CEO mock parity) Two intertwined changes that touch both global design + /home structure: GLOBAL TOKEN SHIFT (app/web/static/style-custom.css) - `--primary` flipped from blue `#0073D1` to green `#2ea877` — same brand alias the rest of the app referenced, so every page picks up the new accent automatically. Old `--primary-dark` / `--primary-light` recolored to match. - New tokens added: `--brand-accent`, `--hero-bg`, `--hero-ink`, `--surface-dim`, `--info-bg/ink/line`, `--warn-bg/ink/line`. Brings the global vocabulary in line with the CEO mock's `:root` block so callouts and hero surfaces don't have to invent local tokens. - `--font-primary` switched from Inter-led stack to the system stack (`-apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Inter", system-ui, sans-serif`) so weight/letterforms render identically on macOS (San Francisco) and Windows (Segoe UI) — matches the mock and avoids a font-loading flash for analysts without Inter installed. - Shadow tints re-cast in navy `rgba(15,27,58,...)`; focus ring uses the new green `rgba(46,168,119,0.25)`. - `.app-nav-link` font-size 13px → 14px, padding 6px 12px → 8px 14px, hover bg → `--primary-light` (mint), color → `--primary-dark`. `.app-nav-menu-item.is-active` re-tinted to the same green system. - Sweep across 26 templates (style-custom.css + 25 template files) replacing every hardcoded `#0073D1` / `#005BA3` / `#E6F3FC` / `rgba(0,115,209,…)` / `rgba(0,86,163,…)` with token references or the new green hexes — 175 occurrences total. Pages that styled their own buttons / borders / shadows pick up the new brand color without per-page overrides. /HOME WIZARD: 6 STEPS PER MOCK (app/web/templates/home_not_onboarded.html) - Step 1 reworded `Install Claude Code on your computer` + `~3 min` subhead (mock copy). - Step 2 renamed `Pick a folder for {{ instance_brand }}` (was `create your workspace folder`) — same `mkdir` command, mock-aligned framing. - NEW Step 3 `Open a terminal inside that folder` — no shell command, just the "you are standing in the right directory" reassurance with a Finder/PowerShell/file-manager howto disclosure. Mirrors the CEO mock's Step 3. - Step 4 (was Step 3, gated by `home_automode.show`) renamed `Launch Claude with auto-approve on`. Body copy lightly updated so it references "the next step" instead of "Step 4". - Step 5 (was Step 4) renamed `Get the install script and paste it into Claude`. The setup-cta-lead now explicitly says "pasting the script into Claude Code will install {{ instance_brand }}…" so existing test assertions pinning the `install Agnes` substring still match. - NEW Step 6 `Optional: create a one-word shortcut for next time` — prints an `echo 'alias {{workspace_dir\|lower}}=…' >> ~/.zshrc` one-liner for Unix and an `Add-Content $PROFILE …` equivalent for Windows. OS tabs + copy buttons reuse the existing wizard chrome. - Progress chip dynamic: `Step 1 of 6` when home_automode is on, `Step 1 of 5` when off. Progress bar fill `100 // total_steps` so the bar sits at 16-20 % on first paint. - `.step-lede` token added for the new short body copy beneath each step label (14.5px / ink-soft). - `macOS / Linux / WSL` tab labels changed to `macOS / Linux` per user instruction. Terminal-howto `WSL:` paragraph dropped; the paste-shortcut hint now reads `(Linux)` instead of `(Linux/WSL)`. Functional WSL handling in `connector_prompts.py` (it's a Linux detection fallback, not user-facing label) preserved. - `setup_instructions.py` Claude Code install hint: `npm (Linux / WSL)` → `npm (Linux)`. SURFACES — 4 CARDS PER MOCK - Replaced the 3-tile `.home-usage-grid` with a 4-card grid: - VS Code (Recommended) — `.surface-card.feature`, green ring, DAILY USE eyebrow + 5-step numbered list + `Open VS Code setup guide →` link to `/setup-advanced#vscode`. - Terminal — QUICK ACCESS eyebrow + 4-step list. - Claude Code (Desktop app) — CONNECT IT eyebrow + 4-step list. - Cowork (claude.ai) — `.surface-card.incomplete`, warn-tinted border + `Instructions needed` badge + a TODO callout describing the missing content. The card is intentionally honest about the gap rather than hiding it. TEST UPDATES - `test_web_home_page.py` negative onboarded-state assertions rebased on the new step labels (6 entries instead of 4). - `test_home_route_resolution.py` `test_home_renders_automode_block_by_default` + its `_when_env_off` counterpart now check the new `Step 4 — Launch Claude with auto-approve on` label. * fix(web): /home section content + layout — verbatim mock match User comparison flagged several remaining gaps; this patch rewrites the three lower sections of /home to match the CEO mock spec exactly: FIRST-SESSION (5 beats) - h2 28px / 700 / -.5px tracking (was 19px / 600). - lede 18px ink-soft (was 13.5px secondary). - `.session-walk` wrapper, 36px gap between beats (mock spec). - `.session-step` grid 48px / 1fr, gap 22px — number circle on the left, content on the right. - `.session-num` 40 × 40 circle with SOLID GREEN bg (`--primary`) and WHITE text + soft green shadow (was 28px mint pill w/ dark-green text). - `.session-content h3` 18px / 600 (was 14.5px / 600). - `.session-content > p` 15px. - `.session-content .annotation` 13.5px ink-muted body type with `strong` for highlighting (replaces the upper-case "WHAT'S HAPPENING" eyebrow pattern that didn't match the mock). - `.session-intro` callout card (white surface + mint icon block) framing the "five beats" tagline. - `.session-tldr` summary box (brand-light bg + brand-dark left border) wrapping up the loop. - Terminal frames re-skinned: `#0c1224` body / `#182241` bar / real macOS traffic-light colors `#ff5f57` / `#febc2e` / `#28c840`. - Terminal body 13px / 1.65 line-height with mock-spec class vocabulary: `.you` (yellow input), `.ai-name` (brand bold), `.path` (light blue), `.dim` (translucent code-ink), `.caret` (blinking cursor). - Five beats rewritten with mock's exact narrative flow (launch → menu → pick → ask → close), vendor-agnostic project names (`RevenueAnalysis`, `Onboarding`, etc.) replacing the customer- specific `GRPN_` examples in the mock. Templated `{{ instance_brand }}` / `{{ workspace_dir }}` / `{{ workspace_dir \| lower }}` (the shortcut alias) everywhere. SURFACES (4 cards) - The section is no longer wrapped in a white rectangle; the `.home-usage` class loses its bg + border + padding (mock has the cards directly on the page bg). - h2 28px (was 22px). Eyebrow 12px / 1.5px tracking / brand-dark. - `.surface-card.feature` (VS Code) now uses 2px green border + vertical brand-light → white gradient (was 1px ring). - `.surface-card.incomplete` (Cowork) uses 2px red border (`#e35e5e`) + vertical red-tint → white gradient (was yellow flat bg). - `.surface-card .steps` panel: inner surface-dim bg + 8px radius + 13px font. - `.surface-foot` top-border + ink-muted (mock spec). - `.badge-warn` now a solid red box (`#e35e5e` bg + white ink + 4px radius) instead of a yellow pill, matching the mock. - Header layout fixed: the global absorbed `header { display: flex; justify-content: space-between }` rule was making the h2 sit on the right of the eyebrow; explicit `display: block` override on `.home-mock section > header` puts the title on the LEFT under the eyebrow as the mock has. BROWSE — Explore your workspace - Wrapped in `<section class="browse-section">` with proper eyebrow + h2 + lede (was a bare `.section-label` div). - `.browse-grid` 5-col grid (was responsive auto-fill, 4-card layout). Skills tile added as a 5th card linking to `/marketplace?type=skills`. - `.browse-card` mock-spec: 22 20 padding, 28px icon, 15px title, 12.5px ink-muted desc, hover lifts -2px with brand border + shadow-md. Section wrappers (`.home-usage`, `.first-session`) no longer carry the white card chrome — they sit directly on the page bg, matching the mock. Only Getting Started + Overview keep their white cards. GLOBAL eyebrow vocabulary (`.home-hero-intro .eyebrow`, `.first-session > .eyebrow`, `.surfaces > header .eyebrow`, `.browse-section .eyebrow`) all aligned to mock spec: 12px / 700 / 1.5px tracking / brand-dark color / 14px bottom margin. Hero h1 bumped to 44px / 800 / -1px tracking (was 32px / 600). 51/51 home tests pass. fix(web): /home session-intro card + terminal-body verbatim mock match User comparison flagged three remaining /home gaps; this patch addresses each: - `.session-intro` rule was missing — the "five beats" tagline rendered as a bare line with no card chrome. Added the mock- spec card: white surface, 14px radius, 20×24 padding, 1px border + shadow-sm, with a 44×44 brand-light icon block on the left. - Beat 1 terminal-title was `~/{{ workspace_dir }} — zsh` (mock- style shell-pwd format), but the user wants every terminal frame across all 5 beats to read `claude — {{ instance_brand }}`. Updated. - Terminal-body line structure for beats 2-5 rewritten verbatim from the CEO mock: - `<span class="prompt">></span><span class="you">…</span>` now has no space between the prompt and user input (mock pattern: zero gap, the .prompt's `margin-right: 8px` provides the visual separation). - Beat 2 menu items use `<strong>[N]</strong>` numbering with project entries on indented lines, each project name followed by a `<span class="dim">(N ago)</span>` timestamp at a fixed column — instead of my prior single-line concatenation. - Beat 3 narrative split into 4 stanzas separated by blank lines (matches mock): the "Switched to <strong>X</strong>" status, then dim Loaded/Last-session lines, then a stand-alone "One unprocessed input detected:" pair, then the "Want me to process …" question. My prior version dim-wrapped the entire block, which looked off. - Beat 4 narrative split into headline summary + risks section with <strong> heads + bullet lists separated by blank lines, matching the mock's "Q1 close summary" / "Open risks" rhythm. The Q1 question carries the mock's manual line-break + 2- space continuation indent inside the `.you` span — without that, terminal-body's `white-space: pre-wrap` would auto-wrap awkwardly at a different column than the mock. - Beat 5 exit narrative uses two separate dim lines + a standalone `.ai-name` "See you next time." line, then prompt + caret. My prior version collapsed everything into one dim block. - Project names changed from customer-specific (`GRPN_`) to generic (RevenueAnalysis, WeeklyReview, Onboarding, OpsDb, HRHandShake) so the OSS distribution stays vendor-agnostic per CLAUDE.md. - `Marketing plan` examples replaced with `Q1 close` so the narrative stays plausible for an analyst audience. 12/12 home tests pass. fix(web): /home surfaces verbatim mock — VS Code thumb, Terminal expected-output, NEW badge User comparison flagged three remaining surface-section gaps: - VS Code surface card was rendering a generic "Screenshot pending" placeholder; the mock has a labeled inline mockup (`<a class="vscode-thumb">` w/ `.thumb-fallback`) showing the recommended 4-pane layout (EXPLORER yellow, TERMINAL 1 purple, TERMINAL 2 green, TERMINAL 3 orange) on a dark navy bg + a "Recommended layout" caption pill. CSS `.vscode-thumb` block added — uses gradient-strip backgrounds to draw the colored panel bars without needing a base64 image. - "Recommended" badge was a pill (999px radius) with `--brand-accent` bg + navy text. Mock uses `.badge` instead of `.recommend-pill` — solid `--primary` (brand-dark green) bg with WHITE text and 4px radius. Replaced the class + CSS rule so the badge reads as a tag, not a pill. - Terminal surface card was missing the "What you should see" subsection — mock has an `.expected-output` block showing a sample of the welcome menu inside a dim dashed panel. Added the block with the mock's exact rendered output (templated to `{{ instance_brand }}` + generic project names instead of customer-specific GRPN entries) plus the `.expected-output` CSS (surface-dim bg + dashed border + `::before` "WHAT YOU SHOULD SEE" eyebrow per mock spec). Also addressed the explore-section feedback: - Skills browse-card now carries the `new` class so it picks up the `.browse-card.new::after` corner badge ("NEW", green bg, white text, 10px / 700 / 0.5px tracking) per mock. - Browse cards align same height via `align-self: stretch` (grid default) + `flex-grow: 1` on `.browse-desc` so descriptions fill remaining vertical space; previously the Skills tile sat shorter because its desc text was longer than others'. Structural HTML changes to all four surface cards: dropped the inner `<div class="surface-card-head">` wrapper + `<p class="surface-pitch">` class in favor of mock's flat layout (`.what` + `.steps` + `.when-to-use`). `<ol class="surface-steps">` replaced with `<div class="steps"><strong class="steps-eyebrow">DAILY USE / QUICK ACCESS / CONNECT IT</strong> <ol>...</ol></div>` so the eyebrow + numbered list share the mock's tinted surface-dim panel. 12/12 home tests pass. * fix(web): align /home setup walkthrough to design spec - Setup-section header (eyebrow + heading + lede) floats above the install hero; install card has no accent strip; step labels drop `Step N —` prefix; closing strip is single flex row. - VS Code surface card renders recommended-layout screenshot from `/static/img/vscode-layout.png` with click-to-enlarge lightbox. - Workspace install path cascades to `~/Desktop/{workspace_dir}` in every step, surface card, first-session annotation, and shortcut. - Step 1 verify text restores Enterprise — Finance and Legal option. - Step 6 shortcut installs a shell function with arg forwarding (`"$@"` unix / `@args` windows) and a user-facing Auto / YOLO permission-mode toggle. - Step 5 manual-fallback details inline on the CTA row; description reads at step-lede size, not 13px chip. - Setup-section heading no longer right-aligns (was inheriting `header { display: flex; justify-content: space-between }` from the legacy stylesheet; wrapper changed to `<div>`). - Getting Started `<details>` block removed (duplicated links). * test(web): align /home tests with restructured setup wizard - Replace test_getting_started_card_renders_on_home with test_setup_section_renders_for_not_onboarded — asserts the new setup-section-header floats above the install hero and Getting Started markup is absent (block removed in the prior commit). - Update automode-block test to match labels without the `Step N —` prefix. - Update setup-CTA partial test to match the relabeled "Copy install script to clipboard" button. Drop orphaned CSS for `.home-getting-started`, `.home-gs-summary`, and `.home-gs-item` — selectors had no matching markup after the Getting Started block was removed. Also: Step 3 `pwd` expected-output uses an absolute path (`/Users/yourname/Desktop/{workspace_dir}`) instead of the tilde-prefixed form, matching what the command actually prints. fix(web): repaint home_onboarded + setup_advanced; align CTA label - home_onboarded + setup_advanced still carried the retired blue `#0056A3` as both `--hp-primary-dark` and the hero gradient endpoint. Both reference `var(--primary-dark)` now so the green palette cascades. - setup_advanced YOLO snippet was the old `alias` form (no cd, no arg forwarding). Replaced with the shell function variant from /home Step 6 — drops into ~/Desktop/{workspace_dir} and forwards "\$@" (unix) / @args (Windows). - setup_advanced ~/{workspace_dir} path references cascaded to ~/Desktop/{workspace_dir} so install story matches /home. - Dashboard's "Setup a new Claude Code" button label aligned to the canonical "Copy install script to clipboard" — matches /home and the new docstring in _claude_setup_cta.jinja, which now mandates this label across consumers. * fix(web): keep base brand blue; scope green palette to /home redesign User noticed login + dashboard had turned green when the /home redesign flipped --primary from blue (#0073D1) to green (#2ea877) in commit 278f202e. The brand-wide flip went further than the redesign needed — only /home, /home (onboarded), and /setup-advanced intentionally use the green/navy spec; every other page (login, dashboard, catalog, marketplace, admin, profile) was just inheriting the green because --primary cascaded everywhere. Revert the global brand colour to blue and lock the green into the two outstanding redesign scopes: - style-custom.css: --primary back to #0073D1, --primary-light back to rgba(0,115,209,0.1), --primary-dark back to #005BA3, --brand-accent back to a lighter blue. - home_onboarded.html: .home-mock now sets --hp-primary, --hp-primary-dark, --hp-primary-light to explicit green hex (matching home_not_onboarded), so the hero stays green regardless of the global brand. - setup_advanced.html: same lock — .advanced-mock pins the green palette in-scope. Hero gradients on both pages now reference the local --hp-primary chain (not the global --primary), so any future palette tweak inside either scope cascades correctly without disturbing the rest of the app. * refactor(web): hoist --hp-* into shared design-tokens.css (--ds-) PR 2 of the design-system extraction ladder. Pure mechanical rename + dedup; no visual diff on any rendered page (verified on /home, /dashboard). - New app/web/static/css/design-tokens.css declares the full token set on :root: brand surface (green primary, primary-dark, mint light, brand-accent), hero (navy bg + ink), code-panel (near-black bg + cool ink + warm-yellow), light surfaces (bg/surface/border), text (primary/secondary/muted), orange accent, info + warn callout vocabularies, navy-tinted elevation shadows, system font stack + mono. - base.html loads it alongside style-custom.css so the tokens are globally available. - Rename --hp- -> --ds-* in home_not_onboarded (313 refs), home_onboarded (15), setup_advanced (39). 367 token references pointed at one of three local blocks; now all point at the global :root. - Drop the three local token blocks. Each scope class (.home-mock / .advanced-mock) only keeps its base ink + font-size + line-height rules. The legacy --primary family stays canonical for the blue base brand — login, dashboard, catalog, marketplace, admin still read blue. The design system is opt-in via the scope class. * refactor(web): extract shared components.css; migrate /home markup PR 3 of the design-system extraction ladder. First batch of reusable components lifted out of home_not_onboarded.html into a new shared stylesheet; markup migrated to consume them. - New app/web/static/css/components.css with five components, all reusable on any page that loads design-tokens.css: .callout-rec — amber lightbulb recommendation box .callout-hint — blue info hint box .code-output — "WHAT YOU SHOULD SEE" terminal output block .lightbox — full-bleed image enlarge overlay .setup-section-header — wizard header (eyebrow + h2 + lede) - base.html loads components.css after design-tokens.css. - home_not_onboarded.html markup renamed: class="rec" -> class="callout-rec" class="hint" -> class="callout-hint" class="expected-output" -> class="code-output" - Local CSS rules removed from home_not_onboarded.html for each of the extracted components — ~150 lines down to 5-line "extracted to components.css" comments. The bespoke wizard-specific styles (.install-cmd, .os-tabs, .mode-tabs, .terminal-frame) stay template-local for now since they only have one consumer. Visual regression check: /home install hero renders the amber rec callout, blue hint callout, dashed code-output block, green section header, and click-to-enlarge VS Code thumb identically to the pre-extraction render. 43 home tests pass. * fix(web): unify page-headers — activity-center full-width, marketplace shares box - /activity-center audit-log hero rendered as half-width because the _page_hero include was inside <header class="obs-topbar">, a flex row that pinned the time-range + auto-refresh controls next to it. The hero is now a sibling rendered before the <header>, so it spans the full container width like every other admin page; the controls keep their flex row underneath. - Marketplace hero unified with .page-header--hero. Markup is now <section class="page-header page-header--hero mp-hero"> so the shared box drives padding/radius/gradient/max-width/shadow; the .mp-hero override block only carries the right-anchored cover image and the rules for the search row + scope checkboxes (which the canonical hero doesn't have). Inner text uses the canonical .page-header__eyebrow / __title / __subtitle classes. - .page-header--hero shadow tint now follows the brand blue (rgba(0, 115, 209, 0.2)) instead of the leftover green from the prior palette flip; same depth highlight everywhere the gradient is blue. * fix(web): unify remaining page heroes — admin, profile, install, store, stack Sweep across pages that carried bespoke gradient hero markup so every page-hero shares the canonical `.page-header--hero` dimensions (padding 28/32/24, border-radius 14, max-width var(--width-app), navy-tinted shadow, gradient with --primary → --primary-dark). Inner text uses the .page-header__eyebrow / __title / __subtitle classes so typography matches across the app. - admin_tables: migrated to _page_hero.html include. - admin_tokens: kept .tokens-hero wrapper for the counts-chip row but added the canonical class on the same element; stripped duplicate gradient + padding + typography rules. - install: same pattern (kept hero-meta pill row). - profile: migrated to _page_hero.html include. - store_upload: kept .upload-hero wrapper for the .meta chip row; composite class with the canonical hero. - setup_advanced: .advanced-mock .ad-hero now matches canonical dimensions; green palette retained via --ds-primary/dark. - stack_card.css: .stack-hero (catalog + corporate-memory search hero) uses canonical gradient + padding + max-width. The detail-page heroes (marketplace_plugin_detail, marketplace_item_detail, catalog__detail, store_edit, admin_group_detail, admin_store_submission_detail) stay bespoke for now — they're rich detail headers with photos, badges, install actions; converting them would lose contract context. Same applies to dashboard.html env-setup-cta (it's a CTA card, not a page hero). fix(web): canonicalise .container — single page shell every page inherits Previously each admin page set its own `.container:has(.<page>) {max-width: none}` + `.<page>-page {max-width: 1400px}` override, and per-page hero markup either nested inside flex toolbars (which pinned the hero next to filter controls and squeezed it half-width) or self-constrained with a different max-width than the page. /home, /dashboard, /marketplace, and /admin/* ended up at different widths with different nav-to-hero gaps. - style-custom.css `.container` now carries the canonical 1280px max-width + `16px 32px 48px` padding so every page inherits the same nav-to-hero gap and side gutters. `.container > main` is margin/padding 0 so the container is the sole owner of gutters. - `.page-header--hero` drops its self-constraining max-width and auto-centering margin — the container provides the width, so the hero sits flush with the table/toolbar below it. - `.stack-hero` (catalog + corporate-memory) and `.advanced-mock .ad-hero` (/setup-advanced) follow the same pattern: container owns the width. - Per-page max-width overrides stripped from admin_users, admin_access, admin_groups, admin_marketplaces, admin_welcome, admin_workspace_prompt. - _page_hero include extracted from inside flex toolbars on admin_users, admin_access, admin_groups, admin_marketplaces, admin_server_config, admin_welcome, admin_workspace_prompt, admin_sessions, admin_session_detail, admin_usage, activity_center. The toolbar (`.users-toolbar`, `.gp-toolbar`, etc.) keeps only the filter + action controls; hero renders before it as a sibling. - _page_chrome.html trimmed to just the page-background tint for the redesign scopes; the duplicate `.container` rules it carried are now redundant. Verified: /home, /admin/marketplaces, /admin/users all render container width 1280px with hero top at 88px (16px below the 72px-tall sticky nav). Same spacing as /home design spec. * fix(web): admin_tables + admin_corporate_memory inherit canonical .container Both pages were overriding `{% block layout %}` from base.html, which bypasses the canonical `.container` wrapper. Result: hero span the full viewport (1596px on a wide screen) while the inner content sat at a narrower max-width — hero and content didn't align, and the nav-to-hero gap differed from every other admin page. Switched both templates to `{% block content %}` so they render inside the canonical `.container` from base.html — same path as admin_groups, admin_users, admin_marketplaces, etc. - admin_tables: dropped local `.page-title { max-width: 1600px }` + `.content { max-width: 1600px }` overrides (kept typography + inner gutter rules) and the mobile padding overrides that paired with them. Container now owns the gutters. - admin_corporate_memory: only the block keyword needed changing; the template already had a clean inner structure (no max-width override on `.container-memory`). Verified on /admin/tables and /admin/corporate-memory: - .container width 1280, padding 16/32/48 - Hero top 88 (nav 72 + container padding-top 16) - Hero + content both 1216px wide, both at left 190 — perfect alignment with /admin/groups. * fix(web): drop .page-shell padding override + admin_tables stale :root Two regressions discovered after the canonical-container unification: 1. `.container:has(.page-shell)` still set `padding: 28px 32px 48px` while the canonical `.container` had moved to `16px 32px 48px`. Every page-shell consumer (/admin/sessions, /admin/sessions/<id>, /admin/usage, /marketplace, /dashboard, marketplace detail pages, /me/activity, /store/, /admin/store-submissions) was rendering with a 28px nav-to-hero gap while /admin/users + /admin/groups rendered with 16px. Same width, mismatched vertical rhythm. The opt-in rule is now a no-op marker: canonical container already provides 1280px + 16/32/48 + main margin/padding 0. 2. admin_tables.html had a stale `<style>` block that re-declared `:root { --primary: var(--primary); ... }`. The self-referential token resolved to empty, collapsing the page-header hero's `linear-gradient(135deg, var(--primary), var(--primary-dark))` to no background — the hero appeared as a pale ghost without colour. The entire shadow `:root` block was a stale copy of the design tokens that style-custom.css already provides. Dropped it; tokens now resolve from the global `:root`. After both fixes /admin/sessions, /admin/tables, and every other page-shell consumer match /admin/groups exactly: container 1280px, container padding-top 16px, hero at top 88px / left 190px / width 1216px. fix(web): drop /admin/tokens .tokens-page width + padding override `.tokens-page` carried its own `max-width: 1280px; margin: 0 auto; padding: 28px 8px 48px` block — the canonical `.container` already provides width + 16/32/48 padding, so the nested wrapper was adding 28px on top of the container's 16px (= 44px nav-to-hero gap, vs 16px on every other admin page) and shrinking the hero sideways by 8px on each side (1200px vs the canonical 1216px). After: container owns the layout; `.tokens-page` is just a font-family scope. /admin/tokens hero now sits at top 88, left 190, width 1216 — same numbers as /admin/groups / /admin/users. * fix(web): hero links readable on blue; /admin/access Groups link href - New `.page-header--hero a` rule in style-custom.css forces any anchor inside a gradient hero to render white + underlined so links stay readable on the blue background. Previously links inherited the global `var(--primary)` blue, which disappeared on top of the matching blue gradient. No per-page class needed — drop a plain `<a>` in any hero subtitle and it just works. - /admin/access hero subtitle was Jinja-passing the inline link with HTML-entity-encoded quotes (`href="..."`). The entities decoded to literal `"` characters inside the rendered href, producing `/admin/%22/admin/groups%22` — a 404. Switched the `set` to a block-set (`{% set page_hero_subtitle %}...{% endset %}`) so the inline `<a href="/admin/groups">Groups</a>` survives unescaped through `_page_hero.html`. Also stripped the now-redundant inline `style="color:#fff;text-decoration:underline;"` — the new shared rule handles it. * fix(web): /dashboard top padding matches every other page `.main` on /dashboard had `padding: 28px 32px 48px` while every other page now uses `16px 32px 48px` via the canonical `.container`. Dashboard bypasses `.container` (overrides base.html's `layout` block to render a full-width `<main>` directly), so the padding lives on `.main` itself — bumped the top to 16px to match. After: first child top = 88, left = 190, width = 1216 — same numbers as /admin/groups / /admin/users / /admin/marketplaces. * fix(web): green eyebrow + white title on .page-header--hero (matches /home) `.page-header--hero .page-header__eyebrow` was faint white (rgba(255,255,255,0.75)) — readable but unbranded against the blue gradient. Changed to `var(--ds-brand-accent)` (mint green #54d3a0) so every page hero pairs a green eyebrow with white title + subtitle, echoing /home's setup-section header (green eyebrow, dark heading combo). One CSS rule applies everywhere — no per-page styling needed. Also bumped the eyebrow to font-weight 700 / letter-spacing 1.2px so the green stands out cleanly against the gradient. * fix(web): page-header--hero + stack-hero use /home navy gradient `.page-header--hero` and `.stack-hero` were on the brand-blue gradient (`var(--primary)` → `var(--primary-dark)`) while /home's hero (`.home-hero-intro`) sits on the deeper navy gradient (`#0f1b3a` → `#1a2a5f`). Every other page-hero now uses that same navy gradient so /home, /marketplace, /catalog, /corporate-memory, /admin/, /profile, /install, /dashboard, /setup-advanced share one brand surface. Shadow tint adjusted to the navy depth (rgba(15, 27, 58, 0.22)). Brand blue stays the link/CTA colour everywhere else; only the hero box itself is navy. fix(web): primary buttons green; marketplace tabs navy translucent Two parity tweaks pulling the rest of the app toward /home's visual language. - `.btn-primary` (both rules in style-custom.css) now uses `var(--ds-primary)` / `var(--ds-primary-dark)` green fill, matching the "Copy install script to clipboard" button on /home. Brand-blue `--primary` still drives link colour and the accent surface; only the filled button background flipped to green. Every page with a `.btn-primary` (admin "+Add user", "+Add marketplace", catalog, marketplace actions, dashboard, modals) now reads as the same "do it" affordance. - `.mp-tabs` (Curated Marketplace / Flea Market / My Stack tab group) now sits on the navy `--ds-hero-bg` with translucent white pills (rgba(255,255,255,0.10) inactive, 0.18 active) — same translucent-white-on-navy treatment as the "Just browse — no install needed" pill on /home. Icons render as soft white; per-tab colour-coding dropped in favour of the unified surface. * fix(web): catalog/memory tabs + empty-state CTA + admin action buttons Bring /catalog and /memory in line with /home + /marketplace: - `.stack-tabs` (Browse / My Stack / Recipes on /catalog, Browse / My Stack on /memory) now uses the navy `--ds-hero-bg` container with translucent-white-on-navy pills, mirroring the `.mp-tabs` treatment and /home's "Just browse — no install needed" CTA pill. Per-tab icon colour-coding dropped — icons render as soft white on the navy fill. - `.stack-tabs-row__actions .btn` (right-slot "+New Recipe", "+New Data Package" admin CTAs) now uses green primary fill (`--ds-primary`), matching `.btn-primary` and /home's "Copy install script to clipboard" button. - `.stack-empty .cta a` (empty-state action button — the "Open /admin/tables →" CTA on /catalog and equivalent on /memory) flipped from blue `--primary` to green `--ds-primary` so the colour aligns with every other primary button in the app. * fix(web): marketplace Search button green (--ds-primary) matching other CTAs * fix(web): unify Search button + admin-action button across browse pages - Added Search button (`<button class="stack-hero__search-btn">`) to /catalog and /memory heroes — same green pill as /marketplace. Wired to the existing live-filter pipeline (button click runs `applyFilters()` and refocuses the input). All three browse pages now wear the identical search bar UI. - `.stack-hero__search-btn` shares `--ds-primary` fill with `.mp-hero .search-btn`. - `.mp-actions .btn` ("Submit a skill or plugin" CTA on /marketplace) flipped from the legacy blue-outline to the same green primary fill + dimensions (`display: inline-flex; line-height: 1; padding: 9px 16px; gap: 6px`) as `.stack-tabs-row__actions .btn` on /catalog and /memory. All three right-slot action buttons render at identical height now. - `.stack-tabs-row__actions .btn` got `inline-flex` + `line-height: 1` + `gap: 6px` so a `<button class="btn">` and a `<a class="btn">` both render at exactly 33px high — the embedded `.admin-only-hint` chip no longer pushes one variant taller than the other. * fix(web): marketplace guide CTAs green (fastpath + primary); drop flea purple * fix(web): dashboard CTA hero on navy; readable <code> chips in hero - `.env-setup-cta` on /dashboard ("Set up a new Claude Code" card) flipped from the brand-blue gradient + green-tinted shadow to the canonical navy gradient (`--ds-hero-bg` → `#1a2a5f`) with navy-tinted shadow + 14px radius + 28/32/24 padding, matching `.page-header--hero` and /home's `.home-hero-intro`. Dashboard's top CTA now sits on the same brand surface as every other hero. - Added `.page-header--hero code` rule — translucent white pill + warm-yellow ink (#ffd866) so `<code>` chips embedded in hero subtitles read as code samples against the navy gradient. The global `code` rule sets `color: var(--text-primary)` (dark), which turned in-hero chips into invisible dark-on-white-on-navy ghosts (e.g. the `-by-dev` suffix on /store/new). - /store/new's `.page-header__subtitle code` dropped its inline style override — the shared rule handles it now. * feat(web): two-theme switching via data-theme + admin toggle Introduces a theme system that flips the entire UI palette between "navy" (current design, default) and "blue" (pre-redesign palette) via a single `<html data-theme="...">` attribute. Page markup, class names, and component styles don't change — only the `--ds-` token values flip. Backend - New `app/instance_config.py::get_instance_theme()` resolves the active theme from `AGNES_INSTANCE_THEME` env > `instance.theme` in instance.yaml > default "navy". Unrecognised values clamp to "navy" so a typo doesn't break the page. - `app/web/router.py::_build_context` injects `instance_theme` alongside `instance_brand` etc. so every template inherits it. - `app/web/templates/base.html` renders `<html lang="en" data-theme="{{ instance_theme \| default('navy') }}">`. CSS - `app/web/static/css/design-tokens.css` adds two new tokens to the default `:root` set: `--ds-hero-shadow` (drop-shadow tint on hero boxes) and `--ds-hero-eyebrow` (eyebrow accent colour). Plus a `:root[data-theme="blue"]` override block that flips seven tokens: `--ds-primary`, `--ds-primary-dark`, `--ds-primary-light`, `--ds-brand-accent`, `--ds-hero-bg`, `--ds-hero-bg-deep`, `--ds-hero-shadow`, `--ds-hero-eyebrow`. The blue theme aliases the brand surface tokens back to the legacy `--primary` family. - `.page-header--hero`, `.stack-hero`, `.env-setup-cta`, `.home-mock .home-hero-intro` now reference the new `--ds-hero-shadow` and `--ds-hero-bg-deep` tokens instead of hard-coding `rgba(15, 27, 58, 0.22)` and `#1a2a5f` — gradient + shadow now flip with the theme. - `.page-header--hero .page-header__eyebrow` uses `var(--ds-hero-eyebrow)` so the eyebrow goes mint-green on navy and translucent-white on blue (mint on blue reads poorly). Admin - `app/api/admin.py::_KNOWN_FIELDS["instance"]` now registers a `theme` field of kind `select` with options `["navy", "blue"]` and a `hint` explaining the trade-off. The existing /admin/server-config UI auto-renders a select for this — no template changes needed. Defaults - Default value is "navy" so existing instances see no visual change. Admins flip to "blue" via /admin/server-config to restore the pre-redesign look. Restart note: uvicorn must reload to pick up the Python changes (new getter, new template-context key, new known-field). CSS changes hot-reload via browser refresh. fix(web): blue theme — home hero eyebrow + CTA contrast `.home-hero-intro .eyebrow` and `.btn-intro-primary` referenced `--ds-brand-accent` directly, which on the blue theme resolves to the lighter brand-accent blue (#4F9DEB). Result: light-blue eyebrow on the blue gradient ("WELCOME, ADMIN" barely readable) and a light-blue button with darker-blue text ("Set up in ~15 min") that all sat in the same hue range. Introduces three new theme-aware tokens: - `--ds-hero-eyebrow` already existed; blue theme bumped opacity to 0.92 so the eyebrow reads as full white. - `--ds-hero-cta-bg` + `--ds-hero-cta-fg` + `--ds-hero-cta-bg-hover` flip the primary hero CTA: mint-green on navy (default), white- on-blue under `data-theme="blue"`. `.home-hero-intro .eyebrow` now uses `--ds-hero-eyebrow` (mint on navy / white on blue) and `.btn-intro-primary` uses the CTA token trio. Recommended palette on blue theme: - Eyebrow: white at 92% opacity (clear on the blue gradient). - Primary CTA pill: white background, brand-blue dark text (`--primary-dark` = #005BA3) for AAA-level contrast. - Secondary CTA: translucent white pill (unchanged). * fix(web): blue theme — callout-hint info bg/border/ink re-tinted to brand blue (was indigo, clashed with brand-blue hero)	2026-05-21 06:19:16 +00:00
Vojtech	a694a30a5e	fix(store): surface review failures + harden publish gate (#316 ) * fix(store): surface review failures + harden publish gate Four independent fixes to the flea-market submission pipeline, all surfaced by an admin upload that landed at status='approved' without an LLM review. 1. LLM truncation no longer pins submissions in review_error. - Raised MAX_RESPONSE_TOKENS 2500 → 6000 in llm_review.py - Added one-shot retry-with-doubled-budget in anthropic_provider.py (capped at 4× initial) 2. Flea detail page surfaces the latest submission's failure verdict even when a previously-approved version is still serving (deferred-promotion path). The _quarantine_banner gate widened from `visibility != approved` to also fire on `blocked_inline / blocked_llm / review_error`, with copy that distinguishes the v2+ edit case ("Latest edit failed review — previously approved version (vN) keeps serving") from the initial-upload quarantine wording. 3. Restore button + endpoint no longer allow restoring a version that was never approved. Added StoreEntitiesRepository.get_with_version_approvals joining store_submissions, gated the UI button on submission_status in ('approved', None), rendered status pills for non-restorable rows, and added a 400 version_not_approved guard in POST /restore. 4. BREAKING (operator-facing): publish gate is now fail-CLOSED on misconfig. The previous get_guardrails_enabled() silently fell back to "disabled, auto-approve everything" when guardrails.enabled=true in YAML but no ANTHROPIC_API_KEY was in env. Split into: - get_guardrails_enabled() (intent — YAML) - get_guardrails_llm_provider_ready() (readiness — env) Three-state matrix: enabled=false → auto-approve (unchanged) enabled=true + ready=true → normal pipeline (unchanged) enabled=true + ready=false (NEW) → submissions hold at pending_llm awaiting admin retry or override (was: silent auto-approve) Admin "Retry review" eligibility broadened to include pending_llm. Boot-time WARNING banner surfaces the misconfig in app/main.py. docs/STORE_GUARDRAILS.md updated with the three-state matrix. Operators relying on the auto-fallback for local-dev no-LLM setups must now explicitly set `guardrails.enabled: false` in instance.yaml. Tests: 4623 passed. Added TestPublishGateFailClosed (4 tests) and TestRestoreVersion::test_restore_rejects_* (3 tests). conftest.py adds an autouse fixture defaulting guardrails OFF so legacy tests don't need to know about the new toggle. * fix(store): admin override promotes v2+ edits to current The override handler at app/api/admin.py:3708 only flipped submission status → 'overridden' and entity visibility → 'approved'. Under the v37+ deferred-promotion model that's insufficient for v2+ edits / restores: the new bundle sits in versions/v<N>/plugin/ and the entity row stays at the prior approved version_no + hash + on-disk live bundle. Installers kept getting the OLD bytes the admin had just intended to replace. Mirror the runner.run_llm_review auto-approval branch: look up the submission's version_hash in entity.version_history, and if its `n` differs from entity.version_no, promote_version + _swap_live_to_version. Initial v1 overrides are unaffected — the loop finds n=1 == version_no and skips promotion. Tests: - test_override_v2_edit_promotes_to_current: stage v1 approved + v2 blocked_llm; override the v2 sub; assert entity.version_no=2, entity.version flips off the v1 hash, and the live plugin/ dir mirrors versions/v2/plugin/. - test_override_v1_initial_upload_no_promote: regression guard so the promote loop doesn't accidentally bump a v1 override. Audit log gains a promoted_to_version_no field on the override action. * fix(store): retry/rescan review staged bundle; override forward-only Two adversarial-review findings from a Codex pass on the publish-gate work. C1. Admin retry + rescan were passing live `plugin/` to the LLM. For a v2+ submission held at `pending_llm` / `blocked_llm` / `review_error`, live still holds the prior approved version's bytes — so the LLM reviewed the WRONG bytes, and the runner's hash-match promotion in `run_llm_review` would then advance the entity to staged bytes that were never actually reviewed. Resolve the staged `<entity>/versions/v<N>/plugin/` from the submission's `version_history` entry, with a fall-back to live for legacy pre-v37 rows that never seeded a versions/ dir. Helpers `_submission_plugin_dir` and `_version_no_for_submission` added to `app/api/store.py` so override / retry / rescan share one path. H1. Override's promote loop used `target != current`, which would silently demote the live bundle when admin overrode a stale v2 submission while v3 was already approved + live. Changed to `target > current` so override flips status + visibility on the row regardless, but on-disk promotion only fires forward. Same `>` defensive guard applied in `runner.run_llm_review` so a late LLM verdict racing with a newer approval can't demote either. Tests: - TestAdminRetryReviewsStagedBundle::test_retry_v2_blocked_passes_staged_dir_not_live - TestAdminRetryReviewsStagedBundle::test_rescan_v2_blocked_passes_staged_dir_not_live - TestOverrideForwardOnly::test_override_stale_v2_does_not_demote_when_v3_current * review polish: CHANGELOG drift, override eligibility, defensive copy Three small additions on top of the retry/rescan staged-bundle fix: 1. CHANGELOG: the PR's bullets had drifted into the released [0.54.17] section during rebase (context-match landed them next to already-released content). Moved them up to [Unreleased] where they belong; [0.54.17] now holds only what was actually released (refresh-marketplace ls-remote, /me/activity hero, CI sharding + workflow polish). 2. app/api/admin.py: admin override eligibility now accepts pending_llm alongside blocked_inline + blocked_llm + review_error. Closes a UX gap from the new fail-CLOSED behavior: under enabled-but-not-ready, a known-good submission would otherwise sit indefinitely until the admin set credentials AND clicked Retry. Override already routes through version_history (and is now forward-only on promote), so it stays safe for v2+ deferred- promotion submissions. 3. src/repositories/store_entities.py: get_with_version_approvals defensively copies each version_history entry before annotating with submission_status. self.get() re-parses JSON each call today so this is belt-and-suspenders against any future caching layer leaking the annotated key into a subsequent plain get() call. Tests: 112 passed (focused on test_store_entity_versions + test_admin_store_submissions, covering the retry/rescan staged- bundle fix the author shipped + this polish). --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-15 15:52:07 +02:00
Vojtech	37ad39c8a3	feat(home): status frame on /home (operator-gated, onboarded-only) (#297 ) * feat(home): status frame on /home — last sync, sessions, prompts, tokens, projects Adds the homepage status frame: a 5-card row above the install-hero / offboard-strip on /home showing the calling user's Last sync (their last `agnes pull`), Sessions, Prompts, Tokens used, and Projects worked on, with a 24h/7d pill toggle. Backed by `GET /api/me/home-stats?window=` (one DuckDB CTE joining `users` + `usage_session_summary` + `usage_events`) and SSR'd from the same `compute_home_stats` helper on initial paint so there's no spinner. The window toggle is the only JS-driven path. Side surfaces: - `GET /api/sync/manifest` now stamps `users.last_pull_at` so `agnes pull` (and the Claude Code SessionStart hook that wraps it) imprints the analyst's last sync time for the new card. - `usage_session_summary` gains four BIGINT token counters (input_tokens, output_tokens, cache_read_tokens, cache_creation_tokens) summed from JSONL `message.usage.` per assistant turn. - `USAGE_PROCESSOR_VERSION` bumps 1 → 2 so the session-pipeline reprocess loop invalidates stale summaries and backfills tokens on the next tick. Schema migration v43 → v44 is idempotent ALTERs (last_pull_at + 4 token columns) — fresh installs receive them from `_SYSTEM_SCHEMA`, upgrade path runs `_v43_to_v44`. Defaults (NULL / 0) backfill existing rows cleanly. 9 new tests in tests/test_home_stats.py cover the migration, endpoint shapes (24h/7d/unknown/empty/missing-user), and the manifest-side last_pull_at bump. docs(CHANGELOG): homepage status frame entries under [Unreleased] The post-rebase release-cut now belongs to whichever PR lands next after main rolled to 0.54.9. This PR logs its bullets under [Unreleased] (Added: homepage status frame, per-user pull tracking, token counters; Changed: schema v43 → v44 migration) so they ride out with the next release-cut. * fix(tests): bump test_schema_v42_migration asserts to v44 CI failed because tests/test_schema_v42_migration.py hardcoded `assert SCHEMA_VERSION == 43` and `assert v == 43` after init. v44 (homepage stats frame backing columns) was introduced in the preceding feat commit; this aligns the existing v42-era migration tests with the new schema version. * feat(home): gate status frame on operator flag + user.onboarded Two gates on the homepage status frame: 1. Operator master switch — `get_home_status_frame_visibility()` in app/instance_config.py mirrors the existing `get_home_automode_visibility()` shape: env var `AGNES_HOME_SHOW_STATUS_FRAME` > yaml `instance.home.show_status_frame` > default `True`. Cautious-rollout instances can disable the frame without forking; the yaml example documents both knobs. 2. Onboarded gate — the template only renders the frame when the caller's `users.onboarded` is true. First-day users see a clean install-hero before all-zero stat cards; the frame appears automatically on the next render after `agnes init` POSTs `/api/me/onboarded`. Router skips the `compute_home_stats` DB read entirely when either gate is closed; `home_stats` arrives at the template as None in that branch and the `{% if %}` shortcuts the include. Why both gates: PostHog feature flags evaluated and rejected — this codebase uses PostHog for analytics capture only, not feature gating; adding a per-user feature_enabled() call on the /home critical path would couple the homepage render to a remote eval and still require an admin master switch. The onboarded gate is a UX coherence rule layered on top of the operator switch, not an A/B test signal. 3 new tests in test_home_stats.py cover the env-var resolution (falsey values + default-true). The yaml example gets a `home:` block documenting both `show_automode` (pre-existing flag, was undocumented in the example) and `show_status_frame`.	2026-05-14 09:28:47 +00:00
Vojtech	4501c9c3dd	fix(store-guardrails): post-#290 review follow-up — purge tuple, filter chip, stale docs, lazy bundle_meta, logger.exception (#295 ) Addresses post-merge review findings on #290: - Admin Rescan is the only post-v30 producer of status='blocked_inline'. Re-add it to admin queue 'Needs review' filter chip and to TERMINAL_BLOCKED_STATUSES in the bundle-purge job so rescan-produced rows surface in the default operator view and bundles get TTL-swept instead of lingering indefinitely. - Update three doc-drift sites still referring to the pre-#290 spam counter scope (counted blocked_inline). The counter now narrows to blocked_llm + review_error; fix the comment in app/api/store.py, the docstring in get_guardrails_blocked_quota_per_day(), and the operator-facing hint rendered on /admin/server-config. - Add positive test for _reject_inline_or_continue validation branch (code='validation_failed', checks payload shape, no-DB-write contract). Locks the frontend wizard's detail.checks contract. - Tighten test_quota_disabled_with_zero — assert (200, 201) explicitly instead of !=429 so a 500 regression no longer passes. - _reject_inline_or_continue takes plugin_dir and lazy-computes bundle_meta only on the security branch. Validation rejects no longer pay for a SHA256 walk on the bundle. - Surface store.upload.security_blocked audit-log write failures via logger.exception instead of swallowing — that audit row is the only forensic trace by design.	2026-05-14 08:02:44 +02:00
Vojtech	1e87354d7e	feat(home): Getting Started + Overview + Usage modes sections (release 0.54.7) (#291 ) * feat(home): Getting Started + Overview + Usage modes sections Three new content cards rendered between the install-hero and the existing connector tiles on /home. Order: Getting Started → Overview → Usage modes → connectors. - Getting Started — dismissible card with two clickable rows linking to /setup (install flow) and /setup-advanced (deeper reference). Subsumes the legacy `.advanced-pointer` row that sat above the news section. Per-device dismiss via a generic localStorage handler: `.home-card-close[data-dismiss-key="..."]` inside a <section> wires itself up at page load — drop in any future dismissible card without per-card JS. - Overview — operator-owned HTML body sourced from the new `instance.overview` yaml field (env override `AGNES_INSTANCE_OVERVIEW`). HTML in, HTML out via the same `\| safe` filter as news_intro. Empty default hides the section entirely, keeping the OSS vendor-neutral; operators paste their product framing / privacy posture into instance.yaml. New helper `get_instance_overview()` in app/instance_config.py mirrors `get_instance_logo_svg()`. - Usage modes — three OSS-shipped tiles (Terminal / VS Code / Claude Desktop · claude.ai) explaining each surface and linking to the matching /setup-advanced anchors. Closes the gap for users wondering "where do I actually run this". Supporting changes: - setup_advanced.html gains a new `#claude-app` section between #vscode and #workspace, anchored by the Usage modes Claude Desktop tile. Covers the marketplace registration paths and when to prefer the terminal. Added to the table of contents. - Three new tests in test_web_home_page.py pin the Getting Started card markup, the Overview-on-when-yaml-set path, and the Overview-off-by-default path. All 13 tests in the file pass. Operator follow-up (separate infra PR — NOT this PR): paste the Foundry-specific Overview body into instance.yaml's `instance.overview` field. OSS ships with an empty default. * fix(home): Overview is operator-owned content — drop dismiss button Earlier iteration added a close X to the Overview section to match the Getting Started card's dismiss UX. Wrong call: Overview is operator-authored reference content (privacy posture, telemetry policy, project framing) and a per-device localStorage hide means returning users who want to re-read the policy can't recover it without clearing storage. Reverts the close button + the data-dismiss-key on the Overview section. Test inverted to assert the dismiss key is absent (defends against a future drive-by adding it back). Getting Started still dismisses — that's procedural getting-started content users legitimately stop needing once they've finished setup. Overview is always reachable; whole section is still opt-in at the operator level via the empty-yaml default. * fix(home): Terminal usage-mode tile is informational (no click-through) The setup hero above /home's Usage modes already walks the user through the Claude Code CLI install — the Terminal tile click-through to /setup just round-trips back to content the user already scrolled past. Switch Terminal to a non-anchor <div> and scope the hover affordance to a.home-usage-item so VS Code + Claude Desktop tiles keep their click-through (those legitimately deep-link into /setup-advanced anchors). * fix(home): point Usage modes guidance at ~/{workspace}/Projects/ subfolder The bundled plugin scopes the session-analysis loop and the central-catalog sync to ~/<workspace>/Projects/, not the workspace root itself — that convention already appears in the install hero's Step 4 manual-fallback note ('Don't create ~/<workspace>/Projects/ manually — the bundled plugin offers to set it up after install'). Usage modes' footer guidance now matches: 'create every project under ~/<workspace>/Projects/'. Also calls out that the session-analysis loop is scoped to that root so users understand why working outside the workspace dir is invisible to the platform.	2026-05-13 21:44:11 +02:00
Vojtech	14ddaf1e8e	feat(brand): wire instance.logo_svg into header brand slot (release 0.54.6) (#289 ) * feat(brand): inline operator SVG logo + drop header subtitle (release 0.54.6) Three header tweaks, one PR: 1. _app_header.html drops the small uppercase subtitle line below the brand. instance.subtitle still flows into the CLAUDE.md preamble + init welcome template ("Operated by …"); only the web header chrome loses it. 2. get_instance_logo_svg() in app/instance_config.py reads instance.logo_svg (yaml) / AGNES_INSTANCE_LOGO_SVG (env). The yaml field was already documented in instance.yaml.example and the template already supported inline <svg> via {{ config.LOGO_SVG \| safe }}, but router.py:344 hard-coded LOGO_SVG = "" — the middle wire was missing. Now operators can paste a lockup directly into their instance.yaml under instance.logo_svg: \| and have it render in the header. Resolution mirrors get_instance_brand (env > yaml > ""). instance.name remains independent: drives browser <title> tags + page h1s + CLAUDE.md heading; the SVG is the web-header visual only. 3. .app-header-logo svg gains max-height: 40px; width: auto; so any operator's lockup scales via its viewBox to fit the 72px header without per-asset width/height edits. Pairs with #2 — without the clamp, raw artwork (e.g. a 1600x430 lockup) overflows the chrome. Release-cut included per the same-PR rule (Unreleased contained only these bullets after rebase onto 0.54.5). * revert: keep app-header-subtitle span — out of scope for this PR Initial commit dropped the subtitle line on the assumption that the user wanted both the secondary header line AND the future-SVG brand cleaned up. The actual ask was narrower: drop the hostname suffix that renders inside instance.name ("Foundry AI (hostname)"), which is a startup.sh concern, not a template one. Restore the subtitle span and the CHANGELOG bullet that announced its removal. PR scope narrows to LOGO_SVG wiring + CSS clamp only. * fix(header): hide subtitle span when instance.subtitle is empty Pre-fix the template fell back to the literal string 'Data Analyst Portal' when INSTANCE_SUBTITLE was unset, so operators who left the field empty saw a stray hardcoded label below their brand. Switched to a Jinja {% if %} guard around the whole <span class="app-header- subtitle"> so an empty subtitle produces no element at all — clean header chrome instead of placeholder leak. * feat(home): hide install-hero once onboarded + X close button - Wrap the entire install-hero in `{% if not onboarded %}` so once `users.onboarded=true` (auto-flipped by `agnes init` POSTing /api/me/onboarded, or by the new X / existing fallback button) the blue hero disappears entirely. Pre-PR the onboarded branch reused the same shell with a "Welcome back" header + "Steps 1–4 done" badge + minimize toggle, which visually outweighed the actual nav hub. - Add a circular × close button (top-right of the hero, rendered only when not-onboarded). Click → window.confirm() asking the user to acknowledge onboarding → POST /api/me/onboarded → reload. The confirm string intentionally avoids the literal phrase "Mark me as offboarded" because cli/commands/onboarded.py::status scans /home's rendered HTML for that exact marker as a fallback for the api/me/profile check. - Lift the offboard escape hatch out of the hero into a discrete `.offboard-strip` rendered below, gated `{% if onboarded %}`. Lets the analyst flip back to the install view after wiping their workspace folder. - Centralize the /api/me/onboarded POST into a `postOnboarded()` JS helper reused by the hero X, the existing "Mark me as onboarded" fallback button, and the new offboard button. Tests updated to match the new behavior: - `test_home_onboarded_user_sees_nav_hub` — asserts the hero is gone and the offboard strip is the only setup-flow remnant. - `test_minimize_toggle_no_longer_rendered` (renamed) — asserts the minimize toggle is absent in both states (was previously rendered inside the now-hidden onboarded branch of the hero). - `test_home_no_auto_transition_after_post_until_reload` — checks offboard-strip presence post-flip instead of the removed "Welcome back" hero copy. * fix(home): X-close button used invalid source enum, hit 422 The X button's data-target-source was 'self_acknowledged_x' to give audit_log a separate marker for X-vs-button-driven flips. But app/api/me.py:38's OnboardedRequest pins source to a Literal of ['agnes_init', 'self_acknowledged', 'self_unmark'] — pydantic returned 422 on every X click. Confusing side effect: both buttons share self-mark-status as the status element, so the failed X click rendered 'Failed (422)' next to the still-functional 'Mark me as onboarded' button. Looked like the button itself broke. Fix: drop the _x suffix. Both surfaces now POST source='self_acknowledged'. Distinction in audit_log is not load-bearing — the source field captures user intent ('I'm onboarded'), not the specific UI affordance.	2026-05-13 17:25:46 +00:00
Vojtech	50a974f196	feat(store-guardrails): admin-configurable content thresholds (#281 ) * feat(store-guardrails): admin-configurable content thresholds Adds the flea-market content guardrail floors to the /admin/server-config editor so operators can tune the bar without code changes. Defaults are unchanged (60 chars description, 25 chars command, 5 distinct words, 200 chars body) — patching guardrails.* in instance.yaml or via the admin UI overrides any of them and the next inline check picks up the new value. src/store_guardrails/content_check.py now resolves the four floors via helper functions (_min_desc_chars / _min_command_desc_chars / _min_distinct_words / _min_body_chars) that read app.instance_config at call time. Module-level _DEFAULT_* constants remain as fallbacks if the import fails (defensive — keeps the guardrail module loadable without the app package on its path). app/instance_config.py grows four matching getters returning the live value with sane defaults + integer coercion. app/api/admin.py registers 'guardrails' as an editable section + ships nine known-fields entries (min_description_chars, min_command_description_chars, min_distinct_words, min_body_chars, enabled, review_model, blocked_quota_per_day, blocked_bundle_ttl_days, stuck_review_grace_seconds) with operator-facing hint copy explaining what each knob does. app/web/templates/admin_server_config.html gets a SECTION_META entry so the section renders as 'Flea-market guardrails' with a help string instead of a bare section ID. app/web/router.py threads the live thresholds into /store/new and /store/examples via a small _guardrail_thresholds() helper so the disclosure copy, char counter, and "Why these limits" table render the configured value (not a hardcoded 60). End-to-end smoke verified: PATCH guardrails.min_description_chars=90 → /store/new immediately renders "90 characters" + JS DESC_MIN=90 on the next request, no restart required (helpers read live config per call). * chore(store-guardrails): address PR review safe-fix findings Code-review safe_auto findings on PR #281 (review run 20260513-100126-64052520): - CHANGELOG: add Unreleased entry covering the new /admin/server-config Flea-market guardrails section, the four live threshold getters, and the route-helper rendering knobs. Required by the project's non-negotiable "Changelog discipline" rule. - content_check.py: narrow `except Exception` to `except ImportError` on the four `_min_()` resolver helpers. Surface-level TypeError / ValueError on a malformed YAML value belongs to the instance_config getters' own try/except — the resolvers should only defend against the in-tree import itself failing, not silently swallow real bugs in the getters. - store_upload.html: refresh the stale "30-char threshold" comment to reflect the configurable floor (default 60), and add `\|default(60)` / `\|default(25)` / `\|default(5)` filters to the disclosure-copy bindings so the upload form matches store_examples.html's belt-and-suspenders rendering if a future route ever renders the template without populating the `guardrail` context. - router.py: tighten `_guardrail_thresholds()` return annotation from bare `dict` to `dict[str, int]`. Residual work (left for separate change after operator direction): - Add round-trip test (PATCH guardrails -> next inline check uses new value) — primary testing gap. - Decide policy on `min_=0` (currently coerced to 1 via `max(1, int(val))`) vs treating 0 as a disable sentinel like neighbour getters (`blocked_quota_per_day`, `blocked_bundle_ttl_days`). - Add POST-time integer validation for `guardrails.` so a typo'd YAML value (bool / string / float) errors loudly instead of silently falling back to the default. test(store-guardrails): cover admin-configurable thresholds + PATCH round-trip Closes the "primary testing gap" Vojta noted in the safe-fix commit on PR #281 — the four new `get_guardrails_min_` getters and the PATCH-takes-effect-on-next-check live-config flow had no direct coverage. 10 new tests in `tests/test_store_guardrails_admin_config.py`: - TestGuardrailGetterDefaults (4 tests) — each new getter returns the documented default (60 / 25 / 5 / 200) when nothing is configured. - TestGuardrailGetterOverlay (5 tests) — overlay-driven overrides win, string values that look numeric coerce via int(), garbage strings fall back to default via the (TypeError, ValueError) branch, and the `max(1, int(val))` floor pins zero/negative inputs to 1. - TestPatchRoundTrip (1 test) — PATCH `/api/admin/server-config` `guardrails.min_description_chars=90`, then call content_check against a 75-char description that previously passed: must now fail with `too_short`. Then PATCH back to 60 and verify the next check passes again. Closes the cache-invalidation contract Vojta relies on for the "no app restart" claim — broken without the reset_cache() bracket in /api/admin/server-config. The TestGuardrailGetterOverlay.test_zero_or_negative_floored_to_one test pins the current `max(1, int(val))` policy. Vojta's safe-fix commit explicitly left "policy on min_=0 vs disable-sentinel" as residual work — pinning the current behavior here ensures any future change to use 0 as a disable sentinel must update this test (and the reviewer sees the policy decision). Verified: 4509 tests pass locally (4499 existing + 10 new). * release: 0.54.2 — admin-configurable flea-market guardrail thresholds + tests Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.1 → 0.54.2) bundling Vojta's admin-configurable thresholds for the flea-market content guardrail (9 knobs in /admin/server-config) plus the test coverage closing the "primary testing gap" he punted in the safe-fix commit. No DB migration; defaults unchanged from PR #276 — instances that don't set `guardrails.*` keep the original bar transparently. --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com> Co-authored-by: ZdenekSrotyr <139972147+ZdenekSrotyr@users.noreply.github.com>	2026-05-13 09:20:55 +00:00
Vojtech	79a958ec26	feat(setup): configurable instance brand + connector setup overhaul (#268 ) - instance.brand (env AGNES_INSTANCE_BRAND, default "Agnes") + instance.workspace_dir replace hard-coded "Agnes" / "~/Agnes" across /home, /setup, /setup-advanced, /login, /install, /me/debug, and the Claude Code clipboard setup script. Terraform-friendly env override; defaults preserve existing Agnes branding. - Explicit "create workspace folder" step on /home (OS-tabbed mkdir+cd) + same step baked into the clipboard script as step 2. Drops the implicit assumption that `agnes init --workspace .` lands in a sensibly-cd'd shell. - Final "Restart Claude Code" step in the setup script (unconditional, between connectors and Confirm) so freshly-installed plugins, MCP servers, and SessionStart hooks load on the next Claude Code session. - Asana reverted from hosted Remote MCP back to PAT + raw REST against app.asana.com/api/1.0. MCP envelope shape consumed ~5x tokens per call; the PAT path lets the agent read flat REST fields. Existing MCP registration is detected and the user is asked whether to remove it (default Y, with benefits listed: token cost, no third-party hop, no OAuth refresh dance, deterministic envelope shape). - Atlassian connector instructs picking the longest API-token expiry (today "1 year") to cut re-mint friction. No public query-parameter hook exists on id.atlassian.com to pre-select expiry, so the prompt documents the manual click and acknowledges that limitation. - Uniform ✅ / ❌ per-connector marker contract (Asana, GWS, Atlassian) for the Confirm summary to grep. Each connector now ends with a Claude-driven end-to-end test that uses Claude Code's own bash to exercise the stored credential and prints "✅ <Connector> integration verified — ..." (or the failure variant).	2026-05-12 17:10:08 +02:00
Vojtech	c09c85d13a	fix(cta): clipboard fallback + fold Atlassian MCP into connectors (#249 ) * fix(cta): fall back to textarea+execCommand when Clipboard API rejects The "Setup a new Claude Code" CTA fetches /auth/tokens, parses the JSON response, renders the setup script, THEN calls `navigator.clipboard.writeText()`. Modern browsers (Safari, Firefox, and Chrome on stricter configurations) reject `writeText` with NotAllowedError when transient user activation has been consumed by an intervening `await` — which is exactly the case here. Users perceived this as "the browser blocked the copy" and got the manual-paste fallback modal even though the textarea + `document.execCommand('copy')` path WOULD have worked synchronously without needing fresh user activation. `copyToClipboard` now: - prefers the modern Clipboard API (unchanged for the happy path) - on writeText rejection, falls back to `copyViaTextarea` instead of surfacing the rejection to the caller's catch block. `copyViaTextarea` is the previously-inline textarea fallback factored out into a named helper, with two small hardening touches: - `readonly` + `tabindex=-1` so the hidden textarea doesn't steal focus or pop the virtual keyboard on mobile. - explicit `setSelectionRange(0, text.length)` to belt-and-braces the selection on iOS Safari (where `.select()` alone sometimes selects zero chars on touch-focused textareas). Only the CTA button needed this — the Step-1 install-command and the connector-copy buttons all call `writeText` synchronously inside the click handler (no awaits in between), so they keep their existing user-gesture context and didn't hit the same rejection. No template changes there. * refactor(home): fold Atlassian MCP registration into connectors block The standalone "Register the Atlassian MCP server" step (was step 6 in the unified setup script) moves INTO the Atlassian connector's prompt body so all Atlassian-related setup lives in one logical group. Same intent that #247 carried for connectors, applied one level deeper: the hosted Remote MCP registration is part of "set up Atlassian", not its own ungrouped step. What changed: - `app/web/connector_prompts.py` — the Atlassian prompt's step 5 replaces the speculative "Register the on-demand Atlassian MCP under .claude/mcp/atlassian" line with the actual hosted Remote MCP registration: `claude mcp add --transport sse atlassian https://mcp.atlassian.com/v1/sse \|\| true`. The `\|\| true` keeps re-runs idempotent and the body explains the OAuth-on-first-use contract. Both /home's Atlassian tile and the inlined setup-script Atlassian sub-block emit this line — single source of truth holds. - `app/web/setup_instructions.py` — `_mcp_servers_block` deleted; the `mcp_servers` step is removed from `_step_numbers`; resolve_lines no longer calls it. - Renumbering: install (1), init (2), catalog (3), preflight (4), marketplace (5), diagnose (6), connectors (7), confirm (8). Was: 6 = mcp_servers, 7 = diagnose, 8 = connectors, 9 = confirm. - `tests/test_setup_instructions.py` — Confirm step 9→8, Connect 8→7, diagnose 7→6, mcp_servers references dropped. `test_step_numbering_with_connectors_step` now asserts `"mcp_servers" not in steps`. Stray-Confirm assertion lists shift by one position. - `tests/test_setup_page_unified.py` + `tests/test_web_ui.py` — same step-number shifts in the rendered /setup preview assertions. The `claude mcp add` line is still the Atlassian Remote-MCP path that the 2026-05-10 init-report Fix C added — only its position in the flow changes. /home Atlassian tile copying continues to install the MCP too (the prompt body the tile pastes contains the same line). 112 tests pass. * feat(atlassian): operator-overrideable base URL via AGNES_ATLASSIAN_BASE_URL Adds an env var / YAML key the operator (Terraform module, customer-VM template, OSS instance.yaml) can set to bake the Atlassian Cloud site root into the connector prompt — so end users don't have to guess / paste their org's `https://<myorg>.atlassian.net`. When set, the Atlassian connector prompt (rendered on both /home tile and inlined into the setup-script step 7 Atlassian sub-block) replaces step 1's "Ask me for my Atlassian Cloud site URL and email" with a one-line note that the URL is already provisioned by the operator and asks only for the email. Step 4's helper-script body has the `BASE_URL='<the site URL I gave you>'` placeholder substituted with the literal value. When unset (empty), the existing "ask the user" flow remains — no regression for OSS instances. Resolution + normalization in `get_atlassian_base_url()`: - env `AGNES_ATLASSIAN_BASE_URL` > yaml `instance.atlassian.base_url` > "" - strips trailing slash + trailing `/wiki` so the canonical value is the bare site root. Matches the per-user helper script's normalization at storage time (atlassian_prompt step 4 guard 2), so the literal baked in by the operator stays consistent with what the user's helper script would have computed from their input. Plumbing: - `app/instance_config.py`: new `get_atlassian_base_url()` resolver. - `app/web/connector_prompts.py`: - `atlassian_prompt(, base_url: str = "")` — string-replace two explicit placeholder phrases when base_url is truthy; otherwise return the prompt unchanged. - `all_connector_prompts(..., atlassian_base_url: str = "")` — forwards the kwarg. - `app/web/router.py` (`_build_context`): reads `get_atlassian_base_url()` and passes it through to `all_connector_prompts(...)` so both the /home tile context AND the inlined-script `resolve_lines(...)` call use the same value. - `src/welcome_template.py` (`compute_default_agent_prompt`): same threading via the existing import-on-demand path. Tests (`tests/test_home_route_resolution.py`): - `get_atlassian_base_url` resolver: default empty, env override, trailing-slash strip, trailing-`/wiki` strip. - `atlassian_prompt(base_url=...)`: literal URL baked in, ask-step removed, placeholder replaced, operator-baked-in copy appears. - `atlassian_prompt(base_url="")`: existing ask-the-user flow unchanged. - `all_connector_prompts(atlassian_base_url=...)`: kwarg threads through to the rendered atlassian prompt. 135 tests pass. feat(asana): register hosted Asana Remote MCP in connector prompt The Asana connector prompt only stored a PAT in the OS keychain + ran a curl verify against /api/1.0/users/me. That set Claude Code up for direct `curl` calls but didn't actually wire Asana into Claude's tool list — so the user couldn't ask Claude to "find my open Asana tasks" and have it work. Symmetric oversight to the Atlassian connector's original speculative `.claude/mcp/atlassian` line that this branch already replaced with `claude mcp add --transport sse atlassian https://mcp.atlassian.com/v1/sse`. Adds a new step 5 that registers Asana's hosted Remote MCP: claude mcp add --transport http asana https://mcp.asana.com/mcp \|\| true This is the V2 endpoint (streamable HTTP transport, launched February 2026). The V1 SSE endpoint at https://mcp.asana.com/sse was deprecated 2026-05-11 (today) and must NOT be used — calling it out explicitly in the prompt body so a future operator who finds an old reference doesn't paste the dead URL. OAuth is handled by Claude Code at first use, same model as the Atlassian MCP step. The PAT stored in step 3 stays for direct `curl` calls (precheck + ad-hoc scripts) — the MCP path uses its own OAuth grant, not the PAT. Old step 5 (revoke instructions) renumbers to step 6 and adds the `claude mcp remove asana` cleanup hint. Same single-source-of-truth invariant holds: /home Asana tile + the inlined Asana sub-block in the setup script (step 7 connectors) both emit identical text from `asana_prompt()`. 71 tests pass. * feat(asana): drive MCP OAuth login + end-to-end validation post-register `claude mcp add --transport http asana ...` only registers the server in Claude Code's local config — it does NOT trigger OAuth. The browser tab opens the first time any `mcp__asana__` tool gets invoked. So the previous step 5 left a user looking at a "registered" MCP that, in practice, hadn't authed yet and would fail on first real use. Same blind spot Atlassian's prompt also has, but Asana was the one called out in the latest review pass. Adds a new step 6 between MCP registration (step 5) and the revoke instructions (now step 7): a. Tell the user verbatim what's about to happen — a low-impact read through the MCP will pop the OAuth browser tab; sign in with the same account whose PAT they stored in step 3 and approve. Frames the OAuth as one-time so users don't wait for it on every later call. b. Drive an actual MCP read. Don't prescribe the exact tool name because the Asana MCP's exposed surface (`mcp__asana__`) is versioned upstream and we don't want to pin to a name that gets renamed. Instead: tell Claude to pick the lightest read from its surfaced tool list (users-me / list-workspaces / equivalent). Document the recovery path when Claude Code times out waiting for the OAuth tool use: `claude mcp list` to confirm registration before retrying. c. Print a single one-line proof that combines wiring + auth: "Asana MCP connected as <name> — <N> workspace(s) visible." Explicit anti-echo callout for tokens, task content, comments. On failure, surface the exact Claude-Code error and stop — no silent pass. d. Sanity-check that the MCP OAuth identity and the PAT identity reference the same Asana account. Easy mistake to make when the user has multiple Asana accounts — flag only on mismatch, keep quiet when they match. Recovery: `claude mcp remove asana && claude logout asana` then redo step 5. Step 7 (revoke) absorbs both the keychain delete + the `claude logout asana` line so users have a single place to undo everything. 43 tests pass. * fix(init): clear stale CA env vars on Windows before any TLS handshake Reported by the 2026-05-11 Windows test pass: after `agnes init` the gws connector failed with `UnknownIssuer` TLS errors because `SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE` were still set in Windows User scope pointing at `C:\Users\localadmin\.config\agnes\ca-bundle.pem` — a file that did not exist on the test host. Past Agnes installs (the setup-prompt trust block + older bootstrap helpers) write those pointers when they materialize a combined Agnes-CA bundle; when the bundle file later disappears (re-init on a new VM, machine swap, the ~/.agnes dir wiped), the pointers go stale and every native Windows TLS handshake fails before Agnes itself runs. SSL_CERT_FILE in particular REPLACES (not appends to) the trust store, so a stale pointer is silently catastrophic. `agnes init` now clears stale pointers in two layers before the first server roundtrip: 1. Current-process env (os.environ) — what the immediately-following `api_get` to /api/catalog/tables actually reads. Without this, init itself blows up before it gets to step 2. 2. Windows User-scope env via PowerShell `[Environment]::SetEnvironmentVariable(name, $null, 'User')` — what every future shell + every native tool (gws, claude.exe, pip, uv) inherits. The 2026-05-11 reporter expected this exact cleanup ("init was supposed to clear these but they persisted"). The cleanup is best-effort and conservative: - Only deletes a var when its value points at a path that does NOT exist on disk. Intentional operator config (e.g. SSL_CERT_FILE pointing at a corp certifi bundle) stays put. - PowerShell missing / restricted execution policy / WSL-without-pwsh: swallowed silently. The current-process leg still runs, which unblocks init even on hosts where the User-scope leg cannot fire. Tests (`tests/test_init_ca_cleanup.py`, 6 cases): - Stale pointers → removed from process env. - Real-path pointers → preserved. - Non-Windows hosts: PowerShell is not invoked. - Windows hosts: PowerShell IS invoked with a script that checks all three vars + uses Test-Path + SetEnvironmentVariable. - PowerShell FileNotFoundError: cleanup swallows it, does not raise. - `_is_windows_host()` reflects sys.platform. * refactor(asana): MCP-first flow — drop PAT storage, precheck via `claude mcp list` The Asana hosted MCP at https://mcp.asana.com/mcp authenticates via OAuth (Claude Code holds the grant; browser tab pops on first tool use). The earlier prompt walked the user through creating + keychain- storing an Asana Personal Access Token AND registering the MCP — two parallel auth surfaces for one connector. Once the MCP works, the PAT has no consumer: the precheck/verify steps that used `curl $BASE/api/1.0/users/me` are just redundant proof that Asana itself is reachable, which the OAuth handshake already establishes. Removed: - Step 0 keychain probe + curl verify against /users/me with PAT. - Step 1 open developer-console / create PAT. - Step 2 click "+ New access token", warn shown-ONCE. - Step 3 helper-script for keychain-storage (per-OS bodies: macOS `security add-generic-password`, Linux `secret-tool store`, Windows `cmdkey /generic`). - Step 4 PAT-side `users/me` verify. - Step 5's split that kept the PAT around for direct curl scripts. - Step 6d's "MCP vs PAT identity sanity check" — there is no PAT anymore, nothing to mismatch against. New flow (3 steps total): - Step 0 precheck: `claude mcp list \| grep ^asana` — if found, the server is registered AND Claude Code is holding its OAuth grant (otherwise prior failure would have removed it); print "Asana MCP already registered — skipping setup" and stop. Tells the user the explicit reset command (`claude mcp remove asana && claude logout asana`) so a re-register stays one paste. - Step 1: `claude mcp add --transport http asana https://mcp.asana.com/mcp` — no `\|\| true` because step 0 should have caught the "already exists" case. Step explains the V2-vs-V1 endpoint distinction (V1 SSE deprecated 2026-05-11) and the abort-clean recovery if the precheck somehow missed the existing server. - Step 2: same OAuth + low-impact-read validation pattern as before. - Step 3: revoke instructions (mcp remove + logout + Asana-side app revoke at app.asana.com/Settings → Apps). Both surfaces (the /home Asana tile and the inlined Asana sub-block in the setup script's step 7) emit the new text from the same asana_prompt() — single-source-of-truth invariant intact. 77 tests pass.	2026-05-11 21:54:51 +02:00
Vojtech	a46b9dc928	/home install-hero polish: license link contrast, auto-mode reorder, Shift+Tab guidance (#243 ) * Make /home install-hero links readable against blue background The Claude license-options link added in the previous commit inherited the default `<a>` style (`var(--hp-primary)` blue), which renders as blue-on-blue and is unreadable inside the blue install-hero. Add a scoped `.install-hero a` rule that uses white with an underline (matching the existing lead-paragraph contrast pattern) so any link nested in the hero stays legible. * Reorder /home install flow: auto-mode is now Step 2, Agnes install becomes Step 3 Step 3 (was Step 2) pastes a ~20-command bash bootstrap into a fresh Claude Code session. Without auto-mode enabled first, each Bash/edit command needs a manual approve click — bad UX for first-time users. Move auto-mode from the outside-hero `<details>` reference block into the install-hero as a real Step 2, between "install Claude Code" and "install Agnes". Content is the persistent `acceptEdits` snippet (write to ~/.claude/settings.json) plus a one-liner pointing at Shift+Tab for users who are already inside a running Claude Code session. YOLO mode for full Bash auto-approve stays on /setup-advanced behind the existing link. The outside-hero `setup-collapsible[data-section="step3"]` block is dropped — auto-mode is no longer reference content, it's a real install step, and duplicating it would just diverge over time. Onboarded users no longer see the auto-mode block at all (consistent with Steps 1 + 3 also hiding post-onboarding). Completion banner copy updated: "Step 1, 2 & 3 done — Claude Code installed, auto-mode set, Agnes ready". Dashboard CTA partial and other templates don't reference step numbers for this flow, so no adaptation needed there. * Simplify /home Step 2 to Shift+Tab only — drop the JSON snippet Operator pointed out two issues with the prior Step 2: 1. The settings.json snippet is redundant. Claude Code's first Shift+Tab cycle to auto-accept mode already prompts the user whether to persist it as default — Claude writes the config itself, no manual file edit needed. 2. The snippet only showed the POSIX path `~/.claude/settings.json`, which doesn't translate to native Windows. Replace the snippet + copy button with a plain Shift+Tab instruction, explicitly call out the first-time "make this the default?" prompt, and note that Claude handles the config write itself — same flow on macOS / Linux / WSL / Windows. Adds a fallback line for users who already closed the post-OAuth session. * Tighten /home Step 2 install-note to two paragraphs Operator: drop the 'Claude writes the setting itself, so this works the same on macOS / Linux / WSL / Windows...' line plus the 'auto-approves file edits going forward; Bash commands stay gated — that's the safe default' line. Both were filler — the make-default prompt already implies persistence, and gated Bash is the obvious default users won't be surprised by. Result: paragraph 1 carries Shift+Tab + first-time make-default say-yes + closed-session fallback in one breath; paragraph 2 keeps the verbatim YOLO link. Same affordances, less vertical space.	2026-05-11 16:46:58 +00:00
Vojtech	d6ad08f107	Flea-market upload guardrails + soft delete + JOIN-based admin queue (#233 ) * feat(store): flea-market upload guardrails + soft delete + JOIN-based admin queue Adds an end-to-end guardrails pipeline for store uploads (manifest + static-security + LLM review), persists blocked bundles for forensics, introduces soft-delete (Archive) semantics, consolidates the legacy /store/{id} surface into /marketplace/flea/{id}, and reworks the admin queue so lifecycle filters read live entity visibility via LEFT JOIN rather than a denormalized submission column. Schema v29 → v35: * v29 store_submissions table + store_entities.visibility_status * v30 file_size, bundle_sha256, bundle_purged_at on submissions * v31 reshape store_submissions (drop legacy unique on entity_id) * v32 store_entities.archived_at/by + 'archived' visibility value * v33 drop store_submissions.retry_count (unused) * v34 ensure idx_store_submissions_entity exists post column-drop * v35 broaden visibility_status enum + JOIN architecture cutover Pipeline (src/store_guardrails/): * Inline checks: manifest_check, static_scan, quality_check * LLM review configurable haiku\|sonnet\|opus (default haiku) * BackgroundTasks-driven async path with structured-output JSON * Per-submitter daily quota (default 50) * 30-day TTL purge job (POST /api/admin/run-blocked-purge) * Bundle SHA256 + size persisted; sha256 survives purge for forensics Visibility model: * pending \| approved \| hidden \| archived * _enforce_visibility returns 404 (no leak) for non-owner non-admin * Owner sees own non-approved entries via include_owner_id widening * Install refused with 409 entity_not_approved when not approved Soft-delete (DELETE /api/store/entities/{id}): * Default = soft (visibility_status='archived'); existing installs keep getting served the bundle so users don't lose the plugin * ?hard=true admin-only: drops bundle + cascades user_store_installs * Hard-delete preserves entity_id on submission as tombstone so audit_log linkage survives for the activity timeline Admin queue lifecycle (the JOIN refactor): * Verdict (store_submissions.status) is immutable forensic record * Lifecycle (store_entities.visibility_status) is live state * /admin/store/submissions Archived chip translates to `e.visibility_status='archived'` via LEFT JOIN — any path that flips visibility surfaces in the queue immediately * Detail page renders Status (verdict) and Entity lifecycle side by side so admins see "approved at review, now archived" at a glance URL consolidation: * /store/{id} deleted (no redirect, stale bookmarks 404) * /marketplace/flea/{id} is the canonical detail surface * Three in-tree callers (upload-success, my-stack card, store listing card) updated to point at the new URL * Quarantine banner extracted to _quarantine_banner.html partial, self-guarded, included from both flea detail templates * Banner JS auto-refreshes when the verdict lands by polling /api/marketplace/flea/{id}/detail (visibility_status + submission_status — the latter is needed because blocked_llm keeps the entity at visibility_status='pending') Audit log resource format: * runner.py emits prefixed `store_submission:{id}` (post-fix) * Detail-page timeline query handles three patterns: prefixed submission, helper-emitted `store_entity:{sub_id}`, and bare-id legacy rows — all surface in the activity timeline UX fixes: * Owner sees Under review / Quarantined / Hidden banner with status * Install button gray-disabled (not blue) when non-approved * Owner cannot delete quarantined entries (403); admin can * Admin queue: filter chips, sortable columns, paging, page-size * Auto-refresh queue every 5s while pending rows are visible * Store upload page file picker no longer opens twice (label → input default action collided with explicit JS handler) Tests: 168 passed across the guardrails suites (admin submissions, store API, inline / LLM / purge guardrails, store repositories, marketplace filter, schema version). New regression coverage includes: archive surfaces via JOIN even when API path is bypassed; deleted submission renders activity timeline (tombstone); flea detail surfaces submission_status only for owner/admin; detail page renders Entity lifecycle row; audit log resource format covers both helper and runner paths. * fix(store-guardrails): PR #233 follow-up — prompt injection, atomic PUT, BG race, schema, reaper, sort whitelist Addresses 9 of the 23 findings from the PR #233 review (spec at docs/superpowers/specs/2026-05-09-pr233-guardrails-fixes-spec.md). Merge-gate items #1-#6 plus high-value mediums #7, #9-#12, #23. Architectural items (#8 enum split, #14 factory) and pure maintainability (#15-#22) deferred to follow-ups. Security: * #1 prompt injection — SYSTEM_PROMPT now passed via the SDK's dedicated system= parameter; bundle wrapped in <bundle>...</bundle> sentinels declared data-only by the system prompt; literal sentinel strings in user content are escaped so an adversarial README can't forge a close tag. * #6 static scan honesty — module docstring + admin copy + docs declare static scan as signal not gate; .md/.txt/.rst/.html/.json/ .yaml/.yml/.toml skipped to avoid false positives on prose. AST mode for Python deferred (separate flag, FP comparison work). Correctness: * #2 PUT atomicity — bundles bake into plugin.staging-<rand>/ alongside live, atomic-rename on success; failed checks leave live tree byte-for-byte intact. * #3 BG-task race — set_visibility_if_pending guards verdict flips to the (pending, hidden) review window; admin archives during review survive; skipped flips audit-logged. * #4 v35 NOT NULL/DEFAULT — schema v35→v36 re-applies them on store_entities.visibility_status. CHECK constraint enforced application-side (DuckDB ADD CHECK on existing column unsupported). * #7 stuck-review reaper — reap_stuck_llm_reviews flips pending_llm rows older than guardrails.stuck_review_grace_seconds (default 1800) to review_error. Scheduler runs every 15 min via new /api/admin/run-reap-stuck-reviews. Set knob to 0 to disable. * #9 quota counter — count_blocked_for_submitter_since now counts blocked_inline + blocked_llm + review_error so a submitter triggering only LLM-blocked verdicts is bounded. * #10 missing risk_level — surfaces as review_error with error='missing_risk_level' instead of silently defaulting to 'medium' (which looked like a model-decided block). * #11 archived_at clear — set_visibility nulls archived_at + archived_by when transitioning out of 'archived' so a future read doesn't show stale archive forensics on an approved row. Maintainability: * #12 FSM doc comment — accurate insert/transition/lifecycle description in src/db.py near store_submissions schema. * #23 sort-key whitelist — admin queue rejects unknown sort keys with 400 invalid_sort_key; substring-replace footgun removed. Deferred (separate PRs): * #5 quota race — proper fix requires asyncio.Lock spanning the full pipeline; threading.Lock blocks event loop, DuckDB MVCC doesn't help. API-level slowapi bounds worst case for now. * #6 part 3 (AST static scan), #8 (enum split), #13 (import bundle docs), #14 (factory consolidation), #15-#22 (maint). Tests: * New: tests/test_store_guardrails_prompt_injection.py (corpus + trust-boundary invariants), tests/test_store_put_atomic.py, tests/test_store_guardrails_reaper.py. * Extended: test_store_guardrails_llm.py (system param, missing risk_level, BG race), test_admin_store_submissions.py (quota counter widening, sort whitelist 400), test_store_repositories.py (un-archive metadata clear), test_db_schema_version.py (v36). * Full suite: 3738 passed; 17 pre-existing baseline failures unchanged (db migration tests, cli binary rename, catalog export, user mgmt v5 backfill — confirmed by stash + rerun on clean tree).	2026-05-09 17:32:53 +04:00
Vojtech	2e2e1a1eca	feat(home): state-aware /home + /setup-advanced + schema v26 (#228 ) * feat(home+news): state-aware /home + /news + admin-edited news section Squash of the vr/home-page feature work for clean rebase onto main. Original 18-commit history preserved in branch backup/vr-home-page-pre-rebase. What's in this PR: State-aware /home page - New `/home` route with hero + auto-mode + connectors (Asana / GWS / Atlassian) + lookarounds. Onboarded vs not-onboarded state-machine branches a single template (`home_not_onboarded.html`); the install steps, "Setup a new Claude Code" CTA (90-day PAT mint), and per- connector setup prompts hide once `users.onboarded=TRUE`. A completion badge replaces them. - "Mark me as offboarded" button reverses the flag without an SQL UPDATE. - `users.onboarded BOOLEAN` column added; default FALSE; flipped by the CLI's `agnes init` post-success POST and the `/admin/users` API. - Connector setup prompts pre-check whether the tool is already installed/connected before re-running setup. - GWS scope set widened to include Google Chat (`chat.spaces`, `chat.messages`). Single template + design tokens - `dashboard.html` now extends `base.html` via the new `{% block layout %}` opt-out (full-width pages skip the 800px `.container`). Net: every page shares one shell. - `style-custom.css` `:root` extended with `--space-{7,9,10,12}`, `--radius-2xl`, `--shadow-{card,elevated}`, `--text-{muted,disabled}`, `--focus-ring`, `--transition-`, `--width-{narrow,app,wide}` so inline page styles can migrate incrementally. Auth redirects honor AGNES_HOME_ROUTE* - `safe_next_path` resolves the configured home route when no `default=` is passed; OAuth callbacks, magic-link clicks, password form, and LOCAL_DEV_MODE shortcuts now land on `/home` (or whatever the operator picked) instead of always /dashboard. News section + /news permalink + /admin/news editor - Schema-bumped `news_template` table (single versioned entity, draft + publish gate). `published BOOLEAN` distinguishes draft from public; monotonically-increasing `version` per save; rows >30d pruned on save except the currently-displayed published version. - `/home` bottom-of-page renders the latest published intro with a "Read more →" link to `/news` (which renders the full body). - `/admin/news` editor with sandboxed live preview, versions table, per-row Unpublish, Format-help cheatsheet. - `agnes admin news show / draft / edit / publish / unpublish / versions / export` (CLI). Talks to the live server via the `/api/admin/news/` endpoints (PAT-authed) — no direct DB access so it coexists with a running uvicorn. - Optimistic-lock guard: `agnes admin news publish --version N` and PUT/PATCH endpoints accept `expected_version` and 409 with structured `{error: "version_conflict", expected, actual, actual_by}` when a concurrent admin replaced the draft. Edit refuses to overwrite a draft authored by someone else without `--force` or `--expect-version`. - nh3 (Rust-backed ammonia) HTML sanitizer; iframe pre-pass strips any iframe whose src is not on the YouTube/Vimeo/Loom allowlist; javascript:/data: schemes blocked everywhere. - Author CSS vocabulary: `.news-hero` (blue gradient hero block), `.callout`/`.callout-{info,warn,success,danger}`, `.video-embed`, `.news-section`, `.news-grid-{2,3}`, `.news-cta` — all consolidated in `style-custom.css` under "News content vocabulary (shared)" so /home perex, /news body, and /admin/news preview share one source of styling. - Code-inside-`<pre>` contrast fix (was unreadable amber-on-silver). - `.news-content` table styling (border, header band, row-hover). `scripts/dev/run-local.sh`* — local uvicorn launcher. Pulls Google OAuth client id/secret from GCP Secret Manager (`AGNES_OAUTH_GCP_PROJECT`-driven, no vendor defaults), points `AGNES_CLI_DIST_DIR` at `./dist` so the wheel endpoint resolves, and `--dev` flips `LOCAL_DEV_MODE=1` + `AGNES_HOME_ROUTE=/home` for one- command iteration. `LOCAL_DEV_MODE=1` also enables the FastAPI debug toolbar. CLAUDE.md "Run tests before every push" section codifies `pytest tests/ -n auto -q` as non-negotiable before each push. Tests: 51 + 14 + 8 = 73 new tests across news-template repo, sanitizer, API, web, CLI; plus updated home/auth/template tests for the new shared-shell architecture. Origin docs (gitignored, customer-fork content): docs/brainstorms/home-page-requirements.md, docs/plans/2026-05-07-001-feat-home-page-plan.md. * feat(cli): agnes onboarded {on,off,status} — self-scoped flag toggle User-facing equivalent of the in-page "Mark me as (off)boarded" button on /home. POSTs /api/me/onboarded with {onboarded, source}; --source overrides the audit-log marker so flips made from the CLI vs the web button vs agnes init automation stay distinguishable. `status` reads via /api/me/profile (when present); falls back to a quick body-marker scan of /home so the read path doesn't write an audit_log row. PAT-authed via cli.client.api_post — same convention as agnes admin news / agnes admin add-user etc. Tests: 5 covering on/off/status round-trip, idempotency, and audit-log source recording. Full suite holds at 12 pre-existing failures (same set as before). * ui(nav+home): primary nav reorg + green What's new band + /marketplace link fix Primary nav (post-rebase audit + per-user feedback): - Items: Home → Marketplace → Data Packages → Memory. Admin dropdown for admins only. The "Dashboard" label was renamed Home — point still resolves through `home_route` so customer instances on /dashboard still land there. - Activity Center moved into the Admin dropdown. Per-team adoption analytics is admin-consumed in practice; the route still allows any authed user for direct deep-links so existing /home tile + bookmarks keep working. - Memory link added (→ /corporate-memory) — was previously buried in the /home "Look around" tiles. - Setup local agent + My Stack dropped from main nav. Setup is the /home install flow's home now; My Stack lives as a tab inside /marketplace. /home tweaks: - Plugin marketplace tile now points at /marketplace (was /store — legacy from before the marketplace rebrand landed in #230). - "What's new" section header gets a green band (success-flavored D1FAE5 background, A7F3D0 border, darker green title) so the bottom-of-page news block visibly distinguishes from the blue install-hero at the top. Header strip only — body stays white. Test fix: test_home_route_resolution renamed `dashboard_link_uses_home_route` → `home_link_uses_home_route` and asserts `href="/home">Home` instead of `href="/home">Dashboard` after the label change. * fix(home): decouple Step 3 + Connect-tools collapse from server onboarded flag The server-side `users.onboarded` flip happens through two paths: 1. Explicit user click on "Mark me as onboarded" or `agnes onboarded on`. 2. Implicit `agnes init` POST → /api/me/onboarded on success. Path 2 produced a UX surprise: an analyst running `agnes init` mid-flow reloaded /home and saw Step 3 (auto-mode) + Connect-your-tools auto- collapse to summary bars. They were actively working through those sections — the install POST never signalled "I'm done with the rest of setup", just "Agnes itself is installed". Decouple the section-collapse decision from the server flag: - Step 1 + Step 2 install blocks: still hidden on `onboarded=TRUE` (their completion is a hard server signal — Agnes IS installed). - Step 3 + Connect-your-tools: render flat by default in BOTH states. Wrapped in `<details class="setup-collapsible" open>` so the browser's native disclosure handles per-section toggle without JS, but the `<summary>` is CSS-hidden until the page-level `data-setup-minimized="1"` attribute is set on `.home-mock`. - New "Minimize setup view" toggle inside the blue install-hero, rendered only when onboarded. Click flips the data-attr on `.home-mock` AND removes the `open` attribute from each `<details>`. State persists in `localStorage["agnes_home_setup_minimized"]` so the choice survives reloads but is per-device. - "Show full setup view" (the same button when minimized) re-opens both `<details>` and clears localStorage. When minimized, each `<details>` still has its own native expand/ collapse — click the gray summary bar to peek at one section without toggling the page-level minimize off. Tests: - test_step3_and_connectors_render_flat_when_onboarded_by_default — asserts `<details class="setup-collapsible" ... open>` for both sections post-onboarding and the absence of any server-rendered `data-setup-minimized` attribute on the `.home-mock` root. - test_minimize_toggle_visible_only_when_onboarded — toggle button rendered only when onboarded. Full pytest holds at 12 pre-existing failures (same set).	2026-05-08 18:28:47 +02:00
ZdenekSrotyr	df2c33147c	fix: Devin Review on #194 round 2 — 3 BUG-class findings 1. instance.yaml overlay path now matches read site under STATE_DIR. Three sites updated: - app/api/admin.py:1005 (server-config endpoint writer) - app/api/admin.py:2610 (configure endpoint writer) - app/instance_config.py:106 (overlay reader) All three now go through _state_dir() so under flat-mount layout (STATE_DIR=/data-state) the irreplaceable instance.yaml overlay lands on the state disk (sdc) instead of the regenerable data disk (sdb). Without this fix, .env_overlay correctly went to the state disk while instance.yaml went to the data disk — config would be lost if an operator wiped sdb. 2. Strip customer-specific tokens from OSS repo per CLAUDE.md vendor-agnostic rule: - docker-compose.host-mount.yml: 'a deployer (Groupon FoundryAI)' → 'a deployer in production' - docker-compose.flat-mount.yml: 'caused 2026-05-05 in the Groupon FoundryAI deployment' → generic 'production failure mode' - docs/state-dir.md: rewrote the incident reference to describe the failure mode abstractly without naming the deployment; updated the recommendation table to say 'shadow-mount class' instead of dating the specific incident. 3. Updated docs/state-dir.md 'What reads STATE_DIR' to list all read/write sites including the three migrated in this round (admin.py, instance_config.py, marketplaces.py). ANALYSIS finding (tls-rotate.sh hardcoded host-mount.yml) deferred — same operator-side class as auto-upgrade.sh hardcoded host-mount, documented limitation per the PR body.	2026-05-05 20:02:50 +02:00
ZdenekSrotyr	9f33e24bf9	fix(config): overlay-aware LLM consumers + env-ref resolution (#179 review) Devin BUG: /api/admin/configure seeds an ai: block to the writable overlay at DATA_DIR/state/instance.yaml, but the three LLM consumers imported from config.loader.load_instance_config — which reads the static config dir only. Even if they had read the overlay, the loader ran yaml.safe_load directly without passing through _resolve_env_refs, so '${ANTHROPIC_API_KEY}' would have stayed a literal placeholder. The pipeline appeared to work because the factory falls back to the env var directly, but the overlay path itself was dead code. Two fixes, both required: 1. Switched the three LLM consumers to app.instance_config.load_instance_config: - services/corporate_memory/collector.py:collect_all - services/verification_detector/__main__.py:main - app/api/admin.py:run_verification_detector 2. app/instance_config.py runs the loaded overlay through config.loader._resolve_env_refs before the deep-merge, so '${ANTHROPIC_API_KEY}' resolves at config-load time. New regression suite tests/test_instance_config_overlay.py pins: - env-ref resolution against the overlay (resolved when env set, empty when env missing — never the literal placeholder) - deep-merge still preserves static-only sections - the three consumers reach app.instance_config (inspected via inspect.getsource so a future refactor that reverts the import fails the test) - end-to-end: a seeded overlay + ANTHROPIC_API_KEY env reaches the factory with a resolved api_key	2026-05-05 05:57:22 +02:00
ZdenekSrotyr	d055417377	feat(config): default welcome template in jinja2 + sync_interval	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	83adf01bde	fix(v2): #134 BigQuery cross-project errors return structured 502/400 + BqAccess facade (#138 ) * docs(spec): #134 unify BigQuery access behind BqAccess facade Brainstorm output for issue #134. Captures: - root cause (incl. correction of the issue's hypothesis about commit 33a9964) - BqAccess facade API + project resolution rules - error contract — typed BqAccessError mapped to HTTP 502 for upstream BQ failures, 500 for deployment/config bugs - migration plan for v2_scan, v2_sample, RemoteQueryEngine - test rewrite eliminating _bq_client_factory injection point - E2E verification protocol on agnes-development as success criterion * docs(spec): #134 revise after first review Incorporates code-reviewer findings: Must-fix: - Add v2_schema (2 copies of INSTALL/LOAD/SECRET dance) to migration scope. - Reframe v2_scan headline: missing try/except around BQ calls is the actual cause of bare 500s, not project resolution (which 33a9964 fixed). - List two more deferred call sites (extractor.py, register_bq_table) with explicit rationale. Important: - Drop billing != data clause from cross_project_forbidden heuristic; rely only on 'serviceusage' substring. billing != data is normal for cross-project setup, was over-classifying. - Split bq_bad_request into _user (400) and _server (502) variants; add sql_origin parameter to translate_bq_error so call sites declare whether SQL contains user input. - Add @functools.cache to BqAccess.from_config; document tests bypass via dependency_overrides. - Replace monkey-patched-classmethod test pattern with BqAccess(client_factory=...) injection at construction time. Cleaner than today's _bq_client_factory and 1:1 migration shape. - Keep BqProjects.data (reviewer assumed registry has source_project; it doesn't). Multi-project explicitly listed as non-goal with note. Nice-to-have: - Add 'Implementation strategy' section: 2 staged commits (bug fix alone is revertable; refactor follows). - Extend E2E protocol to cover all three endpoints, not just /sample. - Note removal of stale docstring at src/remote_query.py:204. * docs(spec): #134 revision 3 — incorporates second-round review Must-fix from second review: - v2_schema split into two migration cases: _fetch_bq_schema translates errors via translate_bq_error; _fetch_bq_table_options preserves its swallow-all 'except Exception → return {}' so /schema doesn't 502 on partition-info failures. - RemoteQueryEngine.__init__ now resolves BqAccess lazily (in _get_bq_client, not in __init__). Without this, ~7 DuckDB-only tests in test_remote_query.py would suddenly fail with not_configured. - translate_bq_error pass-through for BqAccessError is now load-bearing (clause 1, before any Google-API branch). bq.client() raises BqAccessError for bq_lib_missing/auth_failed; without explicit pass-through those fall to 'unknown' and re-raise as bare 500. - Commit 1 now emits the SAME structured response shape as commit 2 to avoid contract churn between commits. - BIGQUERY_PROJECT env-var precedence is BREAKING for env-only deployments — flagged in CHANGELOG ### Changed. Editorial: - sql_origin renamed to bad_request_status with values 'client_error' / 'upstream_error' (clearer about what the parameter actually decides). bq_bad_request_user/_server kinds collapsed to bq_bad_request (400) and bq_upstream_error (502). - CLI (cli/commands/query.py) noted as external RemoteQueryEngine caller; unaffected because new bq_access kwarg has default None. - Added unit/integration tests for the new contracts: test_translate_passes_through_BqAccessError, test_v2_scan_returns_500_on_bq_lib_missing, test_v2_schema_returns_200_with_empty_partition_on_bq_failure, test_resolve_succeeds_after_config_set. - E2E protocol now covers /schema as the fourth endpoint. - Documented functools.cache-doesn't-cache-exceptions semantics and fixture nullcontext-doesn't-close caveat for nested sessions. * docs(spec): #134 revision 4 — incorporates third-round review Third reviewer verdict: 'implementation-ready with two trivial edits'; explicitly noted prior rounds did the heavy lifting. Edits: 1. get_bq_access() module-level function instead of @classmethod @functools.cache from_config. Removes the classmethod-cache stacking footgun (different Python versions wrap differently) and gives FastAPI's dependency introspection a clean function signature. Drops the 'Do not subclass BqAccess' caveat that no longer applies. 2. Commit 1 strategy explicitly: wrap _fetch_bq_sample (v2_sample), _bq_dry_run_bytes + _run_bq_scan (v2_scan), and _fetch_bq_schema (v2_schema strict block). Do NOT touch _fetch_bq_table_options swallow-all in commit 1 — preserved as-is, then migrated (still preserved) in commit 2. All three endpoints emit the same structured body shape so client parsers see one consistent contract throughout the staged rollout. No more half-rolled-out window where /sample is bare 500 while /scan is structured 502. * docs(plan): #134 implementation plan — Phase 1 (atomic bug fix) + Phase 2 (BqAccess refactor) + Phase 3 (verification) Bite-sized TDD tasks. 3 phases, 16 tasks total: Phase 1 (Commit 1) — atomic bug fix across all four v2 endpoints: Tasks 1.1-1.5 wrap _fetch_bq_sample, _bq_dry_run_bytes, _run_bq_scan, _fetch_bq_schema with structured 502/400 try/except. _fetch_bq_table_options preserved untouched. CHANGELOG Fixed entries. Phase 2 (Commit 2) — BqAccess facade extraction + migration: Tasks 2.1-2.5 build connectors/bigquery/access.py bottom-up (BqProjects, BqAccessError, translate_bq_error, default factories, BqAccess class, get_bq_access module-level cached). Task 2.6 adds conftest.py fixture. Tasks 2.7-2.9 migrate v2_scan, v2_sample, v2_schema to BqAccess. Tasks 2.10-2.11 migrate RemoteQueryEngine + tests (lazy bq_access, drop _bq_client_factory). Task 2.12 CHANGELOG Changed BREAKING + Internal. Phase 3 — Verification: 3.1 full pytest. 3.2 squash into two PR-shape commits. 3.3 manual E2E on agnes-development per spec protocol → close #134. Self-review table maps spec sections to implementing tasks; no gaps. * fix(v2): #134 structured 502/400 on BQ errors across /scan, /scan/estimate, /sample, /schema Wraps the BigQuery call sites in v2_scan, v2_sample, and v2_schema (strict block only) with try/except for google.api_core exceptions, translating to HTTPException with a structured body shape: {error, message, details}. Fixes Pavel's report (#134) where these endpoints returned bare HTTP 500 with no body when the SA on agnes-development hit cross-project Forbidden on serviceusage.services.use. Also fixes /sample's missing billing_project fallback (the bug 33a9964 fixed for /scan never landed here). Status code split: - /scan, /scan/estimate: BadRequest -> 400 (bq_bad_request) since SQL is user-derived from req.select/where/order_by. - /sample, /schema: BadRequest -> 502 (bq_upstream_error) since SQL is server-constructed from validated identifiers. - All Forbidden -> 502 with cross_project_forbidden if 'serviceusage' in error message (with hint pointing at data_source.bigquery.billing_project), else bq_forbidden. Body shape matches what the upcoming BqAccess refactor (next commit) will produce, so client-side parsers see one consistent contract throughout the staged rollout. _fetch_bq_table_options preserved exactly as-is — its swallow-all-and-return-empty contract is intentional and survives into the refactor; /schema continues to return 200 with empty partition info when partition queries fail. Outer wraps in scan_endpoint, scan_estimate_endpoint, sample, and schema endpoints exist only to make the test pattern (monkeypatching whole _fetch_* functions) work, and are tagged TODO(#134 Phase 2) for removal once BqAccess centralizes translation. * refactor(bq): #134 BqAccess facade — unify v2_scan, v2_sample, v2_schema, RemoteQueryEngine Extracts the duplicated BigQuery-access pattern (project resolution + client construction + DuckDB-extension session + Google-API error translation) into connectors/bigquery/access.py. Migrates four call sites to use it: - app/api/v2_scan.py — _bq_dry_run_bytes, _run_bq_scan - app/api/v2_sample.py — _fetch_bq_sample - app/api/v2_schema.py — _fetch_bq_schema (strict translation), _fetch_bq_table_options (preserves swallow-all best-effort contract) - src/remote_query.py — RemoteQueryEngine, lazy bq_access kwarg The new module exposes: - BqProjects (frozen dataclass: billing + data project IDs) - BqAccessError (typed exception with HTTP_STATUS class mapping) - BqAccess (facade with injectable client_factory/duckdb_session_factory for tests; defaults call the real google-cloud-bigquery + DuckDB extension) - get_bq_access (module-level @functools.cache; FastAPI Depends target) - translate_bq_error (Google API exception → BqAccessError mapper, with BqAccessError pass-through, 'serviceusage'-substring heuristic for cross_project_forbidden, and bad_request_status param distinguishing user-derived (400) from server-constructed (502) SQL) - _default_client_factory, _default_duckdb_session_factory RemoteQueryEngine.__init__ no longer accepts _bq_client_factory; tests migrate to bq_access=BqAccess(projects, client_factory=...). DuckDB-only RemoteQueryEngine tests need no changes — bq_access defaults to None and get_bq_access() is only invoked on first BQ call (lazy resolution). BqAccessError raised internally is translated to RemoteQueryError( error_type="bq_error") in _get_bq_client to preserve the engine's existing public contract — CLI and /api/query/hybrid callers see no change. Endpoint tests (test_v2_scan, test_v2_scan_estimate, test_v2_sample, test_v2_schema) migrate from monkey-patching whole _fetch_* functions to using the new bq_access fixture in tests/conftest.py — which exercises the REAL translation path through BqAccess + translate_bq_error, closing the test gap flagged in Task 1.1's review. Side-effect behavior change: v2_sample's FROM clause now uses the data project (instance.yaml data_source.bigquery.project), not the conflated billing_project from Phase 1. Documented in CHANGELOG ### Internal. BREAKING for deployments combining BIGQUERY_PROJECT env var with data_source.bigquery.project in instance.yaml — env var now overrides data project too. See CHANGELOG ### Changed. Two known-duplicate BQ-access sites (connectors/bigquery/extractor.py, scripts/duckdb_manager.register_bq_table) explicitly out of scope; tracked as follow-up. Removed stale docstring at the previous src/remote_query.py:204 that referenced scripts.duckdb_manager._create_bq_client as the default BQ client factory (RemoteQueryEngine never actually used that function). Test counts: tests/test_bq_access.py +27 (new), tests/test_v2_.py + tests/test_remote_query.py migrated to bq_access fixture (counts unchanged or +1-2 per file). Full suite: 2086 passed, 8 pre-existing failures (DB migration tests with unrelated internal_roles DependencyException — not introduced by this PR). fix(bq_access): translate DefaultCredentialsError to BqAccessError(auth_failed) CI on PR #138 caught: bigquery.Client(...) resolves Application Default Credentials at construction time; without ADC (CI without SA key, dev laptop without 'gcloud auth application-default login') it raises google.auth.exceptions.DefaultCredentialsError synchronously. Pre-fix _default_client_factory only caught ImportError, so DefaultCredentialsError propagated as raw exception — and from production endpoints would surface as bare 500 (the exact failure mode #134 sets out to fix). Now translates to BqAccessError(kind='auth_failed', details.hint='Run gcloud auth application-default login...'). Endpoint catch chain returns HTTP 502 with structured body. Adds unit test test_raises_auth_failed_on_default_credentials_error. Third-round spec review flagged this case in passing; the fix didn't land. CI's auth-less environment surfaced it. * fix(bq_access): get_bq_access() returns sentinel instead of raising when not configured Devin BUG_0001 on PR #138 review: 'get_bq_access() as FastAPI Depends breaks all v2 endpoints for non-BigQuery instances'. Pre-fix: get_bq_access() raised BqAccessError(not_configured) when neither BIGQUERY_PROJECT env nor data_source.bigquery.project was set. Because FastAPI resolves Depends() BEFORE the endpoint body runs, this exception fires during dep-injection — the endpoint's try/except BqAccessError clause never gets a chance to catch it. Result: every v2 request on Keboola-only or CSV-only instances returned bare HTTP 500, even for local-source tables that never touch BigQuery. Fix: get_bq_access() now returns a sentinel BqAccess with empty BqProjects and factories that raise BqAccessError(not_configured) on actual use. Construction succeeds, FastAPI's dep-injection cleanly yields the sentinel, the endpoint runs. The local-source code path in build_sample / build_schema / etc. never calls bq.client() or bq.duckdb_session() (it reads parquet directly), so non-BQ tables return 200 as before. Only when an endpoint actually tries to query BQ (source_type == 'bigquery') does the sentinel raise — and the endpoint's existing except BqAccessError catches it normally, returning structured 502 with hint. Test get_bq_access::test_raises_not_configured_when_neither_set renamed and rewritten to test_returns_sentinel_when_neither_set: asserts BqAccess is returned, then asserts client() and duckdb_session() each raise BqAccessError(not_configured) on call. Test test_does_not_cache_exceptions removed (no longer applicable) and replaced with test_sentinel_is_cached_per_process documenting the operator-restart-on-config-change contract. * docs(spec+plan): #134 genericize customer-specific tokens (CLAUDE.md OSS rule) Devin BUG_0001/0002 round 3 on PR #138: spec and plan docs contained customer-specific deployment hostnames, deployment names, and a GCP project ID that violated CLAUDE.md's vendor-agnostic OSS rule ('Nothing customer-specific belongs in code, configuration defaults, comments, docs, commit messages, PR titles, or PR bodies'). Replacements: agnes-development.groupondev.com -> <your-agnes-host> agnes-development -> <your-dev-instance> prj-grp-dataview-prod-1ff9 -> <your-data-project> s1_session_landings -> <bq_table_id> E2E verification semantics unchanged — operators still run the same four curls + config flip + retry, just substituting their own host / deployment name / project / table. * fix(bq_access): hook get_bq_access.cache_clear into instance_config.reset_cache Devin ANALYSIS_0004 on PR #138: get_bq_access is @functools.cache'd at process level, so it captures BigQuery project IDs at first call and ignores subsequent instance.yaml changes. Pre-Phase-2 the v2 endpoints re-read get_value() on every request, so admin /api/admin/server-config saves (which call instance_config.reset_cache()) hot-reloaded the BQ project. Without this fix, my refactor silently regresses that contract — operators editing instance.yaml via the admin UI would see no effect on v2 endpoints until container restart. instance_config.reset_cache() now also calls connectors.bigquery.access.get_bq_access.cache_clear() (lazy import, swallowed if connectors module isn't loaded — keeps instance_config usable in isolated unit tests). Adds test_instance_config_reset_cache_invalidates_get_bq_access as regression guard. Updates CHANGELOG Internal entry to mention the hot-reload contract + the not-configured sentinel behavior (round-3 fix from Devin BUG_0001 was previously only in commit message). * fix(bq_access): surface not_configured before identifier validation + plan path genericize Devin BUG_0001 + BUG_0002 round 5 on PR #138. BUG_0001 (plan doc): personal filesystem path violated CLAUDE.md vendor-agnostic rule. Replaced with '<worktree-root>' placeholder. BUG_0002 (sentinel error path): when get_bq_access() returns the sentinel BqAccess (BQ not configured), the empty bq.projects.data was reaching validate_quoted_identifier first and raising ValueError -> endpoint mapped to HTTP 400 'unsafe_identifier' instead of structured 500 'not_configured' with hint. Each fetch helper now checks 'if not bq.projects.data: bq.client()' as the first step, which triggers the sentinel's BqAccessError(not_configured). Endpoint catches the typed error and returns HTTP 500 with hint pointing at data_source.bigquery.project. Best-effort _fetch_bq_table_options returns {} silently in this case (preserves the swallow-all contract). * fix(bq_access): classify DuckDB-native exceptions from bigquery_query() via string match Devin ANALYSIS on PR #138 review (latest round). The DuckDB bigquery extension is a C++ plugin making its own HTTP calls — when BQ returns 403, it throws duckdb.IOException with the BQ error embedded as text, not gax.Forbidden. translate_bq_error's isinstance checks would miss these, falling to case 7 → bare 500 in production for v2_scan, v2_sample, and v2_schema (the bigquery_query() paths). Fix: last-resort string-match heuristic before the re-raise. 'Forbidden' / '403' / 'Bad Request' / '400' in the lowercased message classifies via the same kind hierarchy. The 'serviceusage' substring still distinguishes cross_project_forbidden from bq_forbidden. Specific enough that random exceptions without HTTP-error keywords still re-raise. Adds 4 unit tests covering the new heuristic + the 'don't swallow random exceptions' invariant. * chore(release): cut 0.22.0 PR #138 contains issue #134 user-visible behavior changes: - BREAKING: BIGQUERY_PROJECT env var now overrides instance.yaml data_source.bigquery.project for v2 endpoints (previously RemoteQueryEngine billing only). - Fixed: structured 502/400 on /api/v2/sample, /scan, /scan/estimate, /schema when BigQuery raises Forbidden/BadRequest (was bare 500). - Internal: BqAccess facade refactor unifying four duplicate BQ-access call sites; instance_config.reset_cache() now invalidates BqAccess cache too so admin server-config saves hot-reload BQ project IDs. Bumps to 0.22.0 because PR #137 merged first and took 0.21.0.	2026-04-30 10:11:20 +02:00
ZdenekSrotyr	a222f92e70	feat(admin): server configuration editor + 0.13.0 (#107 ) Adds /admin/server-config UI for editing instance.yaml from the web. Hardening: SSRF gate on data_source URLs, narrow-overlay write strategy, atomic writes, audit log with secret masking on shape changes, threading lock on read-modify-write, corrupt-overlay refusal on write side + louder log on read side, modal Promise resolution on backdrop dismiss, sentinel scrub on save (defense-in-depth client+server). Bundles Windows PowerShell wrapper from #80. Cuts release v0.13.0.	2026-04-29 00:47:23 +02:00
ZdenekSrotyr	49f109bf73	fix: address PR review findings — config write, CalVer, error handling - Config writes to DATA_DIR/state/instance.yaml (writable) instead of CONFIG_DIR (read-only :ro in Docker) - instance_config.py checks DATA_DIR/state/ first, then falls back to CONFIG_DIR for backward compat - CalVer counter is now global across channels (-YYYY.MM.) per spec - Keboola error messages sanitized — log full error, return generic msg - chmod in secrets.py wrapped in try/except for Windows compat - Setup wizard JS handles 401 (expired JWT) with user-facing message - deploy.yml changed to workflow_dispatch only (no duplicate test runs) - Smoke test uses docker-compose.prod.yml + AGNES_TAG instead of sed - docker-compose.prod.yml uses ${AGNES_TAG:-stable} env var 663 tests pass. 8 E2E verification tests pass.	2026-04-10 13:16:40 +02:00
ZdenekSrotyr	7f523788c2	fix: correct YAML path for instance name and subtitle get_instance_name and get_instance_subtitle now look up the nested instance.name and instance.subtitle keys to match the YAML structure.	2026-04-09 16:31:56 +02:00
ZdenekSrotyr	1287e63ed9	feat: complete system — web UI, all API endpoints, governance, admin, CLI commands Major additions: - Web UI: Jinja2 templates in FastAPI (login, dashboard, catalog, corporate memory, admin) - API: catalog profiles/metrics, telegram verify/unlink/status, admin table registry CRUD - Corporate memory governance: approve/reject/mandate/revoke/edit/batch + audit log - Sync: real DataSyncManager trigger, sync-settings, table-subscriptions - CLI: setup (init/test/deploy/verify), server (logs/restart/deploy/backup), explore - Instance config integration (instance.yaml loaded at startup) - 140 tests passing (25 new)	2026-03-27 16:52:22 +01:00

21 commits