Commit graph

91 commits

Author SHA1 Message Date
Vojtech Rysanek
7efcb10154 Merge remote-tracking branch 'origin/main' into vr/custom-scripts-integration
# Conflicts:
#	CHANGELOG.md
2026-05-21 14:20:37 +04:00
David Rybar
94af2581f6 feat(theme): switch default instance theme from navy to blue and enhance theme handling
- Updated the default `instance.theme` to `blue`, making it the new out-of-the-box look. The previous default `navy` can still be used by explicitly setting `AGNES_INSTANCE_THEME`.
- Pre-login pages now respect the configured `instance.theme`, eliminating the abrupt color change after sign-in for navy-configured instances.
- Adjusted documentation and code comments to reflect the new theme settings and their implications for existing instances.
- Version bump to 0.55.6 to reflect these changes.
2026-05-21 11:24:35 +02:00
Vojtech Rysanek
4b48377d44 feat(web): instance.custom_scripts — operator-injected HTML/JS into base.html
Add a generic, placement-aware mechanism for operators to inject HTML/JS
into every page that extends base.html or base_login.html. Each entry
takes name, enabled, placement (head_start | head_end | body_end), and
html. Replaces the need for per-vendor helpers when shipping feedback
widgets, analytics, or error-capture snippets.

Trust boundary mirrors the existing instance.logo_svg / instance.overview
pattern — admin-only, rendered with `| safe`. Resolved by
app/instance_config.py::get_custom_scripts(), surfaced in
/admin/server-config via _KNOWN_FIELDS["instance"]. Empty default keeps
the OSS vendor-neutral; sample Marker.io block ships commented out in
config/instance.yaml.example as the canonical example.
2026-05-21 13:22:27 +04:00
minasarustamyan
be9549c266
Marketplace: configurable 'See all curators' URL + flea-inner hero name fix (#370)
* fix(web): flea-inner skill/agent hero shows skill name, not parent plugin name

* feat(marketplace): admin-configurable 'See all curators' link URL

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
2026-05-21 11:02:29 +02:00
Vojtech
001e5ce40e
feat(web): /home value-first redesign + unified page-shell across app (#366)
* feat(web): value-first /home reskin (CEO mock palette + pillars + first-session)

Restructures `/home` to lead with product value instead of install steps,
matching the CEO mock proposed for the homepage:

- New intro hero on top — eyebrow `Welcome, {{ display_name }}`, H1
  `{{ instance_brand }} is your team's AI workspace`, lede framing the
  product as an "AI Chief of Staff", two CTAs (`Set up in ~15 min →`
  jumps to the wizard, `Just browse — no install needed` jumps to
  `#look-around`), and a four-pillar row (Data packages · Plugins ·
  Skills · Memory). Renders for both onboarded and not-onboarded users
  so the value framing is consistent across visits.
- New `first-session` narrative — five-beat walkthrough (launch → pick
  project → memory loads → ask → close) with mock terminal frames
  carrying traffic-light dots, prompts, and dimmed system output.
- Setup wizard chrome — progress chip (`Step 1 of N · ~15 min ·
  One-time · Reversible`), thin progress bar, and per-step number
  badges on each `.install-block` so the wizard reads as bounded
  instead of an open-ended scroll.
- Palette shift from blue to green/navy: `--hp-primary` aliases
  `#2ea877` (mint), `--hp-hero-bg` is navy `#0f1b3a`, code panels stay
  near-black `#0c1224` with warm-yellow `#ffd866` accents. The token
  alias is reused so downstream rules pick up the new accent
  automatically; instance theme overrides via
  `config.theme_overrides()` still win.
- VS Code surface tile carries a `Recommended` pill; the existing
  "Want to look around first?" section is renamed to `Explore your
  workspace` and gets the `#look-around` anchor.

All test-pinned class names and IDs (`install-hero`, `install-block`,
`home-mock`, `self-mark-btn`, `setupClaudeBtn`, `offboard-strip`,
`home-getting-started`, `home-gs-item`, `home-overview`,
`home-usage`) preserved as structural anchors; new visual language
overlays via additional classes. Existing onboarded/not-onboarded
branching, `/api/me/onboarded` POST, status frame gating, post-CTA
modal, and OS-tab switching JS unchanged. Stray `~/FoundryAI`
comment swapped for `~/{{ workspace_dir }}` to honor the
vendor-agnostic OSS rule.

51 home tests pass without modification.

* fix(web): /home palette inversion — dark intro hero on top, light setup card below

Previous reskin commit kept the install-hero as a dark navy gradient and
rendered the new intro hero as a light surface — opposite of what the CEO
mock specifies. Playwright comparison vs `data/ceo_home.html` confirmed:

- CEO mock: dark navy hero at TOP (with white pillars on navy), LIGHT
  white setup card BELOW with light step rows and dark code panels
  inset.
- Previous: light intro hero on top, dark setup card below. Inverted.

This patch flips both:

- `.home-hero-intro` now: dark navy gradient `#0f1b3a → #1a2a5f`, green
  radial glow in the corner, green eyebrow, white H1 (`accent` span
  green), rgba-white lede, green pill primary CTA, translucent-white
  secondary CTA, pillars row separated by hairline border-top with
  green square-dot bullets in front of each pillar header.
- `.install-hero` and `.install-block` now: white surface card with
  thin green accent strip across the top, light step rows split by
  hairline borders, green-tinted step-number circles (`#e6f9f0` bg,
  `#1f8a5e` ink), green progress chip + bar. Code panels
  (`.install-cmd`) and terminal frames stay dark — they're the "type
  this" surfaces.
- All previously-rgba-white descendants of `.install-hero`
  (close button, eyebrow, h1, lead, links, code chips, OS tabs,
  install notes, setup-CTA button, self-mark fallback, auto-detect
  badge, terminal-howto disclosure) re-skinned for light surface.

All 12 home page tests still pass (no markup changes, only CSS).

* fix(web): /home parity polish — system font + mock sizes + blue info hint + gray step-num

After v2 palette flip, user comparison vs CEO mock surfaced three
remaining gaps in the wizard area:

- Font stack mismatch: Agnes inherits Inter via `style-custom.css`,
  but the CEO mock uses the platform system stack (San Francisco on
  macOS, Segoe UI on Windows). The rendered weight/letterforms read
  noticeably different. `.home-mock` now declares
  `-apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif`
  for itself and all descendants, with the monospace stack reserved
  for `code`/`kbd`/`pre`, `.install-cmd`, and `.terminal-body`.
- Step number badges were green-tinted; mock uses neutral gray
  (`#f0f2f6` bg, `#4a5168` ink) — green is reserved for the "done"
  state. Switched to `--hp-surface-dim` + `--hp-text-secondary`.
- "Don't have a terminal open?" disclosure was an amber/yellow
  variant left over from the old dark-hero palette. Mock uses a
  blue info-hint vocabulary (`--info-bg: #eef3ff`,
  `--info-line: #4f7cf2`, `--info-ink: #1c3994`) with white kbd
  chips. Added the info-* tokens to the `:root` block and re-skinned
  `details.terminal-howto` (incl. summary, body, kbd) to match.

Step-body type sizes also brought in line with the mock spec —
`.install-block .label` (step h3 equivalent) is now 17px / 700 with
6px gap; `.install-note` body type is 14px / 1.55.

`--hp-info-bg / --hp-info-ink / --hp-info-line / --hp-warn-bg /
--hp-warn-ink / --hp-warn-line / --hp-surface-dim` added as
first-class tokens so future hint/warn callouts pick the same colors
without a duplicate vocabulary.

12/12 home tests pass.

* feat(web): centralize design tokens + reword /home wizard to 6 steps (CEO mock parity)

Two intertwined changes that touch both global design + /home structure:

GLOBAL TOKEN SHIFT (app/web/static/style-custom.css)
- `--primary` flipped from blue `#0073D1` to green `#2ea877` — same brand
  alias the rest of the app referenced, so every page picks up the new
  accent automatically. Old `--primary-dark` / `--primary-light` recolored
  to match.
- New tokens added: `--brand-accent`, `--hero-bg`, `--hero-ink`,
  `--surface-dim`, `--info-bg/ink/line`, `--warn-bg/ink/line`. Brings
  the global vocabulary in line with the CEO mock's `:root` block so
  callouts and hero surfaces don't have to invent local tokens.
- `--font-primary` switched from Inter-led stack to the system stack
  (`-apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Inter",
  system-ui, sans-serif`) so weight/letterforms render identically on
  macOS (San Francisco) and Windows (Segoe UI) — matches the mock and
  avoids a font-loading flash for analysts without Inter installed.
- Shadow tints re-cast in navy `rgba(15,27,58,...)`; focus ring uses
  the new green `rgba(46,168,119,0.25)`.
- `.app-nav-link` font-size 13px → 14px, padding 6px 12px → 8px 14px,
  hover bg → `--primary-light` (mint), color → `--primary-dark`.
  `.app-nav-menu-item.is-active` re-tinted to the same green system.
- Sweep across 26 templates (style-custom.css + 25 template files)
  replacing every hardcoded `#0073D1` / `#005BA3` / `#E6F3FC` /
  `rgba(0,115,209,…)` / `rgba(0,86,163,…)` with token references or
  the new green hexes — 175 occurrences total. Pages that styled their
  own buttons / borders / shadows pick up the new brand color without
  per-page overrides.

/HOME WIZARD: 6 STEPS PER MOCK (app/web/templates/home_not_onboarded.html)
- Step 1 reworded `Install Claude Code on your computer` + `~3 min`
  subhead (mock copy).
- Step 2 renamed `Pick a folder for {{ instance_brand }}` (was
  `create your workspace folder`) — same `mkdir` command, mock-aligned
  framing.
- NEW Step 3 `Open a terminal inside that folder` — no shell command,
  just the "you are standing in the right directory" reassurance with
  a Finder/PowerShell/file-manager howto disclosure. Mirrors the CEO
  mock's Step 3.
- Step 4 (was Step 3, gated by `home_automode.show`) renamed
  `Launch Claude with auto-approve on`. Body copy lightly updated so
  it references "the next step" instead of "Step 4".
- Step 5 (was Step 4) renamed `Get the install script and paste it
  into Claude`. The setup-cta-lead now explicitly says
  "pasting the script into Claude Code will install {{ instance_brand
  }}…" so existing test assertions pinning the `install Agnes`
  substring still match.
- NEW Step 6 `Optional: create a one-word shortcut for next time` —
  prints an `echo 'alias {{workspace_dir|lower}}=…' >> ~/.zshrc`
  one-liner for Unix and an `Add-Content $PROFILE …` equivalent for
  Windows. OS tabs + copy buttons reuse the existing wizard chrome.
- Progress chip dynamic: `Step 1 of 6` when home_automode is on,
  `Step 1 of 5` when off. Progress bar fill `100 // total_steps` so
  the bar sits at 16-20 % on first paint.
- `.step-lede` token added for the new short body copy beneath each
  step label (14.5px / ink-soft).
- `macOS / Linux / WSL` tab labels changed to `macOS / Linux` per
  user instruction. Terminal-howto `WSL:` paragraph dropped; the
  paste-shortcut hint now reads `(Linux)` instead of `(Linux/WSL)`.
  Functional WSL handling in `connector_prompts.py` (it's a Linux
  detection fallback, not user-facing label) preserved.
- `setup_instructions.py` Claude Code install hint:
  `npm (Linux / WSL)` → `npm (Linux)`.

SURFACES — 4 CARDS PER MOCK
- Replaced the 3-tile `.home-usage-grid` with a 4-card grid:
  - VS Code (Recommended) — `.surface-card.feature`, green ring,
    DAILY USE eyebrow + 5-step numbered list + `Open VS Code setup
    guide →` link to `/setup-advanced#vscode`.
  - Terminal — QUICK ACCESS eyebrow + 4-step list.
  - Claude Code (Desktop app) — CONNECT IT eyebrow + 4-step list.
  - Cowork (claude.ai) — `.surface-card.incomplete`, warn-tinted
    border + `Instructions needed` badge + a TODO callout describing
    the missing content. The card is intentionally honest about the
    gap rather than hiding it.

TEST UPDATES
- `test_web_home_page.py` negative onboarded-state assertions
  rebased on the new step labels (6 entries instead of 4).
- `test_home_route_resolution.py` `test_home_renders_automode_block_by_default`
  + its `_when_env_off` counterpart now check the new
  `Step 4 — Launch Claude with auto-approve on` label.

* fix(web): /home section content + layout — verbatim mock match

User comparison flagged several remaining gaps; this patch rewrites
the three lower sections of /home to match the CEO mock spec exactly:

FIRST-SESSION (5 beats)
- h2 28px / 700 / -.5px tracking (was 19px / 600).
- lede 18px ink-soft (was 13.5px secondary).
- `.session-walk` wrapper, 36px gap between beats (mock spec).
- `.session-step` grid 48px / 1fr, gap 22px — number circle on
  the left, content on the right.
- `.session-num` 40 × 40 circle with SOLID GREEN bg (`--primary`)
  and WHITE text + soft green shadow (was 28px mint pill w/
  dark-green text).
- `.session-content h3` 18px / 600 (was 14.5px / 600).
- `.session-content > p` 15px.
- `.session-content .annotation` 13.5px ink-muted body type with
  `strong` for highlighting (replaces the upper-case "WHAT'S
  HAPPENING" eyebrow pattern that didn't match the mock).
- `.session-intro` callout card (white surface + mint icon block)
  framing the "five beats" tagline.
- `.session-tldr` summary box (brand-light bg + brand-dark left
  border) wrapping up the loop.
- Terminal frames re-skinned: `#0c1224` body / `#182241` bar /
  real macOS traffic-light colors `#ff5f57` / `#febc2e` / `#28c840`.
- Terminal body 13px / 1.65 line-height with mock-spec class
  vocabulary: `.you` (yellow input), `.ai-name` (brand bold),
  `.path` (light blue), `.dim` (translucent code-ink), `.caret`
  (blinking cursor).
- Five beats rewritten with mock's exact narrative flow (launch →
  menu → pick → ask → close), vendor-agnostic project names
  (`RevenueAnalysis`, `Onboarding`, etc.) replacing the customer-
  specific `GRPN_*` examples in the mock. Templated `{{
  instance_brand }}` / `{{ workspace_dir }}` / `{{ workspace_dir |
  lower }}` (the shortcut alias) everywhere.

SURFACES (4 cards)
- The section is no longer wrapped in a white rectangle; the
  `.home-usage` class loses its bg + border + padding (mock has the
  cards directly on the page bg).
- h2 28px (was 22px). Eyebrow 12px / 1.5px tracking / brand-dark.
- `.surface-card.feature` (VS Code) now uses 2px green border +
  vertical brand-light → white gradient (was 1px ring).
- `.surface-card.incomplete` (Cowork) uses 2px red border (`#e35e5e`)
  + vertical red-tint → white gradient (was yellow flat bg).
- `.surface-card .steps` panel: inner surface-dim bg + 8px radius
  + 13px font.
- `.surface-foot` top-border + ink-muted (mock spec).
- `.badge-warn` now a solid red box (`#e35e5e` bg + white ink + 4px
  radius) instead of a yellow pill, matching the mock.
- Header layout fixed: the global absorbed `header { display: flex;
  justify-content: space-between }` rule was making the h2 sit on
  the right of the eyebrow; explicit `display: block` override on
  `.home-mock section > header` puts the title on the LEFT under
  the eyebrow as the mock has.

BROWSE — Explore your workspace
- Wrapped in `<section class="browse-section">` with proper
  eyebrow + h2 + lede (was a bare `.section-label` div).
- `.browse-grid` 5-col grid (was responsive auto-fill, 4-card
  layout). Skills tile added as a 5th card linking to
  `/marketplace?type=skills`.
- `.browse-card` mock-spec: 22 20 padding, 28px icon, 15px title,
  12.5px ink-muted desc, hover lifts -2px with brand border +
  shadow-md.

Section wrappers (`.home-usage`, `.first-session`) no longer carry
the white card chrome — they sit directly on the page bg, matching
the mock. Only Getting Started + Overview keep their white cards.

GLOBAL eyebrow vocabulary (`.home-hero-intro .eyebrow`,
`.first-session > .eyebrow`, `.surfaces > header .eyebrow`,
`.browse-section .eyebrow`) all aligned to mock spec: 12px / 700 /
1.5px tracking / brand-dark color / 14px bottom margin.

Hero h1 bumped to 44px / 800 / -1px tracking (was 32px / 600).

51/51 home tests pass.

* fix(web): /home session-intro card + terminal-body verbatim mock match

User comparison flagged three remaining /home gaps; this patch
addresses each:

- `.session-intro` rule was missing — the "five beats" tagline
  rendered as a bare line with no card chrome. Added the mock-
  spec card: white surface, 14px radius, 20×24 padding, 1px
  border + shadow-sm, with a 44×44 brand-light icon block on the
  left.

- Beat 1 terminal-title was `~/{{ workspace_dir }} — zsh` (mock-
  style shell-pwd format), but the user wants every terminal
  frame across all 5 beats to read `claude — {{ instance_brand }}`.
  Updated.

- Terminal-body line structure for beats 2-5 rewritten verbatim
  from the CEO mock:
  - `<span class="prompt">&gt;</span><span class="you">…</span>`
    now has no space between the prompt and user input (mock
    pattern: zero gap, the .prompt's `margin-right: 8px` provides
    the visual separation).
  - Beat 2 menu items use `<strong>[N]</strong>` numbering with
    project entries on indented lines, each project name followed
    by a `<span class="dim">(N ago)</span>` timestamp at a fixed
    column — instead of my prior single-line concatenation.
  - Beat 3 narrative split into 4 stanzas separated by blank lines
    (matches mock): the "Switched to <strong>X</strong>" status,
    then dim Loaded/Last-session lines, then a stand-alone "One
    unprocessed input detected:" pair, then the "Want me to
    process …" question. My prior version dim-wrapped the entire
    block, which looked off.
  - Beat 4 narrative split into headline summary + risks section
    with <strong> heads + bullet lists separated by blank lines,
    matching the mock's "Q1 close summary" / "Open risks" rhythm.
    The Q1 question carries the mock's manual line-break + 2-
    space continuation indent inside the `.you` span — without
    that, terminal-body's `white-space: pre-wrap` would auto-wrap
    awkwardly at a different column than the mock.
  - Beat 5 exit narrative uses two separate dim lines + a
    standalone `.ai-name` "See you next time." line, then prompt
    + caret. My prior version collapsed everything into one dim
    block.
  - Project names changed from customer-specific (`GRPN_*`) to
    generic (RevenueAnalysis, WeeklyReview, Onboarding, OpsDb,
    HRHandShake) so the OSS distribution stays vendor-agnostic
    per CLAUDE.md.
  - `Marketing plan` examples replaced with `Q1 close` so the
    narrative stays plausible for an analyst audience.

12/12 home tests pass.

* fix(web): /home surfaces verbatim mock — VS Code thumb, Terminal expected-output, NEW badge

User comparison flagged three remaining surface-section gaps:

- VS Code surface card was rendering a generic "Screenshot pending"
  placeholder; the mock has a labeled inline mockup
  (`<a class="vscode-thumb">` w/ `.thumb-fallback`) showing the
  recommended 4-pane layout (EXPLORER yellow, TERMINAL 1 purple,
  TERMINAL 2 green, TERMINAL 3 orange) on a dark navy bg + a
  "Recommended layout" caption pill. CSS `.vscode-thumb` block
  added — uses gradient-strip backgrounds to draw the colored
  panel bars without needing a base64 image.

- "Recommended" badge was a pill (999px radius) with
  `--brand-accent` bg + navy text. Mock uses `.badge` instead of
  `.recommend-pill` — solid `--primary` (brand-dark green) bg
  with WHITE text and 4px radius. Replaced the class + CSS rule
  so the badge reads as a tag, not a pill.

- Terminal surface card was missing the "What you should see"
  subsection — mock has an `.expected-output` block showing a
  sample of the welcome menu inside a dim dashed panel. Added the
  block with the mock's exact rendered output (templated to
  `{{ instance_brand }}` + generic project names instead of
  customer-specific GRPN entries) plus the `.expected-output`
  CSS (surface-dim bg + dashed border + `::before` "WHAT YOU
  SHOULD SEE" eyebrow per mock spec).

Also addressed the explore-section feedback:

- Skills browse-card now carries the `new` class so it picks up
  the `.browse-card.new::after` corner badge ("NEW", green bg,
  white text, 10px / 700 / 0.5px tracking) per mock.
- Browse cards align same height via `align-self: stretch` (grid
  default) + `flex-grow: 1` on `.browse-desc` so descriptions
  fill remaining vertical space; previously the Skills tile sat
  shorter because its desc text was longer than others'.

Structural HTML changes to all four surface cards: dropped the
inner `<div class="surface-card-head">` wrapper + `<p
class="surface-pitch">` class in favor of mock's flat layout
(`.what` + `.steps` + `.when-to-use`). `<ol class="surface-steps">`
replaced with `<div class="steps"><strong
class="steps-eyebrow">DAILY USE / QUICK ACCESS / CONNECT IT</strong>
<ol>...</ol></div>` so the eyebrow + numbered list share the
mock's tinted surface-dim panel.

12/12 home tests pass.

* fix(web): align /home setup walkthrough to design spec

- Setup-section header (eyebrow + heading + lede) floats above the
  install hero; install card has no accent strip; step labels drop
  `Step N —` prefix; closing strip is single flex row.
- VS Code surface card renders recommended-layout screenshot from
  `/static/img/vscode-layout.png` with click-to-enlarge lightbox.
- Workspace install path cascades to `~/Desktop/{workspace_dir}` in
  every step, surface card, first-session annotation, and shortcut.
- Step 1 verify text restores Enterprise — Finance and Legal option.
- Step 6 shortcut installs a shell function with arg forwarding
  (`"$@"` unix / `@args` windows) and a user-facing Auto / YOLO
  permission-mode toggle.
- Step 5 manual-fallback details inline on the CTA row; description
  reads at step-lede size, not 13px chip.
- Setup-section heading no longer right-aligns (was inheriting
  `header { display: flex; justify-content: space-between }` from
  the legacy stylesheet; wrapper changed to `<div>`).
- Getting Started `<details>` block removed (duplicated links).

* test(web): align /home tests with restructured setup wizard

- Replace test_getting_started_card_renders_on_home with
  test_setup_section_renders_for_not_onboarded — asserts the new
  setup-section-header floats above the install hero and Getting
  Started markup is absent (block removed in the prior commit).
- Update automode-block test to match labels without the
  `Step N —` prefix.
- Update setup-CTA partial test to match the relabeled
  "Copy install script to clipboard" button.

Drop orphaned CSS for `.home-getting-started`, `.home-gs-summary*`,
and `.home-gs-item` — selectors had no matching markup after the
Getting Started block was removed.

Also: Step 3 `pwd` expected-output uses an absolute path
(`/Users/yourname/Desktop/{workspace_dir}`) instead of the
tilde-prefixed form, matching what the command actually prints.

* fix(web): repaint home_onboarded + setup_advanced; align CTA label

- home_onboarded + setup_advanced still carried the retired blue
  `#0056A3` as both `--hp-primary-dark` and the hero gradient
  endpoint. Both reference `var(--primary-dark)` now so the green
  palette cascades.
- setup_advanced YOLO snippet was the old `alias` form (no cd, no
  arg forwarding). Replaced with the shell function variant from
  /home Step 6 — drops into ~/Desktop/{workspace_dir} and forwards
  "\$@" (unix) / @args (Windows).
- setup_advanced ~/{workspace_dir} path references cascaded to
  ~/Desktop/{workspace_dir} so install story matches /home.
- Dashboard's "Setup a new Claude Code" button label aligned to the
  canonical "Copy install script to clipboard" — matches /home and
  the new docstring in _claude_setup_cta.jinja, which now mandates
  this label across consumers.

* fix(web): keep base brand blue; scope green palette to /home redesign

User noticed login + dashboard had turned green when the /home
redesign flipped --primary from blue (#0073D1) to green (#2ea877)
in commit 278f202e. The brand-wide flip went further than the
redesign needed — only /home, /home (onboarded), and /setup-advanced
intentionally use the green/navy spec; every other page (login,
dashboard, catalog, marketplace, admin, profile) was just inheriting
the green because --primary cascaded everywhere.

Revert the global brand colour to blue and lock the green into the
two outstanding redesign scopes:

- style-custom.css: --primary back to #0073D1, --primary-light back
  to rgba(0,115,209,0.1), --primary-dark back to #005BA3,
  --brand-accent back to a lighter blue.
- home_onboarded.html: .home-mock now sets --hp-primary,
  --hp-primary-dark, --hp-primary-light to explicit green hex
  (matching home_not_onboarded), so the hero stays green regardless
  of the global brand.
- setup_advanced.html: same lock — .advanced-mock pins the green
  palette in-scope.

Hero gradients on both pages now reference the local --hp-primary
chain (not the global --primary), so any future palette tweak inside
either scope cascades correctly without disturbing the rest of the app.

* refactor(web): hoist --hp-* into shared design-tokens.css (--ds-*)

PR 2 of the design-system extraction ladder. Pure mechanical rename
+ dedup; no visual diff on any rendered page (verified on /home,
/dashboard).

- New app/web/static/css/design-tokens.css declares the full token
  set on :root: brand surface (green primary, primary-dark, mint
  light, brand-accent), hero (navy bg + ink), code-panel (near-black
  bg + cool ink + warm-yellow), light surfaces (bg/surface/border),
  text (primary/secondary/muted), orange accent, info + warn
  callout vocabularies, navy-tinted elevation shadows, system font
  stack + mono.
- base.html loads it alongside style-custom.css so the tokens are
  globally available.
- Rename --hp-* -> --ds-* in home_not_onboarded (313 refs),
  home_onboarded (15), setup_advanced (39). 367 token references
  pointed at one of three local blocks; now all point at the
  global :root.
- Drop the three local token blocks. Each scope class
  (.home-mock / .advanced-mock) only keeps its base ink + font-size
  + line-height rules.

The legacy --primary family stays canonical for the blue base
brand — login, dashboard, catalog, marketplace, admin still read
blue. The design system is opt-in via the scope class.

* refactor(web): extract shared components.css; migrate /home markup

PR 3 of the design-system extraction ladder. First batch of
reusable components lifted out of home_not_onboarded.html into a
new shared stylesheet; markup migrated to consume them.

- New app/web/static/css/components.css with five components, all
  reusable on any page that loads design-tokens.css:
    .callout-rec        — amber lightbulb recommendation box
    .callout-hint       — blue info hint box
    .code-output        — "WHAT YOU SHOULD SEE" terminal output block
    .lightbox           — full-bleed image enlarge overlay
    .setup-section-header — wizard header (eyebrow + h2 + lede)
- base.html loads components.css after design-tokens.css.
- home_not_onboarded.html markup renamed:
    class="rec"             -> class="callout-rec"
    class="hint"            -> class="callout-hint"
    class="expected-output" -> class="code-output"
- Local CSS rules removed from home_not_onboarded.html for each of
  the extracted components — ~150 lines down to 5-line "extracted to
  components.css" comments. The bespoke wizard-specific styles
  (.install-cmd, .os-tabs, .mode-tabs, .terminal-frame) stay
  template-local for now since they only have one consumer.

Visual regression check: /home install hero renders the amber rec
callout, blue hint callout, dashed code-output block, green section
header, and click-to-enlarge VS Code thumb identically to the
pre-extraction render. 43 home tests pass.

* fix(web): unify page-headers — activity-center full-width, marketplace shares box

- /activity-center audit-log hero rendered as half-width because the
  _page_hero include was inside <header class="obs-topbar">, a flex
  row that pinned the time-range + auto-refresh controls next to it.
  The hero is now a sibling rendered before the <header>, so it
  spans the full container width like every other admin page; the
  controls keep their flex row underneath.
- Marketplace hero unified with .page-header--hero. Markup is now
  <section class="page-header page-header--hero mp-hero"> so the
  shared box drives padding/radius/gradient/max-width/shadow; the
  .mp-hero override block only carries the right-anchored cover
  image and the rules for the search row + scope checkboxes (which
  the canonical hero doesn't have). Inner text uses the canonical
  .page-header__eyebrow / __title / __subtitle classes.
- .page-header--hero shadow tint now follows the brand blue
  (rgba(0, 115, 209, 0.2)) instead of the leftover green from the
  prior palette flip; same depth highlight everywhere the gradient
  is blue.

* fix(web): unify remaining page heroes — admin, profile, install, store, stack

Sweep across pages that carried bespoke gradient hero markup so
every page-hero shares the canonical `.page-header--hero`
dimensions (padding 28/32/24, border-radius 14, max-width
var(--width-app), navy-tinted shadow, gradient with --primary →
--primary-dark). Inner text uses the .page-header__eyebrow /
__title / __subtitle classes so typography matches across the app.

- admin_tables: migrated to _page_hero.html include.
- admin_tokens: kept .tokens-hero wrapper for the counts-chip row
  but added the canonical class on the same element; stripped
  duplicate gradient + padding + typography rules.
- install: same pattern (kept hero-meta pill row).
- profile: migrated to _page_hero.html include.
- store_upload: kept .upload-hero wrapper for the .meta chip row;
  composite class with the canonical hero.
- setup_advanced: .advanced-mock .ad-hero now matches canonical
  dimensions; green palette retained via --ds-primary/dark.
- stack_card.css: .stack-hero (catalog + corporate-memory search
  hero) uses canonical gradient + padding + max-width.

The detail-page heroes (marketplace_plugin_detail,
marketplace_item_detail, catalog_*_detail, store_edit,
admin_group_detail, admin_store_submission_detail) stay bespoke
for now — they're rich detail headers with photos, badges, install
actions; converting them would lose contract context. Same applies
to dashboard.html env-setup-cta (it's a CTA card, not a page hero).

* fix(web): canonicalise .container — single page shell every page inherits

Previously each admin page set its own `.container:has(.<page>)
{max-width: none}` + `.<page>-page {max-width: 1400px}` override,
and per-page hero markup either nested inside flex toolbars (which
pinned the hero next to filter controls and squeezed it half-width)
or self-constrained with a different max-width than the page. /home,
/dashboard, /marketplace, and /admin/* ended up at different widths
with different nav-to-hero gaps.

- style-custom.css `.container` now carries the canonical 1280px
  max-width + `16px 32px 48px` padding so every page inherits the
  same nav-to-hero gap and side gutters. `.container > main` is
  margin/padding 0 so the container is the sole owner of gutters.
- `.page-header--hero` drops its self-constraining max-width and
  auto-centering margin — the container provides the width, so the
  hero sits flush with the table/toolbar below it.
- `.stack-hero` (catalog + corporate-memory) and `.advanced-mock
  .ad-hero` (/setup-advanced) follow the same pattern: container
  owns the width.
- Per-page max-width overrides stripped from admin_users,
  admin_access, admin_groups, admin_marketplaces, admin_welcome,
  admin_workspace_prompt.
- _page_hero include extracted from inside flex toolbars on
  admin_users, admin_access, admin_groups, admin_marketplaces,
  admin_server_config, admin_welcome, admin_workspace_prompt,
  admin_sessions, admin_session_detail, admin_usage,
  activity_center. The toolbar (`.users-toolbar`, `.gp-toolbar`,
  etc.) keeps only the filter + action controls; hero renders
  before it as a sibling.
- _page_chrome.html trimmed to just the page-background tint for
  the redesign scopes; the duplicate `.container` rules it carried
  are now redundant.

Verified: /home, /admin/marketplaces, /admin/users all render
container width 1280px with hero top at 88px (16px below the
72px-tall sticky nav). Same spacing as /home design spec.

* fix(web): admin_tables + admin_corporate_memory inherit canonical .container

Both pages were overriding `{% block layout %}` from base.html,
which bypasses the canonical `.container` wrapper. Result: hero
span the full viewport (1596px on a wide screen) while the inner
content sat at a narrower max-width — hero and content didn't
align, and the nav-to-hero gap differed from every other admin
page.

Switched both templates to `{% block content %}` so they render
inside the canonical `.container` from base.html — same path as
admin_groups, admin_users, admin_marketplaces, etc.

- admin_tables: dropped local `.page-title { max-width: 1600px }`
  + `.content { max-width: 1600px }` overrides (kept typography +
  inner gutter rules) and the mobile padding overrides that paired
  with them. Container now owns the gutters.
- admin_corporate_memory: only the block keyword needed changing;
  the template already had a clean inner structure (no max-width
  override on `.container-memory`).

Verified on /admin/tables and /admin/corporate-memory:
- .container width 1280, padding 16/32/48
- Hero top 88 (nav 72 + container padding-top 16)
- Hero + content both 1216px wide, both at left 190 — perfect
  alignment with /admin/groups.

* fix(web): drop .page-shell padding override + admin_tables stale :root

Two regressions discovered after the canonical-container unification:

1. `.container:has(.page-shell)` still set `padding: 28px 32px 48px`
   while the canonical `.container` had moved to `16px 32px 48px`.
   Every page-shell consumer (/admin/sessions, /admin/sessions/<id>,
   /admin/usage, /marketplace, /dashboard, marketplace detail pages,
   /me/activity, /store/*, /admin/store-submissions) was rendering
   with a 28px nav-to-hero gap while /admin/users + /admin/groups
   rendered with 16px. Same width, mismatched vertical rhythm.
   The opt-in rule is now a no-op marker: canonical container
   already provides 1280px + 16/32/48 + main margin/padding 0.

2. admin_tables.html had a stale `<style>` block that re-declared
   `:root { --primary: var(--primary); ... }`. The self-referential
   token resolved to empty, collapsing the page-header hero's
   `linear-gradient(135deg, var(--primary), var(--primary-dark))`
   to no background — the hero appeared as a pale ghost without
   colour. The entire shadow `:root` block was a stale copy of the
   design tokens that style-custom.css already provides. Dropped
   it; tokens now resolve from the global `:root`.

After both fixes /admin/sessions, /admin/tables, and every other
page-shell consumer match /admin/groups exactly: container 1280px,
container padding-top 16px, hero at top 88px / left 190px / width
1216px.

* fix(web): drop /admin/tokens .tokens-page width + padding override

`.tokens-page` carried its own `max-width: 1280px; margin: 0 auto;
padding: 28px 8px 48px` block — the canonical `.container` already
provides width + 16/32/48 padding, so the nested wrapper was
adding 28px on top of the container's 16px (= 44px nav-to-hero
gap, vs 16px on every other admin page) and shrinking the hero
sideways by 8px on each side (1200px vs the canonical 1216px).

After: container owns the layout; `.tokens-page` is just a
font-family scope. /admin/tokens hero now sits at top 88, left 190,
width 1216 — same numbers as /admin/groups / /admin/users.

* fix(web): hero links readable on blue; /admin/access Groups link href

- New `.page-header--hero a` rule in style-custom.css forces any
  anchor inside a gradient hero to render white + underlined so
  links stay readable on the blue background. Previously links
  inherited the global `var(--primary)` blue, which disappeared
  on top of the matching blue gradient. No per-page class needed —
  drop a plain `<a>` in any hero subtitle and it just works.
- /admin/access hero subtitle was Jinja-passing the inline link
  with HTML-entity-encoded quotes (`href=&quot;...&quot;`). The
  entities decoded to literal `"` characters inside the rendered
  href, producing `/admin/%22/admin/groups%22` — a 404. Switched
  the `set` to a block-set (`{% set page_hero_subtitle %}...{% endset %}`)
  so the inline `<a href="/admin/groups">Groups</a>` survives
  unescaped through `_page_hero.html`. Also stripped the now-redundant
  inline `style="color:#fff;text-decoration:underline;"` — the new
  shared rule handles it.

* fix(web): /dashboard top padding matches every other page

`.main` on /dashboard had `padding: 28px 32px 48px` while every
other page now uses `16px 32px 48px` via the canonical
`.container`. Dashboard bypasses `.container` (overrides
base.html's `layout` block to render a full-width `<main>`
directly), so the padding lives on `.main` itself — bumped the
top to 16px to match.

After: first child top = 88, left = 190, width = 1216 — same
numbers as /admin/groups / /admin/users / /admin/marketplaces.

* fix(web): green eyebrow + white title on .page-header--hero (matches /home)

`.page-header--hero .page-header__eyebrow` was faint white
(rgba(255,255,255,0.75)) — readable but unbranded against the blue
gradient. Changed to `var(--ds-brand-accent)` (mint green #54d3a0)
so every page hero pairs a green eyebrow with white title +
subtitle, echoing /home's setup-section header (green eyebrow,
dark heading combo). One CSS rule applies everywhere — no
per-page styling needed.

Also bumped the eyebrow to font-weight 700 / letter-spacing 1.2px
so the green stands out cleanly against the gradient.

* fix(web): page-header--hero + stack-hero use /home navy gradient

`.page-header--hero` and `.stack-hero` were on the brand-blue
gradient (`var(--primary)` → `var(--primary-dark)`) while
/home's hero (`.home-hero-intro`) sits on the deeper navy
gradient (`#0f1b3a` → `#1a2a5f`). Every other page-hero now
uses that same navy gradient so /home, /marketplace, /catalog,
/corporate-memory, /admin/*, /profile, /install, /dashboard,
/setup-advanced share one brand surface. Shadow tint adjusted
to the navy depth (rgba(15, 27, 58, 0.22)).

Brand blue stays the link/CTA colour everywhere else; only the
hero box itself is navy.

* fix(web): primary buttons green; marketplace tabs navy translucent

Two parity tweaks pulling the rest of the app toward /home's
visual language.

- `.btn-primary` (both rules in style-custom.css) now uses
  `var(--ds-primary)` / `var(--ds-primary-dark)` green fill,
  matching the "Copy install script to clipboard" button on
  /home. Brand-blue `--primary` still drives link colour and the
  accent surface; only the filled button background flipped to
  green. Every page with a `.btn-primary` (admin "+Add user",
  "+Add marketplace", catalog, marketplace actions, dashboard,
  modals) now reads as the same "do it" affordance.
- `.mp-tabs` (Curated Marketplace / Flea Market / My Stack tab
  group) now sits on the navy `--ds-hero-bg` with translucent
  white pills (rgba(255,255,255,0.10) inactive, 0.18 active) —
  same translucent-white-on-navy treatment as the "Just browse —
  no install needed" pill on /home. Icons render as soft white;
  per-tab colour-coding dropped in favour of the unified surface.

* fix(web): catalog/memory tabs + empty-state CTA + admin action buttons

Bring /catalog and /memory in line with /home + /marketplace:

- `.stack-tabs` (Browse / My Stack / Recipes on /catalog,
  Browse / My Stack on /memory) now uses the navy `--ds-hero-bg`
  container with translucent-white-on-navy pills, mirroring the
  `.mp-tabs` treatment and /home's "Just browse — no install
  needed" CTA pill. Per-tab icon colour-coding dropped — icons
  render as soft white on the navy fill.
- `.stack-tabs-row__actions .btn` (right-slot "+New Recipe",
  "+New Data Package" admin CTAs) now uses green primary fill
  (`--ds-primary`), matching `.btn-primary` and /home's
  "Copy install script to clipboard" button.
- `.stack-empty .cta a` (empty-state action button — the
  "Open /admin/tables →" CTA on /catalog and equivalent on
  /memory) flipped from blue `--primary` to green `--ds-primary`
  so the colour aligns with every other primary button in the app.

* fix(web): marketplace Search button green (--ds-primary) matching other CTAs

* fix(web): unify Search button + admin-action button across browse pages

- Added Search button (`<button class="stack-hero__search-btn">`)
  to /catalog and /memory heroes — same green pill as /marketplace.
  Wired to the existing live-filter pipeline (button click runs
  `applyFilters()` and refocuses the input). All three browse pages
  now wear the identical search bar UI.
- `.stack-hero__search-btn` shares `--ds-primary` fill with
  `.mp-hero .search-btn`.
- `.mp-actions .btn` ("Submit a skill or plugin" CTA on /marketplace)
  flipped from the legacy blue-outline to the same green primary
  fill + dimensions (`display: inline-flex; line-height: 1;
  padding: 9px 16px; gap: 6px`) as `.stack-tabs-row__actions .btn`
  on /catalog and /memory. All three right-slot action buttons
  render at identical height now.
- `.stack-tabs-row__actions .btn` got `inline-flex` + `line-height: 1`
  + `gap: 6px` so a `<button class="btn">` and a `<a class="btn">`
  both render at exactly 33px high — the embedded
  `.admin-only-hint` chip no longer pushes one variant taller
  than the other.

* fix(web): marketplace guide CTAs green (fastpath + primary); drop flea purple

* fix(web): dashboard CTA hero on navy; readable <code> chips in hero

- `.env-setup-cta` on /dashboard ("Set up a new Claude Code"
  card) flipped from the brand-blue gradient + green-tinted shadow
  to the canonical navy gradient (`--ds-hero-bg` → `#1a2a5f`) with
  navy-tinted shadow + 14px radius + 28/32/24 padding, matching
  `.page-header--hero` and /home's `.home-hero-intro`. Dashboard's
  top CTA now sits on the same brand surface as every other hero.
- Added `.page-header--hero code` rule — translucent white pill +
  warm-yellow ink (#ffd866) so `<code>` chips embedded in hero
  subtitles read as code samples against the navy gradient. The
  global `code` rule sets `color: var(--text-primary)` (dark),
  which turned in-hero chips into invisible dark-on-white-on-navy
  ghosts (e.g. the `-by-dev` suffix on /store/new).
- /store/new's `.page-header__subtitle code` dropped its inline
  style override — the shared rule handles it now.

* feat(web): two-theme switching via data-theme + admin toggle

Introduces a theme system that flips the entire UI palette between
"navy" (current design, default) and "blue" (pre-redesign palette)
via a single `<html data-theme="...">` attribute. Page markup, class
names, and component styles don't change — only the `--ds-*` token
values flip.

Backend
- New `app/instance_config.py::get_instance_theme()` resolves the
  active theme from `AGNES_INSTANCE_THEME` env > `instance.theme`
  in instance.yaml > default "navy". Unrecognised values clamp to
  "navy" so a typo doesn't break the page.
- `app/web/router.py::_build_context` injects `instance_theme`
  alongside `instance_brand` etc. so every template inherits it.
- `app/web/templates/base.html` renders
  `<html lang="en" data-theme="{{ instance_theme | default('navy') }}">`.

CSS
- `app/web/static/css/design-tokens.css` adds two new tokens to
  the default `:root` set: `--ds-hero-shadow` (drop-shadow tint
  on hero boxes) and `--ds-hero-eyebrow` (eyebrow accent colour).
  Plus a `:root[data-theme="blue"]` override block that flips
  seven tokens: `--ds-primary`, `--ds-primary-dark`,
  `--ds-primary-light`, `--ds-brand-accent`, `--ds-hero-bg`,
  `--ds-hero-bg-deep`, `--ds-hero-shadow`, `--ds-hero-eyebrow`.
  The blue theme aliases the brand surface tokens back to the
  legacy `--primary` family.
- `.page-header--hero`, `.stack-hero`, `.env-setup-cta`,
  `.home-mock .home-hero-intro` now reference the new
  `--ds-hero-shadow` and `--ds-hero-bg-deep` tokens instead of
  hard-coding `rgba(15, 27, 58, 0.22)` and `#1a2a5f` — gradient +
  shadow now flip with the theme.
- `.page-header--hero .page-header__eyebrow` uses
  `var(--ds-hero-eyebrow)` so the eyebrow goes mint-green on
  navy and translucent-white on blue (mint on blue reads poorly).

Admin
- `app/api/admin.py::_KNOWN_FIELDS["instance"]` now registers a
  `theme` field of kind `select` with options `["navy", "blue"]`
  and a `hint` explaining the trade-off. The existing
  /admin/server-config UI auto-renders a select for this — no
  template changes needed.

Defaults
- Default value is "navy" so existing instances see no visual
  change. Admins flip to "blue" via /admin/server-config to
  restore the pre-redesign look.

Restart note: uvicorn must reload to pick up the Python changes
(new getter, new template-context key, new known-field). CSS
changes hot-reload via browser refresh.

* fix(web): blue theme — home hero eyebrow + CTA contrast

`.home-hero-intro .eyebrow` and `.btn-intro-primary` referenced
`--ds-brand-accent` directly, which on the blue theme resolves to
the lighter brand-accent blue (#4F9DEB). Result: light-blue eyebrow
on the blue gradient ("WELCOME, ADMIN" barely readable) and a
light-blue button with darker-blue text ("Set up in ~15 min")
that all sat in the same hue range.

Introduces three new theme-aware tokens:
- `--ds-hero-eyebrow` already existed; blue theme bumped opacity
  to 0.92 so the eyebrow reads as full white.
- `--ds-hero-cta-bg` + `--ds-hero-cta-fg` + `--ds-hero-cta-bg-hover`
  flip the primary hero CTA: mint-green on navy (default), white-
  on-blue under `data-theme="blue"`.

`.home-hero-intro .eyebrow` now uses `--ds-hero-eyebrow` (mint on
navy / white on blue) and `.btn-intro-primary` uses the CTA token
trio.

Recommended palette on blue theme:
- Eyebrow: white at 92% opacity (clear on the blue gradient).
- Primary CTA pill: white background, brand-blue dark text
  (`--primary-dark` = #005BA3) for AAA-level contrast.
- Secondary CTA: translucent white pill (unchanged).

* fix(web): blue theme — callout-hint info bg/border/ink re-tinted to brand blue (was indigo, clashed with brand-blue hero)
2026-05-21 06:19:16 +00:00
ZdenekSrotyr
62336bfd32
fix(rbac): stack-gated analyst access + first-demo polish (#333 follow-up) (#356)
* fix(rbac): stack-gate analyst table access via data_packages exclusively

Previously analysts could see a table in ``agnes catalog`` /
``/api/sync/manifest`` either by:
  1. being in a group with ``resource_grants(group, 'table', id)``, or
  2. being in a group with ``resource_grants(group, 'data_package', …)``
     for a package containing the table.

Path 1 leaked: admins who minted a per-table grant without ever
wrapping the table in a data_package still shipped the table to
analysts — directly contradicting the unified-stack mental model
("the stack is the unit of access"). User report:
"i když to admin nedal do data package tak to by default uživatelé
dostali to by se nemělo stát".

New policy: analyst visibility is strictly stack-gated. A table is
visible iff at least one data_package containing it is in the
analyst's stack (required ∪ subscribed). Admin god-mode and the three
internal data-source tables (agnes_sessions / _telemetry / _audit
with row-level RBAC) keep their existing carve-outs.

Touched surfaces:
* ``src/rbac.can_access_table`` + ``get_accessible_tables`` —
  routed through ``StackResolver.stack(user, DATA_PACKAGE)`` +
  ``data_package_tables`` join instead of ``resource_grants(table)``.
* ``app/api/sync._build_direct_tables_section`` — always returns
  ``[]`` (key kept for older CLI destructuring); per-table grants
  no longer manifest.
* Standardised 403 detail across ``/api/data/*``, ``/api/query``,
  ``/api/v2/sample``, ``/api/v2/scan``, ``/api/v2/schema``:
  ``Table 'X' is not in your stack. Ask an admin to add it to a
  Data Package you have access to (Required or in your stack),
  then run `agnes pull` to refresh.`` Single source of truth lives
  in ``src.rbac.table_not_in_stack_message`` so the wording stays
  consistent across CLI surfaces.

UX side: ``/catalog/t/<id>`` (table detail page) dropped the four
editorial sections (Sample questions, What's inside, Things to know,
Pairs well with) per user feedback — the page's job is now
"what is this table, where do I find it" (hero + parent packages).

Tests:
* ``tests/conftest.grant_table_via_package`` / ``revoke_table_via_package``
  — shared helpers that wrap a table in an auto-named data_package +
  grant the package required to a custom group. Replaces the legacy
  per-test ``_grant_table_to_analyst`` table-grant pattern.
* All 17 previously-failing legacy tests (test_access_control,
  test_journey_rbac, test_audit_gap_*, test_rbac, …) migrated to use
  the new helper; logic stays the same.
* ``tests/fixtures/analyst_bootstrap._grant_table_access`` updated
  to wrap via data_package so the ``test_pat`` fixture's "two table
  grants" semantics still ship parquets through ``agnes init``.
* New ``tests/test_table_not_in_stack_message.py`` locks in the
  standardised 403 detail across the data + check-access endpoints.

5204 tests passing (added 1).

* fix(catalog): first-demo UX feedback — required-first grouping + longer card description

Two minor polish items from the 2026-05-19 stakeholder demo:

1. Required packages cluster at the top of the Browse grid instead of
   being interleaved by ``created_at``. Sort key
   ``(requirement != 'required', name)`` runs before the adapter
   call in both /catalog (data_packages) and /corporate-memory
   (memory_domains) so the required block is visible without
   scrolling. Regression test pins the order via
   ``data-id="…"`` position in rendered HTML.

2. ``.stack-card__desc`` line clamp bumped 2 → 4 lines. Two-line clamp
   trailed almost every admin-authored description off in "…" before
   the second clause, forcing a click-through to read it. The detail
   page (/catalog/p/<slug>) keeps the unclamped body for longer
   content.

* release: 0.55.3 — stack-gated analyst RBAC (BREAKING) + first-demo UX polish + #345 A/B/C/D + #347 UI consistency
2026-05-19 17:01:14 +02:00
ZdenekSrotyr
64cf78860d
feat(stack): unified Browse + My Stack for Data Packages and Memory (v49 schema) (#333)
* feat(unified-stack): Browse + My Stack + Recipes + RBAC matrix (v49–v55)

Squash of 94 commits spanning the v49 → v55 unified-stack rewrite.
Full per-feature breakdown lives in CHANGELOG.md under [Unreleased].
Major buckets:

* v49 schema — first-class user_groups + user_group_members +
  resource_grants; admin can CRUD groups and grants; Google
  Workspace nightly sync writes into the new tables.
* v49 data_packages — admin-curated bundles of tables, RBAC-gated,
  first-class section on /catalog Browse + My Stack.
* v49 memory_domains — row-backed (replaces hardcoded VALID_DOMAINS
  enum); admin can CRUD; grants follow the same shape as tables and
  packages.
* v50 cover_image_url + admin sidebar collapsibles + per-row Mode
  tooltip + admin queue domain badges + admin "+ New Item" seed flow.
* v51 lifecycle status (prod/poc/coming-soon/draft) + category +
  palette swatches on admin modals.
* v52 per-table detail page /catalog/t/<id>.
* v53 Recipes — admin-curated SQL templates as a second tab on
  /catalog with full Edit/Delete admin affordances.
* v54 soft-delete (deleted_at) + Undo toast for packages, memory
  domains, and recipes; hard_delete() retained as escape hatch.
* v55 Recipes RBAC — ResourceType.RECIPE registered, inline Group
  Access matrix on Create + Edit Recipe modals (mirrors the Memory
  Domain pattern).
* Activity Center per-resource filter (resource_prefix LIKE-anchored
  on audit_log.resource); admin nav g+letter keyboard shortcuts;
  loadAdminTablesLayout N+1 → single endpoint; /api/memory 30s
  page-level cache.
* CI hardening — Keboola legacy tests pytest.importorskip; perf-
  smoke threshold widened to stop cold-cache flake.

5002 tests passing, 35 skipped.

* feat(p2 backlog): Cmd-K palette + suggest-a-domain + nightly E2E + v55 schema

10-item P2 sweep on top of the unified-stack squash. New behaviour:

* Cmd-K admin command palette (base.html) — fuzzy-search overlay over
  admin + user-facing routes. Arrows/Enter to navigate, Esc to close.
* Stack-tabs digit shortcuts — 1/2/3 switch Browse / My Stack /
  Recipes on /catalog + /corporate-memory.
* Friendlier non-admin empty state on /corporate-memory, plus a
  "Suggest a domain" CTA → POST /api/memory-domain-suggestions, admin
  queue with approve/reject. Backed by a new memory_domain_suggestions
  table (schema v55).
* /admin/corporate-memory 7-tab strip grouped under Moderation /
  Catalog parent labels.
* Bulk-assign table → package dropdown annotates each option with
  "(N of M tables already in)" so the existing distribution is visible
  before picking a target.
* GET /api/memory + /tree accept is_required filter; admin status
  dropdowns route the "Required" sentinel onto it (status no longer
  holds 'mandatory' post-v49, so the old dropdown returned nothing).
* chip-input.js is now opt-in per template via {% block extra_scripts %}
  instead of loaded globally on every page from base.html.
* Edit-modal close helpers consolidated onto _closeEditModalById();
  docs the per-source-type modal architecture decision.
* New .github/workflows/e2e-nightly.yml runs agent-browser smoke
  scripts (scripts/e2e/smoke_*.sh) against a docker-compose stack
  nightly at 04:30 UTC; failures open an agent-browser-nightly issue.

5012 tests passing, 35 skipped.

* fix(visual audit): 6 page regressions on memory + data-package surfaces

agent-browser walkthrough of every memory + data-package page in the PR
turned up 6 real bugs. Fixes:

1. Admin memory modals were dead. Duplicate `let _cmdNewDomainId`
   declarations from the deprecated step-2 RBAC stubs in
   admin_corporate_memory.html collided with the live state vars
   declared earlier in the same <script> → SyntaxError on parse →
   the entire second script block silently failed → every inline
   onclick= handler defined there (`+ New Memory Domain`, Edit, etc.)
   was a no-op. Removed the duplicate stubs.

2. /catalog/t/<table_id> + /catalog/r/<slug> rendered unstyled.
   Both templates injected their CSS via {% block head %} but
   base.html exposes {% block head_extra %} — wrong block name
   meant <style> rules never reached the rendered HTML. Renamed
   to head_extra. Hero card, section cards, dark SQL block, proper
   full-width inputs all now render as designed.

3. L49 leak — "MANDATORY" KPI label + "Make Mandatory" row buttons
   on /admin/corporate-memory still used the old word. Renamed to
   "Required" / "Mark as Required" so UI matches the data model
   (v49 split moved the Required tier onto the orthogonal
   is_required boolean; status no longer holds 'mandatory').

4. Activity Center Resource dropdown didn't know the v55
   `memory_domain_suggestion:` namespace — added it.

5. Tab strip on /admin/corporate-memory wrapped text 2× per button
   on narrow viewports after the L50 MODERATION/CATALOG group
   labels pushed total width past most viewports. Switched the
   strip to flex-wrap:nowrap + overflow-x:auto with
   white-space:nowrap + flex-shrink:0 on every direct child so the
   tabs stay one row and slide horizontally when they overflow.

5012 tests passing, 35 skipped.

* rebase-cleanup: align with main's 0.54.25-27 API design + comment fix

Three follow-on fixes after rebasing onto origin/main (0.54.27):

* admin_tables.html: dropped a stray nested ``{% if data_source_type
  == 'keboola' %}`` around ``prefillFromKeboolaTable`` (main never had
  it; the outer Phase F2 guard already covers it) and reworded a JS
  comment that contained literal ``{% %}`` tokens which Jinja was
  parsing as a real tag → unbalanced if/endif → 30 template render
  failures across the suite.
* /api/stack/subscription/{type}/{id}: DELETE now returns 204 instead
  of 200 per the 0.54.26 design rules. CLI client + parity tests
  updated to accept 2xx / assert 204.
* Memory-domain suggestion approve/reject paths added to
  ``_VERB_PATH_ALLOWLIST`` — they are pending → approved/rejected
  state-machine transitions (approve also creates the real
  memory_domains row as a side effect), so the RPC shape is
  intentional rather than a missed PATCH refactor.

5035 tests passing, 35 skipped.

* fix(catalog_table_detail): real polish pass — hero glyph, dedup pills, rows/size meta, scoped sync CTA

The previous fix only got the block-name typo so the existing CSS rendered.
The actual layout was still wireframe-tier on close inspection:

* No cover glyph in the hero (a flat white card with title + meta line);
  data-package + memory-domain detail pages both have a colored icon
  square. Restored parity — table.icon emoji if set, otherwise initials
  on a colored square using table.color.
* "INTERNAL" pill rendered twice for agnes_audit etc. — the mode pill
  and the source-type pill happened to be identical strings. Now skip
  the source pill when it matches the mode (`internal == internal`).
* Bucket / source_table code chip showed `Agnes Internal.audit_log` for
  internal rows — meaningless to a user. Hidden when source_type is
  internal.
* `pairs_well_with` admin input was a comma-separated `<input>` always
  visible. Wrapped all 4 sections in an Edit-on-demand toggle: read-
  only display by default, "+ Add" / "Edit" button on the right edge
  of each section header reveals the inline form, Cancel hides it.
* "Trigger sync now" was a cramped link squashed into the empty-state
  flex row (visible as `Tr…` overflow before). Promoted to a proper
  btn-primary button under the empty-state copy. Hidden entirely for
  internal tables (which are server-managed — no upstream to pull).
* Hero meta now surfaces row count + payload size (when sync_state has
  them) + last sync timestamp on a single line — was missing from the
  original.
* Mode pills colored by tier (local=green, remote=amber, materialized=
  blue, internal=gray) so the basic fact about a table reads at a
  glance, not from upper-cased ALL-CAPS text alone.

* tests(v56): TDD baseline for extended data-packages content + per-table docs

68 failing tests across 8 files spec the v56 surface before any
implementation lands:

* test_schema_v55_to_v56_migration.py — schema bump, additive ALTERs
  on data_packages + table_registry, idempotency, sequential-upgrade
  preservation
* test_data_packages_repo_v56.py — repo create/update/get/list for
  owner_name, owner_team, tags, long_description, when_to_use,
  when_not_to_use, example_questions (JSON list round-trip, empty
  defaults, partial-update preservation)
* test_table_registry_v56_docs.py — update_docs for grain, platforms,
  partition_col, history, gotchas; preserves v52 docs columns
* test_api_data_packages_v56.py — PUT/POST/GET for all new fields,
  field-level validation (tag count, bullet length, description size),
  virtual badge derivation (curated/new)
* test_api_registry_docs_v56.py — PATCH /api/admin/registry/{id}/docs
  for v56 fields, validation, RBAC unchanged
* test_web_catalog_package_detail_v56.py — /catalog/p/<slug> rewrite
  asserts on rendered owner line, tag pills, badges, What it is,
  Use it when, Skip it when, Example questions, per-table extended
  detail in collapsible row, key-gotcha distinctness, admin-only Edit
* test_web_stack_card_v56_metadata.py — Browse-grid card additions
  (owner chip, tag chips, badges) without breaking back-compat for
  rows missing the new fields
* test_data_packages_no_vendor_content.py — CI guard: scans app/ +
  src/ + cli/ + config/ + scripts/ for Groupon-specific tokens from
  the colleague's spec MD; fails if any leak into OSS surfaces
* test_db_schema_version.py — bumped 55 → 56 with rationale

Plus updates schema-version assertion to 56. Implementation lands in
subsequent commits (schema migration → repo → API → templates).

* feat(v56): schema + repo for extended data-packages content

Schema additions (ALTER ADD COLUMN IF NOT EXISTS — additive + idempotent):

* data_packages: owner_name, owner_team, tags, long_description,
  when_to_use, when_not_to_use, example_questions (JSON-as-VARCHAR for
  the lists)
* table_registry: grain, platforms, partition_col, history, gotchas
  (extends the v52 sample_questions / things_to_know / pairs_well_with
  docs surface with structured per-table content)

Repo extensions:

* DataPackagesRepository.create + update accept the new fields with
  the same Optional-is-no-op contract as v51 (pass an empty list to
  clear a JSON column)
* _decode_row decodes the new JSON-list columns to Python lists; NULL
  rounds back to [] so callers don't branch
* TableRegistryRepository.update_docs grew the v56 fields alongside
  the existing v52 ones — single PATCH can write either tier
  atomically
* TableRegistryRepository._decode_row picks up platforms + gotchas in
  the same NULL-tolerant decoder

22 repo + migration tests passing. API + UI land in subsequent commits.

* feat(v56): API surface for extended data-packages + per-table docs

CreateDataPackageRequest + UpdateDataPackageRequest grew the v56 fields
(owner_name, owner_team, tags, long_description, when_to_use,
when_not_to_use, example_questions) with per-field validators that
match the Foundry spec checklist:

  * tags: ≤8 entries × ≤30 chars
  * long_description: ≤4000 chars
  * use/skip: ≤8 bullets × ≤200 chars
  * example_questions: ≤12 × ≤200 chars

_serialize emits all v56 fields plus a virtual ``badges`` list derived
server-side at render time (no DB column needed): "curated" when the
creator is in the Admin group, "new" within 30 days of created_at.
Backdating created_at or admin-status changes pick up automatically.

PATCH /api/admin/registry/{id}/docs extended with v56 structured
per-table fields (grain, platforms, partition_col, history, gotchas).
gotchas: list of {key: bool, body: str} Pydantic models with the same
≤8 cap; first key=true entry becomes the Key gotcha on the rendered
package detail page. PATCH echoes the fresh state so callers can
re-render without a second GET.

26 API tests passing (16 data-packages + 10 registry-docs).

* feat(v56): /catalog/p/<slug> rewrite + Browse-grid card augmentation

The third (and final) v56 commit lights up the UI surfaces backed by
the schema + API commits earlier in this PR:

* /catalog/p/<slug> template rebuilt around the Foundry spec's
  section ladder — hero (icon + name + badges + owner + tags +
  description + meta + Add-to-stack), "What it is" markdown body,
  paired "Use it when / Skip it when" panels, "Tables in this
  package" with collapsible per-table extended detail (grain /
  platforms / partition_col / history / gotchas + sample questions),
  and an "Example questions you can ask Claude" prompt panel. Each
  section guarded by ``{% if pkg.<field> %}`` — empty content fields
  hide the section entirely (no "No X yet" placeholder noise on the
  public-facing drilldown).
* router catalog_package_detail hydrates per-table v56 fields onto
  the tables list + derives the virtual badges (curated / new)
  server-side from creator-in-Admin + 30-day created_at.
* StackResolver.ResourceEntry grew owner_name / owner_team / tags /
  badges; _fetch_entries pulls the v56 columns + computes badges
  once per fetch using a single Admin-group SELECT.
* _data_package_entry_dict adapter passes the new fields through to
  the macro; tags are merged source-type pills + admin-authored
  category tags per the spec convention.
* _stack_card.html renders the v56 badges (top-left, data-badge=
  hooks) + the owner chip (data-card-owner hook) without breaking
  back-compat — pre-v56 rows render unchanged.
* Admin PUT handler strips the v56 docs fields from the
  read-modify-write merged dict so register() doesn't blow up
  with the now-larger row shape (same pattern as the v52 docs
  fields stripping).

5115 tests passing (+98 v56 + 18 fixed regressions from the merged-
register PUT path), 35 skipped.

* fix(rbac): Edit-on-package + Group-access 'required' persistence + CI vendor guard

Three related bugs reported on the merged-with-main branch:

1. Clicking Edit on a Data Package card landed on /admin/tables with
   a `#<pkg.id>` hash that nothing listened to — admin saw the global
   table listing, not the editor for that specific package. Added a
   `?edit_package=<pkg_id>` query-param handler in admin_tables.html
   (analog to the existing `?edit=<table_id>` and `?assign_to=<pkg_id>`
   patterns) that calls openEditDataPackageModal on DOMContentLoaded
   after a 250ms layout settle. Updated the package-detail Edit link
   to use the new query param.

2. Setting Group Access to 'required' didn't persist — re-opening
   the modal showed 'available'. Root cause was the v49
   ``resource_grants.requirement`` enum existing in the DB but the
   POST /api/admin/grants endpoint not surfacing it: ``CreateGrantRequest``
   declared only group_id + resource_type + resource_id, so Pydantic
   silently dropped the matrix's ``requirement: 'required'`` payload
   and the new row landed at the DB column default ('available').
   Plumbed ``requirement`` through ``CreateGrantRequest`` →
   ``ResourceGrantsRepository.create`` so the value persists in one
   round-trip. Plus a UNIQUE-constraint race in the matrix
   diff-apply: DELETE-old + POST-new ran in parallel via
   ``Promise.allSettled``, so POST could fire first and trip the
   unique check before DELETE freed the slot. Switched to sequential
   (await all deletes; then await all writes) across all three
   matrices (Edit Data Package, Edit Memory Domain, Edit Recipe).

3. CI vendor-content guard ``test_no_groupon_specific_strings_in_oss``
   tripped on two of my own docstrings: a "Foundry Data team" mention
   in two src/db.py comments + an ``s1_session_landings`` example in
   cli/skills/agnes-table-registration.md. Rephrased the comments to
   "extended-descriptions admin spec" and replaced the example with
   a generic ``events_daily`` table name.

5164 tests passing, 35 skipped (+4 regression tests pinning the POST
/api/admin/grants requirement contract). Vendor guard back to green.

* fix(catalog): admin Browse path drops v58 card fields

The /catalog and /memory admin god-mode branch built ResourceEntry
instances inline from pkg_repo.list() / domains_repo.list() and skipped
owner_name, owner_team, tags, and derived badges (curated/new). Visible
symptom: a package with an owner + tags rendered with the v56 chrome
for non-admin viewers but as a bare card for admins.

Adds StackResolver.browse_admin(user_id, resource_type) — admin god-mode
Browse that walks the full table but routes through the same
_fetch_entries enrichment pass as browse(), so admin + non-admin Browse
stay visually consistent. Both /catalog and /corporate-memory routes
switch to it.

Regression test in tests/test_stack_resolver_browse_admin.py covers:
owner/tags propagation, new/curated badge derivation, in_stack from
admin subscriptions, all-packages-regardless-of-grants, and the
ValueError for unsupported resource types.

* fix(catalog): three /catalog tab-strip UX bugs

1. Required Remove → red toast
   browse_admin passed empty required_ids to _fetch_entries, so the
   admin's own required grants surfaced as 'available' and the macro
   rendered an actionable Remove button that POST /unsubscribe 400'd
   on. Now derives required_ids from the admin's own groups so
   Required packages render with the disabled "In stack (required)"
   button. Regression test in test_stack_resolver_browse_admin.py.

2. Remove green-toasts but card stays until refresh
   The My-Stack empty-state placeholder was only emitted server-side
   when stack_entries was empty at render time. Removing the last
   card left the tab completely blank — users read that as "Remove
   didn't work, let me refresh". Both grid + empty-state are now
   always rendered with one of them initially hidden; the JS swaps
   visibility on add/remove instead of injecting DOM. Same fix in
   /corporate-memory.

3. "What are Recipes?" + ambiguous (admin) suffix
   Recipes tab now carries its own curator-block explainer (the
   shared one was moved inside Browse view so it doesn't bleed
   across tabs). The grey "(admin)" suffix becomes a yellow
   .admin-only-hint chip with a title tooltip — visibility hint is
   now unambiguous: yellow chip = "only you see this", non-admins
   don't see the affordance at all.

* schema: renumber v51..v58 → v52..v59 to make room for main's v51

Main 0.54.29 introduced a NEW v51 (table_registry.bq_fqn — issue #343)
that releases ahead of this branch. The unified-stack chain v51..v58
shifts up by one so main's v51 stays as the released schema and ours
become v52..v59. Function names, internal version bumps, dispatch
ladder thresholds, and the migration-test references all move
together. Subsequent merge with main lands the bq_fqn column at the
freed v51 slot.

* fix(seed): seed admin lands in BOTH Admin AND Everyone groups

The LOCAL_DEV_MODE / SEED_ADMIN_EMAIL bootstrap only added the seed
user to Admin. Everyone-scoped grants — the canonical "every-user-
sees-this" pattern for Required onboarding — didn't surface for the
seed admin's own /catalog because they weren't in Everyone. Symptom:
admin grants a Required-tier package to Everyone, then sees it on
/catalog still rendered with an "Add to stack" button (because the
admin's resolved required_ids was empty for that package).

The dual-membership keeps Admin (authorization) and Everyone
(default-grant target) intentionally separate per the design comment
on UserRepository.create — every membership remains traceable to a
concrete row, just now with a system_seed row in Everyone too. Both
INSERTs go through UserGroupMembersRepository.add_member which is
idempotent on (user_id, group_id), so re-fires on every lifespan
startup don't duplicate rows.

Regression test in test_main_seed_admin_everyone.py.

* style: unify admin-only hints across marketplace + memory detail pages

Replaces three stale ``(admin)`` parentheticals with the same yellow
``admin-only`` chip introduced for /catalog tab actions. Same tooltip
copy ("Visible only to admins — analysts won't see this …") so the
visibility hint is unmistakable wherever it appears:

- Hard delete on marketplace_plugin_detail (admin-only destructive
  action — same gating as the original suffix conveyed).
- Hard delete on marketplace_item_detail (same).
- Edit link on memory_domain_detail (title-attr only before; now a
  visible chip too).

Non-admin viewers never saw these affordances — the gates are
unchanged. Pure styling pass for consistency.

* fix(catalog): exclude soft-deleted data packages + memory domains from Browse

``StackResolver._fetch_entries`` and ``browse_admin`` were querying
data_packages / memory_domains without a ``deleted_at IS NULL`` guard.
A package soft-deleted via /admin/* (v54 soft-delete contract) stayed
visible on /catalog and /memory until either an Undo or a hard delete
— directly contradicting the soft-delete UX which is supposed to
remove the affordance immediately and only retain the row for the
Undo window.

The repository accessors (DataPackagesRepository.list,
MemoryDomainsRepository.list, list_packages_of_table, etc.) already
filter deleted rows; this commit brings the resolver's direct SQL in
line with that contract.

Regression test in test_stack_resolver_browse_admin.py.

* fix(catalog): Add/Remove updates full card chrome, not just button

The previous _applyStackChange flipped only the footer button label —
the card border (.is-in-stack class), top-right "In stack" badge, and
button color class (--add / --remove) stayed at their server-rendered
state. After Add the user saw the button checkmark but the rest of
the card still looked like "available, not in stack". They read this
as "the change didn't take — let me refresh".

This commit makes the optimistic update mirror what the server-side
macro renders for the new state:

* ``c.classList.toggle('is-in-stack', becameInStack)`` — flips the
  border + visual state class.
* Top-right ``.stack-card__req-badge--instack`` badge is injected on
  Add, removed on Remove (skipped when ``data-requirement='required'``
  — that slot is owned by the Required badge).
* Button text is "Remove" / "+ Add to stack" matching the macro
  (was "✓ In stack" which was visually nice but inconsistent).
* Button color class --add / --remove swaps so the destructive Remove
  tint kicks in immediately.

The clone-into-My-Stack path applies the same updates so the new card
in My Stack reads identically to a server-rendered in_stack card.
Mirrored in /corporate-memory.

* fix(memory): four Devin-review bugs on /memory drill-down + manifest

PR #333 Devin review surfaced four real bugs that ship a broken
/memory experience even though the unit tests passed.

1. Manifest md5 omits is_required + content (app/api/sync.py:836-840)
   _build_memory_domains_section hashed only (id|title|status) per
   item. _build_per_domain_markdown routes items between "## Required"
   and "## Approved" by is_required and embeds full content — so an
   admin edit of either dimension left the manifest md5 unchanged,
   `agnes pull` skipped the re-fetch, and the analyst kept a stale
   bundle.md. Now both fields participate in the hash.

2. required_count always 0 (src/repositories/memory_domains.py)
   list_items_of_domain only SELECTed (id, title, status) so the
   `it.get("is_required")` in the manifest builder always evaluated
   to None → required_count = 0 regardless of actual state. The
   manifest builder advertised a count it could never compute. Now
   projects is_required + content too (required by fix 1 anyway).

3. Vote URL 404 (memory_domain_detail.html:289-290)
   Constructed `/api/memory/items/{id}/vote` but the route is
   `/api/memory/{id}/vote`. Every upvote/downvote button was a
   silent no-op.

4. Dismiss/undismiss URL + method both wrong (memory_domain_detail.html:296-305)
   Constructed `/api/memory/items/{id}/dismiss` (extra /items/) and
   /undismiss (no such route — undismiss is DELETE on /dismiss).
   Both buttons silently 404'd. Now POST + DELETE on
   `/api/memory/{id}/dismiss` per app/api/memory.py:635/675.

* fix: multi-agent reviewer findings — vendor-token scrubs + manifest md5 predicate + soft-delete filter

Three reviewer findings from the multi-agent review on PR #333,
fixed in-place per CLAUDE.md issue-economy rule.

Reviewer-rules (Important — vendor-agnostic OSS):
- app/main.py:218 comment: replaced 'foundryai-prod' with generic
  'a customer prod instance' phrasing. Public OSS repo must not
  carry customer-specific tokens (CLAUDE.md § Project conventions).
- tests/test_table_registry_v56_docs.py:70 fixture string:
  replaced "user_brand_affiliation = 'groupon'" with 'acme' on
  the same rule.

Reviewer-architecture (closes still-unresolved Devin 🚩 ANALYSIS):
- app/api/sync.py _build_memory_domains_section: md5 hash loop now
  filters items to the SAME predicate the bundle renderer uses
  (is_required OR status='approved'). Pre-fix the hash iterated ALL
  items but _build_per_domain_markdown only rendered the union of
  required items + approved-non-required items — so an admin edit
  to a pending/rejected non-required item flipped the md5 against
  an identical-bytes bundle, triggering a wasteful re-fetch on
  every analyst's next 'agnes pull'. The earlier commit fixed the
  hash-input fields (is_required + content); this closes the
  set-of-items asymmetry Devin separately flagged.

Reviewer-RBAC (minor cleanup):
- app/resource_types.py _data_package_blocks and _memory_domain_blocks
  now filter 'WHERE deleted_at IS NULL' (v54 soft-delete column) so
  the /admin/access UI doesn't surface soft-deleted entities as
  grantable. Mirrors the existing filter on _recipe_blocks. No
  security leak pre-fix (resolver double-filters and re-checks at
  serve time), just UI cleanliness.
- app/services/stack_resolver.py add_to_stack: docstring note
  added explaining that authorization is enforced at the API layer
  (app/api/stack.py can_access gate), not at the resolver. The
  initial review suggested adding a defensive 403 here, but that
  broke 5 existing tests that legitimately call add_to_stack
  directly without setting up grants first; the docstring captures
  the contract instead. stack() already intersects subscriptions
  with current available_ids on every read, so a 'zombie' row from
  a misuse never leaks into the user-facing manifest.

* release: 0.55.0 — unified Browse + My Stack (Data Packages + Memory), schema v48→v59, 3 BREAKING
2026-05-19 15:00:15 +02:00
minasarustamyan
c6c72b9c00
feat(flea): marketplace refactor — data model, attribution, UI unification (#342)
* feat(flea): phase-1 — title, tagline, synthetic_name columns + upload UX

Schema v49 adds three user-facing metadata columns to store_entities:

- title (NOT NULL) — humanized display name shown on marketplace
  surfaces in later phases. Acronym-aware humanizer in
  src/store_naming.py (27 entries: MCP, API, OAuth, S3, …) shared
  with the frontend via Jinja-injected dict so JS pre-fill and
  Python backfill produce identical output.
- tagline (NULL, ≤200 chars) — optional short description for card
  listings. Long-form `description` stays.
- synthetic_name (NOT NULL) — deterministic `<name>-by-<owner_username>`
  stored as a column for indexing and as the single source of truth
  for attribution lookups in later phases. Today's bundle bake still
  uses suffixed_name() at the same call sites.

Migration (_v48_to_v49_migrate, Python function — humanize has no
SQL equivalent) backfills existing rows: title from
humanize_name(strip_archive_suffix(name)), synthetic from the concat
formula; tagline stays NULL. Idempotent (ADD COLUMN IF NOT EXISTS +
SET NOT NULL no-op on re-run).

Upload form (store_upload.html step 2) reorders fields: Title
(pre-filled from server-side humanize, JS keeps it in sync until
the user edits manually) → Name + dark synthetic preview on one
row (matches marketplace_item_detail.html dark code styling, no
copy button — preview only) → Short description with character
counter → Description (unchanged). Edit form (store_edit.html)
mirrors the layout with pre-filled values from the entity row.

API:

- POST /api/store/entities/preview returns `title` (humanized
  fallback) for upload form pre-fill.
- POST + PUT /api/store/entities accept `title` and `tagline` form
  fields with 100/200-char validation; PUT recomputes
  synthetic_name when `name` changes (caller responsibility per
  repo contract).
- StoreEntityResponse exposes all three new fields.

Repository:

- create() takes title + tagline + synthetic_name as optional
  kwargs with derived defaults (humanize_name(name) / concat) so
  existing test fixtures don't need to thread them.
- update() supports partial updates on all three; tagline empty
  string clears via NULL sentinel.
- archive() recomputes synthetic_name on rename to the archived
  slug so the column stays consistent with name.

Tests:

- New test_schema_v48_to_v49_migration.py: fresh install,
  populated-row backfill (incl. archived row strip), idempotence,
  NOT NULL constraint verification.
- test_store_naming.py: 14 humanize parametrize cases + acronym
  dict invariants.
- test_store_api.py::TestStoreV49Metadata: preview humanize, POST
  with explicit + fallback title, 100/200-char rejects, PUT
  partial update + synthetic recompute on rename.
- Schema version assertion bumps (48 → 49) in test_db_schema_version,
  test_home_stats, test_schema_v42_migration, test_schema_v46_migration.

Phase 1 only — surface rendering on cards / detail pages and
Claude Code bundle propagation come in later phases.

* feat(flea): phase-2 — wire title/tagline/owner through marketplace cards + detail pages

Phase 1 (7f4cfcbb) populated the three new columns on store_entities;
phase 2 surfaces them across the web presentation layer so the kebab-
case slug + bare username no longer leak into user-facing copy.

API:

- `_flea_to_item` now takes `conn` (both callsites updated) and sets
  `display_name=entity.title`, `tagline=entity.tagline`, `owner=
  _resolve_owner_display(conn, owner_user_id, owner_username)` —
  matches the chain the curated path already uses (users.name →
  users.email → fallback). The card JS chain `it.display_name ||
  it.name` then renders the friendly form; `name` stays at the
  suffixed slug as the technical identifier JS uses for fallbacks.
- `flea_detail` adds `display_name` + `tagline` to PluginDetailResponse
  so the standalone skill/agent + plugin detail heroes pick them up
  through the existing `d.display_name` / `d.tagline` chains.
- `_flea_inner_parent_fields` swaps `parent_display_name` from
  `strip_archive_suffix(name)` to `entity.title or strip_archive_suffix(
  name)`. Drives parent-plugin label in four surfaces at once:
  breadcrumb 3rd segment, hero "part of <plugin>" meta-row,
  helper "This skill is part of <plugin>" panel, and the Details
  sidebar's "Parent plugin" row.

Templates — `marketplace_item_detail.html`:

- Pre-render: browser title, hero h1, and hero-window-label read
  `(entity.title if entity else None) or inner_name or item_name or
  plugin_name` so the SSR shell shows the friendly title before the
  JS fetch lands (no flash of kebab-case).
- Breadcrumb last segment for flea standalone drops the `d.manifest_name
  || heroTitle` fallback in favour of just `heroTitle` — manifest_name
  is the suffixed slug and users explicitly didn't want it in the path.
- Hero meta-row for flea standalone is now hidden. The prior "by
  <author> · N installed · <size>" line duplicated install count
  (hero telemetry chip below), owner + bundle size (Details sidebar).

Templates — `marketplace_plugin_detail.html`:

- Same SSR pre-render swap (title, h1, window-label, crumb-name).
- Hero tagline element starts hidden; JS shows it only when
  `d.tagline` is truthy. Pre-fix it fell back to `d.description`
  (long-form text), which read awkwardly under the h1 and pulled the
  hero too tall. Description still renders in the "What it does"
  panel below the hero.
- Initial "Loading…" placeholder removed so entities without a
  tagline don't flash that text mid-fetch.

Tests:

- New `TestFleaPhase2Presentation` class in test_marketplace_api.py
  (6 cases): card title + tagline + full-name owner, owner fallback
  chain when users.name is NULL, flea_detail exposes title + tagline,
  tagline null when omitted, inner skill parent_display_name uses
  entity.title (explicit + humanize-fallback variants).
- Updated `TestListItems.test_flea_lists_uploads` to assert both
  `display_name == "Alpha"` (humanized) and `name ==
  "alpha-by-alice"` (suffixed slug compat).
- Updated `TestWebPages.test_marketplace_flea_detail_page_renders`
  to look for the humanized title ("Page Skill") in the SSR shell
  instead of the kebab-case `page-skill`.

* feat(flea): phase-3 — read synthetic_name from DB, suffixed_name() only on write

Phase 1 added the column + backfill, repo write paths keep it in sync.
Phase 3 routes every READ callsite through `store_entities.synthetic_name`
directly instead of recomputing `<name>-by-<owner_username>` on the fly,
and switches the collision query off the inline string concat. The
`suffixed_name()` primitive now lives exclusively in write flows.

Read callsites updated (all read `entity["synthetic_name"]` directly,
no fallback — the column is NOT NULL and a missing value would be a
real bug worth surfacing as KeyError):

- app/api/marketplace.py:_flea_to_item — card MarketplaceItem.name.
- app/api/marketplace.py:flea_detail — PluginDetailResponse.manifest_name.
- app/api/store.py:_entity_to_response — StoreEntityResponse.invocation_name.
- app/api/store.py PUT bundle re-bake — `suffixed` passed to
  `_bake_plugin_tree`; entity is loaded pre-rename, so its
  synthetic_name is the OLD value `_bake_plugin_tree` expects.
- app/api/store.py PUT rename — `old_suffix` for `_rename_baked_tree`.
- app/api/my_stack.py — StoreInstallEntry.invocation_name.
- src/marketplace_filter.py — manifest_name in served plugin entry.

`suffixed_name` imports removed from marketplace.py, my_stack.py, and
marketplace_filter.py (no remaining callsites). store.py keeps the
import for its write paths:

- POST create (`suffixed = suffixed_name(final_name, username)` →
  passed to `_bake_plugin_tree` and `repo.create(synthetic_name=...)`).
- PUT rename collision check (`new_suffixed`).
- PUT rename `new_suffix` for `_rename_baked_tree` (proposed value).
- PUT rename `new_synthetic` for `repo.update(synthetic_name=...)`.
- Archive `old_suffix` + `new_suffix` for `_rename_baked_tree`
  (retro-compute pre-archive value after `repo.archive` already
  overwrote the DB row with the post-archive synthetic).

Collision SQL — `_suffixed_already_taken`:

  WHERE name || '-by-' || owner_username = ?   (before)
  WHERE synthetic_name = ?                     (after)

Same matches today (phase 1 backfill + NOT NULL invariant + write
paths in sync); indexable + single source of truth going forward.

Repository:

- UserStoreInstallsRepository.list_for_user explicit SELECT extended
  with `se.title`, `se.tagline`, `se.synthetic_name` so my_stack and
  marketplace_filter callers can read them off the joined row.

Tests:

- test_store_api.py::test_invocation_name_reads_from_synthetic_column —
  upload entity, manually override the column with a non-canonical
  value, verify GET response returns the override (proves read path
  consumes the column, not recomputes).
- test_marketplace_api.py::test_flea_card_and_detail_read_synthetic_name_from_db —
  same proof for `MarketplaceItem.name` (card) and
  `PluginDetailResponse.manifest_name` (detail).

* feat(flea): phase-4 — rename agnes-store-bundle → flea (synthetic plugin)

The synthetic plugin that wraps loose flea-market skills + agents into
one Claude Code plugin is renamed from `agnes-store-bundle` to `flea`.
Plugin-type flea uploads (their own standalone plugin entry) are
unaffected.

Constants:
- src/marketplace_filter.py:
  - BUNDLE_PLUGIN_NAME: "agnes-store-bundle" → "flea"  (Claude Code
    plugin manifest name + .claude-plugin/plugin.json name)
  - BUNDLE_PREFIXED_NAME: "store-bundle" → "flea"      (on-disk ZIP /
    git tree path, now plugins/flea/...)

Attribution layer (services/session_processors/usage_lib.py):
- FLEA_BUNDLE_PREFIX: "agnes-store-bundle" → "flea". The JSONL
  invocation identifier going forward is `flea:<skill-name>`.
- New `_LEGACY_FLEA_BUNDLE_PREFIXES = ("agnes-store-bundle",)`.
  `MarketplaceItemLookup.resolve()` + `_attribute_event()` accept BOTH
  the new and the legacy prefix so historic usage_events (~90-day
  retention) continue attributing to source='flea'. The tuple becomes
  a no-op once the rename has been live past the retention window —
  a follow-up commit can drop it then.
- USAGE_PROCESSOR_VERSION bumped 6 → 7 so the session-pipeline reprocess
  loop re-runs attribution with the new + legacy prefix branches.

User-facing copy:
- /api/store/bundle.zip Content-Disposition filename: agnes-store-bundle.zip → flea.zip
- `agnes admin store pull` default --out: agnes-store-bundle.zip → flea.zip
- Docstrings + JS comment + welcome template comment updated.

Tests:
- skill_flea.jsonl fixture identifier updated to flea:flea-skill.
- New skill_flea_legacy.jsonl with the legacy prefix for backward-compat
  coverage.
- New test `test_legacy_agnes_store_bundle_prefix_resolves` replays the
  legacy fixture and asserts source='flea' attribution still lands.
- All other test assertions / mocks substituted mechanically:
  test_session_processor_usage.py, test_usage_rollups.py,
  test_marketplace_filter_store.py, test_store_api.py,
  test_cli_refresh_marketplace.py.
- `_seed_flea_entity` (test_usage_rollups.py) + `_seed_attribution`
  (test_session_processor_usage.py) helpers now supply the NOT NULL
  `title` + `synthetic_name` columns from phase 1, since they INSERT
  directly bypassing the repo's create() fallback.

Client rollover note (CHANGELOG): `agnes refresh-marketplace` will
install the new `flea@agnes` plugin and the local marketplace clone's
`plugins/store-bundle/` source folder is removed via `git reset --hard`.
Whether Claude Code itself auto-prunes the orphan `agnes-store-bundle
@agnes` registry entry is undocumented — to verify empirically on the
dev VM. If the orphan entry lingers, a follow-up will add targeted
cleanup; until then users can manually run
`claude plugin uninstall agnes-store-bundle@agnes`.

Verified locally: 98 passed (session_processor_usage + usage_rollups +
marketplace_filter_store + cli_refresh_marketplace) + 228 passed/2
skipped (store_api + marketplace_api + admin_store_submissions +
store_entity_versions + store_repositories).

* fix(flea): phase-5 — attribution keyspace mismatch (closes #335)

Pre-fix every flea skill/agent invocation silently fell through to
`usage_events.source = 'builtin'`. Root cause: lookup tables in
`services/session_processors/usage_lib.py` keyed `_flea_entities` (and
the derived `_flea_plugins` set) by `store_entities.name` — the
un-suffixed display name. Claude Code writes invocations as
`flea:<synthetic_name>` (e.g. `flea:xlsx-by-c-marustamyan`), so
`dict.get(local)` always missed and the resolver fell through to
builtin. Result: marketplace cards, detail telemetry chips, admin
group-by-source all showed 0 flea invocations even when the raw
JSONL stream was correct.

Phase 1 added the `synthetic_name` column + backfill; phase 4 renamed
the bundle prefix to `flea`; phase 5 finally flips the lookup
keyspace to match what JSONL writes.

usage_lib.py:
- `MarketplaceItemLookup.__init__` preload: `SELECT synthetic_name,
  type FROM store_entities` (was `SELECT name, type`). `_flea_plugins`
  set derived from those keys, so it now carries synthetic_names
  too — matches what Claude Code writes when invoking a skill nested
  inside a flea plugin (`<synthetic>:<inner>`).
- `rebuild_rollups` preload: same SELECT change; also derives
  `flea_plugins` and threads it through `_aggregate_events` /
  `_rebuild_window`.
- `_attribute_event`: signature extended with `flea_plugins`; new
  branch `if prefix in flea_plugins: return ("flea", default_type,
  prefix, local)` for flea-plugin-nested skills/agents. This branch
  was added to `MarketplaceItemLookup.resolve()` in v6 (commit
  e076ebbe) but the rollup builder's helper was never updated to
  match, so nested skills inside flea plugins silently dropped out
  of the daily/window fact tables.
- `USAGE_PROCESSOR_VERSION`: 7 → 8. Forces the session-pipeline
  reprocess loop to re-attribute existing usage_events rows with
  the corrected lookup so rollup tables fill correctly on the next
  tick.

marketplace.py — 4 API stats lookup callsites switched from
`entity["name"]` to `entity["synthetic_name"]`:
- `_flea_to_item` (card stats lookup)
- `flea_detail` (`_build_telemetry` + `_load_inner_items_stats_by_parent`)
- `flea_skill_detail` (inner detail `parent_plugin` key)
- `flea_agent_detail` (inner detail `parent_plugin` key)

Tests:
- `skill_flea.jsonl` invocation: `flea:flea-skill` →
  `flea:flea-skill-by-alice` (mirrors what Claude Code writes after
  phase 1/4 — the suffixed synthetic_name).
- `test_flea_skill_attributed_with_empty_parent` assertion: rollup
  `name` column now carries the synthetic_name.

No legacy `agnes-store-bundle` prefix backward compat — clean cut per
user direction (dev phase, no production data worth preserving).

Verified locally: 53 passed targeted (session_processor_usage +
usage_rollups + marketplace_filter_store) + 215 passed/2 skipped
broader (store_api + marketplace_api + admin_store_submissions +
store_entity_versions).

* fix(flea): phase-6 — plugin-level rollup aggregation parity for flea

Flea plugin entity cards + detail pages showed 0 invocations even
though nested skills had correct rollup rows. Root cause: the
plugin-level aggregation pass in `_aggregate_events` was hardcoded
to `source='curated'` only:

    if source != "curated" or not parent:
        continue
    if group_by_day:
        pkey = (day, "curated", "plugin", "", parent)
    else:
        pkey = ("curated", "plugin", "", parent)

So flea plugin entities never got a synthetic
`(source='flea', type='plugin', parent_plugin='', name=<synth>)`
row aggregating nested invocations. `_load_invocation_stats('flea')`
filters `parent_plugin = ''` and returned no row for flea plugin
entity cards, so `stats.get(entity["synthetic_name"])` missed and
the API exposed 0/0.

Triggered by empirical observation on the dev VM —
`codex-second-opinion-by-c-marustamyan` plugin showed 0 calls in
the listing card while its three inner skills (codex-setup ×3,
codex-review ×1, codex-second-opinion ×1) had the expected child
rollup rows.

Fix:

- Extend the guard to `source in ("curated", "flea")`.
- Replace the hardcoded `"curated"` in the `pkey` tuple with the
  loop's `source` variable, so flea aggregation lands as `source=
  'flea'` and curated aggregation continues landing as
  `source='curated'`.

API path unchanged — `_load_invocation_stats('flea')` filters
`parent_plugin = ''` already picks up the new aggregated row
alongside standalone skill/agent rows. Rollup `name` field carries
the synthetic_name keyspace; no collision between standalone entity
synthetic and plugin entity synthetic (global suffix uniqueness
enforced by `_suffixed_already_taken`).

`USAGE_PROCESSOR_VERSION` bumped 8 → 9 to force a reprocess pass so
historic nested-invocation data fills the new plugin-level rows on
the next tick (instead of waiting for the next live invocation).

Tests:

- New `test_flea_plugin_row_aggregates_children` mirrors the existing
  `test_curated_plugin_row_aggregates_children`: seeds a flea plugin
  entity, three nested events (one user invoking two skills, a
  second user invoking one) → asserts the aggregated plugin row
  carries count=3, distinct_users=2 (union, not sum), plus the child
  rows survive alongside.

Verified locally: 43 passed (session_processor_usage + usage_rollups)
+ 82 passed/2 skipped broader (+ marketplace_filter_store +
marketplace_api).

* refactor(marketplace): phase-7 — unify Details sidebar across detail surfaces

Five marketplace detail surfaces (curated plugin, flea plugin, curated
inner skill/agent, flea inner skill/agent, flea standalone skill/agent)
had drifted on which Details rows they show and what order — the same
field landed in different positions, some fields duplicated hero info,
and the flea plugin Owner row leaked the kebab-case `owner_username`
slug instead of the user's real name. This commit aligns all five
surfaces on a single scan order driven by UX priority:

  identity → life-stage → telemetry → debug-tier

Concretely:

  1. Curator / Owner          (first scan signal — trust)
  2. Parent plugin            (inner skill/agent only)
  3. Released                 (top-level only — plugins + flea standalone)
  4. Last used                (recency)
  5. Active days              (engagement consistency)
  6. Version                  (flea standalone only — content hash)
  7. Bundle size              (debug-tier)

Dropped:

  - Slug field on plugin detail surfaces (`marketplace_id` for curated,
    `entity_id` for flea). Pure debug info, never user-relevant; URL
    already carries it.
  - Category + Installs on flea standalone skill/agent detail.
    Category is already shown as a hero badge; install count is in
    the hero telemetry chip — sidebar duplication added noise.

Owner display:

  - Flea plugin Owner row now reads `d.owner_display` (resolved through
    `users.name → users.email → owner_username` by `_resolve_owner_display`
    in `app/api/marketplace.py:1491`) instead of the raw `d.author_name`
    (which is `owner_username`, the kebab-case slug). API field already
    populated from phase 2; templates just consume it.
  - Curated Curator row continues to read `d.author_name` from
    marketplace-metadata.json; `owner_todo` placeholder behavior
    preserved.

Files:

  - app/web/templates/marketplace_plugin_detail.html — rewrote the
    Details render loop (lines 1364-1427 area). Slug row removed,
    rows reordered, Owner branch reads `d.owner_display`.
  - app/web/templates/marketplace_item_detail.html — both branches of
    the Details sidebar (inner skill/agent + flea standalone) re-laid
    around the same scan order. Telemetry helper unchanged, just
    repositioned. Category + Installs rows removed from the
    standalone branch.

No new tests — no existing test asserts the precise order of Details
rows or references the dropped fields in a sidebar context (grep
confirmed). API surface unchanged.

Verified locally: 84 passed / 2 skipped on `test_marketplace_api.py`
+ `test_store_api.py`.

* fix(flea): post-review hardening — N+1, v50 UNIQUE, docs, test cleanup

Addresses 5 critical findings from PR #342 code review:

1. N+1 query in `_flea_to_item` — owner-display resolution previously
   ran one `SELECT … FROM users WHERE id = ?` per item in the listing
   comprehension. Now batched via `_load_users_display` IN-query
   prefetch; 50 items drops 51 user queries to 2. Regression-guarded
   by `TestFleaOwnerDisplayBatched` (spies `_resolve_owner_display`
   and asserts it's not called inside the list path).

2. Misleading comment in `src/marketplace_filter.py` claimed the
   attribution layer accepts both `agnes-store-bundle` and `flea`
   prefixes — it doesn't (clean cut per CHANGELOG). Rewrote to match
   reality.

3. CHANGELOG `[Unreleased]` had two `### Changed` blocks. Merged into
   one (BREAKING bullet first).

4. New v49→v50 migration adds `UNIQUE INDEX
   idx_store_entities_synthetic_name`. v49 made `synthetic_name` the
   canonical attribution key but uniqueness was only app-enforced;
   v50 promotes the invariant to the DB layer. Migration pre-checks
   for existing duplicates and raises `RuntimeError` listing them
   rather than letting `CREATE UNIQUE INDEX` fail mid-way. v48→v49
   migration gained an `is_nullable='YES'` guard on its `SET NOT NULL`
   ALTERs so re-runs on a fully-migrated DB don't trip DuckDB's
   "cannot alter entry … entries depend on it" block (the new index
   counts as such an entry). Index is created by the migration only —
   keeping it out of `_SYSTEM_SCHEMA` preserves fresh-install ordering
   (CREATE TABLE → v49 ALTERs → v50 CREATE INDEX).

5. Deleted three redundant version-pinned schema asserts whose names
   lied about their bodies (`test_schema_version_is_42` asserting
   `== 49`, etc.). Canonical assert lives in
   `test_db_schema_version.py`, renamed to
   `test_schema_version_matches_constant`.

* fix(db): gate v34→v38 store_entities ALTER COLUMN steps on column state

CI on Linux failed `test_v17_to_v18_drops_*` after the v50 UNIQUE INDEX
landed. Root cause: those tests open a DB at the full target version,
seed fixtures, then reset `schema_version` to 17 and reopen — forcing
the ladder to re-run from 17 → current. With the v50 index now in place,
DuckDB blocks intermediate `ALTER COLUMN` steps on `store_entities`
("Cannot drop this column: an index depends on a column after it!" /
"Cannot alter entry because there are entries that depend on it"),
because `synthetic_name` (the indexed column) sits positionally after
the columns those steps touch.

Fix: convert the three SQL-list migrations that hit store_entities into
defensive Python functions:

- `_v34_to_v35_migrate` short-circuits when `synthetic_name` already
  exists (post-v49 shape — the visibility_status rebuild is moot and
  the DROP COLUMN would be blocked by the index).
- `_v35_to_v36_migrate` gates the `visibility_status SET NOT NULL` +
  `SET DEFAULT` on `is_nullable='YES'` so it's a true no-op when the
  column is already constrained.
- `_v37_to_v38_migrate` gates the `version_no SET NOT NULL` step the
  same way.

Forward-roll path (real installs that never reset schema_version) is
unchanged: the gates fire `YES` → ALTERs run. The fix only changes
behavior for the "DB is already at v50 shape but version row says 17"
scenario the tests construct.

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
2026-05-19 02:32:41 +02:00
Vojtech
cd03028776
fix(store): restore reuses prior approved verdict + admin detail surfaces content_quality (#332)
* fix(store): restore reuses prior approved verdict; admin detail surfaces content_quality

Live bug on agnes-development: entity 6ba2ee1d…'s v5 submission (third
restore of v1, byte-identical to v1/v2/v4/v6) landed `blocked_llm`
while the other identical-hash siblings landed `approved`. Anthropic
structured output is non-deterministic — same bytes flipped
`content_quality.verdict` pass↔fail across calls. Admin detail page
made the failure look mysterious: only security-findings table
rendered, so a content-quality-only block showed up as
"No findings — model verdict was clean".

Two fixes:

1. Restore endpoint reuses a prior `approved` submission's verdict
   when the restored bundle hash matches an existing history entry
   AND `reviewed_by_model` matches. Skips the LLM call, stamps the
   new submission with the prior verdict + `reused_from_submission_id`
   marker. Deterministic + saves Anthropic tokens. Gated on
   schedule_async_llm so guardrails-off keeps its existing path.

2. Admin detail template now renders `content_quality.issues` in its
   own table + adds an explicit "Blocked but no findings recorded"
   notice for the transient-non-determinism case + surfaces the
   reuse marker when present.

Reuse falls back to a real LLM call when:
- prior submission's reviewed_by_model doesn't match current (admin
  upgraded tier Haiku → Sonnet → Opus)
- prior submission was guardrails-off (no reviewed_by_model)
- no history entry has matching hash

Tests:
- TestRestoreReusesApprovedVerdict::test_restore_of_approved_version_skips_llm_and_reuses_verdict
- TestRestoreReusesApprovedVerdict::test_restore_legacy_v1_falls_back_to_llm

* fix(store): admin detail v# by submission_id + version switcher

Three related fixes surfaced live by a user inspecting submission
47bbc1f5… on localhost where v# rendered as v1 even though current
was v10.

1. Admin queue + admin detail derive submission v# by submission_id
   instead of hash. Pre-fix the loop matched first hash-equal entry
   in version_history — always v1 when bundles were byte-identical
   (which is the common case after the restore-reuse path). Two
   call sites updated:
   - `src/repositories/store_submissions.py:list_for_admin` (queue
     v# column)
   - `app/web/router.py:admin_store_submission_detail_page` (detail
     page v# chip on each section header)

   Same fix pattern as PR #330 for runner / override.

2. New version-switcher card on admin detail page lists every
   submission linked to the entity with status + reviewed_by_model +
   click-to-jump. Solves the user's secondary ask ("should be a way
   to switch different versions on the submission detail").

3. Initial POST now backfills the v1 seed entry's submission_id
   right after creating the v1 submission. The helper
   `update_history_submission_id` existed but no production code
   path called it — so v1 always had submission_id=None and every
   "find v# for submission" lookup silently failed for v1.

171 tests green on touched surface.

* release: 0.54.24 — restore reuses prior approved verdict + admin detail content_quality + v# by submission_id (Codex/Live follow-up to #330/#331)

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-16 07:12:29 +02:00
minasarustamyan
302cf58ccd
feat(marketplace): telemetry v46 + flea inner parity + listing polish (#329)
* feat(telemetry): marketplace item rollup refactor (schema v46)

Replace the v42 attribution layer with prefix-split + live lookup against
marketplace_plugins / store_entities. The v42 design had a latent bug —
AttributionLookup keyed on bare skill names while Claude Code writes
`<plugin>:<local>` in JSONL, so lookups never matched and
usage_plugin_daily stayed empty in every deployment.

Schema (v46 migration):
- Drop usage_attribution_skills / _agents / _commands (mapping tables,
  derivable from marketplace_plugins + plugin tree).
- Drop usage_plugin_daily (always empty in production due to the bug above).
- Create usage_marketplace_item_daily — per-day fact (count, distinct_users,
  error_count), composite PK on (day, source, type, parent_plugin, name).
- Create usage_marketplace_item_window — sliding-window snapshot with
  true cross-window distinct user counts; period_label='last_7d' refreshes
  every tick, 'last_30d' refreshes hourly (tracked via session_processor_state).
- Mark usage_tool_daily as candidate for removal (no product-UI consumer).

Attribution flow:
- MarketplaceItemLookup replaces AttributionLookup. Preloads
  marketplace_plugins.name + store_entities.name into memory once per
  UsageProcessor tick, then per-event splits identifier on ':',
  matches prefix, writes resolved source / parent_plugin into
  usage_events. agnes-store-bundle prefix routes to flea entities.
  Slash commands with `plugin:` prefix count as type='skill' in rollup.

API:
- BREAKING: MarketplaceItem.unique_users_30d renamed to distinct_users_30d
  (now a true distinct count from the window snapshot, not sum-of-daily).
- InnerDetailResponse gains a telemetry field — invocations_30d +
  distinct_users_30d surfaced on curated inner skill / agent detail pages.
- Card chip hidden pending UX finalisation; data stays in the response.

Backfill: scripts/backfill_marketplace_rollup.py — one-shot rebuild over
historic usage_events after deploy, idempotent.

USAGE_PROCESSOR_VERSION bumped 4 → 5 so the reprocess loop re-attributes
existing events to the new source/ref_id semantics on the next tick.

Tests rewritten: test_session_processor_usage, test_usage_rollups,
test_marketplace_telemetry, test_api_admin_usage_reprocess,
test_db_schema_version, test_home_stats, test_schema_v42_migration.
New: test_backfill_marketplace_rollup.

* fix(marketplace): refresh Most Popular on search + category changes

`loadMostPopular()` early-exits when `state.q` or `state.category` is
set, but the search + category handlers only called `loadItems()` —
so once the section was visible, typing a query or filtering by
category didn't re-run the hide check and the cards stayed on screen
out of scope. Tab + sort handlers already chained the call.

Add the call to runSearch + category pill click handlers (All +
per-category) so the visibility contract holds for every state
mutation that can flip the early-exit condition.

* feat(marketplace): All-plugins section + 7-day Most Popular

Listing layout:
- Always-visible "All plugins" / "All items" / "Your stack" section
  header (label swaps per tab) wrapped in `#mp-all-section` so its
  margin-collapse mirrors the sibling `#mp-popular-section` and the
  spacing from the filter row stays consistent in both layouts.
- Sort dropdown moved from the filter row into the All-* header,
  pinned right via `margin-left: auto`. Anchored to its section so
  the relationship between sort + grid is obvious.
- `.mp-section-header` gets `min-height: 32px` + `align-items: center`
  so the bare-text Most Popular row matches the dropdown-bearing
  All-* row.
- `.mp-section-header` margin tightened 24px → 20px on top.

Most Popular:
- Capacity reduced 8 → 4 cards.
- Now reflects a 7-day window (was 30-day). Backend surfaces
  `invocations_7d` + `distinct_users_7d` on `MarketplaceItem`
  alongside the existing 30d fields; the loader pulls a wider page
  (server still sorts by 30d) and re-sorts + filters client-side
  on `invocations_7d > 0` so the strip stays "hot right now".
- Section label updated to "Last 7 days".
- Section now renders on both `curated` and `flea` tabs (was
  curated-only). Hidden on `my` and whenever search / category
  filter is active. Refresh hooks wired into search + category
  click handlers so visibility flips immediately on state change.

Backend (`_load_invocation_stats`):
- Single SELECT pulls both `last_30d` and `last_7d` rows from
  `usage_marketplace_item_window`; the result dict carries
  invocations + distinct_users for both windows.
- Trend (recent_7 vs prior_7) kept on the daily fact table so it
  stays independent of the window snapshot's freshness.

* feat(marketplace): Most adopted sort + hide Trending when no trend data

Add a fourth sort option to the All-items dropdown — "Most adopted
(30d)", keyed on `MarketplaceItem.distinct_users_30d` (true 30d
distinct user count from `usage_marketplace_item_window`). Protects
the listing from power-user skew that `most_used` is susceptible to:
one user × 100 invokes can't beat 10 different users × 1 invoke
under adoption sort.

Hide Trending option when the response has no trend data. User
reported `sort=trending` returning an empty grid because every
plugin's `trend_pct` was None (prior-week threshold of >= 3
invocations didn't clear anywhere). Empty grids on a user-selected
sort are worse UX than just not offering the sort — surface what
works, hide what doesn't.

Backend (`app/api/marketplace.py`):
- `_apply_sort` gains a `most_adopted` branch (DESC distinct_users_30d,
  ties by name ASC).
- `sort` Literal extended.
- `ItemListResponse.available_sorts` lists the sort keys the UI
  should expose for this response. recent/most_used/most_adopted
  always; trending only when at least one item in the tab's stats
  carries a non-null trend_pct.
- `_available_sorts(stats_dicts)` helper centralises the rule —
  curated and flea branches pass one stats dict, my-tab passes both
  (option is available when either source has trend data).

Frontend (`app/web/templates/marketplace.html`):
- New `<option value="most_adopted">Most adopted (30d)</option>`
  between Most used and Trending.
- URL state allowlist extended so `?sort=most_adopted` round-trips.
- `applyAvailableSorts(available)` runs after each list fetch:
  hides options not in the response's available_sorts; if the user
  is on a now-unavailable sort, resets to 'recent' and re-fetches.
  Search-mode fan-out unions availability across the curated + flea
  responses so a hit on either side keeps the option visible.

* feat(marketplace): funnel chip on cards + deterministic Most Popular sort

Card chip — funnel telemetry between description and footer:

  [stack-icon] N installed · [user-icon] N active · [bolt-icon] N calls · ↑/↓ N%

- stack_count (new MarketplaceItem field): for curated it's COUNT(*)
  on user_plugin_optouts (post-v28 row PRESENCE = subscribed; system
  plugins are fanned out to every user via fanout_system_for_user so
  the count includes them naturally). For flea it reuses the existing
  store_entities.install_count (bumped on install/uninstall).
- distinct_users_30d (existing) — active users in the 30d window.
- invocations_30d (existing) — call volume.
- trend_pct (existing) — week-over-week, both directions: green ↑ /
  red ↓, magnitude only (sign in the arrow). Hidden when null.

Backend additions in app/api/marketplace.py:
- MarketplaceItem.stack_count field.
- _load_curated_stack_counts() — one SELECT per render, GROUP BY
  (marketplace_id, plugin_name). Wired into the curated + my-tab
  branches; flea reads install_count off the entity row directly.

Frontend (app/web/templates/marketplace.html):
- Heroicons solid 24×24 inlined (one helper per icon, all
  fill="currentColor" so per-segment colour tokens apply): rectangle-
  stack (mirrors the My Stack tab icon), user, bolt, arrow-trending-
  up/down.
- Per-segment colour: installed=amber #F59F0A (My Stack accent),
  active=green #0e9b6a, calls=orange #f97316. Text stays neutral so
  the chip still reads as metadata, the leading glyph carries the
  visual cue. Trend pill keeps the full-segment green/red colour.
- Zero state: chip hidden when stack_count == 0 AND invocations_30d
  == 0 — brand-new cards aren't visually penalised by a "0·0·0" row.
- Tooltips on every segment via title="…" so hover explains the
  number's meaning to anyone uncertain about the icon.

Most Popular section — deterministic ordering:

Previously sorted by invocations_7d DESC with no tie-breakers, so
several cards with identical 7d call counts would swap places on
refresh (JS stable sort fell back on backend order, and the backend's
own tie-breaker for `most_used` was just name ASC — six `grpn`
plugins from six test marketplaces collapse to the same name and
became indeterminate via list_with_filters' created_at order).

New cascading hierarchy (chosen primary now matches what "most
popular" really means — wide adoption, not power-user volume):

  1. distinct_users_7d DESC  ← adoption / social proof
  2. invocations_7d   DESC  ← volume at equal adoption
  3. distinct_users_30d DESC ← broader adoption fallback
  4. invocations_30d  DESC  ← broader volume fallback
  5. name              ASC  ← deterministic textual order
  6. marketplace_slug  ASC  ← splits duplicate plugin names across
                              marketplaces

Six levels guarantee any two items end at a different sort key, so
the strip is stable across refreshes.

* fix(marketplace): unify Most Popular on 30d + right-align installed chip

Most Popular section was sorting on the 7d window while its cards
rendered 30d numbers — header label promised one thing, cards showed
another. Unified everything on 30d so a card means the same data
everywhere on the page.

- Dropped the "Last 7 days" meta from the Most Popular header.
- Sort cascade now starts on distinct_users_30d, then invocations_30d,
  with 7d adoption/volume as recency-aware fallbacks before the name +
  marketplace_slug deterministic tail. Six levels guarantee identical
  sort keys never produce indeterminate order across refreshes.
- Filter switched from invocations_7d > 0 to invocations_30d > 0 to
  match the new horizon.
- Most Popular now only renders on page 1 of the listing. Past initial
  discovery, a top-of-list popularity strip on page 2+ would shadow the
  results the user paged into. Pager click handler refreshes the
  section so navigating back to page 1 re-mounts it.

Chip layout — split engagement vs adoption visually:

  [user] N active · [bolt] N calls · [↑/↓] N%        [stack] N installed
  └────────── LEFT (time-bounded engagement) ────┘   └── RIGHT (all-time) ──┘

- Installed (stack_count) is all-time, decremented on uninstall. Alone
  it says little ("12 people installed it") without the engagement
  context next to it ("…but did anyone actually use it?"). Visually
  separating the two groups makes that distinction obvious — left
  group answers "is it used", right answers "does anyone have it".
- Implemented via flex with margin-left:auto on .seg-installed so
  installed drifts to the trailing edge.
- Installed tooltip now reads "Currently installed by N users" — the
  count is a real-time net (uninstall drops it), and saying "currently"
  makes that explicit. Helps when a card shows 0: signals "nobody has
  this in their stack right now", not "data missing".

* feat(plugin-detail): telemetry chip in hero, derived rows in sidebar

Surface the same telemetry funnel the listing card carries on the
curated plugin detail page, so clicking through from /marketplace
keeps a single mental model — figures match, semantics match. The
detail sidebar drops the two raw numbers that used to live there
(Invocations 30d / Users 30d — duplicated by the chip now) and
replaces them with two *derived* signals only the daily series can
provide: Active days + Last used.

Backend (app/api/marketplace.py):
- PluginDetailResponse.stack_count — curated reads via
  _load_curated_stack_counts(), flea reuses install_count. Frontend
  treats both sources uniformly.
- _build_telemetry() always returns a dict (never None). Frontend
  decides chip visibility from stack_count + invocations_30d the
  same way the listing card does. daily_series is always 30 entries
  (zero-padded) so "Active days" and "Last used" derivations on the
  sidebar are trivial array filters.

Frontend (app/web/templates/marketplace_plugin_detail.html):
- New .hero-telemetry slot at the bottom of the hero meta column,
  between the pills row and the action buttons. Renders the four
  funnel segments — active · calls · trend · installed — joined by
  ` · `. No left/right split: the hero has space, so a single
  coherent metadata strip reads cleaner than the card's split layout.
- Heroicons solid inlined (user / bolt / arrow-trending-up,-down /
  rectangle-stack) recoloured against the dark hero — icons in
  lighter tokens (mint #6ee7b7, peach #fdba74, cream #fde68a), trend
  pill keeps the saturated green/red because direction-coding earns
  its own colour.
- Tooltip on installed reads "Currently installed by N users" — the
  count is a real-time net (drops on uninstall), and "currently"
  makes that explicit when a card shows 0.
- fmtNum helper added so 1.2k / 14M renderings match the card's
  format exactly.
- Sidebar swap: Invocations + Users rows removed, replaced by
    Active days  →  "N of 30"
    Last used    →  fmtRelative of the latest non-zero day
  Both derived from telemetry.daily_series — engagement consistency
  + recency, neither of which the hero chip exposes on its own.

* feat(item-detail): telemetry chip in hero for curated skill/agent

Bring the funnel chip the plugin detail page got in 4cf38d40 to the
curated inner skill/agent detail page — clicking through from the
listing card now keeps the same metadata strip from grid to plugin
page to inner item page.

Backend (app/api/marketplace.py):
- _load_inner_item_stats() rewritten:
    * always returns a dict (never None) so the frontend can decide
      chip visibility client-side, same contract as _build_telemetry
    * adds trend_pct, computed the same way as plugin level
      (recent_7 vs prior_7 from usage_marketplace_item_daily, ≥3
      prior-week threshold)
    * adds daily_series (30 entries, zero-padded) so the sidebar can
      derive Active days + Last used
- InnerDetailResponse.parent_stack_count — new field. Skills/agents
  don't have a per-item subscription model, so the hero shows the
  *parent plugin's* stack count under a "Plugin:" prefix. The
  funnel: "12 installed plugin → 2 actually use this skill".
- curated_skill_detail + curated_agent_detail handlers load
  _load_curated_stack_counts() once and pass the parent's value.

Frontend (app/web/templates/marketplace_item_detail.html):
- New .item-detail .hero .hero-telemetry slot beneath the badges
  row. CSS mirrors plugin-detail's colour tokens (mint/peach/cream
  Heroicons solid + saturated trend pill) so the two surfaces read
  as one visual family.
- Installed segment uses a "Plugin:" label rendered with reduced
  opacity to signal the metric describes the parent, not the item
  itself. Tooltip: "Parent plugin (<plugin_name>) currently
  installed by N users".
- Sidebar Invocations + Users rows removed (chip carries them).
  Active days + Last used derived from telemetry.daily_series replace
  them; only rendered when activeDays > 0 so a brand-new skill
  doesn't show "0 of 30" / "Last used —".
- "Type" row dropped from the sidebar — duplicates the hero badge.
- fmtNum helper added (matches listing card + plugin detail).

Plugin detail (app/web/templates/marketplace_plugin_detail.html):
- Hero "Curator: …" line removed. The Details sidebar already
  carries that info; duplicating it under the h1 was visual noise.
- Sidebar "Owner" row renamed to "Curator" — for curated plugins
  it's a person who curates inclusion in this Agnes instance, not
  the upstream code owner. "Owner" was a hold-over label.

* feat(item-detail): unify hero with plugin detail — pills + breadcrumb + cleaner sidebar

- Inner skill/agent hero now uses the same `.pills` / `.pill.cat / .curated /
  .flea / .muted` class names + CSS as the plugin detail page; the only
  item-only addition is `.pill.type` (Skill / Agent uppercase, plugin detail
  has no kind axis).
- Hero `Updated` moved out of the meta-row into a muted pill (mirrors the
  plugin detail hero), removed from the Details sidebar to avoid duplication.
- Details sidebar slimmed: dropped Marketplace, Path, Updated rows; Parent
  plugin now shows the curator-friendly display name
  (`parent_display_name || manifest_name || slug`) instead of the slug.
- Breadcrumb extended to full path: Marketplace > <marketplace_name> >
  <plugin display name> > <self>, mirroring the plugin detail breadcrumb.
- Backend: new `InnerDetailResponse.parent_display_name` field, populated via
  `_curated_plugin_enrichment` from marketplace-metadata.json — same source
  plugin detail hero already uses.

* feat(marketplace): flea inner skill/agent detail + breadcrumb polish

- Flea inner skill/agent detail page parity with curated:
  * GET /api/marketplace/flea/{id}/skill/{name} + /agent/{name}
    returning InnerDetailResponse (mirror of curated_skill_detail).
  * /marketplace/flea/{id}/skill|agent/{name} web routes that render
    marketplace_item_detail.html with source='flea' + innerName context.
  * Frontend apiURL grows a third branch for flea-inner; breadcrumb
    grows to 4 segments (Marketplace > Flea Market > <plugin display
    name> > <self>) when innerName is set.
  * Telemetry attribution: MarketplaceItemLookup resolves
    <flea_plugin>:<inner> prefixes to (source='flea',
    parent_plugin=<plugin name>) so nested invocations land in the
    same rollups curated nested skills use. USAGE_PROCESSOR_VERSION
    bumped 5 -> 6 so the reprocess loop re-attributes historic events.
- Breadcrumb 2nd segment is now a generic clickable "Curated
  Marketplace" / "Flea Market" link to /marketplace?tab=... instead
  of the opaque per-instance marketplace_name. Applied on both plugin
  detail and inner item detail.
- Inner item hero telemetry chip works for both sources: installedCount
  branches on parent_stack_count (curated) vs install_count (flea),
  installed segment drops the "Plugin:" prefix for flea standalone /
  inner items.
- Updated row dropped from Details sidebar on item detail — the hero
  pill already carries the value, sidebar row was duplicate.

* feat(item-detail): block stack-install on flea inner items (mirror curated)

Inner skills/agents nested inside a flea plugin can no longer be added
to a user's stack on their own — adoption only happens at the plugin
level, same rule curated nested items have followed since launch.

- Hero action: when innerName is set (curated nested OR flea nested),
  render "Open parent plugin →" link + helper text instead of the
  install/remove buttons. Flea standalone entities (no innerName) keep
  the normal install UX.
- Meta-row: same branch now serves curated + flea inner — "part of
  <parent plugin display name> · by <author>" with the parent link
  pointing at the right detail page per source.

No API gate change needed: POST /api/store/entities/{id}/install only
accepts existing entity ids (plugin-level), inner items have no entity
id of their own so the endpoint cannot target them directly.

* feat(marketplace): telemetry chip on inner cards + fix flea hero chip visibility

Inner skill/agent cards on the plugin detail page now carry the same
four-segment funnel chip the marketplace listing cards show (N active
. N calls . trend . N installed), for both curated nested skills and
flea nested skills. Plus two fixes that were keeping the hero chip
hidden on flea plugin / flea inner detail pages.

- Backend `_load_inner_items_stats_by_parent(conn, source, parent_plugin)`
  bulk loader: one query per plugin against usage_marketplace_item_window
  + one against _daily, returning {(name, type): stats}. Avoids N+1
  per-card lookups.
- `InnerItemSummary` gains invocations_30d / distinct_users_30d /
  trend_pct / parent_stack_count fields. `curated_detail` and
  `flea_detail` (in the entity.type=='plugin' branch) enrich the
  skills / agents lists after the existing cover-photo enrichment loop.
- `marketplace_plugin_detail.html`: new `.plugin-detail .inner-card
  .inv-chip*` CSS lifted from marketplace.html with the listing-card
  rules, new buildInnerCardChip() helper, buildCardSection appends
  the chip to each card body. Same gate as the listing card (hidden
  on parent_stack==0 && calls==0).

- fix(flea): flea_detail forgot to populate PluginDetailResponse.stack_count
  from entity.install_count (listing card does this on line 851; detail
  endpoint didn't). Hero chip gate `stackCount===0 && calls===0` then
  always hid the chip even when the entity had installs. Now mirrors
  listing card semantics: stack_count == install_count for flea.
- fix(flea inner): renderInnerHeroTelemetry was reading `d.install_count`
  for any non-curated source. InnerDetailResponse has no install_count
  field — it has parent_stack_count (populated server-side from the
  parent flea plugin's install_count). Gate + label now read
  parent_stack_count for both curated nested AND flea nested scenarios;
  install_count remains the flea standalone path.

* fix(marketplace): Owner label on flea + parent-centric sidebar for flea inner

- Plugin detail Details sidebar — authorship row label now tracks the
  source: curated bundles get `Curator` (existing behaviour), flea
  bundles get `Owner`. The `owner_todo` reminder placeholder stays on
  the curated branch only; flea falls through silently.
- Inner item detail Details sidebar — flea-inner (skill/agent nested
  inside a flea plugin) now shares the curated nested layout: Parent
  plugin / Bundle size / Active days / Last used / Owner. Drops the
  flea-standalone shape's `Category`, `Version`, `Installs`, `Released`
  rows that didn't apply to a nested item. Active days + Last used were
  already wired (telemetryRows) — they just weren't on the flea-inner
  branch.

* fix(tests): bump SCHEMA_VERSION assertions 47 -> 48 post-rebase

The marketplace telemetry migration was renamed _v46_to_v47 -> _v47_to_v48
during the rebase onto main (collision with #326 FTS BM25 migration that
took the v47 slot). Two test files still asserted the pre-rebase value:

- tests/test_home_stats.py::test_schema_version_constant_is_46 (CI red)
- tests/test_schema_v46_migration.py::test_schema_version_is_46

Renames the helper fn name + bumps the assertion. The other two test
files (test_db_schema_version.py, test_schema_v42_migration.py) were
already updated in the rebase resolution.

* fix(telemetry): _build_telemetry returns None when invocations_30d == 0

The follow-up commit that introduced the always-return-dict shape broke
the test contract from the original v46 PR (commit b603e998):

  tests/test_marketplace_telemetry.py::TestDetailTelemetry::
    test_detail_endpoint_telemetry_absent_when_no_data
    AssertionError: assert {'daily_series': [...], ...} is None

Both `PluginDetailResponse.telemetry` and `InnerDetailResponse.telemetry`
are declared `Optional[Dict] = None`, the frontend renders are None-safe
(`d.telemetry || {}` guard + `if (!d.telemetry || ...)` on daily_series),
so dropping the dict on zero activity is the cleaner default.

* release: 0.54.21 — marketplace telemetry refactor (schema v48) + flea inner detail parity + listing UX polish

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-15 20:58:03 +02:00
ZdenekSrotyr
9e948abc9c
release(0.54.18): Curated Memory restructure + per-user Dismiss + bundled adversarial-review fixes (#316/#320/#322) (#324)
* feat(web): Curated Memory restructure + per-user Dismiss + filter-state utility

Squashed from cvrysanek/zsrotyr's 4-commit PR branch + rebased onto
current main + CHANGELOG bullets spliced into [Unreleased] (preserves
existing #316/#320/#322 entries that landed on main since the branch
was authored).

Routes + access:
- /corporate-memory now user-facing (get_current_user), in primary
  nav next to "Data Packages" — same gate as /api/memory/*.
- /admin/corporate-memory is the new admin review queue location
  (was /corporate-memory/admin); reached via Admin dropdown. Template
  renamed: corporate_memory_admin.html → admin_corporate_memory.html.

Visual chrome:
- Both pages migrate to shared _page_hero.html blue hero band.

Per-user Dismiss (new feature, schema v46):
- knowledge_item_user_dismissed(user_id, item_id, dismissed_at) + index.
- POST /api/memory/{id}/dismiss + DELETE (idempotent).
- Mandatory items can never be dismissed — enforced at 2 layers.
- GET /api/memory: hide_dismissed=false default + dismissed_by_me flag.
- GET /api/memory/bundle: always excludes dismissed for the caller.
- UI: Dismiss/Undismiss button per item (hidden for mandatory),
  gray-out + line-through for dismissed rows, Hide-dismissed toggle.

Admin edit modal:
- Category as <select> + "Add new category…" reveal.
- Audience as <select> with (unset)/all/group:<name> from RBAC.
- Tags: full tag-input widget (pills, ×-remove, Backspace pop,
  Enter/comma to add, ↑/↓ typeahead from EXISTING_TAGS).

Bulk-edit modal pickers (closes #128):
- Move-to-category / Add-tag: <select> + add-new.
- Set-audience: <select> (no more typo-able 'gourp:eng').
- Remove-tag: closed-set picker.

FilterState utility:
- app/web/static/js/filter-state.js — save/load/clear/bindInputs
  for per-page localStorage filter state. Adopted on /corporate-memory.

E2E verified live on a real VM through the API + browser flow.

* release: 0.54.18 — Curated Memory restructure + 4 adversarial-review fixes

Bundles together:
- #316  fix(store): surface review failures + harden publish gate
        (BREAKING fail-CLOSED guardrail, override v2+ promote, restore guard,
        retry/rescan staged-bundle, banner widening, LLM truncation retry)
- #320  fix(store): C2 bundle export RBAC + H2 per-entity write lock +
        H3 update_status compare-and-swap with bg_verdict_skipped audit
- #322  fix(store): M1 prompt sentinel filename escape + M2 atomic
        promote_to_version helper + L1 admin forensic download per-version
- #324  Curated Memory restructure + per-user Dismiss + FilterState utility

Bump from 0.54.17 → 0.54.18 (patch — pre-1.0 policy: every cycle is patch).
2026-05-15 18:51:05 +02:00
Vojtech
a694a30a5e
fix(store): surface review failures + harden publish gate (#316)
* fix(store): surface review failures + harden publish gate

Four independent fixes to the flea-market submission pipeline, all surfaced
by an admin upload that landed at status='approved' without an LLM review.

1. LLM truncation no longer pins submissions in review_error.
   - Raised MAX_RESPONSE_TOKENS 2500 → 6000 in llm_review.py
   - Added one-shot retry-with-doubled-budget in anthropic_provider.py
     (capped at 4× initial)

2. Flea detail page surfaces the latest submission's failure verdict even
   when a previously-approved version is still serving (deferred-promotion
   path). The _quarantine_banner gate widened from `visibility != approved`
   to also fire on `blocked_inline / blocked_llm / review_error`, with copy
   that distinguishes the v2+ edit case ("Latest edit failed review —
   previously approved version (vN) keeps serving") from the initial-upload
   quarantine wording.

3. Restore button + endpoint no longer allow restoring a version that was
   never approved. Added StoreEntitiesRepository.get_with_version_approvals
   joining store_submissions, gated the UI button on submission_status in
   ('approved', None), rendered status pills for non-restorable rows, and
   added a 400 version_not_approved guard in POST /restore.

4. **BREAKING (operator-facing)**: publish gate is now fail-CLOSED on
   misconfig. The previous get_guardrails_enabled() silently fell back to
   "disabled, auto-approve everything" when guardrails.enabled=true in YAML
   but no ANTHROPIC_API_KEY was in env. Split into:
     - get_guardrails_enabled()              (intent — YAML)
     - get_guardrails_llm_provider_ready()   (readiness — env)
   Three-state matrix:
     enabled=false                       → auto-approve (unchanged)
     enabled=true + ready=true           → normal pipeline (unchanged)
     enabled=true + ready=false (NEW)    → submissions hold at pending_llm
                                           awaiting admin retry or override
                                           (was: silent auto-approve)
   Admin "Retry review" eligibility broadened to include pending_llm.
   Boot-time WARNING banner surfaces the misconfig in app/main.py.
   docs/STORE_GUARDRAILS.md updated with the three-state matrix.
   Operators relying on the auto-fallback for local-dev no-LLM setups must
   now explicitly set `guardrails.enabled: false` in instance.yaml.

Tests: 4623 passed. Added TestPublishGateFailClosed (4 tests) and
TestRestoreVersion::test_restore_rejects_* (3 tests). conftest.py adds an
autouse fixture defaulting guardrails OFF so legacy tests don't need to
know about the new toggle.

* fix(store): admin override promotes v2+ edits to current

The override handler at app/api/admin.py:3708 only flipped submission
status → 'overridden' and entity visibility → 'approved'. Under the v37+
deferred-promotion model that's insufficient for v2+ edits / restores:
the new bundle sits in versions/v<N>/plugin/ and the entity row stays at
the prior approved version_no + hash + on-disk live bundle. Installers
kept getting the OLD bytes the admin had just intended to replace.

Mirror the runner.run_llm_review auto-approval branch: look up the
submission's version_hash in entity.version_history, and if its `n`
differs from entity.version_no, promote_version + _swap_live_to_version.
Initial v1 overrides are unaffected — the loop finds n=1 == version_no
and skips promotion.

Tests:
- test_override_v2_edit_promotes_to_current: stage v1 approved + v2
  blocked_llm; override the v2 sub; assert entity.version_no=2,
  entity.version flips off the v1 hash, and the live plugin/ dir
  mirrors versions/v2/plugin/.
- test_override_v1_initial_upload_no_promote: regression guard so the
  promote loop doesn't accidentally bump a v1 override.

Audit log gains a promoted_to_version_no field on the override action.

* fix(store): retry/rescan review staged bundle; override forward-only

Two adversarial-review findings from a Codex pass on the publish-gate
work.

C1. Admin retry + rescan were passing live `plugin/` to the LLM. For a
v2+ submission held at `pending_llm` / `blocked_llm` / `review_error`,
live still holds the prior approved version's bytes — so the LLM
reviewed the WRONG bytes, and the runner's hash-match promotion in
`run_llm_review` would then advance the entity to staged bytes that
were never actually reviewed. Resolve the staged
`<entity>/versions/v<N>/plugin/` from the submission's
`version_history` entry, with a fall-back to live for legacy pre-v37
rows that never seeded a versions/ dir. Helpers
`_submission_plugin_dir` and `_version_no_for_submission` added to
`app/api/store.py` so override / retry / rescan share one path.

H1. Override's promote loop used `target != current`, which would
silently demote the live bundle when admin overrode a stale v2
submission while v3 was already approved + live. Changed to
`target > current` so override flips status + visibility on the row
regardless, but on-disk promotion only fires forward. Same `>`
defensive guard applied in `runner.run_llm_review` so a late LLM
verdict racing with a newer approval can't demote either.

Tests:
- TestAdminRetryReviewsStagedBundle::test_retry_v2_blocked_passes_staged_dir_not_live
- TestAdminRetryReviewsStagedBundle::test_rescan_v2_blocked_passes_staged_dir_not_live
- TestOverrideForwardOnly::test_override_stale_v2_does_not_demote_when_v3_current

* review polish: CHANGELOG drift, override eligibility, defensive copy

Three small additions on top of the retry/rescan staged-bundle fix:

1. CHANGELOG: the PR's bullets had drifted into the released
   [0.54.17] section during rebase (context-match landed them next
   to already-released content). Moved them up to [Unreleased] where
   they belong; [0.54.17] now holds only what was actually released
   (refresh-marketplace ls-remote, /me/activity hero, CI sharding +
   workflow polish).

2. app/api/admin.py: admin override eligibility now accepts
   pending_llm alongside blocked_inline + blocked_llm + review_error.
   Closes a UX gap from the new fail-CLOSED behavior: under
   enabled-but-not-ready, a known-good submission would otherwise
   sit indefinitely until the admin set credentials AND clicked
   Retry. Override already routes through version_history (and is
   now forward-only on promote), so it stays safe for v2+ deferred-
   promotion submissions.

3. src/repositories/store_entities.py: get_with_version_approvals
   defensively copies each version_history entry before annotating
   with submission_status. self.get() re-parses JSON each call today
   so this is belt-and-suspenders against any future caching layer
   leaking the annotated key into a subsequent plain get() call.

Tests: 112 passed (focused on test_store_entity_versions +
test_admin_store_submissions, covering the retry/rescan staged-
bundle fix the author shipped + this polish).

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-15 15:52:07 +02:00
ZdenekSrotyr
d55c8a3c33
feat(web): consolidate the personal /me/* surface — /me/activity + /me/profile (#304)
Consolidates the scattered per-analyst pages into /me/activity (usage
analytics) and /me/profile (account hub). /me/stats and /profile/sessions
301-redirect; /profile, /me/debug, /tokens are removed with every internal
link repointed. Includes an XSS fix in the /me/activity page hero, the
user_id-keyed session-lookup alignment, and the v0.54.15 release cut.

Co-developed by @ZdenekSrotyr and @cvrysanek.
2026-05-14 21:29:51 +02:00
ZdenekSrotyr
e290baa31b
release: 0.54.14 — changelog repair + post-#305 dead-code sweep (#309)
Cuts 0.54.14. Repairs the [Unreleased] changelog state left by three PRs that merged since v0.54.13 without proper changelog hygiene, sweeps dead code orphaned by #305 and earlier PRs, and bumps the version.

- CHANGELOG: #307 bullets moved out of the released [0.54.10] section into [Unreleased]; backfilled missing entries for #305 (Removed) and #308 (Changed); [Unreleased] -> [0.54.14] release cut.
- pyproject.toml: 0.54.13 -> 0.54.14.
- app/web/router.py: removed orphaned gws_oauth / instance_admin_email / connector_prompts keys from the shared _build_context ctx dict.
- app/web/templates/home_not_onboarded.html: swept dead connector-tile, automode, and setup-collapsible/minimize CSS + the orphaned JS click-wiring.
2026-05-14 20:41:04 +02:00
Vojtech
cb13f80241
feat(marketplace): rename CTA + expand submit-flow guides (#308)
Three coordinated tweaks to the publication discovery surface:

1. Action-row CTA on /marketplace?tab=curated reads 'Submit a skill
   or plugin' instead of 'Submit a plugin'. Skills are first-class
   citizens of the curated shelf; the old wording made them feel
   like an afterthought. Same rename in the empty-state JS innerHTML
   so the two paths can't drift.

2. Curated guide page (/marketplace/guide/curated) expanded from a
   4-line stub into a 3-step ordered list documenting the Named
   Curator handoff (find curator → handoff → publish + lifecycle).
   New '.guide-fastpath' callout block points users at the Flea
   Market when they want lighter review-bar / faster path. Primary
   CTA at the bottom of the curated guide now links to the flea
   guide too, so users who skim past the fast-path callout still
   see the escape hatch.

3. Flea guide page (/marketplace/guide/flea) expanded from a
   3-line stub into a 4-step ordered list (package → upload via
   form → automated review → published). Documents the actual
   /store/new flow + the automated guardrails (manifest, content
   quality, prompt-injection scan) so users know what 'self-service'
   actually means before they upload.

Route titles updated to match: 'Submit a skill or plugin to
Curated Marketplace'.

New file: tests/test_web_marketplace_guide.py — three tests covering
the CTA rename, the curated guide's structural elements (Named
Curators lede, 3 steps, fastpath callout, primary-CTA href), and
the flea guide's structural elements (4 steps, no fastpath
asymmetry, /store/new primary CTA).
2026-05-14 19:44:33 +02:00
ZdenekSrotyr
1b0329e8c5
UI design system unification — one stylesheet, canonical primitives, nav fix (#284)
* docs(plan): design-system unification plan (post-review revisions)

Plan covers consolidating two CSS files into one, introducing
canonical primitives (.btn family, .search-input, .filter-bar,
.page-header, .data-table, .empty-state, .toast, .stat-card,
.tab-strip), unifying the top-nav Admin trigger with sibling
links, and migrating 41 templates that today carry inline
<style> blocks.

Post-review revisions: nav fix moved to first commit (user
complaint lands first); sticky-header and dark-mode skeleton
tasks dropped (defer to follow-up PRs); contract test class
detection tokenizes class="..." attributes properly; baseline
screenshot loop added to Task 0; vendor-token grep widened.

* fix(nav): unify Admin trigger with sibling nav links

The top-nav Admin entry is a <button class="app-nav-link
app-nav-menu-trigger">, siblings are <a class="app-nav-link">.
.app-nav-menu-trigger used to override .app-nav-link with
"color: inherit; font: inherit", resetting font-size from 13px
back to body default and color from --text-secondary to body
color. Active state diverged too: .is-active on links used
--primary blue, [aria-expanded=true] on the button used
--border-light grey.

Fix: expand .app-nav-link so it covers <button>-element resets
(font-family: inherit, border: 0, background: transparent,
cursor: pointer, display: inline-flex for chevron alignment).
Add [aria-expanded="true"] as another active-state selector
so the dropdown's open state highlights identically to .is-active
on links. Delete the now-redundant .app-nav-menu-trigger rules
that stripped button chrome.

Extract the inline <script> from _app_header.html into a new
app/web/static/app.js (loaded by base.html only — base_login.html
has no nav). Sets up window.appUI.wireDropdown for both the user
menu and the Admin dropdown via DOMContentLoaded.

* style(css): consolidate style.css into style-custom.css + add cache-bust

One stylesheet for the whole web UI:
- style.css (1086 lines, legacy Google-inspired tokens + components)
  absorbed into style-custom.css under a labeled block, placed after
  the modern :root + body so style-custom's component rules continue
  to override the legacy ones (preserves the original cascade order
  that came from loading style.css first).
- style.css deleted; <link> dropped from base.html + base_login.html.
- static_url() now appends ?v=<mtime> to /static/<path>. Cheap
  per-request os.stat — auto-invalidates browser + proxy caches on
  redeploy without operator intervention. Mtime survives across
  uvicorn restarts as long as the file content is unchanged.

Legacy classes (.btn, .card, .login-*, .badge, .code-block, .flash,
.form-group, .username-box, .btn-copy, .auth-tabs, .divider, etc.)
still render — they live in style-custom.css now. Login pages,
error page, password setup, and the dashboard's Claude Code Setup
card all kept working in browser smoke.

* test(design): contract test for design-system invariants

7 structural invariants enforced from this commit onwards:
- style.css must stay deleted
- no template links style.css via static_url
- exactly one bare :root block in style-custom.css
- canonical primitives declared (.btn, .btn-primary, .search-input,
  .filter-bar, .page-header, .data-table, .empty-state, .toast, …)
- no deprecated class names in templates (.users-table, .gp-table,
  .marketplaces-table, .audit-table, .users-search, .marketplaces-search,
  .modal-btn, .btn-primary-v2, …)
- app.js loaded by base.html, NOT by base_login.html
- 3 helper-level unit tests for the class-attribute tokenizer
  (multi-line attrs, Jinja-conditional fragments, false-positive prose)

Two of the assertions intentionally start FAILING after this commit
(missing primitives + legacy class refs in 7 admin templates) and
will turn green as Tasks 4–7 add primitives and Tasks 8–15 migrate
the templates.

* feat(css): canonical button family + legacy token aliases

Adds at top of :root: legacy token aliases (--bg, --card-bg, --text,
--text-light, --secondary, --radius) pointing at modern equivalents.
Absorbed style.css rules referenced these names; without aliases
they fell back to 'unset'. Aliases live until Task 16 alongside
their absorbed rules.

Appends canonical .btn variants at end of file (last cascade):
  .btn-primary + .btn-primary-v2 + .modal-btn.primary (alias group)
  .btn-secondary + .btn-secondary-v2 + .modal-btn:not(.primary):not(.danger)
  .btn-ghost + .btn-ghost-v2
  .btn-danger + .modal-btn.danger
  .btn-lg
  .btn:disabled + .btn:focus-visible (focus ring via --focus-ring)

Existing absorbed .btn, .btn-primary, .btn-secondary, .btn-sm rules
remain — the canonical block adds the missing variants + selector-list
aliases so .modal-btn and v2 markup keep rendering until migration
tasks swap them out.

Contract test: .btn-danger now declared (one less missing primitive).
Browser smoke: /admin/tokens hero + filter pills + empty state render
correctly with the absorbed style.css rules now backed by real tokens.

* feat(css): form-control primitives — .search-input + .filter-bar + .filter-pill + .form-input

Canonical filter bar shape: 36px-height inputs (matches button height
for vertical rhythm), 28px pills with .is-active state, consistent
focus ring via --focus-ring token.

Selector-list aliases for legacy per-page classes:
- .users-search / .marketplaces-search / .kb-search → .search-input
- .filters-card → .filter-bar
- .pill[aria-pressed="true"] also matches the .filter-pill active state

.form-input added as a sibling of .search-input for forms — same
baseline height + radius + focus treatment, with textarea.form-input
auto-sizing to min 96px and using the mono font (matches CSV/SQL
pasted-snippet patterns on /admin/agent-prompt + /admin/workspace-prompt).

Contract test: .search-input + .filter-bar + .filter-pill now declared.

* feat(css): .page-header primitive + variants + .tab-strip

Canonical page-header pattern with title (22px) + optional subtitle +
optional eyebrow + right-aligned actions slot. Two modifiers:
- .page-header--hero: gradient background (primary→primary-dark),
  28px white title, semi-transparent subtitle/eyebrow. For
  /marketplace, /store, /profile-style pages that already use this
  layout via per-page inline <style>. Migration tasks delete the
  duplicated rules.
- .page-header--compact: 18px title for dense admin index pages.

.tab-strip + .tab-strip__item — the secondary tab row pattern used by
/marketplace?tab=flea and similar. .is-active / [aria-selected=true]
both flip the active treatment (primary color + bottom border).

Contract test: .page-header / __title / __subtitle / __actions all
now declared (4 fewer missing primitives).

* feat(css+js): .data-table + .empty-state + .toast + .stat-card primitives

Last primitive batch. All 8 canonical-primitives invariants in
test_design_system_contract.py now green; only the template-migration
test fails (expected — Tasks 8–15).

.data-table (+ --compact modifier): selector-list aliases for legacy
per-page table classes (.users-table, .gp-table, .marketplaces-table,
.audit-table) so existing markup keeps rendering until migration.
Compact modifier shrinks padding + font for dense lists (audit log).

.empty-state with __icon / __title / __description / __actions —
replaces the ad-hoc 'no results' rendering scattered across pages
(corporate_memory, admin_users, admin_marketplaces, etc.).

.toast / .toast-container — paired with window.appToast({kind, msg,
timeout}) appended to app.js. Bottom-right stacked, click-to-dismiss,
auto-dismiss after 4s by default. Kind 'success' / 'warning' / 'error'
/ 'info' shows a 3px colored left border.

.stat-card (+ --accent variant) + .stat-row grid — for the dashboard
metric tile row.

* style(templates): migrate 8 templates off deprecated class names

Mechanical class-attribute rewrite via tokenizer (preserves Jinja
conditionals + multi-line attrs):

  modal-btn primary    -> btn btn-primary
  modal-btn danger     -> btn btn-danger
  modal-btn            -> btn btn-secondary
  users-table          -> data-table
  gp-table             -> data-table
  marketplaces-table   -> data-table
  audit-table          -> data-table
  users-search         -> search-input
  marketplaces-search  -> search-input

8 templates touched: admin_groups, admin_marketplaces, admin_tokens,
admin_users, admin_welcome, admin_workspace_prompt, my_tokens,
corporate_memory_admin. 43 lines updated total.

Inline <style> blocks in these templates still define rules for the
old class names — those rules no longer match anything and become
dead code, removed in Task 16's alias cleanup along with the
selector-list aliases in style-custom.css.

Contract test (tests/test_design_system_contract.py) now fully green:
9/9 invariants enforced from this commit onward.

* feat(css): extend .data-table selector list to 13 more bespoke -table classes

Visual unification of remaining tables across the codebase without
per-template edits. The .data-table baseline rules (uppercase header
tracking, 12px padding, hover state, border-radius) now apply to:

  .ad-table / .ea-table / .md-table / .members-table /
  .obs-table / .overview-stats-table / .registry-table /
  .sample-table / .sched-table / .sess-table / .sub-table /
  .subs-table / .ud-table

These class names live in 12 templates (activity_center, admin_access,
admin_group_detail, admin_scheduler_runs, admin_sessions,
admin_store_submissions, admin_tables, admin_usage, admin_user_detail,
catalog, me_debug, profile_sessions) that have their own per-page
<style> blocks. Per-page rules with higher specificity still win for
their custom needs (column widths, etc.) — this commit only sets a
shared baseline so every table renders with the same chrome.

Contract test stays green: 9/9 invariants enforced.

* style(css): remove now-unused legacy class aliases

Phase A renamed 8 templates off these names; no markup references
them any more, so the selector-list memberships are dead weight.
Removed from style-custom.css:

  .btn-primary-v2 / .btn-secondary-v2 / .btn-ghost-v2
  .modal-btn / .modal-btn.primary / .modal-btn.danger /
  .modal-btn:not(.primary):not(.danger)
  .users-search / .marketplaces-search / .kb-search
  .users-table / .gp-table / .marketplaces-table / .audit-table
  .filters-card

37 lines smaller. Contract test catches any reintroduction.

KEPT aliases (still in untouched template markup):
- .pill (marketplace_plugin_detail.html, marketplace.html — these
  pages weren't part of Phase A's deprecated-class sweep; their
  own .pill CSS rules still apply)
- All .data-table family extensions (.ad-table, .ea-table, .md-table,
  .members-table, .obs-table, .overview-stats-table, .registry-table,
  .sample-table, .sched-table, .sess-table, .sub-table, .subs-table,
  .ud-table) — these still render data tables in 12 templates;
  selector-list aliasing keeps them visually unified with .data-table
  baseline.
- Legacy token aliases (--bg / --text / --text-light / --secondary /
  --card-bg / --radius) — still resolve absorbed style.css rules.

Templates' inline <style> blocks still contain dead rules for the
renamed classes (.users-search, .modal-btn, etc.); harmless but
bloat. Optional follow-up: a separate sweep can drop those.

* docs(changelog): design-system unification under [Unreleased]

* feat(css): unify page-shell width — .container baseline 1280px + modifiers

Inventory found 30+ unique max-width values across templates (280px
login → 1600px admin/tables). The legacy .container default was 800px,
which made every admin page set its own wider inline override —
30+ ad-hoc widths drifted as a result.

Canonical: .container max-width = var(--width-app) (1280px). Pages
that need a different shape opt in via modifiers:

  .container--narrow → var(--width-narrow)  (800px) — long-form text,
                                                     setup wizards
  .container--wide   → var(--width-wide)    (1400px) — admin lists,
                                                     marketplace grids
  .container--full   → max-width: none — hero / landing

Pages that already set a NARROWER inline max-width (setup, login flows
inside .login-card, etc.) still render at their narrower size — the
inline override beats the new canonical 1280px. The visible change
hits the ~20 admin pages currently rendering at 800px via the legacy
default, which jump to 1280px and pick up consistent breathing room.

Spacing also normalized: padding 24px 20px → var(--space-6) var(--space-5).

* fix(home+catalog): gut dashboard sections + remove confusing toggle + fix table count

Dashboard /home cleanup:
- Remove 'Your Data' card — Data Packages is already a top-nav entry,
  so duplicating data sources on the landing page just adds noise.
- Remove 'Account' card — group memberships + scripts + last sync
  belong on /profile, not on the welcome screen.
- Remove entire right-column (Corporate Memory + Activity Center
  widgets) — both surfaces have dedicated admin pages reachable from
  the Admin dropdown.
- Keep stats row (Tables/Columns/Rows/Data Size/Unstructured),
  env-setup-CTA, and Notifications card.

/catalog cleanup:
- Strip the 'Always included' badge + the locked toggle-switch from
  Core Business Data and Business Metrics cards. The toggle was
  always 'checked disabled' — it visually looked like a switch but
  could not be toggled, which was confusing. The 'Always included'
  copy itself was redundant once the toggle was gone. Agnes Internal
  already rendered without these, so the three cards are now visually
  consistent.

Catalog data_stats fix:
- 'total_tables' was len(sync_state) — counted only tables that had
  ever synced, so a 30-row table_registry with 0 ever synced rendered
  as '0 tables'. Switched to len(tables) — the registered
  business-data table list — so the count reflects what's actually
  available, not what's been touched.

* fix(home): real stat numbers + drop unstructured tile + cleanup dead CSS

Dashboard stats were hardcoded zeros (columns: 0, size_display:
'0 MB', unstructured_display: '0 MB') and the table counter pulled
from sync_state (synced) instead of table_registry (registered).
On a fresh deployment with 30 registered tables and 0 ever synced,
the page rendered '0 / 0 / 0 / 0 MB / 0 MB' — useless.

Now:
- Tables: COUNT(*) FROM table_registry WHERE source_type != 'internal'.
  Matches the /catalog Core Business Data counter.
- Columns: SUM(sync_state.columns). Zero only when nothing's synced yet.
- Rows: unchanged (SUM(sync_state.rows), already correct).
- Data Size: SUM(sync_state.file_size_bytes), human-formatted via
  inline _fmt_bytes helper (KB/MB/GB).
- Unstructured: tile dropped — was always '0 MB' and had no source.
- last_updated: now derived from sync_state max(last_sync), wasn't set
  before so the 'Synced …' tag never rendered.

Dashboard.html cleanup: ~725 lines of orphan inline <style> removed —
.section-title, .data-source*, .toggle-switch*, .catalog-cta*,
.memory-card / .memory-stat / .memory-description / .memory-footer
/ .btn-memory, .activity-card / .activity-stat / .activity-text
/ .btn-activity, .account-grid / .account-row / .account-scripts
/ .badge-role / .badge-group / .cron-line, .badge-included /
.badge-beta / .badge-demo. All matched markup deleted in the
previous commit; the CSS was dead code until now.

* ui(catalog): rename page heading 'Data Catalog' → 'Data Packages'

The top-nav entry says 'Data Packages' but the page itself said
'Data Catalog' — confusing two-name product. Aligns the heading and
<title> with the nav label. Subtitle trimmed too: 'manage your
subscriptions' was a vestige of the toggle UI that just got removed,
replaced with a one-liner describing what the page is for.

Two other 'Data Catalog' strings stay: they live inside the table-
profiler overlay JS and refer to an EXTERNAL catalog system (e.g.
OpenMetadata / Atlan) that an operator may link to per table — that
is a generic term for any external data-catalog product, not our
page name.

* fix(nav): dropdown clicks always work + mutual-exclusion close

Two bugs in the wireDropdown helper:

1. Clicking trigger B while trigger A's menu was open left both open.
   e.stopPropagation() in trigger.click prevented the document-click
   handler from firing, so trigger A's open menu had no way to learn
   that something else was clicked. Net effect: state diverged across
   the two dropdowns the more you clicked.

2. The target-vs-trigger equality check (e.target !== trigger) was
   strict. Clicking the chevron <svg> inside the button reports the
   svg or its <path> child as e.target — not the button — so removing
   stopPropagation alone would trip the close branch in the same
   click that just opened the panel.

Fix both at once: drop e.stopPropagation() AND switch the doc-handler
guard to trigger.contains(e.target). Now any click outside both the
trigger subtree and the panel subtree closes; any click on another
trigger closes via the OTHER dropdown's doc handler; clicks inside
the trigger (button OR svg child) are fully ignored by the doc
handler and only the trigger's own toggle handler fires.

* feat(ui): canonical blue-gradient hero on every admin page

The UI had a per-page hero pattern on ~10 onboarding/marketing pages
(admin_tokens / profile / install / setup_advanced / marketplace /
my_tokens / store_upload / home_*), each with its own ad-hoc CSS
(.tokens-hero, .profile-hero, .install-hero, .upload-hero, …). The
admin section's index + detail pages had plain H1/H2 with their own
.users-title / .gp-title / .obs-title / .cfg-title / … inline styling.
Net effect: half the app felt like a product, half felt like a
spreadsheet.

Now:
- .page-header--hero CSS upgraded to match the look analysts already
  liked from admin_tokens: 28px/32px/24px padding, 14px radius, soft
  primary-tinted box-shadow (0 4px 16px rgba(0,115,209,0.2)), 28px
  semibold title, optional uppercase eyebrow + 13.5px subtitle.
  Narrow-viewport breakpoint included.
- New _page_hero.html partial wraps the boilerplate. Usage:
    {% set page_hero_eyebrow  = "Users & Access" %}
    {% set page_hero_title    = "Users" %}
    {% set page_hero_subtitle = "…" %}
    {% include "_page_hero.html" %}
- 15 admin templates migrated to it: admin_users / admin_groups /
  admin_marketplaces / admin_access / admin_sessions /
  admin_session_detail / admin_store_submissions /
  admin_scheduler_runs / admin_usage / admin_user_detail /
  admin_welcome / admin_workspace_prompt / admin_server_config /
  activity_center / admin/news_editor. Each gets a grouped eyebrow
  (Users & Access / Data / Agent Experience / Activity Center /
  Server) matching the Admin dropdown sections so the page identity
  is unambiguous at a glance.

Legacy *-title H2/H1 + adjacent subtitle paragraphs deleted; their
per-page CSS rules are dead now (harmless, retire in a follow-up
sweep alongside other inline-style cleanup the reviewers flagged).

admin_tables.html intentionally NOT migrated — it's a standalone
HTML page that doesn't extend base.html; a separate refactor.

Test: test_admin_users_page_renders_for_admin assertion updated
from .users-title to .page-header__title + .page-header--hero (the
canonical pair). All other web/template tests stay green.

* refactor(ui): dedup _humanbytes, drop 267 lines of dead inline CSS

(1) _humanbytes consolidation:
- Add TB branch + optional precision param (default 2 preserves existing
  Store detail callers; dashboard uses precision=1 for headline tiles).
- Delete inline _fmt_bytes from dashboard handler — was a copy of
  _humanbytes with different rounding. One canonical helper now.

(2) Dead inline-CSS sweep across 17 migrated templates:
- Conservative regex: a CSS rule is deleted only when its primary class
  matches one of the known-dead names AND that name is NOT referenced
  from any class= attribute in the same file's markup.
- Per-file 'in-use' guard saved several false positives that the deny
  list would have nuked (e.g. .users-toolbar, .gp-search, .obs-subtitle,
  .marketplaces-toolbar are still in use; only .users-table, .users-search,
  .users-title, .modal-btn, etc. that have NO markup left went away).
- Removed: -267 lines across admin_users (-42), admin_marketplaces (-45),
  admin_groups (-31), my_tokens (-38), admin_tokens (-29), admin_access
  (-9), admin_user_detail (-6), admin_welcome (-8), admin_workspace_prompt
  (-8), admin_server_config (-2), admin_sessions (-1), admin_session_detail
  (-1), admin_usage (-1), admin_store_submissions (-3), admin_scheduler_runs
  (-3), activity_center (-4), corporate_memory_admin (-36).

Contract test stays green (9/9); all web/template/render/user_management
tests pass.

* feat(ui): canonical hero on /catalog (Data Packages)

Same .page-header--hero treatment as the admin pages — Data eyebrow,
Data Packages title, Browse-the-data-sources subtitle. Removes the
ad-hoc .page-title block (h1 / p / wrapper-div) and its CSS rules
(now dead, 3 rule blocks deleted).

* fix(nav): load app.js from _app_header.html — works on standalone pages

The previous nav-fix commit moved the inline dropdown script from
_app_header.html into app/web/static/app.js + added <script src=…>
to base.html. That broke EVERY page that includes _app_header.html
WITHOUT extending base.html (catalog, corporate_memory*,
admin_tables, install). They got the nav markup but no JS → both
Admin and AD dropdowns dead on those pages.

Fix: emit the <script src=app.js defer> directly inside the
_app_header.html partial. Any page that includes the header now
gets the script automatically — base.html-extenders AND standalone
HTML pages alike. base.html's duplicate <script> line removed.

Also fixes the wide-hero on /catalog: .page-header--hero now sets
its own max-width: var(--width-app) (1280px) so standalone pages
without a .container parent don't render the gradient edge-to-edge.
catalog's .source-cards bumped from 900px → 1280px to match the
hero, otherwise the page reads two-tier (wide blue band, narrow
content) which the user flagged.

Verified locally via agent-browser: Admin + AD dropdowns now click
through on /catalog, /admin/tables, /corporate-memory.

* docs(plan): standalone pages → base.html framework migration plan

Plan + Plan-agent review (8 must-fix items applied) for converting
the 5 templates that ship their own <html><head><body> scaffold
(catalog, install, corporate_memory, corporate_memory_admin,
admin_tables) to extend base.html. Root cause of yesterday's
'dropdown dead on /catalog' regression: shared infrastructure in
base.html doesn't propagate to standalones.

* feat(base): body_attrs block + migrate install.html to extend base

base.html: new {% block body_attrs %}{% endblock %} slot so pages
that need <body> attributes (admin_tables has data-source-type)
can carry them through extends.

install.html: convert from standalone <html><head><body> scaffold
to {% extends "base.html" %} with title / body_attrs / head_extra
/ layout / scripts blocks. Drops:
- <!DOCTYPE>, <html>, </html>, <head>, </head>
- <meta charset>, <meta viewport>
- Duplicate <link rel="stylesheet" href="...style-custom.css">
  (base.html already provides one)
- <body> opening + closing tags
- Leading _app_header.html include + _version_badge.html include
  (base.html handles both)

Preserves per-page CSS (in head_extra), per-page JS (in scripts),
the Inter font preconnect (kept inline; not hoisted to base in
this PR — separate decision).

Pilots the migration recipe before the 4 larger pages.

* refactor(memory): extend base.html

Same recipe as install.html. corporate_memory.html now inherits
<html>/<head>/<body> + nav + app.js script tag from base.html.
Page-specific CSS and JS preserved in head_extra + scripts blocks.

* refactor(memory-admin): extend base.html

Same recipe as install/corporate_memory. Curation page now in the
shared rendering pipeline.

* refactor(catalog): extend base.html

catalog.html had the most complexity: 7 head-level assets (chart.js,
Prism, prism-sql, metric_modal.css link + 2 preconnects + Inter
stylesheet), 5 body-level <script> blocks including a <script type=
"module"> for the metric modal, 2 duplicate style-custom.css links
in <head>. The migration script preserved all of them — head-level
externals hoisted to {% block head_extra %} in source order, body
scripts relocated to {% block scripts %} in source order (so chart.js
loads before the IIFE that builds Chart instances), duplicate
style-custom.css links dropped (base.html provides one).

* refactor(admin-tables): extend base.html + carry data-source-type

The biggest of the 5 standalones at 3563 lines. <body data-source-
type="{{ data_source_type }}"> attribute carried through via the
new {% block body_attrs %} slot (admin_tables JS reads
document.body.dataset.sourceType to switch between keboola and
bigquery rendering paths).

* release: 0.54.10 — UI design system unification + homepage status frame + initial workspace override + store guardrails

Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>

* refactor(web): migrate remaining templates to canonical design primitives

- admin_group_detail: .data-table, .btn family, appToast(), remove duplicate table/button/toast CSS
- admin_store_submission_detail: .data-table, .btn family, appToast(), remove duplicate btn/toast CSS
- profile_sessions: .data-table, _page_hero.html, remove duplicate table/title CSS
- me_debug: .data-table, .btn family, remove duplicate table/button CSS
- marketplace: .btn-primary/.btn-secondary, remove duplicate button CSS
- store_edit: remove duplicate .btn-primary/.btn-link CSS, canonical button classes
- store_upload: remove duplicate .btn-primary/.btn-secondary/.btn-link CSS

Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-14 13:28:03 +02:00
Vojtech
aa6a6700f4
feat(me/stats): per-analyst Stats dashboard with 4 tabs (#298)
* feat(me/stats): per-analyst Stats dashboard with 4 tabs

New /me/stats page shows the calling user's own analytics across
four tabs, lazy-loaded per activation:

- **Sessions** — paginated usage_session_summary join with a
  filesystem scan of un-processed JSONL (mirrors admin
  list_user_sessions shape). v44 token columns aggregated per row.
- **Tokens** — daily series (default 30 days), by-model breakdown
  (lifetime), top-10 biggest sessions, lifetime totals. Single
  CTE per sub-query against per-user partition (idx_usage_session_user).
- **Data access** — audit_log rows where action LIKE 'query.%' for
  the caller. Covers query.local / query.hybrid / query.remote /
  query.internal. Cursor-paginated on (timestamp, id).
- **Sync activity** — audit_log rows where action is sync.* or
  manifest.* for the caller, plus users.last_pull_at for the
  header. Per-pull history now persists thanks to the new
  manifest.fetch audit row.

Backend: app/api/me_stats.py — single APIRouter at /api/me/stats/*,
four GET endpoints, all gated by get_current_user (server-side
caller scope; the page route itself only renders the shell).

Frontend: app/web/templates/me_stats.html — tab bar + 4 panels,
plain JS lazy-loads each panel's endpoint on first activation,
caches per-tab so switching back doesn't refetch. Small SVG bar
chart on Tokens tab (no external charting dep). 'Stats' link
added to _app_header.html primary nav between 'Data Packages'
and the Admin dropdown.

Side change in app/api/sync.py: /api/sync/manifest now emits a
manifest.fetch audit_log row alongside the existing
users.last_pull_at bump. The column UPDATE only retains the
most recent timestamp; per-pull history needs an audit row.
client_kind='api' for the manifest endpoint (vs. 'web' which
the audit-read deduper uses for AC reads), so the Sync tab can
distinguish CLI pulls from browser-driven manifest peeks.

7 new tests in tests/test_me_stats.py:
- sessions endpoint caller-scope isolation (user A doesn't see B)
- sessions pagination
- tokens empty-user zero shape
- tokens aggregation across daily window + by_model + top + totals
- queries endpoint filters to action LIKE 'query.%' + caller scope
- sync endpoint surfaces both manifest.fetch and sync.trigger
- manifest endpoint writes the manifest.fetch audit row

* ui(me/stats): widen page to 1400px via main.main escape

Default base.html .container wraps content at max-width 800px. Stats
tables (by-model + top-sessions: 6 columns each) felt cramped at that
width — same constraint dashboard.html escapes via the {% block layout %}
override pattern. Mirror that here: render <main class="main"> and
bump .stats-page max-width to 1400px so the 6-column tables breathe
without going edge-to-edge on wide monitors.

* ui(me/stats): narrow from 1400px to 1280px to match /home

/home isn't actually .container's default 800px — style-custom.css
has a body:has(.home-mock) .container { max-width: 1280px } override
that widens it. 1280px is the shared 'wide content' width across the
codebase (top-nav header + /home + dashboard all use it).

Bumping me_stats from 1400px to 1280px so the Stats page reads as
'same chrome' instead of distinctly wider than its sibling pages.
2026-05-14 10:27:58 +00:00
Vojtech
37ad39c8a3
feat(home): status frame on /home (operator-gated, onboarded-only) (#297)
* feat(home): status frame on /home — last sync, sessions, prompts, tokens, projects

Adds the homepage status frame: a 5-card row above the install-hero /
offboard-strip on /home showing the calling user's Last sync (their
last `agnes pull`), Sessions, Prompts, Tokens used, and Projects worked
on, with a 24h/7d pill toggle.

Backed by `GET /api/me/home-stats?window=` (one DuckDB CTE joining
`users` + `usage_session_summary` + `usage_events`) and SSR'd from the
same `compute_home_stats` helper on initial paint so there's no
spinner. The window toggle is the only JS-driven path.

Side surfaces:
- `GET /api/sync/manifest` now stamps `users.last_pull_at` so
  `agnes pull` (and the Claude Code SessionStart hook that wraps it)
  imprints the analyst's last sync time for the new card.
- `usage_session_summary` gains four BIGINT token counters
  (input_tokens, output_tokens, cache_read_tokens, cache_creation_tokens)
  summed from JSONL `message.usage.*` per assistant turn.
- `USAGE_PROCESSOR_VERSION` bumps 1 → 2 so the session-pipeline
  reprocess loop invalidates stale summaries and backfills tokens
  on the next tick.

Schema migration v43 → v44 is idempotent ALTERs (last_pull_at +
4 token columns) — fresh installs receive them from `_SYSTEM_SCHEMA`,
upgrade path runs `_v43_to_v44`. Defaults (NULL / 0) backfill
existing rows cleanly.

9 new tests in tests/test_home_stats.py cover the migration,
endpoint shapes (24h/7d/unknown/empty/missing-user), and the
manifest-side last_pull_at bump.

* docs(CHANGELOG): homepage status frame entries under [Unreleased]

The post-rebase release-cut now belongs to whichever PR lands next
after main rolled to 0.54.9. This PR logs its bullets under
[Unreleased] (Added: homepage status frame, per-user pull tracking,
token counters; Changed: schema v43 → v44 migration) so they ride
out with the next release-cut.

* fix(tests): bump test_schema_v42_migration asserts to v44

CI failed because tests/test_schema_v42_migration.py hardcoded
`assert SCHEMA_VERSION == 43` and `assert v == 43` after init.
v44 (homepage stats frame backing columns) was introduced in the
preceding feat commit; this aligns the existing v42-era migration
tests with the new schema version.

* feat(home): gate status frame on operator flag + user.onboarded

Two gates on the homepage status frame:

1. **Operator master switch** — `get_home_status_frame_visibility()` in
   app/instance_config.py mirrors the existing `get_home_automode_visibility()`
   shape: env var `AGNES_HOME_SHOW_STATUS_FRAME` > yaml
   `instance.home.show_status_frame` > default `True`. Cautious-rollout
   instances can disable the frame without forking; the yaml example
   documents both knobs.

2. **Onboarded gate** — the template only renders the frame when the
   caller's `users.onboarded` is true. First-day users see a clean
   install-hero before all-zero stat cards; the frame appears
   automatically on the next render after `agnes init` POSTs
   `/api/me/onboarded`.

Router skips the `compute_home_stats` DB read entirely when either
gate is closed; `home_stats` arrives at the template as None in that
branch and the `{% if %}` shortcuts the include.

Why both gates: PostHog feature flags evaluated and rejected — this
codebase uses PostHog for analytics capture only, not feature gating;
adding a per-user feature_enabled() call on the /home critical path
would couple the homepage render to a remote eval and still require
an admin master switch. The onboarded gate is a UX coherence rule
layered on top of the operator switch, not an A/B test signal.

3 new tests in test_home_stats.py cover the env-var resolution
(falsey values + default-true). The yaml example gets a `home:`
block documenting both `show_automode` (pre-existing flag, was
undocumented in the example) and `show_status_frame`.
2026-05-14 09:28:47 +00:00
Vojtech
1e87354d7e
feat(home): Getting Started + Overview + Usage modes sections (release 0.54.7) (#291)
* feat(home): Getting Started + Overview + Usage modes sections

Three new content cards rendered between the install-hero and the
existing connector tiles on /home. Order: Getting Started → Overview
→ Usage modes → connectors.

- Getting Started — dismissible card with two clickable rows linking
  to /setup (install flow) and /setup-advanced (deeper reference).
  Subsumes the legacy `.advanced-pointer` row that sat above the news
  section. Per-device dismiss via a generic localStorage handler:
  `.home-card-close[data-dismiss-key="..."]` inside a <section> wires
  itself up at page load — drop in any future dismissible card without
  per-card JS.
- Overview — operator-owned HTML body sourced from the new
  `instance.overview` yaml field (env override
  `AGNES_INSTANCE_OVERVIEW`). HTML in, HTML out via the same `| safe`
  filter as news_intro. Empty default hides the section entirely,
  keeping the OSS vendor-neutral; operators paste their product
  framing / privacy posture into instance.yaml. New helper
  `get_instance_overview()` in app/instance_config.py mirrors
  `get_instance_logo_svg()`.
- Usage modes — three OSS-shipped tiles (Terminal / VS Code / Claude
  Desktop · claude.ai) explaining each surface and linking to the
  matching /setup-advanced anchors. Closes the gap for users
  wondering "where do I actually run this".

Supporting changes:
- setup_advanced.html gains a new `#claude-app` section between
  #vscode and #workspace, anchored by the Usage modes Claude Desktop
  tile. Covers the marketplace registration paths and when to prefer
  the terminal. Added to the table of contents.
- Three new tests in test_web_home_page.py pin the Getting Started
  card markup, the Overview-on-when-yaml-set path, and the
  Overview-off-by-default path. All 13 tests in the file pass.

Operator follow-up (separate infra PR — NOT this PR): paste the
Foundry-specific Overview body into instance.yaml's
`instance.overview` field. OSS ships with an empty default.

* fix(home): Overview is operator-owned content — drop dismiss button

Earlier iteration added a close X to the Overview section to match
the Getting Started card's dismiss UX. Wrong call: Overview is
operator-authored reference content (privacy posture, telemetry
policy, project framing) and a per-device localStorage hide means
returning users who want to re-read the policy can't recover it
without clearing storage.

Reverts the close button + the data-dismiss-key on the Overview
section. Test inverted to assert the dismiss key is absent (defends
against a future drive-by adding it back). Getting Started still
dismisses — that's procedural getting-started content users
legitimately stop needing once they've finished setup. Overview is
always reachable; whole section is still opt-in at the operator
level via the empty-yaml default.

* fix(home): Terminal usage-mode tile is informational (no click-through)

The setup hero above /home's Usage modes already walks the user
through the Claude Code CLI install — the Terminal tile click-through
to /setup just round-trips back to content the user already scrolled
past. Switch Terminal to a non-anchor <div> and scope the hover
affordance to a.home-usage-item so VS Code + Claude Desktop tiles
keep their click-through (those legitimately deep-link into
/setup-advanced anchors).

* fix(home): point Usage modes guidance at ~/{workspace}/Projects/ subfolder

The bundled plugin scopes the session-analysis loop and the
central-catalog sync to ~/<workspace>/Projects/, not the workspace
root itself — that convention already appears in the install hero's
Step 4 manual-fallback note ('Don't create ~/<workspace>/Projects/
manually — the bundled plugin offers to set it up after install').
Usage modes' footer guidance now matches: 'create every project
under ~/<workspace>/Projects/'. Also calls out that the
session-analysis loop is scoped to that root so users understand
why working outside the workspace dir is invisible to the platform.
2026-05-13 21:44:11 +02:00
Vojtech
14ddaf1e8e
feat(brand): wire instance.logo_svg into header brand slot (release 0.54.6) (#289)
* feat(brand): inline operator SVG logo + drop header subtitle (release 0.54.6)

Three header tweaks, one PR:

1. _app_header.html drops the small uppercase subtitle line below the
   brand. instance.subtitle still flows into the CLAUDE.md preamble +
   init welcome template ("Operated by …"); only the web header chrome
   loses it.

2. get_instance_logo_svg() in app/instance_config.py reads
   instance.logo_svg (yaml) / AGNES_INSTANCE_LOGO_SVG (env). The
   yaml field was already documented in instance.yaml.example and the
   template already supported inline <svg> via {{ config.LOGO_SVG |
   safe }}, but router.py:344 hard-coded LOGO_SVG = "" — the middle
   wire was missing. Now operators can paste a lockup directly into
   their instance.yaml under instance.logo_svg: | and have it render
   in the header. Resolution mirrors get_instance_brand (env > yaml >
   ""). instance.name remains independent: drives browser <title>
   tags + page h1s + CLAUDE.md heading; the SVG is the web-header
   visual only.

3. .app-header-logo svg gains max-height: 40px; width: auto; so any
   operator's lockup scales via its viewBox to fit the 72px header
   without per-asset width/height edits. Pairs with #2 — without the
   clamp, raw artwork (e.g. a 1600x430 lockup) overflows the chrome.

Release-cut included per the same-PR rule (Unreleased contained only
these bullets after rebase onto 0.54.5).

* revert: keep app-header-subtitle span — out of scope for this PR

Initial commit dropped the subtitle line on the assumption that
the user wanted both the secondary header line AND the future-SVG
brand cleaned up. The actual ask was narrower: drop the hostname
suffix that renders inside instance.name ("Foundry AI (hostname)"),
which is a startup.sh concern, not a template one. Restore the
subtitle span and the CHANGELOG bullet that announced its removal.
PR scope narrows to LOGO_SVG wiring + CSS clamp only.

* fix(header): hide subtitle span when instance.subtitle is empty

Pre-fix the template fell back to the literal string 'Data Analyst
Portal' when INSTANCE_SUBTITLE was unset, so operators who left the
field empty saw a stray hardcoded label below their brand. Switched
to a Jinja {% if %} guard around the whole <span class="app-header-
subtitle"> so an empty subtitle produces no element at all — clean
header chrome instead of placeholder leak.

* feat(home): hide install-hero once onboarded + X close button

- Wrap the entire install-hero in `{% if not onboarded %}` so once
  `users.onboarded=true` (auto-flipped by `agnes init` POSTing
  /api/me/onboarded, or by the new X / existing fallback button) the
  blue hero disappears entirely. Pre-PR the onboarded branch reused
  the same shell with a "Welcome back" header + "Steps 1–4 done" badge
  + minimize toggle, which visually outweighed the actual nav hub.
- Add a circular × close button (top-right of the hero, rendered only
  when not-onboarded). Click → window.confirm() asking the user to
  acknowledge onboarding → POST /api/me/onboarded → reload. The
  confirm string intentionally avoids the literal phrase
  "Mark me as offboarded" because cli/commands/onboarded.py::status
  scans /home's rendered HTML for that exact marker as a fallback for
  the api/me/profile check.
- Lift the offboard escape hatch out of the hero into a discrete
  `.offboard-strip` rendered below, gated `{% if onboarded %}`. Lets
  the analyst flip back to the install view after wiping their
  workspace folder.
- Centralize the /api/me/onboarded POST into a `postOnboarded()` JS
  helper reused by the hero X, the existing "Mark me as onboarded"
  fallback button, and the new offboard button.

Tests updated to match the new behavior:
- `test_home_onboarded_user_sees_nav_hub` — asserts the hero is gone
  and the offboard strip is the only setup-flow remnant.
- `test_minimize_toggle_no_longer_rendered` (renamed) — asserts the
  minimize toggle is absent in both states (was previously rendered
  inside the now-hidden onboarded branch of the hero).
- `test_home_no_auto_transition_after_post_until_reload` — checks
  offboard-strip presence post-flip instead of the removed
  "Welcome back" hero copy.

* fix(home): X-close button used invalid source enum, hit 422

The X button's data-target-source was 'self_acknowledged_x' to give
audit_log a separate marker for X-vs-button-driven flips. But
app/api/me.py:38's OnboardedRequest pins source to a Literal of
['agnes_init', 'self_acknowledged', 'self_unmark']  — pydantic
returned 422 on every X click.

Confusing side effect: both buttons share self-mark-status as the
status element, so the failed X click rendered 'Failed (422)' next
to the still-functional 'Mark me as onboarded' button. Looked like
the button itself broke.

Fix: drop the _x suffix. Both surfaces now POST source='self_acknowledged'.
Distinction in audit_log is not load-bearing — the source field
captures user intent ('I'm onboarded'), not the specific UI affordance.
2026-05-13 17:25:46 +00:00
Vojtech
50a974f196
feat(store-guardrails): admin-configurable content thresholds (#281)
* feat(store-guardrails): admin-configurable content thresholds

Adds the flea-market content guardrail floors to the /admin/server-config
editor so operators can tune the bar without code changes. Defaults are
unchanged (60 chars description, 25 chars command, 5 distinct words, 200
chars body) — patching guardrails.* in instance.yaml or via the admin UI
overrides any of them and the next inline check picks up the new value.

src/store_guardrails/content_check.py now resolves the four floors via
helper functions (_min_desc_chars / _min_command_desc_chars /
_min_distinct_words / _min_body_chars) that read app.instance_config at
call time. Module-level _DEFAULT_* constants remain as fallbacks if
the import fails (defensive — keeps the guardrail module loadable
without the app package on its path).

app/instance_config.py grows four matching getters returning the live
value with sane defaults + integer coercion.

app/api/admin.py registers 'guardrails' as an editable section + ships
nine known-fields entries (min_description_chars,
min_command_description_chars, min_distinct_words, min_body_chars,
enabled, review_model, blocked_quota_per_day, blocked_bundle_ttl_days,
stuck_review_grace_seconds) with operator-facing hint copy explaining
what each knob does.

app/web/templates/admin_server_config.html gets a SECTION_META entry
so the section renders as 'Flea-market guardrails' with a help string
instead of a bare section ID.

app/web/router.py threads the live thresholds into /store/new and
/store/examples via a small _guardrail_thresholds() helper so the
disclosure copy, char counter, and "Why these limits" table render
the configured value (not a hardcoded 60). End-to-end smoke verified:
PATCH guardrails.min_description_chars=90 → /store/new immediately
renders "90 characters" + JS DESC_MIN=90 on the next request, no
restart required (helpers read live config per call).

* chore(store-guardrails): address PR review safe-fix findings

Code-review safe_auto findings on PR #281 (review run
20260513-100126-64052520):

- CHANGELOG: add Unreleased entry covering the new
  /admin/server-config Flea-market guardrails section, the four live
  threshold getters, and the route-helper rendering knobs. Required by
  the project's non-negotiable "Changelog discipline" rule.
- content_check.py: narrow `except Exception` to `except ImportError`
  on the four `_min_*()` resolver helpers. Surface-level TypeError /
  ValueError on a malformed YAML value belongs to the
  instance_config getters' own try/except — the resolvers should only
  defend against the in-tree import itself failing, not silently
  swallow real bugs in the getters.
- store_upload.html: refresh the stale "30-char threshold" comment to
  reflect the configurable floor (default 60), and add `|default(60)`
  / `|default(25)` / `|default(5)` filters to the disclosure-copy
  bindings so the upload form matches store_examples.html's
  belt-and-suspenders rendering if a future route ever renders the
  template without populating the `guardrail` context.
- router.py: tighten `_guardrail_thresholds()` return annotation
  from bare `dict` to `dict[str, int]`.

Residual work (left for separate change after operator direction):
- Add round-trip test (PATCH guardrails -> next inline check uses
  new value) — primary testing gap.
- Decide policy on `min_*=0` (currently coerced to 1 via
  `max(1, int(val))`) vs treating 0 as a disable sentinel like
  neighbour getters (`blocked_quota_per_day`,
  `blocked_bundle_ttl_days`).
- Add POST-time integer validation for `guardrails.*` so a typo'd
  YAML value (bool / string / float) errors loudly instead of
  silently falling back to the default.

* test(store-guardrails): cover admin-configurable thresholds + PATCH round-trip

Closes the "primary testing gap" Vojta noted in the safe-fix commit
on PR #281 — the four new `get_guardrails_min_*` getters and the
PATCH-takes-effect-on-next-check live-config flow had no direct
coverage.

10 new tests in `tests/test_store_guardrails_admin_config.py`:

- TestGuardrailGetterDefaults (4 tests) — each new getter returns the
  documented default (60 / 25 / 5 / 200) when nothing is configured.
- TestGuardrailGetterOverlay (5 tests) — overlay-driven overrides win,
  string values that look numeric coerce via int(), garbage strings
  fall back to default via the (TypeError, ValueError) branch, and the
  `max(1, int(val))` floor pins zero/negative inputs to 1.
- TestPatchRoundTrip (1 test) — PATCH `/api/admin/server-config`
  `guardrails.min_description_chars=90`, then call content_check
  against a 75-char description that previously passed: must now fail
  with `too_short`. Then PATCH back to 60 and verify the next check
  passes again. Closes the cache-invalidation contract Vojta relies on
  for the "no app restart" claim — broken without the
  reset_cache() bracket in /api/admin/server-config.

The TestGuardrailGetterOverlay.test_zero_or_negative_floored_to_one
test pins the current `max(1, int(val))` policy. Vojta's safe-fix
commit explicitly left "policy on min_*=0 vs disable-sentinel" as
residual work — pinning the current behavior here ensures any future
change to use 0 as a disable sentinel must update this test (and the
reviewer sees the policy decision).

Verified: 4509 tests pass locally (4499 existing + 10 new).

* release: 0.54.2 — admin-configurable flea-market guardrail thresholds + tests

Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.1 →
0.54.2) bundling Vojta's admin-configurable thresholds for the
flea-market content guardrail (9 knobs in /admin/server-config) plus
the test coverage closing the "primary testing gap" he punted in the
safe-fix commit.

No DB migration; defaults unchanged from PR #276 — instances that
don't set `guardrails.*` keep the original bar transparently.

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
Co-authored-by: ZdenekSrotyr <139972147+ZdenekSrotyr@users.noreply.github.com>
2026-05-13 09:20:55 +00:00
ZdenekSrotyr
b4d3c576af
Activity Center: audit log + telemetry + sessions + agnes_* tables (#278)
* docs(spec): admin observability spec + Activity Center MVP plan

Parent spec (480 lines) + executable plan (2295 lines, 14 TDD tasks).
Covers Activity Center rebuild (/admin/activity), with /admin/sessions
and /admin/feedback deferred to follow-up plans.

Already incorporates reviewer-pass revisions across three angles
(security, production resilience, code architecture):
- _get_db import path corrected to app.auth.dependencies
- Test fixtures aligned with seeded_app / admin_user / get_system_db
- All new audit writes wrapped in try/except + logger.exception
- Filename sanitization on session uploads
- DuckDB DESC index behavior documented; upgrade window flagged
- Migration idempotency + evolved-DB test cases
- reveal_raw + shared-cache multi-worker explicitly deferred

Targets schema v40 (audit_log gains params_before, client_ip,
client_kind, correlation_id + 3 indices).

* feat(db): schema v40 — audit_log gains params_before, client_ip, client_kind, correlation_id + 3 indices

* chore(test): clean up Task 1 — drop unused import, rename stale test

* feat(audit): AuditRepository.log() accepts params_before/client_ip/client_kind/correlation_id

* test(audit): strengthen params_before assertion to round-trip JSON content

* feat(audit): AuditRepository.query() rich filters + keyset cursor pagination

* feat(sync): SyncStateRepository.list_recent() cross-table feed

* feat(audit): POST /api/sync/trigger writes audit_log row

* feat(audit): POST /api/scripts/run-due writes audit_log row

* feat(audit): POST /api/upload/sessions writes audit_log row + sanitizes filename

* feat(audit): GET /api/data/{table_id}/download writes audit_log row

* feat(activity): /api/admin/activity timeline + /health + /sync endpoints

* feat(ui): /admin/activity rebuilt — health pulse, timeline, sync grid; /activity-center → 308 redirect

BREAKING: removed demo executive-pulse / maturity-roadmap content from activity_center.html.
The page now reflects real audit_log + sync_history data.

* feat(ui): admin nav + dashboard widget point at /admin/activity

* feat(activity): recursive-audit suppression for AC read endpoints (60s window per actor+filter)

* feat(activity): emit PostHog events when integration enabled (no-op default)

* fix(audit): move v40 indices out of _SYSTEM_SCHEMA + update test_repositories to unpack query() tuple

_SYSTEM_SCHEMA CREATE INDEX on audit_log(timestamp) failed when migration
tests hand-roll a bare audit_log (id, action) without the timestamp column.
Fix: remove indices from _SYSTEM_SCHEMA; add ADD COLUMN IF NOT EXISTS guards
for timestamp and other pre-v40 columns in _v39_to_v40() so the upgrade path
is safe on any hand-rolled schema; call _v39_to_v40 explicitly in the
fresh-install (current==0) path to restore index creation there.

Also unpack the (rows, next_cursor) tuple from AuditRepository.query() in
the three TestAuditRepository tests that still treated it as a list.

* docs: CHANGELOG entry for Activity Center MVP

* chore: refresh stale module docstring in app/api/activity.py

* feat(cli): agnes admin activity — terminal access to Activity Center (timeline + health + sync)

* fix(db): _v39_to_v40 — add IF NOT EXISTS guard for 'action' column

The v39→v40 ladder step adds defensive ADD COLUMN IF NOT EXISTS for
every audit_log column so a hand-rolled bare audit_log (id only) is
safe through the ladder. 'action' was missing from the guard list,
causing CREATE INDEX idx_audit_action_time to fail on tests that
stub audit_log with only an id column (tests/test_e2e_extract.py::
TestSchemaMigration::test_migration_preserves_and_extends).

Local 6/6 schema tests + the previously-failing CI test pass.

* docs(spec): platform telemetry epic — Boss directive + Activity Monitoring plan rebased onto v40 (stacked on zs/spec-activity-center)

* feat(db): schema v41 — 7 usage_* tables for telemetry (events, summary, rollups, attribution)

* chore(db): tighten v41 — usage_session_summary.session_id NOT NULL + upgrade test asserts all 7 tables

* feat(usage): UsageAttributionRepository — replace/delete/lookup over usage_attribution_* tables

* refactor(marketplace): extract list_inner_skills/agents/commands to src/marketplace_listing.py for reuse

* feat(usage): explode plugin attribution on marketplace sync + store entity write; backfill script

* refactor(marketplace): finish src/marketplace_listing.py extraction — drop duplicate _list_inner_* + _parse_frontmatter from app/api/marketplace.py

* feat(usage): promote attribution helpers to src/usage_attribution_helpers.py; hook update_entity rename + bundle-swap; clarify best-effort semantics

* feat(usage): UsageProcessor real extraction + rollup rebuild + 10 fixture-driven tests

* fix(usage): include tool_id in event hash + executemany + rollup transaction (critical multi-tool-turn drop fix)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(marketplace): popularity stats — invocations_30d + trend + sort=most_used|trending + Most Popular section

* feat(admin): /admin/users/<id> Sessions section — list + single-file + bulk-zip downloads (audit-logged)

* feat(usage): admin export endpoint + CLI — csv/json/parquet streaming, filters, audit-logged

* feat(usage): agnes admin ask — LLM Text-to-SQL over usage_events with SELECT-only validator (audit-logged)

* feat(usage): reprocess + prune endpoints + scheduler daily prune job + CLI

* docs: PLATFORM_SETUP.md operator playbook + HOWTO/ cookbook (5 guides + index)

Adds docs/PLATFORM_SETUP.md as a consolidated operator playbook covering
bootstrap, TLS, marketplaces (curated + flea), scheduler env vars, telemetry
extraction/export/ask/prune, privacy posture, and daily routine.

Adds docs/HOWTO/ with 5 analyst cookbook guides: first query, snapshots for
remote tables, private sessions, feedback + admin ask, and customizing skills.

Existing setup docs (QUICKSTART, DEPLOYMENT, ONBOARDING, HEADLESS_USAGE)
get a one-line cross-reference at the top pointing to PLATFORM_SETUP.md.

* docs(changelog): platform telemetry epic — usage_* foundation + surfaces + admin access + docs

Comprehensive [Unreleased] entry covering: usage_events/session_summary/
tool_daily/plugin_daily tables (v41), attribution lookup tables, backfill
script, marketplace Most Popular + invocation chips + sort, admin Sessions
section, export/ask/reprocess/prune endpoints + CLI mirrors, Activity Center
(v40), PLATFORM_SETUP.md + HOWTO/ docs, and operations notes for v41 upgrade.

* fix(security): block DuckDB read_*/http_*/glob functions in usage_ask validator + symlink escape guard in session zip + clarify mark-private semantics

* fix(admin): parquet export tempfile cleanup on COPY failure + correct processed-first sort on /admin/users/<id>/sessions

* feat(audit): close 8 production audit gaps — query (local/remote/hybrid), catalog/schema/sample, snapshot estimate/create, check-access

* feat(ui): /admin/usage summary dashboard + per-user activity tab on /admin/users/<id>

* fix(audit): cap error messages at 200 chars + audit user_activity reads + recursion guard on usage.summary

* fix(audit): catalog.list audits on error path + clean up deferred json import

* fix(ux): client_kind=cli for PAT auth + timeline empty state + email-instead-of-uuid + nav reorder + help text + loading indicators + ask doc

* feat(observability): unify /admin/activity into single page with saved views

- KPI cards (events, users, error rate, p95) clickable as quick-filters
- Faceted filter dropdowns populated from audit_log in the current window
- Sortable audit table, cursor pagination, per-row JSON side panel
- Saved views (schema v43: user_observability_views) — per-user state
- Top bar: window selector + 30s Live toggle + saved views dropdown
- /admin/scheduler-runs → 308 redirect (source=scheduler filter)
- New endpoints: /api/admin/observability/{facets,kpis,views}

* test: update activity + scheduler-runs tests for unified page

- test_admin_activity_page_renders asserts new structural anchors
- test_admin_scheduler_runs_page_admin_only asserts 308 redirect

* fix(observability): respect [hidden] on modal + side panel

CSS `display: flex` on .obs-modal beat the [hidden] attribute's UA
display:none, so the save-view modal rendered on page load and Cancel
clicks couldn't dismiss it. Gate the modal's flex layout on
:not([hidden]); add the same display:none guard prophylactically to
.obs-panel and .obs-views-panel.

* feat(observability): user enrichment in audit + interactive /admin/usage

Activity:
- /api/admin/activity now joins users for user_email + user_name per row
- User column renders "name (id-prefix)" or "email (id-prefix)" instead
  of an opaque truncated UUID; falls back to id when the user record is
  missing

Usage:
- /admin/usage rewritten as the same filter/group-by/search pattern as
  /admin/activity. Faceted dropdowns (User / Tool / Source / Event type)
  populated from usage_events; debounced free-text search across
  tool_name / skill_name / subagent_type / command_name
- New endpoints /api/admin/usage/{facets,kpis,query}; the query endpoint
  supports group_by in {day, username, tool_name, source, ref_id} with
  sort + offset pagination, plus an ungrouped raw-events mode
- 4 KPI cards (events, distinct users, distinct tools, error rate) are
  clickable quick-filters; clicking a grouped row applies the bucket as
  a filter
- Old static `?window=7d|30d|all` server preload removed; all state is
  client-side via since_minutes + group_by + filters in the URL

* fix(observability): clearer labels, all-column sort, drop saved views UI

- Rename page titles: "Activity" → "Server activity", "Usage" → "Tool usage"
  with a one-line subtitle on each explaining what the page covers and
  linking the other one. The two pages source different data (audit_log
  vs usage_events) and the previous labels conflated them.
- Drop the saved-views dropdown + save modal from /admin/activity. The
  modal pop-open bug was the trigger; the value wasn't there yet. The
  /api/admin/observability/views CRUD + DuckDB table stay in place.
- Rename "Live (30s)" to "Auto-refresh (30s)" with a tooltip clarifying
  that it's the re-fetch rate, not the time range. Time range now
  labeled "Time range" instead of "Window".
- All audit-table columns are sortable (User, Source, Action, Resource,
  Result added); sort is page-local with a Jinja comment explaining the
  trade-off. Same for raw usage rows.
- Fix duplicate sort-arrow bug — the literal "▼" in the Time th HTML was
  rendering alongside the CSS ::before arrow. Removed the literal; CSS
  is the single source of truth.

* feat(observability): global Sessions browser + transcript viewer + CLI

Web:
- /admin/sessions — list every collected session JSONL across all users
  with time-range, user, model, errors-only and free-text filters. Default
  sort surfaces error-heavy sessions first. KPI cards (sessions, distinct
  users, sessions w/ errors, tool error rate) clickable as quick-filters.
- /admin/sessions/<username>/<file> — transcript viewer rendering the
  JSONL chronologically: user prompts, assistant text, tool calls (with
  JSON input) and tool results (with flattened output). Errors get a red
  border + chip and a "Next error" navigation button at the top.
- Admin dropdown gains a "Sessions" link.

API:
- GET /api/admin/sessions/{list,kpis,facets} — filtered cross-user reads
  off usage_session_summary
- GET /api/admin/sessions/{username}/{file}/transcript — parses JSONL via
  the existing services.session_pipeline.lib, returns chronological events
- GET /api/admin/sessions/{username}/{file}/download — JSONL stream, same
  path-safety guards as the per-user endpoint, audit-logged

CLI:
- `agnes admin sessions list [--user X] [--errors] [--since 7d]` — table
  output with `!` prefix on rows that hit a tool error
- `agnes admin sessions show <username> <file>` — transcript dump, with
  `--errors` to print only the failed tool_result blocks
- `agnes admin sessions download <username> <file> [-o path]`
- `agnes admin sessions kpis` — top-level numbers

* feat(internal): expose telemetry tables to agnes query with row-level RBAC

Three new registered tables backed by system.duckdb, queryable through
the same /api/query plumbing analysts use for Keboola / BigQuery /
local sources:

  agnes_sessions  → usage_session_summary  (filter: username)
  agnes_usage     → usage_events           (filter: username)
  agnes_audit     → audit_log              (filter: user_id)

RBAC is per-row, not per-table: admins see every user's rows; non-admins
see only their own. The filter is built server-side from the auth user
dict; non-admin filter values are regex-validated before SQL interpolation.

Implementation:
- new connector connectors/internal/ with access (filter+exec) + registry
  (idempotent table_registry seed at startup)
- /api/query detects internal table refs and short-circuits to a CTE
  wrapper that prepends "WITH agnes_x AS (SELECT * FROM <src> WHERE …),
  …" then "SELECT * FROM (<user_sql>) AS _q". DuckDB cursor on the
  shared system.duckdb handle — opening parallel handles / ATTACH on the
  same file is blocked process-wide.
- mixing internal + BQ / registered local tables in one SELECT is
  rejected (v1 limitation)
- src.rbac.can_access_table waves internal tables through for all
  authenticated users; row scoping is the actual security control
- /api/v2/schema and /api/v2/sample gained internal branches; sample
  intentionally skips its cache because rows are RBAC-scoped per caller
- audit row written as action='query.internal' with is_admin flag

Tests: connectors/internal/access — RBAC, filter clause, schema, CTE
wrapper coexistence with user-supplied aggregations, unsafe-username
rejection. 16/16 passing.

Motivating queries this enables:
  SELECT tool_name, COUNT(*) FROM agnes_usage
   WHERE is_error GROUP BY 1 ORDER BY 2 DESC
   -- analyst self-introspection: which tools fail for me?

  SELECT user_id, COUNT(*) FROM agnes_audit
   WHERE action = 'session.transcript_view' GROUP BY 1
   -- admin: who's been looking at whose session transcripts?

* feat(admin): group dropdown into 5 named sections + internal tables in /catalog

Admin dropdown gains section headers so admins can land on the right
page without re-reading the full menu:

  Activity Center      Server activity / Tool usage / Sessions
  Users & Access       Users / Groups / Resource access / Tokens
  Data                 Tables
  Agent Experience     Curated Marketplaces / Flea Submissions /
                       Agent Setup Prompt / Agent Workspace Prompt
  Server               Server config

"Agent Experience" frames the curated content + prompts as one cluster
— it's all admin-controlled material that shapes what an analyst's AI
agent encounters. "Configuration" → "Server" since only one item lives
there now.

Renamed the section's first two items:
  "Activity" → "Server activity" (matches page H1)
  "Usage"    → "Tool usage"

Also fixes /catalog visibility of the internal tables (agnes_sessions /
_usage / _audit) for non-admin users: ``app.auth.access.can_access``
short-circuits to True for resource_type='table' + an internal-table id.
Without this, non-admins saw the tables in /api/v2/catalog (which uses
the same RBAC bypass) but not on the /catalog HTML page (which calls
can_access directly, requiring a resource_grants row internal tables
don't have).

CSS for `.app-nav-menu-section`: small caps, muted, non-clickable; first
section trims top padding so the panel doesn't open with an awkward gap.

* refactor(admin): move corporate memory into Admin > Agent Experience

Memory link was the only admin-only entry in the primary nav (gated by
session.user.is_admin). Moves it into the Admin dropdown under Agent
Experience, alongside Curated Marketplaces / Flea Submissions / Prompts
— all admin-curated content that shapes what an analyst's AI agent
encounters.

Renamed the nav label to "Shared Knowledge" to match what the page
actually is (admin-curated organisational knowledge from session
verification, surfaced to agents). URL stays at /corporate-memory; the
route still gates on require_admin per the existing comment.

Side effect: primary nav (Home / Marketplace / Data Packages) is now
uniform for every authenticated user — no conditional admin-only entry.

* ui: rename admin entries to Curated Knowledge / Init Prompt / Workspace Prompt

- "Shared Knowledge" → "Curated Knowledge" (parallel with "Curated
  Marketplaces" in the same Agent Experience section; "curated" tells
  the admin what they do there — review + approve)
- "Agent Setup Prompt" → "Init Prompt" (matches the `agnes init` flow
  it actually drives)
- "Agent Workspace Prompt" → "Workspace Prompt" (the "Agent" prefix
  was redundant — every item in the section is agent-facing)

Renames page titles + H1s on /admin/agent-prompt and
/admin/workspace-prompt to match.

* refactor: rename Usage → Telemetry across user-facing surfaces

External surfaces all switch; internal Python module / file names and the
physical DB tables (usage_events, usage_session_summary, usage_tool_daily,
usage_plugin_daily) stay — renaming them would force a schema migration
+ a redo of the LLM Text-to-SQL prompt for no analyst-visible win.

Changes:
- Admin dropdown: "Tool usage" → "Telemetry"
- Page H1 / <title>: same
- URL: /admin/usage → /admin/telemetry; old URL 308-redirects
- API prefix: /api/admin/usage/* → /api/admin/telemetry/*
- CLI: primary command `agnes admin telemetry …`; `agnes admin usage` kept
  as a deprecated alias so existing operator scripts keep working
- Internal data-source table id: agnes_usage → agnes_telemetry. The
  registry seed now evicts any stale internal-source row whose id no
  longer matches INTERNAL_TABLES, so the old `agnes_usage` row is
  removed from table_registry on next app boot
- All tests + JS endpoint paths updated

* test(rbac): include auto-appended internal tables in expectations

get_accessible_tables now appends agnes_sessions / agnes_telemetry /
agnes_audit to every authenticated user's accessible-tables list so the
internal data source shows up in /catalog. The two existing rbac tests
asserted hardcoded list shapes that pre-dated the change.

Rewritten to assert "granted tables + the canonical internal-table set"
instead of literal lists, so the test stays correct if the internal
table roster changes again later.

* ui: visual dividers between admin-dropdown sections

Adds a 1px top border + 6px top margin to every section header except
the first, so the five named groups (Activity Center, Users & Access,
Data, Agent Experience, Server) read as visually separated clusters.
The header itself stays small-caps + muted as before — the border is
additive.

* ui(memory): match obs-topbar visual on /corporate-memory

The Curated Knowledge page (linked from the admin dropdown's Agent
Experience section) opened straight into the stats bar — no title,
no subtitle, no shared chrome with the other admin pages. Adds an
obs-topbar-style header at the top of .container-memory:

  - H1 "Curated Knowledge"
  - subtitle explaining what the page is + how AI agents pull from it

The `.ck-*` class set duplicates the inline obs-* styles from
/admin/activity etc. for this one page; promoting the obs-* class set
to style-custom.css for shared reuse is the obvious next step (4 pages
already inline the same CSS), tracked as a follow-up.

Page <title> also renamed from "Corporate Memory" → "Curated Knowledge".

* ui(tables): list Agnes internal tables in /admin/tables + group in /catalog

/admin/tables previously rendered three per-source-type listings
(BQ / Keboola / Jira) and dropped any row whose source_type didn't
match — so the agnes_sessions / agnes_telemetry / agnes_audit rows
seeded into table_registry were invisible. Adds a fourth read-only
section "Agnes internal tables" that filters source_type === 'internal'
and renders the same registry-table layout the other sections use,
with two changes:

  - no Register button (these rows are seeded on every app boot from
    connectors/internal/registry.py)
  - Edit + Delete actions hidden (any change would be reverted on the
    next start). Manage access stays so admins can still inspect.

Mode badge picks up a new mode-internal CSS class (teal accent) so the
display doesn't lie and call it "local".

In /catalog, internal tables now group under an "agnes" accordion
section (bucket="agnes" on seed) instead of falling into the catch-all
"default". Single source of truth for which tables exist; admins find
them where they expect.

* ui(tables): Agnes internal as a 4th tab next to BQ/Keboola/Jira

Previous iteration mounted the internal-table listing as a separate
standalone card under the tab strip. Reshapes it to a proper
tab-content section so admins switch between data sources via one
consistent nav (BigQuery / Keboola / Jira / Agnes internal).

- New tab button "Agnes internal" in the tab-nav.
- The listing card becomes <section id="tab-content-internal"
  class="tab-content">; switchTab() already routes by id so no JS
  change beyond extending the hash allowlist for direct #internal
  links.
- Tab content keeps the read-only treatment from the previous commit
  (no Register button, no Edit / Delete in renderRegistryListing).

* ui: rename Curated Knowledge → Curated Memory

Settles the naming back on "Curated Memory" — parallel structure with
"Curated Marketplaces" in the same Agent Experience section, and zero
rename ripple: URL (/corporate-memory), API (/api/memory/*), CLI
(agnes admin memory), and Python modules all stay on "memory" so the
admin label finally lines up with the underlying surfaces.

The "Curated" prefix still tells admins what they do on the page
(review pending → approve / mandate / reject) and reads as a sibling
of "Curated Marketplaces" right next to it in the dropdown.

Touches: admin dropdown label, page <title>, page H1. DB tables stay
on knowledge_* (already the canonical naming for the data shape).

* ui: rename "Server activity" → "Audit log"

"Audit log" is what the page actually is — server-side audit_log table
rendered with KPI cards + filter bar + sortable table. The "Server
activity" label confused the term with Claude Code session telemetry
(Telemetry page) and didn't make the source/concept clear.

Touches:
- Admin dropdown nav label
- /admin/activity page H1 + subtitle
- /admin/telemetry subtitle cross-link
- test_activity_api page-renders assertion

URL (/admin/activity) and API (/api/admin/activity/*) stay — the
"activity" name has stuck at the route layer for a year; rerouting
those would churn dashboards/bookmarks for zero analyst-visible win.

* ui(admin-nav): gray band on each section header for clearer separation

Previous iteration used a 1px top border between section labels — the
labels still blended into the items above/below at a glance. Switches
to a light gray background band per section header, extended edge-to-
edge inside the panel via negative horizontal margins. Bolder
font-weight (700) reinforces the separation; bumping the font color
isn't needed because the band itself does the work.

First section's header tucks into the panel's top border-radius so the
band reaches the corners without a gap.

* ui(catalog): rename internal-table category to "Agnes Internal"

`bucket` is what /catalog renders as the accordion category header
verbatim — "agnes" lowercase didn't read as a real category name and
got confused with a system identifier. Bumps to "Agnes Internal".
Seed re-applies on every app boot so existing rows pick up the new
bucket value via `ON CONFLICT (id) DO UPDATE`.

* ui(catalog): split Agnes Internal into its own card on /catalog

Previously the three internal tables landed inside the "Core Business
Data" card under an "Agnes Internal" accordion alongside Keboola / BQ
buckets — readers conflated system telemetry with business datasets,
and the data_stats header counter ("3 tables · ~X rows total") only
ever counted synced rows so internal tables looked invisible.

Split the catalog page into two cards:
- Core Business Data: only non-internal source_types (Keboola, BQ,
  Jira). Accordions group by bucket as before. Stats counter reflects
  this card's tables.
- Agnes Internal: a dedicated card with its own visual treatment
  (teal accent matching the mode-internal badge in /admin/tables).
  Flat list (no accordion — only 3 rows, never grows here), each
  row carries the canonical `agnes query` snippet. Read-only — no
  profiler click, no In-stack toggle, no sync metadata.

Route adds `internal_card` context object; template renders the new
card only when it's non-None.

* fix(rbac): hide internal tables from /admin/access + drop "my" framing

Two related cleanups for the Agnes-internal tables:

1. /admin/access (resource grants) no longer lists them. The
   `can_access` check has a hardcoded internal-table bypass — security
   is row-level (per-request view filter), so a table-grain
   `resource_grants` row would do nothing. Surfacing them in the UI
   let admins set up grants that silently no-op. Filter at the
   `_table_blocks` projection so the UI tree never sees them.

2. Display names drop the analyst-perspective "my" framing:
     "Agnes — my sessions"          → "Agnes sessions"
     "Agnes — my telemetry events"  → "Agnes telemetry events"
     "Agnes — my audit log"         → "Agnes audit log"
   The "my" only makes sense from the querying analyst's seat
   (`SELECT … FROM agnes_sessions` returns *their* rows); on /admin/*
   pages where admin sees / configures them across users, the
   pronoun was misleading. Description text now spells out the
   row-level RBAC contract explicitly.

Display names update via TableRegistryRepository.register's ON CONFLICT
UPDATE on next app boot; no manual cleanup needed.

* ui: subtitle notes about agnes_* tables on each Activity Center page

The recursive observability story — Agnes serves its own audit /
telemetry / session data through the same `agnes query` plumbing
analysts use for business data — wasn't surfaced anywhere on the
admin pages that show that data. Three pages get a one-liner with
the canonical `agnes query` snippet + the RBAC contract (analysts
see their own rows, admin sees all):

- /admin/activity (Audit log)   → agnes_audit
- /admin/telemetry (Tool usage) → agnes_telemetry
- /admin/sessions               → agnes_sessions

Sets up the discovery moment for admins: they're reading the page,
they see "you can query this from Claude Code", they remember it
when an analyst asks "how do I find my own failed tool calls?".

* ui(tables): explain "Show log" empty-state on /admin/tables

Cache warmup log <pre> renders with a dark background and is only
populated by the SSE stream during a Re-warm all run. Opening the
page cold + clicking Show log just revealed a black bar with no
context — admins couldn't tell what they were looking at.

Adds an inline paragraph above the <pre> explaining what the log is,
the row format, when it fills in, and where to find the historical
audit trail (/admin/activity). The actual <pre> stays empty until
SSE events arrive, but the surrounding copy carries the meaning.

* ui(tables): auto-open cache-warmup log on Re-warm all click

A Re-warm all run takes ~24s per remote BQ row. With the <details>
collapsed by default, operators saw the button disable, watched a
quiet ~24s pass, and assumed nothing had happened — the streaming
log was hidden behind a closed disclosure.

Two small JS tweaks:
- cacheWarmupRun() opens the details on click, so streamed lines
  appear without an extra interaction
- cacheWarmupOnStart() hides the inline hint paragraph the moment
  real log content lands, so the dark log block isn't competing
  with redundant context

Hint paragraph also clarifies that only `query_mode='remote'` BQ
rows are warmed — operators with only materialized/internal tables
would see total=0 and the page would "do nothing" by spec.

* ui: trim Agnes internal copy across surfaces

Descriptions had grown to explain the extraction pipeline ("parsed
out of session JSONLs"), the underlying table ("Backed by
usage_session_summary"), the RBAC mechanic ("row-level RBAC at query
time — analysts see their own; admin sees all"), and the SQL snippet.
Every implementation detail meant another rewrite on the next iter.

Strips to one stable line per surface: what the data is, plus
"Also available locally for analysis". Mechanics live in code +
docs; the page copy says what the user needs to know.

Touched:
- connectors/internal/access.py: INTERNAL_TABLES descriptions
- activity_center.html / admin_usage.html / admin_sessions.html
  subtitles
- catalog.html Agnes Internal card description + row strip
- admin_tables.html "Agnes internal" tab hint

* fix(internal): is_user_admin arity bugs + + saved-view payload cap

Round-1 code review (PR #278) caught two blocking bugs and three nits.

Blocking — both `is_user_admin(user)` (single dict arg) calls raised
TypeError. is_user_admin signature is `(user_id, conn)`. Affected:

- app/api/query.py:_run_internal_query — every POST /api/query that
  references agnes_sessions / agnes_telemetry / agnes_audit blew up
  with a 500. The headline analyst-facing feature of this PR was
  unusable through the API.
- app/api/v2_sample.py — same shape; `GET /api/v2/sample/agnes_*`
  returned 500.

Both fixed to call `is_user_admin(user.get("id"), conn)`. Added two
FastAPI-level tests in test_internal_data_source.py that go through
the TestClient — the existing unit tests on `execute_internal_query`
and `build_filter_clause` skipped the request-handler layer where the
bugs lived, which is why this landed.

Nits also closed:
- connectors/internal/access.py: `+` allowed in _USERNAME_RE /
  _USER_ID_RE so RFC 5321 email local-parts (alice+test@x) resolve
  correctly without hitting InternalAccessError.
- app/api/observability.py: saved-view payload capped at 64 KiB to
  prevent an admin from bloating system.duckdb with a malformed save.

* fix(security): close non-admin data-leak via underlying-table refs

PR #278 R2 review surfaced a non-admin-exploitable bypass: SQL whose
string literal contains 'agnes_sessions' routed into the privileged
internal-query path, then queried the underlying physical table
(usage_session_summary / usage_events / audit_log) directly, escaping
the CTE wrapper's row filter. Two reinforcing defenses:

1. find_internal_refs() now strips single-quoted string literals
   before scanning for alias names — a literal alone no longer
   routes the request into the privileged code path.

2. execute_internal_query() rejects non-admin SQL that references
   the underlying physical tables (usage_*, audit_log). The CTE
   wrapper only scopes the agnes_* aliases; a direct FROM on the
   base table — or a shadowing inner WITH that still has to read
   the base table — bypasses RBAC. Block before execution with an
   actionable error pointing to the agnes_* alias. Admins are
   unaffected (god-mode short-circuit on the filter clause).

3. tests/test_internal_data_source.py — three new negative tests
   covering literal-only matches, direct-table refs, and CTE
   shadow attempts.

Also tightens usage_ask.py's SELECT-only validator: pragma_table_info,
pragma_storage_info, pragma_database_*, and duckdb_tables / columns /
views / indexes / schemas are reflection functions that leak metadata
the analyst question shouldn't reach. \bPRAGMA\b in _FORBIDDEN never
matched the function-call form (word-boundary between `A` and `_`).

* fix(security): dynamic denylist for non-admin internal queries

R3 review (PR #278) caught a wider data-leak than R2: the underlying-
physical-table guard listed only the 7 usage_* + audit_log tables,
but system.duckdb has 30+ other sensitive tables — users (emails +
ids), personal_access_tokens, resource_grants, user_groups,
user_observability_views, store_*, marketplace_*, knowledge_*, etc.
A non-admin SQL like

    SELECT * FROM agnes_sessions
    UNION ALL SELECT email, id, … FROM users LIMIT 1

would leak every user's row.

Replaces the hardcoded denylist with a **dynamic allowlist** —
non-admin SQL may reference ONLY the registered agnes_* aliases.
Every other table in `information_schema.tables` (main schema) is
rejected. Future migrations that add a new sensitive table are
automatically covered without re-editing this module.

Also strips SQL comments (`/* */` and `--`) before the identifier
scan so a comment-wrapped table name (`/**/users/**/`) can't slip
past the regex.

Four new negative tests pin: `users`, `personal_access_tokens`,
block-comment wrap, line-comment wrap.

Plus: per-user view-count cap (100) on /api/admin/observability/views
so an admin can't fill system.duckdb with thousands of saved views.

* release: 0.54.0 — Activity Center + Telemetry + Sessions + internal datasource

Cuts the work shipped across this PR (Activity Center build, recursive
internal data source) into a versioned release. Bumps pyproject.toml
to 0.54.0; renames the top of CHANGELOG.md from [Unreleased] to
[0.54.0] — 2026-05-12 with a header summary; opens a fresh
[Unreleased] section for the next round.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-12 22:41:19 +02:00
Vojtech
fb6e930bc9
feat(store-guardrails): per-component description quality + plain-language UX (#276)
* feat(store-guardrails): enforce per-component description quality

Two-tier hard guardrail on flea-market submissions. Empty / placeholder /
single-word descriptions now block before any LLM call; vague-but-passes-
floor descriptions block on the substantive LLM review layer.

Tier 1 — inline mechanical check (src/store_guardrails/content_check.py).
Walks the baked plugin tree, evaluates each component (plugin manifest,
agents, skills, commands) plus the submission-level form description
against a 60-char / 25-char (commands) / 5-distinct-word / 200-char-body
floor with a placeholder denylist (TODO, TBD, {{var}}, etc.). Floors
calibrated against real ecosystem norms: Claude / superpowers /
compound-engineering skill packs cluster 150–220 chars, npm / Docker /
VS Code at 100–120. InlineResult.passed now ANDs in content.status.

Tier 2 — LLM review extension (prompts.py + llm_review.py). System
prompt gains a content-quality criterion; REVIEW_JSON_SCHEMA carries a
content_quality {verdict, issues[]} object alongside the existing
security findings. is_safe() requires content_quality.verdict == 'pass'.
Single LLM call covers both dimensions. MAX_RESPONSE_TOKENS bumped
2000 → 2500 for the extra payload. Verdicts missing content_quality
treated as pass (backwards compat with already-recorded rows).

Submitter UX:
- /store/new wizard now carries a "Before you upload — what passes
  review" collapsible disclosure on both step 1 and step 2 with the
  bar + patterns that work. Live char counter on the description
  field. Per-component preview table (green/red dots from the new
  summarize_for_preview helper) renders after the ZIP /preview round
  trip, scoping each finding to its file.
- New /store/examples page with rejected/passes pairs for skill /
  agent / plugin / command plus a "Why these limits" research table.
  Anchored sections (#skill / #agent / #plugin / #command) so the
  rejection banner can deep-link by component_type.
- Quarantine banner _content_findings.html groups findings by file
  (one "See <type> example ↗" per component, not per field) and
  translates field codes (frontmatter.description / body / etc.) to
  plain-English labels. _content_howto_fix.html surfaces a static
  "Re-upload as new version" + "See examples" action row beneath any
  content failure on the entity detail page.
- _parse_frontmatter moved to src/store_guardrails/_frontmatter.py so
  the new check module shares the parser without inverting the
  app → src dependency direction.

Tests:
- New tests/test_store_guardrails_content.py (29 cases) covering
  every failure code per component type plus submission-level checks
  and the summarize_components / summarize_for_preview helpers.
- Extended test_store_guardrails_inline.py for the new
  InlineResult.content field + aggregate behaviour.
- Extended test_store_guardrails_llm.py for the new
  content_quality verdict pathways (fail blocks, missing field passes).
- Backfilled fixture descriptions across test_store_api.py,
  test_store_entity_versions.py, test_store_put_atomic.py,
  test_admin_store_submissions.py, test_marketplace_api.py,
  test_marketplace_v32_endpoints.py so existing happy-path tests
  clear the new 60-char floor.

* fix(content-guardrail): align agents walker with preview + drop import-time .format()

Two cleanups from the takeover review on #276 (vr/guardrails-content).

1) `_iter_components` for agents now skips files lacking frontmatter
   (no `name` AND no `description`). Pre-fix the walker greedily
   evaluated every `*.md` under `agents/` — `agents/README.md` and
   helper docs got flagged as "frontmatter.description empty"
   rejections. Worse: `summarize_for_preview` for `type=agent` ALREADY
   filters the same shape, so the upload preview gave a green dot
   while the post-bake check gave a red rejection on submit. Two new
   regression tests in TestAgentsWalkerSkipsNonAgentFiles pin both
   shapes (README + _NOTES.md) so the preview/check parity stays
   aligned.

2) `body_too_short` hints now use the same runtime-kwarg substitution
   pattern as every other hint in the table. Pre-fix the skill +
   agent body_too_short hints called `.format(min_chars=_MIN_BODY_CHARS)`
   at module-load time, but the call site `_hint_for(type_,
   "body_too_short")` didn't pass `min_chars=`, so the format() was
   just baking the constant at import. Cosmetic inconsistency; pass
   `min_chars=_MIN_BODY_CHARS` at the call site instead and let
   `_hint_for` do the substitution like it does for `too_short`.

Verified end-to-end:
- New TestAgentsWalkerSkipsNonAgentFiles cases fail on the unfixed
  walker (verified by reverting to the pre-fix file and re-running);
  pass cleanly after the fix.
- Full content-guardrail suite: 25/25 (23 existing + 2 new).
- Full pytest: 4189 passed, 25 skipped.

* release: 0.53.5 — content guardrail (flea-market submitter UX) + catalog ENTITY column + BQ hint dispatch

Bundles three threads landed in [Unreleased]:
- Vojta's flea-market content guardrail (two-tier mechanical + LLM)
- Zdeněk's `agnes catalog` ENTITY column replacement for FLAVOR
- Zdeněk's `/api/query` remote_estimate_failed hint dispatch fix

Plus the takeover hygiene from #276 review (agents walker preview/check
parity + body_too_short hint runtime kwarg consistency) and the
backslash-escape fix follow-up to v0.53.4 #275.

No DB migration; no API change. Patch upgrade lands transparently.
Upload form's new "Before you upload" disclosure + per-component preview
table appear on the next dev-VM auto-pull. Quarantine banner now groups
findings by file with "See <type> example ↗" deep-links to the new
/store/examples reference page.

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-12 21:48:27 +02:00
Vojtech
79a958ec26
feat(setup): configurable instance brand + connector setup overhaul (#268)
- instance.brand (env AGNES_INSTANCE_BRAND, default "Agnes") +
  instance.workspace_dir replace hard-coded "Agnes" / "~/Agnes" across
  /home, /setup, /setup-advanced, /login, /install, /me/debug, and the
  Claude Code clipboard setup script. Terraform-friendly env override;
  defaults preserve existing Agnes branding.

- Explicit "create workspace folder" step on /home (OS-tabbed mkdir+cd)
  + same step baked into the clipboard script as step 2. Drops the
  implicit assumption that `agnes init --workspace .` lands in a
  sensibly-cd'd shell.

- Final "Restart Claude Code" step in the setup script (unconditional,
  between connectors and Confirm) so freshly-installed plugins, MCP
  servers, and SessionStart hooks load on the next Claude Code session.

- Asana reverted from hosted Remote MCP back to PAT + raw REST against
  app.asana.com/api/1.0. MCP envelope shape consumed ~5x tokens per
  call; the PAT path lets the agent read flat REST fields. Existing
  MCP registration is detected and the user is asked whether to remove
  it (default Y, with benefits listed: token cost, no third-party hop,
  no OAuth refresh dance, deterministic envelope shape).

- Atlassian connector instructs picking the longest API-token expiry
  (today "1 year") to cut re-mint friction. No public query-parameter
  hook exists on id.atlassian.com to pre-select expiry, so the prompt
  documents the manual click and acknowledges that limitation.

- Uniform  /  per-connector marker contract (Asana, GWS, Atlassian)
  for the Confirm summary to grep. Each connector now ends with a
  Claude-driven end-to-end test that uses Claude Code's own bash to
  exercise the stored credential and prints
  " <Connector> integration verified — ..." (or the failure variant).
2026-05-12 17:10:08 +02:00
Vojtech
9ae2dd19fe
fix(memory-admin): pass RBAC user_groups (not YAML config) to /corporate-memory/admin (#253)
* fix(memory-admin): pass RBAC user_groups (not YAML config) to /corporate-memory/admin

`GET /corporate-memory/admin` was passing `corporate_memory.groups` from
instance.yaml (a dict, default `{}`) as the template's `groups=`. The
template's `renderItemCard` does `GROUPS.map(g => ...)` to build the
mandate-form audience picker, which throws `{}.map is not a function`
the moment any pending item renders. The thrown TypeError bubbles up
through `renderReviewItems` and gets swallowed by the `loadReviewQueue`
catch block, which paints "Error loading pending items." over a
perfectly valid `/api/memory/admin/pending` response.

Bug was dormant since the initial system commit because `renderItemCard`
only runs when at least one pending item exists, so test fixtures and
empty queues never tripped it.

Mandate form actually targets RBAC user_groups (audience strings like
`group:Admin`), not the unrelated `corporate_memory.groups` YAML
section. Route now passes the user_groups list shaped as
`[{name, members_count}]`. Template additionally guards the `.map`
call with `Array.isArray(GROUPS) ? GROUPS : []` so a future shape
regression degrades to "no group options" instead of crashing the list.

No DB migration; no API change.

* test(memory-admin): assert /corporate-memory/admin renders GROUPS as array

Server-side regression for the bug Vojta's previous commit fixed: the route
used to pass `corporate_memory.groups` (a dict, default `{}`) into the
template as `groups=`, so `const GROUPS = {{ groups | tojson }}` rendered
as `{}` and `GROUPS.map(...)` inside renderItemCard threw at runtime.

The bug was dormant because renderItemCard only runs when at least one
pending item exists — empty queues never tripped it. Test seeds one
pending knowledge_item and asserts the rendered HTML contains
`const GROUPS = [` (array prefix), so any future regression that flips
the variable back to a dict shape fails immediately rather than waiting
for an operator to seed the queue and notice the broken admin page.

* release: 0.51.1 — corporate-memory admin pending-banner hotfix

Patch bump: ships the /corporate-memory/admin GROUPS-shape fix
(dict → array of {name, members_count}) and its server-side regression
test. No DB migration, no API change, no operator action required —
upgrades land transparently.

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-12 13:16:01 +00:00
Vojtech
c09c85d13a
fix(cta): clipboard fallback + fold Atlassian MCP into connectors (#249)
* fix(cta): fall back to textarea+execCommand when Clipboard API rejects

The "Setup a new Claude Code" CTA fetches /auth/tokens, parses the JSON
response, renders the setup script, THEN calls
`navigator.clipboard.writeText()`. Modern browsers (Safari, Firefox, and
Chrome on stricter configurations) reject `writeText` with
NotAllowedError when transient user activation has been consumed by an
intervening `await` — which is exactly the case here. Users perceived
this as "the browser blocked the copy" and got the manual-paste fallback
modal even though the textarea + `document.execCommand('copy')` path
WOULD have worked synchronously without needing fresh user activation.

`copyToClipboard` now:
- prefers the modern Clipboard API (unchanged for the happy path)
- on writeText rejection, falls back to `copyViaTextarea` instead of
  surfacing the rejection to the caller's catch block.

`copyViaTextarea` is the previously-inline textarea fallback factored
out into a named helper, with two small hardening touches:
- `readonly` + `tabindex=-1` so the hidden textarea doesn't steal
  focus or pop the virtual keyboard on mobile.
- explicit `setSelectionRange(0, text.length)` to belt-and-braces the
  selection on iOS Safari (where `.select()` alone sometimes selects
  zero chars on touch-focused textareas).

Only the CTA button needed this — the Step-1 install-command and the
connector-copy buttons all call `writeText` synchronously inside the
click handler (no awaits in between), so they keep their existing
user-gesture context and didn't hit the same rejection. No template
changes there.

* refactor(home): fold Atlassian MCP registration into connectors block

The standalone "Register the Atlassian MCP server" step (was step 6 in
the unified setup script) moves INTO the Atlassian connector's prompt
body so all Atlassian-related setup lives in one logical group. Same
intent that #247 carried for connectors, applied one level deeper:
the hosted Remote MCP registration is part of "set up Atlassian", not
its own ungrouped step.

What changed:
- `app/web/connector_prompts.py` — the Atlassian prompt's step 5
  replaces the speculative "Register the on-demand Atlassian MCP under
  .claude/mcp/atlassian" line with the actual hosted Remote MCP
  registration: `claude mcp add --transport sse atlassian
  https://mcp.atlassian.com/v1/sse || true`. The `|| true` keeps re-runs
  idempotent and the body explains the OAuth-on-first-use contract.
  Both /home's Atlassian tile and the inlined setup-script Atlassian
  sub-block emit this line — single source of truth holds.
- `app/web/setup_instructions.py` — `_mcp_servers_block` deleted; the
  `mcp_servers` step is removed from `_step_numbers`; resolve_lines no
  longer calls it.
- Renumbering: install (1), init (2), catalog (3), preflight (4),
  marketplace (5), diagnose (6), connectors (7), confirm (8). Was:
  6 = mcp_servers, 7 = diagnose, 8 = connectors, 9 = confirm.
- `tests/test_setup_instructions.py` — Confirm step 9→8, Connect 8→7,
  diagnose 7→6, mcp_servers references dropped.
  `test_step_numbering_with_connectors_step` now asserts
  `"mcp_servers" not in steps`. Stray-Confirm assertion lists shift
  by one position.
- `tests/test_setup_page_unified.py` + `tests/test_web_ui.py` — same
  step-number shifts in the rendered /setup preview assertions.

The `claude mcp add` line is still the Atlassian Remote-MCP path that
the 2026-05-10 init-report Fix C added — only its position in the
flow changes. /home Atlassian tile copying continues to install the
MCP too (the prompt body the tile pastes contains the same line).
112 tests pass.

* feat(atlassian): operator-overrideable base URL via AGNES_ATLASSIAN_BASE_URL

Adds an env var / YAML key the operator (Terraform module, customer-VM
template, OSS instance.yaml) can set to bake the Atlassian Cloud site
root into the connector prompt — so end users don't have to guess /
paste their org's `https://<myorg>.atlassian.net`.

When set, the Atlassian connector prompt (rendered on both /home tile
and inlined into the setup-script step 7 Atlassian sub-block) replaces
step 1's "Ask me for my Atlassian Cloud site URL and email" with a
one-line note that the URL is already provisioned by the operator and
asks only for the email. Step 4's helper-script body has the
`BASE_URL='<the site URL I gave you>'` placeholder substituted with
the literal value. When unset (empty), the existing "ask the user"
flow remains — no regression for OSS instances.

Resolution + normalization in `get_atlassian_base_url()`:
- env `AGNES_ATLASSIAN_BASE_URL` > yaml `instance.atlassian.base_url` > ""
- strips trailing slash + trailing `/wiki` so the canonical value is
  the bare site root. Matches the per-user helper script's
  normalization at storage time (atlassian_prompt step 4 guard 2), so
  the literal baked in by the operator stays consistent with what the
  user's helper script would have computed from their input.

Plumbing:
- `app/instance_config.py`: new `get_atlassian_base_url()` resolver.
- `app/web/connector_prompts.py`:
  - `atlassian_prompt(*, base_url: str = "")` — string-replace two
    explicit placeholder phrases when base_url is truthy; otherwise
    return the prompt unchanged.
  - `all_connector_prompts(..., atlassian_base_url: str = "")` —
    forwards the kwarg.
- `app/web/router.py` (`_build_context`): reads
  `get_atlassian_base_url()` and passes it through to
  `all_connector_prompts(...)` so both the /home tile context AND the
  inlined-script `resolve_lines(...)` call use the same value.
- `src/welcome_template.py` (`compute_default_agent_prompt`): same
  threading via the existing import-on-demand path.

Tests (`tests/test_home_route_resolution.py`):
- `get_atlassian_base_url` resolver: default empty, env override,
  trailing-slash strip, trailing-`/wiki` strip.
- `atlassian_prompt(base_url=...)`: literal URL baked in, ask-step
  removed, placeholder replaced, operator-baked-in copy appears.
- `atlassian_prompt(base_url="")`: existing ask-the-user flow
  unchanged.
- `all_connector_prompts(atlassian_base_url=...)`: kwarg threads
  through to the rendered atlassian prompt.

135 tests pass.

* feat(asana): register hosted Asana Remote MCP in connector prompt

The Asana connector prompt only stored a PAT in the OS keychain + ran
a curl verify against /api/1.0/users/me. That set Claude Code up for
direct `curl` calls but didn't actually wire Asana into Claude's tool
list — so the user couldn't ask Claude to "find my open Asana tasks"
and have it work. Symmetric oversight to the Atlassian connector's
original speculative `.claude/mcp/atlassian` line that this branch
already replaced with `claude mcp add --transport sse atlassian
https://mcp.atlassian.com/v1/sse`.

Adds a new step 5 that registers Asana's hosted Remote MCP:

  claude mcp add --transport http asana https://mcp.asana.com/mcp || true

This is the V2 endpoint (streamable HTTP transport, launched February
2026). The V1 SSE endpoint at https://mcp.asana.com/sse was deprecated
2026-05-11 (today) and must NOT be used — calling it out explicitly
in the prompt body so a future operator who finds an old reference
doesn't paste the dead URL. OAuth is handled by Claude Code at first
use, same model as the Atlassian MCP step.

The PAT stored in step 3 stays for direct `curl` calls (precheck +
ad-hoc scripts) — the MCP path uses its own OAuth grant, not the PAT.
Old step 5 (revoke instructions) renumbers to step 6 and adds the
`claude mcp remove asana` cleanup hint.

Same single-source-of-truth invariant holds: /home Asana tile + the
inlined Asana sub-block in the setup script (step 7 connectors) both
emit identical text from `asana_prompt()`.

71 tests pass.

* feat(asana): drive MCP OAuth login + end-to-end validation post-register

`claude mcp add --transport http asana ...` only registers the
server in Claude Code's local config — it does NOT trigger OAuth.
The browser tab opens the first time any `mcp__asana__*` tool gets
invoked. So the previous step 5 left a user looking at a "registered"
MCP that, in practice, hadn't authed yet and would fail on first
real use. Same blind spot Atlassian's prompt also has, but Asana was
the one called out in the latest review pass.

Adds a new step 6 between MCP registration (step 5) and the revoke
instructions (now step 7):

  a. Tell the user verbatim what's about to happen — a low-impact
     read through the MCP will pop the OAuth browser tab; sign in
     with the same account whose PAT they stored in step 3 and
     approve. Frames the OAuth as one-time so users don't wait
     for it on every later call.
  b. Drive an actual MCP read. Don't prescribe the exact tool name
     because the Asana MCP's exposed surface (`mcp__asana__*`) is
     versioned upstream and we don't want to pin to a name that
     gets renamed. Instead: tell Claude to pick the lightest read
     from its surfaced tool list (users-me / list-workspaces /
     equivalent). Document the recovery path when Claude Code
     times out waiting for the OAuth tool use: `claude mcp list`
     to confirm registration before retrying.
  c. Print a single one-line proof that combines wiring + auth:
     "Asana MCP connected as <name> — <N> workspace(s) visible."
     Explicit anti-echo callout for tokens, task content, comments.
     On failure, surface the exact Claude-Code error and stop —
     no silent pass.
  d. Sanity-check that the MCP OAuth identity and the PAT identity
     reference the same Asana account. Easy mistake to make when
     the user has multiple Asana accounts — flag only on mismatch,
     keep quiet when they match. Recovery: `claude mcp remove asana
     && claude logout asana` then redo step 5.

Step 7 (revoke) absorbs both the keychain delete + the
`claude logout asana` line so users have a single place to undo
everything.

43 tests pass.

* fix(init): clear stale CA env vars on Windows before any TLS handshake

Reported by the 2026-05-11 Windows test pass: after `agnes init` the
gws connector failed with `UnknownIssuer` TLS errors because
`SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE` were still set in Windows
User scope pointing at `C:\Users\localadmin\.config\agnes\ca-bundle.pem`
— a file that did not exist on the test host. Past Agnes installs
(the setup-prompt trust block + older bootstrap helpers) write those
pointers when they materialize a combined Agnes-CA bundle; when the
bundle file later disappears (re-init on a new VM, machine swap, the
~/.agnes dir wiped), the pointers go stale and every native Windows
TLS handshake fails before Agnes itself runs. SSL_CERT_FILE in
particular REPLACES (not appends to) the trust store, so a stale
pointer is silently catastrophic.

`agnes init` now clears stale pointers in two layers before the first
server roundtrip:

1. Current-process env (os.environ) — what the immediately-following
   `api_get` to /api/catalog/tables actually reads. Without this, init
   itself blows up before it gets to step 2.
2. Windows User-scope env via PowerShell
   `[Environment]::SetEnvironmentVariable(name, $null, 'User')` — what
   every future shell + every native tool (gws, claude.exe, pip, uv)
   inherits. The 2026-05-11 reporter expected this exact cleanup
   ("init was supposed to clear these but they persisted").

The cleanup is best-effort and conservative:
- Only deletes a var when its value points at a path that does NOT
  exist on disk. Intentional operator config (e.g. SSL_CERT_FILE
  pointing at a corp certifi bundle) stays put.
- PowerShell missing / restricted execution policy / WSL-without-pwsh:
  swallowed silently. The current-process leg still runs, which
  unblocks init even on hosts where the User-scope leg cannot fire.

Tests (`tests/test_init_ca_cleanup.py`, 6 cases):
- Stale pointers → removed from process env.
- Real-path pointers → preserved.
- Non-Windows hosts: PowerShell is not invoked.
- Windows hosts: PowerShell IS invoked with a script that checks
  all three vars + uses Test-Path + SetEnvironmentVariable.
- PowerShell FileNotFoundError: cleanup swallows it, does not raise.
- `_is_windows_host()` reflects sys.platform.

* refactor(asana): MCP-first flow — drop PAT storage, precheck via `claude mcp list`

The Asana hosted MCP at https://mcp.asana.com/mcp authenticates via
OAuth (Claude Code holds the grant; browser tab pops on first tool
use). The earlier prompt walked the user through creating + keychain-
storing an Asana Personal Access Token AND registering the MCP — two
parallel auth surfaces for one connector. Once the MCP works, the PAT
has no consumer: the precheck/verify steps that used `curl
$BASE/api/1.0/users/me` are just redundant proof that Asana itself is
reachable, which the OAuth handshake already establishes.

Removed:
- Step 0 keychain probe + curl verify against /users/me with PAT.
- Step 1 open developer-console / create PAT.
- Step 2 click "+ New access token", warn shown-ONCE.
- Step 3 helper-script for keychain-storage (per-OS bodies: macOS
  `security add-generic-password`, Linux `secret-tool store`, Windows
  `cmdkey /generic`).
- Step 4 PAT-side `users/me` verify.
- Step 5's split that kept the PAT around for direct curl scripts.
- Step 6d's "MCP vs PAT identity sanity check" — there is no PAT
  anymore, nothing to mismatch against.

New flow (3 steps total):
- Step 0 precheck: `claude mcp list | grep ^asana` — if found, the
  server is registered AND Claude Code is holding its OAuth grant
  (otherwise prior failure would have removed it); print
  "Asana MCP already registered — skipping setup" and stop. Tells the
  user the explicit reset command (`claude mcp remove asana && claude
  logout asana`) so a re-register stays one paste.
- Step 1: `claude mcp add --transport http asana
  https://mcp.asana.com/mcp` — no `|| true` because step 0 should have
  caught the "already exists" case. Step explains the V2-vs-V1
  endpoint distinction (V1 SSE deprecated 2026-05-11) and the
  abort-clean recovery if the precheck somehow missed the existing
  server.
- Step 2: same OAuth + low-impact-read validation pattern as before.
- Step 3: revoke instructions (mcp remove + logout + Asana-side app
  revoke at app.asana.com/Settings → Apps).

Both surfaces (the /home Asana tile and the inlined Asana sub-block
in the setup script's step 7) emit the new text from the same
asana_prompt() — single-source-of-truth invariant intact.

77 tests pass.
2026-05-11 21:54:51 +02:00
Vojtech
a46b9dc928
/home install-hero polish: license link contrast, auto-mode reorder, Shift+Tab guidance (#243)
* Make /home install-hero links readable against blue background

The Claude license-options link added in the previous commit inherited
the default `<a>` style (`var(--hp-primary)` blue), which renders as
blue-on-blue and is unreadable inside the blue install-hero. Add a
scoped `.install-hero a` rule that uses white with an underline
(matching the existing lead-paragraph contrast pattern) so any link
nested in the hero stays legible.

* Reorder /home install flow: auto-mode is now Step 2, Agnes install becomes Step 3

Step 3 (was Step 2) pastes a ~20-command bash bootstrap into a fresh
Claude Code session. Without auto-mode enabled first, each Bash/edit
command needs a manual approve click — bad UX for first-time users.

Move auto-mode from the outside-hero `<details>` reference block into
the install-hero as a real Step 2, between "install Claude Code" and
"install Agnes". Content is the persistent `acceptEdits` snippet
(write to ~/.claude/settings.json) plus a one-liner pointing at
Shift+Tab for users who are already inside a running Claude Code
session. YOLO mode for full Bash auto-approve stays on
/setup-advanced behind the existing link.

The outside-hero `setup-collapsible[data-section="step3"]` block is
dropped — auto-mode is no longer reference content, it's a real
install step, and duplicating it would just diverge over time.
Onboarded users no longer see the auto-mode block at all (consistent
with Steps 1 + 3 also hiding post-onboarding).

Completion banner copy updated: "Step 1, 2 & 3 done — Claude Code
installed, auto-mode set, Agnes ready". Dashboard CTA partial and
other templates don't reference step numbers for this flow, so no
adaptation needed there.

* Simplify /home Step 2 to Shift+Tab only — drop the JSON snippet

Operator pointed out two issues with the prior Step 2:

1. The settings.json snippet is redundant. Claude Code's first
   Shift+Tab cycle to auto-accept mode already prompts the user
   whether to persist it as default — Claude writes the config
   itself, no manual file edit needed.

2. The snippet only showed the POSIX path `~/.claude/settings.json`,
   which doesn't translate to native Windows.

Replace the snippet + copy button with a plain Shift+Tab instruction,
explicitly call out the first-time "make this the default?" prompt,
and note that Claude handles the config write itself — same flow on
macOS / Linux / WSL / Windows. Adds a fallback line for users who
already closed the post-OAuth session.

* Tighten /home Step 2 install-note to two paragraphs

Operator: drop the 'Claude writes the setting itself, so this works
the same on macOS / Linux / WSL / Windows...' line plus the
'auto-approves file edits going forward; Bash commands stay gated
— that's the safe default' line. Both were filler — the make-default
prompt already implies persistence, and gated Bash is the obvious
default users won't be surprised by.

Result: paragraph 1 carries Shift+Tab + first-time make-default
say-yes + closed-session fallback in one breath; paragraph 2 keeps
the verbatim YOLO link. Same affordances, less vertical space.
2026-05-11 16:46:58 +00:00
minasarustamyan
9de679c714
System plugins (schema v39) + marketplace UX polish + drop legacy pages (#241)
* System plugin tier with mark/unmark fanout (schema v39)

Adds a mandatory plugin tier so admins can pin a small set of curated
plugins into every user's stack from day one. Marking a plugin via the
new toggle on /admin/marketplaces materializes resource_grants for every
group and user_plugin_optouts subscriptions for every user, so the
existing resolver pulls the plugin into every served set without a new
filter layer. Hooks on user-create (Google OAuth, magic-link, admin
POST, scheduler) and group-create propagate the same materialization to
new principals. UI locks: /admin/access disables the checkbox with a
SYSTEM pill; /marketplace cards swap the "In stack" green pill for an
amber "Required" badge with shield icon; the plugin detail install
button reads "Required by your org"; /my-ai-stack toggle is disabled.
Bypass paths return 409 (DELETE /api/admin/grants for system grants,
PUT /api/my-stack/curated/.../{enabled:false}, DELETE
/api/marketplace/curated/.../install). Unmark only flips the flag —
materialized rows persist so admins curate cleanup at their leisure
through the now-unlocked /admin/access checkboxes.

* Marketplace UX polish + drop legacy /store and /my-ai-stack pages

Two-part cleanup post-v39:

(1) Page deletion. /store and /my-ai-stack were already replaced by
/marketplace?tab=flea and /marketplace?tab=my respectively, but the
standalone routes lingered. Hard delete in dev mode — no redirects,
stale bookmarks 404. The /store/new upload wizard, the flea
detail/edit pages, the admin queue, and all /api/store/* +
/api/my-stack endpoints (CLI consumers) stay. Internal hardcoded
hrefs in the upload wizard's Cancel button and the advanced-setup
page repointed to the marketplace tabs.

(2) Detail-page install button rework. The single button that morphed
between "+ Add to my stack" and "✓ In your stack" did not
communicate uninstall affordance. The installed state now renders an
inline white status label *before* a separate red-bordered
"✕ Remove from stack" button on the same row, both at identical
height to avoid layout shift. System plugins keep their locked amber
"✓ Required by your org" pill (no Remove button — API refuses 409).
The post-action hint panel now fires on remove too with the title
flipped to "✓ Removed from your stack" — Claude Code needs the same
/update-agnes-plugins refresh either way.

Also: /admin/marketplaces Details modal "Mark as system" toggle
redesigned. The button was near-invisible (matched neutral row
metadata). It's now a balanced amber-toned chip with shield icon
and a structured confirm modal replacing the native confirm() dialog
that summarizes fanout consequences before commit.

* Move stack-hint inside hero with glass-on-gradient styling

The post-action hint card ("✓ Added to your stack" with the
/update-agnes-plugins recipe) used to live below the hero in
panel-what (gray card on white page body). Clicking add/remove
inserted/removed it between the hero and content, shifting the
panels below — a noticeable scroll jump.

The hint is now anchored inside the hero's top-right corner alongside
the install/remove buttons, both as flex children of an absolutely
positioned .actions container. The card uses a translucent
white-on-glass treatment that adopts the hero's kind color (blue for
plugin, green for skill, purple for agent) without per-kind branching.
Hero is always tall enough (160px photo) to contain the action+hint
stack without overflow, so toggling the hint visibility doesn't grow
the hero or shift body content.

The hero-head grid reserves a third 300px column for the absolute
actions overlay so meta gets the proper 1fr free space instead of
being squeezed by a padding-right hack. Responsive breakpoint at
1100px reflows the actions stack below hero-head when the viewport
isn't wide enough to keep meta + actions side-by-side comfortably.

* Add optional -DataPath bind mount to run-local-dev.ps1

When the operator wants to inspect DuckDB files (system.duckdb, extracts,
marketplaces, store/, …) directly from Windows Explorer, the named volume
inside the Docker Desktop WSL VM isn't reachable. The new -DataPath param
generates a transient compose override that rebinds /data on app, scheduler,
extract (and Caddy's /srv:ro mirror) to a Windows host folder.

Fully additive — when -DataPath is omitted everything behaves exactly as
before: no override file is generated, $composeFiles array is unchanged,
finally cleanup is a no-op. Existing positional invocations
(.\run-local-dev.ps1 up | down | logs) keep binding to $Action because
$DataPath is a named-only parameter with no Position attribute.

The override is written via [System.IO.File]::WriteAllText so the YAML is
BOM-less across PS 5.1 / 7+ — Compose rejects BOM-prefixed YAML on Windows.
The override file is unique per PID and removed in the script's finally
block so concurrent invocations and crashes don't leak files.

* factor mark_system fanout into UserCuratedSubscriptionsRepository

The endpoint imported UserCuratedSubscriptionsRepository, ignored it
(noqa: F841), then duplicated the user-side fanout SQL inline. Adds
fanout_system_for_plugin() symmetric to the existing
fanout_system_for_user() and routes mark_plugin_system through it —
removes the dead import + 14 lines of inline SQL, returns the same
`affected_users` delta count, no behavior change.

* drop customer-specific path from .ps1 example

Per CLAUDE.md vendor-agnostic OSS rule: replaced
C:\\Business\\Groupon\\Agnes\\agnes-data with the generic
C:\\Users\\<you>\\agnes-data placeholder so the docstring
example reads cleanly on any reviewer's box.

* release: 0.48.0 + parallelize Release-workflow pytest

Cuts the release shipped via #228 #230 #231 #232 #233 #234 #236 #237 #238
#239 #240 plus this PR (#241). Major changes:

- System plugin tier (schema v39) — admins mark a plugin mandatory; fans
  out RBAC grants + subscriptions to every existing user/group plus
  hooks for new principals
- BREAKING: removed standalone /store + /my-ai-stack page routes
  (replaced by /marketplace?tab=flea + /marketplace?tab=my)
- Setup-prompt + bootstrap recovery fixes (#240)
- DuckDB CHECKPOINT-on-shutdown + 60s compose grace (#235)
- Marketplace + flea-market UX polish, agnes-metadata.json enrichment

Bonus: switch release.yml test step to `-n auto` (matches ci.yml).
Single-threaded was 15-20 min and frequently the bottleneck on PR
mergeability — now ~6 min.

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-10 19:15:41 +00:00
Vojtech
929520f5e1
Flea-market edit feature with version history (schema v37) (#239)
* feat(store): flea-market entity edit feature with version history (schema v38)

Owner + admin can now edit a store entity from a real Edit page at
/marketplace/flea/{id}/edit, replacing the prior "coming soon"
placeholder. Editable: display name, description, category, video
URL, cover photo, and an optional new bundle. Type is locked (400
type_locked). Display-name change renames the on-disk slug for both
live plugin/ and version dirs (reuses rename-on-archive helper).

Schema v38 (originally drafted as v37; renumbered after rebase onto
main where v37 was taken by the curated marketplace enrichment).

Versioning model:
* Each bundle update bakes into ${DATA_DIR}/store/<id>/versions/v<N+1>/plugin/
  and runs the standard guardrails pipeline.
* DEFERRED PROMOTION: live plugin/ + entity.version_no stay at the
  prior approved version through the LLM review window so existing
  installers keep receiving the previously approved bundle. Live swap
  + version_no/version/file_size bump happen only on LLM approval.
  Blocked verdicts leave the prior version serving forever.
* store_entities gains version_no INTEGER + version_history JSON.
  Each version_history entry carries hash, sha256, size, submission_id,
  created_at, created_by.
* Existing entities backfill to v1 with a single-entry history seeded
  from the row's current `version` hash. Initial create also seeds
  versions/v1/plugin/ so future restore can copy v1 bytes forward.

Concurrency:
* Block-while-pending: an in-flight LLM review blocks any further edit
  with 409 prior_version_pending. Owner waits 5-30s; Edit button on
  detail page renders disabled in the same window via the new
  edit_in_flight flag (decoupled from quarantine_sub since the
  deferred-promotion flow keeps visibility='approved').

Rollback:
* New endpoint POST /api/store/entities/{id}/versions/{n}/restore
  (owner + admin). Copies vN bundle forward as v<max+1> and re-runs
  guardrails (rules tighten over time; pre-approved bundles re-validate).
  Forward-only history. Same deferred-promotion semantics — live stays
  at prior version until LLM approves the restored copy.

UI:
* New /marketplace/flea/{id}/edit page (owner + admin gated).
* Versions card on plugin + item detail templates (owner/admin only)
  via shared _flea_versions.html partial.
* Admin queue gains v# column with current badge + separate Hash
  column. Submission detail surfaces Version + Bundle hash rows.
* Activity timeline split into per-submission + entity-wide cards;
  entity-wide rows render vN chips when audit row params reference
  a specific version.
* Section headers (Manifest / Static / Quality / LLM review) tag
  with vN chip via shared macro.
* Reviewed-by-model field surfaces explanatory text per status.
* Banner upload-failure now redirects to detail page on
  submission_blocked instead of staying stuck.

Tests: 24 in tests/test_store_entity_versions.py covering metadata-
only edit, bundle-edit version bump, type lock, block-while-pending,
name change disk rename, restore flow + 404/400/403 paths, edit page
404 for non-owner, versions card visibility gating, admin queue v#
column, admin detail Version/Hash rows, deferred-promotion installer
contract (pending review doesn't break installer / blocked verdict
keeps prior / approved promotes), admin can edit/restore non-owned,
restore deferred promotion, audit log per-version params. 214 tests
green across guardrails + edit + admin + repo + schema suites.

* docs(store): refresh update_entity docstring to match deferred-promotion + submission-status gate

Bring the docstring in sync with the actual fixes from the prior
commit. The pre-fix wording said the gate read
visibility_status='pending' AND submission status — under deferred
promotion that would never fire for v2+ edits. Now describes:

- Block-while-pending gates on submission.status DIRECTLY,
  independent of visibility (so v2+ deferred-promotion edits don't
  slip through).
- Display-name + bundle change defers the live rename to promotion;
  metadata-only renames stay immediate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 00:14:33 +04:00
minasarustamyan
d269c69359
Drop legacy sslVerify=false fallback from install setup prompt (#238)
The marketplace step (step 5) emitted `git config --global
http.<host>/.sslVerify false` on AGNES_DEBUG_AUTH=1 instances when no
ca_pem was readable from AGNES_TLS_FULLCHAIN_PATH. Two problems:

1. Claude Code auto-mode classifiers ("do not disable TLS verification"
   rule) block the line, breaking hands-free setup.
2. It silently masked operator misconfiguration — a debug-auth instance
   without a fullchain on disk fell through to a TLS-disabled clone
   instead of surfacing the missing cert.

After the cross-platform trust block (#137), self-signed and private-CA
servers are fully covered by step 0 reading the fullchain via
_read_agnes_ca_pem; publicly-trusted certs need no bootstrap at all.
The legacy fallback no longer covers a real scenario — verified by
running step 5 without sslVerify=false against a self-signed instance.

BREAKING: drops `self_signed_tls` parameter from
app.web.setup_instructions.resolve_lines and render_setup_instructions
(only consumed by the deleted block). The AGNES_DEBUG_AUTH env var
itself is unchanged — still gates /api/me_debug and the dropdown link.

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
2026-05-09 20:10:01 +02:00
minasarustamyan
6fe67d5279
Curated marketplace enrichment via agnes-metadata.json + curator metadata (#234)
* Curated marketplace enrichment via agnes-metadata.json + curator metadata

Adds a second well-defined metadata file `.claude-plugin/agnes-metadata.json`
that upstream marketplace repos can opt into, providing per-plugin (and
per-skill / per-agent) cover photo, demo video URL, doc links, and
category override. The Claude Code marketplace contract is untouched —
agnes-metadata.json + the convention `.agnes/` directory are stripped
from the synthetic Claude Code marketplace served via /marketplace.zip
and /marketplace.git/*, so user instances see a clean Claude Code repo
with no Agnes-only metadata.

Highlights:
- DB schema v32 — adds curator_name + curator_email on marketplace_registry,
  cover_photo_url + video_url + doc_links on marketplace_plugins.
- Mandatory curator at marketplace registration, editable later through
  the admin UI; surfaces on cards + detail pages in place of owner_todo.
- External-asset mirror cache at ${DATA_DIR}/marketplace-cache/<slug>/
  with conditional GET, 60s timeout, 10 MB body cap, SSRF guards, and
  Wikipedia-policy-compliant User-Agent.
- Strict drop semantics — anything Agnes can't deliver as a real PDF /
  Markdown / plain text doc, or a real PNG / JPEG / WebP cover, is
  dropped from the served metadata; UI looks identical to no-entry case
  (gradient placeholder for missing covers, no row in the doc list).
- Doc allowlist + image allowlist enforced on both the curated mirror
  flow and the Flea upload flow (/store/new); shared module
  src/marketplace_assets.py.
- New /api/marketplace/curated/{mp}/{plugin}/{asset,doc,mirrored}/...
  endpoints with path-traversal guards + RBAC + Content-Disposition
  attachment for docs.
- Curator-focused format guide at /marketplace/format-guide; canonical
  source is docs/curated-marketplace-format.md, also linked from the
  admin /admin/marketplaces page next to + Add Marketplace.

See CHANGELOG.md under [Unreleased] for the full breakdown.

* Fix format-guide test assertion to match shortened disclaimer

The 'Flea Market' phrase was trimmed out of the disclaimer in
docs/curated-marketplace-format.md after the curator-focused rewrite.
Update the rendered-HTML test to assert the channel-scoping phrase
that's actually present ('Curated Marketplace channel only') rather
than the 'Flea Market' contrast that's no longer in the doc.

* Drop unused 'version' field from agnes-metadata.json schema

The parser never read it; it was a YAGNI placeholder for future
schema evolution. Curators don't need to wonder what to put there
when adding the file for the first time. Will be re-added if and
when we actually introduce a backwards-incompatible schema change.

* Harden asset mirror against SSRF via redirect + DNS rebinding

The pre-flight _is_safe_url check validated only the initial URL;
urllib.request.urlopen then followed redirects and re-resolved DNS for
the actual connection — both bypassable. Attacker-controlled origin
could 302 to http://169.254.169.254/... and exfil cloud metadata;
attacker-controlled DNS could return public IP first / 127.0.0.1 second.

Replace urlopen call with a shared OpenerDirector wired through three
custom handlers: _SafeRedirectHandler re-runs SSRF allowlist on every
redirect Location (max 5 hops, down from urllib's 10), and
_PinnedHTTPHandler / _PinnedHTTPSHandler connect to the IP that passed
validation rather than re-resolving the hostname. TLS SNI + cert verify
stay bound to the original hostname.

_resolve_safe returns the validated IP (the existing _is_safe_url
2-tuple wrapper stays for backwards compatibility) and rejects round-
robin DNS that mixes a public + private record. _UnsafeRedirectError
is a typed exception so _fetch_url can map redirect blocks to terminal
'rejected' status (not transient 'failed'). _http_open is the single
call site so tests can mock at one well-defined seam.

Tests cover redirect blocking (link-local, loopback), redirect-error
unwrapping inside URLError, pinned-IP connection target, and the
end-to-end DNS-rebinding scenario. Existing tests that mocked
urllib.request.urlopen are migrated to mock _http_open.

* Harden /asset/ endpoint against stored XSS

The endpoint served any file in the cloned marketplace repo with
stdlib-detected Content-Type, so a curator who landed evil.html (or a
renamed evil.png carrying HTML bytes) in the working tree got a
same-origin XSS — the response shares cookie scope with /admin and
/api/me/*.

The asset endpoint is image-only by contract (cover photos referenced
from agnes-metadata.json + inner skill / agent cards), so applying the
same allowlist + magic-bytes pattern that /doc/ already uses closes
the gap without breaking any legitimate use case. Three layered
checks: extension in IMAGE_EXTENSIONS (.png/.jpg/.jpeg/.webp; SVG
excluded — <script> inside SVG executes), validate_image_file magic
bytes (defeats rename-extension attack), Content-Type pinned from the
validated extension (never stdlib mimetypes).

Defense-in-depth: X-Content-Type-Options: nosniff stops browser MIME
sniffing; Content-Security-Policy: default-src 'none' blocks script /
iframe execution even if a future regression let HTML through.

Tests cover the .html extension reject, the renamed-HTML-as-PNG magic-
bytes reject, the .svg reject, and the happy-path PNG with security
headers attached. The pre-existing path-traversal test seeds a real
PNG instead of ok.txt now that the endpoint is image-only.

* Enforce mandatory curator on marketplace PATCH

The POST handler enforced curator_name + curator_email at create time,
but PATCH treated empty / missing curator inputs as 'no change'. Legacy
rows that pre-date v32 (curator_name=NULL) could be edited indefinitely
without ever filling the curator gap, and OWNER_TODO_PLACEHOLDER lingered
on every /marketplace card.

Reject the PATCH with 400 when the post-merge row would persist with
empty curator. The check fires after the existing field-merge logic, so
once-filled rows that don't touch curator still pass through (their
existing values fall through from the DB row). DB column stays nullable
so untouched legacy rows continue to coexist — the gate fires only the
moment an admin opens the edit modal.

Existing PATCH semantics preserved: empty-string input still means 'leave
existing value alone', and once-filled curator can't be cleared (those
test cases pass unchanged). New test seeds a legacy row directly via the
repository, then exercises url-only PATCH (rejected), partial-fill PATCH
(rejected), and full-fill PATCH (succeeds); a follow-up no-curator PATCH
on the now-formed row also passes.

* Drop unused curated-marketplace helpers (PR #234 review)

* build_db_payload — imported by src/marketplace.py but never called.
  The strict-drop semantics it would have implemented were re-written
  inline in _refresh_plugin_cache (see the comment block there). The
  standalone helper still carried the old fall-back-to-original-external-
  URL-on-mirror-failure behaviour, which contradicts the documented
  drop-when-can't-deliver contract — a future contributor who re-wired
  it would have introduced a silent regression. Delete with the helper
  + the import + the comment that referenced it.

* _resolve_marketplace_name — one-line shim with no remaining call
  sites. Callers use _resolve_marketplace_meta which returns name +
  curator together, avoiding the double DB hit the shim exists to
  hide.

* '# noqa: F401  Optional kept for forward-compat' was wrong — Optional
  IS used in src/marketplace.py (line 70 and line 238). Drop the noqa
  comment so a future ruff run doesn't try to remove a real import.

Removing build_db_payload also drops the only remaining use of Optional
in src/marketplace_metadata.py, so the import comes out there too.

* Cap agnes-metadata.json size + catch RecursionError on parse

The reader is invoked once per marketplace per sync and the file is
curator-controlled. Two failure modes were unguarded:

* Multi-GB JSON: path.read_text() pulled the whole file into memory
  before json.loads even ran. A curator with commit access to an
  upstream repo could OOM the sync worker.

* Deeply-nested JSON under any size cap: cpython's recursive object /
  array parser raises RecursionError at ~1000 levels of depth.
  RecursionError is a RuntimeError, not ValueError, so the existing
  catch let it propagate up and abort the entire sync — every other
  marketplace in the same pass got skipped.

Add AGNES_METADATA_MAX_BYTES = 1 MiB (a real metadata file with covers,
docs, categories for ~50 plugins fits in <100 KB so the cap is
generous) and gate the size check on path.stat().st_size before the
body read. Broaden the parse except to (ValueError, RecursionError)
with a unified log line. Both failure modes degrade to the same
empty-dict fall-back the malformed-JSON path already used, so one bad
upstream never aborts the rest of the sync.

Tests cover the size cap firing before json.loads (whitespace-padded
valid JSON exceeding the cap) and the recursion path (5000 nested
arrays — past cpython's default recursion limit but well under the
size cap).

* Persist asset-mirror manifest per body write, before unlink

sync_assets wrote each body atomically (tmp + rename) but persisted
the manifest only at the end of the batch. A kill -9 mid-Phase 2 left
on-disk files the manifest never referenced. Once a curator dropped
that URL from agnes-metadata.json, Phase 3's cleanup had no record of
the file and the orphan stayed forever — there's no GC pass walking
the cache dir today, so disk would slowly bloat.

Phase 2 (body-write iteration): after the in-memory manifest mutation,
persist BEFORE unlinking the previous body. The crash window narrows
from 'all of Phase 2' to 'between persist and unlink' (microseconds).
A persist failure mid-batch keeps the previous body on disk — the on-
disk manifest still references it, and a stale-but-existing file beats
a 404. Cost: one extra tmp+rename per body write; manifest is a few KB
so the overhead is negligible vs. the HTTP fetches.

Phase 3 (curator-removed URLs): same discipline. Collect the to-delete
relpaths, persist the manifest with the entries already gone, THEN
unlink. A crash mid-cleanup leaves at most a microsecond window where
files exist despite the manifest no longer naming them. The next sync
reads the (correct) manifest and the orphan stays orphaned, but the
served state is consistent.

Tests cover per-body persist call count, the post-update on-disk
manifest content, and Phase 3 ordering verified by reading the on-disk
manifest from inside Path.unlink.

* Consolidate marketplace video embeds + format-guide CSS

The YouTube nocookie / Vimeo / <video> / link-fallback detection logic
was duplicated verbatim in marketplace_plugin_detail.html and
marketplace_item_detail.html (~40 JS lines each, with subtly-different
inline styles). Both templates now {% include %} a single
_marketplace_video_embed.html partial inside their IIFE so the regex,
the nocookie attribute set, and the unknown-host link fallback live in
ONE place — future tweaks (new host, new attribute, fixed sandbox flag)
no longer need to be applied twice in lockstep.

The .video-wrap selectors (one inline <style> rule in plugin_detail,
one inline style='...' attribute in item_detail) are replaced by the
existing .video-embed 16:9 wrapper in style-custom.css, with new
.video-embed video / .video-embed a child rules added so the wrapper
handles all four embed shapes uniformly without per-template
positioning.

The 60-line inline <style> block in marketplace_format_guide.html
moves verbatim to style-custom.css under a new 'Marketplace format
guide page' section, scoped to .format-guide so other pages aren't
affected.

No user-visible behaviour change: the rendered HTML for valid
YouTube / Vimeo / mp4 / external links is byte-identical to before,
and the format-guide page renders the same.

* Maintainability cleanup batch (PR #234 review)

#10: drop _path_under from app/api/marketplace.py — it was a byte-
equivalent clone of _safe_join (same Path.resolve(strict=True) +
relative_to() containment check). The three v32 endpoint handlers
(/asset, /doc, /mirrored) now share the existing helper.

#14: rename src/marketplace_assets.py → src/marketplace_asset_validation.py
so the file's purpose is obvious from the name and the previous
overlap with src/marketplace_asset_mirror.py is gone. Six call-site
imports updated in lockstep; CHANGELOG references under [Unreleased]
updated to track the new path.

#11: consolidate the URL builders that resolve
/api/marketplace/curated/<slug>/<plugin>/{asset,doc,mirrored}/...
paths. _internal_asset_url / _internal_doc_url / _mirrored_asset_url
lived in src/marketplace.py, while a copy named _mirrored_url lived
in app/api/marketplace.py with a 'must stay aligned' comment. New
module src/marketplace_urls.py is the single source of truth — both
call sites import from it and a future URL-format tweak only needs
to change one file. The _ROUTE_PREFIX constant collapses the per-
function f-string repetition. The route-handler endpoints themselves
still own the path string literals (keeping the builders identical
to the route declarations remains a checklist item, not a runtime
guarantee).

* Re-key asset-mirror manifest by (plugin, url) + dedup HTTP fetches

The manifest used to be keyed by URL alone, so two plugins in the
same marketplace referencing the same external image (a shared CDN
icon, a common cover) collided on entry.plugin_name — last writer
won. The DB row for the losing plugin then stored a served URL
pointing under the winning plugin's tree, and require_resource_access
denied legitimate access on one side and let the other plugin's user
reach the wrong asset.

In-memory: Dict[Tuple[str, str], MirrorEntry] keyed (plugin_name, url).
On disk: format flips from {url: entry} dict to [entry, ...] list of
self-describing entries (each carries plugin_name + url + the
previous fields). JSON keys can't be tuples; encoding 'plugin::url'
would just shift the parsing burden.

Phase 1 of sync_assets deduplicates fetches by URL — three plugins
sharing one URL share one HTTP request. The conditional-GET prior is
picked from any owning plugin's prior entry; if their etags diverge
(rare) we miss one 304 and pay for a full re-download instead.
Phase 2 still creates a per-(plugin, url) manifest entry pointing
under the plugin's own subdir, and Phase 3 cleanup is keyed the same
way so dropping a URL from one plugin's metadata doesn't disturb
another plugin still referencing it.

Body files stay per plugin (RBAC-clean isolation: deleting plugin A's
cache can't strand plugin B). Bandwidth saved by fetch dedup.

Consumer code re-keyed: src.marketplace._refresh_plugin_cache rebuilt
served_url_for / mirror_status as composite-keyed maps;
app.api.marketplace._resolve_external_via_mirror /
_curated_inner_cover / _curated_inner_enrichment look up by
(plugin_name, url).

Tests cover per-plugin manifest entries with shared URL, the single
HTTP fetch for N plugins, and Phase 3 drop-one-keep-other. All
existing tests migrated to composite key access; v2 list format
assertions verify on-disk shape.

* Migrate asset mirror from urllib.request to httpx

The asset mirror was the only HTTP call site in Agnes still using
urllib.request; every other module (CLI, Jira / OpenMetadata / OpenAI
connectors, scheduler, Telegram bot) already used httpx. The asset
mirror was added in this PR's base commit, so this is the only chance
to bring it into convention before someone copies it as 'the pattern
for HTTP fetches in Agnes'.

Three concrete benefits beyond consistency:

* SSRF defence collapses from five urllib classes
  (_PinnedHTTPConnection, _PinnedHTTPSConnection, _PinnedHTTPHandler,
  _PinnedHTTPSHandler, _SafeRedirectHandler) into one
  _SSRFGuardTransport. httpx invokes handle_request() on every redirect
  hop, so re-validation is free — we don't need a custom redirect
  handler at all.

* DNS-rebinding defence: the transport rewrites request.url.host to the
  SSRF-validated IP before delegating to super().handle_request().
  httpcore connects to whatever URL.host says, so this pins the
  connection without subclassing HTTPSConnection. The original hostname
  goes into the Host header + the sni_hostname extension so TLS / vhost
  routing still bind to the curator-supplied hostname.

* Error handling: one httpx.HTTPError catch-all for transport errors,
  plus specific httpx.TimeoutException / httpx.TooManyRedirects branches
  for clearer diagnostics. Matches the _translate_transport_error shape
  in cli/client.py.

The shared httpx.Client is built lazily at module load (same pattern as
cli/client.py:_get_shared_client) with follow_redirects=True,
max_redirects=5, timeout=HTTP_TIMEOUT_SEC, and our custom transport.

Externally observable behaviour is unchanged: same FetchOutcome
statuses, same manifest format, same conditional GET semantics, same
body-size cap.

Tests migrated from urllib-shaped fakes to httpx-shaped (status_code,
iter_bytes, context manager). Five urllib-specific tests replaced with
httpx equivalents — three transport unit tests + one DNS-rebinding
integration test that verifies host rewrite via monkey-patched
super().handle_request. One test deleted without replacement
(unwrap-URLError-wrapping-an-_UnsafeRedirectError — urllib-specific,
not applicable to httpx).

* Surface curated agnes-metadata enrichment on My Stack tab

GET /api/marketplace/items?tab=my built each curated row from the
on-disk marketplace.json by way of resolve_allowed_plugins, which
doesn't carry the agnes-metadata enrichment columns
(cover_photo_url, video_url, category override, doc_links). The
handler then hard-coded cover_photo_url=None on the synthetic row.
Result: once a user clicked '+ Add to my stack' on a curated card,
the same plugin in tab=my rendered with the gradient placeholder
instead of its cover photo — confusing parity break vs. the curated
tab where the same row goes through MarketplacePluginsRepository
and gets the enriched columns.

Pre-load the enriched marketplace_plugins rows for every marketplace
the user is subscribed to, then look each granted+subscribed plugin
up by (marketplace_id, plugin_name). Fall back to the on-disk
synthetic shape only when the DB row is missing — happens during
the rare race where RBAC is granted before the first sync cycle
ingests the plugin. RBAC gating (granted set from
resolve_allowed_plugins) is unchanged so this fix can't widen
visibility; it just upgrades the data shape behind cards the user
was already going to see.

Per-marketplace list_for_marketplace beats N gets — typical user is
subscribed to <5 marketplaces, so this is at most a handful of
queries vs. one per subscribed plugin.

Regression test seeds a plugin with cover_photo_url + category
override, subscribes the user, hits /api/marketplace/items?tab=my,
and asserts photo_url + category come through. The misleading
'fall through to gradient until the user re-visits the curated tab'
comment is gone.

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
2026-05-09 17:01:37 +02:00
Vojtech
d6ad08f107
Flea-market upload guardrails + soft delete + JOIN-based admin queue (#233)
* feat(store): flea-market upload guardrails + soft delete + JOIN-based admin queue

Adds an end-to-end guardrails pipeline for store uploads (manifest +
static-security + LLM review), persists blocked bundles for forensics,
introduces soft-delete (Archive) semantics, consolidates the legacy
/store/{id} surface into /marketplace/flea/{id}, and reworks the admin
queue so lifecycle filters read live entity visibility via LEFT JOIN
rather than a denormalized submission column.

Schema v29 → v35:
  * v29 store_submissions table + store_entities.visibility_status
  * v30 file_size, bundle_sha256, bundle_purged_at on submissions
  * v31 reshape store_submissions (drop legacy unique on entity_id)
  * v32 store_entities.archived_at/by + 'archived' visibility value
  * v33 drop store_submissions.retry_count (unused)
  * v34 ensure idx_store_submissions_entity exists post column-drop
  * v35 broaden visibility_status enum + JOIN architecture cutover

Pipeline (src/store_guardrails/):
  * Inline checks: manifest_check, static_scan, quality_check
  * LLM review configurable haiku|sonnet|opus (default haiku)
  * BackgroundTasks-driven async path with structured-output JSON
  * Per-submitter daily quota (default 50)
  * 30-day TTL purge job (POST /api/admin/run-blocked-purge)
  * Bundle SHA256 + size persisted; sha256 survives purge for forensics

Visibility model:
  * pending | approved | hidden | archived
  * _enforce_visibility returns 404 (no leak) for non-owner non-admin
  * Owner sees own non-approved entries via include_owner_id widening
  * Install refused with 409 entity_not_approved when not approved

Soft-delete (DELETE /api/store/entities/{id}):
  * Default = soft (visibility_status='archived'); existing installs
    keep getting served the bundle so users don't lose the plugin
  * ?hard=true admin-only: drops bundle + cascades user_store_installs
  * Hard-delete preserves entity_id on submission as tombstone so
    audit_log linkage survives for the activity timeline

Admin queue lifecycle (the JOIN refactor):
  * Verdict (store_submissions.status) is immutable forensic record
  * Lifecycle (store_entities.visibility_status) is live state
  * /admin/store/submissions Archived chip translates to
    `e.visibility_status='archived'` via LEFT JOIN — any path that
    flips visibility surfaces in the queue immediately
  * Detail page renders Status (verdict) and Entity lifecycle side by
    side so admins see "approved at review, now archived" at a glance

URL consolidation:
  * /store/{id} deleted (no redirect, stale bookmarks 404)
  * /marketplace/flea/{id} is the canonical detail surface
  * Three in-tree callers (upload-success, my-stack card, store
    listing card) updated to point at the new URL
  * Quarantine banner extracted to _quarantine_banner.html partial,
    self-guarded, included from both flea detail templates
  * Banner JS auto-refreshes when the verdict lands by polling
    /api/marketplace/flea/{id}/detail (visibility_status +
    submission_status — the latter is needed because blocked_llm
    keeps the entity at visibility_status='pending')

Audit log resource format:
  * runner.py emits prefixed `store_submission:{id}` (post-fix)
  * Detail-page timeline query handles three patterns: prefixed
    submission, helper-emitted `store_entity:{sub_id}`, and bare-id
    legacy rows — all surface in the activity timeline

UX fixes:
  * Owner sees Under review / Quarantined / Hidden banner with status
  * Install button gray-disabled (not blue) when non-approved
  * Owner cannot delete quarantined entries (403); admin can
  * Admin queue: filter chips, sortable columns, paging, page-size
  * Auto-refresh queue every 5s while pending rows are visible
  * Store upload page file picker no longer opens twice (label →
    input default action collided with explicit JS handler)

Tests: 168 passed across the guardrails suites (admin submissions,
store API, inline / LLM / purge guardrails, store repositories,
marketplace filter, schema version). New regression coverage
includes: archive surfaces via JOIN even when API path is bypassed;
deleted submission renders activity timeline (tombstone); flea
detail surfaces submission_status only for owner/admin; detail page
renders Entity lifecycle row; audit log resource format covers both
helper and runner paths.

* fix(store-guardrails): PR #233 follow-up — prompt injection, atomic PUT, BG race, schema, reaper, sort whitelist

Addresses 9 of the 23 findings from the PR #233 review (spec at
docs/superpowers/specs/2026-05-09-pr233-guardrails-fixes-spec.md).
Merge-gate items #1-#6 plus high-value mediums #7, #9-#12, #23.
Architectural items (#8 enum split, #14 factory) and pure
maintainability (#15-#22) deferred to follow-ups.

Security:
* #1 prompt injection — SYSTEM_PROMPT now passed via the SDK's
  dedicated system= parameter; bundle wrapped in <bundle>...</bundle>
  sentinels declared data-only by the system prompt; literal
  sentinel strings in user content are escaped so an adversarial
  README can't forge a close tag.
* #6 static scan honesty — module docstring + admin copy + docs
  declare static scan as signal not gate; .md/.txt/.rst/.html/.json/
  .yaml/.yml/.toml skipped to avoid false positives on prose.
  AST mode for Python deferred (separate flag, FP comparison work).

Correctness:
* #2 PUT atomicity — bundles bake into plugin.staging-<rand>/
  alongside live, atomic-rename on success; failed checks leave
  live tree byte-for-byte intact.
* #3 BG-task race — set_visibility_if_pending guards verdict flips
  to the (pending, hidden) review window; admin archives during
  review survive; skipped flips audit-logged.
* #4 v35 NOT NULL/DEFAULT — schema v35→v36 re-applies them on
  store_entities.visibility_status. CHECK constraint enforced
  application-side (DuckDB ADD CHECK on existing column unsupported).
* #7 stuck-review reaper — reap_stuck_llm_reviews flips pending_llm
  rows older than guardrails.stuck_review_grace_seconds (default
  1800) to review_error. Scheduler runs every 15 min via new
  /api/admin/run-reap-stuck-reviews. Set knob to 0 to disable.
* #9 quota counter — count_blocked_for_submitter_since now counts
  blocked_inline + blocked_llm + review_error so a submitter
  triggering only LLM-blocked verdicts is bounded.
* #10 missing risk_level — surfaces as review_error with
  error='missing_risk_level' instead of silently defaulting to
  'medium' (which looked like a model-decided block).
* #11 archived_at clear — set_visibility nulls archived_at +
  archived_by when transitioning out of 'archived' so a future
  read doesn't show stale archive forensics on an approved row.

Maintainability:
* #12 FSM doc comment — accurate insert/transition/lifecycle
  description in src/db.py near store_submissions schema.
* #23 sort-key whitelist — admin queue rejects unknown sort keys
  with 400 invalid_sort_key; substring-replace footgun removed.

Deferred (separate PRs):
* #5 quota race — proper fix requires asyncio.Lock spanning the
  full pipeline; threading.Lock blocks event loop, DuckDB MVCC
  doesn't help. API-level slowapi bounds worst case for now.
* #6 part 3 (AST static scan), #8 (enum split), #13 (import
  bundle docs), #14 (factory consolidation), #15-#22 (maint).

Tests:
* New: tests/test_store_guardrails_prompt_injection.py (corpus +
  trust-boundary invariants), tests/test_store_put_atomic.py,
  tests/test_store_guardrails_reaper.py.
* Extended: test_store_guardrails_llm.py (system param, missing
  risk_level, BG race), test_admin_store_submissions.py (quota
  counter widening, sort whitelist 400), test_store_repositories.py
  (un-archive metadata clear), test_db_schema_version.py (v36).
* Full suite: 3738 passed; 17 pre-existing baseline failures
  unchanged (db migration tests, cli binary rename, catalog export,
  user mgmt v5 backfill — confirmed by stash + rerun on clean tree).
2026-05-09 17:32:53 +04:00
minasarustamyan
e26236fdc1
Extract session-pipeline framework + UsageProcessor skeleton (#232)
* Extract session pipeline framework, refactor verification, add UsageProcessor skeleton

Pluggable framework under services/session_pipeline/ (contract + lib + per-processor
runner) so multiple processors can read /data/user_sessions/<key>/*.jsonl on their
own cadence with full failure isolation. Verification flow becomes the first plugin;
a no-op UsageProcessor reserves the second slot pending a separate brainstorm on
extraction logic + storage shape.

Schema v28→v29: rename session_extraction_state → session_processor_state with
composite PK (processor_name, session_file). Existing rows copied over with
processor_name='verification'; legacy table dropped. Migration is idempotent and
no-ops the copy step on fresh installs that came up at the new schema.

Endpoint: /api/admin/run-verification-detector replaced by parametrized
/api/admin/run-session-processor?processor=<name>. Audit action format follows.
Scheduler JOBS: verification-detector entry split into session-processor:verification
+ session-processor:usage. SCHEDULER_VERIFICATION_DETECTOR_INTERVAL retained for
operator compatibility (drives both cadence and health-check grace window);
SCHEDULER_USAGE_PROCESSOR_INTERVAL added.

* Address PR #232 review: scan dead branch + per-processor lock

- `SessionProcessorStateRepository.scan_unprocessed_for` dead else: both
  branches surfaced every jsonl, the SELECT was unused, runner MD5-rehashed
  every stable session per tick. Replaced with an mtime precheck — stable
  sessions (mtime <= processed_at) are filtered at scan; modified files
  still surface for the runner's authoritative `file_hash` invalidation.
  Naive-local comparison matches the existing health-check idiom (DuckDB
  TIMESTAMP strips tz on storage).

- Per-processor advisory lock around `_run_processor` in
  `/api/admin/run-session-processor`. Scheduler tick + manual admin POST
  could otherwise both run, both call create_evidence on overlapping
  detections, and accumulate duplicate verification_evidence rows (the
  dedup short-circuit only covers create+contradiction, not evidence per
  ADR Decision 3). Non-blocking acquire → 409 Conflict on concurrent
  invocation; release in finally so a runner exception doesn't wedge the
  processor.

Tests: two new scan unit tests (mtime filter + post-mark mtime bump), 409
endpoint test, lock-released-on-exception test. Two existing tests updated
for the new "filtered at scan" stat shape (previously asserted skipped == 1,
now scanned == 0).

* Address PR #232 review #2: parallel scheduler tick + last_run on terminal state

Two pre-existing scaffold bugs in services/scheduler/__main__.py amplified
by adding more session-pipeline jobs:

1. Serial for-loop over jobs with synchronous httpx.post(timeout=900) — a
   10-minute verification run blocked every other job (data-refresh,
   health-check, usage, corporate-memory) for the whole window. The PR's
   stated isolation guarantee held inside the runner but broke at the
   scheduler dispatch layer.

2. last_run advanced only when _call_api returned True. Permanent-failure
   jobs hot-looped on every tick (30s) instead of cadence (15min).

Fix: ThreadPoolExecutor.submit per due job + per-job in_flight set so a
long-running job can't be re-launched on subsequent ticks. last_run
advances unconditionally in finally; errors still surface via _call_api
logging + audit_log on the receiving side.

_run_job extracted to module-level for unit testing. New tests:
- TestRunJobBookkeeping: advances on success / failure / unhandled raise
- TestRunLoopParallelism: in_flight protection prevents duplicate
  launches across ticks for a single slow job

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
2026-05-08 19:47:46 +02:00
Vojtech
2e2e1a1eca
feat(home): state-aware /home + /setup-advanced + schema v26 (#228)
* feat(home+news): state-aware /home + /news + admin-edited news section

Squash of the vr/home-page feature work for clean rebase onto main.
Original 18-commit history preserved in branch backup/vr-home-page-pre-rebase.

What's in this PR:

**State-aware /home page**
- New `/home` route with hero + auto-mode + connectors (Asana / GWS /
  Atlassian) + lookarounds. Onboarded vs not-onboarded state-machine
  branches a single template (`home_not_onboarded.html`); the install
  steps, "Setup a new Claude Code" CTA (90-day PAT mint), and per-
  connector setup prompts hide once `users.onboarded=TRUE`. A
  completion badge replaces them.
- "Mark me as offboarded" button reverses the flag without an SQL UPDATE.
- `users.onboarded BOOLEAN` column added; default FALSE; flipped by the
  CLI's `agnes init` post-success POST and the `/admin/users` API.
- Connector setup prompts pre-check whether the tool is already
  installed/connected before re-running setup.
- GWS scope set widened to include Google Chat (`chat.spaces`,
  `chat.messages`).

**Single template + design tokens**
- `dashboard.html` now extends `base.html` via the new
  `{% block layout %}` opt-out (full-width pages skip the 800px
  `.container`). Net: every page shares one shell.
- `style-custom.css` `:root` extended with `--space-{7,9,10,12}`,
  `--radius-2xl`, `--shadow-{card,elevated}`, `--text-{muted,disabled}`,
  `--focus-ring`, `--transition-*`, `--width-{narrow,app,wide}` so
  inline page styles can migrate incrementally.

**Auth redirects honor AGNES_HOME_ROUTE**
- `safe_next_path` resolves the configured home route when no `default=`
  is passed; OAuth callbacks, magic-link clicks, password form, and
  LOCAL_DEV_MODE shortcuts now land on `/home` (or whatever the operator
  picked) instead of always /dashboard.

**News section + /news permalink + /admin/news editor**
- Schema-bumped `news_template` table (single versioned entity, draft +
  publish gate). `published BOOLEAN` distinguishes draft from public;
  monotonically-increasing `version` per save; rows >30d pruned on
  save except the currently-displayed published version.
- `/home` bottom-of-page renders the latest published intro with a
  "Read more →" link to `/news` (which renders the full body).
- `/admin/news` editor with sandboxed live preview, versions table,
  per-row Unpublish, Format-help cheatsheet.
- `agnes admin news show / draft / edit / publish / unpublish /
  versions / export` (CLI). Talks to the live server via the
  `/api/admin/news/*` endpoints (PAT-authed) — no direct DB access
  so it coexists with a running uvicorn.
- **Optimistic-lock guard**: `agnes admin news publish --version N` and
  PUT/PATCH endpoints accept `expected_version` and 409 with structured
  `{error: "version_conflict", expected, actual, actual_by}` when a
  concurrent admin replaced the draft. Edit refuses to overwrite a
  draft authored by someone else without `--force` or
  `--expect-version`.
- nh3 (Rust-backed ammonia) HTML sanitizer; iframe pre-pass strips
  any iframe whose src is not on the YouTube/Vimeo/Loom allowlist;
  javascript:/data: schemes blocked everywhere.
- Author CSS vocabulary: `.news-hero` (blue gradient hero block),
  `.callout`/`.callout-{info,warn,success,danger}`,
  `.video-embed`, `.news-section`, `.news-grid-{2,3}`, `.news-cta` —
  all consolidated in `style-custom.css` under "News content
  vocabulary (shared)" so /home perex, /news body, and /admin/news
  preview share one source of styling.
- Code-inside-`<pre>` contrast fix (was unreadable amber-on-silver).
- `.news-content` table styling (border, header band, row-hover).

**`scripts/dev/run-local.sh`** — local uvicorn launcher. Pulls Google
OAuth client id/secret from GCP Secret Manager
(`AGNES_OAUTH_GCP_PROJECT`-driven, no vendor defaults), points
`AGNES_CLI_DIST_DIR` at `./dist` so the wheel endpoint resolves, and
`--dev` flips `LOCAL_DEV_MODE=1` + `AGNES_HOME_ROUTE=/home` for one-
command iteration. `LOCAL_DEV_MODE=1` also enables the FastAPI debug
toolbar.

**CLAUDE.md "Run tests before every push" section** codifies
`pytest tests/ -n auto -q` as non-negotiable before each push.

**Tests**: 51 + 14 + 8 = 73 new tests across news-template repo,
sanitizer, API, web, CLI; plus updated home/auth/template tests for
the new shared-shell architecture.

Origin docs (gitignored, customer-fork content):
docs/brainstorms/home-page-requirements.md,
docs/plans/2026-05-07-001-feat-home-page-plan.md.

* feat(cli): agnes onboarded {on,off,status} — self-scoped flag toggle

User-facing equivalent of the in-page "Mark me as (off)boarded" button
on /home. POSTs /api/me/onboarded with {onboarded, source}; --source
overrides the audit-log marker so flips made from the CLI vs the web
button vs agnes init automation stay distinguishable.

`status` reads via /api/me/profile (when present); falls back to a
quick body-marker scan of /home so the read path doesn't write an
audit_log row. PAT-authed via cli.client.api_post — same convention
as agnes admin news / agnes admin add-user etc.

Tests: 5 covering on/off/status round-trip, idempotency, and
audit-log source recording. Full suite holds at 12 pre-existing
failures (same set as before).

* ui(nav+home): primary nav reorg + green What's new band + /marketplace link fix

Primary nav (post-rebase audit + per-user feedback):

- Items: Home → Marketplace → Data Packages → Memory. Admin dropdown
  for admins only. The "Dashboard" label was renamed Home — point still
  resolves through `home_route` so customer instances on /dashboard
  still land there.
- Activity Center moved into the Admin dropdown. Per-team adoption
  analytics is admin-consumed in practice; the route still allows
  any authed user for direct deep-links so existing /home tile +
  bookmarks keep working.
- Memory link added (→ /corporate-memory) — was previously buried in
  the /home "Look around" tiles.
- Setup local agent + My Stack dropped from main nav. Setup is the
  /home install flow's home now; My Stack lives as a tab inside
  /marketplace.

/home tweaks:

- Plugin marketplace tile now points at /marketplace (was /store —
  legacy from before the marketplace rebrand landed in #230).
- "What's new" section header gets a green band (success-flavored
  D1FAE5 background, A7F3D0 border, darker green title) so the
  bottom-of-page news block visibly distinguishes from the blue
  install-hero at the top. Header strip only — body stays white.

Test fix: test_home_route_resolution renamed `dashboard_link_uses_home_route`
→ `home_link_uses_home_route` and asserts `href="/home">Home` instead
of `href="/home">Dashboard` after the label change.

* fix(home): decouple Step 3 + Connect-tools collapse from server onboarded flag

The server-side `users.onboarded` flip happens through two paths:

1. Explicit user click on "Mark me as onboarded" or `agnes onboarded on`.
2. Implicit `agnes init` POST → /api/me/onboarded on success.

Path 2 produced a UX surprise: an analyst running `agnes init` mid-flow
reloaded /home and saw Step 3 (auto-mode) + Connect-your-tools auto-
collapse to summary bars. They were actively working through those
sections — the install POST never signalled "I'm done with the rest
of setup", just "Agnes itself is installed".

Decouple the section-collapse decision from the server flag:

- Step 1 + Step 2 install blocks: still hidden on `onboarded=TRUE`
  (their completion is a hard server signal — Agnes IS installed).
- Step 3 + Connect-your-tools: render flat by default in BOTH states.
  Wrapped in `<details class="setup-collapsible" open>` so the
  browser's native disclosure handles per-section toggle without JS,
  but the `<summary>` is CSS-hidden until the page-level
  `data-setup-minimized="1"` attribute is set on `.home-mock`.
- New "Minimize setup view" toggle inside the blue install-hero,
  rendered only when onboarded. Click flips the data-attr on
  `.home-mock` AND removes the `open` attribute from each
  `<details>`. State persists in `localStorage["agnes_home_setup_minimized"]`
  so the choice survives reloads but is per-device.
- "Show full setup view" (the same button when minimized) re-opens
  both `<details>` and clears localStorage.

When minimized, each `<details>` still has its own native expand/
collapse — click the gray summary bar to peek at one section without
toggling the page-level minimize off.

Tests:
- test_step3_and_connectors_render_flat_when_onboarded_by_default —
  asserts `<details class="setup-collapsible" ... open>` for both
  sections post-onboarding and the absence of any server-rendered
  `data-setup-minimized` attribute on the `.home-mock` root.
- test_minimize_toggle_visible_only_when_onboarded — toggle button
  rendered only when onboarded.

Full pytest holds at 12 pre-existing failures (same set).
2026-05-08 18:28:47 +02:00
Vojtech
107195730d
feat(observability): optional PostHog integration (#231)
* feat(observability): optional PostHog integration (errors, LLM traces, replay, flags)

Off by default. Activates when POSTHOG_API_KEY is set in env. Defaults
to PostHog Cloud EU; override host for US Cloud or self-hosted.

Coverage:
  - FastAPI 500 handler captures unhandled exceptions
  - src/orchestrator.py rebuild + rebuild_source failures
  - services/scheduler/ HTTP-job failures
  - cli/main.py uncaught CLI errors (Typer.Exit/SystemExit/KeyboardInterrupt
    skipped; flushes before re-raise so short-lived CLI invocations don't
    drop events)
  - connectors/llm/anthropic_provider.py + openai_compat.py emit
    $ai_generation events with provider, model, latency, token counts
    (prompt/completion bodies stay off unless POSTHOG_LLM_PAYLOADS=1
    because LLM prompts here routinely include customer SQL/data)
  - Browser snippet injected into every text/html response by
    PosthogInjectionMiddleware — registered inside the GZip layer so it
    sees uncompressed HTML before compression. Many templates are
    standalone (their own DOCTYPE) and never extend base.html, so a
    per-template include would miss them.
  - Frontend: $pageview, $pageleave, JS error capture via window.error
    and unhandledrejection handlers, masked session replay
    (maskAllInputs: true plus CSS-selector mask for known data surfaces),
    feature flags (browser posthog.isFeatureEnabled + server-side
    feature_enabled with fallback for older SDKs).

Identification mode operator-configurable: none / id / email / full.
Default email ships user.id + email but never name. CLI entry point
moves from cli.main:app to cli.main:main (Typer wrapper).

Files:
  - src/observability/posthog_client.py — lazy singleton, no network
    when disabled, single-process flush on shutdown
  - src/observability/llm_tracing.py — trace_generation context manager
  - app/middleware/posthog_inject.py — HTML rewrite middleware
  - app/web/templates/_posthog.html — browser snippet template
  - docs/observability.md — operator guide
  - config/.env.template — documented POSTHOG_* knobs
  - tests/test_posthog_disabled.py + tests/test_posthog_client.py +
    tests/test_llm_tracing.py — 18 tests covering disabled state,
    identify-mode payloads, $ai_generation shape, error variant.

CHANGELOG entry under [Unreleased] Added.

* feat(observability): tag every PostHog event with environment + release

Splits PostHog dashboards cleanly between localhost / dev / staging /
production without manual tagging on every capture call.

- POSTHOG_ENVIRONMENT explicit override; auto-resolves to "local" when
  LOCAL_DEV_MODE=1, else RELEASE_CHANNEL, else AGNES_DEPLOYMENT_ENV,
  else "unknown".
- AGNES_VERSION → RELEASE_CHANNEL fallback feeds the `release` property
  for "is this error new in this release?" cohorting.
- Backend gets both via the PostHog SDK's super_properties constructor
  arg (every captured event picks them up automatically).
- Browser snippet calls posthog.register({environment, release}) inside
  the loaded callback so $pageview, $exception, autocapture, etc. all
  carry the same labels.
- request.state.user now populated by auth dependencies so the snippet
  can actually call posthog.identify(user_id, {email}) for logged-in
  users (previously the user block always resolved to None because
  nothing wrote to request.state.user).

4 new tests cover env resolution: explicit > LOCAL_DEV_MODE > channel
> unknown, plus super-properties forwarding into the SDK constructor.

* feat(observability): inline user attrs on every PostHog event + debug throw route

PostHog's UI shows person properties on the Person profile page, not
inline on each event — so a reviewer triaging an exception couldn't tell
which user hit the bug without clicking through. Fix it on both sides.

- Backend capture_exception merges user_id / user_email / user_name into
  the event properties (gated by POSTHOG_IDENTIFY_PII: none/id/email/full).
  Backed by a new _user_props_for_event helper on PosthogClient.
- Browser snippet registers user_id + user_email + user_name as super-
  properties via posthog.register({...}) so every $exception, $pageview,
  and custom event coming from posthog.captureException() carries them
  inline. Mirrors the backend so cross-referencing client/server events
  doesn't require a person-profile lookup.
- /api/debug/throw — debug-only endpoint gated by DEBUG=1 (404 in prod).
  Runs Depends(get_current_user) first so request.state.user is set when
  the unhandled-exception handler captures the event. Lets operators
  exercise the full observability path end-to-end without hand-rolling
  a TestClient script. Configurable via ?kind=ValueError&msg=...

7 new tests cover: backend user-attr merge across identify modes,
anonymous request fall-through, browser snippet super-prop emission for
logged-in / anonymous / id-only / full-name cases.

* fix(observability): address minasarustamyan PR #231 review

Two bugs caught in review.

1. PosthogInjectionMiddleware dropped Response.background on every
   return path. BaseHTTPMiddleware materialises the body and asks
   subclasses to return a fresh Response — three paths in dispatch()
   omitted background=, silently cancelling any BackgroundTask /
   BackgroundTasks the route attached (audit logging, async webhooks,
   email sends) with no log line. Fix: route every return through a
   _passthrough() helper that forwards background.

   Also adds a _MAX_BUFFER_BYTES (4 MB) cap so a streamed-HTML response
   can't balloon RSS during buffering. Bigger bodies short-circuit
   through with a warning rather than being injected.

   Regression tests in tests/test_posthog_inject_middleware.py exercise
   four return paths (snippet present, render-fail, double-injection
   guard, non-HTML passthrough) plus the streaming-guard short-circuit.

2. $ai_input / $ai_output_choices were emitted without truncation, so
   POSTHOG_LLM_PAYLOADS=1 silently dropped events past PostHog's ~32 KB
   per-event ingest limit — exactly the calls (large prompts with
   schemas / sample rows / SQL) an operator would want to inspect.
   Fix: clip both at POSTHOG_LLM_PAYLOAD_MAX_CHARS (default 30000) with
   an explicit "…[truncated N chars]" marker so readers don't mistake
   truncated captures for complete ones. Metadata (provider, model,
   tokens, latency, error) flows regardless. Three new tests cover
   default-cap clipping, env-override, and pass-through under the cap.

37 PostHog tests pass.
2026-05-08 17:57:10 +04:00
minasarustamyan
4fb2818a19
Add /marketplace browse page + Model B opt-in stack composition (#230)
* Add /marketplace browse page + Model B opt-in stack composition

New /marketplace browse surface unifies the curated marketplaces
(admin-managed git mirrors) and the community Flea Market behind
three tabs — Curated / Flea / My Stack — with per-tab category
filter, search across both sources with scope checkboxes, and
numeric pagination, all driven by URL query state. Plugin detail
at /marketplace/curated/<slug>/<plugin> and /marketplace/flea/<id>;
nested skill / agent detail at /marketplace/curated/<slug>/<plugin>/
{skill,agent}/<name> and the flea-side single-page detail.

Model B opt-in: an RBAC grant on a curated plugin is now only
*eligibility*. The user must click "Add to my stack" for it to
enter their served Claude Code marketplace. Composition flips
from (rbac ∖ opt_outs) ∪ store_installs to
(rbac ∩ subscriptions) ∪ store_installs. The legacy
user_plugin_optouts table is renamed user_curated_subscriptions
(schema v27) — same table shape, inverted semantic, repository
methods become subscribe / unsubscribe / is_subscribed.

UX vocabulary: Install → Add to my stack, Installed → In your
stack, card "Installed" badge → "In stack" (amber pill), tab
"My Subscriptions" → "My Stack". Bridges the two-step model
(server-side bookmark vs. on-laptop install) the previous label
hid. Click triggers an inline post-add hint panel under the
description with the agnes refresh-marketplace recipe + Copy
chip, dismissible per-browser via localStorage.

Per-tab info blocks above the filter row:
- Curated: trust signal — "Each plugin here has a named curator
  accountable for it." (blue accent + See-all-curators link)
- Flea: open-shelf signal — "Anyone in the company can upload
  here." (purple accent + Tips-for-sharing link)
- My Stack: personal-shelf orientation — "Your AI stack —
  everything you've added." (slate accent, no link)

Tabs carry per-tab Heroicons (shield-check / building-storefront
/ rectangle-stack) tinted to match each tab's accent; flips white
when the tab is active for contrast.

Hero illustration anchored to the right of the blue hero panel
(absolute, 47% wide, behind the search row content). Hidden
under 900px viewport.

Action-row CTAs realigned to publication intent: curated
"How to add new content" → "Submit a plugin" (links to the
guide page); flea button removed since +Upload sits next to it.
Empty-state CTAs match. /marketplace/guide/{curated,flea}
routes now host publication-flow guide pages with placeholder
ledes — full copy to be authored separately.

Categories: Heroicons-based icons mapped per category in
src/category_icons.py (zero new dependencies; SVG path strings
inlined). Marketplace cards, filter pills, and detail pages
read from the same source.

API endpoints under /api/marketplace:
- GET /items per-tab listing (curated / flea / my)
- GET /categories per-tab non-zero counts
- GET /curated/{slug}/{plugin} plugin detail
- POST/DELETE /curated/{slug}/{plugin}/install subscribe toggle
- GET /curated/{slug}/{plugin}/{skill,agent}/{name} inner item
The tab=my branch reads directly from
user_curated_subscriptions ∪ user_store_installs (not
resolve_user_marketplace, which bundles flea skills/agents into
a single store-bundle synthetic entry useful for serving the
Claude Code marketplace ZIP/git but wrong for browsing where
each item should appear as its own card).

Detail pages: plugin detail surfaces inner skills/agents as
clickable nested cards; commands/hooks/MCPs render as plain
name lists. Skill/agent detail mirrors the plugin layout with
kind-tinted accents (skill = green, agent = purple), Description
+ Details sidebar, Files + Docs sections, and the "How to call
it" copy-able invocation chip showing /<plugin>:<inner-name>
exactly as Claude Code namespaces it post-install. Curated
nested has no install button — links back to the parent plugin.

Navbar: standalone "My AI Stack" relabelled "My Stack" and
points at /marketplace?tab=my; "Store" link removed (Store
flow is reachable via the Flea Market tab's +Upload button).
The standalone /my-ai-stack and /store routes still work for
old bookmarks.

Tests cover the new browse / categories / install / RBAC paths
under tests/test_marketplace_api.py; existing marketplace and
store tests updated for Model B (explicit subscribe in fixtures).
Schema bumped v26 → v27 with idempotent migration that wipes
existing user_plugin_optouts rows on flip and adds
marketplace_plugins.created_at with registered_at backfill.

* Fix v28 migration + post-rebase test fallout

v28 ALTER TABLE marketplace_plugins ADD COLUMN created_at conflicted with
_SYSTEM_SCHEMA's earlier CREATE that already includes the column on fresh
installs (test fixtures starting at any pre-v28 version trip on it).
Switch to ADD COLUMN IF NOT EXISTS — same idiom as the upstream v27
Keboola sync-strategy migration on the same ladder.

Two test patches needed after the rebase bumped SCHEMA_VERSION 27 → 28:
- test_keboola_v27_migration.py: test_schema_version_constant_is_27 was
  pinning ==27. Loosened to >=27 (the test's purpose is to verify the
  v27 Keboola migration, not to pin the current SCHEMA_VERSION).
- test_setup_page_unified.py: was monkeypatching resolve_allowed_plugins
  but compute_default_agent_prompt now reads from resolve_user_marketplace
  (Model B-aware). Stub the right function so the test exercises the
  v28 served-set path.

* Harden curated skill/agent inner endpoints against path traversal

`_read_inner`, the `skill_dir` walk in `curated_skill_detail`, and the
`agent_path.stat` in `curated_agent_detail` joined URL path-params onto
`plugin_root` without verifying the resolved candidate stayed inside it.
Starlette's `[^/]+` on `{skill_name}` / `{agent_name}` blocks the direct
URL exploit (encoded `/` 404s before the handler), but a curator-planted
symlink inside a curated marketplace's git mirror could still dereference
outside the plugin tree on read.

Adds `_safe_join(plugin_root, *parts)` doing
`Path.resolve(strict=True)` + `relative_to(plugin_root.resolve())`, used
by all three call sites so the boundary is enforced once and consistently.
Tests cover the helper directly (normal path resolves, escaping `..`
returns None, escaping symlink returns None, missing file returns None)
plus an end-to-end check that the symlink case actually 404s on the
HTTP endpoint. Symlink tests skip on Windows where symlink creation
needs elevated permissions; they run on Linux CI.

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
2026-05-08 14:22:19 +02:00
ZdenekSrotyr
3d63965a67 Merge remote-tracking branch 'origin/main' into pr180-review
# Conflicts:
#	CHANGELOG.md
#	app/web/templates/_app_header.html
2026-05-05 12:05:50 +02:00
ZdenekSrotyr
952dc9e74d fix(profile-sessions): tolerate stat() failures on individual jsonl (Devin Review on #179)
The previous gather used `sorted(glob, key=lambda p: p.stat().st_mtime)`.
A transient OSError (race with delete, permission flicker, EBADF on a
weird filesystem) on any single file raised through the lambda and 500-ed
the whole page.

Reworked: stat each path under try/except into a (path, stat) list, sort
the already-statted entries. Bad files drop silently from the listing.

Regression test test_profile_sessions_page_tolerates_stat_failures
patches Path.stat to raise on one of two files, asserts the page returns
200 with the good row rendered and the bad row dropped.
2026-05-05 09:53:06 +02:00
ZdenekSrotyr
9ebe991b55 feat(profile): per-session jsonl download from /profile/sessions
User feedback during e2e of #179: the listing page is nice but I want
to grab the raw jsonl and look at what's inside.

Adds GET /profile/sessions/<filename>:
- Auth via get_current_user (owner-only).
- Path safety: rejects "/", "\", "..", leading ".", and any non-".jsonl"
  filename. The served path resolves under
  ${DATA_DIR}/user_sessions/<caller.id>/; if resolution escapes that
  base directory, returns 404 (never 403, so existence of other users'
  files isn't leaked).
- FileResponse with Content-Disposition: attachment.

UI: Download button per row in profile_sessions.html.

Tests in test_web_ui.py: path-traversal / nested / dotfile / non-jsonl
all 404 for owner; unauthenticated 302/401/403; authenticated owner
gets 200 + correct Content-Disposition.
2026-05-05 09:15:12 +02:00
ZdenekSrotyr
4c4dfee8e6 feat(profile): /profile/sessions page + audit on detector exception + correct SCHEDULER_AUDIT_ACTIONS
Three changes addressing user feedback during e2e test of #179 + Devin Review on e86dd5ed.

1) /profile/sessions — new self-service user page in the user menu.
   Lists all session jsonls the caller uploaded via `agnes push` joined
   against session_extraction_state. Each row shows uploaded_at, file
   size, status badge (pending/processed/extracted), processed_at, and
   items_extracted. The page docstring + help text explicitly call out
   that items_extracted=0 means the verification detector ran fine but
   the LLM found no claims to track — that's the documented "no items"
   outcome, not a broken pipeline. Closes the gap surfaced during the
   e2e test of #176 where a user could see their sessions on disk and
   process them through the LLM but had no UI to inspect what happened.

2) run_verification_detector audits unhandled exceptions (Devin #1).
   If detector.run() threw anything other than the already-translated
   ValueError, the audit_log row was never written. The endpoint now
   wraps detector.run in try/except, records the exception in
   audit_params["unhandled_error"], then re-raises as 500 after audit.
   The /admin/scheduler-runs page surfaces the failure row with the
   error type + message.

3) SCHEDULER_AUDIT_ACTIONS list corrected (Devin #2). Previous list
   had "marketplaces_sync_all" (wrong — actual is "marketplace.sync_all")
   plus "data_refresh" and "scripts_run_due" which app/api/sync.py and
   app/api/scripts.py don't write to audit_log. Fixed to the four
   actually-logged strings; comment points at the missing audit calls
   as a follow-up.

Tests: tests/test_web_ui.py adds TestAdminRoleGuards::test_profile_sessions_page_no_admin_required and tightens test_admin_scheduler_runs_page_admin_only to assert the correct marketplace.sync_all string.
2026-05-05 08:57:35 +02:00
ZdenekSrotyr
fd3c76d21b fix(store): security + correctness blockers found in PR review (F1, F2, F4, F5)
Three independent reviews of PR #180 surfaced four real defects in the new
Store / my-ai-stack surface. CHANGELOG entries detail each; one-liners:

- F1 video_url XSS: any authenticated user could upload a Store entity
  with `video_url=javascript:...` and pop XSS in any viewer's session via
  the `<a href=...>` "Watch video" link in store_detail.html. Jinja2
  autoescape doesn't block URI schemes inside attribute values. Fixed by
  scheme-validating to http(s) only on create + update; 400 invalid_video_url.

- F2 ZIP decompression bomb: _safe_zip_extract checked path-traversal but
  not declared file_size totals — a 50 MB compressed upload at 1:1000
  ratio decompresses to 50 GB and DOS the host disk. Fixed by summing
  zinfo.file_size across infolist() and refusing > 200 MB before
  extractall touches disk. 413 zip_too_large_uncompressed.

- F4 admin authz parity: PUT /api/store/entities/{id} was owner-only while
  DELETE allowed owner OR admin; the store-detail page hid Edit/Delete
  buttons from admin even though DELETE was permitted. Fixed by allowing
  admin on PUT and passing is_admin to the template; gate is now
  is_owner OR is_admin everywhere.

- F5 cross-owner suffix collision: sanitize_username is many-to-one
  (alice.smith / alice_smith both → alice-smith). Two such users uploading
  entities with the same display name produced identical
  `<name>-by-<username>` suffixes, silently colliding in the served
  agnes-store-bundle on-disk paths AND the manifest catalog (Claude Code
  dedupes by plugin.json `name`). Fixed by enforcing global uniqueness on
  the suffixed value at create_entity; 409 conflict_global_suffix.

F3 (ZIP symlink members) was investigated and confirmed to be a
false-positive — Python's stdlib ZipFile.extractall does not honor
symlink mode bits, so no exploit exists.

9 new regression tests in tests/test_store_api.py::TestStoreSecurityFixes
covering all four. Test run locally: 60/60 store-related tests pass.
2026-05-05 08:18:02 +02:00
ZdenekSrotyr
e86dd5edc5 fix(anthropic): strict json_schema (additionalProperties=false) + add /admin/scheduler-runs UI
E2E test on a real BQ deploy showed every verification-extraction call
fails with HTTP 400 invalid_request_error: "output_config.format.schema:
For 'object' type, 'additionalProperties' must be explicitly set to false".
The Anthropic structured-output API now requires the field on every object
node in the json_schema. Fix: connectors/llm/anthropic_provider.py wraps
the caller-supplied schema through a recursive _strict_json_schema()
walker that adds the field where missing (preserving any explicit
override), then passes the strict variant to the API. Six unit tests in
TestStrictJsonSchema pin the recursion across nested objects, array items,
and the no-mutation invariant.

Adds /admin/scheduler-runs — a read-only admin page that surfaces the
last 200 audit-log entries from scheduler-driven actions. New
AuditRepository.query_actions(actions, limit) helper, new admin nav
entry. Failed scheduler ticks (HTTP 401, network errors) don't reach
the audit_log; the page calls that out with a hint to set
SCHEDULER_API_TOKEN if no rows show up.
2026-05-05 08:00:57 +02:00
Minas Arustamyan
af72c5d259 fix(setup): walk TLS chain for trust-store match — Let's Encrypt cleanup
`_read_agnes_ca_pem()` decides whether the served fullchain.pem needs
trust-bootstrapping in the rendered setup prompt. Pre-fix it only
checked the leaf's *immediate* issuer against `certifi`'s trust store.
For Let's Encrypt that's the intermediate (R13), which `certifi` does
not ship — only roots are in trust stores. So a publicly-trusted LE
chain still tripped the "needs bootstrap" path and the setup prompt
emitted a step-0 TLS trust block + clone-fallback marketplace block
that no client actually needs (Bun-compiled `claude.exe`, system git,
Python via certifi all validate the chain through the bundled ISRG
Root X1).

Now we walk every cert in the fullchain (leaf + intermediates) and
return None the first time any cert's issuer is in the certifi trust
store — that captures the standard "leaf signed by intermediate signed
by publicly-trusted root" shape. Trusted subjects are read once into
a set for O(1) lookup. Self-signed (leaf.issuer == leaf.subject) and
private-CA chains (no chain link's issuer in certifi) keep their
previous "return PEM" behavior, so deployments that genuinely need
the bootstrap still get it.

Validated end-to-end against the live VM at
agnes-marustamyan.groupondev.com (LE R13 → ISRG Root X1):
  - Let's Encrypt fullchain                   → has_ca=False (was True)
  - Self-signed cert                          → has_ca=True
  - Corporate-CA chain (private root)         → has_ca=True
  - Missing fullchain.pem                     → has_ca=False
2026-05-05 04:55:06 +02:00
Minas Arustamyan
d5a7c9ad79 feat(store): /store + /my-ai-stack — community marketplace + per-user composition
Adds a community-driven Store where any authenticated user uploads
skills/agents/plugins as ZIPs, plus /my-ai-stack as the per-user
composition view. The served Claude Code marketplace is now:

    (admin_granted ∖ opt_outs) ∪ store_installs

Skill + agent installs are merged into a single `agnes-store-bundle`
plugin in the served marketplace; type=plugin uploads stay standalone.
Names are suffixed with `-by-<owner-username>` at upload time so two
owners can use the same display name without colliding in Claude Code's
flat skill/agent namespace.

Schema v23 → v24 adds three tables:
  - store_entities       — community-uploaded skills/agents/plugins
  - user_store_installs  — what each user has chosen to install
  - user_plugin_optouts  — opt-out overlay on top of admin grants

Admin grant-delete drops every user's opt-out for that plugin so
re-grant resets cleanly to enabled (no sticky personal preference).

UI:
  - /store      — e-commerce-style listing with type/category/owner
                  filters, search, pagination, owner-aware [Install]
                  buttons, clickable cards
  - /store/new  — 2-step upload wizard with drag & drop, preview
                  validation (POST /api/store/entities/preview), docs
                  multi-upload, photo + video URL
  - /store/{id} — detail page with hero, file list, docs, owner
                  actions (Edit/Delete) for the uploader
  - /my-ai-stack — Granted plugins (toggle opt-out) + From the Store
                  (uninstall) sections
  - Admin nav: Marketplaces moved into Admin dropdown, renamed to
                "Curated Marketplaces"

Validation hardening: type-mismatch guards reject skill ZIP uploaded as
agent (or vice versa), and plugin ZIPs masquerading as skills/agents.
Human-readable error messages mapped client-side from machine codes.

Cross-source naming: Store entity-id-prefixed dirs (`plugins/store-<id>/`)
plus the bundle (`plugins/store-bundle/`) avoid collisions with admin
marketplaces (whose `store` slug is reserved by `is_valid_slug`).

Bundle composition is content-hashed at serve time — install/uninstall
or owner re-upload bumps the bundle's plugin.json `version`, so Claude
Code's auto-update toggle picks up changes.

Tests: 50+ new tests across naming, repositories, filter (admin ∪ store
∪ bundle), API (upload/install/uninstall/delete/preview/docs), end-to-end
marketplace.zip with bundle merging.
2026-05-05 02:53:49 +02:00
ZdenekSrotyr
c53c1e1572 fix(ui): admin pending-review banner on /corporate-memory (#176)
The /corporate-memory page filters status IN ('approved','mandatory')
and showed no hint that pending items exist. With approval_mode set to
'review_queue' (the default in instance.yaml.example), every collection
run would silently funnel new items into the pending bucket where no
operator ever saw them.

For admins (is_km_admin), the page now renders a banner above the
stats bar:
  N pending items awaiting review — review them at /corporate-memory/admin

Non-admins see no change (the route zeroes the count server-side
before passing to the template, so the hint is never leaked).

Tests: tests/test_corporate_memory_page.py.
2026-05-05 00:01:22 +02:00
ZdenekSrotyr
2ee529533f refactor(setup-page): drop role query param
The `/setup` route no longer accepts `?role=analyst|admin`. The route
signature drops the `Literal[...] = Query(...)` parameter and the
silent admin-downgrade block (`if role == "admin" and not is_admin:
role = "analyst"`). The `role` ctx variable threaded into install.html
also goes away — Task 6 cleans up the template's role-tile UI and the
JS PAT-mint ternary.

`?role=` is silently ignored by FastAPI for unknown query params, so
existing bookmarks (none in production — the param was added in this
PR and never shipped) just degrade to the unified layout. No
RedirectResponse shim needed.

Tests: drop the entire `tests/test_setup_page_roles.py` file (eight
role-branching tests that no longer apply) and add
`tests/test_setup_page_unified.py` with three tests:

  - `test_setup_page_renders_unified_layout`
  - `test_setup_page_ignores_role_query_param`
  - `test_setup_page_renders_marketplace_for_user_with_grants`
  - `test_install_legacy_path_redirects_to_setup`

Also replace the role-aware `test_install_preview_*` tests in
test_web_ui.py with unified-layout assertions.

Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 5.
2026-05-04 22:16:59 +02:00
ZdenekSrotyr
291079b1d2 refactor(welcome-template): drop role param; resolve plugins per-user unconditionally
Removes the `role: Literal["analyst", "admin"] = "admin"` parameter from
`compute_default_agent_prompt`. The same RBAC pass
(`marketplace_filter.resolve_allowed_plugins`) now runs for every user —
admin or not. Users with no `resource_grants` rows get the
no-marketplace layout; users with grants get the marketplace block
inserted. Admin-vs-analyst is no longer a layout branch.

`render_agent_prompt_banner` no longer derives a `role` from
`user.is_admin`; it just delegates to `compute_default_agent_prompt`.
Two `compute_default_agent_prompt(...role=role)` call sites in
`app/web/router.py::setup_page` are updated to drop the keyword so the
route keeps rendering — Task 5 will remove the `?role=` query
parameter and the silent admin-downgrade block from the route signature
itself.

Tests: drop role-aware assertions from test_welcome_template_renderer
and test_welcome_template_api. Both files now assert the unified
default contains `agnes init` + `uv tool install` and bans the legacy
`agnes auth import-token` / `agnes auth whoami` verbs.

Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 4.
2026-05-04 22:13:46 +02:00
ZdenekSrotyr
92d477e422 fix(setup): default /setup to analyst, hide admin tile from non-admins
Three coupled UX fixes for the analyst-onboarding flow:

1. Dashboard "Setup a new Claude Code" CTA was rendering admin paste
   prompt for everyone (analysts couldn't actually execute the marketplace
   plugin install / skills setup steps). render_agent_prompt_banner now
   picks role based on user.is_admin — analysts get the analyst flow.

2. /setup default role changed from admin to analyst. Most visitors are
   analysts; admin layout is opt-in via the admin tile or ?role=admin.

3. Admin tile is admin-only on the role-tile nav. Non-admins see only
   the analyst tile. Server-side: non-admin requesting ?role=admin is
   silently downgraded to analyst (otherwise they'd see admin paste
   prompt despite no tile).

Tests:
- New: test_setup_page_admin_tile_hidden_for_non_admin (anonymous client
  can't see "Admin CLI" or role=admin link)
- New: test_setup_page_admin_role_downgraded_for_non_admin (anonymous
  ?role=admin → analyst layout, no marketplace step in clipboard)
- New: test_install_preview_default_role_is_analyst (admin signing in to
  bare /setup gets analyst clipboard by default)
- Renamed: test_setup_page_default_role_is_admin → ..._is_analyst
- Updated: test_setup_page_admin_clipboard_renders_admin_layout uses
  FastAPI dependency_overrides to inject admin user (admin layout is
  now admin-gated)
- Updated: test_install_preview_visible_for_signed_in_user explicitly
  passes ?role=admin to exercise admin layout
2026-05-04 20:20:37 +02:00
ZdenekSrotyr
a92c624dba feat(admin): yellow banner for legacy CLI verbs in workspace-prompt override 2026-05-04 17:46:50 +02:00
ZdenekSrotyr
f731ee7897 feat(setup): /setup?role=analyst|admin branching with role tiles 2026-05-04 17:28:47 +02:00