agnes-the-ai-analyst

Author	SHA1	Message	Date
David Rybar	3167d37a56	feat(web): operator-owned Support callout in welcome hero New `instance.support` (`AGNES_INSTANCE_SUPPORT` env override) config field renders operator-authored HTML in a mint-accent callout panel inside the welcome hero on /home, below the Overview footnotes. Designed for a one-line invitation pointing at a chat space, mailing list, or runbook so every user knows where to ask for help. - `get_instance_support()` helper mirrors `get_instance_overview()` (env > yaml > "" resolution, `| safe` filter trust boundary). - Wired into the home template context as `config.INSTANCE_SUPPORT`. - Template renders the callout inside the welcome hero, after the Overview footnotes block — empty yaml hides the block so the OSS stays vendor-neutral. - Registered in `_KNOWN_FIELDS["instance"]` so the field appears in `/admin/server-config` as "Available but unset" even before the operator populates it (discoverability for first-time setup). - 4 new tests cover the gated render path, the hidden-when-unset path, and independence from `instance.overview`. Operators who want to fill the block via terraform write the body to `modules/.../assets/support.html` in their infra repo and include it in the startup.sh yaml heredoc — the OSS template treats this as one more `| safe`-rendered field, no other plumbing needed.	2026-05-22 14:04:24 +02:00
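A minimal sketch of the env > yaml > "" precedence that `get_instance_support()` is described as using. The helper name `resolve_instance_field`, the dict-shaped yaml input, and the rule that an empty env var counts as unset are assumptions for illustration, not the code in `app/instance_config.py`:

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def resolve_instance_field(
    env_var: str,
    yaml_config: Mapping[str, object],
    key: str,
    environ: Mapping[str, str] | None = None,
) -> str:
    """Resolve an instance.<key> string with precedence env > yaml > "".

    Hypothetical helper: a non-empty env var wins, then the value under
    instance.<key> in the parsed yaml, else the empty string (block hidden).
    """
    env = os.environ if environ is None else environ
    env_value = env.get(env_var, "")
    if env_value:
        return env_value
    instance = yaml_config.get("instance") or {}
    if not isinstance(instance, Mapping):
        return ""
    value = instance.get(key)
    return str(value) if value else ""
```

The `environ` parameter exists only so the sketch can be exercised without touching the process environment; an empty result is what lets the template skip the callout entirely.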
Vojtech Rysanek	7efcb10154	Merge remote-tracking branch 'origin/main' into vr/custom-scripts-integration # Conflicts: # CHANGELOG.md	2026-05-21 14:20:37 +04:00
David Rybar	94af2581f6	feat(theme): switch default instance theme from navy to blue and enhance theme handling - Updated the default `instance.theme` to `blue`, making it the new out-of-the-box look. The previous default `navy` can still be used by explicitly setting `AGNES_INSTANCE_THEME`. - Pre-login pages now respect the configured `instance.theme`, eliminating the abrupt color change after sign-in for navy-configured instances. - Adjusted documentation and code comments to reflect the new theme settings and their implications for existing instances. - Version bump to 0.55.6 to reflect these changes.	2026-05-21 11:24:35 +02:00
Vojtech Rysanek	4b48377d44	feat(web): instance.custom_scripts — operator-injected HTML/JS into base.html Add a generic, placement-aware mechanism for operators to inject HTML/JS into every page that extends base.html or base_login.html. Each entry takes name, enabled, placement (head_start | head_end | body_end), and html. Replaces the need for per-vendor helpers when shipping feedback widgets, analytics, or error-capture snippets. Trust boundary mirrors the existing instance.logo_svg / instance.overview pattern — admin-only, rendered with `| safe`. Resolved by app/instance_config.py::get_custom_scripts(), surfaced in /admin/server-config via _KNOWN_FIELDS["instance"]. Empty default keeps the OSS vendor-neutral; sample Marker.io block ships commented out in config/instance.yaml.example as the canonical example.	2026-05-21 13:22:27 +04:00
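The commit describes each `instance.custom_scripts` entry as `name`, `enabled`, `placement` and `html`. One way to bucket entries for the three placements is sketched below; `group_by_placement` and `CustomScript` are hypothetical names, and skipping unknown placements and treating a missing `enabled` as disabled are assumptions rather than documented behavior of `get_custom_scripts()`:

```python
from __future__ import annotations

from dataclasses import dataclass

# The three placement slots named in the commit message.
PLACEMENTS = ("head_start", "head_end", "body_end")


@dataclass(frozen=True)
class CustomScript:
    name: str
    enabled: bool
    placement: str
    html: str


def group_by_placement(entries: list[dict]) -> dict[str, list[str]]:
    """Bucket enabled entries' HTML by placement, preserving config order.

    Assumptions: a missing `enabled` means disabled, and entries with an
    unknown placement are skipped instead of raising.
    """
    slots: dict[str, list[str]] = {p: [] for p in PLACEMENTS}
    for raw in entries:
        script = CustomScript(
            name=str(raw.get("name", "")),
            enabled=bool(raw.get("enabled", False)),
            placement=str(raw.get("placement", "")),
            html=str(raw.get("html", "")),
        )
        if script.enabled and script.placement in slots:
            slots[script.placement].append(script.html)
    return slots
```

A base template could then emit each bucket at its slot with `| safe`, which keeps the admin-only trust boundary in one place.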
minasarustamyan	be9549c266	Marketplace: configurable 'See all curators' URL + flea-inner hero name fix (#370) * fix(web): flea-inner skill/agent hero shows skill name, not parent plugin name * feat(marketplace): admin-configurable 'See all curators' link URL --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>	2026-05-21 11:02:29 +02:00
Vojtech	001e5ce40e	feat(web): /home value-first redesign + unified page-shell across app (#366) * feat(web): value-first /home reskin (CEO mock palette + pillars + first-session) Restructures `/home` to lead with product value instead of install steps, matching the CEO mock proposed for the homepage: - New intro hero on top — eyebrow `Welcome, {{ display_name }}`, H1 `{{ instance_brand }} is your team's AI workspace`, lede framing the product as an "AI Chief of Staff", two CTAs (`Set up in ~15 min →` jumps to the wizard, `Just browse — no install needed` jumps to `#look-around`), and a four-pillar row (Data packages · Plugins · Skills · Memory). Renders for both onboarded and not-onboarded users so the value framing is consistent across visits. - New `first-session` narrative — five-beat walkthrough (launch → pick project → memory loads → ask → close) with mock terminal frames carrying traffic-light dots, prompts, and dimmed system output. - Setup wizard chrome — progress chip (`Step 1 of N · ~15 min · One-time · Reversible`), thin progress bar, and per-step number badges on each `.install-block` so the wizard reads as bounded instead of an open-ended scroll. - Palette shift from blue to green/navy: `--hp-primary` aliases `#2ea877` (mint), `--hp-hero-bg` is navy `#0f1b3a`, code panels stay near-black `#0c1224` with warm-yellow `#ffd866` accents. The token alias is reused so downstream rules pick up the new accent automatically; instance theme overrides via `config.theme_overrides()` still win. - VS Code surface tile carries a `Recommended` pill; the existing "Want to look around first?" section is renamed to `Explore your workspace` and gets the `#look-around` anchor. All test-pinned class names and IDs (`install-hero`, `install-block`, `home-mock`, `self-mark-btn`, `setupClaudeBtn`, `offboard-strip`, `home-getting-started`, `home-gs-item`, `home-overview`, `home-usage`) preserved as structural anchors; new visual language overlays via additional classes.
Existing onboarded/not-onboarded branching, `/api/me/onboarded` POST, status frame gating, post-CTA modal, and OS-tab switching JS unchanged. Stray `~/FoundryAI` comment swapped for `~/{{ workspace_dir }}` to honor the vendor-agnostic OSS rule. 51 home tests pass without modification. * fix(web): /home palette inversion — dark intro hero on top, light setup card below Previous reskin commit kept the install-hero as a dark navy gradient and rendered the new intro hero as a light surface — opposite of what the CEO mock specifies. Playwright comparison vs `data/ceo_home.html` confirmed: - CEO mock: dark navy hero at TOP (with white pillars on navy), LIGHT white setup card BELOW with light step rows and dark code panels inset. - Previous: light intro hero on top, dark setup card below. Inverted. This patch flips both: - `.home-hero-intro` now: dark navy gradient `#0f1b3a → #1a2a5f`, green radial glow in the corner, green eyebrow, white H1 (`accent` span green), rgba-white lede, green pill primary CTA, translucent-white secondary CTA, pillars row separated by hairline border-top with green square-dot bullets in front of each pillar header. - `.install-hero` and `.install-block` now: white surface card with thin green accent strip across the top, light step rows split by hairline borders, green-tinted step-number circles (`#e6f9f0` bg, `#1f8a5e` ink), green progress chip + bar. Code panels (`.install-cmd`) and terminal frames stay dark — they're the "type this" surfaces. - All previously-rgba-white descendants of `.install-hero` (close button, eyebrow, h1, lead, links, code chips, OS tabs, install notes, setup-CTA button, self-mark fallback, auto-detect badge, terminal-howto disclosure) re-skinned for light surface. All 12 home page tests still pass (no markup changes, only CSS). 
* fix(web): /home parity polish — system font + mock sizes + blue info hint + gray step-num After v2 palette flip, user comparison vs CEO mock surfaced three remaining gaps in the wizard area: - Font stack mismatch: Agnes inherits Inter via `style-custom.css`, but the CEO mock uses the platform system stack (San Francisco on macOS, Segoe UI on Windows). The rendered weight/letterforms read noticeably different. `.home-mock` now declares `-apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif` for itself and all descendants, with the monospace stack reserved for `code`/`kbd`/`pre`, `.install-cmd`, and `.terminal-body`. - Step number badges were green-tinted; mock uses neutral gray (`#f0f2f6` bg, `#4a5168` ink) — green is reserved for the "done" state. Switched to `--hp-surface-dim` + `--hp-text-secondary`. - "Don't have a terminal open?" disclosure was an amber/yellow variant left over from the old dark-hero palette. Mock uses a blue info-hint vocabulary (`--info-bg: #eef3ff`, `--info-line: #4f7cf2`, `--info-ink: #1c3994`) with white kbd chips. Added the info-* tokens to the `:root` block and re-skinned `details.terminal-howto` (incl. summary, body, kbd) to match. Step-body type sizes also brought in line with the mock spec — `.install-block .label` (step h3 equivalent) is now 17px / 700 with 6px gap; `.install-note` body type is 14px / 1.55. `--hp-info-bg / --hp-info-ink / --hp-info-line / --hp-warn-bg / --hp-warn-ink / --hp-warn-line / --hp-surface-dim` added as first-class tokens so future hint/warn callouts pick the same colors without a duplicate vocabulary. 12/12 home tests pass. 
* feat(web): centralize design tokens + reword /home wizard to 6 steps (CEO mock parity) Two intertwined changes that touch both global design + /home structure: GLOBAL TOKEN SHIFT (app/web/static/style-custom.css) - `--primary` flipped from blue `#0073D1` to green `#2ea877` — same brand alias the rest of the app referenced, so every page picks up the new accent automatically. Old `--primary-dark` / `--primary-light` recolored to match. - New tokens added: `--brand-accent`, `--hero-bg`, `--hero-ink`, `--surface-dim`, `--info-bg/ink/line`, `--warn-bg/ink/line`. Brings the global vocabulary in line with the CEO mock's `:root` block so callouts and hero surfaces don't have to invent local tokens. - `--font-primary` switched from Inter-led stack to the system stack (`-apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Inter", system-ui, sans-serif`) so weight/letterforms render identically on macOS (San Francisco) and Windows (Segoe UI) — matches the mock and avoids a font-loading flash for analysts without Inter installed. - Shadow tints re-cast in navy `rgba(15,27,58,...)`; focus ring uses the new green `rgba(46,168,119,0.25)`. - `.app-nav-link` font-size 13px → 14px, padding 6px 12px → 8px 14px, hover bg → `--primary-light` (mint), color → `--primary-dark`. `.app-nav-menu-item.is-active` re-tinted to the same green system. - Sweep across 26 templates (style-custom.css + 25 template files) replacing every hardcoded `#0073D1` / `#005BA3` / `#E6F3FC` / `rgba(0,115,209,…)` / `rgba(0,86,163,…)` with token references or the new green hexes — 175 occurrences total. Pages that styled their own buttons / borders / shadows pick up the new brand color without per-page overrides. /HOME WIZARD: 6 STEPS PER MOCK (app/web/templates/home_not_onboarded.html) - Step 1 reworded `Install Claude Code on your computer` + `~3 min` subhead (mock copy). 
- Step 2 renamed `Pick a folder for {{ instance_brand }}` (was `create your workspace folder`) — same `mkdir` command, mock-aligned framing. - NEW Step 3 `Open a terminal inside that folder` — no shell command, just the "you are standing in the right directory" reassurance with a Finder/PowerShell/file-manager howto disclosure. Mirrors the CEO mock's Step 3. - Step 4 (was Step 3, gated by `home_automode.show`) renamed `Launch Claude with auto-approve on`. Body copy lightly updated so it references "the next step" instead of "Step 4". - Step 5 (was Step 4) renamed `Get the install script and paste it into Claude`. The setup-cta-lead now explicitly says "pasting the script into Claude Code will install {{ instance_brand }}…" so existing test assertions pinning the `install Agnes` substring still match. - NEW Step 6 `Optional: create a one-word shortcut for next time` — prints an `echo 'alias {{workspace_dir|lower}}=…' >> ~/.zshrc` one-liner for Unix and an `Add-Content $PROFILE …` equivalent for Windows. OS tabs + copy buttons reuse the existing wizard chrome. - Progress chip is dynamic: `Step 1 of 6` when home_automode is on, `Step 1 of 5` when off. Progress bar fill is `100 // total_steps`, so the bar sits at 16-20% on first paint. - `.step-lede` token added for the new short body copy beneath each step label (14.5px / ink-soft). - `macOS / Linux / WSL` tab labels changed to `macOS / Linux` per user instruction. Terminal-howto `WSL:` paragraph dropped; the paste-shortcut hint now reads `(Linux)` instead of `(Linux/WSL)`. Functional WSL handling in `connector_prompts.py` (it's a Linux detection fallback, not a user-facing label) preserved. - `setup_instructions.py` Claude Code install hint: `npm (Linux / WSL)` → `npm (Linux)`.
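The progress-chip arithmetic from the wizard changes, restated in Python for clarity. The function name `wizard_progress` is hypothetical; per the commit, the real logic lives in the Jinja template as `100 // total_steps`:

```python
from __future__ import annotations


def wizard_progress(automode_enabled: bool) -> tuple[str, int]:
    """First-paint progress chip label and bar fill for the setup wizard.

    Hypothetical restatement of the template logic: the auto-approve step
    (gated by home_automode) adds one step to the total.
    """
    total_steps = 6 if automode_enabled else 5
    fill_percent = 100 // total_steps  # floor division, as in the template
    return f"Step 1 of {total_steps}", fill_percent
```

Floor division is why the first-paint fill lands at 16% with six steps and 20% with five.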
SURFACES — 4 CARDS PER MOCK - Replaced the 3-tile `.home-usage-grid` with a 4-card grid: - VS Code (Recommended) — `.surface-card.feature`, green ring, DAILY USE eyebrow + 5-step numbered list + `Open VS Code setup guide →` link to `/setup-advanced#vscode`. - Terminal — QUICK ACCESS eyebrow + 4-step list. - Claude Code (Desktop app) — CONNECT IT eyebrow + 4-step list. - Cowork (claude.ai) — `.surface-card.incomplete`, warn-tinted border + `Instructions needed` badge + a TODO callout describing the missing content. The card is intentionally honest about the gap rather than hiding it. TEST UPDATES - `test_web_home_page.py` negative onboarded-state assertions rebased on the new step labels (6 entries instead of 4). - `test_home_route_resolution.py` `test_home_renders_automode_block_by_default` + its `_when_env_off` counterpart now check the new `Step 4 — Launch Claude with auto-approve on` label. * fix(web): /home section content + layout — verbatim mock match User comparison flagged several remaining gaps; this patch rewrites the three lower sections of /home to match the CEO mock spec exactly: FIRST-SESSION (5 beats) - h2 28px / 700 / -.5px tracking (was 19px / 600). - lede 18px ink-soft (was 13.5px secondary). - `.session-walk` wrapper, 36px gap between beats (mock spec). - `.session-step` grid 48px / 1fr, gap 22px — number circle on the left, content on the right. - `.session-num` 40 × 40 circle with SOLID GREEN bg (`--primary`) and WHITE text + soft green shadow (was 28px mint pill w/ dark-green text). - `.session-content h3` 18px / 600 (was 14.5px / 600). - `.session-content > p` 15px. - `.session-content .annotation` 13.5px ink-muted body type with `strong` for highlighting (replaces the upper-case "WHAT'S HAPPENING" eyebrow pattern that didn't match the mock). - `.session-intro` callout card (white surface + mint icon block) framing the "five beats" tagline. - `.session-tldr` summary box (brand-light bg + brand-dark left border) wrapping up the loop. 
- Terminal frames re-skinned: `#0c1224` body / `#182241` bar / real macOS traffic-light colors `#ff5f57` / `#febc2e` / `#28c840`. - Terminal body 13px / 1.65 line-height with mock-spec class vocabulary: `.you` (yellow input), `.ai-name` (brand bold), `.path` (light blue), `.dim` (translucent code-ink), `.caret` (blinking cursor). - Five beats rewritten with mock's exact narrative flow (launch → menu → pick → ask → close), vendor-agnostic project names (`RevenueAnalysis`, `Onboarding`, etc.) replacing the customer-specific `GRPN_` examples in the mock. Templated `{{ instance_brand }}` / `{{ workspace_dir }}` / `{{ workspace_dir | lower }}` (the shortcut alias) everywhere. SURFACES (4 cards) - The section is no longer wrapped in a white rectangle; the `.home-usage` class loses its bg + border + padding (mock has the cards directly on the page bg). - h2 28px (was 22px). Eyebrow 12px / 1.5px tracking / brand-dark. - `.surface-card.feature` (VS Code) now uses 2px green border + vertical brand-light → white gradient (was 1px ring). - `.surface-card.incomplete` (Cowork) uses 2px red border (`#e35e5e`) + vertical red-tint → white gradient (was yellow flat bg). - `.surface-card .steps` panel: inner surface-dim bg + 8px radius + 13px font. - `.surface-foot` top-border + ink-muted (mock spec). - `.badge-warn` now a solid red box (`#e35e5e` bg + white ink + 4px radius) instead of a yellow pill, matching the mock. - Header layout fixed: the global absorbed `header { display: flex; justify-content: space-between }` rule was making the h2 sit on the right of the eyebrow; explicit `display: block` override on `.home-mock section > header` puts the title on the LEFT under the eyebrow as the mock has. BROWSE — Explore your workspace - Wrapped in `<section class="browse-section">` with proper eyebrow + h2 + lede (was a bare `.section-label` div). - `.browse-grid` 5-col grid (was responsive auto-fill, 4-card layout).
Skills tile added as a 5th card linking to `/marketplace?type=skills`. - `.browse-card` mock-spec: 22 20 padding, 28px icon, 15px title, 12.5px ink-muted desc, hover lifts -2px with brand border + shadow-md. Section wrappers (`.home-usage`, `.first-session`) no longer carry the white card chrome — they sit directly on the page bg, matching the mock. Only Getting Started + Overview keep their white cards. GLOBAL eyebrow vocabulary (`.home-hero-intro .eyebrow`, `.first-session > .eyebrow`, `.surfaces > header .eyebrow`, `.browse-section .eyebrow`) all aligned to mock spec: 12px / 700 / 1.5px tracking / brand-dark color / 14px bottom margin. Hero h1 bumped to 44px / 800 / -1px tracking (was 32px / 600). 51/51 home tests pass. fix(web): /home session-intro card + terminal-body verbatim mock match User comparison flagged three remaining /home gaps; this patch addresses each: - `.session-intro` rule was missing — the "five beats" tagline rendered as a bare line with no card chrome. Added the mock-spec card: white surface, 14px radius, 20×24 padding, 1px border + shadow-sm, with a 44×44 brand-light icon block on the left. - Beat 1 terminal-title was `~/{{ workspace_dir }} — zsh` (mock-style shell-pwd format), but the user wants every terminal frame across all 5 beats to read `claude — {{ instance_brand }}`. Updated. - Terminal-body line structure for beats 2-5 rewritten verbatim from the CEO mock: - `<span class="prompt">></span><span class="you">…</span>` now has no space between the prompt and user input (mock pattern: zero gap, the .prompt's `margin-right: 8px` provides the visual separation). - Beat 2 menu items use `<strong>[N]</strong>` numbering with project entries on indented lines, each project name followed by a `<span class="dim">(N ago)</span>` timestamp at a fixed column — instead of my prior single-line concatenation.
- Beat 3 narrative split into 4 stanzas separated by blank lines (matches mock): the "Switched to <strong>X</strong>" status, then dim Loaded/Last-session lines, then a stand-alone "One unprocessed input detected:" pair, then the "Want me to process …" question. My prior version dim-wrapped the entire block, which looked off. - Beat 4 narrative split into headline summary + risks section with <strong> heads + bullet lists separated by blank lines, matching the mock's "Q1 close summary" / "Open risks" rhythm. The Q1 question carries the mock's manual line-break + 2-space continuation indent inside the `.you` span — without that, terminal-body's `white-space: pre-wrap` would auto-wrap awkwardly at a different column than the mock. - Beat 5 exit narrative uses two separate dim lines + a standalone `.ai-name` "See you next time." line, then prompt + caret. My prior version collapsed everything into one dim block. - Project names changed from customer-specific (`GRPN_`) to generic (RevenueAnalysis, WeeklyReview, Onboarding, OpsDb, HRHandShake) so the OSS distribution stays vendor-agnostic per CLAUDE.md. - `Marketing plan` examples replaced with `Q1 close` so the narrative stays plausible for an analyst audience. 12/12 home tests pass. fix(web): /home surfaces verbatim mock — VS Code thumb, Terminal expected-output, NEW badge User comparison flagged three remaining surface-section gaps: - VS Code surface card was rendering a generic "Screenshot pending" placeholder; the mock has a labeled inline mockup (`<a class="vscode-thumb">` w/ `.thumb-fallback`) showing the recommended 4-pane layout (EXPLORER yellow, TERMINAL 1 purple, TERMINAL 2 green, TERMINAL 3 orange) on a dark navy bg + a "Recommended layout" caption pill. CSS `.vscode-thumb` block added — uses gradient-strip backgrounds to draw the colored panel bars without needing a base64 image. - "Recommended" badge was a pill (999px radius) with `--brand-accent` bg + navy text.
Mock uses `.badge` instead of `.recommend-pill` — solid `--primary` (brand-dark green) bg with WHITE text and 4px radius. Replaced the class + CSS rule so the badge reads as a tag, not a pill. - Terminal surface card was missing the "What you should see" subsection — mock has an `.expected-output` block showing a sample of the welcome menu inside a dim dashed panel. Added the block with the mock's exact rendered output (templated to `{{ instance_brand }}` + generic project names instead of customer-specific GRPN entries) plus the `.expected-output` CSS (surface-dim bg + dashed border + `::before` "WHAT YOU SHOULD SEE" eyebrow per mock spec). Also addressed the explore-section feedback: - Skills browse-card now carries the `new` class so it picks up the `.browse-card.new::after` corner badge ("NEW", green bg, white text, 10px / 700 / 0.5px tracking) per mock. - Browse cards align same height via `align-self: stretch` (grid default) + `flex-grow: 1` on `.browse-desc` so descriptions fill remaining vertical space; previously the Skills tile sat shorter because its desc text was longer than others'. Structural HTML changes to all four surface cards: dropped the inner `<div class="surface-card-head">` wrapper + `<p class="surface-pitch">` class in favor of mock's flat layout (`.what` + `.steps` + `.when-to-use`). `<ol class="surface-steps">` replaced with `<div class="steps"><strong class="steps-eyebrow">DAILY USE / QUICK ACCESS / CONNECT IT</strong> <ol>...</ol></div>` so the eyebrow + numbered list share the mock's tinted surface-dim panel. 12/12 home tests pass. * fix(web): align /home setup walkthrough to design spec - Setup-section header (eyebrow + heading + lede) floats above the install hero; install card has no accent strip; step labels drop `Step N —` prefix; closing strip is single flex row. - VS Code surface card renders recommended-layout screenshot from `/static/img/vscode-layout.png` with click-to-enlarge lightbox. 
- Workspace install path cascades to `~/Desktop/{workspace_dir}` in every step, surface card, first-session annotation, and shortcut. - Step 1 verify text restores Enterprise — Finance and Legal option. - Step 6 shortcut installs a shell function with arg forwarding (`"$@"` unix / `@args` windows) and a user-facing Auto / YOLO permission-mode toggle. - Step 5 manual-fallback details inline on the CTA row; description reads at step-lede size, not 13px chip. - Setup-section heading no longer right-aligns (was inheriting `header { display: flex; justify-content: space-between }` from the legacy stylesheet; wrapper changed to `<div>`). - Getting Started `<details>` block removed (duplicated links). * test(web): align /home tests with restructured setup wizard - Replace test_getting_started_card_renders_on_home with test_setup_section_renders_for_not_onboarded — asserts the new setup-section-header floats above the install hero and Getting Started markup is absent (block removed in the prior commit). - Update automode-block test to match labels without the `Step N —` prefix. - Update setup-CTA partial test to match the relabeled "Copy install script to clipboard" button. Drop orphaned CSS for `.home-getting-started`, `.home-gs-summary`, and `.home-gs-item` — selectors had no matching markup after the Getting Started block was removed. Also: Step 3 `pwd` expected-output uses an absolute path (`/Users/yourname/Desktop/{workspace_dir}`) instead of the tilde-prefixed form, matching what the command actually prints. fix(web): repaint home_onboarded + setup_advanced; align CTA label - home_onboarded + setup_advanced still carried the retired blue `#0056A3` as both `--hp-primary-dark` and the hero gradient endpoint. Both reference `var(--primary-dark)` now so the green palette cascades. - setup_advanced YOLO snippet was the old `alias` form (no cd, no arg forwarding). 
Replaced with the shell function variant from /home Step 6 — drops into ~/Desktop/{workspace_dir} and forwards "\$@" (unix) / @args (Windows). - setup_advanced ~/{workspace_dir} path references cascaded to ~/Desktop/{workspace_dir} so install story matches /home. - Dashboard's "Setup a new Claude Code" button label aligned to the canonical "Copy install script to clipboard" — matches /home and the new docstring in _claude_setup_cta.jinja, which now mandates this label across consumers. * fix(web): keep base brand blue; scope green palette to /home redesign User noticed login + dashboard had turned green when the /home redesign flipped --primary from blue (#0073D1) to green (#2ea877) in commit 278f202e. The brand-wide flip went further than the redesign needed — only /home, /home (onboarded), and /setup-advanced intentionally use the green/navy spec; every other page (login, dashboard, catalog, marketplace, admin, profile) was just inheriting the green because --primary cascaded everywhere. Revert the global brand colour to blue and lock the green into the two outstanding redesign scopes: - style-custom.css: --primary back to #0073D1, --primary-light back to rgba(0,115,209,0.1), --primary-dark back to #005BA3, --brand-accent back to a lighter blue. - home_onboarded.html: .home-mock now sets --hp-primary, --hp-primary-dark, --hp-primary-light to explicit green hex (matching home_not_onboarded), so the hero stays green regardless of the global brand. - setup_advanced.html: same lock — .advanced-mock pins the green palette in-scope. Hero gradients on both pages now reference the local --hp-primary chain (not the global --primary), so any future palette tweak inside either scope cascades correctly without disturbing the rest of the app. * refactor(web): hoist --hp-* into shared design-tokens.css (--ds-) PR 2 of the design-system extraction ladder. Pure mechanical rename + dedup; no visual diff on any rendered page (verified on /home, /dashboard). 
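PR 2 is described as a purely mechanical `--hp-` to `--ds-` prefix rename. A sketch of the rename step alone (not the dedup of the local token blocks), using Python's `re.subn`, which returns the rewritten text and the substitution count in one call; the regex and the function name `rename_tokens` are assumptions:

```python
from __future__ import annotations

import re

# "--hp-" at the start of a custom-property name, not inside a longer identifier.
_HP_TOKEN = re.compile(r"(?<![\w-])--hp-(?=\w)")


def rename_tokens(source: str) -> tuple[str, int]:
    """Rewrite --hp-* custom properties to --ds-* and report how many changed."""
    return _HP_TOKEN.subn("--ds-", source)
```

Summing the per-file counts is one way to cross-check a total such as 313 + 15 + 39 = 367 references.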
- New app/web/static/css/design-tokens.css declares the full token set on :root: brand surface (green primary, primary-dark, mint light, brand-accent), hero (navy bg + ink), code-panel (near-black bg + cool ink + warm-yellow), light surfaces (bg/surface/border), text (primary/secondary/muted), orange accent, info + warn callout vocabularies, navy-tinted elevation shadows, system font stack + mono. - base.html loads it alongside style-custom.css so the tokens are globally available. - Rename --hp-* -> --ds-* in home_not_onboarded (313 refs), home_onboarded (15), setup_advanced (39). 367 token references pointed at one of three local blocks; now all point at the global :root. - Drop the three local token blocks. Each scope class (.home-mock / .advanced-mock) only keeps its base ink + font-size + line-height rules. The legacy --primary family stays canonical for the blue base brand — login, dashboard, catalog, marketplace, admin still read blue. The design system is opt-in via the scope class. * refactor(web): extract shared components.css; migrate /home markup PR 3 of the design-system extraction ladder. First batch of reusable components lifted out of home_not_onboarded.html into a new shared stylesheet; markup migrated to consume them. - New app/web/static/css/components.css with five components, all reusable on any page that loads design-tokens.css: .callout-rec — amber lightbulb recommendation box .callout-hint — blue info hint box .code-output — "WHAT YOU SHOULD SEE" terminal output block .lightbox — full-bleed image enlarge overlay .setup-section-header — wizard header (eyebrow + h2 + lede) - base.html loads components.css after design-tokens.css. - home_not_onboarded.html markup renamed: class="rec" -> class="callout-rec" class="hint" -> class="callout-hint" class="expected-output" -> class="code-output" - Local CSS rules removed from home_not_onboarded.html for each of the extracted components — ~150 lines down to 5-line "extracted to components.css" comments.
The bespoke wizard-specific styles (.install-cmd, .os-tabs, .mode-tabs, .terminal-frame) stay template-local for now since they only have one consumer. Visual regression check: /home install hero renders the amber rec callout, blue hint callout, dashed code-output block, green section header, and click-to-enlarge VS Code thumb identically to the pre-extraction render. 43 home tests pass. * fix(web): unify page-headers — activity-center full-width, marketplace shares box - /activity-center audit-log hero rendered as half-width because the _page_hero include was inside <header class="obs-topbar">, a flex row that pinned the time-range + auto-refresh controls next to it. The hero is now a sibling rendered before the <header>, so it spans the full container width like every other admin page; the controls keep their flex row underneath. - Marketplace hero unified with .page-header--hero. Markup is now <section class="page-header page-header--hero mp-hero"> so the shared box drives padding/radius/gradient/max-width/shadow; the .mp-hero override block only carries the right-anchored cover image and the rules for the search row + scope checkboxes (which the canonical hero doesn't have). Inner text uses the canonical .page-header__eyebrow / __title / __subtitle classes. - .page-header--hero shadow tint now follows the brand blue (rgba(0, 115, 209, 0.2)) instead of the leftover green from the prior palette flip; same depth highlight everywhere the gradient is blue. * fix(web): unify remaining page heroes — admin, profile, install, store, stack Sweep across pages that carried bespoke gradient hero markup so every page-hero shares the canonical `.page-header--hero` dimensions (padding 28/32/24, border-radius 14, max-width var(--width-app), navy-tinted shadow, gradient with --primary → --primary-dark). Inner text uses the .page-header__eyebrow / __title / __subtitle classes so typography matches across the app. - admin_tables: migrated to _page_hero.html include. 
- admin_tokens: kept .tokens-hero wrapper for the counts-chip row but added the canonical class on the same element; stripped duplicate gradient + padding + typography rules. - install: same pattern (kept hero-meta pill row). - profile: migrated to _page_hero.html include. - store_upload: kept .upload-hero wrapper for the .meta chip row; composite class with the canonical hero. - setup_advanced: .advanced-mock .ad-hero now matches canonical dimensions; green palette retained via --ds-primary/dark. - stack_card.css: .stack-hero (catalog + corporate-memory search hero) uses canonical gradient + padding + max-width. The detail-page heroes (marketplace_plugin_detail, marketplace_item_detail, catalog__detail, store_edit, admin_group_detail, admin_store_submission_detail) stay bespoke for now — they're rich detail headers with photos, badges, install actions; converting them would lose contract context. Same applies to dashboard.html env-setup-cta (it's a CTA card, not a page hero). fix(web): canonicalise .container — single page shell every page inherits Previously each admin page set its own `.container:has(.<page>) {max-width: none}` + `.<page>-page {max-width: 1400px}` override, and per-page hero markup either nested inside flex toolbars (which pinned the hero next to filter controls and squeezed it half-width) or self-constrained with a different max-width than the page. /home, /dashboard, /marketplace, and /admin/* ended up at different widths with different nav-to-hero gaps. - style-custom.css `.container` now carries the canonical 1280px max-width + `16px 32px 48px` padding so every page inherits the same nav-to-hero gap and side gutters. `.container > main` is margin/padding 0 so the container is the sole owner of gutters. - `.page-header--hero` drops its self-constraining max-width and auto-centering margin — the container provides the width, so the hero sits flush with the table/toolbar below it. 
- `.stack-hero` (catalog + corporate-memory) and `.advanced-mock .ad-hero` (/setup-advanced) follow the same pattern: container owns the width. - Per-page max-width overrides stripped from admin_users, admin_access, admin_groups, admin_marketplaces, admin_welcome, admin_workspace_prompt. - _page_hero include extracted from inside flex toolbars on admin_users, admin_access, admin_groups, admin_marketplaces, admin_server_config, admin_welcome, admin_workspace_prompt, admin_sessions, admin_session_detail, admin_usage, activity_center. The toolbar (`.users-toolbar`, `.gp-toolbar`, etc.) keeps only the filter + action controls; the hero renders before it as a sibling. - _page_chrome.html trimmed to just the page-background tint for the redesign scopes; the duplicate `.container` rules it carried are now redundant. Verified: /home, /admin/marketplaces, /admin/users all render container width 1280px with hero top at 88px (16px below the 72px-tall sticky nav). Same spacing as /home design spec. * fix(web): admin_tables + admin_corporate_memory inherit canonical .container Both pages were overriding `{% block layout %}` from base.html, which bypasses the canonical `.container` wrapper. Result: the hero spanned the full viewport (1596px on a wide screen) while the inner content sat at a narrower max-width — hero and content didn't align, and the nav-to-hero gap differed from every other admin page. Switched both templates to `{% block content %}` so they render inside the canonical `.container` from base.html — same path as admin_groups, admin_users, admin_marketplaces, etc. - admin_tables: dropped local `.page-title { max-width: 1600px }` + `.content { max-width: 1600px }` overrides (kept typography + inner gutter rules) and the mobile padding overrides that paired with them. Container now owns the gutters. - admin_corporate_memory: only the block keyword needed changing; the template already had a clean inner structure (no max-width override on `.container-memory`).
Verified on /admin/tables and /admin/corporate-memory: - .container width 1280, padding 16/32/48 - Hero top 88 (nav 72 + container padding-top 16) - Hero + content both 1216px wide, both at left 190 — perfect alignment with /admin/groups. * fix(web): drop .page-shell padding override + admin_tables stale :root Two regressions discovered after the canonical-container unification: 1. `.container:has(.page-shell)` still set `padding: 28px 32px 48px` while the canonical `.container` had moved to `16px 32px 48px`. Every page-shell consumer (/admin/sessions, /admin/sessions/<id>, /admin/usage, /marketplace, /dashboard, marketplace detail pages, /me/activity, /store/, /admin/store-submissions) was rendering with a 28px nav-to-hero gap while /admin/users + /admin/groups rendered with 16px. Same width, mismatched vertical rhythm. The opt-in rule is now a no-op marker: canonical container already provides 1280px + 16/32/48 + main margin/padding 0. 2. admin_tables.html had a stale `<style>` block that re-declared `:root { --primary: var(--primary); ... }`. The self-referential token resolved to empty, collapsing the page-header hero's `linear-gradient(135deg, var(--primary), var(--primary-dark))` to no background — the hero appeared as a pale ghost without colour. The entire shadow `:root` block was a stale copy of the design tokens that style-custom.css already provides. Dropped it; tokens now resolve from the global `:root`. After both fixes /admin/sessions, /admin/tables, and every other page-shell consumer match /admin/groups exactly: container 1280px, container padding-top 16px, hero at top 88px / left 190px / width 1216px. 
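The pixel numbers quoted in these commits (hero at top 88 / left 190 / width 1216, and the old 28px and 44px nav-to-hero gaps) follow from simple box arithmetic. A minimal Python sketch, assuming a centred container (`margin: 0 auto`), a 72px sticky nav, and a 1596px-wide viewport (the full-bleed hero width reported for /admin/tables); `hero_box` and `Box` are hypothetical helpers, not part of the codebase:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    top: int
    left: int
    width: int


def hero_box(viewport: int, nav_height: int = 72, max_width: int = 1280,
             pad_top: int = 16, pad_x: int = 32,
             extra_top: int = 0, extra_x: int = 0) -> Box:
    """Hypothetical: position of the hero as the first child of `.container`.

    Assumes the container is centred with `margin: 0 auto` and sits directly
    below the sticky nav. `extra_top` / `extra_x` model a nested wrapper that
    adds its own padding inside the container, like the old `.tokens-page`
    rule (28px top, 8px sides).
    """
    container_width = min(viewport, max_width)
    container_left = (viewport - container_width) // 2  # centring margin
    return Box(
        top=nav_height + pad_top + extra_top,
        left=container_left + pad_x + extra_x,
        width=container_width - 2 * (pad_x + extra_x),
    )


print(hero_box(1596))                           # Box(top=88, left=190, width=1216)
print(hero_box(1596, pad_top=28))               # Box(top=100, left=190, width=1216)
print(hero_box(1596, extra_top=28, extra_x=8))  # Box(top=116, left=198, width=1200)
```

The nav-to-hero gap is `top - 72`: 16px for the canonical container, 28px for the old `.page-shell` padding, and 44px for the old nested `.tokens-page` wrapper, which also narrowed the hero to 1200px.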
fix(web): drop /admin/tokens .tokens-page width + padding override `.tokens-page` carried its own `max-width: 1280px; margin: 0 auto; padding: 28px 8px 48px` block — the canonical `.container` already provides width + 16/32/48 padding, so the nested wrapper was adding 28px on top of the container's 16px (= 44px nav-to-hero gap, vs 16px on every other admin page) and shrinking the hero sideways by 8px on each side (1200px vs the canonical 1216px). After: container owns the layout; `.tokens-page` is just a font-family scope. /admin/tokens hero now sits at top 88, left 190, width 1216 — same numbers as /admin/groups / /admin/users. * fix(web): hero links readable on blue; /admin/access Groups link href - New `.page-header--hero a` rule in style-custom.css forces any anchor inside a gradient hero to render white + underlined so links stay readable on the blue background. Previously links inherited the global `var(--primary)` blue, which disappeared on top of the matching blue gradient. No per-page class needed — drop a plain `<a>` in any hero subtitle and it just works. - /admin/access hero subtitle was Jinja-passing the inline link with HTML-entity-encoded quotes (`href=&quot;...&quot;`). The entities decoded to literal `"` characters inside the rendered href, producing `/admin/%22/admin/groups%22` — a 404. Switched the `set` to a block-set (`{% set page_hero_subtitle %}...{% endset %}`) so the inline `<a href="/admin/groups">Groups</a>` survives unescaped through `_page_hero.html`. Also stripped the now-redundant inline `style="color:#fff;text-decoration:underline;"` — the new shared rule handles it. * fix(web): /dashboard top padding matches every other page `.main` on /dashboard had `padding: 28px 32px 48px` while every other page now uses `16px 32px 48px` via the canonical `.container`. Dashboard bypasses `.container` (overrides base.html's `layout` block to render a full-width `<main>` directly), so the padding lives on `.main` itself — bumped the top to 16px to match. 
After: first child top = 88, left = 190, width = 1216 — same numbers as /admin/groups / /admin/users / /admin/marketplaces. * fix(web): green eyebrow + white title on .page-header--hero (matches /home) `.page-header--hero .page-header__eyebrow` was faint white (rgba(255,255,255,0.75)) — readable but unbranded against the blue gradient. Changed to `var(--ds-brand-accent)` (mint green #54d3a0) so every page hero pairs a green eyebrow with white title + subtitle, echoing /home's setup-section header (green eyebrow, dark heading combo). One CSS rule applies everywhere — no per-page styling needed. Also bumped the eyebrow to font-weight 700 / letter-spacing 1.2px so the green stands out cleanly against the gradient. * fix(web): page-header--hero + stack-hero use /home navy gradient `.page-header--hero` and `.stack-hero` were on the brand-blue gradient (`var(--primary)` → `var(--primary-dark)`) while /home's hero (`.home-hero-intro`) sits on the deeper navy gradient (`#0f1b3a` → `#1a2a5f`). Every other page-hero now uses that same navy gradient so /home, /marketplace, /catalog, /corporate-memory, /admin/, /profile, /install, /dashboard, /setup-advanced share one brand surface. Shadow tint adjusted to the navy depth (rgba(15, 27, 58, 0.22)). Brand blue stays the link/CTA colour everywhere else; only the hero box itself is navy. fix(web): primary buttons green; marketplace tabs navy translucent Two parity tweaks pulling the rest of the app toward /home's visual language. - `.btn-primary` (both rules in style-custom.css) now uses `var(--ds-primary)` / `var(--ds-primary-dark)` green fill, matching the "Copy install script to clipboard" button on /home. Brand-blue `--primary` still drives link colour and the accent surface; only the filled button background flipped to green. Every page with a `.btn-primary` (admin "+Add user", "+Add marketplace", catalog, marketplace actions, dashboard, modals) now reads as the same "do it" affordance. 
- `.mp-tabs` (Curated Marketplace / Flea Market / My Stack tab group) now sits on the navy `--ds-hero-bg` with translucent white pills (rgba(255,255,255,0.10) inactive, 0.18 active) — same translucent-white-on-navy treatment as the "Just browse — no install needed" pill on /home. Icons render as soft white; per-tab colour-coding dropped in favour of the unified surface. * fix(web): catalog/memory tabs + empty-state CTA + admin action buttons Bring /catalog and /memory in line with /home + /marketplace: - `.stack-tabs` (Browse / My Stack / Recipes on /catalog, Browse / My Stack on /memory) now uses the navy `--ds-hero-bg` container with translucent-white-on-navy pills, mirroring the `.mp-tabs` treatment and /home's "Just browse — no install needed" CTA pill. Per-tab icon colour-coding dropped — icons render as soft white on the navy fill. - `.stack-tabs-row__actions .btn` (right-slot "+New Recipe", "+New Data Package" admin CTAs) now uses green primary fill (`--ds-primary`), matching `.btn-primary` and /home's "Copy install script to clipboard" button. - `.stack-empty .cta a` (empty-state action button — the "Open /admin/tables →" CTA on /catalog and equivalent on /memory) flipped from blue `--primary` to green `--ds-primary` so the colour aligns with every other primary button in the app. * fix(web): marketplace Search button green (--ds-primary) matching other CTAs * fix(web): unify Search button + admin-action button across browse pages - Added Search button (`<button class="stack-hero__search-btn">`) to /catalog and /memory heroes — same green pill as /marketplace. Wired to the existing live-filter pipeline (button click runs `applyFilters()` and refocuses the input). All three browse pages now wear the identical search bar UI. - `.stack-hero__search-btn` shares `--ds-primary` fill with `.mp-hero .search-btn`. 
- `.mp-actions .btn` ("Submit a skill or plugin" CTA on /marketplace) flipped from the legacy blue-outline to the same green primary fill + dimensions (`display: inline-flex; line-height: 1; padding: 9px 16px; gap: 6px`) as `.stack-tabs-row__actions .btn` on /catalog and /memory. All three right-slot action buttons now render at identical height. - `.stack-tabs-row__actions .btn` got `inline-flex` + `line-height: 1` + `gap: 6px` so a `<button class="btn">` and an `<a class="btn">` both render at exactly 33px high — the embedded `.admin-only-hint` chip no longer pushes one variant taller than the other. * fix(web): marketplace guide CTAs green (fastpath + primary); drop flea purple * fix(web): dashboard CTA hero on navy; readable <code> chips in hero - `.env-setup-cta` on /dashboard ("Set up a new Claude Code" card) flipped from the brand-blue gradient + green-tinted shadow to the canonical navy gradient (`--ds-hero-bg` → `#1a2a5f`) with navy-tinted shadow + 14px radius + 28/32/24 padding, matching `.page-header--hero` and /home's `.home-hero-intro`. Dashboard's top CTA now sits on the same brand surface as every other hero. - Added `.page-header--hero code` rule — translucent white pill + warm-yellow ink (#ffd866) so `<code>` chips embedded in hero subtitles read as code samples against the navy gradient. The global `code` rule sets `color: var(--text-primary)` (dark), which turned in-hero chips into invisible dark-on-white-on-navy ghosts (e.g. the `-by-dev` suffix on /store/new). - /store/new's `.page-header__subtitle code` dropped its inline style override — the shared rule handles it now. * feat(web): two-theme switching via data-theme + admin toggle Introduces a theme system that flips the entire UI palette between "navy" (current design, default) and "blue" (pre-redesign palette) via a single `<html data-theme="...">` attribute. Page markup, class names, and component styles don't change — only the `--ds-` token values flip. 
Backend - New `app/instance_config.py::get_instance_theme()` resolves the active theme from `AGNES_INSTANCE_THEME` env > `instance.theme` in instance.yaml > default "navy". Unrecognised values clamp to "navy" so a typo doesn't break the page. - `app/web/router.py::_build_context` injects `instance_theme` alongside `instance_brand` etc. so every template inherits it. - `app/web/templates/base.html` renders `<html lang="en" data-theme="{{ instance_theme | default('navy') }}">`. CSS - `app/web/static/css/design-tokens.css` adds two new tokens to the default `:root` set: `--ds-hero-shadow` (drop-shadow tint on hero boxes) and `--ds-hero-eyebrow` (eyebrow accent colour). Plus a `:root[data-theme="blue"]` override block that flips eight tokens: `--ds-primary`, `--ds-primary-dark`, `--ds-primary-light`, `--ds-brand-accent`, `--ds-hero-bg`, `--ds-hero-bg-deep`, `--ds-hero-shadow`, `--ds-hero-eyebrow`. The blue theme aliases the brand surface tokens back to the legacy `--primary` family. - `.page-header--hero`, `.stack-hero`, `.env-setup-cta`, `.home-mock .home-hero-intro` now reference the new `--ds-hero-shadow` and `--ds-hero-bg-deep` tokens instead of hard-coding `rgba(15, 27, 58, 0.22)` and `#1a2a5f` — gradient + shadow now flip with the theme. - `.page-header--hero .page-header__eyebrow` uses `var(--ds-hero-eyebrow)` so the eyebrow goes mint-green on navy and translucent-white on blue (mint on blue reads poorly). Admin - `app/api/admin.py::_KNOWN_FIELDS["instance"]` now registers a `theme` field of kind `select` with options `["navy", "blue"]` and a `hint` explaining the trade-off. The existing /admin/server-config UI auto-renders a select for this — no template changes needed. Defaults - Default value is "navy" so existing instances see no visual change. Admins flip to "blue" via /admin/server-config to restore the pre-redesign look. Restart note: uvicorn must reload to pick up the Python changes (new getter, new template-context key, new known-field). 
CSS changes hot-reload via browser refresh. fix(web): blue theme — home hero eyebrow + CTA contrast `.home-hero-intro .eyebrow` and `.btn-intro-primary` referenced `--ds-brand-accent` directly, which on the blue theme resolves to the lighter brand-accent blue (#4F9DEB). Result: a light-blue eyebrow on the blue gradient ("WELCOME, ADMIN" barely readable) and a light-blue button with darker-blue text ("Set up in ~15 min"), all in the same hue range. Adjusts one existing token and introduces three new theme-aware tokens: - `--ds-hero-eyebrow` already existed; the blue theme bumps its opacity to 0.92 so the eyebrow reads as white. - `--ds-hero-cta-bg` + `--ds-hero-cta-fg` + `--ds-hero-cta-bg-hover` flip the primary hero CTA: mint-green on navy (default), white-on-blue under `data-theme="blue"`. `.home-hero-intro .eyebrow` now uses `--ds-hero-eyebrow` (mint on navy / white on blue) and `.btn-intro-primary` uses the CTA token trio. Recommended palette on the blue theme: - Eyebrow: white at 92% opacity (clear on the blue gradient). - Primary CTA pill: white background, brand-blue dark text (`--primary-dark` = #005BA3) for AAA-level contrast. - Secondary CTA: translucent white pill (unchanged). * fix(web): blue theme — callout-hint info bg/border/ink re-tinted to brand blue (was indigo, clashed with brand-blue hero)	2026-05-21 06:19:16 +00:00
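The env > yaml > default resolution described for `get_instance_theme()` can be sketched as follows. This is not the helper's actual code: `resolve_instance_theme`, its parameters, treating an empty env var as unset, and the case/whitespace normalisation are all assumptions, and YAML parsing is left to the caller so the example only needs the standard library.

```python
import os
from typing import Any, Mapping, Optional

KNOWN_THEMES = ("navy", "blue")
# Default per this commit; the later 0.55.6 commit switches the default to "blue".
DEFAULT_THEME = "navy"


def resolve_instance_theme(instance_yaml: Mapping[str, Any],
                           env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical sketch: AGNES_INSTANCE_THEME env > instance.theme > default."""
    env = os.environ if env is None else env
    # 1. A non-empty env var wins (treating "" as unset is an assumption).
    raw = env.get("AGNES_INSTANCE_THEME")
    if not raw:
        # 2. Fall back to `instance.theme` from the already-parsed instance.yaml.
        raw = (instance_yaml.get("instance") or {}).get("theme")
    # Normalising case and whitespace is an assumption, not documented behaviour.
    value = str(raw or "").strip().lower()
    # 3. Unrecognised values clamp to the default so a typo can't break the page.
    return value if value in KNOWN_THEMES else DEFAULT_THEME


print(resolve_instance_theme({"instance": {"theme": "blue"}}, env={}))    # blue
print(resolve_instance_theme({"instance": {"theme": "purple"}}, env={}))  # navy
```

The returned string is what would land in `<html data-theme="...">`; clamping rather than raising means a misconfigured value degrades to the default palette instead of failing the request.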
ZdenekSrotyr	64cf78860d	feat(stack): unified Browse + My Stack for Data Packages and Memory (v49 schema) (#333) * feat(unified-stack): Browse + My Stack + Recipes + RBAC matrix (v49–v55) Squash of 94 commits spanning the v49 → v55 unified-stack rewrite. Full per-feature breakdown lives in CHANGELOG.md under [Unreleased]. Major buckets: * v49 schema — first-class user_groups + user_group_members + resource_grants; admin can CRUD groups and grants; Google Workspace nightly sync writes into the new tables. * v49 data_packages — admin-curated bundles of tables, RBAC-gated, first-class section on /catalog Browse + My Stack. * v49 memory_domains — row-backed (replaces hardcoded VALID_DOMAINS enum); admin can CRUD; grants follow the same shape as tables and packages. * v50 cover_image_url + admin sidebar collapsibles + per-row Mode tooltip + admin queue domain badges + admin "+ New Item" seed flow. * v51 lifecycle status (prod/poc/coming-soon/draft) + category + palette swatches on admin modals. * v52 per-table detail page /catalog/t/<id>. * v53 Recipes — admin-curated SQL templates as a second tab on /catalog with full Edit/Delete admin affordances. * v54 soft-delete (deleted_at) + Undo toast for packages, memory domains, and recipes; hard_delete() retained as an escape hatch. * v55 Recipes RBAC — ResourceType.RECIPE registered, inline Group Access matrix on Create + Edit Recipe modals (mirrors the Memory Domain pattern). * Activity Center per-resource filter (resource_prefix LIKE-anchored on audit_log.resource); admin nav g+letter keyboard shortcuts; loadAdminTablesLayout N+1 → single endpoint; /api/memory 30s page-level cache. * CI hardening — Keboola legacy tests guarded with pytest.importorskip; perf-smoke threshold widened to stop cold-cache flake. 5002 tests passing, 35 skipped. * feat(p2 backlog): Cmd-K palette + suggest-a-domain + nightly E2E + v55 schema 10-item P2 sweep on top of the unified-stack squash. 
New behaviour: * Cmd-K admin command palette (base.html) — fuzzy-search overlay over admin + user-facing routes. Arrows/Enter to navigate, Esc to close. * Stack-tabs digit shortcuts — 1/2/3 switch Browse / My Stack / Recipes on /catalog + /corporate-memory. * Friendlier non-admin empty state on /corporate-memory, plus a "Suggest a domain" CTA → POST /api/memory-domain-suggestions, admin queue with approve/reject. Backed by a new memory_domain_suggestions table (schema v55). * /admin/corporate-memory 7-tab strip grouped under Moderation / Catalog parent labels. * Bulk-assign table → package dropdown annotates each option with "(N of M tables already in)" so the existing distribution is visible before picking a target. * GET /api/memory + /tree accept is_required filter; admin status dropdowns route the "Required" sentinel onto it (status no longer holds 'mandatory' post-v49, so the old dropdown returned nothing). * chip-input.js is now opt-in per template via {% block extra_scripts %} instead of loaded globally on every page from base.html. * Edit-modal close helpers consolidated onto _closeEditModalById(); docs the per-source-type modal architecture decision. * New .github/workflows/e2e-nightly.yml runs agent-browser smoke scripts (scripts/e2e/smoke_.sh) against a docker-compose stack nightly at 04:30 UTC; failures open an agent-browser-nightly issue. 5012 tests passing, 35 skipped. fix(visual audit): 6 page regressions on memory + data-package surfaces agent-browser walkthrough of every memory + data-package page in the PR turned up 6 real bugs. Fixes: 1. Admin memory modals were dead. Duplicate `let _cmdNewDomainId` declarations from the deprecated step-2 RBAC stubs in admin_corporate_memory.html collided with the live state vars declared earlier in the same <script> → SyntaxError on parse → the entire second script block silently failed → every inline onclick= handler defined there (`+ New Memory Domain`, Edit, etc.) was a no-op. Removed the duplicate stubs. 2. 
/catalog/t/<table_id> + /catalog/r/<slug> rendered unstyled. Both templates injected their CSS via {% block head %} but base.html exposes {% block head_extra %} — wrong block name meant <style> rules never reached the rendered HTML. Renamed to head_extra. Hero card, section cards, dark SQL block, proper full-width inputs all now render as designed. 3. L49 leak — "MANDATORY" KPI label + "Make Mandatory" row buttons on /admin/corporate-memory still used the old word. Renamed to "Required" / "Mark as Required" so UI matches the data model (v49 split moved the Required tier onto the orthogonal is_required boolean; status no longer holds 'mandatory'). 4. Activity Center Resource dropdown didn't know the v55 `memory_domain_suggestion:` namespace — added it. 5. Tab strip on /admin/corporate-memory wrapped text 2× per button on narrow viewports after the L50 MODERATION/CATALOG group labels pushed total width past most viewports. Switched the strip to flex-wrap:nowrap + overflow-x:auto with white-space:nowrap + flex-shrink:0 on every direct child so the tabs stay one row and slide horizontally when they overflow. 5012 tests passing, 35 skipped. * rebase-cleanup: align with main's 0.54.25-27 API design + comment fix Three follow-on fixes after rebasing onto origin/main (0.54.27): * admin_tables.html: dropped a stray nested ``{% if data_source_type == 'keboola' %}`` around ``prefillFromKeboolaTable`` (main never had it; the outer Phase F2 guard already covers it) and reworded a JS comment that contained literal ``{% %}`` tokens which Jinja was parsing as a real tag → unbalanced if/endif → 30 template render failures across the suite. * /api/stack/subscription/{type}/{id}: DELETE now returns 204 instead of 200 per the 0.54.26 design rules. CLI client + parity tests updated to accept 2xx / assert 204. 
* Memory-domain suggestion approve/reject paths added to ``_VERB_PATH_ALLOWLIST`` — they are pending → approved/rejected state-machine transitions (approve also creates the real memory_domains row as a side effect), so the RPC shape is intentional rather than a missed PATCH refactor. 5035 tests passing, 35 skipped. * fix(catalog_table_detail): real polish pass — hero glyph, dedup pills, rows/size meta, scoped sync CTA The previous fix only corrected the block-name typo so the existing CSS rendered. The actual layout was still wireframe-tier on close inspection: * No cover glyph in the hero (a flat white card with title + meta line); data-package + memory-domain detail pages both have a colored icon square. Restored parity — table.icon emoji if set, otherwise initials on a colored square using table.color. * "INTERNAL" pill rendered twice for agnes_audit etc. — the mode pill and the source-type pill happened to be identical strings. Now skips the source pill when it matches the mode (`internal == internal`). * Bucket / source_table code chip showed `Agnes Internal.audit_log` for internal rows — meaningless to a user. Hidden when source_type is internal. * `pairs_well_with` admin input was a comma-separated `<input>` always visible. Wrapped all 4 sections in an Edit-on-demand toggle: read-only display by default, "+ Add" / "Edit" button on the right edge of each section header reveals the inline form, Cancel hides it. * "Trigger sync now" was a cramped link squashed into the empty-state flex row (visible as `Tr…` overflow before). Promoted to a proper btn-primary button under the empty-state copy. Hidden entirely for internal tables (which are server-managed — no upstream to pull). * Hero meta now surfaces row count + payload size (when sync_state has them) + last sync timestamp on a single line — was missing from the original. 
* Mode pills colored by tier (local=green, remote=amber, materialized=blue, internal=gray) so the basic fact about a table reads at a glance, not from ALL-CAPS text alone. * tests(v56): TDD baseline for extended data-packages content + per-table docs 68 failing tests across 8 files spec the v56 surface before any implementation lands: * test_schema_v55_to_v56_migration.py — schema bump, additive ALTERs on data_packages + table_registry, idempotency, sequential-upgrade preservation * test_data_packages_repo_v56.py — repo create/update/get/list for owner_name, owner_team, tags, long_description, when_to_use, when_not_to_use, example_questions (JSON list round-trip, empty defaults, partial-update preservation) * test_table_registry_v56_docs.py — update_docs for grain, platforms, partition_col, history, gotchas; preserves v52 docs columns * test_api_data_packages_v56.py — PUT/POST/GET for all new fields, field-level validation (tag count, bullet length, description size), virtual badge derivation (curated/new) * test_api_registry_docs_v56.py — PATCH /api/admin/registry/{id}/docs for v56 fields, validation, RBAC unchanged * test_web_catalog_package_detail_v56.py — /catalog/p/<slug> rewrite asserts on rendered owner line, tag pills, badges, What it is, Use it when, Skip it when, Example questions, per-table extended detail in collapsible row, key-gotcha distinctness, admin-only Edit * test_web_stack_card_v56_metadata.py — Browse-grid card additions (owner chip, tag chips, badges) without breaking back-compat for rows missing the new fields * test_data_packages_no_vendor_content.py — CI guard: scans app/ + src/ + cli/ + config/ + scripts/ for Groupon-specific tokens from the colleague's spec MD; fails if any leak into OSS surfaces * test_db_schema_version.py — bumped 55 → 56 with rationale Plus updates schema-version assertion to 56. Implementation lands in subsequent commits (schema migration → repo → API → templates). 
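The migration tests above pin down additive, idempotent ALTERs that preserve existing rows. The schema commit uses `ADD COLUMN IF NOT EXISTS`; the sketch below swaps in Python's built-in `sqlite3`, and because SQLite has no `IF NOT EXISTS` clause for columns it checks `PRAGMA table_info` first. `add_columns_if_missing` is a hypothetical helper, only a subset of the v56 `data_packages` columns is shown, and the `TEXT` types are assumptions (the source only says the list columns are JSON-as-VARCHAR).

```python
import sqlite3
from typing import Iterable, List, Tuple


def add_columns_if_missing(conn: sqlite3.Connection, table: str,
                           columns: Iterable[Tuple[str, str]]) -> List[str]:
    """Hypothetical additive, re-runnable step: add only columns not yet present.

    `table` and column names are interpolated into SQL, so they must be trusted
    constants from the migration code, never user input.
    """
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added: List[str] = []
    for name, sql_type in columns:
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
            added.append(name)
    return added


V56_DATA_PACKAGE_COLUMNS = [  # subset, for illustration
    ("owner_name", "TEXT"),
    ("owner_team", "TEXT"),
    ("tags", "TEXT"),               # JSON list stored as text
    ("example_questions", "TEXT"),  # JSON list stored as text
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_packages (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO data_packages VALUES ('p1', 'Sales')")
print(add_columns_if_missing(conn, "data_packages", V56_DATA_PACKAGE_COLUMNS))  # all four names
print(add_columns_if_missing(conn, "data_packages", V56_DATA_PACKAGE_COLUMNS))  # [] on re-run
```

Existing rows keep their values and read `NULL` in the new columns, which is what lets a repo decoder map missing list columns to empty defaults.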
* feat(v56): schema + repo for extended data-packages content Schema additions (ALTER ADD COLUMN IF NOT EXISTS — additive + idempotent): * data_packages: owner_name, owner_team, tags, long_description, when_to_use, when_not_to_use, example_questions (JSON-as-VARCHAR for the lists) * table_registry: grain, platforms, partition_col, history, gotchas (extends the v52 sample_questions / things_to_know / pairs_well_with docs surface with structured per-table content) Repo extensions: * DataPackagesRepository.create + update accept the new fields with the same Optional-is-no-op contract as v51 (pass an empty list to clear a JSON column) * _decode_row decodes the new JSON-list columns to Python lists; NULL rounds back to [] so callers don't branch * TableRegistryRepository.update_docs grew the v56 fields alongside the existing v52 ones — single PATCH can write either tier atomically * TableRegistryRepository._decode_row picks up platforms + gotchas in the same NULL-tolerant decoder 22 repo + migration tests passing. API + UI land in subsequent commits. * feat(v56): API surface for extended data-packages + per-table docs CreateDataPackageRequest + UpdateDataPackageRequest grew the v56 fields (owner_name, owner_team, tags, long_description, when_to_use, when_not_to_use, example_questions) with per-field validators that match the Foundry spec checklist: * tags: ≤8 entries × ≤30 chars * long_description: ≤4000 chars * use/skip: ≤8 bullets × ≤200 chars * example_questions: ≤12 × ≤200 chars _serialize emits all v56 fields plus a virtual ``badges`` list derived server-side at render time (no DB column needed): "curated" when the creator is in the Admin group, "new" within 30 days of created_at. Backdating created_at or admin-status changes pick up automatically. PATCH /api/admin/registry/{id}/docs extended with v56 structured per-table fields (grain, platforms, partition_col, history, gotchas). 
gotchas: list of {key: bool, body: str} Pydantic models with the same ≤8 cap; first key=true entry becomes the Key gotcha on the rendered package detail page. PATCH echoes the fresh state so callers can re-render without a second GET. 26 API tests passing (16 data-packages + 10 registry-docs). * feat(v56): /catalog/p/<slug> rewrite + Browse-grid card augmentation The third (and final) v56 commit lights up the UI surfaces backed by the schema + API commits earlier in this PR: * /catalog/p/<slug> template rebuilt around the Foundry spec's section ladder — hero (icon + name + badges + owner + tags + description + meta + Add-to-stack), "What it is" markdown body, paired "Use it when / Skip it when" panels, "Tables in this package" with collapsible per-table extended detail (grain / platforms / partition_col / history / gotchas + sample questions), and an "Example questions you can ask Claude" prompt panel. Each section guarded by ``{% if pkg.<field> %}`` — empty content fields hide the section entirely (no "No X yet" placeholder noise on the public-facing drilldown). * router catalog_package_detail hydrates per-table v56 fields onto the tables list + derives the virtual badges (curated / new) server-side from creator-in-Admin + 30-day created_at. * StackResolver.ResourceEntry grew owner_name / owner_team / tags / badges; _fetch_entries pulls the v56 columns + computes badges once per fetch using a single Admin-group SELECT. * _data_package_entry_dict adapter passes the new fields through to the macro; tags are merged source-type pills + admin-authored category tags per the spec convention. * _stack_card.html renders the v56 badges (top-left, data-badge= hooks) + the owner chip (data-card-owner hook) without breaking back-compat — pre-v56 rows render unchanged. * Admin PUT handler strips the v56 docs fields from the read-modify-write merged dict so register() doesn't blow up with the now-larger row shape (same pattern as the v52 docs fields stripping). 
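The virtual `badges` list described above is computed at render time rather than stored. A minimal sketch of that rule, assuming the creator is identified by a user id, `created_at` is a timezone-aware datetime, and the Admin-group member ids were fetched once per request (in the spirit of the single Admin-group SELECT per fetch); `derive_badges` is a hypothetical name, and treating exactly 30 days as still "new" is an assumption:

```python
from datetime import datetime, timedelta, timezone
from typing import AbstractSet, List, Optional

NEW_WINDOW = timedelta(days=30)


def derive_badges(created_by: Optional[str], created_at: Optional[datetime],
                  admin_user_ids: AbstractSet[str],
                  now: Optional[datetime] = None) -> List[str]:
    """Hypothetical sketch: 'curated' if an Admin created it, 'new' if recent."""
    now = now or datetime.now(timezone.utc)
    badges: List[str] = []
    # Checked against the current Admin set, so a later admin-status change
    # is picked up without touching the package row.
    if created_by is not None and created_by in admin_user_ids:
        badges.append("curated")
    # Assumes created_at is timezone-aware; mixing naive and aware raises TypeError.
    if created_at is not None and now - created_at <= NEW_WINDOW:
        badges.append("new")
    return badges


admins = {"u-admin"}
now = datetime(2026, 5, 21, tzinfo=timezone.utc)
print(derive_badges("u-admin", now - timedelta(days=3), admins, now=now))     # ['curated', 'new']
print(derive_badges("u-analyst", now - timedelta(days=45), admins, now=now))  # []
```

Because nothing is persisted, backdating `created_at` or changing Admin membership changes the badges on the next render with no migration.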
5115 tests passing (+98 v56 + 18 fixed regressions from the merged-register PUT path), 35 skipped. * fix(rbac): Edit-on-package + Group-access 'required' persistence + CI vendor guard Three related bugs reported on the merged-with-main branch: 1. Clicking Edit on a Data Package card landed on /admin/tables with a `#<pkg.id>` hash that nothing listened to — the admin saw the global table listing, not the editor for that specific package. Added a `?edit_package=<pkg_id>` query-param handler in admin_tables.html (analogous to the existing `?edit=<table_id>` and `?assign_to=<pkg_id>` patterns) that calls openEditDataPackageModal on DOMContentLoaded after a 250ms layout settle. Updated the package-detail Edit link to use the new query param. 2. Setting Group Access to 'required' didn't persist — re-opening the modal showed 'available'. Root cause was the v49 ``resource_grants.requirement`` enum existing in the DB but the POST /api/admin/grants endpoint not surfacing it: ``CreateGrantRequest`` declared only group_id + resource_type + resource_id, so Pydantic silently dropped the matrix's ``requirement: 'required'`` payload and the new row landed at the DB column default ('available'). Plumbed ``requirement`` through ``CreateGrantRequest`` → ``ResourceGrantsRepository.create`` so the value persists in one round-trip. Also fixed a UNIQUE-constraint race in the matrix diff-apply: DELETE-old + POST-new ran in parallel via ``Promise.allSettled``, so POST could fire first and trip the unique check before DELETE freed the slot. Switched to sequential (await all deletes; then await all writes) across all three matrices (Edit Data Package, Edit Memory Domain, Edit Recipe). 3. CI vendor-content guard ``test_no_groupon_specific_strings_in_oss`` tripped on two of my own docstrings: a "Foundry Data team" mention in two src/db.py comments + an ``s1_session_landings`` example in cli/skills/agnes-table-registration.md. 
Rephrased the comments to "extended-descriptions admin spec" and replaced the example with a generic ``events_daily`` table name. 5164 tests passing, 35 skipped (+4 regression tests pinning the POST /api/admin/grants requirement contract). Vendor guard back to green. * fix(catalog): admin Browse path drops v58 card fields The /catalog and /memory admin god-mode branch built ResourceEntry instances inline from pkg_repo.list() / domains_repo.list() and skipped owner_name, owner_team, tags, and derived badges (curated/new). Visible symptom: a package with an owner + tags rendered with the v56 chrome for non-admin viewers but as a bare card for admins. Adds StackResolver.browse_admin(user_id, resource_type) — admin god-mode Browse that walks the full table but routes through the same _fetch_entries enrichment pass as browse(), so admin + non-admin Browse stay visually consistent. Both /catalog and /corporate-memory routes switch to it. Regression test in tests/test_stack_resolver_browse_admin.py covers: owner/tags propagation, new/curated badge derivation, in_stack from admin subscriptions, all-packages-regardless-of-grants, and the ValueError for unsupported resource types. * fix(catalog): three /catalog tab-strip UX bugs 1. Required Remove → red toast browse_admin passed empty required_ids to _fetch_entries, so the admin's own required grants surfaced as 'available' and the macro rendered an actionable Remove button that POST /unsubscribe 400'd on. Now derives required_ids from the admin's own groups so Required packages render with the disabled "In stack (required)" button. Regression test in test_stack_resolver_browse_admin.py. 2. Remove green-toasts but card stays until refresh The My-Stack empty-state placeholder was only emitted server-side when stack_entries was empty at render time. Removing the last card left the tab completely blank — users read that as "Remove didn't work, let me refresh". 
Both grid + empty-state are now always rendered with one of them initially hidden; the JS swaps visibility on add/remove instead of injecting DOM. Same fix in /corporate-memory. 3. "What are Recipes?" + ambiguous (admin) suffix Recipes tab now carries its own curator-block explainer (the shared one was moved inside the Browse view so it doesn't bleed across tabs). The grey "(admin)" suffix becomes a yellow .admin-only-hint chip with a title tooltip — the visibility hint is now unambiguous: yellow chip = "only you see this", non-admins don't see the affordance at all. * schema: renumber v51..v58 → v52..v59 to make room for main's v51 Main 0.54.29 introduced a NEW v51 (table_registry.bq_fqn — issue #343) that releases ahead of this branch. The unified-stack chain v51..v58 shifts up by one so main's v51 stays as the released schema and ours become v52..v59. Function names, internal version bumps, dispatch ladder thresholds, and the migration-test references all move together. Subsequent merge with main lands the bq_fqn column at the freed v51 slot. * fix(seed): seed admin lands in BOTH Admin AND Everyone groups The LOCAL_DEV_MODE / SEED_ADMIN_EMAIL bootstrap only added the seed user to Admin. Everyone-scoped grants — the canonical "every-user-sees-this" pattern for Required onboarding — didn't surface for the seed admin's own /catalog because they weren't in Everyone. Symptom: an admin grants a Required-tier package to Everyone, then sees it on /catalog still rendered with an "Add to stack" button (because the admin's resolved required_ids was empty for that package). The dual-membership keeps Admin (authorization) and Everyone (default-grant target) intentionally separate per the design comment on UserRepository.create — every membership remains traceable to a concrete row, just now with a system_seed row in Everyone too. 
Both INSERTs go through UserGroupMembersRepository.add_member which is idempotent on (user_id, group_id), so re-fires on every lifespan startup don't duplicate rows. Regression test in test_main_seed_admin_everyone.py. * style: unify admin-only hints across marketplace + memory detail pages Replaces three stale ``(admin)`` parentheticals with the same yellow ``admin-only`` chip introduced for /catalog tab actions. Same tooltip copy ("Visible only to admins — analysts won't see this …") so the visibility hint is unmistakable wherever it appears: - Hard delete on marketplace_plugin_detail (admin-only destructive action — same gating as the original suffix conveyed). - Hard delete on marketplace_item_detail (same). - Edit link on memory_domain_detail (title-attr only before; now a visible chip too). Non-admin viewers never saw these affordances — the gates are unchanged. Pure styling pass for consistency. * fix(catalog): exclude soft-deleted data packages + memory domains from Browse ``StackResolver._fetch_entries`` and ``browse_admin`` were querying data_packages / memory_domains without a ``deleted_at IS NULL`` guard. A package soft-deleted via /admin/* (v54 soft-delete contract) stayed visible on /catalog and /memory until either an Undo or a hard delete — directly contradicting the soft-delete UX which is supposed to remove the affordance immediately and only retain the row for the Undo window. The repository accessors (DataPackagesRepository.list, MemoryDomainsRepository.list, list_packages_of_table, etc.) already filter deleted rows; this commit brings the resolver's direct SQL in line with that contract. Regression test in test_stack_resolver_browse_admin.py. * fix(catalog): Add/Remove updates full card chrome, not just button The previous _applyStackChange flipped only the footer button label — the card border (.is-in-stack class), top-right "In stack" badge, and button color class (--add / --remove) stayed at their server-rendered state. 
After Add the user saw the button checkmark but the rest of the card still looked like "available, not in stack". They read this as "the change didn't take — let me refresh". This commit makes the optimistic update mirror what the server-side macro renders for the new state: * ``c.classList.toggle('is-in-stack', becameInStack)`` — flips the border + visual state class. * Top-right ``.stack-card__req-badge--instack`` badge is injected on Add, removed on Remove (skipped when ``data-requirement='required'`` — that slot is owned by the Required badge). * Button text is "Remove" / "+ Add to stack" matching the macro (was "✓ In stack" which was visually nice but inconsistent). * Button color class --add / --remove swaps so the destructive Remove tint kicks in immediately. The clone-into-My-Stack path applies the same updates so the new card in My Stack reads identically to a server-rendered in_stack card. Mirrored in /corporate-memory. * fix(memory): four Devin-review bugs on /memory drill-down + manifest PR #333 Devin review surfaced four real bugs that ship a broken /memory experience even though the unit tests passed. 1. Manifest md5 omits is_required + content (app/api/sync.py:836-840) _build_memory_domains_section hashed only (id\|title\|status) per item. _build_per_domain_markdown routes items between "## Required" and "## Approved" by is_required and embeds full content — so an admin edit of either dimension left the manifest md5 unchanged, `agnes pull` skipped the re-fetch, and the analyst kept a stale bundle.md. Now both fields participate in the hash. 2. required_count always 0 (src/repositories/memory_domains.py) list_items_of_domain only SELECTed (id, title, status) so the `it.get("is_required")` in the manifest builder always evaluated to None → required_count = 0 regardless of actual state. The manifest builder advertised a count it could never compute. Now projects is_required + content too (required by fix 1 anyway). 3. 
Vote URL 404 (memory_domain_detail.html:289-290) Constructed `/api/memory/items/{id}/vote` but the route is `/api/memory/{id}/vote`. Every upvote/downvote button was a silent no-op. 4. Dismiss/undismiss URL + method both wrong (memory_domain_detail.html:296-305) Constructed `/api/memory/items/{id}/dismiss` (extra /items/) and /undismiss (no such route — undismiss is DELETE on /dismiss). Both buttons silently 404'd. Now POST + DELETE on `/api/memory/{id}/dismiss` per app/api/memory.py:635/675. * fix: multi-agent reviewer findings — vendor-token scrubs + manifest md5 predicate + soft-delete filter Three reviewer findings from the multi-agent review on PR #333, fixed in-place per CLAUDE.md issue-economy rule. Reviewer-rules (Important — vendor-agnostic OSS): - app/main.py:218 comment: replaced 'foundryai-prod' with generic 'a customer prod instance' phrasing. Public OSS repo must not carry customer-specific tokens (CLAUDE.md § Project conventions). - tests/test_table_registry_v56_docs.py:70 fixture string: replaced "user_brand_affiliation = 'groupon'" with 'acme' on the same rule. Reviewer-architecture (closes still-unresolved Devin 🚩 ANALYSIS): - app/api/sync.py _build_memory_domains_section: md5 hash loop now filters items to the SAME predicate the bundle renderer uses (is_required OR status='approved'). Pre-fix the hash iterated ALL items but _build_per_domain_markdown only rendered the union of required items + approved-non-required items — so an admin edit to a pending/rejected non-required item flipped the md5 against an identical-bytes bundle, triggering a wasteful re-fetch on every analyst's next 'agnes pull'. The earlier commit fixed the hash-input fields (is_required + content); this closes the set-of-items asymmetry Devin separately flagged. 
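A minimal sketch of the two manifest fixes taken together, assuming memory items are dicts carrying the fields named above. `is_rendered`, `manifest_md5`, and the JSON serialization are hypothetical and not the code in `app/api/sync.py`. The digest covers only items the renderer would emit (required OR approved) and includes `is_required` and `content`, so editing a pending non-required item leaves it alone while flipping the required flag changes it.

```python
import hashlib
import json


def is_rendered(item: dict) -> bool:
    # Same predicate as the bundle renderer: required items plus
    # approved non-required items.
    return bool(item["is_required"]) or item["status"] == "approved"


def manifest_md5(items: list) -> str:
    # Hash only the items the bundle shows, and every field that changes
    # the rendered bytes (section routing by is_required, embedded content).
    digest = hashlib.md5()
    for item in sorted(filter(is_rendered, items), key=lambda i: i["id"]):
        payload = [item["id"], item["title"], item["status"],
                   bool(item["is_required"]), item["content"]]
        # JSON arrays are self-delimiting, so concatenation stays unambiguous.
        digest.update(json.dumps(payload).encode("utf-8"))
    return digest.hexdigest()


items = [
    {"id": "a", "title": "Fiscal year", "status": "approved",
     "is_required": False, "content": "FY starts in February."},
    {"id": "b", "title": "Draft", "status": "pending",
     "is_required": False, "content": "work in progress"},
]
base = manifest_md5(items)

# Editing a pending, non-required item leaves the bundle bytes unchanged...
pending_edit = [items[0], dict(items[1], content="work in progress v2")]
# ...but flipping is_required moves an item between sections, so it must count.
required_flip = [dict(items[0], is_required=True), items[1]]

print(manifest_md5(pending_edit) == base)   # True
print(manifest_md5(required_flip) != base)  # True
```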
Reviewer-RBAC (minor cleanup): - app/resource_types.py _data_package_blocks and _memory_domain_blocks now filter 'WHERE deleted_at IS NULL' (v54 soft-delete column) so the /admin/access UI doesn't surface soft-deleted entities as grantable. Mirrors the existing filter on _recipe_blocks. No security leak pre-fix (resolver double-filters and re-checks at serve time), just UI cleanliness. - app/services/stack_resolver.py add_to_stack: docstring note added explaining that authorization is enforced at the API layer (app/api/stack.py can_access gate), not at the resolver. The initial review suggested adding a defensive 403 here, but that broke 5 existing tests that legitimately call add_to_stack directly without setting up grants first; the docstring captures the contract instead. stack() already intersects subscriptions with current available_ids on every read, so a 'zombie' row from a misuse never leaks into the user-facing manifest. * release: 0.55.0 — unified Browse + My Stack (Data Packages + Memory), schema v48→v59, 3 BREAKING	2026-05-19 15:00:15 +02:00
ZdenekSrotyr	c3e82972c8	feat(bq): decouple table_registry bucket from BQ dataset name (#343) (#346) * feat(bq): decouple table_registry bucket from BQ dataset name (#343) Adds optional `bq_fqn` column (schema v51) carrying the fully-qualified BigQuery path (project.dataset.table) so the rebuild path no longer has to reconstruct it from the dual-purpose `bucket` field (which is also a UX/RBAC label). - Schema v51 migration + _SYSTEM_SCHEMA carry the nullable column; rows without it keep using the legacy bucket + source_table + remote_attach.project path (backwards compat). - BQ extractor honors bq_fqn per row when present: dataset/table override on same-project rows; cross-project VIEW path works via bigquery_query(billing, ...); cross-project BASE TABLE skipped with a clear warning (multi-ATTACH per project deferred to follow-up). - Orchestrator pre-pass detects drift between extract.duckdb _remote_attach.url and overlay data_source.bigquery.project, and calls rebuild_from_registry to regenerate the extract when they differ. Closes the operational hazard where /admin/server-config edits silently left the on-disk extract pointing at the old project until the next manual sync. - Startup config check warns when project ≠ billing_project and no location is set (the on-disk symptom is a silent "provider returned no data" in the metadata cache), and when a warehouse-like data project has no billing_project override (silent 403 on the serviceusage path). - _resolve_bq_location warning now points at the location config key explicitly so operators see the actionable fix in the log. - POST /api/admin/register-table and PUT /api/admin/registry/{id} accept bq_fqn; malformed values are rejected at the API boundary (422). - 25 tests covering the parse_bq_fqn matrix, extractor override paths (same-project + cross-project VIEW + cross-project BASE TABLE skip), orchestrator drift sync, startup-validator heuristic, and admin models.
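`parse_bq_fqn` itself is not shown in this log, so the following is a hypothetical `split_bq_fqn` sketch of the parse-and-reject behavior described above. The identifier rules are simplified assumptions (real BigQuery project IDs, such as domain-scoped ones, can be more varied); a handler would translate the `ValueError` into the 422 response.

```python
import re

# Simplified rules (an assumption, not BigQuery's full identifier grammar):
# project IDs are 6-30 chars of lowercase letters, digits and hyphens,
# starting with a letter and not ending with a hyphen; dataset and table
# names use letters, digits and underscores.
_BQ_FQN = re.compile(
    r"(?P<project>[a-z][a-z0-9-]{4,28}[a-z0-9])"
    r"\.(?P<dataset>[A-Za-z0-9_]+)"
    r"\.(?P<table>[A-Za-z0-9_]+)"
)


def split_bq_fqn(value: str) -> tuple:
    """Return (project, dataset, table) or raise ValueError.

    An API handler would map the ValueError to a 422 response.
    """
    match = _BQ_FQN.fullmatch(value)
    if match is None:
        raise ValueError(f"malformed bq_fqn: {value!r}")
    return match["project"], match["dataset"], match["table"]


print(split_bq_fqn("my-warehouse-project.sales.orders"))
# ('my-warehouse-project', 'sales', 'orders')
```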
UI surface for bq_fqn input in /admin/tables intentionally omitted from this PR (3.5k-line template change) — admins can register through the REST API or `agnes admin` CLI in the meantime. Multi-project ATTACH support is the same scope deferral as the cross-project BASE TABLE skip; both ride a follow-up PR. * review fixes: abstract CHANGELOG, merge duplicate Changed, bump docs schema version - CHANGELOG.md: remove customer-specific hostname + incident date range from the orchestrator drift-sync entry (vendor-agnostic OSS rule), fold the entry into the existing [Unreleased] ### Changed section instead of opening a duplicate heading. - docs/architecture.md: bump 'Current schema version' from 19 to 51 to match SCHEMA_VERSION (per agnes-orchestrator skill rule #4). * review fixes: vendor-agnostic test fixture + Schema v51 internal bullet - tests/test_bq_fqn.py: replace customer GCP project ID with generic 'my-warehouse-project' placeholder (vendor-agnostic OSS rule). Test asserts on the warehouse-like heuristic, not the literal project name, so the rename is behavior-neutral. - CHANGELOG.md: add explicit `Schema v51` bullet under `### Internal` naming the new version + summarizing the additive nullable column (matches the convention from v47/v48 bullets). * fix(bq): cross-project _detect_table_type bills against extractor project Addresses Devin review on #346 — pre-fix _detect_table_type passed the data project as BOTH the FROM-clause target AND the bigquery_query() first arg (billing project). For cross-project bq_fqn rows where fqn_project != project_id, the data SA holds bigquery.dataViewer on fqn_project but the serviceusage.services.use permission only on project_id, so the call 403'd. init_extract's broad except Exception swallowed the error and silently skipped the row, meaning the cross-project VIEW path at extractor.py:~696 — the PR's primary cross-project use case — never executed.
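This log does not show the query `_detect_table_type` actually runs. The sketch below assumes an `INFORMATION_SCHEMA.TABLES` probe issued through DuckDB's `bigquery_query()`, whose first argument is the billed project; `detect_table_type_sql` is a hypothetical name. It shows the split at issue: the FROM target stays on the data project while the billing argument is routed separately, defaulting to the data project for same-project callers.

```python
from typing import Optional


def _sql_literal(text: str) -> str:
    # Quote a value as a SQL string literal, doubling embedded single quotes.
    return "'" + text.replace("'", "''") + "'"


def detect_table_type_sql(project: str, dataset: str, table: str,
                          billing_project: Optional[str] = None) -> str:
    """Build a hypothetical table-type probe for bigquery_query().

    The FROM clause names the data project; the first bigquery_query()
    argument is the project billed for the job.
    """
    # Defaulting to the data project keeps same-project callers unchanged.
    billing = project if billing_project is None else billing_project
    inner = (
        f"SELECT table_type FROM `{project}.{dataset}.INFORMATION_SCHEMA.TABLES` "
        f"WHERE table_name = {_sql_literal(table)}"
    )
    return f"SELECT * FROM bigquery_query({_sql_literal(billing)}, {_sql_literal(inner)})"


# Cross-project row: read from the data project, bill the extractor's project.
print(detect_table_type_sql("data-proj", "sales", "orders",
                            billing_project="extractor-proj"))
```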
- Add optional billing_project kwarg to _detect_table_type; defaults to project for backwards compat (same-project callers unaffected). - Update the init_extract call site to pass billing_project=project_id explicitly. Same-project rows (fqn_project == project_id) are a no-op; cross-project rows now route billing to the project where the SA actually has services.use. - 2 new tests in TestDetectTableTypeBilling cover (a) explicit billing_project routing to bigquery_query 1st arg + data project staying in FROM, and (b) the backwards-compat default. Plus test_cross_project_detect_call_bills_against_extractor_project pins the call-site wiring — captures the (project, billing_project) pair the extractor passes for a cross-project bq_fqn row. * release: 0.54.29 — bq_fqn decoupling + marketplace refactor + setup-script UX Accumulated [Unreleased] content from #342 (flea marketplace refactor), #344 (setup script step-2 cwd check), and #346 (this PR — bq_fqn column + orchestrator drift sync + startup config check). Schema v51.	2026-05-19 11:17:32 +00:00
Vojtech	c552bf8243	feat(api): enforce API design rules via pytest + fix DELETE/status-code violations (#338) * feat(api): enforce API design rules via pytest + fix DELETE/status-code violations Adds tests/test_api_design_rules.py with four forward-only design guardrails that prevent new endpoints from accumulating REST debt: Rule 1 — No new verbs in URL paths (existing 28 grandfathered via allowlist) Rule 2 — DELETE must declare 204 No Content (zero allowlist entries) Rule 3 — Creator POSTs (path has GET counterpart) must declare 201/202 Rule 4 — All protected /api/* routes must declare 401 and 403 Fixes found by running the rules: - DELETE /api/admin/metrics/{metric_id}: return 204, drop redundant body - DELETE /api/memory/{item_id}/dismiss (undismiss): return 204, drop body - POST /api/memory/admin/contradictions: add status_code=201 (creates a resource) - app/main.py: _add_auth_error_responses() injected into app.openapi() at startup; declares 401/403 on all protected /api/* operations centrally, fixing the 120 routes that previously omitted these response codes from the spec. Closes #337 * fix(api): resolve CI failures — extend 204 fixes + complete allowlists - Fix remaining 6 DELETE endpoints to return 204: store entities, store entity install, marketplace curated install, marketplace plugin system flag, admin store submission, and observability view - Update all affected tests to expect 204 (removed body assertions) - Add 4 missing verb paths to _VERB_PATH_ALLOWLIST in test_api_design_rules.py - Add 2 upsert endpoints to _CREATOR_POST_ALLOWLIST - Update admin_marketplaces.html to not call r.json() on 204 DELETE * fix(tests): align 2 DELETE-asserting tests with 204 contract (post-#339 rebase) CI's test-shard (1) and (4) failures on this PR were caused by Vojta's second commit (`fix(api): resolve CI failures — extend 204 fixes`) flipping more DELETE endpoints to status_code=204 than just the two mentioned in the PR body.
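The first two design guardrails listed above (no verbs in URL paths, DELETE declares 204) can be sketched in plain Python over a route table. `Route`, `VERB_SEGMENTS`, `VERB_PATH_ALLOWLIST`, and `rule_violations` are hypothetical; a FastAPI-based test would more likely walk the app's registered routes (an assumption), and rules 3 and 4 are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    method: str
    path: str
    status_code: int


# Hypothetical vocabulary and allowlist; the real test's lists may differ.
VERB_SEGMENTS = {"create", "update", "delete", "run", "install"}
VERB_PATH_ALLOWLIST = {"/api/legacy/run"}  # grandfathered paths


def rule_violations(routes: list) -> list:
    found = []
    for route in routes:
        # Path parameters like {item_id} are not words, so skip them.
        words = [s for s in route.path.split("/") if s and not s.startswith("{")]
        # Rule 1: no new verbs in URL paths.
        if route.path not in VERB_PATH_ALLOWLIST and any(w in VERB_SEGMENTS for w in words):
            found.append((route.method, route.path, "verb in path"))
        # Rule 2: DELETE must declare 204 No Content.
        if route.method == "DELETE" and route.status_code != 204:
            found.append((route.method, route.path, "DELETE must declare 204"))
    return found


routes = [
    Route("DELETE", "/api/admin/metrics/{metric_id}", 200),
    Route("POST", "/api/legacy/run", 200),
    Route("POST", "/api/reports/create", 201),
    Route("DELETE", "/api/memory/{item_id}/dismiss", 204),
]
print(rule_violations(routes))
```

Keeping the allowlist as an explicit set is what makes the rules forward-only: existing offenders pass, and any new path has to either comply or be added to the list in review.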
Two tests assert status_code==200 on the DELETE response and broke: - tests/test_admin_store_submissions.py::TestQuarantineGates::test_admin_can_delete_quarantined (DELETE /api/store/entities/{entity_id}) - tests/test_store_api.py::TestInstallCycle::test_admin_hard_delete_cascades_installs (DELETE /api/store/entities/{entity_id}?hard=true) Updated both to assert 204 with a comment pointing at tests/test_api_design_rules.py rule 2 so future reviewers can trace the contract. Verified via broader scan that no other test asserts == 200 on a .delete() response directly (4 other sites do .delete() then check 200 on a subsequent GET — those are fine). * release: 0.54.26 — API design rules (test_api_design_rules.py) + 8 DELETE endpoints flip to 204 --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-18 15:25:07 +02:00
Vojtech	9eaa1dc53c	fix(store): rescan promotes non-current submission when guardrails off (Codex follow-up to #330) (#331) * fix(store): rescan promotes non-current submission when guardrails off Codex adversarial-review follow-up on PR #330: admin rescan with `guardrails.enabled: false` flipped submission status to `approved` and entity visibility to `approved` but never called `promote_to_version`. A rescan that re-approved a non-current v2+ left the entity stuck at the prior version even though the operator's intent in clicking rescan was to publish the rescanned bytes. The fix mirrors the inline-promote pattern in create / update / restore. The guardrails-on path is unchanged — it schedules an LLM review and promotion lands via `runner.run_llm_review` on approval. Adds tests for the byte-identical edge cases Codex flagged as under-covered by PR #330: - TestPromoteLookupByByteIdenticalBundles::test_byte_identical_v3_after_different_v2 - TestOverrideForwardOnly::test_override_byte_identical_v2_blocked_promotes_correctly - TestRescanPromotesNonCurrent::test_rescan_promotes_non_current_v2_when_guardrails_disabled * release: 0.54.23 — rescan promotes non-current submission when guardrails off (Codex follow-up to #330) --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-16 07:04:28 +02:00
Vojtech	78cd243e65	fix(store): promote-on-approve looks up version_no by submission_id (live agnes-development bug) (#330) * fix(store): promote-on-approve looks up version_no by submission_id Live bug observed on agnes-development: an entity had 5+ version_history rows sharing the same `hash` (user re-uploaded byte-identical bundles as v2/v4/v6 of the same skill — the LLM and inline checks happily approved each one). The runner's promote-on-approve path looked up the submission's version_no by hash: `for entry in entity.version_history: if entry["hash"] == sub_hash: target = int(entry["n"]); break` The loop matched the FIRST hash collision — always v1, n=1. With current=1, the forward-only `target > current` guard then skipped the promote, leaving the entity stuck at v1 even though the new submission's status flipped to `approved`. UI kept showing v1 as "current". Fix: look up by submission_id via the existing `_version_no_for_submission` helper (already used by retry / rescan / download paths). Same lookup applied in `admin_override_store_submission` which had the identical hash-match loop. Test: TestPromoteLookupByByteIdenticalBundles uploads v1 + a byte-identical v2, drives the LLM with mock-approve, asserts entity.version_no advances to 2. * fix: bundle #329 reviewer-Important follow-ups + post-merge polish Bundled with Vojtech's commit ahead of this (the promote-on-approve `version_no` lookup-by-submission_id fix) since #330 is the next release-cut PR and the four #329 follow-ups would otherwise need a standalone release-cut PR — prohibited by docs/RELEASING.md § "Release-cut belongs to the PR". Fixed: - src/usage_ask.py — SCHEMA_DIGEST + SYSTEM_PROMPT referenced the dropped `usage_plugin_daily` table. The admin `POST /api/admin/telemetry/ask` endpoint ships SYSTEM_PROMPT to the LLM, so any model-emitted SQL against `usage_plugin_daily` would fail with a DuckDB binder error post-#329 merge.
Updated to describe the new v48 rollups (`usage_marketplace_item_daily` / `_window`) and rule 5 of the prompt to point at them. Internal: - CHANGELOG.md [0.54.20] section restored to its canonical content from the v0.54.20 git tag. The #329 self-merge carried 226 lines of author's pre-rebase bullets that ended up mis-attributed; the published v0.54.20 GitHub Release (FTS BM25 + batch bar) now matches the CHANGELOG section verbatim. Also fills in [Unreleased] with this PR's bullets (Fixed + Internal). - tests/conftest.py — dropped the unused `conn_with_usage_schema_and_attribution` fixture that INSERTed into the now-removed `usage_attribution_` tables. Zero callers today, but a tripwire — the first future test to request it would have failed with a binder error. - app/web/templates/marketplace.html — replaced a customer-specific token (`groupon-marketplace`) in the Most Popular sort-tiebreaker comment with a generic `<customer>-marketplace` placeholder per CLAUDE.md § Vendor-agnostic OSS. Also scrubbed an `agnes-development` reference in app/api/admin.py and src/store_guardrails/runner.py (cherry-picked from Vojtech's commit) on the same hygiene rule. release: 0.54.22 — flea-market promote-by-submission_id fix + #329 reviewer follow-ups --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-15 21:21:14 +02:00
minasarustamyan	302cf58ccd	feat(marketplace): telemetry v46 + flea inner parity + listing polish (#329) * feat(telemetry): marketplace item rollup refactor (schema v46) Replace the v42 attribution layer with prefix-split + live lookup against marketplace_plugins / store_entities. The v42 design had a latent bug — AttributionLookup keyed on bare skill names while Claude Code writes `<plugin>:<local>` in JSONL, so lookups never matched and usage_plugin_daily stayed empty in every deployment. Schema (v46 migration): - Drop usage_attribution_skills / _agents / _commands (mapping tables, derivable from marketplace_plugins + plugin tree). - Drop usage_plugin_daily (always empty in production due to the bug above). - Create usage_marketplace_item_daily — per-day fact (count, distinct_users, error_count), composite PK on (day, source, type, parent_plugin, name). - Create usage_marketplace_item_window — sliding-window snapshot with true cross-window distinct user counts; period_label='last_7d' refreshes every tick, 'last_30d' refreshes hourly (tracked via session_processor_state). - Mark usage_tool_daily as a candidate for removal (no product-UI consumer). Attribution flow: - MarketplaceItemLookup replaces AttributionLookup. Preloads marketplace_plugins.name + store_entities.name into memory once per UsageProcessor tick, then per event splits the identifier on ':', matches the prefix, and writes the resolved source / parent_plugin into usage_events. The agnes-store-bundle prefix routes to flea entities. Slash commands with a `plugin:` prefix count as type='skill' in the rollup. API: - BREAKING: MarketplaceItem.unique_users_30d renamed to distinct_users_30d (now a true distinct count from the window snapshot, not sum-of-daily). - InnerDetailResponse gains a telemetry field — invocations_30d + distinct_users_30d surfaced on curated inner skill / agent detail pages. - Card chip hidden pending UX finalisation; data stays in the response.
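The prefix-split attribution described above can be sketched as a split on the first `:` followed by set lookups against names preloaded once per tick. `resolve_identifier`, the `"curated"` source label, and the sample plugin names are hypothetical; the special routing of the `agnes-store-bundle` prefix and the slash-command type mapping are not modeled.

```python
def resolve_identifier(identifier: str, curated: set, flea: set) -> tuple:
    """Attribute a '<plugin>:<local>' identifier by its prefix.

    Returns (source, parent_plugin, name). The name sets are loaded once per
    processing tick, so each event costs one split plus set lookups.
    """
    prefix, sep, local = identifier.partition(":")
    if not sep:
        return (None, None, identifier)  # bare name, e.g. a built-in tool
    if prefix in curated:  # checked first: an arbitrary choice in this sketch
        return ("curated", prefix, local)
    if prefix in flea:
        return ("flea", prefix, local)
    return (None, prefix, local)  # unknown plugin: left unattributed


curated = {"data-tools"}
flea = {"sql-helper"}
print(resolve_identifier("sql-helper:format", curated, flea))
# ('flea', 'sql-helper', 'format')
```

Matching on the prefix is what the v42 bare-name lookup missed: `data-tools:explain-query` never equals a bare `explain-query` key, but its prefix does equal a plugin name.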
Backfill: scripts/backfill_marketplace_rollup.py — one-shot rebuild over historic usage_events after deploy, idempotent. USAGE_PROCESSOR_VERSION bumped 4 → 5 so the reprocess loop re-attributes existing events to the new source/ref_id semantics on the next tick. Tests rewritten: test_session_processor_usage, test_usage_rollups, test_marketplace_telemetry, test_api_admin_usage_reprocess, test_db_schema_version, test_home_stats, test_schema_v42_migration. New: test_backfill_marketplace_rollup. * fix(marketplace): refresh Most Popular on search + category changes `loadMostPopular()` early-exits when `state.q` or `state.category` is set, but the search + category handlers only called `loadItems()` — so once the section was visible, typing a query or filtering by category didn't re-run the hide check and the cards stayed on screen out of scope. Tab + sort handlers already chained the call. Add the call to runSearch + category pill click handlers (All + per-category) so the visibility contract holds for every state mutation that can flip the early-exit condition. * feat(marketplace): All-plugins section + 7-day Most Popular Listing layout: - Always-visible "All plugins" / "All items" / "Your stack" section header (label swaps per tab) wrapped in `#mp-all-section` so its margin-collapse mirrors the sibling `#mp-popular-section` and the spacing from the filter row stays consistent in both layouts. - Sort dropdown moved from the filter row into the All-* header, pinned right via `margin-left: auto`. Anchored to its section so the relationship between sort + grid is obvious. - `.mp-section-header` gets `min-height: 32px` + `align-items: center` so the bare-text Most Popular row matches the dropdown-bearing All-* row. - `.mp-section-header` margin tightened 24px → 20px on top. Most Popular: - Capacity reduced 8 → 4 cards. - Now reflects a 7-day window (was 30-day). 
Backend surfaces `invocations_7d` + `distinct_users_7d` on `MarketplaceItem` alongside the existing 30d fields; the loader pulls a wider page (server still sorts by 30d) and re-sorts + filters client-side on `invocations_7d > 0` so the strip stays "hot right now". - Section label updated to "Last 7 days". - Section now renders on both `curated` and `flea` tabs (was curated-only). Hidden on `my` and whenever search / category filter is active. Refresh hooks wired into search + category click handlers so visibility flips immediately on state change. Backend (`_load_invocation_stats`): - Single SELECT pulls both `last_30d` and `last_7d` rows from `usage_marketplace_item_window`; the result dict carries invocations + distinct_users for both windows. - Trend (recent_7 vs prior_7) kept on the daily fact table so it stays independent of the window snapshot's freshness. * feat(marketplace): Most adopted sort + hide Trending when no trend data Add a fourth sort option to the All-items dropdown — "Most adopted (30d)", keyed on `MarketplaceItem.distinct_users_30d` (true 30d distinct user count from `usage_marketplace_item_window`). Protects the listing from power-user skew that `most_used` is susceptible to: one user × 100 invokes can't beat 10 different users × 1 invoke under adoption sort. Hide Trending option when the response has no trend data. User reported `sort=trending` returning an empty grid because every plugin's `trend_pct` was None (prior-week threshold of >= 3 invocations didn't clear anywhere). Empty grids on a user-selected sort are worse UX than just not offering the sort — surface what works, hide what doesn't. Backend (`app/api/marketplace.py`): - `_apply_sort` gains a `most_adopted` branch (DESC distinct_users_30d, ties by name ASC). - `sort` Literal extended. - `ItemListResponse.available_sorts` lists the sort keys the UI should expose for this response. 
recent/most_used/most_adopted always; trending only when at least one item in the tab's stats carries a non-null trend_pct. - `_available_sorts(stats_dicts)` helper centralises the rule — curated and flea branches pass one stats dict, my-tab passes both (option is available when either source has trend data). Frontend (`app/web/templates/marketplace.html`): - New `<option value="most_adopted">Most adopted (30d)</option>` between Most used and Trending. - URL state allowlist extended so `?sort=most_adopted` round-trips. - `applyAvailableSorts(available)` runs after each list fetch: hides options not in the response's available_sorts; if the user is on a now-unavailable sort, resets to 'recent' and re-fetches. Search-mode fan-out unions availability across the curated + flea responses so a hit on either side keeps the option visible. * feat(marketplace): funnel chip on cards + deterministic Most Popular sort Card chip — funnel telemetry between description and footer: [stack-icon] N installed · [user-icon] N active · [bolt-icon] N calls · ↑/↓ N% - stack_count (new MarketplaceItem field): for curated it's COUNT(*) on user_plugin_optouts (post-v28 row PRESENCE = subscribed; system plugins are fanned out to every user via fanout_system_for_user so the count includes them naturally). For flea it reuses the existing store_entities.install_count (incremented on install, decremented on uninstall). - distinct_users_30d (existing) — active users in the 30d window. - invocations_30d (existing) — call volume. - trend_pct (existing) — week-over-week, both directions: green ↑ / red ↓, magnitude only (sign in the arrow). Hidden when null. Backend additions in app/api/marketplace.py: - MarketplaceItem.stack_count field. - _load_curated_stack_counts() — one SELECT per render, GROUP BY (marketplace_id, plugin_name). Wired into the curated + my-tab branches; flea reads install_count off the entity row directly.
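A minimal sketch of the two sort rules described above: the `most_adopted` ordering (DESC on 30d distinct users, ties by name ASC) and the gate that offers `trending` only when some item has a non-null `trend_pct`. `sort_most_adopted`, `available_sorts`, and the per-item stats-dict shape are hypothetical stand-ins for `_apply_sort` and `_available_sorts`. The power-user case from the text (one user with 100 calls against ten users) ends up last.

```python
def sort_most_adopted(items: list) -> list:
    # DESC on 30d distinct users, ties broken by name ASC.
    return sorted(items, key=lambda i: (-i["distinct_users_30d"], i["name"]))


def available_sorts(*stats_dicts: dict) -> list:
    # recent / most_used / most_adopted are always offered; trending only
    # when some item in any passed stats dict carries a non-null trend_pct.
    sorts = ["recent", "most_used", "most_adopted"]
    if any(stats.get("trend_pct") is not None
           for stats_by_item in stats_dicts
           for stats in stats_by_item.values()):
        sorts.append("trending")
    return sorts


items = [
    {"name": "power-tool", "distinct_users_30d": 1, "invocations_30d": 100},
    {"name": "beta", "distinct_users_30d": 10, "invocations_30d": 10},
    {"name": "alpha", "distinct_users_30d": 10, "invocations_30d": 12},
]
print([i["name"] for i in sort_most_adopted(items)])  # ['alpha', 'beta', 'power-tool']

curated_stats = {"alpha": {"trend_pct": None}}
flea_stats = {"beta": {"trend_pct": 25.0}}
print(available_sorts(curated_stats))              # ['recent', 'most_used', 'most_adopted']
print(available_sorts(curated_stats, flea_stats))  # ['recent', 'most_used', 'most_adopted', 'trending']
```

Passing both stats dicts for the my-tab case gives the "either source has trend data" behavior without a separate code path.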
Frontend (app/web/templates/marketplace.html): - Heroicons solid 24×24 inlined (one helper per icon, all fill="currentColor" so per-segment colour tokens apply): rectangle-stack (mirrors the My Stack tab icon), user, bolt, arrow-trending-up/down. - Per-segment colour: installed=amber #F59F0A (My Stack accent), active=green #0e9b6a, calls=orange #f97316. Text stays neutral so the chip still reads as metadata; the leading glyph carries the visual cue. Trend pill keeps the full-segment green/red colour. - Zero state: chip hidden when stack_count == 0 AND invocations_30d == 0 — brand-new cards aren't visually penalised by a "0·0·0" row. - Tooltips on every segment via title="…" so hover explains the number's meaning to anyone uncertain about the icon. Most Popular section — deterministic ordering: Previously sorted by invocations_7d DESC with no tie-breakers, so several cards with identical 7d call counts would swap places on refresh (JS stable sort fell back on backend order, and the backend's own tie-breaker for `most_used` was just name ASC — six `grpn` plugins from six test marketplaces collapsed to the same name and became indeterminate via list_with_filters' created_at order). New cascading hierarchy (chosen primary now matches what "most popular" really means — wide adoption, not power-user volume): 1. distinct_users_7d DESC ← adoption / social proof 2. invocations_7d DESC ← volume at equal adoption 3. distinct_users_30d DESC ← broader adoption fallback 4. invocations_30d DESC ← broader volume fallback 5. name ASC ← deterministic textual order 6. marketplace_slug ASC ← splits duplicate plugin names across marketplaces Six levels guarantee that any two items end up with different sort keys, so the strip is stable across refreshes. * fix(marketplace): unify Most Popular on 30d + right-align installed chip Most Popular section was sorting on the 7d window while its cards rendered 30d numbers — header label promised one thing, cards showed another.
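A sketch of the six-level cascade listed above (7d-first, as that commit introduced it), combined with the 4-card capacity and the `invocations_7d > 0` filter from the earlier Most Popular change. `most_popular_key`, `most_popular`, and the sample data are hypothetical, and the real ordering runs client-side in JavaScript. Determinism holds only if `(name, marketplace_slug)` is unique per item, which is what the final tie-breaker relies on.

```python
def most_popular_key(item: dict) -> tuple:
    # Negate the counts so an ascending sort yields DESC on them, while
    # name and marketplace_slug stay ASC as the deterministic tail.
    return (
        -item["distinct_users_7d"],
        -item["invocations_7d"],
        -item["distinct_users_30d"],
        -item["invocations_30d"],
        item["name"],
        item["marketplace_slug"],
    )


def most_popular(items: list, capacity: int = 4) -> list:
    # Only items with recent activity qualify for the strip.
    hot = [i for i in items if i["invocations_7d"] > 0]
    return sorted(hot, key=most_popular_key)[:capacity]


def item(name, slug, du7, inv7, du30, inv30):
    return {"name": name, "marketplace_slug": slug,
            "distinct_users_7d": du7, "invocations_7d": inv7,
            "distinct_users_30d": du30, "invocations_30d": inv30}


items = [
    item("sql-helper", "team-b", 3, 9, 5, 20),
    item("sql-helper", "team-a", 3, 9, 5, 20),  # same name, other marketplace
    item("notebook", "team-a", 4, 2, 4, 2),
    item("archived", "team-a", 0, 0, 7, 50),    # no 7d activity: filtered out
]
print([(i["name"], i["marketplace_slug"]) for i in most_popular(items)])
# [('notebook', 'team-a'), ('sql-helper', 'team-a'), ('sql-helper', 'team-b')]
```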
Unified everything on 30d so a card means the same data everywhere on the page. - Dropped the "Last 7 days" meta from the Most Popular header. - Sort cascade now starts on distinct_users_30d, then invocations_30d, with 7d adoption/volume as recency-aware fallbacks before the name + marketplace_slug deterministic tail. Six levels guarantee identical sort keys never produce indeterminate order across refreshes. - Filter switched from invocations_7d > 0 to invocations_30d > 0 to match the new horizon. - Most Popular now only renders on page 1 of the listing. Past initial discovery, a top-of-list popularity strip on page 2+ would shadow the results the user paged into. Pager click handler refreshes the section so navigating back to page 1 re-mounts it. Chip layout — split engagement vs adoption visually: [user] N active · [bolt] N calls · [↑/↓] N% [stack] N installed └────────── LEFT (time-bounded engagement) ────┘ └── RIGHT (all-time) ──┘ - Installed (stack_count) is all-time, decremented on uninstall. Alone it says little ("12 people installed it") without the engagement context next to it ("…but did anyone actually use it?"). Visually separating the two groups makes that distinction obvious — left group answers "is it used", right answers "does anyone have it". - Implemented via flex with margin-left:auto on .seg-installed so installed drifts to the trailing edge. - Installed tooltip now reads "Currently installed by N users" — the count is a real-time net (uninstall drops it), and saying "currently" makes that explicit. Helps when a card shows 0: signals "nobody has this in their stack right now", not "data missing". * feat(plugin-detail): telemetry chip in hero, derived rows in sidebar Surface the same telemetry funnel the listing card carries on the curated plugin detail page, so clicking through from /marketplace keeps a single mental model — figures match, semantics match. 
The detail sidebar drops the two raw numbers that used to live there (Invocations 30d / Users 30d — duplicated by the chip now) and replaces them with two derived signals only the daily series can provide: Active days + Last used. Backend (app/api/marketplace.py): - PluginDetailResponse.stack_count — curated reads via _load_curated_stack_counts(), flea reuses install_count. Frontend treats both sources uniformly. - _build_telemetry() always returns a dict (never None). Frontend decides chip visibility from stack_count + invocations_30d the same way the listing card does. daily_series is always 30 entries (zero-padded) so "Active days" and "Last used" derivations on the sidebar are trivial array filters. Frontend (app/web/templates/marketplace_plugin_detail.html): - New .hero-telemetry slot at the bottom of the hero meta column, between the pills row and the action buttons. Renders the four funnel segments — active · calls · trend · installed — joined by ` · `. No left/right split: the hero has space, so a single coherent metadata strip reads cleaner than the card's split layout. - Heroicons solid inlined (user / bolt / arrow-trending-up,-down / rectangle-stack) recoloured against the dark hero — icons in lighter tokens (mint #6ee7b7, peach #fdba74, cream #fde68a), trend pill keeps the saturated green/red because direction-coding earns its own colour. - Tooltip on installed reads "Currently installed by N users" — the count is a real-time net (drops on uninstall), and "currently" makes that explicit when a card shows 0. - fmtNum helper added so 1.2k / 14M renderings match the card's format exactly. - Sidebar swap: Invocations + Users rows removed, replaced by Active days → "N of 30" Last used → fmtRelative of the latest non-zero day Both derived from telemetry.daily_series — engagement consistency + recency, neither of which the hero chip exposes on its own. 
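The sidebar derivations above are simple once `daily_series` is a fixed 30-entry, zero-padded array. A Python sketch under assumed shapes (entries ordered oldest first, each a `{"day", "count"}` dict; the real derivation lives in the template's JavaScript, and `zero_padded_series`, `active_days`, `last_used` are hypothetical names):

```python
from datetime import date, timedelta


def zero_padded_series(counts_by_day: dict, end: date, days: int = 30) -> list:
    # One entry per day, oldest first; days without events get count 0.
    start = end - timedelta(days=days - 1)
    series = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        series.append({"day": day, "count": counts_by_day.get(day, 0)})
    return series


def active_days(series: list) -> int:
    # Sidebar "N of 30": days with at least one invocation.
    return sum(1 for point in series if point["count"] > 0)


def last_used(series: list):
    # Latest day with a non-zero count, or None for an all-zero series.
    for point in reversed(series):
        if point["count"] > 0:
            return point["day"]
    return None


series = zero_padded_series({date(2026, 5, 2): 3, date(2026, 5, 17): 1},
                            end=date(2026, 5, 19))
print(len(series), active_days(series), last_used(series))  # 30 2 2026-05-17
```

Because padding happens server-side, neither derivation needs to know which days are missing; both are a single pass over a fixed-length array.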
* feat(item-detail): telemetry chip in hero for curated skill/agent Bring the funnel chip the plugin detail page got in 4cf38d40 to the curated inner skill/agent detail page — clicking through from the listing card now keeps the same metadata strip from grid to plugin page to inner item page. Backend (app/api/marketplace.py): - _load_inner_item_stats() rewritten: * always returns a dict (never None) so the frontend can decide chip visibility client-side, same contract as _build_telemetry * adds trend_pct, computed the same way as plugin level (recent_7 vs prior_7 from usage_marketplace_item_daily, ≥3 prior-week threshold) * adds daily_series (30 entries, zero-padded) so the sidebar can derive Active days + Last used - InnerDetailResponse.parent_stack_count — new field. Skills/agents don't have a per-item subscription model, so the hero shows the parent plugin's stack count under a "Plugin:" prefix. The funnel: "12 installed plugin → 2 actually use this skill". - curated_skill_detail + curated_agent_detail handlers load _load_curated_stack_counts() once and pass the parent's value. Frontend (app/web/templates/marketplace_item_detail.html): - New .item-detail .hero .hero-telemetry slot beneath the badges row. CSS mirrors plugin-detail's colour tokens (mint/peach/cream Heroicons solid + saturated trend pill) so the two surfaces read as one visual family. - Installed segment uses a "Plugin:" label rendered with reduced opacity to signal the metric describes the parent, not the item itself. Tooltip: "Parent plugin (<plugin_name>) currently installed by N users". - Sidebar Invocations + Users rows removed (chip carries them). Active days + Last used derived from telemetry.daily_series replace them; only rendered when activeDays > 0 so a brand-new skill doesn't show "0 of 30" / "Last used —". - "Type" row dropped from the sidebar — duplicates the hero badge. - fmtNum helper added (matches listing card + plugin detail). 
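A minimal sketch of the trend rule described above, assuming the percentage is the plain relative change between the last 7 days and the 7 days before, gated on at least 3 prior-week invocations. `week_totals` and `trend_pct` are hypothetical helpers, not `_load_inner_item_stats()` itself.

```python
def week_totals(daily_counts: list) -> tuple:
    # daily_counts is ordered oldest -> newest and holds at least 14 days.
    return sum(daily_counts[-7:]), sum(daily_counts[-14:-7])


def trend_pct(recent_7: int, prior_7: int, min_prior: int = 3):
    # Week-over-week change in percent; None when the prior week has
    # too few invocations to compare against.
    if prior_7 < min_prior:
        return None
    return (recent_7 - prior_7) / prior_7 * 100.0


daily = [0] * 16 + [1, 1, 1, 0, 0, 0, 1] + [2, 2, 2, 1, 1, 1, 1]
recent, prior = week_totals(daily)
print(recent, prior, trend_pct(recent, prior))  # 10 4 150.0
```

Returning `None` below the threshold matches the chip contract above: a null `trend_pct` hides the trend segment rather than showing a misleading percentage.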
Plugin detail (app/web/templates/marketplace_plugin_detail.html): - Hero "Curator: …" line removed. The Details sidebar already carries that info; duplicating it under the h1 was visual noise. - Sidebar "Owner" row renamed to "Curator" — for curated plugins it's a person who curates inclusion in this Agnes instance, not the upstream code owner. "Owner" was a hold-over label. * feat(item-detail): unify hero with plugin detail — pills + breadcrumb + cleaner sidebar - Inner skill/agent hero now uses the same `.pills` / `.pill.cat / .curated / .flea / .muted` class names + CSS as the plugin detail page; the only item-only addition is `.pill.type` (Skill / Agent uppercase, plugin detail has no kind axis). - Hero `Updated` moved out of the meta-row into a muted pill (mirrors the plugin detail hero), removed from the Details sidebar to avoid duplication. - Details sidebar slimmed: dropped Marketplace, Path, Updated rows; Parent plugin now shows the curator-friendly display name (`parent_display_name || manifest_name || slug`) instead of the slug. - Breadcrumb extended to full path: Marketplace > <marketplace_name> > <plugin display name> > <self>, mirroring the plugin detail breadcrumb. - Backend: new `InnerDetailResponse.parent_display_name` field, populated via `_curated_plugin_enrichment` from marketplace-metadata.json — the same source the plugin detail hero already uses. * feat(marketplace): flea inner skill/agent detail + breadcrumb polish - Flea inner skill/agent detail page parity with curated: * GET /api/marketplace/flea/{id}/skill/{name} + /agent/{name} returning InnerDetailResponse (mirror of curated_skill_detail). * /marketplace/flea/{id}/skill|agent/{name} web routes that render marketplace_item_detail.html with source='flea' + innerName context. * Frontend apiURL grows a third branch for flea-inner; breadcrumb grows to 4 segments (Marketplace > Flea Market > <plugin display name> > <self>) when innerName is set. 
* Telemetry attribution: MarketplaceItemLookup resolves <flea_plugin>:<inner> prefixes to (source='flea', parent_plugin=<plugin name>) so nested invocations land in the same rollups curated nested skills use. USAGE_PROCESSOR_VERSION bumped 5 -> 6 so the reprocess loop re-attributes historic events. - Breadcrumb 2nd segment is now a generic clickable "Curated Marketplace" / "Flea Market" link to /marketplace?tab=... instead of the opaque per-instance marketplace_name. Applied on both plugin detail and inner item detail. - Inner item hero telemetry chip works for both sources: installedCount branches on parent_stack_count (curated) vs install_count (flea), installed segment drops the "Plugin:" prefix for flea standalone / inner items. - Updated row dropped from Details sidebar on item detail — the hero pill already carries the value, so the sidebar row was a duplicate. * feat(item-detail): block stack-install on flea inner items (mirror curated) Inner skills/agents nested inside a flea plugin can no longer be added to a user's stack on their own — adoption only happens at the plugin level, same rule curated nested items have followed since launch. - Hero action: when innerName is set (curated nested OR flea nested), render "Open parent plugin →" link + helper text instead of the install/remove buttons. Flea standalone entities (no innerName) keep the normal install UX. - Meta-row: same branch now serves curated + flea inner — "part of <parent plugin display name> · by <author>" with the parent link pointing at the right detail page per source. No API gate change needed: POST /api/store/entities/{id}/install only accepts existing entity ids (plugin-level), inner items have no entity id of their own so the endpoint cannot target them directly. * feat(marketplace): telemetry chip on inner cards + fix flea hero chip visibility Inner skill/agent cards on the plugin detail page now carry the same four-segment funnel chip the marketplace listing cards show (N active · N calls · 
trend · N installed), for both curated nested skills and flea nested skills. Plus two fixes that were keeping the hero chip hidden on flea plugin / flea inner detail pages. - Backend `_load_inner_items_stats_by_parent(conn, source, parent_plugin)` bulk loader: one query per plugin against usage_marketplace_item_window + one against _daily, returning {(name, type): stats}. Avoids N+1 per-card lookups. - `InnerItemSummary` gains invocations_30d / distinct_users_30d / trend_pct / parent_stack_count fields. `curated_detail` and `flea_detail` (in the entity.type=='plugin' branch) enrich the skills / agents lists after the existing cover-photo enrichment loop. - `marketplace_plugin_detail.html`: new `.plugin-detail .inner-card .inv-chip` CSS lifted from marketplace.html with the listing-card rules, new buildInnerCardChip() helper, buildCardSection appends the chip to each card body. Same gate as the listing card (hidden on parent_stack==0 && calls==0). - fix(flea): flea_detail forgot to populate PluginDetailResponse.stack_count from entity.install_count (the listing card does this on line 851; the detail endpoint didn't). Hero chip gate `stackCount===0 && calls===0` then always hid the chip even when the entity had installs. Now mirrors listing card semantics: stack_count == install_count for flea. - fix(flea inner): renderInnerHeroTelemetry was reading `d.install_count` for any non-curated source. InnerDetailResponse has no install_count field — it has parent_stack_count (populated server-side from the parent flea plugin's install_count). Gate + label now read parent_stack_count for both curated nested AND flea nested scenarios; install_count remains the flea standalone path. * fix(marketplace): Owner label on flea + parent-centric sidebar for flea inner - Plugin detail Details sidebar — authorship row label now tracks the source: curated bundles get `Curator` (existing behaviour), flea bundles get `Owner`. 
The `owner_todo` reminder placeholder stays on the curated branch only; flea falls through silently. - Inner item detail Details sidebar — flea-inner (skill/agent nested inside a flea plugin) now shares the curated nested layout: Parent plugin / Bundle size / Active days / Last used / Owner. Drops the flea-standalone shape's `Category`, `Version`, `Installs`, `Released` rows that didn't apply to a nested item. Active days + Last used were already wired (telemetryRows) — they just weren't on the flea-inner branch. * fix(tests): bump SCHEMA_VERSION assertions 47 -> 48 post-rebase The marketplace telemetry migration was renamed _v46_to_v47 -> _v47_to_v48 during the rebase onto main (collision with the #326 FTS BM25 migration that took the v47 slot). Two test files still asserted the pre-rebase value: - tests/test_home_stats.py::test_schema_version_constant_is_46 (CI red) - tests/test_schema_v46_migration.py::test_schema_version_is_46 Renames the helper fn + bumps the assertion. The other two test files (test_db_schema_version.py, test_schema_v42_migration.py) were already updated in the rebase resolution. * fix(telemetry): _build_telemetry returns None when invocations_30d == 0 The follow-up commit that introduced the always-return-dict shape broke the test contract from the original v46 PR (commit b603e998): tests/test_marketplace_telemetry.py::TestDetailTelemetry::test_detail_endpoint_telemetry_absent_when_no_data AssertionError: assert {'daily_series': [...], ...} is None Both `PluginDetailResponse.telemetry` and `InnerDetailResponse.telemetry` are declared `Optional[Dict] = None`, and the frontend renderers are None-safe (`d.telemetry || {}` guard + `if (!d.telemetry || ...)` on daily_series), so dropping the dict on zero activity is the cleaner default. 
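The chip-visibility rule and the None-on-zero telemetry contract from these commits can be sketched in Python. The function names are hypothetical. The real logic is split between `_build_telemetry` in `app/api/marketplace.py` and the template JavaScript. The dict keys (`telemetry`, `invocations_30d`, `parent_stack_count`, `install_count`) follow the field names used in the commit messages.

```python
from typing import Optional


def build_telemetry(invocations_30d: int, distinct_users_30d: int,
                    daily_series: list[int]) -> Optional[dict]:
    """Return None on zero activity; callers treat a missing block as 'no data'."""
    if invocations_30d == 0:
        return None
    return {
        "invocations_30d": invocations_30d,
        "distinct_users_30d": distinct_users_30d,
        "daily_series": daily_series,
    }


def installed_count(detail: dict, inner_name: Optional[str]) -> int:
    """Nested items (curated or flea) report the parent plugin's installs."""
    if inner_name is not None:
        return detail.get("parent_stack_count") or 0
    # Flea standalone entity: its own install count.
    return detail.get("install_count") or 0


def show_hero_chip(detail: dict, inner_name: Optional[str]) -> bool:
    """Hide the chip only when nothing is installed AND nothing was called."""
    telemetry = detail.get("telemetry") or {}  # mirrors the `d.telemetry || {}` guard
    calls = telemetry.get("invocations_30d", 0)
    return not (installed_count(detail, inner_name) == 0 and calls == 0)
```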
* release: 0.54.21 — marketplace telemetry refactor (schema v48) + flea inner detail parity + listing UX polish --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com> Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-15 20:58:03 +02:00
Vojtech	bb703517c9	fix(store): close 2 medium + 1 low adversarial-review findings (#322) Three remaining findings from Codex's adversarial review of PR #316 (issue #318), plus a pre-existing version-numbering bug surfaced while fixing the atomic-promote ordering. M1 — Prompt sentinel escape now covers file PATHS, not just file BODIES. Pre-fix, the per-file `--- FILE: {rel} ---` header inlined the untrusted relative path unescaped. A ZIP whose relative path concatenated to `</bundle>` (a `<` directory plus a `bundle>` child) could forge the trust-boundary close tag from inside the path slot and inject apparent system instructions after the boundary. The same `_escape_sentinels` helper now runs on both rel and body. M2 — Live-bundle swap + DB promote is now atomic-ish. The runner / override / inline-promote paths previously called `repo.promote_version(...)` then `_swap_live_to_version(...)`. A missing `versions/v<N>/plugin/` made the swap silently return False — leaving the DB ahead of live. New `promote_to_version` helper in `app/api/store.py` swaps FIRST (with the existing staging → backup → live rename chain) and only advances the DB row after the on-disk swap succeeds; it rolls live back to the prior bundle on DB write failure. While wiring up M2, the strict source check exposed a pre-existing bug: `update_entity` and `restore_version` derived `new_version_no = entity.version_no + 1`. Under deferred promotion that's wrong — entity.version_no stays at the last approved version while version_history grows with blocked / pending entries. Subsequent PUTs would overwrite an in-flight blocked v2 dir's bytes, then the runner's hash-match promotion in `runner.run_llm_review` would load bytes that didn't match the recorded submission hash. Fixed by deriving from `max(version_history.n) + 1`. L1 — Admin forensic download now serves STAGED bundle bytes per submission, not live. 
Pre-fix, downloading a blocked v2 streamed live's prior approved v1 bytes — admins reviewing whether to override saw the wrong bytes. Resolves staged `versions/v<N>/plugin/` via `_version_no_for_submission`; falls back to live for legacy rows without history linkage. Tests: - test_filename_with_bundle_sentinel_is_escaped - TestAtomicPromote::test_missing_source_dir_does_not_advance_db - TestAdminBundleDownload::test_download_v2_blocked_returns_staged_bundle_not_live	2026-05-15 17:56:09 +02:00
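The swap-first ordering from M2 and the corrected version numbering can be sketched in Python as follows. The sketch assumes the layout named in the commit: `<entity>/plugin/` is live and `<entity>/versions/v<N>/plugin/` is staged. It also assumes hypothetical `plugin.staging` / `plugin.backup` sibling directory names and a caller-supplied `write_db` callable that stands in for the DB promote. It is not the code in `app/api/store.py`.

```python
import shutil
from pathlib import Path
from typing import Callable


def next_version_no(version_history: list[dict]) -> int:
    """Derive from the highest recorded version, not the last approved one."""
    return max((entry["n"] for entry in version_history), default=0) + 1


def promote_to_version(entity_dir: Path, version_no: int,
                       write_db: Callable[[int], None]) -> bool:
    """Swap the staged bundle into live first; advance the DB row only afterwards."""
    source = entity_dir / "versions" / f"v{version_no}" / "plugin"
    if not source.is_dir():
        return False  # missing staged bundle: leave both live and the DB untouched
    live = entity_dir / "plugin"
    staging = entity_dir / "plugin.staging"
    backup = entity_dir / "plugin.backup"
    for leftover in (staging, backup):
        if leftover.exists():
            shutil.rmtree(leftover)
    shutil.copytree(source, staging)   # 1. copy the staged bundle next to live
    had_live = live.exists()
    if had_live:
        live.rename(backup)            # 2. move the current live bundle aside
    staging.rename(live)               # 3. the staged copy becomes live
    try:
        write_db(version_no)           # 4. only now advance the DB row
    except Exception:
        shutil.rmtree(live)            # DB write failed: restore the prior live bundle
        if had_live:
            backup.rename(live)
        raise
    if had_live:
        shutil.rmtree(backup)
    return True
```

In this sketch the DB write is the last step. A crash between steps 3 and 4 therefore leaves live ahead of the DB rather than behind it, which is one reason the commit calls the result only "atomic-ish".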
Vojtech	a694a30a5e	fix(store): surface review failures + harden publish gate (#316) * fix(store): surface review failures + harden publish gate Four independent fixes to the flea-market submission pipeline, all surfaced by an admin upload that landed at status='approved' without an LLM review. 1. LLM truncation no longer pins submissions in review_error. - Raised MAX_RESPONSE_TOKENS 2500 → 6000 in llm_review.py - Added one-shot retry-with-doubled-budget in anthropic_provider.py (capped at 4× initial) 2. Flea detail page surfaces the latest submission's failure verdict even when a previously-approved version is still serving (deferred-promotion path). The _quarantine_banner gate widened from `visibility != approved` to also fire on `blocked_inline / blocked_llm / review_error`, with copy that distinguishes the v2+ edit case ("Latest edit failed review — previously approved version (vN) keeps serving") from the initial-upload quarantine wording. 3. Restore button + endpoint no longer allow restoring a version that was never approved. Added StoreEntitiesRepository.get_with_version_approvals joining store_submissions, gated the UI button on submission_status in ('approved', None), rendered status pills for non-restorable rows, and added a 400 version_not_approved guard in POST /restore. 4. BREAKING (operator-facing): publish gate is now fail-CLOSED on misconfig. The previous get_guardrails_enabled() silently fell back to "disabled, auto-approve everything" when guardrails.enabled=true in YAML but no ANTHROPIC_API_KEY was in env. Split into: - get_guardrails_enabled() (intent — YAML) - get_guardrails_llm_provider_ready() (readiness — env) Three-state matrix: enabled=false → auto-approve (unchanged); enabled=true + ready=true → normal pipeline (unchanged); enabled=true + ready=false (NEW) → submissions hold at pending_llm awaiting admin retry or override (was: silent auto-approve). Admin "Retry review" eligibility broadened to include pending_llm. 
Boot-time WARNING banner surfaces the misconfig in app/main.py. docs/STORE_GUARDRAILS.md updated with the three-state matrix. Operators relying on the auto-fallback for local-dev no-LLM setups must now explicitly set `guardrails.enabled: false` in instance.yaml. Tests: 4623 passed. Added TestPublishGateFailClosed (4 tests) and TestRestoreVersion::test_restore_rejects_* (3 tests). conftest.py adds an autouse fixture defaulting guardrails OFF so legacy tests don't need to know about the new toggle. * fix(store): admin override promotes v2+ edits to current The override handler at app/api/admin.py:3708 only flipped submission status → 'overridden' and entity visibility → 'approved'. Under the v37+ deferred-promotion model that's insufficient for v2+ edits / restores: the new bundle sits in versions/v<N>/plugin/ and the entity row stays at the prior approved version_no + hash + on-disk live bundle. Installers kept getting the OLD bytes the admin had just intended to replace. Mirror the runner.run_llm_review auto-approval branch: look up the submission's version_hash in entity.version_history, and if its `n` differs from entity.version_no, promote_version + _swap_live_to_version. Initial v1 overrides are unaffected — the loop finds n=1 == version_no and skips promotion. Tests: - test_override_v2_edit_promotes_to_current: stage v1 approved + v2 blocked_llm; override the v2 sub; assert entity.version_no=2, entity.version flips off the v1 hash, and the live plugin/ dir mirrors versions/v2/plugin/. - test_override_v1_initial_upload_no_promote: regression guard so the promote loop doesn't accidentally bump a v1 override. Audit log gains a promoted_to_version_no field on the override action. * fix(store): retry/rescan review staged bundle; override forward-only Two adversarial-review findings from a Codex pass on the publish-gate work. C1. Admin retry + rescan were passing live `plugin/` to the LLM. 
For a v2+ submission held at `pending_llm` / `blocked_llm` / `review_error`, live still holds the prior approved version's bytes — so the LLM reviewed the WRONG bytes, and the runner's hash-match promotion in `run_llm_review` would then advance the entity to staged bytes that were never actually reviewed. Resolve the staged `<entity>/versions/v<N>/plugin/` from the submission's `version_history` entry, with a fall-back to live for legacy pre-v37 rows that never seeded a versions/ dir. Helpers `_submission_plugin_dir` and `_version_no_for_submission` added to `app/api/store.py` so override / retry / rescan share one path. H1. Override's promote loop used `target != current`, which would silently demote the live bundle when admin overrode a stale v2 submission while v3 was already approved + live. Changed to `target > current` so override flips status + visibility on the row regardless, but on-disk promotion only fires forward. Same `>` defensive guard applied in `runner.run_llm_review` so a late LLM verdict racing with a newer approval can't demote either. Tests: - TestAdminRetryReviewsStagedBundle::test_retry_v2_blocked_passes_staged_dir_not_live - TestAdminRetryReviewsStagedBundle::test_rescan_v2_blocked_passes_staged_dir_not_live - TestOverrideForwardOnly::test_override_stale_v2_does_not_demote_when_v3_current * review polish: CHANGELOG drift, override eligibility, defensive copy Three small additions on top of the retry/rescan staged-bundle fix: 1. CHANGELOG: the PR's bullets had drifted into the released [0.54.17] section during rebase (context-match landed them next to already-released content). Moved them up to [Unreleased] where they belong; [0.54.17] now holds only what was actually released (refresh-marketplace ls-remote, /me/activity hero, CI sharding + workflow polish). 2. app/api/admin.py: admin override eligibility now accepts pending_llm alongside blocked_inline + blocked_llm + review_error. 
Closes a UX gap from the new fail-CLOSED behavior: under enabled-but-not-ready, a known-good submission would otherwise sit indefinitely until the admin set credentials AND clicked Retry. Override already routes through version_history (and is now forward-only on promote), so it stays safe for v2+ deferred-promotion submissions. 3. src/repositories/store_entities.py: get_with_version_approvals defensively copies each version_history entry before annotating it with submission_status. self.get() re-parses JSON on each call today, so this is belt-and-suspenders against any future caching layer leaking the annotated key into a subsequent plain get() call. Tests: 112 passed (focused on test_store_entity_versions + test_admin_store_submissions, covering the retry/rescan staged-bundle fix the author shipped + this polish). --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-15 15:52:07 +02:00
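The three-state publish gate and the forward-only promote rule reduce to two small decisions. In this sketch the enum and function names are hypothetical. The two booleans stand in for `get_guardrails_enabled()` and `get_guardrails_llm_provider_ready()`. The override-eligible set lists the statuses named in this commit.

```python
from enum import Enum


class GateOutcome(Enum):
    AUTO_APPROVE = "auto_approve"      # guardrails disabled on purpose
    RUN_REVIEW = "run_review"          # normal inline + LLM pipeline
    HOLD_PENDING_LLM = "pending_llm"   # enabled but no provider: fail closed


def publish_gate(enabled: bool, provider_ready: bool) -> GateOutcome:
    """Intent (YAML) and readiness (env) are separate inputs."""
    if not enabled:
        return GateOutcome.AUTO_APPROVE
    if provider_ready:
        return GateOutcome.RUN_REVIEW
    return GateOutcome.HOLD_PENDING_LLM


# Statuses an admin may override after this commit.
OVERRIDE_ELIGIBLE = frozenset({"blocked_inline", "blocked_llm", "review_error", "pending_llm"})


def should_promote_on_disk(target_version_no: int, current_version_no: int) -> bool:
    """Forward-only: a stale verdict or override never demotes the live bundle."""
    return target_version_no > current_version_no
```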
Vojtech	4501c9c3dd	fix(store-guardrails): post-#290 review follow-up — purge tuple, filter chip, stale docs, lazy bundle_meta, logger.exception (#295) Addresses post-merge review findings on #290: - Admin Rescan is the only post-v30 producer of status='blocked_inline'. Re-add it to the admin queue's 'Needs review' filter chip and to TERMINAL_BLOCKED_STATUSES in the bundle-purge job so rescan-produced rows surface in the default operator view and bundles get TTL-swept instead of lingering indefinitely. - Update three doc-drift sites still referring to the pre-#290 spam counter scope (counted blocked_inline). The counter now narrows to blocked_llm + review_error; fix the comment in app/api/store.py, the docstring in get_guardrails_blocked_quota_per_day(), and the operator-facing hint rendered on /admin/server-config. - Add a positive test for the _reject_inline_or_continue validation branch (code='validation_failed', checks payload shape, no-DB-write contract). Locks the frontend wizard's detail.checks contract. - Tighten test_quota_disabled_with_zero — assert (200, 201) explicitly instead of !=429 so a 500 regression no longer passes. - _reject_inline_or_continue takes plugin_dir and lazy-computes bundle_meta only on the security branch. Validation rejects no longer pay for a SHA256 walk on the bundle. - Surface store.upload.security_blocked audit-log write failures via logger.exception instead of swallowing them — that audit row is the only forensic trace by design.	2026-05-14 08:02:44 +02:00
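Here is a sketch of the lazy `bundle_meta` computation. The validation branch returns before any hashing, and only the security branch walks the bundle. Several details are assumptions for illustration:

- the function names
- the `security_blocked` code
- the `sha256` key
- the hashing scheme, which covers each file's relative path, size and bytes in sorted order

```python
import hashlib
from pathlib import Path
from typing import Optional


def bundle_sha256(plugin_dir: Path) -> str:
    """Deterministic digest over each file's relative path, size, and bytes."""
    digest = hashlib.sha256()
    for path in sorted(p for p in plugin_dir.rglob("*") if p.is_file()):
        data = path.read_bytes()
        rel = path.relative_to(plugin_dir).as_posix()
        # Length-prefix each entry so path/content boundaries cannot be confused.
        digest.update(f"{rel}\0{len(data)}\0".encode())
        digest.update(data)
    return digest.hexdigest()


def reject_inline_or_continue(plugin_dir: Path, validation_errors: list[str],
                              security_findings: list[str]) -> Optional[dict]:
    """Return a rejection payload, or None to let the upload continue."""
    if validation_errors:
        # Cheap path: no bundle walk is needed to tell the author what to fix.
        return {"code": "validation_failed", "checks": validation_errors}
    if security_findings:
        # Only the security branch records forensic metadata, so hash here.
        return {
            "code": "security_blocked",
            "checks": security_findings,
            "bundle_meta": {"sha256": bundle_sha256(plugin_dir)},
        }
    return None
```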
Vojtech	50a974f196	feat(store-guardrails): admin-configurable content thresholds (#281) * feat(store-guardrails): admin-configurable content thresholds Adds the flea-market content guardrail floors to the /admin/server-config editor so operators can tune the bar without code changes. Defaults are unchanged (60-char description, 25-char command, 5 distinct words, 200-char body) — patching guardrails.* in instance.yaml or via the admin UI overrides any of them and the next inline check picks up the new value. src/store_guardrails/content_check.py now resolves the four floors via helper functions (_min_desc_chars / _min_command_desc_chars / _min_distinct_words / _min_body_chars) that read app.instance_config at call time. Module-level _DEFAULT_* constants remain as fallbacks if the import fails (defensive — keeps the guardrail module loadable without the app package on its path). app/instance_config.py grows four matching getters returning the live value with sane defaults + integer coercion. app/api/admin.py registers 'guardrails' as an editable section + ships nine known-fields entries (min_description_chars, min_command_description_chars, min_distinct_words, min_body_chars, enabled, review_model, blocked_quota_per_day, blocked_bundle_ttl_days, stuck_review_grace_seconds) with operator-facing hint copy explaining what each knob does. app/web/templates/admin_server_config.html gets a SECTION_META entry so the section renders as 'Flea-market guardrails' with a help string instead of a bare section ID. app/web/router.py threads the live thresholds into /store/new and /store/examples via a small _guardrail_thresholds() helper so the disclosure copy, char counter, and "Why these limits" table render the configured value (not a hardcoded 60). End-to-end smoke verified: PATCH guardrails.min_description_chars=90 → /store/new immediately renders "90 characters" + JS DESC_MIN=90 on the next request, no restart required (helpers read live config per call). 
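Per-call resolution can be sketched as a description check that merges the defaults with the current config on every call. That per-call merge is what lets a PATCHed floor apply to the next check without a restart. The `check_description` name, the word-splitting regex and the `too_few_distinct_words` code are hypothetical. `too_short` and the default values come from this commit.

```python
import re

# Defaults as listed in the commit; a live config overrides any of them.
DEFAULT_FLOORS = {
    "min_description_chars": 60,
    "min_command_description_chars": 25,
    "min_distinct_words": 5,
    "min_body_chars": 200,
}


def check_description(text: str, guardrails: dict) -> list[str]:
    """Return failure codes for a plugin description (empty list means pass)."""
    floors = {**DEFAULT_FLOORS, **guardrails}  # merged on every call, never cached
    failures = []
    if len(text.strip()) < floors["min_description_chars"]:
        failures.append("too_short")
    words = {w.lower() for w in re.findall(r"[A-Za-z0-9']+", text)}
    if len(words) < floors["min_distinct_words"]:
        failures.append("too_few_distinct_words")
    return failures
```

A 75-character description passes at the default floor of 60 and fails once the floor is raised to 90. The commit's round-trip test exercises that same scenario.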
* chore(store-guardrails): address PR review safe-fix findings Code-review safe_auto findings on PR #281 (review run 20260513-100126-64052520): - CHANGELOG: add Unreleased entry covering the new /admin/server-config Flea-market guardrails section, the four live threshold getters, and the route-helper rendering knobs. Required by the project's non-negotiable "Changelog discipline" rule. - content_check.py: narrow `except Exception` to `except ImportError` on the four `_min_()` resolver helpers. Surface-level TypeError / ValueError on a malformed YAML value belongs to the instance_config getters' own try/except — the resolvers should only defend against the in-tree import itself failing, not silently swallow real bugs in the getters. - store_upload.html: refresh the stale "30-char threshold" comment to reflect the configurable floor (default 60), and add `|default(60)` / `|default(25)` / `|default(5)` filters to the disclosure-copy bindings so the upload form matches store_examples.html's belt-and-suspenders rendering if a future route ever renders the template without populating the `guardrail` context. - router.py: tighten `_guardrail_thresholds()` return annotation from bare `dict` to `dict[str, int]`. Residual work (left for a separate change pending operator direction): - Add round-trip test (PATCH guardrails -> next inline check uses new value) — primary testing gap. - Decide policy on `min_=0` (currently coerced to 1 via `max(1, int(val))`) vs treating 0 as a disable sentinel like neighbour getters (`blocked_quota_per_day`, `blocked_bundle_ttl_days`). - Add POST-time integer validation for `guardrails.` so a typo'd YAML value (bool / string / float) errors loudly instead of silently falling back to the default. 
test(store-guardrails): cover admin-configurable thresholds + PATCH round-trip Closes the "primary testing gap" Vojta noted in the safe-fix commit on PR #281 — the four new `get_guardrails_min_` getters and the PATCH-takes-effect-on-next-check live-config flow had no direct coverage. 10 new tests in `tests/test_store_guardrails_admin_config.py`: - TestGuardrailGetterDefaults (4 tests) — each new getter returns the documented default (60 / 25 / 5 / 200) when nothing is configured. - TestGuardrailGetterOverlay (5 tests) — overlay-driven overrides win, string values that look numeric coerce via int(), garbage strings fall back to default via the (TypeError, ValueError) branch, and the `max(1, int(val))` floor pins zero/negative inputs to 1. - TestPatchRoundTrip (1 test) — PATCH `/api/admin/server-config` `guardrails.min_description_chars=90`, then call content_check against a 75-char description that previously passed: must now fail with `too_short`. Then PATCH back to 60 and verify the next check passes again. Closes the cache-invalidation contract Vojta relies on for the "no app restart" claim — broken without the reset_cache() bracket in /api/admin/server-config. The TestGuardrailGetterOverlay.test_zero_or_negative_floored_to_one test pins the current `max(1, int(val))` policy. Vojta's safe-fix commit explicitly left "policy on min_=0 vs disable-sentinel" as residual work — pinning the current behavior here ensures any future change to use 0 as a disable sentinel must update this test (and the reviewer sees the policy decision). Verified: 4509 tests pass locally (4499 existing + 10 new). * release: 0.54.2 — admin-configurable flea-market guardrail thresholds + tests Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.1 → 0.54.2) bundling Vojta's admin-configurable thresholds for the flea-market content guardrail (9 knobs in /admin/server-config) plus the test coverage closing the "primary testing gap" he punted in the safe-fix commit. 
No DB migration; defaults unchanged from PR #276 — instances that don't set `guardrails.*` keep the original bar transparently. --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com> Co-authored-by: ZdenekSrotyr <139972147+ZdenekSrotyr@users.noreply.github.com>	2026-05-13 09:20:55 +00:00
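This commit's tests pin a getter policy: numeric strings coerce via `int()`, garbage falls back to the default, and zero or negative values floor to 1. That policy can be sketched as a single helper. `get_guardrail_floor` and its `config` mapping argument are hypothetical. The real code uses four per-threshold getters in `app/instance_config.py`.

```python
from typing import Any, Mapping


def get_guardrail_floor(config: Mapping[str, Any], key: str, default: int) -> int:
    """Read guardrails.<key> live, coercing numeric strings and flooring at 1."""
    section = config.get("guardrails") or {}  # an empty YAML section parses as None
    val = section.get(key)
    if val is None:
        return default
    try:
        return max(1, int(val))
    except (TypeError, ValueError):
        # Values such as "abc" or a list fall back to the default.
        return default
```

`int(True)` is 1, so a boolean slips through as 1. That is one reason the safe-fix commit lists POST-time integer validation as residual work.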
ZdenekSrotyr	b4d3c576af	Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) * docs(spec): admin observability spec + Activity Center MVP plan Parent spec (480 lines) + executable plan (2295 lines, 14 TDD tasks). Covers the Activity Center rebuild (/admin/activity), with /admin/sessions and /admin/feedback deferred to follow-up plans. Already incorporates reviewer-pass revisions across three angles (security, production resilience, code architecture): - _get_db import path corrected to app.auth.dependencies - Test fixtures aligned with seeded_app / admin_user / get_system_db - All new audit writes wrapped in try/except + logger.exception - Filename sanitization on session uploads - DuckDB DESC index behavior documented; upgrade window flagged - Migration idempotency + evolved-DB test cases - reveal_raw + shared-cache multi-worker explicitly deferred Targets schema v40 (audit_log gains params_before, client_ip, client_kind, correlation_id + 3 indices). * feat(db): schema v40 — audit_log gains params_before, client_ip, client_kind, correlation_id + 3 indices * chore(test): clean up Task 1 — drop unused import, rename stale test * feat(audit): AuditRepository.log() accepts params_before/client_ip/client_kind/correlation_id * test(audit): strengthen params_before assertion to round-trip JSON content * feat(audit): AuditRepository.query() rich filters + keyset cursor pagination * feat(sync): SyncStateRepository.list_recent() cross-table feed * feat(audit): POST /api/sync/trigger writes audit_log row * feat(audit): POST /api/scripts/run-due writes audit_log row * feat(audit): POST /api/upload/sessions writes audit_log row + sanitizes filename * feat(audit): GET /api/data/{table_id}/download writes audit_log row * feat(activity): /api/admin/activity timeline + /health + /sync endpoints * feat(ui): /admin/activity rebuilt — health pulse, timeline, sync grid; /activity-center → 308 redirect BREAKING: removed demo executive-pulse / maturity-roadmap content 
from activity_center.html. The page now reflects real audit_log + sync_history data. * feat(ui): admin nav + dashboard widget point at /admin/activity * feat(activity): recursive-audit suppression for AC read endpoints (60s window per actor+filter) * feat(activity): emit PostHog events when integration enabled (no-op default) * fix(audit): move v40 indices out of _SYSTEM_SCHEMA + update test_repositories to unpack query() tuple _SYSTEM_SCHEMA CREATE INDEX on audit_log(timestamp) failed when migration tests hand-roll a bare audit_log (id, action) without the timestamp column. Fix: remove indices from _SYSTEM_SCHEMA; add ADD COLUMN IF NOT EXISTS guards for timestamp and other pre-v40 columns in _v39_to_v40() so the upgrade path is safe on any hand-rolled schema; call _v39_to_v40 explicitly in the fresh-install (current==0) path to restore index creation there. Also unpack the (rows, next_cursor) tuple from AuditRepository.query() in the three TestAuditRepository tests that still treated it as a list. * docs: CHANGELOG entry for Activity Center MVP * chore: refresh stale module docstring in app/api/activity.py * feat(cli): agnes admin activity — terminal access to Activity Center (timeline + health + sync) * fix(db): _v39_to_v40 — add IF NOT EXISTS guard for 'action' column The v39→v40 ladder step adds defensive ADD COLUMN IF NOT EXISTS for every audit_log column so a hand-rolled bare audit_log (id only) is safe through the ladder. 'action' was missing from the guard list, causing CREATE INDEX idx_audit_action_time to fail on tests that stub audit_log with only an id column (tests/test_e2e_extract.py:: TestSchemaMigration::test_migration_preserves_and_extends). Local 6/6 schema tests + the previously-failing CI test pass. 
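`AuditRepository.query()` returns keyset-paginated results as `(rows, next_cursor)`. That pattern can be sketched with the standard-library `sqlite3` module standing in for DuckDB. The `(timestamp, id)` cursor shape, the column subset and the `query_audit` name are assumptions. Each page filters on the last row already seen instead of using OFFSET. As a result, rows inserted at the head of a newest-first feed while a user pages do not shift later pages.

```python
import sqlite3
from typing import Optional

Cursor = tuple[str, int]  # (timestamp, id) of the last row already returned


def query_audit(conn: sqlite3.Connection, limit: int,
                cursor: Optional[Cursor] = None) -> tuple[list[tuple], Optional[Cursor]]:
    """Return one newest-first page of audit rows plus the cursor for the next page."""
    sql = "SELECT id, timestamp, action FROM audit_log"
    params: list = []
    if cursor is not None:
        ts, row_id = cursor
        # Strictly older than the cursor row; id breaks ties on equal timestamps.
        sql += " WHERE timestamp < ? OR (timestamp = ? AND id < ?)"
        params += [ts, ts, row_id]
    sql += " ORDER BY timestamp DESC, id DESC LIMIT ?"
    params.append(limit + 1)  # one extra row tells us whether another page exists
    rows = conn.execute(sql, params).fetchall()
    page = rows[:limit]
    next_cursor = (page[-1][1], page[-1][0]) if len(rows) > limit else None
    return page, next_cursor
```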
* docs(spec): platform telemetry epic — Boss directive + Activity Monitoring plan rebased onto v40 (stacked on zs/spec-activity-center) * feat(db): schema v41 — 7 usage_* tables for telemetry (events, summary, rollups, attribution) * chore(db): tighten v41 — usage_session_summary.session_id NOT NULL + upgrade test asserts all 7 tables * feat(usage): UsageAttributionRepository — replace/delete/lookup over usage_attribution_* tables * refactor(marketplace): extract list_inner_skills/agents/commands to src/marketplace_listing.py for reuse * feat(usage): explode plugin attribution on marketplace sync + store entity write; backfill script * refactor(marketplace): finish src/marketplace_listing.py extraction — drop duplicate _list_inner_* + _parse_frontmatter from app/api/marketplace.py * feat(usage): promote attribution helpers to src/usage_attribution_helpers.py; hook update_entity rename + bundle-swap; clarify best-effort semantics * feat(usage): UsageProcessor real extraction + rollup rebuild + 10 fixture-driven tests * fix(usage): include tool_id in event hash + executemany + rollup transaction (critical multi-tool-turn drop fix) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(marketplace): popularity stats — invocations_30d + trend + sort=most_used|trending + Most Popular section * feat(admin): /admin/users/<id> Sessions section — list + single-file + bulk-zip downloads (audit-logged) * feat(usage): admin export endpoint + CLI — csv/json/parquet streaming, filters, audit-logged * feat(usage): agnes admin ask — LLM Text-to-SQL over usage_events with SELECT-only validator (audit-logged) * feat(usage): reprocess + prune endpoints + scheduler daily prune job + CLI * docs: PLATFORM_SETUP.md operator playbook + HOWTO/ cookbook (5 guides + index) Adds docs/PLATFORM_SETUP.md as a consolidated operator playbook covering bootstrap, TLS, marketplaces (curated + flea), scheduler env vars, telemetry extraction/export/ask/prune, privacy posture, and daily 
routine. Adds docs/HOWTO/ with 5 analyst cookbook guides: first query, snapshots for remote tables, private sessions, feedback + admin ask, and customizing skills. Existing setup docs (QUICKSTART, DEPLOYMENT, ONBOARDING, HEADLESS_USAGE) get a one-line cross-reference at the top pointing to PLATFORM_SETUP.md. * docs(changelog): platform telemetry epic — usage_* foundation + surfaces + admin access + docs Comprehensive [Unreleased] entry covering: usage_events/session_summary/ tool_daily/plugin_daily tables (v41), attribution lookup tables, backfill script, marketplace Most Popular + invocation chips + sort, admin Sessions section, export/ask/reprocess/prune endpoints + CLI mirrors, Activity Center (v40), PLATFORM_SETUP.md + HOWTO/ docs, and operations notes for v41 upgrade. * fix(security): block DuckDB read_/http_/glob functions in usage_ask validator + symlink escape guard in session zip + clarify mark-private semantics * fix(admin): parquet export tempfile cleanup on COPY failure + correct processed-first sort on /admin/users/<id>/sessions * feat(audit): close 8 production audit gaps — query (local/remote/hybrid), catalog/schema/sample, snapshot estimate/create, check-access * feat(ui): /admin/usage summary dashboard + per-user activity tab on /admin/users/<id> * fix(audit): cap error messages at 200 chars + audit user_activity reads + recursion guard on usage.summary * fix(audit): catalog.list audits on error path + clean up deferred json import * fix(ux): client_kind=cli for PAT auth + timeline empty state + email-instead-of-uuid + nav reorder + help text + loading indicators + ask doc * feat(observability): unify /admin/activity into single page with saved views - KPI cards (events, users, error rate, p95) clickable as quick-filters - Faceted filter dropdowns populated from audit_log in the current window - Sortable audit table, cursor pagination, per-row JSON side panel - Saved views (schema v43: user_observability_views) — per-user state - Top bar: window 
selector + 30s Live toggle + saved views dropdown - /admin/scheduler-runs → 308 redirect (source=scheduler filter) - New endpoints: /api/admin/observability/{facets,kpis,views} * test: update activity + scheduler-runs tests for unified page - test_admin_activity_page_renders asserts new structural anchors - test_admin_scheduler_runs_page_admin_only asserts 308 redirect * fix(observability): respect [hidden] on modal + side panel CSS `display: flex` on .obs-modal beat the [hidden] attribute's UA display:none, so the save-view modal rendered on page load and Cancel clicks couldn't dismiss it. Gate the modal's flex layout on :not([hidden]); add the same display:none guard prophylactically to .obs-panel and .obs-views-panel. * feat(observability): user enrichment in audit + interactive /admin/usage Activity: - /api/admin/activity now joins users for user_email + user_name per row - User column renders "name (id-prefix)" or "email (id-prefix)" instead of an opaque truncated UUID; falls back to id when the user record is missing Usage: - /admin/usage rewritten as the same filter/group-by/search pattern as /admin/activity. 
Faceted dropdowns (User / Tool / Source / Event type) populated from usage_events; debounced free-text search across tool_name / skill_name / subagent_type / command_name - New endpoints /api/admin/usage/{facets,kpis,query}; the query endpoint supports group_by in {day, username, tool_name, source, ref_id} with sort + offset pagination, plus an ungrouped raw-events mode - 4 KPI cards (events, distinct users, distinct tools, error rate) are clickable quick-filters; clicking a grouped row applies the bucket as a filter - Old static `?window=7d\|30d\|all` server preload removed; all state is client-side via since_minutes + group_by + filters in the URL * fix(observability): clearer labels, all-column sort, drop saved views UI - Rename page titles: "Activity" → "Server activity", "Usage" → "Tool usage" with a one-line subtitle on each explaining what the page covers and linking the other one. The two pages source different data (audit_log vs usage_events) and the previous labels conflated them. - Drop the saved-views dropdown + save modal from /admin/activity. The modal pop-open bug was the trigger; the value wasn't there yet. The /api/admin/observability/views CRUD + DuckDB table stay in place. - Rename "Live (30s)" to "Auto-refresh (30s)" with a tooltip clarifying that it's the re-fetch rate, not the time range. Time range now labeled "Time range" instead of "Window". - All audit-table columns are sortable (User, Source, Action, Resource, Result added); sort is page-local with a Jinja comment explaining the trade-off. Same for raw usage rows. - Fix duplicate sort-arrow bug — the literal "▼" in the Time th HTML was rendering alongside the CSS ::before arrow. Removed the literal; CSS is the single source of truth. * feat(observability): global Sessions browser + transcript viewer + CLI Web: - /admin/sessions — list every collected session JSONL across all users with time-range, user, model, errors-only and free-text filters. 
Default sort surfaces error-heavy sessions first. KPI cards (sessions, distinct users, sessions w/ errors, tool error rate) clickable as quick-filters. - /admin/sessions/<username>/<file> — transcript viewer rendering the JSONL chronologically: user prompts, assistant text, tool calls (with JSON input) and tool results (with flattened output). Errors get a red border + chip and a "Next error" navigation button at the top. - Admin dropdown gains a "Sessions" link. API: - GET /api/admin/sessions/{list,kpis,facets} — filtered cross-user reads off usage_session_summary - GET /api/admin/sessions/{username}/{file}/transcript — parses JSONL via the existing services.session_pipeline.lib, returns chronological events - GET /api/admin/sessions/{username}/{file}/download — JSONL stream, same path-safety guards as the per-user endpoint, audit-logged CLI: - `agnes admin sessions list [--user X] [--errors] [--since 7d]` — table output with `!` prefix on rows that hit a tool error - `agnes admin sessions show <username> <file>` — transcript dump, with `--errors` to print only the failed tool_result blocks - `agnes admin sessions download <username> <file> [-o path]` - `agnes admin sessions kpis` — top-level numbers * feat(internal): expose telemetry tables to agnes query with row-level RBAC Three new registered tables backed by system.duckdb, queryable through the same /api/query plumbing analysts use for Keboola / BigQuery / local sources: agnes_sessions → usage_session_summary (filter: username) agnes_usage → usage_events (filter: username) agnes_audit → audit_log (filter: user_id) RBAC is per-row, not per-table: admins see every user's rows; non-admins see only their own. The filter is built server-side from the auth user dict; non-admin filter values are regex-validated before SQL interpolation. 
Implementation: - new connector connectors/internal/ with access (filter+exec) + registry (idempotent table_registry seed at startup) - /api/query detects internal table refs and short-circuits to a CTE wrapper that prepends "WITH agnes_x AS (SELECT * FROM <src> WHERE …), …" then "SELECT * FROM (<user_sql>) AS _q". DuckDB cursor on the shared system.duckdb handle — opening parallel handles / ATTACH on the same file is blocked process-wide. - mixing internal + BQ / registered local tables in one SELECT is rejected (v1 limitation) - src.rbac.can_access_table waves internal tables through for all authenticated users; row scoping is the actual security control - /api/v2/schema and /api/v2/sample gained internal branches; sample intentionally skips its cache because rows are RBAC-scoped per caller - audit row written as action='query.internal' with is_admin flag Tests: connectors/internal/access — RBAC, filter clause, schema, CTE wrapper coexistence with user-supplied aggregations, unsafe-username rejection. 16/16 passing. Motivating queries this enables: SELECT tool_name, COUNT(*) FROM agnes_usage WHERE is_error GROUP BY 1 ORDER BY 2 DESC -- analyst self-introspection: which tools fail for me? SELECT user_id, COUNT(*) FROM agnes_audit WHERE action = 'session.transcript_view' GROUP BY 1 -- admin: who's been looking at whose session transcripts? * feat(admin): group dropdown into 5 named sections + internal tables in /catalog Admin dropdown gains section headers so admins can land on the right page without re-reading the full menu: Activity Center Server activity / Tool usage / Sessions Users & Access Users / Groups / Resource access / Tokens Data Tables Agent Experience Curated Marketplaces / Flea Submissions / Agent Setup Prompt / Agent Workspace Prompt Server Server config "Agent Experience" frames the curated content + prompts as one cluster — it's all admin-controlled material that shapes what an analyst's AI agent encounters.
"Configuration" → "Server" since only one item lives there now. Renamed the section's first two items: "Activity" → "Server activity" (matches page H1) "Usage" → "Tool usage" Also fixes /catalog visibility of the internal tables (agnes_sessions / _usage / _audit) for non-admin users: ``app.auth.access.can_access`` short-circuits to True for resource_type='table' + an internal-table id. Without this, non-admins saw the tables in /api/v2/catalog (which uses the same RBAC bypass) but not on the /catalog HTML page (which calls can_access directly, requiring a resource_grants row internal tables don't have). CSS for `.app-nav-menu-section`: small caps, muted, non-clickable; first section trims top padding so the panel doesn't open with an awkward gap. * refactor(admin): move corporate memory into Admin > Agent Experience Memory link was the only admin-only entry in the primary nav (gated by session.user.is_admin). Moves it into the Admin dropdown under Agent Experience, alongside Curated Marketplaces / Flea Submissions / Prompts — all admin-curated content that shapes what an analyst's AI agent encounters. Renamed the nav label to "Shared Knowledge" to match what the page actually is (admin-curated organisational knowledge from session verification, surfaced to agents). URL stays at /corporate-memory; the route still gates on require_admin per the existing comment. Side effect: primary nav (Home / Marketplace / Data Packages) is now uniform for every authenticated user — no conditional admin-only entry. 
* ui: rename admin entries to Curated Knowledge / Init Prompt / Workspace Prompt - "Shared Knowledge" → "Curated Knowledge" (parallel with "Curated Marketplaces" in the same Agent Experience section; "curated" tells the admin what they do there — review + approve) - "Agent Setup Prompt" → "Init Prompt" (matches the `agnes init` flow it actually drives) - "Agent Workspace Prompt" → "Workspace Prompt" (the "Agent" prefix was redundant — every item in the section is agent-facing) Renames page titles + H1s on /admin/agent-prompt and /admin/workspace-prompt to match. * refactor: rename Usage → Telemetry across user-facing surfaces External surfaces all switch; internal Python module / file names and the physical DB tables (usage_events, usage_session_summary, usage_tool_daily, usage_plugin_daily) stay — renaming them would force a schema migration + a redo of the LLM Text-to-SQL prompt for no analyst-visible win. Changes: - Admin dropdown: "Tool usage" → "Telemetry" - Page H1 / <title>: same - URL: /admin/usage → /admin/telemetry; old URL 308-redirects - API prefix: /api/admin/usage/* → /api/admin/telemetry/* - CLI: primary command `agnes admin telemetry …`; `agnes admin usage` kept as a deprecated alias so existing operator scripts keep working - Internal data-source table id: agnes_usage → agnes_telemetry. The registry seed now evicts any stale internal-source row whose id no longer matches INTERNAL_TABLES, so the old `agnes_usage` row is removed from table_registry on next app boot - All tests + JS endpoint paths updated * test(rbac): include auto-appended internal tables in expectations get_accessible_tables now appends agnes_sessions / agnes_telemetry / agnes_audit to every authenticated user's accessible-tables list so the internal data source shows up in /catalog. The two existing rbac tests asserted hardcoded list shapes that pre-dated the change. 
Rewritten to assert "granted tables + the canonical internal-table set" instead of literal lists, so the test stays correct if the internal table roster changes again later. * ui: visual dividers between admin-dropdown sections Adds a 1px top border + 6px top margin to every section header except the first, so the five named groups (Activity Center, Users & Access, Data, Agent Experience, Server) read as visually separated clusters. The header itself stays small-caps + muted as before — the border is additive. * ui(memory): match obs-topbar visual on /corporate-memory The Curated Knowledge page (linked from the admin dropdown's Agent Experience section) opened straight into the stats bar — no title, no subtitle, no shared chrome with the other admin pages. Adds an obs-topbar-style header at the top of .container-memory: - H1 "Curated Knowledge" - subtitle explaining what the page is + how AI agents pull from it The `.ck-*` class set duplicates the inline obs-* styles from /admin/activity etc. for this one page; promoting the obs-* class set to style-custom.css for shared reuse is the obvious next step (4 pages already inline the same CSS), tracked as a follow-up. Page <title> also renamed from "Corporate Memory" → "Curated Knowledge". * ui(tables): list Agnes internal tables in /admin/tables + group in /catalog /admin/tables previously rendered three per-source-type listings (BQ / Keboola / Jira) and dropped any row whose source_type didn't match — so the agnes_sessions / agnes_telemetry / agnes_audit rows seeded into table_registry were invisible. Adds a fourth read-only section "Agnes internal tables" that filters source_type === 'internal' and renders the same registry-table layout the other sections use, with two changes: - no Register button (these rows are seeded on every app boot from connectors/internal/registry.py) - Edit + Delete actions hidden (any change would be reverted on the next start). Manage access stays so admins can still inspect.
Mode badge picks up a new mode-internal CSS class (teal accent) so the display doesn't lie and call it "local". In /catalog, internal tables now group under an "agnes" accordion section (bucket="agnes" on seed) instead of falling into the catch-all "default". Single source of truth for which tables exist; admins find them where they expect. * ui(tables): Agnes internal as a 4th tab next to BQ/Keboola/Jira Previous iteration mounted the internal-table listing as a separate standalone card under the tab strip. Reshapes it to a proper tab-content section so admins switch between data sources via one consistent nav (BigQuery / Keboola / Jira / Agnes internal). - New tab button "Agnes internal" in the tab-nav. - The listing card becomes <section id="tab-content-internal" class="tab-content">; switchTab() already routes by id so no JS change beyond extending the hash allowlist for direct #internal links. - Tab content keeps the read-only treatment from the previous commit (no Register button, no Edit / Delete in renderRegistryListing). * ui: rename Curated Knowledge → Curated Memory Settles the naming back on "Curated Memory" — parallel structure with "Curated Marketplaces" in the same Agent Experience section, and zero rename ripple: URL (/corporate-memory), API (/api/memory/), CLI (agnes admin memory), and Python modules all stay on "memory" so the admin label finally lines up with the underlying surfaces. The "Curated" prefix still tells admins what they do on the page (review pending → approve / mandate / reject) and reads as a sibling of "Curated Marketplaces" right next to it in the dropdown. Touches: admin dropdown label, page <title>, page H1. DB tables stay on knowledge_* (already the canonical naming for the data shape). * ui: rename "Server activity" → "Audit log" "Audit log" is what the page actually is — server-side audit_log table rendered with KPI cards + filter bar + sortable table.
The "Server activity" label confused the term with Claude Code session telemetry (Telemetry page) and didn't make the source/concept clear. Touches: - Admin dropdown nav label - /admin/activity page H1 + subtitle - /admin/telemetry subtitle cross-link - test_activity_api page-renders assertion URL (/admin/activity) and API (/api/admin/activity/) stay — the "activity" name has stuck at the route layer for a year; rerouting those would churn dashboards/bookmarks for zero analyst-visible win. * ui(admin-nav): gray band on each section header for clearer separation Previous iteration used a 1px top border between section labels — the labels still blended into the items above/below at a glance. Switches to a light gray background band per section header, extended edge-to-edge inside the panel via negative horizontal margins. Bolder font-weight (700) reinforces the separation; bumping the font color isn't needed because the band itself does the work. First section's header tucks into the panel's top border-radius so the band reaches the corners without a gap. * ui(catalog): rename internal-table category to "Agnes Internal" `bucket` is what /catalog renders as the accordion category header verbatim — "agnes" lowercase didn't read as a real category name and got confused with a system identifier. Bumps to "Agnes Internal". Seed re-applies on every app boot so existing rows pick up the new bucket value via `ON CONFLICT (id) DO UPDATE`. * ui(catalog): split Agnes Internal into its own card on /catalog Previously the three internal tables landed inside the "Core Business Data" card under an "Agnes Internal" accordion alongside Keboola / BQ buckets — readers conflated system telemetry with business datasets, and the data_stats header counter ("3 tables · ~X rows total") only ever counted synced rows so internal tables looked invisible. Split the catalog page into two cards: - Core Business Data: only non-internal source_types (Keboola, BQ, Jira).
Accordions group by bucket as before. Stats counter reflects this card's tables. - Agnes Internal: a dedicated card with its own visual treatment (teal accent matching the mode-internal badge in /admin/tables). Flat list (no accordion — only 3 rows, never grows here), each row carries the canonical `agnes query` snippet. Read-only — no profiler click, no In-stack toggle, no sync metadata. Route adds `internal_card` context object; template renders the new card only when it's non-None. * fix(rbac): hide internal tables from /admin/access + drop "my" framing Two related cleanups for the Agnes-internal tables: 1. /admin/access (resource grants) no longer lists them. The `can_access` check has a hardcoded internal-table bypass — security is row-level (per-request view filter), so a table-grain `resource_grants` row would do nothing. Surfacing them in the UI let admins set up grants that silently no-op. Filter at the `_table_blocks` projection so the UI tree never sees them. 2. Display names drop the analyst-perspective "my" framing: "Agnes — my sessions" → "Agnes sessions" "Agnes — my telemetry events" → "Agnes telemetry events" "Agnes — my audit log" → "Agnes audit log" The "my" only makes sense from the querying analyst's seat (`SELECT … FROM agnes_sessions` returns their rows); on /admin/* pages where admin sees / configures them across users, the pronoun was misleading. Description text now spells out the row-level RBAC contract explicitly. Display names update via TableRegistryRepository.register's ON CONFLICT UPDATE on next app boot; no manual cleanup needed. * ui: subtitle notes about agnes_* tables on each Activity Center page The recursive observability story — Agnes serves its own audit / telemetry / session data through the same `agnes query` plumbing analysts use for business data — wasn't surfaced anywhere on the admin pages that show that data. 
Three pages get a one-liner with the canonical `agnes query` snippet + the RBAC contract (analysts see their own rows, admin sees all): - /admin/activity (Audit log) → agnes_audit - /admin/telemetry (Tool usage) → agnes_telemetry - /admin/sessions → agnes_sessions Sets up the discovery moment for admins: they're reading the page, they see "you can query this from Claude Code", they remember it when an analyst asks "how do I find my own failed tool calls?". * ui(tables): explain "Show log" empty-state on /admin/tables Cache warmup log <pre> renders with a dark background and is only populated by the SSE stream during a Re-warm all run. Opening the page cold + clicking Show log just revealed a black bar with no context — admins couldn't tell what they were looking at. Adds an inline paragraph above the <pre> explaining what the log is, the row format, when it fills in, and where to find the historical audit trail (/admin/activity). The actual <pre> stays empty until SSE events arrive, but the surrounding copy carries the meaning. * ui(tables): auto-open cache-warmup log on Re-warm all click A Re-warm all run takes ~24s per remote BQ row. With the <details> collapsed by default, operators saw the button disable, watched a quiet ~24s pass, and assumed nothing had happened — the streaming log was hidden behind a closed disclosure. Two small JS tweaks: - cacheWarmupRun() opens the details on click, so streamed lines appear without an extra interaction - cacheWarmupOnStart() hides the inline hint paragraph the moment real log content lands, so the dark log block isn't competing with redundant context Hint paragraph also clarifies that only `query_mode='remote'` BQ rows are warmed — operators with only materialized/internal tables would see total=0 and the page would "do nothing" by spec. 
* ui: trim Agnes internal copy across surfaces Descriptions had grown to explain the extraction pipeline ("parsed out of session JSONLs"), the underlying table ("Backed by usage_session_summary"), the RBAC mechanic ("row-level RBAC at query time — analysts see their own; admin sees all"), and the SQL snippet. Every implementation detail meant another rewrite on the next iter. Strips to one stable line per surface: what the data is, plus "Also available locally for analysis". Mechanics live in code + docs; the page copy says what the user needs to know. Touched: - connectors/internal/access.py: INTERNAL_TABLES descriptions - activity_center.html / admin_usage.html / admin_sessions.html subtitles - catalog.html Agnes Internal card description + row strip - admin_tables.html "Agnes internal" tab hint * fix(internal): is_user_admin arity bugs + + saved-view payload cap Round-1 code review (PR #278) caught two blocking bugs and three nits. Blocking — both `is_user_admin(user)` (single dict arg) calls raised TypeError. is_user_admin signature is `(user_id, conn)`. Affected: - app/api/query.py:_run_internal_query — every POST /api/query that references agnes_sessions / agnes_telemetry / agnes_audit blew up with a 500. The headline analyst-facing feature of this PR was unusable through the API. - app/api/v2_sample.py — same shape; `GET /api/v2/sample/agnes_*` returned 500. Both fixed to call `is_user_admin(user.get("id"), conn)`. Added two FastAPI-level tests in test_internal_data_source.py that go through the TestClient — the existing unit tests on `execute_internal_query` and `build_filter_clause` skipped the request-handler layer where the bugs lived, which is why this landed. Nits also closed: - connectors/internal/access.py: `+` allowed in _USERNAME_RE / _USER_ID_RE so RFC 5321 email local-parts (alice+test@x) resolve correctly without hitting InternalAccessError.
- app/api/observability.py: saved-view payload capped at 64 KiB to prevent an admin from bloating system.duckdb with a malformed save. * fix(security): close non-admin data-leak via underlying-table refs PR #278 R2 review surfaced a non-admin-exploitable bypass: SQL whose string literal contains 'agnes_sessions' routed into the privileged internal-query path, then queried the underlying physical table (usage_session_summary / usage_events / audit_log) directly, escaping the CTE wrapper's row filter. Two reinforcing defenses: 1. find_internal_refs() now strips single-quoted string literals before scanning for alias names — a literal alone no longer routes the request into the privileged code path. 2. execute_internal_query() rejects non-admin SQL that references the underlying physical tables (usage_*, audit_log). The CTE wrapper only scopes the agnes_* aliases; a direct FROM on the base table — or a shadowing inner WITH that still has to read the base table — bypasses RBAC. Block before execution with an actionable error pointing to the agnes_* alias. Admins are unaffected (god-mode short-circuit on the filter clause). 3. tests/test_internal_data_source.py — three new negative tests covering literal-only matches, direct-table refs, and CTE shadow attempts. Also tightens usage_ask.py's SELECT-only validator: pragma_table_info, pragma_storage_info, pragma_database_*, and duckdb_tables / columns / views / indexes / schemas are reflection functions that leak metadata the analyst question shouldn't reach. \bPRAGMA\b in _FORBIDDEN never matched the function-call form (word-boundary between `A` and `_`).
* fix(security): dynamic denylist for non-admin internal queries R3 review (PR #278) caught a wider data-leak than R2: the underlying-physical-table guard listed only the 7 usage_* + audit_log tables, but system.duckdb has 30+ other sensitive tables — users (emails + ids), personal_access_tokens, resource_grants, user_groups, user_observability_views, store_*, marketplace_*, knowledge_*, etc. Non-admin SQL like SELECT * FROM agnes_sessions UNION ALL SELECT email, id, … FROM users LIMIT 1 would leak every user's row. Replaces the hardcoded denylist with a dynamic allowlist — non-admin SQL may reference ONLY the registered agnes_* aliases. Every other table in `information_schema.tables` (main schema) is rejected. Future migrations that add a new sensitive table are automatically covered without re-editing this module. Also strips SQL comments (`/* */` and `--`) before the identifier scan so a comment-wrapped table name (`/**/users/**/`) can't slip past the regex. Four new negative tests pin: `users`, `personal_access_tokens`, block-comment wrap, line-comment wrap. Plus: per-user view-count cap (100) on /api/admin/observability/views so an admin can't fill system.duckdb with thousands of saved views. * release: 0.54.0 — Activity Center + Telemetry + Sessions + internal datasource Cuts the work shipped across this PR (Activity Center build, recursive internal data source) into a versioned release. Bumps pyproject.toml to 0.54.0; renames the top of CHANGELOG.md from [Unreleased] to [0.54.0] — 2026-05-12 with a header summary; opens a fresh [Unreleased] section for the next round. --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 22:41:19 +02:00
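Below is a minimal sketch of the non-admin guard and CTE wrapper described in PR #278, assuming a plain regex scanner. The helper names (`scrub_sql`, `check_non_admin_sql`, `wrap_with_row_filter`), the single shared filter-value regex, and passing the physical table set in as an argument are all hypothetical. The commit says the real guard reads `information_schema.tables`. This is not the code in `connectors/internal/access.py`.

```python
import re

# Hypothetical alias map: alias -> (physical table, filter column, key in the auth user dict).
INTERNAL_TABLES = {
    "agnes_sessions": ("usage_session_summary", "username", "username"),
    "agnes_telemetry": ("usage_events", "username", "username"),
    "agnes_audit": ("audit_log", "user_id", "id"),
}

# One left-to-right pass: a quoted literal or a comment is consumed whole, so a
# "--" inside a literal is not mistaken for a comment and vice versa.
_SKIP_RE = re.compile(r"'(?:[^']|'')*'|/\*.*?\*/|--[^\n]*", re.DOTALL)
_IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Assumed charset for interpolated filter values ("+" allows alice+test@x).
_SAFE_VALUE_RE = re.compile(r"[A-Za-z0-9._@+-]+")


def scrub_sql(sql: str) -> str:
    """Blank out string literals and comments before scanning for table names."""
    return _SKIP_RE.sub(" ", sql)


def referenced_identifiers(sql: str) -> set[str]:
    return {m.group(0).lower() for m in _IDENT_RE.finditer(scrub_sql(sql))}


def find_internal_refs(sql: str) -> list[str]:
    """Aliases referenced outside literals/comments; a literal alone never matches."""
    return sorted(referenced_identifiers(sql) & set(INTERNAL_TABLES))


def check_non_admin_sql(sql: str, physical_tables: set[str]) -> None:
    """Allowlist: non-admin SQL may name only agnes_* aliases, never a physical table."""
    forbidden = {t.lower() for t in physical_tables} - set(INTERNAL_TABLES)
    blocked = referenced_identifiers(sql) & forbidden
    if blocked:
        raise PermissionError(f"query agnes_* aliases instead of: {sorted(blocked)}")


def wrap_with_row_filter(user_sql: str, user: dict, is_admin: bool) -> str:
    """Prepend one CTE per referenced alias; non-admins get a WHERE on their own id."""
    aliases = find_internal_refs(user_sql)
    if not aliases:
        raise ValueError("no agnes_* alias referenced")
    ctes = []
    for alias in aliases:
        table, column, user_key = INTERNAL_TABLES[alias]
        where = ""
        if not is_admin:
            value = str(user.get(user_key, ""))
            if not _SAFE_VALUE_RE.fullmatch(value):
                raise ValueError(f"unsafe {user_key} value")
            where = f" WHERE {column} = '{value}'"
        ctes.append(f"{alias} AS (SELECT * FROM {table}{where})")
    body = user_sql.strip().rstrip(";")
    return f"WITH {', '.join(ctes)} SELECT * FROM ({body}) AS _q"
```

In this sketch, admins skip `check_non_admin_sql` and get unfiltered CTEs, which mirrors the "god-mode short-circuit" the commit mentions. Because the scan is token-based, a column whose name collides with a physical table would also be rejected, so the sketch errs toward blocking.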
Vojtech	d6ad08f107	Flea-market upload guardrails + soft delete + JOIN-based admin queue (#233 ) * feat(store): flea-market upload guardrails + soft delete + JOIN-based admin queue Adds an end-to-end guardrails pipeline for store uploads (manifest + static-security + LLM review), persists blocked bundles for forensics, introduces soft-delete (Archive) semantics, consolidates the legacy /store/{id} surface into /marketplace/flea/{id}, and reworks the admin queue so lifecycle filters read live entity visibility via LEFT JOIN rather than a denormalized submission column. Schema v29 → v35: * v29 store_submissions table + store_entities.visibility_status * v30 file_size, bundle_sha256, bundle_purged_at on submissions * v31 reshape store_submissions (drop legacy unique on entity_id) * v32 store_entities.archived_at/by + 'archived' visibility value * v33 drop store_submissions.retry_count (unused) * v34 ensure idx_store_submissions_entity exists post column-drop * v35 broaden visibility_status enum + JOIN architecture cutover Pipeline (src/store_guardrails/): * Inline checks: manifest_check, static_scan, quality_check * LLM review configurable haiku\|sonnet\|opus (default haiku) * BackgroundTasks-driven async path with structured-output JSON * Per-submitter daily quota (default 50) * 30-day TTL purge job (POST /api/admin/run-blocked-purge) * Bundle SHA256 + size persisted; sha256 survives purge for forensics Visibility model: * pending \| approved \| hidden \| archived * _enforce_visibility returns 404 (no leak) for non-owner non-admin * Owner sees own non-approved entries via include_owner_id widening * Install refused with 409 entity_not_approved when not approved Soft-delete (DELETE /api/store/entities/{id}): * Default = soft (visibility_status='archived'); existing installs keep getting served the bundle so users don't lose the plugin * ?hard=true admin-only: drops bundle + cascades user_store_installs * Hard-delete preserves entity_id on submission as tombstone so 
audit_log linkage survives for the activity timeline Admin queue lifecycle (the JOIN refactor): * Verdict (store_submissions.status) is immutable forensic record * Lifecycle (store_entities.visibility_status) is live state * /admin/store/submissions Archived chip translates to `e.visibility_status='archived'` via LEFT JOIN — any path that flips visibility surfaces in the queue immediately * Detail page renders Status (verdict) and Entity lifecycle side by side so admins see "approved at review, now archived" at a glance URL consolidation: * /store/{id} deleted (no redirect, stale bookmarks 404) * /marketplace/flea/{id} is the canonical detail surface * Three in-tree callers (upload-success, my-stack card, store listing card) updated to point at the new URL * Quarantine banner extracted to _quarantine_banner.html partial, self-guarded, included from both flea detail templates * Banner JS auto-refreshes when the verdict lands by polling /api/marketplace/flea/{id}/detail (visibility_status + submission_status — the latter is needed because blocked_llm keeps the entity at visibility_status='pending') Audit log resource format: * runner.py emits prefixed `store_submission:{id}` (post-fix) * Detail-page timeline query handles three patterns: prefixed submission, helper-emitted `store_entity:{sub_id}`, and bare-id legacy rows — all surface in the activity timeline UX fixes: * Owner sees Under review / Quarantined / Hidden banner with status * Install button gray-disabled (not blue) when non-approved * Owner cannot delete quarantined entries (403); admin can * Admin queue: filter chips, sortable columns, paging, page-size * Auto-refresh queue every 5s while pending rows are visible * Store upload page file picker no longer opens twice (label → input default action collided with explicit JS handler) Tests: 168 passed across the guardrails suites (admin submissions, store API, inline / LLM / purge guardrails, store repositories, marketplace filter, schema version). 
New regression coverage includes: archive surfaces via JOIN even when API path is bypassed; deleted submission renders activity timeline (tombstone); flea detail surfaces submission_status only for owner/admin; detail page renders Entity lifecycle row; audit log resource format covers both helper and runner paths. * fix(store-guardrails): PR #233 follow-up — prompt injection, atomic PUT, BG race, schema, reaper, sort whitelist Addresses 9 of the 23 findings from the PR #233 review (spec at docs/superpowers/specs/2026-05-09-pr233-guardrails-fixes-spec.md). Merge-gate items #1-#6 plus high-value mediums #7, #9-#12, #23. Architectural items (#8 enum split, #14 factory) and pure maintainability (#15-#22) deferred to follow-ups. Security: * #1 prompt injection — SYSTEM_PROMPT now passed via the SDK's dedicated system= parameter; bundle wrapped in <bundle>...</bundle> sentinels declared data-only by the system prompt; literal sentinel strings in user content are escaped so an adversarial README can't forge a close tag. * #6 static scan honesty — module docstring + admin copy + docs declare static scan as signal not gate; .md/.txt/.rst/.html/.json/ .yaml/.yml/.toml skipped to avoid false positives on prose. AST mode for Python deferred (separate flag, FP comparison work). Correctness: * #2 PUT atomicity — bundles bake into plugin.staging-<rand>/ alongside live, atomic-rename on success; failed checks leave live tree byte-for-byte intact. * #3 BG-task race — set_visibility_if_pending guards verdict flips to the (pending, hidden) review window; admin archives during review survive; skipped flips audit-logged. * #4 v35 NOT NULL/DEFAULT — schema v35→v36 re-applies them on store_entities.visibility_status. CHECK constraint enforced application-side (DuckDB ADD CHECK on existing column unsupported). * #7 stuck-review reaper — reap_stuck_llm_reviews flips pending_llm rows older than guardrails.stuck_review_grace_seconds (default 1800) to review_error. 
Scheduler runs every 15 min via new /api/admin/run-reap-stuck-reviews. Set knob to 0 to disable. * #9 quota counter — count_blocked_for_submitter_since now counts blocked_inline + blocked_llm + review_error so a submitter triggering only LLM-blocked verdicts is bounded. * #10 missing risk_level — surfaces as review_error with error='missing_risk_level' instead of silently defaulting to 'medium' (which looked like a model-decided block). * #11 archived_at clear — set_visibility nulls archived_at + archived_by when transitioning out of 'archived' so a future read doesn't show stale archive forensics on an approved row. Maintainability: * #12 FSM doc comment — accurate insert/transition/lifecycle description in src/db.py near store_submissions schema. * #23 sort-key whitelist — admin queue rejects unknown sort keys with 400 invalid_sort_key; substring-replace footgun removed. Deferred (separate PRs): * #5 quota race — proper fix requires asyncio.Lock spanning the full pipeline; threading.Lock blocks event loop, DuckDB MVCC doesn't help. API-level slowapi bounds worst case for now. * #6 part 3 (AST static scan), #8 (enum split), #13 (import bundle docs), #14 (factory consolidation), #15-#22 (maint). Tests: * New: tests/test_store_guardrails_prompt_injection.py (corpus + trust-boundary invariants), tests/test_store_put_atomic.py, tests/test_store_guardrails_reaper.py. * Extended: test_store_guardrails_llm.py (system param, missing risk_level, BG race), test_admin_store_submissions.py (quota counter widening, sort whitelist 400), test_store_repositories.py (un-archive metadata clear), test_db_schema_version.py (v36). * Full suite: 3738 passed; 17 pre-existing baseline failures unchanged (db migration tests, cli binary rename, catalog export, user mgmt v5 backfill — confirmed by stash + rerun on clean tree).	2026-05-09 17:32:53 +04:00
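The BG-race fix (#3) reduces to a conditional update. The background verdict only flips visibility while the row is still in the (pending, hidden) review window, so an admin archive that lands mid-review wins. Below is a minimal sketch using stdlib `sqlite3` as a stand-in for DuckDB. The function name mirrors the commit, but the signature, the allowed-status set (standing in for the application-side CHECK), and the logging call are assumptions.

```python
import logging
import sqlite3

log = logging.getLogger("store_guardrails")

# Application-side stand-in for the CHECK constraint DuckDB could not add in place.
ALLOWED_VISIBILITY = {"pending", "approved", "hidden", "archived"}
REVIEW_WINDOW = ("pending", "hidden")


def set_visibility_if_pending(conn: sqlite3.Connection, entity_id: str, new_status: str) -> bool:
    """Compare-and-set: only flip rows that are still in the review window.

    Returns False (and logs) when something else already moved the entity on,
    e.g. an admin archived it while the LLM review was running.
    """
    if new_status not in ALLOWED_VISIBILITY:
        raise ValueError(f"invalid visibility_status: {new_status!r}")
    cur = conn.execute(
        "UPDATE store_entities SET visibility_status = ? "
        "WHERE id = ? AND visibility_status IN (?, ?)",
        (new_status, entity_id, *REVIEW_WINDOW),
    )
    conn.commit()
    if cur.rowcount == 0:
        log.info("skipped verdict flip for %s: no longer in review window", entity_id)
        return False
    return True


def make_demo_db() -> sqlite3.Connection:
    """In-memory table with just the columns this sketch touches."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE store_entities ("
        " id TEXT PRIMARY KEY,"
        " visibility_status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn
```

Putting the state check in the `WHERE` clause keeps the check and the write in one statement. A read-then-write sequence would leave a gap in which an admin archive could slip in.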
minasarustamyan	e26236fdc1	Extract session-pipeline framework + UsageProcessor skeleton (#232 ) * Extract session pipeline framework, refactor verification, add UsageProcessor skeleton Pluggable framework under services/session_pipeline/ (contract + lib + per-processor runner) so multiple processors can read /data/user_sessions/<key>/.jsonl on their own cadence with full failure isolation. Verification flow becomes the first plugin; a no-op UsageProcessor reserves the second slot pending a separate brainstorm on extraction logic + storage shape. Schema v28→v29: rename session_extraction_state → session_processor_state with composite PK (processor_name, session_file). Existing rows copied over with processor_name='verification'; legacy table dropped. Migration is idempotent and no-ops the copy step on fresh installs that came up at the new schema. Endpoint: /api/admin/run-verification-detector replaced by parametrized /api/admin/run-session-processor?processor=<name>. Audit action format follows. Scheduler JOBS: verification-detector entry split into session-processor:verification + session-processor:usage. SCHEDULER_VERIFICATION_DETECTOR_INTERVAL retained for operator compatibility (drives both cadence and health-check grace window); SCHEDULER_USAGE_PROCESSOR_INTERVAL added. Address PR #232 review: scan dead branch + per-processor lock - `SessionProcessorStateRepository.scan_unprocessed_for` dead else: both branches surfaced every jsonl, the SELECT was unused, runner MD5-rehashed every stable session per tick. Replaced with an mtime precheck — stable sessions (mtime <= processed_at) are filtered at scan; modified files still surface for the runner's authoritative `file_hash` invalidation. Naive-local comparison matches the existing health-check idiom (DuckDB TIMESTAMP strips tz on storage). - Per-processor advisory lock around `_run_processor` in `/api/admin/run-session-processor`. 
Scheduler tick + manual admin POST could otherwise both run, both call create_evidence on overlapping detections, and accumulate duplicate verification_evidence rows (the dedup short-circuit only covers create+contradiction, not evidence per ADR Decision 3). Non-blocking acquire → 409 Conflict on concurrent invocation; release in finally so a runner exception doesn't wedge the processor. Tests: two new scan unit tests (mtime filter + post-mark mtime bump), 409 endpoint test, lock-released-on-exception test. Two existing tests updated for the new "filtered at scan" stat shape (previously asserted skipped == 1, now scanned == 0). * Address PR #232 review #2: parallel scheduler tick + last_run on terminal state Two pre-existing scaffold bugs in services/scheduler/__main__.py amplified by adding more session-pipeline jobs: 1. Serial for-loop over jobs with synchronous httpx.post(timeout=900) — a 10-minute verification run blocked every other job (data-refresh, health-check, usage, corporate-memory) for the whole window. The PR's stated isolation guarantee held inside the runner but broke at the scheduler dispatch layer. 2. last_run advanced only when _call_api returned True. Permanent-failure jobs hot-looped on every tick (30s) instead of cadence (15min). Fix: ThreadPoolExecutor.submit per due job + per-job in_flight set so a long-running job can't be re-launched on subsequent ticks. last_run advances unconditionally in finally; errors still surface via _call_api logging + audit_log on the receiving side. _run_job extracted to module-level for unit testing. New tests: - TestRunJobBookkeeping: advances on success / failure / unhandled raise - TestRunLoopParallelism: in_flight protection prevents duplicate launches across ticks for a single slow job --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>	2026-05-08 19:47:46 +02:00
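The per-processor advisory lock described above takes a non-blocking acquire, answers a concurrent call with 409, and releases in `finally`. It can be sketched with `threading.Lock`. The lock registry, the `Conflict` exception (standing in for the HTTP 409), and `run_processor_exclusive` are hypothetical names, not the endpoint's actual code.

```python
import threading
from collections import defaultdict
from typing import Callable, TypeVar

T = TypeVar("T")

# One lock per processor name, created lazily. A guard lock protects the
# registry so two callers can't race to create different locks for a name.
_locks: defaultdict[str, threading.Lock] = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


class Conflict(Exception):
    """Stand-in for the endpoint's 409 Conflict response."""


def run_processor_exclusive(name: str, run: Callable[[], T]) -> T:
    with _locks_guard:
        lock = _locks[name]
    # Non-blocking: a second concurrent caller fails fast instead of queueing.
    if not lock.acquire(blocking=False):
        raise Conflict(f"processor {name!r} is already running")
    try:
        return run()
    finally:
        # Always release, so a runner exception can't wedge the processor.
        lock.release()
```

Because locks are keyed by processor name, a slow `verification` run blocks only further `verification` calls; `usage` can still run alongside it.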
ZdenekSrotyr	506a378c3a	release: 0.47.1 — Keboola connector v27 (incremental, partitioned, where_filters, typed parquet) (#217 ) ## Summary Brings the Keboola connector to feature parity with the legacy internal data-analyst's per-table sync strategies. Closes the four documented gaps from the spec branch (`zs/keboola-connector-specs`): - Typed parquet in the legacy SDK extraction path — column types from Keboola Storage metadata (provider cascade `user > ai-metadata-enrichment > keboola.snowflake-transformation`) survive the CSV → parquet roundtrip; invalid date strings (`'0000-00-00'`) and invalid numeric strings (`'Non-Manager'`) become NULL while keeping the column's typed schema. Pre-fix everything was VARCHAR. - Incremental sync via Storage API `changedSince` — opt-in per table; pulls only delta rows, merges into the existing parquet by `primary_key` (drop_duplicates with keep='last'). Cuts daily extraction from O(full table) to O(delta). - Partitioned sync — flat per-partition layout `data/<table>/<key>.parquet` (e.g. `2026_05.parquet`), per-affected-partition merge for daily updates, chunked initial load with 1-day overlap and 2-empty-chunk stop heuristic. - `where_filters` — server-side row filter with date placeholders (`{{today}}`, `{{last_3_months}}`, `{{start_of_3_months_ago}}`, etc.) resolved at sync time. Force the SDK path; reject `incremental + where_filters` combination at API layer (changedSince already filters temporally). ## Architecture - Schema migration v25 → v26: 7 new columns on `table_registry`. Existing `sync_strategy` column reused (pre-v26 it was inert catalog metadata; post-v26 the extractor dispatches off it). - Per-table dispatcher in `extractor.run()` routes to one of `_extract_via_extension` (full_refresh + extension), `_extract_via_legacy` (full_refresh + filters or extension fallback), `extract_incremental`, or `extract_partitioned`. 
- API conflict policy: `incremental + where_filters` → 422; `partitioned + query_mode='remote'` → 422; `partitioned ⇒ partition_by required`. - Admin UI: third "Direct extract (Storage API)" radio in the Keboola Register / Edit modals, alongside existing "Whole table (extension)" and "Custom SQL". When selected, exposes a v26 sync-strategy panel with conditional fields per strategy. ## Test plan - [x] Unit + module — 134 v26 tests covering migration, repo, parquet_io, where_filters, incremental (compute_changed_since + merge_parquet + extract_incremental E2E), partitioned (key derivation + merge_partition + chunked windows + extract_partitioned E2E), extractor dispatcher, admin API validators, PUT field clearing, registry-shape → dispatcher bridge - [x] HTML form structure — all v26 inputs + visibility classes + JS payload fields verified in rendered template - [x] Real Keboola roundtrip — registered a small test table as `sync_strategy='incremental'` against a test Storage project, triggered two syncs: - Sync 1: `changedSince=None` → full pull → 9 rows typed parquet - Sync 2: `changedSince=last_sync - 1d window` → 9 delta rows merged with 9 existing → 9 after dedup on primary_key (PK merge confirmed) - [x] Browser UX — agent-browser session against a local uvicorn: login → admin/tables → register modal → switch radios → verify field visibility per strategy → submit → edit existing row → switch to Direct/Incremental → save → confirm DB persistence - [x] Regression — no regressions in the broader 3252-test suite (3 pre-v26 tests updated for the deprecation-marker removal + schema-version bump; 2 pre-existing environment-sensitive test failures unrelated to this change) ## Bugs caught + fixed during E2E The browser + real-Keboola roundtrip exposed four bugs the unit tests missed: 1. 
JS visibility race — two competing `forEach` loops set `display=''` then `display='none'` on form elements sharing `kb-strategy-incremental kb-strategy-partitioned` classes (window_days + max_history_days are reused across strategies). Fix: single-pass selector with class-based visibility resolver. 2. PUT cannot clear field — pre-v26 `updates = {k: v ... if v is not None}` collapsed "omitted from body" and "sent as null" into the same case, so admin couldn't switch a partitioned row back to full_refresh and have stale `partition_by` clear. Fix: `model_dump(exclude_unset=True)`. 3. Subprocess DB lock conflict — `_read_last_sync` reopened `system.duckdb` while the parent server held the write lock (subprocess contract at `app/api/sync.py:_run_sync` line 260). Fix: parent injects `__last_sync__` into table_config before subprocess spawn. 4. Wrong KBC table_id — `extract_incremental` / `extract_partitioned` built the Storage API table_id from the registry row's slugified `id` (`circle_inc`) instead of `bucket.source_table` (`in.c-finance.circle`), producing 404s. Fix: prefer `bucket+source_table`; fall back to `id` only when bucket empty. ## Operator notes - Existing tables stay on `full_refresh` after migration; admins opt individual tables in via `agnes admin register-table --sync-strategy ...`, the Keboola Edit modal, or `POST/PUT /api/admin/registry`. - `merge_parquet` and `merge_partition` use `pd.concat + drop_duplicates`, loading both existing and delta into pandas RAM. For tables in the multi-million-row range this may OOM — switch to `partitioned` strategy for those (per-partition merge keeps memory bounded). Documented in `### Internal` of the changelog entry. - Date placeholders are resolved at sync time, not register time — a typo'd `{{lasst_week}}` is accepted at register and surfaces only when the next sync runs. By design (rolling windows need late-binding). 
## Spec source The four corresponding plans on the `zs/keboola-connector-specs` branch under `docs/superpowers/plans/2026-05-07-0[1-4]-*.md` capture the design rationale and link back to internal repo references for each subsystem.	2026-05-07 19:01:27 +02:00
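The incremental merge keeps one row per `primary_key` and lets the delta's version win, which is what `pd.concat` + `drop_duplicates(keep='last')` does. The following stdlib sketch shows the same "last write wins" semantics over lists of dicts. It uses a hypothetical `merge_rows` helper. It is not the `merge_parquet` implementation and does not reproduce pandas' row ordering.

```python
from typing import Any, Iterable, Sequence

Row = dict[str, Any]


def merge_rows(existing: Iterable[Row], delta: Iterable[Row],
               primary_key: Sequence[str]) -> list[Row]:
    """Upsert delta rows into existing rows, keyed by the primary-key columns."""
    merged: dict[tuple[Any, ...], Row] = {}
    # Existing rows first, delta second: a later row with the same key
    # overwrites the earlier one, mirroring keep='last'.
    for row in (*existing, *delta):
        key = tuple(row[col] for col in primary_key)
        merged[key] = row
    return list(merged.values())
```

This matches the roundtrip in the test plan: 9 existing rows plus 9 delta rows with the same keys merge to 9. Like the pandas version, it holds both sides in memory, which is why the operator notes point very large tables at `partitioned`.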
ZdenekSrotyr	aa5921da67	release: 0.47.0 — source-agnostic catalog metadata + cache discipline (#223 ) ## Summary - Catalog enrichment for `query_mode='remote'` rows: `rows`, `size_bytes`, `partition_by`, `clustered_by` per table (BQ + Keboola providers). - `/api/v2/schema/{id}` cache miss: 2 BQ jobs → 1 (-50%) via shared `fetch_bq_columns_full`. - All four catalog/schema/sample/metadata caches flush on registry change; single-row re-warm scheduled. - Automatic cache warmup at server startup (bounded concurrency, opt-out via `AGNES_SKIP_CACHE_WARMUP=1`). - SSE-driven freshness toolbar on `/admin/tables` with progress bar, log, and per-row badge. - New admin doc `docs/admin/query-modes.md` — single source of truth on `local` / `remote` / `materialized` choice. Closes #155. Closes #156. ## Test plan - [x] 65+ targeted tests pass across 11 new test modules + 3 modified ones. - [x] No DB migration; no wire-break; `MIN_COMPAT_CLI_VERSION` unchanged. - [ ] Reviewer: register a remote BQ table via `/admin/tables`, observe the toolbar populates within ~2 s and the per-row badge transitions warming → fresh. - [ ] Reviewer: trigger `Re-warm all`, verify SSE log scrolls and `cacheWarmupBar` progresses. - [ ] Reviewer: edit a registered row's bucket, verify `agnes schema <id>` returns updated columns immediately (no 1-hour staleness). - [ ] Reviewer: confirm `agnes admin register-table --query-mode remote` prints the new IAM-smoke-check hint. ## Notable design decisions - BigQuery `INFORMATION_SCHEMA.TABLE_STORAGE` is the only valid scope for size+rows (verified live 2026-05-07; dataset-scoped doesn't exist). Region resolved from `instance.yaml.data_source.bigquery.location` → `bq.client().get_dataset(...)` → fall back to legacy `__TABLES__`. - VIEW handling: TABLE_STORAGE returns no rows for views, fall through to `__TABLES__` (also empty) → `TableMetadata(rows=None, size_bytes=None, partition_by=..., clustered_by=...)`. 
Null size signals analyst Claude to apply existing CLAUDE.md guidance. - `size_bytes` is `active_logical_bytes + long_term_logical_bytes` — full BQ scan reads both; reporting only active undercounts aged partitioned tables. - Source-agnostic provider seam: per-source `connectors/<source>/metadata.py:fetch(MetadataRequest)`; dispatcher in `app/api/v2_catalog.py:_metadata_provider_for` lazily imports per source_type so a Keboola-only deployment doesn't pay the BQ-extension import cost. - Warmup non-blocking: FastAPI `lifespan` schedules `asyncio.create_task(_warm_catalog_caches_bg)` before `yield`. Per-row failures isolated. ## Out of scope - Profile / column histograms / dimension cardinality for remote tables (separate issue). - Onboarding nudge ("you have 0 remote tables, consider registering some BQ ones") — separate UX call. - Provider plug-in registration via entry-points (the dispatch table is a hardcoded if-tree today; one line per future source). ## Release Bumps `pyproject.toml` 0.46.1 → 0.47.0 (main shipped 0.46.0 + 0.46.1 during this PR — see commit `d98976ec`). New CHANGELOG section under `## [0.47.0] — 2026-05-07`. 🤖 Generated with [Claude Code](https://claude.com/claude-code)	2026-05-07 18:33:55 +02:00
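Startup warmup runs with bounded concurrency and isolates per-row failures. This minimal asyncio sketch shows that shape. It assumes a hypothetical per-row coroutine `warm_one` and a default limit of 4; the real concurrency bound is not stated in the commit.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def warm_all(table_ids: Iterable[str],
                   warm_one: Callable[[str], Awaitable[None]],
                   limit: int = 4) -> dict[str, str]:
    """Warm each table's caches, at most `limit` at a time; never fail the batch."""
    sem = asyncio.Semaphore(limit)
    status: dict[str, str] = {}

    async def guarded(table_id: str) -> None:
        async with sem:
            try:
                await warm_one(table_id)
                status[table_id] = "fresh"
            except Exception as exc:  # one bad row must not abort the others
                status[table_id] = f"error: {exc}"

    await asyncio.gather(*(guarded(t) for t in table_ids))
    return status
```

To keep startup non-blocking, a lifespan handler would schedule this with `asyncio.create_task(...)` before `yield` instead of awaiting it.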
ZdenekSrotyr	28430ced09	Keboola cutover: native parquet path + sync correctness + auto-discover protection (#190 ) * fix: cutover regressions + parallel Keboola legacy fallback Bundled fixes from a fresh-deploy run on a Keboola Storage backend with the block-shared-snowflake-access feature flag — DuckDB Keboola extension's per-table scan can't access bucket schemas, so the legacy kbcstorage Storage-API client is the only working path. CUTOVER REGRESSIONS - agnes pull hash mismatch on every Keboola local-mode table — src/orchestrator.py:_update_sync_state stored md5(mtime+size)[:12] while the CLI compares against full 32-char content MD5. Now stores the same content MD5 the materialized SQL path already used. - Trailing-slash sanitization in connectors/keboola/access.py and extractor.py — DuckDB Keboola extension's ATTACH fails when the URL ends in / (canonical form). - src/profiler.py:TableInfo.description becomes optional — two call sites instantiated without it, crashing the profiler pass. - scripts/ops/agnes-auto-upgrade.sh: chown on UID change — older images ran as root, current runs as agnes (uid 999). Reads target uid:gid from /etc/passwd inside the new image and chowns ${STATE_DIR}, /data/extracts, /data/analytics when the digest moves. - POST /api/sync/trigger is now singleton per process — two near-simultaneous trigger calls each forked an extractor subprocess, fought for extract.duckdb's file lock, starved uvicorn, flipped the container to unhealthy. Trigger now returns 409 (sync_already_in_progress) when held; _run_sync acquires non-blocking. PARALLEL LEGACY FALLBACK - Process pool fan-out for the _extract_via_legacy queue (default 8 workers, override via AGNES_KEBOOLA_PARALLELISM). Process pool, not thread pool, because connectors/keboola/client.py:export_table does os.chdir(temp_dir) — process-global, so threads raced and slice files landed in the wrong directory ("[Errno 2] No such file or directory: '<job_id>.csv_X_Y_Z.csv'"). 
- Extractor subprocess timeout 1800s -> 3600s (configurable via AGNES_EXTRACTOR_TIMEOUT_SEC). 28+ tables × multi-minute Keboola export jobs need the headroom on telemetry-class projects. - Process group cleanup on timeout — Popen(start_new_session=True) puts the extractor in its own group. On timeout the parent SIGTERMs the group (10s grace) then SIGKILLs stragglers. Without this, the pool workers were reparented to PID 1 and continued holding open Keboola Storage export jobs. Inline extractor script also installs a SIGTERM -> sys.exit(143) handler so the with ProcessPoolExecutor(...) block __exit__ runs cleanly. Tests: existing tests that patched subprocess.run updated to patch subprocess.Popen with a _FakePopen stand-in (same exit-code-injection contract). Two tests that exercised the parallel path forced AGNES_KEBOOLA_PARALLELISM=1 to keep mocks alive (mocks don't ride into ProcessPoolExecutor subprocesses). Squashed onto current main (was 7 commits + multi-commit CHANGELOG + agnes-auto-upgrade.sh conflicts; squash avoids per-commit conflict resolution against main's flat-mount STATE_DIR refactor and 0.38.0 release cut). * feat(keboola): Storage API direct extract path; drop extension data path The DuckDB Keboola extension's COPY routes through Keboola QueryService, which is unreliable on linked-bucket projects (extension v0.1.6 fixes that case but isn't yet in the community CDN, and pre-fix any project with the block-shared-snowflake-access feature flag couldn't see bucket schemas at all). Move the extract path off the extension entirely and talk to the Storage API directly via signed-URL download — works on any project, regardless of extension state. connectors/keboola/storage_api.py (NEW) Lightweight client built on requests.Session. 
Three Storage API endpoints plus the signed-URL fetch: - POST /v2/storage/tables/{id}/export-async (kicks off job) - GET /v2/storage/jobs/{id} (poll until done) - GET /v2/storage/files/{id}?federationToken=1 (signed URL detail) - GET <signed_url> (download bytes) Supports sliced exports (manifest + per-slice signed URLs) and gzipped payloads. ExportFilter dataclass mirrors the Keboola filter spec (whereFilters / columns / changedSince / limit) and handles JSON round-trip with the registry's source_query column. Token redaction in error messages. Bounded exponential backoff on job polling. No cloud-SDK dependency on the data path; thread-safe. connectors/keboola/extractor.py - materialize_query() rewritten: takes bucket/source_table/source_query (JSON filter spec), exports via KeboolaStorageClient, converts CSV to parquet via DuckDB, atomic os.replace. Same return shape so sync.py downstream code stays uniform with the BQ branch. - _extract_via_legacy() also moved to Storage API direct (kept the name for caller compatibility with _legacy_worker / the parallel batch extractor). Per-call temp directories: no os.chdir, so threads don't race. app/api/sync.py _run_materialized_pass for source_type='keboola' rows now constructs a KeboolaStorageClient (replaces KeboolaAccess) and passes bucket/source_table/source_query to materialize_query. Reuses one client across rows for HTTP keep-alive. Falls back to the KEBOOLA_STACK_URL env var for the Keboola URL when instance.yaml has no stack_url configured. cli/commands/admin.py discover-and-register defaults Keboola rows to query_mode='materialized' (NULL source_query = full table), matching the v26 migration's unification of the local/materialized split for Keboola. BigQuery and Jira keep their per-source defaults. src/db.py Schema bump 25 → 26. Migration: UPDATE table_registry SET query_mode='materialized' WHERE source_type='keboola' AND query_mode='local'.
NULL source_query on those rows means "full table export" — same effective behavior the local mode provided, but now via Storage API instead of the extension. pyproject.toml kbcstorage dep stays (admin-side bucket/table list still uses the SDK in app/api/admin.py / connectors/keboola/client.py); only the data path is migrated off the SDK. Comment updated to reflect the new boundary. tests - test_keboola_storage_api.py (NEW, 19 tests): ExportFilter parsing, HTTP client (token redaction, retry logic, polling), download_file (single, gzipped, sliced), end-to-end export_table_to_csv. - test_keboola_materialize.py rewritten: mocks KeboolaStorageClient instead of FakeAccess; same atomic-write + zero-rows + unsafe-id contracts. - test_sync_trigger_keboola_materialized.py: registry rows now carry bucket+source_table+JSON-shape source_query. 114+ Keboola-impacted tests green locally. * test: schema version assertion bumped to 26 alongside the keboola query_mode migration * fix(keboola): cutover hot-patches surfaced on agnes-dev Five small fixes that were applied as in-container hot-patches during agnes-dev cutover and need to be on the source-of-truth image so a fresh upgrade does not undo them. - app/api/sync.py: auto-discover gate considers the WHOLE registry (any source, any mode), not just rows where source matches and query_mode is local. After the v25→v26 keboola materialized migration an instance can have 30 materialized rows and zero local rows; the previous gate kept re-firing _discover_and_register_tables every scheduler tick, creating duplicate auto-discovered rows with the wrong bucket prefix every time. - app/api/admin.py: _discover_and_register_tables reassembles the bucket as <stage>.<bucket-id> (e.g. in.c-finance) instead of dropping the stage prefix; default query_mode for keboola is now materialized (the v26 contract); validator allows NULL source_query for keboola materialized rows (full-table export via Storage API export-async, no SQL needed). 
- cli/commands/admin.py: register-table mirrors the server validator (NULL source_query allowed for source_type=keboola); --bucket help text generalized to cover both BQ dataset and Keboola bucket id. - connectors/keboola/extractor.py: max_line_size=64 MiB on read_csv_auto so embedded JSON / SQL cells (kbc_component_configuration in particular) do not trip the default 2 MiB ceiling. - connectors/keboola/storage_api.py: GCP backend support — when the Storage API returns a manifest whose slice URLs are gs:// references with a gcsCredentials block, rewrite to the JSON REST download endpoint and authenticate with the issued OAuth bearer token; redact tokens in any surfaced error string. * test: align with new keboola materialized + auto-discover-gate contracts - test_admin_keboola_materialized: rename test_register_keboola_materialized_rejects_missing_source_query → test_register_keboola_materialized_accepts_missing_source_query. v25→v26 introduced 'keboola materialized with NULL source_query means full-table export via Storage API export-async' as the default registration shape; the rejection case is no longer the contract. - test_sync_filter: add list_all() to _StubRegistry. The auto-discover gate in _run_sync now keys off the WHOLE registry (not just local rows) so materialized-only Keboola instances do not re-trigger discovery on every tick. * feat(keboola): native parquet export — skip CSV roundtrip Storage API export-async accepts fileType={csv,parquet}. Switching the materialized sync to parquet eliminates the CSV → DuckDB COPY → parquet roundtrip that pinned a single uvicorn worker over 4 GiB on multi-GB tables (read_csv with all_varchar + max_line_size=64MB has to materialize the whole CSV in memory before COPY can stream out a parquet). Snowflake UNLOAD on Keboola's side already produces typed, self-contained parquet files; the extractor downloads them and renames into place. 
Two cases: - Single-file export (small table): file_info.url points at one signed URL; download_file streams chunks straight to .parquet.tmp and we're done. No DuckDB. - Sliced export (Snowflake UNLOAD respects MAX_FILE_SIZE — 16 MiB default — so anything larger arrives as N parquet slices): each slice is a complete parquet file with its own footer; naive concat would corrupt them. download_file_slices keeps the slices as separate files in a tempdir, then DuckDB COPY (SELECT * FROM read_parquet([slice0, slice1, ...])) merges them into one consolidated parquet. DuckDB streams row groups during this — peak memory bounded to one row group (~1 MiB) regardless of source size. The legacy CSV path stays as the explicit opt-in via source_query= '{"file_type":"csv"}' for projects whose backend can't UNLOAD parquet (none known today; cheap escape hatch). Backward-compat alias KeboolaStorageClient.export_table_to_csv kept. Also fixes a latent bug in download_file's gzip detection: previous heuristic flagged any unencrypted file as gzipped, which would have corrupted parquet downloads at gunzip time. Name-suffix-only now. * fix: tempdir leak cleanup, every 0m schedule, /sync/trigger body shapes Three small self-contained fixes uncovered during agnes-dev cutover. - connectors/keboola/extractor.py: tempfile.TemporaryDirectory now uses ignore_cleanup_errors=True so a worker death mid-write doesn't leave multi-GiB stale slice trees on the boot disk. (12 GiB seen after a disk-full crash where TemporaryDirectory's own cleanup also raised and got swallowed.) - src/scheduler.py: is_valid_schedule accepts 'every 0m' (interval=0 = always due). Force-resync of an errored row no longer requires waiting out the default 'every 1h' interval — admin can flip the schedule, trigger, then flip back. 
- app/api/sync.py: POST /api/sync/trigger accepts both ['table_id'] (legacy bare-array body) and {'tables': ['table_id']} (matches the response payload shape, more discoverable for clients building requests by hand). Malformed bodies return 422 with a structured detail; null/missing means 'sync everything' as before. Tests cover: tempdir cleanup on raise (sliced parquet path), is_valid_schedule + is_table_due 'every 0m' acceptance, and trigger body parametrized matrix (8 valid shapes + 6 rejection cases). * fix: targeted-trigger filter in materialized pass + auto-upgrade defer Two operational gaps observed during agnes-dev cutover, in the same sync-routing area. - _run_materialized_pass now takes a 'tables' arg and skips rows not in the target set with reason='not_in_target'. POST /api/sync/trigger with a body of tables previously only scoped the legacy extractor subprocess — the materialized pass kept iterating every due materialized row, so an admin asking to re-sync kbc_job re-ran every other due materialized row alongside it. Match on registry id OR name (admins commonly pass either form). tables=None preserves the no-filter behavior. - New GET /api/sync/status (public, no auth) returns {locked: bool} off _sync_lock.locked(). agnes-auto-upgrade.sh probes this before docker compose up -d and exits 0 with a 'deferred recreate' log line if a sync is in flight — the next 5-min cron tick retries. Pre-fix, an auto-upgrade triggered mid-sync would recreate the uvicorn worker and kill the in-flight extractor / Snowflake-UNLOAD download (observed when kbc_job's first 7-day retry got SIGKILLed). Connection failures in the probe fall through to the upgrade — being stuck on a wedged image is worse than interrupting a hypothetical sync. * fix: auto-discover protects admin overrides + surfaces drift Two real-world incidents on agnes-dev drove this: 1. kbc_job was registered manually with the correct (in.c-kbc_telemetry, kbc_job) coordinates. 
A naive auto-discover re-run would have inserted a SECOND kbc_job row at the slugified id 'in_c-keboola-storage_kbc_job' (where Keboola's discovery places it) — and that row's Storage API export-async 404s. 2. An earlier auto-discover bug stripped the stage prefix from bucket ids ('c-finance' instead of 'in.c-finance'), inserting 137 rows whose syncs all failed. Fix: - _discover_and_register_tables now builds a plan first (_build_keboola_discovery_plan) classifying each discovered table into one of new / existing_match / existing_drift / invalid, then executes only the 'new' bucket. Drift rows are reported with both sides of the disagreement plus drift_kind: - same_id_diff_coords: registry has the same id but different bucket / source_table (admin migrated coords inline). - name_collision: discovery's slugified id differs from any registry id, but the discovered .name matches an existing row's .name (case-insensitive). Catches the kbc_job case. - Bucket detection now prefers the API's authoritative bucket_id field (separate field on the Keboola tables.list response, normalised by KeboolaClient.discover_all_tables). Falls back to id-string parsing only when bucket_id is missing (older fallback path inside discover_all_tables). - Endpoint POST /api/admin/discover-and-register?dry_run=true returns the plan without writing — would_register, drift, invalid lists. Lets an operator audit before merging discovery with a registry that has admin overrides. Removed 'every 0m' from test_register_request_rejects_malformed_sync_schedule — the runtime started accepting it in the previous commit (force-resync override) and the validator follows suit. * feat(keboola): AGNES_TEMP_DIR routes tempfiles off overlayfs /tmp The container's /tmp lives on the boot disk's overlayfs (29 GiB on agnes-dev, shared with /var). Snowflake UNLOAD of a wide table writes slices into per-call /tmp tempdirs that fill multi-GiB / many-slice exports long before the dedicated data disk fills. 
agnes-dev hit 100% boot-disk while the 20 GiB data disk had 15 GiB free. connectors.keboola.storage_api.get_temp_root() reads AGNES_TEMP_DIR; mkdirs the target on first use; unset / empty / unwritable falls back to None (system tempdir, OSS-pre-fix behaviour). Both materialize_query (parquet path) and _extract_via_legacy (CSV fallback) and the sliced-CSV concat path in storage_api use the helper now. docker-compose.yml defaults AGNES_TEMP_DIR=/data/tmp on app, scheduler, and extract services. The data volume is the dedicated disk in production layouts and a plain docker volume in single-disk dev/laptop setups — same blast radius as the previous /tmp default on the latter, no regression.	2026-05-07 12:12:14 +02:00
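`POST /api/sync/trigger` now accepts either a bare array or a `{'tables': [...]}` object. A null or missing body means "sync everything", and malformed bodies are rejected with 422. The sketch below shows that normalisation without a web framework. It assumes a hypothetical `parse_trigger_body` that raises `ValueError` where the endpoint would return 422. The exact rejection rules of the real validator are not spelled out in the commit; treating `{}` as "sync everything" is an assumption.

```python
from typing import Any, Optional


def parse_trigger_body(body: Any) -> Optional[list[str]]:
    """Normalise a trigger body to a list of table ids, or None for 'sync everything'."""
    if body is None:
        return None
    if isinstance(body, dict):
        # Object form mirrors the response payload: {"tables": [...]}.
        # Assumption: a missing or null "tables" key also means "sync everything".
        body = body.get("tables")
        if body is None:
            return None
    # Bare-array (legacy) form, or the unwrapped "tables" value.
    if not isinstance(body, list) or not all(isinstance(t, str) and t for t in body):
        raise ValueError("expected a list of non-empty table id strings")
    return body
```

The returned list can then be matched against registry id or name, as the targeted-trigger fix in the materialized pass does.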
ZdenekSrotyr	05e535d743	fix(admin/tables): unescape shell-quoting backslashes in descriptions	2026-05-06 10:13:49 +02:00
ZdenekSrotyr	df2c33147c	fix: Devin Review on #194 round 2 — 3 BUG-class findings 1. instance.yaml overlay path now matches read site under STATE_DIR. Three sites updated: - app/api/admin.py:1005 (server-config endpoint writer) - app/api/admin.py:2610 (configure endpoint writer) - app/instance_config.py:106 (overlay reader) All three now go through _state_dir() so under flat-mount layout (STATE_DIR=/data-state) the irreplaceable instance.yaml overlay lands on the state disk (sdc) instead of the regenerable data disk (sdb). Without this fix, .env_overlay correctly went to the state disk while instance.yaml went to the data disk — config would be lost if an operator wiped sdb. 2. Strip customer-specific tokens from OSS repo per CLAUDE.md vendor-agnostic rule: - docker-compose.host-mount.yml: 'a deployer (Groupon FoundryAI)' → 'a deployer in production' - docker-compose.flat-mount.yml: 'caused 2026-05-05 in the Groupon FoundryAI deployment' → generic 'production failure mode' - docs/state-dir.md: rewrote the incident reference to describe the failure mode abstractly without naming the deployment; updated the recommendation table to say 'shadow-mount class' instead of dating the specific incident. 3. Updated docs/state-dir.md 'What reads STATE_DIR' to list all read/write sites including the three migrated in this round (admin.py, instance_config.py, marketplaces.py). ANALYSIS finding (tls-rotate.sh hardcoded host-mount.yml) deferred — same operator-side class as auto-upgrade.sh hardcoded host-mount, documented limitation per the PR body.	2026-05-05 20:02:50 +02:00
ZdenekSrotyr	b6543c9c55	fix: Devin Review on #194 — 2 BUG-class findings 1. .env_overlay write paths now match read path under STATE_DIR. app/main.py:343 reads via _state_dir() (post-PR #194), but two write sites still hardcoded ${DATA_DIR}/state/.env_overlay: - app/api/admin.py:2687 — configure endpoint secrets persistence - app/api/marketplaces.py:152 — marketplace PAT persistence Under flat-mount layout (STATE_DIR=/data-state) the admin UI wrote secrets to /data/state/.env_overlay while the app read from /data-state/.env_overlay, silently dropping the value on next restart. Both write sites now go through _state_dir(). 2. host-mount.yml: caddy inherits data:/srv:ro from base, but with no service populating the data: named volume (other services switched to direct /data binds), the inherited mount points at an empty Docker volume — try_files finds nothing, every parquet download falls through to uvicorn, defeating the v0.36.0 file_server bypass under the host-mount layout. Added a caddy override that restates all mounts including a direct /data:/srv:ro bind. Mirrors the comment + treatment already in flat-mount.yml.	2026-05-05 19:47:12 +02:00
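Both Devin review rounds come down to making every read and write site resolve the state directory through one helper. Here is a minimal sketch of such a resolver. It assumes, from the paths quoted in these commits, that `STATE_DIR` wins when set and `${DATA_DIR}/state` is the fallback. The names `resolve_state_dir` and `overlay_path`, and the `/data` default, are illustrative and are not the project's `_state_dir()`.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_state_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the state directory: $STATE_DIR if set, else $DATA_DIR/state."""
    env = os.environ if env is None else env
    explicit = env.get("STATE_DIR", "").strip()
    if explicit:
        return Path(explicit)
    # Assumed default; the commits only show /data as the data mount.
    return Path(env.get("DATA_DIR", "/data")) / "state"


def overlay_path(name: str, env: Optional[Mapping[str, str]] = None) -> Path:
    # Writers and readers both call this, so their paths cannot drift apart.
    return resolve_state_dir(env) / name
```

Under the flat-mount layout (`STATE_DIR=/data-state`), `.env_overlay` and `instance.yaml` then land on the state disk for both the writer and the reader.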
ZdenekSrotyr	4f04235502	feat(bigquery): bq_query_timeout_ms knob; default 600s (was 90s) DuckDB BigQuery extension defaults `bq_query_timeout_ms` to 90 s, which is too tight for analyst-scale queries against view-backed BQ datasets. `agnes query --remote` HTTP 400'd with `Binder Error: Query execution exceeded the timeout. Job ID: ...` whenever the underlying BQ job ran longer than 90 s, even though the job itself was healthy. Add `data_source.bigquery.query_timeout_ms` (default 600 000 ms = 10 min, sentinel 0 falls through to the extension default). Applied via `SET bq_query_timeout_ms` after every `LOAD bigquery` on every BQ-touching DuckDB session: orchestrator's `_remote_attach` ATTACH path, BqAccess session factory, and the standalone extractor. Configurable via `/admin/server-config` UI. Fail-soft: extension versions that don't recognise the setting silently keep the default rather than poisoning the session.	2026-05-05 16:40:40 +02:00
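The fail-soft, sentinel-aware `SET` described above could look roughly like this. It is a sketch that assumes the caller passes any callable that executes one SQL statement. `apply_bq_query_timeout` is a hypothetical name, not the project's function.

```python
from typing import Callable


def apply_bq_query_timeout(execute: Callable[[str], object], timeout_ms: int) -> bool:
    """Hypothetical sketch: issue SET bq_query_timeout_ms after LOAD bigquery.

    Returns True when the setting was sent successfully. A timeout of 0 is
    the sentinel for "keep the extension default", so nothing is executed.
    """
    if timeout_ms == 0:
        return False
    try:
        execute(f"SET bq_query_timeout_ms = {int(timeout_ms)}")
    except Exception:
        # Fail-soft: an extension build that does not recognise the setting
        # keeps its own default instead of breaking the session.
        return False
    return True
```

With DuckDB's Python API, `execute` could simply be a connection's bound `execute` method, called once per session after the extension is loaded.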
ZdenekSrotyr	d878764ac1	fix(session-collector-api): mirror sibling endpoints' audit-on-exception (Devin Review on #179) Devin flagged that run_session_collector still had the same audit-skip gap I fixed in run_verification_detector and run_corporate_memory in the previous two rounds — a PermissionError walking /home, an OSError on /data/user_sessions mkdir, or any other unhandled exception from collector.run() would skip the audit_log row and only show in docker logs. Same try/except + unhandled_error pattern as the sibling endpoints. All three LLM-pipeline run-* endpoints now record their failures the same way; /admin/scheduler-runs sees them. Regression test in tests/test_admin_run_endpoints.py::TestRunSessionCollector::test_unhandled_exception_still_audits.	2026-05-05 09:31:33 +02:00
ZdenekSrotyr	e86da72997	fix(corporate-memory-api): mirror verification-detector audit-on-exception (Devin Review on #179) Devin flagged that run_corporate_memory still had the same audit-skip gap I just fixed in run_verification_detector — if collect_all() throws anything other than the already-translated ValueError (DuckDB lock, network blip, unexpected SDK error), the audit_log row was never written and /admin/scheduler-runs missed the failure. Same try/except + unhandled_error pattern as the verification_detector fix from `4c4dfee8`. Regression test in tests/test_admin_run_endpoints.py::TestRunCorporateMemory::test_unhandled_exception_still_audits.	2026-05-05 09:11:13 +02:00
ZdenekSrotyr	4c4dfee8e6	feat(profile): /profile/sessions page + audit on detector exception + correct SCHEDULER_AUDIT_ACTIONS Three changes addressing user feedback during e2e test of #179 + Devin Review on `e86dd5ed`. 1) /profile/sessions — new self-service user page in the user menu. Lists all session jsonls the caller uploaded via `agnes push` joined against session_extraction_state. Each row shows uploaded_at, file size, status badge (pending/processed/extracted), processed_at, and items_extracted. The page docstring + help text explicitly call out that items_extracted=0 means the verification detector ran fine but the LLM found no claims to track — that's the documented "no items" outcome, not a broken pipeline. Closes the gap surfaced during the e2e test of #176 where a user could see their sessions on disk and process them through the LLM but had no UI to inspect what happened. 2) run_verification_detector audits unhandled exceptions (Devin #1). If detector.run() threw anything other than the already-translated ValueError, the audit_log row was never written. The endpoint now wraps detector.run in try/except, records the exception in audit_params["unhandled_error"], then re-raises as 500 after audit. The /admin/scheduler-runs page surfaces the failure row with the error type + message. 3) SCHEDULER_AUDIT_ACTIONS list corrected (Devin #2). Previous list had "marketplaces_sync_all" (wrong — actual is "marketplace.sync_all") plus "data_refresh" and "scripts_run_due" which app/api/sync.py and app/api/scripts.py don't write to audit_log. Fixed to the four actually-logged strings; comment points at the missing audit calls as a follow-up. Tests: tests/test_web_ui.py adds TestAdminRoleGuards::test_profile_sessions_page_no_admin_required and tightens test_admin_scheduler_runs_page_admin_only to assert the correct marketplace.sync_all string.	2026-05-05 08:57:35 +02:00
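The audit-on-exception pattern shared by the three run-* endpoints can be sketched without FastAPI. Here `HTTPError` stands in for an HTTP 500 response, and `audit` stands in for the audit_log writer. Both are hypothetical, as are the action names used in the usage example.

```python
from typing import Any, Callable


class HTTPError(Exception):
    """Stand-in for an HTTP error response (e.g. FastAPI's HTTPException)."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def run_with_audit(action: str, job: Callable[[], dict],
                   audit: Callable[[str, dict], None]) -> dict:
    """Hypothetical sketch: write exactly one audit row, even when job() fails."""
    params: dict[str, Any] = {}
    try:
        params.update(job())
    except Exception as exc:
        # Record the failure first so /admin/scheduler-runs can show it...
        params["unhandled_error"] = f"{type(exc).__name__}: {exc}"
        audit(action, params)
        # ...then surface it to the caller as a 500.
        raise HTTPError(500, params["unhandled_error"]) from exc
    audit(action, params)
    return params
```

The important property is ordering: the audit row is written before the exception is re-raised, so a crash can no longer skip it.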
ZdenekSrotyr	e68c2d3f0f	fix(session-collector): argv-free run() helper, drop SystemExit footgun (Devin Review on #179) run_session_collector called collector.main() which did argparse.parse_args() on uvicorn's sys.argv (['app.main:app', '--host', ...]) → sys.exit(2) → SystemExit(2), which inherits from BaseException, escapes FastAPI handlers, and propagates through the thread pool. Every scheduler tick that fired the endpoint either 500-ed or risked killing the uvicorn worker. services/session_collector/collector.py now exposes run(dry_run, verbose) that returns (rc, stats); main() is a thin CLI shim that parses argv and delegates. The admin endpoint calls run() directly and audit-logs the per-run stats (users_processed, files_copied, files_skipped) instead of just the rc. Three regression tests in TestRunHelper. Closes Devin Review finding on app/api/admin.py:2819 (#179).	2026-05-05 06:31:55 +02:00
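The split between a library entry point and a thin CLI shim can be sketched as follows. The stats keys come from the commit message. The function bodies and the `--dry-run` / `--verbose` flag names are illustrative assumptions.

```python
import argparse
from typing import Optional, Sequence


def run(dry_run: bool = False, verbose: bool = False) -> tuple[int, dict]:
    """Library entry point: no argv parsing, no sys.exit()."""
    stats = {"users_processed": 0, "files_copied": 0, "files_skipped": 0}
    # ... the real collection work would update stats here ...
    if verbose:
        print(f"dry_run={dry_run} stats={stats}")
    return 0, stats


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Thin CLI shim: parse argv, then delegate to run().

    Passing argv=None makes argparse read sys.argv, which is only safe
    when this module is actually the process's CLI entry point.
    """
    parser = argparse.ArgumentParser(prog="session-collector")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--verbose", action="store_true")
    args = parser.parse_args(argv)
    rc, _stats = run(dry_run=args.dry_run, verbose=args.verbose)
    return rc
```

Feeding uvicorn-style arguments into `main()` reproduces the footgun. argparse calls `sys.exit(2)`, and the resulting `SystemExit` is not an `Exception` subclass, so a plain `except Exception` in a request handler would not catch it.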
ZdenekSrotyr	9f33e24bf9	fix(config): overlay-aware LLM consumers + env-ref resolution (#179 review) Devin BUG: /api/admin/configure seeds an ai: block to the writable overlay at DATA_DIR/state/instance.yaml, but the three LLM consumers imported from config.loader.load_instance_config — which reads the static config dir only. Even if they had read the overlay, the loader ran yaml.safe_load directly without passing through _resolve_env_refs, so '${ANTHROPIC_API_KEY}' would have stayed a literal placeholder. The pipeline appeared to work because the factory falls back to the env var directly, but the overlay path itself was dead code. Two fixes, both required: 1. Switched the three LLM consumers to app.instance_config.load_instance_config: - services/corporate_memory/collector.py:collect_all - services/verification_detector/__main__.py:main - app/api/admin.py:run_verification_detector 2. app/instance_config.py runs the loaded overlay through config.loader._resolve_env_refs before the deep-merge, so '${ANTHROPIC_API_KEY}' resolves at config-load time. New regression suite tests/test_instance_config_overlay.py pins: - env-ref resolution against the overlay (resolved when env set, empty when env missing — never the literal placeholder) - deep-merge still preserves static-only sections - the three consumers reach app.instance_config (inspected via inspect.getsource so a future refactor that reverts the import fails the test) - end-to-end: a seeded overlay + ANTHROPIC_API_KEY env reaches the factory with a resolved api_key	2026-05-05 05:57:22 +02:00
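The two required fixes, resolving `${VAR}` references in the overlay and then deep-merging it over the static config, can be sketched like this. `resolve_env_refs` and `deep_merge` are hypothetical stand-ins for `config.loader._resolve_env_refs` and the existing merge, not their actual code. Following the regression suite's contract, a missing variable resolves to an empty string, never the literal placeholder.

```python
import os
import re
from typing import Any, Mapping, Optional

_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env_refs(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively replace ${VAR} with env[VAR], or "" when VAR is unset."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        return _ENV_REF.sub(lambda m: env.get(m.group(1), ""), value)
    if isinstance(value, dict):
        return {k: resolve_env_refs(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_env_refs(v, env) for v in value]
    return value


def deep_merge(base: dict, overlay: dict) -> dict:
    """Overlay wins on leaves; sections that exist only in base survive."""
    merged = dict(base)
    for key, val in overlay.items():
        if isinstance(val, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], val)
        else:
            merged[key] = val
    return merged
```

Resolution happens on the overlay *before* the merge, so a `${ANTHROPIC_API_KEY}` placeholder seeded by `/api/admin/configure` reaches consumers as a real value.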
ZdenekSrotyr	98a8aba3be	fix(tests): align test_llm_connector with new factory + fail-fast (#179 review) The PR rewrote collect_all() to call the new create_extractor_from_env_or_config() helper, but the existing tests still mocked the old direct create_extractor() symbol and the old silent-skip-on-missing-config behavior. Five tests in TestCorporateMemoryCollector and one in TestCollectorExtractorIntegration were red on the PR branch. Changes: - Tests now mock connectors.llm.create_extractor_from_env_or_config (the symbol the collector imports lazily). - Renamed test_collect_all_no_ai_config_skips -> test_collect_all_no_ai_config_or_env_raises and test_collector_handles_invalid_config -> test_collector_raises_on_invalid_config. Both assert pytest.raises(ValueError) — the explicit fail-fast semantics defect 5 of #176 was supposed to enforce. - collect_all() no longer swallows the factory's ValueError into stats["errors"]; it propagates so the scheduler / admin endpoint surface the actionable misconfiguration message instead of pretending the run was a no-op. - /api/admin/run-corporate-memory translates the propagated ValueError into a 500 with the factory's message, matching /api/admin/run-verification-detector.	2026-05-05 05:55:01 +02:00
ZdenekSrotyr	45de71e8ab	fix(scheduler): wire LLM pipeline into scheduler-v2 (#176) The session-collector, verification-detector, and corporate-memory services now run on the same scheduler-v2 model that already drives data-refresh, health-check, script-runner, and marketplaces: - New admin endpoints in app/api/admin.py: POST /api/admin/run-session-collector POST /api/admin/run-verification-detector POST /api/admin/run-corporate-memory All admin-gated, sync-def (FastAPI thread pool), with one audit row per invocation. Same single-writer-of-system.duckdb pattern as the existing /api/marketplaces/sync-all job. - services/scheduler/__main__.py JOBS gains three entries with offset cadences (10m / 15m / 17m, all coprime modulo the 30s tick) so the three LLM-backed jobs don't fire on the same tick and stack their API + DB load. - The verification-detector endpoint surfaces the LLM factory's fail-fast ValueError as HTTP 500 with the actionable message, preserving the no-silent-skip contract from the previous commit. Tests: - tests/test_admin_run_endpoints.py covers admin gating + scheduler registration + endpoint contract. - tests/test_scheduler_sidecar.py existing tests continue to pass.	2026-05-04 23:57:43 +02:00
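How often the three cadences land on the same 30 s tick is a least-common-multiple calculation. This sketch makes two assumptions: every job first fires at tick 0, and the 10m / 15m / 17m cadences map onto the jobs in the order the services are listed. The scheduler's real offsets may differ.

```python
from functools import reduce
from math import gcd
from typing import Iterable

TICK_S = 30  # scheduler tick length in seconds
# Assumed mapping of the 10m / 15m / 17m cadences onto the three jobs.
CADENCES_MIN = {
    "session-collector": 10,
    "verification-detector": 15,
    "corporate-memory": 17,
}


def lcm(a: int, b: int) -> int:
    return a * b // gcd(a, b)


def coincidence_minutes(names: Iterable[str]) -> float:
    """Minutes between ticks on which all named jobs fire together,
    assuming every job fires on exact multiples of its cadence from tick 0."""
    ticks = [CADENCES_MIN[n] * 60 // TICK_S for n in names]
    return reduce(lcm, ticks) * TICK_S / 60
```

Under these assumptions, the 10m and 15m jobs share a tick every 30 minutes. The 17m job lines up with either of them only every few hours.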
ZdenekSrotyr	bbb04ac041	fix(setup): seed default ai: block + env-var fallback (#176) POST /api/admin/configure now writes a default ai: block into the instance.yaml overlay when the request leaves it untouched and either ANTHROPIC_API_KEY or LLM_API_KEY is set in the environment. The block references the env var via ${VAR} syntax — secrets never land in YAML. connectors.llm.factory grows create_extractor_from_env_or_config which falls back to ANTHROPIC_API_KEY / LLM_API_KEY when ai_config is empty and raises a clear ValueError when neither is available. Both services/corporate_memory and services/verification_detector switch to the new helper, replacing the old 'silently skip when ai: missing' path that was the silent-failure root cause. Tests: - tests/test_setup_ai_block.py — overlay seeding contract. - tests/test_llm_provider_env_fallback.py — fallback + fail-fast.	2026-05-04 23:55:19 +02:00
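The fallback-then-fail-fast order described above can be sketched as a key resolver. `resolve_llm_api_key` is hypothetical, since the real helper builds an extractor rather than returning a key. Checking `ANTHROPIC_API_KEY` before `LLM_API_KEY` is also an assumption based on the order the commit lists them.

```python
import os
from typing import Mapping, Optional

_FALLBACK_ENV_VARS = ("ANTHROPIC_API_KEY", "LLM_API_KEY")  # assumed precedence


def resolve_llm_api_key(ai_config: Optional[dict],
                        env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical sketch: prefer the ai: block, else env vars, else fail fast."""
    env = os.environ if env is None else env
    if ai_config and ai_config.get("api_key"):
        return ai_config["api_key"]
    for name in _FALLBACK_ENV_VARS:
        if env.get(name):
            return env[name]
    # No silent skip: the caller surfaces this message (e.g. as an HTTP 500).
    raise ValueError(
        "No LLM credentials: add an ai: block to instance.yaml or set "
        "ANTHROPIC_API_KEY / LLM_API_KEY"
    )
```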
ZdenekSrotyr	103efb69f0	chore(cli-rename): replace stale `da` verbs in active code paths Bring admin UI, audit-log messages, code comments, and analyst-facing skill docs in line with the post-bootstrap CLI surface (`agnes pull`, `agnes push`, `agnes init`, `agnes snapshot create`). The legacy `_LEGACY_STRINGS` detection tuple in `app/api/claude_md.py` and the hook upgrade markers in `cli/lib/hooks.py` are intentionally left as-is — they exist precisely to flag pre-rewrite content for re-authoring. Strip "(folded from `da metrics list`)" / "(lifted from `da metrics show`)" / "Replaces the old `da analyst status`" docstring noise — the rename history is in CHANGELOG.md, not in module docstrings.	2026-05-04 21:10:43 +02:00
ZdenekSrotyr	e438170ade	merge: pull #174 (BQ materialize view fix + concurrency, 0.33.0) into bootstrap branch Brings in zs/materialize-sync-fix (PR #174): - BigQuery view materialize works (wrap admin SQL in bigquery_query()) - Per-table mutex + fcntl.flock for concurrent COPY corruption - Cost guardrail dry-run engages on materialized rows - Schema v23 -> v24 migration: rewrite source_query to BQ-native - Server-generated trivial source_query from bucket+source_table - Validator backtick relaxation for materialized rows - 0.33.0 release cut Conflict resolution: - CHANGELOG.md: keep our [Unreleased] (bootstrap rewrite content) ABOVE the new [0.33.0] section from #174. The bootstrap rewrite remains unreleased; it'll cut 0.34.0 (or later) when this PR merges to main. - tests/conftest.py: union — keep our analyst-bootstrap fixture re-export AND #174's bq_instance / stub_bq_extractor fixtures. - pyproject.toml auto-merged to 0.33.0 (matches the cut), correct. - src/db.py auto-merged: SCHEMA_VERSION = 24, _v23_to_v24_finalize added — no overlap with our work which left schema at v23. - CLAUDE.md auto-merged: schema-history paragraph extended with v24. Verified: 79/79 across CLI bootstrap suite + materialize suite + schema v24 migration tests pass locally on Python 3.13/macOS.	2026-05-04 20:53:00 +02:00
ZdenekSrotyr	3d58768143	fix: address Devin Review findings — incomplete renames + estimate guard 13 Devin findings across 10 files: 🔴 Critical: - app/api/v2_catalog.py:42 — `_fetch_hint` returns `da fetch` in /api/v2/catalog responses (user-visible in every catalog list) - cli/skills/agnes-data-querying.md — 11 stale `da fetch`/`da sync` refs in the bundled skill markdown - config/claude_md_template.txt:38 — referenced `agnes pull --docs-only` flag that does NOT exist in agnes pull (removed; spec only ships --quiet/--json/ --dry-run) 🟡 Important: - app/api/admin.py:252 — `da fetch` in bq_max_scan_bytes hint - cli/commands/auth.py:119 — `da sync` in import-token docstring (--help text) - cli/commands/tokens.py:48 — "Export it so `da` can use it" prose - ARCHITECTURE.md — 4 stale rows in CLI commands table - README.md — stale paragraphs for analysts (da sync, da analyst setup) 🚩 Substantive observations addressed: - app/api/query.py:249,302,489 — server-side error/help strings still said `da sync`/`da fetch` (returned in API responses to clients) - cli/commands/snapshot.py:235-241 — DuckDB existence guard incorrectly blocked `--estimate` (server-side dry-run that never opens local DB). Added test ensuring estimate path skips the guard. Skipped (intentionally historical): - app/api/admin.py:2377,2429,2437 — historical comments describing past manifest-vs-sync_state bug; past tense, accurate to keep as `da sync`.	2026-05-04 20:05:06 +02:00
ZdenekSrotyr	6c0846fd17	feat(config): expose materialize.lock_ttl_seconds in server-config New top-level 'materialize' section, single field (lock_ttl_seconds). Default 86400 (24h). Backs the file-lock TTL reclaim added in the per-table-mutex change. Editable via PUT /api/admin/server-config and the /admin/server-config UI.	2026-05-04 18:52:54 +02:00
ZdenekSrotyr	3871d5320a	feat(admin): server-generate materialized source_query, allow BQ backticks When admin registers a materialized BQ row with bucket+source_table but no source_query, the server generates 'SELECT * FROM `<project>.<ds>.<tbl>`' from instance.yaml's configured BQ project. Same fallback fires on PUT when flipping to materialized. The backtick rejection guard, which was appropriate for DuckDB-flavor source_query, is relaxed for materialized rows since the new wrapping path (Task 2) runs admin SQL through BQ jobs API which uses BQ-native syntax (backticks for dashed identifiers).	2026-05-04 18:37:27 +02:00
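The server-generated fallback query is a simple template. This sketch builds it from the configured project plus the row's bucket (dataset) and source_table. The input validation is an illustrative addition, not something the commit describes, and the function name is hypothetical.

```python
def default_source_query(project: str, dataset: str, table: str) -> str:
    """Hypothetical sketch: BQ-native fallback query for a materialized row.

    The whole path is backtick-quoted so dashed project IDs parse under
    BigQuery Standard SQL.
    """
    for part in (project, dataset, table):
        # Illustrative guard: empty parts or embedded backticks would
        # produce a broken (or injectable) identifier.
        if not part or "`" in part:
            raise ValueError(f"invalid identifier part: {part!r}")
    return f"SELECT * FROM `{project}.{dataset}.{table}`"
```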
ZdenekSrotyr	1563b05f2e	refactor(cli): hard-cutover env vars + config dir to AGNES_* Task 0.5 of clean-analyst-bootstrap. Greenfield rewrite — no fallback, no aliases. Existing dev environments lose their cached PAT and must re-authenticate. Env var renames (hard cutover): - DA_CONFIG_DIR -> AGNES_CONFIG_DIR - DA_SERVER -> AGNES_SERVER - DA_SERVER_URL -> AGNES_SERVER_URL (test-only stale ref, not in spec) - DA_NO_UPDATE_CHECK -> AGNES_NO_UPDATE_CHECK - DA_LOCAL_DIR -> AGNES_LOCAL_DIR - DA_TOKEN -> AGNES_TOKEN - DA_STREAM_RETRIES -> AGNES_STREAM_RETRIES Config dir rename: ~/.config/da/ -> ~/.config/agnes/ (across code, comments, docstrings, error messages, install templates, dev scripts). Stale `da X` references in CLI source (and adjacent app/, tests/): swept docstrings, comments, help text, and error messages where the verb survives the rewrite (init, pull, push, catalog, status, diagnose, auth, admin, skills, query, schema, describe, explore, disk-info, snapshot, login, logout, whoami, server, setup) and replaced `da X` with `agnes X`. Intentionally kept `da sync`, `da fetch`, `da analyst`, `da metrics` — those verbs are removed in later tasks; the legacy strings will be detected by `_LEGACY_STRINGS` (added in Task 2). Test fixes: - TestCLIVersion now asserts output starts with `agnes ` (was `da `). Test results: 2675 passed, 25 skipped (full pytest run, excluding 9 pre-existing test_db.py / test_user_management.py / test_e2e_extract.py / test_cli_binary_rename.py failures unrelated to this rename).	2026-05-04 16:35:44 +02:00
ZdenekSrotyr	28aba4c1f9	fix(query): #168 review iter 3 — RBAC name-vs-id, placeholder dead code Devin Review iter #3 found 3 new real bugs after iter #2's fixes landed. 🔴 RBAC check at app/api/query.py:362 used `row["name"]` against `accessible_set`, but `accessible_set` is keyed by registry IDs (`get_accessible_tables` returns `resource_grants.resource_id` — table IDs, not display names). Confirmed by `_table_blocks` projection at `app/resource_types.py:157-158`. When `id != name` (e.g. `id="bq.finance.ue", name="ue"`), non-admin users with valid grants got 403 `bq_path_access_denied`. Switch to `row["id"]`. 🚩 Bare-name pass at app/api/query.py:332 had the same name-vs-id mismatch (different impact): legitimate accessible rows were skipped from `dry_run_set`, so the cost guardrail under-counted scan bytes for non-admin users. Could let an over-cap query through and under-bill quota. Switch to `row_id` comparison. 🟡 `placeholder_from` for billing_project was dead code. `_BQ_OPTIONAL_FIELD_DEFAULTS["billing_project"] = ""` seeded an empty string into every GET payload via `_ensure_bq_optional_fields`. JS `isUnset = (value === undefined)` evaluated False, so the `(defaults to <project>)` placeholder NEVER rendered. Drop the seed — field stays in `known_fields` (UI sees it) but routes through the unset rendering path on GET, where placeholder_from fires. Tests: test_get_surfaces_bq_fields_even_when_unset assertion flipped from "billing_project IS present" to "billing_project NOT auto-seeded" to lock in the new shape. 67 affected tests pass.	2026-05-04 13:51:36 +02:00
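The RBAC bug comes down to comparing the wrong key. This sketch uses the commit's own example row (`id="bq.finance.ue", name="ue"`) to contrast an id-keyed filter with a name-keyed one. `visible_rows` is a hypothetical name.

```python
def visible_rows(rows: list[dict], accessible_ids: set[str], is_admin: bool = False) -> list[dict]:
    """Hypothetical sketch: filter registry rows by grants.

    Grants are keyed by registry id (resource_grants.resource_id), so the
    membership test must use row["id"], never the display name.
    """
    if is_admin:
        return list(rows)
    return [row for row in rows if row["id"] in accessible_ids]
```

As soon as `id != name`, a name-keyed check rejects every granted row. That produced the spurious 403s and the under-counted dry-run set.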
ZdenekSrotyr	6423888d02	fix(query): #160 move bq_max_scan_bytes to data_source.bigquery (UI editable) E2E test on dev VM revealed: spec said "configurable via /admin/server-config" for the cost guardrail cap, but the underlying read path was `api.query.bq_max_scan_bytes` and `api` is NOT in `_EDITABLE_SECTIONS`. POST to /admin/server-config rejected `{"sections":{"api":...}}` as "unknown section(s): api" — the cap was only adjustable via direct YAML edit. Move to `data_source.bigquery.bq_max_scan_bytes`: - `_default_remote_query_cap_bytes()` reads from the new path. - Add to `_OPTIONAL_FIELDS["data_source"]["bigquery"]["fields"]` with the same shape as `max_bytes_per_materialize` (kind=int, default 5 GiB, hint). - Add to `_BQ_OPTIONAL_FIELD_DEFAULTS` so it surfaces in the GET payload even when YAML omits it. Convention now mirrors `max_bytes_per_materialize` — both BQ cost guardrails live under `data_source.bigquery`, both editable in the UI.	2026-05-04 12:46:38 +02:00
ZdenekSrotyr	39bdc1ff45	feat(admin): #160 BQ test-connection endpoint + billing_project placeholder UI Closes the operator-side half of the reporter's loop. The CLI fix in the previous commit makes USER_PROJECT_DENIED errors readable to analysts; this commit lets admins verify reachability proactively from /admin/server-config without waiting for analyst reports. New endpoint POST /api/admin/bigquery/test-connection (app/api/admin_bigquery_test.py, ~110 LOC): - Depends(require_admin); registered in app/main.py. - Builds BqAccess via existing get_bq_access(), runs `SELECT 1 AS ok` with a 10s polling timeout. - 200 with {ok, billing_project, data_project, elapsed_ms} on success. - 400 for `BqAccessError(not_configured)` (operator config issue). - 502 for any other typed BqAccessError or unknown upstream exception. - 504 for concurrent.futures.TimeoutError; best-effort cancel_job invoked (BQ-side cancel may still run; documented caveat). Server-config placeholder (app/api/admin.py + admin_server_config.html): - `data_source.bigquery.billing_project` field-spec gains `placeholder_from: ["data_source", "bigquery", "project"]`. - renderLeafInput's text branch reads `opts.spec.placeholder_from`, walks the loaded `original` config dict, injects `placeholder="(defaults to <project>)"` into the input HTML at construction time. Admin sees the access.py:339-340 fallback rule visible directly in the UI without reading source. UI button: - "Test BigQuery connection" button next to data_source's Save button. - onTestBigQuery() POSTs to the endpoint, renders structured result inline (green check + elapsed_ms on success; red kind + hint on failure). Tests: 6 endpoint cases + 1 placeholder payload test = 7 GREEN. 62 total across the affected admin server-config test files.	2026-05-04 10:31:35 +02:00
ZdenekSrotyr	9d0e4e687d	refactor(bq): #160 remove legacy_wrap_views config knob (always-wrap) Now that VIEW/MATERIALIZED_VIEW always wrap via bigquery_query() (the prior `legacy_wrap_views=True` branch behavior, made unconditional in the previous commit), the toggle has no semantic meaning and is removed across the codebase. Production code: - app/api/admin.py: drop the field from _OPTIONAL_FIELDS["data_source"]["bigquery"]["fields"] and from _BQ_OPTIONAL_FIELD_DEFAULTS, plus the comment block above the defaults dict. - config/instance.yaml.example: drop the example snippet. - src/orchestrator.py: update the inner-objects skip-branch comment to reflect the new BQ behavior (the skip itself stays — keboola use_extension=False still inserts _meta rows without inner views). - app/web/templates/admin_tables.html: rewrite operator copy in the register and edit forms to reflect always-wrap. Tests: - tests/test_admin_server_config.py (TestServerConfigBigQueryFields): flip assertions from "field IS present" to "field NOT present" on legacy_wrap_views. Drop the test_post_persists_legacy_wrap_views test since the field no longer exists. - tests/test_admin_server_config_known_fields.py: same flip on the known-fields registry assertion. - tests/test_bigquery_extractor.py: drop the obsolete test_view_entity_does_not_create_master_view_by_default (asserted the bug we fixed) and test_legacy_wrap_views_toggle_restores_old_behavior (toggle no longer meaningful). Update remaining test docstrings. Operators with `legacy_wrap_views: true` set in their overlay get the new (equivalent) behavior automatically — the unrecognized key is silently ignored by the YAML loader. Operators with `false` get the issue-#160 fix as a behavior change, not a regression. Spec gate updated: production code grep gate `grep -rn 'legacy_wrap_views' connectors app src config cli` must return zero. tests/ excluded — historical "removed in #160" breadcrumbs and `assert "X" not in fields` regression guards retained as anti-regression signals.	2026-05-04 10:31:35 +02:00
ZdenekSrotyr	8030a867ec	fix(admin-api): keep source_type validator permissive when primary is 'local' (bootstrap) The strict source_type-availability validator from the prior commit broke ~12 existing tests that register tables on the default test instance (where `data_source.type` resolves to 'local' since no instance.yaml is loaded). The intent of the validator is to catch explicit misconfig: `type=bigquery` instance + `source_type=keboola` payload with no `data_source.keboola.*` block. The bootstrap workflow — admin sets up a fresh instance and registers a few tables before pointing at a real source — should not be gated here. Loosen the check: when `get_data_source_type()` returns 'local' (the fallback when no `data_source.type` is set), skip the rejection. The explicit mismatch case still 422s because that path resolves `configured_primary` to a real source type. Also adds an autouse keboola_instance fixture to test_journey_sync_query.py which exercises Keboola registrations through the full sync→query flow — the fixture documents the test's data-source assumption rather than relying on the bootstrap escape hatch.	2026-05-01 23:09:15 +02:00
ZdenekSrotyr	bc3ba0d43d	feat(admin-api): reject register-table for source_type not configured on instance E2E sub-agent finding: instance configured with `data_source.type='bigquery'` and no `data_source.keboola.` block. Admin POSTs `{source_type: 'keboola'}` to /api/admin/register-table → returns 201, row lands in the registry, but never syncs because the scheduler has no Keboola URL/token to ATTACH against. Operator only notices the gap when `da catalog` keeps showing nothing. The new `_validate_source_type_configured` helper runs immediately after the id/view-name collision checks in `register_table`. A source_type is considered configured when: - it matches `get_data_source_type()` (the instance's primary), OR - a non-empty `data_source.<source_type>` block exists in the effective `instance.yaml` (multi-source instance), OR - it's in `_SOURCE_TYPES_INDEPENDENT_OF_DATA_SOURCE` (Jira / local — both get data through paths that don't involve `data_source.`). Returns 422 with a message that names the configured primary source and points at `/admin/server-config` for enabling a secondary one. None / empty source_type is still tolerated for backward compat with legacy CLI scripts that don't set the field — the route resolves it later. 5 new tests cover: keboola-on-bq rejected, bq-on-keboola rejected, matching source_type still works, jira allowed regardless, omitted source_type passes through. Existing tests that registered Keboola rows on the unconfigured default test instance now opt into a `keboola_instance` fixture to satisfy the new validator (tests/test_admin_bq_register.py + .keboola_materialized + .unregister_cleanup; the multi-source PUT test in test_admin_bq_register adds a `keboola` block to its synthetic config). Pre-existing test_missing_project_returns_error failure in TestRebuildFromRegistry is unrelated (config-cache leakage from a previous test in the same class) — confirmed pre-existing on the prior commit via `git stash` reproduction.	2026-05-01 23:04:51 +02:00
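Taken together with the bootstrap relaxation in 8030a867ec, the validator's decision can be sketched as a pure predicate. `is_source_type_configured` is a hypothetical stand-in, since the real helper raises a 422 instead of returning a bool. The set contents mirror the commit's "Jira / local" description.

```python
from typing import Optional

# Mirrors the described _SOURCE_TYPES_INDEPENDENT_OF_DATA_SOURCE (Jira / local).
_SOURCE_TYPES_INDEPENDENT = frozenset({"jira", "local"})


def is_source_type_configured(source_type: Optional[str], primary: str,
                              instance_cfg: dict) -> bool:
    """Hypothetical sketch of the register-table source_type check."""
    if not source_type:
        return True  # legacy callers that omit the field; resolved later
    if primary == "local":
        return True  # bootstrap: no data_source.type configured yet
    if source_type == primary:
        return True
    if source_type in _SOURCE_TYPES_INDEPENDENT:
        return True
    # Multi-source instance: a non-empty data_source.<source_type> block.
    return bool(instance_cfg.get("data_source", {}).get(source_type))
```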
ZdenekSrotyr	dd46461c6c	fix(admin+orchestrator): DELETE registry drops parquet + sync_state; rebuild skips orphan parquets E2E sub-agent finding: register a materialized BQ row → sync to materialize the parquet at `/data/extracts/bigquery/data/<id>.parquet` → DELETE the registry row. The DB row goes away but: - the parquet file stays on disk forever, AND - the sync_state row stays, so `/api/sync/manifest` keeps advertising the dropped table to `da sync`, AND - the orchestrator's next rebuild can resurrect a master view by picking up the leftover parquet. Two-part fix in `unregister_table`: 1. For materialized rows on bigquery/keboola, remove `${DATA_DIR}/extracts/<source_type>/data/<name>.parquet` (and any stale `<name>.parquet.tmp` from a crashed prior materialize). Filename is keyed on `table_registry.name` to match sync_state bookkeeping. File-removal errors are logged but don't fail the DELETE — the registry row is already gone, and an orphan parquet won't get a master view at next rebuild because the orchestrator's _meta-driven scan never picks up bare parquet files. 2. Always clear `sync_state` + `sync_history` rows for the dropped table_id so the manifest stops advertising the table — applies to all source types and modes, not just materialized, since any synced row had a sync_state entry. Orchestrator-side defensive guard (Finding 2b) is a no-op in the current implementation: `_attach_and_create_views` only creates master views from `_meta` rows in each connector's `extract.duckdb`, so a parquet without a matching `_meta` entry is already invisible to the rebuild. The new test `test_orchestrator_skips_orphan_parquet_in_extracts` is kept as a regression guard for that contract. 5 tests cover: BQ + Keboola materialized DELETE removes parquet, remote DELETE doesn't error trying to remove a non-existent file, sync_state cleared on DELETE, orchestrator orphan-skip invariant.	2026-05-01 22:54:11 +02:00
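The best-effort file cleanup in part 1 might look like this. It is a sketch: the path layout follows the commit, while the function name and return value are hypothetical. A missing file is silently fine because remote rows never had a parquet. Any other OS error is logged rather than raised, since the registry row is already gone.

```python
import logging
from pathlib import Path

log = logging.getLogger(__name__)


def remove_materialized_parquet(data_dir: Path, source_type: str, name: str) -> list[Path]:
    """Hypothetical sketch: delete <name>.parquet and a stale <name>.parquet.tmp.

    Returns the paths that were actually removed; never raises on I/O errors.
    """
    base = Path(data_dir) / "extracts" / source_type / "data"
    removed = []
    for candidate in (base / f"{name}.parquet", base / f"{name}.parquet.tmp"):
        try:
            candidate.unlink()
            removed.append(candidate)
        except FileNotFoundError:
            pass  # nothing on disk for this row (e.g. remote mode)
        except OSError as exc:
            log.warning("could not remove %s: %s", candidate, exc)
    return removed
```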
ZdenekSrotyr	f0979f997a	fix(admin-api): reject backtick BQ-native source_query at register; surface materialize errors per-row E2E testing showed admin POSTs of materialized BQ rows whose source_query uses BigQuery-native backtick identifiers (`prj.ds.t`) silently no-op'd at the next sync tick — the materialize path runs the SQL through the DuckDB BQ extension's COPY which uses DuckDB's parser; backticks aren't recognized and the query either parse-errors or matches zero rows. No parquet lands at the canonical path and no error reaches an operator-visible surface. Two-part fix: 1. RegisterTableRequest's _check_mode_query_coherence model_validator now rejects any source_query containing a backtick with a 422 + actionable message pointing at the DuckDB equivalent (bq."dataset"."table"). Same check is applied in update_table on the merged record so PATCHes that flip a stored source_query to backtick form are also caught. Covers BQ AND Keboola materialized rows since both connectors funnel source_query through DuckDB's COPY. 2. _run_materialized_pass now persists per-row failures via the new SyncStateRepository.set_error / clear_error methods (existing sync_state.error / status columns — no schema migration). GET /api/admin/registry enriches each row with `last_sync_error` from a single batched SELECT against sync_state, so the admin UI / da admin status can show "this table failed last sync because: X" instead of operators having to trawl scheduler logs. Recovered rows have the error cleared automatically — update_sync's success path resets status='ok' / error=NULL on the upsert. The materialized-path test fixture's _materialized_payload helper is updated to use DuckDB-flavor SQL (the prior backtick example pre-dated the fix). 6 new tests cover register/update rejection on BQ + Keboola, the sync_state error persistence, and the registry response surface.	2026-05-01 22:51:02 +02:00
ZdenekSrotyr	a4339ce679	fix(admin+diagnose): address 2 additional Devin Review findings on PR #152 Devin's second review pass on commit `16938ae7` surfaced 2 more issues: BUG_pr-review-job-58ae3148_0001 — non-BQ materialized via PUT bypasses source_query check app/api/admin.py update_table only enforces 'query_mode=materialized requires source_query' for source_type='bigquery' rows (via the synthetic RegisterTableRequest at line 2129+). Non-BQ source types (Keboola) skip the check — admin could PUT {query_mode: materialized} on a Keboola local row without source_query, persist successfully, then crash at the next sync tick when kb_materialize_query received sql=None and DuckDB rejected COPY (None) TO '...'. Fix: generic coherence guard before the BQ-specific block — for ALL source types, query_mode='materialized' requires non-empty source_query in the merged record. Returns 422 with a hint about reverting via query_mode='local'/'remote'. ANALYSIS_pr-review-job-642ff90f_0007 — diagnose returns 'ok' on BQ resolution failure app/api/health.py:_check_bq_billing_project caught get_bq_access() exceptions and returned status='ok' with a 'could not resolve' detail. Automated alerting keyed on status != 'ok' would silently miss missing google-cloud-bigquery, auth failures, or malformed config. Fix: return status='unknown' on resolution failure — surfaces it on operator dashboards without promoting the overall health to 'degraded' (which 'warning' does, intentionally for the billing==project case). Tests: - test_update_keboola_to_materialized_without_source_query_rejected: PUT {query_mode: materialized} on a Keboola local row returns 422 with 'source_query' in the detail - test_diagnose_returns_unknown_status_when_bq_resolution_fails: when get_bq_access raises, the bq_config service entry surfaces status='unknown' (not 'ok') Full sweep: 2507 passed, 25 skipped, 0 failed (+2 from previous sweep because of the 2 new regression tests; 8 pre-existing internal_roles schema-migration failures still ignored per task brief).	2026-05-01 21:21:23 +02:00
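The generic coherence guard runs on the *merged* record, meaning the stored row overlaid with the PUT payload, so it applies to every source type. This sketch raises `ValueError` where the endpoint would return a 422. The function name is hypothetical.

```python
def check_materialized_coherence(merged: dict) -> None:
    """Hypothetical sketch: reject query_mode='materialized' without SQL."""
    if merged.get("query_mode") != "materialized":
        return
    sql = merged.get("source_query")
    if not (isinstance(sql, str) and sql.strip()):
        raise ValueError(
            "query_mode='materialized' requires a non-empty source_query; "
            "set one or revert with query_mode='local'/'remote'"
        )
```

Checking the merged record rather than the payload alone catches the reported case: a PUT that flips only `query_mode` on a Keboola row whose stored `source_query` is `None`.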
ZdenekSrotyr	85d3810535	feat(materialized): query_mode='materialized' for BigQuery + Keboola — admin SELECT → parquet → analyst Closes the 'admin pre-stages a curated table/view for analysts' use case end-to-end across both supported source connectors. Backend (BigQuery + Keboola, schema v20): - schema v20 adds source_query TEXT to table_registry (renumbered from v19 after main's #150 RBAC migration also bumped to v19) - connectors/bigquery/extractor.py adds materialize_query(table_id, sql, , bq, output_dir, max_bytes=...) — BqAccess session, dry-run cost guardrail (default 10 GiB, configurable via data_source.bigquery.max_bytes_per_materialize), idempotent ATTACH, rows/bytes/md5 metadata for sync_state - connectors/keboola/access.py — new KeboolaAccess facade (parallel of BqAccess) wrapping ATTACH 'keboola://...' AS kbc - connectors/keboola/extractor.py adds materialize_query — same shape, no dry-run analog (Keboola Storage API has different cost model); legacy bucket-download path skips query_mode='materialized' rows - app/api/sync.py:_run_materialized_pass dispatches by source_type to the right materialize_query - app/api/admin.py: RegisterTableRequest accepts source_query; model_validator coheres mode↔source_query↔bucket; PUT preserves omitted fields; deprecation marks (Field(deprecated=True)) on sync_strategy + profile_after_sync (no extractor reads them; profile_after_sync becomes inert — bug from earlier work where /api/sync/trigger never honored the flag); _BQ_OPTIONAL_FIELD_DEFAULTS injects defaults into GET /server-config payload Operator + CLI surface: - da admin register-table --query / --query-mode materialized - scripts/smoke-test-materialized-bq.sh — end-to-end smoke for operators Tests (incl. spike + integration + regression): - test_db_migration_v20, test_table_registry_source_query - test_bq_materialize, test_bq_cost_guardrail, test_bq_init_extract_skips - test_keboola_access, test_keboola_extension_query_passthrough (lock-in for the DuckDB extension capability), test_keboola_materialize, test_keboola_init_extract_skips, test_keboola_materialized_e2e (skipped without KBC_TEST_ creds) - test_sync_trigger_materialized, test_sync_trigger_keboola_materialized - test_api_admin_materialized, test_cli_admin_materialized - test_admin_bq_register, test_admin_discover_bigquery, test_admin_keboola_materialized, test_admin_phase_c_deprecation, test_admin_put_preservation, test_materialized_e2e Cost: BQ uses bigquery_query() (jobs API, view-aware) — works on tables, views, materialized views uniformly. Keboola uses ATTACH+COPY parquet through the DuckDB extension.	2026-05-01 20:25:56 +02:00


68 commits