agnes-the-ai-analyst

Author	SHA1	Message	Date
ZdenekSrotyr	1b0329e8c5	UI design system unification — one stylesheet, canonical primitives, nav fix (#284 ) * docs(plan): design-system unification plan (post-review revisions) Plan covers consolidating two CSS files into one, introducing canonical primitives (.btn family, .search-input, .filter-bar, .page-header, .data-table, .empty-state, .toast, .stat-card, .tab-strip), unifying the top-nav Admin trigger with sibling links, and migrating 41 templates that today carry inline <style> blocks. Post-review revisions: nav fix moved to first commit (user complaint lands first); sticky-header and dark-mode skeleton tasks dropped (defer to follow-up PRs); contract test class detection tokenizes class="..." attributes properly; baseline screenshot loop added to Task 0; vendor-token grep widened. * fix(nav): unify Admin trigger with sibling nav links The top-nav Admin entry is a <button class="app-nav-link app-nav-menu-trigger">, siblings are <a class="app-nav-link">. .app-nav-menu-trigger used to override .app-nav-link with "color: inherit; font: inherit", resetting font-size from 13px back to body default and color from --text-secondary to body color. Active state diverged too: .is-active on links used --primary blue, [aria-expanded=true] on the button used --border-light grey. Fix: expand .app-nav-link so it covers <button>-element resets (font-family: inherit, border: 0, background: transparent, cursor: pointer, display: inline-flex for chevron alignment). Add [aria-expanded="true"] as another active-state selector so the dropdown's open state highlights identically to .is-active on links. Delete the now-redundant .app-nav-menu-trigger rules that stripped button chrome. Extract the inline <script> from _app_header.html into a new app/web/static/app.js (loaded by base.html only — base_login.html has no nav). Sets up window.appUI.wireDropdown for both the user menu and the Admin dropdown via DOMContentLoaded. * style(css): consolidate style.css into style-custom.css + add cache-bust One stylesheet for the whole web UI: - style.css (1086 lines, legacy Google-inspired tokens + components) absorbed into style-custom.css under a labeled block, placed after the modern :root + body so style-custom's component rules continue to override the legacy ones (preserves the original cascade order that came from loading style.css first). - style.css deleted; <link> dropped from base.html + base_login.html. - static_url() now appends ?v=<mtime> to /static/<path>. Cheap per-request os.stat — auto-invalidates browser + proxy caches on redeploy without operator intervention. Mtime survives across uvicorn restarts as long as the file content is unchanged. Legacy classes (.btn, .card, .login-, .badge, .code-block, .flash, .form-group, .username-box, .btn-copy, .auth-tabs, .divider, etc.) still render — they live in style-custom.css now. Login pages, error page, password setup, and the dashboard's Claude Code Setup card all kept working in browser smoke. test(design): contract test for design-system invariants 7 structural invariants enforced from this commit onwards: - style.css must stay deleted - no template links style.css via static_url - exactly one bare :root block in style-custom.css - canonical primitives declared (.btn, .btn-primary, .search-input, .filter-bar, .page-header, .data-table, .empty-state, .toast, …) - no deprecated class names in templates (.users-table, .gp-table, .marketplaces-table, .audit-table, .users-search, .marketplaces-search, .modal-btn, .btn-primary-v2, …) - app.js loaded by base.html, NOT by base_login.html - 3 helper-level unit tests for the class-attribute tokenizer (multi-line attrs, Jinja-conditional fragments, false-positive prose) Two of the assertions intentionally start FAILING after this commit (missing primitives + legacy class refs in 7 admin templates) and will turn green as Tasks 4–7 add primitives and Tasks 8–15 migrate the templates. * feat(css): canonical button family + legacy token aliases Adds at top of :root: legacy token aliases (--bg, --card-bg, --text, --text-light, --secondary, --radius) pointing at modern equivalents. Absorbed style.css rules referenced these names; without aliases they fell back to 'unset'. Aliases live until Task 16 alongside their absorbed rules. Appends canonical .btn variants at end of file (last cascade): .btn-primary + .btn-primary-v2 + .modal-btn.primary (alias group) .btn-secondary + .btn-secondary-v2 + .modal-btn:not(.primary):not(.danger) .btn-ghost + .btn-ghost-v2 .btn-danger + .modal-btn.danger .btn-lg .btn:disabled + .btn:focus-visible (focus ring via --focus-ring) Existing absorbed .btn, .btn-primary, .btn-secondary, .btn-sm rules remain — the canonical block adds the missing variants + selector-list aliases so .modal-btn and v2 markup keep rendering until migration tasks swap them out. Contract test: .btn-danger now declared (one less missing primitive). Browser smoke: /admin/tokens hero + filter pills + empty state render correctly with the absorbed style.css rules now backed by real tokens. * feat(css): form-control primitives — .search-input + .filter-bar + .filter-pill + .form-input Canonical filter bar shape: 36px-height inputs (matches button height for vertical rhythm), 28px pills with .is-active state, consistent focus ring via --focus-ring token. Selector-list aliases for legacy per-page classes: - .users-search / .marketplaces-search / .kb-search → .search-input - .filters-card → .filter-bar - .pill[aria-pressed="true"] also matches the .filter-pill active state .form-input added as a sibling of .search-input for forms — same baseline height + radius + focus treatment, with textarea.form-input auto-sizing to min 96px and using the mono font (matches CSV/SQL pasted-snippet patterns on /admin/agent-prompt + /admin/workspace-prompt). Contract test: .search-input + .filter-bar + .filter-pill now declared. * feat(css): .page-header primitive + variants + .tab-strip Canonical page-header pattern with title (22px) + optional subtitle + optional eyebrow + right-aligned actions slot. Two modifiers: - .page-header--hero: gradient background (primary→primary-dark), 28px white title, semi-transparent subtitle/eyebrow. For /marketplace, /store, /profile-style pages that already use this layout via per-page inline <style>. Migration tasks delete the duplicated rules. - .page-header--compact: 18px title for dense admin index pages. .tab-strip + .tab-strip__item — the secondary tab row pattern used by /marketplace?tab=flea and similar. .is-active / [aria-selected=true] both flip the active treatment (primary color + bottom border). Contract test: .page-header / __title / __subtitle / __actions all now declared (4 fewer missing primitives). * feat(css+js): .data-table + .empty-state + .toast + .stat-card primitives Last primitive batch. All 8 canonical-primitives invariants in test_design_system_contract.py now green; only the template-migration test fails (expected — Tasks 8–15). .data-table (+ --compact modifier): selector-list aliases for legacy per-page table classes (.users-table, .gp-table, .marketplaces-table, .audit-table) so existing markup keeps rendering until migration. Compact modifier shrinks padding + font for dense lists (audit log). .empty-state with __icon / __title / __description / __actions — replaces the ad-hoc 'no results' rendering scattered across pages (corporate_memory, admin_users, admin_marketplaces, etc.). .toast / .toast-container — paired with window.appToast({kind, msg, timeout}) appended to app.js. Bottom-right stacked, click-to-dismiss, auto-dismiss after 4s by default. Kind 'success' / 'warning' / 'error' / 'info' shows a 3px colored left border. .stat-card (+ --accent variant) + .stat-row grid — for the dashboard metric tile row. * style(templates): migrate 8 templates off deprecated class names Mechanical class-attribute rewrite via tokenizer (preserves Jinja conditionals + multi-line attrs): modal-btn primary -> btn btn-primary modal-btn danger -> btn btn-danger modal-btn -> btn btn-secondary users-table -> data-table gp-table -> data-table marketplaces-table -> data-table audit-table -> data-table users-search -> search-input marketplaces-search -> search-input 8 templates touched: admin_groups, admin_marketplaces, admin_tokens, admin_users, admin_welcome, admin_workspace_prompt, my_tokens, corporate_memory_admin. 43 lines updated total. Inline <style> blocks in these templates still define rules for the old class names — those rules no longer match anything and become dead code, removed in Task 16's alias cleanup along with the selector-list aliases in style-custom.css. Contract test (tests/test_design_system_contract.py) now fully green: 9/9 invariants enforced from this commit onward. * feat(css): extend .data-table selector list to 13 more bespoke -table classes Visual unification of remaining tables across the codebase without per-template edits. The .data-table baseline rules (uppercase header tracking, 12px padding, hover state, border-radius) now apply to: .ad-table / .ea-table / .md-table / .members-table / .obs-table / .overview-stats-table / .registry-table / .sample-table / .sched-table / .sess-table / .sub-table / .subs-table / .ud-table These class names live in 12 templates (activity_center, admin_access, admin_group_detail, admin_scheduler_runs, admin_sessions, admin_store_submissions, admin_tables, admin_usage, admin_user_detail, catalog, me_debug, profile_sessions) that have their own per-page <style> blocks. Per-page rules with higher specificity still win for their custom needs (column widths, etc.) — this commit only sets a shared baseline so every table renders with the same chrome. Contract test stays green: 9/9 invariants enforced. * style(css): remove now-unused legacy class aliases Phase A renamed 8 templates off these names; no markup references them any more, so the selector-list memberships are dead weight. Removed from style-custom.css: .btn-primary-v2 / .btn-secondary-v2 / .btn-ghost-v2 .modal-btn / .modal-btn.primary / .modal-btn.danger / .modal-btn:not(.primary):not(.danger) .users-search / .marketplaces-search / .kb-search .users-table / .gp-table / .marketplaces-table / .audit-table .filters-card 37 lines smaller. Contract test catches any reintroduction. KEPT aliases (still in untouched template markup): - .pill (marketplace_plugin_detail.html, marketplace.html — these pages weren't part of Phase A's deprecated-class sweep; their own .pill CSS rules still apply) - All .data-table family extensions (.ad-table, .ea-table, .md-table, .members-table, .obs-table, .overview-stats-table, .registry-table, .sample-table, .sched-table, .sess-table, .sub-table, .subs-table, .ud-table) — these still render data tables in 12 templates; selector-list aliasing keeps them visually unified with .data-table baseline. - Legacy token aliases (--bg / --text / --text-light / --secondary / --card-bg / --radius) — still resolve absorbed style.css rules. Templates' inline <style> blocks still contain dead rules for the renamed classes (.users-search, .modal-btn, etc.); harmless but bloat. Optional follow-up: a separate sweep can drop those. * docs(changelog): design-system unification under [Unreleased] * feat(css): unify page-shell width — .container baseline 1280px + modifiers Inventory found 30+ unique max-width values across templates (280px login → 1600px admin/tables). The legacy .container default was 800px, which made every admin page set its own wider inline override — 30+ ad-hoc widths drifted as a result. Canonical: .container max-width = var(--width-app) (1280px). Pages that need a different shape opt in via modifiers: .container--narrow → var(--width-narrow) (800px) — long-form text, setup wizards .container--wide → var(--width-wide) (1400px) — admin lists, marketplace grids .container--full → max-width: none — hero / landing Pages that already set a NARROWER inline max-width (setup, login flows inside .login-card, etc.) still render at their narrower size — the inline override beats the new canonical 1280px. The visible change hits the ~20 admin pages currently rendering at 800px via the legacy default, which jump to 1280px and pick up consistent breathing room. Spacing also normalized: padding 24px 20px → var(--space-6) var(--space-5). * fix(home+catalog): gut dashboard sections + remove confusing toggle + fix table count Dashboard /home cleanup: - Remove 'Your Data' card — Data Packages is already a top-nav entry, so duplicating data sources on the landing page just adds noise. - Remove 'Account' card — group memberships + scripts + last sync belong on /profile, not on the welcome screen. - Remove entire right-column (Corporate Memory + Activity Center widgets) — both surfaces have dedicated admin pages reachable from the Admin dropdown. - Keep stats row (Tables/Columns/Rows/Data Size/Unstructured), env-setup-CTA, and Notifications card. /catalog cleanup: - Strip the 'Always included' badge + the locked toggle-switch from Core Business Data and Business Metrics cards. The toggle was always 'checked disabled' — it visually looked like a switch but could not be toggled, which was confusing. The 'Always included' copy itself was redundant once the toggle was gone. Agnes Internal already rendered without these, so the three cards are now visually consistent. Catalog data_stats fix: - 'total_tables' was len(sync_state) — counted only tables that had ever synced, so a 30-row table_registry with 0 ever synced rendered as '0 tables'. Switched to len(tables) — the registered business-data table list — so the count reflects what's actually available, not what's been touched. * fix(home): real stat numbers + drop unstructured tile + cleanup dead CSS Dashboard stats were hardcoded zeros (columns: 0, size_display: '0 MB', unstructured_display: '0 MB') and the table counter pulled from sync_state (synced) instead of table_registry (registered). On a fresh deployment with 30 registered tables and 0 ever synced, the page rendered '0 / 0 / 0 / 0 MB / 0 MB' — useless. Now: - Tables: COUNT() FROM table_registry WHERE source_type != 'internal'. Matches the /catalog Core Business Data counter. - Columns: SUM(sync_state.columns). Zero only when nothing's synced yet. - Rows: unchanged (SUM(sync_state.rows), already correct). - Data Size: SUM(sync_state.file_size_bytes), human-formatted via inline _fmt_bytes helper (KB/MB/GB). - Unstructured: tile dropped — was always '0 MB' and had no source. - last_updated: now derived from sync_state max(last_sync), wasn't set before so the 'Synced …' tag never rendered. Dashboard.html cleanup: ~725 lines of orphan inline <style> removed — .section-title, .data-source, .toggle-switch, .catalog-cta, .memory-card / .memory-stat / .memory-description / .memory-footer / .btn-memory, .activity-card / .activity-stat / .activity-text / .btn-activity, .account-grid / .account-row / .account-scripts / .badge-role / .badge-group / .cron-line, .badge-included / .badge-beta / .badge-demo. All matched markup deleted in the previous commit; the CSS was dead code until now. * ui(catalog): rename page heading 'Data Catalog' → 'Data Packages' The top-nav entry says 'Data Packages' but the page itself said 'Data Catalog' — confusing two-name product. Aligns the heading and <title> with the nav label. Subtitle trimmed too: 'manage your subscriptions' was a vestige of the toggle UI that just got removed, replaced with a one-liner describing what the page is for. Two other 'Data Catalog' strings stay: they live inside the table- profiler overlay JS and refer to an EXTERNAL catalog system (e.g. OpenMetadata / Atlan) that an operator may link to per table — that is a generic term for any external data-catalog product, not our page name. * fix(nav): dropdown clicks always work + mutual-exclusion close Two bugs in the wireDropdown helper: 1. Clicking trigger B while trigger A's menu was open left both open. e.stopPropagation() in trigger.click prevented the document-click handler from firing, so trigger A's open menu had no way to learn that something else was clicked. Net effect: state diverged across the two dropdowns the more you clicked. 2. The target-vs-trigger equality check (e.target !== trigger) was strict. Clicking the chevron <svg> inside the button reports the svg or its <path> child as e.target — not the button — so removing stopPropagation alone would trip the close branch in the same click that just opened the panel. Fix both at once: drop e.stopPropagation() AND switch the doc-handler guard to trigger.contains(e.target). Now any click outside both the trigger subtree and the panel subtree closes; any click on another trigger closes via the OTHER dropdown's doc handler; clicks inside the trigger (button OR svg child) are fully ignored by the doc handler and only the trigger's own toggle handler fires. * feat(ui): canonical blue-gradient hero on every admin page The UI had a per-page hero pattern on ~10 onboarding/marketing pages (admin_tokens / profile / install / setup_advanced / marketplace / my_tokens / store_upload / home_), each with its own ad-hoc CSS (.tokens-hero, .profile-hero, .install-hero, .upload-hero, …). The admin section's index + detail pages had plain H1/H2 with their own .users-title / .gp-title / .obs-title / .cfg-title / … inline styling. Net effect: half the app felt like a product, half felt like a spreadsheet. Now: - .page-header--hero CSS upgraded to match the look analysts already liked from admin_tokens: 28px/32px/24px padding, 14px radius, soft primary-tinted box-shadow (0 4px 16px rgba(0,115,209,0.2)), 28px semibold title, optional uppercase eyebrow + 13.5px subtitle. Narrow-viewport breakpoint included. - New _page_hero.html partial wraps the boilerplate. Usage: {% set page_hero_eyebrow = "Users & Access" %} {% set page_hero_title = "Users" %} {% set page_hero_subtitle = "…" %} {% include "_page_hero.html" %} - 15 admin templates migrated to it: admin_users / admin_groups / admin_marketplaces / admin_access / admin_sessions / admin_session_detail / admin_store_submissions / admin_scheduler_runs / admin_usage / admin_user_detail / admin_welcome / admin_workspace_prompt / admin_server_config / activity_center / admin/news_editor. Each gets a grouped eyebrow (Users & Access / Data / Agent Experience / Activity Center / Server) matching the Admin dropdown sections so the page identity is unambiguous at a glance. Legacy -title H2/H1 + adjacent subtitle paragraphs deleted; their per-page CSS rules are dead now (harmless, retire in a follow-up sweep alongside other inline-style cleanup the reviewers flagged). admin_tables.html intentionally NOT migrated — it's a standalone HTML page that doesn't extend base.html; a separate refactor. Test: test_admin_users_page_renders_for_admin assertion updated from .users-title to .page-header__title + .page-header--hero (the canonical pair). All other web/template tests stay green. * refactor(ui): dedup _humanbytes, drop 267 lines of dead inline CSS (1) _humanbytes consolidation: - Add TB branch + optional precision param (default 2 preserves existing Store detail callers; dashboard uses precision=1 for headline tiles). - Delete inline _fmt_bytes from dashboard handler — was a copy of _humanbytes with different rounding. One canonical helper now. (2) Dead inline-CSS sweep across 17 migrated templates: - Conservative regex: a CSS rule is deleted only when its primary class matches one of the known-dead names AND that name is NOT referenced from any class= attribute in the same file's markup. - Per-file 'in-use' guard saved several false positives that the deny list would have nuked (e.g. .users-toolbar, .gp-search, .obs-subtitle, .marketplaces-toolbar are still in use; only .users-table, .users-search, .users-title, .modal-btn, etc. that have NO markup left went away). - Removed: -267 lines across admin_users (-42), admin_marketplaces (-45), admin_groups (-31), my_tokens (-38), admin_tokens (-29), admin_access (-9), admin_user_detail (-6), admin_welcome (-8), admin_workspace_prompt (-8), admin_server_config (-2), admin_sessions (-1), admin_session_detail (-1), admin_usage (-1), admin_store_submissions (-3), admin_scheduler_runs (-3), activity_center (-4), corporate_memory_admin (-36). Contract test stays green (9/9); all web/template/render/user_management tests pass. * feat(ui): canonical hero on /catalog (Data Packages) Same .page-header--hero treatment as the admin pages — Data eyebrow, Data Packages title, Browse-the-data-sources subtitle. Removes the ad-hoc .page-title block (h1 / p / wrapper-div) and its CSS rules (now dead, 3 rule blocks deleted). * fix(nav): load app.js from _app_header.html — works on standalone pages The previous nav-fix commit moved the inline dropdown script from _app_header.html into app/web/static/app.js + added <script src=…> to base.html. That broke EVERY page that includes _app_header.html WITHOUT extending base.html (catalog, corporate_memory, admin_tables, install). They got the nav markup but no JS → both Admin and AD dropdowns dead on those pages. Fix: emit the <script src=app.js defer> directly inside the _app_header.html partial. Any page that includes the header now gets the script automatically — base.html-extenders AND standalone HTML pages alike. base.html's duplicate <script> line removed. Also fixes the wide-hero on /catalog: .page-header--hero now sets its own max-width: var(--width-app) (1280px) so standalone pages without a .container parent don't render the gradient edge-to-edge. catalog's .source-cards bumped from 900px → 1280px to match the hero, otherwise the page reads two-tier (wide blue band, narrow content) which the user flagged. Verified locally via agent-browser: Admin + AD dropdowns now click through on /catalog, /admin/tables, /corporate-memory. docs(plan): standalone pages → base.html framework migration plan Plan + Plan-agent review (8 must-fix items applied) for converting the 5 templates that ship their own <html><head><body> scaffold (catalog, install, corporate_memory, corporate_memory_admin, admin_tables) to extend base.html. Root cause of yesterday's 'dropdown dead on /catalog' regression: shared infrastructure in base.html doesn't propagate to standalones. * feat(base): body_attrs block + migrate install.html to extend base base.html: new {% block body_attrs %}{% endblock %} slot so pages that need <body> attributes (admin_tables has data-source-type) can carry them through extends. install.html: convert from standalone <html><head><body> scaffold to {% extends "base.html" %} with title / body_attrs / head_extra / layout / scripts blocks. Drops: - <!DOCTYPE>, <html>, </html>, <head>, </head> - <meta charset>, <meta viewport> - Duplicate <link rel="stylesheet" href="...style-custom.css"> (base.html already provides one) - <body> opening + closing tags - Leading _app_header.html include + _version_badge.html include (base.html handles both) Preserves per-page CSS (in head_extra), per-page JS (in scripts), the Inter font preconnect (kept inline; not hoisted to base in this PR — separate decision). Pilots the migration recipe before the 4 larger pages. * refactor(memory): extend base.html Same recipe as install.html. corporate_memory.html now inherits <html>/<head>/<body> + nav + app.js script tag from base.html. Page-specific CSS and JS preserved in head_extra + scripts blocks. * refactor(memory-admin): extend base.html Same recipe as install/corporate_memory. Curation page now in the shared rendering pipeline. * refactor(catalog): extend base.html catalog.html had the most complexity: 7 head-level assets (chart.js, Prism, prism-sql, metric_modal.css link + 2 preconnects + Inter stylesheet), 5 body-level <script> blocks including a <script type= "module"> for the metric modal, 2 duplicate style-custom.css links in <head>. The migration script preserved all of them — head-level externals hoisted to {% block head_extra %} in source order, body scripts relocated to {% block scripts %} in source order (so chart.js loads before the IIFE that builds Chart instances), duplicate style-custom.css links dropped (base.html provides one). * refactor(admin-tables): extend base.html + carry data-source-type The biggest of the 5 standalones at 3563 lines. <body data-source- type="{{ data_source_type }}"> attribute carried through via the new {% block body_attrs %} slot (admin_tables JS reads document.body.dataset.sourceType to switch between keboola and bigquery rendering paths). * release: 0.54.10 — UI design system unification + homepage status frame + initial workspace override + store guardrails Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com> * refactor(web): migrate remaining templates to canonical design primitives - admin_group_detail: .data-table, .btn family, appToast(), remove duplicate table/button/toast CSS - admin_store_submission_detail: .data-table, .btn family, appToast(), remove duplicate btn/toast CSS - profile_sessions: .data-table, _page_hero.html, remove duplicate table/title CSS - me_debug: .data-table, .btn family, remove duplicate table/button CSS - marketplace: .btn-primary/.btn-secondary, remove duplicate button CSS - store_edit: remove duplicate .btn-primary/.btn-link CSS, canonical button classes - store_upload: remove duplicate .btn-primary/.btn-secondary/.btn-link CSS Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com> --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-14 13:28:03 +02:00
Vojtech	aa6a6700f4	feat(me/stats): per-analyst Stats dashboard with 4 tabs (#298 ) * feat(me/stats): per-analyst Stats dashboard with 4 tabs New /me/stats page shows the calling user's own analytics across four tabs, lazy-loaded per activation: - Sessions — paginated usage_session_summary join with a filesystem scan of un-processed JSONL (mirrors admin list_user_sessions shape). v44 token columns aggregated per row. - Tokens — daily series (default 30 days), by-model breakdown (lifetime), top-10 biggest sessions, lifetime totals. Single CTE per sub-query against per-user partition (idx_usage_session_user). - Data access — audit_log rows where action LIKE 'query.%' for the caller. Covers query.local / query.hybrid / query.remote / query.internal. Cursor-paginated on (timestamp, id). - Sync activity — audit_log rows where action is sync.* or manifest.* for the caller, plus users.last_pull_at for the header. Per-pull history now persists thanks to the new manifest.fetch audit row. Backend: app/api/me_stats.py — single APIRouter at /api/me/stats/, four GET endpoints, all gated by get_current_user (server-side caller scope; the page route itself only renders the shell). Frontend: app/web/templates/me_stats.html — tab bar + 4 panels, plain JS lazy-loads each panel's endpoint on first activation, caches per-tab so switching back doesn't refetch. Small SVG bar chart on Tokens tab (no external charting dep). 'Stats' link added to _app_header.html primary nav between 'Data Packages' and the Admin dropdown. Side change in app/api/sync.py: /api/sync/manifest now emits a manifest.fetch audit_log row alongside the existing users.last_pull_at bump. The column UPDATE only retains the most recent timestamp; per-pull history needs an audit row. client_kind='api' for the manifest endpoint (vs. 'web' which the audit-read deduper uses for AC reads), so the Sync tab can distinguish CLI pulls from browser-driven manifest peeks. 7 new tests in tests/test_me_stats.py: - sessions endpoint caller-scope isolation (user A doesn't see B) - sessions pagination - tokens empty-user zero shape - tokens aggregation across daily window + by_model + top + totals - queries endpoint filters to action LIKE 'query.%' + caller scope - sync endpoint surfaces both manifest.fetch and sync.trigger - manifest endpoint writes the manifest.fetch audit row ui(me/stats): widen page to 1400px via main.main escape Default base.html .container wraps content at max-width 800px. Stats tables (by-model + top-sessions: 6 columns each) felt cramped at that width — same constraint dashboard.html escapes via the {% block layout %} override pattern. Mirror that here: render <main class="main"> and bump .stats-page max-width to 1400px so the 6-column tables breathe without going edge-to-edge on wide monitors. * ui(me/stats): narrow from 1400px to 1280px to match /home /home isn't actually .container's default 800px — style-custom.css has a body:has(.home-mock) .container { max-width: 1280px } override that widens it. 1280px is the shared 'wide content' width across the codebase (top-nav header + /home + dashboard all use it). Bumping me_stats from 1400px to 1280px so the Stats page reads as 'same chrome' instead of distinctly wider than its sibling pages.	2026-05-14 10:27:58 +00:00
Vojtech	37ad39c8a3	feat(home): status frame on /home (operator-gated, onboarded-only) (#297 ) * feat(home): status frame on /home — last sync, sessions, prompts, tokens, projects Adds the homepage status frame: a 5-card row above the install-hero / offboard-strip on /home showing the calling user's Last sync (their last `agnes pull`), Sessions, Prompts, Tokens used, and Projects worked on, with a 24h/7d pill toggle. Backed by `GET /api/me/home-stats?window=` (one DuckDB CTE joining `users` + `usage_session_summary` + `usage_events`) and SSR'd from the same `compute_home_stats` helper on initial paint so there's no spinner. The window toggle is the only JS-driven path. Side surfaces: - `GET /api/sync/manifest` now stamps `users.last_pull_at` so `agnes pull` (and the Claude Code SessionStart hook that wraps it) imprints the analyst's last sync time for the new card. - `usage_session_summary` gains four BIGINT token counters (input_tokens, output_tokens, cache_read_tokens, cache_creation_tokens) summed from JSONL `message.usage.` per assistant turn. - `USAGE_PROCESSOR_VERSION` bumps 1 → 2 so the session-pipeline reprocess loop invalidates stale summaries and backfills tokens on the next tick. Schema migration v43 → v44 is idempotent ALTERs (last_pull_at + 4 token columns) — fresh installs receive them from `_SYSTEM_SCHEMA`, upgrade path runs `_v43_to_v44`. Defaults (NULL / 0) backfill existing rows cleanly. 9 new tests in tests/test_home_stats.py cover the migration, endpoint shapes (24h/7d/unknown/empty/missing-user), and the manifest-side last_pull_at bump. docs(CHANGELOG): homepage status frame entries under [Unreleased] The post-rebase release-cut now belongs to whichever PR lands next after main rolled to 0.54.9. This PR logs its bullets under [Unreleased] (Added: homepage status frame, per-user pull tracking, token counters; Changed: schema v43 → v44 migration) so they ride out with the next release-cut. * fix(tests): bump test_schema_v42_migration asserts to v44 CI failed because tests/test_schema_v42_migration.py hardcoded `assert SCHEMA_VERSION == 43` and `assert v == 43` after init. v44 (homepage stats frame backing columns) was introduced in the preceding feat commit; this aligns the existing v42-era migration tests with the new schema version. * feat(home): gate status frame on operator flag + user.onboarded Two gates on the homepage status frame: 1. Operator master switch — `get_home_status_frame_visibility()` in app/instance_config.py mirrors the existing `get_home_automode_visibility()` shape: env var `AGNES_HOME_SHOW_STATUS_FRAME` > yaml `instance.home.show_status_frame` > default `True`. Cautious-rollout instances can disable the frame without forking; the yaml example documents both knobs. 2. Onboarded gate — the template only renders the frame when the caller's `users.onboarded` is true. First-day users see a clean install-hero before all-zero stat cards; the frame appears automatically on the next render after `agnes init` POSTs `/api/me/onboarded`. Router skips the `compute_home_stats` DB read entirely when either gate is closed; `home_stats` arrives at the template as None in that branch and the `{% if %}` shortcuts the include. Why both gates: PostHog feature flags evaluated and rejected — this codebase uses PostHog for analytics capture only, not feature gating; adding a per-user feature_enabled() call on the /home critical path would couple the homepage render to a remote eval and still require an admin master switch. The onboarded gate is a UX coherence rule layered on top of the operator switch, not an A/B test signal. 3 new tests in test_home_stats.py cover the env-var resolution (falsey values + default-true). The yaml example gets a `home:` block documenting both `show_automode` (pre-existing flag, was undocumented in the example) and `show_status_frame`.	2026-05-14 09:28:47 +00:00
minasarustamyan	63ae676b27	perf(marketplace): cache cover photos + restore Curated filter spacing (#294 ) * perf(marketplace): browser-cache cover photos + restore Curated tab filter spacing Cover photos on /marketplace grid now serve with `Cache-Control: public, max-age=2592000, immutable` plus URL fingerprinting (`?v=<commit-sha8>` for curated, `?v=<version_no>` for flea) so browser refresh stops hitting the server entirely for unchanged assets. Per-plugin RBAC dropped from the three image endpoints (curated_asset, curated_mirrored, get_entity_photo) in favor of login-only auth — eliminates _system_db_lock contention on parallel image requests. Per-request magic-bytes revalidation also dropped from curated_asset (it was re-reading the file just to discard the bytes, then FileResponse read it again). Spacing bug: sort-dropdown commit (6be1cee) wrapped .mp-filter-row in a new flex container with inline margin-bottom:4px, masking the original 12px CSS rule. Curated tab (where .mp-type-row is hidden) ended up with 4px between filters and the card grid. Wrapper margin restored to 12px. See CHANGELOG entry under [Unreleased] — the RBAC relaxation is called out under ### Security with explicit threat-model rationale for AI/human reviewers. * test(marketplace): update renamed-html-as-png test for dropped magic-bytes check Magic-bytes body validation was dropped from `curated_asset` in the previous commit — the request path now relies on extension allowlist + pinned Content-Type + nosniff + strict CSP to neuter mismatched payloads at the browser layer. Update the test to assert the new defense-in-depth posture (200 served, but Content-Type=image/png + nosniff + CSP=default-src 'none') rather than the gone 415. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>	2026-05-14 10:09:32 +02:00
Vojtech	3d244038b5	feat(home): Getting Started moves first, collapsible, in-page anchor (#296 ) Three tweaks to the post-PR-#291 Getting Started card: 1. Chronologically first. Moved from below the install-hero (where it sat as a static white card) to ABOVE it, inside the same `{% if not onboarded %}` guard. The blue hero is now the actual install flow that the card points at, not a peer that competes for attention. 2. Collapsed by default. Switched from <section> to <details> with no `open` attribute, so the page lands with just a quiet pill (`Getting Started — Two quick next steps — click to expand ›`). Expand to reveal the two rows. Chevron rotates 90deg when open via the `[open]` selector. Per-device dismiss X stays — generic `.home-card-close[data-dismiss-key]` handler now uses `closest('section, details')` so it works on both container types. 3. First row → #install-hero in-page anchor. Was `/setup` (which would round-trip to the same hero via a redirect through /setup). Anchored directly to the blue hero on the same page; copy reads "One-time install — walkthrough in the section below" so the user knows it's a scroll-to, not a navigation. Install-hero <div> gained `id="install-hero"`. `.install-hero { scroll-margin-top: 88px }` keeps the hero's eyebrow clear of the 72px sticky header on the jump. Second row link to /setup-advanced and the dismiss key unchanged. GS disappears alongside the install-hero when the user is onboarded, so the in-page anchor never dangles. Tests updated to assert the new markup + onboarded-state hiding.	2026-05-14 07:02:23 +00:00
Vojtech	4501c9c3dd	fix(store-guardrails): post-#290 review follow-up — purge tuple, filter chip, stale docs, lazy bundle_meta, logger.exception (#295 ) Addresses post-merge review findings on #290: - Admin Rescan is the only post-v30 producer of status='blocked_inline'. Re-add it to admin queue 'Needs review' filter chip and to TERMINAL_BLOCKED_STATUSES in the bundle-purge job so rescan-produced rows surface in the default operator view and bundles get TTL-swept instead of lingering indefinitely. - Update three doc-drift sites still referring to the pre-#290 spam counter scope (counted blocked_inline). The counter now narrows to blocked_llm + review_error; fix the comment in app/api/store.py, the docstring in get_guardrails_blocked_quota_per_day(), and the operator-facing hint rendered on /admin/server-config. - Add positive test for _reject_inline_or_continue validation branch (code='validation_failed', checks payload shape, no-DB-write contract). Locks the frontend wizard's detail.checks contract. - Tighten test_quota_disabled_with_zero — assert (200, 201) explicitly instead of !=429 so a 500 regression no longer passes. - _reject_inline_or_continue takes plugin_dir and lazy-computes bundle_meta only on the security branch. Validation rejects no longer pay for a SHA256 walk on the bundle. - Surface store.upload.security_blocked audit-log write failures via logger.exception instead of swallowing — that audit row is the only forensic trace by design.	2026-05-14 08:02:44 +02:00
minasarustamyan	69a1e22cf5	feat(initial-workspace): per-instance agnes init override (#292 ) * feat(initial-workspace): per-instance agnes init override Adds Initial Workspace Template — an admin-configurable per-instance override for the agnes init analyst workspace. When configured, agnes init downloads a server-rendered zip from a Git repo the admin registered and extracts it into the analyst's workspace, fully bypassing Agnes-default CLAUDE.md / settings.json / hooks / slash commands / AGNES_WORKSPACE.md. Repo layout convention: only the contents of a top-level `workspace/` subdirectory ship to analysts; admin docs (README, CI configs) at the repo root stay in the repo and never reach an analyst. Sync rejects repos without `workspace/` at root. Server side: - src/initial_workspace.py — clone (or fetch+reset), validate, build zip with strict path checks and reserved-path rejection (workspace/.claude/init-complete reserved by Agnes) - app/api/initial_workspace.py — admin CRUD + sync endpoint + analyst- facing status/zip/applied endpoints; config persists to instance.yaml overlay, PAT to .env_overlay - app/secrets.py — refactor: persist_overlay_token shared helper with threading.Lock for .env_overlay writes (closes pre-existing race between concurrent marketplaces saves) - app/web/templates/admin_server_config.html — new "Initial Workspace Template" section + modal + Sync/Edit/Delete/Download buttons (matches existing cfg-section visual language) CLI side: - cli/lib/override.py — single source of truth for is_override_workspace sentinel detection - cli/lib/initial_workspace.py — probe status, safe zip extraction with ../absolute/symlink rejection, typed-YES force confirmation - cli/commands/init.py — override branch (skips Agnes-default workspace writes); extended sentinel with override:true, template_source, template_sha so future agnes self-upgrade does not auto-refresh hooks - cli/lib/hooks.py + cli/lib/commands.py — short-circuit on override workspaces (install_claude_hooks, install_claude_commands, maybe_refresh_claude_hooks) Audit-event strategy: server writes initial_workspace.fetch_started inside GET /api/initial-workspace.zip (cannot be spoofed by PAT-holder); CLI POST /applied writes initial_workspace.applied as best-effort confirmation. Admin mutations log via the existing _audit pattern. Tests: 27 server (clone/validate/zip + workspace-subdir convention + concurrent persist_overlay_token + endpoint shapes + audit rows) + 29 CLI (override sentinel parse + probe fall-through + safe extraction + YES strictness + hook guards + e2e mocked init). Risk acceptance — documented in docs/initial-workspace-override.md + CHANGELOG Internal section so AI reviewers understand the deviations from defaults are intentional: - maybe_refresh_claude_hooks deliberately no-ops on override workspaces - --force on override does NOT back up CLAUDE.md (admin's repo is the source of truth) - .claude/CLAUDE.local.md IS overwritten by override extraction when admin's repo ships one * test+vendor-agnostic: drop Groupon tokens from #292 fixtures + extend admin-gate coverage Two fixes from the takeover review on #292: 1. Vendor-agnostic OSS rule: Replace `Groupon` / `groupon/template` tokens in test fixtures with `Acme` / `acme/template` (8 sites in test_cli_init_override.py + 1 in test_initial_workspace_api.py). Per CLAUDE.md "Vendor-agnostic OSS — no customer-specific content" rule: customer-specific tokens don't belong in shipped artifacts, even in test fixtures. The pre-existing FoundryAI mentions in test_instance_config.py + test_setup_instructions.py are out of scope for this PR (didn't introduce them). 2. Admin-gate coverage gap: `test_admin_endpoints_require_admin` only covered GET /api/admin/initial-workspace + POST .../sync. The register-write (POST .../initial-workspace) and delete (DELETE .../initial-workspace) endpoints used the same `Depends(require_admin)` wiring but had no regression test. Loop now covers all 4 verbs so a future refactor that drops the dependency from one endpoint fails here instead of silently exposing the write/delete paths to any analyst with a PAT. * release: 0.54.9 — Initial Workspace Template (per-instance agnes init override) Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.8 → 0.54.9) for Mina's Initial Workspace Template feature. No DB migration (config lives in instance.yaml overlay). No mandatory operator action — empty default keeps OSS-default agnes init behavior. Operators wanting full template control link a Git repo on /admin/server-config → "Initial Workspace Template". See docs/initial-workspace-override.md for the full responsibility-transfer contract. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com> Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-13 20:35:01 +00:00
Vojtech	513711ed37	feat(store): hard-reject inline guardrail failures, trace security only (#290 ) * feat(store): hard-reject inline guardrail failures, trace security only Inline failures (manifest + content validation, static-security deny-list hits) now hard-reject upstream of any DB write or bundle persistence. The v30 contract that landed every inline failure as a hidden+blocked_inline entity + admin-rescannable bundle is replaced with two response shapes: - 422 code=validation_failed — manifest/content issues. Banner-only, no submission row, no audit_log entry. Submitter fixes and retries. - 422 code=security_blocked — static_scan finding. Banner-only on the wire, plus one audit_log row (store.upload.security_blocked) carrying findings + sha256 + size for admin forensics. Quarantine + admin rescan/override apply only to the async LLM path (blocked_llm / review_error) — the cases that genuinely benefit from admin judgment. Spam-quota counter narrows to blocked_llm + review_error. Admin queue filter chip drops blocked_inline. Bundle TTL purge stops sweeping blocked_inline. Legacy blocked_inline rows from instances that ran the v30 contract remain reachable via the "All" tab. New _reject_inline_or_continue helper in app/api/store.py centralises the two-tier rejection across create_entity, update_entity, and restore_version. Frontend templates render the new payloads as inline banners (no redirect on failure) and keep submission_blocked as a one-release back-compat branch. Tests: new _seed_quarantined_entity helper replaces the older _make_eval_skill_zip-driven setup wherever a test needs a hidden+blocked_llm entity. 199 store tests pass under -n auto. * release: 0.54.8 — store inline hard-reject (BREAKING) Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.7 → 0.54.8) wrapping Vojta's hard-reject refactor. BREAKING for store-upload clients: validation failures now return 422 with `code='validation_failed'` (no entity row, no submission row, no audit_log entry) instead of the v30 `submission_blocked` 200 response that landed a hidden `blocked_inline` row. Frontend wizard + edit + restore still understand the legacy code for one release as a fallback for stale clients hitting an older deploy. Operators with custom integrations against `POST /api/store/entities` should update to handle the new `code='validation_failed'` / `code='security_blocked'` 422 responses. No DB migration required (legacy `blocked_inline` rows from instances that ran the v30 contract remain reachable via the admin queue's "All" tab; bundle-purge job no longer covers them but they linger harmlessly). --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-13 19:59:12 +00:00
Vojtech	1e87354d7e	feat(home): Getting Started + Overview + Usage modes sections (release 0.54.7) (#291 ) * feat(home): Getting Started + Overview + Usage modes sections Three new content cards rendered between the install-hero and the existing connector tiles on /home. Order: Getting Started → Overview → Usage modes → connectors. - Getting Started — dismissible card with two clickable rows linking to /setup (install flow) and /setup-advanced (deeper reference). Subsumes the legacy `.advanced-pointer` row that sat above the news section. Per-device dismiss via a generic localStorage handler: `.home-card-close[data-dismiss-key="..."]` inside a <section> wires itself up at page load — drop in any future dismissible card without per-card JS. - Overview — operator-owned HTML body sourced from the new `instance.overview` yaml field (env override `AGNES_INSTANCE_OVERVIEW`). HTML in, HTML out via the same `\| safe` filter as news_intro. Empty default hides the section entirely, keeping the OSS vendor-neutral; operators paste their product framing / privacy posture into instance.yaml. New helper `get_instance_overview()` in app/instance_config.py mirrors `get_instance_logo_svg()`. - Usage modes — three OSS-shipped tiles (Terminal / VS Code / Claude Desktop · claude.ai) explaining each surface and linking to the matching /setup-advanced anchors. Closes the gap for users wondering "where do I actually run this". Supporting changes: - setup_advanced.html gains a new `#claude-app` section between #vscode and #workspace, anchored by the Usage modes Claude Desktop tile. Covers the marketplace registration paths and when to prefer the terminal. Added to the table of contents. - Three new tests in test_web_home_page.py pin the Getting Started card markup, the Overview-on-when-yaml-set path, and the Overview-off-by-default path. All 13 tests in the file pass. Operator follow-up (separate infra PR — NOT this PR): paste the Foundry-specific Overview body into instance.yaml's `instance.overview` field. OSS ships with an empty default. * fix(home): Overview is operator-owned content — drop dismiss button Earlier iteration added a close X to the Overview section to match the Getting Started card's dismiss UX. Wrong call: Overview is operator-authored reference content (privacy posture, telemetry policy, project framing) and a per-device localStorage hide means returning users who want to re-read the policy can't recover it without clearing storage. Reverts the close button + the data-dismiss-key on the Overview section. Test inverted to assert the dismiss key is absent (defends against a future drive-by adding it back). Getting Started still dismisses — that's procedural getting-started content users legitimately stop needing once they've finished setup. Overview is always reachable; whole section is still opt-in at the operator level via the empty-yaml default. * fix(home): Terminal usage-mode tile is informational (no click-through) The setup hero above /home's Usage modes already walks the user through the Claude Code CLI install — the Terminal tile click-through to /setup just round-trips back to content the user already scrolled past. Switch Terminal to a non-anchor <div> and scope the hover affordance to a.home-usage-item so VS Code + Claude Desktop tiles keep their click-through (those legitimately deep-link into /setup-advanced anchors). * fix(home): point Usage modes guidance at ~/{workspace}/Projects/ subfolder The bundled plugin scopes the session-analysis loop and the central-catalog sync to ~/<workspace>/Projects/, not the workspace root itself — that convention already appears in the install hero's Step 4 manual-fallback note ('Don't create ~/<workspace>/Projects/ manually — the bundled plugin offers to set it up after install'). Usage modes' footer guidance now matches: 'create every project under ~/<workspace>/Projects/'. Also calls out that the session-analysis loop is scoped to that root so users understand why working outside the workspace dir is invisible to the platform.	2026-05-13 21:44:11 +02:00
Vojtech	14ddaf1e8e	feat(brand): wire instance.logo_svg into header brand slot (release 0.54.6) (#289 ) * feat(brand): inline operator SVG logo + drop header subtitle (release 0.54.6) Three header tweaks, one PR: 1. _app_header.html drops the small uppercase subtitle line below the brand. instance.subtitle still flows into the CLAUDE.md preamble + init welcome template ("Operated by …"); only the web header chrome loses it. 2. get_instance_logo_svg() in app/instance_config.py reads instance.logo_svg (yaml) / AGNES_INSTANCE_LOGO_SVG (env). The yaml field was already documented in instance.yaml.example and the template already supported inline <svg> via {{ config.LOGO_SVG \| safe }}, but router.py:344 hard-coded LOGO_SVG = "" — the middle wire was missing. Now operators can paste a lockup directly into their instance.yaml under instance.logo_svg: \| and have it render in the header. Resolution mirrors get_instance_brand (env > yaml > ""). instance.name remains independent: drives browser <title> tags + page h1s + CLAUDE.md heading; the SVG is the web-header visual only. 3. .app-header-logo svg gains max-height: 40px; width: auto; so any operator's lockup scales via its viewBox to fit the 72px header without per-asset width/height edits. Pairs with #2 — without the clamp, raw artwork (e.g. a 1600x430 lockup) overflows the chrome. Release-cut included per the same-PR rule (Unreleased contained only these bullets after rebase onto 0.54.5). * revert: keep app-header-subtitle span — out of scope for this PR Initial commit dropped the subtitle line on the assumption that the user wanted both the secondary header line AND the future-SVG brand cleaned up. The actual ask was narrower: drop the hostname suffix that renders inside instance.name ("Foundry AI (hostname)"), which is a startup.sh concern, not a template one. Restore the subtitle span and the CHANGELOG bullet that announced its removal. PR scope narrows to LOGO_SVG wiring + CSS clamp only. * fix(header): hide subtitle span when instance.subtitle is empty Pre-fix the template fell back to the literal string 'Data Analyst Portal' when INSTANCE_SUBTITLE was unset, so operators who left the field empty saw a stray hardcoded label below their brand. Switched to a Jinja {% if %} guard around the whole <span class="app-header- subtitle"> so an empty subtitle produces no element at all — clean header chrome instead of placeholder leak. * feat(home): hide install-hero once onboarded + X close button - Wrap the entire install-hero in `{% if not onboarded %}` so once `users.onboarded=true` (auto-flipped by `agnes init` POSTing /api/me/onboarded, or by the new X / existing fallback button) the blue hero disappears entirely. Pre-PR the onboarded branch reused the same shell with a "Welcome back" header + "Steps 1–4 done" badge + minimize toggle, which visually outweighed the actual nav hub. - Add a circular × close button (top-right of the hero, rendered only when not-onboarded). Click → window.confirm() asking the user to acknowledge onboarding → POST /api/me/onboarded → reload. The confirm string intentionally avoids the literal phrase "Mark me as offboarded" because cli/commands/onboarded.py::status scans /home's rendered HTML for that exact marker as a fallback for the api/me/profile check. - Lift the offboard escape hatch out of the hero into a discrete `.offboard-strip` rendered below, gated `{% if onboarded %}`. Lets the analyst flip back to the install view after wiping their workspace folder. - Centralize the /api/me/onboarded POST into a `postOnboarded()` JS helper reused by the hero X, the existing "Mark me as onboarded" fallback button, and the new offboard button. Tests updated to match the new behavior: - `test_home_onboarded_user_sees_nav_hub` — asserts the hero is gone and the offboard strip is the only setup-flow remnant. - `test_minimize_toggle_no_longer_rendered` (renamed) — asserts the minimize toggle is absent in both states (was previously rendered inside the now-hidden onboarded branch of the hero). - `test_home_no_auto_transition_after_post_until_reload` — checks offboard-strip presence post-flip instead of the removed "Welcome back" hero copy. * fix(home): X-close button used invalid source enum, hit 422 The X button's data-target-source was 'self_acknowledged_x' to give audit_log a separate marker for X-vs-button-driven flips. But app/api/me.py:38's OnboardedRequest pins source to a Literal of ['agnes_init', 'self_acknowledged', 'self_unmark'] — pydantic returned 422 on every X click. Confusing side effect: both buttons share self-mark-status as the status element, so the failed X click rendered 'Failed (422)' next to the still-functional 'Mark me as onboarded' button. Looked like the button itself broke. Fix: drop the _x suffix. Both surfaces now POST source='self_acknowledged'. Distinction in audit_log is not load-bearing — the source field captures user intent ('I'm onboarded'), not the specific UI affordance.	2026-05-13 17:25:46 +00:00
ZdenekSrotyr	471c63d711	docs(CLAUDE.md): release workflow recipe + issue economy anti-pattern guidance (#288 ) Two new sections that codify lessons learned from the v0.53.x → v0.54.x release cadence and from PRs #163, #277, #287: 1. Release workflow — concrete recipe (extends the existing "Release-cut belongs to the PR" rule). 8-step happy path for landing a release end-to-end, plus the operational quirks that bite every first-time contributor: - Use a fresh shallow clone in /tmp instead of an iCloud worktree (iCloud Drive randomly hangs on git operations) - Pick the next version: pyproject's current version is the post-cut next-target; verify against `git tag -l` before naming - Self-PR approve restriction (GitHub forbids self-approve; dismiss prior CHANGES_REQUESTED reviews before auto-merge) - CI quirks: `gh pr checks` glosses CANCELLED runs as `fail`; branch protection's strict mode caches cancelled `test` as blocking; required checks are only `test` + `docker-build` - Recovery patterns when force-push or wrong tag derails the release 2. Issue economy — fix or close, don't spawn (NEW top-level anti-pattern guidance). The default reaction to "I noticed something while doing X" is NOT "let me file an issue": - Mandatory checks before filing any follow-up: is the claim still true on main? Could you fix it in this PR (≤30 min, ≤1 file)? Is it a single-file change with obvious tests? Filing because you want to keep "this PR focused" is almost always wrong. - Audit-first reflex when investigating an existing issue: reproduce on current main BEFORE writing code; check if it's already fixed by an unreferenced PR; close moot issues with a closing comment that documents the audit. - Concrete patterns to avoid (4) + acceptable filing scenarios (4) + acceptable closing scenarios (4). Reference for the audit-first principle: PR #286's takeover review found the cited #163 leak doesn't fire on current main (writeable variant has zero callers; readonly callers all explicitly close). The deeper audit closed #163 + the speculative follow-up #287 — net zero new issues, problem audited and documented in the closing comments. Both sections sit between the existing "Release-cut belongs to the PR" and "Run tests before every push" sections so the release-related guidance reads as one coherent block.	2026-05-13 16:30:45 +00:00
ZdenekSrotyr	f35b53dba4	fix(db): get_analytics_db() singleton — close #163 (release 0.54.5) (#286 ) * fix(db): get_analytics_db() singleton mirroring get_system_db pattern Closes #163. Pre-fix `get_analytics_db()` opened a fresh `duckdb.connect()` on every call; most callers don't `.close()` the returned handle, so each leaked connection held a WAL ref + FD until GC kicked in. Under load this manifested as "too many open files" or DuckDB lock contention on the analytics DB. Singleton + cursor-per-call mirrors `get_system_db()` (lines 882-904 of src/db.py) — one underlying connection persists; callers that .close() the returned handle only close the cursor. Re-opens transparently when DATA_DIR changes (test fixtures that swap data dirs across cases). `get_analytics_db_readonly()` deliberately stays per-call — each invocation re-ATTACHes extract.duckdb files into a fresh read-only context, so caching the connection would require careful re-ATTACH bookkeeping the read-only path doesn't currently do. New `close_analytics_db()` mirrors `close_system_db()` (best-effort CHECKPOINT then close, swallow exceptions). Wired into the FastAPI shutdown hook in `app/main.py` alongside `close_system_db()`. 5 tests in `tests/test_analytics_db_singleton.py` pin the contract: - caches connection (two calls → same _analytics_db_conn) - closing a cursor doesn't close the underlying connection - DATA_DIR change → singleton drops + reopens - thread-safe (16 concurrent calls share the singleton) - close_analytics_db() clears state + reopen works Out of scope (per #163): auditing all caller sites to drop their now- redundant `.close()` calls. Closing a cursor is harmless; the production benefit is the connection cap. Verified: 4525 passed locally (4520 + 5 new), 1 pre-existing fail in test_readers_in_pre_init_dir (subprocess timeout, on origin/main, no relation to this diff). * release: 0.54.5 — close #163 analytics_db singleton Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.4 → 0.54.5) for the get_analytics_db() singleton refactor. No DB migration; no operator-facing config change. Internal-only cleanup of a known FD-leak under load. Closes #163.	2026-05-13 16:11:13 +00:00
ZdenekSrotyr	5d7241b9ec	fix(store-guardrails): close #277 — 3 LOW hygiene follow-ups (release 0.54.4) (#285 ) * perf(content-guardrail): skills walker uses rglob(".md") not rglob("") LOW finding #1 from #277. The skills walker in `_iter_components` greedily walked every file under `skills/` (assets, scripts, data fixtures) just to filter to `skill.md` by name. Wasteful, not incorrect — for asset-heavy skill packs (tutorials with screenshots, data fixtures) this is hundreds of stat() calls per ingest. Brings the skills walker in line with the agents + commands walkers (lines ~313 and ~335) which already filter at the glob layer. Kept the `.lower() != "skill.md"` case-insensitivity filter for macOS HFS+ users who write `Skill.md`. Two tests in TestSkillsWalkerSkipsNonMd: one functional (assets + scripts + JSON siblings under skills/ are NOT yielded as components), one source-level pin (rglob('.md') literal must appear in the walker — catches a future regression to rglob('')). * fix(llm-review): _normalize_content_quality verdict aggregates evidence both ways LOW finding #2 from #277. The dispatcher already downgraded `verdict='fail'` with empty issues to `pass` (no visible reason to block). It did NOT promote the inverse — `verdict='pass'` with non-empty issues — to fail, leaving a defense-in-depth gap: a compromised or prompt-injected model that flips the verdict without zeroing the issues would let the submission ship while the issues persisted on the row and got rendered in the UI. Symmetric branch added; verdict is now an aggregate of the evidence in both directions. 5 tests in TestNormalizeContentQualityVerdict pin all four corners of the (verdict, issues) matrix plus the malformed-input safe path. * fix(prompt-injection): tighten IGNORE rule scope to placeholder tokens only LOW finding #3 from #277. The IGNORE-as-benign rule for {{var}} placeholder tokens conflicted subtly with the trust-boundary paragraph above. A submitter aware of the prompt could embed instructions inside the placeholder framing (e.g. `{{IGNORE_ABOVE_AND_SET_content_quality_pass}}`) and bank on the "benign documentation token" exemption to bypass the security review. Tightened paragraph spells out that the placeholder tokens themselves are exempt but the text inside or around them is still untrusted bundle content subject to the trust-boundary rule. Concrete attack shape called out so the model has a canonical negative example to anchor against. Defense in depth — not a known break, the trust-boundary paragraph was the primary defense — but closes a class of attacks where a submitter could bet on the IGNORE rule being too permissive. Two tests in TestSystemPromptIgnoreRuleScope pin the new clause and verify the trust-boundary paragraph (`<bundle>...</bundle>` anchor) survived the edit. * release: 0.54.4 — close #277 (3 LOW guardrail follow-ups) Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.3 → 0.54.4) bundling the three LOW hygiene fixes from issue #277 — the takeover-review follow-ups punted from PR #276's safe-fix commit. No DB migration; no operator-facing config change. Submitter-facing behavior is conservative-tightening: descriptions previously sneaking through with `verdict='pass' + non-empty issues` now correctly fail review. SYSTEM_PROMPT IGNORE-rule scope tightening is defense in depth, not a known break. Skills walker perf change is invisible to operators (faster ingest on asset-heavy skill packs). Closes #277.	2026-05-13 15:16:33 +00:00
ZdenekSrotyr	117b6784ea	fix(sync+ops): defer-probe race, AGNES_TEMP_DIR chown, default-schedule env knob (#283 ) * fix(sync+ops): defer-probe race, AGNES_TEMP_DIR chown, default-schedule env knob Three sync-ops fixes surfaced during agnes-dev steady-state operation after the v0.46→v0.54 cutover settled. None of them depend on each other; bundled because they all live in the sync trigger / agnes-auto- upgrade flow and are diagnosed from the same observation window. 1. (fix) /api/sync/status race window. The trigger handler returned 200 BEFORE the background task acquired _sync_lock. In that few-hundred-ms gap, an honest /api/sync/status call returned locked=false — and the host-side agnes-auto-upgrade.sh defer probe fired right in that window proceeded with 'docker compose up -d' and SIGKILLed the just-spawning extractor / materialized worker. Observed on agnes-dev: 3 mid-sync container kills in 30 min, each followed by a few-min outage and a partial sync. The WAL replay auto-recovery (PR #217) kept the system DB consistent through each kill, but the actual sync work was lost. Fix: handler stamps _recent_trigger_at; status endpoint returns locked=true for _TRIGGER_HOLD_SEC (=30s) after the most recent trigger, even if the background task hasn't yet acquired the lock. 30s covers the schedule → spawn latency with margin; short enough not to indefinitely block auto-upgrade after a one-off trigger. Defense in depth: the real lock still gates the extractor subprocess. 2. (fix) scripts/ops/agnes-auto-upgrade.sh: post-upgrade chown loop now mkdir -p's /data/tmp before chown'ing, and includes it in the list of dirs that get the runtime UID:GID. /data/tmp is the default AGNES_TEMP_DIR set in docker-compose.yml — Snowflake-UNLOAD slice staging and CSV intermediates land here. Pre-fix the runtime user (uid 999) couldn't create /data/tmp under a root-owned data-disk root, so tempfiles silently fell back to the boot disk's overlayfs /tmp — defeating the whole point of routing slice staging onto the dedicated data volume. 3. (feat) AGNES_DEFAULT_SYNC_SCHEDULE env var sets the platform-wide fallback sync_schedule. Lets a deployment dial cadence down to 'daily 03:00' (data freshness budget once-per-day) without having to PUT every registry row. Per-table sync_schedule still wins; literal 'every 1h' is the floor if neither is set — OSS-historical default unchanged. Tests: - test_sync_status_trigger_hold_window_reports_locked_after_trigger - test_sync_status_trigger_hold_window_expires - test_default_schedule_falls_through_env_then_every_1h (3 branches) * release: 0.54.3 — sync defer-probe race + AGNES_TEMP_DIR chown + default-schedule env knob Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.2 → 0.54.3) bundling three sync-ops fixes from agnes-dev steady-state observation. No DB migration; trigger-hold window is additive (anything that already saw locked=true still does — the window EXTENDS the true period); /data/tmp chown is no-op when already correct; AGNES_DEFAULT_SYNC_SCHEDULE unset = every-1h default unchanged.	2026-05-13 09:44:20 +00:00
Vojtech	50a974f196	feat(store-guardrails): admin-configurable content thresholds (#281 ) * feat(store-guardrails): admin-configurable content thresholds Adds the flea-market content guardrail floors to the /admin/server-config editor so operators can tune the bar without code changes. Defaults are unchanged (60 chars description, 25 chars command, 5 distinct words, 200 chars body) — patching guardrails.* in instance.yaml or via the admin UI overrides any of them and the next inline check picks up the new value. src/store_guardrails/content_check.py now resolves the four floors via helper functions (_min_desc_chars / _min_command_desc_chars / _min_distinct_words / _min_body_chars) that read app.instance_config at call time. Module-level _DEFAULT_* constants remain as fallbacks if the import fails (defensive — keeps the guardrail module loadable without the app package on its path). app/instance_config.py grows four matching getters returning the live value with sane defaults + integer coercion. app/api/admin.py registers 'guardrails' as an editable section + ships nine known-fields entries (min_description_chars, min_command_description_chars, min_distinct_words, min_body_chars, enabled, review_model, blocked_quota_per_day, blocked_bundle_ttl_days, stuck_review_grace_seconds) with operator-facing hint copy explaining what each knob does. app/web/templates/admin_server_config.html gets a SECTION_META entry so the section renders as 'Flea-market guardrails' with a help string instead of a bare section ID. app/web/router.py threads the live thresholds into /store/new and /store/examples via a small _guardrail_thresholds() helper so the disclosure copy, char counter, and "Why these limits" table render the configured value (not a hardcoded 60). End-to-end smoke verified: PATCH guardrails.min_description_chars=90 → /store/new immediately renders "90 characters" + JS DESC_MIN=90 on the next request, no restart required (helpers read live config per call). * chore(store-guardrails): address PR review safe-fix findings Code-review safe_auto findings on PR #281 (review run 20260513-100126-64052520): - CHANGELOG: add Unreleased entry covering the new /admin/server-config Flea-market guardrails section, the four live threshold getters, and the route-helper rendering knobs. Required by the project's non-negotiable "Changelog discipline" rule. - content_check.py: narrow `except Exception` to `except ImportError` on the four `_min_()` resolver helpers. Surface-level TypeError / ValueError on a malformed YAML value belongs to the instance_config getters' own try/except — the resolvers should only defend against the in-tree import itself failing, not silently swallow real bugs in the getters. - store_upload.html: refresh the stale "30-char threshold" comment to reflect the configurable floor (default 60), and add `\|default(60)` / `\|default(25)` / `\|default(5)` filters to the disclosure-copy bindings so the upload form matches store_examples.html's belt-and-suspenders rendering if a future route ever renders the template without populating the `guardrail` context. - router.py: tighten `_guardrail_thresholds()` return annotation from bare `dict` to `dict[str, int]`. Residual work (left for separate change after operator direction): - Add round-trip test (PATCH guardrails -> next inline check uses new value) — primary testing gap. - Decide policy on `min_=0` (currently coerced to 1 via `max(1, int(val))`) vs treating 0 as a disable sentinel like neighbour getters (`blocked_quota_per_day`, `blocked_bundle_ttl_days`). - Add POST-time integer validation for `guardrails.` so a typo'd YAML value (bool / string / float) errors loudly instead of silently falling back to the default. test(store-guardrails): cover admin-configurable thresholds + PATCH round-trip Closes the "primary testing gap" Vojta noted in the safe-fix commit on PR #281 — the four new `get_guardrails_min_` getters and the PATCH-takes-effect-on-next-check live-config flow had no direct coverage. 10 new tests in `tests/test_store_guardrails_admin_config.py`: - TestGuardrailGetterDefaults (4 tests) — each new getter returns the documented default (60 / 25 / 5 / 200) when nothing is configured. - TestGuardrailGetterOverlay (5 tests) — overlay-driven overrides win, string values that look numeric coerce via int(), garbage strings fall back to default via the (TypeError, ValueError) branch, and the `max(1, int(val))` floor pins zero/negative inputs to 1. - TestPatchRoundTrip (1 test) — PATCH `/api/admin/server-config` `guardrails.min_description_chars=90`, then call content_check against a 75-char description that previously passed: must now fail with `too_short`. Then PATCH back to 60 and verify the next check passes again. Closes the cache-invalidation contract Vojta relies on for the "no app restart" claim — broken without the reset_cache() bracket in /api/admin/server-config. The TestGuardrailGetterOverlay.test_zero_or_negative_floored_to_one test pins the current `max(1, int(val))` policy. Vojta's safe-fix commit explicitly left "policy on min_=0 vs disable-sentinel" as residual work — pinning the current behavior here ensures any future change to use 0 as a disable sentinel must update this test (and the reviewer sees the policy decision). Verified: 4509 tests pass locally (4499 existing + 10 new). * release: 0.54.2 — admin-configurable flea-market guardrail thresholds + tests Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.1 → 0.54.2) bundling Vojta's admin-configurable thresholds for the flea-market content guardrail (9 knobs in /admin/server-config) plus the test coverage closing the "primary testing gap" he punted in the safe-fix commit. No DB migration; defaults unchanged from PR #276 — instances that don't set `guardrails.*` keep the original bar transparently. --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com> Co-authored-by: ZdenekSrotyr <139972147+ZdenekSrotyr@users.noreply.github.com>	2026-05-13 09:20:55 +00:00
dependabot[bot]	8f54ca3742	chore(deps): bump authlib from 1.6.11 to 1.6.12 (#282 ) Bumps [authlib](https://github.com/authlib/authlib) from 1.6.11 to 1.6.12. - [Release notes](https://github.com/authlib/authlib/releases) - [Changelog](https://github.com/authlib/authlib/blob/1.6.12/docs/changelog.rst) - [Commits](https://github.com/authlib/authlib/compare/v1.6.11...1.6.12) --- updated-dependencies: - dependency-name: authlib dependency-version: 1.6.12 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-13 11:11:04 +02:00
minasarustamyan	efc607f3ee	feat(cli): agnes marketplace search/detail/add/remove + retire stale subcommands (#280 ) * feat(cli): agnes marketplace search/detail/add/remove + retire stale subcommands Unified CLI surface for the v28+ marketplace: search across Curated and Flea Market (RBAC-filtered server-side), drill into a single item's detail, add/remove from your stack. Replaces opt-out era commands that no longer reflect how users compose their stack. CLI changes: - Added: agnes marketplace {search,detail,add,remove} - Removed: agnes my-stack toggle (opt-out semantics, curated-only) - Removed: agnes store {list,show,install,uninstall} (consumer-side ops moved under marketplace; store now covers only creator-side upload, update, delete, mine) ID format unifies curated and flea: marketplace_id/plugin_name (slash) routes to /api/marketplace/curated/..., bare UUID routes to /api/store/entities/... (flea bundles skills/agents into a synthetic plugin server-side, so the analyst sees a single add/remove surface). Templates: - claude_md_template.txt: rewritten marketplace section as operational guidance for Claude Code (discovery, stack management, behaviour notes). Dropped the static {% if marketplaces %} listing — the CLI is the source of truth for what's in the stack at any moment, so a snapshot rendered at init time would lie the moment the user runs agnes marketplace add/remove. Same discipline already applied to tables and metrics. - agnes_workspace_template.txt: cheat sheet adds 5 marketplace one-liners; keeps the file's reference-doc tone (the original commit's intent: 'what is this thing, how does it work, how do I uninstall it'). Docs: HOWTO/05-customizing-skills.md rewritten around the new CLI flow; the opt-out section is replaced by 'Removing items from your stack'. Tests: new test_cli_marketplace.py covers all four subcommands incl. RBAC/409 paths (system plugin guard, not-approved flea entity); test_cli_store.py trimmed to the retained creator-side commands. * release: 0.54.1 — agnes marketplace CLI redesign + retire stale subcommands Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.0 → 0.54.1) bundling the BREAKING removals of `agnes my-stack toggle` and `agnes store {list,show,install,uninstall}` plus the new unified `agnes marketplace {search,detail,add,remove}` surface. No DB migration; no operator-facing config change. Operators on floating tags (`:stable`) auto-upgrade transparently. Analyst CLI upgrade prompt fires on next `agnes pull`; users invoking the retired commands get "No such command" with the new `agnes marketplace` substitution called out in the BREAKING bullets. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com> Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-13 05:20:56 +00:00
ZdenekSrotyr	b4d3c576af	Activity Center: audit log + telemetry + sessions + agnes_* tables (#278 ) * docs(spec): admin observability spec + Activity Center MVP plan Parent spec (480 lines) + executable plan (2295 lines, 14 TDD tasks). Covers Activity Center rebuild (/admin/activity), with /admin/sessions and /admin/feedback deferred to follow-up plans. Already incorporates reviewer-pass revisions across three angles (security, production resilience, code architecture): - _get_db import path corrected to app.auth.dependencies - Test fixtures aligned with seeded_app / admin_user / get_system_db - All new audit writes wrapped in try/except + logger.exception - Filename sanitization on session uploads - DuckDB DESC index behavior documented; upgrade window flagged - Migration idempotency + evolved-DB test cases - reveal_raw + shared-cache multi-worker explicitly deferred Targets schema v40 (audit_log gains params_before, client_ip, client_kind, correlation_id + 3 indices). * feat(db): schema v40 — audit_log gains params_before, client_ip, client_kind, correlation_id + 3 indices * chore(test): clean up Task 1 — drop unused import, rename stale test * feat(audit): AuditRepository.log() accepts params_before/client_ip/client_kind/correlation_id * test(audit): strengthen params_before assertion to round-trip JSON content * feat(audit): AuditRepository.query() rich filters + keyset cursor pagination * feat(sync): SyncStateRepository.list_recent() cross-table feed * feat(audit): POST /api/sync/trigger writes audit_log row * feat(audit): POST /api/scripts/run-due writes audit_log row * feat(audit): POST /api/upload/sessions writes audit_log row + sanitizes filename * feat(audit): GET /api/data/{table_id}/download writes audit_log row * feat(activity): /api/admin/activity timeline + /health + /sync endpoints * feat(ui): /admin/activity rebuilt — health pulse, timeline, sync grid; /activity-center → 308 redirect BREAKING: removed demo executive-pulse / maturity-roadmap content from activity_center.html. The page now reflects real audit_log + sync_history data. * feat(ui): admin nav + dashboard widget point at /admin/activity * feat(activity): recursive-audit suppression for AC read endpoints (60s window per actor+filter) * feat(activity): emit PostHog events when integration enabled (no-op default) * fix(audit): move v40 indices out of _SYSTEM_SCHEMA + update test_repositories to unpack query() tuple _SYSTEM_SCHEMA CREATE INDEX on audit_log(timestamp) failed when migration tests hand-roll a bare audit_log (id, action) without the timestamp column. Fix: remove indices from _SYSTEM_SCHEMA; add ADD COLUMN IF NOT EXISTS guards for timestamp and other pre-v40 columns in _v39_to_v40() so the upgrade path is safe on any hand-rolled schema; call _v39_to_v40 explicitly in the fresh-install (current==0) path to restore index creation there. Also unpack the (rows, next_cursor) tuple from AuditRepository.query() in the three TestAuditRepository tests that still treated it as a list. * docs: CHANGELOG entry for Activity Center MVP * chore: refresh stale module docstring in app/api/activity.py * feat(cli): agnes admin activity — terminal access to Activity Center (timeline + health + sync) * fix(db): _v39_to_v40 — add IF NOT EXISTS guard for 'action' column The v39→v40 ladder step adds defensive ADD COLUMN IF NOT EXISTS for every audit_log column so a hand-rolled bare audit_log (id only) is safe through the ladder. 'action' was missing from the guard list, causing CREATE INDEX idx_audit_action_time to fail on tests that stub audit_log with only an id column (tests/test_e2e_extract.py:: TestSchemaMigration::test_migration_preserves_and_extends). Local 6/6 schema tests + the previously-failing CI test pass. * docs(spec): platform telemetry epic — Boss directive + Activity Monitoring plan rebased onto v40 (stacked on zs/spec-activity-center) * feat(db): schema v41 — 7 usage_* tables for telemetry (events, summary, rollups, attribution) * chore(db): tighten v41 — usage_session_summary.session_id NOT NULL + upgrade test asserts all 7 tables * feat(usage): UsageAttributionRepository — replace/delete/lookup over usage_attribution_* tables * refactor(marketplace): extract list_inner_skills/agents/commands to src/marketplace_listing.py for reuse * feat(usage): explode plugin attribution on marketplace sync + store entity write; backfill script * refactor(marketplace): finish src/marketplace_listing.py extraction — drop duplicate _list_inner_* + _parse_frontmatter from app/api/marketplace.py * feat(usage): promote attribution helpers to src/usage_attribution_helpers.py; hook update_entity rename + bundle-swap; clarify best-effort semantics * feat(usage): UsageProcessor real extraction + rollup rebuild + 10 fixture-driven tests * fix(usage): include tool_id in event hash + executemany + rollup transaction (critical multi-tool-turn drop fix) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(marketplace): popularity stats — invocations_30d + trend + sort=most_used\|trending + Most Popular section * feat(admin): /admin/users/<id> Sessions section — list + single-file + bulk-zip downloads (audit-logged) * feat(usage): admin export endpoint + CLI — csv/json/parquet streaming, filters, audit-logged * feat(usage): agnes admin ask — LLM Text-to-SQL over usage_events with SELECT-only validator (audit-logged) * feat(usage): reprocess + prune endpoints + scheduler daily prune job + CLI * docs: PLATFORM_SETUP.md operator playbook + HOWTO/ cookbook (5 guides + index) Adds docs/PLATFORM_SETUP.md as a consolidated operator playbook covering bootstrap, TLS, marketplaces (curated + flea), scheduler env vars, telemetry extraction/export/ask/prune, privacy posture, and daily routine. Adds docs/HOWTO/ with 5 analyst cookbook guides: first query, snapshots for remote tables, private sessions, feedback + admin ask, and customizing skills. Existing setup docs (QUICKSTART, DEPLOYMENT, ONBOARDING, HEADLESS_USAGE) get a one-line cross-reference at the top pointing to PLATFORM_SETUP.md. * docs(changelog): platform telemetry epic — usage_* foundation + surfaces + admin access + docs Comprehensive [Unreleased] entry covering: usage_events/session_summary/ tool_daily/plugin_daily tables (v41), attribution lookup tables, backfill script, marketplace Most Popular + invocation chips + sort, admin Sessions section, export/ask/reprocess/prune endpoints + CLI mirrors, Activity Center (v40), PLATFORM_SETUP.md + HOWTO/ docs, and operations notes for v41 upgrade. * fix(security): block DuckDB read_/http_/glob functions in usage_ask validator + symlink escape guard in session zip + clarify mark-private semantics * fix(admin): parquet export tempfile cleanup on COPY failure + correct processed-first sort on /admin/users/<id>/sessions * feat(audit): close 8 production audit gaps — query (local/remote/hybrid), catalog/schema/sample, snapshot estimate/create, check-access * feat(ui): /admin/usage summary dashboard + per-user activity tab on /admin/users/<id> * fix(audit): cap error messages at 200 chars + audit user_activity reads + recursion guard on usage.summary * fix(audit): catalog.list audits on error path + clean up deferred json import * fix(ux): client_kind=cli for PAT auth + timeline empty state + email-instead-of-uuid + nav reorder + help text + loading indicators + ask doc * feat(observability): unify /admin/activity into single page with saved views - KPI cards (events, users, error rate, p95) clickable as quick-filters - Faceted filter dropdowns populated from audit_log in the current window - Sortable audit table, cursor pagination, per-row JSON side panel - Saved views (schema v43: user_observability_views) — per-user state - Top bar: window selector + 30s Live toggle + saved views dropdown - /admin/scheduler-runs → 308 redirect (source=scheduler filter) - New endpoints: /api/admin/observability/{facets,kpis,views} * test: update activity + scheduler-runs tests for unified page - test_admin_activity_page_renders asserts new structural anchors - test_admin_scheduler_runs_page_admin_only asserts 308 redirect * fix(observability): respect [hidden] on modal + side panel CSS `display: flex` on .obs-modal beat the [hidden] attribute's UA display:none, so the save-view modal rendered on page load and Cancel clicks couldn't dismiss it. Gate the modal's flex layout on :not([hidden]); add the same display:none guard prophylactically to .obs-panel and .obs-views-panel. * feat(observability): user enrichment in audit + interactive /admin/usage Activity: - /api/admin/activity now joins users for user_email + user_name per row - User column renders "name (id-prefix)" or "email (id-prefix)" instead of an opaque truncated UUID; falls back to id when the user record is missing Usage: - /admin/usage rewritten as the same filter/group-by/search pattern as /admin/activity. Faceted dropdowns (User / Tool / Source / Event type) populated from usage_events; debounced free-text search across tool_name / skill_name / subagent_type / command_name - New endpoints /api/admin/usage/{facets,kpis,query}; the query endpoint supports group_by in {day, username, tool_name, source, ref_id} with sort + offset pagination, plus an ungrouped raw-events mode - 4 KPI cards (events, distinct users, distinct tools, error rate) are clickable quick-filters; clicking a grouped row applies the bucket as a filter - Old static `?window=7d\|30d\|all` server preload removed; all state is client-side via since_minutes + group_by + filters in the URL * fix(observability): clearer labels, all-column sort, drop saved views UI - Rename page titles: "Activity" → "Server activity", "Usage" → "Tool usage" with a one-line subtitle on each explaining what the page covers and linking the other one. The two pages source different data (audit_log vs usage_events) and the previous labels conflated them. - Drop the saved-views dropdown + save modal from /admin/activity. The modal pop-open bug was the trigger; the value wasn't there yet. The /api/admin/observability/views CRUD + DuckDB table stay in place. - Rename "Live (30s)" to "Auto-refresh (30s)" with a tooltip clarifying that it's the re-fetch rate, not the time range. Time range now labeled "Time range" instead of "Window". - All audit-table columns are sortable (User, Source, Action, Resource, Result added); sort is page-local with a Jinja comment explaining the trade-off. Same for raw usage rows. - Fix duplicate sort-arrow bug — the literal "▼" in the Time th HTML was rendering alongside the CSS ::before arrow. Removed the literal; CSS is the single source of truth. * feat(observability): global Sessions browser + transcript viewer + CLI Web: - /admin/sessions — list every collected session JSONL across all users with time-range, user, model, errors-only and free-text filters. Default sort surfaces error-heavy sessions first. KPI cards (sessions, distinct users, sessions w/ errors, tool error rate) clickable as quick-filters. - /admin/sessions/<username>/<file> — transcript viewer rendering the JSONL chronologically: user prompts, assistant text, tool calls (with JSON input) and tool results (with flattened output). Errors get a red border + chip and a "Next error" navigation button at the top. - Admin dropdown gains a "Sessions" link. API: - GET /api/admin/sessions/{list,kpis,facets} — filtered cross-user reads off usage_session_summary - GET /api/admin/sessions/{username}/{file}/transcript — parses JSONL via the existing services.session_pipeline.lib, returns chronological events - GET /api/admin/sessions/{username}/{file}/download — JSONL stream, same path-safety guards as the per-user endpoint, audit-logged CLI: - `agnes admin sessions list [--user X] [--errors] [--since 7d]` — table output with `!` prefix on rows that hit a tool error - `agnes admin sessions show <username> <file>` — transcript dump, with `--errors` to print only the failed tool_result blocks - `agnes admin sessions download <username> <file> [-o path]` - `agnes admin sessions kpis` — top-level numbers * feat(internal): expose telemetry tables to agnes query with row-level RBAC Three new registered tables backed by system.duckdb, queryable through the same /api/query plumbing analysts use for Keboola / BigQuery / local sources: agnes_sessions → usage_session_summary (filter: username) agnes_usage → usage_events (filter: username) agnes_audit → audit_log (filter: user_id) RBAC is per-row, not per-table: admins see every user's rows; non-admins see only their own. The filter is built server-side from the auth user dict; non-admin filter values are regex-validated before SQL interpolation. Implementation: - new connector connectors/internal/ with access (filter+exec) + registry (idempotent table_registry seed at startup) - /api/query detects internal table refs and short-circuits to a CTE wrapper that prepends "WITH agnes_x AS (SELECT * FROM <src> WHERE …), …" then "SELECT * FROM (<user_sql>) AS _q". DuckDB cursor on the shared system.duckdb handle — opening parallel handles / ATTACH on the same file is blocked process-wide. - mixing internal + BQ / registered local tables in one SELECT is rejected (v1 limitation) - src.rbac.can_access_table waves internal tables through for all authenticated users; row scoping is the actual security control - /api/v2/schema and /api/v2/sample gained internal branches; sample intentionally skips its cache because rows are RBAC-scoped per caller - audit row written as action='query.internal' with is_admin flag Tests: connectors/internal/access — RBAC, filter clause, schema, CTE wrapper coexistence with user-supplied aggregations, unsafe-username rejection. 16/16 passing. Motivating queries this enables: SELECT tool_name, COUNT() FROM agnes_usage WHERE is_error GROUP BY 1 ORDER BY 2 DESC -- analyst self-introspection: which tools fail for me? SELECT user_id, COUNT() FROM agnes_audit WHERE action = 'session.transcript_view' GROUP BY 1 -- admin: who's been looking at whose session transcripts? * feat(admin): group dropdown into 5 named sections + internal tables in /catalog Admin dropdown gains section headers so admins can land on the right page without re-reading the full menu: Activity Center Server activity / Tool usage / Sessions Users & Access Users / Groups / Resource access / Tokens Data Tables Agent Experience Curated Marketplaces / Flea Submissions / Agent Setup Prompt / Agent Workspace Prompt Server Server config "Agent Experience" frames the curated content + prompts as one cluster — it's all admin-controlled material that shapes what an analyst's AI agent encounters. "Configuration" → "Server" since only one item lives there now. Renamed the section's first two items: "Activity" → "Server activity" (matches page H1) "Usage" → "Tool usage" Also fixes /catalog visibility of the internal tables (agnes_sessions / _usage / _audit) for non-admin users: ``app.auth.access.can_access`` short-circuits to True for resource_type='table' + an internal-table id. Without this, non-admins saw the tables in /api/v2/catalog (which uses the same RBAC bypass) but not on the /catalog HTML page (which calls can_access directly, requiring a resource_grants row internal tables don't have). CSS for `.app-nav-menu-section`: small caps, muted, non-clickable; first section trims top padding so the panel doesn't open with an awkward gap. * refactor(admin): move corporate memory into Admin > Agent Experience Memory link was the only admin-only entry in the primary nav (gated by session.user.is_admin). Moves it into the Admin dropdown under Agent Experience, alongside Curated Marketplaces / Flea Submissions / Prompts — all admin-curated content that shapes what an analyst's AI agent encounters. Renamed the nav label to "Shared Knowledge" to match what the page actually is (admin-curated organisational knowledge from session verification, surfaced to agents). URL stays at /corporate-memory; the route still gates on require_admin per the existing comment. Side effect: primary nav (Home / Marketplace / Data Packages) is now uniform for every authenticated user — no conditional admin-only entry. * ui: rename admin entries to Curated Knowledge / Init Prompt / Workspace Prompt - "Shared Knowledge" → "Curated Knowledge" (parallel with "Curated Marketplaces" in the same Agent Experience section; "curated" tells the admin what they do there — review + approve) - "Agent Setup Prompt" → "Init Prompt" (matches the `agnes init` flow it actually drives) - "Agent Workspace Prompt" → "Workspace Prompt" (the "Agent" prefix was redundant — every item in the section is agent-facing) Renames page titles + H1s on /admin/agent-prompt and /admin/workspace-prompt to match. * refactor: rename Usage → Telemetry across user-facing surfaces External surfaces all switch; internal Python module / file names and the physical DB tables (usage_events, usage_session_summary, usage_tool_daily, usage_plugin_daily) stay — renaming them would force a schema migration + a redo of the LLM Text-to-SQL prompt for no analyst-visible win. Changes: - Admin dropdown: "Tool usage" → "Telemetry" - Page H1 / <title>: same - URL: /admin/usage → /admin/telemetry; old URL 308-redirects - API prefix: /api/admin/usage/* → /api/admin/telemetry/* - CLI: primary command `agnes admin telemetry …`; `agnes admin usage` kept as a deprecated alias so existing operator scripts keep working - Internal data-source table id: agnes_usage → agnes_telemetry. The registry seed now evicts any stale internal-source row whose id no longer matches INTERNAL_TABLES, so the old `agnes_usage` row is removed from table_registry on next app boot - All tests + JS endpoint paths updated * test(rbac): include auto-appended internal tables in expectations get_accessible_tables now appends agnes_sessions / agnes_telemetry / agnes_audit to every authenticated user's accessible-tables list so the internal data source shows up in /catalog. The two existing rbac tests asserted hardcoded list shapes that pre-dated the change. Rewritten to assert "granted tables + the canonical internal-table set" instead of literal lists, so the test stays correct if the internal table roster changes again later. * ui: visual dividers between admin-dropdown sections Adds a 1px top border + 6px top margin to every section header except the first, so the five named groups (Activity Center, Users & Access, Data, Agent Experience, Server) read as visually separated clusters. The header itself stays small-caps + muted as before — the border is additive. * ui(memory): match obs-topbar visual on /corporate-memory The Curated Knowledge page (linked from the admin dropdown's Agent Experience section) opened straight into the stats bar — no title, no subtitle, no shared chrome with the other admin pages. Adds an obs-topbar-style header at the top of .container-memory: - H1 "Curated Knowledge" - subtitle explaining what the page is + how AI agents pull from it The `.ck-` class set duplicates the inline obs- styles from /admin/activity etc. for this one page; promoting the obs-* class set to style-custom.css for shared reuse is the obvious next step (4 pages already inline the same CSS), tracked as a follow-up. Page <title> also renamed from "Corporate Memory" → "Curated Knowledge". * ui(tables): list Agnes internal tables in /admin/tables + group in /catalog /admin/tables previously rendered three per-source-type listings (BQ / Keboola / Jira) and dropped any row whose source_type didn't match — so the agnes_sessions / agnes_telemetry / agnes_audit rows seeded into table_registry were invisible. Adds a fourth read-only section "Agnes internal tables" that filters source_type === 'internal' and renders the same registry-table layout the other sections use, with two changes: - no Register button (these rows are seeded on every app boot from connectors/internal/registry.py) - Edit + Delete actions hidden (any change would be reverted on the next start). Manage access stays so admins can still inspect. Mode badge picks up a new mode-internal CSS class (teal accent) so the display doesn't lie and call it "local". In /catalog, internal tables now group under an "agnes" accordion section (bucket="agnes" on seed) instead of falling into the catch-all "default". Single source of truth for which tables exist; admins find them where they expect. * ui(tables): Agnes internal as a 4th tab next to BQ/Keboola/Jira Previous iteration mounted the internal-table listing as a separate standalone card under the tab strip. Reshapes it to a proper tab-content section so admins switch between data sources via one consistent nav (BigQuery / Keboola / Jira / Agnes internal). - New tab button "Agnes internal" in the tab-nav. - The listing card becomes <section id="tab-content-internal" class="tab-content">; switchTab() already routes by id so no JS change beyond extending the hash allowlist for direct #internal links. - Tab content keeps the read-only treatment from the previous commit (no Register button, no Edit / Delete in renderRegistryListing). * ui: rename Curated Knowledge → Curated Memory Settles the naming back on "Curated Memory" — parallel structure with "Curated Marketplaces" in the same Agent Experience section, and zero rename ripple: URL (/corporate-memory), API (/api/memory/), CLI (agnes admin memory), and Python modules all stay on "memory" so the admin label finally lines up with the underlying surfaces. The "Curated" prefix still tells admins what they do on the page (review pending → approve / mandate / reject) and reads as a sibling of "Curated Marketplaces" right next to it in the dropdown. Touches: admin dropdown label, page <title>, page H1. DB tables stay on knowledge_ (already the canonical naming for the data shape). * ui: rename "Server activity" → "Audit log" "Audit log" is what the page actually is — server-side audit_log table rendered with KPI cards + filter bar + sortable table. The "Server activity" label confused the term with Claude Code session telemetry (Telemetry page) and didn't make the source/concept clear. Touches: - Admin dropdown nav label - /admin/activity page H1 + subtitle - /admin/telemetry subtitle cross-link - test_activity_api page-renders assertion URL (/admin/activity) and API (/api/admin/activity/) stay — the "activity" name has stuck at the route layer for a year; rerouting those would churn dashboards/bookmarks for zero analyst-visible win. ui(admin-nav): gray band on each section header for clearer separation Previous iteration used a 1px top border between section labels — the labels still blended into the items above/below at a glance. Switches to a light gray background band per section header, extended edge-to- edge inside the panel via negative horizontal margins. Bolder font-weight (700) reinforces the separation; bumping the font color isn't needed because the band itself does the work. First section's header tucks into the panel's top border-radius so the band reaches the corners without a gap. * ui(catalog): rename internal-table category to "Agnes Internal" `bucket` is what /catalog renders as the accordion category header verbatim — "agnes" lowercase didn't read as a real category name and got confused with a system identifier. Bumps to "Agnes Internal". Seed re-applies on every app boot so existing rows pick up the new bucket value via `ON CONFLICT (id) DO UPDATE`. * ui(catalog): split Agnes Internal into its own card on /catalog Previously the three internal tables landed inside the "Core Business Data" card under an "Agnes Internal" accordion alongside Keboola / BQ buckets — readers conflated system telemetry with business datasets, and the data_stats header counter ("3 tables · ~X rows total") only ever counted synced rows so internal tables looked invisible. Split the catalog page into two cards: - Core Business Data: only non-internal source_types (Keboola, BQ, Jira). Accordions group by bucket as before. Stats counter reflects this card's tables. - Agnes Internal: a dedicated card with its own visual treatment (teal accent matching the mode-internal badge in /admin/tables). Flat list (no accordion — only 3 rows, never grows here), each row carries the canonical `agnes query` snippet. Read-only — no profiler click, no In-stack toggle, no sync metadata. Route adds `internal_card` context object; template renders the new card only when it's non-None. * fix(rbac): hide internal tables from /admin/access + drop "my" framing Two related cleanups for the Agnes-internal tables: 1. /admin/access (resource grants) no longer lists them. The `can_access` check has a hardcoded internal-table bypass — security is row-level (per-request view filter), so a table-grain `resource_grants` row would do nothing. Surfacing them in the UI let admins set up grants that silently no-op. Filter at the `_table_blocks` projection so the UI tree never sees them. 2. Display names drop the analyst-perspective "my" framing: "Agnes — my sessions" → "Agnes sessions" "Agnes — my telemetry events" → "Agnes telemetry events" "Agnes — my audit log" → "Agnes audit log" The "my" only makes sense from the querying analyst's seat (`SELECT … FROM agnes_sessions` returns their rows); on /admin/* pages where admin sees / configures them across users, the pronoun was misleading. Description text now spells out the row-level RBAC contract explicitly. Display names update via TableRegistryRepository.register's ON CONFLICT UPDATE on next app boot; no manual cleanup needed. * ui: subtitle notes about agnes_* tables on each Activity Center page The recursive observability story — Agnes serves its own audit / telemetry / session data through the same `agnes query` plumbing analysts use for business data — wasn't surfaced anywhere on the admin pages that show that data. Three pages get a one-liner with the canonical `agnes query` snippet + the RBAC contract (analysts see their own rows, admin sees all): - /admin/activity (Audit log) → agnes_audit - /admin/telemetry (Tool usage) → agnes_telemetry - /admin/sessions → agnes_sessions Sets up the discovery moment for admins: they're reading the page, they see "you can query this from Claude Code", they remember it when an analyst asks "how do I find my own failed tool calls?". * ui(tables): explain "Show log" empty-state on /admin/tables Cache warmup log <pre> renders with a dark background and is only populated by the SSE stream during a Re-warm all run. Opening the page cold + clicking Show log just revealed a black bar with no context — admins couldn't tell what they were looking at. Adds an inline paragraph above the <pre> explaining what the log is, the row format, when it fills in, and where to find the historical audit trail (/admin/activity). The actual <pre> stays empty until SSE events arrive, but the surrounding copy carries the meaning. * ui(tables): auto-open cache-warmup log on Re-warm all click A Re-warm all run takes ~24s per remote BQ row. With the <details> collapsed by default, operators saw the button disable, watched a quiet ~24s pass, and assumed nothing had happened — the streaming log was hidden behind a closed disclosure. Two small JS tweaks: - cacheWarmupRun() opens the details on click, so streamed lines appear without an extra interaction - cacheWarmupOnStart() hides the inline hint paragraph the moment real log content lands, so the dark log block isn't competing with redundant context Hint paragraph also clarifies that only `query_mode='remote'` BQ rows are warmed — operators with only materialized/internal tables would see total=0 and the page would "do nothing" by spec. * ui: trim Agnes internal copy across surfaces Descriptions had grown to explain the extraction pipeline ("parsed out of session JSONLs"), the underlying table ("Backed by usage_session_summary"), the RBAC mechanic ("row-level RBAC at query time — analysts see their own; admin sees all"), and the SQL snippet. Every implementation detail meant another rewrite on the next iter. Strips to one stable line per surface: what the data is, plus "Also available locally for analysis". Mechanics live in code + docs; the page copy says what the user needs to know. Touched: - connectors/internal/access.py: INTERNAL_TABLES descriptions - activity_center.html / admin_usage.html / admin_sessions.html subtitles - catalog.html Agnes Internal card description + row strip - admin_tables.html "Agnes internal" tab hint * fix(internal): is_user_admin arity bugs + + saved-view payload cap Round-1 code review (PR #278) caught two blocking bugs and three nits. Blocking — both `is_user_admin(user)` (single dict arg) calls raised TypeError. is_user_admin signature is `(user_id, conn)`. Affected: - app/api/query.py:_run_internal_query — every POST /api/query that references agnes_sessions / agnes_telemetry / agnes_audit blew up with a 500. The headline analyst-facing feature of this PR was unusable through the API. - app/api/v2_sample.py — same shape; `GET /api/v2/sample/agnes_` returned 500. Both fixed to call `is_user_admin(user.get("id"), conn)`. Added two FastAPI-level tests in test_internal_data_source.py that go through the TestClient — the existing unit tests on `execute_internal_query` and `build_filter_clause` skipped the request-handler layer where the bugs lived, which is why this landed. Nits also closed: - connectors/internal/access.py: `+` allowed in _USERNAME_RE / _USER_ID_RE so RFC 5321 email local-parts (alice+test@x) resolve correctly without hitting InternalAccessError. - app/api/observability.py: saved-view payload capped at 64 KiB to prevent an admin from bloating system.duckdb with a malformed save. fix(security): close non-admin data-leak via underlying-table refs PR #278 R2 review surfaced a non-admin-exploitable bypass: SQL whose string literal contains 'agnes_sessions' routed into the privileged internal-query path, then queried the underlying physical table (usage_session_summary / usage_events / audit_log) directly, escaping the CTE wrapper's row filter. Two reinforcing defenses: 1. find_internal_refs() now strips single-quoted string literals before scanning for alias names — a literal alone no longer routes the request into the privileged code path. 2. execute_internal_query() rejects non-admin SQL that references the underlying physical tables (usage_, audit_log). The CTE wrapper only scopes the agnes_ aliases; a direct FROM on the base table — or a shadowing inner WITH that still has to read the base table — bypasses RBAC. Block before execution with an actionable error pointing to the agnes_* alias. Admins are unaffected (god-mode short-circuit on the filter clause). 3. tests/test_internal_data_source.py — three new negative tests covering literal-only matches, direct-table refs, and CTE shadow attempts. Also tightens usage_ask.py's SELECT-only validator: pragma_table_info, pragma_storage_info, pragma_database_, and duckdb_tables / columns / views / indexes / schemas are reflection functions that leak metadata the analyst question shouldn't reach. \bPRAGMA\b in _FORBIDDEN never matched the function-call form (word-boundary between `A` and `_`). fix(security): dynamic denylist for non-admin internal queries R3 review (PR #278) caught a wider data-leak than R2: the underlying- physical-table guard listed only the 7 usage_* + audit_log tables, but system.duckdb has 30+ other sensitive tables — users (emails + ids), personal_access_tokens, resource_grants, user_groups, user_observability_views, store_, marketplace_, knowledge_, etc. A non-admin SQL like SELECT FROM agnes_sessions UNION ALL SELECT email, id, … FROM users LIMIT 1 would leak every user's row. Replaces the hardcoded denylist with a dynamic allowlist — non-admin SQL may reference ONLY the registered agnes_* aliases. Every other table in `information_schema.tables` (main schema) is rejected. Future migrations that add a new sensitive table are automatically covered without re-editing this module. Also strips SQL comments (`/* /` and `--`) before the identifier scan so a comment-wrapped table name (`//users//`) can't slip past the regex. Four new negative tests pin: `users`, `personal_access_tokens`, block-comment wrap, line-comment wrap. Plus: per-user view-count cap (100) on /api/admin/observability/views so an admin can't fill system.duckdb with thousands of saved views. release: 0.54.0 — Activity Center + Telemetry + Sessions + internal datasource Cuts the work shipped across this PR (Activity Center build, recursive internal data source) into a versioned release. Bumps pyproject.toml to 0.54.0; renames the top of CHANGELOG.md from [Unreleased] to [0.54.0] — 2026-05-12 with a header summary; opens a fresh [Unreleased] section for the next round. --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 22:41:19 +02:00
Vojtech	fb6e930bc9	feat(store-guardrails): per-component description quality + plain-language UX (#276 ) * feat(store-guardrails): enforce per-component description quality Two-tier hard guardrail on flea-market submissions. Empty / placeholder / single-word descriptions now block before any LLM call; vague-but-passes- floor descriptions block on the substantive LLM review layer. Tier 1 — inline mechanical check (src/store_guardrails/content_check.py). Walks the baked plugin tree, evaluates each component (plugin manifest, agents, skills, commands) plus the submission-level form description against a 60-char / 25-char (commands) / 5-distinct-word / 200-char-body floor with a placeholder denylist (TODO, TBD, {{var}}, etc.). Floors calibrated against real ecosystem norms: Claude / superpowers / compound-engineering skill packs cluster 150–220 chars, npm / Docker / VS Code at 100–120. InlineResult.passed now ANDs in content.status. Tier 2 — LLM review extension (prompts.py + llm_review.py). System prompt gains a content-quality criterion; REVIEW_JSON_SCHEMA carries a content_quality {verdict, issues[]} object alongside the existing security findings. is_safe() requires content_quality.verdict == 'pass'. Single LLM call covers both dimensions. MAX_RESPONSE_TOKENS bumped 2000 → 2500 for the extra payload. Verdicts missing content_quality treated as pass (backwards compat with already-recorded rows). Submitter UX: - /store/new wizard now carries a "Before you upload — what passes review" collapsible disclosure on both step 1 and step 2 with the bar + patterns that work. Live char counter on the description field. Per-component preview table (green/red dots from the new summarize_for_preview helper) renders after the ZIP /preview round trip, scoping each finding to its file. - New /store/examples page with rejected/passes pairs for skill / agent / plugin / command plus a "Why these limits" research table. Anchored sections (#skill / #agent / #plugin / #command) so the rejection banner can deep-link by component_type. - Quarantine banner _content_findings.html groups findings by file (one "See <type> example ↗" per component, not per field) and translates field codes (frontmatter.description / body / etc.) to plain-English labels. _content_howto_fix.html surfaces a static "Re-upload as new version" + "See examples" action row beneath any content failure on the entity detail page. - _parse_frontmatter moved to src/store_guardrails/_frontmatter.py so the new check module shares the parser without inverting the app → src dependency direction. Tests: - New tests/test_store_guardrails_content.py (29 cases) covering every failure code per component type plus submission-level checks and the summarize_components / summarize_for_preview helpers. - Extended test_store_guardrails_inline.py for the new InlineResult.content field + aggregate behaviour. - Extended test_store_guardrails_llm.py for the new content_quality verdict pathways (fail blocks, missing field passes). - Backfilled fixture descriptions across test_store_api.py, test_store_entity_versions.py, test_store_put_atomic.py, test_admin_store_submissions.py, test_marketplace_api.py, test_marketplace_v32_endpoints.py so existing happy-path tests clear the new 60-char floor. * fix(content-guardrail): align agents walker with preview + drop import-time .format() Two cleanups from the takeover review on #276 (vr/guardrails-content). 1) `_iter_components` for agents now skips files lacking frontmatter (no `name` AND no `description`). Pre-fix the walker greedily evaluated every `.md` under `agents/` — `agents/README.md` and helper docs got flagged as "frontmatter.description empty" rejections. Worse: `summarize_for_preview` for `type=agent` ALREADY filters the same shape, so the upload preview gave a green dot while the post-bake check gave a red rejection on submit. Two new regression tests in TestAgentsWalkerSkipsNonAgentFiles pin both shapes (README + _NOTES.md) so the preview/check parity stays aligned. 2) `body_too_short` hints now use the same runtime-kwarg substitution pattern as every other hint in the table. Pre-fix the skill + agent body_too_short hints called `.format(min_chars=_MIN_BODY_CHARS)` at module-load time, but the call site `_hint_for(type_, "body_too_short")` didn't pass `min_chars=`, so the format() was just baking the constant at import. Cosmetic inconsistency; pass `min_chars=_MIN_BODY_CHARS` at the call site instead and let `_hint_for` do the substitution like it does for `too_short`. Verified end-to-end: - New TestAgentsWalkerSkipsNonAgentFiles cases fail on the unfixed walker (verified by reverting to the pre-fix file and re-running); pass cleanly after the fix. - Full content-guardrail suite: 25/25 (23 existing + 2 new). - Full pytest: 4189 passed, 25 skipped. release: 0.53.5 — content guardrail (flea-market submitter UX) + catalog ENTITY column + BQ hint dispatch Bundles three threads landed in [Unreleased]: - Vojta's flea-market content guardrail (two-tier mechanical + LLM) - Zdeněk's `agnes catalog` ENTITY column replacement for FLAVOR - Zdeněk's `/api/query` remote_estimate_failed hint dispatch fix Plus the takeover hygiene from #276 review (agents walker preview/check parity + body_too_short hint runtime kwarg consistency) and the backslash-escape fix follow-up to v0.53.4 #275. No DB migration; no API change. Patch upgrade lands transparently. Upload form's new "Before you upload" disclosure + per-component preview table appear on the next dev-VM auto-pull. Quarantine banner now groups findings by file with "See <type> example ↗" deep-links to the new /store/examples reference page. --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-12 21:48:27 +02:00
ZdenekSrotyr	1ade1300c6	fix(bq-hint): drop literal backslash escapes from syntax-error hint string (#275 ) PR #274 (just merged) introduced `\`AS \\\`rows\\\`\`` in the syntax-error branch of _hint_for_bq_bad_request. Python doesn't recognize \\\` as an escape sequence, so the literal backslashes survived into the JSON `hint:` field. Analyst reading the CLI error saw: Backtick the alias (`AS \`rows\``) or rename it ... with visible backslashes — exactly the misleading shape this dispatcher exists to clean up. Self-review caught it; this PR replaces the problematic substring with plain prose ("rename the alias to a non-reserved word (AS row_count) or backtick-quote it BQ-style (AS `rows` with literal backticks around the identifier)") that needs no escape gymnastics. New regression test test_no_hint_branch_leaks_literal_backslashes pins every dispatch branch against `\\\`` and `\\\\` substrings — pytest now catches this class on the next regression instead of waiting for an analyst to spot it. CHANGELOG bullet rephrased to match (the same broken backslashes leaked into the [Unreleased] entry). Verified: 4162 tests pass; 26 in test_api_query_guardrail.py green; demo print of the syntax-error branch shows clean output.	2026-05-12 18:57:46 +00:00
ZdenekSrotyr	5458ccc41b	hygiene: BQ error hint dispatch + catalog ENTITY column (#274 ) Two analyst-UX papercuts surfaced by the v0.53.4 onboarding smoke test. 1) /api/query remote_estimate_failed hint now branches on the BigQuery error class instead of always claiming a column doesn't exist. The previous hardcoded "Most often this means a column referenced … doesn't exist" misled analysts whenever BigQuery actually rejected on syntax — concretely, `SELECT COUNT(*) AS rows FROM …` fails with `Syntax error: Unexpected keyword ROWS at [1:20]` (`rows` is a BQ reserved word) and the hint pointed at non-existent columns. New _hint_for_bq_bad_request() helper dispatches: - "Syntax error" / "Unexpected keyword" → reserved-keyword alias hint with `AS row_count` workaround - "Unrecognized name" / "not found inside" → `agnes schema <id>` - "Table not found" → `agnes catalog` - fallback → enumerate all three 4 unit tests in TestHintForBqBadRequest pin each branch. Existing guardrail tests (test_fallback_fails_fast_on_pure_duckdb_syntax, test_remote_estimate_failed_surfaces_first_error_when_attempts_differ) continue to pass — both hint substrings they assert on still appear in the relevant branches. 2) `agnes catalog` replaces the FLAVOR column with ENTITY. FLAVOR rendered t['sql_flavor'] which duplicated SOURCE for any catalog dominated by one source type — analysts saw `SOURCE=bigquery FLAVOR=bigquery` on every row. ENTITY instead surfaces the upstream BigQuery entity_type (BASE TABLE / VIEW / MATERIALIZED_VIEW) for remote rows; non-remote rows render `-`. The distinction matters operationally: views don't support predicate pushdown, so `agnes query --remote` against a view trips the cost guardrail where the same query against a BASE TABLE pushes down cleanly. The entity_type field has been in the v2 catalog response since 0.51.0; this PR just stops hiding it behind a column header that conveyed no information. JSON output (`agnes catalog --json`) is unchanged — only the human- readable column changed. No DB migration; no API change. Verified: 4161 tests pass locally; 25 in test_api_query_guardrail.py green; the 4 new TestHintForBqBadRequest cases pin each branch.	2026-05-12 18:32:29 +00:00
ZdenekSrotyr	0407d194ba	ci: fix indentation in cli-wheel-clean-install Python heredoc (#273 ) The cli-wheel-clean-install lane introduced in v0.53.4 (#272) failed on its first main run with `IndentationError: unexpected indent`: YAML `run: \|` preserves the relative indent of the inline `python3 -c` heredoc, so the Python interpreter saw `try:` at column 12 and refused to parse. Fix: write the assertion script to /tmp/smoke.py via a `cat <<'PY'` heredoc (left-aligned content lands flat), mount it into the container, and invoke the tool's venv python directly (`$HOME/.local/share/uv/tools/agnes-the-ai-analyst/bin/python`). Cleaner than the previous inline form and side-steps `uv tool run --from <name>` doing a PyPI lookup that fails because we don't publish there. Verified locally with the same docker run as the CI step — prints `OK: kbcstorage absent, urllib3 2.7.0`.	2026-05-12 17:32:28 +00:00
ZdenekSrotyr	103669dafd	fix(cli-install): move kbcstorage to [server] extra so wheel installs cleanly (P0 onboarding hotfix → 0.53.4) (#272 ) * fix(cli-install): move kbcstorage to [server] extra so wheel installs cleanly The 0.53.3 wheel served at /cli/wheel/ is unsatisfiable on a clean machine: analyst runs `uv tool install <wheel-url>` per the published /setup instructions and the resolver immediately fails with Because kbcstorage<=0.9.5 depends on urllib3<2.0.0 and agnes-the-ai-analyst==0.53.3 depends on kbcstorage>=0.9.0 and urllib3>=2.7.0, we can conclude that agnes-the-ai-analyst==0.53.3 cannot be used. The `[tool.uv] override-dependencies = ["urllib3>=2.7.0"]` in pyproject.toml masked the conflict in workspace contexts (Dockerfile + dev install) but does NOT propagate to the wheel — wheel METADATA is plain PEP 621 Requires-Dist, and a fresh resolver context (uv tool install <wheel-url>) never sees the override. Every existing test passed because the dev venv already has kbcstorage 0.9.5 + urllib3 2.7.0 coexisting under workspace overrides; the break only surfaces on the next analyst's first install. Fix: kbcstorage moved out of [project] dependencies into [project.optional-dependencies].server, since it is server-side only (connectors/keboola/client.py is the sole import site, called from admin endpoints, server connectors, and integration tests — never from the CLI install path). Server install picks it up via Dockerfile's `uv pip install --system --no-cache ".[server]"`. CI installs `.[dev,server]` so workspace tests still cover the kbcstorage path. Analyst CLI wheel METADATA now lists `kbcstorage>=0.9.0; extra == 'server'` (gated) and `uv tool install <wheel>` resolves cleanly. Verified end-to-end: - Built wheel locally; inspected METADATA — kbcstorage line is now `; extra == 'server'`. - `docker run --rm python:3.13-slim` + `uv tool install <wheel>`: agnes 0.53.4 installs, `agnes --version` works, `agnes catalog --help` renders, kbcstorage absent from CLI venv, urllib3 = 2.7.0. - Same container with `.[server]` install path: kbcstorage present, urllib3 = 2.7.0 (override applies in workspace context). - Full pytest suite green locally (4157 passed, 25 skipped). * release: 0.53.4 — analyst CLI install hotfix (urllib3/kbcstorage resolver conflict) Patch bump shipping the [server] extra split + new clean-install CI lane. No DB migration; no API change; no operator-facing config change. Operator side (Dockerfile path) auto-picks `.[server]` so the production image gains kbcstorage transparently. Analyst onboarding (uv tool install <wheel>) starts working again.	2026-05-12 17:09:44 +00:00
ZdenekSrotyr	c8de0e0f64	release: 0.53.2 — diagnose silent-capture check + urllib3 2.7.0 + flaky-test fix (#270 ) Three bundled improvements: - #244 — new `agnes diagnose` check compares SessionStart events (~/.claude/projects/<encoded>/*.jsonl) against agnes-push uploaded log entries inside a 7-day window. Surfaces a warning when the gap exceeds 3, hinting at silently-broken capture-session — previously detectable only weeks after the fact. - Dependabot — bumps transitive urllib3 from 1.26.20 to 2.7.0 to close 5 advisories (4 high, 1 medium). kbcstorage 0.9.5 still pins urllib3<2.0.0 upstream; overridden via [tool.uv] override-dependencies since the SDK works fine against 2.x in practice (Client + Tables both flow through requests, which supports both lines). - #252 — fix flaky test_scratch_dir_cleaned_up_after_failed_extraction by redirecting tempfile.tempdir to a per-test tmp_path. Pre-#252 the test scanned the shared system tmp dir and a sibling store test in another pytest-xdist worker could trip the assertion mid-window. Closes #244. Closes #252.	2026-05-12 18:28:04 +02:00
ZdenekSrotyr	ea6fcfda3b	release: 0.53.2 Cut [Unreleased] → [0.53.2]. Bumps pyproject from 0.53.1 to 0.53.2. Contents: configurable instance.brand + workspace_dir, setup-script polish (explicit mkdir step, final "restart Claude Code" step, uniform connector markers), Asana revert from MCP to PAT+REST, Atlassian longest-expiry guidance, and the BREAKING removal of agnes query --register-bq.	2026-05-12 18:18:13 +02:00
ZdenekSrotyr	7d868159f2	address Devin Review on PR #264 BUG_0001 (red): config/claude_md_template.txt is the Jinja2 source for every analyst workspace's CLAUDE.md served via /api/welcome (src/claude_md.py). It still instructed the agent to use the removed --register-bq flag in 6 places — defeating the point of the PR for anyone who ran agnes init after merge. Rewritten: - ASCII routing diagram: "join with a local table" now points to "agnes snapshot create the remote side, then join locally" - "Three patterns" table → "Two patterns" (snapshot create + --remote) - "Hybrid query example" rewritten as snapshot-create + local join, with --remote called out as the escape hatch when the remote side is too big to snapshot - "When the table isn't in agnes catalog" — drop the ad-hoc --register-bq path; admins register, no analyst-side workaround - Footer cross-ref drops "hybrid-query examples" BUG_0002 (yellow): cli/error_render.py docstring line 7 said "All three previously flattened..." after I had already reduced "Three CLI paths" → "Two CLI paths" on line 3. "All three" → "Both".	2026-05-12 18:18:13 +02:00
ZdenekSrotyr	79b55b6ff3	remove agnes query --register-bq from client CLI The flag ran RemoteQueryEngine in-process on the caller's machine and required local BigQuery credentials (BIGQUERY_PROJECT + ADC). Analysts don't have those, so calling --register-bq from an analyst workspace surfaced as a confusing not_configured error chain ("Could not load static instance.yaml" + "BigQuery project not configured"). An agent following CLAUDE.md's hybrid-queries guidance would land in exactly that trap. The underlying engine was originally designed server-side (commit `d180b201`, "Step 28: Remote query architecture"); the CLI port (commit `d605e7d9`) silently assumed parity with the server. Server-side hybrid already exists as an admin-only POST /api/query/hybrid endpoint (app/api/query_hybrid.py) and is untouched here. Analysts combining local + remote data now have two documented paths: agnes snapshot create a filtered slice and join locally, or run the join server-side via agnes query --remote. CLAUDE.md, the agent skill, docs/DATA_SOURCES.md, and connectors.md updated accordingly.	2026-05-12 18:18:13 +02:00
ZdenekSrotyr	56db622e36	release: 0.53.1 — fix #266 BQ Edit modal destroying bucket/source_table (#269 ) Three client-side fixes in admin_tables.html plus a regression test file pinning the server-side PUT contract the new JS relies on. Bug 1 — saveBqTabEdit (synced/custom) nulled bucket/source_table on every save; the null was supposed to clear stale state on a true remote→materialized mode flip but fired on every save, silently wiping persisted bucket/source_table when admin edited only the description on an already-materialized row. Now gated by _editOriginalQueryMode !== 'materialized'. Bug 2/3 — _buildBigQueryPayload (synced/whole) at register time did not send bucket/source_table — only source_query — so whole-table materialized rows persisted with bucket=NULL. Edit modal then loaded empty Dataset/Table inputs over a SELECT * SQL. Register now sends both fields; _openEditBqModal additionally parses source_query as a fallback for rows that registered pre-0.53.1. Closes #266.	2026-05-12 17:29:56 +02:00
Vojtech	79a958ec26	feat(setup): configurable instance brand + connector setup overhaul (#268 ) - instance.brand (env AGNES_INSTANCE_BRAND, default "Agnes") + instance.workspace_dir replace hard-coded "Agnes" / "~/Agnes" across /home, /setup, /setup-advanced, /login, /install, /me/debug, and the Claude Code clipboard setup script. Terraform-friendly env override; defaults preserve existing Agnes branding. - Explicit "create workspace folder" step on /home (OS-tabbed mkdir+cd) + same step baked into the clipboard script as step 2. Drops the implicit assumption that `agnes init --workspace .` lands in a sensibly-cd'd shell. - Final "Restart Claude Code" step in the setup script (unconditional, between connectors and Confirm) so freshly-installed plugins, MCP servers, and SessionStart hooks load on the next Claude Code session. - Asana reverted from hosted Remote MCP back to PAT + raw REST against app.asana.com/api/1.0. MCP envelope shape consumed ~5x tokens per call; the PAT path lets the agent read flat REST fields. Existing MCP registration is detected and the user is asked whether to remove it (default Y, with benefits listed: token cost, no third-party hop, no OAuth refresh dance, deterministic envelope shape). - Atlassian connector instructs picking the longest API-token expiry (today "1 year") to cut re-mint friction. No public query-parameter hook exists on id.atlassian.com to pre-select expiry, so the prompt documents the manual click and acknowledges that limitation. - Uniform ✅ / ❌ per-connector marker contract (Asana, GWS, Atlassian) for the Confirm summary to grep. Each connector now ends with a Claude-driven end-to-end test that uses Claude Code's own bash to exercise the stored credential and prints "✅ <Connector> integration verified — ..." (or the failure variant).	2026-05-12 17:10:08 +02:00
ZdenekSrotyr	12db59127b	release: 0.53.0 — close Tier B trackers (#259-#261) + admin UI fix (#265 ) (#267 ) * release: 0.53.0 — Tier B trackers + admin UI bugfix Closes #259 (init resume sentinel), #260 (startup parquet-lock sweep), #261 (materialized schema uses local parquet, not BQ), #265 (admin tables apostrophe → HTML-entity escape). Tracker notes: #262 closed as obsolete (pre-empted by 0.51.0 changes), #266 left open pending UX clarification. * fix(init): move resume sentinel from .agnes/ to .claude/ The clean-install integration test (test_clean_install_integration.py) forbids creating .agnes/ in the workspace root via its forbidden_unconditional list — that path is reserved for ~/.agnes/ in the user's HOME (marketplace clone, CA bundle). .claude/ is already created by agnes init for settings.json + hooks, so dropping init-complete next to those keeps the resume sentinel consistent with the rest of Claude Code's workspace surface and lets the clean-install assertions pass. Issue #259. * docs(changelog): point #259 entry at new .claude/init-complete path Follows the sentinel move from .agnes/ → .claude/ to keep the changelog in sync with what 0.53.0 actually ships.	2026-05-12 16:28:41 +02:00
ZdenekSrotyr	114088d592	Merge pull request #263 from keboola/worktree-agnes-052-ux-fixes release: 0.52.0 — UX + hygiene round (closes #254-#258)	2026-05-12 15:27:07 +02:00
ZdenekSrotyr	e4e7bd2606	fix: remove stray conflict marker in CHANGELOG	2026-05-12 15:20:16 +02:00
ZdenekSrotyr	1ecdfd3cdc	Merge remote-tracking branch 'origin/main' into worktree-agnes-052-ux-fixes # Conflicts: # CHANGELOG.md # pyproject.toml	2026-05-12 15:19:36 +02:00
Vojtech	9ae2dd19fe	fix(memory-admin): pass RBAC user_groups (not YAML config) to /corporate-memory/admin (#253 ) * fix(memory-admin): pass RBAC user_groups (not YAML config) to /corporate-memory/admin `GET /corporate-memory/admin` was passing `corporate_memory.groups` from instance.yaml (a dict, default `{}`) as the template's `groups=`. The template's `renderItemCard` does `GROUPS.map(g => ...)` to build the mandate-form audience picker, which throws `{}.map is not a function` the moment any pending item renders. The thrown TypeError bubbles up through `renderReviewItems` and gets swallowed by the `loadReviewQueue` catch block, which paints "Error loading pending items." over a perfectly valid `/api/memory/admin/pending` response. Bug was dormant since the initial system commit because `renderItemCard` only runs when at least one pending item exists, so test fixtures and empty queues never tripped it. Mandate form actually targets RBAC user_groups (audience strings like `group:Admin`), not the unrelated `corporate_memory.groups` YAML section. Route now passes the user_groups list shaped as `[{name, members_count}]`. Template additionally guards the `.map` call with `Array.isArray(GROUPS) ? GROUPS : []` so a future shape regression degrades to "no group options" instead of crashing the list. No DB migration; no API change. * test(memory-admin): assert /corporate-memory/admin renders GROUPS as array Server-side regression for the bug Vojta's previous commit fixed: the route used to pass `corporate_memory.groups` (a dict, default `{}`) into the template as `groups=`, so `const GROUPS = {{ groups \| tojson }}` rendered as `{}` and `GROUPS.map(...)` inside renderItemCard threw at runtime. The bug was dormant because renderItemCard only runs when at least one pending item exists — empty queues never tripped it. Test seeds one pending knowledge_item and asserts the rendered HTML contains `const GROUPS = [` (array prefix), so any future regression that flips the variable back to a dict shape fails immediately rather than waiting for an operator to seed the queue and notice the broken admin page. * release: 0.51.1 — corporate-memory admin pending-banner hotfix Patch bump: ships the /corporate-memory/admin GROUPS-shape fix (dict → array of {name, members_count}) and its server-side regression test. No DB migration, no API change, no operator action required — upgrades land transparently. --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-12 13:16:01 +00:00
ZdenekSrotyr	48755b9864	release: 0.52.0 — UX/hygiene round (5 fixes from 0.51.0 retro) Closes #254 (agnes sample alias), #255 (wide-table render), #256 (single-flight on bq-metadata-refresh + run_id), #257 (init wording), #258 (progress bar clamp). Tier B trackers left open: #259 (init resume), #260 (stale .lock), #261 (schema cold-start), #262 (docker disk).	2026-05-12 15:09:14 +02:00
ZdenekSrotyr	cd0a97d269	Merge pull request #250 from keboola/worktree-catalog-bq-hotfix release: 0.51.0 — catalog hang fix, persistent BQ metadata cache, view-aware UX, scheduler hygiene	2026-05-12 12:04:46 +02:00
ZdenekSrotyr	99b9379ba3	Merge remote-tracking branch 'origin/main' into worktree-catalog-bq-hotfix # Conflicts: # CHANGELOG.md	2026-05-12 11:56:49 +02:00
minasarustamyan	dc5e0e0d11	Marketplace UX overhaul: rich plugin/skill/agent detail + filename rename (#251 ) * Rename agnes-metadata.json to marketplace-metadata.json Curated marketplace enrichment file (.claude-plugin/agnes-metadata.json) becomes marketplace-metadata.json. Clean cut, no fallback — curators of upstream marketplace repos must rename the file on their side. Python API renames mirror the file rename: read_agnes_metadata → read_marketplace_metadata, AGNES_METADATA_REL → MARKETPLACE_METADATA_REL, AGNES_METADATA_MAX_BYTES → MARKETPLACE_METADATA_MAX_BYTES. Synth Claude Code marketplace strip rule (.agnes/** + the metadata file) follows the new filename. * Marketplace detail polish: window cover + 715:310 aspect + helper alignment - Plugin & item (skill/agent) detail hero: 160x160 square cover replaced with a macOS-style window frame (3 traffic-light dots + titlebar label showing the entity name). Body is constrained to 715:310 so curator- uploaded covers no longer crop to a square. Window is 380px wide; meta column and absolutely-positioned top-right install/remove actions stay put. Fallback when no cover_photo_url (translucent gradient + PL/SK/AG initials) is unchanged, just inside the window body. - Inner skill/agent cards in the plugin detail's Internal structure section adopt the same 715:310 aspect (was fixed 78px tall). No window chrome on inner cards — just the matching proportions so covers read consistently across hero, grid tiles, and listing cards. - Curated nested item helper text ("This skill is part of ... — add the bundle to your stack to use it") now stacks UNDER the "Open parent plugin" button instead of being a side-by-side flex sibling in the actions-row. Added align-self: flex-end so the 260px helper box anchors at the right edge of the 300px actions column, matching the button's right edge. * Marketplace My tab: surface the same category + type filters as Flea - Frontend: mp-cat-row and mp-type-row now show on tab=my (previously hidden — type was flea-only, category was flea/curated-only). Curated browse stays plugin-only and continues to hide the type pills. fetchOne() sends the `type` param for tab=my too, so the items endpoint's existing my-branch filter actually receives it. - Backend categories endpoint, tab=my branch: when the type filter is set to skill/agent, skip counting curated subscriptions. Curated plugins are always type='plugin', so they wouldn't survive the items endpoint's type filter; including them in the category counts made the pill numbers overstate what users could actually see in the grid. type=None or type='plugin' keeps the previous behaviour. - CHANGELOG entry under [Unreleased]. * Marketplace plugin detail: render rich content from marketplace-metadata.json Adds five optional plugin-level fields to marketplace-metadata.json and renders them on the curated plugin detail page + listing card: * display_name — friendly h1 / listing-card name / mac-window titlebar label (overrides the technical plugin id) * tagline — punchy 1-line value prop for the hero subtitle and the listing card description (replacing the verbose marketplace.json description on cards) * description — multi-paragraph markdown body, server-side rendered through markdown-it-py and sanitized through nh3 with a description-scoped allowlist (no iframes / no raw HTML / no javascript: links). Powers the "What it does" panel. * use_cases[] — {title, description, prompt} entries that render as a 3-column "When to use it" card grid; each card shows the literal prompt as a code chip so users can copy-paste into Claude Code. * sample_interaction — {user, assistant} dialog rendered in a Claude Code-style dark Catppuccin Mocha transcript panel: monospace user row with a green ">" prompt indicator + sans-serif assistant body with markdown formatting (peach bold, yellow italic, pink inline code, mantle-dark fenced code blocks). All five fields are optional; UI sections only render when populated, so plugins without enrichment look identical to before. Fields are read on-demand from the working tree (cached by mtime per marketplace slug) so curator edits land at the next request without waiting for a sync cycle — same pattern as the existing inner-skill/agent enrichment path. No DB schema bump. Skill / agent rich-content rendering is deferred to a later phase (needs a source-of-truth decision: extend plugin.yml? LLM-generate from SKILL.md / agent.md?). The schema accepts the same fields at skill/agent level today for forward compatibility but the UI ignores them for now. Also: stripped a stale `background-color: var(--bg)` from the global `code` rule in style.css (was making inline code visually disappear on the page background). * Skill / agent detail: render rich content from marketplace-metadata.json Brings the skill/agent detail pages to parity with the plugin detail page. Same rich-content schema (display_name, tagline, description as markdown, use_cases[], sample_interaction) plus two per-item additions: * invocation — curator-provided literal command string. When set, overrides the computed "<manifest_name>:<inner_name>" chip and cleanly supports both "/" skill prefix and "@" agent prefix (the hardcoded "/" in the chip markup is hidden when the curator provides the invocation, so /grpn-eng:query <q> and @grpn-eng:cto-architect both render correctly). * when_to_use — markdown disambiguation block ("Use this for X. For similar Y, see /other-skill") rendered into a new "When to use this" panel below the Example section. Skill / agent category is now per-item overridable in marketplace-metadata.json. When absent, the API keeps the parent plugin's category as the badge so existing items don't lose their category until curators opt in to per-item categorization. The new "Example" Q&A panel uses the same Claude Code-style dark Catppuccin Mocha transcript treatment as the plugin detail — monospace user row with a green ">" prompt indicator + sans-serif assistant body with markdown formatting. All new fields are optional and read on-demand from the working tree. Skills / agents whose marketplace-metadata.json doesn't carry rich content render exactly the same way they did before (frontmatter description + computed slash command + cover from existing v32 enrichment). No DB schema bump. * Fix TypeError in skill / agent detail when curator sets per-item category `curated_skill_detail` and `curated_agent_detail` were passing both `parent` (from `_curated_inner_parent_fields`, which returns the parent plugin's category as a fallback) and `enrichment` (from `_curated_inner_enrichment`, which returns the per-item category override when the curator set one) into `InnerDetailResponse(...)`. Python function-call kwargs unpacking with overlapping keys raises `TypeError: got multiple values for keyword argument 'category'` — it doesn't merge like a literal dict does. The bug only surfaced when the marketplace-metadata.json carried a `category` field at skill / agent level (curator opting into per-item categorization); items without that override hit the endpoint cleanly because only parent provided the key. Fix: build `merged = {parent, enrichment}` first (literal-dict syntax DOES merge, with the right-hand-side winning) and unpack the merged dict. Curator override still wins via the merge order, and the same pattern is future-proof for any other field that lands in both layers later. Plus a regression test in test_marketplace_metadata.py asserting that the inner-resolver carries `category` for downstream merging. * Marketplace detail: tolerate partial curator JSON Server constructed UseCase / SampleInteraction via raw dict indexing (uc["title"], sample["assistant"]), so a curator commit missing any required Pydantic field crashed the whole plugin / skill / agent detail endpoint with a 500. Route both constructions through _safe_use_case / _safe_sample_interaction helpers — partial input silently drops the malformed card / section instead of breaking the page. Regression test in test_marketplace_api.py covers the three shapes: use_case missing a key, use_case with an empty string, and sample_interaction with only user (no assistant). Sibling rich fields still render. * Address PR-251 review (must-fixes + S2/S3 polish) + release-cut 0.50.0 Five must-fixes from the review pass (3 from @cvrysanek's two-stage review, 2 from my independent pass), plus the 0.50.0 release-cut as the last commit on this PR per CLAUDE.md (CLAUDE.md "Release-cut belongs to the PR" rule added in v0.49.1). Must-fixes ---------- 1. Cache eviction: bounded LRU instead of per-marketplace predicate. The previous predicate (`k[0] == marketplace_id and k[1] != mtime_ns`) only swept stale entries for the CURRENT marketplace; with N>100 distinct marketplaces each holding one mtime key, the cap silently failed and memory grew linearly. Replaced with OrderedDict-backed bounded LRU at cap=256, drop oldest insert on overflow. Cache stress test pinned in test_marketplace_metadata.py. 2. Render CPU cap: per-field byte cap on description / when_to_use / sample_interaction.assistant via MARKETPLACE_METADATA_FIELD_MAX_BYTES (= 64 KiB). Without this, a 1 MiB curator markdown body × QPS = curator-controlled CPU burn through pure-Python markdown-it-py. Truncation respects UTF-8 boundaries and logs a warning so the curator sees the cap fire on the next sync. Test for cap + UTF-8-boundary preservation. 3. Inner-detail bypassed the metadata cache. _curated_inner_enrichment, _curated_inner_cover, and curated_detail all called read_marketplace_metadata directly, defeating the mtime cache the plugin listing already shared. Routed all three through _read_metadata_cached so skill/agent detail hits are O(1) re-parses per marketplace per mtime instead of O(QPS). 4. Truthy-vs-presence trap in plugin/inner enrichment merge. API-layer writers used `if resolved.get(k):` which silently dropped any future falsy-but-valid resolver field (bool featured=False, int priority=0, str category=''). Switched to presence check (`if k in resolved`) so the resolver is the authority on field presence; `{parent, enrichment}` merge respects whatever the resolver decided to ship. 5. Vendor-agnostic OSS cleanup. Removed operator-specific token references (/grpn-eng:, @grpn-eng:, .foundryai/) from src/marketplace_metadata.py docstring, app/web/templates/ marketplace_item_detail.html JS comment, docs/curated-marketplace- format.md, and tests/test_marketplace_metadata.py fixtures. Replaced with generic /my-plugin:tool / @my-agent:role / .example/ placeholders. CHANGELOG --------- - New "### Fixed (PR #251 follow-ups)" section documenting all 4 code-side must-fixes - New "### Internal" section noting the vendor cleanup + new tests - BREAKING bullet for the file rename now covers operator-side migration: running instances see plugin enrichment disappear from the UI until upstream curator renames + nightly sync overwrites the working tree; POST /api/marketplaces/{id}/sync forces refresh sooner - Stripped /grpn-eng: leaks from the existing skill/agent rich-content bullet Tests ----- 128 targeted tests pass (test_marketplace_metadata, test_marketplace_api, test_marketplace, test_markdown_render, test_marketplace_synth_strip, test_marketplace_filter). New tests added: - 6 XSS regression tests on render_safe (javascript:/data:/vbscript: schemes via autolink, reference link, and mixed-case + positive http/https/mailto + noopener noreferrer rel) - 3 byte-cap tests (truncation + UTF-8 boundary + under-cap pass-through) - 1 cache eviction stress test (>256 marketplaces -> bounded at cap) - 1 truthy-vs-presence resolver-contract test Release-cut ----------- - pyproject.toml 0.49.1 -> 0.50.0 (minor; BREAKING file rename per pre-1.0 CHANGELOG note: "breaking changes called out under Changed or Removed with the BREAKING marker") - CHANGELOG [Unreleased] -> [0.50.0] - 2026-05-12, new empty [Unreleased] on top. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com> Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-12 08:38:39 +00:00
ZdenekSrotyr	b6cdd68e8d	feat(catalog): entity_type + validated where_examples + view-aware cost-guard + scheduler hygiene Three behavioural improvements driven by the sub-agent end-to-end test findings, plus scheduler tweaks to prevent the post-deploy contention burst we measured. CATALOG (catalog-side bugs the test agents tripped on): - new entity_type field per remote row (BASE TABLE / VIEW / MATERIALIZED VIEW). For views, rows + size_bytes return null instead of the misleading 0 that __TABLES__ reports. - where_examples now validates against the table's actual schema (cached known_columns from refresh). The pre-fix behavior blindly advertised `country_code = 'CZ'` on tables with no country_code column — the sub-agent tests reliably hit this on unit_economics. - new known_columns + entity_type columns on bq_metadata_cache; populated by bq_metadata_refresh.refresh_one from the same fetch_bq_columns_full call (no extra BQ roundtrip) plus a cheap INFORMATION_SCHEMA.TABLES lookup for table_type. QUERY COST-GUARD: - remote_scan_too_large suggestion now names views explicitly: `Target(s) <ids> are VIEW or MATERIALIZED VIEW. BigQuery does not push LIMIT into the view body — SELECT * FROM <view> LIMIT 1 still runs the full underlying scan.` Programmatic consumers get a new view_targets field on the error detail. SCHEDULER HYGIENE (the post-deploy 1-minute window where concurrent parquet downloads dropped to ~1 MB/s): - SCHEDULER_STARTUP_GRACE_SECONDS (default 60) holds the first tick so the burst doesn't overlap cache_warmup writes. - SCHEDULER_BQ_METADATA_INITIAL_OFFSET_MAX_SECONDS (default 900) randomises bq-metadata-refresh's first-fire offset. TESTS: - test_bq_metadata_cache_repo: entity_type + known_columns round-trip - test_v2_catalog_remote_metadata: where_examples validation, views return null rows/size_bytes, cold rows have empty examples - test_api_query_guardrail: VIEW-aware suggestion text + view_targets - test_connectors_bigquery_metadata: entity_type lookup mock + new fields in TableMetadata expectations - test_scheduler_sidecar: grace + jitter env-var resolution	2026-05-12 10:37:35 +02:00
Vojtech	c09c85d13a	fix(cta): clipboard fallback + fold Atlassian MCP into connectors (#249 ) * fix(cta): fall back to textarea+execCommand when Clipboard API rejects The "Setup a new Claude Code" CTA fetches /auth/tokens, parses the JSON response, renders the setup script, THEN calls `navigator.clipboard.writeText()`. Modern browsers (Safari, Firefox, and Chrome on stricter configurations) reject `writeText` with NotAllowedError when transient user activation has been consumed by an intervening `await` — which is exactly the case here. Users perceived this as "the browser blocked the copy" and got the manual-paste fallback modal even though the textarea + `document.execCommand('copy')` path WOULD have worked synchronously without needing fresh user activation. `copyToClipboard` now: - prefers the modern Clipboard API (unchanged for the happy path) - on writeText rejection, falls back to `copyViaTextarea` instead of surfacing the rejection to the caller's catch block. `copyViaTextarea` is the previously-inline textarea fallback factored out into a named helper, with two small hardening touches: - `readonly` + `tabindex=-1` so the hidden textarea doesn't steal focus or pop the virtual keyboard on mobile. - explicit `setSelectionRange(0, text.length)` to belt-and-braces the selection on iOS Safari (where `.select()` alone sometimes selects zero chars on touch-focused textareas). Only the CTA button needed this — the Step-1 install-command and the connector-copy buttons all call `writeText` synchronously inside the click handler (no awaits in between), so they keep their existing user-gesture context and didn't hit the same rejection. No template changes there. * refactor(home): fold Atlassian MCP registration into connectors block The standalone "Register the Atlassian MCP server" step (was step 6 in the unified setup script) moves INTO the Atlassian connector's prompt body so all Atlassian-related setup lives in one logical group. Same intent that #247 carried for connectors, applied one level deeper: the hosted Remote MCP registration is part of "set up Atlassian", not its own ungrouped step. What changed: - `app/web/connector_prompts.py` — the Atlassian prompt's step 5 replaces the speculative "Register the on-demand Atlassian MCP under .claude/mcp/atlassian" line with the actual hosted Remote MCP registration: `claude mcp add --transport sse atlassian https://mcp.atlassian.com/v1/sse \|\| true`. The `\|\| true` keeps re-runs idempotent and the body explains the OAuth-on-first-use contract. Both /home's Atlassian tile and the inlined setup-script Atlassian sub-block emit this line — single source of truth holds. - `app/web/setup_instructions.py` — `_mcp_servers_block` deleted; the `mcp_servers` step is removed from `_step_numbers`; resolve_lines no longer calls it. - Renumbering: install (1), init (2), catalog (3), preflight (4), marketplace (5), diagnose (6), connectors (7), confirm (8). Was: 6 = mcp_servers, 7 = diagnose, 8 = connectors, 9 = confirm. - `tests/test_setup_instructions.py` — Confirm step 9→8, Connect 8→7, diagnose 7→6, mcp_servers references dropped. `test_step_numbering_with_connectors_step` now asserts `"mcp_servers" not in steps`. Stray-Confirm assertion lists shift by one position. - `tests/test_setup_page_unified.py` + `tests/test_web_ui.py` — same step-number shifts in the rendered /setup preview assertions. The `claude mcp add` line is still the Atlassian Remote-MCP path that the 2026-05-10 init-report Fix C added — only its position in the flow changes. /home Atlassian tile copying continues to install the MCP too (the prompt body the tile pastes contains the same line). 112 tests pass. * feat(atlassian): operator-overrideable base URL via AGNES_ATLASSIAN_BASE_URL Adds an env var / YAML key the operator (Terraform module, customer-VM template, OSS instance.yaml) can set to bake the Atlassian Cloud site root into the connector prompt — so end users don't have to guess / paste their org's `https://<myorg>.atlassian.net`. When set, the Atlassian connector prompt (rendered on both /home tile and inlined into the setup-script step 7 Atlassian sub-block) replaces step 1's "Ask me for my Atlassian Cloud site URL and email" with a one-line note that the URL is already provisioned by the operator and asks only for the email. Step 4's helper-script body has the `BASE_URL='<the site URL I gave you>'` placeholder substituted with the literal value. When unset (empty), the existing "ask the user" flow remains — no regression for OSS instances. Resolution + normalization in `get_atlassian_base_url()`: - env `AGNES_ATLASSIAN_BASE_URL` > yaml `instance.atlassian.base_url` > "" - strips trailing slash + trailing `/wiki` so the canonical value is the bare site root. Matches the per-user helper script's normalization at storage time (atlassian_prompt step 4 guard 2), so the literal baked in by the operator stays consistent with what the user's helper script would have computed from their input. Plumbing: - `app/instance_config.py`: new `get_atlassian_base_url()` resolver. - `app/web/connector_prompts.py`: - `atlassian_prompt(, base_url: str = "")` — string-replace two explicit placeholder phrases when base_url is truthy; otherwise return the prompt unchanged. - `all_connector_prompts(..., atlassian_base_url: str = "")` — forwards the kwarg. - `app/web/router.py` (`_build_context`): reads `get_atlassian_base_url()` and passes it through to `all_connector_prompts(...)` so both the /home tile context AND the inlined-script `resolve_lines(...)` call use the same value. - `src/welcome_template.py` (`compute_default_agent_prompt`): same threading via the existing import-on-demand path. Tests (`tests/test_home_route_resolution.py`): - `get_atlassian_base_url` resolver: default empty, env override, trailing-slash strip, trailing-`/wiki` strip. - `atlassian_prompt(base_url=...)`: literal URL baked in, ask-step removed, placeholder replaced, operator-baked-in copy appears. - `atlassian_prompt(base_url="")`: existing ask-the-user flow unchanged. - `all_connector_prompts(atlassian_base_url=...)`: kwarg threads through to the rendered atlassian prompt. 135 tests pass. feat(asana): register hosted Asana Remote MCP in connector prompt The Asana connector prompt only stored a PAT in the OS keychain + ran a curl verify against /api/1.0/users/me. That set Claude Code up for direct `curl` calls but didn't actually wire Asana into Claude's tool list — so the user couldn't ask Claude to "find my open Asana tasks" and have it work. Symmetric oversight to the Atlassian connector's original speculative `.claude/mcp/atlassian` line that this branch already replaced with `claude mcp add --transport sse atlassian https://mcp.atlassian.com/v1/sse`. Adds a new step 5 that registers Asana's hosted Remote MCP: claude mcp add --transport http asana https://mcp.asana.com/mcp \|\| true This is the V2 endpoint (streamable HTTP transport, launched February 2026). The V1 SSE endpoint at https://mcp.asana.com/sse was deprecated 2026-05-11 (today) and must NOT be used — calling it out explicitly in the prompt body so a future operator who finds an old reference doesn't paste the dead URL. OAuth is handled by Claude Code at first use, same model as the Atlassian MCP step. The PAT stored in step 3 stays for direct `curl` calls (precheck + ad-hoc scripts) — the MCP path uses its own OAuth grant, not the PAT. Old step 5 (revoke instructions) renumbers to step 6 and adds the `claude mcp remove asana` cleanup hint. Same single-source-of-truth invariant holds: /home Asana tile + the inlined Asana sub-block in the setup script (step 7 connectors) both emit identical text from `asana_prompt()`. 71 tests pass. * feat(asana): drive MCP OAuth login + end-to-end validation post-register `claude mcp add --transport http asana ...` only registers the server in Claude Code's local config — it does NOT trigger OAuth. The browser tab opens the first time any `mcp__asana__` tool gets invoked. So the previous step 5 left a user looking at a "registered" MCP that, in practice, hadn't authed yet and would fail on first real use. Same blind spot Atlassian's prompt also has, but Asana was the one called out in the latest review pass. Adds a new step 6 between MCP registration (step 5) and the revoke instructions (now step 7): a. Tell the user verbatim what's about to happen — a low-impact read through the MCP will pop the OAuth browser tab; sign in with the same account whose PAT they stored in step 3 and approve. Frames the OAuth as one-time so users don't wait for it on every later call. b. Drive an actual MCP read. Don't prescribe the exact tool name because the Asana MCP's exposed surface (`mcp__asana__`) is versioned upstream and we don't want to pin to a name that gets renamed. Instead: tell Claude to pick the lightest read from its surfaced tool list (users-me / list-workspaces / equivalent). Document the recovery path when Claude Code times out waiting for the OAuth tool use: `claude mcp list` to confirm registration before retrying. c. Print a single one-line proof that combines wiring + auth: "Asana MCP connected as <name> — <N> workspace(s) visible." Explicit anti-echo callout for tokens, task content, comments. On failure, surface the exact Claude-Code error and stop — no silent pass. d. Sanity-check that the MCP OAuth identity and the PAT identity reference the same Asana account. Easy mistake to make when the user has multiple Asana accounts — flag only on mismatch, keep quiet when they match. Recovery: `claude mcp remove asana && claude logout asana` then redo step 5. Step 7 (revoke) absorbs both the keychain delete + the `claude logout asana` line so users have a single place to undo everything. 43 tests pass. * fix(init): clear stale CA env vars on Windows before any TLS handshake Reported by the 2026-05-11 Windows test pass: after `agnes init` the gws connector failed with `UnknownIssuer` TLS errors because `SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE` were still set in Windows User scope pointing at `C:\Users\localadmin\.config\agnes\ca-bundle.pem` — a file that did not exist on the test host. Past Agnes installs (the setup-prompt trust block + older bootstrap helpers) write those pointers when they materialize a combined Agnes-CA bundle; when the bundle file later disappears (re-init on a new VM, machine swap, the ~/.agnes dir wiped), the pointers go stale and every native Windows TLS handshake fails before Agnes itself runs. SSL_CERT_FILE in particular REPLACES (not appends to) the trust store, so a stale pointer is silently catastrophic. `agnes init` now clears stale pointers in two layers before the first server roundtrip: 1. Current-process env (os.environ) — what the immediately-following `api_get` to /api/catalog/tables actually reads. Without this, init itself blows up before it gets to step 2. 2. Windows User-scope env via PowerShell `[Environment]::SetEnvironmentVariable(name, $null, 'User')` — what every future shell + every native tool (gws, claude.exe, pip, uv) inherits. The 2026-05-11 reporter expected this exact cleanup ("init was supposed to clear these but they persisted"). The cleanup is best-effort and conservative: - Only deletes a var when its value points at a path that does NOT exist on disk. Intentional operator config (e.g. SSL_CERT_FILE pointing at a corp certifi bundle) stays put. - PowerShell missing / restricted execution policy / WSL-without-pwsh: swallowed silently. The current-process leg still runs, which unblocks init even on hosts where the User-scope leg cannot fire. Tests (`tests/test_init_ca_cleanup.py`, 6 cases): - Stale pointers → removed from process env. - Real-path pointers → preserved. - Non-Windows hosts: PowerShell is not invoked. - Windows hosts: PowerShell IS invoked with a script that checks all three vars + uses Test-Path + SetEnvironmentVariable. - PowerShell FileNotFoundError: cleanup swallows it, does not raise. - `_is_windows_host()` reflects sys.platform. * refactor(asana): MCP-first flow — drop PAT storage, precheck via `claude mcp list` The Asana hosted MCP at https://mcp.asana.com/mcp authenticates via OAuth (Claude Code holds the grant; browser tab pops on first tool use). The earlier prompt walked the user through creating + keychain- storing an Asana Personal Access Token AND registering the MCP — two parallel auth surfaces for one connector. Once the MCP works, the PAT has no consumer: the precheck/verify steps that used `curl $BASE/api/1.0/users/me` are just redundant proof that Asana itself is reachable, which the OAuth handshake already establishes. Removed: - Step 0 keychain probe + curl verify against /users/me with PAT. - Step 1 open developer-console / create PAT. - Step 2 click "+ New access token", warn shown-ONCE. - Step 3 helper-script for keychain-storage (per-OS bodies: macOS `security add-generic-password`, Linux `secret-tool store`, Windows `cmdkey /generic`). - Step 4 PAT-side `users/me` verify. - Step 5's split that kept the PAT around for direct curl scripts. - Step 6d's "MCP vs PAT identity sanity check" — there is no PAT anymore, nothing to mismatch against. New flow (3 steps total): - Step 0 precheck: `claude mcp list \| grep ^asana` — if found, the server is registered AND Claude Code is holding its OAuth grant (otherwise prior failure would have removed it); print "Asana MCP already registered — skipping setup" and stop. Tells the user the explicit reset command (`claude mcp remove asana && claude logout asana`) so a re-register stays one paste. - Step 1: `claude mcp add --transport http asana https://mcp.asana.com/mcp` — no `\|\| true` because step 0 should have caught the "already exists" case. Step explains the V2-vs-V1 endpoint distinction (V1 SSE deprecated 2026-05-11) and the abort-clean recovery if the precheck somehow missed the existing server. - Step 2: same OAuth + low-impact-read validation pattern as before. - Step 3: revoke instructions (mcp remove + logout + Asana-side app revoke at app.asana.com/Settings → Apps). Both surfaces (the /home Asana tile and the inlined Asana sub-block in the setup script's step 7) emit the new text from the same asana_prompt() — single-source-of-truth invariant intact. 77 tests pass.	2026-05-11 21:54:51 +02:00
ZdenekSrotyr	27863f88e2	test(db_schema_version): bump locked SCHEMA_VERSION assertion to 40 + add v40 changelog comment	2026-05-11 21:04:11 +02:00
ZdenekSrotyr	d8cac7eeff	fix: bump duckdb >=1.5.2 (test_db migration ladder) + skip cli_binary_rename on stale venv - DuckDB 1.5.1 regressed: rejected `ALTER TABLE … ADD COLUMN IF NOT EXISTS` with `Cannot alter entry … because there are entries that depend on it` when the target was FK-referenced from another table. Hit on `internal_roles` (v8→v9) and `user_groups` (v11→v12) during migration replay. 1.5.2 fixes it. CI already runs 1.5.2; this pins the same floor for local devs. - tests/test_cli_binary_rename now skips with an actionable message instead of failing when the local venv has no `agnes` on PATH (fresh checkout) or has a stale shim from a prior editable install whose `cli` layout shifted. CI installs fresh and still asserts the real contract.	2026-05-11 20:47:49 +02:00
ZdenekSrotyr	b3841f5b6c	release: 0.50.0 — persistent BQ metadata cache + scheduled refresh; catalog never blocks on BigQuery Since 0.47.0 GET /api/v2/catalog enriched each remote BigQuery row by fetching INFORMATION_SCHEMA.TABLE_STORAGE + COLUMNS through the DuckDB BigQuery extension inside the request. On cold caches that fanned out to O(N) sequential BQ jobs-API roundtrips — easily 90 s+ on partitioned / view-backed tables — and reliably blew the CLI's 30 s httpx ReadTimeout. Reproduced with py-spy: three AnyIO worker threads stuck inside connectors/bigquery/metadata._fetch_via_legacy_tables. Refactor: enrichment is read exclusively from a new persistent bq_metadata_cache DuckDB table (schema v40), populated by a scheduler- driven refresh job at SCHEDULER_BQ_METADATA_REFRESH_INTERVAL (default 4 h). Cold catalog response on a fresh container is now tens of milliseconds with metadata_freshness=never_fetched for unwarmed rows. New surface: - POST /api/admin/run-bq-metadata-refresh (scheduler-driven, full) - POST /api/v2/metadata-cache/refresh?table=<id> (admin, single) - GET /api/v2/metadata-cache/status (auth, non-admin) - metadata_freshness field per catalog row Removed (internal API): v2_catalog._size_hint_for_row, _resolve_remote_metadata, _metadata_provider_for, _build_metadata_request, _materialized_size_hint, in-memory _metadata_cache. Response shape unchanged for external consumers. 991 tests passing; 2 pre-existing failures (test_db v3→v4 ladder, test_cli_binary_rename) unrelated to this change.	2026-05-11 20:37:17 +02:00
ZdenekSrotyr	183ee44bad	release: 0.49.1 — /home onboarding rework + memory admin gate + admin_email + folded connectors (#247 ) Cuts release shipping #243 (/home install hero polish, onboarding usability fixes, /corporate-memory admin gate (BREAKING), instance.admin_email operator knob, connector setup folded into the main install script as step 8) plus the post-rebase consolidation/review-fix work.	2026-05-11 17:03:43 +00:00
Vojtech	a46b9dc928	/home install-hero polish: license link contrast, auto-mode reorder, Shift+Tab guidance (#243 ) * Make /home install-hero links readable against blue background The Claude license-options link added in the previous commit inherited the default `<a>` style (`var(--hp-primary)` blue), which renders as blue-on-blue and is unreadable inside the blue install-hero. Add a scoped `.install-hero a` rule that uses white with an underline (matching the existing lead-paragraph contrast pattern) so any link nested in the hero stays legible. * Reorder /home install flow: auto-mode is now Step 2, Agnes install becomes Step 3 Step 3 (was Step 2) pastes a ~20-command bash bootstrap into a fresh Claude Code session. Without auto-mode enabled first, each Bash/edit command needs a manual approve click — bad UX for first-time users. Move auto-mode from the outside-hero `<details>` reference block into the install-hero as a real Step 2, between "install Claude Code" and "install Agnes". Content is the persistent `acceptEdits` snippet (write to ~/.claude/settings.json) plus a one-liner pointing at Shift+Tab for users who are already inside a running Claude Code session. YOLO mode for full Bash auto-approve stays on /setup-advanced behind the existing link. The outside-hero `setup-collapsible[data-section="step3"]` block is dropped — auto-mode is no longer reference content, it's a real install step, and duplicating it would just diverge over time. Onboarded users no longer see the auto-mode block at all (consistent with Steps 1 + 3 also hiding post-onboarding). Completion banner copy updated: "Step 1, 2 & 3 done — Claude Code installed, auto-mode set, Agnes ready". Dashboard CTA partial and other templates don't reference step numbers for this flow, so no adaptation needed there. * Simplify /home Step 2 to Shift+Tab only — drop the JSON snippet Operator pointed out two issues with the prior Step 2: 1. The settings.json snippet is redundant. Claude Code's first Shift+Tab cycle to auto-accept mode already prompts the user whether to persist it as default — Claude writes the config itself, no manual file edit needed. 2. The snippet only showed the POSIX path `~/.claude/settings.json`, which doesn't translate to native Windows. Replace the snippet + copy button with a plain Shift+Tab instruction, explicitly call out the first-time "make this the default?" prompt, and note that Claude handles the config write itself — same flow on macOS / Linux / WSL / Windows. Adds a fallback line for users who already closed the post-OAuth session. * Tighten /home Step 2 install-note to two paragraphs Operator: drop the 'Claude writes the setting itself, so this works the same on macOS / Linux / WSL / Windows...' line plus the 'auto-approves file edits going forward; Bash commands stay gated — that's the safe default' line. Both were filler — the make-default prompt already implies persistence, and gated Bash is the obvious default users won't be surprised by. Result: paragraph 1 carries Shift+Tab + first-time make-default say-yes + closed-session fallback in one breath; paragraph 2 keeps the verbatim YOLO link. Same affordances, less vertical space.	2026-05-11 16:46:58 +00:00
ZdenekSrotyr	65342cd1fb	release: 0.49.0 — session capture queue + private session feature (#245 ) Cuts release shipping #242 (Mina's session capture queue + /agnes-private slash command + statusLine indicator) plus the post-review follow-ups (my David #8 legacy-scan filter, S2.7 statusline cache+no-mkdir, David #11 capture-session breadcrumb, plus Mina's bbf63472 batch covering S2.6 docs / S2.9 e2e test / S2.10 truthy edge cases / 4xx loop fix).	2026-05-11 13:49:11 +00:00
minasarustamyan	19c5a7592a	Session capture queue, private session, and setup-prompt fixes (#242 ) * Capture session paths via SessionStart hook + lock parallel pushes Replace the encoding-based scan of ~/.claude/projects/<encoded-cwd>/ with a queue file populated by a new `agnes capture-session` SessionStart hook. The hook reads the documented `transcript_path` field from Claude Code's hook stdin JSON, sidestepping the cwd-to-folder encoding (which is an internal implementation detail and varies by Claude Code version). - New `agnes capture-session` subcommand appends transcript_path to <workspace>/.claude/agnes-sessions.txt. Silent on all malformed input so a hook chain failure doesn't clutter Claude Code startup. - `agnes push` now consumes the queue: atomic snapshot rename guards against hooks writing during the push window, successful uploads land in agnes-sessions-uploaded.txt (TSV: timestamp + path), failed paths are requeued. - Cross-platform single-instance lock via the filelock package (fcntl on POSIX, msvcrt on Windows). Concurrent SessionEnd hooks — common when the user closes several sessions at once — silent-exit on the losing side instead of all racing the upload. - Recovery: pre-existing snapshot files from a crashed push are picked up and processed before the live queue. - The SessionStart `agnes push` self-heal entry is dropped — it became redundant once the queue persists across runs (orphans from headless / crashed sessions ship out on the next interactive SessionEnd push). Existing workspaces auto-migrate via the marker-based replace logic. - Legacy encoding scan stays available behind `--legacy-scan` for one- off backfills of sessions predating the queue. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add /agnes-private + statusLine indicator for private sessions Users handling sensitive data inside Claude Code can now opt a session out of the Agnes upload pipeline, either proactively (right after session start) or reactively (mid-session). The `/agnes-private` slash command runs `agnes mark-private` deterministically via `!`-prefix direct bash — no AI in the loop. A workspace-installed statusLine surfaces a `🔒 agnes-private` indicator in Claude Code's status bar so the user sees the state at a glance. Authoritative source of "do not upload" is a separate file `<workspace>/.claude/agnes-sessions-private.txt` (one session_id per line). Both `capture-session` (queue writer) and `push` (queue reader) consult the list. This makes the slash-command / SessionStart-hook race impossible by construction: whichever runs first, the session is correctly filtered out. - `agnes mark-private` reads `CLAUDE_CODE_SESSION_ID` from env (set by Claude Code in every bash subprocess it spawns — stable documented API) and appends to the private list. - `agnes statusline` reads the session JSON Claude Code pipes on stdin, checks the private list, and emits the indicator or nothing. Optimized for the high call frequency of statusLine renders. - `capture-session` extracts session_id from hook stdin and skips queue write when the ID is already on the private list (race protection). - `push` filters snapshot entries by the private list and appends to a per-workspace audit log `agnes-sessions-private-skipped.txt`. - Queue format migrated from `<path>` to `<session_id>\t<path>`; legacy one-column lines still parse (empty session_id, still upload, can't be marked private retroactively — fine, they pre-date the feature). - `install_claude_hooks` writes a workspace statusLine unless the user already has a custom one (warn + preserve). Idempotent re-init. - `install_claude_commands` ships `agnes-private.md` alongside `update-agnes-plugins.md`. Per-template fallback so a missing template doesn't get clobbered with the wrong content. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fix setup-prompt + CLAUDE.md marketplace copy + drop skills step Three issues against the post-PR-#240 / post-PR-#237 state: 1. Setup prompt's marketplace block trailer (both has-stack and empty-stack variants) claimed the SessionStart hook keeps the marketplace clone in sync via `agnes refresh-marketplace --quiet` on every session and that admin grants land automatically — both false since PR #237 (0.47.x) moved the install/update path out of the hook into the `/update-agnes-plugins` slash command. The hook is `--check`-only: detects server-side changes, prompts the user to run the slash command, which does the full reconcile interactively with output visible in the transcript. 2. The empty-stack variant framed composition as "admin grants only", missing the actual three-source served stack: (admin RBAC ∩ /marketplace subscriptions) ∪ system-mandatory plugins (admin-pinned, auto-applied) ∪ Flea market installs (skills/agents bundled, plugins standalone) Updated copy spells out all three sources so analysts know where their stack picks live, and what the SessionStart hook actually does on change detection. 3. CLAUDE.md template's "Agnes Marketplace" section conflated eligibility (`resolve_allowed_plugins` — what's listed) with served stack (`resolve_user_marketplace` — what actually reaches Claude Code). The two are different: a user can be RBAC-eligible for a plugin without having subscribed to it on /marketplace. Rewrote the section to distinguish the eligibility set from the served stack and to describe the `--check`-only hook accurately. Plus: deleted the setup prompt's interactive Skills step (final step before Confirm). The named-opinion question — "do you want me to bulk-copy every skill into ~/.claude/skills/agnes/ or pull on-demand via `agnes skills show <name>`?" — had no obvious right answer for new users at the tail end of a wall of technical steps. On-demand lookup is the one-size-fits-all default; `agnes skills list/show` remain discoverable and the CLAUDE.md template references specific skills inline (e.g. agnes-data-querying in the BigQuery section) where they're relevant. Layout: Confirm shifts from step 9 to step 8. Tests updated, full setup/marketplace/welcome surface green (115 passed). Remaining full-suite failures are pre-existing (BQ/Keboola fixtures, Windows charmap collection error in test_v26_keboola_e2e) — verified against a clean stash, unrelated to this diff. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fix session-queue race + snapshot PID-reuse data loss Two blocker fixes from the PR #242 review: 1. Concurrent SessionStart hooks could corrupt the queue file on Windows. Python's `open(path, "a")` is not atomic there — the CRT does not pass FILE_APPEND_DATA to CreateFile, so concurrent appenders (user opening several Claude Code windows simultaneously) could interleave bytes mid-line. The malformed lines then silently fail the parser and the entries are dropped. Fix: wrap append_to_queue, requeue_failed, and snapshot_queue in a short-lived FileLock on a dedicated `agnes-queue.lock`. Separate from `agnes-push.lock` so capture-session hooks don't block on the push command. New test_append_concurrent_threads_no_corruption reproduces the race with 4 threads x 50 appends. 2. Snapshot filenames embedded only the PID (`agnes-sessions.snapshot. <PID>.txt`). After a crashed push left a snapshot on disk and the OS recycled the PID for a new push, `os.rename` would atomically overwrite the recovery snapshot — every entry in it lost, silently. Fix: append a uuid8 hex tail (`agnes-sessions.snapshot.<PID>. <uuid8>.txt`). find_recovery_snapshots already globs the prefix so it picks up both old and new format. New test_snapshot_filename_is_unique_per_call asserts two consecutive snapshots under the same PID don't collide. Targeted tests green (47/47 in session_queue/capture_session/cli_push). Full suite failures unchanged from baseline (pre-existing BQ/Keboola fixture issues per CLAUDE.md). * Auto-refresh workspace hooks + bash-wrap all hook entries (Windows) Fixes from PR #242 second review (ZdenekSrotyr): 1. `uv.lock` regenerated to include `filelock 3.29.0` (declared in pyproject.toml but missing from the lock file — CI's lockfile-consistency check would fail; `uv pip install` on a clean cache would silently miss the dep). 2. `agnes self-upgrade` now auto-refreshes the workspace Claude Code hooks via the new `cli.lib.hooks.maybe_refresh_claude_hooks`. Closes the silent-stop migration gap: a v0.48 workspace would auto-upgrade the CLI from its existing SessionStart self-upgrade entry but never pick up the new `agnes capture-session` SessionStart hook, leaving the queue empty and `agnes push` uploading nothing. The refresh fires on both the "info is None" fast path (CLI already current — catches the second SessionStart after a prior upgrade) and the install-success path. Guarded by `workspace_has_agnes_hooks` so it never writes `.claude/settings.json` into directories that aren't Agnes workspaces (e.g. `agnes self-upgrade` invoked from `~/`). Errors are surfaced on stderr but never flip the upgrade exit code. 3. All Agnes-managed hooks are now wrapped in `bash -c "..."`. The self-upgrade+pull chained SessionStart entry was the only one still shipping unwrapped — Claude Code on Windows runs hook commands directly without a shell, so the `;` chain + `2>/dev/null` + `\|\| true` shell syntax silently no-op'd on native Windows installs without Git Bash on PATH. Workspaces still on the old form auto-upgrade via the refresh path above. Tests: +12 in test_lib_hooks.py (guard semantics, v0.48→v0.49 migration end-to-end, third-party-hook preservation, bash-wrap invariant). +5 in test_self_upgrade.py (refresh fires on info=None, fires on install success, skipped on failure, skipped on --check-only, refresh failure never flips exit code). 130 targeted tests green. The 2 pre-existing Windows path-separator failures in `test_smoke_test_detects_version_mismatch[uv\|pip]` are unrelated (path mismatch `\fake\uv\bin\agnes` vs `/fake/uv/bin/agnes` in test asserts, pre-PR baseline). * CHANGELOG: document PR-242 main features Closes ZdenekSrotyr #4: the [Unreleased] block was missing entries for the PR's primary surface — only the post-merge fix bullets and the unrelated setup-prompt copy change were captured. Adds: - ### Added: 6 bullets covering the session capture queue + new `agnes capture-session` subcommand, `/agnes-private` slash + `agnes mark-private`, `agnes statusline` + statusLine wiring, `--legacy-scan` opt-in fallback, single-instance push lock, and the new `filelock` runtime dep. - ### Changed: BREAKING bullet on the SessionStart / SessionEnd hook wire format change (capture-session as first SessionStart entry, push self-heal removed, SessionEnd push detached via nohup, all entries bash-wrapped). Folds the prior standalone bash-wrap bullet into this consolidated entry — Z's review flagged the layout shift as BREAKING, and grouping the related sub-changes makes the migration story readable in one place. - Operator migration is auto-handled by `maybe_refresh_claude_hooks` invoked from `agnes self-upgrade` (separate Changed entry below). No `agnes init` re-run required. Pre-queue session jsonls on upgrading workspaces still need a one-off `agnes push --legacy-scan` — flagged in the BREAKING bullet. No code change; doc only. * Drop permanent 4xx uploads instead of requeueing forever Closes ZdenekSrotyr #5. Previously the push retry path requeued any non-200 response except the literal "file not found on disk", so 401 (token expired), 403 (RBAC denial), 413 (payload too large), 400 (server-side validation) cycled through every push run forever — the queue grew without bound and each run re-bombarded the server with the same deterministically-failing upload. Now 4xx (except 408 Request Timeout + 429 Too Many Requests, which the HTTP spec marks as transient) is dropped and audit-logged to `<workspace>/.claude/agnes-sessions-failed.txt`: <iso_ts>\t<session_id>\t<status>\t<transcript_path> 5xx and network errors continue to requeue — those reflect server / transport state that can change between runs, so retry is the right behavior. The audit log piggybacks on the push single-instance lock (agnes-push.lock) — push is the only writer to this file, same as the existing `mark_uploaded` and `mark_private_skipped` paths, so no separate filelock is needed. `agnes push --json` surfaces a new `dropped_permanent` counter; non- quiet stdout mentions the audit-log path so operators tailing the output have a pointer to the forensic trail. Tests: +7 in test_cli_push.py (401/400/403/413 → drop; 408/429 → requeue; 500/502/503 → requeue; network exception → requeue; --json `dropped_permanent` counter; stdout audit-log pointer). +1 in test_session_queue.py (mark_failed_permanent TSV format). 127/129 targeted tests green. The 2 pre-existing Windows path-separator failures in `test_smoke_test_detects_version_mismatch [uv\|pip]` are unrelated (path mismatch `\fake\uv\bin\agnes` vs `/fake/uv/bin/agnes` in test asserts, pre-PR baseline). * Catch OSError in push lock acquisition Closes ZdenekSrotyr #8. `acquire_or_skip` in `cli/lib/push_lock.py` previously caught only `filelock.Timeout`. Any `OSError` from `FileLock.acquire` — read-only filesystem, permission denied on `.claude/`, disk full, hardware I/O failure — propagated as an unhandled traceback. Two visible failure modes: - SessionEnd hook: `\|\| true` in the wrapper swallowed the error, so daily pushes silently never ran. Operator had no signal. - Manual `agnes push`: ugly Python traceback dumped to the terminal instead of a clean exit. Now `OSError` is treated the same as `Timeout` — yield `None`, caller returns cleanly with rc=0. The operator's environment in these scenarios has bigger problems than missing session uploads, so we swallow rather than retry-loop or surface a noisy warning. Test: `test_push_silent_exit_when_filelock_raises_oserror` patches the `FileLock` used inside `push_lock` to raise OSError on acquire, verifies push exits 0 with no traceback and the queue is preserved for the next attempt. * Address remaining S2 items from PR-242 review Four items from ZdenekSrotyr's S2 list: S2.10 — `_install_statusline` truthy check (cli/lib/hooks.py): replace `if existing:` with explicit `if existing is None or existing == "":`. Documents and tests the behavior for both edge cases (explicit-null and empty-string `statusLine`) — both treated as "not configured" rather than "explicit user opt-out", so we install ours. Two new tests in test_lib_hooks.py pin the contract. S2.6 — onboarding docs for /agnes-private. New "Private sessions" subsection in `config/claude_md_template.txt` (next to Data Sync) covering the slash command, statusbar indicator, and audit-log location. One-line tip in `app/web/setup_instructions.py` so the feature is discoverable at onboarding. S2.9 — e2e privacy test (tests/test_e2e_privacy.py). Wires capture_session → mark_private → push against a recording fake api_post and asserts zero session uploads for the marked one. Three cases: mark-before-capture (queue write skipped), mark-after-capture (push-side filter catches it + audit-logs), control (unmarked sessions upload normally). David #8 — `--legacy-scan` help text now documents the private-list gap (legacy entries carry empty session_id, so the filter is not consulted). The practical impact is bounded — pre-queue sessions cannot have been marked private since the private list is a queue-era feature — but the disclaimer in the help text means an operator running a backfill is not surprised. 68 targeted tests green (3 new e2e + 2 new truthy edge tests + existing). 2 pre-existing Windows path-separator failures in test_smoke_test_detects_version_mismatch[uv\|pip] unchanged. Remaining S2 items (statusline mkdir push-back, capture-session silent-fail follow-up) handled in PR comment + follow-up issue respectively. * Address remaining S2 follow-ups (David #8, S2.7, David #11) Three items left over from Mina's bbf63472 batch — that commit addressed S2.6/S2.9/S2.10 + documented David #8 in help text but deferred the actual implementations of S2.7, David #11, and the real David #8 fix to follow-ups. This commit closes them. David #8 — `agnes push --legacy-scan` now consults the private list. Claude Code names jsonls `<session-id>.jsonl`, so the file stem IS the session id; the legacy-scan path can apply the same private filter the queue path uses. Both the dry-run and live-upload code paths fixed. Help text updated (no longer warns the filter is bypassed). Two new tests in test_cli_push.py cover the upload-skip path + the dry-run `would_skip_private` segregation. S2.7 — `statusline`/`is_private` no longer mkdir-pollutes arbitrary workdirs. Split `_claude_dir` into `_claude_dir_writable` (used only from `add_private`) and `_claude_dir_readonly` (no mkdir). The read-only public helpers (`private_list_path`, `read_all_private`, `is_private`) compose the no-mkdir variant by default; `add_private` opts in via `writable=True`. Added a process-local mtime-keyed cache around `read_all_private` so in-process callers (push doing one stat per upload candidate, future `agnes diagnose`) don't re-parse the file on every check. Cache eviction on `add_private` so a sub-second write+read sequence doesn't see stale data even on coarse-mtime filesystems. Two new tests pin the no-mkdir contract + the in-same-second add+read consistency. David #11 — `agnes capture-session` writes a breadcrumb log on every invocation. New `<workspace>/.claude/agnes-capture-session.log` TSV: `<iso_ts>\t<outcome>\t<detail>` where outcome covers every silent- exit path (`ok`, `private_skip`, `empty_stdin`, `bad_json`, `not_object`, `no_transcript_path`, `stdin_read_error`, `write_error`). Gives operators a signal to detect "hook fires but queue stays empty" — without it, an upstream Claude Code stdin- contract change is invisible because the hook always exits 0. Log rolls at 256 KiB so it doesn't grow unbounded on long-lived workspaces. Best-effort: a breadcrumb-write failure is itself swallowed so the hook contract stays "exit 0 always". Skipped in non-Agnes workdirs (no `.claude/` exists) so opening Claude Code in `~/` doesn't pollute it. Five new tests in test_capture_session.py cover the success / bad_json / no_transcript_path / private_skip / no-pollute paths. 115 targeted tests green (test_cli_push, test_capture_session, test_private_list, test_session_queue, test_e2e_privacy, test_lib_hooks, test_statusline, test_mark_private). --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-11 13:31:16 +00:00
minasarustamyan	9de679c714	System plugins (schema v39) + marketplace UX polish + drop legacy pages (#241 ) * System plugin tier with mark/unmark fanout (schema v39) Adds a mandatory plugin tier so admins can pin a small set of curated plugins into every user's stack from day one. Marking a plugin via the new toggle on /admin/marketplaces materializes resource_grants for every group and user_plugin_optouts subscriptions for every user, so the existing resolver pulls the plugin into every served set without a new filter layer. Hooks on user-create (Google OAuth, magic-link, admin POST, scheduler) and group-create propagate the same materialization to new principals. UI locks: /admin/access disables the checkbox with a SYSTEM pill; /marketplace cards swap the "In stack" green pill for an amber "Required" badge with shield icon; the plugin detail install button reads "Required by your org"; /my-ai-stack toggle is disabled. Bypass paths return 409 (DELETE /api/admin/grants for system grants, PUT /api/my-stack/curated/.../{enabled:false}, DELETE /api/marketplace/curated/.../install). Unmark only flips the flag — materialized rows persist so admins curate cleanup at their leisure through the now-unlocked /admin/access checkboxes. * Marketplace UX polish + drop legacy /store and /my-ai-stack pages Two-part cleanup post-v39: (1) Page deletion. /store and /my-ai-stack were already replaced by /marketplace?tab=flea and /marketplace?tab=my respectively, but the standalone routes lingered. Hard delete in dev mode — no redirects, stale bookmarks 404. The /store/new upload wizard, the flea detail/edit pages, the admin queue, and all /api/store/* + /api/my-stack endpoints (CLI consumers) stay. Internal hardcoded hrefs in the upload wizard's Cancel button and the advanced-setup page repointed to the marketplace tabs. (2) Detail-page install button rework. The single button that morphed between "+ Add to my stack" and "✓ In your stack" did not communicate uninstall affordance. The installed state now renders an inline white status label before a separate red-bordered "✕ Remove from stack" button on the same row, both at identical height to avoid layout shift. System plugins keep their locked amber "✓ Required by your org" pill (no Remove button — API refuses 409). The post-action hint panel now fires on remove too with the title flipped to "✓ Removed from your stack" — Claude Code needs the same /update-agnes-plugins refresh either way. Also: /admin/marketplaces Details modal "Mark as system" toggle redesigned. The button was near-invisible (matched neutral row metadata). It's now a balanced amber-toned chip with shield icon and a structured confirm modal replacing the native confirm() dialog that summarizes fanout consequences before commit. * Move stack-hint inside hero with glass-on-gradient styling The post-action hint card ("✓ Added to your stack" with the /update-agnes-plugins recipe) used to live below the hero in panel-what (gray card on white page body). Clicking add/remove inserted/removed it between the hero and content, shifting the panels below — a noticeable scroll jump. The hint is now anchored inside the hero's top-right corner alongside the install/remove buttons, both as flex children of an absolutely positioned .actions container. The card uses a translucent white-on-glass treatment that adopts the hero's kind color (blue for plugin, green for skill, purple for agent) without per-kind branching. Hero is always tall enough (160px photo) to contain the action+hint stack without overflow, so toggling the hint visibility doesn't grow the hero or shift body content. The hero-head grid reserves a third 300px column for the absolute actions overlay so meta gets the proper 1fr free space instead of being squeezed by a padding-right hack. Responsive breakpoint at 1100px reflows the actions stack below hero-head when the viewport isn't wide enough to keep meta + actions side-by-side comfortably. * Add optional -DataPath bind mount to run-local-dev.ps1 When the operator wants to inspect DuckDB files (system.duckdb, extracts, marketplaces, store/, …) directly from Windows Explorer, the named volume inside the Docker Desktop WSL VM isn't reachable. The new -DataPath param generates a transient compose override that rebinds /data on app, scheduler, extract (and Caddy's /srv:ro mirror) to a Windows host folder. Fully additive — when -DataPath is omitted everything behaves exactly as before: no override file is generated, $composeFiles array is unchanged, finally cleanup is a no-op. Existing positional invocations (.\run-local-dev.ps1 up \| down \| logs) keep binding to $Action because $DataPath is a named-only parameter with no Position attribute. The override is written via [System.IO.File]::WriteAllText so the YAML is BOM-less across PS 5.1 / 7+ — Compose rejects BOM-prefixed YAML on Windows. The override file is unique per PID and removed in the script's finally block so concurrent invocations and crashes don't leak files. * factor mark_system fanout into UserCuratedSubscriptionsRepository The endpoint imported UserCuratedSubscriptionsRepository, ignored it (noqa: F841), then duplicated the user-side fanout SQL inline. Adds fanout_system_for_plugin() symmetric to the existing fanout_system_for_user() and routes mark_plugin_system through it — removes the dead import + 14 lines of inline SQL, returns the same `affected_users` delta count, no behavior change. * drop customer-specific path from .ps1 example Per CLAUDE.md vendor-agnostic OSS rule: replaced C:\\Business\\Groupon\\Agnes\\agnes-data with the generic C:\\Users\\<you>\\agnes-data placeholder so the docstring example reads cleanly on any reviewer's box. * release: 0.48.0 + parallelize Release-workflow pytest Cuts the release shipped via #228 #230 #231 #232 #233 #234 #236 #237 #238 #239 #240 plus this PR (#241). Major changes: - System plugin tier (schema v39) — admins mark a plugin mandatory; fans out RBAC grants + subscriptions to every existing user/group plus hooks for new principals - BREAKING: removed standalone /store + /my-ai-stack page routes (replaced by /marketplace?tab=flea + /marketplace?tab=my) - Setup-prompt + bootstrap recovery fixes (#240) - DuckDB CHECKPOINT-on-shutdown + 60s compose grace (#235) - Marketplace + flea-market UX polish, agnes-metadata.json enrichment Bonus: switch release.yml test step to `-n auto` (matches ci.yml). Single-threaded was 15-20 min and frequently the bottleneck on PR mergeability — now ~6 min. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com> Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-10 19:15:41 +00:00
Vojtech	8b2f6620a8	fix(duckdb): CHECKPOINT on shutdown + 60s compose grace to prevent WAL corruption (#235 ) * fix(duckdb): CHECKPOINT on shutdown + 60s compose grace to prevent WAL corruption Default 10s stop_grace_period + missing CHECKPOINT on shutdown produced a class of WAL-replay failures during agnes-auto-upgrade recreates. Sequence: 1. New image digest detected → docker compose up -d → SIGTERM to app 2. App's lifespan close_system_db() called .close() but never CHECKPOINT, so any uncheckpointed ops stayed in system.duckdb.wal 3. Container didn't exit within 10s → dockerd SIGKILL (verified in journal: "Container failed to exit within 10s of signal 15 - using the force") 4. New container started with possibly-different DuckDB version, replay hit "Failure while replaying WAL ... GetDefaultDatabase with no default database set" assertion → 500 on every authed request Observed on foundryai-dev-vrysanek 2026-05-05; recovered by removing the WAL manually. _try_open_system_db already exists as a recovery net but requires a system.duckdb.pre-migrate snapshot, which doesn't exist outside migration windows. Two-part prevention: - src/db.py::close_system_db: execute CHECKPOINT before .close() so the WAL is empty when the file is released. Best-effort (try/except), so a locked or full-disk CHECKPOINT does not block close. - docker-compose.yml: stop_grace_period: 60s on app + scheduler, gives uvicorn + lifespan room to run shutdown handlers under load before Docker's SIGKILL fires. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * log CHECKPOINT outcome on system DB close Silent except: pass on both CHECKPOINT and close() left operators without any signal when the WAL-flush safety net actually saved them (or didn't). Add logger.warning on CHECKPOINT failure (operator-actionable - recovery via _try_open_system_db kicks in next start) and debug-level trace on success / close exception. * drop customer-specific token + add CHANGELOG entries Per CLAUDE.md vendor-agnostic OSS rule: nothing customer-specific in shipped code/comments. Replaced "foundryai-dev-vrysanek 2026-05-05" references in docker-compose.yml and src/db.py docstring with generic "Docker image upgrade window where DuckDB versions differ" framing. The original incident date + host live in the commit history / PR body, not in the tree. Adds CHANGELOG entries under Unreleased: - Fixed: close_system_db CHECKPOINT-on-shutdown semantics + WAL-replay failure mode the fix protects against - Changed: docker-compose stop_grace_period 60s on app + scheduler --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-10 19:02:30 +00:00
Vojtech	41829e8a45	Setup-prompt + bootstrap fixes from 2026-05-10 init report (#240 ) * Setup-prompt + bootstrap fixes from David's 2026-05-10 init report Three issues from clean-machine bootstrap evidence: 1. `agnes refresh-marketplace --bootstrap` failed to recover when the local clone existed but Claude Code's marketplace registry had lost the `agnes` entry. Bootstrap path now parses `claude plugin marketplace list`, re-runs `claude plugin marketplace add ~/.agnes/marketplace` when missing, and treats `add` failures as fatal (was warn-and-continue, root cause of the cascade into "Marketplace 'agnes' not found" plugin install errors). 2. Setup prompt now always emits the marketplace-registration block, even when the operator has zero plugin grants. Pre-wires the SessionStart hook so future admin grants land automatically without re-running setup. Block copy adapts: empty list shows "no plugins granted yet", populated list shows "install plugins". 3. Setup prompt registers the Atlassian Remote MCP server unattended (`claude mcp add --transport sse atlassian https://mcp.atlassian.com/v1/sse`). Hosted Remote MCP, OAuth handled automatically by Claude Code on first use. Asana / GWS stay on the /home connector cards (PAT/keychain flows don't fit unattended bootstrap). Confirm step nudges the user toward the /home connector cards for the PAT-flow services. CLAUDE.md template renames the marketplace section to "Agnes Marketplace" and documents that all plugins are addressed as `<plugin>@agnes` regardless of upstream slug. Layout: Confirm shifts from step 6/8 to step 9 across all variants (preflight, marketplace, MCP all unconditional). Tests updated. * Link Claude license options from /home install pane Step-1 Claude install on /home pointed users to OAuth without explaining what to do if they don't have a Pro/Max subscription. Add a one-line follow-up link to the plan-tier section on /setup-advanced (new `#claude-plan` anchor) so first-time users discover the subscription tiers rather than bouncing on the OAuth screen. * Add idempotent + no-TLS-bypass guardrails to /home connector prompts The Asana / Google Workspace / Atlassian connector prompts on /home already shipped a precheck step that short-circuits when the service is already wired, but they didn't carry the same idempotency + surface-errors-verbatim + don't-disable-TLS-verification guardrails the bash bootstrap prompt has. Add a one-paragraph 'Ground rules' block at the top of each prompt so a connector failure doesn't tempt the model into bypass workarounds, matching the same posture David's 2026-05-10 init report flagged for the bash flow. * skip Source: lines in marketplace registry detector `claude plugin marketplace list` prints a `Source: <local path>` line under each registered marketplace; the local clone almost always lives under a path containing the marketplace name itself (`~/.agnes/marketplace`). A naive \\bagnes\\b match over the full stdout therefore false-positives whenever ANY unrelated marketplace sits under `~/.agnes-…/` or similar. Filter Source: lines out before matching so the recovery path actually re-adds when needed instead of silently falling through to a broken `marketplace update agnes`. Adds regression test covering the substring-only case. * drop customer-specific tokens from CHANGELOG entries Per CLAUDE.md vendor-agnostic OSS rule ("nothing customer-specific ... in changelogs"): - "agnes-vrysanek.groupondev.com" -> "a private-CA Agnes deployment" - "Groupon Marketplace / groupon-marketplace" -> "<Org> Marketplace / <org>-marketplace" (placeholder example) - Removed "David flagged" attribution language; init-report context stays intact, just stripped of the named host + brand --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-10 20:24:00 +02:00

1 2 3 4 5 ...

850 commits