Commit graph

896 commits

Author SHA1 Message Date
ZdenekSrotyr
62336bfd32
fix(rbac): stack-gated analyst access + first-demo polish (#333 follow-up) (#356)
* fix(rbac): stack-gate analyst table access via data_packages exclusively

Previously analysts could see a table in ``agnes catalog`` /
``/api/sync/manifest`` either by:
  1. being in a group with ``resource_grants(group, 'table', id)``, or
  2. being in a group with ``resource_grants(group, 'data_package', …)``
     for a package containing the table.

Path 1 leaked: admins who minted a per-table grant without ever
wrapping the table in a data_package still shipped the table to
analysts — directly contradicting the unified-stack mental model
("the stack is the unit of access"). User report:
"i když to admin nedal do data package tak to by default uživatelé
dostali to by se nemělo stát".

New policy: analyst visibility is strictly stack-gated. A table is
visible iff at least one data_package containing it is in the
analyst's stack (required ∪ subscribed). Admin god-mode and the three
internal data-source tables (agnes_sessions / _telemetry / _audit
with row-level RBAC) keep their existing carve-outs.

Touched surfaces:
* ``src/rbac.can_access_table`` + ``get_accessible_tables`` —
  routed through ``StackResolver.stack(user, DATA_PACKAGE)`` +
  ``data_package_tables`` join instead of ``resource_grants(table)``.
* ``app/api/sync._build_direct_tables_section`` — always returns
  ``[]`` (key kept for older CLI destructuring); per-table grants
  no longer manifest.
* Standardised 403 detail across ``/api/data/*``, ``/api/query``,
  ``/api/v2/sample``, ``/api/v2/scan``, ``/api/v2/schema``:
  ``Table 'X' is not in your stack. Ask an admin to add it to a
  Data Package you have access to (Required or in your stack),
  then run `agnes pull` to refresh.`` Single source of truth lives
  in ``src.rbac.table_not_in_stack_message`` so the wording stays
  consistent across CLI surfaces.

UX side: ``/catalog/t/<id>`` (table detail page) dropped the four
editorial sections (Sample questions, What's inside, Things to know,
Pairs well with) per user feedback — the page's job is now
"what is this table, where do I find it" (hero + parent packages).

Tests:
* ``tests/conftest.grant_table_via_package`` / ``revoke_table_via_package``
  — shared helpers that wrap a table in an auto-named data_package +
  grant the package required to a custom group. Replaces the legacy
  per-test ``_grant_table_to_analyst`` table-grant pattern.
* All 17 previously-failing legacy tests (test_access_control,
  test_journey_rbac, test_audit_gap_*, test_rbac, …) migrated to use
  the new helper; logic stays the same.
* ``tests/fixtures/analyst_bootstrap._grant_table_access`` updated
  to wrap via data_package so the ``test_pat`` fixture's "two table
  grants" semantics still ship parquets through ``agnes init``.
* New ``tests/test_table_not_in_stack_message.py`` locks in the
  standardised 403 detail across the data + check-access endpoints.

5204 tests passing (added 1).

* fix(catalog): first-demo UX feedback — required-first grouping + longer card description

Two minor polish items from the 2026-05-19 stakeholder demo:

1. Required packages cluster at the top of the Browse grid instead of
   being interleaved by ``created_at``. Sort key
   ``(requirement != 'required', name)`` runs before the adapter
   call in both /catalog (data_packages) and /corporate-memory
   (memory_domains) so the required block is visible without
   scrolling. Regression test pins the order via
   ``data-id="…"`` position in rendered HTML.

2. ``.stack-card__desc`` line clamp bumped 2 → 4 lines. Two-line clamp
   trailed almost every admin-authored description off in "…" before
   the second clause, forcing a click-through to read it. The detail
   page (/catalog/p/<slug>) keeps the unclamped body for longer
   content.

* release: 0.55.3 — stack-gated analyst RBAC (BREAKING) + first-demo UX polish + #345 A/B/C/D + #347 UI consistency
2026-05-19 17:01:14 +02:00
ZdenekSrotyr
0a16e8f44e
feat(diagnose): role-aware overall status (#345 B) (#355)
Reading `Overall: degraded` on a fresh analyst install — driven by
server-side operator warnings (stale tables, session-pipeline cadence,
BQ billing-project config) that the analyst can't act on — erodes trust
in the install. The role-aware headline routes operator-only warnings
through a secondary line so they're not invisible, but they no longer
drive the headline an analyst sees.

Server-side (`app/api/health.py`):
- Per-check `audience: "analyst" | "operator"` tag on every entry in
  `/api/health/detailed` services dict.
- New top-level `caller_role` field (derived from `user.is_admin`) so
  the client knows which aggregation to display.
- New top-level `overall_analyst` field — analyst-only aggregation
  available to clients that don't want to recompute it.

Client-side (`cli/commands/diagnose.py`):
- When the server reports `caller_role`, analyst aggregations exclude
  audience=operator checks from the headline. Analyst-side warnings
  AND server-side errors still escalate (errors are universal).
- Secondary line surfaces operator warning count so they're visible:
  "Overall: healthy (analyst-side); 2 operator-side warnings".
- Admin/operator role auto-promotes to full aggregation; analysts can
  manually opt in via `--include-operator-checks` flag.
- Legacy servers without `caller_role` keep the pre-#345-B full
  aggregation — no silent regression against older deployments.

Audience defaults (`_AUDIENCE` map in health.py):
- analyst: duckdb_state
- operator: db_schema, data, users, bq_config, session_pipeline

Tests: 4 new in TestAnalystAudienceFilter (analyst-only filtering,
admin auto-promote, --include-operator-checks opt-in, legacy server
fallback). 26/26 diagnose + health tests pass.
2026-05-19 16:53:56 +02:00
ZdenekSrotyr
82a4b663d7
feat(cli): AGNES_MARKETPLACE_URL env override for refresh-marketplace --bootstrap (#345 A) (#354)
Deployments that serve the marketplace from a different host than the
API (reverse-proxy split, CDN-fronted marketplace, etc.) now bootstrap
cleanly. Default behaviour unchanged — falls back to
`{server_host}/marketplace.git/` derived from the workspace's
bootstrapped server URL.

Implementation:
- AGNES_MARKETPLACE_URL parsed via urlparse; missing scheme/host
  fail-fast with a clear error rather than silently falling back, so
  operator misconfigurations surface immediately.
- PAT is re-injected into the override URL the same way as the default
  path (user-info form of HTTP basic for git clone), then stripped on
  the post-clone `remote set-url` so it doesn't sit at rest in
  .git/config.
- Trailing slash normalized so urljoin-style consumers don't drop the
  path. Path defaults to /marketplace.git/ if the override omits it.

Help text updated to document the override.

Tests: 2 new in test_cli_refresh_marketplace.py (env override happy
path + invalid URL rejection). 46/46 refresh_marketplace passes.
2026-05-19 16:47:26 +02:00
ZdenekSrotyr
c910281df1
feat(cli): --json alias for --format json + remote-query exit-code regression test (#345 C+D) (#353)
D — `agnes query --json` shortcut for `--format json`. Paste-prompts
and LLM-assisted analysts routinely reach for `--json` first; the typer
"Did you mean --stdin?" suggestion the missing flag previously produced
was actively misleading. `--json --format <other>` is rejected as
mutually exclusive.

C — Verified that `_query_remote` already `raise typer.Exit(1)` on any
non-200 response (cli/commands/query.py:116). The reporter who filed
an explicit 502-path regression test alongside the existing 400 case so
any future regression in the `raise typer.Exit(1)` line gets caught.

Tests: 3 new in test_cli_query.py (5xx exit code, --json alias works,
--json + --format csv mutually exclusive). 15/15 cli_query passes.
2026-05-19 16:39:46 +02:00
Monika Feigler
caae12d02f
fix(web): UI consistency — code tokens, label-qualifier, radio cards, Keboola edit-modal JS (#347)
* fix(web): UI consistency — code tokens, label-qualifier, radio card selected state

I-UI-01: Add .sync-option-card:has(input:checked) rule — border + background
feedback when a radio option card is selected. Add class sync-option-card to
all 14 radio label cards in admin_tables.html.

I-UI-02: Add .label-qualifier / .optional to style-custom.css. Remove the
duplicate local definition from admin_tables.html <style> block.

I-UI-03: Migrate inline code rule to design tokens (--font-mono, --text-sm,
--border-light, --border, --radius-sm). Add background + border so inline
code is visually distinct across all pages.

I-UI-05 (partial): Replace hardcoded #c4c4c4 / #fafafa in .btn-google:hover
with var(--border) / var(--background) so theme overrides apply.

* fix(web): expose entire Keboola edit-modal JS to all instance types

openEditKeboolaModal, closeEditKeboolaModal, saveKeboolaTabEdit,
onEditKbStrategyChange and helpers were still inside {% if keboola %}
but called from always-rendered HTML (openEditModal dispatcher,
Escape key handler, modal overlay click, Cancel/Save buttons).

Removed the Phase F2 if-guard entirely — only prefillFromKeboolaTable
stays conditional (its callers are inside {% if keboola %} HTML blocks).

* fix(ui): promote .form-textarea to global CSS with design tokens

Removes the local hardcoded .form-textarea definition from admin_tables.html
and adds it globally to style-custom.css using design tokens, making
description textareas visually consistent with other form fields.

* fix(ui): restore .form-textarea to local style block for visual consistency

Tokens --text-sm (12px) and --radius-md (6px) differ from the local override
values (13px, 8px) used by .form-input on this page, causing a visible mismatch.
.form-textarea rejoins the shared local selector so all three classes render
identically; global .form-textarea in style-custom.css remains as a baseline
for other pages.

* fix(ui): use textarea.form-textarea in global CSS to override .form-group textarea

.form-group textarea (specificity 0,1,1) was overriding .form-textarea (0,1,0)
with a legacy monospace font and different padding. Raising the selector to
textarea.form-textarea matches specificity and wins via source order, making
description textareas consistent with other form inputs. Local admin_tables.html
overrides for .form-textarea removed — styling now comes entirely from global CSS.

* fix(ui): add border:none to .code-block code + add CHANGELOG entries

Fixes light-gray border leaking into dark .code-block backgrounds.
Adds required CHANGELOG.md entries for all user-visible changes in this PR.

* fix(ui): add --border-dark token + reset border-radius in .code-block code

- Adds --border-dark: #C4C4C4 design token for hover border states
- Uses var(--border-dark) in both .btn-google:hover rules so hover border
  is visually distinct from the base border (was a no-op with var(--border))
- Adds border-radius: 0 to .code-block code override to fully reset the
  new global code border-radius on dark code-block backgrounds

* fix(ui): reset code border/bg inside .use-case-prompt dark container

Adds .plugin-detail .use-case-prompt code override to prevent the new
global code border and background from leaking into the dark #1e1e2e
pre block in marketplace_plugin_detail.html.

* fix(ui): reset code border in all dark-background containers

Global code { border } leaks into dark-themed containers across templates.
Adds border: none (+ border-radius: 0 where needed) to:
- marketplace_plugin_detail.html: lead-rendered pre code, sample-assistant-body code/pre code
- marketplace_item_detail.html: same three selectors
- home_onboarded.html, home_not_onboarded.html, admin_welcome.html: inline code on hero dark backgrounds

* fix(ui): uniform form typography — chip-input font, data-package desc textarea, orphan endif

- .chip-input container gets font-family/size tokens so inner input
  inherits correctly (inline `font: inherit` was pulling browser default)
- cdp-desc / edp-desc switched from form-input to form-textarea so
  description fields render Inter, not monospace
- Removed orphan {% endif %} left in admin_tables.html after rebase
  (caused TemplateSyntaxError breaking all admin-tables tests in CI)
- .item-detail .use-case-prompt code: border/bg reset for dark container

* fix: relax test_keboola_discover_buttons assertion + CHANGELOG bullet for #347

The test_keboola_discover_buttons_hidden_on_bigquery_instance test
asserted bare-string `prefillFromKeboolaTable` not in the rendered
HTML on a non-Keboola instance. That made sense when the function
DEFINITION lived behind the keboola Jinja guard. #347 moves
several Keboola edit-modal helpers out from under the guard so
they're now defined as dead code on every instance, but the actual
call sites (`onclick="prefillFromKeboolaTable(...)"` + the
Discover buttons themselves) still respect the guard — which is
what actually matters for runtime behavior.

Updated the assertions to match `onclick="<fn>(` so they pin the
call-site contract, not the function-definition substring.

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-19 16:30:19 +02:00
Vojtech
318802854c
fix(marketplace): chmod +x .sh files after fetch+reset, not just bootstrap (#352)
Devin Review #350 caught a coverage gap: the chmod +x pass only ran in
_bootstrap_clone (initial install), not in _git_fetch_and_reset (every
subsequent `agnes refresh-marketplace` and `--check` follow-up). On
core.filemode=false setups, a `git reset --hard FETCH_HEAD` overwrites
the working tree without restoring the +x bit, so a hook plugin version
bump would silently re-strip the bit and Permission-denied breakage
would return on the next SessionStart.

Extracted _chmod_clone_sh_files() helper; both _bootstrap_clone and
_git_fetch_and_reset now call it. Best-effort, no-op on Windows NTFS.
2026-05-19 14:10:38 +00:00
ZdenekSrotyr
fc6de77e06
fix(infra): pre-create /data/uploads in customer-instance startup script (#351)
* fix(infra): pre-create /data/uploads in customer-instance startup script

v50/0.55.0 introduced marketplace cover-image uploads mounted under
${DATA_DIR}/uploads; app/main.py eagerly mkdirs the directory at boot
for the StaticFiles mount. On host-bind deploys where /data root is
root-owned, the container's non-root agnes user (UID 999) can't create
that directory and crashloops with PermissionError: '/data/uploads'.

Add 'uploads' to the existing mkdir + chown block in
infra/modules/customer-instance/startup-script.sh.tpl so the dir is
pre-created with the correct ownership at provision time — same way
state/analytics/extracts have been since infra-v1.7.0. The block runs
inside the [ -b $DATA_DEV ] guard, so it only fires when the data
disk is attached (idempotent on reboot, no-op when disk missing).

Fresh VMs provisioned at infra-v1.9.0+ get the dir at first boot. For
existing instances, bump the module pin + terraform apply (rewrites
the instance startup-script metadata) and reboot the VM so the
refreshed block replays — or run a one-off:
  sudo mkdir -p /data/uploads && sudo chown 999:999 /data/uploads

* release: 0.55.2 — pre-create /data/uploads in customer-instance startup script
2026-05-19 13:59:39 +00:00
Vojtech
ae67c40a81
fix(onboarding): /home install flow + agnes init UX hardening (#350)
* fix(web): /home Step 2 recommends --dangerously-skip-permissions for setup

The Step 4 paste runs ~20 shell commands (CLI install, workspace
bootstrap, marketplace clone, MCP register, connector logins). Previous
Step 2 recommended auto-accept-edits via Shift + Tab, which covers file
edits but not Bash — users still clicked ~20 Yes prompts during setup.

Step 2 now leads with `claude --dangerously-skip-permissions` as the
recommended session flag (Bash + edits both skip). Session-scoped, drops
on next plain `claude` — safe here because the pasted script is
generated by this server and ends after a fixed sequence; the flag does
not weaken future Claude sessions.

Auto-accept-edits via Shift + Tab kept as the strict-review fallback;
persistent YOLO allowlist link to /setup-advanced#yolo unchanged.

* fix(web): swap /home Steps 2↔3, claude --yolo as copy-button command

Folder creation moves to Step 2; Step 3 launches Claude from that
directory with `claude --dangerously-skip-permissions`. The YOLO flag
is rendered through the standard .install-cmd + copy-button affordance
(matching Step 1 + Step 2), not inline prose. Step 4 paste runs ~20
shell commands that auto-accept-edits would not cover (Bash still
prompts), so the YOLO flag is the default recommendation; session-
scoped, drops on next plain `claude`.

Setup script's pwd-check warning copy refreshed to reference "/home
Step 2" (the new folder-creation step number).

# Conflicts:
#	CHANGELOG.md

* fix(web): open YOLO setup-advanced link in new tab

Step 3 install-hero's persistent-YOLO link now opens /setup-advanced#yolo
in a new window so users don't lose their /home install context mid-
setup. target="_blank" + rel="noopener" (no reverse-tabnabbing).

* fix(web): merge /home Step 3 fallback prose into prior paragraph

Drop the <br><br> between the 'Session-scoped' line and the 'Prefer
reviewing each command' line so the strict-review fallback flows on
the same paragraph — less vertical space in the install-hero block.

* docs(web): add "What leaves your machine" privacy callout on /home

Install-hero lead now includes a short privacy paragraph: explains that
session telemetry (prompts / tool-calls / tool-responses) flows back to
the central catalog for failure-pattern analysis while raw data rows
the user queries locally stay on their machine. Points at /agnes-private
as the per-session opt-out.

Also collapses leftover cherry-pick conflict markers in CHANGELOG.md
into one clean [Unreleased] section.

* fix(init): harden agnes init UX — 5 issues from David's report

1. chmod +x hooks. agnes init + agnes refresh-marketplace --bootstrap
   now set the execute bit on every .sh they land on disk
   (`<workspace>/.claude/hooks/*.sh` after init; every `.sh` under the
   `~/.agnes/marketplace` clone after a bootstrap/pull). Git checkout
   doesn't always preserve filemode (filemode=false repos, ZIP
   extractions), so hooks were firing with "Permission denied" — silent
   SessionStart / PreToolUse breakage. Best-effort, no-op on Windows.

2. --token-file + AGNES_TOKEN. agnes init now accepts `--token-file
   <path>` and an `AGNES_TOKEN` env fallback alongside `--token`.
   Precedence: --token > --token-file > AGNES_TOKEN. The file / env-var
   paths dodge Claude Code's auto-classifier, which sometimes flags a
   long bearer token in `--token "eyJ..."` command line as a credential-
   exfil pattern. The pasted setup script now uses `--token-file
   ~/.agnes/token` (token written via single-quoted heredoc, umask 077)
   for the same reason.

3. Bash(agnes *) in allow. Default `.claude/settings.json` permissions.
   allow seeded by agnes init now includes `Bash(agnes *)` alongside the
   bare `Bash` entry, so Claude Code's classifier sees an explicit allow
   for subsequent `agnes <verb>` calls inside the workspace it just
   bootstrapped.

4. .zshrc PATH dedup. Setup-script step 1's PATH-persist snippet
   (no-CA install path) replaced with a `grep -qF + ||` idiom so a
   re-run doesn't append a duplicate `export PATH=...` line. Fixed-
   string match (not regex) per the dedup-bug report.

5. `!` prefix doc note. Setup-script step 3 now explicitly tells the
   user: if Claude Code blocks an `agnes` command, prefix it with `!`
   (e.g. `! agnes init …`) to run the command directly in the shell,
   bypassing the auto-classifier.

* release: 0.55.1 — /home onboarding install-hero rework + agnes init UX hardening

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-19 15:26:35 +02:00
ZdenekSrotyr
64cf78860d
feat(stack): unified Browse + My Stack for Data Packages and Memory (v49 schema) (#333)
* feat(unified-stack): Browse + My Stack + Recipes + RBAC matrix (v49–v55)

Squash of 94 commits spanning the v49 → v55 unified-stack rewrite.
Full per-feature breakdown lives in CHANGELOG.md under [Unreleased].
Major buckets:

* v49 schema — first-class user_groups + user_group_members +
  resource_grants; admin can CRUD groups and grants; Google
  Workspace nightly sync writes into the new tables.
* v49 data_packages — admin-curated bundles of tables, RBAC-gated,
  first-class section on /catalog Browse + My Stack.
* v49 memory_domains — row-backed (replaces hardcoded VALID_DOMAINS
  enum); admin can CRUD; grants follow the same shape as tables and
  packages.
* v50 cover_image_url + admin sidebar collapsibles + per-row Mode
  tooltip + admin queue domain badges + admin "+ New Item" seed flow.
* v51 lifecycle status (prod/poc/coming-soon/draft) + category +
  palette swatches on admin modals.
* v52 per-table detail page /catalog/t/<id>.
* v53 Recipes — admin-curated SQL templates as a second tab on
  /catalog with full Edit/Delete admin affordances.
* v54 soft-delete (deleted_at) + Undo toast for packages, memory
  domains, and recipes; hard_delete() retained as escape hatch.
* v55 Recipes RBAC — ResourceType.RECIPE registered, inline Group
  Access matrix on Create + Edit Recipe modals (mirrors the Memory
  Domain pattern).
* Activity Center per-resource filter (resource_prefix LIKE-anchored
  on audit_log.resource); admin nav g+letter keyboard shortcuts;
  loadAdminTablesLayout N+1 → single endpoint; /api/memory 30s
  page-level cache.
* CI hardening — Keboola legacy tests pytest.importorskip; perf-
  smoke threshold widened to stop cold-cache flake.

5002 tests passing, 35 skipped.

* feat(p2 backlog): Cmd-K palette + suggest-a-domain + nightly E2E + v55 schema

10-item P2 sweep on top of the unified-stack squash. New behaviour:

* Cmd-K admin command palette (base.html) — fuzzy-search overlay over
  admin + user-facing routes. Arrows/Enter to navigate, Esc to close.
* Stack-tabs digit shortcuts — 1/2/3 switch Browse / My Stack /
  Recipes on /catalog + /corporate-memory.
* Friendlier non-admin empty state on /corporate-memory, plus a
  "Suggest a domain" CTA → POST /api/memory-domain-suggestions, admin
  queue with approve/reject. Backed by a new memory_domain_suggestions
  table (schema v55).
* /admin/corporate-memory 7-tab strip grouped under Moderation /
  Catalog parent labels.
* Bulk-assign table → package dropdown annotates each option with
  "(N of M tables already in)" so the existing distribution is visible
  before picking a target.
* GET /api/memory + /tree accept is_required filter; admin status
  dropdowns route the "Required" sentinel onto it (status no longer
  holds 'mandatory' post-v49, so the old dropdown returned nothing).
* chip-input.js is now opt-in per template via {% block extra_scripts %}
  instead of loaded globally on every page from base.html.
* Edit-modal close helpers consolidated onto _closeEditModalById();
  docs the per-source-type modal architecture decision.
* New .github/workflows/e2e-nightly.yml runs agent-browser smoke
  scripts (scripts/e2e/smoke_*.sh) against a docker-compose stack
  nightly at 04:30 UTC; failures open an agent-browser-nightly issue.

5012 tests passing, 35 skipped.

* fix(visual audit): 6 page regressions on memory + data-package surfaces

agent-browser walkthrough of every memory + data-package page in the PR
turned up 6 real bugs. Fixes:

1. Admin memory modals were dead. Duplicate `let _cmdNewDomainId`
   declarations from the deprecated step-2 RBAC stubs in
   admin_corporate_memory.html collided with the live state vars
   declared earlier in the same <script> → SyntaxError on parse →
   the entire second script block silently failed → every inline
   onclick= handler defined there (`+ New Memory Domain`, Edit, etc.)
   was a no-op. Removed the duplicate stubs.

2. /catalog/t/<table_id> + /catalog/r/<slug> rendered unstyled.
   Both templates injected their CSS via {% block head %} but
   base.html exposes {% block head_extra %} — wrong block name
   meant <style> rules never reached the rendered HTML. Renamed
   to head_extra. Hero card, section cards, dark SQL block, proper
   full-width inputs all now render as designed.

3. L49 leak — "MANDATORY" KPI label + "Make Mandatory" row buttons
   on /admin/corporate-memory still used the old word. Renamed to
   "Required" / "Mark as Required" so UI matches the data model
   (v49 split moved the Required tier onto the orthogonal
   is_required boolean; status no longer holds 'mandatory').

4. Activity Center Resource dropdown didn't know the v55
   `memory_domain_suggestion:` namespace — added it.

5. Tab strip on /admin/corporate-memory wrapped text 2× per button
   on narrow viewports after the L50 MODERATION/CATALOG group
   labels pushed total width past most viewports. Switched the
   strip to flex-wrap:nowrap + overflow-x:auto with
   white-space:nowrap + flex-shrink:0 on every direct child so the
   tabs stay one row and slide horizontally when they overflow.

5012 tests passing, 35 skipped.

* rebase-cleanup: align with main's 0.54.25-27 API design + comment fix

Three follow-on fixes after rebasing onto origin/main (0.54.27):

* admin_tables.html: dropped a stray nested ``{% if data_source_type
  == 'keboola' %}`` around ``prefillFromKeboolaTable`` (main never had
  it; the outer Phase F2 guard already covers it) and reworded a JS
  comment that contained literal ``{% %}`` tokens which Jinja was
  parsing as a real tag → unbalanced if/endif → 30 template render
  failures across the suite.
* /api/stack/subscription/{type}/{id}: DELETE now returns 204 instead
  of 200 per the 0.54.26 design rules. CLI client + parity tests
  updated to accept 2xx / assert 204.
* Memory-domain suggestion approve/reject paths added to
  ``_VERB_PATH_ALLOWLIST`` — they are pending → approved/rejected
  state-machine transitions (approve also creates the real
  memory_domains row as a side effect), so the RPC shape is
  intentional rather than a missed PATCH refactor.

5035 tests passing, 35 skipped.

* fix(catalog_table_detail): real polish pass — hero glyph, dedup pills, rows/size meta, scoped sync CTA

The previous fix only got the block-name typo so the existing CSS rendered.
The actual layout was still wireframe-tier on close inspection:

* No cover glyph in the hero (a flat white card with title + meta line);
  data-package + memory-domain detail pages both have a colored icon
  square. Restored parity — table.icon emoji if set, otherwise initials
  on a colored square using table.color.
* "INTERNAL" pill rendered twice for agnes_audit etc. — the mode pill
  and the source-type pill happened to be identical strings. Now skip
  the source pill when it matches the mode (`internal == internal`).
* Bucket / source_table code chip showed `Agnes Internal.audit_log` for
  internal rows — meaningless to a user. Hidden when source_type is
  internal.
* `pairs_well_with` admin input was a comma-separated `<input>` always
  visible. Wrapped all 4 sections in an Edit-on-demand toggle: read-
  only display by default, "+ Add" / "Edit" button on the right edge
  of each section header reveals the inline form, Cancel hides it.
* "Trigger sync now" was a cramped link squashed into the empty-state
  flex row (visible as `Tr…` overflow before). Promoted to a proper
  btn-primary button under the empty-state copy. Hidden entirely for
  internal tables (which are server-managed — no upstream to pull).
* Hero meta now surfaces row count + payload size (when sync_state has
  them) + last sync timestamp on a single line — was missing from the
  original.
* Mode pills colored by tier (local=green, remote=amber, materialized=
  blue, internal=gray) so the basic fact about a table reads at a
  glance, not from upper-cased ALL-CAPS text alone.

* tests(v56): TDD baseline for extended data-packages content + per-table docs

68 failing tests across 8 files spec the v56 surface before any
implementation lands:

* test_schema_v55_to_v56_migration.py — schema bump, additive ALTERs
  on data_packages + table_registry, idempotency, sequential-upgrade
  preservation
* test_data_packages_repo_v56.py — repo create/update/get/list for
  owner_name, owner_team, tags, long_description, when_to_use,
  when_not_to_use, example_questions (JSON list round-trip, empty
  defaults, partial-update preservation)
* test_table_registry_v56_docs.py — update_docs for grain, platforms,
  partition_col, history, gotchas; preserves v52 docs columns
* test_api_data_packages_v56.py — PUT/POST/GET for all new fields,
  field-level validation (tag count, bullet length, description size),
  virtual badge derivation (curated/new)
* test_api_registry_docs_v56.py — PATCH /api/admin/registry/{id}/docs
  for v56 fields, validation, RBAC unchanged
* test_web_catalog_package_detail_v56.py — /catalog/p/<slug> rewrite
  asserts on rendered owner line, tag pills, badges, What it is,
  Use it when, Skip it when, Example questions, per-table extended
  detail in collapsible row, key-gotcha distinctness, admin-only Edit
* test_web_stack_card_v56_metadata.py — Browse-grid card additions
  (owner chip, tag chips, badges) without breaking back-compat for
  rows missing the new fields
* test_data_packages_no_vendor_content.py — CI guard: scans app/ +
  src/ + cli/ + config/ + scripts/ for Groupon-specific tokens from
  the colleague's spec MD; fails if any leak into OSS surfaces
* test_db_schema_version.py — bumped 55 → 56 with rationale

Plus updates schema-version assertion to 56. Implementation lands in
subsequent commits (schema migration → repo → API → templates).

* feat(v56): schema + repo for extended data-packages content

Schema additions (ALTER ADD COLUMN IF NOT EXISTS — additive + idempotent):

* data_packages: owner_name, owner_team, tags, long_description,
  when_to_use, when_not_to_use, example_questions (JSON-as-VARCHAR for
  the lists)
* table_registry: grain, platforms, partition_col, history, gotchas
  (extends the v52 sample_questions / things_to_know / pairs_well_with
  docs surface with structured per-table content)

Repo extensions:

* DataPackagesRepository.create + update accept the new fields with
  the same Optional-is-no-op contract as v51 (pass an empty list to
  clear a JSON column)
* _decode_row decodes the new JSON-list columns to Python lists; NULL
  rounds back to [] so callers don't branch
* TableRegistryRepository.update_docs grew the v56 fields alongside
  the existing v52 ones — single PATCH can write either tier
  atomically
* TableRegistryRepository._decode_row picks up platforms + gotchas in
  the same NULL-tolerant decoder

22 repo + migration tests passing. API + UI land in subsequent commits.

* feat(v56): API surface for extended data-packages + per-table docs

CreateDataPackageRequest + UpdateDataPackageRequest grew the v56 fields
(owner_name, owner_team, tags, long_description, when_to_use,
when_not_to_use, example_questions) with per-field validators that
match the Foundry spec checklist:

  * tags: ≤8 entries × ≤30 chars
  * long_description: ≤4000 chars
  * use/skip: ≤8 bullets × ≤200 chars
  * example_questions: ≤12 × ≤200 chars

_serialize emits all v56 fields plus a virtual ``badges`` list derived
server-side at render time (no DB column needed): "curated" when the
creator is in the Admin group, "new" within 30 days of created_at.
Backdating created_at or admin-status changes pick up automatically.

PATCH /api/admin/registry/{id}/docs extended with v56 structured
per-table fields (grain, platforms, partition_col, history, gotchas).
gotchas: list of {key: bool, body: str} Pydantic models with the same
≤8 cap; first key=true entry becomes the Key gotcha on the rendered
package detail page. PATCH echoes the fresh state so callers can
re-render without a second GET.

26 API tests passing (16 data-packages + 10 registry-docs).

* feat(v56): /catalog/p/<slug> rewrite + Browse-grid card augmentation

The third (and final) v56 commit lights up the UI surfaces backed by
the schema + API commits earlier in this PR:

* /catalog/p/<slug> template rebuilt around the Foundry spec's
  section ladder — hero (icon + name + badges + owner + tags +
  description + meta + Add-to-stack), "What it is" markdown body,
  paired "Use it when / Skip it when" panels, "Tables in this
  package" with collapsible per-table extended detail (grain /
  platforms / partition_col / history / gotchas + sample questions),
  and an "Example questions you can ask Claude" prompt panel. Each
  section guarded by ``{% if pkg.<field> %}`` — empty content fields
  hide the section entirely (no "No X yet" placeholder noise on the
  public-facing drilldown).
* router catalog_package_detail hydrates per-table v56 fields onto
  the tables list + derives the virtual badges (curated / new)
  server-side from creator-in-Admin + 30-day created_at.
* StackResolver.ResourceEntry grew owner_name / owner_team / tags /
  badges; _fetch_entries pulls the v56 columns + computes badges
  once per fetch using a single Admin-group SELECT.
* _data_package_entry_dict adapter passes the new fields through to
  the macro; tags are merged source-type pills + admin-authored
  category tags per the spec convention.
* _stack_card.html renders the v56 badges (top-left, data-badge=
  hooks) + the owner chip (data-card-owner hook) without breaking
  back-compat — pre-v56 rows render unchanged.
* Admin PUT handler strips the v56 docs fields from the
  read-modify-write merged dict so register() doesn't blow up
  with the now-larger row shape (same pattern as the v52 docs
  fields stripping).

5115 tests passing (+98 v56 + 18 fixed regressions from the merged-
register PUT path), 35 skipped.

* fix(rbac): Edit-on-package + Group-access 'required' persistence + CI vendor guard

Three related bugs reported on the merged-with-main branch:

1. Clicking Edit on a Data Package card landed on /admin/tables with
   a `#<pkg.id>` hash that nothing listened to — admin saw the global
   table listing, not the editor for that specific package. Added a
   `?edit_package=<pkg_id>` query-param handler in admin_tables.html
   (analog to the existing `?edit=<table_id>` and `?assign_to=<pkg_id>`
   patterns) that calls openEditDataPackageModal on DOMContentLoaded
   after a 250ms layout settle. Updated the package-detail Edit link
   to use the new query param.

2. Setting Group Access to 'required' didn't persist — re-opening
   the modal showed 'available'. Root cause was the v49
   ``resource_grants.requirement`` enum existing in the DB but the
   POST /api/admin/grants endpoint not surfacing it: ``CreateGrantRequest``
   declared only group_id + resource_type + resource_id, so Pydantic
   silently dropped the matrix's ``requirement: 'required'`` payload
   and the new row landed at the DB column default ('available').
   Plumbed ``requirement`` through ``CreateGrantRequest`` →
   ``ResourceGrantsRepository.create`` so the value persists in one
   round-trip. Plus a UNIQUE-constraint race in the matrix
   diff-apply: DELETE-old + POST-new ran in parallel via
   ``Promise.allSettled``, so POST could fire first and trip the
   unique check before DELETE freed the slot. Switched to sequential
   (await all deletes; then await all writes) across all three
   matrices (Edit Data Package, Edit Memory Domain, Edit Recipe).

3. CI vendor-content guard ``test_no_groupon_specific_strings_in_oss``
   tripped on two of my own docstrings: a "Foundry Data team" mention
   in two src/db.py comments + an ``s1_session_landings`` example in
   cli/skills/agnes-table-registration.md. Rephrased the comments to
   "extended-descriptions admin spec" and replaced the example with
   a generic ``events_daily`` table name.

5164 tests passing, 35 skipped (+4 regression tests pinning the POST
/api/admin/grants requirement contract). Vendor guard back to green.

* fix(catalog): admin Browse path drops v58 card fields

The /catalog and /memory admin god-mode branch built ResourceEntry
instances inline from pkg_repo.list() / domains_repo.list() and skipped
owner_name, owner_team, tags, and derived badges (curated/new). Visible
symptom: a package with an owner + tags rendered with the v56 chrome
for non-admin viewers but as a bare card for admins.

Adds StackResolver.browse_admin(user_id, resource_type) — admin god-mode
Browse that walks the full table but routes through the same
_fetch_entries enrichment pass as browse(), so admin + non-admin Browse
stay visually consistent. Both /catalog and /corporate-memory routes
switch to it.

Regression test in tests/test_stack_resolver_browse_admin.py covers:
owner/tags propagation, new/curated badge derivation, in_stack from
admin subscriptions, all-packages-regardless-of-grants, and the
ValueError for unsupported resource types.

* fix(catalog): three /catalog tab-strip UX bugs

1. Required Remove → red toast
   browse_admin passed empty required_ids to _fetch_entries, so the
   admin's own required grants surfaced as 'available' and the macro
   rendered an actionable Remove button that POST /unsubscribe 400'd
   on. Now derives required_ids from the admin's own groups so
   Required packages render with the disabled "In stack (required)"
   button. Regression test in test_stack_resolver_browse_admin.py.

2. Remove green-toasts but card stays until refresh
   The My-Stack empty-state placeholder was only emitted server-side
   when stack_entries was empty at render time. Removing the last
   card left the tab completely blank — users read that as "Remove
   didn't work, let me refresh". Both grid + empty-state are now
   always rendered with one of them initially hidden; the JS swaps
   visibility on add/remove instead of injecting DOM. Same fix in
   /corporate-memory.

3. "What are Recipes?" + ambiguous (admin) suffix
   Recipes tab now carries its own curator-block explainer (the
   shared one was moved inside Browse view so it doesn't bleed
   across tabs). The grey "(admin)" suffix becomes a yellow
   .admin-only-hint chip with a title tooltip — visibility hint is
   now unambiguous: yellow chip = "only you see this", non-admins
   don't see the affordance at all.

* schema: renumber v51..v58 → v52..v59 to make room for main's v51

Main 0.54.29 introduced a NEW v51 (table_registry.bq_fqn — issue #343)
that releases ahead of this branch. The unified-stack chain v51..v58
shifts up by one so main's v51 stays as the released schema and ours
become v52..v59. Function names, internal version bumps, dispatch
ladder thresholds, and the migration-test references all move
together. Subsequent merge with main lands the bq_fqn column at the
freed v51 slot.

* fix(seed): seed admin lands in BOTH Admin AND Everyone groups

The LOCAL_DEV_MODE / SEED_ADMIN_EMAIL bootstrap only added the seed
user to Admin. Everyone-scoped grants — the canonical "every-user-
sees-this" pattern for Required onboarding — didn't surface for the
seed admin's own /catalog because they weren't in Everyone. Symptom:
admin grants a Required-tier package to Everyone, then sees it on
/catalog still rendered with an "Add to stack" button (because the
admin's resolved required_ids was empty for that package).

The dual-membership keeps Admin (authorization) and Everyone
(default-grant target) intentionally separate per the design comment
on UserRepository.create — every membership remains traceable to a
concrete row, just now with a system_seed row in Everyone too. Both
INSERTs go through UserGroupMembersRepository.add_member which is
idempotent on (user_id, group_id), so re-fires on every lifespan
startup don't duplicate rows.

Regression test in test_main_seed_admin_everyone.py.

* style: unify admin-only hints across marketplace + memory detail pages

Replaces three stale ``(admin)`` parentheticals with the same yellow
``admin-only`` chip introduced for /catalog tab actions. Same tooltip
copy ("Visible only to admins — analysts won't see this …") so the
visibility hint is unmistakable wherever it appears:

- Hard delete on marketplace_plugin_detail (admin-only destructive
  action — same gating as the original suffix conveyed).
- Hard delete on marketplace_item_detail (same).
- Edit link on memory_domain_detail (title-attr only before; now a
  visible chip too).

Non-admin viewers never saw these affordances — the gates are
unchanged. Pure styling pass for consistency.

* fix(catalog): exclude soft-deleted data packages + memory domains from Browse

``StackResolver._fetch_entries`` and ``browse_admin`` were querying
data_packages / memory_domains without a ``deleted_at IS NULL`` guard.
A package soft-deleted via /admin/* (v54 soft-delete contract) stayed
visible on /catalog and /memory until either an Undo or a hard delete
— directly contradicting the soft-delete UX which is supposed to
remove the affordance immediately and only retain the row for the
Undo window.

The repository accessors (DataPackagesRepository.list,
MemoryDomainsRepository.list, list_packages_of_table, etc.) already
filter deleted rows; this commit brings the resolver's direct SQL in
line with that contract.

Regression test in test_stack_resolver_browse_admin.py.

* fix(catalog): Add/Remove updates full card chrome, not just button

The previous _applyStackChange flipped only the footer button label —
the card border (.is-in-stack class), top-right "In stack" badge, and
button color class (--add / --remove) stayed at their server-rendered
state. After Add the user saw the button checkmark but the rest of
the card still looked like "available, not in stack". They read this
as "the change didn't take — let me refresh".

This commit makes the optimistic update mirror what the server-side
macro renders for the new state:

* ``c.classList.toggle('is-in-stack', becameInStack)`` — flips the
  border + visual state class.
* Top-right ``.stack-card__req-badge--instack`` badge is injected on
  Add, removed on Remove (skipped when ``data-requirement='required'``
  — that slot is owned by the Required badge).
* Button text is "Remove" / "+ Add to stack" matching the macro
  (was "✓ In stack" which was visually nice but inconsistent).
* Button color class --add / --remove swaps so the destructive Remove
  tint kicks in immediately.

The clone-into-My-Stack path applies the same updates so the new card
in My Stack reads identically to a server-rendered in_stack card.
Mirrored in /corporate-memory.

* fix(memory): four Devin-review bugs on /memory drill-down + manifest

PR #333 Devin review surfaced four real bugs that ship a broken
/memory experience even though the unit tests passed.

1. Manifest md5 omits is_required + content (app/api/sync.py:836-840)
   _build_memory_domains_section hashed only (id|title|status) per
   item. _build_per_domain_markdown routes items between "## Required"
   and "## Approved" by is_required and embeds full content — so an
   admin edit of either dimension left the manifest md5 unchanged,
   `agnes pull` skipped the re-fetch, and the analyst kept a stale
   bundle.md. Now both fields participate in the hash.

2. required_count always 0 (src/repositories/memory_domains.py)
   list_items_of_domain only SELECTed (id, title, status) so the
   `it.get("is_required")` in the manifest builder always evaluated
   to None → required_count = 0 regardless of actual state. The
   manifest builder advertised a count it could never compute. Now
   projects is_required + content too (required by fix 1 anyway).

3. Vote URL 404 (memory_domain_detail.html:289-290)
   Constructed `/api/memory/items/{id}/vote` but the route is
   `/api/memory/{id}/vote`. Every upvote/downvote button was a
   silent no-op.

4. Dismiss/undismiss URL + method both wrong (memory_domain_detail.html:296-305)
   Constructed `/api/memory/items/{id}/dismiss` (extra /items/) and
   /undismiss (no such route — undismiss is DELETE on /dismiss).
   Both buttons silently 404'd. Now POST + DELETE on
   `/api/memory/{id}/dismiss` per app/api/memory.py:635/675.

* fix: multi-agent reviewer findings — vendor-token scrubs + manifest md5 predicate + soft-delete filter

Three reviewer findings from the multi-agent review on PR #333,
fixed in-place per CLAUDE.md issue-economy rule.

Reviewer-rules (Important — vendor-agnostic OSS):
- app/main.py:218 comment: replaced 'foundryai-prod' with generic
  'a customer prod instance' phrasing. Public OSS repo must not
  carry customer-specific tokens (CLAUDE.md § Project conventions).
- tests/test_table_registry_v56_docs.py:70 fixture string:
  replaced "user_brand_affiliation = 'groupon'" with 'acme' on
  the same rule.

Reviewer-architecture (closes still-unresolved Devin 🚩 ANALYSIS):
- app/api/sync.py _build_memory_domains_section: md5 hash loop now
  filters items to the SAME predicate the bundle renderer uses
  (is_required OR status='approved'). Pre-fix the hash iterated ALL
  items but _build_per_domain_markdown only rendered the union of
  required items + approved-non-required items — so an admin edit
  to a pending/rejected non-required item flipped the md5 against
  an identical-bytes bundle, triggering a wasteful re-fetch on
  every analyst's next 'agnes pull'. The earlier commit fixed the
  hash-input fields (is_required + content); this closes the
  set-of-items asymmetry Devin separately flagged.

Reviewer-RBAC (minor cleanup):
- app/resource_types.py _data_package_blocks and _memory_domain_blocks
  now filter 'WHERE deleted_at IS NULL' (v54 soft-delete column) so
  the /admin/access UI doesn't surface soft-deleted entities as
  grantable. Mirrors the existing filter on _recipe_blocks. No
  security leak pre-fix (resolver double-filters and re-checks at
  serve time), just UI cleanliness.
- app/services/stack_resolver.py add_to_stack: docstring note
  added explaining that authorization is enforced at the API layer
  (app/api/stack.py can_access gate), not at the resolver. The
  initial review suggested adding a defensive 403 here, but that
  broke 5 existing tests that legitimately call add_to_stack
  directly without setting up grants first; the docstring captures
  the contract instead. stack() already intersects subscriptions
  with current available_ids on every read, so a 'zombie' row from
  a misuse never leaks into the user-facing manifest.

* release: 0.55.0 — unified Browse + My Stack (Data Packages + Memory), schema v48→v59, 3 BREAKING
2026-05-19 15:00:15 +02:00
ZdenekSrotyr
c3e82972c8
feat(bq): decouple table_registry bucket from BQ dataset name (#343) (#346)
* feat(bq): decouple table_registry bucket from BQ dataset name (#343)

Adds optional `bq_fqn` column (schema v51) carrying the fully-qualified
BigQuery path (project.dataset.table) so the rebuild path no longer has
to reconstruct it from the dual-purpose `bucket` field (which is also a
UX/RBAC label).

- Schema v51 migration + _SYSTEM_SCHEMA carry the nullable column;
  rows without it keep using the legacy bucket+source_table+
  remote_attach.project path (backwards compat).
- BQ extractor honors bq_fqn per row when present: dataset/table
  override on same-project rows; cross-project VIEW path works via
  bigquery_query(billing, ...); cross-project BASE TABLE skipped with
  a clear warning (multi-ATTACH per project deferred to follow-up).
- Orchestrator pre-pass detects drift between extract.duckdb
  _remote_attach.url and overlay data_source.bigquery.project, calls
  rebuild_from_registry to regenerate when they differ. Closes the
  operational hazard where /admin/server-config edits silently left
  the on-disk extract pointing at the old project until the next
  manual sync.
- Startup config check warns when project ≠ billing_project without
  location set (the on-disk symptom is "provider returned no data"
  silently in metadata cache), and when a warehouse-like data project
  has no billing_project override (silent 403 serviceusage path).
- _resolve_bq_location warning now points at the location config key
  explicitly so operators see the actionable fix in the log.
- POST /api/admin/register-table and PUT /api/admin/registry/{id}
  accept bq_fqn; malformed values rejected at the API boundary (422).
- 25 tests covering parse_bq_fqn matrix, extractor override paths
  (same-project + cross-project VIEW + cross-project BASE TABLE skip),
  orchestrator drift sync, startup-validator heuristic, admin models.

UI surface for bq_fqn input in /admin/tables intentionally omitted from
this PR (3.5k-line template change) — admins can register through the
REST API or `agnes admin` CLI in the meantime. Multi-project ATTACH
support is the same scope deferral as the cross-project BASE TABLE
skip; both ride a follow-up PR.

* review fixes: abstract CHANGELOG, merge duplicate Changed, bump docs schema version

- CHANGELOG.md: remove customer-specific hostname + incident date range
  from the orchestrator drift-sync entry (vendor-agnostic OSS rule),
  fold the entry into the existing [Unreleased] ### Changed section
  instead of opening a duplicate heading.
- docs/architecture.md: bump 'Current schema version' from 19 to 51 to
  match SCHEMA_VERSION (per agnes-orchestrator skill rule #4).

* review fixes: vendor-agnostic test fixture + Schema v51 internal bullet

- tests/test_bq_fqn.py: replace customer GCP project ID with generic
  'my-warehouse-project' placeholder (vendor-agnostic OSS rule). Test
  asserts on the warehouse-like heuristic, not the literal project
  name, so the rename is behavior-neutral.
- CHANGELOG.md: add explicit '\*\*Schema v51\*\*' bullet under
  `### Internal` naming the new version + summarizing the additive
  nullable column (matches the convention from v47/v48 bullets).

* fix(bq): cross-project _detect_table_type bills against extractor project

Addresses Devin review on #346 — pre-fix _detect_table_type passed the
data project as BOTH the FROM-clause target AND the bigquery_query()
first arg (billing project). For cross-project bq_fqn rows where
fqn_project != project_id, the data SA holds bigquery.dataViewer on
fqn_project but the serviceusage.services.use permission only on
project_id, so the call 403'd. init_extract's broad except Exception
swallowed the error and silently skipped the row, meaning the
cross-project VIEW path at extractor.py:~696 — the PR's primary
cross-project use case — never executed.

- Add optional billing_project kwarg to _detect_table_type; defaults
  to project for backwards compat (same-project callers unaffected).
- Update the init_extract call site to pass billing_project=project_id
  explicitly. Same-project rows (fqn_project == project_id) are a
  no-op; cross-project rows now route billing to the project where
  the SA actually has services.use.
- 2 new tests in TestDetectTableTypeBilling cover (a) explicit
  billing_project routing to bigquery_query 1st arg + data project
  staying in FROM, and (b) the backwards-compat default. Plus
  test_cross_project_detect_call_bills_against_extractor_project
  pins the call-site wiring — captures the (project, billing_project)
  pair the extractor passes for a cross-project bq_fqn row.

* release: 0.54.29 — bq_fqn decoupling + marketplace refactor + setup-script UX

Accumulated [Unreleased] content from #342 (flea marketplace refactor),
#344 (setup script step-2 cwd check), and #346 (this PR — bq_fqn column
+ orchestrator drift sync + startup config check). Schema v51.
2026-05-19 11:17:32 +00:00
Vojtech
bd90485dbd
fix(web): setup script step 2 checks pwd, no auto-mkdir (#344)
The /home onboarding page already has a visible manual "Step 3 — create
your workspace folder" instructing the user to `mkdir -p ~/<dir> && cd
~/<dir>` BEFORE pasting the install script into Claude Code. The pasted
script's step 2 then re-ran the same mkdir+cd, which silently overrode
an intentional alternate install path (e.g. user cd'd to
~/work/agnes-prod on purpose) and was redundant on the default path.

Step 2 now verifies the user is in `$HOME/<workspace_dir>` via `pwd`.
On mismatch it stops and asks the user to either re-paste from the
correct folder or reply `install here` to accept the current cwd.
Never auto-creates a folder.

Step 9 (restart Claude Code) references the install directory confirmed
in step 2 instead of a hardcoded `~/<workspace_dir>`, so users on a
custom path see accurate guidance.
2026-05-19 11:20:33 +02:00
minasarustamyan
c6c72b9c00
feat(flea): marketplace refactor — data model, attribution, UI unification (#342)
* feat(flea): phase-1 — title, tagline, synthetic_name columns + upload UX

Schema v49 adds three user-facing metadata columns to store_entities:

- title (NOT NULL) — humanized display name shown on marketplace
  surfaces in later phases. Acronym-aware humanizer in
  src/store_naming.py (27 entries: MCP, API, OAuth, S3, …) shared
  with the frontend via Jinja-injected dict so JS pre-fill and
  Python backfill produce identical output.
- tagline (NULL, ≤200 chars) — optional short description for card
  listings. Long-form `description` stays.
- synthetic_name (NOT NULL) — deterministic `<name>-by-<owner_username>`
  stored as a column for indexing and as the single source of truth
  for attribution lookups in later phases. Today's bundle bake still
  uses suffixed_name() at the same call sites.

Migration (_v48_to_v49_migrate, Python function — humanize has no
SQL equivalent) backfills existing rows: title from
humanize_name(strip_archive_suffix(name)), synthetic from the concat
formula; tagline stays NULL. Idempotent (ADD COLUMN IF NOT EXISTS +
SET NOT NULL no-op on re-run).

Upload form (store_upload.html step 2) reorders fields: Title
(pre-filled from server-side humanize, JS keeps it in sync until
the user edits manually) → Name + dark synthetic preview on one
row (matches marketplace_item_detail.html dark code styling, no
copy button — preview only) → Short description with character
counter → Description (unchanged). Edit form (store_edit.html)
mirrors the layout with pre-filled values from the entity row.

API:

- POST /api/store/entities/preview returns `title` (humanized
  fallback) for upload form pre-fill.
- POST + PUT /api/store/entities accept `title` and `tagline` form
  fields with 100/200-char validation; PUT recomputes
  synthetic_name when `name` changes (caller responsibility per
  repo contract).
- StoreEntityResponse exposes all three new fields.

Repository:

- create() takes title + tagline + synthetic_name as optional
  kwargs with derived defaults (humanize_name(name) / concat) so
  existing test fixtures don't need to thread them.
- update() supports partial updates on all three; tagline empty
  string clears via NULL sentinel.
- archive() recomputes synthetic_name on rename to the archived
  slug so the column stays consistent with name.

Tests:

- New test_schema_v48_to_v49_migration.py: fresh install,
  populated-row backfill (incl. archived row strip), idempotence,
  NOT NULL constraint verification.
- test_store_naming.py: 14 humanize parametrize cases + acronym
  dict invariants.
- test_store_api.py::TestStoreV49Metadata: preview humanize, POST
  with explicit + fallback title, 100/200-char rejects, PUT
  partial update + synthetic recompute on rename.
- Schema version assertion bumps (48 → 49) in test_db_schema_version,
  test_home_stats, test_schema_v42_migration, test_schema_v46_migration.

Phase 1 only — surface rendering on cards / detail pages and
Claude Code bundle propagation come in later phases.

* feat(flea): phase-2 — wire title/tagline/owner through marketplace cards + detail pages

Phase 1 (7f4cfcbb) populated the three new columns on store_entities;
phase 2 surfaces them across the web presentation layer so the kebab-
case slug + bare username no longer leak into user-facing copy.

API:

- `_flea_to_item` now takes `conn` (both callsites updated) and sets
  `display_name=entity.title`, `tagline=entity.tagline`, `owner=
  _resolve_owner_display(conn, owner_user_id, owner_username)` —
  matches the chain the curated path already uses (users.name →
  users.email → fallback). The card JS chain `it.display_name ||
  it.name` then renders the friendly form; `name` stays at the
  suffixed slug as the technical identifier JS uses for fallbacks.
- `flea_detail` adds `display_name` + `tagline` to PluginDetailResponse
  so the standalone skill/agent + plugin detail heroes pick them up
  through the existing `d.display_name` / `d.tagline` chains.
- `_flea_inner_parent_fields` swaps `parent_display_name` from
  `strip_archive_suffix(name)` to `entity.title or strip_archive_suffix(
  name)`. Drives parent-plugin label in four surfaces at once:
  breadcrumb 3rd segment, hero "part of <plugin>" meta-row,
  helper "This skill is part of <plugin>" panel, and the Details
  sidebar's "Parent plugin" row.

Templates — `marketplace_item_detail.html`:

- Pre-render: browser title, hero h1, and hero-window-label read
  `(entity.title if entity else None) or inner_name or item_name or
  plugin_name` so the SSR shell shows the friendly title before the
  JS fetch lands (no flash of kebab-case).
- Breadcrumb last segment for flea standalone drops the `d.manifest_name
  || heroTitle` fallback in favour of just `heroTitle` — manifest_name
  is the suffixed slug and users explicitly didn't want it in the path.
- Hero meta-row for flea standalone is now hidden. The prior "by
  <author> · N installed · <size>" line duplicated install count
  (hero telemetry chip below), owner + bundle size (Details sidebar).

Templates — `marketplace_plugin_detail.html`:

- Same SSR pre-render swap (title, h1, window-label, crumb-name).
- Hero tagline element starts hidden; JS shows it only when
  `d.tagline` is truthy. Pre-fix it fell back to `d.description`
  (long-form text), which read awkwardly under the h1 and pulled the
  hero too tall. Description still renders in the "What it does"
  panel below the hero.
- Initial "Loading…" placeholder removed so entities without a
  tagline don't flash that text mid-fetch.

Tests:

- New `TestFleaPhase2Presentation` class in test_marketplace_api.py
  (6 cases): card title + tagline + full-name owner, owner fallback
  chain when users.name is NULL, flea_detail exposes title + tagline,
  tagline null when omitted, inner skill parent_display_name uses
  entity.title (explicit + humanize-fallback variants).
- Updated `TestListItems.test_flea_lists_uploads` to assert both
  `display_name == "Alpha"` (humanized) and `name ==
  "alpha-by-alice"` (suffixed slug compat).
- Updated `TestWebPages.test_marketplace_flea_detail_page_renders`
  to look for the humanized title ("Page Skill") in the SSR shell
  instead of the kebab-case `page-skill`.

* feat(flea): phase-3 — read synthetic_name from DB, suffixed_name() only on write

Phase 1 added the column + backfill, repo write paths keep it in sync.
Phase 3 routes every READ callsite through `store_entities.synthetic_name`
directly instead of recomputing `<name>-by-<owner_username>` on the fly,
and switches the collision query off the inline string concat. The
`suffixed_name()` primitive now lives exclusively in write flows.

Read callsites updated (all read `entity["synthetic_name"]` directly,
no fallback — the column is NOT NULL and a missing value would be a
real bug worth surfacing as KeyError):

- app/api/marketplace.py:_flea_to_item — card MarketplaceItem.name.
- app/api/marketplace.py:flea_detail — PluginDetailResponse.manifest_name.
- app/api/store.py:_entity_to_response — StoreEntityResponse.invocation_name.
- app/api/store.py PUT bundle re-bake — `suffixed` passed to
  `_bake_plugin_tree`; entity is loaded pre-rename, so its
  synthetic_name is the OLD value `_bake_plugin_tree` expects.
- app/api/store.py PUT rename — `old_suffix` for `_rename_baked_tree`.
- app/api/my_stack.py — StoreInstallEntry.invocation_name.
- src/marketplace_filter.py — manifest_name in served plugin entry.

`suffixed_name` imports removed from marketplace.py, my_stack.py, and
marketplace_filter.py (no remaining callsites). store.py keeps the
import for its write paths:

- POST create (`suffixed = suffixed_name(final_name, username)` →
  passed to `_bake_plugin_tree` and `repo.create(synthetic_name=...)`).
- PUT rename collision check (`new_suffixed`).
- PUT rename `new_suffix` for `_rename_baked_tree` (proposed value).
- PUT rename `new_synthetic` for `repo.update(synthetic_name=...)`.
- Archive `old_suffix` + `new_suffix` for `_rename_baked_tree`
  (retro-compute pre-archive value after `repo.archive` already
  overwrote the DB row with the post-archive synthetic).

Collision SQL — `_suffixed_already_taken`:

  WHERE name || '-by-' || owner_username = ?   (before)
  WHERE synthetic_name = ?                     (after)

Same matches today (phase 1 backfill + NOT NULL invariant + write
paths in sync); indexable + single source of truth going forward.

Repository:

- UserStoreInstallsRepository.list_for_user explicit SELECT extended
  with `se.title`, `se.tagline`, `se.synthetic_name` so my_stack and
  marketplace_filter callers can read them off the joined row.

Tests:

- test_store_api.py::test_invocation_name_reads_from_synthetic_column —
  upload entity, manually override the column with a non-canonical
  value, verify GET response returns the override (proves read path
  consumes the column, not recomputes).
- test_marketplace_api.py::test_flea_card_and_detail_read_synthetic_name_from_db —
  same proof for `MarketplaceItem.name` (card) and
  `PluginDetailResponse.manifest_name` (detail).

* feat(flea): phase-4 — rename agnes-store-bundle → flea (synthetic plugin)

The synthetic plugin that wraps loose flea-market skills + agents into
one Claude Code plugin is renamed from `agnes-store-bundle` to `flea`.
Plugin-type flea uploads (their own standalone plugin entry) are
unaffected.

Constants:
- src/marketplace_filter.py:
  - BUNDLE_PLUGIN_NAME: "agnes-store-bundle" → "flea"  (Claude Code
    plugin manifest name + .claude-plugin/plugin.json name)
  - BUNDLE_PREFIXED_NAME: "store-bundle" → "flea"      (on-disk ZIP /
    git tree path, now plugins/flea/...)

Attribution layer (services/session_processors/usage_lib.py):
- FLEA_BUNDLE_PREFIX: "agnes-store-bundle" → "flea". The JSONL
  invocation identifier going forward is `flea:<skill-name>`.
- New `_LEGACY_FLEA_BUNDLE_PREFIXES = ("agnes-store-bundle",)`.
  `MarketplaceItemLookup.resolve()` + `_attribute_event()` accept BOTH
  the new and the legacy prefix so historic usage_events (~90-day
  retention) continue attributing to source='flea'. The tuple becomes
  a no-op once the rename has been live past the retention window —
  a follow-up commit can drop it then.
- USAGE_PROCESSOR_VERSION bumped 6 → 7 so the session-pipeline reprocess
  loop re-runs attribution with the new + legacy prefix branches.

User-facing copy:
- /api/store/bundle.zip Content-Disposition filename: agnes-store-bundle.zip → flea.zip
- `agnes admin store pull` default --out: agnes-store-bundle.zip → flea.zip
- Docstrings + JS comment + welcome template comment updated.

Tests:
- skill_flea.jsonl fixture identifier updated to flea:flea-skill.
- New skill_flea_legacy.jsonl with the legacy prefix for backward-compat
  coverage.
- New test `test_legacy_agnes_store_bundle_prefix_resolves` replays the
  legacy fixture and asserts source='flea' attribution still lands.
- All other test assertions / mocks substituted mechanically:
  test_session_processor_usage.py, test_usage_rollups.py,
  test_marketplace_filter_store.py, test_store_api.py,
  test_cli_refresh_marketplace.py.
- `_seed_flea_entity` (test_usage_rollups.py) + `_seed_attribution`
  (test_session_processor_usage.py) helpers now supply the NOT NULL
  `title` + `synthetic_name` columns from phase 1, since they INSERT
  directly bypassing the repo's create() fallback.

Client rollover note (CHANGELOG): `agnes refresh-marketplace` will
install the new `flea@agnes` plugin and the local marketplace clone's
`plugins/store-bundle/` source folder is removed via `git reset --hard`.
Whether Claude Code itself auto-prunes the orphan `agnes-store-bundle
@agnes` registry entry is undocumented — to verify empirically on the
dev VM. If the orphan entry lingers, a follow-up will add targeted
cleanup; until then users can manually run
`claude plugin uninstall agnes-store-bundle@agnes`.

Verified locally: 98 passed (session_processor_usage + usage_rollups +
marketplace_filter_store + cli_refresh_marketplace) + 228 passed/2
skipped (store_api + marketplace_api + admin_store_submissions +
store_entity_versions + store_repositories).

* fix(flea): phase-5 — attribution keyspace mismatch (closes #335)

Pre-fix every flea skill/agent invocation silently fell through to
`usage_events.source = 'builtin'`. Root cause: lookup tables in
`services/session_processors/usage_lib.py` keyed `_flea_entities` (and
the derived `_flea_plugins` set) by `store_entities.name` — the
un-suffixed display name. Claude Code writes invocations as
`flea:<synthetic_name>` (e.g. `flea:xlsx-by-c-marustamyan`), so
`dict.get(local)` always missed and the resolver fell through to
builtin. Result: marketplace cards, detail telemetry chips, admin
group-by-source all showed 0 flea invocations even when the raw
JSONL stream was correct.

Phase 1 added the `synthetic_name` column + backfill; phase 4 renamed
the bundle prefix to `flea`; phase 5 finally flips the lookup
keyspace to match what JSONL writes.

usage_lib.py:
- `MarketplaceItemLookup.__init__` preload: `SELECT synthetic_name,
  type FROM store_entities` (was `SELECT name, type`). `_flea_plugins`
  set derived from those keys, so it now carries synthetic_names
  too — matches what Claude Code writes when invoking a skill nested
  inside a flea plugin (`<synthetic>:<inner>`).
- `rebuild_rollups` preload: same SELECT change; also derives
  `flea_plugins` and threads it through `_aggregate_events` /
  `_rebuild_window`.
- `_attribute_event`: signature extended with `flea_plugins`; new
  branch `if prefix in flea_plugins: return ("flea", default_type,
  prefix, local)` for flea-plugin-nested skills/agents. This branch
  was added to `MarketplaceItemLookup.resolve()` in v6 (commit
  e076ebbe) but the rollup builder's helper was never updated to
  match, so nested skills inside flea plugins silently dropped out
  of the daily/window fact tables.
- `USAGE_PROCESSOR_VERSION`: 7 → 8. Forces the session-pipeline
  reprocess loop to re-attribute existing usage_events rows with
  the corrected lookup so rollup tables fill correctly on the next
  tick.

marketplace.py — 4 API stats lookup callsites switched from
`entity["name"]` to `entity["synthetic_name"]`:
- `_flea_to_item` (card stats lookup)
- `flea_detail` (`_build_telemetry` + `_load_inner_items_stats_by_parent`)
- `flea_skill_detail` (inner detail `parent_plugin` key)
- `flea_agent_detail` (inner detail `parent_plugin` key)

Tests:
- `skill_flea.jsonl` invocation: `flea:flea-skill` →
  `flea:flea-skill-by-alice` (mirrors what Claude Code writes after
  phase 1/4 — the suffixed synthetic_name).
- `test_flea_skill_attributed_with_empty_parent` assertion: rollup
  `name` column now carries the synthetic_name.

No legacy `agnes-store-bundle` prefix backward compat — clean cut per
user direction (dev phase, no production data worth preserving).

Verified locally: 53 passed targeted (session_processor_usage +
usage_rollups + marketplace_filter_store) + 215 passed/2 skipped
broader (store_api + marketplace_api + admin_store_submissions +
store_entity_versions).

* fix(flea): phase-6 — plugin-level rollup aggregation parity for flea

Flea plugin entity cards + detail pages showed 0 invocations even
though nested skills had correct rollup rows. Root cause: the
plugin-level aggregation pass in `_aggregate_events` was hardcoded
to `source='curated'` only:

    if source != "curated" or not parent:
        continue
    if group_by_day:
        pkey = (day, "curated", "plugin", "", parent)
    else:
        pkey = ("curated", "plugin", "", parent)

So flea plugin entities never got a synthetic
`(source='flea', type='plugin', parent_plugin='', name=<synth>)`
row aggregating nested invocations. `_load_invocation_stats('flea')`
filters `parent_plugin = ''` and returned no row for flea plugin
entity cards, so `stats.get(entity["synthetic_name"])` missed and
the API exposed 0/0.

Triggered by empirical observation on the dev VM —
`codex-second-opinion-by-c-marustamyan` plugin showed 0 calls in
the listing card while its three inner skills (codex-setup ×3,
codex-review ×1, codex-second-opinion ×1) had the expected child
rollup rows.

Fix:

- Extend the guard to `source in ("curated", "flea")`.
- Replace the hardcoded `"curated"` in the `pkey` tuple with the
  loop's `source` variable, so flea aggregation lands as `source=
  'flea'` and curated aggregation continues landing as
  `source='curated'`.

API path unchanged — `_load_invocation_stats('flea')` filters
`parent_plugin = ''` already picks up the new aggregated row
alongside standalone skill/agent rows. Rollup `name` field carries
the synthetic_name keyspace; no collision between standalone entity
synthetic and plugin entity synthetic (global suffix uniqueness
enforced by `_suffixed_already_taken`).

`USAGE_PROCESSOR_VERSION` bumped 8 → 9 to force a reprocess pass so
historic nested-invocation data fills the new plugin-level rows on
the next tick (instead of waiting for the next live invocation).

Tests:

- New `test_flea_plugin_row_aggregates_children` mirrors the existing
  `test_curated_plugin_row_aggregates_children`: seeds a flea plugin
  entity, three nested events (one user invoking two skills, a
  second user invoking one) → asserts the aggregated plugin row
  carries count=3, distinct_users=2 (union, not sum), plus the child
  rows survive alongside.

Verified locally: 43 passed (session_processor_usage + usage_rollups)
+ 82 passed/2 skipped broader (+ marketplace_filter_store +
marketplace_api).

* refactor(marketplace): phase-7 — unify Details sidebar across detail surfaces

Five marketplace detail surfaces (curated plugin, flea plugin, curated
inner skill/agent, flea inner skill/agent, flea standalone skill/agent)
had drifted on which Details rows they show and what order — the same
field landed in different positions, some fields duplicated hero info,
and the flea plugin Owner row leaked the kebab-case `owner_username`
slug instead of the user's real name. This commit aligns all five
surfaces on a single scan order driven by UX priority:

  identity → life-stage → telemetry → debug-tier

Concretely:

  1. Curator / Owner          (first scan signal — trust)
  2. Parent plugin            (inner skill/agent only)
  3. Released                 (top-level only — plugins + flea standalone)
  4. Last used                (recency)
  5. Active days              (engagement consistency)
  6. Version                  (flea standalone only — content hash)
  7. Bundle size              (debug-tier)

Dropped:

  - Slug field on plugin detail surfaces (`marketplace_id` for curated,
    `entity_id` for flea). Pure debug info, never user-relevant; URL
    already carries it.
  - Category + Installs on flea standalone skill/agent detail.
    Category is already shown as a hero badge; install count is in
    the hero telemetry chip — sidebar duplication added noise.

Owner display:

  - Flea plugin Owner row now reads `d.owner_display` (resolved through
    `users.name → users.email → owner_username` by `_resolve_owner_display`
    in `app/api/marketplace.py:1491`) instead of the raw `d.author_name`
    (which is `owner_username`, the kebab-case slug). API field already
    populated from phase 2; templates just consume it.
  - Curated Curator row continues to read `d.author_name` from
    marketplace-metadata.json; `owner_todo` placeholder behavior
    preserved.

Files:

  - app/web/templates/marketplace_plugin_detail.html — rewrote the
    Details render loop (lines 1364-1427 area). Slug row removed,
    rows reordered, Owner branch reads `d.owner_display`.
  - app/web/templates/marketplace_item_detail.html — both branches of
    the Details sidebar (inner skill/agent + flea standalone) re-laid
    around the same scan order. Telemetry helper unchanged, just
    repositioned. Category + Installs rows removed from the
    standalone branch.

No new tests — no existing test asserts the precise order of Details
rows or references the dropped fields in a sidebar context (grep
confirmed). API surface unchanged.

Verified locally: 84 passed / 2 skipped on `test_marketplace_api.py`
+ `test_store_api.py`.

* fix(flea): post-review hardening — N+1, v50 UNIQUE, docs, test cleanup

Addresses 5 critical findings from PR #342 code review:

1. N+1 query in `_flea_to_item` — owner-display resolution previously
   ran one `SELECT … FROM users WHERE id = ?` per item in the listing
   comprehension. Now batched via `_load_users_display` IN-query
   prefetch; 50 items drops 51 user queries to 2. Regression-guarded
   by `TestFleaOwnerDisplayBatched` (spies `_resolve_owner_display`
   and asserts it's not called inside the list path).

2. Misleading comment in `src/marketplace_filter.py` claimed the
   attribution layer accepts both `agnes-store-bundle` and `flea`
   prefixes — it doesn't (clean cut per CHANGELOG). Rewrote to match
   reality.

3. CHANGELOG `[Unreleased]` had two `### Changed` blocks. Merged into
   one (BREAKING bullet first).

4. New v49→v50 migration adds `UNIQUE INDEX
   idx_store_entities_synthetic_name`. v49 made `synthetic_name` the
   canonical attribution key but uniqueness was only app-enforced;
   v50 promotes the invariant to the DB layer. Migration pre-checks
   for existing duplicates and raises `RuntimeError` listing them
   rather than letting `CREATE UNIQUE INDEX` fail mid-way. v48→v49
   migration gained an `is_nullable='YES'` guard on its `SET NOT NULL`
   ALTERs so re-runs on a fully-migrated DB don't trip DuckDB's
   "cannot alter entry … entries depend on it" block (the new index
   counts as such an entry). Index is created by the migration only —
   keeping it out of `_SYSTEM_SCHEMA` preserves fresh-install ordering
   (CREATE TABLE → v49 ALTERs → v50 CREATE INDEX).

5. Deleted three redundant version-pinned schema asserts whose names
   lied about their bodies (`test_schema_version_is_42` asserting
   `== 49`, etc.). Canonical assert lives in
   `test_db_schema_version.py`, renamed to
   `test_schema_version_matches_constant`.

* fix(db): gate v34→v38 store_entities ALTER COLUMN steps on column state

CI on Linux failed `test_v17_to_v18_drops_*` after the v50 UNIQUE INDEX
landed. Root cause: those tests open a DB at the full target version,
seed fixtures, then reset `schema_version` to 17 and reopen — forcing
the ladder to re-run from 17 → current. With the v50 index now in place,
DuckDB blocks intermediate `ALTER COLUMN` steps on `store_entities`
("Cannot drop this column: an index depends on a column after it!" /
"Cannot alter entry because there are entries that depend on it"),
because `synthetic_name` (the indexed column) sits positionally after
the columns those steps touch.

Fix: convert the three SQL-list migrations that hit store_entities into
defensive Python functions:

- `_v34_to_v35_migrate` short-circuits when `synthetic_name` already
  exists (post-v49 shape — the visibility_status rebuild is moot and
  the DROP COLUMN would be blocked by the index).
- `_v35_to_v36_migrate` gates the `visibility_status SET NOT NULL` +
  `SET DEFAULT` on `is_nullable='YES'` so it's a true no-op when the
  column is already constrained.
- `_v37_to_v38_migrate` gates the `version_no SET NOT NULL` step the
  same way.

Forward-roll path (real installs that never reset schema_version) is
unchanged: the gates fire `YES` → ALTERs run. The fix only changes
behavior for the "DB is already at v50 shape but version row says 17"
scenario the tests construct.

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
2026-05-19 02:32:41 +02:00
David Rybar
e11f03eb60
fix(api): sample endpoint returns 500 for materialized BQ tables (#341)
* fix(api): v2 sample endpoint returns 500 for materialized BQ tables

build_sample in app/api/v2_sample.py checked only source_type ==
'bigquery' before routing to _fetch_bq_sample, so materialized
tables (source_type='bigquery', query_mode='materialized') attempted
a live BigQuery query for data that lives locally as parquet —
causing an unhandled exception and HTTP 500.

Fix mirrors the existing guard already in v2_schema.py (#261): skip
_fetch_bq_sample when query_mode='materialized' and fall through to
the local parquet read path. The parquet is the source of truth for
any materialized source regardless of source_type.

Regression test test_materialized_bq_table_reads_parquet_not_bq
patches _fetch_bq_sample with a sentinel, registers a materialized
BQ table, calls build_sample, and asserts (a) the sentinel was never
hit and (b) rows came from the local parquet.

Credit @davidrybar-grpn (#341, cleaned + rebased onto post-#340 main).

* release: 0.54.28 — v2 sample endpoint materialized-BQ 500 fix

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-18 22:57:32 +02:00
Monika Feigler
86933a2cb5
fix(web): move keboola {% endif %} so edit-modal JS is always available (#340)
* fix(web): keboola sync-mode helpers escape the {% if data_source_type == 'keboola' %} guard

Edit-modal functions were wrapped inside
{% if data_source_type == 'keboola' %} in admin_tables.html. Two of
them — _getEditKbSyncMode and onEditKbSyncModeChange — are called
from sync-mode radio buttons that are rendered for ALL instance
types (not inside any Jinja2 conditional). On a BigQuery or CSV
instance the JS functions were absent from the page, causing a
ReferenceError when the edit modal was opened.

Fix: split the conditional into two regions:

1. Discover helpers (loadKeboolaBuckets, loadKeboolaTables) — remain
   inside {% if keboola %}, they call the Keboola Storage API.
2. _getEditKbSyncMode + onEditKbSyncModeChange — moved outside the
   guard, because the sync-mode radio buttons are rendered for all
   instance types.
3. Phase F2 edit modal + prefillFromKeboolaTable — remain inside
   {% if keboola %}, called only from Keboola-conditional HTML.

Credit @MonikaFeigler.

* release: 0.54.27 — /admin/tables edit modal ReferenceError fix on non-Keboola instances

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-18 20:58:24 +02:00
Vojtech
c552bf8243
feat(api): enforce API design rules via pytest + fix DELETE/status-code violations (#338)
* feat(api): enforce API design rules via pytest + fix DELETE/status-code violations

Adds tests/test_api_design_rules.py with four forward-only design guardrails
that prevent new endpoints from accumulating REST debt:

  Rule 1 — No new verbs in URL paths (existing 28 grandfathered via allowlist)
  Rule 2 — DELETE must declare 204 No Content (zero allowlist entries)
  Rule 3 — Creator POSTs (path has GET counterpart) must declare 201/202
  Rule 4 — All protected /api/* routes must declare 401 and 403

Fixes found by running the rules:

- DELETE /api/admin/metrics/{metric_id}: return 204, drop redundant body
- DELETE /api/memory/{item_id}/dismiss (undismiss): return 204, drop body
- POST /api/memory/admin/contradictions: add status_code=201 (creates a resource)
- app/main.py: _add_auth_error_responses() injected into app.openapi() at startup;
  declares 401/403 on all protected /api/* operations centrally, fixing the 120
  routes that previously omitted these response codes from the spec.

Closes #337

* fix(api): resolve CI failures — extend 204 fixes + complete allowlists

- Fix remaining 6 DELETE endpoints to return 204: store entities,
  store entity install, marketplace curated install, marketplace plugin
  system flag, admin store submission, and observability view
- Update all affected tests to expect 204 (removed body assertions)
- Add 4 missing verb paths to _VERB_PATH_ALLOWLIST in test_api_design_rules.py
- Add 2 upsert endpoints to _CREATOR_POST_ALLOWLIST
- Update admin_marketplaces.html to not call r.json() on 204 DELETE

* fix(tests): align 2 DELETE-asserting tests with 204 contract (post-#339 rebase)

CI's test-shard (1) and (4) failures on this PR were caused by
Vojta's second commit (`fix(api): resolve CI failures — extend 204
fixes`) flipping more DELETE endpoints to status_code=204 than just
the two mentioned in the PR body. Two tests assert status_code==200
on the DELETE response and broke:

- tests/test_admin_store_submissions.py::TestQuarantineGates::test_admin_can_delete_quarantined
  (DELETE /api/store/entities/{entity_id})
- tests/test_store_api.py::TestInstallCycle::test_admin_hard_delete_cascades_installs
  (DELETE /api/store/entities/{entity_id}?hard=true)

Updated both to assert 204 with a comment pointing at
tests/test_api_design_rules.py rule 2 so future reviewers can
trace the contract. Verified via broader scan that no other test
asserts == 200 on a .delete() response directly (4 other sites do
.delete() then check 200 on a subsequent GET — those are fine).

* release: 0.54.26 — API design rules (test_api_design_rules.py) + 8 DELETE endpoints flip to 204

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-18 15:25:07 +02:00
Vojtech
c5948f26fc
fix(api): harden API surface before Swagger (issue #336) (#339)
* fix(api): harden API surface before Swagger — 9 findings from issue #336

ADV-001: POST /api/sync/table-subscriptions now checks can_access() per
table entry, matching the gate already on POST /api/sync/settings.

ADV-002: GET /webhooks/jira/health gated behind require_admin; jira_domain
removed from response to prevent anonymous info disclosure.

ADV-003: GET /api/version no longer exposes commit_sha or schema_version.

ADV-005: /docs, /redoc, /openapi.json now require a valid session via custom
FastAPI routes (docs_url=None, redoc_url=None, openapi_url=None).

ADV-006: /cli/ and /webhooks/ added to _API_PATH_PREFIXES so future
auth-gated routes there return JSON 401 not an HTML redirect.

ADV-007: GET /api/catalog/tables wired to CatalogTablesResponse model.

ADV-008: TableSubscriptionUpdate.tables capped at max_length=500.

ADV-009: GET /api/users and GET /auth/admin/tokens accept limit/offset
(default 1000, max 10000); repositories updated accordingly.

Tests: 11 new regression tests in TestApiHardening336; test_jira_webhooks
fixture updated with seeded admin user; OpenAPI snapshot regenerated.

* fix(test): update test_journey_jira health check to use admin auth after ADV-002 gate

* fix(security): close /auth/bootstrap auth-bypass + BREAKING markers on ADV-002/003/005

Reviewer-flagged regression introduced by ADV-009's pagination on
UserRepository.list_all(): the silent default LIMIT 1000 broke the
bootstrap check at app/auth/router.py and the startup no-password
warning at app/main.py — both call list_all() with no args and depend
on exhaustive enumeration.

On an instance with >1000 users where no password-holder lands in
the email-sorted first page, [u for u in list_all() if
u.get('password_hash')] becomes empty → bootstrap re-opens → an
unauthenticated caller can claim admin via /auth/bootstrap. Real
auth-bypass on a security-sensitive boot path.

Fix:
- src/repositories/users.py: list_all() restored to no-arg, returns
  EVERY row (no LIMIT). Comment explicitly warns against re-adding
  pagination here. API-surface pagination moved to a new
  list_paginated(limit, offset) method with its own docstring.
- app/api/users.py: GET /api/users now calls list_paginated().
  Existing query-param validation (limit <= 10000) preserved.

Regression guards in tests/test_security.py::TestApiHardening336:
- test_users_list_all_returns_every_row_no_silent_limit asserts
  list_all() takes no params other than self (via inspect.signature)
  so a future cleanup can't accidentally re-add limit/offset.
- test_users_list_paginated_is_separate_method asserts the
  paginated variant is a distinct method, not an overload.

CHANGELOG: added **BREAKING** markers per CLAUDE.md release
discipline to three pre-existing ADV bullets that are observable
breaking changes for external consumers:
- ADV-002 (webhook health going from anonymous to admin-only)
- ADV-003 (/api/version dropping commit_sha + schema_version)
- ADV-005 (/docs, /redoc, /openapi.json going from anonymous to
  session-required)

* release: 0.54.25 — API hardening before Swagger (ADV-001..009) + bootstrap-bypass regression fix

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-18 15:13:21 +02:00
Vojtech
bc5956ac94
test(store): full flea-market lifecycle integration coverage (#334)
Integration test class covering the v1 → subscribe → v2 → v3-blocked →
override → restore-v1 lifecycle from issuer + admin + subscribed-user
perspectives. Asserts BOTH the entity state AND the served
`marketplace.zip` bytes + ETag at each transition — the channel the
analyst's `agnes pull` actually hits.

Main test:
- TestFullLifecycleFromInstaller::test_main_lifecycle_v1_v2_v3blocked_override_restorev1

5 corner cases covering role × phase gaps surfaced by think-hard
coverage analysis:
- test_unsubscribed_user_does_not_get_entity — negative control on
  install filter
- test_late_subscriber_during_quarantine_gets_v2 — fresh install
  during v3-blocked window must get v2
- test_non_owner_does_not_see_quarantine_banner — privacy gate: only
  owner + admin see in-flight failure detail
- test_second_restore_of_v1_triggers_reuse_path — PR #332 lifecycle
  validation (the EXACT live agnes-development bug)
- test_archived_entity_keeps_serving_installed_subscribers —
  CLAUDE.md contract: archive hides from browse but preserves
  existing user_store_installs

Reuses existing scaffolding (`_create_user`, `_make_skill_zip`,
conftest autouse). Adds class-scoped static helpers `_install`,
`_serve_zip`, `_drive_llm`, `_latest_sub_id`, `_ent`, `_installs`,
`_setup_guardrails_on`.

Plan: ~/.claude/plans/peppy-napping-rose.md.

198 tests green across touched surface (admin + versions + api +
marketplace).
2026-05-16 12:41:09 +02:00
Vojtech
cd03028776
fix(store): restore reuses prior approved verdict + admin detail surfaces content_quality (#332)
* fix(store): restore reuses prior approved verdict; admin detail surfaces content_quality

Live bug on agnes-development: entity 6ba2ee1d…'s v5 submission (third
restore of v1, byte-identical to v1/v2/v4/v6) landed `blocked_llm`
while the other identical-hash siblings landed `approved`. Anthropic
structured output is non-deterministic — same bytes flipped
`content_quality.verdict` pass↔fail across calls. Admin detail page
made the failure look mysterious: only security-findings table
rendered, so a content-quality-only block showed up as
"No findings — model verdict was clean".

Two fixes:

1. Restore endpoint reuses a prior `approved` submission's verdict
   when the restored bundle hash matches an existing history entry
   AND `reviewed_by_model` matches. Skips the LLM call, stamps the
   new submission with the prior verdict + `reused_from_submission_id`
   marker. Deterministic + saves Anthropic tokens. Gated on
   schedule_async_llm so guardrails-off keeps its existing path.

2. Admin detail template now renders `content_quality.issues` in its
   own table + adds an explicit "Blocked but no findings recorded"
   notice for the transient-non-determinism case + surfaces the
   reuse marker when present.

Reuse falls back to a real LLM call when:
- prior submission's reviewed_by_model doesn't match current (admin
  upgraded tier Haiku → Sonnet → Opus)
- prior submission was guardrails-off (no reviewed_by_model)
- no history entry has matching hash

Tests:
- TestRestoreReusesApprovedVerdict::test_restore_of_approved_version_skips_llm_and_reuses_verdict
- TestRestoreReusesApprovedVerdict::test_restore_legacy_v1_falls_back_to_llm

* fix(store): admin detail v# by submission_id + version switcher

Three related fixes surfaced live by a user inspecting submission
47bbc1f5… on localhost where v# rendered as v1 even though current
was v10.

1. Admin queue + admin detail derive submission v# by submission_id
   instead of hash. Pre-fix the loop matched first hash-equal entry
   in version_history — always v1 when bundles were byte-identical
   (which is the common case after the restore-reuse path). Two
   call sites updated:
   - `src/repositories/store_submissions.py:list_for_admin` (queue
     v# column)
   - `app/web/router.py:admin_store_submission_detail_page` (detail
     page v# chip on each section header)

   Same fix pattern as PR #330 for runner / override.

2. New version-switcher card on admin detail page lists every
   submission linked to the entity with status + reviewed_by_model +
   click-to-jump. Solves the user's secondary ask ("should be a way
   to switch different versions on the submission detail").

3. Initial POST now backfills the v1 seed entry's submission_id
   right after creating the v1 submission. The helper
   `update_history_submission_id` existed but no production code
   path called it — so v1 always had submission_id=None and every
   "find v# for submission" lookup silently failed for v1.

171 tests green on touched surface.

* release: 0.54.24 — restore reuses prior approved verdict + admin detail content_quality + v# by submission_id (Codex/Live follow-up to #330/#331)

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-16 07:12:29 +02:00
Vojtech
9eaa1dc53c
fix(store): rescan promotes non-current submission when guardrails off (Codex follow-up to #330) (#331)
* fix(store): rescan promotes non-current submission when guardrails off

Codex adversarial-review follow-up on PR #330: admin rescan with
`guardrails.enabled: false` flipped submission status to `approved`
and entity visibility to `approved` but never called
`promote_to_version`. A rescan that re-approved a non-current v2+
left the entity stuck at the prior version even though the operator's
intent in clicking rescan was to publish the rescanned bytes.

Mirrors the inline-promote pattern in create / update / restore. The
guardrails-on path is unchanged — it schedules an LLM review and
promotion lands via `runner.run_llm_review` on approval.

Adds tests for the byte-identical edge cases Codex flagged as
under-covered by PR #330:
- TestPromoteLookupByByteIdenticalBundles::test_byte_identical_v3_after_different_v2
- TestOverrideForwardOnly::test_override_byte_identical_v2_blocked_promotes_correctly
- TestRescanPromotesNonCurrent::test_rescan_promotes_non_current_v2_when_guardrails_disabled

* release: 0.54.23 — rescan promotes non-current submission when guardrails off (Codex follow-up to #330)

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-16 07:04:28 +02:00
Vojtech
78cd243e65
fix(store): promote-on-approve looks up version_no by submission_id (live agnes-development bug) (#330)
* fix(store): promote-on-approve looks up version_no by submission_id

Live bug observed on agnes-development: an entity had 5+
version_history rows sharing the same `hash` (user re-uploaded
byte-identical bundles as v2/v4/v6 of the same skill — the LLM and
inline checks happily approved each one). The runner's
promote-on-approve path looked up the submission's version_no by
hash:

    for entry in entity.version_history:
        if entry["hash"] == sub_hash:
            target = int(entry["n"]); break

The loop matched the FIRST hash collision — always v1, n=1. With
current=1, the forward-only `target > current` guard then skipped
the promote, leaving the entity stuck at v1 even though the new
submission's status flipped to `approved`. UI kept showing v1 as
"current".

Fix: look up by submission_id via the existing
`_version_no_for_submission` helper (already used by retry / rescan
/ download paths). Same lookup applied in
`admin_override_store_submission` which had the identical hash-match
loop.

Test: TestPromoteLookupByByteIdenticalBundles uploads v1 + a
byte-identical v2, drives the LLM with mock-approve, asserts
entity.version_no advances to 2.

* fix: bundle #329 reviewer-Important follow-ups + post-merge polish

Bundled with Vojtech's commit ahead of this (the promote-on-approve
`version_no` lookup-by-submission_id fix) since #330 is the next
release-cut PR and the four #329 follow-ups would otherwise need a
standalone release-cut PR — prohibited by docs/RELEASING.md §
"Release-cut belongs to the PR".

Fixed:
- src/usage_ask.py — SCHEMA_DIGEST + SYSTEM_PROMPT referenced the
  dropped `usage_plugin_daily` table. The admin
  `POST /api/admin/telemetry/ask` endpoint ships SYSTEM_PROMPT to
  the LLM, so any model-emitted SQL against `usage_plugin_daily`
  would fail with a DuckDB binder error post-#329 merge. Updated to
  describe the new v48 rollups (`usage_marketplace_item_daily` /
  `_window`) and rule 5 of the prompt to point at them.

Internal:
- CHANGELOG.md [0.54.20] section restored to its canonical content
  from the v0.54.20 git tag. The #329 self-merge carried 226 lines
  of author's pre-rebase bullets that ended up mis-attributed; the
  published v0.54.20 GitHub Release (FTS BM25 + batch bar) now
  matches the CHANGELOG section verbatim. Also fills in [Unreleased]
  with this PR's bullets (Fixed + Internal).
- tests/conftest.py — dropped the unused
  `conn_with_usage_schema_and_attribution` fixture that INSERTed
  into the now-removed `usage_attribution_*` tables. Zero callers
  today, but a tripwire — the first future test to request it would
  have failed with a binder error.
- app/web/templates/marketplace.html — replaced a customer-specific
  token (`groupon-marketplace`) in the Most Popular sort-tiebreaker
  comment with a generic `<customer>-marketplace` placeholder per
  CLAUDE.md § Vendor-agnostic OSS. Also scrubbed an `agnes-development`
  reference in app/api/admin.py and src/store_guardrails/runner.py
  (cherry-picked from Vojtech's commit) on the same hygiene rule.

* release: 0.54.22 — flea-market promote-by-submission_id fix + #329 reviewer follow-ups

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-15 21:21:14 +02:00
minasarustamyan
302cf58ccd
feat(marketplace): telemetry v46 + flea inner parity + listing polish (#329)
* feat(telemetry): marketplace item rollup refactor (schema v46)

Replace the v42 attribution layer with prefix-split + live lookup against
marketplace_plugins / store_entities. The v42 design had a latent bug —
AttributionLookup keyed on bare skill names while Claude Code writes
`<plugin>:<local>` in JSONL, so lookups never matched and
usage_plugin_daily stayed empty in every deployment.

Schema (v46 migration):
- Drop usage_attribution_skills / _agents / _commands (mapping tables,
  derivable from marketplace_plugins + plugin tree).
- Drop usage_plugin_daily (always empty in production due to the bug above).
- Create usage_marketplace_item_daily — per-day fact (count, distinct_users,
  error_count), composite PK on (day, source, type, parent_plugin, name).
- Create usage_marketplace_item_window — sliding-window snapshot with
  true cross-window distinct user counts; period_label='last_7d' refreshes
  every tick, 'last_30d' refreshes hourly (tracked via session_processor_state).
- Mark usage_tool_daily as candidate for removal (no product-UI consumer).

Attribution flow:
- MarketplaceItemLookup replaces AttributionLookup. Preloads
  marketplace_plugins.name + store_entities.name into memory once per
  UsageProcessor tick, then per-event splits identifier on ':',
  matches prefix, writes resolved source / parent_plugin into
  usage_events. agnes-store-bundle prefix routes to flea entities.
  Slash commands with `plugin:` prefix count as type='skill' in rollup.

API:
- BREAKING: MarketplaceItem.unique_users_30d renamed to distinct_users_30d
  (now a true distinct count from the window snapshot, not sum-of-daily).
- InnerDetailResponse gains a telemetry field — invocations_30d +
  distinct_users_30d surfaced on curated inner skill / agent detail pages.
- Card chip hidden pending UX finalisation; data stays in the response.

Backfill: scripts/backfill_marketplace_rollup.py — one-shot rebuild over
historic usage_events after deploy, idempotent.

USAGE_PROCESSOR_VERSION bumped 4 → 5 so the reprocess loop re-attributes
existing events to the new source/ref_id semantics on the next tick.

Tests rewritten: test_session_processor_usage, test_usage_rollups,
test_marketplace_telemetry, test_api_admin_usage_reprocess,
test_db_schema_version, test_home_stats, test_schema_v42_migration.
New: test_backfill_marketplace_rollup.

* fix(marketplace): refresh Most Popular on search + category changes

`loadMostPopular()` early-exits when `state.q` or `state.category` is
set, but the search + category handlers only called `loadItems()` —
so once the section was visible, typing a query or filtering by
category didn't re-run the hide check and the cards stayed on screen
out of scope. Tab + sort handlers already chained the call.

Add the call to runSearch + category pill click handlers (All +
per-category) so the visibility contract holds for every state
mutation that can flip the early-exit condition.

* feat(marketplace): All-plugins section + 7-day Most Popular

Listing layout:
- Always-visible "All plugins" / "All items" / "Your stack" section
  header (label swaps per tab) wrapped in `#mp-all-section` so its
  margin-collapse mirrors the sibling `#mp-popular-section` and the
  spacing from the filter row stays consistent in both layouts.
- Sort dropdown moved from the filter row into the All-* header,
  pinned right via `margin-left: auto`. Anchored to its section so
  the relationship between sort + grid is obvious.
- `.mp-section-header` gets `min-height: 32px` + `align-items: center`
  so the bare-text Most Popular row matches the dropdown-bearing
  All-* row.
- `.mp-section-header` margin tightened 24px → 20px on top.

Most Popular:
- Capacity reduced 8 → 4 cards.
- Now reflects a 7-day window (was 30-day). Backend surfaces
  `invocations_7d` + `distinct_users_7d` on `MarketplaceItem`
  alongside the existing 30d fields; the loader pulls a wider page
  (server still sorts by 30d) and re-sorts + filters client-side
  on `invocations_7d > 0` so the strip stays "hot right now".
- Section label updated to "Last 7 days".
- Section now renders on both `curated` and `flea` tabs (was
  curated-only). Hidden on `my` and whenever search / category
  filter is active. Refresh hooks wired into search + category
  click handlers so visibility flips immediately on state change.

Backend (`_load_invocation_stats`):
- Single SELECT pulls both `last_30d` and `last_7d` rows from
  `usage_marketplace_item_window`; the result dict carries
  invocations + distinct_users for both windows.
- Trend (recent_7 vs prior_7) kept on the daily fact table so it
  stays independent of the window snapshot's freshness.

* feat(marketplace): Most adopted sort + hide Trending when no trend data

Add a fourth sort option to the All-items dropdown — "Most adopted
(30d)", keyed on `MarketplaceItem.distinct_users_30d` (true 30d
distinct user count from `usage_marketplace_item_window`). Protects
the listing from power-user skew that `most_used` is susceptible to:
one user × 100 invokes can't beat 10 different users × 1 invoke
under adoption sort.

Hide Trending option when the response has no trend data. User
reported `sort=trending` returning an empty grid because every
plugin's `trend_pct` was None (prior-week threshold of >= 3
invocations didn't clear anywhere). Empty grids on a user-selected
sort are worse UX than just not offering the sort — surface what
works, hide what doesn't.

Backend (`app/api/marketplace.py`):
- `_apply_sort` gains a `most_adopted` branch (DESC distinct_users_30d,
  ties by name ASC).
- `sort` Literal extended.
- `ItemListResponse.available_sorts` lists the sort keys the UI
  should expose for this response. recent/most_used/most_adopted
  always; trending only when at least one item in the tab's stats
  carries a non-null trend_pct.
- `_available_sorts(stats_dicts)` helper centralises the rule —
  curated and flea branches pass one stats dict, my-tab passes both
  (option is available when either source has trend data).

Frontend (`app/web/templates/marketplace.html`):
- New `<option value="most_adopted">Most adopted (30d)</option>`
  between Most used and Trending.
- URL state allowlist extended so `?sort=most_adopted` round-trips.
- `applyAvailableSorts(available)` runs after each list fetch:
  hides options not in the response's available_sorts; if the user
  is on a now-unavailable sort, resets to 'recent' and re-fetches.
  Search-mode fan-out unions availability across the curated + flea
  responses so a hit on either side keeps the option visible.

* feat(marketplace): funnel chip on cards + deterministic Most Popular sort

Card chip — funnel telemetry between description and footer:

  [stack-icon] N installed · [user-icon] N active · [bolt-icon] N calls · ↑/↓ N%

- stack_count (new MarketplaceItem field): for curated it's COUNT(*)
  on user_plugin_optouts (post-v28 row PRESENCE = subscribed; system
  plugins are fanned out to every user via fanout_system_for_user so
  the count includes them naturally). For flea it reuses the existing
  store_entities.install_count (bumped on install/uninstall).
- distinct_users_30d (existing) — active users in the 30d window.
- invocations_30d (existing) — call volume.
- trend_pct (existing) — week-over-week, both directions: green ↑ /
  red ↓, magnitude only (sign in the arrow). Hidden when null.

Backend additions in app/api/marketplace.py:
- MarketplaceItem.stack_count field.
- _load_curated_stack_counts() — one SELECT per render, GROUP BY
  (marketplace_id, plugin_name). Wired into the curated + my-tab
  branches; flea reads install_count off the entity row directly.

Frontend (app/web/templates/marketplace.html):
- Heroicons solid 24×24 inlined (one helper per icon, all
  fill="currentColor" so per-segment colour tokens apply): rectangle-
  stack (mirrors the My Stack tab icon), user, bolt, arrow-trending-
  up/down.
- Per-segment colour: installed=amber #F59F0A (My Stack accent),
  active=green #0e9b6a, calls=orange #f97316. Text stays neutral so
  the chip still reads as metadata, the leading glyph carries the
  visual cue. Trend pill keeps the full-segment green/red colour.
- Zero state: chip hidden when stack_count == 0 AND invocations_30d
  == 0 — brand-new cards aren't visually penalised by a "0·0·0" row.
- Tooltips on every segment via title="…" so hover explains the
  number's meaning to anyone uncertain about the icon.

Most Popular section — deterministic ordering:

Previously sorted by invocations_7d DESC with no tie-breakers, so
several cards with identical 7d call counts would swap places on
refresh (JS stable sort fell back on backend order, and the backend's
own tie-breaker for `most_used` was just name ASC — six `grpn`
plugins from six test marketplaces collapse to the same name and
became indeterminate via list_with_filters' created_at order).

New cascading hierarchy (chosen primary now matches what "most
popular" really means — wide adoption, not power-user volume):

  1. distinct_users_7d DESC  ← adoption / social proof
  2. invocations_7d   DESC  ← volume at equal adoption
  3. distinct_users_30d DESC ← broader adoption fallback
  4. invocations_30d  DESC  ← broader volume fallback
  5. name              ASC  ← deterministic textual order
  6. marketplace_slug  ASC  ← splits duplicate plugin names across
                              marketplaces

Six levels guarantee any two items end at a different sort key, so
the strip is stable across refreshes.

* fix(marketplace): unify Most Popular on 30d + right-align installed chip

Most Popular section was sorting on the 7d window while its cards
rendered 30d numbers — header label promised one thing, cards showed
another. Unified everything on 30d so a card means the same data
everywhere on the page.

- Dropped the "Last 7 days" meta from the Most Popular header.
- Sort cascade now starts on distinct_users_30d, then invocations_30d,
  with 7d adoption/volume as recency-aware fallbacks before the name +
  marketplace_slug deterministic tail. Six levels guarantee identical
  sort keys never produce indeterminate order across refreshes.
- Filter switched from invocations_7d > 0 to invocations_30d > 0 to
  match the new horizon.
- Most Popular now only renders on page 1 of the listing. Past initial
  discovery, a top-of-list popularity strip on page 2+ would shadow the
  results the user paged into. Pager click handler refreshes the
  section so navigating back to page 1 re-mounts it.

Chip layout — split engagement vs adoption visually:

  [user] N active · [bolt] N calls · [↑/↓] N%        [stack] N installed
  └────────── LEFT (time-bounded engagement) ────┘   └── RIGHT (all-time) ──┘

- Installed (stack_count) is all-time, decremented on uninstall. Alone
  it says little ("12 people installed it") without the engagement
  context next to it ("…but did anyone actually use it?"). Visually
  separating the two groups makes that distinction obvious — left
  group answers "is it used", right answers "does anyone have it".
- Implemented via flex with margin-left:auto on .seg-installed so
  installed drifts to the trailing edge.
- Installed tooltip now reads "Currently installed by N users" — the
  count is a real-time net (uninstall drops it), and saying "currently"
  makes that explicit. Helps when a card shows 0: signals "nobody has
  this in their stack right now", not "data missing".

* feat(plugin-detail): telemetry chip in hero, derived rows in sidebar

Surface the same telemetry funnel the listing card carries on the
curated plugin detail page, so clicking through from /marketplace
keeps a single mental model — figures match, semantics match. The
detail sidebar drops the two raw numbers that used to live there
(Invocations 30d / Users 30d — duplicated by the chip now) and
replaces them with two *derived* signals only the daily series can
provide: Active days + Last used.

Backend (app/api/marketplace.py):
- PluginDetailResponse.stack_count — curated reads via
  _load_curated_stack_counts(), flea reuses install_count. Frontend
  treats both sources uniformly.
- _build_telemetry() always returns a dict (never None). Frontend
  decides chip visibility from stack_count + invocations_30d the
  same way the listing card does. daily_series is always 30 entries
  (zero-padded) so "Active days" and "Last used" derivations on the
  sidebar are trivial array filters.

Frontend (app/web/templates/marketplace_plugin_detail.html):
- New .hero-telemetry slot at the bottom of the hero meta column,
  between the pills row and the action buttons. Renders the four
  funnel segments — active · calls · trend · installed — joined by
  ` · `. No left/right split: the hero has space, so a single
  coherent metadata strip reads cleaner than the card's split layout.
- Heroicons solid inlined (user / bolt / arrow-trending-up,-down /
  rectangle-stack) recoloured against the dark hero — icons in
  lighter tokens (mint #6ee7b7, peach #fdba74, cream #fde68a), trend
  pill keeps the saturated green/red because direction-coding earns
  its own colour.
- Tooltip on installed reads "Currently installed by N users" — the
  count is a real-time net (drops on uninstall), and "currently"
  makes that explicit when a card shows 0.
- fmtNum helper added so 1.2k / 14M renderings match the card's
  format exactly.
- Sidebar swap: Invocations + Users rows removed, replaced by
    Active days  →  "N of 30"
    Last used    →  fmtRelative of the latest non-zero day
  Both derived from telemetry.daily_series — engagement consistency
  + recency, neither of which the hero chip exposes on its own.

* feat(item-detail): telemetry chip in hero for curated skill/agent

Bring the funnel chip the plugin detail page got in 4cf38d40 to the
curated inner skill/agent detail page — clicking through from the
listing card now keeps the same metadata strip from grid to plugin
page to inner item page.

Backend (app/api/marketplace.py):
- _load_inner_item_stats() rewritten:
    * always returns a dict (never None) so the frontend can decide
      chip visibility client-side, same contract as _build_telemetry
    * adds trend_pct, computed the same way as plugin level
      (recent_7 vs prior_7 from usage_marketplace_item_daily, ≥3
      prior-week threshold)
    * adds daily_series (30 entries, zero-padded) so the sidebar can
      derive Active days + Last used
- InnerDetailResponse.parent_stack_count — new field. Skills/agents
  don't have a per-item subscription model, so the hero shows the
  *parent plugin's* stack count under a "Plugin:" prefix. The
  funnel: "12 installed plugin → 2 actually use this skill".
- curated_skill_detail + curated_agent_detail handlers load
  _load_curated_stack_counts() once and pass the parent's value.

Frontend (app/web/templates/marketplace_item_detail.html):
- New .item-detail .hero .hero-telemetry slot beneath the badges
  row. CSS mirrors plugin-detail's colour tokens (mint/peach/cream
  Heroicons solid + saturated trend pill) so the two surfaces read
  as one visual family.
- Installed segment uses a "Plugin:" label rendered with reduced
  opacity to signal the metric describes the parent, not the item
  itself. Tooltip: "Parent plugin (<plugin_name>) currently
  installed by N users".
- Sidebar Invocations + Users rows removed (chip carries them).
  Active days + Last used derived from telemetry.daily_series replace
  them; only rendered when activeDays > 0 so a brand-new skill
  doesn't show "0 of 30" / "Last used —".
- "Type" row dropped from the sidebar — duplicates the hero badge.
- fmtNum helper added (matches listing card + plugin detail).

Plugin detail (app/web/templates/marketplace_plugin_detail.html):
- Hero "Curator: …" line removed. The Details sidebar already
  carries that info; duplicating it under the h1 was visual noise.
- Sidebar "Owner" row renamed to "Curator" — for curated plugins
  it's a person who curates inclusion in this Agnes instance, not
  the upstream code owner. "Owner" was a hold-over label.

* feat(item-detail): unify hero with plugin detail — pills + breadcrumb + cleaner sidebar

- Inner skill/agent hero now uses the same `.pills` / `.pill.cat / .curated /
  .flea / .muted` class names + CSS as the plugin detail page; the only
  item-only addition is `.pill.type` (Skill / Agent uppercase, plugin detail
  has no kind axis).
- Hero `Updated` moved out of the meta-row into a muted pill (mirrors the
  plugin detail hero), removed from the Details sidebar to avoid duplication.
- Details sidebar slimmed: dropped Marketplace, Path, Updated rows; Parent
  plugin now shows the curator-friendly display name
  (`parent_display_name || manifest_name || slug`) instead of the slug.
- Breadcrumb extended to full path: Marketplace > <marketplace_name> >
  <plugin display name> > <self>, mirroring the plugin detail breadcrumb.
- Backend: new `InnerDetailResponse.parent_display_name` field, populated via
  `_curated_plugin_enrichment` from marketplace-metadata.json — same source
  plugin detail hero already uses.

* feat(marketplace): flea inner skill/agent detail + breadcrumb polish

- Flea inner skill/agent detail page parity with curated:
  * GET /api/marketplace/flea/{id}/skill/{name} + /agent/{name}
    returning InnerDetailResponse (mirror of curated_skill_detail).
  * /marketplace/flea/{id}/skill|agent/{name} web routes that render
    marketplace_item_detail.html with source='flea' + innerName context.
  * Frontend apiURL grows a third branch for flea-inner; breadcrumb
    grows to 4 segments (Marketplace > Flea Market > <plugin display
    name> > <self>) when innerName is set.
  * Telemetry attribution: MarketplaceItemLookup resolves
    <flea_plugin>:<inner> prefixes to (source='flea',
    parent_plugin=<plugin name>) so nested invocations land in the
    same rollups curated nested skills use. USAGE_PROCESSOR_VERSION
    bumped 5 -> 6 so the reprocess loop re-attributes historic events.
- Breadcrumb 2nd segment is now a generic clickable "Curated
  Marketplace" / "Flea Market" link to /marketplace?tab=... instead
  of the opaque per-instance marketplace_name. Applied on both plugin
  detail and inner item detail.
- Inner item hero telemetry chip works for both sources: installedCount
  branches on parent_stack_count (curated) vs install_count (flea),
  installed segment drops the "Plugin:" prefix for flea standalone /
  inner items.
- Updated row dropped from Details sidebar on item detail — the hero
  pill already carries the value, sidebar row was duplicate.

* feat(item-detail): block stack-install on flea inner items (mirror curated)

Inner skills/agents nested inside a flea plugin can no longer be added
to a user's stack on their own — adoption only happens at the plugin
level, same rule curated nested items have followed since launch.

- Hero action: when innerName is set (curated nested OR flea nested),
  render "Open parent plugin →" link + helper text instead of the
  install/remove buttons. Flea standalone entities (no innerName) keep
  the normal install UX.
- Meta-row: same branch now serves curated + flea inner — "part of
  <parent plugin display name> · by <author>" with the parent link
  pointing at the right detail page per source.

No API gate change needed: POST /api/store/entities/{id}/install only
accepts existing entity ids (plugin-level), inner items have no entity
id of their own so the endpoint cannot target them directly.

* feat(marketplace): telemetry chip on inner cards + fix flea hero chip visibility

Inner skill/agent cards on the plugin detail page now carry the same
four-segment funnel chip the marketplace listing cards show (N active
. N calls . trend . N installed), for both curated nested skills and
flea nested skills. Plus two fixes that were keeping the hero chip
hidden on flea plugin / flea inner detail pages.

- Backend `_load_inner_items_stats_by_parent(conn, source, parent_plugin)`
  bulk loader: one query per plugin against usage_marketplace_item_window
  + one against _daily, returning {(name, type): stats}. Avoids N+1
  per-card lookups.
- `InnerItemSummary` gains invocations_30d / distinct_users_30d /
  trend_pct / parent_stack_count fields. `curated_detail` and
  `flea_detail` (in the entity.type=='plugin' branch) enrich the
  skills / agents lists after the existing cover-photo enrichment loop.
- `marketplace_plugin_detail.html`: new `.plugin-detail .inner-card
  .inv-chip*` CSS lifted from marketplace.html with the listing-card
  rules, new buildInnerCardChip() helper, buildCardSection appends
  the chip to each card body. Same gate as the listing card (hidden
  on parent_stack==0 && calls==0).

- fix(flea): flea_detail forgot to populate PluginDetailResponse.stack_count
  from entity.install_count (listing card does this on line 851; detail
  endpoint didn't). Hero chip gate `stackCount===0 && calls===0` then
  always hid the chip even when the entity had installs. Now mirrors
  listing card semantics: stack_count == install_count for flea.
- fix(flea inner): renderInnerHeroTelemetry was reading `d.install_count`
  for any non-curated source. InnerDetailResponse has no install_count
  field — it has parent_stack_count (populated server-side from the
  parent flea plugin's install_count). Gate + label now read
  parent_stack_count for both curated nested AND flea nested scenarios;
  install_count remains the flea standalone path.

* fix(marketplace): Owner label on flea + parent-centric sidebar for flea inner

- Plugin detail Details sidebar — authorship row label now tracks the
  source: curated bundles get `Curator` (existing behaviour), flea
  bundles get `Owner`. The `owner_todo` reminder placeholder stays on
  the curated branch only; flea falls through silently.
- Inner item detail Details sidebar — flea-inner (skill/agent nested
  inside a flea plugin) now shares the curated nested layout: Parent
  plugin / Bundle size / Active days / Last used / Owner. Drops the
  flea-standalone shape's `Category`, `Version`, `Installs`, `Released`
  rows that didn't apply to a nested item. Active days + Last used were
  already wired (telemetryRows) — they just weren't on the flea-inner
  branch.

* fix(tests): bump SCHEMA_VERSION assertions 47 -> 48 post-rebase

The marketplace telemetry migration was renamed _v46_to_v47 -> _v47_to_v48
during the rebase onto main (collision with #326 FTS BM25 migration that
took the v47 slot). Two test files still asserted the pre-rebase value:

- tests/test_home_stats.py::test_schema_version_constant_is_46 (CI red)
- tests/test_schema_v46_migration.py::test_schema_version_is_46

Renames the helper fn name + bumps the assertion. The other two test
files (test_db_schema_version.py, test_schema_v42_migration.py) were
already updated in the rebase resolution.

* fix(telemetry): _build_telemetry returns None when invocations_30d == 0

The follow-up commit that introduced the always-return-dict shape broke
the test contract from the original v46 PR (commit b603e998):

  tests/test_marketplace_telemetry.py::TestDetailTelemetry::
    test_detail_endpoint_telemetry_absent_when_no_data
    AssertionError: assert {'daily_series': [...], ...} is None

Both `PluginDetailResponse.telemetry` and `InnerDetailResponse.telemetry`
are declared `Optional[Dict] = None`, the frontend renders are None-safe
(`d.telemetry || {}` guard + `if (!d.telemetry || ...)` on daily_series),
so dropping the dict on zero activity is the cleaner default.

* release: 0.54.21 — marketplace telemetry refactor (schema v48) + flea inner detail parity + listing UX polish

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-15 20:58:03 +02:00
ZdenekSrotyr
650ea3c804
feat: Agnes specialist agents and skills under .claude/ (#328) (#328)
Four knowledge skills auto-load into the main agent's context when
their description matches the work; invokable explicitly via
Skill(<name>):

- agnes-orchestrator — extract.duckdb ATTACH flow, query_mode
  semantics, _remote_attach, rebuild lock
- agnes-rbac — require_admin vs require_resource_access,
  ResourceType registration
- agnes-connectors — _meta contract, three connector shapes,
  new-connector checklist
- agnes-release-process — CHANGELOG discipline, release-cut,
  version bump, post-merge auto-rollback

Three reviewer subagents fire in parallel at end of PR work; one
releaser subagent handles pre-merge release-cut + post-merge tag /
GitHub Release:

- agnes-reviewer-rules — CHANGELOG bullet, vendor-agnostic scan,
  AI attribution, commit hygiene (always fires)
- agnes-reviewer-rbac — endpoint gates, ResourceType registration
  (fires on app/api/, app/auth/ diffs)
- agnes-reviewer-architecture — extract.duckdb invariants, schema
  migrations, rebuild lock (fires on src/, connectors/ diffs)
- agnes-releaser — Phase 1 pre-merge release-cut commit; Phase 2
  post-merge tag + GitHub Release

.gitignore un-ignores .claude/agents/ and .claude/skills/ while
keeping the rest of .claude/ local-only. CLAUDE.md gets a new
'Specialized agents and skills' section pointing at the two
directories.

Source of truth for the rules these encode remains CLAUDE.md +
docs/RELEASING.md — skills explicitly defer to the master docs on
conflict.

Design rationale: docs/superpowers/specs/2026-05-15-agnes-agents-design.md
Implementation plan: docs/superpowers/plans/2026-05-15-agnes-agents.md
2026-05-15 20:39:11 +02:00
ZdenekSrotyr
c5d67faad2
feat(memory): DuckDB FTS BM25 search for knowledge items (#121) (#326)
* feat(memory): DuckDB FTS BM25 search for knowledge items (#121)

Replaces `title ILIKE '%q%' OR content ILIKE '%q%'` ranked by
insertion order with BM25 relevance ranking via the DuckDB `fts`
extension. Czech queries like `cesky` match documents containing
`česky` (`strip_accents=1` + `lower=1`).

Architecture:
- src/fts.py — ensure_fts_loaded / ensure_knowledge_fts_index helpers.
  The extension is per-connection (INSTALL persisted at engine level,
  LOAD per-conn). Both helpers are idempotent and soft-fail on
  unavailability with a logged WARNING.
- Schema v47 (_v46_to_v47) — builds the initial BM25 index over
  knowledge_items(title, content) keyed by id. Migration is
  best-effort against ANY exception (not just duckdb.Error) so the
  schema bump cannot get stuck on v46 if a non-DuckDB error escapes
  the helper.
- KnowledgeRepository.search — FTS-or-ILIKE dichotomy with execute-
  time fallback. Same filter surface (statuses / category / domain /
  source_type / personal / audience / dismissed) either way.
  ensure_fts_loaded() returning True only guarantees the extension is
  loadable, NOT that the index exists — migration soft-fail or a
  concurrent overwrite=1 rebuild's drop-then-create window leaves the
  extension loaded but the index missing. The BM25 execute is wrapped
  in try/except duckdb.Error → ILIKE retry so transient failures
  cannot 500 the /api/memory?search= endpoint.
- KnowledgeRepository.count_items — mirrors the same FTS-or-ILIKE
  decision tree plus the execute-time fallback so the count always
  matches the paginated result set.
- Per-mutation rebuild — create and title-or-content update rebuild
  the index via overwrite=1 PRAGMA. Status flips skip (token stream
  unchanged).
- app/main.py lifespan rebuilds once at boot as a safety net for
  instances already on v47 across restarts.
- bm25_score column shape: ILIKE fallback now selects
  `NULL AS bm25_score` so the result column set matches the FTS
  path. Consumers can read the score uniformly; absence of relevance
  ranking is signalled by the column being None everywhere, not
  missing.

Tests in tests/test_knowledge_fts_search.py (9 tests):
- BM25 multi-term match set + adversarial-review fix asserting
  higher-density doc ranks first (skipped if extension unavailable).
- bm25_score column attached when extension available.
- ILIKE fallback path on search + count_items via patched
  ensure_fts_loaded → False; bm25_score is None on this path.
- Adversarial-review fix: search and count_items also fall back when
  the extension is loaded but the index is missing (simulated via
  drop_fts_index PRAGMA — the exact production failure mode the
  fallback guards against).
- Index rebuild on create (new item searchable immediately).
- Title update re-surfaces row under new term, drops old.
- Czech-diacritic round-trip (cesky query → česky doc).

Pinned schema-version asserts bumped 46 → 47 (test_db_schema_version,
test_home_stats, test_schema_v42_migration, test_schema_v46_migration).

Closes #121.

* release: 0.54.20 — Corporate Memory BM25 search + All-Items bulk-edit batch bar
2026-05-15 20:10:59 +02:00
ZdenekSrotyr
e77f6067fa
feat(memory): bulk-edit batch bar on All Items tab (#129) (#325)
Follow-up to #62 / PR #126 — that PR shipped the bulk-edit batch bar
in the Review tab only and deferred the symmetric bar on All Items.
This adds it.

Scope:
- New .batch-bar block in #tab-all with selectAllAll, selectedCountAll,
  and the five bulk-edit buttons (Move to category / Move to domain /
  Add tag / Remove tag / Set audience).
- renderItemCard signature: third param widened from boolean isReview
  to a 'review' | 'all' | 'browse' mode enum so the Browse tab's
  call site can explicitly suppress the row checkbox. The earlier
  iteration of this PR widened the checkbox condition without
  auditing other callers, which left the Browse tab with orphan
  checkboxes that fired updateSelectionCount('all') against an
  invisible tab. Adversarial-review fix.
- updateSelectionCount('all') toggles the *BtnAll set; renderAllItems
  resets the header checkbox + recomputes counts on every re-render so
  stale selection state can't survive a list refresh.

Approve / Reject stay scoped to Review per the issue's scope decision
— status-change actions belong with the per-row action buttons or the
keyboard workflow in Review.

Existing JS plumbing already assumed tab-aware selection
(getSelectedIds(tab), toggleSelectAll(tab), openBulkEditModal reads
currentTab); the All-items DOM and the *BtnAll ID set are the only
additions.

Tests in tests/test_admin_memory_page_all_items_batch_bar.py:
- test_admin_page_renders_all_items_batch_bar — all five button IDs
  + the select-all checkbox + the toggleSelectAll('all') callback
  are present on the rendered admin page.
- test_all_items_bar_omits_approve_reject — Review-only Approve /
  Reject IDs do not appear with the All suffix (scope guard).
- test_browse_tab_omits_row_checkbox — regression guard for the
  Browse-tab orphan checkbox: confirms the call site uses 'browse'
  mode and renderItemCard omits the checkbox markup on that branch.

Closes #129.
2026-05-15 20:05:21 +02:00
ZdenekSrotyr
ff8c0f9656
docs(claude.md): add auto-rollback rule to release-process non-negotiables (#323)
Follow-up to #314 (v0.54.17). #314 grew an auto-rollback chain
(release.yml smoke-test fail → rollback-on-smoke-fail job → reusable
rollback.yml re-tags :stable to the previous known-good build + opens
a tracking issue). That's rule-relevant for agents merging to main —
the post-merge success signal is now 'smoke-test green +
rollback-on-smoke-fail skipped', and a fired rollback means the merge
shipped a broken image and needs investigation before any further push.

Adds a 4th non-negotiable rule covering exactly that, and broadens
the link description on docs/RELEASING.md to flag the manual-rollback
runbook + weekly tag-housekeeping that #314 added there.

Doc-only, no CHANGELOG bullet (non-user-visible per docs/RELEASING.md §1).
2026-05-15 19:16:04 +02:00
ZdenekSrotyr
ed3e8337ab
fix(jira): harden _remote_links fetch — prevent transient outage from wiping parquet rows (#319)
* fix(jira): harden _remote_links fetch — transient API failure no longer wipes parquet rows

Pre-fix, all three fetch_remote_links call sites (service.py,
scripts/backfill.py, scripts/backfill_remote_links.py) silently
returned [] on 401/403/429/5xx or httpx.RequestError. Callers overlaid
that [] onto cached issue JSON, and transform_remote_links interpreted
the empty list as 'issue legitimately has no remote links — delete
existing rows', so a transient Jira auth blip permanently wiped
remote-link history.

Now:
- Every fetch site raises JiraFetchError on non-200/non-404 status,
  on httpx.RequestError, and on the 'service not configured' path.
- Overlay sites skip the _remote_links key when fetch raises, leaving
  it ABSENT (not present-but-empty).
- transform_remote_links returns None for absent/null keys (preserve
  existing rows) vs [] (legitimate empty — wipe).
- Both consumers (batch transform_all, incremental
  transform_single_issue) honor the new contract.
- End-to-end tests
  test_incremental_preserves_remote_links_when_overlay_absent and
  test_incremental_wipes_remote_links_when_overlay_present_but_empty
  lock both halves.

Adversarial-review fixes bundled:
- service.py: unconfigured-service path now raises JiraFetchError
  instead of returning [] (a webhook can arrive while API creds are
  missing — HMAC verification uses a separate JIRA_WEBHOOK_SECRET).
  Regression guard test_raises_when_unconfigured added.
- consistency_check.py: AUTO_FIX_THRESHOLD bumped 10 -> 20 to cover
  typical SLA-poller hiccups before escalating to ERROR.
- CLAUDE.md: connectors/jira/transform.py removed from 'Files NOT to
  modify' (overlay-contract change required touching it; module
  remains sensitive but is no longer off-limits).

* release: 0.54.19 — jira remote_links hardening (transient API failure no longer wipes parquet rows)
2026-05-15 19:09:46 +02:00
ZdenekSrotyr
9e948abc9c
release(0.54.18): Curated Memory restructure + per-user Dismiss + bundled adversarial-review fixes (#316/#320/#322) (#324)
* feat(web): Curated Memory restructure + per-user Dismiss + filter-state utility

Squashed from cvrysanek/zsrotyr's 4-commit PR branch + rebased onto
current main + CHANGELOG bullets spliced into [Unreleased] (preserves
existing #316/#320/#322 entries that landed on main since the branch
was authored).

Routes + access:
- /corporate-memory now user-facing (get_current_user), in primary
  nav next to "Data Packages" — same gate as /api/memory/*.
- /admin/corporate-memory is the new admin review queue location
  (was /corporate-memory/admin); reached via Admin dropdown. Template
  renamed: corporate_memory_admin.html → admin_corporate_memory.html.

Visual chrome:
- Both pages migrate to shared _page_hero.html blue hero band.

Per-user Dismiss (new feature, schema v46):
- knowledge_item_user_dismissed(user_id, item_id, dismissed_at) + index.
- POST /api/memory/{id}/dismiss + DELETE (idempotent).
- Mandatory items can never be dismissed — enforced at 2 layers.
- GET /api/memory: hide_dismissed=false default + dismissed_by_me flag.
- GET /api/memory/bundle: always excludes dismissed for the caller.
- UI: Dismiss/Undismiss button per item (hidden for mandatory),
  gray-out + line-through for dismissed rows, Hide-dismissed toggle.

Admin edit modal:
- Category as <select> + "Add new category…" reveal.
- Audience as <select> with (unset)/all/group:<name> from RBAC.
- Tags: full tag-input widget (pills, ×-remove, Backspace pop,
  Enter/comma to add, ↑/↓ typeahead from EXISTING_TAGS).

Bulk-edit modal pickers (closes #128):
- Move-to-category / Add-tag: <select> + add-new.
- Set-audience: <select> (no more typo-able 'gourp:eng').
- Remove-tag: closed-set picker.

FilterState utility:
- app/web/static/js/filter-state.js — save/load/clear/bindInputs
  for per-page localStorage filter state. Adopted on /corporate-memory.

E2E verified live on a real VM through the API + browser flow.

* release: 0.54.18 — Curated Memory restructure + 4 adversarial-review fixes

Bundles together:
- #316  fix(store): surface review failures + harden publish gate
        (BREAKING fail-CLOSED guardrail, override v2+ promote, restore guard,
        retry/rescan staged-bundle, banner widening, LLM truncation retry)
- #320  fix(store): C2 bundle export RBAC + H2 per-entity write lock +
        H3 update_status compare-and-swap with bg_verdict_skipped audit
- #322  fix(store): M1 prompt sentinel filename escape + M2 atomic
        promote_to_version helper + L1 admin forensic download per-version
- #324  Curated Memory restructure + per-user Dismiss + FilterState utility

Bump from 0.54.17 → 0.54.18 (patch — pre-1.0 policy: every cycle is patch).
2026-05-15 18:51:05 +02:00
Vojtech
bb703517c9
fix(store): close 2 medium + 1 low adversarial-review findings (#322)
Three remaining findings from Codex's adversarial review of PR #316
(issue #318), plus a pre-existing version-numbering bug surfaced while
fixing the atomic-promote ordering.

M1 — Prompt sentinel escape now covers file PATHS, not just file
BODIES. Pre-fix the per-file `--- FILE: {rel} ---` header inlined the
untrusted relative path unescaped. A ZIP whose relative path
concatenated to `</bundle>` (a `<` directory plus a `bundle>` child)
could forge the trust-boundary close tag from inside the path slot
and inject apparent system instructions after the boundary. Same
`_escape_sentinels` helper now runs on both rel and body.

M2 — Live-bundle swap + DB promote is now atomic-ish. The runner /
override / inline-promote paths previously called
`repo.promote_version(...)` then `_swap_live_to_version(...)`. A
missing `versions/v<N>/plugin/` made the swap silently return False
— leaving the DB ahead of live. New `promote_to_version` helper in
`app/api/store.py` swaps FIRST (with the existing
staging → backup → live rename chain) and only advances the DB row
after the on-disk swap succeeds; rolls live back to prior on DB
write failure.

While wiring up M2, the strict source check exposed a pre-existing
bug: `update_entity` and `restore_version` derived
`new_version_no = entity.version_no + 1`. Under deferred promotion
that's wrong — entity.version_no stays at the last approved version
while version_history grows with blocked / pending entries.
Subsequent PUTs would overwrite an in-flight blocked v2 dir's bytes,
then the runner's hash-match promotion in `runner.run_llm_review`
would load bytes that didn't match the recorded submission hash.
Fixed by deriving from `max(version_history.n) + 1`.

L1 — Admin forensic download now serves STAGED bundle bytes per
submission, not live. Pre-fix downloading a blocked v2 streamed
live's prior approved v1 bytes — admins reviewing whether to
override saw the wrong bytes. Resolves staged `versions/v<N>/plugin/`
via `_version_no_for_submission`; falls back to live for legacy rows
without history linkage.

Tests:
- test_filename_with_bundle_sentinel_is_escaped
- TestAtomicPromote::test_missing_source_dir_does_not_advance_db
- TestAdminBundleDownload::test_download_v2_blocked_returns_staged_bundle_not_live
2026-05-15 17:56:09 +02:00
Vojtech
6fb11a137b
fix(store): close 1 critical + 2 high adversarial-review findings (C2/H2/H3 from #318) (#320)
* fix(store): close 1 critical + 2 high adversarial-review findings

Three findings from Codex's adversarial review of PR #316 (issue #318).

C2 — `/api/store/bundle.zip` leaked quarantined entities. The export
endpoint called `repo.list(...)` with no `visibility_status` filter,
so any authenticated non-admin could download pending / blocked v1
bytes — bypassing the publish gate. Mirrored the browse-listing gate:
non-admin sees only `approved` (plus their own non-approved entries
via `include_owner_id`); admins skip the filter.

H2 — concurrent PUTs on the same entity could both pass the
`latest_for_entity` pending gate. The `update_entity` and
`restore_version` handlers now wrap their critical section in a
per-entity asyncio.Lock (`_hold_entity_write_lock`). Single-process
deployments are now serialized; multi-worker deployments still have
a residual window (tracked in issue #318).

H3 — `StoreSubmissionsRepository.update_status` blindly overwrote any
current status. A late BG-task LLM verdict could clobber an
`overridden` row back to `approved` / `blocked_llm` after the admin
had already force-published. Added compare-and-swap on terminal
statuses (`approved`, `overridden`, `blocked_inline`); callers that
legitimately need to overwrite (admin rescan etc.) pass
`allow_terminal_overwrite=True`. Returns bool indicating whether the
write landed; BG callers no-op on terminal rows.

Tests:
- TestStoreBundle::test_bundle_zip_filters_quarantined_for_non_owner
- TestStoreBundle::test_bundle_zip_owner_sees_own_pending
- TestStoreBundle::test_bundle_zip_admin_sees_all
- TestConcurrentPutSerialization::test_per_entity_lock_serializes
- TestConcurrentPutSerialization::test_per_entity_lock_does_not_serialize_across_entities
- TestBgTaskIdempotency::test_late_verdict_does_not_clobber_overridden
- TestBgTaskIdempotency::test_explicit_allow_terminal_overwrite_works

* review fix: runner.run_llm_review honors update_status CAS bool

Codex's CAS in update_status closes the DB-level race correctly, but
runner.run_llm_review was still discarding the new bool return on both
its `approved` and `blocked_llm` branches. When the CAS no-op'd
(submission already at terminal status — most commonly an admin
override fired mid-review), the runner kept running the downstream
cascade:
  - set_visibility_if_pending (no-op on approved, but still ran)
  - promote_version + _swap_live_to_version (forward-only check
    mitigated worst case)
  - update_flea_attribution
  - audit.log(action="store.submission.approved" / "blocked_llm")
    — this is the operator-visible damage: the audit trail would
    show a verdict that contradicts the row's actual `overridden`
    status.

Fix: capture the bool, skip the cascade on no-op, log a single
`store.submission.bg_verdict_skipped` audit row instead. Mirrors the
existing `superseded_reason` path the runner already has for the
archive-during-review case (TestPRReviewFixes::
test_bg_verdict_skipped_when_admin_archives_during_review).

Test: TestBgTaskIdempotency::test_runner_late_verdict_logs_skipped_not_approved
sets up the v1-approved + v2-pending + admin-override sequence, fires
run_llm_review directly with a mocked "approved" verdict, asserts row
stays overridden AND audit has bg_verdict_skipped AND audit does NOT
have a contradictory approved entry.

CHANGELOG H3 bullet expanded to acknowledge the bg_verdict_skipped
audit-row behavior — operator reviewing the queue now sees dropped
verdicts explicitly rather than via row-vs-audit contradiction.

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-15 17:45:43 +02:00
minasarustamyan
75210897d2
fix(dev): move system.duckdb seed blocks from create_app() into lifespan (#321)
DuckDB 1.5 enforces a strict per-process exclusive file lock on
system.duckdb. The uvicorn --reload master process was importing
app.main, which called create_app() at module load — and create_app()
was opening system.duckdb via three seed/warning blocks (seed_admin,
scheduler_user, no-password warning). The forked worker then could
not acquire the lock and every request 500'd with "Could not set lock
on file system.duckdb".

Move the three blocks into the existing lifespan (worker-only). The
master/reloader no longer touches system.duckdb. _resolve_error_user
stays in create_app() — it only runs at request time inside a closure,
which is worker-context already.

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
2026-05-15 17:08:05 +02:00
Vojtech
a694a30a5e
fix(store): surface review failures + harden publish gate (#316)
* fix(store): surface review failures + harden publish gate

Four independent fixes to the flea-market submission pipeline, all surfaced
by an admin upload that landed at status='approved' without an LLM review.

1. LLM truncation no longer pins submissions in review_error.
   - Raised MAX_RESPONSE_TOKENS 2500 → 6000 in llm_review.py
   - Added one-shot retry-with-doubled-budget in anthropic_provider.py
     (capped at 4× initial)

2. Flea detail page surfaces the latest submission's failure verdict even
   when a previously-approved version is still serving (deferred-promotion
   path). The _quarantine_banner gate widened from `visibility != approved`
   to also fire on `blocked_inline / blocked_llm / review_error`, with copy
   that distinguishes the v2+ edit case ("Latest edit failed review —
   previously approved version (vN) keeps serving") from the initial-upload
   quarantine wording.

3. Restore button + endpoint no longer allow restoring a version that was
   never approved. Added StoreEntitiesRepository.get_with_version_approvals
   joining store_submissions, gated the UI button on submission_status in
   ('approved', None), rendered status pills for non-restorable rows, and
   added a 400 version_not_approved guard in POST /restore.

4. **BREAKING (operator-facing)**: publish gate is now fail-CLOSED on
   misconfig. The previous get_guardrails_enabled() silently fell back to
   "disabled, auto-approve everything" when guardrails.enabled=true in YAML
   but no ANTHROPIC_API_KEY was in env. Split into:
     - get_guardrails_enabled()              (intent — YAML)
     - get_guardrails_llm_provider_ready()   (readiness — env)
   Three-state matrix:
     enabled=false                       → auto-approve (unchanged)
     enabled=true + ready=true           → normal pipeline (unchanged)
     enabled=true + ready=false (NEW)    → submissions hold at pending_llm
                                           awaiting admin retry or override
                                           (was: silent auto-approve)
   Admin "Retry review" eligibility broadened to include pending_llm.
   Boot-time WARNING banner surfaces the misconfig in app/main.py.
   docs/STORE_GUARDRAILS.md updated with the three-state matrix.
   Operators relying on the auto-fallback for local-dev no-LLM setups must
   now explicitly set `guardrails.enabled: false` in instance.yaml.

Tests: 4623 passed. Added TestPublishGateFailClosed (4 tests) and
TestRestoreVersion::test_restore_rejects_* (3 tests). conftest.py adds an
autouse fixture defaulting guardrails OFF so legacy tests don't need to
know about the new toggle.

* fix(store): admin override promotes v2+ edits to current

The override handler at app/api/admin.py:3708 only flipped submission
status → 'overridden' and entity visibility → 'approved'. Under the v37+
deferred-promotion model that's insufficient for v2+ edits / restores:
the new bundle sits in versions/v<N>/plugin/ and the entity row stays at
the prior approved version_no + hash + on-disk live bundle. Installers
kept getting the OLD bytes the admin had just intended to replace.

Mirror the runner.run_llm_review auto-approval branch: look up the
submission's version_hash in entity.version_history, and if its `n`
differs from entity.version_no, promote_version + _swap_live_to_version.
Initial v1 overrides are unaffected — the loop finds n=1 == version_no
and skips promotion.

Tests:
- test_override_v2_edit_promotes_to_current: stage v1 approved + v2
  blocked_llm; override the v2 sub; assert entity.version_no=2,
  entity.version flips off the v1 hash, and the live plugin/ dir
  mirrors versions/v2/plugin/.
- test_override_v1_initial_upload_no_promote: regression guard so the
  promote loop doesn't accidentally bump a v1 override.

Audit log gains a promoted_to_version_no field on the override action.

* fix(store): retry/rescan review staged bundle; override forward-only

Two adversarial-review findings from a Codex pass on the publish-gate
work.

C1. Admin retry + rescan were passing live `plugin/` to the LLM. For a
v2+ submission held at `pending_llm` / `blocked_llm` / `review_error`,
live still holds the prior approved version's bytes — so the LLM
reviewed the WRONG bytes, and the runner's hash-match promotion in
`run_llm_review` would then advance the entity to staged bytes that
were never actually reviewed. Resolve the staged
`<entity>/versions/v<N>/plugin/` from the submission's
`version_history` entry, with a fall-back to live for legacy pre-v37
rows that never seeded a versions/ dir. Helpers
`_submission_plugin_dir` and `_version_no_for_submission` added to
`app/api/store.py` so override / retry / rescan share one path.

H1. Override's promote loop used `target != current`, which would
silently demote the live bundle when admin overrode a stale v2
submission while v3 was already approved + live. Changed to
`target > current` so override flips status + visibility on the row
regardless, but on-disk promotion only fires forward. Same `>`
defensive guard applied in `runner.run_llm_review` so a late LLM
verdict racing with a newer approval can't demote either.

Tests:
- TestAdminRetryReviewsStagedBundle::test_retry_v2_blocked_passes_staged_dir_not_live
- TestAdminRetryReviewsStagedBundle::test_rescan_v2_blocked_passes_staged_dir_not_live
- TestOverrideForwardOnly::test_override_stale_v2_does_not_demote_when_v3_current

* review polish: CHANGELOG drift, override eligibility, defensive copy

Three small additions on top of the retry/rescan staged-bundle fix:

1. CHANGELOG: the PR's bullets had drifted into the released
   [0.54.17] section during rebase (context-match landed them next
   to already-released content). Moved them up to [Unreleased] where
   they belong; [0.54.17] now holds only what was actually released
   (refresh-marketplace ls-remote, /me/activity hero, CI sharding +
   workflow polish).

2. app/api/admin.py: admin override eligibility now accepts
   pending_llm alongside blocked_inline + blocked_llm + review_error.
   Closes a UX gap from the new fail-CLOSED behavior: under
   enabled-but-not-ready, a known-good submission would otherwise
   sit indefinitely until the admin set credentials AND clicked
   Retry. Override already routes through version_history (and is
   now forward-only on promote), so it stays safe for v2+ deferred-
   promotion submissions.

3. src/repositories/store_entities.py: get_with_version_approvals
   defensively copies each version_history entry before annotating
   with submission_status. self.get() re-parses JSON each call today
   so this is belt-and-suspenders against any future caching layer
   leaking the annotated key into a subsequent plain get() call.

Tests: 112 passed (focused on test_store_entity_versions +
test_admin_store_submissions, covering the retry/rescan staged-
bundle fix the author shipped + this polish).

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-15 15:52:07 +02:00
Vojtech
fbe756685b
fix(api): redirect unauthorized browser requests to login for initial workspace zip (#315)
* fix(api): redirect unauthorized browser requests to login for initial workspace zip

* fix(api): import Request and RedirectResponse in initial_workspace router

FastAPI was treating `request` as a required query parameter because
`Request` was missing from the fastapi import, causing 422 on
GET /api/initial-workspace.zip. `RedirectResponse` was also missing
(used for browser redirect to /login).

* review fixes: CHANGELOG + comment + 2 edge tests

- CHANGELOG.md: add [Unreleased] ### Fixed bullet per project rule.
- app/api/initial_workspace.py: comment explaining why this /api/*
  endpoint intentionally opts out of the _API_PATH_PREFIXES
  "never redirect /api/*" contract in app/main.py, and why matching
  only `text/html` (not `*/*`) mirrors _wants_html()'s rationale.
- tests: add Accept: */* (curl default) and empty-Accept cases —
  both lock in 401, defending the curl-tooling-must-keep-getting-401
  contract the comment now documents.

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-15 15:18:39 +02:00
ZdenekSrotyr
9f5adbce37
ci: consolidate release pipeline (salvageable subset of #139) (#314)
* ci: add actionlint workflow lint, drop superseded deploy.yml stub

* ci: extract rollback into reusable rollback.yml, wire into release smoke-test

* ci: add weekly prune-dev-tags workflow for legacy CalVer tag/image cleanup

* release: 0.54.17 — CI/release workflow consolidation

* fix(ci): warn when rollback.yml receives a non-stable failed_image_tag

* fix(ci): rollback.yml + prune-dev-tags.sh review findings

rollback.yml:
- Pass workflow_dispatch inputs (failed_image_tag, target_image_tag)
  through env: instead of textual ${{ }} splicing into bash run blocks
  — prevents an actor with workflow_dispatch privilege from injecting
  shell via quote/backtick payloads.
- Guard against TARGET == FAILED when only one stable-* tag exists
  (fresh repo, or aggressive pruning at month boundary). Fail loudly
  rather than re-push the broken image as :stable.
- Add commit SHA to the rollback tracking-issue body — github.sha is
  inherited across workflow_call, so on-call no longer has to navigate
  rollback run → caller-workflow breadcrumb → failing commit.

prune-dev-tags.sh:
- Replace 'printf … | head -20' preview pipeline with array slice
  ('"${TO_PRUNE[@]:0:20}"'). Under set -o pipefail, head closing
  the pipe early SIGPIPEs printf (exit 141) and aborts the script
  before any deletion runs — exactly the multi-month-backlog scenario
  the script targets.
- Refactor GHCR-pass: fetch versions JSON once before the loop, then
  build a tag→version-id map up-front. Closes two problems:
    1. O(N × pages) GHCR API calls collapse to one paginated listing
       — months of accumulated CalVer tags no longer risk tripping
       abuse detection.
    2. The new jq filter excludes any version that ALSO carries a
       floating alias (:stable, :dev, *-latest). GHCR DELETE-version
       drops the entire manifest, so pruning a CalVer tag that shares
       a manifest with :stable (e.g. after a rollback re-tag) would
       have vaporized :stable. Now it's skipped with a log line.

lint-workflows.yml:
- Add an explicit shellcheck step. actionlint only walks
  .github/workflows/ and the shell embedded in their run: blocks, so
  freestanding scripts/ops/*.sh (which are in the workflow's path
  filter) were never actually validated despite triggering CI.

* fix(ci): shellcheck --severity=warning to skip pre-existing info findings

The new shellcheck step caught info-level findings (SC1091, SC2015) in
agnes-auto-upgrade.sh / agnes-tls-rotate.sh — pre-existing, not regressed
by this PR. Constrain shellcheck to warning+ severity (real bugs) so info
and style findings don't block CI; mirrors the actionlint step's
continue-on-error initial-rollout posture.

* fix(ci): second-pass review findings — concurrency, walk-back, failure propagation

rollback.yml:
- Add own concurrency block (group: rollback-<repo>-<failed_tag>,
  cancel-in-progress: false). The caller release.yml uses
  cancel-in-progress: true to avoid duplicate CalVer claims, but a
  second push to main mid-rollback would otherwise kill the workflow
  between the :stable recovery push and the :deprecated-* audit push,
  leaving :stable stuck on the broken image. A reusable workflow's own
  concurrency overrides the inherited one.
- Walk back through stable-* tags newest-first, skipping any whose
  :deprecated-<stripped> GHCR alias already exists (carries the mark of
  a prior failed rollback). The previous 'second-most-recent' heuristic
  could re-point :stable at a known-broken image on cascading failures.
- Reorder re-tag step: push :stable recovery FIRST, then the
  :deprecated-* audit tag. Defense in depth — even if the concurrency
  block somehow misfires, the worst case is missing audit metadata
  rather than production stuck on the broken image.
- Move GHCR login before resolve step so 'docker manifest inspect' can
  probe for :deprecated-* aliases during walk-back.
- Document the top-level permissions block's dual semantics
  (workflow_dispatch grants directly; workflow_call acts as a cap
  intersected with the caller's job-level permissions).

release.yml:
- Rewrite the 'issues: write' comment. Old wording ('default for jobs')
  was factually wrong — GITHUB_TOKEN's default for issues is never write
  — and read as 'this line just documents a default', so a future
  cleanup PR could delete it. The line is load-bearing: workflow_call
  permissions are bounded by the caller's GITHUB_TOKEN scope, and
  removing it would silently 403 rollback.yml's gh issue create step.

prune-dev-tags.sh:
- Drop the '|| echo "[]"' fallback on the GHCR versions fetch. The
  fallback turned every API failure (403 missing scope, 429 rate limit,
  transient 5xx) into a silent no-op with exit 0 — operators saw a
  green run while every TAG fell through to the same 'no eligible
  version' skip message used for legitimate manifest-collision skips.
- Reorder: fetch GHCR versions BEFORE any git-tag deletion. Git-tag
  delete is irrecoverable (next run rebuilds TO_PRUNE from 'git tag
  -l', so an orphan GHCR image is never enumerated again). Fetching
  first means an API failure aborts cleanly with no state change.
- Track PRUNE_FAILED flag. 'git push --delete' fallback is no longer
  unconditional — local 'git tag -d' is gated on successful remote
  push, so a refused remote delete (tag-protection rule, missing
  contents:write) leaves the local tag in place for retry. The flag
  propagates to a final 'exit 1' so the cron run turns red on any
  push or DELETE failure.

lint-workflows.yml:
- shellcheck step now uses 'find scripts/ops -type f -name *.sh' to
  match the workflow's recursive 'scripts/ops/**.sh' path filter. The
  previous bare 'scripts/ops/*.sh' glob only matched top-level files;
  a future script under a subdirectory would have triggered the
  workflow but never been linted.

* docs(releasing): document rollback.yml, prune-dev-tags.yml, lint-workflows.yml

Reflects the new operational workflows landing in this release:
- Auto-rollback paragraph in release.yml description (smoke-test job +
  rollback-on-smoke-fail → rollback.yml)
- rollback.yml subsection — workflow_call + workflow_dispatch entry
  points, walk-back target resolution, immutability + concurrency
  guarantees, manual operator gh workflow run examples
- prune-dev-tags.yml subsection — weekly cron, KEEP_MONTHS retention
  semantics, floating-alias safety, dry_run preview, failure-propagation
  exit-non-zero behavior
- lint-workflows.yml CI quirk — actionlint (continue-on-error) +
  shellcheck (--severity=warning blocking) advisory checks

CLAUDE.md non-negotiable rules unchanged — still high-level and
correct (changelog discipline + release-cut belongs to the PR + run the
full test suite).
2026-05-15 14:06:59 +02:00
minasarustamyan
7907b8082e
perf(cli): use git ls-remote in refresh-marketplace --check (~8s -> ~1s) (#313)
* perf(cli): use git ls-remote in refresh-marketplace --check (~8s -> ~1s)

The SessionStart hook fires agnes refresh-marketplace --check on every
Claude Code session in every workspace. The detector used to run git
fetch origin against the per-user marketplace bare repo just to update
FETCH_HEAD for a HEAD-vs-FETCH_HEAD comparison — paying the full git
object/metadata download (~8s) even on the (overwhelming) no-change
path.

Replace with git ls-remote origin HEAD: one HTTPS round-trip, one line
of text, no objects transferred. Compare the returned SHA against local
HEAD via the new _remote_head_sha and _local_head_sha helpers; emit the
/update-agnes-plugins hook JSON on mismatch, silent on match. Same PAT
wiring (AGNES_TOKEN in env, never on argv), same exit-code contract
(remote-read failure -> exit 1 so the hook's || true swallows it).

The default and --bootstrap paths still do real git fetch + reset --hard
— they need the objects.

* docs(cli): update --check help text to mention git ls-remote, not git fetch

Followup to the previous commit — the --check flag's user-facing help
string still described the old  + FETCH_HEAD comparison.
Updated to match the new ls-remote-based implementation.

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
2026-05-15 06:24:27 +02:00
ZdenekSrotyr
8b5b0f8ef5
fix(web): render <strong> in /me/activity hero subtitle instead of escaping it (#312)
The subtitle was built by ~-concatenating a Markup operand
(user.email | e) with HTML string literals. Under autoescaping,
Jinja2's markup_join escapes every non-Markup part once it hits a
Markup operand — so the literal <strong> tags became &lt;strong&gt;
and the page showed literal "<strong>...</strong>" text around the
email. The | safe in _page_hero.html was too late to undo it.

Switch to {% set %}...{% endset %} block capture: the literal
<strong> stays HTML while {{ user.email }} is still autoescaped.
Regression test asserts the tags render and a hostile email stays
escaped.
2026-05-14 22:27:34 +02:00
ZdenekSrotyr
a1c7849b3e
ci: shard test suite + drop duplicate test run (#311)
The `test` job in ci.yml becomes a 4-way `test-shard` matrix (pytest-split,
balanced by a committed .test_durations), aggregated into a single `test`
status check so branch protection is unchanged.

release.yml's duplicate full-suite `test` job is removed — it re-ran the
same ~10 min suite a second time on every push to main/feature branches.
release.yml is now image-build only; the advisory ruff/mypy steps move to
a lean `lint` job in ci.yml.

Net: ~10 min -> ~3 min wall-clock per push, and the suite runs once
instead of twice.
2026-05-14 20:18:21 +00:00
Vojtech
6a4b3ba461
fix(store-upload): Next/Back/Finish buttons missing .btn base class (#310)
The wizard nav buttons used class="btn-primary" / "btn-secondary"
without the .btn base class, so the padding (10px 20px),
border-radius (8px), font-size, and inline-flex centering rules from
.btn never applied. Buttons rendered as ~18px-tall colored boxes with
no padding (visible mismatch against the sibling Cancel <a> which
correctly used class="btn btn-secondary").

Added .btn to all three buttons (#next-btn, #back-btn, #finish-btn).
No CSS change — purely a markup fix.

Playwright before: next.padding="0px" borderRadius="0px" height=18
Playwright after:  next.padding="10px 20px" borderRadius="8px" height=38
2026-05-14 19:49:13 +00:00
ZdenekSrotyr
d55c8a3c33
feat(web): consolidate the personal /me/* surface — /me/activity + /me/profile (#304)
Consolidates the scattered per-analyst pages into /me/activity (usage
analytics) and /me/profile (account hub). /me/stats and /profile/sessions
301-redirect; /profile, /me/debug, /tokens are removed with every internal
link repointed. Includes an XSS fix in the /me/activity page hero, the
user_id-keyed session-lookup alignment, and the v0.54.15 release cut.

Co-developed by @ZdenekSrotyr and @cvrysanek.
2026-05-14 21:29:51 +02:00
ZdenekSrotyr
a48524509a
docs: consolidate and de-clutter the documentation tree (#306)
CLAUDE.md rewritten (708 -> ~320 lines): four overlapping release
sections collapsed to one, stale v1->v35 schema history dropped (it
lives in CHANGELOG), marketplace endpoint internals and verbose
process sections moved out or tightened.

New focused docs:
- docs/RELEASING.md - release process, deploy workflows, CI quirks
  (RELEASE_TEMPLATE.md folded in as an appendix)
- docs/marketplace.md - marketplace ingestion + re-serving internals
- docs/README.md - documentation index by audience, linked from
  README.md and CLAUDE.md

Archived under docs/archive/: docs/superpowers/ (52 historical
planning artifacts), HACKATHON.md, pd-ps-comments.md,
security-audit-2026-04.md, future/NOTIFICATIONS.md.

Removed the docs/auto-install.md stub. Fixed dangling links in
connectors/jira/README.md and dev_docs/README.md, repointed
code/doc references to archived paths.
2026-05-14 18:54:22 +00:00
ZdenekSrotyr
e290baa31b
release: 0.54.14 — changelog repair + post-#305 dead-code sweep (#309)
Cuts 0.54.14. Repairs the [Unreleased] changelog state left by three PRs that merged since v0.54.13 without proper changelog hygiene, sweeps dead code orphaned by #305 and earlier PRs, and bumps the version.

- CHANGELOG: #307 bullets moved out of the released [0.54.10] section into [Unreleased]; backfilled missing entries for #305 (Removed) and #308 (Changed); [Unreleased] -> [0.54.14] release cut.
- pyproject.toml: 0.54.13 -> 0.54.14.
- app/web/router.py: removed orphaned gws_oauth / instance_admin_email / connector_prompts keys from the shared _build_context ctx dict.
- app/web/templates/home_not_onboarded.html: swept dead connector-tile, automode, and setup-collapsible/minimize CSS + the orphaned JS click-wiring.
2026-05-14 20:41:04 +02:00
Vojtech
cb13f80241
feat(marketplace): rename CTA + expand submit-flow guides (#308)
Three coordinated tweaks to the publication discovery surface:

1. Action-row CTA on /marketplace?tab=curated reads 'Submit a skill
   or plugin' instead of 'Submit a plugin'. Skills are first-class
   citizens of the curated shelf; the old wording made them feel
   like an afterthought. Same rename in the empty-state JS innerHTML
   so the two paths can't drift.

2. Curated guide page (/marketplace/guide/curated) expanded from a
   4-line stub into a 3-step ordered list documenting the Named
   Curator handoff (find curator → handoff → publish + lifecycle).
   New '.guide-fastpath' callout block points users at the Flea
   Market when they want lighter review-bar / faster path. Primary
   CTA at the bottom of the curated guide now links to the flea
   guide too, so users who skim past the fast-path callout still
   see the escape hatch.

3. Flea guide page (/marketplace/guide/flea) expanded from a
   3-line stub into a 4-step ordered list (package → upload via
   form → automated review → published). Documents the actual
   /store/new flow + the automated guardrails (manifest, content
   quality, prompt-injection scan) so users know what 'self-service'
   actually means before they upload.

Route titles updated to match: 'Submit a skill or plugin to
Curated Marketplace'.

New file: tests/test_web_marketplace_guide.py — three tests covering
the CTA rename, the curated guide's structural elements (Named
Curators lede, 3 steps, fastpath callout, primary-CTA href), and
the flea guide's structural elements (4 steps, no fastpath
asymmetry, /store/new primary CTA).
2026-05-14 19:44:33 +02:00
minasarustamyan
17159bfad9
fix: refresh-marketplace enables stack plugins; override sentinel is init-time only (#307)
* fix(refresh-marketplace): also enable stack plugins in workspace settings

Reconcile previously stopped at `claude plugin install --scope project`,
which only writes the global plugin registry. Without an entry in the
workspace `.claude/settings.json` `enabledPlugins` map, Claude Code
treats every plugin as disabled — `/plugins` doesn't list them and
their slash commands, skills, and agents are unreachable.

Refresh now writes the enable map after install/update, treating the
user's marketplace stack as the source of truth (re-enables anything a
prior `claude plugin disable` locally turned off). Override workspaces
are skipped via `is_override_workspace`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(override): sentinel governs init only, not runtime CLI

Sentinel `.claude/init-complete` with `override: true` was meant to
let admins ship INITIAL workspace content. The implementation was
over-scoped — `is_override_workspace` check sat inside every Agnes
writer (`install_claude_hooks`, `install_claude_commands`,
`maybe_refresh_claude_hooks`, `_enable_plugins_in_workspace_settings`),
which blocked runtime commands too. Operators on override workspaces
got trapped at the template snapshot: no `enabledPlugins` map from
`agnes refresh-marketplace`, no hook auto-migration from
`agnes self-upgrade`.

Move the check to the init-time call site (cli/commands/init.py,
`if not override_active:`) — the single place where init-time skip
is the right behavior. Writers themselves become unconditional;
runtime CLI now updates `.claude/` regardless of the sentinel.

Admin custom hooks survive — refresh only rewrites entries matching
`_OUR_COMMAND_MARKERS` (foreign commands fall through unchanged,
same contract as default workspaces).

Existing override workspaces auto-converge on next
`agnes self-upgrade` (fires from every SessionStart). No manual
migration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 18:43:32 +02:00
Vojtech
d63f1473ab
feat(home): drop /home connectors block — onboarding covers it (#305)
The dedicated `<details data-section="connectors">` section on /home
duplicated content that the install hero's Step 4 clipboard payload
already inlines. Both surfaces sourced the same prompt strings from
`app/web/connector_prompts.py` (home tiles via `<code id="*-prompt">`,
setup script via `app/web/setup_instructions.py::_connectors_block`),
so users walking the install script visited each connector inline and
then had no reason to scroll back up.

Removed the full block (3 tiles + summary + section-label). Lead
paragraph in the install hero now mentions the connector families
briefly so the benefit is visible before kick-off:

  "... your team's curated data, plugins, third-party tools (Asana,
   Google Workspace, Atlassian), and shared knowledge ... the install
   script also connects your tools for you, so there's no extra page
   to visit."

The "Email admin" mailto CTA, previously gated inside the GWS tile
when admin_email was set + GWS unconfigured, moves implicitly to the
install script's GWS step (Claude prompts the user when the OAuth
gating wall lands). Tests updated:

- test_connectors_section_removed_from_home (renamed from
  test_connectors_render_flat_when_onboarded_by_default) — asserts
  `class="connector-tiles"` and `data-section="connectors"` are absent
  in BOTH onboarded states, and that the lead paragraph still mentions
  the three connector families so the benefit isn't lost.
- test_home_renders_connector_prompts_from_shared_module — DROPPED.
  Was a parity check between the home tiles and the setup script's
  connector_prompts.py source. One surface now → no drift risk → test
  redundant. Replaced with an inline comment pointing future readers
  at where the strings flow (setup_instructions.py::_connectors_block).
- test_home_no_longer_shows_email_admin_button (renamed from
  test_home_shows_email_admin_button_when_admin_email_set_and_gws_unconfigured)
  — asserts the mailto CTA is gone from /home regardless of
  admin_email / GWS-configured state; documents the path-move.

CSS for `.connector-tile*` left in place as dead bytes — small
footprint, no behavior, easy follow-up if/when someone audits.
2026-05-14 18:31:24 +02:00
ZdenekSrotyr
3e19caa975
fix(security): RBAC filter uses stable user_id instead of mutable email local-part (#293) (#299)
* fix(security): RBAC filter for agnes_sessions matches both email local-part and user_id

The upload API (POST /api/upload/sessions) stores session files under
user_sessions/{user_id}/ (UUID), while the session collector uses the
OS username (email local-part). The session pipeline writes the directory
name verbatim into usage_session_summary.username, so the column can
contain either value depending on the ingestion path.

The RBAC filter in build_filter_clause previously only matched the email
local-part, missing sessions uploaded via the API. The fix adds an OR
condition so non-admin users see rows where username matches either their
email local-part or their user_id.

Closes #293

Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>

* fix(security): RBAC filter uses stable user_id instead of mutable email local-part

Closes #293

Previous fix used OR condition matching both email local-part and user_id
in the username column. This was fragile: email changes would break
filtering. This commit introduces a dedicated user_id column populated
by the session pipeline via resolve_user_id(), and switches the RBAC
filter to use it exclusively.

Changes:
- Schema v45: add user_id column to usage_session_summary and usage_events
- UsageProcessor: accept and store user_id in both tables
- runner.py: resolve_user_id() maps directory name to users.id UUID
  (exact match for UUID dirs, email LIKE for local-part dirs)
- INTERNAL_TABLES: agnes_sessions/agnes_telemetry filter on user_id column
- build_filter_clause: simplified to WHERE user_id = '<uuid>' (no OR)
- me.py/admin_user_sessions.py: query by user_id OR username for
  backward compatibility during transition
- USAGE_PROCESSOR_VERSION bumped 2→3 to trigger reprocessing/backfill
- Tests updated: 27 pass including new email-change resilience test

Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>

* fix(tests): bump schema version assertions 44→45

Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>

* fix(docs): correct resolve_user_id docstring, add TypeError comment

Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>

* fix(security): address review — backward-compat OR, LIKE escaping, narrower TypeError

Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>

* fix(security): address code review — eliminate TypeError hack, add resolve_user_id tests

Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>

* fix(db): create user_id indexes in _v44_to_v45, not _SYSTEM_SCHEMA

_SYSTEM_SCHEMA runs before the migration ladder. On an upgrade from
v42/v43/v44, usage_events / usage_session_summary already exist without
the user_id column (CREATE TABLE IF NOT EXISTS is a no-op), so the
CREATE INDEX ... (user_id) lines in _SYSTEM_SCHEMA failed to bind and
aborted _ensure_schema — the app would not start post-upgrade. Move the
index creation to _v44_to_v45, which ADDs the column first. Same pattern
as the v41 audit_log indices.

* fix(usage): bump USAGE_PROCESSOR_VERSION 3→4 for user_id backfill

#303 shipped USAGE_PROCESSOR_VERSION=3 (release 0.54.12) for its
<command-name> slash extraction. This PR's 2→3 bump collided with it
on rebase, so the reprocess loop would not re-trigger to backfill the
new user_id column on deployments already running v3. Bump to 4.

* release: 0.54.13 — RBAC filter uses stable user_id (#293)

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-14 14:12:54 +00:00
minasarustamyan
f53e98d5a3
fix(usage): extract <command-name> slash invocations (release 0.54.12) (#303)
UsageProcessor missed every user-typed slash invocation because the
SLASH_RE regex (^\s*/<name>) expected a raw "/foo" prefix that Claude
Code never writes. Real session jsonls wrap slash commands in a
<command-name>/foo</command-name> XML tag inside user message content.
Result on production: usage_events.command_name and
usage_session_summary.slash_commands stayed NULL/0 for /clear, /exit,
plugin commands like /plugin:name — verified on 17 dev-VM jsonls
holding 25 <command-name> tags / 0 extracted rows.

Replaces SLASH_RE with COMMAND_NAME_RE that searches for the tag
anywhere in the user text (the tag sits after a <command-message>
sibling). USAGE_PROCESSOR_VERSION bumps 2 → 3; operators wanting to
rewrite historical rows under the new logic call
POST /api/admin/usage/reprocess (agnes admin telemetry reprocess).

Fixtures slash_command.jsonl, mixed.jsonl, skill_curated.jsonl
rewritten from the unrealistic "/foo args" string format to the real
<command-name> tag wrapper — existing assertions stay green against
the new format, which is the regression baseline going forward. Adds
TestCommandNameTagExtraction (4 unit tests on iter_events) covering
string content, list-of-text-blocks content, mid-text tag position,
and plain-prose "/x" non-match.

Implicit Skill tool_use extraction (LLM-decided invocations)
unchanged.

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
2026-05-14 13:33:57 +00:00
pcernik-grpn
840c68ff1a
feat(catalog): promote buckets to top-level Data Package cards (#301)
Each catalog_data bucket renders as its own top-level Data Package card instead of nested accordions under a single Core Business Data wrapper. Tables flat-listed per package, mirroring the Agnes Internal card. Pluralization fix (1 table / N tables).

Includes the 0.54.11 release-cut.

Co-authored-by: pcernik-grpn <pcernik-grpn@users.noreply.github.com>
2026-05-14 15:18:35 +02:00
ZdenekSrotyr
1b0329e8c5
UI design system unification — one stylesheet, canonical primitives, nav fix (#284)
* docs(plan): design-system unification plan (post-review revisions)

Plan covers consolidating two CSS files into one, introducing
canonical primitives (.btn family, .search-input, .filter-bar,
.page-header, .data-table, .empty-state, .toast, .stat-card,
.tab-strip), unifying the top-nav Admin trigger with sibling
links, and migrating 41 templates that today carry inline
<style> blocks.

Post-review revisions: nav fix moved to first commit (user
complaint lands first); sticky-header and dark-mode skeleton
tasks dropped (defer to follow-up PRs); contract test class
detection tokenizes class="..." attributes properly; baseline
screenshot loop added to Task 0; vendor-token grep widened.

* fix(nav): unify Admin trigger with sibling nav links

The top-nav Admin entry is a <button class="app-nav-link
app-nav-menu-trigger">, siblings are <a class="app-nav-link">.
.app-nav-menu-trigger used to override .app-nav-link with
"color: inherit; font: inherit", resetting font-size from 13px
back to body default and color from --text-secondary to body
color. Active state diverged too: .is-active on links used
--primary blue, [aria-expanded=true] on the button used
--border-light grey.

Fix: expand .app-nav-link so it covers <button>-element resets
(font-family: inherit, border: 0, background: transparent,
cursor: pointer, display: inline-flex for chevron alignment).
Add [aria-expanded="true"] as another active-state selector
so the dropdown's open state highlights identically to .is-active
on links. Delete the now-redundant .app-nav-menu-trigger rules
that stripped button chrome.

Extract the inline <script> from _app_header.html into a new
app/web/static/app.js (loaded by base.html only — base_login.html
has no nav). Sets up window.appUI.wireDropdown for both the user
menu and the Admin dropdown via DOMContentLoaded.

* style(css): consolidate style.css into style-custom.css + add cache-bust

One stylesheet for the whole web UI:
- style.css (1086 lines, legacy Google-inspired tokens + components)
  absorbed into style-custom.css under a labeled block, placed after
  the modern :root + body so style-custom's component rules continue
  to override the legacy ones (preserves the original cascade order
  that came from loading style.css first).
- style.css deleted; <link> dropped from base.html + base_login.html.
- static_url() now appends ?v=<mtime> to /static/<path>. Cheap
  per-request os.stat — auto-invalidates browser + proxy caches on
  redeploy without operator intervention. Mtime survives across
  uvicorn restarts as long as the file content is unchanged.

Legacy classes (.btn, .card, .login-*, .badge, .code-block, .flash,
.form-group, .username-box, .btn-copy, .auth-tabs, .divider, etc.)
still render — they live in style-custom.css now. Login pages,
error page, password setup, and the dashboard's Claude Code Setup
card all kept working in browser smoke.

* test(design): contract test for design-system invariants

7 structural invariants enforced from this commit onwards:
- style.css must stay deleted
- no template links style.css via static_url
- exactly one bare :root block in style-custom.css
- canonical primitives declared (.btn, .btn-primary, .search-input,
  .filter-bar, .page-header, .data-table, .empty-state, .toast, …)
- no deprecated class names in templates (.users-table, .gp-table,
  .marketplaces-table, .audit-table, .users-search, .marketplaces-search,
  .modal-btn, .btn-primary-v2, …)
- app.js loaded by base.html, NOT by base_login.html
- 3 helper-level unit tests for the class-attribute tokenizer
  (multi-line attrs, Jinja-conditional fragments, false-positive prose)

Two of the assertions intentionally start FAILING after this commit
(missing primitives + legacy class refs in 7 admin templates) and
will turn green as Tasks 4–7 add primitives and Tasks 8–15 migrate
the templates.

* feat(css): canonical button family + legacy token aliases

Adds at top of :root: legacy token aliases (--bg, --card-bg, --text,
--text-light, --secondary, --radius) pointing at modern equivalents.
Absorbed style.css rules referenced these names; without aliases
they fell back to 'unset'. Aliases live until Task 16 alongside
their absorbed rules.

Appends canonical .btn variants at end of file (last cascade):
  .btn-primary + .btn-primary-v2 + .modal-btn.primary (alias group)
  .btn-secondary + .btn-secondary-v2 + .modal-btn:not(.primary):not(.danger)
  .btn-ghost + .btn-ghost-v2
  .btn-danger + .modal-btn.danger
  .btn-lg
  .btn:disabled + .btn:focus-visible (focus ring via --focus-ring)

Existing absorbed .btn, .btn-primary, .btn-secondary, .btn-sm rules
remain — the canonical block adds the missing variants + selector-list
aliases so .modal-btn and v2 markup keep rendering until migration
tasks swap them out.

Contract test: .btn-danger now declared (one less missing primitive).
Browser smoke: /admin/tokens hero + filter pills + empty state render
correctly with the absorbed style.css rules now backed by real tokens.

* feat(css): form-control primitives — .search-input + .filter-bar + .filter-pill + .form-input

Canonical filter bar shape: 36px-height inputs (matches button height
for vertical rhythm), 28px pills with .is-active state, consistent
focus ring via --focus-ring token.

Selector-list aliases for legacy per-page classes:
- .users-search / .marketplaces-search / .kb-search → .search-input
- .filters-card → .filter-bar
- .pill[aria-pressed="true"] also matches the .filter-pill active state

.form-input added as a sibling of .search-input for forms — same
baseline height + radius + focus treatment, with textarea.form-input
auto-sizing to min 96px and using the mono font (matches CSV/SQL
pasted-snippet patterns on /admin/agent-prompt + /admin/workspace-prompt).

Contract test: .search-input + .filter-bar + .filter-pill now declared.

* feat(css): .page-header primitive + variants + .tab-strip

Canonical page-header pattern with title (22px) + optional subtitle +
optional eyebrow + right-aligned actions slot. Two modifiers:
- .page-header--hero: gradient background (primary→primary-dark),
  28px white title, semi-transparent subtitle/eyebrow. For
  /marketplace, /store, /profile-style pages that already use this
  layout via per-page inline <style>. Migration tasks delete the
  duplicated rules.
- .page-header--compact: 18px title for dense admin index pages.

.tab-strip + .tab-strip__item — the secondary tab row pattern used by
/marketplace?tab=flea and similar. .is-active / [aria-selected=true]
both flip the active treatment (primary color + bottom border).

Contract test: .page-header / __title / __subtitle / __actions all
now declared (4 fewer missing primitives).

* feat(css+js): .data-table + .empty-state + .toast + .stat-card primitives

Last primitive batch. All 8 canonical-primitives invariants in
test_design_system_contract.py now green; only the template-migration
test fails (expected — Tasks 8–15).

.data-table (+ --compact modifier): selector-list aliases for legacy
per-page table classes (.users-table, .gp-table, .marketplaces-table,
.audit-table) so existing markup keeps rendering until migration.
Compact modifier shrinks padding + font for dense lists (audit log).

.empty-state with __icon / __title / __description / __actions —
replaces the ad-hoc 'no results' rendering scattered across pages
(corporate_memory, admin_users, admin_marketplaces, etc.).

.toast / .toast-container — paired with window.appToast({kind, msg,
timeout}) appended to app.js. Bottom-right stacked, click-to-dismiss,
auto-dismiss after 4s by default. Kind 'success' / 'warning' / 'error'
/ 'info' shows a 3px colored left border.

.stat-card (+ --accent variant) + .stat-row grid — for the dashboard
metric tile row.

* style(templates): migrate 8 templates off deprecated class names

Mechanical class-attribute rewrite via tokenizer (preserves Jinja
conditionals + multi-line attrs):

  modal-btn primary    -> btn btn-primary
  modal-btn danger     -> btn btn-danger
  modal-btn            -> btn btn-secondary
  users-table          -> data-table
  gp-table             -> data-table
  marketplaces-table   -> data-table
  audit-table          -> data-table
  users-search         -> search-input
  marketplaces-search  -> search-input

8 templates touched: admin_groups, admin_marketplaces, admin_tokens,
admin_users, admin_welcome, admin_workspace_prompt, my_tokens,
corporate_memory_admin. 43 lines updated total.

Inline <style> blocks in these templates still define rules for the
old class names — those rules no longer match anything and become
dead code, removed in Task 16's alias cleanup along with the
selector-list aliases in style-custom.css.

Contract test (tests/test_design_system_contract.py) now fully green:
9/9 invariants enforced from this commit onward.

* feat(css): extend .data-table selector list to 13 more bespoke -table classes

Visual unification of remaining tables across the codebase without
per-template edits. The .data-table baseline rules (uppercase header
tracking, 12px padding, hover state, border-radius) now apply to:

  .ad-table / .ea-table / .md-table / .members-table /
  .obs-table / .overview-stats-table / .registry-table /
  .sample-table / .sched-table / .sess-table / .sub-table /
  .subs-table / .ud-table

These class names live in 12 templates (activity_center, admin_access,
admin_group_detail, admin_scheduler_runs, admin_sessions,
admin_store_submissions, admin_tables, admin_usage, admin_user_detail,
catalog, me_debug, profile_sessions) that have their own per-page
<style> blocks. Per-page rules with higher specificity still win for
their custom needs (column widths, etc.) — this commit only sets a
shared baseline so every table renders with the same chrome.

Contract test stays green: 9/9 invariants enforced.

* style(css): remove now-unused legacy class aliases

Phase A renamed 8 templates off these names; no markup references
them any more, so the selector-list memberships are dead weight.
Removed from style-custom.css:

  .btn-primary-v2 / .btn-secondary-v2 / .btn-ghost-v2
  .modal-btn / .modal-btn.primary / .modal-btn.danger /
  .modal-btn:not(.primary):not(.danger)
  .users-search / .marketplaces-search / .kb-search
  .users-table / .gp-table / .marketplaces-table / .audit-table
  .filters-card

37 lines smaller. Contract test catches any reintroduction.

KEPT aliases (still in untouched template markup):
- .pill (marketplace_plugin_detail.html, marketplace.html — these
  pages weren't part of Phase A's deprecated-class sweep; their
  own .pill CSS rules still apply)
- All .data-table family extensions (.ad-table, .ea-table, .md-table,
  .members-table, .obs-table, .overview-stats-table, .registry-table,
  .sample-table, .sched-table, .sess-table, .sub-table, .subs-table,
  .ud-table) — these still render data tables in 12 templates;
  selector-list aliasing keeps them visually unified with .data-table
  baseline.
- Legacy token aliases (--bg / --text / --text-light / --secondary /
  --card-bg / --radius) — still resolve absorbed style.css rules.

Templates' inline <style> blocks still contain dead rules for the
renamed classes (.users-search, .modal-btn, etc.); harmless but
bloat. Optional follow-up: a separate sweep can drop those.

* docs(changelog): design-system unification under [Unreleased]

* feat(css): unify page-shell width — .container baseline 1280px + modifiers

Inventory found 30+ unique max-width values across templates (280px
login → 1600px admin/tables). The legacy .container default was 800px,
which made every admin page set its own wider inline override —
30+ ad-hoc widths drifted as a result.

Canonical: .container max-width = var(--width-app) (1280px). Pages
that need a different shape opt in via modifiers:

  .container--narrow → var(--width-narrow)  (800px) — long-form text,
                                                     setup wizards
  .container--wide   → var(--width-wide)    (1400px) — admin lists,
                                                     marketplace grids
  .container--full   → max-width: none — hero / landing

Pages that already set a NARROWER inline max-width (setup, login flows
inside .login-card, etc.) still render at their narrower size — the
inline override beats the new canonical 1280px. The visible change
hits the ~20 admin pages currently rendering at 800px via the legacy
default, which jump to 1280px and pick up consistent breathing room.

Spacing also normalized: padding 24px 20px → var(--space-6) var(--space-5).

* fix(home+catalog): gut dashboard sections + remove confusing toggle + fix table count

Dashboard /home cleanup:
- Remove 'Your Data' card — Data Packages is already a top-nav entry,
  so duplicating data sources on the landing page just adds noise.
- Remove 'Account' card — group memberships + scripts + last sync
  belong on /profile, not on the welcome screen.
- Remove entire right-column (Corporate Memory + Activity Center
  widgets) — both surfaces have dedicated admin pages reachable from
  the Admin dropdown.
- Keep stats row (Tables/Columns/Rows/Data Size/Unstructured),
  env-setup-CTA, and Notifications card.

/catalog cleanup:
- Strip the 'Always included' badge + the locked toggle-switch from
  Core Business Data and Business Metrics cards. The toggle was
  always 'checked disabled' — it visually looked like a switch but
  could not be toggled, which was confusing. The 'Always included'
  copy itself was redundant once the toggle was gone. Agnes Internal
  already rendered without these, so the three cards are now visually
  consistent.

Catalog data_stats fix:
- 'total_tables' was len(sync_state) — counted only tables that had
  ever synced, so a 30-row table_registry with 0 ever synced rendered
  as '0 tables'. Switched to len(tables) — the registered
  business-data table list — so the count reflects what's actually
  available, not what's been touched.

* fix(home): real stat numbers + drop unstructured tile + cleanup dead CSS

Dashboard stats were hardcoded zeros (columns: 0, size_display:
'0 MB', unstructured_display: '0 MB') and the table counter pulled
from sync_state (synced) instead of table_registry (registered).
On a fresh deployment with 30 registered tables and 0 ever synced,
the page rendered '0 / 0 / 0 / 0 MB / 0 MB' — useless.

Now:
- Tables: COUNT(*) FROM table_registry WHERE source_type != 'internal'.
  Matches the /catalog Core Business Data counter.
- Columns: SUM(sync_state.columns). Zero only when nothing's synced yet.
- Rows: unchanged (SUM(sync_state.rows), already correct).
- Data Size: SUM(sync_state.file_size_bytes), human-formatted via
  inline _fmt_bytes helper (KB/MB/GB).
- Unstructured: tile dropped — was always '0 MB' and had no source.
- last_updated: now derived from sync_state max(last_sync), wasn't set
  before so the 'Synced …' tag never rendered.

Dashboard.html cleanup: ~725 lines of orphan inline <style> removed —
.section-title, .data-source*, .toggle-switch*, .catalog-cta*,
.memory-card / .memory-stat / .memory-description / .memory-footer
/ .btn-memory, .activity-card / .activity-stat / .activity-text
/ .btn-activity, .account-grid / .account-row / .account-scripts
/ .badge-role / .badge-group / .cron-line, .badge-included /
.badge-beta / .badge-demo. All matched markup deleted in the
previous commit; the CSS was dead code until now.

* ui(catalog): rename page heading 'Data Catalog' → 'Data Packages'

The top-nav entry says 'Data Packages' but the page itself said
'Data Catalog' — confusing two-name product. Aligns the heading and
<title> with the nav label. Subtitle trimmed too: 'manage your
subscriptions' was a vestige of the toggle UI that just got removed,
replaced with a one-liner describing what the page is for.

Two other 'Data Catalog' strings stay: they live inside the table-
profiler overlay JS and refer to an EXTERNAL catalog system (e.g.
OpenMetadata / Atlan) that an operator may link to per table — that
is a generic term for any external data-catalog product, not our
page name.

* fix(nav): dropdown clicks always work + mutual-exclusion close

Two bugs in the wireDropdown helper:

1. Clicking trigger B while trigger A's menu was open left both open.
   e.stopPropagation() in trigger.click prevented the document-click
   handler from firing, so trigger A's open menu had no way to learn
   that something else was clicked. Net effect: state diverged across
   the two dropdowns the more you clicked.

2. The target-vs-trigger equality check (e.target !== trigger) was
   strict. Clicking the chevron <svg> inside the button reports the
   svg or its <path> child as e.target — not the button — so removing
   stopPropagation alone would trip the close branch in the same
   click that just opened the panel.

Fix both at once: drop e.stopPropagation() AND switch the doc-handler
guard to trigger.contains(e.target). Now any click outside both the
trigger subtree and the panel subtree closes; any click on another
trigger closes via the OTHER dropdown's doc handler; clicks inside
the trigger (button OR svg child) are fully ignored by the doc
handler and only the trigger's own toggle handler fires.

* feat(ui): canonical blue-gradient hero on every admin page

The UI had a per-page hero pattern on ~10 onboarding/marketing pages
(admin_tokens / profile / install / setup_advanced / marketplace /
my_tokens / store_upload / home_*), each with its own ad-hoc CSS
(.tokens-hero, .profile-hero, .install-hero, .upload-hero, …). The
admin section's index + detail pages had plain H1/H2 with their own
.users-title / .gp-title / .obs-title / .cfg-title / … inline styling.
Net effect: half the app felt like a product, half felt like a
spreadsheet.

Now:
- .page-header--hero CSS upgraded to match the look analysts already
  liked from admin_tokens: 28px/32px/24px padding, 14px radius, soft
  primary-tinted box-shadow (0 4px 16px rgba(0,115,209,0.2)), 28px
  semibold title, optional uppercase eyebrow + 13.5px subtitle.
  Narrow-viewport breakpoint included.
- New _page_hero.html partial wraps the boilerplate. Usage:
    {% set page_hero_eyebrow  = "Users & Access" %}
    {% set page_hero_title    = "Users" %}
    {% set page_hero_subtitle = "…" %}
    {% include "_page_hero.html" %}
- 15 admin templates migrated to it: admin_users / admin_groups /
  admin_marketplaces / admin_access / admin_sessions /
  admin_session_detail / admin_store_submissions /
  admin_scheduler_runs / admin_usage / admin_user_detail /
  admin_welcome / admin_workspace_prompt / admin_server_config /
  activity_center / admin/news_editor. Each gets a grouped eyebrow
  (Users & Access / Data / Agent Experience / Activity Center /
  Server) matching the Admin dropdown sections so the page identity
  is unambiguous at a glance.

Legacy *-title H2/H1 + adjacent subtitle paragraphs deleted; their
per-page CSS rules are dead now (harmless, retire in a follow-up
sweep alongside other inline-style cleanup the reviewers flagged).

admin_tables.html intentionally NOT migrated — it's a standalone
HTML page that doesn't extend base.html; a separate refactor.

Test: test_admin_users_page_renders_for_admin assertion updated
from .users-title to .page-header__title + .page-header--hero (the
canonical pair). All other web/template tests stay green.

* refactor(ui): dedup _humanbytes, drop 267 lines of dead inline CSS

(1) _humanbytes consolidation:
- Add TB branch + optional precision param (default 2 preserves existing
  Store detail callers; dashboard uses precision=1 for headline tiles).
- Delete inline _fmt_bytes from dashboard handler — was a copy of
  _humanbytes with different rounding. One canonical helper now.

(2) Dead inline-CSS sweep across 17 migrated templates:
- Conservative regex: a CSS rule is deleted only when its primary class
  matches one of the known-dead names AND that name is NOT referenced
  from any class= attribute in the same file's markup.
- Per-file 'in-use' guard saved several false positives that the deny
  list would have nuked (e.g. .users-toolbar, .gp-search, .obs-subtitle,
  .marketplaces-toolbar are still in use; only .users-table, .users-search,
  .users-title, .modal-btn, etc. that have NO markup left went away).
- Removed: -267 lines across admin_users (-42), admin_marketplaces (-45),
  admin_groups (-31), my_tokens (-38), admin_tokens (-29), admin_access
  (-9), admin_user_detail (-6), admin_welcome (-8), admin_workspace_prompt
  (-8), admin_server_config (-2), admin_sessions (-1), admin_session_detail
  (-1), admin_usage (-1), admin_store_submissions (-3), admin_scheduler_runs
  (-3), activity_center (-4), corporate_memory_admin (-36).

Contract test stays green (9/9); all web/template/render/user_management
tests pass.

* feat(ui): canonical hero on /catalog (Data Packages)

Same .page-header--hero treatment as the admin pages — Data eyebrow,
Data Packages title, Browse-the-data-sources subtitle. Removes the
ad-hoc .page-title block (h1 / p / wrapper-div) and its CSS rules
(now dead, 3 rule blocks deleted).

* fix(nav): load app.js from _app_header.html — works on standalone pages

The previous nav-fix commit moved the inline dropdown script from
_app_header.html into app/web/static/app.js + added <script src=…>
to base.html. That broke EVERY page that includes _app_header.html
WITHOUT extending base.html (catalog, corporate_memory*,
admin_tables, install). They got the nav markup but no JS → both
Admin and AD dropdowns dead on those pages.

Fix: emit the <script src=app.js defer> directly inside the
_app_header.html partial. Any page that includes the header now
gets the script automatically — base.html-extenders AND standalone
HTML pages alike. base.html's duplicate <script> line removed.

Also fixes the wide-hero on /catalog: .page-header--hero now sets
its own max-width: var(--width-app) (1280px) so standalone pages
without a .container parent don't render the gradient edge-to-edge.
catalog's .source-cards bumped from 900px → 1280px to match the
hero, otherwise the page reads two-tier (wide blue band, narrow
content) which the user flagged.

Verified locally via agent-browser: Admin + AD dropdowns now click
through on /catalog, /admin/tables, /corporate-memory.

* docs(plan): standalone pages → base.html framework migration plan

Plan + Plan-agent review (8 must-fix items applied) for converting
the 5 templates that ship their own <html><head><body> scaffold
(catalog, install, corporate_memory, corporate_memory_admin,
admin_tables) to extend base.html. Root cause of yesterday's
'dropdown dead on /catalog' regression: shared infrastructure in
base.html doesn't propagate to standalones.

* feat(base): body_attrs block + migrate install.html to extend base

base.html: new {% block body_attrs %}{% endblock %} slot so pages
that need <body> attributes (admin_tables has data-source-type)
can carry them through extends.

install.html: convert from standalone <html><head><body> scaffold
to {% extends "base.html" %} with title / body_attrs / head_extra
/ layout / scripts blocks. Drops:
- <!DOCTYPE>, <html>, </html>, <head>, </head>
- <meta charset>, <meta viewport>
- Duplicate <link rel="stylesheet" href="...style-custom.css">
  (base.html already provides one)
- <body> opening + closing tags
- Leading _app_header.html include + _version_badge.html include
  (base.html handles both)

Preserves per-page CSS (in head_extra), per-page JS (in scripts),
the Inter font preconnect (kept inline; not hoisted to base in
this PR — separate decision).

Pilots the migration recipe before the 4 larger pages.

* refactor(memory): extend base.html

Same recipe as install.html. corporate_memory.html now inherits
<html>/<head>/<body> + nav + app.js script tag from base.html.
Page-specific CSS and JS preserved in head_extra + scripts blocks.

* refactor(memory-admin): extend base.html

Same recipe as install/corporate_memory. Curation page now in the
shared rendering pipeline.

* refactor(catalog): extend base.html

catalog.html had the most complexity: 7 head-level assets (chart.js,
Prism, prism-sql, metric_modal.css link + 2 preconnects + Inter
stylesheet), 5 body-level <script> blocks including a <script type=
"module"> for the metric modal, 2 duplicate style-custom.css links
in <head>. The migration script preserved all of them — head-level
externals hoisted to {% block head_extra %} in source order, body
scripts relocated to {% block scripts %} in source order (so chart.js
loads before the IIFE that builds Chart instances), duplicate
style-custom.css links dropped (base.html provides one).

* refactor(admin-tables): extend base.html + carry data-source-type

The biggest of the 5 standalones at 3563 lines. <body data-source-
type="{{ data_source_type }}"> attribute carried through via the
new {% block body_attrs %} slot (admin_tables JS reads
document.body.dataset.sourceType to switch between keboola and
bigquery rendering paths).

* release: 0.54.10 — UI design system unification + homepage status frame + initial workspace override + store guardrails

Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>

* refactor(web): migrate remaining templates to canonical design primitives

- admin_group_detail: .data-table, .btn family, appToast(), remove duplicate table/button/toast CSS
- admin_store_submission_detail: .data-table, .btn family, appToast(), remove duplicate btn/toast CSS
- profile_sessions: .data-table, _page_hero.html, remove duplicate table/title CSS
- me_debug: .data-table, .btn family, remove duplicate table/button CSS
- marketplace: .btn-primary/.btn-secondary, remove duplicate button CSS
- store_edit: remove duplicate .btn-primary/.btn-link CSS, canonical button classes
- store_upload: remove duplicate .btn-primary/.btn-secondary/.btn-link CSS

Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-05-14 13:28:03 +02:00
Vojtech
aa6a6700f4
feat(me/stats): per-analyst Stats dashboard with 4 tabs (#298)
* feat(me/stats): per-analyst Stats dashboard with 4 tabs

New /me/stats page shows the calling user's own analytics across
four tabs, lazy-loaded per activation:

- **Sessions** — paginated usage_session_summary join with a
  filesystem scan of un-processed JSONL (mirrors admin
  list_user_sessions shape). v44 token columns aggregated per row.
- **Tokens** — daily series (default 30 days), by-model breakdown
  (lifetime), top-10 biggest sessions, lifetime totals. Single
  CTE per sub-query against per-user partition (idx_usage_session_user).
- **Data access** — audit_log rows where action LIKE 'query.%' for
  the caller. Covers query.local / query.hybrid / query.remote /
  query.internal. Cursor-paginated on (timestamp, id).
- **Sync activity** — audit_log rows where action is sync.* or
  manifest.* for the caller, plus users.last_pull_at for the
  header. Per-pull history now persists thanks to the new
  manifest.fetch audit row.

Backend: app/api/me_stats.py — single APIRouter at /api/me/stats/*,
four GET endpoints, all gated by get_current_user (server-side
caller scope; the page route itself only renders the shell).

Frontend: app/web/templates/me_stats.html — tab bar + 4 panels,
plain JS lazy-loads each panel's endpoint on first activation,
caches per-tab so switching back doesn't refetch. Small SVG bar
chart on Tokens tab (no external charting dep). 'Stats' link
added to _app_header.html primary nav between 'Data Packages'
and the Admin dropdown.

Side change in app/api/sync.py: /api/sync/manifest now emits a
manifest.fetch audit_log row alongside the existing
users.last_pull_at bump. The column UPDATE only retains the
most recent timestamp; per-pull history needs an audit row.
client_kind='api' for the manifest endpoint (vs. 'web' which
the audit-read deduper uses for AC reads), so the Sync tab can
distinguish CLI pulls from browser-driven manifest peeks.

7 new tests in tests/test_me_stats.py:
- sessions endpoint caller-scope isolation (user A doesn't see B)
- sessions pagination
- tokens empty-user zero shape
- tokens aggregation across daily window + by_model + top + totals
- queries endpoint filters to action LIKE 'query.%' + caller scope
- sync endpoint surfaces both manifest.fetch and sync.trigger
- manifest endpoint writes the manifest.fetch audit row

* ui(me/stats): widen page to 1400px via main.main escape

Default base.html .container wraps content at max-width 800px. Stats
tables (by-model + top-sessions: 6 columns each) felt cramped at that
width — same constraint dashboard.html escapes via the {% block layout %}
override pattern. Mirror that here: render <main class="main"> and
bump .stats-page max-width to 1400px so the 6-column tables breathe
without going edge-to-edge on wide monitors.

* ui(me/stats): narrow from 1400px to 1280px to match /home

/home isn't actually .container's default 800px — style-custom.css
has a body:has(.home-mock) .container { max-width: 1280px } override
that widens it. 1280px is the shared 'wide content' width across the
codebase (top-nav header + /home + dashboard all use it).

Bumping me_stats from 1400px to 1280px so the Stats page reads as
'same chrome' instead of distinctly wider than its sibling pages.
2026-05-14 10:27:58 +00:00
Vojtech
37ad39c8a3
feat(home): status frame on /home (operator-gated, onboarded-only) (#297)
* feat(home): status frame on /home — last sync, sessions, prompts, tokens, projects

Adds the homepage status frame: a 5-card row above the install-hero /
offboard-strip on /home showing the calling user's Last sync (their
last `agnes pull`), Sessions, Prompts, Tokens used, and Projects worked
on, with a 24h/7d pill toggle.

Backed by `GET /api/me/home-stats?window=` (one DuckDB CTE joining
`users` + `usage_session_summary` + `usage_events`) and SSR'd from the
same `compute_home_stats` helper on initial paint so there's no
spinner. The window toggle is the only JS-driven path.

Side surfaces:
- `GET /api/sync/manifest` now stamps `users.last_pull_at` so
  `agnes pull` (and the Claude Code SessionStart hook that wraps it)
  imprints the analyst's last sync time for the new card.
- `usage_session_summary` gains four BIGINT token counters
  (input_tokens, output_tokens, cache_read_tokens, cache_creation_tokens)
  summed from JSONL `message.usage.*` per assistant turn.
- `USAGE_PROCESSOR_VERSION` bumps 1 → 2 so the session-pipeline
  reprocess loop invalidates stale summaries and backfills tokens
  on the next tick.

Schema migration v43 → v44 is idempotent ALTERs (last_pull_at +
4 token columns) — fresh installs receive them from `_SYSTEM_SCHEMA`,
upgrade path runs `_v43_to_v44`. Defaults (NULL / 0) backfill
existing rows cleanly.

9 new tests in tests/test_home_stats.py cover the migration,
endpoint shapes (24h/7d/unknown/empty/missing-user), and the
manifest-side last_pull_at bump.

* docs(CHANGELOG): homepage status frame entries under [Unreleased]

The post-rebase release-cut now belongs to whichever PR lands next
after main rolled to 0.54.9. This PR logs its bullets under
[Unreleased] (Added: homepage status frame, per-user pull tracking,
token counters; Changed: schema v43 → v44 migration) so they ride
out with the next release-cut.

* fix(tests): bump test_schema_v42_migration asserts to v44

CI failed because tests/test_schema_v42_migration.py hardcoded
`assert SCHEMA_VERSION == 43` and `assert v == 43` after init.
v44 (homepage stats frame backing columns) was introduced in the
preceding feat commit; this aligns the existing v42-era migration
tests with the new schema version.

* feat(home): gate status frame on operator flag + user.onboarded

Two gates on the homepage status frame:

1. **Operator master switch** — `get_home_status_frame_visibility()` in
   app/instance_config.py mirrors the existing `get_home_automode_visibility()`
   shape: env var `AGNES_HOME_SHOW_STATUS_FRAME` > yaml
   `instance.home.show_status_frame` > default `True`. Cautious-rollout
   instances can disable the frame without forking; the yaml example
   documents both knobs.

2. **Onboarded gate** — the template only renders the frame when the
   caller's `users.onboarded` is true. First-day users see a clean
   install-hero before all-zero stat cards; the frame appears
   automatically on the next render after `agnes init` POSTs
   `/api/me/onboarded`.

Router skips the `compute_home_stats` DB read entirely when either
gate is closed; `home_stats` arrives at the template as None in that
branch and the `{% if %}` shortcuts the include.

Why both gates: PostHog feature flags evaluated and rejected — this
codebase uses PostHog for analytics capture only, not feature gating;
adding a per-user feature_enabled() call on the /home critical path
would couple the homepage render to a remote eval and still require
an admin master switch. The onboarded gate is a UX coherence rule
layered on top of the operator switch, not an A/B test signal.

3 new tests in test_home_stats.py cover the env-var resolution
(falsey values + default-true). The yaml example gets a `home:`
block documenting both `show_automode` (pre-existing flag, was
undocumented in the example) and `show_status_frame`.
2026-05-14 09:28:47 +00:00
minasarustamyan
63ae676b27
perf(marketplace): cache cover photos + restore Curated filter spacing (#294)
* perf(marketplace): browser-cache cover photos + restore Curated tab filter spacing

Cover photos on /marketplace grid now serve with `Cache-Control: public,
max-age=2592000, immutable` plus URL fingerprinting (`?v=<commit-sha8>`
for curated, `?v=<version_no>` for flea) so browser refresh stops hitting
the server entirely for unchanged assets. Per-plugin RBAC dropped from
the three image endpoints (curated_asset, curated_mirrored, get_entity_photo)
in favor of login-only auth — eliminates _system_db_lock contention on
parallel image requests. Per-request magic-bytes revalidation also dropped
from curated_asset (it was re-reading the file just to discard the bytes,
then FileResponse read it again).

Spacing bug: sort-dropdown commit (6be1cee) wrapped .mp-filter-row in a
new flex container with inline margin-bottom:4px, masking the original
12px CSS rule. Curated tab (where .mp-type-row is hidden) ended up with
4px between filters and the card grid. Wrapper margin restored to 12px.

See CHANGELOG entry under [Unreleased] — the RBAC relaxation is called
out under ### Security with explicit threat-model rationale for AI/human
reviewers.

* test(marketplace): update renamed-html-as-png test for dropped magic-bytes check

Magic-bytes body validation was dropped from `curated_asset` in the previous
commit — the request path now relies on extension allowlist + pinned
Content-Type + nosniff + strict CSP to neuter mismatched payloads at the
browser layer. Update the test to assert the new defense-in-depth posture
(200 served, but Content-Type=image/png + nosniff + CSP=default-src 'none')
rather than the gone 415.

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
2026-05-14 10:09:32 +02:00