* feat(flea): phase-1 — title, tagline, synthetic_name columns + upload UX
Schema v49 adds three user-facing metadata columns to store_entities:
- title (NOT NULL) — humanized display name shown on marketplace
surfaces in later phases. Acronym-aware humanizer in
src/store_naming.py (27 entries: MCP, API, OAuth, S3, …) shared
with the frontend via Jinja-injected dict so JS pre-fill and
Python backfill produce identical output.
- tagline (NULL, ≤200 chars) — optional short description for card
listings. Long-form `description` stays.
- synthetic_name (NOT NULL) — deterministic `<name>-by-<owner_username>`
stored as a column for indexing and as the single source of truth
for attribution lookups in later phases. Today's bundle bake still
uses suffixed_name() at the same call sites.
Migration (_v48_to_v49_migrate, Python function — humanize has no
SQL equivalent) backfills existing rows: title from
humanize_name(strip_archive_suffix(name)), synthetic from the concat
formula; tagline stays NULL. Idempotent (ADD COLUMN IF NOT EXISTS +
SET NOT NULL no-op on re-run).
Upload form (store_upload.html step 2) reorders fields: Title
(pre-filled from server-side humanize, JS keeps it in sync until
the user edits manually) → Name + dark synthetic preview on one
row (matches marketplace_item_detail.html dark code styling, no
copy button — preview only) → Short description with character
counter → Description (unchanged). Edit form (store_edit.html)
mirrors the layout with pre-filled values from the entity row.
API:
- POST /api/store/entities/preview returns `title` (humanized
fallback) for upload form pre-fill.
- POST + PUT /api/store/entities accept `title` and `tagline` form
fields with 100/200-char validation; PUT recomputes
synthetic_name when `name` changes (caller responsibility per
repo contract).
- StoreEntityResponse exposes all three new fields.
Repository:
- create() takes title + tagline + synthetic_name as optional
kwargs with derived defaults (humanize_name(name) / concat) so
existing test fixtures don't need to thread them.
- update() supports partial updates on all three; tagline empty
string clears via NULL sentinel.
- archive() recomputes synthetic_name on rename to the archived
slug so the column stays consistent with name.
Tests:
- New test_schema_v48_to_v49_migration.py: fresh install,
populated-row backfill (incl. archived row strip), idempotence,
NOT NULL constraint verification.
- test_store_naming.py: 14 humanize parametrize cases + acronym
dict invariants.
- test_store_api.py::TestStoreV49Metadata: preview humanize, POST
with explicit + fallback title, 100/200-char rejects, PUT
partial update + synthetic recompute on rename.
- Schema version assertion bumps (48 → 49) in test_db_schema_version,
test_home_stats, test_schema_v42_migration, test_schema_v46_migration.
Phase 1 only — surface rendering on cards / detail pages and
Claude Code bundle propagation come in later phases.
* feat(flea): phase-2 — wire title/tagline/owner through marketplace cards + detail pages
Phase 1 (7f4cfcbb) populated the three new columns on store_entities;
phase 2 surfaces them across the web presentation layer so the kebab-
case slug + bare username no longer leak into user-facing copy.
API:
- `_flea_to_item` now takes `conn` (both callsites updated) and sets
`display_name=entity.title`, `tagline=entity.tagline`, `owner=
_resolve_owner_display(conn, owner_user_id, owner_username)` —
matches the chain the curated path already uses (users.name →
users.email → fallback). The card JS chain `it.display_name ||
it.name` then renders the friendly form; `name` stays at the
suffixed slug as the technical identifier JS uses for fallbacks.
- `flea_detail` adds `display_name` + `tagline` to PluginDetailResponse
so the standalone skill/agent + plugin detail heroes pick them up
through the existing `d.display_name` / `d.tagline` chains.
- `_flea_inner_parent_fields` swaps `parent_display_name` from
`strip_archive_suffix(name)` to `entity.title or strip_archive_suffix(
name)`. Drives parent-plugin label in four surfaces at once:
breadcrumb 3rd segment, hero "part of <plugin>" meta-row,
helper "This skill is part of <plugin>" panel, and the Details
sidebar's "Parent plugin" row.
Templates — `marketplace_item_detail.html`:
- Pre-render: browser title, hero h1, and hero-window-label read
`(entity.title if entity else None) or inner_name or item_name or
plugin_name` so the SSR shell shows the friendly title before the
JS fetch lands (no flash of kebab-case).
- Breadcrumb last segment for flea standalone drops the `d.manifest_name
|| heroTitle` fallback in favour of just `heroTitle` — manifest_name
is the suffixed slug and users explicitly didn't want it in the path.
- Hero meta-row for flea standalone is now hidden. The prior "by
<author> · N installed · <size>" line duplicated install count
(hero telemetry chip below), owner + bundle size (Details sidebar).
Templates — `marketplace_plugin_detail.html`:
- Same SSR pre-render swap (title, h1, window-label, crumb-name).
- Hero tagline element starts hidden; JS shows it only when
`d.tagline` is truthy. Pre-fix it fell back to `d.description`
(long-form text), which read awkwardly under the h1 and pulled the
hero too tall. Description still renders in the "What it does"
panel below the hero.
- Initial "Loading…" placeholder removed so entities without a
tagline don't flash that text mid-fetch.
Tests:
- New `TestFleaPhase2Presentation` class in test_marketplace_api.py
(6 cases): card title + tagline + full-name owner, owner fallback
chain when users.name is NULL, flea_detail exposes title + tagline,
tagline null when omitted, inner skill parent_display_name uses
entity.title (explicit + humanize-fallback variants).
- Updated `TestListItems.test_flea_lists_uploads` to assert both
`display_name == "Alpha"` (humanized) and `name ==
"alpha-by-alice"` (suffixed slug compat).
- Updated `TestWebPages.test_marketplace_flea_detail_page_renders`
to look for the humanized title ("Page Skill") in the SSR shell
instead of the kebab-case `page-skill`.
* feat(flea): phase-3 — read synthetic_name from DB, suffixed_name() only on write
Phase 1 added the column + backfill, repo write paths keep it in sync.
Phase 3 routes every READ callsite through `store_entities.synthetic_name`
directly instead of recomputing `<name>-by-<owner_username>` on the fly,
and switches the collision query off the inline string concat. The
`suffixed_name()` primitive now lives exclusively in write flows.
Read callsites updated (all read `entity["synthetic_name"]` directly,
no fallback — the column is NOT NULL and a missing value would be a
real bug worth surfacing as KeyError):
- app/api/marketplace.py:_flea_to_item — card MarketplaceItem.name.
- app/api/marketplace.py:flea_detail — PluginDetailResponse.manifest_name.
- app/api/store.py:_entity_to_response — StoreEntityResponse.invocation_name.
- app/api/store.py PUT bundle re-bake — `suffixed` passed to
`_bake_plugin_tree`; entity is loaded pre-rename, so its
synthetic_name is the OLD value `_bake_plugin_tree` expects.
- app/api/store.py PUT rename — `old_suffix` for `_rename_baked_tree`.
- app/api/my_stack.py — StoreInstallEntry.invocation_name.
- src/marketplace_filter.py — manifest_name in served plugin entry.
`suffixed_name` imports removed from marketplace.py, my_stack.py, and
marketplace_filter.py (no remaining callsites). store.py keeps the
import for its write paths:
- POST create (`suffixed = suffixed_name(final_name, username)` →
passed to `_bake_plugin_tree` and `repo.create(synthetic_name=...)`).
- PUT rename collision check (`new_suffixed`).
- PUT rename `new_suffix` for `_rename_baked_tree` (proposed value).
- PUT rename `new_synthetic` for `repo.update(synthetic_name=...)`.
- Archive `old_suffix` + `new_suffix` for `_rename_baked_tree`
(retro-compute pre-archive value after `repo.archive` already
overwrote the DB row with the post-archive synthetic).
Collision SQL — `_suffixed_already_taken`:
WHERE name || '-by-' || owner_username = ? (before)
WHERE synthetic_name = ? (after)
Same matches today (phase 1 backfill + NOT NULL invariant + write
paths in sync); indexable + single source of truth going forward.
Repository:
- UserStoreInstallsRepository.list_for_user explicit SELECT extended
with `se.title`, `se.tagline`, `se.synthetic_name` so my_stack and
marketplace_filter callers can read them off the joined row.
Tests:
- test_store_api.py::test_invocation_name_reads_from_synthetic_column —
upload entity, manually override the column with a non-canonical
value, verify GET response returns the override (proves read path
consumes the column, not recomputes).
- test_marketplace_api.py::test_flea_card_and_detail_read_synthetic_name_from_db —
same proof for `MarketplaceItem.name` (card) and
`PluginDetailResponse.manifest_name` (detail).
* feat(flea): phase-4 — rename agnes-store-bundle → flea (synthetic plugin)
The synthetic plugin that wraps loose flea-market skills + agents into
one Claude Code plugin is renamed from `agnes-store-bundle` to `flea`.
Plugin-type flea uploads (their own standalone plugin entry) are
unaffected.
Constants:
- src/marketplace_filter.py:
- BUNDLE_PLUGIN_NAME: "agnes-store-bundle" → "flea" (Claude Code
plugin manifest name + .claude-plugin/plugin.json name)
- BUNDLE_PREFIXED_NAME: "store-bundle" → "flea" (on-disk ZIP /
git tree path, now plugins/flea/...)
Attribution layer (services/session_processors/usage_lib.py):
- FLEA_BUNDLE_PREFIX: "agnes-store-bundle" → "flea". The JSONL
invocation identifier going forward is `flea:<skill-name>`.
- New `_LEGACY_FLEA_BUNDLE_PREFIXES = ("agnes-store-bundle",)`.
`MarketplaceItemLookup.resolve()` + `_attribute_event()` accept BOTH
the new and the legacy prefix so historic usage_events (~90-day
retention) continue attributing to source='flea'. The tuple becomes
a no-op once the rename has been live past the retention window —
a follow-up commit can drop it then.
- USAGE_PROCESSOR_VERSION bumped 6 → 7 so the session-pipeline reprocess
loop re-runs attribution with the new + legacy prefix branches.
User-facing copy:
- /api/store/bundle.zip Content-Disposition filename: agnes-store-bundle.zip → flea.zip
- `agnes admin store pull` default --out: agnes-store-bundle.zip → flea.zip
- Docstrings + JS comment + welcome template comment updated.
Tests:
- skill_flea.jsonl fixture identifier updated to flea:flea-skill.
- New skill_flea_legacy.jsonl with the legacy prefix for backward-compat
coverage.
- New test `test_legacy_agnes_store_bundle_prefix_resolves` replays the
legacy fixture and asserts source='flea' attribution still lands.
- All other test assertions / mocks substituted mechanically:
test_session_processor_usage.py, test_usage_rollups.py,
test_marketplace_filter_store.py, test_store_api.py,
test_cli_refresh_marketplace.py.
- `_seed_flea_entity` (test_usage_rollups.py) + `_seed_attribution`
(test_session_processor_usage.py) helpers now supply the NOT NULL
`title` + `synthetic_name` columns from phase 1, since they INSERT
directly bypassing the repo's create() fallback.
Client rollover note (CHANGELOG): `agnes refresh-marketplace` will
install the new `flea@agnes` plugin and the local marketplace clone's
`plugins/store-bundle/` source folder is removed via `git reset --hard`.
Whether Claude Code itself auto-prunes the orphan `agnes-store-bundle
@agnes` registry entry is undocumented — to verify empirically on the
dev VM. If the orphan entry lingers, a follow-up will add targeted
cleanup; until then users can manually run
`claude plugin uninstall agnes-store-bundle@agnes`.
Verified locally: 98 passed (session_processor_usage + usage_rollups +
marketplace_filter_store + cli_refresh_marketplace) + 228 passed/2
skipped (store_api + marketplace_api + admin_store_submissions +
store_entity_versions + store_repositories).
* fix(flea): phase-5 — attribution keyspace mismatch (closes#335)
Pre-fix every flea skill/agent invocation silently fell through to
`usage_events.source = 'builtin'`. Root cause: lookup tables in
`services/session_processors/usage_lib.py` keyed `_flea_entities` (and
the derived `_flea_plugins` set) by `store_entities.name` — the
un-suffixed display name. Claude Code writes invocations as
`flea:<synthetic_name>` (e.g. `flea:xlsx-by-c-marustamyan`), so
`dict.get(local)` always missed and the resolver fell through to
builtin. Result: marketplace cards, detail telemetry chips, admin
group-by-source all showed 0 flea invocations even when the raw
JSONL stream was correct.
Phase 1 added the `synthetic_name` column + backfill; phase 4 renamed
the bundle prefix to `flea`; phase 5 finally flips the lookup
keyspace to match what JSONL writes.
usage_lib.py:
- `MarketplaceItemLookup.__init__` preload: `SELECT synthetic_name,
type FROM store_entities` (was `SELECT name, type`). `_flea_plugins`
set derived from those keys, so it now carries synthetic_names
too — matches what Claude Code writes when invoking a skill nested
inside a flea plugin (`<synthetic>:<inner>`).
- `rebuild_rollups` preload: same SELECT change; also derives
`flea_plugins` and threads it through `_aggregate_events` /
`_rebuild_window`.
- `_attribute_event`: signature extended with `flea_plugins`; new
branch `if prefix in flea_plugins: return ("flea", default_type,
prefix, local)` for flea-plugin-nested skills/agents. This branch
was added to `MarketplaceItemLookup.resolve()` in v6 (commit
e076ebbe) but the rollup builder's helper was never updated to
match, so nested skills inside flea plugins silently dropped out
of the daily/window fact tables.
- `USAGE_PROCESSOR_VERSION`: 7 → 8. Forces the session-pipeline
reprocess loop to re-attribute existing usage_events rows with
the corrected lookup so rollup tables fill correctly on the next
tick.
marketplace.py — 4 API stats lookup callsites switched from
`entity["name"]` to `entity["synthetic_name"]`:
- `_flea_to_item` (card stats lookup)
- `flea_detail` (`_build_telemetry` + `_load_inner_items_stats_by_parent`)
- `flea_skill_detail` (inner detail `parent_plugin` key)
- `flea_agent_detail` (inner detail `parent_plugin` key)
Tests:
- `skill_flea.jsonl` invocation: `flea:flea-skill` →
`flea:flea-skill-by-alice` (mirrors what Claude Code writes after
phase 1/4 — the suffixed synthetic_name).
- `test_flea_skill_attributed_with_empty_parent` assertion: rollup
`name` column now carries the synthetic_name.
No legacy `agnes-store-bundle` prefix backward compat — clean cut per
user direction (dev phase, no production data worth preserving).
Verified locally: 53 passed targeted (session_processor_usage +
usage_rollups + marketplace_filter_store) + 215 passed/2 skipped
broader (store_api + marketplace_api + admin_store_submissions +
store_entity_versions).
* fix(flea): phase-6 — plugin-level rollup aggregation parity for flea
Flea plugin entity cards + detail pages showed 0 invocations even
though nested skills had correct rollup rows. Root cause: the
plugin-level aggregation pass in `_aggregate_events` was hardcoded
to `source='curated'` only:
if source != "curated" or not parent:
continue
if group_by_day:
pkey = (day, "curated", "plugin", "", parent)
else:
pkey = ("curated", "plugin", "", parent)
So flea plugin entities never got a synthetic
`(source='flea', type='plugin', parent_plugin='', name=<synth>)`
row aggregating nested invocations. `_load_invocation_stats('flea')`
filters `parent_plugin = ''` and returned no row for flea plugin
entity cards, so `stats.get(entity["synthetic_name"])` missed and
the API exposed 0/0.
Triggered by empirical observation on the dev VM —
`codex-second-opinion-by-c-marustamyan` plugin showed 0 calls in
the listing card while its three inner skills (codex-setup ×3,
codex-review ×1, codex-second-opinion ×1) had the expected child
rollup rows.
Fix:
- Extend the guard to `source in ("curated", "flea")`.
- Replace the hardcoded `"curated"` in the `pkey` tuple with the
loop's `source` variable, so flea aggregation lands as `source=
'flea'` and curated aggregation continues landing as
`source='curated'`.
API path unchanged — `_load_invocation_stats('flea')` filters
`parent_plugin = ''` already picks up the new aggregated row
alongside standalone skill/agent rows. Rollup `name` field carries
the synthetic_name keyspace; no collision between standalone entity
synthetic and plugin entity synthetic (global suffix uniqueness
enforced by `_suffixed_already_taken`).
`USAGE_PROCESSOR_VERSION` bumped 8 → 9 to force a reprocess pass so
historic nested-invocation data fills the new plugin-level rows on
the next tick (instead of waiting for the next live invocation).
Tests:
- New `test_flea_plugin_row_aggregates_children` mirrors the existing
`test_curated_plugin_row_aggregates_children`: seeds a flea plugin
entity, three nested events (one user invoking two skills, a
second user invoking one) → asserts the aggregated plugin row
carries count=3, distinct_users=2 (union, not sum), plus the child
rows survive alongside.
Verified locally: 43 passed (session_processor_usage + usage_rollups)
+ 82 passed/2 skipped broader (+ marketplace_filter_store +
marketplace_api).
* refactor(marketplace): phase-7 — unify Details sidebar across detail surfaces
Five marketplace detail surfaces (curated plugin, flea plugin, curated
inner skill/agent, flea inner skill/agent, flea standalone skill/agent)
had drifted on which Details rows they show and what order — the same
field landed in different positions, some fields duplicated hero info,
and the flea plugin Owner row leaked the kebab-case `owner_username`
slug instead of the user's real name. This commit aligns all five
surfaces on a single scan order driven by UX priority:
identity → life-stage → telemetry → debug-tier
Concretely:
1. Curator / Owner (first scan signal — trust)
2. Parent plugin (inner skill/agent only)
3. Released (top-level only — plugins + flea standalone)
4. Last used (recency)
5. Active days (engagement consistency)
6. Version (flea standalone only — content hash)
7. Bundle size (debug-tier)
Dropped:
- Slug field on plugin detail surfaces (`marketplace_id` for curated,
`entity_id` for flea). Pure debug info, never user-relevant; URL
already carries it.
- Category + Installs on flea standalone skill/agent detail.
Category is already shown as a hero badge; install count is in
the hero telemetry chip — sidebar duplication added noise.
Owner display:
- Flea plugin Owner row now reads `d.owner_display` (resolved through
`users.name → users.email → owner_username` by `_resolve_owner_display`
in `app/api/marketplace.py:1491`) instead of the raw `d.author_name`
(which is `owner_username`, the kebab-case slug). API field already
populated from phase 2; templates just consume it.
- Curated Curator row continues to read `d.author_name` from
marketplace-metadata.json; `owner_todo` placeholder behavior
preserved.
Files:
- app/web/templates/marketplace_plugin_detail.html — rewrote the
Details render loop (lines 1364-1427 area). Slug row removed,
rows reordered, Owner branch reads `d.owner_display`.
- app/web/templates/marketplace_item_detail.html — both branches of
the Details sidebar (inner skill/agent + flea standalone) re-laid
around the same scan order. Telemetry helper unchanged, just
repositioned. Category + Installs rows removed from the
standalone branch.
No new tests — no existing test asserts the precise order of Details
rows or references the dropped fields in a sidebar context (grep
confirmed). API surface unchanged.
Verified locally: 84 passed / 2 skipped on `test_marketplace_api.py`
+ `test_store_api.py`.
* fix(flea): post-review hardening — N+1, v50 UNIQUE, docs, test cleanup
Addresses 5 critical findings from PR #342 code review:
1. N+1 query in `_flea_to_item` — owner-display resolution previously
ran one `SELECT … FROM users WHERE id = ?` per item in the listing
comprehension. Now batched via `_load_users_display` IN-query
prefetch; 50 items drops 51 user queries to 2. Regression-guarded
by `TestFleaOwnerDisplayBatched` (spies `_resolve_owner_display`
and asserts it's not called inside the list path).
2. Misleading comment in `src/marketplace_filter.py` claimed the
attribution layer accepts both `agnes-store-bundle` and `flea`
prefixes — it doesn't (clean cut per CHANGELOG). Rewrote to match
reality.
3. CHANGELOG `[Unreleased]` had two `### Changed` blocks. Merged into
one (BREAKING bullet first).
4. New v49→v50 migration adds `UNIQUE INDEX
idx_store_entities_synthetic_name`. v49 made `synthetic_name` the
canonical attribution key but uniqueness was only app-enforced;
v50 promotes the invariant to the DB layer. Migration pre-checks
for existing duplicates and raises `RuntimeError` listing them
rather than letting `CREATE UNIQUE INDEX` fail mid-way. v48→v49
migration gained an `is_nullable='YES'` guard on its `SET NOT NULL`
ALTERs so re-runs on a fully-migrated DB don't trip DuckDB's
"cannot alter entry … entries depend on it" block (the new index
counts as such an entry). Index is created by the migration only —
keeping it out of `_SYSTEM_SCHEMA` preserves fresh-install ordering
(CREATE TABLE → v49 ALTERs → v50 CREATE INDEX).
5. Deleted three redundant version-pinned schema asserts whose names
lied about their bodies (`test_schema_version_is_42` asserting
`== 49`, etc.). Canonical assert lives in
`test_db_schema_version.py`, renamed to
`test_schema_version_matches_constant`.
* fix(db): gate v34→v38 store_entities ALTER COLUMN steps on column state
CI on Linux failed `test_v17_to_v18_drops_*` after the v50 UNIQUE INDEX
landed. Root cause: those tests open a DB at the full target version,
seed fixtures, then reset `schema_version` to 17 and reopen — forcing
the ladder to re-run from 17 → current. With the v50 index now in place,
DuckDB blocks intermediate `ALTER COLUMN` steps on `store_entities`
("Cannot drop this column: an index depends on a column after it!" /
"Cannot alter entry because there are entries that depend on it"),
because `synthetic_name` (the indexed column) sits positionally after
the columns those steps touch.
Fix: convert the three SQL-list migrations that hit store_entities into
defensive Python functions:
- `_v34_to_v35_migrate` short-circuits when `synthetic_name` already
exists (post-v49 shape — the visibility_status rebuild is moot and
the DROP COLUMN would be blocked by the index).
- `_v35_to_v36_migrate` gates the `visibility_status SET NOT NULL` +
`SET DEFAULT` on `is_nullable='YES'` so it's a true no-op when the
column is already constrained.
- `_v37_to_v38_migrate` gates the `version_no SET NOT NULL` step the
same way.
Forward-roll path (real installs that never reset schema_version) is
unchanged: the gates fire `YES` → ALTERs run. The fix only changes
behavior for the "DB is already at v50 shape but version row says 17"
scenario the tests construct.
---------
Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
* feat(telemetry): marketplace item rollup refactor (schema v46)
Replace the v42 attribution layer with prefix-split + live lookup against
marketplace_plugins / store_entities. The v42 design had a latent bug —
AttributionLookup keyed on bare skill names while Claude Code writes
`<plugin>:<local>` in JSONL, so lookups never matched and
usage_plugin_daily stayed empty in every deployment.
Schema (v46 migration):
- Drop usage_attribution_skills / _agents / _commands (mapping tables,
derivable from marketplace_plugins + plugin tree).
- Drop usage_plugin_daily (always empty in production due to the bug above).
- Create usage_marketplace_item_daily — per-day fact (count, distinct_users,
error_count), composite PK on (day, source, type, parent_plugin, name).
- Create usage_marketplace_item_window — sliding-window snapshot with
true cross-window distinct user counts; period_label='last_7d' refreshes
every tick, 'last_30d' refreshes hourly (tracked via session_processor_state).
- Mark usage_tool_daily as candidate for removal (no product-UI consumer).
Attribution flow:
- MarketplaceItemLookup replaces AttributionLookup. Preloads
marketplace_plugins.name + store_entities.name into memory once per
UsageProcessor tick, then per-event splits identifier on ':',
matches prefix, writes resolved source / parent_plugin into
usage_events. agnes-store-bundle prefix routes to flea entities.
Slash commands with `plugin:` prefix count as type='skill' in rollup.
API:
- BREAKING: MarketplaceItem.unique_users_30d renamed to distinct_users_30d
(now a true distinct count from the window snapshot, not sum-of-daily).
- InnerDetailResponse gains a telemetry field — invocations_30d +
distinct_users_30d surfaced on curated inner skill / agent detail pages.
- Card chip hidden pending UX finalisation; data stays in the response.
Backfill: scripts/backfill_marketplace_rollup.py — one-shot rebuild over
historic usage_events after deploy, idempotent.
USAGE_PROCESSOR_VERSION bumped 4 → 5 so the reprocess loop re-attributes
existing events to the new source/ref_id semantics on the next tick.
Tests rewritten: test_session_processor_usage, test_usage_rollups,
test_marketplace_telemetry, test_api_admin_usage_reprocess,
test_db_schema_version, test_home_stats, test_schema_v42_migration.
New: test_backfill_marketplace_rollup.
* fix(marketplace): refresh Most Popular on search + category changes
`loadMostPopular()` early-exits when `state.q` or `state.category` is
set, but the search + category handlers only called `loadItems()` —
so once the section was visible, typing a query or filtering by
category didn't re-run the hide check and the cards stayed on screen
out of scope. Tab + sort handlers already chained the call.
Add the call to runSearch + category pill click handlers (All +
per-category) so the visibility contract holds for every state
mutation that can flip the early-exit condition.
* feat(marketplace): All-plugins section + 7-day Most Popular
Listing layout:
- Always-visible "All plugins" / "All items" / "Your stack" section
header (label swaps per tab) wrapped in `#mp-all-section` so its
margin-collapse mirrors the sibling `#mp-popular-section` and the
spacing from the filter row stays consistent in both layouts.
- Sort dropdown moved from the filter row into the All-* header,
pinned right via `margin-left: auto`. Anchored to its section so
the relationship between sort + grid is obvious.
- `.mp-section-header` gets `min-height: 32px` + `align-items: center`
so the bare-text Most Popular row matches the dropdown-bearing
All-* row.
- `.mp-section-header` margin tightened 24px → 20px on top.
Most Popular:
- Capacity reduced 8 → 4 cards.
- Now reflects a 7-day window (was 30-day). Backend surfaces
`invocations_7d` + `distinct_users_7d` on `MarketplaceItem`
alongside the existing 30d fields; the loader pulls a wider page
(server still sorts by 30d) and re-sorts + filters client-side
on `invocations_7d > 0` so the strip stays "hot right now".
- Section label updated to "Last 7 days".
- Section now renders on both `curated` and `flea` tabs (was
curated-only). Hidden on `my` and whenever search / category
filter is active. Refresh hooks wired into search + category
click handlers so visibility flips immediately on state change.
Backend (`_load_invocation_stats`):
- Single SELECT pulls both `last_30d` and `last_7d` rows from
`usage_marketplace_item_window`; the result dict carries
invocations + distinct_users for both windows.
- Trend (recent_7 vs prior_7) kept on the daily fact table so it
stays independent of the window snapshot's freshness.
* feat(marketplace): Most adopted sort + hide Trending when no trend data
Add a fourth sort option to the All-items dropdown — "Most adopted
(30d)", keyed on `MarketplaceItem.distinct_users_30d` (true 30d
distinct user count from `usage_marketplace_item_window`). Protects
the listing from power-user skew that `most_used` is susceptible to:
one user × 100 invokes can't beat 10 different users × 1 invoke
under adoption sort.
Hide Trending option when the response has no trend data. User
reported `sort=trending` returning an empty grid because every
plugin's `trend_pct` was None (prior-week threshold of >= 3
invocations didn't clear anywhere). Empty grids on a user-selected
sort are worse UX than just not offering the sort — surface what
works, hide what doesn't.
Backend (`app/api/marketplace.py`):
- `_apply_sort` gains a `most_adopted` branch (DESC distinct_users_30d,
ties by name ASC).
- `sort` Literal extended.
- `ItemListResponse.available_sorts` lists the sort keys the UI
should expose for this response. recent/most_used/most_adopted
always; trending only when at least one item in the tab's stats
carries a non-null trend_pct.
- `_available_sorts(stats_dicts)` helper centralises the rule —
curated and flea branches pass one stats dict, my-tab passes both
(option is available when either source has trend data).
Frontend (`app/web/templates/marketplace.html`):
- New `<option value="most_adopted">Most adopted (30d)</option>`
between Most used and Trending.
- URL state allowlist extended so `?sort=most_adopted` round-trips.
- `applyAvailableSorts(available)` runs after each list fetch:
hides options not in the response's available_sorts; if the user
is on a now-unavailable sort, resets to 'recent' and re-fetches.
Search-mode fan-out unions availability across the curated + flea
responses so a hit on either side keeps the option visible.
* feat(marketplace): funnel chip on cards + deterministic Most Popular sort
Card chip — funnel telemetry between description and footer:
[stack-icon] N installed · [user-icon] N active · [bolt-icon] N calls · ↑/↓ N%
- stack_count (new MarketplaceItem field): for curated it's COUNT(*)
on user_plugin_optouts (post-v28 row PRESENCE = subscribed; system
plugins are fanned out to every user via fanout_system_for_user so
the count includes them naturally). For flea it reuses the existing
store_entities.install_count (bumped on install/uninstall).
- distinct_users_30d (existing) — active users in the 30d window.
- invocations_30d (existing) — call volume.
- trend_pct (existing) — week-over-week, both directions: green ↑ /
red ↓, magnitude only (sign in the arrow). Hidden when null.
Backend additions in app/api/marketplace.py:
- MarketplaceItem.stack_count field.
- _load_curated_stack_counts() — one SELECT per render, GROUP BY
(marketplace_id, plugin_name). Wired into the curated + my-tab
branches; flea reads install_count off the entity row directly.
Frontend (app/web/templates/marketplace.html):
- Heroicons solid 24×24 inlined (one helper per icon, all
fill="currentColor" so per-segment colour tokens apply): rectangle-
stack (mirrors the My Stack tab icon), user, bolt, arrow-trending-
up/down.
- Per-segment colour: installed=amber #F59F0A (My Stack accent),
active=green #0e9b6a, calls=orange #f97316. Text stays neutral so
the chip still reads as metadata, the leading glyph carries the
visual cue. Trend pill keeps the full-segment green/red colour.
- Zero state: chip hidden when stack_count == 0 AND invocations_30d
== 0 — brand-new cards aren't visually penalised by a "0·0·0" row.
- Tooltips on every segment via title="…" so hover explains the
number's meaning to anyone uncertain about the icon.
Most Popular section — deterministic ordering:
Previously sorted by invocations_7d DESC with no tie-breakers, so
several cards with identical 7d call counts would swap places on
refresh (JS stable sort fell back on backend order, and the backend's
own tie-breaker for `most_used` was just name ASC — six `grpn`
plugins from six test marketplaces collapse to the same name and
became indeterminate via list_with_filters' created_at order).
New cascading hierarchy (chosen primary now matches what "most
popular" really means — wide adoption, not power-user volume):
1. distinct_users_7d DESC ← adoption / social proof
2. invocations_7d DESC ← volume at equal adoption
3. distinct_users_30d DESC ← broader adoption fallback
4. invocations_30d DESC ← broader volume fallback
5. name ASC ← deterministic textual order
6. marketplace_slug ASC ← splits duplicate plugin names across
marketplaces
Six levels guarantee any two items end at a different sort key, so
the strip is stable across refreshes.
* fix(marketplace): unify Most Popular on 30d + right-align installed chip
Most Popular section was sorting on the 7d window while its cards
rendered 30d numbers — header label promised one thing, cards showed
another. Unified everything on 30d so a card means the same data
everywhere on the page.
- Dropped the "Last 7 days" meta from the Most Popular header.
- Sort cascade now starts on distinct_users_30d, then invocations_30d,
with 7d adoption/volume as recency-aware fallbacks before the name +
marketplace_slug deterministic tail. Six levels guarantee identical
sort keys never produce indeterminate order across refreshes.
- Filter switched from invocations_7d > 0 to invocations_30d > 0 to
match the new horizon.
- Most Popular now only renders on page 1 of the listing. Past initial
discovery, a top-of-list popularity strip on page 2+ would shadow the
results the user paged into. Pager click handler refreshes the
section so navigating back to page 1 re-mounts it.
Chip layout — split engagement vs adoption visually:
[user] N active · [bolt] N calls · [↑/↓] N% [stack] N installed
└────────── LEFT (time-bounded engagement) ────┘ └── RIGHT (all-time) ──┘
- Installed (stack_count) is all-time, decremented on uninstall. Alone
it says little ("12 people installed it") without the engagement
context next to it ("…but did anyone actually use it?"). Visually
separating the two groups makes that distinction obvious — left
group answers "is it used", right answers "does anyone have it".
- Implemented via flex with margin-left:auto on .seg-installed so
installed drifts to the trailing edge.
- Installed tooltip now reads "Currently installed by N users" — the
count is a real-time net (uninstall drops it), and saying "currently"
makes that explicit. Helps when a card shows 0: signals "nobody has
this in their stack right now", not "data missing".
* feat(plugin-detail): telemetry chip in hero, derived rows in sidebar
Surface the same telemetry funnel the listing card carries on the
curated plugin detail page, so clicking through from /marketplace
keeps a single mental model — figures match, semantics match. The
detail sidebar drops the two raw numbers that used to live there
(Invocations 30d / Users 30d — duplicated by the chip now) and
replaces them with two *derived* signals only the daily series can
provide: Active days + Last used.
Backend (app/api/marketplace.py):
- PluginDetailResponse.stack_count — curated reads via
_load_curated_stack_counts(), flea reuses install_count. Frontend
treats both sources uniformly.
- _build_telemetry() always returns a dict (never None). Frontend
decides chip visibility from stack_count + invocations_30d the
same way the listing card does. daily_series is always 30 entries
(zero-padded) so "Active days" and "Last used" derivations on the
sidebar are trivial array filters.
Frontend (app/web/templates/marketplace_plugin_detail.html):
- New .hero-telemetry slot at the bottom of the hero meta column,
between the pills row and the action buttons. Renders the four
funnel segments — active · calls · trend · installed — joined by
` · `. No left/right split: the hero has space, so a single
coherent metadata strip reads cleaner than the card's split layout.
- Heroicons solid inlined (user / bolt / arrow-trending-up,-down /
rectangle-stack) recoloured against the dark hero — icons in
lighter tokens (mint #6ee7b7, peach #fdba74, cream #fde68a), trend
pill keeps the saturated green/red because direction-coding earns
its own colour.
- Tooltip on installed reads "Currently installed by N users" — the
count is a real-time net (drops on uninstall), and "currently"
makes that explicit when a card shows 0.
- fmtNum helper added so 1.2k / 14M renderings match the card's
format exactly.
- Sidebar swap: Invocations + Users rows removed, replaced by
Active days → "N of 30"
Last used → fmtRelative of the latest non-zero day
Both derived from telemetry.daily_series — engagement consistency
+ recency, neither of which the hero chip exposes on its own.
* feat(item-detail): telemetry chip in hero for curated skill/agent
Bring the funnel chip the plugin detail page got in 4cf38d40 to the
curated inner skill/agent detail page — clicking through from the
listing card now keeps the same metadata strip from grid to plugin
page to inner item page.
Backend (app/api/marketplace.py):
- _load_inner_item_stats() rewritten:
* always returns a dict (never None) so the frontend can decide
chip visibility client-side, same contract as _build_telemetry
* adds trend_pct, computed the same way as plugin level
(recent_7 vs prior_7 from usage_marketplace_item_daily, ≥3
prior-week threshold)
* adds daily_series (30 entries, zero-padded) so the sidebar can
derive Active days + Last used
- InnerDetailResponse.parent_stack_count — new field. Skills/agents
don't have a per-item subscription model, so the hero shows the
*parent plugin's* stack count under a "Plugin:" prefix. The
funnel: "12 installed plugin → 2 actually use this skill".
- curated_skill_detail + curated_agent_detail handlers load
_load_curated_stack_counts() once and pass the parent's value.
Frontend (app/web/templates/marketplace_item_detail.html):
- New .item-detail .hero .hero-telemetry slot beneath the badges
row. CSS mirrors plugin-detail's colour tokens (mint/peach/cream
Heroicons solid + saturated trend pill) so the two surfaces read
as one visual family.
- Installed segment uses a "Plugin:" label rendered with reduced
opacity to signal the metric describes the parent, not the item
itself. Tooltip: "Parent plugin (<plugin_name>) currently
installed by N users".
- Sidebar Invocations + Users rows removed (chip carries them).
Active days + Last used derived from telemetry.daily_series replace
them; only rendered when activeDays > 0 so a brand-new skill
doesn't show "0 of 30" / "Last used —".
- "Type" row dropped from the sidebar — duplicates the hero badge.
- fmtNum helper added (matches listing card + plugin detail).
Plugin detail (app/web/templates/marketplace_plugin_detail.html):
- Hero "Curator: …" line removed. The Details sidebar already
carries that info; duplicating it under the h1 was visual noise.
- Sidebar "Owner" row renamed to "Curator" — for curated plugins
it's a person who curates inclusion in this Agnes instance, not
the upstream code owner. "Owner" was a hold-over label.
* feat(item-detail): unify hero with plugin detail — pills + breadcrumb + cleaner sidebar
- Inner skill/agent hero now uses the same `.pills` / `.pill.cat / .curated /
.flea / .muted` class names + CSS as the plugin detail page; the only
item-only addition is `.pill.type` (Skill / Agent uppercase, plugin detail
has no kind axis).
- Hero `Updated` moved out of the meta-row into a muted pill (mirrors the
plugin detail hero), removed from the Details sidebar to avoid duplication.
- Details sidebar slimmed: dropped Marketplace, Path, Updated rows; Parent
plugin now shows the curator-friendly display name
(`parent_display_name || manifest_name || slug`) instead of the slug.
- Breadcrumb extended to full path: Marketplace > <marketplace_name> >
<plugin display name> > <self>, mirroring the plugin detail breadcrumb.
- Backend: new `InnerDetailResponse.parent_display_name` field, populated via
`_curated_plugin_enrichment` from marketplace-metadata.json — same source
plugin detail hero already uses.
* feat(marketplace): flea inner skill/agent detail + breadcrumb polish
- Flea inner skill/agent detail page parity with curated:
* GET /api/marketplace/flea/{id}/skill/{name} + /agent/{name}
returning InnerDetailResponse (mirror of curated_skill_detail).
* /marketplace/flea/{id}/skill|agent/{name} web routes that render
marketplace_item_detail.html with source='flea' + innerName context.
* Frontend apiURL grows a third branch for flea-inner; breadcrumb
grows to 4 segments (Marketplace > Flea Market > <plugin display
name> > <self>) when innerName is set.
* Telemetry attribution: MarketplaceItemLookup resolves
<flea_plugin>:<inner> prefixes to (source='flea',
parent_plugin=<plugin name>) so nested invocations land in the
same rollups curated nested skills use. USAGE_PROCESSOR_VERSION
bumped 5 -> 6 so the reprocess loop re-attributes historic events.
- Breadcrumb 2nd segment is now a generic clickable "Curated
Marketplace" / "Flea Market" link to /marketplace?tab=... instead
of the opaque per-instance marketplace_name. Applied on both plugin
detail and inner item detail.
- Inner item hero telemetry chip works for both sources: installedCount
branches on parent_stack_count (curated) vs install_count (flea),
installed segment drops the "Plugin:" prefix for flea standalone /
inner items.
- Updated row dropped from Details sidebar on item detail — the hero
pill already carries the value, sidebar row was duplicate.
* feat(item-detail): block stack-install on flea inner items (mirror curated)
Inner skills/agents nested inside a flea plugin can no longer be added
to a user's stack on their own — adoption only happens at the plugin
level, same rule curated nested items have followed since launch.
- Hero action: when innerName is set (curated nested OR flea nested),
render "Open parent plugin →" link + helper text instead of the
install/remove buttons. Flea standalone entities (no innerName) keep
the normal install UX.
- Meta-row: same branch now serves curated + flea inner — "part of
<parent plugin display name> · by <author>" with the parent link
pointing at the right detail page per source.
No API gate change needed: POST /api/store/entities/{id}/install only
accepts existing entity ids (plugin-level), inner items have no entity
id of their own so the endpoint cannot target them directly.
* feat(marketplace): telemetry chip on inner cards + fix flea hero chip visibility
Inner skill/agent cards on the plugin detail page now carry the same
four-segment funnel chip the marketplace listing cards show (N active
. N calls . trend . N installed), for both curated nested skills and
flea nested skills. Plus two fixes that were keeping the hero chip
hidden on flea plugin / flea inner detail pages.
- Backend `_load_inner_items_stats_by_parent(conn, source, parent_plugin)`
bulk loader: one query per plugin against usage_marketplace_item_window
+ one against _daily, returning {(name, type): stats}. Avoids N+1
per-card lookups.
- `InnerItemSummary` gains invocations_30d / distinct_users_30d /
trend_pct / parent_stack_count fields. `curated_detail` and
`flea_detail` (in the entity.type=='plugin' branch) enrich the
skills / agents lists after the existing cover-photo enrichment loop.
- `marketplace_plugin_detail.html`: new `.plugin-detail .inner-card
.inv-chip*` CSS lifted from marketplace.html with the listing-card
rules, new buildInnerCardChip() helper, buildCardSection appends
the chip to each card body. Same gate as the listing card (hidden
on parent_stack==0 && calls==0).
- fix(flea): flea_detail forgot to populate PluginDetailResponse.stack_count
from entity.install_count (listing card does this on line 851; detail
endpoint didn't). Hero chip gate `stackCount===0 && calls===0` then
always hid the chip even when the entity had installs. Now mirrors
listing card semantics: stack_count == install_count for flea.
- fix(flea inner): renderInnerHeroTelemetry was reading `d.install_count`
for any non-curated source. InnerDetailResponse has no install_count
field — it has parent_stack_count (populated server-side from the
parent flea plugin's install_count). Gate + label now read
parent_stack_count for both curated nested AND flea nested scenarios;
install_count remains the flea standalone path.
* fix(marketplace): Owner label on flea + parent-centric sidebar for flea inner
- Plugin detail Details sidebar — authorship row label now tracks the
source: curated bundles get `Curator` (existing behaviour), flea
bundles get `Owner`. The `owner_todo` reminder placeholder stays on
the curated branch only; flea falls through silently.
- Inner item detail Details sidebar — flea-inner (skill/agent nested
inside a flea plugin) now shares the curated nested layout: Parent
plugin / Bundle size / Active days / Last used / Owner. Drops the
flea-standalone shape's `Category`, `Version`, `Installs`, `Released`
rows that didn't apply to a nested item. Active days + Last used were
already wired (telemetryRows) — they just weren't on the flea-inner
branch.
* fix(tests): bump SCHEMA_VERSION assertions 47 -> 48 post-rebase
The marketplace telemetry migration was renamed _v46_to_v47 -> _v47_to_v48
during the rebase onto main (collision with #326 FTS BM25 migration that
took the v47 slot). Two test files still asserted the pre-rebase value:
- tests/test_home_stats.py::test_schema_version_constant_is_46 (CI red)
- tests/test_schema_v46_migration.py::test_schema_version_is_46
Renames the helper fn name + bumps the assertion. The other two test
files (test_db_schema_version.py, test_schema_v42_migration.py) were
already updated in the rebase resolution.
* fix(telemetry): _build_telemetry returns None when invocations_30d == 0
The follow-up commit that introduced the always-return-dict shape broke
the test contract from the original v46 PR (commit b603e998):
tests/test_marketplace_telemetry.py::TestDetailTelemetry::
test_detail_endpoint_telemetry_absent_when_no_data
AssertionError: assert {'daily_series': [...], ...} is None
Both `PluginDetailResponse.telemetry` and `InnerDetailResponse.telemetry`
are declared `Optional[Dict] = None`, the frontend renders are None-safe
(`d.telemetry || {}` guard + `if (!d.telemetry || ...)` on daily_series),
so dropping the dict on zero activity is the cleaner default.
* release: 0.54.21 — marketplace telemetry refactor (schema v48) + flea inner detail parity + listing UX polish
---------
Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
CLAUDE.md rewritten (708 -> ~320 lines): four overlapping release
sections collapsed to one, stale v1->v35 schema history dropped (it
lives in CHANGELOG), marketplace endpoint internals and verbose
process sections moved out or tightened.
New focused docs:
- docs/RELEASING.md - release process, deploy workflows, CI quirks
(RELEASE_TEMPLATE.md folded in as an appendix)
- docs/marketplace.md - marketplace ingestion + re-serving internals
- docs/README.md - documentation index by audience, linked from
README.md and CLAUDE.md
Archived under docs/archive/: docs/superpowers/ (52 historical
planning artifacts), HACKATHON.md, pd-ps-comments.md,
security-audit-2026-04.md, future/NOTIFICATIONS.md.
Removed the docs/auto-install.md stub. Fixed dangling links in
connectors/jira/README.md and dev_docs/README.md, repointed
code/doc references to archived paths.
* fix(security): RBAC filter for agnes_sessions matches both email local-part and user_id
The upload API (POST /api/upload/sessions) stores session files under
user_sessions/{user_id}/ (UUID), while the session collector uses the
OS username (email local-part). The session pipeline writes the directory
name verbatim into usage_session_summary.username, so the column can
contain either value depending on the ingestion path.
The RBAC filter in build_filter_clause previously only matched the email
local-part, missing sessions uploaded via the API. The fix adds an OR
condition so non-admin users see rows where username matches either their
email local-part or their user_id.
Closes#293
Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>
* fix(security): RBAC filter uses stable user_id instead of mutable email local-part
Closes#293
Previous fix used OR condition matching both email local-part and user_id
in the username column. This was fragile: email changes would break
filtering. This commit introduces a dedicated user_id column populated
by the session pipeline via resolve_user_id(), and switches the RBAC
filter to use it exclusively.
Changes:
- Schema v45: add user_id column to usage_session_summary and usage_events
- UsageProcessor: accept and store user_id in both tables
- runner.py: resolve_user_id() maps directory name to users.id UUID
(exact match for UUID dirs, email LIKE for local-part dirs)
- INTERNAL_TABLES: agnes_sessions/agnes_telemetry filter on user_id column
- build_filter_clause: simplified to WHERE user_id = '<uuid>' (no OR)
- me.py/admin_user_sessions.py: query by user_id OR username for
backward compatibility during transition
- USAGE_PROCESSOR_VERSION bumped 2→3 to trigger reprocessing/backfill
- Tests updated: 27 pass including new email-change resilience test
Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>
* fix(tests): bump schema version assertions 44→45
Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>
* fix(docs): correct resolve_user_id docstring, add TypeError comment
Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>
* fix(security): address review — backward-compat OR, LIKE escaping, narrower TypeError
Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>
* fix(security): address code review — eliminate TypeError hack, add resolve_user_id tests
Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>
* fix(db): create user_id indexes in _v44_to_v45, not _SYSTEM_SCHEMA
_SYSTEM_SCHEMA runs before the migration ladder. On an upgrade from
v42/v43/v44, usage_events / usage_session_summary already exist without
the user_id column (CREATE TABLE IF NOT EXISTS is a no-op), so the
CREATE INDEX ... (user_id) lines in _SYSTEM_SCHEMA failed to bind and
aborted _ensure_schema — the app would not start post-upgrade. Move the
index creation to _v44_to_v45, which ADDs the column first. Same pattern
as the v41 audit_log indices.
* fix(usage): bump USAGE_PROCESSOR_VERSION 3→4 for user_id backfill
#303 shipped USAGE_PROCESSOR_VERSION=3 (release 0.54.12) for its
<command-name> slash extraction. This PR's 2→3 bump collided with it
on rebase, so the reprocess loop would not re-trigger to backfill the
new user_id column on deployments already running v3. Bump to 4.
* release: 0.54.13 — RBAC filter uses stable user_id (#293)
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
UsageProcessor missed every user-typed slash invocation because the
SLASH_RE regex (^\s*/<name>) expected a raw "/foo" prefix that Claude
Code never writes. Real session jsonls wrap slash commands in a
<command-name>/foo</command-name> XML tag inside user message content.
Result on production: usage_events.command_name and
usage_session_summary.slash_commands stayed NULL/0 for /clear, /exit,
plugin commands like /plugin:name — verified on 17 dev-VM jsonls
holding 25 <command-name> tags / 0 extracted rows.
Replaces SLASH_RE with COMMAND_NAME_RE that searches for the tag
anywhere in the user text (the tag sits after a <command-message>
sibling). USAGE_PROCESSOR_VERSION bumps 2 → 3; operators wanting to
rewrite historical rows under the new logic call
POST /api/admin/usage/reprocess (agnes admin telemetry reprocess).
Fixtures slash_command.jsonl, mixed.jsonl, skill_curated.jsonl
rewritten from the unrealistic "/foo args" string format to the real
<command-name> tag wrapper — existing assertions stay green against
the new format, which is the regression baseline going forward. Adds
TestCommandNameTagExtraction (4 unit tests on iter_events) covering
string content, list-of-text-blocks content, mid-text tag position,
and plain-prose "/x" non-match.
Implicit Skill tool_use extraction (LLM-decided invocations)
unchanged.
Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
* feat(home): status frame on /home — last sync, sessions, prompts, tokens, projects
Adds the homepage status frame: a 5-card row above the install-hero /
offboard-strip on /home showing the calling user's Last sync (their
last `agnes pull`), Sessions, Prompts, Tokens used, and Projects worked
on, with a 24h/7d pill toggle.
Backed by `GET /api/me/home-stats?window=` (one DuckDB CTE joining
`users` + `usage_session_summary` + `usage_events`) and SSR'd from the
same `compute_home_stats` helper on initial paint so there's no
spinner. The window toggle is the only JS-driven path.
Side surfaces:
- `GET /api/sync/manifest` now stamps `users.last_pull_at` so
`agnes pull` (and the Claude Code SessionStart hook that wraps it)
imprints the analyst's last sync time for the new card.
- `usage_session_summary` gains four BIGINT token counters
(input_tokens, output_tokens, cache_read_tokens, cache_creation_tokens)
summed from JSONL `message.usage.*` per assistant turn.
- `USAGE_PROCESSOR_VERSION` bumps 1 → 2 so the session-pipeline
reprocess loop invalidates stale summaries and backfills tokens
on the next tick.
Schema migration v43 → v44 is idempotent ALTERs (last_pull_at +
4 token columns) — fresh installs receive them from `_SYSTEM_SCHEMA`,
upgrade path runs `_v43_to_v44`. Defaults (NULL / 0) backfill
existing rows cleanly.
9 new tests in tests/test_home_stats.py cover the migration,
endpoint shapes (24h/7d/unknown/empty/missing-user), and the
manifest-side last_pull_at bump.
* docs(CHANGELOG): homepage status frame entries under [Unreleased]
The post-rebase release-cut now belongs to whichever PR lands next
after main rolled to 0.54.9. This PR logs its bullets under
[Unreleased] (Added: homepage status frame, per-user pull tracking,
token counters; Changed: schema v43 → v44 migration) so they ride
out with the next release-cut.
* fix(tests): bump test_schema_v42_migration asserts to v44
CI failed because tests/test_schema_v42_migration.py hardcoded
`assert SCHEMA_VERSION == 43` and `assert v == 43` after init.
v44 (homepage stats frame backing columns) was introduced in the
preceding feat commit; this aligns the existing v42-era migration
tests with the new schema version.
* feat(home): gate status frame on operator flag + user.onboarded
Two gates on the homepage status frame:
1. **Operator master switch** — `get_home_status_frame_visibility()` in
app/instance_config.py mirrors the existing `get_home_automode_visibility()`
shape: env var `AGNES_HOME_SHOW_STATUS_FRAME` > yaml
`instance.home.show_status_frame` > default `True`. Cautious-rollout
instances can disable the frame without forking; the yaml example
documents both knobs.
2. **Onboarded gate** — the template only renders the frame when the
caller's `users.onboarded` is true. First-day users see a clean
install-hero before all-zero stat cards; the frame appears
automatically on the next render after `agnes init` POSTs
`/api/me/onboarded`.
Router skips the `compute_home_stats` DB read entirely when either
gate is closed; `home_stats` arrives at the template as None in that
branch and the `{% if %}` shortcuts the include.
Why both gates: PostHog feature flags evaluated and rejected — this
codebase uses PostHog for analytics capture only, not feature gating;
adding a per-user feature_enabled() call on the /home critical path
would couple the homepage render to a remote eval and still require
an admin master switch. The onboarded gate is a UX coherence rule
layered on top of the operator switch, not an A/B test signal.
3 new tests in test_home_stats.py cover the env-var resolution
(falsey values + default-true). The yaml example gets a `home:`
block documenting both `show_automode` (pre-existing flag, was
undocumented in the example) and `show_status_frame`.
* docs(spec): admin observability spec + Activity Center MVP plan
Parent spec (480 lines) + executable plan (2295 lines, 14 TDD tasks).
Covers Activity Center rebuild (/admin/activity), with /admin/sessions
and /admin/feedback deferred to follow-up plans.
Already incorporates reviewer-pass revisions across three angles
(security, production resilience, code architecture):
- _get_db import path corrected to app.auth.dependencies
- Test fixtures aligned with seeded_app / admin_user / get_system_db
- All new audit writes wrapped in try/except + logger.exception
- Filename sanitization on session uploads
- DuckDB DESC index behavior documented; upgrade window flagged
- Migration idempotency + evolved-DB test cases
- reveal_raw + shared-cache multi-worker explicitly deferred
Targets schema v40 (audit_log gains params_before, client_ip,
client_kind, correlation_id + 3 indices).
* feat(db): schema v40 — audit_log gains params_before, client_ip, client_kind, correlation_id + 3 indices
* chore(test): clean up Task 1 — drop unused import, rename stale test
* feat(audit): AuditRepository.log() accepts params_before/client_ip/client_kind/correlation_id
* test(audit): strengthen params_before assertion to round-trip JSON content
* feat(audit): AuditRepository.query() rich filters + keyset cursor pagination
* feat(sync): SyncStateRepository.list_recent() cross-table feed
* feat(audit): POST /api/sync/trigger writes audit_log row
* feat(audit): POST /api/scripts/run-due writes audit_log row
* feat(audit): POST /api/upload/sessions writes audit_log row + sanitizes filename
* feat(audit): GET /api/data/{table_id}/download writes audit_log row
* feat(activity): /api/admin/activity timeline + /health + /sync endpoints
* feat(ui): /admin/activity rebuilt — health pulse, timeline, sync grid; /activity-center → 308 redirect
BREAKING: removed demo executive-pulse / maturity-roadmap content from activity_center.html.
The page now reflects real audit_log + sync_history data.
* feat(ui): admin nav + dashboard widget point at /admin/activity
* feat(activity): recursive-audit suppression for AC read endpoints (60s window per actor+filter)
* feat(activity): emit PostHog events when integration enabled (no-op default)
* fix(audit): move v40 indices out of _SYSTEM_SCHEMA + update test_repositories to unpack query() tuple
_SYSTEM_SCHEMA CREATE INDEX on audit_log(timestamp) failed when migration
tests hand-roll a bare audit_log (id, action) without the timestamp column.
Fix: remove indices from _SYSTEM_SCHEMA; add ADD COLUMN IF NOT EXISTS guards
for timestamp and other pre-v40 columns in _v39_to_v40() so the upgrade path
is safe on any hand-rolled schema; call _v39_to_v40 explicitly in the
fresh-install (current==0) path to restore index creation there.
Also unpack the (rows, next_cursor) tuple from AuditRepository.query() in
the three TestAuditRepository tests that still treated it as a list.
* docs: CHANGELOG entry for Activity Center MVP
* chore: refresh stale module docstring in app/api/activity.py
* feat(cli): agnes admin activity — terminal access to Activity Center (timeline + health + sync)
* fix(db): _v39_to_v40 — add IF NOT EXISTS guard for 'action' column
The v39→v40 ladder step adds defensive ADD COLUMN IF NOT EXISTS for
every audit_log column so a hand-rolled bare audit_log (id only) is
safe through the ladder. 'action' was missing from the guard list,
causing CREATE INDEX idx_audit_action_time to fail on tests that
stub audit_log with only an id column (tests/test_e2e_extract.py::
TestSchemaMigration::test_migration_preserves_and_extends).
Local 6/6 schema tests + the previously-failing CI test pass.
* docs(spec): platform telemetry epic — Boss directive + Activity Monitoring plan rebased onto v40 (stacked on zs/spec-activity-center)
* feat(db): schema v41 — 7 usage_* tables for telemetry (events, summary, rollups, attribution)
* chore(db): tighten v41 — usage_session_summary.session_id NOT NULL + upgrade test asserts all 7 tables
* feat(usage): UsageAttributionRepository — replace/delete/lookup over usage_attribution_* tables
* refactor(marketplace): extract list_inner_skills/agents/commands to src/marketplace_listing.py for reuse
* feat(usage): explode plugin attribution on marketplace sync + store entity write; backfill script
* refactor(marketplace): finish src/marketplace_listing.py extraction — drop duplicate _list_inner_* + _parse_frontmatter from app/api/marketplace.py
* feat(usage): promote attribution helpers to src/usage_attribution_helpers.py; hook update_entity rename + bundle-swap; clarify best-effort semantics
* feat(usage): UsageProcessor real extraction + rollup rebuild + 10 fixture-driven tests
* fix(usage): include tool_id in event hash + executemany + rollup transaction (critical multi-tool-turn drop fix)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(marketplace): popularity stats — invocations_30d + trend + sort=most_used|trending + Most Popular section
* feat(admin): /admin/users/<id> Sessions section — list + single-file + bulk-zip downloads (audit-logged)
* feat(usage): admin export endpoint + CLI — csv/json/parquet streaming, filters, audit-logged
* feat(usage): agnes admin ask — LLM Text-to-SQL over usage_events with SELECT-only validator (audit-logged)
* feat(usage): reprocess + prune endpoints + scheduler daily prune job + CLI
* docs: PLATFORM_SETUP.md operator playbook + HOWTO/ cookbook (5 guides + index)
Adds docs/PLATFORM_SETUP.md as a consolidated operator playbook covering
bootstrap, TLS, marketplaces (curated + flea), scheduler env vars, telemetry
extraction/export/ask/prune, privacy posture, and daily routine.
Adds docs/HOWTO/ with 5 analyst cookbook guides: first query, snapshots for
remote tables, private sessions, feedback + admin ask, and customizing skills.
Existing setup docs (QUICKSTART, DEPLOYMENT, ONBOARDING, HEADLESS_USAGE)
get a one-line cross-reference at the top pointing to PLATFORM_SETUP.md.
* docs(changelog): platform telemetry epic — usage_* foundation + surfaces + admin access + docs
Comprehensive [Unreleased] entry covering: usage_events/session_summary/
tool_daily/plugin_daily tables (v41), attribution lookup tables, backfill
script, marketplace Most Popular + invocation chips + sort, admin Sessions
section, export/ask/reprocess/prune endpoints + CLI mirrors, Activity Center
(v40), PLATFORM_SETUP.md + HOWTO/ docs, and operations notes for v41 upgrade.
* fix(security): block DuckDB read_*/http_*/glob functions in usage_ask validator + symlink escape guard in session zip + clarify mark-private semantics
* fix(admin): parquet export tempfile cleanup on COPY failure + correct processed-first sort on /admin/users/<id>/sessions
* feat(audit): close 8 production audit gaps — query (local/remote/hybrid), catalog/schema/sample, snapshot estimate/create, check-access
* feat(ui): /admin/usage summary dashboard + per-user activity tab on /admin/users/<id>
* fix(audit): cap error messages at 200 chars + audit user_activity reads + recursion guard on usage.summary
* fix(audit): catalog.list audits on error path + clean up deferred json import
* fix(ux): client_kind=cli for PAT auth + timeline empty state + email-instead-of-uuid + nav reorder + help text + loading indicators + ask doc
* feat(observability): unify /admin/activity into single page with saved views
- KPI cards (events, users, error rate, p95) clickable as quick-filters
- Faceted filter dropdowns populated from audit_log in the current window
- Sortable audit table, cursor pagination, per-row JSON side panel
- Saved views (schema v43: user_observability_views) — per-user state
- Top bar: window selector + 30s Live toggle + saved views dropdown
- /admin/scheduler-runs → 308 redirect (source=scheduler filter)
- New endpoints: /api/admin/observability/{facets,kpis,views}
* test: update activity + scheduler-runs tests for unified page
- test_admin_activity_page_renders asserts new structural anchors
- test_admin_scheduler_runs_page_admin_only asserts 308 redirect
* fix(observability): respect [hidden] on modal + side panel
CSS `display: flex` on .obs-modal beat the [hidden] attribute's UA
display:none, so the save-view modal rendered on page load and Cancel
clicks couldn't dismiss it. Gate the modal's flex layout on
:not([hidden]); add the same display:none guard prophylactically to
.obs-panel and .obs-views-panel.
* feat(observability): user enrichment in audit + interactive /admin/usage
Activity:
- /api/admin/activity now joins users for user_email + user_name per row
- User column renders "name (id-prefix)" or "email (id-prefix)" instead
of an opaque truncated UUID; falls back to id when the user record is
missing
Usage:
- /admin/usage rewritten as the same filter/group-by/search pattern as
/admin/activity. Faceted dropdowns (User / Tool / Source / Event type)
populated from usage_events; debounced free-text search across
tool_name / skill_name / subagent_type / command_name
- New endpoints /api/admin/usage/{facets,kpis,query}; the query endpoint
supports group_by in {day, username, tool_name, source, ref_id} with
sort + offset pagination, plus an ungrouped raw-events mode
- 4 KPI cards (events, distinct users, distinct tools, error rate) are
clickable quick-filters; clicking a grouped row applies the bucket as
a filter
- Old static `?window=7d|30d|all` server preload removed; all state is
client-side via since_minutes + group_by + filters in the URL
* fix(observability): clearer labels, all-column sort, drop saved views UI
- Rename page titles: "Activity" → "Server activity", "Usage" → "Tool usage"
with a one-line subtitle on each explaining what the page covers and
linking the other one. The two pages source different data (audit_log
vs usage_events) and the previous labels conflated them.
- Drop the saved-views dropdown + save modal from /admin/activity. The
modal pop-open bug was the trigger; the value wasn't there yet. The
/api/admin/observability/views CRUD + DuckDB table stay in place.
- Rename "Live (30s)" to "Auto-refresh (30s)" with a tooltip clarifying
that it's the re-fetch rate, not the time range. Time range now
labeled "Time range" instead of "Window".
- All audit-table columns are sortable (User, Source, Action, Resource,
Result added); sort is page-local with a Jinja comment explaining the
trade-off. Same for raw usage rows.
- Fix duplicate sort-arrow bug — the literal "▼" in the Time th HTML was
rendering alongside the CSS ::before arrow. Removed the literal; CSS
is the single source of truth.
* feat(observability): global Sessions browser + transcript viewer + CLI
Web:
- /admin/sessions — list every collected session JSONL across all users
with time-range, user, model, errors-only and free-text filters. Default
sort surfaces error-heavy sessions first. KPI cards (sessions, distinct
users, sessions w/ errors, tool error rate) clickable as quick-filters.
- /admin/sessions/<username>/<file> — transcript viewer rendering the
JSONL chronologically: user prompts, assistant text, tool calls (with
JSON input) and tool results (with flattened output). Errors get a red
border + chip and a "Next error" navigation button at the top.
- Admin dropdown gains a "Sessions" link.
API:
- GET /api/admin/sessions/{list,kpis,facets} — filtered cross-user reads
off usage_session_summary
- GET /api/admin/sessions/{username}/{file}/transcript — parses JSONL via
the existing services.session_pipeline.lib, returns chronological events
- GET /api/admin/sessions/{username}/{file}/download — JSONL stream, same
path-safety guards as the per-user endpoint, audit-logged
CLI:
- `agnes admin sessions list [--user X] [--errors] [--since 7d]` — table
output with `!` prefix on rows that hit a tool error
- `agnes admin sessions show <username> <file>` — transcript dump, with
`--errors` to print only the failed tool_result blocks
- `agnes admin sessions download <username> <file> [-o path]`
- `agnes admin sessions kpis` — top-level numbers
* feat(internal): expose telemetry tables to agnes query with row-level RBAC
Three new registered tables backed by system.duckdb, queryable through
the same /api/query plumbing analysts use for Keboola / BigQuery /
local sources:
agnes_sessions → usage_session_summary (filter: username)
agnes_usage → usage_events (filter: username)
agnes_audit → audit_log (filter: user_id)
RBAC is per-row, not per-table: admins see every user's rows; non-admins
see only their own. The filter is built server-side from the auth user
dict; non-admin filter values are regex-validated before SQL interpolation.
Implementation:
- new connector connectors/internal/ with access (filter+exec) + registry
(idempotent table_registry seed at startup)
- /api/query detects internal table refs and short-circuits to a CTE
wrapper that prepends "WITH agnes_x AS (SELECT * FROM <src> WHERE …),
…" then "SELECT * FROM (<user_sql>) AS _q". DuckDB cursor on the
shared system.duckdb handle — opening parallel handles / ATTACH on the
same file is blocked process-wide.
- mixing internal + BQ / registered local tables in one SELECT is
rejected (v1 limitation)
- src.rbac.can_access_table waves internal tables through for all
authenticated users; row scoping is the actual security control
- /api/v2/schema and /api/v2/sample gained internal branches; sample
intentionally skips its cache because rows are RBAC-scoped per caller
- audit row written as action='query.internal' with is_admin flag
Tests: connectors/internal/access — RBAC, filter clause, schema, CTE
wrapper coexistence with user-supplied aggregations, unsafe-username
rejection. 16/16 passing.
Motivating queries this enables:
SELECT tool_name, COUNT(*) FROM agnes_usage
WHERE is_error GROUP BY 1 ORDER BY 2 DESC
-- analyst self-introspection: which tools fail for me?
SELECT user_id, COUNT(*) FROM agnes_audit
WHERE action = 'session.transcript_view' GROUP BY 1
-- admin: who's been looking at whose session transcripts?
* feat(admin): group dropdown into 5 named sections + internal tables in /catalog
Admin dropdown gains section headers so admins can land on the right
page without re-reading the full menu:
Activity Center Server activity / Tool usage / Sessions
Users & Access Users / Groups / Resource access / Tokens
Data Tables
Agent Experience Curated Marketplaces / Flea Submissions /
Agent Setup Prompt / Agent Workspace Prompt
Server Server config
"Agent Experience" frames the curated content + prompts as one cluster
— it's all admin-controlled material that shapes what an analyst's AI
agent encounters. "Configuration" → "Server" since only one item lives
there now.
Renamed the section's first two items:
"Activity" → "Server activity" (matches page H1)
"Usage" → "Tool usage"
Also fixes /catalog visibility of the internal tables (agnes_sessions /
_usage / _audit) for non-admin users: ``app.auth.access.can_access``
short-circuits to True for resource_type='table' + an internal-table id.
Without this, non-admins saw the tables in /api/v2/catalog (which uses
the same RBAC bypass) but not on the /catalog HTML page (which calls
can_access directly, requiring a resource_grants row internal tables
don't have).
CSS for `.app-nav-menu-section`: small caps, muted, non-clickable; first
section trims top padding so the panel doesn't open with an awkward gap.
* refactor(admin): move corporate memory into Admin > Agent Experience
Memory link was the only admin-only entry in the primary nav (gated by
session.user.is_admin). Moves it into the Admin dropdown under Agent
Experience, alongside Curated Marketplaces / Flea Submissions / Prompts
— all admin-curated content that shapes what an analyst's AI agent
encounters.
Renamed the nav label to "Shared Knowledge" to match what the page
actually is (admin-curated organisational knowledge from session
verification, surfaced to agents). URL stays at /corporate-memory; the
route still gates on require_admin per the existing comment.
Side effect: primary nav (Home / Marketplace / Data Packages) is now
uniform for every authenticated user — no conditional admin-only entry.
* ui: rename admin entries to Curated Knowledge / Init Prompt / Workspace Prompt
- "Shared Knowledge" → "Curated Knowledge" (parallel with "Curated
Marketplaces" in the same Agent Experience section; "curated" tells
the admin what they do there — review + approve)
- "Agent Setup Prompt" → "Init Prompt" (matches the `agnes init` flow
it actually drives)
- "Agent Workspace Prompt" → "Workspace Prompt" (the "Agent" prefix
was redundant — every item in the section is agent-facing)
Renames page titles + H1s on /admin/agent-prompt and
/admin/workspace-prompt to match.
* refactor: rename Usage → Telemetry across user-facing surfaces
External surfaces all switch; internal Python module / file names and the
physical DB tables (usage_events, usage_session_summary, usage_tool_daily,
usage_plugin_daily) stay — renaming them would force a schema migration
+ a redo of the LLM Text-to-SQL prompt for no analyst-visible win.
Changes:
- Admin dropdown: "Tool usage" → "Telemetry"
- Page H1 / <title>: same
- URL: /admin/usage → /admin/telemetry; old URL 308-redirects
- API prefix: /api/admin/usage/* → /api/admin/telemetry/*
- CLI: primary command `agnes admin telemetry …`; `agnes admin usage` kept
as a deprecated alias so existing operator scripts keep working
- Internal data-source table id: agnes_usage → agnes_telemetry. The
registry seed now evicts any stale internal-source row whose id no
longer matches INTERNAL_TABLES, so the old `agnes_usage` row is
removed from table_registry on next app boot
- All tests + JS endpoint paths updated
* test(rbac): include auto-appended internal tables in expectations
get_accessible_tables now appends agnes_sessions / agnes_telemetry /
agnes_audit to every authenticated user's accessible-tables list so the
internal data source shows up in /catalog. The two existing rbac tests
asserted hardcoded list shapes that pre-dated the change.
Rewritten to assert "granted tables + the canonical internal-table set"
instead of literal lists, so the test stays correct if the internal
table roster changes again later.
* ui: visual dividers between admin-dropdown sections
Adds a 1px top border + 6px top margin to every section header except
the first, so the five named groups (Activity Center, Users & Access,
Data, Agent Experience, Server) read as visually separated clusters.
The header itself stays small-caps + muted as before — the border is
additive.
* ui(memory): match obs-topbar visual on /corporate-memory
The Curated Knowledge page (linked from the admin dropdown's Agent
Experience section) opened straight into the stats bar — no title,
no subtitle, no shared chrome with the other admin pages. Adds an
obs-topbar-style header at the top of .container-memory:
- H1 "Curated Knowledge"
- subtitle explaining what the page is + how AI agents pull from it
The `.ck-*` class set duplicates the inline obs-* styles from
/admin/activity etc. for this one page; promoting the obs-* class set
to style-custom.css for shared reuse is the obvious next step (4 pages
already inline the same CSS), tracked as a follow-up.
Page <title> also renamed from "Corporate Memory" → "Curated Knowledge".
* ui(tables): list Agnes internal tables in /admin/tables + group in /catalog
/admin/tables previously rendered three per-source-type listings
(BQ / Keboola / Jira) and dropped any row whose source_type didn't
match — so the agnes_sessions / agnes_telemetry / agnes_audit rows
seeded into table_registry were invisible. Adds a fourth read-only
section "Agnes internal tables" that filters source_type === 'internal'
and renders the same registry-table layout the other sections use,
with two changes:
- no Register button (these rows are seeded on every app boot from
connectors/internal/registry.py)
- Edit + Delete actions hidden (any change would be reverted on the
next start). Manage access stays so admins can still inspect.
Mode badge picks up a new mode-internal CSS class (teal accent) so the
display doesn't lie and call it "local".
In /catalog, internal tables now group under an "agnes" accordion
section (bucket="agnes" on seed) instead of falling into the catch-all
"default". Single source of truth for which tables exist; admins find
them where they expect.
* ui(tables): Agnes internal as a 4th tab next to BQ/Keboola/Jira
Previous iteration mounted the internal-table listing as a separate
standalone card under the tab strip. Reshapes it to a proper
tab-content section so admins switch between data sources via one
consistent nav (BigQuery / Keboola / Jira / Agnes internal).
- New tab button "Agnes internal" in the tab-nav.
- The listing card becomes <section id="tab-content-internal"
class="tab-content">; switchTab() already routes by id so no JS
change beyond extending the hash allowlist for direct #internal
links.
- Tab content keeps the read-only treatment from the previous commit
(no Register button, no Edit / Delete in renderRegistryListing).
* ui: rename Curated Knowledge → Curated Memory
Settles the naming back on "Curated Memory" — parallel structure with
"Curated Marketplaces" in the same Agent Experience section, and zero
rename ripple: URL (/corporate-memory), API (/api/memory/*), CLI
(agnes admin memory), and Python modules all stay on "memory" so the
admin label finally lines up with the underlying surfaces.
The "Curated" prefix still tells admins what they do on the page
(review pending → approve / mandate / reject) and reads as a sibling
of "Curated Marketplaces" right next to it in the dropdown.
Touches: admin dropdown label, page <title>, page H1. DB tables stay
on knowledge_* (already the canonical naming for the data shape).
* ui: rename "Server activity" → "Audit log"
"Audit log" is what the page actually is — server-side audit_log table
rendered with KPI cards + filter bar + sortable table. The "Server
activity" label confused the term with Claude Code session telemetry
(Telemetry page) and didn't make the source/concept clear.
Touches:
- Admin dropdown nav label
- /admin/activity page H1 + subtitle
- /admin/telemetry subtitle cross-link
- test_activity_api page-renders assertion
URL (/admin/activity) and API (/api/admin/activity/*) stay — the
"activity" name has stuck at the route layer for a year; rerouting
those would churn dashboards/bookmarks for zero analyst-visible win.
* ui(admin-nav): gray band on each section header for clearer separation
Previous iteration used a 1px top border between section labels — the
labels still blended into the items above/below at a glance. Switches
to a light gray background band per section header, extended edge-to-
edge inside the panel via negative horizontal margins. Bolder
font-weight (700) reinforces the separation; bumping the font color
isn't needed because the band itself does the work.
First section's header tucks into the panel's top border-radius so the
band reaches the corners without a gap.
* ui(catalog): rename internal-table category to "Agnes Internal"
`bucket` is what /catalog renders as the accordion category header
verbatim — "agnes" lowercase didn't read as a real category name and
got confused with a system identifier. Bumps to "Agnes Internal".
Seed re-applies on every app boot so existing rows pick up the new
bucket value via `ON CONFLICT (id) DO UPDATE`.
* ui(catalog): split Agnes Internal into its own card on /catalog
Previously the three internal tables landed inside the "Core Business
Data" card under an "Agnes Internal" accordion alongside Keboola / BQ
buckets — readers conflated system telemetry with business datasets,
and the data_stats header counter ("3 tables · ~X rows total") only
ever counted synced rows so internal tables looked invisible.
Split the catalog page into two cards:
- Core Business Data: only non-internal source_types (Keboola, BQ,
Jira). Accordions group by bucket as before. Stats counter reflects
this card's tables.
- Agnes Internal: a dedicated card with its own visual treatment
(teal accent matching the mode-internal badge in /admin/tables).
Flat list (no accordion — only 3 rows, never grows here), each
row carries the canonical `agnes query` snippet. Read-only — no
profiler click, no In-stack toggle, no sync metadata.
Route adds `internal_card` context object; template renders the new
card only when it's non-None.
* fix(rbac): hide internal tables from /admin/access + drop "my" framing
Two related cleanups for the Agnes-internal tables:
1. /admin/access (resource grants) no longer lists them. The
`can_access` check has a hardcoded internal-table bypass — security
is row-level (per-request view filter), so a table-grain
`resource_grants` row would do nothing. Surfacing them in the UI
let admins set up grants that silently no-op. Filter at the
`_table_blocks` projection so the UI tree never sees them.
2. Display names drop the analyst-perspective "my" framing:
"Agnes — my sessions" → "Agnes sessions"
"Agnes — my telemetry events" → "Agnes telemetry events"
"Agnes — my audit log" → "Agnes audit log"
The "my" only makes sense from the querying analyst's seat
(`SELECT … FROM agnes_sessions` returns *their* rows); on /admin/*
pages where admin sees / configures them across users, the
pronoun was misleading. Description text now spells out the
row-level RBAC contract explicitly.
Display names update via TableRegistryRepository.register's ON CONFLICT
UPDATE on next app boot; no manual cleanup needed.
* ui: subtitle notes about agnes_* tables on each Activity Center page
The recursive observability story — Agnes serves its own audit /
telemetry / session data through the same `agnes query` plumbing
analysts use for business data — wasn't surfaced anywhere on the
admin pages that show that data. Three pages get a one-liner with
the canonical `agnes query` snippet + the RBAC contract (analysts
see their own rows, admin sees all):
- /admin/activity (Audit log) → agnes_audit
- /admin/telemetry (Tool usage) → agnes_telemetry
- /admin/sessions → agnes_sessions
Sets up the discovery moment for admins: they're reading the page,
they see "you can query this from Claude Code", they remember it
when an analyst asks "how do I find my own failed tool calls?".
* ui(tables): explain "Show log" empty-state on /admin/tables
Cache warmup log <pre> renders with a dark background and is only
populated by the SSE stream during a Re-warm all run. Opening the
page cold + clicking Show log just revealed a black bar with no
context — admins couldn't tell what they were looking at.
Adds an inline paragraph above the <pre> explaining what the log is,
the row format, when it fills in, and where to find the historical
audit trail (/admin/activity). The actual <pre> stays empty until
SSE events arrive, but the surrounding copy carries the meaning.
* ui(tables): auto-open cache-warmup log on Re-warm all click
A Re-warm all run takes ~24s per remote BQ row. With the <details>
collapsed by default, operators saw the button disable, watched a
quiet ~24s pass, and assumed nothing had happened — the streaming
log was hidden behind a closed disclosure.
Two small JS tweaks:
- cacheWarmupRun() opens the details on click, so streamed lines
appear without an extra interaction
- cacheWarmupOnStart() hides the inline hint paragraph the moment
real log content lands, so the dark log block isn't competing
with redundant context
Hint paragraph also clarifies that only `query_mode='remote'` BQ
rows are warmed — operators with only materialized/internal tables
would see total=0 and the page would "do nothing" by spec.
* ui: trim Agnes internal copy across surfaces
Descriptions had grown to explain the extraction pipeline ("parsed
out of session JSONLs"), the underlying table ("Backed by
usage_session_summary"), the RBAC mechanic ("row-level RBAC at query
time — analysts see their own; admin sees all"), and the SQL snippet.
Every implementation detail meant another rewrite on the next iter.
Strips to one stable line per surface: what the data is, plus
"Also available locally for analysis". Mechanics live in code +
docs; the page copy says what the user needs to know.
Touched:
- connectors/internal/access.py: INTERNAL_TABLES descriptions
- activity_center.html / admin_usage.html / admin_sessions.html
subtitles
- catalog.html Agnes Internal card description + row strip
- admin_tables.html "Agnes internal" tab hint
* fix(internal): is_user_admin arity bugs + + saved-view payload cap
Round-1 code review (PR #278) caught two blocking bugs and three nits.
Blocking — both `is_user_admin(user)` (single dict arg) calls raised
TypeError. is_user_admin signature is `(user_id, conn)`. Affected:
- app/api/query.py:_run_internal_query — every POST /api/query that
references agnes_sessions / agnes_telemetry / agnes_audit blew up
with a 500. The headline analyst-facing feature of this PR was
unusable through the API.
- app/api/v2_sample.py — same shape; `GET /api/v2/sample/agnes_*`
returned 500.
Both fixed to call `is_user_admin(user.get("id"), conn)`. Added two
FastAPI-level tests in test_internal_data_source.py that go through
the TestClient — the existing unit tests on `execute_internal_query`
and `build_filter_clause` skipped the request-handler layer where the
bugs lived, which is why this landed.
Nits also closed:
- connectors/internal/access.py: `+` allowed in _USERNAME_RE /
_USER_ID_RE so RFC 5321 email local-parts (alice+test@x) resolve
correctly without hitting InternalAccessError.
- app/api/observability.py: saved-view payload capped at 64 KiB to
prevent an admin from bloating system.duckdb with a malformed save.
* fix(security): close non-admin data-leak via underlying-table refs
PR #278 R2 review surfaced a non-admin-exploitable bypass: SQL whose
string literal contains 'agnes_sessions' routed into the privileged
internal-query path, then queried the underlying physical table
(usage_session_summary / usage_events / audit_log) directly, escaping
the CTE wrapper's row filter. Two reinforcing defenses:
1. find_internal_refs() now strips single-quoted string literals
before scanning for alias names — a literal alone no longer
routes the request into the privileged code path.
2. execute_internal_query() rejects non-admin SQL that references
the underlying physical tables (usage_*, audit_log). The CTE
wrapper only scopes the agnes_* aliases; a direct FROM on the
base table — or a shadowing inner WITH that still has to read
the base table — bypasses RBAC. Block before execution with an
actionable error pointing to the agnes_* alias. Admins are
unaffected (god-mode short-circuit on the filter clause).
3. tests/test_internal_data_source.py — three new negative tests
covering literal-only matches, direct-table refs, and CTE
shadow attempts.
Also tightens usage_ask.py's SELECT-only validator: pragma_table_info,
pragma_storage_info, pragma_database_*, and duckdb_tables / columns /
views / indexes / schemas are reflection functions that leak metadata
the analyst question shouldn't reach. \bPRAGMA\b in _FORBIDDEN never
matched the function-call form (word-boundary between `A` and `_`).
* fix(security): dynamic denylist for non-admin internal queries
R3 review (PR #278) caught a wider data-leak than R2: the underlying-
physical-table guard listed only the 7 usage_* + audit_log tables,
but system.duckdb has 30+ other sensitive tables — users (emails +
ids), personal_access_tokens, resource_grants, user_groups,
user_observability_views, store_*, marketplace_*, knowledge_*, etc.
A non-admin SQL like
SELECT * FROM agnes_sessions
UNION ALL SELECT email, id, … FROM users LIMIT 1
would leak every user's row.
Replaces the hardcoded denylist with a **dynamic allowlist** —
non-admin SQL may reference ONLY the registered agnes_* aliases.
Every other table in `information_schema.tables` (main schema) is
rejected. Future migrations that add a new sensitive table are
automatically covered without re-editing this module.
Also strips SQL comments (`/* */` and `--`) before the identifier
scan so a comment-wrapped table name (`/**/users/**/`) can't slip
past the regex.
Four new negative tests pin: `users`, `personal_access_tokens`,
block-comment wrap, line-comment wrap.
Plus: per-user view-count cap (100) on /api/admin/observability/views
so an admin can't fill system.duckdb with thousands of saved views.
* release: 0.54.0 — Activity Center + Telemetry + Sessions + internal datasource
Cuts the work shipped across this PR (Activity Center build, recursive
internal data source) into a versioned release. Bumps pyproject.toml
to 0.54.0; renames the top of CHANGELOG.md from [Unreleased] to
[0.54.0] — 2026-05-12 with a header summary; opens a fresh
[Unreleased] section for the next round.
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* Extract session pipeline framework, refactor verification, add UsageProcessor skeleton
Pluggable framework under services/session_pipeline/ (contract + lib + per-processor
runner) so multiple processors can read /data/user_sessions/<key>/*.jsonl on their
own cadence with full failure isolation. Verification flow becomes the first plugin;
a no-op UsageProcessor reserves the second slot pending a separate brainstorm on
extraction logic + storage shape.
Schema v28→v29: rename session_extraction_state → session_processor_state with
composite PK (processor_name, session_file). Existing rows copied over with
processor_name='verification'; legacy table dropped. Migration is idempotent and
no-ops the copy step on fresh installs that came up at the new schema.
Endpoint: /api/admin/run-verification-detector replaced by parametrized
/api/admin/run-session-processor?processor=<name>. Audit action format follows.
Scheduler JOBS: verification-detector entry split into session-processor:verification
+ session-processor:usage. SCHEDULER_VERIFICATION_DETECTOR_INTERVAL retained for
operator compatibility (drives both cadence and health-check grace window);
SCHEDULER_USAGE_PROCESSOR_INTERVAL added.
* Address PR #232 review: scan dead branch + per-processor lock
- `SessionProcessorStateRepository.scan_unprocessed_for` dead else: both
branches surfaced every jsonl, the SELECT was unused, runner MD5-rehashed
every stable session per tick. Replaced with an mtime precheck — stable
sessions (mtime <= processed_at) are filtered at scan; modified files
still surface for the runner's authoritative `file_hash` invalidation.
Naive-local comparison matches the existing health-check idiom (DuckDB
TIMESTAMP strips tz on storage).
- Per-processor advisory lock around `_run_processor` in
`/api/admin/run-session-processor`. Scheduler tick + manual admin POST
could otherwise both run, both call create_evidence on overlapping
detections, and accumulate duplicate verification_evidence rows (the
dedup short-circuit only covers create+contradiction, not evidence per
ADR Decision 3). Non-blocking acquire → 409 Conflict on concurrent
invocation; release in finally so a runner exception doesn't wedge the
processor.
Tests: two new scan unit tests (mtime filter + post-mark mtime bump), 409
endpoint test, lock-released-on-exception test. Two existing tests updated
for the new "filtered at scan" stat shape (previously asserted skipped == 1,
now scanned == 0).
* Address PR #232 review #2: parallel scheduler tick + last_run on terminal state
Two pre-existing scaffold bugs in services/scheduler/__main__.py amplified
by adding more session-pipeline jobs:
1. Serial for-loop over jobs with synchronous httpx.post(timeout=900) — a
10-minute verification run blocked every other job (data-refresh,
health-check, usage, corporate-memory) for the whole window. The PR's
stated isolation guarantee held inside the runner but broke at the
scheduler dispatch layer.
2. last_run advanced only when _call_api returned True. Permanent-failure
jobs hot-looped on every tick (30s) instead of cadence (15min).
Fix: ThreadPoolExecutor.submit per due job + per-job in_flight set so a
long-running job can't be re-launched on subsequent ticks. last_run
advances unconditionally in finally; errors still surface via _call_api
logging + audit_log on the receiving side.
_run_job extracted to module-level for unit testing. New tests:
- TestRunJobBookkeeping: advances on success / failure / unhandled raise
- TestRunLoopParallelism: in_flight protection prevents duplicate
launches across ticks for a single slow job
---------
Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>