agnes-the-ai-analyst

Author	SHA1	Message	Date
ZdenekSrotyr	c3e82972c8	feat(bq): decouple table_registry bucket from BQ dataset name (#343 ) (#346 ) * feat(bq): decouple table_registry bucket from BQ dataset name (#343) Adds optional `bq_fqn` column (schema v51) carrying the fully-qualified BigQuery path (project.dataset.table) so the rebuild path no longer has to reconstruct it from the dual-purpose `bucket` field (which is also a UX/RBAC label). - Schema v51 migration + _SYSTEM_SCHEMA carry the nullable column; rows without it keep using the legacy bucket+source_table+ remote_attach.project path (backwards compat). - BQ extractor honors bq_fqn per row when present: dataset/table override on same-project rows; cross-project VIEW path works via bigquery_query(billing, ...); cross-project BASE TABLE skipped with a clear warning (multi-ATTACH per project deferred to follow-up). - Orchestrator pre-pass detects drift between extract.duckdb _remote_attach.url and overlay data_source.bigquery.project, calls rebuild_from_registry to regenerate when they differ. Closes the operational hazard where /admin/server-config edits silently left the on-disk extract pointing at the old project until the next manual sync. - Startup config check warns when project ≠ billing_project without location set (the on-disk symptom is "provider returned no data" silently in metadata cache), and when a warehouse-like data project has no billing_project override (silent 403 serviceusage path). - _resolve_bq_location warning now points at the location config key explicitly so operators see the actionable fix in the log. - POST /api/admin/register-table and PUT /api/admin/registry/{id} accept bq_fqn; malformed values rejected at the API boundary (422). - 25 tests covering parse_bq_fqn matrix, extractor override paths (same-project + cross-project VIEW + cross-project BASE TABLE skip), orchestrator drift sync, startup-validator heuristic, admin models. UI surface for bq_fqn input in /admin/tables intentionally omitted from this PR (3.5k-line template change) — admins can register through the REST API or `agnes admin` CLI in the meantime. Multi-project ATTACH support is the same scope deferral as the cross-project BASE TABLE skip; both ride a follow-up PR. * review fixes: abstract CHANGELOG, merge duplicate Changed, bump docs schema version - CHANGELOG.md: remove customer-specific hostname + incident date range from the orchestrator drift-sync entry (vendor-agnostic OSS rule), fold the entry into the existing [Unreleased] ### Changed section instead of opening a duplicate heading. - docs/architecture.md: bump 'Current schema version' from 19 to 51 to match SCHEMA_VERSION (per agnes-orchestrator skill rule #4). * review fixes: vendor-agnostic test fixture + Schema v51 internal bullet - tests/test_bq_fqn.py: replace customer GCP project ID with generic 'my-warehouse-project' placeholder (vendor-agnostic OSS rule). Test asserts on the warehouse-like heuristic, not the literal project name, so the rename is behavior-neutral. - CHANGELOG.md: add explicit '\\Schema v51\\' bullet under `### Internal` naming the new version + summarizing the additive nullable column (matches the convention from v47/v48 bullets). * fix(bq): cross-project _detect_table_type bills against extractor project Addresses Devin review on #346 — pre-fix _detect_table_type passed the data project as BOTH the FROM-clause target AND the bigquery_query() first arg (billing project). For cross-project bq_fqn rows where fqn_project != project_id, the data SA holds bigquery.dataViewer on fqn_project but the serviceusage.services.use permission only on project_id, so the call 403'd. init_extract's broad except Exception swallowed the error and silently skipped the row, meaning the cross-project VIEW path at extractor.py:~696 — the PR's primary cross-project use case — never executed. - Add optional billing_project kwarg to _detect_table_type; defaults to project for backwards compat (same-project callers unaffected). - Update the init_extract call site to pass billing_project=project_id explicitly. Same-project rows (fqn_project == project_id) are a no-op; cross-project rows now route billing to the project where the SA actually has services.use. - 2 new tests in TestDetectTableTypeBilling cover (a) explicit billing_project routing to bigquery_query 1st arg + data project staying in FROM, and (b) the backwards-compat default. Plus test_cross_project_detect_call_bills_against_extractor_project pins the call-site wiring — captures the (project, billing_project) pair the extractor passes for a cross-project bq_fqn row. * release: 0.54.29 — bq_fqn decoupling + marketplace refactor + setup-script UX Accumulated [Unreleased] content from #342 (flea marketplace refactor), #344 (setup script step-2 cwd check), and #346 (this PR — bq_fqn column + orchestrator drift sync + startup config check). Schema v51.	2026-05-19 11:17:32 +00:00
minasarustamyan	c6c72b9c00	feat(flea): marketplace refactor — data model, attribution, UI unification (#342 ) * feat(flea): phase-1 — title, tagline, synthetic_name columns + upload UX Schema v49 adds three user-facing metadata columns to store_entities: - title (NOT NULL) — humanized display name shown on marketplace surfaces in later phases. Acronym-aware humanizer in src/store_naming.py (27 entries: MCP, API, OAuth, S3, …) shared with the frontend via Jinja-injected dict so JS pre-fill and Python backfill produce identical output. - tagline (NULL, ≤200 chars) — optional short description for card listings. Long-form `description` stays. - synthetic_name (NOT NULL) — deterministic `<name>-by-<owner_username>` stored as a column for indexing and as the single source of truth for attribution lookups in later phases. Today's bundle bake still uses suffixed_name() at the same call sites. Migration (_v48_to_v49_migrate, Python function — humanize has no SQL equivalent) backfills existing rows: title from humanize_name(strip_archive_suffix(name)), synthetic from the concat formula; tagline stays NULL. Idempotent (ADD COLUMN IF NOT EXISTS + SET NOT NULL no-op on re-run). Upload form (store_upload.html step 2) reorders fields: Title (pre-filled from server-side humanize, JS keeps it in sync until the user edits manually) → Name + dark synthetic preview on one row (matches marketplace_item_detail.html dark code styling, no copy button — preview only) → Short description with character counter → Description (unchanged). Edit form (store_edit.html) mirrors the layout with pre-filled values from the entity row. API: - POST /api/store/entities/preview returns `title` (humanized fallback) for upload form pre-fill. - POST + PUT /api/store/entities accept `title` and `tagline` form fields with 100/200-char validation; PUT recomputes synthetic_name when `name` changes (caller responsibility per repo contract). - StoreEntityResponse exposes all three new fields. Repository: - create() takes title + tagline + synthetic_name as optional kwargs with derived defaults (humanize_name(name) / concat) so existing test fixtures don't need to thread them. - update() supports partial updates on all three; tagline empty string clears via NULL sentinel. - archive() recomputes synthetic_name on rename to the archived slug so the column stays consistent with name. Tests: - New test_schema_v48_to_v49_migration.py: fresh install, populated-row backfill (incl. archived row strip), idempotence, NOT NULL constraint verification. - test_store_naming.py: 14 humanize parametrize cases + acronym dict invariants. - test_store_api.py::TestStoreV49Metadata: preview humanize, POST with explicit + fallback title, 100/200-char rejects, PUT partial update + synthetic recompute on rename. - Schema version assertion bumps (48 → 49) in test_db_schema_version, test_home_stats, test_schema_v42_migration, test_schema_v46_migration. Phase 1 only — surface rendering on cards / detail pages and Claude Code bundle propagation come in later phases. * feat(flea): phase-2 — wire title/tagline/owner through marketplace cards + detail pages Phase 1 (7f4cfcbb) populated the three new columns on store_entities; phase 2 surfaces them across the web presentation layer so the kebab- case slug + bare username no longer leak into user-facing copy. API: - `_flea_to_item` now takes `conn` (both callsites updated) and sets `display_name=entity.title`, `tagline=entity.tagline`, `owner= _resolve_owner_display(conn, owner_user_id, owner_username)` — matches the chain the curated path already uses (users.name → users.email → fallback). The card JS chain `it.display_name \|\| it.name` then renders the friendly form; `name` stays at the suffixed slug as the technical identifier JS uses for fallbacks. - `flea_detail` adds `display_name` + `tagline` to PluginDetailResponse so the standalone skill/agent + plugin detail heroes pick them up through the existing `d.display_name` / `d.tagline` chains. - `_flea_inner_parent_fields` swaps `parent_display_name` from `strip_archive_suffix(name)` to `entity.title or strip_archive_suffix( name)`. Drives parent-plugin label in four surfaces at once: breadcrumb 3rd segment, hero "part of <plugin>" meta-row, helper "This skill is part of <plugin>" panel, and the Details sidebar's "Parent plugin" row. Templates — `marketplace_item_detail.html`: - Pre-render: browser title, hero h1, and hero-window-label read `(entity.title if entity else None) or inner_name or item_name or plugin_name` so the SSR shell shows the friendly title before the JS fetch lands (no flash of kebab-case). - Breadcrumb last segment for flea standalone drops the `d.manifest_name \|\| heroTitle` fallback in favour of just `heroTitle` — manifest_name is the suffixed slug and users explicitly didn't want it in the path. - Hero meta-row for flea standalone is now hidden. The prior "by <author> · N installed · <size>" line duplicated install count (hero telemetry chip below), owner + bundle size (Details sidebar). Templates — `marketplace_plugin_detail.html`: - Same SSR pre-render swap (title, h1, window-label, crumb-name). - Hero tagline element starts hidden; JS shows it only when `d.tagline` is truthy. Pre-fix it fell back to `d.description` (long-form text), which read awkwardly under the h1 and pulled the hero too tall. Description still renders in the "What it does" panel below the hero. - Initial "Loading…" placeholder removed so entities without a tagline don't flash that text mid-fetch. Tests: - New `TestFleaPhase2Presentation` class in test_marketplace_api.py (6 cases): card title + tagline + full-name owner, owner fallback chain when users.name is NULL, flea_detail exposes title + tagline, tagline null when omitted, inner skill parent_display_name uses entity.title (explicit + humanize-fallback variants). - Updated `TestListItems.test_flea_lists_uploads` to assert both `display_name == "Alpha"` (humanized) and `name == "alpha-by-alice"` (suffixed slug compat). - Updated `TestWebPages.test_marketplace_flea_detail_page_renders` to look for the humanized title ("Page Skill") in the SSR shell instead of the kebab-case `page-skill`. * feat(flea): phase-3 — read synthetic_name from DB, suffixed_name() only on write Phase 1 added the column + backfill, repo write paths keep it in sync. Phase 3 routes every READ callsite through `store_entities.synthetic_name` directly instead of recomputing `<name>-by-<owner_username>` on the fly, and switches the collision query off the inline string concat. The `suffixed_name()` primitive now lives exclusively in write flows. Read callsites updated (all read `entity["synthetic_name"]` directly, no fallback — the column is NOT NULL and a missing value would be a real bug worth surfacing as KeyError): - app/api/marketplace.py:_flea_to_item — card MarketplaceItem.name. - app/api/marketplace.py:flea_detail — PluginDetailResponse.manifest_name. - app/api/store.py:_entity_to_response — StoreEntityResponse.invocation_name. - app/api/store.py PUT bundle re-bake — `suffixed` passed to `_bake_plugin_tree`; entity is loaded pre-rename, so its synthetic_name is the OLD value `_bake_plugin_tree` expects. - app/api/store.py PUT rename — `old_suffix` for `_rename_baked_tree`. - app/api/my_stack.py — StoreInstallEntry.invocation_name. - src/marketplace_filter.py — manifest_name in served plugin entry. `suffixed_name` imports removed from marketplace.py, my_stack.py, and marketplace_filter.py (no remaining callsites). store.py keeps the import for its write paths: - POST create (`suffixed = suffixed_name(final_name, username)` → passed to `_bake_plugin_tree` and `repo.create(synthetic_name=...)`). - PUT rename collision check (`new_suffixed`). - PUT rename `new_suffix` for `_rename_baked_tree` (proposed value). - PUT rename `new_synthetic` for `repo.update(synthetic_name=...)`. - Archive `old_suffix` + `new_suffix` for `_rename_baked_tree` (retro-compute pre-archive value after `repo.archive` already overwrote the DB row with the post-archive synthetic). Collision SQL — `_suffixed_already_taken`: WHERE name \|\| '-by-' \|\| owner_username = ? (before) WHERE synthetic_name = ? (after) Same matches today (phase 1 backfill + NOT NULL invariant + write paths in sync); indexable + single source of truth going forward. Repository: - UserStoreInstallsRepository.list_for_user explicit SELECT extended with `se.title`, `se.tagline`, `se.synthetic_name` so my_stack and marketplace_filter callers can read them off the joined row. Tests: - test_store_api.py::test_invocation_name_reads_from_synthetic_column — upload entity, manually override the column with a non-canonical value, verify GET response returns the override (proves read path consumes the column, not recomputes). - test_marketplace_api.py::test_flea_card_and_detail_read_synthetic_name_from_db — same proof for `MarketplaceItem.name` (card) and `PluginDetailResponse.manifest_name` (detail). * feat(flea): phase-4 — rename agnes-store-bundle → flea (synthetic plugin) The synthetic plugin that wraps loose flea-market skills + agents into one Claude Code plugin is renamed from `agnes-store-bundle` to `flea`. Plugin-type flea uploads (their own standalone plugin entry) are unaffected. Constants: - src/marketplace_filter.py: - BUNDLE_PLUGIN_NAME: "agnes-store-bundle" → "flea" (Claude Code plugin manifest name + .claude-plugin/plugin.json name) - BUNDLE_PREFIXED_NAME: "store-bundle" → "flea" (on-disk ZIP / git tree path, now plugins/flea/...) Attribution layer (services/session_processors/usage_lib.py): - FLEA_BUNDLE_PREFIX: "agnes-store-bundle" → "flea". The JSONL invocation identifier going forward is `flea:<skill-name>`. - New `_LEGACY_FLEA_BUNDLE_PREFIXES = ("agnes-store-bundle",)`. `MarketplaceItemLookup.resolve()` + `_attribute_event()` accept BOTH the new and the legacy prefix so historic usage_events (~90-day retention) continue attributing to source='flea'. The tuple becomes a no-op once the rename has been live past the retention window — a follow-up commit can drop it then. - USAGE_PROCESSOR_VERSION bumped 6 → 7 so the session-pipeline reprocess loop re-runs attribution with the new + legacy prefix branches. User-facing copy: - /api/store/bundle.zip Content-Disposition filename: agnes-store-bundle.zip → flea.zip - `agnes admin store pull` default --out: agnes-store-bundle.zip → flea.zip - Docstrings + JS comment + welcome template comment updated. Tests: - skill_flea.jsonl fixture identifier updated to flea:flea-skill. - New skill_flea_legacy.jsonl with the legacy prefix for backward-compat coverage. - New test `test_legacy_agnes_store_bundle_prefix_resolves` replays the legacy fixture and asserts source='flea' attribution still lands. - All other test assertions / mocks substituted mechanically: test_session_processor_usage.py, test_usage_rollups.py, test_marketplace_filter_store.py, test_store_api.py, test_cli_refresh_marketplace.py. - `_seed_flea_entity` (test_usage_rollups.py) + `_seed_attribution` (test_session_processor_usage.py) helpers now supply the NOT NULL `title` + `synthetic_name` columns from phase 1, since they INSERT directly bypassing the repo's create() fallback. Client rollover note (CHANGELOG): `agnes refresh-marketplace` will install the new `flea@agnes` plugin and the local marketplace clone's `plugins/store-bundle/` source folder is removed via `git reset --hard`. Whether Claude Code itself auto-prunes the orphan `agnes-store-bundle @agnes` registry entry is undocumented — to verify empirically on the dev VM. If the orphan entry lingers, a follow-up will add targeted cleanup; until then users can manually run `claude plugin uninstall agnes-store-bundle@agnes`. Verified locally: 98 passed (session_processor_usage + usage_rollups + marketplace_filter_store + cli_refresh_marketplace) + 228 passed/2 skipped (store_api + marketplace_api + admin_store_submissions + store_entity_versions + store_repositories). * fix(flea): phase-5 — attribution keyspace mismatch (closes #335) Pre-fix every flea skill/agent invocation silently fell through to `usage_events.source = 'builtin'`. Root cause: lookup tables in `services/session_processors/usage_lib.py` keyed `_flea_entities` (and the derived `_flea_plugins` set) by `store_entities.name` — the un-suffixed display name. Claude Code writes invocations as `flea:<synthetic_name>` (e.g. `flea:xlsx-by-c-marustamyan`), so `dict.get(local)` always missed and the resolver fell through to builtin. Result: marketplace cards, detail telemetry chips, admin group-by-source all showed 0 flea invocations even when the raw JSONL stream was correct. Phase 1 added the `synthetic_name` column + backfill; phase 4 renamed the bundle prefix to `flea`; phase 5 finally flips the lookup keyspace to match what JSONL writes. usage_lib.py: - `MarketplaceItemLookup.__init__` preload: `SELECT synthetic_name, type FROM store_entities` (was `SELECT name, type`). `_flea_plugins` set derived from those keys, so it now carries synthetic_names too — matches what Claude Code writes when invoking a skill nested inside a flea plugin (`<synthetic>:<inner>`). - `rebuild_rollups` preload: same SELECT change; also derives `flea_plugins` and threads it through `_aggregate_events` / `_rebuild_window`. - `_attribute_event`: signature extended with `flea_plugins`; new branch `if prefix in flea_plugins: return ("flea", default_type, prefix, local)` for flea-plugin-nested skills/agents. This branch was added to `MarketplaceItemLookup.resolve()` in v6 (commit e076ebbe) but the rollup builder's helper was never updated to match, so nested skills inside flea plugins silently dropped out of the daily/window fact tables. - `USAGE_PROCESSOR_VERSION`: 7 → 8. Forces the session-pipeline reprocess loop to re-attribute existing usage_events rows with the corrected lookup so rollup tables fill correctly on the next tick. marketplace.py — 4 API stats lookup callsites switched from `entity["name"]` to `entity["synthetic_name"]`: - `_flea_to_item` (card stats lookup) - `flea_detail` (`_build_telemetry` + `_load_inner_items_stats_by_parent`) - `flea_skill_detail` (inner detail `parent_plugin` key) - `flea_agent_detail` (inner detail `parent_plugin` key) Tests: - `skill_flea.jsonl` invocation: `flea:flea-skill` → `flea:flea-skill-by-alice` (mirrors what Claude Code writes after phase 1/4 — the suffixed synthetic_name). - `test_flea_skill_attributed_with_empty_parent` assertion: rollup `name` column now carries the synthetic_name. No legacy `agnes-store-bundle` prefix backward compat — clean cut per user direction (dev phase, no production data worth preserving). Verified locally: 53 passed targeted (session_processor_usage + usage_rollups + marketplace_filter_store) + 215 passed/2 skipped broader (store_api + marketplace_api + admin_store_submissions + store_entity_versions). * fix(flea): phase-6 — plugin-level rollup aggregation parity for flea Flea plugin entity cards + detail pages showed 0 invocations even though nested skills had correct rollup rows. Root cause: the plugin-level aggregation pass in `_aggregate_events` was hardcoded to `source='curated'` only: if source != "curated" or not parent: continue if group_by_day: pkey = (day, "curated", "plugin", "", parent) else: pkey = ("curated", "plugin", "", parent) So flea plugin entities never got a synthetic `(source='flea', type='plugin', parent_plugin='', name=<synth>)` row aggregating nested invocations. `_load_invocation_stats('flea')` filters `parent_plugin = ''` and returned no row for flea plugin entity cards, so `stats.get(entity["synthetic_name"])` missed and the API exposed 0/0. Triggered by empirical observation on the dev VM — `codex-second-opinion-by-c-marustamyan` plugin showed 0 calls in the listing card while its three inner skills (codex-setup ×3, codex-review ×1, codex-second-opinion ×1) had the expected child rollup rows. Fix: - Extend the guard to `source in ("curated", "flea")`. - Replace the hardcoded `"curated"` in the `pkey` tuple with the loop's `source` variable, so flea aggregation lands as `source= 'flea'` and curated aggregation continues landing as `source='curated'`. API path unchanged — `_load_invocation_stats('flea')` filters `parent_plugin = ''` already picks up the new aggregated row alongside standalone skill/agent rows. Rollup `name` field carries the synthetic_name keyspace; no collision between standalone entity synthetic and plugin entity synthetic (global suffix uniqueness enforced by `_suffixed_already_taken`). `USAGE_PROCESSOR_VERSION` bumped 8 → 9 to force a reprocess pass so historic nested-invocation data fills the new plugin-level rows on the next tick (instead of waiting for the next live invocation). Tests: - New `test_flea_plugin_row_aggregates_children` mirrors the existing `test_curated_plugin_row_aggregates_children`: seeds a flea plugin entity, three nested events (one user invoking two skills, a second user invoking one) → asserts the aggregated plugin row carries count=3, distinct_users=2 (union, not sum), plus the child rows survive alongside. Verified locally: 43 passed (session_processor_usage + usage_rollups) + 82 passed/2 skipped broader (+ marketplace_filter_store + marketplace_api). * refactor(marketplace): phase-7 — unify Details sidebar across detail surfaces Five marketplace detail surfaces (curated plugin, flea plugin, curated inner skill/agent, flea inner skill/agent, flea standalone skill/agent) had drifted on which Details rows they show and what order — the same field landed in different positions, some fields duplicated hero info, and the flea plugin Owner row leaked the kebab-case `owner_username` slug instead of the user's real name. This commit aligns all five surfaces on a single scan order driven by UX priority: identity → life-stage → telemetry → debug-tier Concretely: 1. Curator / Owner (first scan signal — trust) 2. Parent plugin (inner skill/agent only) 3. Released (top-level only — plugins + flea standalone) 4. Last used (recency) 5. Active days (engagement consistency) 6. Version (flea standalone only — content hash) 7. Bundle size (debug-tier) Dropped: - Slug field on plugin detail surfaces (`marketplace_id` for curated, `entity_id` for flea). Pure debug info, never user-relevant; URL already carries it. - Category + Installs on flea standalone skill/agent detail. Category is already shown as a hero badge; install count is in the hero telemetry chip — sidebar duplication added noise. Owner display: - Flea plugin Owner row now reads `d.owner_display` (resolved through `users.name → users.email → owner_username` by `_resolve_owner_display` in `app/api/marketplace.py:1491`) instead of the raw `d.author_name` (which is `owner_username`, the kebab-case slug). API field already populated from phase 2; templates just consume it. - Curated Curator row continues to read `d.author_name` from marketplace-metadata.json; `owner_todo` placeholder behavior preserved. Files: - app/web/templates/marketplace_plugin_detail.html — rewrote the Details render loop (lines 1364-1427 area). Slug row removed, rows reordered, Owner branch reads `d.owner_display`. - app/web/templates/marketplace_item_detail.html — both branches of the Details sidebar (inner skill/agent + flea standalone) re-laid around the same scan order. Telemetry helper unchanged, just repositioned. Category + Installs rows removed from the standalone branch. No new tests — no existing test asserts the precise order of Details rows or references the dropped fields in a sidebar context (grep confirmed). API surface unchanged. Verified locally: 84 passed / 2 skipped on `test_marketplace_api.py` + `test_store_api.py`. * fix(flea): post-review hardening — N+1, v50 UNIQUE, docs, test cleanup Addresses 5 critical findings from PR #342 code review: 1. N+1 query in `_flea_to_item` — owner-display resolution previously ran one `SELECT … FROM users WHERE id = ?` per item in the listing comprehension. Now batched via `_load_users_display` IN-query prefetch; 50 items drops 51 user queries to 2. Regression-guarded by `TestFleaOwnerDisplayBatched` (spies `_resolve_owner_display` and asserts it's not called inside the list path). 2. Misleading comment in `src/marketplace_filter.py` claimed the attribution layer accepts both `agnes-store-bundle` and `flea` prefixes — it doesn't (clean cut per CHANGELOG). Rewrote to match reality. 3. CHANGELOG `[Unreleased]` had two `### Changed` blocks. Merged into one (BREAKING bullet first). 4. New v49→v50 migration adds `UNIQUE INDEX idx_store_entities_synthetic_name`. v49 made `synthetic_name` the canonical attribution key but uniqueness was only app-enforced; v50 promotes the invariant to the DB layer. Migration pre-checks for existing duplicates and raises `RuntimeError` listing them rather than letting `CREATE UNIQUE INDEX` fail mid-way. v48→v49 migration gained an `is_nullable='YES'` guard on its `SET NOT NULL` ALTERs so re-runs on a fully-migrated DB don't trip DuckDB's "cannot alter entry … entries depend on it" block (the new index counts as such an entry). Index is created by the migration only — keeping it out of `_SYSTEM_SCHEMA` preserves fresh-install ordering (CREATE TABLE → v49 ALTERs → v50 CREATE INDEX). 5. Deleted three redundant version-pinned schema asserts whose names lied about their bodies (`test_schema_version_is_42` asserting `== 49`, etc.). Canonical assert lives in `test_db_schema_version.py`, renamed to `test_schema_version_matches_constant`. * fix(db): gate v34→v38 store_entities ALTER COLUMN steps on column state CI on Linux failed `test_v17_to_v18_drops_*` after the v50 UNIQUE INDEX landed. Root cause: those tests open a DB at the full target version, seed fixtures, then reset `schema_version` to 17 and reopen — forcing the ladder to re-run from 17 → current. With the v50 index now in place, DuckDB blocks intermediate `ALTER COLUMN` steps on `store_entities` ("Cannot drop this column: an index depends on a column after it!" / "Cannot alter entry because there are entries that depend on it"), because `synthetic_name` (the indexed column) sits positionally after the columns those steps touch. Fix: convert the three SQL-list migrations that hit store_entities into defensive Python functions: - `_v34_to_v35_migrate` short-circuits when `synthetic_name` already exists (post-v49 shape — the visibility_status rebuild is moot and the DROP COLUMN would be blocked by the index). - `_v35_to_v36_migrate` gates the `visibility_status SET NOT NULL` + `SET DEFAULT` on `is_nullable='YES'` so it's a true no-op when the column is already constrained. - `_v37_to_v38_migrate` gates the `version_no SET NOT NULL` step the same way. Forward-roll path (real installs that never reset schema_version) is unchanged: the gates fire `YES` → ALTERs run. The fix only changes behavior for the "DB is already at v50 shape but version row says 17" scenario the tests construct. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>	2026-05-19 02:32:41 +02:00
David Rybar	e11f03eb60	fix(api): sample endpoint returns 500 for materialized BQ tables (#341 ) * fix(api): v2 sample endpoint returns 500 for materialized BQ tables build_sample in app/api/v2_sample.py checked only source_type == 'bigquery' before routing to _fetch_bq_sample, so materialized tables (source_type='bigquery', query_mode='materialized') attempted a live BigQuery query for data that lives locally as parquet — causing an unhandled exception and HTTP 500. Fix mirrors the existing guard already in v2_schema.py (#261): skip _fetch_bq_sample when query_mode='materialized' and fall through to the local parquet read path. The parquet is the source of truth for any materialized source regardless of source_type. Regression test test_materialized_bq_table_reads_parquet_not_bq patches _fetch_bq_sample with a sentinel, registers a materialized BQ table, calls build_sample, and asserts (a) the sentinel was never hit and (b) rows came from the local parquet. Credit @davidrybar-grpn (#341, cleaned + rebased onto post-#340 main). * release: 0.54.28 — v2 sample endpoint materialized-BQ 500 fix --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-18 22:57:32 +02:00
Vojtech	c552bf8243	feat(api): enforce API design rules via pytest + fix DELETE/status-code violations (#338 ) * feat(api): enforce API design rules via pytest + fix DELETE/status-code violations Adds tests/test_api_design_rules.py with four forward-only design guardrails that prevent new endpoints from accumulating REST debt: Rule 1 — No new verbs in URL paths (existing 28 grandfathered via allowlist) Rule 2 — DELETE must declare 204 No Content (zero allowlist entries) Rule 3 — Creator POSTs (path has GET counterpart) must declare 201/202 Rule 4 — All protected /api/* routes must declare 401 and 403 Fixes found by running the rules: - DELETE /api/admin/metrics/{metric_id}: return 204, drop redundant body - DELETE /api/memory/{item_id}/dismiss (undismiss): return 204, drop body - POST /api/memory/admin/contradictions: add status_code=201 (creates a resource) - app/main.py: _add_auth_error_responses() injected into app.openapi() at startup; declares 401/403 on all protected /api/* operations centrally, fixing the 120 routes that previously omitted these response codes from the spec. Closes #337 * fix(api): resolve CI failures — extend 204 fixes + complete allowlists - Fix remaining 6 DELETE endpoints to return 204: store entities, store entity install, marketplace curated install, marketplace plugin system flag, admin store submission, and observability view - Update all affected tests to expect 204 (removed body assertions) - Add 4 missing verb paths to _VERB_PATH_ALLOWLIST in test_api_design_rules.py - Add 2 upsert endpoints to _CREATOR_POST_ALLOWLIST - Update admin_marketplaces.html to not call r.json() on 204 DELETE * fix(tests): align 2 DELETE-asserting tests with 204 contract (post-#339 rebase) CI's test-shard (1) and (4) failures on this PR were caused by Vojta's second commit (`fix(api): resolve CI failures — extend 204 fixes`) flipping more DELETE endpoints to status_code=204 than just the two mentioned in the PR body. Two tests assert status_code==200 on the DELETE response and broke: - tests/test_admin_store_submissions.py::TestQuarantineGates::test_admin_can_delete_quarantined (DELETE /api/store/entities/{entity_id}) - tests/test_store_api.py::TestInstallCycle::test_admin_hard_delete_cascades_installs (DELETE /api/store/entities/{entity_id}?hard=true) Updated both to assert 204 with a comment pointing at tests/test_api_design_rules.py rule 2 so future reviewers can trace the contract. Verified via broader scan that no other test asserts == 200 on a .delete() response directly (4 other sites do .delete() then check 200 on a subsequent GET — those are fine). * release: 0.54.26 — API design rules (test_api_design_rules.py) + 8 DELETE endpoints flip to 204 --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-18 15:25:07 +02:00
Vojtech	c5948f26fc	fix(api): harden API surface before Swagger (issue #336 ) (#339 ) * fix(api): harden API surface before Swagger — 9 findings from issue #336 ADV-001: POST /api/sync/table-subscriptions now checks can_access() per table entry, matching the gate already on POST /api/sync/settings. ADV-002: GET /webhooks/jira/health gated behind require_admin; jira_domain removed from response to prevent anonymous info disclosure. ADV-003: GET /api/version no longer exposes commit_sha or schema_version. ADV-005: /docs, /redoc, /openapi.json now require a valid session via custom FastAPI routes (docs_url=None, redoc_url=None, openapi_url=None). ADV-006: /cli/ and /webhooks/ added to _API_PATH_PREFIXES so future auth-gated routes there return JSON 401 not an HTML redirect. ADV-007: GET /api/catalog/tables wired to CatalogTablesResponse model. ADV-008: TableSubscriptionUpdate.tables capped at max_length=500. ADV-009: GET /api/users and GET /auth/admin/tokens accept limit/offset (default 1000, max 10000); repositories updated accordingly. Tests: 11 new regression tests in TestApiHardening336; test_jira_webhooks fixture updated with seeded admin user; OpenAPI snapshot regenerated. * fix(test): update test_journey_jira health check to use admin auth after ADV-002 gate * fix(security): close /auth/bootstrap auth-bypass + BREAKING markers on ADV-002/003/005 Reviewer-flagged regression introduced by ADV-009's pagination on UserRepository.list_all(): the silent default LIMIT 1000 broke the bootstrap check at app/auth/router.py and the startup no-password warning at app/main.py — both call list_all() with no args and depend on exhaustive enumeration. On an instance with >1000 users where no password-holder lands in the email-sorted first page, [u for u in list_all() if u.get('password_hash')] becomes empty → bootstrap re-opens → an unauthenticated caller can claim admin via /auth/bootstrap. Real auth-bypass on a security-sensitive boot path. Fix: - src/repositories/users.py: list_all() restored to no-arg, returns EVERY row (no LIMIT). Comment explicitly warns against re-adding pagination here. API-surface pagination moved to a new list_paginated(limit, offset) method with its own docstring. - app/api/users.py: GET /api/users now calls list_paginated(). Existing query-param validation (limit <= 10000) preserved. Regression guards in tests/test_security.py::TestApiHardening336: - test_users_list_all_returns_every_row_no_silent_limit asserts list_all() takes no params other than self (via inspect.signature) so a future cleanup can't accidentally re-add limit/offset. - test_users_list_paginated_is_separate_method asserts the paginated variant is a distinct method, not an overload. CHANGELOG: added BREAKING markers per CLAUDE.md release discipline to three pre-existing ADV bullets that are observable breaking changes for external consumers: - ADV-002 (webhook health going from anonymous to admin-only) - ADV-003 (/api/version dropping commit_sha + schema_version) - ADV-005 (/docs, /redoc, /openapi.json going from anonymous to session-required) * release: 0.54.25 — API hardening before Swagger (ADV-001..009) + bootstrap-bypass regression fix --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-18 15:13:21 +02:00
Vojtech	cd03028776	fix(store): restore reuses prior approved verdict + admin detail surfaces content_quality (#332 ) * fix(store): restore reuses prior approved verdict; admin detail surfaces content_quality Live bug on agnes-development: entity 6ba2ee1d…'s v5 submission (third restore of v1, byte-identical to v1/v2/v4/v6) landed `blocked_llm` while the other identical-hash siblings landed `approved`. Anthropic structured output is non-deterministic — same bytes flipped `content_quality.verdict` pass↔fail across calls. Admin detail page made the failure look mysterious: only security-findings table rendered, so a content-quality-only block showed up as "No findings — model verdict was clean". Two fixes: 1. Restore endpoint reuses a prior `approved` submission's verdict when the restored bundle hash matches an existing history entry AND `reviewed_by_model` matches. Skips the LLM call, stamps the new submission with the prior verdict + `reused_from_submission_id` marker. Deterministic + saves Anthropic tokens. Gated on schedule_async_llm so guardrails-off keeps its existing path. 2. Admin detail template now renders `content_quality.issues` in its own table + adds an explicit "Blocked but no findings recorded" notice for the transient-non-determinism case + surfaces the reuse marker when present. Reuse falls back to a real LLM call when: - prior submission's reviewed_by_model doesn't match current (admin upgraded tier Haiku → Sonnet → Opus) - prior submission was guardrails-off (no reviewed_by_model) - no history entry has matching hash Tests: - TestRestoreReusesApprovedVerdict::test_restore_of_approved_version_skips_llm_and_reuses_verdict - TestRestoreReusesApprovedVerdict::test_restore_legacy_v1_falls_back_to_llm * fix(store): admin detail v# by submission_id + version switcher Three related fixes surfaced live by a user inspecting submission 47bbc1f5… on localhost where v# rendered as v1 even though current was v10. 1. Admin queue + admin detail derive submission v# by submission_id instead of hash. Pre-fix the loop matched first hash-equal entry in version_history — always v1 when bundles were byte-identical (which is the common case after the restore-reuse path). Two call sites updated: - `src/repositories/store_submissions.py:list_for_admin` (queue v# column) - `app/web/router.py:admin_store_submission_detail_page` (detail page v# chip on each section header) Same fix pattern as PR #330 for runner / override. 2. New version-switcher card on admin detail page lists every submission linked to the entity with status + reviewed_by_model + click-to-jump. Solves the user's secondary ask ("should be a way to switch different versions on the submission detail"). 3. Initial POST now backfills the v1 seed entry's submission_id right after creating the v1 submission. The helper `update_history_submission_id` existed but no production code path called it — so v1 always had submission_id=None and every "find v# for submission" lookup silently failed for v1. 171 tests green on touched surface. * release: 0.54.24 — restore reuses prior approved verdict + admin detail content_quality + v# by submission_id (Codex/Live follow-up to #330/#331) --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-16 07:12:29 +02:00
Vojtech	9eaa1dc53c	fix(store): rescan promotes non-current submission when guardrails off (Codex follow-up to #330 ) (#331 ) * fix(store): rescan promotes non-current submission when guardrails off Codex adversarial-review follow-up on PR #330: admin rescan with `guardrails.enabled: false` flipped submission status to `approved` and entity visibility to `approved` but never called `promote_to_version`. A rescan that re-approved a non-current v2+ left the entity stuck at the prior version even though the operator's intent in clicking rescan was to publish the rescanned bytes. Mirrors the inline-promote pattern in create / update / restore. The guardrails-on path is unchanged — it schedules an LLM review and promotion lands via `runner.run_llm_review` on approval. Adds tests for the byte-identical edge cases Codex flagged as under-covered by PR #330: - TestPromoteLookupByByteIdenticalBundles::test_byte_identical_v3_after_different_v2 - TestOverrideForwardOnly::test_override_byte_identical_v2_blocked_promotes_correctly - TestRescanPromotesNonCurrent::test_rescan_promotes_non_current_v2_when_guardrails_disabled * release: 0.54.23 — rescan promotes non-current submission when guardrails off (Codex follow-up to #330) --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-16 07:04:28 +02:00
Vojtech	78cd243e65	fix(store): promote-on-approve looks up version_no by submission_id (live agnes-development bug) (#330 ) * fix(store): promote-on-approve looks up version_no by submission_id Live bug observed on agnes-development: an entity had 5+ version_history rows sharing the same `hash` (user re-uploaded byte-identical bundles as v2/v4/v6 of the same skill — the LLM and inline checks happily approved each one). The runner's promote-on-approve path looked up the submission's version_no by hash: for entry in entity.version_history: if entry["hash"] == sub_hash: target = int(entry["n"]); break The loop matched the FIRST hash collision — always v1, n=1. With current=1, the forward-only `target > current` guard then skipped the promote, leaving the entity stuck at v1 even though the new submission's status flipped to `approved`. UI kept showing v1 as "current". Fix: look up by submission_id via the existing `_version_no_for_submission` helper (already used by retry / rescan / download paths). Same lookup applied in `admin_override_store_submission` which had the identical hash-match loop. Test: TestPromoteLookupByByteIdenticalBundles uploads v1 + a byte-identical v2, drives the LLM with mock-approve, asserts entity.version_no advances to 2. * fix: bundle #329 reviewer-Important follow-ups + post-merge polish Bundled with Vojtech's commit ahead of this (the promote-on-approve `version_no` lookup-by-submission_id fix) since #330 is the next release-cut PR and the four #329 follow-ups would otherwise need a standalone release-cut PR — prohibited by docs/RELEASING.md § "Release-cut belongs to the PR". Fixed: - src/usage_ask.py — SCHEMA_DIGEST + SYSTEM_PROMPT referenced the dropped `usage_plugin_daily` table. The admin `POST /api/admin/telemetry/ask` endpoint ships SYSTEM_PROMPT to the LLM, so any model-emitted SQL against `usage_plugin_daily` would fail with a DuckDB binder error post-#329 merge. Updated to describe the new v48 rollups (`usage_marketplace_item_daily` / `_window`) and rule 5 of the prompt to point at them. Internal: - CHANGELOG.md [0.54.20] section restored to its canonical content from the v0.54.20 git tag. The #329 self-merge carried 226 lines of author's pre-rebase bullets that ended up mis-attributed; the published v0.54.20 GitHub Release (FTS BM25 + batch bar) now matches the CHANGELOG section verbatim. Also fills in [Unreleased] with this PR's bullets (Fixed + Internal). - tests/conftest.py — dropped the unused `conn_with_usage_schema_and_attribution` fixture that INSERTed into the now-removed `usage_attribution_` tables. Zero callers today, but a tripwire — the first future test to request it would have failed with a binder error. - app/web/templates/marketplace.html — replaced a customer-specific token (`groupon-marketplace`) in the Most Popular sort-tiebreaker comment with a generic `<customer>-marketplace` placeholder per CLAUDE.md § Vendor-agnostic OSS. Also scrubbed an `agnes-development` reference in app/api/admin.py and src/store_guardrails/runner.py (cherry-picked from Vojtech's commit) on the same hygiene rule. release: 0.54.22 — flea-market promote-by-submission_id fix + #329 reviewer follow-ups --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-15 21:21:14 +02:00
minasarustamyan	302cf58ccd	feat(marketplace): telemetry v46 + flea inner parity + listing polish (#329 ) * feat(telemetry): marketplace item rollup refactor (schema v46) Replace the v42 attribution layer with prefix-split + live lookup against marketplace_plugins / store_entities. The v42 design had a latent bug — AttributionLookup keyed on bare skill names while Claude Code writes `<plugin>:<local>` in JSONL, so lookups never matched and usage_plugin_daily stayed empty in every deployment. Schema (v46 migration): - Drop usage_attribution_skills / _agents / _commands (mapping tables, derivable from marketplace_plugins + plugin tree). - Drop usage_plugin_daily (always empty in production due to the bug above). - Create usage_marketplace_item_daily — per-day fact (count, distinct_users, error_count), composite PK on (day, source, type, parent_plugin, name). - Create usage_marketplace_item_window — sliding-window snapshot with true cross-window distinct user counts; period_label='last_7d' refreshes every tick, 'last_30d' refreshes hourly (tracked via session_processor_state). - Mark usage_tool_daily as candidate for removal (no product-UI consumer). Attribution flow: - MarketplaceItemLookup replaces AttributionLookup. Preloads marketplace_plugins.name + store_entities.name into memory once per UsageProcessor tick, then per-event splits identifier on ':', matches prefix, writes resolved source / parent_plugin into usage_events. agnes-store-bundle prefix routes to flea entities. Slash commands with `plugin:` prefix count as type='skill' in rollup. API: - BREAKING: MarketplaceItem.unique_users_30d renamed to distinct_users_30d (now a true distinct count from the window snapshot, not sum-of-daily). - InnerDetailResponse gains a telemetry field — invocations_30d + distinct_users_30d surfaced on curated inner skill / agent detail pages. - Card chip hidden pending UX finalisation; data stays in the response. Backfill: scripts/backfill_marketplace_rollup.py — one-shot rebuild over historic usage_events after deploy, idempotent. USAGE_PROCESSOR_VERSION bumped 4 → 5 so the reprocess loop re-attributes existing events to the new source/ref_id semantics on the next tick. Tests rewritten: test_session_processor_usage, test_usage_rollups, test_marketplace_telemetry, test_api_admin_usage_reprocess, test_db_schema_version, test_home_stats, test_schema_v42_migration. New: test_backfill_marketplace_rollup. * fix(marketplace): refresh Most Popular on search + category changes `loadMostPopular()` early-exits when `state.q` or `state.category` is set, but the search + category handlers only called `loadItems()` — so once the section was visible, typing a query or filtering by category didn't re-run the hide check and the cards stayed on screen out of scope. Tab + sort handlers already chained the call. Add the call to runSearch + category pill click handlers (All + per-category) so the visibility contract holds for every state mutation that can flip the early-exit condition. * feat(marketplace): All-plugins section + 7-day Most Popular Listing layout: - Always-visible "All plugins" / "All items" / "Your stack" section header (label swaps per tab) wrapped in `#mp-all-section` so its margin-collapse mirrors the sibling `#mp-popular-section` and the spacing from the filter row stays consistent in both layouts. - Sort dropdown moved from the filter row into the All-* header, pinned right via `margin-left: auto`. Anchored to its section so the relationship between sort + grid is obvious. - `.mp-section-header` gets `min-height: 32px` + `align-items: center` so the bare-text Most Popular row matches the dropdown-bearing All-* row. - `.mp-section-header` margin tightened 24px → 20px on top. Most Popular: - Capacity reduced 8 → 4 cards. - Now reflects a 7-day window (was 30-day). Backend surfaces `invocations_7d` + `distinct_users_7d` on `MarketplaceItem` alongside the existing 30d fields; the loader pulls a wider page (server still sorts by 30d) and re-sorts + filters client-side on `invocations_7d > 0` so the strip stays "hot right now". - Section label updated to "Last 7 days". - Section now renders on both `curated` and `flea` tabs (was curated-only). Hidden on `my` and whenever search / category filter is active. Refresh hooks wired into search + category click handlers so visibility flips immediately on state change. Backend (`_load_invocation_stats`): - Single SELECT pulls both `last_30d` and `last_7d` rows from `usage_marketplace_item_window`; the result dict carries invocations + distinct_users for both windows. - Trend (recent_7 vs prior_7) kept on the daily fact table so it stays independent of the window snapshot's freshness. * feat(marketplace): Most adopted sort + hide Trending when no trend data Add a fourth sort option to the All-items dropdown — "Most adopted (30d)", keyed on `MarketplaceItem.distinct_users_30d` (true 30d distinct user count from `usage_marketplace_item_window`). Protects the listing from power-user skew that `most_used` is susceptible to: one user × 100 invokes can't beat 10 different users × 1 invoke under adoption sort. Hide Trending option when the response has no trend data. User reported `sort=trending` returning an empty grid because every plugin's `trend_pct` was None (prior-week threshold of >= 3 invocations didn't clear anywhere). Empty grids on a user-selected sort are worse UX than just not offering the sort — surface what works, hide what doesn't. Backend (`app/api/marketplace.py`): - `_apply_sort` gains a `most_adopted` branch (DESC distinct_users_30d, ties by name ASC). - `sort` Literal extended. - `ItemListResponse.available_sorts` lists the sort keys the UI should expose for this response. recent/most_used/most_adopted always; trending only when at least one item in the tab's stats carries a non-null trend_pct. - `_available_sorts(stats_dicts)` helper centralises the rule — curated and flea branches pass one stats dict, my-tab passes both (option is available when either source has trend data). Frontend (`app/web/templates/marketplace.html`): - New `<option value="most_adopted">Most adopted (30d)</option>` between Most used and Trending. - URL state allowlist extended so `?sort=most_adopted` round-trips. - `applyAvailableSorts(available)` runs after each list fetch: hides options not in the response's available_sorts; if the user is on a now-unavailable sort, resets to 'recent' and re-fetches. Search-mode fan-out unions availability across the curated + flea responses so a hit on either side keeps the option visible. * feat(marketplace): funnel chip on cards + deterministic Most Popular sort Card chip — funnel telemetry between description and footer: [stack-icon] N installed · [user-icon] N active · [bolt-icon] N calls · ↑/↓ N% - stack_count (new MarketplaceItem field): for curated it's COUNT() on user_plugin_optouts (post-v28 row PRESENCE = subscribed; system plugins are fanned out to every user via fanout_system_for_user so the count includes them naturally). For flea it reuses the existing store_entities.install_count (bumped on install/uninstall). - distinct_users_30d (existing) — active users in the 30d window. - invocations_30d (existing) — call volume. - trend_pct (existing) — week-over-week, both directions: green ↑ / red ↓, magnitude only (sign in the arrow). Hidden when null. Backend additions in app/api/marketplace.py: - MarketplaceItem.stack_count field. - _load_curated_stack_counts() — one SELECT per render, GROUP BY (marketplace_id, plugin_name). Wired into the curated + my-tab branches; flea reads install_count off the entity row directly. Frontend (app/web/templates/marketplace.html): - Heroicons solid 24×24 inlined (one helper per icon, all fill="currentColor" so per-segment colour tokens apply): rectangle- stack (mirrors the My Stack tab icon), user, bolt, arrow-trending- up/down. - Per-segment colour: installed=amber #F59F0A (My Stack accent), active=green #0e9b6a, calls=orange #f97316. Text stays neutral so the chip still reads as metadata, the leading glyph carries the visual cue. Trend pill keeps the full-segment green/red colour. - Zero state: chip hidden when stack_count == 0 AND invocations_30d == 0 — brand-new cards aren't visually penalised by a "0·0·0" row. - Tooltips on every segment via title="…" so hover explains the number's meaning to anyone uncertain about the icon. Most Popular section — deterministic ordering: Previously sorted by invocations_7d DESC with no tie-breakers, so several cards with identical 7d call counts would swap places on refresh (JS stable sort fell back on backend order, and the backend's own tie-breaker for `most_used` was just name ASC — six `grpn` plugins from six test marketplaces collapse to the same name and became indeterminate via list_with_filters' created_at order). New cascading hierarchy (chosen primary now matches what "most popular" really means — wide adoption, not power-user volume): 1. distinct_users_7d DESC ← adoption / social proof 2. invocations_7d DESC ← volume at equal adoption 3. distinct_users_30d DESC ← broader adoption fallback 4. invocations_30d DESC ← broader volume fallback 5. name ASC ← deterministic textual order 6. marketplace_slug ASC ← splits duplicate plugin names across marketplaces Six levels guarantee any two items end at a different sort key, so the strip is stable across refreshes. fix(marketplace): unify Most Popular on 30d + right-align installed chip Most Popular section was sorting on the 7d window while its cards rendered 30d numbers — header label promised one thing, cards showed another. Unified everything on 30d so a card means the same data everywhere on the page. - Dropped the "Last 7 days" meta from the Most Popular header. - Sort cascade now starts on distinct_users_30d, then invocations_30d, with 7d adoption/volume as recency-aware fallbacks before the name + marketplace_slug deterministic tail. Six levels guarantee identical sort keys never produce indeterminate order across refreshes. - Filter switched from invocations_7d > 0 to invocations_30d > 0 to match the new horizon. - Most Popular now only renders on page 1 of the listing. Past initial discovery, a top-of-list popularity strip on page 2+ would shadow the results the user paged into. Pager click handler refreshes the section so navigating back to page 1 re-mounts it. Chip layout — split engagement vs adoption visually: [user] N active · [bolt] N calls · [↑/↓] N% [stack] N installed └────────── LEFT (time-bounded engagement) ────┘ └── RIGHT (all-time) ──┘ - Installed (stack_count) is all-time, decremented on uninstall. Alone it says little ("12 people installed it") without the engagement context next to it ("…but did anyone actually use it?"). Visually separating the two groups makes that distinction obvious — left group answers "is it used", right answers "does anyone have it". - Implemented via flex with margin-left:auto on .seg-installed so installed drifts to the trailing edge. - Installed tooltip now reads "Currently installed by N users" — the count is a real-time net (uninstall drops it), and saying "currently" makes that explicit. Helps when a card shows 0: signals "nobody has this in their stack right now", not "data missing". * feat(plugin-detail): telemetry chip in hero, derived rows in sidebar Surface the same telemetry funnel the listing card carries on the curated plugin detail page, so clicking through from /marketplace keeps a single mental model — figures match, semantics match. The detail sidebar drops the two raw numbers that used to live there (Invocations 30d / Users 30d — duplicated by the chip now) and replaces them with two derived signals only the daily series can provide: Active days + Last used. Backend (app/api/marketplace.py): - PluginDetailResponse.stack_count — curated reads via _load_curated_stack_counts(), flea reuses install_count. Frontend treats both sources uniformly. - _build_telemetry() always returns a dict (never None). Frontend decides chip visibility from stack_count + invocations_30d the same way the listing card does. daily_series is always 30 entries (zero-padded) so "Active days" and "Last used" derivations on the sidebar are trivial array filters. Frontend (app/web/templates/marketplace_plugin_detail.html): - New .hero-telemetry slot at the bottom of the hero meta column, between the pills row and the action buttons. Renders the four funnel segments — active · calls · trend · installed — joined by ` · `. No left/right split: the hero has space, so a single coherent metadata strip reads cleaner than the card's split layout. - Heroicons solid inlined (user / bolt / arrow-trending-up,-down / rectangle-stack) recoloured against the dark hero — icons in lighter tokens (mint #6ee7b7, peach #fdba74, cream #fde68a), trend pill keeps the saturated green/red because direction-coding earns its own colour. - Tooltip on installed reads "Currently installed by N users" — the count is a real-time net (drops on uninstall), and "currently" makes that explicit when a card shows 0. - fmtNum helper added so 1.2k / 14M renderings match the card's format exactly. - Sidebar swap: Invocations + Users rows removed, replaced by Active days → "N of 30" Last used → fmtRelative of the latest non-zero day Both derived from telemetry.daily_series — engagement consistency + recency, neither of which the hero chip exposes on its own. * feat(item-detail): telemetry chip in hero for curated skill/agent Bring the funnel chip the plugin detail page got in 4cf38d40 to the curated inner skill/agent detail page — clicking through from the listing card now keeps the same metadata strip from grid to plugin page to inner item page. Backend (app/api/marketplace.py): - _load_inner_item_stats() rewritten: * always returns a dict (never None) so the frontend can decide chip visibility client-side, same contract as _build_telemetry * adds trend_pct, computed the same way as plugin level (recent_7 vs prior_7 from usage_marketplace_item_daily, ≥3 prior-week threshold) * adds daily_series (30 entries, zero-padded) so the sidebar can derive Active days + Last used - InnerDetailResponse.parent_stack_count — new field. Skills/agents don't have a per-item subscription model, so the hero shows the parent plugin's stack count under a "Plugin:" prefix. The funnel: "12 installed plugin → 2 actually use this skill". - curated_skill_detail + curated_agent_detail handlers load _load_curated_stack_counts() once and pass the parent's value. Frontend (app/web/templates/marketplace_item_detail.html): - New .item-detail .hero .hero-telemetry slot beneath the badges row. CSS mirrors plugin-detail's colour tokens (mint/peach/cream Heroicons solid + saturated trend pill) so the two surfaces read as one visual family. - Installed segment uses a "Plugin:" label rendered with reduced opacity to signal the metric describes the parent, not the item itself. Tooltip: "Parent plugin (<plugin_name>) currently installed by N users". - Sidebar Invocations + Users rows removed (chip carries them). Active days + Last used derived from telemetry.daily_series replace them; only rendered when activeDays > 0 so a brand-new skill doesn't show "0 of 30" / "Last used —". - "Type" row dropped from the sidebar — duplicates the hero badge. - fmtNum helper added (matches listing card + plugin detail). Plugin detail (app/web/templates/marketplace_plugin_detail.html): - Hero "Curator: …" line removed. The Details sidebar already carries that info; duplicating it under the h1 was visual noise. - Sidebar "Owner" row renamed to "Curator" — for curated plugins it's a person who curates inclusion in this Agnes instance, not the upstream code owner. "Owner" was a hold-over label. * feat(item-detail): unify hero with plugin detail — pills + breadcrumb + cleaner sidebar - Inner skill/agent hero now uses the same `.pills` / `.pill.cat / .curated / .flea / .muted` class names + CSS as the plugin detail page; the only item-only addition is `.pill.type` (Skill / Agent uppercase, plugin detail has no kind axis). - Hero `Updated` moved out of the meta-row into a muted pill (mirrors the plugin detail hero), removed from the Details sidebar to avoid duplication. - Details sidebar slimmed: dropped Marketplace, Path, Updated rows; Parent plugin now shows the curator-friendly display name (`parent_display_name \|\| manifest_name \|\| slug`) instead of the slug. - Breadcrumb extended to full path: Marketplace > <marketplace_name> > <plugin display name> > <self>, mirroring the plugin detail breadcrumb. - Backend: new `InnerDetailResponse.parent_display_name` field, populated via `_curated_plugin_enrichment` from marketplace-metadata.json — same source plugin detail hero already uses. * feat(marketplace): flea inner skill/agent detail + breadcrumb polish - Flea inner skill/agent detail page parity with curated: * GET /api/marketplace/flea/{id}/skill/{name} + /agent/{name} returning InnerDetailResponse (mirror of curated_skill_detail). * /marketplace/flea/{id}/skill\|agent/{name} web routes that render marketplace_item_detail.html with source='flea' + innerName context. * Frontend apiURL grows a third branch for flea-inner; breadcrumb grows to 4 segments (Marketplace > Flea Market > <plugin display name> > <self>) when innerName is set. * Telemetry attribution: MarketplaceItemLookup resolves <flea_plugin>:<inner> prefixes to (source='flea', parent_plugin=<plugin name>) so nested invocations land in the same rollups curated nested skills use. USAGE_PROCESSOR_VERSION bumped 5 -> 6 so the reprocess loop re-attributes historic events. - Breadcrumb 2nd segment is now a generic clickable "Curated Marketplace" / "Flea Market" link to /marketplace?tab=... instead of the opaque per-instance marketplace_name. Applied on both plugin detail and inner item detail. - Inner item hero telemetry chip works for both sources: installedCount branches on parent_stack_count (curated) vs install_count (flea), installed segment drops the "Plugin:" prefix for flea standalone / inner items. - Updated row dropped from Details sidebar on item detail — the hero pill already carries the value, sidebar row was duplicate. * feat(item-detail): block stack-install on flea inner items (mirror curated) Inner skills/agents nested inside a flea plugin can no longer be added to a user's stack on their own — adoption only happens at the plugin level, same rule curated nested items have followed since launch. - Hero action: when innerName is set (curated nested OR flea nested), render "Open parent plugin →" link + helper text instead of the install/remove buttons. Flea standalone entities (no innerName) keep the normal install UX. - Meta-row: same branch now serves curated + flea inner — "part of <parent plugin display name> · by <author>" with the parent link pointing at the right detail page per source. No API gate change needed: POST /api/store/entities/{id}/install only accepts existing entity ids (plugin-level), inner items have no entity id of their own so the endpoint cannot target them directly. * feat(marketplace): telemetry chip on inner cards + fix flea hero chip visibility Inner skill/agent cards on the plugin detail page now carry the same four-segment funnel chip the marketplace listing cards show (N active . N calls . trend . N installed), for both curated nested skills and flea nested skills. Plus two fixes that were keeping the hero chip hidden on flea plugin / flea inner detail pages. - Backend `_load_inner_items_stats_by_parent(conn, source, parent_plugin)` bulk loader: one query per plugin against usage_marketplace_item_window + one against _daily, returning {(name, type): stats}. Avoids N+1 per-card lookups. - `InnerItemSummary` gains invocations_30d / distinct_users_30d / trend_pct / parent_stack_count fields. `curated_detail` and `flea_detail` (in the entity.type=='plugin' branch) enrich the skills / agents lists after the existing cover-photo enrichment loop. - `marketplace_plugin_detail.html`: new `.plugin-detail .inner-card .inv-chip` CSS lifted from marketplace.html with the listing-card rules, new buildInnerCardChip() helper, buildCardSection appends the chip to each card body. Same gate as the listing card (hidden on parent_stack==0 && calls==0). - fix(flea): flea_detail forgot to populate PluginDetailResponse.stack_count from entity.install_count (listing card does this on line 851; detail endpoint didn't). Hero chip gate `stackCount===0 && calls===0` then always hid the chip even when the entity had installs. Now mirrors listing card semantics: stack_count == install_count for flea. - fix(flea inner): renderInnerHeroTelemetry was reading `d.install_count` for any non-curated source. InnerDetailResponse has no install_count field — it has parent_stack_count (populated server-side from the parent flea plugin's install_count). Gate + label now read parent_stack_count for both curated nested AND flea nested scenarios; install_count remains the flea standalone path. fix(marketplace): Owner label on flea + parent-centric sidebar for flea inner - Plugin detail Details sidebar — authorship row label now tracks the source: curated bundles get `Curator` (existing behaviour), flea bundles get `Owner`. The `owner_todo` reminder placeholder stays on the curated branch only; flea falls through silently. - Inner item detail Details sidebar — flea-inner (skill/agent nested inside a flea plugin) now shares the curated nested layout: Parent plugin / Bundle size / Active days / Last used / Owner. Drops the flea-standalone shape's `Category`, `Version`, `Installs`, `Released` rows that didn't apply to a nested item. Active days + Last used were already wired (telemetryRows) — they just weren't on the flea-inner branch. * fix(tests): bump SCHEMA_VERSION assertions 47 -> 48 post-rebase The marketplace telemetry migration was renamed _v46_to_v47 -> _v47_to_v48 during the rebase onto main (collision with #326 FTS BM25 migration that took the v47 slot). Two test files still asserted the pre-rebase value: - tests/test_home_stats.py::test_schema_version_constant_is_46 (CI red) - tests/test_schema_v46_migration.py::test_schema_version_is_46 Renames the helper fn name + bumps the assertion. The other two test files (test_db_schema_version.py, test_schema_v42_migration.py) were already updated in the rebase resolution. * fix(telemetry): _build_telemetry returns None when invocations_30d == 0 The follow-up commit that introduced the always-return-dict shape broke the test contract from the original v46 PR (commit b603e998): tests/test_marketplace_telemetry.py::TestDetailTelemetry:: test_detail_endpoint_telemetry_absent_when_no_data AssertionError: assert {'daily_series': [...], ...} is None Both `PluginDetailResponse.telemetry` and `InnerDetailResponse.telemetry` are declared `Optional[Dict] = None`, the frontend renders are None-safe (`d.telemetry \|\| {}` guard + `if (!d.telemetry \|\| ...)` on daily_series), so dropping the dict on zero activity is the cleaner default. * release: 0.54.21 — marketplace telemetry refactor (schema v48) + flea inner detail parity + listing UX polish --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com> Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-15 20:58:03 +02:00
ZdenekSrotyr	9e948abc9c	release(0.54.18): Curated Memory restructure + per-user Dismiss + bundled adversarial-review fixes (#316/#320/#322) (#324 ) * feat(web): Curated Memory restructure + per-user Dismiss + filter-state utility Squashed from cvrysanek/zsrotyr's 4-commit PR branch + rebased onto current main + CHANGELOG bullets spliced into [Unreleased] (preserves existing #316/#320/#322 entries that landed on main since the branch was authored). Routes + access: - /corporate-memory now user-facing (get_current_user), in primary nav next to "Data Packages" — same gate as /api/memory/. - /admin/corporate-memory is the new admin review queue location (was /corporate-memory/admin); reached via Admin dropdown. Template renamed: corporate_memory_admin.html → admin_corporate_memory.html. Visual chrome: - Both pages migrate to shared _page_hero.html blue hero band. Per-user Dismiss (new feature, schema v46): - knowledge_item_user_dismissed(user_id, item_id, dismissed_at) + index. - POST /api/memory/{id}/dismiss + DELETE (idempotent). - Mandatory items can never be dismissed — enforced at 2 layers. - GET /api/memory: hide_dismissed=false default + dismissed_by_me flag. - GET /api/memory/bundle: always excludes dismissed for the caller. - UI: Dismiss/Undismiss button per item (hidden for mandatory), gray-out + line-through for dismissed rows, Hide-dismissed toggle. Admin edit modal: - Category as <select> + "Add new category…" reveal. - Audience as <select> with (unset)/all/group:<name> from RBAC. - Tags: full tag-input widget (pills, ×-remove, Backspace pop, Enter/comma to add, ↑/↓ typeahead from EXISTING_TAGS). Bulk-edit modal pickers (closes #128): - Move-to-category / Add-tag: <select> + add-new. - Set-audience: <select> (no more typo-able 'gourp:eng'). - Remove-tag: closed-set picker. FilterState utility: - app/web/static/js/filter-state.js — save/load/clear/bindInputs for per-page localStorage filter state. Adopted on /corporate-memory. E2E verified live on a real VM through the API + browser flow. release: 0.54.18 — Curated Memory restructure + 4 adversarial-review fixes Bundles together: - #316 fix(store): surface review failures + harden publish gate (BREAKING fail-CLOSED guardrail, override v2+ promote, restore guard, retry/rescan staged-bundle, banner widening, LLM truncation retry) - #320 fix(store): C2 bundle export RBAC + H2 per-entity write lock + H3 update_status compare-and-swap with bg_verdict_skipped audit - #322 fix(store): M1 prompt sentinel filename escape + M2 atomic promote_to_version helper + L1 admin forensic download per-version - #324 Curated Memory restructure + per-user Dismiss + FilterState utility Bump from 0.54.17 → 0.54.18 (patch — pre-1.0 policy: every cycle is patch).	2026-05-15 18:51:05 +02:00
Vojtech	bb703517c9	fix(store): close 2 medium + 1 low adversarial-review findings (#322 ) Three remaining findings from Codex's adversarial review of PR #316 (issue #318), plus a pre-existing version-numbering bug surfaced while fixing the atomic-promote ordering. M1 — Prompt sentinel escape now covers file PATHS, not just file BODIES. Pre-fix the per-file `--- FILE: {rel} ---` header inlined the untrusted relative path unescaped. A ZIP whose relative path concatenated to `</bundle>` (a `<` directory plus a `bundle>` child) could forge the trust-boundary close tag from inside the path slot and inject apparent system instructions after the boundary. Same `_escape_sentinels` helper now runs on both rel and body. M2 — Live-bundle swap + DB promote is now atomic-ish. The runner / override / inline-promote paths previously called `repo.promote_version(...)` then `_swap_live_to_version(...)`. A missing `versions/v<N>/plugin/` made the swap silently return False — leaving the DB ahead of live. New `promote_to_version` helper in `app/api/store.py` swaps FIRST (with the existing staging → backup → live rename chain) and only advances the DB row after the on-disk swap succeeds; rolls live back to prior on DB write failure. While wiring up M2, the strict source check exposed a pre-existing bug: `update_entity` and `restore_version` derived `new_version_no = entity.version_no + 1`. Under deferred promotion that's wrong — entity.version_no stays at the last approved version while version_history grows with blocked / pending entries. Subsequent PUTs would overwrite an in-flight blocked v2 dir's bytes, then the runner's hash-match promotion in `runner.run_llm_review` would load bytes that didn't match the recorded submission hash. Fixed by deriving from `max(version_history.n) + 1`. L1 — Admin forensic download now serves STAGED bundle bytes per submission, not live. Pre-fix downloading a blocked v2 streamed live's prior approved v1 bytes — admins reviewing whether to override saw the wrong bytes. Resolves staged `versions/v<N>/plugin/` via `_version_no_for_submission`; falls back to live for legacy rows without history linkage. Tests: - test_filename_with_bundle_sentinel_is_escaped - TestAtomicPromote::test_missing_source_dir_does_not_advance_db - TestAdminBundleDownload::test_download_v2_blocked_returns_staged_bundle_not_live	2026-05-15 17:56:09 +02:00
Vojtech	6fb11a137b	fix(store): close 1 critical + 2 high adversarial-review findings (C2/H2/H3 from #318 ) (#320 ) * fix(store): close 1 critical + 2 high adversarial-review findings Three findings from Codex's adversarial review of PR #316 (issue #318). C2 — `/api/store/bundle.zip` leaked quarantined entities. The export endpoint called `repo.list(...)` with no `visibility_status` filter, so any authenticated non-admin could download pending / blocked v1 bytes — bypassing the publish gate. Mirrored the browse-listing gate: non-admin sees only `approved` (plus their own non-approved entries via `include_owner_id`); admins skip the filter. H2 — concurrent PUTs on the same entity could both pass the `latest_for_entity` pending gate. The `update_entity` and `restore_version` handlers now wrap their critical section in a per-entity asyncio.Lock (`_hold_entity_write_lock`). Single-process deployments are now serialized; multi-worker deployments still have a residual window (tracked in issue #318). H3 — `StoreSubmissionsRepository.update_status` blindly overwrote any current status. A late BG-task LLM verdict could clobber an `overridden` row back to `approved` / `blocked_llm` after the admin had already force-published. Added compare-and-swap on terminal statuses (`approved`, `overridden`, `blocked_inline`); callers that legitimately need to overwrite (admin rescan etc.) pass `allow_terminal_overwrite=True`. Returns bool indicating whether the write landed; BG callers no-op on terminal rows. Tests: - TestStoreBundle::test_bundle_zip_filters_quarantined_for_non_owner - TestStoreBundle::test_bundle_zip_owner_sees_own_pending - TestStoreBundle::test_bundle_zip_admin_sees_all - TestConcurrentPutSerialization::test_per_entity_lock_serializes - TestConcurrentPutSerialization::test_per_entity_lock_does_not_serialize_across_entities - TestBgTaskIdempotency::test_late_verdict_does_not_clobber_overridden - TestBgTaskIdempotency::test_explicit_allow_terminal_overwrite_works * review fix: runner.run_llm_review honors update_status CAS bool Codex's CAS in update_status closes the DB-level race correctly, but runner.run_llm_review was still discarding the new bool return on both its `approved` and `blocked_llm` branches. When the CAS no-op'd (submission already at terminal status — most commonly an admin override fired mid-review), the runner kept running the downstream cascade: - set_visibility_if_pending (no-op on approved, but still ran) - promote_version + _swap_live_to_version (forward-only check mitigated worst case) - update_flea_attribution - audit.log(action="store.submission.approved" / "blocked_llm") — this is the operator-visible damage: the audit trail would show a verdict that contradicts the row's actual `overridden` status. Fix: capture the bool, skip the cascade on no-op, log a single `store.submission.bg_verdict_skipped` audit row instead. Mirrors the existing `superseded_reason` path the runner already has for the archive-during-review case (TestPRReviewFixes:: test_bg_verdict_skipped_when_admin_archives_during_review). Test: TestBgTaskIdempotency::test_runner_late_verdict_logs_skipped_not_approved sets up the v1-approved + v2-pending + admin-override sequence, fires run_llm_review directly with a mocked "approved" verdict, asserts row stays overridden AND audit has bg_verdict_skipped AND audit does NOT have a contradictory approved entry. CHANGELOG H3 bullet expanded to acknowledge the bg_verdict_skipped audit-row behavior — operator reviewing the queue now sees dropped verdicts explicitly rather than via row-vs-audit contradiction. --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-15 17:45:43 +02:00
Vojtech	a694a30a5e	fix(store): surface review failures + harden publish gate (#316 ) * fix(store): surface review failures + harden publish gate Four independent fixes to the flea-market submission pipeline, all surfaced by an admin upload that landed at status='approved' without an LLM review. 1. LLM truncation no longer pins submissions in review_error. - Raised MAX_RESPONSE_TOKENS 2500 → 6000 in llm_review.py - Added one-shot retry-with-doubled-budget in anthropic_provider.py (capped at 4× initial) 2. Flea detail page surfaces the latest submission's failure verdict even when a previously-approved version is still serving (deferred-promotion path). The _quarantine_banner gate widened from `visibility != approved` to also fire on `blocked_inline / blocked_llm / review_error`, with copy that distinguishes the v2+ edit case ("Latest edit failed review — previously approved version (vN) keeps serving") from the initial-upload quarantine wording. 3. Restore button + endpoint no longer allow restoring a version that was never approved. Added StoreEntitiesRepository.get_with_version_approvals joining store_submissions, gated the UI button on submission_status in ('approved', None), rendered status pills for non-restorable rows, and added a 400 version_not_approved guard in POST /restore. 4. BREAKING (operator-facing): publish gate is now fail-CLOSED on misconfig. The previous get_guardrails_enabled() silently fell back to "disabled, auto-approve everything" when guardrails.enabled=true in YAML but no ANTHROPIC_API_KEY was in env. Split into: - get_guardrails_enabled() (intent — YAML) - get_guardrails_llm_provider_ready() (readiness — env) Three-state matrix: enabled=false → auto-approve (unchanged) enabled=true + ready=true → normal pipeline (unchanged) enabled=true + ready=false (NEW) → submissions hold at pending_llm awaiting admin retry or override (was: silent auto-approve) Admin "Retry review" eligibility broadened to include pending_llm. Boot-time WARNING banner surfaces the misconfig in app/main.py. docs/STORE_GUARDRAILS.md updated with the three-state matrix. Operators relying on the auto-fallback for local-dev no-LLM setups must now explicitly set `guardrails.enabled: false` in instance.yaml. Tests: 4623 passed. Added TestPublishGateFailClosed (4 tests) and TestRestoreVersion::test_restore_rejects_* (3 tests). conftest.py adds an autouse fixture defaulting guardrails OFF so legacy tests don't need to know about the new toggle. * fix(store): admin override promotes v2+ edits to current The override handler at app/api/admin.py:3708 only flipped submission status → 'overridden' and entity visibility → 'approved'. Under the v37+ deferred-promotion model that's insufficient for v2+ edits / restores: the new bundle sits in versions/v<N>/plugin/ and the entity row stays at the prior approved version_no + hash + on-disk live bundle. Installers kept getting the OLD bytes the admin had just intended to replace. Mirror the runner.run_llm_review auto-approval branch: look up the submission's version_hash in entity.version_history, and if its `n` differs from entity.version_no, promote_version + _swap_live_to_version. Initial v1 overrides are unaffected — the loop finds n=1 == version_no and skips promotion. Tests: - test_override_v2_edit_promotes_to_current: stage v1 approved + v2 blocked_llm; override the v2 sub; assert entity.version_no=2, entity.version flips off the v1 hash, and the live plugin/ dir mirrors versions/v2/plugin/. - test_override_v1_initial_upload_no_promote: regression guard so the promote loop doesn't accidentally bump a v1 override. Audit log gains a promoted_to_version_no field on the override action. * fix(store): retry/rescan review staged bundle; override forward-only Two adversarial-review findings from a Codex pass on the publish-gate work. C1. Admin retry + rescan were passing live `plugin/` to the LLM. For a v2+ submission held at `pending_llm` / `blocked_llm` / `review_error`, live still holds the prior approved version's bytes — so the LLM reviewed the WRONG bytes, and the runner's hash-match promotion in `run_llm_review` would then advance the entity to staged bytes that were never actually reviewed. Resolve the staged `<entity>/versions/v<N>/plugin/` from the submission's `version_history` entry, with a fall-back to live for legacy pre-v37 rows that never seeded a versions/ dir. Helpers `_submission_plugin_dir` and `_version_no_for_submission` added to `app/api/store.py` so override / retry / rescan share one path. H1. Override's promote loop used `target != current`, which would silently demote the live bundle when admin overrode a stale v2 submission while v3 was already approved + live. Changed to `target > current` so override flips status + visibility on the row regardless, but on-disk promotion only fires forward. Same `>` defensive guard applied in `runner.run_llm_review` so a late LLM verdict racing with a newer approval can't demote either. Tests: - TestAdminRetryReviewsStagedBundle::test_retry_v2_blocked_passes_staged_dir_not_live - TestAdminRetryReviewsStagedBundle::test_rescan_v2_blocked_passes_staged_dir_not_live - TestOverrideForwardOnly::test_override_stale_v2_does_not_demote_when_v3_current * review polish: CHANGELOG drift, override eligibility, defensive copy Three small additions on top of the retry/rescan staged-bundle fix: 1. CHANGELOG: the PR's bullets had drifted into the released [0.54.17] section during rebase (context-match landed them next to already-released content). Moved them up to [Unreleased] where they belong; [0.54.17] now holds only what was actually released (refresh-marketplace ls-remote, /me/activity hero, CI sharding + workflow polish). 2. app/api/admin.py: admin override eligibility now accepts pending_llm alongside blocked_inline + blocked_llm + review_error. Closes a UX gap from the new fail-CLOSED behavior: under enabled-but-not-ready, a known-good submission would otherwise sit indefinitely until the admin set credentials AND clicked Retry. Override already routes through version_history (and is now forward-only on promote), so it stays safe for v2+ deferred- promotion submissions. 3. src/repositories/store_entities.py: get_with_version_approvals defensively copies each version_history entry before annotating with submission_status. self.get() re-parses JSON each call today so this is belt-and-suspenders against any future caching layer leaking the annotated key into a subsequent plain get() call. Tests: 112 passed (focused on test_store_entity_versions + test_admin_store_submissions, covering the retry/rescan staged- bundle fix the author shipped + this polish). --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-15 15:52:07 +02:00
Vojtech	fbe756685b	fix(api): redirect unauthorized browser requests to login for initial workspace zip (#315 ) * fix(api): redirect unauthorized browser requests to login for initial workspace zip * fix(api): import Request and RedirectResponse in initial_workspace router FastAPI was treating `request` as a required query parameter because `Request` was missing from the fastapi import, causing 422 on GET /api/initial-workspace.zip. `RedirectResponse` was also missing (used for browser redirect to /login). * review fixes: CHANGELOG + comment + 2 edge tests - CHANGELOG.md: add [Unreleased] ### Fixed bullet per project rule. - app/api/initial_workspace.py: comment explaining why this /api/* endpoint intentionally opts out of the _API_PATH_PREFIXES "never redirect /api/" contract in app/main.py, and why matching only `text/html` (not `/`) mirrors _wants_html()'s rationale. - tests: add Accept: /* (curl default) and empty-Accept cases — both lock in 401, defending the curl-tooling-must-keep-getting-401 contract the comment now documents. --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-15 15:18:39 +02:00
ZdenekSrotyr	d55c8a3c33	feat(web): consolidate the personal /me/* surface — /me/activity + /me/profile (#304 ) Consolidates the scattered per-analyst pages into /me/activity (usage analytics) and /me/profile (account hub). /me/stats and /profile/sessions 301-redirect; /profile, /me/debug, /tokens are removed with every internal link repointed. Includes an XSS fix in the /me/activity page hero, the user_id-keyed session-lookup alignment, and the v0.54.15 release cut. Co-developed by @ZdenekSrotyr and @cvrysanek.	2026-05-14 21:29:51 +02:00
ZdenekSrotyr	3e19caa975	fix(security): RBAC filter uses stable user_id instead of mutable email local-part (#293 ) (#299 ) * fix(security): RBAC filter for agnes_sessions matches both email local-part and user_id The upload API (POST /api/upload/sessions) stores session files under user_sessions/{user_id}/ (UUID), while the session collector uses the OS username (email local-part). The session pipeline writes the directory name verbatim into usage_session_summary.username, so the column can contain either value depending on the ingestion path. The RBAC filter in build_filter_clause previously only matched the email local-part, missing sessions uploaded via the API. The fix adds an OR condition so non-admin users see rows where username matches either their email local-part or their user_id. Closes #293 Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com> * fix(security): RBAC filter uses stable user_id instead of mutable email local-part Closes #293 Previous fix used OR condition matching both email local-part and user_id in the username column. This was fragile: email changes would break filtering. This commit introduces a dedicated user_id column populated by the session pipeline via resolve_user_id(), and switches the RBAC filter to use it exclusively. Changes: - Schema v45: add user_id column to usage_session_summary and usage_events - UsageProcessor: accept and store user_id in both tables - runner.py: resolve_user_id() maps directory name to users.id UUID (exact match for UUID dirs, email LIKE for local-part dirs) - INTERNAL_TABLES: agnes_sessions/agnes_telemetry filter on user_id column - build_filter_clause: simplified to WHERE user_id = '<uuid>' (no OR) - me.py/admin_user_sessions.py: query by user_id OR username for backward compatibility during transition - USAGE_PROCESSOR_VERSION bumped 2→3 to trigger reprocessing/backfill - Tests updated: 27 pass including new email-change resilience test Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com> * fix(tests): bump schema version assertions 44→45 Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com> * fix(docs): correct resolve_user_id docstring, add TypeError comment Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com> * fix(security): address review — backward-compat OR, LIKE escaping, narrower TypeError Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com> * fix(security): address code review — eliminate TypeError hack, add resolve_user_id tests Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com> * fix(db): create user_id indexes in _v44_to_v45, not _SYSTEM_SCHEMA _SYSTEM_SCHEMA runs before the migration ladder. On an upgrade from v42/v43/v44, usage_events / usage_session_summary already exist without the user_id column (CREATE TABLE IF NOT EXISTS is a no-op), so the CREATE INDEX ... (user_id) lines in _SYSTEM_SCHEMA failed to bind and aborted _ensure_schema — the app would not start post-upgrade. Move the index creation to _v44_to_v45, which ADDs the column first. Same pattern as the v41 audit_log indices. * fix(usage): bump USAGE_PROCESSOR_VERSION 3→4 for user_id backfill #303 shipped USAGE_PROCESSOR_VERSION=3 (release 0.54.12) for its <command-name> slash extraction. This PR's 2→3 bump collided with it on rebase, so the reprocess loop would not re-trigger to backfill the new user_id column on deployments already running v3. Bump to 4. * release: 0.54.13 — RBAC filter uses stable user_id (#293) --------- Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-05-14 14:12:54 +00:00
Vojtech	aa6a6700f4	feat(me/stats): per-analyst Stats dashboard with 4 tabs (#298 ) * feat(me/stats): per-analyst Stats dashboard with 4 tabs New /me/stats page shows the calling user's own analytics across four tabs, lazy-loaded per activation: - Sessions — paginated usage_session_summary join with a filesystem scan of un-processed JSONL (mirrors admin list_user_sessions shape). v44 token columns aggregated per row. - Tokens — daily series (default 30 days), by-model breakdown (lifetime), top-10 biggest sessions, lifetime totals. Single CTE per sub-query against per-user partition (idx_usage_session_user). - Data access — audit_log rows where action LIKE 'query.%' for the caller. Covers query.local / query.hybrid / query.remote / query.internal. Cursor-paginated on (timestamp, id). - Sync activity — audit_log rows where action is sync.* or manifest.* for the caller, plus users.last_pull_at for the header. Per-pull history now persists thanks to the new manifest.fetch audit row. Backend: app/api/me_stats.py — single APIRouter at /api/me/stats/, four GET endpoints, all gated by get_current_user (server-side caller scope; the page route itself only renders the shell). Frontend: app/web/templates/me_stats.html — tab bar + 4 panels, plain JS lazy-loads each panel's endpoint on first activation, caches per-tab so switching back doesn't refetch. Small SVG bar chart on Tokens tab (no external charting dep). 'Stats' link added to _app_header.html primary nav between 'Data Packages' and the Admin dropdown. Side change in app/api/sync.py: /api/sync/manifest now emits a manifest.fetch audit_log row alongside the existing users.last_pull_at bump. The column UPDATE only retains the most recent timestamp; per-pull history needs an audit row. client_kind='api' for the manifest endpoint (vs. 'web' which the audit-read deduper uses for AC reads), so the Sync tab can distinguish CLI pulls from browser-driven manifest peeks. 7 new tests in tests/test_me_stats.py: - sessions endpoint caller-scope isolation (user A doesn't see B) - sessions pagination - tokens empty-user zero shape - tokens aggregation across daily window + by_model + top + totals - queries endpoint filters to action LIKE 'query.%' + caller scope - sync endpoint surfaces both manifest.fetch and sync.trigger - manifest endpoint writes the manifest.fetch audit row ui(me/stats): widen page to 1400px via main.main escape Default base.html .container wraps content at max-width 800px. Stats tables (by-model + top-sessions: 6 columns each) felt cramped at that width — same constraint dashboard.html escapes via the {% block layout %} override pattern. Mirror that here: render <main class="main"> and bump .stats-page max-width to 1400px so the 6-column tables breathe without going edge-to-edge on wide monitors. * ui(me/stats): narrow from 1400px to 1280px to match /home /home isn't actually .container's default 800px — style-custom.css has a body:has(.home-mock) .container { max-width: 1280px } override that widens it. 1280px is the shared 'wide content' width across the codebase (top-nav header + /home + dashboard all use it). Bumping me_stats from 1400px to 1280px so the Stats page reads as 'same chrome' instead of distinctly wider than its sibling pages.	2026-05-14 10:27:58 +00:00
Vojtech	37ad39c8a3	feat(home): status frame on /home (operator-gated, onboarded-only) (#297 ) * feat(home): status frame on /home — last sync, sessions, prompts, tokens, projects Adds the homepage status frame: a 5-card row above the install-hero / offboard-strip on /home showing the calling user's Last sync (their last `agnes pull`), Sessions, Prompts, Tokens used, and Projects worked on, with a 24h/7d pill toggle. Backed by `GET /api/me/home-stats?window=` (one DuckDB CTE joining `users` + `usage_session_summary` + `usage_events`) and SSR'd from the same `compute_home_stats` helper on initial paint so there's no spinner. The window toggle is the only JS-driven path. Side surfaces: - `GET /api/sync/manifest` now stamps `users.last_pull_at` so `agnes pull` (and the Claude Code SessionStart hook that wraps it) imprints the analyst's last sync time for the new card. - `usage_session_summary` gains four BIGINT token counters (input_tokens, output_tokens, cache_read_tokens, cache_creation_tokens) summed from JSONL `message.usage.` per assistant turn. - `USAGE_PROCESSOR_VERSION` bumps 1 → 2 so the session-pipeline reprocess loop invalidates stale summaries and backfills tokens on the next tick. Schema migration v43 → v44 is idempotent ALTERs (last_pull_at + 4 token columns) — fresh installs receive them from `_SYSTEM_SCHEMA`, upgrade path runs `_v43_to_v44`. Defaults (NULL / 0) backfill existing rows cleanly. 9 new tests in tests/test_home_stats.py cover the migration, endpoint shapes (24h/7d/unknown/empty/missing-user), and the manifest-side last_pull_at bump. docs(CHANGELOG): homepage status frame entries under [Unreleased] The post-rebase release-cut now belongs to whichever PR lands next after main rolled to 0.54.9. This PR logs its bullets under [Unreleased] (Added: homepage status frame, per-user pull tracking, token counters; Changed: schema v43 → v44 migration) so they ride out with the next release-cut. * fix(tests): bump test_schema_v42_migration asserts to v44 CI failed because tests/test_schema_v42_migration.py hardcoded `assert SCHEMA_VERSION == 43` and `assert v == 43` after init. v44 (homepage stats frame backing columns) was introduced in the preceding feat commit; this aligns the existing v42-era migration tests with the new schema version. * feat(home): gate status frame on operator flag + user.onboarded Two gates on the homepage status frame: 1. Operator master switch — `get_home_status_frame_visibility()` in app/instance_config.py mirrors the existing `get_home_automode_visibility()` shape: env var `AGNES_HOME_SHOW_STATUS_FRAME` > yaml `instance.home.show_status_frame` > default `True`. Cautious-rollout instances can disable the frame without forking; the yaml example documents both knobs. 2. Onboarded gate — the template only renders the frame when the caller's `users.onboarded` is true. First-day users see a clean install-hero before all-zero stat cards; the frame appears automatically on the next render after `agnes init` POSTs `/api/me/onboarded`. Router skips the `compute_home_stats` DB read entirely when either gate is closed; `home_stats` arrives at the template as None in that branch and the `{% if %}` shortcuts the include. Why both gates: PostHog feature flags evaluated and rejected — this codebase uses PostHog for analytics capture only, not feature gating; adding a per-user feature_enabled() call on the /home critical path would couple the homepage render to a remote eval and still require an admin master switch. The onboarded gate is a UX coherence rule layered on top of the operator switch, not an A/B test signal. 3 new tests in test_home_stats.py cover the env-var resolution (falsey values + default-true). The yaml example gets a `home:` block documenting both `show_automode` (pre-existing flag, was undocumented in the example) and `show_status_frame`.	2026-05-14 09:28:47 +00:00
minasarustamyan	63ae676b27	perf(marketplace): cache cover photos + restore Curated filter spacing (#294 ) * perf(marketplace): browser-cache cover photos + restore Curated tab filter spacing Cover photos on /marketplace grid now serve with `Cache-Control: public, max-age=2592000, immutable` plus URL fingerprinting (`?v=<commit-sha8>` for curated, `?v=<version_no>` for flea) so browser refresh stops hitting the server entirely for unchanged assets. Per-plugin RBAC dropped from the three image endpoints (curated_asset, curated_mirrored, get_entity_photo) in favor of login-only auth — eliminates _system_db_lock contention on parallel image requests. Per-request magic-bytes revalidation also dropped from curated_asset (it was re-reading the file just to discard the bytes, then FileResponse read it again). Spacing bug: sort-dropdown commit (6be1cee) wrapped .mp-filter-row in a new flex container with inline margin-bottom:4px, masking the original 12px CSS rule. Curated tab (where .mp-type-row is hidden) ended up with 4px between filters and the card grid. Wrapper margin restored to 12px. See CHANGELOG entry under [Unreleased] — the RBAC relaxation is called out under ### Security with explicit threat-model rationale for AI/human reviewers. * test(marketplace): update renamed-html-as-png test for dropped magic-bytes check Magic-bytes body validation was dropped from `curated_asset` in the previous commit — the request path now relies on extension allowlist + pinned Content-Type + nosniff + strict CSP to neuter mismatched payloads at the browser layer. Update the test to assert the new defense-in-depth posture (200 served, but Content-Type=image/png + nosniff + CSP=default-src 'none') rather than the gone 415. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>	2026-05-14 10:09:32 +02:00
Vojtech	4501c9c3dd	fix(store-guardrails): post-#290 review follow-up — purge tuple, filter chip, stale docs, lazy bundle_meta, logger.exception (#295 ) Addresses post-merge review findings on #290: - Admin Rescan is the only post-v30 producer of status='blocked_inline'. Re-add it to admin queue 'Needs review' filter chip and to TERMINAL_BLOCKED_STATUSES in the bundle-purge job so rescan-produced rows surface in the default operator view and bundles get TTL-swept instead of lingering indefinitely. - Update three doc-drift sites still referring to the pre-#290 spam counter scope (counted blocked_inline). The counter now narrows to blocked_llm + review_error; fix the comment in app/api/store.py, the docstring in get_guardrails_blocked_quota_per_day(), and the operator-facing hint rendered on /admin/server-config. - Add positive test for _reject_inline_or_continue validation branch (code='validation_failed', checks payload shape, no-DB-write contract). Locks the frontend wizard's detail.checks contract. - Tighten test_quota_disabled_with_zero — assert (200, 201) explicitly instead of !=429 so a 500 regression no longer passes. - _reject_inline_or_continue takes plugin_dir and lazy-computes bundle_meta only on the security branch. Validation rejects no longer pay for a SHA256 walk on the bundle. - Surface store.upload.security_blocked audit-log write failures via logger.exception instead of swallowing — that audit row is the only forensic trace by design.	2026-05-14 08:02:44 +02:00
minasarustamyan	69a1e22cf5	feat(initial-workspace): per-instance agnes init override (#292 ) * feat(initial-workspace): per-instance agnes init override Adds Initial Workspace Template — an admin-configurable per-instance override for the agnes init analyst workspace. When configured, agnes init downloads a server-rendered zip from a Git repo the admin registered and extracts it into the analyst's workspace, fully bypassing Agnes-default CLAUDE.md / settings.json / hooks / slash commands / AGNES_WORKSPACE.md. Repo layout convention: only the contents of a top-level `workspace/` subdirectory ship to analysts; admin docs (README, CI configs) at the repo root stay in the repo and never reach an analyst. Sync rejects repos without `workspace/` at root. Server side: - src/initial_workspace.py — clone (or fetch+reset), validate, build zip with strict path checks and reserved-path rejection (workspace/.claude/init-complete reserved by Agnes) - app/api/initial_workspace.py — admin CRUD + sync endpoint + analyst- facing status/zip/applied endpoints; config persists to instance.yaml overlay, PAT to .env_overlay - app/secrets.py — refactor: persist_overlay_token shared helper with threading.Lock for .env_overlay writes (closes pre-existing race between concurrent marketplaces saves) - app/web/templates/admin_server_config.html — new "Initial Workspace Template" section + modal + Sync/Edit/Delete/Download buttons (matches existing cfg-section visual language) CLI side: - cli/lib/override.py — single source of truth for is_override_workspace sentinel detection - cli/lib/initial_workspace.py — probe status, safe zip extraction with ../absolute/symlink rejection, typed-YES force confirmation - cli/commands/init.py — override branch (skips Agnes-default workspace writes); extended sentinel with override:true, template_source, template_sha so future agnes self-upgrade does not auto-refresh hooks - cli/lib/hooks.py + cli/lib/commands.py — short-circuit on override workspaces (install_claude_hooks, install_claude_commands, maybe_refresh_claude_hooks) Audit-event strategy: server writes initial_workspace.fetch_started inside GET /api/initial-workspace.zip (cannot be spoofed by PAT-holder); CLI POST /applied writes initial_workspace.applied as best-effort confirmation. Admin mutations log via the existing _audit pattern. Tests: 27 server (clone/validate/zip + workspace-subdir convention + concurrent persist_overlay_token + endpoint shapes + audit rows) + 29 CLI (override sentinel parse + probe fall-through + safe extraction + YES strictness + hook guards + e2e mocked init). Risk acceptance — documented in docs/initial-workspace-override.md + CHANGELOG Internal section so AI reviewers understand the deviations from defaults are intentional: - maybe_refresh_claude_hooks deliberately no-ops on override workspaces - --force on override does NOT back up CLAUDE.md (admin's repo is the source of truth) - .claude/CLAUDE.local.md IS overwritten by override extraction when admin's repo ships one * test+vendor-agnostic: drop Groupon tokens from #292 fixtures + extend admin-gate coverage Two fixes from the takeover review on #292: 1. Vendor-agnostic OSS rule: Replace `Groupon` / `groupon/template` tokens in test fixtures with `Acme` / `acme/template` (8 sites in test_cli_init_override.py + 1 in test_initial_workspace_api.py). Per CLAUDE.md "Vendor-agnostic OSS — no customer-specific content" rule: customer-specific tokens don't belong in shipped artifacts, even in test fixtures. The pre-existing FoundryAI mentions in test_instance_config.py + test_setup_instructions.py are out of scope for this PR (didn't introduce them). 2. Admin-gate coverage gap: `test_admin_endpoints_require_admin` only covered GET /api/admin/initial-workspace + POST .../sync. The register-write (POST .../initial-workspace) and delete (DELETE .../initial-workspace) endpoints used the same `Depends(require_admin)` wiring but had no regression test. Loop now covers all 4 verbs so a future refactor that drops the dependency from one endpoint fails here instead of silently exposing the write/delete paths to any analyst with a PAT. * release: 0.54.9 — Initial Workspace Template (per-instance agnes init override) Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.8 → 0.54.9) for Mina's Initial Workspace Template feature. No DB migration (config lives in instance.yaml overlay). No mandatory operator action — empty default keeps OSS-default agnes init behavior. Operators wanting full template control link a Git repo on /admin/server-config → "Initial Workspace Template". See docs/initial-workspace-override.md for the full responsibility-transfer contract. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com> Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-13 20:35:01 +00:00
Vojtech	513711ed37	feat(store): hard-reject inline guardrail failures, trace security only (#290 ) * feat(store): hard-reject inline guardrail failures, trace security only Inline failures (manifest + content validation, static-security deny-list hits) now hard-reject upstream of any DB write or bundle persistence. The v30 contract that landed every inline failure as a hidden+blocked_inline entity + admin-rescannable bundle is replaced with two response shapes: - 422 code=validation_failed — manifest/content issues. Banner-only, no submission row, no audit_log entry. Submitter fixes and retries. - 422 code=security_blocked — static_scan finding. Banner-only on the wire, plus one audit_log row (store.upload.security_blocked) carrying findings + sha256 + size for admin forensics. Quarantine + admin rescan/override apply only to the async LLM path (blocked_llm / review_error) — the cases that genuinely benefit from admin judgment. Spam-quota counter narrows to blocked_llm + review_error. Admin queue filter chip drops blocked_inline. Bundle TTL purge stops sweeping blocked_inline. Legacy blocked_inline rows from instances that ran the v30 contract remain reachable via the "All" tab. New _reject_inline_or_continue helper in app/api/store.py centralises the two-tier rejection across create_entity, update_entity, and restore_version. Frontend templates render the new payloads as inline banners (no redirect on failure) and keep submission_blocked as a one-release back-compat branch. Tests: new _seed_quarantined_entity helper replaces the older _make_eval_skill_zip-driven setup wherever a test needs a hidden+blocked_llm entity. 199 store tests pass under -n auto. * release: 0.54.8 — store inline hard-reject (BREAKING) Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.7 → 0.54.8) wrapping Vojta's hard-reject refactor. BREAKING for store-upload clients: validation failures now return 422 with `code='validation_failed'` (no entity row, no submission row, no audit_log entry) instead of the v30 `submission_blocked` 200 response that landed a hidden `blocked_inline` row. Frontend wizard + edit + restore still understand the legacy code for one release as a fallback for stale clients hitting an older deploy. Operators with custom integrations against `POST /api/store/entities` should update to handle the new `code='validation_failed'` / `code='security_blocked'` 422 responses. No DB migration required (legacy `blocked_inline` rows from instances that ran the v30 contract remain reachable via the admin queue's "All" tab; bundle-purge job no longer covers them but they linger harmlessly). --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-13 19:59:12 +00:00
ZdenekSrotyr	117b6784ea	fix(sync+ops): defer-probe race, AGNES_TEMP_DIR chown, default-schedule env knob (#283 ) * fix(sync+ops): defer-probe race, AGNES_TEMP_DIR chown, default-schedule env knob Three sync-ops fixes surfaced during agnes-dev steady-state operation after the v0.46→v0.54 cutover settled. None of them depend on each other; bundled because they all live in the sync trigger / agnes-auto- upgrade flow and are diagnosed from the same observation window. 1. (fix) /api/sync/status race window. The trigger handler returned 200 BEFORE the background task acquired _sync_lock. In that few-hundred-ms gap, an honest /api/sync/status call returned locked=false — and the host-side agnes-auto-upgrade.sh defer probe fired right in that window proceeded with 'docker compose up -d' and SIGKILLed the just-spawning extractor / materialized worker. Observed on agnes-dev: 3 mid-sync container kills in 30 min, each followed by a few-min outage and a partial sync. The WAL replay auto-recovery (PR #217) kept the system DB consistent through each kill, but the actual sync work was lost. Fix: handler stamps _recent_trigger_at; status endpoint returns locked=true for _TRIGGER_HOLD_SEC (=30s) after the most recent trigger, even if the background task hasn't yet acquired the lock. 30s covers the schedule → spawn latency with margin; short enough not to indefinitely block auto-upgrade after a one-off trigger. Defense in depth: the real lock still gates the extractor subprocess. 2. (fix) scripts/ops/agnes-auto-upgrade.sh: post-upgrade chown loop now mkdir -p's /data/tmp before chown'ing, and includes it in the list of dirs that get the runtime UID:GID. /data/tmp is the default AGNES_TEMP_DIR set in docker-compose.yml — Snowflake-UNLOAD slice staging and CSV intermediates land here. Pre-fix the runtime user (uid 999) couldn't create /data/tmp under a root-owned data-disk root, so tempfiles silently fell back to the boot disk's overlayfs /tmp — defeating the whole point of routing slice staging onto the dedicated data volume. 3. (feat) AGNES_DEFAULT_SYNC_SCHEDULE env var sets the platform-wide fallback sync_schedule. Lets a deployment dial cadence down to 'daily 03:00' (data freshness budget once-per-day) without having to PUT every registry row. Per-table sync_schedule still wins; literal 'every 1h' is the floor if neither is set — OSS-historical default unchanged. Tests: - test_sync_status_trigger_hold_window_reports_locked_after_trigger - test_sync_status_trigger_hold_window_expires - test_default_schedule_falls_through_env_then_every_1h (3 branches) * release: 0.54.3 — sync defer-probe race + AGNES_TEMP_DIR chown + default-schedule env knob Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.2 → 0.54.3) bundling three sync-ops fixes from agnes-dev steady-state observation. No DB migration; trigger-hold window is additive (anything that already saw locked=true still does — the window EXTENDS the true period); /data/tmp chown is no-op when already correct; AGNES_DEFAULT_SYNC_SCHEDULE unset = every-1h default unchanged.	2026-05-13 09:44:20 +00:00
Vojtech	50a974f196	feat(store-guardrails): admin-configurable content thresholds (#281 ) * feat(store-guardrails): admin-configurable content thresholds Adds the flea-market content guardrail floors to the /admin/server-config editor so operators can tune the bar without code changes. Defaults are unchanged (60 chars description, 25 chars command, 5 distinct words, 200 chars body) — patching guardrails.* in instance.yaml or via the admin UI overrides any of them and the next inline check picks up the new value. src/store_guardrails/content_check.py now resolves the four floors via helper functions (_min_desc_chars / _min_command_desc_chars / _min_distinct_words / _min_body_chars) that read app.instance_config at call time. Module-level _DEFAULT_* constants remain as fallbacks if the import fails (defensive — keeps the guardrail module loadable without the app package on its path). app/instance_config.py grows four matching getters returning the live value with sane defaults + integer coercion. app/api/admin.py registers 'guardrails' as an editable section + ships nine known-fields entries (min_description_chars, min_command_description_chars, min_distinct_words, min_body_chars, enabled, review_model, blocked_quota_per_day, blocked_bundle_ttl_days, stuck_review_grace_seconds) with operator-facing hint copy explaining what each knob does. app/web/templates/admin_server_config.html gets a SECTION_META entry so the section renders as 'Flea-market guardrails' with a help string instead of a bare section ID. app/web/router.py threads the live thresholds into /store/new and /store/examples via a small _guardrail_thresholds() helper so the disclosure copy, char counter, and "Why these limits" table render the configured value (not a hardcoded 60). End-to-end smoke verified: PATCH guardrails.min_description_chars=90 → /store/new immediately renders "90 characters" + JS DESC_MIN=90 on the next request, no restart required (helpers read live config per call). * chore(store-guardrails): address PR review safe-fix findings Code-review safe_auto findings on PR #281 (review run 20260513-100126-64052520): - CHANGELOG: add Unreleased entry covering the new /admin/server-config Flea-market guardrails section, the four live threshold getters, and the route-helper rendering knobs. Required by the project's non-negotiable "Changelog discipline" rule. - content_check.py: narrow `except Exception` to `except ImportError` on the four `_min_()` resolver helpers. Surface-level TypeError / ValueError on a malformed YAML value belongs to the instance_config getters' own try/except — the resolvers should only defend against the in-tree import itself failing, not silently swallow real bugs in the getters. - store_upload.html: refresh the stale "30-char threshold" comment to reflect the configurable floor (default 60), and add `\|default(60)` / `\|default(25)` / `\|default(5)` filters to the disclosure-copy bindings so the upload form matches store_examples.html's belt-and-suspenders rendering if a future route ever renders the template without populating the `guardrail` context. - router.py: tighten `_guardrail_thresholds()` return annotation from bare `dict` to `dict[str, int]`. Residual work (left for separate change after operator direction): - Add round-trip test (PATCH guardrails -> next inline check uses new value) — primary testing gap. - Decide policy on `min_=0` (currently coerced to 1 via `max(1, int(val))`) vs treating 0 as a disable sentinel like neighbour getters (`blocked_quota_per_day`, `blocked_bundle_ttl_days`). - Add POST-time integer validation for `guardrails.` so a typo'd YAML value (bool / string / float) errors loudly instead of silently falling back to the default. test(store-guardrails): cover admin-configurable thresholds + PATCH round-trip Closes the "primary testing gap" Vojta noted in the safe-fix commit on PR #281 — the four new `get_guardrails_min_` getters and the PATCH-takes-effect-on-next-check live-config flow had no direct coverage. 10 new tests in `tests/test_store_guardrails_admin_config.py`: - TestGuardrailGetterDefaults (4 tests) — each new getter returns the documented default (60 / 25 / 5 / 200) when nothing is configured. - TestGuardrailGetterOverlay (5 tests) — overlay-driven overrides win, string values that look numeric coerce via int(), garbage strings fall back to default via the (TypeError, ValueError) branch, and the `max(1, int(val))` floor pins zero/negative inputs to 1. - TestPatchRoundTrip (1 test) — PATCH `/api/admin/server-config` `guardrails.min_description_chars=90`, then call content_check against a 75-char description that previously passed: must now fail with `too_short`. Then PATCH back to 60 and verify the next check passes again. Closes the cache-invalidation contract Vojta relies on for the "no app restart" claim — broken without the reset_cache() bracket in /api/admin/server-config. The TestGuardrailGetterOverlay.test_zero_or_negative_floored_to_one test pins the current `max(1, int(val))` policy. Vojta's safe-fix commit explicitly left "policy on min_=0 vs disable-sentinel" as residual work — pinning the current behavior here ensures any future change to use 0 as a disable sentinel must update this test (and the reviewer sees the policy decision). Verified: 4509 tests pass locally (4499 existing + 10 new). * release: 0.54.2 — admin-configurable flea-market guardrail thresholds + tests Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.1 → 0.54.2) bundling Vojta's admin-configurable thresholds for the flea-market content guardrail (9 knobs in /admin/server-config) plus the test coverage closing the "primary testing gap" he punted in the safe-fix commit. No DB migration; defaults unchanged from PR #276 — instances that don't set `guardrails.*` keep the original bar transparently. --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com> Co-authored-by: ZdenekSrotyr <139972147+ZdenekSrotyr@users.noreply.github.com>	2026-05-13 09:20:55 +00:00
minasarustamyan	efc607f3ee	feat(cli): agnes marketplace search/detail/add/remove + retire stale subcommands (#280 ) * feat(cli): agnes marketplace search/detail/add/remove + retire stale subcommands Unified CLI surface for the v28+ marketplace: search across Curated and Flea Market (RBAC-filtered server-side), drill into a single item's detail, add/remove from your stack. Replaces opt-out era commands that no longer reflect how users compose their stack. CLI changes: - Added: agnes marketplace {search,detail,add,remove} - Removed: agnes my-stack toggle (opt-out semantics, curated-only) - Removed: agnes store {list,show,install,uninstall} (consumer-side ops moved under marketplace; store now covers only creator-side upload, update, delete, mine) ID format unifies curated and flea: marketplace_id/plugin_name (slash) routes to /api/marketplace/curated/..., bare UUID routes to /api/store/entities/... (flea bundles skills/agents into a synthetic plugin server-side, so the analyst sees a single add/remove surface). Templates: - claude_md_template.txt: rewritten marketplace section as operational guidance for Claude Code (discovery, stack management, behaviour notes). Dropped the static {% if marketplaces %} listing — the CLI is the source of truth for what's in the stack at any moment, so a snapshot rendered at init time would lie the moment the user runs agnes marketplace add/remove. Same discipline already applied to tables and metrics. - agnes_workspace_template.txt: cheat sheet adds 5 marketplace one-liners; keeps the file's reference-doc tone (the original commit's intent: 'what is this thing, how does it work, how do I uninstall it'). Docs: HOWTO/05-customizing-skills.md rewritten around the new CLI flow; the opt-out section is replaced by 'Removing items from your stack'. Tests: new test_cli_marketplace.py covers all four subcommands incl. RBAC/409 paths (system plugin guard, not-approved flea entity); test_cli_store.py trimmed to the retained creator-side commands. * release: 0.54.1 — agnes marketplace CLI redesign + retire stale subcommands Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.0 → 0.54.1) bundling the BREAKING removals of `agnes my-stack toggle` and `agnes store {list,show,install,uninstall}` plus the new unified `agnes marketplace {search,detail,add,remove}` surface. No DB migration; no operator-facing config change. Operators on floating tags (`:stable`) auto-upgrade transparently. Analyst CLI upgrade prompt fires on next `agnes pull`; users invoking the retired commands get "No such command" with the new `agnes marketplace` substitution called out in the BREAKING bullets. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com> Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-13 05:20:56 +00:00
ZdenekSrotyr	b4d3c576af	Activity Center: audit log + telemetry + sessions + agnes_* tables (#278 ) * docs(spec): admin observability spec + Activity Center MVP plan Parent spec (480 lines) + executable plan (2295 lines, 14 TDD tasks). Covers Activity Center rebuild (/admin/activity), with /admin/sessions and /admin/feedback deferred to follow-up plans. Already incorporates reviewer-pass revisions across three angles (security, production resilience, code architecture): - _get_db import path corrected to app.auth.dependencies - Test fixtures aligned with seeded_app / admin_user / get_system_db - All new audit writes wrapped in try/except + logger.exception - Filename sanitization on session uploads - DuckDB DESC index behavior documented; upgrade window flagged - Migration idempotency + evolved-DB test cases - reveal_raw + shared-cache multi-worker explicitly deferred Targets schema v40 (audit_log gains params_before, client_ip, client_kind, correlation_id + 3 indices). * feat(db): schema v40 — audit_log gains params_before, client_ip, client_kind, correlation_id + 3 indices * chore(test): clean up Task 1 — drop unused import, rename stale test * feat(audit): AuditRepository.log() accepts params_before/client_ip/client_kind/correlation_id * test(audit): strengthen params_before assertion to round-trip JSON content * feat(audit): AuditRepository.query() rich filters + keyset cursor pagination * feat(sync): SyncStateRepository.list_recent() cross-table feed * feat(audit): POST /api/sync/trigger writes audit_log row * feat(audit): POST /api/scripts/run-due writes audit_log row * feat(audit): POST /api/upload/sessions writes audit_log row + sanitizes filename * feat(audit): GET /api/data/{table_id}/download writes audit_log row * feat(activity): /api/admin/activity timeline + /health + /sync endpoints * feat(ui): /admin/activity rebuilt — health pulse, timeline, sync grid; /activity-center → 308 redirect BREAKING: removed demo executive-pulse / maturity-roadmap content from activity_center.html. The page now reflects real audit_log + sync_history data. * feat(ui): admin nav + dashboard widget point at /admin/activity * feat(activity): recursive-audit suppression for AC read endpoints (60s window per actor+filter) * feat(activity): emit PostHog events when integration enabled (no-op default) * fix(audit): move v40 indices out of _SYSTEM_SCHEMA + update test_repositories to unpack query() tuple _SYSTEM_SCHEMA CREATE INDEX on audit_log(timestamp) failed when migration tests hand-roll a bare audit_log (id, action) without the timestamp column. Fix: remove indices from _SYSTEM_SCHEMA; add ADD COLUMN IF NOT EXISTS guards for timestamp and other pre-v40 columns in _v39_to_v40() so the upgrade path is safe on any hand-rolled schema; call _v39_to_v40 explicitly in the fresh-install (current==0) path to restore index creation there. Also unpack the (rows, next_cursor) tuple from AuditRepository.query() in the three TestAuditRepository tests that still treated it as a list. * docs: CHANGELOG entry for Activity Center MVP * chore: refresh stale module docstring in app/api/activity.py * feat(cli): agnes admin activity — terminal access to Activity Center (timeline + health + sync) * fix(db): _v39_to_v40 — add IF NOT EXISTS guard for 'action' column The v39→v40 ladder step adds defensive ADD COLUMN IF NOT EXISTS for every audit_log column so a hand-rolled bare audit_log (id only) is safe through the ladder. 'action' was missing from the guard list, causing CREATE INDEX idx_audit_action_time to fail on tests that stub audit_log with only an id column (tests/test_e2e_extract.py:: TestSchemaMigration::test_migration_preserves_and_extends). Local 6/6 schema tests + the previously-failing CI test pass. * docs(spec): platform telemetry epic — Boss directive + Activity Monitoring plan rebased onto v40 (stacked on zs/spec-activity-center) * feat(db): schema v41 — 7 usage_* tables for telemetry (events, summary, rollups, attribution) * chore(db): tighten v41 — usage_session_summary.session_id NOT NULL + upgrade test asserts all 7 tables * feat(usage): UsageAttributionRepository — replace/delete/lookup over usage_attribution_* tables * refactor(marketplace): extract list_inner_skills/agents/commands to src/marketplace_listing.py for reuse * feat(usage): explode plugin attribution on marketplace sync + store entity write; backfill script * refactor(marketplace): finish src/marketplace_listing.py extraction — drop duplicate _list_inner_* + _parse_frontmatter from app/api/marketplace.py * feat(usage): promote attribution helpers to src/usage_attribution_helpers.py; hook update_entity rename + bundle-swap; clarify best-effort semantics * feat(usage): UsageProcessor real extraction + rollup rebuild + 10 fixture-driven tests * fix(usage): include tool_id in event hash + executemany + rollup transaction (critical multi-tool-turn drop fix) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(marketplace): popularity stats — invocations_30d + trend + sort=most_used\|trending + Most Popular section * feat(admin): /admin/users/<id> Sessions section — list + single-file + bulk-zip downloads (audit-logged) * feat(usage): admin export endpoint + CLI — csv/json/parquet streaming, filters, audit-logged * feat(usage): agnes admin ask — LLM Text-to-SQL over usage_events with SELECT-only validator (audit-logged) * feat(usage): reprocess + prune endpoints + scheduler daily prune job + CLI * docs: PLATFORM_SETUP.md operator playbook + HOWTO/ cookbook (5 guides + index) Adds docs/PLATFORM_SETUP.md as a consolidated operator playbook covering bootstrap, TLS, marketplaces (curated + flea), scheduler env vars, telemetry extraction/export/ask/prune, privacy posture, and daily routine. Adds docs/HOWTO/ with 5 analyst cookbook guides: first query, snapshots for remote tables, private sessions, feedback + admin ask, and customizing skills. Existing setup docs (QUICKSTART, DEPLOYMENT, ONBOARDING, HEADLESS_USAGE) get a one-line cross-reference at the top pointing to PLATFORM_SETUP.md. * docs(changelog): platform telemetry epic — usage_* foundation + surfaces + admin access + docs Comprehensive [Unreleased] entry covering: usage_events/session_summary/ tool_daily/plugin_daily tables (v41), attribution lookup tables, backfill script, marketplace Most Popular + invocation chips + sort, admin Sessions section, export/ask/reprocess/prune endpoints + CLI mirrors, Activity Center (v40), PLATFORM_SETUP.md + HOWTO/ docs, and operations notes for v41 upgrade. * fix(security): block DuckDB read_/http_/glob functions in usage_ask validator + symlink escape guard in session zip + clarify mark-private semantics * fix(admin): parquet export tempfile cleanup on COPY failure + correct processed-first sort on /admin/users/<id>/sessions * feat(audit): close 8 production audit gaps — query (local/remote/hybrid), catalog/schema/sample, snapshot estimate/create, check-access * feat(ui): /admin/usage summary dashboard + per-user activity tab on /admin/users/<id> * fix(audit): cap error messages at 200 chars + audit user_activity reads + recursion guard on usage.summary * fix(audit): catalog.list audits on error path + clean up deferred json import * fix(ux): client_kind=cli for PAT auth + timeline empty state + email-instead-of-uuid + nav reorder + help text + loading indicators + ask doc * feat(observability): unify /admin/activity into single page with saved views - KPI cards (events, users, error rate, p95) clickable as quick-filters - Faceted filter dropdowns populated from audit_log in the current window - Sortable audit table, cursor pagination, per-row JSON side panel - Saved views (schema v43: user_observability_views) — per-user state - Top bar: window selector + 30s Live toggle + saved views dropdown - /admin/scheduler-runs → 308 redirect (source=scheduler filter) - New endpoints: /api/admin/observability/{facets,kpis,views} * test: update activity + scheduler-runs tests for unified page - test_admin_activity_page_renders asserts new structural anchors - test_admin_scheduler_runs_page_admin_only asserts 308 redirect * fix(observability): respect [hidden] on modal + side panel CSS `display: flex` on .obs-modal beat the [hidden] attribute's UA display:none, so the save-view modal rendered on page load and Cancel clicks couldn't dismiss it. Gate the modal's flex layout on :not([hidden]); add the same display:none guard prophylactically to .obs-panel and .obs-views-panel. * feat(observability): user enrichment in audit + interactive /admin/usage Activity: - /api/admin/activity now joins users for user_email + user_name per row - User column renders "name (id-prefix)" or "email (id-prefix)" instead of an opaque truncated UUID; falls back to id when the user record is missing Usage: - /admin/usage rewritten as the same filter/group-by/search pattern as /admin/activity. Faceted dropdowns (User / Tool / Source / Event type) populated from usage_events; debounced free-text search across tool_name / skill_name / subagent_type / command_name - New endpoints /api/admin/usage/{facets,kpis,query}; the query endpoint supports group_by in {day, username, tool_name, source, ref_id} with sort + offset pagination, plus an ungrouped raw-events mode - 4 KPI cards (events, distinct users, distinct tools, error rate) are clickable quick-filters; clicking a grouped row applies the bucket as a filter - Old static `?window=7d\|30d\|all` server preload removed; all state is client-side via since_minutes + group_by + filters in the URL * fix(observability): clearer labels, all-column sort, drop saved views UI - Rename page titles: "Activity" → "Server activity", "Usage" → "Tool usage" with a one-line subtitle on each explaining what the page covers and linking the other one. The two pages source different data (audit_log vs usage_events) and the previous labels conflated them. - Drop the saved-views dropdown + save modal from /admin/activity. The modal pop-open bug was the trigger; the value wasn't there yet. The /api/admin/observability/views CRUD + DuckDB table stay in place. - Rename "Live (30s)" to "Auto-refresh (30s)" with a tooltip clarifying that it's the re-fetch rate, not the time range. Time range now labeled "Time range" instead of "Window". - All audit-table columns are sortable (User, Source, Action, Resource, Result added); sort is page-local with a Jinja comment explaining the trade-off. Same for raw usage rows. - Fix duplicate sort-arrow bug — the literal "▼" in the Time th HTML was rendering alongside the CSS ::before arrow. Removed the literal; CSS is the single source of truth. * feat(observability): global Sessions browser + transcript viewer + CLI Web: - /admin/sessions — list every collected session JSONL across all users with time-range, user, model, errors-only and free-text filters. Default sort surfaces error-heavy sessions first. KPI cards (sessions, distinct users, sessions w/ errors, tool error rate) clickable as quick-filters. - /admin/sessions/<username>/<file> — transcript viewer rendering the JSONL chronologically: user prompts, assistant text, tool calls (with JSON input) and tool results (with flattened output). Errors get a red border + chip and a "Next error" navigation button at the top. - Admin dropdown gains a "Sessions" link. API: - GET /api/admin/sessions/{list,kpis,facets} — filtered cross-user reads off usage_session_summary - GET /api/admin/sessions/{username}/{file}/transcript — parses JSONL via the existing services.session_pipeline.lib, returns chronological events - GET /api/admin/sessions/{username}/{file}/download — JSONL stream, same path-safety guards as the per-user endpoint, audit-logged CLI: - `agnes admin sessions list [--user X] [--errors] [--since 7d]` — table output with `!` prefix on rows that hit a tool error - `agnes admin sessions show <username> <file>` — transcript dump, with `--errors` to print only the failed tool_result blocks - `agnes admin sessions download <username> <file> [-o path]` - `agnes admin sessions kpis` — top-level numbers * feat(internal): expose telemetry tables to agnes query with row-level RBAC Three new registered tables backed by system.duckdb, queryable through the same /api/query plumbing analysts use for Keboola / BigQuery / local sources: agnes_sessions → usage_session_summary (filter: username) agnes_usage → usage_events (filter: username) agnes_audit → audit_log (filter: user_id) RBAC is per-row, not per-table: admins see every user's rows; non-admins see only their own. The filter is built server-side from the auth user dict; non-admin filter values are regex-validated before SQL interpolation. Implementation: - new connector connectors/internal/ with access (filter+exec) + registry (idempotent table_registry seed at startup) - /api/query detects internal table refs and short-circuits to a CTE wrapper that prepends "WITH agnes_x AS (SELECT * FROM <src> WHERE …), …" then "SELECT * FROM (<user_sql>) AS _q". DuckDB cursor on the shared system.duckdb handle — opening parallel handles / ATTACH on the same file is blocked process-wide. - mixing internal + BQ / registered local tables in one SELECT is rejected (v1 limitation) - src.rbac.can_access_table waves internal tables through for all authenticated users; row scoping is the actual security control - /api/v2/schema and /api/v2/sample gained internal branches; sample intentionally skips its cache because rows are RBAC-scoped per caller - audit row written as action='query.internal' with is_admin flag Tests: connectors/internal/access — RBAC, filter clause, schema, CTE wrapper coexistence with user-supplied aggregations, unsafe-username rejection. 16/16 passing. Motivating queries this enables: SELECT tool_name, COUNT() FROM agnes_usage WHERE is_error GROUP BY 1 ORDER BY 2 DESC -- analyst self-introspection: which tools fail for me? SELECT user_id, COUNT() FROM agnes_audit WHERE action = 'session.transcript_view' GROUP BY 1 -- admin: who's been looking at whose session transcripts? * feat(admin): group dropdown into 5 named sections + internal tables in /catalog Admin dropdown gains section headers so admins can land on the right page without re-reading the full menu: Activity Center Server activity / Tool usage / Sessions Users & Access Users / Groups / Resource access / Tokens Data Tables Agent Experience Curated Marketplaces / Flea Submissions / Agent Setup Prompt / Agent Workspace Prompt Server Server config "Agent Experience" frames the curated content + prompts as one cluster — it's all admin-controlled material that shapes what an analyst's AI agent encounters. "Configuration" → "Server" since only one item lives there now. Renamed the section's first two items: "Activity" → "Server activity" (matches page H1) "Usage" → "Tool usage" Also fixes /catalog visibility of the internal tables (agnes_sessions / _usage / _audit) for non-admin users: ``app.auth.access.can_access`` short-circuits to True for resource_type='table' + an internal-table id. Without this, non-admins saw the tables in /api/v2/catalog (which uses the same RBAC bypass) but not on the /catalog HTML page (which calls can_access directly, requiring a resource_grants row internal tables don't have). CSS for `.app-nav-menu-section`: small caps, muted, non-clickable; first section trims top padding so the panel doesn't open with an awkward gap. * refactor(admin): move corporate memory into Admin > Agent Experience Memory link was the only admin-only entry in the primary nav (gated by session.user.is_admin). Moves it into the Admin dropdown under Agent Experience, alongside Curated Marketplaces / Flea Submissions / Prompts — all admin-curated content that shapes what an analyst's AI agent encounters. Renamed the nav label to "Shared Knowledge" to match what the page actually is (admin-curated organisational knowledge from session verification, surfaced to agents). URL stays at /corporate-memory; the route still gates on require_admin per the existing comment. Side effect: primary nav (Home / Marketplace / Data Packages) is now uniform for every authenticated user — no conditional admin-only entry. * ui: rename admin entries to Curated Knowledge / Init Prompt / Workspace Prompt - "Shared Knowledge" → "Curated Knowledge" (parallel with "Curated Marketplaces" in the same Agent Experience section; "curated" tells the admin what they do there — review + approve) - "Agent Setup Prompt" → "Init Prompt" (matches the `agnes init` flow it actually drives) - "Agent Workspace Prompt" → "Workspace Prompt" (the "Agent" prefix was redundant — every item in the section is agent-facing) Renames page titles + H1s on /admin/agent-prompt and /admin/workspace-prompt to match. * refactor: rename Usage → Telemetry across user-facing surfaces External surfaces all switch; internal Python module / file names and the physical DB tables (usage_events, usage_session_summary, usage_tool_daily, usage_plugin_daily) stay — renaming them would force a schema migration + a redo of the LLM Text-to-SQL prompt for no analyst-visible win. Changes: - Admin dropdown: "Tool usage" → "Telemetry" - Page H1 / <title>: same - URL: /admin/usage → /admin/telemetry; old URL 308-redirects - API prefix: /api/admin/usage/* → /api/admin/telemetry/* - CLI: primary command `agnes admin telemetry …`; `agnes admin usage` kept as a deprecated alias so existing operator scripts keep working - Internal data-source table id: agnes_usage → agnes_telemetry. The registry seed now evicts any stale internal-source row whose id no longer matches INTERNAL_TABLES, so the old `agnes_usage` row is removed from table_registry on next app boot - All tests + JS endpoint paths updated * test(rbac): include auto-appended internal tables in expectations get_accessible_tables now appends agnes_sessions / agnes_telemetry / agnes_audit to every authenticated user's accessible-tables list so the internal data source shows up in /catalog. The two existing rbac tests asserted hardcoded list shapes that pre-dated the change. Rewritten to assert "granted tables + the canonical internal-table set" instead of literal lists, so the test stays correct if the internal table roster changes again later. * ui: visual dividers between admin-dropdown sections Adds a 1px top border + 6px top margin to every section header except the first, so the five named groups (Activity Center, Users & Access, Data, Agent Experience, Server) read as visually separated clusters. The header itself stays small-caps + muted as before — the border is additive. * ui(memory): match obs-topbar visual on /corporate-memory The Curated Knowledge page (linked from the admin dropdown's Agent Experience section) opened straight into the stats bar — no title, no subtitle, no shared chrome with the other admin pages. Adds an obs-topbar-style header at the top of .container-memory: - H1 "Curated Knowledge" - subtitle explaining what the page is + how AI agents pull from it The `.ck-` class set duplicates the inline obs- styles from /admin/activity etc. for this one page; promoting the obs-* class set to style-custom.css for shared reuse is the obvious next step (4 pages already inline the same CSS), tracked as a follow-up. Page <title> also renamed from "Corporate Memory" → "Curated Knowledge". * ui(tables): list Agnes internal tables in /admin/tables + group in /catalog /admin/tables previously rendered three per-source-type listings (BQ / Keboola / Jira) and dropped any row whose source_type didn't match — so the agnes_sessions / agnes_telemetry / agnes_audit rows seeded into table_registry were invisible. Adds a fourth read-only section "Agnes internal tables" that filters source_type === 'internal' and renders the same registry-table layout the other sections use, with two changes: - no Register button (these rows are seeded on every app boot from connectors/internal/registry.py) - Edit + Delete actions hidden (any change would be reverted on the next start). Manage access stays so admins can still inspect. Mode badge picks up a new mode-internal CSS class (teal accent) so the display doesn't lie and call it "local". In /catalog, internal tables now group under an "agnes" accordion section (bucket="agnes" on seed) instead of falling into the catch-all "default". Single source of truth for which tables exist; admins find them where they expect. * ui(tables): Agnes internal as a 4th tab next to BQ/Keboola/Jira Previous iteration mounted the internal-table listing as a separate standalone card under the tab strip. Reshapes it to a proper tab-content section so admins switch between data sources via one consistent nav (BigQuery / Keboola / Jira / Agnes internal). - New tab button "Agnes internal" in the tab-nav. - The listing card becomes <section id="tab-content-internal" class="tab-content">; switchTab() already routes by id so no JS change beyond extending the hash allowlist for direct #internal links. - Tab content keeps the read-only treatment from the previous commit (no Register button, no Edit / Delete in renderRegistryListing). * ui: rename Curated Knowledge → Curated Memory Settles the naming back on "Curated Memory" — parallel structure with "Curated Marketplaces" in the same Agent Experience section, and zero rename ripple: URL (/corporate-memory), API (/api/memory/), CLI (agnes admin memory), and Python modules all stay on "memory" so the admin label finally lines up with the underlying surfaces. The "Curated" prefix still tells admins what they do on the page (review pending → approve / mandate / reject) and reads as a sibling of "Curated Marketplaces" right next to it in the dropdown. Touches: admin dropdown label, page <title>, page H1. DB tables stay on knowledge_ (already the canonical naming for the data shape). * ui: rename "Server activity" → "Audit log" "Audit log" is what the page actually is — server-side audit_log table rendered with KPI cards + filter bar + sortable table. The "Server activity" label confused the term with Claude Code session telemetry (Telemetry page) and didn't make the source/concept clear. Touches: - Admin dropdown nav label - /admin/activity page H1 + subtitle - /admin/telemetry subtitle cross-link - test_activity_api page-renders assertion URL (/admin/activity) and API (/api/admin/activity/) stay — the "activity" name has stuck at the route layer for a year; rerouting those would churn dashboards/bookmarks for zero analyst-visible win. ui(admin-nav): gray band on each section header for clearer separation Previous iteration used a 1px top border between section labels — the labels still blended into the items above/below at a glance. Switches to a light gray background band per section header, extended edge-to- edge inside the panel via negative horizontal margins. Bolder font-weight (700) reinforces the separation; bumping the font color isn't needed because the band itself does the work. First section's header tucks into the panel's top border-radius so the band reaches the corners without a gap. * ui(catalog): rename internal-table category to "Agnes Internal" `bucket` is what /catalog renders as the accordion category header verbatim — "agnes" lowercase didn't read as a real category name and got confused with a system identifier. Bumps to "Agnes Internal". Seed re-applies on every app boot so existing rows pick up the new bucket value via `ON CONFLICT (id) DO UPDATE`. * ui(catalog): split Agnes Internal into its own card on /catalog Previously the three internal tables landed inside the "Core Business Data" card under an "Agnes Internal" accordion alongside Keboola / BQ buckets — readers conflated system telemetry with business datasets, and the data_stats header counter ("3 tables · ~X rows total") only ever counted synced rows so internal tables looked invisible. Split the catalog page into two cards: - Core Business Data: only non-internal source_types (Keboola, BQ, Jira). Accordions group by bucket as before. Stats counter reflects this card's tables. - Agnes Internal: a dedicated card with its own visual treatment (teal accent matching the mode-internal badge in /admin/tables). Flat list (no accordion — only 3 rows, never grows here), each row carries the canonical `agnes query` snippet. Read-only — no profiler click, no In-stack toggle, no sync metadata. Route adds `internal_card` context object; template renders the new card only when it's non-None. * fix(rbac): hide internal tables from /admin/access + drop "my" framing Two related cleanups for the Agnes-internal tables: 1. /admin/access (resource grants) no longer lists them. The `can_access` check has a hardcoded internal-table bypass — security is row-level (per-request view filter), so a table-grain `resource_grants` row would do nothing. Surfacing them in the UI let admins set up grants that silently no-op. Filter at the `_table_blocks` projection so the UI tree never sees them. 2. Display names drop the analyst-perspective "my" framing: "Agnes — my sessions" → "Agnes sessions" "Agnes — my telemetry events" → "Agnes telemetry events" "Agnes — my audit log" → "Agnes audit log" The "my" only makes sense from the querying analyst's seat (`SELECT … FROM agnes_sessions` returns their rows); on /admin/* pages where admin sees / configures them across users, the pronoun was misleading. Description text now spells out the row-level RBAC contract explicitly. Display names update via TableRegistryRepository.register's ON CONFLICT UPDATE on next app boot; no manual cleanup needed. * ui: subtitle notes about agnes_* tables on each Activity Center page The recursive observability story — Agnes serves its own audit / telemetry / session data through the same `agnes query` plumbing analysts use for business data — wasn't surfaced anywhere on the admin pages that show that data. Three pages get a one-liner with the canonical `agnes query` snippet + the RBAC contract (analysts see their own rows, admin sees all): - /admin/activity (Audit log) → agnes_audit - /admin/telemetry (Tool usage) → agnes_telemetry - /admin/sessions → agnes_sessions Sets up the discovery moment for admins: they're reading the page, they see "you can query this from Claude Code", they remember it when an analyst asks "how do I find my own failed tool calls?". * ui(tables): explain "Show log" empty-state on /admin/tables Cache warmup log <pre> renders with a dark background and is only populated by the SSE stream during a Re-warm all run. Opening the page cold + clicking Show log just revealed a black bar with no context — admins couldn't tell what they were looking at. Adds an inline paragraph above the <pre> explaining what the log is, the row format, when it fills in, and where to find the historical audit trail (/admin/activity). The actual <pre> stays empty until SSE events arrive, but the surrounding copy carries the meaning. * ui(tables): auto-open cache-warmup log on Re-warm all click A Re-warm all run takes ~24s per remote BQ row. With the <details> collapsed by default, operators saw the button disable, watched a quiet ~24s pass, and assumed nothing had happened — the streaming log was hidden behind a closed disclosure. Two small JS tweaks: - cacheWarmupRun() opens the details on click, so streamed lines appear without an extra interaction - cacheWarmupOnStart() hides the inline hint paragraph the moment real log content lands, so the dark log block isn't competing with redundant context Hint paragraph also clarifies that only `query_mode='remote'` BQ rows are warmed — operators with only materialized/internal tables would see total=0 and the page would "do nothing" by spec. * ui: trim Agnes internal copy across surfaces Descriptions had grown to explain the extraction pipeline ("parsed out of session JSONLs"), the underlying table ("Backed by usage_session_summary"), the RBAC mechanic ("row-level RBAC at query time — analysts see their own; admin sees all"), and the SQL snippet. Every implementation detail meant another rewrite on the next iter. Strips to one stable line per surface: what the data is, plus "Also available locally for analysis". Mechanics live in code + docs; the page copy says what the user needs to know. Touched: - connectors/internal/access.py: INTERNAL_TABLES descriptions - activity_center.html / admin_usage.html / admin_sessions.html subtitles - catalog.html Agnes Internal card description + row strip - admin_tables.html "Agnes internal" tab hint * fix(internal): is_user_admin arity bugs + + saved-view payload cap Round-1 code review (PR #278) caught two blocking bugs and three nits. Blocking — both `is_user_admin(user)` (single dict arg) calls raised TypeError. is_user_admin signature is `(user_id, conn)`. Affected: - app/api/query.py:_run_internal_query — every POST /api/query that references agnes_sessions / agnes_telemetry / agnes_audit blew up with a 500. The headline analyst-facing feature of this PR was unusable through the API. - app/api/v2_sample.py — same shape; `GET /api/v2/sample/agnes_` returned 500. Both fixed to call `is_user_admin(user.get("id"), conn)`. Added two FastAPI-level tests in test_internal_data_source.py that go through the TestClient — the existing unit tests on `execute_internal_query` and `build_filter_clause` skipped the request-handler layer where the bugs lived, which is why this landed. Nits also closed: - connectors/internal/access.py: `+` allowed in _USERNAME_RE / _USER_ID_RE so RFC 5321 email local-parts (alice+test@x) resolve correctly without hitting InternalAccessError. - app/api/observability.py: saved-view payload capped at 64 KiB to prevent an admin from bloating system.duckdb with a malformed save. fix(security): close non-admin data-leak via underlying-table refs PR #278 R2 review surfaced a non-admin-exploitable bypass: SQL whose string literal contains 'agnes_sessions' routed into the privileged internal-query path, then queried the underlying physical table (usage_session_summary / usage_events / audit_log) directly, escaping the CTE wrapper's row filter. Two reinforcing defenses: 1. find_internal_refs() now strips single-quoted string literals before scanning for alias names — a literal alone no longer routes the request into the privileged code path. 2. execute_internal_query() rejects non-admin SQL that references the underlying physical tables (usage_, audit_log). The CTE wrapper only scopes the agnes_ aliases; a direct FROM on the base table — or a shadowing inner WITH that still has to read the base table — bypasses RBAC. Block before execution with an actionable error pointing to the agnes_* alias. Admins are unaffected (god-mode short-circuit on the filter clause). 3. tests/test_internal_data_source.py — three new negative tests covering literal-only matches, direct-table refs, and CTE shadow attempts. Also tightens usage_ask.py's SELECT-only validator: pragma_table_info, pragma_storage_info, pragma_database_, and duckdb_tables / columns / views / indexes / schemas are reflection functions that leak metadata the analyst question shouldn't reach. \bPRAGMA\b in _FORBIDDEN never matched the function-call form (word-boundary between `A` and `_`). fix(security): dynamic denylist for non-admin internal queries R3 review (PR #278) caught a wider data-leak than R2: the underlying- physical-table guard listed only the 7 usage_* + audit_log tables, but system.duckdb has 30+ other sensitive tables — users (emails + ids), personal_access_tokens, resource_grants, user_groups, user_observability_views, store_, marketplace_, knowledge_, etc. A non-admin SQL like SELECT FROM agnes_sessions UNION ALL SELECT email, id, … FROM users LIMIT 1 would leak every user's row. Replaces the hardcoded denylist with a dynamic allowlist — non-admin SQL may reference ONLY the registered agnes_* aliases. Every other table in `information_schema.tables` (main schema) is rejected. Future migrations that add a new sensitive table are automatically covered without re-editing this module. Also strips SQL comments (`/* /` and `--`) before the identifier scan so a comment-wrapped table name (`//users//`) can't slip past the regex. Four new negative tests pin: `users`, `personal_access_tokens`, block-comment wrap, line-comment wrap. Plus: per-user view-count cap (100) on /api/admin/observability/views so an admin can't fill system.duckdb with thousands of saved views. release: 0.54.0 — Activity Center + Telemetry + Sessions + internal datasource Cuts the work shipped across this PR (Activity Center build, recursive internal data source) into a versioned release. Bumps pyproject.toml to 0.54.0; renames the top of CHANGELOG.md from [Unreleased] to [0.54.0] — 2026-05-12 with a header summary; opens a fresh [Unreleased] section for the next round. --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 22:41:19 +02:00
Vojtech	fb6e930bc9	feat(store-guardrails): per-component description quality + plain-language UX (#276 ) * feat(store-guardrails): enforce per-component description quality Two-tier hard guardrail on flea-market submissions. Empty / placeholder / single-word descriptions now block before any LLM call; vague-but-passes- floor descriptions block on the substantive LLM review layer. Tier 1 — inline mechanical check (src/store_guardrails/content_check.py). Walks the baked plugin tree, evaluates each component (plugin manifest, agents, skills, commands) plus the submission-level form description against a 60-char / 25-char (commands) / 5-distinct-word / 200-char-body floor with a placeholder denylist (TODO, TBD, {{var}}, etc.). Floors calibrated against real ecosystem norms: Claude / superpowers / compound-engineering skill packs cluster 150–220 chars, npm / Docker / VS Code at 100–120. InlineResult.passed now ANDs in content.status. Tier 2 — LLM review extension (prompts.py + llm_review.py). System prompt gains a content-quality criterion; REVIEW_JSON_SCHEMA carries a content_quality {verdict, issues[]} object alongside the existing security findings. is_safe() requires content_quality.verdict == 'pass'. Single LLM call covers both dimensions. MAX_RESPONSE_TOKENS bumped 2000 → 2500 for the extra payload. Verdicts missing content_quality treated as pass (backwards compat with already-recorded rows). Submitter UX: - /store/new wizard now carries a "Before you upload — what passes review" collapsible disclosure on both step 1 and step 2 with the bar + patterns that work. Live char counter on the description field. Per-component preview table (green/red dots from the new summarize_for_preview helper) renders after the ZIP /preview round trip, scoping each finding to its file. - New /store/examples page with rejected/passes pairs for skill / agent / plugin / command plus a "Why these limits" research table. Anchored sections (#skill / #agent / #plugin / #command) so the rejection banner can deep-link by component_type. - Quarantine banner _content_findings.html groups findings by file (one "See <type> example ↗" per component, not per field) and translates field codes (frontmatter.description / body / etc.) to plain-English labels. _content_howto_fix.html surfaces a static "Re-upload as new version" + "See examples" action row beneath any content failure on the entity detail page. - _parse_frontmatter moved to src/store_guardrails/_frontmatter.py so the new check module shares the parser without inverting the app → src dependency direction. Tests: - New tests/test_store_guardrails_content.py (29 cases) covering every failure code per component type plus submission-level checks and the summarize_components / summarize_for_preview helpers. - Extended test_store_guardrails_inline.py for the new InlineResult.content field + aggregate behaviour. - Extended test_store_guardrails_llm.py for the new content_quality verdict pathways (fail blocks, missing field passes). - Backfilled fixture descriptions across test_store_api.py, test_store_entity_versions.py, test_store_put_atomic.py, test_admin_store_submissions.py, test_marketplace_api.py, test_marketplace_v32_endpoints.py so existing happy-path tests clear the new 60-char floor. * fix(content-guardrail): align agents walker with preview + drop import-time .format() Two cleanups from the takeover review on #276 (vr/guardrails-content). 1) `_iter_components` for agents now skips files lacking frontmatter (no `name` AND no `description`). Pre-fix the walker greedily evaluated every `.md` under `agents/` — `agents/README.md` and helper docs got flagged as "frontmatter.description empty" rejections. Worse: `summarize_for_preview` for `type=agent` ALREADY filters the same shape, so the upload preview gave a green dot while the post-bake check gave a red rejection on submit. Two new regression tests in TestAgentsWalkerSkipsNonAgentFiles pin both shapes (README + _NOTES.md) so the preview/check parity stays aligned. 2) `body_too_short` hints now use the same runtime-kwarg substitution pattern as every other hint in the table. Pre-fix the skill + agent body_too_short hints called `.format(min_chars=_MIN_BODY_CHARS)` at module-load time, but the call site `_hint_for(type_, "body_too_short")` didn't pass `min_chars=`, so the format() was just baking the constant at import. Cosmetic inconsistency; pass `min_chars=_MIN_BODY_CHARS` at the call site instead and let `_hint_for` do the substitution like it does for `too_short`. Verified end-to-end: - New TestAgentsWalkerSkipsNonAgentFiles cases fail on the unfixed walker (verified by reverting to the pre-fix file and re-running); pass cleanly after the fix. - Full content-guardrail suite: 25/25 (23 existing + 2 new). - Full pytest: 4189 passed, 25 skipped. release: 0.53.5 — content guardrail (flea-market submitter UX) + catalog ENTITY column + BQ hint dispatch Bundles three threads landed in [Unreleased]: - Vojta's flea-market content guardrail (two-tier mechanical + LLM) - Zdeněk's `agnes catalog` ENTITY column replacement for FLAVOR - Zdeněk's `/api/query` remote_estimate_failed hint dispatch fix Plus the takeover hygiene from #276 review (agents walker preview/check parity + body_too_short hint runtime kwarg consistency) and the backslash-escape fix follow-up to v0.53.4 #275. No DB migration; no API change. Patch upgrade lands transparently. Upload form's new "Before you upload" disclosure + per-component preview table appear on the next dev-VM auto-pull. Quarantine banner now groups findings by file with "See <type> example ↗" deep-links to the new /store/examples reference page. --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-12 21:48:27 +02:00
ZdenekSrotyr	1ade1300c6	fix(bq-hint): drop literal backslash escapes from syntax-error hint string (#275 ) PR #274 (just merged) introduced `\`AS \\\`rows\\\`\`` in the syntax-error branch of _hint_for_bq_bad_request. Python doesn't recognize \\\` as an escape sequence, so the literal backslashes survived into the JSON `hint:` field. Analyst reading the CLI error saw: Backtick the alias (`AS \`rows\``) or rename it ... with visible backslashes — exactly the misleading shape this dispatcher exists to clean up. Self-review caught it; this PR replaces the problematic substring with plain prose ("rename the alias to a non-reserved word (AS row_count) or backtick-quote it BQ-style (AS `rows` with literal backticks around the identifier)") that needs no escape gymnastics. New regression test test_no_hint_branch_leaks_literal_backslashes pins every dispatch branch against `\\\`` and `\\\\` substrings — pytest now catches this class on the next regression instead of waiting for an analyst to spot it. CHANGELOG bullet rephrased to match (the same broken backslashes leaked into the [Unreleased] entry). Verified: 4162 tests pass; 26 in test_api_query_guardrail.py green; demo print of the syntax-error branch shows clean output.	2026-05-12 18:57:46 +00:00
ZdenekSrotyr	5458ccc41b	hygiene: BQ error hint dispatch + catalog ENTITY column (#274 ) Two analyst-UX papercuts surfaced by the v0.53.4 onboarding smoke test. 1) /api/query remote_estimate_failed hint now branches on the BigQuery error class instead of always claiming a column doesn't exist. The previous hardcoded "Most often this means a column referenced … doesn't exist" misled analysts whenever BigQuery actually rejected on syntax — concretely, `SELECT COUNT(*) AS rows FROM …` fails with `Syntax error: Unexpected keyword ROWS at [1:20]` (`rows` is a BQ reserved word) and the hint pointed at non-existent columns. New _hint_for_bq_bad_request() helper dispatches: - "Syntax error" / "Unexpected keyword" → reserved-keyword alias hint with `AS row_count` workaround - "Unrecognized name" / "not found inside" → `agnes schema <id>` - "Table not found" → `agnes catalog` - fallback → enumerate all three 4 unit tests in TestHintForBqBadRequest pin each branch. Existing guardrail tests (test_fallback_fails_fast_on_pure_duckdb_syntax, test_remote_estimate_failed_surfaces_first_error_when_attempts_differ) continue to pass — both hint substrings they assert on still appear in the relevant branches. 2) `agnes catalog` replaces the FLAVOR column with ENTITY. FLAVOR rendered t['sql_flavor'] which duplicated SOURCE for any catalog dominated by one source type — analysts saw `SOURCE=bigquery FLAVOR=bigquery` on every row. ENTITY instead surfaces the upstream BigQuery entity_type (BASE TABLE / VIEW / MATERIALIZED_VIEW) for remote rows; non-remote rows render `-`. The distinction matters operationally: views don't support predicate pushdown, so `agnes query --remote` against a view trips the cost guardrail where the same query against a BASE TABLE pushes down cleanly. The entity_type field has been in the v2 catalog response since 0.51.0; this PR just stops hiding it behind a column header that conveyed no information. JSON output (`agnes catalog --json`) is unchanged — only the human- readable column changed. No DB migration; no API change. Verified: 4161 tests pass locally; 25 in test_api_query_guardrail.py green; the 4 new TestHintForBqBadRequest cases pin each branch.	2026-05-12 18:32:29 +00:00
ZdenekSrotyr	12db59127b	release: 0.53.0 — close Tier B trackers (#259-#261) + admin UI fix (#265 ) (#267 ) * release: 0.53.0 — Tier B trackers + admin UI bugfix Closes #259 (init resume sentinel), #260 (startup parquet-lock sweep), #261 (materialized schema uses local parquet, not BQ), #265 (admin tables apostrophe → HTML-entity escape). Tracker notes: #262 closed as obsolete (pre-empted by 0.51.0 changes), #266 left open pending UX clarification. * fix(init): move resume sentinel from .agnes/ to .claude/ The clean-install integration test (test_clean_install_integration.py) forbids creating .agnes/ in the workspace root via its forbidden_unconditional list — that path is reserved for ~/.agnes/ in the user's HOME (marketplace clone, CA bundle). .claude/ is already created by agnes init for settings.json + hooks, so dropping init-complete next to those keeps the resume sentinel consistent with the rest of Claude Code's workspace surface and lets the clean-install assertions pass. Issue #259. * docs(changelog): point #259 entry at new .claude/init-complete path Follows the sentinel move from .agnes/ → .claude/ to keep the changelog in sync with what 0.53.0 actually ships.	2026-05-12 16:28:41 +02:00
ZdenekSrotyr	48755b9864	release: 0.52.0 — UX/hygiene round (5 fixes from 0.51.0 retro) Closes #254 (agnes sample alias), #255 (wide-table render), #256 (single-flight on bq-metadata-refresh + run_id), #257 (init wording), #258 (progress bar clamp). Tier B trackers left open: #259 (init resume), #260 (stale .lock), #261 (schema cold-start), #262 (docker disk).	2026-05-12 15:09:14 +02:00
ZdenekSrotyr	99b9379ba3	Merge remote-tracking branch 'origin/main' into worktree-catalog-bq-hotfix # Conflicts: # CHANGELOG.md	2026-05-12 11:56:49 +02:00
minasarustamyan	dc5e0e0d11	Marketplace UX overhaul: rich plugin/skill/agent detail + filename rename (#251 ) * Rename agnes-metadata.json to marketplace-metadata.json Curated marketplace enrichment file (.claude-plugin/agnes-metadata.json) becomes marketplace-metadata.json. Clean cut, no fallback — curators of upstream marketplace repos must rename the file on their side. Python API renames mirror the file rename: read_agnes_metadata → read_marketplace_metadata, AGNES_METADATA_REL → MARKETPLACE_METADATA_REL, AGNES_METADATA_MAX_BYTES → MARKETPLACE_METADATA_MAX_BYTES. Synth Claude Code marketplace strip rule (.agnes/** + the metadata file) follows the new filename. * Marketplace detail polish: window cover + 715:310 aspect + helper alignment - Plugin & item (skill/agent) detail hero: 160x160 square cover replaced with a macOS-style window frame (3 traffic-light dots + titlebar label showing the entity name). Body is constrained to 715:310 so curator- uploaded covers no longer crop to a square. Window is 380px wide; meta column and absolutely-positioned top-right install/remove actions stay put. Fallback when no cover_photo_url (translucent gradient + PL/SK/AG initials) is unchanged, just inside the window body. - Inner skill/agent cards in the plugin detail's Internal structure section adopt the same 715:310 aspect (was fixed 78px tall). No window chrome on inner cards — just the matching proportions so covers read consistently across hero, grid tiles, and listing cards. - Curated nested item helper text ("This skill is part of ... — add the bundle to your stack to use it") now stacks UNDER the "Open parent plugin" button instead of being a side-by-side flex sibling in the actions-row. Added align-self: flex-end so the 260px helper box anchors at the right edge of the 300px actions column, matching the button's right edge. * Marketplace My tab: surface the same category + type filters as Flea - Frontend: mp-cat-row and mp-type-row now show on tab=my (previously hidden — type was flea-only, category was flea/curated-only). Curated browse stays plugin-only and continues to hide the type pills. fetchOne() sends the `type` param for tab=my too, so the items endpoint's existing my-branch filter actually receives it. - Backend categories endpoint, tab=my branch: when the type filter is set to skill/agent, skip counting curated subscriptions. Curated plugins are always type='plugin', so they wouldn't survive the items endpoint's type filter; including them in the category counts made the pill numbers overstate what users could actually see in the grid. type=None or type='plugin' keeps the previous behaviour. - CHANGELOG entry under [Unreleased]. * Marketplace plugin detail: render rich content from marketplace-metadata.json Adds five optional plugin-level fields to marketplace-metadata.json and renders them on the curated plugin detail page + listing card: * display_name — friendly h1 / listing-card name / mac-window titlebar label (overrides the technical plugin id) * tagline — punchy 1-line value prop for the hero subtitle and the listing card description (replacing the verbose marketplace.json description on cards) * description — multi-paragraph markdown body, server-side rendered through markdown-it-py and sanitized through nh3 with a description-scoped allowlist (no iframes / no raw HTML / no javascript: links). Powers the "What it does" panel. * use_cases[] — {title, description, prompt} entries that render as a 3-column "When to use it" card grid; each card shows the literal prompt as a code chip so users can copy-paste into Claude Code. * sample_interaction — {user, assistant} dialog rendered in a Claude Code-style dark Catppuccin Mocha transcript panel: monospace user row with a green ">" prompt indicator + sans-serif assistant body with markdown formatting (peach bold, yellow italic, pink inline code, mantle-dark fenced code blocks). All five fields are optional; UI sections only render when populated, so plugins without enrichment look identical to before. Fields are read on-demand from the working tree (cached by mtime per marketplace slug) so curator edits land at the next request without waiting for a sync cycle — same pattern as the existing inner-skill/agent enrichment path. No DB schema bump. Skill / agent rich-content rendering is deferred to a later phase (needs a source-of-truth decision: extend plugin.yml? LLM-generate from SKILL.md / agent.md?). The schema accepts the same fields at skill/agent level today for forward compatibility but the UI ignores them for now. Also: stripped a stale `background-color: var(--bg)` from the global `code` rule in style.css (was making inline code visually disappear on the page background). * Skill / agent detail: render rich content from marketplace-metadata.json Brings the skill/agent detail pages to parity with the plugin detail page. Same rich-content schema (display_name, tagline, description as markdown, use_cases[], sample_interaction) plus two per-item additions: * invocation — curator-provided literal command string. When set, overrides the computed "<manifest_name>:<inner_name>" chip and cleanly supports both "/" skill prefix and "@" agent prefix (the hardcoded "/" in the chip markup is hidden when the curator provides the invocation, so /grpn-eng:query <q> and @grpn-eng:cto-architect both render correctly). * when_to_use — markdown disambiguation block ("Use this for X. For similar Y, see /other-skill") rendered into a new "When to use this" panel below the Example section. Skill / agent category is now per-item overridable in marketplace-metadata.json. When absent, the API keeps the parent plugin's category as the badge so existing items don't lose their category until curators opt in to per-item categorization. The new "Example" Q&A panel uses the same Claude Code-style dark Catppuccin Mocha transcript treatment as the plugin detail — monospace user row with a green ">" prompt indicator + sans-serif assistant body with markdown formatting. All new fields are optional and read on-demand from the working tree. Skills / agents whose marketplace-metadata.json doesn't carry rich content render exactly the same way they did before (frontmatter description + computed slash command + cover from existing v32 enrichment). No DB schema bump. * Fix TypeError in skill / agent detail when curator sets per-item category `curated_skill_detail` and `curated_agent_detail` were passing both `parent` (from `_curated_inner_parent_fields`, which returns the parent plugin's category as a fallback) and `enrichment` (from `_curated_inner_enrichment`, which returns the per-item category override when the curator set one) into `InnerDetailResponse(...)`. Python function-call kwargs unpacking with overlapping keys raises `TypeError: got multiple values for keyword argument 'category'` — it doesn't merge like a literal dict does. The bug only surfaced when the marketplace-metadata.json carried a `category` field at skill / agent level (curator opting into per-item categorization); items without that override hit the endpoint cleanly because only parent provided the key. Fix: build `merged = {parent, enrichment}` first (literal-dict syntax DOES merge, with the right-hand-side winning) and unpack the merged dict. Curator override still wins via the merge order, and the same pattern is future-proof for any other field that lands in both layers later. Plus a regression test in test_marketplace_metadata.py asserting that the inner-resolver carries `category` for downstream merging. * Marketplace detail: tolerate partial curator JSON Server constructed UseCase / SampleInteraction via raw dict indexing (uc["title"], sample["assistant"]), so a curator commit missing any required Pydantic field crashed the whole plugin / skill / agent detail endpoint with a 500. Route both constructions through _safe_use_case / _safe_sample_interaction helpers — partial input silently drops the malformed card / section instead of breaking the page. Regression test in test_marketplace_api.py covers the three shapes: use_case missing a key, use_case with an empty string, and sample_interaction with only user (no assistant). Sibling rich fields still render. * Address PR-251 review (must-fixes + S2/S3 polish) + release-cut 0.50.0 Five must-fixes from the review pass (3 from @cvrysanek's two-stage review, 2 from my independent pass), plus the 0.50.0 release-cut as the last commit on this PR per CLAUDE.md (CLAUDE.md "Release-cut belongs to the PR" rule added in v0.49.1). Must-fixes ---------- 1. Cache eviction: bounded LRU instead of per-marketplace predicate. The previous predicate (`k[0] == marketplace_id and k[1] != mtime_ns`) only swept stale entries for the CURRENT marketplace; with N>100 distinct marketplaces each holding one mtime key, the cap silently failed and memory grew linearly. Replaced with OrderedDict-backed bounded LRU at cap=256, drop oldest insert on overflow. Cache stress test pinned in test_marketplace_metadata.py. 2. Render CPU cap: per-field byte cap on description / when_to_use / sample_interaction.assistant via MARKETPLACE_METADATA_FIELD_MAX_BYTES (= 64 KiB). Without this, a 1 MiB curator markdown body × QPS = curator-controlled CPU burn through pure-Python markdown-it-py. Truncation respects UTF-8 boundaries and logs a warning so the curator sees the cap fire on the next sync. Test for cap + UTF-8-boundary preservation. 3. Inner-detail bypassed the metadata cache. _curated_inner_enrichment, _curated_inner_cover, and curated_detail all called read_marketplace_metadata directly, defeating the mtime cache the plugin listing already shared. Routed all three through _read_metadata_cached so skill/agent detail hits are O(1) re-parses per marketplace per mtime instead of O(QPS). 4. Truthy-vs-presence trap in plugin/inner enrichment merge. API-layer writers used `if resolved.get(k):` which silently dropped any future falsy-but-valid resolver field (bool featured=False, int priority=0, str category=''). Switched to presence check (`if k in resolved`) so the resolver is the authority on field presence; `{parent, enrichment}` merge respects whatever the resolver decided to ship. 5. Vendor-agnostic OSS cleanup. Removed operator-specific token references (/grpn-eng:, @grpn-eng:, .foundryai/) from src/marketplace_metadata.py docstring, app/web/templates/ marketplace_item_detail.html JS comment, docs/curated-marketplace- format.md, and tests/test_marketplace_metadata.py fixtures. Replaced with generic /my-plugin:tool / @my-agent:role / .example/ placeholders. CHANGELOG --------- - New "### Fixed (PR #251 follow-ups)" section documenting all 4 code-side must-fixes - New "### Internal" section noting the vendor cleanup + new tests - BREAKING bullet for the file rename now covers operator-side migration: running instances see plugin enrichment disappear from the UI until upstream curator renames + nightly sync overwrites the working tree; POST /api/marketplaces/{id}/sync forces refresh sooner - Stripped /grpn-eng: leaks from the existing skill/agent rich-content bullet Tests ----- 128 targeted tests pass (test_marketplace_metadata, test_marketplace_api, test_marketplace, test_markdown_render, test_marketplace_synth_strip, test_marketplace_filter). New tests added: - 6 XSS regression tests on render_safe (javascript:/data:/vbscript: schemes via autolink, reference link, and mixed-case + positive http/https/mailto + noopener noreferrer rel) - 3 byte-cap tests (truncation + UTF-8 boundary + under-cap pass-through) - 1 cache eviction stress test (>256 marketplaces -> bounded at cap) - 1 truthy-vs-presence resolver-contract test Release-cut ----------- - pyproject.toml 0.49.1 -> 0.50.0 (minor; BREAKING file rename per pre-1.0 CHANGELOG note: "breaking changes called out under Changed or Removed with the BREAKING marker") - CHANGELOG [Unreleased] -> [0.50.0] - 2026-05-12, new empty [Unreleased] on top. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com> Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-12 08:38:39 +00:00
ZdenekSrotyr	b6cdd68e8d	feat(catalog): entity_type + validated where_examples + view-aware cost-guard + scheduler hygiene Three behavioural improvements driven by the sub-agent end-to-end test findings, plus scheduler tweaks to prevent the post-deploy contention burst we measured. CATALOG (catalog-side bugs the test agents tripped on): - new entity_type field per remote row (BASE TABLE / VIEW / MATERIALIZED VIEW). For views, rows + size_bytes return null instead of the misleading 0 that __TABLES__ reports. - where_examples now validates against the table's actual schema (cached known_columns from refresh). The pre-fix behavior blindly advertised `country_code = 'CZ'` on tables with no country_code column — the sub-agent tests reliably hit this on unit_economics. - new known_columns + entity_type columns on bq_metadata_cache; populated by bq_metadata_refresh.refresh_one from the same fetch_bq_columns_full call (no extra BQ roundtrip) plus a cheap INFORMATION_SCHEMA.TABLES lookup for table_type. QUERY COST-GUARD: - remote_scan_too_large suggestion now names views explicitly: `Target(s) <ids> are VIEW or MATERIALIZED VIEW. BigQuery does not push LIMIT into the view body — SELECT * FROM <view> LIMIT 1 still runs the full underlying scan.` Programmatic consumers get a new view_targets field on the error detail. SCHEDULER HYGIENE (the post-deploy 1-minute window where concurrent parquet downloads dropped to ~1 MB/s): - SCHEDULER_STARTUP_GRACE_SECONDS (default 60) holds the first tick so the burst doesn't overlap cache_warmup writes. - SCHEDULER_BQ_METADATA_INITIAL_OFFSET_MAX_SECONDS (default 900) randomises bq-metadata-refresh's first-fire offset. TESTS: - test_bq_metadata_cache_repo: entity_type + known_columns round-trip - test_v2_catalog_remote_metadata: where_examples validation, views return null rows/size_bytes, cold rows have empty examples - test_api_query_guardrail: VIEW-aware suggestion text + view_targets - test_connectors_bigquery_metadata: entity_type lookup mock + new fields in TableMetadata expectations - test_scheduler_sidecar: grace + jitter env-var resolution	2026-05-12 10:37:35 +02:00
ZdenekSrotyr	b3841f5b6c	release: 0.50.0 — persistent BQ metadata cache + scheduled refresh; catalog never blocks on BigQuery Since 0.47.0 GET /api/v2/catalog enriched each remote BigQuery row by fetching INFORMATION_SCHEMA.TABLE_STORAGE + COLUMNS through the DuckDB BigQuery extension inside the request. On cold caches that fanned out to O(N) sequential BQ jobs-API roundtrips — easily 90 s+ on partitioned / view-backed tables — and reliably blew the CLI's 30 s httpx ReadTimeout. Reproduced with py-spy: three AnyIO worker threads stuck inside connectors/bigquery/metadata._fetch_via_legacy_tables. Refactor: enrichment is read exclusively from a new persistent bq_metadata_cache DuckDB table (schema v40), populated by a scheduler- driven refresh job at SCHEDULER_BQ_METADATA_REFRESH_INTERVAL (default 4 h). Cold catalog response on a fresh container is now tens of milliseconds with metadata_freshness=never_fetched for unwarmed rows. New surface: - POST /api/admin/run-bq-metadata-refresh (scheduler-driven, full) - POST /api/v2/metadata-cache/refresh?table=<id> (admin, single) - GET /api/v2/metadata-cache/status (auth, non-admin) - metadata_freshness field per catalog row Removed (internal API): v2_catalog._size_hint_for_row, _resolve_remote_metadata, _metadata_provider_for, _build_metadata_request, _materialized_size_hint, in-memory _metadata_cache. Response shape unchanged for external consumers. 991 tests passing; 2 pre-existing failures (test_db v3→v4 ladder, test_cli_binary_rename) unrelated to this change.	2026-05-11 20:37:17 +02:00
minasarustamyan	9de679c714	System plugins (schema v39) + marketplace UX polish + drop legacy pages (#241 ) * System plugin tier with mark/unmark fanout (schema v39) Adds a mandatory plugin tier so admins can pin a small set of curated plugins into every user's stack from day one. Marking a plugin via the new toggle on /admin/marketplaces materializes resource_grants for every group and user_plugin_optouts subscriptions for every user, so the existing resolver pulls the plugin into every served set without a new filter layer. Hooks on user-create (Google OAuth, magic-link, admin POST, scheduler) and group-create propagate the same materialization to new principals. UI locks: /admin/access disables the checkbox with a SYSTEM pill; /marketplace cards swap the "In stack" green pill for an amber "Required" badge with shield icon; the plugin detail install button reads "Required by your org"; /my-ai-stack toggle is disabled. Bypass paths return 409 (DELETE /api/admin/grants for system grants, PUT /api/my-stack/curated/.../{enabled:false}, DELETE /api/marketplace/curated/.../install). Unmark only flips the flag — materialized rows persist so admins curate cleanup at their leisure through the now-unlocked /admin/access checkboxes. * Marketplace UX polish + drop legacy /store and /my-ai-stack pages Two-part cleanup post-v39: (1) Page deletion. /store and /my-ai-stack were already replaced by /marketplace?tab=flea and /marketplace?tab=my respectively, but the standalone routes lingered. Hard delete in dev mode — no redirects, stale bookmarks 404. The /store/new upload wizard, the flea detail/edit pages, the admin queue, and all /api/store/* + /api/my-stack endpoints (CLI consumers) stay. Internal hardcoded hrefs in the upload wizard's Cancel button and the advanced-setup page repointed to the marketplace tabs. (2) Detail-page install button rework. The single button that morphed between "+ Add to my stack" and "✓ In your stack" did not communicate uninstall affordance. The installed state now renders an inline white status label before a separate red-bordered "✕ Remove from stack" button on the same row, both at identical height to avoid layout shift. System plugins keep their locked amber "✓ Required by your org" pill (no Remove button — API refuses 409). The post-action hint panel now fires on remove too with the title flipped to "✓ Removed from your stack" — Claude Code needs the same /update-agnes-plugins refresh either way. Also: /admin/marketplaces Details modal "Mark as system" toggle redesigned. The button was near-invisible (matched neutral row metadata). It's now a balanced amber-toned chip with shield icon and a structured confirm modal replacing the native confirm() dialog that summarizes fanout consequences before commit. * Move stack-hint inside hero with glass-on-gradient styling The post-action hint card ("✓ Added to your stack" with the /update-agnes-plugins recipe) used to live below the hero in panel-what (gray card on white page body). Clicking add/remove inserted/removed it between the hero and content, shifting the panels below — a noticeable scroll jump. The hint is now anchored inside the hero's top-right corner alongside the install/remove buttons, both as flex children of an absolutely positioned .actions container. The card uses a translucent white-on-glass treatment that adopts the hero's kind color (blue for plugin, green for skill, purple for agent) without per-kind branching. Hero is always tall enough (160px photo) to contain the action+hint stack without overflow, so toggling the hint visibility doesn't grow the hero or shift body content. The hero-head grid reserves a third 300px column for the absolute actions overlay so meta gets the proper 1fr free space instead of being squeezed by a padding-right hack. Responsive breakpoint at 1100px reflows the actions stack below hero-head when the viewport isn't wide enough to keep meta + actions side-by-side comfortably. * Add optional -DataPath bind mount to run-local-dev.ps1 When the operator wants to inspect DuckDB files (system.duckdb, extracts, marketplaces, store/, …) directly from Windows Explorer, the named volume inside the Docker Desktop WSL VM isn't reachable. The new -DataPath param generates a transient compose override that rebinds /data on app, scheduler, extract (and Caddy's /srv:ro mirror) to a Windows host folder. Fully additive — when -DataPath is omitted everything behaves exactly as before: no override file is generated, $composeFiles array is unchanged, finally cleanup is a no-op. Existing positional invocations (.\run-local-dev.ps1 up \| down \| logs) keep binding to $Action because $DataPath is a named-only parameter with no Position attribute. The override is written via [System.IO.File]::WriteAllText so the YAML is BOM-less across PS 5.1 / 7+ — Compose rejects BOM-prefixed YAML on Windows. The override file is unique per PID and removed in the script's finally block so concurrent invocations and crashes don't leak files. * factor mark_system fanout into UserCuratedSubscriptionsRepository The endpoint imported UserCuratedSubscriptionsRepository, ignored it (noqa: F841), then duplicated the user-side fanout SQL inline. Adds fanout_system_for_plugin() symmetric to the existing fanout_system_for_user() and routes mark_plugin_system through it — removes the dead import + 14 lines of inline SQL, returns the same `affected_users` delta count, no behavior change. * drop customer-specific path from .ps1 example Per CLAUDE.md vendor-agnostic OSS rule: replaced C:\\Business\\Groupon\\Agnes\\agnes-data with the generic C:\\Users\\<you>\\agnes-data placeholder so the docstring example reads cleanly on any reviewer's box. * release: 0.48.0 + parallelize Release-workflow pytest Cuts the release shipped via #228 #230 #231 #232 #233 #234 #236 #237 #238 #239 #240 plus this PR (#241). Major changes: - System plugin tier (schema v39) — admins mark a plugin mandatory; fans out RBAC grants + subscriptions to every existing user/group plus hooks for new principals - BREAKING: removed standalone /store + /my-ai-stack page routes (replaced by /marketplace?tab=flea + /marketplace?tab=my) - Setup-prompt + bootstrap recovery fixes (#240) - DuckDB CHECKPOINT-on-shutdown + 60s compose grace (#235) - Marketplace + flea-market UX polish, agnes-metadata.json enrichment Bonus: switch release.yml test step to `-n auto` (matches ci.yml). Single-threaded was 15-20 min and frequently the bottleneck on PR mergeability — now ~6 min. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com> Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-10 19:15:41 +00:00
Vojtech	929520f5e1	Flea-market edit feature with version history (schema v37) (#239 ) * feat(store): flea-market entity edit feature with version history (schema v38) Owner + admin can now edit a store entity from a real Edit page at /marketplace/flea/{id}/edit, replacing the prior "coming soon" placeholder. Editable: display name, description, category, video URL, cover photo, and an optional new bundle. Type is locked (400 type_locked). Display-name change renames the on-disk slug for both live plugin/ and version dirs (reuses rename-on-archive helper). Schema v38 (originally drafted as v37; renumbered after rebase onto main where v37 was taken by the curated marketplace enrichment). Versioning model: * Each bundle update bakes into ${DATA_DIR}/store/<id>/versions/v<N+1>/plugin/ and runs the standard guardrails pipeline. * DEFERRED PROMOTION: live plugin/ + entity.version_no stay at the prior approved version through the LLM review window so existing installers keep receiving the previously approved bundle. Live swap + version_no/version/file_size bump happen only on LLM approval. Blocked verdicts leave the prior version serving forever. * store_entities gains version_no INTEGER + version_history JSON. Each version_history entry carries hash, sha256, size, submission_id, created_at, created_by. * Existing entities backfill to v1 with a single-entry history seeded from the row's current `version` hash. Initial create also seeds versions/v1/plugin/ so future restore can copy v1 bytes forward. Concurrency: * Block-while-pending: an in-flight LLM review blocks any further edit with 409 prior_version_pending. Owner waits 5-30s; Edit button on detail page renders disabled in the same window via the new edit_in_flight flag (decoupled from quarantine_sub since the deferred-promotion flow keeps visibility='approved'). Rollback: * New endpoint POST /api/store/entities/{id}/versions/{n}/restore (owner + admin). Copies vN bundle forward as v<max+1> and re-runs guardrails (rules tighten over time; pre-approved bundles re-validate). Forward-only history. Same deferred-promotion semantics — live stays at prior version until LLM approves the restored copy. UI: * New /marketplace/flea/{id}/edit page (owner + admin gated). * Versions card on plugin + item detail templates (owner/admin only) via shared _flea_versions.html partial. * Admin queue gains v# column with current badge + separate Hash column. Submission detail surfaces Version + Bundle hash rows. * Activity timeline split into per-submission + entity-wide cards; entity-wide rows render vN chips when audit row params reference a specific version. * Section headers (Manifest / Static / Quality / LLM review) tag with vN chip via shared macro. * Reviewed-by-model field surfaces explanatory text per status. * Banner upload-failure now redirects to detail page on submission_blocked instead of staying stuck. Tests: 24 in tests/test_store_entity_versions.py covering metadata- only edit, bundle-edit version bump, type lock, block-while-pending, name change disk rename, restore flow + 404/400/403 paths, edit page 404 for non-owner, versions card visibility gating, admin queue v# column, admin detail Version/Hash rows, deferred-promotion installer contract (pending review doesn't break installer / blocked verdict keeps prior / approved promotes), admin can edit/restore non-owned, restore deferred promotion, audit log per-version params. 214 tests green across guardrails + edit + admin + repo + schema suites. * docs(store): refresh update_entity docstring to match deferred-promotion + submission-status gate Bring the docstring in sync with the actual fixes from the prior commit. The pre-fix wording said the gate read visibility_status='pending' AND submission status — under deferred promotion that would never fire for v2+ edits. Now describes: - Block-while-pending gates on submission.status DIRECTLY, independent of visibility (so v2+ deferred-promotion edits don't slip through). - Display-name + bundle change defers the live rename to promotion; metadata-only renames stay immediate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 00:14:33 +04:00
minasarustamyan	6fe67d5279	Curated marketplace enrichment via agnes-metadata.json + curator metadata (#234 ) * Curated marketplace enrichment via agnes-metadata.json + curator metadata Adds a second well-defined metadata file `.claude-plugin/agnes-metadata.json` that upstream marketplace repos can opt into, providing per-plugin (and per-skill / per-agent) cover photo, demo video URL, doc links, and category override. The Claude Code marketplace contract is untouched — agnes-metadata.json + the convention `.agnes/` directory are stripped from the synthetic Claude Code marketplace served via /marketplace.zip and /marketplace.git/, so user instances see a clean Claude Code repo with no Agnes-only metadata. Highlights: - DB schema v32 — adds curator_name + curator_email on marketplace_registry, cover_photo_url + video_url + doc_links on marketplace_plugins. - Mandatory curator at marketplace registration, editable later through the admin UI; surfaces on cards + detail pages in place of owner_todo. - External-asset mirror cache at ${DATA_DIR}/marketplace-cache/<slug>/ with conditional GET, 60s timeout, 10 MB body cap, SSRF guards, and Wikipedia-policy-compliant User-Agent. - Strict drop semantics — anything Agnes can't deliver as a real PDF / Markdown / plain text doc, or a real PNG / JPEG / WebP cover, is dropped from the served metadata; UI looks identical to no-entry case (gradient placeholder for missing covers, no row in the doc list). - Doc allowlist + image allowlist enforced on both the curated mirror flow and the Flea upload flow (/store/new); shared module src/marketplace_assets.py. - New /api/marketplace/curated/{mp}/{plugin}/{asset,doc,mirrored}/... endpoints with path-traversal guards + RBAC + Content-Disposition attachment for docs. - Curator-focused format guide at /marketplace/format-guide; canonical source is docs/curated-marketplace-format.md, also linked from the admin /admin/marketplaces page next to + Add Marketplace. See CHANGELOG.md under [Unreleased] for the full breakdown. Fix format-guide test assertion to match shortened disclaimer The 'Flea Market' phrase was trimmed out of the disclaimer in docs/curated-marketplace-format.md after the curator-focused rewrite. Update the rendered-HTML test to assert the channel-scoping phrase that's actually present ('Curated Marketplace channel only') rather than the 'Flea Market' contrast that's no longer in the doc. * Drop unused 'version' field from agnes-metadata.json schema The parser never read it; it was a YAGNI placeholder for future schema evolution. Curators don't need to wonder what to put there when adding the file for the first time. Will be re-added if and when we actually introduce a backwards-incompatible schema change. * Harden asset mirror against SSRF via redirect + DNS rebinding The pre-flight _is_safe_url check validated only the initial URL; urllib.request.urlopen then followed redirects and re-resolved DNS for the actual connection — both bypassable. Attacker-controlled origin could 302 to http://169.254.169.254/... and exfil cloud metadata; attacker-controlled DNS could return public IP first / 127.0.0.1 second. Replace urlopen call with a shared OpenerDirector wired through three custom handlers: _SafeRedirectHandler re-runs SSRF allowlist on every redirect Location (max 5 hops, down from urllib's 10), and _PinnedHTTPHandler / _PinnedHTTPSHandler connect to the IP that passed validation rather than re-resolving the hostname. TLS SNI + cert verify stay bound to the original hostname. _resolve_safe returns the validated IP (the existing _is_safe_url 2-tuple wrapper stays for backwards compatibility) and rejects round- robin DNS that mixes a public + private record. _UnsafeRedirectError is a typed exception so _fetch_url can map redirect blocks to terminal 'rejected' status (not transient 'failed'). _http_open is the single call site so tests can mock at one well-defined seam. Tests cover redirect blocking (link-local, loopback), redirect-error unwrapping inside URLError, pinned-IP connection target, and the end-to-end DNS-rebinding scenario. Existing tests that mocked urllib.request.urlopen are migrated to mock _http_open. * Harden /asset/ endpoint against stored XSS The endpoint served any file in the cloned marketplace repo with stdlib-detected Content-Type, so a curator who landed evil.html (or a renamed evil.png carrying HTML bytes) in the working tree got a same-origin XSS — the response shares cookie scope with /admin and /api/me/. The asset endpoint is image-only by contract (cover photos referenced from agnes-metadata.json + inner skill / agent cards), so applying the same allowlist + magic-bytes pattern that /doc/ already uses closes the gap without breaking any legitimate use case. Three layered checks: extension in IMAGE_EXTENSIONS (.png/.jpg/.jpeg/.webp; SVG excluded — <script> inside SVG executes), validate_image_file magic bytes (defeats rename-extension attack), Content-Type pinned from the validated extension (never stdlib mimetypes). Defense-in-depth: X-Content-Type-Options: nosniff stops browser MIME sniffing; Content-Security-Policy: default-src 'none' blocks script / iframe execution even if a future regression let HTML through. Tests cover the .html extension reject, the renamed-HTML-as-PNG magic- bytes reject, the .svg reject, and the happy-path PNG with security headers attached. The pre-existing path-traversal test seeds a real PNG instead of ok.txt now that the endpoint is image-only. Enforce mandatory curator on marketplace PATCH The POST handler enforced curator_name + curator_email at create time, but PATCH treated empty / missing curator inputs as 'no change'. Legacy rows that pre-date v32 (curator_name=NULL) could be edited indefinitely without ever filling the curator gap, and OWNER_TODO_PLACEHOLDER lingered on every /marketplace card. Reject the PATCH with 400 when the post-merge row would persist with empty curator. The check fires after the existing field-merge logic, so once-filled rows that don't touch curator still pass through (their existing values fall through from the DB row). DB column stays nullable so untouched legacy rows continue to coexist — the gate fires only the moment an admin opens the edit modal. Existing PATCH semantics preserved: empty-string input still means 'leave existing value alone', and once-filled curator can't be cleared (those test cases pass unchanged). New test seeds a legacy row directly via the repository, then exercises url-only PATCH (rejected), partial-fill PATCH (rejected), and full-fill PATCH (succeeds); a follow-up no-curator PATCH on the now-formed row also passes. * Drop unused curated-marketplace helpers (PR #234 review) * build_db_payload — imported by src/marketplace.py but never called. The strict-drop semantics it would have implemented were re-written inline in _refresh_plugin_cache (see the comment block there). The standalone helper still carried the old fall-back-to-original-external- URL-on-mirror-failure behaviour, which contradicts the documented drop-when-can't-deliver contract — a future contributor who re-wired it would have introduced a silent regression. Delete with the helper + the import + the comment that referenced it. * _resolve_marketplace_name — one-line shim with no remaining call sites. Callers use _resolve_marketplace_meta which returns name + curator together, avoiding the double DB hit the shim exists to hide. * '# noqa: F401 Optional kept for forward-compat' was wrong — Optional IS used in src/marketplace.py (line 70 and line 238). Drop the noqa comment so a future ruff run doesn't try to remove a real import. Removing build_db_payload also drops the only remaining use of Optional in src/marketplace_metadata.py, so the import comes out there too. * Cap agnes-metadata.json size + catch RecursionError on parse The reader is invoked once per marketplace per sync and the file is curator-controlled. Two failure modes were unguarded: * Multi-GB JSON: path.read_text() pulled the whole file into memory before json.loads even ran. A curator with commit access to an upstream repo could OOM the sync worker. * Deeply-nested JSON under any size cap: cpython's recursive object / array parser raises RecursionError at ~1000 levels of depth. RecursionError is a RuntimeError, not ValueError, so the existing catch let it propagate up and abort the entire sync — every other marketplace in the same pass got skipped. Add AGNES_METADATA_MAX_BYTES = 1 MiB (a real metadata file with covers, docs, categories for ~50 plugins fits in <100 KB so the cap is generous) and gate the size check on path.stat().st_size before the body read. Broaden the parse except to (ValueError, RecursionError) with a unified log line. Both failure modes degrade to the same empty-dict fall-back the malformed-JSON path already used, so one bad upstream never aborts the rest of the sync. Tests cover the size cap firing before json.loads (whitespace-padded valid JSON exceeding the cap) and the recursion path (5000 nested arrays — past cpython's default recursion limit but well under the size cap). * Persist asset-mirror manifest per body write, before unlink sync_assets wrote each body atomically (tmp + rename) but persisted the manifest only at the end of the batch. A kill -9 mid-Phase 2 left on-disk files the manifest never referenced. Once a curator dropped that URL from agnes-metadata.json, Phase 3's cleanup had no record of the file and the orphan stayed forever — there's no GC pass walking the cache dir today, so disk would slowly bloat. Phase 2 (body-write iteration): after the in-memory manifest mutation, persist BEFORE unlinking the previous body. The crash window narrows from 'all of Phase 2' to 'between persist and unlink' (microseconds). A persist failure mid-batch keeps the previous body on disk — the on- disk manifest still references it, and a stale-but-existing file beats a 404. Cost: one extra tmp+rename per body write; manifest is a few KB so the overhead is negligible vs. the HTTP fetches. Phase 3 (curator-removed URLs): same discipline. Collect the to-delete relpaths, persist the manifest with the entries already gone, THEN unlink. A crash mid-cleanup leaves at most a microsecond window where files exist despite the manifest no longer naming them. The next sync reads the (correct) manifest and the orphan stays orphaned, but the served state is consistent. Tests cover per-body persist call count, the post-update on-disk manifest content, and Phase 3 ordering verified by reading the on-disk manifest from inside Path.unlink. * Consolidate marketplace video embeds + format-guide CSS The YouTube nocookie / Vimeo / <video> / link-fallback detection logic was duplicated verbatim in marketplace_plugin_detail.html and marketplace_item_detail.html (~40 JS lines each, with subtly-different inline styles). Both templates now {% include %} a single _marketplace_video_embed.html partial inside their IIFE so the regex, the nocookie attribute set, and the unknown-host link fallback live in ONE place — future tweaks (new host, new attribute, fixed sandbox flag) no longer need to be applied twice in lockstep. The .video-wrap selectors (one inline <style> rule in plugin_detail, one inline style='...' attribute in item_detail) are replaced by the existing .video-embed 16:9 wrapper in style-custom.css, with new .video-embed video / .video-embed a child rules added so the wrapper handles all four embed shapes uniformly without per-template positioning. The 60-line inline <style> block in marketplace_format_guide.html moves verbatim to style-custom.css under a new 'Marketplace format guide page' section, scoped to .format-guide so other pages aren't affected. No user-visible behaviour change: the rendered HTML for valid YouTube / Vimeo / mp4 / external links is byte-identical to before, and the format-guide page renders the same. * Maintainability cleanup batch (PR #234 review) #10: drop _path_under from app/api/marketplace.py — it was a byte- equivalent clone of _safe_join (same Path.resolve(strict=True) + relative_to() containment check). The three v32 endpoint handlers (/asset, /doc, /mirrored) now share the existing helper. #14: rename src/marketplace_assets.py → src/marketplace_asset_validation.py so the file's purpose is obvious from the name and the previous overlap with src/marketplace_asset_mirror.py is gone. Six call-site imports updated in lockstep; CHANGELOG references under [Unreleased] updated to track the new path. #11: consolidate the URL builders that resolve /api/marketplace/curated/<slug>/<plugin>/{asset,doc,mirrored}/... paths. _internal_asset_url / _internal_doc_url / _mirrored_asset_url lived in src/marketplace.py, while a copy named _mirrored_url lived in app/api/marketplace.py with a 'must stay aligned' comment. New module src/marketplace_urls.py is the single source of truth — both call sites import from it and a future URL-format tweak only needs to change one file. The _ROUTE_PREFIX constant collapses the per- function f-string repetition. The route-handler endpoints themselves still own the path string literals (keeping the builders identical to the route declarations remains a checklist item, not a runtime guarantee). * Re-key asset-mirror manifest by (plugin, url) + dedup HTTP fetches The manifest used to be keyed by URL alone, so two plugins in the same marketplace referencing the same external image (a shared CDN icon, a common cover) collided on entry.plugin_name — last writer won. The DB row for the losing plugin then stored a served URL pointing under the winning plugin's tree, and require_resource_access denied legitimate access on one side and let the other plugin's user reach the wrong asset. In-memory: Dict[Tuple[str, str], MirrorEntry] keyed (plugin_name, url). On disk: format flips from {url: entry} dict to [entry, ...] list of self-describing entries (each carries plugin_name + url + the previous fields). JSON keys can't be tuples; encoding 'plugin::url' would just shift the parsing burden. Phase 1 of sync_assets deduplicates fetches by URL — three plugins sharing one URL share one HTTP request. The conditional-GET prior is picked from any owning plugin's prior entry; if their etags diverge (rare) we miss one 304 and pay for a full re-download instead. Phase 2 still creates a per-(plugin, url) manifest entry pointing under the plugin's own subdir, and Phase 3 cleanup is keyed the same way so dropping a URL from one plugin's metadata doesn't disturb another plugin still referencing it. Body files stay per plugin (RBAC-clean isolation: deleting plugin A's cache can't strand plugin B). Bandwidth saved by fetch dedup. Consumer code re-keyed: src.marketplace._refresh_plugin_cache rebuilt served_url_for / mirror_status as composite-keyed maps; app.api.marketplace._resolve_external_via_mirror / _curated_inner_cover / _curated_inner_enrichment look up by (plugin_name, url). Tests cover per-plugin manifest entries with shared URL, the single HTTP fetch for N plugins, and Phase 3 drop-one-keep-other. All existing tests migrated to composite key access; v2 list format assertions verify on-disk shape. * Migrate asset mirror from urllib.request to httpx The asset mirror was the only HTTP call site in Agnes still using urllib.request; every other module (CLI, Jira / OpenMetadata / OpenAI connectors, scheduler, Telegram bot) already used httpx. The asset mirror was added in this PR's base commit, so this is the only chance to bring it into convention before someone copies it as 'the pattern for HTTP fetches in Agnes'. Three concrete benefits beyond consistency: * SSRF defence collapses from five urllib classes (_PinnedHTTPConnection, _PinnedHTTPSConnection, _PinnedHTTPHandler, _PinnedHTTPSHandler, _SafeRedirectHandler) into one _SSRFGuardTransport. httpx invokes handle_request() on every redirect hop, so re-validation is free — we don't need a custom redirect handler at all. * DNS-rebinding defence: the transport rewrites request.url.host to the SSRF-validated IP before delegating to super().handle_request(). httpcore connects to whatever URL.host says, so this pins the connection without subclassing HTTPSConnection. The original hostname goes into the Host header + the sni_hostname extension so TLS / vhost routing still bind to the curator-supplied hostname. * Error handling: one httpx.HTTPError catch-all for transport errors, plus specific httpx.TimeoutException / httpx.TooManyRedirects branches for clearer diagnostics. Matches the _translate_transport_error shape in cli/client.py. The shared httpx.Client is built lazily at module load (same pattern as cli/client.py:_get_shared_client) with follow_redirects=True, max_redirects=5, timeout=HTTP_TIMEOUT_SEC, and our custom transport. Externally observable behaviour is unchanged: same FetchOutcome statuses, same manifest format, same conditional GET semantics, same body-size cap. Tests migrated from urllib-shaped fakes to httpx-shaped (status_code, iter_bytes, context manager). Five urllib-specific tests replaced with httpx equivalents — three transport unit tests + one DNS-rebinding integration test that verifies host rewrite via monkey-patched super().handle_request. One test deleted without replacement (unwrap-URLError-wrapping-an-_UnsafeRedirectError — urllib-specific, not applicable to httpx). * Surface curated agnes-metadata enrichment on My Stack tab GET /api/marketplace/items?tab=my built each curated row from the on-disk marketplace.json by way of resolve_allowed_plugins, which doesn't carry the agnes-metadata enrichment columns (cover_photo_url, video_url, category override, doc_links). The handler then hard-coded cover_photo_url=None on the synthetic row. Result: once a user clicked '+ Add to my stack' on a curated card, the same plugin in tab=my rendered with the gradient placeholder instead of its cover photo — confusing parity break vs. the curated tab where the same row goes through MarketplacePluginsRepository and gets the enriched columns. Pre-load the enriched marketplace_plugins rows for every marketplace the user is subscribed to, then look each granted+subscribed plugin up by (marketplace_id, plugin_name). Fall back to the on-disk synthetic shape only when the DB row is missing — happens during the rare race where RBAC is granted before the first sync cycle ingests the plugin. RBAC gating (granted set from resolve_allowed_plugins) is unchanged so this fix can't widen visibility; it just upgrades the data shape behind cards the user was already going to see. Per-marketplace list_for_marketplace beats N gets — typical user is subscribed to <5 marketplaces, so this is at most a handful of queries vs. one per subscribed plugin. Regression test seeds a plugin with cover_photo_url + category override, subscribes the user, hits /api/marketplace/items?tab=my, and asserts photo_url + category come through. The misleading 'fall through to gradient until the user re-visits the curated tab' comment is gone. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>	2026-05-09 17:01:37 +02:00
Vojtech	d6ad08f107	Flea-market upload guardrails + soft delete + JOIN-based admin queue (#233 ) * feat(store): flea-market upload guardrails + soft delete + JOIN-based admin queue Adds an end-to-end guardrails pipeline for store uploads (manifest + static-security + LLM review), persists blocked bundles for forensics, introduces soft-delete (Archive) semantics, consolidates the legacy /store/{id} surface into /marketplace/flea/{id}, and reworks the admin queue so lifecycle filters read live entity visibility via LEFT JOIN rather than a denormalized submission column. Schema v29 → v35: * v29 store_submissions table + store_entities.visibility_status * v30 file_size, bundle_sha256, bundle_purged_at on submissions * v31 reshape store_submissions (drop legacy unique on entity_id) * v32 store_entities.archived_at/by + 'archived' visibility value * v33 drop store_submissions.retry_count (unused) * v34 ensure idx_store_submissions_entity exists post column-drop * v35 broaden visibility_status enum + JOIN architecture cutover Pipeline (src/store_guardrails/): * Inline checks: manifest_check, static_scan, quality_check * LLM review configurable haiku\|sonnet\|opus (default haiku) * BackgroundTasks-driven async path with structured-output JSON * Per-submitter daily quota (default 50) * 30-day TTL purge job (POST /api/admin/run-blocked-purge) * Bundle SHA256 + size persisted; sha256 survives purge for forensics Visibility model: * pending \| approved \| hidden \| archived * _enforce_visibility returns 404 (no leak) for non-owner non-admin * Owner sees own non-approved entries via include_owner_id widening * Install refused with 409 entity_not_approved when not approved Soft-delete (DELETE /api/store/entities/{id}): * Default = soft (visibility_status='archived'); existing installs keep getting served the bundle so users don't lose the plugin * ?hard=true admin-only: drops bundle + cascades user_store_installs * Hard-delete preserves entity_id on submission as tombstone so audit_log linkage survives for the activity timeline Admin queue lifecycle (the JOIN refactor): * Verdict (store_submissions.status) is immutable forensic record * Lifecycle (store_entities.visibility_status) is live state * /admin/store/submissions Archived chip translates to `e.visibility_status='archived'` via LEFT JOIN — any path that flips visibility surfaces in the queue immediately * Detail page renders Status (verdict) and Entity lifecycle side by side so admins see "approved at review, now archived" at a glance URL consolidation: * /store/{id} deleted (no redirect, stale bookmarks 404) * /marketplace/flea/{id} is the canonical detail surface * Three in-tree callers (upload-success, my-stack card, store listing card) updated to point at the new URL * Quarantine banner extracted to _quarantine_banner.html partial, self-guarded, included from both flea detail templates * Banner JS auto-refreshes when the verdict lands by polling /api/marketplace/flea/{id}/detail (visibility_status + submission_status — the latter is needed because blocked_llm keeps the entity at visibility_status='pending') Audit log resource format: * runner.py emits prefixed `store_submission:{id}` (post-fix) * Detail-page timeline query handles three patterns: prefixed submission, helper-emitted `store_entity:{sub_id}`, and bare-id legacy rows — all surface in the activity timeline UX fixes: * Owner sees Under review / Quarantined / Hidden banner with status * Install button gray-disabled (not blue) when non-approved * Owner cannot delete quarantined entries (403); admin can * Admin queue: filter chips, sortable columns, paging, page-size * Auto-refresh queue every 5s while pending rows are visible * Store upload page file picker no longer opens twice (label → input default action collided with explicit JS handler) Tests: 168 passed across the guardrails suites (admin submissions, store API, inline / LLM / purge guardrails, store repositories, marketplace filter, schema version). New regression coverage includes: archive surfaces via JOIN even when API path is bypassed; deleted submission renders activity timeline (tombstone); flea detail surfaces submission_status only for owner/admin; detail page renders Entity lifecycle row; audit log resource format covers both helper and runner paths. * fix(store-guardrails): PR #233 follow-up — prompt injection, atomic PUT, BG race, schema, reaper, sort whitelist Addresses 9 of the 23 findings from the PR #233 review (spec at docs/superpowers/specs/2026-05-09-pr233-guardrails-fixes-spec.md). Merge-gate items #1-#6 plus high-value mediums #7, #9-#12, #23. Architectural items (#8 enum split, #14 factory) and pure maintainability (#15-#22) deferred to follow-ups. Security: * #1 prompt injection — SYSTEM_PROMPT now passed via the SDK's dedicated system= parameter; bundle wrapped in <bundle>...</bundle> sentinels declared data-only by the system prompt; literal sentinel strings in user content are escaped so an adversarial README can't forge a close tag. * #6 static scan honesty — module docstring + admin copy + docs declare static scan as signal not gate; .md/.txt/.rst/.html/.json/ .yaml/.yml/.toml skipped to avoid false positives on prose. AST mode for Python deferred (separate flag, FP comparison work). Correctness: * #2 PUT atomicity — bundles bake into plugin.staging-<rand>/ alongside live, atomic-rename on success; failed checks leave live tree byte-for-byte intact. * #3 BG-task race — set_visibility_if_pending guards verdict flips to the (pending, hidden) review window; admin archives during review survive; skipped flips audit-logged. * #4 v35 NOT NULL/DEFAULT — schema v35→v36 re-applies them on store_entities.visibility_status. CHECK constraint enforced application-side (DuckDB ADD CHECK on existing column unsupported). * #7 stuck-review reaper — reap_stuck_llm_reviews flips pending_llm rows older than guardrails.stuck_review_grace_seconds (default 1800) to review_error. Scheduler runs every 15 min via new /api/admin/run-reap-stuck-reviews. Set knob to 0 to disable. * #9 quota counter — count_blocked_for_submitter_since now counts blocked_inline + blocked_llm + review_error so a submitter triggering only LLM-blocked verdicts is bounded. * #10 missing risk_level — surfaces as review_error with error='missing_risk_level' instead of silently defaulting to 'medium' (which looked like a model-decided block). * #11 archived_at clear — set_visibility nulls archived_at + archived_by when transitioning out of 'archived' so a future read doesn't show stale archive forensics on an approved row. Maintainability: * #12 FSM doc comment — accurate insert/transition/lifecycle description in src/db.py near store_submissions schema. * #23 sort-key whitelist — admin queue rejects unknown sort keys with 400 invalid_sort_key; substring-replace footgun removed. Deferred (separate PRs): * #5 quota race — proper fix requires asyncio.Lock spanning the full pipeline; threading.Lock blocks event loop, DuckDB MVCC doesn't help. API-level slowapi bounds worst case for now. * #6 part 3 (AST static scan), #8 (enum split), #13 (import bundle docs), #14 (factory consolidation), #15-#22 (maint). Tests: * New: tests/test_store_guardrails_prompt_injection.py (corpus + trust-boundary invariants), tests/test_store_put_atomic.py, tests/test_store_guardrails_reaper.py. * Extended: test_store_guardrails_llm.py (system param, missing risk_level, BG race), test_admin_store_submissions.py (quota counter widening, sort whitelist 400), test_store_repositories.py (un-archive metadata clear), test_db_schema_version.py (v36). * Full suite: 3738 passed; 17 pre-existing baseline failures unchanged (db migration tests, cli binary rename, catalog export, user mgmt v5 backfill — confirmed by stash + rerun on clean tree).	2026-05-09 17:32:53 +04:00
minasarustamyan	e26236fdc1	Extract session-pipeline framework + UsageProcessor skeleton (#232 ) * Extract session pipeline framework, refactor verification, add UsageProcessor skeleton Pluggable framework under services/session_pipeline/ (contract + lib + per-processor runner) so multiple processors can read /data/user_sessions/<key>/.jsonl on their own cadence with full failure isolation. Verification flow becomes the first plugin; a no-op UsageProcessor reserves the second slot pending a separate brainstorm on extraction logic + storage shape. Schema v28→v29: rename session_extraction_state → session_processor_state with composite PK (processor_name, session_file). Existing rows copied over with processor_name='verification'; legacy table dropped. Migration is idempotent and no-ops the copy step on fresh installs that came up at the new schema. Endpoint: /api/admin/run-verification-detector replaced by parametrized /api/admin/run-session-processor?processor=<name>. Audit action format follows. Scheduler JOBS: verification-detector entry split into session-processor:verification + session-processor:usage. SCHEDULER_VERIFICATION_DETECTOR_INTERVAL retained for operator compatibility (drives both cadence and health-check grace window); SCHEDULER_USAGE_PROCESSOR_INTERVAL added. Address PR #232 review: scan dead branch + per-processor lock - `SessionProcessorStateRepository.scan_unprocessed_for` dead else: both branches surfaced every jsonl, the SELECT was unused, runner MD5-rehashed every stable session per tick. Replaced with an mtime precheck — stable sessions (mtime <= processed_at) are filtered at scan; modified files still surface for the runner's authoritative `file_hash` invalidation. Naive-local comparison matches the existing health-check idiom (DuckDB TIMESTAMP strips tz on storage). - Per-processor advisory lock around `_run_processor` in `/api/admin/run-session-processor`. Scheduler tick + manual admin POST could otherwise both run, both call create_evidence on overlapping detections, and accumulate duplicate verification_evidence rows (the dedup short-circuit only covers create+contradiction, not evidence per ADR Decision 3). Non-blocking acquire → 409 Conflict on concurrent invocation; release in finally so a runner exception doesn't wedge the processor. Tests: two new scan unit tests (mtime filter + post-mark mtime bump), 409 endpoint test, lock-released-on-exception test. Two existing tests updated for the new "filtered at scan" stat shape (previously asserted skipped == 1, now scanned == 0). * Address PR #232 review #2: parallel scheduler tick + last_run on terminal state Two pre-existing scaffold bugs in services/scheduler/__main__.py amplified by adding more session-pipeline jobs: 1. Serial for-loop over jobs with synchronous httpx.post(timeout=900) — a 10-minute verification run blocked every other job (data-refresh, health-check, usage, corporate-memory) for the whole window. The PR's stated isolation guarantee held inside the runner but broke at the scheduler dispatch layer. 2. last_run advanced only when _call_api returned True. Permanent-failure jobs hot-looped on every tick (30s) instead of cadence (15min). Fix: ThreadPoolExecutor.submit per due job + per-job in_flight set so a long-running job can't be re-launched on subsequent ticks. last_run advances unconditionally in finally; errors still surface via _call_api logging + audit_log on the receiving side. _run_job extracted to module-level for unit testing. New tests: - TestRunJobBookkeeping: advances on success / failure / unhandled raise - TestRunLoopParallelism: in_flight protection prevents duplicate launches across ticks for a single slow job --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>	2026-05-08 19:47:46 +02:00
Vojtech	2e2e1a1eca	feat(home): state-aware /home + /setup-advanced + schema v26 (#228 ) * feat(home+news): state-aware /home + /news + admin-edited news section Squash of the vr/home-page feature work for clean rebase onto main. Original 18-commit history preserved in branch backup/vr-home-page-pre-rebase. What's in this PR: State-aware /home page - New `/home` route with hero + auto-mode + connectors (Asana / GWS / Atlassian) + lookarounds. Onboarded vs not-onboarded state-machine branches a single template (`home_not_onboarded.html`); the install steps, "Setup a new Claude Code" CTA (90-day PAT mint), and per- connector setup prompts hide once `users.onboarded=TRUE`. A completion badge replaces them. - "Mark me as offboarded" button reverses the flag without an SQL UPDATE. - `users.onboarded BOOLEAN` column added; default FALSE; flipped by the CLI's `agnes init` post-success POST and the `/admin/users` API. - Connector setup prompts pre-check whether the tool is already installed/connected before re-running setup. - GWS scope set widened to include Google Chat (`chat.spaces`, `chat.messages`). Single template + design tokens - `dashboard.html` now extends `base.html` via the new `{% block layout %}` opt-out (full-width pages skip the 800px `.container`). Net: every page shares one shell. - `style-custom.css` `:root` extended with `--space-{7,9,10,12}`, `--radius-2xl`, `--shadow-{card,elevated}`, `--text-{muted,disabled}`, `--focus-ring`, `--transition-`, `--width-{narrow,app,wide}` so inline page styles can migrate incrementally. Auth redirects honor AGNES_HOME_ROUTE* - `safe_next_path` resolves the configured home route when no `default=` is passed; OAuth callbacks, magic-link clicks, password form, and LOCAL_DEV_MODE shortcuts now land on `/home` (or whatever the operator picked) instead of always /dashboard. News section + /news permalink + /admin/news editor - Schema-bumped `news_template` table (single versioned entity, draft + publish gate). `published BOOLEAN` distinguishes draft from public; monotonically-increasing `version` per save; rows >30d pruned on save except the currently-displayed published version. - `/home` bottom-of-page renders the latest published intro with a "Read more →" link to `/news` (which renders the full body). - `/admin/news` editor with sandboxed live preview, versions table, per-row Unpublish, Format-help cheatsheet. - `agnes admin news show / draft / edit / publish / unpublish / versions / export` (CLI). Talks to the live server via the `/api/admin/news/` endpoints (PAT-authed) — no direct DB access so it coexists with a running uvicorn. - Optimistic-lock guard: `agnes admin news publish --version N` and PUT/PATCH endpoints accept `expected_version` and 409 with structured `{error: "version_conflict", expected, actual, actual_by}` when a concurrent admin replaced the draft. Edit refuses to overwrite a draft authored by someone else without `--force` or `--expect-version`. - nh3 (Rust-backed ammonia) HTML sanitizer; iframe pre-pass strips any iframe whose src is not on the YouTube/Vimeo/Loom allowlist; javascript:/data: schemes blocked everywhere. - Author CSS vocabulary: `.news-hero` (blue gradient hero block), `.callout`/`.callout-{info,warn,success,danger}`, `.video-embed`, `.news-section`, `.news-grid-{2,3}`, `.news-cta` — all consolidated in `style-custom.css` under "News content vocabulary (shared)" so /home perex, /news body, and /admin/news preview share one source of styling. - Code-inside-`<pre>` contrast fix (was unreadable amber-on-silver). - `.news-content` table styling (border, header band, row-hover). `scripts/dev/run-local.sh`* — local uvicorn launcher. Pulls Google OAuth client id/secret from GCP Secret Manager (`AGNES_OAUTH_GCP_PROJECT`-driven, no vendor defaults), points `AGNES_CLI_DIST_DIR` at `./dist` so the wheel endpoint resolves, and `--dev` flips `LOCAL_DEV_MODE=1` + `AGNES_HOME_ROUTE=/home` for one- command iteration. `LOCAL_DEV_MODE=1` also enables the FastAPI debug toolbar. CLAUDE.md "Run tests before every push" section codifies `pytest tests/ -n auto -q` as non-negotiable before each push. Tests: 51 + 14 + 8 = 73 new tests across news-template repo, sanitizer, API, web, CLI; plus updated home/auth/template tests for the new shared-shell architecture. Origin docs (gitignored, customer-fork content): docs/brainstorms/home-page-requirements.md, docs/plans/2026-05-07-001-feat-home-page-plan.md. * feat(cli): agnes onboarded {on,off,status} — self-scoped flag toggle User-facing equivalent of the in-page "Mark me as (off)boarded" button on /home. POSTs /api/me/onboarded with {onboarded, source}; --source overrides the audit-log marker so flips made from the CLI vs the web button vs agnes init automation stay distinguishable. `status` reads via /api/me/profile (when present); falls back to a quick body-marker scan of /home so the read path doesn't write an audit_log row. PAT-authed via cli.client.api_post — same convention as agnes admin news / agnes admin add-user etc. Tests: 5 covering on/off/status round-trip, idempotency, and audit-log source recording. Full suite holds at 12 pre-existing failures (same set as before). * ui(nav+home): primary nav reorg + green What's new band + /marketplace link fix Primary nav (post-rebase audit + per-user feedback): - Items: Home → Marketplace → Data Packages → Memory. Admin dropdown for admins only. The "Dashboard" label was renamed Home — point still resolves through `home_route` so customer instances on /dashboard still land there. - Activity Center moved into the Admin dropdown. Per-team adoption analytics is admin-consumed in practice; the route still allows any authed user for direct deep-links so existing /home tile + bookmarks keep working. - Memory link added (→ /corporate-memory) — was previously buried in the /home "Look around" tiles. - Setup local agent + My Stack dropped from main nav. Setup is the /home install flow's home now; My Stack lives as a tab inside /marketplace. /home tweaks: - Plugin marketplace tile now points at /marketplace (was /store — legacy from before the marketplace rebrand landed in #230). - "What's new" section header gets a green band (success-flavored D1FAE5 background, A7F3D0 border, darker green title) so the bottom-of-page news block visibly distinguishes from the blue install-hero at the top. Header strip only — body stays white. Test fix: test_home_route_resolution renamed `dashboard_link_uses_home_route` → `home_link_uses_home_route` and asserts `href="/home">Home` instead of `href="/home">Dashboard` after the label change. * fix(home): decouple Step 3 + Connect-tools collapse from server onboarded flag The server-side `users.onboarded` flip happens through two paths: 1. Explicit user click on "Mark me as onboarded" or `agnes onboarded on`. 2. Implicit `agnes init` POST → /api/me/onboarded on success. Path 2 produced a UX surprise: an analyst running `agnes init` mid-flow reloaded /home and saw Step 3 (auto-mode) + Connect-your-tools auto- collapse to summary bars. They were actively working through those sections — the install POST never signalled "I'm done with the rest of setup", just "Agnes itself is installed". Decouple the section-collapse decision from the server flag: - Step 1 + Step 2 install blocks: still hidden on `onboarded=TRUE` (their completion is a hard server signal — Agnes IS installed). - Step 3 + Connect-your-tools: render flat by default in BOTH states. Wrapped in `<details class="setup-collapsible" open>` so the browser's native disclosure handles per-section toggle without JS, but the `<summary>` is CSS-hidden until the page-level `data-setup-minimized="1"` attribute is set on `.home-mock`. - New "Minimize setup view" toggle inside the blue install-hero, rendered only when onboarded. Click flips the data-attr on `.home-mock` AND removes the `open` attribute from each `<details>`. State persists in `localStorage["agnes_home_setup_minimized"]` so the choice survives reloads but is per-device. - "Show full setup view" (the same button when minimized) re-opens both `<details>` and clears localStorage. When minimized, each `<details>` still has its own native expand/ collapse — click the gray summary bar to peek at one section without toggling the page-level minimize off. Tests: - test_step3_and_connectors_render_flat_when_onboarded_by_default — asserts `<details class="setup-collapsible" ... open>` for both sections post-onboarding and the absence of any server-rendered `data-setup-minimized` attribute on the `.home-mock` root. - test_minimize_toggle_visible_only_when_onboarded — toggle button rendered only when onboarded. Full pytest holds at 12 pre-existing failures (same set).	2026-05-08 18:28:47 +02:00
Vojtech	107195730d	feat(observability): optional PostHog integration (#231 ) * feat(observability): optional PostHog integration (errors, LLM traces, replay, flags) Off by default. Activates when POSTHOG_API_KEY is set in env. Defaults to PostHog Cloud EU; override host for US Cloud or self-hosted. Coverage: - FastAPI 500 handler captures unhandled exceptions - src/orchestrator.py rebuild + rebuild_source failures - services/scheduler/ HTTP-job failures - cli/main.py uncaught CLI errors (Typer.Exit/SystemExit/KeyboardInterrupt skipped; flushes before re-raise so short-lived CLI invocations don't drop events) - connectors/llm/anthropic_provider.py + openai_compat.py emit $ai_generation events with provider, model, latency, token counts (prompt/completion bodies stay off unless POSTHOG_LLM_PAYLOADS=1 because LLM prompts here routinely include customer SQL/data) - Browser snippet injected into every text/html response by PosthogInjectionMiddleware — registered inside the GZip layer so it sees uncompressed HTML before compression. Many templates are standalone (their own DOCTYPE) and never extend base.html, so a per-template include would miss them. - Frontend: $pageview, $pageleave, JS error capture via window.error and unhandledrejection handlers, masked session replay (maskAllInputs: true plus CSS-selector mask for known data surfaces), feature flags (browser posthog.isFeatureEnabled + server-side feature_enabled with fallback for older SDKs). Identification mode operator-configurable: none / id / email / full. Default email ships user.id + email but never name. CLI entry point moves from cli.main:app to cli.main:main (Typer wrapper). Files: - src/observability/posthog_client.py — lazy singleton, no network when disabled, single-process flush on shutdown - src/observability/llm_tracing.py — trace_generation context manager - app/middleware/posthog_inject.py — HTML rewrite middleware - app/web/templates/_posthog.html — browser snippet template - docs/observability.md — operator guide - config/.env.template — documented POSTHOG_* knobs - tests/test_posthog_disabled.py + tests/test_posthog_client.py + tests/test_llm_tracing.py — 18 tests covering disabled state, identify-mode payloads, $ai_generation shape, error variant. CHANGELOG entry under [Unreleased] Added. * feat(observability): tag every PostHog event with environment + release Splits PostHog dashboards cleanly between localhost / dev / staging / production without manual tagging on every capture call. - POSTHOG_ENVIRONMENT explicit override; auto-resolves to "local" when LOCAL_DEV_MODE=1, else RELEASE_CHANNEL, else AGNES_DEPLOYMENT_ENV, else "unknown". - AGNES_VERSION → RELEASE_CHANNEL fallback feeds the `release` property for "is this error new in this release?" cohorting. - Backend gets both via the PostHog SDK's super_properties constructor arg (every captured event picks them up automatically). - Browser snippet calls posthog.register({environment, release}) inside the loaded callback so $pageview, $exception, autocapture, etc. all carry the same labels. - request.state.user now populated by auth dependencies so the snippet can actually call posthog.identify(user_id, {email}) for logged-in users (previously the user block always resolved to None because nothing wrote to request.state.user). 4 new tests cover env resolution: explicit > LOCAL_DEV_MODE > channel > unknown, plus super-properties forwarding into the SDK constructor. * feat(observability): inline user attrs on every PostHog event + debug throw route PostHog's UI shows person properties on the Person profile page, not inline on each event — so a reviewer triaging an exception couldn't tell which user hit the bug without clicking through. Fix it on both sides. - Backend capture_exception merges user_id / user_email / user_name into the event properties (gated by POSTHOG_IDENTIFY_PII: none/id/email/full). Backed by a new _user_props_for_event helper on PosthogClient. - Browser snippet registers user_id + user_email + user_name as super- properties via posthog.register({...}) so every $exception, $pageview, and custom event coming from posthog.captureException() carries them inline. Mirrors the backend so cross-referencing client/server events doesn't require a person-profile lookup. - /api/debug/throw — debug-only endpoint gated by DEBUG=1 (404 in prod). Runs Depends(get_current_user) first so request.state.user is set when the unhandled-exception handler captures the event. Lets operators exercise the full observability path end-to-end without hand-rolling a TestClient script. Configurable via ?kind=ValueError&msg=... 7 new tests cover: backend user-attr merge across identify modes, anonymous request fall-through, browser snippet super-prop emission for logged-in / anonymous / id-only / full-name cases. * fix(observability): address minasarustamyan PR #231 review Two bugs caught in review. 1. PosthogInjectionMiddleware dropped Response.background on every return path. BaseHTTPMiddleware materialises the body and asks subclasses to return a fresh Response — three paths in dispatch() omitted background=, silently cancelling any BackgroundTask / BackgroundTasks the route attached (audit logging, async webhooks, email sends) with no log line. Fix: route every return through a _passthrough() helper that forwards background. Also adds a _MAX_BUFFER_BYTES (4 MB) cap so a streamed-HTML response can't balloon RSS during buffering. Bigger bodies short-circuit through with a warning rather than being injected. Regression tests in tests/test_posthog_inject_middleware.py exercise four return paths (snippet present, render-fail, double-injection guard, non-HTML passthrough) plus the streaming-guard short-circuit. 2. $ai_input / $ai_output_choices were emitted without truncation, so POSTHOG_LLM_PAYLOADS=1 silently dropped events past PostHog's ~32 KB per-event ingest limit — exactly the calls (large prompts with schemas / sample rows / SQL) an operator would want to inspect. Fix: clip both at POSTHOG_LLM_PAYLOAD_MAX_CHARS (default 30000) with an explicit "…[truncated N chars]" marker so readers don't mistake truncated captures for complete ones. Metadata (provider, model, tokens, latency, error) flows regardless. Three new tests cover default-cap clipping, env-override, and pass-through under the cap. 37 PostHog tests pass.	2026-05-08 17:57:10 +04:00
minasarustamyan	4fb2818a19	Add /marketplace browse page + Model B opt-in stack composition (#230 ) * Add /marketplace browse page + Model B opt-in stack composition New /marketplace browse surface unifies the curated marketplaces (admin-managed git mirrors) and the community Flea Market behind three tabs — Curated / Flea / My Stack — with per-tab category filter, search across both sources with scope checkboxes, and numeric pagination, all driven by URL query state. Plugin detail at /marketplace/curated/<slug>/<plugin> and /marketplace/flea/<id>; nested skill / agent detail at /marketplace/curated/<slug>/<plugin>/ {skill,agent}/<name> and the flea-side single-page detail. Model B opt-in: an RBAC grant on a curated plugin is now only eligibility. The user must click "Add to my stack" for it to enter their served Claude Code marketplace. Composition flips from (rbac ∖ opt_outs) ∪ store_installs to (rbac ∩ subscriptions) ∪ store_installs. The legacy user_plugin_optouts table is renamed user_curated_subscriptions (schema v27) — same table shape, inverted semantic, repository methods become subscribe / unsubscribe / is_subscribed. UX vocabulary: Install → Add to my stack, Installed → In your stack, card "Installed" badge → "In stack" (amber pill), tab "My Subscriptions" → "My Stack". Bridges the two-step model (server-side bookmark vs. on-laptop install) the previous label hid. Click triggers an inline post-add hint panel under the description with the agnes refresh-marketplace recipe + Copy chip, dismissible per-browser via localStorage. Per-tab info blocks above the filter row: - Curated: trust signal — "Each plugin here has a named curator accountable for it." (blue accent + See-all-curators link) - Flea: open-shelf signal — "Anyone in the company can upload here." (purple accent + Tips-for-sharing link) - My Stack: personal-shelf orientation — "Your AI stack — everything you've added." (slate accent, no link) Tabs carry per-tab Heroicons (shield-check / building-storefront / rectangle-stack) tinted to match each tab's accent; flips white when the tab is active for contrast. Hero illustration anchored to the right of the blue hero panel (absolute, 47% wide, behind the search row content). Hidden under 900px viewport. Action-row CTAs realigned to publication intent: curated "How to add new content" → "Submit a plugin" (links to the guide page); flea button removed since +Upload sits next to it. Empty-state CTAs match. /marketplace/guide/{curated,flea} routes now host publication-flow guide pages with placeholder ledes — full copy to be authored separately. Categories: Heroicons-based icons mapped per category in src/category_icons.py (zero new dependencies; SVG path strings inlined). Marketplace cards, filter pills, and detail pages read from the same source. API endpoints under /api/marketplace: - GET /items per-tab listing (curated / flea / my) - GET /categories per-tab non-zero counts - GET /curated/{slug}/{plugin} plugin detail - POST/DELETE /curated/{slug}/{plugin}/install subscribe toggle - GET /curated/{slug}/{plugin}/{skill,agent}/{name} inner item The tab=my branch reads directly from user_curated_subscriptions ∪ user_store_installs (not resolve_user_marketplace, which bundles flea skills/agents into a single store-bundle synthetic entry useful for serving the Claude Code marketplace ZIP/git but wrong for browsing where each item should appear as its own card). Detail pages: plugin detail surfaces inner skills/agents as clickable nested cards; commands/hooks/MCPs render as plain name lists. Skill/agent detail mirrors the plugin layout with kind-tinted accents (skill = green, agent = purple), Description + Details sidebar, Files + Docs sections, and the "How to call it" copy-able invocation chip showing /<plugin>:<inner-name> exactly as Claude Code namespaces it post-install. Curated nested has no install button — links back to the parent plugin. Navbar: standalone "My AI Stack" relabelled "My Stack" and points at /marketplace?tab=my; "Store" link removed (Store flow is reachable via the Flea Market tab's +Upload button). The standalone /my-ai-stack and /store routes still work for old bookmarks. Tests cover the new browse / categories / install / RBAC paths under tests/test_marketplace_api.py; existing marketplace and store tests updated for Model B (explicit subscribe in fixtures). Schema bumped v26 → v27 with idempotent migration that wipes existing user_plugin_optouts rows on flip and adds marketplace_plugins.created_at with registered_at backfill. * Fix v28 migration + post-rebase test fallout v28 ALTER TABLE marketplace_plugins ADD COLUMN created_at conflicted with _SYSTEM_SCHEMA's earlier CREATE that already includes the column on fresh installs (test fixtures starting at any pre-v28 version trip on it). Switch to ADD COLUMN IF NOT EXISTS — same idiom as the upstream v27 Keboola sync-strategy migration on the same ladder. Two test patches needed after the rebase bumped SCHEMA_VERSION 27 → 28: - test_keboola_v27_migration.py: test_schema_version_constant_is_27 was pinning ==27. Loosened to >=27 (the test's purpose is to verify the v27 Keboola migration, not to pin the current SCHEMA_VERSION). - test_setup_page_unified.py: was monkeypatching resolve_allowed_plugins but compute_default_agent_prompt now reads from resolve_user_marketplace (Model B-aware). Stub the right function so the test exercises the v28 served-set path. * Harden curated skill/agent inner endpoints against path traversal `_read_inner`, the `skill_dir` walk in `curated_skill_detail`, and the `agent_path.stat` in `curated_agent_detail` joined URL path-params onto `plugin_root` without verifying the resolved candidate stayed inside it. Starlette's `[^/]+` on `{skill_name}` / `{agent_name}` blocks the direct URL exploit (encoded `/` 404s before the handler), but a curator-planted symlink inside a curated marketplace's git mirror could still dereference outside the plugin tree on read. Adds `_safe_join(plugin_root, *parts)` doing `Path.resolve(strict=True)` + `relative_to(plugin_root.resolve())`, used by all three call sites so the boundary is enforced once and consistently. Tests cover the helper directly (normal path resolves, escaping `..` returns None, escaping symlink returns None, missing file returns None) plus an end-to-end check that the symlink case actually 404s on the HTTP endpoint. Symlink tests skip on Windows where symlink creation needs elevated permissions; they run on Linux CI. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>	2026-05-08 14:22:19 +02:00
ZdenekSrotyr	cc1886c97c	release: 0.47.4 — Docker collector skip + FIFO session-pipeline check (#229 ) ## Summary Two minimum-viable fixes after today's 0.44.0 → 0.47.3 release train and the production 30-user launch. Devil's advocate review of a 3-PR / 7-item plan cut scope to these 2 — the rest is deferred to a separate "operate-first, instrument-second" backlog item. ### B2 — Docker session_collector log skip `services/session_collector` was logging `Collection complete: 0 users, 0 files copied` + `WARNING: Group 'data-ops' not found, using default group` every 10 minutes in the Docker layout (where `/home//user/sessions/` doesn't exist). New env var `AGNES_SKIP_LEGACY_COLLECTOR=1` set by default in `docker-compose.yml` short-circuits the collector pass. The bare-VM deployment path (where /home/ IS populated by Claude Code) leaves the env var unset and continues to scan normally — including the data-ops warning, which is load-bearing for catching missing-group mis-deploys. ### O2 — FIFO check in `_check_session_pipeline` The existing check compares `MAX(processed_at)` to newest jsonl mtime — catches "detector hasn't run lately" but blind to "old file was skipped while newer ones were processed". New code finds the oldest FS jsonl that's NOT in `session_extraction_state.session_file` and flags if its mtime is older than `SESSION_PIPELINE_STUCK_FILE_GRACE_SECONDS` (default 4× the existing grace = 2h). Severity intentionally starts at `info` so we can collect prod data on false-positive rate before tightening to `warning`. The aggregator already treats `info` as non-promoting (see the severity vocabulary docstring at the top of `app/api/health.py`), so the headline `status` stays at `healthy` even when this fires — the operator sees the entry in the per-check breakdown but no spurious `degraded` overall. ## Test plan - [x] `pytest tests/test_session_collector.py` — 17 tests pass (existing 9 + new 8 covering env-set/unset, truthy variants, falsy non-skip). - [x] `pytest tests/test_health_session_pipeline.py` — 8 tests pass (existing 4 + new 4 FIFO tests covering stuck-file, under-threshold, all-processed, env-override). <!-- devin-review-badge-begin --> --- <a href="https://app.devin.ai/review/keboola/agnes-the-ai-analyst/pull/229" target="_blank"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open in Devin Review"> </picture> </a> <!-- devin-review-badge-end -->	2026-05-08 09:38:21 +02:00
ZdenekSrotyr	917f9aaef0	release: 0.47.2 — restore #218 + #219 fixes silently reverted by #217 (#225 ) ## Summary Smoke-testing the just-shipped 0.47.1 against production exposed two regressions: 1. `agnes query --remote "SELECT FROM unit_economics WHERE bad_col=1"` returned `Table "unit_economics" must be qualified` (the OLD error) instead of `Unrecognized name: bad_col` (the #218 fix's intended behavior). 2. `agnes query "DESCRIBE unit_economics"` showed only DuckDB's misleading `Did you mean order_economics?` with no Agnes hint paragraph (the #219 fix is missing). Root cause: PR #217's squash merge (`506a378c`) carried stale snapshots of `app/api/query.py` and `cli/commands/query.py` from before #218 and #219 merged. The rebase-and-merge auto-merged those files cleanly (no conflict markers) but the result silently reverted both fixes. Restore the two changes verbatim. Tests for both fixes already on main and continue to pass against the restored code. ## Test plan - [x] `pytest tests/test_api_query_guardrail.py tests/test_cli_query.py` — clean - [x] Manual repro against prod after deploy: both flows now surface the intended diagnostic. <!-- devin-review-badge-begin --> --- <a href="https://app.devin.ai/review/keboola/agnes-the-ai-analyst/pull/225" target="_blank"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open in Devin Review"> </picture> </a> <!-- devin-review-badge-end -->	2026-05-07 19:57:18 +02:00
ZdenekSrotyr	506a378c3a	release: 0.47.1 — Keboola connector v27 (incremental, partitioned, where_filters, typed parquet) (#217 ) ## Summary Brings the Keboola connector to feature parity with the legacy internal data-analyst's per-table sync strategies. Closes the four documented gaps from the spec branch (`zs/keboola-connector-specs`): - Typed parquet in the legacy SDK extraction path — column types from Keboola Storage metadata (provider cascade `user > ai-metadata-enrichment > keboola.snowflake-transformation`) survive the CSV → parquet roundtrip; invalid date strings (`'0000-00-00'`) and invalid numeric strings (`'Non-Manager'`) become NULL while keeping the column's typed schema. Pre-fix everything was VARCHAR. - Incremental sync via Storage API `changedSince` — opt-in per table; pulls only delta rows, merges into the existing parquet by `primary_key` (drop_duplicates with keep='last'). Cuts daily extraction from O(full table) to O(delta). - Partitioned sync — flat per-partition layout `data/<table>/<key>.parquet` (e.g. `2026_05.parquet`), per-affected-partition merge for daily updates, chunked initial load with 1-day overlap and 2-empty-chunk stop heuristic. - `where_filters` — server-side row filter with date placeholders (`{{today}}`, `{{last_3_months}}`, `{{start_of_3_months_ago}}`, etc.) resolved at sync time. Force the SDK path; reject `incremental + where_filters` combination at API layer (changedSince already filters temporally). ## Architecture - Schema migration v25 → v26: 7 new columns on `table_registry`. Existing `sync_strategy` column reused (pre-v26 it was inert catalog metadata; post-v26 the extractor dispatches off it). - Per-table dispatcher in `extractor.run()` routes to one of `_extract_via_extension` (full_refresh + extension), `_extract_via_legacy` (full_refresh + filters or extension fallback), `extract_incremental`, or `extract_partitioned`. - API conflict policy: `incremental + where_filters` → 422; `partitioned + query_mode='remote'` → 422; `partitioned ⇒ partition_by required`. - Admin UI: third "Direct extract (Storage API)" radio in the Keboola Register / Edit modals, alongside existing "Whole table (extension)" and "Custom SQL". When selected, exposes a v26 sync-strategy panel with conditional fields per strategy. ## Test plan - [x] Unit + module — 134 v26 tests covering migration, repo, parquet_io, where_filters, incremental (compute_changed_since + merge_parquet + extract_incremental E2E), partitioned (key derivation + merge_partition + chunked windows + extract_partitioned E2E), extractor dispatcher, admin API validators, PUT field clearing, registry-shape → dispatcher bridge - [x] HTML form structure — all v26 inputs + visibility classes + JS payload fields verified in rendered template - [x] Real Keboola roundtrip — registered a small test table as `sync_strategy='incremental'` against a test Storage project, triggered two syncs: - Sync 1: `changedSince=None` → full pull → 9 rows typed parquet - Sync 2: `changedSince=last_sync - 1d window` → 9 delta rows merged with 9 existing → 9 after dedup on primary_key (PK merge confirmed) - [x] Browser UX — agent-browser session against a local uvicorn: login → admin/tables → register modal → switch radios → verify field visibility per strategy → submit → edit existing row → switch to Direct/Incremental → save → confirm DB persistence - [x] Regression — no regressions in the broader 3252-test suite (3 pre-v26 tests updated for the deprecation-marker removal + schema-version bump; 2 pre-existing environment-sensitive test failures unrelated to this change) ## Bugs caught + fixed during E2E The browser + real-Keboola roundtrip exposed four bugs the unit tests missed: 1. JS visibility race — two competing `forEach` loops set `display=''` then `display='none'` on form elements sharing `kb-strategy-incremental kb-strategy-partitioned` classes (window_days + max_history_days are reused across strategies). Fix: single-pass selector with class-based visibility resolver. 2. PUT cannot clear field — pre-v26 `updates = {k: v ... if v is not None}` collapsed "omitted from body" and "sent as null" into the same case, so admin couldn't switch a partitioned row back to full_refresh and have stale `partition_by` clear. Fix: `model_dump(exclude_unset=True)`. 3. Subprocess DB lock conflict — `_read_last_sync` reopened `system.duckdb` while the parent server held the write lock (subprocess contract at `app/api/sync.py:_run_sync` line 260). Fix: parent injects `__last_sync__` into table_config before subprocess spawn. 4. Wrong KBC table_id — `extract_incremental` / `extract_partitioned` built the Storage API table_id from the registry row's slugified `id` (`circle_inc`) instead of `bucket.source_table` (`in.c-finance.circle`), producing 404s. Fix: prefer `bucket+source_table`; fall back to `id` only when bucket empty. ## Operator notes - Existing tables stay on `full_refresh` after migration; admins opt individual tables in via `agnes admin register-table --sync-strategy ...`, the Keboola Edit modal, or `POST/PUT /api/admin/registry`. - `merge_parquet` and `merge_partition` use `pd.concat + drop_duplicates`, loading both existing and delta into pandas RAM. For tables in the multi-million-row range this may OOM — switch to `partitioned` strategy for those (per-partition merge keeps memory bounded). Documented in `### Internal` of the changelog entry. - Date placeholders are resolved at sync time, not register time — a typo'd `{{lasst_week}}` is accepted at register and surfaces only when the next sync runs. By design (rolling windows need late-binding). ## Spec source The four corresponding plans on the `zs/keboola-connector-specs` branch under `docs/superpowers/plans/2026-05-07-0[1-4]-*.md` capture the design rationale and link back to internal repo references for each subsystem. <!-- devin-review-badge-begin --> --- <a href="https://app.devin.ai/review/keboola/agnes-the-ai-analyst/pull/217" target="_blank"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open in Devin Review"> </picture> </a> <!-- devin-review-badge-end -->	2026-05-07 19:01:27 +02:00
ZdenekSrotyr	aa5921da67	release: 0.47.0 — source-agnostic catalog metadata + cache discipline (#223 ) ## Summary - Catalog enrichment for `query_mode='remote'` rows: `rows`, `size_bytes`, `partition_by`, `clustered_by` per table (BQ + Keboola providers). - `/api/v2/schema/{id}` cache miss: 2 BQ jobs → 1 (-50%) via shared `fetch_bq_columns_full`. - All four catalog/schema/sample/metadata caches flush on registry change; single-row re-warm scheduled. - Automatic cache warmup at server startup (bounded concurrency, opt-out via `AGNES_SKIP_CACHE_WARMUP=1`). - SSE-driven freshness toolbar on `/admin/tables` with progress bar, log, and per-row badge. - New admin doc `docs/admin/query-modes.md` — single source of truth on `local` / `remote` / `materialized` choice. Closes #155. Closes #156. ## Test plan - [x] 65+ targeted tests pass across 11 new test modules + 3 modified ones. - [x] No DB migration; no wire-break; `MIN_COMPAT_CLI_VERSION` unchanged. - [ ] Reviewer: register a remote BQ table via `/admin/tables`, observe the toolbar populates within ~2 s and the per-row badge transitions warming → fresh. - [ ] Reviewer: trigger `Re-warm all`, verify SSE log scrolls and `cacheWarmupBar` progresses. - [ ] Reviewer: edit a registered row's bucket, verify `agnes schema <id>` returns updated columns immediately (no 1-hour staleness). - [ ] Reviewer: confirm `agnes admin register-table --query-mode remote` prints the new IAM-smoke-check hint. ## Notable design decisions - BigQuery `INFORMATION_SCHEMA.TABLE_STORAGE` is the only valid scope for size+rows (verified live 2026-05-07; dataset-scoped doesn't exist). Region resolved from `instance.yaml.data_source.bigquery.location` → `bq.client().get_dataset(...)` → fall back to legacy `__TABLES__`. - VIEW handling: TABLE_STORAGE returns no rows for views, fall through to `__TABLES__` (also empty) → `TableMetadata(rows=None, size_bytes=None, partition_by=..., clustered_by=...)`. Null size signals analyst Claude to apply existing CLAUDE.md guidance. - `size_bytes` is `active_logical_bytes + long_term_logical_bytes` — full BQ scan reads both; reporting only active undercounts aged partitioned tables. - Source-agnostic provider seam: per-source `connectors/<source>/metadata.py:fetch(MetadataRequest)`; dispatcher in `app/api/v2_catalog.py:_metadata_provider_for` lazily imports per source_type so a Keboola-only deployment doesn't pay the BQ-extension import cost. - Warmup non-blocking: FastAPI `lifespan` schedules `asyncio.create_task(_warm_catalog_caches_bg)` before `yield`. Per-row failures isolated. ## Out of scope - Profile / column histograms / dimension cardinality for remote tables (separate issue). - Onboarding nudge ("you have 0 remote tables, consider registering some BQ ones") — separate UX call. - Provider plug-in registration via entry-points (the dispatch table is a hardcoded if-tree today; one line per future source). ## Release Bumps `pyproject.toml` 0.46.1 → 0.47.0 (main shipped 0.46.0 + 0.46.1 during this PR — see commit `d98976ec`). New CHANGELOG section under `## [0.47.0] — 2026-05-07`. 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- devin-review-badge-begin --> --- <a href="https://app.devin.ai/review/keboola/agnes-the-ai-analyst/pull/223" target="_blank"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open in Devin Review"> </picture> </a> <!-- devin-review-badge-end -->	2026-05-07 18:33:55 +02:00
ZdenekSrotyr	751cc25327	release: 0.46.5 — agnes describe -n parses, server sanitizes NaN (#224 ) ## Summary Two bugs in `agnes describe` surfaced from a real analyst session following the CLAUDE.md agent-rails discovery workflow. Together they break `agnes describe` end-to-end for any analyst (or analyst-AI) who follows the documented form. ### A) CLI parsing `agnes describe TABLE -n 5` failed with `Missing argument 'TABLE_ID'`. Root cause: the command was registered as a `Typer.Typer` subcommand group via `app.add_typer(describe_app, name="describe")` + `@describe_app.callback(invoke_without_command=True)`, and that pattern mis-parses positional + short-int option in some orderings. Same pattern in `cli/commands/schema.py` works only because schema has no INTEGER short option. Fix: switch to flat `@app.command("describe")`. ### B) Server NaN `/api/v2/sample/<id>` (called by `agnes describe`) returned HTTP 500 with `ValueError: Out of range float values are not JSON compliant: nan` whenever a row contained NaN. Fix: sanitize NaN/±inf to None before JSON serialization. ## Test plan - [x] `pytest tests/test_cli_describe.py` — added regression tests pinning `-n` parsing on either side of the positional. - [x] `pytest tests/test_api_v2_sample.py` — added regression test for NaN row → JSON `null` (not 500). <!-- devin-review-badge-begin --> --- <a href="https://app.devin.ai/review/keboola/agnes-the-ai-analyst/pull/224" target="_blank"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open in Devin Review"> </picture> </a> <!-- devin-review-badge-end -->	2026-05-07 18:16:21 +02:00
ZdenekSrotyr	7fc5365891	release: 0.46.3 — self-heal session pipeline + clearer diagnose (#220 ) ## Summary Verified against production: `claude -p` headless mode doesn't fire SessionEnd hooks (proven via `--output-format stream-json --include-hook-events`: zero `SessionEnd` events), so any session JSONLs from `-p` invocations stay orphaned locally and never reach the server. Fix: add `agnes push --quiet` as a third SessionStart entry — symmetric self-heal alongside the existing `agnes pull` entry. Existing workspaces pick this up on their next `agnes init` via the marker-based migration already in `cli/lib/hooks.py`. Separately: a colleague's fresh install showed `agnes diagnose` warning "uploads are not being processed", which led them to suspect their `agnes push` was broken. The warning is actually about the LLM-based `verification-detector` backlog (uploads themselves were arriving fine — confirmed by 23+3 JSONLs landed on the server while the warning was firing). Reword the warning to "verification-detector backlog" + add `last_processed` to the diagnose dict so operators don't have to grep logs to confirm. ## Test plan - [x] `pytest tests/test_lib_hooks.py` — updated count + added `agnes push in SessionStart` assertion. - [x] `pytest tests/test_setup_hooks_template.py` — updated. - [x] `pytest tests/test_clean_install_integration.py` — updated. - [x] `pytest tests/test_health_session_pipeline.py` — updated warning text + asserted `last_processed` field. <!-- devin-review-badge-begin --> --- <a href="https://app.devin.ai/review/keboola/agnes-the-ai-analyst/pull/220" target="_blank"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open in Devin Review"> </picture> </a> <!-- devin-review-badge-end -->	2026-05-07 17:41:22 +02:00
ZdenekSrotyr	378ee40459	release: 0.46.1 — surface real BQ error from remote_estimate_failed retry (#218 ) ## Summary When `agnes query --remote` references a column that doesn't exist on the FROM table, users were seeing `Table "<id>" must be qualified with a dataset` instead of the actually-useful `Unrecognized name: <column>` from BigQuery. Surface the first-attempt diagnostic now; keep the second-attempt context as `underlying_original`. Reproduced against production: ``` $ agnes query --remote "SELECT COUNT(*) FROM unit_economics WHERE authorize_date = DATE '2025-05-06'" Error: remote_estimate_failed (HTTP 400) message: Could not estimate scan size for this query. underlying: 400 ... Table "unit_economics" must be qualified with a dataset. ``` (`unit_economics` has `authorize_timestamp`, not `authorize_date`.) ## Test plan - [x] New `test_remote_estimate_failed_surfaces_first_error_when_attempts_differ` asserts the first-attempt message wins, second-attempt is preserved as `underlying_original`, hint points to `agnes schema`. - [x] Existing `test_guardrail_returns_400_remote_estimate_failed_on_double_parse_error` still passes (both attempts mocked to identical error). - [x] `pytest tests/test_api_query_guardrail.py` clean. <!-- devin-review-badge-begin --> --- <a href="https://app.devin.ai/review/keboola/agnes-the-ai-analyst/pull/218" target="_blank"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open in Devin Review"> </picture> </a> <!-- devin-review-badge-end -->	2026-05-07 16:54:45 +02:00

1 2 3 4 5

202 commits