fix(store): promote-on-approve looks up version_no by submission_id (live agnes-development bug) (#330)

* fix(store): promote-on-approve looks up version_no by submission_id

Live bug observed on agnes-development: an entity had 5+
version_history rows sharing the same `hash` (user re-uploaded
byte-identical bundles as v2/v4/v6 of the same skill — the LLM and
inline checks happily approved each one). The runner's
promote-on-approve path looked up the submission's version_no by
hash:

    for entry in entity.version_history:
        if entry["hash"] == sub_hash:
            target = int(entry["n"]); break

The loop matched the FIRST hash collision — always v1, n=1. With
current=1, the forward-only `target > current` guard then skipped
the promote, leaving the entity stuck at v1 even though the new
submission's status flipped to `approved`. UI kept showing v1 as
"current".

Fix: look up by submission_id via the existing
`_version_no_for_submission` helper (already used by retry / rescan
/ download paths). Same lookup applied in
`admin_override_store_submission` which had the identical hash-match
loop.

Test: TestPromoteLookupByByteIdenticalBundles uploads v1 + a
byte-identical v2, drives the LLM with mock-approve, asserts
entity.version_no advances to 2.

* fix: bundle #329 reviewer-Important follow-ups + post-merge polish

Bundled with Vojtech's commit ahead of this (the promote-on-approve
`version_no` lookup-by-submission_id fix) since #330 is the next
release-cut PR and the four #329 follow-ups would otherwise need a
standalone release-cut PR — prohibited by docs/RELEASING.md §
"Release-cut belongs to the PR".

Fixed:
- src/usage_ask.py — SCHEMA_DIGEST + SYSTEM_PROMPT referenced the
  dropped `usage_plugin_daily` table. The admin
  `POST /api/admin/telemetry/ask` endpoint ships SYSTEM_PROMPT to
  the LLM, so any model-emitted SQL against `usage_plugin_daily`
  would fail with a DuckDB binder error post-#329 merge. Updated to
  describe the new v48 rollups (`usage_marketplace_item_daily` /
  `_window`) and rule 5 of the prompt to point at them.

Internal:
- CHANGELOG.md [0.54.20] section restored to its canonical content
  from the v0.54.20 git tag. The #329 self-merge carried 226 lines
  of author's pre-rebase bullets that ended up mis-attributed; the
  published v0.54.20 GitHub Release (FTS BM25 + batch bar) now
  matches the CHANGELOG section verbatim. Also fills in [Unreleased]
  with this PR's bullets (Fixed + Internal).
- tests/conftest.py — dropped the unused
  `conn_with_usage_schema_and_attribution` fixture that INSERTed
  into the now-removed `usage_attribution_*` tables. Zero callers
  today, but a tripwire — the first future test to request it would
  have failed with a binder error.
- app/web/templates/marketplace.html — replaced a customer-specific
  token (`groupon-marketplace`) in the Most Popular sort-tiebreaker
  comment with a generic `<customer>-marketplace` placeholder per
  CLAUDE.md § Vendor-agnostic OSS. Also scrubbed an `agnes-development`
  reference in app/api/admin.py and src/store_guardrails/runner.py
  (cherry-picked from Vojtech's commit) on the same hygiene rule.

* release: 0.54.22 — flea-market promote-by-submission_id fix + #329 reviewer follow-ups

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
This commit is contained in:
Vojtech 2026-05-15 23:21:14 +04:00 committed by GitHub
parent 302cf58ccd
commit 78cd243e65
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
8 changed files with 195 additions and 290 deletions

View file

@ -10,6 +10,47 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C
## [Unreleased] ## [Unreleased]
## [0.54.22] — 2026-05-15
### Fixed
- **Flea-market — promote-on-approve + admin-override now look up
the submission's `version_no` in `version_history` by
`submission_id`, not by `hash`.** Hash-based lookup broke whenever
the user uploaded byte-identical bundles across versions (e.g.
same content as v2 and v4): the loop matched the FIRST history
entry with that hash — always v1 — so `target_version_no` landed
at 1, the forward-only `target > current` guard skipped the
promote, and the entity stayed stuck at v1 even though the new
submission was `status='approved'`. UI kept showing v1 as
"current". Both `runner.run_llm_review` (background auto-approve)
and `admin_override_store_submission` now reuse the existing
`_version_no_for_submission` helper. Closes the live development-
deployment case where an entity had 5+ identical-hash history
rows.
- **Admin "ask telemetry" feature** (`POST /api/admin/telemetry/ask`)
no longer emits SQL against the dropped `usage_plugin_daily`
table. `src/usage_ask.py` `SCHEMA_DIGEST` and `SYSTEM_PROMPT`
bumped to describe the v48 rollups (`usage_marketplace_item_daily`
/ `_window`) and rule 5 of the prompt updated. Pre-fix, the LLM
would happily emit `SELECT … FROM usage_plugin_daily` per the
stale prompt and the DuckDB binder would reject it.
### Internal
- **CHANGELOG `[0.54.20]` section restored to its canonical content
from the `v0.54.20` git tag.** The #329 self-merge had carried 226
lines of author's pre-rebase bullets that ended up mis-attributed
to `[0.54.20]`; the published v0.54.20 GitHub Release (FTS BM25 +
batch bar) now matches the CHANGELOG section verbatim.
- `tests/conftest.py` — dropped the unused
`conn_with_usage_schema_and_attribution` fixture that seeded into
the now-removed `usage_attribution_*` tables. Zero callers today
but a tripwire — the first future test to request it would have
failed with a DuckDB binder error.
- `app/web/templates/marketplace.html` — replaced a customer-
specific token (`groupon-marketplace`) in the Most Popular sort-
tiebreaker comment with a generic `<customer>-marketplace`
placeholder per `CLAUDE.md § Vendor-agnostic OSS`.
## [0.54.21] — 2026-05-15 ## [0.54.21] — 2026-05-15
### Added ### Added
@ -131,233 +172,6 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C
/ Reject stay scoped to Review per #129's scope decision (status / Reject stay scoped to Review per #129's scope decision (status
flips belong with the per-row actions or the keyboard workflow). flips belong with the per-row actions or the keyboard workflow).
Closes #129. Closes #129.
- Marketplace telemetry rollup tables (schema v48):
- `usage_marketplace_item_daily` — per-day fact (count, distinct_users,
error_count) keyed by (day, source, type, parent_plugin, name).
- `usage_marketplace_item_window` — sliding-window snapshot with true
cross-window distinct user counts; `period_label='last_7d'`
refreshed every UsageProcessor tick, `period_label='last_30d'`
refreshed hourly.
- `InnerDetailResponse.telemetry` — 30-day invocation + distinct-user
metrics surfaced on curated inner skill / agent detail pages
(`/marketplace/curated/<mp>/<plugin>/skill/<skill>` and `…/agent/<agent>`).
- `scripts/backfill_marketplace_rollup.py` — one-shot script to populate
the new rollup tables from historic `usage_events` after a v46 deploy.
### Changed
- **BREAKING (operator-facing)**: flea-market guardrail pipeline now
fail-CLOSED on misconfig. `get_guardrails_enabled()` previously
conflated operator intent (`guardrails.enabled` YAML) with provider
readiness (`ANTHROPIC_API_KEY` env) — when intent was True but the
env var was missing, the pipeline silently auto-fell-back to disabled
and every upload landed `approved` without an LLM review. Split into
`get_guardrails_enabled()` (intent only) and a new
`get_guardrails_llm_provider_ready()` (env only). Three-state matrix:
`enabled=false → auto-approve` (unchanged), `enabled=true + ready →
normal hold-for-review` (unchanged), `enabled=true + not-ready →
submissions sit at `pending_llm`, no auto-approval` (new — was
silent auto-approval). Admin **Retry review** action on
`/admin/store/submissions/<id>` now covers `pending_llm` too (was
`review_error` + `blocked_llm`). Boot-time `WARNING` banner surfaces
the misconfig in `app/main.py`. Operators who relied on the
auto-fallback for local-dev no-LLM setups must now explicitly set
`guardrails.enabled: false` in `instance.yaml` — same outcome,
explicit intent.
- Flea-market admin **Override** action on
`/admin/store/submissions/<id>` now covers `pending_llm` submissions
too (was `blocked_inline` + `blocked_llm` + `review_error`). Closes
a UX gap created by the new fail-CLOSED behavior above: under
enabled-but-not-ready, a known-good submission would otherwise sit
indefinitely until the admin set credentials AND clicked **Retry
review**. Override already routes through `entity.version_history`
to resolve the correct version dir (and is now forward-only on
promote — see the Fixed bullet below), so it stays safe for v2+
deferred-promotion submissions.
- The `/corporate-memory` domain filter dropdown now offers the full
`VALID_DOMAINS` enum (finance / engineering / product / data /
operations / infrastructure), not just domains that already have ≥1
item — the old behavior collapsed to a single option on fresh stores.
- **BREAKING (UI):** the Curated Memory admin review queue moved from
`/corporate-memory/admin` to `/admin/corporate-memory` — no redirect.
It is now reached from the Admin nav dropdown ("Memory Review").
- `/corporate-memory` (Curated Memory) is no longer admin-only — the
route runs on `get_current_user`, matching the already-open
`/api/memory/*` endpoints. The pending-review banner stays admin-only
(suppressed server-side for non-admins).
- Both Curated Memory pages now render the shared `_page_hero.html`
header band instead of bespoke per-page chrome, matching catalog /
me-activity / admin pages.
- **BREAKING:** `MarketplaceItem.unique_users_30d` renamed to
`distinct_users_30d` in the `/api/marketplace/items` response. The new
value is a true distinct count across the 30-day window (from the
`usage_marketplace_item_window` snapshot), not the old sum-of-daily
proxy that over-counted active multi-day users.
- `usage_events.source` is now populated per-event by `MarketplaceItemLookup`
(live join against `marketplace_plugins` + `store_entities`). Previously
it sat at `'builtin'` for every row because the v42 `AttributionLookup`
matched skill/command names without the plugin prefix that Claude Code
actually writes — `usage_events.source = 'curated'` / `'flea'` /
`'builtin'` becomes meaningful for the first time.
- `usage_events.ref_id` semantics — now stores the plain plugin name
(curated) instead of `<marketplace_id>/<plugin_name>`; flea entities
store `NULL` (no parent plugin). Admin telemetry endpoints
(`/api/admin/telemetry/{summary,query,facets,export}`) keep their
`GROUP BY ref_id` / `source` shapes — bucket values shift to the
new semantics.
- The marketplace browse card no longer renders the invocation chip
(`🔥 N uses · ↑ X%`) pending UX finalisation. `invocations_30d`,
`distinct_users_30d`, and `trend_pct` remain in the API response for
the upcoming Most-Used / Trending sections and the detail pages.
- `USAGE_PROCESSOR_VERSION` bumped 4 → 5 so the session-pipeline
reprocess loop re-attributes historic events to the new
source/ref_id semantics on the next tick.
### Removed
- **BREAKING:** four schema-v42 telemetry tables (v48 migration):
- `usage_attribution_skills`, `usage_attribution_agents`, and
`usage_attribution_commands` — replaced by live prefix-split lookup
against `marketplace_plugins` + `store_entities`. Verified empty or
derivable; no unique data lost.
- `usage_plugin_daily` — replaced by `usage_marketplace_item_daily`
+ `_window`. Verified empty in production-shape data (the v42 rollup
INSERT was gated on `source IN ('curated','flea')` but the broken
attribution layer always produced `'builtin'`).
- `src/repositories/usage_attribution.py`, `src/usage_attribution_helpers.py`,
`scripts/backfill_usage_attribution.py`, and their test fixtures
(`tests/test_usage_attribution.py`,
`tests/test_backfill_usage_attribution.py`) — no callers remain.
### Fixed
- **Unauthenticated browser requests to `GET /api/initial-workspace.zip` now redirect to `/login?next=/api/initial-workspace.zip` instead of returning a raw JSON 401** (#315). This is the one `/api/*` endpoint that's designed to be hit directly from a browser bookmark (the analyst clean-install zip), so it intentionally opts out of the global `_API_PATH_PREFIXES` "never redirect /api/*" contract in `app/main.py`. CLI / curl / other API clients (any `Accept` without `text/html` — including the `*/*` default) keep getting the 401 they can handle.
- Flea-market LLM security review failed with `LLMFormatError: Response
truncated (max_tokens) for schema store_guardrails_review` when the
reviewer emitted many findings or content-quality issues. Raised the
output budget (2500 → 6000 tokens) and added a one-shot
retry-with-doubled-budget in the Anthropic provider (up to 4× initial)
so the verdict survives an occasional verbose response instead of
pinning the submission in `review_error`. (We initially added
`maxItems=20` to the schema's `findings` and `content_quality.issues`
arrays, but Anthropic's structured-output validator rejects `maxItems`
on array types — removed.)
- Flea-market entity detail page surfaces the latest submission's
failure verdict even when a previously-approved version is still
serving (deferred-promotion path). The owner / admin used to see no
banner at all when a v2+ edit landed in `review_error`,
`blocked_llm`, or `blocked_inline` because the `_quarantine_banner`
partial gated on `entity.visibility_status != 'approved'`. The
banner now renders for those failure statuses too, with copy that
acknowledges the prior version is still live ("Latest edit failed
review — previously approved version (vN) keeps serving …").
- Flea-market `/api/store/bundle.zip` export now filters by
`visibility_status='approved'` for non-admin non-owner callers.
Previously an authenticated user could call the bundle endpoint
and pull pending / blocked / hidden v1 bytes — bypassing the
publish gate the same way `_enforce_visibility` already prevents
on detail pages + install. Owners still see their own non-approved
entries (matches the browse-listing `include_owner_id` affordance);
admins still see everything. (Critical — surfaced by adversarial
review.)
- Flea-market PUT (edit) + restore endpoints now serialize concurrent
writes against the same `entity_id` via a per-entity asyncio lock.
Two concurrent PUTs could previously both pass the
`latest_for_entity` pending gate, both bake into
`versions/v<N+1>/plugin/`, and both append a `version_history`
entry. The lock closes the window for single-process deployments;
multi-worker deployments still have a residual window (tracked in
the follow-up issue). (High — surfaced by adversarial review.)
- Flea-market `StoreSubmissionsRepository.update_status` is now
compare-and-swap on terminal statuses (`approved`, `overridden`,
`blocked_inline`). A late background-task LLM verdict racing with
an admin override or with a more recent terminal verdict can no
longer silently clobber the row. Callers that legitimately need to
overwrite a terminal state pass `allow_terminal_overwrite=True`
explicitly. Returns a boolean indicating whether the write landed.
`runner.run_llm_review` now honors the bool on both its `approved`
and `blocked_llm` branches: a CAS no-op skips the downstream
cascade (visibility flip, version promote, the verdict-specific
audit entry that would otherwise contradict the row) and logs a
single `store.submission.bg_verdict_skipped` audit row instead,
so an operator reviewing the queue sees dropped verdicts
explicitly rather than via row-vs-audit contradiction.
(High — surfaced by adversarial review.)
- Flea-market admin **Retry review** and **Rescan** now review the
STAGED version's bundle, not the live `plugin/` directory. For a
v2+ edit held at `pending_llm` / `blocked_llm` / `review_error`,
live still holds the prior approved version. Reviewing live would
produce a verdict against the WRONG bytes; the runner's hash-match
promotion would then advance the entity to staged bytes that were
never actually reviewed. Now resolves the staged
`versions/v<N>/plugin/` from the submission's `version_history`
entry. (Critical — surfaced by adversarial review.)
- Flea-market admin **Override** is now forward-only on promotion.
Previously `target_version_no != current` would happily demote the
live bundle when admin overrode a stale v2 submission while v3 was
already approved + live. Changed to `target > current` so override
flips status + visibility on the row regardless, but on-disk
promotion only fires for newer versions. The same `> current`
guard is now applied defensively in `runner.run_llm_review` so a
late LLM verdict can't demote past a more recent approval either.
(High — surfaced by adversarial review.)
- Flea-market admin **Override** on a v2+ edit/restore submission now
promotes the entity to the overridden version + swaps the on-disk
live bundle. Pre-fix the override only flipped
`visibility_status='approved'` and `submission.status='overridden'`,
leaving `entity.version_no` at the prior approved version — so
installers (and the marketplace UI) kept serving the OLD bytes the
admin just intended to replace. Mirrors the auto-approval branch in
`runner.run_llm_review`: look up the submission's version in
`version_history`, `promote_version` + `_swap_live_to_version` when
it differs from current. Initial-v1 overrides unchanged (no
promotion needed).
- Flea-market Restore button + endpoint no longer allow restoring a
version that was never approved. The versions card hid the gate
entirely (showed Restore on any non-current row), and the backend
blocked only while the latest submission was `pending_*` — so a
`blocked_llm` / `review_error` version could be restored anyway.
Added `submission_status` decoration on `version_history` via a new
`StoreEntitiesRepository.get_with_version_approvals` helper, gated
the UI button on `submission_status in ('approved', None)`
(`None` is the legacy v1 seed, back-compat-treated as approved),
rendered status pills for blocked / errored / pending rows, and
added a 400 `version_not_approved` guard in
`POST /api/store/entities/{id}/versions/{n}/restore`.
- `/me/activity` hero subtitle showed literal `<strong>…</strong>` tags
around the user's email instead of rendering them bold. The subtitle
was built by `~`-concatenating a `Markup` operand (`user.email | e`)
with HTML string literals, which made Jinja2's `markup_join` escape
the literal tags too. Switched to `{% set %}…{% endset %}` block
capture so the literal `<strong>` stays HTML while the email is still
autoescaped.
### Internal
- `StoreEntitiesRepository.get_with_version_approvals` now defensively
copies each `version_history` entry before annotating with
`submission_status`. `self.get()` re-parses JSON each call today so
this is belt-and-suspenders, but it protects any future caching layer
from leaking the annotated key into a subsequent plain `get()` call.
- `corporate_memory_admin.html` renamed to `admin_corporate_memory.html`
to match the `admin_*` template convention; dropped dead `.ck-topbar` /
`.page-header` inline CSS (the latter collided with the design-system
`.page-header--hero` selector).
- `usage_tool_daily` flagged as candidate for removal. The table is
still populated by `rebuild_rollups` for backwards compatibility with
the `usage_ask` schema digest, but has no product-UI consumer after
the v46 marketplace telemetry refactor. Will be evaluated for drop in
the next iteration.
- CI test suite sharded for speed. The `test` job in `.github/workflows/ci.yml` is now a `test-shard` matrix — 4 parallel jobs via `pytest-split`, balanced by a committed `.test_durations` file — aggregated into a single `test` status check so branch protection needs no change. The duplicate full-suite `test` job in `release.yml` is removed (it re-ran the same ~10 min suite a second time on every push to main/feature branches); `release.yml` is now image-build only, with the advisory ruff/mypy steps moved to a lean `lint` job in `ci.yml`. Net: ~10 min → ~3 min wall-clock per push, and the suite runs once instead of twice. Adds `pytest-split` to the `dev` extra.
- CI/release workflow polish (the still-salvageable subset of the
abandoned PR #139, after #311 obsoleted the test-job refactor):
`rollback.yml` extracts the `release.yml` smoke-test rollback into a
reusable + manually dispatchable workflow, with a warning guard on
non-`stable-*` `workflow_dispatch` inputs. `prune-dev-tags.yml` adds
weekly housekeeping (Sundays 04:00 UTC) of legacy CalVer git tags +
GHCR images outside a `KEEP_MONTHS` retention window; floating
aliases are git-tagless and never matched. `lint-workflows.yml` runs
`actionlint` on `.github/workflows/**` + `scripts/ops/**.sh` changes
(non-blocking initially). The superseded `deploy.yml` stub is removed.
Excludes #139's rejected pieces (Release Drafter, setuptools_scm,
run-number tag scheme, main-only release triggers, deletion of
`cli-wheel-clean-install`).
### Internal ### Internal
- **Schema v47** — adds DuckDB `fts` BM25 index over - **Schema v47** — adds DuckDB `fts` BM25 index over

View file

@ -3754,15 +3754,18 @@ async def admin_override_store_submission(
# just no-ops and we skip promotion harmlessly. # just no-ops and we skip promotion harmlessly.
entity_row = ents_repo.get(entity_id) or {} entity_row = ents_repo.get(entity_id) or {}
promoted_to: Optional[int] = None promoted_to: Optional[int] = None
sub_hash = sub.get("version") # Look up THIS submission's version entry by submission_id, NOT
target_version_no: Optional[int] = None # by hash. Hash-based lookup breaks when the user re-uploads
for entry in (entity_row.get("version_history") or []): # byte-identical bundles (e.g. v2 same content as v1): the loop
if entry.get("hash") == sub_hash: # picks the FIRST history entry with that hash (always v1, n=1),
try: # so target_version_no lands at 1 instead of the actual new
target_version_no = int(entry.get("n")) # entry's n. The forward-only `target > current` guard then
except (TypeError, ValueError): # skips the promote, leaving the entity stuck at v1. Surfaced
target_version_no = None # live on a development deployment.
break from app.api.store import _version_no_for_submission
target_version_no: Optional[int] = _version_no_for_submission(
entity_row, submission_id,
)
# Forward-only: refuse to promote backwards. An admin overriding a # Forward-only: refuse to promote backwards. An admin overriding a
# stale v2 submission when v3 is already approved + live must NOT # stale v2 submission when v3 is already approved + live must NOT
# demote the live bundle back to v2's bytes. Override flips the # demote the live bundle back to v2's bytes. Override flips the

View file

@ -1051,8 +1051,8 @@ async function loadMostPopular() {
// 4. invocations_7d DESC ← recent volume // 4. invocations_7d DESC ← recent volume
// 5. name ASC ← deterministic textual order // 5. name ASC ← deterministic textual order
// 6. marketplace_slug ASC ← splits duplicate plugin names // 6. marketplace_slug ASC ← splits duplicate plugin names
// across multiple marketplaces (e.g. `grpn` in `groupon- // across multiple marketplaces (e.g. a `foo` plugin shipped
// marketplace` vs. test seeds) // both by `<customer>-marketplace` and by curated test seeds)
const r = await fetchOne(state.tab, { sort: 'most_used', page: 1, page_size: 24 }); const r = await fetchOne(state.tab, { sort: 'most_used', page: 1, page_size: 24 });
const cmp = (a, b) => { const cmp = (a, b) => {
const cmpN = (x, y) => (y || 0) - (x || 0); const cmpN = (x, y) => (y || 0) - (x || 0);

View file

@ -1,6 +1,6 @@
[project] [project]
name = "agnes-the-ai-analyst" name = "agnes-the-ai-analyst"
version = "0.54.21" version = "0.54.22"
description = "Agnes — AI Data Analyst platform for AI analytical systems" description = "Agnes — AI Data Analyst platform for AI analytical systems"
requires-python = ">=3.11,<3.14" requires-python = ">=3.11,<3.14"
license = "MIT" license = "MIT"

View file

@ -283,18 +283,25 @@ def run_llm_review(
# serve-able state. Archive / hidden-by-admin paths # serve-able state. Archive / hidden-by-admin paths
# leave alone. # leave alone.
if current_visibility == "approved": if current_visibility == "approved":
# v46: attribution lookup is live — the next # v48 (#329): attribution lookup is live — the next
# UsageProcessor tick preloads the approved entity by name. # UsageProcessor tick preloads the approved entity by
sub_row = subs_repo.get(submission_id) or {} # name against marketplace_plugins / store_entities;
sub_hash = sub_row.get("version") # no cached attribution rows to refresh on promote.
target_version_no = None # Look up THIS submission's version entry by
for entry in (ent_row.get("version_history") or []): # submission_id, NOT by hash. Hash-based lookup
if entry.get("hash") == sub_hash: # breaks when a user re-uploads byte-identical
try: # bundles (e.g. v2 same content as v1): the loop
target_version_no = int(entry.get("n")) # picks the FIRST history entry with that hash
except (TypeError, ValueError): # (always v1, n=1), so `target_version_no` lands at
target_version_no = None # 1 instead of the actual new entry's n. The
break # forward-only `target > current` guard then skips
# the promote, leaving the entity stuck at v1.
# Surfaced live on a development deployment with
# an entity that had 5+ identical-hash history rows.
from app.api.store import _version_no_for_submission
target_version_no = _version_no_for_submission(
ent_row, submission_id,
)
# Forward-only promotion. A late verdict landing for # Forward-only promotion. A late verdict landing for
# an older submission must NOT demote the live bundle # an older submission must NOT demote the live bundle
# past a version that was approved more recently. # past a version that was approved more recently.

View file

@ -66,14 +66,27 @@ TABLE usage_tool_daily
distinct_sessions INTEGER distinct_sessions INTEGER
PRIMARY KEY (day, tool_name, source) PRIMARY KEY (day, tool_name, source)
TABLE usage_plugin_daily TABLE usage_marketplace_item_daily
day DATE NOT NULL day DATE NOT NULL
source VARCHAR NOT NULL source VARCHAR NOT NULL -- 'curated' | 'flea' | 'builtin'
ref_id VARCHAR NOT NULL type VARCHAR NOT NULL -- 'plugin' | 'skill' | 'agent' | 'command'
invocations INTEGER parent_plugin VARCHAR NOT NULL -- '' for top-level plugins; '<plugin>' for inner items
name VARCHAR NOT NULL
count INTEGER
distinct_users INTEGER distinct_users INTEGER
distinct_sessions INTEGER error_count INTEGER
PRIMARY KEY (day, source, ref_id) PRIMARY KEY (day, source, type, parent_plugin, name)
TABLE usage_marketplace_item_window
period_label VARCHAR NOT NULL -- 'last_7d' | 'last_30d'
source VARCHAR NOT NULL
type VARCHAR NOT NULL
parent_plugin VARCHAR NOT NULL
name VARCHAR NOT NULL
invocations INTEGER
distinct_users INTEGER -- TRUE distinct across the window (not summed from daily)
refreshed_at TIMESTAMP
PRIMARY KEY (period_label, source, type, parent_plugin, name)
""" """
SYSTEM_PROMPT = """You translate natural-language questions into DuckDB SELECT statements over a telemetry schema. SYSTEM_PROMPT = """You translate natural-language questions into DuckDB SELECT statements over a telemetry schema.
@ -83,7 +96,7 @@ Rules:
2. SELECT-only never INSERT, UPDATE, DELETE, DROP, CREATE, ALTER, ATTACH, COPY, PRAGMA, or any side-effect statement. 2. SELECT-only never INSERT, UPDATE, DELETE, DROP, CREATE, ALTER, ATTACH, COPY, PRAGMA, or any side-effect statement.
3. No semicolons except optionally one at the end. 3. No semicolons except optionally one at the end.
4. No CTE that contains a write. 4. No CTE that contains a write.
5. Prefer rollup tables (`usage_tool_daily`, `usage_plugin_daily`) for date-range aggregations; use `usage_events` for forensic detail. 5. Prefer rollup tables for date-range aggregations: `usage_tool_daily` (per-tool), `usage_marketplace_item_daily` (per marketplace item plugin / skill / agent / command keyed by source + type + parent_plugin + name), and `usage_marketplace_item_window` (true-distinct counts across `last_7d` / `last_30d` snapshots). Use `usage_events` for forensic detail. The pre-v48 `usage_plugin_daily` and `usage_attribution_*` tables are gone do not reference them.
6. Use DuckDB-flavor SQL: `CURRENT_DATE`, `INTERVAL 7 DAY`, `DATE_TRUNC('week', ...)`, `EPOCH`, etc. 6. Use DuckDB-flavor SQL: `CURRENT_DATE`, `INTERVAL 7 DAY`, `DATE_TRUNC('week', ...)`, `EPOCH`, etc.
7. Limit large result sets default `LIMIT 100` unless the question asks for ALL rows. 7. Limit large result sets default `LIMIT 100` unless the question asks for ALL rows.

View file

@ -380,38 +380,6 @@ from tests.fixtures.analyst_bootstrap import ( # noqa: E402,F401
) )
@pytest.fixture
def conn_with_usage_schema_and_attribution(tmp_path, monkeypatch):
"""DuckDB conn at v41 with attribution rows seeded for fixture skills/agents/commands."""
monkeypatch.setenv("DATA_DIR", str(tmp_path))
import src.db as db_module
db_module._system_db_conn = None
db_module._system_db_path = None
c = db_module.get_system_db()
# Seed attribution rows the fixtures reference
c.execute(
"INSERT OR IGNORE INTO usage_attribution_skills (source, ref_id, skill_name)"
" VALUES ('curated', 'mp/plug', 'my-skill')"
)
c.execute(
"INSERT OR IGNORE INTO usage_attribution_skills (source, ref_id, skill_name)"
" VALUES ('flea', 'entity-1', 'flea-skill')"
)
c.execute(
"INSERT OR IGNORE INTO usage_attribution_agents (source, ref_id, agent_name)"
" VALUES ('curated', 'mp/plug', 'my-agent')"
)
c.execute(
"INSERT OR IGNORE INTO usage_attribution_commands (source, ref_id, command_name)"
" VALUES ('curated', 'mp/plug', 'compound:debug')"
)
yield c
try:
c.close()
except Exception:
pass
@pytest.fixture @pytest.fixture
def bq_instance(monkeypatch): def bq_instance(monkeypatch):
"""Force instance.yaml to look like a BigQuery deployment for the """Force instance.yaml to look like a BigQuery deployment for the

View file

@ -1706,3 +1706,103 @@ class TestAtomicPromote:
f"DB must NOT advance when live swap can't happen; got " f"DB must NOT advance when live swap can't happen; got "
f"version_no={ent_after['version_no']}" f"version_no={ent_after['version_no']}"
) )
class TestPromoteLookupByByteIdenticalBundles:
"""Live-issue regression observed on a development deployment: an
entity had multiple version_history rows sharing the same `hash`
(user re-uploaded byte-identical bundles as v2/v4/v6). The runner's
promote-on-approve path looked up the submission's version_no
in version_history BY HASH and broke on the FIRST match always
v1. With v1's n=1 and current=1, the forward-only
`target > current` guard skipped the promote, so the passing
LLM verdict never advanced the entity. UI kept showing v1 as
'current' even though the new submission's status was 'approved'.
Fix: look up by `submission_id` via `_version_no_for_submission`."""
def test_byte_identical_v2_promotes_to_current(
self, web_client, monkeypatch,
):
from pathlib import Path
from app.utils import get_store_dir
from src.repositories.store_submissions import StoreSubmissionsRepository
from src.store_guardrails.runner import run_llm_review
monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-fake-for-identical-test")
owner_id, owner_cookies = _create_user(web_client, "identical@x.com")
# Same body for v1 + v2 → byte-identical zip → same hash.
identical_body = (
"Identical body line that is intentionally long enough to "
"clear the content threshold for skill bodies. " * 4
)
v1_zip = _make_skill_zip("identical", body=identical_body)
r = web_client.post(
"/api/store/entities",
files={"file": ("v1.zip", v1_zip, "application/zip")},
data={"type": "skill", "description": _OK_DESC},
cookies=owner_cookies,
)
assert r.status_code == 201, r.text
eid = r.json()["id"]
conn = get_system_db()
ent_v1 = StoreEntitiesRepository(conn).get(eid)
conn.close()
v1_hash = ent_v1["version"]
assert ent_v1["version_no"] == 1
# Flip guardrails ON for v2. Mock LLM to approve.
monkeypatch.setattr(
"app.api.store.get_guardrails_enabled", lambda: True,
)
monkeypatch.setattr(
"app.api.store.get_guardrails_llm_provider_ready", lambda: True,
)
def mock_approve(*a, **kw):
return {
"risk_level": "safe", "summary": "ok",
"findings": [], "template_placeholders_found": 0,
"reviewed_by_model": "mock", "error": None,
"content_quality": {"verdict": "pass", "issues": []},
}
monkeypatch.setattr(
"src.store_guardrails.llm_review.review_bundle", mock_approve,
)
# PUT v2 with IDENTICAL bytes.
v2_zip = _make_skill_zip("identical", body=identical_body)
r = web_client.put(
f"/api/store/entities/{eid}",
files={"file": ("v2.zip", v2_zip, "application/zip")},
cookies=owner_cookies,
)
assert r.status_code == 200, r.text
conn = get_system_db()
v2_sub_id = StoreSubmissionsRepository(conn).latest_for_entity(eid)["id"]
conn.close()
run_llm_review(
v2_sub_id,
plugin_dir=Path(get_store_dir()) / eid / "versions" / "v2" / "plugin",
conn_factory=get_system_db,
api_key_loader=lambda: "sk", model_loader=lambda: "mock",
)
conn = get_system_db()
ent_after = StoreEntitiesRepository(conn).get(eid)
v2_sub = StoreSubmissionsRepository(conn).get(v2_sub_id)
conn.close()
# Pre-fix the runner would have matched v1's history entry
# first (same hash), target_version_no=1, `1 > 1` False, no
# promote → entity stuck at v1.
assert ent_after["version_no"] == 2, (
f"v2 must promote even when its hash matches v1's; got "
f"version_no={ent_after['version_no']}. Lookup-by-hash "
f"would have stuck the entity at v1."
)
assert v2_sub["status"] == "approved"
assert ent_after["version"] == v1_hash, (
"hash unchanged (bundle is byte-identical) but version_no DID move"
)