# Changelog

All notable changes to Agnes AI Data Analyst.

Format: [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Versions follow [Semantic Versioning](https://semver.org/spec/v2.0.0.html). While pre-1.0, the public surface (CLI flags, REST endpoints, `instance.yaml` schema, `extract.duckdb` contract) may shift between minor versions; breaking changes are called out under **Changed** or **Removed** with the **BREAKING** marker.

CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every CI build; semver tags (`v0.X.Y`) are cut at release boundaries and reference the same commit as a `stable-*` tag from the same day.

---

## [Unreleased]

### Changed

- **BREAKING (operator-facing)**: the flea-market guardrail pipeline now fails CLOSED on misconfiguration. `get_guardrails_enabled()` previously conflated operator intent (`guardrails.enabled` in YAML) with provider readiness (the `ANTHROPIC_API_KEY` env var). When intent was true but the env var was missing, the pipeline silently fell back to disabled, and every upload landed `approved` without an LLM review. The check is now split into `get_guardrails_enabled()` (intent only) and a new `get_guardrails_llm_provider_ready()` (env only), giving a three-state matrix:
  - `enabled=false` → auto-approve (unchanged).
  - `enabled=true` + provider ready → normal hold-for-review (unchanged).
  - `enabled=true` + provider not ready → submissions sit at `pending_llm` with no auto-approval (new; previously silent auto-approval).

  The admin **Retry review** action on `/admin/store/submissions/` now covers `pending_llm` too (was `review_error` + `blocked_llm`). A boot-time `WARNING` banner in `app/main.py` surfaces the misconfiguration. Operators who relied on the auto-fallback for local-dev, no-LLM setups must now explicitly set `guardrails.enabled: false` in `instance.yaml`: same outcome, explicit intent.
- Flea-market admin **Override** action on `/admin/store/submissions/` now covers `pending_llm` submissions too (was `blocked_inline` + `blocked_llm` + `review_error`).
  Closes a UX gap created by the new fail-CLOSED behavior above: when guardrails are enabled but the provider is not ready, a known-good submission would otherwise sit indefinitely until the admin set credentials AND clicked **Retry review**. Override already routes through `entity.version_history` to resolve the correct version directory (and is now forward-only on promotion; see the Fixed bullet below), so it stays safe for v2+ deferred-promotion submissions.

### Fixed

- **Unauthenticated browser requests to `GET /api/initial-workspace.zip` now redirect to `/login?next=/api/initial-workspace.zip` instead of returning a raw JSON 401** (#315). This is the one `/api/*` endpoint designed to be hit directly from a browser bookmark (the analyst clean-install zip), so it intentionally opts out of the global `_API_PATH_PREFIXES` "never redirect /api/*" contract in `app/main.py`. CLI, curl, and other API clients (any `Accept` header without `text/html`, including the `*/*` default) keep getting the 401 they can handle.
- Flea-market LLM security review failed with `LLMFormatError: Response truncated (max_tokens) for schema store_guardrails_review` when the reviewer emitted many findings or content-quality issues. Raised the output budget (2500 → 6000 tokens) and added a one-shot retry-with-doubled-budget in the Anthropic provider (up to 4× initial), so the verdict survives an occasional verbose response instead of pinning the submission in `review_error`. (We initially added `maxItems=20` to the schema's `findings` and `content_quality.issues` arrays, but Anthropic's structured-output validator rejects `maxItems` on array types, so it was removed.)
- Flea-market entity detail page surfaces the latest submission's failure verdict even when a previously-approved version is still serving (deferred-promotion path).
  The owner / admin used to see no banner at all when a v2+ edit landed in `review_error`, `blocked_llm`, or `blocked_inline`, because the `_quarantine_banner` partial gated on `entity.visibility_status != 'approved'`. The banner now renders for those failure statuses too, with copy that acknowledges the prior version is still live ("Latest edit failed review — previously approved version (vN) keeps serving …").
- Flea-market `/api/store/bundle.zip` export now filters by `visibility_status='approved'` for non-admin, non-owner callers. Previously an authenticated user could call the bundle endpoint and pull pending / blocked / hidden v1 bytes, bypassing the same publish gate that `_enforce_visibility` already enforces on detail pages and install. Owners still see their own non-approved entries (matching the browse-listing `include_owner_id` affordance); admins still see everything. (Critical — surfaced by adversarial review.)
- Flea-market PUT (edit) and restore endpoints now serialize concurrent writes against the same `entity_id` via a per-entity asyncio lock. Previously, two concurrent PUTs could both pass the `latest_for_entity` pending gate, both bake into `versions/v/plugin/`, and both append a `version_history` entry. The lock closes the window for single-process deployments; multi-worker deployments still have a residual window (tracked in the follow-up issue). (High — surfaced by adversarial review.)
- Flea-market `StoreSubmissionsRepository.update_status` is now compare-and-swap on terminal statuses (`approved`, `overridden`, `blocked_inline`). A late background-task LLM verdict racing with an admin override, or with a more recent terminal verdict, can no longer silently clobber the row. Callers that legitimately need to overwrite a terminal state must pass `allow_terminal_overwrite=True` explicitly. The method returns a boolean indicating whether the write landed.
  `runner.run_llm_review` now honors the returned boolean on both its `approved` and `blocked_llm` branches. A CAS no-op skips the downstream cascade (visibility flip, version promotion, and the verdict-specific audit entry that would otherwise contradict the row) and logs a single `store.submission.bg_verdict_skipped` audit row instead, so an operator reviewing the queue sees dropped verdicts explicitly rather than inferring them from a row-vs-audit contradiction. (High — surfaced by adversarial review.)
- Flea-market admin **Retry review** and **Rescan** now review the STAGED version's bundle, not the live `plugin/` directory. For a v2+ edit held at `pending_llm` / `blocked_llm` / `review_error`, live still holds the prior approved version. Reviewing live would produce a verdict against the WRONG bytes, and the runner's hash-match promotion would then advance the entity to staged bytes that were never actually reviewed. The staged `versions/v/plugin/` directory is now resolved from the submission's `version_history` entry. (Critical — surfaced by adversarial review.)
- Flea-market admin **Override** is now forward-only on promotion. Previously the `target_version_no != current` check would happily demote the live bundle when an admin overrode a stale v2 submission while v3 was already approved and live. The check is now `target > current`: override still flips status and visibility on the row regardless, but on-disk promotion only fires for newer versions. The same `> current` guard is now applied defensively in `runner.run_llm_review`, so a late LLM verdict can't demote past a more recent approval either. (High — surfaced by adversarial review.)
- Flea-market admin **Override** on a v2+ edit/restore submission now promotes the entity to the overridden version and swaps the on-disk live bundle.
  Before the fix, override only flipped `visibility_status='approved'` and `submission.status='overridden'`, leaving `entity.version_no` at the prior approved version. Installers (and the marketplace UI) therefore kept serving the OLD bytes the admin had just intended to replace. Override now mirrors the auto-approval branch in `runner.run_llm_review`: it looks up the submission's version in `version_history` and calls `promote_version` + `_swap_live_to_version` when that version differs from current (tightened to `> current` by the forward-only fix above). Initial-v1 overrides are unchanged (no promotion needed).
- Flea-market Restore button and endpoint no longer allow restoring a version that was never approved. The versions card applied no gate at all (it showed Restore on any non-current row), and the backend blocked only while the latest submission was `pending_*`, so a `blocked_llm` / `review_error` version could be restored anyway. This release:
  - adds `submission_status` decoration on `version_history` via a new `StoreEntitiesRepository.get_with_version_approvals` helper;
  - gates the UI button on `submission_status in ('approved', None)` (`None` is the legacy v1 seed, treated as approved for back-compat);
  - renders status pills for blocked / errored / pending rows;
  - adds a 400 `version_not_approved` guard in `POST /api/store/entities/{id}/versions/{n}/restore`.

### Internal

- `StoreEntitiesRepository.get_with_version_approvals` now defensively copies each `version_history` entry before annotating it with `submission_status`. `self.get()` re-parses JSON on each call today, so this is belt-and-suspenders, but it keeps any future caching layer from leaking the annotated key into a subsequent plain `get()` call.

## [0.54.17] — 2026-05-15

### Changed

- `agnes refresh-marketplace --check` (the SessionStart-hook detector that fires on every Claude Code session start in every workspace) now uses `git ls-remote origin HEAD` instead of `git fetch origin` to learn whether the remote marketplace has changed.
  `ls-remote` transfers one line of text (`\tHEAD`) over a single HTTPS round-trip (no git objects, no metadata), so the hook completes in ~0.5–1 s instead of the ~8 s a full fetch took. Detection logic is unchanged: compare the local `HEAD` SHA to the remote `HEAD` SHA, emit the `/update-agnes-plugins` hint JSON on mismatch, and stay silent on match. The slash-command and `--bootstrap` paths still do a real `git fetch` + `git reset --hard`, because they actually need the objects.

### Fixed

- `/me/activity` hero subtitle showed literal HTML tags around the user's email instead of rendering it bold. The subtitle was built by `~`-concatenating a `Markup` operand (`user.email | e`) with HTML string literals, which made Jinja2's `markup_join` escape the literal tags too. Switched to `{% set %}…{% endset %}` block capture, so the literal tags stay HTML while the email is still autoescaped.

### Internal

- CI test suite sharded for speed. The `test` job in `.github/workflows/ci.yml` is now a `test-shard` matrix (4 parallel jobs via `pytest-split`, balanced by a committed `.test_durations` file) aggregated into a single `test` status check, so branch protection needs no change. The duplicate full-suite `test` job in `release.yml` is removed (it re-ran the same ~10 min suite a second time on every push to main/feature branches); `release.yml` is now image-build only, with the advisory ruff/mypy steps moved to a lean `lint` job in `ci.yml`. Net: ~10 min → ~3 min wall-clock per push, and the suite runs once instead of twice. Adds `pytest-split` to the `dev` extra.
- CI/release workflow polish (the still-salvageable subset of the abandoned PR #139, after #311 obsoleted the test-job refactor): `rollback.yml` extracts the `release.yml` smoke-test rollback into a reusable, manually dispatchable workflow, with a warning guard on non-`stable-*` `workflow_dispatch` inputs.
  `prune-dev-tags.yml` adds weekly housekeeping (Sundays 04:00 UTC) that prunes legacy CalVer git tags and GHCR images outside a `KEEP_MONTHS` retention window; floating aliases have no git tag and are never matched. `lint-workflows.yml` runs `actionlint` on changes to `.github/workflows/**` and `scripts/ops/**.sh` (non-blocking initially). The superseded `deploy.yml` stub is removed. Excludes #139's rejected pieces (Release Drafter, setuptools_scm, run-number tag scheme, main-only release triggers, deletion of `cli-wheel-clean-install`).

## [0.54.16] — 2026-05-14

### Fixed

- Store submit-flow wizard buttons were missing the `.btn` base class. Next / Back / Finish on `/store/new` and Save on `/store/edit/` carried only the `.btn-primary` / `.btn-secondary` color modifier, so they rendered with no padding, border-radius, or proper sizing (~18px-tall color boxes) instead of matching their sibling Cancel links. Added the `.btn` base class on all four.

## [0.54.15] — 2026-05-14

### Added

- New `/me/activity` page consolidating per-analyst usage analytics into one place, with four tabs: Sessions, Token usage, Data access, and Sync activity. The Sessions tab merges what used to be split across two pages into one table: usage metrics (model, prompts, tools, tokens), pipeline status (pending/processed/extracted), items-extracted count, and the session download link.
- `GET /api/me/stats/sessions` response now includes `pipeline_status`, `items_extracted`, and `download_url` per row (joined from `session_processor_state` and the `user_sessions/` filesystem).

### Changed

- `/me/stats` and `/profile/sessions` are consolidated into `/me/activity`. Both old URLs now 301-redirect: `/me/stats` → `/me/activity`, `/profile/sessions` → `/me/activity?tab=sessions`. The `/profile/sessions/{filename}` download endpoint is unchanged.
- `/profile` is renamed to `/me/profile` and absorbs the former `/me/debug` (session diagnostics) and `/tokens` (Personal Authentication Token management) pages into one account page with Account, Group memberships, Effective access, Personal Authentication Tokens, and a collapsible Session & troubleshooting section. The user menu is now Profile → My activity; the "Stats", "My tokens", and "Auth debug" entries are retired. `/admin/tokens`, the `/auth/tokens` API, and `/api/me/profile` are unchanged.
- `/api/me/stats/*` session lookup now keys by `user_id`, matching how the session pipeline writes `usage_session_summary.username`. This fixes empty results when an analyst's email local-part differed from their `user_id`. `items_extracted` renders `0` instead of blank when null.

### Fixed

- `/me/activity` page hero subtitle now escapes `user.email` before concatenating it into the `| safe`-rendered subtitle. The raw concatenation bypassed Jinja2 auto-escaping, an XSS regression relative to the auto-escaped `me_stats.html` it replaced.
- Local dev with `docker-compose.dev.yml` (`uvicorn --reload`) no longer hits "Could not set lock on file system.duckdb". The seed_admin / scheduler_user / no-password-warning blocks moved from `create_app()` (where they ran in both the reloader and the worker) into the lifespan (worker-only).

### Removed

- `/profile`, `/me/debug`, and `/tokens` routes, plus the retired templates (`me_stats.html`, `profile_sessions.html`, `me_debug.html`, `my_tokens.html`). `/me/stats` and `/profile/sessions` 301-redirect; `/profile`, `/me/debug`, and `/tokens` are removed outright, with every internal link repointed to `/me/profile`. The `/me/debug/refetch-groups` POST moved to `/me/profile/refetch-groups` (still gated behind `AGNES_DEBUG_AUTH`).
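
This release retires the old routes in two ways: some paths now 301-redirect and others are removed outright. The split can be summarized as a lookup table. The sketch below is minimal and assumes exact path matching; `REDIRECTS_301`, `REMOVED_PATHS`, and `resolve_legacy_path()` are hypothetical names for illustration, not Agnes's router code.

```python
from typing import Optional, Tuple

# Hypothetical sketch of the 0.54.15 route consolidation (illustrative names,
# not Agnes's actual router). Only the paths named in this release are listed.
REDIRECTS_301 = {
    "/me/stats": "/me/activity",
    "/profile/sessions": "/me/activity?tab=sessions",
}

# Removed outright: there is no redirect target, so requests get a 404.
REMOVED_PATHS = frozenset({"/profile", "/me/debug", "/tokens"})


def resolve_legacy_path(path: str) -> Optional[Tuple[int, Optional[str]]]:
    """Classify a request path against the retired routes.

    Returns (301, location) for consolidated pages, (404, None) for pages
    removed outright, or None when the path is not a retired route and
    should go through normal routing (for example the unchanged
    /profile/sessions/{filename} download endpoint).
    """
    if path in REDIRECTS_301:
        return 301, REDIRECTS_301[path]
    if path in REMOVED_PATHS:
        return 404, None
    return None
```

This sketch uses exact matching on purpose. A prefix match on `/profile/sessions` would also catch the unchanged `/profile/sessions/{filename}` download endpoint and redirect it.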
### Internal

- `/me/activity` and `/me/profile` use the canonical design-system primitives (`.data-table`, `.stat-card`, `.btn`) from the v0.54.10 design pass rather than bespoke per-page CSS; `stats-table` is added to the design-system contract test's deprecated-class list. `me_debug.py` is slimmed to a session-diagnostics helpers module; the profile page is composed from the `_profile_tokens.html` and `_profile_troubleshooting.html` partials.
- Documentation tree cleaned up and consolidated.
  - `CLAUDE.md` rewritten (708 → ~320 lines). The four overlapping release sections, the stale `v1→v35` DuckDB schema history, and the marketplace endpoint internals moved out to focused docs, and preachy process sections were tightened.
  - New `docs/RELEASING.md` (release process + deploy workflows + CI quirks, with `RELEASE_TEMPLATE.md` folded in as an appendix) and `docs/marketplace.md` (marketplace ingestion + re-serving internals).
  - Historical planning artifacts (`docs/superpowers/`, 52 files) and dated one-off docs (`HACKATHON.md`, `pd-ps-comments.md`, `security-audit-2026-04.md`, `future/NOTIFICATIONS.md`) moved under `docs/archive/`.
  - New `docs/README.md` documentation index organized by audience, linked from `README.md` and `CLAUDE.md`. Removed the `docs/auto-install.md` stub.
  - Fixed dangling doc links in `connectors/jira/README.md` and `dev_docs/README.md`. Code/doc references were repointed to the archived paths, or the pointer was dropped where the target was already a dead reference on `main`.
  - Added a root `AGENTS.md` pointing to `CLAUDE.md` as the single source of truth for any AI coding agent, and added `CLAUDE.local.md` to `.gitignore`.

## [0.54.14] — 2026-05-14

### Changed

- **Marketplace submission surfaces — clearer CTA + fuller guides (#308).** The curated-tab action-row CTA now reads "Submit a skill or plugin" (was "Submit a plugin"), since skills are first-class on the curated shelf. The same wording is mirrored in the empty-state JS and the route titles so the surfaces can't drift.
  The curated guide (`/marketplace/guide/curated`) grows from a 4-line stub into a 3-step walkthrough of the Named Curator handoff, plus a `.guide-fastpath` callout pointing lighter submissions at the Flea Market. The flea guide (`/marketplace/guide/flea`) grows from a 3-line stub into a 4-step walkthrough of the `/store/new` self-serve flow and its automated guardrails (manifest, content-quality, and prompt-injection scans).

### Fixed

- **`agnes refresh-marketplace` now enables stack plugins in workspace settings (#307).** The reconcile step previously stopped at `claude plugin install --scope project`, which only writes the global plugin registry (`~/.claude/plugins/installed_plugins.json`). Without a corresponding entry in the workspace `.claude/settings.json` `enabledPlugins` map, Claude Code treats every installed stack plugin as disabled: `/plugins` hides them from the active section, and their slash commands, skills, and agents are unreachable. Refresh now writes `"@agnes": true` to the workspace settings file after install and update. This treats the user's marketplace stack as the source of truth and re-enables any plugin that a prior local `claude plugin disable` had turned off.
- **Runtime CLI commands now work on Initial Workspace Template (override) workspaces (#307).** The `.claude/init-complete` sentinel carrying `override: true` previously short-circuited **every** Agnes writer to `.claude/`, which trapped admin-templated workspaces at a stale snapshot. `agnes refresh-marketplace` couldn't write the `enabledPlugins` map (so the fix above stayed inert), and `agnes self-upgrade`'s `maybe_refresh_claude_hooks` couldn't migrate workspaces to new Agnes hook layouts. The sentinel was meant to gate the **init-time** skip only (letting admins ship the *initial* `.claude/` contents), not to lock the workspace permanently.
  The override check moves out of the writers (`cli/lib/hooks.py::install_claude_hooks`, `cli/lib/hooks.py::maybe_refresh_claude_hooks`, `cli/lib/commands.py::install_claude_commands`, `cli/commands/refresh_marketplace.py::_enable_plugins_in_workspace_settings`). It now lives at the init-time call site, which was always the right place (`cli/commands/init.py::init`, `if not override_active:`). Init-time behavior is unchanged: `agnes init` on an override workspace still defers the workspace skeleton to the admin's template. Admin custom hooks survive runtime refresh because Agnes only rewrites entries matching `_OUR_COMMAND_MARKERS` (the `agnes self-upgrade` / `agnes pull` / ... substring set in `cli/lib/hooks.py`). Foreign commands fall through unchanged, the same contract as in default workspaces. Existing override workspaces auto-converge on the next `agnes self-upgrade` (which fires from every SessionStart hook); no manual operator action is needed. This retracts the earlier *"full responsibility transfer; future Agnes hook fixes will NOT auto-propagate"* contract documented in the `[0.54.10]` `### Internal — risk-accepted by design` bullets, whose scope was wider than the feature's actual intent.

### Fixed

- `/me/activity` page hero subtitle now escapes `user.email` before concatenating it into the `| safe`-rendered subtitle. The raw concatenation bypassed Jinja2 auto-escaping, an XSS regression relative to the auto-escaped `me_stats.html` on `main`.

### Removed

- **`/home` connectors block dropped — the onboarding flow covers it (#305).** The dedicated section on `/home` had three tiles (Asana, Google Workspace, Atlassian), each with a "Copy prompt" button. It duplicated content that the install-hero's Step 4 clipboard payload already inlines via `app/web/setup_instructions.py::_connectors_block`, because users walking the setup script visit every connector inline. The install-hero lead paragraph now names the connector families so the benefit stays visible before kick-off. The per-instance "Email admin" mailto CTA was dropped along with the block. It was previously gated inside the GWS tile and appeared when an operator contact email was set and GWS OAuth was unconfigured. The GWS connector setup prompt still tells the user to ask an admin, but without the pre-filled per-instance contact address.

### Internal

- Post-#305 cleanup.
  - Removed the now-orphaned `gws_oauth`, `instance_admin_email`, and `connector_prompts` keys from the shared `_build_context` ctx dict in `app/web/router.py`. No template referenced them once the connectors block was dropped, and `connector_prompts` was calling `all_connector_prompts()` on every page render app-wide.
  - Swept the dead `.connector-tile*`, `.connector-copy`, `.connector-preview`, `.copy-next-hint`, `.time-badge`, `.gating-note`, `.email-admin`, `.card-mini-cmd`, and `.connector-head` CSS rules, plus the orphaned `.connector-copy` click-wiring JS, from `home_not_onboarded.html`.
  - Also removed the dead `.automode-*`, `.setup-collapsible`, and `.setup-minimize` CSS blocks and the `setupMinimizeToggle` / `data-setup-minimized` JS handler from the same template. The template sections and the "Minimize setup view" toggle they styled were removed by earlier PRs (#243 onward), leaving the whole minimize-mode machinery unreachable.

## [0.54.13] — 2026-05-14

### Security

- **RBAC filter uses stable `user_id` (UUID) instead of mutable email local-part (#293).** Non-admin users querying `agnes_sessions` / `agnes_telemetry` are now filtered by `user_id` (immutable UUID) rather than `username` (email local-part, which changes on rename). Schema v45 adds a `user_id` column to `usage_session_summary` and `usage_events`; the session pipeline's `resolve_user_id()` populates it on every (re)process run. `USAGE_PROCESSOR_VERSION` bumps 3 → 4 to trigger backfill. During the transition period, RBAC queries include an OR fallback on `username` so pre-backfill rows remain visible.

## [0.54.12] — 2026-05-14

### Fixed

- **Usage processor now extracts user-typed slash invocations.** Claude Code records `/foo` and `/plugin:name` slash commands as `/foo` XML tags embedded in user message content. The previous `^\s*/` regex in `iter_events` only matched raw `/foo` prefixes, which never appear in real session jsonls. As a result, on production `usage_events.command_name` and `usage_session_summary.slash_commands` stayed NULL/0 for every slash invocation a user actually typed (`/clear`, `/exit`, `/plugin`, `/model`, plugin commands of the form `/plugin:name`). Replaced the regex with a scan for those XML tags; `USAGE_PROCESSOR_VERSION` bumps 2 → 3. Operators who want to rewrite historical rows under the new logic call `POST /api/admin/usage/reprocess` (CLI: `agnes admin telemetry reprocess`). Implicit Skill tool_use extraction (LLM-decided invocations) is unchanged.

## [0.54.11] — 2026-05-14

### Changed

- Catalog page: each `catalog_data` bucket now renders as its own top-level Data Package card instead of being nested as a collapsible accordion under a single "Core Business Data" wrapper.
  The page hero title ("Data Packages") now describes the actual visual structure, and the card grain matches the `bucket` column on `table_registry`. Tables inside each package are flat-listed (no per-bucket accordion), mirroring the existing `Agnes Internal` card; the `Agnes Internal` and `Business Metrics` cards themselves are unchanged. Per-table sync info ("Synced …" / "Queried directly from BigQuery") on each row is preserved. The aggregate meta line ("N tables · ~M rows total · Synced X") on the old wrapper is dropped with no replacement, so the global sync timestamp is no longer shown on this page. An instance with zero registered tables now renders no Data Package cards at all, whereas the old wrapper always rendered (showing "0 tables").

## [0.54.10] — 2026-05-14

### Changed

- Web UI design system unified: single stylesheet (`style-custom.css`), canonical primitives for buttons, form controls, page headers, tables, empty states, toasts, and stat cards. Top-nav Admin entry now shares styling 1:1 with sibling links (font, color, padding, hover, active state) — previously a `