# Changelog All notable changes to Agnes AI Data Analyst. Format: [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Versions follow [Semantic Versioning](https://semver.org/spec/v2.0.0.html), pre-1.0 — public surface (CLI flags, REST endpoints, `instance.yaml` schema, `extract.duckdb` contract) may shift between minor versions; breaking changes called out under **Changed** or **Removed** with the **BREAKING** marker. CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every CI build; semver tags (`v0.X.Y`) are cut at release boundaries and reference the same commit as a `stable-*` tag from the same day. --- ## [Unreleased] ## [0.47.0] — 2026-05-07 Catalog metadata enrichment + cache discipline + automatic warmup. Closes #155 + #156. ### Added - **`/api/v2/catalog` returns four new optional fields per row** — `rows`, `size_bytes`, `partition_by`, `clustered_by` — populated by per-source-type metadata providers (`connectors/bigquery/metadata.py`, `connectors/keboola/metadata.py`). For `query_mode='remote'` BigQuery rows, `size_bytes` is `active_logical_bytes + long_term_logical_bytes` (a full scan reads both); region resolved from `data_source.bigquery.location` → `bq_client.get_dataset(...)` → fall back to legacy `__TABLES__`. Existing CLI consumers reading only `rough_size_hint` are unaffected. - **Automatic cache warmup at startup.** FastAPI startup event schedules a background task that walks BQ remote rows and pre-populates `_metadata_cache` + `_schema_cache` with bounded concurrency (default 4, tunable via `AGNES_WARMUP_CONCURRENCY`). Doesn't block readiness; per-row failures logged + skipped. Opt-out via `AGNES_SKIP_CACHE_WARMUP=1`. - **Three new admin endpoints under `/api/admin/cache-warmup/*`:** - `GET /status` — JSON snapshot of the latest run. - `POST /run` — manual trigger, idempotent under concurrent invocation. - `GET /stream` — Server-Sent Events with `start` / `row` / `complete` events for live UI updates. - **`/admin/tables` cache freshness panel.** Toolbar above the per-source-type listings with progress bar + "Re-warm all" button + collapsible terminal-style log fed by SSE (polling fallback at 3 s). Per-row badge in the existing `col-status` column updates live (fresh / warming / pending / error). - **`docs/admin/query-modes.md`** — source-agnostic admin reference for registering tables as `local` / `remote` / `materialized`. Decision tree, per-source-type IAM + setup, three worked examples. Linked from the `?` icon next to the `query_mode` field in the admin UI edit modal and from the third post-register hint in `agnes admin register-table`. - **`agnes admin register-table` post-register hint** for `query_mode=remote`: points at `agnes query --remote "SELECT COUNT(*)..."` as the IAM smoke check so a missing `dataViewer` / `jobUser` surfaces at registration time, not 30 minutes later. ### Changed - **`/api/v2/schema/{id}` cache miss now does 1 BQ job instead of 2.** `connectors/bigquery/access.py:fetch_bq_columns_full` collapses what used to be `_fetch_bq_schema` + `_fetch_bq_table_options` into a single `INFORMATION_SCHEMA.COLUMNS` query (same view, same predicate, just a combined SELECT list). The metadata provider's partition/cluster path shares the same helper — zero SQL duplication across the two consumers. - **All four catalog/schema/sample/metadata caches are flushed on registry change.** `app/api/v2_catalog.py:invalidate_for_table` is wired into `POST /api/admin/register-table`, `PUT /api/admin/registry/{id}`, and `DELETE /api/admin/registry/{id}`. After a registry write, a single-row re-warm task is scheduled in the background so the admin's verification request hits warm caches within ~1 s instead of waiting for the next analyst miss. Pre-fix none of the caches were invalidated — admin registers a table, `agnes catalog` doesn't show the new row for up to 5 min; admin updates a row's bucket, `agnes schema` returns the OLD column list for up to 1 hour. - **`v2_schema.build_schema` split into RBAC-aware outer + RBAC-naive inner (`build_schema_uncached`).** Live endpoint behavior unchanged; warmup uses the inner entry point to populate `_schema_cache` without a user context. ### Internal - New shared dataclass module `app/api/_metadata_models.py` with `MetadataRequest` (frozen) + `TableMetadata` for source-agnostic provider input/output. - New `connectors/keboola/storage_api.py:KeboolaStorageClient.get_table_info` thin wrapper — keeps `_get` private to the module. - New env vars (operator-facing tuning, no required setup change): - `AGNES_SKIP_CACHE_WARMUP` — opt-out of startup warmup. - `AGNES_WARMUP_CONCURRENCY` — default 4, max parallel BQ INFORMATION_SCHEMA jobs during a warmup pass. - New runtime dependency: `sse-starlette>=2.0` (Server-Sent Events responses for the cache-warmup stream). - Tests added: `test_metadata_models`, `test_v2_schema_columns_consolidation`, `test_v2_catalog_dispatcher`, `test_connectors_bigquery_metadata`, `test_connectors_keboola_metadata`, `test_v2_catalog_remote_metadata`, `test_v2_catalog_invalidation`, `test_cache_warmup`, `test_main_startup_warmup`, `test_admin_tables_warmup_ui`. ## [0.46.5] — 2026-05-07 ### Fixed - `agnes describe -n 5` previously failed with `Missing argument 'TABLE_ID'` because the command was registered as a `Typer.Typer` subcommand group; the combination of positional `table_id` + short option `-n INTEGER` mis-parses in that pattern. Switched to a flat `@app.command("describe")` registration. All forms (`-n` before/after positional, `--rows=N`, default n=5) now parse correctly. Surfaced from a real analyst session following the CLAUDE.md "agent rails" discovery workflow. - `/api/v2/sample/` (called by `agnes describe`) returned HTTP 500 with `ValueError: Out of range float values are not JSON compliant: nan` when the result rows contained NaN values from the underlying DuckDB / BigQuery scan. The endpoint now sanitizes NaN/±inf to JSON `null` before serialization. Same surfaced from a real analyst session. ## [0.46.4] — 2026-05-07 ### Fixed - SessionEnd `agnes push` hook previously synchronous-ran in the foreground; Claude Code's `-p` (headless) mode terminates SessionEnd hook subprocesses after ~1 second regardless of work in progress, so the upload was killed mid-stream and most session JSONLs never reached the server. Now wrapped in `bash -c "( nohup agnes push ... & ) ; true"` so the upload child detaches from the hook subprocess and survives Claude's aggressive shutdown. Existing workspaces pick up the detached form on their next `agnes init` invocation via the existing migration path. Verified end-to-end against production: `claude -p` exited in 5s, the detached child completed the upload, and the session JSONL landed on the server within 30s. ## [0.46.3] — 2026-05-07 ### Added - `agnes init` now installs a third SessionStart hook entry (`agnes push --quiet`) so orphan session JSONLs left behind by `claude -p` headless invocations (where Claude Code does NOT fire SessionEnd) or abnormal exits get uploaded on the next interactive session start. Symmetric self-healing alongside the existing `agnes pull` SessionStart entry. Existing workspaces pick up the third entry on their next `agnes init` invocation via the existing migration path in `cli/lib/hooks.py:_OUR_COMMAND_MARKERS`. ### Fixed - `agnes diagnose` `session_pipeline` warning previously read "uploads are not being processed", which led users to suspect their `agnes push` uploads were failing. The warning now reads "verification-detector backlog" and includes `last_processed` so operators see at a glance that uploads are fine and only the LLM extraction step is behind. ## [0.46.2] — 2026-05-07 ### Fixed - `agnes query` against a `query_mode='remote'` table previously surfaced DuckDB's misleading "did you mean " suggestion. Now appends a friendlier hint pointing users to `agnes catalog`, `agnes schema `, and `agnes query --remote`. Reproduces from a real analyst session where `DESCRIBE unit_economics` (a remote table) sent the user down a 30-second wrong path. ## [0.46.1] — 2026-05-07 ### Fixed - `remote_estimate_failed` now surfaces the rewritten-SQL diagnostic (the actual BQ "Unrecognized name" / "Syntax error" message) instead of the unhelpful "Table must be qualified" from the user-original-SQL retry. Adds `underlying_original` for the second-attempt context. Hint now points users to `agnes schema ` first — the typical cause is a typo'd column name. ## [0.46.0] — 2026-05-07 Catalog metadata enrichment + cache discipline + automatic warmup. Closes #155 + #156. ### Added - **`/api/v2/catalog` returns four new optional fields per row** — `rows`, `size_bytes`, `partition_by`, `clustered_by` — populated by per-source-type metadata providers (`connectors/bigquery/metadata.py`, `connectors/keboola/metadata.py`). For `query_mode='remote'` BigQuery rows, `size_bytes` is `active_logical_bytes + long_term_logical_bytes` (a full scan reads both); region resolved from `data_source.bigquery.location` → `bq_client.get_dataset(...)` → fall back to legacy `__TABLES__`. Existing CLI consumers reading only `rough_size_hint` are unaffected. - **Automatic cache warmup at startup.** FastAPI startup event schedules a background task that walks BQ remote rows and pre-populates `_metadata_cache` + `_schema_cache` with bounded concurrency (default 4, tunable via `AGNES_WARMUP_CONCURRENCY`). Doesn't block readiness; per-row failures logged + skipped. Opt-out via `AGNES_SKIP_CACHE_WARMUP=1`. - **Three new admin endpoints under `/api/admin/cache-warmup/*`:** - `GET /status` — JSON snapshot of the latest run. - `POST /run` — manual trigger, idempotent under concurrent invocation. - `GET /stream` — Server-Sent Events with `start` / `row` / `complete` events for live UI updates. - **`/admin/tables` cache freshness panel.** Toolbar above the per-source-type listings with progress bar + "Re-warm all" button + collapsible terminal-style log fed by SSE (polling fallback at 3 s). Per-row badge in the existing `col-status` column updates live (fresh / warming / pending / error). - **`docs/admin/query-modes.md`** — source-agnostic admin reference for registering tables as `local` / `remote` / `materialized`. Decision tree, per-source-type IAM + setup, three worked examples. Linked from the `?` icon next to the `query_mode` field in the admin UI edit modal and from the third post-register hint in `agnes admin register-table`. - **`agnes admin register-table` post-register hint** for `query_mode=remote`: points at `agnes query --remote "SELECT COUNT(*)..."` as the IAM smoke check so a missing `dataViewer` / `jobUser` surfaces at registration time, not 30 minutes later. ### Changed - **`/api/v2/schema/{id}` cache miss now does 1 BQ job instead of 2.** `connectors/bigquery/access.py:fetch_bq_columns_full` collapses what used to be `_fetch_bq_schema` + `_fetch_bq_table_options` into a single `INFORMATION_SCHEMA.COLUMNS` query (same view, same predicate, just a combined SELECT list). The metadata provider's partition/cluster path shares the same helper — zero SQL duplication across the two consumers. - **All four catalog/schema/sample/metadata caches are flushed on registry change.** `app/api/v2_catalog.py:invalidate_for_table` is wired into `POST /api/admin/register-table`, `PUT /api/admin/registry/{id}`, and `DELETE /api/admin/registry/{id}`. After a registry write, a single-row re-warm task is scheduled in the background so the admin's verification request hits warm caches within ~1 s instead of waiting for the next analyst miss. Pre-fix none of the caches were invalidated — admin registers a table, `agnes catalog` doesn't show the new row for up to 5 min; admin updates a row's bucket, `agnes schema` returns the OLD column list for up to 1 hour. - **`v2_schema.build_schema` split into RBAC-aware outer + RBAC-naive inner (`build_schema_uncached`).** Live endpoint behavior unchanged; warmup uses the inner entry point to populate `_schema_cache` without a user context. ### Internal - New shared dataclass module `app/api/_metadata_models.py` with `MetadataRequest` (frozen) + `TableMetadata` for source-agnostic provider input/output. - New `connectors/keboola/storage_api.py:KeboolaStorageClient.get_table_info` thin wrapper — keeps `_get` private to the module. - New env vars (operator-facing tuning, no required setup change): - `AGNES_SKIP_CACHE_WARMUP` — opt-out of startup warmup. - `AGNES_WARMUP_CONCURRENCY` — default 4, max parallel BQ INFORMATION_SCHEMA jobs during a warmup pass. - New runtime dependency: `sse-starlette>=2.0` (Server-Sent Events responses for the cache-warmup stream). - Tests added: `test_metadata_models`, `test_v2_schema_columns_consolidation`, `test_v2_catalog_dispatcher`, `test_connectors_bigquery_metadata`, `test_connectors_keboola_metadata`, `test_v2_catalog_remote_metadata`, `test_v2_catalog_invalidation`, `test_cache_warmup`, `test_main_startup_warmup`, `test_admin_tables_warmup_ui`. ## [0.45.0] — 2026-05-07 Operator-and-analyst quality bundle: a security fix for the optional Telegram bot, two CLI gaps closed, and three rounds of UX polish on `agnes diagnose` and `agnes pull` so non-TTY consumers (CI runners, Claude Code SessionStart hooks, sub-agent watchdogs) get readable, actionable signal. Closes #84, #164, #177, #178, #203, #204. ### Security - **Telegram bot pairing-code RNG hardened (#84).** The pairing code used to link a Telegram chat to an Agnes user is now generated via `secrets.choice` (CSPRNG) rather than `random.choices`. Pre-fix an attacker who scraped one issued code could recover the `random` module's PRNG state and predict subsequent codes issued in the same process — the fix neutralizes that class of attack (`services/telegram_bot/storage.py:_generate_code`). - **Telegram script runner refuses out-of-shape usernames (#84).** The optional notification runner shells out via `sudo -u `. A username controlled by an attacker — e.g. via tampering with `telegram_users.json` — could otherwise carry sudo flags (`-u`, `--shell=…`) or shell metacharacters. The runner now validates the value against a POSIX-conservative regex (`^[a-z_][a-z0-9._-]{0,31}$`) and returns `None` before invoking `subprocess.run` if it doesn't match (`services/telegram_bot/runner.py:_USERNAME_RE`). ### Added - `agnes admin unregister-table ` — CLI wrapper for `DELETE /api/admin/registry/{id}` (#177). Confirms before destructive action; pass `--yes` to skip the prompt in scripts. The server-side endpoint already does the parquet/`sync_state` cleanup; the CLI is a thin client. - `agnes admin update-table ` — CLI wrapper for `PUT /api/admin/registry/{id}` (#177). Only the supplied flags go in the body (`--name`, `--bucket`, `--source-table`, `--query-mode`, `--query`, `--description`, `--sync-schedule`, `--source-type`); the rest stay unchanged on the server. `--query` accepts `@path/to.sql` for files. Calling with no flags errors (`No fields supplied`) instead of silently no-opping. - `agnes diagnose --include-schema` (#204). The default `agnes diagnose` no longer surfaces the DB schema-version check — analysts hitting the CLI rarely care about the integer, and it dominated the agent-facing output. Pass `--include-schema` (or query `/api/health/detailed?include=schema` directly) when verifying a migration. - **`info` severity tier in `/api/health/detailed`** (#178). Sits between `ok` and `warning`: surfaces a non-trivial observation worth reading without promoting the headline status to `degraded`. See the module docstring at `app/api/health.py` for the full severity ladder. The BQ billing-equals-data check is the first consumer (was `warning` → now `info`). - `AGNES_PULL_PROGRESS_INTERVAL_SECONDS` and `AGNES_PULL_PROGRESS_INTERVAL_BYTES` env knobs for the textual progress emitter (#203). Defaults are tighter than pre-fix (5 s / 1 MiB vs the previous 30 s / 10%-of-total) so non-TTY consumers see continuous output and don't trip dead-process watchdogs on multi-GB parquets. Override either independently. ### Changed - **`agnes pull` non-TTY progress is more chatty by default (#203).** Previous cadence (30 s / 10%) produced one line every several minutes on multi-GB parquets, long enough for Claude Code sub-agent watchdogs to kill the pull as a hung process. New defaults: emit when *any* of (10% boundary, 5 s elapsed, 1 MiB bytes since last emit). The 10% boundary is unchanged so small files still get the original visual rhythm. - **`/api/health/detailed` no longer includes `db_schema` by default (#204).** Pass `?include=schema` to opt back in. The aggregator treats the schema check as "not asserted" when absent, so unrelated services can still drive the headline. Operators using the legacy entry should add the parameter to their probe configuration. - **BQ billing-project equals data-project surfaces as `info`, not `warning` (#178).** Many valid single-project dev instances run with billing == data; the message is informational. The `detail` + `hint` strings are unchanged so the operator still gets the USER_PROJECT_DENIED context if they're hitting it. Pre-fix, the message alone promoted the overall headline to `degraded` even on intentionally collapsed setups. - `agnes init --force` now snapshots the prior `CLAUDE.md` to `CLAUDE.md.bak.` before regenerating it (#164). Each re-run produces a fresh backup; the prior backup is not clobbered. A FS error on the backup path is logged but does not abort the init (the existing-workspace gate still requires `--force`). ### Internal - New `cli.client.api_put` helper to mirror `api_get` / `api_post` / `api_delete` / `api_patch` for the new `update-table` command. - Tests added: `tests/test_telegram_bot_runner.py`, `tests/test_health_schema_gate.py`, plus extensions to `test_telegram_storage`, `test_pull_progress`, `test_diagnose_billing`, `test_cli_admin`, `test_cli_init`. - `infra/modules/customer-instance` (tag `infra-v1.8.0`): `startup-script.sh.tpl` no longer overwrites operator-edited `AGNES_TAG` / `AGNES_TEMP_DIR` in `/opt/agnes/.env` on every boot. Reads the existing values when present and lets them win over the template-computed `$IMAGE_TAG`. Pre-fix, an in-place TF action that stopped/started the VM (e.g. `machine_type` change) would re-run the startup script and clobber any manually-pinned image tag — operators had to re-edit the file post-restart. Fresh provisions still get the TF-driven values; the `.env` file's existence is the disambiguator. To force a TF-driven reset, `rm /opt/agnes/.env` and reboot. Folded in from #214, which landed on main between 0.44.1 and this cut. ## [0.44.1] — 2026-05-07 ## [0.44.1] — 2026-05-07 ### Fixed - `/admin/users/{id}` — "Add to group" dropdown explains itself when empty instead of leaving the admin staring at a silent `— Pick a group —` placeholder. Three cases now surface a hint below the picker: (a) user is already in every group, (b) every remaining group is Google-Workspace-managed and Agnes can't grant manually (POST would 409 — link to `/admin/groups` to create a custom group), (c) no groups exist at all. Pre-fix on deployments where `Admin` + `Everyone` are mapped via `AGNES_GROUP_{ADMIN,EVERYONE}_EMAIL` and no custom groups exist, the picker was empty with zero indication that the operator needed to create a custom group first. - `/admin/users/{id}` — "Add to group" dropdown's `loadAll()` race fixed: pre-fix `loadGroups()` and `loadMemberships()` ran in parallel and `refreshGroupDropdown()` (called from `loadGroups`) read the `memberships` global, which could still be `[]` if memberships hadn't returned yet — letting the dropdown show groups the user was already in. `loadMemberships()` now re-runs the dropdown refresh once it has its data, so the final render reflects both data sets regardless of which fetch completes first. ## [0.44.0] — 2026-05-07 ### Added - `agnes refresh-marketplace` — single CLI command that owns the per-user filtered Claude Code marketplace lifecycle. `--bootstrap` does the first-time setup: clones the per-user marketplace bare repo to `~/.agnes/marketplace`, strips the PAT from the cloned origin URL so it doesn't sit in plaintext at rest, registers the local path with Claude Code, and installs every plugin in the served manifest at `--scope project`. Without `--bootstrap` it does an incremental refresh: fetch + reset to the remote, then version-aware reconcile (install missing plugins, update on version diff, skip on match). Plugins removed from the manifest are deliberately NOT auto-uninstalled — a transient empty manifest from the server would otherwise wipe the user's stack. - `agnes init` now installs a SessionStart hook that runs `agnes refresh-marketplace --quiet` on every Claude Code session, alongside the existing chained `agnes self-upgrade; agnes pull` entry. The marketplace refresh runs as a *separate* hook entry (not chained) so a failure (e.g. fresh workspace with no clone yet) doesn't suppress the data pull. The refresh command is wrapped in `bash -c "..."` because Claude Code on Windows runs hook commands directly without a shell, which would otherwise leave the `2>/dev/null || true` syntax uninterpreted. - When `agnes refresh-marketplace` detects an actual change, it emits Claude Code hook JSON on stdout — `systemMessage` (transient toast) and `additionalContext` (model-side system reminder) — both pointing at `/reload-plugins` so the running session loads new plugins without a restart. ### Changed - Install-prompt step 5 (in the dashboard-served setup payload) collapses from a 15-line inline shell sequence — `rm -rf` + `git clone` + per-plugin `claude plugin install` calls — to a single `agnes refresh-marketplace --bootstrap` invocation. The old inline form tripped Claude Code's agent `rm -rf` permission gate on first run. - `scripts/dev/agnes-client-reset.sh`: now cleans `~/.claude/plugins/{marketplaces,cache}/agnes`, drops the uv build cache, and documents workspace-scoped residue that can't be enumerated from a user-level reset. ### Internal - `infra/modules/customer-instance` (tag `infra-v1.7.0`): `google_compute_instance.vm` now sets `allow_stopping_for_update = true`. Without it, changing `machine_type` (or any other field GCP will only mutate on a stopped VM) caused Terraform to fall back to a destroy + recreate, churning VM-local state for what should be an in-place resize. Consumers do not need to update — the field is provider-side only — but bumping the module ref to `infra-v1.7.0` enables in-place machine-type bumps. ## [0.43.0] — 2026-05-06 ### Added - CLI auto-upgrade: `agnes self-upgrade` reinstalls the CLI from the server's currently-shipped wheel via `uv tool install --force`, falling back to `pip install --force-reinstall --no-deps` via `sys.executable` when uv is not on PATH. After install, the new binary is smoke-tested at the install-resolved path (`uv tool dir --bin` for uv, `/agnes` for pip) — never via PATH lookup, to avoid stale-shadow false positives. Smoke failure triggers automatic rollback to the previously verified-good wheel (recorded in `~/.config/agnes/last_known_good.json`); rollback's exit code is captured and surfaced on stderr if it also fails. First-ever upgrade or unrecoverable rollback prints the canonical bootstrap recovery: `curl -fsSL /cli/install.sh | bash`. The new command is wired into the SessionStart hook installed by `agnes init` as a chained shell entry (`agnes self-upgrade … || true; agnes pull … || true`) so an upgrade failure does not block the pull. - Server: `/api/*` responses now carry `X-Agnes-Latest-Version` and `X-Agnes-Min-Version` headers. CLIs older than `X-Agnes-Min-Version` exit with **code 2** and a remediation message instead of failing on a wire-protocol mismatch. Day-one floor is `0.0.0` (no enforcement) — bump `MIN_COMPAT_CLI_VERSION` in `app/version.py` in the same PR that ships a deliberate wire break. - CLI: `cli/update_check.py:check()` accepts a keyword-only `bypass_disabled=True` so explicit `agnes self-upgrade` invocations probe `/cli/latest` even when `AGNES_NO_UPDATE_CHECK=1` is set (which silences the implicit warning loop only). ## [0.42.0] — 2026-05-06 ### Fixed - `agnes query --remote`: full backtick BigQuery paths in user SQL are no longer corrupted by the registered-name rewriter. Previously a query like ``SELECT … FROM `..
` WHERE …`` whose table name happened to be registered as a bare-name alias would have the alias re-substituted *inside* the backtick path, producing malformed SQL that BigQuery rejected with a parse error. The cap-guard then fell back to a filter-less `SELECT *` size estimate (often orders of magnitude larger than the real scan), blocking the query as `remote_scan_too_large`. Issue #201. ### Changed - `agnes query --remote`: cap-guard fallback no longer estimates from a synthetic `SELECT *` when the rewritten SQL fails dry-run. It first retries the user's original SQL (handles BQ-native input cleanly), and only when *that* also fails returns a structured `remote_estimate_failed` HTTP 400 with a hint instead of silently over-estimating. - **BREAKING (clients matching error kinds)**: failure to estimate remote-query scan size now returns `kind="remote_estimate_failed"` instead of being masked as `remote_scan_too_large` caused by over-estimation. Operators that grep for the old kind in dashboards should update. ### Security - `agnes query --remote`: full backtick BigQuery paths are now registry-gated identically to `bq.""."
"` syntax. Previously, full backtick paths bypassed Agnes RBAC entirely — only the configured service account scope limited what users could query. New `bq_path_cross_project` (when the project ≠ configured data project) and `bq_path_not_registered` (when path is unknown) error kinds. Issue #201. ## [0.41.0] — 2026-05-06 ### Fixed - **Orchestrator filesystem fallback for materialized parquets that couldn't register in `extract.duckdb`'s `_meta`** (`src/orchestrator.py:_attach_and_create_views`). The 0.40.0 fix in `materialize_query` opens `extract.duckdb` from a fresh DuckDB handle to write the `_meta` row + inner view; in production the same uvicorn process already holds `extract.duckdb` ATTACHed read-only as the source-name alias under the orchestrator's analytics connection, and DuckDB's single-process file-handle uniqueness rejects the second open with `Binder Error: Unique file handle conflict: Cannot attach "extract" — already attached by database ""`. The 0.40.0 helper logs WARNING and falls through; parquet stays canonical, but the master view never appears via the meta path. This release adds a second pass at the end of `_attach_and_create_views`: scan `/data/*.parquet` and create a master view via `read_parquet('')` for any parquet whose `` is not already in the per-source `tables` list (i.e. the meta path didn't pick it up). Decoupled from `materialize_query`'s open-handle race; robust against any registration drift between materialize and rebuild. Honors the same `view_ownership` / cross- connector collision rules as the meta path (first-come-first-served via `view_repo.claim`). Tests cover: fallback fires when meta row is missing; fallback skips when meta path already created the view (no shadow); invalid identifier in parquet stem is skipped without crash; source without `data/` subdir doesn't crash the scan. ## [0.40.0] — 2026-05-06 ### Fixed - **Materialized BigQuery parquets now register themselves in `extract.duckdb` so the master view actually appears** (`connectors/bigquery/extractor.py:materialize_query`). Pre-fix the function wrote the `.parquet` to disk and returned the row count, but **never** wrote a `_meta` row or an inner view in the connector's `extract.duckdb`. The orchestrator's `rebuild()` scans `_meta` to decide which master views to create, so materialized tables remained invisible: `agnes query "SELECT … FROM "` returned HTTP 400 *"registered as query_mode='materialized' but is not yet materialized in this instance's analytics views"* even though the parquet was sitting there. Symptom appeared after every container recreate (image upgrade) and after every `_create_meta_table` cycle in the extractor subprocess (which `DROP TABLE IF EXISTS _meta` + `CREATE TABLE` cleanly each pass — wiping any prior materialized rows). Fix: after the atomic `os.replace(tmp_path, parquet_path)`, open `extract.duckdb` and `DELETE FROM _meta WHERE table_name = ? + INSERT + CREATE OR REPLACE VIEW AS SELECT * FROM read_parquet('')` inside a single transaction. Idempotent, fail-soft (parquet remains canonical, the next sync pass recovers any registration drift). When `extract.duckdb` doesn't exist yet (fresh BQ-only deployment), the fix logs and continues — the next extractor pass creates the file and the master view appears on the rebuild after that. ## [0.39.0] — 2026-05-06 ### Performance - **`/api/query` (and `agnes query --remote`) now rewrites user SQL referencing `query_mode='remote'` BigQuery rows into a single `bigquery_query()` call before execute** (`app/api/query.py`). Pre-fix the master view (`CREATE VIEW AS SELECT * FROM bigquery..`) did not push WHERE / SELECT / LIMIT into BQ — the DuckDB BQ extension opened a Storage Read API session over the entire upstream table, scanning the full partitioned dataset before the local DuckDB filter ran. On 100M+ row remote-mode tables this was 50-100× slower than the equivalent direct `bigquery_query()` call (70-150 s vs 1.5 s) and frequently failed with `Response too large to return`. The rewriter (shared core with the existing dry-run helper) wraps the user's whole SQL in `bigquery_query('', '')` so the BQ planner receives the full query and applies partition pruning + projection pushdown server-side. Conservative fall-through: cross-source JOINs (BQ ↔ Keboola/Jira local), queries already containing `bigquery_query(`, and unconfigured BQ project all keep the original ATTACH-catalog path so behavior degrades gracefully. - **DuckDB BigQuery-extension session pool** (`connectors/bigquery/access.py`). `BqAccess.duckdb_session()` now acquires pre-warmed connections from a bounded process-local pool instead of running `INSTALL bigquery; LOAD bigquery; CREATE SECRET; ATTACH …` on every request. Each acquire saves the ~0.5 s extension-load + secret-creation cost when the pool has a warm entry; auth SECRET is refreshed on acquire so a long-lived pooled entry doesn't keep a stale GCE metadata token past its TTL. Pool size is configurable via `data_source.bigquery.session_pool_size` (default 4; sentinel `0` disables pooling). Affects every BQ-touching path — `/api/query`, `/api/v2/scan`, `/api/v2/sample`, `/api/v2/schema`, materialize, and the orchestrator's remote-attach. - **`agnes pull` chunked download for large parquets**: when the server advertises `accept-ranges: bytes` and a parquet exceeds `AGNES_PULL_CHUNK_THRESHOLD_BYTES` (default 50 MB), the CLI now splits the file into N parallel HTTP Range requests (`AGNES_PULL_CHUNK_PARALLELISM`, default 4, capped 1..16) and assembles the parts into the destination atomically. Targets the per-flow-shaped network (corp VPN with per-TCP-connection rate-limiting) where a single stream is throttled but N parallel streams over the same connection scale roughly linearly. Falls back to single-stream when the server responds 200 instead of 206 to a Range probe, when no `accept-ranges: bytes` is advertised, or when content is below the threshold — no behavior change in the small-file / non-cooperating- server cases. - **Persistent HTTP/2 client across `agnes pull`**: `stream_download` now routes through a process-wide pooled `httpx.Client` so N parquet downloads share a single TLS handshake; HTTP/2 multiplexing (when the optional `h2` package is installed) lets all chunk Range requests share one TCP connection. Gracefully falls back to HTTP/1.1 pooling when `h2` is missing — no crash, just slightly less benefit. ### Fixed - **BigQuery `responseTooLarge` no longer surfaces as a generic 400 / 502 with the raw upstream message** (`connectors/bigquery/access.py`). The `translate_bq_error` helper now classifies "Response too large to return" errors via a dedicated `bq_response_too_large` kind (HTTP 400) with an actionable hint pointing at the WHERE / aggregation / materialized-table remediations. Pre-fix this failure mode fell through to the generic `bq_bad_request` mapping, which implied the user's SQL had a syntax error — wrong root cause. Affects every BQ-touching path (`/api/query`, `/api/v2/scan`, `/api/v2/sample`, `/api/v2/schema`, materialize) since they all share `translate_bq_error`. ### Added - New optional dependency `h2>=4.1.0` (HTTP/2 transport for httpx). Pure performance — `agnes pull` works on HTTP/1.1 if the install skips it. - **Textual progress fallback for non-TTY `agnes pull`**: when stderr is not a terminal (Claude Code SessionStart hook, CI runner, Docker log capture, …), `agnes pull --no-quiet` now emits a plain-text progress line per file at most every 10% or 30 s, plus a final completion line. Replaces the previous Rich-bar-on-pipe behavior that either suppressed output entirely or leaked ANSI escape sequences. TTY path unchanged (Rich progress bar with bytes / speed / ETA, aggregated per-file across chunked-download chunks). ## [0.38.3] — 2026-05-06 ### Changed - **Admin / Tables**: registry table now shows Source (bucket/table), Schedule, Folder, Registered by/at, and a sync-error warning icon per row. The page widens to ~1600px to accommodate. ### Fixed - **Admin / Tables**: long table descriptions no longer push the row's Edit / Manage access / Delete buttons off-screen. The Description column is now clamped to 2 lines with the full text available on hover and in the Edit modal. - **Admin / Tables**: descriptions stored with shell-quoting backslash-escapes (`Don\'t`, `\n`) now render correctly. The same normalization also runs at register/update time so newly-saved descriptions are never corrupted. - **Admin / Tables**: `scripts/fix_description_escapes.py` cleans up already-corrupted descriptions in `table_registry` (run with `--dry-run` first, then `--apply`). ## [0.38.2] — 2026-05-06 ### Fixed - **`bq_query_timeout_ms` was not applied on every BigQuery ATTACH branch** (`src/db.py:_reattach_remote_extensions`, `src/orchestrator.py:_attach_remote_extensions`). Pre-fix only the metadata-token branch (the BqAccess contract, `token_env=''`) called `apply_bq_session_settings`. BigQuery sources registered with an explicit `token_env`, or with no auth env, ATTACH'd without ever applying the timeout — falling back to the extension's 90 s default. Default-config operators on those branches now consistently get the configured 600 s (or whatever `data_source.bigquery.query_timeout_ms` is set to). - **`apply_bq_session_settings` swallowed every `Exception` silently** (`connectors/bigquery/access.py`). Two realistic failure modes — the BigQuery extension not yet loaded on the connection, or an installed extension version that doesn't recognise the setting — left the 90 s default in place with no log line explaining why. Each failure path now logs `WARNING` with the actionable cause; on success the applied value is verified via a `current_setting('bq_query_timeout_ms')` readback (catches the silent-ignore mode some extension versions exhibit) and a mismatch logs `WARNING` too. ## [0.38.1] — 2026-05-06 ### Internal - `CLAUDE.md` — `Claude Code marketplace endpoint` section now documents the two-step fallback (system `git clone` + local `claude plugin marketplace add`) for users registering manually against a private-CA Agnes instance. Bun-compiled `claude` ignores the OS trust store and CA env vars on the marketplace HTTPS path, so direct `/plugin marketplace add` over HTTPS can fail with TLS errors on macOS / Windows even when system tools work fine. The dashboard-served setup payload (`app/web/setup_instructions.py`) already branches between the two automatically based on platform; the doc snippet now matches that behavior for manual flows. ## [0.38.0] — 2026-05-06 ### Added - **`/store` page** — community marketplace where every authenticated user can upload skills, agents, and plugins as ZIPs. Listing has type / category / search filters; detail page shows metadata, file list, photo, video link, and an `[Install]` button. Same owner can't have two entities with the same `name` (any type). Plugin/skill/agent name is suffixed `-by-` (sanitized email-local-part) at upload time to avoid collisions in Claude Code's flat namespace. - **`/my-ai-stack` page** — every user's per-user composition view: the admin-granted plugins (with an opt-out toggle each, default enabled) plus the entities they've installed from the Store. Toggling a curated plugin off writes a `user_plugin_optouts` row; admin removing the underlying grant drops everyone's opt-out (re-grant restarts at enabled). - **Composed served marketplace**: the `/marketplace.zip` and `/marketplace.git/` endpoints now serve `(admin_granted ∖ opt_outs) ∪ store_installs` — driven by the new `src/marketplace_filter.py:resolve_user_marketplace`. Same content-addressed ETag / git-commit-SHA contract as before; any change on either layer propagates to Claude Code on the next refresh. - **Store skill+agent bundle**: skill/agent installs are merged into a single synthetic `agnes-store-bundle` plugin in the served marketplace (one plugin with N skills/agents inside), while `type='plugin'` Store entities stay standalone. Cuts plugin-entry count in Claude Code from O(installs) down to O(1) for the skill+agent path. Bundle's `version` field hashes its combined contents so install/uninstall flips it for auto-update detection. - REST: `POST/PUT/DELETE/GET /api/store/entities[/{id}]`, `POST/DELETE /api/store/entities/{id}/install`, `GET /api/store/entities/{id}/photo`, `GET /api/store/entities/{id}/docs/{filename}`, `POST /api/store/entities/preview` (wizard step-1 validation), `GET /api/store/categories`, `GET /api/store/owners`, `GET /api/my-stack`, `PUT /api/my-stack/curated/{marketplace_id}/{plugin_name}`. - **CLI: `agnes store {list,show,install,uninstall,upload,update,delete,pull,info}`** and **`agnes my-stack {show,toggle}`** — full analyst-side coverage of the new Store + composition REST surface. Multipart upload helper added to `cli/v2_client.py` (`api_post_multipart` / `api_put_multipart` / `api_get_stream`) so future multipart and binary-download endpoints don't have to roll their own httpx wiring. - **CLI: `agnes admin store {pull,push,info}`** — operator-flavored bulk Store ops. ``pull`` and ``info`` share the open `GET /api/store/bundle.zip` / `/entities` endpoints; ``push`` wraps the admin-gated `POST /api/store/import-bundle`. ``push`` accepts either a *.zip file or a directory containing `manifest.json` + `entities/` (CLI zips a directory client-side, so a backup git repo's working tree round-trips straight back into Agnes via a single command). - **CLI: `agnes store mine`** — analyst-facing self-bundle. Same endpoint as `admin store pull`, scoped via `?owner=me` (server resolves the magic value to the caller's user_id) so authors can archive their own uploads without admin role. - **REST: `GET /api/store/bundle.zip`** — deterministic ZIP of all (filtered) Store entities for whole-Store backup. Layout: `manifest.json` at the top with per-entity metadata + `owner_email` for cross-instance restore, then `entities//{plugin,assets}/`. Auth: any authenticated user (Store is community-open, the same set is already visible via `GET /api/store/entities`). Filters mirror the listing endpoint (type / category / owner / search). - **REST: `POST /api/store/import-bundle`** — admin-only restore of a bundle ZIP. Modes: `merge` (default — upsert by `entity_id`, replace when version differs), `replace` (overwrite all matching), `skip` (only insert new). Owner resolution by `owner_email` against `users.email`; missing emails get a stub disabled user (`active=False`, no password, id `imported-`) so the historical owner stays attached and an admin can later activate or reassign in `/admin/users`. Audit-logged with the full counts. ### Changed - `/admin/marketplaces` admin nav entry moved from the top-level header into the Admin dropdown and renamed to **Curated Marketplaces** to disambiguate from the new community Store. - `app/api/access.py` `DELETE /api/admin/grants/{grant_id}` now drops every user's `user_plugin_optouts` row matching the deleted plugin and flushes the marketplace ETag cache. Audit log entry for `resource_grant.deleted` carries `optouts_dropped` so operators can correlate. - `app/marketplace_server/{packager,git_backend}.py` consume `resolve_user_marketplace` instead of `resolve_allowed_plugins`. The `/marketplace/info` payload now splits its `plugins` array by `source`, exposing `plugins` (admin) and `store_plugins` (community). ### Fixed - **Stored XSS via `video_url`** (`app/api/store.py`) — `video_url` accepted on `POST/PUT /api/store/entities` is now scheme-validated to `http(s)://` only. Previously a `javascript:` URI flowed through the form field into `store_detail.html`'s `` and would execute in any viewer's session on click. 400 `invalid_video_url` on bad input. - **ZIP decompression bomb** (`app/api/store.py:_safe_zip_extract`) — the uncompressed-side total of an upload is now capped at 200 MB (`MAX_ZIP_UNCOMPRESSED`); the compressed-side cap (50 MB) alone did not bound the on-disk footprint. 413 `zip_too_large_uncompressed` on oversize. - **Admin authz parity for Store mutations** (`app/api/store.py`, `app/web/router.py`, `app/web/templates/store_detail.html`) — `PUT /api/store/entities/{id}` now permits owner OR admin (matches `DELETE`); the store-detail page passes `is_admin` to the template and gates the Edit/Delete buttons on `is_owner OR is_admin`. Pre-fix, an admin could delete via the API but saw no Edit/Delete affordance in the UI, and could not update non-owned entities at all. - **Scratch directory leak on ZIP validation failure** (`app/api/store.py`, Devin Review) — `create_entity` and `update_entity` created the `scratch` temp dir inside one `try/finally` block but cleaned it up in a separate one. When `_safe_zip_extract` raised `HTTPException` (zip-slip, uncompressed-too-large) or `BadZipFile` was caught and re-raised, the exception exited the first scope and the cleanup `finally` was never reached. Each failed upload leaked a temp dir. Fixed by collapsing scratch creation + cleanup into a single outer `try/finally` covering both extraction and the metadata/bake work. - **Cross-owner suffix collision** (`app/api/store.py:create_entity`) — `sanitize_username` is many-to-one (`alice.smith` and `alice_smith` both → `alice-smith`). Two such users uploading entities with the same display `name` produced identical `-by-` suffixes, silently colliding in the served bundle's on-disk paths and the manifest catalog (Claude Code dedupes by `plugin.json`'s `name`). We now refuse the second upload with 409 `conflict_global_suffix`. ### Internal - Schema **v24 → v25**: adds `store_entities`, `user_store_installs`, `user_plugin_optouts`. Auto-migration via `_V24_TO_V25_MIGRATIONS` ladder branch in `src/db.py` (existing self-heal path also creates the tables on same-version starts). - New helpers in `src/store_naming.py`: `sanitize_username`, `suffixed_name`, `compute_entity_version` (sha256 of sorted `(relpath, content)` tuples, 16-char hex prefix). Predefined category taxonomy in `src/store_categories.py`. - New repositories: `src/repositories/{store_entities,user_store_installs, user_plugin_optouts}.py` (mirror existing `marketplace_plugins` style — dict returns, parameterized SQL, no ORM). - `app/utils.py:get_store_dir()` — `${DATA_DIR}/store/`. - `humanbytes` Jinja2 filter on Store detail page (binary KB/MB/GB). - New CLI command modules: `cli/commands/store.py`, `cli/commands/my_stack.py`. Registered as Typer subapps `agnes store` and `agnes my-stack` in `cli/main.py`. Tests at `tests/test_cli_store.py`. - `tests/test_store_api.py:TestStoreSecurityFixes` — regression suite for F1 (video_url), F2 (zip-bomb), F4 (admin authz parity), F5 (cross-owner suffix collision). ## [0.37.0] — 2026-05-06 Operator-side disk-layout release. Closes the 2026-05-05 shadow-mount class identified in v0.36.0's deploy notes via two independent fixes that operators can adopt separately: (#194 folds in @cvrysanek's #191 + #192). The image-side change is invisible — `STATE_DIR` defaults to the legacy nested path, so existing deployments see no behavior change unless they opt into the new flat layout. Folds in three rounds of Devin Review (3 BUGs + 1 ANALYSIS class, ANALYSIS deferred per the operator-side limitation it describes). ### Added - **`STATE_DIR` env var + `docker-compose.flat-mount.yml` overlay** — operators can now place the writable state disk in **parallel** to the data disk (`sdb` at `/data`, `sdc` at `/data-state`) instead of nested (`sdc` at `/data/state` inside `/data`). The flat layout removes three structural fragilities of the legacy nested layout: bind-mount propagation gotchas (the 2026-05-05 shadow-mount class), two-writer collisions on a shared prefix (host's `tls-rotate.timer` as root + container app as uid 999 on the same path), and mount-order coupling on disk resize. `STATE_DIR` defaults to `${DATA_DIR}/state` so existing deployers see no behavior change; opt-in to flat layout via the new overlay + `STATE_DIR=/data-state` per the runbook in `docs/state-dir.md`. Read by `src/db.py:_get_state_dir()`, `app/secrets.py:_state_dir()`, `app/main.py` (`.env_overlay`), `app/instance_config.py` (`instance.yaml` overlay reader), `app/api/admin.py` (writers for both `/api/admin/configure` and `/api/admin/server-config` against the same overlay), `app/api/marketplaces.py` (marketplace PAT persistence into `.env_overlay`), `scripts/ops/agnes-auto-upgrade.sh` (mount-sanity + cert detection), `scripts/ops/agnes-tls-rotate.sh` (`CERT_DIR=$STATE_DIR/certs`). All read/write sites resolve via the same helper so under `STATE_DIR=/data-state` the irreplaceable tier (`system.duckdb`, secrets, `instance.yaml`, `.env_overlay`, certs) lands on sdc consistently — partial migration would silently lose secrets on container restart. ### Changed - **`docker-compose.host-mount.yml` switched from "named volume + driver_opts" to direct service-level bind mounts** (`volumes: !override` per service). Docker named volumes have an immutability footgun: once a volume is created, its driver options are fixed for the life of the volume, and editing this file does NOT propagate the new options to existing volumes. This bit a deployer in production: the volume was created before the overlay had `bind,rbind`, kept the old `bind` (non-recursive) propagation, and containers wrote to a shadowed subdirectory of the parent disk instead of the nested child mount. DuckDB went FATAL on a root-owned WAL during a routine container recreate; sign-in broke. Direct service binds re-evaluate options every container start and default to recursive in modern Docker (20.10+) — no immutable state to migrate, no shadow-mount class. Operators on this overlay: next `docker compose up -d` starts containers with direct binds; the old `agnes_data` named volume is no longer referenced and can be removed with `docker volume rm agnes_data` (operator's choice — orphaned but harmless if left). Both `host-mount.yml` and `flat-mount.yml` `volumes: !override` blocks for `caddy` now restate every mount the base service depends on (notably `data:/srv:ro` for the v0.36.0 file_server bypass and `caddy_config:/config` for ACME state) — a Devin-caught regression where `!override` silently dropped these mounts under the new layout, defeating the parquet-download perf bypass. ## [0.36.0] — 2026-05-05 Combined performance + analyst-clarity bundle. Folds three previously-staged work streams into one PR (#188): the long-running `agnes query --remote` timeout (#181), the Caddy parquet-download bypass (#182), and Pavel's #185 Phase 1 trace findings (silent 44-min first-init, opaque CLI tracebacks, no analyst-Claude size signal). Also performs the Tier 1 event-loop unblocking — the five hottest BQ-touching endpoints were `async def` over synchronous DuckDB / BQ-extension calls, so a single heavy `agnes query --remote` froze every other request for the duration of the BQ wait. The image-side fixes ship in this release; for existing VMs, the new auto-upgrade.sh self-fetches the matching Caddyfile + compose overlays from `main` on its next 5-minute tick, so deployment requires no operator action beyond letting the cron run. ### Added - **`data_source.bigquery.query_timeout_ms` config knob** (default 600 000 ms = 10 min). The DuckDB BigQuery extension's built-in default of 90 s was too tight for analyst-scale queries against view-backed BQ datasets — `agnes query --remote` would HTTP 400 with `Binder Error: Query execution exceeded the timeout. Job ID: …` whenever the underlying BQ job took longer than 90 s, even though the BQ job itself was healthy. The new knob is applied via `SET bq_query_timeout_ms` after every `LOAD bigquery` on every BQ-touching DuckDB session — the orchestrator's `_remote_attach` ATTACH path (`src/orchestrator.py`), the analytics-DB read-only reattach path (`src/db.py:_reattach_remote_extensions` — the primary `agnes query --remote` request path), the `BqAccess` session factory (`connectors/bigquery/access.py`), and the standalone extractor (`connectors/bigquery/extractor.py`). Sentinel `0` (or non-numeric / unparseable values) leaves the extension default in place so operators on legacy extension versions that don't recognise the setting aren't broken. Configurable via `/admin/server-config` UI. Note: BigQuery's `jobs.query` RPC caps the wait at ~200 s per call regardless of this setting; the extension polls on top so the effective ceiling is the value here but each poll is ~200 s. DuckDB emits an informational warning when the value is set above the BQ RPC cap — operators can safely ignore it. - **Per-user parallel parquet downloads in `agnes pull`** — the download loop in `cli/lib/pull.py` now uses a `ThreadPoolExecutor` with concurrency capped by the new `AGNES_PULL_PARALLELISM` env var (default 4, set 1 to restore pre-PR serial behavior). On a registry of N tables the wall-clock time drops from `Σ stream_download_seconds(table_i)` to roughly `max × ceil(N/4)`. Works hand-in-hand with the Caddy `file_server` change below: without it parallel client-side downloads would still queue on the single uvicorn worker; with it each request is its own caddy goroutine + sendfile, so 4-way parallelism actually delivers throughput. Per-table error semantics preserved — a failure on one table no longer aborts the rest of the batch. - **`agnes init` / `agnes pull --skip-materialize`** — opts the first sync out of materialized-mode tables (server-side scheduled-query parquets, often multi-GB). Pavel's #185 Phase 1: a single 6.3 GB `order_economics` parquet kept first init silent for 44 minutes. Materialized rows stay discoverable via `agnes catalog`; rerun without the flag once the analyst actually needs them locally. - **`agnes pull` progress bar** — Rich-driven aggregate transfer display rendered to stderr when not `--quiet` and not `--json`. Per-file label + bytes / total / rate / ETA, aggregated across the parallel `ThreadPoolExecutor` workers introduced earlier in this PR. Replaces the prior 0-stdout silence on first init. - **CLI clean-error wrapper** (`cli/main.py:_run_with_clean_errors`, new entry point in `pyproject.toml`) — `httpx.ReadTimeout` / `ConnectError` / `RemoteProtocolError` etc. used to dump a five-frame Python traceback to the analyst's terminal when a `agnes query --remote` against a slow BQ view timed out client-side. Now: one-line `Error: …` message + actionable hint (e.g. "narrow the WHERE on the partition column from `agnes catalog --json`, or run `agnes snapshot create --estimate`"), exit code 1. Full traceback is appended to `~/.config/agnes/last-error.log` so an operator can recover it for support without spamming the analyst's terminal. Implemented as `AgnesTransportError` raised from the `api_get` / `api_post` / `api_delete` / `api_patch` / `stream_download` helpers in `cli/client.py`; the top-level Typer wrapper renders it. Unhandled `Exception`s are caught at the same boundary, logged, and printed as "internal CLI error (see logfile)" so a Python traceback never leaks to the analyst. - **`scripts/ops/agnes-auto-upgrade.sh` now re-fetches Caddyfile + every compose overlay** from `keboola/agnes-the-ai-analyst@main` on every tick, hashes them, and triggers a `docker compose up -d` recreation when the hash changes — same path as an image-digest change. Pre-fix the script only watched `docker images` digests, so a Caddyfile or compose change in main never reached running VMs (only fresh boots ran `startup.sh`'s file fetch). Without this, the new file_server downloads-path below would land in the image but stay inert against an old Caddyfile. The script also self-updates from the same path so the very fix that watches config files isn't itself stuck on running VMs. Fail-soft on curl errors — keeps the existing file rather than blanking it. - **Caddy `file_server` for parquet downloads** — `GET /api/data/{table_id}/download` is now intercepted at the Caddy layer (TLS profile only) and served directly via sendfile/zero-copy from the data volume mounted read-only at `/srv` inside the caddy container. Caddy authorises every request via a new lightweight RBAC probe `GET /api/data/{table_id}/check-access` (returns 204 when the caller has read access on the table, 403 otherwise) using the `forward_auth` directive — the bulk byte transfer never touches uvicorn workers. Resolves a real production failure mode where a single multi-GB analyst pull held the app's only uvicorn worker for the duration of the stream and starved the UI / `/api/health` / every other API endpoint, eventually flipping the container to `unhealthy`. Path discovery uses Caddy's `try_files` over the known `extract.duckdb` v2 source subdirs (`bigquery/data/.parquet`, `keboola/data/.parquet`, `jira/data/.parquet`); a parquet not at any of those paths transparently falls through to the existing app handler so legacy `src_data/parquet` layouts and future connectors keep working with no Caddyfile change. Non-Caddy deployments (dev `docker compose up` without `--profile tls`) continue to use the app handler unchanged. - **Workspace prompt: decision tree, common-mistakes callout, failure-mode dictionary** in `config/claude_md_template.txt` (the template `agnes init` writes to `/CLAUDE.md`). Surfaces every catalog-row field analyst Claude should read before deciding which command to use (`query_mode`, `sql_flavor`, `where_examples`, `fetch_via`, `rough_size_hint`); explicitly binds `--estimate` to `agnes snapshot create` ONLY (was the most-failed first-try misuse — fails with `No such option: --estimate` on `agnes query`); calls out the `agnes fetch` → `agnes snapshot create` rename so stale-doc analysts don't run a non-command; documents the BQ permission model (server SA, not personal Google identity) and a 6-row failure-mode table mapping each common error wording to its cause + the right next step. - **`rough_size_hint` populated for `local` + `materialized` catalog rows** in `GET /api/v2/catalog` (was hardcoded `null` with a "Task 8" TODO). Reads the parquet file size at `${DATA_DIR}/extracts//data/.parquet` and buckets into `small` (≤100 MiB), `medium` (≤1 GiB), `large` (≤10 GiB), `very_large` (>10 GiB). `remote` rows stay `null` for now (size requires a BQ INFORMATION_SCHEMA call; tracked separately). Lets analyst Claude pick `agnes snapshot create` over `agnes query --remote` by inspecting `agnes catalog --json` rather than discovering size empirically via a failed `--remote` round-trip. ### Changed - **Tier 1 event-loop unblocking** — the five hottest BQ-touching endpoints (`POST /api/query`, `POST /api/v2/scan`, `POST /api/v2/scan/estimate`, `GET /api/v2/sample/{id}`, `GET /api/v2/schema/{id}`) were declared `async def` but invoked synchronous DuckDB / BQ-extension calls inside the body. Under uvicorn's single event loop that meant a single heavy `agnes query --remote` (waiting up to ~200 s for BQ's `jobs.query` to return) **froze every other request** — `/api/health`, the dashboard, auth, even another query — for the full duration of the BQ wait. Operators saw "VM idle, app frozen" symptoms during this work. Converted all five to plain `def` so FastAPI auto-offloads the blocking body to the anyio thread pool; the event loop stays free for non-BQ requests. Verified via 0-await audit (no `await` statements in the converted handlers, so the rename is safe). Tests: `tests/test_v2_*.py` were rewritten to call the handlers directly instead of `asyncio.run(...)` (which now fails on a non-coroutine return). Pairs with the thread-pool capacity bump below. - **`AGNES_THREADPOOL_SIZE` env var** (default 200, was anyio's stock 40) controls the FastAPI / Starlette thread pool capacity used by every plain-`def` route handler. Set in `app/main.py:lifespan` via `anyio.to_thread.current_default_thread_limiter().total_tokens`. 200 leaves comfortable headroom over the BQ extension's connection budget while keeping the per-process thread cost bounded — for the workload of <50 concurrent analysts this is well over what's needed; bump for higher concurrency. - **CLI update-banner now says `agnes` instead of `da`** (`cli/update_check.py:format_outdated_notice`). The string `[update] da X is out of date` had survived the `da` → `agnes` CLI rename and was the most-visible stale identifier in the analyst-facing surface — every CLI command printed it on stderr when a newer wheel was available. ### Fixed - **CLI ReadTimeout message reports the actual httpx timeout** (was hardcoded to `QUERY_TIMEOUT_S` = 300s). On a 30s-default call (`agnes catalog`, `agnes auth`, …) the analyst saw "didn't respond within the read timeout (300s)" while the call had actually given up after 30s — confusing and unactionable. The translator now takes the real timeout from the calling helper and renders it; the long-running-BQ advisory only appears for calls where the timeout was set ≥ 60s. Devin Review on PR #188. - Keboola sync now falls back to the legacy Storage-API client when the DuckDB Keboola extension's per-table scan fails, not just when the initial `ATTACH` fails. Two changes: - `kbcstorage>=0.9.0` is promoted from optional to core dependency. The legacy fallback path in `connectors/keboola/extractor.py:_extract_via_legacy` has been there since the extension landed, but until now the bare `from kbcstorage.client import Client` would crash any default install with `ModuleNotFoundError`. - `connectors/keboola/extractor.py:run` now wraps `_extract_via_extension` in a per-table try/except — on any per-table scan failure it retries via the legacy client. Previously, when `ATTACH` succeeded but the table-level `COPY (SELECT * FROM kbc.""."
")` failed, the table was just marked failed with no retry. Together these unblock deployments where the extension's bucket-schema scans return `Schema '..."in.c-..."' does not exist or not authorized` (keboola/duckdb-extension#17) while the upstream extension fix is in flight. ## [0.35.1] — 2026-05-05 ### Fixed - `agnes query --remote` no longer dies after 30s on long-running BigQuery SELECTs. The CLI HTTP client now defaults to a 300s timeout for `/api/query` and exposes `AGNES_QUERY_TIMEOUT` (seconds, float) for operators who need to extend it further. Other CLI calls keep the 30s default. (`cli/client.py`, `cli/commands/query.py`) ## [0.35.0] — 2026-05-05 Five-defect fix for the silently-broken session pipeline on default Compose deploys (#176). Sessions uploaded by `agnes push` landed on `/data/user_sessions//*.jsonl`, but on a stock `docker compose up` deploy nothing ever processed them — `/corporate-memory` stayed empty even when sessions and `CLAUDE.local.md` were uploaded. The root cause was a stack of compounding defects: LLM SDKs were dev-only deps so the scheduler container boot-looped on `ModuleNotFoundError`, the side-car services were profile-gated and ran as tight `restart: unless-stopped` boot loops anyway, the `verification_detector` had no scheduler entry at all, the first-time setup never seeded an `ai:` block, and the `/corporate-memory` page silently filtered out the pending review queue. This release wires the LLM pipeline into the existing scheduler-v2 model (one HTTP-driven cron tick per service) and adds a health-check that warns when uploaded jsonls aren't being processed. ### Changed - **BREAKING** `docker-compose.yml` and `docker-compose.prod.yml` no longer ship the `corporate-memory` and `session-collector` services. The scheduler container drives both jobs through admin HTTP endpoints (see Added below) on offset cadences (10 min / 17 min). Operators previously running `COMPOSE_PROFILES=full` or maintaining custom Compose overrides need to drop those service stanzas — leaving them in produces a double-driver footgun (the standalone container loop races the scheduler-v2 cron tick on `/data/user_sessions` and `knowledge_items` writes). The Python entry points (`services/{corporate_memory, session_collector, verification_detector}/__main__.py`) remain — they're still callable from the CLI for one-shot manual runs and from the new admin endpoints. ### Added - New admin endpoints in `app/api/admin.py` that wrap the LLM pipeline jobs so the scheduler can drive them over HTTP (matching the existing `/api/marketplaces/sync-all` pattern): - `POST /api/admin/run-session-collector` — copies Claude Code session jsonls from user homes to `/data/user_sessions//`. - `POST /api/admin/run-verification-detector` — extracts verified knowledge from session transcripts via the LLM, writes pending items to `knowledge_items`. - `POST /api/admin/run-corporate-memory` — refreshes the catalog from team `CLAUDE.local.md` files. All three are admin-gated, sync-def (FastAPI thread pool), and emit one audit row per invocation. - Three new entries in `services/scheduler/__main__.py:JOBS` with deliberately offset cadences (10 m / 15 m / 17 m, all coprime modulo the 30 s tick) so the LLM-backed jobs don't fire on the same tick and stack their API + DB load: - `session-collector` — every 10 min → `POST /api/admin/run-session-collector`. - `verification-detector` — every 15 min → `POST /api/admin/run-verification-detector`. - `corporate-memory` — every 17 min → `POST /api/admin/run-corporate-memory`. - `connectors.llm.factory.create_extractor_from_env_or_config(ai_config)` — falls back to `ANTHROPIC_API_KEY` / `LLM_API_KEY` env vars when the `ai:` block is empty, raises a clear `ValueError` when neither is available. `services/corporate_memory` and `services/verification_detector` switch to the new helper so a missing `ai:` section is no longer a silent skip. - `POST /api/admin/configure` now seeds a default `ai:` block into the writable `instance.yaml` overlay when the overlay has no `ai:` yet AND `ANTHROPIC_API_KEY` (or `LLM_API_KEY`) is present in the environment. The block stores the env-var reference (`${ANTHROPIC_API_KEY}`), never the raw secret. Existing operator config is preserved verbatim. - `/corporate-memory` page renders an admin-only banner (`N pending items awaiting review — review them at /corporate-memory/admin`) when the pending review queue is non-empty. Non-admins see no change — the route zeroes the count server-side before the template renders. Closes the silent-failure UX gap that hid the review queue from operators with `approval_mode='review_queue'` (the default). - `GET /api/health/detailed` now returns a `session_pipeline` service entry that warns when uploaded session jsonls aren't being processed. Heuristic: `max(mtime of /data/user_sessions/**/*.jsonl) <= max(processed_at in session_extraction_state) + grace_seconds`, where `grace_seconds = 2 ×` the verification-detector cadence (default 30 min, configurable via `SCHEDULER_VERIFICATION_DETECTOR_INTERVAL`). Surfaces as `status='warning'` (never `error`) with an actionable `detail` pointing at the verification-detector job. A warning bubbles up to the existing `overall='degraded'` aggregation so `agnes diagnose system` flags it. ### Fixed - **Defect 4 — LLM provider SDKs in dev-only deps caused scheduler container boot loops.** `anthropic>=0.30.0` and `openai>=1.30.0` are now in `[project].dependencies`, not `[project.optional-dependencies].dev`. The Dockerfile's `uv pip install --system --no-cache .` picks them up automatically, no Dockerfile change required. `tests/test_packaging.py` locks the contract. - **Defect 5 — first-time setup never wrote an `ai:` block.** Two paths to a working LLM pipeline now actually work end-to-end (#179 review): (a) a default `ai:` block seeded by `POST /api/admin/configure` into the writable overlay at `${DATA_DIR}/state/instance.yaml` when env keys are present (Added above), or (b) env-var fallback at service start time. The seeded overlay path was dead code on the initial 0.35.0 cut — the three LLM consumers (`services/corporate_memory/collector.py`, `services/verification_detector/__main__.py`, `app/api/admin.py:run_verification_detector`) imported `load_instance_config` from `config.loader` (which only reads the static config dir), and even if they had read the overlay, `app/instance_config.py` ran `yaml.safe_load` on it without resolving `${ENV_VAR}` references so the seeded `${ANTHROPIC_API_KEY}` placeholder would have stayed literal. Both fixes shipped: consumers switched to the overlay-aware `app.instance_config.load_instance_config`, and the overlay is now passed through `config.loader._resolve_env_refs` before deep-merge with the static base. `collect_all` no longer swallows the factory's `ValueError` into `stats["errors"]` — fail-fast propagates so the scheduler / admin endpoint surface the actionable misconfiguration message. - **#179 review — scheduler ignored its own LLM cadence env vars.** `app/api/health.py` already read `SCHEDULER_VERIFICATION_DETECTOR_INTERVAL` to compute the staleness grace window, but the scheduler cadence was hardcoded to `every 15m`, so an operator throttling the detector via the env was silently ignored on the schedule side while the health grace silently widened. All three LLM-pipeline cadences are now env-driven through the same `_read_positive_int` pattern as `data-refresh` / `health-check` / `script-runner`: `SCHEDULER_SESSION_COLLECTOR_INTERVAL` (default 600s = 10m), `SCHEDULER_VERIFICATION_DETECTOR_INTERVAL` (default 900s = 15m), and `SCHEDULER_CORPORATE_MEMORY_INTERVAL` (default 1020s = 17m). Defaults preserve the 10/15/17m coprime offset so the three jobs don't fire on the same tick. The verification-detector env var remains the single source of truth for the health-check grace (still `2 ×` the cadence). - **Defect 3 — `verification_detector` had no scheduler entry.** Now in `JOBS` with a 15 min cadence, hitting the new `/api/admin/run-verification-detector` endpoint. - **Defect 2 — side-car services gated by `profiles: [full]` were silently skipped on default deploys.** Both stanzas dropped (Changed above); the scheduler-v2 cron is the sole driver. - **Defect 1 — `/corporate-memory` filtered `status IN ('approved','mandatory')` with no hint that pending items existed.** Admin banner added (Added above). - **#179 review — `/api/admin/run-session-collector` would SystemExit the worker.** The endpoint called `collector.main()`, whose `argparse.parse_args()` parsed uvicorn's `sys.argv` (`['app.main:app', '--host', …]`) and called `sys.exit(2)` on the unrecognised flags. `SystemExit` inherits from `BaseException`, escapes FastAPI's exception machinery, and propagates through the thread pool — every scheduler tick that fired the endpoint either 500-ed or risked killing the uvicorn worker. Fix: `services/session_collector/collector.py` now exposes an argv-free `run(dry_run, verbose) -> (rc, stats)` helper; `main()` is a thin CLI shim around it and the admin endpoint calls `run()` directly. Audit log now carries the per-run stats (`users_processed`, `files_copied`, `files_skipped`) instead of just the rc. Regression tests in `tests/test_session_collector.py::TestRunHelper`. - **#179 review — `python -m services.corporate_memory` crashed on missing LLM config instead of exiting cleanly.** The PR's fail-fast change made `collect_all()` raise `ValueError` when neither an `ai:` block nor `ANTHROPIC_API_KEY`/`LLM_API_KEY` was available. The `verification_detector` CLI was updated to catch it; the corporate-memory CLI was missed. Now also wrapped — operators get a one-line `Corporate Memory cannot run: ` on stderr and rc=1 instead of a raw traceback. Regression test in `tests/test_llm_connector.py::TestCorporateMemoryCollector::test_main_returns_1_on_no_ai_config_instead_of_traceback`. - **E2E test — Anthropic API rejected every extraction request.** The structured-output API now requires `additionalProperties: false` on every `{"type": "object"}` node in the json_schema; without it the API returns 400 `invalid_request_error` ("output_config.format.schema: For 'object' type, 'additionalProperties' must be explicitly set to false"). Surfaced on a real BQ-backed deploy: every uploaded session jsonl failed verification-extraction in a tight retry loop. Fix: `connectors/llm/anthropic_provider.py` now wraps the caller-supplied schema through a recursive `_strict_json_schema()` walker that adds the field where missing (preserving any explicit operator override), then passes the strict variant to the API. Six unit tests in `tests/test_llm_connector.py::TestStrictJsonSchema` pin the recursion across nested objects, array items, and the no-mutation invariant. - **#179 review — `/api/admin/run-verification-detector` skipped audit on unhandled exceptions.** If `detector.run()` threw anything other than the already-translated `ValueError` (DuckDB lock, network blip, unexpected SDK error), the audit_log row was never written — the operator's only signal was `docker logs agnes-scheduler-1`. The endpoint now wraps `detector.run` in try/except, records the exception in `audit_params["unhandled_error"]`, then re-raises as 500 after audit. The `/admin/scheduler-runs` page surfaces the failure row with the error type and message. - **#179 review — `SCHEDULER_AUDIT_ACTIONS` listed action strings that don't actually appear in `audit_log`.** The list at `app/web/router.py:952` had `"marketplaces_sync_all"` (wrong — actual is `"marketplace.sync_all"`) plus `"data_refresh"` and `"scripts_run_due"` (which `app/api/sync.py` and `app/api/scripts.py` don't write). Corrected to the four actually-logged strings, with a comment pointing at the missing audit calls in sync/scripts as a follow-up. - **#179 review — `/api/admin/run-corporate-memory` skipped audit on unhandled exceptions** (same gap as `run_verification_detector` from the previous round). Mirrored the same try/except + `unhandled_error` audit pattern, so a DuckDB lock or unexpected SDK error from `collect_all()` now produces an audit row with the error type+message before re-raising as 500. Regression test in `tests/test_admin_run_endpoints.py::TestRunCorporateMemory::test_unhandled_exception_still_audits`. - **#179 review — `/api/admin/run-session-collector` skipped audit on unhandled exceptions** (third occurrence of the same pattern, completes the trilogy of LLM-pipeline endpoints). Mirrored the same try/except + `unhandled_error` audit pattern from the other two endpoints, so a `PermissionError` walking `/home`, an `OSError` on `/data/user_sessions` mkdir, or any other unhandled exception from `collector.run()` now produces an audit row before re-raising as 500. Regression test in `tests/test_admin_run_endpoints.py::TestRunSessionCollector::test_unhandled_exception_still_audits`. - **#179 review — `/profile/sessions` 500-ed on transient `stat()` failure.** The previous implementation used `sorted(glob, key=lambda p: p.stat().st_mtime)`; if any single jsonl file's stat call raised (race with delete, EACCES from a remount, etc.), the whole sort raised and the page returned 500 instead of just dropping that one row. Reworked the gather: stat each path under try/except into a `(path, stat)` list, then sort the already-statted entries. Bad files are silently dropped from the listing. Regression test in `tests/test_web_ui.py::TestAdminRoleGuards::test_profile_sessions_page_tolerates_stat_failures`. ### Added - `/admin/scheduler-runs` — read-only admin page showing the last 200 audit-log entries from scheduler-driven actions (`run_session_collector`, `run_verification_detector`, `run_corporate_memory`, `marketplace.sync_all`). New `AuditRepository.query_actions(actions, limit)` query helper, new admin nav entry under the Admin dropdown. `data-refresh` (`POST /api/sync/trigger`) and `script-runner` (`POST /api/scripts/run-due`) are scheduler jobs but don't write to `audit_log` today, so they can't appear here yet. Failed scheduler ticks (HTTP 401, network errors) don't reach the audit_log either — those still live only in `docker logs agnes-scheduler-1`; the page calls that out with a hint to set `SCHEDULER_API_TOKEN` if no rows show up. - `/profile/sessions` — self-service user page in the user menu, showing all session jsonls the caller uploaded via `agnes push` joined against `session_extraction_state`. Each row shows uploaded_at, file size, status badge (`pending` / `processed` / `extracted`), processed_at, `items_extracted`, and a per-row Download button. The page docstring explicitly calls out that `items_extracted = 0` means the verification detector ran successfully but the LLM found no claims worth tracking — that's the documented "no items" outcome, not a broken pipeline. Closes the gap surfaced during the e2e test of #176 where a user could see their sessions on disk and process them through the LLM but had no UI to inspect what happened. - `GET /profile/sessions/` — owner-only download of a single jsonl. Auth via `get_current_user`; path safety locks the served file under `${DATA_DIR}/user_sessions//` and rejects path-traversal / nested-component / non-`.jsonl` / dotfile filenames with 404 (never 403, so existence of files belonging to other users is not leaked). `Content-Disposition: attachment` returns the file as a download. ### Internal - `tests/test_packaging.py` — guards against `anthropic`/`openai` slipping back into dev extras. - `tests/test_setup_ai_block.py` — overlay seeding contract for `POST /api/admin/configure`. - `tests/test_llm_provider_env_fallback.py` — env fallback + fail-fast for `create_extractor_from_env_or_config`. - `tests/test_admin_run_endpoints.py` — admin gating + scheduler registration + endpoint contract for the three new run-* endpoints. - `tests/test_docker_compose.py` — pins the compose contract: the two side-car services must not reappear under either Compose file. - `tests/test_corporate_memory_page.py` — pending-banner contract (admin sees, non-admin doesn't). - `tests/test_health_session_pipeline.py` — session-pipeline staleness check across cold-start + ok + warning + never-processed cases. - `tests/test_instance_config_overlay.py` — pins overlay env-ref resolution + the three LLM consumers reading from `app.instance_config` (#179 review). - `tests/test_scheduler.py` — `TestLLMPipelineCadenceEnvVars` + `TestVerificationDetectorGraceFollowsCadence` pin the new env-var-driven cadences and the single-source-of-truth contract between scheduler and health-check grace (#179 review). - `docs/architecture.md` — Services table updated to reflect the scheduler-v2 cadence map. ## [0.34.0] — 2026-05-04 End-to-end clean-analyst-bootstrap rewrite. The web `/setup` page now produces a single unified paste prompt that, dropped into Claude Code in an empty folder, fully bootstraps a workspace — installs the CLI, authenticates, fetches `CLAUDE.md`, installs SessionStart/End hooks, runs the first data refresh, and writes a human-readable workspace docs file (`AGNES_WORKSPACE.md`). The admin-vs-analyst layout split (introduced as `?role=` mid-cycle) was collapsed before merge: every caller sees the same flow, with the marketplace + plugins block emitted iff the caller has plugin grants. 26 implementation tasks across 6 phases plus a 10-task unification follow-up. ### Changed - **BREAKING** CLI binary renamed from `da` to `agnes`. No backward-compat alias is shipped. Update shell aliases, hook commands in any pre-existing `.claude/settings.json`, scripts, and cron jobs. Reinstall via `uv tool install `; the wheel now ships an `agnes` entry point. - **BREAKING** Environment variables and config dir renamed: `DA_CONFIG_DIR/DA_SERVER/DA_NO_UPDATE_CHECK/DA_LOCAL_DIR/DA_TOKEN/DA_STREAM_RETRIES` → `AGNES_*`; `~/.config/da/` → `~/.config/agnes/`. Hard cutover, no fallback. Existing analysts re-authenticate via `agnes auth import-token`. - **BREAKING** Analyst bootstrap rewritten end-to-end. `da analyst setup` is removed; replaced by `agnes init` (non-interactive, requires `--server-url` and `--token`). `da sync` is split into `agnes pull` (refresh) and `agnes push` (upload). `da fetch` is folded into `agnes snapshot create`. `da metrics list/show` is folded into `agnes catalog --metrics`; `da metrics import/export/validate` move to `agnes admin metrics {import,export,validate}`. The `da analyst` namespace is removed; the workspace status command is now `agnes status`. The previous `da status` (server-health overview) becomes `agnes diagnose system`. - **BREAKING** Workspace layout simplified. Removed: `data/parquet/`, `data/duckdb/`, `data/metadata/`, `user/artifacts/`. Canonical paths: `server/parquet/` (synced parquets), `user/duckdb/analytics.duckdb` (DuckDB views), `user/snapshots/` (ad-hoc snapshots), `user/sessions/` (recorded sessions). Lazy-mkdir contract — no empty pre-allocated directories. - **BREAKING** `/setup` is now a single unified flow regardless of caller's role. The `?role=` query parameter (introduced earlier in this Unreleased cycle but never released) is removed before merge — no migration needed. The admin tile is gone. PAT scope is uniform: every install-page mint uses `scope=general` with `expires_in_days=90`, calling the existing `POST /auth/tokens` endpoint. The `bootstrap-analyst` 1 h-clamped scope is no longer used from `/setup` (still defined in code for future reuse, see open issue for redesign). The marketplace + plugins block is emitted iff the caller has plugin grants in `resource_grants`. `agnes init` is now part of every setup flow (admin and analyst alike) — it's the workspace-rails delivery mechanism. `/install` continues to 302 to `/setup`. - `CLAUDE.md` server-side template + repo-root `CLAUDE.md` updated to reference the new CLI verbs and workspace paths. The admin UI for the `claude_md_template` DB override (`/admin/workspace-prompt`) renders a yellow banner when the saved override contains legacy strings (`data/parquet/`, `da sync`, `da fetch`, `da analyst setup`, `da metrics list/show`); admins re-author and save to clear it. Migration is manual. ### Added - `agnes init ` — non-interactive workspace bootstrap orchestrator. 8 steps: detect existing workspace, verify PAT (`GET /api/catalog/tables`), save config + token globally, fetch `CLAUDE.md` from `/api/welcome`, install SessionStart/End hooks via `cli/lib/hooks.py:install_claude_hooks`, write `CLAUDE.local.md` stub (preserved on `--force`), run first `agnes pull`, write `AGNES_WORKSPACE.md`. Errors render via `cli/error_render.py:render_error()` with typed kinds (`auth_failed`, `server_unreachable`, `partial_state`, `manifest_unauthorized`). - `agnes pull` / `agnes push` — split from the old `da sync` / `da sync --upload-only`. `--quiet` / `--json` / `--dry-run` flags. SessionStart hook runs `agnes pull --quiet`; SessionEnd hook runs `agnes push --quiet`. - `agnes snapshot create
` — folded from `da fetch`. Adds `if not local_db.exists()` guard so `agnes snapshot create` no longer silently materializes an empty DuckDB file when run before any `agnes pull`. - `agnes catalog --metrics` (replaces `da metrics list`) and `agnes catalog --metrics --show ` (replaces `da metrics show`). - `agnes admin metrics {import,export,validate}` — write paths relocated from the deleted `da metrics` namespace. - `agnes diagnose system` — server-side health check (was the old `da status`). - `AGNES_WORKSPACE.md` — human-readable workspace docs file generated by `agnes init` in the workspace root. Documents global install, workspace layout, hooks, cheat sheet, uninstall recipe. - PAT request body now accepts `scope: str = "general"` and `ttl_seconds: int | None = None` fields. PATs minted with `scope="bootstrap-analyst"` are TTL-clamped to ≤ 1 h server-side. Existing `expires_in_days` field continues to work; `ttl_seconds` wins when both are set. `ttl_seconds` upper bound is 315_360_000 (matches `expires_in_days <= 3650` cap). JWT carries the `scope` claim via new `extra_claims` parameter on `create_access_token`; reserved keys (`sub`/`email`/`typ`/`iat`/`jti`/`exp`) cannot be overridden via `extra_claims`. Audit log includes the scope. - `cli/lib/` shared-library tree with `cli/lib/pull.py:run_pull` (data-refresh primitive callable from both the Typer wrapper and `agnes init`) and `cli/lib/hooks.py:install_claude_hooks` (workspace-scoped, idempotent Claude Code hook installer). - `_scan_legacy_strings` helper + `legacy_strings_detected` field on `GET /api/admin/workspace-prompt-template` — server scans saved CLAUDE.md overrides for stale CLI verbs / paths; the admin UI banner consumes the field. - `/setup` pre-flight check (step 4, gated on the marketplace block being present) now verifies `claude --version` in addition to `git --version`. Both binaries are needed by `claude plugin marketplace add` and the git-clone fallback — checking them together surfaces a clear "install X" message instead of a confusing downstream error. Install hints: `npm i -g @anthropic-ai/claude-code` for Linux/WSL plus a doc URL (`https://docs.claude.com/claude-code`) for macOS / Windows native installers. ### Fixed - `agnes pull` (formerly `da sync`) no longer creates `.claude/rules/` when the corporate-memory bundle is empty. - `agnes pull` no longer creates `server/parquet/` when the manifest is empty (mkdir is lazy — only on first per-table write). - `agnes snapshot create` (formerly `da fetch`) no longer materializes an empty `user/duckdb/analytics.duckdb` when run before any `agnes pull`. Friendly hint redirects to `agnes pull`. - Workspace `agnes status` reads from the canonical `server/parquet/` and `user/duckdb/analytics.duckdb` paths (was reading legacy `data/parquet/`, `data/metadata/last_sync.json`). - `agnes init` and `agnes pull` errors now use the `cli/error_render.py` typed-error renderer (added in 0.32.0), so analyst-facing error UX matches the structured shape `agnes query --remote` already produces. - **Schema v24 migration retry path is no longer dead** (Devin Review on `db.py:1757`, escalated from advisory to critical on rescan). Pre-fix: when `_v23_to_v24_finalize` had materialized BQ rows to migrate but `data_source.bigquery.project` was not configured, it logged a warning per row and returned normally. The schema_version then bumped to 24 unconditionally, the `if current < 24:` gate in `_ensure_schema` skipped the function on every subsequent startup, and the affected rows kept their DuckDB-flavor `bq."ds"."tbl"` source_query forever — which the new `_wrap_admin_sql_for_jobs_api` wrapping path rejects as unparseable BQ SQL with no automatic recovery. The "set the project and restart to retry" log hint pointed at a code path that no longer ran. Fix: the migration now raises `RuntimeError` BEFORE the schema_version bump when it has rows to migrate but no project_id, blocking startup with a clear actionable error pointing at `data_source.bigquery.project`. Operator configures the project, restarts, and the migration completes (schema_version is still at 23, so the `if current < 24:` gate fires). Side effect: a BQ-using deployment that hasn't set the project blocks startup until they do — that's the right call for a config error that would otherwise silently break materialized tables. Two regression tests in `test_schema_v24_source_query_rewrite.py`: `test_v24_raises_when_project_not_configured_and_rows_need_migration` (raise + version-stays-at-23) and `test_v24_skips_clean_when_no_rows_match_even_without_project` (no-rows-no-block invariant). - **`agnes admin register-table` UX**: three real-world feedback items addressed. - **`--query-mode materialized` now requires `--bucket`** (client-side validation; exits with a clear error before hitting the server). The previous help docstring claimed `--bucket` was *ignored* for materialized rows, but the value is actually load-bearing — `agnes schema ` builds the BQ identifier as `bq..`, so an empty bucket registered the row but broke subsequent schema/describe with HTTP 400 "unsafe BQ identifier in registry". Docstring rewritten to reflect reality. - **Post-success hints**: after a successful registration the CLI now points operators at the two follow-ups they routinely miss: (a) `agnes setup first-sync` to materialize the parquet (registration alone doesn't trigger a build; `agnes pull` reports "Updated 0 tables" until the scheduler tick), and (b) `agnes admin grant create table ` to make the row visible in `agnes catalog` for non-admin users (catalog is RBAC-filtered). - Test coverage: `tests/test_cli_admin_materialized.py::test_register_materialized_without_bucket_fails_with_clear_error` and `test_register_table_emits_first_sync_and_grant_hints`. - **`agnes query --remote` SQL rewriter no longer corrupts output when the GCP project ID contains a registered table name as a hyphen-delimited word** (Devin Review on `query.py:464`). The previous iterative rewrite (one `re.sub(\b\b, ...)` per registered name) was vulnerable to cross-contamination: e.g. project `my-ue-project` + registered `orders` + registered `ue` → iter 1 rewrites `orders` to `\`my-ue-project.fin.orders\``, iter 2's `\bue\b` then matches the `ue` INSIDE `my-ue-project` and corrupts the iter-1 path. Fix: replaced the iteration with a SINGLE `re.sub` whose alternation regex (sorted longest-first) handles every name in one pass, so freshly-inserted backticked text isn't re-scanned. The fallback at `query.py:576` (per-table SELECT * on BQ parse error) caught the corrupted output as `bq_bad_request` so impact was over-estimation rather than fail-open, but the partition-pruning benefit of #171 is now preserved for projects whose IDs share a hyphen-segment with a registered table name. Regression test in `tests/test_api_query_guardrail.py::test_rewrite_helper_does_not_corrupt_when_project_id_contains_registered_name`. - **BigQuery materialize TTL reclaim is no longer dead code** (Devin Review on `extractor.py:166`). `_try_acquire_file_lock` used to call `open(lock_path, mode="w")` BEFORE checking the lock-file mtime, which truncated the file and refreshed mtime to *now* on every invocation. The subsequent `time.time() - lock_path.stat().st_mtime` always saw age ~0, so `age > TTL` never fired, and `materialize.lock_ttl_seconds` was a silently no-op config knob. Fix: stat the lock path BEFORE any `open()` to read the real pre-probe mtime; if older than TTL, unlink (forcing a fresh inode for the next `open + flock`); only then probe. Two regression tests added: `test_stale_held_lock_is_reclaimed_despite_live_holder` exercises the full reclaim path with a still-living fcntl holder, `test_failed_probe_does_not_self_refresh_lock_mtime` pins that a failed acquisition doesn't pathologically loop. Residual cross-process risk (a genuinely overrunning materialize past TTL races a fresh attempt) is documented in the helper docstring; in-process `threading.Lock` keyed on `table_id` blocks the single-process race. - **`agnes init --token X` now correctly uses the explicit token in the verify call**, even when `~/.config/agnes/token.json` already holds a stale token from a prior install. Pre-fix `cli.config.get_token()` read the on-disk file first and only fell back to env vars, so step 2 (PAT-verify) ran with the stale token and failed with a confusing 401 — even though the `--token` arg was valid (Devin Review on `init.py:99`). Fix: a `ContextVar`-based override in `cli.config` short-circuits `get_token()` before the file read; `_override_server_env` (used by both `agnes init` and `agnes pull`'s `run_pull`) sets it for the duration of the call. Async-safe (each task sees its own override) and leak-proof (resets on context exit). - **`agnes status` sessions counter now reads the same source as `agnes push`** — `~/.claude/projects//` (Claude Code's actual write path) with the legacy `/user/sessions/` as a fallback, via `cli.lib.claude_sessions.list_session_files()`. Pre-fix the counter only checked the legacy dir and always reported 0 in workspaces bootstrapped with `agnes init` (since Claude Code never writes there). - **BigQuery materialize lock-reclaim docstring** at `connectors/bigquery/extractor.py:_try_acquire_file_lock` corrected: a still-running holder's `fcntl.flock` does NOT block the post-unlink reacquisition (new file = new inode = independent lock). The in-process `threading.Lock` keyed on `table_id` is the actual concurrency guard; cross-process protection (two schedulers on one workspace) relies on operators not running multiple concurrent schedulers AND on the TTL being well above the longest plausible COPY (24 h default). Documenting the residual risk so it isn't masked by a misleading "we're safe" comment (Devin Review on extractor.py:111). - **`agnes pull` now re-downloads parquets when the local file is missing, even if the recorded hash matches the server.** Pre-fix the download set was computed from `sync_state.json` hash equality alone — if the parquet had been deleted (manual `rm`, disk cleanup, a different workspace sharing the same global `~/.config/agnes/sync_state.json` writing one workspace's parquets while another reads sync_state and assumes "I already have these"), the hash-equal check would short-circuit the download and the next DuckDB view rebuild would fail on a missing file. Now the existence check on `/server/parquet/.parquet` runs alongside the hash compare; missing file → forced re-download regardless of hash. - **`agnes query --remote` no longer over-rejects narrow queries on partitioned/clustered BigQuery tables.** Closes #171. Pre-fix the `/api/query` cost guardrail dry-ran a synthetic `SELECT * FROM
` per registered remote-BQ row referenced by the user SQL, which forced BQ to estimate "full table scan" — column projection, predicate pushdown, and partition pruning were all ignored, producing scan-byte estimates up to ~30,000× larger than the actual query would scan. Narrow queries on big partitioned tables (the documented happy-path use case) were rejected with 400 `remote_scan_too_large` even when BQ's own dry-run reported single-digit MB. Now the guardrail rewrites the user SQL from DuckDB-flavor (bare registered names + `bq."".""`) to BQ-native (`` `..` ``) and runs ONE dry-run on the EXACT user SQL — partition pruning, column projection, and predicate pushdown all engage. Cap check uses the real estimate. Fallback: if BQ rejects the rewritten SQL with `bq_bad_request` (DuckDB-only syntax that doesn't translate, e.g. `::INT` casts), the guardrail falls back to the pre-fix per-table SELECT * estimate so a non-portable query still gets bounded; non-parse errors (forbidden / upstream) propagate as 502. Helpers exported as `_rewrite_user_sql_for_bq_dry_run` (test seam). - **Windows: `agnes` CLI no longer crashes on cs-CZ / non-UTF-8 consoles.** Two failure modes addressed (originally reported in #172 against the pre-rename `da` CLI; ported and broadened here): (1) `agnes pull` and any other Rich-progress-bar codepath crashed with `UnicodeEncodeError` because cp1250 / cp1252 cannot encode Rich's Braille spinner glyphs — `cli/main.py` now reconfigures `sys.stdout` / `sys.stderr` to UTF-8 with `errors="replace"` at import time when `sys.platform == "win32"`. (2) `agnes skills list` and `agnes skills show` crashed with `UnicodeDecodeError` reading skill markdown that contains em-dashes / accents — every `Path.read_text()` / `Path.write_text()` / `open()` call site in `cli/` (including ones not touched by #172, since several files were renamed in the bootstrap rewrite) now passes `encoding="utf-8"` explicitly. Defensive: also covers JSON / YAML config files that were ASCII-only in practice but were one non-ASCII value away from the same failure mode. - `agnes snapshot create … --estimate` in a pre-init directory no longer leaks an httpx `ConnectError` traceback to stderr. The estimate-guard fix (3d587681) let `--estimate` reach `api_post_json`, but the existing `except V2ClientError` clause didn't catch transport-layer errors when no server was configured (defaulted to `http://localhost:8000`). Now also catches `httpx.HTTPError` and renders the friendly hint `Run \`agnes init …\` first`. - `agnes push` now reads Claude Code session jsonls from `~/.claude/projects//` (where Claude Code actually writes them), instead of `/user/sessions/` (which the SessionEnd hook never populated — the previous code uploaded an empty list every time). Encoding logic in `cli/lib/claude_sessions.py` probes both Claude Code variants — older `/`→`-` and newer all-non-alphanumeric→`-` — and unions the result, so users who have upgraded Claude Code mid-project see sessions from both encoded dirs. Falls back to `/user/sessions/` for back-compat. ### Removed - `da analyst setup`, `da analyst status`, `da sync`, `da fetch`, `da metrics`. See **Changed** for replacements. - `da metrics` namespace as a top-level group (subcommands moved to `agnes catalog --metrics` for read-only views and `agnes admin metrics …` for write operations). - Legacy workspace directories `data/parquet/`, `data/duckdb/`, `data/metadata/`, `user/artifacts/`. Existing analyst workspaces should be reinitialized with `agnes init --server-url ... --token ... --force` (a fresh empty folder is recommended). - `_resolve_analyst_lines`, `_analyst_init_lines`, `_analyst_finale_lines` helpers in `app/web/setup_instructions.py` — the analyst-vs-admin layout split is gone. `role` parameter on `compute_default_agent_prompt`, `resolve_lines`, and `render_setup_instructions`. `?role=` query parameter on `/setup`. Admin tile (`