# Changelog All notable changes to Agnes AI Data Analyst. Format: [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Versions follow [Semantic Versioning](https://semver.org/spec/v2.0.0.html), pre-1.0 — public surface (CLI flags, REST endpoints, `instance.yaml` schema, `extract.duckdb` contract) may shift between minor versions; breaking changes called out under **Changed** or **Removed** with the **BREAKING** marker. CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every CI build; semver tags (`v0.X.Y`) are cut at release boundaries and reference the same commit as a `stable-*` tag from the same day. --- ## [Unreleased] ### Performance - **`/api/query` (and `agnes query --remote`) now rewrites user SQL referencing `query_mode='remote'` BigQuery rows into a single `bigquery_query()` call before execute** (`app/api/query.py`). Pre-fix the master view (`CREATE VIEW AS SELECT * FROM bigquery..`) did not push WHERE / SELECT / LIMIT into BQ — the DuckDB BQ extension opened a Storage Read API session over the entire upstream table, scanning the full partitioned dataset before the local DuckDB filter ran. On 100M+ row remote-mode tables this was 50-100× slower than the equivalent direct `bigquery_query()` call (70-150 s vs 1.5 s) and frequently failed with `Response too large to return`. The rewriter (shared core with the existing dry-run helper) wraps the user's whole SQL in `bigquery_query('', '')` so the BQ planner receives the full query and applies partition pruning + projection pushdown server-side. Conservative fall-through: cross-source JOINs (BQ ↔ Keboola/Jira local), queries already containing `bigquery_query(`, and unconfigured BQ project all keep the original ATTACH-catalog path so behavior degrades gracefully. - **DuckDB BigQuery-extension session pool** (`connectors/bigquery/access.py`). `BqAccess.duckdb_session()` now acquires pre-warmed connections from a bounded process-local pool instead of running `INSTALL bigquery; LOAD bigquery; CREATE SECRET; ATTACH …` on every request. Each acquire saves the ~0.5 s extension-load + secret-creation cost when the pool has a warm entry; auth SECRET is refreshed on acquire so a long-lived pooled entry doesn't keep a stale GCE metadata token past its TTL. Pool size is configurable via `data_source.bigquery.session_pool_size` (default 4; sentinel `0` disables pooling). Affects every BQ-touching path — `/api/query`, `/api/v2/scan`, `/api/v2/sample`, `/api/v2/schema`, materialize, and the orchestrator's remote-attach. - **`agnes pull` chunked download for large parquets**: when the server advertises `accept-ranges: bytes` and a parquet exceeds `AGNES_PULL_CHUNK_THRESHOLD_BYTES` (default 50 MB), the CLI now splits the file into N parallel HTTP Range requests (`AGNES_PULL_CHUNK_PARALLELISM`, default 4, capped 1..16) and assembles the parts into the destination atomically. Targets the per-flow-shaped network (corp VPN with per-TCP-connection rate-limiting) where a single stream is throttled but N parallel streams over the same connection scale roughly linearly. Falls back to single-stream when the server responds 200 instead of 206 to a Range probe, when no `accept-ranges: bytes` is advertised, or when content is below the threshold — no behavior change in the small-file / non-cooperating- server cases. - **Persistent HTTP/2 client across `agnes pull`**: `stream_download` now routes through a process-wide pooled `httpx.Client` so N parquet downloads share a single TLS handshake; HTTP/2 multiplexing (when the optional `h2` package is installed) lets all chunk Range requests share one TCP connection. Gracefully falls back to HTTP/1.1 pooling when `h2` is missing — no crash, just slightly less benefit. ### Fixed - **BigQuery `responseTooLarge` no longer surfaces as a generic 400 / 502 with the raw upstream message** (`connectors/bigquery/access.py`). The `translate_bq_error` helper now classifies "Response too large to return" errors via a dedicated `bq_response_too_large` kind (HTTP 400) with an actionable hint pointing at the WHERE / aggregation / materialized-table remediations. Pre-fix this failure mode fell through to the generic `bq_bad_request` mapping, which implied the user's SQL had a syntax error — wrong root cause. Affects every BQ-touching path (`/api/query`, `/api/v2/scan`, `/api/v2/sample`, `/api/v2/schema`, materialize) since they all share `translate_bq_error`. ### Added - New optional dependency `h2>=4.1.0` (HTTP/2 transport for httpx). Pure performance — `agnes pull` works on HTTP/1.1 if the install skips it. - **Textual progress fallback for non-TTY `agnes pull`**: when stderr is not a terminal (Claude Code SessionStart hook, CI runner, Docker log capture, …), `agnes pull --no-quiet` now emits a plain-text progress line per file at most every 10% or 30 s, plus a final completion line. Replaces the previous Rich-bar-on-pipe behavior that either suppressed output entirely or leaked ANSI escape sequences. TTY path unchanged (Rich progress bar with bytes / speed / ETA, aggregated per-file across chunked-download chunks). ## [0.38.3] — 2026-05-06 ### Changed - **Admin / Tables**: registry table now shows Source (bucket/table), Schedule, Folder, Registered by/at, and a sync-error warning icon per row. The page widens to ~1600px to accommodate. ### Fixed - **Admin / Tables**: long table descriptions no longer push the row's Edit / Manage access / Delete buttons off-screen. The Description column is now clamped to 2 lines with the full text available on hover and in the Edit modal. - **Admin / Tables**: descriptions stored with shell-quoting backslash-escapes (`Don\'t`, `\n`) now render correctly. The same normalization also runs at register/update time so newly-saved descriptions are never corrupted. - **Admin / Tables**: `scripts/fix_description_escapes.py` cleans up already-corrupted descriptions in `table_registry` (run with `--dry-run` first, then `--apply`). ## [0.38.2] — 2026-05-06 ### Fixed - **`bq_query_timeout_ms` was not applied on every BigQuery ATTACH branch** (`src/db.py:_reattach_remote_extensions`, `src/orchestrator.py:_attach_remote_extensions`). Pre-fix only the metadata-token branch (the BqAccess contract, `token_env=''`) called `apply_bq_session_settings`. BigQuery sources registered with an explicit `token_env`, or with no auth env, ATTACH'd without ever applying the timeout — falling back to the extension's 90 s default. Default-config operators on those branches now consistently get the configured 600 s (or whatever `data_source.bigquery.query_timeout_ms` is set to). - **`apply_bq_session_settings` swallowed every `Exception` silently** (`connectors/bigquery/access.py`). Two realistic failure modes — the BigQuery extension not yet loaded on the connection, or an installed extension version that doesn't recognise the setting — left the 90 s default in place with no log line explaining why. Each failure path now logs `WARNING` with the actionable cause; on success the applied value is verified via a `current_setting('bq_query_timeout_ms')` readback (catches the silent-ignore mode some extension versions exhibit) and a mismatch logs `WARNING` too. ## [0.38.1] — 2026-05-06 ### Internal - `CLAUDE.md` — `Claude Code marketplace endpoint` section now documents the two-step fallback (system `git clone` + local `claude plugin marketplace add`) for users registering manually against a private-CA Agnes instance. Bun-compiled `claude` ignores the OS trust store and CA env vars on the marketplace HTTPS path, so direct `/plugin marketplace add` over HTTPS can fail with TLS errors on macOS / Windows even when system tools work fine. The dashboard-served setup payload (`app/web/setup_instructions.py`) already branches between the two automatically based on platform; the doc snippet now matches that behavior for manual flows. - **`agnes pull` chunked download for large parquets**: when the server advertises `accept-ranges: bytes` and a parquet exceeds `AGNES_PULL_CHUNK_THRESHOLD_BYTES` (default 50 MB), the CLI now splits the file into N parallel HTTP Range requests (`AGNES_PULL_CHUNK_PARALLELISM`, default 4, capped 1..16) and assembles the parts into the destination atomically. Targets the per-flow-shaped network (corp VPN with per-TCP-connection rate-limiting) where a single stream is throttled but N parallel streams over the same connection scale roughly linearly. Falls back to single-stream when the server responds 200 instead of 206 to a Range probe, when no `accept-ranges: bytes` is advertised, or when content is below the threshold — no behavior change in the small-file / non-cooperating- server cases. - **Persistent HTTP/2 client across `agnes pull`**: `stream_download` now routes through a process-wide pooled `httpx.Client` so N parquet downloads share a single TLS handshake; HTTP/2 multiplexing (when the optional `h2` package is installed) lets all chunk Range requests share one TCP connection. Gracefully falls back to HTTP/1.1 pooling when `h2` is missing — no crash, just slightly less benefit. ### Added - New optional dependency `h2>=4.1.0` (HTTP/2 transport for httpx). Pure performance — `agnes pull` works on HTTP/1.1 if the install skips it. - **Textual progress fallback for non-TTY `agnes pull`**: when stderr is not a terminal (Claude Code SessionStart hook, CI runner, Docker log capture, …), `agnes pull --no-quiet` now emits a plain-text progress line per file at most every 10% or 30 s, plus a final completion line. Replaces the previous Rich-bar-on-pipe behavior that either suppressed output entirely or leaked ANSI escape sequences. TTY path unchanged (Rich progress bar with bytes / speed / ETA, aggregated per-file across chunked-download chunks). ## [0.38.0] — 2026-05-06 ### Added - **`/store` page** — community marketplace where every authenticated user can upload skills, agents, and plugins as ZIPs. Listing has type / category / search filters; detail page shows metadata, file list, photo, video link, and an `[Install]` button. Same owner can't have two entities with the same `name` (any type). Plugin/skill/agent name is suffixed `-by-` (sanitized email-local-part) at upload time to avoid collisions in Claude Code's flat namespace. - **`/my-ai-stack` page** — every user's per-user composition view: the admin-granted plugins (with an opt-out toggle each, default enabled) plus the entities they've installed from the Store. Toggling a curated plugin off writes a `user_plugin_optouts` row; admin removing the underlying grant drops everyone's opt-out (re-grant restarts at enabled). - **Composed served marketplace**: the `/marketplace.zip` and `/marketplace.git/` endpoints now serve `(admin_granted ∖ opt_outs) ∪ store_installs` — driven by the new `src/marketplace_filter.py:resolve_user_marketplace`. Same content-addressed ETag / git-commit-SHA contract as before; any change on either layer propagates to Claude Code on the next refresh. - **Store skill+agent bundle**: skill/agent installs are merged into a single synthetic `agnes-store-bundle` plugin in the served marketplace (one plugin with N skills/agents inside), while `type='plugin'` Store entities stay standalone. Cuts plugin-entry count in Claude Code from O(installs) down to O(1) for the skill+agent path. Bundle's `version` field hashes its combined contents so install/uninstall flips it for auto-update detection. - REST: `POST/PUT/DELETE/GET /api/store/entities[/{id}]`, `POST/DELETE /api/store/entities/{id}/install`, `GET /api/store/entities/{id}/photo`, `GET /api/store/entities/{id}/docs/{filename}`, `POST /api/store/entities/preview` (wizard step-1 validation), `GET /api/store/categories`, `GET /api/store/owners`, `GET /api/my-stack`, `PUT /api/my-stack/curated/{marketplace_id}/{plugin_name}`. - **CLI: `agnes store {list,show,install,uninstall,upload,update,delete,pull,info}`** and **`agnes my-stack {show,toggle}`** — full analyst-side coverage of the new Store + composition REST surface. Multipart upload helper added to `cli/v2_client.py` (`api_post_multipart` / `api_put_multipart` / `api_get_stream`) so future multipart and binary-download endpoints don't have to roll their own httpx wiring. - **CLI: `agnes admin store {pull,push,info}`** — operator-flavored bulk Store ops. ``pull`` and ``info`` share the open `GET /api/store/bundle.zip` / `/entities` endpoints; ``push`` wraps the admin-gated `POST /api/store/import-bundle`. ``push`` accepts either a *.zip file or a directory containing `manifest.json` + `entities/` (CLI zips a directory client-side, so a backup git repo's working tree round-trips straight back into Agnes via a single command). - **CLI: `agnes store mine`** — analyst-facing self-bundle. Same endpoint as `admin store pull`, scoped via `?owner=me` (server resolves the magic value to the caller's user_id) so authors can archive their own uploads without admin role. - **REST: `GET /api/store/bundle.zip`** — deterministic ZIP of all (filtered) Store entities for whole-Store backup. Layout: `manifest.json` at the top with per-entity metadata + `owner_email` for cross-instance restore, then `entities//{plugin,assets}/`. Auth: any authenticated user (Store is community-open, the same set is already visible via `GET /api/store/entities`). Filters mirror the listing endpoint (type / category / owner / search). - **REST: `POST /api/store/import-bundle`** — admin-only restore of a bundle ZIP. Modes: `merge` (default — upsert by `entity_id`, replace when version differs), `replace` (overwrite all matching), `skip` (only insert new). Owner resolution by `owner_email` against `users.email`; missing emails get a stub disabled user (`active=False`, no password, id `imported-`) so the historical owner stays attached and an admin can later activate or reassign in `/admin/users`. Audit-logged with the full counts. ### Changed - `/admin/marketplaces` admin nav entry moved from the top-level header into the Admin dropdown and renamed to **Curated Marketplaces** to disambiguate from the new community Store. - `app/api/access.py` `DELETE /api/admin/grants/{grant_id}` now drops every user's `user_plugin_optouts` row matching the deleted plugin and flushes the marketplace ETag cache. Audit log entry for `resource_grant.deleted` carries `optouts_dropped` so operators can correlate. - `app/marketplace_server/{packager,git_backend}.py` consume `resolve_user_marketplace` instead of `resolve_allowed_plugins`. The `/marketplace/info` payload now splits its `plugins` array by `source`, exposing `plugins` (admin) and `store_plugins` (community). ### Fixed - **Stored XSS via `video_url`** (`app/api/store.py`) — `video_url` accepted on `POST/PUT /api/store/entities` is now scheme-validated to `http(s)://` only. Previously a `javascript:` URI flowed through the form field into `store_detail.html`'s `` and would execute in any viewer's session on click. 400 `invalid_video_url` on bad input. - **ZIP decompression bomb** (`app/api/store.py:_safe_zip_extract`) — the uncompressed-side total of an upload is now capped at 200 MB (`MAX_ZIP_UNCOMPRESSED`); the compressed-side cap (50 MB) alone did not bound the on-disk footprint. 413 `zip_too_large_uncompressed` on oversize. - **Admin authz parity for Store mutations** (`app/api/store.py`, `app/web/router.py`, `app/web/templates/store_detail.html`) — `PUT /api/store/entities/{id}` now permits owner OR admin (matches `DELETE`); the store-detail page passes `is_admin` to the template and gates the Edit/Delete buttons on `is_owner OR is_admin`. Pre-fix, an admin could delete via the API but saw no Edit/Delete affordance in the UI, and could not update non-owned entities at all. - **Scratch directory leak on ZIP validation failure** (`app/api/store.py`, Devin Review) — `create_entity` and `update_entity` created the `scratch` temp dir inside one `try/finally` block but cleaned it up in a separate one. When `_safe_zip_extract` raised `HTTPException` (zip-slip, uncompressed-too-large) or `BadZipFile` was caught and re-raised, the exception exited the first scope and the cleanup `finally` was never reached. Each failed upload leaked a temp dir. Fixed by collapsing scratch creation + cleanup into a single outer `try/finally` covering both extraction and the metadata/bake work. - **Cross-owner suffix collision** (`app/api/store.py:create_entity`) — `sanitize_username` is many-to-one (`alice.smith` and `alice_smith` both → `alice-smith`). Two such users uploading entities with the same display `name` produced identical `-by-` suffixes, silently colliding in the served bundle's on-disk paths and the manifest catalog (Claude Code dedupes by `plugin.json`'s `name`). We now refuse the second upload with 409 `conflict_global_suffix`. ### Internal - Schema **v24 → v25**: adds `store_entities`, `user_store_installs`, `user_plugin_optouts`. Auto-migration via `_V24_TO_V25_MIGRATIONS` ladder branch in `src/db.py` (existing self-heal path also creates the tables on same-version starts). - New helpers in `src/store_naming.py`: `sanitize_username`, `suffixed_name`, `compute_entity_version` (sha256 of sorted `(relpath, content)` tuples, 16-char hex prefix). Predefined category taxonomy in `src/store_categories.py`. - New repositories: `src/repositories/{store_entities,user_store_installs, user_plugin_optouts}.py` (mirror existing `marketplace_plugins` style — dict returns, parameterized SQL, no ORM). - `app/utils.py:get_store_dir()` — `${DATA_DIR}/store/`. - `humanbytes` Jinja2 filter on Store detail page (binary KB/MB/GB). - New CLI command modules: `cli/commands/store.py`, `cli/commands/my_stack.py`. Registered as Typer subapps `agnes store` and `agnes my-stack` in `cli/main.py`. Tests at `tests/test_cli_store.py`. - `tests/test_store_api.py:TestStoreSecurityFixes` — regression suite for F1 (video_url), F2 (zip-bomb), F4 (admin authz parity), F5 (cross-owner suffix collision). ## [0.37.0] — 2026-05-06 Operator-side disk-layout release. Closes the 2026-05-05 shadow-mount class identified in v0.36.0's deploy notes via two independent fixes that operators can adopt separately: (#194 folds in @cvrysanek's #191 + #192). The image-side change is invisible — `STATE_DIR` defaults to the legacy nested path, so existing deployments see no behavior change unless they opt into the new flat layout. Folds in three rounds of Devin Review (3 BUGs + 1 ANALYSIS class, ANALYSIS deferred per the operator-side limitation it describes). ### Added - **`STATE_DIR` env var + `docker-compose.flat-mount.yml` overlay** — operators can now place the writable state disk in **parallel** to the data disk (`sdb` at `/data`, `sdc` at `/data-state`) instead of nested (`sdc` at `/data/state` inside `/data`). The flat layout removes three structural fragilities of the legacy nested layout: bind-mount propagation gotchas (the 2026-05-05 shadow-mount class), two-writer collisions on a shared prefix (host's `tls-rotate.timer` as root + container app as uid 999 on the same path), and mount-order coupling on disk resize. `STATE_DIR` defaults to `${DATA_DIR}/state` so existing deployers see no behavior change; opt-in to flat layout via the new overlay + `STATE_DIR=/data-state` per the runbook in `docs/state-dir.md`. Read by `src/db.py:_get_state_dir()`, `app/secrets.py:_state_dir()`, `app/main.py` (`.env_overlay`), `app/instance_config.py` (`instance.yaml` overlay reader), `app/api/admin.py` (writers for both `/api/admin/configure` and `/api/admin/server-config` against the same overlay), `app/api/marketplaces.py` (marketplace PAT persistence into `.env_overlay`), `scripts/ops/agnes-auto-upgrade.sh` (mount-sanity + cert detection), `scripts/ops/agnes-tls-rotate.sh` (`CERT_DIR=$STATE_DIR/certs`). All read/write sites resolve via the same helper so under `STATE_DIR=/data-state` the irreplaceable tier (`system.duckdb`, secrets, `instance.yaml`, `.env_overlay`, certs) lands on sdc consistently — partial migration would silently lose secrets on container restart. ### Changed - **`docker-compose.host-mount.yml` switched from "named volume + driver_opts" to direct service-level bind mounts** (`volumes: !override` per service). Docker named volumes have an immutability footgun: once a volume is created, its driver options are fixed for the life of the volume, and editing this file does NOT propagate the new options to existing volumes. This bit a deployer in production: the volume was created before the overlay had `bind,rbind`, kept the old `bind` (non-recursive) propagation, and containers wrote to a shadowed subdirectory of the parent disk instead of the nested child mount. DuckDB went FATAL on a root-owned WAL during a routine container recreate; sign-in broke. Direct service binds re-evaluate options every container start and default to recursive in modern Docker (20.10+) — no immutable state to migrate, no shadow-mount class. Operators on this overlay: next `docker compose up -d` starts containers with direct binds; the old `agnes_data` named volume is no longer referenced and can be removed with `docker volume rm agnes_data` (operator's choice — orphaned but harmless if left). Both `host-mount.yml` and `flat-mount.yml` `volumes: !override` blocks for `caddy` now restate every mount the base service depends on (notably `data:/srv:ro` for the v0.36.0 file_server bypass and `caddy_config:/config` for ACME state) — a Devin-caught regression where `!override` silently dropped these mounts under the new layout, defeating the parquet-download perf bypass. ## [0.36.0] — 2026-05-05 Combined performance + analyst-clarity bundle. Folds three previously-staged work streams into one PR (#188): the long-running `agnes query --remote` timeout (#181), the Caddy parquet-download bypass (#182), and Pavel's #185 Phase 1 trace findings (silent 44-min first-init, opaque CLI tracebacks, no analyst-Claude size signal). Also performs the Tier 1 event-loop unblocking — the five hottest BQ-touching endpoints were `async def` over synchronous DuckDB / BQ-extension calls, so a single heavy `agnes query --remote` froze every other request for the duration of the BQ wait. The image-side fixes ship in this release; for existing VMs, the new auto-upgrade.sh self-fetches the matching Caddyfile + compose overlays from `main` on its next 5-minute tick, so deployment requires no operator action beyond letting the cron run. ### Added - **`data_source.bigquery.query_timeout_ms` config knob** (default 600 000 ms = 10 min). The DuckDB BigQuery extension's built-in default of 90 s was too tight for analyst-scale queries against view-backed BQ datasets — `agnes query --remote` would HTTP 400 with `Binder Error: Query execution exceeded the timeout. Job ID: …` whenever the underlying BQ job took longer than 90 s, even though the BQ job itself was healthy. The new knob is applied via `SET bq_query_timeout_ms` after every `LOAD bigquery` on every BQ-touching DuckDB session — the orchestrator's `_remote_attach` ATTACH path (`src/orchestrator.py`), the analytics-DB read-only reattach path (`src/db.py:_reattach_remote_extensions` — the primary `agnes query --remote` request path), the `BqAccess` session factory (`connectors/bigquery/access.py`), and the standalone extractor (`connectors/bigquery/extractor.py`). Sentinel `0` (or non-numeric / unparseable values) leaves the extension default in place so operators on legacy extension versions that don't recognise the setting aren't broken. Configurable via `/admin/server-config` UI. Note: BigQuery's `jobs.query` RPC caps the wait at ~200 s per call regardless of this setting; the extension polls on top so the effective ceiling is the value here but each poll is ~200 s. DuckDB emits an informational warning when the value is set above the BQ RPC cap — operators can safely ignore it. - **Per-user parallel parquet downloads in `agnes pull`** — the download loop in `cli/lib/pull.py` now uses a `ThreadPoolExecutor` with concurrency capped by the new `AGNES_PULL_PARALLELISM` env var (default 4, set 1 to restore pre-PR serial behavior). On a registry of N tables the wall-clock time drops from `Σ stream_download_seconds(table_i)` to roughly `max × ceil(N/4)`. Works hand-in-hand with the Caddy `file_server` change below: without it parallel client-side downloads would still queue on the single uvicorn worker; with it each request is its own caddy goroutine + sendfile, so 4-way parallelism actually delivers throughput. Per-table error semantics preserved — a failure on one table no longer aborts the rest of the batch. - **`agnes init` / `agnes pull --skip-materialize`** — opts the first sync out of materialized-mode tables (server-side scheduled-query parquets, often multi-GB). Pavel's #185 Phase 1: a single 6.3 GB `order_economics` parquet kept first init silent for 44 minutes. Materialized rows stay discoverable via `agnes catalog`; rerun without the flag once the analyst actually needs them locally. - **`agnes pull` progress bar** — Rich-driven aggregate transfer display rendered to stderr when not `--quiet` and not `--json`. Per-file label + bytes / total / rate / ETA, aggregated across the parallel `ThreadPoolExecutor` workers introduced earlier in this PR. Replaces the prior 0-stdout silence on first init. - **CLI clean-error wrapper** (`cli/main.py:_run_with_clean_errors`, new entry point in `pyproject.toml`) — `httpx.ReadTimeout` / `ConnectError` / `RemoteProtocolError` etc. used to dump a five-frame Python traceback to the analyst's terminal when a `agnes query --remote` against a slow BQ view timed out client-side. Now: one-line `Error: …` message + actionable hint (e.g. "narrow the WHERE on the partition column from `agnes catalog --json`, or run `agnes snapshot create --estimate`"), exit code 1. Full traceback is appended to `~/.config/agnes/last-error.log` so an operator can recover it for support without spamming the analyst's terminal. Implemented as `AgnesTransportError` raised from the `api_get` / `api_post` / `api_delete` / `api_patch` / `stream_download` helpers in `cli/client.py`; the top-level Typer wrapper renders it. Unhandled `Exception`s are caught at the same boundary, logged, and printed as "internal CLI error (see logfile)" so a Python traceback never leaks to the analyst. - **`scripts/ops/agnes-auto-upgrade.sh` now re-fetches Caddyfile + every compose overlay** from `keboola/agnes-the-ai-analyst@main` on every tick, hashes them, and triggers a `docker compose up -d` recreation when the hash changes — same path as an image-digest change. Pre-fix the script only watched `docker images` digests, so a Caddyfile or compose change in main never reached running VMs (only fresh boots ran `startup.sh`'s file fetch). Without this, the new file_server downloads-path below would land in the image but stay inert against an old Caddyfile. The script also self-updates from the same path so the very fix that watches config files isn't itself stuck on running VMs. Fail-soft on curl errors — keeps the existing file rather than blanking it. - **Caddy `file_server` for parquet downloads** — `GET /api/data/{table_id}/download` is now intercepted at the Caddy layer (TLS profile only) and served directly via sendfile/zero-copy from the data volume mounted read-only at `/srv` inside the caddy container. Caddy authorises every request via a new lightweight RBAC probe `GET /api/data/{table_id}/check-access` (returns 204 when the caller has read access on the table, 403 otherwise) using the `forward_auth` directive — the bulk byte transfer never touches uvicorn workers. Resolves a real production failure mode where a single multi-GB analyst pull held the app's only uvicorn worker for the duration of the stream and starved the UI / `/api/health` / every other API endpoint, eventually flipping the container to `unhealthy`. Path discovery uses Caddy's `try_files` over the known `extract.duckdb` v2 source subdirs (`bigquery/data/.parquet`, `keboola/data/.parquet`, `jira/data/.parquet`); a parquet not at any of those paths transparently falls through to the existing app handler so legacy `src_data/parquet` layouts and future connectors keep working with no Caddyfile change. Non-Caddy deployments (dev `docker compose up` without `--profile tls`) continue to use the app handler unchanged. - **Workspace prompt: decision tree, common-mistakes callout, failure-mode dictionary** in `config/claude_md_template.txt` (the template `agnes init` writes to `/CLAUDE.md`). Surfaces every catalog-row field analyst Claude should read before deciding which command to use (`query_mode`, `sql_flavor`, `where_examples`, `fetch_via`, `rough_size_hint`); explicitly binds `--estimate` to `agnes snapshot create` ONLY (was the most-failed first-try misuse — fails with `No such option: --estimate` on `agnes query`); calls out the `agnes fetch` → `agnes snapshot create` rename so stale-doc analysts don't run a non-command; documents the BQ permission model (server SA, not personal Google identity) and a 6-row failure-mode table mapping each common error wording to its cause + the right next step. - **`rough_size_hint` populated for `local` + `materialized` catalog rows** in `GET /api/v2/catalog` (was hardcoded `null` with a "Task 8" TODO). Reads the parquet file size at `${DATA_DIR}/extracts//data/.parquet` and buckets into `small` (≤100 MiB), `medium` (≤1 GiB), `large` (≤10 GiB), `very_large` (>10 GiB). `remote` rows stay `null` for now (size requires a BQ INFORMATION_SCHEMA call; tracked separately). Lets analyst Claude pick `agnes snapshot create` over `agnes query --remote` by inspecting `agnes catalog --json` rather than discovering size empirically via a failed `--remote` round-trip. ### Changed - **Tier 1 event-loop unblocking** — the five hottest BQ-touching endpoints (`POST /api/query`, `POST /api/v2/scan`, `POST /api/v2/scan/estimate`, `GET /api/v2/sample/{id}`, `GET /api/v2/schema/{id}`) were declared `async def` but invoked synchronous DuckDB / BQ-extension calls inside the body. Under uvicorn's single event loop that meant a single heavy `agnes query --remote` (waiting up to ~200 s for BQ's `jobs.query` to return) **froze every other request** — `/api/health`, the dashboard, auth, even another query — for the full duration of the BQ wait. Operators saw "VM idle, app frozen" symptoms during this work. Converted all five to plain `def` so FastAPI auto-offloads the blocking body to the anyio thread pool; the event loop stays free for non-BQ requests. Verified via 0-await audit (no `await` statements in the converted handlers, so the rename is safe). Tests: `tests/test_v2_*.py` were rewritten to call the handlers directly instead of `asyncio.run(...)` (which now fails on a non-coroutine return). Pairs with the thread-pool capacity bump below. - **`AGNES_THREADPOOL_SIZE` env var** (default 200, was anyio's stock 40) controls the FastAPI / Starlette thread pool capacity used by every plain-`def` route handler. Set in `app/main.py:lifespan` via `anyio.to_thread.current_default_thread_limiter().total_tokens`. 200 leaves comfortable headroom over the BQ extension's connection budget while keeping the per-process thread cost bounded — for the workload of <50 concurrent analysts this is well over what's needed; bump for higher concurrency. - **CLI update-banner now says `agnes` instead of `da`** (`cli/update_check.py:format_outdated_notice`). The string `[update] da X is out of date` had survived the `da` → `agnes` CLI rename and was the most-visible stale identifier in the analyst-facing surface — every CLI command printed it on stderr when a newer wheel was available. ### Fixed - **CLI ReadTimeout message reports the actual httpx timeout** (was hardcoded to `QUERY_TIMEOUT_S` = 300s). On a 30s-default call (`agnes catalog`, `agnes auth`, …) the analyst saw "didn't respond within the read timeout (300s)" while the call had actually given up after 30s — confusing and unactionable. The translator now takes the real timeout from the calling helper and renders it; the long-running-BQ advisory only appears for calls where the timeout was set ≥ 60s. Devin Review on PR #188. - Keboola sync now falls back to the legacy Storage-API client when the DuckDB Keboola extension's per-table scan fails, not just when the initial `ATTACH` fails. Two changes: - `kbcstorage>=0.9.0` is promoted from optional to core dependency. The legacy fallback path in `connectors/keboola/extractor.py:_extract_via_legacy` has been there since the extension landed, but until now the bare `from kbcstorage.client import Client` would crash any default install with `ModuleNotFoundError`. - `connectors/keboola/extractor.py:run` now wraps `_extract_via_extension` in a per-table try/except — on any per-table scan failure it retries via the legacy client. Previously, when `ATTACH` succeeded but the table-level `COPY (SELECT * FROM kbc.""."")` failed, the table was just marked failed with no retry. Together these unblock deployments where the extension's bucket-schema scans return `Schema '..."in.c-..."' does not exist or not authorized` (keboola/duckdb-extension#17) while the upstream extension fix is in flight. ## [0.35.1] — 2026-05-05 ### Fixed - `agnes query --remote` no longer dies after 30s on long-running BigQuery SELECTs. The CLI HTTP client now defaults to a 300s timeout for `/api/query` and exposes `AGNES_QUERY_TIMEOUT` (seconds, float) for operators who need to extend it further. Other CLI calls keep the 30s default. (`cli/client.py`, `cli/commands/query.py`) ## [0.35.0] — 2026-05-05 Five-defect fix for the silently-broken session pipeline on default Compose deploys (#176). Sessions uploaded by `agnes push` landed on `/data/user_sessions//*.jsonl`, but on a stock `docker compose up` deploy nothing ever processed them — `/corporate-memory` stayed empty even when sessions and `CLAUDE.local.md` were uploaded. The root cause was a stack of compounding defects: LLM SDKs were dev-only deps so the scheduler container boot-looped on `ModuleNotFoundError`, the side-car services were profile-gated and ran as tight `restart: unless-stopped` boot loops anyway, the `verification_detector` had no scheduler entry at all, the first-time setup never seeded an `ai:` block, and the `/corporate-memory` page silently filtered out the pending review queue. This release wires the LLM pipeline into the existing scheduler-v2 model (one HTTP-driven cron tick per service) and adds a health-check that warns when uploaded jsonls aren't being processed. ### Changed - **BREAKING** `docker-compose.yml` and `docker-compose.prod.yml` no longer ship the `corporate-memory` and `session-collector` services. The scheduler container drives both jobs through admin HTTP endpoints (see Added below) on offset cadences (10 min / 17 min). Operators previously running `COMPOSE_PROFILES=full` or maintaining custom Compose overrides need to drop those service stanzas — leaving them in produces a double-driver footgun (the standalone container loop races the scheduler-v2 cron tick on `/data/user_sessions` and `knowledge_items` writes). The Python entry points (`services/{corporate_memory, session_collector, verification_detector}/__main__.py`) remain — they're still callable from the CLI for one-shot manual runs and from the new admin endpoints. ### Added - New admin endpoints in `app/api/admin.py` that wrap the LLM pipeline jobs so the scheduler can drive them over HTTP (matching the existing `/api/marketplaces/sync-all` pattern): - `POST /api/admin/run-session-collector` — copies Claude Code session jsonls from user homes to `/data/user_sessions//`. - `POST /api/admin/run-verification-detector` — extracts verified knowledge from session transcripts via the LLM, writes pending items to `knowledge_items`. - `POST /api/admin/run-corporate-memory` — refreshes the catalog from team `CLAUDE.local.md` files. All three are admin-gated, sync-def (FastAPI thread pool), and emit one audit row per invocation. - Three new entries in `services/scheduler/__main__.py:JOBS` with deliberately offset cadences (10 m / 15 m / 17 m, all coprime modulo the 30 s tick) so the LLM-backed jobs don't fire on the same tick and stack their API + DB load: - `session-collector` — every 10 min → `POST /api/admin/run-session-collector`. - `verification-detector` — every 15 min → `POST /api/admin/run-verification-detector`. - `corporate-memory` — every 17 min → `POST /api/admin/run-corporate-memory`. - `connectors.llm.factory.create_extractor_from_env_or_config(ai_config)` — falls back to `ANTHROPIC_API_KEY` / `LLM_API_KEY` env vars when the `ai:` block is empty, raises a clear `ValueError` when neither is available. `services/corporate_memory` and `services/verification_detector` switch to the new helper so a missing `ai:` section is no longer a silent skip. - `POST /api/admin/configure` now seeds a default `ai:` block into the writable `instance.yaml` overlay when the overlay has no `ai:` yet AND `ANTHROPIC_API_KEY` (or `LLM_API_KEY`) is present in the environment. The block stores the env-var reference (`${ANTHROPIC_API_KEY}`), never the raw secret. Existing operator config is preserved verbatim. - `/corporate-memory` page renders an admin-only banner (`N pending items awaiting review — review them at /corporate-memory/admin`) when the pending review queue is non-empty. Non-admins see no change — the route zeroes the count server-side before the template renders. Closes the silent-failure UX gap that hid the review queue from operators with `approval_mode='review_queue'` (the default). - `GET /api/health/detailed` now returns a `session_pipeline` service entry that warns when uploaded session jsonls aren't being processed. Heuristic: `max(mtime of /data/user_sessions/**/*.jsonl) <= max(processed_at in session_extraction_state) + grace_seconds`, where `grace_seconds = 2 ×` the verification-detector cadence (default 30 min, configurable via `SCHEDULER_VERIFICATION_DETECTOR_INTERVAL`). Surfaces as `status='warning'` (never `error`) with an actionable `detail` pointing at the verification-detector job. A warning bubbles up to the existing `overall='degraded'` aggregation so `agnes diagnose system` flags it. ### Fixed - **Defect 4 — LLM provider SDKs in dev-only deps caused scheduler container boot loops.** `anthropic>=0.30.0` and `openai>=1.30.0` are now in `[project].dependencies`, not `[project.optional-dependencies].dev`. The Dockerfile's `uv pip install --system --no-cache .` picks them up automatically, no Dockerfile change required. `tests/test_packaging.py` locks the contract. - **Defect 5 — first-time setup never wrote an `ai:` block.** Two paths to a working LLM pipeline now actually work end-to-end (#179 review): (a) a default `ai:` block seeded by `POST /api/admin/configure` into the writable overlay at `${DATA_DIR}/state/instance.yaml` when env keys are present (Added above), or (b) env-var fallback at service start time. The seeded overlay path was dead code on the initial 0.35.0 cut — the three LLM consumers (`services/corporate_memory/collector.py`, `services/verification_detector/__main__.py`, `app/api/admin.py:run_verification_detector`) imported `load_instance_config` from `config.loader` (which only reads the static config dir), and even if they had read the overlay, `app/instance_config.py` ran `yaml.safe_load` on it without resolving `${ENV_VAR}` references so the seeded `${ANTHROPIC_API_KEY}` placeholder would have stayed literal. Both fixes shipped: consumers switched to the overlay-aware `app.instance_config.load_instance_config`, and the overlay is now passed through `config.loader._resolve_env_refs` before deep-merge with the static base. `collect_all` no longer swallows the factory's `ValueError` into `stats["errors"]` — fail-fast propagates so the scheduler / admin endpoint surface the actionable misconfiguration message. - **#179 review — scheduler ignored its own LLM cadence env vars.** `app/api/health.py` already read `SCHEDULER_VERIFICATION_DETECTOR_INTERVAL` to compute the staleness grace window, but the scheduler cadence was hardcoded to `every 15m`, so an operator throttling the detector via the env was silently ignored on the schedule side while the health grace silently widened. All three LLM-pipeline cadences are now env-driven through the same `_read_positive_int` pattern as `data-refresh` / `health-check` / `script-runner`: `SCHEDULER_SESSION_COLLECTOR_INTERVAL` (default 600s = 10m), `SCHEDULER_VERIFICATION_DETECTOR_INTERVAL` (default 900s = 15m), and `SCHEDULER_CORPORATE_MEMORY_INTERVAL` (default 1020s = 17m). Defaults preserve the 10/15/17m coprime offset so the three jobs don't fire on the same tick. The verification-detector env var remains the single source of truth for the health-check grace (still `2 ×` the cadence). - **Defect 3 — `verification_detector` had no scheduler entry.** Now in `JOBS` with a 15 min cadence, hitting the new `/api/admin/run-verification-detector` endpoint. - **Defect 2 — side-car services gated by `profiles: [full]` were silently skipped on default deploys.** Both stanzas dropped (Changed above); the scheduler-v2 cron is the sole driver. - **Defect 1 — `/corporate-memory` filtered `status IN ('approved','mandatory')` with no hint that pending items existed.** Admin banner added (Added above). - **#179 review — `/api/admin/run-session-collector` would SystemExit the worker.** The endpoint called `collector.main()`, whose `argparse.parse_args()` parsed uvicorn's `sys.argv` (`['app.main:app', '--host', …]`) and called `sys.exit(2)` on the unrecognised flags. `SystemExit` inherits from `BaseException`, escapes FastAPI's exception machinery, and propagates through the thread pool — every scheduler tick that fired the endpoint either 500-ed or risked killing the uvicorn worker. Fix: `services/session_collector/collector.py` now exposes an argv-free `run(dry_run, verbose) -> (rc, stats)` helper; `main()` is a thin CLI shim around it and the admin endpoint calls `run()` directly. Audit log now carries the per-run stats (`users_processed`, `files_copied`, `files_skipped`) instead of just the rc. Regression tests in `tests/test_session_collector.py::TestRunHelper`. - **#179 review — `python -m services.corporate_memory` crashed on missing LLM config instead of exiting cleanly.** The PR's fail-fast change made `collect_all()` raise `ValueError` when neither an `ai:` block nor `ANTHROPIC_API_KEY`/`LLM_API_KEY` was available. The `verification_detector` CLI was updated to catch it; the corporate-memory CLI was missed. Now also wrapped — operators get a one-line `Corporate Memory cannot run: ` on stderr and rc=1 instead of a raw traceback. Regression test in `tests/test_llm_connector.py::TestCorporateMemoryCollector::test_main_returns_1_on_no_ai_config_instead_of_traceback`. - **E2E test — Anthropic API rejected every extraction request.** The structured-output API now requires `additionalProperties: false` on every `{"type": "object"}` node in the json_schema; without it the API returns 400 `invalid_request_error` ("output_config.format.schema: For 'object' type, 'additionalProperties' must be explicitly set to false"). Surfaced on a real BQ-backed deploy: every uploaded session jsonl failed verification-extraction in a tight retry loop. Fix: `connectors/llm/anthropic_provider.py` now wraps the caller-supplied schema through a recursive `_strict_json_schema()` walker that adds the field where missing (preserving any explicit operator override), then passes the strict variant to the API. Six unit tests in `tests/test_llm_connector.py::TestStrictJsonSchema` pin the recursion across nested objects, array items, and the no-mutation invariant. - **#179 review — `/api/admin/run-verification-detector` skipped audit on unhandled exceptions.** If `detector.run()` threw anything other than the already-translated `ValueError` (DuckDB lock, network blip, unexpected SDK error), the audit_log row was never written — the operator's only signal was `docker logs agnes-scheduler-1`. The endpoint now wraps `detector.run` in try/except, records the exception in `audit_params["unhandled_error"]`, then re-raises as 500 after audit. The `/admin/scheduler-runs` page surfaces the failure row with the error type and message. - **#179 review — `SCHEDULER_AUDIT_ACTIONS` listed action strings that don't actually appear in `audit_log`.** The list at `app/web/router.py:952` had `"marketplaces_sync_all"` (wrong — actual is `"marketplace.sync_all"`) plus `"data_refresh"` and `"scripts_run_due"` (which `app/api/sync.py` and `app/api/scripts.py` don't write). Corrected to the four actually-logged strings, with a comment pointing at the missing audit calls in sync/scripts as a follow-up. - **#179 review — `/api/admin/run-corporate-memory` skipped audit on unhandled exceptions** (same gap as `run_verification_detector` from the previous round). Mirrored the same try/except + `unhandled_error` audit pattern, so a DuckDB lock or unexpected SDK error from `collect_all()` now produces an audit row with the error type+message before re-raising as 500. Regression test in `tests/test_admin_run_endpoints.py::TestRunCorporateMemory::test_unhandled_exception_still_audits`. - **#179 review — `/api/admin/run-session-collector` skipped audit on unhandled exceptions** (third occurrence of the same pattern, completes the trilogy of LLM-pipeline endpoints). Mirrored the same try/except + `unhandled_error` audit pattern from the other two endpoints, so a `PermissionError` walking `/home`, an `OSError` on `/data/user_sessions` mkdir, or any other unhandled exception from `collector.run()` now produces an audit row before re-raising as 500. Regression test in `tests/test_admin_run_endpoints.py::TestRunSessionCollector::test_unhandled_exception_still_audits`. - **#179 review — `/profile/sessions` 500-ed on transient `stat()` failure.** The previous implementation used `sorted(glob, key=lambda p: p.stat().st_mtime)`; if any single jsonl file's stat call raised (race with delete, EACCES from a remount, etc.), the whole sort raised and the page returned 500 instead of just dropping that one row. Reworked the gather: stat each path under try/except into a `(path, stat)` list, then sort the already-statted entries. Bad files are silently dropped from the listing. Regression test in `tests/test_web_ui.py::TestAdminRoleGuards::test_profile_sessions_page_tolerates_stat_failures`. ### Added - `/admin/scheduler-runs` — read-only admin page showing the last 200 audit-log entries from scheduler-driven actions (`run_session_collector`, `run_verification_detector`, `run_corporate_memory`, `marketplace.sync_all`). New `AuditRepository.query_actions(actions, limit)` query helper, new admin nav entry under the Admin dropdown. `data-refresh` (`POST /api/sync/trigger`) and `script-runner` (`POST /api/scripts/run-due`) are scheduler jobs but don't write to `audit_log` today, so they can't appear here yet. Failed scheduler ticks (HTTP 401, network errors) don't reach the audit_log either — those still live only in `docker logs agnes-scheduler-1`; the page calls that out with a hint to set `SCHEDULER_API_TOKEN` if no rows show up. - `/profile/sessions` — self-service user page in the user menu, showing all session jsonls the caller uploaded via `agnes push` joined against `session_extraction_state`. Each row shows uploaded_at, file size, status badge (`pending` / `processed` / `extracted`), processed_at, `items_extracted`, and a per-row Download button. The page docstring explicitly calls out that `items_extracted = 0` means the verification detector ran successfully but the LLM found no claims worth tracking — that's the documented "no items" outcome, not a broken pipeline. Closes the gap surfaced during the e2e test of #176 where a user could see their sessions on disk and process them through the LLM but had no UI to inspect what happened. - `GET /profile/sessions/` — owner-only download of a single jsonl. Auth via `get_current_user`; path safety locks the served file under `${DATA_DIR}/user_sessions//` and rejects path-traversal / nested-component / non-`.jsonl` / dotfile filenames with 404 (never 403, so existence of files belonging to other users is not leaked). `Content-Disposition: attachment` returns the file as a download. ### Internal - `tests/test_packaging.py` — guards against `anthropic`/`openai` slipping back into dev extras. - `tests/test_setup_ai_block.py` — overlay seeding contract for `POST /api/admin/configure`. - `tests/test_llm_provider_env_fallback.py` — env fallback + fail-fast for `create_extractor_from_env_or_config`. - `tests/test_admin_run_endpoints.py` — admin gating + scheduler registration + endpoint contract for the three new run-* endpoints. - `tests/test_docker_compose.py` — pins the compose contract: the two side-car services must not reappear under either Compose file. - `tests/test_corporate_memory_page.py` — pending-banner contract (admin sees, non-admin doesn't). - `tests/test_health_session_pipeline.py` — session-pipeline staleness check across cold-start + ok + warning + never-processed cases. - `tests/test_instance_config_overlay.py` — pins overlay env-ref resolution + the three LLM consumers reading from `app.instance_config` (#179 review). - `tests/test_scheduler.py` — `TestLLMPipelineCadenceEnvVars` + `TestVerificationDetectorGraceFollowsCadence` pin the new env-var-driven cadences and the single-source-of-truth contract between scheduler and health-check grace (#179 review). - `docs/architecture.md` — Services table updated to reflect the scheduler-v2 cadence map. ## [0.34.0] — 2026-05-04 End-to-end clean-analyst-bootstrap rewrite. The web `/setup` page now produces a single unified paste prompt that, dropped into Claude Code in an empty folder, fully bootstraps a workspace — installs the CLI, authenticates, fetches `CLAUDE.md`, installs SessionStart/End hooks, runs the first data refresh, and writes a human-readable workspace docs file (`AGNES_WORKSPACE.md`). The admin-vs-analyst layout split (introduced as `?role=` mid-cycle) was collapsed before merge: every caller sees the same flow, with the marketplace + plugins block emitted iff the caller has plugin grants. 26 implementation tasks across 6 phases plus a 10-task unification follow-up. ### Changed - **BREAKING** CLI binary renamed from `da` to `agnes`. No backward-compat alias is shipped. Update shell aliases, hook commands in any pre-existing `.claude/settings.json`, scripts, and cron jobs. Reinstall via `uv tool install `; the wheel now ships an `agnes` entry point. - **BREAKING** Environment variables and config dir renamed: `DA_CONFIG_DIR/DA_SERVER/DA_NO_UPDATE_CHECK/DA_LOCAL_DIR/DA_TOKEN/DA_STREAM_RETRIES` → `AGNES_*`; `~/.config/da/` → `~/.config/agnes/`. Hard cutover, no fallback. Existing analysts re-authenticate via `agnes auth import-token`. - **BREAKING** Analyst bootstrap rewritten end-to-end. `da analyst setup` is removed; replaced by `agnes init` (non-interactive, requires `--server-url` and `--token`). `da sync` is split into `agnes pull` (refresh) and `agnes push` (upload). `da fetch` is folded into `agnes snapshot create`. `da metrics list/show` is folded into `agnes catalog --metrics`; `da metrics import/export/validate` move to `agnes admin metrics {import,export,validate}`. The `da analyst` namespace is removed; the workspace status command is now `agnes status`. The previous `da status` (server-health overview) becomes `agnes diagnose system`. - **BREAKING** Workspace layout simplified. Removed: `data/parquet/`, `data/duckdb/`, `data/metadata/`, `user/artifacts/`. Canonical paths: `server/parquet/` (synced parquets), `user/duckdb/analytics.duckdb` (DuckDB views), `user/snapshots/` (ad-hoc snapshots), `user/sessions/` (recorded sessions). Lazy-mkdir contract — no empty pre-allocated directories. - **BREAKING** `/setup` is now a single unified flow regardless of caller's role. The `?role=` query parameter (introduced earlier in this Unreleased cycle but never released) is removed before merge — no migration needed. The admin tile is gone. PAT scope is uniform: every install-page mint uses `scope=general` with `expires_in_days=90`, calling the existing `POST /auth/tokens` endpoint. The `bootstrap-analyst` 1 h-clamped scope is no longer used from `/setup` (still defined in code for future reuse, see open issue for redesign). The marketplace + plugins block is emitted iff the caller has plugin grants in `resource_grants`. `agnes init` is now part of every setup flow (admin and analyst alike) — it's the workspace-rails delivery mechanism. `/install` continues to 302 to `/setup`. - `CLAUDE.md` server-side template + repo-root `CLAUDE.md` updated to reference the new CLI verbs and workspace paths. The admin UI for the `claude_md_template` DB override (`/admin/workspace-prompt`) renders a yellow banner when the saved override contains legacy strings (`data/parquet/`, `da sync`, `da fetch`, `da analyst setup`, `da metrics list/show`); admins re-author and save to clear it. Migration is manual. ### Added - `agnes init ` — non-interactive workspace bootstrap orchestrator. 8 steps: detect existing workspace, verify PAT (`GET /api/catalog/tables`), save config + token globally, fetch `CLAUDE.md` from `/api/welcome`, install SessionStart/End hooks via `cli/lib/hooks.py:install_claude_hooks`, write `CLAUDE.local.md` stub (preserved on `--force`), run first `agnes pull`, write `AGNES_WORKSPACE.md`. Errors render via `cli/error_render.py:render_error()` with typed kinds (`auth_failed`, `server_unreachable`, `partial_state`, `manifest_unauthorized`). - `agnes pull` / `agnes push` — split from the old `da sync` / `da sync --upload-only`. `--quiet` / `--json` / `--dry-run` flags. SessionStart hook runs `agnes pull --quiet`; SessionEnd hook runs `agnes push --quiet`. - `agnes snapshot create

` — folded from `da fetch`. Adds `if not local_db.exists()` guard so `agnes snapshot create` no longer silently materializes an empty DuckDB file when run before any `agnes pull`. - `agnes catalog --metrics` (replaces `da metrics list`) and `agnes catalog --metrics --show ` (replaces `da metrics show`). - `agnes admin metrics {import,export,validate}` — write paths relocated from the deleted `da metrics` namespace. - `agnes diagnose system` — server-side health check (was the old `da status`). - `AGNES_WORKSPACE.md` — human-readable workspace docs file generated by `agnes init` in the workspace root. Documents global install, workspace layout, hooks, cheat sheet, uninstall recipe. - PAT request body now accepts `scope: str = "general"` and `ttl_seconds: int | None = None` fields. PATs minted with `scope="bootstrap-analyst"` are TTL-clamped to ≤ 1 h server-side. Existing `expires_in_days` field continues to work; `ttl_seconds` wins when both are set. `ttl_seconds` upper bound is 315_360_000 (matches `expires_in_days <= 3650` cap). JWT carries the `scope` claim via new `extra_claims` parameter on `create_access_token`; reserved keys (`sub`/`email`/`typ`/`iat`/`jti`/`exp`) cannot be overridden via `extra_claims`. Audit log includes the scope. - `cli/lib/` shared-library tree with `cli/lib/pull.py:run_pull` (data-refresh primitive callable from both the Typer wrapper and `agnes init`) and `cli/lib/hooks.py:install_claude_hooks` (workspace-scoped, idempotent Claude Code hook installer). - `_scan_legacy_strings` helper + `legacy_strings_detected` field on `GET /api/admin/workspace-prompt-template` — server scans saved CLAUDE.md overrides for stale CLI verbs / paths; the admin UI banner consumes the field. - `/setup` pre-flight check (step 4, gated on the marketplace block being present) now verifies `claude --version` in addition to `git --version`. Both binaries are needed by `claude plugin marketplace add` and the git-clone fallback — checking them together surfaces a clear "install X" message instead of a confusing downstream error. Install hints: `npm i -g @anthropic-ai/claude-code` for Linux/WSL plus a doc URL (`https://docs.claude.com/claude-code`) for macOS / Windows native installers. ### Fixed - `agnes pull` (formerly `da sync`) no longer creates `.claude/rules/` when the corporate-memory bundle is empty. - `agnes pull` no longer creates `server/parquet/` when the manifest is empty (mkdir is lazy — only on first per-table write). - `agnes snapshot create` (formerly `da fetch`) no longer materializes an empty `user/duckdb/analytics.duckdb` when run before any `agnes pull`. Friendly hint redirects to `agnes pull`. - Workspace `agnes status` reads from the canonical `server/parquet/` and `user/duckdb/analytics.duckdb` paths (was reading legacy `data/parquet/`, `data/metadata/last_sync.json`). - `agnes init` and `agnes pull` errors now use the `cli/error_render.py` typed-error renderer (added in 0.32.0), so analyst-facing error UX matches the structured shape `agnes query --remote` already produces. - **Schema v24 migration retry path is no longer dead** (Devin Review on `db.py:1757`, escalated from advisory to critical on rescan). Pre-fix: when `_v23_to_v24_finalize` had materialized BQ rows to migrate but `data_source.bigquery.project` was not configured, it logged a warning per row and returned normally. The schema_version then bumped to 24 unconditionally, the `if current < 24:` gate in `_ensure_schema` skipped the function on every subsequent startup, and the affected rows kept their DuckDB-flavor `bq."ds"."tbl"` source_query forever — which the new `_wrap_admin_sql_for_jobs_api` wrapping path rejects as unparseable BQ SQL with no automatic recovery. The "set the project and restart to retry" log hint pointed at a code path that no longer ran. Fix: the migration now raises `RuntimeError` BEFORE the schema_version bump when it has rows to migrate but no project_id, blocking startup with a clear actionable error pointing at `data_source.bigquery.project`. Operator configures the project, restarts, and the migration completes (schema_version is still at 23, so the `if current < 24:` gate fires). Side effect: a BQ-using deployment that hasn't set the project blocks startup until they do — that's the right call for a config error that would otherwise silently break materialized tables. Two regression tests in `test_schema_v24_source_query_rewrite.py`: `test_v24_raises_when_project_not_configured_and_rows_need_migration` (raise + version-stays-at-23) and `test_v24_skips_clean_when_no_rows_match_even_without_project` (no-rows-no-block invariant). - **`agnes admin register-table` UX**: three real-world feedback items addressed. - **`--query-mode materialized` now requires `--bucket`** (client-side validation; exits with a clear error before hitting the server). The previous help docstring claimed `--bucket` was *ignored* for materialized rows, but the value is actually load-bearing — `agnes schema ` builds the BQ identifier as `bq..`, so an empty bucket registered the row but broke subsequent schema/describe with HTTP 400 "unsafe BQ identifier in registry". Docstring rewritten to reflect reality. - **Post-success hints**: after a successful registration the CLI now points operators at the two follow-ups they routinely miss: (a) `agnes setup first-sync` to materialize the parquet (registration alone doesn't trigger a build; `agnes pull` reports "Updated 0 tables" until the scheduler tick), and (b) `agnes admin grant create table ` to make the row visible in `agnes catalog` for non-admin users (catalog is RBAC-filtered). - Test coverage: `tests/test_cli_admin_materialized.py::test_register_materialized_without_bucket_fails_with_clear_error` and `test_register_table_emits_first_sync_and_grant_hints`. - **`agnes query --remote` SQL rewriter no longer corrupts output when the GCP project ID contains a registered table name as a hyphen-delimited word** (Devin Review on `query.py:464`). The previous iterative rewrite (one `re.sub(\b\b, ...)` per registered name) was vulnerable to cross-contamination: e.g. project `my-ue-project` + registered `orders` + registered `ue` → iter 1 rewrites `orders` to `\`my-ue-project.fin.orders\``, iter 2's `\bue\b` then matches the `ue` INSIDE `my-ue-project` and corrupts the iter-1 path. Fix: replaced the iteration with a SINGLE `re.sub` whose alternation regex (sorted longest-first) handles every name in one pass, so freshly-inserted backticked text isn't re-scanned. The fallback at `query.py:576` (per-table SELECT * on BQ parse error) caught the corrupted output as `bq_bad_request` so impact was over-estimation rather than fail-open, but the partition-pruning benefit of #171 is now preserved for projects whose IDs share a hyphen-segment with a registered table name. Regression test in `tests/test_api_query_guardrail.py::test_rewrite_helper_does_not_corrupt_when_project_id_contains_registered_name`. - **BigQuery materialize TTL reclaim is no longer dead code** (Devin Review on `extractor.py:166`). `_try_acquire_file_lock` used to call `open(lock_path, mode="w")` BEFORE checking the lock-file mtime, which truncated the file and refreshed mtime to *now* on every invocation. The subsequent `time.time() - lock_path.stat().st_mtime` always saw age ~0, so `age > TTL` never fired, and `materialize.lock_ttl_seconds` was a silently no-op config knob. Fix: stat the lock path BEFORE any `open()` to read the real pre-probe mtime; if older than TTL, unlink (forcing a fresh inode for the next `open + flock`); only then probe. Two regression tests added: `test_stale_held_lock_is_reclaimed_despite_live_holder` exercises the full reclaim path with a still-living fcntl holder, `test_failed_probe_does_not_self_refresh_lock_mtime` pins that a failed acquisition doesn't pathologically loop. Residual cross-process risk (a genuinely overrunning materialize past TTL races a fresh attempt) is documented in the helper docstring; in-process `threading.Lock` keyed on `table_id` blocks the single-process race. - **`agnes init --token X` now correctly uses the explicit token in the verify call**, even when `~/.config/agnes/token.json` already holds a stale token from a prior install. Pre-fix `cli.config.get_token()` read the on-disk file first and only fell back to env vars, so step 2 (PAT-verify) ran with the stale token and failed with a confusing 401 — even though the `--token` arg was valid (Devin Review on `init.py:99`). Fix: a `ContextVar`-based override in `cli.config` short-circuits `get_token()` before the file read; `_override_server_env` (used by both `agnes init` and `agnes pull`'s `run_pull`) sets it for the duration of the call. Async-safe (each task sees its own override) and leak-proof (resets on context exit). - **`agnes status` sessions counter now reads the same source as `agnes push`** — `~/.claude/projects//` (Claude Code's actual write path) with the legacy `/user/sessions/` as a fallback, via `cli.lib.claude_sessions.list_session_files()`. Pre-fix the counter only checked the legacy dir and always reported 0 in workspaces bootstrapped with `agnes init` (since Claude Code never writes there). - **BigQuery materialize lock-reclaim docstring** at `connectors/bigquery/extractor.py:_try_acquire_file_lock` corrected: a still-running holder's `fcntl.flock` does NOT block the post-unlink reacquisition (new file = new inode = independent lock). The in-process `threading.Lock` keyed on `table_id` is the actual concurrency guard; cross-process protection (two schedulers on one workspace) relies on operators not running multiple concurrent schedulers AND on the TTL being well above the longest plausible COPY (24 h default). Documenting the residual risk so it isn't masked by a misleading "we're safe" comment (Devin Review on extractor.py:111). - **`agnes pull` now re-downloads parquets when the local file is missing, even if the recorded hash matches the server.** Pre-fix the download set was computed from `sync_state.json` hash equality alone — if the parquet had been deleted (manual `rm`, disk cleanup, a different workspace sharing the same global `~/.config/agnes/sync_state.json` writing one workspace's parquets while another reads sync_state and assumes "I already have these"), the hash-equal check would short-circuit the download and the next DuckDB view rebuild would fail on a missing file. Now the existence check on `/server/parquet/.parquet` runs alongside the hash compare; missing file → forced re-download regardless of hash. - **`agnes query --remote` no longer over-rejects narrow queries on partitioned/clustered BigQuery tables.** Closes #171. Pre-fix the `/api/query` cost guardrail dry-ran a synthetic `SELECT * FROM

` per registered remote-BQ row referenced by the user SQL, which forced BQ to estimate "full table scan" — column projection, predicate pushdown, and partition pruning were all ignored, producing scan-byte estimates up to ~30,000× larger than the actual query would scan. Narrow queries on big partitioned tables (the documented happy-path use case) were rejected with 400 `remote_scan_too_large` even when BQ's own dry-run reported single-digit MB. Now the guardrail rewrites the user SQL from DuckDB-flavor (bare registered names + `bq."".""`) to BQ-native (`` `..` ``) and runs ONE dry-run on the EXACT user SQL — partition pruning, column projection, and predicate pushdown all engage. Cap check uses the real estimate. Fallback: if BQ rejects the rewritten SQL with `bq_bad_request` (DuckDB-only syntax that doesn't translate, e.g. `::INT` casts), the guardrail falls back to the pre-fix per-table SELECT * estimate so a non-portable query still gets bounded; non-parse errors (forbidden / upstream) propagate as 502. Helpers exported as `_rewrite_user_sql_for_bq_dry_run` (test seam). - **Windows: `agnes` CLI no longer crashes on cs-CZ / non-UTF-8 consoles.** Two failure modes addressed (originally reported in #172 against the pre-rename `da` CLI; ported and broadened here): (1) `agnes pull` and any other Rich-progress-bar codepath crashed with `UnicodeEncodeError` because cp1250 / cp1252 cannot encode Rich's Braille spinner glyphs — `cli/main.py` now reconfigures `sys.stdout` / `sys.stderr` to UTF-8 with `errors="replace"` at import time when `sys.platform == "win32"`. (2) `agnes skills list` and `agnes skills show` crashed with `UnicodeDecodeError` reading skill markdown that contains em-dashes / accents — every `Path.read_text()` / `Path.write_text()` / `open()` call site in `cli/` (including ones not touched by #172, since several files were renamed in the bootstrap rewrite) now passes `encoding="utf-8"` explicitly. Defensive: also covers JSON / YAML config files that were ASCII-only in practice but were one non-ASCII value away from the same failure mode. - `agnes snapshot create … --estimate` in a pre-init directory no longer leaks an httpx `ConnectError` traceback to stderr. The estimate-guard fix (3d587681) let `--estimate` reach `api_post_json`, but the existing `except V2ClientError` clause didn't catch transport-layer errors when no server was configured (defaulted to `http://localhost:8000`). Now also catches `httpx.HTTPError` and renders the friendly hint `Run \`agnes init …\` first`. - `agnes push` now reads Claude Code session jsonls from `~/.claude/projects//` (where Claude Code actually writes them), instead of `/user/sessions/` (which the SessionEnd hook never populated — the previous code uploaded an empty list every time). Encoding logic in `cli/lib/claude_sessions.py` probes both Claude Code variants — older `/`→`-` and newer all-non-alphanumeric→`-` — and unions the result, so users who have upgraded Claude Code mid-project see sessions from both encoded dirs. Falls back to `/user/sessions/` for back-compat. ### Removed - `da analyst setup`, `da analyst status`, `da sync`, `da fetch`, `da metrics`. See **Changed** for replacements. - `da metrics` namespace as a top-level group (subcommands moved to `agnes catalog --metrics` for read-only views and `agnes admin metrics …` for write operations). - Legacy workspace directories `data/parquet/`, `data/duckdb/`, `data/metadata/`, `user/artifacts/`. Existing analyst workspaces should be reinitialized with `agnes init --server-url ... --token ... --force` (a fresh empty folder is recommended). - `_resolve_analyst_lines`, `_analyst_init_lines`, `_analyst_finale_lines` helpers in `app/web/setup_instructions.py` — the analyst-vs-admin layout split is gone. `role` parameter on `compute_default_agent_prompt`, `resolve_lines`, and `render_setup_instructions`. `?role=` query parameter on `/setup`. Admin tile (`