Merge pull request #188 from keboola/zs/combined-perf-and-clarity

release: 0.36.0 — perf + analyst-clarity bundle BQ query timeout knob, Caddy file_server parquet bypass, parallel parquet pulls, auto-upgrade self-update, Tier 1 event-loop unblocking, clean CLI errors + init progress + skip-materialize, workspace prompt decision tree + size hint.
2026-05-05 19:22:53 +02:00 · 2026-05-05 19:22:53 +02:00 · 1315f9f93c
commit 1315f9f93c
parent 4751094e1c e2f740d7ab
31 changed files with 1062 additions and 74 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -10,8 +10,29 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C

 ## [Unreleased]

+## [0.36.0] — 2026-05-05
+
+Combined performance + analyst-clarity bundle. Folds three previously-staged work streams into one PR (#188): the long-running `agnes query --remote` timeout (#181), the Caddy parquet-download bypass (#182), and Pavel's #185 Phase 1 trace findings (silent 44-min first-init, opaque CLI tracebacks, no analyst-Claude size signal). Also performs the Tier 1 event-loop unblocking — the five hottest BQ-touching endpoints were `async def` over synchronous DuckDB / BQ-extension calls, so a single heavy `agnes query --remote` froze every other request for the duration of the BQ wait. The image-side fixes ship in this release; for existing VMs, the new auto-upgrade.sh self-fetches the matching Caddyfile + compose overlays from `main` on its next 5-minute tick, so deployment requires no operator action beyond letting the cron run.
+
+### Added
+- **`data_source.bigquery.query_timeout_ms` config knob** (default 600 000 ms = 10 min). The DuckDB BigQuery extension's built-in default of 90 s was too tight for analyst-scale queries against view-backed BQ datasets — `agnes query --remote` would HTTP 400 with `Binder Error: Query execution exceeded the timeout. Job ID: …` whenever the underlying BQ job took longer than 90 s, even though the BQ job itself was healthy. The new knob is applied via `SET bq_query_timeout_ms` after every `LOAD bigquery` on every BQ-touching DuckDB session — the orchestrator's `_remote_attach` ATTACH path (`src/orchestrator.py`), the analytics-DB read-only reattach path (`src/db.py:_reattach_remote_extensions` — the primary `agnes query --remote` request path), the `BqAccess` session factory (`connectors/bigquery/access.py`), and the standalone extractor (`connectors/bigquery/extractor.py`). Sentinel `0` (or non-numeric / unparseable values) leaves the extension default in place so operators on legacy extension versions that don't recognise the setting aren't broken. Configurable via `/admin/server-config` UI. Note: BigQuery's `jobs.query` RPC caps the wait at ~200 s per call regardless of this setting; the extension polls on top so the effective ceiling is the value here but each poll is ~200 s. DuckDB emits an informational warning when the value is set above the BQ RPC cap — operators can safely ignore it.
+- **Per-user parallel parquet downloads in `agnes pull`** — the download loop in `cli/lib/pull.py` now uses a `ThreadPoolExecutor` with concurrency capped by the new `AGNES_PULL_PARALLELISM` env var (default 4, set 1 to restore pre-PR serial behavior). On a registry of N tables the wall-clock time drops from `Σ stream_download_seconds(table_i)` to roughly `max × ceil(N/4)`. Works hand-in-hand with the Caddy `file_server` change below: without it parallel client-side downloads would still queue on the single uvicorn worker; with it each request is its own caddy goroutine + sendfile, so 4-way parallelism actually delivers throughput. Per-table error semantics preserved — a failure on one table no longer aborts the rest of the batch.
+- **`agnes init` / `agnes pull --skip-materialize`** — opts the first sync out of materialized-mode tables (server-side scheduled-query parquets, often multi-GB). Pavel's #185 Phase 1: a single 6.3 GB `order_economics` parquet kept first init silent for 44 minutes. Materialized rows stay discoverable via `agnes catalog`; rerun without the flag once the analyst actually needs them locally.
+- **`agnes pull` progress bar** — Rich-driven aggregate transfer display rendered to stderr when not `--quiet` and not `--json`. Per-file label + bytes / total / rate / ETA, aggregated across the parallel `ThreadPoolExecutor` workers introduced earlier in this PR. Replaces the prior 0-stdout silence on first init.
+- **CLI clean-error wrapper** (`cli/main.py:_run_with_clean_errors`, new entry point in `pyproject.toml`) — `httpx.ReadTimeout` / `ConnectError` / `RemoteProtocolError` etc. used to dump a five-frame Python traceback to the analyst's terminal when a `agnes query --remote` against a slow BQ view timed out client-side. Now: one-line `Error: …` message + actionable hint (e.g. "narrow the WHERE on the partition column from `agnes catalog --json`, or run `agnes snapshot create --estimate`"), exit code 1. Full traceback is appended to `~/.config/agnes/last-error.log` so an operator can recover it for support without spamming the analyst's terminal. Implemented as `AgnesTransportError` raised from the `api_get` / `api_post` / `api_delete` / `api_patch` / `stream_download` helpers in `cli/client.py`; the top-level Typer wrapper renders it. Unhandled `Exception`s are caught at the same boundary, logged, and printed as "internal CLI error (see logfile)" so a Python traceback never leaks to the analyst.
+- **`scripts/ops/agnes-auto-upgrade.sh` now re-fetches Caddyfile + every compose overlay** from `keboola/agnes-the-ai-analyst@main` on every tick, hashes them, and triggers a `docker compose up -d` recreation when the hash changes — same path as an image-digest change. Pre-fix the script only watched `docker images` digests, so a Caddyfile or compose change in main never reached running VMs (only fresh boots ran `startup.sh`'s file fetch). Without this, the new file_server downloads-path below would land in the image but stay inert against an old Caddyfile. The script also self-updates from the same path so the very fix that watches config files isn't itself stuck on running VMs. Fail-soft on curl errors — keeps the existing file rather than blanking it.
+- **Caddy `file_server` for parquet downloads** — `GET /api/data/{table_id}/download` is now intercepted at the Caddy layer (TLS profile only) and served directly via sendfile/zero-copy from the data volume mounted read-only at `/srv` inside the caddy container. Caddy authorises every request via a new lightweight RBAC probe `GET /api/data/{table_id}/check-access` (returns 204 when the caller has read access on the table, 403 otherwise) using the `forward_auth` directive — the bulk byte transfer never touches uvicorn workers. Resolves a real production failure mode where a single multi-GB analyst pull held the app's only uvicorn worker for the duration of the stream and starved the UI / `/api/health` / every other API endpoint, eventually flipping the container to `unhealthy`. Path discovery uses Caddy's `try_files` over the known `extract.duckdb` v2 source subdirs (`bigquery/data/<id>.parquet`, `keboola/data/<id>.parquet`, `jira/data/<id>.parquet`); a parquet not at any of those paths transparently falls through to the existing app handler so legacy `src_data/parquet` layouts and future connectors keep working with no Caddyfile change. Non-Caddy deployments (dev `docker compose up` without `--profile tls`) continue to use the app handler unchanged.
+- **Workspace prompt: decision tree, common-mistakes callout, failure-mode dictionary** in `config/claude_md_template.txt` (the template `agnes init` writes to `<workspace>/CLAUDE.md`). Surfaces every catalog-row field analyst Claude should read before deciding which command to use (`query_mode`, `sql_flavor`, `where_examples`, `fetch_via`, `rough_size_hint`); explicitly binds `--estimate` to `agnes snapshot create` ONLY (was the most-failed first-try misuse — fails with `No such option: --estimate` on `agnes query`); calls out the `agnes fetch` → `agnes snapshot create` rename so stale-doc analysts don't run a non-command; documents the BQ permission model (server SA, not personal Google identity) and a 6-row failure-mode table mapping each common error wording to its cause + the right next step.
+- **`rough_size_hint` populated for `local` + `materialized` catalog rows** in `GET /api/v2/catalog` (was hardcoded `null` with a "Task 8" TODO). Reads the parquet file size at `${DATA_DIR}/extracts/<source_type>/data/<table_id>.parquet` and buckets into `small` (≤100 MiB), `medium` (≤1 GiB), `large` (≤10 GiB), `very_large` (>10 GiB). `remote` rows stay `null` for now (size requires a BQ INFORMATION_SCHEMA call; tracked separately). Lets analyst Claude pick `agnes snapshot create` over `agnes query --remote` by inspecting `agnes catalog --json` rather than discovering size empirically via a failed `--remote` round-trip.
+
+### Changed
+- **Tier 1 event-loop unblocking** — the five hottest BQ-touching endpoints (`POST /api/query`, `POST /api/v2/scan`, `POST /api/v2/scan/estimate`, `GET /api/v2/sample/{id}`, `GET /api/v2/schema/{id}`) were declared `async def` but invoked synchronous DuckDB / BQ-extension calls inside the body. Under uvicorn's single event loop that meant a single heavy `agnes query --remote` (waiting up to ~200 s for BQ's `jobs.query` to return) **froze every other request** — `/api/health`, the dashboard, auth, even another query — for the full duration of the BQ wait. Operators saw "VM idle, app frozen" symptoms during this work. Converted all five to plain `def` so FastAPI auto-offloads the blocking body to the anyio thread pool; the event loop stays free for non-BQ requests. Verified via 0-await audit (no `await` statements in the converted handlers, so the rename is safe). Tests: `tests/test_v2_*.py` were rewritten to call the handlers directly instead of `asyncio.run(...)` (which now fails on a non-coroutine return). Pairs with the thread-pool capacity bump below.
+- **`AGNES_THREADPOOL_SIZE` env var** (default 200, was anyio's stock 40) controls the FastAPI / Starlette thread pool capacity used by every plain-`def` route handler. Set in `app/main.py:lifespan` via `anyio.to_thread.current_default_thread_limiter().total_tokens`. 200 leaves comfortable headroom over the BQ extension's connection budget while keeping the per-process thread cost bounded — for the workload of <50 concurrent analysts this is well over what's needed; bump for higher concurrency.
+- **CLI update-banner now says `agnes` instead of `da`** (`cli/update_check.py:format_outdated_notice`). The string `[update] da X is out of date` had survived the `da` → `agnes` CLI rename and was the most-visible stale identifier in the analyst-facing surface — every CLI command printed it on stderr when a newer wheel was available.
+
 ### Fixed

+- **CLI ReadTimeout message reports the actual httpx timeout** (was hardcoded to `QUERY_TIMEOUT_S` = 300s). On a 30s-default call (`agnes catalog`, `agnes auth`, …) the analyst saw "didn't respond within the read timeout (300s)" while the call had actually given up after 30s — confusing and unactionable. The translator now takes the real timeout from the calling helper and renders it; the long-running-BQ advisory only appears for calls where the timeout was set ≥ 60s. Devin Review on PR #188.
 - Keboola sync now falls back to the legacy Storage-API client when the DuckDB Keboola extension's per-table scan fails, not just when the initial `ATTACH` fails. Two changes:
  - `kbcstorage>=0.9.0` is promoted from optional to core dependency. The legacy fallback path in `connectors/keboola/extractor.py:_extract_via_legacy` has been there since the extension landed, but until now the bare `from kbcstorage.client import Client` would crash any default install with `ModuleNotFoundError`.
  - `connectors/keboola/extractor.py:run` now wraps `_extract_via_extension` in a per-table try/except — on any per-table scan failure it retries via the legacy client. Previously, when `ATTACH` succeeded but the table-level `COPY (SELECT * FROM kbc."<bucket>"."<table>")` failed, the table was just marked failed with no retry.
--- a/54
+++ b/54
@ -34,6 +34,60 @@
 		-Server
 	}

+	# Direct file_server for parquet downloads — bypasses uvicorn so a
+	# multi-GB pull from one analyst can't starve the app workers and
+	# block UI / health / API for everyone else. forward_auth calls the
+	# app's lightweight ``/api/data/{id}/check-access`` (RBAC only,
+	# ~1 ms) on every request; on 2xx Caddy serves the file directly
+	# via sendfile/zero-copy from the data volume mounted read-only.
+	#
+	# Path layout matches `app/api/data.py`'s extract.duckdb v2 search:
+	#   /data/extracts/<source_type>/data/<table_id>.parquet
+	# try_files probes known source subdirs in order; first hit wins.
+	# If a deployment adds a new connector and lands parquets at a fresh
+	# subdir, extend the try_files list. Anything that misses falls
+	# through to the app reverse_proxy below — so an unmapped source
+	# degrades to "downloads work, just through uvicorn" — never 404.
+	@download path_regexp tid ^/api/data/([^/]+)/download$
+	handle @download {
+		forward_auth app:8000 {
+			uri /api/data/{re.tid.1}/check-access
+			# Bearer PAT or session cookie travels in Authorization
+			# / Cookie; copy_headers ensures the upstream sees them.
+			copy_headers Authorization Cookie
+		}
+		# Caddy's own /data is occupied by the caddy_data volume, so the
+		# agnes data dir is mounted at /srv (read-only) instead — see the
+		# `data:/srv:ro` line in docker-compose.yml's caddy service. The
+		# root + try_files combo therefore probes /srv/extracts/...
+		#
+		# Devin Review caught: `try_files A B C` rewrites the URI to its
+		# LAST entry when no file matches (per Caddy docs). Without an
+		# explicit "rewrite back to original URI" fallback, a parquet
+		# missing from all three known paths would get rewritten to the
+		# last static candidate (`/jira/data/<id>.parquet`), and the
+		# reverse_proxy below would forward THAT rewritten URI to
+		# app:8000 → app has no such route → 404. To make the documented
+		# "missed → falls through to app handler" promise hold, append
+		# the original `/api/data/<id>/download` path as the final
+		# try_files entry: when no file matches, the URI is rewritten
+		# back to the analyst-facing path and the app's `download_table`
+		# handler picks it up via the reverse_proxy fallback below.
+		root * /srv/extracts
+		try_files /bigquery/data/{re.tid.1}.parquet /keboola/data/{re.tid.1}.parquet /jira/data/{re.tid.1}.parquet /api/data/{re.tid.1}/download
+		@found file
+		handle @found {
+			header Content-Disposition "attachment; filename=\"{re.tid.1}.parquet\""
+			file_server
+		}
+		# Fallback: parquet not at any known static path → defer to app
+		# (handles legacy src_data/parquet layout + future connectors).
+		reverse_proxy app:8000 {
+			header_up X-Forwarded-Proto https
+			header_up X-Forwarded-Host {host}
+		}
+	}
+
 	reverse_proxy app:8000 {
 		# App's uvicorn runs with --proxy-headers, so stamping these
 		# ourselves makes OAuth callback URLs and Set-Cookie Secure
--- a/app/api/admin.py
+++ b/app/api/admin.py
@ -285,6 +285,24 @@ _KNOWN_FIELDS: dict[str, dict[str, dict]] = {
                        "`agnes snapshot create` suggestion. 0 disables the gate. Default 5368709120 = 5 GiB."
                    ),
                },
+                "query_timeout_ms": {
+                    "kind": "int",
+                    "default": 600000,
+                    "hint": (
+                        "DuckDB BigQuery extension query timeout (milliseconds). Applied "
+                        "via `SET bq_query_timeout_ms` after every `LOAD bigquery` on "
+                        "every BQ-touching DuckDB session (orchestrator remote-view "
+                        "ATTACH, BqAccess factory, standalone extractor). Extension "
+                        "default is 90 000 ms = 90 s, which is too tight for analyst "
+                        "queries against view-backed datasets — bumped to 600 000 ms = "
+                        "10 min by default. Set 0 to fall through to the extension "
+                        "default. Note: the underlying BQ jobs.query RPC caps the wait "
+                        "at ~200 s per call; the extension polls on top, so the "
+                        "effective ceiling is this value but each poll round-trip is "
+                        "~200 s. DuckDB itself emits a warning when this is set above "
+                        "~200 s — that warning is informational, not an error."
+                    ),
+                },
            },
        },
        "keboola": {
--- a/app/api/data.py
+++ b/app/api/data.py
@ -1,6 +1,6 @@
 """Data download endpoint — streaming parquet files."""

-from fastapi import APIRouter, Depends, HTTPException, Request
+from fastapi import APIRouter, Depends, HTTPException, Request, Response
 from fastapi.responses import FileResponse
 import duckdb

@ -12,6 +12,36 @@ from src.rbac import can_access_table
 router = APIRouter(prefix="/api/data", tags=["data"])


+@router.get("/{table_id}/check-access")
+async def check_access(
+    table_id: str,
+    user: dict = Depends(get_current_user),
+    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
+):
+    """Lightweight RBAC probe used by Caddy's ``forward_auth`` directive
+    to gate file_server-served parquet downloads without involving the
+    app's request workers in the bulk byte transfer.
+
+    Returns HTTP 204 No Content when the caller has read access to
+    ``table_id``; HTTP 403 (via ``can_access_table`` returning False)
+    otherwise. Caddy treats 2xx as authorized and forwards the request
+    to its own ``file_server`` block; non-2xx is returned to the client
+    verbatim.
+
+    Why a separate endpoint and not just ``HEAD /download``: ``HEAD`` on
+    the FileResponse-based ``download`` handler still opens the file and
+    runs stat() to populate Content-Length / ETag. ``forward_auth`` calls
+    this endpoint on every request, so the per-call cost matters; a pure
+    RBAC check is ~1 ms while a HEAD path involves filesystem walks
+    (``rglob`` for the parquet across source subdirs).
+    """
+    if not _SAFE_QUOTED_IDENTIFIER.match(table_id):
+        raise HTTPException(status_code=404, detail="Table not found")
+    if not can_access_table(user, table_id, conn):
+        raise HTTPException(status_code=403, detail="Access denied to this table")
+    return Response(status_code=204)
+
+
@router.get("/{table_id}/download")
 async def download_table(
    table_id: str,
@ -19,7 +49,16 @@ async def download_table(
    user: dict = Depends(get_current_user),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
 ):
-    """Stream a parquet file for download. Supports ETag for caching."""
+    """Stream a parquet file for download. Supports ETag for caching.
+
+    On Caddy-fronted deployments the matching Caddyfile rule intercepts
+    ``GET /api/data/{table_id}/download``, calls ``check-access`` via
+    ``forward_auth``, and serves the parquet directly via ``file_server``
+    — bypassing this handler entirely. This handler stays as the
+    canonical fallback for non-Caddy deployments (dev `docker compose
+    up`, alternative reverse proxies, direct :8000 access) where the
+    bulk transfer goes through uvicorn.
+    """
    # Reject unsafe table_id before any filesystem or DB operations.
    # Use the relaxed quoted-identifier check that allows dots and hyphens
    # (Keboola table IDs like "in.c-crm.orders") while still blocking
@ -53,7 +92,6 @@ async def download_table(
    etag = f'"{stat.st_mtime_ns}"'
    if_none_match = request.headers.get("if-none-match")
    if if_none_match == etag:
-        from starlette.responses import Response
        return Response(status_code=304)

    return FileResponse(
--- a/app/api/query.py
+++ b/app/api/query.py
@ -71,12 +71,23 @@ class QueryResponse(BaseModel):


@router.post("", response_model=QueryResponse)
-async def execute_query(
+def execute_query(
    request: QueryRequest,
    user: dict = Depends(get_current_user),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
 ):
-    """Execute SQL against the server analytics DuckDB."""
+    """Execute SQL against the server analytics DuckDB.
+
+    Plain ``def`` (not ``async def``) so FastAPI auto-offloads the call
+    to the anyio thread pool. The body invokes ``analytics.execute(sql)``
+    synchronously, which blocks for the full BQ jobs.query wait when a
+    referenced view resolves through the BQ extension. Under ``async def``
+    that block holds the single uvicorn event loop, freezing every other
+    request (UI, /api/health, auth) until the query returns. Plain ``def``
+    runs each invocation on its own thread, so heavy queries no longer
+    starve unrelated endpoints. See PR #188's CHANGELOG entry for the
+    Tier 1 event-loop unblocking rollout.
+    """
    sql_lower = request.sql.strip().lower()

    # Block everything except SELECT
--- a/app/api/v2_catalog.py
+++ b/app/api/v2_catalog.py
@ -2,10 +2,12 @@

 from __future__ import annotations
 from datetime import datetime, timezone
+from pathlib import Path
 from fastapi import APIRouter, Depends
 import duckdb

 from app.auth.dependencies import get_current_user, _get_db
+from app.utils import get_data_dir as _get_data_dir
 from src.rbac import can_access_table
 from src.repositories.table_registry import TableRegistryRepository
 from app.api.v2_cache import TTLCache
@ -43,6 +45,52 @@ def _fetch_hint(table_id: str, source_type: str) -> str:
    return "already local — query directly via `agnes query`"


+# Coarse size buckets for `rough_size_hint`. Boundaries chosen so an analyst
+# Claude can decide tool by inspection: anything `large` or worse implies
+# `agnes snapshot create` over `agnes query --remote`. Numbers reflect the
+# default `bq_max_scan_bytes` 5 GiB ceiling — at "large" you're already at
+# half the per-query gate and a naive `--remote` is likely to refuse.
+_SIZE_BUCKETS = (
+    (10 * 2**20, "small"),     # ≤10 MiB
+    (100 * 2**20, "small"),    # ≤100 MiB still small (analyst-laptop scale)
+    (1 * 2**30, "medium"),     # ≤1 GiB
+    (10 * 2**30, "large"),     # ≤10 GiB
+)
+
+
+def _bucket_size(byte_count: int) -> str:
+    for cap, label in _SIZE_BUCKETS:
+        if byte_count <= cap:
+            return label
+    return "very_large"
+
+
+def _materialized_size_hint(table_id: str, source_type: str, query_mode: str) -> str | None:
+    """Return a rough size bucket for a row whose data is on the server's
+    local filesystem (any `query_mode` that produces a parquet — `local` and
+    `materialized`). Returns ``None`` for `remote` (size requires a BQ
+    INFORMATION_SCHEMA round-trip; tracked separately) and for tables whose
+    parquet hasn't been materialised yet so the AI gets ``null`` not a
+    misleading "small".
+
+    Layout matches the v2 extract.duckdb contract:
+      ${DATA_DIR}/extracts/<source_type>/data/<table_id>.parquet
+    """
+    if query_mode == "remote":
+        return None
+    if not source_type:
+        return None
+    try:
+        path = Path(_get_data_dir()) / "extracts" / source_type / "data" / f"{table_id}.parquet"
+        if not path.exists():
+            return None
+        return _bucket_size(path.stat().st_size)
+    except Exception:
+        # Filesystem stat() race / permissions / weird DATA_DIR — fall back
+        # to null rather than crash the whole catalog response.
+        return None
+
+
 def build_catalog(conn: duckdb.DuckDBPyConnection, user: dict) -> dict:
    rows = _table_rows_cache.get(_TABLE_ROWS_KEY)
    if rows is None:
@ -66,7 +114,10 @@ def build_catalog(conn: duckdb.DuckDBPyConnection, user: dict) -> dict:
            "sql_flavor": _flavor_for(r.get("source_type") or ""),
            "where_examples": _examples_for(r.get("source_type") or ""),
            "fetch_via": _fetch_hint(r["id"], r.get("source_type") or ""),
-            "rough_size_hint": None,  # populated by Task 8 schema endpoint when called
+            "rough_size_hint": _materialized_size_hint(
+                r["id"], r.get("source_type") or "",
+                r.get("query_mode") or "local",
+            ),
        })

    return {
@ -76,8 +127,17 @@ def build_catalog(conn: duckdb.DuckDBPyConnection, user: dict) -> dict:


@router.get("/catalog")
-async def catalog(
+def catalog(
    user: dict = Depends(get_current_user),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
 ):
+    # Plain ``def`` so FastAPI auto-offloads to the anyio thread pool —
+    # build_catalog now calls `_materialized_size_hint` for every visible
+    # row, which does sync `Path.stat()` / `Path.exists()` on the data
+    # volume. On local FS that's microseconds, but on a network-mounted
+    # DATA_DIR (NFS / CIFS / GCS-FUSE) those calls can block. Plain ``def``
+    # means each request runs on its own thread; the event loop stays
+    # free for non-catalog traffic. Mirrors the Tier 1 conversion of
+    # /api/query, /api/v2/scan, /api/v2/sample, /api/v2/schema —
+    # Devin Review on PR #188.
    return build_catalog(conn, user)
--- a/app/api/v2_sample.py
+++ b/app/api/v2_sample.py
@ -104,13 +104,15 @@ def build_sample(


@router.get("/sample/{table_id}")
-async def sample(
+def sample(
    table_id: str,
    n: int = Query(default=5, ge=1, le=_MAX_N),
    user: dict = Depends(get_current_user),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
    bq: BqAccess = Depends(get_bq_access),
 ):
+    # Plain ``def`` — opens a `bq.duckdb_session()` and runs sync queries
+    # through the BQ extension. See PR #188 Tier 1 entry.
    try:
        return build_sample(conn, user, table_id, n=n, bq=bq)
    except FileNotFoundError:
--- a/app/api/v2_scan.py
+++ b/app/api/v2_scan.py
@ -218,12 +218,17 @@ def _avg_bytes_for_type(t: str) -> int:


@router.post("/scan/estimate")
-async def scan_estimate_endpoint(
+def scan_estimate_endpoint(
    raw: dict,
    user: dict = Depends(get_current_user),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
    bq: BqAccess = Depends(get_bq_access),
 ):
+    # Plain ``def`` so FastAPI auto-offloads to the anyio thread pool — the
+    # estimate path calls into google-cloud-bigquery's `client.query(...,
+    # dry_run=True)` which blocks until BQ returns the dry-run cost. Under
+    # ``async def`` that wait holds the event loop. See PR #188's Tier 1
+    # entry for the wider rollout.
    try:
        return estimate(conn, user, raw, bq=bq)
    except WhereValidationError as e:
@ -374,7 +379,7 @@ def run_scan(


@router.post("/scan")
-async def scan_endpoint(
+def scan_endpoint(
    raw: dict,
    user: dict = Depends(get_current_user),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
--- a/app/api/v2_schema.py
+++ b/app/api/v2_schema.py
@ -209,12 +209,14 @@ def build_schema(


@router.get("/schema/{table_id}")
-async def schema(
+def schema(
    table_id: str,
    user: dict = Depends(get_current_user),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
    bq: BqAccess = Depends(get_bq_access),
 ):
+    # Plain ``def`` — opens a `bq.duckdb_session()` and runs sync metadata
+    # queries through the BQ extension. See PR #188 Tier 1 entry.
    try:
        return build_schema(conn, user, table_id, bq=bq)
    except NotFound:
--- a/app/main.py
+++ b/app/main.py
@ -141,6 +141,23 @@ async def lifespan(app):
        log_effective_policy()
    except Exception:
        pass  # never block startup on a logging convenience
+
+    # Bump anyio's default thread pool size from 40 → AGNES_THREADPOOL_SIZE
+    # (default 200). FastAPI auto-runs every plain `def` route handler in
+    # this pool — the Tier 1 endpoints converted in PR #188 (`/api/query`,
+    # `/api/v2/scan`, `/api/v2/sample`, `/api/v2/schema`) all block on
+    # synchronous DuckDB / BQ-extension calls inside the handler body and
+    # would otherwise serialise once 40 are in flight. 200 keeps the per-
+    # process working set well under the BQ extension's connection cap
+    # while leaving headroom for concurrent UI / health probes.
+    try:
+        import anyio.to_thread
+        size = int(os.environ.get("AGNES_THREADPOOL_SIZE", "200"))
+        anyio.to_thread.current_default_thread_limiter().total_tokens = size
+        logger.info("anyio thread pool capacity set to %d", size)
+    except Exception as e:
+        logger.warning("failed to bump anyio thread pool capacity: %s", e)
+
    yield
    from src.db import close_system_db
    close_system_db()
--- a/cli/client.py
+++ b/cli/client.py
@ -2,12 +2,14 @@

 import os
 import time
+import traceback
+from datetime import datetime, timezone
 from pathlib import Path
 from typing import Optional

 import httpx

-from cli.config import get_server_url, get_token
+from cli.config import _config_dir, get_server_url, get_token

 # Retry policy for transient failures during stream downloads. Scoped to
 # network issues and 5xx — 4xx (auth, 404, 400) is NOT retried. Tunable via
@ -21,6 +23,125 @@ _RETRY_BACKOFFS_S = (0.3, 1.0, 3.0)  # seconds before attempt 2, 3, 4
 QUERY_TIMEOUT_S = float(os.environ.get("AGNES_QUERY_TIMEOUT", "300"))


+# ── Transport-error translation ─────────────────────────────────────────
+# Pavel's Issue #185 Phase 3B caught the failure mode: when httpx raises
+# `ReadTimeout` / `ConnectError` / `RemoteProtocolError` and the CLI
+# command doesn't catch it, Typer dumps a five-frame Python traceback to
+# the analyst's terminal. That looks like a CLI bug to a non-Python user
+# and obscures the actionable signal ("server slow, try snapshot create").
+# Translate transport exceptions to `AgnesTransportError` with a typed
+# user-facing message, log the full traceback to `~/.config/agnes/last-
+# error.log` for debug, and let the top-level CLI handler render the
+# clean message + exit non-zero.
+
+_LOG_FILE = _config_dir() / "last-error.log"
+
+
+class AgnesTransportError(Exception):
+    """Network / transport failure with a user-actionable message.
+
+    Raised by the api_* / stream_download helpers when httpx surfaces a
+    connection / timeout / protocol error. The CLI's top-level Typer
+    handler catches this, prints `.user_message` (NOT the traceback),
+    and exits non-zero. Full traceback goes to ``~/.config/agnes/last-
+    error.log`` so an operator can recover it for support.
+    """
+
+    def __init__(self, user_message: str, *, hint: str = "", logfile_path: Path | None = None):
+        super().__init__(user_message)
+        self.user_message = user_message
+        self.hint = hint
+        self.logfile_path = logfile_path
+
+
+def _log_traceback(exc: BaseException, *, context: str) -> Path:
+    """Append a timestamped traceback to ``~/.config/agnes/last-error.log``
+    and return the path. Best-effort — never raises (a logging failure
+    must not mask the original error)."""
+    try:
+        with open(_LOG_FILE, "a", encoding="utf-8") as f:
+            ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
+            f.write(f"\n=== {ts} {context} ===\n")
+            traceback.print_exception(type(exc), exc, exc.__traceback__, file=f)
+    except Exception:
+        pass
+    return _LOG_FILE
+
+
+def _translate_transport_error(
+    exc: Exception, *, context: str, timeout_s: float | None = None,
+) -> AgnesTransportError:
+    """Map httpx transport exceptions to user-facing CLI messages. The
+    mapping is intentionally pragmatic — analysts care about "what do I
+    do next", not the gRPC / TCP detail.
+
+    `timeout_s`, when supplied, is the actual httpx timeout used by the
+    failing call so the ReadTimeout message reports the real wait window
+    (a `agnes catalog` GET dies at 30s, not 300s — Devin Review on PR
+    #188 caught the original signature hardcoding `QUERY_TIMEOUT_S`,
+    which only matches `agnes query --remote`)."""
+    log = _log_traceback(exc, context=context)
+    if isinstance(exc, httpx.ReadTimeout):
+        wait_s = timeout_s if timeout_s is not None else QUERY_TIMEOUT_S
+        # The "long-running BQ" advisory only makes sense when the call
+        # actually hit the query path (timeout ≥ ~60s). For short calls
+        # (the 30s default on `agnes catalog` etc.) it's just confusing.
+        if wait_s >= 60:
+            hint = (
+                "If this is `agnes query --remote` against a heavy BQ view, "
+                "the underlying BQ job took longer than the wait window. Try:\n"
+                "  • narrow the WHERE (especially the partition column from `agnes catalog --json`)\n"
+                "  • `agnes snapshot create <table> ... --estimate` to materialize once + query locally\n"
+                "  • set AGNES_QUERY_TIMEOUT=600 for a longer client-side wait\n"
+                f"Full traceback: {log}"
+            )
+        else:
+            hint = (
+                "Server is slow or unreachable. Check `agnes status`; "
+                "re-run if transient.\n"
+                f"Full traceback: {log}"
+            )
+        return AgnesTransportError(
+            f"Server didn't respond within the read timeout ({wait_s:.0f}s) "
+            f"for {context}.",
+            hint=hint,
+            logfile_path=log,
+        )
+    if isinstance(exc, httpx.ConnectError):
+        return AgnesTransportError(
+            f"Can't reach the agnes server for {context}.",
+            hint=(
+                "Check the server URL with `agnes status`, network reachability "
+                "(VPN / DNS / firewall), and the TLS-trust setup if this is a "
+                f"corporate-CA deployment.\nFull traceback: {log}"
+            ),
+            logfile_path=log,
+        )
+    if isinstance(exc, (httpx.RemoteProtocolError, httpx.ReadError, httpx.WriteError)):
+        return AgnesTransportError(
+            f"Connection broke mid-flight on {context}.",
+            hint=(
+                "Usually a transient network blip. Re-run the command. If it "
+                f"keeps happening, check `agnes status`.\nFull traceback: {log}"
+            ),
+            logfile_path=log,
+        )
+    if isinstance(exc, httpx.TimeoutException):
+        return AgnesTransportError(
+            f"Network timeout on {context}.",
+            hint=f"Re-run; if persistent, check the server.\nFull traceback: {log}",
+            logfile_path=log,
+        )
+    # Anything else: re-wrap with a generic message so the CLI doesn't
+    # dump the traceback. We'd prefer a typed translation; if you hit
+    # this branch, add a clause above.
+    return AgnesTransportError(
+        f"Unexpected error on {context}: {type(exc).__name__}.",
+        hint=f"Full traceback: {log}",
+        logfile_path=log,
+    )
+
+
 def get_client(timeout: float = 30.0) -> httpx.Client:
    """Get an authenticated httpx client."""
    token = get_token()
@ -35,23 +156,35 @@ def get_client(timeout: float = 30.0) -> httpx.Client:


 def api_get(path: str, *, timeout: float = 30.0, **kwargs) -> httpx.Response:
+    try:
        with get_client(timeout=timeout) as client:
            return client.get(path, **kwargs)
+    except httpx.HTTPError as exc:
+        raise _translate_transport_error(exc, context=f"GET {path}", timeout_s=timeout) from exc


 def api_post(path: str, *, timeout: float = 30.0, **kwargs) -> httpx.Response:
+    try:
        with get_client(timeout=timeout) as client:
            return client.post(path, **kwargs)
+    except httpx.HTTPError as exc:
+        raise _translate_transport_error(exc, context=f"POST {path}", timeout_s=timeout) from exc


 def api_delete(path: str, *, timeout: float = 30.0, **kwargs) -> httpx.Response:
+    try:
        with get_client(timeout=timeout) as client:
            return client.delete(path, **kwargs)
+    except httpx.HTTPError as exc:
+        raise _translate_transport_error(exc, context=f"DELETE {path}", timeout_s=timeout) from exc


 def api_patch(path: str, *, timeout: float = 30.0, **kwargs) -> httpx.Response:
+    try:
        with get_client(timeout=timeout) as client:
            return client.patch(path, **kwargs)
+    except httpx.HTTPError as exc:
+        raise _translate_transport_error(exc, context=f"PATCH {path}", timeout_s=timeout) from exc


 def _is_transient(exc: Exception) -> bool:
@ -98,7 +231,23 @@ def stream_download(path: str, target_path: str, progress_callback=None) -> int:
            if attempt == _RETRY_ATTEMPTS or not _is_transient(exc):
                break
            time.sleep(_RETRY_BACKOFFS_S[min(attempt, len(_RETRY_BACKOFFS_S) - 1)])
-    # Clean up any leftover tmp, then surface the last exception.
+    # Clean up any leftover tmp, then surface the last exception. Translate
+    # transport errors (timeouts, connection drops, protocol errors) to
+    # AgnesTransportError so the CLI prints a clean message instead of a
+    # Python traceback (Pavel's #185 Phase 3B). HTTPStatusError (4xx/5xx
+    # response from the server) is NOT a transport failure and must
+    # re-raise verbatim so the caller's status-code handling + the rich
+    # server error body (e.g. 401 with "token expired", 403 with
+    # cross_project_forbidden detail) reach the analyst — Devin Review on
+    # PR #188 caught: HTTPStatusError is a subclass of HTTPError, so the
+    # generic isinstance(HTTPError) translation was eating status codes.
    tmp_path.unlink(missing_ok=True)
    assert last_exc is not None
+    if isinstance(last_exc, httpx.HTTPStatusError):
+        raise last_exc
+    if isinstance(last_exc, httpx.HTTPError):
+        raise _translate_transport_error(
+            last_exc, context=f"GET {path} (stream → {target_path})",
+            timeout_s=300.0,
+        ) from last_exc
    raise last_exc
--- a/cli/commands/init.py
+++ b/cli/commands/init.py
@ -64,6 +64,16 @@ def init(
    token: str = typer.Option(..., "--token", help="Personal access token"),
    force: bool = typer.Option(False, "--force", help="Re-initialize an existing workspace"),
    workspace_str: Optional[str] = typer.Option(None, "--workspace", help="Target dir (default: cwd)"),
+    skip_materialize: bool = typer.Option(
+        False, "--skip-materialize",
+        help=(
+            "Skip materialized-mode tables on the first pull. The first "
+            "init can otherwise spend tens of minutes silently downloading "
+            "a single multi-GB scheduled-query parquet. Materialized rows "
+            "are still discoverable via `agnes catalog`; rerun `agnes pull` "
+            "without this flag once you actually need them locally."
+        ),
+    ),
 ):
    """Bootstrap workspace: auth, CLAUDE.md, hooks, first pull, AGNES_WORKSPACE.md."""
    workspace = Path(workspace_str).resolve() if workspace_str else Path.cwd()
@ -176,7 +186,15 @@ def init(
    # exception escaping here is a programming error worth surfacing.
    # ------------------------------------------------------------------
    try:
-        result: PullResult = run_pull(server_url, token, workspace)
+        # `agnes init` always runs interactively (analyst typing the
+        # command), so progress is on by default — Pavel's #185 Phase 1
+        # was a 44-minute silent download on the very first install.
+        # Pass it through to run_pull.
+        result: PullResult = run_pull(
+            server_url, token, workspace,
+            skip_materialize=skip_materialize,
+            show_progress=True,
+        )
    except Exception as exc:
        typer.echo(render_error(0, {"detail": {
            "kind": "manifest_unauthorized",
--- a/cli/commands/pull.py
+++ b/cli/commands/pull.py
@ -38,6 +38,15 @@ def pull(
    quiet: bool = typer.Option(False, "--quiet", help="Suppress success stdout (errors still surface on stderr)"),
    as_json: bool = typer.Option(False, "--json", help="Emit a single JSON object summarizing the pull"),
    dry_run: bool = typer.Option(False, "--dry-run", help="Compute the delta without writing anything to disk"),
+    skip_materialize: bool = typer.Option(
+        False, "--skip-materialize",
+        help=(
+            "Skip materialized-mode tables (server-side scheduled BQ "
+            "scan results, often multi-GB). Their data is still discoverable "
+            "via `agnes catalog` and remote-mode tables still pull. Useful "
+            "for a fast first init when an analyst only needs --remote access."
+        ),
+    ),
 ):
    """Refresh data from the server into ./server/parquet + ./user/duckdb."""
    server_url = get_server_url()
@ -68,8 +77,17 @@ def pull(

    workspace = Path(os.environ.get("AGNES_LOCAL_DIR", ".")).resolve()

+    # Show progress unless quiet (SessionStart hooks) or json (machine-
+    # readable output where Rich's terminal-control sequences would be
+    # garbage in the consumer's parser).
+    show_progress = not (quiet or as_json)
    try:
-        result: PullResult = run_pull(server_url, token, workspace, dry_run=dry_run)
+        result: PullResult = run_pull(
+            server_url, token, workspace,
+            dry_run=dry_run,
+            skip_materialize=skip_materialize,
+            show_progress=show_progress,
+        )
    except Exception as exc:
        # `run_pull` is documented to record per-table / per-stage failures
        # under `result.errors` rather than raising, so reaching this branch
--- a/cli/lib/pull.py
+++ b/cli/lib/pull.py
@ -112,6 +112,8 @@ def run_pull(
    workspace: Path,
    *,
    dry_run: bool = False,
+    skip_materialize: bool = False,
+    show_progress: bool = False,
 ) -> PullResult:
    """Refresh local parquets + corporate memory rules from the server.

@ -119,6 +121,17 @@ def run_pull(
    Typer/Rich UI. Returns a `PullResult` summary; never raises for
    network/server errors (records them under `errors` instead) so the
    caller can decide whether a partial pull is fatal.
+
+    Args:
+        skip_materialize: When True, omit `query_mode='materialized'`
+            tables from the download set. Use for analysts who only
+            care about `--remote` access on the workspace and don't
+            want to wait on multi-GB scheduled-query parquets at first
+            init. Pavel's #185 Phase 1: a 6.3 GB `order_economics`
+            parquet kept first init silent for 44 minutes.
+        show_progress: When True, render a per-file progress bar to
+            stderr via Rich during the parallel download phase. Pass
+            False from `--quiet` callers (SessionStart hooks).
    """
    started = time.monotonic()
    result = PullResult()
@ -159,6 +172,11 @@ def run_pull(
        for tid, info in server_tables.items():
            if info.get("query_mode") == "remote":
                continue
+            if skip_materialize and info.get("query_mode") == "materialized":
+                # Operator opt-out for first-init. Materialized rows are
+                # still discoverable via `agnes catalog` and queryable
+                # the next time `agnes pull` runs without --skip-materialize.
+                continue
            non_remote_total += 1
            local_hash = local_tables.get(tid, {}).get("hash", "")
            server_hash = info.get("hash", "")
@ -178,15 +196,75 @@ def run_pull(
            result.duration_s = time.monotonic() - started
            return result

-        # 4. Download parquets. Lazy mkdir: only create server/parquet/
-        # when we have at least one table to write into it.
-        for tid in to_download:
-            if not parquet_dir.exists():
+        # 4. Download parquets in parallel. Lazy mkdir: only create
+        # server/parquet/ when we have at least one table to write into it.
+        # Concurrency capped by `AGNES_PULL_PARALLELISM` (default 4) so a
+        # registry of 50+ tables doesn't open 50+ TCP connections + saturate
+        # the analyst's NIC; 4 matches typical home-broadband saturation
+        # without over-subscribing the server's caddy file_server (each
+        # request is a separate goroutine + sendfile, but the analyst's
+        # downlink is the more frequent bottleneck). Set to 1 to restore
+        # the pre-PR serial behavior for debug repro. The server-side
+        # bypass-uvicorn fix (Caddy file_server) is the other half —
+        # without it, parallel downloads would still queue on the single
+        # uvicorn worker.
+        if to_download and not parquet_dir.exists():
            parquet_dir.mkdir(parents=True, exist_ok=True)
+
+        try:
+            workers = max(1, int(os.environ.get("AGNES_PULL_PARALLELISM", "4")))
+        except ValueError:
+            workers = 4
+        # Drop to serial when there's only one (or zero) tables — avoids
+        # the executor + thread overhead for the common single-update case.
+        workers = min(workers, len(to_download)) if to_download else 1
+
+        # Optional progress bar — Rich's Progress tracks per-file bytes
+        # streamed, aggregated across the parallel ThreadPoolExecutor
+        # workers. Pavel's #185 Phase 1: a single 6.3 GB parquet on first
+        # init went 44 minutes silent, looked frozen. Now: aggregate "X.Y
+        # GB / Z.A GB · 56 MB/s · ETA 1m 20s" to stderr while threads
+        # stream. None when show_progress=False (SessionStart hooks etc.).
+        progress = None
+        progress_tasks: dict[str, int] = {}
+        if show_progress and to_download:
+            from rich.progress import (
+                Progress, BarColumn, DownloadColumn, TextColumn,
+                TimeRemainingColumn, TransferSpeedColumn,
+            )
+            progress = Progress(
+                TextColumn("[bold]{task.fields[label]}[/]"),
+                BarColumn(),
+                DownloadColumn(),
+                TransferSpeedColumn(),
+                TimeRemainingColumn(),
+                transient=False,
+            )
+            progress.start()
+            for tid in to_download:
+                size = int(server_tables[tid].get("size_bytes") or 0)
+                # Some manifest entries don't carry size — Rich shows
+                # an indeterminate bar in that case.
+                progress_tasks[tid] = progress.add_task(
+                    "download", label=tid, total=size if size > 0 else None,
+                )
+
+        def _download_one(tid: str) -> tuple[str, dict | None, str | None]:
+            """Returns (tid, local_table_entry_or_None, error_or_None).
+            One bound thread per call; stream_download is sync I/O so a
+            ThreadPoolExecutor (not asyncio) is the right tool. The
+            progress callback is thread-safe — Rich's Progress.update
+            holds an internal lock."""
            target = parquet_dir / f"{tid}.parquet"
            expected_hash = server_tables[tid].get("hash", "")
+            cb = None
+            if progress is not None and tid in progress_tasks:
+                task_id = progress_tasks[tid]
+                def cb(n: int, _tid=tid, _task=task_id):
+                    progress.update(_task, advance=n)
            try:
-                stream_download(f"/api/data/{tid}/download", str(target))
+                stream_download(f"/api/data/{tid}/download", str(target),
+                                progress_callback=cb)
                if expected_hash:
                    actual_hash = _file_md5(target)
                    if actual_hash != expected_hash:
@ -197,14 +275,32 @@ def run_pull(
                elif not _is_valid_parquet(target):
                    target.unlink(missing_ok=True)
                    raise ValueError("not a valid parquet (missing PAR1 magic)")
-                local_tables[tid] = {
+                entry = {
                    "hash": expected_hash,
                    "rows": server_tables[tid].get("rows", 0),
                    "size_bytes": server_tables[tid].get("size_bytes", 0),
                }
-                result.tables_updated += 1
+                return tid, entry, None
            except Exception as exc:
-                result.errors.append({"table": tid, "error": str(exc)})
+                return tid, None, str(exc)
+
+        try:
+            if workers <= 1:
+                outcomes = [_download_one(tid) for tid in to_download]
+            else:
+                from concurrent.futures import ThreadPoolExecutor
+                with ThreadPoolExecutor(max_workers=workers) as ex:
+                    outcomes = list(ex.map(_download_one, to_download))
+        finally:
+            if progress is not None:
+                progress.stop()
+
+        for tid, entry, err in outcomes:
+            if err is not None:
+                result.errors.append({"table": tid, "error": err})
+            else:
+                local_tables[tid] = entry
+                result.tables_updated += 1

        # 5. Persist sync state (only on real runs).
        # TODO(workspace-scoped-sync-state): currently saved to
--- a/cli/main.py
+++ b/cli/main.py
@ -123,5 +123,41 @@ app.add_typer(snapshot_app, name="snapshot")
 app.add_typer(disk_info_app, name="disk-info")


-if __name__ == "__main__":
+def _run_with_clean_errors() -> None:
+    """Wrap ``app()`` so AgnesTransportError (and other typed CLI errors)
+    surface as a one-line message + exit, never as a Python traceback. The
+    full traceback is already logged to ``~/.config/agnes/last-error.log``
+    by the api_* helpers — operators read it from there for support
+    forwarding. Anything that escapes this wrapper IS a CLI bug worth
+    fixing — log + print "internal error" so the analyst doesn't see a
+    Pythonist's traceback either.
+
+    Pavel's #185 Phase 3B: previously a `httpx.ReadTimeout` from an
+    `agnes query --remote` against a slow BQ view dumped a 30-frame
+    traceback to the analyst's terminal. Now: one clean line + a hint,
+    return code 1.
+    """
+    from cli.client import AgnesTransportError, _log_traceback, _LOG_FILE
+    try:
        app()
+    except (AgnesTransportError) as exc:
+        typer.echo(f"Error: {exc.user_message}", err=True)
+        if exc.hint:
+            typer.echo(exc.hint, err=True)
+        sys.exit(1)
+    except typer.Exit:
+        raise
+    except (KeyboardInterrupt, SystemExit):
+        raise
+    except Exception as exc:  # last-resort net — escaped exceptions are bugs
+        log = _log_traceback(exc, context="unhandled at CLI top-level")
+        typer.echo(
+            f"Error: internal CLI error ({type(exc).__name__}). "
+            f"Full traceback logged to {log}.",
+            err=True,
+        )
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    _run_with_clean_errors()
--- a/cli/update_check.py
+++ b/cli/update_check.py
@ -184,7 +184,7 @@ def format_outdated_notice(info: UpdateInfo) -> str:
    literal string "None" into a copy-pasteable command — drop the upgrade
    snippet in that case.
    """
-    msg = f"[update] da {info.installed} is out of date — latest on this server is {info.latest}."
+    msg = f"[update] agnes {info.installed} is out of date — latest on this server is {info.latest}."
    if info.download_url:
        msg += f" Upgrade: uv tool install --force {info.download_url}"
    return msg
--- a/config/claude_md_template.txt
+++ b/config/claude_md_template.txt
@ -28,22 +28,41 @@ This workspace is connected to {{ server.url }}.
 - **Personal customizations go in `.claude/CLAUDE.local.md`, NOT here.** This file is regenerated by `agnes init --force`; edits here will be lost. CLAUDE.local.md is preserved across regeneration and uploaded on `agnes push`.

 ## Metrics Workflow
-1. `agnes catalog --metrics` — find the relevant metric ({{ metrics.count }} available, categories: {{ metrics.categories | join(", ") or "none yet" }})
-2. `agnes catalog --metrics --show <category>/<name>` — read SQL and business rules
-3. Use the canonical SQL from the metric definition, adapt to the question
-4. Never invent metric calculations — always check existing definitions first
+1. `agnes catalog --metrics` — list registered metrics + categories
+2. `agnes catalog --metrics --show <category>/<name>` — read the canonical SQL + business rules
+3. Adapt the canonical SQL; never invent metric calculations

 ## Data Sync
 - `agnes pull` — download current data from server
 - `agnes push` — upload sessions and local notes to server
 - Data on the server refreshes every {{ sync_interval }}

-## Available Datasets
-{% for t in tables -%}
- `{{ t.name }}`{% if t.description %} — {{ t.description }}{% endif %}{% if t.query_mode == "remote" %} *(remote, queried on demand)*{% endif %}
-{% else -%}
- _No tables registered yet — ask an admin to register tables in the dashboard._
-{% endfor %}
+## Discovering tables — never enumerate from memory
+
+Tables, columns, sizes, descriptions, and `query_mode` change as admins
+register / migrate / drop entries. Always re-discover from the live server,
+never from this file or your training data:
+
+```
+agnes catalog --json                                     # all tables: id, query_mode, sql_flavor,
+                                                          # where_examples, fetch_via, rough_size_hint, description
+agnes catalog --json | jq '.tables[] | select(.id=="<id>")'   # single table — read its description in full BEFORE writing any SQL
+agnes schema <table>                                     # columns + types in the right SQL dialect
+agnes describe <table> -n 5                              # sample rows (local + materialized only)
+```
+
+The `description` field on each catalog row is the **authoritative
+business-rules text** for that table — it carries grain, partition
+column, join contracts, and column-level gotchas. Re-read it from the
+live `agnes catalog` for every cross-table decision; do **not** copy
+it into this workspace `CLAUDE.md` (it's a snapshot that goes stale,
+and `agnes init` will overwrite local edits — put personal notes into
+`.claude/CLAUDE.local.md` instead). The CLI is the source of truth.
+
+`rough_size_hint` is server-populated for `local` and `materialized` tables
+(`small` ≤100 MiB, `medium` ≤1 GiB, `large` ≤10 GiB, `very_large` >10 GiB) and
+`null` for `remote` rows. When `null`, treat the table as potentially large
+and use `agnes snapshot create --estimate` to size-check before fetching.

 {% if marketplaces -%}
 ## Plugins available to you
@ -58,15 +77,43 @@ Not every table is synced. Tables registered with `query_mode: "remote"` live in
 BigQuery, accessed server-side via DuckDB's BQ extension — no parquet on disk.
 Tables you don't see in `server/parquet/` may still be queryable.

-### Discovery first
+### Discovery first — read `agnes catalog --json` BEFORE every cross-table decision
+
+`agnes catalog --json` returns one row per table with these fields. Use them; don't guess:
+
+| Field | What it tells you | How to use it |
+|---|---|---|
+| `query_mode` | `local` (parquet on laptop) / `remote` (BQ on demand) / `materialized` (synced parquet of a BQ result) | Picks the tool — see decision tree below |
+| `source_type` | `keboola` / `bigquery` / `jira` | Determines SQL dialect |
+| `sql_flavor` | `duckdb` for local sources, `bigquery` for `--remote` queries on BQ rows | What syntax `--where` expects |
+| `where_examples` | 1–3 example WHERE predicates that are valid for this table's dialect | Copy as starting point for `--where` |
+| `fetch_via` | Pre-formatted `agnes snapshot create …` template for this table | The canonical "how do I get a slice of this table" command |
+| `rough_size_hint` | Coarse size hint (`small` / `medium` / `large` or null when unknown) | Bigger than `medium` → never `agnes query --remote` without a tight `--where`; use `agnes snapshot create` |

 ```
-agnes catalog --json | jq '.[] | {name, source_type, query_mode}'   # see all tables + their modes
-agnes schema <table>                                                # columns + types
-agnes describe <table> -n 5                                         # sample rows
+agnes catalog --json                              # full structured view (use this in scripts)
+agnes catalog                                     # human-readable summary
+agnes schema <table>                              # columns + types (BIGQUERY/DUCKDB dialect printed in header)
+agnes describe <table> -n 5                       # sample rows (works on local & materialized only)
 ```

-For local-mode tables, query directly with `agnes query "SELECT … FROM <table>"`.
+### Decision tree — pick the right tool BEFORE writing SQL
+
+```
+                       ┌─ local        → agnes query "SELECT ..."
+agnes catalog →   ─────┤
+query_mode of <table>  ├─ materialized → agnes query (parquet was synced by agnes pull)
+                       │                 (if missing locally, run `agnes pull` first)
+                       │
+                       └─ remote       → choose by table size + query shape:
+                                          - one cheap probe (COUNT, schema-confirm, single agg ≤200s)
+                                              → agnes query --remote "..."
+                                          - repeated questions on same slice / large scan
+                                              → agnes snapshot create <table> --select ... --where ... --as <name>
+                                                then agnes query "SELECT ... FROM <name>"
+                                          - join with a local table
+                                              → agnes query --register-bq "alias=BQ_SQL" --sql "..."
+```

 ### Three patterns for `query_mode: "remote"` tables

@ -76,13 +123,30 @@ For local-mode tables, query directly with `agnes query "SELECT … FROM <table>
 | **`agnes query --remote`** | one-shot, server-side execution against BigQuery (works for BASE TABLE rows directly + VIEW/MATERIALIZED_VIEW rows via the BQ jobs API; cost-guarded by a 5 GiB scan cap configurable in /admin/server-config) | single aggregate / cheap probe |
 | **`agnes query --register-bq`** | hybrid joins between local snapshots and ad-hoc BQ subqueries | crossing local + remote |

-### Permission model + cost — important
+### Common mistakes — avoid on first try

- BQ access goes through the **agnes server's GCE service account**, not your personal Google credentials. If a query fails with a permission error, the table is in a project the server SA cannot read — escalate to admin, do NOT try to authenticate yourself.
- Every BQ query bills the SA's GCP project for **bytes scanned**. A naive `SELECT * FROM <large_table>` can cost real money. ALWAYS:
-  - filter via `--where` on the partition column (typically a date)
-  - list specific columns in `--select` — column-store BQ skips the rest, cheaper
-  - run `--estimate` first when unsure of the table size or partitioning
+- **`--estimate` is on `agnes snapshot create` ONLY.** Do NOT pass it to `agnes query` — fails with `No such option: --estimate`. The estimate flow is a snapshot-creation cost gate, not a query primitive.
+- **Old `agnes fetch` / `da fetch` / `da query` references in stale docs** — the CLI is `agnes`; `agnes fetch` was renamed to `agnes snapshot create`. If you see those names, translate before running.
+- **Don't attempt personal GCP auth** if a BQ query fails with permission errors. BQ access uses the **server's service account**, not your Google identity — escalate to admin instead.
+- **Don't `agnes query --remote "SELECT * FROM <large_table>"`** without a `--where`. Even if the scan-byte gate refuses, you've wasted the round-trip; gate yourself first by reading `rough_size_hint` and `where_examples` from `agnes catalog --json`.
+
+### Failure-mode dictionary — what each error means + the right response
+
+| Error wording (substring) | Cause | Response |
+|---|---|---|
+| `Binder Error: Query execution exceeded the timeout. Job ID: ...` | BQ-side query took >~200 s wall-clock; the DuckDB BQ extension's `bq_query_timeout_ms` (default 90 s, server may bump to 600 s) elapsed | Narrow `--where` (especially partition column), drop unused columns from `--select`, or switch to `agnes snapshot create` to materialise once + query locally |
+| `HTTP 400: remote_scan_too_large` | Server's `bq_max_scan_bytes` cost gate refused the query (default 5 GiB) | Tighten `--where`; consider `agnes snapshot create` so the cost is paid once, then local queries are free |
+| `HTTP 401: ... unauthorized` | PAT expired or wrong | `agnes init --server-url ... --token <new-PAT>`; re-mint via the dashboard's "Personal Access Tokens" page |
+| `HTTP 403: cross_project_forbidden` (with `serviceusage` mention) | Server SA lacks `serviceusage.services.use` on the BQ data project | Escalate to admin to set `data_source.bigquery.billing_project`; do NOT try personal auth |
+| `ReadTimeout` (client-side) on `agnes query --remote` | CLI is older than 0.35.1 (had 30 s default) | `agnes --version`; if <0.35.1, upgrade with `uv tool install --force <wheel-from-server>` (the URL is in the `[update]` banner that prints on every command). Then retry. |
+| `unknown columns: [...]` from `agnes snapshot create` | `--select` lists columns that don't exist | Run `agnes schema <table>` and copy column names verbatim |
+
+### Cost discipline — every BQ query bills bytes scanned
+
+A naive `SELECT * FROM <large_table>` can cost real money. ALWAYS:
+- filter via `--where` on the partition column (typically a date) — read `where_examples` in `agnes catalog --json`
+- list specific columns in `--select` — column-store BQ skips the rest
+- run `--estimate` first (only valid on `agnes snapshot create`) when the table is partitioned/clustered or when `rough_size_hint` is unknown

 ### `agnes snapshot create` discipline

--- a/config/instance.yaml.example
+++ b/config/instance.yaml.example
@ -127,6 +127,14 @@ data_source:
    #                                    # Dry-run check before running; exceeding -> registration / sync
    #                                    # rejected. Default 10 GiB (10737418240). Set 0 to disable.
    #                                    # null falls through to default. Configurable via /admin/server-config UI.
+    # query_timeout_ms: 600000
+    #                                    # DuckDB BigQuery extension query timeout (milliseconds).
+    #                                    # Applied via `SET bq_query_timeout_ms` after every LOAD bigquery
+    #                                    # on every BQ-touching DuckDB session. Extension default is
+    #                                    # 90 000 ms = 90 s, which is too tight for analyst queries against
+    #                                    # view-backed datasets -- bumped to 600 000 ms = 10 min by default.
+    #                                    # Set 0 to fall through to the extension default. Configurable via
+    #                                    # /admin/server-config UI.

 # --- OpenMetadata catalog (optional) ---
 # Enriches table and column metadata from OpenMetadata REST API.
--- a/connectors/bigquery/access.py
+++ b/connectors/bigquery/access.py
@ -232,11 +232,48 @@ def _default_duckdb_session_factory(projects: BqProjects):
                f"failed to install/load BigQuery DuckDB extension: {e}",
                details={"original": str(e)},
            )
+        apply_bq_session_settings(conn)
        yield conn
    finally:
        conn.close()


+def apply_bq_session_settings(conn) -> None:
+    """Apply per-session DuckDB BigQuery-extension settings from instance config.
+
+    Currently sets ``bq_query_timeout_ms`` from
+    ``data_source.bigquery.query_timeout_ms``. The extension default is 90 s,
+    which is too tight for analyst-scale queries against view-backed BQ
+    datasets — bumping the default to 600 s here. Sentinel ``0`` (or a
+    non-numeric / unparseable value) leaves the extension default in place.
+
+    Call AFTER ``LOAD bigquery`` on every DuckDB session that touches BQ:
+    BqAccess's session factory, the standalone extractor in
+    ``connectors/bigquery/extractor.py``, and the orchestrator's
+    ``_remote_attach`` path in ``src/orchestrator.py``.
+    """
+    try:
+        from app.instance_config import get_value
+    except Exception:
+        return
+    raw = get_value(
+        "data_source", "bigquery", "query_timeout_ms", default=600_000,
+    )
+    try:
+        ms = int(raw) if raw is not None else 0
+    except (TypeError, ValueError):
+        return
+    if ms <= 0:
+        return
+    try:
+        conn.execute(f"SET bq_query_timeout_ms = {int(ms)}")
+    except Exception:
+        # Fail-soft: extension version may not support the setting, or the
+        # session may already have been frozen — leave the default rather
+        # than poisoning the whole session.
+        pass
+
+
 class BqAccess:
    """Single entry point for BigQuery access. Stateless after construction.

--- a/connectors/bigquery/extractor.py
+++ b/connectors/bigquery/extractor.py
@ -359,6 +359,8 @@ def _init_extract_locked(
            conn.execute(
                f"CREATE SECRET bq_session (TYPE bigquery, ACCESS_TOKEN '{escaped_token}')"
            )
+            from connectors.bigquery.access import apply_bq_session_settings
+            apply_bq_session_settings(conn)
            conn.execute(
                f"ATTACH 'project={project_id}' AS bq (TYPE bigquery, READ_ONLY)"
            )
--- a/docker-compose.yml
+++ b/docker-compose.yml
@ -112,6 +112,11 @@ services:
      - /data/state/certs:/certs:ro
      - caddy_data:/data
      - caddy_config:/config
+      # Read-only mount of the agnes data dir so Caddy's file_server can
+      # serve parquets directly (sendfile/zero-copy) and bypass the app's
+      # uvicorn workers — see Caddyfile's @download handler. Mounted at
+      # /srv (not /data) because /data is already the caddy_data volume.
+      - data:/srv:ro
    environment:
      - DOMAIN=${DOMAIN:-localhost}
      # Passes through whatever the operator set in .env. Caddyfile uses
--- a/pyproject.toml
+++ b/pyproject.toml
@ -1,6 +1,6 @@
 [project]
 name = "agnes-the-ai-analyst"
-version = "0.35.1"
+version = "0.36.0"
 description = "Agnes — AI Data Analyst platform for AI analytical systems"
 requires-python = ">=3.11,<3.14"
 license = "MIT"
@ -95,7 +95,7 @@ dev = [
 ]

 [project.scripts]
-agnes = "cli.main:app"
+agnes = "cli.main:_run_with_clean_errors"

 [build-system]
 requires = ["hatchling"]
--- a/scripts/ops/agnes-auto-upgrade.sh
+++ b/scripts/ops/agnes-auto-upgrade.sh
@ -53,8 +53,57 @@ IMAGE="ghcr.io/keboola/agnes-the-ai-analyst:${AGNES_TAG:-stable}"
 # Array form (vs. word-split string) — quoted expansion survives paths
 # with spaces and is the modern bash idiom. Functionally identical here
 # since /opt/agnes paths are tame, but it's a cheap habit to keep.
+#
+# The TLS-overlay decision deliberately runs BELOW the config re-fetch
+# (Devin Review caught: this used to live here, evaluating Caddyfile
+# existence against the PRE-fetch state. If the fetch added a
+# previously-missing Caddyfile, this tick's docker compose would still
+# omit `--profile tls` until the next 5-minute tick — a window where
+# the recreate uses the wrong overlay set). Base file list is fine to
+# initialise here because the tls overlay is the only conditional one.
 COMPOSE_FILES=( -f docker-compose.yml -f docker-compose.prod.yml -f docker-compose.host-mount.yml )
 PROFILE_ARGS=()
+
+# Re-fetch the bind-mounted config files (compose overlays + Caddyfile)
+# from the OSS main branch on every tick. Without this, an image-only
+# change is fine, but a change to the Caddyfile or any compose overlay
+# (e.g. a new bind mount, a route, an env_file path) only lands on VMs
+# that get a fresh `startup.sh` boot — leaving long-uptime VMs running
+# the new image against stale config. Confirmed live on 2026-05-05
+# when a Caddyfile change adding a `data:/srv:ro` mount + a new
+# `forward_auth` + `file_server` route for parquet downloads landed
+# in main but stayed inert on running VMs because auto-upgrade only
+# watched image digests.
+#
+# Hash before/after to detect content drift; treat as "trigger recreate"
+# alongside an image digest change. Atomic move-after-fetch guards
+# against a partial download corrupting compose at the next docker
+# action — `curl --fail` plus the `.new` rename means a 404 / network
+# blip leaves the existing file untouched.
+RAW_BASE="https://raw.githubusercontent.com/keboola/agnes-the-ai-analyst/main"
+CONFIG_FILES=(
+  docker-compose.yml docker-compose.prod.yml docker-compose.host-mount.yml
+  docker-compose.tls.yml Caddyfile
+)
+hash_config_files() {
+  # Sort to keep hash stable across operator add/remove, missing files
+  # contribute the empty string (sha256 of "" is well-defined). Run
+  # from /opt/agnes to keep relative paths terse in the hash input.
+  ( cd /opt/agnes && for f in "${CONFIG_FILES[@]}"; do
+      sha256sum "$f" 2>/dev/null || printf 'missing %s\n' "$f"
+    done ) | sort | sha256sum | awk '{print $1}'
+}
+CONFIG_BEFORE=$(hash_config_files)
+for f in "${CONFIG_FILES[@]}"; do
+  if curl -fsSL "$RAW_BASE/$f" -o "/opt/agnes/$f.new" 2>/dev/null; then
+    mv -f "/opt/agnes/$f.new" "/opt/agnes/$f"
+  else
+    rm -f "/opt/agnes/$f.new"
+    logger -t agnes-auto-upgrade "WARN: failed to fetch $f from $RAW_BASE — keeping existing /opt/agnes/$f"
+  fi
+done
+CONFIG_AFTER=$(hash_config_files)
+
 # `-s` (size > 0) instead of `-f` — guards against the corner case where
 # rotate.sh wrote a 0-byte cert and exited (or got SIGKILLed mid-write).
 # Bringing up the tls profile against an empty cert would just crash
@ -63,19 +112,46 @@ PROFILE_ARGS=()
 # with an empty one) the caddy service crash-loops while the tls overlay
 # has already closed :8000 — net effect is "app unreachable". Skipping
 # the overlay keeps the app on plain :8000 until config lands.
+#
+# Evaluated AFTER the config re-fetch above so a freshly-added or
+# freshly-removed Caddyfile is reflected in this tick's compose set,
+# not the next one.
 if [ -s /data/state/certs/fullchain.pem ] && [ -s /data/state/certs/privkey.pem ] && [ -s Caddyfile ]; then
    COMPOSE_FILES+=( -f docker-compose.tls.yml )
    PROFILE_ARGS=( --profile tls )
 elif [ -s /data/state/certs/fullchain.pem ] && [ -s /data/state/certs/privkey.pem ]; then
    logger -t agnes-auto-upgrade "WARN: certs present but Caddyfile missing/empty — skipping tls overlay"
 fi
+
 BEFORE=$(docker images --no-trunc --format '{{.Digest}}' "$IMAGE" | head -1)
 docker compose "${COMPOSE_FILES[@]}" pull >/dev/null 2>&1
 AFTER=$(docker images --no-trunc --format '{{.Digest}}' "$IMAGE" | head -1)
-if [ "$BEFORE" != "$AFTER" ]; then
-    echo "$(date): new digest for $IMAGE — recreating containers"
+
+if [ "$BEFORE" != "$AFTER" ] || [ "$CONFIG_BEFORE" != "$CONFIG_AFTER" ]; then
+    REASON=()
+    [ "$BEFORE" != "$AFTER" ] && REASON+=("image digest")
+    [ "$CONFIG_BEFORE" != "$CONFIG_AFTER" ] && REASON+=("config files")
+    echo "$(date): change detected (${REASON[*]}) — recreating containers"
    # ${arr[@]+"${arr[@]}"} pattern: expands to nothing when array is
    # empty (vs. plain "${arr[@]}" which trips `set -u` on bash <4.4).
    docker compose "${COMPOSE_FILES[@]}" ${PROFILE_ARGS[@]+"${PROFILE_ARGS[@]}"} up -d
    docker image prune -f >/dev/null 2>&1
 fi
+
+# Self-update: re-fetch *this* script too. Without this, the very fix
+# that lets auto-upgrade watch config files would itself never land on
+# running VMs — a self-perpetuating "old script" problem. Atomic via
+# .new + mv; chmod preserved. The next tick (5 min later) runs the
+# new logic. Skipping if curl fails leaves the existing script in place.
+if curl -fsSL "$RAW_BASE/scripts/ops/agnes-auto-upgrade.sh" \
+   -o /usr/local/bin/agnes-auto-upgrade.sh.new 2>/dev/null; then
+  if ! cmp -s /usr/local/bin/agnes-auto-upgrade.sh.new \
+                /usr/local/bin/agnes-auto-upgrade.sh; then
+    chmod +x /usr/local/bin/agnes-auto-upgrade.sh.new
+    mv -f /usr/local/bin/agnes-auto-upgrade.sh.new \
+          /usr/local/bin/agnes-auto-upgrade.sh
+    logger -t agnes-auto-upgrade "self-update: replaced /usr/local/bin/agnes-auto-upgrade.sh"
+  else
+    rm -f /usr/local/bin/agnes-auto-upgrade.sh.new
+  fi
+fi
--- a/src/db.py
+++ b/src/db.py
@ -614,6 +614,8 @@ def _reattach_remote_extensions(
                        f"CREATE OR REPLACE SECRET {secret_name} "
                        f"(TYPE bigquery, ACCESS_TOKEN '{escaped}')"
                    )
+                    from connectors.bigquery.access import apply_bq_session_settings
+                    apply_bq_session_settings(conn)
                    conn.execute(
                        f"ATTACH '{safe_url}' AS {alias} (TYPE {extension}, READ_ONLY)"
                    )
--- a/src/orchestrator.py
+++ b/src/orchestrator.py
@ -502,6 +502,8 @@ class SyncOrchestrator:
                        f"CREATE OR REPLACE SECRET {secret_name} "
                        f"(TYPE bigquery, ACCESS_TOKEN '{escaped}')"
                    )
+                    from connectors.bigquery.access import apply_bq_session_settings
+                    apply_bq_session_settings(conn)
                    conn.execute(
                        f"ATTACH '{safe_url}' AS {alias} (TYPE {extension}, READ_ONLY)"
                    )
--- a/tests/test_bq_query_timeout.py
+++ b/tests/test_bq_query_timeout.py
@ -0,0 +1,123 @@
+"""Unit tests for apply_bq_session_settings.
+
+Covers the data_source.bigquery.query_timeout_ms knob added so that
+agnes query --remote no longer trips the DuckDB BigQuery extension's
+built-in 90 s wait timeout when the underlying BQ job takes longer.
+"""
+
+from unittest.mock import patch
+
+from connectors.bigquery.access import apply_bq_session_settings
+
+
+class _RecordingConn:
+    """Minimal DuckDB-conn stand-in that records execute() calls.
+
+    apply_bq_session_settings only calls .execute(); we don't need a
+    real DuckDB to verify the SET command shape.
+    """
+
+    def __init__(self, raise_on=None):
+        self.calls: list[str] = []
+        self.raise_on = raise_on
+
+    def execute(self, sql: str):
+        self.calls.append(sql)
+        if self.raise_on and self.raise_on in sql:
+            raise RuntimeError(f"simulated failure on: {sql}")
+
+
+def _patched_get_value(value):
+    """Helper: build a patch target that returns *value* for the
+    data_source.bigquery.query_timeout_ms key and propagates the
+    `default=` kwarg for any other lookup so we don't accidentally
+    break tests that read other keys via the same module."""
+    def fake(*keys, default=None):
+        if keys == ("data_source", "bigquery", "query_timeout_ms"):
+            return value
+        return default
+    return patch("app.instance_config.get_value", side_effect=fake)
+
+
+def test_default_when_config_missing():
+    """When get_value returns the default (None passed through, default arg
+    used), apply_bq_session_settings should fall back to the bumped
+    600 000 ms default and emit the SET."""
+    conn = _RecordingConn()
+    # Simulate get_value returning the default we passed (600_000) by
+    # echoing the default kwarg.
+    def fake(*keys, default=None):
+        return default
+    with patch("app.instance_config.get_value", side_effect=fake):
+        apply_bq_session_settings(conn)
+    assert conn.calls == ["SET bq_query_timeout_ms = 600000"]
+
+
+def test_explicit_value():
+    conn = _RecordingConn()
+    with _patched_get_value(900_000):
+        apply_bq_session_settings(conn)
+    assert conn.calls == ["SET bq_query_timeout_ms = 900000"]
+
+
+def test_zero_sentinel_leaves_extension_default():
+    """0 means 'use the DuckDB BQ extension's built-in default' — no SET
+    must be emitted so a non-zero default doesn't override an operator's
+    explicit opt-out."""
+    conn = _RecordingConn()
+    with _patched_get_value(0):
+        apply_bq_session_settings(conn)
+    assert conn.calls == []
+
+
+def test_negative_value_treated_as_zero():
+    """Negative is nonsensical for a timeout; treat as 'extension default'
+    rather than emitting a negative SET that the extension might reject
+    or interpret unexpectedly."""
+    conn = _RecordingConn()
+    with _patched_get_value(-1):
+        apply_bq_session_settings(conn)
+    assert conn.calls == []
+
+
+def test_non_numeric_silently_skipped():
+    """A string-typed YAML value (e.g. operator typo) shouldn't crash
+    the BQ session — fall through to the extension default."""
+    conn = _RecordingConn()
+    with _patched_get_value("notanumber"):
+        apply_bq_session_settings(conn)
+    assert conn.calls == []
+
+
+def test_string_numeric_is_coerced():
+    """YAML loaders sometimes deliver int-like values as strings; accept
+    those rather than failing."""
+    conn = _RecordingConn()
+    with _patched_get_value("750000"):
+        apply_bq_session_settings(conn)
+    assert conn.calls == ["SET bq_query_timeout_ms = 750000"]
+
+
+def test_set_failure_does_not_propagate():
+    """Older DuckDB BQ extension versions may not recognise the setting.
+    The function must fail-soft so a session that was otherwise healthy
+    keeps working — just with the extension's built-in default timeout."""
+    conn = _RecordingConn(raise_on="SET bq_query_timeout_ms")
+    with _patched_get_value(600_000):
+        # Must not raise.
+        apply_bq_session_settings(conn)
+    # The SET was attempted (recorded before the exception).
+    assert conn.calls == ["SET bq_query_timeout_ms = 600000"]
+
+
+def test_no_app_config_module_silently_skipped():
+    """Unit-test contexts that don't bring up the app config layer must
+    still be able to construct BQ sessions for narrow tests; an
+    ImportError on app.instance_config means we can't read the knob,
+    so we leave the extension default in place."""
+    conn = _RecordingConn()
+    with patch.dict(
+        "sys.modules", {"app.instance_config": None},
+    ):
+        apply_bq_session_settings(conn)
+    assert conn.calls == []
--- a/tests/test_check_access_endpoint.py
+++ b/tests/test_check_access_endpoint.py
@ -0,0 +1,124 @@
+"""Unit tests for ``GET /api/data/{table_id}/check-access`` — the
+lightweight RBAC probe used by Caddy's ``forward_auth`` directive to gate
+file_server-served parquet downloads without involving the app's request
+workers in the bulk byte transfer.
+
+The endpoint must:
+  - return 204 when the caller has read access (admin → always; non-admin
+    only with an explicit ``resource_grants`` row),
+  - return 403 with no body / minimal body when the caller does not,
+  - return 404 for unsafe identifiers (path-traversal guard),
+  - return 401 when the request has no auth.
+"""
+
+from tests.conftest import create_mock_extract
+
+
+def _auth(token):
+    return {"Authorization": f"Bearer {token}"}
+
+
+def test_admin_gets_204(seeded_app):
+    """Admin short-circuits all RBAC checks — must always succeed."""
+    c = seeded_app["client"]
+    env = seeded_app["env"]
+    create_mock_extract(env["extracts_dir"], "keboola", [
+        {"name": "salaries", "data": [{"id": "1"}]},
+    ])
+    from src.orchestrator import SyncOrchestrator
+    SyncOrchestrator().rebuild()
+    c.post(
+        "/api/admin/register-table",
+        json={"name": "salaries", "source_type": "keboola"},
+        headers=_auth(seeded_app["admin_token"]),
+    )
+
+    resp = c.get(
+        "/api/data/salaries/check-access",
+        headers=_auth(seeded_app["admin_token"]),
+    )
+    assert resp.status_code == 204
+    assert resp.content == b""
+
+
+def test_analyst_without_grant_gets_403(seeded_app):
+    """Non-admin without an explicit `resource_grants` row must be denied
+    — the production failure mode where Caddy's forward_auth returns the
+    403 to the client and never invokes file_server."""
+    c = seeded_app["client"]
+    env = seeded_app["env"]
+    create_mock_extract(env["extracts_dir"], "keboola", [
+        {"name": "salaries", "data": [{"id": "1"}]},
+    ])
+    from src.orchestrator import SyncOrchestrator
+    SyncOrchestrator().rebuild()
+    c.post(
+        "/api/admin/register-table",
+        json={"name": "salaries", "source_type": "keboola"},
+        headers=_auth(seeded_app["admin_token"]),
+    )
+
+    resp = c.get(
+        "/api/data/salaries/check-access",
+        headers=_auth(seeded_app["analyst_token"]),
+    )
+    assert resp.status_code == 403
+
+
+def test_analyst_with_grant_gets_204(seeded_app):
+    """Once the analyst has a TABLE grant, check-access flips to 204
+    and Caddy is free to serve the file directly. Mirrors the same
+    grant flow used by ``/api/data/{id}/download``."""
+    c = seeded_app["client"]
+    env = seeded_app["env"]
+    create_mock_extract(env["extracts_dir"], "keboola", [
+        {"name": "salaries", "data": [{"id": "1"}]},
+    ])
+    from src.orchestrator import SyncOrchestrator
+    SyncOrchestrator().rebuild()
+    c.post(
+        "/api/admin/register-table",
+        json={"name": "salaries", "source_type": "keboola"},
+        headers=_auth(seeded_app["admin_token"]),
+    )
+
+    # Mint the grant via the admin API the same way the existing download
+    # access-control tests do — see test_access_control.py.
+    from tests.test_access_control import _grant_table_to_analyst
+    from src.db import get_system_db
+    conn = get_system_db()
+    try:
+        _grant_table_to_analyst(conn, "salaries")
+    finally:
+        conn.close()
+
+    resp = c.get(
+        "/api/data/salaries/check-access",
+        headers=_auth(seeded_app["analyst_token"]),
+    )
+    assert resp.status_code == 204
+
+
+def test_unsafe_table_id_gets_404(seeded_app):
+    """Identifier validation runs BEFORE RBAC — keeps path-traversal
+    payloads (``../etc/passwd``) from reaching ``can_access_table`` and
+    matches the pre-existing behavior of ``/download``."""
+    c = seeded_app["client"]
+    resp = c.get(
+        "/api/data/..%2Fetc%2Fpasswd/check-access",
+        headers=_auth(seeded_app["admin_token"]),
+    )
+    # FastAPI's path converter rejects encoded slashes outright; either
+    # 404 from the validator or 404 from no-such-route is acceptable —
+    # both block the traversal. The point is no 5xx and no 204.
+    assert resp.status_code in (404, 422)
+
+
+def test_no_auth_gets_401(seeded_app):
+    """Caddy will only call the auth-check endpoint when the client sent
+    credentials — but if a request slips through without them, the
+    endpoint must reject with 401 so Caddy returns 401 to the client
+    instead of falling through to file_server with no identity."""
+    c = seeded_app["client"]
+    resp = c.get("/api/data/salaries/check-access")
+    assert resp.status_code == 401
--- a/tests/test_v2_sample.py
+++ b/tests/test_v2_sample.py
@ -152,7 +152,7 @@ class TestBqAccessErrors:
            # Endpoint is async — drive it directly. dependency_overrides only
            # fires through TestClient/HTTP, so pass `bq=bq` explicitly.
            with pytest.raises(HTTPException) as exc_info:
-                asyncio.run(v2_sample.sample(
+                (v2_sample.sample(
                    table_id="bq_view", n=5, user=user, conn=conn, bq=bq,
                ))
        finally:
@ -182,7 +182,7 @@ class TestBqAccessErrors:
            user = {"id": "admin1", "email": "a@x.com"}

            with pytest.raises(HTTPException) as exc_info:
-                asyncio.run(v2_sample.sample(
+                (v2_sample.sample(
                    table_id="bq_view", n=5, user=user, conn=conn, bq=bq,
                ))
        finally:
@ -211,7 +211,7 @@ class TestBqAccessErrors:
            user = {"id": "admin1", "email": "a@x.com"}

            with pytest.raises(HTTPException) as exc_info:
-                asyncio.run(v2_sample.sample(
+                (v2_sample.sample(
                    table_id="bq_view", n=5, user=user, conn=conn, bq=bq,
                ))
        finally:
@ -248,7 +248,7 @@ class TestBqAccessErrors:
        try:
            _seed(conn)
            user = {"id": "admin1", "email": "a@x.com"}
-            asyncio.run(v2_sample.sample(
+            (v2_sample.sample(
                table_id="bq_view", n=5, user=user, conn=conn, bq=bq,
            ))
        finally:
--- a/tests/test_v2_scan.py
+++ b/tests/test_v2_scan.py
@ -298,7 +298,7 @@ class TestBqAccessErrors:
                lambda *a, **kw: {"event_date": "DATE", "country_code": "STRING"},
            ):
                with pytest.raises(HTTPException) as exc_info:
-                    asyncio.run(
+                    (
                        v2_scan.scan_endpoint(raw=req, user=user, conn=conn, bq=bq)
                    )
        finally:
@ -337,7 +337,7 @@ class TestBqAccessErrors:
                lambda *a, **kw: {"event_date": "DATE", "country_code": "STRING"},
            ):
                with pytest.raises(HTTPException) as exc_info:
-                    asyncio.run(
+                    (
                        v2_scan.scan_endpoint(raw=req, user=user, conn=conn, bq=bq)
                    )
        finally:
@ -372,7 +372,7 @@ class TestBqAccessErrors:
                lambda *a, **kw: {"event_date": "DATE", "country_code": "STRING"},
            ):
                with pytest.raises(HTTPException) as exc_info:
-                    asyncio.run(
+                    (
                        v2_scan.scan_endpoint(raw=req, user=user, conn=conn, bq=bq)
                    )
        finally:
--- a/tests/test_v2_scan_estimate.py
+++ b/tests/test_v2_scan_estimate.py
@ -117,7 +117,7 @@ class TestBqAccessErrors:
                lambda *a, **kw: {"event_date": "DATE", "country_code": "STRING"},
            ):
                with pytest.raises(HTTPException) as exc_info:
-                    asyncio.run(
+                    (
                        v2_scan.scan_estimate_endpoint(raw=req, user=user, conn=conn, bq=bq)
                    )
        finally:
@ -156,7 +156,7 @@ class TestBqAccessErrors:
                lambda *a, **kw: {"event_date": "DATE", "country_code": "STRING"},
            ):
                with pytest.raises(HTTPException) as exc_info:
-                    asyncio.run(
+                    (
                        v2_scan.scan_estimate_endpoint(raw=req, user=user, conn=conn, bq=bq)
                    )
        finally:
@ -191,7 +191,7 @@ class TestBqAccessErrors:
                lambda *a, **kw: {"event_date": "DATE", "country_code": "STRING"},
            ):
                with pytest.raises(HTTPException) as exc_info:
-                    asyncio.run(
+                    (
                        v2_scan.scan_estimate_endpoint(raw=req, user=user, conn=conn, bq=bq)
                    )
        finally:
--- a/tests/test_v2_schema.py
+++ b/tests/test_v2_schema.py
@ -161,7 +161,7 @@ class TestBqAccessErrors:
            # Endpoint is async — drive it directly. dependency_overrides only
            # fires through TestClient/HTTP, so pass `bq=bq` explicitly.
            with pytest.raises(HTTPException) as exc_info:
-                asyncio.run(v2_schema.schema(
+                (v2_schema.schema(
                    table_id="bq_view", user=user, conn=conn, bq=bq,
                ))
        finally:
@ -191,7 +191,7 @@ class TestBqAccessErrors:
            _seed_bq_table(conn)
            user = {"id": "admin1", "email": "a@x.com"}
            with pytest.raises(HTTPException) as exc_info:
-                asyncio.run(v2_schema.schema(
+                (v2_schema.schema(
                    table_id="bq_view", user=user, conn=conn, bq=bq,
                ))
        finally:
@ -217,7 +217,7 @@ class TestBqAccessErrors:
            _seed_bq_table(conn)
            user = {"id": "admin1", "email": "a@x.com"}
            with pytest.raises(HTTPException) as exc_info:
-                asyncio.run(v2_schema.schema(
+                (v2_schema.schema(
                    table_id="bq_view", user=user, conn=conn, bq=bq,
                ))
        finally:
@ -259,7 +259,7 @@ class TestBqAccessErrors:
        try:
            _seed_bq_table(conn)
            user = {"id": "admin1", "email": "a@x.com"}
-            data = asyncio.run(v2_schema.schema(
+            data = (v2_schema.schema(
                table_id="bq_view", user=user, conn=conn, bq=_bq(),
            ))
        finally:
@ -322,7 +322,7 @@ class TestBqAccessErrors:
        try:
            _seed_bq_table(conn)
            user = {"id": "admin1", "email": "a@x.com"}
-            asyncio.run(v2_schema.schema(
+            (v2_schema.schema(
                table_id="bq_view", user=user, conn=conn, bq=bq,
            ))
        finally: