fix: Devin Review on #188 commit 28423907 — 2 bugs

🚩 /api/v2/catalog still async def while now calling sync stat()

`/api/v2/catalog` was left as `async def` when the rest of Tier 1 was
converted, on the assumption it was lightweight. The new
`_materialized_size_hint` populator added in this PR calls
`Path.stat()` / `Path.exists()` for every visible row to bucket the
parquet size — on a local FS that's microseconds, but on a
network-mounted DATA_DIR (NFS / CIFS / GCS-FUSE) those syscalls
can block the event loop. Convert to plain `def` so FastAPI
auto-offloads to the thread pool, mirroring /api/query etc.

🔴 stream_download translates HTTPStatusError as generic transport error

`response.raise_for_status()` inside the retry loop raises
`httpx.HTTPStatusError` on 4xx/5xx. After retries exhaust, the new
`isinstance(last_exc, httpx.HTTPError)` check at line 219 was eating
the status code: HTTPStatusError is a subclass of HTTPError, so the
generic transport translation produced "Unexpected error: HTTPStatusError"
instead of the informative "Client error '401 Unauthorized' for url …"
that callers expect. Fix: short-circuit HTTPStatusError before the
HTTPError branch — it re-raises verbatim so the caller's status-code
handling + the rich server error body (e.g. 401 expired token, 403
cross_project_forbidden) reach the analyst.

api_get / api_post / api_delete / api_patch don't have the same bug:
httpx Client.get/etc. don't raise HTTPStatusError unless the caller
explicitly calls .raise_for_status(), and our wrappers don't.
Only stream_download does, hence the targeted fix there.
This commit is contained in:
ZdenekSrotyr 2026-05-05 18:29:44 +02:00
parent 28423907fd
commit f2ce915458
2 changed files with 21 additions and 3 deletions

View file

@ -127,8 +127,17 @@ def build_catalog(conn: duckdb.DuckDBPyConnection, user: dict) -> dict:
@router.get("/catalog") @router.get("/catalog")
async def catalog( def catalog(
user: dict = Depends(get_current_user), user: dict = Depends(get_current_user),
conn: duckdb.DuckDBPyConnection = Depends(_get_db), conn: duckdb.DuckDBPyConnection = Depends(_get_db),
): ):
# Plain ``def`` so FastAPI auto-offloads to the anyio thread pool —
# build_catalog now calls `_materialized_size_hint` for every visible
# row, which does sync `Path.stat()` / `Path.exists()` on the data
# volume. On local FS that's microseconds, but on a network-mounted
# DATA_DIR (NFS / CIFS / GCS-FUSE) those calls can block. Plain ``def``
# means each request runs on its own thread; the event loop stays
# free for non-catalog traffic. Mirrors the Tier 1 conversion of
# /api/query, /api/v2/scan, /api/v2/sample, /api/v2/schema —
# Devin Review on PR #188.
return build_catalog(conn, user) return build_catalog(conn, user)

View file

@ -212,10 +212,19 @@ def stream_download(path: str, target_path: str, progress_callback=None) -> int:
break break
time.sleep(_RETRY_BACKOFFS_S[min(attempt, len(_RETRY_BACKOFFS_S) - 1)]) time.sleep(_RETRY_BACKOFFS_S[min(attempt, len(_RETRY_BACKOFFS_S) - 1)])
# Clean up any leftover tmp, then surface the last exception. Translate # Clean up any leftover tmp, then surface the last exception. Translate
# transport errors to AgnesTransportError so the CLI prints a clean # transport errors (timeouts, connection drops, protocol errors) to
# message instead of a Python traceback (Pavel's #185 Phase 3B). # AgnesTransportError so the CLI prints a clean message instead of a
# Python traceback (Pavel's #185 Phase 3B). HTTPStatusError (4xx/5xx
# response from the server) is NOT a transport failure and must
# re-raise verbatim so the caller's status-code handling + the rich
# server error body (e.g. 401 with "token expired", 403 with
# cross_project_forbidden detail) reach the analyst — Devin Review on
# PR #188 caught: HTTPStatusError is a subclass of HTTPError, so the
# generic isinstance(HTTPError) translation was eating status codes.
tmp_path.unlink(missing_ok=True) tmp_path.unlink(missing_ok=True)
assert last_exc is not None assert last_exc is not None
if isinstance(last_exc, httpx.HTTPStatusError):
raise last_exc
if isinstance(last_exc, httpx.HTTPError): if isinstance(last_exc, httpx.HTTPError):
raise _translate_transport_error( raise _translate_transport_error(
last_exc, context=f"GET {path} (stream → {target_path})" last_exc, context=f"GET {path} (stream → {target_path})"