release: 0.46.1 — surface real BQ error from remote_estimate_failed retry (#218)

## Summary When `agnes query --remote` references a column that doesn't exist on the FROM table, users were seeing `Table "<id>" must be qualified with a dataset` instead of the actually-useful `Unrecognized name: <column>` from BigQuery. Surface the first-attempt diagnostic now; keep the second-attempt context as `underlying_original`. Reproduced against production: ``` $ agnes query --remote "SELECT COUNT(*) FROM unit_economics WHERE authorize_date = DATE '2025-05-06'" Error: remote_estimate_failed (HTTP 400) message: Could not estimate scan size for this query. underlying: 400 ... Table "unit_economics" must be qualified with a dataset. ``` (`unit_economics` has `authorize_timestamp`, not `authorize_date`.) ## Test plan - [x] New `test_remote_estimate_failed_surfaces_first_error_when_attempts_differ` asserts the first-attempt message wins, second-attempt is preserved as `underlying_original`, hint points to `agnes schema`. - [x] Existing `test_guardrail_returns_400_remote_estimate_failed_on_double_parse_error` still passes (both attempts mocked to identical error). - [x] `pytest tests/test_api_query_guardrail.py` clean.  --- <a href="https://app.devin.ai/review/keboola/agnes-the-ai-analyst/pull/218" target="_blank"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open in Devin Review"> </picture> </a>
2026-05-07 16:54:45 +02:00 · 2026-05-07 16:54:45 +02:00 · 378ee40459
commit 378ee40459
parent f1561a67d8
4 changed files with 100 additions and 7 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -10,6 +10,12 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C

 ## [Unreleased]

+## [0.46.1] — 2026-05-07
+
+### Fixed
+
+- `remote_estimate_failed` now surfaces the rewritten-SQL diagnostic (the actual BQ "Unrecognized name" / "Syntax error" message) instead of the unhelpful "Table must be qualified" from the user-original-SQL retry. Adds `underlying_original` for the second-attempt context. Hint now points users to `agnes schema <id>` first — the typical cause is a typo'd column name.
+
 ## [0.46.0] — 2026-05-07

 Keboola cutover bundle: native parquet on the materialized sync,
--- a/app/api/query.py
+++ b/app/api/query.py
@ -971,6 +971,16 @@ def _bq_quota_and_cap_guard(
    queries via `remote_scan_too_large`. Forbidden / upstream errors
    still propagate as HTTP 502.

+    On retry-failure the surfaced `underlying` is the FIRST exception's
+    message (the rewritten-SQL diagnostic) — not the second's. For the
+    common case where the user references a catalog id (no qualifying
+    dataset in their SQL), the second attempt is guaranteed to fail
+    with the unhelpful ``Table "<id>" must be qualified with a dataset``,
+    masking the actually-useful ``Unrecognized name: <column>`` /
+    ``Syntax error`` diagnostic from the rewritten attempt. The
+    second-attempt message is preserved as `underlying_original` for
+    operator visibility.
+
    Flow:
    1. `check_daily_budget` — over-cap users get 429 BEFORE any BQ work.
    2. `quota.acquire(user_id)` opened — concurrent-slot held throughout.
@ -1068,15 +1078,26 @@ def _bq_quota_and_cap_guard(
                    raise HTTPException(status_code=400, detail={
                        "kind": "remote_estimate_failed",
                        "message": (
-                            "Could not estimate scan size for this query."
+                            "BigQuery rejected this query during cost "
+                            "estimation."
                        ),
                        "hint": (
-                            "Use a registered table name from `agnes "
-                            "catalog`, or write BQ-native SQL with full "
-                            "backtick paths. Pure DuckDB-only syntax is "
-                            "not supported for --remote queries."
+                            "Most often this means a column referenced "
+                            "in WHERE/SELECT/etc doesn't exist on the "
+                            "table — verify with `agnes schema <id>`. "
+                            "Otherwise: use a registered table name from "
+                            "`agnes catalog`, or write BQ-native SQL "
+                            "with full backtick paths."
                        ),
-                        "underlying": exc2.message,
+                        # Surface the FIRST attempt's diagnostic (rewritten
+                        # SQL — has the real "Unrecognized name" / syntax
+                        # info). Second attempt for catalog-id-only SQL
+                        # always fails with the unhelpful "must be
+                        # qualified" message, so we keep it as
+                        # `underlying_original` for operator context but
+                        # don't lead with it.
+                        "underlying": exc.message,
+                        "underlying_original": exc2.message,
                    })

            # Distribute the total to dry_run_set so the caller's
--- a/pyproject.toml
+++ b/pyproject.toml
@ -1,6 +1,6 @@
 [project]
 name = "agnes-the-ai-analyst"
-version = "0.46.0"
+version = "0.46.1"
 description = "Agnes — AI Data Analyst platform for AI analytical systems"
 requires-python = ">=3.11,<3.14"
 license = "MIT"
--- a/tests/test_api_query_guardrail.py
+++ b/tests/test_api_query_guardrail.py
@ -347,6 +347,72 @@ def test_fallback_fails_fast_on_pure_duckdb_syntax(
           "backtick" in detail.get("hint", "").lower(), detail


+def test_remote_estimate_failed_surfaces_first_error_when_attempts_differ(
+    seeded_app, mock_dry_run, monkeypatch,
+):
+    """When the rewritten-SQL dry-run fails with a column-not-found /
+    syntax error and the original-SQL retry fails with the unhelpful
+    "must be qualified" (the typical shape for catalog-id references —
+    user SQL has no qualifying dataset, so the retry is guaranteed to
+    fail this way), the surfaced `underlying` MUST be the first
+    attempt's diagnostic. Pre-fix the second attempt's message
+    overwrote the first, masking the real cause from the user.
+    """
+    from connectors.bigquery.access import BqAccessError
+
+    _register_bq_remote_row("ue", "finance", "ue")
+
+    state = {"calls": 0}
+
+    def two_different_errors(_bq, _sql):
+        state["calls"] += 1
+        if state["calls"] == 1:
+            raise BqAccessError(
+                "bq_bad_request",
+                "Unrecognized name: authorize_date at [1:88]",
+            )
+        raise BqAccessError(
+            "bq_bad_request",
+            "Table 'unit_economics' must be qualified with a dataset "
+            "(e.g. dataset.table)",
+        )
+
+    monkeypatch.setattr(
+        "app.api.query._bq_dry_run_bytes", two_different_errors,
+        raising=False,
+    )
+
+    c = seeded_app["client"]
+    token = seeded_app["admin_token"]
+    r = c.post(
+        "/api/query",
+        json={
+            "sql": (
+                "SELECT COUNT(*) FROM ue "
+                "WHERE authorize_date = DATE '2025-05-06'"
+            ),
+        },
+        headers=_auth(token),
+    )
+
+    assert state["calls"] == 2, (
+        f"expected rewritten + original-retry = 2 dry-runs, got "
+        f"{state['calls']}"
+    )
+    assert r.status_code == 400, r.json()
+    detail = r.json().get("detail", {})
+    assert isinstance(detail, dict), detail
+    assert detail.get("kind") == "remote_estimate_failed", detail
+    # The FIRST attempt's diagnostic — the actually-useful one — wins.
+    assert "authorize_date" in detail.get("underlying", ""), detail
+    # The second attempt's context is preserved for operator visibility.
+    assert "must be qualified" in detail.get("underlying_original", ""), \
+        detail
+    # Hint now points at `agnes schema` first — the typical cause is a
+    # typo'd column name on the FROM table.
+    assert "agnes schema" in detail.get("hint", "").lower(), detail
+
+
 def test_guardrail_propagates_502_on_non_parse_bq_errors(
    seeded_app, mock_dry_run, monkeypatch,
 ):