release: 0.47.2 — restore #218 + #219 fixes silently reverted by #217 (#225)

## Summary

Smoke-testing the just-shipped 0.47.1 against production exposed two regressions:

1. `agnes query --remote "SELECT FROM unit_economics WHERE bad_col=1"` returned `Table "unit_economics" must be qualified` (the OLD error) instead of `Unrecognized name: bad_col` (the #218 fix's intended behavior).
2. `agnes query "DESCRIBE unit_economics"` showed only DuckDB's misleading `Did you mean order_economics?` with no Agnes hint paragraph (the #219 fix is missing).

Root cause: PR #217's squash merge (`506a378c`) carried stale snapshots of `app/api/query.py` and `cli/commands/query.py` from before #218 and #219 merged. The rebase-and-merge auto-merged those files cleanly (no conflict markers) but the result silently reverted both fixes.

Restore the two changes verbatim. Tests for both fixes already on main and continue to pass against the restored code.

## Test plan

- [x] `pytest tests/test_api_query_guardrail.py tests/test_cli_query.py` — clean
- [x] Manual repro against prod after deploy: both flows now surface the intended diagnostic.
<!-- devin-review-badge-begin -->

---

<a href="https://app.devin.ai/review/keboola/agnes-the-ai-analyst/pull/225" target="_blank">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1">
    <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open in Devin Review">
  </picture>
</a>
<!-- devin-review-badge-end -->
This commit is contained in:
ZdenekSrotyr 2026-05-07 19:57:18 +02:00 committed by GitHub
parent 506a378c3a
commit 917f9aaef0
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
6 changed files with 164 additions and 7 deletions

View file

@ -10,6 +10,12 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C
## [Unreleased] ## [Unreleased]
## [0.47.2] — 2026-05-07
### Fixed
- Restore #218 (real BQ error surfacing in `remote_estimate_failed`) and #219 (friendlier missing-table hint in `agnes query`) — both fixes were silently reverted by the squash merge of #217 because that branch carried stale snapshots of `app/api/query.py` and `cli/commands/query.py` from before #218 and #219 merged. Verified end-to-end against production: `agnes query --remote "SELECT FROM unit_economics WHERE bad_col=1"` now returns the BQ "Unrecognized name" diagnostic; `agnes query "DESCRIBE unit_economics"` now appends the remote-table hint.
## [0.47.1] — 2026-05-07 ## [0.47.1] — 2026-05-07
Keboola connector v27 — incremental, partitioned, where_filters, typed parquet. Keboola connector v27 — incremental, partitioned, where_filters, typed parquet.

View file

@ -971,6 +971,16 @@ def _bq_quota_and_cap_guard(
queries via `remote_scan_too_large`. Forbidden / upstream errors queries via `remote_scan_too_large`. Forbidden / upstream errors
still propagate as HTTP 502. still propagate as HTTP 502.
On retry-failure the surfaced `underlying` is the FIRST exception's
message (the rewritten-SQL diagnostic) not the second's. For the
common case where the user references a catalog id (no qualifying
dataset in their SQL), the second attempt is guaranteed to fail
with the unhelpful ``Table "<id>" must be qualified with a dataset``,
masking the actually-useful ``Unrecognized name: <column>`` /
``Syntax error`` diagnostic from the rewritten attempt. The
second-attempt message is preserved as `underlying_original` for
operator visibility.
Flow: Flow:
1. `check_daily_budget` over-cap users get 429 BEFORE any BQ work. 1. `check_daily_budget` over-cap users get 429 BEFORE any BQ work.
2. `quota.acquire(user_id)` opened concurrent-slot held throughout. 2. `quota.acquire(user_id)` opened concurrent-slot held throughout.
@ -1068,15 +1078,26 @@ def _bq_quota_and_cap_guard(
raise HTTPException(status_code=400, detail={ raise HTTPException(status_code=400, detail={
"kind": "remote_estimate_failed", "kind": "remote_estimate_failed",
"message": ( "message": (
"Could not estimate scan size for this query." "BigQuery rejected this query during cost "
"estimation."
), ),
"hint": ( "hint": (
"Use a registered table name from `agnes " "Most often this means a column referenced "
"catalog`, or write BQ-native SQL with full " "in WHERE/SELECT/etc doesn't exist on the "
"backtick paths. Pure DuckDB-only syntax is " "table — verify with `agnes schema <id>`. "
"not supported for --remote queries." "Otherwise: use a registered table name from "
"`agnes catalog`, or write BQ-native SQL "
"with full backtick paths."
), ),
"underlying": exc2.message, # Surface the FIRST attempt's diagnostic (rewritten
# SQL — has the real "Unrecognized name" / syntax
# info). Second attempt for catalog-id-only SQL
# always fails with the unhelpful "must be
# qualified" message, so we keep it as
# `underlying_original` for operator context but
# don't lead with it.
"underlying": exc.message,
"underlying_original": exc2.message,
}) })
# Distribute the total to dry_run_set so the caller's # Distribute the total to dry_run_set so the caller's

View file

@ -2,6 +2,7 @@
import json import json
import os import os
import re
import sys import sys
from pathlib import Path from pathlib import Path
from typing import List, Optional from typing import List, Optional
@ -78,6 +79,26 @@ def _query_local(sql: str, fmt: str, limit: int):
_output(columns, result, fmt) _output(columns, result, fmt)
except Exception as e: except Exception as e:
typer.echo(f"Query error: {e}", err=True) typer.echo(f"Query error: {e}", err=True)
# DuckDB's "Did you mean <similar materialized view>" suggestion is
# misleading when the unresolvable identifier is actually a
# `query_mode='remote'` table — those have no local view by design.
# Append a friendly hint pointing the user at `agnes catalog`,
# `agnes schema`, and `agnes query --remote`. We don't verify against
# the remote registry here (this command is offline-friendly), so the
# hint is conditional ("might be") — safe even when the name was just
# a typo.
m = re.search(r"Table with name ([A-Za-z_][A-Za-z0-9_]*) does not exist", str(e))
if m:
typer.echo("", err=True)
typer.echo(
f"Note: `{m.group(1)}` might be a `query_mode='remote'` table. Local "
"DuckDB only holds views for `local` and `materialized` tables — "
"`remote` ones live on BigQuery and are not synced.\n"
" - List all registered tables: agnes catalog\n"
" - Inspect column schema: agnes schema <name>\n"
" - Run a query against BigQuery: agnes query --remote \"<SQL>\"",
err=True,
)
raise typer.Exit(1) raise typer.Exit(1)
finally: finally:
conn.close() conn.close()

View file

@ -1,6 +1,6 @@
[project] [project]
name = "agnes-the-ai-analyst" name = "agnes-the-ai-analyst"
version = "0.47.1" version = "0.47.2"
description = "Agnes — AI Data Analyst platform for AI analytical systems" description = "Agnes — AI Data Analyst platform for AI analytical systems"
requires-python = ">=3.11,<3.14" requires-python = ">=3.11,<3.14"
license = "MIT" license = "MIT"

View file

@ -347,6 +347,72 @@ def test_fallback_fails_fast_on_pure_duckdb_syntax(
"backtick" in detail.get("hint", "").lower(), detail "backtick" in detail.get("hint", "").lower(), detail
def test_remote_estimate_failed_surfaces_first_error_when_attempts_differ(
seeded_app, mock_dry_run, monkeypatch,
):
"""When the rewritten-SQL dry-run fails with a column-not-found /
syntax error and the original-SQL retry fails with the unhelpful
"must be qualified" (the typical shape for catalog-id references
user SQL has no qualifying dataset, so the retry is guaranteed to
fail this way), the surfaced `underlying` MUST be the first
attempt's diagnostic. Pre-fix the second attempt's message
overwrote the first, masking the real cause from the user.
"""
from connectors.bigquery.access import BqAccessError
_register_bq_remote_row("ue", "finance", "ue")
state = {"calls": 0}
def two_different_errors(_bq, _sql):
state["calls"] += 1
if state["calls"] == 1:
raise BqAccessError(
"bq_bad_request",
"Unrecognized name: authorize_date at [1:88]",
)
raise BqAccessError(
"bq_bad_request",
"Table 'unit_economics' must be qualified with a dataset "
"(e.g. dataset.table)",
)
monkeypatch.setattr(
"app.api.query._bq_dry_run_bytes", two_different_errors,
raising=False,
)
c = seeded_app["client"]
token = seeded_app["admin_token"]
r = c.post(
"/api/query",
json={
"sql": (
"SELECT COUNT(*) FROM ue "
"WHERE authorize_date = DATE '2025-05-06'"
),
},
headers=_auth(token),
)
assert state["calls"] == 2, (
f"expected rewritten + original-retry = 2 dry-runs, got "
f"{state['calls']}"
)
assert r.status_code == 400, r.json()
detail = r.json().get("detail", {})
assert isinstance(detail, dict), detail
assert detail.get("kind") == "remote_estimate_failed", detail
# The FIRST attempt's diagnostic — the actually-useful one — wins.
assert "authorize_date" in detail.get("underlying", ""), detail
# The second attempt's context is preserved for operator visibility.
assert "must be qualified" in detail.get("underlying_original", ""), \
detail
# Hint now points at `agnes schema` first — the typical cause is a
# typo'd column name on the FROM table.
assert "agnes schema" in detail.get("hint", "").lower(), detail
def test_guardrail_propagates_502_on_non_parse_bq_errors( def test_guardrail_propagates_502_on_non_parse_bq_errors(
seeded_app, mock_dry_run, monkeypatch, seeded_app, mock_dry_run, monkeypatch,
): ):

View file

@ -150,3 +150,46 @@ class TestLocalQuery:
result = runner.invoke(app, ["query", "SELECT * FROM nonexistent_table_xyz"]) result = runner.invoke(app, ["query", "SELECT * FROM nonexistent_table_xyz"])
assert result.exit_code == 1 assert result.exit_code == 1
assert "Query error" in result.output assert "Query error" in result.output
def test_local_query_missing_table_hints_remote(self, tmp_config):
"""Querying a table absent from local DuckDB surfaces a hint about
`query_mode='remote'` tables alongside the original DuckDB error.
Reproduces the analyst-session UX gap where DuckDB's nearest-name
("Did you mean <other_table>") suggestion sent the user down the
wrong path they thought the table didn't exist or they typo'd,
when in fact it's a remote table that intentionally has no local
view.
"""
import duckdb
db_dir = tmp_config / "local" / "user" / "duckdb"
db_dir.mkdir(parents=True)
duckdb.connect(str(db_dir / "analytics.duckdb")).close()
result = runner.invoke(app, ["query", "DESCRIBE unit_economics"])
assert result.exit_code == 1
# Original DuckDB diagnostic must remain visible (don't break logging).
assert "Query error" in result.output
assert "Table with name unit_economics does not exist" in result.output
# New hint fires.
assert "query_mode='remote'" in result.output
assert "agnes catalog" in result.output
assert "agnes schema" in result.output
assert "agnes query --remote" in result.output
def test_local_query_syntax_error_does_not_show_remote_hint(self, tmp_config):
"""A non-missing-table failure (e.g. raw syntax error) must NOT
trigger the new remote-mode hint the regex only matches DuckDB's
`Table with name X does not exist` shape.
"""
import duckdb
db_dir = tmp_config / "local" / "user" / "duckdb"
db_dir.mkdir(parents=True)
duckdb.connect(str(db_dir / "analytics.duckdb")).close()
# Trailing FROM with no relation -> ParserException, not CatalogException.
result = runner.invoke(app, ["query", "SELECT * FROM"])
assert result.exit_code == 1
assert "Query error" in result.output
assert "query_mode='remote'" not in result.output
assert "agnes query --remote" not in result.output