From abc2335ea203c6ef64c0cdd02651712cdabe3d3b Mon Sep 17 00:00:00 2001 From: Vojtech Rysanek Date: Wed, 6 May 2026 11:00:59 +0400 Subject: [PATCH 1/4] docs(marketplace): document two-step fallback for marketplace registration MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The 'Git channel' block previously showed only the direct '/plugin marketplace add https://x:$AGNES_PAT@…' path. That path fails on macOS/Windows against a private-CA Agnes instance because Bun-compiled 'claude' ignores the OS trust store and CA env vars on the marketplace HTTPS path (see the existing rationale in app/web/setup_instructions.py). Document the two-step fallback explicitly: git clone https://x:$AGNES_PAT@agnes.example.com/marketplace.git/ \ ~/agnes-marketplace claude plugin marketplace add ~/agnes-marketplace System 'git' honors GIT_SSL_CAINFO + the OS trust store, so the clone succeeds where direct add fails; pointing Claude Code at the local clone then sidesteps the Bun TLS path entirely. The dashboard-served setup payload already branches between the two automatically based on platform; the docs now match that behavior for manual flows. Also note the optional 'remote set-url' hardening to strip the PAT from the cloned repo's origin (mirrors what the dashboard payload does). --- CLAUDE.md | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/CLAUDE.md b/CLAUDE.md index 13c3af9..cb0d483 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -363,10 +363,32 @@ User registration inside Claude Code: # ZIP channel (typically via a SessionStart hook that unpacks into ./marketplace/) curl -H "Authorization: Bearer $AGNES_PAT" https://agnes.example.com/marketplace.zip -# Git channel — one-time registration +# Git channel — one-time registration. Two paths; pick the first that works. + +# (a) Direct registration — preferred when it works. 
/plugin marketplace add https://x:$AGNES_PAT@agnes.example.com/marketplace.git/ + +# (b) Two-step fallback — required when (a) fails. Bun-compiled `claude` on +# macOS / Windows ignores the OS trust store and CA env vars on the +# marketplace HTTPS path, so direct add can fail with TLS errors against +# a private-CA Agnes instance even when system tools work fine. System +# `git` honors GIT_SSL_CAINFO + the OS trust store, so cloning manually +# and pointing Claude Code at the local clone sidesteps the Bun TLS path +# entirely. +git clone https://x:$AGNES_PAT@agnes.example.com/marketplace.git/ ~/agnes-marketplace +claude plugin marketplace add ~/agnes-marketplace +# Optional hardening: strip the PAT from the cloned repo's origin so it +# doesn't sit in plaintext at ~/agnes-marketplace/.git/config — re-clone via +# the dashboard's setup flow when the PAT rotates. +git -C ~/agnes-marketplace remote set-url origin https://agnes.example.com/marketplace.git/ ``` +The dashboard-served setup payload (see `app/web/setup_instructions.py`) already +branches between (a) and (b) automatically based on platform when a private CA +is in play. The block above is the manual equivalent for users registering +outside that flow (e.g. operators bringing up a new instance, or +analysts whose first attempt failed and need to retry by hand). + ## Hybrid Queries (BigQuery + Local) For tables too large to sync locally, use hybrid queries that JOIN local data with on-demand BigQuery results: From 32c8ea601a0f285022c28e1d4fd41716921bb088 Mon Sep 17 00:00:00 2001 From: Vojtech Rysanek Date: Wed, 6 May 2026 11:24:14 +0400 Subject: [PATCH 2/4] fix(bigquery): apply bq_query_timeout_ms on every BQ-extension attach + surface silent failures MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The DuckDB BigQuery extension defaults bq_query_timeout_ms to 90 s, which is too tight for analyst-scale queries against view-backed BQ datasets. 
Agnes already has apply_bq_session_settings() that bumps it to 600 s (configurable via data_source.bigquery.query_timeout_ms), but two regressions let the 90 s default leak through to live queries: 1. apply_bq_session_settings() swallowed every Exception silently. If the BigQuery extension wasn't loaded on the connection yet, or the installed extension version didn't recognise the setting, the SET would fail and the function would return without surfacing the problem. Operators saw 90 s timeouts on 'agnes query --remote' with no log line explaining why. 2. The call sites in src/db.py:_reattach_remote_extensions and src/orchestrator.py:_remote_attach only invoked apply_bq_session_settings on the metadata-token branch (token_env empty, the BqAccess contract). The token-based and no-auth branches ran ATTACH against the BigQuery extension without ever applying the timeout setting — so any BQ source registered with an explicit token_env, or with no auth env at all, fell back to the 90 s default. Fix: - apply_bq_session_settings now logs WARNING on each failure path (instance_config import error, non-numeric value, SET execution failure, readback error). It also verifies the setting actually landed via SELECT current_setting('bq_query_timeout_ms') and logs WARNING when the readback disagrees with the requested value, which catches the silent-ignore case some extension versions exhibit. - Both _reattach_remote_extensions (src/db.py) and _remote_attach (src/orchestrator.py) now call apply_bq_session_settings on every branch that ATTACHes a BigQuery alias, not only the metadata-token branch. Idempotent: calling it twice on the metadata-token path is a no-op SET. Tests: - Extended the _RecordingConn fixture to support .fetchone() so the readback assertion path works. Updated existing call-shape assertions to expect the SELECT current_setting readback alongside the SET. 
Added two new tests covering the WARNING surfaces for SET failure and readback mismatch — regression guards for the silent- fallback bug this PR addresses. - Full BQ-touching suite (398 tests) passes. --- connectors/bigquery/access.py | 73 +++++++++++++++++++++--- src/db.py | 11 ++++ src/orchestrator.py | 11 ++++ tests/test_bq_query_timeout.py | 100 +++++++++++++++++++++++++++++---- 4 files changed, 176 insertions(+), 19 deletions(-) diff --git a/connectors/bigquery/access.py b/connectors/bigquery/access.py index e46a877..48e4e9f 100644 --- a/connectors/bigquery/access.py +++ b/connectors/bigquery/access.py @@ -249,12 +249,24 @@ def apply_bq_session_settings(conn) -> None: Call AFTER ``LOAD bigquery`` on every DuckDB session that touches BQ: BqAccess's session factory, the standalone extractor in - ``connectors/bigquery/extractor.py``, and the orchestrator's - ``_remote_attach`` path in ``src/orchestrator.py``. + ``connectors/bigquery/extractor.py``, the orchestrator's + ``_remote_attach`` path in ``src/orchestrator.py``, and ``src/db.py``'s + read-only analytics-DB factory (called from ``_reattach_remote_extensions`` + plus a belt-and-suspenders call from ``get_analytics_db_readonly`` itself). + + SET failures are logged at WARNING level (previously silent) so operators + can diagnose timeouts that surface as the extension default 90 s when the + intended value was higher. The applied value is verified via + ``current_setting('bq_query_timeout_ms')``; a mismatch is also logged. 
""" try: from app.instance_config import get_value - except Exception: + except Exception as e: + logger.warning( + "apply_bq_session_settings: instance_config unavailable (%s); " + "extension default bq_query_timeout_ms (90 s) will apply", + e, + ) return raw = get_value( "data_source", "bigquery", "query_timeout_ms", default=600_000, @@ -262,16 +274,61 @@ def apply_bq_session_settings(conn) -> None: try: ms = int(raw) if raw is not None else 0 except (TypeError, ValueError): + logger.warning( + "apply_bq_session_settings: query_timeout_ms=%r is not an int; " + "extension default (90 s) will apply", + raw, + ) return if ms <= 0: + # Operator opt-out: leave extension default in place. Log INFO so the + # choice shows up in startup logs without being noisy. + logger.info( + "apply_bq_session_settings: query_timeout_ms=%d (≤0); extension " + "default bq_query_timeout_ms (90 s) will apply", + ms, + ) return try: conn.execute(f"SET bq_query_timeout_ms = {int(ms)}") - except Exception: - # Fail-soft: extension version may not support the setting, or the - # session may already have been frozen — leave the default rather - # than poisoning the whole session. - pass + except Exception as e: + # Most common cause: the BigQuery extension is not loaded on this + # connection yet (caller forgot the `LOAD bigquery` step), or the + # installed extension version pre-dates the setting. Either way the + # 90 s default sticks and remote queries time out unexpectedly. + # Surface this — silent fallback was the bug behind real outages. + logger.warning( + "apply_bq_session_settings: SET bq_query_timeout_ms=%d failed (%s); " + "extension default (90 s) will apply. Likely cause: BigQuery " + "extension not loaded on this connection, or the installed " + "extension version does not support this setting.", + ms, e, + ) + return + # Verify the setting actually landed — protects against silent ignores + # the extension might do in some failure modes. 
+ try: + result = conn.execute( + "SELECT current_setting('bq_query_timeout_ms')" + ).fetchone() + actual = int(result[0]) if result and result[0] is not None else None + except Exception as e: + logger.warning( + "apply_bq_session_settings: could not read back " + "bq_query_timeout_ms (%s); cannot verify setting was applied", + e, + ) + return + if actual != ms: + logger.warning( + "apply_bq_session_settings: requested bq_query_timeout_ms=%d but " + "current_setting reports %r — extension may have ignored the SET", + ms, actual, + ) + else: + logger.debug( + "apply_bq_session_settings: bq_query_timeout_ms=%d applied", ms, + ) class BqAccess: diff --git a/src/db.py b/src/db.py index d6c1ba4..261c7bc 100644 --- a/src/db.py +++ b/src/db.py @@ -690,10 +690,21 @@ def _reattach_remote_extensions( conn.execute( f"ATTACH '{safe_url}' AS {alias} (TYPE {extension}, TOKEN '{escaped_token}')" ) + # Apply BQ session settings on every BQ-extension attach, + # not only the metadata-token branch above. Previously the + # token-based branch fell through without setting + # bq_query_timeout_ms, leaving the 90 s extension default + # in place and causing "remote query timeout" surprises. 
+ if extension == "bigquery": + from connectors.bigquery.access import apply_bq_session_settings + apply_bq_session_settings(conn) else: conn.execute( f"ATTACH '{safe_url}' AS {alias} (TYPE {extension}, READ_ONLY)" ) + if extension == "bigquery": + from connectors.bigquery.access import apply_bq_session_settings + apply_bq_session_settings(conn) attached_dbs.add(alias) logger.debug("Re-attached remote source %s via %s extension", alias, extension) except Exception as e: diff --git a/src/orchestrator.py b/src/orchestrator.py index 914bcd6..47f1737 100644 --- a/src/orchestrator.py +++ b/src/orchestrator.py @@ -512,11 +512,22 @@ class SyncOrchestrator: conn.execute( f"ATTACH '{safe_url}' AS {alias} (TYPE {extension}, TOKEN '{escaped_token}')" ) + # Apply BQ session settings on every BQ-extension attach, + # not only the metadata-token branch above. The token-based + # branch previously fell through without calling + # apply_bq_session_settings, leaving the 90 s extension + # default for bq_query_timeout_ms in place. + if extension == "bigquery": + from connectors.bigquery.access import apply_bq_session_settings + apply_bq_session_settings(conn) else: # No auth required (or extension handles it via env automatically) conn.execute( f"ATTACH '{safe_url}' AS {alias} (TYPE {extension}, READ_ONLY)" ) + if extension == "bigquery": + from connectors.bigquery.access import apply_bq_session_settings + apply_bq_session_settings(conn) logger.info("Attached remote source %s via %s extension", alias, extension) except Exception as e: diff --git a/tests/test_bq_query_timeout.py b/tests/test_bq_query_timeout.py index fa64e08..91c7f54 100644 --- a/tests/test_bq_query_timeout.py +++ b/tests/test_bq_query_timeout.py @@ -13,18 +13,45 @@ from connectors.bigquery.access import apply_bq_session_settings class _RecordingConn: """Minimal DuckDB-conn stand-in that records execute() calls. 
- apply_bq_session_settings only calls .execute(); we don't need a - real DuckDB to verify the SET command shape. + apply_bq_session_settings calls .execute() to issue the SET and the + follow-up ``current_setting`` readback (added so the function can + verify the extension actually accepted the setting). The readback + expects .fetchone() on the result — wire it to echo the SET value + so the verification path succeeds when nothing rejects the SET. """ + SETTING_NAME = "bq_query_timeout_ms" + SET_PREFIX = f"SET {SETTING_NAME} = " + def __init__(self, raise_on=None): self.calls: list[str] = [] self.raise_on = raise_on + # Last value the extension would report from + # current_setting('bq_query_timeout_ms') — set when SET is observed, + # echoed back from .fetchone(). + self._reported_setting: str | None = None def execute(self, sql: str): self.calls.append(sql) if self.raise_on and self.raise_on in sql: raise RuntimeError(f"simulated failure on: {sql}") + if sql.startswith(self.SET_PREFIX): + # Capture the value the production code asked the extension to + # apply so the readback below echoes a consistent answer. + self._reported_setting = sql[len(self.SET_PREFIX):] + return _RecordingResult(self._reported_setting) + + +class _RecordingResult: + """Stand-in for the DuckDB result of ``SELECT current_setting(...)``.""" + + def __init__(self, value): + self._value = value + + def fetchone(self): + # current_setting returns a one-tuple. None is the realistic + # answer when the extension doesn't have the setting registered. 
+ return (self._value,) def _patched_get_value(value): @@ -42,7 +69,8 @@ def _patched_get_value(value): def test_default_when_config_missing(): """When get_value returns the default (None passed through, default arg used), apply_bq_session_settings should fall back to the bumped - 600 000 ms default and emit the SET.""" + 600 000 ms default, emit the SET, and verify it landed via the + current_setting readback.""" conn = _RecordingConn() # Simulate get_value returning the default we passed (600_000) by # echoing the default kwarg. @@ -50,14 +78,20 @@ def test_default_when_config_missing(): return default with patch("app.instance_config.get_value", side_effect=fake): apply_bq_session_settings(conn) - assert conn.calls == ["SET bq_query_timeout_ms = 600000"] + assert conn.calls == [ + "SET bq_query_timeout_ms = 600000", + "SELECT current_setting('bq_query_timeout_ms')", + ] def test_explicit_value(): conn = _RecordingConn() with _patched_get_value(900_000): apply_bq_session_settings(conn) - assert conn.calls == ["SET bq_query_timeout_ms = 900000"] + assert conn.calls == [ + "SET bq_query_timeout_ms = 900000", + "SELECT current_setting('bq_query_timeout_ms')", + ] def test_zero_sentinel_leaves_extension_default(): @@ -95,19 +129,63 @@ def test_string_numeric_is_coerced(): conn = _RecordingConn() with _patched_get_value("750000"): apply_bq_session_settings(conn) - assert conn.calls == ["SET bq_query_timeout_ms = 750000"] + assert conn.calls == [ + "SET bq_query_timeout_ms = 750000", + "SELECT current_setting('bq_query_timeout_ms')", + ] -def test_set_failure_does_not_propagate(): +def test_set_failure_does_not_propagate(caplog): """Older DuckDB BQ extension versions may not recognise the setting. The function must fail-soft so a session that was otherwise healthy - keeps working — just with the extension's built-in default timeout.""" + keeps working — just with the extension's built-in default timeout. 
+ The failure is logged at WARNING so an operator who hits the 90 s + extension default unexpectedly can see why.""" conn = _RecordingConn(raise_on="SET bq_query_timeout_ms") with _patched_get_value(600_000): - # Must not raise. - apply_bq_session_settings(conn) - # The SET was attempted (recorded before the exception). + with caplog.at_level("WARNING", logger="connectors.bigquery.access"): + # Must not raise. + apply_bq_session_settings(conn) + # The SET was attempted (recorded before the exception); no readback + # because the SET path raised before reaching it. assert conn.calls == ["SET bq_query_timeout_ms = 600000"] + assert any( + "SET bq_query_timeout_ms=600000 failed" in r.message + for r in caplog.records + ), "expected a WARNING surfacing the silent-failure regression that hid 90 s timeouts" + + +def test_setting_mismatch_is_logged(caplog): + """If the extension accepts the SET silently but doesn't actually apply + it (some failure modes), the readback verification must surface the + mismatch via WARNING so operators can diagnose.""" + conn = _RecordingConn() + # Simulate extension ignoring the SET: keep the readback value at + # whatever it was before (None — extension default in effect). + conn._reported_setting = None # pre-seed: readback returns None + with _patched_get_value(600_000): + with caplog.at_level("WARNING", logger="connectors.bigquery.access"): + # _RecordingConn echoes the SET into _reported_setting on observe; + # to simulate "extension ignored SET" we override execute() to + # NOT update the setting on SET. 
+ original_execute = conn.execute + + def execute_without_capture(sql: str): + conn.calls.append(sql) + if sql.startswith(_RecordingConn.SET_PREFIX): + # Don't update _reported_setting → readback returns None + return _RecordingResult(conn._reported_setting) + return _RecordingResult(conn._reported_setting) + + conn.execute = execute_without_capture # type: ignore[method-assign] + try: + apply_bq_session_settings(conn) + finally: + conn.execute = original_execute # type: ignore[method-assign] + assert any( + "current_setting reports" in r.message + for r in caplog.records + ), "expected a WARNING when the readback disagrees with the SET" def test_no_app_config_module_silently_skipped(): From a7d19206d74e6511454eecadca388bcd1fd3ae58 Mon Sep 17 00:00:00 2001 From: ZdenekSrotyr Date: Wed, 6 May 2026 09:27:42 +0200 Subject: [PATCH 3/4] =?UTF-8?q?release:=200.38.1=20=E2=80=94=20docs(market?= =?UTF-8?q?place)=20two-step=20fallback?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- CHANGELOG.md | 13 +++++++++++++ pyproject.toml | 2 +- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 6b40d45..9a787fc 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,6 +10,19 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C ## [Unreleased] +## [0.38.1] — 2026-05-06 + +### Internal +- `CLAUDE.md` — `Claude Code marketplace endpoint` section now documents the + two-step fallback (system `git clone` + local `claude plugin marketplace + add`) for users registering manually against a private-CA Agnes instance. + Bun-compiled `claude` ignores the OS trust store and CA env vars on the + marketplace HTTPS path, so direct `/plugin marketplace add` over HTTPS can + fail with TLS errors on macOS / Windows even when system tools work fine. 
+ The dashboard-served setup payload (`app/web/setup_instructions.py`) + already branches between the two automatically based on platform; the + doc snippet now matches that behavior for manual flows. + ## [0.38.0] — 2026-05-06 ### Added diff --git a/pyproject.toml b/pyproject.toml index 40106fe..54d552e 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,6 +1,6 @@ [project] name = "agnes-the-ai-analyst" -version = "0.38.0" +version = "0.38.1" description = "Agnes — AI Data Analyst platform for AI analytical systems" requires-python = ">=3.11,<3.14" license = "MIT" From d68c3c5fa2cf7b81da71af3c1e68d40b07f63ddb Mon Sep 17 00:00:00 2001 From: ZdenekSrotyr Date: Wed, 6 May 2026 09:48:12 +0200 Subject: [PATCH 4/4] =?UTF-8?q?release:=200.38.2=20=E2=80=94=20bq=5Fquery?= =?UTF-8?q?=5Ftimeout=5Fms=20applied=20on=20every=20BQ=20attach=20+=20surf?= =?UTF-8?q?aced=20silent=20failures?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- CHANGELOG.md | 22 ++++++++++++++++++++++ pyproject.toml | 2 +- 2 files changed, 23 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 9a787fc..2f8ddcf 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,6 +10,28 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C ## [Unreleased] +## [0.38.2] — 2026-05-06 + +### Fixed +- **`bq_query_timeout_ms` was not applied on every BigQuery ATTACH branch** + (`src/db.py:_reattach_remote_extensions`, + `src/orchestrator.py:_attach_remote_extensions`). Pre-fix only the + metadata-token branch (the BqAccess contract, `token_env=''`) called + `apply_bq_session_settings`. BigQuery sources registered with an explicit + `token_env`, or with no auth env, ATTACH'd without ever applying the + timeout — falling back to the extension's 90 s default. Default-config + operators on those branches now consistently get the configured 600 s + (or whatever `data_source.bigquery.query_timeout_ms` is set to). 
+- **`apply_bq_session_settings` swallowed every `Exception` silently** + (`connectors/bigquery/access.py`). Two realistic failure modes — the + BigQuery extension not yet loaded on the connection, or an installed + extension version that doesn't recognise the setting — left the 90 s + default in place with no log line explaining why. Each failure path + now logs `WARNING` with the actionable cause; on success the applied + value is verified via a `current_setting('bq_query_timeout_ms')` + readback (catches the silent-ignore mode some extension versions + exhibit) and a mismatch logs `WARNING` too. + ## [0.38.1] — 2026-05-06 ### Internal diff --git a/pyproject.toml b/pyproject.toml index 54d552e..def8c93 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,6 +1,6 @@ [project] name = "agnes-the-ai-analyst" -version = "0.38.1" +version = "0.38.2" description = "Agnes — AI Data Analyst platform for AI analytical systems" requires-python = ">=3.11,<3.14" license = "MIT"
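Note on PATCH 2/4: the set-then-read-back check that apply_bq_session_settings now performs can be sketched in isolation as below. This is a minimal illustration, not the code in connectors/bigquery/access.py. The names set_and_verify, FakeConn, and FakeResult are hypothetical, and the boolean return value is an addition for the sake of the example (the patched function returns None and only logs). It assumes a connection object whose execute() returns something with fetchone(), the same shape the _RecordingConn fixture emulates.

```python
import logging

logger = logging.getLogger("bq_timeout_sketch")  # hypothetical logger name


def set_and_verify(conn, name: str, value: int) -> bool:
    """Issue SET, read the value back, and report whether it landed.

    Hypothetical helper: returns True only when the readback matches.
    """
    try:
        conn.execute(f"SET {name} = {int(value)}")
    except Exception as exc:
        # Fail soft: the session keeps working with the extension default.
        logger.warning("SET %s=%d failed (%s)", name, value, exc)
        return False
    try:
        row = conn.execute(f"SELECT current_setting('{name}')").fetchone()
        actual = int(row[0]) if row and row[0] is not None else None
    except Exception as exc:
        logger.warning("could not read back %s (%s)", name, exc)
        return False
    if actual != value:
        # The SET was accepted but did not take effect.
        logger.warning("requested %s=%d but readback is %r", name, value, actual)
        return False
    return True


class FakeResult:
    """Hypothetical stand-in for a query result."""

    def __init__(self, value):
        self._value = value

    def fetchone(self):
        return (self._value,)


class FakeConn:
    """Hypothetical stand-in connection.

    honor_set=False simulates an extension that silently ignores the SET;
    fail_set=True simulates an extension that rejects it outright.
    """

    def __init__(self, honor_set=True, fail_set=False):
        self.honor_set = honor_set
        self.fail_set = fail_set
        self.stored = None
        self.calls = []

    def execute(self, sql):
        self.calls.append(sql)
        if sql.startswith("SET "):
            if self.fail_set:
                raise RuntimeError("simulated SET failure")
            if self.honor_set:
                self.stored = sql.rsplit("=", 1)[1].strip()
        return FakeResult(self.stored)
```

With these stand-ins, a healthy connection yields True, while both the rejected-SET and ignored-SET cases yield False and emit a WARNING. Those are the two failure surfaces the new tests in tests/test_bq_query_timeout.py guard.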