Commit graph

808 commits

Author SHA1 Message Date
ZdenekSrotyr
630e224578 feat(cli): add agnes self-upgrade with smoke test + rollback
Reuses cli.update_check.check() for the version probe — extended with
bypass_disabled=True so explicit user-typed self-upgrade is not silenced
by AGNES_NO_UPDATE_CHECK (which is for the implicit warning loop).

Install path: uv tool install --force when uv is on PATH; otherwise
curl + pip via sys.executable (NOT system python3, NOT --user — both
would land outside the agnes venv and silently no-op the upgrade).

Smoke test execs the binary at the install-resolved path (uv tool dir
joined with agnes-the-ai-analyst/bin/agnes, or sys.executable's sibling
agnes for pip) — never via shutil.which, which can resolve a stale shadow
on PATH and produce a false-positive smoke pass on the OLD version. Smoke
also asserts --version output contains info.latest via PEP 440 Version()
equality (so 0.40.0 does not falsely match 0.40.10).

On smoke fail: rollback to last_known_good.json (written only after a
previous run's smoke passed). Rollback rc is captured and surfaced on
stderr if it also fails. First-ever upgrade or unrecoverable rollback
prints the canonical bootstrap recovery: curl -fsSL <server>/cli/install.sh | bash.

AGNES_SELF_UPGRADE_IN_PROGRESS=1 is set for the duration of the run
and propagated to the smoke-test subprocess. Layer B's _check_version_headers
honors the sentinel and skips the < min hard-stop, so an in-flight
upgrade can never sys.exit(2) itself.

--force invalidates the update_check cache BEFORE probing. --force +
offline = exit 1 with explicit stderr (without --force, offline is silent).
--quiet suppresses progress output but never gags failure stderr.
2026-05-06 23:23:23 +02:00
ZdenekSrotyr
d93eda7de3 perf+test(cli): cache User-Agent at module scope; pin local==min boundary 2026-05-06 23:23:23 +02:00
ZdenekSrotyr
2680a6724b feat(cli): hard-stop on incompatible-version response header
Every API response is inspected via httpx event_hooks. When the server
reports X-Agnes-Min-Version > local, CLI prints a remediation message
and exits 2. Latest-version drift continues to be handled by the
update_check warning loop — no double-warning on every API call.
2026-05-06 23:23:23 +02:00
ZdenekSrotyr
af2b866961 docs(version): clarify APP_VERSION scope + middleware /api prefix rationale 2026-05-06 23:23:23 +02:00
ZdenekSrotyr
57170bc556 feat(server): expose APP_VERSION + MIN_COMPAT_CLI_VERSION on /api/* response headers
Adds X-Agnes-Latest-Version and X-Agnes-Min-Version headers to every
/api/* response. CLI consumes these to hard-stop on incompatible drift.
MIN_COMPAT_CLI_VERSION ships at 0.0.0 — no enforcement until a deliberate
wire-protocol break bumps it.

Also dedupes app version logic: app/main.py:_app_version() helper deleted,
replaced by app/version.py:APP_VERSION as the single source of truth.
test_app_version.py rewritten to target app.version.
2026-05-06 23:23:23 +02:00
ZdenekSrotyr
56483989cf docs(plan): server-pinned CLI auto-upgrade — spec + implementation plan
Four review iterations resolved:
- PATH-shadow-safe smoke test (uv tool dir --bin + ~/.local/bin fallback)
- Recursion sentinel for in-flight self-upgrade
- sys.executable + --no-deps pip fallback (NOT system python3, NOT --user)
- Smoke + rollback with rc capture and bootstrap recovery
- Single chained SessionStart entry (shell ; for ordering, no Claude Code semantics dependency)
- AGNES_NO_UPDATE_CHECK bypass for explicit self-upgrade
- _get_shared_client() left unhooked (mid-stream sys.exit unsafe; Caddy proxies parquets anyway)

Targets release 0.40.0.
2026-05-06 23:23:23 +02:00
ZdenekSrotyr
e3494607bf
Merge pull request #208 from keboola/zs/issue-201-rewriter-backtick
fix(query): rewriter respects backtick paths; tighten cap-guard fallback (#201)
2026-05-06 23:09:43 +02:00
ZdenekSrotyr
bc55af6e88 chore: trigger Devin re-review 2026-05-06 22:07:49 +02:00
ZdenekSrotyr
f4bc04958d fix: Devin Review #1 — apply backtick mask to wrapping rewriter
`_rewrite_user_sql_for_bigquery_query` does its own bare-name detection
(mirroring the non-RBAC parts of `_bq_guardrail_inputs`). The backtick
masking from #201 was applied to `_bq_guardrail_inputs` and the
forbidden-table loop, but missed this third site — so a registered
local-mode table name appearing as the table segment of a
user-supplied full backtick path (e.g. ``\`prj.ds.orders\`` matching
registered local ``orders``) tripped the cross-source guard and
forced every backtick-path query into the 50-100× slower
ATTACH-catalog fallback.

Mask once at the top of the function, route both the BQ-name
detection (line ~830) and the cross-source check (line ~867) through
the masked copy. New regression test
`test_local_name_inside_backtick_path_does_not_trip_cross_source`
proves the wrapper now wraps when it should.
2026-05-06 21:06:21 +02:00
ZdenekSrotyr
09958c9d87 release: 0.42.0 2026-05-06 18:04:39 +02:00
ZdenekSrotyr
824e3cb636 feat(query): registry-gate full backtick BigQuery paths (#201)
Adds Pass 3 to `_bq_guardrail_inputs` that scans user SQL for full
backtick paths `<project>.<dataset>.<table>` and gates them
identically to the `bq."<dataset>"."<table>"` pass:

- Project must match the configured BigQuery data project
  (`get_bq_access().projects.data`). Mismatch → HTTP 403
  `bq_path_cross_project`.
- Path must point at a registered row. Unregistered → HTTP 403
  `bq_path_not_registered`.
- Non-admin caller must hold a grant on the registered row's id.
  Missing grant → HTTP 403 `bq_path_access_denied`.

Pre-fix, full backtick paths bypassed Agnes RBAC entirely — only the
service account scope limited reach. Post-fix the boundary matches
what `agnes catalog`-driven flows already enforce. Admin still
bypasses the per-id grant check but cannot bypass registration or
project match.

Pass 3 also seeds `dry_run_set` for resolved registered paths so the
cost-cap dry-run runs against the same physical table the user named
— composing cleanly with the Layer 2 fail-fast fallback.
2026-05-06 18:02:53 +02:00
ZdenekSrotyr
c32be3fe96 fix(query): cap-guard fallback retries original SQL, fails fast (#201)
When BQ rejects the rewritten dry-run SQL with `bq_bad_request`, the
cap-guard now retries with the user's ORIGINAL SQL instead of building
a synthetic `SELECT * FROM <table>` per registered table. The
synthetic path threw away user filters / projections / partition
predicates and routinely ballooned the estimate to "full table size",
falsely tripping `remote_scan_too_large` on legitimate narrow queries
(typical issue #201 trace: rewriter corrupts a backtick path → BQ
parse error → synthetic over-estimate → 400).

Behaviour:

- Rewritten SQL succeeds: same as before (issue #171 single-dry-run).
- Rewritten SQL parse-errors, original SQL succeeds: use original
  estimate. Common case for users submitting BQ-native input.
- Both fail with `bq_bad_request`: HTTP 400 `remote_estimate_failed`
  with a hint pointing at `agnes catalog` / BQ-native syntax. No
  silent over-estimate.
- Non-parse BQ error (forbidden, upstream): still 502 as before.

This is a behaviour change for clients matching error kinds — failure
to estimate scan size now surfaces as `remote_estimate_failed`
instead of being masked behind `remote_scan_too_large` from the
synthetic path.

Replaces the existing `test_guardrail_falls_back_to_per_table_estimate_on_bq_parse_error`
(which pinned the old contract) with `test_fallback_tries_original_sql_first`
and `test_fallback_fails_fast_on_pure_duckdb_syntax`.
2026-05-06 18:02:53 +02:00
ZdenekSrotyr
720a2180c0 fix(query): rewriter respects backtick segments (#201)
`agnes query --remote` corrupted user SQL when the request contained a
full BigQuery backtick path (`<project>.<dataset>.<table>`) whose
table segment matched a registered bare-name alias. The bare-name
rewriter used `\b` word-boundary matching against the lower-cased SQL;
both `.` and `` ` `` are non-word characters, so the regex fired
INSIDE the user's backtick path and produced malformed nested-backtick
SQL that BigQuery rejected at parse time.

Fix:

- Add `_mask_backticks(sql)` helper: replace each `…` segment with
  spaces of equal length, preserving offsets so word-boundary
  searches find positions only outside backticks.
- `_bq_guardrail_inputs` (bare-name pass + forbidden-table pass)
  searches against the masked SQL.
- `_rewrite_bq_table_refs_to_native` Pass 1 splits the SQL on
  `(\`[^\`]*\`)` and rewrites only the outside-backtick chunks. Pass
  2 (`bq."ds"."tbl"` → backtick form) is unchanged — its prefix can't
  appear inside backticks.

Adds three regressions covering the rewrite + guardrail paths.
2026-05-06 18:02:53 +02:00
ZdenekSrotyr
1b49de1568
Merge pull request #202 from keboola/zs/perf-followup-0.41.0
fix(0.41.0): orchestrator filesystem fallback for materialized parquets
2026-05-06 17:16:38 +02:00
ZdenekSrotyr
7781c3f331 fix(0.41.0): orphan parquet skip in filesystem fallback (CI regression)
Pre-existing test_orchestrator_skips_orphan_parquet_in_extracts caught
the regression: my filesystem fallback created master views for ANY
parquet on disk, including orphans where DELETE /api/admin/registry
removed the registry row but the parquet wasn't fully cleaned up.

Fix: load the set of registered materialized table_ids for THIS source
from table_registry before the scan, and skip any parquet whose stem
isn't in that set. If the registry read fails (test fixture, transient
DB error), skip the fallback entirely — orphan exposure is worse than
missing master view recovery.

Pre-existing test now passes. New regression test pins the orphan-skip
contract specifically for the filesystem-fallback path.
2026-05-06 17:06:20 +02:00
ZdenekSrotyr
dfb7f25e76 release: 0.41.0 — orchestrator filesystem fallback for missing _meta materialized rows
0.40.0 added _persist_materialized_inner_view in materialize_query, which
tried to open extract.duckdb from a fresh DuckDB handle to write the _meta
row + inner view. In production this conflicts with the same uvicorn
process's existing read-only ATTACH (orchestrator's analytics conn holds
extract.duckdb ATTACHed as <source_name> alias), and DuckDB single-process
file-handle uniqueness rejects with:

  Binder Error: Unique file handle conflict: Cannot attach "extract"
  — already attached by database "<source>"

The helper logs WARNING fail-soft, parquet stays canonical, but the
master view never appears via the meta path.

Fix: at the end of _attach_and_create_views, scan
<extract_dir>/data/*.parquet and CREATE OR REPLACE VIEW <id> AS
SELECT * FROM read_parquet('<path>') for any parquet whose <id> is not
already in the per-source tables list (= meta path didn't pick it up).

Decoupled from materialize_query open-handle race. Honors the same
view_ownership cross-connector collision rules as the meta path
(first-come-first-served via view_repo.claim).

Tests:
- filesystem-fallback fires when _meta row missing
- skipped when meta path already created the view (no shadow)
- skips invalid identifiers (e.g. parquet stem starting with a digit)
- doesn't crash when source has no data/ subdir
2026-05-06 16:58:18 +02:00
ZdenekSrotyr
0fd73faa8d
Merge pull request #200 from keboola/zs/perf-followup-0.40.0
fix(0.40.0): materialize_query writes _meta + inner view (master view recovery)
2026-05-06 16:18:41 +02:00
ZdenekSrotyr
b5b16e98a0 release: 0.40.0 — materialize_query writes _meta + inner view so master views appear
Pre-fix flow:
1. extractor subprocess writes _meta with N remote rows + creates N inner
   views in extract.duckdb (rebuild_from_registry skips materialized rows
   per design — explicit `continue` at line 389)
2. _run_materialized_pass calls materialize_query, which writes parquet
   atomically + returns stats — but never updates _meta
3. orchestrator.rebuild scans _meta, finds only the N remote rows, creates
   master views only for them. Materialized parquet is on disk but
   invisible to /api/query → 400 'not yet materialized'

Symptom appears after every container recreate (the previous run's _meta
state is wiped because docker compose down nukes the named volume that
backs extract.duckdb on some compose layouts; even on volumes that
persist, the next extractor pass calls _create_meta_table which DROPs
+ CREATEs _meta cleanly).

Fix: after os.replace(tmp_path, parquet_path) in materialize_query, open
extract.duckdb (read-write), DELETE existing _meta row for table_id,
INSERT new one with query_mode='materialized', and CREATE OR REPLACE
VIEW <table_id> AS SELECT * FROM read_parquet(<path>). All inside a
single transaction so concurrent reads see either old or new state, not
torn rows. Fail-soft on lock contention or schema drift — parquet
remains canonical, next sync pass recovers.

Tests: 3 new in test_bq_materialize.py covering:
- meta + inner view registered after materialize, alongside existing
  remote rows
- re-run replaces (not duplicates) the meta row
- skips inner-view registration when extract.duckdb doesn't exist yet
  (fresh BQ-only deployment edge case)
2026-05-06 16:04:58 +02:00
ZdenekSrotyr
6de7084c9f
Merge pull request #199 from keboola/zs/perf-bundle-0.39.0
perf(0.39.0): bundle — BQ query rewrite + session pool + chunked download + HTTP/2
2026-05-06 14:37:48 +02:00
ZdenekSrotyr
f03fa67b2e chore: trigger Devin re-review
All Devin findings from initial review on 8e56d45c addressed:
- Devin #1 (BQ billing project) → fixed in 81d065b1
- Devin #2 (try/except scope) → fixed in aee585fa (was already in flight at initial review time)

Plus three rounds of devil's advocate review (e5645fd2, aee585fa, 77d88014)
addressing 9 additional findings. 76/76 perf tests pass; CI green.
2026-05-06 14:32:36 +02:00
ZdenekSrotyr
81d065b1ea fix: Devin Review #1 — bigquery_query() first arg uses billing project, not data
In cross-project BQ setups (where billing != data), the SA typically has
serviceusage.services.use on the billing project but not on the data
project. The rewriter passed bq.projects.data as the first arg to
bigquery_query(), which BQ uses as the execution + billing project →
403 USER_PROJECT_DENIED.

Match the convention used everywhere else in the codebase
(app/api/v2_scan.py, app/api/v2_sample.py, app/api/v2_schema.py,
connectors/bigquery/extractor.py): backtick paths inside the inner SQL
use the **data** project (resolves the actual table location), the
bigquery_query() first arg uses the **billing** project (decides who
pays + which project the job runs under). For single-project deploys
the two are identical so the fix is a no-op there.

Test pins the cross-project case: data-prj for backticks, billing-prj
for the bigquery_query() first arg.
2026-05-06 14:07:38 +02:00
ZdenekSrotyr
77d88014df fix: devil's advocate R3 — reap PID-suffixed leftovers from dead processes
R3 final pass surfaced one issue, addressed:

R2#2 introduced PID-suffixed <target>.{pid}.tmp / .{pid}.partN to
prevent concurrent agnes pull invocations from yanking each other's
in-progress writes. The pre-clean inside _download_chunked /
_download_single_stream only deletes leftovers from the CURRENT
process's PID — files from a SIGKILL'd or crashed prior pull (any
other PID) are never touched and accumulate on disk forever.

Add _reap_dead_pid_leftovers(target_path) called at the start of both
download paths. Globs <target>.*.tmp / <target>.*.partN, extracts the
embedded PID, calls os.kill(pid, 0) to test liveness (POSIX standard
no-op probe), and unlinks files whose process no longer exists.
Permission-denied = process is alive but owned by another user → keep
the file (conservative). Windows users get the conservative 'keep'
default.

Two new tests pin the behavior — live-PID file preserved, dead-PID
.tmp + .partN reaped, bare-name (legacy) untouched, garbage filenames
skipped without raise.
2026-05-06 14:04:47 +02:00
ZdenekSrotyr
aee585fac6 fix: devil's advocate R2 — narrow shared-client try, PID tmp suffix, Syntax error anchor
R2 adversarial review surfaced 3 issues, all addressed:

#1 cli/client.py:572-577 outer try/except wrapped both _get_shared_client()
AND the actual download. A 401/403/404/5xx from the server triggered a
full second download attempt with a fresh client — wasted bandwidth on
hard failures, no fail-fast on revoked PAT. Narrowed the try to only
the shared-client construction; the download itself is no longer
retried under the fallback except.

#2 concurrent agnes pull invocations (e.g. SessionStart hook + manual
run) collided on bare <target>.tmp / <target>.partN paths — one process's
in-progress write got yanked by the other's cleanup, manifest hash
check then failed spuriously. Per-process suffix (<target>.{pid}.tmp,
<target>.{pid}.partN) makes intermediate files disjoint; the final
os.replace to the bare target is atomic so last-writer-wins.

#3 _looks_like_bq_rewrite_parse_error patterns 'Syntax error' could
false-positive on a query like WHERE log_msg = 'Syntax error in foo'
that fails for an unrelated reason (quota, network) and has the
literal substring echoed in the error text. Anchored to 'Syntax error: '
(with trailing colon) — BQ always emits the colon in this error
format, user SQL string literals normally don't.
2026-05-06 13:57:29 +02:00
ZdenekSrotyr
e5645fd280 fix: devil's advocate R1 — chunked probe, parse-error heuristic narrow, pool settings refresh, content-length sanity, multi-project skip
R1 adversarial review surfaced 5 issues, all addressed:

#1 chunked download silently disabled in non-Caddy deployments (HEAD on
GET-only FastAPI route returns 405). _probe_range_support now falls back
to GET with Range: bytes=0-0 when HEAD fails — works against both
Caddy file_server (HEAD-friendly) and dev FastAPI direct (GET-only).

#2 parse-error fallback heuristic too broad — matched on Unrecognized
name / Function not found / No matching signature / Invalid cast,
which BQ surfaces for ordinary user-column typos. That triggered slow
ATTACH-catalog retry on every typo (2× latency tax). Narrowed to just
'Syntax error' / 'syntax error' which are the genuine DuckDB-vs-BQ
dialect mismatch markers.

#3 apply_bq_session_settings was only run on fresh-built pool entries,
not on reuse. An operator's /admin/server-config change to bq_query
_timeout_ms wouldn't propagate to long-lived pooled sessions until
restart. Fixed: re-apply on every pool acquire (idempotent + fail-soft).

#4 content-length sanity bound — a misconfigured proxy returning a
wildly inflated Content-Length would cause overlapping chunked Range
requests against the actual file → corrupt assembled output (caught
by manifest hash check, but only after wasted bandwidth). Cap at 100
GiB; above that, drop to single-stream.

#5 rewriter assumed every BQ row resolves under the single
bq.projects.data project. Bucket containing '.' suggests a project-
qualified bucket (multi-project deployment); rewriter would silently
target the wrong project. Conservative skip with regression test.
2026-05-06 13:50:46 +02:00
ZdenekSrotyr
8e56d45c68 fix(query): code-review fixes — outer LIMIT wrap, dollar-quoting, parse-error fallback
Address code-reviewer findings on the bigquery_query() rewrite path:

1. Outer LIMIT wrap — bigquery_query() materialises BQ result into DuckDB
   before fetchmany sees it (vs ATTACH-catalog Storage Read API streaming).
   A user 'SELECT *' against a billion-row remote table would buffer the
   entire result before request.limit applied. Wrap rewritten SQL in an
   outer 'LIMIT N+1' so the cap pushes into the BQ job itself.

2. Dollar-quoted inner SQL — naive replace("'", "''") doubling missed
   DuckDB backslash-escape sequences (\\, \\n, \\t, …). A predicate
   like 'WHERE name = ''O\\'Brien''' was unsafe under the doubling
   path. DuckDB $bqq_inner$ … $bqq_inner$ form takes the inner SQL
   verbatim with no escapes whatsoever. Falls back to legacy doubling
   if user SQL improbably contains the literal tag.

3. Parse-error fallback — when the rewritten path fails with a BQ-side
   parse / validation error (DuckDB-only syntax like ::INT cast that
   survives identifier rewrite but BQ refuses), retry the user's
   original SQL via the legacy ATTACH-catalog path so the request still
   succeeds. Mirrors the existing dry-run fallback contract.

4. CHANGELOG — delete duplicate CLI bullets that landed under
   already-released [0.38.1] (file corruption from merge — entries are
   correctly under [0.39.0]).
2026-05-06 13:29:45 +02:00
ZdenekSrotyr
3b9f6b447d release: 0.39.0 — perf bundle (BQ query rewrite + session pool + chunked download + HTTP/2) 2026-05-06 13:18:19 +02:00
ZdenekSrotyr
830d1a38f6 merge: CLI perf (chunked DL + HTTP/2 + persistent client + progress)
# Conflicts:
#	CHANGELOG.md
2026-05-06 13:16:31 +02:00
ZdenekSrotyr
c96ea3ad49 merge: server-side perf (BQ rewrite + session pool + error mapping) 2026-05-06 13:13:48 +02:00
ZdenekSrotyr
e72ff259f9 feat(pull): aggregated progress + non-TTY textual fallback
Two improvements to `agnes pull` progress reporting:

1. **Aggregated per-file progress across chunked downloads**: the
   existing Rich progress bar already used one task per file, but the
   chunked-download contract (one file = N parallel chunk callbacks
   summing to file size) meant we needed to verify that all chunk
   threads advance the same task. They do — the per-file callback is
   constructed once per tid and routes every chunk's byte delta to the
   same task / textual entry, so the bar shows one aggregated bytes-
   downloaded total rather than N separate sub-bars.

2. **Textual fallback for non-TTY stderr**: when stderr is not a
   terminal (SessionStart hook, CI runner, Docker log capture), Rich
   either suppresses output (silent multi-minute pull on a 5 GB
   parquet) or emits raw control sequences. The new `_TextualProgress`
   helper instead emits one plain-text line per file at most every
   10%-of-total-bytes or 30 s, plus a final `100% done` line per file.
   Format: `[N/T files] <tid>: 25% (16 MB / 66 MB) at 1.5 MB/s`.

The TTY path is unchanged. Detection uses `sys.stderr.isatty()` —
`show_progress=True` flips into the textual fallback when that returns
False. `show_progress=False` (the SessionStart hook) still emits no
progress text in either mode.
2026-05-06 13:09:37 +02:00
ZdenekSrotyr
14db85f506 fix(bq): map 'Response too large' to its own error class instead of generic bad_request
translate_bq_error previously mapped BQ's responseTooLarge failure mode
to bq_bad_request (HTTP 400 with the raw upstream message). The user-
facing implication ('your SQL has a syntax error') is wrong -- the root
cause is query shape (BQ refused to return the result inline because
it exceeded the response size limit), and the actionable remediation is
'narrow the WHERE clause, aggregate further, or use a materialized
table'.

Add bq_response_too_large as a first-class BqAccessError kind (also 400)
with a canonical hint message; original BQ message preserved in details
for operator debugging. Detection is substring-based on 'response too
large' and fires before the generic BadRequest path so the dedicated
mapping always wins. Affects every BQ-touching path since they all
share translate_bq_error -- /api/query, /api/v2/{scan,sample,schema},
materialize.
2026-05-06 13:09:31 +02:00
ZdenekSrotyr
bd1b5ad444 perf(cli): persistent HTTP/2 client across pull invocation
Pool the httpx.Client used by `stream_download` so N parquet downloads
share a single TLS handshake instead of one handshake each. With the
optional `h2` package installed, HTTP/2 multiplexing further lets all
chunk Range requests share a single TCP connection — synergizes with
the range-chunked download path added in the previous commit.

The shared client is created lazily on first stream-download call, kept
alive for the duration of the process via a module-level slot, and
closed at exit via `atexit.register`. Construction wraps in a
try/except: when `h2` is unavailable (slim install), httpx raises
ImportError on `http2=True` and we transparently fall back to an
HTTP/1.1 client — pooling alone still amortizes TLS handshakes.

`agnes pull` must never crash on a missing optional package, so the
fallback path is non-negotiable. `h2>=4.1.0` is added to the core
dependency set; downstream slim installs that drop it lose the HTTP/2
benefit but keep correctness.
2026-05-06 13:06:36 +02:00
ZdenekSrotyr
83209f32b0 perf(bq): pool DuckDB BQ extension sessions to amortize INSTALL/LOAD/ATTACH cost
Each BqAccess.duckdb_session() acquire previously created a fresh
in-memory DuckDB conn and ran INSTALL bigquery; LOAD bigquery;
CREATE SECRET; ATTACH on it -- costing ~0.5 s per request even before
any BQ work. Add a process-local pool (deque + lock) of pre-warmed
sessions; acquire reuses a warm entry when available, refreshing the
auth SECRET so a long-lived pool entry doesn't keep a stale GCE
metadata token past its TTL. Liveness probe (cheap SELECT 1) drops
broken entries before handing them to callers.

On exception inside the with-block the conn is closed instead of
returned to pool (session may carry dirty state). Pool size is
data_source.bigquery.session_pool_size (default 4; sentinel 0
disables pooling). Process-cached, not fork-safe (single uvicorn
worker is the supported deployment shape per CLAUDE.md).

All call sites get faster automatically: /api/query, /api/v2/{scan,
sample,schema}, materialize, the orchestrator's remote-attach, and
the BQ dry-run cap-guard.
2026-05-06 13:06:25 +02:00
ZdenekSrotyr
dee33fe25b feat(pull): range-chunked parallel download for single large files
When the server advertises `accept-ranges: bytes` and a parquet exceeds
`AGNES_PULL_CHUNK_THRESHOLD_BYTES` (default 50 MB), `stream_download`
now splits the file into N parallel HTTP Range requests
(`AGNES_PULL_CHUNK_PARALLELISM`, default 4, capped 1..16) and
assembles the parts into the destination atomically.

Targets the per-flow-shaped network (corp VPN with per-TCP-connection
rate-limiting) where single-stream throughput is throttled but N parallel
streams over the same connection scale roughly linearly. Manifests with
1 large materialized parquet + N remote tables previously left the
existing across-files `AGNES_PULL_PARALLELISM=4` pool with 1 active
worker = single-stream throughput; this fixes that.

Falls back to single-stream when:
- HEAD doesn't advertise `accept-ranges: bytes`
- Server returns 200 instead of 206 to a Range probe
- File size below the threshold

Cleanup discipline: every part file removed before return (success or
failure); destination written via `<target>.tmp` and renamed atomically.
Per-chunk retry on transient network blips (bounded by AGNES_STREAM_RETRIES).
2026-05-06 13:04:53 +02:00
ZdenekSrotyr
b2c1ff143c fix(query): rewrite BQ-backed user SQL via bigquery_query() to enable predicate pushdown
User SQL hitting query_mode='remote' BigQuery rows was 50-100x slower
than the equivalent direct bigquery_query() call because DuckDB's master
view (CREATE VIEW … AS SELECT * FROM bigquery.<ds>.<tbl>) does not push
WHERE/SELECT/LIMIT into BQ in ATTACH-catalog mode. The BQ extension opens
a Storage Read API session over the entire upstream table; on >100M-row
sources this was 70-150s and frequently failed with 'Response too large
to return'.

Extract the existing dry-run rewriter's core (table-name → BQ-native
backtick path) into a shared helper. Add an execution-path rewriter
that wraps the whole user SQL in bigquery_query('<project>', '<inner>')
so the BQ planner sees the full query and engages partition pruning +
projection pushdown server-side.

Conservative fall-through: cross-source JOINs (BQ ↔ Keboola/Jira local),
queries already containing bigquery_query(, and unconfigured BQ project
all skip the rewrite and run the original SQL via ATTACH-catalog so
behavior degrades gracefully.
2026-05-06 13:02:34 +02:00
ZdenekSrotyr
9649f42b99
Merge pull request #198 from keboola/zs/admin-tables-description-clamp
fix(admin/tables): keep row Actions reachable + sanitize description escapes
2026-05-06 11:50:13 +02:00
ZdenekSrotyr
226eb71592 Merge remote-tracking branch 'origin/main' into pr198-review
# Conflicts:
#	CHANGELOG.md
2026-05-06 11:35:45 +02:00
ZdenekSrotyr
6bc8739010 feat(admin/tables): show source, schedule, folder, registered, and sync-error in row 2026-05-06 11:09:02 +02:00
ZdenekSrotyr
b230d44687 docs(admin/tables): clarify NUL sentinel in unescapeShellQuoting 2026-05-06 10:15:56 +02:00
ZdenekSrotyr
c1c3ba5fef fix(admin/tables): script to clean already-corrupted descriptions in registry 2026-05-06 10:14:23 +02:00
ZdenekSrotyr
05e535d743 fix(admin/tables): unescape shell-quoting backslashes in descriptions 2026-05-06 10:13:49 +02:00
ZdenekSrotyr
e369d0ed7b fix(admin/tables): clamp long description to 2 lines so Actions stay reachable 2026-05-06 10:06:57 +02:00
ZdenekSrotyr
226ec9e189
Merge pull request #197 from keboola/fix/bigquery-extension-timeout
fix(bigquery): apply bq_query_timeout_ms on every BQ attach + surface silent failures
2026-05-06 10:00:52 +02:00
ZdenekSrotyr
d68c3c5fa2 release: 0.38.2 — bq_query_timeout_ms applied on every BQ attach + surfaced silent failures 2026-05-06 09:48:12 +02:00
ZdenekSrotyr
cd90d9dfa3 Merge remote-tracking branch 'origin/main' into pr197-review 2026-05-06 09:47:39 +02:00
ZdenekSrotyr
f33e78a85a
Merge pull request #196 from keboola/docs/marketplace-setup-fallback
docs(marketplace): document two-step fallback for marketplace registration
2026-05-06 09:42:34 +02:00
ZdenekSrotyr
a7d19206d7 release: 0.38.1 — docs(marketplace) two-step fallback 2026-05-06 09:27:42 +02:00
Vojtech Rysanek
32c8ea601a fix(bigquery): apply bq_query_timeout_ms on every BQ-extension attach + surface silent failures
The DuckDB BigQuery extension defaults bq_query_timeout_ms to 90 s,
which is too tight for analyst-scale queries against view-backed BQ
datasets. Agnes already has apply_bq_session_settings() that bumps it
to 600 s (configurable via data_source.bigquery.query_timeout_ms), but
two regressions let the 90 s default leak through to live queries:

1. apply_bq_session_settings() swallowed every Exception silently. If
   the BigQuery extension wasn't loaded on the connection yet, or the
   installed extension version didn't recognise the setting, the SET
   would fail and the function would return without surfacing the
   problem. Operators saw 90 s timeouts on 'agnes query --remote' with
   no log line explaining why.

2. The call sites in src/db.py:_reattach_remote_extensions and
   src/orchestrator.py:_remote_attach only invoked
   apply_bq_session_settings on the metadata-token branch (token_env
   empty, the BqAccess contract). The token-based and no-auth branches
   ran ATTACH against the BigQuery extension without ever applying the
   timeout setting — so any BQ source registered with an explicit
   token_env, or with no auth env at all, fell back to the 90 s default.

Fix:

- apply_bq_session_settings now logs WARNING on each failure path
  (instance_config import error, non-numeric value, SET execution
  failure, readback error). It also verifies the setting actually
  landed via SELECT current_setting('bq_query_timeout_ms') and logs
  WARNING when the readback disagrees with the requested value, which
  catches the silent-ignore case some extension versions exhibit.

- Both _reattach_remote_extensions (src/db.py) and _remote_attach
  (src/orchestrator.py) now call apply_bq_session_settings on every
  branch that ATTACHes a BigQuery alias, not only the metadata-token
  branch. Idempotent: calling it twice on the metadata-token path is a
  no-op SET.

Tests:

- Extended the _RecordingConn fixture to support .fetchone() so the
  readback assertion path works. Updated existing call-shape
  assertions to expect the SELECT current_setting readback alongside
  the SET. Added two new tests covering the WARNING surfaces for SET
  failure and readback mismatch — regression guards for the silent-
  fallback bug this PR addresses.

- Full BQ-touching suite (398 tests) passes.
2026-05-06 11:24:14 +04:00
Vojtech Rysanek
abc2335ea2 docs(marketplace): document two-step fallback for marketplace registration
The 'Git channel' block previously showed only the direct '/plugin
marketplace add https://x:$AGNES_PAT@…' path. That path fails on
macOS/Windows against a private-CA Agnes instance because Bun-compiled
'claude' ignores the OS trust store and CA env vars on the marketplace
HTTPS path (see the existing rationale in app/web/setup_instructions.py).

Document the two-step fallback explicitly:

  git clone https://x:$AGNES_PAT@agnes.example.com/marketplace.git/ \
    ~/agnes-marketplace
  claude plugin marketplace add ~/agnes-marketplace

System 'git' honors GIT_SSL_CAINFO + the OS trust store, so the clone
succeeds where direct add fails; pointing Claude Code at the local clone
then sidesteps the Bun TLS path entirely. The dashboard-served setup
payload already branches between the two automatically based on
platform; the docs now match that behavior for manual flows.

Also note the optional 'remote set-url' hardening to strip the PAT from
the cloned repo's origin (mirrors what the dashboard payload does).
2026-05-06 11:00:59 +04:00
ZdenekSrotyr
f598b7e2f6
Merge pull request #180 from keboola/ma/my-ai-stack
feat(store): /store + /my-ai-stack — per-user marketplace composition
2026-05-06 07:41:05 +02:00
ZdenekSrotyr
6c94d2cbce Merge remote-tracking branch 'origin/main' into pr180-review
# Conflicts:
#	CHANGELOG.md
#	pyproject.toml
2026-05-06 07:27:25 +02:00