Commit graph

871 commits

Author SHA1 Message Date
ZdenekSrotyr
cbf335cb5e
Merge pull request #210 from keboola/ma/marketplace-clone-and-auto-refresh
feat(marketplace): clone-based plugin setup + SessionStart auto-refresh
2026-05-07 07:11:59 +02:00
ZdenekSrotyr
d3e8d29cfb test(hooks): pin v0.43.0 chained-entry → v0.44.0 two-entry upgrade path 2026-05-07 07:00:00 +02:00
ZdenekSrotyr
bb36a69b1e release: 0.44.0 2026-05-07 07:00:00 +02:00
Minas Arustamyan
cd10aefdbd fix(refresh-marketplace): align manual-mode hint with hook JSON
Hook JSON path uses /reload-plugins (no restart needed); manual-mode
echo path was still telling the operator to /exit + restart. Both now
say /reload-plugins.

Tests renamed to *_reload_hint_* to match the new wording.
2026-05-07 06:59:13 +02:00
Minas Arustamyan
3aeb0f2fbd fix(refresh-marketplace): use /reload-plugins instead of /exit + restart
Claude Code's `/reload-plugins` slash command picks up newly installed
plugins into the running session without forcing the user to /exit and
restart Claude Code. The hook JSON `systemMessage` and `additionalContext`
both now point at it.

Tests updated to pin the new hint shape.
2026-05-07 06:59:13 +02:00
Minas Arustamyan
166c1c0752 fix(refresh-marketplace): pass --scope project to claude plugin update
Without `--scope project`, `claude plugin update <name>@agnes` operated
at user scope (the default) instead of updating the project-scoped
install — so version bumps in the served manifest never propagated to
the workspace, even though `claude plugin install` correctly used
`--scope project` for the missing-plugin path.

Mirrors the install line in the same function. Any change refresh-
marketplace makes to a plugin must now stay in project scope —
consistent with the SessionStart hook firing per-workspace.
2026-05-07 06:59:13 +02:00
Minas Arustamyan
50e0463501 feat(marketplace): clone-based plugin setup + auto-refresh SessionStart hook
Adds end-to-end flow for installing and keeping the per-user filtered
Claude Code marketplace in sync with the user's Agnes stack
(admin RBAC grants \ MyAIStack opt-outs U /store installs).

Setup (one-liner in install prompt step 5):
  `agnes refresh-marketplace --bootstrap` clones the per-user marketplace
  bare repo to ~/.agnes/marketplace, strips PAT from the cloned origin
  URL, registers the local path with Claude Code, and installs every
  plugin in the served manifest at --scope project. Replaces a 15-line
  inline shell sequence that tripped Claude Code's agent-driven `rm -rf`
  permission gate.

Auto-refresh (SessionStart hook installed by `agnes init`):
  `agnes refresh-marketplace --quiet` runs every Claude Code session,
  fetches+resets the clone (server rebuilds as orphan commits, so
  pull --ff-only is impossible), and version-aware reconciles:
    - missing in workspace -> claude plugin install <name>@agnes --scope project
    - version differs       -> claude plugin update <name>@agnes
    - matches               -> skip
  Don't auto-uninstall plugins that disappeared from the manifest --
  a transient empty manifest from the server would wipe the stack.

Hook output: when --quiet AND something actually changed, emits Claude
Code hook JSON on stdout -- `systemMessage` (transient toast) and
`hookSpecificOutput.additionalContext` (model-side system reminder),
both carrying the change summary plus a "/exit + restart Claude Code"
instruction (Claude only scans plugins at session start).

Windows hook compatibility: the refresh-marketplace hook command is
wrapped in `bash -c "..."` because Claude Code on Windows runs hook
commands directly without invoking a shell, so `2>/dev/null || true`
would otherwise be passed as literal argv tokens.

Cross-cutting:
  - cli/lib/marketplace.py: shared CLONE_DIR + MARKETPLACE_NAME constants.
  - cli/lib/hooks.py: SessionStart now has two independent entries
    (pull + refresh-marketplace) so a failure in one doesn't suppress
    the other; legacy `da sync` and prior single-pull layouts upgrade
    cleanly on re-init.
  - PAT injection on every git fetch via per-invocation credential
    helper (token in \$AGNES_TOKEN env, never in argv or .git/config).
  - Pre-snapshot of installed plugins captured BEFORE
    `claude plugin marketplace update` so silent auto-applied version
    bumps still fire notifications.
  - scripts/dev/agnes-client-reset.sh: cleans ~/.claude/plugins/marketplaces/agnes,
    ~/.claude/plugins/cache/agnes, drops uv build cache, documents
    workspace-scoped residue that can't be enumerated from the script.
  - app/web/setup_instructions.py: legacy AGNES_DEBUG_AUTH path also
    uses clone (direct HTTPS marketplace add is broken end-to-end on
    every Claude Code distribution -- stores response as single file,
    plugin source paths then 404).

28 new tests (test_cli_refresh_marketplace.py) + extended hook + setup
template tests cover bootstrap, fetch+reset ordering, version-aware
reconcile, project-path filtering, hook JSON shape, and the bash-c
Windows wrapper invariant.
2026-05-07 06:59:13 +02:00
ZdenekSrotyr
f52cfd1119
infra(customer-instance): allow stopping VMs for in-place updates (#211)
Add allow_stopping_for_update=true on google_compute_instance.vm. Without
it, a TF change to machine_type triggers ForceNew (destroy + recreate);
with it, the provider stops + mutates + restarts the VM in place, which
is what an operator resizing a running deployment expects. Tag as
infra-v1.7.0; consumers opt in by bumping the module ref.
2026-05-07 06:58:10 +02:00
ZdenekSrotyr
d3113e7a31
Merge pull request #209 from keboola/zs/cli-auto-upgrade-spec
feat: server-pinned CLI auto-upgrade (0.43.0)
2026-05-06 23:46:47 +02:00
ZdenekSrotyr
e1ac7d41f1 release: 0.43.0 — server-pinned CLI auto-upgrade
See CHANGELOG.md for the full entry. (Bumped from 0.42.0 to 0.43.0 since
0.42.0 was taken by PR #208's backtick-rewriter fix during this branch's
review cycle.)
2026-05-06 23:24:44 +02:00
ZdenekSrotyr
df896816d8 chore: rename stale 'da' references to 'agnes' + CHANGELOG
Drive-by docstring/comment cleanup in cli_artifacts.py and update_check.py.
CHANGELOG entry for the auto-upgrade feature shipped in this branch.
2026-05-06 23:23:59 +02:00
ZdenekSrotyr
73d2896fa6 docs(hooks): update install_claude_hooks docstring for chained SessionStart 2026-05-06 23:23:23 +02:00
ZdenekSrotyr
be62ce61b8 feat(cli): install SessionStart hook chaining self-upgrade then pull
Single hook entry: 'agnes self-upgrade --quiet ... || true; agnes pull
--quiet ... || true'. Shell semicolon guarantees ordering across every
Claude Code version (no reliance on undocumented multi-hook execution
semantics); each segment's || true preserves the original property
that an upgrade failure does not abort the pull.
2026-05-06 23:23:23 +02:00
ZdenekSrotyr
630e224578 feat(cli): add agnes self-upgrade with smoke test + rollback
Reuses cli.update_check.check() for the version probe — extended with
bypass_disabled=True so explicit user-typed self-upgrade is not silenced
by AGNES_NO_UPDATE_CHECK (which is for the implicit warning loop).

Install path: uv tool install --force when uv is on PATH; otherwise
curl + pip via sys.executable (NOT system python3, NOT --user — both
would land outside the agnes venv and silently no-op the upgrade).

Smoke test execs the binary at the install-resolved path (uv tool dir
joined with agnes-the-ai-analyst/bin/agnes, or sys.executable's sibling
agnes for pip) — never via shutil.which, which can resolve a stale shadow
on PATH and produce a false-positive smoke pass on the OLD version. Smoke
also asserts --version output contains info.latest via PEP 440 Version()
equality (so 0.40.0 does not falsely match 0.40.10).

On smoke fail: rollback to last_known_good.json (written only after a
previous run's smoke passed). Rollback rc is captured and surfaced on
stderr if it also fails. First-ever upgrade or unrecoverable rollback
prints the canonical bootstrap recovery: curl -fsSL <server>/cli/install.sh | bash.

AGNES_SELF_UPGRADE_IN_PROGRESS=1 is set for the duration of the run
and propagated to the smoke-test subprocess. Layer B's _check_version_headers
honors the sentinel and skips the < min hard-stop, so an in-flight
upgrade can never sys.exit(2) itself.

--force invalidates the update_check cache BEFORE probing. --force +
offline = exit 1 with explicit stderr (without --force, offline is silent).
--quiet suppresses progress output but never gags failure stderr.
2026-05-06 23:23:23 +02:00
ZdenekSrotyr
d93eda7de3 perf+test(cli): cache User-Agent at module scope; pin local==min boundary 2026-05-06 23:23:23 +02:00
ZdenekSrotyr
2680a6724b feat(cli): hard-stop on incompatible-version response header
Every API response is inspected via httpx event_hooks. When the server
reports X-Agnes-Min-Version > local, CLI prints a remediation message
and exits 2. Latest-version drift continues to be handled by the
update_check warning loop — no double-warning on every API call.
2026-05-06 23:23:23 +02:00
ZdenekSrotyr
af2b866961 docs(version): clarify APP_VERSION scope + middleware /api prefix rationale 2026-05-06 23:23:23 +02:00
ZdenekSrotyr
57170bc556 feat(server): expose APP_VERSION + MIN_COMPAT_CLI_VERSION on /api/* response headers
Adds X-Agnes-Latest-Version and X-Agnes-Min-Version headers to every
/api/* response. CLI consumes these to hard-stop on incompatible drift.
MIN_COMPAT_CLI_VERSION ships at 0.0.0 — no enforcement until a deliberate
wire-protocol break bumps it.

Also dedupes app version logic: app/main.py:_app_version() helper deleted,
replaced by app/version.py:APP_VERSION as the single source of truth.
test_app_version.py rewritten to target app.version.
2026-05-06 23:23:23 +02:00
ZdenekSrotyr
56483989cf docs(plan): server-pinned CLI auto-upgrade — spec + implementation plan
Four review iterations resolved:
- PATH-shadow-safe smoke test (uv tool dir --bin + ~/.local/bin fallback)
- Recursion sentinel for in-flight self-upgrade
- sys.executable + --no-deps pip fallback (NOT system python3, NOT --user)
- Smoke + rollback with rc capture and bootstrap recovery
- Single chained SessionStart entry (shell ; for ordering, no Claude Code semantics dependency)
- AGNES_NO_UPDATE_CHECK bypass for explicit self-upgrade
- _get_shared_client() left unhooked (mid-stream sys.exit unsafe; Caddy proxies parquets anyway)

Targets release 0.40.0.
2026-05-06 23:23:23 +02:00
ZdenekSrotyr
e3494607bf
Merge pull request #208 from keboola/zs/issue-201-rewriter-backtick
fix(query): rewriter respects backtick paths; tighten cap-guard fallback (#201)
2026-05-06 23:09:43 +02:00
ZdenekSrotyr
bc55af6e88 chore: trigger Devin re-review 2026-05-06 22:07:49 +02:00
ZdenekSrotyr
f4bc04958d fix: Devin Review #1 — apply backtick mask to wrapping rewriter
`_rewrite_user_sql_for_bigquery_query` does its own bare-name detection
(mirroring the non-RBAC parts of `_bq_guardrail_inputs`). The backtick
masking from #201 was applied to `_bq_guardrail_inputs` and the
forbidden-table loop, but missed this third site — so a registered
local-mode table name appearing as the table segment of a
user-supplied full backtick path (e.g. ``\`prj.ds.orders\`` matching
registered local ``orders``) tripped the cross-source guard and
forced every backtick-path query into the 50-100× slower
ATTACH-catalog fallback.

Mask once at the top of the function, route both the BQ-name
detection (line ~830) and the cross-source check (line ~867) through
the masked copy. New regression test
`test_local_name_inside_backtick_path_does_not_trip_cross_source`
proves the wrapper now wraps when it should.
2026-05-06 21:06:21 +02:00
ZdenekSrotyr
09958c9d87 release: 0.42.0 2026-05-06 18:04:39 +02:00
ZdenekSrotyr
824e3cb636 feat(query): registry-gate full backtick BigQuery paths (#201)
Adds Pass 3 to `_bq_guardrail_inputs` that scans user SQL for full
backtick paths `<project>.<dataset>.<table>` and gates them
identically to the `bq."<dataset>"."<table>"` pass:

- Project must match the configured BigQuery data project
  (`get_bq_access().projects.data`). Mismatch → HTTP 403
  `bq_path_cross_project`.
- Path must point at a registered row. Unregistered → HTTP 403
  `bq_path_not_registered`.
- Non-admin caller must hold a grant on the registered row's id.
  Missing grant → HTTP 403 `bq_path_access_denied`.

Pre-fix, full backtick paths bypassed Agnes RBAC entirely — only the
service account scope limited reach. Post-fix the boundary matches
what `agnes catalog`-driven flows already enforce. Admin still
bypasses the per-id grant check but cannot bypass registration or
project match.

Pass 3 also seeds `dry_run_set` for resolved registered paths so the
cost-cap dry-run runs against the same physical table the user named
— composing cleanly with the Layer 2 fail-fast fallback.
2026-05-06 18:02:53 +02:00
ZdenekSrotyr
c32be3fe96 fix(query): cap-guard fallback retries original SQL, fails fast (#201)
When BQ rejects the rewritten dry-run SQL with `bq_bad_request`, the
cap-guard now retries with the user's ORIGINAL SQL instead of building
a synthetic `SELECT * FROM <table>` per registered table. The
synthetic path threw away user filters / projections / partition
predicates and routinely ballooned the estimate to "full table size",
falsely tripping `remote_scan_too_large` on legitimate narrow queries
(typical issue #201 trace: rewriter corrupts a backtick path → BQ
parse error → synthetic over-estimate → 400).

Behaviour:

- Rewritten SQL succeeds: same as before (issue #171 single-dry-run).
- Rewritten SQL parse-errors, original SQL succeeds: use original
  estimate. Common case for users submitting BQ-native input.
- Both fail with `bq_bad_request`: HTTP 400 `remote_estimate_failed`
  with a hint pointing at `agnes catalog` / BQ-native syntax. No
  silent over-estimate.
- Non-parse BQ error (forbidden, upstream): still 502 as before.

This is a behaviour change for clients matching error kinds — failure
to estimate scan size now surfaces as `remote_estimate_failed`
instead of being masked behind `remote_scan_too_large` from the
synthetic path.

Replaces the existing `test_guardrail_falls_back_to_per_table_estimate_on_bq_parse_error`
(which pinned the old contract) with `test_fallback_tries_original_sql_first`
and `test_fallback_fails_fast_on_pure_duckdb_syntax`.
2026-05-06 18:02:53 +02:00
ZdenekSrotyr
720a2180c0 fix(query): rewriter respects backtick segments (#201)
`agnes query --remote` corrupted user SQL when the request contained a
full BigQuery backtick path (`<project>.<dataset>.<table>`) whose
table segment matched a registered bare-name alias. The bare-name
rewriter used `\b` word-boundary matching against the lower-cased SQL;
both `.` and `` ` `` are non-word characters, so the regex fired
INSIDE the user's backtick path and produced malformed nested-backtick
SQL that BigQuery rejected at parse time.

Fix:

- Add `_mask_backticks(sql)` helper: replace each `…` segment with
  spaces of equal length, preserving offsets so word-boundary
  searches find positions only outside backticks.
- `_bq_guardrail_inputs` (bare-name pass + forbidden-table pass)
  searches against the masked SQL.
- `_rewrite_bq_table_refs_to_native` Pass 1 splits the SQL on
  `(\`[^\`]*\`)` and rewrites only the outside-backtick chunks. Pass
  2 (`bq."ds"."tbl"` → backtick form) is unchanged — its prefix can't
  appear inside backticks.

Adds three regressions covering the rewrite + guardrail paths.
2026-05-06 18:02:53 +02:00
ZdenekSrotyr
1b49de1568
Merge pull request #202 from keboola/zs/perf-followup-0.41.0
fix(0.41.0): orchestrator filesystem fallback for materialized parquets
2026-05-06 17:16:38 +02:00
ZdenekSrotyr
7781c3f331 fix(0.41.0): orphan parquet skip in filesystem fallback (CI regression)
Pre-existing test_orchestrator_skips_orphan_parquet_in_extracts caught
the regression: my filesystem fallback created master views for ANY
parquet on disk, including orphans where DELETE /api/admin/registry
removed the registry row but the parquet wasn't fully cleaned up.

Fix: load the set of registered materialized table_ids for THIS source
from table_registry before the scan, and skip any parquet whose stem
isn't in that set. If the registry read fails (test fixture, transient
DB error), skip the fallback entirely — orphan exposure is worse than
missing master view recovery.

Pre-existing test now passes. New regression test pins the orphan-skip
contract specifically for the filesystem-fallback path.
2026-05-06 17:06:20 +02:00
ZdenekSrotyr
dfb7f25e76 release: 0.41.0 — orchestrator filesystem fallback for missing _meta materialized rows
0.40.0 added _persist_materialized_inner_view in materialize_query, which
tried to open extract.duckdb from a fresh DuckDB handle to write the _meta
row + inner view. In production this conflicts with the same uvicorn
process's existing read-only ATTACH (orchestrator's analytics conn holds
extract.duckdb ATTACHed as <source_name> alias), and DuckDB single-process
file-handle uniqueness rejects with:

  Binder Error: Unique file handle conflict: Cannot attach "extract"
  — already attached by database "<source>"

The helper logs WARNING fail-soft, parquet stays canonical, but the
master view never appears via the meta path.

Fix: at the end of _attach_and_create_views, scan
<extract_dir>/data/*.parquet and CREATE OR REPLACE VIEW <id> AS
SELECT * FROM read_parquet('<path>') for any parquet whose <id> is not
already in the per-source tables list (= meta path didn't pick it up).

Decoupled from materialize_query open-handle race. Honors the same
view_ownership cross-connector collision rules as the meta path
(first-come-first-served via view_repo.claim).

Tests:
- filesystem-fallback fires when _meta row missing
- skipped when meta path already created the view (no shadow)
- skips invalid identifiers (e.g. parquet stem starting with a digit)
- doesn't crash when source has no data/ subdir
2026-05-06 16:58:18 +02:00
ZdenekSrotyr
0fd73faa8d
Merge pull request #200 from keboola/zs/perf-followup-0.40.0
fix(0.40.0): materialize_query writes _meta + inner view (master view recovery)
2026-05-06 16:18:41 +02:00
ZdenekSrotyr
b5b16e98a0 release: 0.40.0 — materialize_query writes _meta + inner view so master views appear
Pre-fix flow:
1. extractor subprocess writes _meta with N remote rows + creates N inner
   views in extract.duckdb (rebuild_from_registry skips materialized rows
   per design — explicit `continue` at line 389)
2. _run_materialized_pass calls materialize_query, which writes parquet
   atomically + returns stats — but never updates _meta
3. orchestrator.rebuild scans _meta, finds only the N remote rows, creates
   master views only for them. Materialized parquet is on disk but
   invisible to /api/query → 400 'not yet materialized'

Symptom appears after every container recreate (the previous run's _meta
state is wiped because docker compose down nukes the named volume that
backs extract.duckdb on some compose layouts; even on volumes that
persist, the next extractor pass calls _create_meta_table which DROPs
+ CREATEs _meta cleanly).

Fix: after os.replace(tmp_path, parquet_path) in materialize_query, open
extract.duckdb (read-write), DELETE existing _meta row for table_id,
INSERT new one with query_mode='materialized', and CREATE OR REPLACE
VIEW <table_id> AS SELECT * FROM read_parquet(<path>). All inside a
single transaction so concurrent reads see either old or new state, not
torn rows. Fail-soft on lock contention or schema drift — parquet
remains canonical, next sync pass recovers.

Tests: 3 new in test_bq_materialize.py covering:
- meta + inner view registered after materialize, alongside existing
  remote rows
- re-run replaces (not duplicates) the meta row
- skips inner-view registration when extract.duckdb doesn't exist yet
  (fresh BQ-only deployment edge case)
2026-05-06 16:04:58 +02:00
ZdenekSrotyr
6de7084c9f
Merge pull request #199 from keboola/zs/perf-bundle-0.39.0
perf(0.39.0): bundle — BQ query rewrite + session pool + chunked download + HTTP/2
2026-05-06 14:37:48 +02:00
ZdenekSrotyr
f03fa67b2e chore: trigger Devin re-review
All Devin findings from initial review on 8e56d45c addressed:
- Devin #1 (BQ billing project) → fixed in 81d065b1
- Devin #2 (try/except scope) → fixed in aee585fa (was already in flight at initial review time)

Plus three rounds of devil's advocate review (e5645fd2, aee585fa, 77d88014)
addressing 9 additional findings. 76/76 perf tests pass; CI green.
2026-05-06 14:32:36 +02:00
ZdenekSrotyr
81d065b1ea fix: Devin Review #1 — bigquery_query() first arg uses billing project, not data
In cross-project BQ setups (where billing != data), the SA typically has
serviceusage.services.use on the billing project but not on the data
project. The rewriter passed bq.projects.data as the first arg to
bigquery_query(), which BQ uses as the execution + billing project →
403 USER_PROJECT_DENIED.

Match the convention used everywhere else in the codebase
(app/api/v2_scan.py, app/api/v2_sample.py, app/api/v2_schema.py,
connectors/bigquery/extractor.py): backtick paths inside the inner SQL
use the **data** project (resolves the actual table location), the
bigquery_query() first arg uses the **billing** project (decides who
pays + which project the job runs under). For single-project deploys
the two are identical so the fix is a no-op there.

Test pins the cross-project case: data-prj for backticks, billing-prj
for the bigquery_query() first arg.
2026-05-06 14:07:38 +02:00
ZdenekSrotyr
77d88014df fix: devil's advocate R3 — reap PID-suffixed leftovers from dead processes
R3 final pass surfaced one issue, addressed:

R2#2 introduced PID-suffixed <target>.{pid}.tmp / .{pid}.partN to
prevent concurrent agnes pull invocations from yanking each other's
in-progress writes. The pre-clean inside _download_chunked /
_download_single_stream only deletes leftovers from the CURRENT
process's PID — files from a SIGKILL'd or crashed prior pull (any
other PID) are never touched and accumulate on disk forever.

Add _reap_dead_pid_leftovers(target_path) called at the start of both
download paths. Globs <target>.*.tmp / <target>.*.partN, extracts the
embedded PID, calls os.kill(pid, 0) to test liveness (POSIX standard
no-op probe), and unlinks files whose process no longer exists.
Permission-denied = process is alive but owned by another user → keep
the file (conservative). Windows users get the conservative 'keep'
default.

Two new tests pin the behavior — live-PID file preserved, dead-PID
.tmp + .partN reaped, bare-name (legacy) untouched, garbage filenames
skipped without raise.
2026-05-06 14:04:47 +02:00
ZdenekSrotyr
aee585fac6 fix: devil's advocate R2 — narrow shared-client try, PID tmp suffix, Syntax error anchor
R2 adversarial review surfaced 3 issues, all addressed:

#1 cli/client.py:572-577 outer try/except wrapped both _get_shared_client()
AND the actual download. A 401/403/404/5xx from the server triggered a
full second download attempt with a fresh client — wasted bandwidth on
hard failures, no fail-fast on revoked PAT. Narrowed the try to only
the shared-client construction; the download itself is no longer
retried under the fallback except.

#2 concurrent agnes pull invocations (e.g. SessionStart hook + manual
run) collided on bare <target>.tmp / <target>.partN paths — one process's
in-progress write got yanked by the other's cleanup, manifest hash
check then failed spuriously. Per-process suffix (<target>.{pid}.tmp,
<target>.{pid}.partN) makes intermediate files disjoint; the final
os.replace to the bare target is atomic so last-writer-wins.

#3 _looks_like_bq_rewrite_parse_error patterns 'Syntax error' could
false-positive on a query like WHERE log_msg = 'Syntax error in foo'
that fails for an unrelated reason (quota, network) and has the
literal substring echoed in the error text. Anchored to 'Syntax error: '
(with trailing colon) — BQ always emits the colon in this error
format, user SQL string literals normally don't.
2026-05-06 13:57:29 +02:00
ZdenekSrotyr
e5645fd280 fix: devil's advocate R1 — chunked probe, parse-error heuristic narrow, pool settings refresh, content-length sanity, multi-project skip
R1 adversarial review surfaced 5 issues, all addressed:

#1 chunked download silently disabled in non-Caddy deployments (HEAD on
GET-only FastAPI route returns 405). _probe_range_support now falls back
to GET with Range: bytes=0-0 when HEAD fails — works against both
Caddy file_server (HEAD-friendly) and dev FastAPI direct (GET-only).

#2 parse-error fallback heuristic too broad — matched on Unrecognized
name / Function not found / No matching signature / Invalid cast,
which BQ surfaces for ordinary user-column typos. That triggered slow
ATTACH-catalog retry on every typo (2× latency tax). Narrowed to just
'Syntax error' / 'syntax error' which are the genuine DuckDB-vs-BQ
dialect mismatch markers.

#3 apply_bq_session_settings was only run on fresh-built pool entries,
not on reuse. An operator's /admin/server-config change to bq_query
_timeout_ms wouldn't propagate to long-lived pooled sessions until
restart. Fixed: re-apply on every pool acquire (idempotent + fail-soft).

#4 content-length sanity bound — a misconfigured proxy returning a
wildly inflated Content-Length would cause overlapping chunked Range
requests against the actual file → corrupt assembled output (caught
by manifest hash check, but only after wasted bandwidth). Cap at 100
GiB; above that, drop to single-stream.

#5 rewriter assumed every BQ row resolves under the single
bq.projects.data project. Bucket containing '.' suggests a project-
qualified bucket (multi-project deployment); rewriter would silently
target the wrong project. Conservative skip with regression test.
2026-05-06 13:50:46 +02:00
ZdenekSrotyr
8e56d45c68 fix(query): code-review fixes — outer LIMIT wrap, dollar-quoting, parse-error fallback
Address code-reviewer findings on the bigquery_query() rewrite path:

1. Outer LIMIT wrap — bigquery_query() materialises BQ result into DuckDB
   before fetchmany sees it (vs ATTACH-catalog Storage Read API streaming).
   A user 'SELECT *' against a billion-row remote table would buffer the
   entire result before request.limit applied. Wrap rewritten SQL in an
   outer 'LIMIT N+1' so the cap pushes into the BQ job itself.

2. Dollar-quoted inner SQL — naive replace("'", "''") doubling missed
   DuckDB backslash-escape sequences (\\, \\n, \\t, …). A predicate
   like 'WHERE name = ''O\\'Brien''' was unsafe under the doubling
   path. DuckDB $bqq_inner$ … $bqq_inner$ form takes the inner SQL
   verbatim with no escapes whatsoever. Falls back to legacy doubling
   if user SQL improbably contains the literal tag.

3. Parse-error fallback — when the rewritten path fails with a BQ-side
   parse / validation error (DuckDB-only syntax like ::INT cast that
   survives identifier rewrite but BQ refuses), retry the user's
   original SQL via the legacy ATTACH-catalog path so the request still
   succeeds. Mirrors the existing dry-run fallback contract.

4. CHANGELOG — delete duplicate CLI bullets that landed under
   already-released [0.38.1] (file corruption from merge — entries are
   correctly under [0.39.0]).
2026-05-06 13:29:45 +02:00
ZdenekSrotyr
3b9f6b447d release: 0.39.0 — perf bundle (BQ query rewrite + session pool + chunked download + HTTP/2) 2026-05-06 13:18:19 +02:00
ZdenekSrotyr
830d1a38f6 merge: CLI perf (chunked DL + HTTP/2 + persistent client + progress)
# Conflicts:
#	CHANGELOG.md
2026-05-06 13:16:31 +02:00
ZdenekSrotyr
c96ea3ad49 merge: server-side perf (BQ rewrite + session pool + error mapping) 2026-05-06 13:13:48 +02:00
ZdenekSrotyr
e72ff259f9 feat(pull): aggregated progress + non-TTY textual fallback
Two improvements to `agnes pull` progress reporting:

1. **Aggregated per-file progress across chunked downloads**: the
   existing Rich progress bar already used one task per file, but the
   chunked-download contract (one file = N parallel chunk callbacks
   summing to file size) meant we needed to verify that all chunk
   threads advance the same task. They do — the per-file callback is
   constructed once per tid and routes every chunk's byte delta to the
   same task / textual entry, so the bar shows one aggregated bytes-
   downloaded total rather than N separate sub-bars.

2. **Textual fallback for non-TTY stderr**: when stderr is not a
   terminal (SessionStart hook, CI runner, Docker log capture), Rich
   either suppresses output (silent multi-minute pull on a 5 GB
   parquet) or emits raw control sequences. The new `_TextualProgress`
   helper instead emits one plain-text line per file at most every
   10%-of-total-bytes or 30 s, plus a final `100% done` line per file.
   Format: `[N/T files] <tid>: 25% (16 MB / 66 MB) at 1.5 MB/s`.

The TTY path is unchanged. Detection uses `sys.stderr.isatty()` —
`show_progress=True` flips into the textual fallback when that returns
False. `show_progress=False` (the SessionStart hook) still emits no
progress text in either mode.
2026-05-06 13:09:37 +02:00
ZdenekSrotyr
14db85f506 fix(bq): map 'Response too large' to its own error class instead of generic bad_request
translate_bq_error previously mapped BQ's responseTooLarge failure mode
to bq_bad_request (HTTP 400 with the raw upstream message). The user-
facing implication ('your SQL has a syntax error') is wrong -- the root
cause is query shape (BQ refused to return the result inline because
it exceeded the response size limit), and the actionable remediation is
'narrow the WHERE clause, aggregate further, or use a materialized
table'.

Add bq_response_too_large as a first-class BqAccessError kind (also 400)
with a canonical hint message; original BQ message preserved in details
for operator debugging. Detection is substring-based on 'response too
large' and fires before the generic BadRequest path so the dedicated
mapping always wins. Affects every BQ-touching path since they all
share translate_bq_error -- /api/query, /api/v2/{scan,sample,schema},
materialize.
2026-05-06 13:09:31 +02:00
ZdenekSrotyr
bd1b5ad444 perf(cli): persistent HTTP/2 client across pull invocation
Pool the httpx.Client used by `stream_download` so N parquet downloads
share a single TLS handshake instead of one handshake each. With the
optional `h2` package installed, HTTP/2 multiplexing further lets all
chunk Range requests share a single TCP connection — synergizes with
the range-chunked download path added in the previous commit.

The shared client is created lazily on first stream-download call, kept
alive for the duration of the process via a module-level slot, and
closed at exit via `atexit.register`. Construction wraps in a
try/except: when `h2` is unavailable (slim install), httpx raises
ImportError on `http2=True` and we transparently fall back to an
HTTP/1.1 client — pooling alone still amortizes TLS handshakes.

`agnes pull` must never crash on a missing optional package, so the
fallback path is non-negotiable. `h2>=4.1.0` is added to the core
dependency set; downstream slim installs that drop it lose the HTTP/2
benefit but keep correctness.
2026-05-06 13:06:36 +02:00
ZdenekSrotyr
83209f32b0 perf(bq): pool DuckDB BQ extension sessions to amortize INSTALL/LOAD/ATTACH cost
Each BqAccess.duckdb_session() acquire previously created a fresh
in-memory DuckDB conn and ran INSTALL bigquery; LOAD bigquery;
CREATE SECRET; ATTACH on it -- costing ~0.5 s per request even before
any BQ work. Add a process-local pool (deque + lock) of pre-warmed
sessions; acquire reuses a warm entry when available, refreshing the
auth SECRET so a long-lived pool entry doesn't keep a stale GCE
metadata token past its TTL. Liveness probe (cheap SELECT 1) drops
broken entries before handing them to callers.

On exception inside the with-block the conn is closed instead of
returned to pool (session may carry dirty state). Pool size is
data_source.bigquery.session_pool_size (default 4; sentinel 0
disables pooling). Process-cached, not fork-safe (single uvicorn
worker is the supported deployment shape per CLAUDE.md).

All call sites get faster automatically: /api/query, /api/v2/{scan,
sample,schema}, materialize, the orchestrator's remote-attach, and
the BQ dry-run cap-guard.
2026-05-06 13:06:25 +02:00
ZdenekSrotyr
dee33fe25b feat(pull): range-chunked parallel download for single large files
When the server advertises `accept-ranges: bytes` and a parquet exceeds
`AGNES_PULL_CHUNK_THRESHOLD_BYTES` (default 50 MB), `stream_download`
now splits the file into N parallel HTTP Range requests
(`AGNES_PULL_CHUNK_PARALLELISM`, default 4, capped 1..16) and
assembles the parts into the destination atomically.

Targets the per-flow-shaped network (corp VPN with per-TCP-connection
rate-limiting) where single-stream throughput is throttled but N parallel
streams over the same connection scale roughly linearly. Manifests with
1 large materialized parquet + N remote tables previously left the
existing across-files `AGNES_PULL_PARALLELISM=4` pool with 1 active
worker = single-stream throughput; this fixes that.

Falls back to single-stream when:
- HEAD doesn't advertise `accept-ranges: bytes`
- Server returns 200 instead of 206 to a Range probe
- File size below the threshold

Cleanup discipline: every part file removed before return (success or
failure); destination written via `<target>.tmp` and renamed atomically.
Per-chunk retry on transient network blips (bounded by AGNES_STREAM_RETRIES).
2026-05-06 13:04:53 +02:00
ZdenekSrotyr
b2c1ff143c fix(query): rewrite BQ-backed user SQL via bigquery_query() to enable predicate pushdown
User SQL hitting query_mode='remote' BigQuery rows was 50-100x slower
than the equivalent direct bigquery_query() call because DuckDB's master
view (CREATE VIEW … AS SELECT * FROM bigquery.<ds>.<tbl>) does not push
WHERE/SELECT/LIMIT into BQ in ATTACH-catalog mode. The BQ extension opens
a Storage Read API session over the entire upstream table; on >100M-row
sources this was 70-150s and frequently failed with 'Response too large
to return'.

Extract the existing dry-run rewriter's core (table-name → BQ-native
backtick path) into a shared helper. Add an execution-path rewriter
that wraps the whole user SQL in bigquery_query('<project>', '<inner>')
so the BQ planner sees the full query and engages partition pruning +
projection pushdown server-side.

Conservative fall-through: cross-source JOINs (BQ ↔ Keboola/Jira local),
queries already containing bigquery_query(, and unconfigured BQ project
all skip the rewrite and run the original SQL via ATTACH-catalog so
behavior degrades gracefully.
2026-05-06 13:02:34 +02:00
ZdenekSrotyr
9649f42b99
Merge pull request #198 from keboola/zs/admin-tables-description-clamp
fix(admin/tables): keep row Actions reachable + sanitize description escapes
2026-05-06 11:50:13 +02:00
ZdenekSrotyr
226eb71592 Merge remote-tracking branch 'origin/main' into pr198-review
# Conflicts:
#	CHANGELOG.md
2026-05-06 11:35:45 +02:00
ZdenekSrotyr
6bc8739010 feat(admin/tables): show source, schedule, folder, registered, and sync-error in row 2026-05-06 11:09:02 +02:00