Commit graph

138 commits

Author SHA1 Message Date
ZdenekSrotyr
8e56d45c68 fix(query): code-review fixes — outer LIMIT wrap, dollar-quoting, parse-error fallback
Address code-reviewer findings on the bigquery_query() rewrite path:

1. Outer LIMIT wrap — bigquery_query() materialises BQ result into DuckDB
   before fetchmany sees it (vs ATTACH-catalog Storage Read API streaming).
   A user 'SELECT *' against a billion-row remote table would buffer the
   entire result before request.limit applied. Wrap rewritten SQL in an
   outer 'LIMIT N+1' so the cap pushes into the BQ job itself.

2. Dollar-quoted inner SQL — naive replace("'", "''") doubling missed
   DuckDB backslash-escape sequences (\\, \\n, \\t, …). A predicate
   like 'WHERE name = ''O\\'Brien''' was unsafe under the doubling
   path. DuckDB $bqq_inner$ … $bqq_inner$ form takes the inner SQL
   verbatim with no escapes whatsoever. Falls back to legacy doubling
   if user SQL improbably contains the literal tag.

3. Parse-error fallback — when the rewritten path fails with a BQ-side
   parse / validation error (DuckDB-only syntax like ::INT cast that
   survives identifier rewrite but BQ refuses), retry the user's
   original SQL via the legacy ATTACH-catalog path so the request still
   succeeds. Mirrors the existing dry-run fallback contract.

4. CHANGELOG — delete duplicate CLI bullets that landed under
   already-released [0.38.1] (file corruption from merge — entries are
   correctly under [0.39.0]).
2026-05-06 13:29:45 +02:00
ZdenekSrotyr
3b9f6b447d release: 0.39.0 — perf bundle (BQ query rewrite + session pool + chunked download + HTTP/2) 2026-05-06 13:18:19 +02:00
ZdenekSrotyr
830d1a38f6 merge: CLI perf (chunked DL + HTTP/2 + persistent client + progress)
# Conflicts:
#	CHANGELOG.md
2026-05-06 13:16:31 +02:00
ZdenekSrotyr
e72ff259f9 feat(pull): aggregated progress + non-TTY textual fallback
Two improvements to `agnes pull` progress reporting:

1. **Aggregated per-file progress across chunked downloads**: the
   existing Rich progress bar already used one task per file, but the
   chunked-download contract (one file = N parallel chunk callbacks
   summing to file size) meant we needed to verify that all chunk
   threads advance the same task. They do — the per-file callback is
   constructed once per tid and routes every chunk's byte delta to the
   same task / textual entry, so the bar shows one aggregated bytes-
   downloaded total rather than N separate sub-bars.

2. **Textual fallback for non-TTY stderr**: when stderr is not a
   terminal (SessionStart hook, CI runner, Docker log capture), Rich
   either suppresses output (silent multi-minute pull on a 5 GB
   parquet) or emits raw control sequences. The new `_TextualProgress`
   helper instead emits one plain-text line per file at most every
   10%-of-total-bytes or 30 s, plus a final `100% done` line per file.
   Format: `[N/T files] <tid>: 25% (16 MB / 66 MB) at 1.5 MB/s`.

The TTY path is unchanged. Detection uses `sys.stderr.isatty()` —
`show_progress=True` flips into the textual fallback when that returns
False. `show_progress=False` (the SessionStart hook) still emits no
progress text in either mode.
2026-05-06 13:09:37 +02:00
ZdenekSrotyr
14db85f506 fix(bq): map 'Response too large' to its own error class instead of generic bad_request
translate_bq_error previously mapped BQ's responseTooLarge failure mode
to bq_bad_request (HTTP 400 with the raw upstream message). The user-
facing implication ('your SQL has a syntax error') is wrong -- the root
cause is query shape (BQ refused to return the result inline because
it exceeded the response size limit), and the actionable remediation is
'narrow the WHERE clause, aggregate further, or use a materialized
table'.

Add bq_response_too_large as a first-class BqAccessError kind (also 400)
with a canonical hint message; original BQ message preserved in details
for operator debugging. Detection is substring-based on 'response too
large' and fires before the generic BadRequest path so the dedicated
mapping always wins. Affects every BQ-touching path since they all
share translate_bq_error -- /api/query, /api/v2/{scan,sample,schema},
materialize.
2026-05-06 13:09:31 +02:00
ZdenekSrotyr
bd1b5ad444 perf(cli): persistent HTTP/2 client across pull invocation
Pool the httpx.Client used by `stream_download` so N parquet downloads
share a single TLS handshake instead of one handshake each. With the
optional `h2` package installed, HTTP/2 multiplexing further lets all
chunk Range requests share a single TCP connection — synergizes with
the range-chunked download path added in the previous commit.

The shared client is created lazily on first stream-download call, kept
alive for the duration of the process via a module-level slot, and
closed at exit via `atexit.register`. Construction wraps in a
try/except: when `h2` is unavailable (slim install), httpx raises
ImportError on `http2=True` and we transparently fall back to an
HTTP/1.1 client — pooling alone still amortizes TLS handshakes.

`agnes pull` must never crash on a missing optional package, so the
fallback path is non-negotiable. `h2>=4.1.0` is added to the core
dependency set; downstream slim installs that drop it lose the HTTP/2
benefit but keep correctness.
2026-05-06 13:06:36 +02:00
ZdenekSrotyr
83209f32b0 perf(bq): pool DuckDB BQ extension sessions to amortize INSTALL/LOAD/ATTACH cost
Each BqAccess.duckdb_session() acquire previously created a fresh
in-memory DuckDB conn and ran INSTALL bigquery; LOAD bigquery;
CREATE SECRET; ATTACH on it -- costing ~0.5 s per request even before
any BQ work. Add a process-local pool (deque + lock) of pre-warmed
sessions; acquire reuses a warm entry when available, refreshing the
auth SECRET so a long-lived pool entry doesn't keep a stale GCE
metadata token past its TTL. Liveness probe (cheap SELECT 1) drops
broken entries before handing them to callers.

On exception inside the with-block the conn is closed instead of
returned to pool (session may carry dirty state). Pool size is
data_source.bigquery.session_pool_size (default 4; sentinel 0
disables pooling). Process-cached, not fork-safe (single uvicorn
worker is the supported deployment shape per CLAUDE.md).

All call sites get faster automatically: /api/query, /api/v2/{scan,
sample,schema}, materialize, the orchestrator's remote-attach, and
the BQ dry-run cap-guard.
2026-05-06 13:06:25 +02:00
ZdenekSrotyr
dee33fe25b feat(pull): range-chunked parallel download for single large files
When the server advertises `accept-ranges: bytes` and a parquet exceeds
`AGNES_PULL_CHUNK_THRESHOLD_BYTES` (default 50 MB), `stream_download`
now splits the file into N parallel HTTP Range requests
(`AGNES_PULL_CHUNK_PARALLELISM`, default 4, capped 1..16) and
assembles the parts into the destination atomically.

Targets the per-flow-shaped network (corp VPN with per-TCP-connection
rate-limiting) where single-stream throughput is throttled but N parallel
streams over the same connection scale roughly linearly. Manifests with
1 large materialized parquet + N remote tables previously left the
existing across-files `AGNES_PULL_PARALLELISM=4` pool with 1 active
worker = single-stream throughput; this fixes that.

Falls back to single-stream when:
- HEAD doesn't advertise `accept-ranges: bytes`
- Server returns 200 instead of 206 to a Range probe
- File size below the threshold

Cleanup discipline: every part file removed before return (success or
failure); destination written via `<target>.tmp` and renamed atomically.
Per-chunk retry on transient network blips (bounded by AGNES_STREAM_RETRIES).
2026-05-06 13:04:53 +02:00
ZdenekSrotyr
b2c1ff143c fix(query): rewrite BQ-backed user SQL via bigquery_query() to enable predicate pushdown
User SQL hitting query_mode='remote' BigQuery rows was 50-100x slower
than the equivalent direct bigquery_query() call because DuckDB's master
view (CREATE VIEW … AS SELECT * FROM bigquery.<ds>.<tbl>) does not push
WHERE/SELECT/LIMIT into BQ in ATTACH-catalog mode. The BQ extension opens
a Storage Read API session over the entire upstream table; on >100M-row
sources this was 70-150s and frequently failed with 'Response too large
to return'.

Extract the existing dry-run rewriter's core (table-name → BQ-native
backtick path) into a shared helper. Add an execution-path rewriter
that wraps the whole user SQL in bigquery_query('<project>', '<inner>')
so the BQ planner sees the full query and engages partition pruning +
projection pushdown server-side.

Conservative fall-through: cross-source JOINs (BQ ↔ Keboola/Jira local),
queries already containing bigquery_query(, and unconfigured BQ project
all skip the rewrite and run the original SQL via ATTACH-catalog so
behavior degrades gracefully.
2026-05-06 13:02:34 +02:00
ZdenekSrotyr
226eb71592 Merge remote-tracking branch 'origin/main' into pr198-review
# Conflicts:
#	CHANGELOG.md
2026-05-06 11:35:45 +02:00
ZdenekSrotyr
6bc8739010 feat(admin/tables): show source, schedule, folder, registered, and sync-error in row 2026-05-06 11:09:02 +02:00
ZdenekSrotyr
05e535d743 fix(admin/tables): unescape shell-quoting backslashes in descriptions 2026-05-06 10:13:49 +02:00
ZdenekSrotyr
e369d0ed7b fix(admin/tables): clamp long description to 2 lines so Actions stay reachable 2026-05-06 10:06:57 +02:00
ZdenekSrotyr
d68c3c5fa2 release: 0.38.2 — bq_query_timeout_ms applied on every BQ attach + surfaced silent failures 2026-05-06 09:48:12 +02:00
ZdenekSrotyr
a7d19206d7 release: 0.38.1 — docs(marketplace) two-step fallback 2026-05-06 09:27:42 +02:00
ZdenekSrotyr
6c94d2cbce Merge remote-tracking branch 'origin/main' into pr180-review
# Conflicts:
#	CHANGELOG.md
#	pyproject.toml
2026-05-06 07:27:25 +02:00
ZdenekSrotyr
fdc6cd7fb4 release: 0.37.0 — STATE_DIR + flat-mount overlay; host-mount direct-bind fix 2026-05-06 06:53:48 +02:00
ZdenekSrotyr
a9ae5f9c35 fix(flat-mount): preserve data:/srv:ro and caddy_config:/config in caddy override; CHANGELOG
The flat-mount overlay's caddy `volumes: !override` block listed only
three mounts, but the base docker-compose.yml caddy service has five.
`!override` (compose-spec semantics) replaces the entire list, so two
mounts were silently dropped under the flat layout:

- `data:/srv:ro` — Caddy's read-only view of the agnes data dir, used
  by the `@download` file_server handler in Caddyfile (added in v0.36.0
  as the perf bypass for multi-GB parquet downloads). Without this
  mount, `try_files /bigquery/data/<id>.parquet …` finds no file and
  every parquet download falls through to the app's uvicorn worker —
  defeating the bypass entirely.
- `caddy_config:/config` — Caddy's autosave/ACME state. Less critical
  (we feed certs in via /certs) but loses the autosaved adapter config
  across container recreates.

Restated both mounts with a comment block explaining the !override
caveat for any future overlay author.

Plus: CHANGELOG entries for the host-mount.yml direct-bind fix and
the STATE_DIR + flat-mount overlay under [Unreleased].
2026-05-05 19:29:38 +02:00
ZdenekSrotyr
e2f740d7ab fix(changelog): consolidate duplicate Added/Changed sections in 0.36.0
Devin Review on PR #188 (15:53Z): the renamed [0.36.0] section had
two separate ### Added blocks and two separate ### Changed blocks,
which violates Keep-a-Changelog grouping (and CLAUDE.md's explicit
'group by section' rule). Merged each set into a single ordered
block: Added, Changed, Fixed. No content removed; only reflowed.
2026-05-05 19:04:51 +02:00
ZdenekSrotyr
f33475cec3 release: 0.36.0 — perf + analyst-clarity bundle
Renames the [Unreleased] section to [0.36.0] in CHANGELOG, adds the
top-level summary, drops a fresh empty [Unreleased] above, and bumps
pyproject from 0.35.1.

Also fixes the third Devin Review finding on this PR: the CLI
ReadTimeout message hardcoded QUERY_TIMEOUT_S (300s) so a 30s-default
call (agnes catalog, agnes auth, …) reported a wait window that
didn't match reality. _translate_transport_error now takes the actual
httpx timeout from the calling helper; the BQ-job advisory only
appears for calls where the timeout was set ≥ 60s.
2026-05-05 18:57:04 +02:00
ZdenekSrotyr
28423907fd feat: clean CLI errors + init progress + skip-materialize + claude.md catalog pointer
Three first-try-failure-surface fixes from Pavel's #185 trace + the
template guidance question, all under PR #188's umbrella so they land
together with the file_server / parallel pull / Tier 1 work.

1. CLI clean-error wrapper — new AgnesTransportError raised by the
   api_*/stream_download helpers when httpx times out / drops /
   refuses, plus a top-level Typer wrapper (cli/main.py) that prints
   one-line "Error: …" + actionable hint and exits non-zero. Full
   traceback goes to ~/.config/agnes/last-error.log for support
   forwarding. Unhandled Exceptions are caught at the same boundary
   so no Python traceback ever leaks to the analyst's terminal.

   Pavel's #185 Phase 3B: a 30-frame httpx traceback from a slow BQ
   --remote query made it look like a CLI bug. Now: clean message +
   hint pointing at `agnes snapshot create` / partition-column
   guidance.

   Entry point in pyproject.toml flipped from `cli.main:app` →
   `cli.main:_run_with_clean_errors` so the wrapper actually runs
   under the installed `agnes` binary.

2. agnes init / agnes pull --skip-materialize + progress bar.
   --skip-materialize omits query_mode='materialized' rows from the
   download set so a first init doesn't spend 44 minutes silently
   pulling a single 6 GB parquet (Pavel's #185 Phase 1). Rich-driven
   per-file progress bar with label/bytes/rate/ETA renders to stderr
   when not --quiet and not --json. Aggregates across the parallel
   ThreadPoolExecutor workers added earlier in this PR.

3. config/claude_md_template.txt: explicit one-line snippet pointing
   at `agnes catalog --json | jq '.tables[] | select(.id=="<id>")'`
   for per-table descriptions + restated invariant: "the description
   field on each catalog row is the authoritative business-rules
   text — re-read live, never copy into this file." Resolves the
   regression-or-feature debate between Pavel (wants annotations)
   and the user feedback that landed in the prior commit (don't
   embed table-specific content; tables change). Catalog command
   stays the source of truth.
2026-05-05 18:11:59 +02:00
ZdenekSrotyr
e5fb913cec perf: Tier 1 event-loop unblocking — async def → def on BQ-bound handlers
Five hottest BQ-touching endpoints were `async def` but invoked synchronous
DuckDB / BQ-extension calls inside the body. Under uvicorn's single event
loop that meant a single heavy `agnes query --remote` (waiting up to
~200 s for BQ's jobs.query) froze EVERY other request — /api/health,
dashboard, auth, even another query — for the full BQ wait. Operators
saw "VM idle, app frozen" during PR #188's testing.

Convert to plain `def` so FastAPI auto-offloads the body to the anyio
thread pool. Event loop stays free for non-BQ requests.

- app/api/query.py:execute_query
- app/api/v2_scan.py:scan_estimate_endpoint, scan_endpoint
- app/api/v2_sample.py:sample
- app/api/v2_schema.py:schema

Audit: 0 `await` statements in any converted handler (verified file-by-
file), so the rename is safe. Tests in tests/test_v2_*.py called the
handlers via `asyncio.run(...)` which now fails on a non-coroutine return;
swapped for direct calls (asyncio.run( -> ( ) — keeps paren balance).

Plus AGNES_THREADPOOL_SIZE env var (default 200, was anyio's stock 40)
in app/main.py:lifespan. Set via
anyio.to_thread.current_default_thread_limiter().total_tokens. 200 is
comfortable headroom for <50 concurrent analysts; bump for more.

480/480 impacted tests pass (the 2 remaining errors are a pre-existing
fixture setup issue in test_reader_smoke_matrix.py unrelated to this
change).
2026-05-05 17:44:08 +02:00
ZdenekSrotyr
30e81a15b9 feat(workspace-prompt): decision tree + size-hint so analyst Claude gets it right first try
Three concrete changes addressing the "analyst Claude misuses the CLI"
class of bugs (image.png table — issues #3, #5, plus the recurrent
"how big is this table" guesswork):

1. config/claude_md_template.txt — the template agnes init writes to
   <workspace>/CLAUDE.md. Surfaces every catalog-row field with a why,
   adds a query_mode-based decision tree, explicit --estimate scoping
   (snapshot create ONLY — was the #1 first-try error), an agnes fetch
   → agnes snapshot create rename note, and a 6-row failure-mode table
   that maps each common error wording to its right next step.

2. app/api/v2_catalog.py — populate rough_size_hint for local +
   materialized rows from the on-disk parquet size, bucketed
   small/medium/large/very_large. Was hardcoded null with a TODO; AI
   couldn't tell "is this 6.8 GB" without a failed --remote round-trip.

3. cli/update_check.py — the [update] banner survived the da→agnes
   rename and printed "[update] da X is out of date" on every command,
   training analysts to associate the binary with the old name.

Verified by rendering the template against representative contexts
(33/33 tests pass) and running every use case from the original
screenshot through the real CLI against a dev VM.
2026-05-05 16:44:24 +02:00
ZdenekSrotyr
2ae486bc5d feat(pull): parallel parquet downloads (AGNES_PULL_PARALLELISM=4 default)
The download loop in cli/lib/pull.py was strictly serial — N tables took
Σ stream_download(t_i). With the Caddy file_server change in this PR,
the server can now sustain many parallel sendfile transfers without
blocking app workers, so the client-side serialization became the new
bottleneck.

Switch to ThreadPoolExecutor capped by AGNES_PULL_PARALLELISM (default 4,
set 1 to restore pre-PR serial). 4 matches typical home-broadband
saturation without over-subscribing the analyst's NIC. Drops to serial
when len(to_download) <= 1 to avoid executor overhead in the common
single-table case.

Per-table error semantics preserved via (tid, entry, err) tuple — a
failure on one parquet doesn't abort the rest of the batch.

Verified end-to-end against a dev VM with the new Caddy file_server
deployed: 2-table pull through agnes CLI works under the new concurrency.
2026-05-05 16:42:55 +02:00
ZdenekSrotyr
ab61e30c91 chore(auto-upgrade): re-fetch compose + Caddyfile, self-update
Sibling change to the Caddy file_server PR (#182). Without this,
existing long-uptime VMs would pull the new agnes image on auto-upgrade
but keep their stale Caddyfile + docker-compose.yml — leaving the
file_server route + the data:/srv:ro mount inert. Confirmed live
2026-05-05 when the file_server change merged in main but stayed
unreachable on a running dev VM until /opt/agnes/* was scp'd by hand.

agnes-auto-upgrade.sh now hashes the bind-mounted config files
(Caddyfile + every docker-compose overlay) on every 5 min tick and
triggers a `docker compose up -d` recreation when the hash drifts
— same trigger path as an image-digest change. Fail-soft via the
.new-then-mv pattern: a curl 404 / network blip leaves the existing
file untouched.

Self-update at the bottom of the script: re-fetch
/usr/local/bin/agnes-auto-upgrade.sh itself so the very fix that
watches config files lands on running VMs without a manual ssh-and-
curl cycle. Otherwise we'd have a self-perpetuating "old script
problem" — the watch-config logic never propagating to the VMs that
need it.

Operators no longer need to ssh + scp Caddyfile/compose changes.
2026-05-05 16:42:13 +02:00
ZdenekSrotyr
1be997f6d4 feat(caddy): file_server for parquet downloads — bypass uvicorn
A single analyst's multi-GB `agnes pull` held the only uvicorn worker
for the duration of the stream, starving UI / /api/health / every other
API endpoint. Container flipped to `unhealthy`. Triggered while a
6.8 GB `order_economics` pull was in-flight on prod 2026-05-05.

Caddy now intercepts `GET /api/data/{table_id}/download` and serves
the parquet directly via sendfile from the data volume (mounted r-o
at /srv inside the caddy container). RBAC enforced by `forward_auth`
to a new lightweight `GET /api/data/{table_id}/check-access` endpoint
(returns 204 / 403) — the bulk transfer never reaches uvicorn.

Path discovery via `try_files` over the known extract.duckdb v2 source
subdirs. Anything not at a static path falls through to the existing
app handler so legacy `src_data/parquet` and future connectors still
work without a Caddyfile change. Non-Caddy deployments are unchanged.

Stage 1 (multi-worker uvicorn) was considered but blocked by the
single-writer DuckDB lock on system.duckdb — workers > 1 would crash
at startup on "Could not set lock on file", the same race that pushed
the scheduler from in-process writes to HTTP-via-app. Multi-reader
workers + single-writer coordination is out of scope for this PR.
2026-05-05 16:41:33 +02:00
ZdenekSrotyr
4f04235502 feat(bigquery): bq_query_timeout_ms knob; default 600s (was 90s)
DuckDB BigQuery extension defaults `bq_query_timeout_ms` to 90 s, which
is too tight for analyst-scale queries against view-backed BQ datasets.
`agnes query --remote` HTTP 400'd with `Binder Error: Query execution
exceeded the timeout. Job ID: ...` whenever the underlying BQ job ran
longer than 90 s, even though the job itself was healthy.

Add `data_source.bigquery.query_timeout_ms` (default 600 000 ms = 10 min,
sentinel 0 falls through to the extension default). Applied via
`SET bq_query_timeout_ms` after every `LOAD bigquery` on every BQ-touching
DuckDB session: orchestrator's `_remote_attach` ATTACH path, BqAccess
session factory, and the standalone extractor. Configurable via
`/admin/server-config` UI.

Fail-soft: extension versions that don't recognise the setting silently
keep the default rather than poisoning the session.
2026-05-05 16:40:40 +02:00
ZdenekSrotyr
4751094e1c
fix(keboola): per-table fallback to legacy Storage-API client (#183)
* fix(keboola): per-table fallback to legacy Storage-API client

The DuckDB Keboola extension's per-table COPY fails with
`Schema '..."in.c-..."' does not exist or not authorized` on
projects whose Snowflake backend doesn't expose bucket schemas
to the storage-token-derived QueryService role
(keboola/duckdb-extension#17). ATTACH itself succeeds, so the
existing extension-level fallback in `_try_attach_extension`
never triggers — the table is just marked failed.

- Promote `kbcstorage>=0.9.0` from optional to core dep so the
  legacy client import in `_extract_via_legacy` doesn't crash
  default installs with `ModuleNotFoundError`.
- Wrap `_extract_via_extension` in a per-table try/except so a
  scan failure retries via `_extract_via_legacy` instead of
  recording `tables_failed` and moving on.

Slower than the extension path, but produces correct parquets
on affected projects while the upstream extension fix lands.

* test(keboola): cover per-table extension→legacy fallback

Two existing tests mocked _extract_via_extension to throw and asserted
the original message survived in result["errors"]. With per-table
fallback, the new flow retries via _extract_via_legacy — which on the
mock URLs would throw a different (404 / DNS-fail) error, replacing the
asserted message.

- Mock _extract_via_legacy alongside _extract_via_extension in
  test_network_timeout_during_extraction +
  test_partial_failure_continues +
  test_all_tables_fail_returns_full_failure_stats so the assertion
  observes the final propagated error from the fallback chain.
- Add test_extension_per_table_failure_falls_back_to_legacy that
  exercises the new behavior directly: extension scan fails with the
  QueryService schema-not-authorized message
  (keboola/duckdb-extension#17), legacy succeeds, parquet ends up
  queryable.
2026-05-05 15:47:44 +02:00
ZdenekSrotyr
4908a0d7a2 Merge remote-tracking branch 'origin/main' into pr180-review
# Conflicts:
#	CHANGELOG.md
#	pyproject.toml
2026-05-05 15:22:10 +02:00
ZdenekSrotyr
a220955640 release: 0.35.1 — CLI --remote query timeout fix
Patch release bundling the only Unreleased change: bump httpx client
timeout for agnes query --remote from 30s to 300s (configurable via
AGNES_QUERY_TIMEOUT). Renames CHANGELOG [Unreleased] section to
[0.35.1] and bumps pyproject version to match.
2026-05-05 15:01:37 +02:00
Vojtech Rysanek
0843c2bd1b fix(cli): bump --remote query timeout to 300s, add AGNES_QUERY_TIMEOUT
The httpx client behind 'agnes query --remote' used the default 30s
timeout, killing every BigQuery SELECT that took longer than half a
minute — i.e. most non-trivial remote queries.

cli/client.py now exposes QUERY_TIMEOUT_S (default 300s, override via
AGNES_QUERY_TIMEOUT) and propagates a kw-only 'timeout' through
api_get/post/delete/patch. _query_remote passes QUERY_TIMEOUT_S so only
the long-running /api/query path gets the bump; every other CLI call
keeps the 30s default.

Server-side has no read deadline on /api/query, so the client cap was
the sole bottleneck.
2026-05-05 16:40:54 +04:00
ZdenekSrotyr
8d8d2c219e refactor(cli-store): pull/info → agnes admin store; add agnes store mine
Backup-orchestration commands were split across two namespaces (pull in
agnes store, push in agnes admin store), which broke the operator
mental model — pull/push are a paired operation and should sit
together.

Move pull + info into agnes admin store so all bulk operations share
one help screen. Add agnes store mine as the user-facing equivalent —
calls the same /api/store/bundle.zip endpoint with ?owner=me, which
the server resolves to the caller's user_id. Authors can archive
their own uploads without admin role; whole-Store bulk reads stay
admin-flavored as a discoverability hint.

Server: 3-line addition to export_bundle handles owner='me' as a
magic alias for the caller. No new endpoint.

Tests updated: pull/info expectations move from agnes store to
agnes admin store; new tests cover agnes store mine and the
?owner=me server resolution. 69/69 store tests green locally.
2026-05-05 13:49:18 +02:00
ZdenekSrotyr
3d63965a67 Merge remote-tracking branch 'origin/main' into pr180-review
# Conflicts:
#	CHANGELOG.md
#	app/web/templates/_app_header.html
2026-05-05 12:05:50 +02:00
ZdenekSrotyr
a8f9d065c8 feat(store): bundle export/import + agnes store update + agnes admin store push
Adds whole-Store backup/restore primitives so an external CI/CD job can
mirror the Store to a git repo (and restore back from one).

REST:
- GET /api/store/bundle.zip — deterministic ZIP of all (filtered) Store
  entities. Layout: manifest.json + entities/<id>/{plugin,assets}/.
  Manifest carries owner_email for cross-instance restore. Auth: any
  authenticated user (Store is community-open).
- POST /api/store/import-bundle — admin-only restore. Modes
  merge|replace|skip; owner resolution by email with stub-disabled-user
  fallback when the email is unknown on the target instance.

CLI:
- agnes store update <id> [--description X] [--zip PATH] ... — in-place
  edit (server PUT permits owner OR admin per F4). Closes the missing
  edit affordance for analysts who want to fix a typo or push a new
  ZIP without losing install_count.
- agnes store pull [-o store.zip] [--unpack DIR] — download the bundle.
  --unpack streams + extracts so an external git-backup workflow can
  drop the tree straight into a repo and `git add .`.
- agnes store info [--json] — counts + size summary.
- agnes admin store push <zip-or-dir> [--mode ...] — admin-only restore.
  Auto-zips a directory client-side so a working-tree → server
  round-trip is one command.

cli/v2_client.py gains api_get_stream helper for binary downloads.

Tests: 5 new server tests (bundle shape + filters + round-trip + stub
user creation + skip mode + admin-only gate) + 11 new CLI tests
(update, pull/unpack, info, admin push). 66/66 store-related tests
green locally.
2026-05-05 11:51:31 +02:00
ZdenekSrotyr
952dc9e74d fix(profile-sessions): tolerate stat() failures on individual jsonl (Devin Review on #179)
The previous gather used `sorted(glob, key=lambda p: p.stat().st_mtime)`.
A transient OSError (race with delete, permission flicker, EBADF on a
weird filesystem) on any single file raised through the lambda and 500-ed
the whole page.

Reworked: stat each path under try/except into a (path, stat) list, sort
the already-statted entries. Bad files drop silently from the listing.

Regression test test_profile_sessions_page_tolerates_stat_failures
patches Path.stat to raise on one of two files, asserts the page returns
200 with the good row rendered and the bad row dropped.
2026-05-05 09:53:06 +02:00
ZdenekSrotyr
d878764ac1 fix(session-collector-api): mirror sibling endpoints' audit-on-exception (Devin Review on #179)
Devin flagged that run_session_collector still had the same audit-skip
gap I fixed in run_verification_detector and run_corporate_memory in
the previous two rounds — a PermissionError walking /home, an OSError
on /data/user_sessions mkdir, or any other unhandled exception from
collector.run() would skip the audit_log row and only show in docker
logs.

Same try/except + unhandled_error pattern as the sibling endpoints.
All three LLM-pipeline run-* endpoints now record their failures the
same way; /admin/scheduler-runs sees them. Regression test in
tests/test_admin_run_endpoints.py::TestRunSessionCollector::test_unhandled_exception_still_audits.
2026-05-05 09:31:33 +02:00
ZdenekSrotyr
9ebe991b55 feat(profile): per-session jsonl download from /profile/sessions
User feedback during e2e of #179: the listing page is nice but I want
to grab the raw jsonl and look at what's inside.

Adds GET /profile/sessions/<filename>:
- Auth via get_current_user (owner-only).
- Path safety: rejects "/", "\", "..", leading ".", and any non-".jsonl"
  filename. The served path resolves under
  ${DATA_DIR}/user_sessions/<caller.id>/; if resolution escapes that
  base directory, returns 404 (never 403, so existence of other users'
  files isn't leaked).
- FileResponse with Content-Disposition: attachment.

UI: Download button per row in profile_sessions.html.

Tests in test_web_ui.py: path-traversal / nested / dotfile / non-jsonl
all 404 for owner; unauthenticated 302/401/403; authenticated owner
gets 200 + correct Content-Disposition.
2026-05-05 09:15:12 +02:00
ZdenekSrotyr
e86da72997 fix(corporate-memory-api): mirror verification-detector audit-on-exception (Devin Review on #179)
Devin flagged that run_corporate_memory still had the same audit-skip
gap I just fixed in run_verification_detector — if collect_all() throws
anything other than the already-translated ValueError (DuckDB lock,
network blip, unexpected SDK error), the audit_log row was never
written and /admin/scheduler-runs missed the failure.

Same try/except + unhandled_error pattern as the verification_detector
fix from 4c4dfee8. Regression test in
tests/test_admin_run_endpoints.py::TestRunCorporateMemory::test_unhandled_exception_still_audits.
2026-05-05 09:11:13 +02:00
ZdenekSrotyr
4c4dfee8e6 feat(profile): /profile/sessions page + audit on detector exception + correct SCHEDULER_AUDIT_ACTIONS
Three changes addressing user feedback during e2e test of #179 + Devin Review on e86dd5ed.

1) /profile/sessions — new self-service user page in the user menu.
   Lists all session jsonls the caller uploaded via `agnes push` joined
   against session_extraction_state. Each row shows uploaded_at, file
   size, status badge (pending/processed/extracted), processed_at, and
   items_extracted. The page docstring + help text explicitly call out
   that items_extracted=0 means the verification detector ran fine but
   the LLM found no claims to track — that's the documented "no items"
   outcome, not a broken pipeline. Closes the gap surfaced during the
   e2e test of #176 where a user could see their sessions on disk and
   process them through the LLM but had no UI to inspect what happened.

2) run_verification_detector audits unhandled exceptions (Devin #1).
   If detector.run() threw anything other than the already-translated
   ValueError, the audit_log row was never written. The endpoint now
   wraps detector.run in try/except, records the exception in
   audit_params["unhandled_error"], then re-raises as 500 after audit.
   The /admin/scheduler-runs page surfaces the failure row with the
   error type + message.

3) SCHEDULER_AUDIT_ACTIONS list corrected (Devin #2). Previous list
   had "marketplaces_sync_all" (wrong — actual is "marketplace.sync_all")
   plus "data_refresh" and "scripts_run_due" which app/api/sync.py and
   app/api/scripts.py don't write to audit_log. Fixed to the four
   actually-logged strings; comment points at the missing audit calls
   as a follow-up.

Tests: tests/test_web_ui.py adds TestAdminRoleGuards::test_profile_sessions_page_no_admin_required and tightens test_admin_scheduler_runs_page_admin_only to assert the correct marketplace.sync_all string.
2026-05-05 08:57:35 +02:00
ZdenekSrotyr
f0d091f721 fix(store): scratch dir leak on ZIP validation failure (Devin Review)
create_entity + update_entity created the `scratch` temp dir inside one
try/finally but cleaned it up in a separate one. Validation HTTPExceptions
raised by _safe_zip_extract (zip_unsafe_path, zip_too_large_uncompressed)
or the BadZipFile→422 conversion exited the first scope, and the second
finally was never entered → temp dir leaked on every failed upload.

Devin flagged this on the F2 commit. The leak pre-existed (zip_unsafe_path
was the original vector); F2 added zip_too_large_uncompressed to the same
broken cleanup path. Fixed by collapsing scratch creation + cleanup into
one outer try/finally that covers both extraction AND metadata/bake; the
inner try/except/finally still handles BadZipFile→422 + tmp file cleanup.

Same restructure in update_entity. Regression test
`test_scratch_dir_cleaned_up_after_failed_extraction` triggers a
zip_unsafe_path 422 and asserts tmp/agnes_store_* contains no leaked
dirs.
2026-05-05 08:52:15 +02:00
ZdenekSrotyr
fd3c76d21b fix(store): security + correctness blockers found in PR review (F1, F2, F4, F5)
Three independent reviews of PR #180 surfaced four real defects in the new
Store / my-ai-stack surface. CHANGELOG entries detail each; one-liners:

- F1 video_url XSS: any authenticated user could upload a Store entity
  with `video_url=javascript:...` and pop XSS in any viewer's session via
  the `<a href=...>` "Watch video" link in store_detail.html. Jinja2
  autoescape doesn't block URI schemes inside attribute values. Fixed by
  scheme-validating to http(s) only on create + update; 400 invalid_video_url.

- F2 ZIP decompression bomb: _safe_zip_extract checked path-traversal but
  not declared file_size totals — a 50 MB compressed upload at 1:1000
  ratio decompresses to 50 GB and DOS the host disk. Fixed by summing
  zinfo.file_size across infolist() and refusing > 200 MB before
  extractall touches disk. 413 zip_too_large_uncompressed.

- F4 admin authz parity: PUT /api/store/entities/{id} was owner-only while
  DELETE allowed owner OR admin; the store-detail page hid Edit/Delete
  buttons from admin even though DELETE was permitted. Fixed by allowing
  admin on PUT and passing is_admin to the template; gate is now
  is_owner OR is_admin everywhere.

- F5 cross-owner suffix collision: sanitize_username is many-to-one
  (alice.smith / alice_smith both → alice-smith). Two such users uploading
  entities with the same display name produced identical
  `<name>-by-<username>` suffixes, silently colliding in the served
  agnes-store-bundle on-disk paths AND the manifest catalog (Claude Code
  dedupes by plugin.json `name`). Fixed by enforcing global uniqueness on
  the suffixed value at create_entity; 409 conflict_global_suffix.

F3 (ZIP symlink members) was investigated and confirmed to be a
false-positive — Python's stdlib ZipFile.extractall does not honor
symlink mode bits, so no exploit exists.

9 new regression tests in tests/test_store_api.py::TestStoreSecurityFixes
covering all four. Test run locally: 60/60 store-related tests pass.
2026-05-05 08:18:02 +02:00
ZdenekSrotyr
e86dd5edc5 fix(anthropic): strict json_schema (additionalProperties=false) + add /admin/scheduler-runs UI
E2E test on a real BQ deploy showed every verification-extraction call
fails with HTTP 400 invalid_request_error: "output_config.format.schema:
For 'object' type, 'additionalProperties' must be explicitly set to false".
The Anthropic structured-output API now requires the field on every object
node in the json_schema. Fix: connectors/llm/anthropic_provider.py wraps
the caller-supplied schema through a recursive _strict_json_schema()
walker that adds the field where missing (preserving any explicit
override), then passes the strict variant to the API. Six unit tests in
TestStrictJsonSchema pin the recursion across nested objects, array items,
and the no-mutation invariant.

Adds /admin/scheduler-runs — a read-only admin page that surfaces the
last 200 audit-log entries from scheduler-driven actions. New
AuditRepository.query_actions(actions, limit) helper, new admin nav
entry. Failed scheduler ticks (HTTP 401, network errors) don't reach
the audit_log; the page calls that out with a hint to set
SCHEDULER_API_TOKEN if no rows show up.
2026-05-05 08:00:57 +02:00
ZdenekSrotyr
9f9aabd72b fix(corporate-memory): CLI catches fail-fast ValueError, exits 1 with clean message (Devin Review on #179)
The PR's #176 fail-fast change made collect_all() raise ValueError when
neither an ai: block nor ANTHROPIC_API_KEY/LLM_API_KEY was available.
verification_detector's CLI was updated to handle it; corporate_memory's
CLI was missed and crashed with an unhandled traceback.

services/corporate_memory/collector.py:main() now wraps the collect_all
call in try/except ValueError, prints a one-line actionable message
to stderr, and returns rc=1.

Regression test:
test_llm_connector.py::TestCorporateMemoryCollector::test_main_returns_1_on_no_ai_config_instead_of_traceback.
2026-05-05 06:45:10 +02:00
ZdenekSrotyr
e68c2d3f0f fix(session-collector): argv-free run() helper, drop SystemExit footgun (Devin Review on #179)
run_session_collector called collector.main() which did argparse.parse_args()
on uvicorn's sys.argv (['app.main:app', '--host', ...]) → sys.exit(2) →
SystemExit(2), which inherits from BaseException, escapes FastAPI handlers,
and propagates through the thread pool. Every scheduler tick that fired the
endpoint either 500-ed or risked killing the uvicorn worker.

services/session_collector/collector.py now exposes run(dry_run, verbose)
that returns (rc, stats); main() is a thin CLI shim that parses argv and
delegates. The admin endpoint calls run() directly and audit-logs the
per-run stats (users_processed, files_copied, files_skipped) instead of
just the rc. Three regression tests in TestRunHelper.

Closes Devin Review finding on app/api/admin.py:2819 (#179).
2026-05-05 06:31:55 +02:00
ZdenekSrotyr
046d8705ee docs(changelog): correct "two paths" claim + document new env vars
The 0.35.0 entry's 'two paths to a working LLM pipeline' wording was
defensible only after the #179 review fixes — on the initial cut, the
seeded-overlay path was dead code (consumers imported the static-only
loader; even when they didn't, env refs in the overlay weren't resolved).
Updated Defect 5's bullet to spell out what was broken and what
shipped, and added a new bullet for the scheduler-cadence env-var fix.
Added the two new test modules under Internal.
2026-05-05 06:05:27 +02:00
Minas Arustamyan
d5a7c9ad79 feat(store): /store + /my-ai-stack — community marketplace + per-user composition
Adds a community-driven Store where any authenticated user uploads
skills/agents/plugins as ZIPs, plus /my-ai-stack as the per-user
composition view. The served Claude Code marketplace is now:

    (admin_granted ∖ opt_outs) ∪ store_installs

Skill + agent installs are merged into a single `agnes-store-bundle`
plugin in the served marketplace; type=plugin uploads stay standalone.
Names are suffixed with `-by-<owner-username>` at upload time so two
owners can use the same display name without colliding in Claude Code's
flat skill/agent namespace.

Schema v23 → v24 adds three tables:
  - store_entities       — community-uploaded skills/agents/plugins
  - user_store_installs  — what each user has chosen to install
  - user_plugin_optouts  — opt-out overlay on top of admin grants

Admin grant-delete drops every user's opt-out for that plugin so
re-grant resets cleanly to enabled (no sticky personal preference).

UI:
  - /store      — e-commerce-style listing with type/category/owner
                  filters, search, pagination, owner-aware [Install]
                  buttons, clickable cards
  - /store/new  — 2-step upload wizard with drag & drop, preview
                  validation (POST /api/store/entities/preview), docs
                  multi-upload, photo + video URL
  - /store/{id} — detail page with hero, file list, docs, owner
                  actions (Edit/Delete) for the uploader
  - /my-ai-stack — Granted plugins (toggle opt-out) + From the Store
                  (uninstall) sections
  - Admin nav: Marketplaces moved into Admin dropdown, renamed to
                "Curated Marketplaces"

Validation hardening: type-mismatch guards reject skill ZIP uploaded as
agent (or vice versa), and plugin ZIPs masquerading as skills/agents.
Human-readable error messages mapped client-side from machine codes.

Cross-source naming: Store entity-id-prefixed dirs (`plugins/store-<id>/`)
plus the bundle (`plugins/store-bundle/`) avoid collisions with admin
marketplaces (whose `store` slug is reserved by `is_valid_slug`).

Bundle composition is content-hashed at serve time — install/uninstall
or owner re-upload bumps the bundle's plugin.json `version`, so Claude
Code's auto-update toggle picks up changes.

Tests: 50+ new tests across naming, repositories, filter (admin ∪ store
∪ bundle), API (upload/install/uninstall/delete/preview/docs), end-to-end
marketplace.zip with bundle merging.
2026-05-05 02:53:49 +02:00
ZdenekSrotyr
567385d046 release: 0.35.0 — session pipeline fix (BREAKING) (#176)
Five compounding defects on default `docker compose up` deploys made the
session pipeline silently broken: sessions uploaded by analysts via
`agnes push` landed on /data/user_sessions/<user>/*.jsonl but nothing
ever processed them. Fix is one PR: promote anthropic + openai to core
deps, wire all three LLM-pipeline jobs into scheduler-v2 with offset
cadences (10m/15m/17m), drop the side-car services from compose, seed a
default ai: block on first-time setup with an env-var fallback in code,
surface the pending review queue to admins, and expose a health check
that warns when uploaded jsonls aren't being processed.

**BREAKING** for operators on COMPOSE_PROFILES=full or with custom
Compose overrides referencing the corporate-memory or session-collector
service stanzas — drop them. The scheduler is now the sole driver.
2026-05-05 00:46:27 +02:00
ZdenekSrotyr
0430c0de00 release: 0.34.0 — clean analyst bootstrap (BREAKING) + bundled fixes
Headlines:

- Clean analyst bootstrap rewrite: web /setup → paste prompt → Claude Code
  in empty folder = working analyst workspace. CLI binary renamed da → agnes.
  See CHANGELOG ## [0.34.0] for the full breaking-change matrix.

- Unified /setup flow: collapsed the admin/analyst tile split (the ?role=
  query parameter introduced mid-cycle is gone). Every signed-in user
  sees the same flow; marketplace + plugins block emitted iff caller has
  plugin grants. PAT scope uniform (general 90 d).

- Bundled fixes: supersedes #172 (Windows console encoding), merges #174
  (BigQuery materialize view fix + concurrency, schema v24 migration),
  closes #171 (--remote query pre-check no longer over-rejects narrow
  queries on partitioned tables, ~30,000x over-estimate fix).

- Devin Review findings addressed throughout the cycle:
  query.py:464 (rewriter cross-contamination), extractor.py:166 (TTL
  reclaim dead code), db.py:1757 (v24 migration retry path),
  init.py:99 (stale on-disk token override), and more.

- Operator UX: register-table now requires --bucket for materialized
  rows + emits first-sync and grant hints on success. agnes status
  sessions counter reads from ~/.claude/projects/<encoded-cwd>/.
  agnes init --token now wins over stale ~/.config/agnes/token.json.

Open follow-ups (separate issues):
- #175 sync architecture redesign (full-extract Keboola, full-file
  downloads, user-global sync_state)
- #177 admin CLI: missing unregister-table / update-table commands
- #178 agnes diagnose: introduce "info" severity tier
2026-05-04 23:13:23 +02:00
ZdenekSrotyr
0612c1e1a1 fix(schema-v24): raise on deferred migration so retry path actually runs (Devin Review on db.py:1757)
Pre-fix: when v24 migration found rows to migrate but
data_source.bigquery.project was empty, it logged a warning per row
and returned normally. Schema_version then bumped to 24 unconditionally
→ next start's 'if current < 24:' gate skipped _v23_to_v24_finalize
forever, leaving rows in DuckDB-flavor SQL that the new
_wrap_admin_sql_for_jobs_api wrapping path rejects.

Devin escalated this from advisory ("idempotent retry") to critical
on rescan after my reply. The reply was wrong — the LIKE filter inside
the function gives idempotency IF the function is called again, but
the schema-version gate prevents that call from happening.

Fix (Devin's recommended Approach 1): raise RuntimeError BEFORE the
schema-version bump when rows need migration but project_id is empty.
The schema_version stays at 23, so on next start the 'if current < 24:'
gate fires and the migration runs again — this time with project_id
configured.

Side effect: a BQ-using deployment that hasn't set the project blocks
startup until they do. That's the right call for a config error that
would otherwise silently break all materialized tables. The error
message points at the right knob (data_source.bigquery.project +
restart).

No-rows-no-block invariant preserved: the early 'if not rows: return'
at the top of _v23_to_v24_finalize means non-BQ deployments are
unaffected.

Tests:
- test_v24_raises_when_project_not_configured_and_rows_need_migration:
  asserts raise + schema_version stays at 23 (the load-bearing
  invariant for retry-on-next-start to work)
- test_v24_skips_clean_when_no_rows_match_even_without_project:
  asserts non-BQ deployments don't block startup
- Existing 3 tests still pass
2026-05-04 23:11:34 +02:00
ZdenekSrotyr
36012e0833 fix(admin): register-table real-world UX gaps for materialized BQ
Three items from operator feedback after running the actual flow:

(1) Help docstring lied: "--bucket / --source-table ignored" for
materialized rows. Reality: --bucket is load-bearing because
`agnes schema <name>` builds the BQ identifier as
`bq.<bucket>.<source_table>`. An empty bucket registered the row but
broke schema/describe with HTTP 400 "unsafe BQ identifier in
registry". Fix: docstring rewritten to reflect reality, plus
client-side validation rejects materialized + empty bucket with a
clear error pointing at the right knob.

(2) Post-register UX cliff: `agnes pull` after register-table reports
"Updated 0 tables (1 total)" because registration adds a registry
row but does NOT trigger a parquet build. Operators routinely
assume something's broken when they need to run
`agnes setup first-sync` to kick off the materialization. Hint
emitted on success now points at first-sync.

(3) RBAC gotcha: `agnes catalog` is RBAC-filtered via
`resource_grants`, so non-admin users don't see freshly-registered
rows until a grant is created. Hint emitted on success now points at
`agnes admin grant create <group> table <name>`.

Tests: 8/8 in test_cli_admin_materialized.py, including two new
regression tests for the validation + the hint output.
2026-05-04 23:06:17 +02:00