Add allow_stopping_for_update=true on google_compute_instance.vm. Without
it, a TF change to machine_type triggers ForceNew (destroy + recreate);
with it, the provider stops + mutates + restarts the VM in place, which
is what an operator resizing a running deployment expects. Tag as
infra-v1.7.0; consumers opt in by bumping the module ref.
See CHANGELOG.md for the full entry. (Bumped from 0.42.0 to 0.43.0 since
0.42.0 was taken by PR #208's backtick-rewriter fix during this branch's
review cycle.)
0.40.0 added _persist_materialized_inner_view in materialize_query, which
tried to open extract.duckdb from a fresh DuckDB handle to write the _meta
row + inner view. In production this conflicts with the same uvicorn
process's existing read-only ATTACH (orchestrator's analytics conn holds
extract.duckdb ATTACHed as <source_name> alias), and DuckDB single-process
file-handle uniqueness rejects with:
Binder Error: Unique file handle conflict: Cannot attach "extract"
— already attached by database "<source>"
The helper logs WARNING fail-soft, parquet stays canonical, but the
master view never appears via the meta path.
Fix: at the end of _attach_and_create_views, scan
<extract_dir>/data/*.parquet and CREATE OR REPLACE VIEW <id> AS
SELECT * FROM read_parquet('<path>') for any parquet whose <id> is not
already in the per-source tables list (= meta path didn't pick it up).
Decoupled from materialize_query open-handle race. Honors the same
view_ownership cross-connector collision rules as the meta path
(first-come-first-served via view_repo.claim).
Tests:
- filesystem-fallback fires when _meta row missing
- skipped when meta path already created the view (no shadow)
- skips invalid identifiers (e.g. parquet stem starting with a digit)
- doesn't crash when source has no data/ subdir
Pre-fix flow:
1. extractor subprocess writes _meta with N remote rows + creates N inner
views in extract.duckdb (rebuild_from_registry skips materialized rows
per design — explicit `continue` at line 389)
2. _run_materialized_pass calls materialize_query, which writes parquet
atomically + returns stats — but never updates _meta
3. orchestrator.rebuild scans _meta, finds only the N remote rows, creates
master views only for them. Materialized parquet is on disk but
invisible to /api/query → 400 'not yet materialized'
Symptom appears after every container recreate (the previous run's _meta
state is wiped because docker compose down nukes the named volume that
backs extract.duckdb on some compose layouts; even on volumes that
persist, the next extractor pass calls _create_meta_table which DROPs
+ CREATEs _meta cleanly).
Fix: after os.replace(tmp_path, parquet_path) in materialize_query, open
extract.duckdb (read-write), DELETE existing _meta row for table_id,
INSERT new one with query_mode='materialized', and CREATE OR REPLACE
VIEW <table_id> AS SELECT * FROM read_parquet(<path>). All inside a
single transaction so concurrent reads see either old or new state, not
torn rows. Fail-soft on lock contention or schema drift — parquet
remains canonical, next sync pass recovers.
Tests: 3 new in test_bq_materialize.py covering:
- meta + inner view registered after materialize, alongside existing
remote rows
- re-run replaces (not duplicates) the meta row
- skips inner-view registration when extract.duckdb doesn't exist yet
(fresh BQ-only deployment edge case)
Address code-reviewer findings on the bigquery_query() rewrite path:
1. Outer LIMIT wrap — bigquery_query() materialises BQ result into DuckDB
before fetchmany sees it (vs ATTACH-catalog Storage Read API streaming).
A user 'SELECT *' against a billion-row remote table would buffer the
entire result before request.limit applied. Wrap rewritten SQL in an
outer 'LIMIT N+1' so the cap pushes into the BQ job itself.
2. Dollar-quoted inner SQL — naive replace("'", "''") doubling missed
DuckDB backslash-escape sequences (\\, \\n, \\t, …). A predicate
like 'WHERE name = ''O\\'Brien''' was unsafe under the doubling
path. DuckDB $bqq_inner$ … $bqq_inner$ form takes the inner SQL
verbatim with no escapes whatsoever. Falls back to legacy doubling
if user SQL improbably contains the literal tag.
3. Parse-error fallback — when the rewritten path fails with a BQ-side
parse / validation error (DuckDB-only syntax like ::INT cast that
survives identifier rewrite but BQ refuses), retry the user's
original SQL via the legacy ATTACH-catalog path so the request still
succeeds. Mirrors the existing dry-run fallback contract.
4. CHANGELOG — delete duplicate CLI bullets that landed under
already-released [0.38.1] (file corruption from merge — entries are
correctly under [0.39.0]).
Two improvements to `agnes pull` progress reporting:
1. **Aggregated per-file progress across chunked downloads**: the
existing Rich progress bar already used one task per file, but the
chunked-download contract (one file = N parallel chunk callbacks
summing to file size) meant we needed to verify that all chunk
threads advance the same task. They do — the per-file callback is
constructed once per tid and routes every chunk's byte delta to the
same task / textual entry, so the bar shows one aggregated bytes-
downloaded total rather than N separate sub-bars.
2. **Textual fallback for non-TTY stderr**: when stderr is not a
terminal (SessionStart hook, CI runner, Docker log capture), Rich
either suppresses output (silent multi-minute pull on a 5 GB
parquet) or emits raw control sequences. The new `_TextualProgress`
helper instead emits one plain-text line per file at most every
10%-of-total-bytes or 30 s, plus a final `100% done` line per file.
Format: `[N/T files] <tid>: 25% (16 MB / 66 MB) at 1.5 MB/s`.
The TTY path is unchanged. Detection uses `sys.stderr.isatty()` —
`show_progress=True` flips into the textual fallback when that returns
False. `show_progress=False` (the SessionStart hook) still emits no
progress text in either mode.
translate_bq_error previously mapped BQ's responseTooLarge failure mode
to bq_bad_request (HTTP 400 with the raw upstream message). The user-
facing implication ('your SQL has a syntax error') is wrong -- the root
cause is query shape (BQ refused to return the result inline because
it exceeded the response size limit), and the actionable remediation is
'narrow the WHERE clause, aggregate further, or use a materialized
table'.
Add bq_response_too_large as a first-class BqAccessError kind (also 400)
with a canonical hint message; original BQ message preserved in details
for operator debugging. Detection is substring-based on 'response too
large' and fires before the generic BadRequest path so the dedicated
mapping always wins. Affects every BQ-touching path since they all
share translate_bq_error -- /api/query, /api/v2/{scan,sample,schema},
materialize.
Pool the httpx.Client used by `stream_download` so N parquet downloads
share a single TLS handshake instead of one handshake each. With the
optional `h2` package installed, HTTP/2 multiplexing further lets all
chunk Range requests share a single TCP connection — synergizes with
the range-chunked download path added in the previous commit.
The shared client is created lazily on first stream-download call, kept
alive for the duration of the process via a module-level slot, and
closed at exit via `atexit.register`. Construction wraps in a
try/except: when `h2` is unavailable (slim install), httpx raises
ImportError on `http2=True` and we transparently fall back to an
HTTP/1.1 client — pooling alone still amortizes TLS handshakes.
`agnes pull` must never crash on a missing optional package, so the
fallback path is non-negotiable. `h2>=4.1.0` is added to the core
dependency set; downstream slim installs that drop it lose the HTTP/2
benefit but keep correctness.
Each BqAccess.duckdb_session() acquire previously created a fresh
in-memory DuckDB conn and ran INSTALL bigquery; LOAD bigquery;
CREATE SECRET; ATTACH on it -- costing ~0.5 s per request even before
any BQ work. Add a process-local pool (deque + lock) of pre-warmed
sessions; acquire reuses a warm entry when available, refreshing the
auth SECRET so a long-lived pool entry doesn't keep a stale GCE
metadata token past its TTL. Liveness probe (cheap SELECT 1) drops
broken entries before handing them to callers.
On exception inside the with-block the conn is closed instead of
returned to pool (session may carry dirty state). Pool size is
data_source.bigquery.session_pool_size (default 4; sentinel 0
disables pooling). Process-cached, not fork-safe (single uvicorn
worker is the supported deployment shape per CLAUDE.md).
All call sites get faster automatically: /api/query, /api/v2/{scan,
sample,schema}, materialize, the orchestrator's remote-attach, and
the BQ dry-run cap-guard.
When the server advertises `accept-ranges: bytes` and a parquet exceeds
`AGNES_PULL_CHUNK_THRESHOLD_BYTES` (default 50 MB), `stream_download`
now splits the file into N parallel HTTP Range requests
(`AGNES_PULL_CHUNK_PARALLELISM`, default 4, capped 1..16) and
assembles the parts into the destination atomically.
Targets the per-flow-shaped network (corp VPN with per-TCP-connection
rate-limiting) where single-stream throughput is throttled but N parallel
streams over the same connection scale roughly linearly. Manifests with
1 large materialized parquet + N remote tables previously left the
existing across-files `AGNES_PULL_PARALLELISM=4` pool with 1 active
worker = single-stream throughput; this fixes that.
Falls back to single-stream when:
- HEAD doesn't advertise `accept-ranges: bytes`
- Server returns 200 instead of 206 to a Range probe
- File size below the threshold
Cleanup discipline: every part file removed before return (success or
failure); destination written via `<target>.tmp` and renamed atomically.
Per-chunk retry on transient network blips (bounded by AGNES_STREAM_RETRIES).
User SQL hitting query_mode='remote' BigQuery rows was 50-100x slower
than the equivalent direct bigquery_query() call because DuckDB's master
view (CREATE VIEW … AS SELECT * FROM bigquery.<ds>.<tbl>) does not push
WHERE/SELECT/LIMIT into BQ in ATTACH-catalog mode. The BQ extension opens
a Storage Read API session over the entire upstream table; on >100M-row
sources this was 70-150s and frequently failed with 'Response too large
to return'.
Extract the existing dry-run rewriter's core (table-name → BQ-native
backtick path) into a shared helper. Add an execution-path rewriter
that wraps the whole user SQL in bigquery_query('<project>', '<inner>')
so the BQ planner sees the full query and engages partition pruning +
projection pushdown server-side.
Conservative fall-through: cross-source JOINs (BQ ↔ Keboola/Jira local),
queries already containing bigquery_query(, and unconfigured BQ project
all skip the rewrite and run the original SQL via ATTACH-catalog so
behavior degrades gracefully.
The flat-mount overlay's caddy `volumes: !override` block listed only
three mounts, but the base docker-compose.yml caddy service has five.
`!override` (compose-spec semantics) replaces the entire list, so two
mounts were silently dropped under the flat layout:
- `data:/srv:ro` — Caddy's read-only view of the agnes data dir, used
by the `@download` file_server handler in Caddyfile (added in v0.36.0
as the perf bypass for multi-GB parquet downloads). Without this
mount, `try_files /bigquery/data/<id>.parquet …` finds no file and
every parquet download falls through to the app's uvicorn worker —
defeating the bypass entirely.
- `caddy_config:/config` — Caddy's autosave/ACME state. Less critical
(we feed certs in via /certs) but loses the autosaved adapter config
across container recreates.
Restated both mounts with a comment block explaining the !override
caveat for any future overlay author.
Plus: CHANGELOG entries for the host-mount.yml direct-bind fix and
the STATE_DIR + flat-mount overlay under [Unreleased].
Devin Review on PR #188 (15:53Z): the renamed [0.36.0] section had
two separate ### Added blocks and two separate ### Changed blocks,
which violates Keep-a-Changelog grouping (and CLAUDE.md's explicit
'group by section' rule). Merged each set into a single ordered
block: Added, Changed, Fixed. No content removed; only reflowed.
Renames the [Unreleased] section to [0.36.0] in CHANGELOG, adds the
top-level summary, drops a fresh empty [Unreleased] above, and bumps
pyproject from 0.35.1.
Also fixes the third Devin Review finding on this PR: the CLI
ReadTimeout message hardcoded QUERY_TIMEOUT_S (300s) so a 30s-default
call (agnes catalog, agnes auth, …) reported a wait window that
didn't match reality. _translate_transport_error now takes the actual
httpx timeout from the calling helper; the BQ-job advisory only
appears for calls where the timeout was set ≥ 60s.
Three first-try-failure-surface fixes from Pavel's #185 trace + the
template guidance question, all under PR #188's umbrella so they land
together with the file_server / parallel pull / Tier 1 work.
1. CLI clean-error wrapper — new AgnesTransportError raised by the
api_*/stream_download helpers when httpx times out / drops /
refuses, plus a top-level Typer wrapper (cli/main.py) that prints
one-line "Error: …" + actionable hint and exits non-zero. Full
traceback goes to ~/.config/agnes/last-error.log for support
forwarding. Unhandled Exceptions are caught at the same boundary
so no Python traceback ever leaks to the analyst's terminal.
Pavel's #185 Phase 3B: a 30-frame httpx traceback from a slow BQ
--remote query made it look like a CLI bug. Now: clean message +
hint pointing at `agnes snapshot create` / partition-column
guidance.
Entry point in pyproject.toml flipped from `cli.main:app` →
`cli.main:_run_with_clean_errors` so the wrapper actually runs
under the installed `agnes` binary.
2. agnes init / agnes pull --skip-materialize + progress bar.
--skip-materialize omits query_mode='materialized' rows from the
download set so a first init doesn't spend 44 minutes silently
pulling a single 6 GB parquet (Pavel's #185 Phase 1). Rich-driven
per-file progress bar with label/bytes/rate/ETA renders to stderr
when not --quiet and not --json. Aggregates across the parallel
ThreadPoolExecutor workers added earlier in this PR.
3. config/claude_md_template.txt: explicit one-line snippet pointing
at `agnes catalog --json | jq '.tables[] | select(.id=="<id>")'`
for per-table descriptions + restated invariant: "the description
field on each catalog row is the authoritative business-rules
text — re-read live, never copy into this file." Resolves the
regression-or-feature debate between Pavel (wants annotations)
and the user feedback that landed in the prior commit (don't
embed table-specific content; tables change). Catalog command
stays the source of truth.
Five hottest BQ-touching endpoints were `async def` but invoked synchronous
DuckDB / BQ-extension calls inside the body. Under uvicorn's single event
loop that meant a single heavy `agnes query --remote` (waiting up to
~200 s for BQ's jobs.query) froze EVERY other request — /api/health,
dashboard, auth, even another query — for the full BQ wait. Operators
saw "VM idle, app frozen" during PR #188's testing.
Convert to plain `def` so FastAPI auto-offloads the body to the anyio
thread pool. Event loop stays free for non-BQ requests.
- app/api/query.py:execute_query
- app/api/v2_scan.py:scan_estimate_endpoint, scan_endpoint
- app/api/v2_sample.py:sample
- app/api/v2_schema.py:schema
Audit: 0 `await` statements in any converted handler (verified file-by-
file), so the rename is safe. Tests in tests/test_v2_*.py called the
handlers via `asyncio.run(...)` which now fails on a non-coroutine return;
swapped for direct calls (asyncio.run( -> ( ) — keeps paren balance).
Plus AGNES_THREADPOOL_SIZE env var (default 200, was anyio's stock 40)
in app/main.py:lifespan. Set via
anyio.to_thread.current_default_thread_limiter().total_tokens. 200 is
comfortable headroom for <50 concurrent analysts; bump for more.
480/480 impacted tests pass (the 2 remaining errors are a pre-existing
fixture setup issue in test_reader_smoke_matrix.py unrelated to this
change).
Three concrete changes addressing the "analyst Claude misuses the CLI"
class of bugs (image.png table — issues #3, #5, plus the recurrent
"how big is this table" guesswork):
1. config/claude_md_template.txt — the template agnes init writes to
<workspace>/CLAUDE.md. Surfaces every catalog-row field with a why,
adds a query_mode-based decision tree, explicit --estimate scoping
(snapshot create ONLY — was the #1 first-try error), an agnes fetch
→ agnes snapshot create rename note, and a 6-row failure-mode table
that maps each common error wording to its right next step.
2. app/api/v2_catalog.py — populate rough_size_hint for local +
materialized rows from the on-disk parquet size, bucketed
small/medium/large/very_large. Was hardcoded null with a TODO; AI
couldn't tell "is this 6.8 GB" without a failed --remote round-trip.
3. cli/update_check.py — the [update] banner survived the da→agnes
rename and printed "[update] da X is out of date" on every command,
training analysts to associate the binary with the old name.
Verified by rendering the template against representative contexts
(33/33 tests pass) and running every use case from the original
screenshot through the real CLI against a dev VM.
The download loop in cli/lib/pull.py was strictly serial — N tables took
Σ stream_download(t_i). With the Caddy file_server change in this PR,
the server can now sustain many parallel sendfile transfers without
blocking app workers, so the client-side serialization became the new
bottleneck.
Switch to ThreadPoolExecutor capped by AGNES_PULL_PARALLELISM (default 4,
set 1 to restore pre-PR serial). 4 matches typical home-broadband
saturation without over-subscribing the analyst's NIC. Drops to serial
when len(to_download) <= 1 to avoid executor overhead in the common
single-table case.
Per-table error semantics preserved via (tid, entry, err) tuple — a
failure on one parquet doesn't abort the rest of the batch.
Verified end-to-end against a dev VM with the new Caddy file_server
deployed: 2-table pull through agnes CLI works under the new concurrency.
Sibling change to the Caddy file_server PR (#182). Without this,
existing long-uptime VMs would pull the new agnes image on auto-upgrade
but keep their stale Caddyfile + docker-compose.yml — leaving the
file_server route + the data:/srv:ro mount inert. Confirmed live
2026-05-05 when the file_server change merged in main but stayed
unreachable on a running dev VM until /opt/agnes/* was scp'd by hand.
agnes-auto-upgrade.sh now hashes the bind-mounted config files
(Caddyfile + every docker-compose overlay) on every 5 min tick and
triggers a `docker compose up -d` recreation when the hash drifts
— same trigger path as an image-digest change. Fail-soft via the
.new-then-mv pattern: a curl 404 / network blip leaves the existing
file untouched.
Self-update at the bottom of the script: re-fetch
/usr/local/bin/agnes-auto-upgrade.sh itself so the very fix that
watches config files lands on running VMs without a manual ssh-and-
curl cycle. Otherwise we'd have a self-perpetuating "old script
problem" — the watch-config logic never propagating to the VMs that
need it.
Operators no longer need to ssh + scp Caddyfile/compose changes.
A single analyst's multi-GB `agnes pull` held the only uvicorn worker
for the duration of the stream, starving UI / /api/health / every other
API endpoint. Container flipped to `unhealthy`. Triggered while a
6.8 GB `order_economics` pull was in-flight on prod 2026-05-05.
Caddy now intercepts `GET /api/data/{table_id}/download` and serves
the parquet directly via sendfile from the data volume (mounted r-o
at /srv inside the caddy container). RBAC enforced by `forward_auth`
to a new lightweight `GET /api/data/{table_id}/check-access` endpoint
(returns 204 / 403) — the bulk transfer never reaches uvicorn.
Path discovery via `try_files` over the known extract.duckdb v2 source
subdirs. Anything not at a static path falls through to the existing
app handler so legacy `src_data/parquet` and future connectors still
work without a Caddyfile change. Non-Caddy deployments are unchanged.
Stage 1 (multi-worker uvicorn) was considered but blocked by the
single-writer DuckDB lock on system.duckdb — workers > 1 would crash
at startup on "Could not set lock on file", the same race that pushed
the scheduler from in-process writes to HTTP-via-app. Multi-reader
workers + single-writer coordination is out of scope for this PR.
DuckDB BigQuery extension defaults `bq_query_timeout_ms` to 90 s, which
is too tight for analyst-scale queries against view-backed BQ datasets.
`agnes query --remote` HTTP 400'd with `Binder Error: Query execution
exceeded the timeout. Job ID: ...` whenever the underlying BQ job ran
longer than 90 s, even though the job itself was healthy.
Add `data_source.bigquery.query_timeout_ms` (default 600 000 ms = 10 min,
sentinel 0 falls through to the extension default). Applied via
`SET bq_query_timeout_ms` after every `LOAD bigquery` on every BQ-touching
DuckDB session: orchestrator's `_remote_attach` ATTACH path, BqAccess
session factory, and the standalone extractor. Configurable via
`/admin/server-config` UI.
Fail-soft: extension versions that don't recognise the setting silently
keep the default rather than poisoning the session.
* fix(keboola): per-table fallback to legacy Storage-API client
The DuckDB Keboola extension's per-table COPY fails with
`Schema '..."in.c-..."' does not exist or not authorized` on
projects whose Snowflake backend doesn't expose bucket schemas
to the storage-token-derived QueryService role
(keboola/duckdb-extension#17). ATTACH itself succeeds, so the
existing extension-level fallback in `_try_attach_extension`
never triggers — the table is just marked failed.
- Promote `kbcstorage>=0.9.0` from optional to core dep so the
legacy client import in `_extract_via_legacy` doesn't crash
default installs with `ModuleNotFoundError`.
- Wrap `_extract_via_extension` in a per-table try/except so a
scan failure retries via `_extract_via_legacy` instead of
recording `tables_failed` and moving on.
Slower than the extension path, but produces correct parquets
on affected projects while the upstream extension fix lands.
* test(keboola): cover per-table extension→legacy fallback
Two existing tests mocked _extract_via_extension to throw and asserted
the original message survived in result["errors"]. With per-table
fallback, the new flow retries via _extract_via_legacy — which on the
mock URLs would throw a different (404 / DNS-fail) error, replacing the
asserted message.
- Mock _extract_via_legacy alongside _extract_via_extension in
test_network_timeout_during_extraction +
test_partial_failure_continues +
test_all_tables_fail_returns_full_failure_stats so the assertion
observes the final propagated error from the fallback chain.
- Add test_extension_per_table_failure_falls_back_to_legacy that
exercises the new behavior directly: extension scan fails with the
QueryService schema-not-authorized message
(keboola/duckdb-extension#17), legacy succeeds, parquet ends up
queryable.
Patch release bundling the only Unreleased change: bump httpx client
timeout for agnes query --remote from 30s to 300s (configurable via
AGNES_QUERY_TIMEOUT). Renames CHANGELOG [Unreleased] section to
[0.35.1] and bumps pyproject version to match.
The httpx client behind 'agnes query --remote' used the default 30s
timeout, killing every BigQuery SELECT that took longer than half a
minute — i.e. most non-trivial remote queries.
cli/client.py now exposes QUERY_TIMEOUT_S (default 300s, override via
AGNES_QUERY_TIMEOUT) and propagates a kw-only 'timeout' through
api_get/post/delete/patch. _query_remote passes QUERY_TIMEOUT_S so only
the long-running /api/query path gets the bump; every other CLI call
keeps the 30s default.
Server-side has no read deadline on /api/query, so the client cap was
the sole bottleneck.
Backup-orchestration commands were split across two namespaces (pull in
agnes store, push in agnes admin store), which broke the operator
mental model — pull/push are a paired operation and should sit
together.
Move pull + info into agnes admin store so all bulk operations share
one help screen. Add agnes store mine as the user-facing equivalent —
calls the same /api/store/bundle.zip endpoint with ?owner=me, which
the server resolves to the caller's user_id. Authors can archive
their own uploads without admin role; whole-Store bulk reads stay
admin-flavored as a discoverability hint.
Server: 3-line addition to export_bundle handles owner='me' as a
magic alias for the caller. No new endpoint.
Tests updated: pull/info expectations move from agnes store to
agnes admin store; new tests cover agnes store mine and the
?owner=me server resolution. 69/69 store tests green locally.
Adds whole-Store backup/restore primitives so an external CI/CD job can
mirror the Store to a git repo (and restore back from one).
REST:
- GET /api/store/bundle.zip — deterministic ZIP of all (filtered) Store
entities. Layout: manifest.json + entities/<id>/{plugin,assets}/.
Manifest carries owner_email for cross-instance restore. Auth: any
authenticated user (Store is community-open).
- POST /api/store/import-bundle — admin-only restore. Modes
merge|replace|skip; owner resolution by email with stub-disabled-user
fallback when the email is unknown on the target instance.
CLI:
- agnes store update <id> [--description X] [--zip PATH] ... — in-place
edit (server PUT permits owner OR admin per F4). Closes the missing
edit affordance for analysts who want to fix a typo or push a new
ZIP without losing install_count.
- agnes store pull [-o store.zip] [--unpack DIR] — download the bundle.
--unpack streams + extracts so an external git-backup workflow can
drop the tree straight into a repo and `git add .`.
- agnes store info [--json] — counts + size summary.
- agnes admin store push <zip-or-dir> [--mode ...] — admin-only restore.
Auto-zips a directory client-side so a working-tree → server
round-trip is one command.
cli/v2_client.py gains api_get_stream helper for binary downloads.
Tests: 5 new server tests (bundle shape + filters + round-trip + stub
user creation + skip mode + admin-only gate) + 11 new CLI tests
(update, pull/unpack, info, admin push). 66/66 store-related tests
green locally.
The previous gather used `sorted(glob, key=lambda p: p.stat().st_mtime)`.
A transient OSError (race with delete, permission flicker, EBADF on a
weird filesystem) on any single file raised through the lambda and 500-ed
the whole page.
Reworked: stat each path under try/except into a (path, stat) list, sort
the already-statted entries. Bad files drop silently from the listing.
Regression test test_profile_sessions_page_tolerates_stat_failures
patches Path.stat to raise on one of two files, asserts the page returns
200 with the good row rendered and the bad row dropped.
Devin flagged that run_session_collector still had the same audit-skip
gap I fixed in run_verification_detector and run_corporate_memory in
the previous two rounds — a PermissionError walking /home, an OSError
on /data/user_sessions mkdir, or any other unhandled exception from
collector.run() would skip the audit_log row and only show in docker
logs.
Same try/except + unhandled_error pattern as the sibling endpoints.
All three LLM-pipeline run-* endpoints now record their failures the
same way; /admin/scheduler-runs sees them. Regression test in
tests/test_admin_run_endpoints.py::TestRunSessionCollector::test_unhandled_exception_still_audits.
User feedback during e2e of #179: the listing page is nice but I want
to grab the raw jsonl and look at what's inside.
Adds GET /profile/sessions/<filename>:
- Auth via get_current_user (owner-only).
- Path safety: rejects "/", "\", "..", leading ".", and any non-".jsonl"
filename. The served path resolves under
${DATA_DIR}/user_sessions/<caller.id>/; if resolution escapes that
base directory, returns 404 (never 403, so existence of other users'
files isn't leaked).
- FileResponse with Content-Disposition: attachment.
UI: Download button per row in profile_sessions.html.
Tests in test_web_ui.py: path-traversal / nested / dotfile / non-jsonl
all 404 for owner; unauthenticated 302/401/403; authenticated owner
gets 200 + correct Content-Disposition.
Devin flagged that run_corporate_memory still had the same audit-skip
gap I just fixed in run_verification_detector — if collect_all() throws
anything other than the already-translated ValueError (DuckDB lock,
network blip, unexpected SDK error), the audit_log row was never
written and /admin/scheduler-runs missed the failure.
Same try/except + unhandled_error pattern as the verification_detector
fix from 4c4dfee8. Regression test in
tests/test_admin_run_endpoints.py::TestRunCorporateMemory::test_unhandled_exception_still_audits.
Three changes addressing user feedback during e2e test of #179 + Devin Review on e86dd5ed.
1) /profile/sessions — new self-service user page in the user menu.
Lists all session jsonls the caller uploaded via `agnes push` joined
against session_extraction_state. Each row shows uploaded_at, file
size, status badge (pending/processed/extracted), processed_at, and
items_extracted. The page docstring + help text explicitly call out
that items_extracted=0 means the verification detector ran fine but
the LLM found no claims to track — that's the documented "no items"
outcome, not a broken pipeline. Closes the gap surfaced during the
e2e test of #176 where a user could see their sessions on disk and
process them through the LLM but had no UI to inspect what happened.
2) run_verification_detector audits unhandled exceptions (Devin #1).
If detector.run() threw anything other than the already-translated
ValueError, the audit_log row was never written. The endpoint now
wraps detector.run in try/except, records the exception in
audit_params["unhandled_error"], then re-raises as 500 after audit.
The /admin/scheduler-runs page surfaces the failure row with the
error type + message.
3) SCHEDULER_AUDIT_ACTIONS list corrected (Devin #2). Previous list
had "marketplaces_sync_all" (wrong — actual is "marketplace.sync_all")
plus "data_refresh" and "scripts_run_due" which app/api/sync.py and
app/api/scripts.py don't write to audit_log. Fixed to the four
actually-logged strings; comment points at the missing audit calls
as a follow-up.
Tests: tests/test_web_ui.py adds TestAdminRoleGuards::test_profile_sessions_page_no_admin_required and tightens test_admin_scheduler_runs_page_admin_only to assert the correct marketplace.sync_all string.
create_entity + update_entity created the `scratch` temp dir inside one
try/finally but cleaned it up in a separate one. Validation HTTPExceptions
raised by _safe_zip_extract (zip_unsafe_path, zip_too_large_uncompressed)
or the BadZipFile→422 conversion exited the first scope, and the second
finally was never entered → temp dir leaked on every failed upload.
Devin flagged this on the F2 commit. The leak pre-existed (zip_unsafe_path
was the original vector); F2 added zip_too_large_uncompressed to the same
broken cleanup path. Fixed by collapsing scratch creation + cleanup into
one outer try/finally that covers both extraction AND metadata/bake; the
inner try/except/finally still handles BadZipFile→422 + tmp file cleanup.
Same restructure in update_entity. Regression test
`test_scratch_dir_cleaned_up_after_failed_extraction` triggers a
zip_unsafe_path 422 and asserts tmp/agnes_store_* contains no leaked
dirs.
Three independent reviews of PR #180 surfaced four real defects in the new
Store / my-ai-stack surface. CHANGELOG entries detail each; one-liners:
- F1 video_url XSS: any authenticated user could upload a Store entity
with `video_url=javascript:...` and pop XSS in any viewer's session via
the `<a href=...>` "Watch video" link in store_detail.html. Jinja2
autoescape doesn't block URI schemes inside attribute values. Fixed by
scheme-validating to http(s) only on create + update; 400 invalid_video_url.
- F2 ZIP decompression bomb: _safe_zip_extract checked path-traversal but
not declared file_size totals — a 50 MB compressed upload at 1:1000
ratio decompresses to 50 GB and DOS the host disk. Fixed by summing
zinfo.file_size across infolist() and refusing > 200 MB before
extractall touches disk. 413 zip_too_large_uncompressed.
- F4 admin authz parity: PUT /api/store/entities/{id} was owner-only while
DELETE allowed owner OR admin; the store-detail page hid Edit/Delete
buttons from admin even though DELETE was permitted. Fixed by allowing
admin on PUT and passing is_admin to the template; gate is now
is_owner OR is_admin everywhere.
- F5 cross-owner suffix collision: sanitize_username is many-to-one
(alice.smith / alice_smith both → alice-smith). Two such users uploading
entities with the same display name produced identical
`<name>-by-<username>` suffixes, silently colliding in the served
agnes-store-bundle on-disk paths AND the manifest catalog (Claude Code
dedupes by plugin.json `name`). Fixed by enforcing global uniqueness on
the suffixed value at create_entity; 409 conflict_global_suffix.
F3 (ZIP symlink members) was investigated and confirmed to be a
false-positive — Python's stdlib ZipFile.extractall does not honor
symlink mode bits, so no exploit exists.
9 new regression tests in tests/test_store_api.py::TestStoreSecurityFixes
covering all four. Test run locally: 60/60 store-related tests pass.
E2E test on a real BQ deploy showed every verification-extraction call
fails with HTTP 400 invalid_request_error: "output_config.format.schema:
For 'object' type, 'additionalProperties' must be explicitly set to false".
The Anthropic structured-output API now requires the field on every object
node in the json_schema. Fix: connectors/llm/anthropic_provider.py wraps
the caller-supplied schema through a recursive _strict_json_schema()
walker that adds the field where missing (preserving any explicit
override), then passes the strict variant to the API. Six unit tests in
TestStrictJsonSchema pin the recursion across nested objects, array items,
and the no-mutation invariant.
Adds /admin/scheduler-runs — a read-only admin page that surfaces the
last 200 audit-log entries from scheduler-driven actions. New
AuditRepository.query_actions(actions, limit) helper, new admin nav
entry. Failed scheduler ticks (HTTP 401, network errors) don't reach
the audit_log; the page calls that out with a hint to set
SCHEDULER_API_TOKEN if no rows show up.
The PR's #176 fail-fast change made collect_all() raise ValueError when
neither an ai: block nor ANTHROPIC_API_KEY/LLM_API_KEY was available.
verification_detector's CLI was updated to handle it; corporate_memory's
CLI was missed and crashed with an unhandled traceback.
services/corporate_memory/collector.py:main() now wraps the collect_all
call in try/except ValueError, prints a one-line actionable message
to stderr, and returns rc=1.
Regression test:
test_llm_connector.py::TestCorporateMemoryCollector::test_main_returns_1_on_no_ai_config_instead_of_traceback.