## Summary
Smoke-testing the just-shipped 0.47.1 against production exposed two regressions:
1. `agnes query --remote "SELECT FROM unit_economics WHERE bad_col=1"` returned `Table "unit_economics" must be qualified` (the OLD error) instead of `Unrecognized name: bad_col` (the #218 fix's intended behavior).
2. `agnes query "DESCRIBE unit_economics"` showed only DuckDB's misleading `Did you mean order_economics?` with no Agnes hint paragraph (the #219 fix is missing).
Root cause: PR #217's squash merge (`506a378c`) carried stale snapshots of `app/api/query.py` and `cli/commands/query.py` from before #218 and #219 merged. The rebase-and-merge auto-merged those files cleanly (no conflict markers) but the result silently reverted both fixes.
Restore the two changes verbatim. Tests for both fixes already on main and continue to pass against the restored code.
## Test plan
- [x] `pytest tests/test_api_query_guardrail.py tests/test_cli_query.py` — clean
- [x] Manual repro against prod after deploy: both flows now surface the intended diagnostic.
<!-- devin-review-badge-begin -->
---
<a href="https://app.devin.ai/review/keboola/agnes-the-ai-analyst/pull/225" target="_blank">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1">
<img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open in Devin Review">
</picture>
</a>
<!-- devin-review-badge-end -->
## Summary
Brings the Keboola connector to feature parity with the legacy internal data-analyst's per-table sync strategies. Closes the four documented gaps from the spec branch (`zs/keboola-connector-specs`):
- **Typed parquet** in the legacy SDK extraction path — column types from Keboola Storage metadata (provider cascade `user > ai-metadata-enrichment > keboola.snowflake-transformation`) survive the CSV → parquet roundtrip; invalid date strings (`'0000-00-00'`) and invalid numeric strings (`'Non-Manager'`) become NULL while keeping the column's typed schema. Pre-fix everything was VARCHAR.
- **Incremental sync** via Storage API `changedSince` — opt-in per table; pulls only delta rows, merges into the existing parquet by `primary_key` (drop_duplicates with keep='last'). Cuts daily extraction from O(full table) to O(delta).
- **Partitioned sync** — flat per-partition layout `data/<table>/<key>.parquet` (e.g. `2026_05.parquet`), per-affected-partition merge for daily updates, chunked initial load with 1-day overlap and 2-empty-chunk stop heuristic.
- **`where_filters`** — server-side row filter with date placeholders (`{{today}}`, `{{last_3_months}}`, `{{start_of_3_months_ago}}`, etc.) resolved at sync time. Force the SDK path; reject `incremental + where_filters` combination at API layer (changedSince already filters temporally).
## Architecture
- **Schema migration v25 → v26**: 7 new columns on `table_registry`. Existing `sync_strategy` column reused (pre-v26 it was inert catalog metadata; post-v26 the extractor dispatches off it).
- **Per-table dispatcher** in `extractor.run()` routes to one of `_extract_via_extension` (full_refresh + extension), `_extract_via_legacy` (full_refresh + filters or extension fallback), `extract_incremental`, or `extract_partitioned`.
- **API conflict policy**: `incremental + where_filters` → 422; `partitioned + query_mode='remote'` → 422; `partitioned ⇒ partition_by required`.
- **Admin UI**: third "Direct extract (Storage API)" radio in the Keboola Register / Edit modals, alongside existing "Whole table (extension)" and "Custom SQL". When selected, exposes a v26 sync-strategy panel with conditional fields per strategy.
## Test plan
- [x] **Unit + module** — 134 v26 tests covering migration, repo, parquet_io, where_filters, incremental (compute_changed_since + merge_parquet + extract_incremental E2E), partitioned (key derivation + merge_partition + chunked windows + extract_partitioned E2E), extractor dispatcher, admin API validators, PUT field clearing, registry-shape → dispatcher bridge
- [x] **HTML form structure** — all v26 inputs + visibility classes + JS payload fields verified in rendered template
- [x] **Real Keboola roundtrip** — registered a small test table as `sync_strategy='incremental'` against a test Storage project, triggered two syncs:
- Sync 1: `changedSince=None` → full pull → 9 rows typed parquet
- Sync 2: `changedSince=last_sync - 1d window` → 9 delta rows merged with 9 existing → 9 after dedup on primary_key (PK merge confirmed)
- [x] **Browser UX** — agent-browser session against a local uvicorn: login → admin/tables → register modal → switch radios → verify field visibility per strategy → submit → edit existing row → switch to Direct/Incremental → save → confirm DB persistence
- [x] **Regression** — no regressions in the broader 3252-test suite (3 pre-v26 tests updated for the deprecation-marker removal + schema-version bump; 2 pre-existing environment-sensitive test failures unrelated to this change)
## Bugs caught + fixed during E2E
The browser + real-Keboola roundtrip exposed four bugs the unit tests missed:
1. **JS visibility race** — two competing `forEach` loops set `display=''` then `display='none'` on form elements sharing `kb-strategy-incremental kb-strategy-partitioned` classes (window_days + max_history_days are reused across strategies). Fix: single-pass selector with class-based visibility resolver.
2. **PUT cannot clear field** — pre-v26 `updates = {k: v ... if v is not None}` collapsed "omitted from body" and "sent as null" into the same case, so admin couldn't switch a partitioned row back to full_refresh and have stale `partition_by` clear. Fix: `model_dump(exclude_unset=True)`.
3. **Subprocess DB lock conflict** — `_read_last_sync` reopened `system.duckdb` while the parent server held the write lock (subprocess contract at `app/api/sync.py:_run_sync` line 260). Fix: parent injects `__last_sync__` into table_config before subprocess spawn.
4. **Wrong KBC table_id** — `extract_incremental` / `extract_partitioned` built the Storage API table_id from the registry row's slugified `id` (`circle_inc`) instead of `bucket.source_table` (`in.c-finance.circle`), producing 404s. Fix: prefer `bucket+source_table`; fall back to `id` only when bucket empty.
## Operator notes
- Existing tables stay on `full_refresh` after migration; admins opt individual tables in via `agnes admin register-table --sync-strategy ...`, the Keboola Edit modal, or `POST/PUT /api/admin/registry`.
- `merge_parquet` and `merge_partition` use `pd.concat + drop_duplicates`, loading both existing and delta into pandas RAM. For tables in the multi-million-row range this may OOM — switch to `partitioned` strategy for those (per-partition merge keeps memory bounded). Documented in `### Internal` of the changelog entry.
- Date placeholders are resolved at **sync time**, not register time — a typo'd `{{lasst_week}}` is accepted at register and surfaces only when the next sync runs. By design (rolling windows need late-binding).
## Spec source
The four corresponding plans on the `zs/keboola-connector-specs` branch under `docs/superpowers/plans/2026-05-07-0[1-4]-*.md` capture the design rationale and link back to internal repo references for each subsystem.
<!-- devin-review-badge-begin -->
---
<a href="https://app.devin.ai/review/keboola/agnes-the-ai-analyst/pull/217" target="_blank">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1">
<img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open in Devin Review">
</picture>
</a>
<!-- devin-review-badge-end -->
## Summary
- Catalog enrichment for `query_mode='remote'` rows: `rows`, `size_bytes`, `partition_by`, `clustered_by` per table (BQ + Keboola providers).
- `/api/v2/schema/{id}` cache miss: 2 BQ jobs → 1 (-50%) via shared `fetch_bq_columns_full`.
- All four catalog/schema/sample/metadata caches flush on registry change; single-row re-warm scheduled.
- Automatic cache warmup at server startup (bounded concurrency, opt-out via `AGNES_SKIP_CACHE_WARMUP=1`).
- SSE-driven freshness toolbar on `/admin/tables` with progress bar, log, and per-row badge.
- New admin doc `docs/admin/query-modes.md` — single source of truth on `local` / `remote` / `materialized` choice.
Closes#155.
Closes#156.
## Test plan
- [x] 65+ targeted tests pass across 11 new test modules + 3 modified ones.
- [x] No DB migration; no wire-break; `MIN_COMPAT_CLI_VERSION` unchanged.
- [ ] Reviewer: register a remote BQ table via `/admin/tables`, observe the toolbar populates within ~2 s and the per-row badge transitions warming → fresh.
- [ ] Reviewer: trigger `Re-warm all`, verify SSE log scrolls and `cacheWarmupBar` progresses.
- [ ] Reviewer: edit a registered row's bucket, verify `agnes schema <id>` returns updated columns immediately (no 1-hour staleness).
- [ ] Reviewer: confirm `agnes admin register-table --query-mode remote` prints the new IAM-smoke-check hint.
## Notable design decisions
- BigQuery `INFORMATION_SCHEMA.TABLE_STORAGE` is the only valid scope for size+rows (verified live 2026-05-07; dataset-scoped doesn't exist). Region resolved from `instance.yaml.data_source.bigquery.location` → `bq.client().get_dataset(...)` → fall back to legacy `__TABLES__`.
- VIEW handling: TABLE_STORAGE returns no rows for views, fall through to `__TABLES__` (also empty) → `TableMetadata(rows=None, size_bytes=None, partition_by=..., clustered_by=...)`. Null size signals analyst Claude to apply existing CLAUDE.md guidance.
- `size_bytes` is `active_logical_bytes + long_term_logical_bytes` — full BQ scan reads both; reporting only active undercounts aged partitioned tables.
- Source-agnostic provider seam: per-source `connectors/<source>/metadata.py:fetch(MetadataRequest)`; dispatcher in `app/api/v2_catalog.py:_metadata_provider_for` lazily imports per source_type so a Keboola-only deployment doesn't pay the BQ-extension import cost.
- Warmup non-blocking: FastAPI `lifespan` schedules `asyncio.create_task(_warm_catalog_caches_bg)` before `yield`. Per-row failures isolated.
## Out of scope
- Profile / column histograms / dimension cardinality for remote tables (separate issue).
- Onboarding nudge ("you have 0 remote tables, consider registering some BQ ones") — separate UX call.
- Provider plug-in registration via entry-points (the dispatch table is a hardcoded if-tree today; one line per future source).
## Release
Bumps `pyproject.toml` 0.46.1 → 0.47.0 (main shipped 0.46.0 + 0.46.1 during this PR — see commit `d98976ec`). New CHANGELOG section under `## [0.47.0] — 2026-05-07`.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- devin-review-badge-begin -->
---
<a href="https://app.devin.ai/review/keboola/agnes-the-ai-analyst/pull/223" target="_blank">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1">
<img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open in Devin Review">
</picture>
</a>
<!-- devin-review-badge-end -->
## Summary
Two bugs in `agnes describe` surfaced from a real analyst session following the CLAUDE.md agent-rails discovery workflow. Together they break `agnes describe` end-to-end for any analyst (or analyst-AI) who follows the documented form.
### A) CLI parsing
`agnes describe TABLE -n 5` failed with `Missing argument 'TABLE_ID'`. Root cause: the command was registered as a `Typer.Typer` subcommand group via `app.add_typer(describe_app, name="describe")` + `@describe_app.callback(invoke_without_command=True)`, and that pattern mis-parses positional + short-int option in some orderings. Same pattern in `cli/commands/schema.py` works only because schema has no INTEGER short option. Fix: switch to flat `@app.command("describe")`.
### B) Server NaN
`/api/v2/sample/<id>` (called by `agnes describe`) returned HTTP 500 with `ValueError: Out of range float values are not JSON compliant: nan` whenever a row contained NaN. Fix: sanitize NaN/±inf to None before JSON serialization.
## Test plan
- [x] `pytest tests/test_cli_describe*.py` — added regression tests pinning `-n` parsing on either side of the positional.
- [x] `pytest tests/test_api_v2_sample*.py` — added regression test for NaN row → JSON `null` (not 500).
<!-- devin-review-badge-begin -->
---
<a href="https://app.devin.ai/review/keboola/agnes-the-ai-analyst/pull/224" target="_blank">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1">
<img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open in Devin Review">
</picture>
</a>
<!-- devin-review-badge-end -->
## Summary
Verified against production: `claude -p` headless mode doesn't fire SessionEnd hooks (proven via `--output-format stream-json --include-hook-events`: zero `SessionEnd` events), so any session JSONLs from `-p` invocations stay orphaned locally and never reach the server. Fix: add `agnes push --quiet` as a third SessionStart entry — symmetric self-heal alongside the existing `agnes pull` entry. Existing workspaces pick this up on their next `agnes init` via the marker-based migration already in `cli/lib/hooks.py`.
Separately: a colleague's fresh install showed `agnes diagnose` warning "uploads are not being processed", which led them to suspect their `agnes push` was broken. The warning is actually about the LLM-based `verification-detector` backlog (uploads themselves were arriving fine — confirmed by 23+3 JSONLs landed on the server while the warning was firing). Reword the warning to "verification-detector backlog" + add `last_processed` to the diagnose dict so operators don't have to grep logs to confirm.
## Test plan
- [x] `pytest tests/test_lib_hooks.py` — updated count + added `agnes push in SessionStart` assertion.
- [x] `pytest tests/test_setup_hooks_template.py` — updated.
- [x] `pytest tests/test_clean_install_integration.py` — updated.
- [x] `pytest tests/test_health_session_pipeline.py` — updated warning text + asserted `last_processed` field.
<!-- devin-review-badge-begin -->
---
<a href="https://app.devin.ai/review/keboola/agnes-the-ai-analyst/pull/220" target="_blank">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1">
<img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open in Devin Review">
</picture>
</a>
<!-- devin-review-badge-end -->
## Summary
When `agnes query --remote` references a column that doesn't exist on the FROM table, users were seeing `Table "<id>" must be qualified with a dataset` instead of the actually-useful `Unrecognized name: <column>` from BigQuery. Surface the first-attempt diagnostic now; keep the second-attempt context as `underlying_original`.
Reproduced against production:
```
$ agnes query --remote "SELECT COUNT(*) FROM unit_economics WHERE authorize_date = DATE '2025-05-06'"
Error: remote_estimate_failed (HTTP 400)
message: Could not estimate scan size for this query.
underlying: 400 ... Table "unit_economics" must be qualified with a dataset.
```
(`unit_economics` has `authorize_timestamp`, not `authorize_date`.)
## Test plan
- [x] New `test_remote_estimate_failed_surfaces_first_error_when_attempts_differ` asserts the first-attempt message wins, second-attempt is preserved as `underlying_original`, hint points to `agnes schema`.
- [x] Existing `test_guardrail_returns_400_remote_estimate_failed_on_double_parse_error` still passes (both attempts mocked to identical error).
- [x] `pytest tests/test_api_query_guardrail.py` clean.
<!-- devin-review-badge-begin -->
---
<a href="https://app.devin.ai/review/keboola/agnes-the-ai-analyst/pull/218" target="_blank">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1">
<img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open in Devin Review">
</picture>
</a>
<!-- devin-review-badge-end -->
* fix: cutover regressions + parallel Keboola legacy fallback
Bundled fixes from a fresh-deploy run on a Keboola Storage backend with
the block-shared-snowflake-access feature flag — DuckDB Keboola
extension's per-table scan can't access bucket schemas, so the legacy
kbcstorage Storage-API client is the only working path.
CUTOVER REGRESSIONS
- agnes pull hash mismatch on every Keboola local-mode table —
src/orchestrator.py:_update_sync_state stored md5(mtime+size)[:12]
while the CLI compares against full 32-char content MD5. Now stores
the same content MD5 the materialized SQL path already used.
- Trailing-slash sanitization in connectors/keboola/access.py and
extractor.py — DuckDB Keboola extension's ATTACH fails when the URL
ends in / (canonical form).
- src/profiler.py:TableInfo.description becomes optional — two call
sites instantiated without it, crashing the profiler pass.
- scripts/ops/agnes-auto-upgrade.sh: chown on UID change — older images
ran as root, current runs as agnes (uid 999). Reads target uid:gid
from /etc/passwd inside the new image and chowns ${STATE_DIR},
/data/extracts, /data/analytics when the digest moves.
- POST /api/sync/trigger is now singleton per process — two
near-simultaneous trigger calls each forked an extractor subprocess,
fought for extract.duckdb's file lock, starved uvicorn, flipped the
container to unhealthy. Trigger now returns 409
(sync_already_in_progress) when held; _run_sync acquires non-blocking.
PARALLEL LEGACY FALLBACK
- Process pool fan-out for the _extract_via_legacy queue (default 8
workers, override via AGNES_KEBOOLA_PARALLELISM). Process pool, not
thread pool, because connectors/keboola/client.py:export_table does
os.chdir(temp_dir) — process-global, so threads raced and slice files
landed in the wrong directory ("[Errno 2] No such file or directory:
'<job_id>.csv_X_Y_Z.csv'").
- Extractor subprocess timeout 1800s -> 3600s (configurable via
AGNES_EXTRACTOR_TIMEOUT_SEC). 28+ tables × multi-minute Keboola export
jobs need the headroom on telemetry-class projects.
- Process group cleanup on timeout — Popen(start_new_session=True) puts
the extractor in its own group. On timeout the parent SIGTERMs the
group (10s grace) then SIGKILLs stragglers. Without this, the pool
workers were reparented to PID 1 and continued holding open Keboola
Storage export jobs. Inline extractor script also installs a SIGTERM
-> sys.exit(143) handler so the with ProcessPoolExecutor(...) block
__exit__ runs cleanly.
Tests: existing tests that patched subprocess.run updated to patch
subprocess.Popen with a _FakePopen stand-in (same exit-code-injection
contract). Two tests that exercised the parallel path forced
AGNES_KEBOOLA_PARALLELISM=1 to keep mocks alive (mocks don't ride into
ProcessPoolExecutor subprocesses).
Squashed onto current main (was 7 commits + multi-commit CHANGELOG +
agnes-auto-upgrade.sh conflicts; squash avoids per-commit conflict
resolution against main's flat-mount STATE_DIR refactor and 0.38.0
release cut).
* feat(keboola): Storage API direct extract path; drop extension data path
The DuckDB Keboola extension's COPY routes through Keboola QueryService,
which is unreliable on linked-bucket projects (extension v0.1.6 fixes
that case but isn't yet in the community CDN, and pre-fix any project
with the block-shared-snowflake-access feature flag couldn't see bucket
schemas at all). Move the extract path off the extension entirely and
talk to the Storage API directly via signed-URL download — works on any
project, regardless of extension state.
connectors/keboola/storage_api.py (NEW)
Lightweight client built on requests.Session. Three endpoints:
- POST /v2/storage/tables/{id}/export-async (kicks off job)
- GET /v2/storage/jobs/{id} (poll until done)
- GET /v2/storage/files/{id}?federationToken=1 (signed URL detail)
- GET <signed_url> (download bytes)
Supports sliced exports (manifest + per-slice signed URLs) and gzipped
payloads. ExportFilter dataclass mirrors the Keboola filter spec
(whereFilters / columns / changedSince / limit) and handles JSON
round-trip with the registry's source_query column. Token redaction
in error messages. Bounded exponential backoff on job polling.
No cloud-SDK dependency on the data path; thread-safe.
connectors/keboola/extractor.py
- materialize_query() rewritten: takes bucket/source_table/source_query
(JSON filter spec), exports via KeboolaStorageClient, converts CSV
to parquet via DuckDB, atomic os.replace. Same return shape so
sync.py downstream code stays uniform with the BQ branch.
- _extract_via_legacy() also moved to Storage API direct (kept the
name for caller compatibility with _legacy_worker / the parallel
batch extractor). Per-call temp directories — no os.chdir, threads
don't race.
app/api/sync.py
_run_materialized_pass for source_type='keboola' rows now constructs a
KeboolaStorageClient (replaces KeboolaAccess) and passes
bucket/source_table/source_query to materialize_query. Reuses one
client across rows for HTTP keep-alive. Sources keboola URL from env
too (KEBOOLA_STACK_URL) when instance.yaml doesn't have stack_url
configured.
cli/commands/admin.py
discover-and-register defaults Keboola rows to query_mode='materialized'
(NULL source_query = full table), matching the v26 migration's
unification of the local/materialized split for Keboola. BigQuery and
Jira keep their per-source defaults.
src/db.py
Schema bump 25 → 26. Migration: UPDATE table_registry SET
query_mode='materialized' WHERE source_type='keboola' AND
query_mode='local'. NULL source_query on those rows means "full table
export" — same effective behavior the local mode provided, but now
via Storage API instead of the extension.
pyproject.toml
kbcstorage dep stays (admin-side bucket/table list still uses the
SDK in app/api/admin.py / connectors/keboola/client.py); only the
data path is migrated off the SDK. Comment updated to reflect the
new boundary.
tests
- test_keboola_storage_api.py (NEW, 19 tests): ExportFilter parsing,
HTTP client (token redaction, retry logic, polling), download_file
(single, gzipped, sliced), end-to-end export_table_to_csv.
- test_keboola_materialize.py rewritten: mocks KeboolaStorageClient
instead of FakeAccess; same atomic-write + zero-rows + unsafe-id
contracts.
- test_sync_trigger_keboola_materialized.py: registry rows now carry
bucket+source_table+JSON-shape source_query.
114+ Keboola-impacted tests green locally.
* test: schema version assertion bumped to 26 alongside the keboola query_mode migration
* fix(keboola): cutover hot-patches surfaced on agnes-dev
Five small fixes that were applied as in-container hot-patches during
agnes-dev cutover and need to be on the source-of-truth image so a fresh
upgrade does not undo them.
- app/api/sync.py: auto-discover gate considers the WHOLE registry (any
source, any mode), not just rows where source matches and query_mode
is local. After the v25→v26 keboola materialized migration an
instance can have 30 materialized rows and zero local rows; the
previous gate kept re-firing _discover_and_register_tables every
scheduler tick, creating duplicate auto-discovered rows with the
wrong bucket prefix every time.
- app/api/admin.py: _discover_and_register_tables reassembles the
bucket as <stage>.<bucket-id> (e.g. in.c-finance) instead of
dropping the stage prefix; default query_mode for keboola is now
materialized (the v26 contract); validator allows NULL source_query
for keboola materialized rows (full-table export via Storage API
export-async, no SQL needed).
- cli/commands/admin.py: register-table mirrors the server validator
(NULL source_query allowed for source_type=keboola); --bucket help
text generalized to cover both BQ dataset and Keboola bucket id.
- connectors/keboola/extractor.py: max_line_size=64 MiB on
read_csv_auto so embedded JSON / SQL cells (kbc_component_configuration
in particular) do not trip the default 2 MiB ceiling.
- connectors/keboola/storage_api.py: GCP backend support — when the
Storage API returns a manifest whose slice URLs are gs://
references with a gcsCredentials block, rewrite to the JSON REST
download endpoint and authenticate with the issued OAuth bearer
token; redact tokens in any surfaced error string.
* test: align with new keboola materialized + auto-discover-gate contracts
- test_admin_keboola_materialized: rename
test_register_keboola_materialized_rejects_missing_source_query →
test_register_keboola_materialized_accepts_missing_source_query.
v25→v26 introduced 'keboola materialized with NULL source_query
means full-table export via Storage API export-async' as the
default registration shape; the rejection case is no longer the
contract.
- test_sync_filter: add list_all() to _StubRegistry. The auto-discover
gate in _run_sync now keys off the WHOLE registry (not just local
rows) so materialized-only Keboola instances do not re-trigger
discovery on every tick.
* feat(keboola): native parquet export — skip CSV roundtrip
Storage API export-async accepts fileType={csv,parquet}. Switching the
materialized sync to parquet eliminates the CSV → DuckDB COPY → parquet
roundtrip that pinned a single uvicorn worker over 4 GiB on multi-GB
tables (read_csv with all_varchar + max_line_size=64MB has to
materialize the whole CSV in memory before COPY can stream out a
parquet). Snowflake UNLOAD on Keboola's side already produces typed,
self-contained parquet files; the extractor downloads them and renames
into place.
Two cases:
- **Single-file** export (small table): file_info.url points at one
signed URL; download_file streams chunks straight to .parquet.tmp
and we're done. No DuckDB.
- **Sliced** export (Snowflake UNLOAD respects MAX_FILE_SIZE — 16 MiB
default — so anything larger arrives as N parquet slices): each
slice is a complete parquet file with its own footer; naive concat
would corrupt them. download_file_slices keeps the slices as
separate files in a tempdir, then DuckDB COPY (SELECT * FROM
read_parquet([slice0, slice1, ...])) merges them into one
consolidated parquet. DuckDB streams row groups during this — peak
memory bounded to one row group (~1 MiB) regardless of source size.
The legacy CSV path stays as the explicit opt-in via source_query=
'{"file_type":"csv"}' for projects whose backend can't UNLOAD
parquet (none known today; cheap escape hatch). Backward-compat alias
KeboolaStorageClient.export_table_to_csv kept.
Also fixes a latent bug in download_file's gzip detection: previous
heuristic flagged any unencrypted file as gzipped, which would have
corrupted parquet downloads at gunzip time. Name-suffix-only now.
* fix: tempdir leak cleanup, every 0m schedule, /sync/trigger body shapes
Three small self-contained fixes uncovered during agnes-dev cutover.
- connectors/keboola/extractor.py: tempfile.TemporaryDirectory now uses
ignore_cleanup_errors=True so a worker death mid-write doesn't leave
multi-GiB stale slice trees on the boot disk. (12 GiB seen after a
disk-full crash where TemporaryDirectory's own cleanup also raised
and got swallowed.)
- src/scheduler.py: is_valid_schedule accepts 'every 0m' (interval=0
= always due). Force-resync of an errored row no longer requires
waiting out the default 'every 1h' interval — admin can flip the
schedule, trigger, then flip back.
- app/api/sync.py: POST /api/sync/trigger accepts both ['table_id']
(legacy bare-array body) and {'tables': ['table_id']} (matches the
response payload shape, more discoverable for clients building
requests by hand). Malformed bodies return 422 with a structured
detail; null/missing means 'sync everything' as before.
Tests cover: tempdir cleanup on raise (sliced parquet path),
is_valid_schedule + is_table_due 'every 0m' acceptance, and trigger
body parametrized matrix (8 valid shapes + 6 rejection cases).
* fix: targeted-trigger filter in materialized pass + auto-upgrade defer
Two operational gaps observed during agnes-dev cutover, in the same
sync-routing area.
- _run_materialized_pass now takes a 'tables' arg and skips rows not in
the target set with reason='not_in_target'. POST /api/sync/trigger
with a body of tables previously only scoped the legacy extractor
subprocess — the materialized pass kept iterating every due
materialized row, so an admin asking to re-sync kbc_job re-ran
every other due materialized row alongside it. Match on registry id
OR name (admins commonly pass either form). tables=None preserves
the no-filter behavior.
- New GET /api/sync/status (public, no auth) returns {locked: bool}
off _sync_lock.locked(). agnes-auto-upgrade.sh probes this before
docker compose up -d and exits 0 with a 'deferred recreate' log
line if a sync is in flight — the next 5-min cron tick retries.
Pre-fix, an auto-upgrade triggered mid-sync would recreate the
uvicorn worker and kill the in-flight extractor / Snowflake-UNLOAD
download (observed when kbc_job's first 7-day retry got SIGKILLed).
Connection failures in the probe fall through to the upgrade —
being stuck on a wedged image is worse than interrupting a
hypothetical sync.
* fix: auto-discover protects admin overrides + surfaces drift
Two real-world incidents on agnes-dev drove this:
1. kbc_job was registered manually with the correct
(in.c-kbc_telemetry, kbc_job) coordinates. A naive auto-discover
re-run would have inserted a SECOND kbc_job row at the slugified
id 'in_c-keboola-storage_kbc_job' (where Keboola's discovery
places it) — and that row's Storage API export-async 404s.
2. An earlier auto-discover bug stripped the stage prefix from
bucket ids ('c-finance' instead of 'in.c-finance'), inserting
137 rows whose syncs all failed.
Fix:
- _discover_and_register_tables now builds a plan first
(_build_keboola_discovery_plan) classifying each discovered table
into one of new / existing_match / existing_drift / invalid, then
executes only the 'new' bucket. Drift rows are reported with both
sides of the disagreement plus drift_kind:
- same_id_diff_coords: registry has the same id but different
bucket / source_table (admin migrated coords inline).
- name_collision: discovery's slugified id differs from any
registry id, but the discovered .name matches an existing row's
.name (case-insensitive). Catches the kbc_job case.
- Bucket detection now prefers the API's authoritative bucket_id
field (separate field on the Keboola tables.list response,
normalised by KeboolaClient.discover_all_tables). Falls back to
id-string parsing only when bucket_id is missing (older fallback
path inside discover_all_tables).
- Endpoint POST /api/admin/discover-and-register?dry_run=true
returns the plan without writing — would_register, drift,
invalid lists. Lets an operator audit before merging discovery
with a registry that has admin overrides.
Removed 'every 0m' from test_register_request_rejects_malformed_sync_schedule
— the runtime started accepting it in the previous commit (force-resync
override) and the validator follows suit.
* feat(keboola): AGNES_TEMP_DIR routes tempfiles off overlayfs /tmp
The container's /tmp lives on the boot disk's overlayfs (29 GiB on
agnes-dev, shared with /var). Snowflake UNLOAD of a wide table writes
slices into per-call /tmp tempdirs that fill multi-GiB / many-slice
exports long before the dedicated data disk fills. agnes-dev hit
100% boot-disk while the 20 GiB data disk had 15 GiB free.
connectors.keboola.storage_api.get_temp_root() reads AGNES_TEMP_DIR;
mkdirs the target on first use; unset / empty / unwritable falls
back to None (system tempdir, OSS-pre-fix behaviour). Both
materialize_query (parquet path) and _extract_via_legacy (CSV
fallback) and the sliced-CSV concat path in storage_api use the
helper now.
docker-compose.yml defaults AGNES_TEMP_DIR=/data/tmp on app, scheduler,
and extract services. The data volume is the dedicated disk in
production layouts and a plain docker volume in single-disk
dev/laptop setups — same blast radius as the previous /tmp default
on the latter, no regression.
Operator-and-analyst quality bundle: a security fix for the optional
Telegram bot, two CLI gaps closed, and three rounds of UX polish on
`agnes diagnose` and `agnes pull` so non-TTY consumers (CI runners,
Claude Code SessionStart hooks, sub-agent watchdogs) get readable,
actionable signal.
- Pairing-code RNG: random.choices -> secrets.choice (CSPRNG).
- Telegram script runner: refuse out-of-shape usernames before sudo -u.
CLAUDE.md.bak.<ISO-timestamp> before regenerating.
- agnes admin unregister-table <id> -> DELETE /api/admin/registry/{id}
- agnes admin update-table <id> --field=value ... -> PUT /api/admin/registry/{id}
response but never promotes the headline. BQ billing-equals-data check
downgraded warning -> info.
default (5 s / 1 MiB vs 30 s / 10%) so sub-agent watchdogs don't kill
the pull as a hung process. New env knobs:
AGNES_PULL_PROGRESS_INTERVAL_{SECONDS,BYTES}.
--include-schema (or ?include=schema) to opt back in.
Tests: 120 passed across the touched modules, including new tests for
each fix. Pre-existing failures on main (DB migration v1->v9, binary
rename) are unrelated and not introduced here.
`_rewrite_user_sql_for_bigquery_query` does its own bare-name detection
(mirroring the non-RBAC parts of `_bq_guardrail_inputs`). The backtick
masking from #201 was applied to `_bq_guardrail_inputs` and the
forbidden-table loop, but missed this third site — so a registered
local-mode table name appearing as the table segment of a
user-supplied full backtick path (e.g. ``\`prj.ds.orders\`` matching
registered local ``orders``) tripped the cross-source guard and
forced every backtick-path query into the 50-100× slower
ATTACH-catalog fallback.
Mask once at the top of the function, route both the BQ-name
detection (line ~830) and the cross-source check (line ~867) through
the masked copy. New regression test
`test_local_name_inside_backtick_path_does_not_trip_cross_source`
proves the wrapper now wraps when it should.
Adds Pass 3 to `_bq_guardrail_inputs` that scans user SQL for full
backtick paths `<project>.<dataset>.<table>` and gates them
identically to the `bq."<dataset>"."<table>"` pass:
- Project must match the configured BigQuery data project
(`get_bq_access().projects.data`). Mismatch → HTTP 403
`bq_path_cross_project`.
- Path must point at a registered row. Unregistered → HTTP 403
`bq_path_not_registered`.
- Non-admin caller must hold a grant on the registered row's id.
Missing grant → HTTP 403 `bq_path_access_denied`.
Pre-fix, full backtick paths bypassed Agnes RBAC entirely — only the
service account scope limited reach. Post-fix the boundary matches
what `agnes catalog`-driven flows already enforce. Admin still
bypasses the per-id grant check but cannot bypass registration or
project match.
Pass 3 also seeds `dry_run_set` for resolved registered paths so the
cost-cap dry-run runs against the same physical table the user named
— composing cleanly with the Layer 2 fail-fast fallback.
When BQ rejects the rewritten dry-run SQL with `bq_bad_request`, the
cap-guard now retries with the user's ORIGINAL SQL instead of building
a synthetic `SELECT * FROM <table>` per registered table. The
synthetic path threw away user filters / projections / partition
predicates and routinely ballooned the estimate to "full table size",
falsely tripping `remote_scan_too_large` on legitimate narrow queries
(typical issue #201 trace: rewriter corrupts a backtick path → BQ
parse error → synthetic over-estimate → 400).
Behaviour:
- Rewritten SQL succeeds: same as before (issue #171 single-dry-run).
- Rewritten SQL parse-errors, original SQL succeeds: use original
estimate. Common case for users submitting BQ-native input.
- Both fail with `bq_bad_request`: HTTP 400 `remote_estimate_failed`
with a hint pointing at `agnes catalog` / BQ-native syntax. No
silent over-estimate.
- Non-parse BQ error (forbidden, upstream): still 502 as before.
This is a behaviour change for clients matching error kinds — failure
to estimate scan size now surfaces as `remote_estimate_failed`
instead of being masked behind `remote_scan_too_large` from the
synthetic path.
Replaces the existing `test_guardrail_falls_back_to_per_table_estimate_on_bq_parse_error`
(which pinned the old contract) with `test_fallback_tries_original_sql_first`
and `test_fallback_fails_fast_on_pure_duckdb_syntax`.
`agnes query --remote` corrupted user SQL when the request contained a
full BigQuery backtick path (`<project>.<dataset>.<table>`) whose
table segment matched a registered bare-name alias. The bare-name
rewriter used `\b` word-boundary matching against the lower-cased SQL;
both `.` and `` ` `` are non-word characters, so the regex fired
INSIDE the user's backtick path and produced malformed nested-backtick
SQL that BigQuery rejected at parse time.
Fix:
- Add `_mask_backticks(sql)` helper: replace each `…` segment with
spaces of equal length, preserving offsets so word-boundary
searches find positions only outside backticks.
- `_bq_guardrail_inputs` (bare-name pass + forbidden-table pass)
searches against the masked SQL.
- `_rewrite_bq_table_refs_to_native` Pass 1 splits the SQL on
`(\`[^\`]*\`)` and rewrites only the outside-backtick chunks. Pass
2 (`bq."ds"."tbl"` → backtick form) is unchanged — its prefix can't
appear inside backticks.
Adds three regressions covering the rewrite + guardrail paths.
In cross-project BQ setups (where billing != data), the SA typically has
serviceusage.services.use on the billing project but not on the data
project. The rewriter passed bq.projects.data as the first arg to
bigquery_query(), which BQ uses as the execution + billing project →
403 USER_PROJECT_DENIED.
Match the convention used everywhere else in the codebase
(app/api/v2_scan.py, app/api/v2_sample.py, app/api/v2_schema.py,
connectors/bigquery/extractor.py): backtick paths inside the inner SQL
use the **data** project (resolves the actual table location), the
bigquery_query() first arg uses the **billing** project (decides who
pays + which project the job runs under). For single-project deploys
the two are identical so the fix is a no-op there.
Test pins the cross-project case: data-prj for backticks, billing-prj
for the bigquery_query() first arg.
R2 adversarial review surfaced 3 issues, all addressed:
#1 cli/client.py:572-577 outer try/except wrapped both _get_shared_client()
AND the actual download. A 401/403/404/5xx from the server triggered a
full second download attempt with a fresh client — wasted bandwidth on
hard failures, no fail-fast on revoked PAT. Narrowed the try to only
the shared-client construction; the download itself is no longer
retried under the fallback except.
#2 concurrent agnes pull invocations (e.g. SessionStart hook + manual
run) collided on bare <target>.tmp / <target>.partN paths — one process's
in-progress write got yanked by the other's cleanup, manifest hash
check then failed spuriously. Per-process suffix (<target>.{pid}.tmp,
<target>.{pid}.partN) makes intermediate files disjoint; the final
os.replace to the bare target is atomic so last-writer-wins.
#3 _looks_like_bq_rewrite_parse_error patterns 'Syntax error' could
false-positive on a query like WHERE log_msg = 'Syntax error in foo'
that fails for an unrelated reason (quota, network) and has the
literal substring echoed in the error text. Anchored to 'Syntax error: '
(with trailing colon) — BQ always emits the colon in this error
format, user SQL string literals normally don't.
R1 adversarial review surfaced 5 issues, all addressed:
#1 chunked download silently disabled in non-Caddy deployments (HEAD on
GET-only FastAPI route returns 405). _probe_range_support now falls back
to GET with Range: bytes=0-0 when HEAD fails — works against both
Caddy file_server (HEAD-friendly) and dev FastAPI direct (GET-only).
#2 parse-error fallback heuristic too broad — matched on Unrecognized
name / Function not found / No matching signature / Invalid cast,
which BQ surfaces for ordinary user-column typos. That triggered slow
ATTACH-catalog retry on every typo (2× latency tax). Narrowed to just
'Syntax error' / 'syntax error' which are the genuine DuckDB-vs-BQ
dialect mismatch markers.
#3 apply_bq_session_settings was only run on fresh-built pool entries,
not on reuse. An operator's /admin/server-config change to bq_query
_timeout_ms wouldn't propagate to long-lived pooled sessions until
restart. Fixed: re-apply on every pool acquire (idempotent + fail-soft).
#4 content-length sanity bound — a misconfigured proxy returning a
wildly inflated Content-Length would cause overlapping chunked Range
requests against the actual file → corrupt assembled output (caught
by manifest hash check, but only after wasted bandwidth). Cap at 100
GiB; above that, drop to single-stream.
#5 rewriter assumed every BQ row resolves under the single
bq.projects.data project. Bucket containing '.' suggests a project-
qualified bucket (multi-project deployment); rewriter would silently
target the wrong project. Conservative skip with regression test.
Address code-reviewer findings on the bigquery_query() rewrite path:
1. Outer LIMIT wrap — bigquery_query() materialises BQ result into DuckDB
before fetchmany sees it (vs ATTACH-catalog Storage Read API streaming).
A user 'SELECT *' against a billion-row remote table would buffer the
entire result before request.limit applied. Wrap rewritten SQL in an
outer 'LIMIT N+1' so the cap pushes into the BQ job itself.
2. Dollar-quoted inner SQL — naive replace("'", "''") doubling missed
DuckDB backslash-escape sequences (\\, \\n, \\t, …). A predicate
like 'WHERE name = ''O\\'Brien''' was unsafe under the doubling
path. DuckDB $bqq_inner$ … $bqq_inner$ form takes the inner SQL
verbatim with no escapes whatsoever. Falls back to legacy doubling
if user SQL improbably contains the literal tag.
3. Parse-error fallback — when the rewritten path fails with a BQ-side
parse / validation error (DuckDB-only syntax like ::INT cast that
survives identifier rewrite but BQ refuses), retry the user's
original SQL via the legacy ATTACH-catalog path so the request still
succeeds. Mirrors the existing dry-run fallback contract.
4. CHANGELOG — delete duplicate CLI bullets that landed under
already-released [0.38.1] (file corruption from merge — entries are
correctly under [0.39.0]).
User SQL hitting query_mode='remote' BigQuery rows was 50-100x slower
than the equivalent direct bigquery_query() call because DuckDB's master
view (CREATE VIEW … AS SELECT * FROM bigquery.<ds>.<tbl>) does not push
WHERE/SELECT/LIMIT into BQ in ATTACH-catalog mode. The BQ extension opens
a Storage Read API session over the entire upstream table; on >100M-row
sources this was 70-150s and frequently failed with 'Response too large
to return'.
Extract the existing dry-run rewriter's core (table-name → BQ-native
backtick path) into a shared helper. Add an execution-path rewriter
that wraps the whole user SQL in bigquery_query('<project>', '<inner>')
so the BQ planner sees the full query and engages partition pruning +
projection pushdown server-side.
Conservative fall-through: cross-source JOINs (BQ ↔ Keboola/Jira local),
queries already containing bigquery_query(, and unconfigured BQ project
all skip the rewrite and run the original SQL via ATTACH-catalog so
behavior degrades gracefully.
1. instance.yaml overlay path now matches read site under STATE_DIR.
Three sites updated:
- app/api/admin.py:1005 (server-config endpoint writer)
- app/api/admin.py:2610 (configure endpoint writer)
- app/instance_config.py:106 (overlay reader)
All three now go through _state_dir() so under flat-mount layout
(STATE_DIR=/data-state) the irreplaceable instance.yaml overlay
lands on the state disk (sdc) instead of the regenerable data
disk (sdb). Without this fix, .env_overlay correctly went to the
state disk while instance.yaml went to the data disk — config
would be lost if an operator wiped sdb.
2. Strip customer-specific tokens from OSS repo per CLAUDE.md
vendor-agnostic rule:
- docker-compose.host-mount.yml: 'a deployer (Groupon FoundryAI)'
→ 'a deployer in production'
- docker-compose.flat-mount.yml: 'caused 2026-05-05 in the
Groupon FoundryAI deployment' → generic 'production failure
mode'
- docs/state-dir.md: rewrote the incident reference to describe
the failure mode abstractly without naming the deployment;
updated the recommendation table to say 'shadow-mount class'
instead of dating the specific incident.
3. Updated docs/state-dir.md 'What reads STATE_DIR' to list all
read/write sites including the three migrated in this round
(admin.py, instance_config.py, marketplaces.py).
ANALYSIS finding (tls-rotate.sh hardcoded host-mount.yml) deferred
— same operator-side class as auto-upgrade.sh hardcoded host-mount,
documented limitation per the PR body.
1. .env_overlay write paths now match read path under STATE_DIR.
app/main.py:343 reads via _state_dir() (post-PR #194), but two
write sites still hardcoded ${DATA_DIR}/state/.env_overlay:
- app/api/admin.py:2687 — configure endpoint secrets persistence
- app/api/marketplaces.py:152 — marketplace PAT persistence
Under flat-mount layout (STATE_DIR=/data-state) the admin UI wrote
secrets to /data/state/.env_overlay while the app read from
/data-state/.env_overlay, silently dropping the value on next
restart. Both write sites now go through _state_dir().
2. host-mount.yml: caddy inherits data:/srv:ro from base, but with
no service populating the data: named volume (other services
switched to direct /data binds), the inherited mount points at an
empty Docker volume — try_files finds nothing, every parquet
download falls through to uvicorn, defeating the v0.36.0
file_server bypass under the host-mount layout. Added a caddy
override that restates all mounts including a direct /data:/srv:ro
bind. Mirrors the comment + treatment already in flat-mount.yml.
🚩 /api/v2/catalog still async def while now calling sync stat()
`/api/v2/catalog` was left as `async def` when the rest of Tier 1 was
converted, on the assumption it was lightweight. The new
`_materialized_size_hint` populator added in this PR calls
`Path.stat()` / `Path.exists()` for every visible row to bucket the
parquet size — on a local FS that's microseconds, but on a
network-mounted DATA_DIR (NFS / CIFS / GCS-FUSE) those syscalls
can block the event loop. Convert to plain `def` so FastAPI
auto-offloads to the thread pool, mirroring /api/query etc.
🔴 stream_download translates HTTPStatusError as generic transport error
`response.raise_for_status()` inside the retry loop raises
`httpx.HTTPStatusError` on 4xx/5xx. After retries exhaust, the new
`isinstance(last_exc, httpx.HTTPError)` check at line 219 was eating
the status code: HTTPStatusError is a subclass of HTTPError, so the
generic transport translation produced "Unexpected error: HTTPStatusError"
instead of the informative "Client error '401 Unauthorized' for url …"
that callers expect. Fix: short-circuit HTTPStatusError before the
HTTPError branch — it re-raises verbatim so the caller's status-code
handling + the rich server error body (e.g. 401 expired token, 403
cross_project_forbidden) reach the analyst.
api_get / api_post / api_delete / api_patch don't have the same bug:
httpx Client.get/etc. don't raise HTTPStatusError unless the caller
explicitly calls .raise_for_status(), and our wrappers don't.
Only stream_download does, hence the targeted fix there.
Five hottest BQ-touching endpoints were `async def` but invoked synchronous
DuckDB / BQ-extension calls inside the body. Under uvicorn's single event
loop that meant a single heavy `agnes query --remote` (waiting up to
~200 s for BQ's jobs.query) froze EVERY other request — /api/health,
dashboard, auth, even another query — for the full BQ wait. Operators
saw "VM idle, app frozen" during PR #188's testing.
Convert to plain `def` so FastAPI auto-offloads the body to the anyio
thread pool. Event loop stays free for non-BQ requests.
- app/api/query.py:execute_query
- app/api/v2_scan.py:scan_estimate_endpoint, scan_endpoint
- app/api/v2_sample.py:sample
- app/api/v2_schema.py:schema
Audit: 0 `await` statements in any converted handler (verified file-by-
file), so the rename is safe. Tests in tests/test_v2_*.py called the
handlers via `asyncio.run(...)` which now fails on a non-coroutine return;
swapped for direct calls (asyncio.run( -> ( ) — keeps paren balance).
Plus AGNES_THREADPOOL_SIZE env var (default 200, was anyio's stock 40)
in app/main.py:lifespan. Set via
anyio.to_thread.current_default_thread_limiter().total_tokens. 200 is
comfortable headroom for <50 concurrent analysts; bump for more.
480/480 impacted tests pass (the 2 remaining errors are a pre-existing
fixture setup issue in test_reader_smoke_matrix.py unrelated to this
change).
Three concrete changes addressing the "analyst Claude misuses the CLI"
class of bugs (image.png table — issues #3, #5, plus the recurrent
"how big is this table" guesswork):
1. config/claude_md_template.txt — the template agnes init writes to
<workspace>/CLAUDE.md. Surfaces every catalog-row field with a why,
adds a query_mode-based decision tree, explicit --estimate scoping
(snapshot create ONLY — was the #1 first-try error), an agnes fetch
→ agnes snapshot create rename note, and a 6-row failure-mode table
that maps each common error wording to its right next step.
2. app/api/v2_catalog.py — populate rough_size_hint for local +
materialized rows from the on-disk parquet size, bucketed
small/medium/large/very_large. Was hardcoded null with a TODO; AI
couldn't tell "is this 6.8 GB" without a failed --remote round-trip.
3. cli/update_check.py — the [update] banner survived the da→agnes
rename and printed "[update] da X is out of date" on every command,
training analysts to associate the binary with the old name.
Verified by rendering the template against representative contexts
(33/33 tests pass) and running every use case from the original
screenshot through the real CLI against a dev VM.
A single analyst's multi-GB `agnes pull` held the only uvicorn worker
for the duration of the stream, starving UI / /api/health / every other
API endpoint. Container flipped to `unhealthy`. Triggered while a
6.8 GB `order_economics` pull was in-flight on prod 2026-05-05.
Caddy now intercepts `GET /api/data/{table_id}/download` and serves
the parquet directly via sendfile from the data volume (mounted r-o
at /srv inside the caddy container). RBAC enforced by `forward_auth`
to a new lightweight `GET /api/data/{table_id}/check-access` endpoint
(returns 204 / 403) — the bulk transfer never reaches uvicorn.
Path discovery via `try_files` over the known extract.duckdb v2 source
subdirs. Anything not at a static path falls through to the existing
app handler so legacy `src_data/parquet` and future connectors still
work without a Caddyfile change. Non-Caddy deployments are unchanged.
Stage 1 (multi-worker uvicorn) was considered but blocked by the
single-writer DuckDB lock on system.duckdb — workers > 1 would crash
at startup on "Could not set lock on file", the same race that pushed
the scheduler from in-process writes to HTTP-via-app. Multi-reader
workers + single-writer coordination is out of scope for this PR.
DuckDB BigQuery extension defaults `bq_query_timeout_ms` to 90 s, which
is too tight for analyst-scale queries against view-backed BQ datasets.
`agnes query --remote` HTTP 400'd with `Binder Error: Query execution
exceeded the timeout. Job ID: ...` whenever the underlying BQ job ran
longer than 90 s, even though the job itself was healthy.
Add `data_source.bigquery.query_timeout_ms` (default 600 000 ms = 10 min,
sentinel 0 falls through to the extension default). Applied via
`SET bq_query_timeout_ms` after every `LOAD bigquery` on every BQ-touching
DuckDB session: orchestrator's `_remote_attach` ATTACH path, BqAccess
session factory, and the standalone extractor. Configurable via
`/admin/server-config` UI.
Fail-soft: extension versions that don't recognise the setting silently
keep the default rather than poisoning the session.
Backup-orchestration commands were split across two namespaces (pull in
agnes store, push in agnes admin store), which broke the operator
mental model — pull/push are a paired operation and should sit
together.
Move pull + info into agnes admin store so all bulk operations share
one help screen. Add agnes store mine as the user-facing equivalent —
calls the same /api/store/bundle.zip endpoint with ?owner=me, which
the server resolves to the caller's user_id. Authors can archive
their own uploads without admin role; whole-Store bulk reads stay
admin-flavored as a discoverability hint.
Server: 3-line addition to export_bundle handles owner='me' as a
magic alias for the caller. No new endpoint.
Tests updated: pull/info expectations move from agnes store to
agnes admin store; new tests cover agnes store mine and the
?owner=me server resolution. 69/69 store tests green locally.
Adds whole-Store backup/restore primitives so an external CI/CD job can
mirror the Store to a git repo (and restore back from one).
REST:
- GET /api/store/bundle.zip — deterministic ZIP of all (filtered) Store
entities. Layout: manifest.json + entities/<id>/{plugin,assets}/.
Manifest carries owner_email for cross-instance restore. Auth: any
authenticated user (Store is community-open).
- POST /api/store/import-bundle — admin-only restore. Modes
merge|replace|skip; owner resolution by email with stub-disabled-user
fallback when the email is unknown on the target instance.
CLI:
- agnes store update <id> [--description X] [--zip PATH] ... — in-place
edit (server PUT permits owner OR admin per F4). Closes the missing
edit affordance for analysts who want to fix a typo or push a new
ZIP without losing install_count.
- agnes store pull [-o store.zip] [--unpack DIR] — download the bundle.
--unpack streams + extracts so an external git-backup workflow can
drop the tree straight into a repo and `git add .`.
- agnes store info [--json] — counts + size summary.
- agnes admin store push <zip-or-dir> [--mode ...] — admin-only restore.
Auto-zips a directory client-side so a working-tree → server
round-trip is one command.
cli/v2_client.py gains api_get_stream helper for binary downloads.
Tests: 5 new server tests (bundle shape + filters + round-trip + stub
user creation + skip mode + admin-only gate) + 11 new CLI tests
(update, pull/unpack, info, admin push). 66/66 store-related tests
green locally.
Devin flagged that run_session_collector still had the same audit-skip
gap I fixed in run_verification_detector and run_corporate_memory in
the previous two rounds — a PermissionError walking /home, an OSError
on /data/user_sessions mkdir, or any other unhandled exception from
collector.run() would skip the audit_log row and only show in docker
logs.
Same try/except + unhandled_error pattern as the sibling endpoints.
All three LLM-pipeline run-* endpoints now record their failures the
same way; /admin/scheduler-runs sees them. Regression test in
tests/test_admin_run_endpoints.py::TestRunSessionCollector::test_unhandled_exception_still_audits.
Devin flagged that run_corporate_memory still had the same audit-skip
gap I just fixed in run_verification_detector — if collect_all() throws
anything other than the already-translated ValueError (DuckDB lock,
network blip, unexpected SDK error), the audit_log row was never
written and /admin/scheduler-runs missed the failure.
Same try/except + unhandled_error pattern as the verification_detector
fix from 4c4dfee8. Regression test in
tests/test_admin_run_endpoints.py::TestRunCorporateMemory::test_unhandled_exception_still_audits.
Three changes addressing user feedback during e2e test of #179 + Devin Review on e86dd5ed.
1) /profile/sessions — new self-service user page in the user menu.
Lists all session jsonls the caller uploaded via `agnes push` joined
against session_extraction_state. Each row shows uploaded_at, file
size, status badge (pending/processed/extracted), processed_at, and
items_extracted. The page docstring + help text explicitly call out
that items_extracted=0 means the verification detector ran fine but
the LLM found no claims to track — that's the documented "no items"
outcome, not a broken pipeline. Closes the gap surfaced during the
e2e test of #176 where a user could see their sessions on disk and
process them through the LLM but had no UI to inspect what happened.
2) run_verification_detector audits unhandled exceptions (Devin #1).
If detector.run() threw anything other than the already-translated
ValueError, the audit_log row was never written. The endpoint now
wraps detector.run in try/except, records the exception in
audit_params["unhandled_error"], then re-raises as 500 after audit.
The /admin/scheduler-runs page surfaces the failure row with the
error type + message.
3) SCHEDULER_AUDIT_ACTIONS list corrected (Devin #2). Previous list
had "marketplaces_sync_all" (wrong — actual is "marketplace.sync_all")
plus "data_refresh" and "scripts_run_due" which app/api/sync.py and
app/api/scripts.py don't write to audit_log. Fixed to the four
actually-logged strings; comment points at the missing audit calls
as a follow-up.
Tests: tests/test_web_ui.py adds TestAdminRoleGuards::test_profile_sessions_page_no_admin_required and tightens test_admin_scheduler_runs_page_admin_only to assert the correct marketplace.sync_all string.
create_entity + update_entity created the `scratch` temp dir inside one
try/finally but cleaned it up in a separate one. Validation HTTPExceptions
raised by _safe_zip_extract (zip_unsafe_path, zip_too_large_uncompressed)
or the BadZipFile→422 conversion exited the first scope, and the second
finally was never entered → temp dir leaked on every failed upload.
Devin flagged this on the F2 commit. The leak pre-existed (zip_unsafe_path
was the original vector); F2 added zip_too_large_uncompressed to the same
broken cleanup path. Fixed by collapsing scratch creation + cleanup into
one outer try/finally that covers both extraction AND metadata/bake; the
inner try/except/finally still handles BadZipFile→422 + tmp file cleanup.
Same restructure in update_entity. Regression test
`test_scratch_dir_cleaned_up_after_failed_extraction` triggers a
zip_unsafe_path 422 and asserts tmp/agnes_store_* contains no leaked
dirs.
Three independent reviews of PR #180 surfaced four real defects in the new
Store / my-ai-stack surface. CHANGELOG entries detail each; one-liners:
- F1 video_url XSS: any authenticated user could upload a Store entity
with `video_url=javascript:...` and pop XSS in any viewer's session via
the `<a href=...>` "Watch video" link in store_detail.html. Jinja2
autoescape doesn't block URI schemes inside attribute values. Fixed by
scheme-validating to http(s) only on create + update; 400 invalid_video_url.
- F2 ZIP decompression bomb: _safe_zip_extract checked path-traversal but
not declared file_size totals — a 50 MB compressed upload at 1:1000
ratio decompresses to 50 GB and DOS the host disk. Fixed by summing
zinfo.file_size across infolist() and refusing > 200 MB before
extractall touches disk. 413 zip_too_large_uncompressed.
- F4 admin authz parity: PUT /api/store/entities/{id} was owner-only while
DELETE allowed owner OR admin; the store-detail page hid Edit/Delete
buttons from admin even though DELETE was permitted. Fixed by allowing
admin on PUT and passing is_admin to the template; gate is now
is_owner OR is_admin everywhere.
- F5 cross-owner suffix collision: sanitize_username is many-to-one
(alice.smith / alice_smith both → alice-smith). Two such users uploading
entities with the same display name produced identical
`<name>-by-<username>` suffixes, silently colliding in the served
agnes-store-bundle on-disk paths AND the manifest catalog (Claude Code
dedupes by plugin.json `name`). Fixed by enforcing global uniqueness on
the suffixed value at create_entity; 409 conflict_global_suffix.
F3 (ZIP symlink members) was investigated and confirmed to be a
false-positive — Python's stdlib ZipFile.extractall does not honor
symlink mode bits, so no exploit exists.
9 new regression tests in tests/test_store_api.py::TestStoreSecurityFixes
covering all four. Test run locally: 60/60 store-related tests pass.
run_session_collector called collector.main() which did argparse.parse_args()
on uvicorn's sys.argv (['app.main:app', '--host', ...]) → sys.exit(2) →
SystemExit(2), which inherits from BaseException, escapes FastAPI handlers,
and propagates through the thread pool. Every scheduler tick that fired the
endpoint either 500-ed or risked killing the uvicorn worker.
services/session_collector/collector.py now exposes run(dry_run, verbose)
that returns (rc, stats); main() is a thin CLI shim that parses argv and
delegates. The admin endpoint calls run() directly and audit-logs the
per-run stats (users_processed, files_copied, files_skipped) instead of
just the rc. Three regression tests in TestRunHelper.
Closes Devin Review finding on app/api/admin.py:2819 (#179).
Devin BUG: /api/admin/configure seeds an ai: block to the writable
overlay at DATA_DIR/state/instance.yaml, but the three LLM consumers
imported from config.loader.load_instance_config — which reads the
static config dir only. Even if they had read the overlay, the loader
ran yaml.safe_load directly without passing through _resolve_env_refs,
so '${ANTHROPIC_API_KEY}' would have stayed a literal placeholder. The
pipeline appeared to work because the factory falls back to the env
var directly, but the overlay path itself was dead code.
Two fixes, both required:
1. Switched the three LLM consumers to app.instance_config.load_instance_config:
- services/corporate_memory/collector.py:collect_all
- services/verification_detector/__main__.py:main
- app/api/admin.py:run_verification_detector
2. app/instance_config.py runs the loaded overlay through
config.loader._resolve_env_refs *before* the deep-merge, so
'${ANTHROPIC_API_KEY}' resolves at config-load time.
New regression suite tests/test_instance_config_overlay.py pins:
- env-ref resolution against the overlay (resolved when env set,
empty when env missing — never the literal placeholder)
- deep-merge still preserves static-only sections
- the three consumers reach app.instance_config (inspected via
inspect.getsource so a future refactor that reverts the import
fails the test)
- end-to-end: a seeded overlay + ANTHROPIC_API_KEY env reaches the
factory with a resolved api_key
The PR rewrote collect_all() to call the new
create_extractor_from_env_or_config() helper, but the existing tests
still mocked the old direct create_extractor() symbol and the old
silent-skip-on-missing-config behavior. Five tests in
TestCorporateMemoryCollector and one in TestCollectorExtractorIntegration
were red on the PR branch.
Changes:
- Tests now mock connectors.llm.create_extractor_from_env_or_config
(the symbol the collector imports lazily).
- Renamed test_collect_all_no_ai_config_skips ->
test_collect_all_no_ai_config_or_env_raises and
test_collector_handles_invalid_config -> test_collector_raises_on_invalid_config.
Both assert pytest.raises(ValueError) — the explicit fail-fast
semantics defect 5 of #176 was supposed to enforce.
- collect_all() no longer swallows the factory's ValueError into
stats["errors"]; it propagates so the scheduler / admin endpoint
surface the actionable misconfiguration message instead of
pretending the run was a no-op.
- /api/admin/run-corporate-memory translates the propagated ValueError
into a 500 with the factory's message, matching
/api/admin/run-verification-detector.
Adds a community-driven Store where any authenticated user uploads
skills/agents/plugins as ZIPs, plus /my-ai-stack as the per-user
composition view. The served Claude Code marketplace is now:
(admin_granted ∖ opt_outs) ∪ store_installs
Skill + agent installs are merged into a single `agnes-store-bundle`
plugin in the served marketplace; type=plugin uploads stay standalone.
Names are suffixed with `-by-<owner-username>` at upload time so two
owners can use the same display name without colliding in Claude Code's
flat skill/agent namespace.
Schema v23 → v24 adds three tables:
- store_entities — community-uploaded skills/agents/plugins
- user_store_installs — what each user has chosen to install
- user_plugin_optouts — opt-out overlay on top of admin grants
Admin grant-delete drops every user's opt-out for that plugin so
re-grant resets cleanly to enabled (no sticky personal preference).
UI:
- /store — e-commerce-style listing with type/category/owner
filters, search, pagination, owner-aware [Install]
buttons, clickable cards
- /store/new — 2-step upload wizard with drag & drop, preview
validation (POST /api/store/entities/preview), docs
multi-upload, photo + video URL
- /store/{id} — detail page with hero, file list, docs, owner
actions (Edit/Delete) for the uploader
- /my-ai-stack — Granted plugins (toggle opt-out) + From the Store
(uninstall) sections
- Admin nav: Marketplaces moved into Admin dropdown, renamed to
"Curated Marketplaces"
Validation hardening: type-mismatch guards reject skill ZIP uploaded as
agent (or vice versa), and plugin ZIPs masquerading as skills/agents.
Human-readable error messages mapped client-side from machine codes.
Cross-source naming: Store entity-id-prefixed dirs (`plugins/store-<id>/`)
plus the bundle (`plugins/store-bundle/`) avoid collisions with admin
marketplaces (whose `store` slug is reserved by `is_valid_slug`).
Bundle composition is content-hashed at serve time — install/uninstall
or owner re-upload bumps the bundle's plugin.json `version`, so Claude
Code's auto-update toggle picks up changes.
Tests: 50+ new tests across naming, repositories, filter (admin ∪ store
∪ bundle), API (upload/install/uninstall/delete/preview/docs), end-to-end
marketplace.zip with bundle merging.
GET /api/health/detailed now returns a session_pipeline service entry.
Heuristic:
max(mtime of /data/user_sessions/**/*.jsonl) <=
max(processed_at in session_extraction_state) + grace_seconds
grace_seconds = 2 × verification-detector cadence (default 30 min;
configurable via SCHEDULER_VERIFICATION_DETECTOR_INTERVAL).
When the assert fails, status='warning' (never 'error') with an
actionable detail pointing at the verification-detector scheduler job.
A warning bubbles up to the existing overall='degraded' aggregation —
operators querying /api/health/detailed (or /agnes diagnose system)
get a clear breadcrumb instead of a silently-broken pipeline.
Cold-start case (no session files, or files newer than the grace
window with empty state table) is handled explicitly to avoid noise
on a fresh deploy.
Tests: tests/test_health_session_pipeline.py.
The session-collector, verification-detector, and corporate-memory
services now run on the same scheduler-v2 model that already drives
data-refresh, health-check, script-runner, and marketplaces:
- New admin endpoints in app/api/admin.py:
POST /api/admin/run-session-collector
POST /api/admin/run-verification-detector
POST /api/admin/run-corporate-memory
All admin-gated, sync-def (FastAPI thread pool), with one audit row
per invocation. Same single-writer-of-system.duckdb pattern as the
existing /api/marketplaces/sync-all job.
- services/scheduler/__main__.py JOBS gains three entries with offset
cadences (10m / 15m / 17m, all coprime modulo the 30s tick) so the
three LLM-backed jobs don't fire on the same tick and stack their
API + DB load.
- The verification-detector endpoint surfaces the LLM factory's
fail-fast ValueError as HTTP 500 with the actionable message,
preserving the no-silent-skip contract from the previous commit.
Tests:
- tests/test_admin_run_endpoints.py covers admin gating + scheduler
registration + endpoint contract.
- tests/test_scheduler_sidecar.py existing tests continue to pass.
POST /api/admin/configure now writes a default ai: block into the
instance.yaml overlay when the request leaves it untouched and either
ANTHROPIC_API_KEY or LLM_API_KEY is set in the environment. The block
references the env var via ${VAR} syntax — secrets never land in YAML.
connectors.llm.factory grows create_extractor_from_env_or_config which
falls back to ANTHROPIC_API_KEY / LLM_API_KEY when ai_config is empty
and raises a clear ValueError when neither is available. Both
services/corporate_memory and services/verification_detector switch to
the new helper, replacing the old 'silently skip when ai: missing'
path that was the silent-failure root cause.
Tests:
- tests/test_setup_ai_block.py — overlay seeding contract.
- tests/test_llm_provider_env_fallback.py — fallback + fail-fast.
The iterative bare-name rewriter (one re.sub per name, longest-first)
was vulnerable to cross-contamination when the GCP project ID contained
a registered table name as a hyphen-delimited word.
Concrete repro:
project = 'my-ue-project'
registered = ['orders', 'ue']
user SQL = 'SELECT * FROM orders JOIN ue ON ...'
iter 1 (orders): produces 'FROM `my-ue-project.fin.orders` JOIN ue ...'
iter 2 (ue): '\bue\b' matches 'ue' INSIDE 'my-ue-project' (hyphen
creates word boundary on both sides) — corrupts
the iter-1 path
Fallback at query.py:576 caught the resulting BQ parse error and fell
back to per-table SELECT * estimate, so impact was over-estimation,
not fail-open — but the #171 partition-pruning fix silently degraded
to pre-fix behavior whenever a project name shared a hyphen-segment
with a registered table.
Fix: single re.sub call with an alternation regex sorted longest-first.
Single-pass means each source position is processed exactly once, so
freshly-inserted backticked text from one match isn't re-scanned by
later names in the alternation.
Regression test
test_rewrite_helper_does_not_corrupt_when_project_id_contains_registered_name
covers the exact Devin repro.
Bring admin UI, audit-log messages, code comments, and analyst-facing
skill docs in line with the post-bootstrap CLI surface (`agnes pull`,
`agnes push`, `agnes init`, `agnes snapshot create`). The legacy
`_LEGACY_STRINGS` detection tuple in `app/api/claude_md.py` and the hook
upgrade markers in `cli/lib/hooks.py` are intentionally left as-is —
they exist precisely to flag pre-rewrite content for re-authoring.
Strip "(folded from `da metrics list`)" / "(lifted from `da metrics
show`)" / "Replaces the old `da analyst status`" docstring noise — the
rename history is in CHANGELOG.md, not in module docstrings.
Closes#171. The /api/query cost guardrail used to dry-run a synthetic
`SELECT * FROM <table>` for each registered remote-BQ row referenced
by the user SQL — which made BigQuery estimate a full table scan, with
column projection, predicate pushdown, and partition pruning all
disabled. Narrow queries on big partitioned/clustered tables (the
documented happy path for `agnes query --remote`) hit ~30,000×
over-estimates and got rejected with 400 `remote_scan_too_large` even
when BQ's own dry-run reported single-digit MB.
Pavel's report on #171 traced the root cause and proposed the fix:
rewrite the user SQL to BQ-native syntax and dry-run it as a single
job, exactly the way `bq query --dry_run` works.
Implementation:
- New helper _rewrite_user_sql_for_bq_dry_run rewrites bare registered
names (word-boundary, case-insensitive, longest-first to avoid prefix
collisions) + bq."<ds>"."<tbl>" forms to backticked
`<project>.<ds>.<tbl>` paths.
- _bq_quota_and_cap_guard runs ONE dry-run on the rewritten SQL. Cap
check uses the real estimate.
- Fallback path: if BQ rejects with bq_bad_request (e.g. DuckDB-only
syntax like ::INT casts), the guard falls back to the pre-fix
per-table SELECT * approach so non-portable queries still get a
(loose) cap estimate instead of fail-opening. Non-parse BQ errors
(forbidden, upstream) still propagate as 502.
- _bq_guardrail_inputs now also returns name_lookups so the rewriter
has the (registered_name, bucket, source_table) mapping it needs.
- Per-table breakdown is unavailable from a composite dry-run; total
bytes are pinned to dry_run_set[0] for the post-flight
record_bytes(sum(...)) call to keep returning the right total.
Tests (7 new, 3 existing still pass):
- dry-run receives rewritten user SQL with WHERE clause intact (the
load-bearing assertion for #171)
- single dry-run per request even with multiple registered tables
(JOIN, UNION) referenced
- fallback to per-table SELECT * on bq_bad_request
- non-parse BQ errors (forbidden) still 502
- rewriter unit tests: bare + bq.path in same SQL, longest-name-wins
on prefix collision, case-insensitive bare-name match
13 Devin findings across 10 files:
🔴 Critical:
- app/api/v2_catalog.py:42 — `_fetch_hint` returns `da fetch` in /api/v2/catalog
responses (user-visible in every catalog list)
- cli/skills/agnes-data-querying.md — 11 stale `da fetch`/`da sync` refs in the
bundled skill markdown
- config/claude_md_template.txt:38 — referenced `agnes pull --docs-only` flag
that does NOT exist in agnes pull (removed; spec only ships --quiet/--json/
--dry-run)
🟡 Important:
- app/api/admin.py:252 — `da fetch` in bq_max_scan_bytes hint
- cli/commands/auth.py:119 — `da sync` in import-token docstring (--help text)
- cli/commands/tokens.py:48 — "Export it so `da` can use it" prose
- ARCHITECTURE.md — 4 stale rows in CLI commands table
- README.md — stale paragraphs for analysts (da sync, da analyst setup)
🚩 Substantive observations addressed:
- app/api/query.py:249,302,489 — server-side error/help strings still said
`da sync`/`da fetch` (returned in API responses to clients)
- cli/commands/snapshot.py:235-241 — DuckDB existence guard incorrectly
blocked `--estimate` (server-side dry-run that never opens local DB).
Added test ensuring estimate path skips the guard.
Skipped (intentionally historical):
- app/api/admin.py:2377,2429,2437 — historical comments describing past
manifest-vs-sync_state bug; past tense, accurate to keep as `da sync`.
New top-level 'materialize' section, single field (lock_ttl_seconds).
Default 86400 (24h). Backs the file-lock TTL reclaim added in the
per-table-mutex change. Editable via PUT /api/admin/server-config and
the /admin/server-config UI.
When admin registers a materialized BQ row with bucket+source_table but
no source_query, the server generates 'SELECT * FROM `<project>.<ds>.<tbl>`'
from instance.yaml's configured BQ project. Same fallback fires on PUT
when flipping to materialized. The backtick rejection guard, which was
appropriate for DuckDB-flavor source_query, is relaxed for materialized
rows since the new wrapping path (Task 2) runs admin SQL through BQ
jobs API which uses BQ-native syntax (backticks for dashed identifiers).