Commit graph

155 commits

Author SHA1 Message Date
ZdenekSrotyr
751cc25327
release: 0.46.5 — agnes describe -n parses, server sanitizes NaN (#224)
## Summary

Two bugs in `agnes describe` surfaced from a real analyst session following the CLAUDE.md agent-rails discovery workflow. Together they break `agnes describe` end-to-end for any analyst (or analyst-AI) who follows the documented form.

### A) CLI parsing

`agnes describe TABLE -n 5` failed with `Missing argument 'TABLE_ID'`. Root cause: the command was registered as a `Typer.Typer` subcommand group via `app.add_typer(describe_app, name="describe")` + `@describe_app.callback(invoke_without_command=True)`, and that pattern mis-parses positional + short-int option in some orderings. Same pattern in `cli/commands/schema.py` works only because schema has no INTEGER short option. Fix: switch to flat `@app.command("describe")`.

### B) Server NaN

`/api/v2/sample/<id>` (called by `agnes describe`) returned HTTP 500 with `ValueError: Out of range float values are not JSON compliant: nan` whenever a row contained NaN. Fix: sanitize NaN/±inf to None before JSON serialization.

## Test plan

- [x] `pytest tests/test_cli_describe*.py` — added regression tests pinning `-n` parsing on either side of the positional.
- [x] `pytest tests/test_api_v2_sample*.py` — added regression test for NaN row → JSON `null` (not 500).
<!-- devin-review-badge-begin -->

---

<a href="https://app.devin.ai/review/keboola/agnes-the-ai-analyst/pull/224" target="_blank">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1">
    <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open in Devin Review">
  </picture>
</a>
<!-- devin-review-badge-end -->
2026-05-07 18:16:21 +02:00
ZdenekSrotyr
7fc5365891
release: 0.46.3 — self-heal session pipeline + clearer diagnose (#220)
## Summary

Verified against production: `claude -p` headless mode doesn't fire SessionEnd hooks (proven via `--output-format stream-json --include-hook-events`: zero `SessionEnd` events), so any session JSONLs from `-p` invocations stay orphaned locally and never reach the server. Fix: add `agnes push --quiet` as a third SessionStart entry — symmetric self-heal alongside the existing `agnes pull` entry. Existing workspaces pick this up on their next `agnes init` via the marker-based migration already in `cli/lib/hooks.py`.

Separately: a colleague's fresh install showed `agnes diagnose` warning "uploads are not being processed", which led them to suspect their `agnes push` was broken. The warning is actually about the LLM-based `verification-detector` backlog (uploads themselves were arriving fine — confirmed by 23+3 JSONLs landed on the server while the warning was firing). Reword the warning to "verification-detector backlog" + add `last_processed` to the diagnose dict so operators don't have to grep logs to confirm.

## Test plan

- [x] `pytest tests/test_lib_hooks.py` — updated count + added `agnes push in SessionStart` assertion.
- [x] `pytest tests/test_setup_hooks_template.py` — updated.
- [x] `pytest tests/test_clean_install_integration.py` — updated.
- [x] `pytest tests/test_health_session_pipeline.py` — updated warning text + asserted `last_processed` field.
<!-- devin-review-badge-begin -->

---

<a href="https://app.devin.ai/review/keboola/agnes-the-ai-analyst/pull/220" target="_blank">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1">
    <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open in Devin Review">
  </picture>
</a>
<!-- devin-review-badge-end -->
2026-05-07 17:41:22 +02:00
ZdenekSrotyr
378ee40459
release: 0.46.1 — surface real BQ error from remote_estimate_failed retry (#218)
## Summary

When `agnes query --remote` references a column that doesn't exist on the FROM table, users were seeing `Table "<id>" must be qualified with a dataset` instead of the actually-useful `Unrecognized name: <column>` from BigQuery. Surface the first-attempt diagnostic now; keep the second-attempt context as `underlying_original`.

Reproduced against production:
```
$ agnes query --remote "SELECT COUNT(*) FROM unit_economics WHERE authorize_date = DATE '2025-05-06'"
Error: remote_estimate_failed (HTTP 400)
  message: Could not estimate scan size for this query.
  underlying: 400 ... Table "unit_economics" must be qualified with a dataset.
```

(`unit_economics` has `authorize_timestamp`, not `authorize_date`.)

## Test plan

- [x] New `test_remote_estimate_failed_surfaces_first_error_when_attempts_differ` asserts the first-attempt message wins, second-attempt is preserved as `underlying_original`, hint points to `agnes schema`.
- [x] Existing `test_guardrail_returns_400_remote_estimate_failed_on_double_parse_error` still passes (both attempts mocked to identical error).
- [x] `pytest tests/test_api_query_guardrail.py` clean.
<!-- devin-review-badge-begin -->

---

<a href="https://app.devin.ai/review/keboola/agnes-the-ai-analyst/pull/218" target="_blank">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1">
    <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open in Devin Review">
  </picture>
</a>
<!-- devin-review-badge-end -->
2026-05-07 16:54:45 +02:00
ZdenekSrotyr
28430ced09
Keboola cutover: native parquet path + sync correctness + auto-discover protection (#190)
* fix: cutover regressions + parallel Keboola legacy fallback

Bundled fixes from a fresh-deploy run on a Keboola Storage backend with
the block-shared-snowflake-access feature flag — DuckDB Keboola
extension's per-table scan can't access bucket schemas, so the legacy
kbcstorage Storage-API client is the only working path.

CUTOVER REGRESSIONS

- agnes pull hash mismatch on every Keboola local-mode table —
  src/orchestrator.py:_update_sync_state stored md5(mtime+size)[:12]
  while the CLI compares against full 32-char content MD5. Now stores
  the same content MD5 the materialized SQL path already used.

- Trailing-slash sanitization in connectors/keboola/access.py and
  extractor.py — DuckDB Keboola extension's ATTACH fails when the URL
  ends in / (canonical form).

- src/profiler.py:TableInfo.description becomes optional — two call
  sites instantiated without it, crashing the profiler pass.

- scripts/ops/agnes-auto-upgrade.sh: chown on UID change — older images
  ran as root, current runs as agnes (uid 999). Reads target uid:gid
  from /etc/passwd inside the new image and chowns ${STATE_DIR},
  /data/extracts, /data/analytics when the digest moves.

- POST /api/sync/trigger is now singleton per process — two
  near-simultaneous trigger calls each forked an extractor subprocess,
  fought for extract.duckdb's file lock, starved uvicorn, flipped the
  container to unhealthy. Trigger now returns 409
  (sync_already_in_progress) when held; _run_sync acquires non-blocking.

PARALLEL LEGACY FALLBACK

- Process pool fan-out for the _extract_via_legacy queue (default 8
  workers, override via AGNES_KEBOOLA_PARALLELISM). Process pool, not
  thread pool, because connectors/keboola/client.py:export_table does
  os.chdir(temp_dir) — process-global, so threads raced and slice files
  landed in the wrong directory ("[Errno 2] No such file or directory:
  '<job_id>.csv_X_Y_Z.csv'").

- Extractor subprocess timeout 1800s -> 3600s (configurable via
  AGNES_EXTRACTOR_TIMEOUT_SEC). 28+ tables × multi-minute Keboola export
  jobs need the headroom on telemetry-class projects.

- Process group cleanup on timeout — Popen(start_new_session=True) puts
  the extractor in its own group. On timeout the parent SIGTERMs the
  group (10s grace) then SIGKILLs stragglers. Without this, the pool
  workers were reparented to PID 1 and continued holding open Keboola
  Storage export jobs. Inline extractor script also installs a SIGTERM
  -> sys.exit(143) handler so the with ProcessPoolExecutor(...) block
  __exit__ runs cleanly.

Tests: existing tests that patched subprocess.run updated to patch
subprocess.Popen with a _FakePopen stand-in (same exit-code-injection
contract). Two tests that exercised the parallel path forced
AGNES_KEBOOLA_PARALLELISM=1 to keep mocks alive (mocks don't ride into
ProcessPoolExecutor subprocesses).

Squashed onto current main (was 7 commits + multi-commit CHANGELOG +
agnes-auto-upgrade.sh conflicts; squash avoids per-commit conflict
resolution against main's flat-mount STATE_DIR refactor and 0.38.0
release cut).

* feat(keboola): Storage API direct extract path; drop extension data path

The DuckDB Keboola extension's COPY routes through Keboola QueryService,
which is unreliable on linked-bucket projects (extension v0.1.6 fixes
that case but isn't yet in the community CDN, and pre-fix any project
with the block-shared-snowflake-access feature flag couldn't see bucket
schemas at all). Move the extract path off the extension entirely and
talk to the Storage API directly via signed-URL download — works on any
project, regardless of extension state.

connectors/keboola/storage_api.py (NEW)
  Lightweight client built on requests.Session. Three endpoints:
  - POST /v2/storage/tables/{id}/export-async        (kicks off job)
  - GET  /v2/storage/jobs/{id}                        (poll until done)
  - GET  /v2/storage/files/{id}?federationToken=1     (signed URL detail)
  - GET  <signed_url>                                 (download bytes)
  Supports sliced exports (manifest + per-slice signed URLs) and gzipped
  payloads. ExportFilter dataclass mirrors the Keboola filter spec
  (whereFilters / columns / changedSince / limit) and handles JSON
  round-trip with the registry's source_query column. Token redaction
  in error messages. Bounded exponential backoff on job polling.
  No cloud-SDK dependency on the data path; thread-safe.

connectors/keboola/extractor.py
  - materialize_query() rewritten: takes bucket/source_table/source_query
    (JSON filter spec), exports via KeboolaStorageClient, converts CSV
    to parquet via DuckDB, atomic os.replace. Same return shape so
    sync.py downstream code stays uniform with the BQ branch.
  - _extract_via_legacy() also moved to Storage API direct (kept the
    name for caller compatibility with _legacy_worker / the parallel
    batch extractor). Per-call temp directories — no os.chdir, threads
    don't race.

app/api/sync.py
  _run_materialized_pass for source_type='keboola' rows now constructs a
  KeboolaStorageClient (replaces KeboolaAccess) and passes
  bucket/source_table/source_query to materialize_query. Reuses one
  client across rows for HTTP keep-alive. Sources keboola URL from env
  too (KEBOOLA_STACK_URL) when instance.yaml doesn't have stack_url
  configured.

cli/commands/admin.py
  discover-and-register defaults Keboola rows to query_mode='materialized'
  (NULL source_query = full table), matching the v26 migration's
  unification of the local/materialized split for Keboola. BigQuery and
  Jira keep their per-source defaults.

src/db.py
  Schema bump 25 → 26. Migration: UPDATE table_registry SET
  query_mode='materialized' WHERE source_type='keboola' AND
  query_mode='local'. NULL source_query on those rows means "full table
  export" — same effective behavior the local mode provided, but now
  via Storage API instead of the extension.

pyproject.toml
  kbcstorage dep stays (admin-side bucket/table list still uses the
  SDK in app/api/admin.py / connectors/keboola/client.py); only the
  data path is migrated off the SDK. Comment updated to reflect the
  new boundary.

tests
  - test_keboola_storage_api.py (NEW, 19 tests): ExportFilter parsing,
    HTTP client (token redaction, retry logic, polling), download_file
    (single, gzipped, sliced), end-to-end export_table_to_csv.
  - test_keboola_materialize.py rewritten: mocks KeboolaStorageClient
    instead of FakeAccess; same atomic-write + zero-rows + unsafe-id
    contracts.
  - test_sync_trigger_keboola_materialized.py: registry rows now carry
    bucket+source_table+JSON-shape source_query.

114+ Keboola-impacted tests green locally.

* test: schema version assertion bumped to 26 alongside the keboola query_mode migration

* fix(keboola): cutover hot-patches surfaced on agnes-dev

Five small fixes that were applied as in-container hot-patches during
agnes-dev cutover and need to be on the source-of-truth image so a fresh
upgrade does not undo them.

- app/api/sync.py: auto-discover gate considers the WHOLE registry (any
  source, any mode), not just rows where source matches and query_mode
  is local. After the v25→v26 keboola materialized migration an
  instance can have 30 materialized rows and zero local rows; the
  previous gate kept re-firing _discover_and_register_tables every
  scheduler tick, creating duplicate auto-discovered rows with the
  wrong bucket prefix every time.

- app/api/admin.py: _discover_and_register_tables reassembles the
  bucket as <stage>.<bucket-id> (e.g. in.c-finance) instead of
  dropping the stage prefix; default query_mode for keboola is now
  materialized (the v26 contract); validator allows NULL source_query
  for keboola materialized rows (full-table export via Storage API
  export-async, no SQL needed).

- cli/commands/admin.py: register-table mirrors the server validator
  (NULL source_query allowed for source_type=keboola); --bucket help
  text generalized to cover both BQ dataset and Keboola bucket id.

- connectors/keboola/extractor.py: max_line_size=64 MiB on
  read_csv_auto so embedded JSON / SQL cells (kbc_component_configuration
  in particular) do not trip the default 2 MiB ceiling.

- connectors/keboola/storage_api.py: GCP backend support — when the
  Storage API returns a manifest whose slice URLs are gs://
  references with a gcsCredentials block, rewrite to the JSON REST
  download endpoint and authenticate with the issued OAuth bearer
  token; redact tokens in any surfaced error string.

* test: align with new keboola materialized + auto-discover-gate contracts

- test_admin_keboola_materialized: rename
  test_register_keboola_materialized_rejects_missing_source_query →
  test_register_keboola_materialized_accepts_missing_source_query.
  v25→v26 introduced 'keboola materialized with NULL source_query
  means full-table export via Storage API export-async' as the
  default registration shape; the rejection case is no longer the
  contract.

- test_sync_filter: add list_all() to _StubRegistry. The auto-discover
  gate in _run_sync now keys off the WHOLE registry (not just local
  rows) so materialized-only Keboola instances do not re-trigger
  discovery on every tick.

* feat(keboola): native parquet export — skip CSV roundtrip

Storage API export-async accepts fileType={csv,parquet}. Switching the
materialized sync to parquet eliminates the CSV → DuckDB COPY → parquet
roundtrip that pinned a single uvicorn worker over 4 GiB on multi-GB
tables (read_csv with all_varchar + max_line_size=64MB has to
materialize the whole CSV in memory before COPY can stream out a
parquet). Snowflake UNLOAD on Keboola's side already produces typed,
self-contained parquet files; the extractor downloads them and renames
into place.

Two cases:

- **Single-file** export (small table): file_info.url points at one
  signed URL; download_file streams chunks straight to .parquet.tmp
  and we're done. No DuckDB.

- **Sliced** export (Snowflake UNLOAD respects MAX_FILE_SIZE — 16 MiB
  default — so anything larger arrives as N parquet slices): each
  slice is a complete parquet file with its own footer; naive concat
  would corrupt them. download_file_slices keeps the slices as
  separate files in a tempdir, then DuckDB COPY (SELECT * FROM
  read_parquet([slice0, slice1, ...])) merges them into one
  consolidated parquet. DuckDB streams row groups during this — peak
  memory bounded to one row group (~1 MiB) regardless of source size.

The legacy CSV path stays as the explicit opt-in via source_query=
'{"file_type":"csv"}' for projects whose backend can't UNLOAD
parquet (none known today; cheap escape hatch). Backward-compat alias
KeboolaStorageClient.export_table_to_csv kept.

Also fixes a latent bug in download_file's gzip detection: previous
heuristic flagged any unencrypted file as gzipped, which would have
corrupted parquet downloads at gunzip time. Name-suffix-only now.

* fix: tempdir leak cleanup, every 0m schedule, /sync/trigger body shapes

Three small self-contained fixes uncovered during agnes-dev cutover.

- connectors/keboola/extractor.py: tempfile.TemporaryDirectory now uses
  ignore_cleanup_errors=True so a worker death mid-write doesn't leave
  multi-GiB stale slice trees on the boot disk. (12 GiB seen after a
  disk-full crash where TemporaryDirectory's own cleanup also raised
  and got swallowed.)

- src/scheduler.py: is_valid_schedule accepts 'every 0m' (interval=0
  = always due). Force-resync of an errored row no longer requires
  waiting out the default 'every 1h' interval — admin can flip the
  schedule, trigger, then flip back.

- app/api/sync.py: POST /api/sync/trigger accepts both ['table_id']
  (legacy bare-array body) and {'tables': ['table_id']} (matches the
  response payload shape, more discoverable for clients building
  requests by hand). Malformed bodies return 422 with a structured
  detail; null/missing means 'sync everything' as before.

Tests cover: tempdir cleanup on raise (sliced parquet path),
is_valid_schedule + is_table_due 'every 0m' acceptance, and trigger
body parametrized matrix (8 valid shapes + 6 rejection cases).

* fix: targeted-trigger filter in materialized pass + auto-upgrade defer

Two operational gaps observed during agnes-dev cutover, in the same
sync-routing area.

- _run_materialized_pass now takes a 'tables' arg and skips rows not in
  the target set with reason='not_in_target'. POST /api/sync/trigger
  with a body of tables previously only scoped the legacy extractor
  subprocess — the materialized pass kept iterating every due
  materialized row, so an admin asking to re-sync kbc_job re-ran
  every other due materialized row alongside it. Match on registry id
  OR name (admins commonly pass either form). tables=None preserves
  the no-filter behavior.

- New GET /api/sync/status (public, no auth) returns {locked: bool}
  off _sync_lock.locked(). agnes-auto-upgrade.sh probes this before
  docker compose up -d and exits 0 with a 'deferred recreate' log
  line if a sync is in flight — the next 5-min cron tick retries.
  Pre-fix, an auto-upgrade triggered mid-sync would recreate the
  uvicorn worker and kill the in-flight extractor / Snowflake-UNLOAD
  download (observed when kbc_job's first 7-day retry got SIGKILLed).
  Connection failures in the probe fall through to the upgrade —
  being stuck on a wedged image is worse than interrupting a
  hypothetical sync.

* fix: auto-discover protects admin overrides + surfaces drift

Two real-world incidents on agnes-dev drove this:

1. kbc_job was registered manually with the correct
   (in.c-kbc_telemetry, kbc_job) coordinates. A naive auto-discover
   re-run would have inserted a SECOND kbc_job row at the slugified
   id 'in_c-keboola-storage_kbc_job' (where Keboola's discovery
   places it) — and that row's Storage API export-async 404s.

2. An earlier auto-discover bug stripped the stage prefix from
   bucket ids ('c-finance' instead of 'in.c-finance'), inserting
   137 rows whose syncs all failed.

Fix:

- _discover_and_register_tables now builds a plan first
  (_build_keboola_discovery_plan) classifying each discovered table
  into one of new / existing_match / existing_drift / invalid, then
  executes only the 'new' bucket. Drift rows are reported with both
  sides of the disagreement plus drift_kind:
  - same_id_diff_coords: registry has the same id but different
    bucket / source_table (admin migrated coords inline).
  - name_collision: discovery's slugified id differs from any
    registry id, but the discovered .name matches an existing row's
    .name (case-insensitive). Catches the kbc_job case.

- Bucket detection now prefers the API's authoritative bucket_id
  field (separate field on the Keboola tables.list response,
  normalised by KeboolaClient.discover_all_tables). Falls back to
  id-string parsing only when bucket_id is missing (older fallback
  path inside discover_all_tables).

- Endpoint POST /api/admin/discover-and-register?dry_run=true
  returns the plan without writing — would_register, drift,
  invalid lists. Lets an operator audit before merging discovery
  with a registry that has admin overrides.

Removed 'every 0m' from test_register_request_rejects_malformed_sync_schedule
— the runtime started accepting it in the previous commit (force-resync
override) and the validator follows suit.

* feat(keboola): AGNES_TEMP_DIR routes tempfiles off overlayfs /tmp

The container's /tmp lives on the boot disk's overlayfs (29 GiB on
agnes-dev, shared with /var). Snowflake UNLOAD of a wide table writes
slices into per-call /tmp tempdirs that fill multi-GiB / many-slice
exports long before the dedicated data disk fills. agnes-dev hit
100% boot-disk while the 20 GiB data disk had 15 GiB free.

connectors.keboola.storage_api.get_temp_root() reads AGNES_TEMP_DIR;
mkdirs the target on first use; unset / empty / unwritable falls
back to None (system tempdir, OSS-pre-fix behaviour). Both
materialize_query (parquet path) and _extract_via_legacy (CSV
fallback) and the sliced-CSV concat path in storage_api use the
helper now.

docker-compose.yml defaults AGNES_TEMP_DIR=/data/tmp on app, scheduler,
and extract services. The data volume is the dedicated disk in
production layouts and a plain docker volume in single-disk
dev/laptop setups — same blast radius as the previous /tmp default
on the latter, no regression.
2026-05-07 12:12:14 +02:00
ZdenekSrotyr
c97fd504c5 release: 0.45.0 — easy-wins bundle (#84 #164 #177 #178 #203 #204)
Operator-and-analyst quality bundle: a security fix for the optional
Telegram bot, two CLI gaps closed, and three rounds of UX polish on
`agnes diagnose` and `agnes pull` so non-TTY consumers (CI runners,
Claude Code SessionStart hooks, sub-agent watchdogs) get readable,
actionable signal.

- Pairing-code RNG: random.choices -> secrets.choice (CSPRNG).
- Telegram script runner: refuse out-of-shape usernames before sudo -u.

CLAUDE.md.bak.<ISO-timestamp> before regenerating.

- agnes admin unregister-table <id> -> DELETE /api/admin/registry/{id}
- agnes admin update-table <id> --field=value ...  -> PUT /api/admin/registry/{id}

response but never promotes the headline. BQ billing-equals-data check
downgraded warning -> info.

default (5 s / 1 MiB vs 30 s / 10%) so sub-agent watchdogs don't kill
the pull as a hung process. New env knobs:
AGNES_PULL_PROGRESS_INTERVAL_{SECONDS,BYTES}.

--include-schema (or ?include=schema) to opt back in.

Tests: 120 passed across the touched modules, including new tests for
each fix. Pre-existing failures on main (DB migration v1->v9, binary
rename) are unrelated and not introduced here.
2026-05-07 11:43:16 +02:00
ZdenekSrotyr
df896816d8 chore: rename stale 'da' references to 'agnes' + CHANGELOG
Drive-by docstring/comment cleanup in cli_artifacts.py and update_check.py.
CHANGELOG entry for the auto-upgrade feature shipped in this branch.
2026-05-06 23:23:59 +02:00
ZdenekSrotyr
f4bc04958d fix: Devin Review #1 — apply backtick mask to wrapping rewriter
`_rewrite_user_sql_for_bigquery_query` does its own bare-name detection
(mirroring the non-RBAC parts of `_bq_guardrail_inputs`). The backtick
masking from #201 was applied to `_bq_guardrail_inputs` and the
forbidden-table loop, but missed this third site — so a registered
local-mode table name appearing as the table segment of a
user-supplied full backtick path (e.g. ``\`prj.ds.orders\`` matching
registered local ``orders``) tripped the cross-source guard and
forced every backtick-path query into the 50-100× slower
ATTACH-catalog fallback.

Mask once at the top of the function, route both the BQ-name
detection (line ~830) and the cross-source check (line ~867) through
the masked copy. New regression test
`test_local_name_inside_backtick_path_does_not_trip_cross_source`
proves the wrapper now wraps when it should.
2026-05-06 21:06:21 +02:00
ZdenekSrotyr
824e3cb636 feat(query): registry-gate full backtick BigQuery paths (#201)
Adds Pass 3 to `_bq_guardrail_inputs` that scans user SQL for full
backtick paths `<project>.<dataset>.<table>` and gates them
identically to the `bq."<dataset>"."<table>"` pass:

- Project must match the configured BigQuery data project
  (`get_bq_access().projects.data`). Mismatch → HTTP 403
  `bq_path_cross_project`.
- Path must point at a registered row. Unregistered → HTTP 403
  `bq_path_not_registered`.
- Non-admin caller must hold a grant on the registered row's id.
  Missing grant → HTTP 403 `bq_path_access_denied`.

Pre-fix, full backtick paths bypassed Agnes RBAC entirely — only the
service account scope limited reach. Post-fix the boundary matches
what `agnes catalog`-driven flows already enforce. Admin still
bypasses the per-id grant check but cannot bypass registration or
project match.

Pass 3 also seeds `dry_run_set` for resolved registered paths so the
cost-cap dry-run runs against the same physical table the user named
— composing cleanly with the Layer 2 fail-fast fallback.
2026-05-06 18:02:53 +02:00
ZdenekSrotyr
c32be3fe96 fix(query): cap-guard fallback retries original SQL, fails fast (#201)
When BQ rejects the rewritten dry-run SQL with `bq_bad_request`, the
cap-guard now retries with the user's ORIGINAL SQL instead of building
a synthetic `SELECT * FROM <table>` per registered table. The
synthetic path threw away user filters / projections / partition
predicates and routinely ballooned the estimate to "full table size",
falsely tripping `remote_scan_too_large` on legitimate narrow queries
(typical issue #201 trace: rewriter corrupts a backtick path → BQ
parse error → synthetic over-estimate → 400).

Behaviour:

- Rewritten SQL succeeds: same as before (issue #171 single-dry-run).
- Rewritten SQL parse-errors, original SQL succeeds: use original
  estimate. Common case for users submitting BQ-native input.
- Both fail with `bq_bad_request`: HTTP 400 `remote_estimate_failed`
  with a hint pointing at `agnes catalog` / BQ-native syntax. No
  silent over-estimate.
- Non-parse BQ error (forbidden, upstream): still 502 as before.

This is a behaviour change for clients matching error kinds — failure
to estimate scan size now surfaces as `remote_estimate_failed`
instead of being masked behind `remote_scan_too_large` from the
synthetic path.

Replaces the existing `test_guardrail_falls_back_to_per_table_estimate_on_bq_parse_error`
(which pinned the old contract) with `test_fallback_tries_original_sql_first`
and `test_fallback_fails_fast_on_pure_duckdb_syntax`.
2026-05-06 18:02:53 +02:00
ZdenekSrotyr
720a2180c0 fix(query): rewriter respects backtick segments (#201)
`agnes query --remote` corrupted user SQL when the request contained a
full BigQuery backtick path (`<project>.<dataset>.<table>`) whose
table segment matched a registered bare-name alias. The bare-name
rewriter used `\b` word-boundary matching against the lower-cased SQL;
both `.` and `` ` `` are non-word characters, so the regex fired
INSIDE the user's backtick path and produced malformed nested-backtick
SQL that BigQuery rejected at parse time.

Fix:

- Add `_mask_backticks(sql)` helper: replace each `…` segment with
  spaces of equal length, preserving offsets so word-boundary
  searches find positions only outside backticks.
- `_bq_guardrail_inputs` (bare-name pass + forbidden-table pass)
  searches against the masked SQL.
- `_rewrite_bq_table_refs_to_native` Pass 1 splits the SQL on
  `(\`[^\`]*\`)` and rewrites only the outside-backtick chunks. Pass
  2 (`bq."ds"."tbl"` → backtick form) is unchanged — its prefix can't
  appear inside backticks.

Adds three regressions covering the rewrite + guardrail paths.
2026-05-06 18:02:53 +02:00
ZdenekSrotyr
81d065b1ea fix: Devin Review #1 — bigquery_query() first arg uses billing project, not data
In cross-project BQ setups (where billing != data), the SA typically has
serviceusage.services.use on the billing project but not on the data
project. The rewriter passed bq.projects.data as the first arg to
bigquery_query(), which BQ uses as the execution + billing project →
403 USER_PROJECT_DENIED.

Match the convention used everywhere else in the codebase
(app/api/v2_scan.py, app/api/v2_sample.py, app/api/v2_schema.py,
connectors/bigquery/extractor.py): backtick paths inside the inner SQL
use the **data** project (resolves the actual table location), the
bigquery_query() first arg uses the **billing** project (decides who
pays + which project the job runs under). For single-project deploys
the two are identical so the fix is a no-op there.

Test pins the cross-project case: data-prj for backticks, billing-prj
for the bigquery_query() first arg.
2026-05-06 14:07:38 +02:00
ZdenekSrotyr
aee585fac6 fix: devil's advocate R2 — narrow shared-client try, PID tmp suffix, Syntax error anchor
R2 adversarial review surfaced 3 issues, all addressed:

#1 cli/client.py:572-577 outer try/except wrapped both _get_shared_client()
AND the actual download. A 401/403/404/5xx from the server triggered a
full second download attempt with a fresh client — wasted bandwidth on
hard failures, no fail-fast on revoked PAT. Narrowed the try to only
the shared-client construction; the download itself is no longer
retried under the fallback except.

#2 concurrent agnes pull invocations (e.g. SessionStart hook + manual
run) collided on bare <target>.tmp / <target>.partN paths — one process's
in-progress write got yanked by the other's cleanup, manifest hash
check then failed spuriously. Per-process suffix (<target>.{pid}.tmp,
<target>.{pid}.partN) makes intermediate files disjoint; the final
os.replace to the bare target is atomic so last-writer-wins.

#3 _looks_like_bq_rewrite_parse_error patterns 'Syntax error' could
false-positive on a query like WHERE log_msg = 'Syntax error in foo'
that fails for an unrelated reason (quota, network) and has the
literal substring echoed in the error text. Anchored to 'Syntax error: '
(with trailing colon) — BQ always emits the colon in this error
format, user SQL string literals normally don't.
2026-05-06 13:57:29 +02:00
ZdenekSrotyr
e5645fd280 fix: devil's advocate R1 — chunked probe, parse-error heuristic narrow, pool settings refresh, content-length sanity, multi-project skip
R1 adversarial review surfaced 5 issues, all addressed:

#1 chunked download silently disabled in non-Caddy deployments (HEAD on
GET-only FastAPI route returns 405). _probe_range_support now falls back
to GET with Range: bytes=0-0 when HEAD fails — works against both
Caddy file_server (HEAD-friendly) and dev FastAPI direct (GET-only).

#2 parse-error fallback heuristic too broad — matched on Unrecognized
name / Function not found / No matching signature / Invalid cast,
which BQ surfaces for ordinary user-column typos. That triggered slow
ATTACH-catalog retry on every typo (2× latency tax). Narrowed to just
'Syntax error' / 'syntax error' which are the genuine DuckDB-vs-BQ
dialect mismatch markers.

#3 apply_bq_session_settings was only run on fresh-built pool entries,
not on reuse. An operator's /admin/server-config change to bq_query
_timeout_ms wouldn't propagate to long-lived pooled sessions until
restart. Fixed: re-apply on every pool acquire (idempotent + fail-soft).

#4 content-length sanity bound — a misconfigured proxy returning a
wildly inflated Content-Length would cause overlapping chunked Range
requests against the actual file → corrupt assembled output (caught
by manifest hash check, but only after wasted bandwidth). Cap at 100
GiB; above that, drop to single-stream.

#5 rewriter assumed every BQ row resolves under the single
bq.projects.data project. Bucket containing '.' suggests a project-
qualified bucket (multi-project deployment); rewriter would silently
target the wrong project. Conservative skip with regression test.
2026-05-06 13:50:46 +02:00
ZdenekSrotyr
8e56d45c68 fix(query): code-review fixes — outer LIMIT wrap, dollar-quoting, parse-error fallback
Address code-reviewer findings on the bigquery_query() rewrite path:

1. Outer LIMIT wrap — bigquery_query() materialises BQ result into DuckDB
   before fetchmany sees it (vs ATTACH-catalog Storage Read API streaming).
   A user 'SELECT *' against a billion-row remote table would buffer the
   entire result before request.limit applied. Wrap rewritten SQL in an
   outer 'LIMIT N+1' so the cap pushes into the BQ job itself.

2. Dollar-quoted inner SQL — naive replace("'", "''") doubling missed
   DuckDB backslash-escape sequences (\\, \\n, \\t, …). A predicate
   like 'WHERE name = ''O\\'Brien''' was unsafe under the doubling
   path. DuckDB $bqq_inner$ … $bqq_inner$ form takes the inner SQL
   verbatim with no escapes whatsoever. Falls back to legacy doubling
   if user SQL improbably contains the literal tag.

3. Parse-error fallback — when the rewritten path fails with a BQ-side
   parse / validation error (DuckDB-only syntax like ::INT cast that
   survives identifier rewrite but BQ refuses), retry the user's
   original SQL via the legacy ATTACH-catalog path so the request still
   succeeds. Mirrors the existing dry-run fallback contract.

4. CHANGELOG — delete duplicate CLI bullets that landed under
   already-released [0.38.1] (file corruption from merge — entries are
   correctly under [0.39.0]).
2026-05-06 13:29:45 +02:00
ZdenekSrotyr
b2c1ff143c fix(query): rewrite BQ-backed user SQL via bigquery_query() to enable predicate pushdown
User SQL hitting query_mode='remote' BigQuery rows was 50-100x slower
than the equivalent direct bigquery_query() call because DuckDB's master
view (CREATE VIEW … AS SELECT * FROM bigquery.<ds>.<tbl>) does not push
WHERE/SELECT/LIMIT into BQ in ATTACH-catalog mode. The BQ extension opens
a Storage Read API session over the entire upstream table; on >100M-row
sources this was 70-150s and frequently failed with 'Response too large
to return'.

Extract the existing dry-run rewriter's core (table-name → BQ-native
backtick path) into a shared helper. Add an execution-path rewriter
that wraps the whole user SQL in bigquery_query('<project>', '<inner>')
so the BQ planner sees the full query and engages partition pruning +
projection pushdown server-side.

Conservative fall-through: cross-source JOINs (BQ ↔ Keboola/Jira local),
queries already containing bigquery_query(, and unconfigured BQ project
all skip the rewrite and run the original SQL via ATTACH-catalog so
behavior degrades gracefully.
2026-05-06 13:02:34 +02:00
ZdenekSrotyr
05e535d743 fix(admin/tables): unescape shell-quoting backslashes in descriptions 2026-05-06 10:13:49 +02:00
ZdenekSrotyr
6c94d2cbce Merge remote-tracking branch 'origin/main' into pr180-review
# Conflicts:
#	CHANGELOG.md
#	pyproject.toml
2026-05-06 07:27:25 +02:00
ZdenekSrotyr
df2c33147c fix: Devin Review on #194 round 2 — 3 BUG-class findings
1. instance.yaml overlay path now matches read site under STATE_DIR.
   Three sites updated:
     - app/api/admin.py:1005 (server-config endpoint writer)
     - app/api/admin.py:2610 (configure endpoint writer)
     - app/instance_config.py:106 (overlay reader)
   All three now go through _state_dir() so under flat-mount layout
   (STATE_DIR=/data-state) the irreplaceable instance.yaml overlay
   lands on the state disk (sdc) instead of the regenerable data
   disk (sdb). Without this fix, .env_overlay correctly went to the
   state disk while instance.yaml went to the data disk — config
   would be lost if an operator wiped sdb.

2. Strip customer-specific tokens from OSS repo per CLAUDE.md
   vendor-agnostic rule:
     - docker-compose.host-mount.yml: 'a deployer (Groupon FoundryAI)'
       → 'a deployer in production'
     - docker-compose.flat-mount.yml: 'caused 2026-05-05 in the
       Groupon FoundryAI deployment' → generic 'production failure
       mode'
     - docs/state-dir.md: rewrote the incident reference to describe
       the failure mode abstractly without naming the deployment;
       updated the recommendation table to say 'shadow-mount class'
       instead of dating the specific incident.

3. Updated docs/state-dir.md 'What reads STATE_DIR' to list all
   read/write sites including the three migrated in this round
   (admin.py, instance_config.py, marketplaces.py).

ANALYSIS finding (tls-rotate.sh hardcoded host-mount.yml) deferred
— same operator-side class as auto-upgrade.sh hardcoded host-mount,
documented limitation per the PR body.
2026-05-05 20:02:50 +02:00
ZdenekSrotyr
b6543c9c55 fix: Devin Review on #194 — 2 BUG-class findings
1. .env_overlay write paths now match read path under STATE_DIR.
   app/main.py:343 reads via _state_dir() (post-PR #194), but two
   write sites still hardcoded ${DATA_DIR}/state/.env_overlay:
     - app/api/admin.py:2687 — configure endpoint secrets persistence
     - app/api/marketplaces.py:152 — marketplace PAT persistence
   Under flat-mount layout (STATE_DIR=/data-state) the admin UI wrote
   secrets to /data/state/.env_overlay while the app read from
   /data-state/.env_overlay, silently dropping the value on next
   restart. Both write sites now go through _state_dir().

2. host-mount.yml: caddy inherits data:/srv:ro from base, but with
   no service populating the data: named volume (other services
   switched to direct /data binds), the inherited mount points at an
   empty Docker volume — try_files finds nothing, every parquet
   download falls through to uvicorn, defeating the v0.36.0
   file_server bypass under the host-mount layout. Added a caddy
   override that restates all mounts including a direct /data:/srv:ro
   bind. Mirrors the comment + treatment already in flat-mount.yml.
2026-05-05 19:47:12 +02:00
ZdenekSrotyr
f2ce915458 fix: Devin Review on #188 commit 28423907 — 2 bugs
🚩 /api/v2/catalog still async def while now calling sync stat()

`/api/v2/catalog` was left as `async def` when the rest of Tier 1 was
converted, on the assumption it was lightweight. The new
`_materialized_size_hint` populator added in this PR calls
`Path.stat()` / `Path.exists()` for every visible row to bucket the
parquet size — on a local FS that's microseconds, but on a
network-mounted DATA_DIR (NFS / CIFS / GCS-FUSE) those syscalls
can block the event loop. Convert to plain `def` so FastAPI
auto-offloads to the thread pool, mirroring /api/query etc.

🔴 stream_download translates HTTPStatusError as generic transport error

`response.raise_for_status()` inside the retry loop raises
`httpx.HTTPStatusError` on 4xx/5xx. After retries exhaust, the new
`isinstance(last_exc, httpx.HTTPError)` check at line 219 was eating
the status code: HTTPStatusError is a subclass of HTTPError, so the
generic transport translation produced "Unexpected error: HTTPStatusError"
instead of the informative "Client error '401 Unauthorized' for url …"
that callers expect. Fix: short-circuit HTTPStatusError before the
HTTPError branch — it re-raises verbatim so the caller's status-code
handling + the rich server error body (e.g. 401 expired token, 403
cross_project_forbidden) reach the analyst.

api_get / api_post / api_delete / api_patch don't have the same bug:
httpx Client.get/etc. don't raise HTTPStatusError unless the caller
explicitly calls .raise_for_status(), and our wrappers don't.
Only stream_download does, hence the targeted fix there.
2026-05-05 18:29:44 +02:00
ZdenekSrotyr
e5fb913cec perf: Tier 1 event-loop unblocking — async def → def on BQ-bound handlers
Five hottest BQ-touching endpoints were `async def` but invoked synchronous
DuckDB / BQ-extension calls inside the body. Under uvicorn's single event
loop that meant a single heavy `agnes query --remote` (waiting up to
~200 s for BQ's jobs.query) froze EVERY other request — /api/health,
dashboard, auth, even another query — for the full BQ wait. Operators
saw "VM idle, app frozen" during PR #188's testing.

Convert to plain `def` so FastAPI auto-offloads the body to the anyio
thread pool. Event loop stays free for non-BQ requests.

- app/api/query.py:execute_query
- app/api/v2_scan.py:scan_estimate_endpoint, scan_endpoint
- app/api/v2_sample.py:sample
- app/api/v2_schema.py:schema

Audit: 0 `await` statements in any converted handler (verified file-by-
file), so the rename is safe. Tests in tests/test_v2_*.py called the
handlers via `asyncio.run(...)` which now fails on a non-coroutine return;
swapped for direct calls (asyncio.run( -> ( ) — keeps paren balance).

Plus AGNES_THREADPOOL_SIZE env var (default 200, was anyio's stock 40)
in app/main.py:lifespan. Set via
anyio.to_thread.current_default_thread_limiter().total_tokens. 200 is
comfortable headroom for <50 concurrent analysts; bump for more.

480/480 impacted tests pass (the 2 remaining errors are a pre-existing
fixture setup issue in test_reader_smoke_matrix.py unrelated to this
change).
2026-05-05 17:44:08 +02:00
ZdenekSrotyr
30e81a15b9 feat(workspace-prompt): decision tree + size-hint so analyst Claude gets it right first try
Three concrete changes addressing the "analyst Claude misuses the CLI"
class of bugs (image.png table — issues #3, #5, plus the recurrent
"how big is this table" guesswork):

1. config/claude_md_template.txt — the template agnes init writes to
   <workspace>/CLAUDE.md. Surfaces every catalog-row field with a why,
   adds a query_mode-based decision tree, explicit --estimate scoping
   (snapshot create ONLY — was the #1 first-try error), an agnes fetch
   → agnes snapshot create rename note, and a 6-row failure-mode table
   that maps each common error wording to its right next step.

2. app/api/v2_catalog.py — populate rough_size_hint for local +
   materialized rows from the on-disk parquet size, bucketed
   small/medium/large/very_large. Was hardcoded null with a TODO; AI
   couldn't tell "is this 6.8 GB" without a failed --remote round-trip.

3. cli/update_check.py — the [update] banner survived the da→agnes
   rename and printed "[update] da X is out of date" on every command,
   training analysts to associate the binary with the old name.

Verified by rendering the template against representative contexts
(33/33 tests pass) and running every use case from the original
screenshot through the real CLI against a dev VM.
2026-05-05 16:44:24 +02:00
ZdenekSrotyr
1be997f6d4 feat(caddy): file_server for parquet downloads — bypass uvicorn
A single analyst's multi-GB `agnes pull` held the only uvicorn worker
for the duration of the stream, starving UI / /api/health / every other
API endpoint. Container flipped to `unhealthy`. Triggered while a
6.8 GB `order_economics` pull was in-flight on prod 2026-05-05.

Caddy now intercepts `GET /api/data/{table_id}/download` and serves
the parquet directly via sendfile from the data volume (mounted r-o
at /srv inside the caddy container). RBAC enforced by `forward_auth`
to a new lightweight `GET /api/data/{table_id}/check-access` endpoint
(returns 204 / 403) — the bulk transfer never reaches uvicorn.

Path discovery via `try_files` over the known extract.duckdb v2 source
subdirs. Anything not at a static path falls through to the existing
app handler so legacy `src_data/parquet` and future connectors still
work without a Caddyfile change. Non-Caddy deployments are unchanged.

Stage 1 (multi-worker uvicorn) was considered but blocked by the
single-writer DuckDB lock on system.duckdb — workers > 1 would crash
at startup on "Could not set lock on file", the same race that pushed
the scheduler from in-process writes to HTTP-via-app. Multi-reader
workers + single-writer coordination is out of scope for this PR.
2026-05-05 16:41:33 +02:00
ZdenekSrotyr
4f04235502 feat(bigquery): bq_query_timeout_ms knob; default 600s (was 90s)
DuckDB BigQuery extension defaults `bq_query_timeout_ms` to 90 s, which
is too tight for analyst-scale queries against view-backed BQ datasets.
`agnes query --remote` HTTP 400'd with `Binder Error: Query execution
exceeded the timeout. Job ID: ...` whenever the underlying BQ job ran
longer than 90 s, even though the job itself was healthy.

Add `data_source.bigquery.query_timeout_ms` (default 600 000 ms = 10 min,
sentinel 0 falls through to the extension default). Applied via
`SET bq_query_timeout_ms` after every `LOAD bigquery` on every BQ-touching
DuckDB session: orchestrator's `_remote_attach` ATTACH path, BqAccess
session factory, and the standalone extractor. Configurable via
`/admin/server-config` UI.

Fail-soft: extension versions that don't recognise the setting silently
keep the default rather than poisoning the session.
2026-05-05 16:40:40 +02:00
ZdenekSrotyr
8d8d2c219e refactor(cli-store): pull/info → agnes admin store; add agnes store mine
Backup-orchestration commands were split across two namespaces (pull in
agnes store, push in agnes admin store), which broke the operator
mental model — pull/push are a paired operation and should sit
together.

Move pull + info into agnes admin store so all bulk operations share
one help screen. Add agnes store mine as the user-facing equivalent —
calls the same /api/store/bundle.zip endpoint with ?owner=me, which
the server resolves to the caller's user_id. Authors can archive
their own uploads without admin role; whole-Store bulk reads stay
admin-flavored as a discoverability hint.

Server: 3-line addition to export_bundle handles owner='me' as a
magic alias for the caller. No new endpoint.

Tests updated: pull/info expectations move from agnes store to
agnes admin store; new tests cover agnes store mine and the
?owner=me server resolution. 69/69 store tests green locally.
2026-05-05 13:49:18 +02:00
ZdenekSrotyr
3d63965a67 Merge remote-tracking branch 'origin/main' into pr180-review
# Conflicts:
#	CHANGELOG.md
#	app/web/templates/_app_header.html
2026-05-05 12:05:50 +02:00
ZdenekSrotyr
a8f9d065c8 feat(store): bundle export/import + agnes store update + agnes admin store push
Adds whole-Store backup/restore primitives so an external CI/CD job can
mirror the Store to a git repo (and restore back from one).

REST:
- GET /api/store/bundle.zip — deterministic ZIP of all (filtered) Store
  entities. Layout: manifest.json + entities/<id>/{plugin,assets}/.
  Manifest carries owner_email for cross-instance restore. Auth: any
  authenticated user (Store is community-open).
- POST /api/store/import-bundle — admin-only restore. Modes
  merge|replace|skip; owner resolution by email with stub-disabled-user
  fallback when the email is unknown on the target instance.

CLI:
- agnes store update <id> [--description X] [--zip PATH] ... — in-place
  edit (server PUT permits owner OR admin per F4). Closes the missing
  edit affordance for analysts who want to fix a typo or push a new
  ZIP without losing install_count.
- agnes store pull [-o store.zip] [--unpack DIR] — download the bundle.
  --unpack streams + extracts so an external git-backup workflow can
  drop the tree straight into a repo and `git add .`.
- agnes store info [--json] — counts + size summary.
- agnes admin store push <zip-or-dir> [--mode ...] — admin-only restore.
  Auto-zips a directory client-side so a working-tree → server
  round-trip is one command.

cli/v2_client.py gains api_get_stream helper for binary downloads.

Tests: 5 new server tests (bundle shape + filters + round-trip + stub
user creation + skip mode + admin-only gate) + 11 new CLI tests
(update, pull/unpack, info, admin push). 66/66 store-related tests
green locally.
2026-05-05 11:51:31 +02:00
ZdenekSrotyr
d878764ac1 fix(session-collector-api): mirror sibling endpoints' audit-on-exception (Devin Review on #179)
Devin flagged that run_session_collector still had the same audit-skip
gap I fixed in run_verification_detector and run_corporate_memory in
the previous two rounds — a PermissionError walking /home, an OSError
on /data/user_sessions mkdir, or any other unhandled exception from
collector.run() would skip the audit_log row and only show in docker
logs.

Same try/except + unhandled_error pattern as the sibling endpoints.
All three LLM-pipeline run-* endpoints now record their failures the
same way; /admin/scheduler-runs sees them. Regression test in
tests/test_admin_run_endpoints.py::TestRunSessionCollector::test_unhandled_exception_still_audits.
2026-05-05 09:31:33 +02:00
ZdenekSrotyr
e86da72997 fix(corporate-memory-api): mirror verification-detector audit-on-exception (Devin Review on #179)
Devin flagged that run_corporate_memory still had the same audit-skip
gap I just fixed in run_verification_detector — if collect_all() throws
anything other than the already-translated ValueError (DuckDB lock,
network blip, unexpected SDK error), the audit_log row was never
written and /admin/scheduler-runs missed the failure.

Same try/except + unhandled_error pattern as the verification_detector
fix from 4c4dfee8. Regression test in
tests/test_admin_run_endpoints.py::TestRunCorporateMemory::test_unhandled_exception_still_audits.
2026-05-05 09:11:13 +02:00
ZdenekSrotyr
4c4dfee8e6 feat(profile): /profile/sessions page + audit on detector exception + correct SCHEDULER_AUDIT_ACTIONS
Three changes addressing user feedback during e2e test of #179 + Devin Review on e86dd5ed.

1) /profile/sessions — new self-service user page in the user menu.
   Lists all session jsonls the caller uploaded via `agnes push` joined
   against session_extraction_state. Each row shows uploaded_at, file
   size, status badge (pending/processed/extracted), processed_at, and
   items_extracted. The page docstring + help text explicitly call out
   that items_extracted=0 means the verification detector ran fine but
   the LLM found no claims to track — that's the documented "no items"
   outcome, not a broken pipeline. Closes the gap surfaced during the
   e2e test of #176 where a user could see their sessions on disk and
   process them through the LLM but had no UI to inspect what happened.

2) run_verification_detector audits unhandled exceptions (Devin #1).
   If detector.run() threw anything other than the already-translated
   ValueError, the audit_log row was never written. The endpoint now
   wraps detector.run in try/except, records the exception in
   audit_params["unhandled_error"], then re-raises as 500 after audit.
   The /admin/scheduler-runs page surfaces the failure row with the
   error type + message.

3) SCHEDULER_AUDIT_ACTIONS list corrected (Devin #2). Previous list
   had "marketplaces_sync_all" (wrong — actual is "marketplace.sync_all")
   plus "data_refresh" and "scripts_run_due" which app/api/sync.py and
   app/api/scripts.py don't write to audit_log. Fixed to the four
   actually-logged strings; comment points at the missing audit calls
   as a follow-up.

Tests: tests/test_web_ui.py adds TestAdminRoleGuards::test_profile_sessions_page_no_admin_required and tightens test_admin_scheduler_runs_page_admin_only to assert the correct marketplace.sync_all string.
2026-05-05 08:57:35 +02:00
ZdenekSrotyr
f0d091f721 fix(store): scratch dir leak on ZIP validation failure (Devin Review)
create_entity + update_entity created the `scratch` temp dir inside one
try/finally but cleaned it up in a separate one. Validation HTTPExceptions
raised by _safe_zip_extract (zip_unsafe_path, zip_too_large_uncompressed)
or the BadZipFile→422 conversion exited the first scope, and the second
finally was never entered → temp dir leaked on every failed upload.

Devin flagged this on the F2 commit. The leak pre-existed (zip_unsafe_path
was the original vector); F2 added zip_too_large_uncompressed to the same
broken cleanup path. Fixed by collapsing scratch creation + cleanup into
one outer try/finally that covers both extraction AND metadata/bake; the
inner try/except/finally still handles BadZipFile→422 + tmp file cleanup.

Same restructure in update_entity. Regression test
`test_scratch_dir_cleaned_up_after_failed_extraction` triggers a
zip_unsafe_path 422 and asserts tmp/agnes_store_* contains no leaked
dirs.
2026-05-05 08:52:15 +02:00
ZdenekSrotyr
fd3c76d21b fix(store): security + correctness blockers found in PR review (F1, F2, F4, F5)
Three independent reviews of PR #180 surfaced four real defects in the new
Store / my-ai-stack surface. CHANGELOG entries detail each; one-liners:

- F1 video_url XSS: any authenticated user could upload a Store entity
  with `video_url=javascript:...` and pop XSS in any viewer's session via
  the `<a href=...>` "Watch video" link in store_detail.html. Jinja2
  autoescape doesn't block URI schemes inside attribute values. Fixed by
  scheme-validating to http(s) only on create + update; 400 invalid_video_url.

- F2 ZIP decompression bomb: _safe_zip_extract checked path-traversal but
  not declared file_size totals — a 50 MB compressed upload at 1:1000
  ratio decompresses to 50 GB and DOS the host disk. Fixed by summing
  zinfo.file_size across infolist() and refusing > 200 MB before
  extractall touches disk. 413 zip_too_large_uncompressed.

- F4 admin authz parity: PUT /api/store/entities/{id} was owner-only while
  DELETE allowed owner OR admin; the store-detail page hid Edit/Delete
  buttons from admin even though DELETE was permitted. Fixed by allowing
  admin on PUT and passing is_admin to the template; gate is now
  is_owner OR is_admin everywhere.

- F5 cross-owner suffix collision: sanitize_username is many-to-one
  (alice.smith / alice_smith both → alice-smith). Two such users uploading
  entities with the same display name produced identical
  `<name>-by-<username>` suffixes, silently colliding in the served
  agnes-store-bundle on-disk paths AND the manifest catalog (Claude Code
  dedupes by plugin.json `name`). Fixed by enforcing global uniqueness on
  the suffixed value at create_entity; 409 conflict_global_suffix.

F3 (ZIP symlink members) was investigated and confirmed to be a
false-positive — Python's stdlib ZipFile.extractall does not honor
symlink mode bits, so no exploit exists.

9 new regression tests in tests/test_store_api.py::TestStoreSecurityFixes
covering all four. Test run locally: 60/60 store-related tests pass.
2026-05-05 08:18:02 +02:00
ZdenekSrotyr
e68c2d3f0f fix(session-collector): argv-free run() helper, drop SystemExit footgun (Devin Review on #179)
run_session_collector called collector.main() which did argparse.parse_args()
on uvicorn's sys.argv (['app.main:app', '--host', ...]) → sys.exit(2) →
SystemExit(2), which inherits from BaseException, escapes FastAPI handlers,
and propagates through the thread pool. Every scheduler tick that fired the
endpoint either 500-ed or risked killing the uvicorn worker.

services/session_collector/collector.py now exposes run(dry_run, verbose)
that returns (rc, stats); main() is a thin CLI shim that parses argv and
delegates. The admin endpoint calls run() directly and audit-logs the
per-run stats (users_processed, files_copied, files_skipped) instead of
just the rc. Three regression tests in TestRunHelper.

Closes Devin Review finding on app/api/admin.py:2819 (#179).
2026-05-05 06:31:55 +02:00
ZdenekSrotyr
9f33e24bf9 fix(config): overlay-aware LLM consumers + env-ref resolution (#179 review)
Devin BUG: /api/admin/configure seeds an ai: block to the writable
overlay at DATA_DIR/state/instance.yaml, but the three LLM consumers
imported from config.loader.load_instance_config — which reads the
static config dir only. Even if they had read the overlay, the loader
ran yaml.safe_load directly without passing through _resolve_env_refs,
so '${ANTHROPIC_API_KEY}' would have stayed a literal placeholder. The
pipeline appeared to work because the factory falls back to the env
var directly, but the overlay path itself was dead code.

Two fixes, both required:

1. Switched the three LLM consumers to app.instance_config.load_instance_config:
   - services/corporate_memory/collector.py:collect_all
   - services/verification_detector/__main__.py:main
   - app/api/admin.py:run_verification_detector

2. app/instance_config.py runs the loaded overlay through
   config.loader._resolve_env_refs *before* the deep-merge, so
   '${ANTHROPIC_API_KEY}' resolves at config-load time.

New regression suite tests/test_instance_config_overlay.py pins:
- env-ref resolution against the overlay (resolved when env set,
  empty when env missing — never the literal placeholder)
- deep-merge still preserves static-only sections
- the three consumers reach app.instance_config (inspected via
  inspect.getsource so a future refactor that reverts the import
  fails the test)
- end-to-end: a seeded overlay + ANTHROPIC_API_KEY env reaches the
  factory with a resolved api_key
2026-05-05 05:57:22 +02:00
ZdenekSrotyr
98a8aba3be fix(tests): align test_llm_connector with new factory + fail-fast (#179 review)
The PR rewrote collect_all() to call the new
create_extractor_from_env_or_config() helper, but the existing tests
still mocked the old direct create_extractor() symbol and the old
silent-skip-on-missing-config behavior. Five tests in
TestCorporateMemoryCollector and one in TestCollectorExtractorIntegration
were red on the PR branch.

Changes:
- Tests now mock connectors.llm.create_extractor_from_env_or_config
  (the symbol the collector imports lazily).
- Renamed test_collect_all_no_ai_config_skips ->
  test_collect_all_no_ai_config_or_env_raises and
  test_collector_handles_invalid_config -> test_collector_raises_on_invalid_config.
  Both assert pytest.raises(ValueError) — the explicit fail-fast
  semantics defect 5 of #176 was supposed to enforce.
- collect_all() no longer swallows the factory's ValueError into
  stats["errors"]; it propagates so the scheduler / admin endpoint
  surface the actionable misconfiguration message instead of
  pretending the run was a no-op.
- /api/admin/run-corporate-memory translates the propagated ValueError
  into a 500 with the factory's message, matching
  /api/admin/run-verification-detector.
2026-05-05 05:55:01 +02:00
Minas Arustamyan
d5a7c9ad79 feat(store): /store + /my-ai-stack — community marketplace + per-user composition
Adds a community-driven Store where any authenticated user uploads
skills/agents/plugins as ZIPs, plus /my-ai-stack as the per-user
composition view. The served Claude Code marketplace is now:

    (admin_granted ∖ opt_outs) ∪ store_installs

Skill + agent installs are merged into a single `agnes-store-bundle`
plugin in the served marketplace; type=plugin uploads stay standalone.
Names are suffixed with `-by-<owner-username>` at upload time so two
owners can use the same display name without colliding in Claude Code's
flat skill/agent namespace.

Schema v23 → v24 adds three tables:
  - store_entities       — community-uploaded skills/agents/plugins
  - user_store_installs  — what each user has chosen to install
  - user_plugin_optouts  — opt-out overlay on top of admin grants

Admin grant-delete drops every user's opt-out for that plugin so
re-grant resets cleanly to enabled (no sticky personal preference).

UI:
  - /store      — e-commerce-style listing with type/category/owner
                  filters, search, pagination, owner-aware [Install]
                  buttons, clickable cards
  - /store/new  — 2-step upload wizard with drag & drop, preview
                  validation (POST /api/store/entities/preview), docs
                  multi-upload, photo + video URL
  - /store/{id} — detail page with hero, file list, docs, owner
                  actions (Edit/Delete) for the uploader
  - /my-ai-stack — Granted plugins (toggle opt-out) + From the Store
                  (uninstall) sections
  - Admin nav: Marketplaces moved into Admin dropdown, renamed to
                "Curated Marketplaces"

Validation hardening: type-mismatch guards reject skill ZIP uploaded as
agent (or vice versa), and plugin ZIPs masquerading as skills/agents.
Human-readable error messages mapped client-side from machine codes.

Cross-source naming: Store entity-id-prefixed dirs (`plugins/store-<id>/`)
plus the bundle (`plugins/store-bundle/`) avoid collisions with admin
marketplaces (whose `store` slug is reserved by `is_valid_slug`).

Bundle composition is content-hashed at serve time — install/uninstall
or owner re-upload bumps the bundle's plugin.json `version`, so Claude
Code's auto-update toggle picks up changes.

Tests: 50+ new tests across naming, repositories, filter (admin ∪ store
∪ bundle), API (upload/install/uninstall/delete/preview/docs), end-to-end
marketplace.zip with bundle merging.
2026-05-05 02:53:49 +02:00
ZdenekSrotyr
a621a415cc fix(health): session-pipeline staleness check (#176)
GET /api/health/detailed now returns a session_pipeline service entry.
Heuristic:
  max(mtime of /data/user_sessions/**/*.jsonl) <=
  max(processed_at in session_extraction_state) + grace_seconds

grace_seconds = 2 × verification-detector cadence (default 30 min;
configurable via SCHEDULER_VERIFICATION_DETECTOR_INTERVAL).

When the assert fails, status='warning' (never 'error') with an
actionable detail pointing at the verification-detector scheduler job.
A warning bubbles up to the existing overall='degraded' aggregation —
operators querying /api/health/detailed (or /agnes diagnose system)
get a clear breadcrumb instead of a silently-broken pipeline.

Cold-start case (no session files, or files newer than the grace
window with empty state table) is handled explicitly to avoid noise
on a fresh deploy.

Tests: tests/test_health_session_pipeline.py.
2026-05-05 00:04:28 +02:00
ZdenekSrotyr
45de71e8ab fix(scheduler): wire LLM pipeline into scheduler-v2 (#176)
The session-collector, verification-detector, and corporate-memory
services now run on the same scheduler-v2 model that already drives
data-refresh, health-check, script-runner, and marketplaces:

- New admin endpoints in app/api/admin.py:
    POST /api/admin/run-session-collector
    POST /api/admin/run-verification-detector
    POST /api/admin/run-corporate-memory
  All admin-gated, sync-def (FastAPI thread pool), with one audit row
  per invocation. Same single-writer-of-system.duckdb pattern as the
  existing /api/marketplaces/sync-all job.

- services/scheduler/__main__.py JOBS gains three entries with offset
  cadences (10m / 15m / 17m, all coprime modulo the 30s tick) so the
  three LLM-backed jobs don't fire on the same tick and stack their
  API + DB load.

- The verification-detector endpoint surfaces the LLM factory's
  fail-fast ValueError as HTTP 500 with the actionable message,
  preserving the no-silent-skip contract from the previous commit.

Tests:
- tests/test_admin_run_endpoints.py covers admin gating + scheduler
  registration + endpoint contract.
- tests/test_scheduler_sidecar.py existing tests continue to pass.
2026-05-04 23:57:43 +02:00
ZdenekSrotyr
bbb04ac041 fix(setup): seed default ai: block + env-var fallback (#176)
POST /api/admin/configure now writes a default ai: block into the
instance.yaml overlay when the request leaves it untouched and either
ANTHROPIC_API_KEY or LLM_API_KEY is set in the environment. The block
references the env var via ${VAR} syntax — secrets never land in YAML.

connectors.llm.factory grows create_extractor_from_env_or_config which
falls back to ANTHROPIC_API_KEY / LLM_API_KEY when ai_config is empty
and raises a clear ValueError when neither is available. Both
services/corporate_memory and services/verification_detector switch to
the new helper, replacing the old 'silently skip when ai: missing'
path that was the silent-failure root cause.

Tests:
- tests/test_setup_ai_block.py — overlay seeding contract.
- tests/test_llm_provider_env_fallback.py — fallback + fail-fast.
2026-05-04 23:55:19 +02:00
ZdenekSrotyr
5915f92eaa fix(query-guardrail): single-pass alternation regex (Devin Review on query.py:464)
The iterative bare-name rewriter (one re.sub per name, longest-first)
was vulnerable to cross-contamination when the GCP project ID contained
a registered table name as a hyphen-delimited word.

Concrete repro:
  project        = 'my-ue-project'
  registered     = ['orders', 'ue']
  user SQL       = 'SELECT * FROM orders JOIN ue ON ...'
  iter 1 (orders): produces 'FROM `my-ue-project.fin.orders` JOIN ue ...'
  iter 2 (ue):     '\bue\b' matches 'ue' INSIDE 'my-ue-project' (hyphen
                   creates word boundary on both sides) — corrupts
                   the iter-1 path

Fallback at query.py:576 caught the resulting BQ parse error and fell
back to per-table SELECT * estimate, so impact was over-estimation,
not fail-open — but the #171 partition-pruning fix silently degraded
to pre-fix behavior whenever a project name shared a hyphen-segment
with a registered table.

Fix: single re.sub call with an alternation regex sorted longest-first.
Single-pass means each source position is processed exactly once, so
freshly-inserted backticked text from one match isn't re-scanned by
later names in the alternation.

Regression test
test_rewrite_helper_does_not_corrupt_when_project_id_contains_registered_name
covers the exact Devin repro.
2026-05-04 22:51:33 +02:00
ZdenekSrotyr
103efb69f0 chore(cli-rename): replace stale da verbs in active code paths
Bring admin UI, audit-log messages, code comments, and analyst-facing
skill docs in line with the post-bootstrap CLI surface (`agnes pull`,
`agnes push`, `agnes init`, `agnes snapshot create`). The legacy
`_LEGACY_STRINGS` detection tuple in `app/api/claude_md.py` and the hook
upgrade markers in `cli/lib/hooks.py` are intentionally left as-is —
they exist precisely to flag pre-rewrite content for re-authoring.

Strip "(folded from `da metrics list`)" / "(lifted from `da metrics
show`)" / "Replaces the old `da analyst status`" docstring noise — the
rename history is in CHANGELOG.md, not in module docstrings.
2026-05-04 21:10:43 +02:00
ZdenekSrotyr
500db8cd3c fix(query-guardrail): dry-run user SQL not synthetic SELECT * (#171)
Closes #171. The /api/query cost guardrail used to dry-run a synthetic
`SELECT * FROM <table>` for each registered remote-BQ row referenced
by the user SQL — which made BigQuery estimate a full table scan, with
column projection, predicate pushdown, and partition pruning all
disabled. Narrow queries on big partitioned/clustered tables (the
documented happy path for `agnes query --remote`) hit ~30,000×
over-estimates and got rejected with 400 `remote_scan_too_large` even
when BQ's own dry-run reported single-digit MB.

Pavel's report on #171 traced the root cause and proposed the fix:
rewrite the user SQL to BQ-native syntax and dry-run it as a single
job, exactly the way `bq query --dry_run` works.

Implementation:
- New helper _rewrite_user_sql_for_bq_dry_run rewrites bare registered
  names (word-boundary, case-insensitive, longest-first to avoid prefix
  collisions) + bq."<ds>"."<tbl>" forms to backticked
  `<project>.<ds>.<tbl>` paths.
- _bq_quota_and_cap_guard runs ONE dry-run on the rewritten SQL. Cap
  check uses the real estimate.
- Fallback path: if BQ rejects with bq_bad_request (e.g. DuckDB-only
  syntax like ::INT casts), the guard falls back to the pre-fix
  per-table SELECT * approach so non-portable queries still get a
  (loose) cap estimate instead of fail-opening. Non-parse BQ errors
  (forbidden, upstream) still propagate as 502.
- _bq_guardrail_inputs now also returns name_lookups so the rewriter
  has the (registered_name, bucket, source_table) mapping it needs.
- Per-table breakdown is unavailable from a composite dry-run; total
  bytes are pinned to dry_run_set[0] for the post-flight
  record_bytes(sum(...)) call to keep returning the right total.

Tests (7 new, 3 existing still pass):
- dry-run receives rewritten user SQL with WHERE clause intact (the
  load-bearing assertion for #171)
- single dry-run per request even with multiple registered tables
  (JOIN, UNION) referenced
- fallback to per-table SELECT * on bq_bad_request
- non-parse BQ errors (forbidden) still 502
- rewriter unit tests: bare + bq.path in same SQL, longest-name-wins
  on prefix collision, case-insensitive bare-name match
2026-05-04 21:08:21 +02:00
ZdenekSrotyr
e438170ade merge: pull #174 (BQ materialize view fix + concurrency, 0.33.0) into bootstrap branch
Brings in zs/materialize-sync-fix (PR #174):
- BigQuery view materialize works (wrap admin SQL in bigquery_query())
- Per-table mutex + fcntl.flock for concurrent COPY corruption
- Cost guardrail dry-run engages on materialized rows
- Schema v23 -> v24 migration: rewrite source_query to BQ-native
- Server-generated trivial source_query from bucket+source_table
- Validator backtick relaxation for materialized rows
- 0.33.0 release cut

Conflict resolution:
- CHANGELOG.md: keep our [Unreleased] (bootstrap rewrite content) ABOVE
  the new [0.33.0] section from #174. The bootstrap rewrite remains
  unreleased; it'll cut 0.34.0 (or later) when this PR merges to main.
- tests/conftest.py: union — keep our analyst-bootstrap fixture
  re-export AND #174's bq_instance / stub_bq_extractor fixtures.
- pyproject.toml auto-merged to 0.33.0 (matches the cut), correct.
- src/db.py auto-merged: SCHEMA_VERSION = 24, _v23_to_v24_finalize
  added — no overlap with our work which left schema at v23.
- CLAUDE.md auto-merged: schema-history paragraph extended with v24.

Verified: 79/79 across CLI bootstrap suite + materialize suite +
schema v24 migration tests pass locally on Python 3.13/macOS.
2026-05-04 20:53:00 +02:00
ZdenekSrotyr
3d58768143 fix: address Devin Review findings — incomplete renames + estimate guard
13 Devin findings across 10 files:

🔴 Critical:
- app/api/v2_catalog.py:42 — `_fetch_hint` returns `da fetch` in /api/v2/catalog
  responses (user-visible in every catalog list)
- cli/skills/agnes-data-querying.md — 11 stale `da fetch`/`da sync` refs in the
  bundled skill markdown
- config/claude_md_template.txt:38 — referenced `agnes pull --docs-only` flag
  that does NOT exist in agnes pull (removed; spec only ships --quiet/--json/
  --dry-run)

🟡 Important:
- app/api/admin.py:252 — `da fetch` in bq_max_scan_bytes hint
- cli/commands/auth.py:119 — `da sync` in import-token docstring (--help text)
- cli/commands/tokens.py:48 — "Export it so `da` can use it" prose
- ARCHITECTURE.md — 4 stale rows in CLI commands table
- README.md — stale paragraphs for analysts (da sync, da analyst setup)

🚩 Substantive observations addressed:
- app/api/query.py:249,302,489 — server-side error/help strings still said
  `da sync`/`da fetch` (returned in API responses to clients)
- cli/commands/snapshot.py:235-241 — DuckDB existence guard incorrectly
  blocked `--estimate` (server-side dry-run that never opens local DB).
  Added test ensuring estimate path skips the guard.

Skipped (intentionally historical):
- app/api/admin.py:2377,2429,2437 — historical comments describing past
  manifest-vs-sync_state bug; past tense, accurate to keep as `da sync`.
2026-05-04 20:05:06 +02:00
ZdenekSrotyr
5bffec641f chore(lint): final ruff fixes 2026-05-04 19:32:52 +02:00
ZdenekSrotyr
6c0846fd17 feat(config): expose materialize.lock_ttl_seconds in server-config
New top-level 'materialize' section, single field (lock_ttl_seconds).
Default 86400 (24h). Backs the file-lock TTL reclaim added in the
per-table-mutex change. Editable via PUT /api/admin/server-config and
the /admin/server-config UI.
2026-05-04 18:52:54 +02:00
ZdenekSrotyr
3871d5320a feat(admin): server-generate materialized source_query, allow BQ backticks
When admin registers a materialized BQ row with bucket+source_table but
no source_query, the server generates 'SELECT * FROM `<project>.<ds>.<tbl>`'
from instance.yaml's configured BQ project. Same fallback fires on PUT
when flipping to materialized. The backtick rejection guard, which was
appropriate for DuckDB-flavor source_query, is relaxed for materialized
rows since the new wrapping path (Task 2) runs admin SQL through BQ
jobs API which uses BQ-native syntax (backticks for dashed identifiers).
2026-05-04 18:37:27 +02:00
ZdenekSrotyr
c7c42de0f0 feat(sync): treat MaterializeInFlightError as 'skipped, in_flight'
_run_materialized_pass distinguishes due-check skips from in-flight
skips and never calls state.set_error for either. summary['skipped']
becomes a list of {table, reason} dicts; the end-of-pass log line
breaks out the in_flight subcount.

Hoists is_table_due to module-level import so test monkeypatching of
the symbol intercepts the call (the previous local import made
patches a no-op).
2026-05-04 18:11:38 +02:00
ZdenekSrotyr
59324f9361 feat(admin): scan CLAUDE.md override for legacy strings 2026-05-04 17:10:58 +02:00
ZdenekSrotyr
4ee7323436 feat(tokens): add scope + ttl_seconds fields with bootstrap-analyst clamp 2026-05-04 17:00:54 +02:00