Commit graph

269 commits

Author SHA1 Message Date
ZdenekSrotyr
8e56d45c68 fix(query): code-review fixes — outer LIMIT wrap, dollar-quoting, parse-error fallback
Address code-reviewer findings on the bigquery_query() rewrite path:

1. Outer LIMIT wrap — bigquery_query() materialises BQ result into DuckDB
   before fetchmany sees it (vs ATTACH-catalog Storage Read API streaming).
   A user 'SELECT *' against a billion-row remote table would buffer the
   entire result before request.limit applied. Wrap rewritten SQL in an
   outer 'LIMIT N+1' so the cap pushes into the BQ job itself.

2. Dollar-quoted inner SQL — naive replace("'", "''") doubling missed
   DuckDB backslash-escape sequences (\\, \\n, \\t, …). A predicate
   like 'WHERE name = ''O\\'Brien''' was unsafe under the doubling
   path. DuckDB $bqq_inner$ … $bqq_inner$ form takes the inner SQL
   verbatim with no escapes whatsoever. Falls back to legacy doubling
   if user SQL improbably contains the literal tag.

3. Parse-error fallback — when the rewritten path fails with a BQ-side
   parse / validation error (DuckDB-only syntax like ::INT cast that
   survives identifier rewrite but BQ refuses), retry the user's
   original SQL via the legacy ATTACH-catalog path so the request still
   succeeds. Mirrors the existing dry-run fallback contract.

4. CHANGELOG — delete duplicate CLI bullets that landed under
   already-released [0.38.1] (file corruption from merge — entries are
   correctly under [0.39.0]).
2026-05-06 13:29:45 +02:00
ZdenekSrotyr
b2c1ff143c fix(query): rewrite BQ-backed user SQL via bigquery_query() to enable predicate pushdown
User SQL hitting query_mode='remote' BigQuery rows was 50-100x slower
than the equivalent direct bigquery_query() call because DuckDB's master
view (CREATE VIEW … AS SELECT * FROM bigquery.<ds>.<tbl>) does not push
WHERE/SELECT/LIMIT into BQ in ATTACH-catalog mode. The BQ extension opens
a Storage Read API session over the entire upstream table; on >100M-row
sources this was 70-150s and frequently failed with 'Response too large
to return'.

Extract the existing dry-run rewriter's core (table-name → BQ-native
backtick path) into a shared helper. Add an execution-path rewriter
that wraps the whole user SQL in bigquery_query('<project>', '<inner>')
so the BQ planner sees the full query and engages partition pruning +
projection pushdown server-side.

Conservative fall-through: cross-source JOINs (BQ ↔ Keboola/Jira local),
queries already containing bigquery_query(, and unconfigured BQ project
all skip the rewrite and run the original SQL via ATTACH-catalog so
behavior degrades gracefully.
2026-05-06 13:02:34 +02:00
ZdenekSrotyr
6bc8739010 feat(admin/tables): show source, schedule, folder, registered, and sync-error in row 2026-05-06 11:09:02 +02:00
ZdenekSrotyr
b230d44687 docs(admin/tables): clarify NUL sentinel in unescapeShellQuoting 2026-05-06 10:15:56 +02:00
ZdenekSrotyr
05e535d743 fix(admin/tables): unescape shell-quoting backslashes in descriptions 2026-05-06 10:13:49 +02:00
ZdenekSrotyr
e369d0ed7b fix(admin/tables): clamp long description to 2 lines so Actions stay reachable 2026-05-06 10:06:57 +02:00
ZdenekSrotyr
6c94d2cbce Merge remote-tracking branch 'origin/main' into pr180-review
# Conflicts:
#	CHANGELOG.md
#	pyproject.toml
2026-05-06 07:27:25 +02:00
ZdenekSrotyr
df2c33147c fix: Devin Review on #194 round 2 — 3 BUG-class findings
1. instance.yaml overlay path now matches read site under STATE_DIR.
   Three sites updated:
     - app/api/admin.py:1005 (server-config endpoint writer)
     - app/api/admin.py:2610 (configure endpoint writer)
     - app/instance_config.py:106 (overlay reader)
   All three now go through _state_dir() so under flat-mount layout
   (STATE_DIR=/data-state) the irreplaceable instance.yaml overlay
   lands on the state disk (sdc) instead of the regenerable data
   disk (sdb). Without this fix, .env_overlay correctly went to the
   state disk while instance.yaml went to the data disk — config
   would be lost if an operator wiped sdb.

2. Strip customer-specific tokens from OSS repo per CLAUDE.md
   vendor-agnostic rule:
     - docker-compose.host-mount.yml: 'a deployer (Groupon FoundryAI)'
       → 'a deployer in production'
     - docker-compose.flat-mount.yml: 'caused 2026-05-05 in the
       Groupon FoundryAI deployment' → generic 'production failure
       mode'
     - docs/state-dir.md: rewrote the incident reference to describe
       the failure mode abstractly without naming the deployment;
       updated the recommendation table to say 'shadow-mount class'
       instead of dating the specific incident.

3. Updated docs/state-dir.md 'What reads STATE_DIR' to list all
   read/write sites including the three migrated in this round
   (admin.py, instance_config.py, marketplaces.py).

ANALYSIS finding (tls-rotate.sh hardcoded host-mount.yml) deferred
— same operator-side class as auto-upgrade.sh hardcoded host-mount,
documented limitation per the PR body.
2026-05-05 20:02:50 +02:00
ZdenekSrotyr
b6543c9c55 fix: Devin Review on #194 — 2 BUG-class findings
1. .env_overlay write paths now match read path under STATE_DIR.
   app/main.py:343 reads via _state_dir() (post-PR #194), but two
   write sites still hardcoded ${DATA_DIR}/state/.env_overlay:
     - app/api/admin.py:2687 — configure endpoint secrets persistence
     - app/api/marketplaces.py:152 — marketplace PAT persistence
   Under flat-mount layout (STATE_DIR=/data-state) the admin UI wrote
   secrets to /data/state/.env_overlay while the app read from
   /data-state/.env_overlay, silently dropping the value on next
   restart. Both write sites now go through _state_dir().

2. host-mount.yml: caddy inherits data:/srv:ro from base, but with
   no service populating the data: named volume (other services
   switched to direct /data binds), the inherited mount points at an
   empty Docker volume — try_files finds nothing, every parquet
   download falls through to uvicorn, defeating the v0.36.0
   file_server bypass under the host-mount layout. Added a caddy
   override that restates all mounts including a direct /data:/srv:ro
   bind. Mirrors the comment + treatment already in flat-mount.yml.
2026-05-05 19:47:12 +02:00
Vojtech Rysanek
a303de0372 feat: STATE_DIR env var + flat-mount overlay (parallel disks)
Introduces STATE_DIR as the single source of truth for the writable
state directory path, with backward-compatible default of
${DATA_DIR}/state. Pairs with a new docker-compose.flat-mount.yml
overlay that mounts the state disk in PARALLEL to the data disk
(rather than nested under it).

Why
---
The default deployment topology nests state under data: sdb at /data,
sdc at /data/state. That layout has known fragility documented in
docs/state-dir.md — bind-propagation gotchas, two-writer collisions
on the same prefix, mount-order coupling. The 2026-05-05 incident in
the Groupon FoundryAI deployment was a manifestation of the
propagation gotcha.

The flat layout (sdb at /data, sdc at /data-state — parallel, not
nested) eliminates the nested-mount class entirely. Each disk is its
own bind mount, recursive by default in modern Docker. No volume
options to forget. No two-writer collision (host scripts and
container app share /data-state at the same path, single namespace).

What changes
------------
App code (Python):
- src/db.py:        new _get_state_dir() helper. get_system_db() and
                    schema migration snapshot use it.
- app/secrets.py:   new _state_dir() helper. _load_or_generate() uses
                    it for .session_secret and .jwt_secret.
- app/main.py:      .env_overlay loaded from _state_dir().

Host scripts:
- scripts/ops/agnes-auto-upgrade.sh: STATE_DIR drives mount-sanity
  check and cert detection. Defaults preserve existing behavior.
- scripts/ops/agnes-tls-rotate.sh:   STATE_DIR drives CERT_DIR.

New compose overlay:
- docker-compose.flat-mount.yml: parallel /data and /data-state binds
  per service. Mutually exclusive with docker-compose.host-mount.yml;
  pick one based on disk topology.

Documentation:
- docs/state-dir.md: layout choice (A nested vs B flat), pros/cons,
  migration steps, and which code paths read STATE_DIR.

Backward compatibility
----------------------
STATE_DIR defaults to ${DATA_DIR}/state — current behavior. Existing
deployers that don't set the var see no behavior change. Migration
to flat layout is opt-in per the runbook in docs/state-dir.md.

Validation
----------
- bash -n on both host scripts: pass
- docker compose config -f docker-compose.flat-mount.yml: resolves
  cleanly with all 6 services binding /data and /data-state directly
- python3 import + helper exercise: STATE_DIR override works,
  default falls back to ${DATA_DIR}/state

Companion to PR #191 (drop named-volume driver_opts in host-mount.yml).
That PR fixes the immutability footgun for Layout A; this PR offers
Layout B as the architectural alternative.
2026-05-05 19:28:07 +02:00
ZdenekSrotyr
f2ce915458 fix: Devin Review on #188 commit 28423907 — 2 bugs
🚩 /api/v2/catalog still async def while now calling sync stat()

`/api/v2/catalog` was left as `async def` when the rest of Tier 1 was
converted, on the assumption it was lightweight. The new
`_materialized_size_hint` populator added in this PR calls
`Path.stat()` / `Path.exists()` for every visible row to bucket the
parquet size — on a local FS that's microseconds, but on a
network-mounted DATA_DIR (NFS / CIFS / GCS-FUSE) those syscalls
can block the event loop. Convert to plain `def` so FastAPI
auto-offloads to the thread pool, mirroring /api/query etc.

🔴 stream_download translates HTTPStatusError as generic transport error

`response.raise_for_status()` inside the retry loop raises
`httpx.HTTPStatusError` on 4xx/5xx. After retries exhaust, the new
`isinstance(last_exc, httpx.HTTPError)` check at line 219 was eating
the status code: HTTPStatusError is a subclass of HTTPError, so the
generic transport translation produced "Unexpected error: HTTPStatusError"
instead of the informative "Client error '401 Unauthorized' for url …"
that callers expect. Fix: short-circuit HTTPStatusError before the
HTTPError branch — it re-raises verbatim so the caller's status-code
handling + the rich server error body (e.g. 401 expired token, 403
cross_project_forbidden) reach the analyst.

api_get / api_post / api_delete / api_patch don't have the same bug:
httpx Client.get/etc. don't raise HTTPStatusError unless the caller
explicitly calls .raise_for_status(), and our wrappers don't.
Only stream_download does, hence the targeted fix there.
2026-05-05 18:29:44 +02:00
ZdenekSrotyr
e5fb913cec perf: Tier 1 event-loop unblocking — async def → def on BQ-bound handlers
Five hottest BQ-touching endpoints were `async def` but invoked synchronous
DuckDB / BQ-extension calls inside the body. Under uvicorn's single event
loop that meant a single heavy `agnes query --remote` (waiting up to
~200 s for BQ's jobs.query) froze EVERY other request — /api/health,
dashboard, auth, even another query — for the full BQ wait. Operators
saw "VM idle, app frozen" during PR #188's testing.

Convert to plain `def` so FastAPI auto-offloads the body to the anyio
thread pool. Event loop stays free for non-BQ requests.

- app/api/query.py:execute_query
- app/api/v2_scan.py:scan_estimate_endpoint, scan_endpoint
- app/api/v2_sample.py:sample
- app/api/v2_schema.py:schema

Audit: 0 `await` statements in any converted handler (verified file-by-
file), so the rename is safe. Tests in tests/test_v2_*.py called the
handlers via `asyncio.run(...)` which now fails on a non-coroutine return;
swapped for direct calls (asyncio.run( -> ( ) — keeps paren balance).

Plus AGNES_THREADPOOL_SIZE env var (default 200, was anyio's stock 40)
in app/main.py:lifespan. Set via
anyio.to_thread.current_default_thread_limiter().total_tokens. 200 is
comfortable headroom for <50 concurrent analysts; bump for more.

480/480 impacted tests pass (the 2 remaining errors are a pre-existing
fixture setup issue in test_reader_smoke_matrix.py unrelated to this
change).
2026-05-05 17:44:08 +02:00
ZdenekSrotyr
30e81a15b9 feat(workspace-prompt): decision tree + size-hint so analyst Claude gets it right first try
Three concrete changes addressing the "analyst Claude misuses the CLI"
class of bugs (image.png table — issues #3, #5, plus the recurrent
"how big is this table" guesswork):

1. config/claude_md_template.txt — the template agnes init writes to
   <workspace>/CLAUDE.md. Surfaces every catalog-row field with a why,
   adds a query_mode-based decision tree, explicit --estimate scoping
   (snapshot create ONLY — was the #1 first-try error), an agnes fetch
   → agnes snapshot create rename note, and a 6-row failure-mode table
   that maps each common error wording to its right next step.

2. app/api/v2_catalog.py — populate rough_size_hint for local +
   materialized rows from the on-disk parquet size, bucketed
   small/medium/large/very_large. Was hardcoded null with a TODO; AI
   couldn't tell "is this 6.8 GB" without a failed --remote round-trip.

3. cli/update_check.py — the [update] banner survived the da→agnes
   rename and printed "[update] da X is out of date" on every command,
   training analysts to associate the binary with the old name.

Verified by rendering the template against representative contexts
(33/33 tests pass) and running every use case from the original
screenshot through the real CLI against a dev VM.
2026-05-05 16:44:24 +02:00
ZdenekSrotyr
1be997f6d4 feat(caddy): file_server for parquet downloads — bypass uvicorn
A single analyst's multi-GB `agnes pull` held the only uvicorn worker
for the duration of the stream, starving UI / /api/health / every other
API endpoint. Container flipped to `unhealthy`. Triggered while a
6.8 GB `order_economics` pull was in-flight on prod 2026-05-05.

Caddy now intercepts `GET /api/data/{table_id}/download` and serves
the parquet directly via sendfile from the data volume (mounted r-o
at /srv inside the caddy container). RBAC enforced by `forward_auth`
to a new lightweight `GET /api/data/{table_id}/check-access` endpoint
(returns 204 / 403) — the bulk transfer never reaches uvicorn.

Path discovery via `try_files` over the known extract.duckdb v2 source
subdirs. Anything not at a static path falls through to the existing
app handler so legacy `src_data/parquet` and future connectors still
work without a Caddyfile change. Non-Caddy deployments are unchanged.

Stage 1 (multi-worker uvicorn) was considered but blocked by the
single-writer DuckDB lock on system.duckdb — workers > 1 would crash
at startup on "Could not set lock on file", the same race that pushed
the scheduler from in-process writes to HTTP-via-app. Multi-reader
workers + single-writer coordination is out of scope for this PR.
2026-05-05 16:41:33 +02:00
ZdenekSrotyr
4f04235502 feat(bigquery): bq_query_timeout_ms knob; default 600s (was 90s)
DuckDB BigQuery extension defaults `bq_query_timeout_ms` to 90 s, which
is too tight for analyst-scale queries against view-backed BQ datasets.
`agnes query --remote` HTTP 400'd with `Binder Error: Query execution
exceeded the timeout. Job ID: ...` whenever the underlying BQ job ran
longer than 90 s, even though the job itself was healthy.

Add `data_source.bigquery.query_timeout_ms` (default 600 000 ms = 10 min,
sentinel 0 falls through to the extension default). Applied via
`SET bq_query_timeout_ms` after every `LOAD bigquery` on every BQ-touching
DuckDB session: orchestrator's `_remote_attach` ATTACH path, BqAccess
session factory, and the standalone extractor. Configurable via
`/admin/server-config` UI.

Fail-soft: extension versions that don't recognise the setting silently
keep the default rather than poisoning the session.
2026-05-05 16:40:40 +02:00
ZdenekSrotyr
8d8d2c219e refactor(cli-store): pull/info → agnes admin store; add agnes store mine
Backup-orchestration commands were split across two namespaces (pull in
agnes store, push in agnes admin store), which broke the operator
mental model — pull/push are a paired operation and should sit
together.

Move pull + info into agnes admin store so all bulk operations share
one help screen. Add agnes store mine as the user-facing equivalent —
calls the same /api/store/bundle.zip endpoint with ?owner=me, which
the server resolves to the caller's user_id. Authors can archive
their own uploads without admin role; whole-Store bulk reads stay
admin-flavored as a discoverability hint.

Server: 3-line addition to export_bundle handles owner='me' as a
magic alias for the caller. No new endpoint.

Tests updated: pull/info expectations move from agnes store to
agnes admin store; new tests cover agnes store mine and the
?owner=me server resolution. 69/69 store tests green locally.
2026-05-05 13:49:18 +02:00
ZdenekSrotyr
3d63965a67 Merge remote-tracking branch 'origin/main' into pr180-review
# Conflicts:
#	CHANGELOG.md
#	app/web/templates/_app_header.html
2026-05-05 12:05:50 +02:00
ZdenekSrotyr
a8f9d065c8 feat(store): bundle export/import + agnes store update + agnes admin store push
Adds whole-Store backup/restore primitives so an external CI/CD job can
mirror the Store to a git repo (and restore back from one).

REST:
- GET /api/store/bundle.zip — deterministic ZIP of all (filtered) Store
  entities. Layout: manifest.json + entities/<id>/{plugin,assets}/.
  Manifest carries owner_email for cross-instance restore. Auth: any
  authenticated user (Store is community-open).
- POST /api/store/import-bundle — admin-only restore. Modes
  merge|replace|skip; owner resolution by email with stub-disabled-user
  fallback when the email is unknown on the target instance.

CLI:
- agnes store update <id> [--description X] [--zip PATH] ... — in-place
  edit (server PUT permits owner OR admin per F4). Closes the missing
  edit affordance for analysts who want to fix a typo or push a new
  ZIP without losing install_count.
- agnes store pull [-o store.zip] [--unpack DIR] — download the bundle.
  --unpack streams + extracts so an external git-backup workflow can
  drop the tree straight into a repo and `git add .`.
- agnes store info [--json] — counts + size summary.
- agnes admin store push <zip-or-dir> [--mode ...] — admin-only restore.
  Auto-zips a directory client-side so a working-tree → server
  round-trip is one command.

cli/v2_client.py gains api_get_stream helper for binary downloads.

Tests: 5 new server tests (bundle shape + filters + round-trip + stub
user creation + skip mode + admin-only gate) + 11 new CLI tests
(update, pull/unpack, info, admin push). 66/66 store-related tests
green locally.
2026-05-05 11:51:31 +02:00
ZdenekSrotyr
952dc9e74d fix(profile-sessions): tolerate stat() failures on individual jsonl (Devin Review on #179)
The previous gather used `sorted(glob, key=lambda p: p.stat().st_mtime)`.
A transient OSError (race with delete, permission flicker, EBADF on a
weird filesystem) on any single file raised through the lambda and 500-ed
the whole page.

Reworked: stat each path under try/except into a (path, stat) list, sort
the already-statted entries. Bad files drop silently from the listing.

Regression test test_profile_sessions_page_tolerates_stat_failures
patches Path.stat to raise on one of two files, asserts the page returns
200 with the good row rendered and the bad row dropped.
2026-05-05 09:53:06 +02:00
ZdenekSrotyr
d878764ac1 fix(session-collector-api): mirror sibling endpoints' audit-on-exception (Devin Review on #179)
Devin flagged that run_session_collector still had the same audit-skip
gap I fixed in run_verification_detector and run_corporate_memory in
the previous two rounds — a PermissionError walking /home, an OSError
on /data/user_sessions mkdir, or any other unhandled exception from
collector.run() would skip the audit_log row and only show in docker
logs.

Same try/except + unhandled_error pattern as the sibling endpoints.
All three LLM-pipeline run-* endpoints now record their failures the
same way; /admin/scheduler-runs sees them. Regression test in
tests/test_admin_run_endpoints.py::TestRunSessionCollector::test_unhandled_exception_still_audits.
2026-05-05 09:31:33 +02:00
ZdenekSrotyr
9ebe991b55 feat(profile): per-session jsonl download from /profile/sessions
User feedback during e2e of #179: the listing page is nice but I want
to grab the raw jsonl and look at what's inside.

Adds GET /profile/sessions/<filename>:
- Auth via get_current_user (owner-only).
- Path safety: rejects "/", "\", "..", leading ".", and any non-".jsonl"
  filename. The served path resolves under
  ${DATA_DIR}/user_sessions/<caller.id>/; if resolution escapes that
  base directory, returns 404 (never 403, so existence of other users'
  files isn't leaked).
- FileResponse with Content-Disposition: attachment.

UI: Download button per row in profile_sessions.html.

Tests in test_web_ui.py: path-traversal / nested / dotfile / non-jsonl
all 404 for owner; unauthenticated 302/401/403; authenticated owner
gets 200 + correct Content-Disposition.
2026-05-05 09:15:12 +02:00
ZdenekSrotyr
e86da72997 fix(corporate-memory-api): mirror verification-detector audit-on-exception (Devin Review on #179)
Devin flagged that run_corporate_memory still had the same audit-skip
gap I just fixed in run_verification_detector — if collect_all() throws
anything other than the already-translated ValueError (DuckDB lock,
network blip, unexpected SDK error), the audit_log row was never
written and /admin/scheduler-runs missed the failure.

Same try/except + unhandled_error pattern as the verification_detector
fix from 4c4dfee8. Regression test in
tests/test_admin_run_endpoints.py::TestRunCorporateMemory::test_unhandled_exception_still_audits.
2026-05-05 09:11:13 +02:00
ZdenekSrotyr
4c4dfee8e6 feat(profile): /profile/sessions page + audit on detector exception + correct SCHEDULER_AUDIT_ACTIONS
Three changes addressing user feedback during e2e test of #179 + Devin Review on e86dd5ed.

1) /profile/sessions — new self-service user page in the user menu.
   Lists all session jsonls the caller uploaded via `agnes push` joined
   against session_extraction_state. Each row shows uploaded_at, file
   size, status badge (pending/processed/extracted), processed_at, and
   items_extracted. The page docstring + help text explicitly call out
   that items_extracted=0 means the verification detector ran fine but
   the LLM found no claims to track — that's the documented "no items"
   outcome, not a broken pipeline. Closes the gap surfaced during the
   e2e test of #176 where a user could see their sessions on disk and
   process them through the LLM but had no UI to inspect what happened.

2) run_verification_detector audits unhandled exceptions (Devin #1).
   If detector.run() threw anything other than the already-translated
   ValueError, the audit_log row was never written. The endpoint now
   wraps detector.run in try/except, records the exception in
   audit_params["unhandled_error"], then re-raises as 500 after audit.
   The /admin/scheduler-runs page surfaces the failure row with the
   error type + message.

3) SCHEDULER_AUDIT_ACTIONS list corrected (Devin #2). Previous list
   had "marketplaces_sync_all" (wrong — actual is "marketplace.sync_all")
   plus "data_refresh" and "scripts_run_due" which app/api/sync.py and
   app/api/scripts.py don't write to audit_log. Fixed to the four
   actually-logged strings; comment points at the missing audit calls
   as a follow-up.

Tests: tests/test_web_ui.py adds TestAdminRoleGuards::test_profile_sessions_page_no_admin_required and tightens test_admin_scheduler_runs_page_admin_only to assert the correct marketplace.sync_all string.
2026-05-05 08:57:35 +02:00
ZdenekSrotyr
f0d091f721 fix(store): scratch dir leak on ZIP validation failure (Devin Review)
create_entity + update_entity created the `scratch` temp dir inside one
try/finally but cleaned it up in a separate one. Validation HTTPExceptions
raised by _safe_zip_extract (zip_unsafe_path, zip_too_large_uncompressed)
or the BadZipFile→422 conversion exited the first scope, and the second
finally was never entered → temp dir leaked on every failed upload.

Devin flagged this on the F2 commit. The leak pre-existed (zip_unsafe_path
was the original vector); F2 added zip_too_large_uncompressed to the same
broken cleanup path. Fixed by collapsing scratch creation + cleanup into
one outer try/finally that covers both extraction AND metadata/bake; the
inner try/except/finally still handles BadZipFile→422 + tmp file cleanup.

Same restructure in update_entity. Regression test
`test_scratch_dir_cleaned_up_after_failed_extraction` triggers a
zip_unsafe_path 422 and asserts tmp/agnes_store_* contains no leaked
dirs.
2026-05-05 08:52:15 +02:00
ZdenekSrotyr
fd3c76d21b fix(store): security + correctness blockers found in PR review (F1, F2, F4, F5)
Three independent reviews of PR #180 surfaced four real defects in the new
Store / my-ai-stack surface. CHANGELOG entries detail each; one-liners:

- F1 video_url XSS: any authenticated user could upload a Store entity
  with `video_url=javascript:...` and pop XSS in any viewer's session via
  the `<a href=...>` "Watch video" link in store_detail.html. Jinja2
  autoescape doesn't block URI schemes inside attribute values. Fixed by
  scheme-validating to http(s) only on create + update; 400 invalid_video_url.

- F2 ZIP decompression bomb: _safe_zip_extract checked path-traversal but
  not declared file_size totals — a 50 MB compressed upload at 1:1000
  ratio decompresses to 50 GB and DOS the host disk. Fixed by summing
  zinfo.file_size across infolist() and refusing > 200 MB before
  extractall touches disk. 413 zip_too_large_uncompressed.

- F4 admin authz parity: PUT /api/store/entities/{id} was owner-only while
  DELETE allowed owner OR admin; the store-detail page hid Edit/Delete
  buttons from admin even though DELETE was permitted. Fixed by allowing
  admin on PUT and passing is_admin to the template; gate is now
  is_owner OR is_admin everywhere.

- F5 cross-owner suffix collision: sanitize_username is many-to-one
  (alice.smith / alice_smith both → alice-smith). Two such users uploading
  entities with the same display name produced identical
  `<name>-by-<username>` suffixes, silently colliding in the served
  agnes-store-bundle on-disk paths AND the manifest catalog (Claude Code
  dedupes by plugin.json `name`). Fixed by enforcing global uniqueness on
  the suffixed value at create_entity; 409 conflict_global_suffix.

F3 (ZIP symlink members) was investigated and confirmed to be a
false-positive — Python's stdlib ZipFile.extractall does not honor
symlink mode bits, so no exploit exists.

9 new regression tests in tests/test_store_api.py::TestStoreSecurityFixes
covering all four. Test run locally: 60/60 store-related tests pass.
2026-05-05 08:18:02 +02:00
ZdenekSrotyr
e86dd5edc5 fix(anthropic): strict json_schema (additionalProperties=false) + add /admin/scheduler-runs UI
E2E test on a real BQ deploy showed every verification-extraction call
fails with HTTP 400 invalid_request_error: "output_config.format.schema:
For 'object' type, 'additionalProperties' must be explicitly set to false".
The Anthropic structured-output API now requires the field on every object
node in the json_schema. Fix: connectors/llm/anthropic_provider.py wraps
the caller-supplied schema through a recursive _strict_json_schema()
walker that adds the field where missing (preserving any explicit
override), then passes the strict variant to the API. Six unit tests in
TestStrictJsonSchema pin the recursion across nested objects, array items,
and the no-mutation invariant.

Adds /admin/scheduler-runs — a read-only admin page that surfaces the
last 200 audit-log entries from scheduler-driven actions. New
AuditRepository.query_actions(actions, limit) helper, new admin nav
entry. Failed scheduler ticks (HTTP 401, network errors) don't reach
the audit_log; the page calls that out with a hint to set
SCHEDULER_API_TOKEN if no rows show up.
2026-05-05 08:00:57 +02:00
ZdenekSrotyr
e68c2d3f0f fix(session-collector): argv-free run() helper, drop SystemExit footgun (Devin Review on #179)
run_session_collector called collector.main() which did argparse.parse_args()
on uvicorn's sys.argv (['app.main:app', '--host', ...]) → sys.exit(2) →
SystemExit(2), which inherits from BaseException, escapes FastAPI handlers,
and propagates through the thread pool. Every scheduler tick that fired the
endpoint either 500-ed or risked killing the uvicorn worker.

services/session_collector/collector.py now exposes run(dry_run, verbose)
that returns (rc, stats); main() is a thin CLI shim that parses argv and
delegates. The admin endpoint calls run() directly and audit-logs the
per-run stats (users_processed, files_copied, files_skipped) instead of
just the rc. Three regression tests in TestRunHelper.

Closes Devin Review finding on app/api/admin.py:2819 (#179).
2026-05-05 06:31:55 +02:00
ZdenekSrotyr
9f33e24bf9 fix(config): overlay-aware LLM consumers + env-ref resolution (#179 review)
Devin BUG: /api/admin/configure seeds an ai: block to the writable
overlay at DATA_DIR/state/instance.yaml, but the three LLM consumers
imported from config.loader.load_instance_config — which reads the
static config dir only. Even if they had read the overlay, the loader
ran yaml.safe_load directly without passing through _resolve_env_refs,
so '${ANTHROPIC_API_KEY}' would have stayed a literal placeholder. The
pipeline appeared to work because the factory falls back to the env
var directly, but the overlay path itself was dead code.

Two fixes, both required:

1. Switched the three LLM consumers to app.instance_config.load_instance_config:
   - services/corporate_memory/collector.py:collect_all
   - services/verification_detector/__main__.py:main
   - app/api/admin.py:run_verification_detector

2. app/instance_config.py runs the loaded overlay through
   config.loader._resolve_env_refs *before* the deep-merge, so
   '${ANTHROPIC_API_KEY}' resolves at config-load time.

New regression suite tests/test_instance_config_overlay.py pins:
- env-ref resolution against the overlay (resolved when env set,
  empty when env missing — never the literal placeholder)
- deep-merge still preserves static-only sections
- the three consumers reach app.instance_config (inspected via
  inspect.getsource so a future refactor that reverts the import
  fails the test)
- end-to-end: a seeded overlay + ANTHROPIC_API_KEY env reaches the
  factory with a resolved api_key
2026-05-05 05:57:22 +02:00
ZdenekSrotyr
98a8aba3be fix(tests): align test_llm_connector with new factory + fail-fast (#179 review)
The PR rewrote collect_all() to call the new
create_extractor_from_env_or_config() helper, but the existing tests
still mocked the old direct create_extractor() symbol and the old
silent-skip-on-missing-config behavior. Five tests in
TestCorporateMemoryCollector and one in TestCollectorExtractorIntegration
were red on the PR branch.

Changes:
- Tests now mock connectors.llm.create_extractor_from_env_or_config
  (the symbol the collector imports lazily).
- Renamed test_collect_all_no_ai_config_skips ->
  test_collect_all_no_ai_config_or_env_raises and
  test_collector_handles_invalid_config -> test_collector_raises_on_invalid_config.
  Both assert pytest.raises(ValueError) — the explicit fail-fast
  semantics defect 5 of #176 was supposed to enforce.
- collect_all() no longer swallows the factory's ValueError into
  stats["errors"]; it propagates so the scheduler / admin endpoint
  surface the actionable misconfiguration message instead of
  pretending the run was a no-op.
- /api/admin/run-corporate-memory translates the propagated ValueError
  into a 500 with the factory's message, matching
  /api/admin/run-verification-detector.
2026-05-05 05:55:01 +02:00
Minas Arustamyan
af72c5d259 fix(setup): walk TLS chain for trust-store match — Let's Encrypt cleanup
`_read_agnes_ca_pem()` decides whether the served fullchain.pem needs
trust-bootstrapping in the rendered setup prompt. Pre-fix it only
checked the leaf's *immediate* issuer against `certifi`'s trust store.
For Let's Encrypt that's the intermediate (R13), which `certifi` does
not ship — only roots are in trust stores. So a publicly-trusted LE
chain still tripped the "needs bootstrap" path and the setup prompt
emitted a step-0 TLS trust block + clone-fallback marketplace block
that no client actually needs (Bun-compiled `claude.exe`, system git,
Python via certifi all validate the chain through the bundled ISRG
Root X1).

Now we walk every cert in the fullchain (leaf + intermediates) and
return None the first time any cert's issuer is in the certifi trust
store — that captures the standard "leaf signed by intermediate signed
by publicly-trusted root" shape. Trusted subjects are read once into
a set for O(1) lookup. Self-signed (leaf.issuer == leaf.subject) and
private-CA chains (no chain link's issuer in certifi) keep their
previous "return PEM" behavior, so deployments that genuinely need
the bootstrap still get it.

Validated end-to-end against the live VM at
agnes-marustamyan.groupondev.com (LE R13 → ISRG Root X1):
  - Let's Encrypt fullchain                   → has_ca=False (was True)
  - Self-signed cert                          → has_ca=True
  - Corporate-CA chain (private root)         → has_ca=True
  - Missing fullchain.pem                     → has_ca=False
2026-05-05 04:55:06 +02:00
Minas Arustamyan
d5a7c9ad79 feat(store): /store + /my-ai-stack — community marketplace + per-user composition
Adds a community-driven Store where any authenticated user uploads
skills/agents/plugins as ZIPs, plus /my-ai-stack as the per-user
composition view. The served Claude Code marketplace is now:

    (admin_granted ∖ opt_outs) ∪ store_installs

Skill + agent installs are merged into a single `agnes-store-bundle`
plugin in the served marketplace; type=plugin uploads stay standalone.
Names are suffixed with `-by-<owner-username>` at upload time so two
owners can use the same display name without colliding in Claude Code's
flat skill/agent namespace.

Schema v23 → v24 adds three tables:
  - store_entities       — community-uploaded skills/agents/plugins
  - user_store_installs  — what each user has chosen to install
  - user_plugin_optouts  — opt-out overlay on top of admin grants

Admin grant-delete drops every user's opt-out for that plugin so
re-grant resets cleanly to enabled (no sticky personal preference).

UI:
  - /store      — e-commerce-style listing with type/category/owner
                  filters, search, pagination, owner-aware [Install]
                  buttons, clickable cards
  - /store/new  — 2-step upload wizard with drag & drop, preview
                  validation (POST /api/store/entities/preview), docs
                  multi-upload, photo + video URL
  - /store/{id} — detail page with hero, file list, docs, owner
                  actions (Edit/Delete) for the uploader
  - /my-ai-stack — Granted plugins (toggle opt-out) + From the Store
                  (uninstall) sections
  - Admin nav: Marketplaces moved into Admin dropdown, renamed to
                "Curated Marketplaces"

Validation hardening: type-mismatch guards reject skill ZIP uploaded as
agent (or vice versa), and plugin ZIPs masquerading as skills/agents.
Human-readable error messages mapped client-side from machine codes.

Cross-source naming: Store entity-id-prefixed dirs (`plugins/store-<id>/`)
plus the bundle (`plugins/store-bundle/`) avoid collisions with admin
marketplaces (whose `store` slug is reserved by `is_valid_slug`).

Bundle composition is content-hashed at serve time — install/uninstall
or owner re-upload bumps the bundle's plugin.json `version`, so Claude
Code's auto-update toggle picks up changes.

Tests: 50+ new tests across naming, repositories, filter (admin ∪ store
∪ bundle), API (upload/install/uninstall/delete/preview/docs), end-to-end
marketplace.zip with bundle merging.
2026-05-05 02:53:49 +02:00
ZdenekSrotyr
a621a415cc fix(health): session-pipeline staleness check (#176)
GET /api/health/detailed now returns a session_pipeline service entry.
Heuristic:
  max(mtime of /data/user_sessions/**/*.jsonl) <=
  max(processed_at in session_extraction_state) + grace_seconds

grace_seconds = 2 × verification-detector cadence (default 30 min;
configurable via SCHEDULER_VERIFICATION_DETECTOR_INTERVAL).

When the assert fails, status='warning' (never 'error') with an
actionable detail pointing at the verification-detector scheduler job.
A warning bubbles up to the existing overall='degraded' aggregation —
operators querying /api/health/detailed (or /agnes diagnose system)
get a clear breadcrumb instead of a silently-broken pipeline.

Cold-start case (no session files, or files newer than the grace
window with empty state table) is handled explicitly to avoid noise
on a fresh deploy.

Tests: tests/test_health_session_pipeline.py.
2026-05-05 00:04:28 +02:00
ZdenekSrotyr
c53c1e1572 fix(ui): admin pending-review banner on /corporate-memory (#176)
The /corporate-memory page filters status IN ('approved','mandatory')
and showed no hint that pending items exist. With approval_mode set to
'review_queue' (the default in instance.yaml.example), every collection
run would silently funnel new items into the pending bucket where no
operator ever saw them.

For admins (is_km_admin), the page now renders a banner above the
stats bar:
  N pending items awaiting review — review them at /corporate-memory/admin

Non-admins see no change (the route zeroes the count server-side
before passing to the template, so the hint is never leaked).

Tests: tests/test_corporate_memory_page.py.
2026-05-05 00:01:22 +02:00
ZdenekSrotyr
45de71e8ab fix(scheduler): wire LLM pipeline into scheduler-v2 (#176)
The session-collector, verification-detector, and corporate-memory
services now run on the same scheduler-v2 model that already drives
data-refresh, health-check, script-runner, and marketplaces:

- New admin endpoints in app/api/admin.py:
    POST /api/admin/run-session-collector
    POST /api/admin/run-verification-detector
    POST /api/admin/run-corporate-memory
  All admin-gated, sync-def (FastAPI thread pool), with one audit row
  per invocation. Same single-writer-of-system.duckdb pattern as the
  existing /api/marketplaces/sync-all job.

- services/scheduler/__main__.py JOBS gains three entries with offset
  cadences (10m / 15m / 17m, all coprime modulo the 30s tick) so the
  three LLM-backed jobs don't fire on the same tick and stack their
  API + DB load.

- The verification-detector endpoint surfaces the LLM factory's
  fail-fast ValueError as HTTP 500 with the actionable message,
  preserving the no-silent-skip contract from the previous commit.

Tests:
- tests/test_admin_run_endpoints.py covers admin gating + scheduler
  registration + endpoint contract.
- tests/test_scheduler_sidecar.py existing tests continue to pass.
2026-05-04 23:57:43 +02:00
ZdenekSrotyr
bbb04ac041 fix(setup): seed default ai: block + env-var fallback (#176)
POST /api/admin/configure now writes a default ai: block into the
instance.yaml overlay when the request leaves it untouched and either
ANTHROPIC_API_KEY or LLM_API_KEY is set in the environment. The block
references the env var via ${VAR} syntax — secrets never land in YAML.

connectors.llm.factory grows create_extractor_from_env_or_config which
falls back to ANTHROPIC_API_KEY / LLM_API_KEY when ai_config is empty
and raises a clear ValueError when neither is available. Both
services/corporate_memory and services/verification_detector switch to
the new helper, replacing the old 'silently skip when ai: missing'
path that was the silent-failure root cause.

Tests:
- tests/test_setup_ai_block.py — overlay seeding contract.
- tests/test_llm_provider_env_fallback.py — fallback + fail-fast.
2026-05-04 23:55:19 +02:00
ZdenekSrotyr
5915f92eaa fix(query-guardrail): single-pass alternation regex (Devin Review on query.py:464)
The iterative bare-name rewriter (one re.sub per name, longest-first)
was vulnerable to cross-contamination when the GCP project ID contained
a registered table name as a hyphen-delimited word.

Concrete repro:
  project        = 'my-ue-project'
  registered     = ['orders', 'ue']
  user SQL       = 'SELECT * FROM orders JOIN ue ON ...'
  iter 1 (orders): produces 'FROM `my-ue-project.fin.orders` JOIN ue ...'
  iter 2 (ue):     '\bue\b' matches 'ue' INSIDE 'my-ue-project' (hyphen
                   creates word boundary on both sides) — corrupts
                   the iter-1 path

Fallback at query.py:576 caught the resulting BQ parse error and fell
back to per-table SELECT * estimate, so impact was over-estimation,
not fail-open — but the #171 partition-pruning fix silently degraded
to pre-fix behavior whenever a project name shared a hyphen-segment
with a registered table.

Fix: single re.sub call with an alternation regex sorted longest-first.
Single-pass means each source position is processed exactly once, so
freshly-inserted backticked text from one match isn't re-scanned by
later names in the alternation.

Regression test
test_rewrite_helper_does_not_corrupt_when_project_id_contains_registered_name
covers the exact Devin repro.
2026-05-04 22:51:33 +02:00
ZdenekSrotyr
424ec9b0f4 refactor(install.html): single tile, single PAT-mint body shape
Drops the `<nav class="role-tiles">` block (Analyst / Admin tiles),
the `_show_admin_tile` flag, the `const ROLE = {{ role | tojson }};`
JS line, and the role-aware PAT-mint ternary. The setupNewClaude
button now mints a uniform PAT for everyone:

  { name: defaultTokenName(), expires_in_days: 90 }

…against the existing `POST /auth/tokens` endpoint. No new endpoint,
no role-locked TTL clamp. The `bootstrap-analyst` 1-hour scope is no
longer used from /setup (it broke the install flow anyway — saved PATs
expired before the user opened Claude Code; tracked as a separate
cleanup issue).

Also removes the now-unused `.role-tiles` / `.role-tile` CSS rules so
the stylesheet doesn't carry dead selectors.

Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 6.
2026-05-04 22:18:00 +02:00
ZdenekSrotyr
2ee529533f refactor(setup-page): drop role query param
The `/setup` route no longer accepts `?role=analyst|admin`. The route
signature drops the `Literal[...] = Query(...)` parameter and the
silent admin-downgrade block (`if role == "admin" and not is_admin:
role = "analyst"`). The `role` ctx variable threaded into install.html
also goes away — Task 6 cleans up the template's role-tile UI and the
JS PAT-mint ternary.

`?role=` is silently ignored by FastAPI for unknown query params, so
existing bookmarks (none in production — the param was added in this
PR and never shipped) just degrade to the unified layout. No
RedirectResponse shim needed.

Tests: drop the entire `tests/test_setup_page_roles.py` file (eight
role-branching tests that no longer apply) and add
`tests/test_setup_page_unified.py` with three tests:

  - `test_setup_page_renders_unified_layout`
  - `test_setup_page_ignores_role_query_param`
  - `test_setup_page_renders_marketplace_for_user_with_grants`
  - `test_install_legacy_path_redirects_to_setup`

Also replace the role-aware `test_install_preview_*` tests in
test_web_ui.py with unified-layout assertions.

Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 5.
2026-05-04 22:16:59 +02:00
ZdenekSrotyr
291079b1d2 refactor(welcome-template): drop role param; resolve plugins per-user unconditionally
Removes the `role: Literal["analyst", "admin"] = "admin"` parameter from
`compute_default_agent_prompt`. The same RBAC pass
(`marketplace_filter.resolve_allowed_plugins`) now runs for every user —
admin or not. Users with no `resource_grants` rows get the
no-marketplace layout; users with grants get the marketplace block
inserted. Admin-vs-analyst is no longer a layout branch.

`render_agent_prompt_banner` no longer derives a `role` from
`user.is_admin`; it just delegates to `compute_default_agent_prompt`.
Two `compute_default_agent_prompt(...role=role)` call sites in
`app/web/router.py::setup_page` are updated to drop the keyword so the
route keeps rendering — Task 5 will remove the `?role=` query
parameter and the silent admin-downgrade block from the route signature
itself.

Tests: drop role-aware assertions from test_welcome_template_renderer
and test_welcome_template_api. Both files now assert the unified
default contains `agnes init` + `uv tool install` and bans the legacy
`agnes auth import-token` / `agnes auth whoami` verbs.

Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 4.
2026-05-04 22:13:46 +02:00
ZdenekSrotyr
74b7f6e254 feat(setup-instructions): preflight checks both git and claude
Renames `_git_check_block` to `_preflight_block` and adds a
`claude --version` check beside `git --version`. Both binaries are
required by the marketplace step — git for the clone fallback,
claude for `claude plugin marketplace add` / `claude plugin install` —
so checking them together gives one clear failure instead of two
confusing downstream errors.

Install hints: `npm i -g @anthropic-ai/claude-code` for Linux / WSL
plus a doc URL (https://docs.claude.com/claude-code) for the native
macOS / Windows installers. We don't try to one-line a native
installer; the canonical instructions live upstream.

Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 3.
2026-05-04 22:11:38 +02:00
ZdenekSrotyr
e16698c3cc refactor(setup-instructions): unified layout with mandatory agnes init
Adds `_step_numbers(*, has_marketplace, has_skills)` so step numbering
lives in one place instead of being split across three branches in
`resolve_lines`. Pins the unified layout in the tests:

  No plugins:     1 install, 2 init, 3 catalog, 4 diagnose, 5 skills, 6 confirm
  With plugins:   1, 2, 3, 4 preflight, 5 marketplace, 6 diagnose, 7 skills, 8 confirm

`agnes auth import-token` / `agnes auth whoami` are now banned from the
rendered prompt — `agnes init` subsumes them. The renamed
`test_resolve_lines_no_plugins_unified_six_step_layout` asserts those
strings are absent and that the new step headers (`Bootstrap your Agnes
workspace`, `Verify the data is queryable`) are present.

Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 2.
2026-05-04 22:10:05 +02:00
ZdenekSrotyr
9334beed15 refactor(setup-instructions): drop role param; collapse analyst/admin into one layout
Removes the `role: Literal["analyst", "admin"]` parameter from
`resolve_lines` / `render_setup_instructions` and deletes the
`_resolve_analyst_lines`, `_analyst_init_lines`, `_analyst_finale_lines`
helpers. The unified flow now always emits `agnes init` (the
workspace-rails delivery mechanism) in place of the legacy
`agnes auth import-token` + `agnes auth whoami` pair, and uses
`agnes catalog` as the smoke-verify step.

`agnes init` already verifies the PAT internally, and `agnes catalog`
doubles as a data-plane smoke check, so dropping `agnes auth whoami`
costs no signal.

Drops the now-redundant `tests/test_setup_instructions_analyst.py` and
patches the one ordering test in `tests/test_setup_instructions.py` that
referenced the old "Log in" / "Verify the login" headers. Also strips
the `role=role` kwarg from `compute_default_agent_prompt`'s call into
`resolve_lines` so the welcome-template render path keeps working;
welcome_template.py's own role param is removed in a follow-up task.

Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 1.
2026-05-04 22:08:48 +02:00
ZdenekSrotyr
103efb69f0 chore(cli-rename): replace stale da verbs in active code paths
Bring admin UI, audit-log messages, code comments, and analyst-facing
skill docs in line with the post-bootstrap CLI surface (`agnes pull`,
`agnes push`, `agnes init`, `agnes snapshot create`). The legacy
`_LEGACY_STRINGS` detection tuple in `app/api/claude_md.py` and the hook
upgrade markers in `cli/lib/hooks.py` are intentionally left as-is —
they exist precisely to flag pre-rewrite content for re-authoring.

Strip "(folded from `da metrics list`)" / "(lifted from `da metrics
show`)" / "Replaces the old `da analyst status`" docstring noise — the
rename history is in CHANGELOG.md, not in module docstrings.
2026-05-04 21:10:43 +02:00
ZdenekSrotyr
500db8cd3c fix(query-guardrail): dry-run user SQL not synthetic SELECT * (#171)
Closes #171. The /api/query cost guardrail used to dry-run a synthetic
`SELECT * FROM <table>` for each registered remote-BQ row referenced
by the user SQL — which made BigQuery estimate a full table scan, with
column projection, predicate pushdown, and partition pruning all
disabled. Narrow queries on big partitioned/clustered tables (the
documented happy path for `agnes query --remote`) hit ~30,000×
over-estimates and got rejected with 400 `remote_scan_too_large` even
when BQ's own dry-run reported single-digit MB.

Pavel's report on #171 traced the root cause and proposed the fix:
rewrite the user SQL to BQ-native syntax and dry-run it as a single
job, exactly the way `bq query --dry_run` works.

Implementation:
- New helper _rewrite_user_sql_for_bq_dry_run rewrites bare registered
  names (word-boundary, case-insensitive, longest-first to avoid prefix
  collisions) + bq."<ds>"."<tbl>" forms to backticked
  `<project>.<ds>.<tbl>` paths.
- _bq_quota_and_cap_guard runs ONE dry-run on the rewritten SQL. Cap
  check uses the real estimate.
- Fallback path: if BQ rejects with bq_bad_request (e.g. DuckDB-only
  syntax like ::INT casts), the guard falls back to the pre-fix
  per-table SELECT * approach so non-portable queries still get a
  (loose) cap estimate instead of fail-opening. Non-parse BQ errors
  (forbidden, upstream) still propagate as 502.
- _bq_guardrail_inputs now also returns name_lookups so the rewriter
  has the (registered_name, bucket, source_table) mapping it needs.
- Per-table breakdown is unavailable from a composite dry-run; total
  bytes are pinned to dry_run_set[0] for the post-flight
  record_bytes(sum(...)) call to keep returning the right total.

Tests (7 new, 3 existing still pass):
- dry-run receives rewritten user SQL with WHERE clause intact (the
  load-bearing assertion for #171)
- single dry-run per request even with multiple registered tables
  (JOIN, UNION) referenced
- fallback to per-table SELECT * on bq_bad_request
- non-parse BQ errors (forbidden) still 502
- rewriter unit tests: bare + bq.path in same SQL, longest-name-wins
  on prefix collision, case-insensitive bare-name match
2026-05-04 21:08:21 +02:00
ZdenekSrotyr
e438170ade merge: pull #174 (BQ materialize view fix + concurrency, 0.33.0) into bootstrap branch
Brings in zs/materialize-sync-fix (PR #174):
- BigQuery view materialize works (wrap admin SQL in bigquery_query())
- Per-table mutex + fcntl.flock for concurrent COPY corruption
- Cost guardrail dry-run engages on materialized rows
- Schema v23 -> v24 migration: rewrite source_query to BQ-native
- Server-generated trivial source_query from bucket+source_table
- Validator backtick relaxation for materialized rows
- 0.33.0 release cut

Conflict resolution:
- CHANGELOG.md: keep our [Unreleased] (bootstrap rewrite content) ABOVE
  the new [0.33.0] section from #174. The bootstrap rewrite remains
  unreleased; it'll cut 0.34.0 (or later) when this PR merges to main.
- tests/conftest.py: union — keep our analyst-bootstrap fixture
  re-export AND #174's bq_instance / stub_bq_extractor fixtures.
- pyproject.toml auto-merged to 0.33.0 (matches the cut), correct.
- src/db.py auto-merged: SCHEMA_VERSION = 24, _v23_to_v24_finalize
  added — no overlap with our work which left schema at v23.
- CLAUDE.md auto-merged: schema-history paragraph extended with v24.

Verified: 79/79 across CLI bootstrap suite + materialize suite +
schema v24 migration tests pass locally on Python 3.13/macOS.
2026-05-04 20:53:00 +02:00
ZdenekSrotyr
92d477e422 fix(setup): default /setup to analyst, hide admin tile from non-admins
Three coupled UX fixes for the analyst-onboarding flow:

1. Dashboard "Setup a new Claude Code" CTA was rendering admin paste
   prompt for everyone (analysts couldn't actually execute the marketplace
   plugin install / skills setup steps). render_agent_prompt_banner now
   picks role based on user.is_admin — analysts get the analyst flow.

2. /setup default role changed from admin to analyst. Most visitors are
   analysts; admin layout is opt-in via the admin tile or ?role=admin.

3. Admin tile is admin-only on the role-tile nav. Non-admins see only
   the analyst tile. Server-side: non-admin requesting ?role=admin is
   silently downgraded to analyst (otherwise they'd see admin paste
   prompt despite no tile).

Tests:
- New: test_setup_page_admin_tile_hidden_for_non_admin (anonymous client
  can't see "Admin CLI" or role=admin link)
- New: test_setup_page_admin_role_downgraded_for_non_admin (anonymous
  ?role=admin → analyst layout, no marketplace step in clipboard)
- New: test_install_preview_default_role_is_analyst (admin signing in to
  bare /setup gets analyst clipboard by default)
- Renamed: test_setup_page_default_role_is_admin → ..._is_analyst
- Updated: test_setup_page_admin_clipboard_renders_admin_layout uses
  FastAPI dependency_overrides to inject admin user (admin layout is
  now admin-gated)
- Updated: test_install_preview_visible_for_signed_in_user explicitly
  passes ?role=admin to exercise admin layout
2026-05-04 20:20:37 +02:00
ZdenekSrotyr
3d58768143 fix: address Devin Review findings — incomplete renames + estimate guard
13 Devin findings across 10 files:

🔴 Critical:
- app/api/v2_catalog.py:42 — `_fetch_hint` returns `da fetch` in /api/v2/catalog
  responses (user-visible in every catalog list)
- cli/skills/agnes-data-querying.md — 11 stale `da fetch`/`da sync` refs in the
  bundled skill markdown
- config/claude_md_template.txt:38 — referenced `agnes pull --docs-only` flag
  that does NOT exist in agnes pull (removed; spec only ships --quiet/--json/
  --dry-run)

🟡 Important:
- app/api/admin.py:252 — `da fetch` in bq_max_scan_bytes hint
- cli/commands/auth.py:119 — `da sync` in import-token docstring (--help text)
- cli/commands/tokens.py:48 — "Export it so `da` can use it" prose
- ARCHITECTURE.md — 4 stale rows in CLI commands table
- README.md — stale paragraphs for analysts (da sync, da analyst setup)

🚩 Substantive observations addressed:
- app/api/query.py:249,302,489 — server-side error/help strings still said
  `da sync`/`da fetch` (returned in API responses to clients)
- cli/commands/snapshot.py:235-241 — DuckDB existence guard incorrectly
  blocked `--estimate` (server-side dry-run that never opens local DB).
  Added test ensuring estimate path skips the guard.

Skipped (intentionally historical):
- app/api/admin.py:2377,2429,2437 — historical comments describing past
  manifest-vs-sync_state bug; past tense, accurate to keep as `da sync`.
2026-05-04 20:05:06 +02:00
ZdenekSrotyr
5bffec641f chore(lint): final ruff fixes 2026-05-04 19:32:52 +02:00
ZdenekSrotyr
6c0846fd17 feat(config): expose materialize.lock_ttl_seconds in server-config
New top-level 'materialize' section, single field (lock_ttl_seconds).
Default 86400 (24h). Backs the file-lock TTL reclaim added in the
per-table-mutex change. Editable via PUT /api/admin/server-config and
the /admin/server-config UI.
2026-05-04 18:52:54 +02:00
ZdenekSrotyr
3871d5320a feat(admin): server-generate materialized source_query, allow BQ backticks
When admin registers a materialized BQ row with bucket+source_table but
no source_query, the server generates 'SELECT * FROM `<project>.<ds>.<tbl>`'
from instance.yaml's configured BQ project. Same fallback fires on PUT
when flipping to materialized. The backtick rejection guard, which was
appropriate for DuckDB-flavor source_query, is relaxed for materialized
rows since the new wrapping path (Task 2) runs admin SQL through BQ
jobs API which uses BQ-native syntax (backticks for dashed identifiers).
2026-05-04 18:37:27 +02:00
ZdenekSrotyr
c7c42de0f0 feat(sync): treat MaterializeInFlightError as 'skipped, in_flight'
_run_materialized_pass distinguishes due-check skips from in-flight
skips and never calls state.set_error for either. summary['skipped']
becomes a list of {table, reason} dicts; the end-of-pass log line
breaks out the in_flight subcount.

Hoists is_table_due to module-level import so test monkeypatching of
the symbol intercepts the call (the previous local import made
patches a no-op).
2026-05-04 18:11:38 +02:00
ZdenekSrotyr
a92c624dba feat(admin): yellow banner for legacy CLI verbs in workspace-prompt override 2026-05-04 17:46:50 +02:00
ZdenekSrotyr
8091620d33 fix(setup): role-aware clipboard render + JSON-escape ROLE injection
Two Task 4 review fixes for app/web/templates/install.html:

1. JSON-escape `ROLE` JS const via `{{ role | tojson }}` (defense in
   depth — removes the dependency on Jinja autoescape semantics for JS
   contexts; FastAPI's Literal validator already constrains role values).

2. Verify the analyst tile's clipboard payload is the analyst layout.
   The pre-existing role-aware plumbing (compute_default_agent_prompt
   threading role into setup_instructions_lines, picked up by the JS
   SETUP_INSTRUCTIONS_TEMPLATE array) was correct; adding regression tests
   that pin to the JS clipboard block specifically so a future inversion
   would fail loudly.

Tests: analyst clipboard contains `agnes init` + `agnes catalog` and
NOT `agnes auth import-token` / `agnes skills`; admin clipboard is the
inverse. Plus an explicit assertion that ROLE is rendered via tojson.
2026-05-04 17:43:46 +02:00
ZdenekSrotyr
7965f8021d fix(setup): role-aware PAT scope+TTL in setupNewClaude JS (Task 4 spec fix) 2026-05-04 17:34:30 +02:00
ZdenekSrotyr
f731ee7897 feat(setup): /setup?role=analyst|admin branching with role tiles 2026-05-04 17:28:47 +02:00
ZdenekSrotyr
54f83c281c test(setup): I1+I2 review fixes — AGNES_WORKSPACE.md alignment + step-number pin 2026-05-04 17:23:15 +02:00
ZdenekSrotyr
ae00945cbf fix(setup): clean stale 'da' refs in setup_instructions.py (Task 0.5 missed sweep) 2026-05-04 17:19:55 +02:00
ZdenekSrotyr
29e28ccbd3 feat(setup): add analyst role to install-prompt renderer 2026-05-04 17:17:59 +02:00
ZdenekSrotyr
59324f9361 feat(admin): scan CLAUDE.md override for legacy strings 2026-05-04 17:10:58 +02:00
ZdenekSrotyr
4ee7323436 feat(tokens): add scope + ttl_seconds fields with bootstrap-analyst clamp 2026-05-04 17:00:54 +02:00
ZdenekSrotyr
1563b05f2e refactor(cli): hard-cutover env vars + config dir to AGNES_*
Task 0.5 of clean-analyst-bootstrap. Greenfield rewrite — no fallback,
no aliases. Existing dev environments lose their cached PAT and must
re-authenticate.

Env var renames (hard cutover):
- DA_CONFIG_DIR    -> AGNES_CONFIG_DIR
- DA_SERVER        -> AGNES_SERVER
- DA_SERVER_URL    -> AGNES_SERVER_URL  (test-only stale ref, not in spec)
- DA_NO_UPDATE_CHECK -> AGNES_NO_UPDATE_CHECK
- DA_LOCAL_DIR     -> AGNES_LOCAL_DIR
- DA_TOKEN         -> AGNES_TOKEN
- DA_STREAM_RETRIES -> AGNES_STREAM_RETRIES

Config dir rename: ~/.config/da/ -> ~/.config/agnes/ (across code,
comments, docstrings, error messages, install templates, dev scripts).

Stale `da X` references in CLI source (and adjacent app/, tests/):
swept docstrings, comments, help text, and error messages where the
verb survives the rewrite (init, pull, push, catalog, status, diagnose,
auth, admin, skills, query, schema, describe, explore, disk-info,
snapshot, login, logout, whoami, server, setup) and replaced `da X`
with `agnes X`. Intentionally kept `da sync`, `da fetch`, `da analyst`,
`da metrics` — those verbs are removed in later tasks; the legacy
strings will be detected by `_LEGACY_STRINGS` (added in Task 2).

Test fixes:
- TestCLIVersion now asserts output starts with `agnes ` (was `da `).

Test results: 2675 passed, 25 skipped (full pytest run, excluding 9
pre-existing test_db.py / test_user_management.py / test_e2e_extract.py
/ test_cli_binary_rename.py failures unrelated to this rename).
2026-05-04 16:35:44 +02:00
ZdenekSrotyr
4bd1919f77 fix(query): #168 review iter 5 — forbidden-table check uses registry IDs
Devin Review iter #5 flagged a pre-existing class of name/id mismatch
in app/api/query.py:131-136 — the SAME root cause as the bq.* RBAC
issue I fixed in iter #3 (line 332/362). Devin called it out as
"NOT introduced by this PR" / "might merit follow-up", but it's
exactly the same security-boundary pattern this PR is hardening, so
fixing here keeps the RBAC story consistent across the handler.

The `forbidden = all_views - set(allowed)` comparison mixed types:
- `all_views` carries DuckDB master view names (= registry display
  `name` from the orchestrator's CREATE VIEW)
- `set(allowed)` carries registry IDs (resource_grants.resource_id)

When `id != name` (e.g. id="bq.finance.ue", name="ue"), authorized
users got spurious 403s — the view name landed in `forbidden` even
though the caller had a valid grant on the registry id.

Build a name->id map from the registry, then the forbidden check
compares apples to apples:
    allowed_view_names = {r["name"] for r in registry_rows
                          if r.get("name") and r.get("id") in allowed_ids}
    forbidden = all_views - allowed_view_names

107 affected tests pass; 487 pass in wider RBAC/query/access/admin
domain — no regressions.
2026-05-04 14:18:43 +02:00
ZdenekSrotyr
28aba4c1f9 fix(query): #168 review iter 3 — RBAC name-vs-id, placeholder dead code
Devin Review iter #3 found 3 new real bugs after iter #2's fixes landed.

🔴 RBAC check at app/api/query.py:362 used `row["name"]` against
`accessible_set`, but `accessible_set` is keyed by registry IDs
(`get_accessible_tables` returns `resource_grants.resource_id` —
table IDs, not display names). Confirmed by `_table_blocks` projection
at `app/resource_types.py:157-158`. When `id != name` (e.g.
`id="bq.finance.ue", name="ue"`), non-admin users with valid grants
got 403 `bq_path_access_denied`. Switch to `row["id"]`.

🚩 Bare-name pass at app/api/query.py:332 had the same name-vs-id
mismatch (different impact): legitimate accessible rows were skipped
from `dry_run_set`, so the cost guardrail under-counted scan bytes
for non-admin users. Could let an over-cap query through and
under-bill quota. Switch to `row_id` comparison.

🟡 `placeholder_from` for billing_project was dead code.
`_BQ_OPTIONAL_FIELD_DEFAULTS["billing_project"] = ""` seeded an empty
string into every GET payload via `_ensure_bq_optional_fields`. JS
`isUnset = (value === undefined)` evaluated False, so the
`(defaults to <project>)` placeholder NEVER rendered. Drop the seed —
field stays in `known_fields` (UI sees it) but routes through the
unset rendering path on GET, where placeholder_from fires.

Tests: test_get_surfaces_bq_fields_even_when_unset assertion flipped
from "billing_project IS present" to "billing_project NOT auto-seeded"
to lock in the new shape. 67 affected tests pass.
2026-05-04 13:51:36 +02:00
ZdenekSrotyr
5eaa449fcc fix(query): #168 review iter 2 — quota user_id parity + concurrent-slot 429
Devin Review iter #2 found 2 new issues (after iter #1's 5 fixes
landed). Both real, both addressed.

🔴 Quota user_id key mismatch defeated shared daily budget. /api/query
computed `user.get("id") or user.get("email")` while /api/v2/scan uses
`user.get("email") or "anon"` (app/api/v2_scan.py:327). Same user → two
different keys in the singleton QuotaTracker. BQ bytes consumed via
/api/query were tracked under UUID; via /api/v2/scan under email; the
`check_daily_budget` pre-flight on either endpoint never saw the
other's recorded bytes — per-user cap was effectively doubled. Match
v2/scan's email-first ordering.

🟡 QuotaExceededError(KIND_CONCURRENT) → 400 instead of 429.
`quota.acquire(user_id)` raises this from __enter__ when the per-user
concurrent-scan slot is at cap. The exception propagated through the
@contextlib.contextmanager generator, the caller's `with guard:`
block, and was caught by execute_query's generic `except Exception`
handler → mapped to 400 with a flattened "Query error: concurrent_scans:
N/M" string, dropping the typed retry_after_seconds field. Wrap the
`with quota.acquire(...)` in a try/except QuotaExceededError that maps
to 429 with the same typed-detail shape used for the daily-budget
rejection — consistent with /api/v2/scan:392-402.

Tests: test_api_query_quota.py user_id strings updated to
"admin@test.com" (the seeded_app admin's email) to match the new
email-first ordering. 40 affected tests pass.
2026-05-04 13:38:31 +02:00
ZdenekSrotyr
1263b80726 fix(query): #168 review — concurrent-slot wraps execute, doc/JS fixes
Devin Review on PR #168 found 5 issues — all real, all addressed.

🚩 ANALYSIS_001 (architectural): concurrent-slot guard didn't protect
actual BQ query execution. Earlier `_enforce_remote_bq_quota_and_cap`
ran dry-run + cap check inside `with quota.acquire(user_id):`, then
returned — releasing the slot BEFORE `analytics.execute(...)` ran. Spec
§4.3.3 explicitly designs the slot to wrap execute so the per-user
concurrent cap limits BQ scans, not just dry-runs.

Refactor to a context manager `_bq_quota_and_cap_guard`. Caller's `with`
block now holds the slot through dry-run, cap check, the actual
`analytics.execute(...)` (which is what triggers the BQ scan when DuckDB
resolves the master view), AND the post-flight record_bytes. Slot
released only when caller's `with` body exits.

🟡 BUG_001: placeholder JS walked `original` (full GET payload root)
instead of `original.sections`. `placeholder_from: ["data_source",
"bigquery", "project"]` is a section-relative path, so billing_project
placeholder NEVER rendered. Fix: walk `original.sections` (with fallback
to `original` for safety).

🟡 BUG_002 + BUG_003: admin_tables.html register and edit modals'
operator help text referenced `max_bytes_per_remote_query` (the old
name from the spec) but the actual config key is `bq_max_scan_bytes`
after the fix-up commit `6423888d` moved it. Replace both occurrences.

🟡 BUG_004: CHANGELOG entry said `api.query.bq_max_scan_bytes` (the
old path) but the read at app/api/query.py:53 is
`get_value("data_source", "bigquery", "bq_max_scan_bytes", ...)`. An
operator who set it under `api.query` in their yaml would have no
effect. Correct path in CHANGELOG.

All 95 #160-affected tests pass after the changes.
2026-05-04 13:28:03 +02:00
ZdenekSrotyr
6423888d02 fix(query): #160 move bq_max_scan_bytes to data_source.bigquery (UI editable)
E2E test on dev VM revealed: spec said "configurable via /admin/server-config"
for the cost guardrail cap, but the underlying read path was
`api.query.bq_max_scan_bytes` and `api` is NOT in `_EDITABLE_SECTIONS`. POST
to /admin/server-config rejected `{"sections":{"api":...}}` as "unknown
section(s): api" — the cap was only adjustable via direct YAML edit.

Move to `data_source.bigquery.bq_max_scan_bytes`:
- `_default_remote_query_cap_bytes()` reads from the new path.
- Add to `_OPTIONAL_FIELDS["data_source"]["bigquery"]["fields"]` with the
  same shape as `max_bytes_per_materialize` (kind=int, default 5 GiB, hint).
- Add to `_BQ_OPTIONAL_FIELD_DEFAULTS` so it surfaces in the GET payload
  even when YAML omits it.

Convention now mirrors `max_bytes_per_materialize` — both BQ cost
guardrails live under `data_source.bigquery`, both editable in the UI.
2026-05-04 12:46:38 +02:00
ZdenekSrotyr
39bdc1ff45 feat(admin): #160 BQ test-connection endpoint + billing_project placeholder UI
Closes the operator-side half of the reporter's loop. The CLI fix in
the previous commit makes USER_PROJECT_DENIED errors readable to
analysts; this commit lets admins verify reachability proactively
from /admin/server-config without waiting for analyst reports.

New endpoint POST /api/admin/bigquery/test-connection
(app/api/admin_bigquery_test.py, ~110 LOC):
- Depends(require_admin); registered in app/main.py.
- Builds BqAccess via existing get_bq_access(), runs `SELECT 1 AS ok`
  with a 10s polling timeout.
- 200 with {ok, billing_project, data_project, elapsed_ms} on success.
- 400 for `BqAccessError(not_configured)` (operator config issue).
- 502 for any other typed BqAccessError or unknown upstream exception.
- 504 for concurrent.futures.TimeoutError; best-effort cancel_job
  invoked (BQ-side cancel may still run; documented caveat).

Server-config placeholder (app/api/admin.py + admin_server_config.html):
- `data_source.bigquery.billing_project` field-spec gains
  `placeholder_from: ["data_source", "bigquery", "project"]`.
- renderLeafInput's text branch reads `opts.spec.placeholder_from`,
  walks the loaded `original` config dict, injects
  `placeholder="(defaults to <project>)"` into the input HTML at
  construction time. Admin sees the access.py:339-340 fallback rule
  visible directly in the UI without reading source.

UI button:
- "Test BigQuery connection" button next to data_source's Save button.
- onTestBigQuery() POSTs to the endpoint, renders structured result
  inline (green check + elapsed_ms on success; red kind + hint on
  failure).

Tests: 6 endpoint cases + 1 placeholder payload test = 7 GREEN. 62
total across the affected admin server-config test files.
2026-05-04 10:31:35 +02:00
ZdenekSrotyr
77cdb65f76 sec(query): #160 BQ_PATH catches quoted "bq" catalog token (Phase 3 review)
Phase 3 review identified an RBAC + cost-cap bypass: `SELECT * FROM
"bq"."ds"."tbl"` (catalog token quoted as a DuckDB identifier) was NOT
matched by the BQ_PATH regex, so direct quoted-form references skipped
both the registry check and the cost-cap dry-run. DuckDB resolves
`"bq"` to the same ATTACHed BQ catalog, so the bypass is real.

Widen the catalog-token alternation: `(?:"bq"|bq)` matches both forms.
Negative lookbehind `(?<![\w.])` still rejects look-alike prefixes
(`other_bq`, `my_bq`); the new "my_bq".ds.tbl negative test locks that
in alongside `other_bq.ds.tbl`.

Tests:
- 2 new positive cases in tests/test_query_bq_regex.py for the quoted
  form (`"bq"."finance"."ue"` and uppercase `"BQ"."ds"."tbl"`).
- 1 new negative case rejecting `"my_bq".ds.tbl` so the quoted-form
  widening doesn't open a different evasion.
- 1 new RBAC test in tests/test_api_query_rbac_bq_path.py: admin
  hitting an unregistered quoted path returns the same
  bq_path_not_registered 403 as the unquoted form.

All 33 Phase 3 tests pass after the fix.
2026-05-04 10:31:35 +02:00
ZdenekSrotyr
896c43c7a2 feat(query): #160 cost guardrail + bq.* RBAC + quota integration on /api/query
The headline implementation for issue #160. POST /api/query now gates
direct `bq."<dataset>"."<source_table>"` references behind the registry
and bounds the BQ scan cost behind a configurable cap. Wired through
the same singleton QuotaTracker as /api/v2/scan so daily-byte budgets
are shared across both BQ-touching paths.

Changes in app/api/query.py:

- Add module-level `BQ_PATH` regex matching the 16 syntax variants
  verified empirically (fully-quoted, unquoted, mixed quoting,
  case-insensitive, inside CTE bodies, multi-path, …).
- Add `bigquery_query` to the SQL keyword blocklist. Closes the
  pre-existing function-call backdoor where a user could run an
  arbitrary BQ jobs API call against any reachable dataset, bypassing
  the registry and RBAC. Wrap views internal to the BQ extractor still
  use bigquery_query() — but those run via DuckDB view resolution at
  query time, not via user-submitted SQL, so the blocklist doesn't
  break them.
- Add `_bq_guardrail_inputs` helper: walks user SQL twice — once for
  bare-name matches against accessible registered remote-BQ names
  (contributes to dry_run_set), once for direct `bq.X.Y` matches
  (gated against `find_by_bq_path` lookups, returns 403 with
  structured detail on miss or grant violation).
- Add `_enforce_remote_bq_quota_and_cap` helper: pre-flight
  `check_daily_budget` (over-cap → 429), then `with quota.acquire(...)`
  wraps a per-path BQ dry-run, sums bytes, raises 400
  `remote_scan_too_large` when total > cap.
- Cap default 5 GiB; configurable via `api.query.bq_max_scan_bytes`
  in /admin/server-config (next phase wires the UI).
- Post-flight `record_bytes` against the user's daily counter.
- Module-level imports of `_bq_dry_run_bytes`, `_build_quota_tracker`,
  `get_bq_access` so tests can monkeypatch via `app.api.query.<name>`.

Tests:
- All 23 RED tests from the previous commit now pass (regex matrix,
  blocklist with detail-string assertion, RBAC unregistered/admin-bypass,
  guardrail dry-run-called/over-cap-rejected, quota pre-flight 429).
- mock_dry_run fixture stubs both `_bq_dry_run_bytes` and `get_bq_access`
  so guardrail tests don't require a live BQ project.
- Quota test uses `admin1` (the seeded_app fixture's actual user id, not
  `admin`).

Smoke: 887 passed across query/bq/admin/extractor/registry/quota
domains. No regressions.
2026-05-04 10:31:35 +02:00
ZdenekSrotyr
e44d2280e5 refactor(quota): #160 relocate _build_quota_tracker to v2_quota.py
The /api/query cost guardrail (next phase) needs the same singleton
QuotaTracker so its daily-byte and concurrent-slot caps accumulate
across both /api/v2/scan and /api/query BQ-touching paths.

Move `_build_quota_tracker`, `_quota_singleton`, and `_quota_init_lock`
from app/api/v2_scan.py to app/api/v2_quota.py (the natural home; the
factory uses QuotaTracker which already lives there). Re-export the
function from v2_scan.py so the 7 test sites at tests/test_v2_scan.py
(lines 77, 118, 143, 160, 186, 208, 250) keep working without edits.

Crucially do NOT re-export `_quota_singleton` from v2_scan.py — Python
`from X import var` copies the binding at import time, so a re-exported
singleton would freeze at the initial None and never observe the
in-place mutation done inside `_build_quota_tracker()`. Re-export only
the function (which always reads the live module-global through `global`).

Mechanical refactor; no behavior change. 30 quota-related tests pass.
2026-05-04 10:31:35 +02:00
ZdenekSrotyr
9d0e4e687d refactor(bq): #160 remove legacy_wrap_views config knob (always-wrap)
Now that VIEW/MATERIALIZED_VIEW always wrap via bigquery_query() (the
prior `legacy_wrap_views=True` branch behavior, made unconditional in
the previous commit), the toggle has no semantic meaning and is removed
across the codebase.

Production code:
- app/api/admin.py: drop the field from _OPTIONAL_FIELDS["data_source"]
  ["bigquery"]["fields"] and from _BQ_OPTIONAL_FIELD_DEFAULTS, plus the
  comment block above the defaults dict.
- config/instance.yaml.example: drop the example snippet.
- src/orchestrator.py: update the inner-objects skip-branch comment to
  reflect the new BQ behavior (the skip itself stays — keboola
  use_extension=False still inserts _meta rows without inner views).
- app/web/templates/admin_tables.html: rewrite operator copy in the
  register and edit forms to reflect always-wrap.

Tests:
- tests/test_admin_server_config.py (TestServerConfigBigQueryFields):
  flip assertions from "field IS present" to "field NOT present" on
  legacy_wrap_views. Drop the test_post_persists_legacy_wrap_views test
  since the field no longer exists.
- tests/test_admin_server_config_known_fields.py: same flip on the
  known-fields registry assertion.
- tests/test_bigquery_extractor.py: drop the obsolete
  test_view_entity_does_not_create_master_view_by_default (asserted the
  bug we fixed) and test_legacy_wrap_views_toggle_restores_old_behavior
  (toggle no longer meaningful). Update remaining test docstrings.

Operators with `legacy_wrap_views: true` set in their overlay get the
new (equivalent) behavior automatically — the unrecognized key is
silently ignored by the YAML loader. Operators with `false` get the
issue-#160 fix as a behavior change, not a regression.

Spec gate updated: production code grep gate
  grep -rn 'legacy_wrap_views' connectors app src config cli
must return zero. tests/ excluded — historical "removed in #160"
breadcrumbs and `assert "X" not in fields` regression guards retained
as anti-regression signals.
2026-05-04 10:31:35 +02:00
ZdenekSrotyr
955b56608d feat(api,web,cli): /admin/workspace-prompt + /api/welcome restored + da analyst writes CLAUDE.md
- app/api/claude_md.py: GET /api/welcome (analyst, auth required); GET/PUT/DELETE
  /api/admin/workspace-prompt-template; POST …/preview; two-pass Jinja2 validation
  on PUT; validation stub mirrors build_claude_md_context() shape
- app/main.py: register claude_md_router
- app/web/router.py: GET /admin/workspace-prompt → admin_workspace_prompt.html
- app/web/templates/admin_workspace_prompt.html: CodeMirror editor + live preview +
  status chip + reset modal; mirrors admin_welcome.html for Agent Setup Prompt
- app/web/templates/_app_header.html: add "Agent Workspace Prompt" nav item next to
  "Agent Setup Prompt"; extend _admin_active to cover /admin/workspace-prompt
- cli/commands/analyst.py: _init_claude_workspace now accepts server_url + token;
  _write_claude_md fetches GET /api/welcome, writes CLAUDE.md, graceful 404/5xx;
  setup command adds --no-claude-md flag to opt out; default = write CLAUDE.md
- tests: test_claude_md_api.py (16 tests); test_analyst_bootstrap.py updated with
  4 new CLAUDE.md bootstrap tests; test_welcome_template_api.py: update stale
  assertion about /api/welcome being removed (endpoint restored)
- tests/snapshots/openapi.json: regenerated
2026-05-03 22:44:14 +02:00
ZdenekSrotyr
9ad7856f72 fix(devin-review): dashboard CTA respects override; PUT validates anon path
Finding #1: _build_context now routes through render_agent_prompt_banner when
a DB connection is available, so both /setup and the /dashboard clipboard CTA
always reflect the admin override (or the live default when no override is set).
Previously _build_context unconditionally used resolve_lines(), ignoring the
welcome_template override for the dashboard JS array.

Finding #2: PUT /api/admin/welcome-template now performs a second render pass
with user=None (anonymous stub) after the authenticated-user pass. Templates
that reference user.* fields without an {% if user %} guard are rejected with
a clear 400 error explaining the anon-visitor breakage.
2026-05-03 21:45:32 +02:00
ZdenekSrotyr
d18bc4c8f7 fix(api): align PUT validation autoescape with runtime (False); docs match 2026-05-03 21:30:24 +02:00
ZdenekSrotyr
61ef0d0eed fix(devin-review): address 4 findings on PR #167
- Fix #1: _detect_existing_project now checks .claude/settings.json for
  "da sync" marker instead of deleted CLAUDE.md; update tests accordingly.
- Fix #2: preview endpoint uses autoescape=False to match /setup rendering;
  align render_agent_prompt_banner in welcome_template.py to the same.
- Fix #3: apply _sanitize_banner_html to override render path in setup_page
  so all render paths sanitize consistently.
- Fix #4: move .setup-link-banner into the existing-user branch where
  account_details.last_sync_display is reachable; remove dead copy from
  new-user branch.
2026-05-03 21:15:01 +02:00
ZdenekSrotyr
bcb62ff4e2 fix(ui): tighten dashboard token row gap; lift editor/preview labels above panes 2026-05-03 19:51:34 +02:00
ZdenekSrotyr
05f12b416d fix(ui): dashboard token row alignment + match editor/preview heights 2026-05-03 19:23:50 +02:00
ZdenekSrotyr
dc931a6556 feat(admin-prompt): default = live setup script; override replaces /setup content
The /admin/agent-prompt editor now pre-fills with the full bash bootstrap
script from setup_instructions.resolve_lines() instead of being empty.
When an admin saves an override it replaces the default everywhere — the
/setup page display and the dashboard clipboard CTA — rather than adding a
banner above the auto-generated commands.

GET /api/admin/welcome-template now returns a `default` field with the live
computed script so the editor always shows meaningful starting content.

{server_url} and {token} single-brace placeholders survive Jinja2 rendering
and are substituted by JavaScript at clipboard-copy time as before.

Preview pane switches to textContent (not innerHTML) since content is bash.
2026-05-03 16:31:35 +02:00
ZdenekSrotyr
c4d23cf235 feat(admin-prompt): update editor UX + docs for banner context
- admin_welcome.html: update subtitle, description, placeholder cheatsheet
  (drop tables/metrics/marketplaces/sync_interval; add user-null note and
  security note). Textarea initial value is now empty (no default template
  to show). Preview pane uses innerHTML (HTML output). refreshStatus sets
  editor to empty when no override. Preview pane styled as light surface.
  Reset modal copy updated (no banner shown, not "OSS-shipped template").
- config/claude_md_template.txt: deleted (markdown template is gone;
  default is now no banner).
- docs/agent-setup-prompt.md: rewritten for variant C — describes the
  /setup banner, smaller placeholder table, security/sanitization notes,
  anonymous-user guard, example HTML snippet.
2026-05-03 16:12:13 +02:00
ZdenekSrotyr
8db4c1645b feat(admin-prompt): variant C — banner on /setup, drop CLAUDE.md generation
- src/welcome_template.py: rewrite as HTML banner renderer
  (render_agent_prompt_banner); drop _list_tables, _metrics_summary,
  _marketplaces_for_user, render_welcome, _load_default_template.
  build_context now exposes only instance/server/user/now/today.
  _sanitize_banner_html strips script/iframe/on*/javascript: post-render.
- app/api/welcome.py: drop get_welcome handler, WelcomeResponse, old
  _VALIDATION_STUB_CONTEXT. Admin endpoints stay at same URLs; validation
  stub updated to match new slim context. Preview now uses autoescape=True.
- app/web/router.py: setup_page calls render_agent_prompt_banner and passes
  banner_html to install.html; admin_agent_prompt_page drops _load_default_template.
- app/web/templates/install.html: add .setup-banner CSS + banner block above hero.
- cli/commands/analyst.py: replace _generate_claude_md with _init_claude_workspace;
  no CLAUDE.md written, only .claude/CLAUDE.local.md placeholder + settings.json hooks.
- tests: delete test_cli_analyst_welcome.py (tests deleted endpoint/function);
  rewrite TestGenerateClaudeMd → TestInitClaudeWorkspace; update api test to
  assert /api/welcome returns 404 and remove welcome-fetch tests.
2026-05-03 16:12:13 +02:00
ZdenekSrotyr
60386b9c3c polish: drop dead CSS, fix docstring drift, add agent-prompt route test 2026-05-03 16:12:13 +02:00
ZdenekSrotyr
ecb6c35ad5 feat(admin): rename /admin/welcome to /admin/agent-prompt (Agent Setup Prompt)
Rename the welcome prompt editor from /admin/welcome to /admin/agent-prompt
and update all UI labels to "Agent Setup Prompt". API endpoint URLs are
unchanged (PUT/GET/DELETE /api/admin/welcome-template, GET /api/welcome).

- Nav menu: "Welcome prompt" → "Agent Setup Prompt", href updated
- Page title and h2 updated in admin_welcome.html
- Error message hint in app/api/welcome.py updated to /admin/agent-prompt
- Dashboard: replace inline <details> preview of _claude_setup_instructions
  with a simple link to /setup (Task C)
- docs/welcome-template.md renamed to docs/agent-setup-prompt.md; internal
  references to /admin/welcome updated
- OpenAPI snapshot path updated
- Tests updated to reflect new route and removed inline preview
2026-05-03 16:12:13 +02:00
ZdenekSrotyr
c7b14fb120 feat(admin): drop setup_banner feature; consolidate into single editor
Remove the setup_banner feature (admin-editable /setup page banner) and
all associated code: API router, repository, renderer, admin template,
tests, and docs. The setup_page handler no longer calls render_setup_banner;
the install.html template no longer renders banner_html. The setup_banner
DuckDB table (v22) is kept intact for forward-compat with already-migrated
instances — only the application code is removed.

CHANGELOG updated: setup_banner bullets removed; Agent Setup Prompt
(welcome-template feature) now stands alone as the single editable prompt.
2026-05-03 16:12:13 +02:00
ZdenekSrotyr
b0ec842804 feat(admin-ui): SRI + CDN fallback for CodeMirror, 301→302 on /install, error sanitization
- Add integrity= + crossorigin= to all 4 cdnjs tags in admin_welcome.html
  and admin_setup_banner.html (I-1)
- Add graceful CDN fallback: when CodeMirror is undefined (SRI mismatch or
  CDN down), degrade to styled plain textarea with polyfill editor interface
  so save/reset/preview still work (I-1)
- Replace fixed 480px editor height with calc(100vh - 320px) for
  viewport-relative sizing; add min-height: 480px to .welcome-editor-col (M-8)
- Change /install redirect from 301 to 302 to prevent indefinite browser
  caching (I-5)
- Sanitize Jinja2 error detail in /api/welcome 500 response: log full error
  server-side, return generic detail pointing at /admin/welcome (M-7)
- Hoist build_context import to module level in app/api/welcome.py (M-11)
2026-05-03 16:12:13 +02:00
ZdenekSrotyr
39146288e1 feat: admin-editable setup_banner on /setup page (schema v22)
Adds an optional Jinja2/HTML banner displayed above the bootstrap
commands on /setup. Empty by default; admin authors it at
/admin/setup-banner. autoescape=True — safe for HTML context.
Render failures return "" so a broken banner never breaks /setup.

Schema v22: setup_banner singleton table, auto-migration v21→v22.
2026-05-03 16:12:13 +02:00
ZdenekSrotyr
40d221f20a feat(admin-welcome): CodeMirror editor + live preview pane 2026-05-03 16:12:13 +02:00
ZdenekSrotyr
4bcdc4e7d7 feat(dashboard): link Claude Code setup CTA to /setup page 2026-05-03 16:12:13 +02:00
ZdenekSrotyr
85967e14ca feat(web): rename /install → /setup; nav label 'Setup local agent'
- Add GET /setup serving install.html (CLI + Claude Code setup page)
- Add GET /install → 301 redirect to /setup for backwards compat
- Move first-time setup wizard from /setup to /first-time-setup
- Update nav link: href=/setup, label 'Setup local agent', active on both /setup and /install paths
- Update page <title> to 'Setup local agent — …'
- Update /dashboard and /setup comment in _claude_setup_instructions.jinja
- Update tests and OpenAPI snapshot accordingly
2026-05-03 16:12:13 +02:00
ZdenekSrotyr
92fd78cfb4 fix(admin-welcome): redesign with peer chrome, toast, btn-copy 2026-05-03 16:12:13 +02:00
ZdenekSrotyr
ecaa113c68 fix(admin-welcome): credentials: include, real-content preview, refresh after mutate 2026-05-03 16:10:48 +02:00
ZdenekSrotyr
2b3048f77f feat(web): /admin/welcome editor page 2026-05-03 16:10:48 +02:00
ZdenekSrotyr
93b713900b fix(api): validate template render on PUT; broaden render-time catch 2026-05-03 16:10:48 +02:00
ZdenekSrotyr
0d1ecd235d feat(api): /api/welcome + /api/admin/welcome-template endpoints 2026-05-03 16:10:48 +02:00
ZdenekSrotyr
d055417377 feat(config): default welcome template in jinja2 + sync_interval 2026-05-03 16:10:48 +02:00
ZdenekSrotyr
91caefaca9
security(auth): per-IP rate limit + last-admin guard (#165)
* security(auth): per-IP rate limit on auth endpoints + generalize last-admin guard

Closes #45 and #151.

#45 — every auth endpoint was unthrottled (login, magic-link, token,
bootstrap), leaving us open to password brute-force and SMTP
email-bombing. Wires slowapi (new dep) into the middleware chain with
per-route limits: 10/min on login + token, 5/min on send-link, 3/min on
bootstrap. Returns 429 with Retry-After: 60 once exceeded. Per-IP key
respects the leftmost X-Forwarded-For hop (Caddy in front of the app
strips client-supplied XFF). Operator escape hatch:
AGNES_AUTH_RATELIMIT_ENABLED=0. Test suite disables the limiter via
autouse conftest fixture so existing auth tests that hammer endpoints
in tight loops are unaffected.

#151 — DELETE /api/admin/users/{id}/memberships/{group_id} and the
mirror DELETE /api/admin/groups/{group_id}/members/{user_id} only
guarded against self-removal as last admin. Generalizes to refuse
removing anyone from the seeded Admin group when they are the only
remaining active admin (mirrors the existing
count_admins(active_only=True) <= 1 check on delete_user / update_user).
Recovery from zero admins requires direct DB access, so this closes
a path where a scheduler/bootstrap actor that bypasses normal admin
checks could otherwise empty the group.

* security(auth): throttle remaining email-bombing + token-confirm endpoints

Address code-review gap on PR #165 — the first commit covered /send-link
but missed two endpoints with the IDENTICAL email-bombing surface:

- POST /auth/password/reset       — sends reset mail, anti-enum response
- POST /auth/password/setup/request — sends setup mail, anti-enum response

Both now share the 5/min limit with /send-link.

Also add 10/min to the token-confirm surfaces — high-entropy tokens but
partial leaks via logs / referer have surfaced before, and unbounded
guess rate would let an attacker exhaust the keyspace adjacent to a
leaked prefix:

- POST /auth/email/verify
- GET  /auth/email/verify         — closes the click-through bypass
- POST /auth/password/reset/confirm
- POST /auth/password/setup/confirm

Doc fix: rate_limit.py module docstring + CHANGELOG entry no longer
claim "disable without a redeploy" (misleading). The Limiter constructor
freezes `enabled` from env at import time, matching every other Agnes
env knob — operators set the flag and bounce the container.

Tests: 4 new cases in test_auth_rate_limit.py covering
/reset, /setup/request, /reset/confirm, GET /verify. Full suite:
2583 passed, 32 skipped, 0 failed.

* security(auth): throttle JSON /auth/password/setup — closes form-throttle bypass

Second code-review pass on PR #165 caught a fifth gap: POST /auth/password/setup
(JSON variant, kept for backward compat) consumes the same setup_token as
the web form /setup/confirm but was unthrottled — an attacker brute-forcing
the token just switches from the form path to the JSON path and resumes
at unbounded RPS. Apply the same 10/min limit and signature shape used
on /setup/confirm.

Also extend CHANGELOG note about the JSON-variant bypass for future
operators reading the security entry.

Test: 1 new case (test_password_setup_json_rate_limited_after_10_requests),
9 rate-limit tests + 28 password-flow tests + 41 auth-provider tests pass,
no regressions.

* chore(release): cut 0.30.1 — auth security hardening (rate limit + last-admin guard)
2026-05-02 21:08:33 +02:00
ZdenekSrotyr
dc03837a7b feat(query-api): better error message when --remote query references a materialized-but-not-rebuilt id
E2E sub-agent finding: `da query --remote "SELECT * FROM <id>"` against a
materialized table that hasn't yet been rebuilt in the server's
analytics.duckdb returns a confusing DuckDB "Table does not exist"
message even though the table is in the registry. Materialized rows
produce parquets at `${DATA_DIR}/extracts/<source>/data/<id>.parquet`,
but the orchestrator's master-view creation is `_meta`-driven — fresh
instances or pre-tick states have the registry row without a
corresponding view, so analysts hit the bare "does not exist" with no
path forward.

Improve the error rendering in `app/api/query.py:execute_query`. When
DuckDB raises a "table does not exist" error, scan the registry for any
`query_mode='materialized'` row whose id or name appears in the failed
SQL. On a hit, return a 400 whose detail names the table, explains the
materialize state, and offers two concrete next steps:

1. Run `da sync` (or wait for the scheduler tick / hit
   POST /api/sync/trigger) to materialize the parquet, OR
2. Query the source directly via the catalog alias when the registry row
   carries bucket+source_table (e.g. `bq."dataset"."table"` for BigQuery,
   `kbc."bucket"."table"` for Keboola).

Detection is bounded — the registry round-trip only fires when DuckDB's
error mentions a missing table, so happy-path queries pay no cost.
Non-materialized unknowns fall through to DuckDB's raw error.

2 new tests: materialized id surfaces the hint with the bucket+source_table
payload; unknown table falls back to the generic error path with no false
positive on the new hint.
2026-05-01 23:09:52 +02:00
ZdenekSrotyr
8030a867ec fix(admin-api): keep source_type validator permissive when primary is 'local' (bootstrap)
The strict source_type-availability validator from the prior commit
broke ~12 existing tests that register tables on the default test
instance (where `data_source.type` resolves to 'local' since no
instance.yaml is loaded).

The intent of the validator is to catch *explicit* misconfig:
`type=bigquery` instance + `source_type=keboola` payload with no
`data_source.keboola.*` block. The bootstrap workflow — admin sets up
a fresh instance and registers a few tables before pointing at a real
source — should not be gated here.

Loosen the check: when `get_data_source_type()` returns 'local' (the
fallback when no `data_source.type` is set), skip the rejection. The
explicit mismatch case still 422s because that path resolves
`configured_primary` to a real source type.

Also adds an autouse keboola_instance fixture to test_journey_sync_query.py
which exercises Keboola registrations through the full sync→query
flow — the fixture documents the test's data-source assumption rather
than relying on the bootstrap escape hatch.
2026-05-01 23:09:15 +02:00
ZdenekSrotyr
bc3ba0d43d feat(admin-api): reject register-table for source_type not configured on instance
E2E sub-agent finding: instance configured with `data_source.type='bigquery'`
and no `data_source.keboola.*` block. Admin POSTs `{source_type: 'keboola'}`
to /api/admin/register-table → returns 201, row lands in the registry, but
never syncs because the scheduler has no Keboola URL/token to ATTACH
against. Operator only notices the gap when `da catalog` keeps showing
nothing.

The new `_validate_source_type_configured` helper runs immediately after
the id/view-name collision checks in `register_table`. A source_type is
considered configured when:

- it matches `get_data_source_type()` (the instance's primary), OR
- a non-empty `data_source.<source_type>` block exists in the effective
  `instance.yaml` (multi-source instance), OR
- it's in `_SOURCE_TYPES_INDEPENDENT_OF_DATA_SOURCE` (Jira / local — both
  get data through paths that don't involve `data_source.*`).

Returns 422 with a message that names the configured primary source and
points at `/admin/server-config` for enabling a secondary one. None /
empty source_type is still tolerated for backward compat with legacy CLI
scripts that don't set the field — the route resolves it later.

5 new tests cover: keboola-on-bq rejected, bq-on-keboola rejected,
matching source_type still works, jira allowed regardless, omitted
source_type passes through.

Existing tests that registered Keboola rows on the unconfigured default
test instance now opt into a `keboola_instance` fixture to satisfy the
new validator (tests/test_admin_bq_register.py + .keboola_materialized
+ .unregister_cleanup; the multi-source PUT test in test_admin_bq_register
adds a `keboola` block to its synthetic config).

Pre-existing test_missing_project_returns_error failure in
TestRebuildFromRegistry is unrelated (config-cache leakage from a
previous test in the same class) — confirmed pre-existing on the prior
commit via `git stash` reproduction.
2026-05-01 23:04:51 +02:00
ZdenekSrotyr
dd46461c6c fix(admin+orchestrator): DELETE registry drops parquet + sync_state; rebuild skips orphan parquets
E2E sub-agent finding: register a materialized BQ row → sync to materialize
the parquet at `/data/extracts/bigquery/data/<id>.parquet` → DELETE the
registry row. The DB row goes away but:

- the parquet file stays on disk forever, AND
- the sync_state row stays, so `/api/sync/manifest` keeps advertising the
  dropped table to `da sync`, AND
- the orchestrator's next rebuild can resurrect a master view by picking
  up the leftover parquet.

Two-part fix in `unregister_table`:

1. For materialized rows on bigquery/keboola, remove
   `${DATA_DIR}/extracts/<source_type>/data/<name>.parquet` (and any stale
   `<name>.parquet.tmp` from a crashed prior materialize). Filename is
   keyed on `table_registry.name` to match sync_state bookkeeping.
   File-removal errors are logged but don't fail the DELETE — the registry
   row is already gone, and an orphan parquet won't get a master view at
   next rebuild because the orchestrator's _meta-driven scan never picks
   up bare parquet files.

2. Always clear `sync_state` + `sync_history` rows for the dropped table_id
   so the manifest stops advertising the table — applies to all source
   types and modes, not just materialized, since any synced row had a
   sync_state entry.

Orchestrator-side defensive guard (Finding 2b) is a no-op in the current
implementation: `_attach_and_create_views` only creates master views from
`_meta` rows in each connector's `extract.duckdb`, so a parquet without a
matching `_meta` entry is already invisible to the rebuild. The new
test `test_orchestrator_skips_orphan_parquet_in_extracts` is kept as a
regression guard for that contract.

5 tests cover: BQ + Keboola materialized DELETE removes parquet, remote
DELETE doesn't error trying to remove a non-existent file, sync_state
cleared on DELETE, orchestrator orphan-skip invariant.
2026-05-01 22:54:11 +02:00
ZdenekSrotyr
f0979f997a fix(admin-api): reject backtick BQ-native source_query at register; surface materialize errors per-row
E2E testing showed admin POSTs of materialized BQ rows whose source_query
uses BigQuery-native backtick identifiers (`prj.ds.t`) silently no-op'd at
the next sync tick — the materialize path runs the SQL through the DuckDB
BQ extension's COPY which uses DuckDB's parser; backticks aren't recognized
and the query either parse-errors or matches zero rows. No parquet lands at
the canonical path and no error reaches an operator-visible surface.

Two-part fix:

1. RegisterTableRequest's _check_mode_query_coherence model_validator now
   rejects any source_query containing a backtick with a 422 + actionable
   message pointing at the DuckDB equivalent (bq."dataset"."table"). Same
   check is applied in update_table on the merged record so PATCHes that
   flip a stored source_query to backtick form are also caught. Covers BQ
   AND Keboola materialized rows since both connectors funnel source_query
   through DuckDB's COPY.

2. _run_materialized_pass now persists per-row failures via the new
   SyncStateRepository.set_error / clear_error methods (existing
   sync_state.error / status columns — no schema migration). GET
   /api/admin/registry enriches each row with `last_sync_error` from a
   single batched SELECT against sync_state, so the admin UI / da admin
   status can show "this table failed last sync because: X" instead of
   operators having to trawl scheduler logs. Recovered rows have the
   error cleared automatically — update_sync's success path resets
   status='ok' / error=NULL on the upsert.

The materialized-path test fixture's _materialized_payload helper is
updated to use DuckDB-flavor SQL (the prior backtick example pre-dated the
fix). 6 new tests cover register/update rejection on BQ + Keboola, the
sync_state error persistence, and the registry response surface.
2026-05-01 22:51:02 +02:00