Commit graph

218 commits

Author SHA1 Message Date
ZdenekSrotyr
b2c1ff143c fix(query): rewrite BQ-backed user SQL via bigquery_query() to enable predicate pushdown
User SQL hitting query_mode='remote' BigQuery rows was 50-100x slower
than the equivalent direct bigquery_query() call because DuckDB's master
view (CREATE VIEW … AS SELECT * FROM bigquery.<ds>.<tbl>) does not push
WHERE/SELECT/LIMIT into BQ in ATTACH-catalog mode. The BQ extension opens
a Storage Read API session over the entire upstream table; on >100M-row
sources this was 70-150s and frequently failed with 'Response too large
to return'.

Extract the existing dry-run rewriter's core (table-name → BQ-native
backtick path) into a shared helper. Add an execution-path rewriter
that wraps the whole user SQL in bigquery_query('<project>', '<inner>')
so the BQ planner sees the full query and engages partition pruning +
projection pushdown server-side.

Conservative fall-through: cross-source JOINs (BQ ↔ Keboola/Jira local),
queries already containing bigquery_query(, and unconfigured BQ project
all skip the rewrite and run the original SQL via ATTACH-catalog so
behavior degrades gracefully.
2026-05-06 13:02:34 +02:00
ZdenekSrotyr
6bc8739010 feat(admin/tables): show source, schedule, folder, registered, and sync-error in row 2026-05-06 11:09:02 +02:00
ZdenekSrotyr
b230d44687 docs(admin/tables): clarify NUL sentinel in unescapeShellQuoting 2026-05-06 10:15:56 +02:00
ZdenekSrotyr
05e535d743 fix(admin/tables): unescape shell-quoting backslashes in descriptions 2026-05-06 10:13:49 +02:00
ZdenekSrotyr
e369d0ed7b fix(admin/tables): clamp long description to 2 lines so Actions stay reachable 2026-05-06 10:06:57 +02:00
ZdenekSrotyr
6c94d2cbce Merge remote-tracking branch 'origin/main' into pr180-review
# Conflicts:
#	CHANGELOG.md
#	pyproject.toml
2026-05-06 07:27:25 +02:00
ZdenekSrotyr
df2c33147c fix: Devin Review on #194 round 2 — 3 BUG-class findings
1. instance.yaml overlay path now matches read site under STATE_DIR.
   Three sites updated:
     - app/api/admin.py:1005 (server-config endpoint writer)
     - app/api/admin.py:2610 (configure endpoint writer)
     - app/instance_config.py:106 (overlay reader)
   All three now go through _state_dir() so under flat-mount layout
   (STATE_DIR=/data-state) the irreplaceable instance.yaml overlay
   lands on the state disk (sdc) instead of the regenerable data
   disk (sdb). Without this fix, .env_overlay correctly went to the
   state disk while instance.yaml went to the data disk — config
   would be lost if an operator wiped sdb.

2. Strip customer-specific tokens from OSS repo per CLAUDE.md
   vendor-agnostic rule:
     - docker-compose.host-mount.yml: 'a deployer (Groupon FoundryAI)'
       → 'a deployer in production'
     - docker-compose.flat-mount.yml: 'caused 2026-05-05 in the
       Groupon FoundryAI deployment' → generic 'production failure
       mode'
     - docs/state-dir.md: rewrote the incident reference to describe
       the failure mode abstractly without naming the deployment;
       updated the recommendation table to say 'shadow-mount class'
       instead of dating the specific incident.

3. Updated docs/state-dir.md 'What reads STATE_DIR' to list all
   read/write sites including the three migrated in this round
   (admin.py, instance_config.py, marketplaces.py).

ANALYSIS finding (tls-rotate.sh hardcoded host-mount.yml) deferred
— same operator-side class as auto-upgrade.sh hardcoded host-mount,
documented limitation per the PR body.
2026-05-05 20:02:50 +02:00
ZdenekSrotyr
b6543c9c55 fix: Devin Review on #194 — 2 BUG-class findings
1. .env_overlay write paths now match read path under STATE_DIR.
   app/main.py:343 reads via _state_dir() (post-PR #194), but two
   write sites still hardcoded ${DATA_DIR}/state/.env_overlay:
     - app/api/admin.py:2687 — configure endpoint secrets persistence
     - app/api/marketplaces.py:152 — marketplace PAT persistence
   Under flat-mount layout (STATE_DIR=/data-state) the admin UI wrote
   secrets to /data/state/.env_overlay while the app read from
   /data-state/.env_overlay, silently dropping the value on next
   restart. Both write sites now go through _state_dir().

2. host-mount.yml: caddy inherits data:/srv:ro from base, but with
   no service populating the data: named volume (other services
   switched to direct /data binds), the inherited mount points at an
   empty Docker volume — try_files finds nothing, every parquet
   download falls through to uvicorn, defeating the v0.36.0
   file_server bypass under the host-mount layout. Added a caddy
   override that restates all mounts including a direct /data:/srv:ro
   bind. Mirrors the comment + treatment already in flat-mount.yml.
2026-05-05 19:47:12 +02:00
Vojtech Rysanek
a303de0372 feat: STATE_DIR env var + flat-mount overlay (parallel disks)
Introduces STATE_DIR as the single source of truth for the writable
state directory path, with backward-compatible default of
${DATA_DIR}/state. Pairs with a new docker-compose.flat-mount.yml
overlay that mounts the state disk in PARALLEL to the data disk
(rather than nested under it).

Why
---
The default deployment topology nests state under data: sdb at /data,
sdc at /data/state. That layout has known fragility documented in
docs/state-dir.md — bind-propagation gotchas, two-writer collisions
on the same prefix, mount-order coupling. The 2026-05-05 incident in
the Groupon FoundryAI deployment was a manifestation of the
propagation gotcha.

The flat layout (sdb at /data, sdc at /data-state — parallel, not
nested) eliminates the nested-mount class entirely. Each disk is its
own bind mount, recursive by default in modern Docker. No volume
options to forget. No two-writer collision (host scripts and
container app share /data-state at the same path, single namespace).

What changes
------------
App code (Python):
- src/db.py:        new _get_state_dir() helper. get_system_db() and
                    schema migration snapshot use it.
- app/secrets.py:   new _state_dir() helper. _load_or_generate() uses
                    it for .session_secret and .jwt_secret.
- app/main.py:      .env_overlay loaded from _state_dir().

Host scripts:
- scripts/ops/agnes-auto-upgrade.sh: STATE_DIR drives mount-sanity
  check and cert detection. Defaults preserve existing behavior.
- scripts/ops/agnes-tls-rotate.sh:   STATE_DIR drives CERT_DIR.

New compose overlay:
- docker-compose.flat-mount.yml: parallel /data and /data-state binds
  per service. Mutually exclusive with docker-compose.host-mount.yml;
  pick one based on disk topology.

Documentation:
- docs/state-dir.md: layout choice (A nested vs B flat), pros/cons,
  migration steps, and which code paths read STATE_DIR.

Backward compatibility
----------------------
STATE_DIR defaults to ${DATA_DIR}/state — current behavior. Existing
deployers that don't set the var see no behavior change. Migration
to flat layout is opt-in per the runbook in docs/state-dir.md.

Validation
----------
- bash -n on both host scripts: pass
- docker compose config -f docker-compose.flat-mount.yml: resolves
  cleanly with all 6 services binding /data and /data-state directly
- python3 import + helper exercise: STATE_DIR override works,
  default falls back to ${DATA_DIR}/state

Companion to PR #191 (drop named-volume driver_opts in host-mount.yml).
That PR fixes the immutability footgun for Layout A; this PR offers
Layout B as the architectural alternative.
2026-05-05 19:28:07 +02:00
ZdenekSrotyr
f2ce915458 fix: Devin Review on #188 commit 28423907 — 2 bugs
🚩 /api/v2/catalog still async def while now calling sync stat()

`/api/v2/catalog` was left as `async def` when the rest of Tier 1 was
converted, on the assumption it was lightweight. The new
`_materialized_size_hint` populator added in this PR calls
`Path.stat()` / `Path.exists()` for every visible row to bucket the
parquet size — on a local FS that's microseconds, but on a
network-mounted DATA_DIR (NFS / CIFS / GCS-FUSE) those syscalls
can block the event loop. Convert to plain `def` so FastAPI
auto-offloads to the thread pool, mirroring /api/query etc.

🔴 stream_download translates HTTPStatusError as generic transport error

`response.raise_for_status()` inside the retry loop raises
`httpx.HTTPStatusError` on 4xx/5xx. After retries exhaust, the new
`isinstance(last_exc, httpx.HTTPError)` check at line 219 was eating
the status code: HTTPStatusError is a subclass of HTTPError, so the
generic transport translation produced "Unexpected error: HTTPStatusError"
instead of the informative "Client error '401 Unauthorized' for url …"
that callers expect. Fix: short-circuit HTTPStatusError before the
HTTPError branch — it re-raises verbatim so the caller's status-code
handling + the rich server error body (e.g. 401 expired token, 403
cross_project_forbidden) reach the analyst.

api_get / api_post / api_delete / api_patch don't have the same bug:
httpx Client.get/etc. don't raise HTTPStatusError unless the caller
explicitly calls .raise_for_status(), and our wrappers don't.
Only stream_download does, hence the targeted fix there.
2026-05-05 18:29:44 +02:00
ZdenekSrotyr
e5fb913cec perf: Tier 1 event-loop unblocking — async def → def on BQ-bound handlers
Five hottest BQ-touching endpoints were `async def` but invoked synchronous
DuckDB / BQ-extension calls inside the body. Under uvicorn's single event
loop that meant a single heavy `agnes query --remote` (waiting up to
~200 s for BQ's jobs.query) froze EVERY other request — /api/health,
dashboard, auth, even another query — for the full BQ wait. Operators
saw "VM idle, app frozen" during PR #188's testing.

Convert to plain `def` so FastAPI auto-offloads the body to the anyio
thread pool. Event loop stays free for non-BQ requests.

- app/api/query.py:execute_query
- app/api/v2_scan.py:scan_estimate_endpoint, scan_endpoint
- app/api/v2_sample.py:sample
- app/api/v2_schema.py:schema

Audit: 0 `await` statements in any converted handler (verified file-by-
file), so the rename is safe. Tests in tests/test_v2_*.py called the
handlers via `asyncio.run(...)` which now fails on a non-coroutine return;
swapped for direct calls (asyncio.run( -> ( ) — keeps paren balance).

Plus AGNES_THREADPOOL_SIZE env var (default 200, was anyio's stock 40)
in app/main.py:lifespan. Set via
anyio.to_thread.current_default_thread_limiter().total_tokens. 200 is
comfortable headroom for <50 concurrent analysts; bump for more.

480/480 impacted tests pass (the 2 remaining errors are a pre-existing
fixture setup issue in test_reader_smoke_matrix.py unrelated to this
change).
2026-05-05 17:44:08 +02:00
ZdenekSrotyr
30e81a15b9 feat(workspace-prompt): decision tree + size-hint so analyst Claude gets it right first try
Three concrete changes addressing the "analyst Claude misuses the CLI"
class of bugs (image.png table — issues #3, #5, plus the recurrent
"how big is this table" guesswork):

1. config/claude_md_template.txt — the template agnes init writes to
   <workspace>/CLAUDE.md. Surfaces every catalog-row field with a why,
   adds a query_mode-based decision tree, explicit --estimate scoping
   (snapshot create ONLY — was the #1 first-try error), an agnes fetch
   → agnes snapshot create rename note, and a 6-row failure-mode table
   that maps each common error wording to its right next step.

2. app/api/v2_catalog.py — populate rough_size_hint for local +
   materialized rows from the on-disk parquet size, bucketed
   small/medium/large/very_large. Was hardcoded null with a TODO; AI
   couldn't tell "is this 6.8 GB" without a failed --remote round-trip.

3. cli/update_check.py — the [update] banner survived the da→agnes
   rename and printed "[update] da X is out of date" on every command,
   training analysts to associate the binary with the old name.

Verified by rendering the template against representative contexts
(33/33 tests pass) and running every use case from the original
screenshot through the real CLI against a dev VM.
2026-05-05 16:44:24 +02:00
ZdenekSrotyr
1be997f6d4 feat(caddy): file_server for parquet downloads — bypass uvicorn
A single analyst's multi-GB `agnes pull` held the only uvicorn worker
for the duration of the stream, starving UI / /api/health / every other
API endpoint. Container flipped to `unhealthy`. Triggered while a
6.8 GB `order_economics` pull was in-flight on prod 2026-05-05.

Caddy now intercepts `GET /api/data/{table_id}/download` and serves
the parquet directly via sendfile from the data volume (mounted r-o
at /srv inside the caddy container). RBAC enforced by `forward_auth`
to a new lightweight `GET /api/data/{table_id}/check-access` endpoint
(returns 204 / 403) — the bulk transfer never reaches uvicorn.

Path discovery via `try_files` over the known extract.duckdb v2 source
subdirs. Anything not at a static path falls through to the existing
app handler so legacy `src_data/parquet` and future connectors still
work without a Caddyfile change. Non-Caddy deployments are unchanged.

Stage 1 (multi-worker uvicorn) was considered but blocked by the
single-writer DuckDB lock on system.duckdb — workers > 1 would crash
at startup on "Could not set lock on file", the same race that pushed
the scheduler from in-process writes to HTTP-via-app. Multi-reader
workers + single-writer coordination is out of scope for this PR.
2026-05-05 16:41:33 +02:00
ZdenekSrotyr
4f04235502 feat(bigquery): bq_query_timeout_ms knob; default 600s (was 90s)
DuckDB BigQuery extension defaults `bq_query_timeout_ms` to 90 s, which
is too tight for analyst-scale queries against view-backed BQ datasets.
`agnes query --remote` HTTP 400'd with `Binder Error: Query execution
exceeded the timeout. Job ID: ...` whenever the underlying BQ job ran
longer than 90 s, even though the job itself was healthy.

Add `data_source.bigquery.query_timeout_ms` (default 600 000 ms = 10 min,
sentinel 0 falls through to the extension default). Applied via
`SET bq_query_timeout_ms` after every `LOAD bigquery` on every BQ-touching
DuckDB session: orchestrator's `_remote_attach` ATTACH path, BqAccess
session factory, and the standalone extractor. Configurable via
`/admin/server-config` UI.

Fail-soft: extension versions that don't recognise the setting silently
keep the default rather than poisoning the session.
2026-05-05 16:40:40 +02:00
ZdenekSrotyr
8d8d2c219e refactor(cli-store): pull/info → agnes admin store; add agnes store mine
Backup-orchestration commands were split across two namespaces (pull in
agnes store, push in agnes admin store), which broke the operator
mental model — pull/push are a paired operation and should sit
together.

Move pull + info into agnes admin store so all bulk operations share
one help screen. Add agnes store mine as the user-facing equivalent —
calls the same /api/store/bundle.zip endpoint with ?owner=me, which
the server resolves to the caller's user_id. Authors can archive
their own uploads without admin role; whole-Store bulk reads stay
admin-flavored as a discoverability hint.

Server: 3-line addition to export_bundle handles owner='me' as a
magic alias for the caller. No new endpoint.

Tests updated: pull/info expectations move from agnes store to
agnes admin store; new tests cover agnes store mine and the
?owner=me server resolution. 69/69 store tests green locally.
2026-05-05 13:49:18 +02:00
ZdenekSrotyr
3d63965a67 Merge remote-tracking branch 'origin/main' into pr180-review
# Conflicts:
#	CHANGELOG.md
#	app/web/templates/_app_header.html
2026-05-05 12:05:50 +02:00
ZdenekSrotyr
a8f9d065c8 feat(store): bundle export/import + agnes store update + agnes admin store push
Adds whole-Store backup/restore primitives so an external CI/CD job can
mirror the Store to a git repo (and restore back from one).

REST:
- GET /api/store/bundle.zip — deterministic ZIP of all (filtered) Store
  entities. Layout: manifest.json + entities/<id>/{plugin,assets}/.
  Manifest carries owner_email for cross-instance restore. Auth: any
  authenticated user (Store is community-open).
- POST /api/store/import-bundle — admin-only restore. Modes
  merge|replace|skip; owner resolution by email with stub-disabled-user
  fallback when the email is unknown on the target instance.

CLI:
- agnes store update <id> [--description X] [--zip PATH] ... — in-place
  edit (server PUT permits owner OR admin per F4). Closes the missing
  edit affordance for analysts who want to fix a typo or push a new
  ZIP without losing install_count.
- agnes store pull [-o store.zip] [--unpack DIR] — download the bundle.
  --unpack streams + extracts so an external git-backup workflow can
  drop the tree straight into a repo and `git add .`.
- agnes store info [--json] — counts + size summary.
- agnes admin store push <zip-or-dir> [--mode ...] — admin-only restore.
  Auto-zips a directory client-side so a working-tree → server
  round-trip is one command.

cli/v2_client.py gains api_get_stream helper for binary downloads.

Tests: 5 new server tests (bundle shape + filters + round-trip + stub
user creation + skip mode + admin-only gate) + 11 new CLI tests
(update, pull/unpack, info, admin push). 66/66 store-related tests
green locally.
2026-05-05 11:51:31 +02:00
ZdenekSrotyr
952dc9e74d fix(profile-sessions): tolerate stat() failures on individual jsonl (Devin Review on #179)
The previous gather used `sorted(glob, key=lambda p: p.stat().st_mtime)`.
A transient OSError (race with delete, permission flicker, EBADF on a
weird filesystem) on any single file raised through the lambda and 500-ed
the whole page.

Reworked: stat each path under try/except into a (path, stat) list, sort
the already-statted entries. Bad files drop silently from the listing.

Regression test test_profile_sessions_page_tolerates_stat_failures
patches Path.stat to raise on one of two files, asserts the page returns
200 with the good row rendered and the bad row dropped.
2026-05-05 09:53:06 +02:00
ZdenekSrotyr
d878764ac1 fix(session-collector-api): mirror sibling endpoints' audit-on-exception (Devin Review on #179)
Devin flagged that run_session_collector still had the same audit-skip
gap I fixed in run_verification_detector and run_corporate_memory in
the previous two rounds — a PermissionError walking /home, an OSError
on /data/user_sessions mkdir, or any other unhandled exception from
collector.run() would skip the audit_log row and only show in docker
logs.

Same try/except + unhandled_error pattern as the sibling endpoints.
All three LLM-pipeline run-* endpoints now record their failures the
same way; /admin/scheduler-runs sees them. Regression test in
tests/test_admin_run_endpoints.py::TestRunSessionCollector::test_unhandled_exception_still_audits.
2026-05-05 09:31:33 +02:00
ZdenekSrotyr
9ebe991b55 feat(profile): per-session jsonl download from /profile/sessions
User feedback during e2e of #179: the listing page is nice but I want
to grab the raw jsonl and look at what's inside.

Adds GET /profile/sessions/<filename>:
- Auth via get_current_user (owner-only).
- Path safety: rejects "/", "\", "..", leading ".", and any non-".jsonl"
  filename. The served path resolves under
  ${DATA_DIR}/user_sessions/<caller.id>/; if resolution escapes that
  base directory, returns 404 (never 403, so existence of other users'
  files isn't leaked).
- FileResponse with Content-Disposition: attachment.

UI: Download button per row in profile_sessions.html.

Tests in test_web_ui.py: path-traversal / nested / dotfile / non-jsonl
all 404 for owner; unauthenticated 302/401/403; authenticated owner
gets 200 + correct Content-Disposition.
2026-05-05 09:15:12 +02:00
ZdenekSrotyr
e86da72997 fix(corporate-memory-api): mirror verification-detector audit-on-exception (Devin Review on #179)
Devin flagged that run_corporate_memory still had the same audit-skip
gap I just fixed in run_verification_detector — if collect_all() throws
anything other than the already-translated ValueError (DuckDB lock,
network blip, unexpected SDK error), the audit_log row was never
written and /admin/scheduler-runs missed the failure.

Same try/except + unhandled_error pattern as the verification_detector
fix from 4c4dfee8. Regression test in
tests/test_admin_run_endpoints.py::TestRunCorporateMemory::test_unhandled_exception_still_audits.
2026-05-05 09:11:13 +02:00
ZdenekSrotyr
4c4dfee8e6 feat(profile): /profile/sessions page + audit on detector exception + correct SCHEDULER_AUDIT_ACTIONS
Three changes addressing user feedback during e2e test of #179 + Devin Review on e86dd5ed.

1) /profile/sessions — new self-service user page in the user menu.
   Lists all session jsonls the caller uploaded via `agnes push` joined
   against session_extraction_state. Each row shows uploaded_at, file
   size, status badge (pending/processed/extracted), processed_at, and
   items_extracted. The page docstring + help text explicitly call out
   that items_extracted=0 means the verification detector ran fine but
   the LLM found no claims to track — that's the documented "no items"
   outcome, not a broken pipeline. Closes the gap surfaced during the
   e2e test of #176 where a user could see their sessions on disk and
   process them through the LLM but had no UI to inspect what happened.

2) run_verification_detector audits unhandled exceptions (Devin #1).
   If detector.run() threw anything other than the already-translated
   ValueError, the audit_log row was never written. The endpoint now
   wraps detector.run in try/except, records the exception in
   audit_params["unhandled_error"], then re-raises as 500 after audit.
   The /admin/scheduler-runs page surfaces the failure row with the
   error type + message.

3) SCHEDULER_AUDIT_ACTIONS list corrected (Devin #2). Previous list
   had "marketplaces_sync_all" (wrong — actual is "marketplace.sync_all")
   plus "data_refresh" and "scripts_run_due" which app/api/sync.py and
   app/api/scripts.py don't write to audit_log. Fixed to the four
   actually-logged strings; comment points at the missing audit calls
   as a follow-up.

Tests: tests/test_web_ui.py adds TestAdminRoleGuards::test_profile_sessions_page_no_admin_required and tightens test_admin_scheduler_runs_page_admin_only to assert the correct marketplace.sync_all string.
2026-05-05 08:57:35 +02:00
ZdenekSrotyr
f0d091f721 fix(store): scratch dir leak on ZIP validation failure (Devin Review)
create_entity + update_entity created the `scratch` temp dir inside one
try/finally but cleaned it up in a separate one. Validation HTTPExceptions
raised by _safe_zip_extract (zip_unsafe_path, zip_too_large_uncompressed)
or the BadZipFile→422 conversion exited the first scope, and the second
finally was never entered → temp dir leaked on every failed upload.

Devin flagged this on the F2 commit. The leak pre-existed (zip_unsafe_path
was the original vector); F2 added zip_too_large_uncompressed to the same
broken cleanup path. Fixed by collapsing scratch creation + cleanup into
one outer try/finally that covers both extraction AND metadata/bake; the
inner try/except/finally still handles BadZipFile→422 + tmp file cleanup.

Same restructure in update_entity. Regression test
`test_scratch_dir_cleaned_up_after_failed_extraction` triggers a
zip_unsafe_path 422 and asserts tmp/agnes_store_* contains no leaked
dirs.
2026-05-05 08:52:15 +02:00
ZdenekSrotyr
fd3c76d21b fix(store): security + correctness blockers found in PR review (F1, F2, F4, F5)
Three independent reviews of PR #180 surfaced four real defects in the new
Store / my-ai-stack surface. CHANGELOG entries detail each; one-liners:

- F1 video_url XSS: any authenticated user could upload a Store entity
  with `video_url=javascript:...` and pop XSS in any viewer's session via
  the `<a href=...>` "Watch video" link in store_detail.html. Jinja2
  autoescape doesn't block URI schemes inside attribute values. Fixed by
  scheme-validating to http(s) only on create + update; 400 invalid_video_url.

- F2 ZIP decompression bomb: _safe_zip_extract checked path-traversal but
  not declared file_size totals — a 50 MB compressed upload at 1:1000
  ratio decompresses to 50 GB and DOS the host disk. Fixed by summing
  zinfo.file_size across infolist() and refusing > 200 MB before
  extractall touches disk. 413 zip_too_large_uncompressed.

- F4 admin authz parity: PUT /api/store/entities/{id} was owner-only while
  DELETE allowed owner OR admin; the store-detail page hid Edit/Delete
  buttons from admin even though DELETE was permitted. Fixed by allowing
  admin on PUT and passing is_admin to the template; gate is now
  is_owner OR is_admin everywhere.

- F5 cross-owner suffix collision: sanitize_username is many-to-one
  (alice.smith / alice_smith both → alice-smith). Two such users uploading
  entities with the same display name produced identical
  `<name>-by-<username>` suffixes, silently colliding in the served
  agnes-store-bundle on-disk paths AND the manifest catalog (Claude Code
  dedupes by plugin.json `name`). Fixed by enforcing global uniqueness on
  the suffixed value at create_entity; 409 conflict_global_suffix.

F3 (ZIP symlink members) was investigated and confirmed to be a
false-positive — Python's stdlib ZipFile.extractall does not honor
symlink mode bits, so no exploit exists.

9 new regression tests in tests/test_store_api.py::TestStoreSecurityFixes
covering all four. Test run locally: 60/60 store-related tests pass.
2026-05-05 08:18:02 +02:00
ZdenekSrotyr
e86dd5edc5 fix(anthropic): strict json_schema (additionalProperties=false) + add /admin/scheduler-runs UI
E2E test on a real BQ deploy showed every verification-extraction call
fails with HTTP 400 invalid_request_error: "output_config.format.schema:
For 'object' type, 'additionalProperties' must be explicitly set to false".
The Anthropic structured-output API now requires the field on every object
node in the json_schema. Fix: connectors/llm/anthropic_provider.py wraps
the caller-supplied schema through a recursive _strict_json_schema()
walker that adds the field where missing (preserving any explicit
override), then passes the strict variant to the API. Six unit tests in
TestStrictJsonSchema pin the recursion across nested objects, array items,
and the no-mutation invariant.

Adds /admin/scheduler-runs — a read-only admin page that surfaces the
last 200 audit-log entries from scheduler-driven actions. New
AuditRepository.query_actions(actions, limit) helper, new admin nav
entry. Failed scheduler ticks (HTTP 401, network errors) don't reach
the audit_log; the page calls that out with a hint to set
SCHEDULER_API_TOKEN if no rows show up.
2026-05-05 08:00:57 +02:00
ZdenekSrotyr
e68c2d3f0f fix(session-collector): argv-free run() helper, drop SystemExit footgun (Devin Review on #179)
run_session_collector called collector.main() which did argparse.parse_args()
on uvicorn's sys.argv (['app.main:app', '--host', ...]) → sys.exit(2) →
SystemExit(2), which inherits from BaseException, escapes FastAPI handlers,
and propagates through the thread pool. Every scheduler tick that fired the
endpoint either 500-ed or risked killing the uvicorn worker.

services/session_collector/collector.py now exposes run(dry_run, verbose)
that returns (rc, stats); main() is a thin CLI shim that parses argv and
delegates. The admin endpoint calls run() directly and audit-logs the
per-run stats (users_processed, files_copied, files_skipped) instead of
just the rc. Three regression tests in TestRunHelper.

Closes Devin Review finding on app/api/admin.py:2819 (#179).
2026-05-05 06:31:55 +02:00
ZdenekSrotyr
9f33e24bf9 fix(config): overlay-aware LLM consumers + env-ref resolution (#179 review)
Devin BUG: /api/admin/configure seeds an ai: block to the writable
overlay at DATA_DIR/state/instance.yaml, but the three LLM consumers
imported from config.loader.load_instance_config — which reads the
static config dir only. Even if they had read the overlay, the loader
ran yaml.safe_load directly without passing through _resolve_env_refs,
so '${ANTHROPIC_API_KEY}' would have stayed a literal placeholder. The
pipeline appeared to work because the factory falls back to the env
var directly, but the overlay path itself was dead code.

Two fixes, both required:

1. Switched the three LLM consumers to app.instance_config.load_instance_config:
   - services/corporate_memory/collector.py:collect_all
   - services/verification_detector/__main__.py:main
   - app/api/admin.py:run_verification_detector

2. app/instance_config.py runs the loaded overlay through
   config.loader._resolve_env_refs *before* the deep-merge, so
   '${ANTHROPIC_API_KEY}' resolves at config-load time.

New regression suite tests/test_instance_config_overlay.py pins:
- env-ref resolution against the overlay (resolved when env set,
  empty when env missing — never the literal placeholder)
- deep-merge still preserves static-only sections
- the three consumers reach app.instance_config (inspected via
  inspect.getsource so a future refactor that reverts the import
  fails the test)
- end-to-end: a seeded overlay + ANTHROPIC_API_KEY env reaches the
  factory with a resolved api_key
2026-05-05 05:57:22 +02:00
ZdenekSrotyr
98a8aba3be fix(tests): align test_llm_connector with new factory + fail-fast (#179 review)
The PR rewrote collect_all() to call the new
create_extractor_from_env_or_config() helper, but the existing tests
still mocked the old direct create_extractor() symbol and the old
silent-skip-on-missing-config behavior. Five tests in
TestCorporateMemoryCollector and one in TestCollectorExtractorIntegration
were red on the PR branch.

Changes:
- Tests now mock connectors.llm.create_extractor_from_env_or_config
  (the symbol the collector imports lazily).
- Renamed test_collect_all_no_ai_config_skips ->
  test_collect_all_no_ai_config_or_env_raises and
  test_collector_handles_invalid_config -> test_collector_raises_on_invalid_config.
  Both assert pytest.raises(ValueError) — the explicit fail-fast
  semantics defect 5 of #176 was supposed to enforce.
- collect_all() no longer swallows the factory's ValueError into
  stats["errors"]; it propagates so the scheduler / admin endpoint
  surface the actionable misconfiguration message instead of
  pretending the run was a no-op.
- /api/admin/run-corporate-memory translates the propagated ValueError
  into a 500 with the factory's message, matching
  /api/admin/run-verification-detector.
2026-05-05 05:55:01 +02:00
Minas Arustamyan
af72c5d259 fix(setup): walk TLS chain for trust-store match — Let's Encrypt cleanup
`_read_agnes_ca_pem()` decides whether the served fullchain.pem needs
trust-bootstrapping in the rendered setup prompt. Pre-fix it only
checked the leaf's *immediate* issuer against `certifi`'s trust store.
For Let's Encrypt that's the intermediate (R13), which `certifi` does
not ship — only roots are in trust stores. So a publicly-trusted LE
chain still tripped the "needs bootstrap" path and the setup prompt
emitted a step-0 TLS trust block + clone-fallback marketplace block
that no client actually needs (Bun-compiled `claude.exe`, system git,
Python via certifi all validate the chain through the bundled ISRG
Root X1).

Now we walk every cert in the fullchain (leaf + intermediates) and
return None the first time any cert's issuer is in the certifi trust
store — that captures the standard "leaf signed by intermediate signed
by publicly-trusted root" shape. Trusted subjects are read once into
a set for O(1) lookup. Self-signed (leaf.issuer == leaf.subject) and
private-CA chains (no chain link's issuer in certifi) keep their
previous "return PEM" behavior, so deployments that genuinely need
the bootstrap still get it.

Validated end-to-end against the live VM at
agnes-marustamyan.groupondev.com (LE R13 → ISRG Root X1):
  - Let's Encrypt fullchain                   → has_ca=False (was True)
  - Self-signed cert                          → has_ca=True
  - Corporate-CA chain (private root)         → has_ca=True
  - Missing fullchain.pem                     → has_ca=False
2026-05-05 04:55:06 +02:00
Minas Arustamyan
d5a7c9ad79 feat(store): /store + /my-ai-stack — community marketplace + per-user composition
Adds a community-driven Store where any authenticated user uploads
skills/agents/plugins as ZIPs, plus /my-ai-stack as the per-user
composition view. The served Claude Code marketplace is now:

    (admin_granted ∖ opt_outs) ∪ store_installs

Skill + agent installs are merged into a single `agnes-store-bundle`
plugin in the served marketplace; type=plugin uploads stay standalone.
Names are suffixed with `-by-<owner-username>` at upload time so two
owners can use the same display name without colliding in Claude Code's
flat skill/agent namespace.

Schema v23 → v24 adds three tables:
  - store_entities       — community-uploaded skills/agents/plugins
  - user_store_installs  — what each user has chosen to install
  - user_plugin_optouts  — opt-out overlay on top of admin grants

Admin grant-delete drops every user's opt-out for that plugin so
re-grant resets cleanly to enabled (no sticky personal preference).

UI:
  - /store      — e-commerce-style listing with type/category/owner
                  filters, search, pagination, owner-aware [Install]
                  buttons, clickable cards
  - /store/new  — 2-step upload wizard with drag & drop, preview
                  validation (POST /api/store/entities/preview), docs
                  multi-upload, photo + video URL
  - /store/{id} — detail page with hero, file list, docs, owner
                  actions (Edit/Delete) for the uploader
  - /my-ai-stack — Granted plugins (toggle opt-out) + From the Store
                  (uninstall) sections
  - Admin nav: Marketplaces moved into Admin dropdown, renamed to
                "Curated Marketplaces"

Validation hardening: type-mismatch guards reject skill ZIP uploaded as
agent (or vice versa), and plugin ZIPs masquerading as skills/agents.
Human-readable error messages mapped client-side from machine codes.

Cross-source naming: Store entity-id-prefixed dirs (`plugins/store-<id>/`)
plus the bundle (`plugins/store-bundle/`) avoid collisions with admin
marketplaces (whose `store` slug is reserved by `is_valid_slug`).

Bundle composition is content-hashed at serve time — install/uninstall
or owner re-upload bumps the bundle's plugin.json `version`, so Claude
Code's auto-update toggle picks up changes.

Tests: 50+ new tests across naming, repositories, filter (admin ∪ store
∪ bundle), API (upload/install/uninstall/delete/preview/docs), end-to-end
marketplace.zip with bundle merging.
2026-05-05 02:53:49 +02:00
ZdenekSrotyr
a621a415cc fix(health): session-pipeline staleness check (#176)
GET /api/health/detailed now returns a session_pipeline service entry.
Heuristic:
  max(mtime of /data/user_sessions/**/*.jsonl) <=
  max(processed_at in session_extraction_state) + grace_seconds

grace_seconds = 2 × verification-detector cadence (default 30 min;
configurable via SCHEDULER_VERIFICATION_DETECTOR_INTERVAL).

When the assert fails, status='warning' (never 'error') with an
actionable detail pointing at the verification-detector scheduler job.
A warning bubbles up to the existing overall='degraded' aggregation —
operators querying /api/health/detailed (or /agnes diagnose system)
get a clear breadcrumb instead of a silently-broken pipeline.

Cold-start case (no session files, or files newer than the grace
window with empty state table) is handled explicitly to avoid noise
on a fresh deploy.

Tests: tests/test_health_session_pipeline.py.
2026-05-05 00:04:28 +02:00
ZdenekSrotyr
c53c1e1572 fix(ui): admin pending-review banner on /corporate-memory (#176)
The /corporate-memory page filters status IN ('approved','mandatory')
and showed no hint that pending items exist. With approval_mode set to
'review_queue' (the default in instance.yaml.example), every collection
run would silently funnel new items into the pending bucket where no
operator ever saw them.

For admins (is_km_admin), the page now renders a banner above the
stats bar:
  N pending items awaiting review — review them at /corporate-memory/admin

Non-admins see no change (the route zeroes the count server-side
before passing to the template, so the hint is never leaked).

Tests: tests/test_corporate_memory_page.py.
2026-05-05 00:01:22 +02:00
ZdenekSrotyr
45de71e8ab fix(scheduler): wire LLM pipeline into scheduler-v2 (#176)
The session-collector, verification-detector, and corporate-memory
services now run on the same scheduler-v2 model that already drives
data-refresh, health-check, script-runner, and marketplaces:

- New admin endpoints in app/api/admin.py:
    POST /api/admin/run-session-collector
    POST /api/admin/run-verification-detector
    POST /api/admin/run-corporate-memory
  All admin-gated, sync-def (FastAPI thread pool), with one audit row
  per invocation. Same single-writer-of-system.duckdb pattern as the
  existing /api/marketplaces/sync-all job.

- services/scheduler/__main__.py JOBS gains three entries with offset
  cadences (10m / 15m / 17m, all coprime modulo the 30s tick) so the
  three LLM-backed jobs don't fire on the same tick and stack their
  API + DB load.

- The verification-detector endpoint surfaces the LLM factory's
  fail-fast ValueError as HTTP 500 with the actionable message,
  preserving the no-silent-skip contract from the previous commit.

Tests:
- tests/test_admin_run_endpoints.py covers admin gating + scheduler
  registration + endpoint contract.
- tests/test_scheduler_sidecar.py existing tests continue to pass.
2026-05-04 23:57:43 +02:00
ZdenekSrotyr
bbb04ac041 fix(setup): seed default ai: block + env-var fallback (#176)
POST /api/admin/configure now writes a default ai: block into the
instance.yaml overlay when the request leaves it untouched and either
ANTHROPIC_API_KEY or LLM_API_KEY is set in the environment. The block
references the env var via ${VAR} syntax — secrets never land in YAML.

connectors.llm.factory grows create_extractor_from_env_or_config which
falls back to ANTHROPIC_API_KEY / LLM_API_KEY when ai_config is empty
and raises a clear ValueError when neither is available. Both
services/corporate_memory and services/verification_detector switch to
the new helper, replacing the old 'silently skip when ai: missing'
path that was the silent-failure root cause.

Tests:
- tests/test_setup_ai_block.py — overlay seeding contract.
- tests/test_llm_provider_env_fallback.py — fallback + fail-fast.
2026-05-04 23:55:19 +02:00
ZdenekSrotyr
5915f92eaa fix(query-guardrail): single-pass alternation regex (Devin Review on query.py:464)
The iterative bare-name rewriter (one re.sub per name, longest-first)
was vulnerable to cross-contamination when the GCP project ID contained
a registered table name as a hyphen-delimited word.

Concrete repro:
  project        = 'my-ue-project'
  registered     = ['orders', 'ue']
  user SQL       = 'SELECT * FROM orders JOIN ue ON ...'
  iter 1 (orders): produces 'FROM `my-ue-project.fin.orders` JOIN ue ...'
  iter 2 (ue):     '\bue\b' matches 'ue' INSIDE 'my-ue-project' (hyphen
                   creates word boundary on both sides) — corrupts
                   the iter-1 path

Fallback at query.py:576 caught the resulting BQ parse error and fell
back to per-table SELECT * estimate, so impact was over-estimation,
not fail-open — but the #171 partition-pruning fix silently degraded
to pre-fix behavior whenever a project name shared a hyphen-segment
with a registered table.

Fix: single re.sub call with an alternation regex sorted longest-first.
Single-pass means each source position is processed exactly once, so
freshly-inserted backticked text from one match isn't re-scanned by
later names in the alternation.

Regression test
test_rewrite_helper_does_not_corrupt_when_project_id_contains_registered_name
covers the exact Devin repro.
2026-05-04 22:51:33 +02:00
ZdenekSrotyr
424ec9b0f4 refactor(install.html): single tile, single PAT-mint body shape
Drops the `<nav class="role-tiles">` block (Analyst / Admin tiles),
the `_show_admin_tile` flag, the `const ROLE = {{ role | tojson }};`
JS line, and the role-aware PAT-mint ternary. The setupNewClaude
button now mints a uniform PAT for everyone:

  { name: defaultTokenName(), expires_in_days: 90 }

…against the existing `POST /auth/tokens` endpoint. No new endpoint,
no role-locked TTL clamp. The `bootstrap-analyst` 1-hour scope is no
longer used from /setup (it broke the install flow anyway — saved PATs
expired before the user opened Claude Code; tracked as a separate
cleanup issue).

Also removes the now-unused `.role-tiles` / `.role-tile` CSS rules so
the stylesheet doesn't carry dead selectors.

Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 6.
2026-05-04 22:18:00 +02:00
ZdenekSrotyr
2ee529533f refactor(setup-page): drop role query param
The `/setup` route no longer accepts `?role=analyst|admin`. The route
signature drops the `Literal[...] = Query(...)` parameter and the
silent admin-downgrade block (`if role == "admin" and not is_admin:
role = "analyst"`). The `role` ctx variable threaded into install.html
also goes away — Task 6 cleans up the template's role-tile UI and the
JS PAT-mint ternary.

`?role=` is silently ignored by FastAPI for unknown query params, so
existing bookmarks (none in production — the param was added in this
PR and never shipped) just degrade to the unified layout. No
RedirectResponse shim needed.

Tests: drop the entire `tests/test_setup_page_roles.py` file (eight
role-branching tests that no longer apply) and add
`tests/test_setup_page_unified.py` with three tests:

  - `test_setup_page_renders_unified_layout`
  - `test_setup_page_ignores_role_query_param`
  - `test_setup_page_renders_marketplace_for_user_with_grants`
  - `test_install_legacy_path_redirects_to_setup`

Also replace the role-aware `test_install_preview_*` tests in
test_web_ui.py with unified-layout assertions.

Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 5.
2026-05-04 22:16:59 +02:00
ZdenekSrotyr
291079b1d2 refactor(welcome-template): drop role param; resolve plugins per-user unconditionally
Removes the `role: Literal["analyst", "admin"] = "admin"` parameter from
`compute_default_agent_prompt`. The same RBAC pass
(`marketplace_filter.resolve_allowed_plugins`) now runs for every user —
admin or not. Users with no `resource_grants` rows get the
no-marketplace layout; users with grants get the marketplace block
inserted. Admin-vs-analyst is no longer a layout branch.

`render_agent_prompt_banner` no longer derives a `role` from
`user.is_admin`; it just delegates to `compute_default_agent_prompt`.
Two `compute_default_agent_prompt(...role=role)` call sites in
`app/web/router.py::setup_page` are updated to drop the keyword so the
route keeps rendering — Task 5 will remove the `?role=` query
parameter and the silent admin-downgrade block from the route signature
itself.

Tests: drop role-aware assertions from test_welcome_template_renderer
and test_welcome_template_api. Both files now assert the unified
default contains `agnes init` + `uv tool install` and bans the legacy
`agnes auth import-token` / `agnes auth whoami` verbs.

Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 4.
2026-05-04 22:13:46 +02:00
ZdenekSrotyr
74b7f6e254 feat(setup-instructions): preflight checks both git and claude
Renames `_git_check_block` to `_preflight_block` and adds a
`claude --version` check beside `git --version`. Both binaries are
required by the marketplace step — git for the clone fallback,
claude for `claude plugin marketplace add` / `claude plugin install` —
so checking them together gives one clear failure instead of two
confusing downstream errors.

Install hints: `npm i -g @anthropic-ai/claude-code` for Linux / WSL
plus a doc URL (https://docs.claude.com/claude-code) for the native
macOS / Windows installers. We don't try to one-line a native
installer; the canonical instructions live upstream.

Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 3.
2026-05-04 22:11:38 +02:00
ZdenekSrotyr
e16698c3cc refactor(setup-instructions): unified layout with mandatory agnes init
Adds `_step_numbers(*, has_marketplace, has_skills)` so step numbering
lives in one place instead of being split across three branches in
`resolve_lines`. Pins the unified layout in the tests:

  No plugins:     1 install, 2 init, 3 catalog, 4 diagnose, 5 skills, 6 confirm
  With plugins:   1, 2, 3, 4 preflight, 5 marketplace, 6 diagnose, 7 skills, 8 confirm

`agnes auth import-token` / `agnes auth whoami` are now banned from the
rendered prompt — `agnes init` subsumes them. The renamed
`test_resolve_lines_no_plugins_unified_six_step_layout` asserts those
strings are absent and that the new step headers (`Bootstrap your Agnes
workspace`, `Verify the data is queryable`) are present.

Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 2.
2026-05-04 22:10:05 +02:00
ZdenekSrotyr
9334beed15 refactor(setup-instructions): drop role param; collapse analyst/admin into one layout
Removes the `role: Literal["analyst", "admin"]` parameter from
`resolve_lines` / `render_setup_instructions` and deletes the
`_resolve_analyst_lines`, `_analyst_init_lines`, `_analyst_finale_lines`
helpers. The unified flow now always emits `agnes init` (the
workspace-rails delivery mechanism) in place of the legacy
`agnes auth import-token` + `agnes auth whoami` pair, and uses
`agnes catalog` as the smoke-verify step.

`agnes init` already verifies the PAT internally, and `agnes catalog`
doubles as a data-plane smoke check, so dropping `agnes auth whoami`
costs no signal.

Drops the now-redundant `tests/test_setup_instructions_analyst.py` and
patches the one ordering test in `tests/test_setup_instructions.py` that
referenced the old "Log in" / "Verify the login" headers. Also strips
the `role=role` kwarg from `compute_default_agent_prompt`'s call into
`resolve_lines` so the welcome-template render path keeps working;
welcome_template.py's own role param is removed in a follow-up task.

Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 1.
2026-05-04 22:08:48 +02:00
ZdenekSrotyr
103efb69f0 chore(cli-rename): replace stale da verbs in active code paths
Bring admin UI, audit-log messages, code comments, and analyst-facing
skill docs in line with the post-bootstrap CLI surface (`agnes pull`,
`agnes push`, `agnes init`, `agnes snapshot create`). The legacy
`_LEGACY_STRINGS` detection tuple in `app/api/claude_md.py` and the hook
upgrade markers in `cli/lib/hooks.py` are intentionally left as-is —
they exist precisely to flag pre-rewrite content for re-authoring.

Strip "(folded from `da metrics list`)" / "(lifted from `da metrics
show`)" / "Replaces the old `da analyst status`" docstring noise — the
rename history is in CHANGELOG.md, not in module docstrings.
2026-05-04 21:10:43 +02:00
ZdenekSrotyr
500db8cd3c fix(query-guardrail): dry-run user SQL not synthetic SELECT * (#171)
Closes #171. The /api/query cost guardrail used to dry-run a synthetic
`SELECT * FROM <table>` for each registered remote-BQ row referenced
by the user SQL — which made BigQuery estimate a full table scan, with
column projection, predicate pushdown, and partition pruning all
disabled. Narrow queries on big partitioned/clustered tables (the
documented happy path for `agnes query --remote`) hit ~30,000×
over-estimates and got rejected with 400 `remote_scan_too_large` even
when BQ's own dry-run reported single-digit MB.

Pavel's report on #171 traced the root cause and proposed the fix:
rewrite the user SQL to BQ-native syntax and dry-run it as a single
job, exactly the way `bq query --dry_run` works.

Implementation:
- New helper _rewrite_user_sql_for_bq_dry_run rewrites bare registered
  names (word-boundary, case-insensitive, longest-first to avoid prefix
  collisions) + bq."<ds>"."<tbl>" forms to backticked
  `<project>.<ds>.<tbl>` paths.
- _bq_quota_and_cap_guard runs ONE dry-run on the rewritten SQL. Cap
  check uses the real estimate.
- Fallback path: if BQ rejects with bq_bad_request (e.g. DuckDB-only
  syntax like ::INT casts), the guard falls back to the pre-fix
  per-table SELECT * approach so non-portable queries still get a
  (loose) cap estimate instead of fail-opening. Non-parse BQ errors
  (forbidden, upstream) still propagate as 502.
- _bq_guardrail_inputs now also returns name_lookups so the rewriter
  has the (registered_name, bucket, source_table) mapping it needs.
- Per-table breakdown is unavailable from a composite dry-run; total
  bytes are pinned to dry_run_set[0] for the post-flight
  record_bytes(sum(...)) call to keep returning the right total.

Tests (7 new, 3 existing still pass):
- dry-run receives rewritten user SQL with WHERE clause intact (the
  load-bearing assertion for #171)
- single dry-run per request even with multiple registered tables
  (JOIN, UNION) referenced
- fallback to per-table SELECT * on bq_bad_request
- non-parse BQ errors (forbidden) still 502
- rewriter unit tests: bare + bq.path in same SQL, longest-name-wins
  on prefix collision, case-insensitive bare-name match
2026-05-04 21:08:21 +02:00
ZdenekSrotyr
e438170ade merge: pull #174 (BQ materialize view fix + concurrency, 0.33.0) into bootstrap branch
Brings in zs/materialize-sync-fix (PR #174):
- BigQuery view materialize works (wrap admin SQL in bigquery_query())
- Per-table mutex + fcntl.flock for concurrent COPY corruption
- Cost guardrail dry-run engages on materialized rows
- Schema v23 -> v24 migration: rewrite source_query to BQ-native
- Server-generated trivial source_query from bucket+source_table
- Validator backtick relaxation for materialized rows
- 0.33.0 release cut

Conflict resolution:
- CHANGELOG.md: keep our [Unreleased] (bootstrap rewrite content) ABOVE
  the new [0.33.0] section from #174. The bootstrap rewrite remains
  unreleased; it'll cut 0.34.0 (or later) when this PR merges to main.
- tests/conftest.py: union — keep our analyst-bootstrap fixture
  re-export AND #174's bq_instance / stub_bq_extractor fixtures.
- pyproject.toml auto-merged to 0.33.0 (matches the cut), correct.
- src/db.py auto-merged: SCHEMA_VERSION = 24, _v23_to_v24_finalize
  added — no overlap with our work which left schema at v23.
- CLAUDE.md auto-merged: schema-history paragraph extended with v24.

Verified: 79/79 across CLI bootstrap suite + materialize suite +
schema v24 migration tests pass locally on Python 3.13/macOS.
2026-05-04 20:53:00 +02:00
ZdenekSrotyr
92d477e422 fix(setup): default /setup to analyst, hide admin tile from non-admins
Three coupled UX fixes for the analyst-onboarding flow:

1. Dashboard "Setup a new Claude Code" CTA was rendering admin paste
   prompt for everyone (analysts couldn't actually execute the marketplace
   plugin install / skills setup steps). render_agent_prompt_banner now
   picks role based on user.is_admin — analysts get the analyst flow.

2. /setup default role changed from admin to analyst. Most visitors are
   analysts; admin layout is opt-in via the admin tile or ?role=admin.

3. Admin tile is admin-only on the role-tile nav. Non-admins see only
   the analyst tile. Server-side: non-admin requesting ?role=admin is
   silently downgraded to analyst (otherwise they'd see admin paste
   prompt despite no tile).

Tests:
- New: test_setup_page_admin_tile_hidden_for_non_admin (anonymous client
  can't see "Admin CLI" or role=admin link)
- New: test_setup_page_admin_role_downgraded_for_non_admin (anonymous
  ?role=admin → analyst layout, no marketplace step in clipboard)
- New: test_install_preview_default_role_is_analyst (admin signing in to
  bare /setup gets analyst clipboard by default)
- Renamed: test_setup_page_default_role_is_admin → ..._is_analyst
- Updated: test_setup_page_admin_clipboard_renders_admin_layout uses
  FastAPI dependency_overrides to inject admin user (admin layout is
  now admin-gated)
- Updated: test_install_preview_visible_for_signed_in_user explicitly
  passes ?role=admin to exercise admin layout
2026-05-04 20:20:37 +02:00
ZdenekSrotyr
3d58768143 fix: address Devin Review findings — incomplete renames + estimate guard
13 Devin findings across 10 files:

🔴 Critical:
- app/api/v2_catalog.py:42 — `_fetch_hint` returns `da fetch` in /api/v2/catalog
  responses (user-visible in every catalog list)
- cli/skills/agnes-data-querying.md — 11 stale `da fetch`/`da sync` refs in the
  bundled skill markdown
- config/claude_md_template.txt:38 — referenced `agnes pull --docs-only` flag
  that does NOT exist in agnes pull (removed; spec only ships --quiet/--json/
  --dry-run)

🟡 Important:
- app/api/admin.py:252 — `da fetch` in bq_max_scan_bytes hint
- cli/commands/auth.py:119 — `da sync` in import-token docstring (--help text)
- cli/commands/tokens.py:48 — "Export it so `da` can use it" prose
- ARCHITECTURE.md — 4 stale rows in CLI commands table
- README.md — stale paragraphs for analysts (da sync, da analyst setup)

🚩 Substantive observations addressed:
- app/api/query.py:249,302,489 — server-side error/help strings still said
  `da sync`/`da fetch` (returned in API responses to clients)
- cli/commands/snapshot.py:235-241 — DuckDB existence guard incorrectly
  blocked `--estimate` (server-side dry-run that never opens local DB).
  Added test ensuring estimate path skips the guard.

Skipped (intentionally historical):
- app/api/admin.py:2377,2429,2437 — historical comments describing past
  manifest-vs-sync_state bug; past tense, accurate to keep as `da sync`.
2026-05-04 20:05:06 +02:00
ZdenekSrotyr
5bffec641f chore(lint): final ruff fixes 2026-05-04 19:32:52 +02:00
ZdenekSrotyr
6c0846fd17 feat(config): expose materialize.lock_ttl_seconds in server-config
New top-level 'materialize' section, single field (lock_ttl_seconds).
Default 86400 (24h). Backs the file-lock TTL reclaim added in the
per-table-mutex change. Editable via PUT /api/admin/server-config and
the /admin/server-config UI.
2026-05-04 18:52:54 +02:00
ZdenekSrotyr
3871d5320a feat(admin): server-generate materialized source_query, allow BQ backticks
When admin registers a materialized BQ row with bucket+source_table but
no source_query, the server generates 'SELECT * FROM `<project>.<ds>.<tbl>`'
from instance.yaml's configured BQ project. Same fallback fires on PUT
when flipping to materialized. The backtick rejection guard, which was
appropriate for DuckDB-flavor source_query, is relaxed for materialized
rows since the new wrapping path (Task 2) runs admin SQL through BQ
jobs API which uses BQ-native syntax (backticks for dashed identifiers).
2026-05-04 18:37:27 +02:00
ZdenekSrotyr
c7c42de0f0 feat(sync): treat MaterializeInFlightError as 'skipped, in_flight'
_run_materialized_pass distinguishes due-check skips from in-flight
skips and never calls state.set_error for either. summary['skipped']
becomes a list of {table, reason} dicts; the end-of-pass log line
breaks out the in_flight subcount.

Hoists is_table_due to module-level import so test monkeypatching of
the symbol intercepts the call (the previous local import made
patches a no-op).
2026-05-04 18:11:38 +02:00