agnes-the-ai-analyst


ZdenekSrotyr 896c43c7a2 feat(query): #160 cost guardrail + bq.* RBAC + quota integration on /api/query

The headline implementation for issue #160. POST /api/query now gates direct `bq."<dataset>"."<source_table>"` references behind the registry and bounds the BQ scan cost behind a configurable cap. Wired through the same singleton QuotaTracker as /api/v2/scan so daily-byte budgets are shared across both BQ-touching paths.

Changes in app/api/query.py:
- Add module-level `BQ_PATH` regex matching the 16 syntax variants verified empirically (fully-quoted, unquoted, mixed quoting, case-insensitive, inside CTE bodies, multi-path, …).
- Add `bigquery_query` to the SQL keyword blocklist. Closes the pre-existing function-call backdoor where a user could run an arbitrary BQ jobs API call against any reachable dataset, bypassing the registry and RBAC. Wrap views internal to the BQ extractor still use bigquery_query(), but those run via DuckDB view resolution at query time, not via user-submitted SQL, so the blocklist doesn't break them.
- Add `_bq_guardrail_inputs` helper: walks user SQL twice, once for bare-name matches against accessible registered remote-BQ names (contributes to dry_run_set), once for direct `bq.X.Y` matches (gated against `find_by_bq_path` lookups, returns 403 with structured detail on miss or grant violation).
- Add `_enforce_remote_bq_quota_and_cap` helper: pre-flight `check_daily_budget` (over-cap → 429), then `with quota.acquire(...)` wraps a per-path BQ dry-run, sums bytes, raises 400 `remote_scan_too_large` when total > cap.
- Cap default 5 GiB; configurable via `api.query.bq_max_scan_bytes` in /admin/server-config (next phase wires the UI).
- Post-flight `record_bytes` against the user's daily counter.
- Module-level imports of `_bq_dry_run_bytes`, `_build_quota_tracker`, `get_bq_access` so tests can monkeypatch via `app.api.query.<name>`.
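The direct-path detection described above can be illustrated with a minimal sketch. The real `BQ_PATH` regex is not shown in this commit message, so the pattern below is a hypothetical approximation (`BQ_PATH_SKETCH` and `find_bq_paths` are illustrative names, not the project's). It covers only the variants the message names: quoted, unquoted, mixed quoting, case-insensitive `bq`, CTE bodies, and several paths in one statement.

```python
import re
from typing import List, Tuple

# Hypothetical approximation; the actual BQ_PATH in app/api/query.py may differ.
# An identifier is either double-quoted ("my-ds") or a bare word (my_ds).
_IDENT = r'(?:"([^"]+)"|([A-Za-z_][A-Za-z0-9_]*))'

# bq . <dataset> . <table>, with optional whitespace around the dots.
BQ_PATH_SKETCH = re.compile(
    r'\bbq\s*\.\s*' + _IDENT + r'\s*\.\s*' + _IDENT,
    re.IGNORECASE,
)


def find_bq_paths(sql: str) -> List[Tuple[str, str]]:
    """Return (dataset, table) for every direct bq.X.Y reference in sql."""
    paths = []
    for m in BQ_PATH_SKETCH.finditer(sql):
        dataset = m.group(1) or m.group(2)  # quoted form, else bare form
        table = m.group(3) or m.group(4)
        paths.append((dataset, table))
    return paths


if __name__ == "__main__":
    sql = (
        'WITH a AS (SELECT * FROM bq.d1.t1) '
        'SELECT * FROM a JOIN BQ."d2"."t2" USING (id)'
    )
    print(find_bq_paths(sql))
```

A plain regex like this also matches inside string literals and SQL comments, and it does not handle a quoted `"bq"` catalog name; whether the real pattern treats those cases differently is not stated in the commit message.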
Tests:
- All 23 RED tests from the previous commit now pass (regex matrix, blocklist with detail-string assertion, RBAC unregistered/admin-bypass, guardrail dry-run-called/over-cap-rejected, quota pre-flight 429).
- mock_dry_run fixture stubs both `_bq_dry_run_bytes` and `get_bq_access` so guardrail tests don't require a live BQ project.
- Quota test uses `admin1` (the seeded_app fixture's actual user id, not `admin`).

Smoke: 887 passed across query/bq/admin/extractor/registry/quota domains. No regressions.

2026-05-04 10:31:35 +02:00
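The pre-flight check, dry-run cap, and post-flight recording named in the commit message can be sketched as follows. This is a sketch under stated assumptions, not the project's code: `InMemoryQuota`, `GuardrailError`, `enforce_cap`, and `run_guarded` are hypothetical stand-ins, the 429 detail string is invented, and recording the dry-run estimate (rather than actual bytes billed) is an assumption. Only the method names `check_daily_budget`, `acquire`, `record_bytes`, the `remote_scan_too_large` detail, and the 5 GiB default come from the message.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterable, Iterator

DEFAULT_CAP_BYTES = 5 * 1024**3  # 5 GiB default named in the commit message


class GuardrailError(Exception):
    """Stand-in for an HTTP error with a status code and a detail string."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(detail)
        self.status = status
        self.detail = detail


class InMemoryQuota:
    """Hypothetical in-memory stand-in for the shared QuotaTracker."""

    def __init__(self, daily_limit: int) -> None:
        self.daily_limit = daily_limit
        self.used: Dict[str, int] = {}

    def check_daily_budget(self, user: str) -> None:
        if self.used.get(user, 0) >= self.daily_limit:
            # Detail string is invented for this sketch.
            raise GuardrailError(429, "daily_budget_exceeded")

    @contextmanager
    def acquire(self, user: str) -> Iterator[None]:
        # A real tracker would presumably limit concurrency; this is a no-op.
        yield

    def record_bytes(self, user: str, n: int) -> None:
        self.used[user] = self.used.get(user, 0) + n


def enforce_cap(user: str, paths: Iterable[str], dry_run: Callable[[str], int],
                quota: InMemoryQuota, cap: int = DEFAULT_CAP_BYTES) -> int:
    """Pre-flight budget check, then sum one dry-run per path against the cap."""
    quota.check_daily_budget(user)  # over budget -> 429
    with quota.acquire(user):
        total = sum(dry_run(p) for p in paths)
    if total > cap:
        raise GuardrailError(400, "remote_scan_too_large")
    return total


def run_guarded(user: str, paths: Iterable[str], dry_run: Callable[[str], int],
                execute: Callable[[], object], quota: InMemoryQuota,
                cap: int = DEFAULT_CAP_BYTES) -> object:
    """Enforce the cap, run the query, then record bytes post-flight."""
    total = enforce_cap(user, paths, dry_run, quota, cap)
    result = execute()
    quota.record_bytes(user, total)  # assumption: records the dry-run estimate
    return result
```

Because both BQ-touching endpoints are said to share one tracker, a scan recorded through one path would count against the same daily budget on the other; in this sketch that corresponds to passing the same `InMemoryQuota` instance to every caller.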
api	feat(query): #160 cost guardrail + bq.* RBAC + quota integration on /api/query	2026-05-04 10:31:35 +02:00
auth	security(auth): per-IP rate limit + last-admin guard (#165)	2026-05-02 21:08:33 +02:00
debug	feat(observability): request_id end-to-end + dev debug toolbar + centralized logging (#136)	2026-04-29 22:54:21 +02:00
marketplace_server	fix(marketplace): use plugin.json name in synth marketplace.json (#133)	2026-04-29 19:25:57 +02:00
middleware	feat(observability): request_id end-to-end + dev debug toolbar + centralized logging (#136)	2026-04-29 22:54:21 +02:00
web	refactor(bq): #160 remove legacy_wrap_views config knob (always-wrap)	2026-05-04 10:31:35 +02:00
__init__.py	feat: add FastAPI server with auth, RBAC, and all API endpoints	2026-03-27 15:19:18 +01:00
instance_config.py	feat(config): default welcome template in jinja2 + sync_interval	2026-05-03 16:10:48 +02:00
logging_config.py	feat(observability): request_id end-to-end + dev debug toolbar + centralized logging (#136)	2026-04-29 22:54:21 +02:00
main.py	feat(api,web,cli): /admin/workspace-prompt + /api/welcome restored + da analyst writes CLAUDE.md	2026-05-03 22:44:14 +02:00
resource_types.py	feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150)	2026-04-30 22:02:16 +02:00
secrets.py	fix: address Devin review round 5 — empty secret file, CI .env	2026-04-10 14:55:31 +02:00
utils.py	feat(rbac+marketplace): RBAC v13 + Claude Code marketplace + #81/#83/#44 hardening	2026-04-28 14:25:04 +02:00