agnes-the-ai-analyst/app/api
ZdenekSrotyr 896c43c7a2 feat(query): #160 cost guardrail + bq.* RBAC + quota integration on /api/query
The headline implementation for issue #160. POST /api/query now gates
direct `bq."<dataset>"."<source_table>"` references behind the registry
and bounds the BQ scan cost behind a configurable cap. Wired through
the same singleton QuotaTracker as /api/v2/scan so daily-byte budgets
are shared across both BQ-touching paths.

Changes in app/api/query.py:

- Add module-level `BQ_PATH` regex matching the 16 syntax variants
  verified empirically (fully-quoted, unquoted, mixed quoting,
  case-insensitive, inside CTE bodies, multi-path, …).
- Add `bigquery_query` to the SQL keyword blocklist. Closes the
  pre-existing function-call backdoor where a user could run an
  arbitrary BQ jobs API call against any reachable dataset, bypassing
  the registry and RBAC. Wrap views internal to the BQ extractor still
  use bigquery_query() — but those run via DuckDB view resolution at
  query time, not via user-submitted SQL, so the blocklist doesn't
  break them.
- Add `_bq_guardrail_inputs` helper: walks user SQL twice — once for
  bare-name matches against accessible registered remote-BQ names
  (contributes to dry_run_set), once for direct `bq.X.Y` matches
  (gated against `find_by_bq_path` lookups, returns 403 with
  structured detail on miss or grant violation).
- Add `_enforce_remote_bq_quota_and_cap` helper: pre-flight
  `check_daily_budget` (over-cap → 429), then `with quota.acquire(...)`
  wraps a per-path BQ dry-run, sums bytes, raises 400
  `remote_scan_too_large` when total > cap.
- Cap default 5 GiB; configurable via `api.query.bq_max_scan_bytes`
  in /admin/server-config (next phase wires the UI).
- Post-flight `record_bytes` against the user's daily counter.
- Module-level imports of `_bq_dry_run_bytes`, `_build_quota_tracker`,
  `get_bq_access` so tests can monkeypatch via `app.api.query.<name>`.

Tests:
- All 23 RED tests from the previous commit now pass (regex matrix,
  blocklist with detail-string assertion, RBAC unregistered/admin-bypass,
  guardrail dry-run-called/over-cap-rejected, quota pre-flight 429).
- mock_dry_run fixture stubs both `_bq_dry_run_bytes` and `get_bq_access`
  so guardrail tests don't require a live BQ project.
- Quota test uses `admin1` (the seeded_app fixture's actual user id, not
  `admin`).

Smoke: 887 passed across query/bq/admin/extractor/registry/quota
domains. No regressions.
2026-05-04 10:31:35 +02:00
..
__init__.py feat: add FastAPI server with auth, RBAC, and all API endpoints 2026-03-27 15:19:18 +01:00
access.py security(auth): per-IP rate limit + last-admin guard (#165) 2026-05-02 21:08:33 +02:00
admin.py refactor(bq): #160 remove legacy_wrap_views config knob (always-wrap) 2026-05-04 10:31:35 +02:00
catalog.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
claude_md.py feat(api,web,cli): /admin/workspace-prompt + /api/welcome restored + da analyst writes CLAUDE.md 2026-05-03 22:44:14 +02:00
cli_artifacts.py release(2.1.0): durable sync, CLI auto-update, versioned wheel URL, version unification (#43) 2026-04-22 21:18:18 +02:00
data.py fix(security+ops) + release(0.12.1): #82 #85 #87 hardening + cut 0.12.1 (#104) 2026-04-28 19:57:30 +02:00
health.py fix(admin+diagnose): address 2 additional Devin Review findings on PR #152 2026-05-01 21:21:23 +02:00
jira_webhooks.py fix(security): close Jira webhook fail-open + path traversal (#83) (#93) 2026-04-27 19:53:55 +02:00
marketplaces.py fix(scheduler): HTTP marketplaces job + SCHEDULER_API_TOKEN shared secret (#127) 2026-04-29 11:44:00 +02:00
me_debug.py feat(auth): /me/debug self-only auth diagnostic page (#116) 2026-04-29 06:36:28 +02:00
memory.py feat(memory): admin Edit + MEMORY_DOMAIN RBAC + ai-section UI (#141) 2026-04-30 11:04:41 +02:00
metadata.py feat(rbac+marketplace): RBAC v13 + Claude Code marketplace + #81/#83/#44 hardening 2026-04-28 14:25:04 +02:00
metrics.py feat(rbac+marketplace): RBAC v13 + Claude Code marketplace + #81/#83/#44 hardening 2026-04-28 14:25:04 +02:00
query.py feat(query): #160 cost guardrail + bq.* RBAC + quota integration on /api/query 2026-05-04 10:31:35 +02:00
query_hybrid.py feat(rbac+marketplace): RBAC v13 + Claude Code marketplace + #81/#83/#44 hardening 2026-04-28 14:25:04 +02:00
scripts.py feat(scheduler): re-wire sync_schedule + script.schedule; tune via env; OpenMetadata TLS (#135) 2026-04-29 22:06:30 +02:00
settings.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
sync.py fix(admin-api): reject backtick BQ-native source_query at register; surface materialize errors per-row 2026-05-01 22:51:02 +02:00
telegram.py feat: complete system — web UI, all API endpoints, governance, admin, CLI commands 2026-03-27 16:52:22 +01:00
tokens.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
upload.py fix(security+ops) + release(0.12.1): #82 #85 #87 hardening + cut 0.12.1 (#104) 2026-04-28 19:57:30 +02:00
users.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
v2_arrow.py feat(v2): claude-driven fetch primitives + 0.14.0 (#102) 2026-04-29 01:07:19 +02:00
v2_cache.py feat(v2): claude-driven fetch primitives + 0.14.0 (#102) 2026-04-29 01:07:19 +02:00
v2_catalog.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
v2_quota.py refactor(quota): #160 relocate _build_quota_tracker to v2_quota.py 2026-05-04 10:31:35 +02:00
v2_sample.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
v2_scan.py refactor(quota): #160 relocate _build_quota_tracker to v2_quota.py 2026-05-04 10:31:35 +02:00
v2_schema.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
welcome.py fix(devin-review): dashboard CTA respects override; PUT validates anon path 2026-05-03 21:45:32 +02:00
where_validator.py feat(v2): claude-driven fetch primitives + 0.14.0 (#102) 2026-04-29 01:07:19 +02:00