agnes-the-ai-analyst

ZdenekSrotyr 500db8cd3c fix(query-guardrail): dry-run user SQL not synthetic SELECT * (#171)

Closes #171.

The /api/query cost guardrail used to dry-run a synthetic `SELECT * FROM <table>` for each registered remote-BQ row referenced by the user SQL. That made BigQuery estimate a full table scan, with column projection, predicate pushdown, and partition pruning all disabled. Narrow queries on big partitioned/clustered tables (the documented happy path for `agnes query --remote`) hit ~30,000× over-estimates and got rejected with 400 `remote_scan_too_large` even when BQ's own dry-run reported single-digit MB.

Pavel's report on #171 traced the root cause and proposed the fix: rewrite the user SQL to BQ-native syntax and dry-run it as a single job, exactly the way `bq query --dry_run` works.

Implementation:
- New helper _rewrite_user_sql_for_bq_dry_run rewrites bare registered names (word-boundary, case-insensitive, longest-first to avoid prefix collisions) + bq."<ds>"."<tbl>" forms to backticked `<project>.<ds>.<tbl>` paths.
- _bq_quota_and_cap_guard runs ONE dry-run on the rewritten SQL. Cap check uses the real estimate.
- Fallback path: if BQ rejects with bq_bad_request (e.g. DuckDB-only syntax like ::INT casts), the guard falls back to the pre-fix per-table SELECT * approach so non-portable queries still get a (loose) cap estimate instead of failing open. Non-parse BQ errors (forbidden, upstream) still propagate as 502.
- _bq_guardrail_inputs now also returns name_lookups so the rewriter has the (registered_name, bucket, source_table) mapping it needs.
- Per-table breakdown is unavailable from a composite dry-run; total bytes are pinned to dry_run_set[0] so that the post-flight record_bytes(sum(...)) call keeps returning the right total.

Tests (7 new, 3 existing still pass):
- dry-run receives rewritten user SQL with WHERE clause intact (the load-bearing assertion for #171)
- single dry-run per request even with multiple registered tables (JOIN, UNION) referenced
- fallback to per-table SELECT * on bq_bad_request
- non-parse BQ errors (forbidden) still 502
- rewriter unit tests: bare + bq.path in same SQL, longest-name-wins on prefix collision, case-insensitive bare-name match

2026-05-04 21:08:21 +02:00
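
The rewrite-then-fallback flow described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function names without a leading underscore, the `BqBadRequest` exception, the injected `dry_run` callable, and the exact regular expressions are all assumptions made for the example, and the `(registered_name, bucket, source_table)` tuples are read here as (name, dataset, table).

```python
import re
from typing import Callable, Iterable, Tuple

# Hypothetical stand-in for the bq_bad_request error class named in the commit.
class BqBadRequest(Exception):
    pass

# Assumed shape of name_lookups: (registered_name, dataset, source_table).
Lookup = Tuple[str, str, str]

def rewrite_user_sql_for_bq_dry_run(sql: str, project: str,
                                    name_lookups: Iterable[Lookup]) -> str:
    """Rewrite DuckDB-side table references to backticked BigQuery paths."""
    # 1. bq."<ds>"."<tbl>"  ->  `<project>.<ds>.<tbl>`
    sql = re.sub(
        r'\bbq\."([^"]+)"\."([^"]+)"',
        lambda m: f"`{project}.{m.group(1)}.{m.group(2)}`",
        sql,
        flags=re.IGNORECASE,
    )
    # 2. Bare registered names, matched case-insensitively in ONE pass so a
    #    replacement is never rescanned. Longest names come first in the
    #    alternation so a shorter name cannot win on a shared prefix.
    by_name = {name.lower(): (ds, tbl) for name, ds, tbl in name_lookups}
    if not by_name:
        return sql
    names = sorted(by_name, key=len, reverse=True)
    # The lookarounds reject matches glued to word characters, dots, quotes,
    # or backticks, which also protects the paths produced in step 1.
    # Caveat of this sketch: names inside string literals are rewritten too.
    pattern = re.compile(
        r'(?<![\w.`"])(' + "|".join(map(re.escape, names)) + r')(?![\w.`"])',
        re.IGNORECASE,
    )

    def _bare(m: "re.Match[str]") -> str:
        ds, tbl = by_name[m.group(1).lower()]
        return f"`{project}.{ds}.{tbl}`"

    return pattern.sub(_bare, sql)

def estimate_scan_bytes(user_sql: str, project: str,
                        name_lookups: Iterable[Lookup],
                        dry_run: Callable[[str], int]) -> int:
    """One dry-run on the rewritten SQL; loose per-table fallback on parse errors."""
    lookups = list(name_lookups)
    rewritten = rewrite_user_sql_for_bq_dry_run(user_sql, project, lookups)
    try:
        return dry_run(rewritten)
    except BqBadRequest:
        # Non-portable SQL (e.g. ::INT casts): fall back to full-scan estimates.
        # Any other exception type propagates to the caller unchanged.
        return sum(
            dry_run(f"SELECT * FROM `{project}.{ds}.{tbl}`")
            for _, ds, tbl in lookups
        )
```

Passing `dry_run` in as a callable keeps the sketch independent of any BigQuery client; in a real deployment it would wrap whatever dry-run call the service already uses.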
..
api	fix(query-guardrail): dry-run user SQL not synthetic SELECT * (#171)	2026-05-04 21:08:21 +02:00
auth	feat(tokens): add scope + ttl_seconds fields with bootstrap-analyst clamp	2026-05-04 17:00:54 +02:00
debug	feat(observability): request_id end-to-end + dev debug toolbar + centralized logging (#136)	2026-04-29 22:54:21 +02:00
marketplace_server	fix(marketplace): use plugin.json name in synth marketplace.json (#133)	2026-04-29 19:25:57 +02:00
middleware	feat(observability): request_id end-to-end + dev debug toolbar + centralized logging (#136)	2026-04-29 22:54:21 +02:00
web	merge: pull #174 (BQ materialize view fix + concurrency, 0.33.0) into bootstrap branch	2026-05-04 20:53:00 +02:00
__init__.py	feat: add FastAPI server with auth, RBAC, and all API endpoints	2026-03-27 15:19:18 +01:00
instance_config.py	feat(config): default welcome template in jinja2 + sync_interval	2026-05-03 16:10:48 +02:00
logging_config.py	feat(observability): request_id end-to-end + dev debug toolbar + centralized logging (#136)	2026-04-29 22:54:21 +02:00
main.py	feat(admin): #160 BQ test-connection endpoint + billing_project placeholder UI	2026-05-04 10:31:35 +02:00
resource_types.py	feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150)	2026-04-30 22:02:16 +02:00
secrets.py	fix: address Devin review round 5 — empty secret file, CI .env	2026-04-10 14:55:31 +02:00
utils.py	feat(rbac+marketplace): RBAC v13 + Claude Code marketplace + #81/#83/#44 hardening	2026-04-28 14:25:04 +02:00