agnes-the-ai-analyst

Author	SHA1	Message	Date
Petr	85c87ec375	Pass explicit bqstorage_client to to_arrow_iterable() for Storage API Without explicit bqstorage_client parameter, to_arrow_iterable() silently falls back to REST API pagination (~5K rows/sec). With explicit client, it uses parallel gRPC streams via BQ Storage API (~300K rows/sec). No temp table materialization - BQ already writes query results to an internal temp table automatically. We just tell the reader to use the fast gRPC path instead of slow HTTP pagination.	2026-03-12 10:51:44 +01:00
Petr	4f74543a12	Fix streaming: use RowIterator.to_arrow_iterable() not QueryJob QueryJob only has to_arrow(), not to_arrow_iterable(). Must call query_job.result() first to get RowIterator, which has the streaming to_arrow_iterable() method.	2026-03-11 20:15:35 +01:00
Petr	ee70da86c3	Stream BQ results to Parquet instead of loading into memory Replace to_arrow() (loads entire result into RAM) with to_arrow_iterable() (streams RecordBatches). Each batch is written directly to disk via ParquetWriter - constant memory regardless of table size. Prevents OOM on 8GB server for multi-million row tables.	2026-03-11 20:13:03 +01:00
Petr	e26e47a071	Add BQ Storage API fallback to REST when readsessions permission missing	2026-03-11 13:59:09 +01:00
Petr	758910463b	Add BigQuery data source adapter BigQuery connector that syncs BQ tables to local Parquet files via PyArrow (no CSV intermediate step). Supports full refresh, timestamp-based incremental (via incremental_column), and partition-based sync strategies. - connectors/bigquery/client.py: BQ API wrapper with ADC auth, parameterized queries, metadata cache, cross-project support (job project != data project) - connectors/bigquery/adapter.py: DataSource implementation with merge/dedup - src/config.py: Add incremental_column field to TableConfig - 72 unit tests (mocked, no GCP SDK required)	2026-03-11 13:56:12 +01:00

5 commits