agnes-the-ai-analyst/scripts
ZdenekSrotyr 85d3810535 feat(materialized): query_mode='materialized' for BigQuery + Keboola — admin SELECT → parquet → analyst
Closes the 'admin pre-stages a curated table/view for analysts' use case end-to-end across both supported source connectors.

Backend (BigQuery + Keboola, schema v20):
  - schema v20 adds source_query TEXT to table_registry (renumbered from v19 after main's #150 RBAC migration also bumped to v19)
  - connectors/bigquery/extractor.py adds materialize_query(table_id, sql, *, bq, output_dir, max_bytes=...) — BqAccess session, dry-run cost guardrail (default 10 GiB, configurable via data_source.bigquery.max_bytes_per_materialize), idempotent ATTACH, rows/bytes/md5 metadata for sync_state
  - connectors/keboola/access.py — new KeboolaAccess facade (parallel of BqAccess) wrapping ATTACH 'keboola://...' AS kbc
  - connectors/keboola/extractor.py adds materialize_query with the same shape but no dry-run analog (the Keboola Storage API has a different cost model); the legacy bucket-download path skips query_mode='materialized' rows
  - app/api/sync.py:_run_materialized_pass dispatches by source_type to the right materialize_query
  - app/api/admin.py: RegisterTableRequest accepts source_query; a model_validator enforces consistency between query_mode, source_query, and bucket; PUT preserves omitted fields; sync_strategy and profile_after_sync are marked Field(deprecated=True) because no extractor reads them (profile_after_sync becomes inert, reflecting an earlier bug where /api/sync/trigger never honored the flag); _BQ_OPTIONAL_FIELD_DEFAULTS injects defaults into the GET /server-config payload
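
The dry-run cost guardrail and the per-source dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code in connectors/bigquery/extractor.py or app/api/sync.py: the names check_cost_guardrail, pick_materializer, and MaterializeBudgetError are hypothetical, and the byte estimate is assumed to come from a BigQuery dry run performed elsewhere. Only the 10 GiB default is taken from the commit message.

```python
from typing import Callable, Mapping

GIB = 1024 ** 3
# Default budget stated in the commit message: 10 GiB per materialization.
DEFAULT_MAX_BYTES = 10 * GIB


class MaterializeBudgetError(RuntimeError):
    """Hypothetical error: the dry-run estimate exceeds the byte budget."""


def check_cost_guardrail(estimated_bytes: int, max_bytes: int = DEFAULT_MAX_BYTES) -> int:
    """Return the estimate if it fits the budget, otherwise refuse to run.

    `estimated_bytes` is assumed to be the bytes-processed figure reported
    by a dry run of the admin-supplied SELECT.
    """
    if estimated_bytes < 0:
        raise ValueError("estimated_bytes must be non-negative")
    if estimated_bytes > max_bytes:
        raise MaterializeBudgetError(
            f"query would scan {estimated_bytes} bytes, budget is {max_bytes}"
        )
    return estimated_bytes


def pick_materializer(
    source_type: str, materializers: Mapping[str, Callable]
) -> Callable:
    """Select the materialize function for a source type (hypothetical helper)."""
    try:
        return materializers[source_type]
    except KeyError:
        raise ValueError(f"no materializer for source_type={source_type!r}") from None
```

Under these assumptions, a sync pass would look up the materializer by source_type and, for BigQuery only, call the guardrail before running the query; Keboola has no dry-run step, so its path would skip the check.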

Operator + CLI surface:
  - da admin register-table --query / --query-mode materialized
  - scripts/smoke-test-materialized-bq.sh — end-to-end smoke for operators

Tests (incl. spike + integration + regression):
  - test_db_migration_v20, test_table_registry_source_query
  - test_bq_materialize, test_bq_cost_guardrail, test_bq_init_extract_skips
  - test_keboola_access, test_keboola_extension_query_passthrough (lock-in for the DuckDB extension capability), test_keboola_materialize, test_keboola_init_extract_skips, test_keboola_materialized_e2e (skipped without KBC_TEST_* creds)
  - test_sync_trigger_materialized, test_sync_trigger_keboola_materialized
  - test_api_admin_materialized, test_cli_admin_materialized
  - test_admin_bq_register, test_admin_discover_bigquery, test_admin_keboola_materialized, test_admin_phase_c_deprecation, test_admin_put_preservation, test_materialized_e2e

Cost: BQ uses bigquery_query() (jobs API, view-aware) — works on tables, views, materialized views uniformly. Keboola uses ATTACH+COPY parquet through the DuckDB extension.
2026-05-01 20:25:56 +02:00
debug chore(oss): isolate customer-specific deploy bits from scripts/grpn/ (#88, wave 1) (#94) 2026-04-27 20:24:34 +02:00
dev feat(setup): cross-platform TLS bootstrap + marketplace plugin install (#137) 2026-04-30 08:56:45 +02:00
ops fix(tls-rotate): self-signed fallback sets basicConstraints=critical,CA:FALSE (#159) 2026-05-01 12:23:14 +02:00
bootstrap-gcp.sh fix(bootstrap): grant monitoring.editor + enable monitoring API 2026-04-21 20:32:50 +02:00
duckdb_manager.py chore(oss): isolate customer-specific deploy bits from scripts/grpn/ (#88, wave 1) (#94) 2026-04-27 20:24:34 +02:00
fetch-env-from-secrets.sh chore(oss): isolate customer-specific deploy bits from scripts/grpn/ (#88, wave 1) (#94) 2026-04-27 20:24:34 +02:00
generate_openapi.py feat: multi-instance deployment — all 14 must-have items from spec 2026-04-10 11:57:42 +02:00
generate_sample_data.py feat(observability): request_id end-to-end + dev debug toolbar + centralized logging (#136) 2026-04-29 22:54:21 +02:00
init.sh refactor: final cleanup — delete legacy auth, clean deps, fix hash, migrate to uv 2026-03-31 19:18:30 +02:00
migrate_json_to_duckdb.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
migrate_metrics_to_duckdb.py feat(observability): request_id end-to-end + dev debug toolbar + centralized logging (#136) 2026-04-29 22:54:21 +02:00
migrate_parquets_to_extracts.py feat(observability): request_id end-to-end + dev debug toolbar + centralized logging (#136) 2026-04-29 22:54:21 +02:00
migrate_registry_to_duckdb.py feat(observability): request_id end-to-end + dev debug toolbar + centralized logging (#136) 2026-04-29 22:54:21 +02:00
README.md fix: rewrite Makefile and scripts/README.md 2026-04-09 17:16:04 +02:00
run-local-dev.ps1 feat(dev): add Windows PowerShell wrapper for local development (#80) 2026-04-28 23:59:11 +02:00
run-local-dev.sh fix(security+ops) + release(0.12.1): #82 #85 #87 hardening + cut 0.12.1 (#104) 2026-04-28 19:57:30 +02:00
seed_corporate_memory.py feat(memory): corporate memory v1+v1.5 + 0.15.0 (#72) 2026-04-29 07:16:22 +02:00
seed_dummy_tables.py feat(rbac+marketplace): RBAC v13 + Claude Code marketplace + #81/#83/#44 hardening 2026-04-28 14:25:04 +02:00
smoke-test-materialized-bq.sh feat(materialized): query_mode='materialized' for BigQuery + Keboola — admin SELECT → parquet → analyst 2026-05-01 20:25:56 +02:00
smoke-test.sh fix(ci): smoke-test stale route + rollback ghcr auth + issues:write (#140) 2026-04-30 09:42:27 +02:00
tls-fetch.sh feat(tls): corporate-CA HTTPS with URL-driven rotation, on-VM CSR gen, self-signed fallback (#51) 2026-04-25 19:51:25 +00:00

Scripts

Utility and migration scripts for Agnes AI Data Analyst.

Active Scripts

| Script | Purpose |
|--------|---------|
| generate_sample_data.py | Generate sample data for development/demo |
| duckdb_manager.py | DuckDB database management utilities |
| init.sh | Initial server setup (install deps, create dirs) |

Migration Scripts (one-time use)

| Script | Purpose |
|--------|---------|
| migrate_json_to_duckdb.py | Migrate v1 JSON state files to DuckDB |
| migrate_parquets_to_extracts.py | Migrate v1 parquet layout to extract.duckdb |
| migrate_registry_to_duckdb.py | Migrate v1 table registry to DuckDB |