agnes-the-ai-analyst/app/api
ZdenekSrotyr c3e82972c8
feat(bq): decouple table_registry bucket from BQ dataset name (#343) (#346)
* feat(bq): decouple table_registry bucket from BQ dataset name (#343)

Adds optional `bq_fqn` column (schema v51) carrying the fully-qualified
BigQuery path (project.dataset.table) so the rebuild path no longer has
to reconstruct it from the dual-purpose `bucket` field (which is also a
UX/RBAC label).
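A minimal sketch of how a row's BigQuery path might be resolved. It prefers `bq_fqn` when present and otherwise falls back to the legacy bucket + source_table + remote_attach.project reconstruction. `parse_bq_fqn_sketch`, `resolve_fqn`, `BqFqn`, and the row dict shape are hypothetical stand-ins, not the repo's actual `parse_bq_fqn`. The validation regexes are simplified assumptions, not BigQuery's full naming rules.

```python
import re
from dataclasses import dataclass

# Simplified, assumed validation rules (not BigQuery's complete naming spec):
# project IDs are 6-30 chars of lowercase letters, digits and hyphens, starting
# with a letter and not ending in a hyphen; dataset/table names are restricted
# here to letters, digits and underscores.
_PROJECT_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")
_NAME_RE = re.compile(r"[A-Za-z0-9_]+")


@dataclass(frozen=True)
class BqFqn:
    project: str
    dataset: str
    table: str


def parse_bq_fqn_sketch(value: str) -> BqFqn:
    """Split 'project.dataset.table' into parts; raise ValueError if malformed."""
    parts = value.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected project.dataset.table, got {value!r}")
    project, dataset, table = parts
    if not _PROJECT_RE.fullmatch(project):
        raise ValueError(f"invalid project id: {project!r}")
    for name in (dataset, table):
        if not _NAME_RE.fullmatch(name):
            raise ValueError(f"invalid dataset/table name: {name!r}")
    return BqFqn(project, dataset, table)


def resolve_fqn(row: dict, attach_project: str) -> BqFqn:
    """Prefer the row's bq_fqn; otherwise rebuild the legacy path."""
    if row.get("bq_fqn"):
        return parse_bq_fqn_sketch(row["bq_fqn"])
    # Legacy rows: `bucket` doubles as the BigQuery dataset name.
    return BqFqn(attach_project, row["bucket"], row["source_table"])
```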

- Schema v51 migration + _SYSTEM_SCHEMA carry the nullable column;
  rows without it keep using the legacy bucket+source_table+
  remote_attach.project path (backwards compat).
- BQ extractor honors bq_fqn per row when present: dataset/table
  override on same-project rows; cross-project VIEW path works via
  bigquery_query(billing, ...); cross-project BASE TABLE skipped with
  a clear warning (multi-ATTACH per project deferred to follow-up).
- Orchestrator pre-pass detects drift between extract.duckdb
  _remote_attach.url and overlay data_source.bigquery.project, calls
  rebuild_from_registry to regenerate when they differ. Closes the
  operational hazard where /admin/server-config edits silently left
  the on-disk extract pointing at the old project until the next
  manual sync.
- Startup config check warns when project ≠ billing_project without
  location set (otherwise the metadata cache silently records
  "provider returned no data"), and when a warehouse-like data project
  has no billing_project override (otherwise calls hit a silent
  serviceusage 403).
- _resolve_bq_location warning now points at the location config key
  explicitly so operators see the actionable fix in the log.
- POST /api/admin/register-table and PUT /api/admin/registry/{id}
  accept bq_fqn; malformed values rejected at the API boundary (422).
- 25 tests covering parse_bq_fqn matrix, extractor override paths
  (same-project + cross-project VIEW + cross-project BASE TABLE skip),
  orchestrator drift sync, startup-validator heuristic, admin models.
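The per-row routing in the extractor bullet above can be sketched as a pure decision function. `Route`, `Plan`, and `plan_row` are hypothetical names. This sketch treats only a plain `VIEW` as eligible for the cross-project `bigquery_query` path and skips every other cross-project table type. That is an assumption that goes beyond the BASE TABLE case named above.

```python
import logging
from dataclasses import dataclass
from enum import Enum

logger = logging.getLogger(__name__)


class Route(Enum):
    SAME_PROJECT = "same_project"              # existing path, dataset/table overridden
    CROSS_PROJECT_VIEW = "cross_project_view"  # bigquery_query(billing, ...)
    SKIP = "skip"                              # needs a second ATTACH; deferred


@dataclass(frozen=True)
class Plan:
    route: Route
    reason: str


def plan_row(fqn_project: str, extractor_project: str, table_type: str) -> Plan:
    """Decide how to extract one registry row whose bq_fqn names fqn_project."""
    if fqn_project == extractor_project:
        return Plan(Route.SAME_PROJECT, "dataset/table override on the attached project")
    if table_type == "VIEW":
        # Query runs (and bills) in the extractor project; FROM targets fqn_project.
        return Plan(Route.CROSS_PROJECT_VIEW, f"cross-project view billed to {extractor_project}")
    reason = f"cross-project {table_type} in {fqn_project} needs multi-project ATTACH"
    logger.warning("skipping row: %s", reason)
    return Plan(Route.SKIP, reason)
```

Keeping the decision separate from I/O makes the three branches easy to pin in unit tests without a BigQuery connection.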

The bq_fqn input in the /admin/tables UI is intentionally omitted from
this PR (it would be a 3.5k-line template change). In the meantime,
admins can register tables through the REST API or the `agnes admin`
CLI. Multi-project ATTACH support falls under the same scope deferral
as the cross-project BASE TABLE skip; both will ship in a follow-up PR.

* review fixes: abstract CHANGELOG, merge duplicate Changed, bump docs schema version

- CHANGELOG.md: remove customer-specific hostname + incident date range
  from the orchestrator drift-sync entry (vendor-agnostic OSS rule),
  fold the entry into the existing [Unreleased] ### Changed section
  instead of opening a duplicate heading.
- docs/architecture.md: bump 'Current schema version' from 19 to 51 to
  match SCHEMA_VERSION (per agnes-orchestrator skill rule #4).

* review fixes: vendor-agnostic test fixture + Schema v51 internal bullet

- tests/test_bq_fqn.py: replace customer GCP project ID with generic
  'my-warehouse-project' placeholder (vendor-agnostic OSS rule). Test
  asserts on the warehouse-like heuristic, not the literal project
  name, so the rename is behavior-neutral.
- CHANGELOG.md: add explicit '**Schema v51**' bullet under
  `### Internal` naming the new version + summarizing the additive
  nullable column (matches the convention from v47/v48 bullets).

* fix(bq): cross-project _detect_table_type bills against extractor project

Addresses Devin review on #346 — pre-fix _detect_table_type passed the
data project as BOTH the FROM-clause target AND the bigquery_query()
first arg (billing project). For cross-project bq_fqn rows where
fqn_project != project_id, the data SA holds bigquery.dataViewer on
fqn_project but the serviceusage.services.use permission only on
project_id, so the call 403'd. init_extract's broad except Exception
swallowed the error and silently skipped the row, meaning the
cross-project VIEW path at extractor.py:~696 — the PR's primary
cross-project use case — never executed.

- Add optional billing_project kwarg to _detect_table_type; defaults
  to project for backwards compat (same-project callers unaffected).
- Update the init_extract call site to pass billing_project=project_id
  explicitly. Same-project rows (fqn_project == project_id) are a
  no-op; cross-project rows now route billing to the project where
  the SA actually has services.use.
- 2 new tests in TestDetectTableTypeBilling cover (a) explicit
  billing_project routing to bigquery_query 1st arg + data project
  staying in FROM, and (b) the backwards-compat default. Plus
  test_cross_project_detect_call_bills_against_extractor_project
  pins the call-site wiring — captures the (project, billing_project)
  pair the extractor passes for a cross-project bq_fqn row.
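The sketch below shows how the FROM-clause target can stay separate from the billing project. It assumes `_detect_table_type` reads `table_type` from the dataset's `INFORMATION_SCHEMA.TABLES` view through the DuckDB BigQuery extension's `bigquery_query(project, query)` table function. `build_detect_table_type_sql` is hypothetical and only builds the SQL string. It also assumes dataset and table names were already validated and contain no quotes.

```python
from typing import Optional


def _duckdb_literal(value: str) -> str:
    """Quote a value as a DuckDB string literal (single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def build_detect_table_type_sql(
    project: str,
    dataset: str,
    table: str,
    billing_project: Optional[str] = None,
) -> str:
    """Build the DuckDB query that asks BigQuery for a table's table_type.

    `project` stays in the FROM clause (where the data lives); the first
    bigquery_query() argument is the project that runs and is billed for
    the job. Names are assumed pre-validated (no quotes or backticks).
    """
    # Backwards-compat default: same-project callers bill the data project.
    billing = project if billing_project is None else billing_project
    inner = (
        f"SELECT table_type FROM `{project}.{dataset}.INFORMATION_SCHEMA.TABLES` "
        f"WHERE table_name = '{table}'"
    )
    return (
        f"SELECT table_type FROM bigquery_query("
        f"{_duckdb_literal(billing)}, {_duckdb_literal(inner)})"
    )
```

Callers that omit billing_project keep the old behavior. A cross-project bq_fqn row passes the extractor's project_id, which is where the service account holds serviceusage.services.use.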

* release: 0.54.29 — bq_fqn decoupling + marketplace refactor + setup-script UX

Accumulated [Unreleased] content from #342 (flea marketplace refactor),
#344 (setup script step-2 cwd check), and #346 (this PR — bq_fqn column
+ orchestrator drift sync + startup config check). Schema v51.
2026-05-19 11:17:32 +00:00
__init__.py
_metadata_models.py feat(catalog): entity_type + validated where_examples + view-aware cost-guard + scheduler hygiene 2026-05-12 10:37:35 +02:00
access.py feat(web): consolidate the personal /me/* surface — /me/activity + /me/profile (#304) 2026-05-14 21:29:51 +02:00
activity.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
admin.py feat(bq): decouple table_registry bucket from BQ dataset name (#343) (#346) 2026-05-19 11:17:32 +00:00
admin_bigquery_test.py feat(admin): #160 BQ test-connection endpoint + billing_project placeholder UI 2026-05-04 10:31:35 +02:00
admin_sessions.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
admin_usage.py feat(marketplace): telemetry v46 + flea inner parity + listing polish (#329) 2026-05-15 20:58:03 +02:00
admin_usage_summary.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
admin_user_sessions.py fix(security): RBAC filter uses stable user_id instead of mutable email local-part (#293) (#299) 2026-05-14 14:12:54 +00:00
bq_metadata_refresh.py release: 0.52.0 — UX/hygiene round (5 fixes from 0.51.0 retro) 2026-05-12 15:09:14 +02:00
cache_warmup.py release: 0.50.0 — persistent BQ metadata cache + scheduled refresh; catalog never blocks on BigQuery 2026-05-11 20:37:17 +02:00
catalog.py fix(api): harden API surface before Swagger (issue #336) (#339) 2026-05-18 15:13:21 +02:00
claude_md.py chore(cli-rename): replace stale da verbs in active code paths 2026-05-04 21:10:43 +02:00
cli_artifacts.py feat(web): consolidate the personal /me/* surface — /me/activity + /me/profile (#304) 2026-05-14 21:29:51 +02:00
data.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
health.py fix(api): harden API surface before Swagger (issue #336) (#339) 2026-05-18 15:13:21 +02:00
initial_workspace.py fix(api): redirect unauthorized browser requests to login for initial workspace zip (#315) 2026-05-15 15:18:39 +02:00
jira_webhooks.py fix(api): harden API surface before Swagger (issue #336) (#339) 2026-05-18 15:13:21 +02:00
marketplace.py feat(flea): marketplace refactor — data model, attribution, UI unification (#342) 2026-05-19 02:32:41 +02:00
marketplaces.py feat(api): enforce API design rules via pytest + fix DELETE/status-code violations (#338) 2026-05-18 15:25:07 +02:00
me.py fix(security): RBAC filter uses stable user_id instead of mutable email local-part (#293) (#299) 2026-05-14 14:12:54 +00:00
me_debug.py feat(web): consolidate the personal /me/* surface — /me/activity + /me/profile (#304) 2026-05-14 21:29:51 +02:00
me_stats.py feat(web): consolidate the personal /me/* surface — /me/activity + /me/profile (#304) 2026-05-14 21:29:51 +02:00
memory.py feat(api): enforce API design rules via pytest + fix DELETE/status-code violations (#338) 2026-05-18 15:25:07 +02:00
metadata.py feat(rbac+marketplace): RBAC v13 + Claude Code marketplace + #81/#83/#44 hardening 2026-04-28 14:25:04 +02:00
metrics.py feat(api): enforce API design rules via pytest + fix DELETE/status-code violations (#338) 2026-05-18 15:25:07 +02:00
my_stack.py feat(flea): marketplace refactor — data model, attribution, UI unification (#342) 2026-05-19 02:32:41 +02:00
news.py feat(home): state-aware /home + /setup-advanced + schema v26 (#228) 2026-05-08 18:28:47 +02:00
observability.py feat(api): enforce API design rules via pytest + fix DELETE/status-code violations (#338) 2026-05-18 15:25:07 +02:00
query.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
query_hybrid.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
scripts.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
settings.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
store.py feat(flea): marketplace refactor — data model, attribution, UI unification (#342) 2026-05-19 02:32:41 +02:00
sync.py fix(api): harden API surface before Swagger (issue #336) (#339) 2026-05-18 15:13:21 +02:00
telegram.py feat: complete system — web UI, all API endpoints, governance, admin, CLI commands 2026-03-27 16:52:22 +01:00
tokens.py fix(api): harden API surface before Swagger (issue #336) (#339) 2026-05-18 15:13:21 +02:00
upload.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
users.py fix(api): harden API surface before Swagger (issue #336) (#339) 2026-05-18 15:13:21 +02:00
v2_arrow.py feat(v2): claude-driven fetch primitives + 0.14.0 (#102) 2026-04-29 01:07:19 +02:00
v2_cache.py feat(v2): claude-driven fetch primitives + 0.14.0 (#102) 2026-04-29 01:07:19 +02:00
v2_catalog.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
v2_quota.py refactor(quota): #160 relocate _build_quota_tracker to v2_quota.py 2026-05-04 10:31:35 +02:00
v2_sample.py fix(api): sample endpoint returns 500 for materialized BQ tables (#341) 2026-05-18 22:57:32 +02:00
v2_scan.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
v2_schema.py Activity Center: audit log + telemetry + sessions + agnes_* tables (#278) 2026-05-12 22:41:19 +02:00
welcome.py fix(devin-review): dashboard CTA respects override; PUT validates anon path 2026-05-03 21:45:32 +02:00
where_validator.py feat(v2): claude-driven fetch primitives + 0.14.0 (#102) 2026-04-29 01:07:19 +02:00