agnes-the-ai-analyst/docs/superpowers/plans
ZdenekSrotyr aa5921da67
release: 0.47.0 — source-agnostic catalog metadata + cache discipline (#223)
## Summary

- Catalog enrichment for `query_mode='remote'` rows: `rows`, `size_bytes`, `partition_by`, `clustered_by` per table (BQ + Keboola providers).
- `/api/v2/schema/{id}` cache miss: 2 BQ jobs → 1 (-50%) via shared `fetch_bq_columns_full`.
- All four caches (catalog, schema, sample, metadata) flush on any registry change, and a re-warm of the changed row is scheduled.
- Automatic cache warmup at server startup (bounded concurrency, opt-out via `AGNES_SKIP_CACHE_WARMUP=1`).
- SSE-driven freshness toolbar on `/admin/tables` with progress bar, log, and per-row badge.
- New admin doc `docs/admin/query-modes.md` — single source of truth on `local` / `remote` / `materialized` choice.
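
The flush-then-re-warm behaviour in the summary can be pictured roughly as follows. This is a minimal sketch, not the code from this PR: `CacheSet`, `on_registry_change`, and the `schedule_rewarm` callback are hypothetical names, and the real caches are unlikely to be plain dictionaries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class CacheSet:
    """Hypothetical holder for the four caches named in the summary."""
    catalog: Dict[str, object] = field(default_factory=dict)
    schema: Dict[str, object] = field(default_factory=dict)
    sample: Dict[str, object] = field(default_factory=dict)
    metadata: Dict[str, object] = field(default_factory=dict)

    def flush_all(self) -> None:
        # Clear every cache so no endpoint can serve a stale entry.
        for cache in (self.catalog, self.schema, self.sample, self.metadata):
            cache.clear()


def on_registry_change(
    caches: CacheSet,
    table_id: str,
    schedule_rewarm: Callable[[str], None],
) -> None:
    """Flush all caches, then queue a re-warm for the changed row only."""
    caches.flush_all()
    schedule_rewarm(table_id)
```

Flushing everything and re-warming only the edited row trades a few extra cache misses for the guarantee that an edited registration is never answered from a stale entry.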

Closes #155.
Closes #156.

## Test plan

- [x] 65+ targeted tests pass across 11 new test modules + 3 modified ones.
- [x] No DB migration; no wire-break; `MIN_COMPAT_CLI_VERSION` unchanged.
- [ ] Reviewer: register a remote BQ table via `/admin/tables`, observe the toolbar populates within ~2 s and the per-row badge transitions warming → fresh.
- [ ] Reviewer: trigger `Re-warm all`, verify SSE log scrolls and `cacheWarmupBar` progresses.
- [ ] Reviewer: edit a registered row's bucket, verify `agnes schema <id>` returns updated columns immediately (no 1-hour staleness).
- [ ] Reviewer: confirm `agnes admin register-table --query-mode remote` prints the new IAM-smoke-check hint.

## Notable design decisions

- BigQuery `INFORMATION_SCHEMA.TABLE_STORAGE` is valid only at region scope for size and row counts (verified live 2026-05-07; a dataset-scoped variant doesn't exist). The region is resolved from `instance.yaml.data_source.bigquery.location`, then from `bq.client().get_dataset(...)`; failing both, the lookup falls back to legacy `__TABLES__`.
- VIEW handling: `TABLE_STORAGE` returns no rows for views, so the lookup falls through to `__TABLES__` (also empty) and yields `TableMetadata(rows=None, size_bytes=None, partition_by=..., clustered_by=...)`. A null size signals analyst Claude to apply the existing CLAUDE.md guidance.
- `size_bytes` is `active_logical_bytes + long_term_logical_bytes`: a full BQ scan reads both, so reporting only the active bytes would undercount aged partitioned tables.
- Source-agnostic provider seam: per-source `connectors/<source>/metadata.py:fetch(MetadataRequest)`; dispatcher in `app/api/v2_catalog.py:_metadata_provider_for` lazily imports per source_type so a Keboola-only deployment doesn't pay the BQ-extension import cost.
- Warmup is non-blocking: the FastAPI `lifespan` schedules `asyncio.create_task(_warm_catalog_caches_bg)` before `yield`. Per-row failures are isolated.
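
The non-blocking warmup with bounded concurrency and per-row failure isolation might look something like the sketch below. It uses only `asyncio` so that it runs without FastAPI; `warm_catalog_caches`, `_warm_one`, `startup`, the `fetch` callback, and the default concurrency of 4 are illustrative assumptions, not the PR's actual `_warm_catalog_caches_bg`.

```python
import asyncio
import os
from typing import Awaitable, Callable, Dict, Iterable

Fetch = Callable[[str], Awaitable[object]]


async def _warm_one(
    table_id: str,
    fetch: Fetch,
    sem: asyncio.Semaphore,
    results: Dict[str, object],
) -> None:
    async with sem:  # bounded concurrency: at most N fetches in flight
        try:
            results[table_id] = await fetch(table_id)
        except Exception as exc:
            # Per-row isolation: record the failure and keep warming the rest.
            results[table_id] = exc


async def warm_catalog_caches(
    table_ids: Iterable[str],
    fetch: Fetch,
    max_concurrency: int = 4,  # assumed default, not taken from the PR
) -> Dict[str, object]:
    # Opt-out switch named in the summary.
    if os.environ.get("AGNES_SKIP_CACHE_WARMUP") == "1":
        return {}
    sem = asyncio.Semaphore(max_concurrency)
    results: Dict[str, object] = {}
    await asyncio.gather(*(_warm_one(t, fetch, sem, results) for t in table_ids))
    return results


async def startup(table_ids: Iterable[str], fetch: Fetch) -> "asyncio.Task":
    # Schedule the warmup and return at once; the caller is never blocked.
    return asyncio.create_task(warm_catalog_caches(table_ids, fetch))
```

In a real `lifespan` handler the `create_task` call would sit just before `yield`, so the server starts accepting requests while the caches fill in the background.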

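The size and VIEW rules in the design decisions can be illustrated with a small pure function. This is a sketch under stated assumptions: the `TableMetadata` field types and the `build_metadata` helper are hypothetical, and the rows are passed in as plain mappings rather than fetched from BigQuery. The column names `total_rows`, `active_logical_bytes`, and `long_term_logical_bytes` (from `TABLE_STORAGE`) and `row_count`, `size_bytes` (from `__TABLES__`) are the documented BigQuery ones.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class TableMetadata:
    # Field names follow the PR description; the types are assumed.
    rows: Optional[int]
    size_bytes: Optional[int]
    partition_by: Optional[str]
    clustered_by: Optional[Sequence[str]]


def build_metadata(
    storage_row: Optional[Mapping[str, int]],
    legacy_row: Optional[Mapping[str, int]],
    partition_by: Optional[str],
    clustered_by: Optional[Sequence[str]],
) -> TableMetadata:
    """Prefer TABLE_STORAGE, fall back to __TABLES__, else return nulls."""
    if storage_row is not None:
        # A full scan reads active and long-term storage, so sum both.
        size = (
            storage_row["active_logical_bytes"]
            + storage_row["long_term_logical_bytes"]
        )
        return TableMetadata(
            storage_row["total_rows"], size, partition_by, clustered_by
        )
    if legacy_row is not None:
        return TableMetadata(
            legacy_row["row_count"],
            legacy_row["size_bytes"],
            partition_by,
            clustered_by,
        )
    # Views land here: neither source has a row, so size and rows stay null.
    return TableMetadata(None, None, partition_by, clustered_by)
```

For example, a `TABLE_STORAGE` row with 100 active and 50 long-term logical bytes would be reported with `size_bytes=150`, whereas a view would come back with `rows=None` and `size_bytes=None`.
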
## Out of scope

- Profile / column histograms / dimension cardinality for remote tables (separate issue).
- Onboarding nudge ("you have 0 remote tables, consider registering some BQ ones") — separate UX call.
- Provider plug-in registration via entry-points (the dispatch table is a hardcoded if-tree today; one line per future source).

## Release

Bumps `pyproject.toml` 0.46.1 → 0.47.0 (main shipped 0.46.0 + 0.46.1 during this PR — see commit `d98976ec`). New CHANGELOG section under `## [0.47.0] — 2026-05-07`.

2026-05-07 18:33:55 +02:00

| File | Last commit | Date |
| --- | --- | --- |
| 2026-03-27-01-duckdb-state-layer.md | chore: clean stale docs — rewrite architecture.md, remove old plans | 2026-04-09 09:06:13 +02:00 |
| 2026-03-27-02-complete-system.md | fix: remove dead PRAGMA enable_wal code | 2026-04-09 06:59:57 +02:00 |
| 2026-04-08-final-integration-fixes.md | chore: clean repo for public release — fix references, remove drafts | 2026-04-08 19:27:25 +02:00 |
| 2026-04-08-production-hardening.md | fix: remove dead PRAGMA enable_wal code | 2026-04-09 06:59:57 +02:00 |
| 2026-04-08-security-hardening.md | chore: clean repo for public release — fix references, remove drafts | 2026-04-08 19:27:25 +02:00 |
| 2026-04-09-dead-code-cleanup.md | User management + PAT + CLI distribution + HTML auth redirect (#9 #10 #11 #12) (#28) | 2026-04-22 14:24:28 +02:00 |
| 2026-04-09-deployment-readiness.md | feat(tls): corporate-CA HTTPS with URL-driven rotation, on-VM CSR gen, self-signed fallback (#51) | 2026-04-25 19:51:25 +00:00 |
| 2026-04-09-final-polish.md | User management + PAT + CLI distribution + HTML auth redirect (#9 #10 #11 #12) (#28) | 2026-04-22 14:24:28 +02:00 |
| 2026-04-09-security-fixes.md | User management + PAT + CLI distribution + HTML auth redirect (#9 #10 #11 #12) (#28) | 2026-04-22 14:24:28 +02:00 |
| 2026-04-10-analyst-bootstrap.md | docs: add implementation plans for porting internal features | 2026-04-10 19:08:55 +02:00 |
| 2026-04-10-business-metrics.md | docs: add implementation plans for porting internal features | 2026-04-10 19:08:55 +02:00 |
| 2026-04-10-metadata-writer.md | docs: add implementation plans for porting internal features | 2026-04-10 19:08:55 +02:00 |
| 2026-04-11-remote-query.md | docs: add remote query implementation plan (5 tasks) | 2026-04-11 11:02:04 +02:00 |
| 2026-04-12-comprehensive-test-suite.md | docs: add comprehensive test suite implementation plan (8 tasks, 6 parallel blocks) | 2026-04-12 10:44:08 +02:00 |
| 2026-04-21-deployment-log.md | chore(oss): isolate customer-specific deploy bits from scripts/grpn/ (#88, wave 1) (#94) | 2026-04-27 20:24:34 +02:00 |
| 2026-04-21-hackathon-dry-run.md | chore(oss): isolate customer-specific deploy bits from scripts/grpn/ (#88, wave 1) (#94) | 2026-04-27 20:24:34 +02:00 |
| 2026-04-21-issues-14-and-10.md | chore(oss): isolate customer-specific deploy bits from scripts/grpn/ (#88, wave 1) (#94) | 2026-04-27 20:24:34 +02:00 |
| 2026-04-21-multi-customer-deployment.md | chore(oss): isolate customer-specific deploy bits from scripts/grpn/ (#88, wave 1) (#94) | 2026-04-27 20:24:34 +02:00 |
| 2026-04-21-user-mgmt-pat-cli.md | User management + PAT + CLI distribution + HTML auth redirect (#9 #10 #11 #12) (#28) | 2026-04-22 14:24:28 +02:00 |
| 2026-04-22-cloudflare-access-auth.md | User management + PAT + CLI distribution + HTML auth redirect (#9 #10 #11 #12) (#28) | 2026-04-22 14:24:28 +02:00 |
| 2026-04-27-bq-pipeline-views-and-metadata-auth.md | feat(v2): claude-driven fetch primitives + 0.14.0 (#102) | 2026-04-29 01:07:19 +02:00 |
| 2026-04-27-claude-fetch-primitives.md | feat(v2): claude-driven fetch primitives + 0.14.0 (#102) | 2026-04-29 01:07:19 +02:00 |
| 2026-04-29-dev-debug-toolbar.md | feat(observability): request_id end-to-end + dev debug toolbar + centralized logging (#136) | 2026-04-29 22:54:21 +02:00 |
| 2026-04-29-issue-134-bq-access-unify-plan.md | fix(v2): #134 BigQuery cross-project errors return structured 502/400 + BqAccess facade (#138) | 2026-04-30 10:11:20 +02:00 |
| 2026-04-29-issues-77-78-79-89.md | feat(scheduler): re-wire sync_schedule + script.schedule; tune via env; OpenMetadata TLS (#135) | 2026-04-29 22:06:30 +02:00 |
| 2026-04-30-customizable-welcome-prompt.md | docs: implementation plan for customizable welcome prompt | 2026-05-03 16:10:48 +02:00 |
| 2026-05-01-admin-tables-form-cleanup.md | feat(diagnose) + docs: warn on USER_PROJECT_DENIED footgun + document all newly-exposed knobs | 2026-05-01 20:27:24 +02:00 |
| 2026-05-04-clean-analyst-bootstrap.md | docs(spec+plan): round-4 review fixes — rename hygiene | 2026-05-04 15:57:07 +02:00 |
| 2026-05-04-unified-setup-prompt.md | refactor(setup-instructions): drop role param; collapse analyst/admin into one layout | 2026-05-04 22:08:48 +02:00 |
| 2026-05-06-cli-auto-upgrade.md | docs(plan): server-pinned CLI auto-upgrade — spec + implementation plan | 2026-05-06 23:23:23 +02:00 |
| 2026-05-07-source-agnostic-table-metadata.md | release: 0.47.0 — source-agnostic catalog metadata + cache discipline (#223) | 2026-05-07 18:33:55 +02:00 |