* feat(bq): decouple table_registry bucket from BQ dataset name (#343)

Adds optional `bq_fqn` column (schema v51) carrying the fully-qualified BigQuery path (project.dataset.table) so the rebuild path no longer has to reconstruct it from the dual-purpose `bucket` field (which is also a UX/RBAC label).

- Schema v51 migration + _SYSTEM_SCHEMA carry the nullable column; rows without it keep using the legacy bucket+source_table+remote_attach.project path (backwards compat).
- BQ extractor honors bq_fqn per row when present: dataset/table override on same-project rows; cross-project VIEW path works via bigquery_query(billing, ...); cross-project BASE TABLE skipped with a clear warning (multi-ATTACH per project deferred to follow-up).
- Orchestrator pre-pass detects drift between extract.duckdb _remote_attach.url and overlay data_source.bigquery.project, and calls rebuild_from_registry to regenerate when they differ. Closes the operational hazard where /admin/server-config edits silently left the on-disk extract pointing at the old project until the next manual sync.
- Startup config check warns when project ≠ billing_project without location set (the on-disk symptom is "provider returned no data" silently in metadata cache), and when a warehouse-like data project has no billing_project override (silent 403 serviceusage path).
- _resolve_bq_location warning now points at the location config key explicitly so operators see the actionable fix in the log.
- POST /api/admin/register-table and PUT /api/admin/registry/{id} accept bq_fqn; malformed values rejected at the API boundary (422).
- 25 tests covering parse_bq_fqn matrix, extractor override paths (same-project + cross-project VIEW + cross-project BASE TABLE skip), orchestrator drift sync, startup-validator heuristic, admin models.

UI surface for bq_fqn input in /admin/tables intentionally omitted from this PR (3.5k-line template change); admins can register through the REST API or `agnes admin` CLI in the meantime. Multi-project ATTACH support is the same scope deferral as the cross-project BASE TABLE skip; both ride a follow-up PR.

* review fixes: abstract CHANGELOG, merge duplicate Changed, bump docs schema version

- CHANGELOG.md: remove customer-specific hostname + incident date range from the orchestrator drift-sync entry (vendor-agnostic OSS rule), fold the entry into the existing [Unreleased] ### Changed section instead of opening a duplicate heading.
- docs/architecture.md: bump 'Current schema version' from 19 to 51 to match SCHEMA_VERSION (per agnes-orchestrator skill rule #4).

* review fixes: vendor-agnostic test fixture + Schema v51 internal bullet

- tests/test_bq_fqn.py: replace customer GCP project ID with generic 'my-warehouse-project' placeholder (vendor-agnostic OSS rule). Test asserts on the warehouse-like heuristic, not the literal project name, so the rename is behavior-neutral.
- CHANGELOG.md: add explicit '**Schema v51**' bullet under `### Internal` naming the new version + summarizing the additive nullable column (matches the convention from v47/v48 bullets).

* fix(bq): cross-project _detect_table_type bills against extractor project

Addresses Devin review on #346: pre-fix, _detect_table_type passed the data project as BOTH the FROM-clause target AND the bigquery_query() first arg (billing project). For cross-project bq_fqn rows where fqn_project != project_id, the data SA holds bigquery.dataViewer on fqn_project but the serviceusage.services.use permission only on project_id, so the call 403'd. init_extract's broad except Exception swallowed the error and silently skipped the row, meaning the cross-project VIEW path at extractor.py:~696 (the PR's primary cross-project use case) never executed.

- Add optional billing_project kwarg to _detect_table_type; defaults to project for backwards compat (same-project callers unaffected).
- Update the init_extract call site to pass billing_project=project_id explicitly. Same-project rows (fqn_project == project_id) are a no-op; cross-project rows now route billing to the project where the SA actually has services.use.
- 2 new tests in TestDetectTableTypeBilling cover (a) explicit billing_project routing to bigquery_query 1st arg + data project staying in FROM, and (b) the backwards-compat default. Plus test_cross_project_detect_call_bills_against_extractor_project pins the call-site wiring: it captures the (project, billing_project) pair the extractor passes for a cross-project bq_fqn row.

* release: 0.54.29 - bq_fqn decoupling + marketplace refactor + setup-script UX

Accumulated [Unreleased] content from #342 (flea marketplace refactor), #344 (setup script step-2 cwd check), and #346 (this PR: bq_fqn column + orchestrator drift sync + startup config check). Schema v51.
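
For readers who want a concrete picture of the `bq_fqn` override and its legacy fallback, here is a minimal sketch. It is not the code from this PR: `parse_bq_fqn_sketch`, `BqFqn`, and `resolve_target` are hypothetical names, the character pattern is a deliberately loose assumption (real BigQuery naming rules differ per component), and the row is modelled as a plain dict.

```python
import re
from typing import NamedTuple


class BqFqn(NamedTuple):
    """Hypothetical container for a fully-qualified BigQuery path."""
    project: str
    dataset: str
    table: str


# Assumption: a loose illustrative pattern, not BigQuery's exact naming rules.
_PART = r"[A-Za-z0-9_-]+"
_FQN_RE = re.compile(rf"({_PART})\.({_PART})\.({_PART})")


def parse_bq_fqn_sketch(value: str) -> BqFqn:
    """Split 'project.dataset.table' into its parts or raise ValueError."""
    match = _FQN_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"malformed bq_fqn: {value!r} (expected project.dataset.table)")
    return BqFqn(*match.groups())


def resolve_target(row: dict, default_project: str) -> BqFqn:
    """Prefer the row's bq_fqn; otherwise fall back to the legacy path.

    The legacy path mirrors the commit text: the default (remote-attach)
    project, `bucket` doubling as the dataset, and `source_table` as the table.
    """
    fqn = row.get("bq_fqn")
    if fqn:
        return parse_bq_fqn_sketch(fqn)
    return BqFqn(default_project, row["bucket"], row["source_table"])


def is_cross_project(target: BqFqn, default_project: str) -> bool:
    """True when the row points at a project other than the default one."""
    return target.project != default_project
```

With this shape, a row without `bq_fqn` resolves exactly as before, while a row carrying `other-project.analytics.orders` resolves to that path and is flagged as cross-project, which is the case where billing has to be routed separately from the data project.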
"""v41 → v42 migration: 7 new usage_* tables for telemetry."""

import duckdb
from src.db import _ensure_schema as init_database


def test_v42_tables_exist_after_init(tmp_path):
    db_path = tmp_path / "test.duckdb"
    conn = duckdb.connect(str(db_path))
    init_database(conn)
    tables = {
        row[0]
        for row in conn.execute("SELECT table_name FROM information_schema.tables WHERE table_schema='main'").fetchall()
    }
    # v46 dropped: usage_plugin_daily, usage_attribution_skills/_agents/_commands.
    # v46 added: usage_marketplace_item_daily, usage_marketplace_item_window.
    for tbl in [
        "usage_events",
        "usage_session_summary",
        "usage_tool_daily",
        "usage_marketplace_item_daily",
        "usage_marketplace_item_window",
    ]:
        assert tbl in tables, f"missing table {tbl}"
    for tbl in [
        "usage_plugin_daily",
        "usage_attribution_skills",
        "usage_attribution_agents",
        "usage_attribution_commands",
    ]:
        assert tbl not in tables, f"dropped table {tbl} still present"
    conn.close()


def test_v42_indices_exist(tmp_path):
    db_path = tmp_path / "test.duckdb"
    conn = duckdb.connect(str(db_path))
    init_database(conn)
    idx_names = {
        row[0]
        for row in conn.execute("SELECT index_name FROM duckdb_indexes WHERE table_name LIKE 'usage_%'").fetchall()
    }
    # v46 dropped: idx_usage_attr_*_lookup.
    # v46 added: idx_mid_lookup, idx_miw_lookup on the new marketplace tables.
    for idx in [
        "idx_usage_events_session",
        "idx_usage_events_user_time",
        "idx_usage_events_tool",
        "idx_usage_events_skill",
        "idx_usage_events_ref",
        "idx_usage_session_user",
        "idx_usage_session_started",
        "idx_mid_lookup",
        "idx_miw_lookup",
    ]:
        assert idx in idx_names, f"missing index {idx}"
    conn.close()


def test_v41_to_v42_is_idempotent(tmp_path):
    """Running init twice on the same DB must not error; version stays at the current schema version (51)."""
    db_path = tmp_path / "twice.duckdb"
    conn = duckdb.connect(str(db_path))
    init_database(conn)
    conn.close()
    conn = duckdb.connect(str(db_path))
    init_database(conn)
    v = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0]
    assert v == 51
    conn.close()


def test_v41_db_upgrades_cleanly(tmp_path):
    """A v41-state DB (post-Activity-Center) must climb to the current schema version (51) without error."""
    db_path = tmp_path / "v41.duckdb"
    conn = duckdb.connect(str(db_path))
    # Minimal v41 baseline shape: schema_version at 41 plus the audit_log columns of that era.
    conn.execute("CREATE TABLE schema_version (version INTEGER, applied_at TIMESTAMP DEFAULT current_timestamp)")
    conn.execute("INSERT INTO schema_version (version) VALUES (41)")
    conn.execute("""CREATE TABLE audit_log (
        id VARCHAR PRIMARY KEY, timestamp TIMESTAMP DEFAULT current_timestamp,
        user_id VARCHAR, action VARCHAR, resource VARCHAR, params JSON,
        result VARCHAR, duration_ms INTEGER,
        params_before JSON, client_ip VARCHAR, client_kind VARCHAR, correlation_id VARCHAR
    )""")
    conn.close()
    conn = duckdb.connect(str(db_path))
    init_database(conn)
    v = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0]
    assert v == 51
    tables = {
        row[0]
        for row in conn.execute("SELECT table_name FROM information_schema.tables WHERE table_schema='main'").fetchall()
    }
    # The usage_* tables arrived in v42; v46 replaced the attribution/rollup
    # tables, so verify the post-v46 set.
    for tbl in [
        "usage_events",
        "usage_session_summary",
        "usage_tool_daily",
        "usage_marketplace_item_daily",
        "usage_marketplace_item_window",
    ]:
        assert tbl in tables, f"missing table {tbl} after upgrade"
    conn.close()


def test_v30_db_ladders_all_the_way_up(tmp_path):
    """Old v30-state DB must climb all the way to the current schema version (51) without losing data."""
    db_path = tmp_path / "v30.duckdb"
    conn = duckdb.connect(str(db_path))
    conn.execute("CREATE TABLE schema_version (version INTEGER, applied_at TIMESTAMP DEFAULT current_timestamp)")
    conn.execute("INSERT INTO schema_version (version) VALUES (30)")
    conn.execute("CREATE TABLE audit_log (id VARCHAR PRIMARY KEY)")
    conn.execute("INSERT INTO audit_log (id) VALUES ('vintage')")
    conn.close()

    conn = duckdb.connect(str(db_path))
    init_database(conn)
    v = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0]
    assert v == 51
    # The pre-existing row must survive the whole ladder.
    cnt = conn.execute("SELECT COUNT(*) FROM audit_log WHERE id='vintage'").fetchone()[0]
    assert cnt == 1
    # usage_events (added by the v42 migration) exists and starts empty.
    cnt2 = conn.execute("SELECT COUNT(*) FROM usage_events").fetchone()[0]
    assert cnt2 == 0
    conn.close()
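
The tests above pin three properties of the schema initializer: a fresh DB reaches the target version, a second run is a no-op, and an old DB climbs every rung without losing rows. The sketch below illustrates one common way to get those properties, a version ladder. It is an assumption-laden illustration, not the body of `_ensure_schema`: it uses the standard-library `sqlite3` as a stand-in for DuckDB so it runs without third-party packages, and `MIGRATIONS`, `TARGET_VERSION`, and `ensure_schema_sketch` are hypothetical names.

```python
import sqlite3

# Hypothetical ladder: each key upgrades the schema from (key - 1) to key.
# IF NOT EXISTS keeps every rung safe on a DB that already has the object.
MIGRATIONS = {
    1: ["CREATE TABLE IF NOT EXISTS audit_log (id TEXT PRIMARY KEY)"],
    2: ["CREATE TABLE IF NOT EXISTS usage_events (id TEXT PRIMARY KEY, tool TEXT)"],
    3: ["CREATE INDEX IF NOT EXISTS idx_usage_events_tool ON usage_events (tool)"],
}
TARGET_VERSION = max(MIGRATIONS)


def ensure_schema_sketch(conn: sqlite3.Connection) -> int:
    """Apply every migration above the recorded version; return the final version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    # MAX() over an empty table yields NULL, which we treat as version 0.
    current = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0] or 0
    for version in range(current + 1, TARGET_VERSION + 1):
        for statement in MIGRATIONS[version]:
            conn.execute(statement)
        # Record each rung so an interrupted run can resume where it stopped.
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
    conn.commit()
    return conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0]
```

Because the loop starts at `current + 1`, a second call finds nothing to apply, which is the idempotency property; and because rungs only add objects, rows inserted under an older version remain in place after the climb.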