diff --git a/CHANGELOG.md b/CHANGELOG.md index 92c0d6a..22e6dcd 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -39,6 +39,10 @@ End-to-end clean-analyst-bootstrap rewrite. The web `/setup` page now produces a - `agnes snapshot create` (formerly `da fetch`) no longer materializes an empty `user/duckdb/analytics.duckdb` when run before any `agnes pull`. Friendly hint redirects to `agnes pull`. - Workspace `agnes status` reads from the canonical `server/parquet/` and `user/duckdb/analytics.duckdb` paths (was reading legacy `data/parquet/`, `data/metadata/last_sync.json`). - `agnes init` and `agnes pull` errors now use the `cli/error_render.py` typed-error renderer (added in 0.32.0), so analyst-facing error UX matches the structured shape `agnes query --remote` already produces. +- **`agnes admin register-table` UX**: three real-world feedback items addressed. + - **`--query-mode materialized` now requires `--bucket`** (client-side validation; exits with a clear error before hitting the server). The previous help docstring claimed `--bucket` was *ignored* for materialized rows, but the value is actually load-bearing — `agnes schema ` builds the BQ identifier as `bq..`, so an empty bucket registered the row but broke subsequent schema/describe with HTTP 400 "unsafe BQ identifier in registry". Docstring rewritten to reflect reality. + - **Post-success hints**: after a successful registration the CLI now points operators at the two follow-ups they routinely miss: (a) `agnes setup first-sync` to materialize the parquet (registration alone doesn't trigger a build; `agnes pull` reports "Updated 0 tables" until the scheduler tick), and (b) `agnes admin grant create table ` to make the row visible in `agnes catalog` for non-admin users (catalog is RBAC-filtered). + - Test coverage: `tests/test_cli_admin_materialized.py::test_register_materialized_without_bucket_fails_with_clear_error` and `test_register_table_emits_first_sync_and_grant_hints`. - **`agnes query --remote` SQL rewriter no longer corrupts output when the GCP project ID contains a registered table name as a hyphen-delimited word** (Devin Review on `query.py:464`). The previous iterative rewrite (one `re.sub(\b\b, ...)` per registered name) was vulnerable to cross-contamination: e.g. project `my-ue-project` + registered `orders` + registered `ue` → iter 1 rewrites `orders` to `\`my-ue-project.fin.orders\``, iter 2's `\bue\b` then matches the `ue` INSIDE `my-ue-project` and corrupts the iter-1 path. Fix: replaced the iteration with a SINGLE `re.sub` whose alternation regex (sorted longest-first) handles every name in one pass, so freshly-inserted backticked text isn't re-scanned. The fallback at `query.py:576` (per-table SELECT * on BQ parse error) caught the corrupted output as `bq_bad_request` so impact was over-estimation rather than fail-open, but the partition-pruning benefit of #171 is now preserved for projects whose IDs share a hyphen-segment with a registered table name. Regression test in `tests/test_api_query_guardrail.py::test_rewrite_helper_does_not_corrupt_when_project_id_contains_registered_name`. - **BigQuery materialize TTL reclaim is no longer dead code** (Devin Review on `extractor.py:166`). `_try_acquire_file_lock` used to call `open(lock_path, mode="w")` BEFORE checking the lock-file mtime, which truncated the file and refreshed mtime to *now* on every invocation. The subsequent `time.time() - lock_path.stat().st_mtime` always saw age ~0, so `age > TTL` never fired, and `materialize.lock_ttl_seconds` was a silently no-op config knob. Fix: stat the lock path BEFORE any `open()` to read the real pre-probe mtime; if older than TTL, unlink (forcing a fresh inode for the next `open + flock`); only then probe. Two regression tests added: `test_stale_held_lock_is_reclaimed_despite_live_holder` exercises the full reclaim path with a still-living fcntl holder, `test_failed_probe_does_not_self_refresh_lock_mtime` pins that a failed acquisition doesn't pathologically loop. Residual cross-process risk (a genuinely overrunning materialize past TTL races a fresh attempt) is documented in the helper docstring; in-process `threading.Lock` keyed on `table_id` blocks the single-process race. - **`agnes init --token X` now correctly uses the explicit token in the verify call**, even when `~/.config/agnes/token.json` already holds a stale token from a prior install. Pre-fix `cli.config.get_token()` read the on-disk file first and only fell back to env vars, so step 2 (PAT-verify) ran with the stale token and failed with a confusing 401 — even though the `--token` arg was valid (Devin Review on `init.py:99`). Fix: a `ContextVar`-based override in `cli.config` short-circuits `get_token()` before the file read; `_override_server_env` (used by both `agnes init` and `agnes pull`'s `run_pull`) sets it for the duration of the call. Async-safe (each task sees its own override) and leak-proof (resets on context exit). diff --git a/cli/commands/admin.py b/cli/commands/admin.py index 66ee0fb..7534524 100644 --- a/cli/commands/admin.py +++ b/cli/commands/admin.py @@ -92,12 +92,18 @@ def register_table( """Register a single table. Modes: - - **local** (Keboola): batch pull, parquet on disk. + - **local** (Keboola): batch pull, parquet on disk. Requires + `--bucket` + `--source-table`. - **remote** (BigQuery): view only, queries go to BQ. Requires `--bucket` + `--source-table`. - **materialized** (BigQuery): server-side scheduled SQL → parquet. - Requires `--query` (inline or `@file.sql`); `--bucket` / - `--source-table` ignored. + Requires `--query` (inline or `@file.sql`) AND `--bucket` (BQ + dataset of the destination identifier). `--source-table` defaults + to the registered `name` when omitted; explicit override is rare. + Note: `agnes schema ` builds the BQ identifier as + `bq..` even for materialized rows, so an + empty `--bucket` here registers the row but breaks subsequent + schema/describe calls. `--dry-run` goes through /precheck (BQ remote only — for materialized rows, dry-run is a no-op since the SQL itself is the contract). @@ -123,6 +129,22 @@ def register_table( ) raise typer.Exit(2) + # Bucket is load-bearing for the BQ destination identifier on + # materialized rows. Without it, registration succeeds but + # subsequent `agnes schema ` builds `bq.""..""` from + # the empty bucket and the server rejects with HTTP 400 "unsafe + # BQ identifier in registry". Catch this at register time so the + # operator gets a clear error pointing at the right knob. + if query_mode == "materialized" and not bucket: + typer.echo( + "Error: --query-mode materialized requires --bucket (the BQ " + "dataset for the destination identifier). Without it the row " + "registers but `agnes schema ` later fails with " + "'unsafe BQ identifier in registry'.", + err=True, + ) + raise typer.Exit(2) + payload = { "name": name, "source_type": source_type, @@ -176,6 +198,33 @@ def register_table( typer.echo(f"Registered (materializing in background): {name}") else: typer.echo(f"Registered: {name}") + + # Post-success hints. Two operator gotchas this catches: + # + # 1. `agnes pull` does not auto-materialize newly-registered + # rows — registration adds a registry row, but the parquet + # is built only when the scheduler tick runs (or first-sync + # is triggered manually). Without this hint operators see + # "Updated 0 tables" on `agnes pull` and assume something + # is broken. + # 2. `register-table` does NOT auto-grant. `agnes catalog` + # filters per-user via `resource_grants`, so operators + # other than the registering admin won't see the new row + # until a grant is created. + # + # Hint #1 only fires for `local` and `materialized` (the modes + # that actually produce a parquet); 202-async path covers a + # different signal, so don't double-message there. + if query_mode in ("local", "materialized") and resp.status_code != 202: + typer.echo( + " Next: run `agnes setup first-sync` to materialize " + "the parquet (or wait for the scheduler tick)." + ) + typer.echo( + f" Note: register-table does not auto-grant. Run " + f"`agnes admin grant create table {name}` to " + f"make this visible in `agnes catalog` for non-admin users." + ) elif resp.status_code == 409: typer.echo(f"Already exists: {name}") else: diff --git a/tests/test_cli_admin_materialized.py b/tests/test_cli_admin_materialized.py index f842551..bad4aad 100644 --- a/tests/test_cli_admin_materialized.py +++ b/tests/test_cli_admin_materialized.py @@ -29,6 +29,7 @@ def test_register_materialized_with_inline_query(monkeypatch): "admin", "register-table", "orders_90d", "--source-type", "bigquery", "--query-mode", "materialized", + "--bucket", "fin", "--query", "SELECT date FROM `prj.ds.orders`", "--sync-schedule", "every 6h", ]) @@ -59,6 +60,7 @@ def test_register_materialized_reads_query_from_file(tmp_path, monkeypatch): "admin", "register-table", "orders_90d", "--source-type", "bigquery", "--query-mode", "materialized", + "--bucket", "fin", "--query", f"@{sql_file}", "--sync-schedule", "daily 03:00", ]) @@ -155,3 +157,71 @@ def test_register_remote_path_unchanged(monkeypatch): assert "source_query" not in captured["json"] assert captured["json"]["bucket"] == "analytics" assert captured["json"]["source_table"] == "orders" + + +def test_register_materialized_without_bucket_fails_with_clear_error(monkeypatch): + """`--query-mode materialized` without `--bucket` is a client-side + error. Pre-fix the CLI sent `bucket=""` to the server; registration + succeeded but `agnes schema ` later 400ed with "unsafe BQ + identifier in registry" because the schema endpoint built + `bq.\"\".\"\"` from the empty bucket. Catching this at register + time gives operators a clear pointer at the right knob instead of + accept-then-fail-later UX.""" + called = {"count": 0} + + def fake_post(*args, **kwargs): + called["count"] += 1 + return _fake_resp(201) + + monkeypatch.setattr("cli.commands.admin.api_post", fake_post) + + runner = CliRunner() + result = runner.invoke(app, [ + "admin", "register-table", "category_summary", + "--source-type", "bigquery", + "--query-mode", "materialized", + "--query", "SELECT 1", + # No --bucket on purpose. + ]) + + assert result.exit_code != 0 + # API never called — fail fast on client side. + assert called["count"] == 0 + combined = result.stdout + (result.stderr or "") + assert "--bucket" in combined + # The error must explain WHY it's required, not just say "missing". + assert "schema" in combined.lower() or "identifier" in combined.lower() + + +def test_register_table_emits_first_sync_and_grant_hints(monkeypatch): + """After a successful register-table for a materialized row, the CLI + output must point operators at: + (a) `agnes setup first-sync` — registration adds a registry row but + does NOT trigger a parquet build, and `agnes pull` then reports + "Updated 0 tables (1 total)" until the next scheduler tick. + (b) `agnes admin grant create table ` — `agnes catalog` + is RBAC-filtered, so non-admin users won't see the new row + until a grant is created. + + Without these hints operators bounce between symptoms and assume + something's broken when it's just unstated post-register UX.""" + monkeypatch.setattr( + "cli.commands.admin.api_post", + lambda *a, **kw: _fake_resp(201), + ) + + runner = CliRunner() + result = runner.invoke(app, [ + "admin", "register-table", "category_summary", + "--source-type", "bigquery", + "--query-mode", "materialized", + "--bucket", "analytics", + "--query", "SELECT category, SUM(rev) FROM `prj.ds.tx` GROUP BY 1", + ]) + assert result.exit_code == 0, result.stdout + out = result.stdout + assert "agnes setup first-sync" in out + assert "agnes admin grant create" in out + # The grant hint should mention the registered name so operators can + # copy-paste the next command verbatim. + assert "category_summary" in out