fix(admin): register-table real-world UX gaps for materialized BQ
Three items from operator feedback after running the actual flow: (1) Help docstring lied: "--bucket / --source-table ignored" for materialized rows. Reality: --bucket is load-bearing because `agnes schema <name>` builds the BQ identifier as `bq.<bucket>.<source_table>`. An empty bucket registered the row but broke schema/describe with HTTP 400 "unsafe BQ identifier in registry". Fix: docstring rewritten to reflect reality, plus client-side validation rejects materialized + empty bucket with a clear error pointing at the right knob. (2) Post-register UX cliff: `agnes pull` after register-table reports "Updated 0 tables (1 total)" because registration adds a registry row but does NOT trigger a parquet build. Operators routinely assume something's broken when they need to run `agnes setup first-sync` to kick off the materialization. Hint emitted on success now points at first-sync. (3) RBAC gotcha: `agnes catalog` is RBAC-filtered via `resource_grants`, so non-admin users don't see freshly-registered rows until a grant is created. Hint emitted on success now points at `agnes admin grant create <group> table <name>`. Tests: 8/8 in test_cli_admin_materialized.py, including two new regression tests for the validation + the hint output.
This commit is contained in:
parent
5915f92eaa
commit
36012e0833
3 changed files with 126 additions and 3 deletions
|
|
@ -39,6 +39,10 @@ End-to-end clean-analyst-bootstrap rewrite. The web `/setup` page now produces a
|
|||
- `agnes snapshot create` (formerly `da fetch`) no longer materializes an empty `user/duckdb/analytics.duckdb` when run before any `agnes pull`. Friendly hint redirects to `agnes pull`.
|
||||
- Workspace `agnes status` reads from the canonical `server/parquet/` and `user/duckdb/analytics.duckdb` paths (was reading legacy `data/parquet/`, `data/metadata/last_sync.json`).
|
||||
- `agnes init` and `agnes pull` errors now use the `cli/error_render.py` typed-error renderer (added in 0.32.0), so analyst-facing error UX matches the structured shape `agnes query --remote` already produces.
|
||||
- **`agnes admin register-table` UX**: three real-world feedback items addressed.
|
||||
- **`--query-mode materialized` now requires `--bucket`** (client-side validation; exits with a clear error before hitting the server). The previous help docstring claimed `--bucket` was *ignored* for materialized rows, but the value is actually load-bearing — `agnes schema <name>` builds the BQ identifier as `bq.<bucket>.<source_table>`, so an empty bucket registered the row but broke subsequent schema/describe with HTTP 400 "unsafe BQ identifier in registry". Docstring rewritten to reflect reality.
|
||||
- **Post-success hints**: after a successful registration the CLI now points operators at the two follow-ups they routinely miss: (a) `agnes setup first-sync` to materialize the parquet (registration alone doesn't trigger a build; `agnes pull` reports "Updated 0 tables" until the scheduler tick), and (b) `agnes admin grant create <group> table <name>` to make the row visible in `agnes catalog` for non-admin users (catalog is RBAC-filtered).
|
||||
- Test coverage: `tests/test_cli_admin_materialized.py::test_register_materialized_without_bucket_fails_with_clear_error` and `test_register_table_emits_first_sync_and_grant_hints`.
|
||||
- **`agnes query --remote` SQL rewriter no longer corrupts output when the GCP project ID contains a registered table name as a hyphen-delimited word** (Devin Review on `query.py:464`). The previous iterative rewrite (one `re.sub(\b<name>\b, ...)` per registered name) was vulnerable to cross-contamination: e.g. project `my-ue-project` + registered `orders` + registered `ue` → iter 1 rewrites `orders` to `\`my-ue-project.fin.orders\``, iter 2's `\bue\b` then matches the `ue` INSIDE `my-ue-project` and corrupts the iter-1 path. Fix: replaced the iteration with a SINGLE `re.sub` whose alternation regex (sorted longest-first) handles every name in one pass, so freshly-inserted backticked text isn't re-scanned. The fallback at `query.py:576` (per-table SELECT * on BQ parse error) caught the corrupted output as `bq_bad_request` so impact was over-estimation rather than fail-open, but the partition-pruning benefit of #171 is now preserved for projects whose IDs share a hyphen-segment with a registered table name. Regression test in `tests/test_api_query_guardrail.py::test_rewrite_helper_does_not_corrupt_when_project_id_contains_registered_name`.
|
||||
- **BigQuery materialize TTL reclaim is no longer dead code** (Devin Review on `extractor.py:166`). `_try_acquire_file_lock` used to call `open(lock_path, mode="w")` BEFORE checking the lock-file mtime, which truncated the file and refreshed mtime to *now* on every invocation. The subsequent `time.time() - lock_path.stat().st_mtime` always saw age ~0, so `age > TTL` never fired, and `materialize.lock_ttl_seconds` was a silently no-op config knob. Fix: stat the lock path BEFORE any `open()` to read the real pre-probe mtime; if older than TTL, unlink (forcing a fresh inode for the next `open + flock`); only then probe. Two regression tests added: `test_stale_held_lock_is_reclaimed_despite_live_holder` exercises the full reclaim path with a still-living fcntl holder, `test_failed_probe_does_not_self_refresh_lock_mtime` pins that a failed acquisition doesn't pathologically loop. Residual cross-process risk (a genuinely overrunning materialize past TTL races a fresh attempt) is documented in the helper docstring; in-process `threading.Lock` keyed on `table_id` blocks the single-process race.
|
||||
- **`agnes init --token X` now correctly uses the explicit token in the verify call**, even when `~/.config/agnes/token.json` already holds a stale token from a prior install. Pre-fix `cli.config.get_token()` read the on-disk file first and only fell back to env vars, so step 2 (PAT-verify) ran with the stale token and failed with a confusing 401 — even though the `--token` arg was valid (Devin Review on `init.py:99`). Fix: a `ContextVar`-based override in `cli.config` short-circuits `get_token()` before the file read; `_override_server_env` (used by both `agnes init` and `agnes pull`'s `run_pull`) sets it for the duration of the call. Async-safe (each task sees its own override) and leak-proof (resets on context exit).
|
||||
|
|
|
|||
|
|
@ -92,12 +92,18 @@ def register_table(
|
|||
"""Register a single table.
|
||||
|
||||
Modes:
|
||||
- **local** (Keboola): batch pull, parquet on disk.
|
||||
- **local** (Keboola): batch pull, parquet on disk. Requires
|
||||
`--bucket` + `--source-table`.
|
||||
- **remote** (BigQuery): view only, queries go to BQ. Requires
|
||||
`--bucket` + `--source-table`.
|
||||
- **materialized** (BigQuery): server-side scheduled SQL → parquet.
|
||||
Requires `--query` (inline or `@file.sql`); `--bucket` /
|
||||
`--source-table` ignored.
|
||||
Requires `--query` (inline or `@file.sql`) AND `--bucket` (BQ
|
||||
dataset of the destination identifier). `--source-table` defaults
|
||||
to the registered `name` when omitted; explicit override is rare.
|
||||
Note: `agnes schema <name>` builds the BQ identifier as
|
||||
`bq.<bucket>.<source_table>` even for materialized rows, so an
|
||||
empty `--bucket` here registers the row but breaks subsequent
|
||||
schema/describe calls.
|
||||
|
||||
`--dry-run` goes through /precheck (BQ remote only — for materialized
|
||||
rows, dry-run is a no-op since the SQL itself is the contract).
|
||||
|
|
@ -123,6 +129,22 @@ def register_table(
|
|||
)
|
||||
raise typer.Exit(2)
|
||||
|
||||
# Bucket is load-bearing for the BQ destination identifier on
|
||||
# materialized rows. Without it, registration succeeds but
|
||||
# subsequent `agnes schema <name>` builds `bq."".."<src>"` from
|
||||
# the empty bucket and the server rejects with HTTP 400 "unsafe
|
||||
# BQ identifier in registry". Catch this at register time so the
|
||||
# operator gets a clear error pointing at the right knob.
|
||||
if query_mode == "materialized" and not bucket:
|
||||
typer.echo(
|
||||
"Error: --query-mode materialized requires --bucket (the BQ "
|
||||
"dataset for the destination identifier). Without it the row "
|
||||
"registers but `agnes schema <name>` later fails with "
|
||||
"'unsafe BQ identifier in registry'.",
|
||||
err=True,
|
||||
)
|
||||
raise typer.Exit(2)
|
||||
|
||||
payload = {
|
||||
"name": name,
|
||||
"source_type": source_type,
|
||||
|
|
@ -176,6 +198,33 @@ def register_table(
|
|||
typer.echo(f"Registered (materializing in background): {name}")
|
||||
else:
|
||||
typer.echo(f"Registered: {name}")
|
||||
|
||||
# Post-success hints. Two operator gotchas this catches:
|
||||
#
|
||||
# 1. `agnes pull` does not auto-materialize newly-registered
|
||||
# rows — registration adds a registry row, but the parquet
|
||||
# is built only when the scheduler tick runs (or first-sync
|
||||
# is triggered manually). Without this hint operators see
|
||||
# "Updated 0 tables" on `agnes pull` and assume something
|
||||
# is broken.
|
||||
# 2. `register-table` does NOT auto-grant. `agnes catalog`
|
||||
# filters per-user via `resource_grants`, so operators
|
||||
# other than the registering admin won't see the new row
|
||||
# until a grant is created.
|
||||
#
|
||||
# Hint #1 only fires for `local` and `materialized` (the modes
|
||||
# that actually produce a parquet); 202-async path covers a
|
||||
# different signal, so don't double-message there.
|
||||
if query_mode in ("local", "materialized") and resp.status_code != 202:
|
||||
typer.echo(
|
||||
" Next: run `agnes setup first-sync` to materialize "
|
||||
"the parquet (or wait for the scheduler tick)."
|
||||
)
|
||||
typer.echo(
|
||||
f" Note: register-table does not auto-grant. Run "
|
||||
f"`agnes admin grant create <group> table {name}` to "
|
||||
f"make this visible in `agnes catalog` for non-admin users."
|
||||
)
|
||||
elif resp.status_code == 409:
|
||||
typer.echo(f"Already exists: {name}")
|
||||
else:
|
||||
|
|
|
|||
|
|
@ -29,6 +29,7 @@ def test_register_materialized_with_inline_query(monkeypatch):
|
|||
"admin", "register-table", "orders_90d",
|
||||
"--source-type", "bigquery",
|
||||
"--query-mode", "materialized",
|
||||
"--bucket", "fin",
|
||||
"--query", "SELECT date FROM `prj.ds.orders`",
|
||||
"--sync-schedule", "every 6h",
|
||||
])
|
||||
|
|
@ -59,6 +60,7 @@ def test_register_materialized_reads_query_from_file(tmp_path, monkeypatch):
|
|||
"admin", "register-table", "orders_90d",
|
||||
"--source-type", "bigquery",
|
||||
"--query-mode", "materialized",
|
||||
"--bucket", "fin",
|
||||
"--query", f"@{sql_file}",
|
||||
"--sync-schedule", "daily 03:00",
|
||||
])
|
||||
|
|
@ -155,3 +157,71 @@ def test_register_remote_path_unchanged(monkeypatch):
|
|||
assert "source_query" not in captured["json"]
|
||||
assert captured["json"]["bucket"] == "analytics"
|
||||
assert captured["json"]["source_table"] == "orders"
|
||||
|
||||
|
||||
def test_register_materialized_without_bucket_fails_with_clear_error(monkeypatch):
|
||||
"""`--query-mode materialized` without `--bucket` is a client-side
|
||||
error. Pre-fix the CLI sent `bucket=""` to the server; registration
|
||||
succeeded but `agnes schema <name>` later 400ed with "unsafe BQ
|
||||
identifier in registry" because the schema endpoint built
|
||||
`bq.\"\".\"<src>\"` from the empty bucket. Catching this at register
|
||||
time gives operators a clear pointer at the right knob instead of
|
||||
accept-then-fail-later UX."""
|
||||
called = {"count": 0}
|
||||
|
||||
def fake_post(*args, **kwargs):
|
||||
called["count"] += 1
|
||||
return _fake_resp(201)
|
||||
|
||||
monkeypatch.setattr("cli.commands.admin.api_post", fake_post)
|
||||
|
||||
runner = CliRunner()
|
||||
result = runner.invoke(app, [
|
||||
"admin", "register-table", "category_summary",
|
||||
"--source-type", "bigquery",
|
||||
"--query-mode", "materialized",
|
||||
"--query", "SELECT 1",
|
||||
# No --bucket on purpose.
|
||||
])
|
||||
|
||||
assert result.exit_code != 0
|
||||
# API never called — fail fast on client side.
|
||||
assert called["count"] == 0
|
||||
combined = result.stdout + (result.stderr or "")
|
||||
assert "--bucket" in combined
|
||||
# The error must explain WHY it's required, not just say "missing".
|
||||
assert "schema" in combined.lower() or "identifier" in combined.lower()
|
||||
|
||||
|
||||
def test_register_table_emits_first_sync_and_grant_hints(monkeypatch):
|
||||
"""After a successful register-table for a materialized row, the CLI
|
||||
output must point operators at:
|
||||
(a) `agnes setup first-sync` — registration adds a registry row but
|
||||
does NOT trigger a parquet build, and `agnes pull` then reports
|
||||
"Updated 0 tables (1 total)" until the next scheduler tick.
|
||||
(b) `agnes admin grant create <group> table <id>` — `agnes catalog`
|
||||
is RBAC-filtered, so non-admin users won't see the new row
|
||||
until a grant is created.
|
||||
|
||||
Without these hints operators bounce between symptoms and assume
|
||||
something's broken when it's just unstated post-register UX."""
|
||||
monkeypatch.setattr(
|
||||
"cli.commands.admin.api_post",
|
||||
lambda *a, **kw: _fake_resp(201),
|
||||
)
|
||||
|
||||
runner = CliRunner()
|
||||
result = runner.invoke(app, [
|
||||
"admin", "register-table", "category_summary",
|
||||
"--source-type", "bigquery",
|
||||
"--query-mode", "materialized",
|
||||
"--bucket", "analytics",
|
||||
"--query", "SELECT category, SUM(rev) FROM `prj.ds.tx` GROUP BY 1",
|
||||
])
|
||||
assert result.exit_code == 0, result.stdout
|
||||
out = result.stdout
|
||||
assert "agnes setup first-sync" in out
|
||||
assert "agnes admin grant create" in out
|
||||
# The grant hint should mention the registered name so operators can
|
||||
# copy-paste the next command verbatim.
|
||||
assert "category_summary" in out
|
||||
|
|
|
|||
Loading…
Reference in a new issue