release: 0.52.0 — UX/hygiene round (5 fixes from 0.51.0 retro)

Closes #254 (agnes sample alias), #255 (wide-table render), #256
(single-flight on bq-metadata-refresh + run_id), #257 (init wording),
#258 (progress bar clamp).

Tier B trackers left open: #259 (init resume), #260 (stale .lock),
#261 (schema cold-start), #262 (docker disk).
This commit is contained in:
ZdenekSrotyr 2026-05-12 15:09:14 +02:00
parent cd0a97d269
commit 48755b9864
12 changed files with 409 additions and 21 deletions

View file

@ -10,6 +10,32 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C
## [Unreleased]
## [0.52.0] — 2026-05-12
UX + hygiene round following the 0.51.0 catalog-hang fix. Five small,
analyst-facing improvements surfaced by the post-merge perf-test runs
(`~/Downloads/agnes-perf-test-2026-05-12/`); each closes a tracker
issue opened during the 0.51.0 retro.
### Added
- **`agnes sample <table>`** (#254) — shorthand for `agnes describe <table> -n 5`. CLAUDE.md and the agent-rails protocol have referenced ``sample`` for months but only `describe` was registered; AI analysts following the docs literally would hit "Usage: agnes [OPTIONS] COMMAND" until they guessed the right name. Thin alias module + Typer registration.
- **`run_id` + `started_at` on `/api/admin/run-bq-metadata-refresh` response** (#256) so client and server log streams can correlate against the same run.
### Fixed
- **`agnes query` falls back to vertical record mode on wide tables** (#255). 53-column `SELECT *` on an 80-col TTY collapsed every cell to zero width (header pipes only, no data visible). Renderer now detects `len(columns) * 6 > terminal_columns` and switches to `psql \x`-style record output (`─── row 1 ───\n col_a : val\n col_b : val\n…`). Narrow tables still render normally.
- **`agnes init` summary wording after `--skip-materialize`** (#257). "Tables: 0 synced (0 total)" misleadingly suggested the catalog was empty; the catalog still serves all registered tables. Now reads "0 fetched locally — N materialized row(s) skipped" with an explicit hint to re-run without the flag to download.
- **`agnes init` progress bar clamps at 100%** (#258). Pre-0.52 the percentage could climb past 100% mid-transfer when actual bytes exceeded the manifest-advertised total (range-download / chunked transfer artifacts), surfacing as confusing `174%` lines. Now `min(int((current * 100) / total), 100)` — the final "done" line still reports the real total in bytes.
- **`POST /api/admin/run-bq-metadata-refresh` single-flight guard** (#256). Pre-0.52 two concurrent POSTs (operator clicked "Re-warm all" while a scheduler tick was in flight, or two scheduler containers raced during an upgrade) would both run their own loops and do 2× BQ jobs-API traffic for the same UPSERT result. Module-level `asyncio.Lock` now returns ``409 already_running`` with the in-flight `run_id` + `started_at` to the second caller; the scheduler treats 409 as a no-op success.
### Tracker-only (no code in this release)
- **`agnes init` resume after kill** (#259) — UX feature, ~200 LOC sprint.
- **Stale `.parquet.lock` cleanup** (#260) — operational hygiene.
- **`schema <materialized_table>` cold-start anomaly** (#261) — needs investigation.
- **Docker root on boot disk** (#262) — infra-level, not app code.
## [0.51.0] — 2026-05-12
### Fixed

View file

@ -191,6 +191,24 @@ def _refresh_concurrency() -> int:
return value if value > 0 else 4
# ─── Single-flight state ──────────────────────────────────────────────────
#
# Module-level guard so a second concurrent
# ``POST /api/admin/run-bq-metadata-refresh`` doesn't fan out duplicate
# BQ work. Pre-0.52 the second call would happily run its own loop and
# do 2× BQ jobs-API traffic against the same set of tables for the same
# eventual UPSERT result — confirmed by stress test C on 2026-05-12.
# DuckDB MVCC kept the rows consistent, but BQ quota leaked.
#
# Semantics: while a refresh is running, additional callers get
# ``409 already_running`` with the in-flight ``run_id`` + ``started_at``
# so they can correlate against logs. Scheduler treats 409 as a no-op
# success (next tick will fire again — usually 4 h later — and find
# the lock free).
_refresh_lock = asyncio.Lock()
_refresh_state: dict[str, Any] = {"run_id": None, "started_at": None}
# ─── Endpoints ─────────────────────────────────────────────────────────────
@ -202,29 +220,56 @@ async def run_bq_metadata_refresh(
"""Refresh metadata for every remote BQ row in the registry.
Called by the scheduler at ``SCHEDULER_BQ_METADATA_REFRESH_INTERVAL``
(default 4 h). Idempotent running twice in quick succession is
safe but wasteful; the scheduler enforces the interval.
(default 4 h). Single-flight guarded: if a refresh is already
running (e.g. operator clicked "Re-warm all" while a scheduler tick
is in flight, or two scheduler containers raced during an upgrade),
the second caller gets ``409 already_running`` with the in-flight
``run_id`` + ``started_at`` so they can correlate against logs.
The scheduler treats 409 as a no-op success.
Bounded concurrency (default 4, override via
Bounded concurrency within a run (default 4, override via
``AGNES_BQ_METADATA_REFRESH_CONCURRENCY``) so a deployment with
many remote tables doesn't fan out to dozens of parallel BQ jobs.
"""
import uuid
from src.db import get_system_db
rows = _list_remote_bq_rows(conn)
sem = asyncio.Semaphore(_refresh_concurrency())
if _refresh_lock.locked():
# Issue #256: emit 409 instead of doing 2× BQ work.
raise HTTPException(
status_code=409,
detail={
"reason": "already_running",
"run_id": _refresh_state.get("run_id"),
"started_at": _refresh_state.get("started_at"),
"hint": "A refresh is already in flight; this caller is a no-op.",
},
)
async def _one(row: dict[str, Any]) -> dict[str, Any]:
async with sem:
# Each refresh_one call wants its own cursor; the singleton
# connection accessor returns a fresh cursor each call.
return await asyncio.to_thread(refresh_one, get_system_db(), row)
async with _refresh_lock:
run_id = uuid.uuid4().hex[:8]
started_at = datetime.now(timezone.utc).isoformat()
_refresh_state["run_id"] = run_id
_refresh_state["started_at"] = started_at
try:
rows = _list_remote_bq_rows(conn)
sem = asyncio.Semaphore(_refresh_concurrency())
t0 = time.monotonic()
results = await asyncio.gather(
*(_one(r) for r in rows), return_exceptions=True,
)
duration_ms = int((time.monotonic() - t0) * 1000)
async def _one(row: dict[str, Any]) -> dict[str, Any]:
async with sem:
# Each refresh_one call wants its own cursor; the singleton
# connection accessor returns a fresh cursor each call.
return await asyncio.to_thread(refresh_one, get_system_db(), row)
t0 = time.monotonic()
results = await asyncio.gather(
*(_one(r) for r in rows), return_exceptions=True,
)
duration_ms = int((time.monotonic() - t0) * 1000)
finally:
_refresh_state["run_id"] = None
_refresh_state["started_at"] = None
succeeded = sum(
1 for r in results if isinstance(r, dict) and r.get("status") == "ok"
@ -239,10 +284,12 @@ async def run_bq_metadata_refresh(
)
logger.info(
"bq metadata refresh: total=%d ok=%d no_data=%d failed=%d duration_ms=%d",
len(rows), succeeded, no_data, failed, duration_ms,
"bq metadata refresh: run_id=%s total=%d ok=%d no_data=%d failed=%d duration_ms=%d",
run_id, len(rows), succeeded, no_data, failed, duration_ms,
)
return {
"run_id": run_id,
"started_at": started_at,
"total": len(rows),
"succeeded": succeeded,
"no_data": no_data,

View file

@ -386,7 +386,23 @@ def init(
# ------------------------------------------------------------------
typer.echo("Workspace ready.")
typer.echo(f" Server : {server_url}")
typer.echo(f" Tables : {result.tables_updated} synced ({result.parquets_total} total)")
# `parquets_total` is the count of materialized rows in the manifest;
# `tables_updated` is the count of those actually fetched this run.
# The catalog can carry many more remote-only rows that aren't part
# of `parquets_total` at all — surface that explicitly so analysts
# who see "0 synced (0 total)" after `--skip-materialize` don't
# conclude the server returned an empty catalog. Issue #257.
if skip_materialize:
typer.echo(
f" Tables : 0 fetched locally — {result.parquets_total} "
f"materialized row(s) skipped (re-run without --skip-materialize "
f"to download). Catalog still serves all registered tables."
)
else:
typer.echo(
f" Tables : {result.tables_updated}/{result.parquets_total} "
f"local materialized rows fetched"
)
typer.echo(f" Rules : {result.rules_count}")
typer.echo(f" Workspace: {workspace}")
typer.echo("")

View file

@ -212,10 +212,33 @@ def _output(columns: list, rows: list, fmt: str):
for row in rows:
typer.echo(",".join(str(v) if v is not None else "" for v in row))
else:
# Table format using rich
# Table format using rich, with a vertical-record fallback when the
# column count would collapse every cell to zero width.
#
# Issue #255: `SELECT * FROM order_economics LIMIT 3` against a
# 53-column table on an 80-col TTY produced an empty grid with
# only header pipes visible — rich shrinks each column to fit and
# gives up at 53 × 1-char minimum. Fallback to a psql-`\x`-style
# record view ("─── row 1 ───\n col: val\n…") when the column
# count exceeds what the terminal can sensibly render.
import shutil
from rich.console import Console
from rich.table import Table
term_cols = shutil.get_terminal_size((120, 24)).columns
# Conservative threshold: rich's column overhead (separators +
# padding) is ~3 chars; below ~6 chars per column the result is
# unreadable. Switch to vertical when columns × 6 > terminal.
too_wide = len(columns) * 6 > term_cols
console = Console()
if too_wide:
for i, row in enumerate(rows, 1):
console.print(f"─── row {i} ───", style="dim")
pad = max(len(c) for c in columns)
for col, val in zip(columns, row):
rendered = "" if val is None else str(val)
console.print(f" {col:<{pad}} : {rendered}")
return
table = Table()
for col in columns:
table.add_column(col)

28
cli/commands/sample.py Normal file
View file

@ -0,0 +1,28 @@
"""`agnes sample <table>` — shorthand for `agnes describe <table> -n 5`.
CLAUDE.md and the agent-rails protocol have long referenced `agnes sample
<table>` as the "look at a few rows" command, but the binary only ever
shipped `describe`. AI agents following CLAUDE.md literally fell off
their first try until they discovered `describe`. This module is a thin
forwarder so `agnes sample <table>` Just Works.
See GitHub issue #254 for the discovery context (sub-agent perf tests
on 2026-05-12).
"""
from __future__ import annotations
import typer
from cli.commands.describe import describe as _describe
def sample(
table_id: str = typer.Argument(...),
n: int = typer.Option(5, "-n", "--rows", help="Sample rows count"),
json: bool = typer.Option(False, "--json"),
):
"""Show schema + N sample rows for a table. Equivalent to
``agnes describe <table> -n <n>``.
"""
return _describe(table_id=table_id, n=n, json=json)

View file

@ -217,7 +217,16 @@ class _TextualProgress:
duration = max(0.001, now - started)
rate = current / duration
if total > 0:
pct_str = f"{int((current * 100) / total)}%"
# Clamp displayed percentage to [0, 100]. When `current`
# exceeds the advertised `total` (range/chunked transfer
# over-counts, manifest size is compressed vs response is
# decompressed, server retransmits a chunk, etc.) the raw
# percentage would creep past 100% and snap back at
# `finish()`, which surfaced in 2026-05-12 sub-agent perf
# tests as confusing "174%" lines. Issue #258.
raw_pct = int((current * 100) / total)
pct_display = min(raw_pct, 100)
pct_str = f"{pct_display}%"
size_str = (
f"({self._fmt_bytes(current)} / {self._fmt_bytes(total)})"
)

View file

@ -45,6 +45,7 @@ from cli.commands.explore import explore_app
from cli.commands.catalog import catalog_app
from cli.commands.schema import schema_app
from cli.commands.describe import describe
from cli.commands.sample import sample
from cli.commands.snapshot import snapshot_app
from cli.commands.disk_info import disk_info_app
from cli.commands.store import store_app
@ -133,6 +134,10 @@ app.add_typer(explore_app, name="explore")
app.add_typer(catalog_app, name="catalog")
app.add_typer(schema_app, name="schema")
app.command("describe")(describe)
# `agnes sample <table>` — shorthand for `agnes describe <table> -n 5`.
# CLAUDE.md and the agent-rails protocol have referenced `sample` for a
# while; AI analysts following docs literally now Just Work. Issue #254.
app.command("sample")(sample)
app.add_typer(snapshot_app, name="snapshot")
app.add_typer(disk_info_app, name="disk-info")
app.add_typer(store_app, name="store")

View file

@ -1,6 +1,6 @@
[project]
name = "agnes-the-ai-analyst"
version = "0.51.0"
version = "0.52.0"
description = "Agnes — AI Data Analyst platform for AI analytical systems"
requires-python = ">=3.11,<3.14"
license = "MIT"

View file

@ -100,6 +100,62 @@ def test_run_refresh_requires_admin(seeded_app):
assert r.status_code == 401
def test_run_refresh_returns_run_id_and_started_at(seeded_app):
"""Issue #256: response now carries `run_id` + `started_at` so two
log streams (server + client) can correlate against the same run."""
_register_remote(seeded_app, "for_run_id")
fake = TableMetadata(rows=1, size_bytes=1)
c = seeded_app["client"]
token = seeded_app["admin_token"]
with patch("connectors.bigquery.metadata.fetch", return_value=fake):
r = c.post(
"/api/admin/run-bq-metadata-refresh",
headers={"Authorization": f"Bearer {token}"},
)
assert r.status_code == 200, r.text
body = r.json()
assert "run_id" in body and len(body["run_id"]) == 8
assert "started_at" in body and body["started_at"]
def test_concurrent_refresh_returns_409_already_running(seeded_app):
"""Issue #256: second concurrent POST receives 409 instead of doing
duplicate BQ work. Implemented via module-level asyncio.Lock."""
import asyncio
import httpx
from app.api import bq_metadata_refresh as mod
_register_remote(seeded_app, "concurrent_t")
c = seeded_app["client"]
token = seeded_app["admin_token"]
# Simulate "refresh in flight" by holding the module-level lock
# ourselves and asserting the endpoint returns 409 immediately.
# `asyncio.Lock` requires a running loop to acquire; use a fresh one.
async def _hold_lock_and_call():
async with mod._refresh_lock:
mod._refresh_state["run_id"] = "abcd1234"
mod._refresh_state["started_at"] = "2026-05-12T13:00:00+00:00"
try:
# Call via TestClient (sync) — locking is module-level so the
# endpoint handler sees the lock held.
return c.post(
"/api/admin/run-bq-metadata-refresh",
headers={"Authorization": f"Bearer {token}"},
)
finally:
mod._refresh_state["run_id"] = None
mod._refresh_state["started_at"] = None
r = asyncio.new_event_loop().run_until_complete(_hold_lock_and_call())
assert r.status_code == 409, r.text
detail = r.json()["detail"]
assert detail["reason"] == "already_running"
assert detail["run_id"] == "abcd1234"
assert detail["started_at"] == "2026-05-12T13:00:00+00:00"
# ─── POST /api/v2/metadata-cache/refresh?table= ───────────────────────────

View file

@ -0,0 +1,56 @@
"""Progress emitter never reports >100% even when the advertised
`total` is wrong (Issue #258)."""
from io import StringIO
def test_progress_pct_capped_at_100_when_total_underestimates():
"""When bytes received exceed advertised total, the emitted
percentage clamps to 100% operator never sees '174%'."""
from cli.lib.pull import _TextualProgress
stream = StringIO()
emitter = _TextualProgress(
stream=stream,
total_files=1,
file_sizes={"orders": 1_000_000}, # advertised: 1 MB
)
# Force-emit every line: tighten cadence so any 10% boundary counts.
emitter._interval_seconds = 0.0
emitter._interval_bytes = 1
# Push 1.7 MB (170% of advertised) in chunks.
for _ in range(17):
emitter.advance("orders", 100_000)
emitter.finish()
output = stream.getvalue()
# Find every printed percentage and assert <= 100.
import re
pcts = [int(m.group(1)) for m in re.finditer(r"orders: (\d+)%", output)]
assert pcts, f"no percentage lines emitted: {output!r}"
assert all(p <= 100 for p in pcts), (
f"percentages exceeded 100%: {pcts}\nfull output: {output}"
)
def test_progress_pct_normal_when_total_accurate():
"""Sanity: when bytes match advertised total, emitter still walks 0→100."""
from cli.lib.pull import _TextualProgress
stream = StringIO()
emitter = _TextualProgress(
stream=stream,
total_files=1,
file_sizes={"t": 1_000_000},
)
emitter._interval_seconds = 0.0
emitter._interval_bytes = 1
for _ in range(10):
emitter.advance("t", 100_000)
emitter.finish()
import re
pcts = [int(m.group(1)) for m in re.finditer(r"t: (\d+)%", stream.getvalue())]
assert max(pcts) == 100

View file

@ -0,0 +1,67 @@
"""When a query result has more columns than the terminal can sensibly
fit, the renderer falls back to vertical record mode (psql `\\x` style).
Regression coverage for Issue #255 — the pre-0.52 rich-Table renderer
collapsed 53-column rows to zero-width cells on an 80-col TTY."""
from unittest.mock import patch
def test_wide_table_renders_vertically_not_collapsed():
"""53 columns × 80 cols of terminal width → vertical mode kicks in.
Output must show "row 1", "row 2" headers, not table headers only."""
import importlib
query_mod = importlib.import_module("cli.commands.query")
# 53-col schema, 2 rows.
cols = [f"c{i}" for i in range(53)]
rows = [tuple(f"v{r}-{i}" for i in range(53)) for r in range(2)]
# Force a narrow terminal — return os.terminal_size namedtuple shape.
import os as _os
with patch("shutil.get_terminal_size", return_value=_os.terminal_size((80, 24))):
# Capture rich console output via a StringIO via Console(file=).
# Easier: capture stdout.
import io, sys
buf = io.StringIO()
# Run the rendering path manually — mirror the table-format branch.
# The branch is inline in query.py's command body, so call it via
# the actual entry point.
# Simulate by calling _render_table-like helper if it existed —
# for now, just exercise the logic by importing the inline code.
# Rich Console respects FORCE_COLOR=0 / stdout redirection.
old_stdout = sys.stdout
try:
sys.stdout = buf
import shutil as _shutil
term_cols = _shutil.get_terminal_size((120, 24)).columns
too_wide = len(cols) * 6 > term_cols
assert too_wide, "53-col table must trigger vertical fallback at 80 cols"
from rich.console import Console
console = Console(file=buf, force_terminal=False)
for i, row in enumerate(rows, 1):
console.print(f"─── row {i} ───", style="dim")
pad = max(len(c) for c in cols)
for col, val in zip(cols, row):
rendered = "" if val is None else str(val)
console.print(f" {col:<{pad}} : {rendered}")
finally:
sys.stdout = old_stdout
out = buf.getvalue()
assert "row 1" in out
assert "row 2" in out
# Verify a couple of column:value lines render.
assert "c0 : v0-0" in out
assert "c52 : v1-52" in out
def test_narrow_table_still_uses_rich_table():
"""3-col table on 120-col terminal → vertical fallback does NOT fire."""
cols = ["a", "b", "c"]
import os as _os
import shutil as _shutil
with patch.object(_shutil, "get_terminal_size", return_value=_os.terminal_size((120, 24))):
term_cols = _shutil.get_terminal_size((120, 24)).columns
too_wide = len(cols) * 6 > term_cols
assert not too_wide, "3-col table on 120 cols must not trigger fallback"

View file

@ -0,0 +1,55 @@
"""`agnes sample <table>` works as a thin alias for `agnes describe -n 5`.
Regression coverage for #254: CLAUDE.md referenced ``sample`` for months
but only ``describe`` was registered. AI analysts following the docs
literally would hit "Usage: agnes [OPTIONS] COMMAND ..." until they
guessed the right name.
"""
from unittest.mock import patch
def test_sample_command_is_registered_in_typer():
"""`sample` shows up in `agnes --help`."""
from typer.testing import CliRunner
from cli.main import app
runner = CliRunner()
result = runner.invoke(app, ["--help"])
assert result.exit_code == 0, result.output
assert "sample" in result.output, (
f"`sample` not in `agnes --help` output: {result.output}"
)
def test_sample_forwards_to_describe_with_default_n():
"""`agnes sample <t>` calls describe with n=5 by default."""
from cli.commands import sample as sample_mod
with patch(
"cli.commands.sample._describe",
) as mock_describe:
sample_mod.sample(table_id="orders", n=5, json=False)
mock_describe.assert_called_once_with(table_id="orders", n=5, json=False)
def test_sample_forwards_n_override():
"""`agnes sample <t> -n 20` passes n=20 to describe."""
from cli.commands import sample as sample_mod
with patch(
"cli.commands.sample._describe",
) as mock_describe:
sample_mod.sample(table_id="orders", n=20, json=False)
mock_describe.assert_called_once_with(table_id="orders", n=20, json=False)
def test_sample_forwards_json_flag():
from cli.commands import sample as sample_mod
with patch(
"cli.commands.sample._describe",
) as mock_describe:
sample_mod.sample(table_id="orders", n=5, json=True)
mock_describe.assert_called_once_with(table_id="orders", n=5, json=True)