release: 0.45.0 — easy-wins bundle (#84 #164 #177 #178 #203 #204)

Operator-and-analyst quality bundle: a security fix for the optional Telegram bot, two CLI gaps closed, and three rounds of UX polish on `agnes diagnose` and `agnes pull` so non-TTY consumers (CI runners, Claude Code SessionStart hooks, sub-agent watchdogs) get readable, actionable signal.

- Pairing-code RNG: random.choices -> secrets.choice (CSPRNG).
- Telegram script runner: refuse out-of-shape usernames before sudo -u.
- agnes init --force: snapshot the existing CLAUDE.md to CLAUDE.md.bak.<ISO-timestamp> before regenerating.
- agnes admin unregister-table <id> -> DELETE /api/admin/registry/{id}
- agnes admin update-table <id> --field=value ... -> PUT /api/admin/registry/{id}
- /api/health/detailed: new `info` severity that surfaces in the response but never promotes the headline. BQ billing-equals-data check downgraded warning -> info.
- agnes pull: non-TTY progress emits more often by default (5 s / 1 MiB vs 30 s / 10%) so sub-agent watchdogs don't kill the pull as a hung process. New env knobs: AGNES_PULL_PROGRESS_INTERVAL_{SECONDS,BYTES}.
- agnes diagnose: DB schema-version check is omitted by default; pass --include-schema (or ?include=schema) to opt back in.

Tests: 120 passed across the touched modules, including new tests for each fix. Pre-existing failures on main (DB migration v1->v9, binary rename) are unrelated and not introduced here.
2026-05-07 11:33:11 +02:00 · c97fd504c5
commit c97fd504c5
parent f6c2012d5b
17 changed files with 816 additions and 25 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,9 +10,116 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C

 ## [Unreleased]

+## [0.45.0] — 2026-05-07
+
+Operator-and-analyst quality bundle: a security fix for the optional
+Telegram bot, two CLI gaps closed, and three rounds of UX polish on
+`agnes diagnose` and `agnes pull` so non-TTY consumers (CI runners,
+Claude Code SessionStart hooks, sub-agent watchdogs) get readable,
+actionable signal. Closes #84, #164, #177, #178, #203, #204.
+
+### Security
+
+- **Telegram bot pairing-code RNG hardened (#84).** The pairing code
+  used to link a Telegram chat to an Agnes user is now generated via
+  `secrets.choice` (CSPRNG) rather than `random.choices`. Pre-fix an
+  attacker who scraped one issued code could recover the `random`
+  module's PRNG state and predict subsequent codes issued in the same
+  process — the fix neutralizes that class of attack
+  (`services/telegram_bot/storage.py:_generate_code`).
+- **Telegram script runner refuses out-of-shape usernames (#84).** The
+  optional notification runner shells out via `sudo -u <username>`. A
+  username controlled by an attacker — e.g. via tampering with
+  `telegram_users.json` — could otherwise carry sudo flags
+  (`-u`, `--shell=…`) or shell metacharacters. The runner now validates
+  the value against a POSIX-conservative regex
+  (`^[a-z_][a-z0-9._-]{0,31}$`) and returns `None` before invoking
+  `subprocess.run` if it doesn't match
+  (`services/telegram_bot/runner.py:_USERNAME_RE`).
+
+### Added
+
+- `agnes admin unregister-table <id>` — CLI wrapper for
+  `DELETE /api/admin/registry/{id}` (#177). Confirms before destructive
+  action; pass `--yes` to skip the prompt in scripts. The server-side
+  endpoint already does the parquet/`sync_state` cleanup; the CLI is a
+  thin client.
+- `agnes admin update-table <id>` — CLI wrapper for
+  `PUT /api/admin/registry/{id}` (#177). Only the supplied flags go in
+  the body (`--name`, `--bucket`, `--source-table`, `--query-mode`,
+  `--query`, `--description`, `--sync-schedule`, `--source-type`); the
+  rest stay unchanged on the server. `--query` accepts `@path/to.sql`
+  for files. Calling with no flags errors (`No fields supplied`)
+  instead of silently no-opping.
+- `agnes diagnose --include-schema` (#204). The default `agnes
+  diagnose` no longer surfaces the DB schema-version check — analysts
+  hitting the CLI rarely care about the integer, and it dominated the
+  agent-facing output. Pass `--include-schema` (or query
+  `/api/health/detailed?include=schema` directly) when verifying a
+  migration.
+- **`info` severity tier in `/api/health/detailed`** (#178). Sits
+  between `ok` and `warning`: surfaces a non-trivial observation
+  worth reading without promoting the headline status to `degraded`.
+  See the module docstring at `app/api/health.py` for the full
+  severity ladder. The BQ billing-equals-data check is the first
+  consumer (was `warning` → now `info`).
+- `AGNES_PULL_PROGRESS_INTERVAL_SECONDS` and
+  `AGNES_PULL_PROGRESS_INTERVAL_BYTES` env knobs for the textual
+  progress emitter (#203). Defaults are tighter than pre-fix (5 s /
+  1 MiB vs the previous 30 s / 10%-of-total) so non-TTY consumers
+  see continuous output and don't trip dead-process watchdogs on
+  multi-GB parquets. Override either independently.
+
+### Changed
+
+- **`agnes pull` non-TTY progress is more chatty by default (#203).**
+  Previous cadence (30 s / 10%) produced one line every several
+  minutes on multi-GB parquets, long enough for Claude Code
+  sub-agent watchdogs to kill the pull as a hung process. New
+  defaults: emit when *any* of these holds: a 10% boundary is crossed,
+  5 s have elapsed, or 1 MiB has arrived since the last emit. The 10%
+  boundary is unchanged, so small files keep the original visual rhythm.
+- **`/api/health/detailed` no longer includes `db_schema` by default
+  (#204).** Pass `?include=schema` to opt back in. The aggregator
+  treats the schema check as "not asserted" when absent, so
+  unrelated services can still drive the headline. Operators whose
+  probes read the `db_schema` entry should add `?include=schema` to
+  their probe configuration.
+- **BQ billing-project equals data-project surfaces as `info`, not
+  `warning` (#178).** Many valid single-project dev instances run
+  with billing == data; the message is informational. The `detail`
+  + `hint` strings are unchanged so the operator still gets the
+  USER_PROJECT_DENIED context if they're hitting it. Pre-fix, the
+  message alone promoted the overall headline to `degraded` even on
+  intentionally collapsed setups.
+- `agnes init --force` now snapshots the prior `CLAUDE.md` to
+  `CLAUDE.md.bak.<ISO-timestamp>` before regenerating it (#164). Each
+  re-run produces a fresh backup; the prior backup is not clobbered.
+  A filesystem error on the backup path is logged but does not abort the
+  init (the existing-workspace gate still requires `--force`).
+
 ### Internal

- `infra/modules/customer-instance` (tag `infra-v1.8.0`): `startup-script.sh.tpl` no longer overwrites operator-edited `AGNES_TAG` / `AGNES_TEMP_DIR` in `/opt/agnes/.env` on every boot. Reads the existing values when present and lets them win over the template-computed `$IMAGE_TAG`. Pre-fix, an in-place TF action that stopped/started the VM (e.g. `machine_type` change) would re-run the startup script and clobber any manually-pinned image tag — operators had to re-edit the file post-restart. Fresh provisions still get the TF-driven values; the `.env` file's existence is the disambiguator. To force a TF-driven reset, `rm /opt/agnes/.env` and reboot.
+- New `cli.client.api_put` helper to mirror `api_get` /
+  `api_post` / `api_delete` / `api_patch` for the new
+  `update-table` command.
+- Tests added: `tests/test_telegram_bot_runner.py`,
+  `tests/test_health_schema_gate.py`, plus extensions to
+  `test_telegram_storage`, `test_pull_progress`, `test_diagnose_billing`,
+  `test_cli_admin`, `test_cli_init`.
+- `infra/modules/customer-instance` (tag `infra-v1.8.0`):
+  `startup-script.sh.tpl` no longer overwrites operator-edited
+  `AGNES_TAG` / `AGNES_TEMP_DIR` in `/opt/agnes/.env` on every boot.
+  Reads the existing values when present and lets them win over the
+  template-computed `$IMAGE_TAG`. Pre-fix, an in-place TF action that
+  stopped/started the VM (e.g. `machine_type` change) would re-run the
+  startup script and clobber any manually-pinned image tag — operators
+  had to re-edit the file post-restart. Fresh provisions still get the
+  TF-driven values; the `.env` file's existence is the disambiguator.
+  To force a TF-driven reset, `rm /opt/agnes/.env` and reboot. Folded
+  in from #214, which landed on main between 0.44.1 and this cut.
+
+## [0.44.1] — 2026-05-07

 ### Fixed

--- a/app/api/health.py
+++ b/app/api/health.py
@@ -1,10 +1,29 @@
-"""Health check endpoint — structured diagnostics for AI agents."""
+"""Health check endpoint — structured diagnostics for AI agents.
+
+## Severity vocabulary
+
+Per-check `status` values, in order of escalation:
+
+- `ok`     — nothing to surface.
+- `info`   — non-trivial observation worth showing the operator, but the
+             situation isn't broken. **Does not** promote the overall
+             status to `degraded` (issue #178).
+- `unknown`— check couldn't run (missing dependency, FS error). Surfaced
+             but doesn't promote overall.
+- `warning`— real issue, operator should look. Promotes overall to
+             `degraded`.
+- `error`  — critical. Promotes overall to `unhealthy`.
+
+Add an `info`-tier check by returning `{"status": "info", ...}` from the
+check function. The aggregator at the bottom of `health_check_detailed`
+treats `info` as non-promoting.
+"""

 import os
 from datetime import datetime, timezone
 from pathlib import Path

-from fastapi import APIRouter, Depends
+from fastapi import APIRouter, Depends, Query
 import duckdb

 from app.auth.dependencies import _get_db, get_current_user
@@ -66,15 +85,18 @@ def _check_bq_billing_project() -> dict | None:
        return {"status": "ok", "detail": "BigQuery project not configured"}

    if billing == data:
+        # Issue #178: this is informational, not a fault. Many valid
+        # single-project dev instances run with billing == data and the SA
+        # has `serviceusage.services.use`. Keep the message visible but
+        # don't promote the overall status to `degraded` for it.
        return {
-            "status": "warning",
+            "status": "info",
            "detail": "BigQuery billing project equals data project",
            "hint": (
-                "Set data_source.bigquery.billing_project in instance.yaml to a "
+                "If the SA hits USER_PROJECT_DENIED 403, set "
+                "data_source.bigquery.billing_project in instance.yaml to a "
                "project the SA can bill against (typically your dev/billable "
                "project, distinct from a shared read-only data project). "
-                "Otherwise BQ calls 403 USER_PROJECT_DENIED whenever the SA "
-                "lacks serviceusage.services.use on the data project. "
                "Configurable via /admin/server-config UI."
            ),
            "billing_project": billing,
@@ -230,9 +252,21 @@ async def health_check():
 async def health_check_detailed(
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
    _user: dict = Depends(get_current_user),
+    include: str = Query(
+        "",
+        description=(
+            "Comma-separated list of optional checks to include. "
+            "Recognised values: `schema` (DB schema version against the "
+            "expected migration). The default response omits these because "
+            "they're rarely actionable on a healthy instance and add noise "
+            "to `agnes diagnose` output (issue #204). Pass `?include=schema` "
+            "to get the legacy behavior."
+        ),
+    ),
 ):
    """Structured health check with deployment metadata. Requires authentication."""
    checks = {}
+    include_set = {p.strip() for p in include.split(",") if p.strip()}

    # DuckDB state
    try:
@@ -241,7 +275,13 @@ async def health_check_detailed(
    except Exception as e:
        checks["duckdb_state"] = {"status": "error", "detail": str(e)}

-    # DB schema version check
+    # DB schema version check — opt-in (issue #204). Operators who run a
+    # fresh release pinned to the same image as the running schema rarely
+    # care about this number; analysts hitting the endpoint via
+    # `agnes diagnose` see it as noise. Surface it on demand via
+    # `?include=schema` (the dashboard / admin UI passes this; default
+    # CLI does not).
+    if "schema" in include_set:
        checks["db_schema"] = _check_db_schema()

    # Sync state summary
@@ -292,6 +332,10 @@ async def health_check_detailed(
    except Exception as e:
        checks["session_pipeline"] = {"status": "unknown", "detail": str(e)}

+    # Aggregate to overall status. `info` and `unknown` surface in the
+    # response but never escalate the headline (issue #178). `warning`
+    # promotes to `degraded`; `error` (or a schema mismatch when the
+    # caller asked for it) promotes to `unhealthy`.
    overall = "healthy"
    for check in checks.values():
        if check.get("status") == "error":
@@ -299,8 +343,9 @@ async def health_check_detailed(
            break
        if check.get("status") == "warning":
            overall = "degraded"
-    # DB schema mismatch or unreachable also makes the overall status unhealthy
-    if checks.get("db_schema", {}).get("db_schema") != "ok":
+    # Schema mismatch only escalates when the caller asked for the check
+    # — otherwise the absent key is treated as "not asserted".
+    if "db_schema" in checks and checks["db_schema"].get("db_schema") != "ok":
        overall = "unhealthy"

    return {
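Note on the severity ladder and the `include` gate in this file: the aggregation can be approximated in a few lines and exercised without a running server. This is a standalone sketch, not the body of `health_check_detailed`; `parse_include` and `aggregate` are hypothetical names, and the check dicts are invented inputs that only mimic the shapes visible in the diff.

```python
def parse_include(include: str) -> set[str]:
    """Split a comma-separated `include` query value, dropping blanks."""
    return {part.strip() for part in include.split(",") if part.strip()}


def aggregate(checks: dict[str, dict]) -> str:
    """Fold per-check statuses into a headline status.

    Assumed ladder (taken from the module docstring): `ok`, `info` and
    `unknown` never promote; `warning` promotes to `degraded`; `error`
    promotes to `unhealthy`.
    """
    overall = "healthy"
    for check in checks.values():
        status = check.get("status")
        if status == "error":
            return "unhealthy"
        if status == "warning":
            overall = "degraded"
    # The schema check escalates only when the caller asked for it,
    # i.e. when the key is present in the collected checks.
    if "db_schema" in checks and checks["db_schema"].get("db_schema") != "ok":
        return "unhealthy"
    return overall
```

Under these assumptions, a response whose only notable check is `info` keeps a `healthy` headline, and an absent `db_schema` key is simply not asserted.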
--- a/cli/client.py
+++ b/cli/client.py
@@ -387,6 +387,14 @@ def api_patch(path: str, *, timeout: float = 30.0, **kwargs) -> httpx.Response:
        raise _translate_transport_error(exc, context=f"PATCH {path}", timeout_s=timeout) from exc


+def api_put(path: str, *, timeout: float = 30.0, **kwargs) -> httpx.Response:
+    try:
+        with get_client(timeout=timeout) as client:
+            return client.put(path, **kwargs)
+    except httpx.HTTPError as exc:
+        raise _translate_transport_error(exc, context=f"PUT {path}", timeout_s=timeout) from exc
+
+
 def _is_transient(exc: Exception) -> bool:
    """Worth retrying? Network blip or 5xx — yes. Auth / 4xx — no."""
    if isinstance(exc, (httpx.ConnectError, httpx.ReadError, httpx.WriteError,
--- a/cli/commands/admin.py
+++ b/cli/commands/admin.py
@@ -4,7 +4,7 @@ import json

 import typer

-from cli.client import api_get, api_post, api_delete, api_patch
+from cli.client import api_get, api_post, api_delete, api_patch, api_put
 from cli.commands.admin_metrics import admin_metrics_app
 from cli.commands.admin_store import admin_store_app
 from cli.commands.memory_admin import memory_admin_app
@@ -324,6 +324,139 @@ def list_tables(as_json: bool = typer.Option(False, "--json")):
            typer.echo(f"  {t['name']:30s} src={t.get('source_type','?'):10s} mode={t.get('query_mode','?'):6s} bucket={t.get('bucket',''):20s}")


+@admin_app.command("unregister-table")
+def unregister_table(
+    table_id: str = typer.Argument(..., help="Table id to unregister"),
+    yes: bool = typer.Option(
+        False, "--yes", "-y",
+        help="Skip the confirmation prompt (for scripts).",
+    ),
+):
+    """Unregister a table from the registry.
+
+    Calls `DELETE /api/admin/registry/{table_id}`. The server unhooks the
+    master view, removes the canonical parquet for materialized rows, and
+    clears the matching `sync_state` row. Issue #177.
+    """
+    if not yes:
+        typer.echo(f"About to unregister table: {table_id}")
+        if not typer.confirm("Continue?"):
+            typer.echo("Aborted.")
+            raise typer.Exit(0)
+    resp = api_delete(f"/api/admin/registry/{table_id}")
+    if resp.status_code == 204:
+        typer.echo(f"Unregistered: {table_id}")
+        return
+    if resp.status_code == 404:
+        typer.echo(f"Not registered: {table_id}", err=True)
+        raise typer.Exit(1)
+    try:
+        detail = resp.json().get("detail", resp.text)
+    except Exception:
+        detail = resp.text
+    typer.echo(f"Failed: {detail}", err=True)
+    raise typer.Exit(1)
+
+
+@admin_app.command("update-table")
+def update_table(
+    table_id: str = typer.Argument(..., help="Table id to update"),
+    name: str = typer.Option(None, "--name", help="New display name"),
+    bucket: str = typer.Option(None, "--bucket", help="New bucket / dataset"),
+    source_table: str = typer.Option(
+        None, "--source-table", help="New source table name"
+    ),
+    query_mode: str = typer.Option(
+        None,
+        "--query-mode",
+        help="New query mode: local | remote | materialized",
+    ),
+    query: str = typer.Option(
+        None,
+        "--query",
+        help=(
+            "New SQL body for query_mode='materialized' (BigQuery). "
+            "Inline SQL or `@path/to.sql` to read from disk. Use "
+            "`--query=` (empty value) to clear."
+        ),
+    ),
+    description: str = typer.Option(
+        None, "--description", help="New description"
+    ),
+    sync_schedule: str = typer.Option(
+        None,
+        "--sync-schedule",
+        help="New cron schedule (e.g. 'every 6h' / 'daily 03:00'); honored by materialized BQ rows",
+    ),
+    source_type: str = typer.Option(
+        None,
+        "--source-type",
+        help="Change source type. Rare — most edits keep this fixed.",
+    ),
+):
+    """Update a registered table.
+
+    Calls `PUT /api/admin/registry/{table_id}` with only the supplied
+    fields. Field omitted → unchanged. Issue #177.
+
+    For BQ rows, the server schedules a background rebuild so the master
+    view picks up the change without waiting for the next scheduled sync.
+    Switching `query_mode` away from `materialized` clears the stale
+    `source_query` automatically.
+    """
+    from pathlib import Path
+
+    payload: dict = {}
+    if name is not None:
+        payload["name"] = name
+    if bucket is not None:
+        payload["bucket"] = bucket
+    if source_table is not None:
+        payload["source_table"] = source_table
+    if query_mode is not None:
+        payload["query_mode"] = query_mode
+    if description is not None:
+        payload["description"] = description
+    if sync_schedule is not None:
+        payload["sync_schedule"] = sync_schedule
+    if source_type is not None:
+        payload["source_type"] = source_type
+    if query is not None:
+        if query.startswith("@"):
+            sql_path = Path(query[1:])
+            if not sql_path.exists():
+                typer.echo(f"Error: SQL file not found: {sql_path}", err=True)
+                raise typer.Exit(2)
+            payload["source_query"] = sql_path.read_text(encoding="utf-8").strip()
+        else:
+            payload["source_query"] = query.strip()
+
+    if not payload:
+        typer.echo(
+            "No fields supplied. Pass at least one of --name, --bucket, "
+            "--source-table, --query-mode, --query, --description, "
+            "--sync-schedule, --source-type.",
+            err=True,
+        )
+        raise typer.Exit(2)
+
+    resp = api_put(f"/api/admin/registry/{table_id}", json=payload)
+    if resp.status_code == 200:
+        data = resp.json()
+        updated = data.get("updated") or sorted(payload.keys())
+        typer.echo(f"Updated {table_id}: {', '.join(updated)}")
+        return
+    if resp.status_code == 404:
+        typer.echo(f"Not registered: {table_id}", err=True)
+        raise typer.Exit(1)
+    try:
+        detail = resp.json().get("detail", resp.text)
+    except Exception:
+        detail = resp.text
+    typer.echo(f"Failed: {detail}", err=True)
+    raise typer.Exit(1)
+
+
 @admin_app.command("metadata-show")
 def metadata_show(
    table_id: str = typer.Argument(..., help="Table ID to show metadata for"),
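Note on the sparse `PUT` body built by `update-table`: the "only supplied fields go in the payload" rule, including the `@path/to.sql` shorthand, can be sketched independently of Typer. `build_update_payload` is a hypothetical helper for illustration and, unlike the command, it lets a missing file raise instead of exiting with code 2.

```python
from pathlib import Path
from typing import Optional


def build_update_payload(**fields: Optional[str]) -> dict[str, str]:
    """Keep only fields that were actually supplied (value is not None).

    `query` is sent as `source_query`; a leading `@` means the SQL is
    read from the named file. An empty string is kept, so it can be
    used to clear a value.
    """
    payload: dict[str, str] = {}
    for key, value in fields.items():
        if value is None:
            continue  # flag not passed: leave the server-side value alone
        if key == "query":
            if value.startswith("@"):
                value = Path(value[1:]).read_text(encoding="utf-8")
            payload["source_query"] = value.strip()
        else:
            payload[key] = value
    return payload
```

An empty result is the case the command rejects with `No fields supplied`, so a caller of this sketch would check `if not payload` before issuing the request.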
--- a/cli/commands/diagnose.py
+++ b/cli/commands/diagnose.py
@@ -16,6 +16,16 @@ def diagnose(
    symptom: str = typer.Option(None, "--symptom", help="Describe the problem"),
    component: str = typer.Option(None, "--component", help="Check specific component"),
    as_json: bool = typer.Option(False, "--json", help="Output as JSON"),
+    include_schema: bool = typer.Option(
+        False,
+        "--include-schema",
+        help=(
+            "Include the DB schema-version check. Off by default since the "
+            "answer is rarely actionable on a healthy instance and shows up "
+            "as noise in the agent-facing output (issue #204). On when the "
+            "operator is verifying a migration."
+        ),
+    ),
 ):
    """Run comprehensive system diagnostics. AI-agent friendly output."""
    # If a subcommand was invoked (e.g. `agnes diagnose system`), defer to it
@@ -33,7 +43,8 @@ def diagnose(

        # Detailed health (auth required) for service-level checks
        try:
-            resp_d = api_get("/api/health/detailed")
+            params = {"include": "schema"} if include_schema else None
+            resp_d = api_get("/api/health/detailed", params=params)
            detailed = resp_d.json()
            for svc_name, svc_data in detailed.get("services", {}).items():
                check = {"name": svc_name, "status": svc_data.get("status", "unknown")}
@@ -45,7 +56,8 @@ def diagnose(
    except Exception as e:
        checks.append({"name": "api", "status": "error", "detail": str(e)})

-    # Determine overall
+    # Determine overall — `info` and `unknown` surface in the per-check
+    # output but never promote the headline (issue #178).
    overall = "healthy"
    for c in checks:
        if c["status"] == "error":
--- a/cli/commands/init.py
+++ b/cli/commands/init.py
@@ -95,6 +95,29 @@ def init(
            }}), err=True)
            raise typer.Exit(1)

+    # On --force, snapshot the existing CLAUDE.md before regenerating it
+    # so an operator who edited it can recover their notes (issue #164).
+    # Backup name carries an ISO timestamp so multiple `--force` runs in
+    # the same workspace don't clobber each other. We write the backup
+    # *after* the existing-workspace gate above so the un-forced path is
+    # unchanged.
+    if claude_md.exists() and force:
+        try:
+            ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
+            backup_path = workspace / f"CLAUDE.md.bak.{ts}"
+            backup_path.write_bytes(claude_md.read_bytes())
+            typer.echo(f"Backed up existing CLAUDE.md → {backup_path.name}")
+        except OSError as exc:
+            # FS error on the backup is annoying but shouldn't abort the
+            # init. Surface it so the operator knows their pre-existing
+            # CLAUDE.md is about to be overwritten without a recoverable
+            # copy on disk, then proceed.
+            typer.echo(
+                f"Warning: could not write CLAUDE.md backup ({exc}); "
+                f"continuing with --force overwrite",
+                err=True,
+            )
+
    # ------------------------------------------------------------------
    # Step 2: verify the PAT via /api/catalog/tables.
    #
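Note on the `--force` backup in this file: the naming scheme and the "log and continue" failure mode can be tried outside the CLI. The helpers below are hypothetical (`backup_name`, `snapshot`) and only approximate the hunk; in particular they return `None` on failure where the command prints a warning.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def backup_name(now: datetime) -> str:
    """Timestamped backup filename in UTC with second resolution."""
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"CLAUDE.md.bak.{stamp}"


def snapshot(claude_md: Path, now: datetime) -> Optional[Path]:
    """Copy `claude_md` into a sibling backup file.

    Returns the backup path, or None if the filesystem refused; the
    caller is expected to warn and carry on with the overwrite.
    """
    backup = claude_md.with_name(backup_name(now))
    try:
        backup.write_bytes(claude_md.read_bytes())
    except OSError:
        return None
    return backup
```

Because the stamp has second resolution, two `--force` runs inside the same second would map to the same backup name; that seems unlikely for an interactive command but is worth knowing when scripting it.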
--- a/cli/lib/pull.py
+++ b/cli/lib/pull.py
@@ -65,6 +65,44 @@ class PullResult:
 _SAFE_ID_RE = re.compile(r"^[a-zA-Z0-9_\-]{1,128}$")


+def _read_progress_interval_seconds() -> float:
+    """Seconds between forced progress emissions per file. Default 5 s.
+
+    Tighter cadence than the original 30 s default keeps non-TTY consumers
+    (Claude Code sub-agent watchdogs, CI runners) from killing the process
+    on apparent silence during a slow chunk. Override via
+    `AGNES_PULL_PROGRESS_INTERVAL_SECONDS`. Issue #203.
+    """
+    raw = os.environ.get("AGNES_PULL_PROGRESS_INTERVAL_SECONDS", "")
+    if raw:
+        try:
+            v = float(raw)
+            if v > 0:
+                return v
+        except ValueError:
+            pass
+    return 5.0
+
+
+def _read_progress_interval_bytes() -> int:
+    """Bytes between forced progress emissions per file. Default 1 MiB.
+
+    Complements the time-based cadence so fast downloads also emit at a
+    reasonable rate (on multi-GB parquets, 10% of the total can take tens
+    of seconds to download, so the original boundary alone fired rarely).
+    Override via `AGNES_PULL_PROGRESS_INTERVAL_BYTES`. Issue #203.
+    """
+    raw = os.environ.get("AGNES_PULL_PROGRESS_INTERVAL_BYTES", "")
+    if raw:
+        try:
+            v = int(raw)
+            if v > 0:
+                return v
+        except ValueError:
+            pass
+    return 1024 * 1024
+
+
 class _TextualProgress:
    """Plain-text progress emitter for non-TTY stderr.

@@ -74,9 +112,17 @@ class _TextualProgress:
    minutes on a multi-GB parquet) or emits raw ANSI noise. This class
    instead emits one terse line per file at sensible cadence.

-    Cadence policy: emit when *either*:
+    Cadence policy: emit when *any* of:
      - per-file bytes-downloaded crosses a 10%-of-total boundary, OR
-      - 30 s have elapsed since this file's last emission.
+      - more than ``AGNES_PULL_PROGRESS_INTERVAL_BYTES`` bytes (default
+        1 MiB) since this file's last emission, OR
+      - more than ``AGNES_PULL_PROGRESS_INTERVAL_SECONDS`` (default 5 s)
+        since this file's last emission.
+
+    The byte+second floor exists because sub-agent / CI watchdogs read
+    "no output for N seconds" as a hung process and kill it (issue #203);
+    the original 30 s / 10% policy was silent enough to trip those gates
+    on slow links.

    Always emits one final "done" line per file via `finish()` so the
    operator sees a confirmed completion even on tiny files.
@@ -102,11 +148,14 @@ class _TextualProgress:
        self._total_files = total_files
        self._file_sizes = file_sizes
        self._lock = threading.Lock()
+        self._interval_seconds = _read_progress_interval_seconds()
+        self._interval_bytes = _read_progress_interval_bytes()
        # Per-file state.
        self._bytes: dict[str, int] = {tid: 0 for tid in file_sizes}
        self._started_at: dict[str, float] = {}
        self._last_emit_at: dict[str, float] = {}
        self._last_emit_pct: dict[str, int] = {}
+        self._last_emit_bytes: dict[str, int] = {}
        self._finished_idx: int = 0  # files whose `finish` line has been emitted

    def advance(self, tid: str, n: int) -> None:
@@ -118,16 +167,23 @@ class _TextualProgress:
                self._started_at[tid] = now
                self._last_emit_at[tid] = now
                self._last_emit_pct[tid] = 0
+                self._last_emit_bytes[tid] = 0
            self._bytes[tid] = self._bytes.get(tid, 0) + n

            total = self._file_sizes.get(tid, 0)
            current = self._bytes[tid]
            pct = int((current * 100) / total) if total > 0 else 0
            elapsed = now - self._last_emit_at[tid]
+            bytes_since_emit = current - self._last_emit_bytes.get(tid, 0)
            crossed_10 = pct >= self._last_emit_pct[tid] + 10
-            if crossed_10 or elapsed >= 30.0:
+            if (
+                crossed_10
+                or elapsed >= self._interval_seconds
+                or bytes_since_emit >= self._interval_bytes
+            ):
                self._last_emit_at[tid] = now
                self._last_emit_pct[tid] = pct - (pct % 10)
+                self._last_emit_bytes[tid] = current
                self._emit_line(tid, current, total, now)

    def finish(self) -> None:
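Note on the progress cadence in this file: the three-way trigger and the "positive number or fall back" env parsing can be expressed as two small pure functions. This is a simplified sketch under assumptions: `read_positive_float` and `should_emit` are hypothetical names, and the byte knob is parsed as a float here, whereas the committed reader uses `int`.

```python
import os


def read_positive_float(env_name: str, default: float) -> float:
    """Return the env value if it parses to a positive number, else the default."""
    raw = os.environ.get(env_name, "")
    try:
        value = float(raw)
    except ValueError:
        return default  # unset, empty, or not a number
    return value if value > 0 else default


def should_emit(
    pct: int,
    last_pct: int,
    elapsed_s: float,
    bytes_since_emit: int,
    interval_s: float = 5.0,
    interval_bytes: int = 1024 * 1024,
) -> bool:
    """True when any one of the three cadence triggers fires."""
    crossed_10 = pct >= last_pct + 10
    return (
        crossed_10
        or elapsed_s >= interval_s
        or bytes_since_emit >= interval_bytes
    )
```

The defaults mirror the 5 s / 1 MiB values named in the docstrings. Raising either knob widens only that trigger, since the three conditions are independent.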
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "agnes-the-ai-analyst"
-version = "0.44.1"
+version = "0.45.0"
 description = "Agnes — AI Data Analyst platform for AI analytical systems"
 requires-python = ">=3.11,<3.14"
 license = "MIT"
--- a/services/telegram_bot/runner.py
+++ b/services/telegram_bot/runner.py
@@ -7,6 +7,7 @@ Runs the script as the owning user via the notify-scripts helper.

 import json
 import logging
+import re
 import subprocess

 from . import config
@@ -15,12 +16,22 @@ logger = logging.getLogger(__name__)

 NOTIFY_SCRIPTS_BIN = "/usr/local/bin/notify-scripts"

+# POSIX-conservative username shape: lowercase letters, digits, dot, hyphen,
+# underscore; must start with `[a-z_]` so the value can never be interpreted as a sudo
+# flag (e.g. `-u`, `--shell`). Mirrors the `useradd` defaults. Anything
+# outside this shape is refused before we hand it to `sudo -u`.
+_USERNAME_RE = re.compile(r"^[a-z_][a-z0-9._-]{0,31}$")
+

 def run_user_script(username: str, script_name: str) -> dict | None:
    """Run a notification script as the specified user and return parsed JSON output.

    Returns None on error, or the parsed JSON dict on success.
    """
+    if not _USERNAME_RE.match(username):
+        logger.error(f"Refusing to run script: invalid username shape: {username!r}")
+        return None
+
    if not script_name.endswith(".py"):
        logger.warning(f"Not a Python script: {script_name}")
        return None
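Note on the username gate in this file: the regex can be checked against the inputs the changelog worries about. The sketch below reuses the committed pattern; `accepts_as_committed` and `accepts_strict` are hypothetical names. One detail worth knowing is that in Python `$` also matches just before a trailing newline, so `re.match` with this pattern accepts `"alice\n"`; `fullmatch` is shown as a stricter alternative, not as what the commit does.

```python
import re

# Same pattern as the `_USERNAME_RE` added in the hunk.
USERNAME_RE = re.compile(r"^[a-z_][a-z0-9._-]{0,31}$")


def accepts_as_committed(username: str) -> bool:
    """Mirror of the committed check: `re.match` against the pattern."""
    return USERNAME_RE.match(username) is not None


def accepts_strict(username: str) -> bool:
    """Stricter variant: the whole string must match, newline included."""
    return USERNAME_RE.fullmatch(username) is not None


for candidate in ["alice", "_svc-1.x", "-u", "--shell=/bin/sh", "Alice", "bob;id", "alice\n"]:
    print(repr(candidate), accepts_as_committed(candidate), accepts_strict(candidate))
```

Both variants refuse values that start with `-`, so a tampered username cannot be read as a `sudo` flag; they differ only on the trailing-newline case.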
--- a/services/telegram_bot/storage.py
+++ b/services/telegram_bot/storage.py
@@ -7,7 +7,7 @@ Thread-safe file operations with atomic writes.
 import json
 import logging
 import os
-import random
+import secrets
 import string
 import tempfile
 import time
@@ -95,8 +95,14 @@ def get_user_status(username: str) -> dict | None:


 def _generate_code() -> str:
-    """Generate a random numeric verification code."""
-    return "".join(random.choices(string.digits, k=config.CODE_LENGTH))
+    """Generate a cryptographically random numeric verification code.
+
+    Uses `secrets.choice` (CSPRNG) rather than `random.choices` because
+    pairing codes gate account linkage — a predictable PRNG output would
+    let an attacker who scrapes one code recover the RNG state and predict
+    others issued in the same process.
+    """
+    return "".join(secrets.choice(string.digits) for _ in range(config.CODE_LENGTH))


 def _cleanup_expired(codes: dict) -> dict:
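Note on the pairing-code change in this file: the new generator is small enough to run on its own. In the sketch below, `generate_code` and its default `length=6` are assumptions for illustration; the real length comes from `config.CODE_LENGTH`, which this diff does not show.

```python
import secrets
import string


def generate_code(length: int = 6) -> str:
    """Numeric code drawn from the OS CSPRNG, one digit at a time.

    `secrets.choice` is backed by `random.SystemRandom`, so earlier
    outputs do not reveal a seedable generator state the way the
    default Mersenne Twister in `random` can.
    """
    return "".join(secrets.choice(string.digits) for _ in range(length))


print(generate_code())
```

The output differs on every call, so tests can only assert on shape (length and digits-only), not on a specific value.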
--- a/tests/test_cli_admin.py
+++ b/tests/test_cli_admin.py
@@ -156,6 +156,114 @@ class TestMetadataShow:
        assert result.exit_code == 1


+class TestUnregisterTable:
+    """Issue #177: `agnes admin unregister-table` wraps DELETE
+    /api/admin/registry/{id}. The server endpoint already does the
+    parquet/sync_state cleanup; the CLI is a thin client."""
+
+    def test_unregister_success(self):
+        with patch("cli.commands.admin.api_delete", return_value=_resp(204)):
+            result = runner.invoke(
+                app, ["admin", "unregister-table", "orders", "--yes"]
+            )
+        assert result.exit_code == 0, result.output
+        assert "Unregistered: orders" in result.output
+
+    def test_unregister_not_found(self):
+        with patch(
+            "cli.commands.admin.api_delete",
+            return_value=_resp(404, {"detail": "Table not found"}),
+        ):
+            result = runner.invoke(
+                app, ["admin", "unregister-table", "nope", "--yes"]
+            )
+        assert result.exit_code == 1
+
+    def test_unregister_prompts_without_yes(self):
+        """Without --yes, the CLI confirms before destructive action."""
+        with patch("cli.commands.admin.api_delete", return_value=_resp(204)) as d:
+            # Simulate operator typing "n" at the prompt.
+            result = runner.invoke(
+                app, ["admin", "unregister-table", "orders"], input="n\n"
+            )
+        # Either Aborted (exit 0) or refuses entirely; either way the
+        # server must not have been called.
+        d.assert_not_called()
+        assert result.exit_code == 0
+
+
+class TestUpdateTable:
+    """Issue #177: `agnes admin update-table` wraps PUT
+    /api/admin/registry/{id}. Only fields the operator passes go in the
+    body — server-side merge keeps the rest unchanged."""
+
+    def test_update_only_supplied_fields_sent(self):
+        captured = {}
+
+        def fake_put(path, **kwargs):
+            captured["path"] = path
+            captured["json"] = kwargs.get("json")
+            return _resp(200, {"id": "orders", "updated": ["bucket"]})
+
+        with patch("cli.commands.admin.api_put", side_effect=fake_put):
+            result = runner.invoke(
+                app, ["admin", "update-table", "orders", "--bucket", "out.c-prod"]
+            )
+        assert result.exit_code == 0, result.output
+        assert captured["path"] == "/api/admin/registry/orders"
+        # description must NOT be in the body — operator didn't pass it.
+        assert captured["json"] == {"bucket": "out.c-prod"}
+        assert "Updated orders" in result.output
+
+    def test_update_inline_query_for_materialized(self):
+        captured = {}
+
+        def fake_put(path, **kwargs):
+            captured["json"] = kwargs.get("json")
+            return _resp(200, {"id": "rev", "updated": ["query_mode", "source_query"]})
+
+        with patch("cli.commands.admin.api_put", side_effect=fake_put):
+            result = runner.invoke(app, [
+                "admin", "update-table", "rev",
+                "--query-mode", "materialized",
+                "--query", "SELECT 1",
+            ])
+        assert result.exit_code == 0, result.output
+        assert captured["json"]["query_mode"] == "materialized"
+        assert captured["json"]["source_query"] == "SELECT 1"
+
+    def test_update_query_at_file(self, tmp_path):
+        sql_file = tmp_path / "q.sql"
+        sql_file.write_text("SELECT * FROM orders\n")
+        captured = {}
+
+        def fake_put(path, **kwargs):
+            captured["json"] = kwargs.get("json")
+            return _resp(200, {"id": "rev", "updated": ["source_query"]})
+
+        with patch("cli.commands.admin.api_put", side_effect=fake_put):
+            result = runner.invoke(
+                app, ["admin", "update-table", "rev", "--query", f"@{sql_file}"]
+            )
+        assert result.exit_code == 0, result.output
+        assert captured["json"]["source_query"] == "SELECT * FROM orders"
+
+    def test_update_no_fields_supplied_errors(self):
+        result = runner.invoke(app, ["admin", "update-table", "orders"])
+        assert result.exit_code == 2
+        assert "No fields supplied" in (result.output + (result.stderr or ""))
+
+    def test_update_table_not_found(self):
+        with patch(
+            "cli.commands.admin.api_put",
+            return_value=_resp(404, {"detail": "Table not found"}),
+        ):
+            result = runner.invoke(
+                app, ["admin", "update-table", "nope", "--bucket", "x"]
+            )
+        assert result.exit_code == 1
+
+
 def test_admin_set_role_returns_hardfail():
    """v19: `agnes admin set-role` was removed. Calling it must hard-fail
    with a non-zero exit code and a message pointing at the replacement
--- a/tests/test_cli_init.py
+++ b/tests/test_cli_init.py
@ -123,6 +123,39 @@ def test_init_force_preserves_local_md(tmp_path, monkeypatch):
    assert "my notes" in (tmp_path / ".claude" / "CLAUDE.local.md").read_text()


+def test_init_force_backs_up_existing_claude_md(tmp_path, monkeypatch):
+    """Issue #164: --force overwrites CLAUDE.md, but the prior content
+    must be preserved as `CLAUDE.md.bak.<timestamp>` so an operator who
+    edited it can recover their notes. The backup carries an ISO
+    timestamp so re-running --force in the same workspace doesn't
+    clobber a prior backup.
+    """
+    monkeypatch.setenv("AGNES_CONFIG_DIR", str(tmp_path / "_cfg"))
+    api_get = _make_api_get()
+    monkeypatch.setattr("cli.commands.init.api_get", api_get, raising=False)
+    monkeypatch.setattr("cli.lib.pull.api_get", api_get, raising=False)
+
+    # Seed an existing CLAUDE.md the operator has edited.
+    (tmp_path / "CLAUDE.md").write_text(
+        "# AI Data Analyst\n\nMy custom edits — must survive reinit.\n"
+    )
+
+    r = runner.invoke(init_app, [
+        "--server-url", "http://x",
+        "--token", "t",
+        "--workspace", str(tmp_path),
+        "--force",
+    ])
+    assert r.exit_code == 0, r.output
+
+    # Backup file: glob since the timestamp is dynamic.
+    backups = list(tmp_path.glob("CLAUDE.md.bak.*"))
+    assert len(backups) == 1, [p.name for p in backups]
+    assert "must survive reinit" in backups[0].read_text()
+    # The summary line names the backup so the operator can find it.
+    assert "Backed up" in r.output, r.output
+
+
 def test_init_partial_state_friendly_exit(tmp_path, monkeypatch):
    """CLAUDE.md exists with marker but no settings.json -> friendly hint, exit 1."""
    monkeypatch.setenv("AGNES_CONFIG_DIR", str(tmp_path / "_cfg"))
--- a/tests/test_diagnose_billing.py
+++ b/tests/test_diagnose_billing.py
@ -1,9 +1,11 @@
-"""Phase K — `agnes diagnose` warning when BQ billing_project == project.
+"""Phase K — `agnes diagnose` info entry when BQ billing_project == project.

 Surfaces via /api/health/detailed (which `agnes diagnose` already consumes):
 when data_source.type == 'bigquery' and the resolved BqProjects.billing equals
 BqProjects.data, the response includes a `services.bq_config` entry with
-status='warning' and a hint about the 403 USER_PROJECT_DENIED footgun.
+status='info' (since #178 — was 'warning' before) and a hint about the 403
+USER_PROJECT_DENIED footgun. `info` keeps the message visible without
+promoting the overall check to 'degraded' the way 'warning' did.
 """

 import pytest
@ -46,7 +48,14 @@ def _reset_after(monkeypatch):


 def test_diagnose_warns_when_billing_equals_project(seeded_app, monkeypatch):
-    """BQ instance with billing_project missing (or equal to project) → warning."""
+    """BQ instance with billing_project missing (or equal to project) → info.
+
+    Pre-#178 this returned `warning`, which promoted the overall headline
+    to `degraded`. The check is informational — many valid single-project
+    dev instances run with billing == data — so it now returns `info` and
+    the headline stays `healthy` (issue #178). The detail message still
+    appears so operators can see it.
+    """
    _patch_instance_config(monkeypatch, {
        "data_source": {
            "type": "bigquery",
@ -65,10 +74,12 @@ def test_diagnose_warns_when_billing_equals_project(seeded_app, monkeypatch):

    bq_cfg = body.get("services", {}).get("bq_config")
    assert bq_cfg is not None, body
-    assert bq_cfg.get("status") == "warning", bq_cfg
+    assert bq_cfg.get("status") == "info", bq_cfg
    # Hint mentions the YAML field path so operators know what to fix.
    blob = (str(bq_cfg.get("detail", "")) + " " + str(bq_cfg.get("hint", ""))).lower()
    assert "billing_project" in blob, bq_cfg
+    # Info severity must not promote the headline to degraded.
+    assert body.get("status") == "healthy", body.get("status")


 def test_diagnose_clean_when_billing_differs(seeded_app, monkeypatch):
--- a/tests/test_health_schema_gate.py
+++ b/tests/test_health_schema_gate.py
@ -0,0 +1,90 @@
+"""Health check: opt-in schema entry (#204) and `info` severity (#178).
+
+Two adjacent behaviors:
+
+- `GET /api/health/detailed` no longer includes `db_schema` by default.
+  Pass `?include=schema` to get it. Rationale: the schema version is
+  rarely actionable on a healthy instance and used to dominate the
+  agent-facing `agnes diagnose` output.
+
+- `info` severity entries appear in the response but never promote the
+  overall status to `degraded` (only `warning` does) or `unhealthy`
+  (only `error` does). This lets the BQ billing-equals-data check stay
+  visible without falsely tripping the headline.
+
+The `bq_config == info` assertion is in test_diagnose_billing.py; here
+we cover the schema gate and a synthetic `info`-doesn't-promote case.
+"""
+
+from __future__ import annotations
+
+
+def _auth(token: str) -> dict:
+    return {"Authorization": f"Bearer {token}"}
+
+
+def test_schema_check_omitted_by_default(seeded_app):
+    """Default response does not include `db_schema` (issue #204)."""
+    c = seeded_app["client"]
+    token = seeded_app["admin_token"]
+    r = c.get("/api/health/detailed", headers=_auth(token))
+    assert r.status_code == 200, r.text
+    assert "db_schema" not in r.json().get("services", {})
+
+
+def test_schema_check_present_when_include_schema(seeded_app):
+    """`?include=schema` returns the legacy entry verbatim."""
+    c = seeded_app["client"]
+    token = seeded_app["admin_token"]
+    r = c.get(
+        "/api/health/detailed",
+        headers=_auth(token),
+        params={"include": "schema"},
+    )
+    assert r.status_code == 200, r.text
+    services = r.json().get("services", {})
+    assert "db_schema" in services
+    # Healthy seeded test app must report ok against the current schema.
+    assert services["db_schema"].get("db_schema") == "ok", services["db_schema"]
+
+
+def test_unrecognised_include_token_is_ignored(seeded_app):
+    """Unknown include tokens don't error or surface; forward-compatible."""
+    c = seeded_app["client"]
+    token = seeded_app["admin_token"]
+    r = c.get(
+        "/api/health/detailed",
+        headers=_auth(token),
+        params={"include": "schema,bogus"},
+    )
+    assert r.status_code == 200, r.text
+    services = r.json().get("services", {})
+    assert "db_schema" in services
+    assert "bogus" not in services
+
+
+def test_info_severity_does_not_promote_overall(seeded_app, monkeypatch):
+    """Issue #178: a service returning `status: info` must NOT push the
+    headline to `degraded`. Only `warning`+ does that.
+
+    We synthesize an `info` entry by patching `_check_session_pipeline`
+    (any of the lazy checks would do) so we exercise the aggregator
+    without depending on a particular check's natural state.
+    """
+    import app.api.health as health_mod
+
+    def _fake_session_pipeline(_conn):
+        return {"status": "info", "detail": "synthetic info entry"}
+
+    monkeypatch.setattr(
+        health_mod, "_check_session_pipeline", _fake_session_pipeline
+    )
+
+    c = seeded_app["client"]
+    token = seeded_app["admin_token"]
+    r = c.get("/api/health/detailed", headers=_auth(token))
+    assert r.status_code == 200, r.text
+    body = r.json()
+    assert body["services"]["session_pipeline"]["status"] == "info"
+    # The critical assertion — info must not promote the headline.
+    assert body["status"] == "healthy", body["status"]
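
Reviewer sketch of the severity rule these tests pin down: `info` entries stay visible, `warning` yields `degraded`, `error` yields `unhealthy`. This is only an illustration under assumptions; `aggregate_status` and the two lookup tables are hypothetical names, not the code in `app.api.health`.

```python
from typing import Any, Mapping

# Hypothetical mapping from a per-service status to the headline it forces.
# Statuses not listed here (for example "ok" or "info") force nothing.
_HEADLINE_FOR = {"warning": "degraded", "error": "unhealthy"}
_RANK = {"healthy": 0, "degraded": 1, "unhealthy": 2}


def aggregate_status(services: Mapping[str, Mapping[str, Any]]) -> str:
    """Return the worst headline implied by the per-service entries."""
    headline = "healthy"
    for entry in services.values():
        candidate = _HEADLINE_FOR.get(entry.get("status"), "healthy")
        if _RANK[candidate] > _RANK[headline]:
            headline = candidate
    return headline
```

Under this rule a response carrying only `ok` and `info` entries keeps the `healthy` headline, which is what `test_info_severity_does_not_promote_overall` asserts.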
--- a/tests/test_pull_progress.py
+++ b/tests/test_pull_progress.py
@ -121,3 +121,65 @@ def test_textual_progress_emits_at_completion(
        or "done" in captured.err.lower()
        or "complete" in captured.err.lower()
    )
+
+
+class TestProgressIntervalKnobs:
+    """Issue #203: cadence is configurable via env vars so non-TTY
+    consumers (CI runners, sub-agent watchdogs) can tighten the floor
+    when the default is too quiet for their dead-process detector."""
+
+    def _stream(self):
+        return io.StringIO()
+
+    def test_default_seconds_floor_is_5s(self, monkeypatch):
+        """Default cadence is 5 s (was 30 s pre-#203)."""
+        monkeypatch.delenv("AGNES_PULL_PROGRESS_INTERVAL_SECONDS", raising=False)
+        from cli.lib.pull import _read_progress_interval_seconds
+        assert _read_progress_interval_seconds() == 5.0
+
+    def test_default_bytes_floor_is_1mib(self, monkeypatch):
+        """Default cadence is 1 MiB; complements the time-based floor."""
+        monkeypatch.delenv("AGNES_PULL_PROGRESS_INTERVAL_BYTES", raising=False)
+        from cli.lib.pull import _read_progress_interval_bytes
+        assert _read_progress_interval_bytes() == 1024 * 1024
+
+    def test_seconds_env_override(self, monkeypatch):
+        monkeypatch.setenv("AGNES_PULL_PROGRESS_INTERVAL_SECONDS", "0.5")
+        from cli.lib.pull import _read_progress_interval_seconds
+        assert _read_progress_interval_seconds() == 0.5
+
+    def test_bytes_env_override(self, monkeypatch):
+        monkeypatch.setenv("AGNES_PULL_PROGRESS_INTERVAL_BYTES", "131072")
+        from cli.lib.pull import _read_progress_interval_bytes
+        assert _read_progress_interval_bytes() == 131072
+
+    def test_invalid_envs_fall_back_to_default(self, monkeypatch):
+        """Garbage input doesn't break the pull — fall back to defaults."""
+        monkeypatch.setenv("AGNES_PULL_PROGRESS_INTERVAL_SECONDS", "nope")
+        monkeypatch.setenv("AGNES_PULL_PROGRESS_INTERVAL_BYTES", "-1")
+        from cli.lib.pull import (
+            _read_progress_interval_bytes,
+            _read_progress_interval_seconds,
+        )
+        assert _read_progress_interval_seconds() == 5.0
+        assert _read_progress_interval_bytes() == 1024 * 1024
+
+    def test_byte_floor_emits_more_often_than_pct_threshold(self, monkeypatch):
+        """A 100 MiB file with a 1 MiB byte cadence should emit far more
+        than the 10 progress lines the 10%-of-total cadence alone would
+        produce. This was the operator complaint in #203: on multi-GB
+        parquets the 30 s / 10 % policy produced one line every ~3 minutes."""
+        monkeypatch.setenv("AGNES_PULL_PROGRESS_INTERVAL_SECONDS", "9999")
+        monkeypatch.setenv("AGNES_PULL_PROGRESS_INTERVAL_BYTES", "1048576")
+        from cli.lib.pull import _TextualProgress
+        sink = self._stream()
+        total = 100 * 1024 * 1024  # 100 MiB
+        prog = _TextualProgress(
+            stream=sink, total_files=1, file_sizes={"tbl": total}
+        )
+        chunk = 64 * 1024  # 64 KiB chunks → 1600 advances
+        for _ in range(total // chunk):
+            prog.advance("tbl", chunk)
+        prog.finish()
+        emitted = sink.getvalue().count("\n")
+        assert emitted >= 50, f"only {emitted} lines emitted; cadence too coarse"
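
Reviewer sketch of env-knob parsing that would satisfy the `TestProgressIntervalKnobs` cases: documented defaults, a valid override wins, and garbage or non-positive input falls back to the default. The function names and the `environ` parameter are hypothetical; the real helpers are `_read_progress_interval_seconds` and `_read_progress_interval_bytes` in `cli.lib.pull`, whose bodies are not shown in this diff.

```python
import math
import os
from typing import Mapping

DEFAULT_INTERVAL_SECONDS = 5.0
DEFAULT_INTERVAL_BYTES = 1024 * 1024  # 1 MiB


def read_interval_seconds(environ: Mapping[str, str] = os.environ) -> float:
    """Parse the time floor; fall back to the default on bad input."""
    raw = environ.get("AGNES_PULL_PROGRESS_INTERVAL_SECONDS", "")
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_INTERVAL_SECONDS
    # Reject NaN, infinity, zero, and negatives (an assumption of this sketch).
    if not math.isfinite(value) or value <= 0:
        return DEFAULT_INTERVAL_SECONDS
    return value


def read_interval_bytes(environ: Mapping[str, str] = os.environ) -> int:
    """Parse the byte floor; fall back to the default on bad input."""
    raw = environ.get("AGNES_PULL_PROGRESS_INTERVAL_BYTES", "")
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_INTERVAL_BYTES
    return value if value > 0 else DEFAULT_INTERVAL_BYTES
```

Passing the environment as a mapping keeps the sketch testable without `monkeypatch`; the tests above instead patch the process environment.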
--- a/tests/test_telegram_bot_runner.py
+++ b/tests/test_telegram_bot_runner.py
@ -0,0 +1,60 @@
+"""Tests for `services.telegram_bot.runner` username validation.
+
+Issue #84: the runner shells out via `sudo -u <username>`. Without an
+input gate, a username controlled by an attacker (via tampering with
+the linked-users JSON, or via an upstream caller that doesn't validate)
+could carry sudo flags or shell metacharacters. Every value flowing
+into `subprocess.run([..., "-u", username, ...])` must match a
+POSIX-conservative shape; bad shapes are refused before the subprocess
+fires.
+"""
+
+from unittest.mock import patch
+
+from services.telegram_bot.runner import _USERNAME_RE, run_user_script
+
+
+def test_username_regex_accepts_normal_usernames():
+    for u in ("alice", "bob42", "data_ops", "svc-agnes", "_system"):
+        assert _USERNAME_RE.match(u), u
+
+
+def test_username_regex_rejects_obvious_attacks():
+    bad = [
+        "-u",                 # sudo flag
+        "--shell=/bin/bash",  # GNU long flag
+        "alice; rm -rf /",    # shell metachar
+        "alice && id",
+        "alice|cat /etc/shadow",
+        "alice$IFS",
+        "1starts_with_digit",
+        "alice/with/slash",
+        "alice with space",
+        "",                    # empty
+        "a" * 33,              # too long
+    ]
+    for u in bad:
+        assert not _USERNAME_RE.match(u), u
+
+
+def test_run_user_script_refuses_bad_username_without_subprocess():
+    """If validation refuses the username, subprocess.run must not fire.
+
+    Pre-fix, a tampered telegram_users.json with `username = "-u root"`
+    would have sudo'd as root via flag injection. The fix has the runner
+    short-circuit to None before any subprocess call.
+    """
+    with patch("services.telegram_bot.runner.subprocess.run") as run_mock:
+        result = run_user_script("-u", "ok_script.py")
+    assert result is None
+    run_mock.assert_not_called()
+
+
+def test_run_user_script_refuses_bad_script_name_without_subprocess():
+    """Existing guard at L24 rejects non-.py scripts; verify it still does
+    after the new username gate so a valid username + bad script combo
+    doesn't slip through and run."""
+    with patch("services.telegram_bot.runner.subprocess.run") as run_mock:
+        result = run_user_script("alice", "not_python.sh")
+    assert result is None
+    run_mock.assert_not_called()
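
Reviewer sketch of a POSIX-conservative username gate consistent with the accept and reject lists above. The pattern is an assumption inferred from those lists (lowercase start or underscore, at most 32 characters); the actual `_USERNAME_RE` in `services.telegram_bot.runner` is not shown in this diff and may differ.

```python
import re

# Assumed shape: first char is a lowercase letter or underscore, followed by
# up to 31 lowercase letters, digits, underscores, or hyphens (32 max total).
USERNAME_RE = re.compile(r"[a-z_][a-z0-9_-]{0,31}")


def is_safe_username(username: str) -> bool:
    """True only if the whole string matches the conservative shape."""
    # fullmatch (rather than match with "$") also rejects a trailing newline.
    return USERNAME_RE.fullmatch(username) is not None
```

Because the first character can never be `-`, values such as `-u` or `--shell=/bin/bash` cannot be interpreted as sudo flags, and the restricted character class leaves no room for shell metacharacters, slashes, or spaces.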
--- a/tests/test_telegram_storage.py
+++ b/tests/test_telegram_storage.py
@ -113,3 +113,29 @@ class TestVerificationCodes:

        result = verify_code("123456")
        assert result is None
+
+    def test_code_uses_csprng_not_random_module(self, storage_paths):
+        """Issue #84: pairing codes must not be reproducible via
+        `random.seed`. Pre-fix the generator used `random.choices`, so an
+        attacker who observes enough generator output could recover the
+        PRNG state and predict subsequent codes issued in the same
+        process. The fix switched to `secrets.choice` (CSPRNG-backed);
+        seeding the `random` module must therefore have no effect on the
+        produced codes.
+        """
+        import random as _random
+        from services.telegram_bot.storage import _generate_code
+
+        _random.seed(42)
+        first = _generate_code()
+        _random.seed(42)
+        second = _generate_code()
+        # If the generator still used `random`, seed(42) would force two
+        # identical sequences. With `secrets`, they're independent draws
+        # from the OS CSPRNG and collide only by chance: 1 in
+        # 10**CODE_LENGTH for a length-CODE_LENGTH digit string, which is
+        # well below any reasonable test flake threshold.
+        assert first != second, (
+            f"Generator appears to use seedable PRNG (got identical "
+            f"codes {first!r} after re-seeding); fix #84 may have "
+            f"regressed."
+        )
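
Reviewer sketch of a digit pairing-code generator built on `secrets.choice`, the technique the test above guards. `CODE_LENGTH = 6` and the function name are assumptions for illustration; the real `_generate_code` and its length constant live in `services.telegram_bot.storage` and are not shown in this diff.

```python
import secrets
import string

CODE_LENGTH = 6  # assumed value, for illustration only


def generate_code(length: int = CODE_LENGTH) -> str:
    """Draw each digit independently from the OS CSPRNG."""
    return "".join(secrets.choice(string.digits) for _ in range(length))
```

`secrets` does not read the state of the module-level `random` generator, so calling `random.seed(...)` beforehand has no influence on the digits produced.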