release: 0.53.2 — diagnose silent-capture check + urllib3 2.7.0 + flaky-test fix (#270)

Three bundled improvements:

- #244 — new `agnes diagnose` check compares SessionStart events
  (~/.claude/projects/<encoded>/*.jsonl) against agnes-push uploaded
  log entries inside a 7-day window. Surfaces a warning when the gap
  exceeds 3, hinting at silently-broken capture-session — previously
  detectable only weeks after the fact.

- Dependabot — bumps transitive urllib3 from 1.26.20 to 2.7.0 to close
  5 advisories (4 high, 1 medium). kbcstorage 0.9.5 still pins
  urllib3<2.0.0 upstream; overridden via [tool.uv] override-dependencies
  since the SDK works fine against 2.x in practice (Client + Tables
  both flow through requests, which supports both lines).

- #252 — fix flaky test_scratch_dir_cleaned_up_after_failed_extraction
  by redirecting tempfile.tempdir to a per-test tmp_path. Pre-#252 the
  test scanned the shared system tmp dir and a sibling store test in
  another pytest-xdist worker could trip the assertion mid-window.

Closes #244. Closes #252.
ZdenekSrotyr 2026-05-12 18:28:04 +02:00 committed by GitHub
parent ea6fcfda3b
commit c8de0e0f64
GPG key ID: B5690EEEBB952194
7 changed files with 1647 additions and 1257 deletions


@ -10,6 +10,26 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C
## [Unreleased]
## [0.53.3] — 2026-05-12
Hygiene round closing #244 + #252 + clearing 5 Dependabot urllib3 advisories. (Originally cut as 0.53.2 — bumped to 0.53.3 after #264 / #268 landed as 0.53.2 in parallel.)
### Added
- **`agnes diagnose` flags silently-broken `agnes capture-session`** (#244). New check compares `~/.claude/projects/<encoded>/*.jsonl` (SessionStart events Claude Code wrote) against `<workspace>/.claude/agnes-sessions-uploaded.txt` (entries `agnes push` actually shipped) inside a 7-day window. If the gap exceeds 3 sessions, it surfaces a `warning` status with both counts plus an `agnes capture-session --verbose` pointer for manual triage. Pre-#244, a stdin-contract change in Claude Code would silently stop session uploads, and the only observable signal was "session uploads stopped happening", usually noticed weeks later.
### Changed
- **`urllib3` bumped from 1.26.20 to 2.7.0** to close 5 Dependabot advisories (4 high, 1 medium): cross-origin sensitive-header leak on proxied low-level redirects, decompression-bomb safeguard bypass + unbounded decompression chain on the streaming API, and redirects-when-retries-disabled. `kbcstorage` 0.9.5 still declares `urllib3<2.0.0` upstream as of this release; we override it via `[tool.uv] override-dependencies` because the SDK works fine against 2.x in practice (we only use `Client` + `Tables`, both go through `requests`, which natively supports both lines). Keboola client + connector test paths exercised against 2.7.0 — no regressions.
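
Because the floor is forced through a resolver override rather than declared by `kbcstorage` itself, it can be useful to confirm at runtime that the environment really ended up on the patched line. The following is a minimal sketch using only the standard library; `parse_release`, `meets_floor`, and `urllib3_meets_floor` are hypothetical helper names and are not part of this repository.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn '2.7.0' into (2, 7, 0); stop at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # e.g. '0rc1' ends the numeric release segment
        parts.append(int(piece))
    return tuple(parts)


def meets_floor(installed: str, floor: str = "2.7.0") -> bool:
    """True when the installed release is at or above the floor."""
    return parse_release(installed) >= parse_release(floor)


def urllib3_meets_floor(floor: str = "2.7.0") -> Optional[bool]:
    """Check the installed urllib3; None when the package is absent."""
    try:
        return meets_floor(metadata.version("urllib3"), floor)
    except metadata.PackageNotFoundError:
        return None


print(meets_floor("1.26.20"))  # old pinned line
print(meets_floor("2.7.0"))    # patched line
```

The comparison is deliberately naive: a pre-release such as `2.7.0rc1` parses to `(2, 7)` and is treated as below the floor. A real check would more likely lean on a proper version parser.
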
### Fixed
- **`test_scratch_dir_cleaned_up_after_failed_extraction` no longer flakes under pytest-xdist** (#252). Pre-#252 the test scanned `tempfile.gettempdir()` for `agnes_store_*` directories and asserted the set hadn't grown across a request — but with `-n auto` workers a sibling store test in another worker could be mid-creation of its own `agnes_store_*` inside the [before, after] window, flipping the assertion. Test now redirects `tempfile.tempdir` to a per-test `tmp_path` so the glob only sees this test's scratch dir.
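
The redirect works because `tempfile.mkdtemp()` falls back to `tempfile.gettempdir()` when no `dir` is passed, and `gettempdir()` returns the module-level `tempfile.tempdir` whenever it is set. A small standalone sketch of that mechanism follows; `scratch_dirs` and `demo` are illustrative names, and the manual save/restore stands in for what `monkeypatch.setattr` does in the real test.

```python
import tempfile
from pathlib import Path


def scratch_dirs(root: Path) -> set:
    """Names of agnes_store_* directories directly under root."""
    return {p.name for p in root.glob("agnes_store_*")}


def demo() -> tuple:
    # The private root itself is created in the normal system tmp dir.
    with tempfile.TemporaryDirectory() as private_root:
        saved = tempfile.tempdir
        tempfile.tempdir = private_root  # redirect later mkdtemp() calls
        try:
            before = scratch_dirs(Path(private_root))
            created = Path(tempfile.mkdtemp(prefix="agnes_store_"))
            inside = created.parent.resolve() == Path(private_root).resolve()
            created.rmdir()  # simulate the cleanup the test asserts on
            after = scratch_dirs(Path(private_root))
        finally:
            tempfile.tempdir = saved  # always restore the global
    return inside, before == after


print(demo())
```

Since the glob only ever looks inside the private root, scratch directories created by other workers in the shared system tmp dir cannot change the `before`/`after` comparison.
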
### Internal
- 8 regression tests in `tests/test_session_health.py` cover the #244 check matrix (ok / warning / info / threshold / window-bounds / malformed-log resilience).
## [0.53.2] — 2026-05-12
Two threads in one cut. **Operator surface:** `instance.brand` /


@ -1,11 +1,13 @@
"""Diagnose command — agnes diagnose."""

import json
from pathlib import Path

import typer

from cli.client import api_get
from cli.config import get_sync_state
from cli.lib.session_health import capture_session_health

diagnose_app = typer.Typer(help="System diagnostics")
@ -56,6 +58,14 @@ def diagnose(
    except Exception as e:
        checks.append({"name": "api", "status": "error", "detail": str(e)})

    # Issue #244: detect silently-broken capture-session by comparing
    # observed SessionStart files against the uploaded-log entries.
    # Adds one entry to `checks` with status ok / warning / info.
    try:
        checks.append(capture_session_health(Path.cwd()))
    except Exception as e:
        checks.append({"name": "capture-session", "status": "info", "detail": f"health check failed: {e}"})

    # Determine overall — `info` and `unknown` surface in the per-check
    # output but never promote the headline (issue #178).
    overall = "healthy"
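
The rollup loop itself is outside this hunk. As a rough illustration of the rule stated in the comment (per-check `info` and `unknown` statuses are shown but never change the headline), one possible shape is sketched below. Only the `healthy` default and that rule come from the diff; the `degraded` and `unhealthy` labels and the `rollup` function are assumptions made for the example.

```python
from typing import Dict, Iterable

# Hypothetical mapping: which per-check statuses promote the headline.
# "degraded" and "unhealthy" are assumed labels, not taken from the diff.
_HEADLINE = {"warning": "degraded", "error": "unhealthy"}
_RANK = {"healthy": 0, "degraded": 1, "unhealthy": 2}


def rollup(checks: Iterable[Dict[str, str]]) -> str:
    """Return the worst headline implied by the checks."""
    overall = "healthy"
    for check in checks:
        # ok / info / unknown are absent from _HEADLINE, so they map to
        # "healthy" and can never raise the headline.
        candidate = _HEADLINE.get(check.get("status", ""), "healthy")
        if _RANK[candidate] > _RANK[overall]:
            overall = candidate
    return overall


print(rollup([{"name": "capture-session", "status": "info"}]))
print(rollup([{"name": "api", "status": "error"},
              {"name": "capture-session", "status": "warning"}]))
```

Under this sketch, the `info` entry appended when `capture_session_health` raises would leave the headline untouched, which matches the intent of catching the exception rather than failing the whole command.
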
@ -74,6 +84,12 @@ def diagnose(
        if c.get("stale_tables"):
            for t in c["stale_tables"]:
                actions.append(f"Table '{t}' is stale. Run: agnes server logs scheduler | grep {t}")
        if c["name"] == "capture-session" and c["status"] == "warning":
            actions.append(
                "Capture-session may be silently failing. Run "
                "`agnes capture-session --verbose < ~/.claude/projects/<encoded>/<session>.jsonl` "
                "against a recent session file to surface the real error."
            )

    result = {
        "overall": overall,

cli/lib/session_health.py

@ -0,0 +1,136 @@
"""Health check: detect silently-broken `agnes capture-session`.

Issue #244. `agnes capture-session` (the SessionStart hook helper)
exits 0 on every failure mode so the hook is invisible during session
startup. If Claude Code changes its stdin contract or capture-session
crashes mid-write, the uploaded-log stops growing but the SessionStart
events keep landing in `~/.claude/projects/<encoded>/`. The gap between
the two is a passive signal we can surface in `agnes diagnose`.

The check compares:

1. **Expected** count of session jsonl files in every
   ``~/.claude/projects/<encoded>/`` matching the current workspace with
   ``mtime`` within the configured window.
2. **Actual** count of entries in
   ``<workspace>/.claude/agnes-sessions-uploaded.txt`` whose
   ``<iso_timestamp>`` prefix falls within the same window.

If ``expected - actual`` exceeds the threshold, capture-session is
likely broken end-to-end. Emit a ``warning`` with both counts plus a
pointer to ``agnes capture-session --verbose`` for manual triage.

Window and threshold are conservative defaults (7d / 3) tuned to
surface stop-the-world breakage without false-positive churn on a
fresh workspace. Callers can override via ``window_days`` /
``threshold`` kwargs.
"""

from __future__ import annotations

from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Any, Dict

from cli.lib.claude_sessions import find_claude_sessions_dirs
from cli.lib.session_queue import uploaded_log_path

_DEFAULT_WINDOW_DAYS = 7
_DEFAULT_THRESHOLD = 3


def _parse_uploaded_log_count(log_path: Path, cutoff: datetime) -> int:
    if not log_path.exists():
        return 0
    count = 0
    try:
        text = log_path.read_text(encoding="utf-8")
    except OSError:
        return 0
    for line in text.splitlines():
        ts_str, sep, _ = line.partition("\t")
        if not sep:
            continue
        try:
            ts = datetime.strptime(ts_str.strip(), "%Y-%m-%dT%H:%M:%SZ").replace(
                tzinfo=timezone.utc
            )
        except ValueError:
            continue
        if ts >= cutoff:
            count += 1
    return count


def _count_recent_session_files(workspace: Path, cutoff: datetime) -> int:
    count = 0
    for d in find_claude_sessions_dirs(workspace):
        try:
            iterator = d.glob("*.jsonl")
        except OSError:
            continue
        for f in iterator:
            try:
                mtime_ts = f.stat().st_mtime
            except OSError:
                continue
            mtime = datetime.fromtimestamp(mtime_ts, tz=timezone.utc)
            if mtime >= cutoff:
                count += 1
    return count


def capture_session_health(
    workspace: Path,
    window_days: int = _DEFAULT_WINDOW_DAYS,
    threshold: int = _DEFAULT_THRESHOLD,
    now: datetime | None = None,
) -> Dict[str, Any]:
    """Return a diagnose-shaped check dict for capture-session health.

    Status values:
    - ``ok`` ``expected - actual`` is within the threshold.
    - ``warning`` observed SessionStart events that capture-session
      didn't write to the uploaded log; likely broken end-to-end.
    - ``info`` no SessionStart events in the window (no signal).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=window_days)
    expected = _count_recent_session_files(workspace, cutoff)
    actual = _parse_uploaded_log_count(uploaded_log_path(workspace), cutoff)
    delta = expected - actual
    base: Dict[str, Any] = {
        "name": "capture-session",
        "expected_sessions": expected,
        "uploaded_entries": actual,
        "window_days": window_days,
    }
    if expected == 0:
        return {
            **base,
            "status": "info",
            "detail": (
                f"no Claude Code sessions observed in the last {window_days}d "
                "for this workspace — nothing to verify"
            ),
        }
    if delta > threshold:
        return {
            **base,
            "status": "warning",
            "detail": (
                f"{expected} SessionStart event(s) in the last {window_days}d "
                f"but only {actual} entries in agnes-sessions-uploaded.txt — "
                "capture-session may be silently failing. Try: "
                "`agnes capture-session --verbose` against a session jsonl"
            ),
        }
    return {**base, "status": "ok"}
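
The comparison in `capture_session_health` is strict (`delta > threshold`), so a gap exactly equal to the threshold still reports `ok`, and a negative gap (more uploaded entries than session files) is also `ok`. The stripped-down restatement below isolates that decision order for a few boundary cases; `classify` is a hypothetical name used only for this illustration and does not exist in the module.

```python
def classify(expected: int, actual: int, threshold: int = 3) -> str:
    """Mirror the status decision order: info first, then warning, else ok."""
    if expected == 0:
        return "info"      # no sessions observed, nothing to verify
    if expected - actual > threshold:
        return "warning"   # gap strictly exceeds the threshold
    return "ok"


print(classify(0, 0))   # info
print(classify(5, 2))   # ok: delta == 3 is not > 3
print(classify(6, 2))   # warning: delta == 4
print(classify(2, 9))   # ok: negative delta
```
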


@ -1,6 +1,6 @@
[project]
name = "agnes-the-ai-analyst"
-version = "0.53.2"
+version = "0.53.3"
description = "Agnes — AI Data Analyst platform for AI analytical systems"
requires-python = ">=3.11,<3.14"
license = "MIT"
@ -101,6 +101,14 @@ dependencies = [
    # a uniform API; OS releases the lock automatically on process exit (no
    # stale-lock detection required). Used by cli/lib/push_lock.py.
    "filelock>=3.13,<4",
    # Transitive dependency hardened directly to dodge 5 dependabot advisories
    # (4 high, 1 medium) flagged on urllib3<2.7.0: cross-origin sensitive
    # header leak on proxied low-level redirects, decompression-bomb bypass
    # + unbounded decompression chain on the streaming API, redirects-when-
    # retries-disabled. Forced via `[tool.uv] override-dependencies` below
    # because kbcstorage<=0.9.5 still pins urllib3<2.0.0 even though
    # botocore/requests/google-cloud-* all support 2.x on Python 3.10+.
    "urllib3>=2.7.0",
]

[project.optional-dependencies]
@ -148,3 +156,11 @@ dev-dependencies = [
    "openai>=1.30.0",
    "fastapi-debug-toolbar>=0.6.3",
]
# Override the urllib3<2.0.0 ceiling kbcstorage 0.9.5 declares (upstream
# hasn't relaxed it as of 2026-05-12 but the SDK works fine against
# urllib3 2.x in practice — we only use `Client` + `Tables` from it and
# both go through `requests`, which natively supports both lines). Lets
# the resolver pick a urllib3 line that closes Dependabot advisories
# CVE-2024-37891 / CVE-2025-{xxx}. See `urllib3>=2.7.0` in [project]
# dependencies above for the security rationale.
override-dependencies = ["urllib3>=2.7.0"]


@ -0,0 +1,206 @@
"""Regression coverage for cli.lib.session_health.capture_session_health.

Issue #244 — flag silently-broken `agnes capture-session` by comparing
session files in `~/.claude/projects/<encoded>/` against entries in
`<workspace>/.claude/agnes-sessions-uploaded.txt` within a sliding
window.
"""

from __future__ import annotations

import os
import time
from datetime import datetime, timedelta, timezone
from pathlib import Path

import pytest


def _set_home(monkeypatch, tmp_path):
    """Override the module-level ``_PROJECTS_DIR`` (evaluated once at
    import via ``Path.home()``) so the check reads from a controlled
    ``~/.claude/projects/`` tree under tmp_path."""
    import cli.lib.claude_sessions as cs
    monkeypatch.setattr(cs, "_PROJECTS_DIR", tmp_path / ".claude" / "projects")


def _make_session_file(home: Path, workspace: Path, name: str, age_days: float) -> Path:
    """Write an empty jsonl into one of the candidate encoded dirs and
    backdate its mtime."""
    # Use variant-a encoding (slash→dash) — matches the macOS-friendly
    # form cli/lib/claude_sessions.py emits first.
    encoded = str(workspace.resolve()).replace("/", "-")
    target = home / ".claude" / "projects" / encoded
    target.mkdir(parents=True, exist_ok=True)
    f = target / name
    f.write_text("{}\n", encoding="utf-8")
    # Backdate mtime
    age = time.time() - (age_days * 86400)
    os.utime(f, (age, age))
    return f


def _append_uploaded_log(workspace: Path, when: datetime, transcript_path: str) -> None:
    (workspace / ".claude").mkdir(parents=True, exist_ok=True)
    log = workspace / ".claude" / "agnes-sessions-uploaded.txt"
    line = f"{when.strftime('%Y-%m-%dT%H:%M:%SZ')}\t{transcript_path}\n"
    with open(log, "a", encoding="utf-8") as f:
        f.write(line)


def test_no_sessions_returns_info(tmp_path, monkeypatch):
    """Fresh workspace with no SessionStart events → info, not warning."""
    workspace = tmp_path / "ws"
    workspace.mkdir()
    _set_home(monkeypatch, tmp_path / "home")
    (tmp_path / "home").mkdir()
    from cli.lib.session_health import capture_session_health
    r = capture_session_health(workspace)
    assert r["status"] == "info"
    assert r["expected_sessions"] == 0
    assert r["uploaded_entries"] == 0


def test_aligned_counts_returns_ok(tmp_path, monkeypatch):
    """SessionStart events match uploaded-log entries → ok."""
    workspace = tmp_path / "ws"
    workspace.mkdir()
    home = tmp_path / "home"
    home.mkdir()
    _set_home(monkeypatch, home)
    # 3 recent sessions
    for i in range(3):
        _make_session_file(home, workspace, f"s{i}.jsonl", age_days=2)
    now = datetime.now(timezone.utc)
    for i in range(3):
        _append_uploaded_log(workspace, now - timedelta(days=2, hours=i),
                             f"/path/s{i}.jsonl")
    from cli.lib.session_health import capture_session_health
    r = capture_session_health(workspace)
    assert r["status"] == "ok"
    assert r["expected_sessions"] == 3
    assert r["uploaded_entries"] == 3


def test_silent_breakage_returns_warning(tmp_path, monkeypatch):
    """SessionStart events ≫ uploaded entries (delta > threshold) → warning."""
    workspace = tmp_path / "ws"
    workspace.mkdir()
    home = tmp_path / "home"
    home.mkdir()
    _set_home(monkeypatch, home)
    # 10 recent SessionStart events
    for i in range(10):
        _make_session_file(home, workspace, f"s{i}.jsonl", age_days=2)
    # only 2 uploads — capture-session silently dropped 8
    now = datetime.now(timezone.utc)
    for i in range(2):
        _append_uploaded_log(workspace, now - timedelta(days=1), f"/p{i}.jsonl")
    from cli.lib.session_health import capture_session_health
    r = capture_session_health(workspace)
    assert r["status"] == "warning"
    assert r["expected_sessions"] == 10
    assert r["uploaded_entries"] == 2
    assert "capture-session may be silently failing" in r["detail"]


def test_older_sessions_outside_window_ignored(tmp_path, monkeypatch):
    """Sessions outside the window must not count toward expected."""
    workspace = tmp_path / "ws"
    workspace.mkdir()
    home = tmp_path / "home"
    home.mkdir()
    _set_home(monkeypatch, home)
    # 5 ancient sessions (60d ago) + 1 recent
    for i in range(5):
        _make_session_file(home, workspace, f"old{i}.jsonl", age_days=60)
    _make_session_file(home, workspace, "recent.jsonl", age_days=2)
    now = datetime.now(timezone.utc)
    _append_uploaded_log(workspace, now - timedelta(days=2), "/p/recent.jsonl")
    from cli.lib.session_health import capture_session_health
    r = capture_session_health(workspace, window_days=7)
    assert r["status"] == "ok"
    assert r["expected_sessions"] == 1
    assert r["uploaded_entries"] == 1


def test_uploaded_entries_outside_window_ignored(tmp_path, monkeypatch):
    """Old uploaded-log entries don't count even if SessionStart count is high."""
    workspace = tmp_path / "ws"
    workspace.mkdir()
    home = tmp_path / "home"
    home.mkdir()
    _set_home(monkeypatch, home)
    for i in range(10):
        _make_session_file(home, workspace, f"s{i}.jsonl", age_days=1)
    # 8 uploads but ancient — outside window
    now = datetime.now(timezone.utc)
    for i in range(8):
        _append_uploaded_log(workspace, now - timedelta(days=60),
                             f"/p{i}.jsonl")
    from cli.lib.session_health import capture_session_health
    r = capture_session_health(workspace, window_days=7)
    assert r["status"] == "warning"
    assert r["expected_sessions"] == 10
    assert r["uploaded_entries"] == 0


def test_threshold_respected(tmp_path, monkeypatch):
    """Delta within threshold stays ok (a couple unsynced sessions is fine)."""
    workspace = tmp_path / "ws"
    workspace.mkdir()
    home = tmp_path / "home"
    home.mkdir()
    _set_home(monkeypatch, home)
    for i in range(5):
        _make_session_file(home, workspace, f"s{i}.jsonl", age_days=1)
    now = datetime.now(timezone.utc)
    # 3 uploads of 5 events → delta=2, threshold=3 → still ok
    for i in range(3):
        _append_uploaded_log(workspace, now - timedelta(days=1), f"/p{i}.jsonl")
    from cli.lib.session_health import capture_session_health
    r = capture_session_health(workspace, window_days=7, threshold=3)
    assert r["status"] == "ok"
    assert r["expected_sessions"] == 5
    assert r["uploaded_entries"] == 3


def test_malformed_uploaded_log_lines_skipped(tmp_path, monkeypatch):
    """Garbage in uploaded-log doesn't crash the check; only well-formed
    timestamped lines count."""
    workspace = tmp_path / "ws"
    workspace.mkdir()
    (workspace / ".claude").mkdir()
    home = tmp_path / "home"
    home.mkdir()
    _set_home(monkeypatch, home)
    for i in range(3):
        _make_session_file(home, workspace, f"s{i}.jsonl", age_days=1)
    log = workspace / ".claude" / "agnes-sessions-uploaded.txt"
    now = datetime.now(timezone.utc)
    log.write_text(
        "totally bogus line\n"
        "\n"  # blank
        "no-tab-just-a-path\n"
        f"{(now - timedelta(days=1)).strftime('%Y-%m-%dT%H:%M:%SZ')}\t/p.jsonl\n"
        "not-a-timestamp\tstill-has-a-tab\n",
        encoding="utf-8",
    )
    from cli.lib.session_health import capture_session_health
    r = capture_session_health(workspace, window_days=7, threshold=3)
    assert r["expected_sessions"] == 3
    assert r["uploaded_entries"] == 1
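
These tests depend on `os.utime` to place session files inside or outside the window, because the health check reads `st_mtime` rather than anything inside the jsonl. The standalone sketch below shows that interaction using only the standard library; `backdate` and `is_recent` are illustrative names, and `is_recent` is an assumed restatement of the mtime comparison, not the project's helper.

```python
import os
import tempfile
import time
from datetime import datetime, timedelta, timezone
from pathlib import Path


def backdate(path: Path, age_days: float) -> None:
    """Set both atime and mtime to age_days before the current time."""
    stamp = time.time() - age_days * 86400
    os.utime(path, (stamp, stamp))


def is_recent(path: Path, window_days: int, now: datetime) -> bool:
    """True when the file's mtime falls inside the sliding window."""
    mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    return mtime >= now - timedelta(days=window_days)


with tempfile.TemporaryDirectory() as root:
    fresh = Path(root) / "fresh.jsonl"
    stale = Path(root) / "stale.jsonl"
    for f in (fresh, stale):
        f.write_text("{}\n", encoding="utf-8")
    backdate(fresh, age_days=2)    # inside a 7-day window
    backdate(stale, age_days=60)   # well outside it
    now = datetime.now(timezone.utc)
    results = (is_recent(fresh, 7, now), is_recent(stale, 7, now))

print(results)
```

Choosing ages well away from the 7-day boundary, as the tests above do with 1, 2, and 60 days, keeps the outcome independent of how long the test itself takes to run.
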


@ -590,15 +590,26 @@ class TestStoreSecurityFixes:
        assert r2.status_code == 409, r2.text
        assert r2.json()["detail"] == "conflict_global_suffix"

-    def test_scratch_dir_cleaned_up_after_failed_extraction(self, web_client, monkeypatch):
+    def test_scratch_dir_cleaned_up_after_failed_extraction(self, web_client, monkeypatch, tmp_path):
        """Devin: ZIP-validation failure inside _safe_zip_extract was leaving
        the ``agnes_store_*`` scratch dir on disk because scratch creation
        and cleanup lived in different try/finally scopes. After the fix
        both share one outer try/finally; assert the dir really is gone.

        Issue #252: redirect ``tempfile.mkdtemp()`` to a per-test ``tmp_path``
        via ``monkeypatch.setattr(tempfile, "tempdir", ...)`` so the
        ``agnes_store_*`` glob is scoped to this test's exclusive directory.
        Pre-#252 the glob ran against the shared system tmp and would flake
        when a sibling pytest-xdist worker's store test happened to be
        mid-creation inside the [before, after] window.
        """
        import tempfile as _tempfile
        from pathlib import Path as _Path

        # FastAPI app runs in-process under TestClient → patching the
        # tempfile module here also redirects the server-side mkdtemp call.
        monkeypatch.setattr(_tempfile, "tempdir", str(tmp_path))

        # A ZIP whose only member traverses out of the destination —
        # _safe_zip_extract raises 422 zip_unsafe_path before it touches
        # extractall. That's the simplest trigger that exits via
@ -608,7 +619,7 @@ class TestStoreSecurityFixes:
            zf.writestr("../escape.txt", "boom")
        bad_zip = buf.getvalue()

-        tmp_root = _Path(_tempfile.gettempdir())
+        tmp_root = tmp_path
        before = {p.name for p in tmp_root.glob("agnes_store_*")}

        _, cookies = _create_user(web_client, "leak@x.com")

uv.lock

File diff suppressed because it is too large