feat(profile): /profile/sessions page + audit on detector exception + correct SCHEDULER_AUDIT_ACTIONS

Three changes addressing user feedback during e2e test of #179 + Devin Review on e86dd5ed.

1) /profile/sessions — new self-service user page in the user menu.
   Lists all session jsonls the caller uploaded via `agnes push` joined
   against session_extraction_state. Each row shows uploaded_at, file
   size, status badge (pending/processed/extracted), processed_at, and
   items_extracted. The page docstring + help text explicitly call out
   that items_extracted=0 means the verification detector ran fine but
   the LLM found no claims to track — that's the documented "no items"
   outcome, not a broken pipeline. Closes the gap surfaced during the
   e2e test of #176 where a user could see their sessions on disk and
   process them through the LLM but had no UI to inspect what happened.

2) run_verification_detector audits unhandled exceptions (Devin #1).
   If detector.run() threw anything other than the already-translated
   ValueError, the audit_log row was never written. The endpoint now
   wraps detector.run in try/except, records the exception in
   audit_params["unhandled_error"], then re-raises as 500 after audit.
   The /admin/scheduler-runs page surfaces the failure row with the
   error type + message.

3) SCHEDULER_AUDIT_ACTIONS list corrected (Devin #2). Previous list
   had "marketplaces_sync_all" (wrong — actual is "marketplace.sync_all")
   plus "data_refresh" and "scripts_run_due" which app/api/sync.py and
   app/api/scripts.py don't write to audit_log. Fixed to the four
   actually-logged strings; comment points at the missing audit calls
   as a follow-up.

Tests: tests/test_web_ui.py adds TestAdminRoleGuards::test_profile_sessions_page_no_admin_required and tightens test_admin_scheduler_runs_page_admin_only to assert the correct marketplace.sync_all string.
ZdenekSrotyr 2026-05-05 08:57:35 +02:00
parent e86dd5edc5
commit 4c4dfee8e6
6 changed files with 226 additions and 10 deletions


@ -45,10 +45,13 @@ Five-defect fix for the silently-broken session pipeline on default Compose depl
 - **#179 review — `/api/admin/run-session-collector` would SystemExit the worker.** The endpoint called `collector.main()`, whose `argparse.parse_args()` parsed uvicorn's `sys.argv` (`['app.main:app', '--host', …]`) and called `sys.exit(2)` on the unrecognised flags. `SystemExit` inherits from `BaseException`, escapes FastAPI's exception machinery, and propagates through the thread pool — every scheduler tick that fired the endpoint either 500-ed or risked killing the uvicorn worker. Fix: `services/session_collector/collector.py` now exposes an argv-free `run(dry_run, verbose) -> (rc, stats)` helper; `main()` is a thin CLI shim around it and the admin endpoint calls `run()` directly. Audit log now carries the per-run stats (`users_processed`, `files_copied`, `files_skipped`) instead of just the rc. Regression tests in `tests/test_session_collector.py::TestRunHelper`.
 - **#179 review — `python -m services.corporate_memory` crashed on missing LLM config instead of exiting cleanly.** The PR's fail-fast change made `collect_all()` raise `ValueError` when neither an `ai:` block nor `ANTHROPIC_API_KEY`/`LLM_API_KEY` was available. The `verification_detector` CLI was updated to catch it; the corporate-memory CLI was missed. Now also wrapped — operators get a one-line `Corporate Memory cannot run: <factory message>` on stderr and rc=1 instead of a raw traceback. Regression test in `tests/test_llm_connector.py::TestCorporateMemoryCollector::test_main_returns_1_on_no_ai_config_instead_of_traceback`.
 - **E2E test — Anthropic API rejected every extraction request.** The structured-output API now requires `additionalProperties: false` on every `{"type": "object"}` node in the json_schema; without it the API returns 400 `invalid_request_error` ("output_config.format.schema: For 'object' type, 'additionalProperties' must be explicitly set to false"). Surfaced on a real BQ-backed deploy: every uploaded session jsonl failed verification-extraction in a tight retry loop. Fix: `connectors/llm/anthropic_provider.py` now wraps the caller-supplied schema through a recursive `_strict_json_schema()` walker that adds the field where missing (preserving any explicit operator override), then passes the strict variant to the API. Six unit tests in `tests/test_llm_connector.py::TestStrictJsonSchema` pin the recursion across nested objects, array items, and the no-mutation invariant.
+- **#179 review — `/api/admin/run-verification-detector` skipped audit on unhandled exceptions.** If `detector.run()` threw anything other than the already-translated `ValueError` (DuckDB lock, network blip, unexpected SDK error), the audit_log row was never written — the operator's only signal was `docker logs agnes-scheduler-1`. The endpoint now wraps `detector.run` in try/except, records the exception in `audit_params["unhandled_error"]`, then re-raises as 500 after audit. The `/admin/scheduler-runs` page surfaces the failure row with the error type and message.
+- **#179 review — `SCHEDULER_AUDIT_ACTIONS` listed action strings that don't actually appear in `audit_log`.** The list at `app/web/router.py:952` had `"marketplaces_sync_all"` (wrong — actual is `"marketplace.sync_all"`) plus `"data_refresh"` and `"scripts_run_due"` (which `app/api/sync.py` and `app/api/scripts.py` don't write). Corrected to the four actually-logged strings, with a comment pointing at the missing audit calls in sync/scripts as a follow-up.
 ### Added
-- `/admin/scheduler-runs` — read-only admin page showing the last 200 audit-log entries from scheduler-driven actions (`run_session_collector`, `run_verification_detector`, `run_corporate_memory`, `marketplaces_sync_all`, `data_refresh`, `scripts_run_due`). New `AuditRepository.query_actions(actions, limit)` query helper, new admin nav entry under the Admin dropdown. Failed scheduler ticks (HTTP 401, network errors) don't reach the audit_log — those still live only in `docker logs agnes-scheduler-1`; the page calls that out with a hint to set `SCHEDULER_API_TOKEN` if no rows show up.
+- `/admin/scheduler-runs` — read-only admin page showing the last 200 audit-log entries from scheduler-driven actions (`run_session_collector`, `run_verification_detector`, `run_corporate_memory`, `marketplace.sync_all`). New `AuditRepository.query_actions(actions, limit)` query helper, new admin nav entry under the Admin dropdown. `data-refresh` (`POST /api/sync/trigger`) and `script-runner` (`POST /api/scripts/run-due`) are scheduler jobs but don't write to `audit_log` today, so they can't appear here yet. Failed scheduler ticks (HTTP 401, network errors) don't reach the audit_log either — those still live only in `docker logs agnes-scheduler-1`; the page calls that out with a hint to set `SCHEDULER_API_TOKEN` if no rows show up.
+- `/profile/sessions` — self-service user page in the user menu, showing all session jsonls the caller uploaded via `agnes push` joined against `session_extraction_state`. Each row shows uploaded_at, file size, status badge (`pending` / `processed` / `extracted`), processed_at, and `items_extracted`. The page docstring explicitly calls out that `items_extracted = 0` means the verification detector ran successfully but the LLM found no claims worth tracking — that's the documented "no items" outcome, not a broken pipeline. Closes the gap surfaced during the e2e test of #176 where a user could see their sessions on disk and process them through the LLM but had no UI to inspect what happened.
 ### Internal
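
The `_strict_json_schema()` walker named in the Anthropic changelog entry above is not part of this diff. The following is only a minimal sketch of how such a walker might behave, assuming it builds a fresh copy of the schema and fills in `additionalProperties` only where the key is absent. The name `strict_json_schema` and the traversal details are illustrative, not the code in `connectors/llm/anthropic_provider.py`.

```python
from typing import Any


def strict_json_schema(schema: Any) -> Any:
    """Return a copy of `schema` with additionalProperties=False on object nodes.

    Assumption for this sketch: every dict with "type": "object" is treated as
    a schema node, and an explicitly set additionalProperties is left alone.
    """
    if isinstance(schema, list):
        return [strict_json_schema(item) for item in schema]
    if not isinstance(schema, dict):
        return schema  # strings, numbers, booleans, None are immutable leaves
    # Rebuild the dict so the caller's schema is never mutated.
    out = {key: strict_json_schema(value) for key, value in schema.items()}
    if out.get("type") == "object":
        out.setdefault("additionalProperties", False)
    return out


schema = {
    "type": "object",
    "properties": {
        "claims": {
            "type": "array",
            "items": {"type": "object", "properties": {"text": {"type": "string"}}},
        },
        "meta": {"type": "object", "additionalProperties": True},
    },
}
strict = strict_json_schema(schema)
```

Rebuilding each dict instead of editing in place is what gives the no-mutation property the changelog mentions; `setdefault` is what preserves an explicit override such as `meta` above.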


@ -2860,23 +2860,39 @@ def run_verification_detector(
         raise HTTPException(status_code=500, detail=str(e))
     job_conn = get_system_db()
+    stats: dict = {}
+    job_error: Optional[Exception] = None
     try:
         stats = detector.run(job_conn, extractor, dry_run=False)
+    except Exception as e:
+        # Capture and re-raise after audit so an unhandled detector error
+        # (DuckDB lock, network blip, unexpected SDK type) still leaves a
+        # row in audit_log — the /admin/scheduler-runs page is the
+        # operator's only signal beyond docker logs.
+        job_error = e
     finally:
         try:
             job_conn.close()
         except Exception:
             pass
+    audit_params: dict = {
+        "items_created": stats.get("items_created", 0),
+        "errors": len(stats.get("errors", [])),
+    }
+    if job_error is not None:
+        audit_params["unhandled_error"] = f"{type(job_error).__name__}: {job_error}"
     AuditRepository(conn).log(
         user_id=user.get("id"),
         action="run_verification_detector",
         resource="job:verification-detector",
-        params={
-            "items_created": stats.get("items_created", 0),
-            "errors": len(stats.get("errors", [])),
-        },
+        params=audit_params,
     )
+    if job_error is not None:
+        raise HTTPException(status_code=500, detail=audit_params["unhandled_error"])
     return {"ok": not stats.get("errors"), "details": stats}
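
The capture, audit, re-raise ordering in the hunk above can be tried in isolation. A minimal sketch, assuming an in-memory list stands in for `audit_log` and a plain exception class stands in for `HTTPException`; `run_with_audit` and `JobFailed` are hypothetical names that do not exist in the codebase.

```python
from typing import Callable, Optional


class JobFailed(RuntimeError):
    """Stand-in for HTTPException(status_code=500) in this sketch."""


def run_with_audit(job: Callable[[], dict], audit_log: list) -> dict:
    """Run `job`, always append exactly one audit row, then surface any failure."""
    stats: dict = {}
    job_error: Optional[Exception] = None
    try:
        stats = job()
    except Exception as e:  # broad on purpose: the audit row must be written either way
        job_error = e

    params = {
        "items_created": stats.get("items_created", 0),
        "errors": len(stats.get("errors", [])),
    }
    if job_error is not None:
        params["unhandled_error"] = f"{type(job_error).__name__}: {job_error}"
    audit_log.append({"action": "run_verification_detector", "params": params})

    # Only after the row exists is the failure reported to the caller.
    if job_error is not None:
        raise JobFailed(params["unhandled_error"]) from job_error
    return {"ok": not stats.get("errors"), "details": stats}
```

Because the handler catches `Exception` rather than `BaseException`, a `SystemExit` raised inside the job would still bypass the audit row; that matches the earlier `SystemExit` issue described in the changelog.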


@ -5,7 +5,7 @@ Replicates all Flask webapp routes with DuckDB-backed data.
 import logging
 import os
-from datetime import datetime
+from datetime import datetime, timezone
 from pathlib import Path
 from typing import Optional
 from urllib.parse import quote
@ -949,13 +949,16 @@ async def admin_marketplaces_page(
 # Scheduler-driven admin actions audited by app/api/admin.py and
 # app/api/marketplaces.py. Keep in sync with the JOBS list in
 # services/scheduler/__main__.py.
+#
+# `data-refresh` (POST /api/sync/trigger) and `script-runner`
+# (POST /api/scripts/run-due) are scheduler jobs but they do NOT write
+# audit_log today, so they can't appear here. If you add audit calls to
+# those endpoints, add the matching action strings to this list.
 SCHEDULER_AUDIT_ACTIONS = [
     "run_session_collector",
     "run_verification_detector",
     "run_corporate_memory",
-    "marketplaces_sync_all",
-    "data_refresh",
-    "scripts_run_due",
+    "marketplace.sync_all",
 ]
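
Why the misspelled entry mattered: if the page filters `audit_log` by exact action name, a wrong string simply matches no rows and fails silently. A minimal sketch of that effect, assuming `sqlite3` as a stand-in for DuckDB and a simplified `audit_log(id, action)` table. The real schema and the body of `AuditRepository.query_actions` are not shown in this diff, so `query_actions` below is illustrative only.

```python
import sqlite3


def query_actions(conn: sqlite3.Connection, actions: list, limit: int = 200) -> list:
    """Return up to `limit` audit rows whose action is in `actions`, newest first."""
    if not actions:
        return []
    # One "?" per action keeps the values parameterised rather than interpolated.
    placeholders = ",".join("?" for _ in actions)
    sql = (
        f"SELECT id, action FROM audit_log "
        f"WHERE action IN ({placeholders}) ORDER BY id DESC LIMIT ?"
    )
    return conn.execute(sql, [*actions, limit]).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_log (id INTEGER PRIMARY KEY, action TEXT)")
conn.executemany(
    "INSERT INTO audit_log (action) VALUES (?)",
    [("run_session_collector",), ("marketplace.sync_all",)],
)

old_list = ["run_session_collector", "marketplaces_sync_all"]  # pre-fix spelling
new_list = ["run_session_collector", "marketplace.sync_all"]   # string actually logged

rows_before = query_actions(conn, old_list)  # the marketplace row is missing
rows_after = query_actions(conn, new_list)   # both rows come back
```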
@ -1110,6 +1113,76 @@ async def profile_page(
     return templates.TemplateResponse(request, "profile.html", ctx)
+@router.get("/profile/sessions", response_class=HTMLResponse)
+async def profile_sessions_page(
+    request: Request,
+    user: dict = Depends(get_current_user),
+    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
+):
+    """User-self-view of own uploaded sessions and their extraction state.
+    Walks `${DATA_DIR}/user_sessions/<user_id>/*.jsonl` for the caller's
+    own user_id, joins each file against `session_extraction_state` to
+    surface processed_at + items_extracted, and renders a table.
+    items_extracted = 0 means the verification_detector ran but the LLM
+    found no claims worth tracking. That's the documented "no items"
+    outcome; it does NOT mean the pipeline is broken.
+    """
+    import pathlib
+    user_id = user["id"]
+    data_dir = pathlib.Path(os.environ.get("DATA_DIR", "/data"))
+    user_sessions_dir = data_dir / "user_sessions" / user_id
+    files = []
+    if user_sessions_dir.is_dir():
+        for jsonl in sorted(user_sessions_dir.glob("*.jsonl"), key=lambda p: p.stat().st_mtime, reverse=True):
+            try:
+                stat = jsonl.stat()
+            except OSError:
+                continue
+            files.append({
+                "name": jsonl.name,
+                "size_bytes": stat.st_size,
+                "mtime": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
+            })
+    state_map: dict = {}
+    if files:
+        keys = [f"{user_id}/{f['name']}" for f in files]
+        placeholders = ",".join("?" for _ in keys)
+        rows = conn.execute(
+            f"""SELECT session_file, processed_at, items_extracted, file_hash
+                FROM session_extraction_state
+                WHERE session_file IN ({placeholders})""",
+            keys,
+        ).fetchall()
+        cols = [d[0] for d in conn.description]
+        for row in rows:
+            d = dict(zip(cols, row))
+            state_map[d["session_file"]] = d
+    rows_view = []
+    for f in files:
+        key = f"{user_id}/{f['name']}"
+        state = state_map.get(key)
+        rows_view.append({
+            "name": f["name"],
+            "size_kb": round(f["size_bytes"] / 1024, 1),
+            "uploaded_at": f["mtime"],
+            "processed_at": state["processed_at"] if state else None,
+            "items_extracted": state["items_extracted"] if state else None,
+            "is_processed": state is not None,
+        })
+    ctx = _build_context(
+        request,
+        user=user,
+        sessions=rows_view,
+        user_id=user_id,
+    )
+    return templates.TemplateResponse(request, "profile_sessions.html", ctx)
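
One detail in `profile_sessions_page` above: the sort key calls `p.stat()` outside the `try`, so a file deleted between `glob` and the sort would raise instead of being skipped. A possible hardening, offered as a sketch rather than something this commit does, is to stat each file once and sort the collected records afterwards. `list_session_files` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from pathlib import Path


def list_session_files(user_sessions_dir: Path) -> list:
    """Collect name/size/mtime for each *.jsonl, newest first, skipping vanished files."""
    files: list = []
    if not user_sessions_dir.is_dir():
        return files
    for jsonl in user_sessions_dir.glob("*.jsonl"):
        try:
            stat = jsonl.stat()  # the only stat call, and it is guarded
        except OSError:
            continue
        files.append({
            "name": jsonl.name,
            "size_bytes": stat.st_size,
            "mtime": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
        })
    # Sort on the timestamps already captured, so no further filesystem access happens.
    files.sort(key=lambda f: f["mtime"], reverse=True)
    return files
```

The returned dicts use the same keys as the `files` list in the route, so the rest of the function would not need to change under this assumption.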
 @router.get("/_debug/throw/http/{code:int}", response_class=HTMLResponse, include_in_schema=False)
 async def _debug_throw_http(request: Request, code: int):
     """Dev helper — raise an HTTPException with the given status code.


@ -58,7 +58,8 @@
 <div class="app-user-menu-role">Admin</div>
 {% endif %}
 </div>
-<a class="app-user-menu-item {% if _path.startswith('/profile') %}is-active{% endif %}" role="menuitem" href="/profile">Profile</a>
+<a class="app-user-menu-item {% if _path == '/profile' %}is-active{% endif %}" role="menuitem" href="/profile">Profile</a>
+<a class="app-user-menu-item {% if _path == '/profile/sessions' %}is-active{% endif %}" role="menuitem" href="/profile/sessions">My sessions</a>
 <a class="app-user-menu-item {% if _path == '/tokens' %}is-active{% endif %}" role="menuitem" href="/tokens">My tokens</a>
 {% if config.DEBUG_AUTH_ENABLED %}
 <a class="app-user-menu-item {% if _path.startswith('/me/debug') %}is-active{% endif %}" role="menuitem" href="/me/debug">Auth debug</a>


@ -0,0 +1,109 @@
{% extends "base.html" %}
{% block title %}My sessions — {{ config.INSTANCE_NAME }}{% endblock %}
{% block content %}
<style>
.container:has(.sess-page) { max-width: none; padding: 24px 16px; }
.sess-page { max-width: 1200px; margin: 0 auto; padding: 0; }
.sess-title { margin: 0 0 8px 0; font-size: 22px; font-weight: 600; }
.sess-help { color: var(--text-secondary, #6b7280); font-size: 13px; margin-bottom: 20px; line-height: 1.55; }
.sess-help code { background: var(--border-light, #f3f4f6); padding: 1px 6px; border-radius: 4px; font-size: 12px; }
.sess-table-wrap {
background: var(--surface, #fff);
border: 1px solid var(--border, #e5e7eb);
border-radius: 12px;
overflow-x: auto;
}
.sess-table { width: 100%; border-collapse: collapse; font-size: 13px; }
.sess-table thead th {
text-align: left; padding: 12px 16px;
background: var(--border-light, #f9fafb);
border-bottom: 1px solid var(--border, #e5e7eb);
font-weight: 600; color: var(--text-secondary, #6b7280);
font-size: 11px; text-transform: uppercase; letter-spacing: 0.4px;
white-space: nowrap;
}
.sess-table tbody td {
padding: 10px 16px;
border-bottom: 1px solid var(--border-light, #f3f4f6);
vertical-align: middle;
}
.sess-table tbody tr:last-child td { border-bottom: none; }
.sess-table tbody tr:hover { background: var(--border-light, #fafafa); }
.sess-table .name { font-family: ui-monospace, SFMono-Regular, Menlo, monospace; font-size: 12px; }
.sess-table .ts { white-space: nowrap; color: var(--text-secondary, #6b7280); font-variant-numeric: tabular-nums; }
.sess-table .num { text-align: right; font-variant-numeric: tabular-nums; }
.badge {
display: inline-block; padding: 2px 8px; border-radius: 999px;
font-size: 11px; font-weight: 500;
}
.badge-pending { background: #fef3c7; color: #92400e; }
.badge-processed { background: #dbeafe; color: #1e40af; }
.badge-extracted { background: #d1fae5; color: #065f46; }
.empty {
padding: 40px 16px; text-align: center;
color: var(--text-secondary, #6b7280); font-size: 13px;
}
</style>
<div class="sess-page">
<h1 class="sess-title">My sessions</h1>
<p class="sess-help">
Sessions you uploaded via <code>agnes push</code> from your Claude Code workspace, with
extraction status from <code>session_extraction_state</code>.
<br>
<strong>Items extracted = 0</strong> means the verification detector ran successfully
but the LLM didn't find anything worth tracking in that session — that's expected for
sessions that are mostly tool calls or coding without confident factual claims.
<br>
<em>Pending</em> means the file is on disk but the scheduler hasn't processed it yet
(next verification-detector tick: every 15 min by default).
</p>
<div class="sess-table-wrap">
{% if sessions %}
<table class="sess-table">
<thead>
<tr>
<th>Session file</th>
<th>Uploaded</th>
<th class="num">Size</th>
<th>Status</th>
<th>Processed</th>
<th class="num">Items extracted</th>
</tr>
</thead>
<tbody>
{% for s in sessions %}
<tr>
<td class="name">{{ s.name }}</td>
<td class="ts">{{ s.uploaded_at.strftime("%Y-%m-%d %H:%M:%S UTC") }}</td>
<td class="num">{{ s.size_kb }} kB</td>
<td>
{% if not s.is_processed %}
<span class="badge badge-pending">pending</span>
{% elif s.items_extracted and s.items_extracted > 0 %}
<span class="badge badge-extracted">extracted</span>
{% else %}
<span class="badge badge-processed">processed</span>
{% endif %}
</td>
<td class="ts">
{% if s.processed_at %}{{ s.processed_at.strftime("%Y-%m-%d %H:%M:%S UTC") }}{% endif %}
</td>
<td class="num">
{% if s.items_extracted is not none %}{{ s.items_extracted }}{% endif %}
</td>
</tr>
{% endfor %}
</tbody>
</table>
{% else %}
<div class="empty">
No session files yet. Run <code>agnes push</code> from a Claude Code workspace
where <code>agnes init</code> installed the SessionEnd hook.
</div>
{% endif %}
</div>
</div>
{% endblock %}
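
The badge logic in the template above reduces to a three-way decision on `is_processed` and `items_extracted`. A small sketch of the same rule as a pure function, which could make it unit-testable outside Jinja; `session_status` is a hypothetical name and no such helper exists in this commit.

```python
from typing import Optional


def session_status(is_processed: bool, items_extracted: Optional[int]) -> str:
    """Mirror the template's badge choice: pending, extracted, or processed."""
    if not is_processed:
        # File is on disk but has no session_extraction_state row yet.
        return "pending"
    if items_extracted and items_extracted > 0:
        # The detector ran and produced at least one item.
        return "extracted"
    # The detector ran and found nothing: the documented "no items" outcome.
    return "processed"
```

Note that a processed session with `items_extracted = 0` maps to `processed`, not `pending`, which is the distinction the page's help text is trying to make visible.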


@ -338,6 +338,20 @@ class TestAdminRoleGuards:
         assert b"run_session_collector" in r.content
         assert b"run_verification_detector" in r.content
         assert b"run_corporate_memory" in r.content
+        # Devin Review on e86dd5ed: list must use the actual logged action
+        # string, not a guess.
+        assert b"marketplace.sync_all" in r.content
+    def test_profile_sessions_page_no_admin_required(self, web_client, analyst_cookie, admin_cookie):
+        """The /profile/sessions page is gated by get_current_user, not require_admin —
+        every authenticated user views their own sessions."""
+        r = web_client.get("/profile/sessions", follow_redirects=False)
+        assert r.status_code in (302, 401, 403)
+        r = web_client.get("/profile/sessions", cookies=analyst_cookie, follow_redirects=False)
+        assert r.status_code == 200
+        assert b"My sessions" in r.content
+        r = web_client.get("/profile/sessions", cookies=admin_cookie, follow_redirects=False)
+        assert r.status_code == 200
 class TestUnauthenticatedHtmlRedirects: