feat(profile): /profile/sessions page + audit on detector exception + correct SCHEDULER_AUDIT_ACTIONS

Three changes addressing user feedback during e2e test of #179 + Devin Review on e86dd5ed. 1) /profile/sessions — new self-service user page in the user menu. Lists all session jsonls the caller uploaded via `agnes push` joined against session_extraction_state. Each row shows uploaded_at, file size, status badge (pending/processed/extracted), processed_at, and items_extracted. The page docstring + help text explicitly call out that items_extracted=0 means the verification detector ran fine but the LLM found no claims to track — that's the documented "no items" outcome, not a broken pipeline. Closes the gap surfaced during the e2e test of #176 where a user could see their sessions on disk and process them through the LLM but had no UI to inspect what happened. 2) run_verification_detector audits unhandled exceptions (Devin #1). If detector.run() threw anything other than the already-translated ValueError, the audit_log row was never written. The endpoint now wraps detector.run in try/except, records the exception in audit_params["unhandled_error"], then re-raises as 500 after audit. The /admin/scheduler-runs page surfaces the failure row with the error type + message. 3) SCHEDULER_AUDIT_ACTIONS list corrected (Devin #2). Previous list had "marketplaces_sync_all" (wrong — actual is "marketplace.sync_all") plus "data_refresh" and "scripts_run_due" which app/api/sync.py and app/api/scripts.py don't write to audit_log. Fixed to the four actually-logged strings; comment points at the missing audit calls as a follow-up. Tests: tests/test_web_ui.py adds TestAdminRoleGuards::test_profile_sessions_page_no_admin_required and tightens test_admin_scheduler_runs_page_admin_only to assert the correct marketplace.sync_all string.
2026-05-05 08:57:35 +02:00 · 2026-05-05 08:57:35 +02:00 · 4c4dfee8e6
commit 4c4dfee8e6
parent e86dd5edc5
6 changed files with 226 additions and 10 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -45,10 +45,13 @@ Five-defect fix for the silently-broken session pipeline on default Compose depl
 - **#179 review — `/api/admin/run-session-collector` would SystemExit the worker.** The endpoint called `collector.main()`, whose `argparse.parse_args()` parsed uvicorn's `sys.argv` (`['app.main:app', '--host', …]`) and called `sys.exit(2)` on the unrecognised flags. `SystemExit` inherits from `BaseException`, escapes FastAPI's exception machinery, and propagates through the thread pool — every scheduler tick that fired the endpoint either 500-ed or risked killing the uvicorn worker. Fix: `services/session_collector/collector.py` now exposes an argv-free `run(dry_run, verbose) -> (rc, stats)` helper; `main()` is a thin CLI shim around it and the admin endpoint calls `run()` directly. Audit log now carries the per-run stats (`users_processed`, `files_copied`, `files_skipped`) instead of just the rc. Regression tests in `tests/test_session_collector.py::TestRunHelper`.
 - **#179 review — `python -m services.corporate_memory` crashed on missing LLM config instead of exiting cleanly.** The PR's fail-fast change made `collect_all()` raise `ValueError` when neither an `ai:` block nor `ANTHROPIC_API_KEY`/`LLM_API_KEY` was available. The `verification_detector` CLI was updated to catch it; the corporate-memory CLI was missed. Now also wrapped — operators get a one-line `Corporate Memory cannot run: <factory message>` on stderr and rc=1 instead of a raw traceback. Regression test in `tests/test_llm_connector.py::TestCorporateMemoryCollector::test_main_returns_1_on_no_ai_config_instead_of_traceback`.
 - **E2E test — Anthropic API rejected every extraction request.** The structured-output API now requires `additionalProperties: false` on every `{"type": "object"}` node in the json_schema; without it the API returns 400 `invalid_request_error` ("output_config.format.schema: For 'object' type, 'additionalProperties' must be explicitly set to false"). Surfaced on a real BQ-backed deploy: every uploaded session jsonl failed verification-extraction in a tight retry loop. Fix: `connectors/llm/anthropic_provider.py` now wraps the caller-supplied schema through a recursive `_strict_json_schema()` walker that adds the field where missing (preserving any explicit operator override), then passes the strict variant to the API. Six unit tests in `tests/test_llm_connector.py::TestStrictJsonSchema` pin the recursion across nested objects, array items, and the no-mutation invariant.
 - **#179 review — `/api/admin/run-verification-detector` skipped audit on unhandled exceptions.** If `detector.run()` threw anything other than the already-translated `ValueError` (DuckDB lock, network blip, unexpected SDK error), the audit_log row was never written — the operator's only signal was `docker logs agnes-scheduler-1`. The endpoint now wraps `detector.run` in try/except, records the exception in `audit_params["unhandled_error"]`, then re-raises as 500 after audit. The `/admin/scheduler-runs` page surfaces the failure row with the error type and message.
 - **#179 review — `SCHEDULER_AUDIT_ACTIONS` listed action strings that don't actually appear in `audit_log`.** The list at `app/web/router.py:952` had `"marketplaces_sync_all"` (wrong — actual is `"marketplace.sync_all"`) plus `"data_refresh"` and `"scripts_run_due"` (which `app/api/sync.py` and `app/api/scripts.py` don't write). Corrected to the four actually-logged strings, with a comment pointing at the missing audit calls in sync/scripts as a follow-up.
 ### Added
- `/admin/scheduler-runs` — read-only admin page showing the last 200 audit-log entries from scheduler-driven actions (`run_session_collector`, `run_verification_detector`, `run_corporate_memory`, `marketplaces_sync_all`, `data_refresh`, `scripts_run_due`). New `AuditRepository.query_actions(actions, limit)` query helper, new admin nav entry under the Admin dropdown. Failed scheduler ticks (HTTP 401, network errors) don't reach the audit_log — those still live only in `docker logs agnes-scheduler-1`; the page calls that out with a hint to set `SCHEDULER_API_TOKEN` if no rows show up.
+- `/admin/scheduler-runs` — read-only admin page showing the last 200 audit-log entries from scheduler-driven actions (`run_session_collector`, `run_verification_detector`, `run_corporate_memory`, `marketplace.sync_all`). New `AuditRepository.query_actions(actions, limit)` query helper, new admin nav entry under the Admin dropdown. `data-refresh` (`POST /api/sync/trigger`) and `script-runner` (`POST /api/scripts/run-due`) are scheduler jobs but don't write to `audit_log` today, so they can't appear here yet. Failed scheduler ticks (HTTP 401, network errors) don't reach the audit_log either — those still live only in `docker logs agnes-scheduler-1`; the page calls that out with a hint to set `SCHEDULER_API_TOKEN` if no rows show up.
 - `/profile/sessions` — self-service user page in the user menu, showing all session jsonls the caller uploaded via `agnes push` joined against `session_extraction_state`. Each row shows uploaded_at, file size, status badge (`pending` / `processed` / `extracted`), processed_at, and `items_extracted`. The page docstring explicitly calls out that `items_extracted = 0` means the verification detector ran successfully but the LLM found no claims worth tracking — that's the documented "no items" outcome, not a broken pipeline. Closes the gap surfaced during the e2e test of #176 where a user could see their sessions on disk and process them through the LLM but had no UI to inspect what happened.
 ### Internal
--- a/app/api/admin.py
+++ b/app/api/admin.py
@ -2860,23 +2860,39 @@ def run_verification_detector(
        raise HTTPException(status_code=500, detail=str(e))
    job_conn = get_system_db()
    stats: dict = {}
    job_error: Optional[Exception] = None
    try:
        stats = detector.run(job_conn, extractor, dry_run=False)
    except Exception as e:
        # Capture and re-raise after audit so an unhandled detector error
        # (DuckDB lock, network blip, unexpected SDK type) still leaves a
        # row in audit_log — the /admin/scheduler-runs page is the
        # operator's only signal beyond docker logs.
        job_error = e
    finally:
        try:
            job_conn.close()
        except Exception:
            pass
    audit_params: dict = {
        "items_created": stats.get("items_created", 0),
        "errors": len(stats.get("errors", [])),
    }
    if job_error is not None:
        audit_params["unhandled_error"] = f"{type(job_error).__name__}: {job_error}"
    AuditRepository(conn).log(
        user_id=user.get("id"),
        action="run_verification_detector",
        resource="job:verification-detector",
-        params={
+        params=audit_params,
            "items_created": stats.get("items_created", 0),
            "errors": len(stats.get("errors", [])),
        },
    )
    if job_error is not None:
        raise HTTPException(status_code=500, detail=audit_params["unhandled_error"])
    return {"ok": not stats.get("errors"), "details": stats}
--- a/app/web/router.py
+++ b/app/web/router.py
@ -5,7 +5,7 @@ Replicates all Flask webapp routes with DuckDB-backed data.
 import logging
 import os
-from datetime import datetime
+from datetime import datetime, timezone
 from pathlib import Path
 from typing import Optional
 from urllib.parse import quote
@ -949,13 +949,16 @@ async def admin_marketplaces_page(
 # Scheduler-driven admin actions audited by app/api/admin.py and
 # app/api/marketplaces.py. Keep in sync with the JOBS list in
 # services/scheduler/__main__.py.
 #
 # `data-refresh` (POST /api/sync/trigger) and `script-runner`
 # (POST /api/scripts/run-due) are scheduler jobs but they do NOT write
 # audit_log today, so they can't appear here. If you add audit calls to
 # those endpoints, add the matching action strings to this list.
 SCHEDULER_AUDIT_ACTIONS = [
    "run_session_collector",
    "run_verification_detector",
    "run_corporate_memory",
-    "marketplaces_sync_all",
+    "marketplace.sync_all",
    "data_refresh",
    "scripts_run_due",
 ]
@ -1110,6 +1113,76 @@ async def profile_page(
    return templates.TemplateResponse(request, "profile.html", ctx)
@router.get("/profile/sessions", response_class=HTMLResponse)
 async def profile_sessions_page(
    request: Request,
    user: dict = Depends(get_current_user),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
 ):
    """User-self-view of own uploaded sessions and their extraction state.
    Walks `${DATA_DIR}/user_sessions/<user_id>/*.jsonl` for the caller's
    own user_id, joins each file against `session_extraction_state` to
    surface processed_at + items_extracted, and renders a table.
    Items_extracted = 0 means the verification_detector ran but the LLM
    found no claims worth tracking — that's the documented "no items"
    outcome; it does NOT mean the pipeline is broken.
    """
    import pathlib
    user_id = user["id"]
    data_dir = pathlib.Path(os.environ.get("DATA_DIR", "/data"))
    user_sessions_dir = data_dir / "user_sessions" / user_id
    files = []
    if user_sessions_dir.is_dir():
        for jsonl in sorted(user_sessions_dir.glob("*.jsonl"), key=lambda p: p.stat().st_mtime, reverse=True):
            try:
                stat = jsonl.stat()
            except OSError:
                continue
            files.append({
                "name": jsonl.name,
                "size_bytes": stat.st_size,
                "mtime": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
            })
    state_map: dict = {}
    if files:
        keys = [f"{user_id}/{f['name']}" for f in files]
        placeholders = ",".join("?" for _ in keys)
        rows = conn.execute(
            f"""SELECT session_file, processed_at, items_extracted, file_hash
                FROM session_extraction_state
                WHERE session_file IN ({placeholders})""",
            keys,
        ).fetchall()
        cols = [d[0] for d in conn.description]
        for row in rows:
            d = dict(zip(cols, row))
            state_map[d["session_file"]] = d
    rows_view = []
    for f in files:
        key = f"{user_id}/{f['name']}"
        state = state_map.get(key)
        rows_view.append({
            "name": f["name"],
            "size_kb": round(f["size_bytes"] / 1024, 1),
            "uploaded_at": f["mtime"],
            "processed_at": state["processed_at"] if state else None,
            "items_extracted": state["items_extracted"] if state else None,
            "is_processed": state is not None,
        })
    ctx = _build_context(
        request,
        user=user,
        sessions=rows_view,
        user_id=user_id,
    )
    return templates.TemplateResponse(request, "profile_sessions.html", ctx)
@router.get("/_debug/throw/http/{code:int}", response_class=HTMLResponse, include_in_schema=False)
 async def _debug_throw_http(request: Request, code: int):
    """Dev helper — raise an HTTPException with the given status code.
--- a/app/web/templates/_app_header.html
+++ b/app/web/templates/_app_header.html
@ -58,7 +58,8 @@
                    <div class="app-user-menu-role">Admin</div>
                    {% endif %}
                </div>
-                <a class="app-user-menu-item {% if _path.startswith('/profile') %}is-active{% endif %}" role="menuitem" href="/profile">Profile</a>
+                <a class="app-user-menu-item {% if _path == '/profile' %}is-active{% endif %}" role="menuitem" href="/profile">Profile</a>
                <a class="app-user-menu-item {% if _path == '/profile/sessions' %}is-active{% endif %}" role="menuitem" href="/profile/sessions">My sessions</a>
                <a class="app-user-menu-item {% if _path == '/tokens' %}is-active{% endif %}" role="menuitem" href="/tokens">My tokens</a>
                {% if config.DEBUG_AUTH_ENABLED %}
                <a class="app-user-menu-item {% if _path.startswith('/me/debug') %}is-active{% endif %}" role="menuitem" href="/me/debug">Auth debug</a>
--- a/app/web/templates/profile_sessions.html
+++ b/app/web/templates/profile_sessions.html
@ -0,0 +1,109 @@
 {% extends "base.html" %}
 {% block title %}My sessions — {{ config.INSTANCE_NAME }}{% endblock %}
 {% block content %}
 <style>
  .container:has(.sess-page) { max-width: none; padding: 24px 16px; }
  .sess-page { max-width: 1200px; margin: 0 auto; padding: 0; }
  .sess-title { margin: 0 0 8px 0; font-size: 22px; font-weight: 600; }
  .sess-help { color: var(--text-secondary, #6b7280); font-size: 13px; margin-bottom: 20px; line-height: 1.55; }
  .sess-help code { background: var(--border-light, #f3f4f6); padding: 1px 6px; border-radius: 4px; font-size: 12px; }
  .sess-table-wrap {
    background: var(--surface, #fff);
    border: 1px solid var(--border, #e5e7eb);
    border-radius: 12px;
    overflow-x: auto;
  }
  .sess-table { width: 100%; border-collapse: collapse; font-size: 13px; }
  .sess-table thead th {
    text-align: left; padding: 12px 16px;
    background: var(--border-light, #f9fafb);
    border-bottom: 1px solid var(--border, #e5e7eb);
    font-weight: 600; color: var(--text-secondary, #6b7280);
    font-size: 11px; text-transform: uppercase; letter-spacing: 0.4px;
    white-space: nowrap;
  }
  .sess-table tbody td {
    padding: 10px 16px;
    border-bottom: 1px solid var(--border-light, #f3f4f6);
    vertical-align: middle;
  }
  .sess-table tbody tr:last-child td { border-bottom: none; }
  .sess-table tbody tr:hover { background: var(--border-light, #fafafa); }
  .sess-table .name { font-family: ui-monospace, SFMono-Regular, Menlo, monospace; font-size: 12px; }
  .sess-table .ts { white-space: nowrap; color: var(--text-secondary, #6b7280); font-variant-numeric: tabular-nums; }
  .sess-table .num { text-align: right; font-variant-numeric: tabular-nums; }
  .badge {
    display: inline-block; padding: 2px 8px; border-radius: 999px;
    font-size: 11px; font-weight: 500;
  }
  .badge-pending { background: #fef3c7; color: #92400e; }
  .badge-processed { background: #dbeafe; color: #1e40af; }
  .badge-extracted { background: #d1fae5; color: #065f46; }
  .empty {
    padding: 40px 16px; text-align: center;
    color: var(--text-secondary, #6b7280); font-size: 13px;
  }
 </style>
 <div class="sess-page">
  <h1 class="sess-title">My sessions</h1>
  <p class="sess-help">
    Sessions you uploaded via <code>agnes push</code> from your Claude Code workspace, with
    extraction status from <code>session_extraction_state</code>.
    <br>
    <strong>Items extracted = 0</strong> means the verification detector ran successfully
    but the LLM didn't find anything worth tracking in that session — that's expected for
    sessions that are mostly tool calls or coding without confident factual claims.
    <br>
    <em>Pending</em> means the file is on disk but the scheduler hasn't processed it yet
    (next verification-detector tick: every 15 min by default).
  </p>
  <div class="sess-table-wrap">
    {% if sessions %}
    <table class="sess-table">
      <thead>
        <tr>
          <th>Session file</th>
          <th>Uploaded</th>
          <th class="num">Size</th>
          <th>Status</th>
          <th>Processed</th>
          <th class="num">Items extracted</th>
        </tr>
      </thead>
      <tbody>
        {% for s in sessions %}
        <tr>
          <td class="name">{{ s.name }}</td>
          <td class="ts">{{ s.uploaded_at.strftime("%Y-%m-%d %H:%M:%S UTC") }}</td>
          <td class="num">{{ s.size_kb }} kB</td>
          <td>
            {% if not s.is_processed %}
              <span class="badge badge-pending">pending</span>
            {% elif s.items_extracted and s.items_extracted > 0 %}
              <span class="badge badge-extracted">extracted</span>
            {% else %}
              <span class="badge badge-processed">processed</span>
            {% endif %}
          </td>
          <td class="ts">
            {% if s.processed_at %}{{ s.processed_at.strftime("%Y-%m-%d %H:%M:%S UTC") }}{% endif %}
          </td>
          <td class="num">
            {% if s.items_extracted is not none %}{{ s.items_extracted }}{% endif %}
          </td>
        </tr>
        {% endfor %}
      </tbody>
    </table>
    {% else %}
    <div class="empty">
      No session files yet. Run <code>agnes push</code> from a Claude Code workspace
      where <code>agnes init</code> installed the SessionEnd hook.
    </div>
    {% endif %}
  </div>
 </div>
 {% endblock %}
--- a/tests/test_web_ui.py
+++ b/tests/test_web_ui.py
@ -338,6 +338,20 @@ class TestAdminRoleGuards:
        assert b"run_session_collector" in r.content
        assert b"run_verification_detector" in r.content
        assert b"run_corporate_memory" in r.content
        # Devin Review on e86dd5ed: list must use the actual logged action
        # string, not a guess.
        assert b"marketplace.sync_all" in r.content
    def test_profile_sessions_page_no_admin_required(self, web_client, analyst_cookie, admin_cookie):
        """The /profile/sessions page is gated by get_current_user, not require_admin —
        every authenticated user views their own sessions."""
        r = web_client.get("/profile/sessions", follow_redirects=False)
        assert r.status_code in (302, 401, 403)
        r = web_client.get("/profile/sessions", cookies=analyst_cookie, follow_redirects=False)
        assert r.status_code == 200
        assert b"My sessions" in r.content
        r = web_client.get("/profile/sessions", cookies=admin_cookie, follow_redirects=False)
        assert r.status_code == 200
 class TestUnauthenticatedHtmlRedirects: