feat(profile): /profile/sessions page + audit on detector exception + correct SCHEDULER_AUDIT_ACTIONS
Three changes addressing user feedback during e2e test of #179 + Devin Review on e86dd5ed.
1) /profile/sessions — new self-service user page in the user menu.
Lists all session jsonls the caller uploaded via `agnes push` joined
against session_extraction_state. Each row shows uploaded_at, file
size, status badge (pending/processed/extracted), processed_at, and
items_extracted. The page docstring + help text explicitly call out
that items_extracted=0 means the verification detector ran fine but
the LLM found no claims to track — that's the documented "no items"
outcome, not a broken pipeline. Closes the gap surfaced during the
e2e test of #176 where a user could see their sessions on disk and
process them through the LLM but had no UI to inspect what happened.
2) run_verification_detector audits unhandled exceptions (Devin #1).
If detector.run() threw anything other than the already-translated
ValueError, the audit_log row was never written. The endpoint now
wraps detector.run in try/except, records the exception in
audit_params["unhandled_error"], then re-raises as 500 after audit.
The /admin/scheduler-runs page surfaces the failure row with the
error type + message.
3) SCHEDULER_AUDIT_ACTIONS list corrected (Devin #2). Previous list
had "marketplaces_sync_all" (wrong — actual is "marketplace.sync_all")
plus "data_refresh" and "scripts_run_due" which app/api/sync.py and
app/api/scripts.py don't write to audit_log. Fixed to the four
actually-logged strings; comment points at the missing audit calls
as a follow-up.
Tests: tests/test_web_ui.py adds TestAdminRoleGuards::test_profile_sessions_page_no_admin_required and tightens test_admin_scheduler_runs_page_admin_only to assert the correct marketplace.sync_all string.
This commit is contained in:
parent
e86dd5edc5
commit
4c4dfee8e6
6 changed files with 226 additions and 10 deletions
|
|
@ -45,10 +45,13 @@ Five-defect fix for the silently-broken session pipeline on default Compose depl
|
|||
- **#179 review — `/api/admin/run-session-collector` would SystemExit the worker.** The endpoint called `collector.main()`, whose `argparse.parse_args()` parsed uvicorn's `sys.argv` (`['app.main:app', '--host', …]`) and called `sys.exit(2)` on the unrecognised flags. `SystemExit` inherits from `BaseException`, escapes FastAPI's exception machinery, and propagates through the thread pool — every scheduler tick that fired the endpoint either 500-ed or risked killing the uvicorn worker. Fix: `services/session_collector/collector.py` now exposes an argv-free `run(dry_run, verbose) -> (rc, stats)` helper; `main()` is a thin CLI shim around it and the admin endpoint calls `run()` directly. Audit log now carries the per-run stats (`users_processed`, `files_copied`, `files_skipped`) instead of just the rc. Regression tests in `tests/test_session_collector.py::TestRunHelper`.
|
||||
- **#179 review — `python -m services.corporate_memory` crashed on missing LLM config instead of exiting cleanly.** The PR's fail-fast change made `collect_all()` raise `ValueError` when neither an `ai:` block nor `ANTHROPIC_API_KEY`/`LLM_API_KEY` was available. The `verification_detector` CLI was updated to catch it; the corporate-memory CLI was missed. Now also wrapped — operators get a one-line `Corporate Memory cannot run: <factory message>` on stderr and rc=1 instead of a raw traceback. Regression test in `tests/test_llm_connector.py::TestCorporateMemoryCollector::test_main_returns_1_on_no_ai_config_instead_of_traceback`.
|
||||
- **E2E test — Anthropic API rejected every extraction request.** The structured-output API now requires `additionalProperties: false` on every `{"type": "object"}` node in the json_schema; without it the API returns 400 `invalid_request_error` ("output_config.format.schema: For 'object' type, 'additionalProperties' must be explicitly set to false"). Surfaced on a real BQ-backed deploy: every uploaded session jsonl failed verification-extraction in a tight retry loop. Fix: `connectors/llm/anthropic_provider.py` now wraps the caller-supplied schema through a recursive `_strict_json_schema()` walker that adds the field where missing (preserving any explicit operator override), then passes the strict variant to the API. Six unit tests in `tests/test_llm_connector.py::TestStrictJsonSchema` pin the recursion across nested objects, array items, and the no-mutation invariant.
|
||||
- **#179 review — `/api/admin/run-verification-detector` skipped audit on unhandled exceptions.** If `detector.run()` threw anything other than the already-translated `ValueError` (DuckDB lock, network blip, unexpected SDK error), the audit_log row was never written — the operator's only signal was `docker logs agnes-scheduler-1`. The endpoint now wraps `detector.run` in try/except, records the exception in `audit_params["unhandled_error"]`, then re-raises as 500 after audit. The `/admin/scheduler-runs` page surfaces the failure row with the error type and message.
|
||||
- **#179 review — `SCHEDULER_AUDIT_ACTIONS` listed action strings that don't actually appear in `audit_log`.** The list at `app/web/router.py:952` had `"marketplaces_sync_all"` (wrong — actual is `"marketplace.sync_all"`) plus `"data_refresh"` and `"scripts_run_due"` (which `app/api/sync.py` and `app/api/scripts.py` don't write). Corrected to the four actually-logged strings, with a comment pointing at the missing audit calls in sync/scripts as a follow-up.
|
||||
|
||||
### Added
|
||||
|
||||
- `/admin/scheduler-runs` — read-only admin page showing the last 200 audit-log entries from scheduler-driven actions (`run_session_collector`, `run_verification_detector`, `run_corporate_memory`, `marketplaces_sync_all`, `data_refresh`, `scripts_run_due`). New `AuditRepository.query_actions(actions, limit)` query helper, new admin nav entry under the Admin dropdown. Failed scheduler ticks (HTTP 401, network errors) don't reach the audit_log — those still live only in `docker logs agnes-scheduler-1`; the page calls that out with a hint to set `SCHEDULER_API_TOKEN` if no rows show up.
|
||||
- `/admin/scheduler-runs` — read-only admin page showing the last 200 audit-log entries from scheduler-driven actions (`run_session_collector`, `run_verification_detector`, `run_corporate_memory`, `marketplace.sync_all`). New `AuditRepository.query_actions(actions, limit)` query helper, new admin nav entry under the Admin dropdown. `data-refresh` (`POST /api/sync/trigger`) and `script-runner` (`POST /api/scripts/run-due`) are scheduler jobs but don't write to `audit_log` today, so they can't appear here yet. Failed scheduler ticks (HTTP 401, network errors) don't reach the audit_log either — those still live only in `docker logs agnes-scheduler-1`; the page calls that out with a hint to set `SCHEDULER_API_TOKEN` if no rows show up.
|
||||
- `/profile/sessions` — self-service user page in the user menu, showing all session jsonls the caller uploaded via `agnes push` joined against `session_extraction_state`. Each row shows uploaded_at, file size, status badge (`pending` / `processed` / `extracted`), processed_at, and `items_extracted`. The page docstring explicitly calls out that `items_extracted = 0` means the verification detector ran successfully but the LLM found no claims worth tracking — that's the documented "no items" outcome, not a broken pipeline. Closes the gap surfaced during the e2e test of #176 where a user could see their sessions on disk and process them through the LLM but had no UI to inspect what happened.
|
||||
|
||||
### Internal
|
||||
|
||||
|
|
|
|||
|
|
@ -2860,23 +2860,39 @@ def run_verification_detector(
|
|||
raise HTTPException(status_code=500, detail=str(e))
|
||||
|
||||
job_conn = get_system_db()
|
||||
stats: dict = {}
|
||||
job_error: Optional[Exception] = None
|
||||
try:
|
||||
stats = detector.run(job_conn, extractor, dry_run=False)
|
||||
except Exception as e:
|
||||
# Capture and re-raise after audit so an unhandled detector error
|
||||
# (DuckDB lock, network blip, unexpected SDK type) still leaves a
|
||||
# row in audit_log — the /admin/scheduler-runs page is the
|
||||
# operator's only signal beyond docker logs.
|
||||
job_error = e
|
||||
finally:
|
||||
try:
|
||||
job_conn.close()
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
audit_params: dict = {
|
||||
"items_created": stats.get("items_created", 0),
|
||||
"errors": len(stats.get("errors", [])),
|
||||
}
|
||||
if job_error is not None:
|
||||
audit_params["unhandled_error"] = f"{type(job_error).__name__}: {job_error}"
|
||||
|
||||
AuditRepository(conn).log(
|
||||
user_id=user.get("id"),
|
||||
action="run_verification_detector",
|
||||
resource="job:verification-detector",
|
||||
params={
|
||||
"items_created": stats.get("items_created", 0),
|
||||
"errors": len(stats.get("errors", [])),
|
||||
},
|
||||
params=audit_params,
|
||||
)
|
||||
|
||||
if job_error is not None:
|
||||
raise HTTPException(status_code=500, detail=audit_params["unhandled_error"])
|
||||
|
||||
return {"ok": not stats.get("errors"), "details": stats}
|
||||
|
||||
|
||||
|
|
|
|||
|
|
@ -5,7 +5,7 @@ Replicates all Flask webapp routes with DuckDB-backed data.
|
|||
|
||||
import logging
|
||||
import os
|
||||
from datetime import datetime
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
from urllib.parse import quote
|
||||
|
|
@ -949,13 +949,16 @@ async def admin_marketplaces_page(
|
|||
# Scheduler-driven admin actions audited by app/api/admin.py and
|
||||
# app/api/marketplaces.py. Keep in sync with the JOBS list in
|
||||
# services/scheduler/__main__.py.
|
||||
#
|
||||
# `data-refresh` (POST /api/sync/trigger) and `script-runner`
|
||||
# (POST /api/scripts/run-due) are scheduler jobs but they do NOT write
|
||||
# audit_log today, so they can't appear here. If you add audit calls to
|
||||
# those endpoints, add the matching action strings to this list.
|
||||
SCHEDULER_AUDIT_ACTIONS = [
|
||||
"run_session_collector",
|
||||
"run_verification_detector",
|
||||
"run_corporate_memory",
|
||||
"marketplaces_sync_all",
|
||||
"data_refresh",
|
||||
"scripts_run_due",
|
||||
"marketplace.sync_all",
|
||||
]
|
||||
|
||||
|
||||
|
|
@ -1110,6 +1113,76 @@ async def profile_page(
|
|||
return templates.TemplateResponse(request, "profile.html", ctx)
|
||||
|
||||
|
||||
@router.get("/profile/sessions", response_class=HTMLResponse)
|
||||
async def profile_sessions_page(
|
||||
request: Request,
|
||||
user: dict = Depends(get_current_user),
|
||||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||||
):
|
||||
"""User-self-view of own uploaded sessions and their extraction state.
|
||||
|
||||
Walks `${DATA_DIR}/user_sessions/<user_id>/*.jsonl` for the caller's
|
||||
own user_id, joins each file against `session_extraction_state` to
|
||||
surface processed_at + items_extracted, and renders a table.
|
||||
Items_extracted = 0 means the verification_detector ran but the LLM
|
||||
found no claims worth tracking — that's the documented "no items"
|
||||
outcome; it does NOT mean the pipeline is broken.
|
||||
"""
|
||||
import pathlib
|
||||
user_id = user["id"]
|
||||
data_dir = pathlib.Path(os.environ.get("DATA_DIR", "/data"))
|
||||
user_sessions_dir = data_dir / "user_sessions" / user_id
|
||||
|
||||
files = []
|
||||
if user_sessions_dir.is_dir():
|
||||
for jsonl in sorted(user_sessions_dir.glob("*.jsonl"), key=lambda p: p.stat().st_mtime, reverse=True):
|
||||
try:
|
||||
stat = jsonl.stat()
|
||||
except OSError:
|
||||
continue
|
||||
files.append({
|
||||
"name": jsonl.name,
|
||||
"size_bytes": stat.st_size,
|
||||
"mtime": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
|
||||
})
|
||||
|
||||
state_map: dict = {}
|
||||
if files:
|
||||
keys = [f"{user_id}/{f['name']}" for f in files]
|
||||
placeholders = ",".join("?" for _ in keys)
|
||||
rows = conn.execute(
|
||||
f"""SELECT session_file, processed_at, items_extracted, file_hash
|
||||
FROM session_extraction_state
|
||||
WHERE session_file IN ({placeholders})""",
|
||||
keys,
|
||||
).fetchall()
|
||||
cols = [d[0] for d in conn.description]
|
||||
for row in rows:
|
||||
d = dict(zip(cols, row))
|
||||
state_map[d["session_file"]] = d
|
||||
|
||||
rows_view = []
|
||||
for f in files:
|
||||
key = f"{user_id}/{f['name']}"
|
||||
state = state_map.get(key)
|
||||
rows_view.append({
|
||||
"name": f["name"],
|
||||
"size_kb": round(f["size_bytes"] / 1024, 1),
|
||||
"uploaded_at": f["mtime"],
|
||||
"processed_at": state["processed_at"] if state else None,
|
||||
"items_extracted": state["items_extracted"] if state else None,
|
||||
"is_processed": state is not None,
|
||||
})
|
||||
|
||||
ctx = _build_context(
|
||||
request,
|
||||
user=user,
|
||||
sessions=rows_view,
|
||||
user_id=user_id,
|
||||
)
|
||||
return templates.TemplateResponse(request, "profile_sessions.html", ctx)
|
||||
|
||||
|
||||
@router.get("/_debug/throw/http/{code:int}", response_class=HTMLResponse, include_in_schema=False)
|
||||
async def _debug_throw_http(request: Request, code: int):
|
||||
"""Dev helper — raise an HTTPException with the given status code.
|
||||
|
|
|
|||
|
|
@ -58,7 +58,8 @@
|
|||
<div class="app-user-menu-role">Admin</div>
|
||||
{% endif %}
|
||||
</div>
|
||||
<a class="app-user-menu-item {% if _path.startswith('/profile') %}is-active{% endif %}" role="menuitem" href="/profile">Profile</a>
|
||||
<a class="app-user-menu-item {% if _path == '/profile' %}is-active{% endif %}" role="menuitem" href="/profile">Profile</a>
|
||||
<a class="app-user-menu-item {% if _path == '/profile/sessions' %}is-active{% endif %}" role="menuitem" href="/profile/sessions">My sessions</a>
|
||||
<a class="app-user-menu-item {% if _path == '/tokens' %}is-active{% endif %}" role="menuitem" href="/tokens">My tokens</a>
|
||||
{% if config.DEBUG_AUTH_ENABLED %}
|
||||
<a class="app-user-menu-item {% if _path.startswith('/me/debug') %}is-active{% endif %}" role="menuitem" href="/me/debug">Auth debug</a>
|
||||
|
|
|
|||
109
app/web/templates/profile_sessions.html
Normal file
109
app/web/templates/profile_sessions.html
Normal file
|
|
@ -0,0 +1,109 @@
|
|||
{% extends "base.html" %}
|
||||
{% block title %}My sessions — {{ config.INSTANCE_NAME }}{% endblock %}
|
||||
|
||||
{% block content %}
|
||||
<style>
|
||||
.container:has(.sess-page) { max-width: none; padding: 24px 16px; }
|
||||
.sess-page { max-width: 1200px; margin: 0 auto; padding: 0; }
|
||||
.sess-title { margin: 0 0 8px 0; font-size: 22px; font-weight: 600; }
|
||||
.sess-help { color: var(--text-secondary, #6b7280); font-size: 13px; margin-bottom: 20px; line-height: 1.55; }
|
||||
.sess-help code { background: var(--border-light, #f3f4f6); padding: 1px 6px; border-radius: 4px; font-size: 12px; }
|
||||
.sess-table-wrap {
|
||||
background: var(--surface, #fff);
|
||||
border: 1px solid var(--border, #e5e7eb);
|
||||
border-radius: 12px;
|
||||
overflow-x: auto;
|
||||
}
|
||||
.sess-table { width: 100%; border-collapse: collapse; font-size: 13px; }
|
||||
.sess-table thead th {
|
||||
text-align: left; padding: 12px 16px;
|
||||
background: var(--border-light, #f9fafb);
|
||||
border-bottom: 1px solid var(--border, #e5e7eb);
|
||||
font-weight: 600; color: var(--text-secondary, #6b7280);
|
||||
font-size: 11px; text-transform: uppercase; letter-spacing: 0.4px;
|
||||
white-space: nowrap;
|
||||
}
|
||||
.sess-table tbody td {
|
||||
padding: 10px 16px;
|
||||
border-bottom: 1px solid var(--border-light, #f3f4f6);
|
||||
vertical-align: middle;
|
||||
}
|
||||
.sess-table tbody tr:last-child td { border-bottom: none; }
|
||||
.sess-table tbody tr:hover { background: var(--border-light, #fafafa); }
|
||||
.sess-table .name { font-family: ui-monospace, SFMono-Regular, Menlo, monospace; font-size: 12px; }
|
||||
.sess-table .ts { white-space: nowrap; color: var(--text-secondary, #6b7280); font-variant-numeric: tabular-nums; }
|
||||
.sess-table .num { text-align: right; font-variant-numeric: tabular-nums; }
|
||||
.badge {
|
||||
display: inline-block; padding: 2px 8px; border-radius: 999px;
|
||||
font-size: 11px; font-weight: 500;
|
||||
}
|
||||
.badge-pending { background: #fef3c7; color: #92400e; }
|
||||
.badge-processed { background: #dbeafe; color: #1e40af; }
|
||||
.badge-extracted { background: #d1fae5; color: #065f46; }
|
||||
.empty {
|
||||
padding: 40px 16px; text-align: center;
|
||||
color: var(--text-secondary, #6b7280); font-size: 13px;
|
||||
}
|
||||
</style>
|
||||
|
||||
<div class="sess-page">
|
||||
<h1 class="sess-title">My sessions</h1>
|
||||
<p class="sess-help">
|
||||
Sessions you uploaded via <code>agnes push</code> from your Claude Code workspace, with
|
||||
extraction status from <code>session_extraction_state</code>.
|
||||
<br>
|
||||
<strong>Items extracted = 0</strong> means the verification detector ran successfully
|
||||
but the LLM didn't find anything worth tracking in that session — that's expected for
|
||||
sessions that are mostly tool calls or coding without confident factual claims.
|
||||
<br>
|
||||
<em>Pending</em> means the file is on disk but the scheduler hasn't processed it yet
|
||||
(next verification-detector tick: every 15 min by default).
|
||||
</p>
|
||||
|
||||
<div class="sess-table-wrap">
|
||||
{% if sessions %}
|
||||
<table class="sess-table">
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Session file</th>
|
||||
<th>Uploaded</th>
|
||||
<th class="num">Size</th>
|
||||
<th>Status</th>
|
||||
<th>Processed</th>
|
||||
<th class="num">Items extracted</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
{% for s in sessions %}
|
||||
<tr>
|
||||
<td class="name">{{ s.name }}</td>
|
||||
<td class="ts">{{ s.uploaded_at.strftime("%Y-%m-%d %H:%M:%S UTC") }}</td>
|
||||
<td class="num">{{ s.size_kb }} kB</td>
|
||||
<td>
|
||||
{% if not s.is_processed %}
|
||||
<span class="badge badge-pending">pending</span>
|
||||
{% elif s.items_extracted and s.items_extracted > 0 %}
|
||||
<span class="badge badge-extracted">extracted</span>
|
||||
{% else %}
|
||||
<span class="badge badge-processed">processed</span>
|
||||
{% endif %}
|
||||
</td>
|
||||
<td class="ts">
|
||||
{% if s.processed_at %}{{ s.processed_at.strftime("%Y-%m-%d %H:%M:%S UTC") }}{% endif %}
|
||||
</td>
|
||||
<td class="num">
|
||||
{% if s.items_extracted is not none %}{{ s.items_extracted }}{% endif %}
|
||||
</td>
|
||||
</tr>
|
||||
{% endfor %}
|
||||
</tbody>
|
||||
</table>
|
||||
{% else %}
|
||||
<div class="empty">
|
||||
No session files yet. Run <code>agnes push</code> from a Claude Code workspace
|
||||
where <code>agnes init</code> installed the SessionEnd hook.
|
||||
</div>
|
||||
{% endif %}
|
||||
</div>
|
||||
</div>
|
||||
{% endblock %}
|
||||
|
|
@ -338,6 +338,20 @@ class TestAdminRoleGuards:
|
|||
assert b"run_session_collector" in r.content
|
||||
assert b"run_verification_detector" in r.content
|
||||
assert b"run_corporate_memory" in r.content
|
||||
# Devin Review on e86dd5ed: list must use the actual logged action
|
||||
# string, not a guess.
|
||||
assert b"marketplace.sync_all" in r.content
|
||||
|
||||
def test_profile_sessions_page_no_admin_required(self, web_client, analyst_cookie, admin_cookie):
|
||||
"""The /profile/sessions page is gated by get_current_user, not require_admin —
|
||||
every authenticated user views their own sessions."""
|
||||
r = web_client.get("/profile/sessions", follow_redirects=False)
|
||||
assert r.status_code in (302, 401, 403)
|
||||
r = web_client.get("/profile/sessions", cookies=analyst_cookie, follow_redirects=False)
|
||||
assert r.status_code == 200
|
||||
assert b"My sessions" in r.content
|
||||
r = web_client.get("/profile/sessions", cookies=admin_cookie, follow_redirects=False)
|
||||
assert r.status_code == 200
|
||||
|
||||
|
||||
class TestUnauthenticatedHtmlRedirects:
|
||||
|
|
|
|||
Loading…
Reference in a new issue