* fix(security): RBAC filter for agnes_sessions matches both email local-part and user_id
The upload API (POST /api/upload/sessions) stores session files under
user_sessions/{user_id}/ (UUID), while the session collector uses the
OS username (email local-part). The session pipeline writes the directory
name verbatim into usage_session_summary.username, so the column can
contain either value depending on the ingestion path.
The RBAC filter in build_filter_clause previously only matched the email
local-part, missing sessions uploaded via the API. The fix adds an OR
condition so non-admin users see rows where username matches either their
email local-part or their user_id.
Closes #293
Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>
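The OR-based filter from this first commit can be sketched as follows. This is a minimal illustration, not the project's `build_filter_clause`: the helper name `username_filter`, its signature, the sample rows, and the use of the standard-library `sqlite3` module in place of DuckDB are all assumptions.

```python
from __future__ import annotations

import sqlite3


def username_filter(email: str, user_id: str) -> tuple[str, list[str]]:
    """Hypothetical helper: match rows written by either ingestion path."""
    local_part = email.split("@", 1)[0]
    return "WHERE username = ? OR username = ?", [local_part, user_id]


# Made-up data: the username column holds whatever the directory was called.
ALICE_ID = "11111111-2222-3333-4444-555555555555"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usage_session_summary (session_file TEXT, username TEXT)")
conn.executemany(
    "INSERT INTO usage_session_summary VALUES (?, ?)",
    [
        ("a.jsonl", "alice"),   # collector path: OS username
        ("b.jsonl", ALICE_ID),  # upload API path: user_id directory
        ("c.jsonl", "bob"),     # another user's session
    ],
)

clause, params = username_filter("alice@example.com", ALICE_ID)
visible = [
    row[0]
    for row in conn.execute(
        f"SELECT session_file FROM usage_session_summary {clause} ORDER BY session_file",
        params,
    )
]
```

With a filter on the local-part alone, `b.jsonl` would be missing from `visible`, which is the gap this commit describes.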
* fix(security): RBAC filter uses stable user_id instead of mutable email local-part
Closes #293
Previous fix used OR condition matching both email local-part and user_id
in the username column. This was fragile: email changes would break
filtering. This commit introduces a dedicated user_id column populated
by the session pipeline via resolve_user_id(), and switches the RBAC
filter to use it exclusively.
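The fragility mentioned above can be shown with a small sketch. It assumes a simplified `usage_session_summary` table, made-up data, and `sqlite3` as a stand-in for DuckDB; the two helper functions are hypothetical and use bound parameters rather than an interpolated literal.

```python
import sqlite3

UID = "11111111-2222-3333-4444-555555555555"  # made-up users.id value

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE usage_session_summary (session_file TEXT, username TEXT, user_id TEXT)"
)
# The row was ingested while the user's email was alice@example.com.
conn.execute(
    "INSERT INTO usage_session_summary VALUES ('a.jsonl', 'alice', ?)", [UID]
)


def visible_by_username(email: str) -> list:
    """Old approach: filter on the email local-part (hypothetical helper)."""
    local_part = email.split("@", 1)[0]
    rows = conn.execute(
        "SELECT session_file FROM usage_session_summary WHERE username = ?",
        [local_part],
    )
    return [r[0] for r in rows]


def visible_by_user_id(user_id: str) -> list:
    """New approach: filter on the stable user_id column (hypothetical helper)."""
    rows = conn.execute(
        "SELECT session_file FROM usage_session_summary WHERE user_id = ?",
        [user_id],
    )
    return [r[0] for r in rows]
```

Calling `visible_by_username("alice.smith@example.com")` after an email change finds nothing, while `visible_by_user_id(UID)` still returns the session.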
Changes:
- Schema v45: add user_id column to usage_session_summary and usage_events
- UsageProcessor: accept and store user_id in both tables
- runner.py: resolve_user_id() maps directory name to users.id UUID
  (exact match for UUID dirs, email LIKE for local-part dirs)
- INTERNAL_TABLES: agnes_sessions/agnes_telemetry filter on user_id column
- build_filter_clause: simplified to WHERE user_id = '<uuid>' (no OR)
- me.py/admin_user_sessions.py: query by user_id OR username for
  backward compatibility during transition
- USAGE_PROCESSOR_VERSION bumped 2→3 to trigger reprocessing/backfill
- Tests updated: 27 pass including new email-change resilience test
Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>
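The directory-name lookup described in the change list could look roughly like this. It is a sketch under stated assumptions, not the code in runner.py: the `users` table layout, the `escape_like` helper, the sample users, and the use of `sqlite3` instead of DuckDB are all hypothetical. Note that `uuid.UUID` also accepts unhyphenated hex, so a 32-character hex local-part would take the UUID branch here.

```python
from __future__ import annotations

import sqlite3
import uuid


def escape_like(value: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards so a directory name cannot act as a pattern."""
    return (
        value.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


def resolve_user_id(conn: sqlite3.Connection, dir_name: str) -> str | None:
    """Map a session directory name to users.id, or None if nothing matches."""
    try:
        uuid.UUID(dir_name)
        is_uuid = True
    except ValueError:
        is_uuid = False

    if is_uuid:
        # Upload API path: the directory is already the user's id.
        row = conn.execute("SELECT id FROM users WHERE id = ?", [dir_name]).fetchone()
    else:
        # Collector path: the directory is the email local-part.
        pattern = escape_like(dir_name) + "@%"
        row = conn.execute(
            "SELECT id FROM users WHERE email LIKE ? ESCAPE '\\'", [pattern]
        ).fetchone()
    return row[0] if row else None


U1 = "11111111-1111-1111-1111-111111111111"
U2 = "22222222-2222-2222-2222-222222222222"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(U1, "alice@example.com"), (U2, "a_ice@example.com")],
)
```

Without the escaping step, the directory name `a_ice` would produce the pattern `a_ice@%`, which also matches `alice@example.com`. The sketch returns the first match when several users share a local-part across domains; how the real function handles that case is not stated in the commit message.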
* fix(tests): bump schema version assertions 44→45
Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>
* fix(docs): correct resolve_user_id docstring, add TypeError comment
Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>
* fix(security): address review — backward-compat OR, LIKE escaping, narrower TypeError
Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>
* fix(security): address code review — eliminate TypeError hack, add resolve_user_id tests
Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>
* fix(db): create user_id indexes in _v44_to_v45, not _SYSTEM_SCHEMA
_SYSTEM_SCHEMA runs before the migration ladder. On an upgrade from
v42/v43/v44, usage_events / usage_session_summary already exist without
the user_id column (CREATE TABLE IF NOT EXISTS is a no-op), so the
CREATE INDEX ... (user_id) lines in _SYSTEM_SCHEMA failed to bind and
aborted _ensure_schema — the app would not start post-upgrade. Move the
index creation to _v44_to_v45, which ADDs the column first. Same pattern
as the v41 audit_log indices.
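The ordering problem can be reproduced in miniature. The sketch below uses `sqlite3` as a stand-in for DuckDB, a one-table schema, and hypothetical function and index names; it only illustrates why the index must be created after the column is added.

```python
import sqlite3


def system_schema(conn: sqlite3.Connection) -> None:
    # Runs before the migration ladder. A no-op when the table already
    # exists, so an old table does not gain the user_id column here.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS usage_events (id TEXT PRIMARY KEY, user_id TEXT)"
    )


def v44_to_v45(conn: sqlite3.Connection) -> None:
    # Add the column first, then index it.
    conn.execute("ALTER TABLE usage_events ADD COLUMN user_id TEXT")
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_usage_events_user_id ON usage_events (user_id)"
    )


# Simulate a database created before the user_id column existed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usage_events (id TEXT PRIMARY KEY)")

system_schema(conn)
try:
    # What the old _SYSTEM_SCHEMA effectively attempted on upgrade.
    conn.execute("CREATE INDEX idx_too_early ON usage_events (user_id)")
    index_before_migration_failed = False
except sqlite3.OperationalError:
    index_before_migration_failed = True

v44_to_v45(conn)
columns = [row[1] for row in conn.execute("PRAGMA table_info(usage_events)")]
```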
* fix(usage): bump USAGE_PROCESSOR_VERSION 3→4 for user_id backfill
#303 shipped USAGE_PROCESSOR_VERSION=3 (release 0.54.12) for its
<command-name> slash extraction. This PR's 2→3 bump collided with it
on rebase, so the reprocess loop would not re-trigger to backfill the
new user_id column on deployments already running v3. Bump to 4.
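The collision is easier to see with a toy version check. The function below is hypothetical; the actual reprocess loop may compare versions differently, but any check that only fires when the stored version differs from the current one behaves the same way here.

```python
def needs_reprocess(stored_version: int, current_version: int) -> bool:
    """Hypothetical check: reprocess only rows written by an older processor."""
    return stored_version < current_version


# Rows on a deployment already running release 0.54.12 carry version 3.
stored = 3
retrigger_with_3 = needs_reprocess(stored, 3)  # this PR's original bump
retrigger_with_4 = needs_reprocess(stored, 4)  # after bumping to 4
```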
* release: 0.54.13 — RBAC filter uses stable user_id (#293)
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
103 lines
3.5 KiB
Python
"""UsageProcessor — extracts skill / agent / tool invocation events from
Claude Code session jsonls. See Phase A.3 of platform-telemetry epic."""

from __future__ import annotations

import hashlib
import logging
from pathlib import Path

import duckdb

from services.session_pipeline.contract import ProcessorResult
from services.session_pipeline.lib import parse_jsonl
from services.session_processors.usage_lib import (
    USAGE_PROCESSOR_VERSION,
    AttributionLookup,
    compute_summary,
    iter_events,
)
from src.repositories.usage import UsageRepository

logger = logging.getLogger(__name__)


class UsageProcessor:
    name: str = "usage"
    cadence_minutes: int = 10

    def process_session(
        self,
        session_path: Path,
        username: str,
        session_key: str,
        conn: duckdb.DuckDBPyConnection,
        *,
        user_id: str | None = None,
    ) -> ProcessorResult:
        turns = parse_jsonl(session_path)
        events = list(iter_events(turns))

        # Derive session_id from first turn that carries one
        session_id = session_key
        for t in turns:
            sid = t.get("sessionId")
            if sid:
                session_id = sid
                break

        attr = AttributionLookup(conn)
        rows = []
        for e in events:
            source, ref_id = attr.attribute(e)
            # Stable dedup key: session_id + event_uuid + tool_id + event_type + tool_name + command_name.
            # tool_id (tu_xxx) disambiguates parallel tool_use items in the same assistant turn
            # that share the same event_uuid, event_type, and tool_name.
            id_input = (
                f"{session_id}|{e.event_uuid or ''}|{e.tool_id or ''}"
                f"|{e.event_type}|{e.tool_name or ''}"
                f"|{e.command_name or ''}"
            )
            event_id = hashlib.sha256(id_input.encode()).hexdigest()
            rows.append(
                {
                    "id": event_id,
                    "session_id": session_id,
                    "session_file": session_key,
                    "username": username,
                    "user_id": user_id,
                    "event_uuid": e.event_uuid,
                    "parent_uuid": e.parent_uuid,
                    "event_type": e.event_type,
                    "tool_name": e.tool_name,
                    "skill_name": e.skill_name,
                    "subagent_type": e.subagent_type,
                    "command_name": e.command_name,
                    "is_error": e.is_error,
                    "source": source,
                    "ref_id": ref_id,
                    "model": e.model,
                    "cwd": e.cwd,
                    "occurred_at": e.occurred_at,
                    "processor_version": USAGE_PROCESSOR_VERSION,
                }
            )

        summary = compute_summary(turns, rows)
        summary["session_file"] = session_key
        summary["username"] = username
        summary["user_id"] = user_id
        # Override session_id with the resolved one
        if not summary.get("session_id"):
            summary["session_id"] = session_id

        repo = UsageRepository(conn)
        n_written = repo.upsert_events(rows, processor_version=USAGE_PROCESSOR_VERSION)
        repo.upsert_summary(summary, processor_version=USAGE_PROCESSOR_VERSION)

        logger.info(
            "usage processor: %d events written for session %s",
            n_written,
            session_key,
        )
        return ProcessorResult(items_count=len(rows))
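The dedup key built in `process_session` can be exercised in isolation. The sketch below reproduces the same string layout and SHA-256 hashing; the `Event` dataclass is a hypothetical stand-in for the objects yielded by `iter_events`, carrying only the fields the key uses, and the sample values are made up.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in with only the fields that feed the dedup key."""

    event_type: str
    event_uuid: str | None = None
    tool_id: str | None = None
    tool_name: str | None = None
    command_name: str | None = None


def event_id(session_id: str, e: Event) -> str:
    """Hash the same pipe-separated fields as the processor's id_input."""
    id_input = (
        f"{session_id}|{e.event_uuid or ''}|{e.tool_id or ''}"
        f"|{e.event_type}|{e.tool_name or ''}"
        f"|{e.command_name or ''}"
    )
    return hashlib.sha256(id_input.encode()).hexdigest()


# Two parallel tool_use items in one assistant turn: same event_uuid,
# event_type, and tool_name, distinguished only by tool_id.
first = Event("tool_use", event_uuid="u1", tool_id="tu_1", tool_name="Bash")
second = Event("tool_use", event_uuid="u1", tool_id="tu_2", tool_name="Bash")
replayed = Event("tool_use", event_uuid="u1", tool_id="tu_1", tool_name="Bash")
```

Because the key is derived only from event content, reprocessing the same session yields the same ids, which is what lets the upsert overwrite rows instead of duplicating them.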