* fix(security): RBAC filter for agnes_sessions matches both email local-part and user_id
The upload API (POST /api/upload/sessions) stores session files under
user_sessions/{user_id}/ (UUID), while the session collector uses the
OS username (email local-part). The session pipeline writes the directory
name verbatim into usage_session_summary.username, so the column can
contain either value depending on the ingestion path.
The RBAC filter in build_filter_clause previously only matched the email
local-part, missing sessions uploaded via the API. The fix adds an OR
condition so non-admin users see rows where username matches either their
email local-part or their user_id.
Closes #293
Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>
* fix(security): RBAC filter uses stable user_id instead of mutable email local-part
Closes #293
Previous fix used OR condition matching both email local-part and user_id
in the username column. This was fragile: email changes would break
filtering. This commit introduces a dedicated user_id column populated
by the session pipeline via resolve_user_id(), and switches the RBAC
filter to use it exclusively.
Changes:
- Schema v45: add user_id column to usage_session_summary and usage_events
- UsageProcessor: accept and store user_id in both tables
- runner.py: resolve_user_id() maps directory name to users.id UUID
(exact match for UUID dirs, email LIKE for local-part dirs)
- INTERNAL_TABLES: agnes_sessions/agnes_telemetry filter on user_id column
- build_filter_clause: simplified to WHERE user_id = '<uuid>' (no OR)
- me.py/admin_user_sessions.py: query by user_id OR username for
backward compatibility during transition
- USAGE_PROCESSOR_VERSION bumped 2→3 to trigger reprocessing/backfill
- Tests updated: 27 pass including new email-change resilience test
Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>
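The two-step lookup described above (exact UUID match, then an escaped email local-part match) can be sketched roughly as follows. This is an illustration only: it uses the standard-library `sqlite3` module as a stand-in for DuckDB, and the names `escape_like`, `resolve_user_id_sketch`, and the sample rows are hypothetical, not taken from the repository.

```python
import sqlite3


def escape_like(value: str) -> str:
    """Escape LIKE wildcards so the value is matched literally (escape char: backslash)."""
    return value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def resolve_user_id_sketch(conn: sqlite3.Connection, dir_name: str):
    """Hypothetical stand-in for resolve_user_id(): UUID match first, then local-part."""
    # Step 1: the directory name may already be a users.id value (upload API path).
    row = conn.execute("SELECT id FROM users WHERE id = ?", [dir_name]).fetchone()
    if row:
        return row[0]
    # Step 2: treat it as an email local-part (session collector path).
    row = conn.execute(
        "SELECT id FROM users WHERE email LIKE ? || '@%' ESCAPE '\\' "
        "ORDER BY updated_at DESC LIMIT 1",
        [escape_like(dir_name)],
    ).fetchone()
    return row[0] if row else None


# Assumed sample data: "a_b" and "axb" differ only where "_" would act as a wildcard.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        ("uuid-alice", "alice@example.com", "2026-01-01"),
        ("uuid-a_b", "a_b@example.com", "2026-01-02"),
        ("uuid-axb", "axb@example.com", "2026-01-03"),
    ],
)
```

Without the escaping step, the pattern `a_b@%` would also match `axb@example.com`, and the most-recently-updated ordering would then pick the wrong user.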
* fix(tests): bump schema version assertions 44→45
Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>
* fix(docs): correct resolve_user_id docstring, add TypeError comment
Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>
* fix(security): address review — backward-compat OR, LIKE escaping, narrower TypeError
Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>
* fix(security): address code review — eliminate TypeError hack, add resolve_user_id tests
Co-Authored-By: zdenek.srotyr <zdenek.srotyr@keboola.com>
* fix(db): create user_id indexes in _v44_to_v45, not _SYSTEM_SCHEMA
_SYSTEM_SCHEMA runs before the migration ladder. On an upgrade from
v42/v43/v44, usage_events / usage_session_summary already exist without
the user_id column (CREATE TABLE IF NOT EXISTS is a no-op), so the
CREATE INDEX ... (user_id) lines in _SYSTEM_SCHEMA failed to bind and
aborted _ensure_schema — the app would not start post-upgrade. Move the
index creation to _v44_to_v45, which ADDs the column first. Same pattern
as the v41 audit_log indices.
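The ordering problem can be reproduced in miniature. The sketch below assumes `sqlite3` as a stand-in for DuckDB and a hypothetical index name; it only shows why an index on a new column has to be created after the migration step that adds the column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Pre-upgrade state: the table already exists without the user_id column.
conn.execute("CREATE TABLE usage_events (username TEXT)")

# Base schema pass: IF NOT EXISTS makes this a no-op, so user_id is NOT added.
conn.execute("CREATE TABLE IF NOT EXISTS usage_events (username TEXT, user_id TEXT)")
columns = [r[1] for r in conn.execute("PRAGMA table_info(usage_events)")]

# Creating the index in the base schema therefore fails on an upgraded database.
try:
    conn.execute("CREATE INDEX idx_usage_events_user_id ON usage_events (user_id)")
    index_failed = False
except sqlite3.OperationalError:
    index_failed = True

# Migration step: add the column first, then the index binds successfully.
conn.execute("ALTER TABLE usage_events ADD COLUMN user_id TEXT")
conn.execute("CREATE INDEX idx_usage_events_user_id ON usage_events (user_id)")
index_names = [r[1] for r in conn.execute("PRAGMA index_list(usage_events)")]
```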
* fix(usage): bump USAGE_PROCESSOR_VERSION 3→4 for user_id backfill
#303 shipped USAGE_PROCESSOR_VERSION=3 (release 0.54.12) for its
<command-name> slash extraction. This PR's 2→3 bump collided with it
on rebase, so the reprocess loop would not re-trigger to backfill the
new user_id column on deployments already running v3. Bump to 4.
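The commit does not show how the reprocess loop decides to re-run, so the following is only a sketch under the assumption that it compares a stored per-session processor version against the current constant. `needs_reprocess` is a hypothetical helper, not a function from the codebase.

```python
from typing import Optional


def needs_reprocess(stored_version: Optional[int], current_version: int) -> bool:
    """Assumed gate: reprocess when the stored version differs from the current one."""
    return stored_version != current_version


# Deployments on 0.54.12 already recorded version 3 (from #303).
stored = 3

collided = needs_reprocess(stored, 3)  # this PR's original 2->3 bump: no backfill
bumped = needs_reprocess(stored, 4)    # after the 3->4 bump: backfill is triggered
never_processed = needs_reprocess(None, 4)
```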
* release: 0.54.13 — RBAC filter uses stable user_id (#293)
---------
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
parent f53e98d5a3
commit 3e19caa975
16 changed files with 729 additions and 494 deletions
14 CHANGELOG.md
@@ -10,6 +10,20 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C

 ## [Unreleased]

+## [0.54.13] — 2026-05-14
+
+### Security
+
+- **RBAC filter uses stable `user_id` (UUID) instead of mutable email
+local-part (#293).** Non-admin users querying `agnes_sessions` /
+`agnes_telemetry` are now filtered by `user_id` (immutable UUID)
+rather than `username` (email local-part, which changes on rename).
+Schema v45 adds a `user_id` column to `usage_session_summary` and
+`usage_events`; the session pipeline's `resolve_user_id()` populates
+it on every (re)process run. `USAGE_PROCESSOR_VERSION` bumps 3→4 to
+trigger backfill. During the transition period, RBAC queries include
+an OR fallback on `username` so pre-backfill rows remain visible.
+
 ## [0.54.12] — 2026-05-14

 ### Fixed
||||||
|
|
|
||||||
|
|
@ -97,6 +97,8 @@ def list_user_sessions(
|
||||||
# ------------------------------------------------------------------
|
# ------------------------------------------------------------------
|
||||||
# Pull processed rows from usage_session_summary
|
# Pull processed rows from usage_session_summary
|
||||||
# ------------------------------------------------------------------
|
# ------------------------------------------------------------------
|
||||||
|
# Match on both user_id (stable, v45+) and username (legacy) so the
|
||||||
|
# admin view shows sessions from both ingestion paths and pre-v45 rows.
|
||||||
try:
|
try:
|
||||||
rows_db = conn.execute(
|
rows_db = conn.execute(
|
||||||
"""
|
"""
|
||||||
|
|
@ -105,19 +107,27 @@ def list_user_sessions(
|
||||||
active_seconds, wall_seconds,
|
active_seconds, wall_seconds,
|
||||||
tool_calls, tool_errors, primary_model
|
tool_calls, tool_errors, primary_model
|
||||||
FROM usage_session_summary
|
FROM usage_session_summary
|
||||||
WHERE username = ?
|
WHERE user_id = ? OR username = ?
|
||||||
ORDER BY started_at DESC NULLS LAST
|
ORDER BY started_at DESC NULLS LAST
|
||||||
""",
|
""",
|
||||||
[username],
|
[user_id, username],
|
||||||
).fetchall()
|
).fetchall()
|
||||||
except Exception:
|
except Exception:
|
||||||
rows_db = []
|
rows_db = []
|
||||||
|
|
||||||
processed_files: dict[str, dict] = {}
|
processed_files: dict[str, dict] = {}
|
||||||
if rows_db:
|
if rows_db:
|
||||||
cols = ["session_file", "session_id", "started_at", "ended_at",
|
cols = [
|
||||||
"active_seconds", "wall_seconds", "tool_calls", "tool_errors",
|
"session_file",
|
||||||
"primary_model"]
|
"session_id",
|
||||||
|
"started_at",
|
||||||
|
"ended_at",
|
||||||
|
"active_seconds",
|
||||||
|
"wall_seconds",
|
||||||
|
"tool_calls",
|
||||||
|
"tool_errors",
|
||||||
|
"primary_model",
|
||||||
|
]
|
||||||
for r in rows_db:
|
for r in rows_db:
|
||||||
d = dict(zip(cols, r))
|
d = dict(zip(cols, r))
|
||||||
# Normalise timestamps to ISO strings
|
# Normalise timestamps to ISO strings
|
||||||
|
|
@ -144,18 +154,20 @@ def list_user_sessions(
|
||||||
# Try to extract a session_id from the filename: the collector
|
# Try to extract a session_id from the filename: the collector
|
||||||
# names files like "<session_id>.jsonl" or "sess-<id>.jsonl".
|
# names files like "<session_id>.jsonl" or "sess-<id>.jsonl".
|
||||||
sid = p.stem
|
sid = p.stem
|
||||||
all_rows.append({
|
all_rows.append(
|
||||||
"session_file": fname,
|
{
|
||||||
"session_id": sid,
|
"session_file": fname,
|
||||||
"started_at": mtime.isoformat(),
|
"session_id": sid,
|
||||||
"ended_at": None,
|
"started_at": mtime.isoformat(),
|
||||||
"active_seconds": None,
|
"ended_at": None,
|
||||||
"wall_seconds": None,
|
"active_seconds": None,
|
||||||
"tool_calls": 0,
|
"wall_seconds": None,
|
||||||
"tool_errors": 0,
|
"tool_calls": 0,
|
||||||
"primary_model": None,
|
"tool_errors": 0,
|
||||||
"processed": False,
|
"primary_model": None,
|
||||||
})
|
"processed": False,
|
||||||
|
}
|
||||||
|
)
|
||||||
|
|
||||||
# Sort: processed (have started_at) first then unprocessed, both newest-first
|
# Sort: processed (have started_at) first then unprocessed, both newest-first
|
||||||
def _sort_key(r: dict):
|
def _sort_key(r: dict):
|
||||||
|
|
@ -165,7 +177,7 @@ def list_user_sessions(
|
||||||
all_rows.sort(key=_sort_key, reverse=True)
|
all_rows.sort(key=_sort_key, reverse=True)
|
||||||
|
|
||||||
total = len(all_rows)
|
total = len(all_rows)
|
||||||
page = all_rows[offset: offset + limit]
|
page = all_rows[offset : offset + limit]
|
||||||
|
|
||||||
return {
|
return {
|
||||||
"rows": page,
|
"rows": page,
|
||||||
|
|
@ -351,7 +363,7 @@ def list_user_activity(
|
||||||
audit_repo = AuditRepository(conn)
|
audit_repo = AuditRepository(conn)
|
||||||
rows, _ = audit_repo.query(user_id=user_id, limit=limit + offset)
|
rows, _ = audit_repo.query(user_id=user_id, limit=limit + offset)
|
||||||
# Apply offset via slicing — cursor-based pagination is per-page only
|
# Apply offset via slicing — cursor-based pagination is per-page only
|
||||||
rows = rows[offset: offset + limit]
|
rows = rows[offset : offset + limit]
|
||||||
|
|
||||||
# Normalise timestamps to ISO strings and decode JSON params
|
# Normalise timestamps to ISO strings and decode JSON params
|
||||||
for r in rows:
|
for r in rows:
|
||||||
|
|
@ -366,9 +378,7 @@ def list_user_activity(
|
||||||
except (ValueError, TypeError):
|
except (ValueError, TypeError):
|
||||||
pass
|
pass
|
||||||
|
|
||||||
total = conn.execute(
|
total = conn.execute("SELECT COUNT(*) FROM audit_log WHERE user_id = ?", [user_id]).fetchone()[0]
|
||||||
"SELECT COUNT(*) FROM audit_log WHERE user_id = ?", [user_id]
|
|
||||||
).fetchone()[0]
|
|
||||||
|
|
||||||
try:
|
try:
|
||||||
AuditRepository(conn).log(
|
AuditRepository(conn).log(
|
||||||
|
|
|
||||||
|
|
@ -84,9 +84,7 @@ def _username_for_stats(user: dict) -> str:
|
||||||
return email.split("@")[0] if "@" in email else email
|
return email.split("@")[0] if "@" in email else email
|
||||||
|
|
||||||
|
|
||||||
def compute_home_stats(
|
def compute_home_stats(conn: duckdb.DuckDBPyConnection, user: dict, window: str = "24h") -> dict:
|
||||||
conn: duckdb.DuckDBPyConnection, user: dict, window: str = "24h"
|
|
||||||
) -> dict:
|
|
||||||
"""Pure helper that returns the home-stats payload for the given user.
|
"""Pure helper that returns the home-stats payload for the given user.
|
||||||
|
|
||||||
Shared by the HTTP endpoint and the /home Jinja handler (server-side
|
Shared by the HTTP endpoint and the /home Jinja handler (server-side
|
||||||
|
|
@ -101,9 +99,13 @@ def compute_home_stats(
|
||||||
interval = _WINDOW_INTERVALS["24h"]
|
interval = _WINDOW_INTERVALS["24h"]
|
||||||
|
|
||||||
username = _username_for_stats(user)
|
username = _username_for_stats(user)
|
||||||
|
uid = user.get("id") or ""
|
||||||
|
|
||||||
# f-string interpolates only the validated interval literal above;
|
# f-string interpolates only the validated interval literal above;
|
||||||
# all user-controlled input flows through bound parameters.
|
# all user-controlled input flows through bound parameters.
|
||||||
|
# Match on both user_id (stable, populated by v45 pipeline) and
|
||||||
|
# username (legacy rows before v45 backfill) so stats are complete
|
||||||
|
# during the transition period.
|
||||||
sql = f"""
|
sql = f"""
|
||||||
WITH win AS (
|
WITH win AS (
|
||||||
SELECT current_timestamp - {interval} AS since
|
SELECT current_timestamp - {interval} AS since
|
||||||
|
|
@ -117,12 +119,13 @@ def compute_home_stats(
|
||||||
COALESCE(SUM(cache_read_tokens), 0) AS cache_read,
|
COALESCE(SUM(cache_read_tokens), 0) AS cache_read,
|
||||||
COALESCE(SUM(cache_creation_tokens), 0) AS cache_creation
|
COALESCE(SUM(cache_creation_tokens), 0) AS cache_creation
|
||||||
FROM usage_session_summary, win
|
FROM usage_session_summary, win
|
||||||
WHERE username = ? AND started_at >= win.since
|
WHERE (user_id = ? OR username = ?)
|
||||||
|
AND started_at >= win.since
|
||||||
),
|
),
|
||||||
proj AS (
|
proj AS (
|
||||||
SELECT COUNT(DISTINCT cwd) AS projects
|
SELECT COUNT(DISTINCT cwd) AS projects
|
||||||
FROM usage_events, win
|
FROM usage_events, win
|
||||||
WHERE username = ?
|
WHERE (user_id = ? OR username = ?)
|
||||||
AND cwd IS NOT NULL
|
AND cwd IS NOT NULL
|
||||||
AND occurred_at >= win.since
|
AND occurred_at >= win.since
|
||||||
),
|
),
|
||||||
|
|
@ -137,7 +140,7 @@ def compute_home_stats(
|
||||||
proj.projects
|
proj.projects
|
||||||
FROM u, sess, proj
|
FROM u, sess, proj
|
||||||
"""
|
"""
|
||||||
row = conn.execute(sql, [username, username, user["id"]]).fetchone()
|
row = conn.execute(sql, [uid, username, uid, username, uid]).fetchone()
|
||||||
|
|
||||||
if row is None:
|
if row is None:
|
||||||
return {
|
return {
|
||||||
|
|
@ -146,15 +149,16 @@ def compute_home_stats(
|
||||||
"sessions": 0,
|
"sessions": 0,
|
||||||
"prompts": 0,
|
"prompts": 0,
|
||||||
"tokens": {
|
"tokens": {
|
||||||
"input": 0, "output": 0,
|
"input": 0,
|
||||||
"cache_read": 0, "cache_creation": 0,
|
"output": 0,
|
||||||
|
"cache_read": 0,
|
||||||
|
"cache_creation": 0,
|
||||||
"total": 0,
|
"total": 0,
|
||||||
},
|
},
|
||||||
"projects": 0,
|
"projects": 0,
|
||||||
}
|
}
|
||||||
|
|
||||||
(last_pull_at, sessions, prompts,
|
(last_pull_at, sessions, prompts, input_t, output_t, cache_read, cache_creation, projects) = row
|
||||||
input_t, output_t, cache_read, cache_creation, projects) = row
|
|
||||||
return {
|
return {
|
||||||
"window": window,
|
"window": window,
|
||||||
"last_pull_at": last_pull_at.isoformat() if last_pull_at else None,
|
"last_pull_at": last_pull_at.isoformat() if last_pull_at else None,
|
||||||
|
|
@ -165,8 +169,7 @@ def compute_home_stats(
|
||||||
"output": int(output_t or 0),
|
"output": int(output_t or 0),
|
||||||
"cache_read": int(cache_read or 0),
|
"cache_read": int(cache_read or 0),
|
||||||
"cache_creation": int(cache_creation or 0),
|
"cache_creation": int(cache_creation or 0),
|
||||||
"total": int((input_t or 0) + (output_t or 0)
|
"total": int((input_t or 0) + (output_t or 0) + (cache_read or 0) + (cache_creation or 0)),
|
||||||
+ (cache_read or 0) + (cache_creation or 0)),
|
|
||||||
},
|
},
|
||||||
"projects": int(projects or 0),
|
"projects": int(projects or 0),
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -27,9 +27,8 @@ from __future__ import annotations
|
||||||
import logging
|
import logging
|
||||||
import re
|
import re
|
||||||
from dataclasses import dataclass
|
from dataclasses import dataclass
|
||||||
from typing import Any, Optional
|
from typing import Any
|
||||||
|
|
||||||
import duckdb
|
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
@ -38,6 +37,7 @@ logger = logging.getLogger(__name__)
|
||||||
# Internal-table registry — single source of truth
|
# Internal-table registry — single source of truth
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
@dataclass(frozen=True)
|
@dataclass(frozen=True)
|
||||||
class InternalTable:
|
class InternalTable:
|
||||||
"""One internal table mapping.
|
"""One internal table mapping.
|
||||||
|
|
@ -54,30 +54,34 @@ class InternalTable:
|
||||||
display_name: human-readable name (also goes into ``table_registry.name``)
|
display_name: human-readable name (also goes into ``table_registry.name``)
|
||||||
description: short blurb (catalog UI + ``agnes catalog`` output)
|
description: short blurb (catalog UI + ``agnes catalog`` output)
|
||||||
"""
|
"""
|
||||||
|
|
||||||
registry_id: str
|
registry_id: str
|
||||||
source_table: str
|
source_table: str
|
||||||
filter_column: str
|
filter_column: str
|
||||||
filter_kind: str # 'username' | 'user_id'
|
filter_kind: str # 'username' | 'user_id'
|
||||||
display_name: str
|
display_name: str
|
||||||
description: str
|
description: str
|
||||||
|
legacy_username_column: str | None = None # backward-compat OR fallback
|
||||||
|
|
||||||
|
|
||||||
INTERNAL_TABLES: tuple[InternalTable, ...] = (
|
INTERNAL_TABLES: tuple[InternalTable, ...] = (
|
||||||
InternalTable(
|
InternalTable(
|
||||||
registry_id="agnes_sessions",
|
registry_id="agnes_sessions",
|
||||||
source_table="usage_session_summary",
|
source_table="usage_session_summary",
|
||||||
filter_column="username",
|
filter_column="user_id",
|
||||||
filter_kind="username",
|
filter_kind="user_id",
|
||||||
display_name="Agnes sessions",
|
display_name="Agnes sessions",
|
||||||
description="Claude Code sessions. Also available locally for analysis.",
|
description="Claude Code sessions. Also available locally for analysis.",
|
||||||
|
legacy_username_column="username",
|
||||||
),
|
),
|
||||||
InternalTable(
|
InternalTable(
|
||||||
registry_id="agnes_telemetry",
|
registry_id="agnes_telemetry",
|
||||||
source_table="usage_events",
|
source_table="usage_events",
|
||||||
filter_column="username",
|
filter_column="user_id",
|
||||||
filter_kind="username",
|
filter_kind="user_id",
|
||||||
display_name="Agnes telemetry events",
|
display_name="Agnes telemetry events",
|
||||||
description="Tool and skill invocations from Claude Code. Also available locally for analysis.",
|
description="Tool and skill invocations from Claude Code. Also available locally for analysis.",
|
||||||
|
legacy_username_column="username",
|
||||||
),
|
),
|
||||||
InternalTable(
|
InternalTable(
|
||||||
registry_id="agnes_audit",
|
registry_id="agnes_audit",
|
||||||
|
|
@ -89,9 +93,7 @@ INTERNAL_TABLES: tuple[InternalTable, ...] = (
|
||||||
),
|
),
|
||||||
)
|
)
|
||||||
|
|
||||||
INTERNAL_TABLES_BY_ID: dict[str, InternalTable] = {
|
INTERNAL_TABLES_BY_ID: dict[str, InternalTable] = {t.registry_id: t for t in INTERNAL_TABLES}
|
||||||
t.registry_id: t for t in INTERNAL_TABLES
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
def is_internal_table(table_id: str) -> bool:
|
def is_internal_table(table_id: str) -> bool:
|
||||||
|
|
@ -106,7 +108,7 @@ def is_internal_table(table_id: str) -> bool:
|
||||||
# resolve to filesystem usernames with a `+`, and the session-data-dir
|
# resolve to filesystem usernames with a `+`, and the session-data-dir
|
||||||
# layout already supports the same character class.
|
# layout already supports the same character class.
|
||||||
_USERNAME_RE = re.compile(r"^[A-Za-z0-9._+-]{1,200}$")
|
_USERNAME_RE = re.compile(r"^[A-Za-z0-9._+-]{1,200}$")
|
||||||
_USER_ID_RE = re.compile(r"^[A-Za-z0-9._@:+-]{1,200}$")
|
_USER_ID_RE = re.compile(r"^[A-Za-z0-9._@:+-]{1,200}$")
|
||||||
|
|
||||||
|
|
||||||
class InternalAccessError(Exception):
|
class InternalAccessError(Exception):
|
||||||
|
|
@ -128,16 +130,12 @@ def _filter_value(user: dict[str, Any], kind: str) -> str:
|
||||||
email = (user or {}).get("email", "") or ""
|
email = (user or {}).get("email", "") or ""
|
||||||
username = email.split("@")[0] if "@" in email else email
|
username = email.split("@")[0] if "@" in email else email
|
||||||
if not _USERNAME_RE.match(username):
|
if not _USERNAME_RE.match(username):
|
||||||
raise InternalAccessError(
|
raise InternalAccessError(f"user email {email!r} does not yield a safe username for scoping")
|
||||||
f"user email {email!r} does not yield a safe username for scoping"
|
|
||||||
)
|
|
||||||
return username
|
return username
|
||||||
if kind == "user_id":
|
if kind == "user_id":
|
||||||
uid = (user or {}).get("id", "") or ""
|
uid = (user or {}).get("id", "") or ""
|
||||||
if not _USER_ID_RE.match(uid):
|
if not _USER_ID_RE.match(uid):
|
||||||
raise InternalAccessError(
|
raise InternalAccessError(f"user_id {uid!r} fails the safe-identifier check")
|
||||||
f"user_id {uid!r} fails the safe-identifier check"
|
|
||||||
)
|
|
||||||
return uid
|
return uid
|
||||||
raise InternalAccessError(f"unknown filter_kind: {kind!r}")
|
raise InternalAccessError(f"unknown filter_kind: {kind!r}")
|
||||||
|
|
@@ -145,15 +143,24 @@ def _filter_value(user: dict[str, Any], kind: str) -> str:
 def build_filter_clause(table: InternalTable, user: dict[str, Any], is_admin: bool) -> str:
     """Return the WHERE clause for one internal table.

     Admins get an empty string (unscoped view). Everyone else gets
     ``WHERE <col> = '<value>'`` where value has been regex-validated.
+
+    ``agnes_sessions`` and ``agnes_telemetry`` filter primarily on
+    ``user_id`` (stable UUID) but include an OR fallback on
+    ``username`` (email local-part) for rows that pre-date the v45
+    backfill. Once all rows carry a non-NULL ``user_id`` the
+    ``legacy_username_column`` field can be removed.
     """
     if is_admin:
         return ""
     value = _filter_value(user, table.filter_kind)
-    # value is regex-validated to ``[A-Za-z0-9._@:-]`` so single-quote
-    # escape is unnecessary; double single-quote anyway for defense-in-depth.
     safe = value.replace("'", "''")
+
+    if table.legacy_username_column:
+        legacy = _filter_value(user, "username").replace("'", "''")
+        return f"WHERE ({table.filter_column} = '{safe}' OR {table.legacy_username_column} = '{legacy}')"
+
     return f"WHERE {table.filter_column} = '{safe}'"


||||||
|
|
@ -162,7 +169,7 @@ def build_filter_clause(table: InternalTable, user: dict[str, Any], is_admin: bo
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
_TABLE_REF_RE = re.compile(
|
_TABLE_REF_RE = re.compile(
|
||||||
r'\b(' + '|'.join(re.escape(t.registry_id) for t in INTERNAL_TABLES) + r')\b',
|
r"\b(" + "|".join(re.escape(t.registry_id) for t in INTERNAL_TABLES) + r")\b",
|
||||||
re.IGNORECASE,
|
re.IGNORECASE,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
@ -175,7 +182,7 @@ _SQL_STRING_LITERAL_RE = re.compile(r"'(?:''|[^'])*'")
|
||||||
# comment-wrapped table name (`/**/users/**/`) can't slip past the
|
# comment-wrapped table name (`/**/users/**/`) can't slip past the
|
||||||
# identifier scan downstream.
|
# identifier scan downstream.
|
||||||
_SQL_BLOCK_COMMENT_RE = re.compile(r"/\*[\s\S]*?\*/")
|
_SQL_BLOCK_COMMENT_RE = re.compile(r"/\*[\s\S]*?\*/")
|
||||||
_SQL_LINE_COMMENT_RE = re.compile(r"--[^\n]*")
|
_SQL_LINE_COMMENT_RE = re.compile(r"--[^\n]*")
|
||||||
|
|
||||||
|
|
||||||
def _strip_sql_noise(sql: str) -> str:
|
def _strip_sql_noise(sql: str) -> str:
|
||||||
|
|
@ -190,9 +197,7 @@ def _strip_sql_noise(sql: str) -> str:
|
||||||
return s
|
return s
|
||||||
|
|
||||||
|
|
||||||
_INTERNAL_ALIAS_NAMES: frozenset[str] = frozenset(
|
_INTERNAL_ALIAS_NAMES: frozenset[str] = frozenset(t.registry_id.lower() for t in INTERNAL_TABLES)
|
||||||
t.registry_id.lower() for t in INTERNAL_TABLES
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def _sensitive_table_reference(stripped_sql: str, conn) -> str | None:
|
def _sensitive_table_reference(stripped_sql: str, conn) -> str | None:
|
||||||
|
|
@ -211,10 +216,7 @@ def _sensitive_table_reference(stripped_sql: str, conn) -> str | None:
|
||||||
double-quoted (`"users"`) forms both match because the bare name
|
double-quoted (`"users"`) forms both match because the bare name
|
||||||
still sits between word boundaries.
|
still sits between word boundaries.
|
||||||
"""
|
"""
|
||||||
rows = conn.execute(
|
rows = conn.execute("SELECT table_name FROM information_schema.tables WHERE table_schema = 'main'").fetchall()
|
||||||
"SELECT table_name FROM information_schema.tables "
|
|
||||||
"WHERE table_schema = 'main'"
|
|
||||||
).fetchall()
|
|
||||||
for (name,) in rows:
|
for (name,) in rows:
|
||||||
if name is None:
|
if name is None:
|
||||||
continue
|
continue
|
||||||
|
|
@ -300,17 +302,13 @@ def execute_internal_query(
|
||||||
sensitive = _sensitive_table_reference(stripped, get_system_db())
|
sensitive = _sensitive_table_reference(stripped, get_system_db())
|
||||||
if sensitive is not None:
|
if sensitive is not None:
|
||||||
raise InternalAccessError(
|
raise InternalAccessError(
|
||||||
f"non-admin SQL cannot reference table {sensitive!r}; "
|
f"non-admin SQL cannot reference table {sensitive!r}; query one of the agnes_* aliases instead"
|
||||||
"query one of the agnes_* aliases instead"
|
|
||||||
)
|
)
|
||||||
cte_parts = []
|
cte_parts = []
|
||||||
for table_id in refs:
|
for table_id in refs:
|
||||||
table = INTERNAL_TABLES_BY_ID[table_id]
|
table = INTERNAL_TABLES_BY_ID[table_id]
|
||||||
where_clause = build_filter_clause(table, user, is_admin)
|
where_clause = build_filter_clause(table, user, is_admin)
|
||||||
cte_parts.append(
|
cte_parts.append(f"{table.registry_id} AS (SELECT * FROM {table.source_table} {where_clause})")
|
||||||
f"{table.registry_id} AS "
|
|
||||||
f"(SELECT * FROM {table.source_table} {where_clause})"
|
|
||||||
)
|
|
||||||
cte_prefix = "WITH " + ", ".join(cte_parts)
|
cte_prefix = "WITH " + ", ".join(cte_parts)
|
||||||
wrapped = f"{cte_prefix} SELECT * FROM ({sql}) AS _agnes_user_query"
|
wrapped = f"{cte_prefix} SELECT * FROM ({sql}) AS _agnes_user_query"
|
||||||
|
|
||||||
|
|
@ -332,6 +330,7 @@ def execute_internal_query(
|
||||||
# Schema introspection — feeds /api/v2/schema/{id} for internal tables
|
# Schema introspection — feeds /api/v2/schema/{id} for internal tables
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
def get_schema(system_db_path: str, table_id: str) -> list[dict]:
|
def get_schema(system_db_path: str, table_id: str) -> list[dict]:
|
||||||
"""Return the underlying physical schema for an internal table.
|
"""Return the underlying physical schema for an internal table.
|
||||||
|
|
||||||
|
|
@ -349,6 +348,7 @@ def get_schema(system_db_path: str, table_id: str) -> list[dict]:
|
||||||
return []
|
return []
|
||||||
table = INTERNAL_TABLES_BY_ID[table_id]
|
table = INTERNAL_TABLES_BY_ID[table_id]
|
||||||
from src.db import get_system_db
|
from src.db import get_system_db
|
||||||
|
|
||||||
cursor = get_system_db().cursor()
|
cursor = get_system_db().cursor()
|
||||||
try:
|
try:
|
||||||
rows = cursor.execute(
|
rows = cursor.execute(
|
||||||
|
|
@ -357,10 +357,7 @@ def get_schema(system_db_path: str, table_id: str) -> list[dict]:
|
||||||
"WHERE table_name = ? ORDER BY ordinal_position",
|
"WHERE table_name = ? ORDER BY ordinal_position",
|
||||||
[table.source_table],
|
[table.source_table],
|
||||||
).fetchall()
|
).fetchall()
|
||||||
return [
|
return [{"name": r[0], "type": r[1], "nullable": r[2] == "YES"} for r in rows]
|
||||||
{"name": r[0], "type": r[1], "nullable": r[2] == "YES"}
|
|
||||||
for r in rows
|
|
||||||
]
|
|
||||||
finally:
|
finally:
|
||||||
try:
|
try:
|
||||||
cursor.close()
|
cursor.close()
|
||||||
|
|
|
||||||
|
|
@ -1,6 +1,6 @@
|
||||||
[project]
|
[project]
|
||||||
name = "agnes-the-ai-analyst"
|
name = "agnes-the-ai-analyst"
|
||||||
version = "0.54.12"
|
version = "0.54.13"
|
||||||
description = "Agnes — AI Data Analyst platform for AI analytical systems"
|
description = "Agnes — AI Data Analyst platform for AI analytical systems"
|
||||||
requires-python = ">=3.11,<3.14"
|
requires-python = ">=3.11,<3.14"
|
||||||
license = "MIT"
|
license = "MIT"
|
||||||
|
|
|
@@ -25,6 +25,43 @@ from src.repositories.session_processor_state import SessionProcessorStateReposi

 logger = logging.getLogger(__name__)

+
+def resolve_user_id(
+    conn: duckdb.DuckDBPyConnection,
+    username: str,
+) -> str | None:
+    """Map a session-directory name to the stable ``users.id`` UUID.
+
+    Two conventions exist for the directory name under
+    ``/data/user_sessions/``:
+
+    * **Session collector** writes under the OS username, which in
+      current deployments equals the email local-part (e.g. ``alice``).
+    * **Upload API** writes under ``user["id"]`` — a UUID.
+
+    Resolution order:
+    1. Exact match on ``users.id`` (covers the UUID path).
+    2. Email local-part match: ``users.email LIKE '<username>@%'``.
+       If multiple users share the same local-part (different domains),
+       we pick the one most recently updated.
+    3. Fallback: return ``None`` (orphaned / deleted user).
+    """
+    row = conn.execute(
+        "SELECT id FROM users WHERE id = ?",
+        [username],
+    ).fetchone()
+    if row:
+        return row[0]
+    escaped = username.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
+    row = conn.execute(
+        "SELECT id FROM users WHERE email LIKE ? || '@%' ESCAPE '\\' ORDER BY updated_at DESC NULLS LAST LIMIT 1",
+        [escaped],
+    ).fetchone()
+    if row:
+        return row[0]
+    return None
+
+
 DEFAULT_SESSION_DATA_DIR = Path(os.environ.get("SESSION_DATA_DIR", "/data/user_sessions"))


||||||
|
|
@ -60,6 +97,11 @@ def run_processor(
|
||||||
logger.info("No sessions to process for processor=%s", processor.name)
|
logger.info("No sessions to process for processor=%s", processor.name)
|
||||||
return stats
|
return stats
|
||||||
|
|
||||||
|
# Pre-resolve user_id per directory name so each processor can
|
||||||
|
# store the stable identity. Cache avoids repeated DB lookups when
|
||||||
|
# one user has many sessions.
|
||||||
|
_uid_cache: dict[str, str | None] = {}
|
||||||
|
|
||||||
for username, jsonl_path in candidates:
|
for username, jsonl_path in candidates:
|
||||||
session_key = f"{username}/{jsonl_path.name}"
|
session_key = f"{username}/{jsonl_path.name}"
|
||||||
try:
|
try:
|
||||||
|
|
@ -67,7 +109,9 @@ def run_processor(
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.warning(
|
logger.warning(
|
||||||
"Cannot hash %s for processor=%s: %s",
|
"Cannot hash %s for processor=%s: %s",
|
||||||
session_key, processor.name, e,
|
session_key,
|
||||||
|
processor.name,
|
||||||
|
e,
|
||||||
)
|
)
|
||||||
stats["errors"] += 1
|
stats["errors"] += 1
|
||||||
stats["errors_detail"].append({"session": session_key, "error": str(e)})
|
stats["errors_detail"].append({"session": session_key, "error": str(e)})
|
||||||
|
|
@ -82,12 +126,23 @@ def run_processor(
|
||||||
stats["skipped"] += 1
|
stats["skipped"] += 1
|
||||||
continue
|
continue
|
||||||
|
|
||||||
|
if username not in _uid_cache:
|
||||||
|
_uid_cache[username] = resolve_user_id(conn, username)
|
||||||
|
resolved_uid = _uid_cache[username]
|
||||||
|
|
||||||
try:
|
try:
|
||||||
result = processor.process_session(jsonl_path, username, session_key, conn)
|
result = processor.process_session(
|
||||||
|
jsonl_path,
|
||||||
|
username,
|
||||||
|
session_key,
|
||||||
|
conn,
|
||||||
|
user_id=resolved_uid,
|
||||||
|
)
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.exception(
|
logger.exception(
|
||||||
"Processor %s failed on %s — leaving state unwritten for retry",
|
"Processor %s failed on %s — leaving state unwritten for retry",
|
||||||
processor.name, session_key,
|
processor.name,
|
||||||
|
session_key,
|
||||||
)
|
)
|
||||||
stats["errors"] += 1
|
stats["errors"] += 1
|
||||||
stats["errors_detail"].append({"session": session_key, "error": str(e)})
|
stats["errors_detail"].append({"session": session_key, "error": str(e)})
|
||||||
|
|
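Note on the `_uid_cache` lookup in the hunk above: the membership test (`if username not in _uid_cache`) means a `None` result is cached as well, so an unresolvable directory name is only looked up once. A minimal sketch of that pattern follows. `make_cached_resolver` and `fake_resolve` are hypothetical names used for illustration, and the stub stands in for `resolve_user_id(conn, username)`, whose body is not shown in this part of the diff.

```python
from typing import Callable, Optional


def make_cached_resolver(
    resolve: Callable[[str], Optional[str]],
) -> Callable[[str], Optional[str]]:
    """Wrap a username -> user_id lookup so each username is resolved once."""
    cache: dict = {}

    def cached(username: str) -> Optional[str]:
        # Membership test (not .get) so that a None result is cached too.
        if username not in cache:
            cache[username] = resolve(username)
        return cache[username]

    return cached


# Hypothetical stand-in for resolve_user_id(conn, username).
calls = []


def fake_resolve(username: str) -> Optional[str]:
    calls.append(username)
    return {"alice": "uuid-alice"}.get(username)


cached_resolve = make_cached_resolver(fake_resolve)
resolved = [cached_resolve(u) for u in ["alice", "alice", "ghost", "ghost"]]
```

Here the underlying lookup runs once per distinct username, including the one that resolves to `None`.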
@@ -101,7 +156,8 @@ def run_processor(
|
||||||
# cause the same session to be retried forever.
|
# cause the same session to be retried forever.
|
||||||
logger.warning(
|
logger.warning(
|
||||||
"Processor %s returned non-ProcessorResult on %s; coercing to empty result",
|
"Processor %s returned non-ProcessorResult on %s; coercing to empty result",
|
||||||
processor.name, session_key,
|
processor.name,
|
||||||
|
session_key,
|
||||||
)
|
)
|
||||||
result = ProcessorResult(items_count=0)
|
result = ProcessorResult(items_count=0)
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@@ -32,6 +32,8 @@ class UsageProcessor:
|
||||||
username: str,
|
username: str,
|
||||||
session_key: str,
|
session_key: str,
|
||||||
conn: duckdb.DuckDBPyConnection,
|
conn: duckdb.DuckDBPyConnection,
|
||||||
|
*,
|
||||||
|
user_id: str | None = None,
|
||||||
) -> ProcessorResult:
|
) -> ProcessorResult:
|
||||||
turns = parse_jsonl(session_path)
|
turns = parse_jsonl(session_path)
|
||||||
events = list(iter_events(turns))
|
events = list(iter_events(turns))
|
||||||
|
|
@@ -63,6 +65,7 @@ class UsageProcessor:
|
||||||
"session_id": session_id,
|
"session_id": session_id,
|
||||||
"session_file": session_key,
|
"session_file": session_key,
|
||||||
"username": username,
|
"username": username,
|
||||||
|
"user_id": user_id,
|
||||||
"event_uuid": e.event_uuid,
|
"event_uuid": e.event_uuid,
|
||||||
"parent_uuid": e.parent_uuid,
|
"parent_uuid": e.parent_uuid,
|
||||||
"event_type": e.event_type,
|
"event_type": e.event_type,
|
||||||
|
|
@@ -83,6 +86,7 @@ class UsageProcessor:
|
||||||
summary = compute_summary(turns, rows)
|
summary = compute_summary(turns, rows)
|
||||||
summary["session_file"] = session_key
|
summary["session_file"] = session_key
|
||||||
summary["username"] = username
|
summary["username"] = username
|
||||||
|
summary["user_id"] = user_id
|
||||||
# Override session_id with the resolved one
|
# Override session_id with the resolved one
|
||||||
if not summary.get("session_id"):
|
if not summary.get("session_id"):
|
||||||
summary["session_id"] = session_id
|
summary["session_id"] = session_id
|
||||||
|
|
|
||||||
|
|
@@ -40,13 +40,29 @@ from dataclasses import dataclass
 from datetime import datetime, timezone, timedelta
 from typing import Iterator
 
-USAGE_PROCESSOR_VERSION = 3
+# v4: #293 added the user_id column to usage tables — bump forces the
+# session-pipeline reprocess loop to backfill user_id on existing rows.
+# (v3 was #303's <command-name> slash extraction.)
+USAGE_PROCESSOR_VERSION = 4
 
-BUILTIN_TOOLS = frozenset({
-    "Bash", "Read", "Edit", "Write", "Grep", "Glob", "TodoWrite",
-    "Task", "Agent", "NotebookEdit", "WebFetch", "WebSearch", "ExitPlanMode",
-    "LS",  # also built-in
-})
+BUILTIN_TOOLS = frozenset(
+    {
+        "Bash",
+        "Read",
+        "Edit",
+        "Write",
+        "Grep",
+        "Glob",
+        "TodoWrite",
+        "Task",
+        "Agent",
+        "NotebookEdit",
+        "WebFetch",
+        "WebSearch",
+        "ExitPlanMode",
+        "LS",  # also built-in
+    }
+)
 
 # Claude Code wraps user-typed slash invocations as
 # <command-name>/<name></command-name> inside the user message content
|
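Note on the version bump: the comment says raising `USAGE_PROCESSOR_VERSION` makes the reprocess loop revisit existing rows. That loop is not part of this diff, so the following is only a sketch of how such a gate could work, assuming each session records the processor version that wrote it (as the `processor_version` column suggests). `sessions_to_reprocess` and the sample data are hypothetical.

```python
USAGE_PROCESSOR_VERSION = 4  # value introduced by this change


def sessions_to_reprocess(stored_versions: dict, current: int) -> list:
    """Return session keys whose rows were written by an older processor."""
    return sorted(key for key, version in stored_versions.items() if version < current)


# Hypothetical state: session key -> processor_version recorded at extraction time.
stored = {"alice/a.jsonl": 3, "alice/b.jsonl": 4, "bob/c.jsonl": 2}
stale = sessions_to_reprocess(stored, USAGE_PROCESSOR_VERSION)
```

Under this assumption, anything written by v3 or earlier is picked up again and gets its `user_id` filled in on the next run.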
@@ -57,18 +73,23 @@ BUILTIN_TOOLS = frozenset({
|
||||||
COMMAND_NAME_RE = re.compile(r"<command-name>/([A-Za-z][\w:-]*)</command-name>")
|
COMMAND_NAME_RE = re.compile(r"<command-name>/([A-Za-z][\w:-]*)</command-name>")
|
||||||
|
|
||||||
# Event types to skip entirely
|
# Event types to skip entirely
|
||||||
_SKIP_TYPES = frozenset({
|
_SKIP_TYPES = frozenset(
|
||||||
"system", "summary", "file-history-snapshot",
|
{
|
||||||
"queue-operation", "progress",
|
"system",
|
||||||
})
|
"summary",
|
||||||
|
"file-history-snapshot",
|
||||||
|
"queue-operation",
|
||||||
|
"progress",
|
||||||
|
}
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
@dataclass(frozen=True)
|
@dataclass(frozen=True)
|
||||||
class ParsedEvent:
|
class ParsedEvent:
|
||||||
event_uuid: str | None
|
event_uuid: str | None
|
||||||
parent_uuid: str | None
|
parent_uuid: str | None
|
||||||
tool_id: str | None # tool_use 'id' (tu_xxx) from message.content item; None for slash_command
|
tool_id: str | None # tool_use 'id' (tu_xxx) from message.content item; None for slash_command
|
||||||
event_type: str # 'tool_use' | 'slash_command' | 'subagent' | 'mcp_call'
|
event_type: str # 'tool_use' | 'slash_command' | 'subagent' | 'mcp_call'
|
||||||
tool_name: str | None
|
tool_name: str | None
|
||||||
skill_name: str | None
|
skill_name: str | None
|
||||||
subagent_type: str | None
|
subagent_type: str | None
|
||||||
|
|
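Note on `COMMAND_NAME_RE`: the pattern captures the name after the leading slash and requires it to start with a letter. A short illustration using the regex exactly as it appears in the diff; the sample text is made up.

```python
import re

COMMAND_NAME_RE = re.compile(r"<command-name>/([A-Za-z][\w:-]*)</command-name>")

# Made-up message content: one valid invocation, one whose name starts with a digit.
text = (
    "<command-name>/review:pr</command-name> please check "
    "<command-name>/9bad</command-name>"
)
names = COMMAND_NAME_RE.findall(text)
```

Only `review:pr` is captured; `/9bad` is rejected because the first character of the name must match `[A-Za-z]`.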
@@ -207,9 +228,7 @@ def iter_events(turns: list[dict]) -> Iterator[ParsedEvent]:
|
||||||
text_parts = [content]
|
text_parts = [content]
|
||||||
elif isinstance(content, list):
|
elif isinstance(content, list):
|
||||||
text_parts = [
|
text_parts = [
|
||||||
item.get("text", "")
|
item.get("text", "") for item in content if isinstance(item, dict) and item.get("type") == "text"
|
||||||
for item in content
|
|
||||||
if isinstance(item, dict) and item.get("type") == "text"
|
|
||||||
]
|
]
|
||||||
else:
|
else:
|
||||||
text_parts = []
|
text_parts = []
|
||||||
|
|
@@ -243,7 +262,7 @@ class AttributionLookup:
|
||||||
"""
|
"""
|
||||||
|
|
||||||
def __init__(self, conn):
|
def __init__(self, conn):
|
||||||
self._skills: dict[str, tuple[str, str]] = {} # name -> (source, ref_id)
|
self._skills: dict[str, tuple[str, str]] = {} # name -> (source, ref_id)
|
||||||
self._agents: dict[str, tuple[str, str]] = {}
|
self._agents: dict[str, tuple[str, str]] = {}
|
||||||
self._commands: dict[str, tuple[str, str]] = {}
|
self._commands: dict[str, tuple[str, str]] = {}
|
||||||
|
|
||||||
|
|
@@ -375,9 +394,7 @@ def compute_summary(turns: list[dict], events: list[dict]) -> dict:
|
||||||
|
|
||||||
started_at = min(timestamps) if timestamps else None
|
started_at = min(timestamps) if timestamps else None
|
||||||
ended_at = max(timestamps) if timestamps else None
|
ended_at = max(timestamps) if timestamps else None
|
||||||
wall_seconds = (
|
wall_seconds = int((ended_at - started_at).total_seconds()) if started_at and ended_at else 0
|
||||||
int((ended_at - started_at).total_seconds()) if started_at and ended_at else 0
|
|
||||||
)
|
|
||||||
active_seconds = compute_active_seconds(timestamps)
|
active_seconds = compute_active_seconds(timestamps)
|
||||||
|
|
||||||
# Aggregate counts from events
|
# Aggregate counts from events
|
||||||
|
|
|
||||||
|
|
@@ -42,6 +42,7 @@ class VerificationProcessor:
|
||||||
username: str,
|
username: str,
|
||||||
session_key: str,
|
session_key: str,
|
||||||
conn: duckdb.DuckDBPyConnection,
|
conn: duckdb.DuckDBPyConnection,
|
||||||
|
**kwargs: object,
|
||||||
) -> ProcessorResult:
|
) -> ProcessorResult:
|
||||||
repo = KnowledgeRepository(conn)
|
repo = KnowledgeRepository(conn)
|
||||||
session_id = f"session-{session_path.stem}-{username}"
|
session_id = f"session-{session_path.stem}-{username}"
|
||||||
|
|
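Note on the `**kwargs: object` addition: the runner now passes `user_id=` to every processor, so processors that do not store it still have to accept the keyword. A minimal sketch of the two signatures side by side; `UsageLike` and `VerificationLike` are hypothetical stand-ins with simplified parameters, not the real classes.

```python
from typing import Optional


class UsageLike:
    """Stores user_id: accepts it as a keyword-only argument."""

    def process_session(self, path: str, username: str, *, user_id: Optional[str] = None) -> dict:
        return {"username": username, "user_id": user_id}


class VerificationLike:
    """Does not need user_id: swallows extra keywords via **kwargs."""

    def process_session(self, path: str, username: str, **kwargs: object) -> dict:
        return {"username": username}


# The runner can use one call shape for every processor.
results = [
    p.process_session("s.jsonl", "alice", user_id="uuid-alice")
    for p in (UsageLike(), VerificationLike())
]
```

Without the `**kwargs` catch-all, the second call would raise `TypeError` for the unexpected keyword.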
@@ -126,7 +127,8 @@ class VerificationProcessor:
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.warning(
|
logger.warning(
|
||||||
"Duplicate-candidate detection failed for %s: %s",
|
"Duplicate-candidate detection failed for %s: %s",
|
||||||
item_id, e,
|
item_id,
|
||||||
|
e,
|
||||||
)
|
)
|
||||||
|
|
||||||
# Run contradiction detection inline. Failure of the LLM
|
# Run contradiction detection inline. Failure of the LLM
|
||||||
|
|
@@ -140,12 +142,15 @@ class VerificationProcessor:
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.warning(
|
logger.warning(
|
||||||
"Unexpected error during contradiction check for %s: %s",
|
"Unexpected error during contradiction check for %s: %s",
|
||||||
item_id, e,
|
item_id,
|
||||||
|
e,
|
||||||
)
|
)
|
||||||
|
|
||||||
logger.info(
|
logger.info(
|
||||||
"Processed %s: %d verifications, %d items created",
|
"Processed %s: %d verifications, %d items created",
|
||||||
session_key, len(verifications), items_created,
|
session_key,
|
||||||
|
len(verifications),
|
||||||
|
items_created,
|
||||||
)
|
)
|
||||||
return ProcessorResult(items_count=items_created)
|
return ProcessorResult(items_count=items_created)
|
||||||
|
|
||||||
|
|
|
||||||
316
src/db.py
@@ -40,7 +40,7 @@ def _maybe_instrument(con, db_tag: str):
 
 _SAFE_IDENTIFIER = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]{0,63}$")
 
-SCHEMA_VERSION = 44
+SCHEMA_VERSION = 45
 
 _SYSTEM_SCHEMA = """
 CREATE TABLE IF NOT EXISTS schema_version (
@@ -723,13 +723,18 @@ CREATE TABLE IF NOT EXISTS usage_events (
     occurred_at TIMESTAMP NOT NULL,
     processor_version INTEGER NOT NULL,
     extracted_at TIMESTAMP DEFAULT current_timestamp,
-    friction_tags JSON
+    friction_tags JSON,
+    user_id VARCHAR
 );
 CREATE INDEX IF NOT EXISTS idx_usage_events_session ON usage_events(session_id);
 CREATE INDEX IF NOT EXISTS idx_usage_events_user_time ON usage_events(username, occurred_at);
 CREATE INDEX IF NOT EXISTS idx_usage_events_tool ON usage_events(tool_name);
 CREATE INDEX IF NOT EXISTS idx_usage_events_skill ON usage_events(skill_name);
 CREATE INDEX IF NOT EXISTS idx_usage_events_ref ON usage_events(source, ref_id);
+-- idx_usage_events_user_id is created by _v44_to_v45, not here: _SYSTEM_SCHEMA
+-- runs before the migration ladder, and CREATE TABLE IF NOT EXISTS won't add
+-- user_id to a pre-v45 usage_events, so an index on it would fail to bind.
+-- Same pattern as the v41 audit_log indices below.
 
 CREATE TABLE IF NOT EXISTS usage_session_summary (
     session_file VARCHAR PRIMARY KEY,
@@ -760,10 +765,13 @@ CREATE TABLE IF NOT EXISTS usage_session_summary (
     input_tokens BIGINT DEFAULT 0,
     output_tokens BIGINT DEFAULT 0,
     cache_read_tokens BIGINT DEFAULT 0,
-    cache_creation_tokens BIGINT DEFAULT 0
+    cache_creation_tokens BIGINT DEFAULT 0,
+    user_id VARCHAR
 );
 CREATE INDEX IF NOT EXISTS idx_usage_session_user ON usage_session_summary(username);
 CREATE INDEX IF NOT EXISTS idx_usage_session_started ON usage_session_summary(started_at);
+-- idx_usage_session_user_id is created by _v44_to_v45, not here — see the
+-- note on idx_usage_events_user_id above.
 
 CREATE TABLE IF NOT EXISTS usage_tool_daily (
     day DATE NOT NULL,
|
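Note on the index placement: the SQL comments explain that `CREATE TABLE IF NOT EXISTS` leaves an existing table untouched, so an index on the new column cannot be created until a migration has added it. The sketch below reproduces that ordering problem with the standard-library `sqlite3` module as a stand-in for DuckDB (an assumption made so the example runs anywhere; error types and column types differ in DuckDB). The `ALTER TABLE ... ADD COLUMN` step is a guess at what `_v44_to_v45` does, since its body is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simulate a pre-v45 database: the table exists without user_id.
conn.execute("CREATE TABLE usage_events (session_id TEXT, username TEXT)")

# Re-running the current schema is a no-op for the existing table,
# so the new column is NOT added.
conn.execute(
    "CREATE TABLE IF NOT EXISTS usage_events "
    "(session_id TEXT, username TEXT, user_id TEXT)"
)
cols_before = [row[1] for row in conn.execute("PRAGMA table_info(usage_events)")]

# An index on the missing column cannot be created at this point.
try:
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_usage_events_user_id ON usage_events(user_id)"
    )
    index_failed = False
except sqlite3.OperationalError:
    index_failed = True

# Assumed migration order: add the column first, then the index.
conn.execute("ALTER TABLE usage_events ADD COLUMN user_id TEXT")
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_usage_events_user_id ON usage_events(user_id)"
)
cols_after = [row[1] for row in conn.execute("PRAGMA table_info(usage_events)")]
```

Keeping the index statement in the migration step means the base schema can still be replayed safely against databases at any earlier version.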
@@ -881,15 +889,16 @@ def _try_open_system_db(db_path: str) -> duckdb.DuckDBPyConnection:
|
||||||
snapshot = Path(db_path).parent / "system.duckdb.pre-migrate"
|
snapshot = Path(db_path).parent / "system.duckdb.pre-migrate"
|
||||||
if not snapshot.exists():
|
if not snapshot.exists():
|
||||||
logger.error(
|
logger.error(
|
||||||
"WAL replay failed and no pre-migrate snapshot at %s — "
|
"WAL replay failed and no pre-migrate snapshot at %s — manual recovery required.",
|
||||||
"manual recovery required.", snapshot,
|
snapshot,
|
||||||
)
|
)
|
||||||
raise
|
raise
|
||||||
wal_path = Path(db_path + ".wal")
|
wal_path = Path(db_path + ".wal")
|
||||||
logger.warning(
|
logger.warning(
|
||||||
"WAL replay failed (%s) — auto-restoring from pre-migrate "
|
"WAL replay failed (%s) — auto-restoring from pre-migrate "
|
||||||
"snapshot %s. The migration ladder will re-run on this start.",
|
"snapshot %s. The migration ladder will re-run on this start.",
|
||||||
msg.split("\n", 1)[0][:200], snapshot,
|
msg.split("\n", 1)[0][:200],
|
||||||
|
snapshot,
|
||||||
)
|
)
|
||||||
# Move (not copy) the broken DB aside so an operator can post-
|
# Move (not copy) the broken DB aside so an operator can post-
|
||||||
# mortem if needed. The pre-migrate snapshot becomes the new
|
# mortem if needed. The pre-migrate snapshot becomes the new
|
||||||
|
|
@@ -961,9 +970,7 @@ def get_analytics_db() -> duckdb.DuckDBPyConnection:
|
||||||
return _maybe_instrument(_analytics_db_conn.cursor(), "analytics")
|
return _maybe_instrument(_analytics_db_conn.cursor(), "analytics")
|
||||||
|
|
||||||
|
|
||||||
def _reattach_remote_extensions(
|
def _reattach_remote_extensions(conn: duckdb.DuckDBPyConnection, extracts_dir: Path) -> None:
|
||||||
conn: duckdb.DuckDBPyConnection, extracts_dir: Path
|
|
||||||
) -> None:
|
|
||||||
"""Re-LOAD DuckDB extensions listed in _remote_attach tables of each extract.duckdb.
|
"""Re-LOAD DuckDB extensions listed in _remote_attach tables of each extract.duckdb.
|
||||||
|
|
||||||
Called from get_analytics_db_readonly() after ATTACHing extract.duckdb files so
|
Called from get_analytics_db_readonly() after ATTACHing extract.duckdb files so
|
||||||
|
|
@@ -974,9 +981,7 @@ def _reattach_remote_extensions(
|
||||||
return
|
return
|
||||||
|
|
||||||
try:
|
try:
|
||||||
attached_dbs = {
|
attached_dbs = {r[0] for r in conn.execute("SELECT database_name FROM duckdb_databases()").fetchall()}
|
||||||
r[0] for r in conn.execute("SELECT database_name FROM duckdb_databases()").fetchall()
|
|
||||||
}
|
|
||||||
except Exception:
|
except Exception:
|
||||||
return
|
return
|
||||||
|
|
||||||
|
|
@@ -1013,9 +1018,7 @@ def _reattach_remote_extensions(
|
||||||
|
|
||||||
# Refresh attached list before processing each source's rows
|
# Refresh attached list before processing each source's rows
|
||||||
try:
|
try:
|
||||||
attached_dbs = {
|
attached_dbs = {r[0] for r in conn.execute("SELECT database_name FROM duckdb_databases()").fetchall()}
|
||||||
r[0] for r in conn.execute("SELECT database_name FROM duckdb_databases()").fetchall()
|
|
||||||
}
|
|
||||||
except Exception:
|
except Exception:
|
||||||
pass
|
pass
|
||||||
|
|
||||||
|
|
@@ -1042,7 +1045,8 @@ def _reattach_remote_extensions(
|
||||||
"query-path remote_attach: extension %r not in allowlist; "
|
"query-path remote_attach: extension %r not in allowlist; "
|
||||||
"refusing to LOAD/ATTACH for source %s. Override via "
|
"refusing to LOAD/ATTACH for source %s. Override via "
|
||||||
"AGNES_REMOTE_ATTACH_EXTENSIONS if intended.",
|
"AGNES_REMOTE_ATTACH_EXTENSIONS if intended.",
|
||||||
extension, alias,
|
extension,
|
||||||
|
alias,
|
||||||
)
|
)
|
||||||
continue
|
continue
|
||||||
if token_env and not is_token_env_allowed(token_env):
|
if token_env and not is_token_env_allowed(token_env):
|
||||||
|
|
@@ -1050,7 +1054,8 @@ def _reattach_remote_extensions(
|
||||||
"query-path remote_attach: token_env %r not in allowlist; "
|
"query-path remote_attach: token_env %r not in allowlist; "
|
||||||
"refusing for source %s. Override via "
|
"refusing for source %s. Override via "
|
||||||
"AGNES_REMOTE_ATTACH_TOKEN_ENVS if intended.",
|
"AGNES_REMOTE_ATTACH_TOKEN_ENVS if intended.",
|
||||||
token_env, alias,
|
token_env,
|
||||||
|
alias,
|
||||||
)
|
)
|
||||||
continue
|
continue
|
||||||
if alias in attached_dbs:
|
if alias in attached_dbs:
|
||||||
|
|
@@ -1081,25 +1086,20 @@ def _reattach_remote_extensions(
|
||||||
except BQMetadataAuthError as e:
|
except BQMetadataAuthError as e:
|
||||||
logger.error(
|
logger.error(
|
||||||
"Failed to fetch BQ metadata token for %s: %s — skipping ATTACH",
|
"Failed to fetch BQ metadata token for %s: %s — skipping ATTACH",
|
||||||
alias, e,
|
alias,
|
||||||
|
e,
|
||||||
)
|
)
|
||||||
continue
|
continue
|
||||||
escaped = escape_sql_string_literal(bq_token)
|
escaped = escape_sql_string_literal(bq_token)
|
||||||
secret_name = f"bq_secret_{alias}"
|
secret_name = f"bq_secret_{alias}"
|
||||||
conn.execute(
|
conn.execute(f"CREATE OR REPLACE SECRET {secret_name} (TYPE bigquery, ACCESS_TOKEN '{escaped}')")
|
||||||
f"CREATE OR REPLACE SECRET {secret_name} "
|
|
||||||
f"(TYPE bigquery, ACCESS_TOKEN '{escaped}')"
|
|
||||||
)
|
|
||||||
from connectors.bigquery.access import apply_bq_session_settings
|
from connectors.bigquery.access import apply_bq_session_settings
|
||||||
|
|
||||||
apply_bq_session_settings(conn)
|
apply_bq_session_settings(conn)
|
||||||
conn.execute(
|
conn.execute(f"ATTACH '{safe_url}' AS {alias} (TYPE {extension}, READ_ONLY)")
|
||||||
f"ATTACH '{safe_url}' AS {alias} (TYPE {extension}, READ_ONLY)"
|
|
||||||
)
|
|
||||||
elif token:
|
elif token:
|
||||||
escaped_token = escape_sql_string_literal(token)
|
escaped_token = escape_sql_string_literal(token)
|
||||||
conn.execute(
|
conn.execute(f"ATTACH '{safe_url}' AS {alias} (TYPE {extension}, TOKEN '{escaped_token}')")
|
||||||
f"ATTACH '{safe_url}' AS {alias} (TYPE {extension}, TOKEN '{escaped_token}')"
|
|
||||||
)
|
|
||||||
# Apply BQ session settings on every BQ-extension attach,
|
# Apply BQ session settings on every BQ-extension attach,
|
||||||
# not only the metadata-token branch above. Previously the
|
# not only the metadata-token branch above. Previously the
|
||||||
# token-based branch fell through without setting
|
# token-based branch fell through without setting
|
||||||
|
|
@@ -1107,13 +1107,13 @@ def _reattach_remote_extensions(
|
||||||
# in place and causing "remote query timeout" surprises.
|
# in place and causing "remote query timeout" surprises.
|
||||||
if extension == "bigquery":
|
if extension == "bigquery":
|
||||||
from connectors.bigquery.access import apply_bq_session_settings
|
from connectors.bigquery.access import apply_bq_session_settings
|
||||||
|
|
||||||
apply_bq_session_settings(conn)
|
apply_bq_session_settings(conn)
|
||||||
else:
|
else:
|
||||||
conn.execute(
|
conn.execute(f"ATTACH '{safe_url}' AS {alias} (TYPE {extension}, READ_ONLY)")
|
||||||
f"ATTACH '{safe_url}' AS {alias} (TYPE {extension}, READ_ONLY)"
|
|
||||||
)
|
|
||||||
if extension == "bigquery":
|
if extension == "bigquery":
|
||||||
from connectors.bigquery.access import apply_bq_session_settings
|
from connectors.bigquery.access import apply_bq_session_settings
|
||||||
|
|
||||||
apply_bq_session_settings(conn)
|
apply_bq_session_settings(conn)
|
||||||
attached_dbs.add(alias)
|
attached_dbs.add(alias)
|
||||||
logger.debug("Re-attached remote source %s via %s extension", alias, extension)
|
logger.debug("Re-attached remote source %s via %s extension", alias, extension)
|
||||||
|
|
@@ -1434,8 +1434,7 @@ _V16_TO_V17_MIGRATIONS = [
|
||||||
PRIMARY KEY (item_a_id, item_b_id, relation_type)
|
PRIMARY KEY (item_a_id, item_b_id, relation_type)
|
||||||
)
|
)
|
||||||
""",
|
""",
|
||||||
"CREATE INDEX IF NOT EXISTS idx_knowledge_item_relations_resolved "
|
"CREATE INDEX IF NOT EXISTS idx_knowledge_item_relations_resolved ON knowledge_item_relations(resolved)",
|
||||||
"ON knowledge_item_relations(resolved)",
|
|
||||||
]
|
]
|
||||||
|
|
||||||
|
|
||||||
|
|
@@ -1458,14 +1457,15 @@ _V19_TO_V20_MIGRATIONS = [
|
||||||
# first so implies references resolve cleanly when expand_implies does BFS.
|
# first so implies references resolve cleanly when expand_implies does BFS.
|
||||||
_CORE_ROLES_SEED = [
|
_CORE_ROLES_SEED = [
|
||||||
# (key, display_name, description, implies)
|
# (key, display_name, description, implies)
|
||||||
("core.viewer", "Viewer",
|
("core.viewer", "Viewer", "Read-only access to permitted datasets.", []),
|
||||||
"Read-only access to permitted datasets.", []),
|
("core.analyst", "Analyst", "Default user role; query data, run analyses.", ["core.viewer"]),
|
||||||
("core.analyst", "Analyst",
|
(
|
||||||
"Default user role; query data, run analyses.", ["core.viewer"]),
|
"core.km_admin",
|
||||||
("core.km_admin", "Knowledge-management admin",
|
"Knowledge-management admin",
|
||||||
"Manages metric definitions and column metadata.", ["core.analyst"]),
|
"Manages metric definitions and column metadata.",
|
||||||
("core.admin", "Administrator",
|
["core.analyst"],
|
||||||
"Full system access; bypasses dataset_permissions.", ["core.km_admin"]),
|
),
|
||||||
|
("core.admin", "Administrator", "Full system access; bypasses dataset_permissions.", ["core.km_admin"]),
|
||||||
]
|
]
|
||||||
|
|
||||||
# Maps the legacy users.role string values onto core.* keys for the v8→v9
|
# Maps the legacy users.role string values onto core.* keys for the v8→v9
|
||||||
|
|
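Note on `_CORE_ROLES_SEED`: the comment says `expand_implies` walks the `implies` references breadth-first. That function is not shown in this diff, so the following is only a sketch of such a traversal over the four seeded roles; `expand_implies_sketch` is a hypothetical name and its return shape is an assumption.

```python
from collections import deque

# (key -> implies) pairs taken from _CORE_ROLES_SEED in the diff.
IMPLIES = {
    "core.viewer": [],
    "core.analyst": ["core.viewer"],
    "core.km_admin": ["core.analyst"],
    "core.admin": ["core.km_admin"],
}


def expand_implies_sketch(role: str, implies: dict) -> list:
    """Breadth-first walk: the role itself, then everything it transitively implies."""
    seen = {role}
    order = [role]
    queue = deque([role])
    while queue:
        current = queue.popleft()
        for implied in implies.get(current, []):
            if implied not in seen:  # guards against cycles and duplicates
                seen.add(implied)
                order.append(implied)
                queue.append(implied)
    return order


expanded = expand_implies_sketch("core.admin", IMPLIES)
```

With the seed data above, `core.admin` expands through `core.km_admin` and `core.analyst` down to `core.viewer`.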
@@ -1486,10 +1486,8 @@ SYSTEM_EVERYONE_GROUP = "Everyone"
|
||||||
# app.auth.access (admin short-circuit) and the OAuth callback (default
|
# app.auth.access (admin short-circuit) and the OAuth callback (default
|
||||||
# Everyone membership for new users); changing them is a breaking change.
|
# Everyone membership for new users); changing them is a breaking change.
|
||||||
_SYSTEM_GROUPS_SEED = [
|
_SYSTEM_GROUPS_SEED = [
|
||||||
(SYSTEM_ADMIN_GROUP,
|
(SYSTEM_ADMIN_GROUP, "System: full access to all data and admin actions"),
|
||||||
"System: full access to all data and admin actions"),
|
(SYSTEM_EVERYONE_GROUP, "System: default group every user is implicitly a member of"),
|
||||||
(SYSTEM_EVERYONE_GROUP,
|
|
||||||
"System: default group every user is implicitly a member of"),
|
|
||||||
]
|
]
|
||||||
|
|
||||||
|
|
||||||
|
|
@@ -1505,9 +1503,7 @@ def _seed_system_groups(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
import uuid as _uuid
|
import uuid as _uuid
|
||||||
|
|
||||||
for name, description in _SYSTEM_GROUPS_SEED:
|
for name, description in _SYSTEM_GROUPS_SEED:
|
||||||
existing = conn.execute(
|
existing = conn.execute("SELECT id, is_system FROM user_groups WHERE name = ?", [name]).fetchone()
|
||||||
"SELECT id, is_system FROM user_groups WHERE name = ?", [name]
|
|
||||||
).fetchone()
|
|
||||||
if existing is None:
|
if existing is None:
|
||||||
conn.execute(
|
conn.execute(
|
||||||
"""INSERT INTO user_groups (id, name, description, is_system, created_by)
|
"""INSERT INTO user_groups (id, name, description, is_system, created_by)
|
||||||
|
|
@@ -1549,9 +1545,7 @@ def _v12_to_v13_finalize(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
try:
|
try:
|
||||||
_seed_system_groups(conn)
|
_seed_system_groups(conn)
|
||||||
|
|
||||||
admin_group_id = conn.execute(
|
admin_group_id = conn.execute("SELECT id FROM user_groups WHERE name = ?", [SYSTEM_ADMIN_GROUP]).fetchone()[0]
|
||||||
"SELECT id FROM user_groups WHERE name = ?", [SYSTEM_ADMIN_GROUP]
|
|
||||||
).fetchone()[0]
|
|
||||||
everyone_group_id = conn.execute(
|
everyone_group_id = conn.execute(
|
||||||
"SELECT id FROM user_groups WHERE name = ?", [SYSTEM_EVERYONE_GROUP]
|
"SELECT id FROM user_groups WHERE name = ?", [SYSTEM_EVERYONE_GROUP]
|
||||||
).fetchone()[0]
|
).fetchone()[0]
|
||||||
|
|
@@ -1560,16 +1554,14 @@ def _v12_to_v13_finalize(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
# column having been physically dropped already (re-run safety) and of
|
# column having been physically dropped already (re-run safety) and of
|
||||||
# malformed JSON (caught row-by-row, skipped silently).
|
# malformed JSON (caught row-by-row, skipped silently).
|
||||||
has_groups_col = conn.execute(
|
has_groups_col = conn.execute(
|
||||||
"SELECT 1 FROM information_schema.columns "
|
"SELECT 1 FROM information_schema.columns WHERE table_name = 'users' AND column_name = 'groups'"
|
||||||
"WHERE table_name = 'users' AND column_name = 'groups'"
|
|
||||||
).fetchone()
|
).fetchone()
|
||||||
if has_groups_col:
|
if has_groups_col:
|
||||||
rows = conn.execute(
|
rows = conn.execute("SELECT id, groups FROM users WHERE groups IS NOT NULL").fetchall()
|
||||||
"SELECT id, groups FROM users WHERE groups IS NOT NULL"
|
|
||||||
).fetchall()
|
|
||||||
for user_id, groups_json in rows:
|
for user_id, groups_json in rows:
|
||||||
try:
|
try:
|
||||||
import json as _json
|
import json as _json
|
||||||
|
|
||||||
names = _json.loads(groups_json) if isinstance(groups_json, str) else (groups_json or [])
|
names = _json.loads(groups_json) if isinstance(groups_json, str) else (groups_json or [])
|
||||||
except (ValueError, TypeError):
|
except (ValueError, TypeError):
|
||||||
names = []
|
names = []
|
||||||
|
|
@@ -1579,7 +1571,8 @@ def _v12_to_v13_finalize(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
if not isinstance(name, str) or not name.strip():
|
if not isinstance(name, str) or not name.strip():
|
||||||
continue
|
continue
|
||||||
group_row = conn.execute(
|
group_row = conn.execute(
|
||||||
"SELECT id FROM user_groups WHERE name = ?", [name],
|
"SELECT id FROM user_groups WHERE name = ?",
|
||||||
|
[name],
|
||||||
).fetchone()
|
).fetchone()
|
||||||
if not group_row:
|
if not group_row:
|
||||||
continue
|
continue
|
||||||
|
|
@@ -1592,9 +1585,9 @@ def _v12_to_v13_finalize(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
)
|
)
|
||||||
except duckdb.ConstraintException:
|
except duckdb.ConstraintException:
|
||||||
logger.debug(
|
logger.debug(
|
||||||
"v13 backfill step 2 (google_sync): skipped "
|
"v13 backfill step 2 (google_sync): skipped insert for user=%s group=%s — already present",
|
||||||
"insert for user=%s group=%s — already present",
|
user_id,
|
||||||
user_id, name,
|
name,
|
||||||
)
|
)
|
||||||
|
|
||||||
# 3. core.admin grants → Admin membership. Tolerant of either table being
|
# 3. core.admin grants → Admin membership. Tolerant of either table being
|
||||||
|
|
@@ -1670,7 +1663,8 @@ def _v12_to_v13_finalize(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
logger.debug(
|
logger.debug(
|
||||||
"v13 backfill step 5 (resource_grants): skipped "
|
"v13 backfill step 5 (resource_grants): skipped "
|
||||||
"insert for group=%s resource=%s — already migrated",
|
"insert for group=%s resource=%s — already migrated",
|
||||||
group_id, resource_id,
|
group_id,
|
||||||
|
resource_id,
|
||||||
)
|
)
|
||||||
|
|
||||||
# Audit: log any non-core capability grants before dropping the
|
# Audit: log any non-core capability grants before dropping the
|
||||||
|
|
@@ -1695,7 +1689,8 @@ def _v12_to_v13_finalize(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
"If this role was registered via register_internal_role(), "
|
"If this role was registered via register_internal_role(), "
|
||||||
"the affected users need to be re-added to an "
|
"the affected users need to be re-added to an "
|
||||||
"appropriate user_group post-upgrade.",
|
"appropriate user_group post-upgrade.",
|
||||||
cnt, role_key,
|
cnt,
|
||||||
|
role_key,
|
||||||
)
|
)
|
||||||
|
|
||||||
# 6. Drop legacy tables in FK-correct order: dependent tables first.
|
# 6. Drop legacy tables in FK-correct order: dependent tables first.
|
||||||
|
|
@@ -1754,8 +1749,7 @@ def _v13_to_v14_finalize(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
).fetchone()[0]
|
).fetchone()[0]
|
||||||
if orphan_members:
|
if orphan_members:
|
||||||
logger.warning(
|
logger.warning(
|
||||||
"v14 migration: dropping %d orphan user_group_members rows "
|
"v14 migration: dropping %d orphan user_group_members rows (group_id pointed at a deleted user_groups.id)",
|
||||||
"(group_id pointed at a deleted user_groups.id)",
|
|
||||||
orphan_members,
|
orphan_members,
|
||||||
)
|
)
|
||||||
if orphan_grants:
|
if orphan_grants:
|
||||||
|
|
@@ -1778,9 +1772,7 @@ def _v13_to_v14_finalize(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
)
|
)
|
||||||
|
|
||||||
# user_group_members rebuild
|
# user_group_members rebuild
|
||||||
conn.execute(
|
conn.execute("ALTER TABLE user_group_members RENAME TO user_group_members_v13_pre")
|
||||||
"ALTER TABLE user_group_members RENAME TO user_group_members_v13_pre"
|
|
||||||
)
|
|
||||||
conn.execute(
|
conn.execute(
|
||||||
"""CREATE TABLE user_group_members (
|
"""CREATE TABLE user_group_members (
|
||||||
user_id VARCHAR NOT NULL,
|
user_id VARCHAR NOT NULL,
|
||||||
|
|
@@ -1800,9 +1792,7 @@ def _v13_to_v14_finalize(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
conn.execute("DROP TABLE user_group_members_v13_pre")
|
conn.execute("DROP TABLE user_group_members_v13_pre")
|
||||||
|
|
||||||
# resource_grants rebuild
|
# resource_grants rebuild
|
||||||
conn.execute(
|
conn.execute("ALTER TABLE resource_grants RENAME TO resource_grants_v13_pre")
|
||||||
"ALTER TABLE resource_grants RENAME TO resource_grants_v13_pre"
|
|
||||||
)
|
|
||||||
conn.execute(
|
conn.execute(
|
||||||
"""CREATE TABLE resource_grants (
|
"""CREATE TABLE resource_grants (
|
||||||
id VARCHAR PRIMARY KEY,
|
id VARCHAR PRIMARY KEY,
|
||||||
|
|
@@ -1913,11 +1903,13 @@ def _v18_to_v19_finalize(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
`primary_key`) still migrate cleanly. Wrapped in BEGIN/COMMIT;
|
`primary_key`) still migrate cleanly. Wrapped in BEGIN/COMMIT;
|
||||||
on error ROLLBACK and the outer caller skips the schema_version bump.
|
on error ROLLBACK and the outer caller skips the schema_version bump.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
def _existing_cols(table: str) -> set[str]:
|
def _existing_cols(table: str) -> set[str]:
|
||||||
return {
|
return {
|
||||||
r[0] for r in conn.execute(
|
r[0]
|
||||||
"SELECT column_name FROM information_schema.columns "
|
for r in conn.execute(
|
||||||
"WHERE table_name = ?", [table],
|
"SELECT column_name FROM information_schema.columns WHERE table_name = ?",
|
||||||
|
[table],
|
||||||
).fetchall()
|
).fetchall()
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
@@ -1951,26 +1943,29 @@ def _v18_to_v19_finalize(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
)"""
|
)"""
|
||||||
)
|
)
|
||||||
users_target_cols = [
|
users_target_cols = [
|
||||||
"id", "email", "name", "password_hash",
|
"id",
|
||||||
"setup_token", "setup_token_created",
|
"email",
|
||||||
"reset_token", "reset_token_created",
|
"name",
|
||||||
"active", "deactivated_at", "deactivated_by",
|
"password_hash",
|
||||||
"created_at", "updated_at",
|
"setup_token",
|
||||||
|
"setup_token_created",
|
||||||
|
"reset_token",
|
||||||
|
"reset_token_created",
|
||||||
|
"active",
|
||||||
|
"deactivated_at",
|
||||||
|
"deactivated_by",
|
||||||
|
"created_at",
|
||||||
|
"updated_at",
|
||||||
]
|
]
|
||||||
old_users_cols = _existing_cols("users_v18_pre")
|
old_users_cols = _existing_cols("users_v18_pre")
|
||||||
common = [c for c in users_target_cols if c in old_users_cols]
|
common = [c for c in users_target_cols if c in old_users_cols]
|
||||||
col_list = ", ".join(common)
|
col_list = ", ".join(common)
|
||||||
conn.execute(
|
conn.execute(f"INSERT INTO users ({col_list}) SELECT {col_list} FROM users_v18_pre")
|
||||||
f"INSERT INTO users ({col_list}) "
|
|
||||||
f"SELECT {col_list} FROM users_v18_pre"
|
|
||||||
)
|
|
||||||
conn.execute("DROP TABLE users_v18_pre")
|
conn.execute("DROP TABLE users_v18_pre")
|
||||||
|
|
||||||
# 4: rebuild table_registry without `is_public` column.
|
# 4: rebuild table_registry without `is_public` column.
|
||||||
if "is_public" in _existing_cols("table_registry"):
|
if "is_public" in _existing_cols("table_registry"):
|
||||||
conn.execute(
|
conn.execute("ALTER TABLE table_registry RENAME TO table_registry_v18_pre")
|
||||||
"ALTER TABLE table_registry RENAME TO table_registry_v18_pre"
|
|
||||||
)
|
|
||||||
conn.execute(
|
conn.execute(
|
||||||
"""CREATE TABLE table_registry (
|
"""CREATE TABLE table_registry (
|
||||||
id VARCHAR PRIMARY KEY,
|
id VARCHAR PRIMARY KEY,
|
||||||
|
|
@@ -1990,18 +1985,25 @@ def _v18_to_v19_finalize(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
)"""
|
)"""
|
||||||
)
|
)
|
||||||
registry_target_cols = [
|
registry_target_cols = [
|
||||||
"id", "name", "source_type", "bucket", "source_table",
|
"id",
|
||||||
"sync_strategy", "query_mode", "sync_schedule",
|
"name",
|
||||||
"profile_after_sync", "primary_key", "folder",
|
"source_type",
|
||||||
"description", "registered_by", "registered_at",
|
"bucket",
|
||||||
|
"source_table",
|
||||||
|
"sync_strategy",
|
||||||
|
"query_mode",
|
||||||
|
"sync_schedule",
|
||||||
|
"profile_after_sync",
|
||||||
|
"primary_key",
|
||||||
|
"folder",
|
||||||
|
"description",
|
||||||
|
"registered_by",
|
||||||
|
"registered_at",
|
||||||
]
|
]
|
||||||
old_registry_cols = _existing_cols("table_registry_v18_pre")
|
old_registry_cols = _existing_cols("table_registry_v18_pre")
|
||||||
common = [c for c in registry_target_cols if c in old_registry_cols]
|
common = [c for c in registry_target_cols if c in old_registry_cols]
|
||||||
col_list = ", ".join(common)
|
col_list = ", ".join(common)
|
||||||
conn.execute(
|
conn.execute(f"INSERT INTO table_registry ({col_list}) SELECT {col_list} FROM table_registry_v18_pre")
|
||||||
f"INSERT INTO table_registry ({col_list}) "
|
|
||||||
f"SELECT {col_list} FROM table_registry_v18_pre"
|
|
||||||
)
|
|
||||||
conn.execute("DROP TABLE table_registry_v18_pre")
|
conn.execute("DROP TABLE table_registry_v18_pre")
|
||||||
|
|
||||||
conn.execute("COMMIT")
|
conn.execute("COMMIT")
|
||||||
|
|
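Reviewer note: both rebuilds in the hunks above copy only the columns that the new table and the renamed `*_v18_pre` table share, so a legacy table missing a newer column does not break the `INSERT ... SELECT`. A minimal sketch of that pattern is below. `build_copy_sql` is a hypothetical helper name used for illustration (the diff inlines this logic), and the column lists are shortened.

```python
from __future__ import annotations


def build_copy_sql(
    target_table: str,
    source_table: str,
    target_cols: list[str],
    source_cols: set[str],
) -> str:
    """Build an INSERT ... SELECT limited to columns present on both sides.

    Hypothetical helper: table and column names are assumed to be trusted
    identifiers from migration code, never user input.
    """
    # Preserve the target's column order; drop anything the old table lacks.
    common = [c for c in target_cols if c in source_cols]
    col_list = ", ".join(common)
    return f"INSERT INTO {target_table} ({col_list}) SELECT {col_list} FROM {source_table}"


# The old table has a legacy `role` column and lacks `name`.
sql = build_copy_sql(
    "users",
    "users_v18_pre",
    ["id", "email", "name"],
    {"id", "email", "role"},
)
```

Columns that exist only in the old table (here `role`) are silently left behind, which is the point of the rebuild.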
@@ -2025,9 +2027,7 @@ def _seed_core_roles(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
import uuid as _uuid
|
import uuid as _uuid
|
||||||
|
|
||||||
for key, display_name, description, implies in _CORE_ROLES_SEED:
|
for key, display_name, description, implies in _CORE_ROLES_SEED:
|
||||||
existing = conn.execute(
|
existing = conn.execute("SELECT id FROM internal_roles WHERE key = ?", [key]).fetchone()
|
||||||
"SELECT id FROM internal_roles WHERE key = ?", [key]
|
|
||||||
).fetchone()
|
|
||||||
implies_json = _json.dumps(implies)
|
implies_json = _json.dumps(implies)
|
||||||
if existing:
|
if existing:
|
||||||
conn.execute(
|
conn.execute(
|
||||||
|
|
@@ -2059,25 +2059,21 @@ def _backfill_users_role_to_grants(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
# Verify users.role column still exists (we may be re-running after a
|
# Verify users.role column still exists (we may be re-running after a
|
||||||
# half-applied migration); skip silently if it's already gone.
|
# half-applied migration); skip silently if it's already gone.
|
||||||
has_role_col = conn.execute(
|
has_role_col = conn.execute(
|
||||||
"SELECT 1 FROM information_schema.columns "
|
"SELECT 1 FROM information_schema.columns WHERE table_name = 'users' AND column_name = 'role'"
|
||||||
"WHERE table_name = 'users' AND column_name = 'role'"
|
|
||||||
).fetchone()
|
).fetchone()
|
||||||
if not has_role_col:
|
if not has_role_col:
|
||||||
return
|
return
|
||||||
|
|
||||||
rows = conn.execute(
|
rows = conn.execute("SELECT id, role FROM users WHERE role IS NOT NULL").fetchall()
|
||||||
"SELECT id, role FROM users WHERE role IS NOT NULL"
|
|
||||||
).fetchall()
|
|
||||||
backfilled = 0
|
backfilled = 0
|
||||||
for user_id, role_str in rows:
|
for user_id, role_str in rows:
|
||||||
role_key = _LEGACY_ROLE_TO_CORE_KEY.get(role_str, "core.viewer")
|
role_key = _LEGACY_ROLE_TO_CORE_KEY.get(role_str, "core.viewer")
|
||||||
role_row = conn.execute(
|
role_row = conn.execute("SELECT id FROM internal_roles WHERE key = ?", [role_key]).fetchone()
|
||||||
"SELECT id FROM internal_roles WHERE key = ?", [role_key]
|
|
||||||
).fetchone()
|
|
||||||
if not role_row:
|
if not role_row:
|
||||||
logger.warning(
|
logger.warning(
|
||||||
"v9 backfill: core role %s missing — skipping user %s",
|
"v9 backfill: core role %s missing — skipping user %s",
|
||||||
role_key, user_id,
|
role_key,
|
||||||
|
user_id,
|
||||||
)
|
)
|
||||||
continue
|
continue
|
||||||
try:
|
try:
|
||||||
|
|
@@ -2363,8 +2359,7 @@ def _v28_to_v29_finalize(conn) -> None:
|
||||||
are gone after the first run and the seed INSERTs use ON CONFLICT.
|
are gone after the first run and the seed INSERTs use ON CONFLICT.
|
||||||
"""
|
"""
|
||||||
has_welcome = conn.execute(
|
has_welcome = conn.execute(
|
||||||
"SELECT 1 FROM information_schema.tables "
|
"SELECT 1 FROM information_schema.tables WHERE table_schema = 'main' AND table_name = 'welcome_template'"
|
||||||
"WHERE table_schema = 'main' AND table_name = 'welcome_template'"
|
|
||||||
).fetchone()
|
).fetchone()
|
||||||
if has_welcome:
|
if has_welcome:
|
||||||
conn.execute(
|
conn.execute(
|
||||||
|
|
@@ -2375,8 +2370,7 @@ def _v28_to_v29_finalize(conn) -> None:
|
||||||
conn.execute("DROP TABLE welcome_template")
|
conn.execute("DROP TABLE welcome_template")
|
||||||
|
|
||||||
has_claude_md = conn.execute(
|
has_claude_md = conn.execute(
|
||||||
"SELECT 1 FROM information_schema.tables "
|
"SELECT 1 FROM information_schema.tables WHERE table_schema = 'main' AND table_name = 'claude_md_template'"
|
||||||
"WHERE table_schema = 'main' AND table_name = 'claude_md_template'"
|
|
||||||
).fetchone()
|
).fetchone()
|
||||||
if has_claude_md:
|
if has_claude_md:
|
||||||
conn.execute(
|
conn.execute(
|
||||||
|
|
@@ -2391,8 +2385,7 @@ def _v28_to_v29_finalize(conn) -> None:
|
||||||
# operators) or via a prior migration run (idempotent re-execution).
|
# operators) or via a prior migration run (idempotent re-execution).
|
||||||
for key in ("welcome", "claude_md", "home"):
|
for key in ("welcome", "claude_md", "home"):
|
||||||
conn.execute(
|
conn.execute(
|
||||||
"INSERT INTO instance_templates (key, content) VALUES (?, NULL) "
|
"INSERT INTO instance_templates (key, content) VALUES (?, NULL) ON CONFLICT (key) DO NOTHING",
|
||||||
"ON CONFLICT (key) DO NOTHING",
|
|
||||||
[key],
|
[key],
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
@@ -2531,15 +2524,12 @@ def _v34_to_v35_migrate(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
The audit columns (``archived_at``, ``archived_by``) ship first
|
The audit columns (``archived_at``, ``archived_by``) ship first
|
||||||
behind ``IF NOT EXISTS`` so they're safe in all three states.
|
behind ``IF NOT EXISTS`` so they're safe in all three states.
|
||||||
"""
|
"""
|
||||||
conn.execute(
|
conn.execute("ALTER TABLE store_entities ADD COLUMN IF NOT EXISTS archived_at TIMESTAMP")
|
||||||
"ALTER TABLE store_entities ADD COLUMN IF NOT EXISTS archived_at TIMESTAMP"
|
conn.execute("ALTER TABLE store_entities ADD COLUMN IF NOT EXISTS archived_by VARCHAR")
|
||||||
)
|
|
||||||
conn.execute(
|
|
||||||
"ALTER TABLE store_entities ADD COLUMN IF NOT EXISTS archived_by VARCHAR"
|
|
||||||
)
|
|
||||||
|
|
||||||
cols = {
|
cols = {
|
||||||
r[0] for r in conn.execute(
|
r[0]
|
||||||
|
for r in conn.execute(
|
||||||
"SELECT column_name FROM information_schema.columns "
|
"SELECT column_name FROM information_schema.columns "
|
||||||
"WHERE table_name = 'store_entities' "
|
"WHERE table_name = 'store_entities' "
|
||||||
" AND column_name IN ('visibility_status', '_vis_v35')"
|
" AND column_name IN ('visibility_status', '_vis_v35')"
|
||||||
|
|
@@ -2553,18 +2543,10 @@ def _v34_to_v35_migrate(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
# in v36 (DuckDB ALTER COLUMN supports SET NOT NULL / SET DEFAULT
|
# in v36 (DuckDB ALTER COLUMN supports SET NOT NULL / SET DEFAULT
|
||||||
# but not ADD CHECK on an existing column). Value-list enforcement
|
# but not ADD CHECK on an existing column). Value-list enforcement
|
||||||
# is application-side via VALID_VISIBILITY in StoreEntitiesRepository.
|
# is application-side via VALID_VISIBILITY in StoreEntitiesRepository.
|
||||||
conn.execute(
|
conn.execute("ALTER TABLE store_entities ADD COLUMN _vis_v35 VARCHAR")
|
||||||
"ALTER TABLE store_entities ADD COLUMN _vis_v35 VARCHAR"
|
conn.execute("UPDATE store_entities SET _vis_v35 = visibility_status")
|
||||||
)
|
conn.execute("ALTER TABLE store_entities DROP COLUMN visibility_status")
|
||||||
conn.execute(
|
conn.execute("ALTER TABLE store_entities RENAME COLUMN _vis_v35 TO visibility_status")
|
||||||
"UPDATE store_entities SET _vis_v35 = visibility_status"
|
|
||||||
)
|
|
||||||
conn.execute(
|
|
||||||
"ALTER TABLE store_entities DROP COLUMN visibility_status"
|
|
||||||
)
|
|
||||||
conn.execute(
|
|
||||||
"ALTER TABLE store_entities RENAME COLUMN _vis_v35 TO visibility_status"
|
|
||||||
)
|
|
||||||
elif has_temp and not has_vis:
|
elif has_temp and not has_vis:
|
||||||
# Partial-rebuild recovery — prior attempt dropped visibility_status
|
# Partial-rebuild recovery — prior attempt dropped visibility_status
|
||||||
# but the RENAME never landed. Data is already in _vis_v35 from
|
# but the RENAME never landed. Data is already in _vis_v35 from
|
||||||
|
|
@@ -2573,19 +2555,14 @@ def _v34_to_v35_migrate(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
"v34→v35 detected partial-rebuild state (visibility_status "
|
"v34→v35 detected partial-rebuild state (visibility_status "
|
||||||
"missing, _vis_v35 present); recovering via RENAME"
|
"missing, _vis_v35 present); recovering via RENAME"
|
||||||
)
|
)
|
||||||
conn.execute(
|
conn.execute("ALTER TABLE store_entities RENAME COLUMN _vis_v35 TO visibility_status")
|
||||||
"ALTER TABLE store_entities RENAME COLUMN _vis_v35 TO visibility_status"
|
|
||||||
)
|
|
||||||
elif has_vis and has_temp:
|
elif has_vis and has_temp:
|
||||||
# Both present — earlier rebuild aborted before the DROP.
|
# Both present — earlier rebuild aborted before the DROP.
|
||||||
# visibility_status holds the canonical values; drop the temp.
|
# visibility_status holds the canonical values; drop the temp.
|
||||||
logger.warning(
|
logger.warning(
|
||||||
"v34→v35 detected partial-rebuild state (both visibility_status "
|
"v34→v35 detected partial-rebuild state (both visibility_status and _vis_v35 present); dropping the temp"
|
||||||
"and _vis_v35 present); dropping the temp"
|
|
||||||
)
|
|
||||||
conn.execute(
|
|
||||||
"ALTER TABLE store_entities DROP COLUMN _vis_v35"
|
|
||||||
)
|
)
|
||||||
|
conn.execute("ALTER TABLE store_entities DROP COLUMN _vis_v35")
|
||||||
# else: neither column is present, which means store_entities itself
|
# else: neither column is present, which means store_entities itself
|
||||||
# is at a shape ahead of v34. _SYSTEM_SCHEMA above already created
|
# is at a shape ahead of v34. _SYSTEM_SCHEMA above already created
|
||||||
# the post-v35 shape; nothing to do here.
|
# the post-v35 shape; nothing to do here.
|
||||||
|
|
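Reviewer note: the v34→v35 rebuild above is driven by which of `visibility_status` and `_vis_v35` currently exist. The sketch below restates that decision as a pure function that returns the statements to run. `plan_visibility_rebuild` is a hypothetical name, and the condition for the full-rebuild branch (`has_vis and not has_temp`) is an assumption, since that line falls outside the visible hunks.

```python
from __future__ import annotations

RENAME_TEMP = "ALTER TABLE store_entities RENAME COLUMN _vis_v35 TO visibility_status"


def plan_visibility_rebuild(cols: set[str]) -> list[str]:
    """Return the SQL statements for the given column state (hypothetical helper)."""
    has_vis = "visibility_status" in cols
    has_temp = "_vis_v35" in cols

    if has_vis and not has_temp:
        # Assumed condition: clean v34 state, run the full four-step rebuild.
        return [
            "ALTER TABLE store_entities ADD COLUMN _vis_v35 VARCHAR",
            "UPDATE store_entities SET _vis_v35 = visibility_status",
            "ALTER TABLE store_entities DROP COLUMN visibility_status",
            RENAME_TEMP,
        ]
    if has_temp and not has_vis:
        # Prior attempt dropped visibility_status but the RENAME never landed.
        return [RENAME_TEMP]
    if has_vis and has_temp:
        # Prior attempt aborted before the DROP; the original column is canonical.
        return ["ALTER TABLE store_entities DROP COLUMN _vis_v35"]
    # Neither column present: the table is already at a later shape.
    return []


clean = plan_visibility_rebuild({"id", "visibility_status"})
half_done = plan_visibility_rebuild({"id", "_vis_v35"})
both = plan_visibility_rebuild({"id", "visibility_status", "_vis_v35"})
ahead = plan_visibility_rebuild({"id"})
```

Keeping the decision separate from execution makes each of the four states easy to unit-test without a database.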
@@ -2894,10 +2871,7 @@ def _v42_to_v43(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
UNIQUE (user_id, name)
|
UNIQUE (user_id, name)
|
||||||
)
|
)
|
||||||
""")
|
""")
|
||||||
conn.execute(
|
conn.execute("CREATE INDEX IF NOT EXISTS idx_obs_views_user ON user_observability_views(user_id, created_at)")
|
||||||
"CREATE INDEX IF NOT EXISTS idx_obs_views_user "
|
|
||||||
"ON user_observability_views(user_id, created_at)"
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def _v43_to_v44(conn: duckdb.DuckDBPyConnection) -> None:
|
def _v43_to_v44(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
|
|
@@ -2915,19 +2889,35 @@ def _v43_to_v44(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
from 1 → 2 in the same release, which the session-pipeline
|
from 1 → 2 in the same release, which the session-pipeline
|
||||||
reprocess loop uses to invalidate stale summaries.
|
reprocess loop uses to invalidate stale summaries.
|
||||||
"""
|
"""
|
||||||
conn.execute(
|
conn.execute("ALTER TABLE users ADD COLUMN IF NOT EXISTS last_pull_at TIMESTAMP")
|
||||||
"ALTER TABLE users ADD COLUMN IF NOT EXISTS last_pull_at TIMESTAMP"
|
|
||||||
)
|
|
||||||
for col in (
|
for col in (
|
||||||
"input_tokens",
|
"input_tokens",
|
||||||
"output_tokens",
|
"output_tokens",
|
||||||
"cache_read_tokens",
|
"cache_read_tokens",
|
||||||
"cache_creation_tokens",
|
"cache_creation_tokens",
|
||||||
):
|
):
|
||||||
conn.execute(
|
conn.execute(f"ALTER TABLE usage_session_summary ADD COLUMN IF NOT EXISTS {col} BIGINT DEFAULT 0")
|
||||||
f"ALTER TABLE usage_session_summary "
|
|
||||||
f"ADD COLUMN IF NOT EXISTS {col} BIGINT DEFAULT 0"
|
|
||||||
)
|
def _v44_to_v45(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
|
"""v45: add user_id column to usage tables for stable RBAC filtering.
|
||||||
|
|
||||||
|
The ``username`` column in ``usage_session_summary`` / ``usage_events``
|
||||||
|
stores the directory name from the session-data path, which is either
|
||||||
|
an email local-part (session collector) or a UUID (upload API). Email
|
||||||
|
local-parts are unstable: they change when a user's email changes. ``user_id``
|
||||||
|
is the stable identity and becomes the authoritative RBAC filter
|
||||||
|
column for the ``agnes_sessions`` / ``agnes_telemetry`` aliases.
|
||||||
|
|
||||||
|
Backfill: the UsageProcessor populates ``user_id`` on every
|
||||||
|
(re)process run. Existing rows get backfilled when
|
||||||
|
``USAGE_PROCESSOR_VERSION`` bumps, which triggers the session-pipeline
|
||||||
|
reprocess loop.
|
||||||
|
"""
|
||||||
|
conn.execute("ALTER TABLE usage_session_summary ADD COLUMN IF NOT EXISTS user_id VARCHAR")
|
||||||
|
conn.execute("ALTER TABLE usage_events ADD COLUMN IF NOT EXISTS user_id VARCHAR")
|
||||||
|
conn.execute("CREATE INDEX IF NOT EXISTS idx_usage_session_user_id ON usage_session_summary(user_id)")
|
||||||
|
conn.execute("CREATE INDEX IF NOT EXISTS idx_usage_events_user_id ON usage_events(user_id)")
|
||||||
|
|
||||||
|
|
||||||
_V33_TO_V34_MIGRATIONS = [
|
_V33_TO_V34_MIGRATIONS = [
|
||||||
|
|
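Reviewer note: `_v44_to_v45` above is safe to run repeatedly because every statement is guarded (`ADD COLUMN IF NOT EXISTS`, `CREATE INDEX IF NOT EXISTS`), and existing rows keep a NULL `user_id` until the reprocess loop backfills them. The sketch below shows the same idea using the standard-library `sqlite3` module as a stand-in for DuckDB, purely so it runs without extra dependencies. SQLite has no `ADD COLUMN IF NOT EXISTS`, so the guard is an explicit column check; `existing_cols` and `add_user_id_column` are hypothetical names.

```python
import sqlite3


def existing_cols(conn: sqlite3.Connection, table: str) -> set:
    """Column names of `table`; PRAGMA table_info puts the name at index 1."""
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def add_user_id_column(conn: sqlite3.Connection, table: str) -> None:
    """Idempotently add a nullable user_id column and its index.

    `table` is assumed to be a trusted identifier from migration code.
    """
    if "user_id" not in existing_cols(conn, table):
        conn.execute(f"ALTER TABLE {table} ADD COLUMN user_id TEXT")
    conn.execute(f"CREATE INDEX IF NOT EXISTS idx_{table}_user_id ON {table}(user_id)")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usage_events (id TEXT PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO usage_events VALUES ('e1', 'jane.doe')")

add_user_id_column(conn, "usage_events")
add_user_id_column(conn, "usage_events")  # second run changes nothing

# The pre-existing row survives with user_id unset until a backfill runs.
row = conn.execute("SELECT username, user_id FROM usage_events").fetchone()
```

Any reader of the new column therefore has to tolerate NULLs during the transition, which matches the backward-compatible `user_id OR username` queries described in the commit message.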
@@ -3049,8 +3039,10 @@ def _replace_for_v24(project_id: str):
|
||||||
function-form replacement is the defensive idiom — it makes the
|
function-form replacement is the defensive idiom — it makes the
|
||||||
intent explicit and removes the dependency on re.sub's replacement-
|
intent explicit and removes the dependency on re.sub's replacement-
|
||||||
string escaping rules."""
|
string escaping rules."""
|
||||||
|
|
||||||
def _repl(m):
|
def _repl(m):
|
||||||
return f"`{project_id}.{m.group(1)}.{m.group(2)}`"
|
return f"`{project_id}.{m.group(1)}.{m.group(2)}`"
|
||||||
|
|
||||||
return _repl
|
return _repl
|
||||||
|
|
||||||
|
|
||||||
|
|
@@ -3059,6 +3051,7 @@ def _v23_to_v24_finalize(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
|
|
||||||
try:
|
try:
|
||||||
from app.instance_config import get_value
|
from app.instance_config import get_value
|
||||||
|
|
||||||
project_id = get_value("data_source", "bigquery", "project", default="") or ""
|
project_id = get_value("data_source", "bigquery", "project", default="") or ""
|
||||||
except Exception:
|
except Exception:
|
||||||
project_id = ""
|
project_id = ""
|
||||||
|
|
@@ -3092,7 +3085,7 @@ def _v23_to_v24_finalize(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
raise RuntimeError(
|
raise RuntimeError(
|
||||||
f"v24 migration cannot complete: {len(rows)} materialized "
|
f"v24 migration cannot complete: {len(rows)} materialized "
|
||||||
f"BigQuery row(s) need their source_query rewritten from "
|
f"BigQuery row(s) need their source_query rewritten from "
|
||||||
f"DuckDB-flavor `bq.\"ds\".\"tbl\"` to BQ-native "
|
f'DuckDB-flavor `bq."ds"."tbl"` to BQ-native '
|
||||||
f"`<project>.ds.tbl`, but `data_source.bigquery.project` is "
|
f"`<project>.ds.tbl`, but `data_source.bigquery.project` is "
|
||||||
f"not configured. Set it via /admin/server-config (or "
|
f"not configured. Set it via /admin/server-config (or "
|
||||||
f"`instance.yaml: data_source.bigquery.project`) and restart "
|
f"`instance.yaml: data_source.bigquery.project`) and restart "
|
||||||
|
|
@@ -3113,7 +3106,8 @@ def _v23_to_v24_finalize(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
[new_sq, row_id],
|
[new_sq, row_id],
|
||||||
)
|
)
|
||||||
logger.info(
|
logger.info(
|
||||||
"v24 migration: rewrote source_query for row %r", row_id,
|
"v24 migration: rewrote source_query for row %r",
|
||||||
|
row_id,
|
||||||
)
|
)
|
||||||
conn.execute("COMMIT")
|
conn.execute("COMMIT")
|
||||||
except Exception:
|
except Exception:
|
||||||
|
|
@@ -3179,16 +3173,12 @@ def _ensure_schema(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
[SCHEMA_VERSION],
|
[SCHEMA_VERSION],
|
||||||
)
|
)
|
||||||
# v22 setup_banner row (kept as compat per CLAUDE.md schema notes).
|
# v22 setup_banner row (kept as compat per CLAUDE.md schema notes).
|
||||||
conn.execute(
|
conn.execute("INSERT INTO setup_banner (id, content) VALUES (1, NULL) ON CONFLICT (id) DO NOTHING")
|
||||||
"INSERT INTO setup_banner (id, content) VALUES (1, NULL) "
|
|
||||||
"ON CONFLICT (id) DO NOTHING"
|
|
||||||
)
|
|
||||||
# v26 instance_templates seed — three canonical keys with NULL
|
# v26 instance_templates seed — three canonical keys with NULL
|
||||||
# content (operator override absent → render OSS default).
|
# content (operator override absent → render OSS default).
|
||||||
for key in ("welcome", "claude_md", "home"):
|
for key in ("welcome", "claude_md", "home"):
|
||||||
conn.execute(
|
conn.execute(
|
||||||
"INSERT INTO instance_templates (key, content) VALUES (?, NULL) "
|
"INSERT INTO instance_templates (key, content) VALUES (?, NULL) ON CONFLICT (key) DO NOTHING",
|
||||||
"ON CONFLICT (key) DO NOTHING",
|
|
||||||
[key],
|
[key],
|
||||||
)
|
)
|
||||||
# v41 audit_log indices: _SYSTEM_SCHEMA omits CREATE INDEX to
|
# v41 audit_log indices: _SYSTEM_SCHEMA omits CREATE INDEX to
|
||||||
|
|
@@ -3207,6 +3197,10 @@ def _ensure_schema(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
# them on fresh installs (no-op ALTERs); kept here for the
|
# them on fresh installs (no-op ALTERs); kept here for the
|
||||||
# ladder's chronological readability.
|
# ladder's chronological readability.
|
||||||
_v43_to_v44(conn)
|
_v43_to_v44(conn)
|
||||||
|
# v45 user_id column on usage tables. _SYSTEM_SCHEMA declares
|
||||||
|
# the columns for fresh installs; migration adds them for
|
||||||
|
# existing DBs. No-op on fresh.
|
||||||
|
_v44_to_v45(conn)
|
||||||
# Fresh-install seed is handled by the unconditional
|
# Fresh-install seed is handled by the unconditional
|
||||||
# _seed_core_roles call at the bottom of _ensure_schema —
|
# _seed_core_roles call at the bottom of _ensure_schema —
|
||||||
# left as a no-op branch here so the migration ladder still
|
# left as a no-op branch here so the migration ladder still
|
||||||
|
|
@@ -3248,8 +3242,7 @@ def _ensure_schema(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
# Skip UPDATE if the column never existed (e.g. test fixtures
|
# Skip UPDATE if the column never existed (e.g. test fixtures
|
||||||
# starting from v2/v3 with a hand-crafted minimal users table).
|
# starting from v2/v3 with a hand-crafted minimal users table).
|
||||||
has_role_col = conn.execute(
|
has_role_col = conn.execute(
|
||||||
"SELECT 1 FROM information_schema.columns "
|
"SELECT 1 FROM information_schema.columns WHERE table_name = 'users' AND column_name = 'role'"
|
||||||
"WHERE table_name = 'users' AND column_name = 'role'"
|
|
||||||
).fetchone()
|
).fetchone()
|
||||||
if has_role_col:
|
if has_role_col:
|
||||||
conn.execute("UPDATE users SET role = NULL")
|
conn.execute("UPDATE users SET role = NULL")
|
||||||
|
|
@@ -3350,6 +3343,8 @@ def _ensure_schema(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
_v42_to_v43(conn)
|
_v42_to_v43(conn)
|
||||||
if current < 44:
|
if current < 44:
|
||||||
_v43_to_v44(conn)
|
_v43_to_v44(conn)
|
||||||
|
if current < 45:
|
||||||
|
_v44_to_v45(conn)
|
||||||
conn.execute(
|
conn.execute(
|
||||||
"UPDATE schema_version SET version = ?, applied_at = current_timestamp",
|
"UPDATE schema_version SET version = ?, applied_at = current_timestamp",
|
||||||
[SCHEMA_VERSION],
|
[SCHEMA_VERSION],
|
||||||
|
|
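Reviewer note: the `if current < N:` chain above forms a ladder in which each step runs only when the stored version is below it, and the version row is then set to `SCHEMA_VERSION` in one update. A minimal sketch of that control flow, with the step functions replaced by hypothetical stand-ins that just record their names:

```python
SCHEMA_VERSION = 45  # matches the version asserted in the updated tests


def migrate(current, steps):
    """Run every step whose version is above `current`, in ascending order.

    `steps` is a list of (version, callable) pairs; this list-driven form is
    an illustration, the diff spells the ladder out as explicit `if` blocks.
    Returns the versions that ran and the version to record afterwards.
    """
    applied = []
    for version, step in steps:
        if current < version:
            step()
            applied.append(version)
    return applied, SCHEMA_VERSION


log = []
steps = [
    (44, lambda: log.append("v43_to_v44")),
    (45, lambda: log.append("v44_to_v45")),
]

from_43, final_version = migrate(43, steps)  # runs both steps
from_44, _ = migrate(44, steps)              # runs only the v45 step
from_45, _ = migrate(45, steps)              # already current, runs nothing
```

Because each step is itself idempotent, re-running the ladder after a partial failure is safe.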
@@ -3383,7 +3378,8 @@ def _ensure_schema(conn: duckdb.DuckDBPyConnection) -> None:
|
||||||
"this process before any container restart is the "
|
"this process before any container restart is the "
|
||||||
"safe path; otherwise, the next start may need to "
|
"safe path; otherwise, the next start may need to "
|
||||||
"restore from %s.",
|
"restore from %s.",
|
||||||
e, _get_state_dir() / "system.duckdb.pre-migrate",
|
e,
|
||||||
|
_get_state_dir() / "system.duckdb.pre-migrate",
|
||||||
)
|
)
|
||||||
|
|
||||||
# Always run the system-groups seed when the DB is on a version this binary
|
# Always run the system-groups seed when the DB is on a version this binary
|
||||||
|
|
|
||||||
|
|
@@ -16,15 +16,31 @@ class UsageRepository:
|
||||||
if not rows:
|
if not rows:
|
||||||
return 0
|
return 0
|
||||||
cols = [
|
cols = [
|
||||||
"id", "session_id", "session_file", "username", "event_uuid", "parent_uuid",
|
"id",
|
||||||
"event_type", "tool_name", "skill_name", "subagent_type", "command_name",
|
"session_id",
|
||||||
"is_error", "source", "ref_id", "model", "cwd", "occurred_at", "processor_version",
|
"session_file",
|
||||||
|
"username",
|
||||||
|
"event_uuid",
|
||||||
|
"parent_uuid",
|
||||||
|
"event_type",
|
||||||
|
"tool_name",
|
||||||
|
"skill_name",
|
||||||
|
"subagent_type",
|
||||||
|
"command_name",
|
||||||
|
"is_error",
|
||||||
|
"source",
|
||||||
|
"ref_id",
|
||||||
|
"model",
|
||||||
|
"cwd",
|
||||||
|
"occurred_at",
|
||||||
|
"processor_version",
|
||||||
|
"user_id",
|
||||||
]
|
]
|
||||||
placeholders = ",".join("?" for _ in cols)
|
placeholders = ",".join("?" for _ in cols)
|
||||||
sql = f"INSERT OR IGNORE INTO usage_events ({','.join(cols)}) VALUES ({placeholders})"
|
sql = f"INSERT OR IGNORE INTO usage_events ({','.join(cols)}) VALUES ({placeholders})"
|
||||||
self.conn.executemany(sql, [
|
self.conn.executemany(
|
||||||
[r.get(c) if c != "processor_version" else processor_version for c in cols] for r in rows
|
sql, [[r.get(c) if c != "processor_version" else processor_version for c in cols] for r in rows]
|
||||||
])
|
)
|
||||||
return len(rows)
|
return len(rows)
|
||||||
|
|
||||||
def upsert_summary(self, summary: dict, *, processor_version: int) -> None:
|
def upsert_summary(self, summary: dict, *, processor_version: int) -> None:
|
||||||
|
|
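Reviewer note: the insert above builds one parameter list per row by walking `cols`, taking `processor_version` from the keyword argument and everything else from the row dict via `.get()`. Adding `"user_id"` to `cols` therefore means rows that lack the key are stored with NULL rather than failing. A small sketch of that mapping, with a shortened column list and made-up row values:

```python
# Shortened, illustrative column list; the real one in the diff is longer.
COLS = ["id", "session_file", "username", "processor_version", "user_id"]


def build_params(rows, processor_version):
    """One positional parameter list per row, ordered like COLS."""
    return [
        [processor_version if c == "processor_version" else r.get(c) for c in COLS]
        for r in rows
    ]


rows = [
    # No user_id key, and a stale processor_version that must be overridden.
    {"id": "e1", "session_file": "a.jsonl", "username": "jane.doe", "processor_version": 1},
    # user_id already resolved (placeholder value, not a real UUID).
    {"id": "e2", "session_file": "b.jsonl", "username": "jane.doe", "user_id": "hypothetical-uuid"},
]

placeholders = ",".join("?" for _ in COLS)
params = build_params(rows, processor_version=3)
```

The number of placeholders always tracks `len(COLS)`, which is why the hand-written `VALUES (?, ...)` list in `upsert_summary` needs its own extra `?` in the next hunk.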
@@ -37,8 +53,8 @@ class UsageRepository:
|
||||||
tool_calls, tool_errors, skill_invocations, subagent_dispatches,
|
tool_calls, tool_errors, skill_invocations, subagent_dispatches,
|
||||||
mcp_calls, slash_commands, distinct_tools, distinct_skills,
|
mcp_calls, slash_commands, distinct_tools, distinct_skills,
|
||||||
primary_model, input_tokens, output_tokens, cache_read_tokens,
|
primary_model, input_tokens, output_tokens, cache_read_tokens,
|
||||||
cache_creation_tokens, processor_version)
|
cache_creation_tokens, processor_version, user_id)
|
||||||
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
|
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
|
||||||
""",
|
""",
|
||||||
[
|
[
|
||||||
summary["session_file"],
|
summary["session_file"],
|
||||||
|
|
@@ -64,18 +80,15 @@ class UsageRepository:
|
||||||
summary.get("cache_read_tokens", 0),
|
summary.get("cache_read_tokens", 0),
|
||||||
summary.get("cache_creation_tokens", 0),
|
summary.get("cache_creation_tokens", 0),
|
||||||
processor_version,
|
processor_version,
|
||||||
|
summary.get("user_id"),
|
||||||
],
|
],
|
||||||
)
|
)
|
||||||
|
|
||||||
def purge_for_session(self, session_file: str) -> int:
|
def purge_for_session(self, session_file: str) -> int:
|
||||||
"""DELETE events + summary for one session — used on reprocess."""
|
"""DELETE events + summary for one session — used on reprocess."""
|
||||||
r = self.conn.execute(
|
r = self.conn.execute("DELETE FROM usage_events WHERE session_file = ?", [session_file])
|
||||||
"DELETE FROM usage_events WHERE session_file = ?", [session_file]
|
|
||||||
)
|
|
||||||
events_deleted = r.rowcount if r.rowcount else 0
|
events_deleted = r.rowcount if r.rowcount else 0
|
||||||
self.conn.execute(
|
self.conn.execute("DELETE FROM usage_session_summary WHERE session_file = ?", [session_file])
|
||||||
"DELETE FROM usage_session_summary WHERE session_file = ?", [session_file]
|
|
||||||
)
|
|
||||||
return events_deleted
|
return events_deleted
|
||||||
|
|
||||||
def delete_older_than(self, days: int) -> int:
|
def delete_older_than(self, days: int) -> int:
|
||||||
|
|
|
||||||
|
|
@@ -8,6 +8,7 @@ The v19 step (#150) drops dataset_permissions, access_requests tables and
|
||||||
users.role, table_registry.is_public columns; v20 then ALTERs the post-v19
|
users.role, table_registry.is_public columns; v20 then ALTERs the post-v19
|
||||||
table_registry to add the source_query column.
|
table_registry to add the source_query column.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import duckdb
|
import duckdb
|
||||||
|
|
||||||
from src.db import SCHEMA_VERSION, _ensure_schema, get_schema_version
|
from src.db import SCHEMA_VERSION, _ensure_schema, get_schema_version
|
||||||
|
|
@@ -98,7 +99,7 @@ def test_schema_version_is_44():
|
||||||
# output_tokens, cache_read_tokens, cache_creation_tokens).
|
# output_tokens, cache_read_tokens, cache_creation_tokens).
|
||||||
# USAGE_PROCESSOR_VERSION simultaneously bumps 1→2 so the
|
# USAGE_PROCESSOR_VERSION simultaneously bumps 1→2 so the
|
||||||
# reprocess loop backfills tokens on next tick.
|
# reprocess loop backfills tokens on next tick.
|
||||||
assert SCHEMA_VERSION == 44
|
assert SCHEMA_VERSION == 45
|
||||||
|
|
||||||
|
|
||||||
def test_v37_marketplace_curator_columns(tmp_path):
|
def test_v37_marketplace_curator_columns(tmp_path):
|
||||||
|
|
@@ -109,9 +110,9 @@ def test_v37_marketplace_curator_columns(tmp_path):
|
||||||
_ensure_schema(conn)
|
_ensure_schema(conn)
|
||||||
|
|
||||||
registry_cols = {
|
registry_cols = {
|
||||||
r[0] for r in conn.execute(
|
r[0]
|
||||||
"SELECT column_name FROM information_schema.columns "
|
for r in conn.execute(
|
||||||
"WHERE table_name = 'marketplace_registry'"
|
"SELECT column_name FROM information_schema.columns WHERE table_name = 'marketplace_registry'"
|
||||||
).fetchall()
|
).fetchall()
|
||||||
}
|
}
|
||||||
assert {"curator_name", "curator_email"} <= registry_cols, (
|
assert {"curator_name", "curator_email"} <= registry_cols, (
|
||||||
|
|
@@ -119,9 +120,9 @@ def test_v37_marketplace_curator_columns(tmp_path):
|
||||||
)
|
)
|
||||||
|
|
||||||
plugin_cols = {
|
plugin_cols = {
|
||||||
r[0] for r in conn.execute(
|
r[0]
|
||||||
"SELECT column_name FROM information_schema.columns "
|
for r in conn.execute(
|
||||||
"WHERE table_name = 'marketplace_plugins'"
|
"SELECT column_name FROM information_schema.columns WHERE table_name = 'marketplace_plugins'"
|
||||||
).fetchall()
|
).fetchall()
|
||||||
}
|
}
|
||||||
assert {"cover_photo_url", "video_url", "doc_links"} <= plugin_cols, (
|
assert {"cover_photo_url", "video_url", "doc_links"} <= plugin_cols, (
|
||||||
|
|
@@ -139,10 +140,7 @@ def test_v36_db_migrates_to_current(tmp_path):
|
||||||
|
|
||||||
# Stand up a minimal v36-shape registry + plugin row, plus the
|
# Stand up a minimal v36-shape registry + plugin row, plus the
|
||||||
# schema_version row that pins us to 36.
|
# schema_version row that pins us to 36.
|
||||||
conn.execute(
|
conn.execute("CREATE TABLE schema_version (version INTEGER, applied_at TIMESTAMP DEFAULT current_timestamp)")
|
||||||
"CREATE TABLE schema_version (version INTEGER, "
|
|
||||||
"applied_at TIMESTAMP DEFAULT current_timestamp)"
|
|
||||||
)
|
|
||||||
conn.execute("INSERT INTO schema_version (version) VALUES (36)")
|
conn.execute("INSERT INTO schema_version (version) VALUES (36)")
|
||||||
conn.execute("""CREATE TABLE marketplace_registry (
|
conn.execute("""CREATE TABLE marketplace_registry (
|
||||||
id VARCHAR PRIMARY KEY, name VARCHAR NOT NULL,
|
id VARCHAR PRIMARY KEY, name VARCHAR NOT NULL,
|
||||||
|
|
@@ -161,22 +159,15 @@ def test_v36_db_migrates_to_current(tmp_path):
|
||||||
PRIMARY KEY (marketplace_id, name)
|
PRIMARY KEY (marketplace_id, name)
|
||||||
)""")
|
)""")
|
||||||
conn.execute(
|
conn.execute(
|
||||||
"INSERT INTO marketplace_registry (id, name, url) "
|
"INSERT INTO marketplace_registry (id, name, url) VALUES ('legacy', 'Legacy', 'https://example.com/repo.git')"
|
||||||
"VALUES ('legacy', 'Legacy', 'https://example.com/repo.git')"
|
|
||||||
)
|
|
||||||
conn.execute(
|
|
||||||
"INSERT INTO marketplace_plugins (marketplace_id, name) "
|
|
||||||
"VALUES ('legacy', 'foo')"
|
|
||||||
)
|
)
|
||||||
|
conn.execute("INSERT INTO marketplace_plugins (marketplace_id, name) VALUES ('legacy', 'foo')")
|
||||||
|
|
||||||
_ensure_schema(conn)
|
_ensure_schema(conn)
|
||||||
assert get_schema_version(conn) == SCHEMA_VERSION
|
assert get_schema_version(conn) == SCHEMA_VERSION
|
||||||
|
|
||||||
# v37 enrichment columns exist; existing rows preserved with NULL.
|
# v37 enrichment columns exist; existing rows preserved with NULL.
|
||||||
row = conn.execute(
|
row = conn.execute("SELECT curator_name, curator_email FROM marketplace_registry WHERE id = 'legacy'").fetchone()
|
||||||
"SELECT curator_name, curator_email FROM marketplace_registry "
|
|
||||||
"WHERE id = 'legacy'"
|
|
||||||
).fetchone()
|
|
||||||
assert row == (None, None)
|
assert row == (None, None)
|
||||||
|
|
||||||
row = conn.execute(
|
row = conn.execute(
|
||||||
|
|
@@ -196,27 +187,18 @@ def test_v39_adds_marketplace_plugins_is_system(tmp_path):
|
||||||
_ensure_schema(conn)
|
_ensure_schema(conn)
|
||||||
|
|
||||||
cols = {
|
cols = {
|
||||||
r[0] for r in conn.execute(
|
r[0]
|
||||||
"SELECT column_name FROM information_schema.columns "
|
for r in conn.execute(
|
||||||
"WHERE table_name = 'marketplace_plugins'"
|
"SELECT column_name FROM information_schema.columns WHERE table_name = 'marketplace_plugins'"
|
||||||
).fetchall()
|
).fetchall()
|
||||||
}
|
}
|
||||||
assert "is_system" in cols, f"is_system missing from {cols}"
|
assert "is_system" in cols, f"is_system missing from {cols}"
|
||||||
|
|
||||||
# New rows default to FALSE — required so a freshly-synced plugin
|
# New rows default to FALSE — required so a freshly-synced plugin
|
||||||
# doesn't accidentally land in everyone's stack.
|
# doesn't accidentally land in everyone's stack.
|
||||||
conn.execute(
|
conn.execute("INSERT INTO marketplace_registry (id, name, url) VALUES ('m', 'M', 'https://example.com/repo.git')")
|
||||||
"INSERT INTO marketplace_registry (id, name, url) "
|
conn.execute("INSERT INTO marketplace_plugins (marketplace_id, name) VALUES ('m', 'p')")
|
||||||
"VALUES ('m', 'M', 'https://example.com/repo.git')"
|
row = conn.execute("SELECT is_system FROM marketplace_plugins WHERE marketplace_id = 'm' AND name = 'p'").fetchone()
|
||||||
)
|
|
||||||
conn.execute(
|
|
||||||
"INSERT INTO marketplace_plugins (marketplace_id, name) "
|
|
||||||
"VALUES ('m', 'p')"
|
|
||||||
)
|
|
||||||
row = conn.execute(
|
|
||||||
"SELECT is_system FROM marketplace_plugins "
|
|
||||||
"WHERE marketplace_id = 'm' AND name = 'p'"
|
|
||||||
).fetchone()
|
|
||||||
assert row[0] is False, f"new plugin defaulted to {row[0]!r}, expected False"
|
assert row[0] is False, f"new plugin defaulted to {row[0]!r}, expected False"
|
||||||
conn.close()
|
conn.close()
|
||||||
|
|
||||||
|
|
@@ -230,10 +212,7 @@ def test_v38_db_migrates_to_v39(tmp_path):
|
||||||
# Stand up the v38 minimal shape: schema_version row + the two
|
# Stand up the v38 minimal shape: schema_version row + the two
|
||||||
# marketplace tables + a pre-existing plugin row that must survive
|
# marketplace tables + a pre-existing plugin row that must survive
|
||||||
# the migration with is_system = FALSE.
|
# the migration with is_system = FALSE.
|
||||||
conn.execute(
|
conn.execute("CREATE TABLE schema_version (version INTEGER, applied_at TIMESTAMP DEFAULT current_timestamp)")
|
||||||
"CREATE TABLE schema_version (version INTEGER, "
|
|
||||||
"applied_at TIMESTAMP DEFAULT current_timestamp)"
|
|
||||||
)
|
|
||||||
conn.execute("INSERT INTO schema_version (version) VALUES (38)")
|
conn.execute("INSERT INTO schema_version (version) VALUES (38)")
|
||||||
conn.execute("""CREATE TABLE marketplace_registry (
|
conn.execute("""CREATE TABLE marketplace_registry (
|
||||||
id VARCHAR PRIMARY KEY, name VARCHAR NOT NULL,
|
id VARCHAR PRIMARY KEY, name VARCHAR NOT NULL,
|
||||||
|
|
@ -254,21 +233,17 @@ def test_v38_db_migrates_to_v39(tmp_path):
|
||||||
PRIMARY KEY (marketplace_id, name)
|
PRIMARY KEY (marketplace_id, name)
|
||||||
)""")
|
)""")
|
||||||
conn.execute(
|
conn.execute(
|
||||||
"INSERT INTO marketplace_registry (id, name, url) "
|
"INSERT INTO marketplace_registry (id, name, url) VALUES ('legacy', 'Legacy', 'https://example.com/repo.git')"
|
||||||
"VALUES ('legacy', 'Legacy', 'https://example.com/repo.git')"
|
|
||||||
)
|
|
||||||
conn.execute(
|
|
||||||
"INSERT INTO marketplace_plugins (marketplace_id, name) "
|
|
||||||
"VALUES ('legacy', 'foo')"
|
|
||||||
)
|
)
|
||||||
|
conn.execute("INSERT INTO marketplace_plugins (marketplace_id, name) VALUES ('legacy', 'foo')")
|
||||||
|
|
||||||
_ensure_schema(conn)
|
_ensure_schema(conn)
|
||||||
assert get_schema_version(conn) == SCHEMA_VERSION
|
assert get_schema_version(conn) == SCHEMA_VERSION
|
||||||
|
|
||||||
cols = {
|
cols = {
|
||||||
r[0] for r in conn.execute(
|
r[0]
|
||||||
"SELECT column_name FROM information_schema.columns "
|
for r in conn.execute(
|
||||||
"WHERE table_name = 'marketplace_plugins'"
|
"SELECT column_name FROM information_schema.columns WHERE table_name = 'marketplace_plugins'"
|
||||||
).fetchall()
|
).fetchall()
|
||||||
}
|
}
|
||||||
assert "is_system" in cols
|
assert "is_system" in cols
|
||||||
|
|
@ -276,8 +251,7 @@ def test_v38_db_migrates_to_v39(tmp_path):
|
||||||
# Existing pre-v39 row backfilled to FALSE — no plugin lands in
|
# Existing pre-v39 row backfilled to FALSE — no plugin lands in
|
||||||
# everyone's stack just because we ran the migration.
|
# everyone's stack just because we ran the migration.
|
||||||
row = conn.execute(
|
row = conn.execute(
|
||||||
"SELECT is_system FROM marketplace_plugins "
|
"SELECT is_system FROM marketplace_plugins WHERE marketplace_id = 'legacy' AND name = 'foo'"
|
||||||
"WHERE marketplace_id = 'legacy' AND name = 'foo'"
|
|
||||||
).fetchone()
|
).fetchone()
|
||||||
assert row[0] is False, f"pre-existing row backfilled to {row[0]!r}"
|
assert row[0] is False, f"pre-existing row backfilled to {row[0]!r}"
|
||||||
conn.close()
|
conn.close()
|
||||||
|
|
@ -289,9 +263,9 @@ def test_v20_adds_source_query(tmp_path):
|
||||||
_ensure_schema(conn)
|
_ensure_schema(conn)
|
||||||
|
|
||||||
cols = {
|
cols = {
|
||||||
r[0] for r in conn.execute(
|
r[0]
|
||||||
"SELECT column_name FROM information_schema.columns "
|
for r in conn.execute(
|
||||||
"WHERE table_name = 'table_registry'"
|
"SELECT column_name FROM information_schema.columns WHERE table_name = 'table_registry'"
|
||||||
).fetchall()
|
).fetchall()
|
||||||
}
|
}
|
||||||
assert "source_query" in cols, f"source_query missing from {cols}"
|
assert "source_query" in cols, f"source_query missing from {cols}"
|
||||||
|
|
@ -312,19 +286,13 @@ def test_claude_md_template_seeded_in_instance_templates(tmp_path):
|
||||||
_ensure_schema(conn)
|
_ensure_schema(conn)
|
||||||
|
|
||||||
tables = {
|
tables = {
|
||||||
r[0] for r in conn.execute(
|
r[0]
|
||||||
"SELECT table_name FROM information_schema.tables "
|
for r in conn.execute("SELECT table_name FROM information_schema.tables WHERE table_schema = 'main'").fetchall()
|
||||||
"WHERE table_schema = 'main'"
|
|
||||||
).fetchall()
|
|
||||||
}
|
}
|
||||||
assert "instance_templates" in tables
|
assert "instance_templates" in tables
|
||||||
assert "claude_md_template" not in tables, (
|
assert "claude_md_template" not in tables, "claude_md_template should be consolidated away post-v28"
|
||||||
"claude_md_template should be consolidated away post-v28"
|
|
||||||
)
|
|
||||||
|
|
||||||
row = conn.execute(
|
row = conn.execute("SELECT key, content FROM instance_templates WHERE key = 'claude_md'").fetchone()
|
||||||
"SELECT key, content FROM instance_templates WHERE key = 'claude_md'"
|
|
||||||
).fetchone()
|
|
||||||
assert row is not None
|
assert row is not None
|
||||||
assert row[0] == "claude_md"
|
assert row[0] == "claude_md"
|
||||||
assert row[1] is None # default = no override
|
assert row[1] is None # default = no override
|
||||||
|
|
@ -340,10 +308,7 @@ def test_v19_db_migrates_to_v20(tmp_path):
|
||||||
# Simulate a v19 DB at minimal but realistic shape: schema_version row +
|
# Simulate a v19 DB at minimal but realistic shape: schema_version row +
|
||||||
# a table_registry row in the post-v19 column shape (no is_public column,
|
# a table_registry row in the post-v19 column shape (no is_public column,
|
||||||
# since v19 finalize dropped it via the table-rebuild idiom).
|
# since v19 finalize dropped it via the table-rebuild idiom).
|
||||||
conn.execute(
|
conn.execute("CREATE TABLE schema_version (version INTEGER, applied_at TIMESTAMP DEFAULT current_timestamp)")
|
||||||
"CREATE TABLE schema_version (version INTEGER, "
|
|
||||||
"applied_at TIMESTAMP DEFAULT current_timestamp)"
|
|
||||||
)
|
|
||||||
conn.execute("INSERT INTO schema_version (version) VALUES (19)")
|
conn.execute("INSERT INTO schema_version (version) VALUES (19)")
|
||||||
conn.execute("""CREATE TABLE table_registry (
|
conn.execute("""CREATE TABLE table_registry (
|
||||||
id VARCHAR PRIMARY KEY, name VARCHAR NOT NULL,
|
id VARCHAR PRIMARY KEY, name VARCHAR NOT NULL,
|
||||||
|
|
@ -361,16 +326,14 @@ def test_v19_db_migrates_to_v20(tmp_path):
|
||||||
|
|
||||||
assert get_schema_version(conn) == SCHEMA_VERSION # bumped 19→28 forward
|
assert get_schema_version(conn) == SCHEMA_VERSION # bumped 19→28 forward
|
||||||
cols = {
|
cols = {
|
||||||
r[0] for r in conn.execute(
|
r[0]
|
||||||
"SELECT column_name FROM information_schema.columns "
|
for r in conn.execute(
|
||||||
"WHERE table_name = 'table_registry'"
|
"SELECT column_name FROM information_schema.columns WHERE table_name = 'table_registry'"
|
||||||
).fetchall()
|
).fetchall()
|
||||||
}
|
}
|
||||||
assert "source_query" in cols
|
assert "source_query" in cols
|
||||||
# Existing row preserved, new column NULL
|
# Existing row preserved, new column NULL
|
||||||
row = conn.execute(
|
row = conn.execute("SELECT id, source_query FROM table_registry WHERE id='foo'").fetchone()
|
||||||
"SELECT id, source_query FROM table_registry WHERE id='foo'"
|
|
||||||
).fetchone()
|
|
||||||
assert row == ("foo", None)
|
assert row == ("foo", None)
|
||||||
conn.close()
|
conn.close()
|
||||||
|
|
||||||
|
|
@ -389,8 +352,7 @@ def _make_v34_store_entities(conn):
|
||||||
)
|
)
|
||||||
""")
|
""")
|
||||||
conn.execute(
|
conn.execute(
|
||||||
"INSERT INTO store_entities (id, visibility_status) VALUES "
|
"INSERT INTO store_entities (id, visibility_status) VALUES ('a', 'approved'), ('b', 'pending'), ('c', 'hidden')"
|
||||||
"('a', 'approved'), ('b', 'pending'), ('c', 'hidden')"
|
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
|
@ -409,9 +371,9 @@ def test_v34_to_v35_clean_path_rebuilds_visibility_column(tmp_path):
|
||||||
_v34_to_v35_migrate(conn)
|
_v34_to_v35_migrate(conn)
|
||||||
|
|
||||||
cols = {
|
cols = {
|
||||||
r[0] for r in conn.execute(
|
r[0]
|
||||||
"SELECT column_name FROM information_schema.columns "
|
for r in conn.execute(
|
||||||
"WHERE table_name = 'store_entities'"
|
"SELECT column_name FROM information_schema.columns WHERE table_name = 'store_entities'"
|
||||||
).fetchall()
|
).fetchall()
|
||||||
}
|
}
|
||||||
assert "visibility_status" in cols
|
assert "visibility_status" in cols
|
||||||
|
|
@ -419,12 +381,8 @@ def test_v34_to_v35_clean_path_rebuilds_visibility_column(tmp_path):
|
||||||
assert "archived_at" in cols
|
assert "archived_at" in cols
|
||||||
assert "archived_by" in cols
|
assert "archived_by" in cols
|
||||||
|
|
||||||
rows = dict(conn.execute(
|
rows = dict(conn.execute("SELECT id, visibility_status FROM store_entities ORDER BY id").fetchall())
|
||||||
"SELECT id, visibility_status FROM store_entities ORDER BY id"
|
assert rows == {"a": "approved", "b": "pending", "c": "hidden"}, f"row values must survive the rebuild: {rows}"
|
||||||
).fetchall())
|
|
||||||
assert rows == {"a": "approved", "b": "pending", "c": "hidden"}, (
|
|
||||||
f"row values must survive the rebuild: {rows}"
|
|
||||||
)
|
|
||||||
conn.close()
|
conn.close()
|
||||||
|
|
||||||
|
|
||||||
|
|
@ -452,16 +410,15 @@ def test_v34_to_v35_recovers_from_partial_rebuild_missing_visibility(tmp_path):
|
||||||
)
|
)
|
||||||
""")
|
""")
|
||||||
conn.execute(
|
conn.execute(
|
||||||
"INSERT INTO store_entities (id, _vis_v35) VALUES "
|
"INSERT INTO store_entities (id, _vis_v35) VALUES ('a', 'approved'), ('b', 'pending'), ('c', 'hidden')"
|
||||||
"('a', 'approved'), ('b', 'pending'), ('c', 'hidden')"
|
|
||||||
)
|
)
|
||||||
|
|
||||||
_v34_to_v35_migrate(conn)
|
_v34_to_v35_migrate(conn)
|
||||||
|
|
||||||
cols = {
|
cols = {
|
||||||
r[0] for r in conn.execute(
|
r[0]
|
||||||
"SELECT column_name FROM information_schema.columns "
|
for r in conn.execute(
|
||||||
"WHERE table_name = 'store_entities'"
|
"SELECT column_name FROM information_schema.columns WHERE table_name = 'store_entities'"
|
||||||
).fetchall()
|
).fetchall()
|
||||||
}
|
}
|
||||||
assert "visibility_status" in cols
|
assert "visibility_status" in cols
|
||||||
|
|
@ -469,9 +426,7 @@ def test_v34_to_v35_recovers_from_partial_rebuild_missing_visibility(tmp_path):
|
||||||
assert "archived_at" in cols
|
assert "archived_at" in cols
|
||||||
assert "archived_by" in cols
|
assert "archived_by" in cols
|
||||||
|
|
||||||
rows = dict(conn.execute(
|
rows = dict(conn.execute("SELECT id, visibility_status FROM store_entities ORDER BY id").fetchall())
|
||||||
"SELECT id, visibility_status FROM store_entities ORDER BY id"
|
|
||||||
).fetchall())
|
|
||||||
assert rows == {"a": "approved", "b": "pending", "c": "hidden"}, (
|
assert rows == {"a": "approved", "b": "pending", "c": "hidden"}, (
|
||||||
f"row values must come back via RENAME, not be lost: {rows}"
|
f"row values must come back via RENAME, not be lost: {rows}"
|
||||||
)
|
)
|
||||||
|
|
@ -495,25 +450,20 @@ def test_v34_to_v35_recovers_from_partial_rebuild_both_columns(tmp_path):
|
||||||
_vis_v35 VARCHAR
|
_vis_v35 VARCHAR
|
||||||
)
|
)
|
||||||
""")
|
""")
|
||||||
conn.execute(
|
conn.execute("INSERT INTO store_entities (id, visibility_status, _vis_v35) VALUES ('a', 'approved', 'approved')")
|
||||||
"INSERT INTO store_entities (id, visibility_status, _vis_v35) VALUES "
|
|
||||||
"('a', 'approved', 'approved')"
|
|
||||||
)
|
|
||||||
|
|
||||||
_v34_to_v35_migrate(conn)
|
_v34_to_v35_migrate(conn)
|
||||||
|
|
||||||
cols = {
|
cols = {
|
||||||
r[0] for r in conn.execute(
|
r[0]
|
||||||
"SELECT column_name FROM information_schema.columns "
|
for r in conn.execute(
|
||||||
"WHERE table_name = 'store_entities'"
|
"SELECT column_name FROM information_schema.columns WHERE table_name = 'store_entities'"
|
||||||
).fetchall()
|
).fetchall()
|
||||||
}
|
}
|
||||||
assert "visibility_status" in cols
|
assert "visibility_status" in cols
|
||||||
assert "_vis_v35" not in cols, "temp column must be dropped"
|
assert "_vis_v35" not in cols, "temp column must be dropped"
|
||||||
|
|
||||||
row = conn.execute(
|
row = conn.execute("SELECT id, visibility_status FROM store_entities WHERE id = 'a'").fetchone()
|
||||||
"SELECT id, visibility_status FROM store_entities WHERE id = 'a'"
|
|
||||||
).fetchone()
|
|
||||||
assert row == ("a", "approved")
|
assert row == ("a", "approved")
|
||||||
conn.close()
|
conn.close()
|
||||||
|
|
||||||
|
|
@ -533,10 +483,7 @@ def test_v32_db_with_partial_v35_recovers_through_full_ladder(tmp_path):
|
||||||
# Stand up the broken state. We only need enough of the schema for the
|
# Stand up the broken state. We only need enough of the schema for the
|
||||||
# migration ladder to run — ``_ensure_schema`` will create the rest
|
# migration ladder to run — ``_ensure_schema`` will create the rest
|
||||||
# via ``_SYSTEM_SCHEMA``'s IF NOT EXISTS guards.
|
# via ``_SYSTEM_SCHEMA``'s IF NOT EXISTS guards.
|
||||||
conn.execute(
|
conn.execute("CREATE TABLE schema_version (version INTEGER, applied_at TIMESTAMP DEFAULT current_timestamp)")
|
||||||
"CREATE TABLE schema_version (version INTEGER, "
|
|
||||||
"applied_at TIMESTAMP DEFAULT current_timestamp)"
|
|
||||||
)
|
|
||||||
conn.execute("INSERT INTO schema_version (version) VALUES (32)")
|
conn.execute("INSERT INTO schema_version (version) VALUES (32)")
|
||||||
conn.execute("""
|
conn.execute("""
|
||||||
CREATE TABLE store_entities (
|
CREATE TABLE store_entities (
|
||||||
|
|
@ -550,26 +497,21 @@ def test_v32_db_with_partial_v35_recovers_through_full_ladder(tmp_path):
|
||||||
_vis_v35 VARCHAR
|
_vis_v35 VARCHAR
|
||||||
)
|
)
|
||||||
""")
|
""")
|
||||||
conn.execute(
|
conn.execute("INSERT INTO store_entities (id, type, name, _vis_v35) VALUES ('a', 'skill', 'alpha', 'approved')")
|
||||||
"INSERT INTO store_entities (id, type, name, _vis_v35) "
|
|
||||||
"VALUES ('a', 'skill', 'alpha', 'approved')"
|
|
||||||
)
|
|
||||||
|
|
||||||
_ensure_schema(conn)
|
_ensure_schema(conn)
|
||||||
|
|
||||||
assert get_schema_version(conn) == SCHEMA_VERSION
|
assert get_schema_version(conn) == SCHEMA_VERSION
|
||||||
cols = {
|
cols = {
|
||||||
r[0] for r in conn.execute(
|
r[0]
|
||||||
"SELECT column_name FROM information_schema.columns "
|
for r in conn.execute(
|
||||||
"WHERE table_name = 'store_entities'"
|
"SELECT column_name FROM information_schema.columns WHERE table_name = 'store_entities'"
|
||||||
).fetchall()
|
).fetchall()
|
||||||
}
|
}
|
||||||
assert "visibility_status" in cols
|
assert "visibility_status" in cols
|
||||||
assert "_vis_v35" not in cols
|
assert "_vis_v35" not in cols
|
||||||
# Existing row preserved, value carried over from _vis_v35.
|
# Existing row preserved, value carried over from _vis_v35.
|
||||||
row = conn.execute(
|
row = conn.execute("SELECT id, visibility_status FROM store_entities WHERE id = 'a'").fetchone()
|
||||||
"SELECT id, visibility_status FROM store_entities WHERE id = 'a'"
|
|
||||||
).fetchone()
|
|
||||||
assert row == ("a", "approved")
|
assert row == ("a", "approved")
|
||||||
conn.close()
|
conn.close()
|
||||||
|
|
||||||
|
|
@ -593,13 +535,9 @@ def test_v35_to_v36_reapplies_visibility_constraints(tmp_path):
|
||||||
).fetchall()
|
).fetchall()
|
||||||
assert cols, "visibility_status column missing from store_entities"
|
assert cols, "visibility_status column missing from store_entities"
|
||||||
name, is_nullable, default_expr = cols[0]
|
name, is_nullable, default_expr = cols[0]
|
||||||
assert is_nullable == "NO", (
|
assert is_nullable == "NO", f"visibility_status must be NOT NULL after v36; got is_nullable={is_nullable!r}"
|
||||||
f"visibility_status must be NOT NULL after v36; got is_nullable={is_nullable!r}"
|
|
||||||
)
|
|
||||||
# DuckDB renders the default as a quoted literal — match either form.
|
# DuckDB renders the default as a quoted literal — match either form.
|
||||||
assert default_expr is not None, "visibility_status DEFAULT must be set"
|
assert default_expr is not None, "visibility_status DEFAULT must be set"
|
||||||
assert "pending" in str(default_expr).lower(), (
|
assert "pending" in str(default_expr).lower(), f"visibility_status DEFAULT must be 'pending'; got {default_expr!r}"
|
||||||
f"visibility_status DEFAULT must be 'pending'; got {default_expr!r}"
|
|
||||||
)
|
|
||||||
|
|
||||||
conn.close()
|
conn.close()
|
||||||
|
|
|
||||||
|
|
@ -10,6 +10,7 @@ Covers:
|
||||||
- get_home_status_frame_visibility honors the env var + yaml override
|
- get_home_status_frame_visibility honors the env var + yaml override
|
||||||
and defaults true.
|
and defaults true.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
import asyncio
|
import asyncio
|
||||||
|
|
@ -42,10 +43,7 @@ def test_v44_fresh_install_has_token_columns_and_last_pull(tmp_path):
|
||||||
user_cols = {r[1] for r in conn.execute("PRAGMA table_info(users)").fetchall()}
|
user_cols = {r[1] for r in conn.execute("PRAGMA table_info(users)").fetchall()}
|
||||||
assert "last_pull_at" in user_cols
|
assert "last_pull_at" in user_cols
|
||||||
|
|
||||||
sess_cols = {
|
sess_cols = {r[1] for r in conn.execute("PRAGMA table_info(usage_session_summary)").fetchall()}
|
||||||
r[1]
|
|
||||||
for r in conn.execute("PRAGMA table_info(usage_session_summary)").fetchall()
|
|
||||||
}
|
|
||||||
for col in (
|
for col in (
|
||||||
"input_tokens",
|
"input_tokens",
|
||||||
"output_tokens",
|
"output_tokens",
|
||||||
|
|
@ -61,10 +59,7 @@ def test_v43_to_v44_upgrade_is_idempotent(tmp_path):
|
||||||
db_path = tmp_path / "system.duckdb"
|
db_path = tmp_path / "system.duckdb"
|
||||||
conn = duckdb.connect(str(db_path))
|
conn = duckdb.connect(str(db_path))
|
||||||
# Hand-roll v43-shaped tables (no last_pull_at, no token cols).
|
# Hand-roll v43-shaped tables (no last_pull_at, no token cols).
|
||||||
conn.execute(
|
conn.execute("CREATE TABLE users (id VARCHAR PRIMARY KEY, email VARCHAR, onboarded BOOLEAN DEFAULT FALSE)")
|
||||||
"CREATE TABLE users (id VARCHAR PRIMARY KEY, email VARCHAR, "
|
|
||||||
"onboarded BOOLEAN DEFAULT FALSE)"
|
|
||||||
)
|
|
||||||
conn.execute(
|
conn.execute(
|
||||||
"""
|
"""
|
||||||
CREATE TABLE usage_session_summary (
|
CREATE TABLE usage_session_summary (
|
||||||
|
|
@ -82,14 +77,8 @@ def test_v43_to_v44_upgrade_is_idempotent(tmp_path):
|
||||||
_v43_to_v44(conn)
|
_v43_to_v44(conn)
|
||||||
_v43_to_v44(conn) # idempotent
|
_v43_to_v44(conn) # idempotent
|
||||||
|
|
||||||
assert "last_pull_at" in {
|
assert "last_pull_at" in {r[1] for r in conn.execute("PRAGMA table_info(users)").fetchall()}
|
||||||
r[1] for r in conn.execute("PRAGMA table_info(users)").fetchall()
|
tok_cols = {r[1] for r in conn.execute("PRAGMA table_info(usage_session_summary)").fetchall() if "token" in r[1]}
|
||||||
}
|
|
||||||
tok_cols = {
|
|
||||||
r[1]
|
|
||||||
for r in conn.execute("PRAGMA table_info(usage_session_summary)").fetchall()
|
|
||||||
if "token" in r[1]
|
|
||||||
}
|
|
||||||
assert tok_cols == {
|
assert tok_cols == {
|
||||||
"input_tokens",
|
"input_tokens",
|
||||||
"output_tokens",
|
"output_tokens",
|
||||||
|
|
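The idempotency check in test_v43_to_v44_upgrade_is_idempotent can be illustrated with a small guard-before-alter helper. This is only a sketch under stated assumptions: it uses the standard-library sqlite3 module as a stand-in for DuckDB, the names add_column_if_missing and sketch_v43_to_v44 are hypothetical, the column types are guesses, and only the two token columns visible in this hunk are included. It is not the project's _v43_to_v44.

```python
import sqlite3


def add_column_if_missing(conn, table, column, decl):
    """Add `column` to `table` only when it is not already present."""
    # PRAGMA table_info returns one row per column; index 1 is the name,
    # the same access pattern the tests in this diff use.
    existing = {r[1] for r in conn.execute(f"PRAGMA table_info({table})").fetchall()}
    if column not in existing:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")


def sketch_v43_to_v44(conn):
    """Hypothetical stand-in for the migration step; types are assumptions."""
    add_column_if_missing(conn, "users", "last_pull_at", "TIMESTAMP")
    for col in ("input_tokens", "output_tokens"):
        add_column_if_missing(conn, "usage_session_summary", col, "BIGINT DEFAULT 0")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id VARCHAR PRIMARY KEY, email VARCHAR)")
conn.execute("CREATE TABLE usage_session_summary (session_file VARCHAR)")
sketch_v43_to_v44(conn)
sketch_v43_to_v44(conn)  # second run finds the columns and does nothing
```

Checking the catalog before each ALTER is what makes a second run safe; an engine that supports ADD COLUMN IF NOT EXISTS could express the same guard in SQL.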
@ -100,7 +89,7 @@ def test_v43_to_v44_upgrade_is_idempotent(tmp_path):
|
||||||
|
|
||||||
def test_schema_version_constant_is_44():
|
def test_schema_version_constant_is_44():
|
||||||
"""Belt + suspenders against schema_version regressions."""
|
"""Belt + suspenders against schema_version regressions."""
|
||||||
assert SCHEMA_VERSION == 44
|
assert SCHEMA_VERSION == 45
|
||||||
|
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|
@ -118,15 +107,23 @@ def stats_conn(tmp_path):
|
||||||
|
|
||||||
def _seed_user(conn, *, uid="u1", email="alice@example.com"):
|
def _seed_user(conn, *, uid="u1", email="alice@example.com"):
|
||||||
conn.execute(
|
conn.execute(
|
||||||
"INSERT INTO users (id, email, active, onboarded, last_pull_at) "
|
"INSERT INTO users (id, email, active, onboarded, last_pull_at) VALUES (?, ?, TRUE, TRUE, current_timestamp)",
|
||||||
"VALUES (?, ?, TRUE, TRUE, current_timestamp)",
|
|
||||||
[uid, email],
|
[uid, email],
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
def _seed_session(conn, *, session_file, username, started_sql, prompts=0,
|
def _seed_session(
|
||||||
input_tokens=0, output_tokens=0,
|
conn,
|
||||||
cache_read=0, cache_creation=0):
|
*,
|
||||||
|
session_file,
|
||||||
|
username,
|
||||||
|
started_sql,
|
||||||
|
prompts=0,
|
||||||
|
input_tokens=0,
|
||||||
|
output_tokens=0,
|
||||||
|
cache_read=0,
|
||||||
|
cache_creation=0,
|
||||||
|
):
|
||||||
conn.execute(
|
conn.execute(
|
||||||
f"""
|
f"""
|
||||||
INSERT INTO usage_session_summary
|
INSERT INTO usage_session_summary
|
||||||
|
|
@ -136,8 +133,7 @@ def _seed_session(conn, *, session_file, username, started_sql, prompts=0,
|
||||||
VALUES (?, ?, ?, {started_sql}, current_timestamp,
|
VALUES (?, ?, ?, {started_sql}, current_timestamp,
|
||||||
?, ?, ?, ?, ?, 2)
|
?, ?, ?, ?, ?, 2)
|
||||||
""",
|
""",
|
||||||
[session_file, session_file, username, prompts,
|
[session_file, session_file, username, prompts, input_tokens, output_tokens, cache_read, cache_creation],
|
||||||
input_tokens, output_tokens, cache_read, cache_creation],
|
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
|
@ -159,27 +155,60 @@ def test_compute_home_stats_24h_vs_7d_windowing(stats_conn):
|
||||||
from app.api.me import compute_home_stats
|
from app.api.me import compute_home_stats
|
||||||
|
|
||||||
_seed_user(stats_conn)
|
_seed_user(stats_conn)
|
||||||
_seed_session(stats_conn, session_file="a.jsonl", username="alice",
|
_seed_session(
|
||||||
started_sql="current_timestamp - INTERVAL 1 HOUR",
|
stats_conn,
|
||||||
prompts=5, input_tokens=100, output_tokens=50,
|
session_file="a.jsonl",
|
||||||
cache_read=800, cache_creation=25)
|
username="alice",
|
||||||
_seed_session(stats_conn, session_file="b.jsonl", username="alice",
|
started_sql="current_timestamp - INTERVAL 1 HOUR",
|
||||||
started_sql="current_timestamp - INTERVAL 3 DAY",
|
prompts=5,
|
||||||
prompts=5, input_tokens=100, output_tokens=50,
|
input_tokens=100,
|
||||||
cache_read=800, cache_creation=25)
|
output_tokens=50,
|
||||||
_seed_session(stats_conn, session_file="c.jsonl", username="alice",
|
cache_read=800,
|
||||||
started_sql="current_timestamp - INTERVAL 30 DAY",
|
cache_creation=25,
|
||||||
prompts=99)
|
)
|
||||||
|
_seed_session(
|
||||||
|
stats_conn,
|
||||||
|
session_file="b.jsonl",
|
||||||
|
username="alice",
|
||||||
|
started_sql="current_timestamp - INTERVAL 3 DAY",
|
||||||
|
prompts=5,
|
||||||
|
input_tokens=100,
|
||||||
|
output_tokens=50,
|
||||||
|
cache_read=800,
|
||||||
|
cache_creation=25,
|
||||||
|
)
|
||||||
|
_seed_session(
|
||||||
|
stats_conn,
|
||||||
|
session_file="c.jsonl",
|
||||||
|
username="alice",
|
||||||
|
started_sql="current_timestamp - INTERVAL 30 DAY",
|
||||||
|
prompts=99,
|
||||||
|
)
|
||||||
|
|
||||||
_seed_event(stats_conn, ev_id="e1", session_file="a.jsonl",
|
_seed_event(
|
||||||
username="alice", cwd="/proj/alpha",
|
stats_conn,
|
||||||
occurred_sql="current_timestamp - INTERVAL 1 HOUR")
|
ev_id="e1",
|
||||||
_seed_event(stats_conn, ev_id="e2", session_file="a.jsonl",
|
session_file="a.jsonl",
|
||||||
username="alice", cwd="/proj/beta",
|
username="alice",
|
||||||
occurred_sql="current_timestamp - INTERVAL 2 HOUR")
|
cwd="/proj/alpha",
|
||||||
_seed_event(stats_conn, ev_id="e3", session_file="b.jsonl",
|
occurred_sql="current_timestamp - INTERVAL 1 HOUR",
|
||||||
username="alice", cwd="/proj/gamma",
|
)
|
||||||
occurred_sql="current_timestamp - INTERVAL 3 DAY")
|
_seed_event(
|
||||||
|
stats_conn,
|
||||||
|
ev_id="e2",
|
||||||
|
session_file="a.jsonl",
|
||||||
|
username="alice",
|
||||||
|
cwd="/proj/beta",
|
||||||
|
occurred_sql="current_timestamp - INTERVAL 2 HOUR",
|
||||||
|
)
|
||||||
|
_seed_event(
|
||||||
|
stats_conn,
|
||||||
|
ev_id="e3",
|
||||||
|
session_file="b.jsonl",
|
||||||
|
username="alice",
|
||||||
|
cwd="/proj/gamma",
|
||||||
|
occurred_sql="current_timestamp - INTERVAL 3 DAY",
|
||||||
|
)
|
||||||
|
|
||||||
user = {"id": "u1", "email": "alice@example.com"}
|
user = {"id": "u1", "email": "alice@example.com"}
|
||||||
|
|
||||||
|
|
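The seed data above places one session an hour back, one three days back, and one thirty days back. A rough sketch of how a 24-hour window and a 7-day window would partition those timestamps follows. The helper count_in_window and the fixed reference time are hypothetical, and this does not reproduce the SQL inside compute_home_stats.

```python
from datetime import datetime, timedelta, timezone


def count_in_window(started_at, now, window):
    """Count timestamps that fall within `window` before `now`."""
    cutoff = now - window
    return sum(1 for ts in started_at if ts >= cutoff)


# Fixed reference time so the example is deterministic (an assumption;
# the test itself uses the database's current_timestamp).
now = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)
started = [
    now - timedelta(hours=1),   # a.jsonl
    now - timedelta(days=3),    # b.jsonl
    now - timedelta(days=30),   # c.jsonl
]

last_24h = count_in_window(started, now, timedelta(hours=24))
last_7d = count_in_window(started, now, timedelta(days=7))
```

Under these assumptions only a.jsonl lands in the 24-hour window, a.jsonl and b.jsonl land in the 7-day window, and c.jsonl lands in neither.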
@ -243,8 +272,10 @@ def test_compute_home_stats_missing_users_row_returns_zeros(stats_conn):
|
||||||
"sessions": 0,
|
"sessions": 0,
|
||||||
"prompts": 0,
|
"prompts": 0,
|
||||||
"tokens": {
|
"tokens": {
|
||||||
"input": 0, "output": 0,
|
"input": 0,
|
||||||
"cache_read": 0, "cache_creation": 0,
|
"output": 0,
|
||||||
|
"cache_read": 0,
|
||||||
|
"cache_creation": 0,
|
||||||
"total": 0,
|
"total": 0,
|
||||||
},
|
},
|
||||||
"projects": 0,
|
"projects": 0,
|
||||||
|
|
@ -267,9 +298,7 @@ def test_sync_manifest_bumps_last_pull_at(stats_conn, monkeypatch, tmp_path):
|
||||||
|
|
||||||
_seed_user(stats_conn, uid="u_pull", email="puller@example.com")
|
_seed_user(stats_conn, uid="u_pull", email="puller@example.com")
|
||||||
# Wipe seeded last_pull_at so we can detect the bump.
|
# Wipe seeded last_pull_at so we can detect the bump.
|
||||||
stats_conn.execute(
|
stats_conn.execute("UPDATE users SET last_pull_at = NULL WHERE id = ?", ["u_pull"])
|
||||||
"UPDATE users SET last_pull_at = NULL WHERE id = ?", ["u_pull"]
|
|
||||||
)
|
|
||||||
|
|
||||||
asyncio.run(
|
asyncio.run(
|
||||||
sync_manifest(
|
sync_manifest(
|
||||||
|
|
@ -277,9 +306,7 @@ def test_sync_manifest_bumps_last_pull_at(stats_conn, monkeypatch, tmp_path):
|
||||||
conn=stats_conn,
|
conn=stats_conn,
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
row = stats_conn.execute(
|
row = stats_conn.execute("SELECT last_pull_at FROM users WHERE id = ?", ["u_pull"]).fetchone()
|
||||||
"SELECT last_pull_at FROM users WHERE id = ?", ["u_pull"]
|
|
||||||
).fetchone()
|
|
||||||
# Don't compare against `datetime.now(utc)` — DuckDB's
|
# Don't compare against `datetime.now(utc)` — DuckDB's
|
||||||
# ``current_timestamp`` returns the session's wall-clock time which
|
# ``current_timestamp`` returns the session's wall-clock time which
|
||||||
# may be naive-local-or-utc depending on the environment, so a
|
# may be naive-local-or-utc depending on the environment, so a
|
||||||
|
|
@ -298,6 +325,7 @@ def test_status_frame_default_is_visible(monkeypatch):
|
||||||
"""Absent both env var and yaml entry, the flag returns True."""
|
"""Absent both env var and yaml entry, the flag returns True."""
|
||||||
monkeypatch.delenv("AGNES_HOME_SHOW_STATUS_FRAME", raising=False)
|
monkeypatch.delenv("AGNES_HOME_SHOW_STATUS_FRAME", raising=False)
|
||||||
from app.instance_config import get_home_status_frame_visibility
|
from app.instance_config import get_home_status_frame_visibility
|
||||||
|
|
||||||
assert get_home_status_frame_visibility() is True
|
assert get_home_status_frame_visibility() is True
|
||||||
|
|
||||||
|
|
||||||
|
|
@ -305,12 +333,14 @@ def test_status_frame_env_var_off(monkeypatch):
|
||||||
"""AGNES_HOME_SHOW_STATUS_FRAME=0 hides the frame."""
|
"""AGNES_HOME_SHOW_STATUS_FRAME=0 hides the frame."""
|
||||||
monkeypatch.setenv("AGNES_HOME_SHOW_STATUS_FRAME", "0")
|
monkeypatch.setenv("AGNES_HOME_SHOW_STATUS_FRAME", "0")
|
||||||
from app.instance_config import get_home_status_frame_visibility
|
from app.instance_config import get_home_status_frame_visibility
|
||||||
|
|
||||||
assert get_home_status_frame_visibility() is False
|
assert get_home_status_frame_visibility() is False
|
||||||
|
|
||||||
|
|
||||||
def test_status_frame_env_var_falsey_values(monkeypatch):
|
def test_status_frame_env_var_falsey_values(monkeypatch):
|
||||||
"""Each of {0, false, no, off, ''} hides the frame; anything else shows."""
|
"""Each of {0, false, no, off, ''} hides the frame; anything else shows."""
|
||||||
from app.instance_config import get_home_status_frame_visibility
|
from app.instance_config import get_home_status_frame_visibility
|
||||||
|
|
||||||
for val in ("0", "false", "False", "FALSE", "no", "off", ""):
|
for val in ("0", "false", "False", "FALSE", "no", "off", ""):
|
||||||
monkeypatch.setenv("AGNES_HOME_SHOW_STATUS_FRAME", val)
|
monkeypatch.setenv("AGNES_HOME_SHOW_STATUS_FRAME", val)
|
||||||
assert get_home_status_frame_visibility() is False, f"{val!r} should hide"
|
assert get_home_status_frame_visibility() is False, f"{val!r} should hide"
|
||||||
|
|
|
||||||
|
|
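The falsey-value test above pins down the env-var contract: 0, false, no, off and the empty string hide the frame, case-insensitively, and an unset variable defaults to visible. A minimal sketch of a parser that satisfies that contract is shown here. The function status_frame_visible is hypothetical, it takes the environment as a parameter for easy testing, and it does not model the yaml override mentioned in the module docstring.

```python
import os

# Values that hide the frame, per the test docstring; compared lowercased.
_FALSEY = {"0", "false", "no", "off", ""}


def status_frame_visible(env=None):
    """Hypothetical stand-in for get_home_status_frame_visibility."""
    env = os.environ if env is None else env
    raw = env.get("AGNES_HOME_SHOW_STATUS_FRAME")
    if raw is None:
        return True  # variable absent: default to showing the frame
    return raw.lower() not in _FALSEY
```

For example, status_frame_visible({"AGNES_HOME_SHOW_STATUS_FRAME": "FALSE"}) returns False, while an empty mapping returns True.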
@ -12,9 +12,7 @@ Coverage:
|
||||||
|
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
import os
|
|
||||||
|
|
||||||
import duckdb
|
|
||||||
import pytest
|
import pytest
|
||||||
|
|
||||||
from connectors.internal.access import (
|
from connectors.internal.access import (
|
||||||
|
|
@ -46,22 +44,24 @@ def system_db(tmp_path, monkeypatch):
|
||||||
|
|
||||||
# Force close any pre-existing handle held by an earlier test.
|
# Force close any pre-existing handle held by an earlier test.
|
||||||
from src.db import close_system_db
|
from src.db import close_system_db
|
||||||
|
|
||||||
close_system_db()
|
close_system_db()
|
||||||
|
|
||||||
from src.db import get_system_db
|
from src.db import get_system_db
|
||||||
|
|
||||||
conn = get_system_db()
|
conn = get_system_db()
|
||||||
_ensure_schema(conn)
|
_ensure_schema(conn)
|
||||||
ensure_internal_tables_registered(conn)
|
ensure_internal_tables_registered(conn)
|
||||||
# Seed a couple of canonical rows for the RBAC checks.
|
# Seed a couple of canonical rows for the RBAC checks.
|
||||||
conn.execute(
|
conn.execute(
|
||||||
"INSERT INTO usage_session_summary "
|
"INSERT INTO usage_session_summary "
|
||||||
"(session_file, session_id, username, tool_calls, tool_errors, processor_version) VALUES "
|
"(session_file, session_id, username, user_id, tool_calls, tool_errors, processor_version) VALUES "
|
||||||
"('alice/s1.jsonl', 's-a-1', 'alice', 10, 1, 1)"
|
"('alice/s1.jsonl', 's-a-1', 'alice', 'alice-uuid', 10, 1, 1)"
|
||||||
)
|
)
|
||||||
conn.execute(
|
conn.execute(
|
||||||
"INSERT INTO usage_session_summary "
|
"INSERT INTO usage_session_summary "
|
||||||
"(session_file, session_id, username, tool_calls, tool_errors, processor_version) VALUES "
|
"(session_file, session_id, username, user_id, tool_calls, tool_errors, processor_version) VALUES "
|
||||||
"('bob/s2.jsonl', 's-b-1', 'bob', 20, 0, 1)"
|
"('bob/s2.jsonl', 's-b-1', 'bob', 'bob-uuid', 20, 0, 1)"
|
||||||
)
|
)
|
||||||
conn.execute(
|
conn.execute(
|
||||||
"INSERT INTO audit_log (id, user_id, action, result) VALUES "
|
"INSERT INTO audit_log (id, user_id, action, result) VALUES "
|
||||||
|
|
@ -79,6 +79,7 @@ def system_db(tmp_path, monkeypatch):
|
||||||
# Static helpers — no DB needed
|
# Static helpers — no DB needed
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
def test_is_internal_table_recognises_canonical_ids():
|
def test_is_internal_table_recognises_canonical_ids():
|
||||||
assert is_internal_table("agnes_sessions")
|
assert is_internal_table("agnes_sessions")
|
||||||
assert is_internal_table("agnes_telemetry")
|
assert is_internal_table("agnes_telemetry")
|
||||||
|
|
@ -108,25 +109,31 @@ def test_filter_clause_admin_is_empty():
|
||||||
assert build_filter_clause(table, {"email": "admin@x", "id": "admin-uuid"}, True) == ""
|
assert build_filter_clause(table, {"email": "admin@x", "id": "admin-uuid"}, True) == ""
|
||||||
|
|
||||||
|
|
||||||
def test_filter_clause_non_admin_scopes_to_username():
|
def test_filter_clause_non_admin_scopes_to_user_id():
|
||||||
|
"""agnes_sessions filters on user_id with OR username fallback for
|
||||||
|
pre-backfill rows."""
|
||||||
table = INTERNAL_TABLES_BY_ID["agnes_sessions"]
|
table = INTERNAL_TABLES_BY_ID["agnes_sessions"]
|
||||||
clause = build_filter_clause(table, {"email": "alice@example.com", "id": "alice-uuid"}, False)
|
clause = build_filter_clause(table, {"email": "alice@example.com", "id": "alice-uuid"}, False)
|
||||||
assert clause == "WHERE username = 'alice'"
|
assert clause == "WHERE (user_id = 'alice-uuid' OR username = 'alice')"
|
||||||
|
|
||||||
|
|
||||||
def test_filter_clause_non_admin_scopes_audit_to_user_id():
|
def test_filter_clause_non_admin_scopes_audit_to_user_id():
|
||||||
table = INTERNAL_TABLES_BY_ID["agnes_audit"]
|
table = INTERNAL_TABLES_BY_ID["agnes_audit"]
|
||||||
clause = build_filter_clause(
|
clause = build_filter_clause(
|
||||||
table, {"email": "alice@example.com", "id": "alice-uuid"}, False,
|
table,
|
||||||
|
{"email": "alice@example.com", "id": "alice-uuid"},
|
||||||
|
False,
|
||||||
)
|
)
|
||||||
assert clause == "WHERE user_id = 'alice-uuid'"
|
assert clause == "WHERE user_id = 'alice-uuid'"
|
||||||
|
|
||||||
|
|
||||||
def test_filter_clause_rejects_unsafe_username():
|
def test_filter_clause_rejects_unsafe_user_id():
|
||||||
table = INTERNAL_TABLES_BY_ID["agnes_sessions"]
|
table = INTERNAL_TABLES_BY_ID["agnes_sessions"]
|
||||||
with pytest.raises(InternalAccessError):
|
with pytest.raises(InternalAccessError):
|
||||||
build_filter_clause(
|
build_filter_clause(
|
||||||
table, {"email": "alice'; DROP TABLE--@example.com", "id": "x"}, False,
|
table,
|
||||||
|
{"email": "alice@example.com", "id": "'; DROP TABLE--"},
|
||||||
|
False,
|
||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
|
@ -134,6 +141,7 @@ def test_filter_clause_rejects_unsafe_username():
|
||||||
# End-to-end RBAC — runs against a real (fresh) system.duckdb
|
# End-to-end RBAC — runs against a real (fresh) system.duckdb
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
def test_admin_sees_every_user_session(system_db, tmp_path):
|
def test_admin_sees_every_user_session(system_db, tmp_path):
|
||||||
db_path = str(_get_state_dir() / "system.duckdb")
|
db_path = str(_get_state_dir() / "system.duckdb")
|
||||||
_, rows, _ = execute_internal_query(
|
_, rows, _ = execute_internal_query(
|
||||||
|
|
@ -156,6 +164,49 @@ def test_non_admin_sees_only_own_sessions(system_db):
|
||||||
assert rows == [("alice", "s-a-1")]
|
assert rows == [("alice", "s-a-1")]
|
||||||
|
|
||||||
|
|
||||||
|
def test_non_admin_sees_sessions_uploaded_via_api(system_db):
|
||||||
|
"""Sessions uploaded via POST /api/upload/sessions are stored under
|
||||||
|
user_id (UUID) by the pipeline. The user_id filter covers both
|
||||||
|
ingestion paths (collector and upload API) because the pipeline
|
||||||
|
resolves the stable user_id for every session at processing time."""
|
||||||
|
from src.db import get_system_db
|
||||||
|
|
||||||
|
conn = get_system_db()
|
||||||
|
conn.execute(
|
||||||
|
"INSERT INTO usage_session_summary "
|
||||||
|
"(session_file, session_id, username, user_id, tool_calls, tool_errors, processor_version) VALUES "
|
||||||
|
"('550e8400-uuid/api.jsonl', 's-api-1', '550e8400-uuid', '550e8400-uuid', 5, 0, 1)"
|
||||||
|
)
|
||||||
|
db_path = str(_get_state_dir() / "system.duckdb")
|
||||||
|
_, rows, _ = execute_internal_query(
|
||||||
|
db_path,
|
||||||
|
{"email": "alice@example.com", "id": "550e8400-uuid"},
|
||||||
|
is_admin=False,
|
||||||
|
sql="SELECT username, session_id FROM agnes_sessions ORDER BY session_id",
|
||||||
|
)
|
||||||
|
# alice sees her upload-API row because its user_id is '550e8400-uuid'.
|
||||||
|
# The fixture's collector row has user_id='alice-uuid', which does not
|
||||||
|
# match; it can only appear via the username fallback, so assert membership.
|
||||||
|
assert ("550e8400-uuid", "s-api-1") in rows
|
||||||
|
# bob's rows must NOT appear
|
||||||
|
assert all(r[0] != "bob" for r in rows)
|
||||||
|
|
||||||
|
|
||||||
|
def test_email_change_does_not_affect_visibility(system_db):
|
||||||
|
"""If alice changes her email from alice@example.com to
|
||||||
|
alice.new@example.com, her sessions should still be visible
|
||||||
|
because the filter uses the stable user_id, not the email."""
|
||||||
|
db_path = str(_get_state_dir() / "system.duckdb")
|
||||||
|
# Query with a different email but the same user_id.
|
||||||
|
_, rows, _ = execute_internal_query(
|
||||||
|
db_path,
|
||||||
|
{"email": "alice.new@example.com", "id": "alice-uuid"},
|
||||||
|
is_admin=False,
|
||||||
|
sql="SELECT username, session_id FROM agnes_sessions",
|
||||||
|
)
|
||||||
|
assert rows == [("alice", "s-a-1")]
|
||||||
|
|
||||||
|
|
||||||
def test_non_admin_sees_only_own_audit_rows(system_db):
|
def test_non_admin_sees_only_own_audit_rows(system_db):
|
||||||
db_path = str(_get_state_dir() / "system.duckdb")
|
db_path = str(_get_state_dir() / "system.duckdb")
|
||||||
_, rows, _ = execute_internal_query(
|
_, rows, _ = execute_internal_query(
|
||||||
|
|
@ -225,6 +276,7 @@ def test_execute_rejects_sql_without_internal_refs():
|
||||||
# request handler layer where `is_user_admin(user)` was mis-called.
|
# request handler layer where `is_user_admin(user)` was mis-called.
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
def _seed_internal_via_api():
|
def _seed_internal_via_api():
|
||||||
"""``seeded_app`` bypasses the FastAPI lifespan, so the registry seed
|
"""``seeded_app`` bypasses the FastAPI lifespan, so the registry seed
|
||||||
that puts agnes_sessions / agnes_telemetry / agnes_audit into
|
that puts agnes_sessions / agnes_telemetry / agnes_audit into
|
||||||
|
|
@ -232,6 +284,7 @@ def _seed_internal_via_api():
|
||||||
routes (`/api/query`, `/api/v2/sample`) can find them."""
|
routes (`/api/query`, `/api/v2/sample`) can find them."""
|
||||||
from src.db import get_system_db
|
from src.db import get_system_db
|
||||||
from connectors.internal.registry import ensure_internal_tables_registered
|
from connectors.internal.registry import ensure_internal_tables_registered
|
||||||
|
|
||||||
ensure_internal_tables_registered(get_system_db())
|
ensure_internal_tables_registered(get_system_db())
|
||||||
|
|
||||||
|
|
||||||
|
|
@ -243,6 +296,7 @@ def test_query_internal_via_api_admin_path(seeded_app, admin_user):
|
||||||
"""
|
"""
|
||||||
_seed_internal_via_api()
|
_seed_internal_via_api()
|
||||||
from src.db import get_system_db
|
from src.db import get_system_db
|
||||||
|
|
||||||
conn = get_system_db()
|
conn = get_system_db()
|
||||||
conn.execute(
|
conn.execute(
|
||||||
"INSERT INTO usage_session_summary "
|
"INSERT INTO usage_session_summary "
|
||||||
|
|
@ -287,6 +341,7 @@ def test_sample_internal_via_api_admin_path(seeded_app, admin_user):
|
||||||
# round-2 review (PR #278 R2 finding #1).
|
# round-2 review (PR #278 R2 finding #1).
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
def test_string_literal_alone_does_not_route_internal(system_db):
|
def test_string_literal_alone_does_not_route_internal(system_db):
|
||||||
"""A SQL with `agnes_sessions` only inside a string literal should not
|
"""A SQL with `agnes_sessions` only inside a string literal should not
|
||||||
be treated as referencing the internal table — find_internal_refs
|
be treated as referencing the internal table — find_internal_refs
|
||||||
|
|
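Reviewer note: the `agnes_sessions` behaviour that the filter-clause tests above pin down can be summarised in a small standalone sketch. This is not the project's `build_filter_clause`. The function name `sketch_filter_clause`, the `_SAFE_VALUE` allow-list, and the locally defined `InternalAccessError` are assumptions made for illustration, and the sketch covers only the `agnes_sessions` case (admin gets no filter, non-admin gets `user_id` with a `username` fallback, unsafe values are rejected).

```python
import re


class InternalAccessError(Exception):
    """Stand-in for the project's exception; raised for unsafe values."""


# Hypothetical allow-list: letters, digits, dot, underscore, hyphen.
_SAFE_VALUE = re.compile(r"[A-Za-z0-9._-]+")


def _safe(value: str) -> str:
    """Return value unchanged if it is safe to embed in a SQL literal."""
    if not _SAFE_VALUE.fullmatch(value):
        raise InternalAccessError(f"unsafe value: {value!r}")
    return value


def sketch_filter_clause(user: dict, is_admin: bool) -> str:
    """Build the row filter for a sessions-like table (illustrative only)."""
    if is_admin:
        return ""  # admins see every row
    user_id = _safe(user["id"])
    local_part = _safe(user["email"].split("@", 1)[0])
    # user_id is the stable key; username covers rows not yet backfilled.
    return f"WHERE (user_id = '{user_id}' OR username = '{local_part}')"
```

Validating before interpolation matters here because the clause is assembled as a string; the tests above exercise the same idea by passing `'; DROP TABLE--` as the `id`.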
|
||||||
|
|
@ -1,4 +1,5 @@
|
||||||
"""v41 → v42 migration: 7 new usage_* tables for telemetry."""
|
"""v41 → v42 migration: 7 new usage_* tables for telemetry."""
|
||||||
|
|
||||||
import duckdb
|
import duckdb
|
||||||
import pytest
|
import pytest
|
||||||
from src.db import _ensure_schema as init_database, SCHEMA_VERSION
|
from src.db import _ensure_schema as init_database, SCHEMA_VERSION
|
||||||
|
|
@ -10,18 +11,25 @@ def test_schema_version_is_42():
|
||||||
# constant. Test name preserved for git-blame continuity; the
|
# constant. Test name preserved for git-blame continuity; the
|
||||||
# version-pinned tests in test_db_schema_version.py and
|
# version-pinned tests in test_db_schema_version.py and
|
||||||
# test_home_stats.py carry the v44 commentary.
|
# test_home_stats.py carry the v44 commentary.
|
||||||
assert SCHEMA_VERSION == 44
|
assert SCHEMA_VERSION == 45
|
||||||
|
|
||||||
|
|
||||||
def test_v42_tables_exist_after_init(tmp_path):
|
def test_v42_tables_exist_after_init(tmp_path):
|
||||||
db_path = tmp_path / "test.duckdb"
|
db_path = tmp_path / "test.duckdb"
|
||||||
conn = duckdb.connect(str(db_path))
|
conn = duckdb.connect(str(db_path))
|
||||||
init_database(conn)
|
init_database(conn)
|
||||||
tables = {row[0] for row in conn.execute("SELECT table_name FROM information_schema.tables WHERE table_schema='main'").fetchall()}
|
tables = {
|
||||||
|
row[0]
|
||||||
|
for row in conn.execute("SELECT table_name FROM information_schema.tables WHERE table_schema='main'").fetchall()
|
||||||
|
}
|
||||||
for tbl in [
|
for tbl in [
|
||||||
"usage_events", "usage_session_summary",
|
"usage_events",
|
||||||
"usage_tool_daily", "usage_plugin_daily",
|
"usage_session_summary",
|
||||||
"usage_attribution_skills", "usage_attribution_agents", "usage_attribution_commands",
|
"usage_tool_daily",
|
||||||
|
"usage_plugin_daily",
|
||||||
|
"usage_attribution_skills",
|
||||||
|
"usage_attribution_agents",
|
||||||
|
"usage_attribution_commands",
|
||||||
]:
|
]:
|
||||||
assert tbl in tables, f"missing table {tbl}"
|
assert tbl in tables, f"missing table {tbl}"
|
||||||
conn.close()
|
conn.close()
|
||||||
|
|
@ -31,12 +39,21 @@ def test_v42_indices_exist(tmp_path):
|
||||||
db_path = tmp_path / "test.duckdb"
|
db_path = tmp_path / "test.duckdb"
|
||||||
conn = duckdb.connect(str(db_path))
|
conn = duckdb.connect(str(db_path))
|
||||||
init_database(conn)
|
init_database(conn)
|
||||||
idx_names = {row[0] for row in conn.execute("SELECT index_name FROM duckdb_indexes WHERE table_name LIKE 'usage_%'").fetchall()}
|
idx_names = {
|
||||||
|
row[0]
|
||||||
|
for row in conn.execute("SELECT index_name FROM duckdb_indexes WHERE table_name LIKE 'usage_%'").fetchall()
|
||||||
|
}
|
||||||
for idx in [
|
for idx in [
|
||||||
"idx_usage_events_session", "idx_usage_events_user_time", "idx_usage_events_tool",
|
"idx_usage_events_session",
|
||||||
"idx_usage_events_skill", "idx_usage_events_ref",
|
"idx_usage_events_user_time",
|
||||||
"idx_usage_session_user", "idx_usage_session_started",
|
"idx_usage_events_tool",
|
||||||
"idx_usage_attr_skill_lookup", "idx_usage_attr_agent_lookup", "idx_usage_attr_command_lookup",
|
"idx_usage_events_skill",
|
||||||
|
"idx_usage_events_ref",
|
||||||
|
"idx_usage_session_user",
|
||||||
|
"idx_usage_session_started",
|
||||||
|
"idx_usage_attr_skill_lookup",
|
||||||
|
"idx_usage_attr_agent_lookup",
|
||||||
|
"idx_usage_attr_command_lookup",
|
||||||
]:
|
]:
|
||||||
assert idx in idx_names, f"missing index {idx}"
|
assert idx in idx_names, f"missing index {idx}"
|
||||||
conn.close()
|
conn.close()
|
||||||
|
|
@ -51,7 +68,7 @@ def test_v41_to_v42_is_idempotent(tmp_path):
|
||||||
conn = duckdb.connect(str(db_path))
|
conn = duckdb.connect(str(db_path))
|
||||||
init_database(conn)
|
init_database(conn)
|
||||||
v = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0]
|
v = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0]
|
||||||
assert v == 44
|
assert v == 45
|
||||||
conn.close()
|
conn.close()
|
||||||
|
|
||||||
|
|
||||||
|
|
@ -72,15 +89,20 @@ def test_v41_db_upgrades_cleanly(tmp_path):
|
||||||
conn = duckdb.connect(str(db_path))
|
conn = duckdb.connect(str(db_path))
|
||||||
init_database(conn)
|
init_database(conn)
|
||||||
v = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0]
|
v = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0]
|
||||||
assert v == 44
|
assert v == 45
|
||||||
# All 7 new v41 tables exist after the v40→v41 upgrade
|
# All 7 new v41 tables exist after the v40→v41 upgrade
|
||||||
tables = {row[0] for row in conn.execute(
|
tables = {
|
||||||
"SELECT table_name FROM information_schema.tables WHERE table_schema='main'"
|
row[0]
|
||||||
).fetchall()}
|
for row in conn.execute("SELECT table_name FROM information_schema.tables WHERE table_schema='main'").fetchall()
|
||||||
|
}
|
||||||
for tbl in [
|
for tbl in [
|
||||||
"usage_events", "usage_session_summary",
|
"usage_events",
|
||||||
"usage_tool_daily", "usage_plugin_daily",
|
"usage_session_summary",
|
||||||
"usage_attribution_skills", "usage_attribution_agents", "usage_attribution_commands",
|
"usage_tool_daily",
|
||||||
|
"usage_plugin_daily",
|
||||||
|
"usage_attribution_skills",
|
||||||
|
"usage_attribution_agents",
|
||||||
|
"usage_attribution_commands",
|
||||||
]:
|
]:
|
||||||
assert tbl in tables, f"missing table {tbl} after v40→v41 upgrade"
|
assert tbl in tables, f"missing table {tbl} after v40→v41 upgrade"
|
||||||
conn.close()
|
conn.close()
|
||||||
|
|
@ -99,7 +121,7 @@ def test_v30_db_ladders_all_the_way_up(tmp_path):
|
||||||
conn = duckdb.connect(str(db_path))
|
conn = duckdb.connect(str(db_path))
|
||||||
init_database(conn)
|
init_database(conn)
|
||||||
v = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0]
|
v = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0]
|
||||||
assert v == 44
|
assert v == 45
|
||||||
cnt = conn.execute("SELECT COUNT(*) FROM audit_log WHERE id='vintage'").fetchone()[0]
|
cnt = conn.execute("SELECT COUNT(*) FROM audit_log WHERE id='vintage'").fetchone()[0]
|
||||||
assert cnt == 1
|
assert cnt == 1
|
||||||
# New v41 table exists
|
# New v41 table exists
|
||||||
|
|
|
||||||
|
|
@ -17,11 +17,10 @@ import json
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
import duckdb
|
import duckdb
|
||||||
import pytest
|
|
||||||
|
|
||||||
from services.session_pipeline.contract import ProcessorResult
|
from services.session_pipeline.contract import ProcessorResult
|
||||||
from services.session_pipeline.lib import compute_file_hash, parse_jsonl
|
from services.session_pipeline.lib import compute_file_hash, parse_jsonl
|
||||||
from services.session_pipeline.runner import run_processor
|
from services.session_pipeline.runner import resolve_user_id, run_processor
|
||||||
from src.repositories.session_processor_state import SessionProcessorStateRepository
|
from src.repositories.session_processor_state import SessionProcessorStateRepository
|
||||||
|
|
||||||
|
|
||||||
|
|
@ -29,6 +28,7 @@ def _fresh_db(tmp_path, monkeypatch) -> duckdb.DuckDBPyConnection:
|
||||||
"""Same idiom as tests/test_corporate_memory_v1.py — fresh schema in tmp_path."""
|
"""Same idiom as tests/test_corporate_memory_v1.py — fresh schema in tmp_path."""
|
||||||
monkeypatch.setenv("DATA_DIR", str(tmp_path))
|
monkeypatch.setenv("DATA_DIR", str(tmp_path))
|
||||||
import src.db as db_module
|
import src.db as db_module
|
||||||
|
|
||||||
db_module._system_db_conn = None
|
db_module._system_db_conn = None
|
||||||
db_module._system_db_path = None
|
db_module._system_db_path = None
|
||||||
return db_module.get_system_db()
|
return db_module.get_system_db()
|
||||||
|
|
@ -38,12 +38,15 @@ def _fresh_db(tmp_path, monkeypatch) -> duckdb.DuckDBPyConnection:
|
||||||
# parse_jsonl
|
# parse_jsonl
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
class TestParseJsonl:
|
class TestParseJsonl:
|
||||||
def test_parses_well_formed_lines(self, tmp_path):
|
def test_parses_well_formed_lines(self, tmp_path):
|
||||||
f = tmp_path / "session.jsonl"
|
f = tmp_path / "session.jsonl"
|
||||||
f.write_text(
|
f.write_text(
|
||||||
json.dumps({"role": "user", "content": "hi"}) + "\n"
|
json.dumps({"role": "user", "content": "hi"})
|
||||||
+ json.dumps({"role": "assistant", "content": "hello"}) + "\n"
|
+ "\n"
|
||||||
|
+ json.dumps({"role": "assistant", "content": "hello"})
|
||||||
|
+ "\n"
|
||||||
)
|
)
|
||||||
turns = parse_jsonl(f)
|
turns = parse_jsonl(f)
|
||||||
assert len(turns) == 2
|
assert len(turns) == 2
|
||||||
|
|
@ -55,9 +58,11 @@ class TestParseJsonl:
|
||||||
a single corrupt row mustn't abort processing of the rest."""
|
a single corrupt row mustn't abort processing of the rest."""
|
||||||
f = tmp_path / "session.jsonl"
|
f = tmp_path / "session.jsonl"
|
||||||
f.write_text(
|
f.write_text(
|
||||||
json.dumps({"role": "user", "content": "ok"}) + "\n"
|
json.dumps({"role": "user", "content": "ok"})
|
||||||
|
+ "\n"
|
||||||
+ "this is not json\n"
|
+ "this is not json\n"
|
||||||
+ json.dumps({"role": "assistant", "content": "still ok"}) + "\n"
|
+ json.dumps({"role": "assistant", "content": "still ok"})
|
||||||
|
+ "\n"
|
||||||
)
|
)
|
||||||
turns = parse_jsonl(f)
|
turns = parse_jsonl(f)
|
||||||
assert len(turns) == 2
|
assert len(turns) == 2
|
||||||
|
|
@ -66,11 +71,7 @@ class TestParseJsonl:
|
||||||
|
|
||||||
def test_skips_blank_lines(self, tmp_path):
|
def test_skips_blank_lines(self, tmp_path):
|
||||||
f = tmp_path / "session.jsonl"
|
f = tmp_path / "session.jsonl"
|
||||||
f.write_text(
|
f.write_text("\n" + json.dumps({"role": "user", "content": "x"}) + "\n" + " \n")
|
||||||
"\n"
|
|
||||||
+ json.dumps({"role": "user", "content": "x"}) + "\n"
|
|
||||||
+ " \n"
|
|
||||||
)
|
|
||||||
turns = parse_jsonl(f)
|
turns = parse_jsonl(f)
|
||||||
assert len(turns) == 1
|
assert len(turns) == 1
|
||||||
|
|
||||||
|
|
@ -79,6 +80,7 @@ class TestParseJsonl:
|
||||||
# compute_file_hash
|
# compute_file_hash
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
class TestComputeFileHash:
|
class TestComputeFileHash:
|
||||||
def test_deterministic(self, tmp_path):
|
def test_deterministic(self, tmp_path):
|
||||||
f = tmp_path / "x.jsonl"
|
f = tmp_path / "x.jsonl"
|
||||||
|
|
@ -94,10 +96,77 @@ class TestComputeFileHash:
|
||||||
assert h1 != h2
|
assert h1 != h2
|
||||||
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# resolve_user_id
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
class TestResolveUserId:
|
||||||
|
"""Unit tests for the linchpin identity-resolution function."""
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _seed_users(conn: duckdb.DuckDBPyConnection, rows: list[tuple[str, str, str | None]]) -> None:
|
||||||
|
for uid, email, updated_at in rows:
|
||||||
|
conn.execute(
|
||||||
|
"INSERT INTO users (id, email, updated_at) VALUES (?, ?, ?)",
|
||||||
|
[uid, email, updated_at],
|
||||||
|
)
|
||||||
|
|
||||||
|
def test_exact_uuid_match(self, tmp_path, monkeypatch):
|
||||||
|
conn = _fresh_db(tmp_path, monkeypatch)
|
||||||
|
self._seed_users(conn, [("uuid-aaa", "alice@example.com", "2026-01-01")])
|
||||||
|
assert resolve_user_id(conn, "uuid-aaa") == "uuid-aaa"
|
||||||
|
conn.close()
|
||||||
|
|
||||||
|
def test_email_local_part_match(self, tmp_path, monkeypatch):
|
||||||
|
conn = _fresh_db(tmp_path, monkeypatch)
|
||||||
|
self._seed_users(conn, [("uuid-bbb", "bob@example.com", "2026-01-01")])
|
||||||
|
assert resolve_user_id(conn, "bob") == "uuid-bbb"
|
||||||
|
conn.close()
|
||||||
|
|
||||||
|
def test_null_fallback_for_unknown(self, tmp_path, monkeypatch):
|
||||||
|
conn = _fresh_db(tmp_path, monkeypatch)
|
||||||
|
self._seed_users(conn, [("uuid-aaa", "alice@example.com", "2026-01-01")])
|
||||||
|
assert resolve_user_id(conn, "nobody") is None
|
||||||
|
conn.close()
|
||||||
|
|
||||||
|
def test_tiebreak_picks_most_recently_updated(self, tmp_path, monkeypatch):
|
||||||
|
conn = _fresh_db(tmp_path, monkeypatch)
|
||||||
|
self._seed_users(
|
||||||
|
conn,
|
||||||
|
[
|
||||||
|
("uuid-old", "zara@old.com", "2025-01-01"),
|
||||||
|
("uuid-new", "zara@new.com", "2026-06-01"),
|
||||||
|
],
|
||||||
|
)
|
||||||
|
assert resolve_user_id(conn, "zara") == "uuid-new"
|
||||||
|
conn.close()
|
||||||
|
|
||||||
|
def test_underscore_not_treated_as_wildcard(self, tmp_path, monkeypatch):
|
||||||
|
conn = _fresh_db(tmp_path, monkeypatch)
|
||||||
|
self._seed_users(
|
||||||
|
conn,
|
||||||
|
[
|
||||||
|
("uuid-alice", "alicexsmith@example.com", "2026-01-01"),
|
||||||
|
("uuid-real", "alice_smith@example.com", "2025-01-01"),
|
||||||
|
],
|
||||||
|
)
|
||||||
|
# "alice_smith" must match only the literal underscore email
|
||||||
|
assert resolve_user_id(conn, "alice_smith") == "uuid-real"
|
||||||
|
conn.close()
|
||||||
|
|
||||||
|
def test_uuid_branch_takes_priority_over_email(self, tmp_path, monkeypatch):
|
||||||
|
conn = _fresh_db(tmp_path, monkeypatch)
|
||||||
|
self._seed_users(conn, [("uuid-aaa", "uuid-aaa@example.com", "2026-01-01")])
|
||||||
|
assert resolve_user_id(conn, "uuid-aaa") == "uuid-aaa"
|
||||||
|
conn.close()
|
||||||
|
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
# SessionProcessorStateRepository
|
# SessionProcessorStateRepository
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
class TestSessionProcessorStateRepository:
|
class TestSessionProcessorStateRepository:
|
||||||
def test_unprocessed_when_empty(self, tmp_path, monkeypatch):
|
def test_unprocessed_when_empty(self, tmp_path, monkeypatch):
|
||||||
conn = _fresh_db(tmp_path, monkeypatch)
|
conn = _fresh_db(tmp_path, monkeypatch)
|
||||||
|
|
@ -175,7 +244,6 @@ class TestSessionProcessorStateRepository:
|
||||||
every scheduler tick."""
|
every scheduler tick."""
|
||||||
import os
|
import os
|
||||||
import time
|
import time
|
||||||
from datetime import datetime, timezone
|
|
||||||
|
|
||||||
conn = _fresh_db(tmp_path, monkeypatch)
|
conn = _fresh_db(tmp_path, monkeypatch)
|
||||||
sessions = tmp_path / "sessions"
|
sessions = tmp_path / "sessions"
|
||||||
|
|
@ -233,6 +301,7 @@ class TestSessionProcessorStateRepository:
|
||||||
# run_processor
|
# run_processor
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
class _FakeProcessor:
|
class _FakeProcessor:
|
||||||
"""Test double that records its calls and is configurable per behavior."""
|
"""Test double that records its calls and is configurable per behavior."""
|
||||||
|
|
||||||
|
|
@ -249,7 +318,7 @@ class _FakeProcessor:
|
||||||
self.raise_on_session = raise_on_session
|
self.raise_on_session = raise_on_session
|
||||||
self.calls: list[str] = []
|
self.calls: list[str] = []
|
||||||
|
|
||||||
def process_session(self, session_path: Path, username: str, session_key: str, conn):
|
def process_session(self, session_path: Path, username: str, session_key: str, conn, **kwargs: object):
|
||||||
self.calls.append(session_key)
|
self.calls.append(session_key)
|
||||||
if self.raise_on_session is not None and session_key == self.raise_on_session:
|
if self.raise_on_session is not None and session_key == self.raise_on_session:
|
||||||
raise RuntimeError("simulated processor failure")
|
raise RuntimeError("simulated processor failure")
|
||||||
|
|
@ -385,6 +454,7 @@ class TestRunProcessor:
|
||||||
class _BadReturn:
|
class _BadReturn:
|
||||||
name = "bad"
|
name = "bad"
|
||||||
cadence_minutes = 1
|
cadence_minutes = 1
|
||||||
|
|
||||||
def process_session(self, *a, **kw):
|
def process_session(self, *a, **kw):
|
||||||
return None # type: ignore[return-value]
|
return None # type: ignore[return-value]
|
||||||
|
|
||||||
|
|
@ -398,6 +468,7 @@ class TestRunProcessor:
|
||||||
# v29 migration — verification rows preserved, old table dropped
|
# v29 migration — verification rows preserved, old table dropped
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
class TestV29Migration:
|
class TestV29Migration:
|
||||||
"""Exercise the v28 → v29 migration directly. Builds a v28 schema (using
|
"""Exercise the v28 → v29 migration directly. Builds a v28 schema (using
|
||||||
the pre-v29 idiom inline so the test doesn't depend on _SYSTEM_SCHEMA's
|
the pre-v29 idiom inline so the test doesn't depend on _SYSTEM_SCHEMA's
|
||||||
|
|
@ -426,6 +497,7 @@ class TestV29Migration:
|
||||||
# Run v29 migration steps via the helper (which conditionally copies
|
# Run v29 migration steps via the helper (which conditionally copies
|
||||||
# from the legacy table when present).
|
# from the legacy table when present).
|
||||||
from src.db import _v30_to_v31_migrate
|
from src.db import _v30_to_v31_migrate
|
||||||
|
|
||||||
_v30_to_v31_migrate(conn)
|
_v30_to_v31_migrate(conn)
|
||||||
|
|
||||||
# New table has the row tagged with processor_name='verification'.
|
# New table has the row tagged with processor_name='verification'.
|
||||||
|
|
@ -437,7 +509,8 @@ class TestV29Migration:
|
||||||
|
|
||||||
# Old table is gone.
|
# Old table is gone.
|
||||||
existing = {
|
existing = {
|
||||||
r[0] for r in conn.execute(
|
r[0]
|
||||||
|
for r in conn.execute(
|
||||||
"SELECT table_name FROM information_schema.tables WHERE table_schema='main'"
|
"SELECT table_name FROM information_schema.tables WHERE table_schema='main'"
|
||||||
).fetchall()
|
).fetchall()
|
||||||
}
|
}
|
||||||
|
|
@ -477,10 +550,12 @@ class TestV29Migration:
|
||||||
)
|
)
|
||||||
|
|
||||||
from src.db import _v30_to_v31_migrate
|
from src.db import _v30_to_v31_migrate
|
||||||
|
|
||||||
_v30_to_v31_migrate(conn)
|
_v30_to_v31_migrate(conn)
|
||||||
|
|
||||||
existing = {
|
existing = {
|
||||||
r[0] for r in conn.execute(
|
r[0]
|
||||||
|
for r in conn.execute(
|
||||||
"SELECT table_name FROM information_schema.tables WHERE table_schema='main'"
|
"SELECT table_name FROM information_schema.tables WHERE table_schema='main'"
|
||||||
).fetchall()
|
).fetchall()
|
||||||
}
|
}
|
||||||
|
|
|
||||||
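Reviewer note: the resolution rules exercised by `TestResolveUserId` (exact `users.id` match first, then email local-part match with literal underscores, newest `updated_at` winning ties, `None` for unknown names) can be sketched as follows. This is a minimal illustration, not the code in `runner.py`: it uses the standard-library `sqlite3` module as a stand-in for DuckDB, and the names `sketch_resolve_user_id` and `_escape_like` are hypothetical.

```python
import sqlite3
from typing import Optional


def _escape_like(value: str) -> str:
    """Escape LIKE metacharacters so '%' and '_' match literally."""
    return value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def sketch_resolve_user_id(conn: sqlite3.Connection, dir_name: str) -> Optional[str]:
    """Map a session directory name to a users.id value, or None."""
    # 1. Upload-API directories are named after users.id, so try that first.
    row = conn.execute("SELECT id FROM users WHERE id = ?", (dir_name,)).fetchone()
    if row is not None:
        return row[0]
    # 2. Collector directories are named after the email local-part.
    #    Escape the name so "alice_smith" does not also match "alicexsmith".
    pattern = _escape_like(dir_name) + "@%"
    row = conn.execute(
        "SELECT id FROM users WHERE email LIKE ? ESCAPE '\\' "
        "ORDER BY updated_at DESC LIMIT 1",
        (pattern,),
    ).fetchone()
    return row[0] if row is not None else None
```

Checking the exact id before the email pattern mirrors `test_uuid_branch_takes_priority_over_email`; the `ORDER BY updated_at DESC LIMIT 1` step is one way to get the tie-break that `test_tiebreak_picks_most_recently_updated` expects.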