agnes-the-ai-analyst/services/session_processors/verification.py
minasarustamyan e26236fdc1
Extract session-pipeline framework + UsageProcessor skeleton (#232)
* Extract session pipeline framework, refactor verification, add UsageProcessor skeleton

Pluggable framework under services/session_pipeline/ (contract + lib + per-processor
runner) so multiple processors can read /data/user_sessions/<key>/*.jsonl on their
own cadence with full failure isolation. Verification flow becomes the first plugin;
a no-op UsageProcessor reserves the second slot pending a separate brainstorm on
extraction logic + storage shape.

Schema v28→v29: rename session_extraction_state → session_processor_state with
composite PK (processor_name, session_file). Existing rows copied over with
processor_name='verification'; legacy table dropped. Migration is idempotent and
no-ops the copy step on fresh installs that came up at the new schema.
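
The migration described above can be sketched as follows. This is a minimal illustration only: it uses the stdlib `sqlite3` module as a stand-in for DuckDB, the function name `migrate_v28_to_v29` is hypothetical, and the non-key columns (`file_hash`, `processed_at`) are assumed from names mentioned elsewhere in this message rather than taken from the real schema.

```python
import sqlite3


def migrate_v28_to_v29(conn: sqlite3.Connection) -> None:
    """Idempotent rename of session_extraction_state -> session_processor_state."""
    # Safe to repeat: creates the new table only if it is missing.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS session_processor_state (
            processor_name TEXT NOT NULL,
            session_file   TEXT NOT NULL,
            file_hash      TEXT,       -- assumed column
            processed_at   TIMESTAMP,  -- assumed column
            PRIMARY KEY (processor_name, session_file)
        )
        """
    )
    legacy = conn.execute(
        "SELECT 1 FROM sqlite_master "
        "WHERE type = 'table' AND name = 'session_extraction_state'"
    ).fetchone()
    if legacy is None:
        # Fresh install (or already migrated): nothing to copy.
        return
    # Existing rows belong to the only processor that existed before.
    conn.execute(
        """
        INSERT OR IGNORE INTO session_processor_state
            (processor_name, session_file, file_hash, processed_at)
        SELECT 'verification', session_file, file_hash, processed_at
        FROM session_extraction_state
        """
    )
    conn.execute("DROP TABLE session_extraction_state")
    conn.commit()
```

Running the function a second time finds no legacy table and returns early, which is the no-op behavior the message describes for fresh installs.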

Endpoint: /api/admin/run-verification-detector replaced by parametrized
/api/admin/run-session-processor?processor=<name>. Audit action format follows.
Scheduler JOBS: verification-detector entry split into session-processor:verification
+ session-processor:usage. SCHEDULER_VERIFICATION_DETECTOR_INTERVAL retained for
operator compatibility (drives both cadence and health-check grace window);
SCHEDULER_USAGE_PROCESSOR_INTERVAL added.

* Address PR #232 review: scan dead branch + per-processor lock

- `SessionProcessorStateRepository.scan_unprocessed_for` dead else: both
  branches surfaced every jsonl, the SELECT was unused, runner MD5-rehashed
  every stable session per tick. Replaced with an mtime precheck — stable
  sessions (mtime <= processed_at) are filtered at scan; modified files
  still surface for the runner's authoritative `file_hash` invalidation.
  Naive-local comparison matches the existing health-check idiom (DuckDB
  TIMESTAMP strips tz on storage).

- Per-processor advisory lock around `_run_processor` in
  `/api/admin/run-session-processor`. Scheduler tick + manual admin POST
  could otherwise both run, both call create_evidence on overlapping
  detections, and accumulate duplicate verification_evidence rows (the
  dedup short-circuit only covers create+contradiction, not evidence per
  ADR Decision 3). Non-blocking acquire → 409 Conflict on concurrent
  invocation; release in finally so a runner exception doesn't wedge the
  processor.

Tests: two new scan unit tests (mtime filter + post-mark mtime bump), 409
endpoint test, lock-released-on-exception test. Two existing tests updated
for the new "filtered at scan" stat shape (previously asserted skipped == 1,
now scanned == 0).
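
The mtime precheck exercised by the new scan tests can be sketched roughly as below. This is an illustration, not the repository method: `scan_unprocessed` and the `processed_at` mapping (file name to naive-local timestamp) are hypothetical stand-ins for `scan_unprocessed_for` and its state table.

```python
from datetime import datetime
from pathlib import Path


def scan_unprocessed(session_dir: Path, processed_at: dict[str, datetime]) -> list[Path]:
    """Return jsonl files that are new or modified since they were processed."""
    pending = []
    for path in sorted(session_dir.glob("*.jsonl")):
        seen = processed_at.get(path.name)
        # Naive local time, to match a timestamp stored without tz info.
        mtime = datetime.fromtimestamp(path.stat().st_mtime)
        if seen is not None and mtime <= seen:
            # Stable session: filtered at scan, never reaches the hasher.
            continue
        pending.append(path)
    return pending
```

Files that pass the filter would still go through the runner's `file_hash` comparison, so the precheck only avoids rehashing sessions that cannot have changed.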

* Address PR #232 review #2: parallel scheduler tick + last_run on terminal state

Two pre-existing scaffold bugs in services/scheduler/__main__.py amplified
by adding more session-pipeline jobs:

1. Serial for-loop over jobs with synchronous httpx.post(timeout=900) — a
   10-minute verification run blocked every other job (data-refresh,
   health-check, usage, corporate-memory) for the whole window. The PR's
   stated isolation guarantee held inside the runner but broke at the
   scheduler dispatch layer.

2. last_run advanced only when _call_api returned True. Permanent-failure
   jobs hot-looped on every tick (30s) instead of cadence (15min).

Fix: ThreadPoolExecutor.submit per due job + per-job in_flight set so a
long-running job can't be re-launched on subsequent ticks. last_run
advances unconditionally in finally; errors still surface via _call_api
logging + audit_log on the receiving side.
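
The fix above can be sketched as follows. The names `run_job` and `tick`, the `jobs` mapping shape, and the shared `guard` lock are assumptions made for illustration; the real `_run_job` and run loop in services/scheduler/__main__.py may be structured differently.

```python
import logging
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

logger = logging.getLogger(__name__)


def run_job(name: str, call: Callable[[], object], last_run: dict,
            in_flight: set, guard: threading.Lock) -> None:
    """Run one job; bookkeeping happens on success, failure, or raise."""
    try:
        call()
    except Exception:
        logger.exception("job %s failed", name)
    finally:
        with guard:
            # Advance unconditionally so a failing job waits for its cadence
            # instead of retrying on every tick.
            last_run[name] = time.monotonic()
            in_flight.discard(name)


def tick(jobs: dict, last_run: dict, in_flight: set,
         guard: threading.Lock, pool: ThreadPoolExecutor) -> list[str]:
    """Submit every due job that is not already running; return their names."""
    now = time.monotonic()
    launched = []
    for name, (interval_s, call) in jobs.items():
        with guard:
            due = now - last_run.get(name, float("-inf")) >= interval_s
            if not due or name in in_flight:
                continue
            in_flight.add(name)
        pool.submit(run_job, name, call, last_run, in_flight, guard)
        launched.append(name)
    return launched
```

Marking a job as in flight before submitting it is what keeps a slow job from being launched again on the next tick, while the pool lets the other jobs proceed in parallel.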

_run_job extracted to module-level for unit testing. New tests:
- TestRunJobBookkeeping: advances on success / failure / unhandled raise
- TestRunLoopParallelism: in_flight protection prevents duplicate
  launches across ticks for a single slow job

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
2026-05-08 19:47:46 +02:00


"""VerificationProcessor — first plugin of the session-pipeline framework.
Wraps the body of the pre-refactor `verification_detector.detector.run()`
inner loop so the LLM extraction + persist behavior is unchanged after the
framework refactor. Tests in `tests/test_corporate_memory_v1.py` are the
regression contract.
"""

from __future__ import annotations

import logging
from pathlib import Path

import duckdb

from connectors.llm import StructuredExtractor
from connectors.llm.exceptions import LLMError
from services.corporate_memory import contradiction as contradiction_module
from services.corporate_memory.confidence import compute_confidence
from services.session_pipeline.contract import ProcessorResult
from services.session_pipeline.lib import parse_jsonl
from services.verification_detector.duplicates import _record_duplicate_candidates
from services.verification_detector.detector import (
    _generate_id,
    extract_verifications,
)
from src.repositories.knowledge import KnowledgeRepository

logger = logging.getLogger(__name__)


class VerificationProcessor:
    name: str = "verification"
    cadence_minutes: int = 15

    def __init__(self, extractor: StructuredExtractor):
        self.extractor = extractor

    def process_session(
        self,
        session_path: Path,
        username: str,
        session_key: str,
        conn: duckdb.DuckDBPyConnection,
    ) -> ProcessorResult:
        repo = KnowledgeRepository(conn)
        session_id = f"session-{session_path.stem}-{username}"
        turns = parse_jsonl(session_path)
        if not turns:
            logger.info("Empty session: %s", session_key)
            return ProcessorResult(items_count=0)

        verifications = extract_verifications(self.extractor, username, session_id, turns)
        items_created = 0
        for v in verifications:
            item_id = _generate_id(v["title"], v["content"])
            existing = repo.get_by_id(item_id)
            if existing:
                # Hash collision on (title, content) → another analyst
                # produced the same fact. ADR Decision 3 expects multiple
                # evidence rows to accumulate (one per distinct
                # verification event), so we still persist the new
                # evidence row even though we skip the create+contradiction
                # path. Without this, the second analyst's user_quote and
                # detection_type are silently dropped and the
                # "additional verifiers" boost cannot accumulate.
                logger.info(
                    "Duplicate item — recording evidence on existing: %s",
                    item_id,
                )
                repo.create_evidence(
                    item_id=item_id,
                    source_user=username,
                    source_ref=session_id,
                    detection_type=v.get("detection_type"),
                    user_quote=v.get("user_quote"),
                )
                continue

            # Confidence is computed in code from (source_type, detection_type).
            # The LLM is not trusted to set its own credibility; see Q3 in
            # docs/pd-ps-comments.md and the ADR.
            detection_type = v.get("detection_type")
            try:
                confidence_value = compute_confidence("user_verification", detection_type)
            except ValueError:
                # Unknown detection_type from the LLM; fall back to a
                # lookup-keyed default rather than the LLM-supplied value.
                confidence_value = compute_confidence("user_verification", "confirmation")

            repo.create(
                id=item_id,
                title=v["title"],
                content=v["content"],
                category="business_logic",
                source_user=username,
                tags=v.get("entities", []),
                status="pending",
                confidence=confidence_value,
                domain=v.get("domain"),
                entities=v.get("entities"),
                source_type="user_verification",
                source_ref=session_id,
                sensitivity="internal",
            )
            # Persist the verification evidence row: user_quote and
            # detection_type are the raw signal Bayesian re-calibration
            # will need later (Q3).
            repo.create_evidence(
                item_id=item_id,
                source_user=username,
                source_ref=session_id,
                detection_type=detection_type,
                user_quote=v.get("user_quote"),
            )
            items_created += 1

            # Record duplicate-candidate hints inline. Heuristic-only (no
            # LLM call) so it stays cheap; failures must never abort
            # session processing, so log and continue. Issue #62.
            try:
                new_item = repo.get_by_id(item_id)
                if new_item is not None:
                    _record_duplicate_candidates(repo, new_item)
            except Exception as e:
                logger.warning(
                    "Duplicate-candidate detection failed for %s: %s",
                    item_id, e,
                )

            # Run contradiction detection inline. Failure of the LLM
            # judge must not abort session processing, so log and move on.
            try:
                new_item = repo.get_by_id(item_id)
                if new_item is not None:
                    contradiction_module.detect_and_record(self.extractor, new_item, repo)
            except LLMError as e:
                logger.warning("Contradiction check failed for %s: %s", item_id, e)
            except Exception as e:
                logger.warning(
                    "Unexpected error during contradiction check for %s: %s",
                    item_id, e,
                )

        logger.info(
            "Processed %s: %d verifications, %d items created",
            session_key, len(verifications), items_created,
        )
        return ProcessorResult(items_count=items_created)


def build_verification_processor() -> VerificationProcessor:
    """Factory that constructs the LLM extractor from instance config + env.

    Mirrors the pattern in services/verification_detector/__main__.py and
    app/api/admin.py:run_verification_detector; both built the extractor
    lazily at call time. Raises if the LLM isn't configured.
    """
    from connectors.llm import create_extractor_from_env_or_config

    try:
        from app.instance_config import load_instance_config

        try:
            config = load_instance_config()
        except (ValueError, FileNotFoundError):
            config = {}
        ai_config = config.get("ai") if config else None
    except Exception:
        ai_config = None

    extractor = create_extractor_from_env_or_config(ai_config)
    return VerificationProcessor(extractor=extractor)