* feat(unified-stack): Browse + My Stack + Recipes + RBAC matrix (v49–v55)
Squash of 94 commits spanning the v49 → v55 unified-stack rewrite.
Full per-feature breakdown lives in CHANGELOG.md under [Unreleased].
Major buckets:
* v49 schema — first-class user_groups + user_group_members +
resource_grants; admin can CRUD groups and grants; Google
Workspace nightly sync writes into the new tables.
* v49 data_packages — admin-curated bundles of tables, RBAC-gated,
first-class section on /catalog Browse + My Stack.
* v49 memory_domains — row-backed (replaces hardcoded VALID_DOMAINS
enum); admin can CRUD; grants follow the same shape as tables and
packages.
* v50 cover_image_url + admin sidebar collapsibles + per-row Mode
tooltip + admin queue domain badges + admin "+ New Item" seed flow.
* v51 lifecycle status (prod/poc/coming-soon/draft) + category +
palette swatches on admin modals.
* v52 per-table detail page /catalog/t/<id>.
* v53 Recipes — admin-curated SQL templates as a second tab on
/catalog with full Edit/Delete admin affordances.
* v54 soft-delete (deleted_at) + Undo toast for packages, memory
domains, and recipes; hard_delete() retained as escape hatch.
* v55 Recipes RBAC — ResourceType.RECIPE registered, inline Group
Access matrix on Create + Edit Recipe modals (mirrors the Memory
Domain pattern).
* Activity Center per-resource filter (resource_prefix LIKE-anchored
on audit_log.resource); admin nav g+letter keyboard shortcuts;
loadAdminTablesLayout N+1 → single endpoint; /api/memory 30s
page-level cache.
* CI hardening — Keboola legacy tests pytest.importorskip; perf-
smoke threshold widened to stop cold-cache flake.
5002 tests passing, 35 skipped.
* feat(p2 backlog): Cmd-K palette + suggest-a-domain + nightly E2E + v55 schema
10-item P2 sweep on top of the unified-stack squash. New behaviour:
* Cmd-K admin command palette (base.html) — fuzzy-search overlay over
admin + user-facing routes. Arrows/Enter to navigate, Esc to close.
* Stack-tabs digit shortcuts — 1/2/3 switch Browse / My Stack /
Recipes on /catalog + /corporate-memory.
* Friendlier non-admin empty state on /corporate-memory, plus a
"Suggest a domain" CTA → POST /api/memory-domain-suggestions, admin
queue with approve/reject. Backed by a new memory_domain_suggestions
table (schema v55).
* /admin/corporate-memory 7-tab strip grouped under Moderation /
Catalog parent labels.
* Bulk-assign table → package dropdown annotates each option with
"(N of M tables already in)" so the existing distribution is visible
before picking a target.
* GET /api/memory + /tree accept is_required filter; admin status
dropdowns route the "Required" sentinel onto it (status no longer
holds 'mandatory' post-v49, so the old dropdown returned nothing).
* chip-input.js is now opt-in per template via {% block extra_scripts %}
instead of loaded globally on every page from base.html.
* Edit-modal close helpers consolidated onto _closeEditModalById();
documents the per-source-type modal architecture decision.
* New .github/workflows/e2e-nightly.yml runs agent-browser smoke
scripts (scripts/e2e/smoke_*.sh) against a docker-compose stack
nightly at 04:30 UTC; failures open an agent-browser-nightly issue.
5012 tests passing, 35 skipped.
* fix(visual audit): 6 page regressions on memory + data-package surfaces
agent-browser walkthrough of every memory + data-package page in the PR
turned up 6 real bugs. Fixes:
1. Admin memory modals were dead. Duplicate `let _cmdNewDomainId`
declarations from the deprecated step-2 RBAC stubs in
admin_corporate_memory.html collided with the live state vars
declared earlier in the same <script> → SyntaxError on parse →
the entire second script block silently failed → every inline
onclick= handler defined there (`+ New Memory Domain`, Edit, etc.)
was a no-op. Removed the duplicate stubs.
2. /catalog/t/<table_id> + /catalog/r/<slug> rendered unstyled.
Both templates injected their CSS via {% block head %} but
base.html exposes {% block head_extra %} — wrong block name
meant <style> rules never reached the rendered HTML. Renamed
to head_extra. Hero card, section cards, dark SQL block, proper
full-width inputs all now render as designed.
3. L49 leak — "MANDATORY" KPI label + "Make Mandatory" row buttons
on /admin/corporate-memory still used the old word. Renamed to
"Required" / "Mark as Required" so UI matches the data model
(v49 split moved the Required tier onto the orthogonal
is_required boolean; status no longer holds 'mandatory').
4. Activity Center Resource dropdown didn't know the v55
`memory_domain_suggestion:` namespace — added it.
5. Tab strip on /admin/corporate-memory wrapped text 2× per button
on narrow viewports after the L50 MODERATION/CATALOG group
labels pushed total width past most viewports. Switched the
strip to flex-wrap:nowrap + overflow-x:auto with
white-space:nowrap + flex-shrink:0 on every direct child so the
tabs stay one row and slide horizontally when they overflow.
5012 tests passing, 35 skipped.
* rebase-cleanup: align with main's 0.54.25-27 API design + comment fix
Three follow-on fixes after rebasing onto origin/main (0.54.27):
* admin_tables.html: dropped a stray nested ``{% if data_source_type
== 'keboola' %}`` around ``prefillFromKeboolaTable`` (main never had
it; the outer Phase F2 guard already covers it) and reworded a JS
comment that contained literal ``{% %}`` tokens which Jinja was
parsing as a real tag → unbalanced if/endif → 30 template render
failures across the suite.
* /api/stack/subscription/{type}/{id}: DELETE now returns 204 instead
of 200 per the 0.54.26 design rules. CLI client + parity tests
updated to accept 2xx / assert 204.
* Memory-domain suggestion approve/reject paths added to
``_VERB_PATH_ALLOWLIST`` — they are pending → approved/rejected
state-machine transitions (approve also creates the real
memory_domains row as a side effect), so the RPC shape is
intentional rather than a missed PATCH refactor.
5035 tests passing, 35 skipped.
* fix(catalog_table_detail): real polish pass — hero glyph, dedup pills, rows/size meta, scoped sync CTA
The previous fix only corrected the block-name typo so that the existing CSS rendered.
The actual layout was still wireframe-tier on close inspection:
* No cover glyph in the hero (a flat white card with title + meta line);
data-package + memory-domain detail pages both have a colored icon
square. Restored parity — table.icon emoji if set, otherwise initials
on a colored square using table.color.
* "INTERNAL" pill rendered twice for agnes_audit etc. — the mode pill
and the source-type pill happened to be identical strings. Now skip
the source pill when it matches the mode (`internal == internal`).
* Bucket / source_table code chip showed `Agnes Internal.audit_log` for
internal rows — meaningless to a user. Hidden when source_type is
internal.
* `pairs_well_with` admin input was a comma-separated `<input>` always
visible. Wrapped all 4 sections in an Edit-on-demand toggle: read-
only display by default, "+ Add" / "Edit" button on the right edge
of each section header reveals the inline form, Cancel hides it.
* "Trigger sync now" was a cramped link squashed into the empty-state
flex row (visible as `Tr…` overflow before). Promoted to a proper
btn-primary button under the empty-state copy. Hidden entirely for
internal tables (which are server-managed — no upstream to pull).
* Hero meta now surfaces row count + payload size (when sync_state has
them) + last sync timestamp on a single line — was missing from the
original.
* Mode pills colored by tier (local=green, remote=amber, materialized=
blue, internal=gray) so the basic fact about a table reads at a
glance, not from ALL-CAPS text alone.
* tests(v56): TDD baseline for extended data-packages content + per-table docs
68 failing tests across 8 files spec the v56 surface before any
implementation lands:
* test_schema_v55_to_v56_migration.py — schema bump, additive ALTERs
on data_packages + table_registry, idempotency, sequential-upgrade
preservation
* test_data_packages_repo_v56.py — repo create/update/get/list for
owner_name, owner_team, tags, long_description, when_to_use,
when_not_to_use, example_questions (JSON list round-trip, empty
defaults, partial-update preservation)
* test_table_registry_v56_docs.py — update_docs for grain, platforms,
partition_col, history, gotchas; preserves v52 docs columns
* test_api_data_packages_v56.py — PUT/POST/GET for all new fields,
field-level validation (tag count, bullet length, description size),
virtual badge derivation (curated/new)
* test_api_registry_docs_v56.py — PATCH /api/admin/registry/{id}/docs
for v56 fields, validation, RBAC unchanged
* test_web_catalog_package_detail_v56.py — /catalog/p/<slug> rewrite
asserts on rendered owner line, tag pills, badges, What it is,
Use it when, Skip it when, Example questions, per-table extended
detail in collapsible row, key-gotcha distinctness, admin-only Edit
* test_web_stack_card_v56_metadata.py — Browse-grid card additions
(owner chip, tag chips, badges) without breaking back-compat for
rows missing the new fields
* test_data_packages_no_vendor_content.py — CI guard: scans app/ +
src/ + cli/ + config/ + scripts/ for Groupon-specific tokens from
the colleague's spec MD; fails if any leak into OSS surfaces
* test_db_schema_version.py — bumped 55 → 56 with rationale
Plus updates schema-version assertion to 56. Implementation lands in
subsequent commits (schema migration → repo → API → templates).
* feat(v56): schema + repo for extended data-packages content
Schema additions (ALTER ADD COLUMN IF NOT EXISTS — additive + idempotent):
* data_packages: owner_name, owner_team, tags, long_description,
when_to_use, when_not_to_use, example_questions (JSON-as-VARCHAR for
the lists)
* table_registry: grain, platforms, partition_col, history, gotchas
(extends the v52 sample_questions / things_to_know / pairs_well_with
docs surface with structured per-table content)
Repo extensions:
* DataPackagesRepository.create + update accept the new fields with
the same Optional-is-no-op contract as v51 (pass an empty list to
clear a JSON column)
* _decode_row decodes the new JSON-list columns to Python lists; NULL
rounds back to [] so callers don't branch
* TableRegistryRepository.update_docs grew the v56 fields alongside
the existing v52 ones — single PATCH can write either tier
atomically
* TableRegistryRepository._decode_row picks up platforms + gotchas in
the same NULL-tolerant decoder
22 repo + migration tests passing. API + UI land in subsequent commits.
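The NULL-tolerant decoding described above can be sketched roughly as follows. This is an illustration only, not the repository's actual `_decode_row`: the helper name `decode_json_list_columns` and the `JSON_LIST_COLUMNS` tuple are assumptions made for the example.

```python
import json
from typing import Any, Dict, Iterable

# Hypothetical column set for illustration; the real repository may differ.
JSON_LIST_COLUMNS = ("tags", "when_to_use", "when_not_to_use", "example_questions")


def decode_json_list_columns(
    row: Dict[str, Any], columns: Iterable[str] = JSON_LIST_COLUMNS
) -> Dict[str, Any]:
    """Return a copy of ``row`` with JSON-as-VARCHAR columns decoded to lists.

    NULL (None), missing, and empty-string values all decode to [] so that
    callers never have to branch on "no value yet".
    """
    decoded = dict(row)
    for col in columns:
        raw = decoded.get(col)
        if raw is None or raw == "":
            decoded[col] = []
        elif isinstance(raw, str):
            decoded[col] = json.loads(raw)
    return decoded
```

Returning a copy keeps the raw row intact, which makes it easy to compare stored and decoded values in tests.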
* feat(v56): API surface for extended data-packages + per-table docs
CreateDataPackageRequest + UpdateDataPackageRequest grew the v56 fields
(owner_name, owner_team, tags, long_description, when_to_use,
when_not_to_use, example_questions) with per-field validators that
match the Foundry spec checklist:
* tags: ≤8 entries × ≤30 chars
* long_description: ≤4000 chars
* use/skip: ≤8 bullets × ≤200 chars
* example_questions: ≤12 × ≤200 chars
_serialize emits all v56 fields plus a virtual ``badges`` list derived
server-side at render time (no DB column needed): "curated" when the
creator is in the Admin group, "new" within 30 days of created_at.
Backdated created_at values and admin-status changes are picked up automatically.
PATCH /api/admin/registry/{id}/docs extended with v56 structured
per-table fields (grain, platforms, partition_col, history, gotchas).
gotchas: list of {key: bool, body: str} Pydantic models with the same
≤8 cap; first key=true entry becomes the Key gotcha on the rendered
package detail page. PATCH echoes the fresh state so callers can
re-render without a second GET.
26 API tests passing (16 data-packages + 10 registry-docs).
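The tag limits and the virtual badge derivation described above might look roughly like this in plain Python. It is a sketch under stated assumptions: the real code uses Pydantic validators and `_serialize`, whereas the names `validate_tags` and `derive_badges`, the inclusive 30-day boundary, and the explicit `now` parameter are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

MAX_TAGS = 8           # "tags: ≤8 entries"
MAX_TAG_CHARS = 30     # "× ≤30 chars"
NEW_BADGE_WINDOW = timedelta(days=30)


def validate_tags(tags: List[str]) -> List[str]:
    """Reject tag lists that exceed the count or per-tag length limit."""
    if len(tags) > MAX_TAGS:
        raise ValueError(f"at most {MAX_TAGS} tags allowed, got {len(tags)}")
    for tag in tags:
        if len(tag) > MAX_TAG_CHARS:
            raise ValueError(f"tag {tag!r} exceeds {MAX_TAG_CHARS} chars")
    return tags


def derive_badges(
    creator_is_admin: bool,
    created_at: datetime,
    now: Optional[datetime] = None,
) -> List[str]:
    """Compute badges at render time; nothing is stored in the database."""
    now = now or datetime.now(timezone.utc)
    badges: List[str] = []
    if creator_is_admin:
        badges.append("curated")
    # Assumption: the 30-day boundary is inclusive.
    if now - created_at <= NEW_BADGE_WINDOW:
        badges.append("new")
    return badges
```

Because the badges are recomputed on every call, changing either input is reflected immediately without a migration or backfill.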
* feat(v56): /catalog/p/<slug> rewrite + Browse-grid card augmentation
The third (and final) v56 commit lights up the UI surfaces backed by
the schema + API commits earlier in this PR:
* /catalog/p/<slug> template rebuilt around the Foundry spec's
section ladder — hero (icon + name + badges + owner + tags +
description + meta + Add-to-stack), "What it is" markdown body,
paired "Use it when / Skip it when" panels, "Tables in this
package" with collapsible per-table extended detail (grain /
platforms / partition_col / history / gotchas + sample questions),
and an "Example questions you can ask Claude" prompt panel. Each
section guarded by ``{% if pkg.<field> %}`` — empty content fields
hide the section entirely (no "No X yet" placeholder noise on the
public-facing drilldown).
* router catalog_package_detail hydrates per-table v56 fields onto
the tables list + derives the virtual badges (curated / new)
server-side from creator-in-Admin + 30-day created_at.
* StackResolver.ResourceEntry grew owner_name / owner_team / tags /
badges; _fetch_entries pulls the v56 columns + computes badges
once per fetch using a single Admin-group SELECT.
* _data_package_entry_dict adapter passes the new fields through to
the macro; tags are merged source-type pills + admin-authored
category tags per the spec convention.
* _stack_card.html renders the v56 badges (top-left, data-badge=
hooks) + the owner chip (data-card-owner hook) without breaking
back-compat — pre-v56 rows render unchanged.
* Admin PUT handler strips the v56 docs fields from the
read-modify-write merged dict so register() doesn't blow up
with the now-larger row shape (same pattern as the v52 docs
fields stripping).
5115 tests passing (+98 v56 + 18 fixed regressions from the merged-
register PUT path), 35 skipped.
* fix(rbac): Edit-on-package + Group-access 'required' persistence + CI vendor guard
Three related bugs reported on the merged-with-main branch:
1. Clicking Edit on a Data Package card landed on /admin/tables with
a `#<pkg.id>` hash that nothing listened to — admin saw the global
table listing, not the editor for that specific package. Added a
`?edit_package=<pkg_id>` query-param handler in admin_tables.html
(analog to the existing `?edit=<table_id>` and `?assign_to=<pkg_id>`
patterns) that calls openEditDataPackageModal on DOMContentLoaded
after a 250ms layout settle. Updated the package-detail Edit link
to use the new query param.
2. Setting Group Access to 'required' didn't persist — re-opening
the modal showed 'available'. Root cause was the v49
``resource_grants.requirement`` enum existing in the DB but the
POST /api/admin/grants endpoint not surfacing it: ``CreateGrantRequest``
declared only group_id + resource_type + resource_id, so Pydantic
silently dropped the matrix's ``requirement: 'required'`` payload
and the new row landed at the DB column default ('available').
Plumbed ``requirement`` through ``CreateGrantRequest`` →
``ResourceGrantsRepository.create`` so the value persists in one
round-trip. Plus a UNIQUE-constraint race in the matrix
diff-apply: DELETE-old + POST-new ran in parallel via
``Promise.allSettled``, so POST could fire first and trip the
unique check before DELETE freed the slot. Switched to sequential
(await all deletes; then await all writes) across all three
matrices (Edit Data Package, Edit Memory Domain, Edit Recipe).
3. CI vendor-content guard ``test_no_groupon_specific_strings_in_oss``
tripped on two of my own docstrings: a "Foundry Data team" mention
in two src/db.py comments + an ``s1_session_landings`` example in
cli/skills/agnes-table-registration.md. Rephrased the comments to
"extended-descriptions admin spec" and replaced the example with
a generic ``events_daily`` table name.
5164 tests passing, 35 skipped (+4 regression tests pinning the POST
/api/admin/grants requirement contract). Vendor guard back to green.
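The ordering fix in item 2 lives in browser JavaScript; the sketch below is a Python `asyncio` analogue, not the shipped code. `FakeGrantStore`, `apply_matrix_diff`, and the `(group_id, resource_id)` grant shape are hypothetical stand-ins for the grants API and its UNIQUE constraint.

```python
import asyncio
from typing import List, Set, Tuple

Grant = Tuple[str, str]  # (group_id, resource_id); hypothetical shape


class FakeGrantStore:
    """In-memory stand-in for the grants API with a UNIQUE (group, resource) rule."""

    def __init__(self, initial: Set[Grant]) -> None:
        self.rows: Set[Grant] = set(initial)

    async def delete(self, grant: Grant) -> None:
        await asyncio.sleep(0)  # yield control, like a network round-trip
        self.rows.discard(grant)

    async def create(self, grant: Grant) -> None:
        await asyncio.sleep(0)
        if grant in self.rows:
            raise ValueError(f"UNIQUE violation for {grant}")
        self.rows.add(grant)


async def apply_matrix_diff(
    store: FakeGrantStore, to_delete: List[Grant], to_create: List[Grant]
) -> None:
    """Apply a grant diff in two phases so creates never race deletes."""
    # Phase 1: every delete finishes before any create starts.
    await asyncio.gather(*(store.delete(g) for g in to_delete))
    # Phase 2: the unique slots are now free, so creates cannot collide.
    await asyncio.gather(*(store.create(g) for g in to_create))
```

Deletes still run concurrently with each other, as do creates; only the boundary between the two phases is serialized, which is what removes the race.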
* fix(catalog): admin Browse path drops v56 card fields
The /catalog and /memory admin god-mode branch built ResourceEntry
instances inline from pkg_repo.list() / domains_repo.list() and skipped
owner_name, owner_team, tags, and derived badges (curated/new). Visible
symptom: a package with an owner + tags rendered with the v56 chrome
for non-admin viewers but as a bare card for admins.
Adds StackResolver.browse_admin(user_id, resource_type) — admin god-mode
Browse that walks the full table but routes through the same
_fetch_entries enrichment pass as browse(), so admin + non-admin Browse
stay visually consistent. Both /catalog and /corporate-memory routes
switch to it.
Regression test in tests/test_stack_resolver_browse_admin.py covers:
owner/tags propagation, new/curated badge derivation, in_stack from
admin subscriptions, all-packages-regardless-of-grants, and the
ValueError for unsupported resource types.
* fix(catalog): three /catalog tab-strip UX bugs
1. Required Remove → red toast
browse_admin passed empty required_ids to _fetch_entries, so the
admin's own required grants surfaced as 'available' and the macro
rendered an actionable Remove button that POST /unsubscribe 400'd
on. Now derives required_ids from the admin's own groups so
Required packages render with the disabled "In stack (required)"
button. Regression test in test_stack_resolver_browse_admin.py.
2. Remove green-toasts but card stays until refresh
The My-Stack empty-state placeholder was only emitted server-side
when stack_entries was empty at render time. Removing the last
card left the tab completely blank — users read that as "Remove
didn't work, let me refresh". Both grid + empty-state are now
always rendered with one of them initially hidden; the JS swaps
visibility on add/remove instead of injecting DOM. Same fix in
/corporate-memory.
3. "What are Recipes?" + ambiguous (admin) suffix
Recipes tab now carries its own curator-block explainer (the
shared one was moved inside Browse view so it doesn't bleed
across tabs). The grey "(admin)" suffix becomes a yellow
.admin-only-hint chip with a title tooltip — visibility hint is
now unambiguous: yellow chip = "only you see this", non-admins
don't see the affordance at all.
* schema: renumber v51..v58 → v52..v59 to make room for main's v51
Main 0.54.29 introduced a NEW v51 (table_registry.bq_fqn — issue #343)
that releases ahead of this branch. The unified-stack chain v51..v58
shifts up by one so main's v51 stays as the released schema and ours
become v52..v59. Function names, internal version bumps, dispatch
ladder thresholds, and the migration-test references all move
together. Subsequent merge with main lands the bq_fqn column at the
freed v51 slot.
* fix(seed): seed admin lands in BOTH Admin AND Everyone groups
The LOCAL_DEV_MODE / SEED_ADMIN_EMAIL bootstrap only added the seed
user to Admin. Everyone-scoped grants — the canonical "every-user-
sees-this" pattern for Required onboarding — didn't surface for the
seed admin's own /catalog because they weren't in Everyone. Symptom:
admin grants a Required-tier package to Everyone, then sees it on
/catalog still rendered with an "Add to stack" button (because the
admin's resolved required_ids was empty for that package).
The dual-membership keeps Admin (authorization) and Everyone
(default-grant target) intentionally separate per the design comment
on UserRepository.create — every membership remains traceable to a
concrete row, just now with a system_seed row in Everyone too. Both
INSERTs go through UserGroupMembersRepository.add_member which is
idempotent on (user_id, group_id), so re-fires on every lifespan
startup don't duplicate rows.
Regression test in test_main_seed_admin_everyone.py.
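A minimal sketch of the idempotent dual-membership seed, assuming a composite key on (user_id, group_id). It uses the standard-library `sqlite3` module as a stand-in for the project's database, and `make_conn`, `add_member`, `seed_admin`, and the literal group ids are hypothetical names for illustration.

```python
import sqlite3


def make_conn() -> sqlite3.Connection:
    """Create an in-memory table keyed on (user_id, group_id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE user_group_members (
               user_id  TEXT NOT NULL,
               group_id TEXT NOT NULL,
               source   TEXT NOT NULL,
               PRIMARY KEY (user_id, group_id)
           )"""
    )
    return conn


def add_member(
    conn: sqlite3.Connection, user_id: str, group_id: str, source: str = "system_seed"
) -> None:
    """Insert a membership row; a repeat call for the same pair is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO user_group_members (user_id, group_id, source) "
        "VALUES (?, ?, ?)",
        (user_id, group_id, source),
    )


def seed_admin(conn: sqlite3.Connection, user_id: str) -> None:
    """Put the seed user in both groups; safe to call on every startup."""
    for group_id in ("admin", "everyone"):  # hypothetical group ids
        add_member(conn, user_id, group_id)
```

Calling `seed_admin` repeatedly leaves exactly two rows for the user, one per group.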
* style: unify admin-only hints across marketplace + memory detail pages
Replaces three stale ``(admin)`` parentheticals with the same yellow
``admin-only`` chip introduced for /catalog tab actions. Same tooltip
copy ("Visible only to admins — analysts won't see this …") so the
visibility hint is unmistakable wherever it appears:
- Hard delete on marketplace_plugin_detail (admin-only destructive
action — same gating as the original suffix conveyed).
- Hard delete on marketplace_item_detail (same).
- Edit link on memory_domain_detail (title-attr only before; now a
visible chip too).
Non-admin viewers never saw these affordances — the gates are
unchanged. Pure styling pass for consistency.
* fix(catalog): exclude soft-deleted data packages + memory domains from Browse
``StackResolver._fetch_entries`` and ``browse_admin`` were querying
data_packages / memory_domains without a ``deleted_at IS NULL`` guard.
A package soft-deleted via /admin/* (v54 soft-delete contract) stayed
visible on /catalog and /memory until either an Undo or a hard delete
— directly contradicting the soft-delete UX which is supposed to
remove the affordance immediately and only retain the row for the
Undo window.
The repository accessors (DataPackagesRepository.list,
MemoryDomainsRepository.list, list_packages_of_table, etc.) already
filter deleted rows; this commit brings the resolver's direct SQL in
line with that contract.
Regression test in test_stack_resolver_browse_admin.py.
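The missing guard amounts to one predicate on the resolver's direct SQL. The sketch below shows the idea with the standard-library `sqlite3` module rather than the project's database; the two-column table and the function name `list_browsable_packages` are assumptions made for the example.

```python
import sqlite3
from typing import Dict, List


def list_browsable_packages(conn: sqlite3.Connection) -> List[Dict[str, str]]:
    """Return only live packages; soft-deleted rows stay in the table for Undo."""
    rows = conn.execute(
        "SELECT id, name FROM data_packages "
        "WHERE deleted_at IS NULL "  # the guard that was missing
        "ORDER BY id"
    ).fetchall()
    return [{"id": r[0], "name": r[1]} for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_packages (id TEXT PRIMARY KEY, name TEXT, deleted_at TEXT)"
)
conn.executemany(
    "INSERT INTO data_packages (id, name, deleted_at) VALUES (?, ?, ?)",
    [
        ("pkg_a", "Sales", None),                     # live
        ("pkg_b", "Legacy", "2025-01-01T00:00:00Z"),  # soft-deleted
    ],
)
```

Here `list_browsable_packages(conn)` returns only the `pkg_a` row, while `pkg_b` remains in the table and can still be restored by clearing `deleted_at`.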
* fix(catalog): Add/Remove updates full card chrome, not just button
The previous _applyStackChange flipped only the footer button label —
the card border (.is-in-stack class), top-right "In stack" badge, and
button color class (--add / --remove) stayed at their server-rendered
state. After Add the user saw the button checkmark but the rest of
the card still looked like "available, not in stack". They read this
as "the change didn't take — let me refresh".
This commit makes the optimistic update mirror what the server-side
macro renders for the new state:
* ``c.classList.toggle('is-in-stack', becameInStack)`` — flips the
border + visual state class.
* Top-right ``.stack-card__req-badge--instack`` badge is injected on
Add, removed on Remove (skipped when ``data-requirement='required'``
— that slot is owned by the Required badge).
* Button text is "Remove" / "+ Add to stack" matching the macro
(was "✓ In stack" which was visually nice but inconsistent).
* Button color class --add / --remove swaps so the destructive Remove
tint kicks in immediately.
The clone-into-My-Stack path applies the same updates so the new card
in My Stack reads identically to a server-rendered in_stack card.
Mirrored in /corporate-memory.
* fix(memory): four Devin-review bugs on /memory drill-down + manifest
PR #333 Devin review surfaced four real bugs that ship a broken
/memory experience even though the unit tests passed.
1. Manifest md5 omits is_required + content (app/api/sync.py:836-840)
_build_memory_domains_section hashed only (id|title|status) per
item. _build_per_domain_markdown routes items between "## Required"
and "## Approved" by is_required and embeds full content — so an
admin edit of either dimension left the manifest md5 unchanged,
`agnes pull` skipped the re-fetch, and the analyst kept a stale
bundle.md. Now both fields participate in the hash.
2. required_count always 0 (src/repositories/memory_domains.py)
list_items_of_domain only SELECTed (id, title, status) so the
`it.get("is_required")` in the manifest builder always evaluated
to None → required_count = 0 regardless of actual state. The
manifest builder advertised a count it could never compute. Now
projects is_required + content too (required by fix 1 anyway).
3. Vote URL 404 (memory_domain_detail.html:289-290)
Constructed `/api/memory/items/{id}/vote` but the route is
`/api/memory/{id}/vote`. Every upvote/downvote button was a
silent no-op.
4. Dismiss/undismiss URL + method both wrong (memory_domain_detail.html:296-305)
Constructed `/api/memory/items/{id}/dismiss` (extra /items/) and
/undismiss (no such route — undismiss is DELETE on /dismiss).
Both buttons silently 404'd. Now POST + DELETE on
`/api/memory/{id}/dismiss` per app/api/memory.py:635/675.
* fix: multi-agent reviewer findings — vendor-token scrubs + manifest md5 predicate + soft-delete filter
Three reviewer findings from the multi-agent review on PR #333,
fixed in-place per CLAUDE.md issue-economy rule.
Reviewer-rules (Important — vendor-agnostic OSS):
- app/main.py:218 comment: replaced 'foundryai-prod' with generic
'a customer prod instance' phrasing. Public OSS repo must not
carry customer-specific tokens (CLAUDE.md § Project conventions).
- tests/test_table_registry_v56_docs.py:70 fixture string:
replaced "user_brand_affiliation = 'groupon'" with 'acme' on
the same rule.
Reviewer-architecture (closes still-unresolved Devin 🚩 ANALYSIS):
- app/api/sync.py _build_memory_domains_section: md5 hash loop now
filters items to the SAME predicate the bundle renderer uses
(is_required OR status='approved'). Pre-fix the hash iterated ALL
items but _build_per_domain_markdown only rendered the union of
required items + approved-non-required items — so an admin edit
to a pending/rejected non-required item flipped the md5 against
an identical-bytes bundle, triggering a wasteful re-fetch on
every analyst's next 'agnes pull'. The earlier commit fixed the
hash-input fields (is_required + content); this closes the
set-of-items asymmetry Devin separately flagged.
Reviewer-RBAC (minor cleanup):
- app/resource_types.py _data_package_blocks and _memory_domain_blocks
now filter 'WHERE deleted_at IS NULL' (v54 soft-delete column) so
the /admin/access UI doesn't surface soft-deleted entities as
grantable. Mirrors the existing filter on _recipe_blocks. No
security leak pre-fix (resolver double-filters and re-checks at
serve time), just UI cleanliness.
- app/services/stack_resolver.py add_to_stack: docstring note
added explaining that authorization is enforced at the API layer
(app/api/stack.py can_access gate), not at the resolver. The
initial review suggested adding a defensive 403 here, but that
broke 5 existing tests that legitimately call add_to_stack
directly without setting up grants first; the docstring captures
the contract instead. stack() already intersects subscriptions
with current available_ids on every read, so a 'zombie' row from
a misuse never leaks into the user-facing manifest.
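The two manifest-hash fixes (hash every field the bundle renders, and hash only the items the bundle renders) can be sketched together as below. This is not the code in app/api/sync.py: the function names, the `|`-joined line format, and the sort by id are assumptions chosen to keep the example deterministic.

```python
import hashlib
from typing import Any, Dict, Iterable


def _is_rendered(item: Dict[str, Any]) -> bool:
    """Same predicate as the bundle renderer: required OR approved."""
    return bool(item.get("is_required")) or item.get("status") == "approved"


def manifest_md5(items: Iterable[Dict[str, Any]]) -> str:
    """Hash exactly the items and fields that end up in the rendered bundle."""
    digest = hashlib.md5()
    rendered = sorted(
        (i for i in items if _is_rendered(i)), key=lambda i: str(i["id"])
    )
    for item in rendered:
        line = "|".join(
            [
                str(item["id"]),
                item.get("title") or "",
                item.get("status") or "",
                "1" if item.get("is_required") else "0",
                item.get("content") or "",
            ]
        )
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()
```

With this shape, editing the content or is_required flag of a rendered item changes the digest, while editing a pending or rejected non-required item leaves it untouched, so clients re-fetch only when the bundle bytes can actually differ.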
* release: 0.55.0 — unified Browse + My Stack (Data Packages + Memory), schema v48→v59, 3 BREAKING
1761 lines
66 KiB
Python
"""Corporate memory endpoints — knowledge items, voting, governance admin, contradictions."""

import asyncio
import json
import logging
import uuid
from typing import Optional, List

from fastapi import APIRouter, Depends, HTTPException, Response
from pydantic import BaseModel, ConfigDict, Field
import duckdb

from app.auth.dependencies import get_current_user, _get_db
from app.auth.access import require_admin, is_user_admin, can_access
from src.repositories.knowledge import KnowledgeRepository
from src.repositories.memory_domains import MemoryDomainsRepository
from src.repositories.audit import AuditRepository

logger = logging.getLogger(__name__)

router = APIRouter(prefix="/api/memory", tags=["memory"])

# v49: ``mandatory`` is no longer a lifecycle status — Required tier rides on
# ``knowledge_items.is_required``. ``status`` covers lifecycle only (pending,
# approved, rejected, revoked, expired).
VALID_STATUSES = ["pending", "approved", "rejected", "revoked", "expired"]

BUNDLE_TOKEN_BUDGET = 6000
# Rough chars-per-token estimate (conservative).
_CHARS_PER_TOKEN = 4

# v49: domain set is no longer a hardcoded enum — it lives in the
# ``memory_domains`` table and is administrable via /admin/memory-domains.
# Validation uses ``MemoryDomainsRepository.exists_by_slug``.


def _validate_domain_slug(slug: Optional[str], conn: duckdb.DuckDBPyConnection) -> None:
    """Raise 400 if ``slug`` is truthy but doesn't resolve to a memory_domains row."""
    if not slug:
        return
    if not MemoryDomainsRepository(conn).exists_by_slug(slug):
        raise HTTPException(
            status_code=400,
            detail=f"Unknown memory domain slug: {slug!r}",
        )

# API-layer allowlist for ``POST /api/memory/admin/bulk-update``. The repo's
# ``_UPDATABLE_FIELDS`` is intentionally broader (``status``, ``sensitivity``,
# ``is_personal``, ``confidence``, valid_from/until, supersedes, etc.) so the
# narrow per-item ``update`` path can still touch them; bulk-edit must NOT,
# because changing status / personal-flag / sensitivity in bulk bypasses the
# proper governance flow (``/admin/mandate``, ``/admin/revoke``,
# ``/{id}/personal``) and its dedicated audit rows. Callers that need those
# fields in bulk should use the per-item endpoints. See PR #126 review.
_BULK_UPDATE_ALLOWED = frozenset({
    "category", "domain", "tags", "tags_add", "tags_remove",
    "audience", "title", "content",
})


def _is_privileged_viewer(user: dict, conn: duckdb.DuckDBPyConnection) -> bool:
    """Admins (members of the Admin system group, per RBAC v13) are the
    privileged viewer tier. Pre-v13 the schema also had a km_admin role; v13
    collapsed the role hierarchy into groups, so the corporate-memory admin
    capability now lives on top of plain admin membership. Module authors
    needing a finer-grained gate (curator-only, etc.) should add a
    ``ResourceType.CORPORATE_MEMORY_ADMIN`` resource type and gate with
    ``require_resource_access`` instead of extending this helper."""
    user_id = user.get("id")
    if not user_id:
        return False
    return is_user_admin(user_id, conn)


def _effective_groups(
    user: dict, conn: duckdb.DuckDBPyConnection
) -> Optional[List[str]]:
    """Audience-filter group list for the caller, or ``None`` for admins
    (no filter — see all items regardless of audience).

    Reads from ``user_group_members`` JOIN ``user_groups`` (the v13 model).
    Pre-v13 this read ``users.groups`` JSON; that column was dropped in v13
    and the membership is now materialized in ``user_group_members`` with a
    ``source`` discriminator (admin / google_sync / system_seed).
    """
    if _is_privileged_viewer(user, conn):
        return None
    user_id = user.get("id")
    if not user_id:
        return []
    rows = conn.execute(
        """SELECT g.name FROM user_group_members m
           JOIN user_groups g ON m.group_id = g.id
           WHERE m.user_id = ?""",
        [user_id],
    ).fetchall()
    return [f"group:{r[0]}" for r in rows]


def _caller_granted_memory_domains(
    user: dict,
    conn: duckdb.DuckDBPyConnection,
) -> Optional[List[str]]:
    """Domains the caller has been granted access to via resource_grants.

    The grant model is generic — admins assign ``MEMORY_DOMAIN`` resources
    (e.g. ``md_finance``) to ``user_groups`` rows via ``/admin/access``.
    This helper resolves the caller's group memberships against
    ``resource_grants`` and returns the union of ``memory_domains.id``
    values (v49: the migration re-pointed grants from slug to id).

    Returns ``None`` for privileged viewers (admins see everything regardless
    of grants — same convention as ``_effective_groups``). Returns an
    empty list when the caller has no grants — the SQL EXISTS-join collapses
    in that case, preserving pre-RBAC behaviour.
    """
    if _is_privileged_viewer(user, conn):
        return None
    user_id = user.get("id")
    if not user_id:
        return []
    rows = conn.execute(
        """SELECT DISTINCT rg.resource_id
           FROM resource_grants rg
           JOIN user_group_members m ON m.group_id = rg.group_id
           WHERE m.user_id = ?
             AND rg.resource_type = 'memory_domain'""",
        [user_id],
    ).fetchall()
    return [r[0] for r in rows]


def _can_view_item(user: dict, item: dict, is_priv: bool) -> bool:
    """Personal items are visible only to the contributor and privileged
    viewers. Non-personal items are visible to any authenticated user.

    ``is_priv`` is pre-computed by the caller (one DB hit per request) so
    a per-item loop doesn't re-query ``user_group_members`` for every row.
    """
    if not item.get("is_personal"):
        return True
    if is_priv:
        return True
    return item.get("source_user") == user.get("email")


class CreateKnowledgeRequest(BaseModel):
    title: str
    content: str
    # Allow callers to POST either `domain_slug` (new canonical name,
    # matching admin/repo/template layers) or `domain` (legacy alias kept
    # for one release so existing API callers don't break — Pydantic v2
    # accepts the alias on input, Python code reads `request.domain_slug`).
    model_config = ConfigDict(populate_by_name=True)
    category: str
    tags: Optional[List[str]] = None
    domain_slug: Optional[str] = Field(default=None, alias="domain")
    entities: Optional[List[str]] = None
    source_type: Optional[str] = None


class VoteRequest(BaseModel):
    vote: int


class PersonalFlagRequest(BaseModel):
    is_personal: bool


class AdminActionRequest(BaseModel):
    reason: Optional[str] = None
    audience: Optional[str] = None


class EditRequest(BaseModel):
    title: Optional[str] = None
    content: Optional[str] = None


class BatchActionRequest(BaseModel):
    item_ids: List[str]
    action: str  # approve, reject, mandate, revoke
    reason: Optional[str] = None
    audience: Optional[str] = None


class ResolveContradictionRequest(BaseModel):
    resolution: str  # kept_a, kept_b, merged, both_valid


class CreateContradictionRequest(BaseModel):
    item_a_id: str
    item_b_id: str
    explanation: str
    severity: Optional[str] = None
    suggested_resolution: Optional[str] = None


class PatchItemRequest(BaseModel):
    """Partial update for a knowledge item via PATCH /api/memory/admin/{id}.

    Replaces the narrow ``EditRequest`` (title + content only). Any field
    left as ``None`` is unchanged. When supplied, ``domain`` is validated
    against the row-backed ``memory_domains`` table (v49 replaced the
    hardcoded ``VALID_DOMAINS`` constant).

    ``domain_ids`` is the M:N junction write path (knowledge_item_domains)
    used by the admin item-edit modal's chip-input: pass a list of
    memory_domains.id strings and the endpoint replaces the item's full
    domain membership atomically. Empty list ``[]`` clears all
    memberships. Supplying both ``domain`` and ``domain_ids`` is allowed
    (the legacy single ``domain`` write happens first, then the junction
    replace overrides it).
    """
    title: Optional[str] = None
    content: Optional[str] = None
    category: Optional[str] = None
    domain: Optional[str] = None
    domain_ids: Optional[List[str]] = None
    tags: Optional[List[str]] = None
    audience: Optional[str] = None
|
||
|
||
|
||
class BulkUpdateRequest(BaseModel):
|
||
"""Apply ``updates`` to every id in ``item_ids``. Issue #62."""
|
||
item_ids: List[str]
|
||
updates: dict
|
||
|
||
|
||
class ResolveDuplicateRequest(BaseModel):
|
||
"""Resolve a duplicate-candidate relation row.
|
||
|
||
``resolution`` is one of ``duplicate`` / ``different`` / ``dismissed``
|
||
(decision 2 in issue #62 — no auto-merge action; merging is a separate
|
||
larger feature).
|
||
"""
|
||
resolution: str
|
||
|
||
|
||
# ---- Memory domain catalog (v49 — frontend typeahead + admin dropdowns) ----
|
||
|
||
|
||
@router.get("/domains")
|
||
async def list_memory_domains(
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
"""List all memory domains for chip-input typeahead + dropdown population.
|
||
|
||
v49: replaces the hardcoded ``VALID_DOMAINS`` constant. Returns every
|
||
row in ``memory_domains`` (admin-administered + the six canonical seed
|
||
rows) so the frontend can render the picker without a separate /admin
|
||
endpoint. Authenticated users only — domain catalog is non-sensitive
|
||
metadata that powers the item-edit UI.
|
||
"""
|
||
domains = MemoryDomainsRepository(conn).list()
|
||
return {
|
||
"domains": [
|
||
{
|
||
"id": d["id"],
|
||
"slug": d["slug"],
|
||
"name": d["name"],
|
||
"description": d["description"],
|
||
"icon": d["icon"],
|
||
"color": d["color"],
|
||
}
|
||
for d in domains
|
||
]
|
||
}
|
||
|
||
|
||
# ---- User endpoints ----
|
||
|
||
@router.get("")
|
||
async def list_knowledge(
|
||
status_filter: Optional[str] = None,
|
||
category: Optional[str] = None,
|
||
domain: Optional[str] = None,
|
||
source_type: Optional[str] = None,
|
||
search: Optional[str] = None,
|
||
exclude_personal: bool = True,
|
||
upvoted_by_me: bool = False,
|
||
hide_dismissed: bool = False,
|
||
is_required: Optional[bool] = None,
|
||
page: int = 1,
|
||
per_page: int = 50,
|
||
sort: str = "updated_at",
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
"""List knowledge items with filtering, pagination, search.
|
||
|
||
``upvoted_by_me=true`` narrows to items the caller upvoted (powers the
|
||
"My Upvotes" filter on /corporate-memory — replaces the old dead
|
||
"My Rules" category sentinel).
|
||
"""
|
||
repo = KnowledgeRepository(conn)
|
||
page = max(page, 1)
|
||
offset = (page - 1) * per_page
|
||
# Privacy: non-privileged viewers can never opt out of the personal filter.
|
||
# Their own personal contributions are visible via /my-contributions, not here.
|
||
effective_exclude_personal = True if not _is_privileged_viewer(user, conn) else exclude_personal
|
||
effective_groups = _effective_groups(user, conn)
|
||
granted_domains = _caller_granted_memory_domains(user, conn)
|
||
statuses = [status_filter] if status_filter else None
|
||
upvoted_by_user_id = user["id"] if upvoted_by_me else None
|
||
# v46: caller's id is plumbed to repo filters when hide_dismissed=True so
|
||
# the SQL can NOT-EXISTS-subquery against knowledge_item_user_dismissed.
|
||
# Mandatory items are exempted by the subquery's status guard.
|
||
dismissed_by_user_id = user["id"]
|
||
if search:
|
||
items = repo.search(
|
||
search,
|
||
exclude_personal=effective_exclude_personal,
|
||
user_groups=effective_groups,
|
||
granted_domains=granted_domains,
|
||
statuses=statuses,
|
||
category=category,
|
||
domain=domain,
|
||
source_type=source_type,
|
||
is_required=is_required,
|
||
dismissed_by_user=dismissed_by_user_id,
|
||
hide_dismissed=hide_dismissed,
|
||
limit=per_page,
|
||
offset=offset,
|
||
)
|
||
if upvoted_by_user_id:
|
||
# Best-effort post-filter for the search() path (which doesn't
|
||
# plumb the upvote filter into its SQL). Search + "My Upvotes"
|
||
# is rare enough that a post-filter is fine.
|
||
upvoted_ids = {
|
||
r[0] for r in conn.execute(
|
||
"SELECT item_id FROM knowledge_votes WHERE user_id = ? AND vote > 0",
|
||
[upvoted_by_user_id],
|
||
).fetchall()
|
||
}
|
||
items = [it for it in items if it["id"] in upvoted_ids]
|
||
else:
|
||
items = repo.list_items(
|
||
statuses=statuses,
|
||
category=category,
|
||
domain=domain,
|
||
source_type=source_type,
|
||
is_required=is_required,
|
||
exclude_personal=effective_exclude_personal,
|
||
user_groups=effective_groups,
|
||
granted_domains=granted_domains,
|
||
upvoted_by_user=upvoted_by_user_id,
|
||
dismissed_by_user=dismissed_by_user_id,
|
||
hide_dismissed=hide_dismissed,
|
||
limit=per_page,
|
||
offset=offset,
|
||
)
|
||
|
||
# Enrich with votes + per-user dismissal flag. The set lookup keeps the
|
||
# per-item annotation O(1); the frontend uses ``dismissed_by_me`` to
|
||
# render the gray-out state without a separate roundtrip.
|
||
dismissed_set = set(repo.list_dismissed_ids(user["id"]))
|
||
for item in items:
|
||
votes = repo.get_votes(item["id"])
|
||
item["upvotes"] = votes["upvotes"]
|
||
item["downvotes"] = votes["downvotes"]
|
||
item["score"] = votes["upvotes"] - votes["downvotes"]
|
||
item["dismissed_by_me"] = item["id"] in dismissed_set
|
||
|
||
import math
|
||
total_count = repo.count_items(
|
||
search=search,
|
||
statuses=statuses,
|
||
category=category,
|
||
domain=domain,
|
||
source_type=source_type,
|
||
is_required=is_required,
|
||
exclude_personal=effective_exclude_personal,
|
||
user_groups=effective_groups,
|
||
granted_domains=granted_domains,
|
||
dismissed_by_user=dismissed_by_user_id,
|
||
hide_dismissed=hide_dismissed,
|
||
)
|
||
total_pages = math.ceil(total_count / per_page) if per_page > 0 else 1
|
||
|
||
return {
|
||
"items": items,
|
||
"count": len(items),
|
||
"page": page,
|
||
"per_page": per_page,
|
||
"total_count": total_count,
|
||
"total_pages": total_pages,
|
||
}
|
||
|
||
|
||
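The pagination arithmetic used by the list endpoint can be isolated into a small helper. This is only a sketch: `page_window` is a hypothetical name, not a function in this module, and it mirrors the clamping and the `per_page > 0` guard shown above.

```python
import math
from typing import Tuple


def page_window(page: int, per_page: int, total_count: int) -> Tuple[int, int]:
    """Return (offset, total_pages) for a 1-indexed page number."""
    page = max(page, 1)  # clamp page=0 or negative values to the first page
    offset = (page - 1) * per_page
    # Guard against division by zero when a caller passes per_page=0.
    total_pages = math.ceil(total_count / per_page) if per_page > 0 else 1
    return offset, total_pages
```

With 120 matching rows and `per_page=50`, page 3 starts at offset 100 and is the last of three pages.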
@router.get("/stats")
|
||
async def get_stats(
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
"""Get corporate memory statistics.
|
||
|
||
Aggregations exclude personal items for non-privileged callers — otherwise
|
||
`total` and the `by_*` counts would change in observable ways when a
|
||
colleague flags or unflags a personal item, leaking existence info per
|
||
ADR Decision 1.
|
||
|
||
Uses SQL aggregation rather than ``repo.list_items()`` to keep the
|
||
endpoint cheap on large knowledge bases (the loader path materializes
|
||
every row + parses JSON tags/contributors per row, which blocks the
|
||
event loop on N>1k items). Audience filter mirrors what list_items
|
||
applies: ``audience IS NULL OR audience = 'all'`` plus, for non-admins,
|
||
membership in any of the caller's group-prefixed audiences.
|
||
"""
|
||
is_priv = _is_privileged_viewer(user, conn)
|
||
groups = _effective_groups(user, conn)
|
||
granted_domains = _caller_granted_memory_domains(user, conn)
|
||
|
||
where_clauses: List[str] = []
|
||
params: list = []
|
||
if not is_priv:
|
||
# Personal-item privacy: non-privileged callers see no personal items
|
||
# in the aggregate, even their own. /my-contributions is the canonical
|
||
# surface for a user's personal contributions; including them here
|
||
# would make /api/memory/stats.total disagree with the count visible
|
||
# via GET /api/memory (which forces exclude_personal=True for non-
|
||
# admins regardless of source_user).
|
||
where_clauses.append("(is_personal IS NULL OR is_personal = FALSE)")
|
||
|
||
if groups is not None:
|
||
# Mirror the visibility composition KnowledgeRepository.list_items
|
||
# uses: audience match OR MEMORY_DOMAIN grant. Without this the
|
||
# stats `total` diverges from the list endpoint's `total_count` for
|
||
# non-admin users with grants. v49: granted_domains values are
|
||
# ``memory_domains.id`` and resolve via the junction EXISTS subquery.
|
||
visibility = ["audience IS NULL", "audience = 'all'"]
|
||
if groups:
|
||
placeholders = ",".join(["?"] * len(groups))
|
||
visibility.append(f"audience IN ({placeholders})")
|
||
params.extend(groups)
|
||
if granted_domains:
|
||
domain_placeholders = ",".join(["?"] * len(granted_domains))
|
||
visibility.append(
|
||
"EXISTS (SELECT 1 FROM knowledge_item_domains kid "
|
||
"WHERE kid.item_id = knowledge_items.id "
|
||
f"AND kid.domain_id IN ({domain_placeholders}))"
|
||
)
|
||
params.extend(granted_domains)
|
||
where_clauses.append("(" + " OR ".join(visibility) + ")")
|
||
|
||
where_sql = (" WHERE " + " AND ".join(where_clauses)) if where_clauses else ""
|
||
|
||
total = conn.execute(
|
||
f"SELECT COUNT(*) FROM knowledge_items{where_sql}", params
|
||
).fetchone()[0] or 0
|
||
|
||
by_status_rows = conn.execute(
|
||
f"SELECT COALESCE(status, 'unknown') AS s, COUNT(*) "
|
||
f"FROM knowledge_items{where_sql} GROUP BY s",
|
||
params,
|
||
).fetchall()
|
||
by_status = {r[0]: r[1] for r in by_status_rows}
|
||
|
||
cat_rows = conn.execute(
|
||
f"SELECT DISTINCT category FROM knowledge_items{where_sql} "
|
||
f"{'AND' if where_sql else 'WHERE'} category IS NOT NULL",
|
||
params,
|
||
).fetchall()
|
||
categories = sorted(r[0] for r in cat_rows if r[0])
|
||
|
||
# v49: domain lives in the junction. LEFT JOIN to surface 'unset' bucket
|
||
# for items without any domain row, matching the pre-v49 COALESCE behavior.
|
||
by_domain_rows = conn.execute(
|
||
"SELECT COALESCE(md.slug, 'unset') AS d, COUNT(*) "
|
||
"FROM knowledge_items "
|
||
"LEFT JOIN knowledge_item_domains kid ON kid.item_id = knowledge_items.id "
|
||
"LEFT JOIN memory_domains md ON md.id = kid.domain_id"
|
||
+ (where_sql or "")
|
||
+ " GROUP BY d",
|
||
params,
|
||
).fetchall()
|
||
by_domain = {r[0]: r[1] for r in by_domain_rows}
|
||
|
||
by_source_rows = conn.execute(
|
||
f"SELECT COALESCE(source_type, 'unknown') AS st, COUNT(*) "
|
||
f"FROM knowledge_items{where_sql} GROUP BY st",
|
||
params,
|
||
).fetchall()
|
||
by_source_type = {r[0]: r[1] for r in by_source_rows}
|
||
|
||
# by_tag + by_audience extend stats for the chip-filter UI (issue #62).
|
||
# The repo helpers honor the same audience + personal-item filters this
|
||
# endpoint applies above.
|
||
repo = KnowledgeRepository(conn)
|
||
exclude_personal_for_caller = not is_priv
|
||
by_tag = repo.count_by_tag(
|
||
exclude_personal=exclude_personal_for_caller,
|
||
user_groups=groups,
|
||
granted_domains=granted_domains,
|
||
)
|
||
by_audience = repo.count_by_audience(
|
||
exclude_personal=exclude_personal_for_caller,
|
||
user_groups=groups,
|
||
granted_domains=granted_domains,
|
||
)
|
||
|
||
return {
|
||
"total": total,
|
||
"by_status": by_status,
|
||
"categories": categories,
|
||
"by_domain": by_domain,
|
||
"by_source_type": by_source_type,
|
||
"by_tag": by_tag,
|
||
"by_audience": by_audience,
|
||
}
|
||
|
||
|
||
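The WHERE-clause composition in the stats endpoint is easier to follow in isolation. The sketch below restates it as a pure function; `build_stats_where` is a hypothetical name introduced here, and the clause text is copied from the endpoint rather than from `KnowledgeRepository`, whose internals are not shown.

```python
from typing import List, Optional, Tuple


def build_stats_where(
    is_priv: bool,
    groups: Optional[List[str]],
    granted_domains: Optional[List[str]],
) -> Tuple[str, list]:
    """Return (where_sql, params); ``groups is None`` means no audience filter."""
    clauses: List[str] = []
    params: list = []
    if not is_priv:
        clauses.append("(is_personal IS NULL OR is_personal = FALSE)")
    if groups is not None:
        # Visibility is an OR of audience match and memory-domain grant.
        visibility = ["audience IS NULL", "audience = 'all'"]
        if groups:
            marks = ",".join("?" * len(groups))
            visibility.append(f"audience IN ({marks})")
            params.extend(groups)
        if granted_domains:
            marks = ",".join("?" * len(granted_domains))
            visibility.append(
                "EXISTS (SELECT 1 FROM knowledge_item_domains kid "
                "WHERE kid.item_id = knowledge_items.id "
                f"AND kid.domain_id IN ({marks}))"
            )
            params.extend(granted_domains)
        clauses.append("(" + " OR ".join(visibility) + ")")
    where_sql = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    return where_sql, params
```

Parameters are appended in the same order as their placeholders appear in the SQL, which is what lets the endpoint reuse one `params` list across several aggregate queries.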
@router.post("", status_code=201)
|
||
async def create_knowledge(
|
||
request: CreateKnowledgeRequest,
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
# Mirror the validation already enforced by PATCH /admin/{id} and bulk-update
|
||
# so an item can't be created with a domain it can't be patched to. Empty /
|
||
# missing domain is fine — only reject non-empty values outside the allowlist.
|
||
# See PR #126 review.
|
||
_validate_domain_slug(request.domain_slug, conn)
|
||
repo = KnowledgeRepository(conn)
|
||
item_id = str(uuid.uuid4())
|
||
|
||
# Best-effort auto-tagging — runs only when an LLM extractor is configured.
|
||
tags = list(request.tags) if request.tags else []
|
||
try:
|
||
from config.loader import load_instance_config
|
||
from connectors.llm import create_extractor
|
||
from services.corporate_memory.tagger import auto_tag_items
|
||
cfg = load_instance_config()
|
||
ai_cfg = cfg.get("ai")
|
||
if ai_cfg:
|
||
extractor = create_extractor(ai_cfg)
|
||
stub = [{"id": item_id, "title": request.title, "content": request.content}]
|
||
assignments = await asyncio.to_thread(auto_tag_items, stub, extractor)
|
||
topics = assignments.get(item_id, [])
|
||
if topics:
|
||
seen: set[str] = set()
|
||
merged: list[str] = []
|
||
for t in topics + tags:
|
||
if t not in seen:
|
||
seen.add(t)
|
||
merged.append(t)
|
||
tags = merged
|
||
except Exception:
|
||
pass # tagging is non-critical — never block item creation
|
||
|
||
create_kwargs = dict(
|
||
id=item_id,
|
||
title=request.title,
|
||
content=request.content,
|
||
category=request.category,
|
||
source_user=user.get("email"),
|
||
tags=tags or None,
|
||
domain=request.domain_slug,
|
||
entities=request.entities,
|
||
confidence=0.50,
|
||
)
|
||
if request.source_type:
|
||
create_kwargs["source_type"] = request.source_type
|
||
repo.create(**create_kwargs)
|
||
return {"id": item_id, "status": "pending"}
|
||
|
||
|
||
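The tag merge in `create_knowledge` keeps auto-assigned topics first and drops repeats while preserving order. A compact way to express the same idea, offered as a sketch with a hypothetical `merge_tags` helper, relies on `dict` preserving insertion order. Unlike the endpoint, it also deduplicates when no topics were assigned.

```python
from typing import Iterable, List


def merge_tags(auto_topics: Iterable[str], user_tags: Iterable[str]) -> List[str]:
    """Auto-assigned topics first, then user tags, dropping repeats in order."""
    # dict keys keep insertion order (Python 3.7+), so fromkeys dedupes stably.
    return list(dict.fromkeys([*auto_topics, *user_tags]))
```

For example, merging topics `["finance", "kpi"]` with user tags `["kpi", "q3"]` keeps a single `"kpi"` in its first position.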
@router.post("/{item_id}/vote")
|
||
async def vote_knowledge(
|
||
item_id: str,
|
||
request: VoteRequest,
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
if request.vote not in (1, -1, 0):
|
||
raise HTTPException(status_code=400, detail="Vote must be 1, -1, or 0 (retract)")
|
||
repo = KnowledgeRepository(conn)
|
||
item = repo.get_by_id(item_id)
|
||
if not item or not _can_view_item(user, item, _is_privileged_viewer(user, conn)):
|
||
raise HTTPException(status_code=404, detail="Knowledge item not found")
|
||
if request.vote == 0:
|
||
repo.unvote(item_id, user["id"])
|
||
else:
|
||
repo.vote(item_id, user["id"], request.vote)
|
||
return repo.get_votes(item_id)
|
||
|
||
|
||
@router.get("/my-votes")
|
||
async def get_my_votes(
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
"""Get current user's votes on all items."""
|
||
results = conn.execute(
|
||
"SELECT item_id, vote FROM knowledge_votes WHERE user_id = ?", [user["id"]]
|
||
).fetchall()
|
||
return {row[0]: row[1] for row in results}
|
||
|
||
|
||
@router.get("/my-contributions")
|
||
async def get_my_contributions(
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
"""Get knowledge items contributed by the current user."""
|
||
repo = KnowledgeRepository(conn)
|
||
email = user.get("email", "")
|
||
items = repo.get_user_contributions(email)
|
||
for item in items:
|
||
votes = repo.get_votes(item["id"])
|
||
item["upvotes"] = votes["upvotes"]
|
||
item["downvotes"] = votes["downvotes"]
|
||
item["score"] = votes["upvotes"] - votes["downvotes"]
|
||
return {"items": items, "count": len(items)}
|
||
|
||
|
||
@router.post("/{item_id}/personal")
|
||
async def toggle_personal_flag(
|
||
item_id: str,
|
||
request: PersonalFlagRequest,
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
"""Toggle personal/excluded flag on a knowledge item (only by the contributor)."""
|
||
repo = KnowledgeRepository(conn)
|
||
item = repo.get_by_id(item_id)
|
||
if not item:
|
||
raise HTTPException(status_code=404, detail="Knowledge item not found")
|
||
if item.get("source_user") != user.get("email"):
|
||
raise HTTPException(status_code=403, detail="Only the contributor can flag personal items")
|
||
repo.set_personal(item_id, request.is_personal)
|
||
return {"id": item_id, "is_personal": request.is_personal}
|
||
|
||
|
||
@router.post("/{item_id}/dismiss")
|
||
async def dismiss_item(
|
||
item_id: str,
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
"""Per-user opt-out — remove an item from the caller's AI bundle.
|
||
|
||
Idempotent: re-dismissing an already-dismissed item is a no-op success.
|
||
Mandatory items can never be dismissed — the governance hard rule —
|
||
so a POST against one returns 400 with a clear detail message.
|
||
"""
|
||
repo = KnowledgeRepository(conn)
|
||
item = repo.get_by_id(item_id)
|
||
if not item or not _can_view_item(user, item, _is_privileged_viewer(user, conn)):
|
||
raise HTTPException(status_code=404, detail="Knowledge item not found")
|
||
# v49: Required tier rides on ``is_required`` (was status='mandatory').
|
||
if item.get("is_required") is True:
|
||
raise HTTPException(status_code=400, detail="Cannot dismiss a mandatory item")
|
||
repo.dismiss(user["id"], item_id)
|
||
# v49 Section 9.2 — telemetry. domain_ids surfaces the per-item domain
|
||
# membership so /admin/telemetry can correlate dismissals with the
|
||
# domain they came from.
|
||
try:
|
||
from src.repositories.memory_domains import MemoryDomainsRepository
|
||
from src.repositories.usage import UsageRepository
|
||
domain_ids = [
|
||
d["id"] for d in MemoryDomainsRepository(conn).list_domains_of_item(item_id)
|
||
]
|
||
UsageRepository(conn).emit_server_event(
|
||
event_type="memory.dismiss",
|
||
user_id=user["id"],
|
||
username=user.get("email") or user["id"],
|
||
props={"item_id": item_id, "domain_ids": domain_ids},
|
||
)
|
||
except Exception:
|
||
pass
|
||
return {"id": item_id, "dismissed": True}
|
||
|
||
|
||
@router.delete("/{item_id}/dismiss", status_code=204)
|
||
async def undismiss_item(
|
||
item_id: str,
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
"""Idempotent un-dismiss — a second DELETE still returns 204.
|
||
|
||
Returns 404 if the item itself doesn't exist (consistent with the rest
|
||
of the per-item endpoints); the dismissal row's existence is not
|
||
consulted because absence is the success state.
|
||
"""
|
||
repo = KnowledgeRepository(conn)
|
||
item = repo.get_by_id(item_id)
|
||
if not item or not _can_view_item(user, item, _is_privileged_viewer(user, conn)):
|
||
raise HTTPException(status_code=404, detail="Knowledge item not found")
|
||
repo.undismiss(user["id"], item_id)
|
||
# v49 Section 9.2 — telemetry. Best-effort fire-and-forget. Endpoint
|
||
# returns 204 No Content (the decorator status_code overrides any
|
||
# body), so no return value needed; telemetry is the only side effect
|
||
# we still want.
|
||
try:
|
||
from src.repositories.usage import UsageRepository
|
||
UsageRepository(conn).emit_server_event(
|
||
event_type="memory.undismiss",
|
||
user_id=user["id"],
|
||
username=user.get("email") or user["id"],
|
||
props={"item_id": item_id},
|
||
)
|
||
except Exception:
|
||
pass
|
||
|
||
|
||
@router.get("/{item_id}/provenance")
|
||
async def get_provenance(
|
||
item_id: str,
|
||
user: dict = Depends(get_current_user),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
"""Get source provenance for a knowledge item."""
|
||
repo = KnowledgeRepository(conn)
|
||
item = repo.get_by_id(item_id)
|
||
if not item or not _can_view_item(user, item, _is_privileged_viewer(user, conn)):
|
||
raise HTTPException(status_code=404, detail="Knowledge item not found")
|
||
return {
|
||
"id": item_id,
|
||
"source_type": item.get("source_type"),
|
||
"source_ref": item.get("source_ref"),
|
||
"source_user": item.get("source_user"),
|
||
"confidence": item.get("confidence"),
|
||
"domain": item.get("domain"),
|
||
"entities": item.get("entities"),
|
||
"valid_from": item.get("valid_from"),
|
||
"valid_until": item.get("valid_until"),
|
||
"supersedes": item.get("supersedes"),
|
||
"created_at": item.get("created_at"),
|
||
}
|
||
|
||
|
||
# ---- Admin governance endpoints ----
|
||
|
||
def _get_item_or_404(repo: KnowledgeRepository, item_id: str) -> dict:
|
||
item = repo.get_by_id(item_id)
|
||
if not item:
|
||
raise HTTPException(status_code=404, detail="Knowledge item not found")
|
||
return item
|
||
|
||
|
||
def _audit_action(conn, admin_email: str, action: str, item_id: str, details: Optional[dict] = None):
    """Write an admin governance audit row.

    Action names use the ``corporate_memory.<action>`` namespace as advertised
    in the 0.15.0 CHANGELOG. Pre-#62 the code wrote ``km_<action>``; the
    audit-tab filter (see ``admin_audit`` below) accepts both prefixes so
    historical rows still surface.
    """
    audit = AuditRepository(conn)
    audit.log(
        user_id=admin_email,
        action=f"corporate_memory.{action}",
        resource=item_id,
        params=details,
    )
|
||
|
||
|
||
@router.post("/admin/approve")
|
||
async def admin_approve(
|
||
item_id: str,
|
||
user: dict = Depends(require_admin),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
repo = KnowledgeRepository(conn)
|
||
_get_item_or_404(repo, item_id)
|
||
repo.update_status(item_id, "approved")
|
||
_audit_action(conn, user["email"], "approve", item_id)
|
||
return {"id": item_id, "status": "approved"}
|
||
|
||
|
||
@router.post("/admin/reject")
|
||
async def admin_reject(
|
||
item_id: str,
|
||
request: AdminActionRequest,
|
||
user: dict = Depends(require_admin),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
repo = KnowledgeRepository(conn)
|
||
_get_item_or_404(repo, item_id)
|
||
repo.update_status(item_id, "rejected")
|
||
_audit_action(conn, user["email"], "reject", item_id, {"reason": request.reason})
|
||
return {"id": item_id, "status": "rejected"}
|
||
|
||
|
||
@router.post("/admin/mandate")
async def admin_mandate(
    item_id: str,
    request: AdminActionRequest,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """v49: the Required tier rides on the ``knowledge_items.is_required``
    boolean; ``status`` is reserved for lifecycle (pending/approved/rejected/
    revoked/expired). This endpoint keeps the path stable for back-compat. The
    response surfaces ``is_required: True`` alongside the legacy
    ``status: 'mandatory'`` key.
    """
    repo = KnowledgeRepository(conn)
    _get_item_or_404(repo, item_id)
    repo.set_is_required(item_id, True)
    if request.audience is not None:
        repo.update(item_id, audience=request.audience)
    _audit_action(conn, user["email"], "mandate", item_id, {
        "reason": request.reason, "audience": request.audience,
    })
    # v49 Section 9.1: the spec table maps both mark-mandatory and the legacy
    # mandate endpoint to the canonical ``memory_item.set_required`` action
    # with a boolean payload so audit consumers can stop splitting on path.
    try:
        AuditRepository(conn).log(
            user_id=user["email"],
            action="memory_item.set_required",
            resource=f"knowledge_item:{item_id}",
            params={"new_value": True},
        )
    except Exception:
        pass  # best-effort: never fail the request on this second audit write
    return {"id": item_id, "is_required": True, "status": "mandatory"}
|
||
|
||
|
||
@router.post("/items/{item_id}/mark-mandatory")
|
||
async def mark_mandatory(
|
||
item_id: str,
|
||
user: dict = Depends(require_admin),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
"""Promote an item to required (``is_required = TRUE``).
|
||
|
||
v49: explicit path-segment variant of the legacy ``/admin/mandate`` query-
|
||
param endpoint, matching the spec's Section 6 mapping table. Same audit
|
||
pattern but no audience / reason fields — those stay on /admin/mandate.
|
||
"""
|
||
repo = KnowledgeRepository(conn)
|
||
_get_item_or_404(repo, item_id)
|
||
repo.set_is_required(item_id, True)
|
||
AuditRepository(conn).log(
|
||
user_id=user["email"],
|
||
action="memory_item.set_required",
|
||
resource=f"knowledge_item:{item_id}",
|
||
params={"new_value": True},
|
||
)
|
||
return {"id": item_id, "is_required": True}
|
||
|
||
|
||
@router.post("/items/{item_id}/mark-unmandatory")
|
||
async def mark_unmandatory(
|
||
item_id: str,
|
||
user: dict = Depends(require_admin),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
"""Demote an item from required (``is_required = FALSE``).
|
||
|
||
v49 — inverse of mark-mandatory. The item stays in the catalog with its
|
||
existing ``status`` (typically ``approved``); only the required-tier flag
|
||
flips. Audit row writes ``memory_item.set_required`` with
|
||
``{new_value: false}``.
|
||
"""
|
||
repo = KnowledgeRepository(conn)
|
||
_get_item_or_404(repo, item_id)
|
||
repo.set_is_required(item_id, False)
|
||
AuditRepository(conn).log(
|
||
user_id=user["email"],
|
||
action="memory_item.set_required",
|
||
resource=f"knowledge_item:{item_id}",
|
||
params={"new_value": False},
|
||
)
|
||
return {"id": item_id, "is_required": False}
|
||
|
||
|
||
@router.post("/admin/revoke")
|
||
async def admin_revoke(
|
||
item_id: str,
|
||
request: AdminActionRequest,
|
||
user: dict = Depends(require_admin),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
repo = KnowledgeRepository(conn)
|
||
_get_item_or_404(repo, item_id)
|
||
repo.update_status(item_id, "revoked")
|
||
_audit_action(conn, user["email"], "revoke", item_id, {"reason": request.reason})
|
||
return {"id": item_id, "status": "revoked"}
|
||
|
||
|
||
@router.post("/admin/edit")
|
||
async def admin_edit(
|
||
item_id: str,
|
||
request: EditRequest,
|
||
user: dict = Depends(require_admin),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
repo = KnowledgeRepository(conn)
|
||
_get_item_or_404(repo, item_id)
|
||
updates = {}
|
||
if request.title is not None:
|
||
updates["title"] = request.title
|
||
if request.content is not None:
|
||
updates["content"] = request.content
|
||
if updates:
|
||
repo.update(item_id, **updates)
|
||
_audit_action(conn, user["email"], "edit", item_id, updates)
|
||
return {"id": item_id, "updated": list(updates.keys())}
|
||
|
||
|
||
@router.post("/admin/batch")
|
||
async def admin_batch(
|
||
request: BatchActionRequest,
|
||
user: dict = Depends(require_admin),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
"""Batch governance action on multiple items.
|
||
|
||
v49: ``mandate`` flips the new ``is_required`` boolean to TRUE (was
|
||
``status='mandatory'`` overload). Other actions still drive ``status``.
|
||
"""
|
||
repo = KnowledgeRepository(conn)
|
||
# mandate is special — it writes is_required, not status. All other
|
||
# actions stay on the status lifecycle column.
|
||
status_actions = {
|
||
"approve": "approved",
|
||
"reject": "rejected",
|
||
"revoke": "revoked",
|
||
}
|
||
if request.action not in (*status_actions, "mandate"):
|
||
raise HTTPException(status_code=400, detail=f"Invalid action: {request.action}")
|
||
|
||
results = {"success": [], "not_found": []}
|
||
for item_id in request.item_ids:
|
||
item = repo.get_by_id(item_id)
|
||
if not item:
|
||
results["not_found"].append(item_id)
|
||
continue
|
||
if request.action == "mandate":
|
||
repo.set_is_required(item_id, True)
|
||
if request.audience is not None:
|
||
repo.update(item_id, audience=request.audience)
|
||
else:
|
||
repo.update_status(item_id, status_actions[request.action])
|
||
_audit_action(conn, user["email"], request.action, item_id, {
|
||
"reason": request.reason, "audience": request.audience, "batch": True,
|
||
})
|
||
results["success"].append(item_id)
|
||
|
||
return results
|
||
|
||
|
||
@router.get("/admin/pending")
|
||
async def admin_pending(
|
||
category: Optional[str] = None,
|
||
page: int = 1,
|
||
per_page: int = 50,
|
||
user: dict = Depends(require_admin),
|
||
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
|
||
):
|
||
"""Get pending items queue for admin review."""
|
||
repo = KnowledgeRepository(conn)
|
||
page = max(page, 1)
|
||
offset = (page - 1) * per_page
|
||
items = repo.list_items(statuses=["pending"], category=category, limit=per_page, offset=offset)
|
||
return {"items": items, "count": len(items)}
|
||
|
||
|
||
@router.get("/admin/audit")
async def admin_audit(
    page: int = 1,
    per_page: int = 50,
    action: Optional[str] = None,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Get governance audit log.

    Filters ``corporate_memory.<action>`` rows AND legacy ``km_<action>``
    rows. The dual prefix is here because rows already in the audit log keep
    the legacy ``km_*`` action name (historical audit rows are write-once and
    are not migrated); new rows use the ``corporate_memory.*`` namespace.
    See issue #62 decision E.
    """
    # Pagination: page is 1-indexed; offset must apply to BOTH branches so the
    # UI's per-page navigation actually returns subsequent rows. Pre-fix, both
    # SQL paths had LIMIT only and silently returned page 1 for every page.
    offset = (max(page, 1) - 1) * per_page
    if action:
        # Match the action across both prefixes so the per-action filter still
        # surfaces historical rows.
        rows = conn.execute(
            """SELECT * FROM audit_log
               WHERE action IN (?, ?)
               ORDER BY timestamp DESC LIMIT ? OFFSET ?""",
            [f"corporate_memory.{action}", f"km_{action}", per_page, offset],
        ).fetchall()
    else:
        # ``_`` is a single-character wildcard in LIKE, so it is escaped here
        # to keep both prefixes literal.
        rows = conn.execute(
            """SELECT * FROM audit_log
               WHERE action LIKE 'corporate!_memory.%' ESCAPE '!'
                  OR action LIKE 'km!_%' ESCAPE '!'
               ORDER BY timestamp DESC LIMIT ? OFFSET ?""",
            [per_page, offset],
        ).fetchall()
    if rows:
        columns = [desc[0] for desc in conn.description]
        entries = [dict(zip(columns, row)) for row in rows]
    else:
        entries = []
    return {"entries": entries, "count": len(entries)}
|
||
|
||
|
||
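The dual-prefix audit filter can be tried out against an in-memory table. The sketch below uses the standard library's `sqlite3` as a stand-in for DuckDB, with a one-column `audit_log` table and made-up action names (`kms.rotate_key` in particular is hypothetical). It also illustrates that `_` is a single-character wildcard in `LIKE`, so an unescaped `km_%` pattern matches more than the literal `km_` prefix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_log (action TEXT)")
conn.executemany(
    "INSERT INTO audit_log VALUES (?)",
    [("corporate_memory.approve",), ("km_approve",), ("kms.rotate_key",), ("login",)],
)


def actions(where: str) -> list:
    """Return matching action names, sorted, for a trusted WHERE fragment."""
    sql = f"SELECT action FROM audit_log WHERE {where} ORDER BY action"
    return [r[0] for r in conn.execute(sql)]


# Unescaped: the `_` in 'km_%' matches any character, so 'kms.rotate_key' slips in.
loose = actions("action LIKE 'corporate_memory.%' OR action LIKE 'km_%'")

# Escaped: with ESCAPE '!', the sequence `!_` is a literal underscore.
strict = actions(
    "action LIKE 'corporate!_memory.%' ESCAPE '!' OR action LIKE 'km!_%' ESCAPE '!'"
)
```

Here `loose` contains three rows while `strict` contains only the two governance rows.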


# ---- Admin contradiction endpoints ----


@router.get("/admin/contradictions")
async def admin_contradictions(
    resolved: Optional[bool] = None,
    exclude_personal: bool = True,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """List knowledge contradictions for admin review.

    By default (`exclude_personal=True`), personal items are replaced with
    {id, hidden: true} so the contradiction record is still visible for
    governance but personal content is not exposed. Pass exclude_personal=false
    to opt in to full content (KM_ADMIN only — see ADR Decision 1).
    """
    repo = KnowledgeRepository(conn)
    contradictions = repo.list_contradictions(resolved=resolved)
    # Collect all distinct item IDs and fetch in one query (M5 batch optimisation).
    all_item_ids = list({
        id_
        for c in contradictions
        for id_ in (c["item_a_id"], c["item_b_id"])
    })
    items_by_id = repo.get_by_ids(all_item_ids)
    for c in contradictions:
        item_a = items_by_id.get(c["item_a_id"])
        item_b = items_by_id.get(c["item_b_id"])
        if exclude_personal:
            c["item_a"] = {"id": c["item_a_id"], "hidden": True} if item_a and item_a.get("is_personal") else item_a
            c["item_b"] = {"id": c["item_b_id"], "hidden": True} if item_b and item_b.get("is_personal") else item_b
        else:
            c["item_a"] = item_a
            c["item_b"] = item_b
    return {"contradictions": contradictions, "count": len(contradictions)}


@router.post("/admin/contradictions", status_code=201)
async def admin_create_contradiction(
    request: CreateContradictionRequest,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Admin endpoint for manually recording a contradiction between two knowledge items."""
    repo = KnowledgeRepository(conn)
    if not repo.get_by_id(request.item_a_id):
        raise HTTPException(status_code=404, detail=f"Item A not found: {request.item_a_id}")
    if not repo.get_by_id(request.item_b_id):
        raise HTTPException(status_code=404, detail=f"Item B not found: {request.item_b_id}")

    cid = repo.create_contradiction(
        item_a_id=request.item_a_id,
        item_b_id=request.item_b_id,
        explanation=request.explanation,
        severity=request.severity,
        suggested_resolution=request.suggested_resolution,
    )
    return {"id": cid}


@router.post("/admin/contradictions/{contradiction_id}/resolve")
async def admin_resolve_contradiction(
    contradiction_id: str,
    request: ResolveContradictionRequest,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Resolve a knowledge contradiction."""
    repo = KnowledgeRepository(conn)
    contradiction = repo.get_contradiction(contradiction_id)
    if not contradiction:
        raise HTTPException(status_code=404, detail="Contradiction not found")
    if contradiction.get("resolved"):
        raise HTTPException(status_code=400, detail="Contradiction already resolved")

    valid_resolutions = ["kept_a", "kept_b", "merged", "both_valid"]
    if request.resolution not in valid_resolutions:
        raise HTTPException(
            status_code=400,
            detail=f"Resolution must be one of: {valid_resolutions}",
        )

    repo.resolve_contradiction(contradiction_id, user["email"], request.resolution)
    _audit_action(conn, user["email"], "resolve_contradiction", contradiction_id, {
        "resolution": request.resolution,
        "item_a_id": contradiction["item_a_id"],
        "item_b_id": contradiction["item_b_id"],
    })
    return {"id": contradiction_id, "resolved": True, "resolution": request.resolution}


# ---- Admin duplicate-candidate endpoints (issue #62) ----

VALID_DUPLICATE_RESOLUTIONS = ["duplicate", "different", "dismissed"]
DUPLICATE_RELATION_TYPE = "likely_duplicate"


def _strip_personal(item: Optional[dict], hide: bool) -> Optional[dict]:
    """Return a placeholder dict when ``item`` is personal and ``hide`` is set."""
    if item is None:
        return None
    if hide and item.get("is_personal"):
        return {"id": item.get("id"), "hidden": True}
    return item


@router.get("/admin/duplicate-candidates")
async def admin_duplicate_candidates(
    resolved: Optional[bool] = None,
    exclude_personal: bool = True,
    limit: int = 100,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """List duplicate-candidate relations for admin review.

    Pass ``resolved=true`` or ``resolved=false`` to filter; omit both to fetch
    every state (the original UI default). The web UI keeps surfacing the
    actionable backlog by passing ``resolved=false`` explicitly.

    With ``exclude_personal=true`` (default) personal items in the pair are
    replaced with ``{id, hidden: true}`` — the relation row is still visible
    so admins can resolve it, but content stays inside the personal-item
    privacy boundary (ADR Decision 1 precedent).
    """
    repo = KnowledgeRepository(conn)
    relations = repo.list_relations(
        relation_type=DUPLICATE_RELATION_TYPE,
        resolved=resolved,
        limit=limit,
    )
    item_ids = list({
        id_
        for r in relations
        for id_ in (r["item_a_id"], r["item_b_id"])
    })
    items_by_id = repo.get_by_ids(item_ids) if item_ids else {}
    for r in relations:
        r["item_a"] = _strip_personal(items_by_id.get(r["item_a_id"]), exclude_personal)
        r["item_b"] = _strip_personal(items_by_id.get(r["item_b_id"]), exclude_personal)
    return {"relations": relations, "count": len(relations)}


@router.post("/admin/duplicate-candidates/resolve")
async def admin_resolve_duplicate_candidate(
    item_a_id: str,
    item_b_id: str,
    request: ResolveDuplicateRequest,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Resolve a duplicate-candidate relation.

    Admin chooses: ``duplicate`` (acknowledge), ``different`` (false
    positive), or ``dismissed`` (don't surface again, but no judgment).
    Re-resolving an already-resolved pair is rejected with 400 because the
    audit trail wants one decision per pair.
    """
    if request.resolution not in VALID_DUPLICATE_RESOLUTIONS:
        raise HTTPException(
            status_code=400,
            detail=f"resolution must be one of: {VALID_DUPLICATE_RESOLUTIONS}",
        )
    repo = KnowledgeRepository(conn)
    existing = repo.get_relation(item_a_id, item_b_id, DUPLICATE_RELATION_TYPE)
    if not existing:
        raise HTTPException(status_code=404, detail="Duplicate-candidate relation not found")
    if existing.get("resolved"):
        raise HTTPException(status_code=400, detail="Relation already resolved")

    repo.resolve_relation(
        item_a_id=item_a_id,
        item_b_id=item_b_id,
        relation_type=DUPLICATE_RELATION_TYPE,
        resolved_by=user["email"],
        resolution=request.resolution,
    )
    # Resource-id of audit row is the canonical (a,b) pair for grep-ability.
    a, b = sorted([item_a_id, item_b_id])
    _audit_action(
        conn,
        user["email"],
        "resolve_duplicate",
        f"{a}::{b}",
        {"resolution": request.resolution, "item_a_id": a, "item_b_id": b},
    )
    return {
        "item_a_id": a,
        "item_b_id": b,
        "resolved": True,
        "resolution": request.resolution,
    }
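

# Illustrative sketch, not part of the module: why the audit resource id for a
# duplicate pair is built from the sorted ids. Sorting makes the key independent
# of the order in which the caller passed the two items. The helper name
# ``canonical_pair_key`` is an assumption made for this example; the endpoint
# above inlines the same two lines.

```python
def canonical_pair_key(item_a_id: str, item_b_id: str) -> str:
    """Build an order-independent key for an unordered pair of item ids."""
    a, b = sorted([item_a_id, item_b_id])
    return f"{a}::{b}"


# Either argument order produces the same audit resource id.
forward = canonical_pair_key("item-007", "item-042")
reverse = canonical_pair_key("item-042", "item-007")
```

# A single grep for the key then finds the audit row regardless of which item
# the admin UI happened to list first.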


# ---- Admin PATCH + bulk-update + tree endpoints (issue #62) ----


@router.get("/admin/{item_id}")
async def admin_get_item(
    item_id: str,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Single-item GET — powers the ``#item-<id>`` deep link from
    `/memory/d/<slug>`'s Edit affordance. The admin page uses this to
    fetch the row directly (bypassing pagination of the All-Items list)
    so the edit modal opens reliably regardless of which page the item
    happens to fall on. Returns the same dict shape as the list rows.

    Route placement note: declared AFTER all named ``/admin/<word>`` GET
    routes (pending, audit, contradictions, duplicate-candidates) so the
    catch-all ``{item_id}`` doesn't shadow them — FastAPI matches in
    declaration order.
    """
    repo = KnowledgeRepository(conn)
    item = repo.get_by_id(item_id)
    if not item:
        raise HTTPException(status_code=404, detail="item_not_found")
    return item


@router.patch("/admin/{item_id}")
async def admin_patch_item(
    item_id: str,
    request: PatchItemRequest,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Partial update — accepts category/domain/tags/audience/title/content.

    Replaces the narrow ``POST /api/memory/admin/edit`` (kept one release as
    a thin alias). Audit row tagged ``corporate_memory.update_item`` records
    which fields changed (not the full diff — keep audit rows compact).
    """
    repo = KnowledgeRepository(conn)
    _get_item_or_404(repo, item_id)

    # ``exclude_unset=True`` preserves explicit ``null`` values from the request
    # body so callers can clear previously-set Optional fields (e.g. PATCH
    # ``{"audience": null}`` resets audience to NULL). With ``exclude_none=True``
    # those nulls were silently dropped — the only path to clear was the
    # empty-string short-circuit on ``domain``, and ``audience`` had no clearing
    # path at all. See PR #126 round-4 review.
    updates = request.model_dump(exclude_unset=True)
    if "domain" in updates and updates["domain"]:
        _validate_domain_slug(updates["domain"], conn)
    # ``title`` is NOT NULL in the schema. ``exclude_unset=True`` lets explicit
    # ``null`` through, which would 500 on a DuckDB constraint violation. Reject
    # at the boundary so the caller gets a 400 with a clear message instead.
    if "title" in updates and updates["title"] is None:
        raise HTTPException(status_code=400, detail="title cannot be null")

    # M:N domain membership lives in ``knowledge_item_domains`` and is
    # written via a separate junction repo call — strip from the legacy
    # ``repo.update(**)`` kwargs since the knowledge_items row has no
    # ``domain_ids`` column.
    domain_ids = updates.pop("domain_ids", None)

    if not updates and domain_ids is None:
        return {"id": item_id, "updated": []}

    # tags is a list — JSON-encode to match the column type, mirroring create().
    repo_kwargs = dict(updates)
    if "tags" in repo_kwargs:
        repo_kwargs["tags"] = (
            json.dumps(repo_kwargs["tags"]) if repo_kwargs["tags"] else None
        )
    if repo_kwargs:
        repo.update(item_id, **repo_kwargs)

    # Junction write — replace the item's full domain membership atomically.
    # Resolve ids → slugs because replace_domains_for_item takes slugs;
    # unknown ids raise 400 (admin's chip-input only picks from
    # /api/admin/memory-domains so a missing id means a race or an
    # already-deleted domain).
    if domain_ids is not None:
        from src.repositories.memory_domains import MemoryDomainsRepository
        dom_repo = MemoryDomainsRepository(conn)
        if domain_ids:
            placeholders = ",".join(["?"] * len(domain_ids))
            rows = conn.execute(
                f"SELECT id, slug FROM memory_domains WHERE id IN ({placeholders})",
                domain_ids,
            ).fetchall()
            id_to_slug = {r[0]: r[1] for r in rows}
            missing = [i for i in domain_ids if i not in id_to_slug]
            if missing:
                raise HTTPException(
                    status_code=400,
                    detail=f"unknown_domain_ids: {missing}",
                )
            slugs = [id_to_slug[i] for i in domain_ids]
        else:
            slugs = []
        dom_repo.replace_domains_for_item(
            item_id, slugs, added_by=user["email"]
        )

    audit_keys = sorted(updates.keys())
    if domain_ids is not None:
        audit_keys.append("domain_ids")
    _audit_action(
        conn,
        user["email"],
        "update_item",
        item_id,
        {"updated_fields": audit_keys},
    )
    return {"id": item_id, "updated": audit_keys}


@router.post("/admin/bulk-update")
async def admin_bulk_update(
    request: BulkUpdateRequest,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Apply ``updates`` to every id in ``item_ids``. Per-id audit rows.

    Returns a per-id status map plus rolled-up convenience lists (200 even on
    partial failure — the body distinguishes successes from misses).
    """
    repo = KnowledgeRepository(conn)
    updates = dict(request.updates or {})
    # Reject governance-sensitive fields BEFORE hitting the repo. _UPDATABLE_FIELDS
    # in the repo is broad on purpose; this endpoint is the narrow path. Callers
    # that need to flip status/sensitivity/is_personal must use the dedicated
    # governance endpoints so the right audit row is written. See PR #126 review.
    disallowed = sorted(k for k in updates.keys() if k not in _BULK_UPDATE_ALLOWED)
    if disallowed:
        raise HTTPException(
            status_code=400,
            detail=(
                f"updates contains disallowed field(s): {disallowed}. "
                f"Allowed: {sorted(_BULK_UPDATE_ALLOWED)}"
            ),
        )
    if "domain" in updates and updates["domain"]:
        _validate_domain_slug(updates["domain"], conn)
    # Mirror the PATCH boundary check — title is NOT NULL in the schema, so
    # an explicit null here would fall through to a per-item Constraint Error
    # in repo.bulk_update() instead of a clean 400 to the caller.
    if "title" in updates and updates["title"] is None:
        raise HTTPException(status_code=400, detail="title cannot be null")
    if not request.item_ids:
        return {"updated": [], "not_found": [], "errors": {}}

    statuses = repo.bulk_update(request.item_ids, updates)

    # Allowlist already enforced above, so every key in updates is auditable.
    audited_fields = sorted(updates.keys())
    updated: List[str] = []
    not_found: List[str] = []
    errors: dict = {}
    for item_id, status in statuses.items():
        if status == "updated":
            updated.append(item_id)
            _audit_action(
                conn,
                user["email"],
                "bulk_update",
                item_id,
                {"updated_fields": audited_fields, "batch": True},
            )
        elif status == "not_found":
            not_found.append(item_id)
        else:
            errors[item_id] = status
    return {"updated": updated, "not_found": not_found, "errors": errors}


# Axes the tree endpoint groups by. Anything else → 400. Order matters for
# the default chip rendering in the UI.
_TREE_AXES = ("domain", "category", "tag", "audience")


def _label_for_axis(axis: str, key: Optional[str]) -> str:
    """Pretty bucket label for the tree UI. Falls back to the raw key."""
    if key is None or key == "":
        return {
            "domain": "(no domain)",
            "category": "(no category)",
            "tag": "(no tag)",
            "audience": "All users",
        }.get(axis, "(unset)")
    if axis == "audience" and key == "all":
        return "All users"
    if axis == "audience" and key.startswith("group:"):
        return f"Group: {key[len('group:'):]}"
    return key


def _matches_chip_filters(
    item: dict,
    *,
    status_filter: Optional[str],
    source_type: Optional[str],
    audience: Optional[str],
    has_duplicate_ids: Optional[set],
    q: Optional[str],
) -> bool:
    """Apply chip filters to an already-RBAC-filtered item."""
    if status_filter and item.get("status") != status_filter:
        return False
    if source_type and item.get("source_type") != source_type:
        return False
    if audience:
        # Treat NULL audience as 'all' so chip-filter behavior matches the
        # SQL audience filter, ``count_by_audience`` (COALESCE→'all'), and the
        # tree's ``_bucket_key`` (NULL → 'all'). Without this coalesce,
        # NULL-audience items disappear from the ``audience=all`` chip even
        # though the rest of the system treats them as visible-to-everyone.
        item_audience = item.get("audience") or "all"
        if item_audience != audience:
            return False
    if has_duplicate_ids is not None and item.get("id") not in has_duplicate_ids:
        return False
    if q:
        needle = q.lower()
        title = (item.get("title") or "").lower()
        content = (item.get("content") or "").lower()
        if needle not in title and needle not in content:
            return False
    return True


@router.get("/tree")
async def get_tree(
    axis: str = "domain",
    status_filter: Optional[str] = None,
    source_type: Optional[str] = None,
    audience: Optional[str] = None,
    q: Optional[str] = None,
    has_duplicate: bool = False,
    exclude_personal: bool = True,
    is_required: Optional[bool] = None,
    page: int = 1,
    per_page: int = 50,
    user: dict = Depends(get_current_user),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Server-side grouping for the Browse / Group-by tree (issue #62).

    Returns ``{groups: [{key, label, count, items: [...]}]}`` already
    RBAC-filtered + chip-filtered. The same ``_effective_groups`` and
    ``_can_view_item`` helpers used by ``GET /api/memory`` apply, so a
    non-admin caller never sees personal items belonging to others, and
    audience-restricted items only surface for members of the audience
    group.

    On the ``tag`` axis a single item appears once per tag it holds — that
    is the intended "overlapping bucket" affordance. Every other axis puts
    each item in its single canonical bucket.
    """
    if axis not in _TREE_AXES:
        raise HTTPException(
            status_code=400,
            detail=f"axis must be one of: {list(_TREE_AXES)}",
        )
    repo = KnowledgeRepository(conn)
    is_priv = _is_privileged_viewer(user, conn)
    effective_groups = _effective_groups(user, conn)
    # Privacy parity with ``GET /api/memory``: non-admin can never opt out.
    effective_exclude_personal = True if not is_priv else exclude_personal

    page = max(page, 1)
    per_page = max(min(per_page, 500), 1)

    # has_duplicate=true narrows the candidate set to items present in any
    # unresolved likely_duplicate relation. Computed once; intersected per
    # item below.
    has_duplicate_ids: Optional[set] = None
    if has_duplicate:
        rels = repo.list_relations(
            relation_type=DUPLICATE_RELATION_TYPE, resolved=False, limit=10000,
        )
        has_duplicate_ids = {
            id_ for r in rels for id_ in (r["item_a_id"], r["item_b_id"])
        }

    # Audience-axis privacy (decision 13): non-admins only see their own
    # group buckets + null/all. Use the audience pre-filter on the SQL side
    # so non-admins never accidentally see another group's bucket count.
    granted_domains = _caller_granted_memory_domains(user, conn)
    statuses = [status_filter] if status_filter else None
    items = repo.list_items(
        statuses=statuses,
        source_type=source_type,
        is_required=is_required,
        exclude_personal=effective_exclude_personal,
        user_groups=effective_groups,
        granted_domains=granted_domains,
        limit=10000,
        offset=0,
    )

    # Apply remaining chip filters that don't have a SQL layer yet.
    visible: List[dict] = []
    for item in items:
        if not _can_view_item(user, item, is_priv):
            continue
        if not _matches_chip_filters(
            item,
            status_filter=None,  # already in SQL
            source_type=None,  # already in SQL
            audience=audience,
            has_duplicate_ids=has_duplicate_ids,
            q=q,
        ):
            continue
        visible.append(item)

    # Group items by axis. tag is multi-bucket; everything else is single.
    groups: dict = {}

    def _bucket_key(item: dict, axis: str) -> List[str]:
        if axis == "tag":
            tags = item.get("tags")
            if isinstance(tags, str):
                try:
                    tags = json.loads(tags)
                except json.JSONDecodeError:
                    tags = []
            if not tags:
                return [""]
            return [str(t) for t in tags]
        if axis == "audience":
            aud = item.get("audience")
            return [aud if aud else "all"]
        if axis == "category":
            return [item.get("category") or ""]
        # default: domain
        return [item.get("domain") or ""]

    for item in visible:
        for key in _bucket_key(item, axis):
            bucket = groups.setdefault(key, {
                "key": key,
                "label": _label_for_axis(axis, key),
                "items": [],
            })
            bucket["items"].append(item)

    # Stable ordering: alphabetic on key with the empty bucket sinking to
    # the bottom — the UI usually wants the "real" buckets up top.
    ordered = sorted(
        groups.values(),
        key=lambda g: (g["key"] == "", g["key"].lower() if isinstance(g["key"], str) else ""),
    )
    for g in ordered:
        g["count"] = len(g["items"])

    # Page over groups (not items): operators paging through hundreds of
    # tags want bucket-level pagination. The UI expands a bucket to see all
    # its items.
    start = (page - 1) * per_page
    paged = ordered[start:start + per_page]
    return {
        "axis": axis,
        "groups": paged,
        "page": page,
        "per_page": per_page,
        "total_groups": len(ordered),
        "total_items": sum(g["count"] for g in ordered),
    }
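

# Illustrative sketch, not part of the module: the multi-bucket grouping and
# "empty bucket sinks to the bottom" ordering that ``get_tree`` performs on the
# ``tag`` axis, reduced to plain dicts. The function name ``group_by_tag`` and
# the sample items are assumptions made for this example; buckets hold item ids
# here only to keep the output short.

```python
from typing import Dict, List


def group_by_tag(items: List[dict]) -> List[dict]:
    """Bucket items by tag; an item with several tags lands in several buckets."""
    groups: Dict[str, dict] = {}
    for item in items:
        # Untagged items go to the "" bucket, mirroring the endpoint.
        keys = [str(t) for t in item.get("tags") or []] or [""]
        for key in keys:
            bucket = groups.setdefault(key, {"key": key, "items": []})
            bucket["items"].append(item["id"])
    # Sort tuple: (is_empty_key, lowercase key) puts "" after every real key.
    ordered = sorted(groups.values(), key=lambda g: (g["key"] == "", g["key"].lower()))
    for g in ordered:
        g["count"] = len(g["items"])
    return ordered


sample = [
    {"id": "k1", "tags": ["sql", "Billing"]},
    {"id": "k2", "tags": []},
    {"id": "k3", "tags": ["sql"]},
]
tag_groups = group_by_tag(sample)
total_items = sum(g["count"] for g in tag_groups)
```

# Because ``k1`` carries two tags it is counted in two buckets, so
# ``total_items`` can exceed the number of distinct items on the tag axis.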


# ---- Bundle endpoint ----


def _build_per_domain_markdown(
    slug: str, user: dict, conn: duckdb.DuckDBPyConnection
) -> Response:
    """Render a deterministic markdown bundle for a single memory domain.

    Used by ``agnes pull`` to write ``~/.claude/memory/<slug>/bundle.md``.
    The bundle includes both ``is_required=TRUE`` and approved items so
    the per-domain md5 in ``/api/sync/manifest`` (built from the same
    item set in ``_build_memory_domains_section``) matches the md5 of
    what the CLI just received. Items are sorted by ``id`` to mirror the
    manifest's md5 computation byte-for-byte (Section 5.1 of the
    unified-stack design).

    RBAC: the caller must have a grant on the domain — admins bypass
    via ``can_access``'s admin short-circuit. Anonymous or grantless
    callers get 403.
    """
    repo = MemoryDomainsRepository(conn)
    dom = repo.get_by_slug(slug)
    if not dom:
        raise HTTPException(status_code=404, detail="memory_domain_not_found")
    if not can_access(user["id"], "memory_domain", dom["id"], conn):
        raise HTTPException(status_code=403, detail="no_grant")

    # Pull items the same way the manifest md5 helper does — id order,
    # full payload (title/status/is_required pulled via the knowledge
    # repository for content), no token-budget truncation.
    items_meta = repo.list_items_of_domain(dom["id"], limit=10000)
    if not items_meta:
        body = f"# {dom['name']}\n\n_No items in this domain yet._\n"
        return Response(content=body, media_type="text/markdown; charset=utf-8")

    # Fetch full bodies — list_items_of_domain only returns id/title/status.
    knowledge_repo = KnowledgeRepository(conn)
    full_items: list = []
    for meta in sorted(items_meta, key=lambda r: r["id"]):
        full = knowledge_repo.get_by_id(meta["id"])
        if not full:
            continue
        full_items.append(full)

    lines: list = [f"# {dom['name']}", ""]
    if dom.get("description"):
        lines.append(dom["description"])
        lines.append("")

    required = [it for it in full_items if it.get("is_required")]
    approved = [
        it
        for it in full_items
        if not it.get("is_required") and it.get("status") == "approved"
    ]

    if required:
        lines.append("## Required")
        lines.append("")
        for it in required:
            lines.append(f"### {it.get('title', 'Untitled')}")
            lines.append("")
            lines.append(it.get("content", "") or "")
            lines.append("")

    if approved:
        lines.append("## Approved")
        lines.append("")
        for it in approved:
            lines.append(f"### {it.get('title', 'Untitled')}")
            lines.append("")
            lines.append(it.get("content", "") or "")
            lines.append("")

    body = "\n".join(lines).rstrip() + "\n"
    return Response(content=body, media_type="text/markdown; charset=utf-8")


@router.get("/bundle")
async def get_bundle(
    domain: Optional[str] = None,
    user: dict = Depends(get_current_user),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Token-budgeted bundle of knowledge items for AI agent injection.

    Mandatory items are always included regardless of the token budget.
    Approved items are confidence×recency-ranked and included until the budget
    is exhausted. Audience-filtered by the caller's group memberships (admins
    see everything).

    v49: when ``?domain=<slug>`` is supplied the response shape switches
    to ``text/markdown`` containing a deterministic per-domain bundle —
    that's what ``agnes pull`` writes to ``~/.claude/memory/<slug>/bundle.md``.
    RBAC: the caller must have a ``MEMORY_DOMAIN`` grant on the domain
    (admins bypass per ``can_access``). The markdown body sorts items
    by ``id`` (see ``_build_per_domain_markdown``) and includes both
    required and approved items (required first, under a ``## Required``
    heading) so the bundle md5 in the manifest matches what the CLI
    re-renders.
    """
    from datetime import datetime, timezone

    # ----- Per-domain markdown variant (v49) -----
    if domain:
        return _build_per_domain_markdown(domain, user, conn)

    repo = KnowledgeRepository(conn)
    effective_groups = _effective_groups(user, conn)
    granted_domains = _caller_granted_memory_domains(user, conn)

    # v46: the bundle is what AI agents inject as context, so the opt-out
    # has real effect here — it's always-on for the calling user. Mandatory
    # items are exempted by the EXISTS subquery's status guard inside
    # ``list_items``; the user's dismissal row for a then-approved item is
    # silently ignored if/when the item is later mandated.
    dismissed_by_user_id = user["id"]
    # v49: Required tier rides on is_required boolean. Was statuses=['mandatory'].
    mandatory = repo.list_items(
        is_required=True,
        exclude_personal=True,
        user_groups=effective_groups,
        granted_domains=granted_domains,
        dismissed_by_user=dismissed_by_user_id,
        hide_dismissed=True,
        limit=1000,
        offset=0,
    )

    approved = repo.list_items(
        statuses=["approved"],
        is_required=False,
        exclude_personal=True,
        user_groups=effective_groups,
        granted_domains=granted_domains,
        dismissed_by_user=dismissed_by_user_id,
        hide_dismissed=True,
        limit=1000,
        offset=0,
    )

    # Rank approved by confidence × recency (days since updated_at, max 365).
    # updated_at is intentional: a recently admin-edited item reflects a human
    # who just reviewed and corrected it, making it more trustworthy than an
    # older untouched item. This differs from confidence.py which decays from
    # created_at — the two scores serve different purposes (credibility vs freshness).
    now = datetime.now(timezone.utc)

    def _rank(item: dict) -> float:
        confidence = float(item["confidence"]) if item.get("confidence") is not None else 0.5
        updated_raw = item.get("updated_at")
        if updated_raw:
            try:
                if isinstance(updated_raw, str):
                    # ``datetime`` / ``timezone`` come from the import at the
                    # top of this function; no need to re-import them here.
                    updated = datetime.fromisoformat(updated_raw.replace("Z", "+00:00"))
                else:
                    updated = updated_raw
                if updated.tzinfo is None:
                    updated = updated.replace(tzinfo=timezone.utc)
                age_days = max((now - updated).days, 0)
            except Exception:
                age_days = 365
        else:
            age_days = 365
        recency = max(0.0, 1.0 - age_days / 365.0)
        return confidence * recency

    approved_ranked = sorted(approved, key=_rank, reverse=True)

    def _token_est(item: dict) -> int:
        # ``or ""`` guards against keys that are present but NULL; a plain
        # ``.get(key, "")`` default would still return None and break the ``+``.
        text = (item.get("title") or "") + " " + (item.get("content") or "")
        return len(text) // _CHARS_PER_TOKEN

    budget_remaining = BUNDLE_TOKEN_BUDGET - sum(_token_est(i) for i in mandatory)
    approved_included = []
    for item in approved_ranked:
        cost = _token_est(item)
        if budget_remaining - cost < 0:
            break
        approved_included.append(item)
        budget_remaining -= cost

    return {
        "mandatory": mandatory,
        "approved": approved_included,
        "token_estimate": BUNDLE_TOKEN_BUDGET - budget_remaining,
        "token_budget": BUNDLE_TOKEN_BUDGET,
    }
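

# Illustrative sketch, not part of the module: the confidence × recency ranking
# and greedy budget fill from ``get_bundle``, run on in-memory dicts. The names
# ``rank``, ``token_estimate``, ``fill_budget`` and the value
# ``CHARS_PER_TOKEN = 4`` are assumptions made for this example; the real
# ``_CHARS_PER_TOKEN`` and ``BUNDLE_TOKEN_BUDGET`` are defined elsewhere in the
# module and are not shown here.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

CHARS_PER_TOKEN = 4  # hypothetical value, for illustration only


def token_estimate(item: dict) -> int:
    text = (item.get("title") or "") + " " + (item.get("content") or "")
    return len(text) // CHARS_PER_TOKEN


def rank(item: dict, now: datetime) -> float:
    """Confidence (default 0.5) scaled by linear decay over 365 days."""
    confidence = 0.5 if item.get("confidence") is None else float(item["confidence"])
    updated = item.get("updated_at")
    age_days = 365 if updated is None else max((now - updated).days, 0)
    return confidence * max(0.0, 1.0 - age_days / 365.0)


def fill_budget(mandatory: List[dict], approved: List[dict],
                budget: int, now: datetime) -> Tuple[List[dict], int]:
    """Mandatory items always count; approved items are added best-first."""
    remaining = budget - sum(token_estimate(i) for i in mandatory)
    included: List[dict] = []
    for item in sorted(approved, key=lambda i: rank(i, now), reverse=True):
        cost = token_estimate(item)
        if remaining - cost < 0:
            break  # stop at the first item that does not fit, as the endpoint does
        included.append(item)
        remaining -= cost
    return included, budget - remaining


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
mandatory = [{"title": "Policy", "content": "x" * 33}]
approved = [
    {"title": "Old", "content": "y" * 36, "confidence": 0.9,
     "updated_at": now - timedelta(days=300)},
    {"title": "New", "content": "z" * 36, "confidence": 0.6,
     "updated_at": now - timedelta(days=10)},
]
included, used = fill_budget(mandatory, approved, budget=25, now=now)
```

# The fresher item outranks the higher-confidence but stale one, and the loop
# stops (rather than skipping ahead) once the next-ranked item no longer fits.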