agnes-the-ai-analyst/app/api/memory.py
ZdenekSrotyr 64cf78860d
feat(stack): unified Browse + My Stack for Data Packages and Memory (v49 schema) (#333)
* feat(unified-stack): Browse + My Stack + Recipes + RBAC matrix (v49–v55)

Squash of 94 commits spanning the v49 → v55 unified-stack rewrite.
Full per-feature breakdown lives in CHANGELOG.md under [Unreleased].
Major buckets:

* v49 schema — first-class user_groups + user_group_members +
  resource_grants; admin can CRUD groups and grants; Google
  Workspace nightly sync writes into the new tables.
* v49 data_packages — admin-curated bundles of tables, RBAC-gated,
  first-class section on /catalog Browse + My Stack.
* v49 memory_domains — row-backed (replaces hardcoded VALID_DOMAINS
  enum); admin can CRUD; grants follow the same shape as tables and
  packages.
* v50 cover_image_url + admin sidebar collapsibles + per-row Mode
  tooltip + admin queue domain badges + admin "+ New Item" seed flow.
* v51 lifecycle status (prod/poc/coming-soon/draft) + category +
  palette swatches on admin modals.
* v52 per-table detail page /catalog/t/<id>.
* v53 Recipes — admin-curated SQL templates as a second tab on
  /catalog with full Edit/Delete admin affordances.
* v54 soft-delete (deleted_at) + Undo toast for packages, memory
  domains, and recipes; hard_delete() retained as escape hatch.
* v55 Recipes RBAC — ResourceType.RECIPE registered, inline Group
  Access matrix on Create + Edit Recipe modals (mirrors the Memory
  Domain pattern).
* Activity Center per-resource filter (resource_prefix LIKE-anchored
  on audit_log.resource); admin nav g+letter keyboard shortcuts;
  loadAdminTablesLayout N+1 → single endpoint; /api/memory 30s
  page-level cache.
* CI hardening — Keboola legacy tests pytest.importorskip; perf-
  smoke threshold widened to stop cold-cache flake.
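A minimal sketch of what a LIKE-anchored `resource_prefix` filter on `audit_log.resource` could look like. It uses the stdlib sqlite3 module as a stand-in for DuckDB, and the helper names are hypothetical. Escaping `%` and `_` is an assumed design choice. It matters because prefixes such as `memory_domain:` contain underscores, which LIKE would otherwise treat as single-character wildcards.

```python
import sqlite3

_ESC = "\\"  # a single backslash, used as the LIKE escape character


def escape_like(value: str) -> str:
    """Escape LIKE metacharacters so ``value`` is matched literally."""
    # Escape the escape character first, then the two wildcards.
    return (
        value.replace(_ESC, _ESC + _ESC)
        .replace("%", _ESC + "%")
        .replace("_", _ESC + "_")
    )


def audit_resources_with_prefix(conn: sqlite3.Connection, prefix: str) -> list[str]:
    """Return audit_log.resource values starting with ``prefix`` (hypothetical helper)."""
    # Anchored: the literal prefix first, then any suffix via the trailing %.
    pattern = escape_like(prefix) + "%"
    rows = conn.execute(
        "SELECT resource FROM audit_log WHERE resource LIKE ? ESCAPE '\\' ORDER BY resource",
        (pattern,),
    ).fetchall()
    return [r[0] for r in rows]
```

Without the escaping, the prefix `memory_domain:` would also match a row like `memoryXdomain:...`.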

5002 tests passing, 35 skipped.

* feat(p2 backlog): Cmd-K palette + suggest-a-domain + nightly E2E + v55 schema

10-item P2 sweep on top of the unified-stack squash. New behaviour:

* Cmd-K admin command palette (base.html) — fuzzy-search overlay over
  admin + user-facing routes. Arrows/Enter to navigate, Esc to close.
* Stack-tabs digit shortcuts — 1/2/3 switch Browse / My Stack /
  Recipes on /catalog + /corporate-memory.
* Friendlier non-admin empty state on /corporate-memory, plus a
  "Suggest a domain" CTA → POST /api/memory-domain-suggestions, admin
  queue with approve/reject. Backed by a new memory_domain_suggestions
  table (schema v55).
* /admin/corporate-memory 7-tab strip grouped under Moderation /
  Catalog parent labels.
* Bulk-assign table → package dropdown annotates each option with
  "(N of M tables already in)" so the existing distribution is visible
  before picking a target.
* GET /api/memory + /tree accept is_required filter; admin status
  dropdowns route the "Required" sentinel onto it (status no longer
  holds 'mandatory' post-v49, so the old dropdown returned nothing).
* chip-input.js is now opt-in per template via {% block extra_scripts %}
  instead of loaded globally on every page from base.html.
* Edit-modal close helpers consolidated onto _closeEditModalById();
  docs the per-source-type modal architecture decision.
* New .github/workflows/e2e-nightly.yml runs agent-browser smoke
  scripts (scripts/e2e/smoke_*.sh) against a docker-compose stack
  nightly at 04:30 UTC; failures open an agent-browser-nightly issue.
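A minimal sketch of the status-dropdown routing from the `is_required` bullet. The "Required" sentinel becomes an `is_required` query parameter, and real lifecycle values stay on `status`. The sentinel string, the helper name, and the `"true"` encoding are assumptions. `VALID_STATUSES` mirrors the list in app/api/memory.py.

```python
from typing import Optional

# Mirrors VALID_STATUSES in app/api/memory.py (lifecycle only, post-v49).
VALID_STATUSES = frozenset({"pending", "approved", "rejected", "revoked", "expired"})
REQUIRED_SENTINEL = "required"  # hypothetical dropdown value for the Required tier


def status_dropdown_to_params(value: Optional[str]) -> dict[str, str]:
    """Map an admin status-dropdown value onto GET /api/memory query params."""
    if not value:
        return {}  # "All": no filter
    if value == REQUIRED_SENTINEL:
        # Required is an orthogonal boolean, not a status value.
        return {"is_required": "true"}
    if value in VALID_STATUSES:
        return {"status": value}
    raise ValueError(f"unknown status filter: {value!r}")
```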

5012 tests passing, 35 skipped.

* fix(visual audit): 6 page regressions on memory + data-package surfaces

agent-browser walkthrough of every memory + data-package page in the PR
turned up 6 real bugs. Fixes:

1. Admin memory modals were dead. Duplicate `let _cmdNewDomainId`
   declarations from the deprecated step-2 RBAC stubs in
   admin_corporate_memory.html collided with the live state vars
   declared earlier in the same <script> → SyntaxError on parse →
   the entire second script block silently failed → every inline
   onclick= handler defined there (`+ New Memory Domain`, Edit, etc.)
   was a no-op. Removed the duplicate stubs.

2. /catalog/t/<table_id> + /catalog/r/<slug> rendered unstyled.
   Both templates injected their CSS via {% block head %} but
   base.html exposes {% block head_extra %} — wrong block name
   meant <style> rules never reached the rendered HTML. Renamed
   to head_extra. Hero card, section cards, dark SQL block, proper
   full-width inputs all now render as designed.

3. L49 leak — "MANDATORY" KPI label + "Make Mandatory" row buttons
   on /admin/corporate-memory still used the old word. Renamed to
   "Required" / "Mark as Required" so UI matches the data model
   (v49 split moved the Required tier onto the orthogonal
   is_required boolean; status no longer holds 'mandatory').

4. Activity Center Resource dropdown didn't know the v55
   `memory_domain_suggestion:` namespace — added it.

5. Tab strip on /admin/corporate-memory wrapped text 2× per button
   on narrow viewports after the L50 MODERATION/CATALOG group
   labels pushed total width past most viewports. Switched the
   strip to flex-wrap:nowrap + overflow-x:auto with
   white-space:nowrap + flex-shrink:0 on every direct child so the
   tabs stay one row and slide horizontally when they overflow.

5012 tests passing, 35 skipped.

* rebase-cleanup: align with main's 0.54.25-27 API design + comment fix

Three follow-on fixes after rebasing onto origin/main (0.54.27):

* admin_tables.html: dropped a stray nested ``{% if data_source_type
  == 'keboola' %}`` around ``prefillFromKeboolaTable`` (main never had
  it; the outer Phase F2 guard already covers it) and reworded a JS
  comment that contained literal ``{% %}`` tokens which Jinja was
  parsing as a real tag → unbalanced if/endif → 30 template render
  failures across the suite.
* /api/stack/subscription/{type}/{id}: DELETE now returns 204 instead
  of 200 per the 0.54.26 design rules. CLI client + parity tests
  updated to accept 2xx / assert 204.
* Memory-domain suggestion approve/reject paths added to
  ``_VERB_PATH_ALLOWLIST`` — they are pending → approved/rejected
  state-machine transitions (approve also creates the real
  memory_domains row as a side effect), so the RPC shape is
  intentional rather than a missed PATCH refactor.

5035 tests passing, 35 skipped.

* fix(catalog_table_detail): real polish pass — hero glyph, dedup pills, rows/size meta, scoped sync CTA

The previous fix only corrected the block-name typo, which was enough for the existing CSS to render.
The actual layout was still wireframe-tier on close inspection:

* No cover glyph in the hero (a flat white card with title + meta line);
  data-package + memory-domain detail pages both have a colored icon
  square. Restored parity — table.icon emoji if set, otherwise initials
  on a colored square using table.color.
* "INTERNAL" pill rendered twice for agnes_audit etc. — the mode pill
  and the source-type pill happened to be identical strings. Now skip
  the source pill when it matches the mode (`internal == internal`).
* Bucket / source_table code chip showed `Agnes Internal.audit_log` for
  internal rows — meaningless to a user. Hidden when source_type is
  internal.
* `pairs_well_with` admin input was a comma-separated `<input>` always
  visible. Wrapped all 4 sections in an Edit-on-demand toggle: read-
  only display by default, "+ Add" / "Edit" button on the right edge
  of each section header reveals the inline form, Cancel hides it.
* "Trigger sync now" was a cramped link squashed into the empty-state
  flex row (visible as `Tr…` overflow before). Promoted to a proper
  btn-primary button under the empty-state copy. Hidden entirely for
  internal tables (which are server-managed — no upstream to pull).
* Hero meta now surfaces row count + payload size (when sync_state has
  them) + last sync timestamp on a single line — was missing from the
  original.
* Mode pills colored by tier (local=green, remote=amber, materialized=
  blue, internal=gray) so the basic fact about a table reads at a
  glance, not from upper-cased ALL-CAPS text alone.
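A small sketch of the hero-pill rules above. Mode pills are colored by tier, and the source-type pill is skipped when it repeats the mode. The function name and the fallback colors are assumptions, not the template's actual values: `gray` for an unknown mode and `neutral` for the source pill.

```python
# Tier colors as listed above.
MODE_COLORS = {
    "local": "green",
    "remote": "amber",
    "materialized": "blue",
    "internal": "gray",
}


def hero_pills(mode: str, source_type: str) -> list[tuple[str, str]]:
    """Return (label, color) pairs for the table-detail hero (hypothetical helper)."""
    pills = [(mode.upper(), MODE_COLORS.get(mode.lower(), "gray"))]
    # Skip the source pill when it would only repeat the mode (internal == internal).
    if source_type.lower() != mode.lower():
        pills.append((source_type.upper(), "neutral"))
    return pills
```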

* tests(v56): TDD baseline for extended data-packages content + per-table docs

68 failing tests across 8 files spec the v56 surface before any
implementation lands:

* test_schema_v55_to_v56_migration.py — schema bump, additive ALTERs
  on data_packages + table_registry, idempotency, sequential-upgrade
  preservation
* test_data_packages_repo_v56.py — repo create/update/get/list for
  owner_name, owner_team, tags, long_description, when_to_use,
  when_not_to_use, example_questions (JSON list round-trip, empty
  defaults, partial-update preservation)
* test_table_registry_v56_docs.py — update_docs for grain, platforms,
  partition_col, history, gotchas; preserves v52 docs columns
* test_api_data_packages_v56.py — PUT/POST/GET for all new fields,
  field-level validation (tag count, bullet length, description size),
  virtual badge derivation (curated/new)
* test_api_registry_docs_v56.py — PATCH /api/admin/registry/{id}/docs
  for v56 fields, validation, RBAC unchanged
* test_web_catalog_package_detail_v56.py — /catalog/p/<slug> rewrite
  asserts on rendered owner line, tag pills, badges, What it is,
  Use it when, Skip it when, Example questions, per-table extended
  detail in collapsible row, key-gotcha distinctness, admin-only Edit
* test_web_stack_card_v56_metadata.py — Browse-grid card additions
  (owner chip, tag chips, badges) without breaking back-compat for
  rows missing the new fields
* test_data_packages_no_vendor_content.py — CI guard: scans app/ +
  src/ + cli/ + config/ + scripts/ for Groupon-specific tokens from
  the colleague's spec MD; fails if any leak into OSS surfaces
* test_db_schema_version.py — bumped 55 → 56 with rationale

Plus updates schema-version assertion to 56. Implementation lands in
subsequent commits (schema migration → repo → API → templates).

* feat(v56): schema + repo for extended data-packages content

Schema additions (ALTER ADD COLUMN IF NOT EXISTS — additive + idempotent):

* data_packages: owner_name, owner_team, tags, long_description,
  when_to_use, when_not_to_use, example_questions (JSON-as-VARCHAR for
  the lists)
* table_registry: grain, platforms, partition_col, history, gotchas
  (extends the v52 sample_questions / things_to_know / pairs_well_with
  docs surface with structured per-table content)
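Every addition is an `ADD COLUMN IF NOT EXISTS`, so rerunning the migration is a no-op. Below is a minimal sketch of the same idempotency using the stdlib sqlite3 module. SQLite has no `IF NOT EXISTS` on `ADD COLUMN`, so the sketch checks `PRAGMA table_info` first. The helper name and the use of TEXT for the JSON list columns are assumptions.

```python
import sqlite3

# v56 data_packages additions; list-valued fields are stored as JSON text.
V56_DATA_PACKAGE_COLUMNS = {
    "owner_name": "TEXT",
    "owner_team": "TEXT",
    "tags": "TEXT",
    "long_description": "TEXT",
    "when_to_use": "TEXT",
    "when_not_to_use": "TEXT",
    "example_questions": "TEXT",
}


def add_missing_columns(conn: sqlite3.Connection, table: str, columns: dict[str, str]) -> list[str]:
    """Add each column from ``columns`` that ``table`` lacks; return the ones added.

    ``table`` and the column names are interpolated into SQL, so they must be
    trusted constants, never user input.
    """
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, sql_type in columns.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
            added.append(name)
    return added
```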

Repo extensions:

* DataPackagesRepository.create + update accept the new fields with
  the same Optional-is-no-op contract as v51 (pass an empty list to
  clear a JSON column)
* _decode_row decodes the new JSON-list columns to Python lists; NULL
  rounds back to [] so callers don't branch
* TableRegistryRepository.update_docs grew the v56 fields alongside
  the existing v52 ones — single PATCH can write either tier
  atomically
* TableRegistryRepository._decode_row picks up platforms + gotchas in
  the same NULL-tolerant decoder
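A minimal sketch of the two repository contracts above. On update, `None` leaves a column untouched while `[]` clears a JSON list. On read, decoding turns NULL back into `[]`. The helper names and the exact set of JSON columns are assumptions based on the v56 field list.

```python
import json
from typing import Any

# Assumed JSON-as-VARCHAR list columns on data_packages (v56).
JSON_LIST_COLUMNS = ("tags", "when_to_use", "when_not_to_use", "example_questions")


def build_update(changes: dict[str, Any]) -> tuple[list[str], list[Any]]:
    """Turn keyword changes into SET clauses and params (hypothetical helper)."""
    clauses: list[str] = []
    params: list[Any] = []
    for column, value in changes.items():
        if value is None:
            continue  # Optional-is-no-op: None leaves the column as it is
        if column in JSON_LIST_COLUMNS:
            value = json.dumps(value)  # [] is stored as "[]", which clears the list
        clauses.append(f"{column} = ?")
        params.append(value)
    return clauses, params


def decode_row(row: dict[str, Any]) -> dict[str, Any]:
    """Decode JSON list columns; NULL (or an absent key) becomes []."""
    out = dict(row)
    for column in JSON_LIST_COLUMNS:
        raw = out.get(column)
        if raw is None:
            out[column] = []
        elif isinstance(raw, str):
            out[column] = json.loads(raw)
    return out
```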

22 repo + migration tests passing. API + UI land in subsequent commits.

* feat(v56): API surface for extended data-packages + per-table docs

CreateDataPackageRequest + UpdateDataPackageRequest grew the v56 fields
(owner_name, owner_team, tags, long_description, when_to_use,
when_not_to_use, example_questions) with per-field validators that
match the Foundry spec checklist:

  * tags: ≤8 entries × ≤30 chars
  * long_description: ≤4000 chars
  * use/skip: ≤8 bullets × ≤200 chars
  * example_questions: ≤12 × ≤200 chars
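The same limits can be expressed as a plain-Python check, shown here as a sketch. The real request models use Pydantic validators, so this framework-free version is an assumption. So is mapping "use/skip" onto the `when_to_use` / `when_not_to_use` fields.

```python
from typing import Any

# (max entries, max chars per entry) for each list-valued field.
LIST_LIMITS = {
    "tags": (8, 30),
    "when_to_use": (8, 200),
    "when_not_to_use": (8, 200),
    "example_questions": (12, 200),
}
LONG_DESCRIPTION_MAX = 4000


def validate_package_payload(payload: dict[str, Any]) -> list[str]:
    """Return human-readable errors; an empty list means the payload is valid."""
    errors: list[str] = []
    for field, (max_items, max_chars) in LIST_LIMITS.items():
        values = payload.get(field)
        if values is None:
            continue  # field not supplied, nothing to check
        if len(values) > max_items:
            errors.append(f"{field}: at most {max_items} entries")
        for i, value in enumerate(values):
            if len(value) > max_chars:
                errors.append(f"{field}[{i}]: at most {max_chars} chars")
    description = payload.get("long_description")
    if description is not None and len(description) > LONG_DESCRIPTION_MAX:
        errors.append(f"long_description: at most {LONG_DESCRIPTION_MAX} chars")
    return errors
```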

_serialize emits all v56 fields plus a virtual ``badges`` list derived
server-side at render time (no DB column needed): "curated" when the
creator is in the Admin group, "new" within 30 days of created_at.
Backdating created_at or admin-status changes pick up automatically.
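A minimal sketch of the virtual-badge derivation. The argument names are assumptions. Treating exactly 30 days as still "new" is also an assumption; the real serializer may use a different boundary.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

NEW_WINDOW = timedelta(days=30)


def derive_badges(
    created_by: Optional[str],
    created_at: Optional[datetime],
    admin_user_ids: set[str],
    now: Optional[datetime] = None,
) -> list[str]:
    """Compute badges at render time; nothing is stored in the DB."""
    if now is None:
        now = datetime.now(timezone.utc)
    badges: list[str] = []
    if created_by is not None and created_by in admin_user_ids:
        badges.append("curated")  # creator is in the Admin group
    if created_at is not None and now - created_at <= NEW_WINDOW:
        badges.append("new")  # created within the last 30 days
    return badges
```

Because the badges are recomputed on every render, backdating `created_at` or changing the creator's Admin membership changes the output with no migration.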

PATCH /api/admin/registry/{id}/docs extended with v56 structured
per-table fields (grain, platforms, partition_col, history, gotchas).
gotchas: list of {key: bool, body: str} Pydantic models with the same
≤8 cap; first key=true entry becomes the Key gotcha on the rendered
package detail page. PATCH echoes the fresh state so callers can
re-render without a second GET.
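A sketch of the gotchas contract. It uses a dataclass in place of the Pydantic model so it runs without dependencies. The names are hypothetical. The ≤8 cap and the rule that the first `key=true` entry wins come from the description above.

```python
from dataclasses import dataclass
from typing import Any, Optional

MAX_GOTCHAS = 8


@dataclass(frozen=True)
class Gotcha:
    key: bool
    body: str


def parse_gotchas(raw: list[dict[str, Any]]) -> list[Gotcha]:
    """Validate the count and coerce each entry into a Gotcha."""
    if len(raw) > MAX_GOTCHAS:
        raise ValueError(f"at most {MAX_GOTCHAS} gotchas allowed")
    return [Gotcha(key=bool(g.get("key", False)), body=str(g["body"])) for g in raw]


def key_gotcha(gotchas: list[Gotcha]) -> Optional[Gotcha]:
    """The first entry flagged key=True is rendered as the Key gotcha."""
    return next((g for g in gotchas if g.key), None)
```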

26 API tests passing (16 data-packages + 10 registry-docs).

* feat(v56): /catalog/p/<slug> rewrite + Browse-grid card augmentation

The third (and final) v56 commit lights up the UI surfaces backed by
the schema + API commits earlier in this PR:

* /catalog/p/<slug> template rebuilt around the Foundry spec's
  section ladder — hero (icon + name + badges + owner + tags +
  description + meta + Add-to-stack), "What it is" markdown body,
  paired "Use it when / Skip it when" panels, "Tables in this
  package" with collapsible per-table extended detail (grain /
  platforms / partition_col / history / gotchas + sample questions),
  and an "Example questions you can ask Claude" prompt panel. Each
  section guarded by ``{% if pkg.<field> %}`` — empty content fields
  hide the section entirely (no "No X yet" placeholder noise on the
  public-facing drilldown).
* router catalog_package_detail hydrates per-table v56 fields onto
  the tables list + derives the virtual badges (curated / new)
  server-side from creator-in-Admin + 30-day created_at.
* StackResolver.ResourceEntry grew owner_name / owner_team / tags /
  badges; _fetch_entries pulls the v56 columns + computes badges
  once per fetch using a single Admin-group SELECT.
* _data_package_entry_dict adapter passes the new fields through to
  the macro; tags are merged source-type pills + admin-authored
  category tags per the spec convention.
* _stack_card.html renders the v56 badges (top-left, data-badge=
  hooks) + the owner chip (data-card-owner hook) without breaking
  back-compat — pre-v56 rows render unchanged.
* Admin PUT handler strips the v56 docs fields from the
  read-modify-write merged dict so register() doesn't blow up
  with the now-larger row shape (same pattern as the v52 docs
  fields stripping).

5115 tests passing (+98 v56 + 18 fixed regressions from the merged-
register PUT path), 35 skipped.

* fix(rbac): Edit-on-package + Group-access 'required' persistence + CI vendor guard

Three related bugs reported on the merged-with-main branch:

1. Clicking Edit on a Data Package card landed on /admin/tables with
   a `#<pkg.id>` hash that nothing listened to — admin saw the global
   table listing, not the editor for that specific package. Added a
   `?edit_package=<pkg_id>` query-param handler in admin_tables.html
   (analog to the existing `?edit=<table_id>` and `?assign_to=<pkg_id>`
   patterns) that calls openEditDataPackageModal on DOMContentLoaded
   after a 250ms layout settle. Updated the package-detail Edit link
   to use the new query param.

2. Setting Group Access to 'required' didn't persist — re-opening
   the modal showed 'available'. Root cause was the v49
   ``resource_grants.requirement`` enum existing in the DB but the
   POST /api/admin/grants endpoint not surfacing it: ``CreateGrantRequest``
   declared only group_id + resource_type + resource_id, so Pydantic
   silently dropped the matrix's ``requirement: 'required'`` payload
   and the new row landed at the DB column default ('available').
   Plumbed ``requirement`` through ``CreateGrantRequest`` →
   ``ResourceGrantsRepository.create`` so the value persists in one
   round-trip. Plus a UNIQUE-constraint race in the matrix
   diff-apply: DELETE-old + POST-new ran in parallel via
   ``Promise.allSettled``, so POST could fire first and trip the
   unique check before DELETE freed the slot. Switched to sequential
   (await all deletes; then await all writes) across all three
   matrices (Edit Data Package, Edit Memory Domain, Edit Recipe).

3. CI vendor-content guard ``test_no_groupon_specific_strings_in_oss``
   tripped on two of my own docstrings: a "Foundry Data team" mention
   in two src/db.py comments + an ``s1_session_landings`` example in
   cli/skills/agnes-table-registration.md. Rephrased the comments to
   "extended-descriptions admin spec" and replaced the example with
   a generic ``events_daily`` table name.
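The fix for the UNIQUE race in item 2 comes down to ordering: every DELETE must settle before any write starts. The browser code is JavaScript (`Promise.allSettled`). The sketch below shows the same two-phase pattern in Python, using `asyncio.gather(..., return_exceptions=True)` as the closest analogue. `delete_grant` and `create_grant` are hypothetical callables standing in for the DELETE and POST requests.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable

Grant = tuple[str, str, str]  # (group_id, resource_type, resource_id)


async def apply_grant_diff(
    to_delete: Iterable[Grant],
    to_create: Iterable[Grant],
    delete_grant: Callable[[Grant], Awaitable[Any]],
    create_grant: Callable[[Grant], Awaitable[Any]],
) -> tuple[list[Any], list[Any]]:
    # Phase 1: deletes run concurrently with each other, and we wait for all
    # of them to settle (errors are collected, not raised).
    deleted = await asyncio.gather(*(delete_grant(g) for g in to_delete), return_exceptions=True)
    # Phase 2: writes start only now, so none can hit a UNIQUE slot that a
    # pending delete was about to free.
    created = await asyncio.gather(*(create_grant(g) for g in to_create), return_exceptions=True)
    return list(deleted), list(created)
```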

5164 tests passing, 35 skipped (+4 regression tests pinning the POST
/api/admin/grants requirement contract). Vendor guard back to green.

* fix(catalog): admin Browse path drops v58 card fields

The /catalog and /memory admin god-mode branch built ResourceEntry
instances inline from pkg_repo.list() / domains_repo.list() and skipped
owner_name, owner_team, tags, and derived badges (curated/new). Visible
symptom: a package with an owner + tags rendered with the v56 chrome
for non-admin viewers but as a bare card for admins.

Adds StackResolver.browse_admin(user_id, resource_type) — admin god-mode
Browse that walks the full table but routes through the same
_fetch_entries enrichment pass as browse(), so admin + non-admin Browse
stay visually consistent. Both /catalog and /corporate-memory routes
switch to it.

Regression test in tests/test_stack_resolver_browse_admin.py covers:
owner/tags propagation, new/curated badge derivation, in_stack from
admin subscriptions, all-packages-regardless-of-grants, and the
ValueError for unsupported resource types.

* fix(catalog): three /catalog tab-strip UX bugs

1. Required Remove → red toast
   browse_admin passed empty required_ids to _fetch_entries, so the
   admin's own required grants surfaced as 'available' and the macro
   rendered an actionable Remove button that POST /unsubscribe 400'd
   on. Now derives required_ids from the admin's own groups so
   Required packages render with the disabled "In stack (required)"
   button. Regression test in test_stack_resolver_browse_admin.py.

2. Remove green-toasts but card stays until refresh
   The My-Stack empty-state placeholder was only emitted server-side
   when stack_entries was empty at render time. Removing the last
   card left the tab completely blank — users read that as "Remove
   didn't work, let me refresh". Both grid + empty-state are now
   always rendered with one of them initially hidden; the JS swaps
   visibility on add/remove instead of injecting DOM. Same fix in
   /corporate-memory.

3. "What are Recipes?" + ambiguous (admin) suffix
   Recipes tab now carries its own curator-block explainer (the
   shared one was moved inside Browse view so it doesn't bleed
   across tabs). The grey "(admin)" suffix becomes a yellow
   .admin-only-hint chip with a title tooltip — visibility hint is
   now unambiguous: yellow chip = "only you see this", non-admins
   don't see the affordance at all.
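A minimal sketch of fix 1. `required_ids` comes from the viewer's own group memberships, and a required entry renders as a disabled button instead of an actionable Remove. The grant dict shape and the helper names are assumptions. The `requirement` values and the button labels come from this PR.

```python
from typing import Iterable


def required_ids_for(user_group_ids: set[str], grants: Iterable[dict]) -> set[str]:
    """Resource ids granted as 'required' to any group the user belongs to."""
    return {
        g["resource_id"]
        for g in grants
        if g["group_id"] in user_group_ids and g["requirement"] == "required"
    }


def card_button(resource_id: str, in_stack: bool, required_ids: set[str]) -> tuple[str, bool]:
    """Return (label, enabled) for a card's footer button."""
    if resource_id in required_ids:
        return ("In stack (required)", False)  # cannot be unsubscribed
    if in_stack:
        return ("Remove", True)
    return ("+ Add to stack", True)
```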

* schema: renumber v51..v58 → v52..v59 to make room for main's v51

Main 0.54.29 introduced a NEW v51 (table_registry.bq_fqn — issue #343)
that releases ahead of this branch. The unified-stack chain v51..v58
shifts up by one so main's v51 stays as the released schema and ours
become v52..v59. Function names, internal version bumps, dispatch
ladder thresholds, and the migration-test references all move
together. Subsequent merge with main lands the bq_fqn column at the
freed v51 slot.

* fix(seed): seed admin lands in BOTH Admin AND Everyone groups

The LOCAL_DEV_MODE / SEED_ADMIN_EMAIL bootstrap only added the seed
user to Admin. Everyone-scoped grants — the canonical "every-user-
sees-this" pattern for Required onboarding — didn't surface for the
seed admin's own /catalog because they weren't in Everyone. Symptom:
admin grants a Required-tier package to Everyone, then sees it on
/catalog still rendered with an "Add to stack" button (because the
admin's resolved required_ids was empty for that package).

The dual-membership keeps Admin (authorization) and Everyone
(default-grant target) intentionally separate per the design comment
on UserRepository.create — every membership remains traceable to a
concrete row, just now with a system_seed row in Everyone too. Both
INSERTs go through UserGroupMembersRepository.add_member which is
idempotent on (user_id, group_id), so re-fires on every lifespan
startup don't duplicate rows.
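A minimal sketch of an idempotent `add_member` keyed on `(user_id, group_id)`. It uses the stdlib sqlite3 module and `INSERT OR IGNORE` as a stand-in; the real repository runs on DuckDB and its SQL may differ. The table layout and the `admin` / `everyone` group ids are assumptions beyond the columns named in this PR.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS user_group_members (
    user_id  TEXT NOT NULL,
    group_id TEXT NOT NULL,
    source   TEXT NOT NULL,          -- admin / google_sync / system_seed
    PRIMARY KEY (user_id, group_id)  -- makes repeated inserts collapse
)
"""


def add_member(conn: sqlite3.Connection, user_id: str, group_id: str, source: str) -> bool:
    """Insert a membership if absent; return True only when a row was added."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO user_group_members (user_id, group_id, source) VALUES (?, ?, ?)",
        (user_id, group_id, source),
    )
    return cur.rowcount == 1


def seed_admin(conn: sqlite3.Connection, user_id: str) -> None:
    """Seed user lands in BOTH Admin and Everyone; safe to rerun on every startup."""
    for group_id in ("admin", "everyone"):  # hypothetical group ids
        add_member(conn, user_id, group_id, "system_seed")
```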

Regression test in test_main_seed_admin_everyone.py.

* style: unify admin-only hints across marketplace + memory detail pages

Replaces three stale ``(admin)`` parentheticals with the same yellow
``admin-only`` chip introduced for /catalog tab actions. Same tooltip
copy ("Visible only to admins — analysts won't see this …") so the
visibility hint is unmistakable wherever it appears:

- Hard delete on marketplace_plugin_detail (admin-only destructive
  action — same gating as the original suffix conveyed).
- Hard delete on marketplace_item_detail (same).
- Edit link on memory_domain_detail (title-attr only before; now a
  visible chip too).

Non-admin viewers never saw these affordances — the gates are
unchanged. Pure styling pass for consistency.

* fix(catalog): exclude soft-deleted data packages + memory domains from Browse

``StackResolver._fetch_entries`` and ``browse_admin`` were querying
data_packages / memory_domains without a ``deleted_at IS NULL`` guard.
A package soft-deleted via /admin/* (v54 soft-delete contract) stayed
visible on /catalog and /memory until either an Undo or a hard delete
— directly contradicting the soft-delete UX which is supposed to
remove the affordance immediately and only retain the row for the
Undo window.

The repository accessors (DataPackagesRepository.list,
MemoryDomainsRepository.list, list_packages_of_table, etc.) already
filter deleted rows; this commit brings the resolver's direct SQL in
line with that contract.
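A minimal sketch of the soft-delete contract the resolver now follows. Browse reads filter on `deleted_at IS NULL`, and Undo simply clears the timestamp. It uses sqlite3 and hypothetical helper names. The real queries live in `StackResolver._fetch_entries` / `browse_admin` and run on DuckDB.

```python
import sqlite3


def browse_package_ids(conn: sqlite3.Connection) -> list[str]:
    """Packages visible on Browse: soft-deleted rows are excluded."""
    rows = conn.execute(
        "SELECT id FROM data_packages WHERE deleted_at IS NULL ORDER BY id"
    ).fetchall()
    return [r[0] for r in rows]


def soft_delete(conn: sqlite3.Connection, package_id: str) -> None:
    """Hide the package immediately but keep the row for the Undo window."""
    conn.execute(
        "UPDATE data_packages SET deleted_at = CURRENT_TIMESTAMP WHERE id = ?",
        (package_id,),
    )


def undo_delete(conn: sqlite3.Connection, package_id: str) -> None:
    """Undo restores visibility by clearing the timestamp."""
    conn.execute("UPDATE data_packages SET deleted_at = NULL WHERE id = ?", (package_id,))
```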

Regression test in test_stack_resolver_browse_admin.py.

* fix(catalog): Add/Remove updates full card chrome, not just button

The previous _applyStackChange flipped only the footer button label —
the card border (.is-in-stack class), top-right "In stack" badge, and
button color class (--add / --remove) stayed at their server-rendered
state. After Add the user saw the button checkmark but the rest of
the card still looked like "available, not in stack". They read this
as "the change didn't take — let me refresh".

This commit makes the optimistic update mirror what the server-side
macro renders for the new state:

* ``c.classList.toggle('is-in-stack', becameInStack)`` — flips the
  border + visual state class.
* Top-right ``.stack-card__req-badge--instack`` badge is injected on
  Add, removed on Remove (skipped when ``data-requirement='required'``
  — that slot is owned by the Required badge).
* Button text is "Remove" / "+ Add to stack" matching the macro
  (was "✓ In stack" which was visually nice but inconsistent).
* Button color class --add / --remove swaps so the destructive Remove
  tint kicks in immediately.

The clone-into-My-Stack path applies the same updates so the new card
in My Stack reads identically to a server-rendered in_stack card.
Mirrored in /corporate-memory.

* fix(memory): four Devin-review bugs on /memory drill-down + manifest

PR #333 Devin review surfaced four real bugs that ship a broken
/memory experience even though the unit tests passed.

1. Manifest md5 omits is_required + content (app/api/sync.py:836-840)
   _build_memory_domains_section hashed only (id|title|status) per
   item. _build_per_domain_markdown routes items between "## Required"
   and "## Approved" by is_required and embeds full content — so an
   admin edit of either dimension left the manifest md5 unchanged,
   `agnes pull` skipped the re-fetch, and the analyst kept a stale
   bundle.md. Now both fields participate in the hash.

2. required_count always 0 (src/repositories/memory_domains.py)
   list_items_of_domain only SELECTed (id, title, status) so the
   `it.get("is_required")` in the manifest builder always evaluated
   to None → required_count = 0 regardless of actual state. The
   manifest builder advertised a count it could never compute. Now
   projects is_required + content too (required by fix 1 anyway).

3. Vote URL 404 (memory_domain_detail.html:289-290)
   Constructed `/api/memory/items/{id}/vote` but the route is
   `/api/memory/{id}/vote`. Every upvote/downvote button was a
   silent no-op.

4. Dismiss/undismiss URL + method both wrong (memory_domain_detail.html:296-305)
   Constructed `/api/memory/items/{id}/dismiss` (extra /items/) and
   /undismiss (no such route — undismiss is DELETE on /dismiss).
   Both buttons silently 404'd. Now POST + DELETE on
   `/api/memory/{id}/dismiss` per app/api/memory.py:635/675.

* fix: multi-agent reviewer findings — vendor-token scrubs + manifest md5 predicate + soft-delete filter

Three reviewer findings from the multi-agent review on PR #333,
fixed in-place per CLAUDE.md issue-economy rule.

Reviewer-rules (Important — vendor-agnostic OSS):
- app/main.py:218 comment: replaced 'foundryai-prod' with generic
  'a customer prod instance' phrasing. Public OSS repo must not
  carry customer-specific tokens (CLAUDE.md § Project conventions).
- tests/test_table_registry_v56_docs.py:70 fixture string:
  replaced "user_brand_affiliation = 'groupon'" with 'acme' on
  the same rule.

Reviewer-architecture (closes still-unresolved Devin 🚩 ANALYSIS):
- app/api/sync.py _build_memory_domains_section: md5 hash loop now
  filters items to the SAME predicate the bundle renderer uses
  (is_required OR status='approved'). Pre-fix the hash iterated ALL
  items but _build_per_domain_markdown only rendered the union of
  required items + approved-non-required items — so an admin edit
  to a pending/rejected non-required item flipped the md5 against
  an identical-bytes bundle, triggering a wasteful re-fetch on
  every analyst's next 'agnes pull'. The earlier commit fixed the
  hash-input fields (is_required + content); this closes the
  set-of-items asymmetry Devin separately flagged.
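A minimal sketch of a manifest hash that applies both fixes. It covers only the items the bundle renders (`is_required OR status = 'approved'`), and each item contributes `is_required` and `content` as well as id/title/status. Two details are assumptions, not necessarily what `_build_memory_domains_section` does: sorting by id, and JSON-encoding each item so field boundaries cannot be confused.

```python
import hashlib
import json
from typing import Any


def _rendered_in_bundle(item: dict[str, Any]) -> bool:
    # Same predicate as the bundle renderer: Required items plus approved items.
    return bool(item.get("is_required")) or item.get("status") == "approved"


def domain_manifest_md5(items: list[dict[str, Any]]) -> str:
    h = hashlib.md5()
    for item in sorted(filter(_rendered_in_bundle, items), key=lambda i: i["id"]):
        fields = [
            item["id"],
            item["title"],
            item["status"],
            bool(item.get("is_required")),
            item.get("content") or "",
        ]
        # One JSON array per item; arrays are self-delimiting, so a title or
        # content containing separator characters cannot shift field boundaries.
        h.update(json.dumps(fields).encode("utf-8"))
    return h.hexdigest()
```

An edit to a pending, non-required item leaves the digest unchanged. An edit to the content or Required flag of a rendered item changes it.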

Reviewer-RBAC (minor cleanup):
- app/resource_types.py _data_package_blocks and _memory_domain_blocks
  now filter 'WHERE deleted_at IS NULL' (v54 soft-delete column) so
  the /admin/access UI doesn't surface soft-deleted entities as
  grantable. Mirrors the existing filter on _recipe_blocks. No
  security leak pre-fix (resolver double-filters and re-checks at
  serve time), just UI cleanliness.
- app/services/stack_resolver.py add_to_stack: docstring note
  added explaining that authorization is enforced at the API layer
  (app/api/stack.py can_access gate), not at the resolver. The
  initial review suggested adding a defensive 403 here, but that
  broke 5 existing tests that legitimately call add_to_stack
  directly without setting up grants first; the docstring captures
  the contract instead. stack() already intersects subscriptions
  with current available_ids on every read, so a 'zombie' row from
  a misuse never leaks into the user-facing manifest.
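A sketch of why a zombie subscription cannot leak. The stack is recomputed on every read from the subscriptions still backed by an available grant, plus anything Required. The names are hypothetical, and folding Required entries in this way is a simplification of `stack()`.

```python
def resolve_stack(subscribed: set[str], available: set[str], required: set[str]) -> set[str]:
    # A subscription only counts while its grant is still available, so a row
    # written by add_to_stack without a grant (a "zombie") drops out here.
    return (subscribed & available) | required
```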

* release: 0.55.0 — unified Browse + My Stack (Data Packages + Memory), schema v48→v59, 3 BREAKING
2026-05-19 15:00:15 +02:00

1761 lines
66 KiB
Python

"""Corporate memory endpoints — knowledge items, voting, governance admin, contradictions."""
import asyncio
import json
import logging
import uuid
from typing import Optional, List

from fastapi import APIRouter, Depends, HTTPException, Response
from pydantic import BaseModel, ConfigDict, Field
import duckdb

from app.auth.dependencies import get_current_user, _get_db
from app.auth.access import require_admin, is_user_admin, can_access
from src.repositories.knowledge import KnowledgeRepository
from src.repositories.memory_domains import MemoryDomainsRepository
from src.repositories.audit import AuditRepository

logger = logging.getLogger(__name__)

router = APIRouter(prefix="/api/memory", tags=["memory"])


# v49: ``mandatory`` is no longer a lifecycle status — Required tier rides on
# ``knowledge_items.is_required``. ``status`` covers lifecycle only (pending,
# approved, rejected, revoked, expired).
VALID_STATUSES = ["pending", "approved", "rejected", "revoked", "expired"]

BUNDLE_TOKEN_BUDGET = 6000
# Rough chars-per-token estimate (conservative).
_CHARS_PER_TOKEN = 4


# v49: domain set is no longer a hardcoded enum — it lives in the
# ``memory_domains`` table and is administrable via /admin/memory-domains.
# Validation uses ``MemoryDomainsRepository.exists_by_slug``.
def _validate_domain_slug(slug: Optional[str], conn: duckdb.DuckDBPyConnection) -> None:
    """Raise 400 if ``slug`` is truthy but doesn't resolve to a memory_domains row."""
    if not slug:
        return
    if not MemoryDomainsRepository(conn).exists_by_slug(slug):
        raise HTTPException(
            status_code=400,
            detail=f"Unknown memory domain slug: {slug!r}",
        )


# API-layer allowlist for ``POST /api/memory/admin/bulk-update``. The repo's
# ``_UPDATABLE_FIELDS`` is intentionally broader (``status``, ``sensitivity``,
# ``is_personal``, ``confidence``, valid_from/until, supersedes, etc.) so the
# narrow per-item ``update`` path can still touch them; bulk-edit must NOT,
# because changing status / personal-flag / sensitivity in bulk bypasses the
# proper governance flow (``/admin/mandate``, ``/admin/revoke``,
# ``/{id}/personal``) and its dedicated audit rows. Callers that need those
# fields in bulk should use the per-item endpoints. See PR #126 review.
_BULK_UPDATE_ALLOWED = frozenset({
    "category", "domain", "tags", "tags_add", "tags_remove",
    "audience", "title", "content",
})


def _is_privileged_viewer(user: dict, conn: duckdb.DuckDBPyConnection) -> bool:
    """Admins (members of the Admin system group, per RBAC v13) are the
    privileged viewer tier. Pre-v13 the schema also had a km_admin role; v13
    collapsed the role hierarchy into groups, so the corporate-memory admin
    capability now lives on top of plain admin membership. Module authors
    needing a finer-grained gate (curator-only, etc.) should add a
    ``ResourceType.CORPORATE_MEMORY_ADMIN`` resource type and gate with
    ``require_resource_access`` instead of extending this helper."""
    user_id = user.get("id")
    if not user_id:
        return False
    return is_user_admin(user_id, conn)


def _effective_groups(
    user: dict, conn: duckdb.DuckDBPyConnection
) -> Optional[List[str]]:
    """Audience-filter group list for the caller, or ``None`` for admins
    (no filter — see all items regardless of audience).

    Reads from ``user_group_members`` JOIN ``user_groups`` (the v13 model).
    Pre-v13 this read ``users.groups`` JSON; that column was dropped in v13
    and the membership is now materialized in ``user_group_members`` with a
    ``source`` discriminator (admin / google_sync / system_seed).
    """
    if _is_privileged_viewer(user, conn):
        return None
    user_id = user.get("id")
    if not user_id:
        return []
    rows = conn.execute(
        """SELECT g.name FROM user_group_members m
           JOIN user_groups g ON m.group_id = g.id
           WHERE m.user_id = ?""",
        [user_id],
    ).fetchall()
    return [f"group:{r[0]}" for r in rows]


def _caller_granted_memory_domains(
    user: dict,
    conn: duckdb.DuckDBPyConnection,
) -> Optional[List[str]]:
    """Domains the caller has been granted access to via resource_grants.

    The grant model is generic — admins assign ``MEMORY_DOMAIN`` resources
    (e.g. ``md_finance``) to ``user_groups`` rows via ``/admin/access``.
    This helper resolves the caller's group memberships against
    ``resource_grants`` and returns the union of ``memory_domains.id``
    values (v49: the migration re-pointed grants from slug to id).

    Returns ``None`` for privileged viewers (admins see everything regardless
    of grants — same convention as ``_effective_groups``). Returns an
    empty list when the caller has no grants — the SQL EXISTS-join collapses
    in that case, preserving pre-RBAC behaviour.
    """
    if _is_privileged_viewer(user, conn):
        return None
    user_id = user.get("id")
    if not user_id:
        return []
    rows = conn.execute(
        """SELECT DISTINCT rg.resource_id
           FROM resource_grants rg
           JOIN user_group_members m ON m.group_id = rg.group_id
           WHERE m.user_id = ?
             AND rg.resource_type = 'memory_domain'""",
        [user_id],
    ).fetchall()
    return [r[0] for r in rows]


def _can_view_item(user: dict, item: dict, is_priv: bool) -> bool:
    """Personal items are visible only to the contributor and privileged
    viewers. Non-personal items are visible to any authenticated user.

    ``is_priv`` is pre-computed by the caller (one DB hit per request) so
    a per-item loop doesn't re-query ``user_group_members`` for every row.
    """
    if not item.get("is_personal"):
        return True
    if is_priv:
        return True
    return item.get("source_user") == user.get("email")
class CreateKnowledgeRequest(BaseModel):
# Allow callers to POST either `domain_slug` (the new canonical name,
# matching the admin/repo/template layers) or `domain` (a legacy alias kept
# for one release so existing API callers don't break). With
# populate_by_name=True, Pydantic v2 accepts either key on input; Python
# code reads `request.domain_slug`.
model_config = ConfigDict(populate_by_name=True)
title: str
content: str
category: str
tags: Optional[List[str]] = None
domain_slug: Optional[str] = Field(default=None, alias="domain")
entities: Optional[List[str]] = None
source_type: Optional[str] = None
class VoteRequest(BaseModel):
vote: int
class PersonalFlagRequest(BaseModel):
is_personal: bool
class AdminActionRequest(BaseModel):
reason: Optional[str] = None
audience: Optional[str] = None
class EditRequest(BaseModel):
title: Optional[str] = None
content: Optional[str] = None
class BatchActionRequest(BaseModel):
item_ids: List[str]
action: str # approve, reject, mandate, revoke
reason: Optional[str] = None
audience: Optional[str] = None
class ResolveContradictionRequest(BaseModel):
resolution: str # kept_a, kept_b, merged, both_valid
class CreateContradictionRequest(BaseModel):
item_a_id: str
item_b_id: str
explanation: str
severity: Optional[str] = None
suggested_resolution: Optional[str] = None
class PatchItemRequest(BaseModel):
"""Partial update for a knowledge item via PATCH /api/memory/admin/{id}.
Replaces the narrow ``EditRequest`` (title + content only). Any field
left as ``None`` is unchanged. ``domain``, when supplied, is validated
against the ``memory_domains`` catalog (v49 replaced the hardcoded
``VALID_DOMAINS`` constant).
``domain_ids`` is the M:N junction write path (knowledge_item_domains)
used by the admin item-edit modal's chip-input: pass a list of
memory_domains.id strings and the endpoint atomically replaces the item's
full domain membership. An empty list ``[]`` clears all memberships.
Supplying both ``domain`` and ``domain_ids`` is allowed: the legacy
single ``domain`` write happens first, and the junction replace then
overrides it.
"""
title: Optional[str] = None
content: Optional[str] = None
category: Optional[str] = None
domain: Optional[str] = None
domain_ids: Optional[List[str]] = None
tags: Optional[List[str]] = None
audience: Optional[str] = None
class BulkUpdateRequest(BaseModel):
"""Apply ``updates`` to every id in ``item_ids``. Issue #62."""
item_ids: List[str]
updates: dict
class ResolveDuplicateRequest(BaseModel):
"""Resolve a duplicate-candidate relation row.
``resolution`` is one of ``duplicate`` / ``different`` / ``dismissed``
(decision 2 in issue #62 — no auto-merge action; merging is a separate
larger feature).
"""
resolution: str
# ---- Memory domain catalog (v49 — frontend typeahead + admin dropdowns) ----
@router.get("/domains")
async def list_memory_domains(
user: dict = Depends(get_current_user),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""List all memory domains for chip-input typeahead + dropdown population.
v49: replaces the hardcoded ``VALID_DOMAINS`` constant. Returns every
row in ``memory_domains`` (admin-administered + the six canonical seed
rows) so the frontend can render the picker without a separate /admin
endpoint. Authenticated users only — domain catalog is non-sensitive
metadata that powers the item-edit UI.
"""
domains = MemoryDomainsRepository(conn).list()
return {
"domains": [
{
"id": d["id"],
"slug": d["slug"],
"name": d["name"],
"description": d["description"],
"icon": d["icon"],
"color": d["color"],
}
for d in domains
]
}
# ---- User endpoints ----
@router.get("")
async def list_knowledge(
status_filter: Optional[str] = None,
category: Optional[str] = None,
domain: Optional[str] = None,
source_type: Optional[str] = None,
search: Optional[str] = None,
exclude_personal: bool = True,
upvoted_by_me: bool = False,
hide_dismissed: bool = False,
is_required: Optional[bool] = None,
page: int = 1,
per_page: int = 50,
sort: str = "updated_at",  # NOTE: currently unused; not forwarded to the repository
user: dict = Depends(get_current_user),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""List knowledge items with filtering, pagination, search.
``upvoted_by_me=true`` narrows to items the caller upvoted (powers the
"My Upvotes" filter on /corporate-memory — replaces the old dead
"My Rules" category sentinel).
"""
repo = KnowledgeRepository(conn)
page = max(page, 1)
offset = (page - 1) * per_page
# Privacy: non-privileged viewers can never opt out of the personal filter.
# Their own personal contributions are visible via /my-contributions, not here.
effective_exclude_personal = True if not _is_privileged_viewer(user, conn) else exclude_personal
effective_groups = _effective_groups(user, conn)
granted_domains = _caller_granted_memory_domains(user, conn)
statuses = [status_filter] if status_filter else None
upvoted_by_user_id = user["id"] if upvoted_by_me else None
# v46: the caller's id is always passed to the repo filters; when
# hide_dismissed=True the SQL uses it in a NOT EXISTS subquery against
# knowledge_item_user_dismissed. Mandatory items are exempted by a guard
# inside that subquery.
dismissed_by_user_id = user["id"]
if search:
items = repo.search(
search,
exclude_personal=effective_exclude_personal,
user_groups=effective_groups,
granted_domains=granted_domains,
statuses=statuses,
category=category,
domain=domain,
source_type=source_type,
is_required=is_required,
dismissed_by_user=dismissed_by_user_id,
hide_dismissed=hide_dismissed,
limit=per_page,
offset=offset,
)
if upvoted_by_user_id:
# Best-effort post-filter for the search() path, which doesn't plumb
# the upvote filter into its SQL. Because it runs after LIMIT/OFFSET,
# a page can hold fewer than per_page items. Search + "My Upvotes" is
# rare enough that this trade-off is acceptable.
upvoted_ids = {
r[0] for r in conn.execute(
"SELECT item_id FROM knowledge_votes WHERE user_id = ? AND vote > 0",
[upvoted_by_user_id],
).fetchall()
}
items = [it for it in items if it["id"] in upvoted_ids]
else:
items = repo.list_items(
statuses=statuses,
category=category,
domain=domain,
source_type=source_type,
is_required=is_required,
exclude_personal=effective_exclude_personal,
user_groups=effective_groups,
granted_domains=granted_domains,
upvoted_by_user=upvoted_by_user_id,
dismissed_by_user=dismissed_by_user_id,
hide_dismissed=hide_dismissed,
limit=per_page,
offset=offset,
)
# Enrich with votes + per-user dismissal flag. The set lookup keeps the
# per-item annotation O(1); the frontend uses ``dismissed_by_me`` to
# render the gray-out state without a separate roundtrip.
dismissed_set = set(repo.list_dismissed_ids(user["id"]))
for item in items:
votes = repo.get_votes(item["id"])
item["upvotes"] = votes["upvotes"]
item["downvotes"] = votes["downvotes"]
item["score"] = votes["upvotes"] - votes["downvotes"]
item["dismissed_by_me"] = item["id"] in dismissed_set
import math
total_count = repo.count_items(
search=search,
statuses=statuses,
category=category,
domain=domain,
source_type=source_type,
is_required=is_required,
exclude_personal=effective_exclude_personal,
user_groups=effective_groups,
granted_domains=granted_domains,
dismissed_by_user=dismissed_by_user_id,
hide_dismissed=hide_dismissed,
)
total_pages = math.ceil(total_count / per_page) if per_page > 0 else 1
return {
"items": items,
"count": len(items),
"page": page,
"per_page": per_page,
"total_count": total_count,
"total_pages": total_pages,
}
@router.get("/stats")
async def get_stats(
user: dict = Depends(get_current_user),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""Get corporate memory statistics.
Aggregations exclude personal items for non-privileged callers. Otherwise
`total` and the `by_*` counts would change observably whenever a colleague
flags or unflags a personal item, leaking the item's existence (see ADR
Decision 1).
Uses SQL aggregation rather than ``repo.list_items()`` to keep the
endpoint cheap on large knowledge bases (the loader path materializes
every row and parses JSON tags/contributors per row, which blocks the
event loop once N > 1k items). The visibility filter mirrors what
list_items applies: ``audience IS NULL OR audience = 'all'`` plus, for
non-admins, membership in any of the caller's group-prefixed audiences
or a MEMORY_DOMAIN grant on one of the item's domains.
"""
is_priv = _is_privileged_viewer(user, conn)
groups = _effective_groups(user, conn)
granted_domains = _caller_granted_memory_domains(user, conn)
where_clauses: List[str] = []
params: list = []
if not is_priv:
# Personal-item privacy: non-privileged callers see no personal items
# in the aggregate, even their own. /my-contributions is the canonical
# surface for a user's personal contributions; including them here
# would make /api/memory/stats.total disagree with the count visible
# via GET /api/memory (which forces exclude_personal=True for non-
# admins regardless of source_user).
where_clauses.append("(is_personal IS NULL OR is_personal = FALSE)")
if groups is not None:
# Mirror the visibility composition KnowledgeRepository.list_items
# uses: audience match OR MEMORY_DOMAIN grant. Without this the
# stats `total` diverges from the list endpoint's `total_count` for
# non-admin users with grants. v49: granted_domains values are
# ``memory_domains.id`` and resolve via the junction EXISTS subquery.
visibility = ["audience IS NULL", "audience = 'all'"]
if groups:
placeholders = ",".join(["?"] * len(groups))
visibility.append(f"audience IN ({placeholders})")
params.extend(groups)
if granted_domains:
domain_placeholders = ",".join(["?"] * len(granted_domains))
visibility.append(
"EXISTS (SELECT 1 FROM knowledge_item_domains kid "
"WHERE kid.item_id = knowledge_items.id "
f"AND kid.domain_id IN ({domain_placeholders}))"
)
params.extend(granted_domains)
where_clauses.append("(" + " OR ".join(visibility) + ")")
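# Illustration (hedged; derived by reading the clause-building code above,
# not captured from a live query): for a non-privileged caller whose only
# group token is the hypothetical "group:finance" and who holds one
# MEMORY_DOMAIN grant, where_sql renders roughly as
#     WHERE (is_personal IS NULL OR is_personal = FALSE)
#       AND (audience IS NULL OR audience = 'all' OR audience IN (?)
#            OR EXISTS (SELECT 1 FROM knowledge_item_domains kid
#                       WHERE kid.item_id = knowledge_items.id
#                       AND kid.domain_id IN (?)))
# with params ["group:finance", "<granted domain id>"]: group tokens are
# bound before granted domain ids, matching the placeholder order.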
where_sql = (" WHERE " + " AND ".join(where_clauses)) if where_clauses else ""
total = conn.execute(
f"SELECT COUNT(*) FROM knowledge_items{where_sql}", params
).fetchone()[0] or 0
by_status_rows = conn.execute(
f"SELECT COALESCE(status, 'unknown') AS s, COUNT(*) "
f"FROM knowledge_items{where_sql} GROUP BY s",
params,
).fetchall()
by_status = {r[0]: r[1] for r in by_status_rows}
cat_rows = conn.execute(
f"SELECT DISTINCT category FROM knowledge_items{where_sql} "
f"{'AND' if where_sql else 'WHERE'} category IS NOT NULL",
params,
).fetchall()
categories = sorted(r[0] for r in cat_rows if r[0])
# v49: domain lives in the junction. LEFT JOIN to surface 'unset' bucket
# for items without any domain row, matching the pre-v49 COALESCE behavior.
by_domain_rows = conn.execute(
"SELECT COALESCE(md.slug, 'unset') AS d, COUNT(*) "
"FROM knowledge_items "
"LEFT JOIN knowledge_item_domains kid ON kid.item_id = knowledge_items.id "
"LEFT JOIN memory_domains md ON md.id = kid.domain_id"
+ (where_sql or "")
+ " GROUP BY d",
params,
).fetchall()
by_domain = {r[0]: r[1] for r in by_domain_rows}
by_source_rows = conn.execute(
f"SELECT COALESCE(source_type, 'unknown') AS st, COUNT(*) "
f"FROM knowledge_items{where_sql} GROUP BY st",
params,
).fetchall()
by_source_type = {r[0]: r[1] for r in by_source_rows}
# by_tag + by_audience extend stats for the chip-filter UI (issue #62).
# The repo helpers honor the same audience + personal-item filters this
# endpoint applies above.
repo = KnowledgeRepository(conn)
exclude_personal_for_caller = not is_priv
by_tag = repo.count_by_tag(
exclude_personal=exclude_personal_for_caller,
user_groups=groups,
granted_domains=granted_domains,
)
by_audience = repo.count_by_audience(
exclude_personal=exclude_personal_for_caller,
user_groups=groups,
granted_domains=granted_domains,
)
return {
"total": total,
"by_status": by_status,
"categories": categories,
"by_domain": by_domain,
"by_source_type": by_source_type,
"by_tag": by_tag,
"by_audience": by_audience,
}
@router.post("", status_code=201)
async def create_knowledge(
request: CreateKnowledgeRequest,
user: dict = Depends(get_current_user),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""Create a knowledge item in ``pending`` status, contributed by the caller."""
# Mirror the validation already enforced by PATCH /admin/{id} and bulk-update
# so an item can't be created with a domain it couldn't later be patched to.
# An empty or missing domain is fine; only non-empty slugs that don't match a
# known memory domain are rejected. See PR #126 review.
_validate_domain_slug(request.domain_slug, conn)
repo = KnowledgeRepository(conn)
item_id = str(uuid.uuid4())
# Best-effort auto-tagging — runs only when an LLM extractor is configured.
tags = list(request.tags) if request.tags else []
try:
from config.loader import load_instance_config
from connectors.llm import create_extractor
from services.corporate_memory.tagger import auto_tag_items
cfg = load_instance_config()
ai_cfg = cfg.get("ai")
if ai_cfg:
extractor = create_extractor(ai_cfg)
stub = [{"id": item_id, "title": request.title, "content": request.content}]
assignments = await asyncio.to_thread(auto_tag_items, stub, extractor)
topics = assignments.get(item_id, [])
if topics:
seen: set[str] = set()
merged: list[str] = []
for t in topics + tags:
if t not in seen:
seen.add(t)
merged.append(t)
tags = merged
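# Worked example (hypothetical values): with LLM topics ["finance", "kpi"]
# and caller tags ["kpi", "q3"], the loop above yields
# ["finance", "kpi", "q3"]. Topics come first, and each duplicate keeps
# its first position.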
except Exception:
pass # tagging is non-critical — never block item creation
create_kwargs = dict(
id=item_id,
title=request.title,
content=request.content,
category=request.category,
source_user=user.get("email"),
tags=tags or None,
domain=request.domain_slug,
entities=request.entities,
confidence=0.50,
)
if request.source_type:
create_kwargs["source_type"] = request.source_type
repo.create(**create_kwargs)
return {"id": item_id, "status": "pending"}
@router.post("/{item_id}/vote")
async def vote_knowledge(
item_id: str,
request: VoteRequest,
user: dict = Depends(get_current_user),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""Cast (1 / -1) or retract (0) the caller's vote; returns the item's updated vote tallies."""
if request.vote not in (1, -1, 0):
raise HTTPException(status_code=400, detail="Vote must be 1, -1, or 0 (retract)")
repo = KnowledgeRepository(conn)
item = repo.get_by_id(item_id)
if not item or not _can_view_item(user, item, _is_privileged_viewer(user, conn)):
raise HTTPException(status_code=404, detail="Knowledge item not found")
if request.vote == 0:
repo.unvote(item_id, user["id"])
else:
repo.vote(item_id, user["id"], request.vote)
return repo.get_votes(item_id)
@router.get("/my-votes")
async def get_my_votes(
user: dict = Depends(get_current_user),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""Get current user's votes on all items."""
results = conn.execute(
"SELECT item_id, vote FROM knowledge_votes WHERE user_id = ?", [user["id"]]
).fetchall()
return {row[0]: row[1] for row in results}
@router.get("/my-contributions")
async def get_my_contributions(
user: dict = Depends(get_current_user),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""Get knowledge items contributed by the current user."""
repo = KnowledgeRepository(conn)
email = user.get("email", "")
items = repo.get_user_contributions(email)
for item in items:
votes = repo.get_votes(item["id"])
item["upvotes"] = votes["upvotes"]
item["downvotes"] = votes["downvotes"]
item["score"] = votes["upvotes"] - votes["downvotes"]
return {"items": items, "count": len(items)}
@router.post("/{item_id}/personal")
async def toggle_personal_flag(
item_id: str,
request: PersonalFlagRequest,
user: dict = Depends(get_current_user),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""Set or clear the personal (excluded) flag on a knowledge item. Only the contributor may change it."""
repo = KnowledgeRepository(conn)
item = repo.get_by_id(item_id)
if not item:
raise HTTPException(status_code=404, detail="Knowledge item not found")
if item.get("source_user") != user.get("email"):
raise HTTPException(status_code=403, detail="Only the contributor can flag personal items")
repo.set_personal(item_id, request.is_personal)
return {"id": item_id, "is_personal": request.is_personal}
@router.post("/{item_id}/dismiss")
async def dismiss_item(
item_id: str,
user: dict = Depends(get_current_user),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""Per-user opt-out — remove an item from the caller's AI bundle.
Idempotent: re-dismissing an already-dismissed item is a no-op success.
Mandatory items can never be dismissed — the governance hard rule —
so a POST against one returns 400 with a clear detail message.
"""
repo = KnowledgeRepository(conn)
item = repo.get_by_id(item_id)
if not item or not _can_view_item(user, item, _is_privileged_viewer(user, conn)):
raise HTTPException(status_code=404, detail="Knowledge item not found")
# v49: Required tier rides on ``is_required`` (was status='mandatory').
if item.get("is_required") is True:
raise HTTPException(status_code=400, detail="Cannot dismiss a mandatory item")
repo.dismiss(user["id"], item_id)
# v49 Section 9.2 — telemetry. domain_ids surfaces the per-item domain
# membership so /admin/telemetry can correlate dismissals with the
# domain they came from.
try:
from src.repositories.memory_domains import MemoryDomainsRepository
from src.repositories.usage import UsageRepository
domain_ids = [
d["id"] for d in MemoryDomainsRepository(conn).list_domains_of_item(item_id)
]
UsageRepository(conn).emit_server_event(
event_type="memory.dismiss",
user_id=user["id"],
username=user.get("email") or user["id"],
props={"item_id": item_id, "domain_ids": domain_ids},
)
except Exception:
pass  # telemetry is best-effort; never fail the dismissal over it
return {"id": item_id, "dismissed": True}
@router.delete("/{item_id}/dismiss", status_code=204)
async def undismiss_item(
item_id: str,
user: dict = Depends(get_current_user),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""Idempotent un-dismiss — a second DELETE still returns 204.
Returns 404 if the item itself doesn't exist (consistent with the rest
of the per-item endpoints); the dismissal row's existence is not
consulted because absence is the success state.
"""
repo = KnowledgeRepository(conn)
item = repo.get_by_id(item_id)
if not item or not _can_view_item(user, item, _is_privileged_viewer(user, conn)):
raise HTTPException(status_code=404, detail="Knowledge item not found")
repo.undismiss(user["id"], item_id)
# v49 Section 9.2: telemetry, best-effort fire-and-forget. The endpoint
# returns 204 No Content, so there is no response body to build;
# telemetry is the only remaining side effect.
try:
from src.repositories.usage import UsageRepository
UsageRepository(conn).emit_server_event(
event_type="memory.undismiss",
user_id=user["id"],
username=user.get("email") or user["id"],
props={"item_id": item_id},
)
except Exception:
pass
@router.get("/{item_id}/provenance")
async def get_provenance(
item_id: str,
user: dict = Depends(get_current_user),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""Get source provenance for a knowledge item."""
repo = KnowledgeRepository(conn)
item = repo.get_by_id(item_id)
if not item or not _can_view_item(user, item, _is_privileged_viewer(user, conn)):
raise HTTPException(status_code=404, detail="Knowledge item not found")
return {
"id": item_id,
"source_type": item.get("source_type"),
"source_ref": item.get("source_ref"),
"source_user": item.get("source_user"),
"confidence": item.get("confidence"),
"domain": item.get("domain"),
"entities": item.get("entities"),
"valid_from": item.get("valid_from"),
"valid_until": item.get("valid_until"),
"supersedes": item.get("supersedes"),
"created_at": item.get("created_at"),
}
# ---- Admin governance endpoints ----
def _get_item_or_404(repo: KnowledgeRepository, item_id: str) -> dict:
item = repo.get_by_id(item_id)
if not item:
raise HTTPException(status_code=404, detail="Knowledge item not found")
return item
def _audit_action(conn, admin_email: str, action: str, item_id: str, details: Optional[dict] = None):
"""Write an admin governance audit row.
Action names use the ``corporate_memory.<action>`` namespace as advertised
in the 0.15.0 CHANGELOG. Pre-#62 the code wrote ``km_<action>`` — the
audit-tab filter (see ``admin_audit`` below) accepts both prefixes so
historical rows still surface.
"""
audit = AuditRepository(conn)
audit.log(
user_id=admin_email,
action=f"corporate_memory.{action}",
resource=item_id,
params=details,
)
@router.post("/admin/approve")
async def admin_approve(
item_id: str,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
repo = KnowledgeRepository(conn)
_get_item_or_404(repo, item_id)
repo.update_status(item_id, "approved")
_audit_action(conn, user["email"], "approve", item_id)
return {"id": item_id, "status": "approved"}
@router.post("/admin/reject")
async def admin_reject(
item_id: str,
request: AdminActionRequest,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
repo = KnowledgeRepository(conn)
_get_item_or_404(repo, item_id)
repo.update_status(item_id, "rejected")
_audit_action(conn, user["email"], "reject", item_id, {"reason": request.reason})
return {"id": item_id, "status": "rejected"}
@router.post("/admin/mandate")
async def admin_mandate(
item_id: str,
request: AdminActionRequest,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""v49: the Required tier rides on the ``knowledge_items.is_required``
boolean; ``status`` is reserved for lifecycle (pending/approved/rejected/
revoked/expired). This endpoint keeps its path stable for back-compat, and
its response carries ``is_required: True`` alongside the legacy
``status: 'mandatory'`` key, so clients that still read ``status`` are
unaffected.
"""
repo = KnowledgeRepository(conn)
_get_item_or_404(repo, item_id)
repo.set_is_required(item_id, True)
if request.audience is not None:
repo.update(item_id, audience=request.audience)
_audit_action(conn, user["email"], "mandate", item_id, {
"reason": request.reason, "audience": request.audience,
})
# v49 Section 9.1 — spec table maps both mark-mandatory and the legacy
# mandate endpoint to the canonical ``memory_item.set_required`` action
# with a boolean payload so audit consumers can stop splitting on path.
try:
AuditRepository(conn).log(
user_id=user["email"],
action="memory_item.set_required",
resource=f"knowledge_item:{item_id}",
params={"new_value": True},
)
except Exception:
pass
return {"id": item_id, "is_required": True, "status": "mandatory"}
@router.post("/items/{item_id}/mark-mandatory")
async def mark_mandatory(
item_id: str,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""Promote an item to required (``is_required = TRUE``).
v49: explicit path-segment variant of the legacy ``/admin/mandate`` query-
param endpoint, matching the spec's Section 6 mapping table. Same audit
pattern but no audience / reason fields — those stay on /admin/mandate.
"""
repo = KnowledgeRepository(conn)
_get_item_or_404(repo, item_id)
repo.set_is_required(item_id, True)
AuditRepository(conn).log(
user_id=user["email"],
action="memory_item.set_required",
resource=f"knowledge_item:{item_id}",
params={"new_value": True},
)
return {"id": item_id, "is_required": True}
@router.post("/items/{item_id}/mark-unmandatory")
async def mark_unmandatory(
item_id: str,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""Demote an item from required (``is_required = FALSE``).
v49 — inverse of mark-mandatory. The item stays in the catalog with its
existing ``status`` (typically ``approved``); only the required-tier flag
flips. Audit row writes ``memory_item.set_required`` with
``{new_value: false}``.
"""
repo = KnowledgeRepository(conn)
_get_item_or_404(repo, item_id)
repo.set_is_required(item_id, False)
AuditRepository(conn).log(
user_id=user["email"],
action="memory_item.set_required",
resource=f"knowledge_item:{item_id}",
params={"new_value": False},
)
return {"id": item_id, "is_required": False}
@router.post("/admin/revoke")
async def admin_revoke(
item_id: str,
request: AdminActionRequest,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
repo = KnowledgeRepository(conn)
_get_item_or_404(repo, item_id)
repo.update_status(item_id, "revoked")
_audit_action(conn, user["email"], "revoke", item_id, {"reason": request.reason})
return {"id": item_id, "status": "revoked"}
@router.post("/admin/edit")
async def admin_edit(
item_id: str,
request: EditRequest,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
repo = KnowledgeRepository(conn)
_get_item_or_404(repo, item_id)
updates = {}
if request.title is not None:
updates["title"] = request.title
if request.content is not None:
updates["content"] = request.content
if updates:
repo.update(item_id, **updates)
_audit_action(conn, user["email"], "edit", item_id, updates)
return {"id": item_id, "updated": list(updates.keys())}
@router.post("/admin/batch")
async def admin_batch(
request: BatchActionRequest,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""Batch governance action on multiple items.
v49: ``mandate`` flips the new ``is_required`` boolean to TRUE (was
``status='mandatory'`` overload). Other actions still drive ``status``.
"""
repo = KnowledgeRepository(conn)
# mandate is special — it writes is_required, not status. All other
# actions stay on the status lifecycle column.
status_actions = {
"approve": "approved",
"reject": "rejected",
"revoke": "revoked",
}
if request.action not in (*status_actions, "mandate"):
raise HTTPException(status_code=400, detail=f"Invalid action: {request.action}")
results = {"success": [], "not_found": []}
for item_id in request.item_ids:
item = repo.get_by_id(item_id)
if not item:
results["not_found"].append(item_id)
continue
if request.action == "mandate":
repo.set_is_required(item_id, True)
if request.audience is not None:
repo.update(item_id, audience=request.audience)
else:
repo.update_status(item_id, status_actions[request.action])
_audit_action(conn, user["email"], request.action, item_id, {
"reason": request.reason, "audience": request.audience, "batch": True,
})
results["success"].append(item_id)
return results
@router.get("/admin/pending")
async def admin_pending(
category: Optional[str] = None,
page: int = 1,
per_page: int = 50,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""Get pending items queue for admin review."""
repo = KnowledgeRepository(conn)
page = max(page, 1)
offset = (page - 1) * per_page
items = repo.list_items(statuses=["pending"], category=category, limit=per_page, offset=offset)
return {"items": items, "count": len(items)}
@router.get("/admin/audit")
async def admin_audit(
page: int = 1,
per_page: int = 50,
action: Optional[str] = None,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""Get governance audit log.
Filters ``corporate_memory.<action>`` rows AND legacy ``km_<action>``
rows. The dual prefix is here because rows already in the audit log keep
the legacy ``km_*`` action name (no migration of historical audit rows —
they are write-once); new rows use the ``corporate_memory.*`` namespace.
See issue #62 decision E.
"""
# Pagination: page is 1-indexed; offset must apply to BOTH branches so the
# UI's per-page navigation actually returns subsequent rows. Pre-fix, both
# SQL paths had LIMIT only and silently returned page 1 for every page.
offset = (max(page, 1) - 1) * per_page
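# Worked example of the offset arithmetic above: page=3, per_page=50 gives
# OFFSET 100 (rows 101-150); page <= 0 is clamped to page 1, i.e. OFFSET 0.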
if action:
# Match the action across both prefixes so the per-action filter still
# surfaces historical rows.
rows = conn.execute(
"""SELECT * FROM audit_log
WHERE action IN (?, ?)
ORDER BY timestamp DESC LIMIT ? OFFSET ?""",
[f"corporate_memory.{action}", f"km_{action}", per_page, offset],
).fetchall()
else:
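# ``_`` is a single-character wildcard in LIKE, so escape it to match the
# literal prefixes; an unescaped 'km_%' would also match e.g. 'kms...'.
rows = conn.execute(
"""SELECT * FROM audit_log
WHERE action LIKE 'corporate!_memory.%' ESCAPE '!' OR action LIKE 'km!_%' ESCAPE '!'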
ORDER BY timestamp DESC LIMIT ? OFFSET ?""",
[per_page, offset],
).fetchall()
if rows:
columns = [desc[0] for desc in conn.description]
entries = [dict(zip(columns, row)) for row in rows]
else:
entries = []
return {"entries": entries, "count": len(entries)}
# ---- Admin contradiction endpoints ----
@router.get("/admin/contradictions")
async def admin_contradictions(
resolved: Optional[bool] = None,
exclude_personal: bool = True,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""List knowledge contradictions for admin review.
By default (`exclude_personal=True`), personal items are replaced with
{id, hidden: true} so the contradiction record is still visible for
governance but personal content is not exposed. Pass exclude_personal=false
to opt in to full content (KM_ADMIN only — see ADR Decision 1).
"""
repo = KnowledgeRepository(conn)
contradictions = repo.list_contradictions(resolved=resolved)
# Collect all distinct item IDs and fetch in one query (M5 batch optimisation).
all_item_ids = list({
id_
for c in contradictions
for id_ in (c["item_a_id"], c["item_b_id"])
})
items_by_id = repo.get_by_ids(all_item_ids) if all_item_ids else {}
for c in contradictions:
# _strip_personal (defined below) swaps a personal item for an
# {id, hidden: true} placeholder when exclude_personal is set.
c["item_a"] = _strip_personal(items_by_id.get(c["item_a_id"]), exclude_personal)
c["item_b"] = _strip_personal(items_by_id.get(c["item_b_id"]), exclude_personal)
return {"contradictions": contradictions, "count": len(contradictions)}
@router.post("/admin/contradictions", status_code=201)
async def admin_create_contradiction(
request: CreateContradictionRequest,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""Admin endpoint for manually recording a contradiction between two knowledge items."""
repo = KnowledgeRepository(conn)
if not repo.get_by_id(request.item_a_id):
raise HTTPException(status_code=404, detail=f"Item A not found: {request.item_a_id}")
if not repo.get_by_id(request.item_b_id):
raise HTTPException(status_code=404, detail=f"Item B not found: {request.item_b_id}")
cid = repo.create_contradiction(
item_a_id=request.item_a_id,
item_b_id=request.item_b_id,
explanation=request.explanation,
severity=request.severity,
suggested_resolution=request.suggested_resolution,
)
return {"id": cid}
@router.post("/admin/contradictions/{contradiction_id}/resolve")
async def admin_resolve_contradiction(
contradiction_id: str,
request: ResolveContradictionRequest,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""Resolve a knowledge contradiction."""
repo = KnowledgeRepository(conn)
contradiction = repo.get_contradiction(contradiction_id)
if not contradiction:
raise HTTPException(status_code=404, detail="Contradiction not found")
if contradiction.get("resolved"):
raise HTTPException(status_code=400, detail="Contradiction already resolved")
valid_resolutions = ["kept_a", "kept_b", "merged", "both_valid"]
if request.resolution not in valid_resolutions:
raise HTTPException(
status_code=400,
detail=f"Resolution must be one of: {valid_resolutions}",
)
repo.resolve_contradiction(contradiction_id, user["email"], request.resolution)
_audit_action(conn, user["email"], "resolve_contradiction", contradiction_id, {
"resolution": request.resolution,
"item_a_id": contradiction["item_a_id"],
"item_b_id": contradiction["item_b_id"],
})
return {"id": contradiction_id, "resolved": True, "resolution": request.resolution}
# ---- Admin duplicate-candidate endpoints (issue #62) ----
VALID_DUPLICATE_RESOLUTIONS = ["duplicate", "different", "dismissed"]
DUPLICATE_RELATION_TYPE = "likely_duplicate"
def _strip_personal(item: Optional[dict], hide: bool) -> Optional[dict]:
"""Return a placeholder dict when ``item`` is personal and ``hide`` is set."""
if item is None:
return None
if hide and item.get("is_personal"):
return {"id": item.get("id"), "hidden": True}
return item
@router.get("/admin/duplicate-candidates")
async def admin_duplicate_candidates(
resolved: Optional[bool] = None,
exclude_personal: bool = True,
limit: int = 100,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""List duplicate-candidate relations for admin review.
Pass ``resolved=true`` or ``resolved=false`` to filter, or omit the
parameter to fetch every state (the original UI default). The web UI
surfaces the actionable backlog by passing ``resolved=false`` explicitly.
With ``exclude_personal=true`` (default) personal items in the pair are
replaced with ``{id, hidden: true}`` — the relation row is still visible
so admins can resolve it, but content stays inside the personal-item
privacy boundary (ADR Decision 1 precedent).
"""
repo = KnowledgeRepository(conn)
relations = repo.list_relations(
relation_type=DUPLICATE_RELATION_TYPE,
resolved=resolved,
limit=limit,
)
item_ids = list({
id_
for r in relations
for id_ in (r["item_a_id"], r["item_b_id"])
})
items_by_id = repo.get_by_ids(item_ids) if item_ids else {}
for r in relations:
r["item_a"] = _strip_personal(items_by_id.get(r["item_a_id"]), exclude_personal)
r["item_b"] = _strip_personal(items_by_id.get(r["item_b_id"]), exclude_personal)
return {"relations": relations, "count": len(relations)}
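The listing endpoint collects every referenced item id once, resolves them in a single batch, and only then attaches (possibly redacted) items to each relation. A minimal standalone sketch of that pattern, assuming a plain dict stands in for `repo.get_by_ids()`; the `attach_items` helper and the `store` data are hypothetical:

```python
from typing import Dict, List, Optional


def strip_personal(item: Optional[dict], hide: bool) -> Optional[dict]:
    # Same placeholder rule as _strip_personal: keep the id, hide the content.
    if item is None:
        return None
    if hide and item.get("is_personal"):
        return {"id": item.get("id"), "hidden": True}
    return item


def attach_items(relations: List[dict], store: Dict[str, dict], hide: bool) -> List[dict]:
    # Collect each referenced id exactly once, so the lookup is one batch
    # rather than one query per relation (the N+1 pattern).
    item_ids = {i for r in relations for i in (r["item_a_id"], r["item_b_id"])}
    # Hypothetical stand-in for repo.get_by_ids(): ids missing from the store
    # (e.g. deleted items) are simply absent from the result.
    items_by_id = {i: store[i] for i in item_ids if i in store}
    for r in relations:
        r["item_a"] = strip_personal(items_by_id.get(r["item_a_id"]), hide)
        r["item_b"] = strip_personal(items_by_id.get(r["item_b_id"]), hide)
    return relations


store = {
    "a": {"id": "a", "is_personal": False, "title": "Revenue definition"},
    "b": {"id": "b", "is_personal": True, "title": "My revenue notes"},
}
rels = attach_items(
    [{"item_a_id": "a", "item_b_id": "b"}, {"item_a_id": "a", "item_b_id": "gone"}],
    store,
    hide=True,
)
print(rels[0]["item_b"])  # {'id': 'b', 'hidden': True}
print(rels[1]["item_b"])  # None
```

A relation whose item has been deleted still comes back, with `None` in place of the missing side, which matches how the endpoint behaves when `get_by_ids` has no row for an id.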


@router.post("/admin/duplicate-candidates/resolve")
async def admin_resolve_duplicate_candidate(
    item_a_id: str,
    item_b_id: str,
    request: ResolveDuplicateRequest,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Resolve a duplicate-candidate relation.

    The admin chooses ``duplicate`` (acknowledge), ``different`` (false
    positive), or ``dismissed`` (don't surface again, without a judgment).
    Re-resolving an already-resolved pair is rejected with 400 rather than
    treated as an idempotent no-op, because the audit trail should hold one
    decision per pair.
    """
    if request.resolution not in VALID_DUPLICATE_RESOLUTIONS:
        raise HTTPException(
            status_code=400,
            detail=f"resolution must be one of: {VALID_DUPLICATE_RESOLUTIONS}",
        )
    repo = KnowledgeRepository(conn)
    existing = repo.get_relation(item_a_id, item_b_id, DUPLICATE_RELATION_TYPE)
    if not existing:
        raise HTTPException(status_code=404, detail="Duplicate-candidate relation not found")
    if existing.get("resolved"):
        raise HTTPException(status_code=400, detail="Relation already resolved")
    repo.resolve_relation(
        item_a_id=item_a_id,
        item_b_id=item_b_id,
        relation_type=DUPLICATE_RELATION_TYPE,
        resolved_by=user["email"],
        resolution=request.resolution,
    )
    # The audit row's resource id is the canonical (sorted) a::b pair, so it
    # greps the same regardless of the order the ids were passed in.
    a, b = sorted([item_a_id, item_b_id])
    _audit_action(
        conn,
        user["email"],
        "resolve_duplicate",
        f"{a}::{b}",
        {"resolution": request.resolution, "item_a_id": a, "item_b_id": b},
    )
    return {
        "item_a_id": a,
        "item_b_id": b,
        "resolved": True,
        "resolution": request.resolution,
    }
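Sorting the two ids before formatting the audit resource makes the key independent of argument order, which is what keeps it grep-able. A small sketch, with a hypothetical in-memory `audit_log` list standing in for the real audit table; it only demonstrates key stability (the real endpoint would reject the second call with 400):

```python
from typing import List, Tuple


def audit_resource(item_a_id: str, item_b_id: str) -> Tuple[str, str, str]:
    # Sort the pair so (x, y) and (y, x) produce the same resource id.
    a, b = sorted([item_a_id, item_b_id])
    return a, b, f"{a}::{b}"


# Two hypothetical calls that name the same pair in opposite orders.
audit_log: List[dict] = []
for first, second in [("k9", "k2"), ("k2", "k9")]:
    a, b, resource = audit_resource(first, second)
    audit_log.append({"action": "resolve_duplicate", "resource": resource})

# A single exact-match lookup finds both rows.
hits = [row for row in audit_log if row["resource"] == "k2::k9"]
print(len(hits))  # 2
```

The ordering is plain lexicographic string order (so `"k10"` sorts before `"k9"`); that is fine here because the only requirement is that the key be deterministic.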


# ---- Admin PATCH + bulk-update + tree endpoints (issue #62) ----


@router.get("/admin/{item_id}")
async def admin_get_item(
    item_id: str,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Single-item GET powering the ``#item-<id>`` deep link.

    The deep link comes from the Edit affordance on ``/memory/d/<slug>``.
    The admin page uses this endpoint to fetch the row directly (bypassing
    pagination of the All-Items list), so the edit modal opens reliably
    regardless of which page the item happens to fall on. Returns the same
    dict shape as the list rows.

    Route placement note: declared AFTER all named ``/admin/<word>`` GET
    routes (pending, audit, contradictions, duplicate-candidates) so the
    catch-all ``{item_id}`` doesn't shadow them; FastAPI matches routes in
    declaration order.
    """
    repo = KnowledgeRepository(conn)
    item = repo.get_by_id(item_id)
    if not item:
        raise HTTPException(status_code=404, detail="item_not_found")
    return item
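The route-placement note depends on first-match routing: routes are tried in declaration order, and the first one whose path matches wins. A simplified pure-Python model of that rule, using hypothetical `compile_route` and `first_match` helpers; this is not Starlette's actual router, which additionally distinguishes full matches from method-mismatch partial matches:

```python
import re
from typing import List, Optional, Tuple


def compile_route(path: str) -> re.Pattern:
    # "/admin/{item_id}" -> r"^/admin/(?P<item_id>[^/]+)$"
    return re.compile("^" + re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", path) + "$")


def first_match(routes: List[Tuple[re.Pattern, str]], path: str) -> Optional[str]:
    # Walk routes in declaration order and return the first one that matches.
    for pattern, name in routes:
        if pattern.match(path):
            return name
    return None


catch_all = (compile_route("/admin/{item_id}"), "admin_get_item")
named = (compile_route("/admin/pending"), "admin_pending")

print(first_match([catch_all, named], "/admin/pending"))  # admin_get_item (shadowed)
print(first_match([named, catch_all], "/admin/pending"))  # admin_pending
print(first_match([named, catch_all], "/admin/k123"))     # admin_get_item
```

Declaring the catch-all first makes `/admin/pending` resolve to the item lookup with `item_id="pending"`, which is exactly the shadowing the note warns about.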


@router.patch("/admin/{item_id}")
async def admin_patch_item(
    item_id: str,
    request: PatchItemRequest,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Partial update of category/domain/tags/audience/title/content.

    Also accepts ``domain_ids``, which replaces the item's full M:N domain
    membership. Replaces the narrow ``POST /api/memory/admin/edit`` (kept
    for one release as a thin alias). The audit row, tagged
    ``corporate_memory.update_item``, records which fields changed, not the
    full diff, to keep audit rows compact.
    """
    repo = KnowledgeRepository(conn)
    _get_item_or_404(repo, item_id)
    # ``exclude_unset=True`` preserves explicit ``null`` values from the request
    # body so callers can clear previously-set Optional fields (e.g. PATCH
    # ``{"audience": null}`` resets audience to NULL). With ``exclude_none=True``
    # those nulls were silently dropped: the only way to clear a field was the
    # empty-string short-circuit on ``domain``, and ``audience`` could not be
    # cleared at all. See PR #126 round-4 review.
    updates = request.model_dump(exclude_unset=True)
    if "domain" in updates and updates["domain"]:
        _validate_domain_slug(updates["domain"], conn)
    # ``title`` is NOT NULL in the schema. ``exclude_unset=True`` lets an explicit
    # ``null`` through, which would 500 on a DuckDB constraint violation. Reject
    # it at the boundary so the caller gets a clear 400 instead.
    if "title" in updates and updates["title"] is None:
        raise HTTPException(status_code=400, detail="title cannot be null")
    # M:N domain membership lives in ``knowledge_item_domains`` and is written
    # via a separate junction-repo call. Strip it from the legacy
    # ``repo.update(**kwargs)`` call, since the knowledge_items row has no
    # ``domain_ids`` column.
    domain_ids = updates.pop("domain_ids", None)
    if not updates and domain_ids is None:
        return {"id": item_id, "updated": []}
    # Resolve domain ids to slugs BEFORE any write: replace_domains_for_item
    # takes slugs, and an unknown id must produce a 400 without leaving the
    # scalar-field update already applied. An unknown id means a race or an
    # already-deleted domain, since the admin chip input only picks from
    # /api/admin/memory-domains.
    slugs: Optional[List[str]] = None
    if domain_ids is not None:
        slugs = []
        if domain_ids:
            placeholders = ",".join(["?"] * len(domain_ids))
            rows = conn.execute(
                f"SELECT id, slug FROM memory_domains WHERE id IN ({placeholders})",
                domain_ids,
            ).fetchall()
            id_to_slug = {r[0]: r[1] for r in rows}
            missing = [i for i in domain_ids if i not in id_to_slug]
            if missing:
                raise HTTPException(
                    status_code=400,
                    detail=f"unknown_domain_ids: {missing}",
                )
            slugs = [id_to_slug[i] for i in domain_ids]
    # tags is a list: JSON-encode it to match the column type, mirroring create().
    repo_kwargs = dict(updates)
    if "tags" in repo_kwargs:
        repo_kwargs["tags"] = (
            json.dumps(repo_kwargs["tags"]) if repo_kwargs["tags"] else None
        )
    if repo_kwargs:
        repo.update(item_id, **repo_kwargs)
    # Junction write: replace the item's full domain membership atomically.
    if slugs is not None:
        from src.repositories.memory_domains import MemoryDomainsRepository

        dom_repo = MemoryDomainsRepository(conn)
        dom_repo.replace_domains_for_item(
            item_id, slugs, added_by=user["email"]
        )
    audit_keys = sorted(updates.keys())
    if domain_ids is not None:
        audit_keys.append("domain_ids")
    _audit_action(
        conn,
        user["email"],
        "update_item",
        item_id,
        {"updated_fields": audit_keys},
    )
    return {"id": item_id, "updated": audit_keys}
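The ``exclude_unset`` comment hinges on the difference between a field the client omitted and one it explicitly set to ``null``. The sketch below models that distinction with a plain dict and a hypothetical `FIELDS` tuple rather than Pydantic, so it illustrates the semantics of ``model_dump(exclude_unset=True)`` versus ``model_dump(exclude_none=True)`` without depending on the real ``PatchItemRequest``; it also mirrors the NOT NULL guard on ``title``:

```python
from typing import Any, Dict, Set, Tuple

# Hypothetical subset of patchable fields; every field defaults to None.
FIELDS = ("title", "audience", "domain", "tags")


def parse(body: Dict[str, Any]) -> Tuple[Dict[str, Any], Set[str]]:
    # Record field values (None when absent) plus which keys the client sent.
    values = {f: body.get(f) for f in FIELDS}
    sent = {f for f in FIELDS if f in body}
    return values, sent


def dump_exclude_none(values: Dict[str, Any]) -> Dict[str, Any]:
    # Drops explicit nulls too, so {"audience": null} can never clear a field.
    return {k: v for k, v in values.items() if v is not None}


def dump_exclude_unset(values: Dict[str, Any], sent: Set[str]) -> Dict[str, Any]:
    # Keeps explicit nulls; only fields the client never sent are dropped.
    return {k: v for k, v in values.items() if k in sent}


def check_title(updates: Dict[str, Any]) -> None:
    # Mirror of the boundary guard: title is NOT NULL, so reject null early.
    if "title" in updates and updates["title"] is None:
        raise ValueError("title cannot be null")


values, sent = parse({"audience": None, "tags": ["finance"]})
print(dump_exclude_none(values))         # {'tags': ['finance']}
print(dump_exclude_unset(values, sent))  # {'audience': None, 'tags': ['finance']}
```

Only the ``exclude_unset`` dump carries ``audience: None`` through to the update, which is what makes clearing possible; the same property is why ``title: None`` needs its own explicit rejection.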


@router.post("/admin/bulk-update")
async def admin_bulk_update(
    request: BulkUpdateRequest,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Apply ``updates`` to every id in ``item_ids``, writing one audit row per id.

    Returns ``updated`` and ``not_found`` id lists plus an ``errors`` map of
    id to error status. The response is 200 even on partial failure; the
    body distinguishes successes from misses.
    """
    repo = KnowledgeRepository(conn)
    updates = dict(request.updates or {})
    # Reject governance-sensitive fields BEFORE hitting the repo. _UPDATABLE_FIELDS
    # in the repo is broad on purpose; this endpoint is the narrow path. Callers
    # that need to flip status/sensitivity/is_personal must use the dedicated
    # governance endpoints so the right audit row is written. See PR #126 review.
    disallowed = sorted(k for k in updates.keys() if k not in _BULK_UPDATE_ALLOWED)
    if disallowed:
        raise HTTPException(
            status_code=400,
            detail=(
                f"updates contains disallowed field(s): {disallowed}. "
                f"Allowed: {sorted(_BULK_UPDATE_ALLOWED)}"
            ),
        )
    if "domain" in updates and updates["domain"]:
        _validate_domain_slug(updates["domain"], conn)
    # Mirror the PATCH boundary check: title is NOT NULL in the schema, so
    # an explicit null here would fall through to a per-item constraint error
    # in repo.bulk_update() instead of a clean 400 to the caller.
    if "title" in updates and updates["title"] is None:
        raise HTTPException(status_code=400, detail="title cannot be null")
    if not request.item_ids:
        return {"updated": [], "not_found": [], "errors": {}}
    statuses = repo.bulk_update(request.item_ids, updates)
    # Allowlist already enforced above, so every key in updates is auditable.
    audited_fields = sorted(updates.keys())
    updated: List[str] = []
    not_found: List[str] = []
    errors: dict = {}
    for item_id, status in statuses.items():
        if status == "updated":
            updated.append(item_id)
            _audit_action(
                conn,
                user["email"],
                "bulk_update",
                item_id,
                {"updated_fields": audited_fields, "batch": True},
            )
        elif status == "not_found":
            not_found.append(item_id)
        else:
            errors[item_id] = status
    return {"updated": updated, "not_found": not_found, "errors": errors}
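The bulk endpoint validates keys against an allowlist up front and then folds the repository's per-id statuses into three buckets. A standalone sketch of both steps; `ALLOWED` is a hypothetical placeholder, since the contents of `_BULK_UPDATE_ALLOWED` are not shown here:

```python
from typing import Dict, List, Tuple

# Hypothetical allowlist; the real _BULK_UPDATE_ALLOWED may differ.
ALLOWED = {"category", "domain", "tags", "audience", "title"}


def disallowed_fields(updates: Dict[str, object]) -> List[str]:
    # Sorted so the 400 message is deterministic.
    return sorted(k for k in updates if k not in ALLOWED)


def roll_up(statuses: Dict[str, str]) -> Tuple[List[str], List[str], Dict[str, str]]:
    # "updated" and "not_found" are known outcomes; anything else is an error
    # string kept per id, so one bad row doesn't hide the others.
    updated: List[str] = []
    not_found: List[str] = []
    errors: Dict[str, str] = {}
    for item_id, status in statuses.items():
        if status == "updated":
            updated.append(item_id)
        elif status == "not_found":
            not_found.append(item_id)
        else:
            errors[item_id] = status
    return updated, not_found, errors


print(disallowed_fields({"tags": ["kpi"], "status": "approved", "is_personal": False}))
# ['is_personal', 'status']
print(roll_up({"k1": "updated", "k2": "not_found", "k3": "constraint violation"}))
# (['k1'], ['k2'], {'k3': 'constraint violation'})
```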


# Axes the tree endpoint groups by. Anything else → 400. Order matters for
# the default chip rendering in the UI.
_TREE_AXES = ("domain", "category", "tag", "audience")


def _label_for_axis(axis: str, key: Optional[str]) -> str:
    """Pretty bucket label for the tree UI. Falls back to the raw key."""
    if key is None or key == "":
        return {
            "domain": "(no domain)",
            "category": "(no category)",
            "tag": "(no tag)",
            "audience": "All users",
        }.get(axis, "(unset)")
    if axis == "audience" and key == "all":
        return "All users"
    if axis == "audience" and key.startswith("group:"):
        return f"Group: {key[len('group:'):]}"
    return key


def _matches_chip_filters(
    item: dict,
    *,
    status_filter: Optional[str],
    source_type: Optional[str],
    audience: Optional[str],
    has_duplicate_ids: Optional[set],
    q: Optional[str],
) -> bool:
    """Apply chip filters to an already-RBAC-filtered item."""
    if status_filter and item.get("status") != status_filter:
        return False
    if source_type and item.get("source_type") != source_type:
        return False
    if audience:
        # Treat NULL audience as 'all' so chip-filter behavior matches the
        # SQL audience filter, ``count_by_audience`` (COALESCE→'all'), and the
        # tree's ``_bucket_key`` (NULL → 'all'). Without this coalesce,
        # NULL-audience items disappear from the ``audience=all`` chip even
        # though the rest of the system treats them as visible-to-everyone.
        item_audience = item.get("audience") or "all"
        if item_audience != audience:
            return False
    if has_duplicate_ids is not None and item.get("id") not in has_duplicate_ids:
        return False
    if q:
        needle = q.lower()
        title = (item.get("title") or "").lower()
        content = (item.get("content") or "").lower()
        if needle not in title and needle not in content:
            return False
    return True
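Two details of the chip filter are easy to get wrong: a NULL audience must count as ``all``, and the text search is a case-insensitive substring match over title and content. A trimmed-down sketch covering only those two chips; the `matches` helper and the sample items are hypothetical:

```python
from typing import Optional


def matches(item: dict, *, audience: Optional[str] = None, q: Optional[str] = None) -> bool:
    if audience:
        # NULL audience is treated as "all", matching the SQL COALESCE.
        if (item.get("audience") or "all") != audience:
            return False
    if q:
        # Case-insensitive substring match over title and content; NULLs become "".
        needle = q.lower()
        title = (item.get("title") or "").lower()
        content = (item.get("content") or "").lower()
        if needle not in title and needle not in content:
            return False
    return True


items = [
    {"id": "k1", "audience": None, "title": "Churn definition", "content": ""},
    {"id": "k2", "audience": "all", "title": "ARR", "content": "Annual recurring revenue"},
    {"id": "k3", "audience": "group:finance", "title": "Close calendar", "content": None},
]
print([i["id"] for i in items if matches(i, audience="all")])  # ['k1', 'k2']
print([i["id"] for i in items if matches(i, q="REVENUE")])     # ['k2']
```

Without the `or "all"` coalesce, `k1` would drop out of the `audience=all` result even though every other layer treats it as visible to everyone.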


@router.get("/tree")
async def get_tree(
    axis: str = "domain",
    status_filter: Optional[str] = None,
    source_type: Optional[str] = None,
    audience: Optional[str] = None,
    q: Optional[str] = None,
    has_duplicate: bool = False,
    exclude_personal: bool = True,
    is_required: Optional[bool] = None,
    page: int = 1,
    per_page: int = 50,
    user: dict = Depends(get_current_user),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Server-side grouping for the Browse / Group-by tree (issue #62).

    Returns ``{axis, groups: [{key, label, count, items: [...]}], page,
    per_page, total_groups, total_items}``, already RBAC-filtered and
    chip-filtered. The same ``_effective_groups`` and ``_can_view_item``
    helpers used by ``GET /api/memory`` apply, so a non-admin caller never
    sees personal items belonging to others, and audience-restricted items
    only surface for members of the audience group.

    On the ``tag`` axis a single item appears once per tag it holds; that is
    the intended "overlapping bucket" affordance. Every other axis puts each
    item in its single canonical bucket. Pagination is over groups, not items.
    """
    if axis not in _TREE_AXES:
        raise HTTPException(
            status_code=400,
            detail=f"axis must be one of: {list(_TREE_AXES)}",
        )
    repo = KnowledgeRepository(conn)
    is_priv = _is_privileged_viewer(user, conn)
    effective_groups = _effective_groups(user, conn)
    # Privacy parity with ``GET /api/memory``: non-admins can never opt out.
    effective_exclude_personal = True if not is_priv else exclude_personal
    page = max(page, 1)
    per_page = max(min(per_page, 500), 1)
    # has_duplicate=true narrows the candidate set to items present in any
    # unresolved likely_duplicate relation. Computed once; intersected per
    # item below.
    has_duplicate_ids: Optional[set] = None
    if has_duplicate:
        rels = repo.list_relations(
            relation_type=DUPLICATE_RELATION_TYPE, resolved=False, limit=10000,
        )
        has_duplicate_ids = {
            id_ for r in rels for id_ in (r["item_a_id"], r["item_b_id"])
        }
    # Audience-axis privacy (decision 13): non-admins only see their own
    # group buckets + null/all. Use the audience pre-filter on the SQL side
    # so non-admins never accidentally see another group's bucket count.
    granted_domains = _caller_granted_memory_domains(user, conn)
    statuses = [status_filter] if status_filter else None
    items = repo.list_items(
        statuses=statuses,
        source_type=source_type,
        is_required=is_required,
        exclude_personal=effective_exclude_personal,
        user_groups=effective_groups,
        granted_domains=granted_domains,
        limit=10000,
        offset=0,
    )
    # Apply the remaining chip filters that don't have a SQL layer yet.
    visible: List[dict] = []
    for item in items:
        if not _can_view_item(user, item, is_priv):
            continue
        if not _matches_chip_filters(
            item,
            status_filter=None,  # already in SQL
            source_type=None,  # already in SQL
            audience=audience,
            has_duplicate_ids=has_duplicate_ids,
            q=q,
        ):
            continue
        visible.append(item)
    # Group items by axis. tag is multi-bucket; everything else is single.
    groups: dict = {}

    def _bucket_key(item: dict, axis: str) -> List[str]:
        if axis == "tag":
            tags = item.get("tags")
            if isinstance(tags, str):
                try:
                    tags = json.loads(tags)
                except json.JSONDecodeError:
                    tags = []
            if not tags:
                return [""]
            return [str(t) for t in tags]
        if axis == "audience":
            aud = item.get("audience")
            return [aud if aud else "all"]
        if axis == "category":
            return [item.get("category") or ""]
        # default: domain
        return [item.get("domain") or ""]

    for item in visible:
        for key in _bucket_key(item, axis):
            bucket = groups.setdefault(key, {
                "key": key,
                "label": _label_for_axis(axis, key),
                "items": [],
            })
            bucket["items"].append(item)
    # Stable ordering: alphabetic on key, with the empty bucket sinking to
    # the bottom; the UI usually wants the "real" buckets up top.
    ordered = sorted(
        groups.values(),
        key=lambda g: (g["key"] == "", g["key"].lower() if isinstance(g["key"], str) else ""),
    )
    for g in ordered:
        g["count"] = len(g["items"])
    # Page over groups (not items): operators paging through hundreds of
    # tags want bucket-level pagination. The UI expands a bucket to see all
    # its items.
    start = (page - 1) * per_page
    paged = ordered[start:start + per_page]
    return {
        "axis": axis,
        "groups": paged,
        "page": page,
        "per_page": per_page,
        "total_groups": len(ordered),
        "total_items": sum(g["count"] for g in ordered),
    }
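The grouping step is independent of FastAPI and the repository, so it can be exercised on plain dicts. A sketch under that assumption; the `bucket_keys` and `group_items` helpers are hypothetical and omit labels and RBAC. Note that `total_items` counts bucket memberships, so an item with two tags is counted twice, as in the endpoint:

```python
import json
from typing import Dict, List


def bucket_keys(item: dict, axis: str) -> List[str]:
    # tag is multi-bucket; every other axis yields exactly one key.
    if axis == "tag":
        tags = item.get("tags")
        if isinstance(tags, str):  # tags may arrive as a JSON-encoded string
            try:
                tags = json.loads(tags)
            except json.JSONDecodeError:
                tags = []
        return [str(t) for t in tags] if tags else [""]
    if axis == "audience":
        return [item.get("audience") or "all"]
    return [item.get(axis) or ""]


def group_items(items: List[dict], axis: str, page: int = 1, per_page: int = 50) -> dict:
    groups: Dict[str, dict] = {}
    for item in items:
        for key in bucket_keys(item, axis):
            groups.setdefault(key, {"key": key, "items": []})["items"].append(item)
    # Real buckets alphabetically (case-insensitive), the empty bucket last.
    ordered = sorted(groups.values(), key=lambda g: (g["key"] == "", g["key"].lower()))
    for g in ordered:
        g["count"] = len(g["items"])
    start = (page - 1) * per_page  # paginate over groups, not items
    return {
        "groups": ordered[start:start + per_page],
        "total_groups": len(ordered),
        "total_items": sum(g["count"] for g in ordered),
    }


items = [
    {"id": "k1", "tags": '["finance", "kpi"]'},
    {"id": "k2", "tags": ["Ops"]},
    {"id": "k3", "tags": None},
]
tree = group_items(items, "tag")
print([g["key"] for g in tree["groups"]])  # ['finance', 'kpi', 'Ops', '']
print(tree["total_items"])                 # 4 (k1 is counted in two buckets)
```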


# ---- Bundle endpoint ----


def _build_per_domain_markdown(
    slug: str, user: dict, conn: duckdb.DuckDBPyConnection
) -> Response:
    """Render a deterministic markdown bundle for a single memory domain.

    Used by ``agnes pull`` to write ``~/.claude/memory/<slug>/bundle.md``.
    The bundle includes both ``is_required=TRUE`` and approved items so
    the per-domain md5 in ``/api/sync/manifest`` (built from the same
    item set in ``_build_memory_domains_section``) matches the md5 of
    what the CLI just received. Items are sorted by ``id`` to mirror the
    manifest's md5 computation byte-for-byte (Section 5.1 of the
    unified-stack design).

    RBAC: the caller must have a grant on the domain; admins bypass the
    check via ``can_access``'s admin short-circuit. Anonymous or grantless
    callers get 403.
    """
    repo = MemoryDomainsRepository(conn)
    dom = repo.get_by_slug(slug)
    if not dom:
        raise HTTPException(status_code=404, detail="memory_domain_not_found")
    if not can_access(user["id"], "memory_domain", dom["id"], conn):
        raise HTTPException(status_code=403, detail="no_grant")
    # Pull items the same way the manifest md5 helper does: id order, full
    # payload (title/status/is_required, with content fetched via the
    # knowledge repository), no token-budget truncation.
    items_meta = repo.list_items_of_domain(dom["id"], limit=10000)
    if not items_meta:
        body = f"# {dom['name']}\n\n_No items in this domain yet._\n"
        return Response(content=body, media_type="text/markdown; charset=utf-8")
    # Fetch full bodies: list_items_of_domain only returns id/title/status.
    knowledge_repo = KnowledgeRepository(conn)
    full_items: list = []
    for meta in sorted(items_meta, key=lambda r: r["id"]):
        full = knowledge_repo.get_by_id(meta["id"])
        if not full:
            continue
        full_items.append(full)
    lines: list = [f"# {dom['name']}", ""]
    if dom.get("description"):
        lines.append(dom["description"])
        lines.append("")
    required = [it for it in full_items if it.get("is_required")]
    approved = [
        it
        for it in full_items
        if not it.get("is_required") and it.get("status") == "approved"
    ]
    if required:
        lines.append("## Required")
        lines.append("")
        for it in required:
            lines.append(f"### {it.get('title', 'Untitled')}")
            lines.append("")
            lines.append(it.get("content", "") or "")
            lines.append("")
    if approved:
        lines.append("## Approved")
        lines.append("")
        for it in approved:
            lines.append(f"### {it.get('title', 'Untitled')}")
            lines.append("")
            lines.append(it.get("content", "") or "")
            lines.append("")
    body = "\n".join(lines).rstrip() + "\n"
    return Response(content=body, media_type="text/markdown; charset=utf-8")
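Because the renderer sorts by `id` before emitting anything, the markdown (and any hash of it) is independent of the order the database returns rows in. A sketch of that property, assuming the hash is a plain `hashlib.md5` over the UTF-8 body; the real manifest helper `_build_memory_domains_section` is not shown, so whether it hashes exactly this string is an assumption. The hypothetical `render_bundle` omits the domain description:

```python
import hashlib
from typing import List


def render_bundle(name: str, items: List[dict]) -> str:
    # Sort by id first so the output never depends on query order.
    items = sorted(items, key=lambda it: it["id"])
    required = [it for it in items if it.get("is_required")]
    approved = [
        it for it in items
        if not it.get("is_required") and it.get("status") == "approved"
    ]
    lines = [f"# {name}", ""]
    for heading, section in (("Required", required), ("Approved", approved)):
        if not section:
            continue
        lines += [f"## {heading}", ""]
        for it in section:
            lines += [f"### {it.get('title') or 'Untitled'}", "", it.get("content") or "", ""]
    # Exactly one trailing newline, as in the endpoint.
    return "\n".join(lines).rstrip() + "\n"


def bundle_md5(body: str) -> str:
    return hashlib.md5(body.encode("utf-8")).hexdigest()


items = [
    {"id": "k2", "title": "Fiscal year", "content": "Starts in February.",
     "status": "approved", "is_required": False},
    {"id": "k1", "title": "Revenue", "content": "Net of refunds.",
     "status": "approved", "is_required": True},
    {"id": "k3", "title": "Draft", "content": "Not reviewed.",
     "status": "pending", "is_required": False},
]
body = render_bundle("Finance", items)
same = render_bundle("Finance", list(reversed(items)))
print(bundle_md5(body) == bundle_md5(same))  # True
```

The pending `k3` item never reaches the output, since only required items and approved non-required items are rendered.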


@router.get("/bundle")
async def get_bundle(
    domain: Optional[str] = None,
    user: dict = Depends(get_current_user),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Token-budgeted bundle of knowledge items for AI agent injection.

    Mandatory (``is_required``) items are always included regardless of the
    token budget. Approved items are ranked by confidence × recency and
    included until the budget is exhausted. Results are audience-filtered by
    the caller's group memberships (admins see everything).

    v49: when ``?domain=<slug>`` is supplied, the response switches to
    ``text/markdown`` containing a deterministic per-domain bundle; that is
    what ``agnes pull`` writes to ``~/.claude/memory/<slug>/bundle.md``.
    RBAC: the caller must have a ``MEMORY_DOMAIN`` grant on the domain
    (admins bypass per ``can_access``). The markdown body sorts items by
    ``id`` and includes both required and approved items, under separate
    ``## Required`` and ``## Approved`` sections (required first), so the
    bundle md5 in the manifest matches what the CLI re-renders.
    """
    from datetime import datetime, timezone

    # ----- Per-domain markdown variant (v49) -----
    if domain:
        return _build_per_domain_markdown(domain, user, conn)
    repo = KnowledgeRepository(conn)
    effective_groups = _effective_groups(user, conn)
    granted_domains = _caller_granted_memory_domains(user, conn)
    # v46: the bundle is what AI agents inject as context, so the opt-out
    # has real effect here: it's always on for the calling user. Mandatory
    # items are exempted by the EXISTS subquery's status guard inside
    # ``list_items``; the user's dismissal row for a then-approved item is
    # silently ignored if/when the item is later mandated.
    dismissed_by_user_id = user["id"]
    # v49: the Required tier rides on the is_required boolean (was statuses=['mandatory']).
    mandatory = repo.list_items(
        is_required=True,
        exclude_personal=True,
        user_groups=effective_groups,
        granted_domains=granted_domains,
        dismissed_by_user=dismissed_by_user_id,
        hide_dismissed=True,
        limit=1000,
        offset=0,
    )
    approved = repo.list_items(
        statuses=["approved"],
        is_required=False,
        exclude_personal=True,
        user_groups=effective_groups,
        granted_domains=granted_domains,
        dismissed_by_user=dismissed_by_user_id,
        hide_dismissed=True,
        limit=1000,
        offset=0,
    )
    # Rank approved by confidence × recency (days since updated_at, max 365).
    # updated_at is intentional: a recently admin-edited item reflects a human
    # who just reviewed and corrected it, making it more trustworthy than an
    # older untouched item. This differs from confidence.py, which decays from
    # created_at; the two scores serve different purposes (credibility vs freshness).
    now = datetime.now(timezone.utc)

    def _rank(item: dict) -> float:
        confidence = float(item["confidence"]) if item.get("confidence") is not None else 0.5
        updated_raw = item.get("updated_at")
        if updated_raw:
            try:
                if isinstance(updated_raw, str):
                    updated = datetime.fromisoformat(updated_raw.replace("Z", "+00:00"))
                else:
                    updated = updated_raw
                if updated.tzinfo is None:
                    updated = updated.replace(tzinfo=timezone.utc)
                age_days = max((now - updated).days, 0)
            except Exception:
                age_days = 365
        else:
            age_days = 365
        recency = max(0.0, 1.0 - age_days / 365.0)
        return confidence * recency

    approved_ranked = sorted(approved, key=_rank, reverse=True)

    def _token_est(item: dict) -> int:
        # ``or ""`` guards against NULL title/content: dict.get's default only
        # applies when the key is missing, not when the stored value is None.
        text = (item.get("title") or "") + " " + (item.get("content") or "")
        return len(text) // _CHARS_PER_TOKEN

    budget_remaining = BUNDLE_TOKEN_BUDGET - sum(_token_est(i) for i in mandatory)
    approved_included = []
    for item in approved_ranked:
        cost = _token_est(item)
        if budget_remaining - cost < 0:
            break
        approved_included.append(item)
        budget_remaining -= cost
    return {
        "mandatory": mandatory,
        "approved": approved_included,
        "token_estimate": BUNDLE_TOKEN_BUDGET - budget_remaining,
        "token_budget": BUNDLE_TOKEN_BUDGET,
    }
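The budgeting logic reduces to three pure functions: a confidence × linear-recency score, a characters-per-token estimate, and a greedy fill that stops at the first ranked item that no longer fits. A standalone sketch of those steps; `CHARS_PER_TOKEN = 4` and `TOKEN_BUDGET = 50` are assumed values chosen for illustration, because the real `_CHARS_PER_TOKEN` and `BUNDLE_TOKEN_BUDGET` are not shown here, and `updated_at` is assumed to already be a timezone-aware `datetime`:

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

# Assumed values; the real _CHARS_PER_TOKEN and BUNDLE_TOKEN_BUDGET are not shown.
CHARS_PER_TOKEN = 4
TOKEN_BUDGET = 50


def rank(item: dict, now: datetime) -> float:
    # Missing confidence defaults to 0.5; missing updated_at counts as a year old.
    confidence = float(item["confidence"]) if item.get("confidence") is not None else 0.5
    updated: Optional[datetime] = item.get("updated_at")
    age_days = max((now - updated).days, 0) if updated else 365
    recency = max(0.0, 1.0 - age_days / 365.0)
    return confidence * recency


def token_est(item: dict) -> int:
    text = (item.get("title") or "") + " " + (item.get("content") or "")
    return len(text) // CHARS_PER_TOKEN


def fill_bundle(mandatory: List[dict], approved: List[dict], now: datetime) -> Tuple[List[dict], int]:
    # Mandatory items are charged against the budget but never dropped.
    remaining = TOKEN_BUDGET - sum(token_est(i) for i in mandatory)
    included: List[dict] = []
    for item in sorted(approved, key=lambda i: rank(i, now), reverse=True):
        cost = token_est(item)
        if remaining - cost < 0:
            break  # stop at the first item that doesn't fit; later ones are skipped
        included.append(item)
        remaining -= cost
    return included, TOKEN_BUDGET - remaining


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
mandatory = [{"title": "Rule", "content": "x" * 35}]  # 40 chars -> 10 tokens
approved = [
    {"title": "A", "content": "a" * 75, "confidence": 0.9, "updated_at": now - timedelta(days=10)},
    {"title": "B", "content": "b" * 37, "confidence": 0.8, "updated_at": now - timedelta(days=300)},
    {"title": "C", "content": "c", "confidence": None, "updated_at": None},
    {"title": "D", "content": "d" * 118, "confidence": 0.6, "updated_at": now - timedelta(days=30)},
]
included, used = fill_bundle(mandatory, approved, now)
print([i["title"] for i in included], used)  # ['A'] 29
```

Here `B` (9 tokens) would still fit in the 21 remaining tokens, but the loop stops at the higher-ranked `D` (30 tokens), mirroring the `break` in the endpoint. Using `continue` instead would pack the budget more tightly at the cost of letting lower-ranked items in ahead of skipped higher-ranked ones.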