agnes-the-ai-analyst/app/api/memory.py
ZdenekSrotyr 5f6bb7a4b2
fix(security+ops) + release(0.12.1): #82 #85 #87 hardening + cut 0.12.1 (#104)
* fix(security+ops): #82 #85 #87 — auth hardening, API validation, deploy posture

Security and operational hardening across three issue groups:

- M23: docker-compose.override.yml → docker-compose.dev.yml (BREAKING, prod foot-gun)
- C13: Container runs as non-root user 'agnes' (USER directive in Dockerfile)
- M21: Docker resource limits (mem_limit, cpus) on app + scheduler
- M22: Caddyfile security headers (X-Frame-Options, X-Content-Type-Options, Referrer-Policy, -Server)
- M17: /api/health split into minimal (unauth) + /api/health/detailed (auth) (BREAKING)
- M26: release.yml restricts build-and-push to main + workflow_dispatch; paths-ignore for docs

- C2: table_id traversal validation on /api/data/{table_id}/download
- M4: Upload streaming (chunk-read + temp file) instead of full-buffer; /local-md hashed filename

- C5: reset_token removed from POST /api/users/{id}/reset-password response
- C8: Startup WARNING when no user has password_hash (bootstrap window visible)
- M9: Audit log on failed web form login (mirrors /auth/token endpoint)
- M10: Atomic magic-link consume via compare-and-swap (CONSUMED: marker + DuckDB conflict catch)

Also: SSRF protection on /api/admin/configure (#46), memory stats SQL aggregation (#90)

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* fix(review): SSRF 169.254.x.x + IPv6 multicast; M10 marker cleanup safety

Review fixes:
- Add 169.254.0.0/16 (link-local, cloud metadata) to SSRF regex — was
  missing, allowing requests to AWS/GCP/Azure metadata endpoints
- Add ff[0-9a-f]{2}: (IPv6 multicast) to SSRF regex
- M10: wrap Step 3 (CONSUMED marker cleanup) in try-except with
  warning log — prevents unhandled exception if DB write fails after
  successful token consumption
- Add test for 169.254.169.254 SSRF rejection

* fix(review): SSRF IPv6 bypass, CLI health endpoint, upload FD leak

Address Devin Review findings on PR #104:

1. SSRF IPv6 bypass: Replace hostname regex with DNS resolution +
   ipaddress module checks. The old regex patterns like `fe80:` only
   matched up to the first colon, missing real IPv6 addresses like
   `fe80::1`, `fc00::1`, `ff02::1`. The new approach resolves the
   hostname via getaddrinfo and checks each resulting IP against
   ipaddress.is_private/is_loopback/is_link_local/is_reserved/is_multicast.

2. CLI commands broken: `da setup test-connection`, `da setup verify`,
   `da diagnose`, `da status` all called /api/health expecting the old
   format (status=="healthy", services dict). Now they call
   /api/health/detailed for service-level checks (with graceful fallback
   to the minimal endpoint when auth is not configured).

3. Temp file handle leak: _stream_to_temp returns an open
   NamedTemporaryFile; callers now close it before shutil.move() to
   prevent FD leaks until GC.
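
The temp-file handling in item 3 can be sketched roughly as follows. This is a minimal illustration, not the project's `_stream_to_temp`: the names `stream_to_temp` and `save_upload`, the file-like `source` argument, and the chunk size are all assumptions.

```python
import os
import shutil
import tempfile

CHUNK_SIZE = 64 * 1024  # assumed chunk size, not taken from the project


def stream_to_temp(source, chunk_size=CHUNK_SIZE):
    """Copy a file-like object to a temp file chunk by chunk; return the open handle."""
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            tmp.write(chunk)
        tmp.flush()
    except Exception:
        # Do not leave a half-written temp file or an open descriptor behind.
        tmp.close()
        os.unlink(tmp.name)
        raise
    return tmp


def save_upload(source, destination):
    """Stream to a temp file, close the handle, then move the file into place."""
    tmp = stream_to_temp(source)
    tmp.close()  # release the descriptor before moving, instead of waiting for GC
    shutil.move(tmp.name, destination)
    return destination
```

Closing the handle explicitly before `shutil.move()` keeps the descriptor count bounded under load and avoids moving a file that is still open.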

Also adds IPv6 SSRF test cases (loopback, link-local, unique-local,
multicast) with mocked DNS resolution for test environment independence.
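
A minimal sketch of the resolve-then-check approach from item 1, assuming the check runs on a URL string. The function names and the injectable `resolver` parameter are hypothetical and exist only so the example can be exercised without real DNS, in the spirit of the mocked tests mentioned above.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_blocked_ip(ip_text):
    """Return True if the address is in a range that should not be reachable."""
    # Drop an IPv6 zone suffix such as "%eth0" before parsing.
    ip = ipaddress.ip_address(ip_text.split("%")[0])
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
    )


def is_url_allowed(url, resolver=socket.getaddrinfo):
    """Resolve the URL's hostname and reject it if any resolved address is blocked."""
    host = urlparse(url).hostname
    if not host:
        return False
    try:
        infos = resolver(host, None)
    except socket.gaierror:
        return False
    # Each entry ends with a sockaddr tuple whose first item is the IP string.
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and not any(is_blocked_ip(a) for a in addresses)
```

Checking every resolved address, rather than pattern-matching the hostname, is what closes the `fe80::1` style bypass. A check like this validates only the addresses seen at resolution time, so a later lookup by the HTTP client could still return something different.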

* fix(review): download regex blocks hyphenated IDs; document health split

Address Devin Review round-3 findings on PR #104:

1. _SAFE_IDENTIFIER regex blocked hyphenated table IDs: The download
   endpoint used the strict SQL-identifier regex which does not allow
   dots or hyphens, but Keboola table IDs like in.c-crm.orders
   contain both. Switched to _SAFE_QUOTED_IDENTIFIER which allows dots
   and hyphens while still blocking path-traversal chars (/, .., \)
   and quote/control characters. Added test for hyphenated/dotted IDs.

2. Documented health endpoint split in DEPLOYMENT.md: Added Health
   checks & external monitoring section explaining both endpoints
   (minimal unauth /api/health vs authenticated /api/health/detailed)
   and how to wire external monitoring tools to the detailed endpoint
   with a PAT.
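
The validation rule from item 1 could look something like the sketch below. The pattern and the `is_safe_table_id` helper are hypothetical and are not the project's `_SAFE_QUOTED_IDENTIFIER`; they only illustrate accepting dots and hyphens while rejecting traversal, quote, and control characters.

```python
import re

# Hypothetical pattern: letters, digits, underscore, dot, and hyphen only.
_TABLE_ID_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.\-]*")


def is_safe_table_id(table_id):
    """Accept IDs such as in.c-crm.orders; reject traversal, quotes, control chars."""
    if ".." in table_id:
        return False
    # fullmatch rejects anything outside the allowed set, including "/" and "\".
    return _TABLE_ID_RE.fullmatch(table_id) is not None
```

Using an allow-list with `fullmatch` means slashes, backslashes, quotes, and newlines fail without being listed individually; the explicit `..` check covers the one traversal sequence that the allowed characters could otherwise form.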

* release(0.12.1): cut hotfix for snapshot integrity + #82/#85/#87 hardening

* fix(security): apply CAS pattern to password reset confirm (#82/M10 follow-up)

Devin review on the rebased PR flagged the asymmetry: magic-link verify
got the atomic compare-and-swap pattern in the original M10 fix, but
password reset confirm at /auth/password/reset/confirm was still using
read-validate-clear. Two concurrent POSTs with the same valid reset
token could both succeed in setting different new passwords (last-write-
wins). Lower severity than the magic-link race because the attacker
would need the reset token AND to race the legitimate user, but the
asymmetry was a polish gap.

Mirrors app/auth/providers/email.py::_consume_token CAS exactly: write
unique CONSUMED:<random> marker via UPDATE...WHERE token=old_token, then
SELECT to verify our marker won, then proceed. Only the winner clears
the marker and applies the password change.
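
The three steps described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `app/auth/providers/email.py`: it uses the standard-library `sqlite3` module as a stand-in for DuckDB, and the `users` table, `reset_token` column, and `consume_token` name are hypothetical.

```python
import secrets
import sqlite3


def consume_token(conn, token):
    """Return the user id if this caller won the token, else None."""
    if not token:
        return None
    marker = "CONSUMED:" + secrets.token_hex(16)
    # Step 1: swap the token for a unique marker; matches only if still unconsumed.
    conn.execute(
        "UPDATE users SET reset_token = ? WHERE reset_token = ?", (marker, token)
    )
    conn.commit()
    # Step 2: verify that our marker, not a competitor's, is the one stored.
    row = conn.execute(
        "SELECT id FROM users WHERE reset_token = ?", (marker,)
    ).fetchone()
    if row is None:
        return None
    # Step 3: only the winner clears the marker and may proceed.
    conn.execute("UPDATE users SET reset_token = NULL WHERE id = ?", (row[0],))
    conn.commit()
    return row[0]
```

Because the `UPDATE ... WHERE reset_token = old_token` can match at most once, a second caller presenting the same token finds no row carrying its own marker and is rejected.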

New regression test_concurrent_reset_only_one_wins in
tests/test_password_flows.py::TestResetConfirm pins the contract: two
ThreadPoolExecutor workers + Barrier hit /reset/confirm with the same
token; exactly one gets 302 (password applied), the other gets 200 with
'Invalid or expired'. Sanity-checked against the pre-CAS code — both
POSTs got 302 (race confirmed).
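
The shape of such a race test, reduced to a toy, might look like this. `TokenStore` is a hypothetical in-memory stand-in for the endpoint, and `race` is an assumed helper name; the real test posts to `/reset/confirm` instead.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TokenStore:
    """Toy in-memory stand-in for the reset endpoint (hypothetical)."""

    def __init__(self, token):
        self._token = token
        self._lock = threading.Lock()

    def consume(self, token):
        # Atomic check-and-clear: only the first caller sees the token.
        with self._lock:
            if self._token is not None and self._token == token:
                self._token = None
                return True
            return False


def race(store, token, workers=2):
    """Release all workers at once and collect each worker's result."""
    barrier = threading.Barrier(workers)

    def attempt():
        barrier.wait(timeout=5)  # line up every worker before the contested call
        return store.consume(token)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(attempt) for _ in range(workers)]
        return [f.result() for f in futures]
```

The barrier makes the two calls start as close together as possible, so a non-atomic implementation has a realistic chance of letting both through; the assertion is then simply that exactly one result is a success.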

---------

Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-04-28 19:57:30 +02:00


"""Corporate memory endpoints — knowledge items, voting, governance admin."""
import uuid
from typing import Optional, List

from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
import duckdb

from app.auth.access import require_admin
from app.auth.dependencies import get_current_user, _get_db
from src.repositories.knowledge import KnowledgeRepository
from src.repositories.audit import AuditRepository

router = APIRouter(prefix="/api/memory", tags=["memory"])

VALID_STATUSES = ["pending", "approved", "mandatory", "rejected", "revoked", "expired"]


class CreateKnowledgeRequest(BaseModel):
    title: str
    content: str
    category: str
    tags: Optional[List[str]] = None


class VoteRequest(BaseModel):
    vote: int


class AdminActionRequest(BaseModel):
    reason: Optional[str] = None
    audience: Optional[str] = None


class EditRequest(BaseModel):
    title: Optional[str] = None
    content: Optional[str] = None


class BatchActionRequest(BaseModel):
    item_ids: List[str]
    action: str  # approve, reject, mandate, revoke
    reason: Optional[str] = None
    audience: Optional[str] = None


# ---- User endpoints ----

@router.get("")
async def list_knowledge(
    status_filter: Optional[str] = None,
    category: Optional[str] = None,
    search: Optional[str] = None,
    page: int = 1,
    per_page: int = 50,
    sort: str = "updated_at",
    user: dict = Depends(get_current_user),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """List knowledge items with filtering, pagination, search."""
    repo = KnowledgeRepository(conn)
    offset = (page - 1) * per_page

    if search:
        items = repo.search(search)
    else:
        statuses = [status_filter] if status_filter else None
        items = repo.list_items(statuses=statuses, category=category, limit=per_page, offset=offset)

    # Enrich with votes
    for item in items:
        votes = repo.get_votes(item["id"])
        item["upvotes"] = votes["upvotes"]
        item["downvotes"] = votes["downvotes"]
        item["score"] = votes["upvotes"] - votes["downvotes"]

    return {"items": items, "count": len(items), "page": page, "per_page": per_page}


@router.get("/stats")
async def get_stats(
    user: dict = Depends(get_current_user),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Get corporate memory statistics."""
    rows = conn.execute(
        "SELECT status, category, COUNT(*) as n FROM knowledge_items GROUP BY status, category"
    ).fetchall()
    status_counts: dict[str, int] = {}
    categories: set[str] = set()
    total = 0
    for status, category, n in rows:
        status_counts[status] = status_counts.get(status, 0) + n
        if category:
            categories.add(category)
        total += n
    return {"total": total, "by_status": status_counts, "categories": sorted(categories)}


@router.post("", status_code=201)
async def create_knowledge(
    request: CreateKnowledgeRequest,
    user: dict = Depends(get_current_user),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    repo = KnowledgeRepository(conn)
    item_id = str(uuid.uuid4())
    repo.create(
        id=item_id,
        title=request.title,
        content=request.content,
        category=request.category,
        source_user=user.get("email"),
        tags=request.tags,
    )
    return {"id": item_id, "status": "pending"}


@router.post("/{item_id}/vote")
async def vote_knowledge(
    item_id: str,
    request: VoteRequest,
    user: dict = Depends(get_current_user),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    if request.vote not in (1, -1):
        raise HTTPException(status_code=400, detail="Vote must be 1 or -1")
    repo = KnowledgeRepository(conn)
    if not repo.get_by_id(item_id):
        raise HTTPException(status_code=404, detail="Knowledge item not found")
    repo.vote(item_id, user["id"], request.vote)
    return repo.get_votes(item_id)


@router.get("/my-votes")
async def get_my_votes(
    user: dict = Depends(get_current_user),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Get current user's votes on all items."""
    results = conn.execute(
        "SELECT item_id, vote FROM knowledge_votes WHERE user_id = ?", [user["id"]]
    ).fetchall()
    return {row[0]: row[1] for row in results}


# ---- Admin governance endpoints ----

def _get_item_or_404(repo: KnowledgeRepository, item_id: str) -> dict:
    """Return the knowledge item, or raise a 404 if it does not exist."""
    item = repo.get_by_id(item_id)
    if not item:
        raise HTTPException(status_code=404, detail="Knowledge item not found")
    return item


def _audit_action(conn, admin_email: str, action: str, item_id: str, details: Optional[dict] = None):
    """Record a governance action in the audit log under a km_-prefixed action name."""
    audit = AuditRepository(conn)
    audit.log(user_id=admin_email, action=f"km_{action}", resource=item_id, params=details)


@router.post("/admin/approve")
async def admin_approve(
    item_id: str,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    repo = KnowledgeRepository(conn)
    _get_item_or_404(repo, item_id)
    repo.update_status(item_id, "approved")
    _audit_action(conn, user["email"], "approve", item_id)
    return {"id": item_id, "status": "approved"}


@router.post("/admin/reject")
async def admin_reject(
    item_id: str,
    request: AdminActionRequest,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    repo = KnowledgeRepository(conn)
    _get_item_or_404(repo, item_id)
    repo.update_status(item_id, "rejected")
    _audit_action(conn, user["email"], "reject", item_id, {"reason": request.reason})
    return {"id": item_id, "status": "rejected"}


@router.post("/admin/mandate")
async def admin_mandate(
    item_id: str,
    request: AdminActionRequest,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    repo = KnowledgeRepository(conn)
    _get_item_or_404(repo, item_id)
    repo.update_status(item_id, "mandatory")
    _audit_action(conn, user["email"], "mandate", item_id, {
        "reason": request.reason, "audience": request.audience,
    })
    return {"id": item_id, "status": "mandatory"}


@router.post("/admin/revoke")
async def admin_revoke(
    item_id: str,
    request: AdminActionRequest,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    repo = KnowledgeRepository(conn)
    _get_item_or_404(repo, item_id)
    repo.update_status(item_id, "revoked")
    _audit_action(conn, user["email"], "revoke", item_id, {"reason": request.reason})
    return {"id": item_id, "status": "revoked"}


@router.post("/admin/edit")
async def admin_edit(
    item_id: str,
    request: EditRequest,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    repo = KnowledgeRepository(conn)
    _get_item_or_404(repo, item_id)
    updates = {}
    if request.title is not None:
        updates["title"] = request.title
    if request.content is not None:
        updates["content"] = request.content
    if updates:
        repo.update(item_id, **updates)
        _audit_action(conn, user["email"], "edit", item_id, updates)
    return {"id": item_id, "updated": list(updates.keys())}


@router.post("/admin/batch")
async def admin_batch(
    request: BatchActionRequest,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Batch governance action on multiple items."""
    repo = KnowledgeRepository(conn)
    action_map = {
        "approve": "approved",
        "reject": "rejected",
        "mandate": "mandatory",
        "revoke": "revoked",
    }
    if request.action not in action_map:
        raise HTTPException(status_code=400, detail=f"Invalid action: {request.action}")

    new_status = action_map[request.action]
    results = {"success": [], "not_found": []}
    for item_id in request.item_ids:
        item = repo.get_by_id(item_id)
        if not item:
            results["not_found"].append(item_id)
            continue
        repo.update_status(item_id, new_status)
        _audit_action(conn, user["email"], request.action, item_id, {
            "reason": request.reason, "audience": request.audience, "batch": True,
        })
        results["success"].append(item_id)
    return results


@router.get("/admin/pending")
async def admin_pending(
    category: Optional[str] = None,
    page: int = 1,
    per_page: int = 50,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Get pending items queue for admin review."""
    repo = KnowledgeRepository(conn)
    offset = (page - 1) * per_page
    items = repo.list_items(statuses=["pending"], category=category, limit=per_page, offset=offset)
    return {"items": items, "count": len(items)}


@router.get("/admin/audit")
async def admin_audit(
    page: int = 1,
    per_page: int = 50,
    action: Optional[str] = None,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Get governance audit log."""
    # NOTE: `page` is accepted but not applied; only the first `per_page` rows are returned.
    if action:
        # Filter on a single km_-prefixed action.
        audit = AuditRepository(conn)
        entries = audit.query(action=f"km_{action}", limit=per_page)
    else:
        # No filter given: fetch all km_ actions directly.
        # `_` is a single-character wildcard in LIKE, so 'km_%' matches "km" plus any character.
        rows = conn.execute(
            "SELECT * FROM audit_log WHERE action LIKE 'km_%' ORDER BY timestamp DESC LIMIT ?",
            [per_page],
        ).fetchall()
        if rows:
            columns = [desc[0] for desc in conn.description]
            entries = [dict(zip(columns, row)) for row in rows]
        else:
            entries = []
    return {"entries": entries, "count": len(entries)}