agnes-the-ai-analyst/tests/test_memory_api.py
ZdenekSrotyr 5f6bb7a4b2
fix(security+ops) + release(0.12.1): #82 #85 #87 hardening + cut 0.12.1 (#104)
* fix(security+ops): #82 #85 #87 — auth hardening, API validation, deploy posture

Security and operational hardening across three issue groups:

- M23: docker-compose.override.yml → docker-compose.dev.yml (BREAKING, prod foot-gun)
- C13: Container runs as non-root user 'agnes' (USER directive in Dockerfile)
- M21: Docker resource limits (mem_limit, cpus) on app + scheduler
- M22: Caddyfile security headers (X-Frame-Options, X-Content-Type-Options, Referrer-Policy, -Server)
- M17: /api/health split into minimal (unauth) + /api/health/detailed (auth) (BREAKING)
- M26: release.yml restricts build-and-push to main + workflow_dispatch; paths-ignore for docs

- C2: table_id traversal validation on /api/data/{table_id}/download
- M4: Upload streaming (chunk-read + temp file) instead of full-buffer; /local-md hashed filename

- C5: reset_token removed from POST /api/users/{id}/reset-password response
- C8: Startup WARNING when no user has password_hash (bootstrap window visible)
- M9: Audit log on failed web form login (mirrors /auth/token endpoint)
- M10: Atomic magic-link consume via compare-and-swap (CONSUMED: marker + DuckDB conflict catch)

Also: SSRF protection on /api/admin/configure (#46), memory stats SQL aggregation (#90)

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* fix(review): SSRF 169.254.x.x + IPv6 multicast; M10 marker cleanup safety

Review fixes:
- Add 169.254.0.0/16 (link-local, cloud metadata) to SSRF regex — was
  missing, allowing requests to AWS/GCP/Azure metadata endpoints
- Add ff[0-9a-f]{2}: (IPv6 multicast) to SSRF regex
- M10: wrap Step 3 (CONSUMED marker cleanup) in try-except with
  warning log — prevents unhandled exception if DB write fails after
  successful token consumption
- Add test for 169.254.169.254 SSRF rejection

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* fix(review): SSRF IPv6 bypass, CLI health endpoint, upload FD leak

Address Devin Review findings on PR #104:

1. SSRF IPv6 bypass: Replace hostname regex with DNS resolution +
   ipaddress module checks. The old regex patterns like `fe80:` only
   matched up to the first colon, missing real IPv6 addresses like
   `fe80::1`, `fc00::1`, `ff02::1`. The new approach resolves the
   hostname via getaddrinfo and checks each resulting IP against
   ipaddress.is_private/is_loopback/is_link_local/is_reserved/is_multicast.

2. CLI commands broken: `da setup test-connection`, `da setup verify`,
   `da diagnose`, `da status` all called /api/health expecting the old
   format (status=="healthy", services dict). Now they call
   /api/health/detailed for service-level checks (with graceful fallback
   to the minimal endpoint when auth is not configured).

3. Temp file handle leak: _stream_to_temp returns an open
   NamedTemporaryFile; callers now close it before shutil.move() so the
   file descriptor is released immediately instead of lingering until GC.

Also adds IPv6 SSRF test cases (loopback, link-local, unique-local,
multicast) with mocked DNS resolution for test environment independence.
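
A minimal sketch of the resolve-then-check approach from item 1, not the
project's actual code. The names `is_forbidden_ip` and `host_is_safe` and the
injectable `resolver` parameter are assumptions made for illustration; only
`socket.getaddrinfo` and the `ipaddress` properties come from the description
above.

```python
import ipaddress
import socket


def is_forbidden_ip(ip_text: str) -> bool:
    """Return True if the address is in a range an SSRF guard should block."""
    # Drop an IPv6 zone suffix such as "%eth0" before parsing.
    ip = ipaddress.ip_address(ip_text.split("%", 1)[0])
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
    )


def host_is_safe(hostname: str, resolver=socket.getaddrinfo) -> bool:
    """Resolve the hostname and reject it if ANY resolved address is forbidden.

    `resolver` is injectable (hypothetical design) so tests can return canned
    DNS answers instead of touching the network.
    """
    try:
        infos = resolver(hostname, None)
    except socket.gaierror:
        return False  # this sketch treats unresolvable hosts as unsafe
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual IP for both IPv4 and IPv6.
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and not any(is_forbidden_ip(a) for a in addresses)
```

Checking parsed addresses rather than hostname text is what closes the
`fe80::1` style bypass: the decision no longer depends on how the address is
spelled.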

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* fix(review): download regex blocks hyphenated IDs; document health split

Address Devin Review round-3 findings on PR #104:

1. _SAFE_IDENTIFIER regex blocked hyphenated table IDs: The download
   endpoint used the strict SQL-identifier regex which does not allow
   dots or hyphens, but Keboola table IDs like in.c-crm.orders
   contain both. Switched to _SAFE_QUOTED_IDENTIFIER which allows dots
   and hyphens while still blocking path-traversal chars (/, .., \)
   and quote/control characters. Added test for hyphenated/dotted IDs.

2. Documented health endpoint split in DEPLOYMENT.md: Added Health
   checks & external monitoring section explaining both endpoints
   (minimal unauth /api/health vs authenticated /api/health/detailed)
   and how to wire external monitoring tools to the detailed endpoint
   with a PAT.
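
To illustrate item 1, here is a sketch of a permissive-but-safe identifier
check. The pattern below is hypothetical; the project's real
_SAFE_QUOTED_IDENTIFIER may be defined differently. It only demonstrates the
stated contract: dots and hyphens pass, while path-traversal, quote, and
control characters do not.

```python
import re

# Hypothetical pattern (an assumption, not the project's regex): letters,
# digits, underscore, dot, and hyphen only, starting with a word character.
SAFE_QUOTED_IDENTIFIER = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.\-]*")


def is_safe_table_id(table_id: str) -> bool:
    """Accept IDs like 'in.c-crm.orders'; reject traversal and quoting chars."""
    if ".." in table_id:
        return False  # a single dot is fine, a parent-directory hop is not
    # fullmatch ensures the whole string conforms, including any trailing newline.
    return SAFE_QUOTED_IDENTIFIER.fullmatch(table_id) is not None
```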

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* release(0.12.1): cut hotfix for snapshot integrity + #82/#85/#87 hardening

* fix(security): apply CAS pattern to password reset confirm (#82/M10 follow-up)

Devin review on the rebased PR flagged the asymmetry: magic-link verify
got the atomic compare-and-swap pattern in the original M10 fix, but
password reset confirm at /auth/password/reset/confirm was still using
read-validate-clear. Two concurrent POSTs with the same valid reset
token could both succeed in setting different new passwords (last-write-
wins). Lower severity than the magic-link race because the attacker
would need the reset token AND to race the legitimate user, but the
asymmetry was a polish gap.

Mirrors app/auth/providers/email.py::_consume_token CAS exactly: write
unique CONSUMED:<random> marker via UPDATE...WHERE token=old_token, then
SELECT to verify our marker won, then proceed. Only the winner clears
the marker and applies the password change.
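
The three steps can be sketched as follows. This is an illustration only: it
uses the standard-library sqlite3 module in place of DuckDB, and the `users`
table, `reset_token` column, and `consume_token` name are assumptions rather
than the project's schema.

```python
import secrets
import sqlite3


def consume_token(conn: sqlite3.Connection, token: str) -> bool:
    """Return True only for the single caller that wins the swap."""
    marker = "CONSUMED:" + secrets.token_hex(16)

    # Step 1: compare-and-swap. The UPDATE matches only while the original
    # token is still stored, so at most one caller's marker can land.
    conn.execute(
        "UPDATE users SET reset_token = ? WHERE reset_token = ?",
        (marker, token),
    )
    conn.commit()

    # Step 2: verify that OUR marker is the one in the row.
    row = conn.execute(
        "SELECT id FROM users WHERE reset_token = ?", (marker,)
    ).fetchone()
    if row is None:
        return False  # wrong token, or another request consumed it first

    # Step 3: only the winner clears the marker (and would then apply the
    # password change).
    conn.execute("UPDATE users SET reset_token = NULL WHERE id = ?", (row[0],))
    conn.commit()
    return True
```

Because the marker is random per request, a loser can never mistake another
request's marker for its own.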

New regression test_concurrent_reset_only_one_wins in
tests/test_password_flows.py::TestResetConfirm pins the contract: two
ThreadPoolExecutor workers + Barrier hit /reset/confirm with the same
token; exactly one gets 302 (password applied), the other gets 200 with
'Invalid or expired'. Sanity-checked against the pre-CAS code — both
POSTs got 302 (race confirmed).
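
The shape of such a race test, reduced to an in-process sketch. `TokenStore`
and `race` are hypothetical stand-ins for the HTTP endpoint and test body; the
real test posts to /reset/confirm through the test client.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TokenStore:
    """Hypothetical stand-in for the endpoint: a single-use token holder."""

    def __init__(self, token):
        self._token = token
        self._lock = threading.Lock()

    def consume(self, token):
        # Check and clear under one lock so only one caller can succeed.
        with self._lock:
            if self._token is not None and self._token == token:
                self._token = None
                return True
            return False


def race(store, token, workers=2):
    """Release all workers at the same instant and collect their results."""
    barrier = threading.Barrier(workers)

    def attempt():
        # Every worker blocks here until all have arrived, which maximizes
        # the chance that the consume calls overlap.
        barrier.wait(timeout=5)
        return store.consume(token)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(attempt) for _ in range(workers)]
        return [f.result() for f in futures]
```

A test would then assert that exactly one result is True, for example
`sorted(race(TokenStore("tok"), "tok")) == [False, True]`.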

---------

Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-04-28 19:57:30 +02:00


"""Tests for corporate memory API — knowledge items, voting, governance."""

import pytest

from src.repositories.knowledge import KnowledgeRepository


def _auth(token):
    return {"Authorization": f"Bearer {token}"}


class TestMemoryCreate:
    def test_create_knowledge_item(self, seeded_app):
        c = seeded_app["client"]
        token = seeded_app["admin_token"]
        resp = c.post(
            "/api/memory",
            json={"title": "Best Practice", "content": "Always document your code.", "category": "engineering"},
            headers=_auth(token),
        )
        assert resp.status_code == 201
        data = resp.json()
        assert "id" in data
        assert data["status"] == "pending"

    def test_create_with_tags(self, seeded_app):
        c = seeded_app["client"]
        token = seeded_app["analyst_token"]
        resp = c.post(
            "/api/memory",
            json={
                "title": "Tagged Item",
                "content": "Content here",
                "category": "process",
                "tags": ["tag1", "tag2"],
            },
            headers=_auth(token),
        )
        assert resp.status_code == 201
        assert "id" in resp.json()

    def test_create_missing_title_returns_422(self, seeded_app):
        c = seeded_app["client"]
        token = seeded_app["admin_token"]
        resp = c.post(
            "/api/memory",
            json={"content": "No title", "category": "engineering"},
            headers=_auth(token),
        )
        assert resp.status_code == 422

    def test_create_missing_content_returns_422(self, seeded_app):
        c = seeded_app["client"]
        token = seeded_app["admin_token"]
        resp = c.post(
            "/api/memory",
            json={"title": "No content", "category": "engineering"},
            headers=_auth(token),
        )
        assert resp.status_code == 422

    def test_create_requires_auth(self, seeded_app):
        c = seeded_app["client"]
        resp = c.post(
            "/api/memory",
            json={"title": "Test", "content": "Content", "category": "engineering"},
        )
        assert resp.status_code == 401


class TestMemoryList:
    def _create_item(self, c, token, title="Test Item", category="engineering"):
        resp = c.post(
            "/api/memory",
            json={"title": title, "content": f"Content for {title}", "category": category},
            headers=_auth(token),
        )
        assert resp.status_code == 201
        return resp.json()["id"]

    def test_list_empty(self, seeded_app):
        c = seeded_app["client"]
        token = seeded_app["admin_token"]
        resp = c.get("/api/memory", headers=_auth(token))
        assert resp.status_code == 200
        data = resp.json()
        assert "items" in data
        assert "count" in data

    def test_list_requires_auth(self, seeded_app):
        c = seeded_app["client"]
        resp = c.get("/api/memory")
        assert resp.status_code == 401

    def test_list_pagination(self, seeded_app):
        c = seeded_app["client"]
        token = seeded_app["admin_token"]
        # Create 3 items
        for i in range(3):
            self._create_item(c, token, title=f"Item {i}")
        # Page 1 with per_page=2
        resp = c.get("/api/memory?page=1&per_page=2", headers=_auth(token))
        assert resp.status_code == 200
        data = resp.json()
        assert data["per_page"] == 2
        assert data["page"] == 1
        assert len(data["items"]) <= 2

    def test_list_search(self, seeded_app):
        c = seeded_app["client"]
        token = seeded_app["admin_token"]
        self._create_item(c, token, title="Unique Keyword SearchTarget")
        self._create_item(c, token, title="Another Item")
        resp = c.get("/api/memory?search=SearchTarget", headers=_auth(token))
        assert resp.status_code == 200
        data = resp.json()
        assert data["count"] >= 1
        titles = [item["title"] for item in data["items"]]
        assert any("SearchTarget" in t for t in titles)


class TestMemoryStats:
    def test_get_stats_returns_counts(self, seeded_app):
        c = seeded_app["client"]
        token = seeded_app["admin_token"]
        resp = c.get("/api/memory/stats", headers=_auth(token))
        assert resp.status_code == 200
        data = resp.json()
        assert "total" in data
        assert isinstance(data["total"], int)
        assert data["total"] >= 0
        assert "by_status" in data
        assert isinstance(data["by_status"], dict)
        assert "categories" in data
        assert isinstance(data["categories"], list)

    def test_get_stats_requires_auth(self, seeded_app):
        c = seeded_app["client"]
        resp = c.get("/api/memory/stats")
        assert resp.status_code == 401

    def test_get_stats_does_not_load_all_items(self, seeded_app):
        """Stats endpoint uses SQL aggregation, not list_items()."""
        from unittest.mock import patch

        c = seeded_app["client"]
        token = seeded_app["admin_token"]
        with patch.object(KnowledgeRepository, "list_items", side_effect=AssertionError("list_items should not be called")):
            resp = c.get("/api/memory/stats", headers=_auth(token))
            assert resp.status_code == 200


class TestMemoryVote:
    def _create_item(self, c, token):
        resp = c.post(
            "/api/memory",
            json={"title": "Voteable", "content": "vote me", "category": "process"},
            headers=_auth(token),
        )
        return resp.json()["id"]

    def test_vote_upvote(self, seeded_app):
        c = seeded_app["client"]
        token = seeded_app["admin_token"]
        item_id = self._create_item(c, token)
        resp = c.post(f"/api/memory/{item_id}/vote", json={"vote": 1}, headers=_auth(token))
        assert resp.status_code == 200
        data = resp.json()
        assert data["upvotes"] >= 1

    def test_vote_downvote(self, seeded_app):
        c = seeded_app["client"]
        token = seeded_app["admin_token"]
        item_id = self._create_item(c, token)
        resp = c.post(f"/api/memory/{item_id}/vote", json={"vote": -1}, headers=_auth(token))
        assert resp.status_code == 200
        data = resp.json()
        assert data["downvotes"] >= 1

    def test_vote_invalid_value_returns_400(self, seeded_app):
        c = seeded_app["client"]
        token = seeded_app["admin_token"]
        item_id = self._create_item(c, token)
        resp = c.post(f"/api/memory/{item_id}/vote", json={"vote": 5}, headers=_auth(token))
        assert resp.status_code == 400

    def test_vote_nonexistent_item_returns_404(self, seeded_app):
        c = seeded_app["client"]
        token = seeded_app["admin_token"]
        resp = c.post("/api/memory/nonexistent-id/vote", json={"vote": 1}, headers=_auth(token))
        assert resp.status_code == 404

    def test_vote_requires_auth(self, seeded_app):
        c = seeded_app["client"]
        resp = c.post("/api/memory/some-id/vote", json={"vote": 1})
        assert resp.status_code == 401


class TestMemoryMyVotes:
    def test_get_my_votes_empty(self, seeded_app):
        c = seeded_app["client"]
        token = seeded_app["analyst_token"]
        resp = c.get("/api/memory/my-votes", headers=_auth(token))
        assert resp.status_code == 200
        assert isinstance(resp.json(), dict)

    def test_get_my_votes_after_voting(self, seeded_app):
        c = seeded_app["client"]
        token = seeded_app["admin_token"]
        # Create and vote
        item_resp = c.post(
            "/api/memory",
            json={"title": "My Vote Item", "content": "content", "category": "engineering"},
            headers=_auth(token),
        )
        item_id = item_resp.json()["id"]
        c.post(f"/api/memory/{item_id}/vote", json={"vote": 1}, headers=_auth(token))
        # Check my-votes
        resp = c.get("/api/memory/my-votes", headers=_auth(token))
        assert resp.status_code == 200
        votes = resp.json()
        assert item_id in votes
        assert votes[item_id] == 1

    def test_my_votes_requires_auth(self, seeded_app):
        c = seeded_app["client"]
        resp = c.get("/api/memory/my-votes")
        assert resp.status_code == 401


class TestMemoryAdminEndpoints:
    def _create_item(self, c, token):
        resp = c.post(
            "/api/memory",
            json={"title": "Admin Test", "content": "content", "category": "policy"},
            headers=_auth(token),
        )
        assert resp.status_code == 201
        return resp.json()["id"]

    def test_admin_approve(self, seeded_app):
        c = seeded_app["client"]
        admin_token = seeded_app["admin_token"]
        item_id = self._create_item(c, admin_token)
        resp = c.post(f"/api/memory/admin/approve?item_id={item_id}", headers=_auth(admin_token))
        assert resp.status_code == 200
        assert resp.json()["status"] == "approved"

    def test_admin_reject(self, seeded_app):
        c = seeded_app["client"]
        admin_token = seeded_app["admin_token"]
        item_id = self._create_item(c, admin_token)
        resp = c.post(
            f"/api/memory/admin/reject?item_id={item_id}",
            json={"reason": "not relevant"},
            headers=_auth(admin_token),
        )
        assert resp.status_code == 200
        assert resp.json()["status"] == "rejected"

    def test_admin_mandate(self, seeded_app):
        c = seeded_app["client"]
        admin_token = seeded_app["admin_token"]
        item_id = self._create_item(c, admin_token)
        resp = c.post(
            f"/api/memory/admin/mandate?item_id={item_id}",
            json={"reason": "company policy", "audience": "all"},
            headers=_auth(admin_token),
        )
        assert resp.status_code == 200
        assert resp.json()["status"] == "mandatory"

    def test_admin_approve_analyst_gets_403(self, seeded_app):
        """Analyst cannot use admin governance endpoints."""
        c = seeded_app["client"]
        admin_token = seeded_app["admin_token"]
        analyst_token = seeded_app["analyst_token"]
        item_id = self._create_item(c, admin_token)
        resp = c.post(f"/api/memory/admin/approve?item_id={item_id}", headers=_auth(analyst_token))
        assert resp.status_code == 403

    def test_admin_reject_analyst_gets_403(self, seeded_app):
        c = seeded_app["client"]
        admin_token = seeded_app["admin_token"]
        analyst_token = seeded_app["analyst_token"]
        item_id = self._create_item(c, admin_token)
        resp = c.post(
            f"/api/memory/admin/reject?item_id={item_id}",
            json={"reason": "nope"},
            headers=_auth(analyst_token),
        )
        assert resp.status_code == 403

    def test_admin_mandate_analyst_gets_403(self, seeded_app):
        c = seeded_app["client"]
        admin_token = seeded_app["admin_token"]
        analyst_token = seeded_app["analyst_token"]
        item_id = self._create_item(c, admin_token)
        resp = c.post(
            f"/api/memory/admin/mandate?item_id={item_id}",
            json={"reason": "policy"},
            headers=_auth(analyst_token),
        )
        assert resp.status_code == 403

    def test_admin_approve_requires_auth(self, seeded_app):
        c = seeded_app["client"]
        resp = c.post("/api/memory/admin/approve?item_id=some-id")
        assert resp.status_code == 401