agnes-the-ai-analyst/tests/test_auth_scheduler_token.py
ZdenekSrotyr 995e4cd366
fix(scheduler): HTTP marketplaces job + SCHEDULER_API_TOKEN shared secret (#127)
* fix(scheduler): HTTP marketplaces job + SCHEDULER_API_TOKEN shared secret

Two scheduler-reliability bugs surfaced after the v0.12.1 USER-agnes flip:

1. The marketplaces job called src.marketplace.sync_marketplaces() in-process
   from the scheduler container, racing the app's long-lived system.duckdb
   handle. DuckDB rejects cross-process writers — every cron tick 500-ed on
   "Could not set lock on file ... PID 0".

2. The data-refresh + new marketplaces jobs both 401-ed on the API because
   SCHEDULER_API_TOKEN was never propagated by the Terraform startup script.
   The scheduler had no credential to authenticate with.

Fix:
- New POST /api/marketplaces/sync-all (admin-only) drives the nightly refresh
  through the app process so it inherits the existing DB connection.
- Scheduler swaps fn->http for marketplaces; all jobs are now plain HTTP and
  the scheduler is reduced to a cron clock.
- New app/auth/scheduler_token.py adds a shared-secret auth path. The
  startup script generates a 256-bit secret on first boot, persists it
  across reboots, and writes it to /opt/agnes/.env. Both containers source
  the same .env. The app validates incoming Bearer tokens against the env
  var (constant-time, length-floored) and resolves matches to a synthetic
  scheduler@system.local user that's a member of the Admin system group.
  Audit-log entries from the scheduler are attributed to this user.
- app/main.py seeds the synthetic user at startup so the first cron tick
  has a valid actor; lazy seed in get_scheduler_user covers token rotation
  before the next app restart.

Tests: 5 new in tests/test_auth_scheduler_token.py covering empty/short
secret rejection, exact-match comparison, idempotent user seeding, and
lazy provisioning. 142 marketplace + scheduler tests + 96 auth tests
remain green.

Existing VMs with .env from before this change need a one-time
re-provisioning (re-run startup-script or rotate via openssl rand);
documented in CHANGELOG.

* fix(audit): use '_all' sentinel for bulk marketplace sync — Devin review #127

Avoids the literal string 'marketplace:None' in the audit_log resource
column when the bulk sync endpoint writes its summary row.

* fix(scheduler): unblock event loop + per-job timeouts — Devin review #127

Two findings from Devin re-review on commit 5fbad15:

1. BUG: trigger_sync_all was async def, so FastAPI ran it on the asyncio
   event loop. sync_marketplaces() does blocking I/O (subprocess git
   clones up to GIT_TIMEOUT_SEC=300 each, threading.Lock, DuckDB writes)
   and would freeze every concurrent request for the duration of a bulk
   sync. Switched to plain def so FastAPI auto-routes to the thread pool.

2. ANALYSIS: scheduler used a fixed 120s httpx timeout for every POST.
   Bulk marketplace sync iterates the registry under a single lock with
   up to 300s per repo — easily exceeds 120s on 2-3 slow repos. The
   scheduler then sees a timeout, doesn't update last_run, and re-fires
   on the next 30s tick, queueing redundant work. Per-job timeout
   override added to the JOBS tuple; marketplaces gets 900s (15 min),
   data-refresh keeps 120s, health-check 30s.

* fix(auth): require_session_token rejects scheduler shared secret — Devin review #127

require_session_token gates /auth/tokens (PAT minting). Pre-fix it only
rejected JWTs with typ=pat — but the scheduler shared secret is an opaque
string, so verify_token() returns None, payload becomes {}, and the
PAT-claim check silently passed. A caller bearing SCHEDULER_API_TOKEN
could mint persistent PATs that survive a secret rotation.

Added explicit is_scheduler_token() check before the PAT-claim check;
new regression test in tests/test_auth_scheduler_token.py.

Devin's other note (pre-existing async def trigger_sync at marketplaces.py:392
also calls blocking sync_one) — Devin flagged it as out-of-scope for this PR
and I agree; tracking separately.

* release(0.17.0): cut + clean up CHANGELOG duplicates

Cuts 0.17.0 (minor: scheduler shared-secret auth + sync-all endpoint
plus the deploy-shape fixes that landed since the last release tag).

Bumps pyproject from 0.15.0 — also corrects the missed bump from PR #120
(v0.16.0 was tagged on GitHub and shipped as :stable, but pyproject
stayed at 0.15.0, so /api/version, /cli/latest, and `da --version` had
been under-reporting the running release).

Removes the long-form duplicate entries for 0.13.0 / 0.14.0 / 0.15.0
above [0.16.0] — the canonical short summaries (with GitHub-release
links) already exist below 0.16.0, the long forms were leftover state
from before those versions were cut and have been silently shadowed
ever since.
2026-04-29 11:44:00 +02:00

153 lines
5.7 KiB
Python

"""Tests for the SCHEDULER_API_TOKEN shared-secret auth path."""
import tempfile
import pytest
@pytest.fixture
def fresh_db(monkeypatch):
"""Isolated DuckDB + JWT secret per test, mirroring tests/test_pat.py."""
with tempfile.TemporaryDirectory() as tmp:
monkeypatch.setenv("DATA_DIR", tmp)
monkeypatch.setenv("TESTING", "1")
monkeypatch.setenv("JWT_SECRET_KEY", "test-jwt-secret-key-minimum-32-chars!!")
# Clean slate — clear any inherited token from the host shell.
monkeypatch.delenv("SCHEDULER_API_TOKEN", raising=False)
# Force pristine state — earlier tests in the same session may have
# opened the singleton; drop it so the new DATA_DIR takes effect.
from src.db import close_system_db
close_system_db()
yield tmp
close_system_db()
def test_is_scheduler_token_disabled_when_env_unset(fresh_db, monkeypatch):
"""Empty SCHEDULER_API_TOKEN must disable the auth path entirely.
A bug here would let any caller authenticate with empty Bearer "" — the
constant-time compare would also be empty — granting admin to anyone.
"""
from app.auth.scheduler_token import is_scheduler_token
monkeypatch.delenv("SCHEDULER_API_TOKEN", raising=False)
assert is_scheduler_token("") is False
assert is_scheduler_token("anything") is False
def test_is_scheduler_token_disabled_when_env_too_short(fresh_db, monkeypatch):
"""Operator typo (SCHEDULER_API_TOKEN=todo) must NOT grant admin.
The minimum length floor exists specifically to prevent a 4-char bearer
from accidentally matching a 4-char misconfigured secret.
"""
from app.auth.scheduler_token import is_scheduler_token
monkeypatch.setenv("SCHEDULER_API_TOKEN", "too-short")
assert is_scheduler_token("too-short") is False
def test_is_scheduler_token_matches_only_exact_value(fresh_db, monkeypatch):
from app.auth.scheduler_token import is_scheduler_token
secret = "x" * 64 # > min length
monkeypatch.setenv("SCHEDULER_API_TOKEN", secret)
assert is_scheduler_token(secret) is True
assert is_scheduler_token(secret + "trailing") is False
assert is_scheduler_token(secret[:-1]) is False
assert is_scheduler_token("y" * 64) is False
def test_ensure_scheduler_user_seeds_user_and_admin_membership(fresh_db, monkeypatch):
"""First call seeds; second call is a no-op idempotent re-add."""
from app.auth.scheduler_token import (
SCHEDULER_USER_EMAIL,
ensure_scheduler_user,
)
from src.db import SYSTEM_ADMIN_GROUP, get_system_db
conn = get_system_db()
try:
user1 = ensure_scheduler_user(conn)
assert user1["email"] == SCHEDULER_USER_EMAIL
# Admin group membership exists.
admin_group = conn.execute(
"SELECT id FROM user_groups WHERE name = ?", [SYSTEM_ADMIN_GROUP],
).fetchone()
assert admin_group is not None
membership = conn.execute(
"SELECT 1 FROM user_group_members WHERE user_id = ? AND group_id = ?",
[user1["id"], admin_group[0]],
).fetchone()
assert membership is not None
# Second call — same id, no duplicate membership row.
user2 = ensure_scheduler_user(conn)
assert user2["id"] == user1["id"]
rows = conn.execute(
"SELECT COUNT(*) FROM user_group_members WHERE user_id = ? AND group_id = ?",
[user1["id"], admin_group[0]],
).fetchone()
assert rows[0] == 1
finally:
conn.close()
def test_get_scheduler_user_lazy_seeds_when_absent(fresh_db, monkeypatch):
"""First lookup with no prior seed should provision on demand.
The startup hook in app.main also seeds eagerly, but the scheduler may
present the token before main.py has finished its lifespan setup on a
cold boot — get_scheduler_user must close that gap.
"""
from app.auth.scheduler_token import (
SCHEDULER_USER_EMAIL,
get_scheduler_user,
)
from src.db import get_system_db
from src.repositories.users import UserRepository
conn = get_system_db()
try:
# Confirm user does not exist before the call.
assert UserRepository(conn).get_by_email(SCHEDULER_USER_EMAIL) is None
user = get_scheduler_user(conn)
assert user is not None
assert user["email"] == SCHEDULER_USER_EMAIL
finally:
conn.close()
def test_require_session_token_rejects_scheduler_secret(fresh_db, monkeypatch):
"""The shared scheduler secret must NOT pass `require_session_token`.
/auth/tokens (PAT minting) is gated by `require_session_token`, which
historically rejected only PATs (JWTs with typ=pat). The scheduler
secret is opaque so verify_token() returns None and the PAT-claim
check would silently pass — letting a compromised secret forge
persistent PATs that survive a rotation. Regression guard for the
Devin review on PR #127.
"""
import asyncio
from unittest.mock import MagicMock
from fastapi import HTTPException
from app.auth.dependencies import require_session_token
secret = "x" * 64
monkeypatch.setenv("SCHEDULER_API_TOKEN", secret)
request = MagicMock()
request.headers = {"authorization": f"Bearer {secret}"}
request.cookies = {}
user = {"id": "scheduler-id", "email": "scheduler@system.local"}
try:
asyncio.run(require_session_token(request=request, user=user))
except HTTPException as exc:
assert exc.status_code == 403
# Detail should signal "interactive only", flavor doesn't matter.
assert "interactive" in exc.detail.lower()
else:
raise AssertionError("require_session_token must reject scheduler secret")