* fix(scheduler): HTTP marketplaces job + SCHEDULER_API_TOKEN shared secret Two scheduler-reliability bugs surfaced after the v0.12.1 USER-agnes flip: 1. The marketplaces job called src.marketplace.sync_marketplaces() in-process from the scheduler container, racing the app's long-lived system.duckdb handle. DuckDB rejects cross-process writers — every cron tick 500-ed on "Could not set lock on file ... PID 0". 2. The data-refresh + new marketplaces jobs both 401-ed on the API because SCHEDULER_API_TOKEN was never propagated by the Terraform startup script. The scheduler had no credential to authenticate with. Fix: - New POST /api/marketplaces/sync-all (admin-only) drives the nightly refresh through the app process so it inherits the existing DB connection. - Scheduler swaps fn->http for marketplaces; all jobs are now plain HTTP and the scheduler is reduced to a cron clock. - New app/auth/scheduler_token.py adds a shared-secret auth path. The startup script generates a 256-bit secret on first boot, persists it across reboots, and writes it to /opt/agnes/.env. Both containers source the same .env. The app validates incoming Bearer tokens against the env var (constant-time, length-floored) and resolves matches to a synthetic scheduler@system.local user that's a member of the Admin system group. Audit-log entries from the scheduler are attributed to this user. - app/main.py seeds the synthetic user at startup so the first cron tick has a valid actor; lazy seed in get_scheduler_user covers token rotation before the next app restart. Tests: 5 new in tests/test_auth_scheduler_token.py covering empty/short secret rejection, exact-match comparison, idempotent user seeding, and lazy provisioning. 142 marketplace + scheduler tests + 96 auth tests remain green. Existing VMs with .env from before this change need a one-time re-provisioning (re-run startup-script or rotate via openssl rand); documented in CHANGELOG. * fix(audit): use '_all' sentinel for bulk marketplace sync — Devin review #127 Avoids the literal string 'marketplace:None' in the audit_log resource column when the bulk sync endpoint writes its summary row. * fix(scheduler): unblock event loop + per-job timeouts — Devin review #127 Two findings from Devin re-review on commit 5fbad15: 1. BUG: trigger_sync_all was async def, so FastAPI ran it on the asyncio event loop. sync_marketplaces() does blocking I/O (subprocess git clones up to GIT_TIMEOUT_SEC=300 each, threading.Lock, DuckDB writes) and would freeze every concurrent request for the duration of a bulk sync. Switched to plain def so FastAPI auto-routes to the thread pool. 2. ANALYSIS: scheduler used a fixed 120s httpx timeout for every POST. Bulk marketplace sync iterates the registry under a single lock with up to 300s per repo — easily exceeds 120s on 2-3 slow repos. The scheduler then sees a timeout, doesn't update last_run, and re-fires on the next 30s tick, queueing redundant work. Per-job timeout override added to the JOBS tuple; marketplaces gets 900s (15 min), data-refresh keeps 120s, health-check 30s. * fix(auth): require_session_token rejects scheduler shared secret — Devin review #127 require_session_token gates /auth/tokens (PAT minting). Pre-fix it only rejected JWTs with typ=pat — but the scheduler shared secret is an opaque string, so verify_token() returns None, payload becomes {}, and the PAT-claim check silently passed. A caller bearing SCHEDULER_API_TOKEN could mint persistent PATs that survive a secret rotation. Added explicit is_scheduler_token() check before the PAT-claim check; new regression test in tests/test_auth_scheduler_token.py. Devin's other note (pre-existing async def trigger_sync at marketplaces.py:392 also calls blocking sync_one) — Devin flagged it as out-of-scope for this PR and I agree; tracking separately. * release(0.17.0): cut + clean up CHANGELOG duplicates Cuts 0.17.0 (minor: scheduler shared-secret auth + sync-all endpoint plus the deploy-shape fixes that landed since the last release tag). Bumps pyproject from 0.15.0 — also corrects the missed bump from PR #120 (v0.16.0 was tagged on GitHub and shipped as :stable, but pyproject stayed at 0.15.0, so /api/version, /cli/latest, and `da --version` had been under-reporting the running release). Removes the long-form duplicate entries for 0.13.0 / 0.14.0 / 0.15.0 above [0.16.0] — the canonical short summaries (with GitHub-release links) already exist below 0.16.0, the long forms were leftover state from before those versions were cut and have been silently shadowed ever since.
153 lines
5.7 KiB
Python
153 lines
5.7 KiB
Python
"""Tests for the SCHEDULER_API_TOKEN shared-secret auth path."""
|
|
|
|
import tempfile
|
|
|
|
import pytest
|
|
|
|
|
|
@pytest.fixture
|
|
def fresh_db(monkeypatch):
|
|
"""Isolated DuckDB + JWT secret per test, mirroring tests/test_pat.py."""
|
|
with tempfile.TemporaryDirectory() as tmp:
|
|
monkeypatch.setenv("DATA_DIR", tmp)
|
|
monkeypatch.setenv("TESTING", "1")
|
|
monkeypatch.setenv("JWT_SECRET_KEY", "test-jwt-secret-key-minimum-32-chars!!")
|
|
# Clean slate — clear any inherited token from the host shell.
|
|
monkeypatch.delenv("SCHEDULER_API_TOKEN", raising=False)
|
|
# Force pristine state — earlier tests in the same session may have
|
|
# opened the singleton; drop it so the new DATA_DIR takes effect.
|
|
from src.db import close_system_db
|
|
close_system_db()
|
|
yield tmp
|
|
close_system_db()
|
|
|
|
|
|
def test_is_scheduler_token_disabled_when_env_unset(fresh_db, monkeypatch):
|
|
"""Empty SCHEDULER_API_TOKEN must disable the auth path entirely.
|
|
|
|
A bug here would let any caller authenticate with empty Bearer "" — the
|
|
constant-time compare would also be empty — granting admin to anyone.
|
|
"""
|
|
from app.auth.scheduler_token import is_scheduler_token
|
|
|
|
monkeypatch.delenv("SCHEDULER_API_TOKEN", raising=False)
|
|
assert is_scheduler_token("") is False
|
|
assert is_scheduler_token("anything") is False
|
|
|
|
|
|
def test_is_scheduler_token_disabled_when_env_too_short(fresh_db, monkeypatch):
|
|
"""Operator typo (SCHEDULER_API_TOKEN=todo) must NOT grant admin.
|
|
|
|
The minimum length floor exists specifically to prevent a 4-char bearer
|
|
from accidentally matching a 4-char misconfigured secret.
|
|
"""
|
|
from app.auth.scheduler_token import is_scheduler_token
|
|
|
|
monkeypatch.setenv("SCHEDULER_API_TOKEN", "too-short")
|
|
assert is_scheduler_token("too-short") is False
|
|
|
|
|
|
def test_is_scheduler_token_matches_only_exact_value(fresh_db, monkeypatch):
|
|
from app.auth.scheduler_token import is_scheduler_token
|
|
|
|
secret = "x" * 64 # > min length
|
|
monkeypatch.setenv("SCHEDULER_API_TOKEN", secret)
|
|
assert is_scheduler_token(secret) is True
|
|
assert is_scheduler_token(secret + "trailing") is False
|
|
assert is_scheduler_token(secret[:-1]) is False
|
|
assert is_scheduler_token("y" * 64) is False
|
|
|
|
|
|
def test_ensure_scheduler_user_seeds_user_and_admin_membership(fresh_db, monkeypatch):
|
|
"""First call seeds; second call is a no-op idempotent re-add."""
|
|
from app.auth.scheduler_token import (
|
|
SCHEDULER_USER_EMAIL,
|
|
ensure_scheduler_user,
|
|
)
|
|
from src.db import SYSTEM_ADMIN_GROUP, get_system_db
|
|
|
|
conn = get_system_db()
|
|
try:
|
|
user1 = ensure_scheduler_user(conn)
|
|
assert user1["email"] == SCHEDULER_USER_EMAIL
|
|
# Admin group membership exists.
|
|
admin_group = conn.execute(
|
|
"SELECT id FROM user_groups WHERE name = ?", [SYSTEM_ADMIN_GROUP],
|
|
).fetchone()
|
|
assert admin_group is not None
|
|
membership = conn.execute(
|
|
"SELECT 1 FROM user_group_members WHERE user_id = ? AND group_id = ?",
|
|
[user1["id"], admin_group[0]],
|
|
).fetchone()
|
|
assert membership is not None
|
|
|
|
# Second call — same id, no duplicate membership row.
|
|
user2 = ensure_scheduler_user(conn)
|
|
assert user2["id"] == user1["id"]
|
|
rows = conn.execute(
|
|
"SELECT COUNT(*) FROM user_group_members WHERE user_id = ? AND group_id = ?",
|
|
[user1["id"], admin_group[0]],
|
|
).fetchone()
|
|
assert rows[0] == 1
|
|
finally:
|
|
conn.close()
|
|
|
|
|
|
def test_get_scheduler_user_lazy_seeds_when_absent(fresh_db, monkeypatch):
|
|
"""First lookup with no prior seed should provision on demand.
|
|
|
|
The startup hook in app.main also seeds eagerly, but the scheduler may
|
|
present the token before main.py has finished its lifespan setup on a
|
|
cold boot — get_scheduler_user must close that gap.
|
|
"""
|
|
from app.auth.scheduler_token import (
|
|
SCHEDULER_USER_EMAIL,
|
|
get_scheduler_user,
|
|
)
|
|
from src.db import get_system_db
|
|
from src.repositories.users import UserRepository
|
|
|
|
conn = get_system_db()
|
|
try:
|
|
# Confirm user does not exist before the call.
|
|
assert UserRepository(conn).get_by_email(SCHEDULER_USER_EMAIL) is None
|
|
user = get_scheduler_user(conn)
|
|
assert user is not None
|
|
assert user["email"] == SCHEDULER_USER_EMAIL
|
|
finally:
|
|
conn.close()
|
|
|
|
|
|
def test_require_session_token_rejects_scheduler_secret(fresh_db, monkeypatch):
|
|
"""The shared scheduler secret must NOT pass `require_session_token`.
|
|
|
|
/auth/tokens (PAT minting) is gated by `require_session_token`, which
|
|
historically rejected only PATs (JWTs with typ=pat). The scheduler
|
|
secret is opaque so verify_token() returns None and the PAT-claim
|
|
check would silently pass — letting a compromised secret forge
|
|
persistent PATs that survive a rotation. Regression guard for the
|
|
Devin review on PR #127.
|
|
"""
|
|
import asyncio
|
|
from unittest.mock import MagicMock
|
|
|
|
from fastapi import HTTPException
|
|
|
|
from app.auth.dependencies import require_session_token
|
|
|
|
secret = "x" * 64
|
|
monkeypatch.setenv("SCHEDULER_API_TOKEN", secret)
|
|
|
|
request = MagicMock()
|
|
request.headers = {"authorization": f"Bearer {secret}"}
|
|
request.cookies = {}
|
|
|
|
user = {"id": "scheduler-id", "email": "scheduler@system.local"}
|
|
try:
|
|
asyncio.run(require_session_token(request=request, user=user))
|
|
except HTTPException as exc:
|
|
assert exc.status_code == 403
|
|
# Detail should signal "interactive only", flavor doesn't matter.
|
|
assert "interactive" in exc.detail.lower()
|
|
else:
|
|
raise AssertionError("require_session_token must reject scheduler secret")
|