agnes-the-ai-analyst/app/auth/scheduler_token.py
minasarustamyan 9de679c714
System plugins (schema v39) + marketplace UX polish + drop legacy pages (#241)
* System plugin tier with mark/unmark fanout (schema v39)

Adds a mandatory plugin tier so admins can pin a small set of curated
plugins into every user's stack from day one. Marking a plugin via the
new toggle on /admin/marketplaces materializes resource_grants for every
group and user_plugin_optouts subscriptions for every user, so the
existing resolver pulls the plugin into every served set without a new
filter layer. Hooks on user-create (Google OAuth, magic-link, admin
POST, scheduler) and group-create propagate the same materialization to
new principals. UI locks: /admin/access disables the checkbox with a
SYSTEM pill; /marketplace cards swap the "In stack" green pill for an
amber "Required" badge with shield icon; the plugin detail install
button reads "Required by your org"; /my-ai-stack toggle is disabled.
Bypass paths return 409 (DELETE /api/admin/grants for system grants,
PUT /api/my-stack/curated/.../{enabled:false}, DELETE
/api/marketplace/curated/.../install). Unmark only flips the flag —
materialized rows persist so admins curate cleanup at their leisure
through the now-unlocked /admin/access checkboxes.

* Marketplace UX polish + drop legacy /store and /my-ai-stack pages

Two-part cleanup post-v39:

(1) Page deletion. /store and /my-ai-stack were already replaced by
/marketplace?tab=flea and /marketplace?tab=my respectively, but the
standalone routes lingered. Hard delete in dev mode — no redirects,
stale bookmarks 404. The /store/new upload wizard, the flea
detail/edit pages, the admin queue, and all /api/store/* +
/api/my-stack endpoints (CLI consumers) stay. Internal hardcoded
hrefs in the upload wizard's Cancel button and the advanced-setup
page repointed to the marketplace tabs.

(2) Detail-page install button rework. The single button that morphed
between "+ Add to my stack" and "✓ In your stack" did not
communicate uninstall affordance. The installed state now renders an
inline white status label *before* a separate red-bordered
"✕ Remove from stack" button on the same row, both at identical
height to avoid layout shift. System plugins keep their locked amber
"✓ Required by your org" pill (no Remove button — API refuses 409).
The post-action hint panel now fires on remove too with the title
flipped to "✓ Removed from your stack" — Claude Code needs the same
/update-agnes-plugins refresh either way.

Also: /admin/marketplaces Details modal "Mark as system" toggle
redesigned. The button was near-invisible (matched neutral row
metadata). It's now a balanced amber-toned chip with shield icon
and a structured confirm modal replacing the native confirm() dialog
that summarizes fanout consequences before commit.

* Move stack-hint inside hero with glass-on-gradient styling

The post-action hint card ("✓ Added to your stack" with the
/update-agnes-plugins recipe) used to live below the hero in
panel-what (gray card on white page body). Clicking add/remove
inserted/removed it between the hero and content, shifting the
panels below — a noticeable scroll jump.

The hint is now anchored inside the hero's top-right corner alongside
the install/remove buttons, both as flex children of an absolutely
positioned .actions container. The card uses a translucent
white-on-glass treatment that adopts the hero's kind color (blue for
plugin, green for skill, purple for agent) without per-kind branching.
Hero is always tall enough (160px photo) to contain the action+hint
stack without overflow, so toggling the hint visibility doesn't grow
the hero or shift body content.

The hero-head grid reserves a third 300px column for the absolute
actions overlay so meta gets the proper 1fr free space instead of
being squeezed by a padding-right hack. Responsive breakpoint at
1100px reflows the actions stack below hero-head when the viewport
isn't wide enough to keep meta + actions side-by-side comfortably.

* Add optional -DataPath bind mount to run-local-dev.ps1

When the operator wants to inspect DuckDB files (system.duckdb, extracts,
marketplaces, store/, …) directly from Windows Explorer, the named volume
inside the Docker Desktop WSL VM isn't reachable. The new -DataPath param
generates a transient compose override that rebinds /data on app, scheduler,
extract (and Caddy's /srv:ro mirror) to a Windows host folder.

Fully additive — when -DataPath is omitted everything behaves exactly as
before: no override file is generated, $composeFiles array is unchanged,
finally cleanup is a no-op. Existing positional invocations
(.\run-local-dev.ps1 up | down | logs) keep binding to $Action because
$DataPath is a named-only parameter with no Position attribute.

The override is written via [System.IO.File]::WriteAllText so the YAML is
BOM-less across PS 5.1 / 7+ — Compose rejects BOM-prefixed YAML on Windows.
The override file is unique per PID and removed in the script's finally
block so concurrent invocations and crashes don't leak files.

* factor mark_system fanout into UserCuratedSubscriptionsRepository

The endpoint imported UserCuratedSubscriptionsRepository, ignored it
(noqa: F841), then duplicated the user-side fanout SQL inline. Adds
fanout_system_for_plugin() symmetric to the existing
fanout_system_for_user() and routes mark_plugin_system through it —
removes the dead import + 14 lines of inline SQL, returns the same
`affected_users` delta count, no behavior change.

* drop customer-specific path from .ps1 example

Per CLAUDE.md vendor-agnostic OSS rule: replaced
C:\\Business\\Groupon\\Agnes\\agnes-data with the generic
C:\\Users\\<you>\\agnes-data placeholder so the docstring
example reads cleanly on any reviewer's box.

* release: 0.48.0 + parallelize Release-workflow pytest

Cuts the release shipped via #228 #230 #231 #232 #233 #234 #236 #237 #238
#239 #240 plus this PR (#241). Major changes:

- System plugin tier (schema v39) — admins mark a plugin mandatory; fans
  out RBAC grants + subscriptions to every existing user/group plus
  hooks for new principals
- BREAKING: removed standalone /store + /my-ai-stack page routes
  (replaced by /marketplace?tab=flea + /marketplace?tab=my)
- Setup-prompt + bootstrap recovery fixes (#240)
- DuckDB CHECKPOINT-on-shutdown + 60s compose grace (#235)
- Marketplace + flea-market UX polish, agnes-metadata.json enrichment

Bonus: switch release.yml test step to `-n auto` (matches ci.yml).
Single-threaded was 15-20 min and frequently the bottleneck on PR
mergeability — now ~6 min.

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-10 19:15:41 +00:00


"""Shared-secret auth path for the in-cluster scheduler service.

The scheduler container ships every cron tick to the FastAPI app over HTTP
(see ``services.scheduler.__main__``). It needs a long-lived credential to
authenticate itself, but minting a real PAT for it requires a logged-in
session, which is a chicken-and-egg problem at first boot.

The pragmatic solution: both the ``app`` and ``scheduler`` containers source
the same ``.env`` (via Docker Compose ``env_file: .env``). The
``infra/modules/customer-instance/startup-script.sh.tpl`` generates a random
``SCHEDULER_API_TOKEN`` once at VM provisioning and writes it there. When a
caller presents that exact secret as ``Authorization: Bearer <secret>``, the
app loads (or seeds on demand) a synthetic ``scheduler@system.local`` user
that is a member of the ``Admin`` system group, so existing RBAC paths
continue to work without special-casing.

Constraints on the secret (enforced here, not parsed):

- Empty / unset → this auth path is **disabled**. Production deploys should
  set it; dev / LOCAL_DEV_MODE typically doesn't, since the scheduler
  rides the dev-bypass instead.
- Length < 32 → treated as misconfiguration and disabled. Prevents an
  operator typo that sets ``SCHEDULER_API_TOKEN=todo`` from accidentally
  granting admin to a 4-character bearer.
- Comparison uses :func:`hmac.compare_digest`, whose running time does not
  depend on where the two values first differ, so a remote caller cannot
  recover the secret byte by byte through a timing attack. (The length of
  the secret may still leak; its value does not.)

Audit: every action by this user is attributed to ``scheduler@system.local``,
visible in ``audit_log`` as a normal admin actor. Rotating the secret is
``edit .env → docker compose restart app scheduler``; no DB write needed.
"""
from __future__ import annotations

import hmac
import logging
import os
import uuid
from typing import Optional

import duckdb

logger = logging.getLogger(__name__)

# Identity of the synthetic user that backs the shared-secret auth path.
# Kept stable so audit-log entries from the scheduler are easy to filter.
SCHEDULER_USER_EMAIL = "scheduler@system.local"
SCHEDULER_USER_NAME = "Scheduler"

# Floor on the secret length, measured in characters. 32 hex characters
# carry 128 bits of entropy when drawn from /dev/urandom (more for base64);
# well above the brute-force frontier and well above any typo a human is
# plausibly going to make.
SCHEDULER_TOKEN_MIN_LENGTH = 32


def get_scheduler_secret() -> str:
    """Return the configured shared secret, stripped. Empty when disabled."""
    return os.environ.get("SCHEDULER_API_TOKEN", "").strip()


def is_scheduler_token(token: str) -> bool:
    """True iff ``token`` exactly matches the configured shared secret.

    Returns False when the env var is empty or shorter than the minimum
    length (auth path disabled). Uses constant-time comparison.
    """
    if not token:
        return False
    secret = get_scheduler_secret()
    if not secret or len(secret) < SCHEDULER_TOKEN_MIN_LENGTH:
        return False
    # Compare as bytes: hmac.compare_digest raises TypeError on str
    # arguments that contain non-ASCII characters.
    return hmac.compare_digest(token.encode("utf-8"), secret.encode("utf-8"))


def ensure_scheduler_user(conn: duckdb.DuckDBPyConnection) -> dict:
    """Idempotently provision the scheduler user + Admin group membership.

    Called both from the app's startup hook (so the user exists from the
    very first boot) and lazily from :func:`get_scheduler_user` so a token
    presented before the next restart of the app still resolves.

    Returns the user dict in the same shape ``UserRepository.get_by_email``
    yields elsewhere; the caller treats it as any other authenticated user.
    """
    from src.db import SYSTEM_ADMIN_GROUP
    from src.repositories.user_group_members import UserGroupMembersRepository
    from src.repositories.users import UserRepository

    repo = UserRepository(conn)
    user = repo.get_by_email(SCHEDULER_USER_EMAIL)
    if not user:
        user_id = str(uuid.uuid4())
        repo.create(
            id=user_id,
            email=SCHEDULER_USER_EMAIL,
            name=SCHEDULER_USER_NAME,
            # No password_hash: this user authenticates via the shared
            # secret only, never via /auth/login. Keeps the bootstrap
            # check ("any user has a password?") accurate.
            password_hash=None,
        )
        # v39: scheduler service user gets the same mandatory tier as
        # human users. The scheduler's plugin set is rarely consumed
        # interactively, but keeping the fanout symmetric prevents
        # surprising drift when an operator inspects this user's stack.
        try:
            from src.repositories.user_curated_subscriptions import (
                UserCuratedSubscriptionsRepository,
            )
            UserCuratedSubscriptionsRepository(
                conn
            ).fanout_system_for_user(user_id)
        except Exception:
            logger.exception(
                "system-plugin fanout failed for scheduler user",
            )
        user = repo.get_by_email(SCHEDULER_USER_EMAIL)
        logger.info("Seeded scheduler service user: %s", SCHEDULER_USER_EMAIL)

    admin_group = conn.execute(
        "SELECT id FROM user_groups WHERE name = ?", [SYSTEM_ADMIN_GROUP],
    ).fetchone()
    if admin_group:
        UserGroupMembersRepository(conn).add_member(
            user_id=user["id"],
            group_id=admin_group[0],
            source="system_seed",
            added_by="app.auth.scheduler_token:ensure_scheduler_user",
        )
    return user


def get_scheduler_user(conn: duckdb.DuckDBPyConnection) -> Optional[dict]:
    """Look up the scheduler user, seeding it on demand if absent.

    Returns None only when seeding fails, typically because of a malformed
    schema or an out-of-band DB error. The caller (``get_current_user``)
    maps None to a normal 401 so the failure is observable but does not
    crash.
    """
    from src.repositories.users import UserRepository

    user = UserRepository(conn).get_by_email(SCHEDULER_USER_EMAIL)
    if user:
        return user
    try:
        return ensure_scheduler_user(conn)
    except Exception as e:  # noqa: BLE001
        logger.error("Failed to provision scheduler user on demand: %s", e)
        return None
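
The three constraints on the secret listed in the module docstring can be exercised in isolation with a minimal standalone sketch. The helper name `check_bearer`, the `MIN_LENGTH` constant, and the use of `secrets.token_hex` as a stand-in for the provisioning script are assumptions for illustration only; the real startup script may generate the secret differently. The sketch compares the values as bytes, which may differ from how the module itself calls `hmac.compare_digest`.

```python
import hmac
import os
import secrets

MIN_LENGTH = 32  # hypothetical mirror of SCHEDULER_TOKEN_MIN_LENGTH


def check_bearer(token: str) -> bool:
    """Standalone restatement of the rules from the module docstring."""
    secret = os.environ.get("SCHEDULER_API_TOKEN", "").strip()
    if not token or len(secret) < MIN_LENGTH:
        # Empty token, unset secret, or too-short secret: path disabled.
        return False
    # Bytes comparison, so a non-ASCII token cannot raise TypeError.
    return hmac.compare_digest(token.encode("utf-8"), secret.encode("utf-8"))


# Assumed provisioning step: 32 random bytes, hex-encoded to 64 characters.
generated = secrets.token_hex(32)

os.environ["SCHEDULER_API_TOKEN"] = generated
results = {
    "match": check_bearer(generated),   # exact secret is accepted
    "wrong": check_bearer("x" * 64),    # same length, different value
    "empty": check_bearer(""),          # missing bearer is rejected
}

# Operator typo: a 4-character secret disables the path entirely,
# even for a caller that presents the very same string.
os.environ["SCHEDULER_API_TOKEN"] = "todo"
results["short_secret"] = check_bearer("todo")

print(results)
```

Only the `match` case should come back true. The `short_secret` case shows why the length floor is checked on the configured secret rather than on the presented token.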