docs: consolidate and de-clutter the documentation tree (#306)

CLAUDE.md rewritten (708 -> ~320 lines): four overlapping release
sections collapsed to one, stale v1->v35 schema history dropped (it
lives in CHANGELOG), marketplace endpoint internals and verbose
process sections moved out or tightened.

New focused docs:
- docs/RELEASING.md - release process, deploy workflows, CI quirks
  (RELEASE_TEMPLATE.md folded in as an appendix)
- docs/marketplace.md - marketplace ingestion + re-serving internals
- docs/README.md - documentation index by audience, linked from
  README.md and CLAUDE.md

Archived under docs/archive/: docs/superpowers/ (52 historical
planning artifacts), HACKATHON.md, pd-ps-comments.md,
security-audit-2026-04.md, future/NOTIFICATIONS.md.

Removed the docs/auto-install.md stub. Fixed dangling links in
connectors/jira/README.md and dev_docs/README.md, repointed
code/doc references to archived paths.

2026-05-14 18:54:22 +00:00


Issues #77, #78, #79, #89 — Re-wire Scheduler + TLS Hardening

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Honor per-table sync_schedule and per-script schedule at runtime (Option A — re-implement); make sidecar job intervals operator-tunable; stop disabling TLS verification globally in the OpenMetadata client.

Architecture:

API-side filter for table_registry.sync_schedule (#79). New helper filter_due_tables() in src/scheduler.py is called from app/api/sync.py:_run_sync() after repo.list_local(). Tables with no schedule keep current "always sync" behavior (opt-in feature). Manual POST /api/sync/trigger {"tables": [...]} bypasses the filter (operator override always wins).
Server-side runner endpoint POST /api/scripts/run-due (#78). The sidecar fires the endpoint on a configurable cadence; the API claims due scripts atomically (last_status='running' UPDATE … RETURNING), runs each via existing _execute_script in BackgroundTasks, and writes last_run + last_status on completion. Concurrency: a script in running state is skipped on the next tick.
Env-driven sidecar JOBS (#77). Three documented env overrides for the existing two interval jobs + tick, plus a fourth for the new script-runner job (attributed to #78 in changelog). Marketplaces stays hardcoded — outside #77 scope.
TLS verify by default in OpenMetadataClient (#89). Mirror the connectors/llm/openai_compat.py pattern: verify: bool | str = True constructor parameter; drop the module-level warnings.filterwarnings.
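The `verify: bool | str = True` shape named in the last item can be illustrated with the standard library alone. This is not the `OpenMetadataClient` code: `build_ssl_context` is a hypothetical helper, and treating a string as a CA-bundle path is an assumption. It only shows how the three accepted values (verify by default, explicit opt-out, custom CA bundle) might map onto an `ssl.SSLContext`.

```python
from __future__ import annotations

import ssl


def build_ssl_context(verify: bool | str = True) -> ssl.SSLContext:
    """Hypothetical helper: map a ``verify`` setting onto an SSLContext."""
    if isinstance(verify, str):
        # Assumption: a string is a filesystem path to a CA bundle.
        return ssl.create_default_context(cafile=verify)
    ctx = ssl.create_default_context()  # verifies certs and hostnames
    if verify is False:
        # check_hostname must be switched off before verify_mode is relaxed.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

The point of the pattern is that disabling verification becomes a per-instance, explicit choice by the caller instead of a module-wide side effect.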

No DuckDB schema migration required — all touched columns (table_registry.sync_schedule, script_registry.schedule/last_run/last_status) already exist in v17.

Tech Stack: Python 3.11+, FastAPI, Pydantic v2, DuckDB, httpx. Pytest for tests.

Out of scope (intentional):

Issue #68 (Stop hook output field) — no Stop hook source lives in this OSS repo as of HEAD; the referenced TODO.md no longer exists. Needs clarification from the issue author before implementation.
Per-script concurrency beyond "skip if running" (no queue, no max-runtime detection).
Operator-defined custom sidecar jobs (would land in instance.yaml per #77's "Future option").

File Structure

New files:

tests/test_run_due_scripts.py — tests for the new /api/scripts/run-due endpoint and the claim_for_run/record_run_result repo methods.
tests/test_sync_filter.py — tests for filter_due_tables() and is_valid_schedule().

Modified files:

src/scheduler.py — add is_valid_schedule(schedule) -> bool and filter_due_tables(table_configs, sync_state_repo) -> list[dict].
app/api/sync.py — wire filter_due_tables() into _run_sync().
app/api/admin.py — Pydantic field_validator on RegisterTableRequest.sync_schedule and UpdateTableRequest.sync_schedule (reject malformed strings with 422).
src/repositories/notifications.py — extend ScriptRepository with claim_for_run(script_id) and record_run_result(script_id, status).
app/api/scripts.py — add POST /api/scripts/run-due endpoint; Pydantic field_validator on DeployScriptRequest.schedule.
services/scheduler/__main__.py — env-driven JOBS list with validation; add 4th script-runner job.
connectors/openmetadata/client.py — add verify constructor param; drop module-level warnings.filterwarnings.
config/.env.template — document the four new SCHEDULER_* env vars.
docs/DEPLOYMENT.md — new "Scheduler tuning" subsection covering the env vars.
CHANGELOG.md — entries under a new [0.19.0] section.
pyproject.toml — bump version to 0.19.0.
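The "env-driven JOBS list with validation" entry for services/scheduler/__main__.py might be built on a helper along these lines. Everything named here is a placeholder: the plan only states that the variables share a `SCHEDULER_` prefix, so `SCHEDULER_EXAMPLE_INTERVAL`, the unit (seconds), and the choice to fail loudly on malformed values are assumptions for illustration.

```python
import os


def interval_from_env(name: str, default: int, env=None) -> int:
    """Read a positive integer interval (seconds, by assumption) from env.

    Unset or blank values fall back to ``default``; malformed or
    non-positive values raise ValueError so a typo is caught at startup.
    """
    env = os.environ if env is None else env
    raw = env.get(name, "").strip()
    if not raw:
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value


# Hypothetical variable name, shown only to illustrate the call shape.
example_interval = interval_from_env(
    "SCHEDULER_EXAMPLE_INTERVAL", 900, env={"SCHEDULER_EXAMPLE_INTERVAL": "300"}
)
```

Passing `env` explicitly keeps the helper testable without touching the real process environment.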

Untouched (intentionally):

src/db.py — schema unchanged. script_registry.last_run was always nullable; we just start writing to it.
tests/test_scheduler*.py — keep as-is. is_table_due is the reusable primitive both filter_due_tables and the script runner build on.
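For readers unfamiliar with the existing primitive, the interval branch of a due-check can be sketched as follows. This is not the body of `is_table_due` (which this plan leaves untouched and does not show); `interval_is_due` is a hypothetical stand-in, and treating an exactly elapsed interval as due is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def interval_is_due(
    interval_minutes: int,
    last_run: Optional[datetime],
    now: datetime,
) -> bool:
    """Hypothetical sketch of an 'every Nm' / 'every Nh' due-check."""
    if last_run is None:
        # Never ran before: always due.
        return True
    if last_run.tzinfo is None:
        # Naive timestamps are treated as UTC so the subtraction is legal.
        last_run = last_run.replace(tzinfo=timezone.utc)
    return now - last_run >= timedelta(minutes=interval_minutes)


now = datetime(2026, 5, 1, 10, 0, tzinfo=timezone.utc)
recent = datetime(2026, 5, 1, 9, 30, tzinfo=timezone.utc)   # 30m ago
stale = datetime(2026, 5, 1, 8, 30, tzinfo=timezone.utc)    # 90m ago
```

With a 60-minute interval, `recent` is not due and `stale` is, which is the behaviour the tests in Task 1 expect from the real helper.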

Pre-flight

Step P-1: Confirm worktree branch and clean state

git status
git branch --show-current

Expected: clean tree on worktree-issues-68-77-78-79-89 (or whatever this worktree's branch is).

Step P-2: Confirm test suite is green at HEAD

pytest tests/test_scheduler.py tests/test_sync_manifest.py tests/test_scripts_api.py -v 2>&1 | tail -30

Expected: all green. If any are red at HEAD, stop and investigate before adding new tests.

Task 1: `src/scheduler.py` — add `is_valid_schedule` and `filter_due_tables`

Files:

Modify: src/scheduler.py (add two new functions at the end of the module)
Create: tests/test_sync_filter.py

Why this first: Both #79 (table sync) and #78 (scripts) reuse is_valid_schedule for Pydantic validation. filter_due_tables is the pure helper #79 wires into _run_sync(). Pure-function unit tests; no FastAPI / DuckDB plumbing yet.

Step 1.1: Write the failing test file

Create tests/test_sync_filter.py:

"""Tests for the schedule-validity helper and the per-table due-filter."""

from datetime import datetime, timedelta, timezone

import pytest

from src.scheduler import filter_due_tables, is_valid_schedule


# ---------------- is_valid_schedule -----------------------------------------

@pytest.mark.parametrize("schedule", [
    "every 15m",
    "every 1h",
    "every 6h",
    "daily 05:00",
    "daily 07:00,13:00,18:00",
])
def test_is_valid_schedule_accepts_documented_formats(schedule):
    assert is_valid_schedule(schedule) is True


@pytest.mark.parametrize("schedule", [
    "",
    "every",
    "every 0m",          # zero is not a positive interval
    "every 15s",         # seconds not supported
    "daily",
    "daily 25:00",       # invalid hour
    "daily 12:60",       # invalid minute
    "daily 12:00,",      # trailing comma
    "hourly",            # unknown keyword
    "every -5m",         # negative
])
def test_is_valid_schedule_rejects_malformed_strings(schedule):
    assert is_valid_schedule(schedule) is False


def test_is_valid_schedule_treats_none_as_invalid():
    # None is "no schedule" — callers handle that case before validating.
    # The validator is for non-null strings only.
    assert is_valid_schedule(None) is False  # type: ignore[arg-type]


# ---------------- filter_due_tables -----------------------------------------

class _FakeSyncStateRepo:
    """Stub SyncStateRepository — returns last_sync per table_id."""

    def __init__(self, last_syncs: dict[str, datetime | None]):
        self._data = last_syncs

    def get_last_sync(self, table_id: str):
        return self._data.get(table_id)


def _utc(year, month, day, hour=0, minute=0):
    return datetime(year, month, day, hour, minute, tzinfo=timezone.utc)


def test_filter_due_tables_passes_through_unscheduled_tables():
    """Tables with sync_schedule=None are always due (opt-in feature)."""
    configs = [
        {"id": "t1", "name": "t1", "sync_schedule": None},
        {"id": "t2", "name": "t2", "sync_schedule": ""},
    ]
    repo = _FakeSyncStateRepo({})
    out = filter_due_tables(configs, repo, now=_utc(2026, 5, 1, 10, 0))
    assert [c["id"] for c in out] == ["t1", "t2"]


def test_filter_due_tables_drops_table_within_interval():
    """A table on 'every 1h' synced 30m ago is NOT due."""
    configs = [{"id": "fast", "name": "fast", "sync_schedule": "every 1h"}]
    repo = _FakeSyncStateRepo({"fast": _utc(2026, 5, 1, 9, 30)})
    out = filter_due_tables(configs, repo, now=_utc(2026, 5, 1, 10, 0))
    assert out == []


def test_filter_due_tables_keeps_table_past_interval():
    """A table on 'every 1h' synced 90m ago IS due."""
    configs = [{"id": "fast", "name": "fast", "sync_schedule": "every 1h"}]
    repo = _FakeSyncStateRepo({"fast": _utc(2026, 5, 1, 8, 30)})
    out = filter_due_tables(configs, repo, now=_utc(2026, 5, 1, 10, 0))
    assert [c["id"] for c in out] == ["fast"]


def test_filter_due_tables_keeps_never_synced_table():
    """No last_sync row → always due (matches is_table_due semantics)."""
    configs = [{"id": "new", "name": "new", "sync_schedule": "every 1h"}]
    repo = _FakeSyncStateRepo({})  # no entry at all
    out = filter_due_tables(configs, repo, now=_utc(2026, 5, 1, 10, 0))
    assert [c["id"] for c in out] == ["new"]


def test_filter_due_tables_treats_invalid_schedule_as_unscheduled():
    """Garbled sync_schedule: log + always sync (don't silently skip)."""
    configs = [{"id": "bad", "name": "bad", "sync_schedule": "BOGUS"}]
    repo = _FakeSyncStateRepo({"bad": _utc(2026, 5, 1, 9, 59)})
    out = filter_due_tables(configs, repo, now=_utc(2026, 5, 1, 10, 0))
    assert [c["id"] for c in out] == ["bad"]


def test_filter_due_tables_mixed_due_and_skipped():
    configs = [
        {"id": "due",     "name": "due",     "sync_schedule": "every 30m"},
        {"id": "skipped", "name": "skipped", "sync_schedule": "every 30m"},
        {"id": "always",  "name": "always",  "sync_schedule": None},
    ]
    repo = _FakeSyncStateRepo({
        "due":     _utc(2026, 5, 1, 9, 0),    # 60m ago → due
        "skipped": _utc(2026, 5, 1, 9, 50),   # 10m ago → skip
    })
    out = filter_due_tables(configs, repo, now=_utc(2026, 5, 1, 10, 0))
    assert sorted(c["id"] for c in out) == ["always", "due"]


def test_filter_due_tables_handles_naive_last_sync():
    """SyncStateRepository can return naive datetimes from older rows; helper
    must coerce to UTC instead of crashing on tz-aware vs naive comparison."""
    configs = [{"id": "old", "name": "old", "sync_schedule": "every 1h"}]
    naive_2h_ago = datetime(2026, 5, 1, 8, 0)  # no tzinfo
    repo = _FakeSyncStateRepo({"old": naive_2h_ago})
    out = filter_due_tables(configs, repo, now=_utc(2026, 5, 1, 10, 0))
    assert [c["id"] for c in out] == ["old"]

Step 1.2: Run tests — expect ImportError

pytest tests/test_sync_filter.py -v 2>&1 | tail -10

Expected: ImportError on from src.scheduler import filter_due_tables, is_valid_schedule — those symbols don't exist yet.

Step 1.3: Implement is_valid_schedule and filter_due_tables

Append to src/scheduler.py (after _parse_timestamp):

def is_valid_schedule(schedule: Optional[str]) -> bool:
    """Return True iff ``schedule`` parses as a documented schedule string.

    Accepted forms (mirroring the rest of this module):
      - ``"every Nm"`` / ``"every Nh"`` with N a positive integer
      - ``"daily HH:MM"`` (24-h, UTC) optionally comma-separated:
        ``"daily 07:00,13:00"``

    Anything else — including ``None``, empty string, or a parseable-looking
    but out-of-range value (``"daily 25:00"``) — returns False. Pydantic
    validators on the admin API call this to reject malformed input with
    422 instead of accepting it and silently no-op'ing later.
    """
    if not schedule or not isinstance(schedule, str):
        return False
    interval = parse_interval_minutes(schedule)
    if interval is not None:
        return interval > 0
    match = DAILY_PATTERN.match(schedule)
    if not match:
        return False
    return bool(_parse_daily_times(match.group(1)))


def filter_due_tables(
    table_configs: list[dict],
    sync_state_repo,
    now: Optional[datetime] = None,
) -> list[dict]:
    """Drop table configs whose ``sync_schedule`` says they are not due.

    Behaviour:
      - ``sync_schedule`` is None or empty → table passes through
        (no schedule = "sync on every tick", existing behaviour).
      - Valid schedule + last_sync within the cadence → drop.
      - Valid schedule + last_sync past cadence (or never) → keep.
      - Invalid schedule string → log a warning and let the table through
        (do NOT silently skip — operator surprise is worse than a redundant
        sync).

    ``sync_state_repo`` is duck-typed: only ``get_last_sync(table_id)`` is
    called, returning a ``datetime`` (tz-aware preferred, naive treated as
    UTC) or ``None``.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    out: list[dict] = []
    for tc in table_configs:
        schedule = tc.get("sync_schedule")
        if not schedule:
            out.append(tc)
            continue
        if not is_valid_schedule(schedule):
            logger.warning(
                "Table %s has malformed sync_schedule %r — syncing anyway "
                "(fix the schedule string to suppress this message)",
                tc.get("id") or tc.get("name"),
                schedule,
            )
            out.append(tc)
            continue
        last_sync = sync_state_repo.get_last_sync(tc.get("id") or tc.get("name"))
        last_sync_iso: Optional[str]
        if last_sync is None:
            last_sync_iso = None
        else:
            if last_sync.tzinfo is None:
                last_sync = last_sync.replace(tzinfo=timezone.utc)
            last_sync_iso = last_sync.isoformat()
        if is_table_due(schedule, last_sync_iso, now=now):
            out.append(tc)
        else:
            logger.info(
                "Table %s skipped: schedule=%r, last_sync=%s, not due yet",
                tc.get("id") or tc.get("name"),
                schedule,
                last_sync_iso,
            )
    return out

Step 1.4: Run tests — expect green

pytest tests/test_sync_filter.py -v 2>&1 | tail -25

Expected: all green (parametrized cases included).
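`is_valid_schedule` leans on three names that already live in src/scheduler.py but are not reproduced in this plan: `parse_interval_minutes`, `DAILY_PATTERN`, and `_parse_daily_times`. The standalone sketch below guesses at plausible bodies for them (the regexes and return shapes are assumptions, not the module's actual code) so the validation rules can be tried in isolation.

```python
import re
from typing import Optional

# Assumed patterns; the real module may define these differently.
EVERY_PATTERN = re.compile(r"every (\d+)([mh])")
DAILY_PATTERN = re.compile(r"daily (.+)")


def parse_interval_minutes(schedule: str) -> Optional[int]:
    """Return the interval in minutes for 'every Nm' / 'every Nh', else None."""
    match = EVERY_PATTERN.fullmatch(schedule)
    if not match:
        return None
    n = int(match.group(1))
    return n * 60 if match.group(2) == "h" else n


def _parse_daily_times(spec: str) -> list:
    """Return [(hour, minute), ...], or [] if any entry is malformed."""
    times = []
    for part in spec.split(","):
        match = re.fullmatch(r"(\d{2}):(\d{2})", part)
        if not match:
            return []
        hour, minute = int(match.group(1)), int(match.group(2))
        if hour > 23 or minute > 59:
            return []
        times.append((hour, minute))
    return times


def is_valid_schedule(schedule: Optional[str]) -> bool:
    """Same decision logic as the plan's helper, on the assumed parsers."""
    if not schedule or not isinstance(schedule, str):
        return False
    interval = parse_interval_minutes(schedule)
    if interval is not None:
        return interval > 0
    match = DAILY_PATTERN.match(schedule)
    if not match:
        return False
    return bool(_parse_daily_times(match.group(1)))
```

Under these assumed parsers, a trailing comma yields an empty final entry and an out-of-range hour fails the range check, so both come back invalid without any special casing in `is_valid_schedule` itself.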

Step 1.5: Commit

git add src/scheduler.py tests/test_sync_filter.py
git commit -m "feat(scheduler): add is_valid_schedule + filter_due_tables helpers (#79)"

Task 2: Wire `filter_due_tables` into `_run_sync`

Files:

Modify: app/api/sync.py:37-217 (the _run_sync function)
Test: extend tests/test_sync_filter.py with one integration-style test that exercises _run_sync's filter call site (mocking subprocess + orchestrator).

Why now: With the helper green, the wiring is a 4-line change. Test stubs out the heavy machinery (subprocess, orchestrator) and asserts only the filter is invoked correctly.

Step 2.1: Add the integration test

Append to tests/test_sync_filter.py:

# ---------------- _run_sync wiring ------------------------------------------

def test_run_sync_filters_local_tables_by_schedule(monkeypatch, tmp_path):
    """`_run_sync(tables=None)` consults `filter_due_tables` and skips
    tables that are not due. Manual override (`tables=[...]`) bypasses
    the filter entirely."""
    from app.api import sync as sync_module

    # Point the data dir at tmp_path so the run touches no real files.
    monkeypatch.setattr(
        sync_module, "_get_data_dir", lambda: tmp_path,
    )
    # Stub get_data_source_type → 'keboola' so the keboola subprocess code
    # path is taken (also matches the existing _run_sync shape).
    import app.instance_config as instance_config
    monkeypatch.setattr(instance_config, "get_data_source_type", lambda: "keboola")

    # Fake registry with one due + one skipped table.
    fake_configs = [
        {"id": "due",     "name": "due",     "source_type": "keboola",
         "sync_schedule": "every 30m", "query_mode": "local"},
        {"id": "skipped", "name": "skipped", "source_type": "keboola",
         "sync_schedule": "every 30m", "query_mode": "local"},
    ]

    class _StubRegistry:
        def __init__(self, conn): pass
        def list_local(self, source_type=None): return list(fake_configs)
        def get(self, table_id):
            return next((c for c in fake_configs if c["id"] == table_id), None)

    monkeypatch.setattr(
        "src.repositories.table_registry.TableRegistryRepository",
        _StubRegistry,
    )

    # Fake sync_state: 'due' last synced 60m ago, 'skipped' 10m ago.
    from datetime import datetime, timezone
    last_syncs = {
        "due":     datetime(2026, 5, 1, 9, 0,  tzinfo=timezone.utc),
        "skipped": datetime(2026, 5, 1, 9, 50, tzinfo=timezone.utc),
    }

    class _StubState:
        def __init__(self, conn): pass
        def get_last_sync(self, table_id): return last_syncs.get(table_id)

    monkeypatch.setattr(
        "src.repositories.sync_state.SyncStateRepository",
        _StubState,
    )

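    # Freeze 'now' inside filter_due_tables by wrapping it to inject `now=`.
    # Patch the attribute on src.scheduler, not on app.api.sync: _run_sync
    # imports the helper inside the function body, so the name is looked up
    # on src.scheduler at call time and app.api.sync has no module-level
    # attribute for monkeypatch to replace.
    from src import scheduler as _sched
    real_filter = _sched.filter_due_tables
    monkeypatch.setattr(
        _sched, "filter_due_tables",
        lambda cfgs, repo: real_filter(
            cfgs, repo, now=datetime(2026, 5, 1, 10, 0, tzinfo=timezone.utc),
        ),
    )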

    # Capture the configs that subprocess.run sees (via stdin payload).
    captured = {}

    def _fake_run(cmd, input, capture_output, text, timeout, env, cwd):
        import json as _json
        captured["configs"] = _json.loads(input)
        class _R:
            returncode = 0
            stdout = "{}"
            stderr = ""
        return _R()

    monkeypatch.setattr(sync_module.subprocess, "run", _fake_run)

    # Stub orchestrator + profiler imports inside the function so we don't
    # require a real DuckDB analytics file.
    import src.orchestrator as _orch_mod

    class _StubOrch:
        def rebuild(self): return {}

    monkeypatch.setattr(_orch_mod, "SyncOrchestrator", _StubOrch)

    # Run with tables=None → filter applies → only 'due' goes to subprocess.
    sync_module._run_sync(tables=None)
    assert [c["id"] for c in captured["configs"]] == ["due"]

    # Run with explicit override → filter is BYPASSED → both go through.
    captured.clear()
    sync_module._run_sync(tables=["due", "skipped"])
    assert sorted(c["id"] for c in captured["configs"]) == ["due", "skipped"]

Step 2.2: Run the test — expect FAIL

pytest tests/test_sync_filter.py::test_run_sync_filters_local_tables_by_schedule -v 2>&1 | tail -20

Expected: AssertionError — captured["configs"] contains both tables in the first assertion (filter not yet wired in).

Step 2.3: Wire filter_due_tables into _run_sync

In app/api/sync.py, add these two imports near the top of _run_sync (around line 50):

from src.scheduler import filter_due_tables
from src.repositories.sync_state import SyncStateRepository

Replace lines 56-66 (the registry-read block) with:

        # Read table configs in main process (has shared DuckDB connection)
        sys_conn = get_system_db()
        try:
            repo = TableRegistryRepository(sys_conn)
            if tables:
                # Manual operator override — bypass schedule filter entirely
                # so an admin saying "sync these specific tables now" wins.
                all_configs = [repo.get(t) for t in tables]
                table_configs = [c for c in all_configs if c is not None]
            else:
                table_configs = repo.list_local(source_type) if source_type else repo.list_local()
                # #79: drop tables whose sync_schedule says they are not due.
                # Tables without a schedule pass through (opt-in feature).
                state_repo = SyncStateRepository(sys_conn)
                table_configs = filter_due_tables(table_configs, state_repo)
        finally:
            sys_conn.close()

(Leave the auto-discovery block immediately after unchanged; it only fires when table_configs is empty after filtering, which is consistent with prior semantics.)

Step 2.4: Run the wiring test + the existing sync test — expect green

pytest tests/test_sync_filter.py tests/test_sync_manifest.py -v 2>&1 | tail -25

Expected: green. The manifest test exercises a different code path; if it regresses, the import probably broke something — re-verify the import block.

Step 2.5: Commit

git add app/api/sync.py tests/test_sync_filter.py
git commit -m "feat(sync): honor table_registry.sync_schedule at trigger time (#79)"

Task 3: Pydantic format validators for `sync_schedule`

Files:

Modify: app/api/admin.py — add field_validator on RegisterTableRequest.sync_schedule and UpdateTableRequest.sync_schedule.
Test: extend tests/test_admin_bq_register.py (or a sibling, depending on what the codebase calls the admin-register test file).

Why: Once #79 honours the field, malformed values become operator-visible bugs ("I set sync_schedule='hourly' but it never skips"). Reject at register/update time with a clear 422.

Step 3.1: Locate the right test file

grep -l "RegisterTableRequest\|/register-table" tests/ -r 2>/dev/null | head -3

Use whatever file matches. The plan continues on the assumption that tests/test_admin_bq_register.py exists; adapt the path if it does not.

Step 3.2: Write the failing tests

Append to tests/test_admin_bq_register.py:

# --- sync_schedule format validation (#79) ----------------------------------

import pytest
from pydantic import ValidationError

from app.api.admin import RegisterTableRequest, UpdateTableRequest


@pytest.mark.parametrize("schedule", [
    "every 15m",
    "every 1h",
    "daily 05:00",
    "daily 07:00,13:00,18:00",
    None,  # explicit None is allowed (no schedule = always sync)
])
def test_register_request_accepts_valid_sync_schedule(schedule):
    req = RegisterTableRequest(name="orders", sync_schedule=schedule)
    assert req.sync_schedule == schedule


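@pytest.mark.parametrize("schedule", [
    "hourly",
    "every 0m",
    "daily 25:00",
    "every 5x",
    # Whitespace-only input is deliberately absent: the validator normalises
    # it to None (no schedule) instead of rejecting it.
])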
def test_register_request_rejects_malformed_sync_schedule(schedule):
    with pytest.raises(ValidationError) as exc_info:
        RegisterTableRequest(name="orders", sync_schedule=schedule)
    assert "sync_schedule" in str(exc_info.value)


@pytest.mark.parametrize("schedule", [
    "every 30m",
    "daily 08:00",
    None,
])
def test_update_request_accepts_valid_sync_schedule(schedule):
    req = UpdateTableRequest(sync_schedule=schedule)
    assert req.sync_schedule == schedule


def test_update_request_rejects_malformed_sync_schedule():
    with pytest.raises(ValidationError):
        UpdateTableRequest(sync_schedule="weekly")

Step 3.3: Run — expect FAIL

pytest tests/test_admin_bq_register.py -v -k sync_schedule 2>&1 | tail -20

Expected: failures because malformed strings are accepted today.

Step 3.4: Add the validators to app/api/admin.py

In app/api/admin.py, add the import near the top (next to other src imports, around line 27):

from src.scheduler import is_valid_schedule

In the RegisterTableRequest class (line 644), add this validator alongside the existing ones:

    @field_validator("sync_schedule", mode="before")
    @classmethod
    def _validate_sync_schedule(cls, v):
        # None / "" / pure-whitespace → no schedule, accepted.
        # Any non-empty string must parse — otherwise it would be persisted
        # and silently ignored by the runtime evaluator.
        if v in (None, ""):
            return v
        if isinstance(v, str) and not v.strip():
            return None
        if not is_valid_schedule(v):
            raise ValueError(
                f"sync_schedule must be 'every Nm' / 'every Nh' / "
                f"'daily HH:MM[,HH:MM,...]', got {v!r}"
            )
        return v

In the UpdateTableRequest class (line 780), add the same validator. (Duplication is intentional — the two models have separate field declarations and Pydantic v2 validators don't inherit cleanly across unrelated BaseModel classes. DRY-ing into a mixin is overkill for two fields.)

Step 3.5: Run — expect green

pytest tests/test_admin_bq_register.py -v -k sync_schedule 2>&1 | tail -20

Expected: green.

Step 3.6: Commit

git add app/api/admin.py tests/test_admin_bq_register.py
git commit -m "feat(admin): validate sync_schedule format on register/update (#79)"

Task 4: Extend `ScriptRepository` with `claim_for_run` and `record_run_result`

Files:

Modify: src/repositories/notifications.py — add two methods to ScriptRepository.
Create: tests/test_run_due_scripts.py (will grow across Tasks 4–6).

Why: Concurrency is "skip if running". claim_for_run is the atomic UPDATE that flips a script from idle → running and returns whether the caller actually owns the slot. record_run_result writes the post-execution status.

Step 4.1: Write failing tests

Create tests/test_run_due_scripts.py:

"""Tests for the scheduled-script runner — repo claim/release primitives,
the run-due endpoint, and Pydantic validation on DeployScriptRequest."""

from datetime import datetime, timezone

import pytest

from src.db import get_system_db
from src.repositories.notifications import ScriptRepository


@pytest.fixture()
def conn(tmp_path, monkeypatch):
    """Fresh system.duckdb in a tmp dir — uses real schema, no mocks."""
    monkeypatch.setenv("DATA_DIR", str(tmp_path))
    state_dir = tmp_path / "state"
    state_dir.mkdir(parents=True, exist_ok=True)
    c = get_system_db()
    yield c
    c.close()


def _deploy(repo: ScriptRepository, script_id="s1", schedule="every 1h"):
    repo.deploy(id=script_id, name=script_id, owner="u1",
                schedule=schedule, source="print('hi')")


# ---------------- claim_for_run ---------------------------------------------

def test_claim_for_run_succeeds_when_idle(conn):
    repo = ScriptRepository(conn)
    _deploy(repo)
    assert repo.claim_for_run("s1") is True
    row = repo.get("s1")
    assert row["last_status"] == "running"
    assert row["last_run"] is not None


def test_claim_for_run_fails_when_already_running(conn):
    repo = ScriptRepository(conn)
    _deploy(repo)
    assert repo.claim_for_run("s1") is True
    # Second claim should fail because last_status is still 'running'.
    assert repo.claim_for_run("s1") is False


def test_claim_for_run_succeeds_after_completion(conn):
    repo = ScriptRepository(conn)
    _deploy(repo)
    repo.claim_for_run("s1")
    repo.record_run_result("s1", status="success")
    # Now claimable again.
    assert repo.claim_for_run("s1") is True


def test_claim_for_run_returns_false_for_unknown_script(conn):
    repo = ScriptRepository(conn)
    assert repo.claim_for_run("does-not-exist") is False


# ---------------- record_run_result -----------------------------------------

@pytest.mark.parametrize("status", ["success", "failure"])
def test_record_run_result_writes_terminal_status(conn, status):
    repo = ScriptRepository(conn)
    _deploy(repo)
    repo.claim_for_run("s1")
    repo.record_run_result("s1", status=status)
    row = repo.get("s1")
    assert row["last_status"] == status


def test_record_run_result_rejects_running_as_terminal(conn):
    """The 'running' string is reserved for claim_for_run; record_run_result
    must reject it so a caller can't accidentally re-arm the running flag
    instead of clearing it."""
    repo = ScriptRepository(conn)
    _deploy(repo)
    repo.claim_for_run("s1")
    with pytest.raises(ValueError):
        repo.record_run_result("s1", status="running")

Step 4.2: Run — expect FAIL

pytest tests/test_run_due_scripts.py -v 2>&1 | tail -25

Expected: AttributeError on claim_for_run / record_run_result.

Step 4.3: Add the methods to ScriptRepository

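In src/repositories/notifications.py, after the existing list_all method (around line 105), add the two methods below. They use datetime and timezone, so import both from the datetime module at the top of the file if they are not already imported: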

    def claim_for_run(self, script_id: str) -> bool:
        """Atomically set last_status='running' iff the script is idle.

        Returns True iff this caller is the new owner of the run slot.
        Returns False if the script does not exist OR is already running.

        Implementation: UPDATE … WHERE last_status IS NULL OR last_status !=
        'running', plus RETURNING id (DuckDB supports RETURNING on UPDATE).
        If zero rows come back, the script is unknown or somebody else
        already owns the slot.
        """
        now = datetime.now(timezone.utc)
        result = self.conn.execute(
            """UPDATE script_registry
               SET last_status = 'running', last_run = ?
               WHERE id = ?
                 AND (last_status IS NULL OR last_status != 'running')
               RETURNING id""",
            [now, script_id],
        ).fetchone()
        return result is not None

    def record_run_result(self, script_id: str, status: str) -> None:
        """Write the terminal status of a finished run (clears 'running').

        Accepts only 'success' or 'failure' — 'running' would re-arm the
        flag instead of clearing it, defeating the purpose of the call.
        """
        if status not in ("success", "failure"):
            raise ValueError(
                f"record_run_result: status must be 'success' or 'failure', "
                f"got {status!r}"
            )
        self.conn.execute(
            "UPDATE script_registry SET last_status = ? WHERE id = ?",
            [status, script_id],
        )

Step 4.4: Run — expect green

pytest tests/test_run_due_scripts.py -v 2>&1 | tail -25

Expected: green.
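The claim primitive is a conditional UPDATE whose affected-row count tells the caller whether it won the slot. The sketch below reproduces that idea with the standard-library `sqlite3` module standing in for DuckDB, and with `cursor.rowcount` standing in for `RETURNING id`. The two-column table is a simplified stand-in for `script_registry`, not its real schema.

```python
import sqlite3


def claim_for_run(conn: sqlite3.Connection, script_id: str) -> bool:
    """Flip an idle script to 'running'; report whether this caller won."""
    cur = conn.execute(
        "UPDATE script_registry SET last_status = 'running' "
        "WHERE id = ? AND (last_status IS NULL OR last_status != 'running')",
        (script_id,),
    )
    conn.commit()
    # One row changed means we own the slot; zero means unknown id or
    # a run that is already in progress.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE script_registry (id TEXT PRIMARY KEY, last_status TEXT)")
conn.execute("INSERT INTO script_registry (id) VALUES ('s1')")

first = claim_for_run(conn, "s1")       # idle, so the claim succeeds
second = claim_for_run(conn, "s1")      # still 'running', so it is refused
missing = claim_for_run(conn, "nope")   # unknown id, nothing to claim
```

Because the check and the write happen in one statement, there is no window between "read status" and "set running" in which a second tick could slip through.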

Step 4.5: Commit

git add src/repositories/notifications.py tests/test_run_due_scripts.py
git commit -m "feat(scripts): add claim_for_run + record_run_result to ScriptRepository (#78)"

Task 5: `POST /api/scripts/run-due` endpoint

Files:

Modify: app/api/scripts.py — new endpoint + Pydantic validator on DeployScriptRequest.schedule.
Test: extend tests/test_run_due_scripts.py.

Why: This is the API surface the sidecar fires on every tick. It iterates script_registry, claims each due script, and queues execution in a BackgroundTask so the response returns immediately — the sidecar doesn't block on long-running scripts.

Step 5.1: Write failing tests

Append to tests/test_run_due_scripts.py:

# ---------------- DeployScriptRequest.schedule validation -------------------

from pydantic import ValidationError

from app.api.scripts import DeployScriptRequest


def test_deploy_request_accepts_valid_schedule():
    req = DeployScriptRequest(name="report", source="print(1)", schedule="every 1h")
    assert req.schedule == "every 1h"


def test_deploy_request_accepts_no_schedule():
    req = DeployScriptRequest(name="report", source="print(1)")
    assert req.schedule is None


def test_deploy_request_rejects_malformed_schedule():
    with pytest.raises(ValidationError):
        DeployScriptRequest(name="report", source="print(1)", schedule="weekly")


# ---------------- /api/scripts/run-due endpoint -----------------------------

from fastapi.testclient import TestClient

# Helper: mint a TestClient with admin auth bypass. The codebase uses
# `LOCAL_DEV_MODE=1` to short-circuit auth in tests; mirror existing test
# files (tests/test_scripts_api.py) for the canonical pattern.

@pytest.fixture()
def client(monkeypatch, tmp_path):
    monkeypatch.setenv("LOCAL_DEV_MODE", "1")
    monkeypatch.setenv("DATA_DIR", str(tmp_path))
    (tmp_path / "state").mkdir(parents=True, exist_ok=True)
    from app.main import app
    return TestClient(app)


def test_run_due_skips_scripts_without_schedule(client, monkeypatch):
    """A script with schedule=NULL is never picked up by run-due (those
    are run only via explicit POST /api/scripts/{id}/run)."""
    monkeypatch.setattr(
        "app.api.scripts._execute_script",
        lambda src, name: {"name": name, "exit_code": 0, "stdout": "", "stderr": "", "truncated": False},
    )
    deploy = client.post(
        "/api/scripts/deploy",
        json={"name": "manual-only", "source": "print(1)"},
    )
    assert deploy.status_code == 201
    resp = client.post("/api/scripts/run-due")
    assert resp.status_code == 200
    assert resp.json()["claimed"] == []


def test_run_due_claims_due_scripts(client, monkeypatch):
    """A script on 'every 1h' that has never run gets claimed and executed."""
    calls = []
    def _fake_exec(source, name):
        calls.append(name)
        return {"name": name, "exit_code": 0, "stdout": "", "stderr": "", "truncated": False}
    monkeypatch.setattr("app.api.scripts._execute_script", _fake_exec)
    deploy = client.post(
        "/api/scripts/deploy",
        json={"name": "report", "source": "print(1)", "schedule": "every 1h"},
    )
    script_id = deploy.json()["id"]
    resp = client.post("/api/scripts/run-due")
    assert resp.status_code == 200
    body = resp.json()
    assert body["claimed"] == [script_id]
    # TestClient runs BackgroundTasks to completion before returning the
    # response, so the call has happened by now.
    assert "report" in calls


def test_run_due_skips_scripts_already_running(client, monkeypatch):
    """A script in 'running' state must not be re-claimed by a second
    sidecar tick that arrives while the previous run is still going."""
    monkeypatch.setattr(
        "app.api.scripts._execute_script",
        # Simulate a slow run by NOT updating last_status — repo.claim_for_run
        # already wrote 'running'; we leave it that way.
        lambda src, name: {"name": name, "exit_code": 0, "stdout": "", "stderr": "", "truncated": False},
    )
    # Patch out record_run_result so the run never "completes".
    monkeypatch.setattr(
        "src.repositories.notifications.ScriptRepository.record_run_result",
        lambda self, *a, **kw: None,
    )
    deploy = client.post(
        "/api/scripts/deploy",
        json={"name": "long", "source": "print(1)", "schedule": "every 1h"},
    )
    script_id = deploy.json()["id"]
    first = client.post("/api/scripts/run-due")
    assert first.json()["claimed"] == [script_id]
    second = client.post("/api/scripts/run-due")
    assert second.json()["claimed"] == []

Step 5.2: Run — expect FAIL

pytest tests/test_run_due_scripts.py -v -k "deploy_request or run_due" 2>&1 | tail -25

Expected: test_deploy_request_rejects_malformed_schedule fails with "DID NOT RAISE" (the validator doesn't exist yet); the endpoint tests fail with 404/405 (the route doesn't exist yet).

Step 5.3: Add the validator and endpoint

In app/api/scripts.py, near the top:

from datetime import datetime, timezone
from fastapi import BackgroundTasks
from pydantic import field_validator

from src.scheduler import is_valid_schedule, is_table_due

Replace the existing DeployScriptRequest (lines 24-27) with:

class DeployScriptRequest(BaseModel):
    name: str
    source: str
    schedule: Optional[str] = None

    @field_validator("schedule", mode="before")
    @classmethod
    def _validate_schedule(cls, v):
        if v in (None, ""):
            return None
        if isinstance(v, str) and not v.strip():
            return None
        if not is_valid_schedule(v):
            raise ValueError(
                f"schedule must be 'every Nm' / 'every Nh' / "
                f"'daily HH:MM[,HH:MM,...]', got {v!r}"
            )
        return v
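
For orientation, the three accepted forms named in the error message can be checked with two regular expressions plus a range check on the clock times. This is a hypothetical sketch only: the real is_valid_schedule lives in src/scheduler.py (Task 1) and may differ in details, for example in whether it accepts `every 0m`. The name is_valid_schedule_sketch is made up for this example.

```python
import re

# 'every Nm' / 'every Nh' with N >= 1 (rejecting N = 0 is an assumption).
_EVERY = re.compile(r"every ([1-9]\d*)([mh])")
# 'daily HH:MM' followed by zero or more ',HH:MM' entries.
_DAILY = re.compile(r"daily \d{2}:\d{2}(,\d{2}:\d{2})*")


def is_valid_schedule_sketch(value) -> bool:
    """Return True when value matches one of the documented schedule forms."""
    if not isinstance(value, str):
        return False
    text = value.strip()
    if _EVERY.fullmatch(text):
        return True
    if not _DAILY.fullmatch(text):
        return False
    # The regex only checks the shape; the hour and minute ranges are
    # checked here so that 'daily 25:00' is rejected.
    for hhmm in text[len("daily "):].split(","):
        hours, minutes = hhmm.split(":")
        if int(hours) > 23 or int(minutes) > 59:
            return False
    return True
```

With a check of this shape, the validator above turns "weekly" or "daily 25:00" into a Pydantic validation error, while None and blank strings never reach it because they are normalized to None first.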

Add the endpoint at the end of the route definitions (after undeploy_script):

@router.post("/run-due")
async def run_due_scripts(
    background_tasks: BackgroundTasks,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Run every deployed script whose ``schedule`` says it is due.

    Iterates ``script_registry``, skips rows without a schedule (those run
    only via explicit POST /{id}/run), evaluates ``is_table_due(schedule,
    last_run)``, and atomically claims each due row via
    ``ScriptRepository.claim_for_run``. Execution is queued as a
    ``BackgroundTask`` so the response returns immediately — the sidecar
    must not block waiting on a long-running script.

    Concurrency: ``claim_for_run`` flips ``last_status`` to ``'running'``
    inside the same UPDATE; a script already in that state is skipped on
    subsequent ticks until the BackgroundTask writes a terminal status via
    ``record_run_result``. There is no max-runtime detection in this PR —
    if a BackgroundTask crashes without writing a terminal status, the
    script stays stuck in ``'running'`` until an operator clears it
    manually (``UPDATE script_registry SET last_status = NULL WHERE id =
    ?``). Documenting this as an accepted v0 limitation; revisit if it
    bites in practice.
    """
    repo = ScriptRepository(conn)
    claimed: list[str] = []
    for script in repo.list_all():
        schedule = script.get("schedule")
        if not schedule:
            continue
        last_run = script.get("last_run")
        last_run_iso = last_run.isoformat() if last_run else None
        if not is_table_due(schedule, last_run_iso):
            continue
        if not repo.claim_for_run(script["id"]):
            # Lost the race / already running — next tick will retry.
            continue
        claimed.append(script["id"])
        background_tasks.add_task(
            _run_claimed_script,
            script_id=script["id"],
            source=script["source"],
            name=script["name"],
        )
    return {"claimed": claimed, "count": len(claimed)}


def _run_claimed_script(script_id: str, source: str, name: str) -> None:
    """Execute a previously-claimed script and write the terminal status.

    Runs in a FastAPI BackgroundTask, so it owns its own DB connection
    (the request-scoped conn is already gone by the time this fires).
    Any exception writes 'failure' and re-raises so the BG handler still
    surfaces the traceback in logs.
    """
    from src.db import get_system_db
    bg_conn = get_system_db()
    try:
        bg_repo = ScriptRepository(bg_conn)
        try:
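            result = _execute_script(source, name)
            # A non-zero exit code is a failed run even though nothing raised.
            exit_code = result.get("exit_code", 0) if isinstance(result, dict) else 0
            bg_repo.record_run_result(
                script_id, status="success" if exit_code == 0 else "failure"
            )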
        except Exception:
            bg_repo.record_run_result(script_id, status="failure")
            raise
    finally:
        bg_conn.close()

Step 5.4: Run — expect green

pytest tests/test_run_due_scripts.py -v 2>&1 | tail -25

Expected: green. If the LOCAL_DEV_MODE auth bypass test fixture doesn't quite work in your repo, mirror whatever pattern tests/test_scripts_api.py uses for the same client.

Step 5.5: Commit

git add app/api/scripts.py tests/test_run_due_scripts.py
git commit -m "feat(scripts): POST /api/scripts/run-due + format validator (#78)"

Task 6: Env-driven sidecar JOBS + add `script-runner` job

Files:

Modify: services/scheduler/__main__.py — replace hardcoded JOBS list with an env-driven builder; add 4th job for scripts.
Create: tests/test_scheduler_sidecar.py — small unit tests on the new builder.

Why now: With the API surface in place (Tasks 2 + 5), the sidecar is the operational glue. Refactor to env-driven config (#77) and add the script tick (#78) in one pass — the touched lines overlap.

Step 6.1: Write failing tests

Create tests/test_scheduler_sidecar.py:

"""Unit tests for the env-driven JOBS builder in services.scheduler."""

import pytest


def test_build_jobs_uses_documented_defaults(monkeypatch):
    """No env overrides → default cadences."""
    for v in (
        "SCHEDULER_DATA_REFRESH_INTERVAL",
        "SCHEDULER_HEALTH_CHECK_INTERVAL",
        "SCHEDULER_TICK_SECONDS",
        "SCHEDULER_SCRIPT_RUN_INTERVAL",
    ):
        monkeypatch.delenv(v, raising=False)
    from services.scheduler.__main__ import build_jobs, resolved_tick_seconds
    jobs = {name: schedule for name, schedule, *_ in build_jobs()}
    assert jobs["data-refresh"]    == "every 15m"
    assert jobs["health-check"]    == "every 5m"
    assert jobs["script-runner"]   == "every 1m"
    assert jobs["marketplaces"]    == "daily 03:00"
    assert resolved_tick_seconds() == 30


def test_build_jobs_honors_env_overrides(monkeypatch):
    monkeypatch.setenv("SCHEDULER_DATA_REFRESH_INTERVAL", "1800")  # 30m
    monkeypatch.setenv("SCHEDULER_HEALTH_CHECK_INTERVAL", "60")    # 1m
    monkeypatch.setenv("SCHEDULER_SCRIPT_RUN_INTERVAL", "120")     # 2m
    monkeypatch.setenv("SCHEDULER_TICK_SECONDS", "10")
    from services.scheduler.__main__ import build_jobs, resolved_tick_seconds
    jobs = {name: schedule for name, schedule, *_ in build_jobs()}
    assert jobs["data-refresh"]  == "every 30m"
    assert jobs["health-check"]  == "every 1m"
    assert jobs["script-runner"] == "every 2m"
    assert resolved_tick_seconds() == 10


@pytest.mark.parametrize("var", [
    "SCHEDULER_DATA_REFRESH_INTERVAL",
    "SCHEDULER_HEALTH_CHECK_INTERVAL",
    "SCHEDULER_TICK_SECONDS",
    "SCHEDULER_SCRIPT_RUN_INTERVAL",
])
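# "" is deliberately absent: _read_positive_int treats an empty value as
# unset and falls back to the default instead of raising.
@pytest.mark.parametrize("bad", ["0", "-5", "abc"])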
def test_build_jobs_rejects_invalid_env(monkeypatch, var, bad):
    monkeypatch.setenv(var, bad)
    from services.scheduler.__main__ import build_jobs
    with pytest.raises(ValueError):
        build_jobs()


def test_build_jobs_rejects_tick_larger_than_smallest_interval(monkeypatch):
    """Tick must be <= the smallest job interval, otherwise jobs would
    consistently miss their cadence by up to one tick."""
    monkeypatch.setenv("SCHEDULER_HEALTH_CHECK_INTERVAL", "60")
    monkeypatch.setenv("SCHEDULER_TICK_SECONDS", "120")
    from services.scheduler.__main__ import build_jobs
    with pytest.raises(ValueError, match="tick"):
        build_jobs()


def test_build_jobs_includes_run_due_endpoint():
    """The script-runner job must POST to /api/scripts/run-due."""
    from services.scheduler.__main__ import build_jobs
    target = next(j for j in build_jobs() if j[0] == "script-runner")
    name, schedule, endpoint, method, _timeout = target
    assert endpoint == "/api/scripts/run-due"
    assert method == "POST"

Step 6.2: Run — expect FAIL

pytest tests/test_scheduler_sidecar.py -v 2>&1 | tail -20

Expected: ImportError on build_jobs / resolved_tick_seconds; or KeyError on script-runner (current JOBS list doesn't have it).

Step 6.3: Refactor services/scheduler/__main__.py

Replace the hardcoded JOBS block (lines 72-89) and the run() function with the following. Keep everything above line 72 (imports, _get_auth_token, etc.) unchanged.

# --- Env parsing ------------------------------------------------------------

_DEFAULTS = {
    "SCHEDULER_DATA_REFRESH_INTERVAL": 15 * 60,   # seconds
    "SCHEDULER_HEALTH_CHECK_INTERVAL": 5 * 60,
    "SCHEDULER_SCRIPT_RUN_INTERVAL":   1 * 60,
    "SCHEDULER_TICK_SECONDS":          30,
}


def _read_positive_int(name: str) -> int:
    """Read an env var as a positive integer or fall back to the default."""
    raw = os.environ.get(name)
    if raw is None or raw == "":
        if name not in _DEFAULTS:
            raise ValueError(f"Unknown scheduler env var: {name}")
        return _DEFAULTS[name]
    try:
        value = int(raw)
    except (TypeError, ValueError):
        raise ValueError(f"{name}={raw!r} must be a positive integer (seconds)") from None
    if value <= 0:
        raise ValueError(f"{name}={value} must be > 0 (seconds)")
    return value


def _seconds_to_schedule(seconds: int) -> str:
    """Convert seconds to 'every Nh' (exact hours) or 'every Nm' (rounded down, min 1m)."""
    if seconds % 3600 == 0 and seconds >= 3600:
        return f"every {seconds // 3600}h"
    minutes = max(1, seconds // 60)
    return f"every {minutes}m"


def resolved_tick_seconds() -> int:
    """Read + validate SCHEDULER_TICK_SECONDS in isolation (test helper)."""
    return _read_positive_int("SCHEDULER_TICK_SECONDS")


def build_jobs() -> list[tuple[str, str, str, str, int]]:
    """Build the JOBS list from env, applying defaults and validation.

    Tuple shape: (name, schedule_string, endpoint, method, http_timeout_sec).
    Marketplaces stays hardcoded — promoting it to env is out of #77 scope.
    """
    refresh = _read_positive_int("SCHEDULER_DATA_REFRESH_INTERVAL")
    health  = _read_positive_int("SCHEDULER_HEALTH_CHECK_INTERVAL")
    scripts = _read_positive_int("SCHEDULER_SCRIPT_RUN_INTERVAL")
    tick    = _read_positive_int("SCHEDULER_TICK_SECONDS")
    smallest = min(refresh, health, scripts)
    if tick > smallest:
        raise ValueError(
            f"SCHEDULER_TICK_SECONDS={tick} must be <= the smallest job "
            f"interval ({smallest}s) so jobs don't consistently miss their "
            f"cadence by up to one tick"
        )
    return [
        ("data-refresh",  _seconds_to_schedule(refresh), "/api/sync/trigger",      "POST", 120),
        ("health-check",  _seconds_to_schedule(health),  "/api/health",            "GET",   30),
        ("script-runner", _seconds_to_schedule(scripts), "/api/scripts/run-due",   "POST", 600),
        ("marketplaces",  "daily 03:00",                 "/api/marketplaces/sync-all", "POST", 900),
    ]


_running = True


def _signal_handler(sig, frame):
    global _running
    logger.info(f"Received signal {sig}, shutting down...")
    _running = False


def _call_api(endpoint: str, method: str, timeout_sec: int) -> bool:
    """Call the main app API. Returns True on success."""
    url = f"{API_URL}{endpoint}"
    headers = {}
    token = _get_auth_token()
    if token:
        headers["Authorization"] = f"Bearer {token}"
    try:
        if method == "POST":
            resp = httpx.post(url, headers=headers, timeout=timeout_sec)
        else:
            resp = httpx.get(url, headers=headers, timeout=timeout_sec)
        if resp.status_code < 400:
            logger.info(f"Job {endpoint}: {resp.status_code}")
            return True
        else:
            logger.warning(f"Job {endpoint}: HTTP {resp.status_code} - {resp.text[:200]}")
            return False
    except Exception as e:
        logger.error(f"Job {endpoint} failed: {e}")
        return False


def run():
    signal.signal(signal.SIGTERM, _signal_handler)
    signal.signal(signal.SIGINT, _signal_handler)

    jobs = build_jobs()
    tick = resolved_tick_seconds()
    logger.info(
        "Scheduler started. API_URL=%s, %d jobs, tick=%ds. Schedules: %s",
        API_URL, len(jobs), tick,
        {name: schedule for name, schedule, *_ in jobs},
    )

    last_run: dict[str, str | None] = {name: None for name, *_ in jobs}

    while _running:
        now_iso = datetime.now(timezone.utc).isoformat()
        for name, schedule, endpoint, method, timeout_sec in jobs:
            if not is_table_due(schedule, last_run[name]):
                continue
            logger.info("Running job: %s (%s)", name, schedule)
            ok = _call_api(endpoint, method, timeout_sec)
            if ok:
                last_run[name] = now_iso
        time.sleep(tick)

    logger.info("Scheduler stopped.")


if __name__ == "__main__":
    run()

(Delete the old JOBS = [...] literal and the old run() body — they're fully replaced.)
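
One consequence of the seconds-to-schedule conversion is worth seeing with concrete numbers: values that are not a whole number of minutes are rounded down, and anything under a minute becomes `every 1m`. The snippet below repeats the plan's _seconds_to_schedule verbatim so it can run standalone; the sample inputs are illustrative only.

```python
def _seconds_to_schedule(seconds: int) -> str:
    """Same logic as the helper in services/scheduler/__main__.py above."""
    if seconds % 3600 == 0 and seconds >= 3600:
        return f"every {seconds // 3600}h"
    minutes = max(1, seconds // 60)
    return f"every {minutes}m"


# Illustrative inputs: exact hours, exact minutes, and two lossy cases.
samples = {s: _seconds_to_schedule(s) for s in (7200, 1800, 5400, 90, 30)}
for seconds, schedule in samples.items():
    print(f"{seconds:>5}s -> {schedule}")
```

An operator who sets SCHEDULER_SCRIPT_RUN_INTERVAL=90 therefore gets a 60-second cadence, not a 90-second one. Whether that deserves a warning or a stricter validation rule is a design choice the plan leaves open.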

Step 6.4: Run — expect green

pytest tests/test_scheduler_sidecar.py -v 2>&1 | tail -20

Expected: green.
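
To see why build_jobs rejects a tick larger than the smallest job interval, it helps to simulate the polling loop. The sketch below is a simplification: it assumes a job is due once at least `interval` seconds have passed since its last run, which approximates what is_table_due does for `every N` schedules, and fire_times is a hypothetical helper that exists only in this example.

```python
def fire_times(interval: int, tick: int, horizon: int) -> list[int]:
    """Return the loop timestamps (seconds) at which the job would fire.

    The loop wakes every `tick` seconds and fires the job when it has
    never run or when `interval` seconds have elapsed since the last run.
    """
    fired = []
    last = None
    now = 0
    while now <= horizon:
        if last is None or now - last >= interval:
            fired.append(now)
            last = now
        now += tick
    return fired


on_time = fire_times(interval=60, tick=30, horizon=240)
too_slow = fire_times(interval=60, tick=120, horizon=240)
print("tick=30: ", on_time)
print("tick=120:", too_slow)
```

With a 30-second tick the 60-second job fires at every interval boundary; with a 120-second tick it can only fire when the loop wakes up, so its effective cadence doubles. That is the drift the validation error guards against.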

Step 6.5: Commit

git add services/scheduler/__main__.py tests/test_scheduler_sidecar.py
git commit -m "feat(scheduler): env-driven JOBS + script-runner tick (#77, #78)"

Task 7: OpenMetadata client — TLS verify by default

Files:

Modify: connectors/openmetadata/client.py
Test: create tests/test_openmetadata_client.py, or extend the connector's existing test file if one exists (search first).

Why: verify=False ships JWT bearer tokens over an unauthenticated channel; the module-level warnings.filterwarnings mutates global state. Mirror the pattern in connectors/llm/openai_compat.py which already gets this right.
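
The global-state point can be demonstrated with the standard library alone. The sketch below is illustrative and not part of the fix (the plan simply deletes the filter): it shows that a filter installed inside `warnings.catch_warnings()` is rolled back on exit, whereas the same call at module level would stay in `warnings.filters` for the rest of the process.

```python
import warnings

before = list(warnings.filters)

with warnings.catch_warnings():
    # Scoped: this filter exists only until the with-block exits.
    warnings.filterwarnings("ignore", message="Unverified HTTPS request")
    inside = list(warnings.filters)

after = list(warnings.filters)

# Filters present inside the block that were not there beforehand.
added_inside = [f for f in inside if f not in before]
print("filters added inside the block:", len(added_inside))
print("filter list restored on exit:", after == before)
```

If a caller ever does need to silence the warning for a single request against a known self-signed host, scoping it this way keeps the suppression from affecting unrelated clients.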

Step 7.1: Locate existing OpenMetadata test file (if any)

ls tests/ | grep -i openmetadata

If empty, create tests/test_openmetadata_client.py. If a file exists, extend it.

Step 7.2: Write failing tests

Create (or extend) tests/test_openmetadata_client.py:

"""Tests for OpenMetadataClient TLS handling — see #89.

The previous version disabled TLS verification globally and suppressed the
"Unverified HTTPS request" warning at import time. Both behaviors are
fixed here.
"""

import warnings
from unittest.mock import patch


def test_client_verifies_tls_by_default():
    from connectors.openmetadata.client import OpenMetadataClient
    with patch("connectors.openmetadata.client.httpx.Client") as mock_client:
        OpenMetadataClient(base_url="https://catalog.example.com", token="t")
    kwargs = mock_client.call_args.kwargs
    assert kwargs["verify"] is True


def test_client_accepts_explicit_verify_false():
    """Operators on internal CAs may opt out — but it must be explicit."""
    from connectors.openmetadata.client import OpenMetadataClient
    with patch("connectors.openmetadata.client.httpx.Client") as mock_client:
        OpenMetadataClient(base_url="https://catalog.example.com", token="t", verify=False)
    assert mock_client.call_args.kwargs["verify"] is False


def test_client_accepts_custom_ca_bundle_path():
    from connectors.openmetadata.client import OpenMetadataClient
    with patch("connectors.openmetadata.client.httpx.Client") as mock_client:
        OpenMetadataClient(
            base_url="https://catalog.example.com",
            token="t",
            verify="/etc/ssl/certs/internal-ca.pem",
        )
    assert mock_client.call_args.kwargs["verify"] == "/etc/ssl/certs/internal-ca.pem"


def test_module_import_does_not_mutate_global_warnings_filter():
    """The previous version called warnings.filterwarnings('ignore', ...)
    at import time, which suppresses the "Unverified HTTPS request"
    warning process-wide, not just for OpenMetadata's client. Drop it."""
    import importlib
    pre_filters = list(warnings.filters)
    import connectors.openmetadata.client as om
    importlib.reload(om)
    post_filters = list(warnings.filters)
    # No new "ignore Unverified HTTPS request" filter should have been added.
    new = [f for f in post_filters if f not in pre_filters]
    for action, message, *_ in new:
        if message is not None:
            assert "Unverified HTTPS request" not in message.pattern

Step 7.3: Run — expect FAIL

pytest tests/test_openmetadata_client.py -v 2>&1 | tail -20

Expected: failures — verify=False is hardcoded, and the module-level warnings.filterwarnings runs at import.

Step 7.4: Fix the client

In connectors/openmetadata/client.py:

Delete line 14 (import warnings) and lines 18-19 (the warnings.filterwarnings(...) call and its comment).

Replace the __init__ method (lines 34-59) with:

    def __init__(
        self,
        base_url: str,
        token: str,
        timeout: int = 30,
        verify: bool | str = True,
    ):
        """
        Initialize OpenMetadata API client.

        Args:
            base_url: Base URL of OpenMetadata instance (e.g., "https://catalog.example.com")
            token: JWT bearer token for authentication
            timeout: HTTP request timeout in seconds
            verify: TLS verification — True (default), False to disable
                (e.g., for self-signed certificates on internal CAs), or a
                path to a CA bundle. The previous version hardcoded False
                globally and suppressed warnings — both removed in #89.
                Operators with self-signed certs should pass an explicit
                ``verify=False`` or a CA bundle path from their config.
        """
        self.base_url = base_url.rstrip("/")
        self.token = token
        self.timeout = timeout
        self._client = httpx.Client(
            base_url=self.base_url,
            headers={
                "Authorization": f"Bearer {token}",
                "Content-Type": "application/json",
            },
            timeout=timeout,
            verify=verify,
        )

Step 7.5: Run — expect green

pytest tests/test_openmetadata_client.py -v 2>&1 | tail -20

Expected: green.

Step 7.6: Audit existing call sites

grep -rn "OpenMetadataClient(" --include="*.py" .

Any call site that previously relied on the implicit verify=False will now hit a TLS error if it talks to a self-signed instance. Update each call site to pass verify= explicitly from the config (e.g., reading OPENMETADATA_VERIFY_SSL from instance.yaml or env). If no internal config flag exists yet, add one to instance.yaml.example and surface it in config/loader.py so operators have a tuning knob. List every changed call site in the commit message.
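
One possible shape for the config-to-`verify` mapping is a small parser that turns a string setting into the `bool | str` value the client accepts. This is a hedged sketch: the function name resolve_verify, the accepted true/false spellings, and the use of OPENMETADATA_VERIFY_SSL as an environment variable are assumptions made for illustration, not existing project code.

```python
import os
from typing import Optional, Union


def resolve_verify(raw: Optional[str]) -> Union[bool, str]:
    """Map a config string to a value suitable for the client's verify=.

    Hypothetical helper. Unset or blank means "verify" (the safe default),
    common boolean spellings map to True/False, and anything else is
    treated as a path to a CA bundle.
    """
    if raw is None or raw.strip() == "":
        return True
    value = raw.strip()
    lowered = value.lower()
    if lowered in ("1", "true", "yes", "on"):
        return True
    if lowered in ("0", "false", "no", "off"):
        return False
    return value


# Assumed wiring at a call site: read the setting once and pass it through.
verify = resolve_verify(os.environ.get("OPENMETADATA_VERIFY_SSL"))
print("verify =", verify)
```

Defaulting to True when the setting is missing keeps the secure behavior as the fallback, so disabling verification always requires an explicit operator decision.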

Step 7.7: Commit

git add connectors/openmetadata/client.py tests/test_openmetadata_client.py [...any caller files updated in 7.6]
git commit -m "fix(openmetadata): verify TLS by default; drop module-level warning filter (#89)"

Task 8: Documentation + `.env.template` updates

Files:

Modify: config/.env.template
Modify: docs/DEPLOYMENT.md

Why: The new env vars are operator-facing surface — they need to be discoverable without spelunking source.

Step 8.1: Add the env vars to config/.env.template

Append a new section to config/.env.template:

# ── SCHEDULER (sidecar tuning) ──────────────────────
# All values are in seconds and must be positive integers. SCHEDULER_TICK_SECONDS
# must be <= the smallest job interval below.
# SCHEDULER_DATA_REFRESH_INTERVAL=900   # default 15 min — POST /api/sync/trigger
# SCHEDULER_HEALTH_CHECK_INTERVAL=300   # default 5 min  — GET  /api/health
# SCHEDULER_SCRIPT_RUN_INTERVAL=60      # default 1 min  — POST /api/scripts/run-due
# SCHEDULER_TICK_SECONDS=30             # default 30 s   — loop polling cadence

Step 8.2: Add a "Scheduler tuning" subsection to docs/DEPLOYMENT.md

Find the most appropriate location (probably near the existing TLS / Docker compose section) and insert:

### Scheduler tuning

The scheduler sidecar (`services/scheduler/__main__.py`) fires periodic
HTTP calls against the main app. Job cadences are configurable via env
vars on the scheduler container:

| Env var                            | Default | Purpose                                       |
| ---------------------------------- | ------- | --------------------------------------------- |
| `SCHEDULER_DATA_REFRESH_INTERVAL`  | `900`   | seconds between `POST /api/sync/trigger`      |
| `SCHEDULER_HEALTH_CHECK_INTERVAL`  | `300`   | seconds between `GET /api/health`             |
| `SCHEDULER_SCRIPT_RUN_INTERVAL`    | `60`    | seconds between `POST /api/scripts/run-due`   |
| `SCHEDULER_TICK_SECONDS`           | `30`    | loop polling cadence; must be ≤ smallest interval above |

`/api/sync/trigger` walks `table_registry`; tables with a per-row
`sync_schedule` (`every Nm` / `every Nh` / `daily HH:MM[,...]`) are
synced only when that schedule says they are due since their last sync.
Tables without a schedule continue to sync on every trigger call. The
marketplace job runs at `daily 03:00` UTC and is not currently env-tunable.

`/api/scripts/run-due` walks `script_registry` and runs each deployed
script whose `schedule` says it is due. Scripts in the `running` state
are skipped on subsequent ticks until the previous run writes a terminal
status. The endpoint requires admin auth (the sidecar's
`SCHEDULER_API_TOKEN` resolves to a synthetic Admin user).

Step 8.3: Commit

git add config/.env.template docs/DEPLOYMENT.md
git commit -m "docs: document scheduler env vars + per-table/script schedules (#77, #78, #79)"

Task 9: CHANGELOG entries + release cut

Files:

Modify: CHANGELOG.md
Modify: pyproject.toml

Why: Per CLAUDE.md, every user-visible change requires a CHANGELOG entry in the same PR. This is one bundled PR covering four issues; release-cut goes here as the last commit (per user convention: release-cut belongs in the next behavior-change PR, not a standalone one).

Step 9.1: Edit CHANGELOG.md

Replace the current top-of-file structure:

## [Unreleased]

## [0.18.0] — 2026-04-29
...

with:

## [Unreleased]

## [0.19.0] — 2026-04-29

### Added
- `table_registry.sync_schedule` is now honored at runtime. `POST /api/sync/trigger` (called by the scheduler sidecar every 15 min by default) skips local tables whose schedule says they are not due. Tables without a schedule continue to sync on every trigger (the schedule is opt-in). Manual `POST /api/sync/trigger {"tables":[...]}` bypasses the schedule filter; operator override always wins. (#79)
- `script_registry.schedule` is now honored at runtime via the new endpoint `POST /api/scripts/run-due` (admin-only). The scheduler sidecar fires this every 60 s by default. Each due script is claimed atomically (`last_status='running'`), executed in a BackgroundTask, and the outcome written to `last_run` / `last_status`. Scripts already in `running` state are skipped — no concurrent runs of the same script. (#78)
- Four new env vars on the scheduler sidecar: `SCHEDULER_DATA_REFRESH_INTERVAL`, `SCHEDULER_HEALTH_CHECK_INTERVAL`, `SCHEDULER_SCRIPT_RUN_INTERVAL`, `SCHEDULER_TICK_SECONDS`. All accept positive integers (seconds); tick must be ≤ smallest job interval. Documented in `docs/DEPLOYMENT.md` → Scheduler tuning. (#77)
- `RegisterTableRequest.sync_schedule`, `UpdateTableRequest.sync_schedule`, and `DeployScriptRequest.schedule` now reject malformed strings with a Pydantic 422 (e.g. `"hourly"`, `"daily 25:00"`). The accepted forms are unchanged: `every Nm`, `every Nh`, `daily HH:MM[,HH:MM,...]`. (#78, #79)

### Changed
- `OpenMetadataClient` now defaults to `verify=True` for TLS. The previous version hardcoded `verify=False` and suppressed the "Unverified HTTPS request" warning at import time (a process-wide filter that also hid the warning for unrelated HTTP clients). Operators on internal CAs must pass `verify=False` or a CA bundle path explicitly. **Existing deployments on self-signed certificates without an explicit opt-out will start failing TLS verification. Set `verify=False` at the call site, or supply a CA bundle, before upgrading.** (#89)

### Internal
- `src/scheduler.py` now exports `is_valid_schedule(s)` and `filter_due_tables(configs, sync_state_repo)` for reuse across the sync filter, the script runner, and Pydantic validators.
- `ScriptRepository` gains `claim_for_run(script_id)` and `record_run_result(script_id, status)` — the atomic primitives for the scheduled-script execution path.

Step 9.2: Bump version

In pyproject.toml, change:

version = "0.18.0"

to:

version = "0.19.0"

Step 9.3: Commit + tag (tag pushed by maintainer post-merge)

git add CHANGELOG.md pyproject.toml
git commit -m "chore(release): cut 0.19.0 — scheduler re-wire + OpenMetadata TLS"

(Do NOT push a v0.19.0 git tag from the worktree. Per the user's convention, the tag is created on the merge commit on main and a GitHub Release is opened to mirror it.)

Task 10: Final verification

Files: none modified.

Step 10.1: Run the full test suite

pytest tests/ -x 2>&1 | tail -40

Expected: green. If any unrelated test fails, investigate before declaring done — possible interaction with the import-order changes in app/api/sync.py or the new field_validator in app/api/admin.py.

Step 10.2: Smoke-test the import surface

python -c "from app.main import app; from services.scheduler.__main__ import build_jobs; print('jobs:', [j[0] for j in build_jobs()])"

Expected output: jobs: ['data-refresh', 'health-check', 'script-runner', 'marketplaces']. Any ImportError indicates a broken or missing import introduced by this PR.

Step 10.3: Open the PR

git push -u origin worktree-issues-68-77-78-79-89
gh pr create --title "feat(scheduler): honor sync_schedule + script schedule; tune via env; OpenMetadata TLS" --body "$(cat <<'EOF'
## Summary

Bundles four scheduler / security issues:

- **#79** — `table_registry.sync_schedule` is now honored at runtime via an API-side filter inside `_run_sync()`. Tables without a schedule continue to sync on every tick; manual `POST /api/sync/trigger {"tables":[...]}` bypasses the filter.
- **#78** — New endpoint `POST /api/scripts/run-due` runs deployed scripts whose `schedule` says they are due. Atomic claim via `last_status='running'`; results written via BackgroundTask.
- **#77** — Sidecar JOBS list is now built from env (`SCHEDULER_*_INTERVAL`, `SCHEDULER_TICK_SECONDS`). Validation: positive ints, tick ≤ smallest interval. Adds a 4th `script-runner` job for #78.
- **#89** — `OpenMetadataClient` defaults to `verify=True`. Module-level `warnings.filterwarnings` removed.

Issue **#68** is intentionally NOT in scope — the referenced Stop hook script does not live in this OSS repo as of HEAD; the issue needs clarification before implementation.

## Test plan

- [ ] `pytest tests/` passes
- [ ] Manual: register a table with `sync_schedule="every 1h"`, sync it, then trigger sync within the hour — confirm log line `Table X skipped: schedule=...`
- [ ] Manual: deploy a script with `schedule="every 1m"`, wait, confirm `last_run` and `last_status` populate
- [ ] Manual: set `SCHEDULER_TICK_SECONDS=99999` → scheduler container fails to start with the validation error
- [ ] Manual: any internal OpenMetadata caller now passes `verify=False` (or a CA bundle path) explicitly

EOF
)"

Self-review checklist (run before declaring plan-write done)

Spec coverage: #77 ✓ (Task 6), #78 ✓ (Tasks 4–6), #79 ✓ (Tasks 1–3), #89 ✓ (Task 7), #68 ✗ (intentionally out of scope, documented in plan header). All accepted.
Placeholder scan: none of the forbidden placeholder phrases ("TBD", "fill in", "similar to") appear. Code blocks are present in every implementation step.
Type consistency: claim_for_run / record_run_result referenced in Tasks 4 and 5 with matching signatures. filter_due_tables referenced in Tasks 1 and 2 with matching signature. is_valid_schedule referenced in Tasks 1, 3, 5 with consistent contract. build_jobs and resolved_tick_seconds defined and used in Task 6 only.
Schema migration: no migration. Verified table_registry.sync_schedule and script_registry.{schedule,last_run,last_status} already exist in v17.
