# Issues #77, #78, #79, #89 — Re-wire Scheduler + TLS Hardening > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. **Goal:** Honor per-table `sync_schedule` and per-script `schedule` at runtime (Option A — re-implement); make sidecar job intervals operator-tunable; stop disabling TLS verification globally in the OpenMetadata client. **Architecture:** - **API-side filter** for `table_registry.sync_schedule` (#79). New helper `filter_due_tables()` in `src/scheduler.py` is called from `app/api/sync.py:_run_sync()` after `repo.list_local()`. Tables with no schedule keep current "always sync" behavior (opt-in feature). Manual `POST /api/sync/trigger {"tables": [...]}` bypasses the filter (operator override always wins). - **Server-side runner endpoint** `POST /api/scripts/run-due` (#78). The sidecar fires the endpoint on a configurable cadence; the API claims due scripts atomically (`last_status='running'` UPDATE … RETURNING), runs each via existing `_execute_script` in BackgroundTasks, and writes `last_run` + `last_status` on completion. Concurrency: a script in `running` state is skipped on the next tick. - **Env-driven sidecar JOBS** (#77). Three documented env overrides for the existing two interval jobs + tick, plus a fourth for the new script-runner job (attributed to #78 in changelog). Marketplaces stays hardcoded — outside #77 scope. - **TLS verify by default** in `OpenMetadataClient` (#89). Mirror the `connectors/llm/openai_compat.py` pattern: `verify: bool | str = True` constructor parameter; drop the module-level `warnings.filterwarnings`. No DuckDB schema migration required — all touched columns (`table_registry.sync_schedule`, `script_registry.schedule/last_run/last_status`) already exist in v17. **Tech Stack:** Python 3.11+, FastAPI, Pydantic v2, DuckDB, httpx. Pytest for tests. **Out of scope (intentional):** - Issue #68 (Stop hook output field) — no Stop hook source lives in this OSS repo as of HEAD; the referenced TODO.md no longer exists. Needs clarification from the issue author before implementation. - Per-script concurrency beyond "skip if running" (no queue, no max-runtime detection). - Operator-defined custom sidecar jobs (would land in `instance.yaml` per #77's "Future option"). --- ## File Structure **New files:** - `tests/test_run_due_scripts.py` — tests for the new `/api/scripts/run-due` endpoint and the `claim_for_run`/`record_run_result` repo methods. - `tests/test_sync_filter.py` — tests for `filter_due_tables()` and `is_valid_schedule()`. **Modified files:** - `src/scheduler.py` — add `is_valid_schedule(schedule) -> bool` and `filter_due_tables(table_configs, sync_state_repo) -> list[dict]`. - `app/api/sync.py` — wire `filter_due_tables()` into `_run_sync()`. - `app/api/admin.py` — Pydantic `field_validator` on `RegisterTableRequest.sync_schedule` and `UpdateTableRequest.sync_schedule` (reject malformed strings with 422). - `src/repositories/notifications.py` — extend `ScriptRepository` with `claim_for_run(script_id)` and `record_run_result(script_id, status)`. - `app/api/scripts.py` — add `POST /api/scripts/run-due` endpoint; Pydantic `field_validator` on `DeployScriptRequest.schedule`. - `services/scheduler/__main__.py` — env-driven `JOBS` list with validation; add 4th `script-runner` job. - `connectors/openmetadata/client.py` — add `verify` constructor param; drop module-level `warnings.filterwarnings`. - `config/.env.template` — document the four new `SCHEDULER_*` env vars. - `docs/DEPLOYMENT.md` — new "Scheduler tuning" subsection covering the env vars. - `CHANGELOG.md` — entries under a new `[0.19.0]` section. - `pyproject.toml` — bump `version` to `0.19.0`. **Untouched (intentionally):** - `src/db.py` — schema unchanged. `script_registry.last_run` was always nullable; we just start writing to it. - `tests/test_scheduler*.py` — keep as-is. `is_table_due` is the reusable primitive both `filter_due_tables` and the script runner build on. --- ## Pre-flight - [ ] **Step P-1: Confirm worktree branch and clean state** ```bash git status git branch --show-current ``` Expected: clean tree on `worktree-issues-68-77-78-79-89` (or whatever this worktree's branch is). - [ ] **Step P-2: Confirm test suite is green at HEAD** ```bash pytest tests/test_scheduler.py tests/test_sync_manifest.py tests/test_scripts_api.py -v 2>&1 | tail -30 ``` Expected: all green. If any are red at HEAD, stop and investigate before adding new tests. --- ## Task 1: `src/scheduler.py` — add `is_valid_schedule` and `filter_due_tables` **Files:** - Modify: `src/scheduler.py` (add two new functions at the end of the module) - Create: `tests/test_sync_filter.py` **Why this first:** Both #79 (table sync) and #78 (scripts) reuse `is_valid_schedule` for Pydantic validation. `filter_due_tables` is the pure helper #79 wires into `_run_sync()`. Pure-function unit tests; no FastAPI / DuckDB plumbing yet. - [ ] **Step 1.1: Write the failing test file** Create `tests/test_sync_filter.py`: ```python """Tests for the schedule-validity helper and the per-table due-filter.""" from datetime import datetime, timedelta, timezone import pytest from src.scheduler import filter_due_tables, is_valid_schedule # ---------------- is_valid_schedule ----------------------------------------- @pytest.mark.parametrize("schedule", [ "every 15m", "every 1h", "every 6h", "daily 05:00", "daily 07:00,13:00,18:00", ]) def test_is_valid_schedule_accepts_documented_formats(schedule): assert is_valid_schedule(schedule) is True @pytest.mark.parametrize("schedule", [ "", "every", "every 0m", # zero is not a positive interval "every 15s", # seconds not supported "daily", "daily 25:00", # invalid hour "daily 12:60", # invalid minute "daily 12:00,", # trailing comma "hourly", # unknown keyword "every -5m", # negative ]) def test_is_valid_schedule_rejects_malformed_strings(schedule): assert is_valid_schedule(schedule) is False def test_is_valid_schedule_treats_none_as_invalid(): # None is "no schedule" — callers handle that case before validating. # The validator is for non-null strings only. assert is_valid_schedule(None) is False # type: ignore[arg-type] # ---------------- filter_due_tables ----------------------------------------- class _FakeSyncStateRepo: """Stub SyncStateRepository — returns last_sync per table_id.""" def __init__(self, last_syncs: dict[str, datetime | None]): self._data = last_syncs def get_last_sync(self, table_id: str): return self._data.get(table_id) def _utc(year, month, day, hour=0, minute=0): return datetime(year, month, day, hour, minute, tzinfo=timezone.utc) def test_filter_due_tables_passes_through_unscheduled_tables(): """Tables with sync_schedule=None are always due (opt-in feature).""" configs = [ {"id": "t1", "name": "t1", "sync_schedule": None}, {"id": "t2", "name": "t2", "sync_schedule": ""}, ] repo = _FakeSyncStateRepo({}) out = filter_due_tables(configs, repo, now=_utc(2026, 5, 1, 10, 0)) assert [c["id"] for c in out] == ["t1", "t2"] def test_filter_due_tables_drops_table_within_interval(): """A table on 'every 1h' synced 30m ago is NOT due.""" configs = [{"id": "fast", "name": "fast", "sync_schedule": "every 1h"}] repo = _FakeSyncStateRepo({"fast": _utc(2026, 5, 1, 9, 30)}) out = filter_due_tables(configs, repo, now=_utc(2026, 5, 1, 10, 0)) assert out == [] def test_filter_due_tables_keeps_table_past_interval(): """A table on 'every 1h' synced 90m ago IS due.""" configs = [{"id": "fast", "name": "fast", "sync_schedule": "every 1h"}] repo = _FakeSyncStateRepo({"fast": _utc(2026, 5, 1, 8, 30)}) out = filter_due_tables(configs, repo, now=_utc(2026, 5, 1, 10, 0)) assert [c["id"] for c in out] == ["fast"] def test_filter_due_tables_keeps_never_synced_table(): """No last_sync row → always due (matches is_table_due semantics).""" configs = [{"id": "new", "name": "new", "sync_schedule": "every 1h"}] repo = _FakeSyncStateRepo({}) # no entry at all out = filter_due_tables(configs, repo, now=_utc(2026, 5, 1, 10, 0)) assert [c["id"] for c in out] == ["new"] def test_filter_due_tables_treats_invalid_schedule_as_unscheduled(): """Garbled sync_schedule: log + always sync (don't silently skip).""" configs = [{"id": "bad", "name": "bad", "sync_schedule": "BOGUS"}] repo = _FakeSyncStateRepo({"bad": _utc(2026, 5, 1, 9, 59)}) out = filter_due_tables(configs, repo, now=_utc(2026, 5, 1, 10, 0)) assert [c["id"] for c in out] == ["bad"] def test_filter_due_tables_mixed_due_and_skipped(): configs = [ {"id": "due", "name": "due", "sync_schedule": "every 30m"}, {"id": "skipped", "name": "skipped", "sync_schedule": "every 30m"}, {"id": "always", "name": "always", "sync_schedule": None}, ] repo = _FakeSyncStateRepo({ "due": _utc(2026, 5, 1, 9, 0), # 60m ago → due "skipped": _utc(2026, 5, 1, 9, 50), # 10m ago → skip }) out = filter_due_tables(configs, repo, now=_utc(2026, 5, 1, 10, 0)) assert sorted(c["id"] for c in out) == ["always", "due"] def test_filter_due_tables_handles_naive_last_sync(): """SyncStateRepository can return naive datetimes from older rows; helper must coerce to UTC instead of crashing on tz-aware vs naive comparison.""" configs = [{"id": "old", "name": "old", "sync_schedule": "every 1h"}] naive_2h_ago = datetime(2026, 5, 1, 8, 0) # no tzinfo repo = _FakeSyncStateRepo({"old": naive_2h_ago}) out = filter_due_tables(configs, repo, now=_utc(2026, 5, 1, 10, 0)) assert [c["id"] for c in out] == ["old"] ``` - [ ] **Step 1.2: Run tests — expect ImportError** ```bash pytest tests/test_sync_filter.py -v 2>&1 | tail -10 ``` Expected: ImportError on `from src.scheduler import filter_due_tables, is_valid_schedule` — those symbols don't exist yet. - [ ] **Step 1.3: Implement `is_valid_schedule` and `filter_due_tables`** Append to `src/scheduler.py` (after `_parse_timestamp`): ```python def is_valid_schedule(schedule: Optional[str]) -> bool: """Return True iff ``schedule`` parses as a documented schedule string. Accepted forms (mirroring the rest of this module): - ``"every Nm"`` / ``"every Nh"`` with N a positive integer - ``"daily HH:MM"`` (24-h, UTC) optionally comma-separated: ``"daily 07:00,13:00"`` Anything else — including ``None``, empty string, or a parseable-looking but out-of-range value (``"daily 25:00"``) — returns False. Pydantic validators on the admin API call this to reject malformed input with 422 instead of accepting it and silently no-op'ing later. """ if not schedule or not isinstance(schedule, str): return False interval = parse_interval_minutes(schedule) if interval is not None: return interval > 0 match = DAILY_PATTERN.match(schedule) if not match: return False return bool(_parse_daily_times(match.group(1))) def filter_due_tables( table_configs: list[dict], sync_state_repo, now: Optional[datetime] = None, ) -> list[dict]: """Drop table configs whose ``sync_schedule`` says they are not due. Behaviour: - ``sync_schedule`` is None / empty / not a valid string → table passes through (no schedule = "sync on every tick", existing behaviour). - Valid schedule + last_sync within the cadence → drop. - Valid schedule + last_sync past cadence (or never) → keep. - Invalid schedule string → log a warning and let the table through (do NOT silently skip — operator surprise is worse than a redundant sync). ``sync_state_repo`` is duck-typed: only ``get_last_sync(table_id)`` is called, returning a ``datetime`` (tz-aware preferred, naive treated as UTC) or ``None``. """ if now is None: now = datetime.now(timezone.utc) out: list[dict] = [] for tc in table_configs: schedule = tc.get("sync_schedule") if not schedule: out.append(tc) continue if not is_valid_schedule(schedule): logger.warning( "Table %s has malformed sync_schedule %r — syncing anyway " "(fix the schedule string to suppress this message)", tc.get("id") or tc.get("name"), schedule, ) out.append(tc) continue last_sync = sync_state_repo.get_last_sync(tc.get("id") or tc.get("name")) last_sync_iso: Optional[str] if last_sync is None: last_sync_iso = None else: if last_sync.tzinfo is None: last_sync = last_sync.replace(tzinfo=timezone.utc) last_sync_iso = last_sync.isoformat() if is_table_due(schedule, last_sync_iso, now=now): out.append(tc) else: logger.info( "Table %s skipped: schedule=%r, last_sync=%s, not due yet", tc.get("id") or tc.get("name"), schedule, last_sync_iso, ) return out ``` - [ ] **Step 1.4: Run tests — expect green** ```bash pytest tests/test_sync_filter.py -v 2>&1 | tail -25 ``` Expected: all green (parametrized cases included). - [ ] **Step 1.5: Commit** ```bash git add src/scheduler.py tests/test_sync_filter.py git commit -m "feat(scheduler): add is_valid_schedule + filter_due_tables helpers (#79)" ``` --- ## Task 2: Wire `filter_due_tables` into `_run_sync` **Files:** - Modify: `app/api/sync.py:37-217` (the `_run_sync` function) - Test: extend `tests/test_sync_filter.py` with one integration-style test that exercises `_run_sync`'s filter call site (mocking subprocess + orchestrator). **Why now:** With the helper green, the wiring is a 4-line change. Test stubs out the heavy machinery (subprocess, orchestrator) and asserts only the filter is invoked correctly. - [ ] **Step 2.1: Add the integration test** Append to `tests/test_sync_filter.py`: ```python # ---------------- _run_sync wiring ------------------------------------------ def test_run_sync_filters_local_tables_by_schedule(monkeypatch, tmp_path): """`_run_sync(tables=None)` consults `filter_due_tables` and skips tables that are not due. Manual override (`tables=[...]`) bypasses the filter entirely.""" from app.api import sync as sync_module # Stub get_data_source_type → 'keboola' so the keboola subprocess code # path is taken (also matches the existing _run_sync shape). monkeypatch.setattr( sync_module, "_get_data_dir", lambda: tmp_path, ) import app.instance_config as instance_config monkeypatch.setattr(instance_config, "get_data_source_type", lambda: "keboola") # Fake registry with one due + one skipped table. fake_configs = [ {"id": "due", "name": "due", "source_type": "keboola", "sync_schedule": "every 30m", "query_mode": "local"}, {"id": "skipped", "name": "skipped", "source_type": "keboola", "sync_schedule": "every 30m", "query_mode": "local"}, ] class _StubRegistry: def __init__(self, conn): pass def list_local(self, source_type=None): return list(fake_configs) def get(self, table_id): return next((c for c in fake_configs if c["id"] == table_id), None) monkeypatch.setattr( "src.repositories.table_registry.TableRegistryRepository", _StubRegistry, ) # Fake sync_state: 'due' last synced 60m ago, 'skipped' 10m ago. from datetime import datetime, timezone last_syncs = { "due": datetime(2026, 5, 1, 9, 0, tzinfo=timezone.utc), "skipped": datetime(2026, 5, 1, 9, 50, tzinfo=timezone.utc), } class _StubState: def __init__(self, conn): pass def get_last_sync(self, table_id): return last_syncs.get(table_id) monkeypatch.setattr( "src.repositories.sync_state.SyncStateRepository", _StubState, ) # Freeze 'now' inside src.scheduler.filter_due_tables. We do this by # monkeypatching filter_due_tables itself to inject `now=`. from src import scheduler as _sched real_filter = _sched.filter_due_tables monkeypatch.setattr( sync_module, "filter_due_tables", lambda cfgs, repo: real_filter( cfgs, repo, now=datetime(2026, 5, 1, 10, 0, tzinfo=timezone.utc), ), ) # Capture the configs that subprocess.run sees (via stdin payload). captured = {} def _fake_run(cmd, input, capture_output, text, timeout, env, cwd): import json as _json captured["configs"] = _json.loads(input) class _R: returncode = 0 stdout = "{}" stderr = "" return _R() monkeypatch.setattr(sync_module.subprocess, "run", _fake_run) # Stub orchestrator + profiler imports inside the function so we don't # require a real DuckDB analytics file. import src.orchestrator as _orch_mod class _StubOrch: def rebuild(self): return {} monkeypatch.setattr(_orch_mod, "SyncOrchestrator", _StubOrch) # Run with tables=None → filter applies → only 'due' goes to subprocess. sync_module._run_sync(tables=None) assert [c["id"] for c in captured["configs"]] == ["due"] # Run with explicit override → filter is BYPASSED → both go through. captured.clear() sync_module._run_sync(tables=["due", "skipped"]) assert sorted(c["id"] for c in captured["configs"]) == ["due", "skipped"] ``` - [ ] **Step 2.2: Run the test — expect FAIL** ```bash pytest tests/test_sync_filter.py::test_run_sync_filters_local_tables_by_schedule -v 2>&1 | tail -20 ``` Expected: AssertionError — `captured["configs"]` contains both tables in the first assertion (filter not yet wired in). - [ ] **Step 2.3: Wire `filter_due_tables` into `_run_sync`** In `app/api/sync.py`, add the import near the top of `_run_sync` (line 50ish): ```python from src.scheduler import filter_due_tables from src.repositories.sync_state import SyncStateRepository ``` Replace lines 56-66 (the registry-read block) with: ```python # Read table configs in main process (has shared DuckDB connection) sys_conn = get_system_db() try: repo = TableRegistryRepository(sys_conn) if tables: # Manual operator override — bypass schedule filter entirely # so an admin saying "sync these specific tables now" wins. all_configs = [repo.get(t) for t in tables] table_configs = [c for c in all_configs if c is not None] else: table_configs = repo.list_local(source_type) if source_type else repo.list_local() # #79: drop tables whose sync_schedule says they are not due. # Tables without a schedule pass through (opt-in feature). state_repo = SyncStateRepository(sys_conn) table_configs = filter_due_tables(table_configs, state_repo) finally: sys_conn.close() ``` (Leave the auto-discovery block immediately after unchanged; it only fires when `table_configs` is empty after filtering, which is consistent with prior semantics.) - [ ] **Step 2.4: Run the wiring test + the existing sync test — expect green** ```bash pytest tests/test_sync_filter.py tests/test_sync_manifest.py -v 2>&1 | tail -25 ``` Expected: green. The manifest test exercises a different code path; if it regresses, the import probably broke something — re-verify the import block. - [ ] **Step 2.5: Commit** ```bash git add app/api/sync.py tests/test_sync_filter.py git commit -m "feat(sync): honor table_registry.sync_schedule at trigger time (#79)" ``` --- ## Task 3: Pydantic format validators for `sync_schedule` **Files:** - Modify: `app/api/admin.py` — add `field_validator` on `RegisterTableRequest.sync_schedule` and `UpdateTableRequest.sync_schedule`. - Test: extend `tests/test_admin_bq_register.py` (or a sibling, depending on what the codebase calls the admin-register test file). **Why:** Once #79 honours the field, malformed values become operator-visible bugs ("I set `sync_schedule='hourly'` but it never skips"). Reject at register/update time with a clear 422. - [ ] **Step 3.1: Locate the right test file** ```bash grep -l "RegisterTableRequest\|/register-table" tests/ -r 2>/dev/null | head -3 ``` Use whatever file matches. Plan continues assuming `tests/test_admin_bq_register.py` exists (per the file listing in Task 0); adapt path if not. - [ ] **Step 3.2: Write the failing tests** Append to `tests/test_admin_bq_register.py`: ```python # --- sync_schedule format validation (#79) ---------------------------------- import pytest from pydantic import ValidationError from app.api.admin import RegisterTableRequest, UpdateTableRequest @pytest.mark.parametrize("schedule", [ "every 15m", "every 1h", "daily 05:00", "daily 07:00,13:00,18:00", None, # explicit None is allowed (no schedule = always sync) ]) def test_register_request_accepts_valid_sync_schedule(schedule): req = RegisterTableRequest(name="orders", sync_schedule=schedule) assert req.sync_schedule == schedule @pytest.mark.parametrize("schedule", [ "hourly", "every 0m", "daily 25:00", "every 5x", " ", ]) def test_register_request_rejects_malformed_sync_schedule(schedule): with pytest.raises(ValidationError) as exc_info: RegisterTableRequest(name="orders", sync_schedule=schedule) assert "sync_schedule" in str(exc_info.value) @pytest.mark.parametrize("schedule", [ "every 30m", "daily 08:00", None, ]) def test_update_request_accepts_valid_sync_schedule(schedule): req = UpdateTableRequest(sync_schedule=schedule) assert req.sync_schedule == schedule def test_update_request_rejects_malformed_sync_schedule(): with pytest.raises(ValidationError): UpdateTableRequest(sync_schedule="weekly") ``` - [ ] **Step 3.3: Run — expect FAIL** ```bash pytest tests/test_admin_bq_register.py -v -k sync_schedule 2>&1 | tail -20 ``` Expected: failures because malformed strings are accepted today. - [ ] **Step 3.4: Add the validators to `app/api/admin.py`** In `app/api/admin.py`, add the import near the top (next to other `src` imports, around line 27): ```python from src.scheduler import is_valid_schedule ``` In the `RegisterTableRequest` class (line 644), add this validator alongside the existing ones: ```python @field_validator("sync_schedule", mode="before") @classmethod def _validate_sync_schedule(cls, v): # None / "" / pure-whitespace → no schedule, accepted. # Any non-empty string must parse — otherwise it would be persisted # and silently ignored by the runtime evaluator. if v in (None, ""): return v if isinstance(v, str) and not v.strip(): return None if not is_valid_schedule(v): raise ValueError( f"sync_schedule must be 'every Nm' / 'every Nh' / " f"'daily HH:MM[,HH:MM,...]', got {v!r}" ) return v ``` In the `UpdateTableRequest` class (line 780), add the same validator. (Duplication is intentional — the two models have separate field declarations and Pydantic v2 validators don't inherit cleanly across unrelated `BaseModel` classes. DRY-ing into a mixin is overkill for two fields.) - [ ] **Step 3.5: Run — expect green** ```bash pytest tests/test_admin_bq_register.py -v -k sync_schedule 2>&1 | tail -20 ``` Expected: green. - [ ] **Step 3.6: Commit** ```bash git add app/api/admin.py tests/test_admin_bq_register.py git commit -m "feat(admin): validate sync_schedule format on register/update (#79)" ``` --- ## Task 4: Extend `ScriptRepository` with `claim_for_run` and `record_run_result` **Files:** - Modify: `src/repositories/notifications.py` — add two methods to `ScriptRepository`. - Create: `tests/test_run_due_scripts.py` (will grow across Tasks 4–6). **Why:** Concurrency is "skip if running". `claim_for_run` is the atomic UPDATE that flips a script from idle → `running` and returns whether the caller actually owns the slot. `record_run_result` writes the post-execution status. - [ ] **Step 4.1: Write failing tests** Create `tests/test_run_due_scripts.py`: ```python """Tests for the scheduled-script runner — repo claim/release primitives, the run-due endpoint, and Pydantic validation on DeployScriptRequest.""" from datetime import datetime, timezone import pytest from src.db import get_system_db from src.repositories.notifications import ScriptRepository @pytest.fixture() def conn(tmp_path, monkeypatch): """Fresh system.duckdb in a tmp dir — uses real schema, no mocks.""" monkeypatch.setenv("DATA_DIR", str(tmp_path)) state_dir = tmp_path / "state" state_dir.mkdir(parents=True, exist_ok=True) c = get_system_db() yield c c.close() def _deploy(repo: ScriptRepository, script_id="s1", schedule="every 1h"): repo.deploy(id=script_id, name=script_id, owner="u1", schedule=schedule, source="print('hi')") # ---------------- claim_for_run --------------------------------------------- def test_claim_for_run_succeeds_when_idle(conn): repo = ScriptRepository(conn) _deploy(repo) assert repo.claim_for_run("s1") is True row = repo.get("s1") assert row["last_status"] == "running" assert row["last_run"] is not None def test_claim_for_run_fails_when_already_running(conn): repo = ScriptRepository(conn) _deploy(repo) assert repo.claim_for_run("s1") is True # Second claim should fail because last_status is still 'running'. assert repo.claim_for_run("s1") is False def test_claim_for_run_succeeds_after_completion(conn): repo = ScriptRepository(conn) _deploy(repo) repo.claim_for_run("s1") repo.record_run_result("s1", status="success") # Now claimable again. assert repo.claim_for_run("s1") is True def test_claim_for_run_returns_false_for_unknown_script(conn): repo = ScriptRepository(conn) assert repo.claim_for_run("does-not-exist") is False # ---------------- record_run_result ----------------------------------------- @pytest.mark.parametrize("status", ["success", "failure"]) def test_record_run_result_writes_terminal_status(conn, status): repo = ScriptRepository(conn) _deploy(repo) repo.claim_for_run("s1") repo.record_run_result("s1", status=status) row = repo.get("s1") assert row["last_status"] == status def test_record_run_result_rejects_running_as_terminal(conn): """The 'running' string is reserved for claim_for_run; record_run_result must reject it so a caller can't accidentally re-arm the running flag instead of clearing it.""" repo = ScriptRepository(conn) _deploy(repo) repo.claim_for_run("s1") with pytest.raises(ValueError): repo.record_run_result("s1", status="running") ``` - [ ] **Step 4.2: Run — expect FAIL** ```bash pytest tests/test_run_due_scripts.py -v 2>&1 | tail -25 ``` Expected: AttributeError on `claim_for_run` / `record_run_result`. - [ ] **Step 4.3: Add the methods to `ScriptRepository`** In `src/repositories/notifications.py`, after the existing `list_all` method (around line 105), add: ```python def claim_for_run(self, script_id: str) -> bool: """Atomically set last_status='running' iff the script is idle. Returns True iff this caller is the new owner of the run slot. Returns False if the script does not exist OR is already running. Implementation: UPDATE … WHERE last_status IS DISTINCT FROM 'running' + RETURNING id. DuckDB supports IS DISTINCT FROM and RETURNING; if zero rows come back, somebody else already owns the slot. """ now = datetime.now(timezone.utc) result = self.conn.execute( """UPDATE script_registry SET last_status = 'running', last_run = ? WHERE id = ? AND (last_status IS NULL OR last_status != 'running') RETURNING id""", [now, script_id], ).fetchone() return result is not None def record_run_result(self, script_id: str, status: str) -> None: """Write the terminal status of a finished run (clears 'running'). Accepts only 'success' or 'failure' — 'running' would re-arm the flag instead of clearing it, defeating the purpose of the call. """ if status not in ("success", "failure"): raise ValueError( f"record_run_result: status must be 'success' or 'failure', " f"got {status!r}" ) self.conn.execute( "UPDATE script_registry SET last_status = ? WHERE id = ?", [status, script_id], ) ``` - [ ] **Step 4.4: Run — expect green** ```bash pytest tests/test_run_due_scripts.py -v 2>&1 | tail -25 ``` Expected: green. - [ ] **Step 4.5: Commit** ```bash git add src/repositories/notifications.py tests/test_run_due_scripts.py git commit -m "feat(scripts): add claim_for_run + record_run_result to ScriptRepository (#78)" ``` --- ## Task 5: `POST /api/scripts/run-due` endpoint **Files:** - Modify: `app/api/scripts.py` — new endpoint + Pydantic validator on `DeployScriptRequest.schedule`. - Test: extend `tests/test_run_due_scripts.py`. **Why:** This is the API surface the sidecar fires on every tick. It iterates `script_registry`, claims each due script, and queues execution in a `BackgroundTask` so the response returns immediately — the sidecar doesn't block on long-running scripts. - [ ] **Step 5.1: Write failing tests** Append to `tests/test_run_due_scripts.py`: ```python # ---------------- DeployScriptRequest.schedule validation ------------------- from pydantic import ValidationError from app.api.scripts import DeployScriptRequest def test_deploy_request_accepts_valid_schedule(): req = DeployScriptRequest(name="report", source="print(1)", schedule="every 1h") assert req.schedule == "every 1h" def test_deploy_request_accepts_no_schedule(): req = DeployScriptRequest(name="report", source="print(1)") assert req.schedule is None def test_deploy_request_rejects_malformed_schedule(): with pytest.raises(ValidationError): DeployScriptRequest(name="report", source="print(1)", schedule="weekly") # ---------------- /api/scripts/run-due endpoint ----------------------------- from fastapi.testclient import TestClient # Helper: mint a TestClient with admin auth bypass. The codebase uses # `LOCAL_DEV_MODE=1` to short-circuit auth in tests; mirror existing test # files (tests/test_scripts_api.py) for the canonical pattern. @pytest.fixture() def client(monkeypatch, tmp_path): monkeypatch.setenv("LOCAL_DEV_MODE", "1") monkeypatch.setenv("DATA_DIR", str(tmp_path)) (tmp_path / "state").mkdir(parents=True, exist_ok=True) from app.main import app return TestClient(app) def test_run_due_skips_scripts_without_schedule(client, monkeypatch): """A script with schedule=NULL is never picked up by run-due (those are run only via explicit POST /api/scripts/{id}/run).""" monkeypatch.setattr( "app.api.scripts._execute_script", lambda src, name: {"name": name, "exit_code": 0, "stdout": "", "stderr": "", "truncated": False}, ) deploy = client.post( "/api/scripts/deploy", json={"name": "manual-only", "source": "print(1)"}, ) assert deploy.status_code == 201 resp = client.post("/api/scripts/run-due") assert resp.status_code == 200 assert resp.json()["claimed"] == [] def test_run_due_claims_due_scripts(client, monkeypatch): """A script on 'every 1h' that has never run gets claimed and executed.""" calls = [] def _fake_exec(source, name): calls.append(name) return {"name": name, "exit_code": 0, "stdout": "", "stderr": "", "truncated": False} monkeypatch.setattr("app.api.scripts._execute_script", _fake_exec) deploy = client.post( "/api/scripts/deploy", json={"name": "report", "source": "print(1)", "schedule": "every 1h"}, ) script_id = deploy.json()["id"] resp = client.post("/api/scripts/run-due") assert resp.status_code == 200 body = resp.json() assert body["claimed"] == [script_id] # BackgroundTasks runs synchronously inside TestClient, so the call # has happened by now. assert "report" in calls def test_run_due_skips_scripts_already_running(client, monkeypatch): """A script in 'running' state must not be re-claimed by a second sidecar tick that arrives while the previous run is still going.""" monkeypatch.setattr( "app.api.scripts._execute_script", # Simulate a slow run by NOT updating last_status — repo.claim_for_run # already wrote 'running'; we leave it that way. lambda src, name: {"name": name, "exit_code": 0, "stdout": "", "stderr": "", "truncated": False}, ) # Patch out record_run_result so the run never "completes". monkeypatch.setattr( "src.repositories.notifications.ScriptRepository.record_run_result", lambda self, *a, **kw: None, ) deploy = client.post( "/api/scripts/deploy", json={"name": "long", "source": "print(1)", "schedule": "every 1h"}, ) script_id = deploy.json()["id"] first = client.post("/api/scripts/run-due") assert first.json()["claimed"] == [script_id] second = client.post("/api/scripts/run-due") assert second.json()["claimed"] == [] ``` - [ ] **Step 5.2: Run — expect FAIL** ```bash pytest tests/test_run_due_scripts.py -v -k "deploy_request or run_due" 2>&1 | tail -25 ``` Expected: ValidationError on the validator tests; 404/405 on the endpoint tests (route doesn't exist). - [ ] **Step 5.3: Add the validator and endpoint** In `app/api/scripts.py`, near the top: ```python from datetime import datetime, timezone from fastapi import BackgroundTasks from pydantic import field_validator from src.scheduler import is_valid_schedule, is_table_due ``` Replace the existing `DeployScriptRequest` (lines 24-27) with: ```python class DeployScriptRequest(BaseModel): name: str source: str schedule: Optional[str] = None @field_validator("schedule", mode="before") @classmethod def _validate_schedule(cls, v): if v in (None, ""): return None if isinstance(v, str) and not v.strip(): return None if not is_valid_schedule(v): raise ValueError( f"schedule must be 'every Nm' / 'every Nh' / " f"'daily HH:MM[,HH:MM,...]', got {v!r}" ) return v ``` Add the endpoint at the end of the route definitions (after `undeploy_script`): ```python @router.post("/run-due") async def run_due_scripts( background_tasks: BackgroundTasks, user: dict = Depends(require_admin), conn: duckdb.DuckDBPyConnection = Depends(_get_db), ): """Run every deployed script whose ``schedule`` says it is due. Iterates ``script_registry``, skips rows without a schedule (those run only via explicit POST /{id}/run), evaluates ``is_table_due(schedule, last_run)``, and atomically claims each due row via ``ScriptRepository.claim_for_run``. Execution is queued as a ``BackgroundTask`` so the response returns immediately — the sidecar must not block waiting on a long-running script. Concurrency: ``claim_for_run`` flips ``last_status`` to ``'running'`` inside the same UPDATE; a script already in that state is skipped on subsequent ticks until the BackgroundTask writes a terminal status via ``record_run_result``. There is no max-runtime detection in this PR — if a BackgroundTask crashes without writing a terminal status, the script stays stuck in ``'running'`` until an operator clears it manually (``UPDATE script_registry SET last_status = NULL WHERE id = ?``). Documenting this as an accepted v0 limitation; revisit if it bites in practice. """ repo = ScriptRepository(conn) claimed: list[str] = [] for script in repo.list_all(): schedule = script.get("schedule") if not schedule: continue last_run = script.get("last_run") last_run_iso = last_run.isoformat() if last_run else None if not is_table_due(schedule, last_run_iso): continue if not repo.claim_for_run(script["id"]): # Lost the race / already running — next tick will retry. continue claimed.append(script["id"]) background_tasks.add_task( _run_claimed_script, script_id=script["id"], source=script["source"], name=script["name"], ) return {"claimed": claimed, "count": len(claimed)} def _run_claimed_script(script_id: str, source: str, name: str) -> None: """Execute a previously-claimed script and write the terminal status. Runs in a FastAPI BackgroundTask, so it owns its own DB connection (the request-scoped conn is already gone by the time this fires). Any exception writes 'failure' and re-raises so the BG handler still surfaces the traceback in logs. """ from src.db import get_system_db bg_conn = get_system_db() try: bg_repo = ScriptRepository(bg_conn) try: _execute_script(source, name) bg_repo.record_run_result(script_id, status="success") except Exception: bg_repo.record_run_result(script_id, status="failure") raise finally: bg_conn.close() ``` - [ ] **Step 5.4: Run — expect green** ```bash pytest tests/test_run_due_scripts.py -v 2>&1 | tail -25 ``` Expected: green. If the LOCAL_DEV_MODE auth bypass test fixture doesn't quite work in your repo, mirror whatever pattern `tests/test_scripts_api.py` uses for the same client. - [ ] **Step 5.5: Commit** ```bash git add app/api/scripts.py tests/test_run_due_scripts.py git commit -m "feat(scripts): POST /api/scripts/run-due + format validator (#78)" ``` --- ## Task 6: Env-driven sidecar JOBS + add `script-runner` job **Files:** - Modify: `services/scheduler/__main__.py` — replace hardcoded `JOBS` list with an env-driven builder; add 4th job for scripts. - Create: `tests/test_scheduler_sidecar.py` — small unit tests on the new builder. **Why now:** With the API surface in place (Tasks 2 + 5), the sidecar is the operational glue. Refactor to env-driven config (#77) and add the script tick (#78) in one pass — the touched lines overlap. - [ ] **Step 6.1: Write failing tests** Create `tests/test_scheduler_sidecar.py`: ```python """Unit tests for the env-driven JOBS builder in services.scheduler.""" import pytest def test_build_jobs_uses_documented_defaults(monkeypatch): """No env overrides → default cadences.""" for v in ( "SCHEDULER_DATA_REFRESH_INTERVAL", "SCHEDULER_HEALTH_CHECK_INTERVAL", "SCHEDULER_TICK_SECONDS", "SCHEDULER_SCRIPT_RUN_INTERVAL", ): monkeypatch.delenv(v, raising=False) from services.scheduler.__main__ import build_jobs, resolved_tick_seconds jobs = {name: schedule for name, schedule, *_ in build_jobs()} assert jobs["data-refresh"] == "every 15m" assert jobs["health-check"] == "every 5m" assert jobs["script-runner"] == "every 1m" assert jobs["marketplaces"] == "daily 03:00" assert resolved_tick_seconds() == 30 def test_build_jobs_honors_env_overrides(monkeypatch): monkeypatch.setenv("SCHEDULER_DATA_REFRESH_INTERVAL", "1800") # 30m monkeypatch.setenv("SCHEDULER_HEALTH_CHECK_INTERVAL", "60") # 1m monkeypatch.setenv("SCHEDULER_SCRIPT_RUN_INTERVAL", "120") # 2m monkeypatch.setenv("SCHEDULER_TICK_SECONDS", "10") from services.scheduler.__main__ import build_jobs, resolved_tick_seconds jobs = {name: schedule for name, schedule, *_ in build_jobs()} assert jobs["data-refresh"] == "every 30m" assert jobs["health-check"] == "every 1m" assert jobs["script-runner"] == "every 2m" assert resolved_tick_seconds() == 10 @pytest.mark.parametrize("var", [ "SCHEDULER_DATA_REFRESH_INTERVAL", "SCHEDULER_HEALTH_CHECK_INTERVAL", "SCHEDULER_TICK_SECONDS", "SCHEDULER_SCRIPT_RUN_INTERVAL", ]) @pytest.mark.parametrize("bad", ["0", "-5", "abc", ""]) def test_build_jobs_rejects_invalid_env(monkeypatch, var, bad): monkeypatch.setenv(var, bad) from services.scheduler.__main__ import build_jobs with pytest.raises(ValueError): build_jobs() def test_build_jobs_rejects_tick_larger_than_smallest_interval(monkeypatch): """Tick must be <= the smallest job interval, otherwise jobs would consistently miss their cadence by up to one tick.""" monkeypatch.setenv("SCHEDULER_HEALTH_CHECK_INTERVAL", "60") monkeypatch.setenv("SCHEDULER_TICK_SECONDS", "120") from services.scheduler.__main__ import build_jobs with pytest.raises(ValueError, match="tick"): build_jobs() def test_build_jobs_includes_run_due_endpoint(): """The script-runner job must POST to /api/scripts/run-due.""" from services.scheduler.__main__ import build_jobs target = next(j for j in build_jobs() if j[0] == "script-runner") name, schedule, endpoint, method, _timeout = target assert endpoint == "/api/scripts/run-due" assert method == "POST" ``` - [ ] **Step 6.2: Run — expect FAIL** ```bash pytest tests/test_scheduler_sidecar.py -v 2>&1 | tail -20 ``` Expected: ImportError on `build_jobs` / `resolved_tick_seconds`; or KeyError on `script-runner` (current JOBS list doesn't have it). - [ ] **Step 6.3: Refactor `services/scheduler/__main__.py`** Replace the hardcoded `JOBS` block (lines 72-89) and the `run()` function with the following. Keep everything above line 72 (imports, `_get_auth_token`, etc.) unchanged. ```python # --- Env parsing ------------------------------------------------------------ _DEFAULTS = { "SCHEDULER_DATA_REFRESH_INTERVAL": 15 * 60, # seconds "SCHEDULER_HEALTH_CHECK_INTERVAL": 5 * 60, "SCHEDULER_SCRIPT_RUN_INTERVAL": 1 * 60, "SCHEDULER_TICK_SECONDS": 30, } def _read_positive_int(name: str) -> int: """Read an env var as a positive integer or fall back to the default.""" raw = os.environ.get(name) if raw is None or raw == "": if name not in _DEFAULTS: raise ValueError(f"Unknown scheduler env var: {name}") return _DEFAULTS[name] try: value = int(raw) except (TypeError, ValueError): raise ValueError(f"{name}={raw!r} must be a positive integer (seconds)") if value <= 0: raise ValueError(f"{name}={value} must be > 0 (seconds)") return value def _seconds_to_schedule(seconds: int) -> str: """Convert a seconds value to the closest 'every Nm' / 'every Nh' string.""" if seconds % 3600 == 0 and seconds >= 3600: return f"every {seconds // 3600}h" minutes = max(1, seconds // 60) return f"every {minutes}m" def resolved_tick_seconds() -> int: """Read + validate SCHEDULER_TICK_SECONDS in isolation (test helper).""" return _read_positive_int("SCHEDULER_TICK_SECONDS") def build_jobs() -> list[tuple[str, str, str, str, int]]: """Build the JOBS list from env, applying defaults and validation. Tuple shape: (name, schedule_string, endpoint, method, http_timeout_sec). Marketplaces stays hardcoded — promoting it to env is out of #77 scope. """ refresh = _read_positive_int("SCHEDULER_DATA_REFRESH_INTERVAL") health = _read_positive_int("SCHEDULER_HEALTH_CHECK_INTERVAL") scripts = _read_positive_int("SCHEDULER_SCRIPT_RUN_INTERVAL") tick = _read_positive_int("SCHEDULER_TICK_SECONDS") smallest = min(refresh, health, scripts) if tick > smallest: raise ValueError( f"SCHEDULER_TICK_SECONDS={tick} must be <= the smallest job " f"interval ({smallest}s) so jobs don't consistently miss their " f"cadence by up to one tick" ) return [ ("data-refresh", _seconds_to_schedule(refresh), "/api/sync/trigger", "POST", 120), ("health-check", _seconds_to_schedule(health), "/api/health", "GET", 30), ("script-runner", _seconds_to_schedule(scripts), "/api/scripts/run-due", "POST", 600), ("marketplaces", "daily 03:00", "/api/marketplaces/sync-all", "POST", 900), ] _running = True def _signal_handler(sig, frame): global _running logger.info(f"Received signal {sig}, shutting down...") _running = False def _call_api(endpoint: str, method: str, timeout_sec: int) -> bool: """Call the main app API. Returns True on success.""" url = f"{API_URL}{endpoint}" headers = {} token = _get_auth_token() if token: headers["Authorization"] = f"Bearer {token}" try: if method == "POST": resp = httpx.post(url, headers=headers, timeout=timeout_sec) else: resp = httpx.get(url, headers=headers, timeout=timeout_sec) if resp.status_code < 400: logger.info(f"Job {endpoint}: {resp.status_code}") return True else: logger.warning(f"Job {endpoint}: HTTP {resp.status_code} - {resp.text[:200]}") return False except Exception as e: logger.error(f"Job {endpoint} failed: {e}") return False def run(): signal.signal(signal.SIGTERM, _signal_handler) signal.signal(signal.SIGINT, _signal_handler) jobs = build_jobs() tick = resolved_tick_seconds() logger.info( "Scheduler started. API_URL=%s, %d jobs, tick=%ds. Schedules: %s", API_URL, len(jobs), tick, {name: schedule for name, schedule, *_ in jobs}, ) last_run: dict[str, str | None] = {name: None for name, *_ in jobs} while _running: now_iso = datetime.now(timezone.utc).isoformat() for name, schedule, endpoint, method, timeout_sec in jobs: if not is_table_due(schedule, last_run[name]): continue logger.info("Running job: %s (%s)", name, schedule) ok = _call_api(endpoint, method, timeout_sec) if ok: last_run[name] = now_iso time.sleep(tick) logger.info("Scheduler stopped.") if __name__ == "__main__": run() ``` (Delete the old `JOBS = [...]` literal and the old `run()` body — they're fully replaced.) - [ ] **Step 6.4: Run — expect green** ```bash pytest tests/test_scheduler_sidecar.py -v 2>&1 | tail -20 ``` Expected: green. - [ ] **Step 6.5: Commit** ```bash git add services/scheduler/__main__.py tests/test_scheduler_sidecar.py git commit -m "feat(scheduler): env-driven JOBS + script-runner tick (#77, #78)" ``` --- ## Task 7: OpenMetadata client — TLS verify by default **Files:** - Modify: `connectors/openmetadata/client.py` - Test: extend with a small tests file (or add to whatever existing test the connector has — search first). **Why:** `verify=False` ships JWT bearer tokens over an unauthenticated channel; the module-level `warnings.filterwarnings` mutates global state. Mirror the pattern in `connectors/llm/openai_compat.py` which already gets this right. - [ ] **Step 7.1: Locate existing OpenMetadata test file (if any)** ```bash ls tests/ | grep -i openmetadata ``` If empty, create `tests/test_openmetadata_client.py`. If a file exists, extend it. - [ ] **Step 7.2: Write failing tests** Create (or extend) `tests/test_openmetadata_client.py`: ```python """Tests for OpenMetadataClient TLS handling — see #89. The previous version disabled TLS verification globally and suppressed the "Unverified HTTPS request" warning at import time. Both behaviors are fixed here. """ import warnings from unittest.mock import patch def test_client_verifies_tls_by_default(): from connectors.openmetadata.client import OpenMetadataClient with patch("connectors.openmetadata.client.httpx.Client") as mock_client: OpenMetadataClient(base_url="https://catalog.example.com", token="t") kwargs = mock_client.call_args.kwargs assert kwargs["verify"] is True def test_client_accepts_explicit_verify_false(): """Operators on internal CAs may opt out — but it must be explicit.""" from connectors.openmetadata.client import OpenMetadataClient with patch("connectors.openmetadata.client.httpx.Client") as mock_client: OpenMetadataClient(base_url="https://catalog.example.com", token="t", verify=False) assert mock_client.call_args.kwargs["verify"] is False def test_client_accepts_custom_ca_bundle_path(): from connectors.openmetadata.client import OpenMetadataClient with patch("connectors.openmetadata.client.httpx.Client") as mock_client: OpenMetadataClient( base_url="https://catalog.example.com", token="t", verify="/etc/ssl/certs/internal-ca.pem", ) assert mock_client.call_args.kwargs["verify"] == "/etc/ssl/certs/internal-ca.pem" def test_module_import_does_not_mutate_global_warnings_filter(): """The previous version called warnings.filterwarnings('ignore', ...) at import time, which suppresses urllib3 warnings for ALL httpx clients in the process — not just OpenMetadata's. Drop it.""" import importlib pre_filters = list(warnings.filters) import connectors.openmetadata.client as om importlib.reload(om) post_filters = list(warnings.filters) # No new "ignore Unverified HTTPS request" filter should have been added. new = [f for f in post_filters if f not in pre_filters] for action, message, *_ in new: if message is not None: assert "Unverified HTTPS request" not in message.pattern ``` - [ ] **Step 7.3: Run — expect FAIL** ```bash pytest tests/test_openmetadata_client.py -v 2>&1 | tail -20 ``` Expected: failures — `verify=False` is hardcoded, and the module-level `warnings.filterwarnings` runs at import. - [ ] **Step 7.4: Fix the client** In `connectors/openmetadata/client.py`: Delete lines 14 (`import warnings`) and 18-19 (the `warnings.filterwarnings(...)` call and its comment). Replace the `__init__` signature (lines 34-59) with: ```python def __init__( self, base_url: str, token: str, timeout: int = 30, verify: bool | str = True, ): """ Initialize OpenMetadata API client. Args: base_url: Base URL of OpenMetadata instance (e.g., "https://catalog.example.com") token: JWT bearer token for authentication timeout: HTTP request timeout in seconds verify: TLS verification — True (default), False to disable (e.g., for self-signed certificates on internal CAs), or a path to a CA bundle. The previous version hardcoded False globally and suppressed warnings — both removed in #89. Operators with self-signed certs should pass an explicit ``verify=False`` or a CA bundle path from their config. """ self.base_url = base_url.rstrip("/") self.token = token self.timeout = timeout self._client = httpx.Client( base_url=self.base_url, headers={ "Authorization": f"Bearer {token}", "Content-Type": "application/json", }, timeout=timeout, verify=verify, ) ``` - [ ] **Step 7.5: Run — expect green** ```bash pytest tests/test_openmetadata_client.py -v 2>&1 | tail -20 ``` Expected: green. - [ ] **Step 7.6: Audit existing call sites** ```bash grep -rn "OpenMetadataClient(" --include="*.py" . ``` Any call site that previously relied on the implicit `verify=False` will now hit a TLS error if it talks to a self-signed instance. Update each call site to pass `verify=` explicitly from the config (e.g., reading `OPENMETADATA_VERIFY_SSL` from instance.yaml or env). If no internal config flag exists yet, add one to `instance.yaml.example` and surface it in `config/loader.py` so operators have a tuning knob. **List every changed call site in the commit message.** - [ ] **Step 7.7: Commit** ```bash git add connectors/openmetadata/client.py tests/test_openmetadata_client.py [...any caller files updated in 7.6] git commit -m "fix(openmetadata): verify TLS by default; drop module-level warning filter (#89)" ``` --- ## Task 8: Documentation + `.env.template` updates **Files:** - Modify: `config/.env.template` - Modify: `docs/DEPLOYMENT.md` **Why:** The new env vars are operator-facing surface — they need to be discoverable without spelunking source. - [ ] **Step 8.1: Add the env vars to `config/.env.template`** Append a new section to `config/.env.template`: ```ini # ── SCHEDULER (sidecar tuning) ────────────────────── # All values are in seconds and must be positive integers. SCHEDULER_TICK_SECONDS # must be <= the smallest job interval below. # SCHEDULER_DATA_REFRESH_INTERVAL=900 # default 15 min — POST /api/sync/trigger # SCHEDULER_HEALTH_CHECK_INTERVAL=300 # default 5 min — GET /api/health # SCHEDULER_SCRIPT_RUN_INTERVAL=60 # default 1 min — POST /api/scripts/run-due # SCHEDULER_TICK_SECONDS=30 # default 30 s — loop polling cadence ``` - [ ] **Step 8.2: Add a "Scheduler tuning" subsection to `docs/DEPLOYMENT.md`** Find the most appropriate location (probably near the existing TLS / Docker compose section) and insert: ```markdown ### Scheduler tuning The scheduler sidecar (`services/scheduler/__main__.py`) fires periodic HTTP calls against the main app. Job cadences are configurable via env vars on the scheduler container: | Env var | Default | Purpose | | ---------------------------------- | ------- | --------------------------------------------- | | `SCHEDULER_DATA_REFRESH_INTERVAL` | `900` | seconds between `POST /api/sync/trigger` | | `SCHEDULER_HEALTH_CHECK_INTERVAL` | `300` | seconds between `GET /api/health` | | `SCHEDULER_SCRIPT_RUN_INTERVAL` | `60` | seconds between `POST /api/scripts/run-due` | | `SCHEDULER_TICK_SECONDS` | `30` | loop polling cadence; must be ≤ smallest interval above | `/api/sync/trigger` walks `table_registry`; tables with a per-row `sync_schedule` (`every Nm` / `every Nh` / `daily HH:MM[,...]`) are filtered to only those due for sync since their last run. Tables without a schedule continue to run on every tick. The marketplace job runs at `daily 03:00` UTC and is not currently env-tunable. `/api/scripts/run-due` walks `script_registry` and runs each deployed script whose `schedule` says it is due. Scripts in the `running` state are skipped on subsequent ticks until the previous run writes a terminal status. The endpoint requires admin auth (the sidecar's `SCHEDULER_API_TOKEN` resolves to a synthetic Admin user). ``` - [ ] **Step 8.3: Commit** ```bash git add config/.env.template docs/DEPLOYMENT.md git commit -m "docs: document scheduler env vars + per-table/script schedules (#77, #78, #79)" ``` --- ## Task 9: CHANGELOG entries + release cut **Files:** - Modify: `CHANGELOG.md` - Modify: `pyproject.toml` **Why:** Per CLAUDE.md, every user-visible change requires a CHANGELOG entry in the same PR. This is one bundled PR covering four issues; release-cut goes here as the last commit (per user convention: release-cut belongs in the next behavior-change PR, not a standalone one). - [ ] **Step 9.1: Edit `CHANGELOG.md`** Replace the current top-of-file structure: ```markdown ## [Unreleased] ## [0.18.0] — 2026-04-29 ... ``` with: ```markdown ## [Unreleased] ## [0.19.0] — 2026-04-29 ### Added - `table_registry.sync_schedule` is now honored at runtime. `POST /api/sync/trigger` (called by the scheduler sidecar every 15 min by default) drops local tables whose schedule says they are not due. Tables without a schedule continue to sync on every tick (opt-in feature). Manual `POST /api/sync/trigger {"tables":[...]}` bypasses the schedule filter — operator override always wins. (#79) - `script_registry.schedule` is now honored at runtime via the new endpoint `POST /api/scripts/run-due` (admin-only). The scheduler sidecar fires this every 60 s by default. Each due script is claimed atomically (`last_status='running'`), executed in a BackgroundTask, and the outcome written to `last_run` / `last_status`. Scripts already in `running` state are skipped — no concurrent runs of the same script. (#78) - Four new env vars on the scheduler sidecar: `SCHEDULER_DATA_REFRESH_INTERVAL`, `SCHEDULER_HEALTH_CHECK_INTERVAL`, `SCHEDULER_SCRIPT_RUN_INTERVAL`, `SCHEDULER_TICK_SECONDS`. All accept positive integers (seconds); tick must be ≤ smallest job interval. Documented in `docs/DEPLOYMENT.md` → Scheduler tuning. (#77) - `RegisterTableRequest.sync_schedule`, `UpdateTableRequest.sync_schedule`, and `DeployScriptRequest.schedule` now reject malformed strings with a Pydantic 422 (e.g. `"hourly"`, `"daily 25:00"`). The accepted forms are unchanged: `every Nm`, `every Nh`, `daily HH:MM[,HH:MM,...]`. (#78, #79) ### Changed - `OpenMetadataClient` now defaults to `verify=True` for TLS. The previous version hardcoded `verify=False` and suppressed urllib3's "Unverified HTTPS request" warning at import time (which leaked to every other httpx client in the process). Operators on internal CAs must pass `verify=False` or a CA bundle path explicitly. **Existing deployments on self-signed certificates without an explicit opt-out will start failing TLS verification — set `verify=False` at the call site, or supply a CA bundle, before upgrading.** (#89) ### Internal - `src/scheduler.py` now exports `is_valid_schedule(s)` and `filter_due_tables(configs, sync_state_repo)` for reuse across the sync filter, the script runner, and Pydantic validators. - `ScriptRepository` gains `claim_for_run(script_id)` and `record_run_result(script_id, status)` — the atomic primitives for the scheduled-script execution path. ``` - [ ] **Step 9.2: Bump version** In `pyproject.toml`, change: ```toml version = "0.18.0" ``` to: ```toml version = "0.19.0" ``` - [ ] **Step 9.3: Commit + tag (tag pushed by maintainer post-merge)** ```bash git add CHANGELOG.md pyproject.toml git commit -m "chore(release): cut 0.19.0 — scheduler re-wire + OpenMetadata TLS" ``` (Do NOT push a `v0.19.0` git tag from the worktree. Per the user's convention, the tag is created on the merge commit on `main` and a GitHub Release is opened to mirror it.) --- ## Task 10: Final verification **Files:** none modified. - [ ] **Step 10.1: Run the full test suite** ```bash pytest tests/ -x 2>&1 | tail -40 ``` Expected: green. If any unrelated test fails, investigate before declaring done — possible interaction with the import-order changes in `app/api/sync.py` or the new `field_validator` in `app/api/admin.py`. - [ ] **Step 10.2: Smoke-test the import surface** ```bash python -c "from app.main import app; from services.scheduler.__main__ import build_jobs; print('jobs:', [j[0] for j in build_jobs()])" ``` Expected output: `jobs: ['data-refresh', 'health-check', 'script-runner', 'marketplaces']`. Any ImportError indicates a missing import added in this PR. - [ ] **Step 10.3: Open the PR** ```bash git push -u origin worktree-issues-68-77-78-79-89 gh pr create --title "feat(scheduler): honor sync_schedule + script schedule; tune via env; OpenMetadata TLS" --body "$(cat <<'EOF' ## Summary Bundles four scheduler / security issues: - **#79** — `table_registry.sync_schedule` is now honored at runtime via an API-side filter inside `_run_sync()`. Tables without a schedule continue to sync on every tick; manual `POST /api/sync/trigger {"tables":[...]}` bypasses the filter. - **#78** — New endpoint `POST /api/scripts/run-due` runs deployed scripts whose `schedule` says they are due. Atomic claim via `last_status='running'`; results written via BackgroundTask. - **#77** — Sidecar JOBS list is now built from env (`SCHEDULER_*_INTERVAL`, `SCHEDULER_TICK_SECONDS`). Validation: positive ints, tick ≤ smallest interval. Adds a 4th `script-runner` job for #78. - **#89** — `OpenMetadataClient` defaults to `verify=True`. Module-level `warnings.filterwarnings` removed. Issue **#68** is intentionally NOT in scope — the referenced Stop hook script does not live in this OSS repo as of HEAD; the issue needs clarification before implementation. ## Test plan - [ ] `pytest tests/` passes - [ ] Manual: register a table with `sync_schedule="every 1h"`, sync it, then trigger sync within the hour — confirm log line `Table X skipped: schedule=...` - [ ] Manual: deploy a script with `schedule="every 1m"`, wait, confirm `last_run` and `last_status` populate - [ ] Manual: set `SCHEDULER_TICK_SECONDS=99999` → scheduler container fails to start with the validation error - [ ] Manual: any internal OpenMetadata caller now passes `verify=False` (or a CA bundle path) explicitly EOF )" ``` --- ## Self-review checklist (run before declaring plan-write done) - **Spec coverage:** #77 ✓ (Task 6), #78 ✓ (Tasks 4–6), #79 ✓ (Tasks 1–3), #89 ✓ (Task 7), #68 ✗ (intentionally out of scope, documented in plan header). All accepted. - **Placeholder scan:** none of the "TBD / fill in / similar to" forbidden phrases. Code blocks present in every implementation step. - **Type consistency:** `claim_for_run` / `record_run_result` referenced in Tasks 4 and 5 with matching signatures. `filter_due_tables` referenced in Tasks 1 and 2 with matching signature. `is_valid_schedule` referenced in Tasks 1, 3, 5 with consistent contract. `build_jobs` and `resolved_tick_seconds` defined and used in Task 6 only. - **Schema migration:** no migration. Verified `table_registry.sync_schedule` and `script_registry.{schedule,last_run,last_status}` already exist in v17.