Commit graph

3 commits

Author SHA1 Message Date
ZdenekSrotyr
fa3a76a528 fix(scheduler): single env var drives cadence + grace (#179 review)
Devin NOTABLE: SCHEDULER_VERIFICATION_DETECTOR_INTERVAL was already
read by app/api/health.py to compute the staleness grace window, but
the actual scheduler cadence was hardcoded to 'every 15m'. The env
var name implied it controlled the cadence — it didn't. An operator
throttling the detector via the env was silently ignored by the
scheduler while the health grace silently widened.

Wired the env var into both ends. Same pattern applied to the other
two LLM-pipeline jobs:
- SCHEDULER_SESSION_COLLECTOR_INTERVAL     (default 600s = 10m)
- SCHEDULER_VERIFICATION_DETECTOR_INTERVAL (default 900s = 15m)
- SCHEDULER_CORPORATE_MEMORY_INTERVAL      (default 1020s = 17m)

Defaults preserve the existing 10m / 15m / 17m coprime offset so the
three jobs don't fire on the same tick.

build_jobs() now reads all three through _read_positive_int (matching
the existing pattern for data-refresh / health-check / script-runner)
and feeds them to _seconds_to_schedule. The smallest-interval check
includes the new variables so an operator can't accidentally set a
tick larger than any LLM cadence.

New tests in tests/test_scheduler.py:
- TestLLMPipelineCadenceEnvVars: env override changes the schedule
  string at scheduler-init time, with parametrized invalid-value
  rejection.
- TestVerificationDetectorGraceFollowsCadence: pinning the
  single-source-of-truth contract — same env var moves both the
  scheduler cadence and the health-check grace.
2026-05-05 05:59:18 +02:00
Petr
5f27d05894 Support multiple daily sync times (e.g., "daily 07:00,13:00,18:00")
Scheduler now accepts comma-separated HH:MM times in daily schedules.
Each time slot is independently evaluated - if any slot has passed and
last_sync is before it, the table is marked as due.

This lets tables sync multiple times per day to pick up data refreshes
that happen throughout the day (e.g., Keboola pipelines running 3x/day).
2026-03-16 23:09:48 +01:00
Petr
80c5b902e0 Add scheduled data sync and catalog refresh with systemd timers
- New sync_schedule and profile_after_sync fields in TableConfig
  (formats: "every 15m", "every 1h", "daily 05:00")
- New src/scheduler.py with schedule evaluation logic (is_table_due)
- New --scheduled mode in data_sync.py: only syncs tables that are due,
  respects profile_after_sync flag, auto-restarts webapp after profiling
- Systemd timer+service for data-refresh (every 15 min)
- Systemd timer+service for catalog-refresh (every 15 min)
- deploy.sh enables new timers automatically
- Complete table config reference in data_description.md.example
- 58 new scheduler tests
2026-03-15 02:16:31 +01:00