Add scheduled data sync and catalog refresh with systemd timers

- New sync_schedule and profile_after_sync fields in TableConfig
  (formats: "every 15m", "every 1h", "daily 05:00")
- New src/scheduler.py with schedule evaluation logic (is_table_due)
- New --scheduled mode in data_sync.py: only syncs tables that are due,
  respects profile_after_sync flag, auto-restarts webapp after profiling
- Systemd timer+service for data-refresh (every 15 min)
- Systemd timer+service for catalog-refresh (every 15 min)
- deploy.sh enables new timers automatically
- Complete table config reference in data_description.md.example
- 58 new scheduler tests

parent d9f3977028
commit 80c5b902e0
10 changed files with 846 additions and 32 deletions

@@ -11,30 +11,127 @@ folder_mapping:
   "in.c-example": "example"
 
 tables:
+  # Small reference table - full refresh, no automatic sync
   - id: "in.c-example.customers"
     name: "customers"
     description: "Customer master data"
     primary_key: "id"
     sync_strategy: "full_refresh"
 
+  # Large transactional table - daily automatic sync with profiling
   - id: "in.c-example.orders"
     name: "orders"
     description: "Order transactions with line items"
     primary_key: "id"
-    sync_strategy: "incremental"
-    incremental_window_days: 7
+    sync_strategy: "partitioned"
     partition_by: "created_at"
-    partition_granularity: "month"
+    partition_column_type: "DATE"
+    partition_granularity: "day"
+    incremental_window_days: 3
+    max_history_days: 450
+    query_mode: "local"
+    sync_schedule: "daily 05:00"
+    profile_after_sync: true
+
+  # Frequently updated table - sync every hour, skip profiling
+  - id: "in.c-example.events"
+    name: "events"
+    description: "Real-time event stream"
+    primary_key: "event_id"
+    sync_strategy: "partitioned"
+    partition_by: "event_date"
+    partition_column_type: "DATE"
+    partition_granularity: "day"
+    incremental_window_days: 1
+    query_mode: "local"
+    sync_schedule: "every 1h"
+    profile_after_sync: false
```
|
```
|
 
-## Sync Strategies
+## Table Configuration Reference
 
-- **full_refresh**: Downloads entire table on each sync. Best for small reference tables.
-- **incremental**: Downloads only new/changed rows based on a date column. Best for large transactional tables.
+### Required Fields
 
-## Partition Granularity
+| Field | Description | Example |
+|-------|-------------|---------|
+| `id` | Full table identifier in data source | `"in.c-crm.company"` |
+| `name` | Short name (used for Parquet filenames) | `"company"` |
+| `description` | Human-readable description | `"Company master data"` |
+| `primary_key` | Primary key column(s), comma-separated | `"id"` or `"order_id, line_id"` |
+| `sync_strategy` | How data is downloaded (see below) | `"full_refresh"` |
 
-When using `partition_by`, data is split into separate Parquet files by time period:
-- **month**: One file per month (e.g., `orders/2024-01.parquet`)
-- **day**: One file per day (e.g., `events/2024-01-15.parquet`)
-- **none**: Single file (default)
+### Sync Strategy
+
+| Strategy | Description | Use for |
+|----------|-------------|---------|
+| `full_refresh` | Downloads entire table each sync | Small reference tables (< 100K rows) |
+| `incremental` | Downloads changed rows via changedSince | Medium tables with update tracking |
+| `partitioned` | Downloads by time partitions, overwrites only recent ones | Large tables with date column |
+
+### Partitioning
+
+| Field | Default | Description |
+|-------|---------|-------------|
+| `partition_by` | *(none)* | Column to partition by (e.g., `"created_at"`, `"event_date"`) |
+| `partition_granularity` | `"month"` | `"day"`, `"month"`, or `"year"` |
+| `partition_column_type` | `"TIMESTAMP"` | SQL type: `"DATE"`, `"TIMESTAMP"`, or `"DATETIME"` |
+| `incremental_window_days` | `7` | How many recent days to re-download on each sync |
+| `max_history_days` | *(all)* | Maximum history to keep (e.g., `450` for ~15 months) |
+| `initial_load_chunk_days` | `30` | Chunk size for first-time download |
+
+### Query Mode
+
+| Field | Default | Description |
+|-------|---------|-------------|
+| `query_mode` | `"local"` | How the AI agent queries this table |
+
+| Mode | Description | Best for |
+|------|-------------|----------|
+| `local` | Synced to Parquet, queried via DuckDB | Tables < 2 GB, fast queries |
+| `remote` | Not synced, queried via BigQuery | Huge tables (100+ GB), live data |
+| `hybrid` | Subset synced for profiling, queries go to BigQuery | Medium tables needing live data |
+
+### Automatic Sync Schedule
+
+| Field | Default | Description |
+|-------|---------|-------------|
+| `sync_schedule` | *(none)* | When to automatically sync this table |
+| `profile_after_sync` | `true` | Run data profiler after sync completes |
+
+The `sync_schedule` field controls automatic synchronization via the `data-refresh`
+systemd timer (runs every 15 minutes). If omitted, the table is only synced manually.
+
+**Schedule formats:**
+
+| Format | Example | Description |
+|--------|---------|-------------|
+| `every {N}m` | `"every 15m"`, `"every 30m"` | Sync every N minutes |
+| `every {N}h` | `"every 1h"`, `"every 6h"` | Sync every N hours |
+| `daily HH:MM` | `"daily 05:00"`, `"daily 17:30"` | Sync once per day at HH:MM UTC |
+| *(omitted)* | - | Manual sync only (`python -m src.data_sync`) |
+
+**How scheduling works:**
+- A systemd timer runs `python -m src.data_sync --scheduled` every 15 minutes
+- For each table with `sync_schedule`, it checks the last sync time from `sync_state.json`
+- `every` schedules: syncs if enough time has elapsed since last sync
+- `daily` schedules: syncs once after the target time passes (skips if already synced today)
+- Tables without `sync_schedule` are never synced automatically
+
+**Profiling control:**
+- `profile_after_sync: true` (default) - runs profiler after sync to update column statistics
+- `profile_after_sync: false` - skips profiler (use for frequently synced tables where
+  profiling overhead is not worth it; the AI agent uses slightly older statistics)
+- When profiling runs, the webapp is automatically restarted to load new statistics
+
+### Optional Fields
+
+| Field | Default | Description |
+|-------|---------|-------------|
+| `folder` | *(from folder_mapping)* | Override output folder name |
+| `row_filter` | *(none)* | SQL WHERE clause (e.g., `"date >= DATE_SUB(CURRENT_DATE(), INTERVAL 15 MONTH)"`) |
+| `columns` | *(all)* | List of columns to sync (subset) |
+| `incremental_column` | *(none)* | Column for timestamp-based incremental sync (BigQuery) |
+| `dataset` | *(none)* | Dataset group name for on-demand tables |
+| `catalog_fqn` | *(auto)* | OpenMetadata FQN override (auto-derived from table ID if not set) |
+| `foreign_keys` | `[]` | List of foreign key relationships |
+| `where_filters` | `[]` | List of filters for Keboola Storage API |

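
Because the `data-refresh` timer only fires every 15 minutes, a table can be synced only on a timer tick, so an `every {N}m` schedule that is not a multiple of 15 is effectively rounded up to the next tick. The sketch below simulates that interaction. It is an illustration under simplifying assumptions (syncs finish instantly, ticks land exactly on schedule), and the function name `simulate_sync_ticks` is hypothetical, not part of this commit.

```python
from typing import List, Optional


def simulate_sync_ticks(
    interval_min: int, tick_min: int = 15, horizon_min: int = 90
) -> List[int]:
    """Return the tick times (minutes from start) at which a sync would run.

    Mirrors the documented rule: a never-synced table is due immediately;
    afterwards it is due once at least `interval_min` minutes have elapsed.
    """
    synced_at: List[int] = []
    last_sync: Optional[int] = None  # None means "never synced"
    for tick in range(0, horizon_min + 1, tick_min):
        if last_sync is None or tick - last_sync >= interval_min:
            synced_at.append(tick)
            last_sync = tick
    return synced_at


if __name__ == "__main__":
    print(simulate_sync_ticks(15))  # every tick
    print(simulate_sync_ticks(20))  # every second tick, i.e. every 30 minutes
    print(simulate_sync_ticks(60))  # every fourth tick
```

On a real host the recorded sync time also depends on how long the sync runs and on `RandomizedDelaySec`, so a schedule equal to the tick length (`every 15m`) may slip to the following tick if slightly less than 15 minutes have elapsed when the timer fires.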
@@ -363,7 +363,7 @@ if [[ -n "${DESKTOP_JWT_SECRET:-}" ]] && ! systemctl is-active --quiet ws-gatewa
 fi
 
 # Enable timers (only if service files exist)
-for timer in corporate-memory session-collector jira-sla-poll jira-consistency jira-consistency-deep; do
+for timer in corporate-memory session-collector jira-sla-poll jira-consistency jira-consistency-deep data-refresh catalog-refresh; do
     if [[ -f "/etc/systemd/system/${timer}.timer" ]]; then
         if ! systemctl is-enabled --quiet "${timer}.timer" 2>/dev/null; then
             log "Enabling ${timer} timer..."

services/catalog_refresh/systemd/catalog-refresh.service (Normal file, 27 lines changed)
@@ -0,0 +1,27 @@
+[Unit]
+Description=Catalog Refresh - export metrics and tables from OpenMetadata to YAML
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=oneshot
+User=root
+Group=data-ops
+WorkingDirectory=/opt/data-analyst/repo
+ExecStart=/opt/data-analyst/.venv/bin/python3 -m src.catalog_export
+
+# Environment
+EnvironmentFile=/opt/data-analyst/.env
+Environment=PYTHONPATH=/opt/data-analyst/repo
+Environment=CONFIG_DIR=/opt/data-analyst/instance/config
+
+# Write access to docs output directory
+ProtectSystem=strict
+ReadWritePaths=/data/docs /opt/data-analyst/logs
+PrivateTmp=true
+
+# Catalog export is fast (seconds)
+TimeoutStartSec=120
+
+[Install]
+WantedBy=multi-user.target

services/catalog_refresh/systemd/catalog-refresh.timer (Normal file, 11 lines changed)
@@ -0,0 +1,11 @@
+[Unit]
+Description=Run Catalog Refresh every 15 minutes
+
+[Timer]
+OnBootSec=1min
+OnUnitActiveSec=15min
+RandomizedDelaySec=30
+Persistent=true
+
+[Install]
+WantedBy=timers.target

services/data_refresh/systemd/data-refresh.service (Normal file, 33 lines changed)
@@ -0,0 +1,33 @@
+[Unit]
+Description=Data Refresh - scheduled sync from BigQuery to local Parquet
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=oneshot
+User=root
+Group=data-ops
+WorkingDirectory=/opt/data-analyst/repo
+ExecStart=/opt/data-analyst/.venv/bin/python3 -m src.data_sync --scheduled
+
+# Environment
+EnvironmentFile=/opt/data-analyst/.env
+Environment=PYTHONPATH=/opt/data-analyst/repo
+Environment=CONFIG_DIR=/opt/data-analyst/instance/config
+
+# Write access to data directory and logs
+ProtectSystem=strict
+ReadWritePaths=/data /opt/data-analyst/logs /tmp/data_analyst_staging
+PrivateTmp=false
+
+# Sync can take a while for large tables
+TimeoutStartSec=3600
+
+# Prevent overlapping runs
+ExecCondition=/usr/bin/test ! -f /tmp/data-refresh.lock
+ExecStartPre=/usr/bin/touch /tmp/data-refresh.lock
+ExecStartPost=/usr/bin/rm -f /tmp/data-refresh.lock
+ExecStopPost=/usr/bin/rm -f /tmp/data-refresh.lock
+
+[Install]
+WantedBy=multi-user.target

services/data_refresh/systemd/data-refresh.timer (Normal file, 11 lines changed)
@@ -0,0 +1,11 @@
+[Unit]
+Description=Run Data Refresh every 15 minutes
+
+[Timer]
+OnBootSec=3min
+OnUnitActiveSec=15min
+RandomizedDelaySec=30
+Persistent=true
+
+[Install]
+WantedBy=timers.target

@@ -86,6 +86,8 @@ class TableConfig:
         max_history_days: Max days of history for initial incremental load (None = download all)
         dataset: Dataset group name for on-demand tables (e.g., "kbc_telemetry_expert")
         initial_load_chunk_days: Chunk size in days for chunked initial load (default: 30)
+        sync_schedule: Schedule for automatic sync: "every 15m", "every 1h", "daily 05:00" (UTC)
+        profile_after_sync: Run profiler after sync (default True; disable for frequently synced tables)
     """
     id: str
     name: str
@@ -107,6 +109,8 @@ class TableConfig:
     query_mode: str = "local"  # "local" (Parquet) | "remote" (BQ direct) | "hybrid" (sync subset, query BQ)
     partition_column_type: str = "TIMESTAMP"  # BQ SQL type for partition column: "DATE", "TIMESTAMP", "DATETIME"
     catalog_fqn: Optional[str] = None  # Explicit OpenMetadata FQN override (auto-derived if not set)
+    sync_schedule: Optional[str] = None  # Schedule: "every 15m", "every 1h", "daily 05:00" (UTC)
+    profile_after_sync: bool = True  # Run profiler after sync (disable for frequently synced tables)
 
     def __post_init__(self):
         """Validate configuration after initialization."""
@@ -158,6 +162,19 @@ class TableConfig:
                 f"Allowed values: {', '.join(valid_column_types)}"
             )
 
+        # Validate sync_schedule format
+        if self.sync_schedule:
+            import re as _re
+            valid_schedule = (
+                _re.match(r"^every \d+[mh]$", self.sync_schedule)
+                or _re.match(r"^daily \d{2}:\d{2}$", self.sync_schedule)
+            )
+            if not valid_schedule:
+                raise ValueError(
+                    f"Invalid sync_schedule '{self.sync_schedule}' for table {self.id}. "
+                    f"Allowed formats: 'every 15m', 'every 1h', 'daily 05:00'"
+                )
+
         # For partitioned, partition_by must be defined
         if self.sync_strategy == "partitioned":
             if not self.partition_by:
@@ -457,6 +474,8 @@ class Config:
                     query_mode=table_data.get("query_mode", "local"),
                     partition_column_type=table_data.get("partition_column_type", "TIMESTAMP"),
                     catalog_fqn=table_data.get("catalog_fqn"),
+                    sync_schedule=table_data.get("sync_schedule"),
+                    profile_after_sync=table_data.get("profile_after_sync", True),
                 )
                 table_configs.append(config)
 

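
The two regular expressions in `__post_init__` check only the shape of the string, so a value such as `"daily 25:00"` passes validation even though `datetime.replace(hour=25)` would later raise, and `"every 0m"` passes although it makes a table due on every tick. A stricter check could look like the following sketch. `validate_sync_schedule` is a hypothetical helper, not part of this commit, and rejecting a zero interval is an assumption about the desired behaviour.

```python
import re

# Same shapes as in the commit, applied with fullmatch so no anchors are needed.
_INTERVAL_RE = re.compile(r"every (\d+)([mh])")
_DAILY_RE = re.compile(r"daily (\d{2}):(\d{2})")


def validate_sync_schedule(schedule: str) -> bool:
    """Return True if `schedule` is well-formed and within sensible ranges."""
    interval = _INTERVAL_RE.fullmatch(schedule)
    if interval:
        # Assumption: an interval of zero is treated as a configuration error.
        return int(interval.group(1)) > 0

    daily = _DAILY_RE.fullmatch(schedule)
    if daily:
        hour, minute = int(daily.group(1)), int(daily.group(2))
        return 0 <= hour <= 23 and 0 <= minute <= 59

    return False


if __name__ == "__main__":
    for candidate in ("every 15m", "daily 05:00", "daily 25:00", "every 0m", "hourly"):
        print(f"{candidate!r}: {validate_sync_schedule(candidate)}")
```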
src/data_sync.py (177 lines changed)
@@ -481,24 +481,152 @@ class DataSyncManager:
 
         # Auto-profile changed tables
         if success_count > 0:
-            try:
-                from src.profiler import profile_changed_tables
-                changed = [
-                    self.config.get_table_config(tid).name
-                    for tid, r in results.items()
-                    if r.get("success") and self.config.get_table_config(tid)
-                ]
-                if changed:
-                    result = profile_changed_tables(changed)
-                    logger.info(
-                        f"Auto-profiling: {result['success']} profiled, "
-                        f"{result['errors']} errors, {result['skipped']} skipped"
-                    )
-            except Exception as e:
-                logger.warning(f"Auto-profiling failed (non-fatal): {e}")
+            self._auto_profile(results)
 
         return results
 
+    def _auto_profile(
+        self,
+        results: Dict[str, Dict[str, Any]],
+        skip_tables: Optional[List[str]] = None,
+    ):
+        """Run profiler on successfully synced tables.
+
+        Args:
+            results: Sync results dict {table_id: result}
+            skip_tables: Table IDs to skip profiling for
+        """
+        skip_set = set(skip_tables or [])
+        try:
+            from src.profiler import profile_changed_tables
+            changed = [
+                self.config.get_table_config(tid).name
+                for tid, r in results.items()
+                if r.get("success")
+                and self.config.get_table_config(tid)
+                and tid not in skip_set
+            ]
+            if changed:
+                result = profile_changed_tables(changed)
+                logger.info(
+                    f"Auto-profiling: {result['success']} profiled, "
+                    f"{result['errors']} errors, {result['skipped']} skipped"
+                )
+            else:
+                logger.info("No tables to profile (all skipped or none succeeded)")
+        except Exception as e:
+            logger.warning(f"Auto-profiling failed (non-fatal): {e}")
+
+    def sync_scheduled(self) -> Dict[str, Dict[str, Any]]:
+        """Synchronize only tables whose sync_schedule says they are due.
+
+        Evaluates each table's sync_schedule against its last_sync timestamp.
+        Only syncs tables that are due. Respects profile_after_sync flag.
+
+        Returns:
+            Dictionary {table_id: result} with sync results (only for synced tables)
+        """
+        from src.scheduler import is_table_due
+
+        scheduled_tables = [
+            tc for tc in self.config.tables
+            if tc.sync_schedule and tc.query_mode != "remote"
+        ]
+
+        if not scheduled_tables:
+            logger.info("No tables with sync_schedule configured")
+            return {}
+
+        # Evaluate which tables are due
+        due_tables = []
+        for tc in scheduled_tables:
+            last_sync = self.sync_state.get_last_sync(tc.id)
+            if is_table_due(tc.sync_schedule, last_sync):
+                due_tables.append(tc)
+                logger.info(f"Table {tc.name} is DUE (schedule: {tc.sync_schedule})")
+            else:
+                logger.debug(f"Table {tc.name} is not due (schedule: {tc.sync_schedule})")
+
+        if not due_tables:
+            logger.info(
+                f"Checked {len(scheduled_tables)} scheduled tables, none are due"
+            )
+            return {}
+
+        logger.info(
+            f"Syncing {len(due_tables)}/{len(scheduled_tables)} due tables: "
+            f"{', '.join(tc.name for tc in due_tables)}"
+        )
+
+        # Sync due tables
+        results = {}
+        for table_config in due_tables:
+            try:
+                result = self.data_source.sync_table(table_config, self.sync_state)
+                results[table_config.id] = result
+                if result["success"]:
+                    logger.info(
+                        f"  {table_config.name}: {result['rows']:,} rows, "
+                        f"{result['file_size_mb']:.2f} MB"
+                    )
+                else:
+                    logger.error(f"  {table_config.name}: {result['error']}")
+            except Exception as e:
+                logger.error(f"  {table_config.name}: sync failed: {e}")
+                results[table_config.id] = {"success": False, "error": str(e)}
+
+        success_count = sum(1 for r in results.values() if r["success"])
+        logger.info(f"Scheduled sync: {success_count}/{len(results)} tables successful")
+
+        # Generate schema.yml
+        if success_count > 0:
+            try:
+                self._generate_schema_yaml()
+            except Exception as e:
+                logger.warning(f"Failed to generate schema.yml: {e}")
+
+        # Profile only tables with profile_after_sync=True
+        skip_profiler = [
+            tc.id for tc in due_tables if not tc.profile_after_sync
+        ]
+        if skip_profiler:
+            logger.info(
+                f"Skipping profiler for: "
+                f"{', '.join(self.config.get_table_config(tid).name for tid in skip_profiler)}"
+            )
+
+        profiled_any = False
+        if success_count > 0:
+            tables_to_profile = [
+                tid for tid, r in results.items()
+                if r.get("success") and tid not in set(skip_profiler)
+            ]
+            if tables_to_profile:
+                self._auto_profile(results, skip_tables=skip_profiler)
+                profiled_any = True
+
+        # Restart webapp if profiler ran (new profiles.json needs reload)
+        if profiled_any:
+            self._restart_webapp()
+
+        return results
+
+    def _restart_webapp(self):
+        """Restart webapp service to pick up new profiles.json."""
+        import subprocess
+        try:
+            subprocess.run(
+                ["sudo", "systemctl", "restart", "webapp"],
+                check=True,
+                capture_output=True,
+                timeout=30,
+            )
+            logger.info("Webapp restarted successfully")
+        except subprocess.CalledProcessError as e:
+            logger.warning(f"Failed to restart webapp: {e.stderr.decode() if e.stderr else e}")
+        except FileNotFoundError:
+            logger.debug("systemctl not found (not running on server)")
+
 
 def create_sync_manager() -> DataSyncManager:
     """
@@ -563,16 +691,25 @@ if __name__ == "__main__":
     # CLI interface for sync
     import sys
 
-    print("Data Sync")
+    scheduled_mode = "--scheduled" in sys.argv
+    table_args = [a for a in sys.argv[1:] if a != "--scheduled"]
 
     try:
         manager = create_sync_manager()
 
-        if len(sys.argv) > 1:
-            tables_to_sync = sys.argv[1:]
-            print(f"\nSynchronizing selected tables: {', '.join(tables_to_sync)}")
-            results = manager.sync_all(tables=tables_to_sync)
+        if scheduled_mode:
+            print("Data Sync (scheduled mode)")
+            results = manager.sync_scheduled()
+
+            if not results:
+                print("No tables due for sync")
+                sys.exit(0)
+        elif table_args:
+            print("Data Sync")
+            print(f"\nSynchronizing selected tables: {', '.join(table_args)}")
+            results = manager.sync_all(tables=table_args)
         else:
+            print("Data Sync")
             print("\nSynchronizing all tables...")
             results = manager.sync_all()
 

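
`_restart_webapp` passes `timeout=30` to `subprocess.run`, which raises `subprocess.TimeoutExpired` when the limit is reached; the handlers in the diff cover only `CalledProcessError` and `FileNotFoundError`, so a hung restart would propagate out of `sync_scheduled`. The sketch below shows one way to treat the timeout as non-fatal too, assuming that is the intended behaviour. The function `restart_service` and its `runner` parameter (injected so the logic can be exercised without systemd) are hypothetical names, not part of this commit.

```python
import logging
import subprocess
from typing import Any, Callable

logger = logging.getLogger(__name__)


def restart_service(
    name: str,
    runner: Callable[..., Any] = subprocess.run,
    timeout: int = 30,
) -> bool:
    """Restart a systemd service; return True on success, False otherwise."""
    cmd = ["sudo", "systemctl", "restart", name]
    try:
        runner(cmd, check=True, capture_output=True, timeout=timeout)
    except subprocess.CalledProcessError as e:
        detail = e.stderr.decode(errors="replace") if e.stderr else str(e)
        logger.warning("Failed to restart %s: %s", name, detail)
    except subprocess.TimeoutExpired:
        # Raised by subprocess.run when the timeout elapses.
        logger.warning("Restarting %s timed out after %ss", name, timeout)
    except FileNotFoundError:
        logger.debug("sudo/systemctl not found (not running on server)")
    else:
        logger.info("%s restarted successfully", name)
        return True
    return False
```

Returning a boolean lets the caller decide whether a failed restart should be surfaced in the sync summary.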
src/scheduler.py (Normal file, 158 lines changed)
@@ -0,0 +1,158 @@
+"""
+Schedule evaluation for automatic data sync.
+
+Parses sync_schedule strings from table configuration and determines
+whether a table is due for synchronization based on its last sync time.
+
+Schedule formats:
+    "every 15m" - every 15 minutes
+    "every 1h" - every hour
+    "daily 05:00" - once per day at 05:00 UTC
+"""
+
+import logging
+import re
+from datetime import datetime, timezone
+from typing import Optional
+
+logger = logging.getLogger(__name__)
+
+# Pattern: "every 15m", "every 2h"
+INTERVAL_PATTERN = re.compile(r"^every (\d+)([mh])$")
+
+# Pattern: "daily 05:00", "daily 17:30"
+DAILY_PATTERN = re.compile(r"^daily (\d{2}):(\d{2})$")
+
+
+def parse_interval_minutes(schedule: str) -> Optional[int]:
+    """Parse an interval schedule into minutes.
+
+    Args:
+        schedule: Schedule string like "every 15m" or "every 1h"
+
+    Returns:
+        Interval in minutes, or None if not an interval schedule.
+    """
+    match = INTERVAL_PATTERN.match(schedule)
+    if not match:
+        return None
+    value = int(match.group(1))
+    unit = match.group(2)
+    if unit == "h":
+        return value * 60
+    return value
+
+
+def is_table_due(
+    schedule: str,
+    last_sync_iso: Optional[str],
+    now: Optional[datetime] = None,
+) -> bool:
+    """Determine whether a table is due for sync based on its schedule.
+
+    Args:
+        schedule: Schedule string from table config (e.g., "every 1h", "daily 05:00")
+        last_sync_iso: ISO timestamp of last sync, or None if never synced
+        now: Current time (UTC). Defaults to datetime.now(timezone.utc).
+
+    Returns:
+        True if the table should be synced now.
+    """
+    if now is None:
+        now = datetime.now(timezone.utc)
+
+    # Never synced -> always due
+    if not last_sync_iso:
+        logger.info("Table never synced, marking as due")
+        return True
+
+    # Parse last_sync timestamp
+    last_sync = _parse_timestamp(last_sync_iso)
+    if last_sync is None:
+        logger.warning(f"Cannot parse last_sync timestamp: {last_sync_iso}, marking as due")
+        return True
+
+    # Ensure timezone-aware comparison
+    if last_sync.tzinfo is None:
+        last_sync = last_sync.replace(tzinfo=timezone.utc)
+
+    # Check interval schedule: "every Xm" / "every Xh"
+    interval_minutes = parse_interval_minutes(schedule)
+    if interval_minutes is not None:
+        elapsed_minutes = (now - last_sync).total_seconds() / 60
+        due = elapsed_minutes >= interval_minutes
+        if due:
+            logger.debug(
+                f"Interval schedule: {elapsed_minutes:.0f}m elapsed >= {interval_minutes}m interval"
+            )
+        return due
+
+    # Check daily schedule: "daily HH:MM"
+    match = DAILY_PATTERN.match(schedule)
+    if match:
+        target_hour = int(match.group(1))
+        target_minute = int(match.group(2))
+        return _is_daily_due(last_sync, now, target_hour, target_minute)
+
+    logger.warning(f"Unknown schedule format: {schedule}")
+    return False
+
+
+def _is_daily_due(
+    last_sync: datetime,
+    now: datetime,
+    target_hour: int,
+    target_minute: int,
+) -> bool:
+    """Check if a daily schedule is due.
+
+    A daily schedule at HH:MM is due when:
+    1. Current time is at or past HH:MM today, AND
+    2. Last sync was before HH:MM today
+
+    This means: once HH:MM passes, the first scheduler tick will trigger it,
+    and subsequent ticks on the same day will skip it.
+    """
+    # Today's target time
+    today_target = now.replace(
+        hour=target_hour, minute=target_minute, second=0, microsecond=0
+    )
+
+    # Not yet time today
+    if now < today_target:
+        return False
+
+    # Time has passed, check if we already synced after today's target
+    if last_sync >= today_target:
+        return False
+
+    logger.debug(
+        f"Daily schedule: target {target_hour:02d}:{target_minute:02d} UTC, "
+        f"last sync {last_sync.isoformat()}, now {now.isoformat()} -> due"
+    )
+    return True
+
+
+def _parse_timestamp(iso_string: str) -> Optional[datetime]:
+    """Parse an ISO timestamp string, handling various formats.
+
+    Args:
+        iso_string: ISO 8601 timestamp string
+
+    Returns:
+        datetime object or None if parsing fails
+    """
+    try:
+        # Python 3.11+ fromisoformat handles most formats
+        return datetime.fromisoformat(iso_string)
+    except (ValueError, TypeError):
+        pass
+
+    # Fallback: try common formats
+    for fmt in ("%Y-%m-%dT%H:%M:%S.%f", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S"):
+        try:
+            return datetime.strptime(iso_string, fmt)
+        except ValueError:
+            continue
+
+    return None

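
To make the daily rule concrete, the sketch below restates the two conditions from `_is_daily_due` as a standalone function and evaluates them for a `daily 05:00` schedule at three scheduler ticks. It is a simplified restatement for illustration only (the name `daily_due` is hypothetical and logging is omitted), not a replacement for the module in this commit.

```python
from datetime import datetime, timezone


def daily_due(last_sync: datetime, now: datetime, hour: int, minute: int) -> bool:
    """Due when `now` has reached today's HH:MM and the last sync predates it."""
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    return now >= target and last_sync < target


if __name__ == "__main__":
    utc = timezone.utc
    synced_yesterday = datetime(2026, 3, 14, 5, 10, tzinfo=utc)
    synced_today = datetime(2026, 3, 15, 5, 1, tzinfo=utc)

    # Tick at 04:45: today's target has not been reached yet.
    print(daily_due(synced_yesterday, datetime(2026, 3, 15, 4, 45, tzinfo=utc), 5, 0))
    # Tick at 05:00: first tick at or after the target, last sync was yesterday.
    print(daily_due(synced_yesterday, datetime(2026, 3, 15, 5, 0, tzinfo=utc), 5, 0))
    # Tick at 05:15: the table was already synced after today's target.
    print(daily_due(synced_today, datetime(2026, 3, 15, 5, 15, tzinfo=utc), 5, 0))
```

One consequence of comparing against today's target only: a sync that was missed on one day is not caught up between midnight and the next day's target time, because the first condition is false until HH:MM is reached again.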
321
tests/test_scheduler.py
Normal file
321
tests/test_scheduler.py
Normal file
|
|
@ -0,0 +1,321 @@
|
||||||
|
"""Tests for src.scheduler - schedule parsing and sync-due evaluation."""
|
||||||
|
|
||||||
|
from datetime import datetime, timedelta, timezone
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
|
||||||
|
from src.scheduler import (
|
||||||
|
_is_daily_due,
|
||||||
|
_parse_timestamp,
|
||||||
|
is_table_due,
|
||||||
|
parse_interval_minutes,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Fixed reference time: 2026-03-15 12:00:00 UTC
|
||||||
|
NOW = datetime(2026, 3, 15, 12, 0, 0, tzinfo=timezone.utc)
|
||||||
|
|
||||||
|
|
||||||


# ---------------------------------------------------------------------------
# parse_interval_minutes
# ---------------------------------------------------------------------------


class TestParseIntervalMinutes:
    """Tests for parse_interval_minutes()."""

    def test_minutes_basic(self) -> None:
        assert parse_interval_minutes("every 15m") == 15

    def test_minutes_single_digit(self) -> None:
        assert parse_interval_minutes("every 5m") == 5

    def test_minutes_large(self) -> None:
        assert parse_interval_minutes("every 120m") == 120

    def test_hours_basic(self) -> None:
        assert parse_interval_minutes("every 2h") == 120

    def test_hours_single(self) -> None:
        assert parse_interval_minutes("every 1h") == 60

    def test_hours_large(self) -> None:
        assert parse_interval_minutes("every 24h") == 1440

    def test_daily_returns_none(self) -> None:
        assert parse_interval_minutes("daily 05:00") is None

    def test_invalid_format_returns_none(self) -> None:
        assert parse_interval_minutes("not a schedule") is None

    def test_empty_string_returns_none(self) -> None:
        assert parse_interval_minutes("") is None

    def test_missing_unit_returns_none(self) -> None:
        assert parse_interval_minutes("every 15") is None

    def test_wrong_unit_returns_none(self) -> None:
        assert parse_interval_minutes("every 15s") is None

    def test_no_space_returns_none(self) -> None:
        assert parse_interval_minutes("every15m") is None

    def test_extra_whitespace_returns_none(self) -> None:
        # Strict parsing: extra whitespace (two spaces here) is rejected
        assert parse_interval_minutes("every  15m") is None

    def test_negative_not_matched(self) -> None:
        # Regex uses \d+ so negative sign won't match
        assert parse_interval_minutes("every -5m") is None

    def test_zero_minutes(self) -> None:
        # "every 0m" matches the pattern, returns 0
        assert parse_interval_minutes("every 0m") == 0


# ---------------------------------------------------------------------------
# is_table_due - interval schedules
# ---------------------------------------------------------------------------


class TestIsTableDueInterval:
    """Tests for is_table_due() with interval-based schedules."""

    def test_never_synced_is_due(self) -> None:
        assert is_table_due("every 15m", last_sync_iso=None, now=NOW) is True

    def test_empty_last_sync_is_due(self) -> None:
        assert is_table_due("every 15m", last_sync_iso="", now=NOW) is True

    def test_synced_10min_ago_every_15m_not_due(self) -> None:
        last_sync = (NOW - timedelta(minutes=10)).isoformat()
        assert is_table_due("every 15m", last_sync_iso=last_sync, now=NOW) is False

    def test_synced_20min_ago_every_15m_is_due(self) -> None:
        last_sync = (NOW - timedelta(minutes=20)).isoformat()
        assert is_table_due("every 15m", last_sync_iso=last_sync, now=NOW) is True

    def test_synced_exactly_15min_ago_every_15m_is_due(self) -> None:
        last_sync = (NOW - timedelta(minutes=15)).isoformat()
        assert is_table_due("every 15m", last_sync_iso=last_sync, now=NOW) is True

    def test_synced_30min_ago_every_1h_not_due(self) -> None:
        last_sync = (NOW - timedelta(minutes=30)).isoformat()
        assert is_table_due("every 1h", last_sync_iso=last_sync, now=NOW) is False

    def test_synced_90min_ago_every_1h_is_due(self) -> None:
        last_sync = (NOW - timedelta(minutes=90)).isoformat()
        assert is_table_due("every 1h", last_sync_iso=last_sync, now=NOW) is True

    def test_synced_exactly_1h_ago_every_1h_is_due(self) -> None:
        last_sync = (NOW - timedelta(hours=1)).isoformat()
        assert is_table_due("every 1h", last_sync_iso=last_sync, now=NOW) is True

    def test_synced_59min_ago_every_1h_not_due(self) -> None:
        last_sync = (NOW - timedelta(minutes=59)).isoformat()
        assert is_table_due("every 1h", last_sync_iso=last_sync, now=NOW) is False

    def test_synced_3h_ago_every_2h_is_due(self) -> None:
        last_sync = (NOW - timedelta(hours=3)).isoformat()
        assert is_table_due("every 2h", last_sync_iso=last_sync, now=NOW) is True


# ---------------------------------------------------------------------------
# is_table_due - daily schedules
# ---------------------------------------------------------------------------


class TestIsTableDueDaily:
    """Tests for is_table_due() with daily schedules."""

    def test_before_target_time_not_due(self) -> None:
        now = datetime(2026, 3, 15, 4, 30, 0, tzinfo=timezone.utc)
        last_sync = datetime(2026, 3, 14, 6, 0, 0, tzinfo=timezone.utc).isoformat()
        assert is_table_due("daily 05:00", last_sync_iso=last_sync, now=now) is False

    def test_past_target_not_synced_today_is_due(self) -> None:
        now = datetime(2026, 3, 15, 5, 30, 0, tzinfo=timezone.utc)
        last_sync = datetime(2026, 3, 15, 4, 0, 0, tzinfo=timezone.utc).isoformat()
        assert is_table_due("daily 05:00", last_sync_iso=last_sync, now=now) is True

    def test_past_target_already_synced_after_target_not_due(self) -> None:
        now = datetime(2026, 3, 15, 5, 30, 0, tzinfo=timezone.utc)
        last_sync = datetime(2026, 3, 15, 5, 15, 0, tzinfo=timezone.utc).isoformat()
        assert is_table_due("daily 05:00", last_sync_iso=last_sync, now=now) is False

    def test_evening_schedule_past_target_last_sync_yesterday_is_due(self) -> None:
        now = datetime(2026, 3, 15, 18, 0, 0, tzinfo=timezone.utc)
        last_sync = datetime(2026, 3, 14, 17, 30, 0, tzinfo=timezone.utc).isoformat()
        assert is_table_due("daily 17:00", last_sync_iso=last_sync, now=now) is True

    def test_daily_never_synced_is_due(self) -> None:
        now = datetime(2026, 3, 15, 6, 0, 0, tzinfo=timezone.utc)
        assert is_table_due("daily 05:00", last_sync_iso=None, now=now) is True

    def test_daily_never_synced_before_target_still_due(self) -> None:
        # Never synced always returns True regardless of target time
        now = datetime(2026, 3, 15, 3, 0, 0, tzinfo=timezone.utc)
        assert is_table_due("daily 05:00", last_sync_iso=None, now=now) is True

    def test_daily_exactly_at_target_time_is_due(self) -> None:
        now = datetime(2026, 3, 15, 5, 0, 0, tzinfo=timezone.utc)
        last_sync = datetime(2026, 3, 14, 5, 0, 0, tzinfo=timezone.utc).isoformat()
        # now == today_target, so now < today_target is False
        # last_sync (yesterday) < today_target => due
        assert is_table_due("daily 05:00", last_sync_iso=last_sync, now=now) is True

    def test_daily_synced_at_exactly_target_not_due_again(self) -> None:
        now = datetime(2026, 3, 15, 5, 30, 0, tzinfo=timezone.utc)
        last_sync = datetime(2026, 3, 15, 5, 0, 0, tzinfo=timezone.utc).isoformat()
        # last_sync == today_target => last_sync >= today_target => not due
        assert is_table_due("daily 05:00", last_sync_iso=last_sync, now=now) is False

    def test_midnight_schedule(self) -> None:
        now = datetime(2026, 3, 15, 0, 30, 0, tzinfo=timezone.utc)
        last_sync = datetime(2026, 3, 14, 0, 15, 0, tzinfo=timezone.utc).isoformat()
        assert is_table_due("daily 00:00", last_sync_iso=last_sync, now=now) is True

    def test_end_of_day_schedule(self) -> None:
        now = datetime(2026, 3, 15, 23, 59, 0, tzinfo=timezone.utc)
        last_sync = datetime(2026, 3, 14, 23, 50, 0, tzinfo=timezone.utc).isoformat()
        assert is_table_due("daily 23:30", last_sync_iso=last_sync, now=now) is True


# ---------------------------------------------------------------------------
# is_table_due - edge cases
# ---------------------------------------------------------------------------


class TestIsTableDueEdgeCases:
    """Edge case tests for is_table_due()."""

    def test_unparseable_last_sync_returns_true(self) -> None:
        # Fail-safe: if we can't parse last_sync, assume sync is needed
        assert is_table_due("every 15m", last_sync_iso="garbage", now=NOW) is True

    def test_unknown_schedule_format_returns_false(self) -> None:
        last_sync = (NOW - timedelta(hours=2)).isoformat()
        assert is_table_due("weekly", last_sync_iso=last_sync, now=NOW) is False

    def test_unknown_schedule_never_synced_returns_true(self) -> None:
        # Never synced takes priority over unknown schedule
        assert is_table_due("weekly", last_sync_iso=None, now=NOW) is True

    def test_now_defaults_to_current_time(self) -> None:
        # When now is not provided, it defaults to current UTC time
        # A table that was never synced should be due regardless
        assert is_table_due("every 15m", last_sync_iso=None) is True

    def test_naive_last_sync_treated_as_utc(self) -> None:
        # Naive timestamp (no timezone) should be treated as UTC
        naive_ts = "2026-03-15T11:50:00"
        # 10 minutes ago from NOW (12:00), with 15m interval -> not due
        assert is_table_due("every 15m", last_sync_iso=naive_ts, now=NOW) is False

    def test_last_sync_in_future_not_due(self) -> None:
        # Edge case: last_sync in the future (clock skew, etc.)
        future = (NOW + timedelta(hours=1)).isoformat()
        assert is_table_due("every 15m", last_sync_iso=future, now=NOW) is False


# ---------------------------------------------------------------------------
# _is_daily_due (internal function, direct tests)
# ---------------------------------------------------------------------------


class TestIsDailyDue:
    """Direct tests for _is_daily_due() internal function."""

    def test_before_target_not_due(self) -> None:
        now = datetime(2026, 3, 15, 4, 0, 0, tzinfo=timezone.utc)
        last_sync = datetime(2026, 3, 14, 5, 30, 0, tzinfo=timezone.utc)
        assert _is_daily_due(last_sync, now, target_hour=5, target_minute=0) is False

    def test_after_target_last_sync_before_target_is_due(self) -> None:
        now = datetime(2026, 3, 15, 6, 0, 0, tzinfo=timezone.utc)
        last_sync = datetime(2026, 3, 15, 4, 0, 0, tzinfo=timezone.utc)
        assert _is_daily_due(last_sync, now, target_hour=5, target_minute=0) is True

    def test_after_target_last_sync_after_target_not_due(self) -> None:
        now = datetime(2026, 3, 15, 6, 0, 0, tzinfo=timezone.utc)
        last_sync = datetime(2026, 3, 15, 5, 30, 0, tzinfo=timezone.utc)
        assert _is_daily_due(last_sync, now, target_hour=5, target_minute=0) is False

    def test_target_with_minutes(self) -> None:
        now = datetime(2026, 3, 15, 17, 45, 0, tzinfo=timezone.utc)
        last_sync = datetime(2026, 3, 15, 10, 0, 0, tzinfo=timezone.utc)
        assert _is_daily_due(last_sync, now, target_hour=17, target_minute=30) is True

    def test_target_with_minutes_not_yet(self) -> None:
        now = datetime(2026, 3, 15, 17, 15, 0, tzinfo=timezone.utc)
        last_sync = datetime(2026, 3, 15, 10, 0, 0, tzinfo=timezone.utc)
        assert _is_daily_due(last_sync, now, target_hour=17, target_minute=30) is False


# ---------------------------------------------------------------------------
# _parse_timestamp
# ---------------------------------------------------------------------------


class TestParseTimestamp:
    """Tests for _parse_timestamp() internal function."""

    def test_iso_with_timezone(self) -> None:
        result = _parse_timestamp("2026-03-15T12:00:00+00:00")
        assert result is not None
        assert result.year == 2026
        assert result.month == 3
        assert result.day == 15
        assert result.hour == 12

    def test_iso_with_z_suffix(self) -> None:
        # Python 3.11+ fromisoformat handles Z
        result = _parse_timestamp("2026-03-15T12:00:00Z")
        assert result is not None
        assert result.hour == 12

    def test_iso_without_timezone(self) -> None:
        result = _parse_timestamp("2026-03-15T12:00:00")
        assert result is not None
        assert result.hour == 12
        assert result.tzinfo is None

    def test_iso_with_microseconds(self) -> None:
        result = _parse_timestamp("2026-03-15T12:00:00.123456")
        assert result is not None
        assert result.microsecond == 123456

    def test_space_separated(self) -> None:
        result = _parse_timestamp("2026-03-15 12:00:00")
        assert result is not None
        assert result.hour == 12

    def test_invalid_string_returns_none(self) -> None:
        assert _parse_timestamp("not-a-date") is None

    def test_empty_string_returns_none(self) -> None:
        assert _parse_timestamp("") is None

    def test_date_only_parses_as_midnight(self) -> None:
        # A date-only string such as "2026-03-15" is accepted by fromisoformat
        result = _parse_timestamp("2026-03-15")
        # Should parse as a date (with hour=0, minute=0)
        assert result is not None
        assert result.hour == 0

    def test_iso_with_positive_offset(self) -> None:
        result = _parse_timestamp("2026-03-15T12:00:00+05:30")
        assert result is not None
        assert result.hour == 12
        assert result.utcoffset() is not None

    def test_iso_with_negative_offset(self) -> None:
        result = _parse_timestamp("2026-03-15T12:00:00-07:00")
        assert result is not None
        assert result.utcoffset() is not None

    def test_numeric_garbage_returns_none(self) -> None:
        assert _parse_timestamp("12345") is None

    def test_none_like_string_returns_none(self) -> None:
        assert _parse_timestamp("None") is None