Extract Jira into connectors/jira module

Move all Jira-specific code into a self-contained connector module:
- 22 files moved via git mv (transform, service, webhook, scripts,
  systemd units, tests, docs, bin helper)
- All imports updated to use connectors.jira.* paths
- Jira is now conditional: auto-detected via JIRA_DOMAIN env var
- Webapp registers Jira blueprint only when available
- Health service monitors Jira timers only when enabled
- Profiler loads Jira tables dynamically from filesystem
- Sync settings uses config-driven dependency validation
- Renamed keboola_platform_url -> custom_url in transform
- Updated deploy.sh, sudoers-deploy, backfill_gap.sh paths
- Fixed pytest.ini to skip live tests by default
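
The conditional behavior listed above could look roughly like the following. This is a minimal sketch, assuming detection is nothing more than a non-empty `JIRA_DOMAIN` check. `jira_enabled` and `register_optional_blueprints` are hypothetical names, not functions from this commit, and the `register` callback stands in for whatever the webapp uses to attach a blueprint.

```python
import os
from typing import Callable, List, Mapping, Optional


def jira_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when a non-empty JIRA_DOMAIN is present (assumed rule)."""
    env = os.environ if env is None else env
    return bool(env.get("JIRA_DOMAIN", "").strip())


def register_optional_blueprints(
    register: Callable[[str], None],
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Register optional connectors and return the names that were enabled."""
    enabled: List[str] = []
    if jira_enabled(env):
        # Hypothetical: the callback receives the module path of the blueprint.
        register("connectors.jira.webhook")
        enabled.append("jira")
    return enabled
```

With this shape, a deployment without `JIRA_DOMAIN` never imports or registers anything Jira-specific, which is the point of making the connector optional.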
Petr 2026-03-09 11:17:50 +01:00
parent d8226c6641
commit 86edd27655
37 changed files with 211 additions and 172 deletions


@ -150,8 +150,8 @@ When reopening the project in Claude Code:
 ### Files NOT to modify (stable infrastructure)
 - `src/parquet_manager.py` - Parquet conversion engine
-- `src/jira_file_lock.py` - Advisory file locking
-- `src/incremental_jira_transform.py` - Jira monthly Parquet transform
+- `connectors/jira/file_lock.py` - Advisory file locking
+- `connectors/jira/incremental_transform.py` - Jira monthly Parquet transform
 - `server/ws_gateway/` - WebSocket notification gateway
 ## Git Commits & Pull Requests

connectors/__init__.py Normal file

@ -0,0 +1 @@
"""Connectors package - pluggable data source integrations."""


@ -108,7 +108,7 @@ Real-time sync of Jira support tickets for AI-powered analysis.
 ### 2. Webhook Receiver
-**File:** `webapp/jira_webhook.py`
+**File:** `connectors/jira/webhook.py`
 Flask blueprint that handles incoming webhooks:
@ -131,7 +131,7 @@ def receive_jira_webhook():
 ### 3. Jira Service
-**File:** `webapp/jira_service.py`
+**File:** `connectors/jira/service.py`
 Handles Jira API communication and data persistence:
@ -171,13 +171,13 @@ Two transformation modes are available:
 #### 4a. Incremental Transform (Real-Time)
-**File:** `src/incremental_jira_transform.py`
+**File:** `connectors/jira/incremental_transform.py`
 Called automatically by webhook handler after saving issue JSON and attachments. Updates only the affected monthly Parquet file.
 ```python
 # Called from jira_service.py after save_issue()
-from src.incremental_jira_transform import transform_single_issue
+from connectors.jira.incremental_transform import transform_single_issue
 transform_single_issue(
     issue_key="SUPPORT-1234",
@ -200,12 +200,12 @@ transform_single_issue(
 #### 4b. Batch Transform (Initial Load / Recovery)
-**File:** `src/jira_transform.py`
+**File:** `connectors/jira/transform.py`
 Used for initial historical load or to rebuild all Parquet from raw JSON.
 ```bash
-python src/jira_transform.py \
+python -m connectors.jira.transform \
     --raw-dir /data/src_data/raw/jira \
     --output-dir /data/src_data/parquet/jira \
     --attachments-dir /data/src_data/raw/jira/attachments
@ -422,7 +422,7 @@ if not hmac.compare_digest(signature, expected):
 1. Run transformation manually:
 ```bash
-python src/jira_transform.py \
+python -m connectors.jira.transform \
     --raw-dir /data/src_data/raw/jira \
     --output-dir /data/src_data/parquet/jira \
     --attachments-dir /data/src_data/raw/jira/attachments
@ -439,11 +439,11 @@ See [docs/jira_schema.md](jira_schema.md) for detailed table schemas and example
 For initial setup or recovery, use the backfill script to download all historical issues.
-**File:** `scripts/jira_backfill.py`
+**File:** `connectors/jira/scripts/backfill.py`
 ```bash
 # Download all SUPPORT tickets (idempotent, skips existing)
-python scripts/jira_backfill.py --parallel 4
+python -m connectors.jira.scripts.backfill --parallel 4
 # Environment variables required:
 JIRA_DOMAIN=your-org.atlassian.net
@ -461,14 +461,14 @@ JIRA_DATA_DIR=/data/src_data/raw/jira # optional, default path
 **SLA backfill** (separate script, uses JSM service account):
-**File:** `scripts/jira_backfill_sla.py`
+**File:** `connectors/jira/scripts/backfill_sla.py`
 ```bash
 # Fetch SLA fields for all issues (uses JIRA_SLA_* env vars)
-python scripts/jira_backfill_sla.py --parallel 8
+python -m connectors.jira.scripts.backfill_sla --parallel 8
 # Dry run (count files needing update):
-python scripts/jira_backfill_sla.py --dry-run
+python -m connectors.jira.scripts.backfill_sla --dry-run
 ```
 The personal API token lacks JSM Agent licence needed for SLA fields.
@ -478,7 +478,7 @@ into existing raw JSON files.
 **After backfill, run batch transform:**
 ```bash
-python src/jira_transform.py \
+python -m connectors.jira.transform \
     --raw-dir /data/src_data/raw/jira \
     --output-dir /data/src_data/parquet/jira \
     --attachments-dir /data/src_data/raw/jira/attachments
@ -491,7 +491,7 @@ cp -r /data/src_data/parquet/jira/* ~/server/parquet/jira/
 SLA elapsed values (`first_response_elapsed_millis`, `time_to_resolution_elapsed_millis`) only update when a webhook fires. For idle open tickets (~49 tickets, ~0.3% of dataset), these values go stale and no longer reflect the actual current elapsed time.
-**File:** `scripts/jira_poll_sla.py`
+**File:** `connectors/jira/scripts/poll_sla.py`
 The SLA polling job runs every 15 minutes via systemd timer (`jira-sla-poll.timer`) as `root:data-ops` and:
@ -502,19 +502,19 @@ The SLA polling job runs every 15 minutes via systemd timer (`jira-sla-poll.time
 **Self-healing:** The poll fetches `status`, `resolution`, `resolutiondate`, and `updated` alongside the SLA fields. If a ticket is resolved in Jira but still appears "open" in Parquet (e.g. due to a missed webhook), the poll automatically corrects the status in JSON and re-transforms to Parquet. Log output: `Self-healing: SUPPORT-XXXX is resolved in Jira`. This was added in response to [#203](https://github.com/your-org/ai-data-analyst/issues/203) where 12 tickets were permanently stale after a permission bug prevented webhooks from updating JSON files.
-**File locking:** The entire read-modify-write + Parquet transform is wrapped in a per-issue advisory file lock (`src/jira_file_lock.py`) to prevent races with the webhook handler. The webhook handler (`webapp/jira_service.py`) uses the same lock. Different issue keys don't block each other.
+**File locking:** The entire read-modify-write + Parquet transform is wrapped in a per-issue advisory file lock (`connectors/jira/file_lock.py`) to prevent races with the webhook handler. The webhook handler (`connectors/jira/service.py`) uses the same lock. Different issue keys don't block each other.
 **Important — `mkstemp` and ACL:** The `issues/` directory uses POSIX ACLs with `default:mask::rwx`. `tempfile.mkstemp()` creates files with mode `0600`, which overrides the ACL mask to `---` and breaks group access for www-data (webhook handler) and deploy (batch transform). The `os.fchmod(fd, 0o660)` call immediately after `mkstemp()` restores the mask to `rw-`, preserving ACL-based access. See [#203](https://github.com/your-org/ai-data-analyst/issues/203) for the full incident report.
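
A minimal sketch of the write pattern described above, assuming the JSON is written to a temporary file in the same directory and then renamed over the target. `atomic_write_json_sketch` is a hypothetical helper for illustration only. The one step taken from the text is calling `os.fchmod(fd, 0o660)` on the descriptor returned by `mkstemp()` before any data is written.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json_sketch(path: Path, data: dict) -> None:
    """Write JSON via a temp file, keeping group access, then rename into place."""
    path = Path(path)
    # mkstemp() creates the file with mode 0600 in the target directory.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            # Widen the mode right away so group (ACL-based) access survives.
            os.fchmod(fh.fileno(), 0o660)
            json.dump(data, fh)
            fh.flush()
            os.fsync(fh.fileno())
        # Same-directory rename, so readers see either the old or the new file.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

Creating the temp file next to the target keeps the final `os.replace()` on one filesystem, which is what makes the swap atomic.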
 ```bash
 # Manual run
-python scripts/jira_poll_sla.py
+python -m connectors.jira.scripts.poll_sla
 # Dry run (count open issues)
-python scripts/jira_poll_sla.py --dry-run
+python -m connectors.jira.scripts.poll_sla --dry-run
 # Verbose logging
-python scripts/jira_poll_sla.py --verbose
+python -m connectors.jira.scripts.poll_sla --verbose
 ```
 **Return states:**


@ -0,0 +1,9 @@
"""
Jira connector - optional push-based data integration.
Provides real-time webhook ingestion, batch backfill, SLA polling,
and incremental Parquet transforms for Jira Cloud issues.
Enable by setting jira.enabled: true in config/instance.yaml
and providing JIRA_* environment variables.
"""


@ -12,7 +12,7 @@ Lock nesting order (always outer → inner to prevent deadlocks):
Uses fcntl.flock() for POSIX advisory locking (works across processes). Uses fcntl.flock() for POSIX advisory locking (works across processes).
Usage: Usage:
from src.jira_file_lock import issue_json_lock, parquet_month_lock from connectors.jira.file_lock import issue_json_lock, parquet_month_lock
with issue_json_lock(issues_dir, "SUPPORT-1234"): with issue_json_lock(issues_dir, "SUPPORT-1234"):
# read JSON, modify, write # read JSON, modify, write


@ -16,8 +16,8 @@ import pyarrow as pa
import pyarrow.parquet as pq import pyarrow.parquet as pq
# Import transform functions from batch transform # Import transform functions from batch transform
from .jira_file_lock import parquet_month_lock from .file_lock import parquet_month_lock
from .jira_transform import ( from .transform import (
ATTACHMENTS_SCHEMA, ATTACHMENTS_SCHEMA,
CHANGELOG_SCHEMA, CHANGELOG_SCHEMA,
COMMENTS_SCHEMA, COMMENTS_SCHEMA,


@ -1,22 +1,22 @@
#!/usr/bin/env python3 #!/usr/bin/env python3
""" """
Jira Backfill Script - Download all historical SUPPORT tickets. Jira Backfill Script - Download all historical Jira issues.
Downloads all issues from Jira SUPPORT project using JQL search with pagination. Downloads all issues from Jira using JQL search with pagination.
Reuses the webapp's JiraService for consistent data handling. Reuses the webapp's JiraService for consistent data handling.
Usage: Usage:
# On server (uses /opt/data-analyst/.env): # On server (uses /opt/data-analyst/.env):
python scripts/jira_backfill.py python -m connectors.jira.scripts.backfill
# With custom settings: # With custom settings:
python scripts/jira_backfill.py --jql "project = SUPPORT AND created >= 2025-01-01" python -m connectors.jira.scripts.backfill --jql "project = MY_PROJECT AND created >= 2025-01-01"
# Skip already downloaded issues: # Skip already downloaded issues:
python scripts/jira_backfill.py --skip-existing python -m connectors.jira.scripts.backfill --skip-existing
# Dry run (show what would be downloaded): # Dry run (show what would be downloaded):
python scripts/jira_backfill.py --dry-run python -m connectors.jira.scripts.backfill --dry-run
Environment variables (loaded from .env or set manually): Environment variables (loaded from .env or set manually):
JIRA_DOMAIN - Jira Cloud domain (e.g., your-org.atlassian.net) JIRA_DOMAIN - Jira Cloud domain (e.g., your-org.atlassian.net)
@ -158,7 +158,7 @@ class JiraBackfill:
jql: JQL query string jql: JQL query string
Yields: Yields:
Issue keys (e.g., "SUPPORT-15190") Issue keys (e.g., "PROJ-15190")
""" """
next_page_token = None next_page_token = None
total_fetched = 0 total_fetched = 0
@ -201,7 +201,7 @@ class JiraBackfill:
Fetch complete issue data from Jira. Fetch complete issue data from Jira.
Args: Args:
issue_key: Issue key (e.g., "SUPPORT-123") issue_key: Issue key (e.g., "PROJ-123")
Returns: Returns:
Issue data dict or None if fetch failed Issue data dict or None if fetch failed
@ -245,7 +245,7 @@ class JiraBackfill:
Fetch remote links for an issue from Jira. Fetch remote links for an issue from Jira.
Args: Args:
issue_key: Issue key (e.g., "SUPPORT-123") issue_key: Issue key (e.g., "PROJ-123")
Returns: Returns:
List of remote link dicts, empty list on failure List of remote link dicts, empty list on failure
@ -504,7 +504,7 @@ class JiraBackfill:
def main(): def main():
parser = argparse.ArgumentParser( parser = argparse.ArgumentParser(
description="Download all SUPPORT tickets from Jira", description="Download all Jira issues",
formatter_class=argparse.RawDescriptionHelpFormatter, formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=__doc__, epilog=__doc__,
) )
@ -543,7 +543,7 @@ def main():
) )
parser.add_argument( parser.add_argument(
"--issue-keys", "--issue-keys",
help="Comma-separated list of specific issue keys to backfill (e.g., SUPPORT-15307,SUPPORT-15308)", help="Comma-separated list of specific issue keys to backfill (e.g., PROJ-123,PROJ-456)",
) )
args = parser.parse_args() args = parser.parse_args()


@ -8,13 +8,13 @@ Parquet transform to extract remote_links table data.
Usage: Usage:
# On server (uses /opt/data-analyst/.env): # On server (uses /opt/data-analyst/.env):
python scripts/jira_backfill_remote_links.py python -m connectors.jira.scripts.backfill_remote_links
# With parallel workers: # With parallel workers:
python scripts/jira_backfill_remote_links.py --parallel 4 python -m connectors.jira.scripts.backfill_remote_links --parallel 4
# Dry run: # Dry run:
python scripts/jira_backfill_remote_links.py --dry-run python -m connectors.jira.scripts.backfill_remote_links --dry-run
Environment variables (loaded from .env): Environment variables (loaded from .env):
JIRA_DOMAIN - Jira Cloud domain JIRA_DOMAIN - Jira Cloud domain


@ -13,16 +13,16 @@ the domain-based URL (https://your-org.atlassian.net/rest/api/3/...).
Usage: Usage:
# On server: # On server:
python scripts/jira_backfill_sla.py python -m connectors.jira.scripts.backfill_sla
# With parallel workers: # With parallel workers:
python scripts/jira_backfill_sla.py --parallel 8 python -m connectors.jira.scripts.backfill_sla --parallel 8
# Dry run (count files needing update): # Dry run (count files needing update):
python scripts/jira_backfill_sla.py --dry-run python -m connectors.jira.scripts.backfill_sla --dry-run
# Force re-fetch even if SLA data already present: # Force re-fetch even if SLA data already present:
python scripts/jira_backfill_sla.py --force python -m connectors.jira.scripts.backfill_sla --force
Environment variables (loaded from .env): Environment variables (loaded from .env):
JIRA_SLA_EMAIL - Email for JSM service account authentication JIRA_SLA_EMAIL - Email for JSM service account authentication


@ -12,13 +12,13 @@ Runs every 30 minutes via systemd timer to detect webhook losses and transform f
Usage: Usage:
# Dry run (check only, no fixes) # Dry run (check only, no fixes)
python scripts/jira_consistency_check.py --dry-run --max-age-days 7 python -m connectors.jira.scripts.consistency_check --dry-run --max-age-days 7
# Auto-fix mode (default) # Auto-fix mode (default)
python scripts/jira_consistency_check.py --auto-fix --max-age-days 30 python -m connectors.jira.scripts.consistency_check --auto-fix --max-age-days 30
# Weekly deep check (full history) # Weekly deep check (full history)
python scripts/jira_consistency_check.py --auto-fix --max-age-days 365 python -m connectors.jira.scripts.consistency_check --auto-fix --max-age-days 365
Environment variables (loaded from .env): Environment variables (loaded from .env):
JIRA_DOMAIN - Jira Cloud domain (e.g., your-org.atlassian.net) JIRA_DOMAIN - Jira Cloud domain (e.g., your-org.atlassian.net)
@ -353,7 +353,7 @@ class JiraConsistencyChecker:
# Build command for targeted backfill (force re-download to fix corrupted files) # Build command for targeted backfill (force re-download to fix corrupted files)
cmd = [ cmd = [
str(self.config.venv_python), str(self.config.venv_python),
str(self.config.repo_dir / "scripts" / "jira_backfill.py"), str(self.config.repo_dir / "connectors" / "jira" / "scripts" / "backfill.py"),
"--issue-keys", "--issue-keys",
",".join(issue_keys), ",".join(issue_keys),
"--no-skip-existing", # Force re-download even if files exist "--no-skip-existing", # Force re-download even if files exist
@ -406,7 +406,7 @@ class JiraConsistencyChecker:
cmd = [ cmd = [
str(self.config.venv_python), str(self.config.venv_python),
"-m", "-m",
"src.incremental_jira_transform", "connectors.jira.incremental_transform",
issue_key, issue_key,
"--raw-dir", str(self.config.raw_dir), "--raw-dir", str(self.config.raw_dir),
"--output-dir", str(self.config.parquet_dir), "--output-dir", str(self.config.parquet_dir),


@ -15,13 +15,13 @@ Designed to run as a systemd timer (every 15 min) via jira-sla-poll.timer.
Usage: Usage:
# On server: # On server:
python scripts/jira_poll_sla.py python -m connectors.jira.scripts.poll_sla
# Dry run (count open issues, don't fetch): # Dry run (count open issues, don't fetch):
python scripts/jira_poll_sla.py --dry-run python -m connectors.jira.scripts.poll_sla --dry-run
# Verbose logging: # Verbose logging:
python scripts/jira_poll_sla.py --verbose python -m connectors.jira.scripts.poll_sla --verbose
Environment variables (loaded from .env): Environment variables (loaded from .env):
JIRA_SLA_EMAIL - Email for JSM service account authentication JIRA_SLA_EMAIL - Email for JSM service account authentication
@ -44,16 +44,16 @@ import pandas as pd
from dotenv import load_dotenv from dotenv import load_dotenv
# Add project root to sys.path for imports # Add project root to sys.path for imports
PROJECT_ROOT = Path(__file__).resolve().parent.parent PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent.parent
sys.path.insert(0, str(PROJECT_ROOT)) sys.path.insert(0, str(PROJECT_ROOT))
from scripts.jira_backfill_sla import ( from connectors.jira.scripts.backfill_sla import (
SLA_FIELDS, SLA_FIELDS,
has_valid_sla_data, has_valid_sla_data,
load_config, load_config,
) )
from src.incremental_jira_transform import transform_single_issue from connectors.jira.incremental_transform import transform_single_issue
from src.jira_file_lock import issue_json_lock from connectors.jira.file_lock import issue_json_lock
logging.basicConfig( logging.basicConfig(
level=logging.INFO, level=logging.INFO,


@ -18,7 +18,7 @@ from typing import Any
import httpx import httpx
from .config import Config from webapp.config import Config
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
@ -38,7 +38,7 @@ def trigger_incremental_transform(issue_key: str, deleted: bool = False) -> bool
True if transform succeeded, False otherwise True if transform succeeded, False otherwise
""" """
try: try:
from src.incremental_jira_transform import transform_single_issue from connectors.jira.incremental_transform import transform_single_issue
success = transform_single_issue( success = transform_single_issue(
issue_key=issue_key, issue_key=issue_key,
@ -262,7 +262,7 @@ class JiraService:
file_path.parent.mkdir(parents=True, exist_ok=True) file_path.parent.mkdir(parents=True, exist_ok=True)
try: try:
from src.jira_file_lock import issue_json_lock from connectors.jira.file_lock import issue_json_lock
# Lock protects the JSON write + Parquet transform from concurrent # Lock protects the JSON write + Parquet transform from concurrent
# SLA poll writes. Attachment download stays outside the lock. # SLA poll writes. Attachment download stays outside the lock.
@ -499,7 +499,7 @@ class JiraService:
if file_path.exists(): if file_path.exists():
# Mark as deleted rather than removing # Mark as deleted rather than removing
try: try:
from src.jira_file_lock import issue_json_lock from connectors.jira.file_lock import issue_json_lock
issues_dir = self.data_dir / "issues" issues_dir = self.data_dir / "issues"
with issue_json_lock(issues_dir, issue_key): with issue_json_lock(issues_dir, issue_key):


@ -8,7 +8,7 @@ Type=oneshot
User=root User=root
Group=data-ops Group=data-ops
WorkingDirectory=/opt/data-analyst/repo WorkingDirectory=/opt/data-analyst/repo
ExecStart=/opt/data-analyst/.venv/bin/python scripts/jira_consistency_check.py --auto-fix --max-age-days 30 ExecStart=/opt/data-analyst/.venv/bin/python -m connectors.jira.scripts.consistency_check --auto-fix --max-age-days 30
EnvironmentFile=/opt/data-analyst/.env EnvironmentFile=/opt/data-analyst/.env
EnvironmentFile=/opt/data-analyst/repo/.env EnvironmentFile=/opt/data-analyst/repo/.env
ProtectSystem=strict ProtectSystem=strict


@ -8,7 +8,7 @@ Type=oneshot
User=root User=root
Group=data-ops Group=data-ops
WorkingDirectory=/opt/data-analyst/repo WorkingDirectory=/opt/data-analyst/repo
ExecStart=/opt/data-analyst/.venv/bin/python scripts/jira_poll_sla.py ExecStart=/opt/data-analyst/.venv/bin/python -m connectors.jira.scripts.poll_sla
EnvironmentFile=/opt/data-analyst/.env EnvironmentFile=/opt/data-analyst/.env
EnvironmentFile=/opt/data-analyst/repo/.env EnvironmentFile=/opt/data-analyst/repo/.env
ProtectSystem=strict ProtectSystem=strict


@ -1,4 +1,4 @@
"""Tests for per-issue advisory file locking (src/jira_file_lock.py). """Tests for per-issue advisory file locking (connectors/jira/file_lock.py).
Verifies that issue_json_lock correctly: Verifies that issue_json_lock correctly:
- Acquires and releases locks via context manager - Acquires and releases locks via context manager
@ -13,7 +13,7 @@ from pathlib import Path
import pytest import pytest
from src.jira_file_lock import issue_json_lock from connectors.jira.file_lock import issue_json_lock
class TestBasicLockUnlock: class TestBasicLockUnlock:


@ -1,4 +1,4 @@
"""Tests for per-month Parquet advisory file locking (src/jira_file_lock.py). """Tests for per-month Parquet advisory file locking (connectors/jira/file_lock.py).
Verifies that parquet_month_lock correctly: Verifies that parquet_month_lock correctly:
- Acquires and releases locks via context manager - Acquires and releases locks via context manager
@ -17,7 +17,7 @@ from pathlib import Path
import pandas as pd import pandas as pd
import pytest import pytest
from src.jira_file_lock import parquet_month_lock from connectors.jira.file_lock import parquet_month_lock
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@ -269,7 +269,7 @@ class TestParquetLockIntegration:
def test_concurrent_transforms_no_data_loss(self, tmp_path: Path) -> None: def test_concurrent_transforms_no_data_loss(self, tmp_path: Path) -> None:
"""Simulate concurrent webhook transforms for same month.""" """Simulate concurrent webhook transforms for same month."""
from src.incremental_jira_transform import transform_single_issue from connectors.jira.incremental_transform import transform_single_issue
raw_dir = tmp_path / "raw" raw_dir = tmp_path / "raw"
issues_dir = raw_dir / "issues" issues_dir = raw_dir / "issues"
@ -331,7 +331,7 @@ class TestParquetLockIntegration:
def test_concurrent_transforms_different_months_independent(self, tmp_path: Path) -> None: def test_concurrent_transforms_different_months_independent(self, tmp_path: Path) -> None:
"""Issues in different months should not interfere with each other.""" """Issues in different months should not interfere with each other."""
from src.incremental_jira_transform import transform_single_issue from connectors.jira.incremental_transform import transform_single_issue
raw_dir = tmp_path / "raw" raw_dir = tmp_path / "raw"
issues_dir = raw_dir / "issues" issues_dir = raw_dir / "issues"


@ -1,5 +1,5 @@
""" """
Tests for scripts/jira_poll_sla.py - SLA polling and self-healing logic. Tests for connectors/jira/scripts/poll_sla.py - SLA polling and self-healing logic.
Covers: Covers:
- fetch_sla_and_status: API response parsing for SLA + status fields - fetch_sla_and_status: API response parsing for SLA + status fields
@ -14,9 +14,9 @@ from unittest.mock import MagicMock, patch
import pytest import pytest
# Ensure project root is importable # Ensure project root is importable
sys.path.insert(0, str(Path(__file__).resolve().parent.parent)) sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent.parent))
from scripts.jira_poll_sla import ( from connectors.jira.scripts.poll_sla import (
SLA_FIELDS, SLA_FIELDS,
STATUS_FIELDS, STATUS_FIELDS,
fetch_sla_and_status, fetch_sla_and_status,
@ -68,7 +68,7 @@ def fake_issue_json_in_progress(tmp_path: Path) -> Path:
class TestFetchSlaAndStatus: class TestFetchSlaAndStatus:
"""Tests for the fetch_sla_and_status function.""" """Tests for the fetch_sla_and_status function."""
@patch("scripts.jira_poll_sla.httpx.Client") @patch("connectors.jira.scripts.poll_sla.httpx.Client")
def test_returns_all_sla_and_status_fields(self, mock_client_cls: MagicMock) -> None: def test_returns_all_sla_and_status_fields(self, mock_client_cls: MagicMock) -> None:
""" """
When the Jira API returns 200 with all requested fields, When the Jira API returns 200 with all requested fields,
@ -145,8 +145,8 @@ class TestFetchSlaAndStatus:
class TestUpdateIssueSlaHealing: class TestUpdateIssueSlaHealing:
"""Tests for self-healing when API reports an issue as resolved.""" """Tests for self-healing when API reports an issue as resolved."""
@patch("scripts.jira_poll_sla.transform_single_issue") @patch("connectors.jira.scripts.poll_sla.transform_single_issue")
@patch("scripts.jira_poll_sla.fetch_sla_and_status") @patch("connectors.jira.scripts.poll_sla.fetch_sla_and_status")
def test_self_healing_returns_healed_and_updates_json( def test_self_healing_returns_healed_and_updates_json(
self, self,
mock_fetch: MagicMock, mock_fetch: MagicMock,
@ -224,8 +224,8 @@ class TestUpdateIssueSlaHealing:
class TestUpdateIssueSlaSkip: class TestUpdateIssueSlaSkip:
"""Tests for the skip logic when SLA data is empty and status is not Done.""" """Tests for the skip logic when SLA data is empty and status is not Done."""
@patch("scripts.jira_poll_sla.transform_single_issue") @patch("connectors.jira.scripts.poll_sla.transform_single_issue")
@patch("scripts.jira_poll_sla.fetch_sla_and_status") @patch("connectors.jira.scripts.poll_sla.fetch_sla_and_status")
def test_skips_when_no_sla_data_and_not_resolved( def test_skips_when_no_sla_data_and_not_resolved(
self, self,
mock_fetch: MagicMock, mock_fetch: MagicMock,
@ -272,8 +272,8 @@ class TestUpdateIssueSlaSkip:
class TestUpdateIssueSlaJsonMissing: class TestUpdateIssueSlaJsonMissing:
"""Tests for missing JSON file handling.""" """Tests for missing JSON file handling."""
@patch("scripts.jira_poll_sla.transform_single_issue") @patch("connectors.jira.scripts.poll_sla.transform_single_issue")
@patch("scripts.jira_poll_sla.fetch_sla_and_status") @patch("connectors.jira.scripts.poll_sla.fetch_sla_and_status")
def test_returns_skipped_when_json_file_missing( def test_returns_skipped_when_json_file_missing(
self, self,
mock_fetch: MagicMock, mock_fetch: MagicMock,


@ -32,7 +32,7 @@ CUSTOM_FIELD_NAMES = {
"customfield_10157": "satisfaction", # Customer satisfaction (was: sla_info) "customfield_10157": "satisfaction", # Customer satisfaction (was: sla_info)
"customfield_10323": "triage", # Triage multi-select (was: team_tier) "customfield_10323": "triage", # Triage multi-select (was: team_tier)
"customfield_10330": "context", # Context field (was: root_cause) "customfield_10330": "context", # Context field (was: root_cause)
"customfield_10325": "keboola_platform_url", # Keboola platform URL (was: resolution_summary) "customfield_10325": "custom_url", # Custom URL (was: resolution_summary)
"customfield_10350": "slack_link", # Slack link (was: customer_type) "customfield_10350": "slack_link", # Slack link (was: customer_type)
"customfield_10475": "email_address", # Email address (was: context) "customfield_10475": "email_address", # Email address (was: context)
"customfield_10511": "configuration_item", # Configuration item (was: categories) "customfield_10511": "configuration_item", # Configuration item (was: categories)
@ -80,7 +80,7 @@ ISSUES_SCHEMA = {
"organizations": "string", "organizations": "string",
"spam": "string", "spam": "string",
"context": "string", "context": "string",
"keboola_platform_url": "string", "custom_url": "string",
"slack_link": "string", "slack_link": "string",
"technical_issue_category": "string", "technical_issue_category": "string",
"email_address": "string", "email_address": "string",
@ -380,7 +380,7 @@ def transform_issue(raw_issue: dict) -> dict:
"organizations": json.dumps(extract_option_list(fields.get("customfield_10002"))), "organizations": json.dumps(extract_option_list(fields.get("customfield_10002"))),
"spam": extract_option_value(fields.get("customfield_10365")), "spam": extract_option_value(fields.get("customfield_10365")),
"context": extract_text_from_adf(fields.get("customfield_10330")) or None, "context": extract_text_from_adf(fields.get("customfield_10330")) or None,
"keboola_platform_url": fields.get("customfield_10325"), "custom_url": fields.get("customfield_10325"),
"slack_link": extract_option_value(fields.get("customfield_10350")), "slack_link": extract_option_value(fields.get("customfield_10350")),
"technical_issue_category": extract_option_value(fields.get("customfield_10676")), "technical_issue_category": extract_option_value(fields.get("customfield_10676")),
"email_address": extract_option_value(fields.get("customfield_10475")), "email_address": extract_option_value(fields.get("customfield_10475")),


@ -13,8 +13,8 @@ from datetime import datetime
from flask import Blueprint, abort, jsonify, request from flask import Blueprint, abort, jsonify, request
from .config import Config from webapp.config import Config
from .jira_service import get_jira_service from .service import get_jira_service
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)

View file

@@ -152,7 +152,7 @@ Use `0o660` for files accessed by services via data-ops group ACL, `0o644` for w
 When multiple services write to the same JSON file (e.g., SLA poll and webhook handler both updating `/data/src_data/raw/jira/issues/SUPPORT-1234.json`), use advisory file locking to prevent races:
 ```python
-from src.jira_file_lock import issue_json_lock
+from connectors.jira.file_lock import issue_json_lock
 with issue_json_lock(issues_dir, issue_key):
     # read JSON, modify, atomic write, transform to Parquet
@@ -165,8 +165,8 @@ with issue_json_lock(issues_dir, issue_key):
 - The lock must cover the entire read-modify-write **and** the Parquet transform — otherwise another writer could overwrite the JSON between write and transform, causing the transform to read stale data
 Currently used by:
-- `scripts/jira_poll_sla.py` — wraps SLA+status update + `transform_single_issue()`
-- `webapp/jira_service.py` — wraps `save_issue()` JSON write + `trigger_incremental_transform()`, and `_handle_deletion()` read-modify-write + transform
+- `connectors/jira/scripts/poll_sla.py` — wraps SLA+status update + `transform_single_issue()`
+- `connectors/jira/service.py` — wraps `save_issue()` JSON write + `trigger_incremental_transform()`, and `_handle_deletion()` read-modify-write + transform
 Attachment downloads in `save_issue()` intentionally run **outside** the lock (can take tens of seconds and don't modify JSON).
@@ -1405,8 +1405,8 @@ SLA elapsed values (`first_response_elapsed_millis`, `time_to_resolution_elapsed
 |-----------|-------------|
 | `jira-sla-poll.service` | Oneshot service that polls open tickets for fresh SLA + status data |
 | `jira-sla-poll.timer` | Runs every 15 minutes (10min after boot, then every 15min) |
-| `scripts/jira_poll_sla.py` | Reads Parquet to find open issues, fetches SLA + status via cloud API |
-| `src/jira_file_lock.py` | Per-issue advisory file locking (shared with webhook handler) |
+| `connectors/jira/scripts/poll_sla.py` | Reads Parquet to find open issues, fetches SLA + status via cloud API |
+| `connectors/jira/file_lock.py` | Per-issue advisory file locking (shared with webhook handler) |
 **How it works:**
 1. Reads Parquet issues to find open tickets with SLA data (~49 tickets)
@@ -1428,7 +1428,7 @@ journalctl -u jira-sla-poll.service --since "1 hour ago"
 # Manual dry run (count open issues)
 cd /opt/data-analyst/repo
-/opt/data-analyst/.venv/bin/python scripts/jira_poll_sla.py --dry-run
+/opt/data-analyst/.venv/bin/python -m connectors.jira.scripts.poll_sla --dry-run
 ```
 **Requires:** `JIRA_SLA_EMAIL`, `JIRA_SLA_API_TOKEN`, `JIRA_CLOUD_ID` in `.env`. Timer is auto-enabled by `deploy.sh` when `JIRA_SLA_API_TOKEN` is set.
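
The documented commands switch from a script path to `python -m`, which runs the script as a module inside the `connectors.jira.scripts` package and therefore has to be started from the repo root (hence the `cd` in the snippet). A minimal sketch of what such a module entry point might look like follows; only the `--dry-run` flag comes from the documentation, while `build_parser`, `main`, and the printed messages are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the CLI parser; only --dry-run is taken from the documented usage."""
    parser = argparse.ArgumentParser(prog="python -m connectors.jira.scripts.poll_sla")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="count open issues without calling the SLA API",
    )
    return parser


def main(argv=None) -> int:
    """Parse arguments and return a process exit code (hypothetical body)."""
    args = build_parser().parse_args(argv)
    if args.dry_run:
        print("dry run: would poll open issues")
        return 0
    print("polling SLA data (omitted in this sketch)")
    return 0

# In a real module, main() would be wired up behind the usual
# `if __name__ == "__main__":` guard so that `python -m` executes it.
```

Accepting `argv` as a parameter keeps the entry point callable from tests without touching `sys.argv`.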
@@ -1442,7 +1442,7 @@ Automated check every 30 minutes to detect missing Jira issues caused by webhook
 | `jira-consistency.service` | Oneshot service that validates data consistency across all sources |
 | `jira-consistency.timer` | Runs every 30 minutes (10min after boot) |
 | `jira-consistency-deep.timer` | Weekly full history check (Sunday 3 AM) |
-| `scripts/jira_consistency_check.py` | Validation script with auto-backfill capability |
+| `connectors/jira/scripts/consistency_check.py` | Validation script with auto-backfill capability |
 **How it works:**
 1. Queries Jira API for all issue keys (last 30 days by default)
@@ -1470,10 +1470,10 @@ journalctl -u jira-consistency.service --since "1 hour ago"
 # Manual check (dry run)
 cd /opt/data-analyst/repo
-/opt/data-analyst/.venv/bin/python scripts/jira_consistency_check.py --dry-run --max-age-days 7
+/opt/data-analyst/.venv/bin/python -m connectors.jira.scripts.consistency_check --dry-run --max-age-days 7
 # Manual check with auto-fix
-/opt/data-analyst/.venv/bin/python scripts/jira_consistency_check.py --auto-fix --max-age-days 30
+/opt/data-analyst/.venv/bin/python -m connectors.jira.scripts.consistency_check --auto-fix --max-age-days 30
 # View consistency report
 cat /data/src_data/raw/jira/_consistency_report.json | python3 -m json.tool
@@ -1486,7 +1486,7 @@ jq -r '.discrepancies.missing_in_json[]' /data/src_data/raw/jira/_consistency_re
 # Backfill specific issues
 cd /opt/data-analyst/repo
-/opt/data-analyst/.venv/bin/python scripts/jira_backfill.py --issue-keys SUPPORT-15307,SUPPORT-15308
+/opt/data-analyst/.venv/bin/python -m connectors.jira.scripts.backfill --issue-keys SUPPORT-15307,SUPPORT-15308
 # Verify in Parquet
 /opt/data-analyst/.venv/bin/python -c "
@@ -1510,7 +1510,7 @@ for row in result:
 - API token has read-only access to Jira (no write permissions needed)
 - Webhook events are logged for audit purposes
 - Multiple services write to `/data/src_data/raw/jira/`: webapp (www-data), SLA poll (root), consistency check (root), backfill scripts (admin users)
-- Concurrent writes to the same issue JSON are serialized via per-issue advisory file locking (`src/jira_file_lock.py`, `fcntl.flock`). Lock files in `issues/.locks/`. See [#203](https://github.com/your-org/ai-data-analyst/issues/203).
+- Concurrent writes to the same issue JSON are serialized via per-issue advisory file locking (`connectors/jira/file_lock.py`, `fcntl.flock`). Lock files in `issues/.locks/`. See [#203](https://github.com/your-org/ai-data-analyst/issues/203).
 ## Data Profiler
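
The per-issue advisory locking referenced in the documentation hunks above can be sketched roughly as follows. This is not the contents of `connectors/jira/file_lock.py`, which this diff only moves; it assumes a context manager named `issue_json_lock` that keeps one lock file per issue under `issues/.locks/` and uses `fcntl.flock`, as the docs describe. The `.lock` suffix and the `0o660` mode are assumptions.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def issue_json_lock(issues_dir, issue_key):
    """Hold an exclusive advisory lock for one issue (illustrative sketch)."""
    lock_dir = Path(issues_dir) / ".locks"
    lock_dir.mkdir(parents=True, exist_ok=True)
    lock_path = lock_dir / f"{issue_key}.lock"  # assumed naming scheme
    # The lock file carries no data; it only exists to be locked.
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o660)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other holders release
        yield lock_path
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


# Usage: the whole read-modify-write plus transform belongs inside the block.
with tempfile.TemporaryDirectory() as tmp:
    with issue_json_lock(tmp, "SUPPORT-1234") as held:
        print("holding", held.name)
```

Because `flock` is advisory, it only protects against writers that take the same lock; any script writing issue JSON without it bypasses the serialization.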

View file

@@ -1,3 +1,4 @@
 [pytest]
+addopts = -m "not live"
 markers =
-    live: tests requiring server access (deselect with '-m "not live"')
+    live: tests requiring server access (run with '-m live')

View file

@@ -44,12 +44,12 @@ cd "$REPO_DIR"
 echo ""
 echo "--- Phase 1: Download raw JSON ---"
 if $DRY_RUN; then
-    python scripts/jira_backfill.py --jql "$JQL" --dry-run
+    python -m connectors.jira.scripts.backfill --jql "$JQL" --dry-run
     echo "Dry run complete. Exiting."
     exit 0
 fi
-python scripts/jira_backfill.py --jql "$JQL" --skip-existing --parallel 4
+python -m connectors.jira.scripts.backfill --jql "$JQL" --skip-existing --parallel 4
 # --- Phase 2: Incremental Parquet transform ---
 echo ""

View file

@@ -219,18 +219,18 @@ fi
 # Deploy Jira SLA polling systemd service and timer
 log "Deploying jira-sla-poll service and timer..."
-if [[ -f "${REPO_DIR}/server/jira-sla-poll.service" ]]; then
-    sudo /usr/bin/cp "${REPO_DIR}/server/jira-sla-poll.service" /etc/systemd/system/jira-sla-poll.service
-    sudo /usr/bin/cp "${REPO_DIR}/server/jira-sla-poll.timer" /etc/systemd/system/jira-sla-poll.timer
+if [[ -f "${REPO_DIR}/connectors/jira/systemd/jira-sla-poll.service" ]]; then
+    sudo /usr/bin/cp "${REPO_DIR}/connectors/jira/systemd/jira-sla-poll.service" /etc/systemd/system/jira-sla-poll.service
+    sudo /usr/bin/cp "${REPO_DIR}/connectors/jira/systemd/jira-sla-poll.timer" /etc/systemd/system/jira-sla-poll.timer
     sudo /usr/bin/systemctl daemon-reload
 fi
 # Deploy Jira consistency monitoring systemd service and timers
 log "Deploying jira-consistency service and timers..."
-if [[ -f "${REPO_DIR}/server/jira-consistency.service" ]]; then
-    sudo /usr/bin/cp "${REPO_DIR}/server/jira-consistency.service" /etc/systemd/system/jira-consistency.service
-    sudo /usr/bin/cp "${REPO_DIR}/server/jira-consistency.timer" /etc/systemd/system/jira-consistency.timer
-    sudo /usr/bin/cp "${REPO_DIR}/server/jira-consistency-deep.timer" /etc/systemd/system/jira-consistency-deep.timer
+if [[ -f "${REPO_DIR}/connectors/jira/systemd/jira-consistency.service" ]]; then
+    sudo /usr/bin/cp "${REPO_DIR}/connectors/jira/systemd/jira-consistency.service" /etc/systemd/system/jira-consistency.service
+    sudo /usr/bin/cp "${REPO_DIR}/connectors/jira/systemd/jira-consistency.timer" /etc/systemd/system/jira-consistency.timer
+    sudo /usr/bin/cp "${REPO_DIR}/connectors/jira/systemd/jira-consistency-deep.timer" /etc/systemd/system/jira-consistency-deep.timer
     sudo /usr/bin/systemctl daemon-reload
     # Create log file with correct permissions

View file

@@ -113,8 +113,8 @@ deploy ALL=(ALL) NOPASSWD: /usr/bin/systemctl stop corporate-memory.timer
 deploy ALL=(ALL) NOPASSWD: /usr/bin/systemctl is-enabled corporate-memory.timer
 # Allow deploy user to manage jira-sla-poll service and timer
-deploy ALL=(ALL) NOPASSWD: /usr/bin/cp /opt/data-analyst/repo/server/jira-sla-poll.service /etc/systemd/system/jira-sla-poll.service
-deploy ALL=(ALL) NOPASSWD: /usr/bin/cp /opt/data-analyst/repo/server/jira-sla-poll.timer /etc/systemd/system/jira-sla-poll.timer
+deploy ALL=(ALL) NOPASSWD: /usr/bin/cp /opt/data-analyst/repo/connectors/jira/systemd/jira-sla-poll.service /etc/systemd/system/jira-sla-poll.service
+deploy ALL=(ALL) NOPASSWD: /usr/bin/cp /opt/data-analyst/repo/connectors/jira/systemd/jira-sla-poll.timer /etc/systemd/system/jira-sla-poll.timer
 deploy ALL=(ALL) NOPASSWD: /usr/bin/systemctl enable jira-sla-poll.timer
 deploy ALL=(ALL) NOPASSWD: /usr/bin/systemctl start jira-sla-poll.timer
 deploy ALL=(ALL) NOPASSWD: /usr/bin/systemctl stop jira-sla-poll.timer
@@ -132,9 +132,9 @@ deploy ALL=(ALL) NOPASSWD: /usr/bin/systemctl stop session-collector.timer
 deploy ALL=(ALL) NOPASSWD: /usr/bin/systemctl is-enabled session-collector.timer
 # Allow deploy user to manage jira-consistency service and timers
-deploy ALL=(ALL) NOPASSWD: /usr/bin/cp /opt/data-analyst/repo/server/jira-consistency.service /etc/systemd/system/jira-consistency.service
-deploy ALL=(ALL) NOPASSWD: /usr/bin/cp /opt/data-analyst/repo/server/jira-consistency.timer /etc/systemd/system/jira-consistency.timer
-deploy ALL=(ALL) NOPASSWD: /usr/bin/cp /opt/data-analyst/repo/server/jira-consistency-deep.timer /etc/systemd/system/jira-consistency-deep.timer
+deploy ALL=(ALL) NOPASSWD: /usr/bin/cp /opt/data-analyst/repo/connectors/jira/systemd/jira-consistency.service /etc/systemd/system/jira-consistency.service
+deploy ALL=(ALL) NOPASSWD: /usr/bin/cp /opt/data-analyst/repo/connectors/jira/systemd/jira-consistency.timer /etc/systemd/system/jira-consistency.timer
+deploy ALL=(ALL) NOPASSWD: /usr/bin/cp /opt/data-analyst/repo/connectors/jira/systemd/jira-consistency-deep.timer /etc/systemd/system/jira-consistency-deep.timer
 deploy ALL=(ALL) NOPASSWD: /usr/bin/touch /opt/data-analyst/logs/jira-consistency.log
 deploy ALL=(ALL) NOPASSWD: /usr/bin/chown root\:data-ops /opt/data-analyst/logs/jira-consistency.log
 deploy ALL=(ALL) NOPASSWD: /usr/bin/chmod 664 /opt/data-analyst/logs/jira-consistency.log

View file

@@ -60,55 +60,63 @@ METRICS_YML_PATH = DOCS_DIR / "metrics.yml"
 METRICS_DIR = DOCS_DIR / "metrics"
 DATA_DESCRIPTION_PATH = DOCS_DIR / "data_description.md"
-# Jira / Support tables - not in data_description.md but stored as partitioned parquet
-JIRA_PARQUET_DIR = PARQUET_DIR / "jira"
-JIRA_TABLES = [
-    {
-        "name": "jira_issues",
-        "subdir": "issues",
-        "description": "Support tickets from Jira SUPPORT project. Key fields: issue_key, summary, description, status, priority, assignee, created_at, resolved_at, severity, triage.",
-        "primary_key": "issue_key",
-        "foreign_keys": [],
-    },
-    {
-        "name": "jira_comments",
-        "subdir": "comments",
-        "description": "Comments on support tickets. Key fields: comment_id, issue_key, author_email, body, created_at.",
-        "primary_key": "comment_id",
-        "foreign_keys": [{"column": "issue_key", "references": "jira_issues.issue_key", "description": "Parent support ticket"}],
-    },
-    {
-        "name": "jira_attachments",
-        "subdir": "attachments",
-        "description": "Attachment metadata with local file paths. Key fields: attachment_id, issue_key, filename, local_path, size_bytes, mime_type.",
-        "primary_key": "attachment_id",
-        "foreign_keys": [{"column": "issue_key", "references": "jira_issues.issue_key", "description": "Parent support ticket"}],
-    },
-    {
-        "name": "jira_changelog",
-        "subdir": "changelog",
-        "description": "History of all field changes on issues. Key fields: change_id, issue_key, field_name, from_value, to_value, changed_at.",
-        "primary_key": "change_id",
-        "foreign_keys": [{"column": "issue_key", "references": "jira_issues.issue_key", "description": "Parent support ticket"}],
-    },
-    {
-        "name": "jira_issuelinks",
-        "subdir": "issuelinks",
-        "description": "Links between Jira issues (blocks, duplicates, relates to). Key fields: issue_key, link_id, link_type, direction, linked_issue_key.",
-        "primary_key": "link_id",
-        "foreign_keys": [
-            {"column": "issue_key", "references": "jira_issues.issue_key", "description": "Source support ticket"},
-            {"column": "linked_issue_key", "references": "jira_issues.issue_key", "description": "Target linked ticket"},
-        ],
-    },
-    {
-        "name": "jira_remote_links",
-        "subdir": "remote_links",
-        "description": "External links attached to issues (Confluence pages, Slack threads, etc.). Key fields: issue_key, remote_link_id, url, title.",
-        "primary_key": "remote_link_id",
-        "foreign_keys": [{"column": "issue_key", "references": "jira_issues.issue_key", "description": "Parent support ticket"}],
-    },
-]
+# Jira tables - loaded dynamically if Jira connector is enabled
+# The Jira connector stores partitioned parquet files in PARQUET_DIR/jira/
+def _load_jira_tables() -> tuple:
+    """Load Jira table definitions if the connector directory exists."""
+    jira_dir = PARQUET_DIR / "jira"
+    if not jira_dir.exists():
+        return jira_dir, []
+    return jira_dir, [
+        {
+            "name": "jira_issues",
+            "subdir": "issues",
+            "description": "Jira issues. Key fields: issue_key, summary, description, status, priority, assignee, created_at, resolved_at.",
+            "primary_key": "issue_key",
+            "foreign_keys": [],
+        },
+        {
+            "name": "jira_comments",
+            "subdir": "comments",
+            "description": "Comments on Jira issues. Key fields: comment_id, issue_key, author_email, body, created_at.",
+            "primary_key": "comment_id",
+            "foreign_keys": [{"column": "issue_key", "references": "jira_issues.issue_key", "description": "Parent issue"}],
+        },
+        {
+            "name": "jira_attachments",
+            "subdir": "attachments",
+            "description": "Attachment metadata with local file paths. Key fields: attachment_id, issue_key, filename, local_path, size_bytes, mime_type.",
+            "primary_key": "attachment_id",
+            "foreign_keys": [{"column": "issue_key", "references": "jira_issues.issue_key", "description": "Parent issue"}],
+        },
+        {
+            "name": "jira_changelog",
+            "subdir": "changelog",
+            "description": "History of all field changes on issues. Key fields: change_id, issue_key, field_name, from_value, to_value, changed_at.",
+            "primary_key": "change_id",
+            "foreign_keys": [{"column": "issue_key", "references": "jira_issues.issue_key", "description": "Parent issue"}],
+        },
+        {
+            "name": "jira_issuelinks",
+            "subdir": "issuelinks",
+            "description": "Links between Jira issues (blocks, duplicates, relates to). Key fields: issue_key, link_id, link_type, direction, linked_issue_key.",
+            "primary_key": "link_id",
+            "foreign_keys": [
+                {"column": "issue_key", "references": "jira_issues.issue_key", "description": "Source issue"},
+                {"column": "linked_issue_key", "references": "jira_issues.issue_key", "description": "Target linked issue"},
+            ],
+        },
+        {
+            "name": "jira_remote_links",
+            "subdir": "remote_links",
+            "description": "External links attached to issues (Confluence pages, Slack threads, etc.). Key fields: issue_key, remote_link_id, url, title.",
+            "primary_key": "remote_link_id",
+            "foreign_keys": [{"column": "issue_key", "references": "jira_issues.issue_key", "description": "Parent issue"}],
+        },
+    ]
+JIRA_PARQUET_DIR, JIRA_TABLES = _load_jira_tables()
 # ---------------------------------------------------------------------------
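
The loader in this hunk returns an empty table list when `PARQUET_DIR/jira/` is absent, so downstream profiler code can iterate over `JIRA_TABLES` without checking a flag. The sketch below reproduces that pattern against a temporary directory. It also adds one possible refinement that is not in the commit: keeping only tables whose partition subdirectory exists. `TABLE_DEFS` and `load_jira_tables` are illustrative names, and the definitions are trimmed to two entries.

```python
import tempfile
from pathlib import Path

# Trimmed, illustrative table definitions (the commit defines six).
TABLE_DEFS = [
    {"name": "jira_issues", "subdir": "issues"},
    {"name": "jira_comments", "subdir": "comments"},
]


def load_jira_tables(parquet_dir: Path) -> tuple:
    """Return (jira_dir, tables); tables is empty if the connector wrote no data."""
    jira_dir = parquet_dir / "jira"
    if not jira_dir.exists():
        return jira_dir, []
    # Assumption beyond the commit: skip tables whose subdirectory is missing.
    return jira_dir, [t for t in TABLE_DEFS if (jira_dir / t["subdir"]).is_dir()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    print(load_jira_tables(root)[1])  # no jira/ directory yet, so no tables
    (root / "jira" / "issues").mkdir(parents=True)
    print([t["name"] for t in load_jira_tables(root)[1]])
```

Since the commit evaluates the loader once at import time, a process started before the first Jira sync would keep an empty list until it is restarted.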

View file

@@ -21,7 +21,7 @@ import pytest
 REPO_ROOT = Path(__file__).resolve().parent.parent
 SCRIPTS_DIR = REPO_ROOT / "scripts"
 SYNC_DATA_SH = SCRIPTS_DIR / "sync_data.sh"
-SYNC_JIRA_SH = SCRIPTS_DIR / "sync_jira.sh"
+SYNC_JIRA_SH = REPO_ROOT / "connectors" / "jira" / "scripts" / "sync_jira.sh"
 SYNC_SCRIPTS = [SYNC_DATA_SH, SYNC_JIRA_SH]
 DIAG_DIR = REPO_ROOT / "data" / "sync_diagnostics"

View file

@@ -18,11 +18,18 @@ from flask import Flask, flash, jsonify, redirect, render_template, request, ses
 from .auth import auth_bp, init_oauth, login_required
 from .config import Config
 from .desktop_auth import desktop_bp, require_desktop_auth
-from .jira_webhook import jira_bp
 from .notification_images import images_bp
 from .account_service import get_account_details
 from .sync_settings_service import get_sync_settings, update_sync_settings
+# Jira connector is optional - only loaded if configured
+try:
+    from connectors.jira.webhook import jira_bp
+    JIRA_AVAILABLE = True
+except ImportError:
+    JIRA_AVAILABLE = False
+    jira_bp = None
 # Password auth is optional - requires SENDGRID_API_KEY
 try:
     from .password_auth import password_auth_bp
@@ -73,7 +80,8 @@ def create_app() -> Flask:
     app.register_blueprint(auth_bp)
     app.register_blueprint(desktop_bp)
     app.register_blueprint(images_bp)
-    app.register_blueprint(jira_bp)
+    if JIRA_AVAILABLE and jira_bp:
+        app.register_blueprint(jira_bp)
     if PASSWORD_AUTH_AVAILABLE and password_auth_bp:
         app.register_blueprint(password_auth_bp)
 
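
The hunk above gates blueprint registration on whether `connectors.jira.webhook` can be imported, while the commit message describes detection through the `JIRA_DOMAIN` environment variable. One way to combine the two checks is sketched below. `load_optional_attr` is a hypothetical helper, not part of the commit, and it only uses the standard library so it can be tried without Flask installed.

```python
import importlib
import os


def load_optional_attr(module_path: str, attr: str, enabled: bool = True):
    """Return module_path.attr, or None if disabled or not importable."""
    if not enabled:
        return None
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        # Covers ModuleNotFoundError as well as failures inside the module's imports.
        return None
    return getattr(module, attr, None)


# Mirrors JIRA_ENABLED from the Config hunk: enabled only if JIRA_DOMAIN is set.
jira_enabled = os.environ.get("JIRA_DOMAIN", "") != ""
jira_bp = load_optional_attr("connectors.jira.webhook", "jira_bp", enabled=jira_enabled)
# A caller would then register the blueprint only when jira_bp is not None.
```

Catching `ImportError` also hides genuine import bugs inside the connector, so logging the exception before returning `None` may be worth considering.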

View file

@@ -97,18 +97,17 @@ class Config:
     # Notification images directory
     NOTIFICATION_IMAGES_DIR = "/tmp"
-    # Jira webhook integration
+    # Jira connector (optional - loaded from connectors/jira/)
+    # These remain here for backward compatibility; the Jira connector
+    # reads them from this Config class.
+    JIRA_ENABLED = os.environ.get("JIRA_DOMAIN", "") != ""
     JIRA_WEBHOOK_SECRET = os.environ.get("JIRA_WEBHOOK_SECRET", "")
-    JIRA_DOMAIN = os.environ.get("JIRA_DOMAIN", "")  # e.g., "yourorg.atlassian.net"
+    JIRA_DOMAIN = os.environ.get("JIRA_DOMAIN", "")
     JIRA_EMAIL = os.environ.get("JIRA_EMAIL", "")
     JIRA_API_TOKEN = os.environ.get("JIRA_API_TOKEN", "")
-    # Jira SLA service account (JSM Agent licence required for SLA fields)
     JIRA_SLA_EMAIL = os.environ.get("JIRA_SLA_EMAIL", "")
     JIRA_SLA_API_TOKEN = os.environ.get("JIRA_SLA_API_TOKEN", "")
     JIRA_CLOUD_ID = os.environ.get("JIRA_CLOUD_ID", "")
-    # Jira data storage (raw data, will be processed to parquet later)
     JIRA_DATA_DIR = Path(os.environ.get("JIRA_DATA_DIR", "/data/src_data/raw/jira"))
     @classmethod

View file

@@ -5,7 +5,7 @@ Returns detailed system status including:
 - Systemd services (webapp, telegram-bot, timers)
 - Disk space
 - System load
-- Last Jira webhook timestamp
+- Optional: Jira webhook timestamp (if Jira connector enabled)
 """
 import logging
@@ -14,6 +14,8 @@ import subprocess
 from datetime import datetime
 from pathlib import Path
+from .config import Config
 logger = logging.getLogger(__name__)
 # Services to monitor
@@ -22,12 +24,19 @@ CRITICAL_SERVICES = [
     "notify-bot.service",
 ]
-TIMERS_TO_MONITOR = [
-    "jira-consistency.timer",
+# Base timers (always monitored)
+_BASE_TIMERS = [
     "corporate-memory.timer",
+]
+# Jira timers (only if Jira connector is enabled)
+_JIRA_TIMERS = [
+    "jira-consistency.timer",
     "jira-sla-poll.timer",
 ]
+TIMERS_TO_MONITOR = _BASE_TIMERS + (_JIRA_TIMERS if Config.JIRA_ENABLED else [])
 def get_service_status(service_name: str) -> dict:
     """Get systemd service status."""
@@ -139,7 +148,6 @@ def health_check() -> tuple[dict, int]:
     timers = [get_service_status(t) for t in TIMERS_TO_MONITOR]
     disk = get_disk_usage()
     load = get_load_average()
-    jira = get_last_jira_webhook()
     # Overall health: all critical checks must pass
     all_healthy = (
@@ -155,9 +163,12 @@ def health_check() -> tuple[dict, int]:
         "timers": timers,
         "disk": disk,
         "load": load,
-        "jira_webhook": jira,
     }
+    # Include Jira webhook status only if connector is enabled
+    if Config.JIRA_ENABLED:
+        response["jira_webhook"] = get_last_jira_webhook()
     # Return 200 if healthy, 503 if degraded
     status_code = 200 if all_healthy else 503
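
In the health service hunks above, `TIMERS_TO_MONITOR` is computed once at import time from `Config.JIRA_ENABLED`. A small sketch of the same selection written as a pure function, which makes the behaviour easy to unit-test, is shown below. `timers_to_monitor` and its `env` parameter are hypothetical; the timer names and the `JIRA_DOMAIN` check are taken from the diff.

```python
BASE_TIMERS = ["corporate-memory.timer"]
JIRA_TIMERS = ["jira-consistency.timer", "jira-sla-poll.timer"]


def timers_to_monitor(env: dict) -> list:
    """Return the timers to check, adding Jira timers only when JIRA_DOMAIN is set."""
    jira_enabled = env.get("JIRA_DOMAIN", "") != ""
    return BASE_TIMERS + (JIRA_TIMERS if jira_enabled else [])


print(timers_to_monitor({}))
print(timers_to_monitor({"JIRA_DOMAIN": "yourorg.atlassian.net"}))
```

In the application itself the argument would be `os.environ`; passing a plain dict keeps the tests independent of the process environment.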

View file

@@ -105,9 +105,11 @@ def update_sync_settings(username: str, settings: dict) -> tuple[bool, str]:
     existing = all_settings.get(username, {}).get("datasets", dict(DEFAULT_SETTINGS))
     existing.update(settings)
-    # Validate dependencies on merged state
-    if existing.get("jira_attachments") and not existing.get("jira"):
-        return False, "Jira attachments require Jira to be enabled"
+    # Validate dependencies on merged state (from instance config)
+    for key, info in DATASET_INFO.items():
+        requires = info.get("requires") if isinstance(info, dict) else None
+        if requires and existing.get(key) and not existing.get(requires):
+            return False, f"{key} requires {requires} to be enabled"
     # Update user's settings
     all_settings[username] = {