* fix(security): close Jira webhook fail-open + path traversal (#83)

Two related vulnerabilities:

1. Fail-open signature check: when JIRA_WEBHOOK_SECRET was unset, _verify_signature returned True and any unauthenticated POST to /webhooks/jira would run the full ingest pipeline. Now fail-closed: the handler short-circuits with 503 (operator-misconfiguration signal, distinct from 401 wrong-signature) when the secret is missing.

2. Path traversal via attacker-controlled issue_key: webhook payloads carry issue.key, which flowed unsanitized into save_issue (issues_dir / "{issue_key}.json"), download_attachment (attachments_dir / issue_key), and incremental_transform (raw_dir / "issues" / "{issue_key}.json"). A crafted webhook with issue.key="../../etc/passwd" could write outside the Jira data dir.

Defense-in-depth: new connectors/jira/validation.py exposes is_valid_issue_key (whitelist regex ^[A-Z][A-Z0-9_]{0,31}-\d{1,12}$) and safe_join_under (Path.resolve() containment check). Both are enforced at the webhook entry point AND at every filesystem boundary in the connector.

Tests:
- New tests/test_jira_validation.py: unit tests for both helpers (parametrized invalid keys, traversal/symlink/absolute-path cases).
- Webhook tests: test_unconfigured_secret_returns_503, test_path_traversal_in_issue_key_rejected (parametrized over 10 bad keys), test_valid_issue_key_accepted.

CHANGELOG: two CRITICAL Fixed bullets under Unreleased.

Closes #83.

* fix(security): close remaining #83 review findings - webhookEvent traversal, _handle_deletion guard, regex tightening

Reviewer of PR #93 flagged four MUST-FIXes:

1. _log_webhook_event used the attacker-controlled `webhookEvent` field as a filename component without sanitization. A payload with `webhookEvent: "../../tmp/pwn"` could escape WEBHOOK_LOG_DIR. Now:
   - non-`[A-Za-z0-9_-]` runs are replaced with `_` (dot excluded so `..` cannot survive sanitization as a directory component)
   - length capped at 64 chars
   - final path routed through safe_join_under
   New regression test `test_webhook_event_path_traversal_sanitized`.

2. _handle_deletion (connectors/jira/service.py:530) and process_webhook_event (line 487) still used raw issue_key in path builds. Even though the webhook handler validates upstream, the "defense-in-depth at every filesystem boundary" claim required these too. Both now run is_valid_issue_key and safe_join_under guards.

3. Regex `^[A-Z][A-Z0-9_]{0,31}-\d{1,12}$` permitted underscores in project keys. Atlassian's project-key validator does not: `A_B-1` is rejected by Jira itself. Tightened to `[A-Z0-9]` and updated tests: `ABC_DEF-1` is now invalid; added Cyrillic А-1 (lookalike), CRLF, and oversize cases to the bad-key parametrization.

4. Existing test test_deletion_of_nonexistent_issue_returns_true used `PROJ-NOEXIST`, which is not a real Jira key shape. Updated to `PROJ-99999`. The test still exercises the same intent (deletion of an issue with no local file is idempotent).

73/73 jira tests pass locally (test_jira_webhooks + test_jira_validation + test_jira_service + test_jira_service_full + test_jira_incremental).

CHANGELOG updated to document the regex tightening and the new webhookEvent sanitization.

Refs review of #93.

* fix(tests): test_journey_jira tests assumed fail-open before #83 fix

CI failure on PR #93 caught two journey tests that pinned the OLD fail-open contract:

- test_webhook_with_no_secret_configured_accepted asserted 200 when JIRA_WEBHOOK_SECRET was unset. After the #83 fix that's a 503 (operator misconfig). Renamed to _refused and flipped the assertion.
- test_webhook_empty_payload_rejected didn't set the secret, so the 503 short-circuit fired before the empty-payload 400 could. Set JIRA_WEBHOOK_SECRET in the patched Config so the test exercises the intended path.

56/56 jira journey + webhook + validation tests now pass.

* fix(security): #93 round-3 - webhook fallback format + save_issue early validation

Devin Review caught two real findings:

1. Webhook handler regression: the round-2 fix extracted issue_key only from event_data['issue']['key'], but process_webhook_event has long supported a fallback 'issue_key' top-level field for certain Jira event formats (e.g. delete events historically). As a result, the handler blocked those events with 400 before they reached the service layer.
   Fix: mirror process_webhook_event's fallback in the handler: try issue.key first, fall through to event_data.get('issue_key') when empty. is_valid_issue_key still validates whichever source provided the key.

2. save_issue defense-in-depth was incomplete: is_valid_issue_key ran AFTER fetch_remote_links and fetch_sla_fields had already used the unvalidated issue_key in HTTP URL construction ({base_url}/issue/{issue_key}/remotelink etc.). A future internal caller invoking save_issue directly with attacker-controlled input could trigger outbound requests with a malicious path component (limited SSRF / URL-path manipulation against the Jira API server).
   Fix: move the is_valid_issue_key check to immediately after the null guard, before any HTTP request or filesystem op. The webhook layer still validates upstream; this is the second layer.

66 jira tests pass.

Refs Devin Review of #93.

* fix(changelog): #93 round-4 - add BREAKING marker to fail-closed bullet

Devin Review caught: the JIRA_WEBHOOK_SECRET fail-closed change is a behavior change for operators (response code 503 vs old 200) that existing alerting may treat differently. Per the CLAUDE.md changelog discipline rule, operators grep for **BREAKING** before bumping the pin. Added the marker + a short note on what action operators need to take (set the env var if they haven't).

Refs Devin Review of #93.

* fix: #93 round-5 - null-issue crash + comment drift

Devin Review caught two findings on the round-4 commit:

1. Pre-existing crash on null issue field: a webhook payload with {"issue": null} (rather than omitting the key) caused event_data.get("issue", {}) to return None, then issue.get("key") raised AttributeError → unhandled 500. Pre-existing but reachable.
   Fix: 'event_data.get("issue") or {}' normalises None to {}, then the existing fallback / validation path returns 400 cleanly. New regression test test_null_issue_field_does_not_crash.

2. Inline comment drift: the comment at line 77 documented the allowed character class as [A-Za-z0-9._-] (with dot) but the regex at line 27 excludes dot deliberately (so '..' cannot survive sanitization). Fixed the comment to match.

52 jira tests pass.

Refs Devin Review of #93 round 5.

* fix: #93 round-6 - process_webhook_event also normalises null issue field

Devin Review caught: the webhook handler at app/api/jira_webhooks.py correctly handles {"issue": null} via 'event_data.get("issue") or {}', but process_webhook_event at connectors/jira/service.py:509 still used the bare 'event_data.get("issue", {})', which returns None on explicit null. Internal callers (anything that invokes process_webhook_event without going through the HTTP handler) would hit the same AttributeError the round-5 fix closed at the handler layer. Same one-line fix.

32 jira tests pass.

Refs Devin Review of #93 round 5.

* fix: #93 round-7 - issue-key regex uses [0-9] not \d

Devin Review caught: Python 3's \d matches any Unicode decimal digit (Arabic-Indic ٣, Bengali ৩, Devanagari ३, …). A key like TEST-٣ would pass the regex even though it's not a valid Jira input. Tightened to [0-9] (ASCII only). Added three Unicode-digit cases to the bad-key parametrization in test_jira_validation.py to lock in the contract.

Refs Devin Review of #93 round 6.

* fix: #93 round-8 - use \Z anchor not $ in issue-key regex

Devin Review caught: Python's $ anchor matches before a trailing \n, so re.match('…$', 'TEST-1\n') returns a match. is_valid_issue_key returned True for newline-injected keys. \Z is hard end-of-string and closes that bypass.

Manual verification:
- is_valid_issue_key('TEST-1\n') → False (was True before fix)
- is_valid_issue_key('TEST-1\r\n') → False
- is_valid_issue_key('TEST-1') → True

Refs Devin Review of #93 round 7.

* docs: #93 round-9 - CHANGELOG regex matches implementation
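
The three regex tightenings described in rounds 2, 7 and 8 (no underscore, `[0-9]` instead of `\d`, `\Z` instead of `$`) can be illustrated with a minimal sketch. The strict pattern below is reconstructed from the commit text rather than copied from connectors/jira/validation.py, so its exact form is an assumption; `LOOSE`, `STRICT` and `is_valid_issue_key_sketch` are hypothetical names used only for illustration.

```python
import re

# Pattern as quoted in the first commit: allows "_", uses \d and $.
LOOSE = re.compile(r"^[A-Z][A-Z0-9_]{0,31}-\d{1,12}$")

# Assumed final shape after rounds 2, 7 and 8: no underscore, ASCII digits
# only, and \Z so a trailing newline cannot slip past the end anchor.
STRICT = re.compile(r"^[A-Z][A-Z0-9]{0,31}-[0-9]{1,12}\Z")


def is_valid_issue_key_sketch(key: object) -> bool:
    """Hypothetical stand-in for is_valid_issue_key (not the real helper)."""
    return isinstance(key, str) and STRICT.match(key) is not None


samples = [
    "TEST-1",            # well-formed key
    "ABC_DEF-1",         # underscore in project key
    "TEST-\u0663",       # Arabic-Indic digit three
    "TEST-1\n",          # trailing newline
    "\u0410-1",          # Cyrillic lookalike of "A"
    "../../etc/passwd",  # traversal attempt
]
for sample in samples:
    loose_ok = LOOSE.match(sample) is not None
    strict_ok = is_valid_issue_key_sketch(sample)
    print(f"{sample!r}: loose={loose_ok} strict={strict_ok}")
```

Running the loop shows which inputs only the loose pattern accepts; those are the bypasses the later rounds closed.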
621 lines
23 KiB
Python
"""
Jira API service for fetching issue data.

Handles communication with Jira Cloud REST API to fetch complete issue data
including all fields, comments, and attachments.

After saving issue data and attachments, triggers incremental Parquet transform
for real-time updates available via rsync.
"""

import json
import logging
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any

import httpx

from connectors.jira.validation import is_valid_issue_key, safe_join_under

logger = logging.getLogger(__name__)


class _JiraConfig:
    """Jira configuration from environment variables."""
    JIRA_DOMAIN = os.environ.get("JIRA_DOMAIN", "")
    JIRA_EMAIL = os.environ.get("JIRA_EMAIL", "")
    JIRA_API_TOKEN = os.environ.get("JIRA_API_TOKEN", "")
    JIRA_DATA_DIR = Path(os.environ.get("JIRA_DATA_DIR", "/data/src_data/raw/jira"))
    JIRA_CLOUD_ID = os.environ.get("JIRA_CLOUD_ID", "")
    JIRA_SLA_EMAIL = os.environ.get("JIRA_SLA_EMAIL", "")
    JIRA_SLA_API_TOKEN = os.environ.get("JIRA_SLA_API_TOKEN", "")
    JIRA_WEBHOOK_SECRET = os.environ.get("JIRA_WEBHOOK_SECRET", "")
    DEBUG = os.environ.get("DEBUG", "").lower() in ("1", "true")


Config = _JiraConfig


def trigger_incremental_transform(issue_key: str, deleted: bool = False) -> bool:
    """
    Trigger incremental Parquet transform for a single issue.

    This updates only the affected monthly Parquet file, making the change
    immediately available for rsync to analysts.

    Args:
        issue_key: Jira issue key (e.g., "SUPPORT-1234")
        deleted: If True, remove issue from Parquet files

    Returns:
        True if transform succeeded, False otherwise
    """
    try:
        from connectors.jira.incremental_transform import transform_single_issue

        success = transform_single_issue(
            issue_key=issue_key,
            deleted=deleted,
        )

        if success:
            logger.info(f"Incremental transform completed for {issue_key}")
            # Rebuild Jira views in master analytics.duckdb
            try:
                from src.orchestrator import SyncOrchestrator
                SyncOrchestrator().rebuild_source("jira")
            except Exception as orch_err:
                logger.warning(f"Orchestrator rebuild failed: {orch_err}")
        else:
            logger.warning(f"Incremental transform failed for {issue_key}")

        return success

    except ImportError as e:
        logger.warning(f"Incremental transform not available: {e}")
        return False
    except Exception as e:
        logger.error(f"Error in incremental transform for {issue_key}: {e}")
        return False


class JiraService:
    """Service for interacting with Jira Cloud REST API."""

    # Max attachment size to download (50 MB)
    MAX_ATTACHMENT_SIZE = 50 * 1024 * 1024

    def __init__(self) -> None:
        """Initialize Jira service with configuration."""
        self.domain = Config.JIRA_DOMAIN
        self.email = Config.JIRA_EMAIL
        self.api_token = Config.JIRA_API_TOKEN
        self.data_dir = Config.JIRA_DATA_DIR
        self.attachments_dir = self.data_dir / "attachments"

        if not all([self.domain, self.email, self.api_token]):
            logger.warning("Jira credentials not fully configured")

    @property
    def base_url(self) -> str:
        """Get Jira API base URL."""
        return f"https://{self.domain}/rest/api/3"

    @property
    def auth(self) -> tuple[str, str]:
        """Get HTTP Basic auth tuple."""
        return (self.email, self.api_token)

    def is_configured(self) -> bool:
        """Check if Jira service is properly configured."""
        return all([self.domain, self.email, self.api_token])

    def fetch_issue(self, issue_key: str) -> dict[str, Any] | None:
        """
        Fetch complete issue data from Jira.

        Args:
            issue_key: Issue key (e.g., "KSP-123")

        Returns:
            Issue data dict or None if fetch failed
        """
        if not self.is_configured():
            logger.error("Jira service not configured, cannot fetch issue")
            return None

        url = f"{self.base_url}/issue/{issue_key}"
        params = {
            "expand": "renderedFields,changelog",
            "fields": "*all",
        }

        try:
            with httpx.Client(timeout=30) as client:
                response = client.get(
                    url,
                    auth=self.auth,
                    params=params,
                    headers={"Accept": "application/json"},
                )

                if response.status_code == 200:
                    return response.json()
                elif response.status_code == 404:
                    logger.warning(f"Issue {issue_key} not found")
                    return None
                else:
                    logger.error(
                        f"Failed to fetch issue {issue_key}: "
                        f"{response.status_code} - {response.text[:200]}"
                    )
                    return None

        except httpx.RequestError as e:
            logger.error(f"Request error fetching issue {issue_key}: {e}")
            return None

    def fetch_sla_fields(self, issue_key: str) -> dict[str, Any] | None:
        """
        Fetch SLA fields using the JSM service account.

        The personal API token lacks JSM Agent licence needed for SLA fields.
        This method uses a separate service account with the cloud API URL.

        Args:
            issue_key: Issue key (e.g., "SUPPORT-123")

        Returns:
            Dict with SLA field values, or None if not configured/failed
        """
        cloud_id = Config.JIRA_CLOUD_ID
        sla_email = Config.JIRA_SLA_EMAIL
        sla_token = Config.JIRA_SLA_API_TOKEN

        if not all([cloud_id, sla_email, sla_token]):
            logger.debug("SLA service account not configured, skipping SLA fetch")
            return None

        base_url = f"https://api.atlassian.com/ex/jira/{cloud_id}/rest/api/3"
        url = f"{base_url}/issue/{issue_key}"
        params = {"fields": "customfield_10328,customfield_10161"}

        try:
            with httpx.Client(timeout=30) as client:
                response = client.get(
                    url,
                    auth=(sla_email, sla_token),
                    # Restrict the response to the two SLA custom fields.
                    params=params,
                    headers={"Accept": "application/json"},
                )

                if response.status_code == 200:
                    return response.json().get("fields", {})
                else:
                    logger.warning(
                        f"Failed to fetch SLA for {issue_key}: "
                        f"{response.status_code}"
                    )
                    return None

        except httpx.RequestError as e:
            logger.warning(f"SLA fetch error for {issue_key}: {e}")
            return None

    def fetch_remote_links(self, issue_key: str) -> list[dict]:
        """
        Fetch remote links for an issue from Jira.

        Args:
            issue_key: Issue key (e.g., "KSP-123")

        Returns:
            List of remote link dicts, empty list on failure
        """
        if not self.is_configured():
            return []

        url = f"{self.base_url}/issue/{issue_key}/remotelink"

        try:
            with httpx.Client(timeout=30) as client:
                response = client.get(
                    url,
                    auth=self.auth,
                    headers={"Accept": "application/json"},
                )

                if response.status_code == 200:
                    return response.json()
                elif response.status_code == 404:
                    return []
                else:
                    logger.warning(
                        f"Failed to fetch remote links for {issue_key}: "
                        f"{response.status_code}"
                    )
                    return []

        except httpx.RequestError as e:
            logger.warning(f"Request error fetching remote links for {issue_key}: {e}")
            return []

    def save_issue(self, issue_data: dict[str, Any]) -> Path | None:
        """
        Save issue data to JSON file.

        Args:
            issue_data: Complete issue data from Jira API

        Returns:
            Path to saved file or None if save failed
        """
        issue_key = issue_data.get("key")
        if not issue_key:
            logger.error("Issue data missing 'key' field")
            return None

        # Defense-in-depth: validate `issue_key` BEFORE any code path
        # uses it, including the HTTP URL constructions in
        # fetch_remote_links / fetch_sla_fields below. The webhook
        # handler already validates upstream, but a future internal
        # caller invoking save_issue directly with attacker-controlled
        # input would otherwise fire outbound requests with a malicious
        # path component (limited SSRF / path manipulation against the
        # Jira API server) before the filesystem-side guard rejected it.
        # Issue #83 round 3.
        if not is_valid_issue_key(issue_key):
            logger.error(f"Refusing to save issue with malformed key: {issue_key!r}")
            return None

        # Create data directory if needed
        self.data_dir.mkdir(parents=True, exist_ok=True)

        # Add metadata
        issue_data["_synced_at"] = datetime.now(timezone.utc).isoformat()

        # Fetch and embed remote links for Parquet transform.
        # issue_key is known to be non-empty and validated at this point.
        issue_data["_remote_links"] = self.fetch_remote_links(issue_key)

        # Overlay SLA fields from JSM service account (personal token lacks permissions)
        sla_fields = self.fetch_sla_fields(issue_key)
        if sla_fields:
            if "fields" not in issue_data:
                issue_data["fields"] = {}
            for sla_field_id in ("customfield_10328", "customfield_10161"):
                if sla_field_id in sla_fields:
                    issue_data["fields"][sla_field_id] = sla_fields[sla_field_id]
            logger.info(f"Overlayed SLA fields for {issue_key}")

        # Save to file (one file per issue for now, later we'll batch to parquet)
        # Path.resolve() containment as second layer; the regex check
        # above is the primary defense.
        issues_dir = self.data_dir / "issues"
        issues_dir.mkdir(parents=True, exist_ok=True)
        try:
            file_path = safe_join_under(issues_dir, f"{issue_key}.json")
        except ValueError as e:
            logger.error(f"Path traversal blocked for issue {issue_key!r}: {e}")
            return None

        try:
            from connectors.jira.file_lock import issue_json_lock

            # Lock protects the JSON write + Parquet transform from concurrent
            # SLA poll writes. Attachment download stays outside the lock.
            with issue_json_lock(issues_dir, issue_key):
                # Atomic write: temp file + replace
                fd, tmp_path = tempfile.mkstemp(
                    dir=str(file_path.parent), suffix=".tmp"
                )
                os.fchmod(fd, 0o660)  # Restore group rw for ACL
                try:
                    with os.fdopen(fd, "w") as f:
                        json.dump(issue_data, f, indent=2, default=str)
                    os.replace(tmp_path, str(file_path))
                except Exception:
                    try:
                        os.unlink(tmp_path)
                    except OSError:
                        pass
                    raise
                logger.info(f"Saved issue {issue_key} to {file_path}")

                # Trigger incremental Parquet transform FIRST for real-time rsync.
                # This must run before attachment download because large attachments
                # can cause gunicorn worker timeouts (SIGKILL), preventing the
                # transform from ever running. Parquet availability is higher
                # priority than local attachment files.
                trigger_incremental_transform(issue_key, deleted=False)

            # Download attachments OUTSIDE the lock (non-fatal: timeout/failure
            # here should not block the webhook response or prevent Parquet
            # from being updated, and can be slow)
            try:
                downloaded = self.download_all_attachments(issue_data)
                if downloaded:
                    logger.info(f"Downloaded {len(downloaded)} attachments for {issue_key}")
            except Exception as att_err:
                logger.warning(f"Attachment download failed for {issue_key}: {att_err}")

            return file_path
        except Exception as e:
            logger.error(f"Failed to save issue {issue_key}: {e}")
            return None

    def download_attachment(self, attachment: dict[str, Any], issue_key: str) -> Path | None:
        """
        Download a single attachment from Jira.

        Args:
            attachment: Attachment metadata from Jira API
            issue_key: Issue key for organizing files

        Returns:
            Path to downloaded file or None if download failed
        """
        content_url = attachment.get("content")
        filename = attachment.get("filename", "unknown")
        size = attachment.get("size", 0)
        attachment_id = attachment.get("id", "unknown")

        if not content_url:
            logger.warning(f"Attachment {filename} has no content URL")
            return None

        # Skip large attachments
        if size > self.MAX_ATTACHMENT_SIZE:
            logger.warning(
                f"Skipping attachment {filename} ({size} bytes) - exceeds max size"
            )
            return None

        # Create issue-specific attachment directory.
        # Two-layer guard against path traversal via issue_key (issue #83).
        if not is_valid_issue_key(issue_key):
            logger.error(f"Refusing to download attachment for malformed key: {issue_key!r}")
            return None
        try:
            issue_attachments_dir = safe_join_under(self.attachments_dir, issue_key)
        except ValueError as e:
            logger.error(f"Path traversal blocked for attachment {issue_key!r}: {e}")
            return None
        issue_attachments_dir.mkdir(parents=True, exist_ok=True)

        # Use attachment ID in filename to avoid collisions
        safe_filename = f"{attachment_id}_{filename}"
        try:
            file_path = safe_join_under(issue_attachments_dir, safe_filename)
        except ValueError as e:
            logger.error(f"Path traversal blocked for attachment filename {safe_filename!r}: {e}")
            return None

        try:
            with httpx.Client(timeout=60, follow_redirects=True) as client:
                response = client.get(
                    content_url,
                    auth=self.auth,
                )

                if response.status_code == 200:
                    with open(file_path, "wb") as f:
                        f.write(response.content)
                    logger.info(f"Downloaded attachment {filename} to {file_path}")
                    return file_path
                else:
                    logger.error(
                        f"Failed to download attachment {filename}: "
                        f"{response.status_code}"
                    )
                    return None

        except httpx.RequestError as e:
            logger.error(f"Request error downloading attachment {filename}: {e}")
            return None

    def download_all_attachments(self, issue_data: dict[str, Any]) -> list[Path]:
        """
        Download all attachments for an issue (from fields and comments).

        Args:
            issue_data: Complete issue data from Jira API

        Returns:
            List of paths to downloaded files
        """
        issue_key = issue_data.get("key", "unknown")
        downloaded = []

        # Get direct attachments from issue fields
        attachments = issue_data.get("fields", {}).get("attachment", [])
        logger.info(f"Issue {issue_key} has {len(attachments)} direct attachments")

        for attachment in attachments:
            path = self.download_attachment(attachment, issue_key)
            if path:
                downloaded.append(path)

        # Check comments for inline attachments (ADF media nodes)
        # Comments in Jira Cloud use Atlassian Document Format (ADF)
        comments_data = issue_data.get("fields", {}).get("comment", {})
        comments = comments_data.get("comments", [])

        for comment in comments:
            # ADF body may contain mediaSingle/mediaInline nodes with attachments
            body = comment.get("body", {})
            media_attachments = self._extract_media_from_adf(body)

            for media_id in media_attachments:
                # Media in comments references attachments by ID
                # Find matching attachment in the attachment list
                for attachment in attachments:
                    if attachment.get("id") == media_id:
                        # Already downloaded above
                        break
                else:
                    # Media not in main attachments - try to fetch directly
                    logger.debug(f"Found media {media_id} in comment, not in attachments")

        logger.info(f"Downloaded {len(downloaded)} attachments for {issue_key}")
        return downloaded

    def _extract_media_from_adf(self, node: dict[str, Any]) -> list[str]:
        """
        Extract media IDs from Atlassian Document Format (ADF) content.

        Args:
            node: ADF node (recursive structure)

        Returns:
            List of media attachment IDs found in the content
        """
        media_ids = []

        if not isinstance(node, dict):
            return media_ids

        # Check if this node is a media node
        node_type = node.get("type", "")
        if node_type in ("mediaSingle", "mediaInline", "media"):
            attrs = node.get("attrs", {})
            media_id = attrs.get("id")
            if media_id:
                media_ids.append(media_id)

        # Recursively check content
        content = node.get("content", [])
        if isinstance(content, list):
            for child in content:
                media_ids.extend(self._extract_media_from_adf(child))

        return media_ids

    def process_webhook_event(self, event_data: dict[str, Any]) -> bool:
        """
        Process a webhook event by fetching and saving the related issue.

        Args:
            event_data: Webhook event payload from Jira

        Returns:
            True if processing succeeded, False otherwise
        """
        # Extract issue key from event
        # Jira webhook format: {"webhookEvent": "jira:issue_updated", "issue": {"key": "KSP-123", ...}}
        # Defensive: a payload may carry `"issue": null` rather than
        # omitting the key. The webhook handler normalises this, but
        # do the same here too, because process_webhook_event is reachable
        # from internal callers as well as the webhook path.
        issue = event_data.get("issue") or {}
        issue_key = issue.get("key")

        if not issue_key:
            # Try alternative format for some events
            issue_key = event_data.get("issue_key")

        if not issue_key:
            logger.warning(f"Could not extract issue key from webhook event: {event_data.get('webhookEvent')}")
            return False

        # Defense-in-depth: even if the webhook layer's validation is bypassed
        # (e.g. a future internal caller invokes process_webhook_event directly),
        # refuse a malformed key here. Issue #83.
        if not is_valid_issue_key(issue_key):
            logger.error(f"process_webhook_event: refusing malformed issue key {issue_key!r}")
            return False

        webhook_event = event_data.get("webhookEvent", "unknown")
        logger.info(f"Processing webhook event: {webhook_event} for issue {issue_key}")

        # Handle deletion events
        if "deleted" in webhook_event.lower():
            return self._handle_deletion(issue_key)

        # Fetch fresh data from API (webhook payload may not have all fields)
        issue_data = self.fetch_issue(issue_key)
        if not issue_data:
            # If fetch fails, try to use embedded issue data from webhook
            if issue and issue.get("fields"):
                logger.info(f"Using embedded issue data for {issue_key}")
                issue_data = issue
            else:
                return False

        # Save the issue
        return self.save_issue(issue_data) is not None

    def _handle_deletion(self, issue_key: str) -> bool:
        """
        Handle issue deletion by marking it as deleted and updating Parquet.

        Args:
            issue_key: Key of deleted issue

        Returns:
            True if handled successfully
        """
        # Defense-in-depth path-traversal guard (issue #83). Callers should
        # already have validated; refuse anyway.
        if not is_valid_issue_key(issue_key):
            logger.error(f"_handle_deletion: refusing malformed issue key {issue_key!r}")
            return False
        try:
            file_path = safe_join_under(self.data_dir / "issues", f"{issue_key}.json")
        except ValueError as e:
            logger.error(f"_handle_deletion: path traversal blocked for {issue_key!r}: {e}")
            return False

        if file_path.exists():
            # Mark as deleted rather than removing
            try:
                from connectors.jira.file_lock import issue_json_lock

                issues_dir = self.data_dir / "issues"
                with issue_json_lock(issues_dir, issue_key):
                    with open(file_path) as f:
                        data = json.load(f)
                    data["_deleted_at"] = datetime.now(timezone.utc).isoformat()

                    # Atomic write: temp file + replace
                    fd, tmp_path = tempfile.mkstemp(
                        dir=str(file_path.parent), suffix=".tmp"
                    )
                    os.fchmod(fd, 0o660)  # Restore group rw for ACL
                    try:
                        with os.fdopen(fd, "w") as f:
                            json.dump(data, f, indent=2, default=str)
                        os.replace(tmp_path, str(file_path))
                    except Exception:
                        try:
                            os.unlink(tmp_path)
                        except OSError:
                            pass
                        raise
                    logger.info(f"Marked issue {issue_key} as deleted")

                    # Remove from Parquet files
                    trigger_incremental_transform(issue_key, deleted=True)

                return True
            except Exception as e:
                logger.error(f"Failed to mark issue {issue_key} as deleted: {e}")
                return False

        logger.info(f"Issue {issue_key} not found locally, nothing to delete")
        return True


# Singleton instance
_jira_service: JiraService | None = None


def get_jira_service() -> JiraService:
    """Get or create Jira service singleton."""
    global _jira_service
    if _jira_service is None:
        _jira_service = JiraService()
    return _jira_service
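
The listing above routes every filesystem path through `safe_join_under` and treats a `ValueError` as a blocked traversal; the commit message describes the helper only as a "Path.resolve() containment check". A minimal sketch of what such a check might look like follows. It is not the code in connectors/jira/validation.py: `safe_join_under_sketch` is a hypothetical name, and using `Path.is_relative_to` (Python 3.9+) for the containment test is an assumption.

```python
import tempfile
from pathlib import Path


def safe_join_under_sketch(base: Path, name: str) -> Path:
    """Join name onto base; raise ValueError if the result escapes base.

    Hypothetical illustration only, not the project's safe_join_under.
    """
    base_resolved = base.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the
    # comparison below is made on the real target path.
    candidate = (base_resolved / name).resolve()
    # is_relative_to compares whole path components, so a sibling such as
    # "/data/issues-evil" is not treated as being under "/data/issues".
    if not candidate.is_relative_to(base_resolved):
        raise ValueError(f"{name!r} resolves outside {base_resolved}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    issues_dir = Path(tmp) / "issues"
    issues_dir.mkdir()

    # A well-formed name stays inside the base directory.
    print(safe_join_under_sketch(issues_dir, "PROJ-1.json").name)

    # Relative traversal, absolute paths and a bare ".." are all refused.
    for bad in ("../../etc/passwd", "/etc/passwd", ".."):
        try:
            safe_join_under_sketch(issues_dir, bad)
        except ValueError:
            print(f"blocked {bad!r}")
```

Comparing resolved paths rather than raw strings is what lets a check like this catch absolute-path and symlink cases as well as `..` segments, which a prefix test on the unresolved string would miss.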