fix(jira): harden _remote_links fetch — prevent transient outage from wiping parquet rows (#319)

* fix(jira): harden _remote_links fetch — transient API failure no longer wipes parquet rows

Pre-fix, all three fetch_remote_links call sites (service.py,
scripts/backfill.py, scripts/backfill_remote_links.py) silently
returned [] on 401/403/429/5xx or httpx.RequestError. Callers overlaid
that [] onto cached issue JSON, and transform_remote_links interpreted
the empty list as 'issue legitimately has no remote links — delete
existing rows', so a transient Jira auth blip permanently wiped
remote-link history.

Now:
- Every fetch site raises JiraFetchError on non-200/non-404 status,
  on httpx.RequestError, and on the 'service not configured' path.
- Overlay sites skip the _remote_links key when fetch raises, leaving
  it ABSENT (not present-but-empty).
- transform_remote_links returns None for absent/null keys (preserve
  existing rows) vs [] (legitimate empty — wipe).
- Both consumers (batch transform_all, incremental
  transform_single_issue) honor the new contract.
- End-to-end tests
  test_incremental_preserves_remote_links_when_overlay_absent and
  test_incremental_wipes_remote_links_when_overlay_present_but_empty
  lock both halves.

Adversarial-review fixes bundled:
- service.py: unconfigured-service path now raises JiraFetchError
  instead of returning [] (a webhook can arrive while API creds are
  missing — HMAC verification uses a separate JIRA_WEBHOOK_SECRET).
  Regression guard test_raises_when_unconfigured added.
- consistency_check.py: AUTO_FIX_THRESHOLD bumped 10 -> 20 to cover
  typical SLA-poller hiccups before escalating to ERROR.
- CLAUDE.md: connectors/jira/transform.py removed from 'Files NOT to
  modify' (overlay-contract change required touching it; module
  remains sensitive but is no longer off-limits).

* release: 0.54.19 — jira remote_links hardening (transient API failure no longer wipes parquet rows)
This commit is contained in:
ZdenekSrotyr 2026-05-15 19:09:46 +02:00 committed by GitHub
parent 9e948abc9c
commit ed3e8337ab
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
11 changed files with 578 additions and 74 deletions

View file

@ -10,6 +10,47 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C
## [Unreleased]
## [0.54.19] — 2026-05-15
### Changed
- `connectors/jira/scripts/consistency_check.py`
`AUTO_FIX_THRESHOLD` bumped from 10 to 20. Auto-backfill now covers
typical SLA-poller hiccups before escalating to ERROR.
`WARNING_THRESHOLD` unchanged.
### Fixed
- **`connectors/jira` — transient Jira API failure no longer wipes
existing `remote_links` parquet rows.** Pre-fix, all three
`fetch_remote_links` sites (`service.py`, `scripts/backfill.py`,
`scripts/backfill_remote_links.py`) silently returned `[]` on
401/403/429/5xx or `httpx.RequestError`. Callers overlaid that `[]`
onto cached issue JSON, and `transform_remote_links` interpreted the
empty list as "issue legitimately has no remote links — delete its
existing rows", so a transient Jira auth blip (or a webhook burst
hitting Jira's rate limiter) permanently wiped remote-link history.
Now: every fetch site raises `JiraFetchError` on non-200/non-404
status and `httpx.RequestError` (including the "service not
configured" path — a webhook arriving while API creds are missing
no longer surfaces as a silent wipe), overlay sites skip the
`_remote_links` key on raise (leaving it ABSENT, not
present-but-empty), and `transform_remote_links` returns `None` for
absent / `null` keys (preserve existing rows) vs `[]` (legitimate
empty — wipe). Both consumers (batch `transform_all` and incremental
`transform_single_issue`) honor the new contract. End-to-end tests
lock both halves: `test_incremental_preserves_remote_links_when_overlay_absent`
+ `test_incremental_wipes_remote_links_when_overlay_present_but_empty`.
The bulk-backfill scripts retain their existing `Retry-After`
sleep+retry loop for 429 (appropriate for non-interactive batch
contexts); only the webhook hot path raises on 429.
### Internal
- `CLAUDE.md``connectors/jira/transform.py` removed from the
"Files NOT to modify" list. The `_remote_links` hardening required
modifying `transform_remote_links` and `transform_all` to honor the
new "overlay absent → preserve existing rows" contract; the module
remains sensitive (touch only with end-to-end understanding of the
JSON-overlay / parquet-rewrite pipeline) but is no longer off-limits.
## [0.54.18] — 2026-05-15
### Added

View file

@ -293,9 +293,15 @@ Auth providers in `app/auth/` (FastAPI-based):
### Files NOT to modify (stable infrastructure)
- `connectors/jira/file_lock.py` — advisory file locking
- `connectors/jira/transform.py` — core Jira transform logic
- `services/ws_gateway/` — WebSocket notification gateway
(`connectors/jira/transform.py` was previously listed here but has been
removed: the `_remote_links` hardening in 0.54.19 required modifying
`transform_remote_links` and `transform_all` to honor a new "overlay
absent → preserve existing rows" contract. The transform module remains
sensitive — touch it only when you understand the JSON-overlay /
parquet-rewrite pipeline end-to-end — but it is no longer off-limits.)
## Release process
Full recipe, deploy workflows, and CI quirks: [`docs/RELEASING.md`](docs/RELEASING.md). The non-negotiable rules:

View file

@ -217,10 +217,19 @@ def transform_single_issue(
updated_paths.append(path)
# Remote links
if remote_links_records is not None:
existing_remote_links = load_parquet_month(output_dir / "remote_links", month_key)
updated_remote_links = upsert_dataframe(existing_remote_links, remote_links_records, "issue_key", issue_key)
path = save_parquet_month(updated_remote_links, REMOTE_LINKS_SCHEMA, output_dir / "remote_links", month_key)
updated_paths.append(path)
else:
# The writer (save_issue / backfill / backfill_remote_links) skipped
# the _remote_links overlay due to a Jira fetch failure. Preserve the
# existing parquet rows for this issue instead of wiping them.
logger.warning(
f"Skipping remote_links upsert for {issue_key}: overlay absent "
f"(fetch failure). Existing rows preserved."
)
# Update extract.duckdb _meta for all affected tables
try:

View file

@ -41,6 +41,7 @@ import httpx
from dotenv import load_dotenv
from app.logging_config import setup_logging
from connectors.jira.service import JiraFetchError
setup_logging(__name__)
logger = logging.getLogger(__name__)
@ -244,11 +245,10 @@ class JiraBackfill:
"""
Fetch remote links for an issue from Jira.
Args:
issue_key: Issue key (e.g., "PROJ-123")
Returns:
List of remote link dicts, empty list on failure
Mirrors connectors.jira.service.JiraService.fetch_remote_links
raises JiraFetchError on auth (401/403) or server (5xx) failure so
the caller can skip the overlay rather than wipe existing parquet
rows. 429 rate-limit retries are kept as-is (legitimate transient).
"""
url = f"{self.base_url}/issue/{issue_key}/remotelink"
@ -259,23 +259,34 @@ class JiraBackfill:
auth=self.auth,
headers={"Accept": "application/json"},
)
except httpx.RequestError as e:
raise JiraFetchError(
f"Backfill remote-links fetch for {issue_key} failed: connection — {e}"
) from e
if response.status_code == 200:
return response.json()
elif response.status_code == 404:
if response.status_code == 404:
return []
elif response.status_code == 429:
if response.status_code == 429:
retry_after = int(response.headers.get("Retry-After", 60))
logger.warning(f"Rate limited on remote links, waiting {retry_after}s...")
time.sleep(retry_after)
return self.fetch_remote_links(issue_key)
else:
logger.debug(f"Failed to fetch remote links for {issue_key}: {response.status_code}")
return []
except httpx.RequestError as e:
logger.debug(f"Error fetching remote links for {issue_key}: {e}")
return []
if response.status_code in (401, 403):
raise JiraFetchError(
f"Backfill remote-links fetch for {issue_key} failed: auth error "
f"({response.status_code}) — token may be expired/revoked"
)
if response.status_code >= 500:
raise JiraFetchError(
f"Backfill remote-links fetch for {issue_key} failed: server error "
f"({response.status_code})"
)
raise JiraFetchError(
f"Backfill remote-links fetch for {issue_key} failed: unexpected status "
f"{response.status_code}"
)
def save_issue(self, issue_data: dict) -> Path | None:
"""
@ -398,8 +409,15 @@ class JiraBackfill:
self.stats["failed"] += 1
return False
# Fetch and embed remote links for Parquet transform
# Fetch and embed remote links for Parquet transform. If fetch fails,
# leave the key ABSENT so transform_remote_links preserves existing rows.
try:
issue_data["_remote_links"] = self.fetch_remote_links(issue_key)
except JiraFetchError as e:
logger.warning(
f"Skipping _remote_links overlay for {issue_key}: {e}. "
f"Existing parquet rows will be preserved."
)
# Save JSON
if not self.save_issue(issue_data):

View file

@ -37,6 +37,7 @@ import httpx
from dotenv import load_dotenv
from app.logging_config import setup_logging
from connectors.jira.service import JiraFetchError
setup_logging(__name__)
logger = logging.getLogger(__name__)
@ -72,7 +73,12 @@ def load_config() -> dict:
def fetch_remote_links(base_url: str, auth: tuple[str, str], issue_key: str) -> list[dict]:
"""Fetch remote links for a single issue from Jira API."""
"""Fetch remote links for a single issue from Jira API.
Raises JiraFetchError on auth/server failure so the caller can
skip the overlay rather than write [] and let the next incremental
transform wipe existing parquet rows.
"""
url = f"{base_url}/issue/{issue_key}/remotelink"
try:
@ -82,23 +88,34 @@ def fetch_remote_links(base_url: str, auth: tuple[str, str], issue_key: str) ->
auth=auth,
headers={"Accept": "application/json"},
)
except httpx.RequestError as e:
raise JiraFetchError(
f"Targeted backfill remote-links fetch for {issue_key} failed: connection — {e}"
) from e
if response.status_code == 200:
return response.json()
elif response.status_code == 404:
if response.status_code == 404:
return []
elif response.status_code == 429:
if response.status_code == 429:
retry_after = int(response.headers.get("Retry-After", 60))
logger.warning(f"Rate limited, waiting {retry_after}s...")
time.sleep(retry_after)
return fetch_remote_links(base_url, auth, issue_key)
else:
logger.debug(f"Failed to fetch remote links for {issue_key}: {response.status_code}")
return []
except httpx.RequestError as e:
logger.debug(f"Error fetching remote links for {issue_key}: {e}")
return []
if response.status_code in (401, 403):
raise JiraFetchError(
f"Targeted backfill remote-links fetch for {issue_key} failed: auth error "
f"({response.status_code}) — token may be expired/revoked"
)
if response.status_code >= 500:
raise JiraFetchError(
f"Targeted backfill remote-links fetch for {issue_key} failed: server error "
f"({response.status_code})"
)
raise JiraFetchError(
f"Targeted backfill remote-links fetch for {issue_key} failed: unexpected status "
f"{response.status_code}"
)
def process_file(json_path: Path, base_url: str, auth: tuple[str, str]) -> str:
@ -119,8 +136,17 @@ def process_file(json_path: Path, base_url: str, auth: tuple[str, str]) -> str:
if not issue_key:
return "failed"
# Fetch remote links
# Fetch remote links. If fetch fails, return "failed" without writing
# the JSON — leaves the file as-is (no _remote_links key) so the next
# incremental transform preserves existing parquet rows.
try:
remote_links = fetch_remote_links(base_url, auth, issue_key)
except JiraFetchError as e:
logger.warning(
f"Skipping _remote_links embed for {issue_key}: {e}. "
f"Existing parquet rows will be preserved."
)
return "failed"
# Embed in data
data["_remote_links"] = remote_links

View file

@ -119,7 +119,7 @@ class JiraConsistencyChecker:
GRACE_PERIOD_MINUTES = 5
# Thresholds for auto-backfill
AUTO_FIX_THRESHOLD = 10 # Auto-fix if ≤10 issues missing
AUTO_FIX_THRESHOLD = 20 # Auto-fix if ≤20 issues missing
WARNING_THRESHOLD = 5 # Log WARNING if >5 issues
# Jira API limits

View file

@ -23,6 +23,15 @@ from connectors.jira.validation import is_valid_issue_key, safe_join_under
logger = logging.getLogger(__name__)
class JiraFetchError(Exception):
"""Raised by Jira fetch helpers when the API returns an auth (401/403)
or server (5xx) error. Callers that overlay the result onto cached
issue JSON (save_issue, backfill processors) MUST catch this and
skip the overlay; otherwise a transient outage silently wipes
existing parquet rows for that issue.
"""
class _JiraConfig:
"""Jira configuration from environment variables."""
JIRA_DOMAIN = os.environ.get("JIRA_DOMAIN", "")
@ -118,7 +127,7 @@ class JiraService:
Fetch complete issue data from Jira.
Args:
issue_key: Issue key (e.g., "KSP-123")
issue_key: Issue key (e.g., "PROJ-123")
Returns:
Issue data dict or None if fetch failed
@ -208,14 +217,28 @@ class JiraService:
"""
Fetch remote links for an issue from Jira.
Args:
issue_key: Issue key (e.g., "KSP-123")
Returns:
List of remote link dicts, empty list on failure
Returns the list of remote links on 200; an empty list on 404
(issue legitimately has no remote links). Raises JiraFetchError
on ANY other status code or on httpx.RequestError, so callers
that overlay this onto cached issue data can skip the overlay
instead of wiping existing rows. Critically, 429 rate limits
also raise silently returning [] there would re-trigger the
same wipe bug (a webhook burst hitting Jira's rate limiter is
the most likely production scenario).
"""
# Unconfigured-service case: per the new contract, callers
# interpret `[]` as "issue legitimately has no remote links"
# and `JiraFetchError` as "fetch failed, preserve existing
# rows". Silently returning `[]` here would overlay an empty
# list onto cached issue JSON and wipe existing parquet rows
# the next time a webhook fires while creds happen to be
# missing — the exact regression this PR closes for the 401
# / 429 / 5xx paths. Raise instead so the overlay site skips.
if not self.is_configured():
return []
raise JiraFetchError(
f"Remote-links fetch for {issue_key} failed: "
"Jira service not configured (missing API credentials)"
)
url = f"{self.base_url}/issue/{issue_key}/remotelink"
@ -226,21 +249,34 @@ class JiraService:
auth=self.auth,
headers={"Accept": "application/json"},
)
except httpx.RequestError as e:
raise JiraFetchError(
f"Remote-links fetch for {issue_key} failed: connection — {e}"
) from e
if response.status_code == 200:
return response.json()
elif response.status_code == 404:
if response.status_code == 404:
return []
else:
logger.warning(
f"Failed to fetch remote links for {issue_key}: "
if response.status_code in (401, 403):
raise JiraFetchError(
f"Remote-links fetch for {issue_key} failed: auth error "
f"({response.status_code}) — token may be expired/revoked"
)
if response.status_code == 429:
raise JiraFetchError(
f"Remote-links fetch for {issue_key} failed: rate limited "
f"(429) — retry later"
)
if response.status_code >= 500:
raise JiraFetchError(
f"Remote-links fetch for {issue_key} failed: server error "
f"({response.status_code})"
)
raise JiraFetchError(
f"Remote-links fetch for {issue_key} failed: unexpected status "
f"{response.status_code}"
)
return []
except httpx.RequestError as e:
logger.warning(f"Request error fetching remote links for {issue_key}: {e}")
return []
def save_issue(self, issue_data: dict[str, Any]) -> Path | None:
"""
@ -276,10 +312,19 @@ class JiraService:
# Add metadata
issue_data["_synced_at"] = datetime.now(timezone.utc).isoformat()
# Fetch and embed remote links for Parquet transform
# Overlay-skip guard: if fetch_remote_links raises (auth/server failure),
# leave the _remote_links key ABSENT. transform_remote_links treats absent key
# as "no fresh data, preserve existing parquet rows". A present-but-empty list
# would be interpreted as "this issue has no remote links — wipe existing".
issue_key_for_links = issue_data.get("key")
if issue_key_for_links:
try:
issue_data["_remote_links"] = self.fetch_remote_links(issue_key_for_links)
except JiraFetchError as e:
logger.warning(
f"Skipping _remote_links overlay for {issue_key_for_links}: {e}. "
f"Existing parquet rows will be preserved."
)
# Overlay SLA fields from JSM service account (personal token lacks permissions)
sla_fields = self.fetch_sla_fields(issue_key)

View file

@ -526,14 +526,32 @@ def transform_issuelinks(raw_issue: dict) -> list[dict]:
return records
def transform_remote_links(raw_issue: dict) -> list[dict]:
def transform_remote_links(raw_issue: dict) -> list[dict] | None:
"""Extract and transform remote links from an issue.
Remote links are embedded in the raw issue JSON as `_remote_links`
by the fetch layer (jira_service.py / jira_backfill.py).
Returns:
- list[dict]: fresh records to upsert into parquet. May be empty,
meaning the issue legitimately has no remote links right now
(HTTP 200 with [] or HTTP 404 from the fetch).
- None: the _remote_links key was absent from raw_issue, which
signals that save_issue (or the equivalent backfill writer)
could not refresh remote links typically a 401/403/5xx from
the Jira API. Callers MUST treat None as "skip the upsert";
overwriting with [] would delete existing parquet rows for
this issue.
The key shape is set by the writers (JiraService.save_issue,
JiraBackfiller, backfill_remote_links): present means the fetch
succeeded (200 or 404), absent means the fetch raised.
"""
issue_key = raw_issue.get("key")
remote_links = raw_issue.get("_remote_links", [])
# Treat both absent key and explicit None as the "no fresh data" signal —
# absent is the contract from save_issue/backfill writers, None is the
# defensive case where a JSON edit or older buggy code stored an explicit
# null (would otherwise blow up on `for rl in None`).
remote_links = raw_issue.get("_remote_links")
if remote_links is None:
return None
records = []
for rl in remote_links:
@ -664,7 +682,15 @@ def transform_all(
changelog_by_month[month_key].extend(transform_changelog(raw_issue))
issuelinks_by_month[month_key].extend(transform_issuelinks(raw_issue))
remote_links_by_month[month_key].extend(transform_remote_links(raw_issue))
rl_records = transform_remote_links(raw_issue)
if rl_records is not None:
remote_links_by_month[month_key].extend(rl_records)
# else: _remote_links overlay was skipped (fetch failure). The batch
# rebuild writes monthly parquets from scratch, so this issue simply
# contributes no rows to the rebuild — it doesn't "preserve" anything.
# A re-run after the outage clears will repopulate. The incremental
# path (incremental_transform.py) is what genuinely preserves
# existing rows; batch mode is full-rebuild and not the hot path.
except Exception as e:
logger.error(f"Error processing {json_file}: {e}")

View file

@ -1,6 +1,6 @@
[project]
name = "agnes-the-ai-analyst"
version = "0.54.18"
version = "0.54.19"
description = "Agnes — AI Data Analyst platform for AI analytical systems"
requires-python = ">=3.11,<3.14"
license = "MIT"

View file

@ -1,5 +1,6 @@
"""Tests for incremental Jira parquet transform (upsert_dataframe and friends)."""
import json
from pathlib import Path
from unittest.mock import patch
@ -10,8 +11,10 @@ import pytest
from connectors.jira.incremental_transform import (
load_parquet_month,
save_parquet_month,
transform_single_issue,
upsert_dataframe,
)
from connectors.jira.transform import REMOTE_LINKS_SCHEMA
# Minimal schema compatible with ISSUES_SCHEMA for testing purposes
@ -183,3 +186,129 @@ class TestParquetMonthlyPartitioning:
assert proj1.iloc[0]["summary"] == "Updated"
proj2 = final[final["issue_key"] == "PROJ-2"]
assert proj2.iloc[0]["summary"] == "Keep"
def _seed_remote_links_parquet(parquet_root, month_key, rows):
"""Write a starter remote_links parquet so we can assert preservation."""
df = pd.DataFrame(rows)
target = parquet_root / "remote_links"
target.mkdir(parents=True, exist_ok=True)
save_parquet_month(df, REMOTE_LINKS_SCHEMA, target, month_key)
def _write_raw_issue(raw_dir, issue_key, payload):
"""Seed raw issue JSON at the path transform_single_issue reads from."""
issues_dir = raw_dir / "issues"
issues_dir.mkdir(parents=True, exist_ok=True)
(issues_dir / f"{issue_key}.json").write_text(json.dumps(payload))
def test_incremental_preserves_remote_links_when_overlay_absent(tmp_path):
"""When the _remote_links key is absent from the raw JSON (the writer
skipped the overlay due to a Jira fetch failure), transform_single_issue
must NOT wipe existing parquet rows for that issue. Existing rows must
remain untouched and the function must report success."""
raw_dir = tmp_path / "raw"
output_dir = tmp_path / "parquet"
attachments_dir = tmp_path / "attachments"
output_dir.mkdir()
attachments_dir.mkdir()
# Pre-seed an existing remote-link row for PROJ-1 in month 2026-05.
_seed_remote_links_parquet(output_dir, "2026-05", [{
"issue_key": "PROJ-1",
"remote_link_id": "rl-existing",
"url": "https://example.com/old",
"title": "Pre-existing link",
"application_name": "X",
"application_type": "x",
}])
# Raw issue WITHOUT _remote_links key — overlay was skipped upstream.
_write_raw_issue(raw_dir, "PROJ-1", {
"key": "PROJ-1",
"id": "10001",
"fields": {
"summary": "test",
"status": {"name": "Open"},
"issuetype": {"name": "Bug"},
"attachment": [],
"comment": {"comments": []},
"created": "2026-05-15T00:00:00.000+0000",
"updated": "2026-05-15T00:00:00.000+0000",
},
# NOTE: no _remote_links key — that is the test condition.
})
ok = transform_single_issue(
issue_key="PROJ-1",
raw_dir=raw_dir,
output_dir=output_dir,
attachments_dir=attachments_dir,
)
assert ok is True
df = load_parquet_month(output_dir / "remote_links", "2026-05")
assert df is not None and len(df) == 1, \
"Existing remote-link row was wiped — overlay-absent signal not honored"
assert df.iloc[0]["remote_link_id"] == "rl-existing"
def test_incremental_wipes_remote_links_when_overlay_present_but_empty(tmp_path):
"""The mirror-image case: when _remote_links IS present but the list is
empty, that's a successful fetch confirming the issue legitimately has no
remote links right now. The transform MUST wipe any existing parquet rows
for that issue keeping them would be stale data.
Together with test_incremental_preserves_remote_links_when_overlay_absent,
this locks the absent-vs-empty contract end-to-end. A future regression
that 'simplifies' the transform to treat [] the same as absent (i.e.,
skip upsert) would be caught here."""
raw_dir = tmp_path / "raw"
output_dir = tmp_path / "parquet"
attachments_dir = tmp_path / "attachments"
output_dir.mkdir()
attachments_dir.mkdir()
# Pre-seed a stale row that should be wiped.
_seed_remote_links_parquet(output_dir, "2026-05", [{
"issue_key": "PROJ-2",
"remote_link_id": "rl-stale",
"url": "https://example.com/stale",
"title": "Stale link to be wiped",
"application_name": "X",
"application_type": "x",
}])
# Raw issue WITH _remote_links: [] — fresh fetch confirmed empty.
_write_raw_issue(raw_dir, "PROJ-2", {
"key": "PROJ-2",
"id": "10002",
"fields": {
"summary": "test",
"status": {"name": "Open"},
"issuetype": {"name": "Bug"},
"attachment": [],
"comment": {"comments": []},
"created": "2026-05-15T00:00:00.000+0000",
"updated": "2026-05-15T00:00:00.000+0000",
},
"_remote_links": [],
})
ok = transform_single_issue(
issue_key="PROJ-2",
raw_dir=raw_dir,
output_dir=output_dir,
attachments_dir=attachments_dir,
)
assert ok is True
df = load_parquet_month(output_dir / "remote_links", "2026-05")
# Either the file was unlinked (df is None) or the row was removed.
# Both outcomes satisfy the contract — the stale row must not survive.
if df is not None:
remaining = df[df["issue_key"] == "PROJ-2"]
assert len(remaining) == 0, \
"Stale remote-link row survived a successful empty-list fetch — " \
"the empty-list (legitimate) signal was misinterpreted as preserve"

View file

@ -8,6 +8,10 @@ import pytest
from tests.helpers.factories import WebhookEventFactory
from connectors.jira.scripts.consistency_check import JiraConsistencyChecker
from connectors.jira.service import JiraFetchError
from connectors.jira.transform import transform_remote_links
@pytest.fixture
def jira_env(tmp_path, monkeypatch):
@ -180,3 +184,203 @@ class TestJiraServiceWebhookProcessing:
secret.encode("utf-8"), payload, hashlib.sha256
).hexdigest()
assert sig == f"sha256={expected_mac}"
class TestFetchRemoteLinks:
"""fetch_remote_links must raise on auth/server failure to prevent the
save_issue overlay from writing [] into cached JSON, which downstream
would interpret as 'delete existing remote_links rows for this issue'."""
def _mock_http(self, status_code, json_body=None):
"""Build a MagicMock httpx.Client context manager returning a fixed response."""
response = MagicMock()
response.status_code = status_code
response.json.return_value = json_body or []
client = MagicMock()
client.get.return_value = response
client.__enter__ = lambda s: client
client.__exit__ = MagicMock(return_value=False)
return client
def test_returns_list_on_200(self, jira_env):
service = _make_jira_service(jira_env)
with patch("connectors.jira.service.httpx.Client",
return_value=self._mock_http(200, [{"id": "1"}])):
assert service.fetch_remote_links("PROJ-1") == [{"id": "1"}]
def test_returns_empty_on_404(self, jira_env):
service = _make_jira_service(jira_env)
with patch("connectors.jira.service.httpx.Client",
return_value=self._mock_http(404)):
assert service.fetch_remote_links("PROJ-1") == []
def test_raises_on_401(self, jira_env):
service = _make_jira_service(jira_env)
with patch("connectors.jira.service.httpx.Client",
return_value=self._mock_http(401)):
with pytest.raises(JiraFetchError, match="auth"):
service.fetch_remote_links("PROJ-1")
def test_raises_on_403(self, jira_env):
service = _make_jira_service(jira_env)
with patch("connectors.jira.service.httpx.Client",
return_value=self._mock_http(403)):
with pytest.raises(JiraFetchError, match="auth"):
service.fetch_remote_links("PROJ-1")
def test_raises_on_500(self, jira_env):
service = _make_jira_service(jira_env)
with patch("connectors.jira.service.httpx.Client",
return_value=self._mock_http(500)):
with pytest.raises(JiraFetchError, match="server"):
service.fetch_remote_links("PROJ-1")
def test_raises_on_429_rate_limit(self, jira_env):
# 429 in the webhook hot path must raise (not silently return []).
# A webhook burst hitting Jira's rate limiter is the most likely
# production scenario; returning [] would re-trigger the wipe bug.
service = _make_jira_service(jira_env)
with patch("connectors.jira.service.httpx.Client",
return_value=self._mock_http(429)):
with pytest.raises(JiraFetchError, match="rate limited"):
service.fetch_remote_links("PROJ-1")
def test_raises_on_unexpected_status(self, jira_env):
# Any non-success/non-404 status raises — covers 400, 405, 418, etc.
# No silent fall-through.
service = _make_jira_service(jira_env)
with patch("connectors.jira.service.httpx.Client",
return_value=self._mock_http(418)):
with pytest.raises(JiraFetchError, match="unexpected status"):
service.fetch_remote_links("PROJ-1")
def test_raises_when_unconfigured(self, jira_env):
"""Regression guard for the Devin adversarial-review finding:
when the Jira service is unconfigured (missing API credentials)
the fetch must raise JiraFetchError, NOT silently return [].
Silent [] would overlay an empty list onto cached issue JSON via
save_issue and wipe existing remote_links parquet rows the next
time a webhook fires while creds happen to be missing the
exact scenario this PR closes for the 401 / 429 / 5xx paths.
The webhook hot path passes HMAC verification with
JIRA_WEBHOOK_SECRET (separate from API creds), so a
missing-creds + webhook-arrives combo is realistic, not
theoretical."""
from connectors.jira import service as svc
# Mirror _make_jira_service except DROP the API credentials so
# is_configured() returns False.
svc.Config.JIRA_DOMAIN = ""
svc.Config.JIRA_EMAIL = ""
svc.Config.JIRA_API_TOKEN = ""
svc.Config.JIRA_DATA_DIR = jira_env
svc.Config.JIRA_WEBHOOK_SECRET = "webhook-secret-123"
svc._jira_service = None
service = svc.get_jira_service()
assert not service.is_configured()
with pytest.raises(JiraFetchError, match="not configured"):
service.fetch_remote_links("PROJ-1")
def test_raises_on_request_error(self, jira_env):
import httpx
service = _make_jira_service(jira_env)
client = MagicMock()
client.get.side_effect = httpx.RequestError("connection reset")
client.__enter__ = lambda s: client
client.__exit__ = MagicMock(return_value=False)
with patch("connectors.jira.service.httpx.Client", return_value=client):
with pytest.raises(JiraFetchError, match="connection"):
service.fetch_remote_links("PROJ-1")
class TestSaveIssueRemoteLinksOverlay:
"""save_issue must NOT set _remote_links when fetch_remote_links raises.
The absent key is the contract with transform_remote_links: it signals
'preserve existing rows'. A present-but-empty list would wipe them."""
def test_sets_remote_links_on_success(self, jira_env):
service = _make_jira_service(jira_env)
with patch.object(service, "fetch_remote_links", return_value=[{"id": "rl-1"}]), \
patch.object(service, "fetch_sla_fields", return_value=None):
path = service.save_issue(_fake_issue_data("PROJ-1"))
with open(path) as f:
data = json.load(f)
assert data["_remote_links"] == [{"id": "rl-1"}]
def test_sets_empty_remote_links_on_404(self, jira_env):
# 404 stays as [] — legitimately means "issue has no remote links".
service = _make_jira_service(jira_env)
with patch.object(service, "fetch_remote_links", return_value=[]), \
patch.object(service, "fetch_sla_fields", return_value=None):
path = service.save_issue(_fake_issue_data("PROJ-1"))
with open(path) as f:
data = json.load(f)
assert data["_remote_links"] == []
def test_omits_remote_links_key_on_fetch_error(self, jira_env):
# When fetch raises, the key MUST be absent — that's the signal
# to the transform that this isn't fresh data and should be skipped.
service = _make_jira_service(jira_env)
with patch.object(service, "fetch_remote_links",
side_effect=JiraFetchError("auth")), \
patch.object(service, "fetch_sla_fields", return_value=None):
path = service.save_issue(_fake_issue_data("PROJ-1"))
with open(path) as f:
data = json.load(f)
assert "_remote_links" not in data, \
"Absent key is the contract with transform_remote_links — " \
"do not change to an empty list."
class TestTransformRemoteLinks:
"""transform_remote_links returns None when _remote_links is absent
(preserve-existing signal) and [] when present-but-empty (legitimate 'none')."""
def test_returns_list_when_links_present(self):
result = transform_remote_links({
"key": "PROJ-1",
"_remote_links": [{
"id": "rl-1",
"object": {"url": "https://x", "title": "X"},
"application": {"name": "App", "type": "type"},
}],
})
assert result is not None
assert len(result) == 1
assert result[0]["remote_link_id"] == "rl-1"
def test_returns_empty_list_when_key_present_but_empty(self):
result = transform_remote_links({"key": "PROJ-1", "_remote_links": []})
assert result == []
def test_returns_none_when_key_absent(self):
# Absent key = save_issue skipped the overlay because fetch failed.
# Signal to caller: preserve existing parquet rows for this issue.
result = transform_remote_links({"key": "PROJ-1"})
assert result is None
def test_returns_none_when_key_is_explicit_null(self):
# Defensive: a JSON with `_remote_links: null` (from older buggy code
# or a manual edit) would otherwise blow up on `for rl in None`.
# Treat null the same as absent — both mean "no fresh data".
result = transform_remote_links({"key": "PROJ-1", "_remote_links": None})
assert result is None
class TestAutoFixThreshold:
"""The auto-fix threshold determines at what gap size we auto-backfill
vs require manual review. Legacy bumped it from 10 to 20 small enough
to stay safe, big enough to absorb a typical SLA-poller hiccup."""
def test_threshold_is_twenty(self):
assert JiraConsistencyChecker.AUTO_FIX_THRESHOLD == 20
def test_alert_level_warning_at_threshold(self):
# 20 missing should still be WARNING (auto-fix territory), not ERROR.
config = MagicMock()
checker = JiraConsistencyChecker(config)
assert checker.get_alert_level(20) == "WARNING"
def test_alert_level_error_above_threshold(self):
config = MagicMock()
checker = JiraConsistencyChecker(config)
assert checker.get_alert_level(21) == "ERROR"