fix(jira): harden _remote_links fetch — prevent transient outage from wiping parquet rows (#319)
* fix(jira): harden _remote_links fetch — transient API failure no longer wipes parquet rows Pre-fix, all three fetch_remote_links call sites (service.py, scripts/backfill.py, scripts/backfill_remote_links.py) silently returned [] on 401/403/429/5xx or httpx.RequestError. Callers overlaid that [] onto cached issue JSON, and transform_remote_links interpreted the empty list as 'issue legitimately has no remote links — delete existing rows', so a transient Jira auth blip permanently wiped remote-link history. Now: - Every fetch site raises JiraFetchError on non-200/non-404 status, on httpx.RequestError, and on the 'service not configured' path. - Overlay sites skip the _remote_links key when fetch raises, leaving it ABSENT (not present-but-empty). - transform_remote_links returns None for absent/null keys (preserve existing rows) vs [] (legitimate empty — wipe). - Both consumers (batch transform_all, incremental transform_single_issue) honor the new contract. - End-to-end tests test_incremental_preserves_remote_links_when_overlay_absent and test_incremental_wipes_remote_links_when_overlay_present_but_empty lock both halves. Adversarial-review fixes bundled: - service.py: unconfigured-service path now raises JiraFetchError instead of returning [] (a webhook can arrive while API creds are missing — HMAC verification uses a separate JIRA_WEBHOOK_SECRET). Regression guard test_raises_when_unconfigured added. - consistency_check.py: AUTO_FIX_THRESHOLD bumped 10 -> 20 to cover typical SLA-poller hiccups before escalating to ERROR. - CLAUDE.md: connectors/jira/transform.py removed from 'Files NOT to modify' (overlay-contract change required touching it; module remains sensitive but is no longer off-limits). * release: 0.54.19 — jira remote_links hardening (transient API failure no longer wipes parquet rows)
This commit is contained in:
parent
9e948abc9c
commit
ed3e8337ab
11 changed files with 578 additions and 74 deletions
41
CHANGELOG.md
41
CHANGELOG.md
|
|
@ -10,6 +10,47 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C
|
||||||
|
|
||||||
## [Unreleased]
|
## [Unreleased]
|
||||||
|
|
||||||
|
## [0.54.19] — 2026-05-15
|
||||||
|
|
||||||
|
### Changed
|
||||||
|
- `connectors/jira/scripts/consistency_check.py` —
|
||||||
|
`AUTO_FIX_THRESHOLD` bumped from 10 to 20. Auto-backfill now covers
|
||||||
|
typical SLA-poller hiccups before escalating to ERROR.
|
||||||
|
`WARNING_THRESHOLD` unchanged.
|
||||||
|
|
||||||
|
### Fixed
|
||||||
|
- **`connectors/jira` — transient Jira API failure no longer wipes
|
||||||
|
existing `remote_links` parquet rows.** Pre-fix, all three
|
||||||
|
`fetch_remote_links` sites (`service.py`, `scripts/backfill.py`,
|
||||||
|
`scripts/backfill_remote_links.py`) silently returned `[]` on
|
||||||
|
401/403/429/5xx or `httpx.RequestError`. Callers overlaid that `[]`
|
||||||
|
onto cached issue JSON, and `transform_remote_links` interpreted the
|
||||||
|
empty list as "issue legitimately has no remote links — delete its
|
||||||
|
existing rows", so a transient Jira auth blip (or a webhook burst
|
||||||
|
hitting Jira's rate limiter) permanently wiped remote-link history.
|
||||||
|
Now: every fetch site raises `JiraFetchError` on non-200/non-404
|
||||||
|
status and `httpx.RequestError` (including the "service not
|
||||||
|
configured" path — a webhook arriving while API creds are missing
|
||||||
|
no longer surfaces as a silent wipe), overlay sites skip the
|
||||||
|
`_remote_links` key on raise (leaving it ABSENT, not
|
||||||
|
present-but-empty), and `transform_remote_links` returns `None` for
|
||||||
|
absent / `null` keys (preserve existing rows) vs `[]` (legitimate
|
||||||
|
empty — wipe). Both consumers (batch `transform_all` and incremental
|
||||||
|
`transform_single_issue`) honor the new contract. End-to-end tests
|
||||||
|
lock both halves: `test_incremental_preserves_remote_links_when_overlay_absent`
|
||||||
|
+ `test_incremental_wipes_remote_links_when_overlay_present_but_empty`.
|
||||||
|
The bulk-backfill scripts retain their existing `Retry-After`
|
||||||
|
sleep+retry loop for 429 (appropriate for non-interactive batch
|
||||||
|
contexts); only the webhook hot path raises on 429.
|
||||||
|
|
||||||
|
### Internal
|
||||||
|
- `CLAUDE.md` — `connectors/jira/transform.py` removed from the
|
||||||
|
"Files NOT to modify" list. The `_remote_links` hardening required
|
||||||
|
modifying `transform_remote_links` and `transform_all` to honor the
|
||||||
|
new "overlay absent → preserve existing rows" contract; the module
|
||||||
|
remains sensitive (touch only with end-to-end understanding of the
|
||||||
|
JSON-overlay / parquet-rewrite pipeline) but is no longer off-limits.
|
||||||
|
|
||||||
## [0.54.18] — 2026-05-15
|
## [0.54.18] — 2026-05-15
|
||||||
|
|
||||||
### Added
|
### Added
|
||||||
|
|
|
||||||
|
|
@ -293,9 +293,15 @@ Auth providers in `app/auth/` (FastAPI-based):
|
||||||
|
|
||||||
### Files NOT to modify (stable infrastructure)
|
### Files NOT to modify (stable infrastructure)
|
||||||
- `connectors/jira/file_lock.py` — advisory file locking
|
- `connectors/jira/file_lock.py` — advisory file locking
|
||||||
- `connectors/jira/transform.py` — core Jira transform logic
|
|
||||||
- `services/ws_gateway/` — WebSocket notification gateway
|
- `services/ws_gateway/` — WebSocket notification gateway
|
||||||
|
|
||||||
|
(`connectors/jira/transform.py` was previously listed here but has been
|
||||||
|
removed: the `_remote_links` hardening in 0.54.19 required modifying
|
||||||
|
`transform_remote_links` and `transform_all` to honor a new "overlay
|
||||||
|
absent → preserve existing rows" contract. The transform module remains
|
||||||
|
sensitive — touch it only when you understand the JSON-overlay /
|
||||||
|
parquet-rewrite pipeline end-to-end — but it is no longer off-limits.)
|
||||||
|
|
||||||
## Release process
|
## Release process
|
||||||
|
|
||||||
Full recipe, deploy workflows, and CI quirks: [`docs/RELEASING.md`](docs/RELEASING.md). The non-negotiable rules:
|
Full recipe, deploy workflows, and CI quirks: [`docs/RELEASING.md`](docs/RELEASING.md). The non-negotiable rules:
|
||||||
|
|
|
||||||
|
|
@ -217,10 +217,19 @@ def transform_single_issue(
|
||||||
updated_paths.append(path)
|
updated_paths.append(path)
|
||||||
|
|
||||||
# Remote links
|
# Remote links
|
||||||
existing_remote_links = load_parquet_month(output_dir / "remote_links", month_key)
|
if remote_links_records is not None:
|
||||||
updated_remote_links = upsert_dataframe(existing_remote_links, remote_links_records, "issue_key", issue_key)
|
existing_remote_links = load_parquet_month(output_dir / "remote_links", month_key)
|
||||||
path = save_parquet_month(updated_remote_links, REMOTE_LINKS_SCHEMA, output_dir / "remote_links", month_key)
|
updated_remote_links = upsert_dataframe(existing_remote_links, remote_links_records, "issue_key", issue_key)
|
||||||
updated_paths.append(path)
|
path = save_parquet_month(updated_remote_links, REMOTE_LINKS_SCHEMA, output_dir / "remote_links", month_key)
|
||||||
|
updated_paths.append(path)
|
||||||
|
else:
|
||||||
|
# The writer (save_issue / backfill / backfill_remote_links) skipped
|
||||||
|
# the _remote_links overlay due to a Jira fetch failure. Preserve the
|
||||||
|
# existing parquet rows for this issue instead of wiping them.
|
||||||
|
logger.warning(
|
||||||
|
f"Skipping remote_links upsert for {issue_key}: overlay absent "
|
||||||
|
f"(fetch failure). Existing rows preserved."
|
||||||
|
)
|
||||||
|
|
||||||
# Update extract.duckdb _meta for all affected tables
|
# Update extract.duckdb _meta for all affected tables
|
||||||
try:
|
try:
|
||||||
|
|
|
||||||
|
|
@ -41,6 +41,7 @@ import httpx
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
|
|
||||||
from app.logging_config import setup_logging
|
from app.logging_config import setup_logging
|
||||||
|
from connectors.jira.service import JiraFetchError
|
||||||
|
|
||||||
setup_logging(__name__)
|
setup_logging(__name__)
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
@ -244,11 +245,10 @@ class JiraBackfill:
|
||||||
"""
|
"""
|
||||||
Fetch remote links for an issue from Jira.
|
Fetch remote links for an issue from Jira.
|
||||||
|
|
||||||
Args:
|
Mirrors connectors.jira.service.JiraService.fetch_remote_links —
|
||||||
issue_key: Issue key (e.g., "PROJ-123")
|
raises JiraFetchError on auth (401/403) or server (5xx) failure so
|
||||||
|
the caller can skip the overlay rather than wipe existing parquet
|
||||||
Returns:
|
rows. 429 rate-limit retries are kept as-is (legitimate transient).
|
||||||
List of remote link dicts, empty list on failure
|
|
||||||
"""
|
"""
|
||||||
url = f"{self.base_url}/issue/{issue_key}/remotelink"
|
url = f"{self.base_url}/issue/{issue_key}/remotelink"
|
||||||
|
|
||||||
|
|
@ -259,23 +259,34 @@ class JiraBackfill:
|
||||||
auth=self.auth,
|
auth=self.auth,
|
||||||
headers={"Accept": "application/json"},
|
headers={"Accept": "application/json"},
|
||||||
)
|
)
|
||||||
|
|
||||||
if response.status_code == 200:
|
|
||||||
return response.json()
|
|
||||||
elif response.status_code == 404:
|
|
||||||
return []
|
|
||||||
elif response.status_code == 429:
|
|
||||||
retry_after = int(response.headers.get("Retry-After", 60))
|
|
||||||
logger.warning(f"Rate limited on remote links, waiting {retry_after}s...")
|
|
||||||
time.sleep(retry_after)
|
|
||||||
return self.fetch_remote_links(issue_key)
|
|
||||||
else:
|
|
||||||
logger.debug(f"Failed to fetch remote links for {issue_key}: {response.status_code}")
|
|
||||||
return []
|
|
||||||
|
|
||||||
except httpx.RequestError as e:
|
except httpx.RequestError as e:
|
||||||
logger.debug(f"Error fetching remote links for {issue_key}: {e}")
|
raise JiraFetchError(
|
||||||
|
f"Backfill remote-links fetch for {issue_key} failed: connection — {e}"
|
||||||
|
) from e
|
||||||
|
|
||||||
|
if response.status_code == 200:
|
||||||
|
return response.json()
|
||||||
|
if response.status_code == 404:
|
||||||
return []
|
return []
|
||||||
|
if response.status_code == 429:
|
||||||
|
retry_after = int(response.headers.get("Retry-After", 60))
|
||||||
|
logger.warning(f"Rate limited on remote links, waiting {retry_after}s...")
|
||||||
|
time.sleep(retry_after)
|
||||||
|
return self.fetch_remote_links(issue_key)
|
||||||
|
if response.status_code in (401, 403):
|
||||||
|
raise JiraFetchError(
|
||||||
|
f"Backfill remote-links fetch for {issue_key} failed: auth error "
|
||||||
|
f"({response.status_code}) — token may be expired/revoked"
|
||||||
|
)
|
||||||
|
if response.status_code >= 500:
|
||||||
|
raise JiraFetchError(
|
||||||
|
f"Backfill remote-links fetch for {issue_key} failed: server error "
|
||||||
|
f"({response.status_code})"
|
||||||
|
)
|
||||||
|
raise JiraFetchError(
|
||||||
|
f"Backfill remote-links fetch for {issue_key} failed: unexpected status "
|
||||||
|
f"{response.status_code}"
|
||||||
|
)
|
||||||
|
|
||||||
def save_issue(self, issue_data: dict) -> Path | None:
|
def save_issue(self, issue_data: dict) -> Path | None:
|
||||||
"""
|
"""
|
||||||
|
|
@ -398,8 +409,15 @@ class JiraBackfill:
|
||||||
self.stats["failed"] += 1
|
self.stats["failed"] += 1
|
||||||
return False
|
return False
|
||||||
|
|
||||||
# Fetch and embed remote links for Parquet transform
|
# Fetch and embed remote links for Parquet transform. If fetch fails,
|
||||||
issue_data["_remote_links"] = self.fetch_remote_links(issue_key)
|
# leave the key ABSENT so transform_remote_links preserves existing rows.
|
||||||
|
try:
|
||||||
|
issue_data["_remote_links"] = self.fetch_remote_links(issue_key)
|
||||||
|
except JiraFetchError as e:
|
||||||
|
logger.warning(
|
||||||
|
f"Skipping _remote_links overlay for {issue_key}: {e}. "
|
||||||
|
f"Existing parquet rows will be preserved."
|
||||||
|
)
|
||||||
|
|
||||||
# Save JSON
|
# Save JSON
|
||||||
if not self.save_issue(issue_data):
|
if not self.save_issue(issue_data):
|
||||||
|
|
|
||||||
|
|
@ -37,6 +37,7 @@ import httpx
|
||||||
from dotenv import load_dotenv
|
from dotenv import load_dotenv
|
||||||
|
|
||||||
from app.logging_config import setup_logging
|
from app.logging_config import setup_logging
|
||||||
|
from connectors.jira.service import JiraFetchError
|
||||||
|
|
||||||
setup_logging(__name__)
|
setup_logging(__name__)
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
@ -72,7 +73,12 @@ def load_config() -> dict:
|
||||||
|
|
||||||
|
|
||||||
def fetch_remote_links(base_url: str, auth: tuple[str, str], issue_key: str) -> list[dict]:
|
def fetch_remote_links(base_url: str, auth: tuple[str, str], issue_key: str) -> list[dict]:
|
||||||
"""Fetch remote links for a single issue from Jira API."""
|
"""Fetch remote links for a single issue from Jira API.
|
||||||
|
|
||||||
|
Raises JiraFetchError on auth/server failure so the caller can
|
||||||
|
skip the overlay rather than write [] and let the next incremental
|
||||||
|
transform wipe existing parquet rows.
|
||||||
|
"""
|
||||||
url = f"{base_url}/issue/{issue_key}/remotelink"
|
url = f"{base_url}/issue/{issue_key}/remotelink"
|
||||||
|
|
||||||
try:
|
try:
|
||||||
|
|
@ -82,23 +88,34 @@ def fetch_remote_links(base_url: str, auth: tuple[str, str], issue_key: str) ->
|
||||||
auth=auth,
|
auth=auth,
|
||||||
headers={"Accept": "application/json"},
|
headers={"Accept": "application/json"},
|
||||||
)
|
)
|
||||||
|
|
||||||
if response.status_code == 200:
|
|
||||||
return response.json()
|
|
||||||
elif response.status_code == 404:
|
|
||||||
return []
|
|
||||||
elif response.status_code == 429:
|
|
||||||
retry_after = int(response.headers.get("Retry-After", 60))
|
|
||||||
logger.warning(f"Rate limited, waiting {retry_after}s...")
|
|
||||||
time.sleep(retry_after)
|
|
||||||
return fetch_remote_links(base_url, auth, issue_key)
|
|
||||||
else:
|
|
||||||
logger.debug(f"Failed to fetch remote links for {issue_key}: {response.status_code}")
|
|
||||||
return []
|
|
||||||
|
|
||||||
except httpx.RequestError as e:
|
except httpx.RequestError as e:
|
||||||
logger.debug(f"Error fetching remote links for {issue_key}: {e}")
|
raise JiraFetchError(
|
||||||
|
f"Targeted backfill remote-links fetch for {issue_key} failed: connection — {e}"
|
||||||
|
) from e
|
||||||
|
|
||||||
|
if response.status_code == 200:
|
||||||
|
return response.json()
|
||||||
|
if response.status_code == 404:
|
||||||
return []
|
return []
|
||||||
|
if response.status_code == 429:
|
||||||
|
retry_after = int(response.headers.get("Retry-After", 60))
|
||||||
|
logger.warning(f"Rate limited, waiting {retry_after}s...")
|
||||||
|
time.sleep(retry_after)
|
||||||
|
return fetch_remote_links(base_url, auth, issue_key)
|
||||||
|
if response.status_code in (401, 403):
|
||||||
|
raise JiraFetchError(
|
||||||
|
f"Targeted backfill remote-links fetch for {issue_key} failed: auth error "
|
||||||
|
f"({response.status_code}) — token may be expired/revoked"
|
||||||
|
)
|
||||||
|
if response.status_code >= 500:
|
||||||
|
raise JiraFetchError(
|
||||||
|
f"Targeted backfill remote-links fetch for {issue_key} failed: server error "
|
||||||
|
f"({response.status_code})"
|
||||||
|
)
|
||||||
|
raise JiraFetchError(
|
||||||
|
f"Targeted backfill remote-links fetch for {issue_key} failed: unexpected status "
|
||||||
|
f"{response.status_code}"
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
def process_file(json_path: Path, base_url: str, auth: tuple[str, str]) -> str:
|
def process_file(json_path: Path, base_url: str, auth: tuple[str, str]) -> str:
|
||||||
|
|
@ -119,8 +136,17 @@ def process_file(json_path: Path, base_url: str, auth: tuple[str, str]) -> str:
|
||||||
if not issue_key:
|
if not issue_key:
|
||||||
return "failed"
|
return "failed"
|
||||||
|
|
||||||
# Fetch remote links
|
# Fetch remote links. If fetch fails, return "failed" without writing
|
||||||
remote_links = fetch_remote_links(base_url, auth, issue_key)
|
# the JSON — leaves the file as-is (no _remote_links key) so the next
|
||||||
|
# incremental transform preserves existing parquet rows.
|
||||||
|
try:
|
||||||
|
remote_links = fetch_remote_links(base_url, auth, issue_key)
|
||||||
|
except JiraFetchError as e:
|
||||||
|
logger.warning(
|
||||||
|
f"Skipping _remote_links embed for {issue_key}: {e}. "
|
||||||
|
f"Existing parquet rows will be preserved."
|
||||||
|
)
|
||||||
|
return "failed"
|
||||||
|
|
||||||
# Embed in data
|
# Embed in data
|
||||||
data["_remote_links"] = remote_links
|
data["_remote_links"] = remote_links
|
||||||
|
|
|
||||||
|
|
@ -119,7 +119,7 @@ class JiraConsistencyChecker:
|
||||||
GRACE_PERIOD_MINUTES = 5
|
GRACE_PERIOD_MINUTES = 5
|
||||||
|
|
||||||
# Thresholds for auto-backfill
|
# Thresholds for auto-backfill
|
||||||
AUTO_FIX_THRESHOLD = 10 # Auto-fix if ≤10 issues missing
|
AUTO_FIX_THRESHOLD = 20 # Auto-fix if ≤20 issues missing
|
||||||
WARNING_THRESHOLD = 5 # Log WARNING if >5 issues
|
WARNING_THRESHOLD = 5 # Log WARNING if >5 issues
|
||||||
|
|
||||||
# Jira API limits
|
# Jira API limits
|
||||||
|
|
|
||||||
|
|
@ -23,6 +23,15 @@ from connectors.jira.validation import is_valid_issue_key, safe_join_under
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
class JiraFetchError(Exception):
|
||||||
|
"""Raised by Jira fetch helpers when the API returns an auth (401/403)
|
||||||
|
or server (5xx) error. Callers that overlay the result onto cached
|
||||||
|
issue JSON (save_issue, backfill processors) MUST catch this and
|
||||||
|
skip the overlay; otherwise a transient outage silently wipes
|
||||||
|
existing parquet rows for that issue.
|
||||||
|
"""
|
||||||
|
|
||||||
|
|
||||||
class _JiraConfig:
|
class _JiraConfig:
|
||||||
"""Jira configuration from environment variables."""
|
"""Jira configuration from environment variables."""
|
||||||
JIRA_DOMAIN = os.environ.get("JIRA_DOMAIN", "")
|
JIRA_DOMAIN = os.environ.get("JIRA_DOMAIN", "")
|
||||||
|
|
@ -118,7 +127,7 @@ class JiraService:
|
||||||
Fetch complete issue data from Jira.
|
Fetch complete issue data from Jira.
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
issue_key: Issue key (e.g., "KSP-123")
|
issue_key: Issue key (e.g., "PROJ-123")
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
Issue data dict or None if fetch failed
|
Issue data dict or None if fetch failed
|
||||||
|
|
@ -208,14 +217,28 @@ class JiraService:
|
||||||
"""
|
"""
|
||||||
Fetch remote links for an issue from Jira.
|
Fetch remote links for an issue from Jira.
|
||||||
|
|
||||||
Args:
|
Returns the list of remote links on 200; an empty list on 404
|
||||||
issue_key: Issue key (e.g., "KSP-123")
|
(issue legitimately has no remote links). Raises JiraFetchError
|
||||||
|
on ANY other status code or on httpx.RequestError, so callers
|
||||||
Returns:
|
that overlay this onto cached issue data can skip the overlay
|
||||||
List of remote link dicts, empty list on failure
|
instead of wiping existing rows. Critically, 429 rate limits
|
||||||
|
also raise — silently returning [] there would re-trigger the
|
||||||
|
same wipe bug (a webhook burst hitting Jira's rate limiter is
|
||||||
|
the most likely production scenario).
|
||||||
"""
|
"""
|
||||||
|
# Unconfigured-service case: per the new contract, callers
|
||||||
|
# interpret `[]` as "issue legitimately has no remote links"
|
||||||
|
# and `JiraFetchError` as "fetch failed, preserve existing
|
||||||
|
# rows". Silently returning `[]` here would overlay an empty
|
||||||
|
# list onto cached issue JSON and wipe existing parquet rows
|
||||||
|
# the next time a webhook fires while creds happen to be
|
||||||
|
# missing — the exact regression this PR closes for the 401
|
||||||
|
# / 429 / 5xx paths. Raise instead so the overlay site skips.
|
||||||
if not self.is_configured():
|
if not self.is_configured():
|
||||||
return []
|
raise JiraFetchError(
|
||||||
|
f"Remote-links fetch for {issue_key} failed: "
|
||||||
|
"Jira service not configured (missing API credentials)"
|
||||||
|
)
|
||||||
|
|
||||||
url = f"{self.base_url}/issue/{issue_key}/remotelink"
|
url = f"{self.base_url}/issue/{issue_key}/remotelink"
|
||||||
|
|
||||||
|
|
@ -226,21 +249,34 @@ class JiraService:
|
||||||
auth=self.auth,
|
auth=self.auth,
|
||||||
headers={"Accept": "application/json"},
|
headers={"Accept": "application/json"},
|
||||||
)
|
)
|
||||||
|
|
||||||
if response.status_code == 200:
|
|
||||||
return response.json()
|
|
||||||
elif response.status_code == 404:
|
|
||||||
return []
|
|
||||||
else:
|
|
||||||
logger.warning(
|
|
||||||
f"Failed to fetch remote links for {issue_key}: "
|
|
||||||
f"{response.status_code}"
|
|
||||||
)
|
|
||||||
return []
|
|
||||||
|
|
||||||
except httpx.RequestError as e:
|
except httpx.RequestError as e:
|
||||||
logger.warning(f"Request error fetching remote links for {issue_key}: {e}")
|
raise JiraFetchError(
|
||||||
|
f"Remote-links fetch for {issue_key} failed: connection — {e}"
|
||||||
|
) from e
|
||||||
|
|
||||||
|
if response.status_code == 200:
|
||||||
|
return response.json()
|
||||||
|
if response.status_code == 404:
|
||||||
return []
|
return []
|
||||||
|
if response.status_code in (401, 403):
|
||||||
|
raise JiraFetchError(
|
||||||
|
f"Remote-links fetch for {issue_key} failed: auth error "
|
||||||
|
f"({response.status_code}) — token may be expired/revoked"
|
||||||
|
)
|
||||||
|
if response.status_code == 429:
|
||||||
|
raise JiraFetchError(
|
||||||
|
f"Remote-links fetch for {issue_key} failed: rate limited "
|
||||||
|
f"(429) — retry later"
|
||||||
|
)
|
||||||
|
if response.status_code >= 500:
|
||||||
|
raise JiraFetchError(
|
||||||
|
f"Remote-links fetch for {issue_key} failed: server error "
|
||||||
|
f"({response.status_code})"
|
||||||
|
)
|
||||||
|
raise JiraFetchError(
|
||||||
|
f"Remote-links fetch for {issue_key} failed: unexpected status "
|
||||||
|
f"{response.status_code}"
|
||||||
|
)
|
||||||
|
|
||||||
def save_issue(self, issue_data: dict[str, Any]) -> Path | None:
|
def save_issue(self, issue_data: dict[str, Any]) -> Path | None:
|
||||||
"""
|
"""
|
||||||
|
|
@ -276,10 +312,19 @@ class JiraService:
|
||||||
# Add metadata
|
# Add metadata
|
||||||
issue_data["_synced_at"] = datetime.now(timezone.utc).isoformat()
|
issue_data["_synced_at"] = datetime.now(timezone.utc).isoformat()
|
||||||
|
|
||||||
# Fetch and embed remote links for Parquet transform
|
# Overlay-skip guard: if fetch_remote_links raises (auth/server failure),
|
||||||
|
# leave the _remote_links key ABSENT. transform_remote_links treats absent key
|
||||||
|
# as "no fresh data, preserve existing parquet rows". A present-but-empty list
|
||||||
|
# would be interpreted as "this issue has no remote links — wipe existing".
|
||||||
issue_key_for_links = issue_data.get("key")
|
issue_key_for_links = issue_data.get("key")
|
||||||
if issue_key_for_links:
|
if issue_key_for_links:
|
||||||
issue_data["_remote_links"] = self.fetch_remote_links(issue_key_for_links)
|
try:
|
||||||
|
issue_data["_remote_links"] = self.fetch_remote_links(issue_key_for_links)
|
||||||
|
except JiraFetchError as e:
|
||||||
|
logger.warning(
|
||||||
|
f"Skipping _remote_links overlay for {issue_key_for_links}: {e}. "
|
||||||
|
f"Existing parquet rows will be preserved."
|
||||||
|
)
|
||||||
|
|
||||||
# Overlay SLA fields from JSM service account (personal token lacks permissions)
|
# Overlay SLA fields from JSM service account (personal token lacks permissions)
|
||||||
sla_fields = self.fetch_sla_fields(issue_key)
|
sla_fields = self.fetch_sla_fields(issue_key)
|
||||||
|
|
|
||||||
|
|
@ -526,14 +526,32 @@ def transform_issuelinks(raw_issue: dict) -> list[dict]:
|
||||||
return records
|
return records
|
||||||
|
|
||||||
|
|
||||||
def transform_remote_links(raw_issue: dict) -> list[dict]:
|
def transform_remote_links(raw_issue: dict) -> list[dict] | None:
|
||||||
"""Extract and transform remote links from an issue.
|
"""Extract and transform remote links from an issue.
|
||||||
|
|
||||||
Remote links are embedded in the raw issue JSON as `_remote_links`
|
Returns:
|
||||||
by the fetch layer (jira_service.py / jira_backfill.py).
|
- list[dict]: fresh records to upsert into parquet. May be empty,
|
||||||
|
meaning the issue legitimately has no remote links right now
|
||||||
|
(HTTP 200 with [] or HTTP 404 from the fetch).
|
||||||
|
- None: the _remote_links key was absent from raw_issue, which
|
||||||
|
signals that save_issue (or the equivalent backfill writer)
|
||||||
|
could not refresh remote links — typically a 401/403/5xx from
|
||||||
|
the Jira API. Callers MUST treat None as "skip the upsert";
|
||||||
|
overwriting with [] would delete existing parquet rows for
|
||||||
|
this issue.
|
||||||
|
|
||||||
|
The key shape is set by the writers (JiraService.save_issue,
|
||||||
|
JiraBackfiller, backfill_remote_links): present means the fetch
|
||||||
|
succeeded (200 or 404), absent means the fetch raised.
|
||||||
"""
|
"""
|
||||||
issue_key = raw_issue.get("key")
|
issue_key = raw_issue.get("key")
|
||||||
remote_links = raw_issue.get("_remote_links", [])
|
# Treat both absent key and explicit None as the "no fresh data" signal —
|
||||||
|
# absent is the contract from save_issue/backfill writers, None is the
|
||||||
|
# defensive case where a JSON edit or older buggy code stored an explicit
|
||||||
|
# null (would otherwise blow up on `for rl in None`).
|
||||||
|
remote_links = raw_issue.get("_remote_links")
|
||||||
|
if remote_links is None:
|
||||||
|
return None
|
||||||
|
|
||||||
records = []
|
records = []
|
||||||
for rl in remote_links:
|
for rl in remote_links:
|
||||||
|
|
@ -664,7 +682,15 @@ def transform_all(
|
||||||
|
|
||||||
changelog_by_month[month_key].extend(transform_changelog(raw_issue))
|
changelog_by_month[month_key].extend(transform_changelog(raw_issue))
|
||||||
issuelinks_by_month[month_key].extend(transform_issuelinks(raw_issue))
|
issuelinks_by_month[month_key].extend(transform_issuelinks(raw_issue))
|
||||||
remote_links_by_month[month_key].extend(transform_remote_links(raw_issue))
|
rl_records = transform_remote_links(raw_issue)
|
||||||
|
if rl_records is not None:
|
||||||
|
remote_links_by_month[month_key].extend(rl_records)
|
||||||
|
# else: _remote_links overlay was skipped (fetch failure). The batch
|
||||||
|
# rebuild writes monthly parquets from scratch, so this issue simply
|
||||||
|
# contributes no rows to the rebuild — it doesn't "preserve" anything.
|
||||||
|
# A re-run after the outage clears will repopulate. The incremental
|
||||||
|
# path (incremental_transform.py) is what genuinely preserves
|
||||||
|
# existing rows; batch mode is full-rebuild and not the hot path.
|
||||||
|
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
logger.error(f"Error processing {json_file}: {e}")
|
logger.error(f"Error processing {json_file}: {e}")
|
||||||
|
|
|
||||||
|
|
@ -1,6 +1,6 @@
|
||||||
[project]
|
[project]
|
||||||
name = "agnes-the-ai-analyst"
|
name = "agnes-the-ai-analyst"
|
||||||
version = "0.54.18"
|
version = "0.54.19"
|
||||||
description = "Agnes — AI Data Analyst platform for AI analytical systems"
|
description = "Agnes — AI Data Analyst platform for AI analytical systems"
|
||||||
requires-python = ">=3.11,<3.14"
|
requires-python = ">=3.11,<3.14"
|
||||||
license = "MIT"
|
license = "MIT"
|
||||||
|
|
|
||||||
|
|
@ -1,5 +1,6 @@
|
||||||
"""Tests for incremental Jira parquet transform (upsert_dataframe and friends)."""
|
"""Tests for incremental Jira parquet transform (upsert_dataframe and friends)."""
|
||||||
|
|
||||||
|
import json
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
from unittest.mock import patch
|
from unittest.mock import patch
|
||||||
|
|
||||||
|
|
@ -10,8 +11,10 @@ import pytest
|
||||||
from connectors.jira.incremental_transform import (
|
from connectors.jira.incremental_transform import (
|
||||||
load_parquet_month,
|
load_parquet_month,
|
||||||
save_parquet_month,
|
save_parquet_month,
|
||||||
|
transform_single_issue,
|
||||||
upsert_dataframe,
|
upsert_dataframe,
|
||||||
)
|
)
|
||||||
|
from connectors.jira.transform import REMOTE_LINKS_SCHEMA
|
||||||
|
|
||||||
|
|
||||||
# Minimal schema compatible with ISSUES_SCHEMA for testing purposes
|
# Minimal schema compatible with ISSUES_SCHEMA for testing purposes
|
||||||
|
|
@ -183,3 +186,129 @@ class TestParquetMonthlyPartitioning:
|
||||||
assert proj1.iloc[0]["summary"] == "Updated"
|
assert proj1.iloc[0]["summary"] == "Updated"
|
||||||
proj2 = final[final["issue_key"] == "PROJ-2"]
|
proj2 = final[final["issue_key"] == "PROJ-2"]
|
||||||
assert proj2.iloc[0]["summary"] == "Keep"
|
assert proj2.iloc[0]["summary"] == "Keep"
|
||||||
|
|
||||||
|
|
||||||
|
def _seed_remote_links_parquet(parquet_root, month_key, rows):
|
||||||
|
"""Write a starter remote_links parquet so we can assert preservation."""
|
||||||
|
df = pd.DataFrame(rows)
|
||||||
|
target = parquet_root / "remote_links"
|
||||||
|
target.mkdir(parents=True, exist_ok=True)
|
||||||
|
save_parquet_month(df, REMOTE_LINKS_SCHEMA, target, month_key)
|
||||||
|
|
||||||
|
|
||||||
|
def _write_raw_issue(raw_dir, issue_key, payload):
|
||||||
|
"""Seed raw issue JSON at the path transform_single_issue reads from."""
|
||||||
|
issues_dir = raw_dir / "issues"
|
||||||
|
issues_dir.mkdir(parents=True, exist_ok=True)
|
||||||
|
(issues_dir / f"{issue_key}.json").write_text(json.dumps(payload))
|
||||||
|
|
||||||
|
|
||||||
|
def test_incremental_preserves_remote_links_when_overlay_absent(tmp_path):
|
||||||
|
"""When the _remote_links key is absent from the raw JSON (the writer
|
||||||
|
skipped the overlay due to a Jira fetch failure), transform_single_issue
|
||||||
|
must NOT wipe existing parquet rows for that issue. Existing rows must
|
||||||
|
remain untouched and the function must report success."""
|
||||||
|
raw_dir = tmp_path / "raw"
|
||||||
|
output_dir = tmp_path / "parquet"
|
||||||
|
attachments_dir = tmp_path / "attachments"
|
||||||
|
output_dir.mkdir()
|
||||||
|
attachments_dir.mkdir()
|
||||||
|
|
||||||
|
# Pre-seed an existing remote-link row for PROJ-1 in month 2026-05.
|
||||||
|
_seed_remote_links_parquet(output_dir, "2026-05", [{
|
||||||
|
"issue_key": "PROJ-1",
|
||||||
|
"remote_link_id": "rl-existing",
|
||||||
|
"url": "https://example.com/old",
|
||||||
|
"title": "Pre-existing link",
|
||||||
|
"application_name": "X",
|
||||||
|
"application_type": "x",
|
||||||
|
}])
|
||||||
|
|
||||||
|
# Raw issue WITHOUT _remote_links key — overlay was skipped upstream.
|
||||||
|
_write_raw_issue(raw_dir, "PROJ-1", {
|
||||||
|
"key": "PROJ-1",
|
||||||
|
"id": "10001",
|
||||||
|
"fields": {
|
||||||
|
"summary": "test",
|
||||||
|
"status": {"name": "Open"},
|
||||||
|
"issuetype": {"name": "Bug"},
|
||||||
|
"attachment": [],
|
||||||
|
"comment": {"comments": []},
|
||||||
|
"created": "2026-05-15T00:00:00.000+0000",
|
||||||
|
"updated": "2026-05-15T00:00:00.000+0000",
|
||||||
|
},
|
||||||
|
# NOTE: no _remote_links key — that is the test condition.
|
||||||
|
})
|
||||||
|
|
||||||
|
ok = transform_single_issue(
|
||||||
|
issue_key="PROJ-1",
|
||||||
|
raw_dir=raw_dir,
|
||||||
|
output_dir=output_dir,
|
||||||
|
attachments_dir=attachments_dir,
|
||||||
|
)
|
||||||
|
assert ok is True
|
||||||
|
|
||||||
|
df = load_parquet_month(output_dir / "remote_links", "2026-05")
|
||||||
|
assert df is not None and len(df) == 1, \
|
||||||
|
"Existing remote-link row was wiped — overlay-absent signal not honored"
|
||||||
|
assert df.iloc[0]["remote_link_id"] == "rl-existing"
|
||||||
|
|
||||||
|
|
||||||
|
def test_incremental_wipes_remote_links_when_overlay_present_but_empty(tmp_path):
|
||||||
|
"""The mirror-image case: when _remote_links IS present but the list is
|
||||||
|
empty, that's a successful fetch confirming the issue legitimately has no
|
||||||
|
remote links right now. The transform MUST wipe any existing parquet rows
|
||||||
|
for that issue — keeping them would be stale data.
|
||||||
|
|
||||||
|
Together with test_incremental_preserves_remote_links_when_overlay_absent,
|
||||||
|
this locks the absent-vs-empty contract end-to-end. A future regression
|
||||||
|
that 'simplifies' the transform to treat [] the same as absent (i.e.,
|
||||||
|
skip upsert) would be caught here."""
|
||||||
|
raw_dir = tmp_path / "raw"
|
||||||
|
output_dir = tmp_path / "parquet"
|
||||||
|
attachments_dir = tmp_path / "attachments"
|
||||||
|
output_dir.mkdir()
|
||||||
|
attachments_dir.mkdir()
|
||||||
|
|
||||||
|
# Pre-seed a stale row that should be wiped.
|
||||||
|
_seed_remote_links_parquet(output_dir, "2026-05", [{
|
||||||
|
"issue_key": "PROJ-2",
|
||||||
|
"remote_link_id": "rl-stale",
|
||||||
|
"url": "https://example.com/stale",
|
||||||
|
"title": "Stale link to be wiped",
|
||||||
|
"application_name": "X",
|
||||||
|
"application_type": "x",
|
||||||
|
}])
|
||||||
|
|
||||||
|
# Raw issue WITH _remote_links: [] — fresh fetch confirmed empty.
|
||||||
|
_write_raw_issue(raw_dir, "PROJ-2", {
|
||||||
|
"key": "PROJ-2",
|
||||||
|
"id": "10002",
|
||||||
|
"fields": {
|
||||||
|
"summary": "test",
|
||||||
|
"status": {"name": "Open"},
|
||||||
|
"issuetype": {"name": "Bug"},
|
||||||
|
"attachment": [],
|
||||||
|
"comment": {"comments": []},
|
||||||
|
"created": "2026-05-15T00:00:00.000+0000",
|
||||||
|
"updated": "2026-05-15T00:00:00.000+0000",
|
||||||
|
},
|
||||||
|
"_remote_links": [],
|
||||||
|
})
|
||||||
|
|
||||||
|
ok = transform_single_issue(
|
||||||
|
issue_key="PROJ-2",
|
||||||
|
raw_dir=raw_dir,
|
||||||
|
output_dir=output_dir,
|
||||||
|
attachments_dir=attachments_dir,
|
||||||
|
)
|
||||||
|
assert ok is True
|
||||||
|
|
||||||
|
df = load_parquet_month(output_dir / "remote_links", "2026-05")
|
||||||
|
# Either the file was unlinked (df is None) or the row was removed.
|
||||||
|
# Both outcomes satisfy the contract — the stale row must not survive.
|
||||||
|
if df is not None:
|
||||||
|
remaining = df[df["issue_key"] == "PROJ-2"]
|
||||||
|
assert len(remaining) == 0, \
|
||||||
|
"Stale remote-link row survived a successful empty-list fetch — " \
|
||||||
|
"the empty-list (legitimate) signal was misinterpreted as preserve"
|
||||||
|
|
|
||||||
|
|
@ -8,6 +8,10 @@ import pytest
|
||||||
|
|
||||||
from tests.helpers.factories import WebhookEventFactory
|
from tests.helpers.factories import WebhookEventFactory
|
||||||
|
|
||||||
|
from connectors.jira.scripts.consistency_check import JiraConsistencyChecker
|
||||||
|
from connectors.jira.service import JiraFetchError
|
||||||
|
from connectors.jira.transform import transform_remote_links
|
||||||
|
|
||||||
|
|
||||||
@pytest.fixture
|
@pytest.fixture
|
||||||
def jira_env(tmp_path, monkeypatch):
|
def jira_env(tmp_path, monkeypatch):
|
||||||
|
|
@ -180,3 +184,203 @@ class TestJiraServiceWebhookProcessing:
|
||||||
secret.encode("utf-8"), payload, hashlib.sha256
|
secret.encode("utf-8"), payload, hashlib.sha256
|
||||||
).hexdigest()
|
).hexdigest()
|
||||||
assert sig == f"sha256={expected_mac}"
|
assert sig == f"sha256={expected_mac}"
|
||||||
|
|
||||||
|
|
||||||
|
class TestFetchRemoteLinks:
|
||||||
|
"""fetch_remote_links must raise on auth/server failure to prevent the
|
||||||
|
save_issue overlay from writing [] into cached JSON, which downstream
|
||||||
|
would interpret as 'delete existing remote_links rows for this issue'."""
|
||||||
|
|
||||||
|
def _mock_http(self, status_code, json_body=None):
|
||||||
|
"""Build a MagicMock httpx.Client context manager returning a fixed response."""
|
||||||
|
response = MagicMock()
|
||||||
|
response.status_code = status_code
|
||||||
|
response.json.return_value = json_body or []
|
||||||
|
client = MagicMock()
|
||||||
|
client.get.return_value = response
|
||||||
|
client.__enter__ = lambda s: client
|
||||||
|
client.__exit__ = MagicMock(return_value=False)
|
||||||
|
return client
|
||||||
|
|
||||||
|
def test_returns_list_on_200(self, jira_env):
|
||||||
|
service = _make_jira_service(jira_env)
|
||||||
|
with patch("connectors.jira.service.httpx.Client",
|
||||||
|
return_value=self._mock_http(200, [{"id": "1"}])):
|
||||||
|
assert service.fetch_remote_links("PROJ-1") == [{"id": "1"}]
|
||||||
|
|
||||||
|
def test_returns_empty_on_404(self, jira_env):
|
||||||
|
service = _make_jira_service(jira_env)
|
||||||
|
with patch("connectors.jira.service.httpx.Client",
|
||||||
|
return_value=self._mock_http(404)):
|
||||||
|
assert service.fetch_remote_links("PROJ-1") == []
|
||||||
|
|
||||||
|
def test_raises_on_401(self, jira_env):
|
||||||
|
service = _make_jira_service(jira_env)
|
||||||
|
with patch("connectors.jira.service.httpx.Client",
|
||||||
|
return_value=self._mock_http(401)):
|
||||||
|
with pytest.raises(JiraFetchError, match="auth"):
|
||||||
|
service.fetch_remote_links("PROJ-1")
|
||||||
|
|
||||||
|
def test_raises_on_403(self, jira_env):
|
||||||
|
service = _make_jira_service(jira_env)
|
||||||
|
with patch("connectors.jira.service.httpx.Client",
|
||||||
|
return_value=self._mock_http(403)):
|
||||||
|
with pytest.raises(JiraFetchError, match="auth"):
|
||||||
|
service.fetch_remote_links("PROJ-1")
|
||||||
|
|
||||||
|
def test_raises_on_500(self, jira_env):
|
||||||
|
service = _make_jira_service(jira_env)
|
||||||
|
with patch("connectors.jira.service.httpx.Client",
|
||||||
|
return_value=self._mock_http(500)):
|
||||||
|
with pytest.raises(JiraFetchError, match="server"):
|
||||||
|
service.fetch_remote_links("PROJ-1")
|
||||||
|
|
||||||
|
def test_raises_on_429_rate_limit(self, jira_env):
|
||||||
|
# 429 in the webhook hot path must raise (not silently return []).
|
||||||
|
# A webhook burst hitting Jira's rate limiter is the most likely
|
||||||
|
# production scenario; returning [] would re-trigger the wipe bug.
|
||||||
|
service = _make_jira_service(jira_env)
|
||||||
|
with patch("connectors.jira.service.httpx.Client",
|
||||||
|
return_value=self._mock_http(429)):
|
||||||
|
with pytest.raises(JiraFetchError, match="rate limited"):
|
||||||
|
service.fetch_remote_links("PROJ-1")
|
||||||
|
|
||||||
|
def test_raises_on_unexpected_status(self, jira_env):
|
||||||
|
# Any non-success/non-404 status raises — covers 400, 405, 418, etc.
|
||||||
|
# No silent fall-through.
|
||||||
|
service = _make_jira_service(jira_env)
|
||||||
|
with patch("connectors.jira.service.httpx.Client",
|
||||||
|
return_value=self._mock_http(418)):
|
||||||
|
with pytest.raises(JiraFetchError, match="unexpected status"):
|
||||||
|
service.fetch_remote_links("PROJ-1")
|
||||||
|
|
||||||
|
def test_raises_when_unconfigured(self, jira_env):
|
||||||
|
"""Regression guard for the Devin adversarial-review finding:
|
||||||
|
when the Jira service is unconfigured (missing API credentials)
|
||||||
|
the fetch must raise JiraFetchError, NOT silently return [].
|
||||||
|
Silent [] would overlay an empty list onto cached issue JSON via
|
||||||
|
save_issue and wipe existing remote_links parquet rows the next
|
||||||
|
time a webhook fires while creds happen to be missing — the
|
||||||
|
exact scenario this PR closes for the 401 / 429 / 5xx paths.
|
||||||
|
The webhook hot path passes HMAC verification with
|
||||||
|
JIRA_WEBHOOK_SECRET (separate from API creds), so a
|
||||||
|
missing-creds + webhook-arrives combo is realistic, not
|
||||||
|
theoretical."""
|
||||||
|
from connectors.jira import service as svc
|
||||||
|
# Mirror _make_jira_service except DROP the API credentials so
|
||||||
|
# is_configured() returns False.
|
||||||
|
svc.Config.JIRA_DOMAIN = ""
|
||||||
|
svc.Config.JIRA_EMAIL = ""
|
||||||
|
svc.Config.JIRA_API_TOKEN = ""
|
||||||
|
svc.Config.JIRA_DATA_DIR = jira_env
|
||||||
|
svc.Config.JIRA_WEBHOOK_SECRET = "webhook-secret-123"
|
||||||
|
svc._jira_service = None
|
||||||
|
service = svc.get_jira_service()
|
||||||
|
assert not service.is_configured()
|
||||||
|
with pytest.raises(JiraFetchError, match="not configured"):
|
||||||
|
service.fetch_remote_links("PROJ-1")
|
||||||
|
|
||||||
|
def test_raises_on_request_error(self, jira_env):
|
||||||
|
import httpx
|
||||||
|
service = _make_jira_service(jira_env)
|
||||||
|
client = MagicMock()
|
||||||
|
client.get.side_effect = httpx.RequestError("connection reset")
|
||||||
|
client.__enter__ = lambda s: client
|
||||||
|
client.__exit__ = MagicMock(return_value=False)
|
||||||
|
with patch("connectors.jira.service.httpx.Client", return_value=client):
|
||||||
|
with pytest.raises(JiraFetchError, match="connection"):
|
||||||
|
service.fetch_remote_links("PROJ-1")
|
||||||
|
|
||||||
|
|
||||||
|
class TestSaveIssueRemoteLinksOverlay:
|
||||||
|
"""save_issue must NOT set _remote_links when fetch_remote_links raises.
|
||||||
|
The absent key is the contract with transform_remote_links: it signals
|
||||||
|
'preserve existing rows'. A present-but-empty list would wipe them."""
|
||||||
|
|
||||||
|
def test_sets_remote_links_on_success(self, jira_env):
|
||||||
|
service = _make_jira_service(jira_env)
|
||||||
|
with patch.object(service, "fetch_remote_links", return_value=[{"id": "rl-1"}]), \
|
||||||
|
patch.object(service, "fetch_sla_fields", return_value=None):
|
||||||
|
path = service.save_issue(_fake_issue_data("PROJ-1"))
|
||||||
|
with open(path) as f:
|
||||||
|
data = json.load(f)
|
||||||
|
assert data["_remote_links"] == [{"id": "rl-1"}]
|
||||||
|
|
||||||
|
def test_sets_empty_remote_links_on_404(self, jira_env):
|
||||||
|
# 404 stays as [] — legitimately means "issue has no remote links".
|
||||||
|
service = _make_jira_service(jira_env)
|
||||||
|
with patch.object(service, "fetch_remote_links", return_value=[]), \
|
||||||
|
patch.object(service, "fetch_sla_fields", return_value=None):
|
||||||
|
path = service.save_issue(_fake_issue_data("PROJ-1"))
|
||||||
|
with open(path) as f:
|
||||||
|
data = json.load(f)
|
||||||
|
assert data["_remote_links"] == []
|
||||||
|
|
||||||
|
def test_omits_remote_links_key_on_fetch_error(self, jira_env):
|
||||||
|
# When fetch raises, the key MUST be absent — that's the signal
|
||||||
|
# to the transform that this isn't fresh data and should be skipped.
|
||||||
|
service = _make_jira_service(jira_env)
|
||||||
|
with patch.object(service, "fetch_remote_links",
|
||||||
|
side_effect=JiraFetchError("auth")), \
|
||||||
|
patch.object(service, "fetch_sla_fields", return_value=None):
|
||||||
|
path = service.save_issue(_fake_issue_data("PROJ-1"))
|
||||||
|
with open(path) as f:
|
||||||
|
data = json.load(f)
|
||||||
|
assert "_remote_links" not in data, \
|
||||||
|
"Absent key is the contract with transform_remote_links — " \
|
||||||
|
"do not change to an empty list."
|
||||||
|
|
||||||
|
|
||||||
|
class TestTransformRemoteLinks:
|
||||||
|
"""transform_remote_links returns None when _remote_links is absent
|
||||||
|
(preserve-existing signal) and [] when present-but-empty (legitimate 'none')."""
|
||||||
|
|
||||||
|
def test_returns_list_when_links_present(self):
|
||||||
|
result = transform_remote_links({
|
||||||
|
"key": "PROJ-1",
|
||||||
|
"_remote_links": [{
|
||||||
|
"id": "rl-1",
|
||||||
|
"object": {"url": "https://x", "title": "X"},
|
||||||
|
"application": {"name": "App", "type": "type"},
|
||||||
|
}],
|
||||||
|
})
|
||||||
|
assert result is not None
|
||||||
|
assert len(result) == 1
|
||||||
|
assert result[0]["remote_link_id"] == "rl-1"
|
||||||
|
|
||||||
|
def test_returns_empty_list_when_key_present_but_empty(self):
|
||||||
|
result = transform_remote_links({"key": "PROJ-1", "_remote_links": []})
|
||||||
|
assert result == []
|
||||||
|
|
||||||
|
def test_returns_none_when_key_absent(self):
|
||||||
|
# Absent key = save_issue skipped the overlay because fetch failed.
|
||||||
|
# Signal to caller: preserve existing parquet rows for this issue.
|
||||||
|
result = transform_remote_links({"key": "PROJ-1"})
|
||||||
|
assert result is None
|
||||||
|
|
||||||
|
def test_returns_none_when_key_is_explicit_null(self):
|
||||||
|
# Defensive: a JSON with `_remote_links: null` (from older buggy code
|
||||||
|
# or a manual edit) would otherwise blow up on `for rl in None`.
|
||||||
|
# Treat null the same as absent — both mean "no fresh data".
|
||||||
|
result = transform_remote_links({"key": "PROJ-1", "_remote_links": None})
|
||||||
|
assert result is None
|
||||||
|
|
||||||
|
|
||||||
|
class TestAutoFixThreshold:
|
||||||
|
"""The auto-fix threshold determines at what gap size we auto-backfill
|
||||||
|
vs require manual review. Legacy bumped it from 10 to 20 — small enough
|
||||||
|
to stay safe, big enough to absorb a typical SLA-poller hiccup."""
|
||||||
|
|
||||||
|
def test_threshold_is_twenty(self):
|
||||||
|
assert JiraConsistencyChecker.AUTO_FIX_THRESHOLD == 20
|
||||||
|
|
||||||
|
def test_alert_level_warning_at_threshold(self):
|
||||||
|
# 20 missing should still be WARNING (auto-fix territory), not ERROR.
|
||||||
|
config = MagicMock()
|
||||||
|
checker = JiraConsistencyChecker(config)
|
||||||
|
assert checker.get_alert_level(20) == "WARNING"
|
||||||
|
|
||||||
|
def test_alert_level_error_above_threshold(self):
|
||||||
|
config = MagicMock()
|
||||||
|
checker = JiraConsistencyChecker(config)
|
||||||
|
assert checker.get_alert_level(21) == "ERROR"
|
||||||
|
|
|
||||||
Loading…
Reference in a new issue