- Replace padak/tmp_oss → keboola/agnes-the-ai-analyst in all docs, infra, CLI - Replace your-org/ai-data-analyst → keboola/agnes-the-ai-analyst in README, Jira docs - Remove real GCP project ID from terraform.tfvars.example - Delete internal draft documents (dev_docs/draft/) - Update infra/main.tf to clone from main branch
626 lines
27 KiB
Markdown
626 lines
27 KiB
Markdown
# Jira Integration
|
|
|
|
Real-time sync of Jira support tickets for AI-powered analysis.
|
|
|
|
## Overview
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
│ JIRA CLOUD │
|
|
│ (your-org.atlassian.net) │
|
|
│ │
|
|
│ Issue created/updated/deleted ───► Webhook POST │
|
|
│ Comment added/updated ───► with HMAC signature │
|
|
│ Attachment uploaded ───► │
|
|
└─────────────────────────────────────────────────────────────────────────────┘
|
|
│
|
|
▼
|
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
│ DATA BROKER SERVER │
|
|
│ (your-instance.example.com) │
|
|
│ │
|
|
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
│ │ Flask Webapp (/webhooks/jira) │ │
|
|
│ │ │ │
|
|
│ │ 1. Verify HMAC-SHA256 signature │ │
|
|
│ │ 2. Log raw webhook event │ │
|
|
│ │ 3. Extract issue key from payload │ │
|
|
│ │ 4. Fetch complete issue data via Jira REST API │ │
|
|
│ │ 5. Overlay SLA fields via JSM service account (cloud API) │ │
|
|
│ │ 6. Save issue JSON to disk │ │
|
|
│ │ 7. Download all attachments │ │
|
|
│ │ 8. Trigger incremental Parquet transform │ │
|
|
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
│ │ │
|
|
│ ▼ │
|
|
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
│ │ /data/src_data/raw/jira/ │ │
|
|
│ │ ├── issues/ # Raw JSON per issue │ │
|
|
│ │ │ ├── SUPPORT-15186.json │ │
|
|
│ │ │ └── SUPPORT-15190.json │ │
|
|
│ │ ├── attachments/ # Downloaded files │ │
|
|
│ │ │ └── SUPPORT-15190/ │ │
|
|
│ │ │ └── 56340_image.png │ │
|
|
│ │ └── webhook_events/ # Audit log │ │
|
|
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
│ │ │
|
|
│ ▼ │
|
|
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
│ │ incremental_jira_transform.py (called automatically) │ │
|
|
│ │ │ │
|
|
│ │ • Load saved issue JSON │ │
|
|
│ │ • Extract fields, convert ADF to plain text │ │
|
|
│ │ • Upsert into monthly Parquet (only affected month) │ │
|
|
│ │ • Copy to distribution directory │ │
|
|
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
│ │ │
|
|
│ ▼ │
|
|
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
│ │ /data/src_data/parquet/jira/ (monthly partitioned) │ │
|
|
│ │ ├── issues/ # 49 columns, clean schema │ │
|
|
│ │ │ ├── 2025-01.parquet │ │
|
|
│ │ │ └── 2025-02.parquet │ │
|
|
│ │ ├── comments/ # Extracted comment text │ │
|
|
│ │ ├── attachments/ # Metadata + local paths │ │
|
|
│ │ ├── changelog/ # Field change history │ │
|
|
│ │ ├── issuelinks/ # Links between issues │ │
|
|
│ │ └── remote_links/ # External links (Confluence, Slack) │ │
|
|
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
└─────────────────────────────────────────────────────────────────────────────┘
|
|
│
|
|
│ rsync
|
|
▼
|
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
|
│ ANALYST MACHINE │
|
|
│ │
|
|
│ ~/data-analysis/ │
|
|
│ └── server/ │
|
|
│ └── parquet/ │
|
|
│ └── jira/ # Synced Parquet + attachments │
|
|
│ │
|
|
│ ┌─────────────────────────────────────────────────────────────────────┐ │
|
|
│ │ Claude Code + DuckDB │ │
|
|
│ │ │ │
|
|
│ │ -- Query all months with glob pattern │ │
|
|
│ │ SELECT * FROM 'server/parquet/jira/issues/*.parquet' │ │
|
|
│ │ WHERE severity LIKE '%Medium%'; │ │
|
|
│ └─────────────────────────────────────────────────────────────────────┘ │
|
|
└─────────────────────────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
## Components
|
|
|
|
### 1. Jira Webhook Configuration
|
|
|
|
**Location:** https://your-org.atlassian.net/plugins/servlet/webhooks
|
|
|
|
| Setting | Value |
|
|
|---------|-------|
|
|
| URL | `https://your-instance.example.com/webhooks/jira` |
|
|
| Secret | Same as `JIRA_WEBHOOK_SECRET` in server `.env` |
|
|
| JQL Filter | `project = "Your Project"` |
|
|
|
|
**Subscribed Events:**
|
|
- Issue: created, updated, deleted
|
|
- Comment: created, updated
|
|
- Attachment: created
|
|
- Issue link: created
|
|
|
|
### 2. Webhook Receiver
|
|
|
|
**File:** `connectors/jira/webhook.py`
|
|
|
|
Flask blueprint that handles incoming webhooks:
|
|
|
|
```python
|
|
@jira_bp.route("/jira", methods=["POST"])
|
|
def receive_jira_webhook():
|
|
# 1. Verify HMAC signature
|
|
# 2. Parse JSON payload
|
|
# 3. Log event to webhook_events/
|
|
# 4. Call jira_service.process_webhook_event()
|
|
```
|
|
|
|
**Endpoints:**
|
|
|
|
| Endpoint | Method | Description |
|
|
|----------|--------|-------------|
|
|
| `/webhooks/jira` | POST | Receive webhooks from Jira |
|
|
| `/webhooks/jira/health` | GET | Health check, shows config status |
|
|
| `/webhooks/jira/test` | POST | Manual issue fetch (debug mode only) |
|
|
|
|
### 3. Jira Service
|
|
|
|
**File:** `connectors/jira/service.py`
|
|
|
|
Handles Jira API communication and data persistence:
|
|
|
|
```python
|
|
class JiraService:
|
|
def fetch_issue(issue_key) -> dict
|
|
# GET /rest/api/3/issue/{key}?expand=renderedFields,changelog&fields=*all
|
|
|
|
def fetch_sla_fields(issue_key) -> dict | None
|
|
# GET via cloud API with JSM service account
|
|
# Returns SLA fields (first_response_time, time_to_resolution)
|
|
|
|
def save_issue(issue_data) -> Path
|
|
# 1. Fetch remote links
|
|
# 2. Overlay SLA fields from service account
|
|
# 3. Save to /data/src_data/raw/jira/issues/{key}.json
|
|
# 4. Download attachments
|
|
|
|
def download_attachment(attachment, issue_key) -> Path
|
|
# GET attachment content URL with auth
|
|
# Save to attachments/{issue_key}/{id}_{filename}
|
|
```
|
|
|
|
**Why fetch after webhook?**
|
|
- Webhook payload contains minimal data
|
|
- Full issue data requires API call with `fields=*all`
|
|
- Ensures we have complete, consistent data
|
|
|
|
**Why two API tokens?**
|
|
- Personal token fetches all fields except SLA (lacks JSM Agent licence)
|
|
- JSM service account token fetches SLA fields via Atlassian Cloud API
|
|
- SLA data is overlayed into the issue JSON before saving
|
|
|
|
### 4. Data Transformation
|
|
|
|
Two transformation modes are available:
|
|
|
|
#### 4a. Incremental Transform (Real-Time)
|
|
|
|
**File:** `connectors/jira/incremental_transform.py`
|
|
|
|
Called automatically by webhook handler after saving issue JSON and attachments. Updates only the affected monthly Parquet file.
|
|
|
|
```python
|
|
# Called from jira_service.py after save_issue()
|
|
from connectors.jira.incremental_transform import transform_single_issue
|
|
|
|
transform_single_issue(
|
|
issue_key="SUPPORT-1234",
|
|
deleted=False, # or True for deletion events
|
|
)
|
|
```
|
|
|
|
**How it works:**
|
|
1. Loads the saved JSON for the issue
|
|
2. Determines the month from `created_at` date
|
|
3. Loads existing Parquet for that month (if any)
|
|
4. Upserts issue data (removes old, adds new)
|
|
5. Saves updated Parquet
|
|
6. Copies to distribution directory for rsync
|
|
|
|
**Benefits:**
|
|
- Data available within seconds of Jira change
|
|
- Only updates one monthly file (~50-100KB)
|
|
- Rsync transfers only changed files
|
|
|
|
#### 4b. Batch Transform (Initial Load / Recovery)
|
|
|
|
**File:** `connectors/jira/transform.py`
|
|
|
|
Used for initial historical load or to rebuild all Parquet from raw JSON.
|
|
|
|
```bash
|
|
python -m connectors.jira.transform \
|
|
--raw-dir /data/src_data/raw/jira \
|
|
--output-dir /data/src_data/parquet/jira \
|
|
--attachments-dir /data/src_data/raw/jira/attachments
|
|
```
|
|
|
|
**Common transformations (both modes):**
|
|
- Extracts plain text from ADF (Atlassian Document Format)
|
|
- Maps custom field IDs to human-readable names
|
|
- Normalizes nested structures into flat tables
|
|
- Links attachments to local file paths
|
|
- Enforces explicit PyArrow schema for consistent types across months
|
|
|
|
### 5. Data Distribution
|
|
|
|
Analysts sync data via rsync (same as other data):
|
|
|
|
```bash
|
|
bash server/scripts/sync_data.sh
|
|
```
|
|
|
|
This syncs:
|
|
- `server/parquet/jira/` - Parquet tables (issues, comments, attachments metadata, changelog, issuelinks, remote_links)
|
|
|
|
For attachment files, see [Attachment Access](#attachment-access) section below.
|
|
|
|
## Data Flow Timeline (Real-Time)
|
|
|
|
```
|
|
T+0ms Jira: Issue updated
|
|
T+50ms Jira: Webhook POST to our server
|
|
T+100ms Server: Verify signature, log event
|
|
T+150ms Server: GET /rest/api/3/issue/{key} from Jira API
|
|
T+400ms Server: GET SLA fields via JSM service account (cloud API)
|
|
T+500ms Server: Save JSON (with SLA overlay) to raw/jira/issues/
|
|
T+600ms Server: Download attachments (parallel)
|
|
T+800ms Server: Incremental transform → update monthly Parquet
|
|
T+900ms Server: Copy to distribution directory
|
|
T+1000ms Server: Return 200 OK to Jira
|
|
|
|
(analyst sync - any time)
|
|
T+Xsec Analyst: bash sync_data.sh
|
|
T+Xsec Analyst: rsync downloads only changed monthly file (~50KB)
|
|
T+Xsec Analyst: Query with DuckDB - sees latest data
|
|
```
|
|
|
|
**Key improvement:** Incremental transform runs immediately after webhook processing, so data is available for sync within seconds of the Jira change.
|
|
|
|
## Configuration
|
|
|
|
### Server Environment Variables
|
|
|
|
In `/opt/data-analyst/.env`:
|
|
|
|
```bash
|
|
# Jira webhook integration
|
|
JIRA_WEBHOOK_SECRET=<random 64-char hex string>
|
|
JIRA_DOMAIN=your-org.atlassian.net
|
|
JIRA_EMAIL=integration-user@your-domain.com
|
|
JIRA_API_TOKEN=<API token from Atlassian>
|
|
|
|
# Jira SLA service account (JSM Agent licence for SLA fields)
|
|
JIRA_SLA_EMAIL=<JSM service account email>
|
|
JIRA_SLA_API_TOKEN=<API token from 1Password>
|
|
JIRA_CLOUD_ID=f0f7a244-4fb4-41f9-b1f0-b79e24a20f11
|
|
```
|
|
|
|
### GitHub Secrets
|
|
|
|
| Secret | Description |
|
|
|--------|-------------|
|
|
| `JIRA_WEBHOOK_SECRET` | HMAC secret for webhook verification |
|
|
| `JIRA_DOMAIN` | Jira Cloud domain |
|
|
| `JIRA_EMAIL` | Email for API authentication |
|
|
| `JIRA_API_TOKEN` | API token from Atlassian account |
|
|
| `JIRA_SLA_EMAIL` | JSM service account email (for SLA fields) |
|
|
| `JIRA_SLA_API_TOKEN` | JSM service account API token |
|
|
| `JIRA_CLOUD_ID` | Atlassian Cloud site ID |
|
|
|
|
### Getting Jira API Token
|
|
|
|
1. Go to https://id.atlassian.com/manage-profile/security/api-tokens
|
|
2. Click "Create API token"
|
|
3. Name it (e.g., "Data Analyst Integration")
|
|
4. Copy token to `JIRA_API_TOKEN`
|
|
|
|
**⚠️ IMPORTANT: API tokens expire after 365 days maximum (Atlassian limitation).**
|
|
|
|
Set a calendar reminder to rotate the token before expiration. When rotating:
|
|
1. Create new token in Atlassian
|
|
2. Update `JIRA_API_TOKEN` in GitHub Secrets and server `.env`
|
|
3. Restart webapp: `sudo systemctl restart webapp`
|
|
4. Test: `curl https://your-instance.example.com/webhooks/jira/health`
|
|
|
|
## Directory Structure
|
|
|
|
```
|
|
/data/src_data/
|
|
├── raw/
|
|
│ └── jira/ # Raw data from webhooks
|
|
│ ├── issues/ # One JSON file per issue
|
|
│ │ ├── SUPPORT-15186.json
|
|
│ │ ├── SUPPORT-15189.json
|
|
│ │ └── SUPPORT-15190.json
|
|
│ ├── attachments/ # Downloaded files (by issue key)
|
|
│ │ ├── SUPPORT-15189/
|
|
│ │ │ ├── 56337_image.png
|
|
│ │ │ └── 56338_image-20260203-110549.png
|
|
│ │ └── SUPPORT-15190/
|
|
│ │ └── 56340_image.png
|
|
│ └── webhook_events/ # Audit log of all webhooks
|
|
│ ├── 20260203_105203_jira_issue_updated.json
|
|
│ └── 20260203_110457_comment_created.json
|
|
│
|
|
└── parquet/
|
|
└── jira/ # Transformed data (monthly partitioned)
|
|
├── issues/ # Main issues table
|
|
│ ├── 2025-01.parquet
|
|
│ ├── 2025-02.parquet
|
|
│ └── ...
|
|
├── comments/ # Issue comments
|
|
│ └── YYYY-MM.parquet
|
|
├── attachments/ # Attachment metadata
|
|
│ └── YYYY-MM.parquet
|
|
├── changelog/ # Field change history
|
|
│ └── YYYY-MM.parquet
|
|
├── issuelinks/ # Links between issues
|
|
│ └── YYYY-MM.parquet
|
|
└── remote_links/ # External links (Confluence, Slack, etc.)
|
|
└── YYYY-MM.parquet
|
|
```
|
|
|
|
**Monthly Partitioning Benefits:**
|
|
- Efficient rsync: only changed months are transferred
|
|
- Better performance: smaller files for ~15,000 total tickets
|
|
- Incremental updates: new months don't rewrite old data
|
|
|
|
## Monitoring
|
|
|
|
### Health Check
|
|
|
|
```bash
|
|
curl https://your-instance.example.com/webhooks/jira/health
|
|
```
|
|
|
|
Response:
|
|
```json
|
|
{
|
|
"status": "ok",
|
|
"configured": true,
|
|
"webhook_secret_set": true,
|
|
"jira_domain": "your-org.atlassian.net"
|
|
}
|
|
```
|
|
|
|
### Logs
|
|
|
|
```bash
|
|
# Webapp logs (webhook processing)
|
|
tail -f /opt/data-analyst/logs/webapp-error.log | grep -i jira
|
|
|
|
# Recent webhook events
|
|
ls -lt /data/src_data/raw/jira/webhook_events/ | head -20
|
|
|
|
# Issue count
|
|
ls /data/src_data/raw/jira/issues/ | wc -l
|
|
|
|
# Attachment count
|
|
find /data/src_data/raw/jira/attachments/ -type f | wc -l
|
|
```
|
|
|
|
## Security
|
|
|
|
| Layer | Protection |
|
|
|-------|------------|
|
|
| Webhook | HMAC-SHA256 signature verification |
|
|
| API Auth | HTTP Basic Auth (email + API token) |
|
|
| Storage | Server directories with `data-ops` group permissions |
|
|
| Transport | HTTPS only (Let's Encrypt certificate) |
|
|
|
|
**Webhook Signature Verification:**
|
|
```python
|
|
expected = hmac.new(
|
|
secret.encode('utf-8'),
|
|
request.get_data(),
|
|
hashlib.sha256
|
|
).hexdigest()
|
|
|
|
if not hmac.compare_digest(signature, expected):
|
|
abort(401)
|
|
```
|
|
|
|
## Troubleshooting
|
|
|
|
### Webhook not received
|
|
|
|
1. Check Jira webhook is enabled and URL is correct
|
|
2. Verify JQL filter matches the issue's project
|
|
3. Check server firewall allows HTTPS from Atlassian IPs
|
|
|
|
### Signature verification fails
|
|
|
|
1. Verify `JIRA_WEBHOOK_SECRET` matches in both Jira and server `.env`
|
|
2. Check for trailing whitespace in secret
|
|
3. Restart webapp after changing `.env`
|
|
|
|
### Attachments not downloading
|
|
|
|
1. Check `JIRA_API_TOKEN` is valid
|
|
2. Verify API token has read access to attachments
|
|
3. Check disk space on `/data` partition
|
|
4. Large attachments (>50MB) are skipped by design
|
|
|
|
### Missing data in Parquet
|
|
|
|
1. Run transformation manually:
|
|
```bash
|
|
python -m connectors.jira.transform \
|
|
--raw-dir /data/src_data/raw/jira \
|
|
--output-dir /data/src_data/parquet/jira \
|
|
--attachments-dir /data/src_data/raw/jira/attachments
|
|
```
|
|
2. Check for errors in transformation output
|
|
3. Verify raw JSON files exist in `raw/jira/issues/`
|
|
4. Note: Output files are partitioned by month (e.g., `issues/2026-01.parquet`)
|
|
|
|
## Schema Reference
|
|
|
|
See [docs/jira_schema.md](jira_schema.md) for detailed table schemas and example queries.
|
|
|
|
## Historical Backfill
|
|
|
|
For initial setup or recovery, use the backfill script to download all historical issues.
|
|
|
|
**File:** `connectors/jira/scripts/backfill.py`
|
|
|
|
```bash
|
|
# Download all SUPPORT tickets (idempotent, skips existing)
|
|
python -m connectors.jira.scripts.backfill --parallel 4
|
|
|
|
# Environment variables required:
|
|
JIRA_DOMAIN=your-org.atlassian.net
|
|
JIRA_EMAIL=integration-user@your-domain.com
|
|
JIRA_API_TOKEN=<API token>
|
|
JIRA_DATA_DIR=/data/src_data/raw/jira # optional, default path
|
|
```
|
|
|
|
**Features:**
|
|
- Uses new Jira Cloud API (`POST /rest/api/3/search/jql` with `nextPageToken`)
|
|
- Parallel downloads (configurable workers)
|
|
- Downloads all attachments
|
|
- Idempotent - skips already downloaded issues
|
|
- Handles rate limiting gracefully
|
|
|
|
**SLA backfill** (separate script, uses JSM service account):
|
|
|
|
**File:** `connectors/jira/scripts/backfill_sla.py`
|
|
|
|
```bash
|
|
# Fetch SLA fields for all issues (uses JIRA_SLA_* env vars)
|
|
python -m connectors.jira.scripts.backfill_sla --parallel 8
|
|
|
|
# Dry run (count files needing update):
|
|
python -m connectors.jira.scripts.backfill_sla --dry-run
|
|
```
|
|
|
|
The personal API token lacks JSM Agent licence needed for SLA fields.
|
|
This script uses the JSM service account via the
|
|
Atlassian Cloud API (`api.atlassian.com`) to fetch and embed SLA data
|
|
into existing raw JSON files.
|
|
|
|
**After backfill, run batch transform:**
|
|
```bash
|
|
python -m connectors.jira.transform \
|
|
--raw-dir /data/src_data/raw/jira \
|
|
--output-dir /data/src_data/parquet/jira \
|
|
--attachments-dir /data/src_data/raw/jira/attachments
|
|
|
|
# Copy to distribution directory
|
|
cp -r /data/src_data/parquet/jira/* ~/server/parquet/jira/
|
|
```
|
|
|
|
## SLA Polling (Open Tickets)
|
|
|
|
SLA elapsed values (`first_response_elapsed_millis`, `time_to_resolution_elapsed_millis`) only update when a webhook fires. For idle open tickets (~49 tickets, ~0.3% of dataset), these values go stale and no longer reflect the actual current elapsed time.
|
|
|
|
**File:** `connectors/jira/scripts/poll_sla.py`
|
|
|
|
The SLA polling job runs every 15 minutes via systemd timer (`jira-sla-poll.timer`) as `root:data-ops` and:
|
|
|
|
1. Reads Parquet to find open issues with SLA data
|
|
2. Fetches fresh SLA **and status** fields via JSM service account (cloud API)
|
|
3. Updates raw JSON atomically (`tempfile.mkstemp()` + `os.fchmod(fd, 0o660)` + `os.replace()`)
|
|
4. Triggers incremental Parquet transform (inside advisory file lock)
|
|
|
|
**Self-healing:** The poll fetches `status`, `resolution`, `resolutiondate`, and `updated` alongside the SLA fields. If a ticket is resolved in Jira but still appears "open" in Parquet (e.g. due to a missed webhook), the poll automatically corrects the status in JSON and re-transforms to Parquet. Log output: `Self-healing: SUPPORT-XXXX is resolved in Jira`. This was added in response to [#203](https://github.com/keboola/agnes-the-ai-analyst/issues/203) where 12 tickets were permanently stale after a permission bug prevented webhooks from updating JSON files.
|
|
|
|
**File locking:** The entire read-modify-write + Parquet transform is wrapped in a per-issue advisory file lock (`connectors/jira/file_lock.py`) to prevent races with the webhook handler. The webhook handler (`connectors/jira/service.py`) uses the same lock. Different issue keys don't block each other.
|
|
|
|
**Important — `mkstemp` and ACL:** The `issues/` directory uses POSIX ACLs with `default:mask::rwx`. `tempfile.mkstemp()` creates files with mode `0600`, which overrides the ACL mask to `---` and breaks group access for www-data (webhook handler) and deploy (batch transform). The `os.fchmod(fd, 0o660)` call immediately after `mkstemp()` restores the mask to `rw-`, preserving ACL-based access. See [#203](https://github.com/keboola/agnes-the-ai-analyst/issues/203) for the full incident report.
|
|
|
|
```bash
|
|
# Manual run
|
|
python -m connectors.jira.scripts.poll_sla
|
|
|
|
# Dry run (count open issues)
|
|
python -m connectors.jira.scripts.poll_sla --dry-run
|
|
|
|
# Verbose logging
|
|
python -m connectors.jira.scripts.poll_sla --verbose
|
|
```
|
|
|
|
**Return states:**
|
|
- `updated` — SLA fields refreshed, status unchanged
|
|
- `healed` — status corrected (ticket was resolved in Jira but stale locally)
|
|
- `skipped` — no valid SLA data and ticket not resolved
|
|
- `failed` — API error or transform failure
|
|
|
|
**Note:** `sla_cycle_type` (ongoing/completed) is not stored in Parquet — compute it on-the-fly in DuckDB:
|
|
```sql
|
|
SELECT issue_key,
|
|
CASE WHEN status_category = 'Done' THEN 'completed' ELSE 'ongoing' END AS sla_cycle_type,
|
|
first_response_elapsed_millis,
|
|
time_to_resolution_elapsed_millis
|
|
FROM 'server/parquet/jira/issues/*.parquet'
|
|
WHERE first_response_elapsed_millis IS NOT NULL
|
|
```
|
|
|
|
## Analyst Sync Configuration
|
|
|
|
Jira data is an **optional dataset** - not synced by default to save bandwidth.
|
|
|
|
**Enable Jira sync:**
|
|
```bash
|
|
# Edit local config (created on first sync_data.sh run)
|
|
nano ~/.config/data-analyst/sync.yaml
|
|
|
|
# Change:
|
|
datasets:
|
|
jira: true # Enable parquet data (~50MB)
|
|
jira_attachments: false # Keep false unless you need actual files
|
|
```
|
|
|
|
**Then sync:**
|
|
```bash
|
|
bash server/scripts/sync_data.sh
|
|
```
|
|
|
|
DuckDB views for Jira tables are created automatically if data exists:
|
|
- `jira_issues` - main issues table
|
|
- `jira_comments` - issue comments
|
|
- `jira_attachments` - attachment metadata (filenames, sizes, URLs)
|
|
- `jira_changelog` - field change history
|
|
- `jira_issuelinks` - links between issues (blocks, duplicates, relates to)
|
|
- `jira_remote_links` - external links (Confluence, Slack, etc.)
|
|
|
|
## Attachment Access
|
|
|
|
Attachments (images, logs, PDFs) are stored separately from parquet data.
|
|
|
|
### Option 1: Download per-ticket (recommended)
|
|
|
|
Download attachments for a specific ticket to local temp folder:
|
|
|
|
```bash
|
|
# Download all attachments for one ticket
|
|
rsync -avz data-analyst:server/jira_attachments/SUPPORT-1234/ /tmp/SUPPORT-1234/
|
|
|
|
# View locally
|
|
ls /tmp/SUPPORT-1234/
|
|
open /tmp/SUPPORT-1234/screenshot.png # macOS
|
|
```
|
|
|
|
This is fast (only downloads files for one ticket) and keeps your local machine clean.
|
|
|
|
### Option 2: Sync attachments locally (for heavy analysis)
|
|
|
|
If you need frequent access to attachments, enable full sync:
|
|
|
|
```yaml
|
|
# ~/.config/data-analyst/sync.yaml
|
|
datasets:
|
|
jira: true
|
|
jira_attachments: true # Syncs ~500MB+ of files
|
|
```
|
|
|
|
Then `sync_data.sh` will rsync attachments to `./server/jira_attachments/`.
|
|
|
|
### Finding attachment path from parquet
|
|
|
|
The `jira_attachments` table has a `local_path` column with the server path:
|
|
|
|
```sql
|
|
SELECT
|
|
issue_key,
|
|
filename,
|
|
local_path,
|
|
size_bytes
|
|
FROM jira_attachments
|
|
WHERE issue_key = 'SUPPORT-1234';
|
|
```
|
|
|
|
Result:
|
|
```
|
|
issue_key | filename | local_path | size_bytes
|
|
SUPPORT-1234 | screenshot.png | /data/src_data/raw/jira/attachments/SUPPORT-1234/... | 45678
|
|
```
|
|
|
|
To access locally (if synced): replace `/data/src_data/raw/jira/attachments/` with `./server/jira_attachments/`.
|
|
|
|
## Future Improvements
|
|
|
|
- [x] ~~Automatic Parquet regeneration after each webhook~~ (Implemented: incremental transform)
|
|
- [x] ~~Incremental Parquet updates~~ (Implemented: upsert by issue_key)
|
|
- [x] ~~Full historical sync from Jira~~ (Implemented: jira_backfill.py)
|
|
- [x] ~~SLA polling for open tickets~~ (Implemented: jira_poll_sla.py, 15min timer)
|
|
- [ ] Comment attachment extraction (inline images in ADF)
|
|
- [ ] Custom field name resolution from Jira metadata API
|
|
- [ ] Attachment binary sync to analysts (currently metadata only)
|