Jira Support Tickets Schema
This document describes the schema of transformed Jira data available for analysis.
Data Location
/data/src_data/parquet/jira/        # Transformed Parquet files (monthly chunks)
├── issues/                         # Main issues table
│   ├── 2025-01.parquet
│   ├── 2025-02.parquet
│   └── ...
├── comments/                       # Issue comments
│   └── YYYY-MM.parquet
├── attachments/                    # Attachment metadata with local paths
│   └── YYYY-MM.parquet
├── changelog/                      # Change history
│   └── YYYY-MM.parquet
├── issuelinks/                     # Links between issues
│   └── YYYY-MM.parquet
└── remote_links/                   # External links (Confluence, Slack, etc.)
    └── YYYY-MM.parquet
/data/src_data/raw/jira/            # Raw data (JSON + files)
├── issues/                         # Raw JSON per issue
├── attachments/                    # Downloaded attachment files
│   └── {issue_key}/                # By issue key (e.g., SUPPORT-15051/)
│       └── {id}_{filename}         # e.g., 56340_screenshot.png
└── webhook_events/                 # Raw webhook payloads (audit)
Monthly Partitioning: Parquet files are partitioned by month based on created_at timestamp. This enables efficient rsync (only changed months sync) and keeps individual file sizes manageable for ~15,000 tickets.
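As a rough illustration of the partitioning rule, the sketch below maps a row's created_at to the monthly file that would hold it. The helper name `partition_path` and the `PARQUET_ROOT` constant are hypothetical, and it assumes the month is taken from the timestamp as stored, with no timezone conversion; the real transform may differ.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Root of the transformed data, taken from the layout above.
PARQUET_ROOT = PurePosixPath("/data/src_data/parquet/jira")


def partition_path(table: str, created_at: datetime) -> PurePosixPath:
    """Return the monthly Parquet file for a row created at `created_at`.

    Hypothetical helper: the month comes straight from the timestamp,
    which is an assumption about how the transform assigns partitions.
    """
    return PARQUET_ROOT / table / f"{created_at:%Y-%m}.parquet"


print(partition_path("issues", datetime(2025, 1, 17, 9, 30)))
# /data/src_data/parquet/jira/issues/2025-01.parquet
```

Because a ticket's created_at does not change, later updates to that ticket land in the same file, which is what keeps rsync transfers limited to the affected months.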
DuckDB Query Pattern: Use glob patterns to query all months:
SELECT * FROM 'server/parquet/jira/issues/*.parquet';
Tables
issues
Main table with support ticket information.
| Column | Type | Description |
|---|---|---|
| issue_key | string | Unique issue identifier (e.g., "SUPPORT-15190") |
| issue_id | string | Jira internal ID |
| issue_url | string | Direct URL to issue in Jira |
| summary | string | Issue title/summary |
| description | string | Full description (plain text, extracted from ADF) |
| issue_type | string | Type (Service Request, Bug, etc.) |
| status | string | Current status (New, Under Review, Resolved, etc.) |
| status_category | string | Status category (To Do, In Progress, Done) |
| priority | string | Priority level (Lowest, Low, Medium, High, Highest) |
| resolution | string | Resolution type if resolved |
| project_key | string | Project key (SUPPORT) |
| project_name | string | Project name (e.g., your Jira project name) |
| creator_email | string | Email of ticket creator |
| creator_name | string | Display name of creator |
| reporter_email | string | Email of reporter |
| reporter_name | string | Display name of reporter |
| assignee_email | string | Email of assigned agent |
| assignee_name | string | Display name of assignee |
| created_at | datetime | When ticket was created |
| updated_at | datetime | Last update timestamp |
| resolved_at | datetime | When ticket was resolved (null if open) |
| due_date | string | Due date if set |
| labels | string (JSON) | Array of labels as JSON |
| attachment_count | int | Number of attachments |
| comment_count | int | Number of comments |
| issuelink_count | int | Number of linked issues |
| request_type | string | Service Desk request type name |
| request_status | string | Service Desk specific status |
| severity | string | Severity level (custom field) |
| triage | string (JSON) | Triage multi-select (renamed from team_tier) |
| configuration_item | string (JSON) | Configuration item multi-select (renamed from categories) |
| participants | string (JSON) | List of participant emails |
| organizations | string (JSON) | Related organizations |
| spam | string | Spam flag (True/null) |
| context | string | Context field (renamed from root_cause; maps to customfield_10330) |
| keboola_platform_url | string | Keboola platform URL (renamed from resolution_summary) |
| slack_link | string | Slack link (renamed from customer_type) |
| technical_issue_category | string | Technical issue category (renamed from satisfaction_rating) |
| email_address | string | Email address field (renamed from context; maps to customfield_10475) |
| satisfaction | int | Customer satisfaction rating (1-5) |
| first_response_breached | string | SLA: whether first response SLA was breached (True/False) |
| first_response_goal_millis | int | SLA: first response goal duration in milliseconds |
| first_response_elapsed_millis | int | SLA: actual first response time in milliseconds |
| time_to_resolution_breached | string | SLA: whether resolution SLA was breached (True/False) |
| time_to_resolution_goal_millis | int | SLA: resolution goal duration in milliseconds |
| time_to_resolution_elapsed_millis | int | SLA: actual resolution time in milliseconds |
| l3_team | string | L3 team assignment (new) |
| _synced_at | string | When data was synced from Jira |
| _raw_file | string | Source JSON filename |
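Several issues columns need decoding before use: the "string (JSON)" columns hold JSON arrays, the SLA breached flags are the strings True/False, and SLA durations are in milliseconds. The sketch below shows one way to normalize a row once it is loaded into Python. The helper names are hypothetical, and treating null or empty values as "no data" is an assumption.

```python
import json


def parse_json_list(value):
    """Decode a 'string (JSON)' column; None or empty string becomes []."""
    if value is None or value == "":
        return []
    return json.loads(value)


def parse_flag(value):
    """Map the strings 'True'/'False' to booleans; None stays None."""
    if value is None:
        return None
    return str(value).strip().lower() == "true"


def millis_to_hours(millis):
    """Convert an SLA duration in milliseconds to hours; None stays None."""
    return None if millis is None else millis / 3_600_000


# Illustrative row (values are made up, column names come from the table above).
row = {
    "labels": '["billing", "urgent"]',
    "first_response_breached": "False",
    "first_response_elapsed_millis": 5_400_000,
}
print(parse_json_list(row["labels"]))                         # ['billing', 'urgent']
print(parse_flag(row["first_response_breached"]))             # False
print(millis_to_hours(row["first_response_elapsed_millis"]))  # 1.5
```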
comments
Issue comments from support conversations.
| Column | Type | Description |
|---|---|---|
| comment_id | string | Unique comment ID |
| issue_key | string | Parent issue key (FK to issues) |
| author_email | string | Comment author email |
| author_name | string | Comment author display name |
| body | string | Comment text (plain text, extracted from ADF) |
| created_at | datetime | When comment was created |
| updated_at | datetime | When comment was last edited |
| update_author_email | string | Who last edited the comment |
attachments
Attachment metadata with local file paths.
| Column | Type | Description |
|---|---|---|
| attachment_id | string | Unique attachment ID |
| issue_key | string | Parent issue key (FK to issues) |
| filename | string | Original filename |
| local_path | string | Server path to downloaded file |
| hierarchical_path | string | Hierarchical path for future use (e.g., 15/051/56340_file.png) |
| size_bytes | int | File size in bytes |
| mime_type | string | MIME type (image/png, application/pdf, etc.) |
| author_email | string | Who uploaded the attachment |
| created_at | datetime | When attachment was uploaded |
| content_url | string | Jira API URL to download |
| thumbnail_url | string | Jira API URL for thumbnail (images only) |
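The two path columns can be illustrated with a small sketch. The local_path layout follows the raw data tree ({issue_key}/{id}_{filename}). The hierarchical_path rule is only a guess inferred from the single example 15/051/56340_file.png (issue number split into thousands and a zero-padded remainder); the function names are hypothetical and the real sync engine may build these paths differently.

```python
from pathlib import PurePosixPath

# Attachment directory from the raw data layout.
ATTACHMENT_ROOT = PurePosixPath("/data/src_data/raw/jira/attachments")


def local_attachment_path(issue_key, attachment_id, filename):
    """Path of a downloaded file: {issue_key}/{id}_{filename}."""
    return ATTACHMENT_ROOT / issue_key / f"{attachment_id}_{filename}"


def hierarchical_path(issue_key, attachment_id, filename):
    """Guess at the hierarchical layout, inferred from one documented example.

    SUPPORT-15051 -> issue number 15051 -> '15' / '051'.
    """
    number = int(issue_key.rsplit("-", 1)[1])
    return f"{number // 1000}/{number % 1000:03d}/{attachment_id}_{filename}"


print(local_attachment_path("SUPPORT-15051", "56340", "screenshot.png"))
# /data/src_data/raw/jira/attachments/SUPPORT-15051/56340_screenshot.png
print(hierarchical_path("SUPPORT-15051", "56340", "file.png"))
# 15/051/56340_file.png
```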
changelog
History of all field changes on issues.
| Column | Type | Description |
|---|---|---|
| change_id | string | Change history ID |
| issue_key | string | Parent issue key (FK to issues) |
| author_email | string | Who made the change |
| author_name | string | Display name of who made change |
| field_name | string | Name of changed field |
| field_type | string | Type of field (jira, custom) |
| from_value | string | Previous value (as string) |
| to_value | string | New value (as string) |
| changed_at | datetime | When change occurred |
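One common use of the changelog is rebuilding how long a ticket spent in each status. The sketch below works on changelog rows for a single issue, already loaded as dictionaries. It assumes status transitions are recorded with field_name equal to "status"; the function name `time_in_status` is hypothetical.

```python
from datetime import datetime, timedelta


def time_in_status(created_at, current_status, changes, as_of):
    """Return {status: timedelta} for one issue.

    `changes` holds changelog rows (dicts with field_name, from_value,
    to_value, changed_at). Assumes status changes use field_name == "status".
    """
    status_changes = sorted(
        (c for c in changes if c["field_name"] == "status"),
        key=lambda c: c["changed_at"],
    )
    # The first recorded from_value is the status the ticket was created in.
    status = status_changes[0]["from_value"] if status_changes else current_status
    since = created_at
    totals = {}
    for change in status_changes:
        totals[status] = totals.get(status, timedelta()) + (change["changed_at"] - since)
        status, since = change["to_value"], change["changed_at"]
    # Time in the final status runs until the reference time.
    totals[status] = totals.get(status, timedelta()) + (as_of - since)
    return totals


changes = [
    {"field_name": "status", "from_value": "New", "to_value": "Under Review",
     "changed_at": datetime(2025, 1, 1, 2, 0)},
    {"field_name": "priority", "from_value": "Low", "to_value": "High",
     "changed_at": datetime(2025, 1, 1, 3, 0)},
    {"field_name": "status", "from_value": "Under Review", "to_value": "Resolved",
     "changed_at": datetime(2025, 1, 1, 5, 0)},
]
result = time_in_status(datetime(2025, 1, 1, 0, 0), "Resolved", changes,
                        as_of=datetime(2025, 1, 1, 6, 0))
print(result["Under Review"])  # 3:00:00
```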
issuelinks
Links between Jira issues (blocks, duplicates, relates to, etc.).
| Column | Type | Description |
|---|---|---|
| issue_key | string | Source issue key (FK to issues) |
| link_id | string | Unique link ID |
| link_type | string | Link type name (Blocks, Duplicate, Relates, etc.) |
| direction | string | Link direction: "inward" or "outward" |
| linked_issue_key | string | Target issue key |
| linked_issue_summary | string | Summary of linked issue |
| linked_issue_status | string | Status of linked issue |
| linked_issue_priority | string | Priority of linked issue |
remote_links
External links attached to issues (Confluence pages, Slack threads, external URLs).
| Column | Type | Description |
|---|---|---|
| issue_key | string | Parent issue key (FK to issues) |
| remote_link_id | string | Unique remote link ID |
| url | string | External URL |
| title | string | Link title/label |
| application_name | string | Application name (e.g., "Confluence", "Slack") |
| application_type | string | Application type identifier |
Relationships
All child tables reference jira_issues via the issue_key column:
jira_issues (PK: issue_key)
├── jira_comments       (FK: issue_key → jira_issues.issue_key)
├── jira_attachments    (FK: issue_key → jira_issues.issue_key)
├── jira_changelog      (FK: issue_key → jira_issues.issue_key)
├── jira_issuelinks     (FK: issue_key → jira_issues.issue_key)
│                       (FK: linked_issue_key → jira_issues.issue_key)
└── jira_remote_links   (FK: issue_key → jira_issues.issue_key)
These relationships are used by the Data Profiler to populate the Relationships tab in the catalog UI. They enable navigation between related table profiles.
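The relationships above can also be written down as data and used for a quick referential-integrity check. The sketch below is not the Data Profiler's actual format: the `FOREIGN_KEYS` list and `find_orphans` helper are hypothetical, and rows are assumed to be plain dictionaries already loaded from Parquet.

```python
# (child_table, fk_column) pairs; every FK points at jira_issues.issue_key.
FOREIGN_KEYS = [
    ("jira_comments", "issue_key"),
    ("jira_attachments", "issue_key"),
    ("jira_changelog", "issue_key"),
    ("jira_issuelinks", "issue_key"),
    ("jira_issuelinks", "linked_issue_key"),
    ("jira_remote_links", "issue_key"),
]


def find_orphans(issue_keys, tables):
    """Return {(table, column): sorted keys missing from jira_issues}.

    `tables` maps a table name to a list of row dicts.
    """
    known = set(issue_keys)
    orphans = {}
    for table, column in FOREIGN_KEYS:
        missing = {
            row[column]
            for row in tables.get(table, [])
            if row.get(column) is not None and row[column] not in known
        }
        if missing:
            orphans[(table, column)] = sorted(missing)
    return orphans


tables = {
    "jira_comments": [{"issue_key": "SUPPORT-1"}, {"issue_key": "SUPPORT-9"}],
    "jira_issuelinks": [{"issue_key": "SUPPORT-1", "linked_issue_key": "DEV-42"}],
}
print(find_orphans(["SUPPORT-1", "SUPPORT-2"], tables))
```

Missing targets for linked_issue_key are not necessarily errors: a link may point at an issue in a project that is not synced.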
Join examples:
-- Issues with their comments
SELECT i.issue_key, i.summary, c.body, c.created_at
FROM 'server/parquet/jira/issues/*.parquet' i
JOIN 'server/parquet/jira/comments/*.parquet' c ON i.issue_key = c.issue_key;
-- Issues with linked issues
SELECT i.issue_key, i.summary, l.link_type, l.direction, l.linked_issue_key
FROM 'server/parquet/jira/issues/*.parquet' i
JOIN 'server/parquet/jira/issuelinks/*.parquet' l ON i.issue_key = l.issue_key;
Example Queries (DuckDB)
Note: Use glob patterns (*.parquet) to query all monthly chunks at once.
Active tickets by status
SELECT status, COUNT(*) as count
FROM 'server/parquet/jira/issues/*.parquet'
WHERE resolved_at IS NULL
GROUP BY status
ORDER BY count DESC;
Average resolution time by severity
SELECT
severity,
COUNT(*) as tickets,
AVG(EXTRACT(EPOCH FROM (resolved_at - created_at)) / 3600) as avg_hours
FROM 'server/parquet/jira/issues/*.parquet'
WHERE resolved_at IS NOT NULL
GROUP BY severity;
Most active commenters
SELECT
author_email,
author_name,
COUNT(*) as comments
FROM 'server/parquet/jira/comments/*.parquet'
GROUP BY author_email, author_name
ORDER BY comments DESC
LIMIT 10;
Tickets with attachments
SELECT
i.issue_key,
i.summary,
a.filename,
a.local_path
FROM 'server/parquet/jira/issues/*.parquet' i
JOIN 'server/parquet/jira/attachments/*.parquet' a ON i.issue_key = a.issue_key
WHERE a.local_path IS NOT NULL;
Field change frequency
SELECT
field_name,
COUNT(*) as changes
FROM 'server/parquet/jira/changelog/*.parquet'
GROUP BY field_name
ORDER BY changes DESC;
Query specific month only
-- Query only January 2026 data
SELECT * FROM 'server/parquet/jira/issues/2026-01.parquet';
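To query a range of months rather than one file or all files, the monthly file names can be generated and passed to DuckDB's read_parquet as a list. The sketch below only builds the SQL text; the helper names and the `base` default are assumptions, and DuckDB will raise an error if any listed file does not exist.

```python
def months_between(start, end):
    """Inclusive list of 'YYYY-MM' strings from start to end."""
    year, month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    months = []
    while (year, month) <= (end_year, end_month):
        months.append(f"{year:04d}-{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def month_range_query(table, start, end, base="server/parquet/jira"):
    """Build a DuckDB query over the monthly files of one table."""
    files = ", ".join(
        f"'{base}/{table}/{month}.parquet'" for month in months_between(start, end)
    )
    return f"SELECT * FROM read_parquet([{files}]);"


print(months_between("2025-11", "2026-02"))
# ['2025-11', '2025-12', '2026-01', '2026-02']
print(month_range_query("issues", "2026-01", "2026-02"))
```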
Data Freshness
- Data is synced in near real time via Jira webhooks
- Each issue update triggers: webhook → fetch → save JSON → download attachments → incremental Parquet transform
- Parquet files are updated within seconds of a change in Jira (only the affected month is rewritten)
- Raw JSON is kept for audit and debugging
- Historical data can be loaded via scripts/jira_backfill.py
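The per-update pipeline can be sketched as a single handler that runs the steps in order. Everything here is illustrative: the function name and the four injected callables are hypothetical stand-ins for the real sync engine, the payload shape (issue.key, fields.created) follows the usual Jira webhook and REST format, and taking the month from the first seven characters of the created timestamp is an assumption.

```python
def handle_issue_event(event, fetch_issue, save_json, download_attachments,
                       transform_month):
    """Run webhook -> fetch -> save JSON -> attachments -> Parquet transform.

    The four callables are hypothetical hooks supplied by the caller.
    Returns the month partition that was rewritten.
    """
    issue_key = event["issue"]["key"]
    issue = fetch_issue(issue_key)            # full issue from the Jira API
    save_json(issue_key, issue)               # raw JSON kept for audit
    download_attachments(issue_key, issue)    # files under attachments/{issue_key}/
    month = issue["fields"]["created"][:7]    # e.g. "2025-01"
    transform_month(month)                    # rewrite only the affected month
    return month


# Dry run with stub steps that only record the call order.
calls = []
month = handle_issue_event(
    {"issue": {"key": "SUPPORT-1234"}},
    fetch_issue=lambda key: calls.append("fetch")
    or {"fields": {"created": "2025-01-17T09:30:00.000+0000"}},
    save_json=lambda key, issue: calls.append("save"),
    download_attachments=lambda key, issue: calls.append("attachments"),
    transform_month=lambda m: calls.append(f"transform {m}"),
)
print(month, calls)
```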
Viewing Attachments
Attachments are stored on the server at /data/src_data/raw/jira/attachments/{issue_key}/.
Analysts can access them via symlink at ~/server/jira_attachments/.
Download attachments for a specific ticket:
# Rsync one ticket's attachments to local temp folder
rsync -avz data-analyst:server/jira_attachments/SUPPORT-1234/ /tmp/SUPPORT-1234/
# View locally
ls /tmp/SUPPORT-1234/
open /tmp/SUPPORT-1234/56340_screenshot.png # macOS; files are stored as {id}_{filename}
Find attachment info from parquet:
SELECT issue_key, filename, size_bytes, local_path
FROM 'server/parquet/jira/attachments/*.parquet'
WHERE issue_key = 'SUPPORT-1234';
Custom Field Reference
| Field ID | Column Name | Description |
|---|---|---|
| customfield_10004 | severity | Severity: 1-Highest to 5-Lowest |
| customfield_10323 | triage | Triage multi-select (renamed from team_tier) |
| customfield_10511 | configuration_item | Configuration item multi-select (renamed from categories) |
| customfield_10365 | spam | Spam flag: True/null |
| customfield_10010 | request_type_info | Service Desk request type metadata |
| customfield_10330 | context | Context field (renamed from root_cause) |
| customfield_10325 | keboola_platform_url | Keboola platform URL (renamed from resolution_summary) |
| customfield_10350 | slack_link | Slack link (renamed from customer_type) |
| customfield_10475 | email_address | Email address (renamed from context) |
| customfield_10676 | technical_issue_category | Technical issue category (renamed from satisfaction_rating) |
| customfield_10157 | satisfaction | Customer satisfaction rating (1-5) |
| customfield_10328 | first_response_* | SLA: first response (breached, goal_millis, elapsed_millis) |
| customfield_10161 | time_to_resolution_* | SLA: resolution time (breached, goal_millis, elapsed_millis) |
| customfield_11831 | l3_team | L3 team assignment (new) |
| customfield_10156 | participants | Participant user list |
| customfield_10002 | organizations | Organizations |
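For working with the raw JSON in /data/src_data/raw/jira/issues/, the reference table can be expressed as a lookup from Jira field ID to column name. This is a minimal sketch: the mapping is copied from the table, the helper name is hypothetical, and the two SLA fields are left out because each expands into three columns. Values are passed through as-is, whereas the real transform also flattens them.

```python
# Field ID -> column name, copied from the table above (SLA fields omitted).
CUSTOM_FIELD_COLUMNS = {
    "customfield_10004": "severity",
    "customfield_10323": "triage",
    "customfield_10511": "configuration_item",
    "customfield_10365": "spam",
    "customfield_10010": "request_type_info",
    "customfield_10330": "context",
    "customfield_10325": "keboola_platform_url",
    "customfield_10350": "slack_link",
    "customfield_10475": "email_address",
    "customfield_10676": "technical_issue_category",
    "customfield_10157": "satisfaction",
    "customfield_11831": "l3_team",
    "customfield_10156": "participants",
    "customfield_10002": "organizations",
}


def rename_custom_fields(fields):
    """Keep only known custom fields and rename them to column names."""
    return {
        CUSTOM_FIELD_COLUMNS[field_id]: value
        for field_id, value in fields.items()
        if field_id in CUSTOM_FIELD_COLUMNS
    }


raw_fields = {"customfield_10330": "export job", "customfield_99999": "ignored"}
print(rename_custom_fields(raw_fields))  # {'context': 'export job'}
```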