Jira Support Tickets Schema

This document describes the schema of transformed Jira data available for analysis.

Data Location

/data/src_data/parquet/jira/     # Transformed Parquet files (monthly chunks)
├── issues/                      # Main issues table
│   ├── 2025-01.parquet
│   ├── 2025-02.parquet
│   └── ...
├── comments/                    # Issue comments
│   └── YYYY-MM.parquet
├── attachments/                 # Attachment metadata with local paths
│   └── YYYY-MM.parquet
├── changelog/                   # Change history
│   └── YYYY-MM.parquet
├── issuelinks/                  # Links between issues
│   └── YYYY-MM.parquet
└── remote_links/                # External links (Confluence, Slack, etc.)
    └── YYYY-MM.parquet

/data/src_data/raw/jira/         # Raw data (JSON + files)
├── issues/                      # Raw JSON per issue
├── attachments/                 # Downloaded attachment files
│   └── {issue_key}/            # By issue key (e.g., SUPPORT-15051/)
│       └── {id}_{filename}     # e.g., 56340_screenshot.png
└── webhook_events/              # Raw webhook payloads (audit)

Monthly Partitioning: Parquet files are partitioned by month, based on the created_at timestamp. This keeps rsync efficient (only months that changed are transferred) and keeps individual files small for a dataset of about 15,000 tickets.
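
The partitioning rule can be sketched in a few lines of Python. This is a minimal illustration, not the platform's code: the helper name `partition_path` and the `PARQUET_ROOT` constant are assumptions, and the document does not say which timestamp is used for child tables, so the sketch simply takes a `created_at` value as input.

```python
from datetime import datetime

# Assumed root, taken from the directory listing above.
PARQUET_ROOT = "/data/src_data/parquet/jira"


def partition_path(table: str, created_at: datetime) -> str:
    """Return the monthly Parquet file that holds a row created at `created_at`."""
    # The file name is the year and month of the timestamp, e.g. 2025-01.parquet.
    return f"{PARQUET_ROOT}/{table}/{created_at:%Y-%m}.parquet"


print(partition_path("issues", datetime(2025, 1, 17, 9, 30)))
# prints /data/src_data/parquet/jira/issues/2025-01.parquet
```

Because the file name depends only on the creation month, a later update to an old ticket touches the file for the month in which the ticket was created, not the current month.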

DuckDB Query Pattern: Use glob patterns to query all months:

SELECT * FROM 'server/parquet/jira/issues/*.parquet';

Tables

issues

Main table with support ticket information.

| Column | Type | Description |
| --- | --- | --- |
| issue_key | string | Unique issue identifier (e.g., "SUPPORT-15190") |
| issue_id | string | Jira internal ID |
| issue_url | string | Direct URL to issue in Jira |
| summary | string | Issue title/summary |
| description | string | Full description (plain text, extracted from ADF) |
| issue_type | string | Type (Service Request, Bug, etc.) |
| status | string | Current status (New, Under Review, Resolved, etc.) |
| status_category | string | Status category (To Do, In Progress, Done) |
| priority | string | Priority level (Lowest, Low, Medium, High, Highest) |
| resolution | string | Resolution type if resolved |
| project_key | string | Project key (SUPPORT) |
| project_name | string | Project name (e.g., your Jira project name) |
| creator_email | string | Email of ticket creator |
| creator_name | string | Display name of creator |
| reporter_email | string | Email of reporter |
| reporter_name | string | Display name of reporter |
| assignee_email | string | Email of assigned agent |
| assignee_name | string | Display name of assignee |
| created_at | datetime | When ticket was created |
| updated_at | datetime | Last update timestamp |
| resolved_at | datetime | When ticket was resolved (null if open) |
| due_date | string | Due date if set |
| labels | string (JSON) | Array of labels as JSON |
| attachment_count | int | Number of attachments |
| comment_count | int | Number of comments |
| issuelink_count | int | Number of linked issues |
| request_type | string | Service Desk request type name |
| request_status | string | Service Desk specific status |
| severity | string | Severity level (custom field) |
| triage | string (JSON) | Triage multi-select (renamed from team_tier) |
| configuration_item | string (JSON) | Configuration item multi-select (renamed from categories) |
| participants | string (JSON) | List of participant emails |
| organizations | string (JSON) | Related organizations |
| spam | string | Spam flag (True/null) |
| context | string | Context field (renamed from root_cause; maps to customfield_10330) |
| keboola_platform_url | string | Keboola platform URL (renamed from resolution_summary) |
| slack_link | string | Slack link (renamed from customer_type) |
| technical_issue_category | string | Technical issue category (renamed from satisfaction_rating) |
| email_address | string | Email address field (renamed from context; maps to customfield_10475) |
| satisfaction | int | Customer satisfaction rating (1-5) |
| first_response_breached | string | SLA: whether first response SLA was breached (True/False) |
| first_response_goal_millis | int | SLA: first response goal duration in milliseconds |
| first_response_elapsed_millis | int | SLA: actual first response time in milliseconds |
| time_to_resolution_breached | string | SLA: whether resolution SLA was breached (True/False) |
| time_to_resolution_goal_millis | int | SLA: resolution goal duration in milliseconds |
| time_to_resolution_elapsed_millis | int | SLA: actual resolution time in milliseconds |
| l3_team | string | L3 team assignment (new) |
| _synced_at | string | When data was synced from Jira |
| _raw_file | string | Source JSON filename |
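
Several columns in this table are strings that encode richer values: JSON arrays, True/False flags, and durations in milliseconds. The sketch below shows one way to decode a row after loading it into Python. The function name `decode_issue` and the sample values are made up for illustration, and the exact shape of the JSON arrays is an assumption (the table only says they are arrays).

```python
import json

# Column groups copied from the issues table above.
JSON_COLUMNS = ("labels", "triage", "configuration_item", "participants", "organizations")
FLAG_COLUMNS = ("spam", "first_response_breached", "time_to_resolution_breached")
MILLIS_COLUMNS = (
    "first_response_goal_millis",
    "first_response_elapsed_millis",
    "time_to_resolution_goal_millis",
    "time_to_resolution_elapsed_millis",
)
FLAG_VALUES = {"True": True, "False": False}


def decode_issue(row: dict) -> dict:
    """Return a copy of `row` with JSON, flag, and millisecond columns decoded."""
    out = dict(row)
    for col in JSON_COLUMNS:
        raw = row.get(col)
        # Null or empty strings become an empty list.
        out[col] = json.loads(raw) if raw else []
    for col in FLAG_COLUMNS:
        # "True"/"False" become booleans; null (or anything else) stays None.
        out[col] = FLAG_VALUES.get(row.get(col))
    for col in MILLIS_COLUMNS:
        ms = row.get(col)
        # Add a derived *_hours key next to each *_millis column.
        out[col.replace("_millis", "_hours")] = None if ms is None else ms / 3_600_000
    return out


# Hypothetical sample row; only a few columns are shown.
sample = {
    "issue_key": "SUPPORT-15190",
    "labels": '["billing", "urgent"]',
    "spam": None,
    "first_response_breached": "False",
    "first_response_elapsed_millis": 5_400_000,
}
decoded = decode_issue(sample)
print(decoded["labels"], decoded["first_response_elapsed_hours"])
```

Keeping null flags as `None` instead of `False` preserves the difference between "not breached" and "no SLA recorded".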

comments

Issue comments from support conversations.

| Column | Type | Description |
| --- | --- | --- |
| comment_id | string | Unique comment ID |
| issue_key | string | Parent issue key (FK to issues) |
| author_email | string | Comment author email |
| author_name | string | Comment author display name |
| body | string | Comment text (plain text, extracted from ADF) |
| created_at | datetime | When comment was created |
| updated_at | datetime | When comment was last edited |
| update_author_email | string | Who last edited the comment |

attachments

Attachment metadata with local file paths.

| Column | Type | Description |
| --- | --- | --- |
| attachment_id | string | Unique attachment ID |
| issue_key | string | Parent issue key (FK to issues) |
| filename | string | Original filename |
| local_path | string | Server path to downloaded file |
| hierarchical_path | string | Hierarchical path for future use (e.g., 15/051/56340_file.png) |
| size_bytes | int | File size in bytes |
| mime_type | string | MIME type (image/png, application/pdf, etc.) |
| author_email | string | Who uploaded the attachment |
| created_at | datetime | When attachment was uploaded |
| content_url | string | Jira API URL to download |
| thumbnail_url | string | Jira API URL for thumbnail (images only) |

changelog

History of all field changes on issues.

| Column | Type | Description |
| --- | --- | --- |
| change_id | string | Change history ID |
| issue_key | string | Parent issue key (FK to issues) |
| author_email | string | Who made the change |
| author_name | string | Display name of who made change |
| field_name | string | Name of changed field |
| field_type | string | Type of field (jira, custom) |
| from_value | string | Previous value (as string) |
| to_value | string | New value (as string) |
| changed_at | datetime | When change occurred |
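
One common use of the changelog is to reconstruct how long a ticket spent in each status. The following is a sketch under stated assumptions: it expects the rows of a single issue as dictionaries keyed by the column names above, it assumes status changes are recorded with field_name equal to "status", and the function name `time_in_status` and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def time_in_status(created_at: datetime, changes: list, now: datetime) -> dict:
    """Sum the time one issue spent in each status, based on its changelog rows."""
    status_changes = sorted(
        (c for c in changes if c["field_name"] == "status"),
        key=lambda c: c["changed_at"],
    )
    totals = defaultdict(timedelta)
    since = created_at
    current = None
    for change in status_changes:
        # The issue was in from_value between the previous change and this one.
        totals[change["from_value"]] += change["changed_at"] - since
        since = change["changed_at"]
        current = change["to_value"]
    if current is not None:
        # Time in the latest status runs until `now`.
        totals[current] += now - since
    return dict(totals)


# Hypothetical changelog rows for one ticket.
rows = [
    {"field_name": "status", "from_value": "New", "to_value": "Under Review",
     "changed_at": datetime(2025, 1, 1, 11, 0)},
    {"field_name": "priority", "from_value": "Low", "to_value": "High",
     "changed_at": datetime(2025, 1, 1, 12, 0)},
    {"field_name": "status", "from_value": "Under Review", "to_value": "Resolved",
     "changed_at": datetime(2025, 1, 2, 11, 0)},
]
result = time_in_status(datetime(2025, 1, 1, 9, 0), rows, now=datetime(2025, 1, 2, 12, 0))
print({status: spent.total_seconds() / 3600 for status, spent in result.items()})
# prints {'New': 2.0, 'Under Review': 24.0, 'Resolved': 1.0}
```

If an issue has no status changes, the changelog alone does not reveal its status, so the function returns an empty dictionary; the current status is available in the issues table.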

issuelinks

Links between Jira issues (blocks, duplicates, relates to, etc.).

| Column | Type | Description |
| --- | --- | --- |
| issue_key | string | Source issue key (FK to issues) |
| link_id | string | Unique link ID |
| link_type | string | Link type name (Blocks, Duplicate, Relates, etc.) |
| direction | string | Link direction: "inward" or "outward" |
| linked_issue_key | string | Target issue key |
| linked_issue_summary | string | Summary of linked issue |
| linked_issue_status | string | Status of linked issue |
| linked_issue_priority | string | Priority of linked issue |

remote_links

External links attached to issues (Confluence pages, Slack threads, external URLs).

| Column | Type | Description |
| --- | --- | --- |
| issue_key | string | Parent issue key (FK to issues) |
| remote_link_id | string | Unique remote link ID |
| url | string | External URL |
| title | string | Link title/label |
| application_name | string | Application name (e.g., "Confluence", "Slack") |
| application_type | string | Application type identifier |

Relationships

All child tables reference jira_issues via the issue_key column:

jira_issues (PK: issue_key)
├── jira_comments       (FK: issue_key → jira_issues.issue_key)
├── jira_attachments    (FK: issue_key → jira_issues.issue_key)
├── jira_changelog      (FK: issue_key → jira_issues.issue_key)
├── jira_issuelinks     (FK: issue_key → jira_issues.issue_key)
│                       (FK: linked_issue_key → jira_issues.issue_key)
└── jira_remote_links   (FK: issue_key → jira_issues.issue_key)

These relationships are used by the Data Profiler to populate the Relationships tab in the catalog UI. They enable navigation between related table profiles.
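
The relationship tree can also be written down as data, which makes it easy to generate a join for any child table. This is only a sketch: the `FOREIGN_KEYS` dictionary and `join_sql` helper are hypothetical, and it assumes that each `jira_`-prefixed name corresponds to the Parquet folder of the same name without the prefix.

```python
# Query root as used in the DuckDB examples in this document.
PARQUET_ROOT = "server/parquet/jira"

# Child table -> columns that reference jira_issues.issue_key (from the tree above).
FOREIGN_KEYS = {
    "jira_comments": ["issue_key"],
    "jira_attachments": ["issue_key"],
    "jira_changelog": ["issue_key"],
    "jira_issuelinks": ["issue_key", "linked_issue_key"],
    "jira_remote_links": ["issue_key"],
}


def join_sql(child: str, fk: str = "issue_key") -> str:
    """Build a DuckDB query joining the issues table to one child table."""
    if fk not in FOREIGN_KEYS.get(child, ()):
        raise ValueError(f"{child}.{fk} is not a documented foreign key")
    folder = child[len("jira_"):]  # assumed mapping: jira_comments -> comments/
    return (
        "SELECT i.issue_key, i.summary, c.*\n"
        f"FROM '{PARQUET_ROOT}/issues/*.parquet' i\n"
        f"JOIN '{PARQUET_ROOT}/{folder}/*.parquet' c ON c.{fk} = i.issue_key;"
    )


# Join each link to the issue it points at, rather than the issue it starts from.
print(join_sql("jira_issuelinks", fk="linked_issue_key"))
```

Joining on linked_issue_key instead of issue_key returns the target side of each link, which is the second foreign key shown for jira_issuelinks.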

Join examples:

-- Issues with their comments
SELECT i.issue_key, i.summary, c.body, c.created_at
FROM 'server/parquet/jira/issues/*.parquet' i
JOIN 'server/parquet/jira/comments/*.parquet' c ON i.issue_key = c.issue_key;

-- Issues with linked issues
SELECT i.issue_key, i.summary, l.link_type, l.direction, l.linked_issue_key
FROM 'server/parquet/jira/issues/*.parquet' i
JOIN 'server/parquet/jira/issuelinks/*.parquet' l ON i.issue_key = l.issue_key;

Example Queries (DuckDB)

Note: Use glob patterns (*.parquet) to query all monthly chunks at once.

Active tickets by status

SELECT status, COUNT(*) as count
FROM 'server/parquet/jira/issues/*.parquet'
WHERE resolved_at IS NULL
GROUP BY status
ORDER BY count DESC;

Average resolution time by severity

SELECT
    severity,
    COUNT(*) as tickets,
    AVG(EXTRACT(EPOCH FROM (resolved_at - created_at)) / 3600) as avg_hours
FROM 'server/parquet/jira/issues/*.parquet'
WHERE resolved_at IS NOT NULL
GROUP BY severity;

Most active commenters

SELECT
    author_email,
    author_name,
    COUNT(*) as comments
FROM 'server/parquet/jira/comments/*.parquet'
GROUP BY author_email, author_name
ORDER BY comments DESC
LIMIT 10;

Tickets with attachments

SELECT
    i.issue_key,
    i.summary,
    a.filename,
    a.local_path
FROM 'server/parquet/jira/issues/*.parquet' i
JOIN 'server/parquet/jira/attachments/*.parquet' a ON i.issue_key = a.issue_key
WHERE a.local_path IS NOT NULL;

Field change frequency

SELECT
    field_name,
    COUNT(*) as changes
FROM 'server/parquet/jira/changelog/*.parquet'
GROUP BY field_name
ORDER BY changes DESC;

Query specific month only

-- Query only January 2026 data
SELECT * FROM 'server/parquet/jira/issues/2026-01.parquet';
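
To query a range of months without scanning every file, the file names can be listed explicitly and passed to DuckDB's `read_parquet`, which accepts a list of paths. The helpers below are a hypothetical sketch (`month_files` and `range_query` are not part of the platform); they only build strings and do not check that the files exist, so DuckDB will raise an error if a listed month has no file.

```python
def month_files(table: str, start: tuple, end: tuple, root: str = "server/parquet/jira") -> list:
    """List monthly Parquet paths from `start` to `end`, both (year, month), inclusive."""
    files = []
    year, month = start
    while (year, month) <= end:
        files.append(f"{root}/{table}/{year:04d}-{month:02d}.parquet")
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return files


def range_query(table: str, start: tuple, end: tuple) -> str:
    """Build a DuckDB query that reads only the listed monthly files."""
    file_list = ", ".join(f"'{path}'" for path in month_files(table, start, end))
    return f"SELECT * FROM read_parquet([{file_list}]);"


print(range_query("issues", (2025, 11), (2026, 2)))
```

Since files are partitioned by created_at, this selects tickets created in those months, not tickets updated in them.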

Data Freshness

  • Data is synced in real-time via Jira webhooks
  • Each issue update triggers: webhook → fetch → save JSON → download attachments → incremental Parquet transform
  • Parquet files are updated within seconds of a change in Jira (only the affected month's file is rewritten)
  • Raw JSON is kept for audit and debugging
  • Historical data can be loaded via scripts/jira_backfill.py

Viewing Attachments

Attachments are stored on the server at /data/src_data/raw/jira/attachments/{issue_key}/. Analysts can access them via symlink at ~/server/jira_attachments/.

Download attachments for a specific ticket:

# Rsync one ticket's attachments to local temp folder
rsync -avz data-analyst:server/jira_attachments/SUPPORT-1234/ /tmp/SUPPORT-1234/

# View locally
ls /tmp/SUPPORT-1234/
open /tmp/SUPPORT-1234/56340_screenshot.png  # macOS; files are stored as {id}_{filename}

Find attachment info from parquet:

SELECT issue_key, filename, size_bytes, local_path
FROM 'server/parquet/jira/attachments/*.parquet'
WHERE issue_key = 'SUPPORT-1234';
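
When downloading attachments for many tickets, it can help to build the rsync command from the issue key instead of typing it by hand. A minimal sketch follows, assuming the `data-analyst` host alias and the `server/jira_attachments` symlink shown above; the helper names and the issue key pattern are illustrative assumptions.

```python
import re
import shlex

REMOTE_HOST = "data-analyst"              # SSH host alias used in the rsync example above
REMOTE_ROOT = "server/jira_attachments"   # analyst-facing symlink

# Assumed key format: project key, a dash, and a number (e.g. SUPPORT-1234).
ISSUE_KEY_RE = re.compile(r"^[A-Z][A-Z0-9_]*-\d+$")


def rsync_command(issue_key: str, dest_root: str = "/tmp") -> list:
    """Return the rsync argument list that copies one ticket's attachments locally."""
    if not ISSUE_KEY_RE.match(issue_key):
        raise ValueError(f"unexpected issue key: {issue_key!r}")
    source = f"{REMOTE_HOST}:{REMOTE_ROOT}/{issue_key}/"
    return ["rsync", "-avz", source, f"{dest_root}/{issue_key}/"]


def stored_name(attachment_id: str, filename: str) -> str:
    """File name on disk, following the {id}_{filename} convention."""
    return f"{attachment_id}_{filename}"


print(shlex.join(rsync_command("SUPPORT-1234")))
# prints rsync -avz data-analyst:server/jira_attachments/SUPPORT-1234/ /tmp/SUPPORT-1234/
print(stored_name("56340", "screenshot.png"))
# prints 56340_screenshot.png
```

The argument list can be passed directly to `subprocess.run`; validating the key first keeps unexpected input out of the remote path.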

Custom Field Reference

| Field ID | Column Name | Description |
| --- | --- | --- |
| customfield_10004 | severity | Severity: 1-Highest to 5-Lowest |
| customfield_10323 | triage | Triage multi-select (renamed from team_tier) |
| customfield_10511 | configuration_item | Configuration item multi-select (renamed from categories) |
| customfield_10365 | spam | Spam flag: True/null |
| customfield_10010 | request_type_info | Service Desk request type metadata |
| customfield_10330 | context | Context field (renamed from root_cause) |
| customfield_10325 | keboola_platform_url | Keboola platform URL (renamed from resolution_summary) |
| customfield_10350 | slack_link | Slack link (renamed from customer_type) |
| customfield_10475 | email_address | Email address (renamed from context) |
| customfield_10676 | technical_issue_category | Technical issue category (renamed from satisfaction_rating) |
| customfield_10157 | satisfaction | Customer satisfaction rating (1-5) |
| customfield_10328 | first_response_* | SLA: first response (breached, goal_millis, elapsed_millis) |
| customfield_10161 | time_to_resolution_* | SLA: resolution time (breached, goal_millis, elapsed_millis) |
| customfield_11831 | l3_team | L3 team assignment (new) |
| customfield_10156 | participants | Participant user list |
| customfield_10002 | organizations | Organizations |
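
When tracing a Parquet column back to the raw Jira JSON, the table above can be used as a lookup. The sketch below copies the mapping into a dictionary and resolves a column name to its field ID, treating entries that end in `*` as prefixes. The names `CUSTOM_FIELD_COLUMNS` and `field_id_for` are hypothetical and not part of the platform.

```python
# Field ID -> column name (or column prefix ending in "*"), copied from the table above.
CUSTOM_FIELD_COLUMNS = {
    "customfield_10004": "severity",
    "customfield_10323": "triage",
    "customfield_10511": "configuration_item",
    "customfield_10365": "spam",
    "customfield_10010": "request_type_info",
    "customfield_10330": "context",
    "customfield_10325": "keboola_platform_url",
    "customfield_10350": "slack_link",
    "customfield_10475": "email_address",
    "customfield_10676": "technical_issue_category",
    "customfield_10157": "satisfaction",
    "customfield_10328": "first_response_*",
    "customfield_10161": "time_to_resolution_*",
    "customfield_11831": "l3_team",
    "customfield_10156": "participants",
    "customfield_10002": "organizations",
}


def field_id_for(column: str):
    """Return the Jira custom field ID behind a column, or None for standard fields."""
    for field_id, pattern in CUSTOM_FIELD_COLUMNS.items():
        if pattern.endswith("*"):
            # One SLA field expands into several columns sharing a prefix.
            if column.startswith(pattern[:-1]):
                return field_id
        elif column == pattern:
            return field_id
    return None


print(field_id_for("first_response_goal_millis"))  # prints customfield_10328
print(field_id_for("summary"))                     # prints None
```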