Remove hardcoded Jira and Telemetry cards from catalog

These Keboola-specific data source cards don't belong in the OSS repo.
The catalog now shows only dynamic content: Core Business Data (from
data_description.md) and Business Metrics (from docs/metrics/*/*.yml).

Also update auto-install.md with Business Metrics documentation,
pipeline diagram, and expanded checklist.
Petr 2026-03-10 22:48:07 +01:00
parent 5a84473213
commit 49559fba1b
2 changed files with 71 additions and 306 deletions


@@ -374,7 +374,14 @@ config/instance.yaml ────────symlink──> repo/config/instance
/api/catalog/profile/ _load_data_stats()
(per-table stats, (header: "9 tables,
columns, alerts, ~217K rows total")
relationships)
relationships,
used_by_metrics)
docs/metrics/*/*.yml ──────────────> _load_metrics_data()
(metric definitions, │
SQL examples, ▼
dimensions) /catalog "Business Metrics" card
/api/metrics/<path> (modal detail)
```
Key files and their roles:
@@ -386,6 +393,7 @@ Key files and their roles:
| `*.parquet` | `/data/src_data/parquet/` | Actual data files (flat or in subfolders) |
| `profiles.json` | `/data/src_data/metadata/` | Profiler output: statistics, alerts, relationships per table |
| `sync_state.json` | `/data/src_data/metadata/` | Sync process stats (optional; profiler provides fallback) |
| `docs/metrics/*/*.yml` | OSS repo (sample) or instance repo (production) | Business metric definitions with SQL examples |
**Folder mapping** serves dual purpose: maps table IDs to catalog categories for the UI,
and maps to filesystem paths for the profiler. The profiler auto-detects flat layouts
@@ -482,7 +490,59 @@ ln -sf /opt/data-analyst/instance/config/data_description.md \
systemctl restart webapp
```
### 6c: Run Data Profiler
### 6c: Business Metrics
The Data Catalog includes a **Business Metrics** card that dynamically renders metric
definitions from YAML files. The OSS repo ships with 10 sample e-commerce metrics in
`docs/metrics/` (4 categories: revenue, customers, marketing, support) that align with
the sample data generator tables.
**How it works:**
- Webapp scans `docs/metrics/*/*.yml` (production: `/data/docs/metrics/`)
- Each YAML file defines one metric with SQL examples, dimensions, and notes
- The profiler links metrics to tables via `used_by_metrics` in `profiles.json`
- Clicking a metric opens a modal with Overview, How to Use, SQL Examples, and Technical tabs
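The linking step can be pictured as a reverse index from table name to the metrics that reference it. The following is a minimal sketch, not the profiler's actual code: `build_used_by_metrics` is a hypothetical helper, and it assumes each metric has already been parsed into a dict with `name`, a primary `table`, and an optional `tables` list.

```python
from collections import defaultdict


def build_used_by_metrics(metrics):
    """Map each table name to the sorted names of metrics that reference it.

    Hypothetical helper: assumes every metric dict has `name`, a primary
    `table`, and optionally `tables` listing all referenced tables.
    """
    index = defaultdict(set)
    for metric in metrics:
        referenced = set(metric.get("tables") or [])
        if metric.get("table"):
            referenced.add(metric["table"])
        for table in referenced:
            index[table].add(metric["name"])
    return {table: sorted(names) for table, names in index.items()}


# Illustrative metric dicts, not the shipped sample definitions.
sample_metrics = [
    {"name": "total_revenue", "table": "orders", "tables": ["orders", "customers"]},
    {"name": "new_customers", "table": "customers"},
]
used_by = build_used_by_metrics(sample_metrics)
```

In this sketch, `used_by["orders"]` would hold the metric names that a table profile could render as "Used by Metrics" badges.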
**For sample data:** metrics work out of the box - the OSS repo includes sample definitions.
**For production:** create metric YAMLs in the **instance repo** and deploy them to
`/data/docs/metrics/` on the server. The production path takes precedence over the OSS repo.
```bash
# Instance repo: create metric definitions
mkdir -p /opt/data-analyst/instance/docs/metrics/{revenue,operations}
# ... add your .yml files ...
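# Deploy metrics to server ("/." copies the directory contents, so re-running
# the command does not create a nested /data/docs/metrics/metrics)
mkdir -p /data/docs/metrics
cp -r /opt/data-analyst/instance/docs/metrics/. /data/docs/metrics/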
chown -R root:data-ops /data/docs/metrics
chmod -R 2775 /data/docs/metrics
```
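The precedence rule above can be sketched as a small directory lookup. This is an illustration under stated assumptions rather than the webapp's implementation: `resolve_metrics_dir` and `list_metric_files` are hypothetical names, and treating "the production directory exists and holds at least one `*/*.yml` file" as the trigger is only one plausible reading of "takes precedence".

```python
from pathlib import Path


def resolve_metrics_dir(production_dir, repo_dir):
    """Return the directory to scan for metric YAML files.

    Hypothetical helper: prefers the production directory when it holds
    at least one <category>/<metric>.yml file, else falls back to the repo.
    """
    production = Path(production_dir)
    if production.is_dir() and any(production.glob("*/*.yml")):
        return production
    return Path(repo_dir)


def list_metric_files(metrics_dir):
    """List metric files one level below the category directories."""
    return sorted(Path(metrics_dir).glob("*/*.yml"))
```

A call such as `resolve_metrics_dir("/data/docs/metrics", "docs/metrics")` would then fall back to the sample definitions until production metrics have been deployed.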
Each metric YAML file follows this structure (a YAML list containing a single mapping):
```yaml
- name: metric_name
display_name: Human Readable Name
category: revenue # must match parent directory name
type: sum # sum, average, count_distinct, ratio
unit: USD
grain: monthly
time_column: order_date
table: orders # primary table
tables: [orders, customers] # optional: all referenced tables
expression: "SUM(total_amount)"
description: "What this metric measures..."
dimensions: [channel, region]
notes: ["Important context..."]
synonyms: [alias1, alias2]
sql: |
SELECT ... FROM ... GROUP BY ...
sql_by_channel: | # any sql_* key is auto-discovered
SELECT ... GROUP BY channel
```
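The two conventions noted in the comments above (any `sql_*` key is picked up, and `category` must match the parent directory) could be handled roughly as follows. This is a sketch only: `collect_sql_examples` and `check_category` are hypothetical names, the input is assumed to be the dict a YAML parser returns for the single list item, and the actual webapp may discover and validate keys differently.

```python
from pathlib import Path


def collect_sql_examples(metric):
    """Return SQL examples keyed by name: `sql` plus every `sql_*` key."""
    return {
        key: value
        for key, value in metric.items()
        if (key == "sql" or key.startswith("sql_")) and isinstance(value, str)
    }


def check_category(metric, yaml_path):
    """True when `category` equals the name of the file's parent directory."""
    return metric.get("category") == Path(yaml_path).parent.name


# Trimmed-down stand-in for one parsed metric entry.
metric = {
    "name": "metric_name",
    "category": "revenue",
    "sql": "SELECT ... FROM ... GROUP BY ...",
    "sql_by_channel": "SELECT ... GROUP BY channel",
}
examples = collect_sql_examples(metric)
```

Keeping discovery prefix-based means a new variant such as `sql_by_region` needs no loader change, only a new key in the YAML file.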
### 6d: Run Data Profiler
The profiler reads parquet files + `data_description.md` and generates `profiles.json`
with per-table statistics, column analysis, data quality alerts, and relationship maps.
@@ -498,7 +558,8 @@ The profiler provides:
- **Overview**: row count, column count, file size, date coverage, missing cell %
- **Columns**: type distribution, top values, histograms for numeric columns
- **Insights**: data quality alerts (high missing %, imbalanced categories, high cardinality)
- **Relationships**: FK diagram built from `foreign_keys` in `data_description.md`
- **Relationships**: FK diagram built from `foreign_keys` in `data_description.md`, plus linked Business Metrics
- **Used by Metrics**: shows which metric definitions reference this table (from `docs/metrics/`)
- **Sample**: first 5 rows of the table
Without `sync_state.json` (no data adapter running), the profiler computes file sizes
@@ -521,9 +582,12 @@ cd /opt/data-analyst/repo && /opt/data-analyst/.venv/bin/python -m src.profiler
| 6.4 | Catalog header | "9 tables, ~217K rows total" (from profiles.json) |
| 6.5 | Profile modal | Click "Profile" on any table → statistics, columns, insights |
| 6.6 | Relationships | Orders profile → shows customers, order_items, payments links |
| 6.7 | File sizes | Profile overview shows non-zero file size (e.g., 0.69 MB) |
| 6.8 | Analyst sync | Analyst can rsync parquet files to local machine |
| 6.9 | DuckDB loads | `SELECT count(*) FROM read_parquet('orders.parquet')` returns rows |
| 6.7 | Used by Metrics | Orders overview → shows total_revenue, campaign_roi, etc. badges |
| 6.8 | Business Metrics | `/catalog` shows "Business Metrics" card with 4 categories, 10 metrics |
| 6.9 | Metric modal | Click any metric → modal with SQL examples, dimensions, notes |
| 6.10 | File sizes | Profile overview shows non-zero file size (e.g., 0.69 MB) |
| 6.11 | Analyst sync | Analyst can rsync parquet files to local machine |
| 6.12 | DuckDB loads | `SELECT count(*) FROM read_parquet('orders.parquet')` returns rows |
## Step 7: Real Data Source (Production)


@@ -1428,232 +1428,8 @@
</div>
{% endif %}
<!-- ── Card 2: Support Data (Jira) ── -->
<div class="source-card" id="jiraCard">
<div class="source-card-header">
<div class="source-card-left">
<div class="source-card-icon jira">
<svg width="22" height="22" viewBox="0 0 24 24" fill="none" stroke="#6B7280" stroke-width="1.8" stroke-linecap="round" stroke-linejoin="round">
<circle cx="12" cy="12" r="10"/>
<path d="M9.09 9a3 3 0 0 1 5.83 1c0 2-3 3-3 3"/>
<line x1="12" y1="17" x2="12.01" y2="17"/>
</svg>
</div>
<div class="source-card-info">
<div class="source-card-name">Support Data (Jira)</div>
<div class="source-card-desc">Customer support tickets, comments, and change history</div>
<div class="source-card-meta">6 tables &middot; Real-time via webhooks</div>
</div>
</div>
<div class="source-card-right">
<span id="jiraBadge" class="{{ 'badge-subscribed' if sync_settings and sync_settings.datasets.jira else 'badge-unsubscribed' }}">
{{ 'Subscribed' if sync_settings and sync_settings.datasets.jira else 'Not subscribed' }}
</span>
<label class="toggle-switch">
<input type="checkbox" id="toggle-jira" onchange="updateJiraSubscription()" {{ 'checked' if sync_settings and sync_settings.datasets.jira }}>
<span class="toggle-slider"></span>
</label>
</div>
</div>
</div>
<!-- Unsubscribed info -->
<div id="jiraUnsubscribed" class="source-card-unsubscribed" style="{{ 'display:none' if sync_settings and sync_settings.datasets.jira }}">
Enable this data source to get access to Jira support ticket data. Includes issues, comments, attachments metadata, and field change history. Data is synced in real-time via webhooks.
</div>
<!-- Jira accordion (visible only when subscribed) -->
<div id="jiraAccordion" style="{{ '' if sync_settings and sync_settings.datasets.jira else 'display:none' }}">
<div class="accordion-category">
<button class="accordion-trigger" onclick="toggleAccordion(this)">
<svg class="accordion-chevron" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<polyline points="9 18 15 12 9 6"/>
</svg>
Support
<span class="accordion-count">6 tables</span>
</button>
<div class="accordion-content">
<div class="table-row" onclick="openProfiler('jira_issues')">
<div class="table-row-left">
<div class="table-row-name">jira_issues</div>
<div class="table-row-desc">Support tickets from SUPPORT project. Key fields: issue_key, summary, status, priority, assignee, severity.</div>
</div>
<div class="table-row-right">
<span class="rows-badge">Real-time</span>
<span class="profile-link">
<svg width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M2 12s3-7 10-7 10 7 10 7-3 7-10 7-10-7-10-7z"/><circle cx="12" cy="12" r="3"/></svg>
Profile
</span>
</div>
</div>
<div class="table-row" onclick="openProfiler('jira_comments')">
<div class="table-row-left">
<div class="table-row-name">jira_comments</div>
<div class="table-row-desc">Comments on support tickets. Key fields: comment_id, issue_key, author_email, body, created_at.</div>
</div>
<div class="table-row-right">
<span class="rows-badge">Real-time</span>
<span class="profile-link">
<svg width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M2 12s3-7 10-7 10 7 10 7-3 7-10 7-10-7-10-7z"/><circle cx="12" cy="12" r="3"/></svg>
Profile
</span>
</div>
</div>
<div class="table-row" onclick="openProfiler('jira_attachments')">
<div class="table-row-left">
<div class="table-row-name">jira_attachments</div>
<div class="table-row-desc">Attachment metadata with local file paths. Key fields: attachment_id, issue_key, filename, size_bytes, mime_type.</div>
</div>
<div class="table-row-right">
<span class="rows-badge">Real-time</span>
<span class="profile-link">
<svg width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M2 12s3-7 10-7 10 7 10 7-3 7-10 7-10-7-10-7z"/><circle cx="12" cy="12" r="3"/></svg>
Profile
</span>
</div>
</div>
<div class="table-row" onclick="openProfiler('jira_changelog')">
<div class="table-row-left">
<div class="table-row-name">jira_changelog</div>
<div class="table-row-desc">History of all field changes on issues. Key fields: change_id, issue_key, field_name, from_value, to_value.</div>
</div>
<div class="table-row-right">
<span class="rows-badge">Real-time</span>
<span class="profile-link">
<svg width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M2 12s3-7 10-7 10 7 10 7-3 7-10 7-10-7-10-7z"/><circle cx="12" cy="12" r="3"/></svg>
Profile
</span>
</div>
</div>
<div class="table-row" onclick="openProfiler('jira_issuelinks')">
<div class="table-row-left">
<div class="table-row-name">jira_issuelinks</div>
<div class="table-row-desc">Links between issues (blocks, duplicates, relates to). Key fields: issue_key, link_type, direction, linked_issue_key.</div>
</div>
<div class="table-row-right">
<span class="rows-badge">Real-time</span>
<span class="profile-link">
<svg width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M2 12s3-7 10-7 10 7 10 7-3 7-10 7-10-7-10-7z"/><circle cx="12" cy="12" r="3"/></svg>
Profile
</span>
</div>
</div>
<div class="table-row" onclick="openProfiler('jira_remote_links')">
<div class="table-row-left">
<div class="table-row-name">jira_remote_links</div>
<div class="table-row-desc">External links (Confluence, Slack, etc.). Key fields: issue_key, url, title, application_name.</div>
</div>
<div class="table-row-right">
<span class="rows-badge">Real-time</span>
<span class="profile-link">
<svg width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M2 12s3-7 10-7 10 7 10 7-3 7-10 7-10-7-10-7z"/><circle cx="12" cy="12" r="3"/></svg>
Profile
</span>
</div>
</div>
</div>
</div>
<!-- Attachments toggle option -->
<div class="jira-attachment-option">
<div class="jira-attachment-label">
<span>Include attachment files</span>
<span>~4,200 MB+ of images, logs, and documents</span>
</div>
<label class="toggle-switch">
<input type="checkbox" id="toggle-jira_attachments" onchange="updateJiraSubscription()" {{ 'checked' if sync_settings and sync_settings.datasets.jira_attachments }} {{ '' if sync_settings and sync_settings.datasets.jira else 'disabled' }}>
<span class="toggle-slider"></span>
</label>
</div>
</div>
</div>
<!-- ── Card 3: Platform Telemetry ── -->
<div class="source-card" id="telemetryExpertCard">
<div class="source-card-header">
<div class="source-card-left">
<div class="source-card-icon telemetry-expert">
<svg width="22" height="22" viewBox="0 0 24 24" fill="none" stroke="#6B7280" stroke-width="1.8" stroke-linecap="round" stroke-linejoin="round">
<rect x="2" y="3" width="20" height="14" rx="2" ry="2"/>
<line x1="8" y1="21" x2="16" y2="21"/>
<line x1="12" y1="17" x2="12" y2="21"/>
</svg>
</div>
<div class="source-card-info">
<div class="source-card-name">Platform Telemetry</div>
<div class="source-card-desc">Component registry, configurations, and job execution history</div>
<div class="source-card-meta">3 tables &middot; ~100 MB</div>
</div>
</div>
<div class="source-card-right">
<span id="telemetryExpertBadge" class="{{ 'badge-subscribed' if sync_settings and sync_settings.datasets.kbc_telemetry_expert else 'badge-unsubscribed' }}">
{{ 'Subscribed' if sync_settings and sync_settings.datasets.kbc_telemetry_expert else 'Not subscribed' }}
</span>
<label class="toggle-switch">
<input type="checkbox" id="toggle-kbc_telemetry_expert" onchange="updateTelemetryExpertSubscription()" {{ 'checked' if sync_settings and sync_settings.datasets.kbc_telemetry_expert }}>
<span class="toggle-slider"></span>
</label>
</div>
</div>
<!-- Unsubscribed info -->
<div id="telemetryExpertUnsubscribed" class="source-card-unsubscribed" style="{{ 'display:none' if sync_settings and sync_settings.datasets.kbc_telemetry_expert }}">
Enable this data source to get detailed platform telemetry. Includes component definitions, project configurations, and job execution history with timing, credits, and error details.
</div>
<!-- Telemetry Expert accordion (visible only when subscribed) -->
<div id="telemetryExpertAccordion" style="{{ '' if sync_settings and sync_settings.datasets.kbc_telemetry_expert else 'display:none' }}">
<div class="accordion-category">
<button class="accordion-trigger" onclick="toggleAccordion(this)">
<svg class="accordion-chevron" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round">
<polyline points="9 18 15 12 9 6"/>
</svg>
Telemetry Expert
<span class="accordion-count">3 tables</span>
</button>
<div class="accordion-content">
<div class="table-row" onclick="openProfiler('kbc_component')">
<div class="table-row-left">
<div class="table-row-name">kbc_component</div>
<div class="table-row-desc">Master data for platform components - registry of extractors, writers, transformations, and other component types.</div>
</div>
<div class="table-row-right">
<span class="rows-badge">Full refresh</span>
<span class="profile-link">
<svg width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M2 12s3-7 10-7 10 7 10 7-3 7-10 7-10-7-10-7z"/><circle cx="12" cy="12" r="3"/></svg>
Profile
</span>
</div>
</div>
<div class="table-row" onclick="openProfiler('kbc_component_configuration')">
<div class="table-row-left">
<div class="table-row-name">kbc_component_configuration</div>
<div class="table-row-desc">Component configurations within projects - tracks all component setups across projects.</div>
</div>
<div class="table-row-right">
<span class="rows-badge">Incremental</span>
<span class="profile-link">
<svg width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M2 12s3-7 10-7 10 7 10 7-3 7-10 7-10-7-10-7z"/><circle cx="12" cy="12" r="3"/></svg>
Profile
</span>
</div>
</div>
<div class="table-row" onclick="openProfiler('kbc_job')">
<div class="table-row-left">
<div class="table-row-name">kbc_job</div>
<div class="table-row-desc">Job execution history - tracks all component runs with timing, status, credits, and error details. Last 180 days.</div>
</div>
<div class="table-row-right">
<span class="rows-badge">Incremental</span>
<span class="profile-link">
<svg width="12" height="12" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M2 12s3-7 10-7 10 7 10 7-3 7-10 7-10-7-10-7z"/><circle cx="12" cy="12" r="3"/></svg>
Profile
</span>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
@@ -1738,81 +1514,6 @@ function toggleAccordion(trigger) {
}
/* ═══════════════ DATASET SUBSCRIPTIONS ═══════════════ */
function collectAllDatasets() {
const jiraToggle = document.getElementById('toggle-jira');
const attachmentsToggle = document.getElementById('toggle-jira_attachments');
const telemetryExpertToggle = document.getElementById('toggle-kbc_telemetry_expert');
return {
jira: jiraToggle.checked,
jira_attachments: attachmentsToggle.checked,
kbc_telemetry_expert: telemetryExpertToggle.checked
};
}
async function saveAllDatasets() {
try {
await fetch('/api/sync-settings', {
method: 'POST',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify({datasets: collectAllDatasets()})
});
} catch (e) {
console.error('Failed to update settings:', e);
}
}
async function updateJiraSubscription() {
const jiraToggle = document.getElementById('toggle-jira');
const attachmentsToggle = document.getElementById('toggle-jira_attachments');
const jiraBadge = document.getElementById('jiraBadge');
const jiraUnsubscribed = document.getElementById('jiraUnsubscribed');
const jiraAccordion = document.getElementById('jiraAccordion');
// Handle dependency: jira_attachments requires jira
if (!jiraToggle.checked) {
attachmentsToggle.checked = false;
attachmentsToggle.disabled = true;
} else {
attachmentsToggle.disabled = false;
}
// Update UI immediately
if (jiraToggle.checked) {
jiraBadge.className = 'badge-subscribed';
jiraBadge.textContent = 'Subscribed';
jiraUnsubscribed.style.display = 'none';
jiraAccordion.style.display = '';
} else {
jiraBadge.className = 'badge-unsubscribed';
jiraBadge.textContent = 'Not subscribed';
jiraUnsubscribed.style.display = '';
jiraAccordion.style.display = 'none';
}
await saveAllDatasets();
}
async function updateTelemetryExpertSubscription() {
const toggle = document.getElementById('toggle-kbc_telemetry_expert');
const badge = document.getElementById('telemetryExpertBadge');
const unsubscribed = document.getElementById('telemetryExpertUnsubscribed');
const accordion = document.getElementById('telemetryExpertAccordion');
if (toggle.checked) {
badge.className = 'badge-subscribed';
badge.textContent = 'Subscribed';
unsubscribed.style.display = 'none';
accordion.style.display = '';
} else {
badge.className = 'badge-unsubscribed';
badge.textContent = 'Not subscribed';
unsubscribed.style.display = '';
accordion.style.display = 'none';
}
await saveAllDatasets();
}
/* ═══════════════ PROFILER (preserved 1:1) ═══════════════ */
let currentCharts = [];
let currentProfile = null;