agnes-the-ai-analyst/docs/superpowers/plans/2026-04-10-business-metrics.md
ZdenekSrotyr 06ac937f8b docs: add implementation plans for porting internal features
Three independent plans following TDD approach:
1. Business metrics (10 tasks) — schema v4, repository, CLI, API, starter pack, profiler integration
2. Analyst bootstrap (4 tasks) — da analyst setup, CLAUDE.md template, freshness check
3. Metadata writer (4 tasks) — column metadata repo, CLI, API, Keboola push

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 19:08:55 +02:00

1808 lines
56 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Business Metrics Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Add DuckDB-backed business metrics framework with YAML import/export, CLI, API, profiler integration, and a 10-metric starter pack.
**Architecture:** New `metric_definitions` table in system.duckdb (schema v4). Repository pattern matching `table_registry.py`. CLI commands via Typer, API via FastAPI router. Profiler refactored to read metrics from DuckDB instead of YAML.
**Tech Stack:** DuckDB, FastAPI, Typer, PyYAML, pytest
**Spec:** `docs/superpowers/specs/2026-04-10-porting-internal-features-design.md` — Section 1
---
### Task 1: Schema Migration v3→v4
**Files:**
- Modify: `src/db.py:19` (SCHEMA_VERSION), `src/db.py:258-302` (migrations + _ensure_schema)
- Test: `tests/test_db.py`
- [ ] **Step 1: Write failing test for v4 schema**
In `tests/test_db.py`, add:
```python
class TestSchemaV4:
def test_metric_definitions_table_exists(self, tmp_path, monkeypatch):
_setup_data_dir(tmp_path, monkeypatch)
from src.db import get_system_db
conn = get_system_db()
try:
tables = {
row[0]
for row in conn.execute(
"SELECT table_name FROM information_schema.tables WHERE table_schema = 'main'"
).fetchall()
}
assert "metric_definitions" in tables
assert "column_metadata" in tables
finally:
conn.close()
def test_metric_definitions_columns(self, tmp_path, monkeypatch):
_setup_data_dir(tmp_path, monkeypatch)
from src.db import get_system_db
conn = get_system_db()
try:
cols = {
row[0]
for row in conn.execute(
"SELECT column_name FROM information_schema.columns "
"WHERE table_name = 'metric_definitions'"
).fetchall()
}
expected = {
"id", "name", "display_name", "category", "description",
"type", "unit", "grain", "table_name", "tables", "expression",
"time_column", "dimensions", "filters", "synonyms", "notes",
"sql", "sql_variants", "validation", "source",
"created_at", "updated_at",
}
assert expected.issubset(cols), f"Missing: {expected - cols}"
finally:
conn.close()
def test_column_metadata_table_exists(self, tmp_path, monkeypatch):
_setup_data_dir(tmp_path, monkeypatch)
from src.db import get_system_db
conn = get_system_db()
try:
cols = {
row[0]
for row in conn.execute(
"SELECT column_name FROM information_schema.columns "
"WHERE table_name = 'column_metadata'"
).fetchall()
}
expected = {"table_id", "column_name", "basetype", "description", "confidence", "source", "updated_at"}
assert expected.issubset(cols), f"Missing: {expected - cols}"
finally:
conn.close()
def test_v3_to_v4_migration(self, tmp_path, monkeypatch):
"""Simulate a v3 database and verify migration to v4."""
_setup_data_dir(tmp_path, monkeypatch)
import duckdb
from src.db import _SYSTEM_SCHEMA
db_path = tmp_path / "state" / "system.duckdb"
db_path.parent.mkdir(parents=True, exist_ok=True)
conn = duckdb.connect(str(db_path))
conn.execute(_SYSTEM_SCHEMA)
conn.execute("INSERT INTO schema_version (version) VALUES (3)")
conn.close()
from src.db import get_system_db
conn = get_system_db()
try:
version = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0]
assert version == 4
tables = {
row[0]
for row in conn.execute(
"SELECT table_name FROM information_schema.tables WHERE table_schema = 'main'"
).fetchall()
}
assert "metric_definitions" in tables
assert "column_metadata" in tables
finally:
conn.close()
```
- [ ] **Step 2: Run test to verify it fails**
Run: `pytest tests/test_db.py::TestSchemaV4 -v`
Expected: FAIL — `metric_definitions` table does not exist
- [ ] **Step 3: Implement schema v4 migration**
In `src/db.py`, change line 19:
```python
SCHEMA_VERSION = 4
```
After `_V2_TO_V3_MIGRATIONS` (line ~260), add:
```python
_V3_TO_V4_MIGRATIONS = [
"""CREATE TABLE IF NOT EXISTS metric_definitions (
id VARCHAR PRIMARY KEY,
name VARCHAR NOT NULL,
display_name VARCHAR NOT NULL,
category VARCHAR NOT NULL,
description TEXT,
type VARCHAR DEFAULT 'sum',
unit VARCHAR,
grain VARCHAR DEFAULT 'monthly',
table_name VARCHAR,
tables VARCHAR[],
expression VARCHAR,
time_column VARCHAR,
dimensions VARCHAR[],
filters VARCHAR[],
synonyms VARCHAR[],
notes VARCHAR[],
sql TEXT NOT NULL,
sql_variants JSON,
validation JSON,
source VARCHAR DEFAULT 'manual',
created_at TIMESTAMP DEFAULT current_timestamp,
updated_at TIMESTAMP DEFAULT current_timestamp
)""",
"""CREATE TABLE IF NOT EXISTS column_metadata (
table_id VARCHAR NOT NULL,
column_name VARCHAR NOT NULL,
basetype VARCHAR,
description VARCHAR,
confidence VARCHAR DEFAULT 'manual',
source VARCHAR DEFAULT 'manual',
updated_at TIMESTAMP DEFAULT current_timestamp,
PRIMARY KEY (table_id, column_name)
)""",
]
```
In `_ensure_schema()`, after the `if current < 3:` block (line ~298), add:
```python
if current < 4:
for sql in _V3_TO_V4_MIGRATIONS:
conn.execute(sql)
```
Also update the `test_creates_all_tables` test's `expected` set to include `"metric_definitions"` and `"column_metadata"`.
- [ ] **Step 4: Run test to verify it passes**
Run: `pytest tests/test_db.py -v`
Expected: ALL PASS
- [ ] **Step 5: Commit**
```bash
git add src/db.py tests/test_db.py
git commit -m "feat: add schema v4 with metric_definitions and column_metadata tables"
```
---
### Task 2: MetricRepository — CRUD
**Files:**
- Create: `src/repositories/metrics.py`
- Test: `tests/test_metrics.py`
- [ ] **Step 1: Write failing tests for MetricRepository CRUD**
Create `tests/test_metrics.py`:
```python
"""Tests for MetricRepository."""
import os
import pytest
import duckdb
@pytest.fixture
def db_conn(tmp_path, monkeypatch):
monkeypatch.setenv("DATA_DIR", str(tmp_path))
from src.db import get_system_db
conn = get_system_db()
yield conn
conn.close()
SAMPLE_METRIC = {
"id": "revenue/mrr",
"name": "mrr",
"display_name": "Monthly Recurring Revenue",
"category": "revenue",
"description": "Total MRR from all subscriptions",
"type": "sum",
"unit": "USD",
"grain": "monthly",
"table_name": "subscriptions",
"expression": "SUM(mrr_amount)",
"time_column": "billing_date",
"dimensions": ["plan_type", "region"],
"synonyms": ["monthly_revenue", "recurring_revenue"],
"notes": ["Excludes one-time fees"],
"sql": "SELECT DATE_TRUNC('month', billing_date) AS month, SUM(mrr_amount) AS mrr FROM subscriptions GROUP BY 1",
}
class TestMetricRepositoryCreate:
def test_create_metric(self, db_conn):
from src.repositories.metrics import MetricRepository
repo = MetricRepository(db_conn)
result = repo.create(**SAMPLE_METRIC)
assert result["id"] == "revenue/mrr"
assert result["name"] == "mrr"
def test_create_duplicate_upserts(self, db_conn):
from src.repositories.metrics import MetricRepository
repo = MetricRepository(db_conn)
repo.create(**SAMPLE_METRIC)
updated = {**SAMPLE_METRIC, "display_name": "Updated MRR"}
result = repo.create(**updated)
assert result["display_name"] == "Updated MRR"
assert len(repo.list()) == 1
class TestMetricRepositoryRead:
def test_get_existing(self, db_conn):
from src.repositories.metrics import MetricRepository
repo = MetricRepository(db_conn)
repo.create(**SAMPLE_METRIC)
result = repo.get("revenue/mrr")
assert result is not None
assert result["name"] == "mrr"
assert result["dimensions"] == ["plan_type", "region"]
def test_get_missing(self, db_conn):
from src.repositories.metrics import MetricRepository
repo = MetricRepository(db_conn)
assert repo.get("nonexistent/metric") is None
def test_list_all(self, db_conn):
from src.repositories.metrics import MetricRepository
repo = MetricRepository(db_conn)
repo.create(**SAMPLE_METRIC)
repo.create(
id="revenue/arr", name="arr", display_name="ARR",
category="revenue", sql="SELECT 1",
)
results = repo.list()
assert len(results) == 2
def test_list_by_category(self, db_conn):
from src.repositories.metrics import MetricRepository
repo = MetricRepository(db_conn)
repo.create(**SAMPLE_METRIC)
repo.create(
id="ops/resolution_time", name="resolution_time",
display_name="Resolution Time", category="ops",
sql="SELECT 1",
)
results = repo.list(category="revenue")
assert len(results) == 1
assert results[0]["name"] == "mrr"
class TestMetricRepositoryUpdate:
def test_update_fields(self, db_conn):
from src.repositories.metrics import MetricRepository
repo = MetricRepository(db_conn)
repo.create(**SAMPLE_METRIC)
result = repo.update("revenue/mrr", display_name="Gross MRR", unit="EUR")
assert result["display_name"] == "Gross MRR"
assert result["unit"] == "EUR"
assert result["name"] == "mrr" # unchanged
def test_update_missing_returns_none(self, db_conn):
from src.repositories.metrics import MetricRepository
repo = MetricRepository(db_conn)
assert repo.update("nonexistent/x", name="y") is None
class TestMetricRepositoryDelete:
def test_delete_existing(self, db_conn):
from src.repositories.metrics import MetricRepository
repo = MetricRepository(db_conn)
repo.create(**SAMPLE_METRIC)
assert repo.delete("revenue/mrr") is True
assert repo.get("revenue/mrr") is None
def test_delete_missing(self, db_conn):
from src.repositories.metrics import MetricRepository
repo = MetricRepository(db_conn)
assert repo.delete("nonexistent/x") is False
class TestMetricRepositorySearch:
def test_find_by_table(self, db_conn):
from src.repositories.metrics import MetricRepository
repo = MetricRepository(db_conn)
repo.create(**SAMPLE_METRIC)
repo.create(
id="revenue/arr", name="arr", display_name="ARR",
category="revenue", table_name="subscriptions",
sql="SELECT 1",
)
repo.create(
id="ops/tickets", name="tickets", display_name="Tickets",
category="ops", table_name="tickets",
sql="SELECT 1",
)
results = repo.find_by_table("subscriptions")
assert len(results) == 2
names = {r["name"] for r in results}
assert names == {"mrr", "arr"}
def test_find_by_synonym(self, db_conn):
from src.repositories.metrics import MetricRepository
repo = MetricRepository(db_conn)
repo.create(**SAMPLE_METRIC)
results = repo.find_by_synonym("recurring_revenue")
assert len(results) == 1
assert results[0]["name"] == "mrr"
def test_get_table_map(self, db_conn):
from src.repositories.metrics import MetricRepository
repo = MetricRepository(db_conn)
repo.create(**SAMPLE_METRIC)
repo.create(
id="ops/tickets", name="tickets", display_name="Tickets",
category="ops", table_name="tickets",
sql="SELECT 1",
)
table_map = repo.get_table_map()
assert table_map == {"subscriptions": ["mrr"], "tickets": ["tickets"]}
```
- [ ] **Step 2: Run tests to verify they fail**
Run: `pytest tests/test_metrics.py -v`
Expected: FAIL — `ModuleNotFoundError: No module named 'src.repositories.metrics'`
- [ ] **Step 3: Implement MetricRepository**
Create `src/repositories/metrics.py`:
```python
"""Repository for metric definitions."""
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional
import duckdb
class MetricRepository:
def __init__(self, conn: duckdb.DuckDBPyConnection):
self.conn = conn
def create(self, id: str, name: str, display_name: str, category: str,
sql: str, description: Optional[str] = None,
type: str = "sum", unit: Optional[str] = None,
grain: str = "monthly", table_name: Optional[str] = None,
tables: Optional[List[str]] = None, expression: Optional[str] = None,
time_column: Optional[str] = None, dimensions: Optional[List[str]] = None,
filters: Optional[List[str]] = None, synonyms: Optional[List[str]] = None,
notes: Optional[List[str]] = None, sql_variants: Optional[dict] = None,
validation: Optional[dict] = None, source: str = "manual",
**kwargs) -> Dict[str, Any]:
now = datetime.now(timezone.utc)
self.conn.execute(
"""INSERT INTO metric_definitions (
id, name, display_name, category, description, type, unit, grain,
table_name, tables, expression, time_column, dimensions, filters,
synonyms, notes, sql, sql_variants, validation, source,
created_at, updated_at
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT (id) DO UPDATE SET
name = excluded.name, display_name = excluded.display_name,
category = excluded.category, description = excluded.description,
type = excluded.type, unit = excluded.unit, grain = excluded.grain,
table_name = excluded.table_name, tables = excluded.tables,
expression = excluded.expression, time_column = excluded.time_column,
dimensions = excluded.dimensions, filters = excluded.filters,
synonyms = excluded.synonyms, notes = excluded.notes,
sql = excluded.sql, sql_variants = excluded.sql_variants,
validation = excluded.validation, source = excluded.source,
updated_at = excluded.updated_at""",
[id, name, display_name, category, description, type, unit, grain,
table_name, tables, expression, time_column, dimensions, filters,
synonyms, notes, sql,
json_dumps(sql_variants), json_dumps(validation),
source, now, now],
)
return self.get(id)
def get(self, metric_id: str) -> Optional[Dict[str, Any]]:
result = self.conn.execute(
"SELECT * FROM metric_definitions WHERE id = ?", [metric_id]
).fetchone()
if not result:
return None
columns = [desc[0] for desc in self.conn.description]
return dict(zip(columns, result))
def list(self, category: Optional[str] = None) -> List[Dict[str, Any]]:
if category:
results = self.conn.execute(
"SELECT * FROM metric_definitions WHERE category = ? ORDER BY name",
[category],
).fetchall()
else:
results = self.conn.execute(
"SELECT * FROM metric_definitions ORDER BY category, name"
).fetchall()
if not results:
return []
columns = [desc[0] for desc in self.conn.description]
return [dict(zip(columns, row)) for row in results]
def update(self, metric_id: str, **kwargs) -> Optional[Dict[str, Any]]:
existing = self.get(metric_id)
if not existing:
return None
kwargs["updated_at"] = datetime.now(timezone.utc)
# Convert JSON fields
for json_field in ("sql_variants", "validation"):
if json_field in kwargs and kwargs[json_field] is not None:
kwargs[json_field] = json_dumps(kwargs[json_field])
set_clauses = ", ".join(f"{k} = ?" for k in kwargs)
values = list(kwargs.values()) + [metric_id]
self.conn.execute(
f"UPDATE metric_definitions SET {set_clauses} WHERE id = ?",
values,
)
return self.get(metric_id)
def delete(self, metric_id: str) -> bool:
existing = self.get(metric_id)
if not existing:
return False
self.conn.execute("DELETE FROM metric_definitions WHERE id = ?", [metric_id])
return True
def find_by_table(self, table_name: str) -> List[Dict[str, Any]]:
results = self.conn.execute(
"SELECT * FROM metric_definitions WHERE table_name = ? ORDER BY name",
[table_name],
).fetchall()
if not results:
return []
columns = [desc[0] for desc in self.conn.description]
return [dict(zip(columns, row)) for row in results]
def find_by_synonym(self, term: str) -> List[Dict[str, Any]]:
results = self.conn.execute(
"SELECT * FROM metric_definitions WHERE list_contains(synonyms, ?) ORDER BY name",
[term],
).fetchall()
if not results:
return []
columns = [desc[0] for desc in self.conn.description]
return [dict(zip(columns, row)) for row in results]
def get_table_map(self) -> Dict[str, List[str]]:
"""Return {table_name: [metric_name, ...]} for profiler integration."""
results = self.conn.execute(
"SELECT table_name, name FROM metric_definitions WHERE table_name IS NOT NULL ORDER BY table_name"
).fetchall()
table_map: Dict[str, List[str]] = {}
for table_name, metric_name in results:
table_map.setdefault(table_name, []).append(metric_name)
return table_map
def json_dumps(obj) -> Optional[str]:
"""Serialize to JSON string for DuckDB JSON columns, or None."""
if obj is None:
return None
import json
return json.dumps(obj)
```
- [ ] **Step 4: Run tests to verify they pass**
Run: `pytest tests/test_metrics.py -v`
Expected: ALL PASS
- [ ] **Step 5: Commit**
```bash
git add src/repositories/metrics.py tests/test_metrics.py
git commit -m "feat: add MetricRepository with CRUD, search, and table map"
```
---
### Task 3: YAML Import/Export
**Files:**
- Modify: `src/repositories/metrics.py` (add import_from_yaml, export_to_yaml)
- Test: `tests/test_metrics.py` (add import/export tests)
- [ ] **Step 1: Write failing tests for YAML import/export**
Add to `tests/test_metrics.py`:
```python
import yaml
from pathlib import Path
@pytest.fixture
def metrics_dir(tmp_path):
"""Create a sample metrics directory with YAML files."""
revenue_dir = tmp_path / "metrics" / "revenue"
revenue_dir.mkdir(parents=True)
ops_dir = tmp_path / "metrics" / "operations"
ops_dir.mkdir(parents=True)
# total_revenue.yml — uses 'table' key (YAML format)
(revenue_dir / "total_revenue.yml").write_text(yaml.dump({
"name": "total_revenue",
"display_name": "Total Revenue",
"category": "revenue",
"type": "sum",
"unit": "USD",
"grain": "monthly",
"table": "orders",
"expression": "SUM(total_amount)",
"time_column": "order_date",
"dimensions": ["channel", "region"],
"synonyms": ["gross_revenue"],
"sql": "SELECT SUM(total_amount) FROM orders",
"sql_by_channel": "SELECT channel, SUM(total_amount) FROM orders GROUP BY 1",
}))
# resolution_time.yml
(ops_dir / "resolution_time.yml").write_text(yaml.dump({
"name": "resolution_time",
"display_name": "Support Resolution Time",
"category": "operations",
"type": "avg",
"unit": "hours",
"table": "tickets",
"sql": "SELECT AVG(resolution_hours) FROM tickets",
}))
return tmp_path / "metrics"
class TestMetricRepositoryImport:
def test_import_from_directory(self, db_conn, metrics_dir):
from src.repositories.metrics import MetricRepository
repo = MetricRepository(db_conn)
count = repo.import_from_yaml(metrics_dir)
assert count == 2
assert repo.get("revenue/total_revenue") is not None
assert repo.get("operations/resolution_time") is not None
def test_import_maps_table_to_table_name(self, db_conn, metrics_dir):
from src.repositories.metrics import MetricRepository
repo = MetricRepository(db_conn)
repo.import_from_yaml(metrics_dir)
metric = repo.get("revenue/total_revenue")
assert metric["table_name"] == "orders"
def test_import_collects_sql_variants(self, db_conn, metrics_dir):
from src.repositories.metrics import MetricRepository
repo = MetricRepository(db_conn)
repo.import_from_yaml(metrics_dir)
metric = repo.get("revenue/total_revenue")
variants = metric["sql_variants"]
# DuckDB returns JSON as string or dict depending on version
if isinstance(variants, str):
import json
variants = json.loads(variants)
assert "by_channel" in variants
def test_import_single_file(self, db_conn, metrics_dir):
from src.repositories.metrics import MetricRepository
repo = MetricRepository(db_conn)
count = repo.import_from_yaml(metrics_dir / "revenue" / "total_revenue.yml")
assert count == 1
def test_import_idempotent(self, db_conn, metrics_dir):
from src.repositories.metrics import MetricRepository
repo = MetricRepository(db_conn)
repo.import_from_yaml(metrics_dir)
repo.import_from_yaml(metrics_dir)
assert len(repo.list()) == 2
class TestMetricRepositoryExport:
def test_export_to_yaml(self, db_conn, metrics_dir, tmp_path):
from src.repositories.metrics import MetricRepository
repo = MetricRepository(db_conn)
repo.import_from_yaml(metrics_dir)
export_dir = tmp_path / "export"
count = repo.export_to_yaml(export_dir)
assert count == 2
# Check directory structure
assert (export_dir / "revenue" / "total_revenue.yml").exists()
assert (export_dir / "operations" / "resolution_time.yml").exists()
# Check content
data = yaml.safe_load((export_dir / "revenue" / "total_revenue.yml").read_text())
assert data["name"] == "total_revenue"
assert data["table"] == "orders" # exported as 'table', not 'table_name'
```
- [ ] **Step 2: Run tests to verify they fail**
Run: `pytest tests/test_metrics.py::TestMetricRepositoryImport -v`
Expected: FAIL — `AttributeError: 'MetricRepository' object has no attribute 'import_from_yaml'`
- [ ] **Step 3: Implement import_from_yaml and export_to_yaml**
Add to `MetricRepository` class in `src/repositories/metrics.py`:
```python
def import_from_yaml(self, path) -> int:
"""Import metrics from YAML file(s). Returns count imported.
Args:
path: Path to a single YAML file or directory containing category/metric.yml files.
"""
from pathlib import Path
import yaml
path = Path(path)
count = 0
if path.is_file():
yml_files = [path]
elif path.is_dir():
yml_files = sorted(path.glob("*/*.yml"))
else:
return 0
for yml_file in yml_files:
try:
with open(yml_file) as f:
data = yaml.safe_load(f)
except (yaml.YAMLError, OSError):
continue
if not data:
continue
# Handle list-wrapped metrics (internal repo format)
metric_list = data if isinstance(data, list) else [data]
for metric in metric_list:
if not isinstance(metric, dict) or "name" not in metric:
continue
# Infer category from directory name
category = metric.get("category", yml_file.parent.name)
name = metric["name"]
metric_id = f"{category}/{name}"
# Map YAML 'table' → DuckDB 'table_name'
table_name = metric.pop("table", None)
# Collect sql_by_* variants
sql_variants = {}
keys_to_remove = []
for key in list(metric.keys()):
if key.startswith("sql_by_"):
variant_name = key[4:] # "sql_by_channel" → "by_channel"
sql_variants[variant_name] = metric[key]
keys_to_remove.append(key)
for key in keys_to_remove:
del metric[key]
self.create(
id=metric_id,
name=name,
display_name=metric.get("display_name", name),
category=category,
description=metric.get("description"),
type=metric.get("type", "sum"),
unit=metric.get("unit"),
grain=metric.get("grain", "monthly"),
table_name=table_name or metric.get("table_name"),
tables=metric.get("tables"),
expression=metric.get("expression"),
time_column=metric.get("time_column"),
dimensions=metric.get("dimensions"),
filters=metric.get("filters"),
synonyms=metric.get("synonyms"),
notes=metric.get("notes"),
sql=metric.get("sql", ""),
sql_variants=sql_variants if sql_variants else None,
validation=metric.get("validation"),
source="yaml_import",
)
count += 1
return count
def export_to_yaml(self, output_dir) -> int:
"""Export all metrics to YAML files. Returns count exported."""
from pathlib import Path
import yaml
import json
output_dir = Path(output_dir)
count = 0
for metric in self.list():
category = metric["category"]
name = metric["name"]
cat_dir = output_dir / category
cat_dir.mkdir(parents=True, exist_ok=True)
# Build YAML-compatible dict
data = {
"name": name,
"display_name": metric["display_name"],
"category": category,
}
# Map DuckDB 'table_name' back to YAML 'table'
if metric.get("table_name"):
data["table"] = metric["table_name"]
for field in ("description", "type", "unit", "grain", "tables",
"expression", "time_column", "dimensions", "filters",
"synonyms", "notes"):
if metric.get(field) is not None:
data[field] = metric[field]
if metric.get("sql"):
data["sql"] = metric["sql"]
# Expand sql_variants back to sql_by_* keys
variants = metric.get("sql_variants")
if variants:
if isinstance(variants, str):
variants = json.loads(variants)
for key, val in variants.items():
data[f"sql_{key}"] = val
if metric.get("validation"):
val = metric["validation"]
if isinstance(val, str):
val = json.loads(val)
data["validation"] = val
yml_path = cat_dir / f"{name}.yml"
yml_path.write_text(yaml.dump(data, default_flow_style=False, allow_unicode=True, sort_keys=False))
count += 1
return count
```
- [ ] **Step 4: Run tests to verify they pass**
Run: `pytest tests/test_metrics.py -v`
Expected: ALL PASS
- [ ] **Step 5: Commit**
```bash
git add src/repositories/metrics.py tests/test_metrics.py
git commit -m "feat: add YAML import/export to MetricRepository"
```
---
### Task 4: CLI `da metrics`
**Files:**
- Create: `cli/commands/metrics.py`
- Modify: `cli/main.py` (register metrics_app)
- Test: `tests/test_cli.py` (add metrics help test)
- [ ] **Step 1: Write failing test for CLI registration**
Add to `tests/test_cli.py` in `TestCLIHelp`:
```python
def test_metrics_help(self):
result = runner.invoke(app, ["metrics", "--help"])
assert result.exit_code == 0
assert "list" in result.output
assert "show" in result.output
assert "import" in result.output
```
- [ ] **Step 2: Run test to verify it fails**
Run: `pytest tests/test_cli.py::TestCLIHelp::test_metrics_help -v`
Expected: FAIL — `No such command 'metrics'`
- [ ] **Step 3: Implement CLI commands**
Create `cli/commands/metrics.py`:
```python
"""Metrics commands — da metrics."""
import json
from pathlib import Path
import typer
from cli.client import api_get
metrics_app = typer.Typer(help="Business metrics — list, show, import, export, validate")
@metrics_app.command("list")
def list_metrics(
category: str = typer.Option(None, "--category", "-c", help="Filter by category"),
as_json: bool = typer.Option(False, "--json", help="Output as JSON"),
):
"""List all metrics."""
params = {}
if category:
params["category"] = category
resp = api_get("/api/metrics", params=params)
if resp.status_code != 200:
typer.echo(f"Failed: {resp.json().get('detail', resp.text)}", err=True)
raise typer.Exit(1)
data = resp.json()
metrics = data.get("metrics", [])
if as_json:
typer.echo(json.dumps(metrics, indent=2))
else:
if not metrics:
typer.echo("No metrics found. Import with: da metrics import docs/metrics/")
return
current_cat = None
for m in metrics:
if m["category"] != current_cat:
current_cat = m["category"]
typer.echo(f"\n {current_cat.upper()}")
typer.echo(f" {m['id']:40s} {m.get('display_name', m['name'])}")
typer.echo(f"\nTotal: {len(metrics)} metrics")
@metrics_app.command("show")
def show_metric(
metric_id: str = typer.Argument(..., help="Metric ID (e.g., revenue/mrr)"),
as_json: bool = typer.Option(False, "--json", help="Output as JSON"),
):
"""Show detail for a metric."""
resp = api_get(f"/api/metrics/{metric_id}")
if resp.status_code == 404:
typer.echo(f"Metric not found: {metric_id}", err=True)
raise typer.Exit(1)
if resp.status_code != 200:
typer.echo(f"Failed: {resp.json().get('detail', resp.text)}", err=True)
raise typer.Exit(1)
m = resp.json()
if as_json:
typer.echo(json.dumps(m, indent=2))
else:
typer.echo(f" {m['display_name']} ({m['id']})")
typer.echo(f" Type: {m.get('type', 'sum')} | Unit: {m.get('unit', '-')} | Grain: {m.get('grain', '-')}")
if m.get("description"):
typer.echo(f"\n {m['description']}")
if m.get("table_name"):
typer.echo(f"\n Table: {m['table_name']}")
if m.get("dimensions"):
typer.echo(f" Dimensions: {', '.join(m['dimensions'])}")
if m.get("notes"):
typer.echo("\n Notes:")
for note in m["notes"]:
typer.echo(f" - {note}")
if m.get("sql"):
typer.echo(f"\n SQL:\n {m['sql']}")
@metrics_app.command("import")
def import_metrics(
path: str = typer.Argument(..., help="Path to YAML file or directory"),
):
"""Import metrics from YAML into DuckDB."""
from src.db import get_system_db
from src.repositories.metrics import MetricRepository
source = Path(path)
if not source.exists():
typer.echo(f"Path not found: {path}", err=True)
raise typer.Exit(1)
conn = get_system_db()
try:
repo = MetricRepository(conn)
count = repo.import_from_yaml(source)
typer.echo(f"Imported {count} metrics into DuckDB")
finally:
conn.close()
@metrics_app.command("export")
def export_metrics(
output_dir: str = typer.Option("./metrics_export", "--dir", "-d", help="Output directory"),
):
"""Export metrics from DuckDB to YAML files."""
from src.db import get_system_db
from src.repositories.metrics import MetricRepository
conn = get_system_db()
try:
repo = MetricRepository(conn)
count = repo.export_to_yaml(output_dir)
typer.echo(f"Exported {count} metrics to {output_dir}/")
finally:
conn.close()
@metrics_app.command("validate")
def validate_metrics():
"""Validate metrics — check that referenced tables exist in analytics DB."""
from src.db import get_system_db
from src.repositories.metrics import MetricRepository
from src.repositories.table_registry import TableRegistryRepository
conn = get_system_db()
try:
metrics_repo = MetricRepository(conn)
tables_repo = TableRegistryRepository(conn)
all_tables = {t["id"] for t in tables_repo.list_all()}
metrics = metrics_repo.list()
ok = 0
warnings = 0
for m in metrics:
if m.get("table_name") and m["table_name"] not in all_tables:
typer.echo(f" WARN {m['id']}: table '{m['table_name']}' not in registry")
warnings += 1
else:
ok += 1
typer.echo(f"\nValidated {len(metrics)} metrics: {ok} OK, {warnings} warnings")
finally:
conn.close()
```
Register in `cli/main.py` — add import and registration:
```python
from cli.commands.metrics import metrics_app
# ...
app.add_typer(metrics_app, name="metrics")
```
- [ ] **Step 4: Run tests to verify they pass**
Run: `pytest tests/test_cli.py::TestCLIHelp::test_metrics_help -v`
Expected: PASS
- [ ] **Step 5: Commit**
```bash
git add cli/commands/metrics.py cli/main.py tests/test_cli.py
git commit -m "feat: add da metrics CLI commands (list, show, import, export, validate)"
```
---
### Task 5: API Endpoints
**Files:**
- Create: `app/api/metrics.py`
- Modify: `app/main.py` (register router)
- Modify: `app/api/catalog.py` (deprecation redirect)
- Test: `tests/test_api.py` (add metrics API tests)
- [ ] **Step 1: Write failing tests for metrics API**
Add to `tests/test_api.py`:
```python
class TestMetricsAPI:
def test_list_metrics_empty(self, seeded_client):
client, admin_token, _ = seeded_client
resp = client.get("/api/metrics", headers={"Authorization": f"Bearer {admin_token}"})
assert resp.status_code == 200
assert resp.json()["metrics"] == []
def test_create_and_list_metric(self, seeded_client):
client, admin_token, _ = seeded_client
metric = {
"id": "revenue/mrr",
"name": "mrr",
"display_name": "MRR",
"category": "revenue",
"sql": "SELECT SUM(amount) FROM subscriptions",
}
resp = client.post(
"/api/admin/metrics",
json=metric,
headers={"Authorization": f"Bearer {admin_token}"},
)
assert resp.status_code == 201
resp = client.get("/api/metrics", headers={"Authorization": f"Bearer {admin_token}"})
assert resp.status_code == 200
assert len(resp.json()["metrics"]) == 1
def test_get_metric_detail(self, seeded_client):
client, admin_token, _ = seeded_client
client.post(
"/api/admin/metrics",
json={"id": "revenue/mrr", "name": "mrr", "display_name": "MRR",
"category": "revenue", "sql": "SELECT 1"},
headers={"Authorization": f"Bearer {admin_token}"},
)
resp = client.get("/api/metrics/revenue/mrr", headers={"Authorization": f"Bearer {admin_token}"})
assert resp.status_code == 200
assert resp.json()["name"] == "mrr"
def test_get_metric_not_found(self, seeded_client):
client, admin_token, _ = seeded_client
resp = client.get("/api/metrics/nonexistent/x", headers={"Authorization": f"Bearer {admin_token}"})
assert resp.status_code == 404
def test_delete_metric(self, seeded_client):
client, admin_token, _ = seeded_client
client.post(
"/api/admin/metrics",
json={"id": "revenue/mrr", "name": "mrr", "display_name": "MRR",
"category": "revenue", "sql": "SELECT 1"},
headers={"Authorization": f"Bearer {admin_token}"},
)
resp = client.delete("/api/admin/metrics/revenue/mrr",
headers={"Authorization": f"Bearer {admin_token}"})
assert resp.status_code == 200
def test_analyst_cannot_create_metric(self, seeded_client):
client, _, analyst_token = seeded_client
resp = client.post(
"/api/admin/metrics",
json={"id": "revenue/mrr", "name": "mrr", "display_name": "MRR",
"category": "revenue", "sql": "SELECT 1"},
headers={"Authorization": f"Bearer {analyst_token}"},
)
assert resp.status_code == 403
def test_list_metrics_filter_by_category(self, seeded_client):
client, admin_token, _ = seeded_client
for m in [
{"id": "revenue/mrr", "name": "mrr", "display_name": "MRR", "category": "revenue", "sql": "SELECT 1"},
{"id": "ops/tickets", "name": "tickets", "display_name": "Tickets", "category": "ops", "sql": "SELECT 1"},
]:
client.post("/api/admin/metrics", json=m, headers={"Authorization": f"Bearer {admin_token}"})
resp = client.get("/api/metrics?category=revenue", headers={"Authorization": f"Bearer {admin_token}"})
assert len(resp.json()["metrics"]) == 1
```
- [ ] **Step 2: Run tests to verify they fail**
Run: `pytest tests/test_api.py::TestMetricsAPI -v`
Expected: FAIL — 404 on `/api/metrics`
- [ ] **Step 3: Implement API router**
Create `app/api/metrics.py`:
```python
"""Metrics API endpoints."""
from typing import Optional
from fastapi import APIRouter, Depends, HTTPException, Query
from pydantic import BaseModel
import duckdb
from app.auth.dependencies import get_current_user, require_admin, _get_db
from src.repositories.metrics import MetricRepository
router = APIRouter(tags=["metrics"])
class MetricCreate(BaseModel):
id: str
name: str
display_name: str
category: str
sql: str
description: Optional[str] = None
type: str = "sum"
unit: Optional[str] = None
grain: str = "monthly"
table_name: Optional[str] = None
tables: Optional[list] = None
expression: Optional[str] = None
time_column: Optional[str] = None
dimensions: Optional[list] = None
filters: Optional[list] = None
synonyms: Optional[list] = None
notes: Optional[list] = None
sql_variants: Optional[dict] = None
validation: Optional[dict] = None
@router.get("/api/metrics")
async def list_metrics(
category: Optional[str] = Query(None),
user: dict = Depends(get_current_user),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
repo = MetricRepository(conn)
metrics = repo.list(category=category)
return {"metrics": metrics, "count": len(metrics)}
@router.get("/api/metrics/{metric_id:path}")
async def get_metric(
metric_id: str,
user: dict = Depends(get_current_user),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
repo = MetricRepository(conn)
metric = repo.get(metric_id)
if not metric:
raise HTTPException(status_code=404, detail=f"Metric not found: {metric_id}")
return metric
@router.post("/api/admin/metrics", status_code=201)
async def create_metric(
body: MetricCreate,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
repo = MetricRepository(conn)
return repo.create(**body.model_dump())
@router.delete("/api/admin/metrics/{metric_id:path}")
async def delete_metric(
metric_id: str,
user: dict = Depends(require_admin),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
repo = MetricRepository(conn)
if not repo.delete(metric_id):
raise HTTPException(status_code=404, detail=f"Metric not found: {metric_id}")
return {"status": "deleted", "id": metric_id}
```
Register in `app/main.py` — add import and include:
```python
from app.api.metrics import router as metrics_router
# ... (add near other router imports at top of file)
# In create_app(), add before web_router:
app.include_router(metrics_router)
```
Deprecate old endpoint in `app/api/catalog.py` — replace the `get_metric` function:
```python
@router.get("/metrics/{metric_path:path}", deprecated=True)
async def get_metric(
metric_path: str,
user: dict = Depends(get_current_user),
):
"""Deprecated: Use GET /api/metrics/{metric_id} instead."""
from fastapi.responses import RedirectResponse
# Strip .yml extension for the new endpoint
metric_id = metric_path.replace(".yml", "")
return RedirectResponse(url=f"/api/metrics/{metric_id}", status_code=301)
```
- [ ] **Step 4: Run tests to verify they pass**
Run: `pytest tests/test_api.py::TestMetricsAPI -v`
Expected: ALL PASS
- [ ] **Step 5: Commit**
```bash
git add app/api/metrics.py app/main.py app/api/catalog.py tests/test_api.py
git commit -m "feat: add metrics API endpoints with admin CRUD"
```
---
### Task 6: Starter Pack Metrics
**Files:**
- Create: `docs/metrics/metrics.yml` (index)
- Create: `docs/metrics/revenue/mrr.yml`
- Create: `docs/metrics/revenue/arr.yml`
- Create: `docs/metrics/revenue/churn_rate.yml`
- Create: `docs/metrics/product_usage/active_users.yml`
- Create: `docs/metrics/product_usage/feature_adoption.yml`
- Create: `docs/metrics/sales/new_customers.yml`
- Create: `docs/metrics/sales/upsell_expansion.yml`
- Create: `docs/metrics/sales/pipeline_value.yml`
- Create: `docs/metrics/operations/support_resolution_time.yml`
- Create: `docs/metrics/operations/infrastructure_cost.yml`
- Existing: `docs/metrics/revenue/total_revenue.yml` (already exists, no change)
- [ ] **Step 1: Create metrics index**
Create `docs/metrics/metrics.yml`:
```yaml
version: "2.0"
description: "Business metrics starter pack. Import with: da metrics import docs/metrics/"
categories:
- name: revenue
folder: revenue/
metrics: [total_revenue, mrr, arr, churn_rate]
- name: product_usage
folder: product_usage/
metrics: [active_users, feature_adoption]
- name: sales
folder: sales/
metrics: [new_customers, upsell_expansion, pipeline_value]
- name: operations
folder: operations/
metrics: [support_resolution_time, infrastructure_cost]
```
- [ ] **Step 2: Create revenue metrics**
Create `docs/metrics/revenue/mrr.yml`:
```yaml
- name: mrr
display_name: Monthly Recurring Revenue
category: revenue
type: sum
unit: USD
grain: monthly
table: subscriptions
expression: "SUM(mrr_amount)"
time_column: billing_date
dimensions:
- plan_type
- region
notes:
- "Aggregated at company level, not per-contract"
- "Excludes one-time setup fees and overages"
synonyms:
- monthly_revenue
- recurring_revenue
sql: |
SELECT
DATE_TRUNC('month', billing_date) AS month,
SUM(mrr_amount) AS mrr
FROM subscriptions
WHERE status = 'active'
GROUP BY 1
ORDER BY 1
sql_by_plan: |
SELECT
DATE_TRUNC('month', billing_date) AS month,
plan_type,
SUM(mrr_amount) AS mrr
FROM subscriptions
WHERE status = 'active'
GROUP BY 1, 2
ORDER BY 1, 3 DESC
```
Create `docs/metrics/revenue/arr.yml`:
```yaml
- name: arr
display_name: Annual Recurring Revenue
category: revenue
type: sum
unit: USD
grain: monthly
table: subscriptions
expression: "SUM(mrr_amount) * 12"
time_column: billing_date
notes:
- "ARR = MRR × 12 (annualized current MRR)"
- "Point-in-time snapshot, not forward-looking"
synonyms:
- annual_revenue
- annualized_revenue
sql: |
SELECT
DATE_TRUNC('month', billing_date) AS month,
SUM(mrr_amount) * 12 AS arr
FROM subscriptions
WHERE status = 'active'
GROUP BY 1
ORDER BY 1
```
Create `docs/metrics/revenue/churn_rate.yml`:
```yaml
- name: churn_rate
display_name: Monthly Churn Rate
category: revenue
type: ratio
unit: percentage
grain: monthly
table: subscriptions
expression: "churned_mrr / beginning_mrr * 100"
time_column: billing_date
notes:
- "Gross revenue churn — does not net out expansion"
- "Beginning MRR is MRR at start of month"
synonyms:
- revenue_churn
- mrr_churn
sql: |
WITH monthly AS (
SELECT
DATE_TRUNC('month', cancelled_at) AS month,
SUM(mrr_amount) AS churned_mrr
FROM subscriptions
WHERE status = 'cancelled'
GROUP BY 1
),
beginning AS (
SELECT
DATE_TRUNC('month', billing_date) AS month,
SUM(mrr_amount) AS beginning_mrr
FROM subscriptions
WHERE status = 'active'
GROUP BY 1
)
SELECT
b.month,
COALESCE(m.churned_mrr, 0) / NULLIF(b.beginning_mrr, 0) * 100 AS churn_rate
FROM beginning b
LEFT JOIN monthly m ON b.month = m.month
ORDER BY 1
```
- [ ] **Step 3: Create product_usage metrics**
Create `docs/metrics/product_usage/active_users.yml`:
```yaml
- name: active_users
display_name: Monthly Active Users
category: product_usage
type: count
unit: count
grain: monthly
table: user_events
expression: "COUNT(DISTINCT user_id)"
time_column: event_date
dimensions:
- feature
- plan_type
synonyms:
- mau
- monthly_active
sql: |
SELECT
DATE_TRUNC('month', event_date) AS month,
COUNT(DISTINCT user_id) AS active_users
FROM user_events
GROUP BY 1
ORDER BY 1
```
Create `docs/metrics/product_usage/feature_adoption.yml`:
```yaml
- name: feature_adoption
display_name: Feature Adoption Rate
category: product_usage
type: ratio
unit: percentage
grain: monthly
table: user_events
expression: "users_using_feature / total_active_users * 100"
time_column: event_date
dimensions:
- feature
synonyms:
- adoption_rate
- feature_usage
sql: |
WITH feature_users AS (
SELECT
DATE_TRUNC('month', event_date) AS month,
feature,
COUNT(DISTINCT user_id) AS users_using
FROM user_events
GROUP BY 1, 2
),
total AS (
SELECT
DATE_TRUNC('month', event_date) AS month,
COUNT(DISTINCT user_id) AS total_users
FROM user_events
GROUP BY 1
)
SELECT
f.month, f.feature,
f.users_using * 100.0 / NULLIF(t.total_users, 0) AS adoption_pct
FROM feature_users f
JOIN total t ON f.month = t.month
ORDER BY 1, 3 DESC
```
- [ ] **Step 4: Create sales metrics**
Create `docs/metrics/sales/new_customers.yml`:
```yaml
- name: new_customers
display_name: New Customers
category: sales
type: count
unit: count
grain: monthly
table: orders
expression: "COUNT(DISTINCT customer_id)"
time_column: order_date
dimensions:
- channel
- region
synonyms:
- customer_acquisition
- new_logos
sql: |
SELECT
DATE_TRUNC('month', first_order_date) AS month,
COUNT(DISTINCT customer_id) AS new_customers
FROM (
SELECT customer_id, MIN(order_date) AS first_order_date
FROM orders
WHERE status = 'completed'
GROUP BY 1
)
GROUP BY 1
ORDER BY 1
```
Create `docs/metrics/sales/upsell_expansion.yml`:
```yaml
- name: upsell_expansion
display_name: Upsell & Expansion Revenue
category: sales
type: sum
unit: USD
grain: monthly
table: subscriptions
expression: "SUM(CASE WHEN change_type IN ('upgrade','expansion') THEN delta_mrr END)"
time_column: change_date
synonyms:
- expansion_revenue
- upsell
sql: |
SELECT
DATE_TRUNC('month', change_date) AS month,
SUM(delta_mrr) AS expansion_mrr
FROM subscription_changes
WHERE change_type IN ('upgrade', 'expansion')
GROUP BY 1
ORDER BY 1
```
Create `docs/metrics/sales/pipeline_value.yml`:
```yaml
- name: pipeline_value
display_name: Pipeline Value
category: sales
type: sum
unit: USD
grain: monthly
table: opportunities
expression: "SUM(deal_value * probability / 100)"
time_column: expected_close_date
dimensions:
- stage
- owner
synonyms:
- weighted_pipeline
- deal_pipeline
sql: |
SELECT
DATE_TRUNC('month', expected_close_date) AS month,
SUM(deal_value * probability / 100) AS weighted_pipeline
FROM opportunities
WHERE stage NOT IN ('closed_won', 'closed_lost')
GROUP BY 1
ORDER BY 1
```
- [ ] **Step 5: Create operations metrics**
Create `docs/metrics/operations/support_resolution_time.yml`:
```yaml
- name: support_resolution_time
display_name: Support Resolution Time
category: operations
type: avg
unit: hours
grain: monthly
table: tickets
expression: "AVG(resolution_hours)"
time_column: created_at
dimensions:
- priority
- category
synonyms:
- resolution_time
- mttr
- mean_time_to_resolve
sql: |
SELECT
DATE_TRUNC('month', created_at) AS month,
AVG(resolution_hours) AS avg_resolution_hours,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY resolution_hours) AS median_hours
FROM tickets
WHERE status = 'resolved'
GROUP BY 1
ORDER BY 1
```
Create `docs/metrics/operations/infrastructure_cost.yml`:
```yaml
- name: infrastructure_cost
display_name: Infrastructure Cost
category: operations
type: sum
unit: USD
grain: monthly
table: infra_costs
expression: "SUM(cost_usd)"
time_column: billing_month
dimensions:
- provider
- service
- environment
synonyms:
- cloud_cost
- hosting_cost
- infra_spend
sql: |
SELECT
billing_month AS month,
SUM(cost_usd) AS total_cost
FROM infra_costs
GROUP BY 1
ORDER BY 1
sql_by_provider: |
SELECT
billing_month AS month,
provider,
SUM(cost_usd) AS cost
FROM infra_costs
GROUP BY 1, 2
ORDER BY 1, 3 DESC
```
- [ ] **Step 6: Write import test for starter pack**
Add to `tests/test_metrics.py`:
```python
class TestStarterPack:
def test_import_starter_pack(self, db_conn):
from src.repositories.metrics import MetricRepository
repo = MetricRepository(db_conn)
starter_dir = Path(__file__).parent.parent / "docs" / "metrics"
if not starter_dir.exists():
pytest.skip("Starter pack not found")
count = repo.import_from_yaml(starter_dir)
assert count >= 11 # 11 metrics (total_revenue + 10 new)
assert repo.get("revenue/total_revenue") is not None
assert repo.get("revenue/mrr") is not None
assert repo.get("operations/infrastructure_cost") is not None
```
- [ ] **Step 7: Run tests**
Run: `pytest tests/test_metrics.py::TestStarterPack -v`
Expected: PASS
- [ ] **Step 8: Commit**
```bash
git add docs/metrics/ tests/test_metrics.py
git commit -m "feat: add 10 starter pack metrics (revenue, usage, sales, operations)"
```
---
### Task 7: Profiler Integration
**Files:**
- Modify: `src/profiler.py:1154,1248` (replace load_metrics calls)
- Test: manual verification (profiler is integration-heavy)
- [ ] **Step 1: Add get_table_map fallback to profiler**
In `src/profiler.py`, find the two call sites where `load_metrics()` is called (lines ~1154 and ~1248). In each location, replace:
```python
metrics_map = load_metrics(METRICS_YML_PATH)
```
with:
```python
# Try DuckDB-backed metrics first, fall back to YAML scan
metrics_map = _load_metrics_from_db()
if not metrics_map:
metrics_map = load_metrics(METRICS_YML_PATH)
```
Add this helper function near the top of the file (after the imports):
```python
def _load_metrics_from_db() -> Dict[str, List[str]]:
"""Load metrics table map from DuckDB. Returns empty dict on failure."""
try:
from src.db import get_system_db
from src.repositories.metrics import MetricRepository
conn = get_system_db()
repo = MetricRepository(conn)
table_map = repo.get_table_map()
conn.close()
return table_map
except Exception as exc:
logger.debug("Could not load metrics from DuckDB: %s", exc)
return {}
```
- [ ] **Step 2: Run existing profiler tests**
Run: `pytest tests/ -k profiler -v`
Expected: ALL PASS (existing tests should not break)
- [ ] **Step 3: Commit**
```bash
git add src/profiler.py
git commit -m "feat: profiler reads metrics from DuckDB with YAML fallback"
```
---
### Task 8: CLAUDE.md Update
**Files:**
- Modify: `CLAUDE.md`
- [ ] **Step 1: Add metrics workflow section to CLAUDE.md**
After the "## Development" section, add:
```markdown
## Business Metrics
Standardized metric definitions live in DuckDB (`metric_definitions` table). Import starter pack:
```bash
da metrics import docs/metrics/
```
### For AI agents analyzing data:
Before computing any business metric, look up the canonical definition:
1. `da metrics list` — find the relevant metric
2. `da metrics show revenue/mrr` — read the SQL and business rules
3. Use the SQL from the metric definition, adapt to the specific question
Never invent metric calculations — always use the canonical definitions.
```
- [ ] **Step 2: Commit**
```bash
git add CLAUDE.md
git commit -m "docs: add metrics workflow instructions to CLAUDE.md"
```
---
### Task 9: Migration Script
**Files:**
- Create: `scripts/migrate_metrics_to_duckdb.py`
- [ ] **Step 1: Create standalone migration script**
Create `scripts/migrate_metrics_to_duckdb.py`:
```python
"""Migrate metric YAML files to DuckDB metric_definitions table.
Usage:
python scripts/migrate_metrics_to_duckdb.py [--metrics-dir docs/metrics]
Idempotent — safe to run repeatedly. Uses UPSERT (ON CONFLICT DO UPDATE).
"""
import argparse
import logging
import sys
from pathlib import Path
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger(__name__)
def main():
parser = argparse.ArgumentParser(description="Migrate metric YAMLs to DuckDB")
parser.add_argument("--metrics-dir", default="docs/metrics", help="Path to metrics directory")
args = parser.parse_args()
metrics_dir = Path(args.metrics_dir)
if not metrics_dir.is_dir():
logger.error("Metrics directory not found: %s", metrics_dir)
sys.exit(1)
from src.db import get_system_db
from src.repositories.metrics import MetricRepository
conn = get_system_db()
try:
repo = MetricRepository(conn)
count = repo.import_from_yaml(metrics_dir)
logger.info("Imported %d metrics from %s", count, metrics_dir)
finally:
conn.close()
if __name__ == "__main__":
main()
```
- [ ] **Step 2: Run it manually to verify**
Run: `python scripts/migrate_metrics_to_duckdb.py --metrics-dir docs/metrics`
Expected: `Imported N metrics from docs/metrics`
- [ ] **Step 3: Commit**
```bash
git add scripts/migrate_metrics_to_duckdb.py
git commit -m "feat: add standalone metric YAML → DuckDB migration script"
```
---
### Task 10: Final Integration Test
- [ ] **Step 1: Run full test suite**
Run: `pytest tests/ -v --timeout=60`
Expected: ALL PASS (no regressions)
- [ ] **Step 2: Run OpenAPI snapshot test (if exists)**
Run: `pytest tests/ -k openapi -v`
Expected: May need snapshot update if endpoint list changed. If it fails, regenerate with `python scripts/generate_openapi.py`
- [ ] **Step 3: Final commit if any fixes needed**
```bash
git add -A
git commit -m "fix: address integration test issues"
```