docs: add implementation plans for porting internal features
Three independent plans following TDD approach:

1. Business metrics (10 tasks) — schema v4, repository, CLI, API, starter pack, profiler integration
2. Analyst bootstrap (4 tasks) — da analyst setup, CLAUDE.md template, freshness check
3. Metadata writer (4 tasks) — column metadata repo, CLI, API, Keboola push

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parent c57e195932
commit 06ac937f8b
3 changed files with 2977 additions and 0 deletions
602
docs/superpowers/plans/2026-04-10-analyst-bootstrap.md
Normal file
@@ -0,0 +1,602 @@
# Analyst Bootstrap Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add a `da analyst setup` command that onboards analysts to a remote Agnes instance: it connects, downloads data, initializes a local DuckDB database, and generates CLAUDE.md.

**Architecture:** New top-level Typer command `da analyst` with `setup` subcommand. Uses existing `cli/client.py` HTTP helpers. Generates CLAUDE.md from template with instance-specific placeholders.

**Tech Stack:** Typer, httpx (via cli/client.py), DuckDB, Rich (progress bars), `{placeholder}` string substitution (plain `str.replace`, no template engine)

**Spec:** `docs/superpowers/specs/2026-04-10-porting-internal-features-design.md` — Section 2

**Depends on:** Business Metrics plan (Task 5: the metrics API must exist before setup Step 4 can download metrics)

---

### Task 1: CLAUDE.md Template

**Files:**
- Create: `config/claude_md_template.txt`

- [ ] **Step 1: Create the template file**

Create `config/claude_md_template.txt`:

```
# {instance_name} — AI Data Analyst

This workspace is connected to {server_url}.

## Rules
- Before computing any business metric: run `da metrics show <category>/<name>`
- For current schema: read `data/metadata/schema.json`
- Do not use DESCRIBE/SHOW COLUMNS — read metadata files instead
- Save work output to `user/artifacts/`
- Sync data regularly with `da sync`

## Metrics Workflow
1. `da metrics list` — find the relevant metric
2. `da metrics show revenue/mrr` — read SQL and business rules
3. Use the canonical SQL from the metric definition, adapt to the question
4. Never invent metric calculations — always check existing definitions first

## Data Sync
- `da sync` — download current data from server
- `da sync --docs-only` — just metadata and metrics (fast refresh)
- `da sync --upload-only` — upload sessions and local notes to server
- Data on the server refreshes every {sync_interval}

## Directory Structure
- `data/` — read-only data downloaded from server
- `data/parquet/` — table data in Parquet format
- `data/duckdb/` — local analytics DuckDB database
- `data/metadata/` — profiles, schema, metrics cache
- `user/` — your workspace (persistent across syncs)
- `user/artifacts/` — analysis outputs, reports, charts
- `user/sessions/` — Claude Code session logs
- `.claude/CLAUDE.local.md` — your personal notes (never overwritten, uploaded on sync)
```
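
The template above uses single-brace tokens such as `{instance_name}`. A minimal sketch of how such a template could be filled, assuming plain `str.replace` substitution; `render_template` and `PLACEHOLDERS` are hypothetical names used only for illustration and are not part of the plan:

```python
# Hypothetical helper: fill {placeholder} tokens with plain str.replace.
PLACEHOLDERS = ("instance_name", "server_url", "sync_interval")


def render_template(template: str, values: dict) -> str:
    """Replace each known {key} token; any other braces are left as-is."""
    missing = [key for key in PLACEHOLDERS if key not in values]
    if missing:
        raise KeyError(f"missing template values: {missing}")
    content = template
    for key in PLACEHOLDERS:
        content = content.replace("{" + key + "}", str(values[key]))
    return content


sample = "# {instance_name}\nConnected to {server_url}, refresh every {sync_interval}. Literal: {}"
rendered = render_template(
    sample,
    {"instance_name": "Acme", "server_url": "https://data.acme.com", "sync_interval": "15 minutes"},
)
print(rendered)
```

Replacing only the known keys means stray braces in the template (for example in a future JSON or shell snippet) pass through untouched, whereas `str.format` would raise on them.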

- [ ] **Step 2: Commit**

```bash
git add config/claude_md_template.txt
git commit -m "feat: add CLAUDE.md template for analyst bootstrap"
```

---

### Task 2: `da analyst setup` — Core Command

**Files:**
- Create: `cli/commands/analyst.py`
- Modify: `cli/main.py` (register analyst_app)
- Test: `tests/test_cli.py` (help test)
- Test: `tests/test_analyst_bootstrap.py`

- [ ] **Step 1: Write failing tests**

Add to `tests/test_cli.py` in `TestCLIHelp`:

```python
    def test_analyst_help(self):
        result = runner.invoke(app, ["analyst", "--help"])
        assert result.exit_code == 0
        assert "setup" in result.output
```

Create `tests/test_analyst_bootstrap.py`:

```python
"""Tests for analyst bootstrap flow."""

import json
import os
from pathlib import Path
from unittest.mock import patch, MagicMock

import pytest
from typer.testing import CliRunner
from cli.main import app

runner = CliRunner()


@pytest.fixture(autouse=True)
def tmp_workspace(tmp_path, monkeypatch):
    monkeypatch.setenv("DATA_DIR", str(tmp_path / "data"))
    monkeypatch.setenv("DA_CONFIG_DIR", str(tmp_path / "config"))
    monkeypatch.setenv("DA_LOCAL_DIR", str(tmp_path / "local"))
    monkeypatch.setenv("JWT_SECRET_KEY", "test-secret")
    (tmp_path / "data").mkdir()
    (tmp_path / "config").mkdir()
    (tmp_path / "local").mkdir()
    # Create the workspace before chdir; chdir into a missing directory raises.
    (tmp_path / "workspace").mkdir()
    monkeypatch.chdir(tmp_path / "workspace")
    yield tmp_path / "workspace"


class TestDetectExistingProject:
    def test_detects_existing_claude_md(self, tmp_workspace):
        (tmp_workspace / "CLAUDE.md").write_text("# Acme — AI Data Analyst\n")
        result = runner.invoke(app, ["analyst", "setup", "--server-url", "http://localhost:8000"])
        # Both must hold; an "or" here would pass even without detection.
        assert result.exit_code == 0
        assert "already set up" in result.output.lower()

    def test_no_detection_with_force(self, tmp_workspace):
        (tmp_workspace / "CLAUDE.md").write_text("# Acme — AI Data Analyst\n")
        with patch("cli.commands.analyst._connect_to_instance") as mock_connect:
            mock_connect.side_effect = SystemExit(1)  # Will fail at connect step
            result = runner.invoke(app, ["analyst", "setup", "--force",
                                         "--server-url", "http://localhost:8000"])
            # Should have passed detection and attempted connect
            mock_connect.assert_called_once()


class TestCreateWorkspace:
    def test_creates_directory_structure(self, tmp_workspace):
        from cli.commands.analyst import _create_workspace
        _create_workspace(tmp_workspace)
        assert (tmp_workspace / "data" / "parquet").is_dir()
        assert (tmp_workspace / "data" / "duckdb").is_dir()
        assert (tmp_workspace / "data" / "metadata").is_dir()
        assert (tmp_workspace / "user" / "artifacts").is_dir()
        assert (tmp_workspace / "user" / "sessions").is_dir()
        assert (tmp_workspace / ".claude").is_dir()


class TestGenerateClaudeMd:
    def test_generates_from_template(self, tmp_workspace):
        from cli.commands.analyst import _generate_claude_md
        _generate_claude_md(
            workspace=tmp_workspace,
            instance_name="Acme Analytics",
            server_url="https://data.acme.com",
            sync_interval="15 minutes",
        )
        claude_md = tmp_workspace / "CLAUDE.md"
        assert claude_md.exists()
        content = claude_md.read_text()
        assert "Acme Analytics" in content
        assert "https://data.acme.com" in content
        assert "15 minutes" in content

    def test_creates_claude_local_md(self, tmp_workspace):
        from cli.commands.analyst import _generate_claude_md
        (tmp_workspace / ".claude").mkdir(parents=True, exist_ok=True)
        _generate_claude_md(
            workspace=tmp_workspace,
            instance_name="Test",
            server_url="http://localhost",
            sync_interval="1 hour",
        )
        assert (tmp_workspace / ".claude" / "CLAUDE.local.md").exists()

    def test_does_not_overwrite_existing_local_md(self, tmp_workspace):
        (tmp_workspace / ".claude").mkdir(parents=True, exist_ok=True)
        local_md = tmp_workspace / ".claude" / "CLAUDE.local.md"
        local_md.write_text("my notes")
        from cli.commands.analyst import _generate_claude_md
        _generate_claude_md(
            workspace=tmp_workspace,
            instance_name="Test",
            server_url="http://localhost",
            sync_interval="1 hour",
        )
        assert local_md.read_text() == "my notes"
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `pytest tests/test_analyst_bootstrap.py -v`
Expected: FAIL — `No such command 'analyst'`

- [ ] **Step 3: Implement `cli/commands/analyst.py`**

Create `cli/commands/analyst.py`:

```python
"""Analyst commands — da analyst."""

import json
import logging
from pathlib import Path
from typing import Optional

import typer

logger = logging.getLogger(__name__)

analyst_app = typer.Typer(help="Analyst workspace — setup, connect to a remote instance")

AGNES_IDENTIFIER = "AI Data Analyst"


@analyst_app.command("setup")
def setup(
    server_url: Optional[str] = typer.Option(None, "--server-url", "-s", help="Agnes instance URL"),
    force: bool = typer.Option(False, "--force", help="Re-run from scratch, clean partial state"),
):
    """Set up a local analyst workspace connected to a remote Agnes instance."""
    workspace = Path.cwd()

    # Step 1: Detect existing project
    if not force:
        claude_md = workspace / "CLAUDE.md"
        if claude_md.exists() and AGNES_IDENTIFIER in claude_md.read_text():
            typer.echo("Project already set up. Use 'da sync' to refresh data, or --force to re-setup.")
            return

    # Step 2: Connect
    if not server_url:
        server_url = typer.prompt("Agnes instance URL (e.g., https://data.acme.com)")
    # Drop a trailing slash so f"{server_url}/api/..." never yields "//api/..."
    server_url = server_url.rstrip("/")

    token = _connect_to_instance(server_url)

    # Step 3: Create workspace
    _create_workspace(workspace)

    # Step 4: Download schema and metrics
    _download_metadata(workspace, server_url, token)

    # Step 5: Download data
    table_count = _download_data(workspace, server_url, token)

    # Step 6: Initialize DuckDB
    row_count = _initialize_duckdb(workspace)

    # Step 7: Generate CLAUDE.md
    instance_name = _get_instance_name(server_url, token)
    _generate_claude_md(
        workspace=workspace,
        instance_name=instance_name,
        server_url=server_url,
        sync_interval="15 minutes",
    )

    # Step 8: Verify
    typer.echo(f"\nSetup complete. {table_count} tables, {row_count} total rows.")
    typer.echo("Start analyzing with Claude Code, or run 'da sync' to refresh data.")


def _connect_to_instance(server_url: str) -> str:
    """Connect to Agnes instance, authenticate, return JWT token."""
    import httpx

    # Health check
    try:
        resp = httpx.get(f"{server_url}/api/health", timeout=10)
        resp.raise_for_status()
    except Exception as e:
        typer.echo(f"Cannot reach {server_url}: {e}", err=True)
        raise typer.Exit(1)

    # Authenticate
    email = typer.prompt("Email")
    password = typer.prompt("Password", hide_input=True)

    try:
        resp = httpx.post(
            f"{server_url}/auth/token",
            data={"username": email, "password": password},
            timeout=10,
        )
        resp.raise_for_status()
        token = resp.json().get("access_token")
        if not token:
            typer.echo("Authentication failed: no token in response", err=True)
            raise typer.Exit(1)
    except httpx.HTTPStatusError as e:
        typer.echo(f"Authentication failed: {e.response.text}", err=True)
        raise typer.Exit(1)
    except httpx.RequestError as e:
        # Network-level failure (timeout, dropped connection) during login
        typer.echo(f"Authentication failed: {e}", err=True)
        raise typer.Exit(1)

    # Save credentials
    from cli.config import save_config
    save_config({"server_url": server_url, "token": token})
    typer.echo(f"Connected to {server_url}")
    return token


def _create_workspace(workspace: Path) -> None:
    """Create analyst directory structure."""
    dirs = [
        workspace / "data" / "parquet",
        workspace / "data" / "duckdb",
        workspace / "data" / "metadata",
        workspace / "user" / "artifacts",
        workspace / "user" / "sessions",
        workspace / ".claude",
    ]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)


def _download_metadata(workspace: Path, server_url: str, token: str) -> None:
    """Download table list and metrics to local cache."""
    import httpx

    headers = {"Authorization": f"Bearer {token}"}
    metadata_dir = workspace / "data" / "metadata"

    # Tables
    try:
        resp = httpx.get(f"{server_url}/api/catalog/tables", headers=headers, timeout=30)
        resp.raise_for_status()
        (metadata_dir / "tables.json").write_text(json.dumps(resp.json(), indent=2))
        typer.echo(f"Downloaded table catalog ({resp.json().get('count', '?')} tables)")
    except Exception as e:
        typer.echo(f"Warning: could not download table catalog: {e}", err=True)

    # Metrics
    try:
        resp = httpx.get(f"{server_url}/api/metrics", headers=headers, timeout=30)
        resp.raise_for_status()
        (metadata_dir / "metrics.json").write_text(json.dumps(resp.json(), indent=2))
        typer.echo(f"Downloaded metrics ({resp.json().get('count', '?')} metrics)")
    except Exception as e:
        typer.echo(f"Warning: could not download metrics: {e}", err=True)


def _download_data(workspace: Path, server_url: str, token: str) -> int:
    """Download parquet files for all accessible tables. Returns count."""
    import httpx

    metadata_dir = workspace / "data" / "metadata"
    parquet_dir = workspace / "data" / "parquet"

    tables_file = metadata_dir / "tables.json"
    if not tables_file.exists():
        return 0

    tables_data = json.loads(tables_file.read_text())
    tables = tables_data.get("tables", [])
    count = 0

    for table in tables:
        tid = table["id"]
        target = parquet_dir / f"{tid}.parquet"

        # Resume: skip if already downloaded
        if target.exists() and target.stat().st_size > 0:
            count += 1
            continue

        try:
            with httpx.Client(base_url=server_url, headers={"Authorization": f"Bearer {token}"},
                              timeout=300) as client:
                with client.stream("GET", f"/api/data/{tid}/download") as resp:
                    if resp.status_code == 404:
                        continue
                    resp.raise_for_status()
                    target.parent.mkdir(parents=True, exist_ok=True)
                    with open(target, "wb") as f:
                        for chunk in resp.iter_bytes(65536):
                            f.write(chunk)
            count += 1
            typer.echo(f"  Downloaded {tid}")
        except Exception as e:
            # Remove any partial file so the resume check does not treat it as complete
            target.unlink(missing_ok=True)
            typer.echo(f"  Failed {tid}: {e}", err=True)

    typer.echo(f"Downloaded {count}/{len(tables)} tables")
    return count


def _initialize_duckdb(workspace: Path) -> int:
    """Create local analytics.duckdb with views over downloaded parquets. Returns total rows."""
    import duckdb

    parquet_dir = workspace / "data" / "parquet"
    db_path = workspace / "data" / "duckdb" / "analytics.duckdb"
    db_path.parent.mkdir(parents=True, exist_ok=True)

    conn = duckdb.connect(str(db_path))
    total_rows = 0
    view_count = 0

    try:
        for pq in sorted(parquet_dir.glob("*.parquet")):
            # Escape quotes so unusual table ids or paths cannot break the SQL
            view_name = pq.stem.replace('"', '""')
            pq_path = str(pq).replace("'", "''")
            try:
                conn.execute(f"CREATE OR REPLACE VIEW \"{view_name}\" AS SELECT * FROM read_parquet('{pq_path}')")
                row_count = conn.execute(f"SELECT count(*) FROM \"{view_name}\"").fetchone()[0]
                total_rows += row_count
                view_count += 1
            except Exception as e:
                logger.warning("Could not create view for %s: %s", pq.name, e)
    finally:
        conn.close()

    # Report views actually created, not just parquet files found
    typer.echo(f"Initialized DuckDB with {view_count} views")
    return total_rows


def _get_instance_name(server_url: str, token: str) -> str:
    """Get instance name from server, fallback to URL hostname."""
    import httpx

    fallback = server_url.split("//")[-1].split("/")[0]
    try:
        resp = httpx.get(f"{server_url}/api/health", headers={"Authorization": f"Bearer {token}"}, timeout=10)
        data = resp.json()
        return data.get("instance_name", fallback)
    except Exception:
        return fallback


def _generate_claude_md(
    workspace: Path,
    instance_name: str,
    server_url: str,
    sync_interval: str,
) -> None:
    """Generate CLAUDE.md from template."""
    template_path = Path(__file__).parent.parent.parent / "config" / "claude_md_template.txt"
    if template_path.exists():
        # The template contains non-ASCII characters, so pin the encoding
        template = template_path.read_text(encoding="utf-8")
    else:
        # Inline fallback
        template = "# {instance_name} — AI Data Analyst\n\nConnected to {server_url}.\n"

    content = template.replace("{instance_name}", instance_name)
    content = content.replace("{server_url}", server_url)
    content = content.replace("{sync_interval}", sync_interval)

    (workspace / "CLAUDE.md").write_text(content, encoding="utf-8")

    # Create CLAUDE.local.md if it doesn't exist
    local_md = workspace / ".claude" / "CLAUDE.local.md"
    local_md.parent.mkdir(parents=True, exist_ok=True)
    if not local_md.exists():
        local_md.write_text("# Personal Notes\n\nAdd your learnings and insights here.\n", encoding="utf-8")
```
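
The resume check in `_download_data` treats any non-empty file as complete, so an interrupted download should never leave a partial file at the final path. One way to guarantee that is to stream into a temporary file and rename it into place only on success. This is a sketch under that assumption; `write_stream_atomically` is a hypothetical helper, not something the plan defines, and it takes any iterable of byte chunks so it can be exercised without a server:

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def write_stream_atomically(chunks: Iterable[bytes], target: Path) -> int:
    """Write chunks to a temp file next to target, then rename into place.

    Returns the number of bytes written. If iteration fails midway, the
    temp file is removed and nothing appears at the target path.
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".part")
    written = 0
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
                written += len(chunk)
        # Rename only after the file is fully written and closed
        os.replace(tmp_name, target)
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)
        raise
    return written
```

In the download loop this could wrap `resp.iter_bytes(65536)`, so a dropped connection leaves no `<table>.parquet` behind and the next run retries that table.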

Register in `cli/main.py`:

```python
from cli.commands.analyst import analyst_app
# ...
app.add_typer(analyst_app, name="analyst")
```

- [ ] **Step 4: Run tests to verify they pass**

Run: `pytest tests/test_analyst_bootstrap.py -v && pytest tests/test_cli.py::TestCLIHelp::test_analyst_help -v`
Expected: ALL PASS

- [ ] **Step 5: Commit**

```bash
git add cli/commands/analyst.py cli/main.py config/claude_md_template.txt tests/test_analyst_bootstrap.py tests/test_cli.py
git commit -m "feat: add da analyst setup command with bootstrap flow"
```

---

### Task 3: Returning-Session Detection

**Files:**
- Modify: `cli/commands/analyst.py` (add `da analyst status` command)
- Test: `tests/test_analyst_bootstrap.py`

- [ ] **Step 1: Write failing test**

Add to `tests/test_analyst_bootstrap.py`:

```python
from datetime import datetime, timedelta, timezone


class TestReturningSession:
    def test_stale_data_warning(self, tmp_workspace):
        from cli.commands.analyst import _check_data_freshness
        metadata_dir = tmp_workspace / "data" / "metadata"
        metadata_dir.mkdir(parents=True, exist_ok=True)
        # Write last_sync.json with old timestamp
        old_time = (datetime.now(timezone.utc) - timedelta(hours=25)).isoformat()
        (metadata_dir / "last_sync.json").write_text(json.dumps({"last_sync": old_time}))
        result = _check_data_freshness(tmp_workspace)
        assert result == "stale"

    def test_fresh_data_ok(self, tmp_workspace):
        from cli.commands.analyst import _check_data_freshness
        metadata_dir = tmp_workspace / "data" / "metadata"
        metadata_dir.mkdir(parents=True, exist_ok=True)
        now = datetime.now(timezone.utc).isoformat()
        (metadata_dir / "last_sync.json").write_text(json.dumps({"last_sync": now}))
        result = _check_data_freshness(tmp_workspace)
        assert result == "fresh"

    def test_no_data_returns_missing(self, tmp_workspace):
        from cli.commands.analyst import _check_data_freshness
        result = _check_data_freshness(tmp_workspace)
        assert result == "missing"
```

- [ ] **Step 2: Run test to verify it fails**

Run: `pytest tests/test_analyst_bootstrap.py::TestReturningSession -v`
Expected: FAIL — `_check_data_freshness` not found

- [ ] **Step 3: Implement freshness check and status command**

Add to `cli/commands/analyst.py`:

```python
@analyst_app.command("status")
def status():
    """Check workspace status and data freshness."""
    workspace = Path.cwd()

    claude_md = workspace / "CLAUDE.md"
    if not claude_md.exists() or AGNES_IDENTIFIER not in claude_md.read_text():
        typer.echo("No analyst workspace detected. Run 'da analyst setup' first.")
        raise typer.Exit(1)

    freshness = _check_data_freshness(workspace)
    if freshness == "stale":
        typer.echo("Data is stale (>24h old). Run 'da sync' to refresh.")
    elif freshness == "missing":
        # CLAUDE.md already exists here, so plain 'setup' would exit early; --force is required
        typer.echo("No data found. Run 'da analyst setup --force' to download data.")
    else:
        typer.echo("Data is fresh.")


def _check_data_freshness(workspace: Path) -> str:
    """Check data freshness. Returns 'fresh', 'stale', or 'missing'."""
    last_sync_file = workspace / "data" / "metadata" / "last_sync.json"
    if not last_sync_file.exists():
        return "missing"

    try:
        data = json.loads(last_sync_file.read_text())
        last_sync_str = data.get("last_sync")
        if not last_sync_str:
            return "missing"

        from datetime import datetime, timezone, timedelta
        last_sync = datetime.fromisoformat(last_sync_str)
        if last_sync.tzinfo is None:
            # Treat naive timestamps as UTC
            last_sync = last_sync.replace(tzinfo=timezone.utc)
        age = datetime.now(timezone.utc) - last_sync
        if age > timedelta(hours=24):
            return "stale"
        return "fresh"
    except Exception:
        # Unreadable or malformed sync record counts as no data
        return "missing"
```
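
The freshness rule itself is a pure function of a timestamp and the current time. Purely as an illustration, it can be factored so the clock and threshold are passed in, which makes the 24-hour boundary testable without writing files. `classify_freshness` and its `max_age` parameter are assumptions of this sketch, not names from the plan:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def classify_freshness(
    last_sync_iso: Optional[str],
    now: datetime,
    max_age: timedelta = timedelta(hours=24),
) -> str:
    """Return 'fresh', 'stale', or 'missing' for an ISO 8601 timestamp."""
    if not last_sync_iso:
        return "missing"
    try:
        last_sync = datetime.fromisoformat(last_sync_iso)
    except ValueError:
        return "missing"
    if last_sync.tzinfo is None:
        # Treat naive timestamps as UTC, as the plan's check does
        last_sync = last_sync.replace(tzinfo=timezone.utc)
    return "stale" if now - last_sync > max_age else "fresh"


now = datetime(2026, 4, 10, 12, 0, tzinfo=timezone.utc)
print(classify_freshness("2026-04-09T11:00:00+00:00", now))  # 25 hours old
print(classify_freshness("2026-04-10T11:30:00", now))        # naive, 30 minutes old
print(classify_freshness("not-a-date", now))                 # unparseable
```

Because the comparison is strict (`>`), data that is exactly 24 hours old still counts as fresh in this sketch, matching the `age > timedelta(hours=24)` check in the plan.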

Also update `_download_metadata` to write `last_sync.json`. At the end of the function, add:

```python
    # Record sync timestamp
    from datetime import datetime, timezone
    sync_record = {"last_sync": datetime.now(timezone.utc).isoformat(), "server_url": server_url}
    (metadata_dir / "last_sync.json").write_text(json.dumps(sync_record))
```

- [ ] **Step 4: Run tests to verify they pass**

Run: `pytest tests/test_analyst_bootstrap.py -v`
Expected: ALL PASS

- [ ] **Step 5: Commit**

```bash
git add cli/commands/analyst.py tests/test_analyst_bootstrap.py
git commit -m "feat: add da analyst status and returning-session freshness check"
```

---

### Task 4: Final Integration

- [ ] **Step 1: Run full test suite**

Run: `pytest tests/ -v --timeout=60`
Expected: ALL PASS

- [ ] **Step 2: Commit if any fixes needed**

```bash
git add -A
git commit -m "fix: address analyst bootstrap integration issues"
```
1808
docs/superpowers/plans/2026-04-10-business-metrics.md
Normal file
File diff suppressed because it is too large
567
docs/superpowers/plans/2026-04-10-metadata-writer.md
Normal file
@@ -0,0 +1,567 @@
# Metadata Writer Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add column metadata management — discover basetypes/descriptions, store in DuckDB, push back to Keboola Storage API.

**Architecture:** `column_metadata` table (created in schema v4 by the metrics plan). New `ColumnMetadataRepository` following `table_registry.py` pattern. CLI subcommands under `da admin metadata`. API endpoints under `/api/admin/metadata/`. Keboola push uses Storage API v2.

**Tech Stack:** DuckDB, FastAPI, Typer, httpx (for Keboola API push), PyArrow (for schema introspection)

**Spec:** `docs/superpowers/specs/2026-04-10-porting-internal-features-design.md` — Section 3

**Depends on:** Business Metrics plan (Task 1 — schema v4 creates `column_metadata` table)

---

### Task 1: ColumnMetadataRepository

**Files:**
- Create: `src/repositories/column_metadata.py`
- Test: `tests/test_column_metadata.py`

- [ ] **Step 1: Write failing tests**

Create `tests/test_column_metadata.py`:

```python
"""Tests for ColumnMetadataRepository."""

import os
import json
from pathlib import Path

import pytest
import duckdb


@pytest.fixture
def db_conn(tmp_path, monkeypatch):
    monkeypatch.setenv("DATA_DIR", str(tmp_path))
    from src.db import get_system_db
    conn = get_system_db()
    yield conn
    conn.close()


class TestColumnMetadataCreate:
    def test_save_single_column(self, db_conn):
        from src.repositories.column_metadata import ColumnMetadataRepository
        repo = ColumnMetadataRepository(db_conn)
        repo.save("orders", "total_amount", basetype="NUMERIC", description="Order total in USD")
        result = repo.get("orders", "total_amount")
        assert result is not None
        assert result["basetype"] == "NUMERIC"
        assert result["description"] == "Order total in USD"
        assert result["confidence"] == "manual"

    def test_upsert_overwrites(self, db_conn):
        from src.repositories.column_metadata import ColumnMetadataRepository
        repo = ColumnMetadataRepository(db_conn)
        repo.save("orders", "total_amount", basetype="NUMERIC", description="v1")
        repo.save("orders", "total_amount", basetype="FLOAT", description="v2")
        result = repo.get("orders", "total_amount")
        assert result["basetype"] == "FLOAT"
        assert result["description"] == "v2"


class TestColumnMetadataRead:
    def test_list_for_table(self, db_conn):
        from src.repositories.column_metadata import ColumnMetadataRepository
        repo = ColumnMetadataRepository(db_conn)
        repo.save("orders", "id", basetype="STRING")
        repo.save("orders", "total", basetype="NUMERIC")
        repo.save("users", "email", basetype="STRING")
        results = repo.list_for_table("orders")
        assert len(results) == 2
        names = {r["column_name"] for r in results}
        assert names == {"id", "total"}

    def test_get_missing(self, db_conn):
        from src.repositories.column_metadata import ColumnMetadataRepository
        repo = ColumnMetadataRepository(db_conn)
        assert repo.get("x", "y") is None


class TestColumnMetadataDelete:
    def test_delete_column(self, db_conn):
        from src.repositories.column_metadata import ColumnMetadataRepository
        repo = ColumnMetadataRepository(db_conn)
        repo.save("orders", "total", basetype="NUMERIC")
        assert repo.delete("orders", "total") is True
        assert repo.get("orders", "total") is None

    def test_delete_missing(self, db_conn):
        from src.repositories.column_metadata import ColumnMetadataRepository
        repo = ColumnMetadataRepository(db_conn)
        assert repo.delete("x", "y") is False


class TestColumnMetadataProposal:
    def test_import_proposal(self, db_conn, tmp_path):
        from src.repositories.column_metadata import ColumnMetadataRepository
        repo = ColumnMetadataRepository(db_conn)

        proposal = {
            "project": {"name": "sales"},
            "generated_at": "2026-04-10T12:00:00",
            "tables": {
                "orders": {
                    "columns": {
                        "id": {"basetype": "STRING", "description": "Order ID", "confidence": "high"},
                        "total": {"basetype": "NUMERIC", "description": "Total amount", "confidence": "medium"},
                    }
                }
            },
        }
        proposal_path = tmp_path / "proposal.json"
        proposal_path.write_text(json.dumps(proposal))

        count = repo.import_proposal(proposal_path)
        assert count == 2
        assert repo.get("orders", "id")["basetype"] == "STRING"
        assert repo.get("orders", "total")["confidence"] == "medium"

    def test_import_proposal_sets_source(self, db_conn, tmp_path):
        from src.repositories.column_metadata import ColumnMetadataRepository
        repo = ColumnMetadataRepository(db_conn)

        proposal = {
            "tables": {
                "orders": {
                    "columns": {
                        "id": {"basetype": "STRING", "description": "test", "confidence": "high"},
                    }
                }
            },
        }
        (tmp_path / "p.json").write_text(json.dumps(proposal))
        repo.import_proposal(tmp_path / "p.json")
        assert repo.get("orders", "id")["source"] == "ai_enrichment"
```
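
The proposal tests above fix the JSON shape as `tables -> <table_id> -> columns -> <column_name> -> {basetype, description, confidence}`. A possible pre-import check for that shape is sketched below. `validate_proposal` and `ALLOWED_CONFIDENCE` are hypothetical; in particular, the value `"low"` is an assumption, since only `"manual"`, `"high"`, and `"medium"` appear in the plan:

```python
from typing import Any, Dict, List

# Assumed value set; only "manual", "high" and "medium" appear in the plan.
ALLOWED_CONFIDENCE = {"manual", "high", "medium", "low"}


def validate_proposal(data: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the shape looks OK."""
    tables = data.get("tables")
    if not isinstance(tables, dict):
        return ["'tables' must be an object keyed by table id"]
    problems: List[str] = []
    for table_id, table_data in tables.items():
        columns = table_data.get("columns") if isinstance(table_data, dict) else None
        if not isinstance(columns, dict):
            problems.append(f"{table_id}: 'columns' must be an object")
            continue
        for col_name, col_data in columns.items():
            if not isinstance(col_data, dict):
                problems.append(f"{table_id}.{col_name}: entry must be an object")
                continue
            confidence = col_data.get("confidence", "medium")
            if confidence not in ALLOWED_CONFIDENCE:
                problems.append(f"{table_id}.{col_name}: unknown confidence {confidence!r}")
    return problems


good = {"tables": {"orders": {"columns": {"id": {"basetype": "STRING", "confidence": "high"}}}}}
bad = {"tables": {"orders": {"columns": {"id": {"confidence": "certain"}, "total": "NUMERIC"}}}}
print(validate_proposal(good))
print(validate_proposal(bad))
```

Running a check like this before `import_proposal` would surface a malformed file as a list of messages rather than as an `AttributeError` partway through the import.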
|
||||
|
||||
- [ ] **Step 2: Run tests to verify they fail**
|
||||
|
||||
Run: `pytest tests/test_column_metadata.py -v`
|
||||
Expected: FAIL — `ModuleNotFoundError`
|
||||
|
||||
- [ ] **Step 3: Implement ColumnMetadataRepository**
|
||||
|
||||
Create `src/repositories/column_metadata.py`:
|
||||
|
||||
```python
|
||||
"""Repository for column metadata (descriptions, basetypes)."""
|
||||
|
||||
import json
|
||||
import logging
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
from typing import Any, Dict, List, Optional

import duckdb

logger = logging.getLogger(__name__)


class ColumnMetadataRepository:
    """CRUD access to the column_metadata table."""

    def __init__(self, conn: duckdb.DuckDBPyConnection):
        self.conn = conn

    def save(self, table_id: str, column_name: str,
             basetype: Optional[str] = None,
             description: Optional[str] = None,
             confidence: str = "manual",
             source: str = "manual") -> Dict[str, Any]:
        """Insert or update one column's metadata and return the stored row."""
        now = datetime.now(timezone.utc)
        self.conn.execute(
            """INSERT INTO column_metadata (table_id, column_name, basetype, description, confidence, source, updated_at)
            VALUES (?, ?, ?, ?, ?, ?, ?)
            ON CONFLICT (table_id, column_name) DO UPDATE SET
                basetype = excluded.basetype,
                description = excluded.description,
                confidence = excluded.confidence,
                source = excluded.source,
                updated_at = excluded.updated_at""",
            [table_id, column_name, basetype, description, confidence, source, now],
        )
        return self.get(table_id, column_name)

    def get(self, table_id: str, column_name: str) -> Optional[Dict[str, Any]]:
        result = self.conn.execute(
            "SELECT * FROM column_metadata WHERE table_id = ? AND column_name = ?",
            [table_id, column_name],
        ).fetchone()
        if not result:
            return None
        columns = [desc[0] for desc in self.conn.description]
        return dict(zip(columns, result))

    def list_for_table(self, table_id: str) -> List[Dict[str, Any]]:
        results = self.conn.execute(
            "SELECT * FROM column_metadata WHERE table_id = ? ORDER BY column_name",
            [table_id],
        ).fetchall()
        if not results:
            return []
        columns = [desc[0] for desc in self.conn.description]
        return [dict(zip(columns, row)) for row in results]

    def delete(self, table_id: str, column_name: str) -> bool:
        """Delete one column's metadata. Returns False if nothing was stored."""
        existing = self.get(table_id, column_name)
        if not existing:
            return False
        self.conn.execute(
            "DELETE FROM column_metadata WHERE table_id = ? AND column_name = ?",
            [table_id, column_name],
        )
        return True

    def import_proposal(self, proposal_path) -> int:
        """Import a metadata proposal JSON file. Returns count of columns imported."""
        path = Path(proposal_path)
        # Read as UTF-8 explicitly so descriptions do not depend on the platform default.
        data = json.loads(path.read_text(encoding="utf-8"))
        count = 0

        tables = data.get("tables", {})
        for table_id, table_data in tables.items():
            columns = table_data.get("columns", {})
            for col_name, col_data in columns.items():
                self.save(
                    table_id=table_id,
                    column_name=col_name,
                    basetype=col_data.get("basetype"),
                    description=col_data.get("description"),
                    confidence=col_data.get("confidence", "medium"),
                    source="ai_enrichment",
                )
                count += 1

        return count
```
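
The `ON CONFLICT ... DO UPDATE` clause in `save` is what lets a second save for the same `(table_id, column_name)` pair overwrite the first instead of failing. The sketch below illustrates that upsert behavior with Python's built-in `sqlite3` as a stand-in for DuckDB, an assumption made only so the example runs without extra dependencies. The table definition, including its primary key, is hypothetical and mirrors just a subset of the columns used above.

```python
import sqlite3

# Stand-in for the real DuckDB connection; this schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE column_metadata (
        table_id TEXT NOT NULL,
        column_name TEXT NOT NULL,
        basetype TEXT,
        description TEXT,
        PRIMARY KEY (table_id, column_name)
    )"""
)

UPSERT = """INSERT INTO column_metadata (table_id, column_name, basetype, description)
VALUES (?, ?, ?, ?)
ON CONFLICT (table_id, column_name) DO UPDATE SET
    basetype = excluded.basetype,
    description = excluded.description"""

# The second call hits the primary key conflict and updates the existing row.
conn.execute(UPSERT, ["orders", "id", "STRING", "Order ID"])
conn.execute(UPSERT, ["orders", "id", "INTEGER", "Numeric order ID"])

rows = conn.execute(
    "SELECT table_id, column_name, basetype, description FROM column_metadata"
).fetchall()
print(rows)
```

The upsert only works if the target table has a primary key or unique constraint on the conflict columns, so the real `column_metadata` schema needs one on `(table_id, column_name)`.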

- [ ] **Step 4: Run tests to verify they pass**

Run: `pytest tests/test_column_metadata.py -v`
Expected: ALL PASS

- [ ] **Step 5: Commit**

```bash
git add src/repositories/column_metadata.py tests/test_column_metadata.py
git commit -m "feat: add ColumnMetadataRepository with CRUD and proposal import"
```

---

### Task 2: CLI Commands `da admin metadata-show` and `da admin metadata-apply`

**Files:**
- Modify: `cli/commands/admin.py` (add metadata commands)
- Test: `tests/test_cli.py`

- [ ] **Step 1: Write failing test**

Add to `tests/test_cli.py` in `TestCLIHelp`:

```python
    def test_admin_metadata_help(self):
        result = runner.invoke(app, ["admin", "metadata-show", "--help"])
        assert result.exit_code == 0
```

- [ ] **Step 2: Run test to verify it fails**

Run: `pytest tests/test_cli.py::TestCLIHelp::test_admin_metadata_help -v`
Expected: FAIL (`No such command 'metadata-show'`)

- [ ] **Step 3: Add metadata commands to admin.py**

Add to `cli/commands/admin.py`:

```python
@admin_app.command("metadata-show")
def metadata_show(
    table_id: str = typer.Argument(..., help="Table ID"),
    as_json: bool = typer.Option(False, "--json"),
):
    """Show column metadata for a table."""
    resp = api_get(f"/api/admin/metadata/{table_id}")
    if resp.status_code != 200:
        typer.echo(f"Failed: {resp.json().get('detail', resp.text)}", err=True)
        raise typer.Exit(1)

    columns = resp.json().get("columns", [])
    if as_json:
        typer.echo(json.dumps(columns, indent=2))
    else:
        if not columns:
            typer.echo(f"No metadata for table '{table_id}'")
            return
        typer.echo(f"\n Metadata for {table_id}:")
        for c in columns:
            # Stored values may be NULL. dict.get() defaults only cover missing
            # keys, and formatting None with ":12s" raises TypeError.
            basetype = c.get("basetype") or "?"
            desc = c.get("description") or "-"
            typer.echo(f" {c['column_name']:30s} {basetype:12s} {desc}")


@admin_app.command("metadata-apply")
def metadata_apply(
    proposal_path: str = typer.Argument(..., help="Path to proposal JSON file"),
    push_to_source: bool = typer.Option(False, "--push-to-source", help="Push to Keboola Storage API"),
    dry_run: bool = typer.Option(False, "--dry-run", help="Show changes without applying"),
):
    """Apply a metadata proposal (JSON) to DuckDB and optionally push to source."""
    from pathlib import Path

    path = Path(proposal_path)
    if not path.exists():
        typer.echo(f"File not found: {proposal_path}", err=True)
        raise typer.Exit(1)

    import json as json_mod
    data = json_mod.loads(path.read_text())
    tables = data.get("tables", {})

    if dry_run:
        for table_id, td in tables.items():
            for col, cd in td.get("columns", {}).items():
                typer.echo(f" {table_id}.{col}: {cd.get('basetype', '?')} - {cd.get('description', '-')}")
        typer.echo(f"\nDry run: {sum(len(td.get('columns', {})) for td in tables.values())} columns would be applied")
        return

    from src.db import get_system_db
    from src.repositories.column_metadata import ColumnMetadataRepository

    conn = get_system_db()
    try:
        repo = ColumnMetadataRepository(conn)
        count = repo.import_proposal(path)
        typer.echo(f"Applied {count} column metadata entries to DuckDB")
    finally:
        conn.close()

    if push_to_source:
        # The push endpoint is per table (POST /api/admin/metadata/{table_id}/push)
        # and pushes whatever the server has stored for that table.
        failed = False
        for table_id in tables:
            resp = api_post(f"/api/admin/metadata/{table_id}/push", json={})
            if resp.status_code == 200:
                typer.echo(f"Pushed metadata for {table_id} to source system")
            else:
                typer.echo(f"Push failed for {table_id}: {resp.json().get('detail', resp.text)}", err=True)
                failed = True
        if failed:
            raise typer.Exit(1)
```
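
Both `import_proposal` and `metadata-apply` read the proposal as a `tables` mapping from table ID to a `columns` mapping. The sketch below shows that shape as inferred from the keys the code reads; the table and column names are made up for illustration, and `count_columns` is a hypothetical helper that mirrors the `--dry-run` summary.

```python
import json

# Hypothetical proposal. Only the key names ("tables", "columns", "basetype",
# "description", "confidence") come from the code in this plan.
proposal = {
    "tables": {
        "orders": {
            "columns": {
                "id": {"basetype": "STRING", "description": "Order ID", "confidence": "high"},
                "total": {"basetype": "NUMERIC", "description": "Total amount"},
            }
        }
    }
}


def count_columns(data: dict) -> int:
    """Count the columns a proposal would apply, as the dry-run summary does."""
    return sum(len(t.get("columns", {})) for t in data.get("tables", {}).values())


# Round-trip through JSON to mimic reading the proposal file from disk.
loaded = json.loads(json.dumps(proposal))
total = count_columns(loaded)
print(f"Dry run: {total} columns would be applied")
```

A column entry without `confidence` (like `total` here) is imported with the `"medium"` default that `import_proposal` supplies.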

- [ ] **Step 4: Run tests to verify they pass**

Run: `pytest tests/test_cli.py::TestCLIHelp::test_admin_metadata_help -v`
Expected: PASS

- [ ] **Step 5: Commit**

```bash
git add cli/commands/admin.py tests/test_cli.py
git commit -m "feat: add da admin metadata-show and metadata-apply commands"
```

---

### Task 3: API Endpoints

**Files:**
- Create: `app/api/metadata.py`
- Modify: `app/main.py` (register router)
- Test: `tests/test_api.py`

- [ ] **Step 1: Write failing tests**

Add to `tests/test_api.py`:

```python
class TestMetadataAPI:
    def test_get_metadata_empty(self, seeded_client):
        client, admin_token, _ = seeded_client
        resp = client.get("/api/admin/metadata/orders",
                          headers={"Authorization": f"Bearer {admin_token}"})
        assert resp.status_code == 200
        assert resp.json()["columns"] == []

    def test_save_and_get_metadata(self, seeded_client):
        client, admin_token, _ = seeded_client
        resp = client.post(
            "/api/admin/metadata/orders",
            json={"columns": [
                {"column_name": "id", "basetype": "STRING", "description": "Order ID"},
                {"column_name": "total", "basetype": "NUMERIC", "description": "Total amount"},
            ]},
            headers={"Authorization": f"Bearer {admin_token}"},
        )
        assert resp.status_code == 200
        assert resp.json()["count"] == 2

        resp = client.get("/api/admin/metadata/orders",
                          headers={"Authorization": f"Bearer {admin_token}"})
        assert len(resp.json()["columns"]) == 2

    def test_analyst_cannot_save_metadata(self, seeded_client):
        client, _, analyst_token = seeded_client
        resp = client.post(
            "/api/admin/metadata/orders",
            json={"columns": [{"column_name": "id", "basetype": "STRING"}]},
            headers={"Authorization": f"Bearer {analyst_token}"},
        )
        assert resp.status_code == 403
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `pytest tests/test_api.py::TestMetadataAPI -v`
Expected: FAIL (404 on `/api/admin/metadata/orders`)

- [ ] **Step 3: Implement API router**

Create `app/api/metadata.py`:

```python
"""Column metadata API endpoints."""

import os
from typing import List, Optional

import duckdb
import httpx
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel

from app.auth.dependencies import get_current_user, require_admin, _get_db
from src.repositories.column_metadata import ColumnMetadataRepository

router = APIRouter(tags=["metadata"])


class ColumnMetadataItem(BaseModel):
    column_name: str
    basetype: Optional[str] = None
    description: Optional[str] = None
    confidence: str = "manual"


class ColumnMetadataSave(BaseModel):
    columns: List[ColumnMetadataItem]


@router.get("/api/admin/metadata/{table_id}")
async def get_table_metadata(
    table_id: str,
    user: dict = Depends(get_current_user),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Return stored column metadata for a table (any authenticated user)."""
    repo = ColumnMetadataRepository(conn)
    columns = repo.list_for_table(table_id)
    return {"table_id": table_id, "columns": columns}


@router.post("/api/admin/metadata/{table_id}")
async def save_table_metadata(
    table_id: str,
    body: ColumnMetadataSave,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Upsert column metadata for a table (admin only)."""
    repo = ColumnMetadataRepository(conn)
    for col in body.columns:
        repo.save(
            table_id=table_id,
            column_name=col.column_name,
            basetype=col.basetype,
            description=col.description,
            confidence=col.confidence,
            source="api",
        )
    return {"status": "ok", "table_id": table_id, "count": len(body.columns)}


@router.post("/api/admin/metadata/{table_id}/push")
async def push_metadata_to_source(
    table_id: str,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Push column metadata to the source system (Keboola only)."""
    from src.repositories.table_registry import TableRegistryRepository
    table_repo = TableRegistryRepository(conn)
    table = table_repo.get(table_id)

    if not table:
        raise HTTPException(status_code=404, detail=f"Table not found: {table_id}")
    if table.get("source_type") != "keboola":
        raise HTTPException(status_code=400, detail="Push only supported for Keboola source tables")

    meta_repo = ColumnMetadataRepository(conn)
    columns = meta_repo.list_for_table(table_id)
    if not columns:
        raise HTTPException(status_code=400, detail="No metadata to push")

    # Strip a trailing slash so the URL below never contains "//v2".
    stack_url = os.environ.get("KBC_STACK_URL", "").rstrip("/")
    token = os.environ.get("KBC_STORAGE_TOKEN", "")
    if not stack_url or not token:
        raise HTTPException(status_code=400, detail="KBC_STACK_URL and KBC_STORAGE_TOKEN must be set")

    # Fall back to table_id when source_table is missing or NULL.
    source_table = table.get("source_table") or table_id

    # Build the Keboola API payload: one list of key/value entries per column.
    columns_metadata = {}
    for col in columns:
        entries = []
        if col.get("basetype"):
            entries.append({"key": "KBC.datatype.basetype", "value": col["basetype"]})
        if col.get("description"):
            entries.append({"key": "KBC.description", "value": col["description"]})
        if entries:
            columns_metadata[col["column_name"]] = entries

    try:
        # Use the async client so the request does not block the event loop.
        async with httpx.AsyncClient(timeout=30) as client:
            resp = await client.post(
                f"{stack_url}/v2/storage/tables/{source_table}/metadata",
                headers={"X-StorageApi-Token": token},
                json={"provider": "ai-metadata-enrichment", "columnsMetadata": columns_metadata},
            )
        resp.raise_for_status()
    except httpx.HTTPStatusError as e:
        raise HTTPException(status_code=502, detail=f"Keboola API error: {e.response.text}")
    except httpx.RequestError as e:
        # Connection failures and timeouts would otherwise surface as a 500.
        raise HTTPException(status_code=502, detail=f"Keboola API unreachable: {e}")
    return {"status": "pushed", "table_id": table_id, "columns": len(columns_metadata)}
```

Register in `app/main.py`:

```python
from app.api.metadata import router as metadata_router
# ... (add near other router imports)

# In create_app(), add before web_router:
app.include_router(metadata_router)
```
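
The payload-building loop in `push_metadata_to_source` could also be factored into a pure function, which would make it testable without a Keboola token or network access. The following is a sketch under that assumption; `build_columns_metadata` is a hypothetical helper name and is not part of the plan's code.

```python
from typing import Any, Dict, List


def build_columns_metadata(columns: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, str]]]:
    """Map stored column rows to per-column key/value entries, skipping empty columns."""
    payload: Dict[str, List[Dict[str, str]]] = {}
    for col in columns:
        entries = []
        if col.get("basetype"):
            entries.append({"key": "KBC.datatype.basetype", "value": col["basetype"]})
        if col.get("description"):
            entries.append({"key": "KBC.description", "value": col["description"]})
        if entries:
            payload[col["column_name"]] = entries
    return payload


# Rows shaped like the dicts returned by list_for_table (values are illustrative).
rows = [
    {"column_name": "id", "basetype": "STRING", "description": "Order ID"},
    {"column_name": "note", "basetype": None, "description": None},
]
result = build_columns_metadata(rows)
print(result)
```

Columns with neither a basetype nor a description, such as `note` here, are left out of the payload entirely.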

- [ ] **Step 4: Run tests to verify they pass**

Run: `pytest tests/test_api.py::TestMetadataAPI -v`
Expected: ALL PASS

- [ ] **Step 5: Commit**

```bash
git add app/api/metadata.py app/main.py tests/test_api.py
git commit -m "feat: add column metadata API with Keboola push support"
```

---

### Task 4: Final Integration

- [ ] **Step 1: Run full test suite**

Run: `pytest tests/ -v --timeout=60` (the `--timeout` option requires the `pytest-timeout` plugin)
Expected: ALL PASS

- [ ] **Step 2: Commit any fixes, if needed**

```bash
git add -A
git commit -m "fix: address metadata writer integration issues"
```