CLAUDE.md rewritten (708 -> ~320 lines): four overlapping release sections collapsed to one, stale v1->v35 schema history dropped (it lives in CHANGELOG), marketplace endpoint internals and verbose process sections moved out or tightened. New focused docs: - docs/RELEASING.md - release process, deploy workflows, CI quirks (RELEASE_TEMPLATE.md folded in as an appendix) - docs/marketplace.md - marketplace ingestion + re-serving internals - docs/README.md - documentation index by audience, linked from README.md and CLAUDE.md Archived under docs/archive/: docs/superpowers/ (52 historical planning artifacts), HACKATHON.md, pd-ps-comments.md, security-audit-2026-04.md, future/NOTIFICATIONS.md. Removed the docs/auto-install.md stub. Fixed dangling links in connectors/jira/README.md and dev_docs/README.md, repointed code/doc references to archived paths.
20 KiB
Analyst Bootstrap Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (
- [ ]) syntax for tracking.
Goal: Add da analyst setup command that onboards analysts to a remote Agnes instance — connects, downloads data, initializes local DuckDB, generates CLAUDE.md.
Architecture: New top-level Typer command da analyst with setup subcommand. Uses existing cli/client.py HTTP helpers. Generates CLAUDE.md from template with instance-specific placeholders.
Tech Stack: Typer, httpx (via cli/client.py), DuckDB, Rich (progress bars), Jinja2-style string substitution
Spec: docs/superpowers/specs/2026-04-10-porting-internal-features-design.md — Section 2
Depends on: Business Metrics plan (Task 5 — metrics API must exist for Step 4)
Task 1: CLAUDE.md Template
Files:
-
Create:
config/claude_md_template.txt -
Step 1: Create the template file
Create config/claude_md_template.txt:
# {instance_name} — AI Data Analyst
This workspace is connected to {server_url}.
## Rules
- Before computing any business metric: run `da metrics show <category>/<name>`
- For current schema: read `data/metadata/schema.json`
- Do not use DESCRIBE/SHOW COLUMNS — read metadata files instead
- Save work output to `user/artifacts/`
- Sync data regularly with `da sync`
## Metrics Workflow
1. `da metrics list` — find the relevant metric
2. `da metrics show revenue/mrr` — read SQL and business rules
3. Use the canonical SQL from the metric definition, adapt to the question
4. Never invent metric calculations — always check existing definitions first
## Data Sync
- `da sync` — download current data from server
- `da sync --docs-only` — just metadata and metrics (fast refresh)
- `da sync --upload-only` — upload sessions and local notes to server
- Data on the server refreshes every {sync_interval}
## Directory Structure
- `data/` — read-only data downloaded from server
- `data/parquet/` — table data in Parquet format
- `data/duckdb/` — local analytics DuckDB database
- `data/metadata/` — profiles, schema, metrics cache
- `user/` — your workspace (persistent across syncs)
- `user/artifacts/` — analysis outputs, reports, charts
- `user/sessions/` — Claude Code session logs
- `.claude/CLAUDE.local.md` — your personal notes (never overwritten, uploaded on sync)
- Step 2: Commit
git add config/claude_md_template.txt
git commit -m "feat: add CLAUDE.md template for analyst bootstrap"
Task 2: da analyst setup — Core Command
Files:
-
Create:
cli/commands/analyst.py -
Modify:
cli/main.py(register analyst_app) -
Test:
tests/test_cli.py(help test) -
Test:
tests/test_analyst_bootstrap.py -
Step 1: Write failing tests
Add to tests/test_cli.py in TestCLIHelp:
def test_analyst_help(self):
result = runner.invoke(app, ["analyst", "--help"])
assert result.exit_code == 0
assert "setup" in result.output
Create tests/test_analyst_bootstrap.py:
"""Tests for analyst bootstrap flow."""
import json
import os
from pathlib import Path
from unittest.mock import patch, MagicMock
import pytest
from typer.testing import CliRunner
from cli.main import app
runner = CliRunner()
@pytest.fixture(autouse=True)
def tmp_workspace(tmp_path, monkeypatch):
monkeypatch.setenv("DATA_DIR", str(tmp_path / "data"))
monkeypatch.setenv("DA_CONFIG_DIR", str(tmp_path / "config"))
monkeypatch.setenv("DA_LOCAL_DIR", str(tmp_path / "local"))
monkeypatch.setenv("JWT_SECRET_KEY", "test-secret")
(tmp_path / "data").mkdir()
(tmp_path / "config").mkdir()
(tmp_path / "local").mkdir()
monkeypatch.chdir(tmp_path / "workspace")
(tmp_path / "workspace").mkdir()
yield tmp_path / "workspace"
class TestDetectExistingProject:
def test_detects_existing_claude_md(self, tmp_workspace):
(tmp_workspace / "CLAUDE.md").write_text("# Acme — AI Data Analyst\n")
result = runner.invoke(app, ["analyst", "setup", "--server-url", "http://localhost:8000"])
assert "already set up" in result.output.lower() or result.exit_code == 0
def test_no_detection_with_force(self, tmp_workspace):
(tmp_workspace / "CLAUDE.md").write_text("# Acme — AI Data Analyst\n")
with patch("cli.commands.analyst._connect_to_instance") as mock_connect:
mock_connect.side_effect = SystemExit(1) # Will fail at connect step
result = runner.invoke(app, ["analyst", "setup", "--force",
"--server-url", "http://localhost:8000"])
# Should have passed detection and attempted connect
mock_connect.assert_called_once()
class TestCreateWorkspace:
def test_creates_directory_structure(self, tmp_workspace):
from cli.commands.analyst import _create_workspace
_create_workspace(tmp_workspace)
assert (tmp_workspace / "data" / "parquet").is_dir()
assert (tmp_workspace / "data" / "duckdb").is_dir()
assert (tmp_workspace / "data" / "metadata").is_dir()
assert (tmp_workspace / "user" / "artifacts").is_dir()
assert (tmp_workspace / "user" / "sessions").is_dir()
assert (tmp_workspace / ".claude").is_dir()
class TestGenerateClaudeMd:
def test_generates_from_template(self, tmp_workspace):
from cli.commands.analyst import _generate_claude_md
_generate_claude_md(
workspace=tmp_workspace,
instance_name="Acme Analytics",
server_url="https://data.acme.com",
sync_interval="15 minutes",
)
claude_md = tmp_workspace / "CLAUDE.md"
assert claude_md.exists()
content = claude_md.read_text()
assert "Acme Analytics" in content
assert "https://data.acme.com" in content
assert "15 minutes" in content
def test_creates_claude_local_md(self, tmp_workspace):
from cli.commands.analyst import _generate_claude_md
(tmp_workspace / ".claude").mkdir(parents=True, exist_ok=True)
_generate_claude_md(
workspace=tmp_workspace,
instance_name="Test",
server_url="http://localhost",
sync_interval="1 hour",
)
assert (tmp_workspace / ".claude" / "CLAUDE.local.md").exists()
def test_does_not_overwrite_existing_local_md(self, tmp_workspace):
(tmp_workspace / ".claude").mkdir(parents=True, exist_ok=True)
local_md = tmp_workspace / ".claude" / "CLAUDE.local.md"
local_md.write_text("my notes")
from cli.commands.analyst import _generate_claude_md
_generate_claude_md(
workspace=tmp_workspace,
instance_name="Test",
server_url="http://localhost",
sync_interval="1 hour",
)
assert local_md.read_text() == "my notes"
- Step 2: Run tests to verify they fail
Run: pytest tests/test_analyst_bootstrap.py -v
Expected: FAIL — No such command 'analyst'
- Step 3: Implement
cli/commands/analyst.py
Create cli/commands/analyst.py:
"""Analyst commands — da analyst."""
import json
import logging
from pathlib import Path
from typing import Optional
import typer
logger = logging.getLogger(__name__)
analyst_app = typer.Typer(help="Analyst workspace — setup, connect to a remote instance")
AGNES_IDENTIFIER = "AI Data Analyst"
@analyst_app.command("setup")
def setup(
server_url: str = typer.Option(None, "--server-url", "-s", help="Agnes instance URL"),
force: bool = typer.Option(False, "--force", help="Re-run from scratch, clean partial state"),
):
"""Set up a local analyst workspace connected to a remote Agnes instance."""
workspace = Path.cwd()
# Step 1: Detect existing project
if not force:
claude_md = workspace / "CLAUDE.md"
if claude_md.exists() and AGNES_IDENTIFIER in claude_md.read_text():
typer.echo("Project already set up. Use 'da sync' to refresh data, or --force to re-setup.")
return
# Step 2: Connect
if not server_url:
server_url = typer.prompt("Agnes instance URL (e.g., https://data.acme.com)")
token = _connect_to_instance(server_url)
# Step 3: Create workspace
_create_workspace(workspace)
# Step 4: Download schema and metrics
_download_metadata(workspace, server_url, token)
# Step 5: Download data
table_count = _download_data(workspace, server_url, token)
# Step 6: Initialize DuckDB
row_count = _initialize_duckdb(workspace)
# Step 7: Generate CLAUDE.md
instance_name = _get_instance_name(server_url, token)
_generate_claude_md(
workspace=workspace,
instance_name=instance_name,
server_url=server_url,
sync_interval="15 minutes",
)
# Step 8: Verify
typer.echo(f"\nSetup complete. {table_count} tables, {row_count} total rows.")
typer.echo("Start analyzing with Claude Code, or run 'da sync' to refresh data.")
def _connect_to_instance(server_url: str) -> str:
"""Connect to Agnes instance, authenticate, return JWT token."""
import httpx
# Health check
try:
resp = httpx.get(f"{server_url}/api/health", timeout=10)
resp.raise_for_status()
except Exception as e:
typer.echo(f"Cannot reach {server_url}: {e}", err=True)
raise typer.Exit(1)
# Authenticate
email = typer.prompt("Email")
password = typer.prompt("Password", hide_input=True)
try:
resp = httpx.post(
f"{server_url}/auth/token",
data={"username": email, "password": password},
timeout=10,
)
resp.raise_for_status()
token = resp.json().get("access_token")
if not token:
typer.echo("Authentication failed: no token in response", err=True)
raise typer.Exit(1)
except httpx.HTTPStatusError as e:
typer.echo(f"Authentication failed: {e.response.text}", err=True)
raise typer.Exit(1)
# Save credentials
from cli.config import save_config
save_config({"server_url": server_url, "token": token})
typer.echo(f"Connected to {server_url}")
return token
def _create_workspace(workspace: Path) -> None:
"""Create analyst directory structure."""
dirs = [
workspace / "data" / "parquet",
workspace / "data" / "duckdb",
workspace / "data" / "metadata",
workspace / "user" / "artifacts",
workspace / "user" / "sessions",
workspace / ".claude",
]
for d in dirs:
d.mkdir(parents=True, exist_ok=True)
def _download_metadata(workspace: Path, server_url: str, token: str) -> None:
"""Download table list and metrics to local cache."""
import httpx
headers = {"Authorization": f"Bearer {token}"}
metadata_dir = workspace / "data" / "metadata"
# Tables
try:
resp = httpx.get(f"{server_url}/api/catalog/tables", headers=headers, timeout=30)
resp.raise_for_status()
(metadata_dir / "tables.json").write_text(json.dumps(resp.json(), indent=2))
typer.echo(f"Downloaded table catalog ({resp.json().get('count', '?')} tables)")
except Exception as e:
typer.echo(f"Warning: could not download table catalog: {e}", err=True)
# Metrics
try:
resp = httpx.get(f"{server_url}/api/metrics", headers=headers, timeout=30)
resp.raise_for_status()
(metadata_dir / "metrics.json").write_text(json.dumps(resp.json(), indent=2))
typer.echo(f"Downloaded metrics ({resp.json().get('count', '?')} metrics)")
except Exception as e:
typer.echo(f"Warning: could not download metrics: {e}", err=True)
def _download_data(workspace: Path, server_url: str, token: str) -> int:
"""Download parquet files for all accessible tables. Returns count."""
import httpx
metadata_dir = workspace / "data" / "metadata"
parquet_dir = workspace / "data" / "parquet"
tables_file = metadata_dir / "tables.json"
if not tables_file.exists():
return 0
tables_data = json.loads(tables_file.read_text())
tables = tables_data.get("tables", [])
count = 0
for table in tables:
tid = table["id"]
target = parquet_dir / f"{tid}.parquet"
# Resume: skip if already downloaded
if target.exists() and target.stat().st_size > 0:
count += 1
continue
try:
with httpx.Client(base_url=server_url, headers={"Authorization": f"Bearer {token}"},
timeout=300) as client:
with client.stream("GET", f"/api/data/{tid}/download") as resp:
if resp.status_code == 404:
continue
resp.raise_for_status()
target.parent.mkdir(parents=True, exist_ok=True)
with open(target, "wb") as f:
for chunk in resp.iter_bytes(65536):
f.write(chunk)
count += 1
typer.echo(f" Downloaded {tid}")
except Exception as e:
typer.echo(f" Failed {tid}: {e}", err=True)
typer.echo(f"Downloaded {count}/{len(tables)} tables")
return count
def _initialize_duckdb(workspace: Path) -> int:
"""Create local analytics.duckdb with views over downloaded parquets. Returns total rows."""
import duckdb
parquet_dir = workspace / "data" / "parquet"
db_path = workspace / "data" / "duckdb" / "analytics.duckdb"
db_path.parent.mkdir(parents=True, exist_ok=True)
conn = duckdb.connect(str(db_path))
total_rows = 0
for pq in sorted(parquet_dir.glob("*.parquet")):
view_name = pq.stem
try:
conn.execute(f"CREATE OR REPLACE VIEW \"{view_name}\" AS SELECT * FROM read_parquet('{pq}')")
row_count = conn.execute(f"SELECT count(*) FROM \"{view_name}\"").fetchone()[0]
total_rows += row_count
except Exception as e:
logger.warning("Could not create view for %s: %s", pq.name, e)
conn.close()
typer.echo(f"Initialized DuckDB with {len(list(parquet_dir.glob('*.parquet')))} views")
return total_rows
def _get_instance_name(server_url: str, token: str) -> str:
"""Get instance name from server, fallback to URL hostname."""
import httpx
try:
resp = httpx.get(f"{server_url}/api/health", headers={"Authorization": f"Bearer {token}"}, timeout=10)
data = resp.json()
return data.get("instance_name", server_url.split("//")[-1].split("/")[0])
except Exception:
return server_url.split("//")[-1].split("/")[0]
def _generate_claude_md(
workspace: Path,
instance_name: str,
server_url: str,
sync_interval: str,
) -> None:
"""Generate CLAUDE.md from template."""
template_path = Path(__file__).parent.parent.parent / "config" / "claude_md_template.txt"
if template_path.exists():
template = template_path.read_text()
else:
# Inline fallback
template = "# {instance_name} — AI Data Analyst\n\nConnected to {server_url}.\n"
content = template.replace("{instance_name}", instance_name)
content = content.replace("{server_url}", server_url)
content = content.replace("{sync_interval}", sync_interval)
(workspace / "CLAUDE.md").write_text(content)
# Create CLAUDE.local.md if it doesn't exist
local_md = workspace / ".claude" / "CLAUDE.local.md"
local_md.parent.mkdir(parents=True, exist_ok=True)
if not local_md.exists():
local_md.write_text("# Personal Notes\n\nAdd your learnings and insights here.\n")
Register in cli/main.py:
from cli.commands.analyst import analyst_app
# ...
app.add_typer(analyst_app, name="analyst")
- Step 4: Run tests to verify they pass
Run: pytest tests/test_analyst_bootstrap.py -v && pytest tests/test_cli.py::TestCLIHelp::test_analyst_help -v
Expected: ALL PASS
- Step 5: Commit
git add cli/commands/analyst.py cli/main.py config/claude_md_template.txt tests/test_analyst_bootstrap.py tests/test_cli.py
git commit -m "feat: add da analyst setup command with bootstrap flow"
Task 3: Returning-Session Detection
Files:
-
Modify:
cli/commands/analyst.py(addda analyst statuscommand) -
Test:
tests/test_analyst_bootstrap.py -
Step 1: Write failing test
Add to tests/test_analyst_bootstrap.py:
import time
class TestReturningSession:
def test_stale_data_warning(self, tmp_workspace):
from cli.commands.analyst import _check_data_freshness
metadata_dir = tmp_workspace / "data" / "metadata"
metadata_dir.mkdir(parents=True, exist_ok=True)
# Write last_sync.json with old timestamp
import json
from datetime import datetime, timezone, timedelta
old_time = (datetime.now(timezone.utc) - timedelta(hours=25)).isoformat()
(metadata_dir / "last_sync.json").write_text(json.dumps({"last_sync": old_time}))
result = _check_data_freshness(tmp_workspace)
assert result == "stale"
def test_fresh_data_ok(self, tmp_workspace):
from cli.commands.analyst import _check_data_freshness
metadata_dir = tmp_workspace / "data" / "metadata"
metadata_dir.mkdir(parents=True, exist_ok=True)
import json
from datetime import datetime, timezone
now = datetime.now(timezone.utc).isoformat()
(metadata_dir / "last_sync.json").write_text(json.dumps({"last_sync": now}))
result = _check_data_freshness(tmp_workspace)
assert result == "fresh"
def test_no_data_returns_missing(self, tmp_workspace):
from cli.commands.analyst import _check_data_freshness
result = _check_data_freshness(tmp_workspace)
assert result == "missing"
- Step 2: Run test to verify it fails
Run: pytest tests/test_analyst_bootstrap.py::TestReturningSession -v
Expected: FAIL — _check_data_freshness not found
- Step 3: Implement freshness check and status command
Add to cli/commands/analyst.py:
@analyst_app.command("status")
def status():
"""Check workspace status and data freshness."""
workspace = Path.cwd()
claude_md = workspace / "CLAUDE.md"
if not claude_md.exists() or AGNES_IDENTIFIER not in claude_md.read_text():
typer.echo("No analyst workspace detected. Run 'da analyst setup' first.")
raise typer.Exit(1)
freshness = _check_data_freshness(workspace)
if freshness == "stale":
typer.echo("Data is stale (>24h old). Run 'da sync' to refresh.")
elif freshness == "missing":
typer.echo("No data found. Run 'da analyst setup' to download data.")
else:
typer.echo("Data is fresh.")
def _check_data_freshness(workspace: Path) -> str:
"""Check data freshness. Returns 'fresh', 'stale', or 'missing'."""
last_sync_file = workspace / "data" / "metadata" / "last_sync.json"
if not last_sync_file.exists():
return "missing"
try:
data = json.loads(last_sync_file.read_text())
last_sync_str = data.get("last_sync")
if not last_sync_str:
return "missing"
from datetime import datetime, timezone, timedelta
last_sync = datetime.fromisoformat(last_sync_str)
if last_sync.tzinfo is None:
last_sync = last_sync.replace(tzinfo=timezone.utc)
age = datetime.now(timezone.utc) - last_sync
if age > timedelta(hours=24):
return "stale"
return "fresh"
except Exception:
return "missing"
Also update _download_metadata to write last_sync.json:
At the end of the _download_metadata function, add:
# Record sync timestamp
from datetime import datetime, timezone
sync_record = {"last_sync": datetime.now(timezone.utc).isoformat(), "server_url": server_url}
(metadata_dir / "last_sync.json").write_text(json.dumps(sync_record))
- Step 4: Run tests to verify they pass
Run: pytest tests/test_analyst_bootstrap.py -v
Expected: ALL PASS
- Step 5: Commit
git add cli/commands/analyst.py tests/test_analyst_bootstrap.py
git commit -m "feat: add da analyst status and returning-session freshness check"
Task 4: Final Integration
- Step 1: Run full test suite
Run: pytest tests/ -v --timeout=60
Expected: ALL PASS
- Step 2: Commit if any fixes needed
git add -A
git commit -m "fix: address analyst bootstrap integration issues"