Make CLAUDE.md template generic and instance-aware

- Remove all Keboola-specific content (metric categories, MRR/ARR refs,
  corporate memory, hardcoded server IP)
- Add {ssh_alias}, {server_host}, {webapp_url} placeholders
- Bootstrap saves .sync_connection file with instance details
- sync_data.sh reads .sync_connection to substitute all placeholders
- Text instructions in dashboard include .sync_connection step
parent 6728da63fb
commit 2237334b05
4 changed files with 89 additions and 103 deletions
|
|
@ -131,10 +131,17 @@ setup:
|
||||||
max_retries: 3
|
max_retries: 3
|
||||||
|
|
||||||
- name: "create_folders"
|
- name: "create_folders"
|
||||||
description: "Create local project structure"
|
description: "Create local project structure and save connection details"
|
||||||
action: |
|
action: |
|
||||||
mkdir -p ./server/docs ./server/scripts ./server/examples ./server/parquet ./server/metadata
|
mkdir -p ./server/docs ./server/scripts ./server/examples ./server/parquet ./server/metadata
|
||||||
mkdir -p ./user/duckdb ./user/notifications ./user/artifacts ./user/scripts ./user/parquet ./user/sessions
|
mkdir -p ./user/duckdb ./user/notifications ./user/artifacts ./user/scripts ./user/parquet ./user/sessions
|
||||||
|
|
||||||
|
# Save connection details for sync_data.sh to use when generating CLAUDE.md
|
||||||
|
cat > ./.sync_connection << 'CONN'
|
||||||
|
ssh_alias={ssh_alias}
|
||||||
|
server_host={server_host}
|
||||||
|
webapp_url={webapp_url}
|
||||||
|
CONN
|
||||||
message: |
|
message: |
|
||||||
Project structure created (server/, user/).
|
Project structure created (server/, user/).
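
Review note: the `.sync_connection` file written by this step holds three `key=value` lines. Below is a minimal sketch of a reader for that format, not the project's implementation. The function name `parse_sync_connection` is hypothetical; the default values are the ones `sync_data.sh` assigns in this commit before it reads the file.

```python
# Hypothetical helper, for illustration only: parse the key=value lines
# that the bootstrap step writes to .sync_connection.

# Defaults copied from the sync_data.sh hunk in this commit.
DEFAULTS = {
    "ssh_alias": "data-analyst",
    "server_host": "unknown",
    "webapp_url": "",
}


def parse_sync_connection(text: str) -> dict:
    """Return connection details, falling back to DEFAULTS for missing keys."""
    values = dict(DEFAULTS)
    for line in text.splitlines():
        # Split on the first '=' only, so values that contain '='
        # (for example a URL with a query string) are kept whole.
        key, sep, value = line.strip().partition("=")
        if sep and key in values and value:
            values[key] = value
    return values


example = "ssh_alias=analyst-box\nserver_host=example.internal\nwebapp_url=https://example.internal/?tab=data\n"
print(parse_sync_connection(example))
```

Two details of the shell version in the `sync_data.sh` hunk are worth checking against this sketch. `cut -d= -f2` returns only the second field, so a value containing `=` would be cut short (`-f2-` keeps the rest). The `|| echo "..."` fallback also tests the exit status of `cut`, the last command in the pipeline, so it does not fire when `grep` finds no match unless the script enables `pipefail`.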
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -8,13 +8,13 @@ Project context file for **AI Data Analyst** - local analytics environment with
|
||||||
|----------|-------|
|
|----------|-------|
|
||||||
| **Project Type** | AI Data Analyst |
|
| **Project Type** | AI Data Analyst |
|
||||||
| **Database** | DuckDB at `user/duckdb/analytics.duckdb` |
|
| **Database** | DuckDB at `user/duckdb/analytics.duckdb` |
|
||||||
| **Data Source** | data-analyst server (34.88.8.46) |
|
| **Data Source** | {ssh_alias} server ({server_host}) |
|
||||||
| **Data Format** | Parquet files in `server/parquet/` |
|
| **Data Format** | Parquet files in `server/parquet/` |
|
||||||
| **Analyst** | {username} |
|
| **Analyst** | {username} |
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## ⚠️ CRITICAL: Always Start Here
|
## CRITICAL: Always Start Here
|
||||||
|
|
||||||
### 1. Sync Data When Starting
|
### 1. Sync Data When Starting
|
||||||
|
|
||||||
|
|
@ -30,35 +30,7 @@ bash server/scripts/sync_data.sh
|
||||||
|
|
||||||
This updates data, scripts, documentation, and CLAUDE.md.
|
This updates data, scripts, documentation, and CLAUDE.md.
|
||||||
|
|
||||||
### 2. Read Metrics Definitions
|
### 2. Read Schema Documentation Before Writing SQL
|
||||||
|
|
||||||
**Before calculating ANY business metric (MRR, ARR, usage, limits, etc.), you MUST:**
|
|
||||||
|
|
||||||
1. **Start with the metrics index** - read `server/docs/metrics/metrics.yml` first
|
|
||||||
- This index file lists all available metrics organized by category
|
|
||||||
- Find the metric you need and note its file path
|
|
||||||
|
|
||||||
2. **Then read the specific metric file** from its category folder:
|
|
||||||
```bash
|
|
||||||
# Example: Read the metrics index first
|
|
||||||
cat server/docs/metrics/metrics.yml
|
|
||||||
|
|
||||||
# Then read the specific metric definition you need
|
|
||||||
cat server/docs/metrics/sales_revenue/mrr.yml
|
|
||||||
cat server/docs/metrics/product_usage/usage_value.yml
|
|
||||||
cat server/docs/metrics/finance/infra_cost.yml
|
|
||||||
cat server/docs/metrics/weekly_leadership_kpis/revenue_upsells_ytd.yml
|
|
||||||
```
|
|
||||||
|
|
||||||
**Categories:**
|
|
||||||
- `finance/` - Financial metrics (infra costs, retention)
|
|
||||||
- `product_usage/` - Platform usage, limits, telemetry
|
|
||||||
- `sales_revenue/` - MRR, ARR, new customers, expansions
|
|
||||||
- `weekly_leadership_kpis/` - Weekly KPIs for leadership reporting
|
|
||||||
|
|
||||||
Do not calculate metrics from memory. The formulas contain critical details (e.g., conditional aggregation for different metric types, proper value vs company_value usage). Getting this wrong produces plausible but incorrect numbers.
|
|
||||||
|
|
||||||
### 3. Read Schema Documentation Before Writing SQL
|
|
||||||
|
|
||||||
**MANDATORY: Before writing ANY SQL query, you MUST read the relevant documentation files:**
|
**MANDATORY: Before writing ANY SQL query, you MUST read the relevant documentation files:**
|
||||||
|
|
||||||
|
|
@ -73,20 +45,6 @@ cat server/docs/schema.yml
|
||||||
- **NEVER guess column names**
|
- **NEVER guess column names**
|
||||||
- schema.yml contains: all column names, types, descriptions, primary keys
|
- schema.yml contains: all column names, types, descriptions, primary keys
|
||||||
|
|
||||||
#### For on-demand datasets (if enabled):
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Check for additional dataset schemas (e.g., kbc_telemetry_expert)
|
|
||||||
ls server/docs/datasets/
|
|
||||||
# Read the dataset doc for table relationships and ER diagrams
|
|
||||||
cat server/docs/datasets/<dataset_name>.md
|
|
||||||
# Read the dataset schema for column details
|
|
||||||
cat server/docs/datasets/<dataset_name>/schema.yml
|
|
||||||
```
|
|
||||||
|
|
||||||
- On-demand datasets have their own schema.yml and documentation files
|
|
||||||
- Only available if enabled in Data Settings at {webapp_url}
|
|
||||||
|
|
||||||
#### For table relationships (joins, foreign keys):
|
#### For table relationships (joins, foreign keys):
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
|
|
@ -94,9 +52,30 @@ cat server/docs/datasets/<dataset_name>/schema.yml
|
||||||
cat server/docs/data_description.md
|
cat server/docs/data_description.md
|
||||||
```
|
```
|
||||||
|
|
||||||
- Contains ER diagrams, primary/foreign keys, sync strategies
|
- Contains primary/foreign keys, sync strategies, and table descriptions
|
||||||
- Essential for writing correct JOIN queries
|
- Essential for writing correct JOIN queries
|
||||||
- On-demand dataset docs reference core tables with `(core)` markers
|
|
||||||
|
#### For additional dataset schemas (if available):
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check for additional dataset schemas
|
||||||
|
ls server/docs/datasets/ 2>/dev/null
|
||||||
|
```
|
||||||
|
|
||||||
|
### 3. Read Metrics Definitions (if available)
|
||||||
|
|
||||||
|
**Before calculating ANY business metric, check for metric definitions:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check if metrics index exists
|
||||||
|
cat server/docs/metrics/metrics.yml 2>/dev/null
|
||||||
|
|
||||||
|
# Or list available metric files
|
||||||
|
ls server/docs/metrics/ 2>/dev/null
|
||||||
|
```
|
||||||
|
|
||||||
|
If metric definitions exist, always read the specific metric file before calculating.
|
||||||
|
Do not calculate metrics from memory - the formulas contain critical details.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|
@ -106,17 +85,16 @@ cat server/docs/data_description.md
|
||||||
project_root/
|
project_root/
|
||||||
├── server/ # READ-ONLY - synced from server
|
├── server/ # READ-ONLY - synced from server
|
||||||
│ ├── docs/ # Documentation
|
│ ├── docs/ # Documentation
|
||||||
│ │ ├── metrics/ # Metric definitions (modular structure)
|
│ │ ├── data_description.md # Table relationships and descriptions
|
||||||
│ │ ├── datasets/ # On-demand dataset docs and schemas
|
│ │ ├── schema.yml # Table schemas and column definitions
|
||||||
│ │ ├── data_description.md # Table relationships and ER diagrams
|
│ │ ├── metrics/ # Metric definitions (if available)
|
||||||
│ │ └── schema.yml # Table schemas and column definitions
|
│ │ └── datasets/ # Additional dataset docs (if available)
|
||||||
│ ├── scripts/ # Helper scripts (sync_data.sh, setup_views.sh)
|
│ ├── scripts/ # Helper scripts (sync_data.sh, setup_views.sh)
|
||||||
│ ├── examples/ # Example notification scripts
|
│ ├── examples/ # Example scripts (if available)
|
||||||
│ └── parquet/ # Synced parquet data files
|
│ └── parquet/ # Synced parquet data files
|
||||||
│
|
│
|
||||||
├── user/ # YOUR WORKSPACE - never overwritten
|
├── user/ # YOUR WORKSPACE - never overwritten
|
||||||
│ ├── duckdb/ # DuckDB database (analytics.duckdb)
|
│ ├── duckdb/ # DuckDB database (analytics.duckdb)
|
||||||
│ ├── notifications/ # Your notification scripts
|
|
||||||
│ ├── artifacts/ # Analysis outputs, charts, exports
|
│ ├── artifacts/ # Analysis outputs, charts, exports
|
||||||
│ └── scripts/ # Your custom scripts
|
│ └── scripts/ # Your custom scripts
|
||||||
│
|
│
|
||||||
|
|
@ -159,19 +137,21 @@ for table in tables:
|
||||||
con.close()
|
con.close()
|
||||||
```
|
```
|
||||||
|
|
||||||
### Query examples
|
### Query data
|
||||||
|
|
||||||
Browse `server/docs/metrics/metrics.yml` for all available metrics, then read specific metric files:
|
```bash
|
||||||
- **Finance**: `finance/` - Infrastructure costs with allocation guides
|
# Read schema first, then query
|
||||||
- **Product Usage**: `product_usage/` - Usage metrics with conditional aggregation, contract limits, usage vs limits
|
cat server/docs/schema.yml
|
||||||
- **Sales & Revenue**: `sales_revenue/` - MRR, ARR, new customer acquisition, expansions
|
```
|
||||||
- **Weekly Leadership KPIs**: `weekly_leadership_kpis/` - All weekly metrics for leadership reporting
|
|
||||||
|
|
||||||
All metric examples include multiple SQL variants:
|
```python
|
||||||
- `sql`: Total aggregate across all companies
|
import duckdb
|
||||||
- `sql_by_company`: Grouped by company
|
con = duckdb.connect('user/duckdb/analytics.duckdb')
|
||||||
- `sql_single_company`: Filter for specific company
|
# Write your query based on schema.yml column definitions
|
||||||
- `sql_by_project`: Project-level analysis (where applicable)
|
result = con.execute("SELECT * FROM your_table LIMIT 10").fetchdf()
|
||||||
|
print(result)
|
||||||
|
con.close()
|
||||||
|
```
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|
@ -193,42 +173,13 @@ You're ready to analyze!
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Corporate Memory
|
|
||||||
|
|
||||||
Your `CLAUDE.local.md` file serves a dual purpose:
|
|
||||||
1. **Personal notes** - never overwritten by server sync, your workspace for discoveries
|
|
||||||
2. **Knowledge sharing** - backed up to the server and processed into shared team knowledge
|
|
||||||
|
|
||||||
### How It Works
|
|
||||||
|
|
||||||
- Every `sync_data.sh` run backs up your `CLAUDE.local.md` to the server
|
|
||||||
- Every 30 minutes, the server extracts valuable knowledge from all team members' files
|
|
||||||
- Extracted knowledge is deduplicated and merged into a shared Corporate Memory database
|
|
||||||
- Browse and vote on knowledge at {webapp_url}/corporate-memory
|
|
||||||
- Items you upvote are synced to your `.claude/rules/` during the next data sync
|
|
||||||
|
|
||||||
### What to Write in CLAUDE.local.md
|
|
||||||
|
|
||||||
When you discover something valuable during your work, add it to `CLAUDE.local.md`:
|
|
||||||
|
|
||||||
- **Technical discoveries**: Novel solutions, workarounds, or techniques
|
|
||||||
- **Best practices**: Patterns that improved code quality or productivity
|
|
||||||
- **Tool tips**: Useful DuckDB queries, commands, or configurations
|
|
||||||
- **Debugging wisdom**: How specific errors were diagnosed and resolved
|
|
||||||
- **Domain knowledge**: Business logic insights or data relationships
|
|
||||||
|
|
||||||
The more specific and actionable your notes are, the more valuable they become for the whole team.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Important Reminders
|
## Important Reminders
|
||||||
|
|
||||||
- ⚠️ **Always read `server/docs/schema.yml` before writing SQL queries**
|
- Always read `server/docs/schema.yml` before writing SQL queries
|
||||||
- ⚠️ **Always check `server/docs/datasets/` for additional schema files from on-demand datasets**
|
- Always read `server/docs/data_description.md` for table relationships and joins
|
||||||
- ⚠️ **Always read `server/docs/metrics/metrics.yml` to find the right metric, then read its definition file before calculating business metrics**
|
- Check `server/docs/metrics/` for metric definitions before calculating business metrics
|
||||||
- ⚠️ **Always read `server/docs/data_description.md` for table relationships and joins**
|
- Use DuckDB views, not direct parquet file reads
|
||||||
- ✅ Use DuckDB views, not direct parquet file reads
|
- Never modify files in `server/` - they're read-only
|
||||||
- ❌ Never modify files in `server/` - they're read-only
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -268,7 +268,27 @@ if [[ -z "$DRY_RUN" ]]; then
|
||||||
ANALYST_USER="$EXISTING_USER"
|
ANALYST_USER="$EXISTING_USER"
|
||||||
fi
|
fi
|
||||||
fi
|
fi
|
||||||
|
|
||||||
|
# Read connection details from .sync_connection (written by bootstrap)
|
||||||
|
SSH_ALIAS="data-analyst"
|
||||||
|
SSH_HOST="unknown"
|
||||||
|
WEBAPP_URL=""
|
||||||
|
if [[ -f "./.sync_connection" ]]; then
|
||||||
|
SSH_ALIAS=$(grep '^ssh_alias=' ./.sync_connection 2>/dev/null | cut -d= -f2 || echo "data-analyst")
|
||||||
|
SSH_HOST=$(grep '^server_host=' ./.sync_connection 2>/dev/null | cut -d= -f2 || echo "unknown")
|
||||||
|
WEBAPP_URL=$(grep '^webapp_url=' ./.sync_connection 2>/dev/null | cut -d= -f2 || echo "")
|
||||||
|
fi
|
||||||
|
# Fallback: extract host from SSH config
|
||||||
|
if [[ "$SSH_HOST" == "unknown" ]] && [[ -f "$HOME/.ssh/config" ]]; then
|
||||||
|
SSH_HOST=$(awk "/^Host ${SSH_ALIAS}\$/,/^Host /{if(/HostName/) print \$2}" "$HOME/.ssh/config" 2>/dev/null | head -1)
|
||||||
|
SSH_HOST="${SSH_HOST:-unknown}"
|
||||||
|
fi
|
||||||
|
WEBAPP_URL="${WEBAPP_URL:-https://${SSH_HOST}}"
|
||||||
|
|
||||||
sed -e "s/{username}/$ANALYST_USER/g" \
|
sed -e "s/{username}/$ANALYST_USER/g" \
|
||||||
|
-e "s/{ssh_alias}/$SSH_ALIAS/g" \
|
||||||
|
-e "s|{server_host}|$SSH_HOST|g" \
|
||||||
|
-e "s|{webapp_url}|$WEBAPP_URL|g" \
|
||||||
./server/docs/setup/claude_md_template.txt > ./CLAUDE.md
|
./server/docs/setup/claude_md_template.txt > ./CLAUDE.md
|
||||||
echo "📝 CLAUDE.md updated from latest template"
|
echo "📝 CLAUDE.md updated from latest template"
|
||||||
fi
|
fi
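
Review note: the substitution step above can be sketched in Python as plain string replacement. This is an illustration under stated assumptions, not the script's code. The function name `render_template` and the example values are hypothetical; the placeholders, the defaults, and the `https://<server_host>` fallback for an empty `webapp_url` follow the hunk above.

```python
# Hypothetical sketch of the placeholder substitution done by sync_data.sh.

def render_template(template: str, username: str, conn: dict) -> str:
    """Fill {username}, {ssh_alias}, {server_host} and {webapp_url}."""
    host = conn.get("server_host") or "unknown"
    # Same fallback as the script: an empty webapp_url becomes https://<host>.
    webapp_url = conn.get("webapp_url") or f"https://{host}"
    replacements = {
        "{username}": username,
        "{ssh_alias}": conn.get("ssh_alias") or "data-analyst",
        "{server_host}": host,
        "{webapp_url}": webapp_url,
    }
    for placeholder, value in replacements.items():
        # str.replace treats the value literally; no character is special.
        template = template.replace(placeholder, value)
    return template


row = "| **Data Source** | {ssh_alias} server ({server_host}) |"
print(render_template(row, "alice", {"ssh_alias": "analyst-box", "server_host": "example.internal"}))
```

Literal replacement sidesteps one limitation of the `sed` form: in a `sed` replacement, `&` stands for the matched text and the delimiter character must be escaped, so a value containing `&`, or `/` in the expressions that use `/` as the delimiter, would need escaping first.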
|
||||||
|
|
|
||||||
|
|
@ -2447,9 +2447,13 @@
|
||||||
+ ' IdentityFile ' + sshKey + '\n'
|
+ ' IdentityFile ' + sshKey + '\n'
|
||||||
+ ' StrictHostKeyChecking accept-new\n'
|
+ ' StrictHostKeyChecking accept-new\n'
|
||||||
+ ' Then test: ssh ' + sshAlias + ' echo ok\n\n'
|
+ ' Then test: ssh ' + sshAlias + ' echo ok\n\n'
|
||||||
+ '2. Create project folders (use explicit mkdir, not brace expansion):\n'
|
+ '2. Create project folders and save connection details:\n'
|
||||||
+ ' mkdir -p server/docs server/scripts server/parquet server/metadata server/examples\n'
|
+ ' mkdir -p server/docs server/scripts server/parquet server/metadata server/examples\n'
|
||||||
+ ' mkdir -p user/duckdb user/notifications user/artifacts user/scripts user/parquet user/sessions\n\n'
|
+ ' mkdir -p user/duckdb user/notifications user/artifacts user/scripts user/parquet user/sessions\n'
|
||||||
|
+ ' Save this to .sync_connection (used by sync script to generate CLAUDE.md):\n'
|
||||||
|
+ ' ssh_alias=' + sshAlias + '\n'
|
||||||
|
+ ' server_host=' + serverHost + '\n'
|
||||||
|
+ ' webapp_url=' + webappUrl + '\n\n'
|
||||||
+ '3. Download from server via rsync (use --no-perms --no-group to avoid macOS permission errors).\n'
|
+ '3. Download from server via rsync (use --no-perms --no-group to avoid macOS permission errors).\n'
|
||||||
+ ' Skip directories that don\'t exist on the server (rsync exit code 23 = missing source).\n'
|
+ ' Skip directories that don\'t exist on the server (rsync exit code 23 = missing source).\n'
|
||||||
+ ' rsync -avz --no-perms --no-group ' + sshAlias + ':server/scripts/ ./server/scripts/\n'
|
+ ' rsync -avz --no-perms --no-group ' + sshAlias + ':server/scripts/ ./server/scripts/\n'
|
||||||
|
|
@ -2464,8 +2468,12 @@
|
||||||
+ ' pip install pandas pyarrow duckdb pyyaml python-dotenv\n\n'
|
+ ' pip install pandas pyarrow duckdb pyyaml python-dotenv\n\n'
|
||||||
+ '5. Initialize DuckDB (only if server/scripts/setup_views.sh exists):\n'
|
+ '5. Initialize DuckDB (only if server/scripts/setup_views.sh exists):\n'
|
||||||
+ ' bash server/scripts/setup_views.sh\n\n'
|
+ ' bash server/scripts/setup_views.sh\n\n'
|
||||||
+ '6. Create CLAUDE.md (only if server/docs/setup/claude_md_template.txt exists):\n'
|
+ '6. Create CLAUDE.md (if server/docs/setup/claude_md_template.txt exists):\n'
|
||||||
+ ' Copy the template, replace {username} with ' + username + '\n';
|
+ ' Copy the template, replace placeholders:\n'
|
||||||
|
+ ' {username} -> ' + username + '\n'
|
||||||
|
+ ' {ssh_alias} -> ' + sshAlias + '\n'
|
||||||
|
+ ' {server_host} -> ' + serverHost + '\n'
|
||||||
|
+ ' {webapp_url} -> ' + webappUrl + '\n';
|
||||||
|
|
||||||
var button = btn || document.getElementById('bootstrapCopyBtn');
|
var button = btn || document.getElementById('bootstrapCopyBtn');
|
||||||
var origText = button.textContent;
|
var origText = button.textContent;
|
||||||
|
|
|
||||||