Open-source AI data analyst platform extracted from internal repo. Includes data sync engine, Keboola adapter, Flask web portal, server deployment scripts, and configuration templates.
24 KiB
Session Exploration Guide
Guide for exploring Claude Code session transcripts to identify friction points and improve the analyst experience.
Session Data Location
Server: data-broker-for-claude (alias: kids)
Path: /data/user_sessions/
Sessions are collected by systemd service session-collector.timer (runs every 30 minutes).
Directory Structure
Sessions are organized by user:
/data/user_sessions/
├── petr/
│ ├── 2026-02-10_49898dbe-5045-45f5-9177-2ff10917de4a.jsonl
│ └── ...
├── martin.matejka/
│ └── ...
└── jakub.sochan/
└── ...
Permission Issue (See Issue #147)
Session files are owned by root:data-ops with -rw------- permissions, making direct scp impossible. Workaround: Use sudo cat over SSH:
# This FAILS
scp kids:/data/user_sessions/petr/session.jsonl .
# Use this instead
ssh kids "sudo cat /data/user_sessions/petr/session.jsonl" > session.jsonl
⚠️ CRITICAL: Common Analysis Mistakes
Based on real-world friction point analysis (Feb 2026), avoid these common pitfalls:
1. File mtime ≠ Session Time
WRONG:
# This finds files modified in last 48h, NOT sessions active in last 48h
find /data/user_sessions -name '*.jsonl' -mtime -2
Problem: File modification time is when collector WROTE the file, not when session started/ended.
Example:
- Session starts: 5.2.2026 15:56
- Session ends: 9.2.2026 20:50 (laptop closed for 4 days)
- File mtime: 9.2.2026 20:50
-mtime -2filter: MATCHES (file modified 2 days ago)- But session actually started 6 days ago!
CORRECT:
# Download sessions first, then filter by internal timestamps
for session in *.jsonl; do
FIRST_TS=$(jq -r 'select(.timestamp) | .timestamp' "$session" | head -1)
# Check if FIRST_TS is within your time window
done
2. Session Gaps (Laptop Closed)
WRONG:
# Summing elapsed times without checking for gaps
total_time = sum(event['data']['elapsedTimeSeconds'] for event in bash_events)
Problem: Sessions can span DAYS with laptop closed. Summing elapsed times includes idle time.
Example:
Session petr.hunka_2026-02-09_19c0a02f:
First event: 2026-02-05 16:06:52
Last event: 2026-02-09 15:18:50
Span: 3 days, 23 hours
Gap 1: 89.8 hours (laptop closed over weekend)
Gap 2: 4.9 hours (lunch break)
Gap 3: 5.4 hours (after work)
TOTAL ACTIVE TIME: only 4.6 minutes!
WRONG CALCULATION: 173.6 minutes (2.9 hours)
CORRECT:
# Group events into active blocks (gap >10 min = new block)
def calculate_active_time(timestamps):
blocks = []
current_block_start = timestamps[0]
prev_ts = timestamps[0]
for ts in timestamps[1:]:
gap_seconds = (ts - prev_ts).total_seconds()
if gap_seconds > 600: # >10 min gap
blocks.append((current_block_start, prev_ts))
current_block_start = ts
prev_ts = ts
blocks.append((current_block_start, timestamps[-1]))
total_active = sum((end - start).total_seconds() for start, end in blocks)
return total_active
3. Bash Progress Elapsed Times Are Cumulative
WRONG:
# Interpreting elapsed times as per-operation duration
for event in bash_progress_events:
if event['data']['elapsedTimeSeconds'] > 60:
print(f"This operation took {elapsed}s") # WRONG!
Problem: elapsedTimeSeconds is cumulative from command start, not per-operation.
Example:
bash_progress events for single rsync command:
Event 1: elapsedTimeSeconds: 11 (11s from start)
Event 2: elapsedTimeSeconds: 12 (12s from start)
Event 3: elapsedTimeSeconds: 13 (13s from start)
...
Event 96: elapsedTimeSeconds: 266 (266s from start)
These are NOT 96 separate 11-266s operations! This is ONE operation with 96 progress updates over 266 seconds.
CORRECT:
# Group by parent command, track min/max elapsed time
commands = {}
for event in bash_progress_events:
cmd_id = event.get('parentUuid') or event.get('uuid')
if cmd_id not in commands:
commands[cmd_id] = {'min': float('inf'), 'max': 0}
elapsed = event['data']['elapsedTimeSeconds']
commands[cmd_id]['min'] = min(commands[cmd_id]['min'], elapsed)
commands[cmd_id]['max'] = max(commands[cmd_id]['max'], elapsed)
# Total duration = max - min
for cmd_id, times in commands.items():
duration = times['max'] - times['min']
print(f"Command {cmd_id} took {duration}s")
4. Pre-Fix Sessions
WRONG: Reporting bugs that were already fixed.
Problem: Session file mtime doesn't tell you when session STARTED, only when it was written.
Example:
Issue #84 fixed: 2026-02-06 21:37:49
Session file: petr.hunka_2026-02-09_19c0a02f.jsonl
File mtime: 2026-02-09 20:50 (within 48h filter)
BUT: Session started 2026-02-05 15:56 (BEFORE fix!)
CORRECT:
# Always check when bug was fixed vs. when session started
git log --all --oneline --grep="issue-number" -- relevant/files
# Extract session start time
FIRST_TS=$(jq -r 'select(.timestamp) | .timestamp' session.jsonl | head -1)
# Compare: if session started before fix, ignore the bug report
5. Context-Free Bash Commands
WRONG:
# Reporting this as "user manually typed /tmp pattern"
pip freeze > /tmp/analyst_requirements.txt
scp /tmp/analyst_requirements.txt data-analyst:/tmp/...
Problem: You don't know if this was:
- From our bootstrap.yaml (intended)
- Claude improvising during troubleshooting (bug)
- User typed manually (edge case)
CORRECT:
# Look at surrounding context in session
jq -r 'select(.type == "user") | .message.content' session.jsonl | tail -20
# Check if user pasted bootstrap instructions
grep -B10 -A10 "/tmp/analyst_requirements" session.jsonl
# Check our codebase - is this pattern in our scripts?
git grep "/tmp/analyst_requirements" -- scripts/ docs/
Session JSONL Format
Each session is stored as a JSONL file with one JSON object per line. Each line represents an event in the session.
Event Types
Based on real session analysis:
- user - User messages
- assistant - Assistant responses (contains tool_use in content array)
- progress - Progress updates (hooks, bash execution)
- system - System messages (often empty)
- file-history-snapshot - File tracking snapshots
- queue-operation - Queue management events
- summary - Session summaries
Common Event Structure
All events share:
{
"type": "user|assistant|progress|system|...",
"sessionId": "uuid",
"timestamp": "ISO 8601",
"cwd": "/path/to/working/dir",
"version": "2.1.29",
"uuid": "event-uuid",
"parentUuid": "parent-event-uuid"
}
User Event
{
"type": "user",
"message": {
"role": "user",
"content": "user's message text"
},
"isMeta": false
}
Assistant Event with Tool Use
{
"type": "assistant",
"message": {
"role": "assistant",
"content": [
{
"type": "text",
"text": "response text"
},
{
"type": "tool_use",
"id": "tool-use-id",
"name": "Bash",
"input": {
"command": "ls -la",
"description": "List files"
}
}
]
}
}
Progress Event - Bash Execution
Important: Exit codes are NOT present in bash_progress events! Only output and timing.
{
"type": "progress",
"data": {
"type": "bash_progress",
"output": "command output",
"fullOutput": "complete output with formatting",
"elapsedTimeSeconds": 1.234,
"totalLines": 10,
"timeoutMs": 120000
}
}
Progress Event - Hook
{
"type": "progress",
"data": {
"type": "hook_progress",
"hookEvent": "SessionStart",
"hookName": "SessionStart:clear",
"command": "${CLAUDE_PLUGIN_ROOT}/hooks-handlers/session-start.sh"
}
}
Key Insight for Friction Detection
⚠️ Exit codes and explicit errors are NOT tracked in the current session format!
To find friction points, look for:
- Error keywords in bash output strings
- Repeated similar commands (retry patterns)
- User questions/confusion in messages
- Long bash execution times
- Context overflow (session continuations)
Practical Commands
Browse Sessions on Server
# List all sessions
ssh kids "ls -lh /data/user_sessions/"
# Count sessions
ssh kids "ls /data/user_sessions/ | wc -l"
# Find recent sessions (last 7 days)
ssh kids "find /data/user_sessions -name '*.jsonl' -mtime -7"
# Find sessions by user
ssh kids "ls /data/user_sessions/ | grep '^petr-'"
Download Sessions for Local Analysis
# Download specific session
scp kids:/data/user_sessions/petr-2024-12-15-abc123.jsonl .
# Download all sessions for a user
scp kids:/data/user_sessions/petr-*.jsonl ./sessions/
# Download recent sessions
ssh kids "find /data/user_sessions -mtime -7" | xargs -I {} scp kids:{} ./sessions/
Quick Stats on Server
# Count events per session
ssh kids "wc -l /data/user_sessions/*.jsonl"
# Count tool uses
ssh kids "grep -h '\"type\": \"tool_use\"' /data/user_sessions/*.jsonl | wc -l"
# Count errors
ssh kids "grep -h '\"is_error\": true' /data/user_sessions/*.jsonl | wc -l"
jq Queries for Analysis
Extract All Bash Commands
# Extract from assistant messages with tool_use
jq -r 'select(.type == "assistant") | .message.content[]? | select(.type == "tool_use" and .name == "Bash") | .input.command' session.jsonl
Find Bash Output (for error keywords)
# Extract bash execution output
jq -r 'select(.type == "progress" and .data.type == "bash_progress") | .data.fullOutput' session.jsonl
Find Error Keywords in Output
Since exit codes aren't tracked, search output strings:
# Search bash output for errors
jq -r 'select(.type == "progress" and .data.type == "bash_progress") | .data.fullOutput' session.jsonl | grep -i "error\|failed\|permission denied"
⚠️ Exit Code Queries Don't Work
Exit codes are NOT present in the current session format. These queries will return nothing:
# These DON'T work with current format
jq 'select(.exit_code == 127)' session.jsonl # Returns nothing
jq 'select(.data.exitCode != 0)' session.jsonl # Returns nothing
Count Tool Usage
jq -r 'select(.type == "tool_use") | .name' session.jsonl | sort | uniq -c
Extract Error Messages
jq -r 'select(.is_error == true) | .content' session.jsonl
Find Retry Patterns
# Find repeated similar commands (potential retry loops)
jq -r 'select(.type == "tool_use" and .name == "Bash") | .input.command' session.jsonl | sort | uniq -c | sort -rn
Session Timeline
jq -r '[.timestamp, .type, .name // .role] | @tsv' session.jsonl
Full Tool Use + Result
# Extract tool use followed by its result
jq -s 'group_by(.tool_use_id // empty) | .[] | select(length == 2)' session.jsonl
grep Patterns for Friction Points
Permission Errors
grep -i "permission denied" session.jsonl
grep -i "access denied" session.jsonl
grep -i "not permitted" session.jsonl
Command Not Found
grep "command not found" session.jsonl
grep "No such file or directory" session.jsonl
Authentication Issues
grep -i "authentication failed" session.jsonl
grep -i "unauthorized" session.jsonl
grep -i "forbidden" session.jsonl
Timeout Issues
grep -i "timeout" session.jsonl
grep -i "timed out" session.jsonl
Data Sync Issues
grep "sync_data.sh" session.jsonl
grep "rsync" session.jsonl | grep -i "error\|fail"
Python/Environment Issues
grep "ModuleNotFoundError" session.jsonl
grep "ImportError" session.jsonl
grep "No module named" session.jsonl
What to Look For (Friction Points)
1. Tool Failures
Indicators:
"is_error": truein tool results- Non-zero exit codes
- Exceptions in tool output
Check:
- Which tools fail most often?
- Are there common error patterns?
- Do errors lead to retry loops?
2. Permission Issues
Indicators:
- Exit code 126
- "Permission denied" messages
- Sudoers-related errors
Check:
- What operations need elevated permissions?
- Are there gaps in sudoers configuration?
- Do users hit permission walls frequently?
3. Command Not Found
Indicators:
- Exit code 127
- "command not found" messages
Check:
- Missing system utilities?
- PATH issues?
- Typos in commands vs. actual bugs?
4. Retry Loops
Indicators:
- Same command repeated multiple times
- Error → retry → error pattern
- High tool use count for single task
Check:
- What triggers retries?
- Are retries effective or just wasting time?
- Could better error messages prevent retries?
5. Data Sync Problems
Indicators:
- rsync errors
- Missing files after sync
- Stale data complaints
Check:
- Network issues?
- Permission problems on server?
- User confusion about when to sync?
6. Environment Setup Issues
Indicators:
- Missing Python modules
- Virtual environment problems
- Dependency conflicts
Check:
- Are setup instructions clear?
- Do users skip setup steps?
- Are requirements.txt files accurate?
7. User Confusion
Indicators:
- Repeated questions about same topic
- Commands tried in wrong directory
- Misunderstanding of data structure
Check:
- Documentation gaps?
- Confusing error messages?
- Missing guidance in critical moments?
Creating GitHub Issues
When to Create an Issue
Create a GitHub issue when you find:
- Repeated failures across multiple sessions
- Systematic problems (not user typos)
- Missing features that would prevent friction
- Documentation gaps that cause confusion
- Security or permission model issues
Issue Template
## Friction Point: [Short Description]
**Source:** Session exploration - [username]-[date]-[session_id]
**Frequency:** [How often this appears in sessions]
**Impact:** [High/Medium/Low]
### Problem
[Clear description of what goes wrong]
### Evidence
[Relevant excerpts from session JSONL]
### Root Cause
[Your analysis of why this happens]
### Proposed Solution
[How to fix this - could be code, documentation, or process change]
### Related Sessions
- session1.jsonl
- session2.jsonl
Labels to Use
user-feedback- Always use this for friction points from sessionsclaude-learnings- If the issue reveals Claude-specific patternsbug- If something is brokendocumentation- If docs are missing or unclearenhancement- If a feature would prevent the frictionsecurity- If related to permissions or access controlpipeline- If data sync or transformation related
Example Exploration Workflow
Step 1: Get Overview
# How many sessions do we have?
ssh kids "ls /data/user_sessions/ | wc -l"
# When was the last session?
ssh kids "ls -lt /data/user_sessions/ | head -5"
# Which users have sessions?
ssh kids "ls /data/user_sessions/ | cut -d- -f1 | sort | uniq -c"
Step 2: Sample Sessions
# Download a few recent sessions
mkdir -p ~/session-exploration
cd ~/session-exploration
# Get 5 most recent
ssh kids "ls -t /data/user_sessions/*.jsonl | head -5" | xargs -I {} scp kids:{} .
Step 3: Quick Analysis
# For each session, check:
for session in *.jsonl; do
echo "=== $session ==="
echo "Events: $(wc -l < $session)"
echo "Errors: $(jq 'select(.is_error == true)' $session | wc -l)"
echo "Exit codes: $(jq -r 'select(.exit_code != null and .exit_code != 0) | .exit_code' $session | sort | uniq -c)"
echo ""
done
Step 4: Deep Dive on Errors
# Extract all errors to a file
jq 'select(.is_error == true)' *.jsonl > all_errors.json
# Analyze error patterns
jq -r '.content' all_errors.json | sort | uniq -c | sort -rn
Step 5: Create Issues
For each distinct friction point:
- Document the pattern
- Collect evidence from sessions
- Propose solution
- Create GitHub issue with appropriate labels
Tips
- Start small: Analyze 5-10 sessions first before scaling up
- Look for patterns: Single errors might be user mistakes, repeated errors are friction points
- Context matters: Read surrounding messages to understand user intent
- Time analysis: Check if friction points occur at specific times (e.g., after server updates)
- User comparison: Do all users hit the same issues or are some user-specific?
Practical Example: Correct Session Analysis
Based on lessons learned from Feb 2026 friction point analysis:
Step 1: Download Sessions with Correct Time Filtering
# Create analysis directory
mkdir -p ~/session-analysis/raw
# Download ALL sessions (we'll filter by timestamps later)
# OR download sessions from specific date range based on filename
ssh kids "find /data/user_sessions -name '*2026-02-1[01]*.jsonl' -type f" | while read session; do
filename=$(echo $session | sed 's/\/data\/user_sessions\///' | tr '/' '_')
ssh kids "sudo cat $session" > ~/session-analysis/raw/$filename
done
Step 2: Filter by Internal Timestamps
#!/usr/bin/env python3
"""Filter sessions by actual start/end time, not file mtime."""
import json
from datetime import datetime, timezone, timedelta
from pathlib import Path
# Define time window
NOW = datetime.now(timezone.utc)
WINDOW_START = NOW - timedelta(hours=48)
session_dir = Path("~/session-analysis/raw").expanduser()
valid_sessions = []
for session_file in session_dir.glob("*.jsonl"):
try:
# Get first and last timestamp
with open(session_file) as f:
lines = f.readlines()
first_event = json.loads(lines[0])
last_event = json.loads(lines[-1])
first_ts = datetime.fromisoformat(first_event['timestamp'].replace('Z', '+00:00'))
last_ts = datetime.fromisoformat(last_event['timestamp'].replace('Z', '+00:00'))
# Check if session STARTED within window (not just ended)
if first_ts >= WINDOW_START:
valid_sessions.append({
'file': session_file.name,
'start': first_ts,
'end': last_ts,
'duration': last_ts - first_ts
})
print(f"✓ {session_file.name}: {first_ts} -> {last_ts}")
else:
print(f"✗ {session_file.name}: started {first_ts} (before window)")
except Exception as e:
print(f"✗ {session_file.name}: error - {e}")
print(f"\nFound {len(valid_sessions)} sessions within 48h window")
Step 3: Calculate Active Time (Skip Gaps)
#!/usr/bin/env python3
"""Calculate real active time, excluding laptop-closed gaps."""
import json
from datetime import datetime
from pathlib import Path
def calculate_active_time(session_file, gap_threshold_minutes=10):
"""
Calculate active time in session, excluding gaps >threshold.
Args:
session_file: Path to JSONL session file
gap_threshold_minutes: Gap >this = laptop closed (default 10 min)
Returns:
dict with total_span, active_time, gaps
"""
timestamps = []
with open(session_file) as f:
for line in f:
try:
event = json.loads(line)
if ts_str := event.get('timestamp'):
ts = datetime.fromisoformat(ts_str.replace('Z', '+00:00'))
timestamps.append(ts)
except:
pass
if len(timestamps) < 2:
return None
timestamps.sort()
# Find gaps
gaps = []
blocks = []
block_start = timestamps[0]
prev_ts = timestamps[0]
for ts in timestamps[1:]:
gap_seconds = (ts - prev_ts).total_seconds()
if gap_seconds > gap_threshold_minutes * 60:
# End current block
blocks.append((block_start, prev_ts))
gaps.append({
'start': prev_ts,
'end': ts,
'duration_hours': gap_seconds / 3600
})
block_start = ts
prev_ts = ts
# Last block
blocks.append((block_start, timestamps[-1]))
# Calculate metrics
total_span = (timestamps[-1] - timestamps[0]).total_seconds()
active_time = sum((end - start).total_seconds() for start, end in blocks)
gap_time = total_span - active_time
return {
'total_span_hours': total_span / 3600,
'active_time_hours': active_time / 3600,
'gap_time_hours': gap_time / 3600,
'num_gaps': len(gaps),
'gaps': gaps,
'num_blocks': len(blocks)
}
# Example usage
session = Path("~/session-analysis/raw/petr.hunka_2026-02-09_19c0a02f.jsonl").expanduser()
result = calculate_active_time(session)
print(f"Total span: {result['total_span_hours']:.2f} hours")
print(f"Active time: {result['active_time_hours']:.2f} hours")
print(f"Gap time: {result['gap_time_hours']:.2f} hours")
print(f"Number of gaps: {result['num_gaps']}")
for i, gap in enumerate(result['gaps'], 1):
print(f" Gap {i}: {gap['duration_hours']:.1f} hours ({gap['start']} -> {gap['end']})")
Step 4: Verify Bug Fix Timeline
#!/bin/bash
# Check if session contains pre-fix bugs
SESSION_FILE="$1"
ISSUE_NUMBER="$2" # e.g., 84
# Get session start time
FIRST_TS=$(jq -r 'select(.timestamp) | .timestamp' "$SESSION_FILE" | head -1)
echo "Session started: $FIRST_TS"
# Get when issue was fixed
echo -e "\nIssue #$ISSUE_NUMBER fix timeline:"
gh issue view $ISSUE_NUMBER --json closedAt,title | jq -r '"Closed: " + .closedAt + " - " + .title'
# Get relevant commits around that time
echo -e "\nRelevant commits:"
CLOSE_DATE=$(gh issue view $ISSUE_NUMBER --json closedAt -q .closedAt | cut -d'T' -f1)
git log --all --since="$CLOSE_DATE" --until="$(date -v+1d -j -f "%Y-%m-%d" "$CLOSE_DATE" +%Y-%m-%d)" \
--oneline --grep="$ISSUE_NUMBER\|tmp\|requirements" -- scripts/ docs/setup/
echo -e "\nConclusion:"
if [[ "$FIRST_TS" < "$(gh issue view $ISSUE_NUMBER --json closedAt -q .closedAt)" ]]; then
echo "⚠️ Session STARTED BEFORE fix - bug is expected"
else
echo "✓ Session started AFTER fix - bug should not appear"
fi
Step 5: Generate Accurate Report
#!/usr/bin/env python3
"""Generate friction report with verified metrics."""
import json
from pathlib import Path
from collections import defaultdict
def analyze_friction_points(sessions_dir):
"""
Analyze friction points with proper methodology.
Avoids:
- False positives from pre-fix sessions
- Inflated time estimates from gaps
- Misinterpreted cumulative elapsed times
"""
friction_points = defaultdict(list)
for session_file in Path(sessions_dir).glob("*.jsonl"):
# 1. Check session time window
active_time = calculate_active_time(session_file)
if not active_time or active_time['total_span_hours'] > 48:
continue # Skip multi-day sessions or invalid
# 2. Extract friction indicators
with open(session_file) as f:
for line in f:
event = json.loads(line)
# Example: Permission errors
if event.get('type') == 'progress':
output = event.get('data', {}).get('fullOutput', '')
if 'Permission denied' in output:
friction_points['permission_errors'].append({
'session': session_file.name,
'timestamp': event.get('timestamp'),
'output': output[:200]
})
# Example: Slow operations (using proper elapsed time interpretation)
if event.get('type') == 'progress' and event.get('data', {}).get('type') == 'bash_progress':
# This is cumulative, not per-operation!
# Track command start/end, not individual progress events
pass
return friction_points
# Generate report
friction = analyze_friction_points("~/session-analysis/raw")
for category, incidents in friction.items():
print(f"\n## {category.replace('_', ' ').title()}")
print(f"Found {len(incidents)} incidents")
# ... detailed reporting
Future Automation
Once we understand common patterns, we can build:
- Automated friction detection scripts
- Session quality metrics
- Alerting for critical failures
- User experience dashboards
But start with manual exploration to learn what matters.