Branding cleanup: remove Keboola-specific references from docs and config

- server/deploy.sh: KEBOOLA_ENV_FILE -> SYNC_ENV_FILE
- server/ws-gateway.service, notify-bot.service: remove Keboola from descriptions
- .gitignore: generic comment for data directory
- CLAUDE.md, README.md, ARCHITECTURE.md: update paths from src/adapters to connectors/
- docs/DATA_SOURCES.md: update custom connector guide to connectors/ pattern
- connectors/jira/README.md: keboola-analyst -> data-analyst in config paths
- dev_docs/desktop-app.md: KeboolaAnalyst -> DataAnalyst branding
This commit is contained in:
Petr 2026-03-09 12:22:27 +01:00
parent 266e8573d3
commit 38b86127ed
10 changed files with 56 additions and 55 deletions

2
.gitignore vendored
View file

@ -117,7 +117,7 @@ docs/data_description.md
.github/workflows/deploy.yml
# Project-specific: Data directory
# Downloaded data from Keboola - never commit
# Downloaded source data - never commit
data/
# Metadata tooling - entire folder

View file

@ -10,7 +10,7 @@ Data Source (Keboola / CSV / BigQuery)
| Data Broker Server |
| |
| src/data_sync.py |
| -> src/adapters/*.py (fetch data) |
| -> connectors/*.py (fetch data) |
| -> src/parquet_manager.py (convert) |
| |
| /data/src_data/parquet/ (output) |
@ -37,9 +37,8 @@ Pulls data from configured source, converts to Parquet.
| File | Role |
|------|------|
| `src/data_sync.py` | Orchestration + `DataSource` ABC (line 149) |
| `src/adapters/base.py` | Adapter interface |
| `src/adapters/keboola_adapter.py` | Keboola Storage adapter |
| `src/keboola_client.py` | Low-level Keboola API client |
| `connectors/keboola/adapter.py` | Keboola data source |
| `connectors/keboola/client.py` | Low-level Keboola API client |
| `src/parquet_manager.py` | CSV -> typed Parquet conversion |
| `src/config.py` | Reads `data_description.md` for table definitions |
| `src/profiler.py` | Data profiling for catalog UI |
@ -129,7 +128,7 @@ inject_config() context processor
## Key Patterns
- **Adapter pattern**: Factory in `src/adapters/__init__.py`, ABC in `src/data_sync.py`
- **Connector pattern**: Dynamic connector registry in `src/data_sync.py`, `connectors/keboola/` for reference
- **Atomic writes**: `tempfile.mkstemp()` + `os.fchmod()` + `os.replace()` for JSON state files
- **User home writes**: `sudo install -o {user} -g {user}` for writing to analyst home dirs
- **Config interpolation**: `${ENV_VAR}` in YAML resolved at load time, missing vars logged as warnings

View file

@ -33,11 +33,13 @@ Ask the user for:
```
├── src/ # Core data sync engine
│ ├── adapters/ # Data source adapters (Keboola, CSV, etc.)
│ ├── config.py # Configuration from data_description.md
│ ├── data_sync.py # Sync orchestration
│ ├── data_sync.py # Sync orchestration + DataSource ABC
│ ├── parquet_manager.py # Parquet file management
│ └── profiler.py # Data profiling
├── connectors/ # Data source connectors
│ ├── keboola/ # Keboola Storage connector
│ └── jira/ # Jira webhook connector
├── webapp/ # Flask web portal (login, dashboard, API)
├── server/ # Server deployment (systemd, scripts)
├── scripts/ # Utility scripts (sync, DuckDB setup)
@ -97,8 +99,8 @@ python -m src.data_sync
## Data Source Adapters
The platform supports pluggable data sources via `src/adapters/`:
- **Keboola** (`keboola`): Syncs from Keboola Storage API
The platform supports pluggable data sources via `connectors/`:
- **Keboola** (`keboola`): Syncs from Keboola Storage API (see `connectors/keboola/`)
- **CSV** (`csv`): Import from local CSV files (planned)
- **BigQuery** (`bigquery`): Query from Google BigQuery (planned)
@ -136,11 +138,11 @@ When reopening the project in Claude Code:
4. `inject_config()` context processor exposes `Config` to all Jinja templates
5. Templates use `{{ config.INSTANCE_NAME }}`, `{{ config.INSTANCE_SUBTITLE }}`, etc.
### Adapter Pattern
- Factory: `src/adapters/__init__.py` -> `create_data_source(adapter_type, **kwargs)`
- ABC: `DataSource` class in `src/data_sync.py` (lines 149-172)
- Keboola: `src/adapters/keboola_adapter.py` -> thin facade wrapping `LocalKeboolaSource`
- Core Keboola logic: `src/keboola_client.py` (788 lines, Keboola Storage API wrapper)
### Connector Pattern
- ABC: `DataSource` class in `src/data_sync.py`
- Registry: `create_data_source()` in `src/data_sync.py` auto-discovers connectors in `connectors/`
- Keboola: `connectors/keboola/adapter.py` -> `KeboolaDataSource` implementing `DataSource`
- Core Keboola logic: `connectors/keboola/client.py` (Keboola Storage API wrapper)
### Server Patterns
- Atomic JSON writes: `tempfile.mkstemp()` + `os.fchmod(fd, 0o660)` + `os.replace()`

View file

@ -82,14 +82,17 @@ ai-data-analyst/
│ └── data_description.md.example # Data schema template
├── src/ # Server-side Python code
│ ├── adapters/ # Data source adapters
│ │ ├── base.py # Adapter interface (ABC)
│ │ └── keboola_adapter.py # Keboola Storage adapter
│ ├── data_sync.py # Orchestrates data pull from sources
│ ├── data_sync.py # Orchestrates data pull + DataSource ABC
│ ├── parquet_manager.py # CSV to Parquet conversion
│ ├── config.py # Configuration loader
│ └── profiler.py # Data profiling for catalog
├── connectors/ # Data source connectors
│ ├── keboola/ # Keboola Storage connector
│ │ ├── adapter.py # KeboolaDataSource (implements DataSource)
│ │ └── client.py # Low-level Keboola API client
│ └── jira/ # Jira webhook connector
├── webapp/ # Flask web application
│ └── ... # User onboarding, settings, catalog
@ -124,7 +127,7 @@ ai-data-analyst/
| BigQuery | Planned | Google BigQuery adapter |
| Snowflake | Planned | Snowflake adapter |
Adding a new adapter means implementing the `DataSource` interface in `src/adapters/` and setting `data_source.type` in `config/instance.yaml`. See `src/adapters/base.py` for the contract.
Adding a new data source means creating a connector module in `connectors/` that implements the `DataSource` interface from `src/data_sync.py`, and setting `data_source.type` in `config/instance.yaml`. See `connectors/keboola/` for a reference implementation.
## Using with Claude Code

View file

@ -73,7 +73,7 @@ Real-time sync of Jira support tickets for AI-powered analysis.
┌─────────────────────────────────────────────────────────────────────────────┐
│ ANALYST MACHINE │
│ │
│ ~/keboola-analysis/
│ ~/data-analysis/
│ └── server/ │
│ └── parquet/ │
│ └── jira/ # Synced Parquet + attachments │
@ -540,7 +540,7 @@ Jira data is an **optional dataset** - not synced by default to save bandwidth.
**Enable Jira sync:**
```bash
# Edit local config (created on first sync_data.sh run)
nano ~/.config/keboola-analyst/sync.yaml
nano ~/.config/data-analyst/sync.yaml
# Change:
datasets:
@ -585,7 +585,7 @@ This is fast (only downloads files for one ticket) and keeps your local machine
If you need frequent access to attachments, enable full sync:
```yaml
# ~/.config/keboola-analyst/sync.yaml
# ~/.config/data-analyst/sync.yaml
datasets:
jira: true
jira_attachments: true # Syncs ~500MB+ of files

View file

@ -32,18 +32,18 @@ The WebSocket gateway (`server/ws_gateway/`) runs as a separate systemd service
## Building
```bash
cd macos-app/KeboolaAnalyst
xcodebuild -scheme KeboolaAnalyst -configuration Debug build
cd macos-app/DataAnalyst
xcodebuild -scheme DataAnalyst -configuration Debug build
```
The built app is at:
```
~/Library/Developer/Xcode/DerivedData/KeboolaAnalyst-*/Build/Products/Debug/KeboolaAnalyst.app
~/Library/Developer/Xcode/DerivedData/DataAnalyst-*/Build/Products/Debug/DataAnalyst.app
```
To run:
```bash
open ~/Library/Developer/Xcode/DerivedData/KeboolaAnalyst-*/Build/Products/Debug/KeboolaAnalyst.app
open ~/Library/Developer/Xcode/DerivedData/DataAnalyst-*/Build/Products/Debug/DataAnalyst.app
```
## Authentication Flow
@ -52,7 +52,7 @@ open ~/Library/Developer/Xcode/DerivedData/KeboolaAnalyst-*/Build/Products/Debug
2. Browser opens `https://your-instance.example.com/desktop/link`
3. User authenticates via Google SSO (if not already logged in)
4. User clicks **Authorize Desktop App**
5. Webapp generates a JWT token (HS256, 30-day expiry) and redirects to `keboola-analyst://auth?token=eyJ...`
5. Webapp generates a JWT token (HS256, 30-day expiry) and redirects to `data-analyst://auth?token=eyJ...`
6. macOS app catches the custom URL scheme, stores the JWT in Keychain
7. App connects to WebSocket gateway, sends `{"type":"auth","token":"..."}`
8. Gateway validates JWT and confirms with `{"type":"auth_ok","username":"..."}`
@ -86,7 +86,7 @@ Client -> Server: {"type":"pong"}
- **Persistence**: notifications stored in UserDefaults between launches
- **Keychain**: JWT token stored securely in macOS Keychain
- **Run scripts**: execute notification scripts on-demand via webapp API, results arrive as WS notifications
- **Logging**: `os.log` with subsystem `com.keboola.analyst`, category `WebSocket` -- view with `log show --predicate 'subsystem == "com.keboola.analyst"' --last 5m --info`
- **Logging**: `os.log` with subsystem `com.dataanalyst`, category `WebSocket` -- view with `log show --predicate 'subsystem == "com.dataanalyst"' --last 5m --info`
## Server Components
@ -161,12 +161,12 @@ location /ws/notifications {
## Project Structure
```
macos-app/KeboolaAnalyst/
KeboolaAnalyst.xcodeproj/
KeboolaAnalyst/
macos-app/DataAnalyst/
DataAnalyst.xcodeproj/
DataAnalyst/
App/
KeboolaAnalystApp.swift # @main, MenuBarExtra
AppDelegate.swift # URL scheme handler (keboola-analyst://)
DataAnalystApp.swift # @main, MenuBarExtra
AppDelegate.swift # URL scheme handler (data-analyst://)
Core/
Config.swift # URLs, timeouts, keychain names
KeychainService.swift # JWT storage in Keychain
@ -182,7 +182,7 @@ macos-app/KeboolaAnalyst/
NotificationDetail.swift # Full view with chart image
SettingsView.swift # Connection status, sign out
Info.plist # URL scheme registration
KeboolaAnalyst.entitlements # Network client permission
DataAnalyst.entitlements # Network client permission
```
## Troubleshooting
@ -208,7 +208,7 @@ sudo -u deploy curl -s --unix-socket /run/ws-gateway/ws.sock http://localhost/he
```
If connections is 0, restart the app. Check app logs:
```bash
/usr/bin/log show --predicate 'subsystem == "com.keboola.analyst"' --last 5m --info
/usr/bin/log show --predicate 'subsystem == "com.dataanalyst"' --last 5m --info
```
### Script runs but no notification appears

View file

@ -51,12 +51,12 @@ tables:
sync_strategy: "full_refresh"
```
## Writing a Custom Adapter
## Writing a Custom Connector
Create a new file in `src/adapters/`:
Create a new connector module in `connectors/<name>/adapter.py`:
```python
from ..data_sync import DataSource
from src.data_sync import DataSource
class MyDataSource(DataSource):
def sync_table(self, table_config, sync_state):
@ -65,9 +65,6 @@ class MyDataSource(DataSource):
pass
```
Register in `src/adapters/__init__.py`:
```python
if adapter_type == "my_source":
from .my_adapter import MyDataSource
return MyDataSource(**kwargs)
```
The `create_data_source()` function in `src/data_sync.py` auto-discovers connectors from the `connectors/` directory. Set `data_source.type` in `config/instance.yaml` to match the connector directory name (e.g., `keboola` for `connectors/keboola/`).
See `connectors/keboola/` for a complete reference implementation.

View file

@ -267,7 +267,7 @@ if [[ -f "${REPO_DIR}/server/limits-users.conf" ]]; then
fi
# Create data sync .env file from environment variables (passed from GitHub Actions)
KEBOOLA_ENV_FILE="${REPO_DIR}/.env"
SYNC_ENV_FILE="${REPO_DIR}/.env"
if [[ -n "${KEBOOLA_STORAGE_TOKEN:-}" ]]; then
log "Creating data sync .env file..."
{
@ -310,12 +310,12 @@ if [[ -n "${KEBOOLA_STORAGE_TOKEN:-}" ]]; then
if [[ -n "${ANTHROPIC_API_KEY:-}" ]]; then
echo "ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}"
fi
} | sudo /usr/bin/tee "$KEBOOLA_ENV_FILE" > /dev/null
sudo /usr/bin/chown root:data-ops "$KEBOOLA_ENV_FILE"
sudo /usr/bin/chmod 640 "$KEBOOLA_ENV_FILE"
} | sudo /usr/bin/tee "$SYNC_ENV_FILE" > /dev/null
sudo /usr/bin/chown root:data-ops "$SYNC_ENV_FILE"
sudo /usr/bin/chmod 640 "$SYNC_ENV_FILE"
log " Data sync .env created with secure permissions (640)"
else
log " Skipping data sync .env creation (no KEBOOLA_STORAGE_TOKEN provided)"
log " Skipping data sync .env creation (no sync credentials provided)"
fi
# Set correct permissions
@ -325,8 +325,8 @@ sudo /usr/bin/chmod -R 770 "$APP_DIR" # owner+group rwx, others none
sudo /usr/bin/chmod -R g+s "$APP_DIR" # setgid for new files
# Restore .env permissions (may have been overwritten by chmod -R)
if [[ -f "$KEBOOLA_ENV_FILE" ]]; then
sudo /usr/bin/chmod 640 "$KEBOOLA_ENV_FILE"
if [[ -f "$SYNC_ENV_FILE" ]]; then
sudo /usr/bin/chmod 640 "$SYNC_ENV_FILE"
fi
# Update and restart webapp if running

View file

@ -1,5 +1,5 @@
[Unit]
Description=Keboola Data Analyst Telegram Notification Bot
Description=Data Analyst Telegram Notification Bot
After=network-online.target
Wants=network-online.target
@ -12,7 +12,7 @@ ExecStart=/opt/data-analyst/.venv/bin/python -m server.telegram_bot.bot
Restart=always
RestartSec=10
# Environment (webapp .env + Keboola .env with bot token)
# Environment (webapp .env + sync .env with bot token)
EnvironmentFile=/opt/data-analyst/.env
EnvironmentFile=/opt/data-analyst/repo/.env

View file

@ -1,5 +1,5 @@
[Unit]
Description=WebSocket Gateway for Keboola Data Analyst
Description=WebSocket Gateway for Data Analyst
After=network.target
[Service]