Branding cleanup: remove Keboola-specific references from docs and config

- server/deploy.sh: KEBOOLA_ENV_FILE -> SYNC_ENV_FILE
- server/ws-gateway.service, notify-bot.service: remove Keboola from descriptions
- .gitignore: generic comment for data directory
- CLAUDE.md, README.md, ARCHITECTURE.md: update paths from src/adapters to connectors/
- docs/DATA_SOURCES.md: update custom connector guide to connectors/ pattern
- connectors/jira/README.md: keboola-analyst -> data-analyst in config paths
- dev_docs/desktop-app.md: KeboolaAnalyst -> DataAnalyst branding
This commit is contained in:
Petr 2026-03-09 12:22:27 +01:00
parent 266e8573d3
commit 38b86127ed
10 changed files with 56 additions and 55 deletions

2
.gitignore vendored
View file

@ -117,7 +117,7 @@ docs/data_description.md
.github/workflows/deploy.yml .github/workflows/deploy.yml
# Project-specific: Data directory # Project-specific: Data directory
# Downloaded data from Keboola - never commit # Downloaded source data - never commit
data/ data/
# Metadata tooling - entire folder # Metadata tooling - entire folder

View file

@ -10,7 +10,7 @@ Data Source (Keboola / CSV / BigQuery)
| Data Broker Server | | Data Broker Server |
| | | |
| src/data_sync.py | | src/data_sync.py |
| -> src/adapters/*.py (fetch data) | | -> connectors/*.py (fetch data) |
| -> src/parquet_manager.py (convert) | | -> src/parquet_manager.py (convert) |
| | | |
| /data/src_data/parquet/ (output) | | /data/src_data/parquet/ (output) |
@ -37,9 +37,8 @@ Pulls data from configured source, converts to Parquet.
| File | Role | | File | Role |
|------|------| |------|------|
| `src/data_sync.py` | Orchestration + `DataSource` ABC (line 149) | | `src/data_sync.py` | Orchestration + `DataSource` ABC (line 149) |
| `src/adapters/base.py` | Adapter interface | | `connectors/keboola/adapter.py` | Keboola data source |
| `src/adapters/keboola_adapter.py` | Keboola Storage adapter | | `connectors/keboola/client.py` | Low-level Keboola API client |
| `src/keboola_client.py` | Low-level Keboola API client |
| `src/parquet_manager.py` | CSV -> typed Parquet conversion | | `src/parquet_manager.py` | CSV -> typed Parquet conversion |
| `src/config.py` | Reads `data_description.md` for table definitions | | `src/config.py` | Reads `data_description.md` for table definitions |
| `src/profiler.py` | Data profiling for catalog UI | | `src/profiler.py` | Data profiling for catalog UI |
@ -129,7 +128,7 @@ inject_config() context processor
## Key Patterns ## Key Patterns
- **Adapter pattern**: Factory in `src/adapters/__init__.py`, ABC in `src/data_sync.py` - **Connector pattern**: Dynamic connector registry in `src/data_sync.py`, `connectors/keboola/` for reference
- **Atomic writes**: `tempfile.mkstemp()` + `os.fchmod()` + `os.replace()` for JSON state files - **Atomic writes**: `tempfile.mkstemp()` + `os.fchmod()` + `os.replace()` for JSON state files
- **User home writes**: `sudo install -o {user} -g {user}` for writing to analyst home dirs - **User home writes**: `sudo install -o {user} -g {user}` for writing to analyst home dirs
- **Config interpolation**: `${ENV_VAR}` in YAML resolved at load time, missing vars logged as warnings - **Config interpolation**: `${ENV_VAR}` in YAML resolved at load time, missing vars logged as warnings

View file

@ -33,11 +33,13 @@ Ask the user for:
``` ```
├── src/ # Core data sync engine ├── src/ # Core data sync engine
│ ├── adapters/ # Data source adapters (Keboola, CSV, etc.)
│ ├── config.py # Configuration from data_description.md │ ├── config.py # Configuration from data_description.md
│ ├── data_sync.py # Sync orchestration │ ├── data_sync.py # Sync orchestration + DataSource ABC
│ ├── parquet_manager.py # Parquet file management │ ├── parquet_manager.py # Parquet file management
│ └── profiler.py # Data profiling │ └── profiler.py # Data profiling
├── connectors/ # Data source connectors
│ ├── keboola/ # Keboola Storage connector
│ └── jira/ # Jira webhook connector
├── webapp/ # Flask web portal (login, dashboard, API) ├── webapp/ # Flask web portal (login, dashboard, API)
├── server/ # Server deployment (systemd, scripts) ├── server/ # Server deployment (systemd, scripts)
├── scripts/ # Utility scripts (sync, DuckDB setup) ├── scripts/ # Utility scripts (sync, DuckDB setup)
@ -97,8 +99,8 @@ python -m src.data_sync
## Data Source Adapters ## Data Source Adapters
The platform supports pluggable data sources via `src/adapters/`: The platform supports pluggable data sources via `connectors/`:
- **Keboola** (`keboola`): Syncs from Keboola Storage API - **Keboola** (`keboola`): Syncs from Keboola Storage API (see `connectors/keboola/`)
- **CSV** (`csv`): Import from local CSV files (planned) - **CSV** (`csv`): Import from local CSV files (planned)
- **BigQuery** (`bigquery`): Query from Google BigQuery (planned) - **BigQuery** (`bigquery`): Query from Google BigQuery (planned)
@ -136,11 +138,11 @@ When reopening the project in Claude Code:
4. `inject_config()` context processor exposes `Config` to all Jinja templates 4. `inject_config()` context processor exposes `Config` to all Jinja templates
5. Templates use `{{ config.INSTANCE_NAME }}`, `{{ config.INSTANCE_SUBTITLE }}`, etc. 5. Templates use `{{ config.INSTANCE_NAME }}`, `{{ config.INSTANCE_SUBTITLE }}`, etc.
### Adapter Pattern ### Connector Pattern
- Factory: `src/adapters/__init__.py` -> `create_data_source(adapter_type, **kwargs)` - ABC: `DataSource` class in `src/data_sync.py`
- ABC: `DataSource` class in `src/data_sync.py` (lines 149-172) - Registry: `create_data_source()` in `src/data_sync.py` auto-discovers connectors in `connectors/`
- Keboola: `src/adapters/keboola_adapter.py` -> thin facade wrapping `LocalKeboolaSource` - Keboola: `connectors/keboola/adapter.py` -> `KeboolaDataSource` implementing `DataSource`
- Core Keboola logic: `src/keboola_client.py` (788 lines, Keboola Storage API wrapper) - Core Keboola logic: `connectors/keboola/client.py` (Keboola Storage API wrapper)
### Server Patterns ### Server Patterns
- Atomic JSON writes: `tempfile.mkstemp()` + `os.fchmod(fd, 0o660)` + `os.replace()` - Atomic JSON writes: `tempfile.mkstemp()` + `os.fchmod(fd, 0o660)` + `os.replace()`

View file

@ -82,14 +82,17 @@ ai-data-analyst/
│ └── data_description.md.example # Data schema template │ └── data_description.md.example # Data schema template
├── src/ # Server-side Python code ├── src/ # Server-side Python code
│ ├── adapters/ # Data source adapters │ ├── data_sync.py # Orchestrates data pull + DataSource ABC
│ │ ├── base.py # Adapter interface (ABC)
│ │ └── keboola_adapter.py # Keboola Storage adapter
│ ├── data_sync.py # Orchestrates data pull from sources
│ ├── parquet_manager.py # CSV to Parquet conversion │ ├── parquet_manager.py # CSV to Parquet conversion
│ ├── config.py # Configuration loader │ ├── config.py # Configuration loader
│ └── profiler.py # Data profiling for catalog │ └── profiler.py # Data profiling for catalog
├── connectors/ # Data source connectors
│ ├── keboola/ # Keboola Storage connector
│ │ ├── adapter.py # KeboolaDataSource (implements DataSource)
│ │ └── client.py # Low-level Keboola API client
│ └── jira/ # Jira webhook connector
├── webapp/ # Flask web application ├── webapp/ # Flask web application
│ └── ... # User onboarding, settings, catalog │ └── ... # User onboarding, settings, catalog
@ -124,7 +127,7 @@ ai-data-analyst/
| BigQuery | Planned | Google BigQuery adapter | | BigQuery | Planned | Google BigQuery adapter |
| Snowflake | Planned | Snowflake adapter | | Snowflake | Planned | Snowflake adapter |
Adding a new adapter means implementing the `DataSource` interface in `src/adapters/` and setting `data_source.type` in `config/instance.yaml`. See `src/adapters/base.py` for the contract. Adding a new data source means creating a connector module in `connectors/` that implements the `DataSource` interface from `src/data_sync.py`, and setting `data_source.type` in `config/instance.yaml`. See `connectors/keboola/` for a reference implementation.
## Using with Claude Code ## Using with Claude Code

View file

@ -73,7 +73,7 @@ Real-time sync of Jira support tickets for AI-powered analysis.
┌─────────────────────────────────────────────────────────────────────────────┐ ┌─────────────────────────────────────────────────────────────────────────────┐
│ ANALYST MACHINE │ │ ANALYST MACHINE │
│ │ │ │
│ ~/keboola-analysis/ │ ~/data-analysis/
│ └── server/ │ │ └── server/ │
│ └── parquet/ │ │ └── parquet/ │
│ └── jira/ # Synced Parquet + attachments │ │ └── jira/ # Synced Parquet + attachments │
@ -540,7 +540,7 @@ Jira data is an **optional dataset** - not synced by default to save bandwidth.
**Enable Jira sync:** **Enable Jira sync:**
```bash ```bash
# Edit local config (created on first sync_data.sh run) # Edit local config (created on first sync_data.sh run)
nano ~/.config/keboola-analyst/sync.yaml nano ~/.config/data-analyst/sync.yaml
# Change: # Change:
datasets: datasets:
@ -585,7 +585,7 @@ This is fast (only downloads files for one ticket) and keeps your local machine
If you need frequent access to attachments, enable full sync: If you need frequent access to attachments, enable full sync:
```yaml ```yaml
# ~/.config/keboola-analyst/sync.yaml # ~/.config/data-analyst/sync.yaml
datasets: datasets:
jira: true jira: true
jira_attachments: true # Syncs ~500MB+ of files jira_attachments: true # Syncs ~500MB+ of files

View file

@ -32,18 +32,18 @@ The WebSocket gateway (`server/ws_gateway/`) runs as a separate systemd service
## Building ## Building
```bash ```bash
cd macos-app/KeboolaAnalyst cd macos-app/DataAnalyst
xcodebuild -scheme KeboolaAnalyst -configuration Debug build xcodebuild -scheme DataAnalyst -configuration Debug build
``` ```
The built app is at: The built app is at:
``` ```
~/Library/Developer/Xcode/DerivedData/KeboolaAnalyst-*/Build/Products/Debug/KeboolaAnalyst.app ~/Library/Developer/Xcode/DerivedData/DataAnalyst-*/Build/Products/Debug/DataAnalyst.app
``` ```
To run: To run:
```bash ```bash
open ~/Library/Developer/Xcode/DerivedData/KeboolaAnalyst-*/Build/Products/Debug/KeboolaAnalyst.app open ~/Library/Developer/Xcode/DerivedData/DataAnalyst-*/Build/Products/Debug/DataAnalyst.app
``` ```
## Authentication Flow ## Authentication Flow
@ -52,7 +52,7 @@ open ~/Library/Developer/Xcode/DerivedData/KeboolaAnalyst-*/Build/Products/Debug
2. Browser opens `https://your-instance.example.com/desktop/link` 2. Browser opens `https://your-instance.example.com/desktop/link`
3. User authenticates via Google SSO (if not already logged in) 3. User authenticates via Google SSO (if not already logged in)
4. User clicks **Authorize Desktop App** 4. User clicks **Authorize Desktop App**
5. Webapp generates a JWT token (HS256, 30-day expiry) and redirects to `keboola-analyst://auth?token=eyJ...` 5. Webapp generates a JWT token (HS256, 30-day expiry) and redirects to `data-analyst://auth?token=eyJ...`
6. macOS app catches the custom URL scheme, stores the JWT in Keychain 6. macOS app catches the custom URL scheme, stores the JWT in Keychain
7. App connects to WebSocket gateway, sends `{"type":"auth","token":"..."}` 7. App connects to WebSocket gateway, sends `{"type":"auth","token":"..."}`
8. Gateway validates JWT and confirms with `{"type":"auth_ok","username":"..."}` 8. Gateway validates JWT and confirms with `{"type":"auth_ok","username":"..."}`
@ -86,7 +86,7 @@ Client -> Server: {"type":"pong"}
- **Persistence**: notifications stored in UserDefaults between launches - **Persistence**: notifications stored in UserDefaults between launches
- **Keychain**: JWT token stored securely in macOS Keychain - **Keychain**: JWT token stored securely in macOS Keychain
- **Run scripts**: execute notification scripts on-demand via webapp API, results arrive as WS notifications - **Run scripts**: execute notification scripts on-demand via webapp API, results arrive as WS notifications
- **Logging**: `os.log` with subsystem `com.keboola.analyst`, category `WebSocket` -- view with `log show --predicate 'subsystem == "com.keboola.analyst"' --last 5m --info` - **Logging**: `os.log` with subsystem `com.dataanalyst`, category `WebSocket` -- view with `log show --predicate 'subsystem == "com.dataanalyst"' --last 5m --info`
## Server Components ## Server Components
@ -161,12 +161,12 @@ location /ws/notifications {
## Project Structure ## Project Structure
``` ```
macos-app/KeboolaAnalyst/ macos-app/DataAnalyst/
KeboolaAnalyst.xcodeproj/ DataAnalyst.xcodeproj/
KeboolaAnalyst/ DataAnalyst/
App/ App/
KeboolaAnalystApp.swift # @main, MenuBarExtra DataAnalystApp.swift # @main, MenuBarExtra
AppDelegate.swift # URL scheme handler (keboola-analyst://) AppDelegate.swift # URL scheme handler (data-analyst://)
Core/ Core/
Config.swift # URLs, timeouts, keychain names Config.swift # URLs, timeouts, keychain names
KeychainService.swift # JWT storage in Keychain KeychainService.swift # JWT storage in Keychain
@ -182,7 +182,7 @@ macos-app/KeboolaAnalyst/
NotificationDetail.swift # Full view with chart image NotificationDetail.swift # Full view with chart image
SettingsView.swift # Connection status, sign out SettingsView.swift # Connection status, sign out
Info.plist # URL scheme registration Info.plist # URL scheme registration
KeboolaAnalyst.entitlements # Network client permission DataAnalyst.entitlements # Network client permission
``` ```
## Troubleshooting ## Troubleshooting
@ -208,7 +208,7 @@ sudo -u deploy curl -s --unix-socket /run/ws-gateway/ws.sock http://localhost/he
``` ```
If connections is 0, restart the app. Check app logs: If connections is 0, restart the app. Check app logs:
```bash ```bash
/usr/bin/log show --predicate 'subsystem == "com.keboola.analyst"' --last 5m --info /usr/bin/log show --predicate 'subsystem == "com.dataanalyst"' --last 5m --info
``` ```
### Script runs but no notification appears ### Script runs but no notification appears

View file

@ -51,12 +51,12 @@ tables:
sync_strategy: "full_refresh" sync_strategy: "full_refresh"
``` ```
## Writing a Custom Adapter ## Writing a Custom Connector
Create a new file in `src/adapters/`: Create a new connector module in `connectors/<name>/adapter.py`:
```python ```python
from ..data_sync import DataSource from src.data_sync import DataSource
class MyDataSource(DataSource): class MyDataSource(DataSource):
def sync_table(self, table_config, sync_state): def sync_table(self, table_config, sync_state):
@ -65,9 +65,6 @@ class MyDataSource(DataSource):
pass pass
``` ```
Register in `src/adapters/__init__.py`: The `create_data_source()` function in `src/data_sync.py` auto-discovers connectors from the `connectors/` directory. Set `data_source.type` in `config/instance.yaml` to match the connector directory name (e.g., `keboola` for `connectors/keboola/`).
```python
if adapter_type == "my_source": See `connectors/keboola/` for a complete reference implementation.
from .my_adapter import MyDataSource
return MyDataSource(**kwargs)
```

View file

@ -267,7 +267,7 @@ if [[ -f "${REPO_DIR}/server/limits-users.conf" ]]; then
fi fi
# Create data sync .env file from environment variables (passed from GitHub Actions) # Create data sync .env file from environment variables (passed from GitHub Actions)
KEBOOLA_ENV_FILE="${REPO_DIR}/.env" SYNC_ENV_FILE="${REPO_DIR}/.env"
if [[ -n "${KEBOOLA_STORAGE_TOKEN:-}" ]]; then if [[ -n "${KEBOOLA_STORAGE_TOKEN:-}" ]]; then
log "Creating data sync .env file..." log "Creating data sync .env file..."
{ {
@ -310,12 +310,12 @@ if [[ -n "${KEBOOLA_STORAGE_TOKEN:-}" ]]; then
if [[ -n "${ANTHROPIC_API_KEY:-}" ]]; then if [[ -n "${ANTHROPIC_API_KEY:-}" ]]; then
echo "ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}" echo "ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}"
fi fi
} | sudo /usr/bin/tee "$KEBOOLA_ENV_FILE" > /dev/null } | sudo /usr/bin/tee "$SYNC_ENV_FILE" > /dev/null
sudo /usr/bin/chown root:data-ops "$KEBOOLA_ENV_FILE" sudo /usr/bin/chown root:data-ops "$SYNC_ENV_FILE"
sudo /usr/bin/chmod 640 "$KEBOOLA_ENV_FILE" sudo /usr/bin/chmod 640 "$SYNC_ENV_FILE"
log " Data sync .env created with secure permissions (640)" log " Data sync .env created with secure permissions (640)"
else else
log " Skipping data sync .env creation (no KEBOOLA_STORAGE_TOKEN provided)" log " Skipping data sync .env creation (no sync credentials provided)"
fi fi
# Set correct permissions # Set correct permissions
@ -325,8 +325,8 @@ sudo /usr/bin/chmod -R 770 "$APP_DIR" # owner+group rwx, others none
sudo /usr/bin/chmod -R g+s "$APP_DIR" # setgid for new files sudo /usr/bin/chmod -R g+s "$APP_DIR" # setgid for new files
# Restore .env permissions (may have been overwritten by chmod -R) # Restore .env permissions (may have been overwritten by chmod -R)
if [[ -f "$KEBOOLA_ENV_FILE" ]]; then if [[ -f "$SYNC_ENV_FILE" ]]; then
sudo /usr/bin/chmod 640 "$KEBOOLA_ENV_FILE" sudo /usr/bin/chmod 640 "$SYNC_ENV_FILE"
fi fi
# Update and restart webapp if running # Update and restart webapp if running

View file

@ -1,5 +1,5 @@
[Unit] [Unit]
Description=Keboola Data Analyst Telegram Notification Bot Description=Data Analyst Telegram Notification Bot
After=network-online.target After=network-online.target
Wants=network-online.target Wants=network-online.target
@ -12,7 +12,7 @@ ExecStart=/opt/data-analyst/.venv/bin/python -m server.telegram_bot.bot
Restart=always Restart=always
RestartSec=10 RestartSec=10
# Environment (webapp .env + Keboola .env with bot token) # Environment (webapp .env + sync .env with bot token)
EnvironmentFile=/opt/data-analyst/.env EnvironmentFile=/opt/data-analyst/.env
EnvironmentFile=/opt/data-analyst/repo/.env EnvironmentFile=/opt/data-analyst/repo/.env

View file

@ -1,5 +1,5 @@
[Unit] [Unit]
Description=WebSocket Gateway for Keboola Data Analyst Description=WebSocket Gateway for Data Analyst
After=network.target After=network.target
[Service] [Service]