Update docs for modular architecture (auth/, services/, scripts/)
Add auth providers, standalone services, and service patterns to project structure in README, ARCHITECTURE, and CLAUDE.md. Reflects the completed extraction of auth, telegram bot, ws gateway, corporate memory, and session collector.
parent 15b513266d
commit 7c9007a8f9
3 changed files with 88 additions and 27 deletions
@@ -64,20 +64,44 @@ Flask app for user onboarding, settings, and data catalog.
 | `config/.env.template` | Secret variable placeholders |
 | `docs/data_description.md` | Table schemas + sync strategies (not committed) |
 
-### 4. Server Infrastructure (`server/`)
+### 4. Auth Providers (`auth/`)
 
-Deployment, systemd services, security.
+Pluggable authentication via auto-discovered providers.
 
+| File | Role |
+|------|------|
+| `auth/__init__.py` | `AuthProvider` ABC + `discover_providers()` scanner |
+| `auth/google/provider.py` | Google OAuth (extracted from webapp/auth.py) |
+| `auth/password/provider.py` | Email/password (delegates to webapp/password_auth) |
+| `auth/desktop/provider.py` | Desktop JWT auth (API-only, hidden from login page) |
+
+To add a new provider: create `auth/<name>/provider.py` implementing `AuthProvider`, export a `provider` instance. No core changes needed.
+
+### 5. Standalone Services (`services/`)
+
+Self-contained services with own systemd units, auto-discovered by `deploy.sh`.
+
+| Directory | Role |
+|-----------|------|
+| `services/telegram_bot/` | Telegram notification bot + dispatch |
+| `services/ws_gateway/` | WebSocket gateway for desktop app |
+| `services/corporate_memory/` | AI knowledge aggregation from analyst sessions |
+| `services/session_collector/` | Claude Code session metadata collector |
+
+### 6. Server Infrastructure (`server/`)
+
+Deployment only -- no application code.
+
 | File | Role |
 |------|------|
 | `server/setup.sh` | Initial server provisioning (groups, users, dirs) |
 | `server/webapp-setup.sh` | Nginx, SSL, systemd for webapp |
-| `server/deploy.sh` | CI/CD deployment script |
+| `server/deploy.sh` | CI/CD deployment (auto-discovers `services/*/systemd/*`) |
 | `server/sudoers-deploy` | Least-privilege sudo rules for deploy user |
 | `server/sudoers-webapp` | Sudo rules for www-data (webapp) |
 | `server/bin/` | Management scripts (add-analyst, list-analysts, etc.) |
 
-### 5. Analyst Scripts (`scripts/`)
+### 7. Analyst Scripts (`scripts/`)
 
 Helper scripts synced to analyst machines.
 
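For orientation, the provider discovery described in the Auth Providers section of the hunk above could work roughly as sketched below. This is a minimal illustration under stated assumptions, not the project's code: the `login()` method, the `demo_auth` package, and the helper `write_demo_package()` are hypothetical. Only `AuthProvider`, `discover_providers()`, the `<package>/<name>/provider.py` layout, and the exported `provider` instance come from the diff. The real signature of `discover_providers()` is not shown, so the `package` parameter is an assumption as well.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for auth/__init__.py; the `login` method is an assumption.
BASE_SOURCE = textwrap.dedent('''
    from abc import ABC, abstractmethod

    class AuthProvider(ABC):
        @abstractmethod
        def login(self):
            """Start the login flow."""
''')

# Hypothetical stand-in for auth/<name>/provider.py.
PROVIDER_SOURCE = textwrap.dedent('''
    from {package} import AuthProvider

    class DemoProvider(AuthProvider):
        def login(self):
            return "logged in via {name}"

    provider = DemoProvider()
''')


def write_demo_package(root: Path, package: str, names: list[str]) -> None:
    """Create <root>/<package>/<name>/provider.py for each name."""
    pkg_dir = root / package
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(BASE_SOURCE)
    for name in names:
        (pkg_dir / name).mkdir()
        (pkg_dir / name / "__init__.py").write_text("")
        (pkg_dir / name / "provider.py").write_text(
            PROVIDER_SOURCE.format(package=package, name=name)
        )


def discover_providers(package: str) -> dict:
    """Import every <package>/<name>/provider.py and collect its `provider` instance."""
    pkg = importlib.import_module(package)
    base = pkg.AuthProvider
    providers = {}
    for path in sorted(Path(pkg.__file__).parent.glob("*/provider.py")):
        name = path.parent.name
        module = importlib.import_module(f"{package}.{name}.provider")
        instance = getattr(module, "provider", None)
        if isinstance(instance, base):  # skip modules that export nothing usable
            providers[name] = instance
    return providers


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_demo_package(Path(tmp), "demo_auth", ["google", "password"])
        sys.path.insert(0, tmp)
        found = discover_providers("demo_auth")
        print(sorted(found))
        print(found["google"].login())
```

Because discovery is driven by the directory listing, dropping a new `<name>/provider.py` into the package is enough for it to be picked up, which matches the "no core changes needed" claim in the hunk.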
@@ -129,6 +153,8 @@ inject_config() context processor
 ## Key Patterns
 
 - **Connector pattern**: Dynamic connector registry in `src/data_sync.py`, `connectors/keboola/` for reference
+- **Auth provider pattern**: Auto-discovered from `auth/*/provider.py`, each implements `AuthProvider` ABC
+- **Service pattern**: Self-contained modules in `services/` with own `__main__.py` and `systemd/` directory
 - **Atomic writes**: `tempfile.mkstemp()` + `os.fchmod()` + `os.replace()` for JSON state files
 - **User home writes**: `sudo install -o {user} -g {user}` for writing to analyst home dirs
 - **Config interpolation**: `${ENV_VAR}` in YAML resolved at load time, missing vars logged as warnings
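The config interpolation pattern listed in Key Patterns can be illustrated with a short sketch. It works on an already-parsed config (nested dicts and lists) so that it needs no YAML library. The function name `interpolate`, the sample keys, and the choice to leave an unresolved `${VAR}` placeholder in place are assumptions; the diff only states that missing variables are logged as warnings.

```python
import logging
import os
import re

logger = logging.getLogger(__name__)
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(value, env=None):
    """Recursively replace ${VAR} in strings nested inside dicts and lists."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: interpolate(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [interpolate(item, env) for item in value]
    if not isinstance(value, str):
        return value

    def substitute(match):
        name = match.group(1)
        if name not in env:
            # Assumption: a missing variable keeps its placeholder and is only logged.
            logger.warning("Environment variable %s is not set", name)
            return match.group(0)
        return env[name]

    return _PLACEHOLDER.sub(substitute, value)


# Hypothetical config fragment, as it might look after YAML parsing.
config = {"db": {"url": "postgres://${DB_HOST}/app", "port": 5432}, "tags": ["${MISSING}"]}
resolved = interpolate(config, env={"DB_HOST": "localhost"})
print(resolved)
```

Passing `env` explicitly keeps the function easy to test; in normal use it falls back to `os.environ`.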
46
CLAUDE.md
@@ -32,17 +32,26 @@ Ask the user for:
 ## Project Structure
 
 ```
-├── src/ # Core data sync engine
+├── src/ # Core data sync engine (vendor-neutral)
 │ ├── config.py # Configuration from data_description.md
 │ ├── data_sync.py # Sync orchestration + DataSource ABC
 │ ├── parquet_manager.py # Parquet file management
 │ └── profiler.py # Data profiling
-├── connectors/ # Data source connectors
+├── connectors/ # Data source connectors (pluggable)
 │ ├── keboola/ # Keboola Storage connector
 │ └── jira/ # Jira webhook connector
+├── auth/ # Authentication providers (pluggable)
+│ ├── google/ # Google OAuth provider
+│ ├── password/ # Email/password provider
+│ └── desktop/ # Desktop JWT provider (API-only)
+├── services/ # Standalone services (own systemd units)
+│ ├── telegram_bot/ # Telegram notification bot
+│ ├── ws_gateway/ # WebSocket notification gateway
+│ ├── corporate_memory/ # AI knowledge aggregation
+│ └── session_collector/ # Claude Code session collector
 ├── webapp/ # Flask web portal (login, dashboard, API)
-├── server/ # Server deployment (systemd, scripts)
-├── scripts/ # Utility scripts (sync, DuckDB setup)
+├── server/ # Deployment infrastructure only
+├── scripts/ # Utility scripts (sync, DuckDB setup, dev)
 ├── config/ # Configuration templates
 │ ├── instance.yaml.example
 │ └── data_description.md.example
@@ -97,14 +106,22 @@ pytest tests/ -v
 python -m src.data_sync
 ```
 
-## Data Source Adapters
+## Extensibility
 
-The platform supports pluggable data sources via `connectors/`:
-- **Keboola** (`keboola`): Syncs from Keboola Storage API (see `connectors/keboola/`)
+### Data Sources
+Pluggable data source connectors in `connectors/`:
+- **Keboola** (`keboola`): Syncs from Keboola Storage API
 - **CSV** (`csv`): Import from local CSV files (planned)
-- **BigQuery** (`bigquery`): Query from Google BigQuery (planned)
+- New connector = `connectors/<name>/adapter.py` implementing `DataSource`
 
-Configure in `config/instance.yaml` under `data_source.type`.
+### Authentication
+Pluggable auth providers in `auth/`:
+- **Google** (`google`): OAuth via Google
+- **Password** (`password`): Email/password with magic links
+- **Desktop** (`desktop`): JWT for desktop app API
+- New provider = `auth/<name>/provider.py` implementing `AuthProvider`
+
+Configure data source in `config/instance.yaml` under `data_source.type`.
 
 ## Server Management
 
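As a rough illustration of the connector rule in the hunk above ("New connector = `connectors/<name>/adapter.py` implementing `DataSource`", selected through `data_source.type`), the sketch below shows one way a type name could be mapped to a connector class. The diff does not show the methods of `DataSource`, so `fetch_rows()`, `InMemoryDataSource`, `REGISTRY`, and `create_data_source()` are all hypothetical names used only for this example.

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Stand-in for the ABC in src/data_sync.py; the method below is hypothetical."""

    @abstractmethod
    def fetch_rows(self, table: str) -> list[dict]:
        """Return the rows of one table."""


class InMemoryDataSource(DataSource):
    """Toy connector, as it might live in connectors/<name>/adapter.py."""

    def __init__(self, tables: dict[str, list[dict]]):
        self._tables = tables

    def fetch_rows(self, table: str) -> list[dict]:
        return list(self._tables.get(table, []))


# Hypothetical registry keyed by the value of data_source.type.
REGISTRY = {"memory": InMemoryDataSource}


def create_data_source(config: dict, **kwargs) -> DataSource:
    """Instantiate the connector class named by config["data_source"]["type"]."""
    source_type = config["data_source"]["type"]
    if source_type not in REGISTRY:
        raise ValueError(f"Unknown data source type: {source_type!r}")
    return REGISTRY[source_type](**kwargs)


config = {"data_source": {"type": "memory"}}
source = create_data_source(config, tables={"orders": [{"id": 1}]})
print(source.fetch_rows("orders"))
```

The point of the pattern is that the sync engine only depends on the abstract interface, so adding a connector means adding a class and a registry entry rather than editing the core.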
@@ -144,6 +161,17 @@ When reopening the project in Claude Code:
 - Keboola: `connectors/keboola/adapter.py` -> `KeboolaDataSource` implementing `DataSource`
 - Core Keboola logic: `connectors/keboola/client.py` (Keboola Storage API wrapper)
 
+### Auth Provider Pattern
+- ABC: `AuthProvider` class in `auth/__init__.py`
+- Discovery: `discover_providers()` scans `auth/*/provider.py`
+- Providers: google, password, desktop (each exports `provider` instance)
+- Session contract: all providers set `session["user"] = {"email", "name", "picture"}`
+
+### Service Pattern
+- Self-contained modules in `services/` with `__main__.py` for `python -m services.<name>`
+- Systemd files in `services/<name>/systemd/`, auto-discovered by `deploy.sh`
+- Services: telegram_bot, ws_gateway, corporate_memory, session_collector
+
 ### Server Patterns
 - Atomic JSON writes: `tempfile.mkstemp()` + `os.fchmod(fd, 0o660)` + `os.replace()`
 - User home writes: `sudo /usr/bin/install -o {user} -g {user}` pattern
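The atomic JSON write listed under Server Patterns combines three standard-library calls. The sketch below shows one plausible way to put them together; the function name `atomic_write_json`, the `.tmp` suffix, and the cleanup on failure are assumptions rather than the project's implementation. Note that `os.fchmod()` is a Unix call and is not available on Windows before Python 3.13.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, data, mode: int = 0o660) -> None:
    """Write JSON to a temp file in the target directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so os.replace() stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            os.fchmod(handle.fileno(), mode)  # mkstemp creates the file as 0o600
            json.dump(data, handle, indent=2)
        os.replace(tmp_path, path)  # readers see the old file or the new one, never a partial one
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # do not leave a stray temp file behind
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "state.json")
        atomic_write_json(target, {"synced": True})
        with open(target, encoding="utf-8") as handle:
            print(json.load(handle))
        print(oct(os.stat(target).st_mode & 0o777))
```

Setting the mode on the open descriptor before the rename means the file never appears at its final path with the restrictive default permissions that `mkstemp()` applies.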
35
README.md
@@ -40,7 +40,8 @@ flowchart TB
 
 ## Features
 
-- **Pluggable data sources** -- adapter interface supporting Keboola out of the box, CSV import, and extensible to BigQuery, Snowflake, and others.
+- **Pluggable data sources** -- connector interface supporting Keboola out of the box, CSV import, and extensible to BigQuery, Snowflake, and others.
+- **Pluggable authentication** -- auto-discovered auth providers (Google OAuth, email/password, desktop JWT, or custom).
 - **Automatic Parquet conversion** -- source data is converted to typed, partitioned Parquet files for efficient local querying.
 - **SSH-based distribution** -- analysts sync data securely via rsync; no cloud credentials leave the server.
 - **Claude Code as analyst interface** -- natural language queries against DuckDB, powered by Claude.
@@ -81,37 +82,43 @@ ai-data-analyst/
 │ ├── instance.yaml.example # Main config template (copy to instance.yaml)
 │ └── data_description.md.example # Data schema template
 │
-├── src/ # Server-side Python code
+├── src/ # Core data sync engine (vendor-neutral)
 │ ├── data_sync.py # Orchestrates data pull + DataSource ABC
 │ ├── parquet_manager.py # CSV to Parquet conversion
 │ ├── config.py # Configuration loader
 │ └── profiler.py # Data profiling for catalog
 │
-├── connectors/ # Data source connectors
+├── connectors/ # Data source connectors (pluggable)
 │ ├── keboola/ # Keboola Storage connector
 │ │ ├── adapter.py # KeboolaDataSource (implements DataSource)
 │ │ └── client.py # Low-level Keboola API client
 │ └── jira/ # Jira webhook connector
 │
+├── auth/ # Authentication providers (pluggable)
+│ ├── google/ # Google OAuth provider
+│ ├── password/ # Email/password provider
+│ └── desktop/ # Desktop JWT provider (API-only)
+│
+├── services/ # Standalone services (own systemd units)
+│ ├── telegram_bot/ # Telegram notification bot
+│ ├── ws_gateway/ # WebSocket notification gateway
+│ ├── corporate_memory/ # AI knowledge aggregation
+│ └── session_collector/ # Claude Code session collector
+│
 ├── webapp/ # Flask web application
 │ └── ... # User onboarding, settings, catalog
 │
-├── server/ # Deployment and server management
-│ ├── deploy.sh # Deployment script
-│ └── ... # Systemd units, sudoers, cron jobs
+├── server/ # Deployment infrastructure only
+│ ├── deploy.sh # Deployment script (auto-discovers services)
+│ └── ... # Sudoers, nginx, setup scripts
 │
-├── scripts/ # Analyst-facing helper scripts
+├── scripts/ # Helper scripts
 │ ├── sync_data.sh # Sync data from server
-│ └── setup_views.sh # Initialize DuckDB views
+│ ├── setup_views.sh # Initialize DuckDB views
+│ └── dev_run.py # Dev server with auth bypass
 │
 ├── docs/ # User-facing documentation
-│ ├── QUICKSTART.md # Setup guide
-│ └── data_description.md # Table schemas (single source of truth)
-│
 ├── dev_docs/ # Developer and operator documentation
-│ ├── server.md # Server administration
-│ └── security.md # Security model
-│
 ├── tests/ # Test suite
 ├── requirements.txt # Python dependencies
 ├── CLAUDE.md # Instructions for Claude Code
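The tree above notes that `deploy.sh` auto-discovers services, and the ARCHITECTURE.md hunk gives the glob as `services/*/systemd/*`. `deploy.sh` itself is a shell script whose contents are not part of this diff; the Python sketch below only illustrates what such a glob-based discovery yields. The unit file names and the helpers `make_demo_tree()` and `discover_unit_files()` are hypothetical.

```python
import tempfile
from pathlib import Path

# Hypothetical unit file names; the diff does not list the real ones.
DEMO_UNITS = {
    "telegram_bot": ["telegram-bot.service"],
    "session_collector": ["session-collector.service", "session-collector.timer"],
}


def make_demo_tree(root: Path) -> None:
    """Create services/<name>/systemd/<unit> files under root."""
    for service, units in DEMO_UNITS.items():
        systemd_dir = root / "services" / service / "systemd"
        systemd_dir.mkdir(parents=True)
        for unit in units:
            (systemd_dir / unit).write_text("[Unit]\nDescription=demo\n")


def discover_unit_files(root: Path) -> dict[str, list[str]]:
    """Map each service name to the sorted file names in its systemd/ directory."""
    found: dict[str, list[str]] = {}
    for unit in sorted(root.glob("services/*/systemd/*")):
        if unit.is_file():
            found.setdefault(unit.parent.parent.name, []).append(unit.name)
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_tree(Path(tmp))
        for service, units in discover_unit_files(Path(tmp)).items():
            print(service, units)
```

With this layout a new service ships its own unit files, and the deployment step needs no per-service list.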