Update docs for modular architecture (auth/, services/, scripts/)
Add auth providers, standalone services, and service patterns to project structure in README, ARCHITECTURE, and CLAUDE.md. Reflects the completed extraction of auth, telegram bot, ws gateway, corporate memory, and session collector.
parent 15b513266d
commit 7c9007a8f9
3 changed files with 88 additions and 27 deletions
@@ -64,20 +64,44 @@ Flask app for user onboarding, settings, and data catalog.
| `config/.env.template` | Secret variable placeholders |
| `docs/data_description.md` | Table schemas + sync strategies (not committed) |

### 4. Server Infrastructure (`server/`)
### 4. Auth Providers (`auth/`)

Deployment, systemd services, security.
Pluggable authentication via auto-discovered providers.

| File | Role |
|------|------|
| `auth/__init__.py` | `AuthProvider` ABC + `discover_providers()` scanner |
| `auth/google/provider.py` | Google OAuth (extracted from webapp/auth.py) |
| `auth/password/provider.py` | Email/password (delegates to webapp/password_auth) |
| `auth/desktop/provider.py` | Desktop JWT auth (API-only, hidden from login page) |

To add a new provider: create `auth/<name>/provider.py` implementing `AuthProvider`, export a `provider` instance. No core changes needed.
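
A minimal sketch of what such a module could look like. The `AuthProvider` base class is redefined inline only so the example runs on its own; the real interface lives in `auth/__init__.py` and is not shown in this diff, so the `name` attribute and `authenticate()` method are assumptions. The `static_token` provider and its token table are hypothetical.

```python
"""Hypothetical auth/static_token/provider.py (illustrative sketch only)."""
from abc import ABC, abstractmethod
from typing import Optional


class AuthProvider(ABC):
    """Stand-in for the ABC in auth/__init__.py; attribute and method names are assumed."""

    name: str = ""

    @abstractmethod
    def authenticate(self, credentials: dict) -> Optional[dict]:
        """Return a user dict on success, or None on failure."""


class StaticTokenProvider(AuthProvider):
    name = "static_token"

    def __init__(self, tokens: dict):
        self._tokens = tokens  # maps token -> email

    def authenticate(self, credentials: dict) -> Optional[dict]:
        email = self._tokens.get(credentials.get("token"))
        if email is None:
            return None
        # Same keys as the session contract documented in CLAUDE.md.
        return {"email": email, "name": email.split("@")[0], "picture": None}


# Module-level instance; this is the object a discovery scanner would pick up.
provider = StaticTokenProvider({"s3cret": "ana@example.com"})
```

The only hard requirement stated in the docs is the module-level `provider` name; everything else in the class is a design choice of this sketch.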

### 5. Standalone Services (`services/`)

Self-contained services with own systemd units, auto-discovered by `deploy.sh`.

| Directory | Role |
|-----------|------|
| `services/telegram_bot/` | Telegram notification bot + dispatch |
| `services/ws_gateway/` | WebSocket gateway for desktop app |
| `services/corporate_memory/` | AI knowledge aggregation from analyst sessions |
| `services/session_collector/` | Claude Code session metadata collector |
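
The `server/deploy.sh` row in this change describes the discovery as the glob `services/*/systemd/*`. The script itself is shell and is not shown here, so the Python sketch below only illustrates what that glob matches. `find_service_units` is a hypothetical helper name, not part of the repository.

```python
from pathlib import Path


def find_service_units(repo_root):
    """Group files found under services/<name>/systemd/ by service name."""
    units = {}
    for path in sorted(Path(repo_root).glob("services/*/systemd/*")):
        if path.is_file():
            service = path.parts[-3]  # .../services/<name>/systemd/<file>
            units.setdefault(service, []).append(path.name)
    return units
```

A service directory without a `systemd/` subdirectory simply does not appear in the result, which matches the idea that adding a unit file is all it takes to opt in.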

### 6. Server Infrastructure (`server/`)

Deployment only -- no application code.

| File | Role |
|------|------|
| `server/setup.sh` | Initial server provisioning (groups, users, dirs) |
| `server/webapp-setup.sh` | Nginx, SSL, systemd for webapp |
| `server/deploy.sh` | CI/CD deployment script |
| `server/deploy.sh` | CI/CD deployment (auto-discovers `services/*/systemd/*`) |
| `server/sudoers-deploy` | Least-privilege sudo rules for deploy user |
| `server/sudoers-webapp` | Sudo rules for www-data (webapp) |
| `server/bin/` | Management scripts (add-analyst, list-analysts, etc.) |

### 5. Analyst Scripts (`scripts/`)
### 7. Analyst Scripts (`scripts/`)

Helper scripts synced to analyst machines.

@@ -129,6 +153,8 @@ inject_config() context processor
## Key Patterns

- **Connector pattern**: Dynamic connector registry in `src/data_sync.py`, `connectors/keboola/` for reference
- **Auth provider pattern**: Auto-discovered from `auth/*/provider.py`, each implements `AuthProvider` ABC
- **Service pattern**: Self-contained modules in `services/` with own `__main__.py` and `systemd/` directory
- **Atomic writes**: `tempfile.mkstemp()` + `os.fchmod()` + `os.replace()` for JSON state files
- **User home writes**: `sudo install -o {user} -g {user}` for writing to analyst home dirs
- **Config interpolation**: `${ENV_VAR}` in YAML resolved at load time, missing vars logged as warnings
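
A minimal sketch of the config interpolation pattern, applied to an already-parsed YAML structure so it needs no YAML library. Only the `${ENV_VAR}` syntax and the warning on missing variables come from the list above; the function name `interpolate`, the regular expression, and the choice to leave unresolved placeholders untouched are assumptions.

```python
import logging
import os
import re

logger = logging.getLogger(__name__)
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(value, env=None):
    """Recursively replace ${VAR} in strings; log a warning for unset variables."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: interpolate(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [interpolate(item, env) for item in value]
    if not isinstance(value, str):
        return value  # numbers, booleans, None pass through unchanged

    def _substitute(match):
        name = match.group(1)
        if name not in env:
            logger.warning("Config references unset variable %s", name)
            return match.group(0)  # assumption: keep the placeholder as written
        return env[name]

    return _ENV_REF.sub(_substitute, value)
```

Passing `env` explicitly keeps the function testable; in normal use it would fall back to `os.environ`.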

CLAUDE.md (46 changed lines)
@@ -32,17 +32,26 @@ Ask the user for:
## Project Structure

```
├── src/ # Core data sync engine
├── src/ # Core data sync engine (vendor-neutral)
│ ├── config.py # Configuration from data_description.md
│ ├── data_sync.py # Sync orchestration + DataSource ABC
│ ├── parquet_manager.py # Parquet file management
│ └── profiler.py # Data profiling
├── connectors/ # Data source connectors
├── connectors/ # Data source connectors (pluggable)
│ ├── keboola/ # Keboola Storage connector
│ └── jira/ # Jira webhook connector
├── auth/ # Authentication providers (pluggable)
│ ├── google/ # Google OAuth provider
│ ├── password/ # Email/password provider
│ └── desktop/ # Desktop JWT provider (API-only)
├── services/ # Standalone services (own systemd units)
│ ├── telegram_bot/ # Telegram notification bot
│ ├── ws_gateway/ # WebSocket notification gateway
│ ├── corporate_memory/ # AI knowledge aggregation
│ └── session_collector/ # Claude Code session collector
├── webapp/ # Flask web portal (login, dashboard, API)
├── server/ # Server deployment (systemd, scripts)
├── scripts/ # Utility scripts (sync, DuckDB setup)
├── server/ # Deployment infrastructure only
├── scripts/ # Utility scripts (sync, DuckDB setup, dev)
├── config/ # Configuration templates
│ ├── instance.yaml.example
│ └── data_description.md.example
@@ -97,14 +106,22 @@ pytest tests/ -v
python -m src.data_sync
```

## Data Source Adapters
## Extensibility

The platform supports pluggable data sources via `connectors/`:
- **Keboola** (`keboola`): Syncs from Keboola Storage API (see `connectors/keboola/`)
### Data Sources
Pluggable data source connectors in `connectors/`:
- **Keboola** (`keboola`): Syncs from Keboola Storage API
- **CSV** (`csv`): Import from local CSV files (planned)
- **BigQuery** (`bigquery`): Query from Google BigQuery (planned)
- New connector = `connectors/<name>/adapter.py` implementing `DataSource`
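
A minimal sketch of a connector module in that shape. The real `DataSource` ABC lives in `src/data_sync.py` and its methods are not shown in this diff, so the stand-in base class and the `list_tables()` / `fetch_rows()` names are assumptions. `LocalCsvDataSource` is a hypothetical illustration, not the planned `csv` connector.

```python
"""Hypothetical connectors/local_csv/adapter.py (illustrative sketch only)."""
import csv
from abc import ABC, abstractmethod
from pathlib import Path


class DataSource(ABC):
    """Stand-in for the ABC in src/data_sync.py; method names are assumed."""

    @abstractmethod
    def list_tables(self):
        """Return the names of the tables this source can provide."""

    @abstractmethod
    def fetch_rows(self, table):
        """Return all rows of one table as a list of dicts."""


class LocalCsvDataSource(DataSource):
    def __init__(self, directory):
        self.directory = Path(directory)

    def list_tables(self):
        # One table per CSV file; the table name is the file name without suffix.
        return sorted(path.stem for path in self.directory.glob("*.csv"))

    def fetch_rows(self, table):
        with open(self.directory / f"{table}.csv", newline="") as handle:
            return list(csv.DictReader(handle))
```

Because the sync engine only depends on the abstract interface, a connector like this can be added without touching `src/`.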

Configure in `config/instance.yaml` under `data_source.type`.
### Authentication
Pluggable auth providers in `auth/`:
- **Google** (`google`): OAuth via Google
- **Password** (`password`): Email/password with magic links
- **Desktop** (`desktop`): JWT for desktop app API
- New provider = `auth/<name>/provider.py` implementing `AuthProvider`

Configure data source in `config/instance.yaml` under `data_source.type`.

## Server Management

@@ -144,6 +161,17 @@ When reopening the project in Claude Code:
- Keboola: `connectors/keboola/adapter.py` -> `KeboolaDataSource` implementing `DataSource`
- Core Keboola logic: `connectors/keboola/client.py` (Keboola Storage API wrapper)

### Auth Provider Pattern
- ABC: `AuthProvider` class in `auth/__init__.py`
- Discovery: `discover_providers()` scans `auth/*/provider.py`
- Providers: google, password, desktop (each exports `provider` instance)
- Session contract: all providers set `session["user"] = {"email", "name", "picture"}`
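
One way the discovery step could work, as a sketch. The actual `discover_providers()` in `auth/__init__.py` is not shown in this diff; the directory argument, the return type, and the use of `importlib.util` here are assumptions. Only the `auth/*/provider.py` layout and the exported `provider` name come from the list above.

```python
import importlib.util
from pathlib import Path


def discover_providers(auth_dir):
    """Load <auth_dir>/*/provider.py and collect each module's `provider` object."""
    providers = {}
    for path in sorted(Path(auth_dir).glob("*/provider.py")):
        name = path.parent.name  # e.g. "google", "password", "desktop"
        spec = importlib.util.spec_from_file_location(f"auth_{name}_provider", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        provider = getattr(module, "provider", None)
        if provider is not None:  # modules without a `provider` export are skipped
            providers[name] = provider
    return providers
```

Scanning the filesystem instead of keeping a registry list is what makes "no core changes needed" possible when a provider directory is added.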

### Service Pattern
- Self-contained modules in `services/` with `__main__.py` for `python -m services.<name>`
- Systemd files in `services/<name>/systemd/`, auto-discovered by `deploy.sh`
- Services: telegram_bot, ws_gateway, corporate_memory, session_collector
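
A minimal sketch of a service entry point in this style. The `heartbeat` service, the tick loop, and the `max_ticks` parameter are hypothetical; none of the four real services is shown in this diff. The sketch assumes the unit is stopped with the default SIGTERM that systemd sends.

```python
"""Hypothetical services/heartbeat/__main__.py (illustrative sketch only)."""
import logging
import signal
import threading

logger = logging.getLogger("services.heartbeat")
stop_event = threading.Event()


def handle_stop(signum, frame):
    """Ask the loop to exit after the current tick."""
    stop_event.set()


def run(interval_seconds=30.0, max_ticks=None):
    """Do one unit of work per tick until stopped; return the number of ticks."""
    ticks = 0
    while not stop_event.is_set():
        logger.info("tick %d", ticks)  # real work would go here
        ticks += 1
        if max_ticks is not None and ticks >= max_ticks:
            break
        stop_event.wait(interval_seconds)  # sleeps, but wakes early on stop
    return ticks


def main():
    logging.basicConfig(level=logging.INFO)
    signal.signal(signal.SIGTERM, handle_stop)
    signal.signal(signal.SIGINT, handle_stop)
    run()
```

In a real `__main__.py` the file would end by calling `main()`, which is what `python -m services.<name>` executes. The call is left out here so the snippet can be imported without starting the loop.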

### Server Patterns
- Atomic JSON writes: `tempfile.mkstemp()` + `os.fchmod(fd, 0o660)` + `os.replace()`
- User home writes: `sudo /usr/bin/install -o {user} -g {user}` pattern
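
The atomic JSON write can be sketched as follows. The three calls and the `0o660` mode come from the list above; the function name `write_json_atomic`, the `os.fsync()` call, and the cleanup on failure are additions of this sketch, not necessarily what the repository does. `os.fchmod` is available on Unix only.

```python
import json
import os
import tempfile


def write_json_atomic(path, data, mode=0o660):
    """Write JSON so that readers see either the old file or the new one, never a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the target directory: os.replace is only
    # atomic when source and destination are on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            os.fchmod(handle.fileno(), mode)  # mkstemp creates the file as 0o600
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())  # extra durability step, assumed here
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # do not leave a stray temp file behind
        raise
```

Setting the mode before the rename means the file never appears at its final path with the restrictive `mkstemp` default.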

README.md (35 changed lines)
@@ -40,7 +40,8 @@ flowchart TB

## Features

- **Pluggable data sources** -- adapter interface supporting Keboola out of the box, CSV import, and extensible to BigQuery, Snowflake, and others.
- **Pluggable data sources** -- connector interface supporting Keboola out of the box, CSV import, and extensible to BigQuery, Snowflake, and others.
- **Pluggable authentication** -- auto-discovered auth providers (Google OAuth, email/password, desktop JWT, or custom).
- **Automatic Parquet conversion** -- source data is converted to typed, partitioned Parquet files for efficient local querying.
- **SSH-based distribution** -- analysts sync data securely via rsync; no cloud credentials leave the server.
- **Claude Code as analyst interface** -- natural language queries against DuckDB, powered by Claude.
@@ -81,37 +82,43 @@ ai-data-analyst/
│ ├── instance.yaml.example # Main config template (copy to instance.yaml)
│ └── data_description.md.example # Data schema template
│
├── src/ # Server-side Python code
├── src/ # Core data sync engine (vendor-neutral)
│ ├── data_sync.py # Orchestrates data pull + DataSource ABC
│ ├── parquet_manager.py # CSV to Parquet conversion
│ ├── config.py # Configuration loader
│ └── profiler.py # Data profiling for catalog
│
├── connectors/ # Data source connectors
├── connectors/ # Data source connectors (pluggable)
│ ├── keboola/ # Keboola Storage connector
│ │ ├── adapter.py # KeboolaDataSource (implements DataSource)
│ │ └── client.py # Low-level Keboola API client
│ └── jira/ # Jira webhook connector
│
├── auth/ # Authentication providers (pluggable)
│ ├── google/ # Google OAuth provider
│ ├── password/ # Email/password provider
│ └── desktop/ # Desktop JWT provider (API-only)
│
├── services/ # Standalone services (own systemd units)
│ ├── telegram_bot/ # Telegram notification bot
│ ├── ws_gateway/ # WebSocket notification gateway
│ ├── corporate_memory/ # AI knowledge aggregation
│ └── session_collector/ # Claude Code session collector
│
├── webapp/ # Flask web application
│ └── ... # User onboarding, settings, catalog
│
├── server/ # Deployment and server management
│ ├── deploy.sh # Deployment script
│ └── ... # Systemd units, sudoers, cron jobs
├── server/ # Deployment infrastructure only
│ ├── deploy.sh # Deployment script (auto-discovers services)
│ └── ... # Sudoers, nginx, setup scripts
│
├── scripts/ # Analyst-facing helper scripts
├── scripts/ # Helper scripts
│ ├── sync_data.sh # Sync data from server
│ └── setup_views.sh # Initialize DuckDB views
│ ├── setup_views.sh # Initialize DuckDB views
│ └── dev_run.py # Dev server with auth bypass
│
├── docs/ # User-facing documentation
│ ├── QUICKSTART.md # Setup guide
│ └── data_description.md # Table schemas (single source of truth)
│
├── dev_docs/ # Developer and operator documentation
│ ├── server.md # Server administration
│ └── security.md # Security model
│
├── tests/ # Test suite
├── requirements.txt # Python dependencies
├── CLAUDE.md # Instructions for Claude Code