docs: add refactoring plan, design spec, and gitignore updates

parent e0ce91ddb9
commit 07b396bfe2
4 changed files with 2677 additions and 0 deletions

.gitignore (3 changes, vendored)
@@ -132,3 +132,6 @@ docs/datasets/*/schema.yml

# Agent-generated reports (not part of codebase)
.audit/
docs/AGENT-REPORTS/

# Internal transcripts and meeting notes
docs/ZS_PADAK_*

docs/REFACTORING_PLAN.md (576 changes, normal file)
@@ -0,0 +1,576 @@

# AI Data Analyst Refactoring: Final Plan

## Context

The platform grew iteratively as an internal Keboola tool and is now meant to become a product for customers (Groupon and others). Key problems identified in the ZS+Padák transcript: fragile filesystem state (JSON files, permission conflicts), no API (everything runs over SSH and scripts), security based on Linux groups, and a complex installation (10+ steps). The system is designed for AI agents: a human talks to the AI, and the AI handles everything (user, admin, and dev operations).

**The UX stays the same.** Tooling: `uv` everywhere instead of pip. Docker + Kamal for the server. The CLI (`da`) is the primary interface for AI agents.

---
## Architecture: Target State

```
SERVER (Docker + Kamal):
├── webapp          Flask UI (catalog, login, corporate memory)
├── api             FastAPI (CLI backend, sync manifest, data download)
├── scheduler       APScheduler (replaces 7 systemd timers)
├── telegram-bot    Telegram notifications
├── ws-gateway      WebSocket for the desktop app
└── script-runner   Sandboxed runner for user scripts

LOCAL (analyst):
├── da CLI          Python package (uv tool install)
├── DuckDB          Embedded (analytics.duckdb → views on parquet files)
└── Parquet files   Downloaded from the server via da sync

TWO DuckDB DATABASES ON THE SERVER:
├── /data/state/system.duckdb       System state (users, sync_state, knowledge...)
└── /data/analytics/server.duckdb   Views → /data/parquet/** (profiler, remote query, scripts)

ONE DuckDB DATABASE LOCALLY:
└── user/duckdb/analytics.duckdb    Views → server/parquet/** + user tables
```

---
## Phase 0: Foundation - DuckDB state + repository layer

**Goal:** Replace 10+ JSON files with a DuckDB database. Eliminate the #1 source of outages (file permission conflicts).

**Why DuckDB:** It is already in the stack, an agent can join state with analytical data, and it handles analytical queries over state better than SQLite.

### Task 0A: DuckDB schema + repository layer [INDEPENDENT]

New files:
- `src/db.py`: DuckDB connection management, schema creation, migration system
- `src/repositories/__init__.py`
- `src/repositories/sync_state.py`: CRUD for sync state
- `src/repositories/users.py`: CRUD for users + roles
- `src/repositories/knowledge.py`: CRUD for corporate memory
- `src/repositories/table_registry.py`: CRUD for the table registry
- `src/repositories/audit.py`: audit log
- `src/repositories/notifications.py`: telegram links, pending codes, script registry

Table schema (mapping from JSON):

| Current JSON | DuckDB table | Source file |
|---|---|---|
| `sync_state.json` | `sync_state` | `src/data_sync.py:37-138` |
| `sync_settings.json` | `user_sync_settings` | `webapp/sync_settings_service.py:20` |
| `knowledge.json` | `knowledge_items` | `webapp/corporate_memory_service.py` |
| `votes.json` | `knowledge_votes` | `webapp/corporate_memory_service.py` |
| `audit.jsonl` | `audit_log` | `webapp/corporate_memory_service.py` |
| `telegram_users.json` | `telegram_links` | `services/telegram_bot/storage.py` |
| `pending_codes.json` | `pending_codes` | `services/telegram_bot/storage.py` |
| `password_users.json` | `users` | `webapp/password_auth.py` |
| `table_registry.json` | `table_registry` | `src/table_registry.py` |
| `profiles.json` | `table_profiles` | `src/profiler.py` |

Additional tables: `sync_history` (the last 10 syncs per table, not just the latest) and `script_registry` (deployed scripts).

### Task 0B: Migrate existing service files to repositories [DEPENDS ON 0A]

Files to modify (replace `_read_json`/`_write_json` with repository calls):
- `webapp/sync_settings_service.py`, lines 40-62
- `webapp/corporate_memory_service.py`: 31 JSON operations
- `webapp/telegram_service.py`, lines 22-45
- `src/data_sync.py`: class `SyncState`, lines 37-138
- `src/table_registry.py`: `_load`, `_atomic_write_json`
- `src/profiler.py`: profile storage
- `services/corporate_memory/collector.py`: knowledge reads/writes
- `services/telegram_bot/storage.py`: 15 JSON operations

Pattern: dual-write (JSON + DuckDB) for a transition period → verify → remove the JSON writes.

### Task 0C: Migration script [DEPENDS ON 0A]

- `scripts/migrate_json_to_duckdb.py`: loads all JSON files and inserts them into DuckDB
- Idempotent (safe to run multiple times)
- Post-migration validation (row count comparison)

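A minimal sketch of what an idempotent migration with count validation could look like. The standard-library `sqlite3` module stands in for DuckDB here so the example runs without extra dependencies. The `migrate_knowledge` function, the reduced table layout, and the sample JSON shape are assumptions for illustration, not the contents of `scripts/migrate_json_to_duckdb.py`.

```python
import json
import sqlite3


def migrate_knowledge(conn: sqlite3.Connection, raw_json: str) -> int:
    """Load knowledge items from a JSON document into the database.

    Rows are written with INSERT OR REPLACE keyed on the primary key, so
    running the migration again overwrites rows instead of duplicating them.
    Returns the row count after migration.
    """
    items = json.loads(raw_json)  # assumed shape: {"<id>": {"title": ...}, ...}
    conn.execute(
        "CREATE TABLE IF NOT EXISTS knowledge_items ("
        "id TEXT PRIMARY KEY, title TEXT NOT NULL, content TEXT)"
    )
    with conn:  # single transaction: commit on success, roll back on error
        for item_id, item in items.items():
            conn.execute(
                "INSERT OR REPLACE INTO knowledge_items (id, title, content) "
                "VALUES (?, ?, ?)",
                (item_id, item["title"], item.get("content")),
            )
    (count,) = conn.execute("SELECT count(*) FROM knowledge_items").fetchone()
    # Validation: the table must hold exactly as many rows as the JSON source.
    if count != len(items):
        raise RuntimeError(f"count mismatch: {count} rows vs {len(items)} JSON items")
    return count


conn = sqlite3.connect(":memory:")
# Made-up sample data, only to exercise the function.
source = json.dumps({
    "k1": {"title": "Sample item one"},
    "k2": {"title": "Sample item two", "content": "details"},
})
first_run = migrate_knowledge(conn, source)
second_run = migrate_knowledge(conn, source)  # safe to repeat
print(first_run, second_run)
```

The same shape should carry over to DuckDB, but the exact upsert syntax and connection handling would need to be checked against the DuckDB Python API.
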
### What does NOT change in Phase 0
- Flask routes in `webapp/app.py`
- HTML templates
- Connectors (`connectors/keboola/`, `connectors/bigquery/`, `connectors/jira/`)
- `src/config.py` (reads `data_description.md`: configuration, not state)
- `config/loader.py` (reads `instance.yaml`)
- `src/parquet_manager.py`

---
## Phase 1: API layer (FastAPI)

**Goal:** A REST API for the CLI, covering every operation that requires SSH today.

### Task 1A: FastAPI foundation + auth [INDEPENDENT of 0B, DEPENDS ON 0A]

New files:
```
api/
  __init__.py
  app.py              # FastAPI app, middleware, CORS
  auth.py             # JWT issuing + validation
  dependencies.py     # DI for DuckDB session, current_user
```

Auth flow:
1. `POST /api/auth/login`: accepts an OAuth token from the webapp, issues a JWT
2. `POST /api/auth/token`: accepts an API key, issues a JWT
3. The JWT contains: user_id, email, role, expiry
4. Middleware validates the JWT on all /api/* endpoints

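To make steps 3 and 4 of the auth flow concrete, here is a minimal HS256 sketch using only the standard library. It is not the planned `api/auth.py`: the function names `issue_jwt` and `validate_jwt`, the one-hour default lifetime, and the demo secret are assumptions, and a real service would more likely use a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_jwt(secret: bytes, user_id: str, email: str, role: str, ttl_s: int = 3600) -> str:
    """Issue an HS256-signed token carrying user_id, email, role, and expiry."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"user_id": user_id, "email": email, "role": role,
               "exp": int(time.time()) + ttl_s}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def validate_jwt(secret: bytes, token: str) -> dict:
    """Return the claims if signature and expiry are valid, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims


SECRET = b"demo-secret"  # hypothetical; a real deployment would load this from env secrets
token = issue_jwt(SECRET, "u1", "analyst@example.com", "analyst")
claims = validate_jwt(SECRET, token)
print(claims["role"])
```

A middleware would call something like `validate_jwt` on the bearer token of every `/api/*` request and reject the request when it raises.
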
### Task 1B: Sync + data endpoints [DEPENDS ON 1A, 0A]

```
api/routers/
  sync.py             # GET /api/sync/manifest, POST /api/sync/trigger
  data.py             # GET /api/data/{table}/download (parquet stream)
```

- `/api/sync/manifest`: returns hashes of all parquet files, docs, rules, and profiles (filtered per user by subscription)
- `/api/data/{table}/download`: streams the parquet file with ETag/If-None-Match support
- `/api/sync/trigger`: starts DataSyncManager (reuses `src/data_sync.py`)

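The ETag/If-None-Match behaviour of the download endpoint can be illustrated framework-free. This is a sketch under assumptions: `etag_for` and `conditional_response` are hypothetical helpers, the ETag is taken to be a SHA-256 of the file bytes, and weak validators (`W/"..."`) are not handled.

```python
import hashlib
from typing import Optional


def etag_for(content: bytes) -> str:
    """Strong ETag derived from the file bytes (ETag values are quoted strings)."""
    return '"' + hashlib.sha256(content).hexdigest() + '"'


def conditional_response(content: bytes, if_none_match: Optional[str]):
    """Return (status, headers, body) for a conditional GET.

    304 with an empty body when the client already holds the current version,
    otherwise 200 with the full content.
    """
    tag = etag_for(content)
    headers = {"ETag": tag}
    if if_none_match is not None:
        # The header may carry several comma-separated tags, or "*".
        candidates = {part.strip() for part in if_none_match.split(",")}
        if "*" in candidates or tag in candidates:
            return 304, headers, b""
    return 200, headers, content


parquet_bytes = b"PAR1...fake parquet payload...PAR1"  # placeholder bytes
status_first, headers_first, _ = conditional_response(parquet_bytes, None)
status_again, _, body_again = conditional_response(parquet_bytes, headers_first["ETag"])
print(status_first, status_again)
```

For large parquet files the hash would presumably come from the stored `hash` in sync state rather than from re-reading the file on every request.
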
### Task 1C: Query + scripts endpoints [DEPENDS ON 1A, 0A]

```
api/routers/
  query.py            # POST /api/query (remote query)
  scripts.py          # POST /api/scripts/run, /deploy, /list
```

- `/api/query`: reuses `src/remote_query.py`, returns the result as JSON/parquet
- `/api/scripts/run`: runs a Python script in the sandbox on the server
- `/api/scripts/deploy`: uploads a script and registers it in the scheduler
- `/api/scripts/list`: deployed scripts with their schedules

### Task 1D: User management + corporate memory endpoints [DEPENDS ON 1A, 0A]

```
api/routers/
  users.py            # User CRUD, roles, permissions
  settings.py         # GET/PUT sync settings per user
  memory.py           # Corporate memory CRUD, voting, governance
  health.py           # GET /api/health (structured diagnostics)
  upload.py           # POST sessions, artifacts, CLAUDE.local.md
```

### Task 1E: Remove SSH/sudo dependencies [DEPENDS ON 1B, 1D]

Delete or rewrite:
- `webapp/sync_settings_service.py`, lines 128-240 (sudo/rsync-filter code)
- `webapp/user_service.py`: Linux user management (`pwd.getpwnam`, `sudo add-analyst`)
- SSH key validation workflow
- `server/sudoers-webapp`, `server/sudoers-deploy`
- `server/bin/add-analyst`

---
## Phase 2: CLI tool (`da`)

**Goal:** A single interface for AI agents that replaces SSH + scripts. Installed with `uv tool install`.

### Task 2A: CLI foundation + auth [INDEPENDENT of 1B-1E, DEPENDS ON 1A]

```
cli/
  __init__.py
  main.py             # Typer app, global options (--server, --json)
  config.py           # ~/.config/da/ management
  client.py           # HTTP client wrapper (auth, retry, streaming)
  commands/
    auth.py           # da login, da logout, da whoami
```

- `da login` → opens a browser for OAuth → the server issues a JWT → stored in `~/.config/da/token.json`
- `da --json` flag on all commands for structured output
- `da --server URL` override (default from config.yaml)

### Task 2B: Sync commands [DEPENDS ON 2A, 1B]

```
cli/commands/
  sync.py             # da sync, da sync --table X, da sync --upload-only
```

Flow:
1. `GET /api/sync/manifest` → compare with `~/.config/da/sync_state.json`
2. Download changed parquet files (HTTP streaming with a progress bar)
3. Download docs, rules, and profiles
4. Upload sessions, artifacts, and CLAUDE.local.md
5. Rebuild DuckDB views (DROP the views, CREATE VIEW per table, keep user tables)
6. Update the local manifest

Replaces the functionality of `scripts/sync_data.sh` (475 lines).

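Step 1 of the sync flow, comparing the server manifest with local state, reduces to a dictionary diff. The sketch below assumes both sides are flat `{relative_path: hash}` maps; the `SyncPlan` type and the `plan_sync` name are hypothetical, and the "delete" bucket is an added assumption, since the plan does not say what happens to files that disappear from the manifest.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SyncPlan:
    download: List[str]   # new on the server, or hash differs from the local copy
    delete: List[str]     # present locally but no longer in the server manifest
    unchanged: List[str]  # same hash on both sides, nothing to do


def plan_sync(server_manifest: Dict[str, str], local_state: Dict[str, str]) -> SyncPlan:
    """Compare two {path: hash} maps and decide what the client has to fetch."""
    download = sorted(
        path for path, digest in server_manifest.items()
        if local_state.get(path) != digest
    )
    delete = sorted(path for path in local_state if path not in server_manifest)
    unchanged = sorted(
        path for path, digest in server_manifest.items()
        if local_state.get(path) == digest
    )
    return SyncPlan(download=download, delete=delete, unchanged=unchanged)


# Made-up hashes, only to exercise the function.
server = {"sales/orders.parquet": "a1", "sales/customers.parquet": "b2",
          "docs/rules.md": "d4"}
local = {"sales/orders.parquet": "a0", "sales/old.parquet": "c3",
         "docs/rules.md": "d4"}
plan = plan_sync(server, local)
print(plan)
```

After the downloads succeed, the client would write the server manifest back as the new local state (step 6), so an interrupted sync simply repeats the remaining downloads on the next run.
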
### Task 2C: Query + scripts commands [DEPENDS ON 2A, 1C]

```
cli/commands/
  query.py            # da query "SQL" [--remote] [--json]
  scripts.py          # da scripts list/run/deploy/undeploy
  explore.py          # da explore {table}: table profile
```

- `da query`: local DuckDB by default, `--remote` goes through the server API
- `da scripts run X`: runs locally by default, `--remote` runs on the server
- `da scripts deploy X --schedule "cron"`: upload + registration on the server
- `da explore orders`: profile from local data (or from the server with `--remote`)

### Task 2D: Admin + server commands [DEPENDS ON 2A, 1D]

```
cli/commands/
  admin.py            # da admin add-user/remove-user/list-users
  status.py           # da status [--local]: system health
  server.py           # da server deploy/rollback/logs/status
  diagnose.py         # da diagnose: AI-friendly diagnostics
```

- `da status`: structured health report (tables, sync state, services)
- `da status --local`: offline; when the last sync ran and how much data is available locally
- `da diagnose`: walks through logs, sync state, and connectivity → root cause
- `da server deploy`: wrapper around `kamal deploy`
- `da server logs webapp`: wrapper around `kamal app logs`

### Task 2E: PyPI distribution [DEPENDS ON 2A]

- `pyproject.toml` for the CLI package
- `uv tool install data-analyst` or `uv pip install data-analyst`
- Entry point: `[project.scripts] da = "cli.main:app"`
- Minimal dependencies: typer, httpx, duckdb, rich (progress bars)

---
## Phase 3: Docker + Kamal

**Goal:** `docker compose up` for dev, `kamal deploy` for production. Replaces 10+ manual steps.

### Task 3A: Dockerfile + docker-compose.yml [INDEPENDENT]

```
Dockerfile                    # python:3.13-slim, uv install, one image
docker-compose.yml            # webapp, api, scheduler, telegram-bot, ws-gateway
docker-compose.test.yml       # api + test-runner for integration tests
```

- One image, a different CMD per service
- The `/data` volume is shared between containers
- `profiles: ["full"]` for optional services (telegram, ws-gateway)
- `uv sync` instead of `pip install` in the Dockerfile

### Task 3B: Scheduler service [DEPENDS ON 0A]

New file: `services/scheduler/__main__.py`
- APScheduler (or a simple custom scheduler) replaces the 7 systemd timers:

| Timer | Schedule | Function |
|---|---|---|
| data-refresh | 15 min | `DataSyncManager.sync_scheduled()` |
| catalog-refresh | 15 min | Catalog refresh |
| corporate-memory | 30 min | Knowledge collector |
| session-collector | 6h | Session collection (from uploaded data) |
| user-scripts | per-script cron | Script runner |
| profiler | after data-refresh | Auto-profile new data |

### Task 3C: Kamal configuration [DEPENDS ON 3A]

```
config/
  deploy.yml                  # production Kamal config
  deploy.staging.yml          # staging override
```

- Kamal Proxy for auto-SSL (Let's Encrypt)
- Healthcheck on `/api/health`
- Zero-downtime deploys
- Accessories: scheduler, telegram-bot, ws-gateway, script-runner
- Environment secrets via Kamal env management

### Task 3D: GitHub Actions CI/CD [DEPENDS ON 3A, 3C]

```
.github/workflows/
  ci.yml                      # test + build on every push
  deploy.yml                  # staging on PR, production on merge to main
```

Flow: push → pytest → integration tests (docker compose) → build image → push to GHCR → kamal deploy to staging (PR) / production (merge)

### Task 3E: Remove the old server infrastructure [DEPENDS ON 3A-3D, once the new setup is verified to work]

Delete:
- `server/setup.sh` (103 lines)
- `server/webapp-setup.sh` (171 lines)
- `server/deploy.sh` (395 lines)
- `server/migrate-to-v2.sh` (146 lines)
- All systemd unit files (`services/*/systemd/`)
- `server/sudoers-*`
- `server/bin/add-analyst` and related scripts
- `scripts/sync_data.sh` (475 lines)
- `server/webapp.service`, `server/webapp-nginx.conf`

---
## Phase 4: RBAC + security

**Goal:** Application-level RBAC instead of Linux groups. Audit trail. Script sandboxing.

### Task 4A: Roles + permissions in DuckDB [DEPENDS ON 0A]

New file: `src/rbac.py`

```python
from enum import Enum


class Role(Enum):
    VIEWER = "viewer"       # Catalog, reading data
    ANALYST = "analyst"     # Sync, queries, voting, scripts
    ADMIN = "admin"         # User management, knowledge approval
    KM_ADMIN = "km_admin"   # Corporate memory governance
```

- Dataset-level permissions (who has access to which data)
- Rewrite `webapp/auth.py` lines 37-65 (admin_required/km_admin_required)
- Rewrite all of `webapp/user_service.py`: DB instead of `pwd.getpwnam()` + `sudo`

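One way the role check behind `admin_required`-style decorators might look once it no longer depends on Linux groups. The action names, the `PERMISSIONS` mapping, and the assumption that analyst includes viewer rights and admin includes analyst rights are all illustrative; the plan does not define a permission vocabulary or a role hierarchy.

```python
from enum import Enum
from functools import wraps


class Role(Enum):
    VIEWER = "viewer"
    ANALYST = "analyst"
    ADMIN = "admin"
    KM_ADMIN = "km_admin"


# Hypothetical action names, derived loosely from the role comments in the plan.
_VIEWER = {"catalog:read", "data:read"}
_ANALYST = _VIEWER | {"sync:trigger", "query:run", "memory:vote", "scripts:run"}
PERMISSIONS = {
    Role.VIEWER: _VIEWER,
    Role.ANALYST: _ANALYST,
    Role.ADMIN: _ANALYST | {"users:manage", "memory:approve"},
    Role.KM_ADMIN: _VIEWER | {"memory:approve", "memory:govern"},
}


class PermissionDenied(Exception):
    """Raised when the current user's role does not allow the action."""


def require(action: str):
    """Decorator factory: the wrapped function's first argument is a user dict
    with at least a 'role' key holding one of the Role values."""
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            role = Role(user["role"])  # ValueError for unknown role strings
            if action not in PERMISSIONS[role]:
                raise PermissionDenied(f"role '{role.value}' may not perform '{action}'")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


@require("sync:trigger")
def trigger_sync(user):
    return f"sync started by {user['email']}"


analyst_result = trigger_sync({"role": "analyst", "email": "a@example.com"})
print(analyst_result)
```

Calling `trigger_sync` with a viewer raises `PermissionDenied`, which matches the Phase 4 verification goal that a viewer cannot trigger a sync.
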
### Task 4B: Audit trail [DEPENDS ON 0A]

- Every API call is logged to the `audit_log` table
- Structure: timestamp, user_id, action, resource, params, result, duration
- An agent can run: `da query "SELECT * FROM system.audit_log WHERE action='sync_trigger' ORDER BY timestamp DESC LIMIT 10"`

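A sketch of how the audit fields listed above could be captured around a handler. An in-memory list stands in for the `audit_log` table, and the `audited` decorator plus the handler signature `(user_id, resource, **params)` are assumptions for illustration, not the planned middleware.

```python
import time
from datetime import datetime, timezone
from functools import wraps

AUDIT_LOG = []  # stand-in for inserts into the audit_log table


def audited(action: str):
    """Record one audit entry per call, whether the handler succeeds or raises."""
    def decorator(func):
        @wraps(func)
        def wrapper(user_id, resource, **params):
            started = time.perf_counter()
            result = "error"  # overwritten only if the handler returns normally
            try:
                value = func(user_id, resource, **params)
                result = "ok"
                return value
            finally:
                AUDIT_LOG.append({
                    "timestamp": datetime.now(timezone.utc).isoformat(),
                    "user_id": user_id,
                    "action": action,
                    "resource": resource,
                    "params": params,
                    "result": result,
                    "duration_ms": int((time.perf_counter() - started) * 1000),
                })
        return wrapper
    return decorator


@audited("sync_trigger")
def trigger_sync(user_id, resource, *, force=False):
    if resource == "missing_table":
        raise KeyError(resource)
    return f"queued {resource}"


trigger_sync("u1", "orders", force=True)
print(AUDIT_LOG[-1]["action"], AUDIT_LOG[-1]["result"])
```

Writing the entry in a `finally` block means failed calls are audited too, which is usually what an operator looks for first.
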
### Task 4C: Script sandboxing [DEPENDS ON 3A]

- Script-runner as an isolated Docker container
- Read-only access to DuckDB
- Limited memory (512 MB) and run time (5 min), no network (except notification dispatch)
- Explicit whitelist of Python packages (pandas, duckdb, matplotlib)

### Task 4D: Corporate memory push model [DEPENDS ON 1D]

- Users push CLAUDE.local.md via `da sync --upload-only`
- The server never reads `/home/*/` as root
- The corporate memory collector processes uploaded data from the DB

---
## Dependency graph for multi-agent work

```
Phase 0:
  0A (DuckDB schema) ───────────────────────┐
  0C (migration script) ← depends on 0A     │
  0B (service migration) ← depends on 0A    │
                                            │
Phase 1:                                    │
  1A (FastAPI foundation) ← depends on 0A ──┤
  1B (sync/data EP) ← depends on 1A, 0A     │
  1C (query/scripts EP) ← depends on 1A     │
  1D (users/memory EP) ← depends on 1A      │
  1E (remove SSH) ← depends on 1B, 1D       │
                                            │
Phase 2:                                    │
  2A (CLI foundation) ← depends on 1A       │
  2B (sync cmd) ← depends on 2A, 1B         │
  2C (query/scripts cmd) ← depends on 2A    │
  2D (admin/server cmd) ← depends on 2A     │
  2E (PyPI) ← depends on 2A                 │
                                            │
Phase 3:                                    │
  3A (Dockerfile) ← INDEPENDENT ────────────┘
  3B (scheduler) ← depends on 0A
  3C (Kamal) ← depends on 3A
  3D (CI/CD) ← depends on 3A, 3C
  3E (cleanup) ← depends on 3A-3D verified

Phase 4:
  4A (RBAC) ← depends on 0A
  4B (audit) ← depends on 0A
  4C (sandbox) ← depends on 3A
  4D (push model) ← depends on 1D
```

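The graph above can be turned into "waves" of tasks that are safe to run in parallel. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+). The edges are transcribed from the graph, and `parallel_waves` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter

# Task -> set of tasks it depends on, transcribed from the dependency graph.
DEPENDS_ON = {
    "0A": set(), "0B": {"0A"}, "0C": {"0A"},
    "1A": {"0A"}, "1B": {"1A", "0A"}, "1C": {"1A"}, "1D": {"1A"},
    "1E": {"1B", "1D"},
    "2A": {"1A"}, "2B": {"2A", "1B"}, "2C": {"2A"}, "2D": {"2A"}, "2E": {"2A"},
    "3A": set(), "3B": {"0A"}, "3C": {"3A"}, "3D": {"3A", "3C"},
    "3E": {"3A", "3B", "3C", "3D"},
    "4A": {"0A"}, "4B": {"0A"}, "4C": {"3A"}, "4D": {"1D"},
}


def parallel_waves(graph):
    """Group tasks into waves: every task in a wave has all dependencies
    satisfied by earlier waves, so tasks within a wave can run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = parallel_waves(DEPENDS_ON)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Only 0A and 3A have no prerequisites, so they form the first wave; everything else unlocks from there. This is a scheduling aid only and says nothing about how long each task takes.
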
### Parallel agents: optimal split

```
AGENT 1: DuckDB + Repositories    AGENT 2: FastAPI          AGENT 3: Docker + Kamal
─────────────────────────────     ─────────────────         ──────────────────────
0A: DuckDB schema                 (waits for 0A)            3A: Dockerfile + compose
0C: migration script              1A: FastAPI foundation    3B: scheduler service
0B: service migration             1B: sync/data EP          3C: Kamal configuration
4A: RBAC                          1C: query/scripts EP      3D: CI/CD workflow
4B: audit trail                   1D: users/memory EP       4C: script sandbox
                                  1E: remove SSH deps

AGENT 4: CLI + Skills             AGENT 5: Integration + Cleanup
─────────────────────             ──────────────────────────────
(waits for 1A)                    (waits for agents 1-4)
2A: CLI foundation + auth         End-to-end testing
2B: sync commands                 3E: remove old infra
2C: query/scripts commands        4D: corporate memory push
2D: admin/server commands         5A: CLAUDE.md template update
2E: PyPI distribution             Documentation update
5B: CLI skills (help/docs)
5C: da setup (interactive)
5D: da diagnose
5E: da infra (multi-customer)
```

---
## Reused vs. rewritten files

### Unchanged (business logic preserved)
- `src/config.py`: TableConfig, Config parsing (625 lines)
- `src/parquet_manager.py`: Parquet conversion engine
- `connectors/keboola/adapter.py` + `client.py`
- `connectors/bigquery/adapter.py` + `client.py`
- `connectors/jira/`: the whole connector
- `connectors/llm/`: LLM abstraction
- `connectors/openmetadata/`: catalog enrichment
- `webapp/config.py`, `config/loader.py`
- `webapp/templates/`: all HTML templates
- `src/remote_query.py`: query logic (wrapped by the API)
- `src/profiler.py`: profiling logic (output goes to DuckDB)

### Rewired to DuckDB (logic preserved, I/O layer swapped)
- `webapp/corporate_memory_service.py`
- `webapp/sync_settings_service.py`
- `webapp/telegram_service.py`
- `src/data_sync.py` (SyncState class)
- `src/table_registry.py`
- `services/corporate_memory/collector.py`
- `services/telegram_bot/storage.py`

### Rewritten
- `webapp/user_service.py`: DB instead of Linux users
- `webapp/auth.py` lines 37-65: RBAC instead of Linux groups

### New
- `src/db.py`, `src/repositories/`, `src/rbac.py`
- `api/`: the whole FastAPI server
- `cli/`: the whole CLI tool
- `Dockerfile`, `docker-compose*.yml`, `config/deploy*.yml`
- `services/scheduler/__main__.py`
- `.github/workflows/ci.yml`, `.github/workflows/deploy.yml`

### Deleted
- `server/setup.sh`, `server/webapp-setup.sh`, `server/deploy.sh`
- `server/migrate-to-v2.sh`
- `server/sudoers-*`, `server/bin/add-analyst`
- `scripts/sync_data.sh`
- All `services/*/systemd/` files
- `server/webapp.service`, `server/webapp-nginx.conf`

---
## Phase 5: Agent Skills (CLAUDE.md + CLI skills)

**Goal:** The AI agent has built-in knowledge for deployment, administration, diagnostics, and development. It does not need to search the web: everything is in the skills.

### Task 5A: CLAUDE.md template for analysts [INDEPENDENT]

Update `docs/setup/claude_md_template.md`:
- Instructions for the `da` CLI instead of SSH/rsync
- `da sync` as the mandatory start of a session
- How to work with the local DuckDB
- How to create and deploy scripts
- How to use corporate memory
- Notification patterns (local vs. server-side)

### Task 5B: Admin/deploy skills in the CLI [DEPENDS ON 2D]

The `da` CLI will ship built-in skills: long help texts with domain knowledge that an AI agent reads via `da <command> --help` or `da skills <topic>`:

```bash
da skills list              # list all available skills
da skills setup             # complete guide to setting up a new instance
da skills troubleshoot      # diagnostic procedures
da skills connectors        # how to add a new data source
da skills notifications     # how notifications work
da skills corporate-memory  # governance, approval flow
da skills security          # RBAC, permissions, audit
da skills backup-restore    # disaster recovery
da skills upgrade           # how to upgrade to a new version
```

Each skill is a markdown file in `cli/skills/` that is displayed via `da skills <name>`.

### Task 5C: Interactive setup skill [DEPENDS ON 2D, 1A]

```bash
da setup                    # the AI agent starts an interactive setup
```

Flow (driven by the agent):
1. `da setup init` → generates `instance.yaml` from the conversation with the user
2. `da setup test-connection` → verifies credentials (Keboola/BigQuery)
3. `da setup deploy` → `docker compose up` or `kamal deploy`
4. `da setup first-sync` → triggers the first data sync
5. `da setup verify` → healthcheck, table count, sample query
6. `da setup add-user` → adds the first analyst

Each step returns structured JSON, so the agent knows what to do next.

### Task 5D: Diagnose skill [DEPENDS ON 2D, 1D]

```bash
da diagnose                                # full diagnostics
da diagnose --symptom "data not updating"  # targeted diagnostics
da diagnose --component scheduler          # diagnostics for a single service
```

Output (structured for the agent):
```json
{
  "overall": "degraded",
  "checks": [
    {"name": "api", "status": "ok", "latency_ms": 12},
    {"name": "scheduler", "status": "ok", "last_run": "2026-03-27T08:00"},
    {"name": "data_freshness", "status": "warning",
     "detail": "table 'orders' last synced 26h ago, expected 15min",
     "suggested_action": "da server logs scheduler | grep orders"},
    {"name": "disk", "status": "ok", "usage": "45%"},
    {"name": "duckdb", "status": "ok", "tables": 47, "total_rows": "12.3M"}
  ],
  "suggested_actions": [
    "Check scheduler logs for 'orders' sync failures",
    "Run: da server logs scheduler --since 24h | grep -i error"
  ]
}
```

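An agent consuming this report mostly needs the non-ok checks and their suggested commands. A small sketch of that step follows; `failing_checks`, `next_commands`, and the `"error"` status value are assumptions, since the example output only shows `ok` and `warning`.

```python
import json

# Severity order; "error" is an assumed status not shown in the sample output.
STATUS_RANK = {"ok": 0, "warning": 1, "error": 2}


def failing_checks(report_json: str) -> list:
    """Return the checks whose status is not 'ok', most severe first."""
    report = json.loads(report_json)
    bad = [check for check in report["checks"] if check["status"] != "ok"]
    return sorted(bad, key=lambda c: STATUS_RANK.get(c["status"], 99), reverse=True)


def next_commands(report_json: str) -> list:
    """Collect the per-check suggested_action strings for the failing checks."""
    return [
        check["suggested_action"]
        for check in failing_checks(report_json)
        if "suggested_action" in check
    ]


# Trimmed copy of the sample output above.
sample = json.dumps({
    "overall": "degraded",
    "checks": [
        {"name": "api", "status": "ok", "latency_ms": 12},
        {"name": "data_freshness", "status": "warning",
         "detail": "table 'orders' last synced 26h ago, expected 15min",
         "suggested_action": "da server logs scheduler | grep orders"},
        {"name": "disk", "status": "ok", "usage": "45%"},
    ],
})
names = [check["name"] for check in failing_checks(sample)]
commands = next_commands(sample)
print(names, commands)
```

Keeping `suggested_action` machine-readable per check, as the sample does, lets the agent act without parsing the free-text `detail` field.
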
### Task 5E: Operational skills for multi-customer setups [DEPENDS ON 3C]

```bash
da infra list                      # list customer instances
da infra provision --customer acme --cloud gcp --region europe-west1
da infra status acme               # health of a customer instance
da infra deploy acme               # deploy to the customer's server
da infra backup acme               # data snapshot
```

Future extension: Terraform under the hood for provisioning, Kamal for deploys.

---
## Verification

### Per phase
1. **Phase 0:** `pytest tests/` is green, the webapp behaves identically with the DuckDB backend
2. **Phase 1:** `curl /api/health` → ok, `curl /api/sync/manifest` → manifest, parquet download works
3. **Phase 2:** `da login && da sync` produces the same structure as `sync_data.sh`, `da query` works offline
4. **Phase 3:** `docker compose up` → all services run, `kamal deploy -d staging` → staging works
5. **Phase 4:** a viewer cannot trigger a sync, an admin can manage users, scripts run in the sandbox

### End-to-end test (full flow)
1. `docker compose up -d` (or `kamal deploy`)
2. In the webapp: log in, select datasets
3. `da login && da sync` → parquet files available locally
4. `da query "SELECT count(*) FROM orders"` → result offline
5. `da scripts run sales_alert` → local execution
6. `da scripts deploy sales_alert --schedule "0 8 * * MON"` → server-side execution
7. `da sync --upload-only` → sessions/artifacts on the server
8. Corporate memory: knowledge items visible in the webapp
9. Telegram notifications delivered
10. `da diagnose` → structured health report

docs/superpowers/plans/2026-03-27-01-duckdb-state-layer.md (1574 changes, normal file)
File diff suppressed because it is too large

docs/superpowers/specs/2026-03-27-refactoring-design.md (524 changes, normal file)
@@ -0,0 +1,524 @@

# AI Data Analyst: Refactoring Design Spec

**Date:** 2026-03-27
**Status:** Draft
**Target:** Greenfield demo with Keboola internal data

## 1. Problem Statement

The platform was built iteratively as an internal tool and needs to become a product for external customers (Groupon, others). Key problems:

1. **Fragile filesystem state**: 10+ JSON files; permission conflicts between processes (www-data, deploy, root, user) cause outages
2. **No API**: all operations run via SSH + bash scripts, with no programmatic control
3. **Security via Linux groups**: no real RBAC, SSH keys visible in `ps aux`, root reads user homes
4. **Complex installation**: 10+ manual steps, specific OS requirements, dual-repo pattern with symlinks
5. **Operations nightmare**: scattered scripts, no unified logging/monitoring; the creator calls it a "duct tape solution"

The system is designed for AI agents: humans discuss with the AI, and the AI handles everything (user, admin, dev operations).

**Constraint:** UX must remain identical. Web catalog, data sync, offline Claude Code analysis, Telegram notifications, and corporate memory are all preserved.

## 2. Architecture

### Target State

```
SERVER (Docker + Kamal):
┌──────────────────────────────────────────────────┐
│  FastAPI Main App (1 process)                    │
│  ├── Web UI (Jinja2 templates)                   │
│  ├── REST API (/api/*)                           │
│  ├── WebSocket (/ws/notifications)               │
│  └── Auth (JWT + pluggable providers)            │
└──────────────────────────────────────────────────┘
┌──────────────────┐  ┌──────────────────────────────┐
│ Scheduler sidecar│  │ Telegram bot (optional)      │
│ Calls /api/      │  │ Long-running daemon          │
└──────────────────┘  └──────────────────────────────┘

/data/state/system.duckdb         ← system state (users, sync, knowledge, audit)
/data/analytics/server.duckdb     ← views on parquet files
/data/parquet/**                  ← data files

LOCAL (analyst):
┌──────────────────────────────────────────────────┐
│  da CLI (uv tool install data-analyst-cli)       │
│  user/duckdb/analytics.duckdb ← views + user tbls│
│  server/parquet/** ← downloaded via da sync      │
│  Claude Code ← works offline with DuckDB         │
└──────────────────────────────────────────────────┘
```

### Key Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Web framework | FastAPI only (no Flask) | One framework, OpenAPI auto-schema, async native, Jinja2 support |
| State storage | DuckDB | Already in stack, agent can join state with analytics, better than SQLite for analytical queries |
| CLI tool | `da` via `uv tool install` | AI-agent native interface, no Docker dependency locally |
| Server deploy | Docker + Kamal | Zero-downtime deploys, auto-SSL, simple config |
| Architecture | Hybrid (main app + scheduler sidecar + optional telegram) | 3 containers max, WebSocket in main app |
| Auth providers | All 3 (Google OAuth + Email magic link + Password) | Full compatibility with existing users |
| LLM provider | Configurable in instance.yaml | User chooses: local Ollama, Anthropic, OpenAI, AI Gateway |
| Python tooling | uv everywhere (no pip) | Faster, deterministic, modern |

## 3. Data Layer

### Server DuckDB: system.duckdb

```sql
-- Users & RBAC
CREATE TABLE users (
    id VARCHAR PRIMARY KEY,
    email VARCHAR UNIQUE NOT NULL,
    name VARCHAR,
    role VARCHAR DEFAULT 'analyst',  -- viewer, analyst, admin, km_admin
    password_hash VARCHAR,
    setup_token VARCHAR,
    reset_token VARCHAR,
    created_at TIMESTAMP DEFAULT current_timestamp,
    updated_at TIMESTAMP
);

CREATE TABLE user_sync_settings (
    user_id VARCHAR REFERENCES users(id),
    dataset VARCHAR NOT NULL,
    enabled BOOLEAN DEFAULT false,
    table_mode VARCHAR DEFAULT 'all',  -- all, explicit
    tables JSON,
    updated_at TIMESTAMP,
    PRIMARY KEY (user_id, dataset)
);

CREATE TABLE dataset_permissions (
    user_id VARCHAR REFERENCES users(id),
    dataset VARCHAR NOT NULL,
    access VARCHAR DEFAULT 'read',  -- read, none
    PRIMARY KEY (user_id, dataset)
);

-- Sync state + history
CREATE TABLE sync_state (
    table_id VARCHAR PRIMARY KEY,
    last_sync TIMESTAMP,
    rows BIGINT,
    file_size_bytes BIGINT,
    uncompressed_size_bytes BIGINT,
    columns INTEGER,
    hash VARCHAR,
    status VARCHAR DEFAULT 'ok',
    error TEXT
);

CREATE TABLE sync_history (
    id VARCHAR PRIMARY KEY,
    table_id VARCHAR NOT NULL,
    synced_at TIMESTAMP NOT NULL,
    rows BIGINT,
    duration_ms INTEGER,
    status VARCHAR,
    error TEXT
);

-- Corporate memory
CREATE TABLE knowledge_items (
    id VARCHAR PRIMARY KEY,
    title VARCHAR NOT NULL,
    content TEXT,
    category VARCHAR,
    tags TEXT[],
    status VARCHAR DEFAULT 'pending',  -- pending, approved, mandatory, rejected
    contributors TEXT[],
    source_user VARCHAR,
    audience VARCHAR,
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);

CREATE TABLE knowledge_votes (
    item_id VARCHAR REFERENCES knowledge_items(id),
    user_id VARCHAR REFERENCES users(id),
    vote INTEGER,  -- 1 or -1
    voted_at TIMESTAMP,
    PRIMARY KEY (item_id, user_id)
);

-- Audit
CREATE TABLE audit_log (
    id VARCHAR PRIMARY KEY,
    timestamp TIMESTAMP NOT NULL,
    user_id VARCHAR,
    action VARCHAR NOT NULL,
    resource VARCHAR,
    params JSON,
    result VARCHAR,
    duration_ms INTEGER
);

-- Notifications
CREATE TABLE telegram_links (
    user_id VARCHAR PRIMARY KEY REFERENCES users(id),
    chat_id BIGINT NOT NULL,
    linked_at TIMESTAMP
);

CREATE TABLE pending_codes (
    code VARCHAR PRIMARY KEY,
    chat_id BIGINT NOT NULL,
    created_at TIMESTAMP
);

CREATE TABLE script_registry (
    id VARCHAR PRIMARY KEY,
    name VARCHAR NOT NULL,
    owner VARCHAR REFERENCES users(id),
    schedule VARCHAR,  -- cron expression or null
    source TEXT NOT NULL,
    deployed_at TIMESTAMP,
    last_run TIMESTAMP,
    last_status VARCHAR
);

-- Table registry
CREATE TABLE table_registry (
    id VARCHAR PRIMARY KEY,
    name VARCHAR NOT NULL,
    folder VARCHAR,
    sync_strategy VARCHAR,
    primary_key VARCHAR,
    description TEXT,
    registered_by VARCHAR,
    registered_at TIMESTAMP
);

-- Profiles
CREATE TABLE table_profiles (
    table_id VARCHAR PRIMARY KEY,
    profile JSON NOT NULL,
    profiled_at TIMESTAMP
);
```


### Server DuckDB: server.duckdb

Auto-generated views on parquet files:

```sql
CREATE VIEW orders AS SELECT * FROM read_parquet('/data/parquet/sales/orders.parquet');
CREATE VIEW customers AS SELECT * FROM read_parquet('/data/parquet/sales/customers.parquet');
-- Generated from schema.yml by profiler/sync
```

### Local DuckDB: analytics.duckdb

Views on local parquets (generated by `da sync`):

```sql
CREATE VIEW orders AS SELECT * FROM read_parquet('./server/parquet/sales/orders.parquet');
-- User-created tables survive da sync (rebuild drops only views, not tables)
```

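The "rebuild drops only views, not tables" rule can be sketched as a pure function that produces the SQL to run against `analytics.duckdb`. This is a minimal illustration, not the actual `da sync` code: the function name, the `(folder, name)` table pairs, and the `{root}/{folder}/{name}.parquet` layout are assumptions inferred from the example paths above.

```python
def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def rebuild_view_statements(parquet_root, tables, existing_views):
    """Return SQL that refreshes views and never touches user tables.

    parquet_root:   directory holding the synced parquet files (assumed layout)
    tables:         iterable of (folder, name) pairs, e.g. ("sales", "orders")
    existing_views: names of views currently present in the local database
    """
    wanted = {name for _, name in tables}
    statements = []

    # Drop views whose parquet is no longer synced. Only DROP VIEW is
    # emitted, so tables the analyst created by hand are left alone.
    for view in sorted(set(existing_views) - wanted):
        statements.append(f"DROP VIEW IF EXISTS {quote_ident(view)};")

    # (Re)create one view per synced parquet file.
    root = parquet_root.rstrip("/")
    for folder, name in sorted(tables):
        path = f"{root}/{folder}/{name}.parquet"
        statements.append(
            f"CREATE OR REPLACE VIEW {quote_ident(name)} AS "
            f"SELECT * FROM read_parquet({quote_literal(path)});"
        )
    return statements


if __name__ == "__main__":
    for sql in rebuild_view_statements(
        "./server/parquet", [("sales", "orders")], ["orders", "old_view"]
    ):
        print(sql)
```

The list of existing views would have to come from the database itself, for instance by querying `information_schema.tables` for rows with `table_type = 'VIEW'`.
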
### Repository Pattern

```
src/repositories/
  __init__.py        # get_system_db(), get_analytics_db() factories
  users.py           # UserRepository (CRUD + role checks)
  sync_state.py      # SyncStateRepository (state + history)
  knowledge.py       # KnowledgeRepository (items + votes + governance)
  audit.py           # AuditRepository (append + query)
  scripts.py         # ScriptRepository (registry + scheduling)
  table_registry.py  # TableRegistryRepository
  notifications.py   # TelegramRepository + PendingCodeRepository
```

## 4. API Endpoints

### FastAPI Router Structure

```
app/
  main.py              # FastAPI app, lifespan events, middleware
  auth/
    router.py          # POST /auth/login, /auth/token, /auth/logout
    jwt.py             # JWT create/verify (PyJWT)
    providers/         # Pluggable: google/, email/, password/
    dependencies.py    # get_current_user, require_role(Role)
  web/
    router.py          # Web UI: GET /, /catalog, /memory, /settings...
    templates/         # Jinja2 (migrated from webapp/templates/)
    static/            # CSS, JS, images
  api/
    sync.py            # GET /api/sync/manifest, POST /api/sync/trigger
    data.py            # GET /api/data/{table}/download
    query.py           # POST /api/query
    scripts.py         # GET/POST /api/scripts, POST /api/scripts/{id}/run
    users.py           # CRUD /api/users
    settings.py        # GET/PUT /api/users/{id}/settings
    memory.py          # CRUD /api/memory, POST /api/memory/{id}/vote
    health.py          # GET /api/health
    upload.py          # POST /api/upload/sessions, /artifacts, /local-md
  ws/
    notifications.py   # WebSocket /ws/notifications
```

### Key Endpoints

| Endpoint | Method | Auth | Purpose |
|----------|--------|------|---------|
| `/api/sync/manifest` | GET | JWT (analyst+) | Hash-based manifest of all synced data |
| `/api/sync/trigger` | POST | JWT (admin) | Trigger data sync from source |
| `/api/data/{table}/download` | GET | JWT (analyst+) | Stream parquet file (ETag support) |
| `/api/query` | POST | JWT (analyst+) | Execute SQL against server DuckDB |
| `/api/scripts` | GET/POST | JWT (analyst+) | List/deploy user scripts |
| `/api/scripts/{id}/run` | POST | JWT (analyst+) | Execute script in sandbox |
| `/api/users` | GET/POST/DELETE | JWT (admin) | User management |
| `/api/memory` | GET/POST/PUT | JWT (analyst+) | Corporate memory CRUD |
| `/api/health` | GET | none | Structured health check |
| `/api/upload/sessions` | POST | JWT (analyst+) | Upload Claude session transcripts |
| `/api/upload/local-md` | POST | JWT (analyst+) | Upload CLAUDE.local.md content |

### Sync Protocol

1. CLI calls `GET /api/sync/manifest` and receives a hash per table/asset
2. CLI compares the hashes with its local state in `~/.config/da/sync_state.json`
3. For each changed table: `GET /api/data/{table}/download`, streamed to `./server/parquet/`
4. Download changed docs, rules, profiles, scripts
5. Upload new sessions, artifacts, CLAUDE.local.md content
6. Rebuild local DuckDB views (preserve user-created tables)
7. Update the local sync state with the new hashes

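Steps 1 to 3 boil down to a hash comparison. The sketch below assumes both the manifest and the local state reduce to a flat `{table_id: hash}` mapping; the real manifest format is not specified here, and `plan_downloads` is a hypothetical helper name.

```python
def plan_downloads(remote_manifest: dict[str, str], local_state: dict[str, str]) -> list[str]:
    """Return the table ids whose remote hash differs from the local one.

    A table that is missing locally has no stored hash, so it is
    treated as changed and downloaded as well.
    """
    return sorted(
        table
        for table, digest in remote_manifest.items()
        if local_state.get(table) != digest
    )


if __name__ == "__main__":
    remote = {"orders": "a1", "customers": "b2", "events": "c3"}
    local = {"orders": "a1", "customers": "stale"}
    print(plan_downloads(remote, local))
```

Only after a download succeeds should the table's new hash be written back to the local state, so an interrupted sync is retried on the next run.
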
## 5. CLI Tool (`da`)

### Structure

```
cli/
  main.py            # Typer app, --server/--json global options
  config.py          # ~/.config/da/ management (token, server URL, sync state)
  client.py          # httpx async client (JWT auth, retry, streaming, progress bars)
  duckdb_local.py    # Local DuckDB management (create views, query, explore)
  commands/
    auth.py          # da login/logout/whoami
    sync.py          # da sync [--table X] [--upload-only] [--docs-only]
    query.py         # da query "SQL" [--remote] [--json] [--format csv/table/json]
    scripts.py       # da scripts list/run/deploy/undeploy
    explore.py       # da explore {table}
    admin.py         # da admin add-user/remove-user/list-users/set-role
    status.py        # da status [--local] [--json]
    server.py        # da server deploy/rollback/logs/status/backup
    setup.py         # da setup init/test-connection/deploy/first-sync/verify
    diagnose.py      # da diagnose [--symptom X] [--component Y]
    skills.py        # da skills list/show
    infra.py         # da infra provision/status/deploy (future)
  skills/            # Markdown knowledge base for AI agents
    setup.md
    troubleshoot.md
    connectors.md
    notifications.md
    corporate-memory.md
    security.md
    backup-restore.md
    upgrade.md
```

### Distribution

```toml
[project]
name = "data-analyst-cli"
requires-python = ">=3.11"
dependencies = ["typer>=0.12", "httpx>=0.27", "duckdb>=1.1", "rich>=13", "pyjwt>=2.8"]

[project.scripts]
da = "cli.main:app"
```

Install: `uv tool install data-analyst-cli`

### Offline Capability

After `da sync`, everything works without network:
- `da query` → local DuckDB
- `da scripts run` → local Python execution
- `da explore` → local profile data
- `da status --local` → sync timestamps from local manifest

## 6. Deploy & Infrastructure

### Docker

```dockerfile
FROM python:3.13-slim
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev
COPY . .
CMD ["uv", "run", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

### Docker Compose (dev)

```yaml
services:
  app:
    build: .
    ports: ["8000:8000"]
    volumes: [".:/app", "data:/data"]
    env_file: .env
    command: uv run uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

  scheduler:
    build: .
    volumes: ["data:/data"]
    env_file: .env
    command: uv run python -m services.scheduler

  telegram-bot:
    build: .
    volumes: ["data:/data"]
    env_file: .env
    command: uv run python -m services.telegram_bot
    profiles: ["full"]

volumes:
  data:
```

### Scheduler Sidecar

The scheduler is a lightweight process that triggers jobs by calling the main app's API:

```python
# services/scheduler/__main__.py
import os

import httpx
from apscheduler.schedulers.blocking import BlockingScheduler

API_URL = os.environ.get("API_URL", "http://app:8000")
API_TOKEN = os.environ.get("SCHEDULER_API_TOKEN")  # internal service token

scheduler = BlockingScheduler()


@scheduler.scheduled_job("interval", minutes=15)
def data_refresh():
    # Ask the main app to pull fresh data from the source
    httpx.post(f"{API_URL}/api/sync/trigger", headers={"Authorization": f"Bearer {API_TOKEN}"})


@scheduler.scheduled_job("interval", minutes=30)
def corporate_memory():
    # Ask the main app to collect new corporate memory items
    httpx.post(f"{API_URL}/api/internal/collect-knowledge", headers={"Authorization": f"Bearer {API_TOKEN}"})


# ... more jobs
scheduler.start()  # blocks until the process is stopped
```

This keeps all business logic in the main app. The scheduler is stateless and restartable.

### Kamal (production)

- Auto-SSL via Kamal Proxy (Let's Encrypt)
- Zero-downtime deploy
- Healthcheck on `/api/health`
- Staging: `kamal deploy -d staging`
- Production: `kamal deploy`
- Rollback: `kamal rollback`

### CI/CD (GitHub Actions)

```
push → pytest (unit) → docker compose test (integration) → build+push GHCR
PR → kamal deploy staging
merge main → kamal deploy production
```

## 7. Security

### RBAC

| Role | Permissions |
|------|-------------|
| `viewer` | Read catalog, view profiles, browse corporate memory |
| `analyst` | + sync data, run queries, vote on knowledge, run/deploy scripts |
| `admin` | + manage users, approve knowledge, trigger sync, view audit |
| `km_admin` | + corporate memory governance (approve/reject/mandate) |

Dataset-level permissions restrict which datasets each user can access.

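One way to read the `+` notation in the table is that each role inherits everything granted to the roles listed above it, with the dataset restriction applied on top. The sketch below follows that reading. It is an assumption, as are the permission strings and the `is_allowed` helper; in particular the spec does not say whether `km_admin` really includes every `admin` right.

```python
# Roles in ascending order of privilege (assumed from the table order).
ROLE_ORDER = ("viewer", "analyst", "admin", "km_admin")

# Permissions each role adds on top of the previous one (hypothetical names).
OWN_PERMISSIONS = {
    "viewer": {"catalog:read", "profiles:read", "memory:read"},
    "analyst": {"sync:pull", "query:run", "memory:vote", "scripts:run", "scripts:deploy"},
    "admin": {"users:manage", "memory:approve", "sync:trigger", "audit:read"},
    "km_admin": {"memory:reject", "memory:mandate"},
}


def permissions_for(role: str) -> set[str]:
    """Union of the role's own permissions and those of all lower roles."""
    idx = ROLE_ORDER.index(role)  # raises ValueError for an unknown role
    granted: set[str] = set()
    for name in ROLE_ORDER[: idx + 1]:
        granted |= OWN_PERMISSIONS[name]
    return granted


def is_allowed(role, permission, dataset=None, allowed_datasets=frozenset()):
    """Check the role permission first, then the dataset-level restriction."""
    if permission not in permissions_for(role):
        return False
    return dataset is None or dataset in allowed_datasets


if __name__ == "__main__":
    print(is_allowed("analyst", "query:run", "sales", {"sales"}))
    print(is_allowed("analyst", "query:run", "finance", {"sales"}))
```
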
### Auth Flow

1. Web: user logs in via Google OAuth / Email magic link / Password
2. Server issues JWT (contains: user_id, email, role, exp)
3. CLI: `da login` → OAuth browser flow → JWT stored in `~/.config/da/token.json`
4. All API calls include JWT in Authorization header
5. FastAPI dependency validates JWT + checks role permissions

### Audit Trail

Every API call is logged to the `audit_log` table:
- timestamp, user_id, action, resource, params, result, duration_ms
- Queryable by agent: `da query "SELECT * FROM system.audit_log WHERE ..."`

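A minimal sketch of how one such row could be produced around a request handler. The `audited` context manager and the in-memory `sink` list are hypothetical stand-ins for whatever writes to `audit_log`; only the field names come from the schema above.

```python
import json
import time
import uuid
from contextlib import contextmanager
from datetime import datetime, timezone


@contextmanager
def audited(sink: list, user_id: str, action: str, resource: str, params: dict):
    """Record one audit row for the wrapped block, whether it succeeds or fails."""
    started = time.perf_counter()
    row = {
        "id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "action": action,
        "resource": resource,
        "params": json.dumps(params),
        "result": "ok",
    }
    try:
        yield row
    except Exception as exc:
        # Keep the failure visible in the log, then let it propagate.
        row["result"] = f"error: {type(exc).__name__}"
        raise
    finally:
        row["duration_ms"] = int((time.perf_counter() - started) * 1000)
        sink.append(row)  # a real implementation would insert into audit_log


if __name__ == "__main__":
    log: list[dict] = []
    with audited(log, "u1", "query.run", "orders", {"sql": "SELECT 1"}):
        pass  # the handler body would run here
    print(log[0]["action"], log[0]["result"])
```
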
### Script Sandboxing

User scripts run in an isolated Docker container:
- Read-only DuckDB access
- Memory limit: 512MB, time limit: 5min
- No network (except notification dispatch)
- Whitelisted Python packages: pandas, duckdb, matplotlib, numpy

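The limits above map fairly directly onto `docker run` flags. The sketch below builds such a command; the image name, the mount targets, and the helper names are hypothetical, and the "except notification dispatch" carve-out is not modelled because `--network none` disables networking entirely.

```python
import subprocess


def sandbox_command(image: str, script_path: str, db_path: str, memory: str = "512m") -> list[str]:
    """Build a docker invocation that applies the sandbox limits."""
    return [
        "docker", "run", "--rm",
        "--network", "none",        # no network access from the script
        "--memory", memory,         # hard memory cap
        "--read-only",              # container filesystem is read-only
        "-v", f"{db_path}:/data/analytics/server.duckdb:ro",  # read-only DuckDB
        "-v", f"{script_path}:/sandbox/script.py:ro",
        image, "python", "/sandbox/script.py",
    ]


def run_sandboxed(image: str, script_path: str, db_path: str, timeout_s: int = 300):
    """Run the script and give up after timeout_s seconds (5 minutes by default).

    subprocess only stops the docker client when the timeout fires; a real
    runner would also have to stop the container itself.
    """
    return subprocess.run(
        sandbox_command(image, script_path, db_path),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )


if __name__ == "__main__":
    print(" ".join(sandbox_command(
        "script-runner:latest", "/srv/scripts/report.py", "/data/analytics/server.duckdb"
    )))
```

The package whitelist would be enforced by what is installed in the image rather than by flags at run time.
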
## 8. Testing Strategy

```
tests/
  unit/                    # No I/O, mocked dependencies
    test_repositories.py   # In-memory DuckDB
    test_sync_logic.py
    test_auth.py
    test_rbac.py
  integration/             # Docker compose, real DuckDB + sample data
    test_api_endpoints.py
    test_sync_flow.py
    test_cli_commands.py
  fixtures/
    sample_data/           # Small parquets for testing
    instance.yaml          # Test config
```

## 9. Migration Path

1. **Greenfield demo**: build new system from scratch with sample Keboola data
2. **Validate**: end-to-end: setup → sync → query → scripts → notifications
3. **Migrate internal**: point new system at Keboola internal, migrate users
4. **Migrate Groupon**: deploy new system for Groupon with their config
5. **Deprecate old**: remove old server infrastructure

## 10. Reused Code

| File | Status | Notes |
|------|--------|-------|
| `src/config.py` | Reused as-is | TableConfig, Config parsing |
| `src/parquet_manager.py` | Reused as-is | Parquet conversion |
| `connectors/keboola/` | Reused as-is | Keboola adapter + client |
| `connectors/bigquery/` | Reused as-is | BigQuery adapter + client |
| `connectors/jira/` | Reused as-is | Jira connector |
| `connectors/llm/` | Reused as-is | LLM abstraction |
| `connectors/openmetadata/` | Reused as-is | Catalog enrichment |
| `src/data_sync.py` | Rewired | SyncState → DuckDB repository |
| `src/remote_query.py` | Wrapped | Query logic wrapped by API endpoint |
| `src/profiler.py` | Rewired | Output to DuckDB instead of JSON |
| `src/table_registry.py` | Rewired | JSON → DuckDB repository |
| `webapp/corporate_memory_service.py` | Rewired | Business logic preserved, I/O swapped |
| `webapp/templates/` | Migrated | Jinja2 templates work in FastAPI |
| `auth/` | Migrated | Provider pattern preserved |

## 11. Deleted Code

| File | Reason |
|------|--------|
| `server/setup.sh` | Replaced by Docker |
| `server/webapp-setup.sh` | Replaced by Docker + Kamal |
| `server/deploy.sh` | Replaced by Kamal |
| `server/sudoers-*` | No more Linux user management |
| `server/bin/add-analyst` | Replaced by API + CLI |
| `scripts/sync_data.sh` | Replaced by `da sync` |
| `services/*/systemd/` | Replaced by Docker Compose |
| `webapp/user_service.py` | Rewritten for DB-based users |
| `webapp/sync_settings_service.py` (sudo parts) | Replaced by API |