chore: clean repo for public release — fix references, remove drafts

- Replace padak/tmp_oss → keboola/agnes-the-ai-analyst in all docs, infra, CLI
- Replace your-org/ai-data-analyst → keboola/agnes-the-ai-analyst in README, Jira docs
- Remove real GCP project ID from terraform.tfvars.example
- Delete internal draft documents (dev_docs/draft/)
- Update infra/main.tf to clone from main branch
This commit is contained in:
ZdenekSrotyr 2026-04-08 19:27:25 +02:00
parent 79443e0df4
commit 89154d043b
15 changed files with 877 additions and 1011 deletions

View file

@ -59,8 +59,8 @@ The short version:
```bash
# 1. Clone the repository
git clone https://github.com/your-org/ai-data-analyst.git
cd ai-data-analyst
git clone https://github.com/keboola/agnes-the-ai-analyst.git
cd agnes-the-ai-analyst
# 2. Copy and edit configuration
cp config/instance.yaml.example config/instance.yaml

View file

@ -19,9 +19,9 @@ ssh user@your-server-ip
### 2. Clone the repository
```bash
git clone https://github.com/padak/tmp_oss.git /opt/data-analyst
git clone https://github.com/keboola/agnes-the-ai-analyst.git /opt/data-analyst
cd /opt/data-analyst
git checkout feature/v2-fastapi-duckdb-docker-cli
git checkout main
```
### 3. Create .env file

View file

@ -500,11 +500,11 @@ The SLA polling job runs every 15 minutes via systemd timer (`jira-sla-poll.time
3. Updates raw JSON atomically (`tempfile.mkstemp()` + `os.fchmod(fd, 0o660)` + `os.replace()`)
4. Triggers incremental Parquet transform (inside advisory file lock)
**Self-healing:** The poll fetches `status`, `resolution`, `resolutiondate`, and `updated` alongside the SLA fields. If a ticket is resolved in Jira but still appears "open" in Parquet (e.g. due to a missed webhook), the poll automatically corrects the status in JSON and re-transforms to Parquet. Log output: `Self-healing: SUPPORT-XXXX is resolved in Jira`. This was added in response to [#203](https://github.com/your-org/ai-data-analyst/issues/203) where 12 tickets were permanently stale after a permission bug prevented webhooks from updating JSON files.
**Self-healing:** The poll fetches `status`, `resolution`, `resolutiondate`, and `updated` alongside the SLA fields. If a ticket is resolved in Jira but still appears "open" in Parquet (e.g. due to a missed webhook), the poll automatically corrects the status in JSON and re-transforms to Parquet. Log output: `Self-healing: SUPPORT-XXXX is resolved in Jira`. This was added in response to [#203](https://github.com/keboola/agnes-the-ai-analyst/issues/203) where 12 tickets were permanently stale after a permission bug prevented webhooks from updating JSON files.
**File locking:** The entire read-modify-write + Parquet transform is wrapped in a per-issue advisory file lock (`connectors/jira/file_lock.py`) to prevent races with the webhook handler. The webhook handler (`connectors/jira/service.py`) uses the same lock. Different issue keys don't block each other.
**Important — `mkstemp` and ACL:** The `issues/` directory uses POSIX ACLs with `default:mask::rwx`. `tempfile.mkstemp()` creates files with mode `0600`, which overrides the ACL mask to `---` and breaks group access for www-data (webhook handler) and deploy (batch transform). The `os.fchmod(fd, 0o660)` call immediately after `mkstemp()` restores the mask to `rw-`, preserving ACL-based access. See [#203](https://github.com/your-org/ai-data-analyst/issues/203) for the full incident report.
**Important — `mkstemp` and ACL:** The `issues/` directory uses POSIX ACLs with `default:mask::rwx`. `tempfile.mkstemp()` creates files with mode `0600`, which overrides the ACL mask to `---` and breaks group access for www-data (webhook handler) and deploy (batch transform). The `os.fchmod(fd, 0o660)` call immediately after `mkstemp()` restores the mask to `rw-`, preserving ACL-based access. See [#203](https://github.com/keboola/agnes-the-ai-analyst/issues/203) for the full incident report.
```bash
# Manual run

View file

@ -105,7 +105,7 @@ Disk Layout:
```bash
mkdir -p /opt/data-analyst
chown deploy:data-ops /opt/data-analyst
sudo -u deploy git clone git@github.com:your-org/ai-data-analyst.git /opt/data-analyst/repo
sudo -u deploy git clone git@github.com:keboola/agnes-the-ai-analyst.git /opt/data-analyst/repo
git config --global --add safe.directory /opt/data-analyst/repo
/opt/data-analyst/repo/server/setup.sh
```

View file

@ -1,233 +0,0 @@
# Service Connector - Integration of Internal APIs into Data Analyst Platform
## Context
The data analyst platform currently supports only data analysis (parquet files + DuckDB). We want to extend it so analysts can also interact with internal services (Purchase Order system, Invoicing, CRM) through Claude Code. This requires:
1. **API keys** delivered to the analyst's local machine (`.env` file)
2. **Skills** teaching Claude Code how to use each service's API (`.claude/rules/` markdown files)
3. **Seamless UX** - non-technical users click "Connect" in the web portal, everything else is automatic
Key constraints:
- All external services are internal apps (we can modify them)
- They already have Google OAuth and Bearer token/API key authentication
- They already have token generation UI
- We target 2-3 services initially
- Must reuse established patterns (sudo install, atomic JSON, sync_data.sh)
## Architecture Overview
```
User clicks "Connect" on your-instance.example.com
|
v
Webapp calls external service's internal token-exchange endpoint
| (service-to-service, shared secret)
v
API key returned, stored in /data/service-connectors/connections.json
|
v
Webapp writes /home/{user}/.service_env (sudo install, mode 600)
Webapp writes /home/{user}/.claude_rules/sc_{service}.md (skill file)
|
v
Analyst runs sync_data.sh
|
v
.service_env -> merged into ~/keboola-analysis/.env
sc_*.md -> already synced with existing corporate memory rules sync
```
## Implementation Plan
### Phase 1: Service Registry & Infrastructure
**1.1 Create service registry config**
- File: `docs/setup/service_connectors.json`
- Defines available services: id, name, description, URLs, env var names, skill file name
- Deployed to `/data/docs/setup/` by deploy.sh
**1.2 Create sudo helper script**
- File: `server/bin/install-service-env`
- Accepts: USERNAME, ENV_SOURCE_PATH, SKILLS_SOURCE_DIR
- Installs `.service_env` (mode 600) to user home
- Installs `sc_*.md` skill files to `.claude_rules/` (mode 600)
- Only removes `sc_*.md` files (leaves `km_*.md` from corporate memory intact)
- Template: `server/bin/install-user-rules` (63 lines, same structure)
**1.3 Update sudoers**
- File: `server/sudoers-webapp` - add entry for `install-service-env`
**1.4 Update deploy.sh**
- Create `/data/service-connectors/` directory (www-data:data-ops, 2770)
- Deploy service registry and skill files
- Add new env vars to .env block: `SC_SECRET_PURCHASE_ORDERS`, `SC_SECRET_INVOICING`, `SC_SECRET_CRM`
**1.5 Add config entries**
- File: `webapp/config.py` - no new config class entries needed (secrets read directly with `os.environ.get()` in the service module, same pattern as sync_settings_service.py)
### Phase 2: Backend Service
**2.1 Create service connector module**
- File: `webapp/service_connector_service.py`
- Pattern: follows `webapp/sync_settings_service.py` exactly
Key functions:
```python
# Data storage
CONNECTORS_DIR = Path(os.environ.get("CONNECTORS_DIR", "/data/service-connectors"))
CONNECTIONS_FILE = CONNECTORS_DIR / "connections.json"
# Core functions
def get_available_services() -> dict # Load registry
def get_user_connections(username: str) -> dict # User's connection status
def connect_service(username, service_id, user_email) -> (bool, str) # Token exchange + install
def disconnect_service(username, service_id) -> (bool, str) # Revoke + cleanup
def check_service_health(service_id) -> dict # Health check
# Internal
def _exchange_token(service, user_email) -> dict | None # Call external service
def _revoke_token(service, token_id) -> bool # Call revoke endpoint
def _regenerate_user_env(username) -> bool # Write .service_env via sudo
def _install_service_skills(username) -> bool # Write sc_*.md via sudo
def _get_server_username(webapp_username) -> str # Reuse WEBAPP_TO_SERVER_USERNAME
```
Storage format (`connections.json`):
```json
{
"john": {
"purchase_orders": {
"connected": true,
"api_key": "pk_live_abc123...",
"token_id": "tok_xyz789",
"connected_at": "2026-02-16T12:00:00Z",
"expires_at": "2026-05-17T12:00:00Z"
}
}
}
```
Note: API keys stored in connections.json (protected by 660 permissions, www-data:data-ops). This follows the same approach as telegram_users.json storing chat_ids. For internal services, this is acceptable security level.
**2.2 Add API routes to webapp**
- File: `webapp/app.py` - add routes in `register_routes()`
```
GET /api/service-connectors - List services + user connections
POST /api/service-connectors/connect - Connect to a service {service_id}
POST /api/service-connectors/disconnect - Disconnect {service_id}
GET /api/service-connectors/health/<service_id> - Health check
```
**2.3 Token exchange protocol**
What each external service needs to implement:
```
POST /api/internal/token-exchange
Authorization: Bearer <shared_secret>
Body: {"user_email": "john@your-domain.com", "ttl_days": 90}
Response: {"status": "ok", "api_key": "...", "token_id": "...", "expires_at": "..."}
POST /api/internal/token-revoke
Authorization: Bearer <shared_secret>
Body: {"token_id": "tok_xyz789"}
Response: {"status": "ok"}
```
### Phase 3: Dashboard UI
**3.1 Add Service Connectors card to dashboard**
- File: `webapp/templates/dashboard.html`
- New card in the existing 2-column layout (same pattern as Data Settings and Telegram cards)
- Shows grid of service cards with Connect/Disconnect buttons
- Connected = green badge + expiry date
- AJAX calls to `/api/service-connectors/*` endpoints
### Phase 4: Sync & Skills
**4.1 Extend sync_data.sh**
- File: `scripts/sync_data.sh`
- Add block after corporate memory rules sync (line ~418):
1. Download `~/.service_env` from server via SCP
2. If exists: merge into local `.env` using marker comments (`# --- SERVICE CONNECTOR START/END ---`)
3. If not exists: clean old service connector block from `.env`
```bash
# --- Sync service connector credentials ---
if scp -q data-analyst:~/.service_env /tmp/.service_env_$$ 2>/dev/null; then
# Remove old block, append new one with markers
sed -i.bak '/^# --- SERVICE CONNECTOR START ---$/,/^# --- SERVICE CONNECTOR END ---$/d' ./.env 2>/dev/null
{ echo "# --- SERVICE CONNECTOR START ---"; cat /tmp/.service_env_$$; echo "# --- SERVICE CONNECTOR END ---"; } >> ./.env
rm -f /tmp/.service_env_$$
fi
```
Note: `sc_*.md` skills are already synced by the existing corporate memory sync block (line 410: `scp -rq "data-analyst:~/.claude_rules/"* .claude/rules/`).
**4.2 Create skill files**
- Directory: `docs/service_connector_skills/`
- Files: `sc_purchase_orders.md`, `sc_invoicing.md`, `sc_crm.md`
- Content: Authentication setup, available endpoints, common patterns, data models
- Deployed to `/data/docs/service_connector_skills/` by deploy.sh
- Installed to user's `.claude_rules/` when they connect
### Phase 5: Tests
**5.1 Unit tests**
- File: `tests/test_service_connector_service.py`
- Test: connect/disconnect flow, env generation, registry loading, error handling
## Files to Create
| File | Purpose |
|------|---------|
| `webapp/service_connector_service.py` | Core service (connect, disconnect, env generation) |
| `docs/setup/service_connectors.json` | Service registry config |
| `docs/service_connector_skills/sc_purchase_orders.md` | PO API skill |
| `server/bin/install-service-env` | Sudo helper for env + skills install |
| `tests/test_service_connector_service.py` | Unit tests |
## Files to Modify
| File | Change |
|------|--------|
| `webapp/app.py` | Import service_connector_service, add 4 API routes |
| `webapp/templates/dashboard.html` | Add Service Connectors card widget |
| `server/sudoers-webapp` | Add `install-service-env` entry |
| `server/deploy.sh` | Create /data/service-connectors/, deploy skills, add env vars |
| `scripts/sync_data.sh` | Add .service_env download and .env merge block |
| `.github/workflows/deploy.yml` | Add SC_SECRET_* GitHub Secrets to env |
## Key Patterns Reused
- **Sudo install**: `sync_settings_service.py:_regenerate_user_config()` (line 143-183)
- **Atomic JSON**: `sync_settings_service.py:_write_json()` (line 61-74)
- **Username mapping**: `corporate_memory_service.py:_get_server_username()` (line 56-59)
- **Sudo helper script**: `server/bin/install-user-rules` (entire file)
- **Dashboard AJAX pattern**: Sync settings toggles in `dashboard.html`
## Security Model
| Stage | Protection |
|-------|------------|
| Token exchange (webapp <-> service) | HTTPS + shared secret in Authorization header |
| Central storage (connections.json) | /data/service-connectors/ (2770), file 660 |
| User home (.service_env) | Mode 600 (owner-only), sudo install |
| Transit (sync) | SCP over SSH |
| Client (.env) | Local filesystem; Claude Code settings deny Read(.env) |
| Claude Code usage | Python `load_dotenv()` via Bash (allowed) |
## Verification
1. **Unit tests**: `pytest tests/test_service_connector_service.py`
2. **Manual flow**:
- Deploy to server
- Log into your-instance.example.com
- Click "Connect" on PO system in dashboard
- Verify `.service_env` appears in `/home/{user}/`
- Run `sync_data.sh` on client
- Verify `.env` contains PO_API_KEY
- Verify `.claude/rules/sc_purchase_orders.md` exists
- In Claude Code: `python -c "from dotenv import load_dotenv; load_dotenv(); import os; print(os.environ.get('PO_API_KEY', 'NOT SET'))"`
3. **Disconnect flow**: Click Disconnect, verify key removed from .env after sync

View file

@ -1,180 +0,0 @@
# Pilot: PO Integrace pro Interního Analysta (bez lokálního ukládání klíčů)
## Shrnutí
Cíl je dodat produkční pilot pro **Purchase Order System** s tímto chováním:
1. Uživatel v dashboardu udělá jen `Connect` (1 klik + Google consent).
2. Server bezpečně uloží per-user credential (šifrovaně, ne v lokálním počítači).
3. Na klientovi se token načte **on-demand přes SSH** pouze pro jeden příkaz (ENV jen v child procesu).
4. Do `.claude/rules` se synchronizuje znalost, že PO operace mají používat PO wrapper/skill.
5. Dodáme zároveň **full skill beta** (vedle produkční varianty rules+wrapper).
## Scope a hranice
- In-scope:
- Obecný service registry model (rozšiřitelný), implementace provideru jen pro `po`.
- Connect/Disconnect/Test flow v dashboard katalogu.
- Server-side per-user encrypted credential store.
- Runtime wrapper přes SSH s krátkodobým tokenem.
- Audit + revoke + 30denní rotace.
- Sync rules + skill beta distribuce.
- Out-of-scope:
- CRM/Revolut implementace (jen připravený framework).
- Plná proxy architektura pro API calls (volání půjde přímo klient -> PO API).
- Migrace na externí secret manager v pilotu.
## Návrh řešení (decision-complete)
### 1) Service registry a datový model
- Přidat nový server-side registry JSON: `/data/integrations/registry.json`.
- Schema položky služby:
- `service_id` (`po`)
- `display_name`
- `connect_type` (`oauth_bootstrap_link`)
- `scopes`
- `api_base_url`
- `enabled`
- `skill_slug` (`po-system`)
- `rule_template_id`
- Per-user credential metadata:
- `/data/integrations/state/<service>/<username>.json`
- Obsah: `status`, `connected_at`, `last_rotated_at`, `expires_at`, `token_ref`, `audit_last_use_at`
- Per-user encrypted secret:
- `/data/integrations/secrets/<service>/<username>.enc`
- Šifrování AES-GCM, key z `/etc/internal-analyst/integrations.key` (root:root, 600).
### 2) Dashboard UX (katalog zdrojů)
- Do `webapp/templates/catalog.html` přidat novou kartu “Purchase Order System”.
- Akce:
- `Connect` -> redirect na bootstrap URL PO.
- `Disconnect` -> revoke + smazání secretu.
- `Test connection` -> ověření issuance short-lived tokenu.
- UI stav:
- `Not connected`, `Connected`, `Error`, `Reauthorization required`.
### 3) Webapp API rozhraní
- Přidat nové endpointy (Flask):
- `GET /api/integrations/catalog`
- `POST /api/integrations/<service>/connect`
- `GET /api/integrations/<service>/callback`
- `POST /api/integrations/<service>/disconnect`
- `POST /api/integrations/<service>/test`
- Interní service layer:
- nový `webapp/integration_service.py`
- provider interface:
- `build_connect_url(user, state)`
- `exchange_callback(code, state)`
- `rotate(user)`
- `issue_runtime_token(user, ttl_seconds)`
- `revoke(user)`
- `po` provider implementace jako první konkrétní provider.
### 4) PO bootstrap link + token lifecycle
- Connect flow:
1. Uživatel klikne Connect v dashboardu.
2. Redirect na PO auth/bootstrap endpoint (Google auth + consent).
3. Callback zpět na webapp.
4. Webapp uloží refresh credential šifrovaně.
- Rotace:
- server job (cron/systemd) denně kontroluje `last_rotated_at`.
- rotuje každých 30 dní.
- Revoke:
- Disconnect okamžitě revokuje token v PO a maže server secret.
### 5) Runtime token injection přes SSH (bez lokální persistence)
- Přidat server helper:
- `/usr/local/bin/integration-runtime-env`
- Vstup: `--service po --ttl 300 --format env`
- Výstup: shell-safe `export` lines pro child proces.
- Přístup:
- přes sudoers povolit pouze self-service issuance (uživatel jen pro svůj účet).
- helper tvrdě validuje volajícího uživatele a service allowlist.
- Lokální wrapper skript v syncovaných skriptech:
- `server/scripts/po-run`
- Spuštění:
- `ssh data-analyst "/usr/local/bin/integration-runtime-env --service po --ttl 300 --format env"`
- spustí cílový příkaz v subshell s ENV
- po skončení ENV zahodí (bez zápisu do souboru)
- Přímá API volání:
- klientský skript/skill volá PO API přímo s runtime ENV tokenem.
### 6) Rules + skill beta distribuce
- Produkčně:
- generovat service rule soubor na serveru: `/home/<user>/.claude_rules/svc_po.md`
- `scripts/sync_data.sh` už pravidla stahuje do `.claude/rules/`; upravit jen reporting, aby počítal i `svc_*.md`.
- Skill beta:
- přidat sync složku `server/skills/po-system/` (SKILL.md + scripts + references).
- přidat skript `server/scripts/install_skills.sh`:
- instaluje do uživatelova Codex skill home.
- `sync_data.sh` po syncu volá `install_skills.sh` (best-effort, neblokuje datový sync).
### 7) Audit, observability, bezpečnost
- Audit log soubor:
- `/data/integrations/audit.log` (append-only)
- eventy: `connect`, `callback_ok`, `runtime_issue`, `rotate`, `revoke`, `error`
- Bezpečnostní guardrails:
- token nikdy nelogovat
- helper vrací jen krátkodobý access token
- TTL runtime tokenu default 5 min
- strict input validation service/usernames
- Monitoring:
- dashboard health check pro integrations service
- metriky: connect success rate, runtime issuance latency, rotate failures.
## Důležité změny veřejných API/rozhraní/typů
- Nové REST endpointy pod `/api/integrations/*`.
- Nový datový kontrakt `registry.json` pro katalog služeb.
- Nový provider interface v `webapp/integration_service.py`.
- Nový CLI kontrakt helperu `integration-runtime-env`.
- Nový lokální wrapper command `po-run`.
## Testy a validační scénáře
### Unit testy
- `webapp/integration_service.py`:
- validace state/nonce
- encrypt/decrypt secretu
- rotace pravidla 30 dní
- provider `po`:
- connect URL tvorba
- callback exchange error handling
- helper `integration-runtime-env`:
- self-user enforcement
- invalid service rejection
- TTL bounds
### Integration testy
- E2E Connect flow:
- dashboard -> PO consent -> callback -> connected status.
- E2E runtime:
- `po-run "curl ..."` získá token přes SSH a provede volání.
- E2E Disconnect:
- revoke v PO + odstranění state/secrets + UI přepnutí na not connected.
- E2E rotation:
- forced rotation scenario + audit log verification.
### Security testy
- Pokus o issuance tokenu pro jiného uživatele (musí failnout).
- Pokus o command injection přes service parameter (musí failnout).
- Ověření, že token není v logu ani na disku klienta po dokončení příkazu.
### Acceptance kritéria
- Uživatel připojí PO službu 1 klikem + consent.
- Žádný persistentní API klíč na klientově disku.
- `po-run` funguje bez ručního exportu tokenu.
- Disconnect do 1 min deaktivuje další runtime issuance.
- Rotace probíhá automaticky po 30 dnech.
## Rollout plán
1. Backend + helper + audit nasadit behind feature flag `INTEGRATIONS_PO_ENABLED=false`.
2. Zapnout interně pro 12 test uživatele.
3. Ověřit E2E + security checklist.
4. Zapnout pro všechny analytiky.
5. Po stabilizaci přidat další provider (CRM) přes stejný interface.
## Assumptions a zvolené defaulty
- PO systém je plně pod kontrolou týmu a lze doplnit endpointy pro issuance/rotate/revoke.
- Identita uživatele je mapovatelná mezi dashboardem a server účtem.
- Runtime přenos credentialů má být pouze přes SSH mechanismus (ne lokální secret file).
- Default runtime token TTL: 300 sekund.
- Default rotace: 30 dní.
- Pilot storage není `.env`; používá per-user encrypted store.
- Produkční path v pilotu je rules+wrapper, full skill je beta paralelně.

View file

@ -1,577 +0,0 @@
# Service Connector - Integration of Internal APIs into Data Analyst Platform
## Origin
This plan was derived from two independent proposals (Claude and Codex), both reviewed by 3 AI models (6 reviews total). The reviews identified real issues but also pushed the design toward over-engineering. This final version applies KISS and YAGNI to keep only what matters.
**Previous drafts** (kept for reference):
- `services-integration-claude.md` - Claude plan (good patterns, missing encryption)
- `services-integrations-codex.md` - Codex plan (SSH runtime injection, encrypted store - overkill for pilot)
**What we cut and why:**
- Fernet encryption of connections.json - encryption key lives on the same server as the data, security theater
- fcntl file locking - 5-10 users clicking Connect once a month, race condition probability ~0%
- Transactional connect with rollback - if token exchange fails, user clicks again
- Audit log + HMAC + logrotate - Flask access log is enough for 5 users
- Auto-rotation timer + systemd units - set long TTL, user reconnects if expired
- Feature flag rollout - just deploy when ready
- Security tests (CSRF, injection) - internal tool behind Google OAuth, all users are employees
## Context
The data analyst platform supports data analysis (parquet + DuckDB). We want analysts to also interact with internal services (Purchase Orders, Invoicing, CRM) through Claude Code.
**What the analyst needs:**
1. API key in their local `.env` file
2. Skill file in `.claude/rules/` teaching Claude Code how to use the API
**What we already have:**
- `sync_data.sh` that syncs files from server to analyst's machine
- `.claude_rules/` sync for corporate memory (skills already flow through this)
- `sudo install` pattern for writing to user home dirs
- Dashboard with AJAX cards (Data Settings, Telegram)
**Trust model:** Employee laptops are trusted (corporate-managed, encrypted disks). If a laptop is compromised, we have bigger problems than API keys.
## Architecture Overview
```
User clicks "Connect" on your-instance.example.com
|
v
Webapp calls service's token-exchange endpoint (shared secret)
|
v
API key stored in /data/service-connectors/connections.json (plaintext, 660)
|
v
Webapp writes /home/{user}/.service_env (sudo install, mode 600)
Webapp writes /home/{user}/.claude_rules/sc_{service}.md (skill file)
|
v
Analyst runs sync_data.sh (existing)
|
v
.service_env -> merged into local .env
sc_*.md -> already synced via existing .claude_rules/ sync
```
That's it. No encryption layer, no file locking, no audit log, no rotation timer.
## Implementation
### 1. Service Registry
File: `docs/setup/service_connectors.json`
```json
[
{
"id": "purchase_orders",
"name": "Purchase Order System",
"description": "Create and query purchase orders",
"token_exchange_url": "https://po.internal.example.com/api/internal/token-exchange",
"token_revoke_url": "https://po.internal.example.com/api/internal/token-revoke",
"env_var_name": "PO_API_KEY",
"skill_file": "sc_purchase_orders.md",
"enabled": true
}
]
```
Deployed to `/data/docs/setup/` by deploy.sh (same as other config files).
### 2. Backend Service
File: `webapp/service_connector_service.py`
Follows `webapp/sync_settings_service.py` pattern exactly.
```python
"""
Service connector - manages API key provisioning for internal services.
Reads service registry from /data/docs/setup/service_connectors.json.
Stores user connections in /data/service-connectors/connections.json.
Writes .service_env and skill files to user home dirs via sudo install.
"""
import json
import logging
import os
import subprocess
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Any
import httpx
logger = logging.getLogger(__name__)
CONNECTORS_DIR = Path(os.environ.get("CONNECTORS_DIR", "/data/service-connectors"))
CONNECTIONS_FILE = CONNECTORS_DIR / "connections.json"
REGISTRY_FILE = Path(os.environ.get("REGISTRY_FILE", "/data/docs/setup/service_connectors.json"))
SKILLS_DIR = Path(os.environ.get("SC_SKILLS_DIR", "/data/docs/service_connector_skills"))
# Username mapping (reuse existing pattern)
WEBAPP_TO_SERVER_USERNAME = {
# Add overrides here if webapp username != server username
# "jane.smith": "jane",
}
def get_available_services() -> list[dict]:
"""Load service registry."""
if not REGISTRY_FILE.exists():
return []
with open(REGISTRY_FILE) as f:
services = json.load(f)
return [s for s in services if s.get("enabled", True)]
def get_user_connections(username: str) -> dict:
"""Get user's active connections (without API keys)."""
connections = _load_connections()
user_conns = connections.get(username, {})
# Strip API keys from response
safe = {}
for service_id, conn in user_conns.items():
safe[service_id] = {
"connected": conn.get("connected", False),
"connected_at": conn.get("connected_at"),
"expires_at": conn.get("expires_at"),
}
return safe
def connect_service(username: str, service_id: str, user_email: str) -> tuple[bool, str]:
"""Exchange token with service, store key, install to user home."""
service = _get_service_config(service_id)
if not service:
return False, "Unknown service"
# Get shared secret for this service
secret_env = f"SC_SECRET_{service_id.upper()}"
shared_secret = os.environ.get(secret_env)
if not shared_secret:
logger.error(f"Missing {secret_env} environment variable")
return False, "Service not configured"
# Token exchange
try:
resp = httpx.post(
service["token_exchange_url"],
headers={"Authorization": f"Bearer {shared_secret}"},
json={"user_email": user_email, "ttl_days": 365},
timeout=30,
)
resp.raise_for_status()
token_data = resp.json()
except Exception as e:
logger.error(f"Token exchange failed for {service_id}: {e}")
return False, f"Token exchange failed: {e}"
# Store in connections.json
connections = _load_connections()
connections.setdefault(username, {})[service_id] = {
"connected": True,
"api_key": token_data["api_key"],
"token_id": token_data.get("token_id"),
"connected_at": datetime.utcnow().isoformat() + "Z",
"expires_at": token_data.get("expires_at"),
}
_save_connections(connections)
# Write .service_env and skills to user home
server_username = _get_server_username(username)
_regenerate_user_env(server_username, connections.get(username, {}))
_install_service_skills(server_username, connections.get(username, {}))
return True, "Connected successfully"
def disconnect_service(username: str, service_id: str) -> tuple[bool, str]:
"""Revoke token and remove credentials."""
connections = _load_connections()
conn = connections.get(username, {}).get(service_id)
if not conn:
return False, "Not connected"
# Try to revoke remotely (best-effort)
service = _get_service_config(service_id)
token_id = conn.get("token_id")
if service and token_id:
try:
secret_env = f"SC_SECRET_{service_id.upper()}"
shared_secret = os.environ.get(secret_env, "")
httpx.post(
service["token_revoke_url"],
headers={"Authorization": f"Bearer {shared_secret}"},
json={"token_id": token_id},
timeout=30,
)
except Exception as e:
logger.warning(f"Remote revoke failed for {service_id}/{token_id}: {e}")
# Always clean up locally
connections.get(username, {}).pop(service_id, None)
if username in connections and not connections[username]:
del connections[username]
_save_connections(connections)
# Regenerate user files
server_username = _get_server_username(username)
_regenerate_user_env(server_username, connections.get(username, {}))
_install_service_skills(server_username, connections.get(username, {}))
return True, "Disconnected"
# --- Internal helpers ---
def _get_service_config(service_id: str) -> dict | None:
"""Find service in registry by ID."""
for s in get_available_services():
if s["id"] == service_id:
return s
return None
def _get_server_username(webapp_username: str) -> str:
"""Map webapp username to server Linux username."""
return WEBAPP_TO_SERVER_USERNAME.get(webapp_username, webapp_username)
def _load_connections() -> dict:
"""Load connections.json."""
if not CONNECTIONS_FILE.exists():
return {}
with open(CONNECTIONS_FILE) as f:
return json.load(f)
def _save_connections(data: dict) -> None:
"""Atomic write to connections.json (same pattern as sync_settings_service)."""
fd, temp_path = tempfile.mkstemp(dir=str(CONNECTORS_DIR), suffix=".json")
try:
os.write(fd, json.dumps(data, indent=2).encode())
os.close(fd)
os.chmod(temp_path, 0o660)
os.replace(temp_path, str(CONNECTIONS_FILE))
except Exception:
os.close(fd) if not os.get_inheritable(fd) else None
os.unlink(temp_path)
raise
def _regenerate_user_env(server_username: str, user_connections: dict) -> None:
"""Write .service_env to user home via sudo install."""
# Build env file content
lines = []
for service_id, conn in user_connections.items():
if not conn.get("connected"):
continue
service = _get_service_config(service_id)
if service:
lines.append(f"{service['env_var_name']}={conn['api_key']}")
# Write to temp file, then sudo install
fd, temp_path = tempfile.mkstemp(suffix=".env")
try:
os.write(fd, "\n".join(lines).encode() if lines else b"")
os.close(fd)
target = f"/home/{server_username}/.service_env"
if lines:
subprocess.run(
["sudo", "/usr/bin/install", "-o", server_username, "-g", server_username,
"-m", "600", temp_path, target],
check=True, capture_output=True,
)
else:
# No connections - remove .service_env if it exists
subprocess.run(
["sudo", "rm", "-f", target],
check=True, capture_output=True,
)
finally:
os.unlink(temp_path)
def _install_service_skills(server_username: str, user_connections: dict) -> None:
"""Install sc_*.md skill files to user's .claude_rules/ via sudo helper."""
connected_services = [
sid for sid, conn in user_connections.items() if conn.get("connected")
]
# Copy relevant skill files to temp dir
temp_dir = tempfile.mkdtemp()
try:
for service_id in connected_services:
service = _get_service_config(service_id)
if service and service.get("skill_file"):
src = SKILLS_DIR / service["skill_file"]
if src.exists():
dest = Path(temp_dir) / service["skill_file"]
dest.write_bytes(src.read_bytes())
subprocess.run(
["sudo", "/usr/local/bin/install-service-env",
server_username, temp_dir],
check=True, capture_output=True,
)
finally:
import shutil
shutil.rmtree(temp_dir, ignore_errors=True)
```
### 3. API Routes
Add to `webapp/app.py` in `register_routes()`:
```python
from webapp import service_connector_service
@app.route("/api/service-connectors")
@login_required
def api_service_connectors():
username = get_username_from_email(session["user"]["email"])
services = service_connector_service.get_available_services()
connections = service_connector_service.get_user_connections(username)
return jsonify({"services": services, "connections": connections})
@app.route("/api/service-connectors/connect", methods=["POST"])
@login_required
def api_service_connect():
username = get_username_from_email(session["user"]["email"])
email = session["user"]["email"]
service_id = request.json.get("service_id")
ok, msg = service_connector_service.connect_service(username, service_id, email)
return jsonify({"success": ok, "message": msg})
@app.route("/api/service-connectors/disconnect", methods=["POST"])
@login_required
def api_service_disconnect():
username = get_username_from_email(session["user"]["email"])
service_id = request.json.get("service_id")
ok, msg = service_connector_service.disconnect_service(username, service_id)
return jsonify({"success": ok, "message": msg})
```
### 4. Sudo Helper
File: `server/bin/install-service-env` (copy of `install-user-rules`, modified for `sc_*` prefix)
```bash
#!/bin/bash
# Install service connector skill files to user's .claude_rules/.
# Called by webapp (www-data) via sudo.
#
# Usage: sudo install-service-env USERNAME SKILLS_SOURCE_DIR
set -euo pipefail
if [[ $EUID -ne 0 ]]; then
echo "Must be run as root (via sudo)" >&2
exit 1
fi
if [[ $# -lt 2 ]]; then
echo "Usage: sudo install-service-env USERNAME SKILLS_SOURCE_DIR" >&2
exit 1
fi
USERNAME="$1"
SOURCE_DIR="$2"
if ! id "$USERNAME" &>/dev/null; then
echo "User '$USERNAME' does not exist" >&2
exit 1
fi
if [[ ! -d "$SOURCE_DIR" ]]; then
echo "Source directory '$SOURCE_DIR' does not exist" >&2
exit 1
fi
USER_HOME=$(eval echo "~${USERNAME}")
RULES_DIR="${USER_HOME}/.claude_rules"
mkdir -p "$RULES_DIR"
chown "${USERNAME}:${USERNAME}" "$RULES_DIR"
chmod 700 "$RULES_DIR"
# Remove old service connector skills only (sc_*.md), preserve km_*.md
rm -f "${RULES_DIR}"/sc_*.md
# Install new skill files
COUNT=0
for src_file in "${SOURCE_DIR}"/*.md; do
if [[ -f "$src_file" ]]; then
/usr/bin/install -o "$USERNAME" -g "$USERNAME" -m 600 "$src_file" "$RULES_DIR/"
COUNT=$((COUNT + 1))
fi
done
echo "Installed ${COUNT} service skills for ${USERNAME} in ${RULES_DIR}"
```
### 5. Dashboard UI Card
Add to `webapp/templates/dashboard.html` - new card in existing grid, same AJAX pattern as Data Settings toggles:
- Grid of service cards (name + description from registry)
- Green "Connected" badge or grey "Not connected"
- Connect / Disconnect button
- Shows `expires_at` if connected
### 6. Sync Extension
Add to `scripts/sync_data.sh` after existing corporate memory sync block:
```bash
# --- Sync service connector credentials ---
if scp -q data-analyst:~/.service_env /tmp/.service_env_$$ 2>/dev/null; then
# Remove old service connector block
if [ -f ./.env ]; then
awk '
/^# --- SERVICE CONNECTOR START ---$/ { skip=1; next }
/^# --- SERVICE CONNECTOR END ---$/ { skip=0; next }
!skip { print }
' ./.env > ./.env.tmp && mv ./.env.tmp ./.env
fi
# Append new block
{
echo "# --- SERVICE CONNECTOR START ---"
cat /tmp/.service_env_$$
echo "# --- SERVICE CONNECTOR END ---"
} >> ./.env
rm -f /tmp/.service_env_$$
echo "Service connector credentials synced"
else
# No active connections - clean old block if present
if [ -f ./.env ] && grep -q "^# --- SERVICE CONNECTOR START ---$" ./.env 2>/dev/null; then
awk '
/^# --- SERVICE CONNECTOR START ---$/ { skip=1; next }
/^# --- SERVICE CONNECTOR END ---$/ { skip=0; next }
!skip { print }
' ./.env > ./.env.tmp && mv ./.env.tmp ./.env
echo "Service connector credentials removed"
fi
fi
```
Skill files (`sc_*.md`) are already synced by the existing `.claude_rules/` sync block.
### 7. Deploy Changes
Add to `server/deploy.sh`:
```bash
# Service connectors directory
mkdir -p /data/service-connectors
chown www-data:data-ops /data/service-connectors
chmod 2770 /data/service-connectors
# Deploy skill files
mkdir -p /data/docs/service_connector_skills
cp -r docs/service_connector_skills/* /data/docs/service_connector_skills/ 2>/dev/null || true
# Deploy service registry
cp docs/setup/service_connectors.json /data/docs/setup/ 2>/dev/null || true
# Install sudo helper
install -m 755 server/bin/install-service-env /usr/local/bin/
```
Add to `server/sudoers-webapp`:
```
www-data ALL=(ALL) NOPASSWD: /usr/local/bin/install-service-env
```
Add to `.github/workflows/deploy.yml` env block:
```yaml
SC_SECRET_PURCHASE_ORDERS: ${{ secrets.SC_SECRET_PURCHASE_ORDERS }}
```
## Token Exchange Protocol
What each internal service needs to implement (simple Bearer + JSON):
```
POST /api/internal/token-exchange
Authorization: Bearer <shared_secret>
Content-Type: application/json
Body: {"user_email": "john@your-domain.com", "ttl_days": 365}
Response: {"status": "ok", "api_key": "...", "token_id": "...", "expires_at": "..."}
POST /api/internal/token-revoke
Authorization: Bearer <shared_secret>
Content-Type: application/json
Body: {"token_id": "tok_xyz789"}
Response: {"status": "ok"}
```
TTL is 365 days. If a key expires, user clicks Reconnect. No auto-rotation needed.
## Files to Create
| File | Purpose | Size estimate |
|------|---------|---------------|
| `webapp/service_connector_service.py` | Connect, disconnect, env generation | ~150 lines |
| `docs/setup/service_connectors.json` | Service registry | ~20 lines |
| `docs/service_connector_skills/sc_purchase_orders.md` | PO API skill for Claude Code | ~50 lines |
| `server/bin/install-service-env` | Sudo helper (copy of install-user-rules) | ~30 lines |
| `tests/test_service_connector_service.py` | Unit tests | ~100 lines |
## Files to Modify
| File | Change | Size estimate |
|------|--------|---------------|
| `webapp/app.py` | Add 3 API routes | ~20 lines |
| `webapp/templates/dashboard.html` | Service connectors card | ~60 lines |
| `server/sudoers-webapp` | Add install-service-env entry | 1 line |
| `server/deploy.sh` | Create dirs, deploy skills, add env vars | ~10 lines |
| `scripts/sync_data.sh` | .service_env merge block | ~20 lines |
| `.github/workflows/deploy.yml` | Add SC_SECRET_* to env | ~3 lines |
**Total new code: ~350 lines** (vs ~800+ in the hybrid plan, ~1200+ in the Codex plan)
## Security Model
| Stage | Protection |
|-------|------------|
| Token exchange | HTTPS + per-service shared secret |
| Server storage (connections.json) | File permissions 660, dir 2770 (www-data:data-ops) |
| User home (.service_env) | Mode 600, sudo install |
| Transit | SCP over SSH |
| Client (.env) | Local filesystem, Claude Code denies Read(.env) |
| Trust model | Employee laptops trusted (corporate-managed, encrypted disks) |
## Key Patterns Reused
- **Sudo install**: `sync_settings_service.py:_regenerate_user_config()`
- **Atomic JSON write**: `sync_settings_service.py:_write_json()` (tempfile + os.replace)
- **Username mapping**: `corporate_memory_service.py:_get_server_username()`
- **Sudo helper**: `server/bin/install-user-rules` (same structure)
- **Dashboard AJAX**: Sync settings toggles in `dashboard.html`
## Verification
1. `pytest tests/test_service_connector_service.py`
2. Deploy, click Connect on PO, verify `.service_env` in `/home/{user}/`
3. Run `sync_data.sh`, verify `.env` contains `PO_API_KEY`
4. Verify `.claude/rules/sc_purchase_orders.md` exists
5. In Claude Code: `python -c "from dotenv import load_dotenv; load_dotenv(); import os; print(os.environ.get('PO_API_KEY', 'NOT SET'))"`
6. Click Disconnect, sync, verify key removed
## What We Might Add Later (only if needed)
| Feature | When to add |
|---------|-------------|
| Encryption of connections.json | If we get a compliance requirement |
| Auto-rotation | If services start issuing short-lived tokens |
| Audit log | If we need forensics capability |
| File locking | If we ever have concurrent connect/disconnect issues |
| SSH runtime injection | If laptop trust model changes |

View file

@ -19,7 +19,7 @@ This is NOT a usage log. It is a **strategic command center** that:
- **Current state**: Demo mockup with fictional data (DEMO badge in header)
- **URL**: https://your-instance.example.com/activity-center (requires login)
- **PR**: https://github.com/your-org/ai-data-analyst/pull/122
- **PR**: https://github.com/keboola/agnes-the-ai-analyst/pull/122
- **Branch**: `feature/activity-center`
- **Dashboard link**: Not added yet (UX placement decided, implementation pending)

View file

@ -145,7 +145,7 @@ except Exception:
raise
```
Use `0o660` for files accessed by services via data-ops group ACL, `0o644` for world-readable files (e.g., profiler output). See [#203](https://github.com/your-org/ai-data-analyst/issues/203) for a production incident caused by missing `fchmod`.
Use `0o660` for files accessed by services via data-ops group ACL, `0o644` for world-readable files (e.g., profiler output). See [#203](https://github.com/keboola/agnes-the-ai-analyst/issues/203) for a production incident caused by missing `fchmod`.
**Per-issue file locking for concurrent writers:**
@ -678,7 +678,7 @@ sudo cat /home/deploy/.ssh/id_ed25519.pub
```
**3. Add Deploy Key to GitHub:**
- Go to: https://github.com/your-org/ai-data-analyst/settings/keys
- Go to: https://github.com/keboola/agnes-the-ai-analyst/settings/keys
- Click "Add deploy key"
- Title: `data-broker-server`
- Key: (paste public key from previous step)
@ -688,7 +688,7 @@ sudo cat /home/deploy/.ssh/id_ed25519.pub
```bash
sudo mkdir -p /opt/data-analyst
sudo chown deploy:data-ops /opt/data-analyst
sudo -u deploy git clone git@github.com:your-org/ai-data-analyst.git /opt/data-analyst/repo
sudo -u deploy git clone git@github.com:keboola/agnes-the-ai-analyst.git /opt/data-analyst/repo
sudo git config --global --add safe.directory /opt/data-analyst/repo
sudo -u deploy git config --global --add safe.directory /opt/data-analyst/repo
sudo /opt/data-analyst/repo/server/setup.sh
@ -1144,7 +1144,7 @@ cat ~/.notifications/logs/runner.log
### Known Issues
**On-demand script execution security hardening (partially resolved):**
The `notify-scripts` helper replaced direct `sudo -H -u ... /usr/bin/env ...` calls with a single auditable entry point. Services no longer need filesystem access to user home directories (750 permissions are preserved). The bot still requires `NoNewPrivileges=false` and `/tmp` in `ReadWritePaths` for sudo execution. A queue-based approach ([#51](https://github.com/your-org/ai-data-analyst/issues/51)) could further improve this by having `notify-runner` pick up run requests from a queue instead of the bot calling sudo directly.
The `notify-scripts` helper replaced direct `sudo -H -u ... /usr/bin/env ...` calls with a single auditable entry point. Services no longer need filesystem access to user home directories (750 permissions are preserved). The bot still requires `NoNewPrivileges=false` and `/tmp` in `ReadWritePaths` for sudo execution. A queue-based approach ([#51](https://github.com/keboola/agnes-the-ai-analyst/issues/51)) could further improve this by having `notify-runner` pick up run requests from a queue instead of the bot calling sudo directly.
## Data Sync Settings (Web Portal)
@ -1510,7 +1510,7 @@ for row in result:
- API token has read-only access to Jira (no write permissions needed)
- Webhook events are logged for audit purposes
- Multiple services write to `/data/src_data/raw/jira/`: webapp (www-data), SLA poll (root), consistency check (root), backfill scripts (admin users)
- Concurrent writes to the same issue JSON are serialized via per-issue advisory file locking (`connectors/jira/file_lock.py`, `fcntl.flock`). Lock files in `issues/.locks/`. See [#203](https://github.com/your-org/ai-data-analyst/issues/203).
- Concurrent writes to the same issue JSON are serialized via per-issue advisory file locking (`connectors/jira/file_lock.py`, `fcntl.flock`). Lock files in `issues/.locks/`. See [#203](https://github.com/keboola/agnes-the-ai-analyst/issues/203).
## Data Profiler
@ -1967,7 +1967,7 @@ The Corporate Memory page at `/corporate-memory` provides:
- **No credentials stored**: Knowledge items are filtered before storage
- **Source attribution**: Items track which users contributed (displayed as avatar initials)
- **Read-only for analysts**: `/data/corporate-memory/` is only writable by data-ops group
- **Atomic writes**: All JSON file updates use `tempfile.mkstemp()` + `os.replace()` to prevent corruption. **Critical:** always call `os.fchmod(fd, 0o660)` (or appropriate mode) immediately after `mkstemp()` — otherwise the default `0600` mode overrides the POSIX ACL mask to `---`, breaking group-based access for other services. See [#203](https://github.com/your-org/ai-data-analyst/issues/203).
- **Atomic writes**: All JSON file updates use `tempfile.mkstemp()` + `os.replace()` to prevent corruption. **Critical:** always call `os.fchmod(fd, 0o660)` (or appropriate mode) immediately after `mkstemp()` — otherwise the default `0600` mode overrides the POSIX ACL mask to `---`, breaking group-based access for other services. See [#203](https://github.com/keboola/agnes-the-ai-analyst/issues/203).
## Session Collector

View file

@ -3,8 +3,8 @@
Step-by-step deployment of AI Data Analyst on a clean Ubuntu 24.04 VM.
Two repos are involved:
- **OSS repo** (public/private): application code (`padak/tmp_oss`)
- **Instance repo** (private): your config, secrets template, data schema (`padak/tmp_oss_cfg`)
- **OSS repo** (public/private): application code (`keboola/agnes-the-ai-analyst`)
- **Instance repo** (private): your config, secrets template, data schema (your private instance config repo)
## Architecture on Server

View file

@ -0,0 +1,512 @@
# Final Integration Fixes — Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Fix all remaining integration gaps after the v2 refactoring so the system is fully operational — Jira webhooks, Docker services, dynamic login, profiler auto-trigger, scheduler auth.
**Architecture:** Five independent fixes targeting: (1) Jira webhook FastAPI adapter, (2) Docker compose service entries, (3) dynamic auth provider detection on login page, (4) profiler integration with sync pipeline, (5) scheduler auto-authentication.
**Tech Stack:** Python 3.11+, FastAPI, DuckDB, Docker Compose, httpx
---
### Task 1: Jira Webhook FastAPI Adapter
**Files:**
- Create: `app/api/jira_webhooks.py`
- Modify: `app/main.py`
- Modify: `connectors/jira/service.py` (add `JIRA_WEBHOOK_SECRET` to `_JiraConfig`)
- Test: `tests/test_jira_webhooks.py`
- [ ] **Step 1: Add JIRA_WEBHOOK_SECRET to config**
```python
# connectors/jira/service.py — add to _JiraConfig class (line ~24)
JIRA_WEBHOOK_SECRET = os.environ.get("JIRA_WEBHOOK_SECRET", "")
DEBUG = os.environ.get("DEBUG", "").lower() in ("1", "true")
```
- [ ] **Step 2: Write the failing test**
```python
# tests/test_jira_webhooks.py
"""Tests for Jira webhook FastAPI adapter."""
import hashlib
import hmac
import json
import os
import pytest
from fastapi.testclient import TestClient
@pytest.fixture
def webhook_client(tmp_path):
os.environ["DATA_DIR"] = str(tmp_path)
os.environ["JWT_SECRET_KEY"] = "test-secret-32chars-minimum!!"
os.environ["JIRA_WEBHOOK_SECRET"] = "test-webhook-secret"
(tmp_path / "state").mkdir()
(tmp_path / "extracts" / "jira" / "raw" / "webhook_events").mkdir(parents=True)
from app.main import create_app
app = create_app()
return TestClient(app)
def _sign(payload: bytes, secret: str = "test-webhook-secret") -> str:
return "sha256=" + hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
class TestJiraWebhook:
def test_health(self, webhook_client):
resp = webhook_client.get("/webhooks/jira/health")
assert resp.status_code == 200
assert resp.json()["status"] == "ok"
def test_missing_signature_401(self, webhook_client):
resp = webhook_client.post("/webhooks/jira", json={"webhookEvent": "test"})
assert resp.status_code == 401
def test_invalid_signature_401(self, webhook_client):
body = json.dumps({"webhookEvent": "test"}).encode()
resp = webhook_client.post(
"/webhooks/jira", content=body,
headers={"X-Hub-Signature-256": "sha256=invalid", "Content-Type": "application/json"},
)
assert resp.status_code == 401
def test_valid_signature_accepted(self, webhook_client):
body = json.dumps({
"webhookEvent": "jira:issue_updated",
"issue": {"key": "TEST-1", "fields": {"summary": "Test"}},
}).encode()
sig = _sign(body)
resp = webhook_client.post(
"/webhooks/jira", content=body,
headers={"X-Hub-Signature-256": sig, "Content-Type": "application/json"},
)
# 200 or 503 (Jira not configured) — but NOT 401
assert resp.status_code in (200, 503)
def test_empty_payload_400(self, webhook_client):
body = b""
sig = _sign(body)
resp = webhook_client.post(
"/webhooks/jira", content=body,
headers={"X-Hub-Signature-256": sig, "Content-Type": "application/json"},
)
assert resp.status_code == 400
```
- [ ] **Step 3: Run test to verify it fails**
Run: `pytest tests/test_jira_webhooks.py -v`
Expected: FAIL — no `/webhooks/jira` route
- [ ] **Step 4: Create the FastAPI adapter**
```python
# app/api/jira_webhooks.py
"""Jira webhook endpoint — FastAPI adapter for connectors/jira/webhook.py logic."""
import hashlib
import hmac
import json
import logging
from datetime import datetime
from pathlib import Path
from fastapi import APIRouter, HTTPException, Request
from connectors.jira.service import Config, get_jira_service
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/webhooks", tags=["webhooks"])
WEBHOOK_LOG_DIR = Config.JIRA_DATA_DIR / "webhook_events"
def _verify_signature(payload: bytes, signature: str | None) -> bool:
secret = Config.JIRA_WEBHOOK_SECRET
if not secret:
logger.warning("JIRA_WEBHOOK_SECRET not configured, skipping verification")
return True
if not signature:
return False
if signature.startswith("sha256="):
signature = signature[7:]
expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
return hmac.compare_digest(signature, expected)
def _log_event(event_data: dict) -> None:
try:
WEBHOOK_LOG_DIR.mkdir(parents=True, exist_ok=True)
ts = datetime.utcnow().strftime("%Y%m%d_%H%M%S_%f")
event_type = event_data.get("webhookEvent", "unknown").replace(":", "_")
path = WEBHOOK_LOG_DIR / f"{ts}_{event_type}.json"
path.write_text(json.dumps(event_data, indent=2, default=str))
except Exception as e:
logger.warning("Failed to log webhook event: %s", e)
@router.post("/jira")
async def receive_jira_webhook(request: Request):
payload = await request.body()
signature = request.headers.get("X-Hub-Signature-256") or request.headers.get("X-Hub-Signature")
if not _verify_signature(payload, signature):
raise HTTPException(status_code=401, detail="Invalid signature")
try:
event_data = json.loads(payload) if payload else None
except (json.JSONDecodeError, ValueError):
raise HTTPException(status_code=400, detail="Invalid JSON")
if not event_data:
raise HTTPException(status_code=400, detail="Empty payload")
_log_event(event_data)
webhook_event = event_data.get("webhookEvent", "unknown")
issue_key = event_data.get("issue", {}).get("key", "unknown")
logger.info("Received webhook: %s for %s", webhook_event, issue_key)
jira_service = get_jira_service()
if not jira_service.is_configured():
return {"status": "error", "message": "Jira not configured"}, 503
success = jira_service.process_webhook_event(event_data)
if success:
return {"status": "ok", "event": webhook_event, "issue": issue_key}
raise HTTPException(status_code=500, detail="Failed to process event")
@router.get("/jira/health")
async def jira_webhook_health():
jira_service = get_jira_service()
return {
"status": "ok",
"configured": jira_service.is_configured(),
"webhook_secret_set": bool(Config.JIRA_WEBHOOK_SECRET),
"jira_domain": Config.JIRA_DOMAIN or "(not set)",
}
```
- [ ] **Step 5: Register router in app/main.py**
```python
# After existing router imports, add:
from app.api.jira_webhooks import router as jira_webhooks_router
# In create_app(), add:
app.include_router(jira_webhooks_router)
```
- [ ] **Step 6: Run tests to verify they pass**
Run: `pytest tests/test_jira_webhooks.py -v`
Expected: 5 passed
- [ ] **Step 7: Commit**
```bash
git add app/api/jira_webhooks.py app/main.py connectors/jira/service.py tests/test_jira_webhooks.py
git commit -m "feat: Jira webhook FastAPI adapter — replaces Flask Blueprint"
```
---
### Task 2: Docker Compose — Add Missing Services + Scheduler Auth
**Files:**
- Modify: `docker-compose.yml`
- Modify: `services/scheduler/__main__.py`
- Delete: `services/catalog_refresh/`, `services/data_refresh/` (empty dirs)
- [ ] **Step 1: Add missing services to docker-compose.yml**
Add after `telegram-bot` service:
```yaml
ws-gateway:
build: .
command: python -m services.ws_gateway
volumes:
- data:/data
env_file: .env
environment:
- DATA_DIR=/data
depends_on:
- app
profiles:
- full
restart: unless-stopped
corporate-memory:
build: .
command: python -m services.corporate_memory
volumes:
- data:/data
env_file: .env
environment:
- DATA_DIR=/data
depends_on:
- app
profiles:
- full
restart: unless-stopped
session-collector:
build: .
command: python -m services.session_collector
volumes:
- data:/data
env_file: .env
environment:
- DATA_DIR=/data
depends_on:
- app
profiles:
- full
restart: unless-stopped
```
- [ ] **Step 2: Fix scheduler auto-auth**
In `services/scheduler/__main__.py`, add token auto-fetch if `SCHEDULER_API_TOKEN` not set:
```python
# After line 25 (SCHEDULER_API_TOKEN = ...), add:
def _get_auth_token() -> str:
"""Get auth token — use SCHEDULER_API_TOKEN or auto-fetch from API."""
token = SCHEDULER_API_TOKEN
if token:
return token
# Auto-fetch: call /auth/token with SEED_ADMIN_EMAIL
admin_email = os.environ.get("SEED_ADMIN_EMAIL", "")
if not admin_email:
logger.warning("No SCHEDULER_API_TOKEN or SEED_ADMIN_EMAIL — scheduler calls will be unauthenticated")
return ""
try:
resp = httpx.post(f"{API_URL}/auth/token", json={"email": admin_email}, timeout=10)
if resp.status_code == 200:
token = resp.json().get("access_token", "")
logger.info("Auto-fetched scheduler token for %s", admin_email)
return token
except Exception as e:
logger.warning("Failed to fetch scheduler token: %s", e)
return ""
```
Update `_call_api` to use `_get_auth_token()`:
```python
def _call_api(endpoint: str, method: str = "POST") -> bool:
url = f"{API_URL}{endpoint}"
headers = {}
token = _get_auth_token()
if token:
headers["Authorization"] = f"Bearer {token}"
# ... rest unchanged
```
- [ ] **Step 3: Add SEED_ADMIN_EMAIL to scheduler in docker-compose.yml**
```yaml
scheduler:
# ... existing config ...
environment:
- DATA_DIR=/data
- API_URL=http://app:8000
- SEED_ADMIN_EMAIL=${SEED_ADMIN_EMAIL:-}
```
- [ ] **Step 4: Delete empty service directories**
```bash
rm -rf services/catalog_refresh/ services/data_refresh/
```
- [ ] **Step 5: Commit**
```bash
git add docker-compose.yml services/scheduler/__main__.py
git add -A services/catalog_refresh/ services/data_refresh/
git commit -m "feat: add Docker services (ws-gateway, corporate-memory, session-collector) + scheduler auto-auth"
```
---
### Task 3: Dynamic Auth Providers on Login Page
**Files:**
- Modify: `app/web/router.py`
- [ ] **Step 1: Update login_page in router.py**
Replace the hard-coded providers list (lines 152-158):
```python
@router.get("/login", response_class=HTMLResponse)
async def login_page(request: Request):
providers = []
# Google OAuth — available if credentials configured
try:
from app.auth.providers.google import is_available as google_available
if google_available():
providers.append({"name": "google", "display_name": "Google", "icon": "google"})
except Exception:
pass
# Password auth — always available
providers.append({"name": "password", "display_name": "Email & Password", "icon": "key"})
# Email magic link — available if configured
try:
from app.auth.providers.email import is_available as email_available
if email_available():
providers.append({"name": "email", "display_name": "Email Link", "icon": "mail"})
except Exception:
pass
ctx = _build_context(request, providers=providers)
return templates.TemplateResponse(request, "login.html", ctx)
```
- [ ] **Step 2: Verify login page still renders**
Run: `pytest tests/test_api_complete.py::TestWebUI::test_login_page -v`
Expected: PASS
- [ ] **Step 3: Commit**
```bash
git add app/web/router.py
git commit -m "feat: dynamic auth provider detection on login page"
```
---
### Task 4: Profiler Auto-Trigger After Sync
**Files:**
- Modify: `app/api/sync.py`
- Modify: `app/api/catalog.py`
- [ ] **Step 1: Add profiler call after orchestrator rebuild in sync.py**
After the orchestrator rebuild line in `_run_sync`, add:
```python
# Auto-profile synced tables
try:
from src.profiler import profile_table, TableInfo, parse_data_description, load_sync_state, load_metrics, get_parquet_path
from src.db import get_system_db
from src.repositories.profiles import ProfileRepository
from pathlib import Path
data_dir = Path(os.environ.get("DATA_DIR", "./data"))
extracts_dir = data_dir / "extracts"
sys_conn = get_system_db()
try:
profile_repo = ProfileRepository(sys_conn)
# Profile each synced table from extract parquets
for source_name, table_names in views.items():
for table_name in table_names[:10]: # Limit to 10 per sync to avoid timeout
pq_path = extracts_dir / source_name / "data" / f"{table_name}.parquet"
if not pq_path.exists():
continue
try:
table_info = TableInfo(name=table_name, table_id=table_name)
profile = profile_table(table_info, pq_path, [], {}, {})
profile_repo.save(table_name, profile)
except Exception as pe:
print(f"[SYNC] Profile {table_name}: {pe}", file=_sys.stderr, flush=True)
finally:
sys_conn.close()
print(f"[SYNC] Profiler complete", file=_sys.stderr, flush=True)
except Exception as e:
print(f"[SYNC] Profiler skipped: {e}", file=_sys.stderr, flush=True)
```
- [ ] **Step 2: Add profile refresh endpoint to catalog.py**
```python
# app/api/catalog.py — add after existing endpoints:
@router.post("/profile/{table_name}/refresh")
async def refresh_profile(
table_name: str,
user: dict = Depends(get_current_user),
conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
"""Re-generate profile for a table on demand."""
import os as _os
from pathlib import Path as _Path
from src.profiler import profile_table, TableInfo
data_dir = _Path(_os.environ.get("DATA_DIR", "./data"))
extracts_dir = data_dir / "extracts"
# Find parquet file
candidates = list(extracts_dir.rglob(f"data/{table_name}.parquet"))
if not candidates:
raise HTTPException(status_code=404, detail=f"No parquet for '{table_name}'")
try:
table_info = TableInfo(name=table_name, table_id=table_name)
profile = profile_table(table_info, candidates[0], [], {}, {})
ProfileRepository(conn).save(table_name, profile)
return {"status": "ok", "table": table_name, "columns": len(profile.get("columns", {}))}
except Exception as e:
raise HTTPException(status_code=500, detail=f"Profile failed: {e}")
```
- [ ] **Step 3: Commit**
```bash
git add app/api/sync.py app/api/catalog.py
git commit -m "feat: auto-profile after sync + on-demand profile refresh endpoint"
```
---
### Task 5: Final Verification
- [ ] **Step 1: Run full test suite**
```bash
pytest tests/ --ignore=tests/test_cli.py -v
```
Expected: all pass
- [ ] **Step 2: Redeploy to production**
```bash
ssh -i ~/.ssh/google_compute_engine deploy@35.195.96.98 \
'cd /home/deploy/app && GIT_SSH_COMMAND="ssh -i ~/.ssh/github_deploy -o StrictHostKeyChecking=no" git pull && sudo docker compose build --quiet && sudo docker compose up -d'
```
- [ ] **Step 3: Verify on production**
```bash
# Health
curl http://35.195.96.98:8000/api/health
# Jira webhook health
curl http://35.195.96.98:8000/webhooks/jira/health
# Login page
curl -s http://35.195.96.98:8000/login | grep -o "password\|google\|email"
# Scheduler logs (should show token auto-fetch)
ssh deploy@35.195.96.98 'cd /home/deploy/app && sudo docker compose logs scheduler --tail 5'
```
- [ ] **Step 4: Commit and push**
```bash
git push origin feature/v2-fastapi-duckdb-docker-cli
git push fork feature/v2-fastapi-duckdb-docker-cli
```

View file

@ -0,0 +1,344 @@
# Security Hardening — Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Fix all critical/high security vulnerabilities found by audit before merging to main. Also update Python version, sync deps, fix Docker prod config.
**Architecture:** 7 independent fix tasks targeting specific vulnerabilities. Each is self-contained.
**Tech Stack:** Python 3.13, FastAPI, DuckDB, Docker
---
### Task 1: SQL Query Hardening
**Files:**
- Modify: `app/api/query.py`
- Modify: `src/db.py` (add read-only analytics getter)
- Test: existing `tests/test_security.py` should still pass
- [ ] **Step 1: Add read-only analytics DB connection**
In `src/db.py`, add after `get_analytics_db()`:
```python
def get_analytics_db_readonly() -> duckdb.DuckDBPyConnection:
"""Read-only connection to analytics DB. Blocks writes and external access."""
db_path = _get_data_dir() / "analytics" / "server.duckdb"
if not db_path.exists():
db_path.parent.mkdir(parents=True, exist_ok=True)
conn = duckdb.connect(str(db_path), read_only=True)
conn.execute("SET enable_external_access = false")
return conn
```
- [ ] **Step 2: Harden query endpoint**
In `app/api/query.py`, replace `get_analytics_db()` with `get_analytics_db_readonly()`. Add to blocklist:
```python
blocked = [
"drop ", "delete ", "insert ", "update ", "alter ", "create ",
"copy ", "attach ", "detach ", "load ", "install ",
"export ", "import ", "pragma ",
"read_csv", "read_json", "read_parquet(", "read_text",
"write_csv", "write_parquet",
"read_blob", "glob(", "read_ndjson",
";",
"'/", '"/", # Block absolute file paths in FROM clause
]
```
- [ ] **Step 3: Run tests**
```bash
pytest tests/test_security.py tests/test_e2e_api.py -v --tb=short
```
- [ ] **Step 4: Commit**
```bash
git commit -m "security: query endpoint — read-only DB + disable external access + block file paths"
```
---
### Task 2: Upload Path Traversal Fix
**Files:**
- Modify: `app/api/upload.py`
- [ ] **Step 1: Sanitize filenames**
Replace lines 30-31 and 47-48 with:
```python
# Sanitize filename — strip directory components to prevent traversal
raw_name = file.filename or f"session_{uuid.uuid4().hex[:8]}.jsonl"
filename = Path(raw_name).name # strips ../../../ etc
if not filename or filename.startswith("."):
filename = f"upload_{uuid.uuid4().hex[:8]}"
target = sessions_dir / filename
```
Same pattern for artifact upload.
- [ ] **Step 2: Commit**
```bash
git commit -m "security: sanitize upload filenames — prevent path traversal"
```
---
### Task 3: Script Sandbox Hardening
**Files:**
- Modify: `app/api/scripts.py`
- [ ] **Step 1: Add AST-based validation**
Add before the blocklist check:
```python
import ast
# AST-based validation — catches obfuscated imports
try:
tree = ast.parse(source)
except SyntaxError as e:
raise HTTPException(status_code=400, detail=f"Script syntax error: {e}")
for node in ast.walk(tree):
if isinstance(node, ast.Import):
for alias in node.names:
if alias.name.split(".")[0] in BLOCKED_MODULES:
raise HTTPException(status_code=400, detail=f"Blocked import: {alias.name}")
elif isinstance(node, ast.ImportFrom):
if node.module and node.module.split(".")[0] in BLOCKED_MODULES:
raise HTTPException(status_code=400, detail=f"Blocked import: {node.module}")
elif isinstance(node, ast.Call):
if isinstance(node.func, ast.Name) and node.func.id in BLOCKED_FUNCTIONS:
raise HTTPException(status_code=400, detail=f"Blocked function: {node.func.id}")
BLOCKED_MODULES = {"os", "sys", "subprocess", "shutil", "ctypes", "importlib", "socket",
"requests", "urllib", "http", "signal", "pathlib", "builtins"}
BLOCKED_FUNCTIONS = {"exec", "eval", "compile", "open", "globals", "locals",
"getattr", "setattr", "delattr", "breakpoint", "__import__"}
```
Keep the string blocklist as a secondary check.
- [ ] **Step 2: Commit**
```bash
git commit -m "security: AST-based script validation — catches obfuscated imports"
```
---
### Task 4: Password Hashing + Auth Fixes
**Files:**
- Modify: `app/auth/providers/password.py` (remove SHA256 fallback)
- Modify: `app/auth/providers/google.py` (add secure cookie flag)
- Modify: `app/auth/dependencies.py` (fix get_optional_user bug)
- Modify: `app/auth/jwt.py` (add secret length validation)
- Modify: `pyproject.toml` (add missing deps)
- [ ] **Step 1: Remove SHA256 fallback in password.py**
Replace the try/except ImportError block with:
```python
from argon2 import PasswordHasher
ph = PasswordHasher()
hashed = ph.hash(request.password)
```
Remove all `except ImportError: import hashlib` blocks. Same in `app/auth/router.py` if present.
- [ ] **Step 2: Add secure flag to Google OAuth cookie**
In `app/auth/providers/google.py`, change set_cookie to:
```python
is_https = os.environ.get("HTTPS", "").lower() in ("1", "true") or request.url.scheme == "https"
response.set_cookie(
key="access_token", value=jwt_token,
httponly=True, max_age=86400 * 30, samesite="lax",
secure=is_https,
)
```
- [ ] **Step 3: Fix get_optional_user argument bug**
In `app/auth/dependencies.py`, change line 68:
```python
# Before (wrong):
return await get_current_user(authorization, conn)
# After (correct):
return await get_current_user(request=None, authorization=authorization, conn=conn)
```
- [ ] **Step 4: Add JWT secret validation**
In `app/auth/jwt.py`, add after SECRET_KEY:
```python
if len(SECRET_KEY) < 32 and not os.environ.get("TESTING"):
import warnings
warnings.warn("JWT_SECRET_KEY is less than 32 characters — insecure for production", stacklevel=2)
```
- [ ] **Step 5: Sync pyproject.toml deps**
Add to pyproject.toml dependencies:
```toml
"authlib>=1.3.0",
"argon2-cffi>=23.1.0",
```
- [ ] **Step 6: Commit**
```bash
git commit -m "security: fix password hashing, OAuth cookie, JWT validation, optional_user bug"
```
---
### Task 5: CORS + Session Middleware
**Files:**
- Modify: `app/main.py`
- [ ] **Step 1: Fix CORS**
Replace wildcard CORS with environment-configured origins:
```python
cors_origins = os.environ.get("CORS_ORIGINS", "http://localhost:3000,http://localhost:8000").split(",")
app.add_middleware(
CORSMiddleware,
allow_origins=[o.strip() for o in cors_origins],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
```
- [ ] **Step 2: Fix SessionMiddleware secret**
```python
session_secret = os.environ.get("SESSION_SECRET", os.environ.get("JWT_SECRET_KEY", ""))
if not session_secret:
import secrets
session_secret = secrets.token_hex(32)
app.add_middleware(SessionMiddleware, secret_key=session_secret)
```
- [ ] **Step 3: Commit**
```bash
git commit -m "security: CORS from env config + session secret validation"
```
---
### Task 6: Docker Production Config
**Files:**
- Modify: `docker-compose.yml` (remove --reload)
- Modify: `Dockerfile` (upgrade Python)
- [ ] **Step 1: Remove --reload from docker-compose.yml**
```yaml
# Before:
command: uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
# After:
command: uvicorn app.main:app --host 0.0.0.0 --port 8000
```
Create `docker-compose.override.yml` for dev:
```yaml
services:
app:
command: uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
volumes:
- .:/app
```
- [ ] **Step 2: Upgrade Dockerfile to Python 3.13**
```dockerfile
FROM python:3.13-slim
```
- [ ] **Step 3: Commit**
```bash
git commit -m "chore: Docker production config — no reload, Python 3.13"
```
---
### Task 7: Stale Docs Cleanup + datetime.utcnow Fix
**Files:**
- Modify: `CLAUDE.md` (fix test count)
- Delete or update: stale docs referencing Flask
- Modify: `app/api/jira_webhooks.py`, `src/profiler.py` (utcnow → now(UTC))
- [ ] **Step 1: Fix CLAUDE.md test count**
Update test count to current number.
- [ ] **Step 2: Delete stale docs**
```bash
rm docs/superpowers/plans/*.md # agent plans, not needed in repo
```
- [ ] **Step 3: Fix datetime.utcnow deprecation**
Replace `datetime.utcnow()` with `datetime.now(timezone.utc)` in:
- `app/api/jira_webhooks.py`
- `src/profiler.py` (lines 1213, 1379)
- [ ] **Step 4: Commit**
```bash
git commit -m "chore: fix stale docs, test count, datetime deprecation"
```
---
## Execution order
```
Task 1 (SQL) ─┐
Task 2 (Upload) ├─ 3 parallel agents
Task 3 (Scripts) ─┘
Task 4 (Auth) ─┐
Task 5 (CORS) ├─ 3 parallel agents
Task 6 (Docker) ─┘
Task 7 (Docs) ── sequential after above
```
## Verification
```bash
# All tests pass
pytest tests/ --ignore=tests/test_cli.py -v
# Security spot checks
python3 -c "from app.api.query import *" # no errors
curl -X POST http://localhost:8000/api/query -d '{"sql":"SELECT * FROM \"/etc/passwd\""}' # should fail
# Docker build
docker build -t test .
```

View file

@ -17,7 +17,7 @@ End-to-end test of the full platform on a clean VM with a new GitHub repository.
**On your local machine:**
```bash
cd /Users/padak/github/oss-ai-data-analyst
cd /path/to/agnes-the-ai-analyst
# Create repo on GitHub (pick org/name)
gh repo create YOUR_ORG/ai-data-analyst --private --source=. --push

View file

@ -70,12 +70,12 @@ locals {
echo "=== Cloning repository ==="
APP_DIR="/opt/data-analyst"
if [ ! -d "$APP_DIR" ]; then
git clone https://github.com/padak/tmp_oss.git "$APP_DIR"
git clone https://github.com/keboola/agnes-the-ai-analyst.git "$APP_DIR"
cd "$APP_DIR"
git checkout feature/v2-fastapi-duckdb-docker-cli
git checkout main
else
cd "$APP_DIR"
git pull origin feature/v2-fastapi-duckdb-docker-cli || true
git pull origin main || true
fi
echo "=== Creating .env ==="

View file

@ -1,5 +1,5 @@
# Copy to terraform.tfvars and fill in values
project_id = "kids-ai-data-analysis"
project_id = "your-gcp-project"
region = "europe-north1"
zone = "europe-north1-a"
machine_type = "e2-small" # 2 vCPU, 2GB RAM, ~$7/mo