agnes-the-ai-analyst/dev_docs/draft/services-integration-claude.md
Petr 485ac0a742 Security fixes: sanitize dev_docs, harden sudoers and config validation
H1 - Sanitize dev_docs/ for public release:
  - Replace all real employee names with generic placeholders
    (padak->admin1, matejkys->admin2, dasa->admin3, petr->john, etc.)
  - Replace GCP project ID (kids-ai-data-analysis -> your-gcp-project)
  - Replace server hostname (data-broker-for-claude -> your-server)
  - Replace real IP address (34.88.8.46 -> YOUR_SERVER_IP)
  - Replace internal FQDN with placeholder
  - Covers: security.md, server.md, disaster-recovery.md, desktop-app.md,
    session_explore.md, plan-rsync-fix.md, draft/*.md

H3 - webapp-setup.sh: validate sudoers syntax BEFORE copying to /etc/sudoers.d
  - Prevents broken sudo if syntax is invalid
  - Uses install -m 440 for atomic copy with correct permissions

M1 - setup.sh: deploy user created with /usr/sbin/nologin instead of /bin/bash
  - CI/CD service account does not need interactive shell

M2 - config/loader.py: warn on missing env vars, validate webapp_secret_key
  - _resolve_env_refs now logs warnings for unset ${ENV_VAR} references
  - _validate_config checks auth.webapp_secret_key is non-empty
  - Prevents Flask signing sessions with empty secret key

All 118 tests pass.
2026-03-09 08:06:45 +01:00

233 lines
9.4 KiB
Markdown

# Service Connector - Integration of Internal APIs into Data Analyst Platform
## Context
The data analyst platform currently supports only data analysis (parquet files + DuckDB). We want to extend it so analysts can also interact with internal services (Purchase Order system, Invoicing, CRM) through Claude Code. This requires:
1. **API keys** delivered to the analyst's local machine (`.env` file)
2. **Skills** teaching Claude Code how to use each service's API (`.claude/rules/` markdown files)
3. **Seamless UX** - non-technical users click "Connect" in the web portal, everything else is automatic
Key constraints:
- All external services are internal apps (we can modify them)
- They already have Google OAuth and Bearer token/API key authentication
- They already have token generation UI
- We target 2-3 services initially
- Must reuse established patterns (sudo install, atomic JSON, sync_data.sh)
## Architecture Overview
```
User clicks "Connect" on your-instance.example.com
|
v
Webapp calls external service's internal token-exchange endpoint
| (service-to-service, shared secret)
v
API key returned, stored in /data/service-connectors/connections.json
|
v
Webapp writes /home/{user}/.service_env (sudo install, mode 600)
Webapp writes /home/{user}/.claude_rules/sc_{service}.md (skill file)
|
v
Analyst runs sync_data.sh
|
v
.service_env -> merged into ~/keboola-analysis/.env
sc_*.md -> already synced with existing corporate memory rules sync
```
## Implementation Plan
### Phase 1: Service Registry & Infrastructure
**1.1 Create service registry config**
- File: `docs/setup/service_connectors.json`
- Defines available services: id, name, description, URLs, env var names, skill file name
- Deployed to `/data/docs/setup/` by deploy.sh
**1.2 Create sudo helper script**
- File: `server/bin/install-service-env`
- Accepts: USERNAME, ENV_SOURCE_PATH, SKILLS_SOURCE_DIR
- Installs `.service_env` (mode 600) to user home
- Installs `sc_*.md` skill files to `.claude_rules/` (mode 600)
- Only removes `sc_*.md` files (leaves `km_*.md` from corporate memory intact)
- Template: `server/bin/install-user-rules` (63 lines, same structure)
**1.3 Update sudoers**
- File: `server/sudoers-webapp` - add entry for `install-service-env`
**1.4 Update deploy.sh**
- Create `/data/service-connectors/` directory (www-data:data-ops, 2770)
- Deploy service registry and skill files
- Add new env vars to .env block: `SC_SECRET_PURCHASE_ORDERS`, `SC_SECRET_INVOICING`, `SC_SECRET_CRM`
**1.5 Add config entries**
- File: `webapp/config.py` - no new config class entries needed (secrets read directly with `os.environ.get()` in the service module, same pattern as sync_settings_service.py)
### Phase 2: Backend Service
**2.1 Create service connector module**
- File: `webapp/service_connector_service.py`
- Pattern: follows `webapp/sync_settings_service.py` exactly
Key functions:
```python
# Data storage
CONNECTORS_DIR = Path(os.environ.get("CONNECTORS_DIR", "/data/service-connectors"))
CONNECTIONS_FILE = CONNECTORS_DIR / "connections.json"
# Core functions
def get_available_services() -> dict # Load registry
def get_user_connections(username: str) -> dict # User's connection status
def connect_service(username, service_id, user_email) -> (bool, str) # Token exchange + install
def disconnect_service(username, service_id) -> (bool, str) # Revoke + cleanup
def check_service_health(service_id) -> dict # Health check
# Internal
def _exchange_token(service, user_email) -> dict | None # Call external service
def _revoke_token(service, token_id) -> bool # Call revoke endpoint
def _regenerate_user_env(username) -> bool # Write .service_env via sudo
def _install_service_skills(username) -> bool # Write sc_*.md via sudo
def _get_server_username(webapp_username) -> str # Reuse WEBAPP_TO_SERVER_USERNAME
```
Storage format (`connections.json`):
```json
{
"john": {
"purchase_orders": {
"connected": true,
"api_key": "pk_live_abc123...",
"token_id": "tok_xyz789",
"connected_at": "2026-02-16T12:00:00Z",
"expires_at": "2026-05-17T12:00:00Z"
}
}
}
```
Note: API keys stored in connections.json (protected by 660 permissions, www-data:data-ops). This follows the same approach as telegram_users.json storing chat_ids. For internal services, this is acceptable security level.
**2.2 Add API routes to webapp**
- File: `webapp/app.py` - add routes in `register_routes()`
```
GET /api/service-connectors - List services + user connections
POST /api/service-connectors/connect - Connect to a service {service_id}
POST /api/service-connectors/disconnect - Disconnect {service_id}
GET /api/service-connectors/health/<service_id> - Health check
```
**2.3 Token exchange protocol**
What each external service needs to implement:
```
POST /api/internal/token-exchange
Authorization: Bearer <shared_secret>
Body: {"user_email": "john@your-domain.com", "ttl_days": 90}
Response: {"status": "ok", "api_key": "...", "token_id": "...", "expires_at": "..."}
POST /api/internal/token-revoke
Authorization: Bearer <shared_secret>
Body: {"token_id": "tok_xyz789"}
Response: {"status": "ok"}
```
### Phase 3: Dashboard UI
**3.1 Add Service Connectors card to dashboard**
- File: `webapp/templates/dashboard.html`
- New card in the existing 2-column layout (same pattern as Data Settings and Telegram cards)
- Shows grid of service cards with Connect/Disconnect buttons
- Connected = green badge + expiry date
- AJAX calls to `/api/service-connectors/*` endpoints
### Phase 4: Sync & Skills
**4.1 Extend sync_data.sh**
- File: `scripts/sync_data.sh`
- Add block after corporate memory rules sync (line ~418):
1. Download `~/.service_env` from server via SCP
2. If exists: merge into local `.env` using marker comments (`# --- SERVICE CONNECTOR START/END ---`)
3. If not exists: clean old service connector block from `.env`
```bash
# --- Sync service connector credentials ---
if scp -q data-analyst:~/.service_env /tmp/.service_env_$$ 2>/dev/null; then
# Remove old block, append new one with markers
sed -i.bak '/^# --- SERVICE CONNECTOR START ---$/,/^# --- SERVICE CONNECTOR END ---$/d' ./.env 2>/dev/null
{ echo "# --- SERVICE CONNECTOR START ---"; cat /tmp/.service_env_$$; echo "# --- SERVICE CONNECTOR END ---"; } >> ./.env
rm -f /tmp/.service_env_$$
fi
```
Note: `sc_*.md` skills are already synced by the existing corporate memory sync block (line 410: `scp -rq "data-analyst:~/.claude_rules/"* .claude/rules/`).
**4.2 Create skill files**
- Directory: `docs/service_connector_skills/`
- Files: `sc_purchase_orders.md`, `sc_invoicing.md`, `sc_crm.md`
- Content: Authentication setup, available endpoints, common patterns, data models
- Deployed to `/data/docs/service_connector_skills/` by deploy.sh
- Installed to user's `.claude_rules/` when they connect
### Phase 5: Tests
**5.1 Unit tests**
- File: `tests/test_service_connector_service.py`
- Test: connect/disconnect flow, env generation, registry loading, error handling
## Files to Create
| File | Purpose |
|------|---------|
| `webapp/service_connector_service.py` | Core service (connect, disconnect, env generation) |
| `docs/setup/service_connectors.json` | Service registry config |
| `docs/service_connector_skills/sc_purchase_orders.md` | PO API skill |
| `server/bin/install-service-env` | Sudo helper for env + skills install |
| `tests/test_service_connector_service.py` | Unit tests |
## Files to Modify
| File | Change |
|------|--------|
| `webapp/app.py` | Import service_connector_service, add 4 API routes |
| `webapp/templates/dashboard.html` | Add Service Connectors card widget |
| `server/sudoers-webapp` | Add `install-service-env` entry |
| `server/deploy.sh` | Create /data/service-connectors/, deploy skills, add env vars |
| `scripts/sync_data.sh` | Add .service_env download and .env merge block |
| `.github/workflows/deploy.yml` | Add SC_SECRET_* GitHub Secrets to env |
## Key Patterns Reused
- **Sudo install**: `sync_settings_service.py:_regenerate_user_config()` (line 143-183)
- **Atomic JSON**: `sync_settings_service.py:_write_json()` (line 61-74)
- **Username mapping**: `corporate_memory_service.py:_get_server_username()` (line 56-59)
- **Sudo helper script**: `server/bin/install-user-rules` (entire file)
- **Dashboard AJAX pattern**: Sync settings toggles in `dashboard.html`
## Security Model
| Stage | Protection |
|-------|------------|
| Token exchange (webapp <-> service) | HTTPS + shared secret in Authorization header |
| Central storage (connections.json) | /data/service-connectors/ (2770), file 660 |
| User home (.service_env) | Mode 600 (owner-only), sudo install |
| Transit (sync) | SCP over SSH |
| Client (.env) | Local filesystem; Claude Code settings deny Read(.env) |
| Claude Code usage | Python `load_dotenv()` via Bash (allowed) |
## Verification
1. **Unit tests**: `pytest tests/test_service_connector_service.py`
2. **Manual flow**:
- Deploy to server
- Log into your-instance.example.com
- Click "Connect" on PO system in dashboard
- Verify `.service_env` appears in `/home/{user}/`
- Run `sync_data.sh` on client
- Verify `.env` contains PO_API_KEY
- Verify `.claude/rules/sc_purchase_orders.md` exists
- In Claude Code: `python -c "from dotenv import load_dotenv; load_dotenv(); import os; print(os.environ.get('PO_API_KEY', 'NOT SET'))"`
3. **Disconnect flow**: Click Disconnect, verify key removed from .env after sync