agnes-the-ai-analyst/docs/CONFIGURATION.md
Petr 95358448e6 Add modular LLM connector for Corporate Memory
Replace hardwired Anthropic API calls with a pluggable provider system.
Each deployment configures its AI provider in instance.yaml — switching
between Anthropic, LiteLLM, OpenRouter, or any OpenAI-compatible proxy
is a config change, not a code change.

New connectors/llm/ module:
- StructuredExtractor Protocol with extract_json() interface
- AnthropicExtractor: direct Anthropic SDK with retry + backoff
- OpenAICompatExtractor: any OpenAI-compatible proxy with three-layer
  structured output fallback (json_schema -> json_object -> prompt)
- Configurable structured_output policy (strict/json/auto)
- Custom exception hierarchy (auth/rate_limit/timeout/format/refusal)
- Zero secrets in logs: no API keys, prompts, or responses logged

Reviewed by: Google Gemini, Claude Sonnet, OpenAI GPT-5.4.
Security audit passed with all critical findings resolved.
2026-03-23 12:08:33 +01:00

131 lines
2.6 KiB
Markdown

# Configuration Reference
## instance.yaml
The main configuration file for your AI Data Analyst instance. Located at `config/instance.yaml`.
### Instance Branding
```yaml
instance:
name: "AI Data Analyst" # UI title, email subjects
subtitle: "Acme Corp" # Header subtitle
copyright: "Acme Corp" # Footer copyright
```
### Authentication
```yaml
auth:
allowed_domain: "acme.com" # Google OAuth domain restriction
```
Only emails from this domain can log in via Google OAuth. External users can be added via password auth (requires SendGrid).
### Email
```yaml
email:
from_address: "noreply@acme.com"
from_name: "Acme Data Analyst"
```
Used for password auth setup and reset emails. Requires `SENDGRID_API_KEY` in `.env`.
### Server
```yaml
server:
host: "10.0.0.1" # Server IP
hostname: "data.acme.com" # Server DNS name
```
### Desktop App
```yaml
desktop:
jwt_issuer: "acme-analyst"
url_scheme: "acme-analyst"
```
### Data Source
```yaml
data_source:
type: "keboola" # keboola, csv, bigquery
```
### Users
```yaml
users:
john.doe:
name: "John Doe"
initials: "JD"
jane.smith:
name: "Jane Smith"
initials: "JS"
username_mapping:
john.doe: john # Only if webapp and server names differ
```
### Datasets
```yaml
datasets:
jira:
label: "Jira Tickets"
description: "Support tickets"
size_hint: "~50 MB"
requires: null
jira_attachments:
label: "Jira Attachments"
description: "File attachments"
size_hint: "~500 MB+"
requires: "jira"
```
### Catalog
```yaml
catalog:
categories:
sales:
label: "Sales"
icon: "sales"
hr:
label: "HR"
icon: "hr"
order: ["sales", "hr"]
```
## Environment Variables (.env)
### Required
| Variable | Description |
|----------|-------------|
| `WEBAPP_SECRET_KEY` | Flask session secret |
| `GOOGLE_CLIENT_ID` | Google OAuth client ID |
| `GOOGLE_CLIENT_SECRET` | Google OAuth client secret |
### Data Source (Keboola)
| Variable | Description |
|----------|-------------|
| `KEBOOLA_STORAGE_TOKEN` | Keboola Storage API token |
| `KEBOOLA_STACK_URL` | Keboola stack URL |
| `KEBOOLA_PROJECT_ID` | Keboola project ID |
| `DATA_DIR` | Data directory path |
### Optional
| Variable | Description |
|----------|-------------|
| `SENDGRID_API_KEY` | For password auth emails |
| `TELEGRAM_BOT_TOKEN` | For Telegram notifications |
| `ANTHROPIC_API_KEY` | For Corporate Memory AI |
| `LLM_API_KEY` | API key for LLM proxy (LiteLLM, OpenRouter, etc.) |
| `JIRA_WEBHOOK_SECRET` | For Jira integration |
| `CONFIG_DIR` | Override config directory path |