Replace hardwired Anthropic API calls with a pluggable provider system. Each deployment configures its AI provider in instance.yaml — switching between Anthropic, LiteLLM, OpenRouter, or any OpenAI-compatible proxy is a config change, not a code change. New connectors/llm/ module: - StructuredExtractor Protocol with extract_json() interface - AnthropicExtractor: direct Anthropic SDK with retry + backoff - OpenAICompatExtractor: any OpenAI-compatible proxy with three-layer structured output fallback (json_schema -> json_object -> prompt) - Configurable structured_output policy (strict/json/auto) - Custom exception hierarchy (auth/rate_limit/timeout/format/refusal) - Zero secrets in logs: no API keys, prompts, or responses logged Reviewed by: Google Gemini, Claude Sonnet, OpenAI GPT-5.4. Security audit passed with all critical findings resolved.
326 lines
13 KiB
Markdown
326 lines
13 KiB
Markdown
# Plan: Modular LLM Routing for Corporate Memory
|
|
|
|
> Reviewed by: Claude Opus (author), Google Gemini, Claude Sonnet, OpenAI GPT-5.4
|
|
> Feedback incorporated from all three external reviewers.
|
|
|
|
## Context
|
|
|
|
Corporate Memory is a feature that reads team members' local notes (CLAUDE.local.md),
|
|
sends them to a small AI model (Claude Haiku) for knowledge extraction, and builds
|
|
a shared knowledge base. Currently it's hardwired to call Anthropic's API directly.
|
|
|
|
Different clients deploying this platform use different AI providers:
|
|
|
|
| Client | AI Provider | Why |
|
|
|--------|------------|-----|
|
|
| Groupon | LiteLLM proxy | Corporate AI gateway, cost control, audit |
|
|
| Keboola | Direct Anthropic | Simple setup, single provider |
|
|
| Future client A | OpenRouter | Multi-model access, cost optimization |
|
|
| Future client B | Google Gemini | Existing Google Cloud relationship |
|
|
|
|
**Problem**: The code only works with Anthropic. Adding a second client means duplicating
|
|
or rewriting the AI calling logic.
|
|
|
|
**Solution**: Extract the AI calling logic into a modular connector that each instance
|
|
configures for its own provider. The connector lives in the open-source repo (code),
|
|
the configuration lives in the private instance repo (config).
|
|
|
|
## Design Principles
|
|
|
|
### 1. Structured Extraction, Not General AI
|
|
|
|
This connector has one job: send a prompt, get back structured JSON.
|
|
It is NOT a general-purpose AI chat interface. The naming and interface reflect this:
|
|
`StructuredExtractor` (not "LLMProvider"), `extract_json()` (not "chat" or "generate").
|
|
|
|
This keeps the scope tight and the interface honest. If we need general AI capabilities
|
|
later, we build a separate abstraction.
|
|
|
|
### 2. Instance Config Drives Provider Selection
|
|
|
|
Each deployment configures its AI provider in `instance.yaml` (the same file that
|
|
already configures authentication, branding, data sources, and catalog integration).
|
|
Secrets use `${ENV_VAR}` references, resolved at load time by the existing config loader.
|
|
|
|
The open-source code never knows which provider it's talking to. It receives a configured
|
|
extractor and calls `extract_json()`.
|
|
|
|
### 3. Backward Compatibility
|
|
|
|
Existing deployments using `ai.anthropic_api_key` in their config continue to work
|
|
without changes. The factory recognizes the legacy config shape and creates the
|
|
appropriate provider automatically. No migration step required for existing instances.
|
|
|
|
### 4. Structured Output Strategy (Configurable)
|
|
|
|
Not all AI providers support JSON schema enforcement equally. The connector supports
|
|
three levels, but the operator controls which are allowed:
|
|
|
|
1. **JSON Schema mode** — provider enforces the exact schema (best quality)
|
|
2. **JSON Object mode** — provider guarantees valid JSON but no schema (good quality)
|
|
3. **Prompt-based JSON** — instructions in the prompt ask for JSON (acceptable quality)
|
|
|
|
By default, all three layers are available as progressive fallback. But the operator
|
|
can restrict this in config:
|
|
|
|
```yaml
|
|
ai:
|
|
provider: "openai_compat"
|
|
# --- Structured output quality control ---
|
|
# AI models can return JSON in three ways, each with different reliability:
|
|
#
|
|
# Layer 1 - "json_schema" (best):
|
|
# The provider enforces an exact schema. Every field, type, and structure
|
|
# is guaranteed. Available on: Anthropic, OpenAI, Claude via LiteLLM.
|
|
#
|
|
# Layer 2 - "json_object" (good):
|
|
# The provider guarantees valid JSON, but does not enforce a specific schema.
|
|
# Fields may be missing or have wrong types. Available on most providers.
|
|
#
|
|
# Layer 3 - "prompt" (acceptable):
|
|
# The AI is asked to respond in JSON via instructions in the prompt.
|
|
# No technical enforcement — the model may still return invalid JSON.
|
|
# Works everywhere, but least reliable.
|
|
#
|
|
# "strict" = only Layer 1. Fail if provider doesn't support json_schema.
|
|
# Use when data quality is non-negotiable.
|
|
# "json" = Layer 1, fall back to Layer 2. No prompt-based fallback.
|
|
# Good balance of quality and compatibility.
|
|
# "auto" = All three layers as progressive fallback. Maximum compatibility.
|
|
# Use when you'd rather get imperfect data than no data.
|
|
structured_output: "strict"
|
|
```
|
|
|
|
When set to `"strict"`, the connector will NOT fall back to weaker strategies.
|
|
If the provider doesn't support JSON schema, the extraction fails with a clear error.
|
|
This is the right choice when data quality is non-negotiable.
|
|
|
|
### 5. Fail-Safe by Default
|
|
|
|
- Missing config → Corporate Memory logs a warning and skips AI extraction (doesn't crash)
|
|
- AI call fails → item marked as "unsafe" (conservative, nothing leaks)
|
|
- Truncated response → detected and retried once
|
|
- Auth error → fails fast with clear message (don't retry forever)
|
|
- Rate limit → waits and retries with backoff
|
|
|
|
### 6. Zero Secrets in Logs
|
|
|
|
The connector NEVER logs:
|
|
- API keys, tokens, or any secret values
|
|
- Prompt content (may contain user notes with sensitive data)
|
|
- Response content (may contain extracted knowledge before sensitivity check)
|
|
- Full URLs with query parameters (may contain tokens)
|
|
|
|
What IS logged:
|
|
- Provider type and model name
|
|
- Sanitized base URL (scheme + host only, no path/query)
|
|
- Structured output strategy selected
|
|
- Call duration (latency)
|
|
- Error classification (auth/rate_limit/timeout/format — never the error body)
|
|
- Whether fallback was triggered
|
|
|
|
This is a hard rule, not a guideline. API keys and user content must never
|
|
appear in logs, stdout, stderr, or error messages propagated to callers.
|
|
|
|
## Architecture
|
|
|
|
### Where the code lives
|
|
|
|
```
|
|
OSS Repo (open-source, shared):
|
|
connectors/llm/ ← NEW: AI provider abstraction
|
|
base.py Interface definition
|
|
anthropic_provider.py Direct Anthropic API
|
|
openai_compat.py Any OpenAI-compatible proxy
|
|
factory.py Creates the right provider from config
|
|
|
|
services/corporate_memory/
|
|
collector.py ← MODIFIED: uses connector instead of direct API calls
|
|
|
|
Instance Repo (private, per-client):
|
|
config/instance.yaml ← MODIFIED: new ai: section
|
|
.env ← MODIFIED: new LLM_API_KEY secret
|
|
```
|
|
|
|
### How it flows
|
|
|
|
```
|
|
instance.yaml (ai: section)
|
|
↓
|
|
Config loader resolves ${ENV_VAR} secrets
|
|
↓
|
|
Factory reads provider type, creates extractor
|
|
↓
|
|
Corporate Memory calls extractor.extract_json(prompt, schema)
|
|
↓
|
|
Extractor routes to the right API:
|
|
├─ Anthropic SDK → api.anthropic.com/v1/messages
|
|
└─ OpenAI SDK → litellm.groupondev.com/v1/chat/completions
|
|
openrouter.ai/v1/chat/completions
|
|
any OpenAI-compatible endpoint
|
|
```
|
|
|
|
### Config examples
|
|
|
|
**Groupon (LiteLLM proxy):**
|
|
```yaml
|
|
ai:
|
|
provider: "openai_compat"
|
|
base_url: "https://litellm.groupondev.com"
|
|
api_key: "${LLM_API_KEY}"
|
|
model: "claude-haiku-4-5-20251001"
|
|
```
|
|
|
|
**Keboola (direct Anthropic):**
|
|
```yaml
|
|
ai:
|
|
provider: "anthropic"
|
|
api_key: "${ANTHROPIC_API_KEY}"
|
|
model: "claude-haiku-4-5-20251001"
|
|
```
|
|
|
|
**Legacy (existing deployments, no changes needed):**
|
|
```yaml
|
|
ai:
|
|
anthropic_api_key: "${ANTHROPIC_API_KEY}"
|
|
```
|
|
|
|
## What We're Improving
|
|
|
|
### A. From Hardwired to Pluggable
|
|
|
|
**Before**: One provider, baked into the code. Changing provider = changing code.
|
|
**After**: Provider is a config choice. Switching from Anthropic to LiteLLM to OpenRouter
|
|
is a YAML change + secret rotation. No code touches needed.
|
|
|
|
### B. From Fragile to Resilient
|
|
|
|
**Before**: API error = entire collection run fails. No retries.
|
|
**After**:
|
|
- Transient errors (rate limits, timeouts, network) → automatic retry with backoff
|
|
- Permanent errors (bad API key, unsupported model) → fail fast, clear error message
|
|
- Truncated responses (model hit token limit) → detected, retried with note
|
|
- Model refuses request → logged, item skipped safely
|
|
|
|
### C. From All-or-Nothing to Progressive Degradation
|
|
|
|
**Before**: Structured output works or it doesn't. Binary.
|
|
**After**: Three fallback layers (schema → json_object → prompt-based). The connector
|
|
adapts to what each provider actually supports instead of assuming capabilities.
|
|
|
|
### D. From Silent to Observable
|
|
|
|
**Before**: No visibility into what the AI extraction does.
|
|
**After**:
|
|
- Which provider/model is being used (logged at startup)
|
|
- Which structured output strategy was selected (logged once)
|
|
- How long each call takes (logged per call)
|
|
- Whether fallback was triggered (logged as warning)
|
|
- Clear error classification in logs
|
|
|
|
### E. From Coupled to Separated
|
|
|
|
**Before**: AI provider choice is an engineering decision embedded in code.
|
|
**After**: AI provider choice is an operations decision in instance config.
|
|
Each client controls their own provider, model, and API gateway independently.
|
|
|
|
## Error Handling Strategy
|
|
|
|
| Error Type | What Happens | Why |
|
|
|-----------|-------------|-----|
|
|
| Missing `ai:` config | Corporate Memory skips AI extraction, logs warning | Don't crash the whole service |
|
|
| Invalid API key | Fail fast, log error, skip collection run | Don't waste retries on permanent failure |
|
|
| Rate limit (429) | Wait + retry with exponential backoff (3 attempts) | Transient, will resolve |
|
|
| Network timeout | Retry once, then fail | Might be transient |
|
|
| Truncated response | Detect via finish_reason, retry once | Model hit token limit |
|
|
| Model refusal | Log, mark item as unsafe | Conservative: don't share uncertain content |
|
|
| Invalid JSON response | Log, mark item as unsafe | Better to skip than crash |
|
|
| Structured output unsupported | Fall back to json_object, then prompt-based | Adapt to provider capabilities |
|
|
|
|
## Scope Boundaries
|
|
|
|
**In scope (v1):**
|
|
- Anthropic direct provider (existing behavior, tested)
|
|
- OpenAI-compatible proxy provider (LiteLLM, verified against Groupon proxy)
|
|
- Backward compatibility with existing `ai.anthropic_api_key` config
|
|
- Three-layer structured output fallback
|
|
- Custom error hierarchy (auth / rate limit / timeout / format)
|
|
- Retry with backoff for transient errors
|
|
- Corporate Memory collector integration
|
|
|
|
**Explicitly NOT in scope (future):**
|
|
- Azure OpenAI, OpenRouter, Gemini — listed as "untested" until verified per-provider
|
|
- General-purpose AI chat/generation interface
|
|
- Streaming responses
|
|
- Multi-turn conversations
|
|
- Token usage tracking / cost monitoring (v2 consideration)
|
|
- Provider capability auto-detection at startup
|
|
|
|
## Testing Strategy
|
|
|
|
### Unit Tests (connector internals)
|
|
- Factory creates correct provider from each config shape
|
|
- Factory handles legacy `ai.anthropic_api_key` config
|
|
- Missing/invalid config raises clear errors
|
|
- Each provider formats API calls correctly (mocked SDK)
|
|
- Structured output fallback chain works
|
|
- Error classification (auth vs rate limit vs timeout)
|
|
|
|
### Integration Tests (Corporate Memory behavior)
|
|
- Full collection run with mocked provider
|
|
- Skip when no files changed (hash check)
|
|
- Preserve existing item IDs across runs
|
|
- Sensitivity check runs only on new items
|
|
- Fail-closed on sensitivity check errors
|
|
- user_hashes.json written only after successful processing
|
|
- Graceful degradation when `ai:` config is missing
|
|
|
|
### Manual Verification (before production)
|
|
- Dry-run against actual Groupon LiteLLM proxy
|
|
- Verify structured output works through proxy
|
|
- Verify sensitivity check works through proxy
|
|
- Full collection produces valid knowledge.json
|
|
|
|
## Deployment
|
|
|
|
The existing `deploy.sh` handles dependency installation from `requirements.txt`,
|
|
so no manual pip install is needed. The deployment sequence:
|
|
|
|
1. Add `openai` to `requirements.txt` (OSS repo)
|
|
2. Update `collector.py` to use new connector (OSS repo)
|
|
3. Add `ai:` section to `instance.yaml` (instance repo)
|
|
4. Add `LLM_API_KEY` secret to GHA secrets and deploy.yml (instance repo)
|
|
5. Add `CONFIG_DIR` to the wrapper script `collect-knowledge` (OSS repo)
|
|
6. Push both repos → CI/CD deploys automatically
|
|
7. Verify via `--dry-run` on server
|
|
|
|
**Rollback**: Revert both repos to previous commit. The legacy config path
|
|
means existing `ai.anthropic_api_key` still works if we need to roll back.
|
|
|
|
## Files to Modify
|
|
|
|
| File | Repo | Change |
|
|
|------|------|--------|
|
|
| `connectors/llm/` (5 new files) | OSS | New connector module |
|
|
| `services/corporate_memory/collector.py` | OSS | Use connector instead of direct API |
|
|
| `server/bin/collect-knowledge` | OSS | Add CONFIG_DIR |
|
|
| `requirements.txt` | OSS | Add `openai>=1.0.0` |
|
|
| `server/deploy.sh` | OSS | Add LLM_API_KEY to env propagation |
|
|
| `config/.env.template` | OSS | Document LLM_API_KEY |
|
|
| `config/instance.yaml.example` | OSS | Expanded ai: section with examples |
|
|
| `docs/CONFIGURATION.md` | OSS | Add AI provider docs |
|
|
| `tests/test_llm_connector.py` | OSS | New: connector tests |
|
|
| `tests/test_corporate_memory.py` | OSS | New/expanded: behavior tests |
|
|
| `config/instance.yaml` | Instance | Add ai: section for Groupon |
|
|
| `.github/workflows/deploy.yml` | Instance | Add LLM_API_KEY to .env |
|
|
| `env.example` | Instance | Document LLM_API_KEY |
|
|
|
|
## Risk Assessment
|
|
|
|
| Risk | Level | Mitigation |
|
|
|------|-------|-----------|
|
|
| LiteLLM structured output translation | Medium | Three-layer fallback + manual verification before deploy |
|
|
| Config migration breaks existing instances | Low | Backward compat shim for legacy config shape |
|
|
| New `openai` dependency conflicts | Low | Standard package, deploy.sh handles install |
|
|
| Corporate Memory regression | Medium | Expanded behavior tests covering all current logic |
|
|
| Systemd/wrapper script CONFIG_DIR | Low | Follows existing pattern from other services |
|