Replace hardwired Anthropic API calls with a pluggable provider system. Each deployment configures its AI provider in instance.yaml, so switching between Anthropic, LiteLLM, OpenRouter, or any OpenAI-compatible proxy is a config change, not a code change.
New connectors/llm/ module:
- StructuredExtractor Protocol with extract_json() interface
- AnthropicExtractor: direct Anthropic SDK with retry + backoff
- OpenAICompatExtractor: any OpenAI-compatible proxy with three-layer structured output fallback (json_schema -> json_object -> prompt)
- Configurable structured_output policy (strict/json/auto)
- Custom exception hierarchy (auth/rate_limit/timeout/format/refusal)
- Zero secrets in logs: no API keys, prompts, or responses logged
Reviewed by: Google Gemini, Claude Sonnet, OpenAI GPT-5.4. Security audit passed with all critical findings resolved.
Plan: Modular LLM Routing for Corporate Memory
Reviewed by: Claude Opus (author), Google Gemini, Claude Sonnet, OpenAI GPT-5.4
Feedback incorporated from all three external reviewers.
Context
Corporate Memory is a feature that reads team members' local notes (CLAUDE.local.md), sends them to a small AI model (Claude Haiku) for knowledge extraction, and builds a shared knowledge base. Currently it's hardwired to call Anthropic's API directly.
Different clients deploying this platform use different AI providers:
| Client | AI Provider | Why |
|---|---|---|
| Groupon | LiteLLM proxy | Corporate AI gateway, cost control, audit |
| Keboola | Direct Anthropic | Simple setup, single provider |
| Future client A | OpenRouter | Multi-model access, cost optimization |
| Future client B | Google Gemini | Existing Google Cloud relationship |
Problem: The code only works with Anthropic. Adding a second client means duplicating or rewriting the AI calling logic.
Solution: Extract the AI calling logic into a modular connector that each instance configures for its own provider. The connector lives in the open-source repo (code), the configuration lives in the private instance repo (config).
Design Principles
1. Structured Extraction, Not General AI
This connector has one job: send a prompt, get back structured JSON.
It is NOT a general-purpose AI chat interface. The naming and interface reflect this:
StructuredExtractor (not "LLMProvider"), extract_json() (not "chat" or "generate").
This keeps the scope tight and the interface honest. If we need general AI capabilities later, we build a separate abstraction.
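A minimal sketch of what this interface could look like as a Python Protocol. The names StructuredExtractor and extract_json(prompt, schema) come from this plan; the type hints and the EchoExtractor stand-in are assumptions added for illustration only.

```python
from typing import Any, Dict, Protocol, runtime_checkable


@runtime_checkable
class StructuredExtractor(Protocol):
    """One job: send a prompt, get back structured JSON."""

    def extract_json(self, prompt: str, schema: Dict[str, Any]) -> Dict[str, Any]:
        """Return the parsed JSON object produced for `prompt`."""
        ...


class EchoExtractor:
    """Hypothetical stand-in; it satisfies the Protocol without inheriting from it."""

    def extract_json(self, prompt: str, schema: Dict[str, Any]) -> Dict[str, Any]:
        # Return one empty value per property named in the schema.
        return {key: None for key in schema.get("properties", {})}


extractor: StructuredExtractor = EchoExtractor()
result = extractor.extract_json("summarize", {"properties": {"title": {}, "tags": {}}})
```

Because the Protocol is structural, a test double like EchoExtractor can replace a real provider without importing any SDK.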
2. Instance Config Drives Provider Selection
Each deployment configures its AI provider in instance.yaml (the same file that
already configures authentication, branding, data sources, and catalog integration).
Secrets use ${ENV_VAR} references, resolved at load time by the existing config loader.
The open-source code never knows which provider it's talking to. It receives a configured
extractor and calls extract_json().
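The existing config loader is not shown in this plan. The sketch below only illustrates what resolving ${ENV_VAR} references at load time can look like; resolve_env_refs and its env parameter are hypothetical names, not the loader's real API.

```python
import os
import re
from typing import Any, Mapping, Optional

_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env_refs(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively replace ${NAME} inside strings with values from `env`."""
    env = os.environ if env is None else env

    if isinstance(value, str):
        def _substitute(match: "re.Match[str]") -> str:
            name = match.group(1)
            if name not in env:
                # Report the variable name only, never a secret value.
                raise KeyError(f"environment variable {name} is not set")
            return env[name]

        return _ENV_REF.sub(_substitute, value)
    if isinstance(value, dict):
        return {k: resolve_env_refs(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_env_refs(v, env) for v in value]
    return value  # numbers, booleans, None pass through unchanged
```

Failing loudly on an unset variable keeps a literal "${LLM_API_KEY}" string from ever being sent to a provider as a key.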
3. Backward Compatibility
Existing deployments using ai.anthropic_api_key in their config continue to work
without changes. The factory recognizes the legacy config shape and creates the
appropriate provider automatically. No migration step required for existing instances.
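One way the legacy shape could be recognized is to normalize it into the new shape before the factory runs. This is a sketch under assumptions: normalize_ai_config is a hypothetical helper, and the default model for legacy configs is an assumption (the plan does not state which model legacy instances use).

```python
from typing import Any, Dict

# Assumption: legacy instances get the same Haiku model used elsewhere in this plan.
DEFAULT_LEGACY_MODEL = "claude-haiku-4-5-20251001"


def normalize_ai_config(ai: Dict[str, Any]) -> Dict[str, Any]:
    """Return the new-style config for either the new or the legacy shape."""
    if "provider" in ai:
        return dict(ai)  # already new-style; copy so the caller's dict is untouched
    if "anthropic_api_key" in ai:
        return {
            "provider": "anthropic",
            "api_key": ai["anthropic_api_key"],
            "model": ai.get("model", DEFAULT_LEGACY_MODEL),
        }
    raise ValueError("ai config needs 'provider' or legacy 'anthropic_api_key'")
```

Normalizing first means the rest of the factory only ever sees one config shape.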
4. Structured Output Strategy (Configurable)
Not all AI providers support JSON schema enforcement equally. The connector supports three levels, but the operator controls which are allowed:
- JSON Schema mode — provider enforces the exact schema (best quality)
- JSON Object mode — provider guarantees valid JSON but no schema (good quality)
- Prompt-based JSON — instructions in the prompt ask for JSON (acceptable quality)
By default, all three layers are available as progressive fallback. But the operator can restrict this in config:
ai:
  provider: "openai_compat"
  # --- Structured output quality control ---
  # AI models can return JSON in three ways, each with different reliability:
  #
  # Layer 1 - "json_schema" (best):
  #   The provider enforces an exact schema. Every field, type, and structure
  #   is guaranteed. Available on: Anthropic, OpenAI, Claude via LiteLLM.
  #
  # Layer 2 - "json_object" (good):
  #   The provider guarantees valid JSON, but does not enforce a specific schema.
  #   Fields may be missing or have wrong types. Available on most providers.
  #
  # Layer 3 - "prompt" (acceptable):
  #   The AI is asked to respond in JSON via instructions in the prompt.
  #   No technical enforcement; the model may still return invalid JSON.
  #   Works everywhere, but least reliable.
  #
  # "strict" = only Layer 1. Fail if provider doesn't support json_schema.
  #            Use when data quality is non-negotiable.
  # "json"   = Layer 1, fall back to Layer 2. No prompt-based fallback.
  #            Good balance of quality and compatibility.
  # "auto"   = All three layers as progressive fallback. Maximum compatibility.
  #            Use when you'd rather get imperfect data than no data.
  structured_output: "strict"
When set to "strict", the connector will NOT fall back to weaker strategies.
If the provider doesn't support JSON schema, the extraction fails with a clear error.
This is the right choice when data quality is non-negotiable.
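The policy-to-layer mapping described above can be sketched as a lookup table plus a loop. The policy and layer names come from this plan; StrategyUnsupported, extract_with_fallback, and the attempt callable are hypothetical names used only to illustrate the control flow.

```python
from typing import Any, Callable, Dict, List

# Which layers each structured_output policy is allowed to try, in order.
STRATEGIES_BY_POLICY: Dict[str, List[str]] = {
    "strict": ["json_schema"],
    "json": ["json_schema", "json_object"],
    "auto": ["json_schema", "json_object", "prompt"],
}


class StrategyUnsupported(Exception):
    """Hypothetical signal that the provider rejected a strategy."""


def extract_with_fallback(
    policy: str, attempt: Callable[[str], Dict[str, Any]]
) -> Dict[str, Any]:
    """Try each allowed strategy in order; never go beyond what the policy allows."""
    if policy not in STRATEGIES_BY_POLICY:
        raise ValueError(f"unknown structured_output policy: {policy!r}")
    strategies = STRATEGIES_BY_POLICY[policy]
    for strategy in strategies:
        try:
            return attempt(strategy)
        except StrategyUnsupported:
            continue  # move on to the next weaker layer, if one is allowed
    raise StrategyUnsupported(
        f"policy {policy!r} exhausted (tried: {', '.join(strategies)})"
    )
```

With a provider that only supports json_object, "json" and "auto" succeed on the second layer while "strict" fails with an error naming the layers it tried.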
5. Fail-Safe by Default
- Missing config → Corporate Memory logs a warning and skips AI extraction (doesn't crash)
- AI call fails → item marked as "unsafe" (conservative, nothing leaks)
- Truncated response → detected and retried once
- Auth error → fails fast with clear message (don't retry forever)
- Rate limit → waits and retries with backoff
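The "AI call fails → unsafe" rule is a fail-closed check. A minimal sketch, assuming a hypothetical is_safe callable that wraps the AI sensitivity check; the "unsafe" label is from this plan, while "safe" and the function name are assumptions.

```python
from typing import Any, Callable, Dict


def classify_item(
    item: Dict[str, Any], is_safe: Callable[[Dict[str, Any]], bool]
) -> str:
    """Return "safe" only on an explicit True verdict; everything else is "unsafe"."""
    try:
        verdict = is_safe(item)
    except Exception:
        # Any failure (network, auth, malformed reply) counts as unsafe.
        return "unsafe"
    # A non-boolean or ambiguous answer is also treated as unsafe.
    return "safe" if verdict is True else "unsafe"
```

The conservative default means a provider outage can delay sharing an item but cannot cause it to leak.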
6. Zero Secrets in Logs
The connector NEVER logs:
- API keys, tokens, or any secret values
- Prompt content (may contain user notes with sensitive data)
- Response content (may contain extracted knowledge before sensitivity check)
- Full URLs with query parameters (may contain tokens)
What IS logged:
- Provider type and model name
- Sanitized base URL (scheme + host only, no path/query)
- Structured output strategy selected
- Call duration (latency)
- Error classification (auth/rate_limit/timeout/format — never the error body)
- Whether fallback was triggered
This is a hard rule, not a guideline. API keys and user content must never appear in logs, stdout, stderr, or error messages propagated to callers.
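The "scheme + host only" rule for logged URLs can be sketched with the standard library. sanitize_base_url is a hypothetical helper name; dropping the port as well is an assumption of this sketch.

```python
from urllib.parse import urlsplit


def sanitize_base_url(url: str) -> str:
    """Keep scheme and host; drop userinfo, port, path, query, and fragment."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname excludes any user:password@ prefix
    return f"{parts.scheme}://{host}"
```

For example, a base URL that carries a token in its query string is reduced to something safe to log:

```python
sanitize_base_url("https://user:tok@litellm.groupondev.com/v1?key=abc")
# 'https://litellm.groupondev.com'
```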
Architecture
Where the code lives
OSS Repo (open-source, shared):
  connectors/llm/              ← NEW: AI provider abstraction
    base.py                    Interface definition
    anthropic_provider.py      Direct Anthropic API
    openai_compat.py           Any OpenAI-compatible proxy
    factory.py                 Creates the right provider from config
  services/corporate_memory/
    collector.py               ← MODIFIED: uses connector instead of direct API calls
Instance Repo (private, per-client):
  config/instance.yaml         ← MODIFIED: new ai: section
  .env                         ← MODIFIED: new LLM_API_KEY secret
How it flows
instance.yaml (ai: section)
↓
Config loader resolves ${ENV_VAR} secrets
↓
Factory reads provider type, creates extractor
↓
Corporate Memory calls extractor.extract_json(prompt, schema)
↓
Extractor routes to the right API:
  ├─ Anthropic SDK → api.anthropic.com/v1/messages
  └─ OpenAI SDK    → litellm.groupondev.com/v1/chat/completions
                     openrouter.ai/v1/chat/completions
                     any OpenAI-compatible endpoint
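The "Factory reads provider type, creates extractor" step might look like the sketch below. The class names and provider values come from this plan; the dataclass stubs, the create_extractor name, and returning None for a missing config are assumptions, and the real classes would wrap the SDK clients.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Union


@dataclass
class AnthropicExtractor:
    """Stub standing in for the real class; it only holds settings."""
    api_key: str = field(repr=False)  # keep the key out of repr() and logs
    model: str


@dataclass
class OpenAICompatExtractor:
    """Stub standing in for the real class; it only holds settings."""
    api_key: str = field(repr=False)
    model: str
    base_url: str
    structured_output: str = "auto"  # default: all three layers available


def create_extractor(
    ai: Optional[Dict[str, Any]]
) -> Optional[Union[AnthropicExtractor, OpenAICompatExtractor]]:
    """Build an extractor from the resolved ai: section, or None if it is absent."""
    if not ai:
        return None  # caller logs a warning and skips AI extraction
    provider = ai.get("provider")
    if provider == "anthropic":
        return AnthropicExtractor(api_key=ai["api_key"], model=ai["model"])
    if provider == "openai_compat":
        return OpenAICompatExtractor(
            api_key=ai["api_key"],
            model=ai["model"],
            base_url=ai["base_url"],
            structured_output=ai.get("structured_output", "auto"),
        )
    raise ValueError(f"unknown ai.provider: {provider!r}")
```

Marking api_key with repr=False is one cheap guard for the zero-secrets rule: an accidental log of the extractor object does not print the key.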
Config examples
Groupon (LiteLLM proxy):
ai:
  provider: "openai_compat"
  base_url: "https://litellm.groupondev.com"
  api_key: "${LLM_API_KEY}"
  model: "claude-haiku-4-5-20251001"
Keboola (direct Anthropic):
ai:
  provider: "anthropic"
  api_key: "${ANTHROPIC_API_KEY}"
  model: "claude-haiku-4-5-20251001"
Legacy (existing deployments, no changes needed):
ai:
  anthropic_api_key: "${ANTHROPIC_API_KEY}"
What We're Improving
A. From Hardwired to Pluggable
Before: One provider, baked into the code. Changing provider = changing code.
After: Provider is a config choice. Switching from Anthropic to LiteLLM to OpenRouter is a YAML change + secret rotation. No code changes needed.
B. From Fragile to Resilient
Before: API error = entire collection run fails. No retries.
After:
- Transient errors (rate limits, timeouts, network) → automatic retry with backoff
- Permanent errors (bad API key, unsupported model) → fail fast, clear error message
- Truncated responses (model hit token limit) → detected, retried with note
- Model refuses request → logged, item skipped safely
C. From All-or-Nothing to Progressive Degradation
Before: Structured output works or it doesn't. Binary.
After: Three fallback layers (schema → json_object → prompt-based). The connector adapts to what each provider actually supports instead of assuming capabilities.
D. From Silent to Observable
Before: No visibility into what the AI extraction does.
After:
- Which provider/model is being used (logged at startup)
- Which structured output strategy was selected (logged once)
- How long each call takes (logged per call)
- Whether fallback was triggered (logged as warning)
- Clear error classification in logs
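A per-call log line with these fields could be produced by a small context manager. This is a sketch, not the connector's actual logging: log_call, the logger name, and the message format are assumptions. On failure it records only the exception class name, in line with the zero-secrets rule.

```python
import logging
import time
from contextlib import contextmanager
from typing import Iterator

logger = logging.getLogger("connectors.llm")  # assumed logger name


@contextmanager
def log_call(provider: str, model: str, strategy: str) -> Iterator[None]:
    """Log provider, model, strategy, outcome, and latency for one call."""
    start = time.perf_counter()
    outcome = "ok"
    try:
        yield
    except Exception as exc:
        outcome = type(exc).__name__  # class name only, never str(exc)
        raise
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        logger.info(
            "llm_call provider=%s model=%s strategy=%s outcome=%s duration_ms=%.0f",
            provider, model, strategy, outcome, duration_ms,
        )
```

Wrapping each provider call in `with log_call(...)` gives one line per call without touching prompt or response content.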
E. From Coupled to Separated
Before: AI provider choice is an engineering decision embedded in code.
After: AI provider choice is an operations decision in instance config. Each client controls their own provider, model, and API gateway independently.
Error Handling Strategy
| Error Type | What Happens | Why |
|---|---|---|
| Missing ai: config | Corporate Memory skips AI extraction, logs warning | Don't crash the whole service |
| Invalid API key | Fail fast, log error, skip collection run | Don't waste retries on permanent failure |
| Rate limit (429) | Wait + retry with exponential backoff (3 attempts) | Transient, will resolve |
| Network timeout | Retry once, then fail | Might be transient |
| Truncated response | Detect via finish_reason, retry once | Model hit token limit |
| Model refusal | Log, mark item as unsafe | Conservative: don't share uncertain content |
| Invalid JSON response | Log, mark item as unsafe | Better to skip than crash |
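| Structured output unsupported | Fall back to json_object, then prompt-based, as far as the structured_output policy allows; under "strict", fail with a clear error | Adapt to provider capabilities without overriding the operator's quality setting |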
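The retry rows of the table can be sketched as an exception hierarchy plus a small retry loop. The class and function names are hypothetical. Reading "3 attempts" as three total tries, and applying the same backoff delay before the single timeout retry, are assumptions of this sketch.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ExtractorError(Exception):
    """Base class for connector errors."""

class AuthError(ExtractorError):
    """Permanent: bad or missing API key. Never retried."""

class RateLimitError(ExtractorError):
    """Transient: HTTP 429."""

class ExtractorTimeout(ExtractorError):
    """Possibly transient: network timeout."""


def _max_attempts(exc: ExtractorError) -> int:
    # Rate limits get three total attempts, timeouts two (one retry).
    return 3 if isinstance(exc, RateLimitError) else 2


def call_with_retry(
    call: Callable[[], T],
    sleep: Callable[[float], None] = time.sleep,
    base_delay: float = 1.0,
) -> T:
    """Retry transient errors with exponential backoff; let the rest propagate."""
    attempt = 1
    while True:
        try:
            return call()
        except (RateLimitError, ExtractorTimeout) as exc:
            if attempt >= _max_attempts(exc):
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, ...
            attempt += 1
```

AuthError is deliberately not caught, so an invalid key fails on the first call instead of consuming retries.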
Scope Boundaries
In scope (v1):
- Anthropic direct provider (existing behavior, tested)
- OpenAI-compatible proxy provider (LiteLLM, verified against Groupon proxy)
- Backward compatibility with existing ai.anthropic_api_key config
- Three-layer structured output fallback
- Custom error hierarchy (auth / rate limit / timeout / format)
- Retry with backoff for transient errors
- Corporate Memory collector integration
Explicitly NOT in scope (future):
- Azure OpenAI, OpenRouter, Gemini — listed as "untested" until verified per-provider
- General-purpose AI chat/generation interface
- Streaming responses
- Multi-turn conversations
- Token usage tracking / cost monitoring (v2 consideration)
- Provider capability auto-detection at startup
Testing Strategy
Unit Tests (connector internals)
- Factory creates correct provider from each config shape
- Factory handles legacy ai.anthropic_api_key config
- Missing/invalid config raises clear errors
- Each provider formats API calls correctly (mocked SDK)
- Structured output fallback chain works
- Error classification (auth vs rate limit vs timeout)
Integration Tests (Corporate Memory behavior)
- Full collection run with mocked provider
- Skip when no files changed (hash check)
- Preserve existing item IDs across runs
- Sensitivity check runs only on new items
- Fail-closed on sensitivity check errors
- user_hashes.json written only after successful processing
- Graceful degradation when ai: config is missing
Manual Verification (before production)
- Dry-run against actual Groupon LiteLLM proxy
- Verify structured output works through proxy
- Verify sensitivity check works through proxy
- Full collection produces valid knowledge.json
Deployment
The existing deploy.sh handles dependency installation from requirements.txt,
so no manual pip install is needed. The deployment sequence:
- Add openai to requirements.txt (OSS repo)
- Update collector.py to use new connector (OSS repo)
- Add ai: section to instance.yaml (instance repo)
- Add LLM_API_KEY secret to GHA secrets and deploy.yml (instance repo)
- Add CONFIG_DIR to the wrapper script collect-knowledge (OSS repo)
- Push both repos → CI/CD deploys automatically
- Verify via --dry-run on server
Rollback: Revert both repos to previous commit. The legacy config path
means existing ai.anthropic_api_key still works if we need to roll back.
Files to Modify
| File | Repo | Change |
|---|---|---|
| connectors/llm/ (5 new files) | OSS | New connector module |
| services/corporate_memory/collector.py | OSS | Use connector instead of direct API |
| server/bin/collect-knowledge | OSS | Add CONFIG_DIR |
| requirements.txt | OSS | Add openai>=1.0.0 |
| server/deploy.sh | OSS | Add LLM_API_KEY to env propagation |
| config/.env.template | OSS | Document LLM_API_KEY |
| config/instance.yaml.example | OSS | Expanded ai: section with examples |
| docs/CONFIGURATION.md | OSS | Add AI provider docs |
| tests/test_llm_connector.py | OSS | New: connector tests |
| tests/test_corporate_memory.py | OSS | New/expanded: behavior tests |
| config/instance.yaml | Instance | Add ai: section for Groupon |
| .github/workflows/deploy.yml | Instance | Add LLM_API_KEY to .env |
| env.example | Instance | Document LLM_API_KEY |
Risk Assessment
| Risk | Level | Mitigation |
|---|---|---|
| LiteLLM structured output translation | Medium | Three-layer fallback + manual verification before deploy |
| Config migration breaks existing instances | Low | Backward compat shim for legacy config shape |
| New openai dependency conflicts | Low | Standard package, deploy.sh handles install |
| Corporate Memory regression | Medium | Expanded behavior tests covering all current logic |
| Systemd/wrapper script CONFIG_DIR | Low | Follows existing pattern from other services |