Server Operations
Operational guide for the AI Data Analyst Docker deployment.
Basic Information
| Parameter | Value |
|---|---|
| GCP Project | your-gcp-project |
| Zone | europe-north1-a |
| Machine type | e2-medium |
| OS | Debian 12 (bookworm) |
| External IP | YOUR_SERVER_IP |
Docker Compose
Starting and stopping
# Start all services (app + scheduler)
docker compose up -d
# Include optional services (Telegram bot, etc.)
docker compose --profile full up -d
# Stop all services
docker compose down
# Restart a single service
docker compose restart app
# Pull latest images and redeploy
docker compose pull && docker compose up -d
Status
# List running containers and their state
docker compose ps
# Resource usage
docker stats
Log Viewing
# All services, follow
docker compose logs -f
# Single service
docker compose logs -f app
docker compose logs -f scheduler
# Last N lines
docker compose logs --tail=100 app
# Since a timestamp
docker compose logs --since=1h app
Application logs are written to stdout/stderr and captured by Docker.
Health Check
# Quick check
curl https://your-instance.example.com/health
# With response body
curl -s https://your-instance.example.com/health | python3 -m json.tool
Expected response:
{"status": "ok"}
The /health endpoint also checks DuckDB connectivity and returns 503 if
the database is unavailable.
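For scripted checks it can help to look only at the HTTP status code. The following is a minimal sketch, not part of the project: `health_verdict`, `check_health`, and the `HEALTH_URL` override are hypothetical names, and treating any code other than 200 or 503 as "unknown" is an assumption.

```shell
#!/bin/sh
# Map an HTTP status code from /health to a short verdict.
# 200 and 503 are the codes described above; mapping everything else
# to "unknown" is an assumption of this sketch.
health_verdict() {
    case "$1" in
        200) echo "ok" ;;
        503) echo "database-unavailable" ;;
        *)   echo "unknown" ;;
    esac
}

# Fetch only the status code from the health endpoint and classify it.
# HEALTH_URL is a hypothetical override; the default is the placeholder host.
check_health() {
    url="${HEALTH_URL:-https://your-instance.example.com/health}"
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")
    health_verdict "$code"
}
```

Calling `check_health` prints one of the three verdicts. When curl cannot connect at all it reports status `000`, which this sketch classifies as `unknown`.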
Data Sync
Trigger a manual sync
# Via API
curl -X POST http://localhost:8000/api/sync/trigger
# Sync a single table
curl -X POST "http://localhost:8000/api/sync/trigger?table=table_name"
Check sync status
curl -s http://localhost:8000/api/sync/status | python3 -m json.tool
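The two trigger forms above differ only in the optional `table` query parameter, so they can be wrapped in one helper. This is a sketch under assumptions: `sync_trigger_url`, `trigger_sync`, and `SYNC_BASE_URL` are hypothetical names, and the table name is passed through without URL encoding.

```shell
#!/bin/sh
# Base URL of the API; defaults to the address used in the examples above.
SYNC_BASE_URL="${SYNC_BASE_URL:-http://localhost:8000}"

# Build the trigger URL, adding ?table=<name> only when a table is given.
# Assumption: table names need no URL encoding.
sync_trigger_url() {
    if [ -n "${1:-}" ]; then
        echo "$SYNC_BASE_URL/api/sync/trigger?table=$1"
    else
        echo "$SYNC_BASE_URL/api/sync/trigger"
    fi
}

# POST to the trigger endpoint; -f makes curl fail on HTTP error responses.
trigger_sync() {
    curl -fsS -X POST "$(sync_trigger_url "${1:-}")"
}
```

`trigger_sync` starts a full sync and `trigger_sync table_name` syncs a single table. The status endpoint can then be queried as shown above.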
Data Structure
/data/                          # Persistent volume (GCP pd-balanced, snapshotted)
├── state/
│   └── system.duckdb           # Table registry, users, sync state, audit log
├── analytics/
│   └── server.duckdb           # Master analytics DB (rebuilt on startup)
└── extracts/
    └── {source_name}/
        ├── extract.duckdb      # Per-source extract DB with views
        └── data/               # Parquet files (local sources: Keboola, Jira)
            └── *.parquet
system.duckdb is the source of truth for configuration. Back it up before
any destructive operation.
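One way to take that backup is a timestamped copy placed next to the original file. The sketch below is illustrative only: `backup_path` and `backup_system_db` are hypothetical helpers, and it assumes that /data is reachable from where the script runs and that the services are stopped first (`docker compose down`), since copying a database file while it is being written may produce an inconsistent copy.

```shell
#!/bin/sh
# Build a backup path next to the original file.
# $1: file to back up, $2: timestamp label
backup_path() {
    echo "$1.$2.bak"
}

# Copy the database aside and print the path of the copy.
# Assumption: the app is stopped, so the file is not being written.
backup_system_db() {
    src="${1:-/data/state/system.duckdb}"
    dest=$(backup_path "$src" "$(date +%Y%m%d-%H%M%S)")
    cp -p "$src" "$dest" && echo "$dest"
}
```

Running `backup_system_db` with no argument copies `/data/state/system.duckdb`; a different path can be passed as the first argument.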
Admin CLI
# List registered tables
docker compose exec app agnes admin list-tables
# Register a new table
docker compose exec app agnes admin register-table
# User management
docker compose exec app agnes admin list-users
# Query data directly
docker compose exec app agnes query "SELECT * FROM my_table LIMIT 10"
Application Deployment
The application is deployed as a Docker image. The recommended workflow:
- Push changes to the main branch
- CI builds and pushes a new image
- On the server, pull and restart:
cd <install-dir>
docker compose pull
docker compose up -d
To pin a specific image version, set the tag in docker-compose.yml before deploying.
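The server-side pull-and-restart step can be wrapped in a small script with a dry-run mode, so the commands can be reviewed before they run. This is a sketch, not the project's deploy tooling: `run`, `deploy`, and the `DRY_RUN` variable are hypothetical names.

```shell
#!/bin/sh
# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Pull the latest images and restart the stack from the install directory.
# Stops at the first failing step.
deploy() {
    install_dir="${1:-}"
    [ -n "$install_dir" ] || { echo "usage: deploy <install-dir>" >&2; return 2; }
    run cd "$install_dir" || return 1
    run docker compose pull || return 1
    run docker compose up -d || return 1
}
```

With `DRY_RUN=1` exported, `deploy <install-dir>` prints the three commands prefixed with `+` instead of executing them.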
Environment configuration
# Edit .env (never commit this file)
nano <install-dir>/.env
# Restart app to apply changes
docker compose restart app
See config/.env.template for the full variable reference and
config/instance.yaml.example for instance configuration.
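Before restarting, it may be worth confirming that the keys a feature needs are actually set. The following sketch uses a hypothetical helper, `missing_env_keys`, and assumes plain `KEY=value` lines with no `export` prefix and no spaces around `=`.

```shell
#!/bin/sh
# Print every requested key that has no non-empty KEY=value line in the file.
# usage: missing_env_keys <env-file> <KEY>...
missing_env_keys() {
    env_file="$1"
    shift
    for key in "$@"; do
        grep -Eq "^${key}=.+" "$env_file" || echo "$key"
    done
}
```

For example, `missing_env_keys <install-dir>/.env GOOGLE_CLIENT_ID GOOGLE_CLIENT_SECRET JIRA_WEBHOOK_SECRET JIRA_API_TOKEN` prints nothing when all four keys are set and lists the missing ones otherwise.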
Monitoring
GCP Cloud Monitoring
The VM reports metrics via the Google Cloud Ops Agent:
# Check agent status
sudo systemctl status google-cloud-ops-agent
Key metrics in GCP Console > Monitoring > Metrics Explorer:
- agent.googleapis.com/disk/percent_used: watch the /data partition
- agent.googleapis.com/memory/percent_used
- agent.googleapis.com/cpu/utilization
A disk space alert fires when /data exceeds 85% for 5 minutes.
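The same 85% threshold can be checked locally as a point-in-time test. This sketch does not reproduce the alert's 5-minute window; `disk_used_percent` and `over_threshold` are hypothetical helper names, and the parsing assumes the filesystem name contains no spaces.

```shell
#!/bin/sh
# Print the used percentage (digits only) of the filesystem holding a path.
# df -P gives POSIX output; the capacity is the fifth column of line two.
disk_used_percent() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit 0) when usage is strictly above the threshold.
# $1: used percent, $2: threshold percent
over_threshold() {
    [ "$1" -gt "$2" ]
}
```

A possible use is `over_threshold "$(disk_used_percent /data)" 85 && echo "WARNING: /data above 85%"`.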
Local checks
# Disk usage
df -h /data
# Data directory breakdown
du -sh /data/*
# Container resource usage
docker stats --no-stream
Backup and Disaster Recovery
The /data persistent disk has daily GCP snapshot schedules with 14-day retention.
# List existing snapshots
gcloud compute snapshots list --project=your-gcp-project \
--filter="sourceDisk:data-disk" --sort-by=~creationTimestamp
# Create a manual snapshot before risky operations
gcloud compute disks snapshot data-disk \
--project=your-gcp-project \
--zone=europe-north1-a \
--snapshot-names=data-disk-$(date +%Y%m%d)-manual
See disaster-recovery.md for full recovery procedures.
Web Application
The FastAPI app is available at https://your-instance.example.com.
- Google OAuth: restricted to allowed_domain set in config/instance.yaml
- Email magic link: available out of the box (no external service required)
- Admin API (manage tables): POST /api/admin/register-table (register), PUT /api/admin/registry/{id} (update), GET /api/admin/registry (list)
- Sync API (trigger data extraction): POST /api/sync/trigger
Google OAuth setup
- Go to Google Cloud Console
- Create OAuth 2.0 Client ID (Web application)
- Authorized JavaScript origins: https://your-instance.example.com
- Authorized redirect URIs: https://your-instance.example.com/auth/google/callback
- Add GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET to .env
Jira Webhook Integration
Receives webhooks from Atlassian Jira for real-time issue sync.
Configuration
Add to .env:
JIRA_WEBHOOK_SECRET=<generate with: python -c "import secrets; print(secrets.token_hex(32))">
JIRA_API_TOKEN=<API token from https://id.atlassian.com/manage-profile/security/api-tokens>
Add to config/instance.yaml:
jira:
  domain: "your-org.atlassian.net"
  email: "integration-user@your-domain.com"
  webhook_secret: "${JIRA_WEBHOOK_SECRET}"
  api_token: "${JIRA_API_TOKEN}"
Jira webhook setup
- Go to Jira Admin > System > WebHooks
- Create new webhook:
  - URL: https://your-instance.example.com/webhooks/jira
  - Secret: same value as JIRA_WEBHOOK_SECRET
  - Events: Issue created/updated/deleted, Comment created/updated, Attachment created
Monitoring
# Health check
curl https://your-instance.example.com/webhooks/jira/health
# Webhook processing logs
docker compose logs -f app | grep -i jira
Troubleshooting
Container won't start
docker compose logs app | tail -50
# Look for configuration or DuckDB errors at startup
DuckDB locked
If the app crashes mid-write, DuckDB may hold a write lock:
docker compose down
# Wait a few seconds, then:
docker compose up -d
DuckDB's file lock is held by the running process and released when that process exits, so a full restart of the stack resolves most lock issues.
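Instead of waiting a fixed number of seconds after the restart, the health endpoint can be polled until it responds. A minimal sketch, assuming a hypothetical `wait_until` helper that retries any command:

```shell
#!/bin/sh
# Retry a command until it succeeds or the attempts run out.
# usage: wait_until <attempts> <delay-seconds> <command...>
wait_until() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, after `docker compose up -d`, running `wait_until 10 3 curl -fsS https://your-instance.example.com/health` tries up to ten times, three seconds apart, and returns non-zero if the app never becomes healthy. The attempt count and delay are illustrative values.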
Sync failing
# Check sync logs
docker compose logs app | grep -i "sync\|error\|exception"
# Verify data source credentials in .env, then confirm the affected table is registered
docker compose exec app agnes admin list-tables
Out of disk space
df -h /data
du -sh /data/extracts/*
# Take a fresh snapshot before any manual cleanup
# Then remove old parquet partitions if needed (check with the orchestrator first)