agnes-the-ai-analyst/dev_docs/server.md
ZdenekSrotyr 8233c3e3f9 chore(docs): replace stale da verbs and vendor-specific install paths
2026-05-04 21:22:19 +02:00


Server Operations

Operational guide for the AI Data Analyst Docker deployment.

Basic Information

Parameter      Value
GCP Project    your-gcp-project
Zone           europe-north1-a
Machine type   e2-medium
OS             Debian 12 (bookworm)
External IP    YOUR_SERVER_IP

Docker Compose

Starting and stopping

# Start all services (app + scheduler)
docker compose up -d

# Include optional services (Telegram bot, etc.)
docker compose --profile full up -d

# Stop all services
docker compose down

# Restart a single service
docker compose restart app

# Pull latest images and redeploy
docker compose pull && docker compose up -d

Status

# List running containers and their state
docker compose ps

# Resource usage
docker stats

Log Viewing

# All services, follow
docker compose logs -f

# Single service
docker compose logs -f app
docker compose logs -f scheduler

# Last N lines
docker compose logs --tail=100 app

# Entries from the last hour (--since also accepts an absolute timestamp)
docker compose logs --since=1h app

Application logs are written to stdout/stderr and captured by Docker.

Health Check

# Quick check
curl https://your-instance.example.com/health

# With response body
curl -s https://your-instance.example.com/health | python3 -m json.tool

Expected response:

{"status": "ok"}

The /health endpoint also checks DuckDB connectivity and returns 503 if the database is unavailable.
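
For cron jobs or deploy scripts, the same check can be scripted. The sketch below is one possible reading of the behaviour described above: HTTP 200 with {"status": "ok"} counts as healthy, and HTTP 503 counts as unavailable. The names classify_health and check_health are illustrative and are not part of the agnes codebase.

```python
import json
import urllib.error
import urllib.request


def classify_health(status_code: int, body: str) -> str:
    """Map a /health response to 'healthy', 'unavailable', or 'unknown'."""
    if status_code == 503:
        # Per this guide, 503 means the DuckDB connectivity check failed.
        return "unavailable"
    if status_code == 200:
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            return "unknown"
        if isinstance(payload, dict) and payload.get("status") == "ok":
            return "healthy"
    return "unknown"


def check_health(base_url: str, timeout: float = 5.0) -> str:
    """Fetch <base_url>/health and classify the result."""
    url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_health(resp.status, resp.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is still available.
        return classify_health(err.code, err.read().decode("utf-8", "replace"))
    except OSError:
        # DNS failure, refused connection, timeout, and similar network errors.
        return "unreachable"
```

Calling check_health("https://your-instance.example.com") returns one of the four strings, which a wrapper script can turn into an exit code.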

Data Sync

Trigger a manual sync

# Via API
curl -X POST http://localhost:8000/api/sync/trigger

# Sync a single table
curl -X POST "http://localhost:8000/api/sync/trigger?table=table_name"
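
When the trigger is called from a script, table names should be URL-encoded rather than pasted into the query string. This is a minimal sketch under the assumption that the endpoint accepts the same unauthenticated POST as the curl examples above. The helper names are hypothetical, and the response body is not interpreted because its shape is not documented here.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlencode


def sync_trigger_url(base_url: str, table: Optional[str] = None) -> str:
    """Build the sync trigger URL, optionally restricted to one table."""
    url = f"{base_url.rstrip('/')}/api/sync/trigger"
    if table is not None:
        # urlencode escapes spaces and other reserved characters.
        url += "?" + urlencode({"table": table})
    return url


def trigger_sync(base_url: str, table: Optional[str] = None, timeout: float = 30.0) -> int:
    """POST to the trigger endpoint and return the HTTP status code."""
    request = urllib.request.Request(sync_trigger_url(base_url, table), method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```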

Check sync status

curl -s http://localhost:8000/api/sync/status | python3 -m json.tool

Data Structure

/data/                          # Persistent volume (GCP pd-balanced, snapshotted)
├── state/
│   └── system.duckdb           # Table registry, users, sync state, audit log
├── analytics/
│   └── server.duckdb           # Master analytics DB (rebuilt on startup)
└── extracts/
    └── {source_name}/
        ├── extract.duckdb      # Per-source extract DB with views
        └── data/               # Parquet files (local sources: Keboola, Jira)
            └── *.parquet

system.duckdb is the source of truth for configuration. Back it up before any destructive operation.
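
One way to take that backup is a timestamped file copy. The sketch below assumes the app has been stopped first, since copying a DuckDB file while a process is writing to it may capture an inconsistent state. The function name and the backup directory are hypothetical.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def backup_state_db(state_db: Path, backup_dir: Path) -> Path:
    """Copy state_db into backup_dir under a UTC-timestamped name.

    Returns the path of the new copy. Assumes no process is writing to
    state_db while the copy runs.
    """
    if not state_db.is_file():
        raise FileNotFoundError(state_db)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_dir / f"{state_db.stem}-{stamp}{state_db.suffix}"
    shutil.copy2(state_db, target)  # copy2 also preserves the modification time
    return target
```

For example, backup_state_db(Path("/data/state/system.duckdb"), Path("/data/backups")) would write a file such as /data/backups/system-<timestamp>.duckdb, where /data/backups is an assumed location.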

Admin CLI

# List registered tables
docker compose exec app agnes admin list-tables

# Register a new table
docker compose exec app agnes admin register-table

# User management
docker compose exec app agnes admin list-users

# Query data directly
docker compose exec app agnes query "SELECT * FROM my_table LIMIT 10"

Application Deployment

The application is deployed as a Docker image. The recommended workflow is:

  1. Push changes to the main branch
  2. CI builds and pushes a new image
  3. On the server, pull and restart:
    cd <install-dir>
    docker compose pull
    docker compose up -d
    

To pin a specific image version, set the tag in docker-compose.yml before deploying.

Environment configuration

# Edit .env (never commit this file)
nano <install-dir>/.env

# Restart app to apply changes
docker compose restart app

See config/.env.template for the full variable reference and config/instance.yaml.example for instance configuration.

Monitoring

GCP Cloud Monitoring

The VM reports metrics via the Google Cloud Ops Agent:

# Check agent status
sudo systemctl status google-cloud-ops-agent

Key metrics in GCP Console > Monitoring > Metrics Explorer:

  • agent.googleapis.com/disk/percent_used — watch /data partition
  • agent.googleapis.com/memory/percent_used
  • agent.googleapis.com/cpu/utilization

A disk space alert fires when usage of /data stays above 85% for 5 minutes.
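
To reason about when that alert would fire, the rule can be approximated locally. This is a rough sketch, not the Cloud Monitoring implementation: it assumes the condition means every sample in the last 5 minutes is above the threshold. The percentage from shutil.disk_usage may also differ slightly from the Ops Agent metric because of reserved blocks.

```python
import shutil
from typing import Sequence, Tuple


def current_percent_used(path: str) -> float:
    """Return the used share of the filesystem holding path, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total if usage.total else 0.0


def alert_should_fire(
    samples: Sequence[Tuple[float, float]],
    threshold_pct: float = 85.0,
    window_s: float = 300.0,
) -> bool:
    """Decide whether usage stayed above threshold_pct for the whole window.

    samples holds (unix_timestamp, percent_used) pairs in any order.
    """
    if not samples:
        return False
    ordered = sorted(samples)
    window_start = ordered[-1][0] - window_s
    if ordered[0][0] > window_start:
        # Not enough history yet to cover the full window.
        return False
    return all(pct > threshold_pct for ts, pct in ordered if ts >= window_start)
```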

Local checks

# Disk usage
df -h /data

# Data directory breakdown
du -sh /data/*

# Container resource usage
docker stats --no-stream

Backup and Disaster Recovery

The /data persistent disk is snapshotted daily by a GCP snapshot schedule with 14-day retention.

# List existing snapshots
gcloud compute snapshots list --project=your-gcp-project \
  --filter="sourceDisk:data-disk" --sort-by=~creationTimestamp

# Create a manual snapshot before risky operations
gcloud compute disks snapshot data-disk \
  --project=your-gcp-project \
  --zone=europe-north1-a \
  --snapshot-names=data-disk-$(date +%Y%m%d)-manual

See disaster-recovery.md for full recovery procedures.

Web Application

The FastAPI app is available at https://your-instance.example.com.

  • Google OAuth: restricted to allowed_domain set in config/instance.yaml
  • Email magic link: available out of the box (no external service required)
  • Admin API: POST /api/admin/register-table (register), PUT /api/admin/registry/{id} (update), GET /api/admin/registry (list) — manage tables
  • Sync API: POST /api/sync/trigger — trigger data extraction
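
The domain restriction on Google sign-in can be pictured as a check on the e-mail address returned by Google. This guide does not show how the app enforces allowed_domain, so the function below is a hypothetical illustration of the idea rather than the app's code.

```python
def email_in_domain(email: str, allowed_domain: str) -> bool:
    """Return True if email belongs exactly to allowed_domain (case-insensitive)."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        return False
    # Compare the whole domain so that look-alike suffixes are rejected.
    return domain.lower() == allowed_domain.strip().lower()
```

Matching the full domain, rather than using a suffix test, keeps addresses such as user@evil-example.com from passing a check for example.com.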

Google OAuth setup

  1. Go to Google Cloud Console
  2. Create OAuth 2.0 Client ID (Web application)
  3. Authorized JavaScript origins: https://your-instance.example.com
  4. Authorized redirect URIs: https://your-instance.example.com/auth/google/callback
  5. Add GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET to .env

Jira Webhook Integration

The app receives webhooks from Atlassian Jira for real-time issue sync.

Configuration

Add to .env:

JIRA_WEBHOOK_SECRET=<generate with: python -c "import secrets; print(secrets.token_hex(32))">
JIRA_API_TOKEN=<API token from https://id.atlassian.com/manage-profile/security/api-tokens>

Add to config/instance.yaml:

jira:
  domain: "your-org.atlassian.net"
  email: "integration-user@your-domain.com"
  webhook_secret: "${JIRA_WEBHOOK_SECRET}"
  api_token: "${JIRA_API_TOKEN}"
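
The ${...} placeholders keep secrets out of config/instance.yaml by deferring to the environment. How the app resolves them is not shown in this guide. The sketch below illustrates the general technique with a hypothetical expand_placeholders helper that fails loudly when a variable is missing.

```python
import re
from typing import Mapping

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(value: str, env: Mapping[str, str]) -> str:
    """Replace every ${NAME} in value with env[NAME].

    Raises KeyError if a referenced variable is not present in env.
    """
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, value)
```

Passing os.environ as env mirrors what a loader would do at startup. Raising on a missing variable avoids silently running with an empty secret.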

Jira webhook setup

  1. Go to Jira Admin > System > WebHooks
  2. Create new webhook:
    • URL: https://your-instance.example.com/webhooks/jira
    • Secret: same value as JIRA_WEBHOOK_SECRET
    • Events: Issue created/updated/deleted, Comment created/updated, Attachment created
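
The shared secret lets the receiver confirm that a request really came from Jira. The sketch below assumes the Jira Cloud convention of an X-Hub-Signature header of the form sha256=<hex HMAC of the raw body>. The verification code used by this app is not shown in this guide, so the function names here are illustrative.

```python
import hashlib
import hmac


def sign_payload(secret: str, body: bytes) -> str:
    """Return the signature string for body in the form 'sha256=<hex digest>'."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def signature_is_valid(secret: str, body: bytes, header_value: str) -> bool:
    """Compare the received header against the expected signature.

    body must be the raw request bytes, before any JSON parsing.
    """
    expected = sign_payload(secret, body)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))
```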

Monitoring

# Health check
curl https://your-instance.example.com/webhooks/jira/health

# Webhook processing logs
docker compose logs -f app | grep -i jira

Troubleshooting

Container won't start

docker compose logs app | tail -50
# Look for configuration or DuckDB errors at startup

DuckDB locked

If the app crashes mid-write, DuckDB may hold a write lock:

docker compose down
# Wait a few seconds, then:
docker compose up -d

DuckDB holds its lock for the lifetime of the owning process, so stopping every container and starting again, as above, resolves most lock issues.
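
After the restart it helps to wait until the app reports healthy before retrying the failed operation. The following is a small polling sketch. The helper names wait_until and http_ok and the default timings are assumptions, not part of the agnes tooling.

```python
import time
import urllib.request
from typing import Callable


def wait_until(
    probe: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call probe repeatedly until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False
```

A typical call after docker compose up -d would be wait_until(lambda: http_ok("https://your-instance.example.com/health")).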

Sync failing

# Check sync logs
docker compose logs app | grep -i "sync\|error\|exception"

# Verify data source credentials in .env, then confirm the expected tables are registered
docker compose exec app agnes admin list-tables

Out of disk space

df -h /data
du -sh /data/extracts/*

# Remove old parquet partitions if needed (check with orchestrator first)
# Trigger a fresh snapshot before any manual cleanup
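
Before deleting anything, a dry-run listing of cleanup candidates makes the review easier. This sketch only reports files and never removes them. The 30-day cutoff in the usage note and the function name are assumptions, and age is judged by file modification time, which may not match the partition date.

```python
import time
from pathlib import Path
from typing import List, Optional, Tuple


def old_parquet_files(
    root: Path, older_than_days: float, now: Optional[float] = None
) -> List[Tuple[Path, int]]:
    """List (path, size_in_bytes) for *.parquet files under root whose
    modification time is older than the cutoff, largest first."""
    reference = time.time() if now is None else now
    cutoff = reference - older_than_days * 86400
    found = []
    for path in root.rglob("*.parquet"):
        stat = path.stat()
        if stat.st_mtime < cutoff:
            found.append((path, stat.st_size))
    # Largest files first, since they free the most space.
    return sorted(found, key=lambda item: item[1], reverse=True)
```

For example, old_parquet_files(Path("/data/extracts"), 30) returns the candidates for review. Any removal stays a manual, deliberate step.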