agnes-the-ai-analyst/docs/DEPLOYMENT.md
Petr c56905d34f Initial commit: OSS data distribution platform
Open-source AI data analyst platform extracted from internal repo.
Includes data sync engine, Keboola adapter, Flask web portal,
server deployment scripts, and configuration templates.
2026-03-08 23:31:28 +01:00

2.3 KiB

Deployment Guide

Server Requirements

  • Debian 12 / Ubuntu 22.04+
  • 2+ vCPUs, 2+ GB RAM
  • 10+ GB data disk
  • Public IP with DNS

Initial Server Setup

  1. Provision a VM (GCP, AWS, Azure, etc.)

  2. Run the setup script:

    sudo bash server/setup.sh
    

    This creates:

    • System groups: data-ops, dataread, data-private
    • Deploy user with appropriate permissions
    • Directory structure under /opt/data-analyst/
    • Python virtual environment
  3. Set up the webapp:

    sudo bash server/webapp-setup.sh
    

    This installs:

    • Gunicorn systemd service
    • Nginx reverse proxy with SSL
    • Log rotation

CI/CD Pipeline

  1. Copy the example workflow:

    cp .github/workflows/deploy.yml.example .github/workflows/deploy.yml
    
  2. Configure GitHub Secrets:

    • SERVER_HOST: Server IP address
    • SERVER_USER: Deploy username
    • SERVER_SSH_KEY: Deploy SSH private key
    • All environment variables from .env
  3. Push to main branch triggers automatic deployment.

Directory Structure on Server

/opt/data-analyst/
├── repo/           # Git clone of this repository
├── .env            # Environment variables (secrets)
├── .venv/          # Python virtual environment
└── logs/           # Application logs

/data/
├── src_data/
│   ├── parquet/    # Converted data files
│   ├── metadata/   # Sync state, profiles
│   └── raw/        # Raw source data
├── docs/           # Documentation served to analysts
├── scripts/        # Scripts distributed to analysts
└── notifications/  # Notification system data

Separate Config Repository

For production deployments, keep instance config in a separate private repository:

client-config-repo/
├── config/
│   ├── instance.yaml
│   └── data_description.md
├── .env.example
└── .github/workflows/deploy.yml

Set CONFIG_DIR=/opt/data-analyst/client-config/config/ in the environment.

SSL Setup

Use certbot for Let's Encrypt SSL:

sudo apt install certbot python3-certbot-nginx
sudo certbot --nginx -d data.yourcompany.com

Monitoring

  • Health check: GET /health
  • Logs: journalctl -u webapp -f
  • Disk usage: df -h /data