agnes-the-ai-analyst/docs/DEPLOYMENT.md
Petr c56905d34f Initial commit: OSS data distribution platform
Open-source AI data analyst platform extracted from internal repo.
Includes data sync engine, Keboola adapter, Flask web portal,
server deployment scripts, and configuration templates.
2026-03-08 23:31:28 +01:00

# Deployment Guide
## Server Requirements
- Debian 12 / Ubuntu 22.04+
- 2+ vCPUs, 2+ GB RAM
- 10+ GB data disk
- Public IP address with a DNS record pointing to it
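
The requirements above can be checked on a fresh VM before running any setup script. The following is a minimal sketch, assuming a Linux host with `nproc`, `/proc/meminfo`, and GNU `df`. The helpers `meets_minimum` and `preflight` are illustrative names and are not part of the repository. The RAM threshold is set slightly below 2048 MB because `MemTotal` excludes memory reserved by the kernel.

```bash
#!/usr/bin/env bash
# Preflight sketch: compare measured values against the documented minimums.
# meets_minimum and preflight are hypothetical helpers, not repository code.

# meets_minimum LABEL ACTUAL REQUIRED: print a result line, fail if too low.
meets_minimum() {
  local label="$1" actual="$2" required="$3"
  if [ "$actual" -ge "$required" ]; then
    echo "OK   $label: $actual (need >= $required)"
  else
    echo "FAIL $label: $actual (need >= $required)"
    return 1
  fi
}

# preflight [MOUNT_POINT]: check CPU count, RAM, and free disk space.
preflight() {
  local cpus mem_mb disk_gb status=0
  cpus=$(nproc)
  mem_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
  disk_gb=$(df -BG --output=avail "${1:-/}" | tail -n 1 | tr -dc '0-9')
  meets_minimum "vCPUs" "$cpus" 2 || status=1
  meets_minimum "RAM (MB)" "$mem_mb" 1900 || status=1   # assumed tolerance
  meets_minimum "Disk (GB)" "$disk_gb" 10 || status=1
  return "$status"
}

# Usage: preflight /data
```

Passing the mount point of the data disk (for example `/data`) checks free space where the synced data will actually live.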
## Initial Server Setup
1. Provision a VM (GCP, AWS, Azure, etc.)
2. Run the setup script:
```bash
sudo bash server/setup.sh
```
This creates:
- System groups: `data-ops`, `dataread`, `data-private`
- Deploy user with appropriate permissions
- Directory structure under `/opt/data-analyst/`
- Python virtual environment
3. Set up the webapp:
```bash
sudo bash server/webapp-setup.sh
```
This installs:
- Gunicorn systemd service
- Nginx reverse proxy with SSL
- Log rotation
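
To confirm that both scripts produced what the lists above describe, a quick check along these lines can help. It is only a sketch: `missing_groups` and `missing_dirs` are hypothetical helpers, not part of the repository, and the example paths assume the layout documented in this guide.

```bash
#!/usr/bin/env bash
# Post-setup check sketch. Both helpers print only the items that are missing,
# so empty output means everything was found.

# missing_groups NAME...: print each group that does not exist on this host.
missing_groups() {
  local g
  for g in "$@"; do
    getent group "$g" >/dev/null || echo "$g"
  done
}

# missing_dirs PATH...: print each path that is not an existing directory.
missing_dirs() {
  local d
  for d in "$@"; do
    [ -d "$d" ] || echo "$d"
  done
}

# Usage:
#   missing_groups data-ops dataread data-private
#   missing_dirs /opt/data-analyst /opt/data-analyst/.venv
#   sudo nginx -t    # validate the generated Nginx configuration
```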
## CI/CD Pipeline
1. Copy the example workflow:
```bash
cp .github/workflows/deploy.yml.example .github/workflows/deploy.yml
```
2. Configure GitHub Secrets:
- `SERVER_HOST`: Server IP address
- `SERVER_USER`: Deploy username
- `SERVER_SSH_KEY`: Deploy SSH private key
- All environment variables from `.env`
3. Push to the `main` branch. Each push triggers an automatic deployment.
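
The secrets in step 2 can be set from the command line instead of the GitHub web UI. This sketch assumes the GitHub CLI (`gh`) is installed and authenticated for the repository. The helper names `env_keys` and `upload_secrets` are hypothetical, and all values in the usage comment are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: upload deployment secrets with the GitHub CLI.

# env_keys FILE: list variable names in a dotenv file, skipping blank lines
# and comments, so the set of secrets can be reviewed before upload.
env_keys() {
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$1" | cut -d= -f1
}

# upload_secrets HOST USER KEY_FILE ENV_FILE
upload_secrets() {
  gh secret set SERVER_HOST --body "$1"
  gh secret set SERVER_USER --body "$2"
  gh secret set SERVER_SSH_KEY < "$3"   # read the private key from a file
  gh secret set --env-file "$4"         # one secret per KEY=value line
}

# Usage (placeholder values):
#   env_keys .env
#   upload_secrets 203.0.113.10 deploy ~/.ssh/deploy_key .env
```

Reviewing the output of `env_keys` first avoids uploading variables that were only meant for local development.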
## Directory Structure on Server
```
/opt/data-analyst/
├── repo/ # Git clone of this repository
├── .env # Environment variables (secrets)
├── .venv/ # Python virtual environment
└── logs/ # Application logs
/data/
├── src_data/
│ ├── parquet/ # Converted data files
│ ├── metadata/ # Sync state, profiles
│ └── raw/ # Raw source data
├── docs/ # Documentation served to analysts
├── scripts/ # Scripts distributed to analysts
└── notifications/ # Notification system data
```
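
The layout above can be recreated under a scratch prefix for local testing. The sketch below is an assumption about how one might do that; `create_layout` is a hypothetical helper, and on a real server `server/setup.sh` is what creates these directories. The `repo/` and `.venv/` directories are left out because they come from `git clone` and Python's venv tooling.

```bash
#!/usr/bin/env bash
# Sketch: create the documented directory tree under an optional prefix.

# create_layout [PREFIX]: with no prefix the real paths are used, which
# requires root privileges.
create_layout() {
  local root="${1:-}"
  mkdir -p \
    "$root/opt/data-analyst/logs" \
    "$root/data/src_data/parquet" \
    "$root/data/src_data/metadata" \
    "$root/data/src_data/raw" \
    "$root/data/docs" \
    "$root/data/scripts" \
    "$root/data/notifications"
}

# Usage: create_layout "$(mktemp -d)"
```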
## Separate Config Repository
For production deployments, keep instance config in a separate private repository:
```
client-config-repo/
├── config/
│ ├── instance.yaml
│ └── data_description.md
├── .env.example
└── .github/workflows/deploy.yml
```
Point `CONFIG_DIR` at the `config/` directory of that repository's checkout on the server, for example `CONFIG_DIR=/opt/data-analyst/client-config/config/`.
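
A wrong `CONFIG_DIR` is easy to miss until the application starts. The following sketch checks that the directory contains the two files shown in the tree above. `check_config_dir` is a hypothetical helper, and whether the application needs both files at startup is an assumption.

```bash
#!/usr/bin/env bash
# Sketch: verify that CONFIG_DIR holds the expected instance config files.

# check_config_dir [DIR]: defaults to $CONFIG_DIR; prints each missing file.
check_config_dir() {
  local dir="${1:-${CONFIG_DIR:-}}" f status=0
  if [ -z "$dir" ]; then
    echo "CONFIG_DIR is not set"
    return 1
  fi
  dir="${dir%/}"   # tolerate a trailing slash
  for f in instance.yaml data_description.md; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      status=1
    fi
  done
  return "$status"
}

# Usage: check_config_dir /opt/data-analyst/client-config/config/
```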
## SSL Setup
Use certbot to obtain a Let's Encrypt certificate and configure Nginx to use it (replace `data.yourcompany.com` with your domain):
```bash
sudo apt install certbot python3-certbot-nginx
sudo certbot --nginx -d data.yourcompany.com
```
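
Let's Encrypt certificates are short-lived, so it is worth confirming that renewal works and keeping an eye on the expiry date. The sketch below assumes GNU `date` and the default certbot storage path under `/etc/letsencrypt/live/`. `days_until` and `cert_days_left` are hypothetical helpers, not part of the repository.

```bash
#!/usr/bin/env bash
# Sketch: report how many days remain before a certificate expires.

# days_until DATE [NOW_EPOCH]: whole days from now (or NOW_EPOCH) to DATE.
days_until() {
  local expiry_epoch now_epoch
  expiry_epoch=$(date -u -d "$1" +%s) || return 1
  now_epoch="${2:-$(date -u +%s)}"
  echo $(( (expiry_epoch - now_epoch) / 86400 ))
}

# cert_days_left DOMAIN: read the expiry date of the certificate for DOMAIN.
cert_days_left() {
  local end
  end=$(sudo openssl x509 -enddate -noout \
        -in "/etc/letsencrypt/live/$1/fullchain.pem") || return 1
  days_until "${end#notAfter=}"   # strip the "notAfter=" prefix
}

# Usage:
#   sudo certbot renew --dry-run          # confirm that renewal would succeed
#   cert_days_left data.yourcompany.com
```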
## Monitoring
- Health check: `GET /health`
- Logs: `journalctl -u webapp -f`
- Disk usage: `df -h /data`
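
The checks above can be combined into a small script for cron or an external monitor. This is a sketch under assumptions: the base URL and the 90% threshold in the usage comment are placeholders, the helper names are hypothetical, and `df --output` requires GNU coreutils.

```bash
#!/usr/bin/env bash
# Monitoring sketch: health endpoint and disk usage checks.

# disk_used_pct PATH: print the used percentage of PATH's filesystem.
disk_used_pct() {
  df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'
}

# check_health BASE_URL: fail if GET /health errors or takes over 10 seconds.
check_health() {
  curl -fsS --max-time 10 "$1/health" >/dev/null
}

# check_disk PATH LIMIT: fail when usage is at or above LIMIT percent.
check_disk() {
  local used
  used=$(disk_used_pct "$1")
  if [ "$used" -ge "$2" ]; then
    echo "disk usage on $1 is ${used}% (limit $2%)"
    return 1
  fi
}

# Usage (placeholder URL and threshold):
#   check_health https://data.yourcompany.com && check_disk /data 90
```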