Open-source AI data analyst platform extracted from internal repo. Includes data sync engine, Keboola adapter, Flask web portal, server deployment scripts, and configuration templates.
# Deployment Guide

## Server Requirements

- Debian 12 / Ubuntu 22.04+
- 2+ vCPUs, 2+ GB RAM
- 10+ GB data disk
- Public IP with DNS

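
The requirements above can be checked before running any setup script. The following is a minimal preflight sketch, not part of the repository: the thresholds, the `DATA_MOUNT` variable, and the `check_min` helper are assumptions made for illustration, and the RAM threshold is set slightly under 2048 MB because a nominal 2 GB VM usually reports a little less.

```bash
#!/usr/bin/env bash
# Preflight sketch: compare this machine against the minimums listed above.
# Thresholds are assumptions derived from the list, not values read from setup.sh.

MIN_CPUS=2
MIN_RAM_MB=1800                     # assumption: a nominal 2 GB VM reports under 2048 MB
MIN_DISK_GB=10
DATA_MOUNT="${DATA_MOUNT:-/data}"   # assumption: the data disk is mounted at /data

# check_min LABEL ACTUAL REQUIRED -> prints a status line, returns 1 if below the minimum
check_min() {
    if [ "$2" -ge "$3" ]; then
        echo "OK    $1: $2 (minimum $3)"
    else
        echo "FAIL  $1: $2 (minimum $3)"
        return 1
    fi
}

cpus=$(nproc 2>/dev/null || echo 0)
ram_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo 2>/dev/null)
check_min "vCPUs" "$cpus" "$MIN_CPUS"
check_min "RAM (MB)" "${ram_mb:-0}" "$MIN_RAM_MB"

if [ -d "$DATA_MOUNT" ]; then
    # df -Pk reports 1 KiB blocks; column 2 is the total size, rounded here to whole GB
    disk_gb=$(df -Pk "$DATA_MOUNT" | awk 'NR==2 {print int($2 / 1024 / 1024 + 0.5)}')
    check_min "Data disk (GB)" "$disk_gb" "$MIN_DISK_GB"
else
    echo "SKIP  Data disk: $DATA_MOUNT does not exist yet"
fi
```

The script only reports; it does not stop on failure, so it can be run safely on a freshly provisioned VM.
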
## Initial Server Setup

1. Provision a VM (GCP, AWS, Azure, etc.)

2. Run the setup script:

   ```bash
   sudo bash server/setup.sh
   ```

   This creates:
   - System groups: `data-ops`, `dataread`, `data-private`
   - Deploy user with appropriate permissions
   - Directory structure under `/opt/data-analyst/`
   - Python virtual environment

3. Set up the webapp:

   ```bash
   sudo bash server/webapp-setup.sh
   ```

   This installs:
   - Gunicorn systemd service
   - Nginx reverse proxy with SSL
   - Log rotation

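
After both scripts have run, it can help to confirm that what they are described as creating is actually present. The sketch below is an illustration under assumptions: the `has_group` and `report` helpers are hypothetical, the unit name `webapp` is inferred from the `journalctl -u webapp` command in the Monitoring section, and `nginx` is assumed to be the distribution's default unit name.

```bash
#!/usr/bin/env bash
# Post-setup verification sketch. Group and path names come from this guide;
# the service unit names are assumptions (see the paragraph above).

APP_ROOT="${APP_ROOT:-/opt/data-analyst}"

# has_group NAME -> succeeds if the system group exists
has_group() {
    getent group "$1" >/dev/null 2>&1
}

# report LABEL COMMAND... -> runs the command quietly and prints ok or missing
report() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "ok       $label"
    else
        echo "missing  $label"
    fi
}

for group in data-ops dataread data-private; do
    report "group $group" has_group "$group"
done

report "directory $APP_ROOT" test -d "$APP_ROOT"
report "virtualenv $APP_ROOT/.venv" test -x "$APP_ROOT/.venv/bin/python"
report "service webapp" systemctl is-active --quiet webapp
report "service nginx" systemctl is-active --quiet nginx
```

Any `missing` line points at the step to re-run; the script itself changes nothing on the server.
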
## CI/CD Pipeline

1. Copy the example workflow:

   ```bash
   cp .github/workflows/deploy.yml.example .github/workflows/deploy.yml
   ```

2. Configure GitHub Secrets:
   - `SERVER_HOST`: Server IP address
   - `SERVER_USER`: Deploy username
   - `SERVER_SSH_KEY`: Deploy SSH private key
   - All environment variables from `.env`

3. Push to the `main` branch to trigger an automatic deployment.

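
The secrets in step 2 can be set from the command line instead of the web UI. The sketch below assumes the GitHub CLI (`gh`) is installed and authenticated for the repository, which this guide does not require; the `env_keys` helper is hypothetical, and the address, username, and key path in the usage comments are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: list the variable names in a dotenv file so each one can be mirrored
# as a GitHub secret. Assumes simple KEY=VALUE lines, optionally prefixed by "export".

# env_keys FILE -> prints one variable name per line, skipping comments and blank lines
env_keys() {
    sed -n -E 's/^[[:space:]]*(export[[:space:]]+)?([A-Za-z_][A-Za-z0-9_]*)=.*/\2/p' "$1"
}

# Usage, left commented out because it needs an authenticated GitHub CLI:
#   env_keys .env                                      # review the names first
#   gh secret set --env-file .env                      # upload every KEY=VALUE pair
#   gh secret set SERVER_HOST --body "203.0.113.10"    # placeholder address
#   gh secret set SERVER_USER --body "deploy"          # placeholder username
#   gh secret set SERVER_SSH_KEY < ~/.ssh/deploy_key   # hypothetical key path
```

Reviewing the names with `env_keys` before uploading makes it easier to spot a variable that should not leave the server.
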
## Directory Structure on Server

```
/opt/data-analyst/
├── repo/               # Git clone of this repository
├── .env                # Environment variables (secrets)
├── .venv/              # Python virtual environment
└── logs/               # Application logs

/data/
├── src_data/
│   ├── parquet/        # Converted data files
│   ├── metadata/       # Sync state, profiles
│   └── raw/            # Raw source data
├── docs/               # Documentation served to analysts
├── scripts/            # Scripts distributed to analysts
└── notifications/      # Notification system data
```

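
To reproduce the `/data` layout on a staging machine or in a local test directory, a small helper is enough. This is a sketch only: `make_data_tree` is a hypothetical function, it does not set the ownership or group permissions that the real setup may apply, and the production scripts may already create these directories.

```bash
#!/usr/bin/env bash
# Sketch: recreate the /data layout shown above under an arbitrary root.
# Ownership and modes are deliberately not touched.

# make_data_tree ROOT -> creates the directory tree, leaving existing content alone
make_data_tree() {
    root=$1
    for dir in \
        src_data/parquet \
        src_data/metadata \
        src_data/raw \
        docs \
        scripts \
        notifications
    do
        mkdir -p "$root/$dir" || return 1
    done
}

# Usage (commented out so that sourcing this file has no side effects):
#   make_data_tree "$HOME/data-staging"
```

Because `mkdir -p` is idempotent, calling the function again on an existing tree is harmless.
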
## Separate Config Repository

For production deployments, keep instance config in a separate private repository:

```
client-config-repo/
├── config/
│   ├── instance.yaml
│   └── data_description.md
├── .env.example
└── .github/workflows/deploy.yml
```

Set `CONFIG_DIR=/opt/data-analyst/client-config/config/` in the environment. This path assumes the config repository is checked out at `/opt/data-analyst/client-config/`.

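
One way to apply the `CONFIG_DIR` setting is to write it into the server's `.env` file so that repeated deployments do not append duplicate lines. The helper below is a hypothetical sketch, and the clone URL in the usage comment is a placeholder; it assumes the application reads its environment from `/opt/data-analyst/.env`, as the directory listing suggests.

```bash
#!/usr/bin/env bash
# Sketch: set or update one KEY=VALUE line in a dotenv file, idempotently.

# set_env_var FILE KEY VALUE
set_env_var() {
    file=$1 key=$2 value=$3
    touch "$file"
    tmp=$(mktemp) || return 1
    # Drop any existing definition of KEY, then append the new one
    grep -v "^${key}=" "$file" > "$tmp"
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file"    # overwrite in place so the file keeps its owner and mode
    rm -f "$tmp"
}

# Usage (commented out; the clone URL is a placeholder):
#   git clone git@github.com:your-org/client-config-repo.git /opt/data-analyst/client-config
#   set_env_var /opt/data-analyst/.env CONFIG_DIR /opt/data-analyst/client-config/config/
```

Rewriting the file through `cat` rather than `mv` keeps the existing permissions on a file that holds secrets.
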
## SSL Setup

Use certbot to obtain a Let's Encrypt certificate, replacing `data.yourcompany.com` with your own domain:

```bash
sudo apt install certbot python3-certbot-nginx
sudo certbot --nginx -d data.yourcompany.com
```

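
Once the certificate is issued, it is worth confirming that renewal will work and that the served certificate is not close to expiry. The sketch below is an assumption-laden illustration: `cert_valid_for` is a hypothetical helper built on the `openssl` command-line tool, and the 14-day window is an arbitrary example value.

```bash
#!/usr/bin/env bash
# Sketch: check that the certificate served by a host stays valid for N more days.

# cert_valid_for HOST DAYS -> succeeds if the served certificate does not expire within DAYS
cert_valid_for() {
    host=$1
    seconds=$(( $2 * 86400 ))
    openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null \
        | openssl x509 -noout -checkend "$seconds" >/dev/null 2>&1
}

# Usage (commented out; both commands need the live server):
#   sudo certbot renew --dry-run    # exercise the renewal path without issuing a certificate
#   cert_valid_for data.yourcompany.com 14 || echo "certificate expires within 14 days"
```

The function also fails when the host is unreachable, so a failure means "could not confirm validity" rather than strictly "about to expire".
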
## Monitoring

- Health check: `GET /health`
- Logs: `journalctl -u webapp -f`
- Disk usage: `df -h /data`

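
The health and disk checks above can be combined into one script for a cron job or an external monitor. This is a minimal sketch under assumptions: `BASE_URL`, `DISK_LIMIT_PCT`, and both helper functions are hypothetical names, and the 85% threshold is only an example.

```bash
#!/usr/bin/env bash
# Monitoring sketch: helpers for the health and disk checks listed above.

BASE_URL="${BASE_URL:-http://127.0.0.1}"   # assumption: reachable base URL of the portal
DISK_LIMIT_PCT="${DISK_LIMIT_PCT:-85}"     # assumption: alert threshold, tune as needed

# disk_pct PATH -> prints the used percentage of the filesystem holding PATH
disk_pct() {
    df -P "$1" | awk 'NR==2 {gsub("%", "", $5); print $5}'
}

# health_ok URL -> fails on HTTP error statuses (4xx/5xx), connection errors, or a 5 s timeout
health_ok() {
    curl -fsS --max-time 5 -o /dev/null "$1"
}

# Usage (commented out; needs the running server):
#   health_ok "$BASE_URL/health" || echo "health check failed"
#   [ "$(disk_pct /data)" -lt "$DISK_LIMIT_PCT" ] || echo "/data is above ${DISK_LIMIT_PCT}% full"
```

Both helpers only read state, so they are safe to run as an unprivileged user.
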