Open-source AI data analyst platform extracted from internal repo. Includes data sync engine, Keboola adapter, Flask web portal, server deployment scripts, and configuration templates.
Deployment Guide
Server Requirements
- Debian 12 / Ubuntu 22.04+
- 2+ vCPUs, 2+ GB RAM
- 10+ GB data disk
- Public IP with DNS
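Before running the setup scripts, you may want to confirm that a fresh VM meets these minimums. The sketch below is a hypothetical preflight helper, not part of the repository. It reads /etc/os-release and compares CPU count, RAM, and disk size (in decimal GB) against the list above. It does not check the public IP or the DNS record.

```python
import os
import shutil

# Minimums from the Server Requirements list (decimal GB).
MIN_CPUS = 2
MIN_RAM_BYTES = 2 * 10**9
MIN_DISK_BYTES = 10 * 10**9
# Lowest supported major version per distribution (Ubuntu 22.04+ -> 22).
MIN_OS_MAJOR = {"debian": 12, "ubuntu": 22}


def parse_os_release(text):
    """Parse /etc/os-release style KEY=value lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            info[key] = value.strip('"')
    return info


def os_major_version(info):
    """Return the leading number of VERSION_ID, or 0 if absent or non-numeric."""
    try:
        return int(info.get("VERSION_ID", "").split(".")[0])
    except ValueError:
        return 0


def check_requirements(info, cpus, ram_bytes, disk_bytes):
    """Return a list of problems; an empty list means the host qualifies."""
    problems = []
    distro = info.get("ID", "unknown")
    if distro not in MIN_OS_MAJOR or os_major_version(info) < MIN_OS_MAJOR[distro]:
        problems.append(f"unsupported OS: {distro} {info.get('VERSION_ID', '?')}")
    if cpus < MIN_CPUS:
        problems.append(f"only {cpus} vCPU(s)")
    if ram_bytes < MIN_RAM_BYTES:
        problems.append(f"only {ram_bytes / 10**9:.1f} GB RAM")
    if disk_bytes < MIN_DISK_BYTES:
        problems.append(f"only {disk_bytes / 10**9:.1f} GB disk")
    return problems


if __name__ == "__main__":
    try:
        with open("/etc/os-release") as fh:
            os_info = parse_os_release(fh.read())
    except FileNotFoundError:
        os_info = {}
    ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    # /data may not exist until setup.sh runs; fall back to the root filesystem.
    disk = shutil.disk_usage("/data" if os.path.isdir("/data") else "/").total
    for problem in check_requirements(os_info, os.cpu_count() or 0, ram, disk):
        print("FAIL:", problem)
```

Clouds often sell "2 GB" instances that report slightly less usable memory, so treat a marginal RAM failure as a warning rather than a blocker.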
Initial Server Setup
1. Provision a VM (GCP, AWS, Azure, etc.).
2. Run the setup script:
   sudo bash server/setup.sh
   This creates:
   - System groups: data-ops, dataread, data-private
   - Deploy user with appropriate permissions
   - Directory structure under /opt/data-analyst/
   - Python virtual environment
3. Set up the webapp:
   sudo bash server/webapp-setup.sh
   This installs:
   - Gunicorn systemd service
   - Nginx reverse proxy with SSL
   - Log rotation
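server/webapp-setup.sh configures the Gunicorn service, but this guide does not document its settings. As a minimal sketch, assuming Nginx proxies to a local port, a gunicorn.conf.py could look like the one below. The bind address, timeout, and log file names are assumptions. Only the logs/ directory comes from the server layout in this guide, and the worker formula is Gunicorn's documented rule of thumb.

```python
# gunicorn.conf.py: hypothetical values; align with server/webapp-setup.sh.
import multiprocessing

# Nginx terminates TLS and forwards requests to this local address (assumed port).
bind = "127.0.0.1:8000"

# Gunicorn's suggested starting point: (2 x CPU cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1

# Assumed generous timeout for slow analyst queries, in seconds.
timeout = 120

# Write logs into the application log directory.
accesslog = "/opt/data-analyst/logs/gunicorn-access.log"
errorlog = "/opt/data-analyst/logs/gunicorn-error.log"
```

Gunicorn loads this file when started with -c gunicorn.conf.py. This guide does not name the Flask application module to pass alongside it.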
CI/CD Pipeline
1. Copy the example workflow:
   cp .github/workflows/deploy.yml.example .github/workflows/deploy.yml
2. Configure GitHub Secrets:
   - SERVER_HOST: Server IP address
   - SERVER_USER: Deploy username
   - SERVER_SSH_KEY: Deploy SSH private key
   - All environment variables from .env
3. Pushing to the main branch triggers automatic deployment.
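Every variable in .env must also exist as a GitHub secret, so it is easy to miss one. The hypothetical checker below parses .env content and reports which required names are absent from a set of configured secret names, for example names copied from the repository's secrets settings page. It handles simple KEY=value lines only, not multi-line values.

```python
# Secrets the deploy workflow needs in addition to the .env variables.
DEPLOY_SECRETS = ("SERVER_HOST", "SERVER_USER", "SERVER_SSH_KEY")


def env_keys(text):
    """Return variable names from .env content, skipping blanks and comments."""
    keys = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        keys.append(line.split("=", 1)[0].strip())
    return keys


def missing_secrets(env_text, configured):
    """List required secret names that are not in the configured set."""
    required = list(DEPLOY_SECRETS) + env_keys(env_text)
    return [name for name in required if name not in configured]


if __name__ == "__main__":
    sample = "# example\nexport API_KEY=abc\n"
    print(missing_secrets(sample, {"SERVER_HOST"}))
```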
Directory Structure on Server
/opt/data-analyst/
├── repo/ # Git clone of this repository
├── .env # Environment variables (secrets)
├── .venv/ # Python virtual environment
└── logs/ # Application logs
/data/
├── src_data/
│ ├── parquet/ # Converted data files
│ ├── metadata/ # Sync state, profiles
│ └── raw/ # Raw source data
├── docs/ # Documentation served to analysts
├── scripts/ # Scripts distributed to analysts
└── notifications/ # Notification system data
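After setup.sh and the first deployment, you can spot-check this layout. The sketch below lists any expected directories, or the .env file, that are missing. The root parameter exists only so the function can be tried against a scratch directory. The sketch does not check ownership or the data-ops, dataread, and data-private group permissions, because this guide does not specify them.

```python
from pathlib import Path

# Paths from the two directory trees, relative to the filesystem root.
EXPECTED_DIRS = (
    "opt/data-analyst/repo",
    "opt/data-analyst/.venv",
    "opt/data-analyst/logs",
    "data/src_data/parquet",
    "data/src_data/metadata",
    "data/src_data/raw",
    "data/docs",
    "data/scripts",
    "data/notifications",
)
EXPECTED_FILES = ("opt/data-analyst/.env",)


def missing_paths(root=Path("/")):
    """Return expected paths that do not exist under root, in listing order."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    for path in missing_paths():
        print("missing: /" + path)
```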
Separate Config Repository
For production deployments, keep instance config in a separate private repository:
client-config-repo/
├── config/
│ ├── instance.yaml
│ └── data_description.md
├── .env.example
└── .github/workflows/deploy.yml
Set CONFIG_DIR=/opt/data-analyst/client-config/config/ in the environment.
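This guide does not show how the application reads CONFIG_DIR. As an assumption-laden sketch, a startup check could resolve the variable and fail fast if either file from the config repository layout is missing. Parsing instance.yaml itself would need a YAML library such as PyYAML, which is left out here.

```python
import os
from pathlib import Path

# Files the config repository layout places in CONFIG_DIR.
REQUIRED_CONFIG_FILES = ("instance.yaml", "data_description.md")


def resolve_config_dir(env=None):
    """Return CONFIG_DIR as a Path, raising if it is unset or incomplete."""
    env = os.environ if env is None else env
    raw = env.get("CONFIG_DIR")
    if not raw:
        raise RuntimeError("CONFIG_DIR is not set")
    config_dir = Path(raw)
    missing = [n for n in REQUIRED_CONFIG_FILES if not (config_dir / n).is_file()]
    if missing:
        raise FileNotFoundError(f"{config_dir} is missing: {', '.join(missing)}")
    return config_dir
```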
SSL Setup
Use certbot for Let's Encrypt SSL:
sudo apt install certbot python3-certbot-nginx
sudo certbot --nginx -d data.yourcompany.com
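Let's Encrypt certificates are valid for 90 days, so it is worth watching the expiry date as well as relying on certbot's renewal. The helpers below are a hypothetical sketch that uses only the Python standard library. fetch_not_after() reads the certificate the server presents, and cert_days_left() converts its notAfter field into the number of days remaining.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Open a verified TLS connection and return the certificate's notAfter."""
    # The default context verifies the chain, so an already-expired
    # certificate makes the handshake fail with ssl.SSLCertVerificationError.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def cert_days_left(not_after: str, now: Optional[float] = None) -> float:
    """Convert a notAfter string like 'Jan  5 09:34:43 2018 GMT' to days left."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400
```

For example, cert_days_left(fetch_not_after("data.yourcompany.com")) returns the days left on the live certificate. Alerting below a threshold such as 14 days is an assumed policy, not something this guide prescribes.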
Monitoring
- Health check: GET /health
- Logs: journalctl -u webapp -f
- Disk usage: df -h /data
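You can script the health and disk checks for cron or an external monitor. In this minimal sketch, the URL passed to health_ok() and any alert threshold are assumptions. Only the HTTP status is checked, because this guide does not document the /health response body. disk_used_percent() computes used / (used + available), which approximates the Use% column of df -h.

```python
import http.client
import shutil
import urllib.request


def health_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to url answers HTTP 200 within timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (error statuses), refused connections, and
        # timeouts are all OSError subclasses.
        return False


def disk_used_percent(path: str = "/data") -> float:
    """Approximate df's Use% for the filesystem that holds path."""
    usage = shutil.disk_usage(path)
    return usage.used / (usage.used + usage.free) * 100
```

A cron job might call health_ok("https://data.yourcompany.com/health") and alert when it returns False, or when disk_used_percent() exceeds a chosen limit such as 90.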