Merge feature/multi-customer-deployment: multi-customer deployment infra

- infra/modules/customer-instance/ — reusable Terraform module (tag infra-v1.0.0)
- infra/examples/minimal/ — OSS self-host quickstart
- scripts/bootstrap-gcp.sh — per-customer GCP setup
- scripts/fetch-env-from-secrets.sh — VM-side .env from Secret Manager
- docker-compose.prod.yml — bind data volume to host /data for persistent disks
- docs/superpowers/specs/2026-04-21-multi-customer-deployment-spec.md
- docs/superpowers/plans/2026-04-21-multi-customer-deployment.md
- docs/superpowers/plans/2026-04-21-deployment-log.md

commit 94b6a8eff2
15 changed files with 3136 additions and 308 deletions

5
.gitignore
vendored
@@ -141,3 +141,8 @@ docs/AGENT-REPORTS/
docs/ZS_PADAK_*
.github/workflows/ci.yml
/auth/
/tmp/

# GCP service account keys — never commit
*-key.json
/agnes-deploy-*.json

docker-compose.prod.yml
@@ -1,4 +1,7 @@
# Production override — uses pre-built GHCR image instead of local build.
# Production override — uses pre-built GHCR image instead of local build,
# and binds the `data` volume to /data on the host (so persistent-disk mounts
# at /data are used by all services).
#
# Usage: docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
# Override tag: AGNES_TAG=stable-2026.04.3 docker compose -f ... up -d
services:
@@ -16,3 +19,15 @@ services:
    image: ghcr.io/keboola/agnes-the-ai-analyst:${AGNES_TAG:-stable}
  session-collector:
    image: ghcr.io/keboola/agnes-the-ai-analyst:${AGNES_TAG:-stable}

# Override the `data` named volume to bind-mount /data from the host.
# This ensures a persistent disk mounted at /data (by Terraform startup
# script) is the actual backing store, not a Docker-managed volume on the
# boot disk.
volumes:
  data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /data

2137
docs/superpowers/plans/2026-04-21-multi-customer-deployment.md
Normal file
File diff suppressed because it is too large

442
docs/superpowers/specs/2026-04-21-multi-customer-deployment-spec.md
Normal file
@@ -0,0 +1,442 @@
# Multi-Customer Deployment — Design Spec

Date: 2026-04-21
Status: Proposal for implementation
Author: Zdeněk Šrotýř + Claude (sparring)

## 1. Goal

Introduce a *production-grade* Agnes deployment that:

1. Keeps the **upstream repo public** (no customer information in it).
2. Allows **N customers in parallel**, each in an isolated space.
3. Is **anonymized**: a customer can see neither the existence nor the identity of the others.
4. Has **auto-deploy with sensible gates**: a feature-branch push updates the dev VM within minutes; a merge to main reaches prod through a review gate.
5. Supports **branch-aware dev environments**: several developers in parallel, each on their own branch, without interference.
6. **Scales O(1) per customer**: adding GRPN next to Keboola only means cloning the template, not changing upstream.

## 2. Model — Pure Self-Deploy

### 2.1 Roles

| Party | What it does |
|---|---|
| **Keboola as upstream** | Maintains the app code, builds & pushes the Docker image to GHCR, maintains the TF module and the infra template |
| **Customer (incl. Keboola-as-customer)** | Owns its GCP project, its private infra repo and its CI/CD, manages its VMs, bears the costs |

Keboola as upstream **has no access to customer GCP projects**. Each customer is responsible for its own deployment.

Keboola's internal production Agnes instance is a **special case of a customer**: Keboola IT owns the `kids-ai-data-analysis` GCP project and runs its Agnes there exactly the way GRPN will in its own GCP.

### 2.2 Future extensions (out of scope for this wave)

- **AWS support**: the TF module is GCP-specific today. Once the first AWS customer arrives, we add a parallel module `modules/customer-instance-aws/`.
- **Managed service**: Keboola will offer "we deploy it for you". That means adding Keboola as an operator role with IAM delegation into the customer's GCP. The design in this spec is compatible; it only needs an extra layer of IAM bindings.

## 3. Repo architecture

### 3.1 Number and type of repositories

```
keboola/agnes-the-ai-analyst     PUBLIC    App + TF module + documentation
keboola/agnes-infra-template     PUBLIC    Skeleton for a private infra repo (template)
keboola/agnes-infra-keboola      PRIVATE   Keboola-as-customer deployment
{acme}/agnes-infra               PRIVATE   New customer: in their GitHub org, cloned from the template
```

Count: **2 upstream + N per-customer**. The upstream repos are stable; per-customer repos are created during onboarding.

### 3.2 Contents of `keboola/agnes-the-ai-analyst` (public)

```
agnes-the-ai-analyst/
├── app/ src/ connectors/ cli/      # product
├── Dockerfile docker-compose.yml
├── .github/workflows/
│   └── release.yml                 # build + push to GHCR; tags: :dev, :stable, :dev-branch-xyz
├── infra/
│   ├── modules/
│   │   └── customer-instance/      # versioned: tag infra-v1.0.0, infra-v1.1.0, ...
│   │       ├── main.tf
│   │       ├── variables.tf
│   │       └── outputs.tf
│   └── examples/
│       └── minimal/                # quickstart for OSS self-hosters
└── docs/
    ├── DEPLOYMENT.md               # for self-hosting (compose, no Terraform)
    ├── ONBOARDING.md               # for managed setups (path to TF + template)
    └── architecture.md
```

The **`customer-instance` TF module** is versioned separately with semver (`infra-v1.x`), independently of the app image (CalVer `YYYY.MM.N`).
### 3.3 Contents of `keboola/agnes-infra-template` (public template)

```
agnes-infra-template/
├── terraform/
│   ├── main.tf                     # module { source = "github.com/keboola/agnes-the-ai-analyst//infra/modules/customer-instance?ref=infra-v1.0.0" }
│   ├── variables.tf
│   ├── backend.tf                  # gcs by default, with a comment on switching to s3/remote
│   ├── terraform.tfvars.example
│   └── .gitignore                  # terraform.tfvars, *.tfstate
├── .github/workflows/
│   ├── plan.yml                    # PR → terraform plan
│   └── apply.yml                   # main → terraform apply
├── config/
│   └── instance.yaml.example
├── bootstrap.sh                    # one-time GCP setup: SA, API enablement, bucket, secrets
└── README.md                       # step-by-step onboarding
```

The customer (or Keboola during onboarding) runs `gh repo create <org>/agnes-infra --template keboola/agnes-infra-template --private` and gets a private repo with the structure already in place.

### 3.4 Contents of a per-customer private repo (e.g. `keboola/agnes-infra-keboola`)

Exactly the same structure as the template, just with concrete values in `terraform.tfvars`:

```hcl
# keboola/agnes-infra-keboola/terraform/terraform.tfvars
# (gitignored, or stored in Secret Manager; see §7)

gcp_project_id = "kids-ai-data-analysis"
region         = "europe-west1"
zone           = "europe-west1-b"

prod_instance = {
  name         = "agnes-prod"
  machine_type = "e2-small"
  image_tag    = "stable"   # floating | "stable-2026.04.N" (pinned)
  upgrade_mode = "auto"     # auto (watchtower) | pinned (Renovate)
  tls_mode     = "caddy"    # caddy | gcp-lb | cloudflare | none
  domain       = ""         # empty = IP only
}

dev_instances = [
  { name = "agnes-dev-default", image_tag = "dev" },
  # add more dev VMs per branch/developer
]

seed_admin_email = "zdenek.srotyr@keboola.com"

# Keboola-specific
data_source             = "keboola"
keboola_stack_url       = "https://connection.us-east4.gcp.keboola.com/"
keboola_token_secret_id = "keboola-storage-token" # reference into Secret Manager
```

## 4. Release model

### 4.1 Image tagging in GHCR

The public repo CI (`release.yml`) builds and pushes to `ghcr.io/keboola/agnes-the-ai-analyst` on every push:

| Trigger | Resulting tags |
|---|---|
| Push `main` | `:stable-YYYY.MM.N`, `:sha-xxxxxxx`; `:stable` moves once the smoke test passes (§4.3) |
| Push `feature/xyz` | `:dev`, `:dev-feature-xyz`, `:sha-xxxxxxx` |
| Push `release/1.2.x` | `:release-1.2.x`, `:release-1.2.x-YYYY.MM.N` |

`:dev` and `:stable` are **floating** tags: `:dev` moves with every feature-branch push, `:stable` with every merge to main that passes the smoke test. Branch tags such as `:dev-feature-xyz` also move with each push to their branch. Tags carrying a version or SHA (`:stable-YYYY.MM.N`, `:sha-xxxxxxx`) are **immutable**.
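
The tag table can be expressed as a small bash helper that a release workflow might call. This is a hypothetical sketch, not the contents of `release.yml`. The function names `branch_slug` and `tags_for_ref`, the slug rule (lowercase, every run of non-alphanumeric characters becomes `-`) and the 7-character short SHA are all assumptions. The caller supplies `YYYY.MM.N`. The floating `:stable` tag is deliberately not emitted, because it only moves once the smoke test passes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive GHCR image tags for a pushed branch.

# branch_slug "feature/carol/new-auth" -> "feature-carol-new-auth"
branch_slug() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' |
    sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# tags_for_ref <branch> <full-sha> <calver>
# Prints one tag per line, without the image name.
tags_for_ref() {
  local branch=$1 sha=$2 calver=$3
  local short=${sha:0:7}
  case "$branch" in
    main)
      # ":stable" itself is moved later, only after the smoke test passes.
      echo "stable-${calver}"
      echo "sha-${short}"
      ;;
    release/*)
      local line=${branch#release/}
      echo "release-${line}"
      echo "release-${line}-${calver}"
      ;;
    *)
      echo "dev"
      echo "dev-$(branch_slug "$branch")"
      echo "sha-${short}"
      ;;
  esac
}
```

For `feature/carol/new-auth` the slug is `feature-carol-new-auth`. That yields the `:dev-feature-carol-new-auth` tag used in the dev VM lifecycle example.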

### 4.2 Image visibility

`ghcr.io/keboola/agnes-the-ai-analyst` is a **public image**. Customer VMs pull it without credentials.

Rationale: the code is public, and the image must not contain anything the public code does not. Secrets go into `.env` on the VM, not into the image.

### 4.3 Smoke test

After a push to `main` and tagging `:stable-N`, CI runs a smoke test: `docker compose up`, then curl `/api/health`, auth and a query. PASS → the floating `:stable` moves. FAIL → the build gets a `:deprecated-N` label, `:stable` stays where it is, and a GitHub issue is opened with the logs.
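
The health step of that smoke test could look roughly like the following sketch. It assumes `/api/health` returns JSON whose top-level `status` is `"healthy"` when the app is fine. The alerting recommendation in §10.1 uses the same field. The function names and the default timeout are hypothetical, and the sketch covers only the health step, not auth or the query. The check is a plain `grep`, so a nested `"status": "healthy"` would also match. If `jq` is available on the runner, `jq -e '.status == "healthy"'` is stricter.

```bash
#!/usr/bin/env bash
# Hypothetical smoke-test health step for a freshly started stack.

# is_healthy <json> : succeed if the body reports "status": "healthy".
is_healthy() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# wait_for_health <base-url> [timeout-seconds] [interval-seconds]
wait_for_health() {
  local url="${1%/}/api/health" timeout=${2:-120} interval=${3:-5} body
  local deadline=$((SECONDS + timeout))
  while (( SECONDS < deadline )); do
    # -f fails on HTTP errors; --max-time bounds each attempt.
    if body=$(curl -fsS --max-time 5 "$url" 2>/dev/null) && is_healthy "$body"; then
      echo "healthy: $url"
      return 0
    fi
    sleep "$interval"
  done
  echo "not healthy after ${timeout}s: $url" >&2
  return 1
}
```

In CI this would run after `docker compose up -d`, for example `wait_for_health http://localhost:8000`. A non-zero exit fails the job, so `:stable` is not moved.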

### 4.4 CalVer + smoke test = continuous release

There is no manual release decision. Every merge to main is a release, provided the smoke test passes. The numbering `YYYY.MM.N` is year.month.sequence.
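
Picking the next `N` can be sketched as follows. The sketch assumes that `N` restarts at 1 each month and that the caller passes in the tags that already exist, for example from `git tag` or a registry listing. `next_calver` is a hypothetical helper, not part of the repo.

```bash
#!/usr/bin/env bash
# next_calver <YYYY.MM> [existing-tag...]
# Prints the next YYYY.MM.N, given tags such as "stable-2026.04.3" or "2026.04.3".
next_calver() {
  local ym=$1 max=0 tag n
  shift
  for tag in "$@"; do
    tag=${tag#stable-}                 # accept both tag forms
    [[ $tag == "$ym".* ]] || continue  # only tags from this month
    n=${tag#"$ym".}
    [[ $n =~ ^[0-9]+$ ]] || continue
    if (( 10#$n > max )); then         # 10# avoids octal parsing of "08"
      max=$((10#$n))
    fi
  done
  echo "${ym}.$((max + 1))"
}
```

For example, `next_calver "$(date -u +%Y.%m)" stable-2026.04.1 stable-2026.04.7` prints `2026.04.8` when run in April 2026.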

## 5. Branch-aware dev environments

### 5.1 Motivation

Several developers working in parallel need several dev environments without interference. A "floating `:dev`" is not enough, because the last push overwrites everyone else's.

### 5.2 Mechanism

Every feature-branch push produces a separate `:dev-{branch-slug}` tag in addition to the floating `:dev`.

In the private infra repo, the customer lists its dev VMs, each with its own image tag:

```hcl
dev_instances = [
  { name = "agnes-dev",         image_tag = "dev" },                         # floating (demo / reviewers)
  { name = "agnes-alice-feat1", image_tag = "dev-feature-alice-dashboard" }, # Alice has her own
  { name = "agnes-bob-pr142",   image_tag = "dev-pr-142" },                  # Bob pinned to a PR
]
```

### 5.3 Dev VM lifecycle

```
1. Someone opens a PR in the private infra repo:
   + { name = "agnes-carol", image_tag = "dev-feature-carol-new-auth" }
2. CI plan.yml comments on the PR: "will create VM agnes-carol (e2-small, europe-west1-b)"
3. Merge → apply.yml runs terraform apply
4. VM is up in ~2 min
5. Watchtower on the VM polls :dev-feature-carol-new-auth every 5 min
6. Every push to feature/carol/new-auth → new image → Watchtower pulls it → the VM runs the current version
7. When Carol finishes the feature (merge to main), she deletes the line from tfvars → terraform apply → VM destroyed
```

**No new SA, no new GitHub environment, no extra infra operation.** Just an edit of the list in tfvars.
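
The Watchtower part of this lifecycle can be sketched as a single `docker run`. How the module's `startup-script.sh.tpl` actually starts Watchtower is not shown in this diff. The container name, the `run` dry-run wrapper and the watched container name in the usage line are therefore assumptions. `--interval` (in seconds) and `--cleanup` (remove replaced images) are standard Watchtower flags. Without container names, Watchtower watches every container on the host and re-pulls whatever tag each one was started with.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: start Watchtower with a 5-minute poll interval.

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# start_watchtower [interval-seconds] [container-name...]
start_watchtower() {
  local interval=${1:-300}
  if (( $# > 0 )); then shift; fi
  run docker run -d --name watchtower --restart unless-stopped \
    -v /var/run/docker.sock:/var/run/docker.sock \
    containrrr/watchtower --interval "$interval" --cleanup "$@"
}
```

`DRY_RUN=1 start_watchtower 300 agnes-app` prints the command instead of running it. `agnes-app` is a placeholder container name.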

### 5.4 Ephemeral preview environments (future)

Later on, consider automating this: PR opened → GHA creates a per-PR VM; PR closed → destroy. For now, the explicit flow via tfvars is sufficient.

## 6. Prod upgrade model

### 6.1 Two modes (selectable per instance)

| Mode | How | For whom |
|---|---|---|
| **auto** | Watchtower on the VM polls `:stable` (floating) and pulls + restarts when a new digest appears | Default: speed, low-touch |
| **pinned** | `image_tag = "stable-2026.04.7"` in tfvars. Renovate polls GHCR and opens a bump PR. Ops approve → merge → apply | Regulated customers, audit trail |

### 6.2 Gate for auto mode

The only protection against a broken `:stable` is the **CI smoke test** that runs before the floating tag moves. If the test passes, prod upgrades automatically. Recommendation: the Keboola instance should also have **monitoring + an alert on a degraded `/api/health` status**, so that anything that slips past the smoke test does not go unnoticed for long.

### 6.3 Rollback

Rollback means changing `image_tag` to the previous version and running `docker compose up -d`. In short:

- **Auto mode:** quickly pin the instance to a specific versioned tag instead of `:stable`, so Watchtower stops following the floating tag; then investigate
- **Pinned mode:** revert the PR, apply
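
For auto mode, the override documented in `docker-compose.prod.yml` (`AGNES_TAG=stable-2026.04.3 docker compose -f ... up -d`) is enough to pin a VM. The sketch below wraps that override and refuses floating tags. Pinning to `:stable` or `:dev` would let Watchtower pull the broken image straight back. The function name, the default app directory `/opt/agnes` and the `DRY_RUN` switch are assumptions. A manual pin on the VM is not recorded in tfvars, so `image_tag` should be updated there as well once the incident is handled.

```bash
#!/usr/bin/env bash
# Hypothetical rollback helper for auto-mode instances.

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# rollback_to <versioned-tag> [app-dir]
rollback_to() {
  local tag=$1 dir=${2:-/opt/agnes}
  case "$tag" in
    ""|stable|dev|dev-*|release-*.x)
      echo "refusing floating tag '${tag}'; use a versioned tag such as stable-2026.04.3" >&2
      return 1 ;;
  esac
  # A shell variable overrides .env, so compose resolves ${AGNES_TAG:-stable} to the pin.
  run env AGNES_TAG="$tag" docker compose \
    -f "$dir/docker-compose.yml" -f "$dir/docker-compose.prod.yml" up -d
}
```

Watchtower keeps watching the pinned tag. Because a versioned tag never changes, Watchtower has nothing to pull until the instance is moved back to `:stable`.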

## 7. Security model

### 7.1 Authentication between components

| Who → where | How it authenticates |
|---|---|
| Public CI → GHCR push | `${{ secrets.GITHUB_TOKEN }}` (built-in) |
| VM → GHCR pull | Public image, no auth |
| Private CI → GCP | SA JSON key in the `GCP_SA_KEY` secret (Phase 1); WIF (follow-up phase) |
| CI on the customer's GCP → Secret Manager | The SA has `roles/secretmanager.admin` |
| App on the VM → Secret Manager | The VM has a dedicated SA with `roles/secretmanager.secretAccessor` |
| App on the VM → Keboola Storage | Token from Secret Manager |

### 7.2 Deploy SA: per-customer scope

The `agnes-deploy@<gcp-project>` SA gets **only** these roles:

```
roles/compute.instanceAdmin.v1   # create/update/delete VMs
roles/compute.securityAdmin      # firewall rules
roles/compute.networkAdmin       # static IP
roles/iam.serviceAccountUser     # attach the VM SA to instances
roles/secretmanager.admin        # create/rotate secrets
roles/storage.admin              # tfstate bucket
```

No `owner`, no `editor`. The blast radius of a leaked SA key is confined to this one project: nothing outside it is reachable. Within the project, it goes beyond rewriting VMs. `roles/secretmanager.admin` can also read secret payloads (e.g. the Keboola token). `roles/storage.admin` granted at project level covers every bucket, including the tfstate, which holds generated secrets such as the JWT key in plain text.
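
To check that the deploy SA holds only the six roles listed above, one could compare its actual project-level bindings with that list. This is a hypothetical audit sketch. `actual_deploy_roles` uses the common `gcloud projects get-iam-policy --flatten/--filter/--format` pattern. `diff_roles` (a name made up here) prints `+role` for unexpected grants and `-role` for missing ones. The audit only sees project-level bindings, not roles granted on individual resources.

```bash
#!/usr/bin/env bash
# Hypothetical audit: compare the deploy SA's project-level roles with §7.2.

EXPECTED_DEPLOY_ROLES=(
  roles/compute.instanceAdmin.v1
  roles/compute.securityAdmin
  roles/compute.networkAdmin
  roles/iam.serviceAccountUser
  roles/secretmanager.admin
  roles/storage.admin
)

# actual_deploy_roles <project-id> : print the roles bound to agnes-deploy.
actual_deploy_roles() {
  local project=$1
  gcloud projects get-iam-policy "$project" \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:agnes-deploy@${project}.iam.gserviceaccount.com" \
    --format="value(bindings.role)"
}

# diff_roles : read actual roles on stdin; print "+role" for unexpected grants
# and "-role" for missing ones. Returns 0 only when both sets match.
diff_roles() {
  local actual expected extra missing
  actual=$(LC_ALL=C sort -u)
  expected=$(printf '%s\n' "${EXPECTED_DEPLOY_ROLES[@]}" | LC_ALL=C sort -u)
  extra=$(LC_ALL=C comm -13 <(printf '%s\n' "$expected") <(printf '%s\n' "$actual"))
  missing=$(LC_ALL=C comm -23 <(printf '%s\n' "$expected") <(printf '%s\n' "$actual"))
  if [[ -n $extra ]]; then sed 's/^/+/' <<<"$extra"; fi
  if [[ -n $missing ]]; then sed 's/^/-/' <<<"$missing"; fi
  [[ -z $extra && -z $missing ]]
}
```

`actual_deploy_roles my-project | diff_roles` exits non-zero on any drift, which makes it usable as a scheduled CI check. `my-project` is a placeholder.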

### 7.3 GitHub environments

```yaml
environments:
  dev:
    # no protection
    secrets:
      GCP_SA_KEY: <same key>
  prod:
    protection_rules:
      required_reviewers: ["@keboola-ops-team"]
      wait_timer: 5m
      deployment_branches: main
    secrets:
      GCP_SA_KEY: <same key>
```

Both environments share the same SA key (one GCP project, one identity). The difference lies **only in the protection rules**: who may deploy where.
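
The YAML above describes intent. GitHub does not read environment settings from a file; they live in the repository settings or behind the REST API. The sketch below applies the same intent with the `gh` CLI, under two assumptions. First, it runs inside a clone of the private infra repo, so `{owner}/{repo}` resolves. Second, the ops team's numeric ID is known, for example from `gh api orgs/<org>/teams/<team> --jq .id`. The function names are hypothetical. In the API, `wait_timer` is in minutes and reviewers are referenced by type and numeric ID rather than by `@handle`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: apply the environment intent above with the gh CLI.

# prod_env_payload <team-id> : JSON body for PUT repos/{owner}/{repo}/environments/prod
prod_env_payload() {
  local team_id=$1
  cat <<EOF
{
  "wait_timer": 5,
  "reviewers": [{"type": "Team", "id": ${team_id}}],
  "deployment_branch_policy": {"protected_branches": false, "custom_branch_policies": true}
}
EOF
}

# configure_environments <sa-key-file> <ops-team-id>
configure_environments() {
  local key_file=$1 team_id=$2 env
  # dev: no protection rules, so a bare PUT just creates the environment.
  gh api --method PUT 'repos/{owner}/{repo}/environments/dev' --silent || return 1
  # prod: required reviewer team, 5-minute wait timer, custom branch policy.
  prod_env_payload "$team_id" |
    gh api --method PUT 'repos/{owner}/{repo}/environments/prod' --input - --silent || return 1
  # Only the main branch may deploy to prod.
  gh api --method POST 'repos/{owner}/{repo}/environments/prod/deployment-branch-policies' \
    -f name=main --silent || return 1
  # Same SA key in both environments; gh reads the secret value from stdin.
  for env in dev prod; do
    gh secret set GCP_SA_KEY --env "$env" < "$key_file" || return 1
  done
}
```

The environments are created before the secrets are set, because environment secrets can only be stored in an environment that already exists.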

### 7.4 VM hardening

- **OS Login** instead of per-user SSH keys (follow-up)
- **Dedicated VM SA** with minimal permissions (Secret Manager read access only, nothing else)
- **Ephemeral disk strategy**: boot disk = product (stateless), `/data` = persistent disk (stateful, snapshots)
- **No token in the startup-script metadata**: all secrets are fetched from Secret Manager only at boot
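
Here is a minimal sketch of the boot-time fetch, assuming the VM runs with its dedicated SA and has `gcloud` installed. The commit adds `scripts/fetch-env-from-secrets.sh`, but that script's contents are not part of this diff. `fetch_secret` and `write_env_file` below are therefore hypothetical and only illustrate the idea. Each `VAR=secret-id` pair becomes one `.env` line. The file is assembled at a temporary path with mode 600 and moved into place only after every secret has been read. Secret values containing newlines are not handled.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of building the VM-side .env from Secret Manager.

# fetch_secret <secret-id> : print the latest version's payload.
# On the VM this uses the attached SA (roles/secretmanager.secretAccessor).
fetch_secret() {
  gcloud secrets versions access latest --secret="$1"
}

# write_env_file <path> VAR=secret-id [VAR=secret-id ...]
# All or nothing: the file is built next to the target and moved into place
# only if every secret could be read.
write_env_file() {
  local path=$1 tmp pair var id value
  shift
  tmp=$(mktemp "${path}.XXXXXX") || return 1
  chmod 600 "$tmp"
  for pair in "$@"; do
    var=${pair%%=*}
    id=${pair#*=}
    if ! value=$(fetch_secret "$id"); then
      echo "cannot read secret '$id'" >&2
      rm -f "$tmp"
      return 1
    fi
    printf '%s=%s\n' "$var" "$value" >> "$tmp"
  done
  mv -f "$tmp" "$path"
}
```

A call could look like `write_env_file /opt/agnes/.env JWT_SECRET_KEY=agnes-acme-jwt-secret KEBOOLA_STORAGE_TOKEN=keboola-storage-token`. The secret IDs follow the module's `agnes-${customer_name}-jwt-secret` pattern and the tfvars `keboola_token_secret_id`; `acme` and the path are placeholders.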

### 7.5 Secret rotation

| Secret | Where it lives | How it is rotated |
|---|---|---|
| Keboola Storage token | Secret Manager in the customer's GCP | Keboola UI → new version in SM → app restart |
| JWT_SECRET_KEY | Secret Manager, generated by TF | `terraform apply -replace='module.<name>.random_password.jwt'` (`module.agnes` in the minimal example); the dependent secret version is recreated with the new value, then restart the app. Existing login tokens become invalid. Replacing only `google_secret_manager_secret_version.jwt` would re-store the same value |
| SA JSON key | GitHub secret | Generate a new key, paste it into the GH secret, delete the old key in GCP |
| User passwords | Argon2 hash in DuckDB `users` | User-facing flow (reset endpoint, admin CLI) |
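
The first and third rows can be scripted with standard `gcloud` and `gh` commands. The sketch uses hypothetical function names and a `DRY_RUN` switch. The old key ID comes from `gcloud iam service-accounts keys list --iam-account=<sa> --managed-by=user`. Order matters here: the new key is stored in both GitHub environments before the old one is deleted, so CI never runs without a valid credential. The downloaded key file should be deleted afterwards.

```bash
#!/usr/bin/env bash
# Hypothetical rotation helpers for the Keboola token and the deploy SA key.

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# rotate_keboola_token [secret-id]   (new token on stdin; restart the app afterwards)
rotate_keboola_token() {
  local secret=${1:-keboola-storage-token}
  # --data-file=- reads the value from stdin, keeping it out of the process list.
  run gcloud secrets versions add "$secret" --data-file=-
}

# rotate_deploy_key <project-id> <old-key-id> <new-key-file>
rotate_deploy_key() {
  local project=$1 old_key_id=$2 key_file=$3 env
  local sa="agnes-deploy@${project}.iam.gserviceaccount.com"
  run gcloud iam service-accounts keys create "$key_file" --iam-account="$sa" || return 1
  for env in dev prod; do
    # gh reads the secret value from stdin when no --body is given.
    run gh secret set GCP_SA_KEY --env "$env" < "$key_file" || return 1
  done
  run gcloud iam service-accounts keys delete "$old_key_id" --iam-account="$sa" --quiet
}
```

`printf '%s' "$NEW_TOKEN" | rotate_keboola_token` adds a new secret version. `$NEW_TOKEN` is a placeholder for the token generated in the Keboola UI.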

## 8. Onboarding a new customer

### 8.1 Steps (target time: < 1 h)

```
1. The customer (or Keboola ops on their behalf) creates a GCP project + billing
2. Someone with the owner role in the project runs bootstrap.sh:
   - Enables APIs (compute, iam, secretmanager, storage, iamcredentials)
   - Creates the agnes-deploy SA with its roles
   - Generates an SA key (hands it to the owner)
   - Creates gs://agnes-{project}-tfstate
3. The customer (or Keboola ops) clones the template:
   gh repo create {org}/agnes-infra --template keboola/agnes-infra-template --private
4. In the new repo:
   - Set the GH secret GCP_SA_KEY (pasted from step 2)
   - Edit terraform.tfvars with their values
   - Create the initial commit + push
5. Set up the Secret Manager secrets (Keboola token etc.)
6. First PR with tfvars → plan → merge → apply
7. DNS: the customer later adds an A record pointing at the IP (or stays on the bare IP)
8. Admin user: bootstrap endpoint POST /auth/bootstrap or admin CLI
9. Smoke test: login, sync, query
```
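
Step 2 could be scripted roughly as follows. This sketch rests on stated assumptions and is not the `scripts/bootstrap-gcp.sh` added by this commit, whose contents are not shown in this diff. The function name, display name, the `europe-west1` bucket location and the `DRY_RUN` switch are assumptions. The roles are the six from §7.2. Bucket versioning is an addition commonly recommended for Terraform state, not something the spec asks for.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of onboarding step 2.

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Deploy SA roles from §7.2.
DEPLOY_ROLES=(
  roles/compute.instanceAdmin.v1
  roles/compute.securityAdmin
  roles/compute.networkAdmin
  roles/iam.serviceAccountUser
  roles/secretmanager.admin
  roles/storage.admin
)

# bootstrap_project <project-id> <key-output-file> [bucket-location]
bootstrap_project() {
  local project=$1 key_file=$2 location=${3:-europe-west1} role
  local sa="agnes-deploy@${project}.iam.gserviceaccount.com"
  local bucket="gs://agnes-${project}-tfstate"

  # 1. APIs the deployment needs.
  run gcloud services enable compute.googleapis.com iam.googleapis.com \
    secretmanager.googleapis.com storage.googleapis.com \
    iamcredentials.googleapis.com --project="$project" || return 1

  # 2. Deploy SA and its project-level roles.
  run gcloud iam service-accounts create agnes-deploy \
    --display-name="Agnes deploy" --project="$project" || return 1
  for role in "${DEPLOY_ROLES[@]}"; do
    run gcloud projects add-iam-policy-binding "$project" \
      --member="serviceAccount:${sa}" --role="$role" \
      --condition=None --quiet || return 1
  done

  # 3. JSON key for the GCP_SA_KEY GitHub secret; treat the file as a credential.
  run gcloud iam service-accounts keys create "$key_file" \
    --iam-account="$sa" --project="$project" || return 1

  # 4. Terraform state bucket; versioning keeps earlier state files recoverable.
  run gcloud storage buckets create "$bucket" --project="$project" \
    --location="$location" --uniform-bucket-level-access || return 1
  run gcloud storage buckets update "$bucket" --versioning
}
```

A freshly created SA can take a short while to become visible to IAM, so the first `add-iam-policy-binding` may need a retry. Key creation also fails if the organization enforces the `iam.disableServiceAccountKeyCreation` policy.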

### 8.2 Who sees what

| Role | Sees |
|---|---|
| Anyone on the internet | Public repo `agnes-the-ai-analyst`, its issues and PRs, the image on GHCR, the public template repo |
| Keboola ops team | The above + the private `agnes-infra-keboola` repo |
| Customer (acme) | The public items above + its own `agnes-infra` repo in its own org |
| Nobody | Another customer's infra repo or deployment (each customer sees only its own) |

## 9. Change flow

### 9.1 App code change (most common)

```
1. Developer: pushes a feature branch in the public repo
2. Public CI: builds :dev-feature-xyz (and the floating :dev)
3. Watchtower on every VM with image_tag = "dev": pulls within 5 min
   Watchtower on a VM with image_tag = "dev-feature-xyz": pulls as well
4. Dev review
5. Merge to main
6. Public CI: builds :stable-YYYY.MM.N
7. Smoke test CI: PASS → the floating :stable moves
8. Prod VMs:
   - auto mode: Watchtower pulls within 5 min
   - pinned mode: Renovate opens a PR in the private repo
```

### 9.2 Infra change (VM size, dev VM list, new disk)

```
1. Ops opens a PR in the private infra repo
2. CI plan.yml: terraform plan → comment on the PR
3. Review + merge
4. CI apply.yml:
   - for dev changes: environment "dev" → apply without a gate
   - for prod changes: environment "prod" → required reviewer → apply
5. After apply: smoke test via curl /api/health
```

### 9.3 TF module change

```
1. Maintainer opens a PR in the public repo changing infra/modules/customer-instance/
2. CI validates the module against examples/
3. Merge → automatic git tag infra-v1.1.0
4. Renovate in each private infra repo:
   → opens a PR "bump source ref to infra-v1.1.0"
5. Each customer approves separately → terraform plan → apply
```

## 10. Operational aspects

### 10.1 Monitoring and alerting (recommended, not in the first wave)

- Per-customer Cloud Monitoring dashboard
- Alert on `/api/health` `status != "healthy"` for more than 5 min
- Alert on VM CPU > 80 % for more than 30 min
- Log-based metrics: sync failures, auth failures, HTTP 5xx rate
- Slack/email integration via an alerting policy

### 10.2 Backup

- Daily snapshots of the `/data` persistent disk, 30-day retention (TF `google_compute_resource_policy`)
- `system.duckdb` holds users/permissions: a snapshot copy is taken on schema migration (already exists as `*.pre-migrate`)
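
The spec plans this as a Terraform `google_compute_resource_policy`. For a one-off setup or a quick check, the `gcloud` equivalent looks roughly like the sketch below. The policy name, the 03:00 UTC start time and the `DRY_RUN` switch are assumptions. The disk names follow the module's `${name}-data` pattern. A policy created this way is unknown to Terraform state and would need to be imported if the module later manages it.

```bash
#!/usr/bin/env bash
# Hypothetical gcloud equivalent of a daily snapshot schedule for /data disks.

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# enable_daily_snapshots <project> <region> <zone> <disk>...
enable_daily_snapshots() {
  local project=$1 region=$2 zone=$3 disk
  shift 3 || return 1
  local policy=agnes-data-daily   # hypothetical policy name
  # Daily at 03:00 UTC, keep 30 days; keep snapshots even if the disk is deleted.
  run gcloud compute resource-policies create snapshot-schedule "$policy" \
    --project="$project" --region="$region" \
    --daily-schedule --start-time=03:00 --max-retention-days=30 \
    --on-source-disk-delete=keep-auto-snapshots || return 1
  for disk in "$@"; do
    run gcloud compute disks add-resource-policies "$disk" \
      --resource-policies="$policy" --zone="$zone" --project="$project" || return 1
  done
}
```

For the example tfvars, the call would be `enable_daily_snapshots kids-ai-data-analysis europe-west1 europe-west1-b agnes-prod-data agnes-dev-default-data`.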
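
### 10.3 Disaster recovery

- Recreating a VM from scratch = `terraform apply` (~5 min) + restoring `/data` from a snapshot (~5 min)
- Total loss of a customer (the GCP project is destroyed): recreate from snapshot + tfstate. This only works if copies of both are kept outside the project, because the snapshots and the `gs://agnes-{project}-tfstate` bucket live in that same project.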
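
Restoring `/data` onto an existing VM could look like the sketch below. It assumes the module's attached-disk device name `data` and a snapshot produced by the daily schedule. The function name, the `-restored` disk name and `DRY_RUN` are hypothetical. The VM is stopped first so the old disk is not detached while mounted. Afterwards, Terraform state no longer matches the attached disk, so the next `terraform plan` should be reviewed (or the state reconciled) before applying.

```bash
#!/usr/bin/env bash
# Hypothetical restore of a VM's /data disk from a snapshot.

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# restore_data_disk <project> <zone> <vm> <snapshot> [new-disk-name]
restore_data_disk() {
  local project=$1 zone=$2 vm=$3 snapshot=$4
  local disk=${5:-${vm}-data-restored}
  local common=(--zone="$zone" --project="$project")
  run gcloud compute instances stop "$vm" "${common[@]}" || return 1
  run gcloud compute instances detach-disk "$vm" --device-name=data "${common[@]}" || return 1
  run gcloud compute disks create "$disk" --source-snapshot="$snapshot" --type=pd-ssd "${common[@]}" || return 1
  run gcloud compute instances attach-disk "$vm" --disk="$disk" --device-name=data "${common[@]}" || return 1
  run gcloud compute instances start "$vm" "${common[@]}"
}
```

The original disk is only detached, not deleted, so it remains available until the restored data has been verified.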

### 10.4 Cost per customer (rough)

| Item | $/month |
|---|---|
| Prod VM e2-small + 30 GB SSD | ~$15 |
| Dev VM e2-small + 30 GB SSD | ~$15 |
| Persistent disk (50 GB) | ~$2 |
| Static IP (×2: prod, dev) | ~$5 |
| Snapshots (daily, 30-day retention) | ~$2 |
| Secret Manager | ~$0 (within the free tier) |
| **Base total** | **~$40/month** |

Each additional dev VM brings its own VM, data disk and static IP, so the cost grows linearly with the number of dev VMs.

## 11. Principles / Non-goals

- ✅ **The public upstream stays public.** Nothing that identifies a customer goes there.
- ✅ **The customer has full control over its deployment.** That includes the decision whether to upgrade.
- ✅ **No central Keboola ops infra.** No shared GCP project, no shared state.
- ❌ **Not multi-tenant** within one deployment. One `docker compose up` = one customer.
- ❌ **Keboola is not a SaaS host** (at least not now). If a customer wants a managed setup, it is a manually provided service, not a product.
- ❌ **No cross-customer routing.** No shared load balancer, no shared DNS.

## 12. Decisions and questions

All design questions raised during brainstorming have been resolved. They are listed here for traceability:

| Question | Decision |
|---|---|
| Managed vs self-deploy | A) Pure self-deploy (may change in Phase 2+ if needed) |
| Central ops repo | No: 1 public + 1 template + N per-customer |
| TF state location | gs:// in the customer's GCP (default); the template allows switching to S3/TFC |
| Template repo name | `keboola/agnes-infra-template` |
| CI auth | SA JSON key in a GH secret (Phase 1); WIF (follow-up) |
| Image visibility | Public on GHCR |
| Prod upgrade mode | Per-instance choice of auto/pinned, default auto |
| TLS | Caddy by default, gcp-lb/cloudflare as alternatives |
| DNS | Handled by the customer; default is IP only |
| GCP project for Keboola | `kids-ai-data-analysis` stays |
| Dev VM model | `dev_instances` list in tfvars, image_tag per entry |
| `ZdenekSrotyr/tmp_oss` | Delete after Phase 1 |

## 13. Glossary

| Abbreviation | Meaning |
|---|---|
| **GHCR** | GitHub Container Registry (ghcr.io) |
| **WIF** | Workload Identity Federation: GCP mechanism for authenticating CI without a static key |
| **SA** | Service Account (GCP) |
| **TF** | Terraform |
| **OIDC** | OpenID Connect: an auth protocol; GitHub issues OIDC tokens to GHA |
| **CalVer** | Calendar Versioning: YYYY.MM.N |
| **PD** | Persistent Disk (GCP) |

## 14. Follow-up iterations

Out of scope for this first wave, but planned:

- **WIF instead of SA JSON key** (security)
- **OS Login** (removal of personal SSH keys)
- **Monitoring + alerting** (Cloud Monitoring, Slack integration)
- **Automatic snapshots** + restore procedure
- **Ephemeral PR preview environments**
- **AWS support** (parallel TF module)
- **Plugin API** for proprietary customer extensions (see issue #8)
- **Managed service variant** (Keboola hosts on the customer's behalf)

## 15. References

- Previous spec: `docs/superpowers/specs/2026-04-09-multi-instance-deployment-design.md` (CalVer release model)
- Issue: keboola/agnes-the-ai-analyst#8 — plugin API for private customer extensions

54
infra/examples/minimal/main.tf
Normal file
@@ -0,0 +1,54 @@
# Minimal example: single-VM Agnes deploy.
# For an OSS self-hoster who wants a prod VM only: no dev VMs, no TLS.
terraform {
  required_version = ">= 1.5"
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

provider "google" {
  project = var.gcp_project_id
  region  = "europe-west1"
}

variable "gcp_project_id" {
  description = "GCP project ID (must have billing enabled)"
  type        = string
}

variable "admin_email" {
  description = "Email for first admin user"
  type        = string
}

module "agnes" {
  source = "../../modules/customer-instance"

  gcp_project_id   = var.gcp_project_id
  customer_name    = "self-hosted"
  seed_admin_email = var.admin_email

  prod_instance = {
    name         = "agnes"
    machine_type = "e2-small"
    data_disk_gb = 30
    image_tag    = "stable"
    upgrade_mode = "auto"
    tls_mode     = "none"
    domain       = ""
  }

  dev_instances = []

  # Customize below for your setup
  data_source = "keboola"
}

output "agnes_ip" {
  description = "SSH in via: ssh <user>@<ip>; UI at http://<ip>:8000"
  value       = module.agnes.prod_ip
}

170
infra/main.tf
@@ -1,170 +0,0 @@
terraform {
  required_version = ">= 1.5"

  backend "gcs" {
    bucket = "agnes-terraform-state"
    prefix = "instances"
  }

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0"
    }
  }
}

provider "google" {
  project = var.project_id
  region  = var.region
  zone    = var.zone
}

# --- Auto-generated secrets ---

resource "random_password" "jwt_secret" {
  length  = 48
  special = false
}

# --- Network ---

resource "google_compute_firewall" "data_analyst" {
  name    = "${var.instance_name}-allow-web"
  network = "default"

  allow {
    protocol = "tcp"
    ports    = ["22", "80", "443", "8000"]
  }

  source_ranges = ["0.0.0.0/0"]
  target_tags   = [var.instance_name]
}

# --- Static IP ---

resource "google_compute_address" "data_analyst" {
  name   = "${var.instance_name}-ip"
  region = var.region
}

# --- Startup script ---

locals {
  startup_script = <<-SCRIPT
    #!/bin/bash
    set -euo pipefail
    exec > /var/log/startup.log 2>&1

    echo "=== Installing Docker ==="
    if ! command -v docker &> /dev/null; then
      curl -fsSL https://get.docker.com | sh
      usermod -aG docker ${var.ssh_user}
    fi

    # Install docker compose plugin
    if ! docker compose version &> /dev/null; then
      apt-get update && apt-get install -y docker-compose-plugin
    fi

    echo "=== Cloning repository ==="
    APP_DIR="/opt/data-analyst"
    if [ ! -d "$APP_DIR" ]; then
      git clone https://github.com/keboola/agnes-the-ai-analyst.git "$APP_DIR"
      cd "$APP_DIR"
      git checkout main
    else
      cd "$APP_DIR"
      git pull origin main || true
    fi

    echo "=== Creating .env ==="
    cat > "$APP_DIR/.env" << 'ENVEOF'
    JWT_SECRET_KEY=${random_password.jwt_secret.result}
    DATA_DIR=/data
    DATA_SOURCE=${var.keboola_token != "" ? "keboola" : "local"}
    KEBOOLA_STORAGE_TOKEN=${var.keboola_token}
    KEBOOLA_STACK_URL=${var.keboola_stack_url}
    KEBOOLA_PROJECT_ID=${var.keboola_project_id}
    SEED_ADMIN_EMAIL=${var.admin_email}
    LOG_LEVEL=info
    ENVEOF
    # Strip leading whitespace from heredoc
    sed -i 's/^ //' "$APP_DIR/.env"
    chmod 600 "$APP_DIR/.env"

    echo "=== Creating instance.yaml ==="
    mkdir -p "$APP_DIR/config"
    cat > "$APP_DIR/config/instance.yaml" << YAMLEOF
    instance:
      name: "${var.instance_name}"
      subtitle: "Data Analytics Platform"
    server:
      host: "${google_compute_address.data_analyst.address}"
      hostname: "${var.domain != "" ? var.domain : google_compute_address.data_analyst.address}"
      port: 8000
    auth:
      allowed_domain: ""
    data_source:
      type: "${var.keboola_token != "" ? "keboola" : "local"}"
    YAMLEOF

    echo "=== Creating data directory ==="
    mkdir -p /data/state /data/analytics /data/extracts
    chown -R 1000:1000 /data

    echo "=== Starting Docker Compose ==="
    cd "$APP_DIR"
    docker compose pull 2>/dev/null || true
    docker compose build
    docker compose up -d

    echo "=== Startup complete ==="
    docker compose ps
  SCRIPT
}

# --- VM Instance ---

resource "google_compute_instance" "data_analyst" {
  name         = var.instance_name
  machine_type = var.machine_type
  zone         = var.zone

  tags = [var.instance_name]

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2404-lts-amd64"
      size  = var.disk_size_gb
      type  = "pd-ssd"
    }
  }

  network_interface {
    network = "default"
    access_config {
      nat_ip = google_compute_address.data_analyst.address
    }
  }

  metadata = {
    ssh-keys = "${var.ssh_user}:${file(pathexpand(var.ssh_public_key_path))}"
  }

  metadata_startup_script = local.startup_script

  service_account {
    scopes = ["cloud-platform"]
  }

  labels = {
    app     = "data-analyst"
    managed = "terraform"
  }
}

163
infra/modules/customer-instance/main.tf
Normal file
@@ -0,0 +1,163 @@
terraform {
  required_version = ">= 1.5"
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0"
    }
  }
}

locals {
  # Normalize all instances into a single list so for_each is uniform across prod + dev.
  all_instances = concat(
    [merge(var.prod_instance, { role = "prod" })],
    [for d in var.dev_instances : merge(d, {
      role         = "dev"
      disk_size_gb = 30
      data_disk_gb = 20
      upgrade_mode = "auto"
      tls_mode     = "caddy"
      domain       = ""
    })]
  )
}

# --- Secrets ---

resource "google_secret_manager_secret" "jwt" {
  secret_id = "agnes-${var.customer_name}-jwt-secret"
  project   = var.gcp_project_id
  replication {
    auto {}
  }
}

resource "random_password" "jwt" {
  length  = 48
  special = false
}

resource "google_secret_manager_secret_version" "jwt" {
  secret      = google_secret_manager_secret.jwt.id
  secret_data = random_password.jwt.result
}

# --- VM service account (dedicated, Secret Manager read access only) ---

resource "google_service_account" "vm" {
  account_id   = "agnes-${var.customer_name}-vm"
  display_name = "Agnes VM runtime SA (${var.customer_name})"
  project      = var.gcp_project_id
}

resource "google_project_iam_member" "vm_secrets" {
  project = var.gcp_project_id
  role    = "roles/secretmanager.secretAccessor"
  member  = "serviceAccount:${google_service_account.vm.email}"
}

# --- Network ---

resource "google_compute_firewall" "web" {
  name    = "agnes-${var.customer_name}-allow-web"
  project = var.gcp_project_id
  network = "default"

  allow {
    protocol = "tcp"
    ports    = ["22", "80", "443", "8000"]
  }

  source_ranges = ["0.0.0.0/0"]
  target_tags   = ["agnes-${var.customer_name}"]
}

# --- Persistent data disks + VMs (prod + dev) ---

resource "google_compute_disk" "data" {
  for_each = { for inst in local.all_instances : inst.name => inst }

  name    = "${each.value.name}-data"
  project = var.gcp_project_id
  zone    = var.zone
  size    = each.value.data_disk_gb
  type    = "pd-ssd"
}

resource "google_compute_address" "ip" {
  for_each = { for inst in local.all_instances : inst.name => inst }

  name    = "${each.value.name}-ip"
  project = var.gcp_project_id
  region  = var.region
}

resource "google_compute_instance" "vm" {
  for_each = { for inst in local.all_instances : inst.name => inst }

  name         = each.value.name
  project      = var.gcp_project_id
  machine_type = each.value.machine_type
  zone         = var.zone
  tags         = ["agnes-${var.customer_name}"]

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2404-lts-amd64"
      size  = each.value.disk_size_gb
      type  = "pd-ssd"
    }
  }

  attached_disk {
    source      = google_compute_disk.data[each.key].self_link
    device_name = "data"
  }

  network_interface {
    network = "default"
    access_config {
      nat_ip = google_compute_address.ip[each.key].address
    }
  }

  metadata = {
    enable-oslogin = "TRUE"
  }

  metadata_startup_script = templatefile("${path.module}/startup-script.sh.tpl", {
    customer_name     = var.customer_name
    image_repo        = var.image_repo
    image_tag         = each.value.image_tag
    upgrade_mode      = each.value.upgrade_mode
    tls_mode          = each.value.tls_mode
    domain            = each.value.domain
    data_source       = var.data_source
    keboola_stack_url = var.keboola_stack_url
    seed_admin_email  = var.seed_admin_email
    role              = each.value.role
  })

  service_account {
    email  = google_service_account.vm.email
    scopes = ["cloud-platform"]
  }

  labels = {
    app      = "agnes"
    customer = var.customer_name
    role     = each.value.role
    managed  = "terraform"
  }

  # Changing metadata_startup_script forces VM recreation, so template edits are ignored to protect running VMs.
  # The script runs only at boot; to roll out a changed script, recreate the VM explicitly (terraform apply -replace).
  lifecycle {
    ignore_changes = [metadata_startup_script]
  }
}
19
infra/modules/customer-instance/outputs.tf
Normal file
@ -0,0 +1,19 @@
output "instance_ips" {
  description = "Map of { name => external IP }"
  value       = { for k, v in google_compute_address.ip : k => v.address }
}

output "prod_ip" {
  description = "External IP of the prod instance"
  value       = google_compute_address.ip[var.prod_instance.name].address
}

output "vm_service_account" {
  description = "Email of the VM SA (for additional IAM bindings, e.g. BigQuery)"
  value       = google_service_account.vm.email
}

output "jwt_secret_name" {
  description = "Full name of the JWT secret in Secret Manager"
  value       = google_secret_manager_secret.jwt.name
}
100
infra/modules/customer-instance/startup-script.sh.tpl
Normal file
@ -0,0 +1,100 @@
#!/bin/bash
# Agnes VM startup script, templated by Terraform.
# Idempotent: runs on every boot.
set -euo pipefail
exec > /var/log/agnes-startup.log 2>&1

CUSTOMER_NAME="${customer_name}"
IMAGE_REPO="${image_repo}"
IMAGE_TAG="${image_tag}"
UPGRADE_MODE="${upgrade_mode}"
TLS_MODE="${tls_mode}"
DOMAIN="${domain}"
DATA_SOURCE="${data_source}"
KEBOOLA_STACK_URL="${keboola_stack_url}"
SEED_ADMIN_EMAIL="${seed_admin_email}"
ROLE="${role}"

echo "=== [Agnes $CUSTOMER_NAME $ROLE] Startup at $(date) ==="

# --- 1. Docker (install if missing) ---
if ! command -v docker &>/dev/null; then
  curl -fsSL https://get.docker.com | sh
fi
if ! docker compose version &>/dev/null; then
  apt-get update && apt-get install -y docker-compose-plugin
fi

# --- 2. Persistent data disk mount ---
DATA_DEV="/dev/disk/by-id/google-data"
DATA_MNT="/data"
if [ -b "$DATA_DEV" ]; then
  if ! blkid "$DATA_DEV" | grep -q ext4; then
    mkfs.ext4 -F "$DATA_DEV"
  fi
  mkdir -p "$DATA_MNT"
  mountpoint -q "$DATA_MNT" || mount -o discard,defaults "$DATA_DEV" "$DATA_MNT"
  grep -qF "$DATA_DEV" /etc/fstab || echo "$DATA_DEV $DATA_MNT ext4 discard,defaults,nofail 0 2" >> /etc/fstab
  mkdir -p "$DATA_MNT/state" "$DATA_MNT/analytics" "$DATA_MNT/extracts"
fi

# --- 3. App directory + docker-compose files from public repo ---
APP_DIR="/opt/agnes"
mkdir -p "$APP_DIR"
cd "$APP_DIR"

# Fetch minimal docker-compose files from the public repo (main branch, treated as stable)
curl -fsSL "https://raw.githubusercontent.com/keboola/agnes-the-ai-analyst/main/docker-compose.yml" -o docker-compose.yml
curl -fsSL "https://raw.githubusercontent.com/keboola/agnes-the-ai-analyst/main/docker-compose.prod.yml" -o docker-compose.prod.yml

# TLS overlay (Caddy + Let's Encrypt), only if needed
if [ "$TLS_MODE" = "caddy" ] && [ -n "$DOMAIN" ]; then
  curl -fsSL "https://raw.githubusercontent.com/keboola/agnes-the-ai-analyst/main/Caddyfile" -o Caddyfile 2>/dev/null || true
fi

# --- 4. Fetch secrets from Secret Manager ---
KEBOOLA_TOKEN=""
if [ "$DATA_SOURCE" = "keboola" ]; then
  KEBOOLA_TOKEN=$(gcloud secrets versions access latest --secret=keboola-storage-token 2>/dev/null || echo "")
fi
JWT_KEY=$(gcloud secrets versions access latest --secret=agnes-$${CUSTOMER_NAME}-jwt-secret)

cat > "$APP_DIR/.env" <<ENVEOF
JWT_SECRET_KEY=$JWT_KEY
DATA_DIR=$DATA_MNT
DATA_SOURCE=$DATA_SOURCE
KEBOOLA_STORAGE_TOKEN=$KEBOOLA_TOKEN
KEBOOLA_STACK_URL=$KEBOOLA_STACK_URL
SEED_ADMIN_EMAIL=$SEED_ADMIN_EMAIL
LOG_LEVEL=info
DOMAIN=$DOMAIN
AGNES_TAG=$IMAGE_TAG
ACME_EMAIL=admin@$${DOMAIN#*.}
ENVEOF
chmod 600 "$APP_DIR/.env"

# --- 5. Start Agnes ---
COMPOSE_PROFILES_ARG=""
if [ "$TLS_MODE" = "caddy" ] && [ -n "$DOMAIN" ]; then
  COMPOSE_PROFILES_ARG="--profile tls"
fi

docker compose -f docker-compose.yml -f docker-compose.prod.yml $COMPOSE_PROFILES_ARG pull
docker compose -f docker-compose.yml -f docker-compose.prod.yml $COMPOSE_PROFILES_ARG up -d

# --- 6. Watchtower (auto-pull new images) ---
if [ "$UPGRADE_MODE" = "auto" ]; then
  # Remove the old watchtower container if it exists (for idempotency)
  docker rm -f agnes-watchtower 2>/dev/null || true
  docker run -d \
    --name agnes-watchtower \
    --restart=unless-stopped \
    -v /var/run/docker.sock:/var/run/docker.sock \
    containrrr/watchtower \
    --interval 300 \
    --cleanup \
    --include-restarting
fi

echo "=== [Agnes $CUSTOMER_NAME $ROLE] Startup complete at $(date) ==="
docker compose ps
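The `.env` written by the startup script derives `ACME_EMAIL` with the POSIX prefix-removal expansion `${DOMAIN#*.}` (written `$${...}` in the template so Terraform leaves it alone), which strips everything up to and including the first dot. The sketch below uses a hypothetical helper, `acme_email`, that is not part of the repository, to show the edge cases: a single-label domain is left unchanged and an empty `DOMAIN` yields a bare `admin@`. The empty case presumably only matters for the Caddy `tls` profile, which the script starts only when `DOMAIN` is non-empty.

```shell
# Hypothetical helper mirroring the startup script's ACME_EMAIL line.
# ${1#*.} removes the shortest prefix matching "*.", i.e. the first label.
acme_email() {
  printf 'admin@%s\n' "${1#*.}"
}

acme_email agnes.example.com   # admin@example.com
acme_email a.b.example.com     # admin@b.example.com
acme_email localhost           # admin@localhost (no dot, nothing removed)
acme_email ""                  # admin@
```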
72
infra/modules/customer-instance/variables.tf
Normal file
@ -0,0 +1,72 @@
variable "gcp_project_id" {
  description = "GCP project ID where the instance will be deployed"
  type        = string
}

variable "region" {
  description = "GCP region"
  type        = string
  default     = "europe-west1"
}

variable "zone" {
  description = "GCP zone"
  type        = string
  default     = "europe-west1-b"
}

variable "customer_name" {
  description = "Short customer identifier (e.g. keboola, grpn). Used as a prefix in resource names."
  type        = string
  validation {
    condition     = can(regex("^[a-z][a-z0-9-]{1,20}$", var.customer_name))
    error_message = "The customer_name must be lowercase, start with a letter and be 2-21 characters long."
  }
}

variable "prod_instance" {
  description = "Prod VM configuration"
  type = object({
    name         = string
    machine_type = optional(string, "e2-small")
    disk_size_gb = optional(number, 30)
    data_disk_gb = optional(number, 50)
    image_tag    = optional(string, "stable")
    upgrade_mode = optional(string, "auto")
    tls_mode     = optional(string, "caddy")
    domain       = optional(string, "")
  })
}

variable "dev_instances" {
  description = "List of dev VMs. An empty list means no dev VMs."
  type = list(object({
    name         = string
    machine_type = optional(string, "e2-small")
    image_tag    = optional(string, "dev")
  }))
  default = []
}

variable "seed_admin_email" {
  description = "Email of the first admin user"
  type        = string
}

variable "data_source" {
  description = "Data source type: keboola | bigquery | csv"
  type        = string
  default     = "keboola"
}

variable "keboola_stack_url" {
  description = "Keboola Stack URL (if data_source = keboola)"
  type        = string
  default     = ""
}

variable "image_repo" {
  description = "Docker image repo"
  type        = string
  default     = "ghcr.io/keboola/agnes-the-ai-analyst"
}
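The `customer_name` validation can be mirrored in a shell pre-flight check, for example before running `bootstrap-gcp.sh` or `terraform plan`. This is a minimal sketch: `valid_customer_name` is a hypothetical helper, not part of the repository, and it assumes single-line input. The 21-character cap also keeps the derived service-account ID `agnes-<name>-vm` within GCP's 30-character limit for service-account IDs.

```shell
# Hypothetical pre-flight check mirroring the customer_name validation in
# variables.tf: lowercase, starts with a letter, 2-21 characters in total.
valid_customer_name() {
  # -x anchors the pattern to the whole line, -E enables the {1,20} interval;
  # LC_ALL=C keeps [a-z] a plain ASCII range.
  printf '%s\n' "$1" | LC_ALL=C grep -Eqx '[a-z][a-z0-9-]{1,20}'
}

for name in keboola grpn Keboola 1acme a; do
  if valid_customer_name "$name"; then
    echo "ok:      $name (SA id: agnes-$name-vm)"
  else
    echo "invalid: $name"
  fi
done
```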
@ -1,39 +0,0 @@
output "instance_ip" {
  description = "Public IP address of the server"
  value       = google_compute_address.data_analyst.address
}

output "ssh_command" {
  description = "SSH command to connect"
  value       = "ssh ${var.ssh_user}@${google_compute_address.data_analyst.address}"
}

output "api_url" {
  description = "API URL"
  value       = "http://${google_compute_address.data_analyst.address}:8000"
}

output "web_url" {
  description = "Web UI URL"
  value       = var.domain != "" ? "https://${var.domain}" : "http://${google_compute_address.data_analyst.address}:8000"
}

output "swagger_url" {
  description = "Swagger API docs URL"
  value       = "http://${google_compute_address.data_analyst.address}:8000/docs"
}

output "bootstrap_command" {
  description = "Command to bootstrap first admin user"
  value       = "curl -X POST http://${google_compute_address.data_analyst.address}:8000/auth/bootstrap -H 'Content-Type: application/json' -d '{\"email\":\"admin@keboola.com\",\"name\":\"Admin\"}'"
}

output "cli_setup_commands" {
  description = "Commands to set up local CLI"
  value       = <<-EOT
    da setup init --server http://${google_compute_address.data_analyst.address}:8000
    da setup bootstrap admin@keboola.com
    da setup test-connection
    da sync
  EOT
}
@ -1,19 +0,0 @@
# Copy to terraform.tfvars and fill in values
project_id          = "your-gcp-project"
region              = "europe-north1"
zone                = "europe-north1-a"
machine_type        = "e2-small" # 2 vCPU, 2GB RAM, ~$7/mo
disk_size_gb        = 30
instance_name       = "data-analyst"
ssh_user            = "deploy"
ssh_public_key_path = "~/.ssh/id_ed25519.pub"

# JWT secret is auto-generated by Terraform (random_password)

# Keboola (optional: leave empty for sample data)
keboola_token      = ""
keboola_stack_url  = "https://connection.keboola.com"
keboola_project_id = ""

# Domain (optional: leave empty for IP-only access)
domain = ""
@ -1,79 +0,0 @@
variable "project_id" {
  description = "GCP project ID"
  type        = string
}

variable "region" {
  description = "GCP region"
  type        = string
  default     = "europe-west1"
}

variable "zone" {
  description = "GCP zone"
  type        = string
  default     = "europe-west1-b"
}

variable "machine_type" {
  description = "VM machine type"
  type        = string
  default     = "e2-small"
}

variable "disk_size_gb" {
  description = "Boot disk size in GB"
  type        = number
  default     = 30
}

variable "instance_name" {
  description = "Name for the VM instance"
  type        = string
  default     = "data-analyst"
}

variable "ssh_user" {
  description = "SSH username"
  type        = string
  default     = "deploy"
}

variable "ssh_public_key_path" {
  description = "Path to SSH public key file"
  type        = string
  default     = "~/.ssh/id_ed25519.pub"
}

# App config (JWT secret auto-generated by Terraform)

variable "keboola_token" {
  description = "Keboola Storage API token"
  type        = string
  sensitive   = true
  default     = ""
}

variable "keboola_stack_url" {
  description = "Keboola Stack URL"
  type        = string
  default     = "https://connection.keboola.com"
}

variable "keboola_project_id" {
  description = "Keboola project ID"
  type        = string
  default     = ""
}

variable "admin_email" {
  description = "Admin email for initial seed (e.g., admin@company.com)"
  type        = string
  default     = ""
}

variable "domain" {
  description = "Domain name for SSL (optional, empty = IP only)"
  type        = string
  default     = ""
}
84
scripts/bootstrap-gcp.sh
Executable file
@ -0,0 +1,84 @@
#!/usr/bin/env bash
# Bootstrap a GCP project for Agnes deployment.
# One-off and idempotent (except that every run creates a new SA key).
# Run as an owner of the GCP project.
#
# Usage: bootstrap-gcp.sh <GCP_PROJECT_ID> [SA_NAME]
#
# Produces:
#   - enabled APIs (compute, iam, iamcredentials, secretmanager, cloudresourcemanager, storage)
#   - service account <SA_NAME> with the roles needed for terraform apply
#   - GCS bucket agnes-<PROJECT_ID>-tfstate (versioned, uniform bucket-level access)
#   - SA JSON key (local file: paste it into the GitHub secret GCP_SA_KEY, then delete it)
set -euo pipefail

PROJECT_ID="${1:?Usage: $0 <GCP_PROJECT_ID> [SA_NAME=agnes-deploy]}"
SA_NAME="${2:-agnes-deploy}"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

echo "=== Bootstrap GCP project: ${PROJECT_ID} ==="
gcloud config set project "${PROJECT_ID}" 1>/dev/null

echo "=== Enable APIs ==="
gcloud services enable \
  compute.googleapis.com \
  iam.googleapis.com \
  iamcredentials.googleapis.com \
  secretmanager.googleapis.com \
  cloudresourcemanager.googleapis.com \
  storage.googleapis.com \
  --project="${PROJECT_ID}"

echo "=== Create deploy service account (if not exists) ==="
if ! gcloud iam service-accounts describe "${SA_EMAIL}" --project="${PROJECT_ID}" 2>/dev/null 1>&2; then
  gcloud iam service-accounts create "${SA_NAME}" \
    --display-name="Agnes Terraform deploy" \
    --project="${PROJECT_ID}"
else
  echo "  (SA already exists, skipping creation)"
fi

echo "=== Grant roles ==="
for role in \
  compute.instanceAdmin.v1 \
  compute.securityAdmin \
  compute.networkAdmin \
  iam.serviceAccountUser \
  iam.serviceAccountAdmin \
  secretmanager.admin \
  storage.admin \
  resourcemanager.projectIamAdmin; do
  gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member="serviceAccount:${SA_EMAIL}" \
    --role="roles/${role}" \
    --condition=None \
    --quiet 1>/dev/null
done

echo "=== Create tfstate bucket (if not exists) ==="
BUCKET="agnes-${PROJECT_ID}-tfstate"
if ! gsutil ls -b "gs://${BUCKET}" 2>/dev/null 1>&2; then
  gsutil mb -p "${PROJECT_ID}" -l europe-west1 -b on "gs://${BUCKET}"
  gsutil versioning set on "gs://${BUCKET}"
else
  echo "  (bucket already exists, skipping creation)"
fi

echo "=== Generate SA key ==="
KEY_FILE="./${SA_NAME}-${PROJECT_ID}-key.json"
gcloud iam service-accounts keys create "${KEY_FILE}" \
  --iam-account="${SA_EMAIL}" \
  --project="${PROJECT_ID}"

echo ""
echo "=== DONE ==="
echo ""
echo "SA email: ${SA_EMAIL}"
echo "TF state bucket: gs://${BUCKET}"
echo "SA key file: ${KEY_FILE}"
echo ""
echo "NEXT STEPS:"
echo "1. Push the key into a GitHub secret of the private infra repo:"
echo "   gh secret set GCP_SA_KEY --repo <owner>/<repo> < ${KEY_FILE}"
echo "2. THEN delete the local key:"
echo "   rm ${KEY_FILE}"
echo ""
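`bootstrap-gcp.sh` probes for existing resources with `... 2>/dev/null 1>&2`, which works because redirections are applied left to right: stderr is pointed at `/dev/null` first, then stdout is duplicated onto stderr's current target. The sketch below uses a purely illustrative `noisy` function to show that this order silences both streams while `if` still sees the exit status, and that the reversed order does not.

```shell
# Illustrative command that writes to both streams and fails.
noisy() { echo "to stdout"; echo "to stderr" >&2; return 3; }

# Both streams are discarded; the non-zero status still drives the if.
if ! noisy 2>/dev/null 1>&2; then
  echo "probe failed, nothing else was printed"
fi

# Reversed order is NOT equivalent: stdout is copied to the original stderr
# (e.g. the terminal) before stderr is redirected, so "to stdout" still shows.
noisy 1>&2 2>/dev/null || true
```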
44
scripts/fetch-env-from-secrets.sh
Executable file
@ -0,0 +1,44 @@
#!/usr/bin/env bash
# Fetches secrets from GCP Secret Manager and writes the .env file for Agnes.
# Run on the VM as a user whose gcloud has access to Secret Manager
# (typically via the VM service account with roles/secretmanager.secretAccessor).
#
# Usage: ./fetch-env-from-secrets.sh [APP_DIR]
# Default APP_DIR: /home/deploy/app
set -euo pipefail

APP_DIR="${1:-${APP_DIR:-/home/deploy/app}}"
ENV_FILE="${APP_DIR}/.env"

# Non-secret config (overridable via environment, otherwise hardcoded defaults)
DATA_SOURCE="${DATA_SOURCE:-keboola}"
KEBOOLA_STACK_URL="${KEBOOLA_STACK_URL:-https://connection.us-east4.gcp.keboola.com/}"
SEED_ADMIN_EMAIL="${SEED_ADMIN_EMAIL:-zdenek.srotyr@keboola.com}"
LOG_LEVEL="${LOG_LEVEL:-info}"
DATA_DIR="${DATA_DIR:-/data}"
AGNES_TAG="${AGNES_TAG:-stable}"

echo "Fetching secrets from Secret Manager..."
JWT_KEY=$(gcloud secrets versions access latest --secret=jwt-secret-key)
KEBOOLA_TOKEN=""
if [ "$DATA_SOURCE" = "keboola" ]; then
  KEBOOLA_TOKEN=$(gcloud secrets versions access latest --secret=keboola-storage-token)
fi

echo "Writing ${ENV_FILE}..."
cat > "${ENV_FILE}" <<EOF
JWT_SECRET_KEY=${JWT_KEY}
DATA_DIR=${DATA_DIR}
DATA_SOURCE=${DATA_SOURCE}
KEBOOLA_STORAGE_TOKEN=${KEBOOLA_TOKEN}
KEBOOLA_STACK_URL=${KEBOOLA_STACK_URL}
SEED_ADMIN_EMAIL=${SEED_ADMIN_EMAIL}
LOG_LEVEL=${LOG_LEVEL}
AGNES_TAG=${AGNES_TAG}
EOF

chmod 600 "${ENV_FILE}"
# chown is best-effort: if the script is not running as root, ignore the failure
chown deploy:deploy "${ENV_FILE}" 2>/dev/null || true

echo "Done. ${ENV_FILE} has $(wc -l < "${ENV_FILE}") lines, chmod 600."
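`fetch-env-from-secrets.sh` resolves its target directory with nested `:-` expansions: the first argument wins, then an `APP_DIR` environment variable, then `/home/deploy/app`; because `:-` is used, an empty value counts as unset. A minimal sketch with a hypothetical `resolve_app_dir` helper (not part of the repository). If the script were run on a VM provisioned by the Terraform module, whose startup script uses `/opt/agnes`, that path would need to be passed explicitly.

```shell
# Hypothetical helper reproducing the script's APP_DIR lookup order:
# $1, else $APP_DIR, else /home/deploy/app (empty values count as unset).
resolve_app_dir() {
  printf '%s\n' "${1:-${APP_DIR:-/home/deploy/app}}"
}

unset APP_DIR
resolve_app_dir               # /home/deploy/app
APP_DIR=/opt/agnes
resolve_app_dir               # /opt/agnes
resolve_app_dir /srv/agnes    # /srv/agnes (argument wins)
APP_DIR=""
resolve_app_dir               # /home/deploy/app
```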