* chore(oss): isolate customer-specific deploy bits from scripts/grpn/ (#88)

  Vendor-neutralization step before public release. The directory mixed two concerns: (1) generic ops scripts referenced from mainline OSS infrastructure (TLS rotation, auto-upgrade cron) and (2) one operator's hackathon manual-deploy helper with hardcoded GCP project IDs, VM names, and admin emails. Splitting them per concern.

  Moved (still in OSS, just under a vendor-neutral name):
  - scripts/grpn/agnes-tls-rotate.sh → scripts/ops/agnes-tls-rotate.sh
  - scripts/grpn/agnes-auto-upgrade.sh → scripts/ops/agnes-auto-upgrade.sh

  Removed (belongs in private consumer infra repos, not upstream OSS):
  - scripts/grpn/Makefile (hardcoded prj-grp-foundryai-dev-7c37, foundryai-development VM name, e_zsrotyr@groupon.com bootstrap email)
  - scripts/grpn/README.md (GRPN hackathon deploy walkthrough)
  - docs/superpowers/plans/2026-04-22-grpn-deploy-learnings.md (org-specific deploy log)

  Cross-refs updated in README.md, CLAUDE.md, docs/DEPLOYMENT.md, docker-compose.yml. CHANGELOG entry flags BREAKING (ops) for any consumer infra repo that installs these scripts via path-based systemd timers.

  This is the first wave of #88; the remaining leaks (test data with prj-grp-dataview-prod-1ff9, AIAgent.FoundryAI tags in OpenMetadata test fixtures, docstrings in connectors/openmetadata/enricher.py) will be a separate, smaller PR. Refs #88.

* chore(oss): comprehensive vendor-neutralization (#88 wave 2 + review fixes)

  PR #94 review found that the original wave-1 grep was scoped wrong and many leaks survived. This commit closes wave 1 properly AND folds in all wave-2 anonymization in a single pass, which is easier to review than two PRs.

  Wave-1 review-fix corrections:
  - Caddyfile: scripts/grpn/agnes-tls-rotate.sh → scripts/ops/ (the original wave-1 grep filter excluded extensionless files like Caddyfile).
  - CHANGELOG bullet rewritten: original wording implied an in-repo migration for infra/modules/customer-instance/, which is wrong (the TF module embeds the script inline via heredoc, never sourced from scripts/grpn/). Now flags downstream consumer infra repos only.
  - infra/modules/customer-instance/variables.tf: Czech docstring with `grpn` example → English description with `acme, example` placeholders.

  Wave-2 anonymization:
  - Code docstrings (connectors/openmetadata/{client,transformer,enricher}.py, src/catalog_export.py, scripts/duckdb_manager.py): prj-grp-… → my-bq-project / prj-example-1234, AIAgent.FoundryAI → AIAgent.MyAgent, FoundryAIDataModel → AnalyticsDataModel.
  - Test fixtures (4 files): same set of replacements; 157 tests still pass.
  - .github/workflows/keboola-deploy.yml: "Groupon-side dev VMs" comment → generic "per-developer dev VMs".
  - docs/auth-groups.md + scripts/debug/probe_google_groups.py: kids-ai-data-analysis project name → acme-internal-prod placeholder.
  - 5 planning/spec docs under docs/superpowers/{plans,specs}/2026-04-21-*: hardcoded IPs (34.77.94.14, 34.77.102.61) → <dev-vm-ip>/<prod-vm-ip>; GRPN/Groupon → Acme/another-customer; prj-grp-… → prj-example-….
  - scripts/switch-dev-vm.sh deleted: hackathon-era helper hardcoded to a specific shared dev VM. Per-developer dev VMs are the supported pattern.

  Final grep `groupon|grpn|foundryai|prj-grp|groupondev|34\.77\.(94|102)\.…|kids-ai-data` returns zero hits (excluding CHANGELOG.md historical entries). CHANGELOG entry expanded to document both waves under one bullet, with the BREAKING (ops) clarification about the TF module being unaffected. Refs review of #94, closes #88.

* fix(oss): close remaining #94 review-2 findings (Czech, padak refs, CHANGELOG)

  Reviewer of PR #94 round 2 caught 4 remaining items the wave-2 pass missed:
  1. infra/modules/customer-instance/variables.tf had Czech descriptions on 8 more variables. Previous review only flagged line 19; this round audited the rest. Translated lines 2, 28, 42-46 (heredoc), 60, 65, 71, 78, 84 to English. Same review concern: a Terraform module that is the customer-facing API surface in Czech is unfit for OSS distribution.
  2. infra/modules/customer-instance/outputs.tf had Czech descriptions on four outputs. Same fix.
  3. docs/padak-security.md referenced a private repo (padak/keboola_agent_cli#206) in two places. Replaced with generic 'tracked upstream in the auth-CLI repo' per CLAUDE.md vendor-agnostic rule (no cross-refs to private repos).
  4. scripts/fetch-env-from-secrets.sh:41 had a Czech comment. Translated.
  5. CHANGELOG cosmetic: bullet said 'AIAgent.FoundryAI -> AIAgent.MyAgent' but the actual code uses both MyAgent (in docstrings) and Example (in test fixtures). Reworded to mention both targets.

  Final grep across all shipping file types (.md, .py, .yml, .yaml, .sh, Makefile, .json, .tf, .tpl, Caddyfile, .toml) for groupon|grpn|foundryai|prj-grp|groupondev|34.77.94.14|34.77.102.61|kids-ai-data|padak/keboola_agent_cli returns ZERO hits (excluding CHANGELOG.md). Czech-diacritic grep across .tf/.toml/Caddyfile/Makefile/.yml returns ZERO hits. 157/157 OpenMetadata + DuckDB tests still pass.

* fix(oss): close #94 round-3 leaks (env.template, instance.yaml.example, padak typo)

  Round-3 reviewer caught two MUST-FIX leaks the round-2 grep missed (grep was scoped to extensions that did not include .template / .example suffixes; the audit was right, the previous grep was not paranoid enough):
  1. config/instance.yaml.example:114: '(optional - Groupon-specific)' brand leak in a shipping config example. Replaced with '(optional)'.
  2. config/.env.template:68: stale path 'scripts/grpn/agnes-tls-rotate.sh' in operator-facing env-template comment. The script lives at scripts/ops/ now (commit 16a85cc); this comment had been pointing operators at a non-existent path.
  3. docs/padak-security.md:188: phrase duplication 'tracked in tracked upstream' from a sloppy substitution in round-2. Trivial wording fix.

  Final paranoid grep across .md/.py/.yml/.yaml/.sh/Makefile/.json/.tf/.tpl/Caddyfile/.toml/.template/.example/.env* with the full token set (groupon|grpn|foundryai|prj-grp|groupondev|34\.77\.94\.14|34\.77\.102\.61|kids-ai-data|padak/keboola_agent_cli) returns ZERO hits, excluding CHANGELOG.md historical entries.

* fix(oss): #94 round-4: QUICKSTART.md + rename padak-security.md

  Devin Review caught two findings on the latest round-3 commit:
  1. docs/QUICKSTART.md:67 still pointed users at the deleted scripts/switch-dev-vm.sh. A Quickstart user following step-by-step would hit a missing-file error at the final step. Replaced with the inline gcloud-ssh equivalent that the Removed bullet documents.
  2. docs/padak-security.md filename retains the personal identifier 'padak'. The PR fixed the body content (replaced padak/keboola_agent_cli#206 references with generic wording) but missed the filename. Renamed to docs/security-audit-2026-04.md (date-anchored, vendor-neutral). Updated the historical CHANGELOG link to point at the new path with an inline note about the rename.

* fix(oss): redact remaining hardcoded IPs from planning docs + remove default email

  Devin Review caught two more leaks:
  1. scripts/fetch-env-from-secrets.sh line 16 had a hardcoded personal-email default (zdenek.srotyr@keboola.com). Replaced with ':?' bash error so SEED_ADMIN_EMAIL must be explicitly set, which is safer than carrying any specific identity.
  2. Planning docs still had 35.195.96.98 and 34.62.223.189 (legacy prod/dev IPs) that the round-1 IP-replace pattern missed (it only targeted 34.77.x.x). Generic regex redaction across all five planning docs replaces every public IP with <redacted-ip>, preserving private/loopback/IAP ranges.

2137 lines
63 KiB
Markdown
# Multi-Customer Deployment Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Move from today's "prod runs from the personal fork padak/tmp_oss" to a production-grade multi-customer setup as described in the spec `docs/superpowers/specs/2026-04-21-multi-customer-deployment-spec.md`.

**Architecture:** Public upstream (`keboola/agnes-the-ai-analyst`) with a TF module + public image on GHCR. A private template repo (`keboola/agnes-infra-template`) serves as a skeleton. Each customer gets a private repo (`keboola/agnes-infra-keboola` for Keboola-as-customer, `{org}/agnes-infra` for others) with Terraform + GitHub Actions + SA JSON key. Each customer has its own GCP project, its own Secret Manager, and its own prod/dev VMs. Watchtower on the VMs polls GHCR for auto-deploy. Branch-aware dev VMs are configured through the `dev_instances` list in tfvars.

**Tech Stack:** Terraform (google provider ~5.0), Docker Compose, Caddy (TLS), Watchtower, GHCR, Google Cloud (Compute Engine, Secret Manager, Cloud Storage, IAM), GitHub Actions, Argon2 (passwords), DuckDB.

---

## Dependencies between phases

```
Phase 1 (MVP) ──────────────────────────┐
    │                                   │
    ▼                                   ▼
Phase 2 (TF module + PD + rebuild)   Phase 0 (Prerequisites)
    │
    ├─────────┬──────────┬──────────┐
    ▼         ▼          ▼          ▼
 Phase 3   Phase 4    Phase 5    Phase 6
  (TLS) (Watchtower)  (CI/CD)  (Template)
    │         │          │          │
    └─────────┴──────────┴──────────┘
                    ▼
                  Done
```

Phases 0 and 1 are sequential. After Phase 2, Phases 3/4/5 can run in parallel. Phase 6 uses the outputs of 3/4/5.

---

## Phase 0: Prerequisites (manual, outside the code)

These steps require external actions (permissions, Keboola UI). They must be completed before Phase 1.

### Task 0.1: Verify access permissions

- [ ] **Step 1: Verify that you have `iam.serviceAccountAdmin` on internal-prod**

```bash
gcloud projects get-iam-policy internal-prod --format=json \
  | python3 -c "import json, sys; d=json.load(sys.stdin); \
me='zdenek.srotyr@keboola.com'; \
roles=[b['role'] for b in d.get('bindings', []) if any(me in m for m in b.get('members', []))]; \
print('\n'.join(roles) if roles else 'NO DIRECT ROLES - check org-level or ask Petr (owner)')"
```

Expected: a list of roles, or the note "NO DIRECT ROLES".

- [ ] **Step 2: If the SA admin permissions are missing, ask Petr for temporary `roles/iam.serviceAccountAdmin` + `roles/resourcemanager.projectIamAdmin`**

Send him a link to this documentation: https://cloud.google.com/iam/docs/understanding-roles#iam-roles

Message Petr on Slack or by email: "I temporarily need the `iam.serviceAccountAdmin` and `resourcemanager.projectIamAdmin` roles on the `internal-prod` project to create the Agnes deploy SA. We will revoke them as soon as it is done."

- [ ] **Step 3: Verify that the image `ghcr.io/keboola/agnes-the-ai-analyst` is public**

```bash
gh api /orgs/keboola/packages/container/agnes-the-ai-analyst --jq '.visibility' 2>&1
```

Expected: `public`. If it prints `private`, change it in the GitHub UI: Keboola org → Packages → agnes-the-ai-analyst → Package settings → Change visibility → Public.

### Task 0.2: Back up existing data (safety net before Phase 2)

- [ ] **Step 1: Snapshot the prod VM boot disk (contains /data)**

```bash
gcloud compute disks snapshot data-analyst \
  --zone=europe-west1-b \
  --snapshot-names="data-analyst-pre-migration-$(date +%Y%m%d)" \
  --project=internal-prod
```

Expected: `Created snapshot data-analyst-pre-migration-YYYYMMDD`.

- [ ] **Step 2: Verify the snapshot**

```bash
gcloud compute snapshots list --project=internal-prod \
  --filter="name~pre-migration" --format="table(name, status, diskSizeGb, creationTimestamp)"
```

Expected: STATUS = READY, 30 GB.

---

## Phase 1 (MVP): Cut loose from the personal fork, switch to the :stable image

**Phase goal:** The prod VM `data-analyst` pulls its image from GHCR instead of running `git pull` from `ZdenekSrotyr/tmp_oss`. Tokens live in Secret Manager. The switch is reversible.

### Task 1.1: Add per-branch image tagging to release.yml

**Files:**
- Modify: `.github/workflows/release.yml:47-95`

- [ ] **Step 1: Check the current state of the meta step**

```bash
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
grep -n "branch_slug\|feature_tag\|SLUG" .github/workflows/release.yml 2>&1 | head -5
```

Expected: no results. The pattern does not exist yet; this task adds it.

- [ ] **Step 2: Open `.github/workflows/release.yml` and find the `Claim version tag` step**

The section has `id: meta`. After the line `echo "short_sha=${SHORT_SHA}" >> "$GITHUB_OUTPUT"` (around line 90), add:

```yaml
# Per-branch slug for dev images (only on feature branches)
if [[ "${{ github.ref }}" != "refs/heads/main" ]]; then
  BRANCH_NAME="${GITHUB_REF#refs/heads/}"
  BRANCH_SLUG=$(echo "$BRANCH_NAME" | sed 's|^feature/||' | sed 's|[^a-zA-Z0-9-]|-|g' | tr '[:upper:]' '[:lower:]' | cut -c1-50)
  echo "branch_slug=${BRANCH_SLUG}" >> "$GITHUB_OUTPUT"
  echo "Branch slug: ${BRANCH_SLUG}"
fi
```

- [ ] **Step 3: Add the branch-slug tag to the `Build and push` step**

Find the `tags: |` block (around line 110) and replace it with:

```yaml
tags: |
  ghcr.io/${{ github.repository }}:${{ steps.meta.outputs.channel }}
  ghcr.io/${{ github.repository }}:${{ steps.meta.outputs.versioned_tag }}
  ghcr.io/${{ github.repository }}:sha-${{ steps.meta.outputs.short_sha }}
  ${{ steps.meta.outputs.channel == 'dev' && format('ghcr.io/{0}:dev-{1}', github.repository, steps.meta.outputs.branch_slug) || '' }}
```

The last line adds `:dev-<branch-slug>` only on pushes to a feature branch.

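
To preview which tag a given branch would receive, the slug pipeline from Step 2 can be tried locally. This is a minimal sketch: the function name `branch_slug` and the sample branch names are illustrative assumptions and not part of the workflow.

```bash
# Hypothetical helper: the same sed/tr/cut pipeline as in the workflow step,
# wrapped in a function so it can be tried on arbitrary branch names.
branch_slug() {
  local branch_name="$1"
  echo "$branch_name" \
    | sed 's|^feature/||' \
    | sed 's|[^a-zA-Z0-9-]|-|g' \
    | tr '[:upper:]' '[:lower:]' \
    | cut -c1-50
}

branch_slug "feature/Add_New.Login"   # prints: add-new-login
branch_slug "bugfix/fix-tls"          # prints: bugfix-fix-tls
```

Because every character outside `a-zA-Z0-9-` collapses to `-` and the result is cut to 50 characters, two different branch names can map to the same slug. If that matters, it may be worth checking slugs for collisions before relying on the `:dev-<slug>` tag.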
- [ ] **Step 4: Syntax-check the workflow**

```bash
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
gh workflow view release.yml 2>&1 | head -10
```

Expected: workflow info, no "Parse error".

- [ ] **Step 5: Commit**

```bash
git add .github/workflows/release.yml
git commit -m "ci: add per-branch image tag :dev-<slug> for branch-aware dev deploys"
```

### Task 1.2: Create the GCP deploy service account

**Files:**
- Create: `scripts/bootstrap-gcp.sh`

- [ ] **Step 1: Create the bootstrap script**

```bash
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
mkdir -p scripts
```

Write `scripts/bootstrap-gcp.sh`:

```bash
#!/usr/bin/env bash
# Bootstrap a GCP project for an Agnes deployment.
# One-off and idempotent. Output = list of secrets for GitHub Actions.
#
# Usage: bootstrap-gcp.sh <GCP_PROJECT_ID> [SA_NAME]
# If the SA already exists, the script generates a new key and exits.
set -euo pipefail

PROJECT_ID="${1:?Usage: $0 <GCP_PROJECT_ID> [SA_NAME=agnes-deploy]}"
SA_NAME="${2:-agnes-deploy}"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

echo "=== Bootstrap GCP project: ${PROJECT_ID} ==="
gcloud config set project "${PROJECT_ID}" 1>/dev/null

echo "=== Enable APIs ==="
gcloud services enable \
  compute.googleapis.com \
  iam.googleapis.com \
  iamcredentials.googleapis.com \
  secretmanager.googleapis.com \
  cloudresourcemanager.googleapis.com \
  storage.googleapis.com \
  --project="${PROJECT_ID}"

echo "=== Create deploy service account (if not exists) ==="
if ! gcloud iam service-accounts describe "${SA_EMAIL}" --project="${PROJECT_ID}" >/dev/null 2>&1; then
  gcloud iam service-accounts create "${SA_NAME}" \
    --display-name="Agnes Terraform deploy" \
    --project="${PROJECT_ID}"
fi

echo "=== Grant roles ==="
for role in \
  compute.instanceAdmin.v1 \
  compute.securityAdmin \
  compute.networkAdmin \
  iam.serviceAccountUser \
  secretmanager.admin \
  storage.admin; do
  gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member="serviceAccount:${SA_EMAIL}" \
    --role="roles/${role}" \
    --condition=None \
    --quiet 1>/dev/null
done

echo "=== Create tfstate bucket (if not exists) ==="
BUCKET="agnes-${PROJECT_ID}-tfstate"
if ! gsutil ls -b "gs://${BUCKET}" >/dev/null 2>&1; then
  gsutil mb -p "${PROJECT_ID}" -l europe-west1 -b on "gs://${BUCKET}"
  gsutil versioning set on "gs://${BUCKET}"
fi

echo "=== Generate SA key ==="
KEY_FILE="./${SA_NAME}-${PROJECT_ID}-key.json"
gcloud iam service-accounts keys create "${KEY_FILE}" \
  --iam-account="${SA_EMAIL}" \
  --project="${PROJECT_ID}"

echo ""
echo "=== DONE ==="
echo ""
echo "SA email: ${SA_EMAIL}"
echo "TF state bucket: gs://${BUCKET}"
echo "SA key file: ${KEY_FILE}"
echo ""
echo "NEXT STEPS:"
echo "1. Push the key into a GitHub secret of the private infra repo:"
echo "   gh secret set GCP_SA_KEY --repo <owner>/<repo> < ${KEY_FILE}"
echo "2. THEN delete the local copy of the key:"
echo "   rm ${KEY_FILE}"
echo ""
```

- [ ] **Step 2: Make the script executable**

```bash
chmod +x scripts/bootstrap-gcp.sh
```

- [ ] **Step 3: Run the script against internal-prod**

```bash
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
./scripts/bootstrap-gcp.sh internal-prod
```

Expected: the run ends with the script's completion banner followed by the next-step instructions.

If it fails with "Permission denied", see Task 0.1 Step 2 (ask Petr).

- [ ] **Step 4: Verify the SA and the bucket**

```bash
gcloud iam service-accounts list --project=internal-prod --filter="email~agnes-deploy" --format="value(email)"
gsutil ls -b gs://agnes-internal-prod-tfstate
```

Expected: the SA email + the bucket URL.

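
The names that the bootstrap script derives from the project ID can be previewed without touching GCP. A minimal sketch, assuming the same naming scheme as `scripts/bootstrap-gcp.sh`; the function names `sa_email` and `tfstate_bucket` are made up for illustration.

```bash
# Hypothetical helpers mirroring the naming scheme of scripts/bootstrap-gcp.sh.
# They only build strings; nothing is created in GCP.
sa_email() {
  local project_id="$1" sa_name="${2:-agnes-deploy}"
  printf '%s@%s.iam.gserviceaccount.com\n' "$sa_name" "$project_id"
}

tfstate_bucket() {
  local project_id="$1"
  printf 'gs://agnes-%s-tfstate\n' "$project_id"
}

sa_email internal-prod         # prints: agnes-deploy@internal-prod.iam.gserviceaccount.com
tfstate_bucket internal-prod   # prints: gs://agnes-internal-prod-tfstate
```

These are the same two values that the verification commands in Step 4 look for, so the sketch can double as a quick sanity check before running the script against a new customer project.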
- [ ] **Step 5: Commit the bootstrap script**

```bash
git add scripts/bootstrap-gcp.sh
git commit -m "infra: add bootstrap-gcp.sh for per-customer GCP setup"
```

### Task 1.3: Store the secrets in Secret Manager

- [ ] **Step 1: Rotate the Keboola Storage token in the Keboola UI**

Log in to the Keboola UI (https://connection.us-east4.gcp.keboola.com/), go to Settings → Master Tokens, and generate a new token.

**Keep the old token active until the new one is deployed.**

- [ ] **Step 2: Store the new token in Secret Manager**

```bash
read -rs NEW_TOKEN
# printf avoids echo's option parsing if the token happens to start with "-"
printf '%s' "$NEW_TOKEN" | gcloud secrets create keboola-storage-token \
  --data-file=- \
  --replication-policy=automatic \
  --project=internal-prod
unset NEW_TOKEN
```

Expected: `Created secret [keboola-storage-token]`.

- [ ] **Step 3: Generate and store the JWT secret**

```bash
openssl rand -hex 32 | gcloud secrets create jwt-secret-key \
  --data-file=- \
  --replication-policy=automatic \
  --project=internal-prod
```

Expected: `Created secret [jwt-secret-key]`.

- [ ] **Step 4: Verify the secrets**

```bash
gcloud secrets list --project=internal-prod --format="table(name, createTime)"
```

Expected: two secrets, keboola-storage-token and jwt-secret-key.

- [ ] **Step 5: Grant the deploy SA read access**

```bash
for secret in keboola-storage-token jwt-secret-key; do
  gcloud secrets add-iam-policy-binding "$secret" \
    --member="serviceAccount:agnes-deploy@internal-prod.iam.gserviceaccount.com" \
    --role=roles/secretmanager.secretAccessor \
    --project=internal-prod
done
```

Expected: `Updated IAM policy` × 2.

### Task 1.4: Create a script that pulls secrets from Secret Manager into .env on the VM

**Files:**
- Create: `scripts/fetch-env-from-secrets.sh`

- [ ] **Step 1: Write the script**

Write `scripts/fetch-env-from-secrets.sh`:

```bash
#!/usr/bin/env bash
# Fetches secrets from GCP Secret Manager and creates the .env file for Agnes.
# Runs once on the VM during boot / deploy.
#
# Requires:
# - gcloud CLI (already installed on the GCE default image)
# - the VM SA has the roles/secretmanager.secretAccessor role
set -euo pipefail

APP_DIR="${APP_DIR:-/home/deploy/app}"
ENV_FILE="${APP_DIR}/.env"

echo "Fetching secrets..."

# stderr is intentionally not captured, so gcloud warnings cannot leak into .env
KEBOOLA_TOKEN=$(gcloud secrets versions access latest --secret=keboola-storage-token)
JWT_KEY=$(gcloud secrets versions access latest --secret=jwt-secret-key)

# Non-secret config (can stay in the metadata / startup script)
DATA_SOURCE="${DATA_SOURCE:-keboola}"
KEBOOLA_STACK_URL="${KEBOOLA_STACK_URL:-https://connection.us-east4.gcp.keboola.com/}"
SEED_ADMIN_EMAIL="${SEED_ADMIN_EMAIL:-zdenek.srotyr@keboola.com}"
LOG_LEVEL="${LOG_LEVEL:-info}"
DATA_DIR="${DATA_DIR:-/data}"

cat > "${ENV_FILE}" <<EOF
JWT_SECRET_KEY=${JWT_KEY}
DATA_DIR=${DATA_DIR}
DATA_SOURCE=${DATA_SOURCE}
KEBOOLA_STORAGE_TOKEN=${KEBOOLA_TOKEN}
KEBOOLA_STACK_URL=${KEBOOLA_STACK_URL}
SEED_ADMIN_EMAIL=${SEED_ADMIN_EMAIL}
LOG_LEVEL=${LOG_LEVEL}
EOF

chmod 600 "${ENV_FILE}"
chown deploy:deploy "${ENV_FILE}" 2>/dev/null || true

echo "Wrote ${ENV_FILE} (chmod 600)"
```

- [ ] **Step 2: Chmod + commit**

```bash
chmod +x scripts/fetch-env-from-secrets.sh
git add scripts/fetch-env-from-secrets.sh
git commit -m "infra: add fetch-env-from-secrets.sh for VM-side secret retrieval"
```

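
One possible hardening of the write step in `scripts/fetch-env-from-secrets.sh` is to refuse empty secrets and to write the file so that it is never world-readable, not even briefly. The following is only a sketch under stated assumptions: `fetch_secret` is a hypothetical stand-in for `gcloud secrets versions access latest --secret=NAME` so the example runs without GCP access, and `write_env_file` is an illustrative name, not something the plan defines.

```bash
# Hypothetical stand-in for the gcloud call, so the sketch runs offline.
fetch_secret() {
  printf 'dummy-value-for-%s' "$1"
}

# Write KEY=VALUE lines to an env file, refusing to continue if a secret is empty.
write_env_file() {
  local env_file="$1" token jwt tmp
  token="$(fetch_secret keboola-storage-token)"
  jwt="$(fetch_secret jwt-secret-key)"
  if [[ -z "$token" || -z "$jwt" ]]; then
    echo "empty secret, refusing to write ${env_file}" >&2
    return 1
  fi
  # mktemp creates the file with mode 600, so the secrets are never
  # readable by other users while the file is being written.
  tmp="$(mktemp "${env_file}.XXXXXX")"
  {
    printf 'JWT_SECRET_KEY=%s\n' "$jwt"
    printf 'KEBOOLA_STORAGE_TOKEN=%s\n' "$token"
  } > "$tmp"
  # Rename within the same directory, so readers see either the old or the new file.
  mv "$tmp" "$env_file"
}
```

Calling `write_env_file /home/deploy/app/.env` would produce a two-key file; the non-secret keys from the real script are left out here to keep the sketch short.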
### Task 1.5: Prepare the prod docker-compose for the GHCR image

**Files:**
- Modify: `docker-compose.prod.yml`

- [ ] **Step 1: Read the current docker-compose.prod.yml**

```bash
cat "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss/docker-compose.prod.yml"
```

Note its structure (services, volumes).

- [ ] **Step 2: Verify that the prod overlay uses `image:` instead of `build:`**

```bash
grep -E "^\s*(image|build):" docker-compose.prod.yml
```

Expected: a line `image: ghcr.io/keboola/agnes-the-ai-analyst:${AGNES_TAG:-stable}` (or similar). If it is missing, add it to `services.app`:

```yaml
services:
  app:
    image: ghcr.io/keboola/agnes-the-ai-analyst:${AGNES_TAG:-stable}
    build: !reset null # disable the local build
```

And for the scheduler:

```yaml
  scheduler:
    image: ghcr.io/keboola/agnes-the-ai-analyst:${AGNES_TAG:-stable}
    build: !reset null
```

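
The `${AGNES_TAG:-stable}` expression follows the same default-value rule as in the shell: the fallback is used when the variable is unset or empty. A small sketch of that rule in bash, where `resolve_image` is a hypothetical helper name used only for illustration:

```bash
# Hypothetical helper mirroring the ${AGNES_TAG:-stable} expression
# from the compose file, so the fallback behaviour can be tried in a shell.
resolve_image() {
  printf 'ghcr.io/keboola/agnes-the-ai-analyst:%s\n' "${AGNES_TAG:-stable}"
}

unset AGNES_TAG
resolve_image                          # prints: ghcr.io/keboola/agnes-the-ai-analyst:stable
AGNES_TAG=dev-my-branch resolve_image  # prints: ghcr.io/keboola/agnes-the-ai-analyst:dev-my-branch
```

This is why prod can simply omit `AGNES_TAG`, while a branch-aware dev VM would set it to a `dev-<slug>` tag.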
- [ ] **Step 3: Commit the changes (if any)**

```bash
git status docker-compose.prod.yml
# If modified:
git add docker-compose.prod.yml
git commit -m "infra: prod compose pulls from GHCR via AGNES_TAG env (default :stable)"
```

### Task 1.6: Deploy the MVP to the prod VM data-analyst

**This is a destructive action on prod. Complete Task 0.2 (snapshot) first.**

- [ ] **Step 1: SSH to the prod VM and stop the containers**

```bash
gcloud compute ssh data-analyst --zone=europe-west1-b --project=internal-prod --command="sudo -u deploy bash -c 'cd /home/deploy/app && docker compose down'"
```

Expected: `Container app-app-1 Stopped`, `Container app-scheduler-1 Stopped`.

- [ ] **Step 2: Set up the VM SA on the deploy VM (one-off)**

```bash
# Check the current SA
gcloud compute instances describe data-analyst --zone=europe-west1-b --project=internal-prod \
  --format="value(serviceAccounts[0].email)"
```

If the output is `327445566538-compute@developer.gserviceaccount.com` (the default SA), that is fine for the MVP: it has the cloud-platform scope and can read secrets. In Phase 4 (hardening) we will switch to a dedicated SA.

Explicitly grant it secretmanager.secretAccessor (idempotent):

```bash
gcloud projects add-iam-policy-binding internal-prod \
  --member="serviceAccount:327445566538-compute@developer.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor" \
  --condition=None
```

- [ ] **Step 3: Upload the fetch-env script to the VM**

```bash
gcloud compute scp \
  "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss/scripts/fetch-env-from-secrets.sh" \
  data-analyst:/tmp/fetch-env.sh \
  --zone=europe-west1-b --project=internal-prod
```

- [ ] **Step 4: Run the fetch-env script as the deploy user**

```bash
gcloud compute ssh data-analyst --zone=europe-west1-b --project=internal-prod --command="sudo install -m 755 -o deploy -g deploy /tmp/fetch-env.sh /home/deploy/app/fetch-env.sh && sudo -u deploy bash -c 'cd /home/deploy/app && ./fetch-env.sh'"
```

Expected: `Wrote /home/deploy/app/.env (chmod 600)`.

- [ ] **Step 5: Check .env on the VM (without printing values)**

```bash
gcloud compute ssh data-analyst --zone=europe-west1-b --project=internal-prod --command="sudo -u deploy bash -c 'ls -la /home/deploy/app/.env && wc -l /home/deploy/app/.env && cut -d= -f1 /home/deploy/app/.env'"
```

Expected: a file with mode 600, 7 lines, keys: JWT_SECRET_KEY, DATA_DIR, DATA_SOURCE, KEBOOLA_STORAGE_TOKEN, KEBOOLA_STACK_URL, SEED_ADMIN_EMAIL, LOG_LEVEL.

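
The manual comparison in Step 5 could also be automated. A minimal sketch, assuming the seven keys listed in the Expected line; `missing_env_keys` and `EXPECTED_KEYS` are hypothetical names, and the function only ever prints key names, never values.

```bash
# Keys the plan expects in .env (taken from the Expected line of Step 5).
EXPECTED_KEYS="JWT_SECRET_KEY DATA_DIR DATA_SOURCE KEBOOLA_STORAGE_TOKEN KEBOOLA_STACK_URL SEED_ADMIN_EMAIL LOG_LEVEL"

# Print every expected key that is missing from the given env file.
# Returns non-zero if at least one key is missing.
missing_env_keys() {
  local env_file="$1" key missing=0
  for key in $EXPECTED_KEYS; do
    if ! grep -q "^${key}=" "$env_file"; then
      echo "$key"
      missing=1
    fi
  done
  return "$missing"
}
```

Run on the VM as `missing_env_keys /home/deploy/app/.env`; empty output and exit status 0 would mean all seven keys are present.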
- [ ] **Step 6: Update the docker-compose.yml configuration on the VM to pull from GHCR**

```bash
gcloud compute ssh data-analyst --zone=europe-west1-b --project=internal-prod --command="sudo -u deploy bash -c 'cd /home/deploy/app && git fetch origin feature/v2-fastapi-duckdb-docker-cli && git reset --hard origin/feature/v2-fastapi-duckdb-docker-cli'"
```

**Warning:** The VM still has the old remote `ZdenekSrotyr/tmp_oss`, so this will not work if that repo has already been deleted. Alternative: point the origin remote at keboola/agnes-the-ai-analyst:

```bash
gcloud compute ssh data-analyst --zone=europe-west1-b --project=internal-prod --command="sudo -u deploy bash -c 'cd /home/deploy/app && git remote set-url origin https://github.com/keboola/agnes-the-ai-analyst.git && git fetch origin main && git reset --hard origin/main'"
```

Expected: `HEAD is now at <sha> <message>`.

- [ ] **Step 7: Pull the image from GHCR and start with the new override**

```bash
gcloud compute ssh data-analyst --zone=europe-west1-b --project=internal-prod --command="sudo -u deploy bash -c 'cd /home/deploy/app && export AGNES_TAG=stable && docker compose -f docker-compose.yml -f docker-compose.prod.yml pull && docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d'"
```

Expected: `Container app-app-1 Started`, `Container app-scheduler-1 Started`.

- [ ] **Step 8: Verify that it runs**

```bash
# Wait 30 seconds
sleep 30
# The URL is quoted so the <redacted-ip> placeholder is not parsed as a shell redirection
curl -s --max-time 10 "http://<redacted-ip>:8000/api/health" | python3 -m json.tool | head -10
```

Expected: `"status": "healthy"` or `"degraded"` (stale tables are OK). Not `connection refused`.

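
Instead of a fixed `sleep 30`, the health check in Step 8 could be polled until it succeeds. The helper below is a generic sketch; `retry_until` is a hypothetical name and not part of the repo.

```bash
# Retry a command until it succeeds or the attempts run out.
# usage: retry_until <max_attempts> <delay_seconds> <command...>
retry_until() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if (( attempt >= max )); then
      echo "gave up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$(( attempt + 1 ))
    sleep "$delay"
  done
}
```

For example, `retry_until 10 5 curl -fsS --max-time 10 "http://${VM_IP}:8000/api/health"` would try for up to roughly a minute, where `VM_IP` is an assumed variable holding the redacted VM address. The `-f` flag makes curl exit non-zero on HTTP errors, so a 5xx response counts as a failed attempt.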
- [ ] **Step 9: Verify that the app uses the new image**

```bash
gcloud compute ssh data-analyst --zone=europe-west1-b --project=internal-prod --command="sudo docker inspect app-app-1 --format '{{.Config.Image}}'"
```

Expected: `ghcr.io/keboola/agnes-the-ai-analyst:stable` (not `app-app`).

- [ ] **Step 10: Verify login**

```bash
curl -sS --max-time 5 -X POST "http://<redacted-ip>:8000/auth/password/login" \
  -H "Content-Type: application/json" \
  -d '{"email":"zdenek.srotyr@keboola.com","password":"1234"}' 2>&1 | python3 -c "import sys,json; d=json.load(sys.stdin); print('OK - role:', d.get('role'))"
```

Expected: `OK - role: admin`.

- [ ] **Step 11: Add a note about the new .env strategy to the documentation**

Add to `docs/DEPLOYMENT.md` (if not present), under "Production environment":

```markdown
## Production .env strategy

Secrets (KEBOOLA_STORAGE_TOKEN, JWT_SECRET_KEY) are fetched from GCP Secret Manager
by `scripts/fetch-env-from-secrets.sh` during VM boot. Non-secret config (STACK_URL,
SEED_ADMIN_EMAIL, LOG_LEVEL) is passed via env vars in the startup script.

To rotate a secret:
1. Add a new version via `gcloud secrets versions add ...`
2. SSH to VM and re-run `./fetch-env.sh`
3. Restart: `docker compose up -d --force-recreate app`
```

- [ ] **Step 12: Commit the documentation**

```bash
git add docs/DEPLOYMENT.md
git commit -m "docs: document Secret Manager-backed .env for production"
```

### Task 1.7: Repeat the MVP deploy on the dev VM

- [ ] **Step 1: Repeat Task 1.6 Steps 1-10 against the data-analyst-dev VM**

Use the same commands; just replace `data-analyst` with `data-analyst-dev` and the prod VM IP with the dev VM IP (both appear as `<redacted-ip>` in this document).

- [ ] **Step 2: Verify**

```bash
curl -s --max-time 10 "http://<redacted-ip>:8000/api/health" | python3 -m json.tool | head -3
```

Expected: valid JSON containing `"status"`.

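
Since Task 1.7 repeats the Task 1.6 commands with only the VM name changed, a small wrapper could take the VM as a parameter. The sketch below is an assumption-heavy illustration: `on_vm` and `DRY_RUN` are hypothetical names, and it assumes the dev VM lives in the same project and zone as prod. By default it only prints the command it would run.

```bash
# Project and zone as used throughout this plan; the VM name is the only parameter.
PROJECT="internal-prod"
ZONE="europe-west1-b"

# Print (DRY_RUN=1, the default) or execute a command on the given VM as the deploy user.
# The remote command must not contain single quotes, because it is wrapped in them.
on_vm() {
  local vm="$1" remote_cmd="$2"
  local -a cmd=(gcloud compute ssh "$vm" "--zone=${ZONE}" "--project=${PROJECT}"
                "--command=sudo -u deploy bash -c '${remote_cmd}'")
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    printf '%q ' "${cmd[@]}"
    printf '\n'
  else
    "${cmd[@]}"
  fi
}
```

With this, `on_vm data-analyst-dev 'cd /home/deploy/app && docker compose down'` prints the dev-VM variant of Task 1.6 Step 1, and `DRY_RUN=0` in front of the call would actually execute it.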
### Task 1.8: Delete the personal fork

- [ ] **Step 1: Remove the deploy key from `ZdenekSrotyr/tmp_oss` (if it exists)**

```bash
gh api repos/ZdenekSrotyr/tmp_oss/keys 2>&1 | python3 -m json.tool
```

If it returns anything, delete it: `gh api -X DELETE repos/ZdenekSrotyr/tmp_oss/keys/<id>`.

- [ ] **Step 2: Delete the repo**

```bash
gh repo delete ZdenekSrotyr/tmp_oss --yes
```

Expected: `✓ Deleted repository ZdenekSrotyr/tmp_oss`.

- [ ] **Step 3: Verify that it is gone**

```bash
gh api repos/ZdenekSrotyr/tmp_oss 2>&1 | head -2
```

Expected: `Not Found (HTTP 404)`.

### Task 1.9: Invalidate the old Keboola token

- [ ] **Step 1: Revoke the old master token in the Keboola UI**

(Manual step in the Keboola UI. The new token is already in Secret Manager from Task 1.3.)

Verify that the new token version works:

```bash
curl -s --max-time 10 "http://<redacted-ip>:8000/api/sync/status" 2>&1 | python3 -m json.tool | head -20
```

Expected: some valid JSON. If you get `401 Unauthorized` or `Invalid token`, the app still has the old token cached; restart it:

```bash
gcloud compute ssh data-analyst --zone=europe-west1-b --project=internal-prod --command="sudo -u deploy bash -c 'cd /home/deploy/app && docker compose restart app'"
```

### Task 1.10: Checkpoint (Phase 1 complete)

- [ ] **Step 1: Change the password from `1234` to something strong**

Via the UI, or:

```bash
read -rs NEW_PASSWORD
TOKEN=$(curl -sS -X POST "http://<redacted-ip>:8000/auth/password/login" \
  -H "Content-Type: application/json" \
  -d '{"email":"zdenek.srotyr@keboola.com","password":"1234"}' | python3 -c "import sys,json;print(json.load(sys.stdin)['access_token'])")
# [Option: use an admin endpoint to change the password, if one exists; otherwise use the UI]
unset NEW_PASSWORD TOKEN
```

- [ ] **Step 2: Verify the state**

Go through the checklist:
- [ ] Prod VM `data-analyst` runs from `ghcr.io/...:stable`
- [ ] Dev VM `data-analyst-dev` runs from `ghcr.io/...:stable`
- [ ] Secrets are in GCP Secret Manager
- [ ] The admin user's password is not `1234`
- [ ] `ZdenekSrotyr/tmp_oss` is deleted
- [ ] The old Keboola token is invalidated

---

## Phase 2: TF module + persistent disk + F1 rebuild

**Phase goal:** The Keboola instance runs on VMs managed by the Terraform module in `infra/modules/customer-instance/`. Data lives on a separate persistent disk. TF state is stored in a GCS bucket.

### Task 2.1: Refactor `infra/main.tf` into a modular structure

**Files:**

- Create: `infra/modules/customer-instance/main.tf`
- Create: `infra/modules/customer-instance/variables.tf`
- Create: `infra/modules/customer-instance/outputs.tf`
- Create: `infra/modules/customer-instance/startup-script.sh`
- Delete: `infra/main.tf` (old monolith)
- Keep (modified): `infra/variables.tf`, `infra/outputs.tf`, `infra/terraform.tfvars.example`
- Create: `infra/examples/minimal/main.tf` (usage example)

- [ ] **Step 1: Create the directory structure**

```bash
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
mkdir -p infra/modules/customer-instance
mkdir -p infra/examples/minimal
```

- [ ] **Step 2: Write `infra/modules/customer-instance/variables.tf`**

Write:

```hcl
variable "gcp_project_id" {
  description = "GCP project ID where the instance will be deployed"
  type        = string
}

variable "region" {
  description = "GCP region"
  type        = string
  default     = "europe-west1"
}

variable "zone" {
  description = "GCP zone"
  type        = string
  default     = "europe-west1-b"
}

variable "customer_name" {
  description = "Short customer identifier (e.g. keboola, another-customer). Used as a prefix for resource names."
  type        = string
  validation {
    condition     = can(regex("^[a-z][a-z0-9-]{1,20}$", var.customer_name))
    error_message = "The customer_name value must be lowercase, start with a letter, and be 2-21 characters long."
  }
}

variable "prod_instance" {
  description = "Prod VM configuration"
  type = object({
    name         = string
    machine_type = optional(string, "e2-small")
    disk_size_gb = optional(number, 30)
    data_disk_gb = optional(number, 50)
    image_tag    = optional(string, "stable")
    upgrade_mode = optional(string, "auto")
    tls_mode     = optional(string, "caddy")
    domain       = optional(string, "")
  })
}

variable "dev_instances" {
  description = "List of dev VMs. An empty list means no dev VMs."
  type = list(object({
    name         = string
    machine_type = optional(string, "e2-small")
    image_tag    = optional(string, "dev")
  }))
  default = []
}

variable "seed_admin_email" {
  description = "Email of the first admin user"
  type        = string
}

variable "data_source" {
  description = "Data source type: keboola | bigquery | csv"
  type        = string
  default     = "keboola"
}

variable "keboola_stack_url" {
  description = "Keboola Stack URL (when data_source = keboola)"
  type        = string
  default     = ""
}

variable "image_repo" {
  description = "Docker image repo"
  type        = string
  default     = "ghcr.io/keboola/agnes-the-ai-analyst"
}
```

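The `customer_name` rule can also be checked outside Terraform, for example before writing a `terraform.tfvars`. The following is a minimal sketch that mirrors the same regular expression; the helper name `is_valid_customer_name` is hypothetical and not part of the module.

```python
import re

# Same pattern as the Terraform validation block: one lowercase letter
# followed by 1-20 lowercase letters, digits, or hyphens (2-21 chars total).
CUSTOMER_NAME_RE = re.compile(r"^[a-z][a-z0-9-]{1,20}$")


def is_valid_customer_name(name: str) -> bool:
    """Return True if `name` would pass the module's validation rule."""
    return CUSTOMER_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("keboola", "another-customer", "Acme", "x", "9lives"):
        print(f"{candidate!r}: {is_valid_customer_name(candidate)}")
```

The 21-character cap matters downstream: the module builds the service account ID as `agnes-<customer_name>-vm`, which at the maximum name length is exactly 30 characters.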
- [ ] **Step 3: Write `infra/modules/customer-instance/main.tf`**

Write:

```hcl
terraform {
  required_version = ">= 1.5"
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0"
    }
  }
}

locals {
  all_instances = concat(
    [merge(var.prod_instance, { role = "prod" })],
    [for d in var.dev_instances : merge(d, {
      role         = "dev"
      disk_size_gb = 30
      data_disk_gb = 20
      upgrade_mode = "auto"
      tls_mode     = "caddy"
      domain       = ""
    })]
  )
}

# --- Secrets ---

resource "google_secret_manager_secret" "jwt" {
  secret_id = "agnes-${var.customer_name}-jwt-secret"
  project   = var.gcp_project_id
  replication {
    auto {}
  }
}

resource "random_password" "jwt" {
  length  = 48
  special = false
}

resource "google_secret_manager_secret_version" "jwt" {
  secret      = google_secret_manager_secret.jwt.id
  secret_data = random_password.jwt.result
}

# Keboola token: a manually created secret (this TF config only references it).
data "google_secret_manager_secret_version" "keboola_token" {
  count   = var.data_source == "keboola" ? 1 : 0
  secret  = "keboola-storage-token"
  project = var.gcp_project_id
}

# --- VM service account (dedicated; access is limited by IAM roles, not by OAuth scopes) ---

resource "google_service_account" "vm" {
  account_id   = "agnes-${var.customer_name}-vm"
  display_name = "Agnes VM runtime SA (${var.customer_name})"
  project      = var.gcp_project_id
}

resource "google_project_iam_member" "vm_secrets" {
  project = var.gcp_project_id
  role    = "roles/secretmanager.secretAccessor"
  member  = "serviceAccount:${google_service_account.vm.email}"
}

# --- Network ---

resource "google_compute_firewall" "web" {
  name    = "agnes-${var.customer_name}-allow-web"
  project = var.gcp_project_id
  network = "default"

  allow {
    protocol = "tcp"
    ports    = ["22", "80", "443", "8000"]
  }

  source_ranges = ["<redacted-ip>/0"]
  target_tags   = ["agnes-${var.customer_name}"]
}

# --- Persistent data disks + VMs (prod + dev) ---

resource "google_compute_disk" "data" {
  for_each = { for inst in local.all_instances : inst.name => inst }

  name    = "${each.value.name}-data"
  project = var.gcp_project_id
  zone    = var.zone
  size    = each.value.data_disk_gb
  type    = "pd-ssd"
}

resource "google_compute_address" "ip" {
  for_each = { for inst in local.all_instances : inst.name => inst }

  name    = "${each.value.name}-ip"
  project = var.gcp_project_id
  region  = var.region
}

resource "google_compute_instance" "vm" {
  for_each = { for inst in local.all_instances : inst.name => inst }

  name         = each.value.name
  project      = var.gcp_project_id
  machine_type = each.value.machine_type
  zone         = var.zone
  tags         = ["agnes-${var.customer_name}"]

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2404-lts-amd64"
      size  = each.value.disk_size_gb
      type  = "pd-ssd"
    }
  }

  attached_disk {
    source      = google_compute_disk.data[each.key].self_link
    device_name = "data"
  }

  network_interface {
    network = "default"
    access_config {
      nat_ip = google_compute_address.ip[each.key].address
    }
  }

  metadata = {
    enable-oslogin = "TRUE"
  }

  metadata_startup_script = templatefile("${path.module}/startup-script.sh", {
    customer_name     = var.customer_name
    image_repo        = var.image_repo
    image_tag         = each.value.image_tag
    upgrade_mode      = each.value.upgrade_mode
    tls_mode          = each.value.tls_mode
    domain            = each.value.domain
    data_source       = var.data_source
    keboola_stack_url = var.keboola_stack_url
    seed_admin_email  = var.seed_admin_email
    role              = each.value.role
  })

  service_account {
    email  = google_service_account.vm.email
    scopes = ["cloud-platform"]
  }

  labels = {
    app      = "agnes"
    customer = var.customer_name
    role     = each.value.role
    managed  = "terraform"
  }

  lifecycle {
    ignore_changes = [metadata_startup_script]
  }
}
```

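To see what `local.all_instances` evaluates to, the `concat`/`merge` expression can be mirrored in plain Python. This is only an illustration of the resulting data shape: `build_all_instances` is a hypothetical helper, and the default dictionaries are copied by hand from `variables.tf`, since in Terraform the `optional(...)` defaults are applied by the type system rather than by `merge`.

```python
# Defaults copied by hand from the optional(...) attributes in variables.tf.
PROD_DEFAULTS = {
    "machine_type": "e2-small", "disk_size_gb": 30, "data_disk_gb": 50,
    "image_tag": "stable", "upgrade_mode": "auto", "tls_mode": "caddy",
    "domain": "",
}
DEV_DEFAULTS = {"machine_type": "e2-small", "image_tag": "dev"}

# Values the module hard-codes for every dev instance inside `locals`.
DEV_FIXED = {
    "role": "dev", "disk_size_gb": 30, "data_disk_gb": 20,
    "upgrade_mode": "auto", "tls_mode": "caddy", "domain": "",
}


def build_all_instances(prod: dict, devs: list) -> list:
    """Mirror concat([merge(prod, {role})], [for d in devs : merge(d, {...})])."""
    prod_instance = {**PROD_DEFAULTS, **prod, "role": "prod"}
    dev_instances = [{**DEV_DEFAULTS, **dev, **DEV_FIXED} for dev in devs]
    return [prod_instance] + dev_instances


if __name__ == "__main__":
    instances = build_all_instances({"name": "agnes-prod"}, [{"name": "agnes-dev"}])
    # The resources use `for_each` keyed by instance name, like this dict.
    by_name = {inst["name"]: inst for inst in instances}
    for name, inst in by_name.items():
        print(name, inst["role"], inst["data_disk_gb"], inst["image_tag"])
```

Because every resource is keyed by `inst.name`, instance names have to be unique across prod and dev; a duplicate name would make the `for` expression fail in Terraform.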
- [ ] **Step 4: Write `infra/modules/customer-instance/startup-script.sh`**

Write:

```bash
#!/bin/bash
# Agnes VM startup script.
# Idempotent: runs on every boot.
set -euo pipefail
exec > /var/log/agnes-startup.log 2>&1

CUSTOMER_NAME="${customer_name}"
IMAGE_REPO="${image_repo}"
IMAGE_TAG="${image_tag}"
UPGRADE_MODE="${upgrade_mode}"
TLS_MODE="${tls_mode}"
DOMAIN="${domain}"
DATA_SOURCE="${data_source}"
KEBOOLA_STACK_URL="${keboola_stack_url}"
SEED_ADMIN_EMAIL="${seed_admin_email}"
ROLE="${role}"

echo "=== [Agnes $CUSTOMER_NAME $ROLE] Startup ==="

# --- 1. Docker (install if missing) ---
if ! command -v docker &>/dev/null; then
  curl -fsSL https://get.docker.com | sh
fi
if ! docker compose version &>/dev/null; then
  apt-get update && apt-get install -y docker-compose-plugin
fi

# --- 2. Persistent disk mount ---
DATA_DEV="/dev/disk/by-id/google-data"
DATA_MNT="/data"
if [ -b "$DATA_DEV" ]; then
  if ! blkid "$DATA_DEV" | grep -q ext4; then
    mkfs.ext4 -F "$DATA_DEV"
  fi
  mkdir -p "$DATA_MNT"
  mountpoint -q "$DATA_MNT" || mount -o discard,defaults "$DATA_DEV" "$DATA_MNT"
  grep -q "$DATA_DEV" /etc/fstab || echo "$DATA_DEV $DATA_MNT ext4 discard,defaults,nofail 0 2" >> /etc/fstab
  mkdir -p "$DATA_MNT/state" "$DATA_MNT/analytics" "$DATA_MNT/extracts"
fi

# --- 3. App directory (for docker-compose.yml) ---
APP_DIR="/opt/agnes"
mkdir -p "$APP_DIR"
cd "$APP_DIR"

# Fetch the minimal docker-compose files from the public repo (the URLs below point at `main`)
curl -fsSL "https://raw.githubusercontent.com/keboola/agnes-the-ai-analyst/main/docker-compose.yml" \
  -o docker-compose.yml
curl -fsSL "https://raw.githubusercontent.com/keboola/agnes-the-ai-analyst/main/docker-compose.prod.yml" \
  -o docker-compose.prod.yml

# --- 4. Fetch secrets from Secret Manager ---
KEBOOLA_TOKEN=""
if [ "$DATA_SOURCE" = "keboola" ]; then
  KEBOOLA_TOKEN=$(gcloud secrets versions access latest --secret=keboola-storage-token 2>/dev/null || echo "")
fi
JWT_KEY=$(gcloud secrets versions access latest --secret=agnes-$CUSTOMER_NAME-jwt-secret)

cat > "$APP_DIR/.env" <<EOF
JWT_SECRET_KEY=$JWT_KEY
DATA_DIR=$DATA_MNT
DATA_SOURCE=$DATA_SOURCE
KEBOOLA_STORAGE_TOKEN=$KEBOOLA_TOKEN
KEBOOLA_STACK_URL=$KEBOOLA_STACK_URL
SEED_ADMIN_EMAIL=$SEED_ADMIN_EMAIL
LOG_LEVEL=info
DOMAIN=$DOMAIN
AGNES_TAG=$IMAGE_TAG
EOF
chmod 600 "$APP_DIR/.env"

# --- 5. Start Agnes ---
docker compose -f docker-compose.yml -f docker-compose.prod.yml pull
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d

# --- 6. Watchtower (auto-pull of new images) ---
if [ "$UPGRADE_MODE" = "auto" ]; then
  docker run -d --name watchtower --restart=unless-stopped \
    -v /var/run/docker.sock:/var/run/docker.sock \
    containrrr/watchtower \
    --interval 300 --cleanup \
    $(docker ps --filter "ancestor=$IMAGE_REPO:$IMAGE_TAG" --format "{{.Names}}") 2>/dev/null || true
fi

echo "=== [Agnes $CUSTOMER_NAME $ROLE] Startup complete ==="
docker compose ps
```

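The script stays idempotent across reboots largely through one pattern: check first, then change. The `/etc/fstab` line, for instance, is appended only when the device is not already listed. A minimal Python sketch of that check-then-append pattern against an arbitrary file follows; `ensure_line` is a hypothetical helper for illustration, not something the startup script uses.

```python
from pathlib import Path


def ensure_line(path: Path, needle: str, line: str) -> bool:
    """Append `line` to `path` unless an existing line already contains `needle`.

    Returns True if the file was modified, False if it was already up to date.
    """
    text = path.read_text() if path.exists() else ""
    if any(needle in row for row in text.splitlines()):
        return False
    with path.open("a") as handle:
        # Avoid gluing the new entry onto an unterminated last line.
        if text and not text.endswith("\n"):
            handle.write("\n")
        handle.write(line.rstrip("\n") + "\n")
    return True
```

Calling it twice with the same arguments modifies the file only the first time, which is the property the `grep -q ... || echo ... >> /etc/fstab` line relies on.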
- [ ] **Step 5: Write `infra/modules/customer-instance/outputs.tf`**

Write:

```hcl
output "instance_ips" {
  description = "Map of instance name to external IP"
  value       = { for k, v in google_compute_address.ip : k => v.address }
}

output "prod_ip" {
  description = "External IP of the prod instance"
  value       = google_compute_address.ip[var.prod_instance.name].address
}

output "vm_service_account" {
  description = "Email of the VM SA (for additional IAM bindings, e.g. BigQuery)"
  value       = google_service_account.vm.email
}

output "jwt_secret_name" {
  description = "Full name of the JWT secret in Secret Manager"
  value       = google_secret_manager_secret.jwt.name
}
```

- [ ] **Step 6: Remove the old `infra/main.tf`, keeping it as a backup**

```bash
mv infra/main.tf infra/main.tf.backup-pre-module
```

- [ ] **Step 7: Create `infra/examples/minimal/main.tf`**

Write:

```hcl
# Minimal example: single-VM Agnes deploy.
# For an OSS self-hoster who needs no dev VM (the module still creates a data disk for prod).
terraform {
  required_version = ">= 1.5"
  required_providers {
    google = { source = "hashicorp/google", version = "~> 5.0" }
  }
}

provider "google" {
  project = var.gcp_project_id
  region  = "europe-west1"
}

variable "gcp_project_id" {
  type = string
}

module "agnes" {
  source = "../../modules/customer-instance"

  gcp_project_id   = var.gcp_project_id
  customer_name    = "self-hosted"
  seed_admin_email = "admin@example.com"

  prod_instance = {
    name         = "agnes"
    data_disk_gb = 30
  }

  dev_instances = []

  data_source = "keboola"
}

output "agnes_ip" {
  value = module.agnes.prod_ip
}
```

- [ ] **Step 8: Remove `infra/variables.tf`, `infra/outputs.tf`, `infra/terraform.tfvars.example` (they now belong in the module / examples)**

```bash
# Keep backups
mv infra/variables.tf infra/variables.tf.backup-pre-module
mv infra/outputs.tf infra/outputs.tf.backup-pre-module
mv infra/terraform.tfvars.example infra/terraform.tfvars.example.backup-pre-module
```

- [ ] **Step 9: `terraform init` + `validate` in the example**

```bash
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss/infra/examples/minimal"
terraform init -backend=false
terraform validate
```

Expected: `Success! The configuration is valid.`

- [ ] **Step 10: Commit**

```bash
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
git add infra/modules/ infra/examples/
git add -u infra/ # stage the files moved to backups
git commit -m "infra: extract customer-instance Terraform module; add minimal example"
```

### Task 2.2: Tag the first release of the TF module

- [ ] **Step 1: Open a PR from the feature branch into main**

```bash
git push origin feature/v2-fastapi-duckdb-docker-cli
gh pr create --title "feat: multi-customer deployment (Phases 1-2)" \
  --body "Implements Phases 1-2 of docs/superpowers/plans/2026-04-21-multi-customer-deployment.md"
```

- [ ] **Step 2: After the merge into main, create the tag `infra-v1.0.0`**

```bash
git checkout main
git pull
git tag -a infra-v1.0.0 -m "Initial customer-instance module release"
git push origin infra-v1.0.0
```

### Task 2.3: Create the private repo `keboola/agnes-infra-keboola` (manually)

**This step happens outside this repo. The plan only describes it.**

- [ ] **Step 1: Create an empty private repo**

```bash
gh repo create keboola/agnes-infra-keboola --private --description "Agnes deployment - Keboola internal instance"
```

- [ ] **Step 2: Clone it locally next to this repo**

```bash
cd ~/Library/Mobile\ Documents/com\~apple\~CloudDocs/Sources/VsCode/component_factory/
gh repo clone keboola/agnes-infra-keboola
cd agnes-infra-keboola
```

- [ ] **Step 3: Create the structure**

```bash
mkdir -p terraform .github/workflows config

# Terraform root
cat > terraform/main.tf <<'EOF'
terraform {
  required_version = ">= 1.5"
  required_providers {
    google = { source = "hashicorp/google", version = "~> 5.0" }
  }
  backend "gcs" {
    bucket = "agnes-internal-prod-tfstate"
    prefix = "keboola"
  }
}

provider "google" {
  project = var.gcp_project_id
  region  = var.region
  zone    = var.zone
}

module "agnes" {
  source = "github.com/keboola/agnes-the-ai-analyst//infra/modules/customer-instance?ref=infra-v1.0.0"

  gcp_project_id    = var.gcp_project_id
  region            = var.region
  zone              = var.zone
  customer_name     = "keboola"
  seed_admin_email  = var.seed_admin_email
  data_source       = "keboola"
  keboola_stack_url = var.keboola_stack_url
  prod_instance     = var.prod_instance
  dev_instances     = var.dev_instances
}

output "prod_ip" { value = module.agnes.prod_ip }
output "instance_ips" { value = module.agnes.instance_ips }
EOF

# Blocks with more than one argument must be multi-line in HCL
cat > terraform/variables.tf <<'EOF'
variable "gcp_project_id" { type = string }
variable "region" {
  type    = string
  default = "europe-west1"
}
variable "zone" {
  type    = string
  default = "europe-west1-b"
}
variable "seed_admin_email" { type = string }
variable "keboola_stack_url" { type = string }
variable "prod_instance" { type = any }
variable "dev_instances" {
  type    = any
  default = []
}
EOF

cat > terraform/terraform.tfvars.example <<'EOF'
gcp_project_id    = "internal-prod"
seed_admin_email  = "zdenek.srotyr@keboola.com"
keboola_stack_url = "https://connection.us-east4.gcp.keboola.com/"

prod_instance = {
  name         = "agnes-prod"
  machine_type = "e2-small"
  data_disk_gb = 50
  image_tag    = "stable"
  upgrade_mode = "auto"
  tls_mode     = "caddy"
  domain       = ""
}

dev_instances = [
  { name = "agnes-dev", image_tag = "dev" }
]
EOF

cat > terraform/.gitignore <<'EOF'
terraform.tfvars
*.tfstate
*.tfstate.*
.terraform/
.terraform.lock.hcl
EOF

cp terraform/terraform.tfvars.example terraform/terraform.tfvars
# Edit terraform.tfvars to use the real values if they differ
```

- [ ] **Step 4: Initial commit**

```bash
git add .
git commit -m "initial: Keboola-as-customer Agnes deployment"
git push -u origin main
```

- [ ] **Step 5: Upload GCP_SA_KEY as a GitHub secret**

```bash
# Key created in Task 1.2, Step 3
gh secret set GCP_SA_KEY --repo keboola/agnes-infra-keboola \
  < ../tmp_oss/agnes-deploy-internal-prod-key.json
```

**Note:** If the key has already been deleted, regenerate it: `gcloud iam service-accounts keys create ...`.

- [ ] **Step 6: First terraform init + plan (locally, to see the diff)**

```bash
cd terraform
export GOOGLE_APPLICATION_CREDENTIALS="../agnes-deploy-key.json"
terraform init
terraform plan
```

Expected: `Plan: N to add, 0 to change, 0 to destroy.` (N is roughly 15-20 resources)

Check the plan: there must be no `destroy` on the existing `data-analyst` / `data-analyst-dev` VMs (that happens only once the new ones are up).

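The "no destroy" check on the plan summary can be scripted instead of eyeballed. Below is a minimal sketch, assuming the summary line has the `Plan: N to add, N to change, N to destroy.` form shown above (optionally preceded by an `N to import` part) and that the plan was produced with `-no-color`; `parse_plan_summary` is a hypothetical helper name.

```python
import re
from typing import Dict, Optional

# Matches the one-line summary Terraform prints at the end of `terraform plan`.
PLAN_RE = re.compile(
    r"Plan: (?:\d+ to import, )?(\d+) to add, (\d+) to change, (\d+) to destroy\."
)


def parse_plan_summary(text: str) -> Optional[Dict[str, int]]:
    """Extract add/change/destroy counts, or None if no summary line is present."""
    match = PLAN_RE.search(text)
    if match is None:
        return None
    add, change, destroy = (int(group) for group in match.groups())
    return {"add": add, "change": change, "destroy": destroy}


if __name__ == "__main__":
    sample = "...\nPlan: 17 to add, 0 to change, 0 to destroy.\n"
    summary = parse_plan_summary(sample)
    print(summary)
    print("safe to apply" if summary and summary["destroy"] == 0 else "review first")
```

In practice the text would come from something like `terraform plan -no-color > plan.txt`; a `None` result means the plan had no changes or failed before printing a summary.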
### Task 2.4: Migrate data from the old VMs to the new ones (without downtime risk)

**Strategy:** Keep the old VMs running. Terraform creates **new** VMs under different names (`agnes-prod`, `agnes-dev`). The data is copied over. Then we switch DNS/IP (or simply communicate the new IP) and delete the old VMs.

- [ ] **Step 1: Snapshot the old /data**

We already have one from Task 0.2. If that snapshot is older than 24 h, take a new one:

```bash
gcloud compute disks snapshot data-analyst \
  --zone=europe-west1-b \
  --snapshot-names=data-analyst-migration-$(date +%Y%m%d-%H%M) \
  --project=internal-prod
```

- [ ] **Step 2: Terraform apply: creates the new VMs (`agnes-prod`, `agnes-dev`) alongside the old ones**

```bash
cd ~/.../agnes-infra-keboola/terraform
terraform apply
# Type 'yes' to confirm
```

Expected: ~15-20 resources created, ~5 min. Outputs: `prod_ip`, `instance_ips`.

- [ ] **Step 3: Copy the data from the old boot disk to the new persistent disk**

The new VMs have an empty `/data`. We need to copy the state from the `data-analyst` VM into it.

The simplest route is to copy the data over SSH, staged through the local machine.

```bash
# IP of the new prod VM
NEW_PROD_IP=$(cd ~/.../agnes-infra-keboola/terraform && terraform output -raw prod_ip)

# Option: copy an SSH key to the old VM so that it can reach the new one
# (or use OS Login, which is one more prerequisite)

# Alternative used here: stop the app on the old VM, then copy from the local machine
gcloud compute ssh data-analyst --zone=europe-west1-b --project=internal-prod --command="sudo docker compose -f /home/deploy/app/docker-compose.yml -f /home/deploy/app/docker-compose.prod.yml down"

# gcloud compute scp copies between the local machine and a VM, not VM to VM,
# so stage the data in a local temporary directory first
STAGING_DIR=$(mktemp -d)
gcloud compute scp --recurse --zone=europe-west1-b --project=internal-prod \
  data-analyst:/home/deploy/app/data-volume/ "$STAGING_DIR/"
gcloud compute scp --recurse --zone=europe-west1-b --project=internal-prod \
  "$STAGING_DIR"/data-volume/* agnes-prod:/data/

# Restart the app on the new VM
gcloud compute ssh agnes-prod --zone=europe-west1-b --project=internal-prod --command="sudo docker compose -f /opt/agnes/docker-compose.yml -f /opt/agnes/docker-compose.prod.yml restart"
```

**Alternatively (cleaner):** restore from the snapshot with `gcloud compute disks create --source-snapshot`, then attach that disk in place of the empty data disk.

- [ ] **Step 4: Verify the new prod**

```bash
NEW_PROD_IP=$(cd ~/.../agnes-infra-keboola/terraform && terraform output -raw prod_ip)
curl -s --max-time 10 "http://$NEW_PROD_IP:8000/api/health" | python3 -m json.tool | head -10
```

Expected: healthy / degraded, tables visible.

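Right after `terraform apply` the app may still be starting, so a single `curl` can fail even though the deployment is fine. The sketch below retries the health check a few times. It assumes the endpoint returns a JSON object with a top-level `status` field whose value is `healthy` or `degraded`; that response shape is a guess based on the expected output above, and both function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumption: the health endpoint reports one of these in a "status" field.
ACCEPTED = {"healthy", "degraded"}


def fetch_health(url: str, timeout: float = 10.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def wait_until_healthy(
    fetch: Callable[[], dict],
    attempts: int = 5,
    delay: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `fetch` up to `attempts` times; return True once the status is accepted."""
    for attempt in range(1, attempts + 1):
        try:
            if fetch().get("status") in ACCEPTED:
                return True
        except (OSError, ValueError):
            pass  # connection refused, timeout, or a non-JSON body: retry
        if attempt < attempts:
            sleep(delay)
    return False
```

A call would look like `wait_until_healthy(lambda: fetch_health(f"http://{ip}:8000/api/health"))`. The `fetch` and `sleep` parameters are injectable so the retry logic can be exercised without a network or real waiting.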
- [ ] **Step 5: Verify login on the new prod**

```bash
curl -sS -X POST "http://$NEW_PROD_IP:8000/auth/password/login" \
  -H "Content-Type: application/json" \
  -d '{"email":"zdenek.srotyr@keboola.com","password":"<new strong password from Task 1.10>"}' \
  | python3 -c "import sys,json;print('OK' if json.load(sys.stdin).get('role')=='admin' else 'FAIL')"
```

Expected: `OK`

- [ ] **Step 6: Repeat for the dev VM (`agnes-dev`)**

Same as steps 1-5.

- [ ] **Step 7: Stop the old VMs (do NOT delete them yet, only stop)**

```bash
gcloud compute instances stop data-analyst --zone=europe-west1-b --project=internal-prod
gcloud compute instances stop data-analyst-dev --zone=europe-west1-b --project=internal-prod
```

- [ ] **Step 8: Verify that the new prod runs for at least 24 h without problems**

```bash
# Reminder in the calendar / Slack: "check agnes-prod health in 24h"
curl -s "http://$NEW_PROD_IP:8000/api/health" | python3 -m json.tool
```

- [ ] **Step 9: After 24 h of stability, delete the old VMs, their disks, and the static IP**

```bash
gcloud compute instances delete data-analyst --zone=europe-west1-b --project=internal-prod --quiet
gcloud compute instances delete data-analyst-dev --zone=europe-west1-b --project=internal-prod --quiet

gcloud compute disks delete data-analyst --zone=europe-west1-b --project=internal-prod --quiet 2>&1 || true
gcloud compute disks delete data-analyst-dev --zone=europe-west1-b --project=internal-prod --quiet 2>&1 || true

gcloud compute addresses delete data-analyst-ip --region=europe-west1 --project=internal-prod --quiet 2>&1 || true
```

- [ ] **Step 10: Checkpoint (Phase 2 done)**

Checklist:

- [ ] Terraform module is in `infra/modules/customer-instance/`
- [ ] The private repo `keboola/agnes-infra-keboola` exists and `terraform apply` works
- [ ] Prod VM `agnes-prod` runs with a persistent disk
- [ ] Dev VM `agnes-dev` runs
- [ ] Data is migrated and login works
- [ ] The old VMs are deleted and the project is cleaned up

**After Phase 2, Phases 3, 4, and 5 can proceed in parallel.**

---

## Phase 3: TLS via Caddy

**Phase goal:** Agnes is reachable over HTTPS with an automatic Let's Encrypt certificate. The `secure=True` cookie works.

### Task 3.1: Add a Caddy service to docker-compose

**Files:**

- Create: `Caddyfile` (in the public repo root)
- Modify: `docker-compose.prod.yml` (add the caddy service)

- [ ] **Step 1: Create the Caddyfile**

Write `Caddyfile`:

```
# Agnes reverse proxy with automatic Let's Encrypt.
# Configured via env vars: AGNES_DOMAIN, ACME_EMAIL.

{$AGNES_DOMAIN} {
    # Health check endpoint without TLS redirect (for internal smoke tests)
    @health path /api/health

    encode gzip

    reverse_proxy app:8000 {
        header_up X-Forwarded-Proto https
    }

    tls {$ACME_EMAIL}

    log {
        output stdout
        format json
    }
}

# Fallback for access by IP (no HTTPS, no certificate)
:80 {
    reverse_proxy app:8000
}
```

- [ ] **Step 2: Add caddy to `docker-compose.prod.yml`**

Add to `services` (if it is not there already):

```yaml
  caddy:
    image: caddy:2-alpine
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
      - caddy_config:/config
    environment:
      AGNES_DOMAIN: ${AGNES_DOMAIN:-:80}
      ACME_EMAIL: ${ACME_EMAIL:-admin@example.com}
    depends_on:
      - app
    profiles:
      - tls  # does not start without --profile tls

volumes:
  caddy_data:
  caddy_config:
```

- [ ] **Step 3: Update the module: pass `tls_mode` to the startup script**

In `infra/modules/customer-instance/startup-script.sh`, find the `# --- 5. Start Agnes ---` section and extend it:

```bash
# --- 5. Start Agnes ---
COMPOSE_PROFILES=""
if [ "$TLS_MODE" = "caddy" ] && [ -n "$DOMAIN" ]; then
  COMPOSE_PROFILES="--profile tls"
  # Extra env vars for Caddy
  {
    echo "AGNES_DOMAIN=$DOMAIN"
    echo "ACME_EMAIL=admin@$${DOMAIN#*.}"
  } >> "$APP_DIR/.env"
fi

docker compose -f docker-compose.yml -f docker-compose.prod.yml $COMPOSE_PROFILES pull
docker compose -f docker-compose.yml -f docker-compose.prod.yml $COMPOSE_PROFILES up -d
```

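The `$${DOMAIN#*.}` expression is Terraform-escaped Bash: `templatefile` renders `$${...}` as a literal `${...}`, and `${DOMAIN#*.}` then removes the shortest prefix up to and including the first dot. A small Python sketch of the value that ends up in `ACME_EMAIL` follows (the function name `acme_email_for` is hypothetical):

```python
def acme_email_for(domain: str) -> str:
    """Mirror `admin@${DOMAIN#*.}`: drop everything up to the first dot."""
    # str.partition returns (head, separator, tail); the separator is empty
    # when there is no dot, in which case Bash leaves the value unchanged.
    _head, separator, tail = domain.partition(".")
    return f"admin@{tail if separator else domain}"


if __name__ == "__main__":
    for domain in ("agnes.keboola.com", "example.com", "localhost"):
        print(domain, "->", acme_email_for(domain))
```

Note the apex-domain case: for `example.com` the expansion yields `admin@com`, so the derived address is only sensible when `DOMAIN` has at least one subdomain label.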
- [ ] **Step 4: Commit the changes in the public repo**

```bash
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
git add Caddyfile docker-compose.prod.yml infra/modules/customer-instance/startup-script.sh
git commit -m "feat(tls): add Caddy reverse proxy with Let's Encrypt support"
```

- [ ] **Step 5: Tag the new module release**

```bash
# After the PR is merged into main
git checkout main && git pull
git tag -a infra-v1.1.0 -m "Add TLS support via Caddy"
git push origin infra-v1.1.0
```

### Task 3.2: Enable TLS for the Keboola instance

**This requires a DNS record. If you have no domain, skip this task and stay on :8000.**

- [ ] **Step 1: In `keboola/agnes-infra-keboola/terraform/terraform.tfvars`, set the domain**

If `agnes.keboola.com` is available (check with IT), edit:

```hcl
prod_instance = {
  name = "agnes-prod"
  # ...
  tls_mode = "caddy"
  domain   = "agnes.keboola.com"
}
```

And in `main.tf`, bump the module ref:

```hcl
source = "github.com/keboola/agnes-the-ai-analyst//infra/modules/customer-instance?ref=infra-v1.1.0"
```

- [ ] **Step 2: Terraform apply**

```bash
cd ~/.../agnes-infra-keboola/terraform
terraform apply
```

- [ ] **Step 3: Set the DNS A record `agnes.keboola.com` → prod_ip**

Manual step (requires access to Keboola DNS). The `prod_ip` output is the IP to use.

- [ ] **Step 4: Wait for DNS propagation and the LE certificate**

```bash
until nslookup agnes.keboola.com | grep -q "$(terraform output -raw prod_ip)"; do sleep 30; done
sleep 60 # allow time for LE certificate issuance
curl -sSI --max-time 10 https://agnes.keboola.com | head -5
```

Expected: `HTTP/2 200` (not 301, not a TLS error).

---

## Phase 4: Watchtower (dev VM auto-deploy), OS Login, VM SA

**Phase goal:** Dev VMs auto-pull new images. SSH goes through OS Login (no personal key). The VM has a dedicated SA.

### Task 4.1: Watchtower integration (already in the Task 2 startup script; verification only here)

- [ ] **Step 1: SSH to the dev VM and verify that watchtower is running**

```bash
gcloud compute ssh agnes-dev --zone=europe-west1-b --project=internal-prod --command="sudo docker ps | grep watchtower"
```

Expected: container `watchtower` with STATUS `Up X minutes`.

- [ ] **Step 2: Test auto-deploy: push a small change to a feature branch and wait**

```bash
# In the public repo
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
git checkout -b feature/watchtower-test
echo "# test" >> README.md
git add README.md
git commit -m "test: trigger :dev image rebuild"
git push origin feature/watchtower-test
```

Wait about 5-10 min (CI build plus the 5 min Watchtower poll interval).

```bash
# Check the image SHA on the dev VM
gcloud compute ssh agnes-dev --zone=europe-west1-b --project=internal-prod \
  --command="sudo docker inspect app-app-1 --format '{{.Image}}' && sudo docker image inspect \$(sudo docker inspect app-app-1 --format '{{.Image}}') --format '{{.Created}}'"
```

Expected: a `Created` timestamp within roughly the last 10 minutes.

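Comparing the `Created` timestamp against the clock by eye is error-prone, so the freshness check can be sketched in Python. This assumes the timestamp is printed in UTC with a `Z` suffix and an optional fractional part of up to nanosecond precision (for example `2026-04-21T12:05:00.123456789Z`); both function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

TIMESTAMP_RE = re.compile(r"(\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d)(?:\.(\d+))?Z")


def parse_docker_time(value: str) -> datetime:
    """Parse a UTC timestamp, truncating any nanoseconds to microseconds."""
    match = TIMESTAMP_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unexpected timestamp format: {value!r}")
    base, fraction = match.groups()
    micros = int((fraction or "0")[:6].ljust(6, "0"))
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def is_fresh(created: str, max_age_minutes: int = 10,
             now: Optional[datetime] = None) -> bool:
    """Return True if `created` lies within the last `max_age_minutes`."""
    now = now or datetime.now(timezone.utc)
    age = now - parse_docker_time(created)
    return timedelta(0) <= age <= timedelta(minutes=max_age_minutes)
```

The `now` parameter exists so the check can be tested with a fixed clock; in real use it is left at its default.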
### Task 4.2: OS Login

- [ ] **Step 1: Verify that the module sets `enable-oslogin=TRUE`**

It is already in `infra/modules/customer-instance/main.tf`:

```hcl
metadata = {
  enable-oslogin = "TRUE"
}
```

- [ ] **Step 2: Check that users have `roles/compute.osAdminLogin` on the project**

```bash
gcloud projects get-iam-policy internal-prod \
  --flatten="bindings[].members" \
  --filter="bindings.role=roles/compute.osAdminLogin" \
  --format="value(bindings.members)"
```

If the output is empty, add the binding:

```bash
gcloud projects add-iam-policy-binding internal-prod \
  --member=user:zdenek.srotyr@keboola.com \
  --role=roles/compute.osAdminLogin
```

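The same lookup can be done on the policy document itself, which is handy when checking several roles at once. The sketch below works on the JSON that `gcloud projects get-iam-policy internal-prod --format=json` prints, which has a top-level `bindings` list of `role` / `members` entries; `members_with_role` is a hypothetical helper name and the sample policy is made up for illustration.

```python
import json
from typing import List


def members_with_role(policy: dict, role: str) -> List[str]:
    """Return the sorted, de-duplicated members bound to `role` in an IAM policy."""
    members = set()
    for binding in policy.get("bindings", []):
        if binding.get("role") == role:
            members.update(binding.get("members", []))
    return sorted(members)


if __name__ == "__main__":
    # Made-up sample in the shape of `get-iam-policy --format=json` output.
    sample = json.loads("""
    {"bindings": [
      {"role": "roles/compute.osAdminLogin", "members": ["user:admin@example.com"]},
      {"role": "roles/viewer", "members": ["user:viewer@example.com"]}
    ]}
    """)
    print(members_with_role(sample, "roles/compute.osAdminLogin"))
```

An empty list corresponds to the empty command output above, meaning the binding still has to be added.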
- [ ] **Step 3: Test SSH via OS Login**

```bash
gcloud compute ssh agnes-prod --zone=europe-west1-b --project=internal-prod --command="whoami"
```

Expected: a username of the form `zdenek_srotyr_keboola_com` (generated by OS Login).

### Task 4.3: VM SA already has the right scope (verify)

- [ ] **Step 1: Verify that the VM SA has only secretmanager.secretAccessor**

```bash
gcloud projects get-iam-policy internal-prod \
  --flatten="bindings[].members" \
  --filter="bindings.members:agnes-keboola-vm@" \
  --format="value(bindings.role)"
```

Expected: `roles/secretmanager.secretAccessor` (and nothing else).

---

## Fáze 5 — CI/CD v privátním infra repu
|
||
|
||
**Goal fáze:** PR v `keboola/agnes-infra-keboola` spustí `terraform plan`; merge → `terraform apply`. Prod aplikuje přes environment protection s reviewerem.
|
||
|
||
### Task 5.1: plan.yml workflow
|
||
|
||
**Files (v `keboola/agnes-infra-keboola` repu):**
|
||
- Create: `.github/workflows/plan.yml`
|
||
|
||
- [ ] **Step 1: Napsat plan.yml**
|
||
|
||
```yaml
name: Terraform Plan

on:
  pull_request:
    paths:
      - 'terraform/**'

permissions:
  contents: read
  pull-requests: write

jobs:
  plan:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: terraform
    steps:
      - uses: actions/checkout@v5

      - uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ~1.7

      - run: terraform init
      - run: terraform fmt -check
      - id: plan
        run: |
          # Record terraform's exit code (not tee's) and fail the step on error.
          set +e
          terraform plan -no-color -out=tfplan 2>&1 | tee plan.txt
          status=${PIPESTATUS[0]}
          echo "status=$status" >> "$GITHUB_OUTPUT"
          exit "$status"

      - uses: actions/github-script@v7
        if: always()
        with:
          script: |
            const fs = require('fs');
            // Truncate the plan so the comment stays under GitHub's size limit.
            const plan = fs.readFileSync('terraform/plan.txt', 'utf8').slice(0, 60000);
            const body = `### Terraform plan\n\n\`\`\`\n${plan}\n\`\`\``;
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: body
            });
```

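Because the plan output is piped through `tee`, the shell reports the exit status of `tee` unless the status of the first command is captured explicitly. The following is a minimal sketch of one portable way to keep the real status while still saving a log: write to the file first, then replay it. `fake_plan` and `run_and_log` are hypothetical names used only for illustration; `fake_plan` stands in for `terraform plan`.

```bash
# Stand-in for `terraform plan` (hypothetical): prints an error and exits 1.
fake_plan() {
  echo "Error: simulated plan failure"
  return 1
}

# Run a command with stdout and stderr captured in a log file, then replay
# the log. With no pipe involved, $? is the command's own exit status.
run_and_log() {
  log_file=$1
  shift
  "$@" >"$log_file" 2>&1
  status=$?
  cat "$log_file"
  return "$status"
}

log=$(mktemp)
if run_and_log "$log" fake_plan; then
  plan_status=0
else
  plan_status=$?
fi
echo "plan exit status: $plan_status"
```

The trade-off is that output is shown only after the command finishes; a bash-only workflow step can stream through `tee` and read `${PIPESTATUS[0]}` instead.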
- [ ] **Step 2: Commit**

```bash
cd ~/.../agnes-infra-keboola
git add .github/workflows/plan.yml
git commit -m "ci: add terraform plan on PR"
git push
```

### Task 5.2: apply.yml workflow with environment protection

**Files:**
- Create: `.github/workflows/apply.yml`

- [ ] **Step 1: Write apply.yml**

```yaml
name: Terraform Apply

on:
  push:
    branches: [main]
    paths:
      - 'terraform/**'
  workflow_dispatch: {}

permissions:
  contents: read

jobs:
  apply-dev:
    runs-on: ubuntu-latest
    environment: dev  # no protection
    defaults:
      run:
        working-directory: terraform
    steps:
      - uses: actions/checkout@v5
      - uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ~1.7
      - run: terraform init
      - run: terraform apply -auto-approve -target='module.agnes.google_compute_instance.vm["agnes-dev"]'

  apply-prod:
    needs: apply-dev
    runs-on: ubuntu-latest
    environment: prod  # protected, requires a reviewer
    defaults:
      run:
        working-directory: terraform
    steps:
      - uses: actions/checkout@v5
      - uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ~1.7
          # Keep raw stdout so `terraform output -raw` can be captured below.
          terraform_wrapper: false
      - run: terraform init
      - run: terraform apply -auto-approve

      - name: Smoke test
        run: |
          PROD_IP=$(terraform output -raw prod_ip)
          for i in 1 2 3 4 5; do
            if curl -sf "http://$PROD_IP:8000/api/health" >/dev/null; then
              echo "Healthy"; exit 0
            fi
            sleep 15
          done
          echo "Health check failed"; exit 1
```

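The smoke test above is a fixed retry loop around a health check. If the same pattern is needed in more than one place, it could be factored into a small helper. This is only a sketch: `retry` and `flaky_check` are hypothetical names, and the demonstration uses a local stand-in rather than the real `/api/health` endpoint.

```bash
# Retry a command up to $1 times, sleeping $2 seconds between attempts.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Hypothetical usage mirroring the workflow's health check:
#   retry 5 15 curl -sf "http://$PROD_IP:8000/api/health"

# Self-contained demonstration: a check that succeeds on its third call.
counter_file=$(mktemp)
echo 0 > "$counter_file"
flaky_check() {
  n=$(($(cat "$counter_file") + 1))
  echo "$n" > "$counter_file"
  [ "$n" -ge 3 ]
}

if retry 5 0 flaky_check; then
  echo "Healthy after $(cat "$counter_file") attempts"
else
  echo "Health check failed"
fi
```
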
- [ ] **Step 2: Configure the environments in the GitHub UI**

Navigate to `keboola/agnes-infra-keboola` → Settings → Environments → New environment:

- **dev**: no protection
- **prod**:
  - Required reviewers: @ZdenekSrotyr (or @keboola-ops-team)
  - Wait timer: 5 min
  - Deployment branches: Selected branches → `main`

- [ ] **Step 3: Commit the workflow**

```bash
git add .github/workflows/apply.yml
git commit -m "ci: add terraform apply with dev/prod environments and smoke test"
git push
```

- [ ] **Step 4: Test the flow (open a dummy PR, watch the plan, merge, apply)**

```bash
git checkout -b test/ci-flow
# Trivial edit under terraform/ so the workflows' path filter matches
echo "# ci flow test" >> terraform/README.md
git add terraform/README.md
git commit -m "test: CI flow"
git push origin test/ci-flow
gh pr create --title "test: CI flow" --body "Testing plan → apply flow"
```

In the PR:
1. Wait for plan.yml → a comment with the plan
2. Approve and merge
3. Watch apply-dev (automatic), then apply-prod (waits for a reviewer)
4. Approve the prod deploy
5. Verify that the smoke test passes

### Task 5.3: Rotate the SA key (from a local file to the GitHub secret only)

- [ ] **Step 1: Delete the local SA key**

```bash
rm ~/.../agnes-deploy-internal-prod-key.json
```

- [ ] **Step 2: Delete the old key on GCP (key rotation)**

```bash
# List the keys
gcloud iam service-accounts keys list \
  --iam-account=agnes-deploy@internal-prod.iam.gserviceaccount.com \
  --project=internal-prod

# Delete the old key once the new one is verified (replace <OLD_KEY_ID>)
# gcloud iam service-accounts keys delete <OLD_KEY_ID> \
#   --iam-account=agnes-deploy@internal-prod.iam.gserviceaccount.com \
#   --project=internal-prod
```

Delete the old key only after verifying that GitHub Actions works with the new key (that is, after the first successful apply).

---
## Phase 6: Template repo + onboarding playbook

**Phase goal:** A second customer (another-customer) can be deployed in under 1 hour.

### Task 6.1: Create `keboola/agnes-infra-template`

- [ ] **Step 1: Create an empty repo to serve as the template**

```bash
# No --clone here: the repo is cloned explicitly into the working directory below.
gh repo create keboola/agnes-infra-template --public --description "Template for Agnes per-customer infrastructure"
cd ~/Library/Mobile\ Documents/com\~apple\~CloudDocs/Sources/VsCode/component_factory/
gh repo clone keboola/agnes-infra-template
cd agnes-infra-template
```

- [ ] **Step 2: Copy the structure from `agnes-infra-keboola` and replace concrete values with placeholders**

```bash
# Copy the structure
cp -r ../agnes-infra-keboola/terraform .
cp -r ../agnes-infra-keboola/.github .

# Reset the customer-specific values
cat > terraform/main.tf <<'EOF'
terraform {
  required_version = ">= 1.5"
  required_providers {
    google = { source = "hashicorp/google", version = "~> 5.0" }
  }
  backend "gcs" {
    bucket = "REPLACE_WITH_YOUR_BUCKET"
    prefix = "REPLACE_WITH_CUSTOMER_NAME"
  }
}

provider "google" {
  project = var.gcp_project_id
  region  = var.region
  zone    = var.zone
}

module "agnes" {
  source = "github.com/keboola/agnes-the-ai-analyst//infra/modules/customer-instance?ref=infra-v1.1.0"

  gcp_project_id    = var.gcp_project_id
  region            = var.region
  zone              = var.zone
  customer_name     = var.customer_name
  seed_admin_email  = var.seed_admin_email
  data_source       = var.data_source
  keboola_stack_url = var.keboola_stack_url
  prod_instance     = var.prod_instance
  dev_instances     = var.dev_instances
}

output "prod_ip" { value = module.agnes.prod_ip }
output "instance_ips" { value = module.agnes.instance_ips }
EOF

# Blocks with more than one argument must be multi-line in HCL.
cat > terraform/variables.tf <<'EOF'
variable "gcp_project_id" { type = string }
variable "region" {
  type    = string
  default = "europe-west1"
}
variable "zone" {
  type    = string
  default = "europe-west1-b"
}
variable "customer_name" { type = string }
variable "seed_admin_email" { type = string }
variable "data_source" {
  type    = string
  default = "keboola"
}
variable "keboola_stack_url" {
  type    = string
  default = ""
}
variable "prod_instance" { type = any }
variable "dev_instances" {
  type    = any
  default = []
}
EOF

cat > terraform/terraform.tfvars.example <<'EOF'
# Copy this file to terraform.tfvars and fill in the values.
# terraform.tfvars is gitignored (never commit it!)

gcp_project_id    = "REPLACE"  # Your GCP project
customer_name     = "REPLACE"  # Short identifier, e.g. "acme"
seed_admin_email  = "admin@example.com"
data_source       = "keboola"  # keboola | bigquery | csv
keboola_stack_url = "https://connection.keboola.com/"

prod_instance = {
  name         = "agnes-prod"
  machine_type = "e2-small"
  data_disk_gb = 50
  image_tag    = "stable"
  upgrade_mode = "auto"
  tls_mode     = "caddy"
  domain       = ""
}

dev_instances = [
  { name = "agnes-dev", image_tag = "dev" }
]
EOF
```

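Before the first `terraform apply` in a customer repo, it can help to confirm that no `REPLACE` placeholder from the template survived. A minimal sketch, assuming every placeholder contains the literal string `REPLACE` as in the files above; `check_placeholders` and the demo file names are hypothetical.

```bash
# Fail if any REPLACE placeholder is still present in the given files.
check_placeholders() {
  if grep -n 'REPLACE' "$@"; then
    echo "Unfilled placeholders found" >&2
    return 1
  fi
  return 0
}

# Demonstration on throwaway files; the real targets would be
# terraform/main.tf and terraform/terraform.tfvars.
demo_dir=$(mktemp -d)
printf 'gcp_project_id = "REPLACE"\n' > "$demo_dir/unfilled.tfvars"
printf 'gcp_project_id = "my-project"\n' > "$demo_dir/filled.tfvars"

check_placeholders "$demo_dir/filled.tfvars" && echo "filled.tfvars: ok"
check_placeholders "$demo_dir/unfilled.tfvars" || echo "unfilled.tfvars: needs editing"
```

A check like this could also run as an early step in the plan workflow, so a forgotten placeholder fails fast instead of surfacing as a backend or provider error.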
- [ ] **Step 3: Copy the bootstrap script from the public repo**

```bash
cp ../tmp_oss/scripts/bootstrap-gcp.sh .
```

- [ ] **Step 4: Write README.md for onboarding**

Write:

````markdown
# Agnes Infrastructure Template

Deploy Agnes (AI Data Analyst) into your own GCP project.

## Prerequisites

- GCP project with billing enabled
- `gcloud` CLI authenticated as project Owner
- `terraform` >= 1.5
- GitHub account (for private repo + Actions)

## 1. Bootstrap GCP

```bash
./bootstrap-gcp.sh <YOUR_GCP_PROJECT_ID>
```

Output: an SA key JSON file.

## 2. Clone the template

```bash
gh repo create <YOUR_ORG>/agnes-infra --template keboola/agnes-infra-template --private --clone
cd agnes-infra
```

## 3. Set secrets

```bash
# SA key (from step 1)
gh secret set GCP_SA_KEY < path/to/key.json
rm path/to/key.json

# Keboola token (if data_source = keboola); printf avoids a trailing newline in the secret
printf '%s' "YOUR_TOKEN" | gcloud secrets create keboola-storage-token --data-file=-
```

## 4. Configuration

Edit `terraform/main.tf` and update the backend `bucket` and `prefix`.

Copy `terraform/terraform.tfvars.example` → `terraform/terraform.tfvars` and fill in the values.

## 5. First apply

```bash
cd terraform
terraform init
terraform plan
terraform apply
```

The prod VM IP is in the output.

## 6. Login

```bash
# Bootstrap the first admin user
curl -X POST http://$(terraform output -raw prod_ip):8000/auth/bootstrap \
  -H "Content-Type: application/json" \
  -d '{"email": "YOU@example.com", "password": "YOUR_STRONG_PASSWORD"}'
```

Open http://<prod_ip>:8000/login.

## 7. Upgrade workflow

- `:stable` image → auto-upgrade via Watchtower
- Infra change: PR in this repo → `terraform plan` on the PR → merge → `apply` (prod requires a reviewer)
- TF module upgrade: Renovate opens a PR with the new `ref=infra-vX.Y.Z`

More details: https://github.com/keboola/agnes-the-ai-analyst/blob/main/docs/ONBOARDING.md
````

- [ ] **Step 5: Commit the README, push, and mark the repo as a template**

```bash
git add .
git commit -m "initial template"
git push -u origin main
gh repo edit keboola/agnes-infra-template --template
```

### Task 6.2: Write ONBOARDING.md in the public repo

**Files:**
- Create: `docs/ONBOARDING.md` (in the public repo)

- [ ] **Step 1: Write ONBOARDING.md**

Write `docs/ONBOARDING.md` with content identical to the README in the template repo, plus the note "physical template: keboola/agnes-infra-template".

- [ ] **Step 2: Commit**

```bash
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
git add docs/ONBOARDING.md
git commit -m "docs: onboarding guide for deploying Agnes per customer"
```

### Task 6.3: Try the onboarding on a dummy customer (sanity check)

- [ ] **Step 1: Create a test GCP project**

```bash
# GCP project IDs are limited to 30 characters, so use a short date suffix
# instead of a 10-digit epoch timestamp.
gcloud projects create agnes-onboarding-test-$(date +%y%m%d) --name="Agnes onboarding test"
# Link billing (via UI) if required
```

- [ ] **Step 2: Run the bootstrap**

```bash
./scripts/bootstrap-gcp.sh <test-project-id>
```

- [ ] **Step 3: Clone the template into a dummy repo**

```bash
gh repo create zdeneksrotyr/agnes-infra-test --template keboola/agnes-infra-template --private
gh repo clone zdeneksrotyr/agnes-infra-test
cd agnes-infra-test
```

- [ ] **Step 4: Walk through the README step by step and measure the time**

Goal: end-to-end in under 1 hour. Record any obstacles and feed them back into the README.

- [ ] **Step 5: Cleanup (delete the test project)**

```bash
gcloud projects delete <test-project-id>
gh repo delete zdeneksrotyr/agnes-infra-test --yes
```

### Task 6.4: Renovate configuration

- [ ] **Step 1: Add renovate.json to the template repo**

Write `keboola/agnes-infra-template/renovate.json`:

```json
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "extends": ["config:base"],
  "customManagers": [
    {
      "customType": "regex",
      "fileMatch": ["\\.tf$"],
      "matchStrings": [
        "source\\s*=\\s*\"github\\.com/keboola/agnes-the-ai-analyst//infra/modules/customer-instance\\?ref=(?<currentValue>infra-v\\d+\\.\\d+\\.\\d+)\""
      ],
      "datasourceTemplate": "github-releases",
      "depNameTemplate": "keboola/agnes-the-ai-analyst",
      "packageNameTemplate": "keboola/agnes-the-ai-analyst",
      "versioningTemplate": "regex:^infra-v(?<major>\\d+)\\.(?<minor>\\d+)\\.(?<patch>\\d+)$"
    }
  ],
  "packageRules": [
    {
      "matchPackageNames": ["keboola/agnes-the-ai-analyst"],
      "matchUpdateTypes": ["major"],
      "prPriority": 10
    }
  ]
}
```

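To sanity-check that the `matchStrings` pattern lines up with the module `source` line in `main.tf`, the same pattern can be tried locally. This is a rough sketch using `sed -E`: the named capture group is replaced with a plain group because POSIX extended regular expressions have no named groups, and `extract_module_ref` is a hypothetical helper, not part of Renovate.

```bash
# Extract the module ref (e.g. infra-v1.1.0) from Terraform source lines on stdin.
# The pattern mirrors the Renovate matchStrings entry, minus the named group.
extract_module_ref() {
  sed -nE 's|.*source[[:space:]]*=[[:space:]]*"github\.com/keboola/agnes-the-ai-analyst//infra/modules/customer-instance\?ref=(infra-v[0-9]+\.[0-9]+\.[0-9]+)".*|\1|p'
}

line='  source = "github.com/keboola/agnes-the-ai-analyst//infra/modules/customer-instance?ref=infra-v1.1.0"'
ref=$(printf '%s\n' "$line" | extract_module_ref)
echo "current ref: $ref"
```

Against a real checkout the helper could be fed the file directly, for example `extract_module_ref < terraform/main.tf`; an empty result would suggest the source line has drifted from what Renovate is configured to match.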
- [ ] **Step 2: Install the Renovate GitHub App on the private repos**

Manual step in GitHub: Settings → Integrations → Renovate → grant access.

---
## Final checkpoint

- [ ] **Phase 1 complete**: prod runs from the `:stable` image, no git pull from the fork
- [ ] **Phase 2 complete**: TF module, PD, Keboola deployed through the module
- [ ] **Phase 3 complete**: HTTPS works (if DNS is available)
- [ ] **Phase 4 complete**: watchtower on the dev VM auto-pulls `:dev`, OS Login active
- [ ] **Phase 5 complete**: GHA CI/CD works, prod apply requires review
- [ ] **Phase 6 complete**: template repo exists, ONBOARDING.md written, Renovate configured
- [ ] **Old personal fork deleted**
- [ ] **Keboola token rotated and stored in Secret Manager**
- [ ] **Documentation updated**

---
## Self-Review

**Spec coverage:**
- §2 Self-deploy model → Task 1.2 (bootstrap), Task 2.3 (private repo), Task 6 (template) ✅
- §3 Repo architecture → Task 2.1 (module), Task 6.1 (template), Task 2.3 (customer repo) ✅
- §4 Release model → Task 1.1 (per-branch tagging), release.yml already exists ✅
- §5 Branch-aware dev → Task 2.1 (`dev_instances` variable), Task 4.1 (watchtower) ✅
- §6 Prod upgrade model → Task 4.1 (auto via watchtower), pinned mode via tfvars (the customer chooses) ✅
- §7 Security → Task 1.2-1.4 (Secret Manager, SA), Task 4.2 (OS Login), Task 5.2 (env protection) ✅
- §8 Onboarding → Task 6.1-6.4 ✅
- §9 Change flow → Task 5.1-5.2 (plan/apply), Task 4.1 (watchtower pipeline) ✅
- §10 Backup/monitoring → partial; monitoring is a follow-up (§14) ✅

**Placeholder scan:** All code, configuration, and commands are concrete.

**Type consistency:** The `prod_instance` object and the `dev_instances` list have a consistent schema across Task 2.1, Task 2.3, and Task 6.1.

**Gap:** The customer-selected pinned upgrade mode (§6.1) is driven by Renovate. The Renovate configuration is in Task 6.4, but it covers only the module ref, not image tag upgrades. Follow-up: extend `customManagers` in renovate.json to cover `image_tag` in tfvars.