* chore(oss): isolate customer-specific deploy bits from scripts/grpn/ (#88)

Vendor-neutralization step before public release. The directory mixed two concerns: (1) generic ops scripts referenced from mainline OSS infrastructure (TLS rotation, auto-upgrade cron) and (2) one operator's hackathon manual-deploy helper with hardcoded GCP project IDs, VM names, and admin emails. Splitting them per concern.

Moved (still in OSS, just under a vendor-neutral name):
- scripts/grpn/agnes-tls-rotate.sh → scripts/ops/agnes-tls-rotate.sh
- scripts/grpn/agnes-auto-upgrade.sh → scripts/ops/agnes-auto-upgrade.sh

Removed (belongs in private consumer infra repos, not upstream OSS):
- scripts/grpn/Makefile (hardcoded prj-grp-foundryai-dev-7c37, foundryai-development VM name, e_zsrotyr@groupon.com bootstrap email)
- scripts/grpn/README.md (GRPN hackathon deploy walkthrough)
- docs/superpowers/plans/2026-04-22-grpn-deploy-learnings.md (org-specific deploy log)

Cross-refs updated in README.md, CLAUDE.md, docs/DEPLOYMENT.md, docker-compose.yml. CHANGELOG entry flags BREAKING (ops) for any consumer infra repo that installs these scripts via path-based systemd timers.

This is the first wave of #88. The remaining leaks (test data with prj-grp-dataview-prod-1ff9, AIAgent.FoundryAI tags in OpenMetadata test fixtures, docstrings in connectors/openmetadata/enricher.py) will be a separate, smaller PR.

Refs #88.

* chore(oss): comprehensive vendor-neutralization (#88 wave 2 + review fixes)

PR #94 review found that the original wave-1 grep was scoped wrong and many leaks survived. This commit closes wave 1 properly AND folds in all wave-2 anonymization in a single pass, which is easier to review than two PRs.

Wave-1 review-fix corrections:
- Caddyfile: scripts/grpn/agnes-tls-rotate.sh → scripts/ops/ (the original wave-1 grep filter excluded extensionless files like Caddyfile).
- CHANGELOG bullet rewritten. The original wording implied an in-repo migration for infra/modules/customer-instance/, which is wrong (the TF module embeds the script inline via heredoc, never sourced from scripts/grpn/). Now flags downstream consumer infra repos only.
- infra/modules/customer-instance/variables.tf: Czech docstring with `grpn` example → English description with `acme, example` placeholders.

Wave-2 anonymization:
- Code docstrings (connectors/openmetadata/{client,transformer,enricher}.py, src/catalog_export.py, scripts/duckdb_manager.py): prj-grp-… → my-bq-project / prj-example-1234, AIAgent.FoundryAI → AIAgent.MyAgent, FoundryAIDataModel → AnalyticsDataModel.
- Test fixtures (4 files): same set of replacements; 157 tests still pass.
- .github/workflows/keboola-deploy.yml: "Groupon-side dev VMs" comment → generic "per-developer dev VMs".
- docs/auth-groups.md + scripts/debug/probe_google_groups.py: kids-ai-data-analysis project name → acme-internal-prod placeholder.
- 5 planning/spec docs under docs/superpowers/{plans,specs}/2026-04-21-*: hardcoded IPs (34.77.94.14, 34.77.102.61) → <dev-vm-ip>/<prod-vm-ip>; GRPN/Groupon → Acme/another-customer; prj-grp-… → prj-example-….
- scripts/switch-dev-vm.sh deleted: hackathon-era helper hardcoded to a specific shared dev VM. Per-developer dev VMs are the supported pattern.

Final grep `groupon|grpn|foundryai|prj-grp|groupondev|34\.77\.(94|102)\.…|kids-ai-data` returns zero hits (excluding CHANGELOG.md historical entries). CHANGELOG entry expanded to document both waves under one bullet, with the BREAKING (ops) clarification about the TF module being unaffected.

Refs review of #94, closes #88.

* fix(oss): close remaining #94 review-2 findings (Czech, padak refs, CHANGELOG)

Reviewer of PR #94 round 2 caught 4 remaining items the wave-2 pass missed:
1. infra/modules/customer-instance/variables.tf had Czech descriptions on 8 more variables. Previous review only flagged line 19; this round audited the rest. Translated lines 2, 28, 42-46 (heredoc), 60, 65, 71, 78, 84 to English. Same review concern: a Terraform module that is the customer-facing API surface in Czech is unfit for OSS distribution.
2. infra/modules/customer-instance/outputs.tf had Czech descriptions on four outputs. Same fix.
3. docs/padak-security.md referenced a private repo (padak/keboola_agent_cli#206) in two places. Replaced with generic 'tracked upstream in the auth-CLI repo' per CLAUDE.md vendor-agnostic rule (no cross-refs to private repos).
4. scripts/fetch-env-from-secrets.sh:41 had a Czech comment. Translated.
5. CHANGELOG cosmetic: bullet said 'AIAgent.FoundryAI -> AIAgent.MyAgent' but the actual code uses both MyAgent (in docstrings) and Example (in test fixtures). Reworded to mention both targets.

Final grep across all shipping file types (.md, .py, .yml, .yaml, .sh, Makefile, .json, .tf, .tpl, Caddyfile, .toml) for groupon|grpn|foundryai|prj-grp|groupondev|34.77.94.14|34.77.102.61|kids-ai-data|padak/keboola_agent_cli returns ZERO hits (excluding CHANGELOG.md). Czech-diacritic grep across .tf/.toml/Caddyfile/Makefile/.yml returns ZERO hits. 157/157 OpenMetadata + DuckDB tests still pass.

* fix(oss): close #94 round-3 leaks (env.template, instance.yaml.example, padak typo)

Round-3 reviewer caught two MUST-FIX leaks the round-2 grep missed (grep was scoped to extensions that did not include .template / .example suffixes; the audit was right, the previous grep was not paranoid enough):
1. config/instance.yaml.example:114: '(optional - Groupon-specific)' brand leak in a shipping config example. Replaced with '(optional)'.
2. config/.env.template:68: stale path 'scripts/grpn/agnes-tls-rotate.sh' in operator-facing env-template comment. The script lives at scripts/ops/ now (commit 16a85cc); this comment had been pointing operators at a non-existent path.
3. docs/padak-security.md:188: phrase duplication 'tracked in tracked upstream' from a sloppy substitution in round-2. Trivial wording fix.

Final paranoid grep across .md/.py/.yml/.yaml/.sh/Makefile/.json/.tf/.tpl/Caddyfile/.toml/.template/.example/.env* with the full token set (groupon|grpn|foundryai|prj-grp|groupondev|34\.77\.94\.14|34\.77\.102\.61|kids-ai-data|padak/keboola_agent_cli) returns ZERO hits, excluding CHANGELOG.md historical entries.

* fix(oss): #94 round-4: QUICKSTART.md + rename padak-security.md

Devin Review caught two findings on the latest round-3 commit:
1. docs/QUICKSTART.md:67 still pointed users at the deleted scripts/switch-dev-vm.sh. A Quickstart user following step-by-step would hit a missing-file error at the final step. Replaced with the inline gcloud-ssh equivalent that the Removed bullet documents.
2. docs/padak-security.md filename retains the personal identifier 'padak'. The PR fixed the body content (replaced padak/keboola_agent_cli#206 references with generic wording) but missed the filename. Renamed to docs/security-audit-2026-04.md (date-anchored, vendor-neutral). Updated the historical CHANGELOG link to point at the new path with an inline note about the rename.

* fix(oss): redact remaining hardcoded IPs from planning docs + remove default email

Devin Review caught two more leaks:
1. scripts/fetch-env-from-secrets.sh line 16 had a hardcoded personal-email default (zdenek.srotyr@keboola.com). Replaced with ':?' bash error so SEED_ADMIN_EMAIL must be explicitly set, which is safer than carrying any specific identity.
2. Planning docs still had 35.195.96.98 and 34.62.223.189 (legacy prod/dev IPs) that the round-1 IP-replace pattern missed (it only targeted 34.77.x.x). Generic regex redaction across all five planning docs replaces every public IP with <redacted-ip>, preserving private/loopback/IAP ranges.
Multi-Customer Deployment Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Move from today's "prod runs from the personal fork padak/tmp_oss" to a production-grade multi-customer setup as described in the spec docs/superpowers/specs/2026-04-21-multi-customer-deployment-spec.md.
Architecture: Public upstream (keboola/agnes-the-ai-analyst) with the TF module plus a public image on GHCR. A private template repo (keboola/agnes-infra-template) serves as a skeleton. A per-customer private repo (keboola/agnes-infra-keboola for Keboola-as-customer, {org}/agnes-infra for others) holds Terraform, GitHub Actions, and the SA JSON key. Each customer has its own GCP project, its own Secret Manager, and its own prod/dev VMs. Watchtower on the VMs polls GHCR for auto-deploy. Branch-aware dev VMs are configured through the dev_instances list in tfvars.
Tech Stack: Terraform (google provider ~5.0), Docker Compose, Caddy (TLS), Watchtower, GHCR, Google Cloud (Compute Engine, Secret Manager, Cloud Storage, IAM), GitHub Actions, Argon2 (passwords), DuckDB.
Phase dependencies
Phase 1 (MVP) ──────────────────────────┐
    │                                   │
    ▼                                   ▼
Phase 2 (TF module + PD + rebuild)  Phase 0 (Prerequisites)
    │
    ├─────────┬──────────┬──────────┐
    ▼         ▼          ▼          ▼
 Phase 3   Phase 4    Phase 5    Phase 6
  (TLS) (Watchtower)  (CI/CD)  (Template)
    │         │          │          │
    └─────────┴──────────┴──────────┘
                    ▼
                  Done
Phases 0 and 1 are sequential. After Phase 2, Phases 3/4/5 can run in parallel. Phase 6 uses the outputs of 3/4/5.
Phase 0: Prerequisites (manual, outside the code)
These steps require external actions (permissions, Keboola UI). They must be finished before Phase 1.
Task 0.1: Verify access rights
- Step 1: Verify that you have iam.serviceAccountAdmin on internal-prod
gcloud projects get-iam-policy internal-prod --format=json \
| python3 -c "import json, sys; d=json.load(sys.stdin); \
me='zdenek.srotyr@keboola.com'; \
roles=[b['role'] for b in d['bindings'] if any(me in m for m in b.get('members', []))]; \
print('\n'.join(roles) if roles else 'NO DIRECT ROLES — check org-level or ask Petr (owner)')"
Expected: a list of roles, or the note "NO DIRECT ROLES".
- Step 2: If the SA admin rights are missing, ask Petr for temporary roles/iam.serviceAccountAdmin + roles/resourcemanager.projectIamAdmin
Send him a link to this documentation: https://cloud.google.com/iam/docs/understanding-roles#iam-roles
Message Petr on Slack or by email: "I temporarily need the iam.serviceAccountAdmin and resourcemanager.projectIamAdmin roles on the internal-prod project to create the Agnes deploy SA. We will revoke them as soon as it is done."
- Step 3: Verify that the image ghcr.io/keboola/agnes-the-ai-analyst is public
gh api /orgs/keboola/packages/container/agnes-the-ai-analyst --jq '.visibility' 2>&1
Expected: "public". If "private", change it in the GitHub UI: Keboola org → Packages → agnes-the-ai-analyst → Package settings → Change visibility → Public.
Task 0.2: Back up the existing data (safety net before Phase 2)
- Step 1: Snapshot the boot disk of the prod VM (it contains /data)
gcloud compute disks snapshot data-analyst \
--zone=europe-west1-b \
--snapshot-names=data-analyst-pre-migration-$(date +%Y%m%d) \
--project=internal-prod
Expected: Created snapshot data-analyst-pre-migration-YYYYMMDD.
- Step 2: Verify the snapshot
gcloud compute snapshots list --project=internal-prod \
--filter="name~pre-migration" --format="table(name, status, diskSizeGb, creationTimestamp)"
Expected: STATUS = READY, 30 GB.
Phase 1 (MVP): Cut loose from the personal fork and switch to the :stable image
Phase goal: The prod VM data-analyst pulls its image from GHCR instead of running git pull from ZdenekSrotyr/tmp_oss. Tokens live in Secret Manager. The switch is reversible.
Task 1.1: Add per-branch image tagging to release.yml
Files:
- Modify: .github/workflows/release.yml:47-95
- Step 1: Inspect the current state of the meta step
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
grep -n "branch_slug\|feature_tag\|SLUG" .github/workflows/release.yml 2>&1 | head -5
Expected: no results. The pattern does not exist yet, so we will add it.
- Step 2: Open .github/workflows/release.yml and find the Claim version tag step
The section has id: meta. After the line echo "short_sha=${SHORT_SHA}" >> "$GITHUB_OUTPUT" (around line 90), add:
# Per-branch slug for dev images (only on feature branches)
if [[ "${{ github.ref }}" != "refs/heads/main" ]]; then
  BRANCH_NAME="${GITHUB_REF#refs/heads/}"
  BRANCH_SLUG=$(echo "$BRANCH_NAME" | sed 's|^feature/||' | sed 's|[^a-zA-Z0-9-]|-|g' | tr '[:upper:]' '[:lower:]' | cut -c1-50)
  echo "branch_slug=${BRANCH_SLUG}" >> "$GITHUB_OUTPUT"
  echo "Branch slug: ${BRANCH_SLUG}"
fi
- Step 3: Add the branch-slug tag in the Build and push step
Find the tags: | block (around line 110) and replace it with:
tags: |
  ghcr.io/${{ github.repository }}:${{ steps.meta.outputs.channel }}
  ghcr.io/${{ github.repository }}:${{ steps.meta.outputs.versioned_tag }}
  ghcr.io/${{ github.repository }}:sha-${{ steps.meta.outputs.short_sha }}
  ${{ steps.meta.outputs.channel == 'dev' && format('ghcr.io/{0}:dev-{1}', github.repository, steps.meta.outputs.branch_slug) || '' }}
The last line adds :dev-<branch-slug> only on pushes to a feature branch.
- Step 4: Syntax check workflow
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
gh workflow view release.yml 2>&1 | head -10
Expected: workflow info, no "Parse error".
- Step 5: Commit
git add .github/workflows/release.yml
git commit -m "ci: add per-branch image tag :dev-<slug> for branch-aware dev deploys"
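The slug logic from Step 2 can be tried locally before it runs in CI. The sketch below wraps the same sed/tr/cut pipeline in a function; the name branch_slug and the sample refs are illustrative assumptions and are not part of release.yml.

```shell
# Hypothetical helper mirroring the slug pipeline added to the meta step.
# Input: a full Git ref such as refs/heads/feature/Add_Login.
branch_slug() (
  ref="$1"
  name="${ref#refs/heads/}"   # strip the refs/heads/ prefix
  printf '%s\n' "$name" \
    | sed 's|^feature/||' \
    | sed 's|[^a-zA-Z0-9-]|-|g' \
    | tr '[:upper:]' '[:lower:]' \
    | cut -c1-50
)

branch_slug "refs/heads/feature/Add_Login"   # prints: add-login
branch_slug "refs/heads/feature/team/x"      # prints: team-x
```

Slashes and underscores inside the branch name become hyphens, so the resulting image tag for feature/team/x would be :dev-team-x.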
Task 1.2: Create the GCP deploy service account
Files:
- Create: scripts/bootstrap-gcp.sh
- Step 1: Create the bootstrap script
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
mkdir -p scripts
Write scripts/bootstrap-gcp.sh:
#!/usr/bin/env bash
# Bootstrap a GCP project for an Agnes deployment.
# One-time, idempotent. Output = a list of secrets for GitHub Actions.
#
# Usage: bootstrap-gcp.sh <GCP_PROJECT_ID> [SA_NAME]
# If the SA already exists, the script generates a new key and exits.
set -euo pipefail
PROJECT_ID="${1:?Usage: $0 <GCP_PROJECT_ID> [SA_NAME=agnes-deploy]}"
SA_NAME="${2:-agnes-deploy}"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
echo "=== Bootstrap GCP project: ${PROJECT_ID} ==="
gcloud config set project "${PROJECT_ID}" 1>/dev/null
echo "=== Enable APIs ==="
gcloud services enable \
  compute.googleapis.com \
  iam.googleapis.com \
  iamcredentials.googleapis.com \
  secretmanager.googleapis.com \
  cloudresourcemanager.googleapis.com \
  storage.googleapis.com \
  --project="${PROJECT_ID}"
echo "=== Create deploy service account (if not exists) ==="
if ! gcloud iam service-accounts describe "${SA_EMAIL}" --project="${PROJECT_ID}" 2>/dev/null; then
  gcloud iam service-accounts create "${SA_NAME}" \
    --display-name="Agnes Terraform deploy" \
    --project="${PROJECT_ID}"
fi
echo "=== Grant roles ==="
for role in \
  compute.instanceAdmin.v1 \
  compute.securityAdmin \
  compute.networkAdmin \
  iam.serviceAccountUser \
  secretmanager.admin \
  storage.admin; do
  gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member="serviceAccount:${SA_EMAIL}" \
    --role="roles/${role}" \
    --condition=None \
    --quiet 1>/dev/null
done
echo "=== Create tfstate bucket (if not exists) ==="
BUCKET="agnes-${PROJECT_ID}-tfstate"
if ! gsutil ls -b "gs://${BUCKET}" 2>/dev/null; then
  gsutil mb -p "${PROJECT_ID}" -l europe-west1 -b on "gs://${BUCKET}"
  gsutil versioning set on "gs://${BUCKET}"
fi
echo "=== Generate SA key ==="
KEY_FILE="./${SA_NAME}-${PROJECT_ID}-key.json"
gcloud iam service-accounts keys create "${KEY_FILE}" \
  --iam-account="${SA_EMAIL}" \
  --project="${PROJECT_ID}"
echo ""
echo "=== DONE ==="
echo ""
echo "SA email: ${SA_EMAIL}"
echo "TF state bucket: gs://${BUCKET}"
echo "SA key file: ${KEY_FILE}"
echo ""
echo "NEXT STEPS:"
echo "1. Push the key to a GitHub secret of the private infra repo:"
echo " gh secret set GCP_SA_KEY --repo <owner>/<repo> < ${KEY_FILE}"
echo "2. THEN delete the key from your local machine:"
echo " rm ${KEY_FILE}"
echo ""
- Step 2: Make the script executable
chmod +x scripts/bootstrap-gcp.sh
- Step 3: Run the script against internal-prod
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
./scripts/bootstrap-gcp.sh internal-prod
Expected: the output ends with "DONE" followed by the next-step instructions.
If it fails with "Permission denied": see Task 0.1 Step 2 (ask Petr).
- Step 4: Verify the SA and the bucket
gcloud iam service-accounts list --project=internal-prod --filter="email~agnes-deploy" --format="value(email)"
gsutil ls -b gs://agnes-internal-prod-tfstate
Expected: SA email + bucket URL.
- Step 5: Commit the bootstrap script
git add scripts/bootstrap-gcp.sh
git commit -m "infra: add bootstrap-gcp.sh for per-customer GCP setup"
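bootstrap-gcp.sh derives the service-account email and the state bucket name from its two arguments. Below is a minimal sketch of that derivation plus a pre-flight name check, assuming GCP's rule that service-account IDs are 6-30 characters of lowercase letters, digits, and hyphens. The function names are hypothetical and not part of the script.

```shell
# Hypothetical helpers; they mirror how bootstrap-gcp.sh builds its names.
sa_email() (
  project="$1"
  sa="${2:-agnes-deploy}"
  printf '%s@%s.iam.gserviceaccount.com\n' "$sa" "$project"
)

tfstate_bucket() (
  printf 'agnes-%s-tfstate\n' "$1"
)

# Assumption: an SA ID starts with a letter, does not end with a hyphen,
# and is 6-30 characters long in total.
valid_sa_name() (
  printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
)

sa_email internal-prod         # agnes-deploy@internal-prod.iam.gserviceaccount.com
tfstate_bucket internal-prod   # agnes-internal-prod-tfstate
valid_sa_name agnes-deploy && echo "SA name OK"
```

Running such a check before the gcloud calls would surface a bad SA name early instead of halfway through the bootstrap.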
Task 1.3: Set up the secrets in Secret Manager
- Step 1: Rotate the Keboola Storage token in the Keboola UI
Log in to the Keboola UI (https://connection.us-east4.gcp.keboola.com/), go to Settings → Master Tokens, and generate a new token.
Keep the old token active until the new one is deployed.
- Step 2: Store the new token in Secret Manager
read -s NEW_TOKEN
echo -n "$NEW_TOKEN" | gcloud secrets create keboola-storage-token \
--data-file=- \
--replication-policy=automatic \
--project=internal-prod
unset NEW_TOKEN
Expected: Created secret [keboola-storage-token].
- Step 3: Generate and store the JWT secret
openssl rand -hex 32 | gcloud secrets create jwt-secret-key \
--data-file=- \
--replication-policy=automatic \
--project=internal-prod
Expected: Created secret [jwt-secret-key].
- Step 4: Verify the secrets
gcloud secrets list --project=internal-prod --format="table(name, createTime)"
Expected: two secrets, keboola-storage-token and jwt-secret-key.
- Step 5: Grant read access to the deploy SA
for secret in keboola-storage-token jwt-secret-key; do
gcloud secrets add-iam-policy-binding "$secret" \
--member="serviceAccount:agnes-deploy@internal-prod.iam.gserviceaccount.com" \
--role=roles/secretmanager.secretAccessor \
--project=internal-prod
done
Expected: Updated IAM policy × 2.
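Steps 2 and 3 use gcloud secrets create, which fails when the secret already exists, for example on a re-run. One possible way to make the step repeatable is to fall back to gcloud secrets versions add. The sketch below assumes the secret value arrives on stdin; store_secret is a hypothetical helper name.

```shell
# Hypothetical helper: create the secret on first use, add a new version afterwards.
# The value is read from stdin so it never appears in the process list.
store_secret() {
  secret_name="$1"
  secret_project="$2"
  # describe gets an empty stdin so it cannot consume the secret value
  if gcloud secrets describe "$secret_name" --project="$secret_project" \
       </dev/null >/dev/null 2>&1; then
    gcloud secrets versions add "$secret_name" --data-file=- \
      --project="$secret_project"
  else
    gcloud secrets create "$secret_name" --data-file=- \
      --replication-policy=automatic --project="$secret_project"
  fi
}

# Usage (not executed here):
#   openssl rand -hex 32 | store_secret jwt-secret-key internal-prod
```

The same helper would also cover the rotation case, since adding a version is what a later token rotation needs.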
Task 1.4: Create a script that pulls secrets from Secret Manager into .env on the VM
Files:
- Create: scripts/fetch-env-from-secrets.sh
- Step 1: Write the script
Write scripts/fetch-env-from-secrets.sh:
#!/usr/bin/env bash
# Fetches secrets from GCP Secret Manager and creates .env for Agnes.
# Runs once on the VM during boot / deploy.
#
# Requires:
# - gcloud CLI (already installed on the GCE default image)
# - the VM SA has the role roles/secretmanager.secretAccessor
set -euo pipefail
APP_DIR="${APP_DIR:-/home/deploy/app}"
ENV_FILE="${APP_DIR}/.env"
echo "Fetching secrets..."
# stderr is not captured, so gcloud warnings can never end up inside a secret value
KEBOOLA_TOKEN=$(gcloud secrets versions access latest --secret=keboola-storage-token)
JWT_KEY=$(gcloud secrets versions access latest --secret=jwt-secret-key)
# Non-secret config (can stay in instance metadata / the startup script)
DATA_SOURCE="${DATA_SOURCE:-keboola}"
KEBOOLA_STACK_URL="${KEBOOLA_STACK_URL:-https://connection.us-east4.gcp.keboola.com/}"
SEED_ADMIN_EMAIL="${SEED_ADMIN_EMAIL:-zdenek.srotyr@keboola.com}"
LOG_LEVEL="${LOG_LEVEL:-info}"
DATA_DIR="${DATA_DIR:-/data}"
cat > "${ENV_FILE}" <<EOF
JWT_SECRET_KEY=${JWT_KEY}
DATA_DIR=${DATA_DIR}
DATA_SOURCE=${DATA_SOURCE}
KEBOOLA_STORAGE_TOKEN=${KEBOOLA_TOKEN}
KEBOOLA_STACK_URL=${KEBOOLA_STACK_URL}
SEED_ADMIN_EMAIL=${SEED_ADMIN_EMAIL}
LOG_LEVEL=${LOG_LEVEL}
EOF
chmod 600 "${ENV_FILE}"
chown deploy:deploy "${ENV_FILE}" 2>/dev/null || true
echo "Wrote ${ENV_FILE} (chmod 600)"
- Step 2: Chmod + commit
chmod +x scripts/fetch-env-from-secrets.sh
git add scripts/fetch-env-from-secrets.sh
git commit -m "infra: add fetch-env-from-secrets.sh for VM-side secret retrieval"
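fetch-env-from-secrets.sh writes .env first and tightens its mode afterwards, so the file briefly exists with whatever the default umask allows. Below is a minimal sketch of one alternative, assuming a temp file in the same directory and a restrictive umask; write_env_file is a hypothetical helper and not part of the script.

```shell
# Hypothetical helper: write stdin to a file that is mode 600 from the start.
# The subshell body keeps the umask change local to this function.
write_env_file() (
  target="$1"
  umask 077                             # files created below are owner-only
  tmp="$(mktemp "${target}.XXXXXX")"    # same directory, so mv is a plain rename
  cat > "$tmp"
  mv "$tmp" "$target"
)

# Example with placeholder values only:
demo_dir="$(mktemp -d)"
printf 'DATA_DIR=/data\nLOG_LEVEL=info\n' | write_env_file "$demo_dir/.env"
ls -l "$demo_dir/.env"
```

Because the rename replaces the old file in one step, a concurrent reader would see either the previous .env or the complete new one.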
Task 1.5: Prepare the prod docker-compose for the GHCR image
Files:
- Modify: docker-compose.prod.yml
- Step 1: Read the current docker-compose.prod.yml
cat "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss/docker-compose.prod.yml"
Note the structure (services, volumes).
- Step 2: Verify that the prod overlay uses image: instead of build:
grep -E "^\s*(image|build):" docker-compose.prod.yml
Expected: a line image: ghcr.io/keboola/agnes-the-ai-analyst:${AGNES_TAG:-stable} (or similar). If it is missing, add it to services.app:
services:
  app:
    image: ghcr.io/keboola/agnes-the-ai-analyst:${AGNES_TAG:-stable}
    build: !reset null # disable the local build
And for the scheduler:
  scheduler:
    image: ghcr.io/keboola/agnes-the-ai-analyst:${AGNES_TAG:-stable}
    build: !reset null
- Step 3: Commit the changes (if any)
git status docker-compose.prod.yml
# If modified:
git add docker-compose.prod.yml
git commit -m "infra: prod compose pulls from GHCR via AGNES_TAG env (default :stable)"
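The ${AGNES_TAG:-stable} expression in the compose file follows the shell-style default rule: the fallback applies when the variable is unset or empty. The sketch below reproduces that rule in plain shell; image_ref is a hypothetical helper and dev-my-branch is an example value.

```shell
# Hypothetical helper: resolve the image reference the way the compose overlay would.
image_ref() (
  printf 'ghcr.io/keboola/agnes-the-ai-analyst:%s\n' "${AGNES_TAG:-stable}"
)

( unset AGNES_TAG; image_ref )            # unset falls back to :stable
( AGNES_TAG=; image_ref )                 # empty also falls back to :stable
( AGNES_TAG=dev-my-branch; image_ref )    # an explicit tag wins
```

This is why Step 7 of the deploy task can export AGNES_TAG=stable explicitly or omit it and get the same image.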
Task 1.6: Deploy the MVP to the prod VM data-analyst
This is a destructive action on prod. Complete Task 0.2 (snapshot) first.
- Step 1: SSH to the prod VM and stop the containers
gcloud compute ssh data-analyst --zone=europe-west1-b --project=internal-prod --command="sudo -u deploy bash -c 'cd /home/deploy/app && docker compose down'"
Expected: Container app-app-1 Stopped, Container app-scheduler-1 Stopped.
- Step 2: Set the VM SA on the deploy VM (one-time)
# Verify the current SA
gcloud compute instances describe data-analyst --zone=europe-west1-b --project=internal-prod \
--format="value(serviceAccounts[0].email)"
If the output is 327445566538-compute@developer.gserviceaccount.com (the default SA), that is fine for the MVP: it has the cloud-platform scope and can read secrets. In Phase 4 (hardening) we will switch to a dedicated SA.
Explicitly grant it secretmanager.secretAccessor (idempotent):
gcloud projects add-iam-policy-binding internal-prod \
--member="serviceAccount:327445566538-compute@developer.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor" \
--condition=None
- Step 3: Upload the fetch-env script to the VM
gcloud compute scp \
"/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss/scripts/fetch-env-from-secrets.sh" \
data-analyst:/tmp/fetch-env.sh \
--zone=europe-west1-b --project=internal-prod
- Step 4: Run the fetch-env script as the deploy user
gcloud compute ssh data-analyst --zone=europe-west1-b --project=internal-prod --command="sudo install -m 755 -o deploy -g deploy /tmp/fetch-env.sh /home/deploy/app/fetch-env.sh && sudo -u deploy bash -c 'cd /home/deploy/app && ./fetch-env.sh'"
Expected: Wrote /home/deploy/app/.env (chmod 600).
- Step 5: Check .env on the VM (without printing the values)
gcloud compute ssh data-analyst --zone=europe-west1-b --project=internal-prod --command="sudo -u deploy bash -c 'ls -la /home/deploy/app/.env && wc -l /home/deploy/app/.env && cut -d= -f1 /home/deploy/app/.env'"
Expected: file mode 600, 7 lines, keys: JWT_SECRET_KEY, DATA_DIR, DATA_SOURCE, KEBOOLA_STORAGE_TOKEN, KEBOOLA_STACK_URL, SEED_ADMIN_EMAIL, LOG_LEVEL.
- Step 6: Update the docker-compose.yml configuration on the VM to pull from GHCR
gcloud compute ssh data-analyst --zone=europe-west1-b --project=internal-prod --command="sudo -u deploy bash -c 'cd /home/deploy/app && git fetch origin feature/v2-fastapi-duckdb-docker-cli && git reset --hard origin/feature/v2-fastapi-duckdb-docker-cli'"
Note: the VM still has the old remote ZdenekSrotyr/tmp_oss, so this will not work if that repo has been deleted. Alternative: point the origin remote at keboola/agnes-the-ai-analyst:
gcloud compute ssh data-analyst --zone=europe-west1-b --project=internal-prod --command="sudo -u deploy bash -c 'cd /home/deploy/app && git remote set-url origin https://github.com/keboola/agnes-the-ai-analyst.git && git fetch origin main && git reset --hard origin/main'"
Expected: HEAD is now at <sha> <message>.
- Step 7: Pull the image from GHCR and start with the new override
gcloud compute ssh data-analyst --zone=europe-west1-b --project=internal-prod --command="sudo -u deploy bash -c 'cd /home/deploy/app && export AGNES_TAG=stable && docker compose -f docker-compose.yml -f docker-compose.prod.yml pull && docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d'"
Expected: Container app-app-1 Started, Container app-scheduler-1 Started.
- Step 8: Verify that it is running
# Wait 30 seconds
sleep 30
curl -s --max-time 10 http://<redacted-ip>:8000/api/health | python3 -m json.tool | head -10
Expected: "status": "healthy" or "degraded" (stale tables are OK). Not connection refused.
- Step 9: Verify that the app uses the new image
gcloud compute ssh data-analyst --zone=europe-west1-b --project=internal-prod --command="sudo docker inspect app-app-1 --format '{{.Config.Image}}'"
Expected: ghcr.io/keboola/agnes-the-ai-analyst:stable (not app-app).
- Step 10: Verify login
curl -sS --max-time 5 -X POST http://<redacted-ip>:8000/auth/password/login \
-H "Content-Type: application/json" \
-d '{"email":"zdenek.srotyr@keboola.com","password":"1234"}' 2>&1 | python3 -c "import sys,json; d=json.load(sys.stdin); print('OK — role:', d.get('role'))"
Expected: OK — role: admin.
- Step 11: Add a note about the new .env strategy to the documentation
Add to docs/DEPLOYMENT.md (if not present) section "Production environment":
## Production .env strategy
Secrets (KEBOOLA_STORAGE_TOKEN, JWT_SECRET_KEY) are fetched from GCP Secret Manager
by `scripts/fetch-env-from-secrets.sh` during VM boot. Non-secret config (STACK_URL,
SEED_ADMIN_EMAIL, LOG_LEVEL) is passed via env vars in the startup script.
To rotate a secret:
1. Add a new version via `gcloud secrets versions add ...`
2. SSH to VM and re-run `./fetch-env.sh`
3. Restart: `docker compose up -d --force-recreate app`
- Step 12: Commit the documentation
git add docs/DEPLOYMENT.md
git commit -m "docs: document Secret Manager-backed .env for production"
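Step 8 waits a fixed 30 seconds before probing the health endpoint. A polling loop is one possible alternative that returns as soon as the app answers and fails clearly if it never does. This is a sketch only: wait_for_health is a hypothetical helper, the attempt count and delay are assumed defaults, and it checks the HTTP status alone, so both "healthy" and "degraded" responses count as up.

```shell
# Hypothetical helper: poll a health URL until curl succeeds or attempts run out.
# Arguments: URL, number of attempts (default 10), delay in seconds (default 3).
wait_for_health() {
  health_url="$1"
  health_attempts="${2:-10}"
  health_delay="${3:-3}"
  health_try=1
  while [ "$health_try" -le "$health_attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors returned during startup
    if curl -fsS --max-time 10 "$health_url" >/dev/null 2>&1; then
      echo "healthy after $health_try attempt(s)"
      return 0
    fi
    sleep "$health_delay"
    health_try=$((health_try + 1))
  done
  echo "not healthy after $health_attempts attempts" >&2
  return 1
}

# Usage (the host is a placeholder, not executed here):
#   wait_for_health "http://<vm-ip>:8000/api/health" 10 3
```

A non-zero exit status makes the helper usable as a gate in front of the image and login checks in Steps 9 and 10.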
Task 1.7: Repeat the MVP deploy on the dev VM
- Step 1: Repeat Task 1.6 Steps 1-10 against the data-analyst-dev VM
Same commands; just replace data-analyst with data-analyst-dev and the prod VM IP <redacted-ip> with the dev VM IP <redacted-ip>.
- Step 2: Verify
curl -s --max-time 10 http://<redacted-ip>:8000/api/health | python3 -m json.tool | head -3
Expected: valid JSON containing "status".
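When the .env check from Task 1.6 Step 5 is repeated on the dev VM, comparing key names by eye is easy to get wrong. The sketch below reports missing keys without ever printing a value; check_env_keys is a hypothetical helper, and the key list is the one given in that step.

```shell
# Hypothetical helper: report which expected keys are missing from an env file.
# Prints key names only, never values. Exit status 0 means all keys are present.
check_env_keys() (
  file="$1"
  missing=0
  for key in JWT_SECRET_KEY DATA_DIR DATA_SOURCE KEBOOLA_STORAGE_TOKEN \
             KEBOOLA_STACK_URL SEED_ADMIN_EMAIL LOG_LEVEL; do
    if ! grep -q "^${key}=" "$file"; then
      echo "missing: ${key}"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
)

# Usage on the VM (not executed here):
#   check_env_keys /home/deploy/app/.env && echo "all keys present"
```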
Task 1.8: Delete the personal fork
- Step 1: Remove the deploy key from ZdenekSrotyr/tmp_oss (if one exists)
gh api repos/ZdenekSrotyr/tmp_oss/keys 2>&1 | python3 -m json.tool
If it returns anything, delete it: gh api -X DELETE repos/ZdenekSrotyr/tmp_oss/keys/<id>.
- Step 2: Delete the repo
gh repo delete ZdenekSrotyr/tmp_oss --yes
Expected: ✓ Deleted repository ZdenekSrotyr/tmp_oss.
- Step 3: Verify that it is gone
gh api repos/ZdenekSrotyr/tmp_oss 2>&1 | head -2
Expected: Not Found (HTTP 404).
Task 1.9: Invalidate the old Keboola token
- Step 1: Revoke the old master token in the Keboola UI
(Manual step in the Keboola UI. The new token is already in Secret Manager from Task 1.3.)
Verify that the new token version works:
curl -s --max-time 10 http://<redacted-ip>:8000/api/sync/status 2>&1 | python3 -m json.tool | head -20
Expected: some valid JSON. If you get 401 Unauthorized or Invalid token, the app still has the old token cached; restart it:
gcloud compute ssh data-analyst --zone=europe-west1-b --project=internal-prod --command="sudo -u deploy bash -c 'cd /home/deploy/app && docker compose restart app'"
Task 1.10: Checkpoint, Phase 1 complete
- Step 1: Change the password from 1234 to something strong
Via the UI, or:
read -s NEW_PASSWORD
TOKEN=$(curl -sS -X POST http://<redacted-ip>:8000/auth/password/login \
-H "Content-Type: application/json" \
-d '{"email":"zdenek.srotyr@keboola.com","password":"1234"}' | python3 -c "import sys,json;print(json.load(sys.stdin)['access_token'])")
# [Option: use the admin endpoint for changing the password, if one exists; otherwise use the UI]
unset NEW_PASSWORD TOKEN
- Step 2: Verify the state
Go through the checklist:
- Prod VM data-analyst runs from ghcr.io/...:stable
- Dev VM data-analyst-dev runs from ghcr.io/...:stable
- Secrets are in GCP Secret Manager
- The admin user's password is not 1234
- ZdenekSrotyr/tmp_oss is deleted
- The old Keboola token is invalidated
Phase 2: TF module + persistent disk + F1 rebuild
Phase goal: The Keboola instance runs on VMs managed by the Terraform module from infra/modules/customer-instance/. Data lives on a separate persistent disk. TF state is kept in a GCS bucket.
Task 2.1: Refactor infra/main.tf into a modular structure
Files:
- Create: infra/modules/customer-instance/main.tf
- Create: infra/modules/customer-instance/variables.tf
- Create: infra/modules/customer-instance/outputs.tf
- Create: infra/modules/customer-instance/startup-script.sh
- Delete: infra/main.tf (old monolith)
- Keep (modified): infra/variables.tf, infra/outputs.tf, infra/terraform.tfvars.example
- Create: infra/examples/minimal/main.tf (usage example)
- Step 1: Create the directory structure
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
mkdir -p infra/modules/customer-instance
mkdir -p infra/examples/minimal
- Step 2: Write infra/modules/customer-instance/variables.tf
Write:
variable "gcp_project_id" {
  description = "GCP project ID where the instance will be deployed"
  type        = string
}
variable "region" {
  description = "GCP region"
  type        = string
  default     = "europe-west1"
}
variable "zone" {
  description = "GCP zone"
  type        = string
  default     = "europe-west1-b"
}
variable "customer_name" {
  description = "Short customer identifier (e.g. keboola, another-customer). Used as a prefix in resource names."
  type        = string
  validation {
    condition     = can(regex("^[a-z][a-z0-9-]{1,20}$", var.customer_name))
    error_message = "customer_name must be lowercase, start with a letter, and be 2-21 characters long."
  }
}
variable "prod_instance" {
  description = "Prod VM configuration"
  type = object({
    name         = string
    machine_type = optional(string, "e2-small")
    disk_size_gb = optional(number, 30)
    data_disk_gb = optional(number, 50)
    image_tag    = optional(string, "stable")
    upgrade_mode = optional(string, "auto")
    tls_mode     = optional(string, "caddy")
    domain       = optional(string, "")
  })
}
variable "dev_instances" {
  description = "List of dev VMs. An empty list means no dev VMs."
  type = list(object({
    name         = string
    machine_type = optional(string, "e2-small")
    image_tag    = optional(string, "dev")
  }))
  default = []
}
variable "seed_admin_email" {
  description = "Email of the first admin user"
  type        = string
}
variable "data_source" {
  description = "Data source type: keboola | bigquery | csv"
  type        = string
  default     = "keboola"
}
variable "keboola_stack_url" {
  description = "Keboola Stack URL (if data_source = keboola)"
  type        = string
  default     = ""
}
variable "image_repo" {
  description = "Docker image repo"
  type        = string
  default     = "ghcr.io/keboola/agnes-the-ai-analyst"
}
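The customer_name validation can be tried outside Terraform before a name is committed to tfvars. The sketch below applies the same pattern with grep -E, which should behave the same way for this simple expression; valid_customer_name and the sample names are illustrative assumptions.

```shell
# Hypothetical helper applying the customer_name pattern from variables.tf.
valid_customer_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9-]{1,20}$'
}

for candidate in keboola another-customer Acme 9lives a; do
  if valid_customer_name "$candidate"; then
    echo "$candidate: ok"
  else
    echo "$candidate: rejected"
  fi
done
```

The first two names pass; Acme fails on the uppercase letter, 9lives on the leading digit, and a on the two-character minimum.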
- Step 3: Write infra/modules/customer-instance/main.tf
Write:
terraform {
  required_version = ">= 1.5"
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0"
    }
  }
}
locals {
  all_instances = concat(
    [merge(var.prod_instance, { role = "prod" })],
    [for d in var.dev_instances : merge(d, {
      role         = "dev"
      disk_size_gb = 30
      data_disk_gb = 20
      upgrade_mode = "auto"
      tls_mode     = "caddy"
      domain       = ""
    })]
  )
}
# --- Secrets ---
resource "google_secret_manager_secret" "jwt" {
  secret_id = "agnes-${var.customer_name}-jwt-secret"
  project   = var.gcp_project_id
  replication { auto {} }
}
resource "random_password" "jwt" {
  length  = 48
  special = false
}
resource "google_secret_manager_secret_version" "jwt" {
  secret      = google_secret_manager_secret.jwt.id
  secret_data = random_password.jwt.result
}
# Keboola token: a manually created secret (this TF only references it).
data "google_secret_manager_secret_version" "keboola_token" {
  count   = var.data_source == "keboola" ? 1 : 0
  secret  = "keboola-storage-token"
  project = var.gcp_project_id
}
# --- VM service account (dedicated, without the cloud-platform scope) ---
resource "google_service_account" "vm" {
  account_id   = "agnes-${var.customer_name}-vm"
  display_name = "Agnes VM runtime SA (${var.customer_name})"
  project      = var.gcp_project_id
}
resource "google_project_iam_member" "vm_secrets" {
  project = var.gcp_project_id
  role    = "roles/secretmanager.secretAccessor"
  member  = "serviceAccount:${google_service_account.vm.email}"
}
# --- Network ---
resource "google_compute_firewall" "web" {
name = "agnes-${var.customer_name}-allow-web"
project = var.gcp_project_id
network = "default"
allow {
protocol = "tcp"
ports = ["22", "80", "443", "8000"]
}
source_ranges = ["<redacted-ip>/0"]
target_tags = ["agnes-${var.customer_name}"]
}
# --- Persistent data disks + VMs (prod + dev) ---
resource "google_compute_disk" "data" {
for_each = { for inst in local.all_instances : inst.name => inst }
name = "${each.value.name}-data"
project = var.gcp_project_id
zone = var.zone
size = each.value.data_disk_gb
type = "pd-ssd"
}
resource "google_compute_address" "ip" {
for_each = { for inst in local.all_instances : inst.name => inst }
name = "${each.value.name}-ip"
project = var.gcp_project_id
region = var.region
}
resource "google_compute_instance" "vm" {
for_each = { for inst in local.all_instances : inst.name => inst }
name = each.value.name
project = var.gcp_project_id
machine_type = each.value.machine_type
zone = var.zone
tags = ["agnes-${var.customer_name}"]
boot_disk {
initialize_params {
image = "ubuntu-os-cloud/ubuntu-2404-lts-amd64"
size = each.value.disk_size_gb
type = "pd-ssd"
}
}
attached_disk {
source = google_compute_disk.data[each.key].self_link
device_name = "data"
}
network_interface {
network = "default"
access_config {
nat_ip = google_compute_address.ip[each.key].address
}
}
metadata = {
enable-oslogin = "TRUE"
}
metadata_startup_script = templatefile("${path.module}/startup-script.sh", {
customer_name = var.customer_name
image_repo = var.image_repo
image_tag = each.value.image_tag
upgrade_mode = each.value.upgrade_mode
tls_mode = each.value.tls_mode
domain = each.value.domain
data_source = var.data_source
keboola_stack_url = var.keboola_stack_url
seed_admin_email = var.seed_admin_email
role = each.value.role
})
service_account {
email = google_service_account.vm.email
scopes = ["cloud-platform"]
}
labels = {
app = "agnes"
customer = var.customer_name
role = each.value.role
managed = "terraform"
}
lifecycle {
ignore_changes = [metadata_startup_script]
}
}
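The `all_instances` local above relies on Terraform's `merge()`, where keys from later arguments override earlier ones. The following is a minimal Python sketch of that composition, assuming instances are plain dictionaries. The name `build_all_instances` and the sample values are illustrative and not part of the module.

```python
# Values forced onto every dev instance, mirroring the locals block.
DEV_OVERRIDES = {
    "role": "dev",
    "disk_size_gb": 30,
    "data_disk_gb": 20,
    "upgrade_mode": "auto",
    "tls_mode": "caddy",
    "domain": "",
}


def build_all_instances(prod_instance: dict, dev_instances: list[dict]) -> list[dict]:
    """Mimic concat([merge(prod, {role})], [for d in devs : merge(d, overrides)])."""
    prod = {**prod_instance, "role": "prod"}
    # As with Terraform's merge(), keys from the right-hand dict win.
    devs = [{**d, **DEV_OVERRIDES} for d in dev_instances]
    return [prod, *devs]


if __name__ == "__main__":
    instances = build_all_instances(
        {"name": "agnes-prod", "data_disk_gb": 50},
        [{"name": "agnes-dev", "image_tag": "dev", "data_disk_gb": 100}],
    )
    for inst in instances:
        print(inst["name"], inst["role"], inst["data_disk_gb"])
```

One consequence worth noticing: because the fixed values come last in the merge, a dev instance that sets its own `data_disk_gb` still ends up with 20.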
- Step 4: Write
infra/modules/customer-instance/startup-script.sh
Write:
#!/bin/bash
# Agnes VM startup script.
# Idempotent: runs on every boot.
set -euo pipefail
exec > /var/log/agnes-startup.log 2>&1
CUSTOMER_NAME="${customer_name}"
IMAGE_REPO="${image_repo}"
IMAGE_TAG="${image_tag}"
UPGRADE_MODE="${upgrade_mode}"
TLS_MODE="${tls_mode}"
DOMAIN="${domain}"
DATA_SOURCE="${data_source}"
KEBOOLA_STACK_URL="${keboola_stack_url}"
SEED_ADMIN_EMAIL="${seed_admin_email}"
ROLE="${role}"
echo "=== [Agnes $CUSTOMER_NAME $ROLE] Startup ==="
# --- 1. Docker (install if missing) ---
if ! command -v docker &>/dev/null; then
curl -fsSL https://get.docker.com | sh
fi
if ! docker compose version &>/dev/null; then
apt-get update && apt-get install -y docker-compose-plugin
fi
# --- 2. Persistent disk mount ---
DATA_DEV="/dev/disk/by-id/google-data"
DATA_MNT="/data"
if [ -b "$DATA_DEV" ]; then
if ! blkid "$DATA_DEV" | grep -q ext4; then
mkfs.ext4 -F "$DATA_DEV"
fi
mkdir -p "$DATA_MNT"
mountpoint -q "$DATA_MNT" || mount -o discard,defaults "$DATA_DEV" "$DATA_MNT"
grep -q "$DATA_DEV" /etc/fstab || echo "$DATA_DEV $DATA_MNT ext4 discard,defaults,nofail 0 2" >> /etc/fstab
mkdir -p "$DATA_MNT/state" "$DATA_MNT/analytics" "$DATA_MNT/extracts"
fi
# --- 3. App directory (for docker-compose.yml) ---
APP_DIR="/opt/agnes"
mkdir -p "$APP_DIR"
cd "$APP_DIR"
# Fetch the minimal docker-compose files from the public repo (the URLs below point at main, not at a tag)
curl -fsSL "https://raw.githubusercontent.com/keboola/agnes-the-ai-analyst/main/docker-compose.yml" \
-o docker-compose.yml
curl -fsSL "https://raw.githubusercontent.com/keboola/agnes-the-ai-analyst/main/docker-compose.prod.yml" \
-o docker-compose.prod.yml
# --- 4. Fetch secrets from Secret Manager ---
KEBOOLA_TOKEN=""
if [ "$DATA_SOURCE" = "keboola" ]; then
KEBOOLA_TOKEN=$(gcloud secrets versions access latest --secret=keboola-storage-token 2>/dev/null || echo "")
fi
JWT_KEY=$(gcloud secrets versions access latest --secret=agnes-$CUSTOMER_NAME-jwt-secret)
cat > "$APP_DIR/.env" <<EOF
JWT_SECRET_KEY=$JWT_KEY
DATA_DIR=$DATA_MNT
DATA_SOURCE=$DATA_SOURCE
KEBOOLA_STORAGE_TOKEN=$KEBOOLA_TOKEN
KEBOOLA_STACK_URL=$KEBOOLA_STACK_URL
SEED_ADMIN_EMAIL=$SEED_ADMIN_EMAIL
LOG_LEVEL=info
DOMAIN=$DOMAIN
AGNES_TAG=$IMAGE_TAG
EOF
chmod 600 "$APP_DIR/.env"
# --- 5. Start Agnes ---
docker compose -f docker-compose.yml -f docker-compose.prod.yml pull
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
# --- 6. Watchtower (auto-pull of new images) ---
if [ "$UPGRADE_MODE" = "auto" ]; then
docker run -d --name watchtower --restart=unless-stopped \
-v /var/run/docker.sock:/var/run/docker.sock \
containrrr/watchtower \
--interval 300 --cleanup \
$(docker ps --filter "ancestor=$IMAGE_REPO:$IMAGE_TAG" --format "{{.Names}}") 2>/dev/null || true
fi
echo "=== [Agnes $CUSTOMER_NAME $ROLE] Startup complete ==="
docker compose ps
- Step 5: Write
infra/modules/customer-instance/outputs.tf
Write:
output "instance_ips" {
description = "Map of { name → external IP }"
value = { for k, v in google_compute_address.ip : k => v.address }
}
output "prod_ip" {
description = "External IP of the prod instance"
value = google_compute_address.ip[var.prod_instance.name].address
}
output "vm_service_account" {
description = "Email of the VM service account (for additional IAM bindings, e.g. BigQuery)"
value = google_service_account.vm.email
}
output "jwt_secret_name" {
description = "Full name of the JWT secret in Secret Manager"
value = google_secret_manager_secret.jwt.name
}
- Step 6: Remove the old
infra/main.tf and keep it as a backup
mv infra/main.tf infra/main.tf.backup-pre-module
- Step 7: Create
infra/examples/minimal/main.tf
Write:
# Minimal example: single-VM Agnes deploy.
# For an OSS self-hoster who needs only a single prod VM and no dev VMs.
terraform {
required_version = ">= 1.5"
required_providers {
google = { source = "hashicorp/google", version = "~> 5.0" }
}
}
provider "google" {
project = var.gcp_project_id
region = "europe-west1"
}
variable "gcp_project_id" {
type = string
}
module "agnes" {
source = "../../modules/customer-instance"
gcp_project_id = var.gcp_project_id
customer_name = "self-hosted"
seed_admin_email = "admin@example.com"
prod_instance = {
name = "agnes"
data_disk_gb = 30
}
dev_instances = []
data_source = "keboola"
}
output "agnes_ip" {
value = module.agnes.prod_ip
}
- Step 8: Remove
infra/variables.tf, infra/outputs.tf, infra/terraform.tfvars.example (they now belong in the module / examples)
# Keep backups
mv infra/variables.tf infra/variables.tf.backup-pre-module
mv infra/outputs.tf infra/outputs.tf.backup-pre-module
mv infra/terraform.tfvars.example infra/terraform.tfvars.example.backup-pre-module
- Step 9:
terraform init + validate in the example
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss/infra/examples/minimal"
terraform init -backend=false
terraform validate
Expected: Success! The configuration is valid.
- Step 10: Commit
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
git add infra/modules/ infra/examples/
git add -u infra/ # stages the files moved to backups
git commit -m "infra: extract customer-instance Terraform module; add minimal example"
Task 2.2: Tag the first release of the TF module
- Step 1: Open a PR from the feature branch to main
git push origin feature/v2-fastapi-duckdb-docker-cli
gh pr create --title "feat: multi-customer deployment (Phases 1-2)" \
--body "Implements Phases 1-2 of docs/superpowers/plans/2026-04-21-multi-customer-deployment.md"
- Step 2: After the merge to main, create the tag
infra-v1.0.0
git checkout main
git pull
git tag -a infra-v1.0.0 -m "Initial customer-instance module release"
git push origin infra-v1.0.0
Task 2.3: Create the private repo keboola/agnes-infra-keboola (manually)
This step happens outside this repo. The plan only describes it.
- Step 1: Create an empty private repo
gh repo create keboola/agnes-infra-keboola --private --description "Agnes deployment — Keboola internal instance"
- Step 2: Clone it locally next to this repo
cd ~/Library/Mobile\ Documents/com\~apple\~CloudDocs/Sources/VsCode/component_factory/
gh repo clone keboola/agnes-infra-keboola
cd agnes-infra-keboola
- Step 3: Create the structure
mkdir -p terraform .github/workflows config
# Terraform root
cat > terraform/main.tf <<'EOF'
terraform {
required_version = ">= 1.5"
required_providers {
google = { source = "hashicorp/google", version = "~> 5.0" }
}
backend "gcs" {
bucket = "agnes-internal-prod-tfstate"
prefix = "keboola"
}
}
provider "google" {
project = var.gcp_project_id
region = var.region
zone = var.zone
}
module "agnes" {
source = "github.com/keboola/agnes-the-ai-analyst//infra/modules/customer-instance?ref=infra-v1.0.0"
gcp_project_id = var.gcp_project_id
region = var.region
zone = var.zone
customer_name = "keboola"
seed_admin_email = var.seed_admin_email
data_source = "keboola"
keboola_stack_url = var.keboola_stack_url
prod_instance = var.prod_instance
dev_instances = var.dev_instances
}
output "prod_ip" { value = module.agnes.prod_ip }
output "instance_ips" { value = module.agnes.instance_ips }
EOF
cat > terraform/variables.tf <<'EOF'
variable "gcp_project_id" { type = string }
# Single-line HCL blocks may hold only one argument, so blocks with a default use multiple lines.
variable "region" {
type = string
default = "europe-west1"
}
variable "zone" {
type = string
default = "europe-west1-b"
}
variable "seed_admin_email" { type = string }
variable "keboola_stack_url" { type = string }
variable "prod_instance" { type = any }
variable "dev_instances" {
type = any
default = []
}
EOF
cat > terraform/terraform.tfvars.example <<'EOF'
gcp_project_id = "internal-prod"
seed_admin_email = "zdenek.srotyr@keboola.com"
keboola_stack_url = "https://connection.us-east4.gcp.keboola.com/"
prod_instance = {
name = "agnes-prod"
machine_type = "e2-small"
data_disk_gb = 50
image_tag = "stable"
upgrade_mode = "auto"
tls_mode = "caddy"
domain = ""
}
dev_instances = [
{ name = "agnes-dev", image_tag = "dev" }
]
EOF
cat > terraform/.gitignore <<'EOF'
terraform.tfvars
*.tfstate
*.tfstate.*
.terraform/
.terraform.lock.hcl
EOF
cp terraform/terraform.tfvars.example terraform/terraform.tfvars
# Edit terraform.tfvars with the real values if they differ
- Step 4: Initial commit
git add .
git commit -m "initial: Keboola-as-customer Agnes deployment"
git push -u origin main
- Step 5: Upload GCP_SA_KEY as a GitHub secret
# Key created in Task 1.2, step 3
gh secret set GCP_SA_KEY --repo keboola/agnes-infra-keboola \
< ../tmp_oss/agnes-deploy-internal-prod-key.json
Note: If the key has already been deleted, regenerate it: gcloud iam service-accounts keys create ....
- Step 6: First terraform init + plan (locally, to see the diff)
cd terraform
export GOOGLE_APPLICATION_CREDENTIALS="../../tmp_oss/agnes-deploy-internal-prod-key.json"
terraform init
terraform plan
Expected: Plan: N to add, 0 to change, 0 to destroy. (N ~ 15-20 resources)
Review the plan: no destroy actions on the existing data-analyst / data-analyst-dev (those happen only after the new VMs are up).
Task 2.4: Migrate data from the old VMs to the new ones (without downtime risk)
Strategy: Keep the old VMs running. Terraform creates new VMs under different names (agnes-prod, agnes-dev). The data is copied over. Then we switch DNS/IP (or simply announce the new IP) and delete the old VMs.
- Step 1: Snapshot the old /data
We already have one from Task 0.2. If that snapshot is older than 24 h, take a new one:
gcloud compute disks snapshot data-analyst \
--zone=europe-west1-b \
--snapshot-names=data-analyst-migration-$(date +%Y%m%d-%H%M) \
--project=internal-prod
- Step 2: Terraform apply: creates the new VMs (agnes-prod, agnes-dev) alongside the old ones
cd ~/.../agnes-infra-keboola/terraform
terraform apply
# Type 'yes' to confirm
Expected: ~15-20 resources created, ~5 min. Outputs: prod_ip, instance_ips.
- Step 3: Copy the data from the old boot disk to the new persistent disk
The new VMs start with an empty /data. The state from the data-analyst VM has to be copied into it.
The simplest route is a copy staged through the local machine.
# IP of the new prod VM
NEW_PROD_IP=$(cd ~/.../agnes-infra-keboola/terraform && terraform output -raw prod_ip)
# Option: copy an SSH key to the old VM so that it can reach the new one
# (or use OS Login → another prerequisite)
# Alternative: work from the other side: SSH to the old VM, rsync to the new one
# Stop the app on the old VM so the data does not change during the copy
gcloud compute ssh data-analyst --zone=europe-west1-b --project=internal-prod --command="sudo docker compose -f /home/deploy/app/docker-compose.yml -f /home/deploy/app/docker-compose.prod.yml down"
# gcloud compute scp copies only between the local machine and a VM, so stage the data locally first
STAGING=$(mktemp -d)
gcloud compute scp --recurse --zone=europe-west1-b --project=internal-prod \
data-analyst:/home/deploy/app/data-volume "$STAGING/"
# Upload the staged contents to the new VM (the SSH user needs write access to /data)
gcloud compute scp --recurse --zone=europe-west1-b --project=internal-prod \
"$STAGING"/data-volume/* agnes-prod:/data/
# Restart the app on the new VM
gcloud compute ssh agnes-prod --zone=europe-west1-b --project=internal-prod --command="sudo docker compose -f /opt/agnes/docker-compose.yml -f /opt/agnes/docker-compose.prod.yml restart"
Alternatively (cleaner): restore from the snapshot with gcloud compute disks create --source-snapshot, then attach that disk in place of the empty data disk.
- Step 4: Verify the new prod
NEW_PROD_IP=$(cd ~/.../agnes-infra-keboola/terraform && terraform output -raw prod_ip)
curl -s --max-time 10 "http://$NEW_PROD_IP:8000/api/health" | python3 -m json.tool | head -10
Expected: healthy / degraded, tables visible.
- Step 5: Verify login on the new prod
curl -sS -X POST "http://$NEW_PROD_IP:8000/auth/password/login" \
-H "Content-Type: application/json" \
-d '{"email":"zdenek.srotyr@keboola.com","password":"<new strong password from Task 1.10>"}' \
| python3 -c "import sys,json;print('OK' if json.load(sys.stdin).get('role')=='admin' else 'FAIL')"
Expected: OK
- Step 6: Repeat for the dev VM (agnes-dev)
Same steps 1-5.
- Step 7: Stop the old VMs (do NOT delete them yet, only stop)
gcloud compute instances stop data-analyst --zone=europe-west1-b --project=internal-prod
gcloud compute instances stop data-analyst-dev --zone=europe-west1-b --project=internal-prod
- Step 8: Verify that the new prod runs for at least 24 h without problems
# Reminder in the calendar / Slack: "check agnes-prod health in 24h"
curl -s "http://$NEW_PROD_IP:8000/api/health" | python3 -m json.tool
- Step 9: After 24 h of stability, delete the old VMs + their disks + static IP
gcloud compute instances delete data-analyst --zone=europe-west1-b --project=internal-prod --quiet
gcloud compute instances delete data-analyst-dev --zone=europe-west1-b --project=internal-prod --quiet
gcloud compute disks delete data-analyst --zone=europe-west1-b --project=internal-prod --quiet 2>&1 || true
gcloud compute disks delete data-analyst-dev --zone=europe-west1-b --project=internal-prod --quiet 2>&1 || true
gcloud compute addresses delete data-analyst-ip --region=europe-west1 --project=internal-prod --quiet 2>&1 || true
- Step 10: Checkpoint: Phase 2 done
Checklist:
- Terraform module in infra/modules/customer-instance/
- keboola/agnes-infra-keboola private repo exists, terraform apply works
- Prod VM agnes-prod runs with a persistent disk
- Dev VM agnes-dev runs
- Data migrated, login works
- Old VMs deleted, project cleaned up
After Phase 2, Phases 3, 4, and 5 can proceed in parallel.
Phase 3: TLS via Caddy
Phase goal: Agnes is reachable over HTTPS with an automatic Let's Encrypt certificate. Cookie secure=True works.
Task 3.1: Add the Caddy service to docker-compose
Files:
- Create: Caddyfile (in the public repo root)
- Modify: docker-compose.prod.yml (add the caddy service)
- Step 1: Create the Caddyfile
Write Caddyfile:
# Agnes reverse proxy with automatic Let's Encrypt.
# Configured through env vars: AGNES_DOMAIN, ACME_EMAIL.
{$AGNES_DOMAIN} {
# Health check endpoint without the TLS redirect (for internal smoke tests)
@health path /api/health
encode gzip
reverse_proxy app:8000 {
header_up X-Forwarded-Proto https
}
tls {$ACME_EMAIL}
log {
output stdout
format json
}
}
# Fallback for access by IP (no HTTPS, no certificate)
:80 {
reverse_proxy app:8000
}
- Step 2: Add caddy to
docker-compose.prod.yml
Add to services (if it is not there already):
  caddy:
    image: caddy:2-alpine
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy_data:/data
      - caddy_config:/config
    environment:
      AGNES_DOMAIN: ${AGNES_DOMAIN:-:80}
      ACME_EMAIL: ${ACME_EMAIL:-admin@example.com}
    depends_on:
      - app
    profiles:
      - tls # not started without --profile tls
volumes:
  caddy_data:
  caddy_config:
- Step 3: Update the module: pass
tls_mode to the startup script
In infra/modules/customer-instance/startup-script.sh, find the section # --- 5. Start Agnes --- and extend it:
# --- 5. Start Agnes ---
COMPOSE_PROFILES=""
if [ "$TLS_MODE" = "caddy" ] && [ -n "$DOMAIN" ]; then
COMPOSE_PROFILES="--profile tls"
# Additional env vars for Caddy
{
echo "AGNES_DOMAIN=$DOMAIN"
echo "ACME_EMAIL=admin@$${DOMAIN#*.}"
} >> "$APP_DIR/.env"
fi
docker compose -f docker-compose.yml -f docker-compose.prod.yml $COMPOSE_PROFILES pull
docker compose -f docker-compose.yml -f docker-compose.prod.yml $COMPOSE_PROFILES up -d
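The `ACME_EMAIL` line above uses the bash expansion `${DOMAIN#*.}` (written as `$${...}` so that Terraform's `templatefile` leaves it alone), which removes the shortest prefix ending at the first dot. A small Python sketch of the same derivation may help when reasoning about edge cases. The function name `acme_email_for` is hypothetical.

```python
def acme_email_for(domain: str) -> str:
    """Mirror bash ${DOMAIN#*.}: drop the shortest prefix that ends at the first dot."""
    _, dot, rest = domain.partition(".")
    # When the value contains no dot, bash leaves the string unchanged.
    parent = rest if dot else domain
    return f"admin@{parent}"


if __name__ == "__main__":
    for name in ("agnes.keboola.com", "example.com", "localhost"):
        print(name, "->", acme_email_for(name))
```

Note that a two-label domain such as `example.com` yields `admin@com`, so a deployment on an apex domain would need the contact address set some other way.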
- Step 4: Commit the changes in the public repo
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
git add Caddyfile docker-compose.prod.yml infra/modules/customer-instance/startup-script.sh
git commit -m "feat(tls): add Caddy reverse proxy with Let's Encrypt support"
- Step 5: Tag the new module release
# After the PR is merged to main
git checkout main && git pull
git tag -a infra-v1.1.0 -m "Add TLS support via Caddy"
git push origin infra-v1.1.0
Task 3.2: Enable TLS for the Keboola instance
This requires a DNS record. If you have no domain, skip this task and stay on :8000.
- Step 1: In
keboola/agnes-infra-keboola/terraform/terraform.tfvars set the domain
If agnes.keboola.com is available (check with IT), edit:
prod_instance = {
name = "agnes-prod"
# ...
tls_mode = "caddy"
domain = "agnes.keboola.com"
}
And in main.tf, bump the module ref:
source = "github.com/keboola/agnes-the-ai-analyst//infra/modules/customer-instance?ref=infra-v1.1.0"
- Step 2: Terraform apply
cd ~/.../agnes-infra-keboola/terraform
terraform apply
- Step 3: Set the DNS A record
agnes.keboola.com → prod_ip
Manual step (requires access to Keboola DNS). The prod_ip output holds the IP address.
- Step 4: Wait for DNS propagation + the LE certificate
until nslookup agnes.keboola.com | grep -q "$(terraform output -raw prod_ip)"; do sleep 30; done
sleep 60 # time for LE certificate issuance
curl -sSI --max-time 10 https://agnes.keboola.com | head -5
Expected: HTTP/2 200 (not 301, not a TLS error).
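The `until nslookup ...` loop in Step 4 polls forever if the record never appears. As a sketch only, the same wait can be written in Python with an upper bound on attempts. The function name `wait_for_dns`, the `max_attempts` limit, and the injectable `resolve`/`sleep` parameters are assumptions added for illustration and testability; they are not part of the plan.

```python
import socket
import time
from typing import Callable


def wait_for_dns(
    hostname: str,
    expected_ip: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
    interval_s: float = 30.0,
    max_attempts: int = 40,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once hostname resolves to expected_ip, False after max_attempts."""
    for attempt in range(max_attempts):
        try:
            if resolve(hostname) == expected_ip:
                return True
        except OSError:
            # Name not resolvable yet (socket.gaierror is a subclass of OSError).
            pass
        if attempt < max_attempts - 1:
            sleep(interval_s)
    return False
```

A call such as `wait_for_dns("agnes.keboola.com", prod_ip)` would then replace the shell loop, with the caller deciding what to do when it returns False.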
Phase 4: Watchtower (dev VM auto-deploy), OS Login, VM SA
Phase goal: Dev VMs auto-pull new images. OS Login for SSH (no personal key). Dedicated VM SA.
Task 4.1: Watchtower integration (already in the Task 2 startup script; only verified here)
- Step 1: SSH to the dev VM and verify that watchtower is running
gcloud compute ssh agnes-dev --zone=europe-west1-b --project=internal-prod --command="sudo docker ps | grep watchtower"
Expected: container watchtower STATUS Up X minutes.
- Step 2: Test auto-deploy: push a small change to a feature branch and wait
# In the public repo
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
git checkout -b feature/watchtower-test
echo "# test" >> README.md
git add README.md
git commit -m "test: trigger :dev image rebuild"
git push origin feature/watchtower-test
Wait ~5-10 min (CI build + watchtower poll interval of 5 min).
# Check the image SHA on the dev VM
gcloud compute ssh agnes-dev --zone=europe-west1-b --project=internal-prod \
--command="sudo docker inspect app-app-1 --format '{{.Image}}' && sudo docker image inspect \$(sudo docker inspect app-app-1 --format '{{.Image}}') --format '{{.Created}}'"
Expected: Created timestamp within the last ~10 minutes.
Task 4.2: OS Login
- Step 1: Verify that the module sets
enable-oslogin=TRUE
It is already present in infra/modules/customer-instance/main.tf:
metadata = {
enable-oslogin = "TRUE"
}
- Step 2: Check that users have
roles/compute.osAdminLogin on the project
gcloud projects get-iam-policy internal-prod \
--flatten="bindings[].members" \
--filter="bindings.role=roles/compute.osAdminLogin" \
--format="value(bindings.members)"
If the output is empty, add the binding:
gcloud projects add-iam-policy-binding internal-prod \
--member=user:zdenek.srotyr@keboola.com \
--role=roles/compute.osAdminLogin
- Step 3: Test SSH via OS Login
gcloud compute ssh agnes-prod --zone=europe-west1-b --project=internal-prod --command="whoami"
Expected: username in the format zdenek_srotyr_keboola_com (generated by OS Login).
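To compare the `whoami` output against an account, the expected name can be derived from the email. This is a rough sketch that only reproduces the example above by replacing `@` and `.` with underscores. The helper name `oslogin_username` is hypothetical, and any further rules OS Login applies (for example for accounts outside the organization) are deliberately not modeled.

```python
def oslogin_username(email: str) -> str:
    """Approximate the OS Login username for an email, as in the example above."""
    # Assumption: only "@" and "." are replaced; other characters are left as-is.
    return email.replace("@", "_").replace(".", "_")


if __name__ == "__main__":
    print(oslogin_username("zdenek.srotyr@keboola.com"))
```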
Task 4.3: The VM SA already has the correct scope (verify)
- Step 1: Verify that the VM SA has only secretmanager.secretAccessor
gcloud projects get-iam-policy internal-prod \
--flatten="bindings[].members" \
--filter="bindings.members:agnes-keboola-vm@" \
--format="value(bindings.role)"
Expected: roles/secretmanager.secretAccessor (and nothing else).
Phase 5: CI/CD in the private infra repo
Phase goal: A PR in keboola/agnes-infra-keboola runs terraform plan; a merge runs terraform apply. Prod is applied through environment protection with a reviewer.
Task 5.1: plan.yml workflow
Files (in the keboola/agnes-infra-keboola repo):
- Create: .github/workflows/plan.yml
- Step 1: Write plan.yml
name: Terraform Plan
on:
  pull_request:
    paths:
      - 'terraform/**'
permissions:
  contents: read
  pull-requests: write
jobs:
  plan:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: terraform
    steps:
      - uses: actions/checkout@v5
      - uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ~1.7
      - run: terraform init
      - run: terraform fmt -check
      - id: plan
        run: |
          # Disable errexit so the exit code of terraform (not tee) can be recorded
          set +e
          terraform plan -no-color -out=tfplan 2>&1 | tee plan.txt
          status=${PIPESTATUS[0]}
          echo "status=$status" >> "$GITHUB_OUTPUT"
          exit "$status"
      - uses: actions/github-script@v7
        if: always()
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('terraform/plan.txt', 'utf8').slice(0, 60000);
            const body = `### Terraform plan\n\n\`\`\`\n${plan}\n\`\`\``;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: body
            });
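The github-script step above cuts the plan output at 60,000 characters before posting it, presumably to stay below GitHub's size limit for a comment body (that reason is an assumption; the plan does not state it). A minimal Python sketch of the same truncate-and-wrap logic, with the hypothetical helper name `build_plan_comment`:

```python
FENCE = "`" * 3  # Markdown code fence, assembled here so it does not clash with this listing


def build_plan_comment(plan_text: str, limit: int = 60000) -> str:
    """Wrap the (possibly truncated) plan output in a fenced Markdown comment body."""
    truncated = plan_text[:limit]
    return f"### Terraform plan\n\n{FENCE}\n{truncated}\n{FENCE}"


if __name__ == "__main__":
    print(build_plan_comment("Plan: 2 to add, 0 to change, 0 to destroy."))
```

Because the cut is a plain slice, a very long plan loses its tail, which is where Terraform prints the summary line. Keeping the last part of the output instead would be a reasonable variation.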
- Step 2: Commit
cd ~/.../agnes-infra-keboola
git add .github/workflows/plan.yml
git commit -m "ci: add terraform plan on PR"
git push
Task 5.2: apply.yml workflow with environment protection
Files:
- Create: .github/workflows/apply.yml
- Step 1: Write apply.yml
name: Terraform Apply
on:
  push:
    branches: [main]
    paths:
      - 'terraform/**'
  workflow_dispatch: {}
permissions:
  contents: read
jobs:
  apply-dev:
    runs-on: ubuntu-latest
    environment: dev # no protection
    defaults:
      run:
        working-directory: terraform
    steps:
      - uses: actions/checkout@v5
      - uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ~1.7
      - run: terraform init
      - run: terraform apply -auto-approve -target='module.agnes.google_compute_instance.vm["agnes-dev"]'
  apply-prod:
    needs: apply-dev
    runs-on: ubuntu-latest
    environment: prod # protected — requires reviewer
    defaults:
      run:
        working-directory: terraform
    steps:
      - uses: actions/checkout@v5
      - uses: google-github-actions/auth@v2
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ~1.7
      - run: terraform init
      - run: terraform apply -auto-approve
      - name: Smoke test
        run: |
          PROD_IP=$(terraform output -raw prod_ip)
          for i in 1 2 3 4 5; do
            if curl -sf "http://$PROD_IP:8000/api/health" >/dev/null; then
              echo "Healthy"; exit 0
            fi
            sleep 15
          done
          echo "Health check failed"; exit 1
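The smoke test at the end of apply-prod is a bounded retry loop: five probes of `/api/health`, 15 seconds apart. A Python sketch of the same pattern follows, assuming only the standard library. The names `http_ok` and `smoke_test` and the injectable `probe`/`sleep` parameters are illustrative, and unlike the shell loop this version skips the sleep after the final failed attempt.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout_s: float = 10.0) -> bool:
    """True when the URL answers with a 2xx status, roughly like `curl -sf`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers connection errors, timeouts, and HTTP 4xx/5xx responses.
        return False


def smoke_test(
    url: str,
    probe: Callable[[str], bool] = http_ok,
    attempts: int = 5,
    delay_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe url up to `attempts` times; return True on the first success."""
    for i in range(attempts):
        if probe(url):
            return True
        if i < attempts - 1:
            sleep(delay_s)
    return False
```

Passing `probe` and `sleep` as parameters keeps the retry logic testable without a running instance.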
- Step 2: Configure the environments in the GitHub UI
Navigate to keboola/agnes-infra-keboola → Settings → Environments → New environment:
- dev: no protection
- prod:
  - Required reviewers: @ZdenekSrotyr (or @keboola-ops-team)
  - Wait timer: 5 min
  - Deployment branches: Selected branches → main
- Step 3: Commit the workflow
git add .github/workflows/apply.yml
git commit -m "ci: add terraform apply with dev/prod environments and smoke test"
git push
- Step 4: Test the flow: open a dummy PR, watch the plan, merge, apply
git checkout -b test/ci-flow
# Trivial edit under terraform/ so that the workflow path filter matches
echo "# ci flow test" >> terraform/README.md
git add terraform/README.md
git commit -m "test: CI flow"
git push origin test/ci-flow
gh pr create --title "test: CI flow" --body "Testing plan → apply flow"
In the PR:
- Wait for plan.yml → a comment with the plan
- Approve + merge
- Watch apply-dev (automatic), then apply-prod (waits for a reviewer)
- Approve the prod deploy
- Verify that the smoke test passes
Task 5.3: Rotate the SA key (from a local copy to the GH secret only)
- Step 1: Delete the local SA key
rm ~/.../agnes-deploy-internal-prod-key.json
- Step 2: Delete the old key in GCP (key rotation)
# List the keys
gcloud iam service-accounts keys list \
--iam-account=agnes-deploy@internal-prod.iam.gserviceaccount.com \
--project=internal-prod
After verifying that GH Actions works with the new key (after the first successful apply), delete the old one.
Phase 6: Template repo + onboarding playbook
Phase goal: A second customer (another-customer) can be deployed in < 1 hour.
Task 6.1: Create keboola/agnes-infra-template
- Step 1: Create an empty repo to serve as the template
gh repo create keboola/agnes-infra-template --public --description "Template for Agnes per-customer infrastructure"
cd ~/Library/Mobile\ Documents/com\~apple\~CloudDocs/Sources/VsCode/component_factory/
gh repo clone keboola/agnes-infra-template
cd agnes-infra-template
- Step 2: Copy the structure from
agnes-infra-keboola and replace the concrete values with placeholders
# Copy the structure
cp -r ../agnes-infra-keboola/terraform .
cp -r ../agnes-infra-keboola/.github .
# Reset the concrete values
cat > terraform/main.tf <<'EOF'
terraform {
required_version = ">= 1.5"
required_providers {
google = { source = "hashicorp/google", version = "~> 5.0" }
}
backend "gcs" {
bucket = "REPLACE_WITH_YOUR_BUCKET"
prefix = "REPLACE_WITH_CUSTOMER_NAME"
}
}
provider "google" {
project = var.gcp_project_id
region = var.region
zone = var.zone
}
module "agnes" {
source = "github.com/keboola/agnes-the-ai-analyst//infra/modules/customer-instance?ref=infra-v1.1.0"
gcp_project_id = var.gcp_project_id
region = var.region
zone = var.zone
customer_name = var.customer_name
seed_admin_email = var.seed_admin_email
data_source = var.data_source
keboola_stack_url = var.keboola_stack_url
prod_instance = var.prod_instance
dev_instances = var.dev_instances
}
output "prod_ip" { value = module.agnes.prod_ip }
output "instance_ips" { value = module.agnes.instance_ips }
EOF
cat > terraform/variables.tf <<'EOF'
variable "gcp_project_id" { type = string }
# Single-line HCL blocks may hold only one argument, so blocks with a default use multiple lines.
variable "region" {
type = string
default = "europe-west1"
}
variable "zone" {
type = string
default = "europe-west1-b"
}
variable "customer_name" { type = string }
variable "seed_admin_email" { type = string }
variable "data_source" {
type = string
default = "keboola"
}
variable "keboola_stack_url" {
type = string
default = ""
}
variable "prod_instance" { type = any }
variable "dev_instances" {
type = any
default = []
}
EOF
cat > terraform/terraform.tfvars.example <<'EOF'
# Copy this file → terraform.tfvars and fill in the values.
# terraform.tfvars is gitignored (never commit it!)
gcp_project_id = "REPLACE" # Your GCP project
customer_name = "REPLACE" # Short identifier, e.g. "acme"
seed_admin_email = "admin@example.com"
data_source = "keboola" # keboola | bigquery | csv
keboola_stack_url = "https://connection.keboola.com/"
prod_instance = {
name = "agnes-prod"
machine_type = "e2-small"
data_disk_gb = 50
image_tag = "stable"
upgrade_mode = "auto"
tls_mode = "caddy"
domain = ""
}
dev_instances = [
{ name = "agnes-dev", image_tag = "dev" }
]
EOF
- Step 3: Copy the bootstrap script from the public repo
cp ../tmp_oss/scripts/bootstrap-gcp.sh .
- Step 4: Write README.md for onboarding
Write:
# Agnes Infrastructure Template
Deploy Agnes (AI Data Analyst) into your own GCP project.
## Prerequisites
- GCP project with billing enabled
- `gcloud` CLI authenticated as project Owner
- `terraform` >= 1.5
- GitHub account (for private repo + Actions)
## 1. Bootstrap GCP
./bootstrap-gcp.sh <YOUR_GCP_PROJECT_ID>
Output: a service-account key JSON.
## 2. Clone the template
gh repo create <YOUR_ORG>/agnes-infra --template keboola/agnes-infra-template --private
cd agnes-infra
## 3. Set the secrets
# SA key (from step 1)
gh secret set GCP_SA_KEY < path/to/key.json
rm path/to/key.json
# Keboola token (if data_source = keboola)
gcloud secrets create keboola-storage-token --data-file=- <<< "YOUR_TOKEN"
## 4. Configuration
Edit terraform/main.tf: update backend.bucket and backend.prefix.
Copy terraform/terraform.tfvars.example → terraform/terraform.tfvars and fill it in.
## 5. First apply
cd terraform
terraform init
terraform plan
terraform apply
The IP of the prod VM is in the output.
## 6. Login
# Bootstrap the first admin user
curl -X POST http://$(terraform output -raw prod_ip):8000/auth/bootstrap \
-H "Content-Type: application/json" \
-d '{"email": "YOU@example.com", "password": "YOUR_STRONG_PASSWORD"}'
Open http://<prod_ip>:8000/login.
## 7. Upgrade workflow
- :stable image → auto-upgrade via Watchtower
- Infra change: PR in this repo → terraform plan in the PR → merge → apply (prod requires a reviewer)
- TF module upgrade: Renovate opens a PR with the new ref=infra-vX.Y.Z
More details: https://github.com/keboola/agnes-the-ai-analyst/blob/main/docs/ONBOARDING.md
- Step 5: Create the README, push, and mark the repo as a template
git add .
git commit -m "initial template"
git push -u origin main
gh repo edit keboola/agnes-infra-template --template
Task 6.2: Write ONBOARDING.md in the public repo
Files:
- Create: docs/ONBOARDING.md (in the public repo)
- Step 1: Write ONBOARDING.md
Write docs/ONBOARDING.md with content identical to the README in the template repo, plus the note "physical template: keboola/agnes-infra-template".
- Step 2: Commit
cd "/Users/zdeneksrotyr/Library/Mobile Documents/com~apple~CloudDocs/Sources/VsCode/component_factory/tmp_oss"
git add docs/ONBOARDING.md
git commit -m "docs: onboarding guide for deploying Agnes per customer"
Task 6.3: Try the onboarding on a dummy customer (sanity check)
- Step 1: Create a test GCP project
gcloud projects create agnes-onboarding-test-$(date +%s) --name="Agnes onboarding test"
# Link billing (via UI) if required
- Step 2: Run the bootstrap
./scripts/bootstrap-gcp.sh <test-project-id>
- Step 3: Clone the template into a dummy repo
gh repo create zdeneksrotyr/agnes-infra-test --template keboola/agnes-infra-template --private
gh repo clone zdeneksrotyr/agnes-infra-test
cd agnes-infra-test
- Step 4: Walk through the README step by step and measure the time
Goal: end-to-end < 1 hour. Record the obstacles and feed them back into the README.
- Step 5: Cleanup: delete the test project
gcloud projects delete <test-project-id>
gh repo delete zdeneksrotyr/agnes-infra-test --yes
Task 6.4: Renovate configuration
- Step 1: Add renovate.json to the template repo
Write keboola/agnes-infra-template/renovate.json:
{
"$schema": "https://docs.renovatebot.com/renovate-schema.json",
"extends": ["config:base"],
"customManagers": [
{
"customType": "regex",
"fileMatch": ["\\.tf$"],
"matchStrings": [
"source\\s*=\\s*\"github\\.com/keboola/agnes-the-ai-analyst//infra/modules/customer-instance\\?ref=(?<currentValue>infra-v\\d+\\.\\d+\\.\\d+)\""
],
"datasourceTemplate": "github-tags",
"depNameTemplate": "keboola/agnes-the-ai-analyst",
"packageNameTemplate": "keboola/agnes-the-ai-analyst",
"versioningTemplate": "regex:^infra-v(?<major>\\d+)\\.(?<minor>\\d+)\\.(?<patch>\\d+)$"
}
],
"packageRules": [
{
"matchPackageNames": ["keboola/agnes-the-ai-analyst"],
"matchUpdateTypes": ["major"],
"prPriority": 10
}
]
}
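Before relying on the custom manager, it can help to check that its two regular expressions actually match the module `source` line used in the Terraform roots above. The sketch below ports them to Python, where named groups are written `(?P<name>...)` rather than `(?<name>...)`. The function name `extract_module_version` is hypothetical, and the check says nothing about how Renovate itself evaluates the patterns.

```python
import re

# Python port of "matchStrings" from renovate.json.
MATCH_STRING = re.compile(
    r'source\s*=\s*"github\.com/keboola/agnes-the-ai-analyst//infra/modules/'
    r'customer-instance\?ref=(?P<currentValue>infra-v\d+\.\d+\.\d+)"'
)
# Python port of the regex inside "versioningTemplate".
VERSIONING = re.compile(r"^infra-v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$")


def extract_module_version(tf_source: str):
    """Return (major, minor, patch) of the pinned module ref, or None if absent."""
    found = MATCH_STRING.search(tf_source)
    if found is None:
        return None
    version = VERSIONING.match(found.group("currentValue"))
    if version is None:
        return None
    return tuple(int(version.group(k)) for k in ("major", "minor", "patch"))


if __name__ == "__main__":
    line = (
        'source = "github.com/keboola/agnes-the-ai-analyst'
        '//infra/modules/customer-instance?ref=infra-v1.1.0"'
    )
    print(extract_module_version(line))
```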
- Step 2: Install the Renovate GitHub App on the private repos
Manual step in GitHub: Settings → Integrations → Renovate → grant access.
Final checkpoint
- Phase 1 complete: prod runs from the :stable image, no git pull from the fork
- Phase 2 complete: TF module, PD, Keboola deployed through the module
- Phase 3 complete: HTTPS works (if DNS is available)
- Phase 4 complete: watchtower on the dev VM auto-pulls :dev, OS Login active
- Phase 5 complete: GHA CI/CD works, prod apply requires review
- Phase 6 complete: template repo exists, ONBOARDING.md, Renovate configured
- Old personal fork deleted
- Keboola token rotated and stored in Secret Manager
- Documentation updated
Self-Review
Spec coverage:
- §2 Self-deploy model → Task 1.2 (bootstrap), Task 2.3 (private repo), Task 6 (template) ✅
- §3 Repo architecture → Task 2.1 (module), Task 6.1 (template), Task 2.3 (customer repo) ✅
- §4 Release model → Task 1.1 (per-branch tagging), release.yml exists ✅
- §5 Branch-aware dev → Task 2.1 (dev_instances variable), Task 4.1 (watchtower) ✅
- §6 Prod upgrade model → Task 4.1 (auto via watchtower), pinned mode via tfvars (customer's choice) ✅
- §7 Security → Task 1.2-1.4 (Secret Manager, SA), Task 4.2 (OS Login), Task 5.2 (env protection) ✅
- §8 Onboarding → Task 6.1-6.4 ✅
- §9 Change flow → Task 5.1-5.2 (plan/apply), Task 4.1 (watchtower pipeline) ✅
- §10 Backup/monitoring → partial; monitoring is a follow-up (§14) ✅
Placeholder scan: All code, configuration, and commands are concrete.
Type consistency: The prod_instance object and the dev_instances list have a consistent schema across Task 2.1, Task 2.3, and Task 6.1.
Gap: The customer-selected pinned upgrade mode (§6.1) is driven by Renovate. The Renovate configuration is in Task 6.4, but it does not cover image tag upgrades (only the module ref). Follow-up: extend customManagers in renovate.json to cover image_tag in tfvars.