CLAUDE.md rewritten (708 -> ~320 lines): four overlapping release sections collapsed to one, stale v1->v35 schema history dropped (it lives in CHANGELOG), marketplace endpoint internals and verbose process sections moved out or tightened. New focused docs: - docs/RELEASING.md - release process, deploy workflows, CI quirks (RELEASE_TEMPLATE.md folded in as an appendix) - docs/marketplace.md - marketplace ingestion + re-serving internals - docs/README.md - documentation index by audience, linked from README.md and CLAUDE.md Archived under docs/archive/: docs/superpowers/ (52 historical planning artifacts), HACKATHON.md, pd-ps-comments.md, security-audit-2026-04.md, future/NOTIFICATIONS.md. Removed the docs/auto-install.md stub. Fixed dangling links in connectors/jira/README.md and dev_docs/README.md, repointed code/doc references to archived paths.
12 KiB
Releasing & deploying
The full release process for Agnes. CLAUDE.md carries the short version; this doc is the operational reference. Read it linearly the first few times — once internalized, the order matters less, but the non-obvious gotchas never go away.
Changelog discipline — non-negotiable
Every PR that adds, removes, or changes user-visible behavior MUST update
CHANGELOG.md in the same PR. No exceptions, no follow-ups, no "I'll do it
after merge". User-visible = anything an operator, end-user, or downstream
integrator can observe: CLI flags / output / exit codes, REST endpoints /
payloads / status codes, web UI, instance.yaml schema, env vars,
extract.duckdb contract, Docker / compose / Caddyfile knobs, default
behaviors, breaking changes, security fixes.
How:
- Add a bullet under the topmost
## [Unreleased]heading (create one if missing — it sits above the latest released version). - Group by
### Added/### Changed/### Fixed/### Removed/### Internal(Keep-a-Changelog sections). - Mark breaking changes with
**BREAKING**at the start of the bullet — operators grep for that string before bumping the pin. - Reference the relevant doc/runbook if one exists (e.g.
see docs/auth-groups.md), don't restate it. - Internal-only changes (refactors, test additions, dependency bumps without
behavior change) go under
### Internal— still log them, just keep them terse.
Reviewers should bounce PRs that touch user-visible behavior without a changelog update — same way they'd bounce a PR with no test changes for new logic.
Release-cut belongs to the PR — non-negotiable
The version bump + CHANGELOG rename + new empty [Unreleased] are the LAST
commit on the PR that earned the version. Never a standalone follow-up PR.
When a PR lands the only [Unreleased] content (or is the last in a queue of
in-flight feature PRs), the release-cut MUST ship as part of the same merge.
Standalone release-cut PRs add review-overhead PRs to history with no behavior
change of their own and pollute git log with bookkeeping commits separated
from the work that earned them.
Mandatory checklist before approving / enabling auto-merge on ANY PR:
- Stop. Will this PR land alone in
[Unreleased](no other in-flight PRs queued behind it)? - If yes, the release-cut is REQUIRED in the same PR before merge. BEFORE
pushing the final commit:
- Bump
pyproject.tomltoX.Y.Z - Rename
## [Unreleased]→## [X.Y.Z] — YYYY-MM-DD, add a new empty## [Unreleased]on top - Either squash these into the consolidation commit OR add as a separate
release: X.Y.Zcommit on the same branch
- Bump
- THEN push, approve, enable auto-merge.
- After auto-merge fires: tag
vX.Y.Zagainst the merge commit + create a GitHub Release. Done — one PR, one merge, one release.
Failure mode to avoid: enabling auto-merge on the feature PR thinking "I'll add the release-cut after." Auto-merge fires faster than the second commit lands. The window closes; the only fix is a standalone release-cut PR — exactly what this rule prohibits.
Acceptable standalone release-cut (rare): only when [Unreleased]
accumulated bullets from MULTIPLE already-merged PRs AND no further
behavior-change PR is queued — i.e. the cut is the only outstanding work and
there's no PR to attach it to.
Release workflow — concrete recipe
Happy path (8 steps)
# 1. Branch from a fresh checkout. iCloud Drive worktrees randomly hang
# on git operations — use a fresh shallow clone in /tmp instead.
cd /tmp && git clone --depth 50 --branch main \
https://github.com/keboola/agnes-the-ai-analyst.git agnes-<topic>
cd agnes-<topic> && git checkout -b zs/<branch-name>
# 2. Make the change + tests. Run the AREA pytest while iterating
# (e.g. `pytest tests/test_X.py -p no:xdist -q`).
# 3. Add a CHANGELOG bullet under [Unreleased].
# Group: Added | Changed | Fixed | Removed | Internal
# Mark BREAKING with **BREAKING** prefix.
# 4. Commit the change(s). Multiple logical commits OK; release-cut
# will be a SEPARATE last commit (next step). DO NOT bundle the
# release-cut into the same commit as the change — it pollutes
# the SHA that auto-close keywords reference and makes revert
# targeted at the change-only difficult.
# 5. Run the full pytest suite locally:
# `pytest tests/ -p no:xdist -q` (or `-n auto` if xdist works).
# Pre-existing fails (e.g. test_readers_in_pre_init_dir under
# subprocess timeout) are OK to ignore; verify by reverting your
# diff and reproducing on bare main.
# 6. Release-cut commit (LAST commit on the PR per the rule above):
# - Bump pyproject.toml: version = "X.Y.Z"
# - Rename `## [Unreleased]` → `## [X.Y.Z] — YYYY-MM-DD`
# - Add a fresh empty `## [Unreleased]` line above
# Commit message: `release: X.Y.Z — <one-line summary>`
# 7. Push branch + open PR + enable auto-merge SQUASH:
# git push -u origin HEAD
# gh pr create --repo keboola/agnes-the-ai-analyst \
# --head <branch> --title "<...>" --body "<...>"
# gh pr merge <N> --repo keboola/agnes-the-ai-analyst \
# --squash --auto --delete-branch
# 8. After auto-merge fires (poll or `Monitor`):
# git fetch origin --tags
# git tag vX.Y.Z <merge-sha>
# git push origin vX.Y.Z
# gh release create vX.Y.Z --repo keboola/agnes-the-ai-analyst \
# --title "vX.Y.Z — <...>" --notes "<copy-paste from CHANGELOG>"
Picking the next version
pyproject.toml's current version is the next-release target (post-cut
from the previous release). Pre-1.0 we patch-bump for everything that doesn't
break operator-facing APIs:
instance.yamlschema additions, new env vars, new endpoints → patch (e.g. 0.54.3 → 0.54.4)- New CLI subcommands, BREAKING removals, schema migrations → still patch within the current 0.5x cycle (no minor bumps cut today)
- The CHANGELOG
**BREAKING**marker is what operators grep for; the version number is secondary
Always check git tag -l "v0.X*" before naming — if v0.54.0 is already
tagged, the next one is v0.54.1, even if pyproject.toml still says 0.54.0
from a stale post-cut commit (we've shipped that race before).
Authoring expectations on the PR
- Self-PRs (you're both author and reviewer): GitHub forbids self-approve.
If branch protection requires N approving reviews (we don't today —
required_approving_review_count = 0), you need someone else to approve. With our current 0-review setup, self-PRs can still merge automatically once required CI passes. - Other people's PRs you're taking over: dismiss any prior
CHANGES_REQUESTED reviews (yours or someone else's) before auto-merge can
fire.
gh pr review <N> --approve --body "..."after pushing your fixes. - Devin Review: not a required check today; runs in parallel and posts a comment. Don't wait on it for merge unless the human reviewer explicitly asks.
CI quirks you WILL hit
gh pr checksglosses CANCELLED asfail. When you force-push (rebase, amend), GitHub auto-cancels the in-flightReleaseworkflow run on the older SHA. Those cancelled jobs show up as "fail" in the PR's check summary and tab forever, even after newer runs succeed. Look at the conclusion column, not just the count. Rule of thumb: if the same check name appears with bothpassandfailrows, thefailrow is from an older auto-cancelled SHA. Verify withgh api repos/keboola/agnes-the-ai-analyst/commits/<sha>/check-runs— the raw API distinguishescancelledfromfailuretruthfully.- Branch protection's "strict" mode caches cancelled
testas blocking even after newertestruns succeed. Symptom:mergeable_state: blockeddespite all required checks green on the latest SHA. Fix: re-run the cancelledReleaseworkflow run (gh run rerun <run-id>); once itstestjob lands as success, the block clears. We've hit this on PRs #273, #281, #285, #286. - Required checks (per branch protection):
test+docker-buildonly. Other workflows (cli-wheel-clean-install,build-and-push,Release-pipeline, Devin Review) are advisory — green/red doesn't gate merge. enforce_admins: truein branch protection means--adminflag ongh pr mergedoes NOT bypass. Don't try; just fix the underlying block.
Recovery when something derails
- Force-pushed and lost auto-merge? GitHub usually preserves auto-merge
across force-pushes for the same PR; if it cleared, just re-run
gh pr merge <N> --squash --auto --delete-branch. - Release-cut commit forgot to land? That's the failure mode the "Release-cut belongs to the PR" rule prevents. If it happens anyway: open a follow-on PR with ONLY the release-cut commit, ship it, and write up why in your post-mortem comment.
- Wrong version number tagged?
git tag -d vX.Y.Z && git push --delete origin vX.Y.Zthen re-tag against the right SHA. Update the GitHub Release if you already created it.
Deploy workflows
Two separate release.yml-style workflows produce GHCR images. Pick the one that matches what you're shipping.
release.yml — auto-build on every push
Runs on every push to every branch.
- Push to
main→:stable,:stable-YYYY.MM.N(CalVer). - Push to non-main
<prefix>/<branch>→:dev,:dev-YYYY.MM.N,:dev-<branch-slug>, and (when prefix isn't a Git Flow convention):dev-<prefix>-latestalias.
VMs that pin to a floating tag (:dev, :dev-<prefix>-latest) auto-upgrade
within ~5 min via the cron in agnes-auto-upgrade.sh. Convenient for
per-developer dev VMs; footgun for shared dev VMs (last pusher wins,
regardless of who).
keboola-deploy.yml — tag-triggered, explicit deploy only
Runs only on git tags matching keboola-deploy-*. Publishes:
:keboola-deploy-<git-tag-suffix>— immutable, tied to the exact commit:keboola-deploy-latest— floating alias the consumer pins to
Operator workflow:
git checkout <commit-or-branch>
git tag keboola-deploy-<descriptive-name>
git push origin keboola-deploy-<descriptive-name>
# → workflow builds + publishes both tags
# → VM cron picks up :keboola-deploy-latest within ~5 min
# → manual cron trigger (skip the wait): sudo /usr/local/bin/agnes-auto-upgrade.sh on the VM
Use this when the consumer (e.g. a customer dev VM) needs
deploy-when-I-decide semantics — no surprise rollouts from upstream branch
pushes by other contributors. The infra repo pins
image_tag = "keboola-deploy-latest" on the relevant VM.
Module versioning
The customer-instance Terraform module under infra/modules/customer-instance/
is published as infra-vMAJOR.MINOR.PATCH git tags (separate from app CalVer
tags). Bump on any module-API change; downstream infra repos pin to the tag in
their source = "github.com/keboola/agnes-the-ai-analyst//infra/modules/customer-instance?ref=infra-v1.X.Y".
After merging a module change to main:
git tag infra-vX.Y.Z origin/main
git push origin infra-vX.Y.Z
Replacing a VM after a startup-script change
Module sets lifecycle { ignore_changes = [metadata_startup_script] } on
google_compute_instance.vm so normal terraform apply doesn't churn running
VMs. To propagate a startup-script update, trigger the consumer's apply workflow
manually with the VM resource address — typical workflow_dispatch input is
recreate_targets='module.agnes.google_compute_instance.vm["<vm-name>"]'.
Appendix: CHANGELOG entry skeleton
Copy this when adding to ## [Unreleased] in CHANGELOG.md. Drop the sections
you don't need; keep the Keep-a-Changelog order.
### Added
- New feature description.
### Changed
- Change description. **BREAKING** prefix + migration steps if operator-facing.
### Fixed
- Bug fix description.
### Removed
- **BREAKING** removed feature — what replaces it.
### Internal
- Refactors, test additions, dependency bumps with no behavior change.
At release-cut time ## [Unreleased] is renamed to ## [X.Y.Z] — YYYY-MM-DD
and a fresh empty ## [Unreleased] is added on top. CI publishes the matching
stable-YYYY.MM.N image tag for the merge commit (see Deploy workflows above).