* ci: add actionlint workflow lint, drop superseded deploy.yml stub
* ci: extract rollback into reusable rollback.yml, wire into release smoke-test
* ci: add weekly prune-dev-tags workflow for legacy CalVer tag/image cleanup
* release: 0.54.17 — CI/release workflow consolidation
* fix(ci): warn when rollback.yml receives a non-stable failed_image_tag
* fix(ci): rollback.yml + prune-dev-tags.sh review findings
rollback.yml:
- Pass workflow_dispatch inputs (failed_image_tag, target_image_tag)
through env: instead of textual ${{ }} splicing into bash run blocks
— prevents an actor with workflow_dispatch privilege from injecting
shell via quote/backtick payloads.
- Guard against TARGET == FAILED when only one stable-* tag exists
(fresh repo, or aggressive pruning at month boundary). Fail loudly
rather than re-push the broken image as :stable.
- Add commit SHA to the rollback tracking-issue body — github.sha is
inherited across workflow_call, so on-call no longer has to navigate
rollback run → caller-workflow breadcrumb → failing commit.
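
The first two rollback.yml fixes can be sketched in a few lines of bash. This is an illustrative sketch only, not the workflow's actual step: the helper name `check_rollback_target` and the placeholder tag values are assumptions. The variable names mirror the `failed_image_tag` and `target_image_tag` inputs, which the step is assumed to receive through its `env:` block.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: refuse to roll back onto the image that just failed.
check_rollback_target() {
  local target="$1" failed="$2"
  if [[ -z "$target" || "$target" == "$failed" ]]; then
    echo "error: no distinct rollback target (target='${target}', failed='${failed}')" >&2
    return 1
  fi
  echo "rolling back ${failed} -> ${target}"
}

# The inputs arrive as environment variables, so bash treats their contents
# as data. Quotes or backticks inside a value are never parsed as script text.
# The defaults below are placeholders, not real tags.
FAILED_IMAGE_TAG="${FAILED_IMAGE_TAG:-stable-example-2}"
TARGET_IMAGE_TAG="${TARGET_IMAGE_TAG:-stable-example-1}"

check_rollback_target "$TARGET_IMAGE_TAG" "$FAILED_IMAGE_TAG"
```

Because the values are only ever expanded inside double quotes, a payload such as `x"; echo INJECTED; "` is printed literally instead of being executed.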
prune-dev-tags.sh:
- Replace 'printf … | head -20' preview pipeline with array slice
('"${TO_PRUNE[@]:0:20}"'). Under set -o pipefail, head closing
the pipe early SIGPIPEs printf (exit 141) and aborts the script
before any deletion runs — exactly the multi-month-backlog scenario
the script targets.
- Refactor GHCR-pass: fetch versions JSON once before the loop, then
build a tag→version-id map up-front. Closes two problems:
1. O(N × pages) GHCR API calls collapse to one paginated listing
— months of accumulated CalVer tags no longer risk tripping
abuse detection.
2. The new jq filter excludes any version that ALSO carries a
floating alias (:stable, :dev, *-latest). GHCR DELETE-version
drops the entire manifest, so pruning a CalVer tag that shares
a manifest with :stable (e.g. after a rollback re-tag) would
have vaporized :stable. Now it's skipped with a log line.
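
A minimal sketch of the two prune-dev-tags.sh ideas above: the pipe-free preview and the floating-alias check. The function names and sample tag names are hypothetical. The real script applies the alias rule inside a jq filter over the GHCR versions JSON, which is not reproduced here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print at most the first 20 entries using an array slice. With no pipe
# involved, no reader can close early and SIGPIPE the writer under pipefail.
preview_prune_list() {
  if (($# == 0)); then
    return 0
  fi
  local -a tags=("$@")
  local -a first=("${tags[@]:0:20}")
  printf '  %s\n' "${first[@]}"
  if ((${#tags[@]} > 20)); then
    echo "  (and $((${#tags[@]} - 20)) more)"
  fi
}

# Return 0 when any tag carried by a version is a floating alias
# (stable, dev, or anything ending in -latest). Such a version must be
# skipped, because deleting it would drop the shared manifest.
has_floating_alias() {
  local tag
  for tag in "$@"; do
    case "$tag" in
      stable | dev | *-latest) return 0 ;;
    esac
  done
  return 1
}

# Placeholder tag names, for illustration only.
TO_PRUNE=(example-tag-1 example-tag-2 example-tag-3)
preview_prune_list "${TO_PRUNE[@]}"
```

A caller would pass all tags attached to one version to `has_floating_alias` and log a skip line when it returns 0.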
lint-workflows.yml:
- Add an explicit shellcheck step. actionlint only walks
.github/workflows/ and the shell embedded in their run: blocks, so
freestanding scripts/ops/*.sh (which are in the workflow's path
filter) were never actually validated despite triggering CI.
* fix(ci): shellcheck --severity=warning to skip pre-existing info findings
The new shellcheck step caught info-level findings (SC1091, SC2015) in
agnes-auto-upgrade.sh / agnes-tls-rotate.sh — pre-existing, not regressed
by this PR. Constrain shellcheck to warning+ severity (real bugs) so info
and style findings don't block CI; mirrors the actionlint step's
continue-on-error initial-rollout posture.
* fix(ci): second-pass review findings — concurrency, walk-back, failure propagation
rollback.yml:
- Add own concurrency block (group: rollback-<repo>-<failed_tag>,
cancel-in-progress: false). The caller release.yml uses
cancel-in-progress: true to avoid duplicate CalVer claims, but a
second push to main mid-rollback would otherwise kill the workflow
between the :stable recovery push and the :deprecated-* audit push,
leaving :stable stuck on the broken image. A reusable workflow's own
concurrency overrides the inherited one.
- Walk back through stable-* tags newest-first, skipping any whose
:deprecated-<stripped> GHCR alias already exists (carries the mark of
a prior failed rollback). The previous 'second-most-recent' heuristic
could re-point :stable at a known-broken image on cascading failures.
- Reorder re-tag step: push :stable recovery FIRST, then the
:deprecated-* audit tag. Defense in depth — even if the concurrency
block somehow misfires, the worst case is missing audit metadata
rather than production stuck on the broken image.
- Move GHCR login before resolve step so 'docker manifest inspect' can
probe for :deprecated-* aliases during walk-back.
- Document the top-level permissions block's dual semantics
(workflow_dispatch grants directly; workflow_call acts as a cap
intersected with the caller's job-level permissions).
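
The walk-back resolution described above could look roughly like the sketch below. It is a sketch under stated assumptions, not the workflow's code: the function names, the `IMAGE` placeholder, and the reading of `<stripped>` as "the tag without its `stable-` prefix" are all assumptions. The caller is assumed to pass the candidate tags already sorted newest-first.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder registry path; replace with the real image reference.
IMAGE="${IMAGE:-ghcr.io/OWNER/IMAGE}"

# Hypothetical probe: succeeds when the :deprecated-<stripped> alias exists.
# Requires a prior registry login, which is why login moved before resolve.
deprecated_alias_exists() {
  docker manifest inspect "${IMAGE}:deprecated-$1" >/dev/null 2>&1
}

# Usage: resolve_rollback_target FAILED_TAG CANDIDATE...
# Prints the first candidate that is neither the failed tag nor already
# marked deprecated. Returns 1 when no candidate qualifies.
resolve_rollback_target() {
  local failed="$1"
  shift
  local tag
  for tag in "$@"; do
    if [[ "$tag" == "$failed" ]]; then
      continue
    fi
    if deprecated_alias_exists "${tag#stable-}"; then
      echo "skip ${tag}: already marked deprecated" >&2
      continue
    fi
    printf '%s\n' "$tag"
    return 0
  done
  echo "error: no eligible rollback target" >&2
  return 1
}
```

Returning non-zero when the list is exhausted keeps the "fail loudly" behaviour: the step aborts instead of re-pointing :stable at a known-broken image.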
release.yml:
- Rewrite the 'issues: write' comment. Old wording ('default for jobs')
was factually wrong — GITHUB_TOKEN's default for issues is never write
— and read as 'this line just documents a default', so a future
cleanup PR could delete it. The line is load-bearing: workflow_call
permissions are bounded by the caller's GITHUB_TOKEN scope, and
removing it would silently 403 rollback.yml's gh issue create step.
prune-dev-tags.sh:
- Drop the '|| echo "[]"' fallback on the GHCR versions fetch. The
fallback turned every API failure (403 missing scope, 429 rate limit,
transient 5xx) into a silent no-op with exit 0 — operators saw a
green run while every TAG fell through to the same 'no eligible
version' skip message used for legitimate manifest-collision skips.
- Reorder: fetch GHCR versions BEFORE any git-tag deletion. Git-tag
delete is irrecoverable (next run rebuilds TO_PRUNE from 'git tag
-l', so an orphan GHCR image is never enumerated again). Fetching
first means an API failure aborts cleanly with no state change.
- Track PRUNE_FAILED flag. 'git push --delete' fallback is no longer
unconditional — local 'git tag -d' is gated on successful remote
push, so a refused remote delete (tag-protection rule, missing
contents:write) leaves the local tag in place for retry. The flag
propagates to a final 'exit 1' so the cron run turns red on any
push or DELETE failure.
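
The ordering and failure-propagation rules above can be illustrated with a short sketch. The three operation functions are hypothetical stand-ins that a real script would replace. Only the control flow (fetch first, gate the local delete on the remote delete, exit non-zero on any failure) reflects the commit message.

```shell
#!/usr/bin/env bash
set -euo pipefail

PRUNE_FAILED=0
VERSIONS_JSON=""

# Hypothetical stand-ins. The fetch stub fails loudly on purpose so that a
# missing implementation can never look like an empty, successful listing.
fetch_ghcr_versions() {
  echo "fetch_ghcr_versions: not implemented" >&2
  return 1
}
delete_remote_tag() { git push origin --delete "$1"; }
delete_local_tag() { git tag -d "$1"; }

prune_tags() {
  # 1. Fetch before touching any tag. There is no fallback value, so an API
  #    failure aborts here with no state change.
  VERSIONS_JSON="$(fetch_ghcr_versions)" || {
    echo "GHCR versions fetch failed; nothing deleted" >&2
    return 1
  }

  local tag
  for tag in "$@"; do
    # 2. The local delete runs only after the remote delete succeeded.
    if delete_remote_tag "$tag"; then
      delete_local_tag "$tag" || PRUNE_FAILED=1
    else
      echo "remote delete refused for ${tag}; keeping local tag for retry" >&2
      PRUNE_FAILED=1
    fi
  done

  # 3. Any recorded failure makes the function, and so the run, fail.
  [[ "$PRUNE_FAILED" -eq 0 ]]
}
```

A script would end with something like `prune_tags "${TO_PRUNE[@]}" || exit 1`, so the scheduled run turns red rather than reporting a silent no-op.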
lint-workflows.yml:
- shellcheck step now uses 'find scripts/ops -type f -name "*.sh"' to
match the workflow's recursive 'scripts/ops/**.sh' path filter. The
previous bare 'scripts/ops/*.sh' glob only matched top-level files;
a future script under a subdirectory would have triggered the
workflow but never been linted.
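
The difference between the bare glob and the recursive find can be sketched as follows. The function names are hypothetical and this is not the workflow's actual step. The `--severity=warning` flag is the one named in the earlier shellcheck commit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# List every *.sh file under a directory at any depth. The pattern is quoted
# so that find receives it literally instead of the shell expanding it
# against the current directory.
list_shell_scripts() {
  find "$1" -type f -name '*.sh' | LC_ALL=C sort
}

# Lint the same set of files when shellcheck is available.
# --severity=warning ignores info- and style-level findings.
lint_shell_scripts() {
  local dir="$1"
  if ! command -v shellcheck >/dev/null 2>&1; then
    echo "shellcheck not installed; skipping" >&2
    return 0
  fi
  find "$dir" -type f -name '*.sh' -exec shellcheck --severity=warning {} +
}
```

By contrast, a bare `scripts/ops/*.sh` glob expands only to files directly inside `scripts/ops`, so a script in a subdirectory would be missed.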
* docs(releasing): document rollback.yml, prune-dev-tags.yml, lint-workflows.yml
Reflects the new operational workflows landing in this release:
- Auto-rollback paragraph in release.yml description (smoke-test job +
rollback-on-smoke-fail → rollback.yml)
- rollback.yml subsection — workflow_call + workflow_dispatch entry
points, walk-back target resolution, immutability + concurrency
guarantees, manual operator gh workflow run examples
- prune-dev-tags.yml subsection — weekly cron, KEEP_MONTHS retention
semantics, floating-alias safety, dry_run preview, failure-propagation
exit-non-zero behavior
- lint-workflows.yml CI quirk — actionlint (continue-on-error) +
shellcheck (--severity=warning blocking) advisory checks
CLAUDE.md non-negotiable rules unchanged — still high-level and
correct (changelog discipline + release-cut belongs to the PR + run the
full test suite).
Agnes documentation
Index of all documentation, organized by who needs it. New here? Start with the row that matches your role.
| You are… | Start with |
|---|---|
| Analyst — using Agnes to query data | QUICKSTART.md, then HOWTO/ |
| Operator — deploying & running an instance | PLATFORM_SETUP.md |
| Developer — working on Agnes itself | ../ARCHITECTURE.md + architecture.md |
For analysts
Using the platform to analyze data.
- QUICKSTART.md: local setup + first sync
- HOWTO/: task-oriented cookbook (querying, snapshots, common workflows)
- DATA_SOURCES.md: data source connectors (Keboola, BigQuery, CSV) and how tables surface
- metrics/: canonical business-metric definitions (YAML)
- HEADLESS_USAGE.md: PAT auth for CI / headless clients
For operators
Deploying, configuring, and running an Agnes instance.
- PLATFORM_SETUP.md: the consolidated operator playbook (bootstrap, TLS, marketplaces, scheduler, telemetry)
- ONBOARDING.md: end-to-end Terraform deployment into a new GCP project
- DEPLOYMENT.md: picks between the Terraform and Docker Compose paths
- CONFIGURATION.md: instance.yaml, env vars, per-instance options
- state-dir.md: persistent data layout (data + state tiers, mount layouts, migration)
- RBAC.md: access control (groups, members, resource grants)
- auth-google-oauth.md: Google OAuth setup + operator gotchas
- auth-groups.md: Google Workspace group sync
- admin/query-modes.md: table registration query modes
- agent-setup-prompt.md: customize the /setup page banner
- agent-workspace-prompt.md: customize the generated analyst CLAUDE.md
- initial-workspace-override.md: per-instance analyst-workspace skeleton override
- curated-marketplace-format.md: authoring marketplace-metadata.json for curated marketplaces
- observability.md: PostHog integration (exceptions, tracing, session replay)
- operator/news-content-guide.md: editorial guidelines for in-app news content
For developers
Working on the Agnes codebase.
- ../ARCHITECTURE.md: high-level system overview (the summary)
- architecture.md: detailed architecture reference (module map, extract.duckdb contract, components)
- ../CLAUDE.md: project instructions for AI agents working in this repo
- development.md: logging, request correlation, debug toolbar
- local-development.md: LOCAL_DEV_MODE setup (what's mocked vs. real)
- RELEASING.md: release process, deploy workflows, CI quirks
- RELEASE_CHECKLIST.md: pre-merge checks for bootstrap-path changes
- testing/: test plans (clean-analyst bootstrap, VM test)
- marketplace.md: Claude Code marketplace ingestion + re-serving internals
- STORE_GUARDRAILS.md: flea-market upload guardrails (static checks + LLM review)
- corporate-memory-governance.md: knowledge-distribution governance design
- ADR-corporate-memory-v1.md: ADR for corporate-memory v1 decisions
- llm-routing.md: design for provider-agnostic LLM routing
- sample-data.md: sample data generator (e-commerce schema, size presets)
- theme-reference.html: web UI theme/color reference
- ../dev_docs/: server/developer-internal docs (not synced to analyst machines), covering server ops, disaster recovery, security audit, desktop app, design system, Telegram bot
Code-adjacent READMEs: ../connectors/jira/README.md,
../services/corporate_memory/README.md,
../scripts/README.md.
Agent skill files: ../cli/skills/.
Other
- ../CHANGELOG.md: full change history (Keep-a-Changelog format)
- archive/: historical planning artifacts and superseded docs; not maintained, see archive/README.md