From b5178fe9423cc1eec8ff4aead457637180bf9f75 Mon Sep 17 00:00:00 2001
From: ZdenekSrotyr <139972147+ZdenekSrotyr@users.noreply.github.com>
Date: Thu, 30 Apr 2026 09:42:27 +0200
Subject: [PATCH] fix(ci): smoke-test stale route + rollback ghcr auth + issues:write (#140)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three CI fixes triggered by the failed PR #137 deploy:

1. scripts/smoke-test.sh: assertion 8 was hitting /api/admin/tables
   (renamed to /api/admin/registry long ago). The 404 was treated as
   deployment regression and triggered the auto-rollback. Same stale
   URL also fixed in CLAUDE.md, README.md, dev_docs/server.md.

2. .github/workflows/release.yml smoke-test job: added Log in to GHCR
   step. The auto-rollback's docker push :stable was failing with
   'unauthenticated' because the smoke-test job had no GHCR login of
   its own — leaving :stable pointing at the broken image.

3. Rollback step gained GH_TOKEN env, AND the workflow's permissions
   block gained issues:write. Both were needed for gh issue create to
   actually create the alert issue (was silently swallowed by the
   || echo fallback).

Manual cleanup outside this PR: :stable currently points at the broken
PR #137 image — needs manual retag back to stable-2026.04.505.
---
 .github/workflows/release.yml | 24 ++++++++++++++++++++++++
 CHANGELOG.md                  |  4 ++++
 CLAUDE.md                     |  2 +-
 README.md                     |  2 +-
 dev_docs/server.md            |  2 +-
 scripts/smoke-test.sh         | 14 +++++++++-----
 6 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
index d438fdf..38dc620 100644
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -24,6 +24,11 @@ on:
 permissions:
   contents: write
   packages: write
+  # `issues: write` lets the smoke-test job's rollback step open a
+  # GitHub issue alerting operators when an auto-rollback fires. Without
+  # this, the `gh issue create` call hits 403 and the `|| echo` fallback
+  # silently swallows it — operators see :stable revert with no alert.
+  issues: write
 
 # When a developer pushes a brand-new branch with code changes, GitHub fires
 # both a `create` and a `push` event for the same commit. Without
@@ -224,6 +229,19 @@ jobs:
           fetch-depth: 0
           fetch-tags: true
 
+      # Required for the rollback step's `docker push` to GHCR. The
+      # `build-and-push` job logs in for itself; this job needs its own
+      # login since GitHub Actions tokens are scoped per-job. Without it,
+      # the rollback hits "unauthenticated: User cannot be authenticated
+      # with the token provided" and silently leaves :stable pointing at
+      # the broken image (real incident: PR #137 / 4ec5ff44).
+      - name: Log in to GHCR
+        uses: docker/login-action@v4
+        with:
+          registry: ghcr.io
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
       - name: Start Agnes from built image
         run: |
           # Create empty .env (docker-compose.yml requires env_file: .env, gitignored)
@@ -239,6 +257,12 @@ jobs:
 
       - name: Automatic rollback on failure
         if: failure()
+        env:
+          # Required for the `gh issue create` call below — without GH_TOKEN
+          # the gh CLI fails the auth check and the issue creation falls
+          # through the `|| echo` fallback, so an operator never sees the
+          # rollback alert.
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
         run: |
           IMAGE_TAG="${{ needs.build-and-push.outputs.image_tag }}"
           VERSION="${{ needs.build-and-push.outputs.version }}"
diff --git a/CHANGELOG.md b/CHANGELOG.md
index bc2847f..42f1580 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,10 @@ CalVer image tags (`stable-YYYY.MM.N`, `dev-YYYY.MM.N`) are produced for every C
 
 ## [Unreleased]
 
+### Internal
+- `scripts/smoke-test.sh`: assertion 8 now hits `/api/admin/registry` (the current admin tables endpoint). The old `/api/admin/tables` URL was renamed long ago and the smoke test was returning 404 on every run — it only surfaced as a deploy failure when the full release pipeline first triggered the rollback path on the post-#137 deploy (run 25151878647). Same stale URL was also fixed in `CLAUDE.md`, `README.md`, and `dev_docs/server.md` — the routes now correctly point at `POST /api/admin/register-table` (create) and `PUT /api/admin/registry/{id}` (update).
+- `.github/workflows/release.yml` smoke-test job: added `Log in to GHCR` step. The auto-rollback's `docker push :stable` was hitting `unauthenticated: User cannot be authenticated with the token provided` because the smoke-test job had no GHCR login of its own. Result: a failed deploy left `:stable` pointing at the broken image. The rollback step also got an explicit `GH_TOKEN` env, and the workflow's top-level `permissions` block gained `issues: write`, so its `gh issue create` call actually creates the alert issue (was silently swallowed by the `|| echo` fallback because of both the missing env var AND the missing scope).
+
 ## [0.21.0] — 2026-04-30
 
 ### Internal
diff --git a/CLAUDE.md b/CLAUDE.md
index 80a14a7..97d67d9 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -19,7 +19,7 @@ Ask the user for:
 4. Create `.env` from `config/.env.template`
 
 ### Step 3: Register Tables
-1. Use the FastAPI admin API (`POST /api/admin/tables/{id}`) or webapp UI to register tables
+1. Use the FastAPI admin API (`POST /api/admin/register-table`, then `PUT /api/admin/registry/{id}` for updates) or webapp UI to register tables
 2. Tables are stored in DuckDB `table_registry` with source_type, bucket, source_table, query_mode
 3. For migration from old format: `python scripts/migrate_registry_to_duckdb.py`
 
diff --git a/README.md b/README.md
index d1be2f4..ed27ee0 100644
--- a/README.md
+++ b/README.md
@@ -125,7 +125,7 @@ pytest tests/ -v
 |------|---------|
 | `config/instance.yaml` | Instance-specific settings: branding, data source type, auth provider, Google domain |
 | `.env` | Secrets and environment variables — never committed |
-| `system.duckdb` `table_registry` table | Table definitions managed via `POST /api/admin/tables/{id}` or the web UI |
+| `system.duckdb` `table_registry` table | Table definitions managed via `POST /api/admin/register-table` (or `PUT /api/admin/registry/{id}` to update) or the web UI |
 
 Copy the example to get started:
 
diff --git a/dev_docs/server.md b/dev_docs/server.md
index b576769..0a2a776 100644
--- a/dev_docs/server.md
+++ b/dev_docs/server.md
@@ -218,7 +218,7 @@ The FastAPI app is available at `https://your-instance.example.com`.
 
 - **Google OAuth**: restricted to `allowed_domain` set in `config/instance.yaml`
 - **Email magic link**: available out of the box (no external service required)
-- **Admin API**: `POST /api/admin/tables/{id}` — register/update tables
+- **Admin API**: `POST /api/admin/register-table` (register), `PUT /api/admin/registry/{id}` (update), `GET /api/admin/registry` (list) — manage tables
 - **Sync API**: `POST /api/sync/trigger` — trigger data extraction
 
 ### Google OAuth setup
diff --git a/scripts/smoke-test.sh b/scripts/smoke-test.sh
index f2b2e56..03f060f 100755
--- a/scripts/smoke-test.sh
+++ b/scripts/smoke-test.sh
@@ -136,17 +136,21 @@ else
     echo " SKIP catalog (no token)"
 fi
 
-# 8. Admin tables endpoint (authenticated)
+# 8. Admin registry endpoint (authenticated)
+# NOTE: was /api/admin/tables until that endpoint was renamed to
+# /api/admin/registry; this assertion went stale and only surfaced when the
+# auto-rollback workflow first fired (smoke test was failing for many
+# releases without anyone noticing).
 if [ -n "$TOKEN" ]; then
-    TABLES_HTTP=$(curl -s -o /tmp/smoke_tables.json -w "%{http_code}" "$HOST/api/admin/tables" \
+    TABLES_HTTP=$(curl -s -o /tmp/smoke_tables.json -w "%{http_code}" "$HOST/api/admin/registry" \
         -H "Authorization: Bearer $TOKEN" 2>/dev/null || echo "000")
     if [[ "$TABLES_HTTP" =~ ^(200|403)$ ]]; then
-        check "admin tables endpoint (HTTP $TABLES_HTTP)" "true"
+        check "admin registry endpoint (HTTP $TABLES_HTTP)" "true"
     else
-        check "admin tables endpoint (HTTP $TABLES_HTTP)" "false"
+        check "admin registry endpoint (HTTP $TABLES_HTTP)" "false"
     fi
 else
-    echo " SKIP admin tables (no token)"
+    echo " SKIP admin registry (no token)"
 fi
 
 # 9. Marketplace.zip endpoint (with PAT auth if available)