* feat(initial-workspace): per-instance agnes init override
Adds Initial Workspace Template — an admin-configurable per-instance
override for the agnes init analyst workspace. When configured, agnes
init downloads a server-rendered zip from a Git repo the admin registered
and extracts it into the analyst's workspace, fully bypassing Agnes-default
CLAUDE.md / settings.json / hooks / slash commands / AGNES_WORKSPACE.md.
Repo layout convention: only the contents of a top-level `workspace/`
subdirectory ship to analysts; admin docs (README, CI configs) at the
repo root stay in the repo and never reach an analyst. Sync rejects
repos without `workspace/` at root.
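
A minimal sketch of the layout convention above, assuming a local checkout on disk. The function name `build_workspace_zip` and the constants are hypothetical and are not taken from src/initial_workspace.py; the real implementation may differ.

```python
import io
import zipfile
from pathlib import Path

WORKSPACE_DIR = "workspace"            # top-level directory named by the convention
RESERVED = {".claude/init-complete"}   # reserved by Agnes, relative to workspace/


def build_workspace_zip(repo_root: Path) -> bytes:
    """Zip only the contents of repo_root/workspace, with paths relative to it."""
    ws = repo_root / WORKSPACE_DIR
    if not ws.is_dir():
        # Mirrors "Sync rejects repos without workspace/ at root".
        raise ValueError("repo has no top-level workspace/ directory")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(ws.rglob("*")):
            # Assumption: symlinks and non-regular files are simply skipped.
            if path.is_symlink() or not path.is_file():
                continue
            rel = path.relative_to(ws).as_posix()
            if rel in RESERVED:
                raise ValueError(f"reserved path in template: {rel}")
            zf.write(path, arcname=rel)
    return buf.getvalue()
```

Files at the repo root (README, CI configs) never enter the archive because the walk starts at `workspace/`, not at the root.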
Server side:
- src/initial_workspace.py — clone (or fetch+reset), validate, build zip
with strict path checks and reserved-path rejection
(workspace/.claude/init-complete reserved by Agnes)
- app/api/initial_workspace.py — admin CRUD + sync endpoint + analyst-
facing status/zip/applied endpoints; config persists to instance.yaml
overlay, PAT to .env_overlay
- app/secrets.py — refactor: persist_overlay_token shared helper with
threading.Lock for .env_overlay writes (closes pre-existing race
between concurrent marketplaces saves)
- app/web/templates/admin_server_config.html — new "Initial Workspace
Template" section + modal + Sync/Edit/Delete/Download buttons (matches
existing cfg-section visual language)
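
The race that the shared helper closes is a read-modify-write on one file from several request threads. A rough sketch of the idea, assuming a dotenv-style `KEY=value` overlay; the signature shown here is hypothetical and may not match the real `persist_overlay_token` in app/secrets.py.

```python
import os
import threading
from pathlib import Path

_OVERLAY_LOCK = threading.Lock()  # one lock shared by every writer in this process


def persist_overlay_token(overlay_path: Path, key: str, value: str) -> None:
    """Set KEY=value in the overlay file while preserving all other keys."""
    with _OVERLAY_LOCK:
        # Read, modify, and write happen under the lock, so two concurrent
        # saves cannot both read the old file and clobber each other.
        lines = overlay_path.read_text().splitlines() if overlay_path.exists() else []
        kept = [ln for ln in lines if not ln.startswith(f"{key}=")]
        kept.append(f"{key}={value}")
        tmp = overlay_path.with_name(overlay_path.name + ".tmp")
        tmp.write_text("\n".join(kept) + "\n")
        os.replace(tmp, overlay_path)  # swap in the new file in one step
```

A `threading.Lock` only serializes writers inside one process; multiple worker processes would need a file lock instead, which this sketch does not cover.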
CLI side:
- cli/lib/override.py — single source of truth for is_override_workspace
sentinel detection
- cli/lib/initial_workspace.py — probe status, safe zip extraction with
../absolute/symlink rejection, typed-YES force confirmation
- cli/commands/init.py — override branch (skips Agnes-default workspace
writes); extended sentinel with override:true, template_source,
template_sha so future agnes self-upgrade does not auto-refresh hooks
- cli/lib/hooks.py + cli/lib/commands.py — short-circuit on override
workspaces (install_claude_hooks, install_claude_commands,
maybe_refresh_claude_hooks)
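
The "safe zip extraction" item above can be illustrated roughly as follows. This is a sketch under assumptions, not the code in cli/lib/initial_workspace.py: the name `safe_extract` is hypothetical, and it chooses to validate every entry before writing anything so a bad archive leaves the workspace untouched.

```python
import stat
import zipfile
from pathlib import Path, PurePosixPath, PureWindowsPath


def safe_extract(zf: zipfile.ZipFile, dest: Path) -> list[str]:
    """Extract zf into dest, rejecting '..', absolute paths, and symlinks."""
    infos = zf.infolist()
    for info in infos:
        name = info.filename
        if name.startswith("/") or "\\" in name or PureWindowsPath(name).drive:
            raise ValueError(f"absolute or non-POSIX path in archive: {name!r}")
        if ".." in PurePosixPath(name).parts:
            raise ValueError(f"parent-directory segment in archive: {name!r}")
        # The upper 16 bits of external_attr carry the Unix file mode, if any.
        if stat.S_ISLNK(info.external_attr >> 16):
            raise ValueError(f"symlink entry in archive: {name!r}")
    # Only reached when every entry passed the checks above.
    for info in infos:
        zf.extract(info, dest)
    return [info.filename for info in infos]
```

`zipfile` already sanitizes some path components on extraction, but rejecting the archive outright makes the failure visible instead of silently rewriting paths.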
Audit-event strategy: server writes initial_workspace.fetch_started
inside GET /api/initial-workspace.zip (cannot be spoofed by PAT-holder);
CLI POST /applied writes initial_workspace.applied as best-effort
confirmation. Admin mutations log via the existing _audit pattern.
Tests: 27 server (clone/validate/zip + workspace-subdir convention +
concurrent persist_overlay_token + endpoint shapes + audit rows) + 29
CLI (override sentinel parse + probe fall-through + safe extraction +
YES strictness + hook guards + e2e mocked init).
Risk acceptance — documented in docs/initial-workspace-override.md +
CHANGELOG Internal section so AI reviewers understand the deviations
from defaults are intentional:
- maybe_refresh_claude_hooks deliberately no-ops on override workspaces
- --force on override does NOT back up CLAUDE.md (admin's repo is the
source of truth)
- .claude/CLAUDE.local.md IS overwritten by override extraction when
admin's repo ships one
* test+vendor-agnostic: drop Groupon tokens from #292 fixtures + extend admin-gate coverage
Two fixes from the takeover review on #292:
1. **Vendor-agnostic OSS rule**: Replace `Groupon` / `groupon/template`
tokens in test fixtures with `Acme` / `acme/template` (8 sites in
test_cli_init_override.py + 1 in test_initial_workspace_api.py).
Per CLAUDE.md "Vendor-agnostic OSS — no customer-specific content"
rule: customer-specific tokens don't belong in shipped artifacts,
even in test fixtures. The pre-existing FoundryAI mentions in
test_instance_config.py + test_setup_instructions.py are out of
scope for this PR, which did not introduce them.
2. **Admin-gate coverage gap**: `test_admin_endpoints_require_admin`
only covered GET /api/admin/initial-workspace + POST .../sync. The
register-write (POST .../initial-workspace) and delete (DELETE
.../initial-workspace) endpoints used the same `Depends(require_admin)`
wiring but had no regression test. The loop now covers all 4 endpoints so
a future refactor that drops the dependency from one endpoint
fails here instead of silently exposing the write/delete paths to
any analyst with a PAT.
* release: 0.54.9 — Initial Workspace Template (per-instance agnes init override)
Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.8 →
0.54.9) for Mina's Initial Workspace Template feature.
No DB migration (config lives in instance.yaml overlay). No
mandatory operator action — empty default keeps OSS-default
agnes init behavior. Operators wanting full template control link a
Git repo on /admin/server-config → "Initial Workspace Template".
See docs/initial-workspace-override.md for the full
responsibility-transfer contract.
---------
Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
Introduces STATE_DIR as the single source of truth for the writable
state directory path, with backward-compatible default of
${DATA_DIR}/state. Pairs with a new docker-compose.flat-mount.yml
overlay that mounts the state disk in PARALLEL to the data disk
(rather than nested under it).
Why
---
The default deployment topology nests state under data: sdb at /data,
sdc at /data/state. That layout has known fragility documented in
docs/state-dir.md — bind-propagation gotchas, two-writer collisions
on the same prefix, mount-order coupling. The 2026-05-05 incident in
the Groupon FoundryAI deployment was a manifestation of the
propagation gotcha.
The flat layout (sdb at /data, sdc at /data-state — parallel, not
nested) eliminates the nested-mount class entirely. Each disk is its
own bind mount, recursive by default in modern Docker. No volume
options to forget. No two-writer collision (host scripts and
container app share /data-state at the same path, single namespace).
What changes
------------
App code (Python):
- src/db.py: new _get_state_dir() helper. get_system_db() and
schema migration snapshot use it.
- app/secrets.py: new _state_dir() helper. _load_or_generate() uses
it for .session_secret and .jwt_secret.
- app/main.py: .env_overlay loaded from _state_dir().
Host scripts:
- scripts/ops/agnes-auto-upgrade.sh: STATE_DIR drives mount-sanity
check and cert detection. Defaults preserve existing behavior.
- scripts/ops/agnes-tls-rotate.sh: STATE_DIR drives CERT_DIR.
New compose overlay:
- docker-compose.flat-mount.yml: parallel /data and /data-state binds
per service. Mutually exclusive with docker-compose.host-mount.yml;
pick one based on disk topology.
Documentation:
- docs/state-dir.md: layout choice (A nested vs B flat), pros/cons,
migration steps, and which code paths read STATE_DIR.
Backward compatibility
----------------------
STATE_DIR defaults to ${DATA_DIR}/state, which is the current behavior.
Existing deployments that don't set the variable see no behavior change.
Migration to the flat layout is opt-in per the runbook in docs/state-dir.md.
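
The fallback rule can be sketched as below. This is an illustration only: the helper name `get_state_dir`, the injectable `env` mapping, and the `/data` default for DATA_DIR are assumptions, not the actual `_get_state_dir()` / `_state_dir()` code.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def get_state_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the writable state directory from STATE_DIR, else DATA_DIR/state."""
    env = os.environ if env is None else env
    explicit = env.get("STATE_DIR")
    if explicit:
        # Layout B (flat): state lives on its own mount, e.g. /data-state.
        return Path(explicit)
    # Layout A (nested): unchanged default. The "/data" fallback is assumed.
    return Path(env.get("DATA_DIR", "/data")) / "state"
```

With only `DATA_DIR=/srv/data` set this resolves to `/srv/data/state`; adding `STATE_DIR=/data-state` switches every caller to the flat layout at once.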
Validation
----------
- bash -n on both host scripts: pass
- docker compose -f docker-compose.flat-mount.yml config: resolves
cleanly with all 6 services binding /data and /data-state directly
- python3 import + helper exercise: STATE_DIR override works,
default falls back to ${DATA_DIR}/state
Companion to PR #191 (drop named-volume driver_opts in host-mount.yml).
That PR fixes the immutability footgun for Layout A; this PR offers
Layout B as the architectural alternative.
- secrets.py: validate file content is non-empty before using it;
regenerate if file exists but is empty/corrupted
- release.yml: touch .env before docker compose in smoke test
(env_file: .env in docker-compose.yml requires the file to exist)
663 tests pass.
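
The secrets.py change above amounts to treating an empty secret file the same as a missing one. A hedged sketch follows; the name `load_or_generate`, the token length, and the 0o600 mode are assumptions, and it only detects empty or whitespace-only content rather than other forms of corruption.

```python
import os
import secrets
from pathlib import Path


def load_or_generate(path: Path) -> str:
    """Return the secret stored at path, regenerating it if missing or empty."""
    if path.exists():
        value = path.read_text().strip()
        if value:
            return value
        # File exists but holds nothing usable: fall through and regenerate.
    value = secrets.token_urlsafe(32)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(value)
    try:
        os.chmod(path, 0o600)  # best effort; may be unsupported on some platforms
    except OSError:
        pass
    return value
```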
- Config writes to DATA_DIR/state/instance.yaml (writable) instead of
CONFIG_DIR (read-only :ro in Docker)
- instance_config.py checks DATA_DIR/state/ first, then falls back to
CONFIG_DIR for backward compat
- CalVer counter is now global across channels (*-YYYY.MM.*) per spec
- Keboola error messages sanitized: the full error is logged, a generic message is returned
- chmod in secrets.py wrapped in try/except for Windows compat
- Setup wizard JS handles 401 (expired JWT) with user-facing message
- deploy.yml changed to workflow_dispatch only (no duplicate test runs)
- Smoke test uses docker-compose.prod.yml + AGNES_TAG instead of sed
- docker-compose.prod.yml uses ${AGNES_TAG:-stable} env var
663 tests pass. 8 E2E verification tests pass.
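
The lookup order described for instance_config.py (writable state directory first, read-only CONFIG_DIR second) might look like this. The function name `find_instance_config` is hypothetical, and the assumption that the fallback file is also called `instance.yaml` is not stated in the commit.

```python
from pathlib import Path
from typing import Optional


def find_instance_config(data_dir: Path, config_dir: Path) -> Optional[Path]:
    """Prefer DATA_DIR/state/instance.yaml, then fall back to CONFIG_DIR."""
    candidates = (
        data_dir / "state" / "instance.yaml",  # writable location, checked first
        config_dir / "instance.yaml",          # read-only mount, backward compat
    )
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```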