Commit graph

5 commits

Author SHA1 Message Date
minasarustamyan
69a1e22cf5
feat(initial-workspace): per-instance agnes init override (#292)
* feat(initial-workspace): per-instance agnes init override

Adds Initial Workspace Template — an admin-configurable per-instance
override for the agnes init analyst workspace. When configured, agnes
init downloads a server-rendered zip from a Git repo the admin registered
and extracts it into the analyst's workspace, fully bypassing Agnes-default
CLAUDE.md / settings.json / hooks / slash commands / AGNES_WORKSPACE.md.

Repo layout convention: only the contents of a top-level `workspace/`
subdirectory ship to analysts; admin docs (README, CI configs) at the
repo root stay in the repo and never reach an analyst. Sync rejects
repos without `workspace/` at root.

Server side:
- src/initial_workspace.py — clone (or fetch+reset), validate, build zip
  with strict path checks and reserved-path rejection
  (workspace/.claude/init-complete reserved by Agnes)
- app/api/initial_workspace.py — admin CRUD + sync endpoint + analyst-
  facing status/zip/applied endpoints; config persists to instance.yaml
  overlay, PAT to .env_overlay
- app/secrets.py — refactor: persist_overlay_token shared helper with
  threading.Lock for .env_overlay writes (closes pre-existing race
  between concurrent marketplaces saves)
- app/web/templates/admin_server_config.html — new "Initial Workspace
  Template" section + modal + Sync/Edit/Delete/Download buttons (matches
  existing cfg-section visual language)

CLI side:
- cli/lib/override.py — single source of truth for is_override_workspace
  sentinel detection
- cli/lib/initial_workspace.py — probe status, safe zip extraction with
  ../absolute/symlink rejection, typed-YES force confirmation
- cli/commands/init.py — override branch (skips Agnes-default workspace
  writes); extended sentinel with override:true, template_source,
  template_sha so future agnes self-upgrade does not auto-refresh hooks
- cli/lib/hooks.py + cli/lib/commands.py — short-circuit on override
  workspaces (install_claude_hooks, install_claude_commands,
  maybe_refresh_claude_hooks)

Audit-event strategy: server writes initial_workspace.fetch_started
inside GET /api/initial-workspace.zip (cannot be spoofed by PAT-holder);
CLI POST /applied writes initial_workspace.applied as best-effort
confirmation. Admin mutations log via the existing _audit pattern.

Tests: 27 server (clone/validate/zip + workspace-subdir convention +
concurrent persist_overlay_token + endpoint shapes + audit rows) + 29
CLI (override sentinel parse + probe fall-through + safe extraction +
YES strictness + hook guards + e2e mocked init).

Risk acceptance — documented in docs/initial-workspace-override.md +
CHANGELOG Internal section so AI reviewers understand the deviations
from defaults are intentional:
- maybe_refresh_claude_hooks deliberately no-ops on override workspaces
- --force on override does NOT back up CLAUDE.md (admin's repo is the
  source of truth)
- .claude/CLAUDE.local.md IS overwritten by override extraction when
  admin's repo ships one

* test+vendor-agnostic: drop Groupon tokens from #292 fixtures + extend admin-gate coverage

Two fixes from the takeover review on #292:

1. **Vendor-agnostic OSS rule**: Replace `Groupon` / `groupon/template`
   tokens in test fixtures with `Acme` / `acme/template` (8 sites in
   test_cli_init_override.py + 1 in test_initial_workspace_api.py).
   Per CLAUDE.md "Vendor-agnostic OSS — no customer-specific content"
   rule: customer-specific tokens don't belong in shipped artifacts,
   even in test fixtures. The pre-existing FoundryAI mentions in
   test_instance_config.py + test_setup_instructions.py are out of
   scope for this PR (didn't introduce them).

2. **Admin-gate coverage gap**: `test_admin_endpoints_require_admin`
   only covered GET /api/admin/initial-workspace + POST .../sync. The
   register-write (POST .../initial-workspace) and delete (DELETE
   .../initial-workspace) endpoints used the same `Depends(require_admin)`
   wiring but had no regression test. Loop now covers all 4 verbs so
   a future refactor that drops the dependency from one endpoint
   fails here instead of silently exposing the write/delete paths to
   any analyst with a PAT.

* release: 0.54.9 — Initial Workspace Template (per-instance agnes init override)

Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.8 →
0.54.9) for Mina's Initial Workspace Template feature.

No DB migration (config lives in instance.yaml overlay). No
mandatory operator action — empty default keeps OSS-default
agnes init behavior. Operators wanting full template control link a
Git repo on /admin/server-config → "Initial Workspace Template".
See docs/initial-workspace-override.md for the full
responsibility-transfer contract.

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-13 20:35:01 +00:00
Vojtech Rysanek
a303de0372 feat: STATE_DIR env var + flat-mount overlay (parallel disks)
Introduces STATE_DIR as the single source of truth for the writable
state directory path, with backward-compatible default of
${DATA_DIR}/state. Pairs with a new docker-compose.flat-mount.yml
overlay that mounts the state disk in PARALLEL to the data disk
(rather than nested under it).

Why
---
The default deployment topology nests state under data: sdb at /data,
sdc at /data/state. That layout has known fragility documented in
docs/state-dir.md — bind-propagation gotchas, two-writer collisions
on the same prefix, mount-order coupling. The 2026-05-05 incident in
the Groupon FoundryAI deployment was a manifestation of the
propagation gotcha.

The flat layout (sdb at /data, sdc at /data-state — parallel, not
nested) eliminates the nested-mount class entirely. Each disk is its
own bind mount, recursive by default in modern Docker. No volume
options to forget. No two-writer collision (host scripts and
container app share /data-state at the same path, single namespace).

What changes
------------
App code (Python):
- src/db.py:        new _get_state_dir() helper. get_system_db() and
                    schema migration snapshot use it.
- app/secrets.py:   new _state_dir() helper. _load_or_generate() uses
                    it for .session_secret and .jwt_secret.
- app/main.py:      .env_overlay loaded from _state_dir().

Host scripts:
- scripts/ops/agnes-auto-upgrade.sh: STATE_DIR drives mount-sanity
  check and cert detection. Defaults preserve existing behavior.
- scripts/ops/agnes-tls-rotate.sh:   STATE_DIR drives CERT_DIR.

New compose overlay:
- docker-compose.flat-mount.yml: parallel /data and /data-state binds
  per service. Mutually exclusive with docker-compose.host-mount.yml;
  pick one based on disk topology.

Documentation:
- docs/state-dir.md: layout choice (A nested vs B flat), pros/cons,
  migration steps, and which code paths read STATE_DIR.

Backward compatibility
----------------------
STATE_DIR defaults to ${DATA_DIR}/state — current behavior. Existing
deployers that don't set the var see no behavior change. Migration
to flat layout is opt-in per the runbook in docs/state-dir.md.

Validation
----------
- bash -n on both host scripts: pass
- docker compose config -f docker-compose.flat-mount.yml: resolves
  cleanly with all 6 services binding /data and /data-state directly
- python3 import + helper exercise: STATE_DIR override works,
  default falls back to ${DATA_DIR}/state

Companion to PR #191 (drop named-volume driver_opts in host-mount.yml).
That PR fixes the immutability footgun for Layout A; this PR offers
Layout B as the architectural alternative.
2026-05-05 19:28:07 +02:00
ZdenekSrotyr
44b99f25ca fix: address Devin review round 5 — empty secret file, CI .env
- secrets.py: validate file content is non-empty before using it;
  regenerate if file exists but is empty/corrupted
- release.yml: touch .env before docker compose in smoke test
  (env_file: .env in docker-compose.yml requires the file to exist)

663 tests pass.
2026-04-10 14:55:31 +02:00
ZdenekSrotyr
49f109bf73 fix: address PR review findings — config write, CalVer, error handling
- Config writes to DATA_DIR/state/instance.yaml (writable) instead of
  CONFIG_DIR (read-only :ro in Docker)
- instance_config.py checks DATA_DIR/state/ first, then falls back to
  CONFIG_DIR for backward compat
- CalVer counter is now global across channels (*-YYYY.MM.*) per spec
- Keboola error messages sanitized — log full error, return generic msg
- chmod in secrets.py wrapped in try/except for Windows compat
- Setup wizard JS handles 401 (expired JWT) with user-facing message
- deploy.yml changed to workflow_dispatch only (no duplicate test runs)
- Smoke test uses docker-compose.prod.yml + AGNES_TAG instead of sed
- docker-compose.prod.yml uses ${AGNES_TAG:-stable} env var

663 tests pass. 8 E2E verification tests pass.
2026-04-10 13:16:40 +02:00
ZdenekSrotyr
6c53082295 feat: multi-instance deployment — all 14 must-have items from spec
CalVer CI (release.yml) with stable/dev channels, health endpoint
with version/channel/schema_version, JWT secret auto-generation with
file persistence, smoke test script + Docker-in-CI, pre-migration
snapshot, /api/admin/configure for headless setup, /api/admin/
discover-and-register, /setup wizard, OpenAPI snapshot test, custom
connector mount support, CHANGELOG, migration safety tests, startup
banner.

663 tests pass (6 new migration safety + 3 OpenAPI snapshot + 1
updated JWT test).
2026-04-10 11:57:42 +02:00