agnes-the-ai-analyst/docs/superpowers/specs/2026-05-15-agnes-agents-design.md
ZdenekSrotyr 650ea3c804
feat: Agnes specialist agents and skills under .claude/ (#328) (#328)
Four knowledge skills auto-load into the main agent's context when
their description matches the work; invokable explicitly via
Skill(<name>):

- agnes-orchestrator — extract.duckdb ATTACH flow, query_mode
  semantics, _remote_attach, rebuild lock
- agnes-rbac — require_admin vs require_resource_access,
  ResourceType registration
- agnes-connectors — _meta contract, three connector shapes,
  new-connector checklist
- agnes-release-process — CHANGELOG discipline, release-cut,
  version bump, post-merge auto-rollback

Three reviewer subagents fire in parallel at end of PR work; one
releaser subagent handles pre-merge release-cut + post-merge tag /
GitHub Release:

- agnes-reviewer-rules — CHANGELOG bullet, vendor-agnostic scan,
  AI attribution, commit hygiene (always fires)
- agnes-reviewer-rbac — endpoint gates, ResourceType registration
  (fires on app/api/, app/auth/ diffs)
- agnes-reviewer-architecture — extract.duckdb invariants, schema
  migrations, rebuild lock (fires on src/, connectors/ diffs)
- agnes-releaser — Phase 1 pre-merge release-cut commit; Phase 2
  post-merge tag + GitHub Release

.gitignore un-ignores .claude/agents/ and .claude/skills/ while
keeping the rest of .claude/ local-only. CLAUDE.md gets a new
'Specialized agents and skills' section pointing at the two
directories.

Source of truth for the rules these encode remains CLAUDE.md +
docs/RELEASING.md — skills explicitly defer to the master docs on
conflict.

Design rationale: docs/superpowers/specs/2026-05-15-agnes-agents-design.md
Implementation plan: docs/superpowers/plans/2026-05-15-agnes-agents.md
2026-05-15 20:39:11 +02:00

313 lines
14 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Agnes specialized agents — design
**Status:** approved (brainstorm), not yet implemented
**Date:** 2026-05-15
**Author:** zsrotyr
## Problem
Working on Agnes through Claude Code, three classes of friction recur:
1. **Mental model drift.** Claude (and humans) forget how Agnes hangs together — the `extract.duckdb` contract, RBAC layering (`require_admin` vs `require_resource_access`), `query_mode` semantics (local / remote / materialized), the orchestrator's `rebuild()` flow. Edits to one part silently break invariants of another, caught only at code review.
2. **Convention enforcement at review.** Several CLAUDE.md rules (CHANGELOG bullet, vendor-agnostic OSS, no AI attribution, issue economy, RBAC gates on new endpoints) are easy to forget. Manual review catches most but not all.
3. **Release-cut workflow.** The non-negotiable rules from `docs/RELEASING.md` and `CLAUDE.md § Release process` — release-cut belongs in the PR that earned it, last commit on the PR, post-merge tag + GitHub Release — evolve and are repetitive enough that doing them by hand each time is error-prone.
The brainstorm explored variants A ("one big architect"), B ("three specialist subagents covering all three concerns"), and C ("knowledge as skills, enforcement/workflow as subagents"). C was selected because the mental model is *not delegatable* — when the main agent writes code it needs the context in its own window, not in a subagent's.
## Approach
Four layers, each with a distinct mechanism and lifecycle:
```
Layer 1: KNOWLEDGE SKILLS (.claude/skills/agnes-*.md)
- Auto-trigger by description + explicit invocation by main agent
- Load into MAIN agent's context
- Purpose: main agent knows how Agnes works while writing code
Layer 2: REVIEWER SUBAGENTS (.claude/agents/agnes-reviewer-*.md)
- Spawned via Agent tool at end of PR work
- Own context window; return a punch list
- Three specialists, fired in parallel by main agent
Layer 3: RELEASER SUBAGENT (.claude/agents/agnes-releaser.md)
- Spawned via Agent tool pre-merge (phase 1) and post-merge (phase 2)
- Own context window; produces release-cut commit + tag + GitHub Release
Layer 4: PERSONAL (~/.claude/agents/<customer>-deploy.md)
- Outside this repo (operator's home dir, never committed)
- Customer-specific context (customer VMs, gcloud / cloud accounts,
private infra repos, deploy targets)
```
Skills (Layer 1) share context with the main agent — when Claude edits
`src/orchestrator.py`, the orchestrator skill is loaded so rules stay "in head"
across delegation. Subagents (Layers 2 and 3) isolate context — review/release
read many files but only a punch list returns to the main conversation.
Cross-layer sharing happens through skills. The `agnes-release-process` skill is
read by the main agent (planning), by `agnes-reviewer-rules` (checking), and by
`agnes-releaser` (executing). Single source of truth.
## Components
### Layer 1 — Knowledge skills (`.claude/skills/`)
Four skills. Each file is ~80120 lines. Files are kept focused; if one grows
past ~150 lines that is a signal to split. Skills do not duplicate `CLAUDE.md`
content verbatim — they reference (`see CLAUDE.md § Access control`) so master
rules have one location.
#### `agnes-orchestrator`
- **Description (triggers auto-spawn):** Use when editing `src/orchestrator.py`,
`src/db.py`, or anything that produces `extract.duckdb` in `connectors/*/`.
Rules for ATTACH flow, `query_mode` semantics, and when `rebuild()` is
required.
- **Body:** master view lifecycle in `analytics.duckdb`; thread-safety via
`_rebuild_lock`; `rebuild_source(name)` vs full `rebuild()` decision;
`_remote_attach` reattach flow at query time (extension install + token
resolution via `token_env` or extension-specific auth path).
#### `agnes-rbac`
- **Description:** Use when adding or changing an endpoint in `app/api/`,
touching `app/auth/`, or introducing a new resource type. Enforces gate
pattern (`require_admin` vs `require_resource_access`) and `ResourceType`
registration.
- **Body:** decision tree for picking a gate (app-level mutation vs
entity-scoped); how to add a new `ResourceType` (StrEnum value +
`ResourceTypeSpec` with `list_blocks` delegate in `app/resource_types.py`, no
DB migration); when a grant is needed even for reads; god-mode short-circuit
for the `Admin` group.
#### `agnes-connectors`
- **Description:** Use when adding a new data source or modifying an existing
extractor in `connectors/`. Enforces the `extract.duckdb` contract — `_meta`
table, `query_mode` column, parquet layout.
- **Body:** required `_meta` columns (`table_name`, `description`, `rows`,
`size_bytes`, `extracted_at`, `query_mode`); when a connector is batch-pull
vs remote-attach vs real-time push; how to expose `_remote_attach` for remote
mode; where the extractor writes (`/data/extracts/{source}/`).
#### `agnes-release-process`
- **Description:** Use before opening a PR, before merge, or when handling a
release-cut. Rules for CHANGELOG bullet, when the release-cut commit belongs
in the PR, version bump decision (patch is default; ask before minor).
- **Body:** CHANGELOG discipline (Added / Changed / Fixed / Removed / Internal
grouping, `**BREAKING**` prefix); release-cut decision tree (when it is the
last commit on the PR); post-merge sequence (tag `vX.Y.Z` on merge commit +
`gh release create`); patch / minor / major guidance.
### Layer 2 — Reviewer subagents (`.claude/agents/`)
Each subagent has a standard frontmatter (`name`, `description`, `tools`,
`model`). All are read-only (no `Edit` / `Write`) — they return punch lists, not
code changes.
#### `agnes-reviewer-rules`
- **When fired:** every PR, at end of work, before opening the PR.
- **Tools:** `Read`, `Bash` (restricted to `git diff`, `git log`, `grep`).
- **Model:** Haiku — fast, mostly text work.
- **Input from main agent:** PR branch name (or current HEAD), optionally PR
draft body.
- **Checks:**
- CHANGELOG.md has a new bullet under `[Unreleased]` *iff* the PR changes
user-visible behavior. Smart, not blind — doc-only PRs typically do not
need a bullet; judgment applied based on the diff.
- No customer-specific tokens in the diff or PR body (deployment names,
internal hostnames, cloud project IDs, internal SA emails). The
`agnes-reviewer-rules` subagent loads the operator's token list from
`CLAUDE.local.md` (`vendor_tokens:` entry) rather than inlining
customer names in the public OSS repo.
- Commits do not contain `Co-Authored-By: Claude` or any AI attribution; PR
body is the same.
- Issue-economy red flags — filing follow-up issues rather than fix-it-now or
close-it-as-moot.
- Commit messages are clean and concise per project convention.
- Consults `agnes-release-process` skill for the release-cut implication of
this PR.
- **Output:** Done / Missing / Warning punch list.
#### `agnes-reviewer-rbac`
- **When fired:** when the diff touches `app/api/`, `app/auth/`, or
`app/resource_types.py`.
- **Tools:** `Read`, `Grep`, `Bash` (read-only).
- **Model:** Sonnet — needs to understand the auth flow.
- **Checks:**
- New `@router.get/post/...` handlers have `Depends(require_admin)` or
`Depends(require_resource_access(ResourceType.X, "..."))`.
- New `ResourceType` values have a `ResourceTypeSpec` registration in
`app/resource_types.py`.
- Consults `agnes-rbac` skill for the gate decision rules.
- **Output:** per-endpoint flag — gated correctly / missing gate / ambiguous.
#### `agnes-reviewer-architecture`
- **When fired:** when the diff touches `src/orchestrator.py`, `src/db.py`,
`connectors/*/extractor.py`, or adds a schema migration.
- **Tools:** `Read`, `Grep`, `Bash` (read-only).
- **Model:** Sonnet.
- **Checks:**
- Extractor changes preserve `_meta` table contract.
- Remote-attach changes preserve `_remote_attach` columns (`alias`,
`extension`, `url`, `token_env`).
- Schema bumps in `src/db.py` include the `vN-1 → vN` migration step, a
CHANGELOG note, and documentation references that reflect the new version.
- Changes to `rebuild()` / `rebuild_source()` hold `_rebuild_lock` on all
write paths.
- Consults `agnes-orchestrator` and `agnes-connectors` skills.
- **Output:** per-invariant punch list — holds / broken / unclear.
### Layer 3 — Releaser subagent (`.claude/agents/agnes-releaser.md`)
- **Tools:** `Read`, `Edit`, `Bash` (including `gh`, `git`).
- **Model:** Sonnet.
Two phases, each invoked explicitly by the user (never auto-fired).
**Phase 1 — pre-merge.** User says "ready to merge".
1. Consults `agnes-release-process` skill.
2. Runs `git log` since the last tag and inspects scope.
3. Decision tree: patch (default) vs minor (asks user) vs major (requires
explicit confirmation).
4. If this PR lands the only `[Unreleased]` content since the last release,
prepares the last commit on the PR: bump `pyproject.toml`, rename
`[Unreleased]` to `[X.Y.Z] - YYYY-MM-DD`, add a new empty `[Unreleased]`.
5. Pushes the prepared commit. **Does not merge** — the user merges via
`gh pr merge` themselves.
**Phase 2 — post-merge.** User says "tag it".
1. Verifies the merge commit contains the release-cut diff.
2. `git tag vX.Y.Z <merge-sha>` and `git push origin vX.Y.Z`.
3. `gh release create vX.Y.Z` with body extracted from the `[X.Y.Z]` section of
CHANGELOG.
4. Returns the GitHub Release URL.
**Never does:** merges the PR (high-blast-radius); force-pushes; amends
published commits.
### Layer 4 — Personal (`~/.claude/agents/<customer>-deploy.md`)
Outside this repo. Customer-specific content lives here so the OSS repo stays
vendor-agnostic per `CLAUDE.md § Vendor-agnostic OSS`. Operators write their
own Layer-4 agent against their deployment.
- **Tools:** `Read`, `Bash` (including the operator's cloud CLI — `gcloud`,
`aws`, etc. — plus `gh`, `git push`).
- **Model:** Sonnet.
- **Knows** (each item is customer-specific; concrete tokens stay in the
operator's home dir / `CLAUDE.local.md`, never committed):
- VM ↔ project ↔ zone ↔ cloud-account mapping (one row per deployment).
- Per-environment deploy ritual (e.g., force-push to `<branch>` to deploy
to `<host>`).
- Cross-references to private infra repos and pinned module tag.
- Default cloud account selection rules; explicit `--account=` overrides.
- Any pre-existing legacy systems still running alongside the deployment.
- **Does not** touch the OSS repo, push to public branches, or participate in
PR review.
## End-to-end PR flow
```
1. User asks for feature X.
2. Main agent creates a worktree (per CLAUDE.local.md), invokes relevant
knowledge skills based on what the feature touches:
- src/orchestrator.py → Skill(agnes-orchestrator)
- app/api/ + app/auth/ → Skill(agnes-rbac)
- connectors/ → Skill(agnes-connectors)
3. Implementation + tests + CHANGELOG bullet (the agnes-release-process skill
reminded the main agent it is needed).
4. Before opening the PR, main agent fires reviewers in parallel — one message,
multiple Agent tool calls:
- Agent(agnes-reviewer-rules) — always
- Agent(agnes-reviewer-rbac) — only if diff touched app/api or app/auth
- Agent(agnes-reviewer-architecture) — only if diff touched src/ or connectors/
5. Main agent aggregates the punch lists, fixes findings, opens the PR.
6. User says "ready to merge" → Agent(agnes-releaser, phase 1) prepares the
release-cut decision and the last commit on the PR.
7. User confirms the version → releaser pushes the prepared commit. User
merges via `gh pr merge` manually.
8. User says "tag it" → Agent(agnes-releaser, phase 2) creates the tag and the
GitHub Release.
```
A separate flow handles personal dev-VM deploys (outside the OSS PR cycle):
```
User says "push to my VM" → main agent invokes Agent(<customer>-deploy) →
that personal-layer agent knows what "push to my VM" means in the operator's
environment (typically a force-push to a deploy-target branch), confirms with
the user (force-push is destructive), pushes.
```
## What lands in the repo
```
.claude/
├── agents/
│ ├── agnes-reviewer-rules.md
│ ├── agnes-reviewer-rbac.md
│ ├── agnes-reviewer-architecture.md
│ └── agnes-releaser.md
└── skills/
├── agnes-orchestrator.md
├── agnes-rbac.md
├── agnes-connectors.md
└── agnes-release-process.md
docs/superpowers/specs/
└── 2026-05-15-agnes-agents-design.md (this document)
```
`CLAUDE.md` gets a short paragraph under "Project conventions" pointing at
`.claude/agents/` and `.claude/skills/`, noting that subagents and skills exist
and how to invoke them.
The personal layer (`~/.claude/agents/<customer>-deploy.md`) is written
separately by each operator and is not part of the in-repo change set.
## Non-goals
- **Not a Claude Code "team".** The architecture, review, and release agents
work sequentially and do not message each other; a team would add the
experimental flag, more tokens, file-conflict risk, and no functional gain.
If `agnes-reviewer-*` ever needs to coordinate across four+ parallel reviews,
re-evaluate.
- **Not a pre-commit hook replacement.** Mechanical checks (CHANGELOG bullet
present, AI-attribution scan) could be a `pre-commit` hook in addition; the
reviewer agent provides judgment-level checks the hook cannot.
- **No auto-merge.** `agnes-releaser` never runs `gh pr merge`. Merge is a
visible, user-controlled action.
## Open questions
- **Auto-trigger reliability for knowledge skills.** Claude Code skill
auto-spawn is description-driven and not always reliable in large skill
catalogs. Mitigation: a one-line pointer at the top of `CLAUDE.md`
("when touching X, invoke skill Y") and explicit invocation in the main
agent's planning step. If reliability is still low after a few weeks, fall
back to a `SessionStart` hook that lists Agnes-specific skills as a reminder.
- **Schema migration as a separate skill?** `agnes-orchestrator` covers
`src/db.py` migration patterns. If migration gotchas grow (versioning beyond
`vN`, multi-step migrations, data backfills), split into a fifth skill
`agnes-schema-migration`.
## Implementation plan
The follow-up step is to invoke the `writing-plans` skill to turn this spec
into a sequenced implementation plan covering: skills first (lowest risk,
testable in isolation), then reviewers, then the releaser, then a CLAUDE.md
pointer. Personal layer lands separately, outside the repo.