agnes-the-ai-analyst/.claude/skills/agnes-orchestrator.md
ZdenekSrotyr 650ea3c804
feat: Agnes specialist agents and skills under .claude/ (#328) (#328)
Four knowledge skills auto-load into the main agent's context when
their description matches the work; invokable explicitly via
Skill(<name>):

- agnes-orchestrator — extract.duckdb ATTACH flow, query_mode
  semantics, _remote_attach, rebuild lock
- agnes-rbac — require_admin vs require_resource_access,
  ResourceType registration
- agnes-connectors — _meta contract, three connector shapes,
  new-connector checklist
- agnes-release-process — CHANGELOG discipline, release-cut,
  version bump, post-merge auto-rollback

Three reviewer subagents fire in parallel at end of PR work; one
releaser subagent handles pre-merge release-cut + post-merge tag /
GitHub Release:

- agnes-reviewer-rules — CHANGELOG bullet, vendor-agnostic scan,
  AI attribution, commit hygiene (always fires)
- agnes-reviewer-rbac — endpoint gates, ResourceType registration
  (fires on app/api/, app/auth/ diffs)
- agnes-reviewer-architecture — extract.duckdb invariants, schema
  migrations, rebuild lock (fires on src/, connectors/ diffs)
- agnes-releaser — Phase 1 pre-merge release-cut commit; Phase 2
  post-merge tag + GitHub Release

.gitignore un-ignores .claude/agents/ and .claude/skills/ while
keeping the rest of .claude/ local-only. CLAUDE.md gets a new
'Specialized agents and skills' section pointing at the two
directories.

Source of truth for the rules these encode remains CLAUDE.md +
docs/RELEASING.md — skills explicitly defer to the master docs on
conflict.

Design rationale: docs/superpowers/specs/2026-05-15-agnes-agents-design.md
Implementation plan: docs/superpowers/plans/2026-05-15-agnes-agents.md
2026-05-15 20:39:11 +02:00

75 lines
3.1 KiB
Markdown

---
name: agnes-orchestrator
description: Rules for the SyncOrchestrator, the extract.duckdb ATTACH flow, query_mode semantics (local / remote / materialized), and when to call rebuild() vs rebuild_source(). Use when editing src/orchestrator.py, src/db.py, or anything that produces extract.duckdb in connectors/.
---
# Agnes orchestrator
Source of truth for orchestrator invariants. See `CLAUDE.md § Architecture`
and `docs/architecture.md` for the canonical description.
## ATTACH flow
`SyncOrchestrator.rebuild()` scans `/data/extracts/*/extract.duckdb`,
ATTACHes each into the master `analytics.duckdb`, creates views like
`<source>."<bucket>"."<table>"`, and updates `sync_state`.
Per-source rebuild is `rebuild_source(name)` — used after Jira webhooks where
only one source changed. Full `rebuild()` is the fallback when scope is
unclear.
## Thread safety
All write paths take `self._rebuild_lock` (a `threading.Lock`). New write
paths — anything that DETACHes / re-ATTACHes / updates `sync_state` — MUST
hold the lock. Read paths must not hold it.
## query_mode
Every table has a `query_mode` in its `_meta` row:
- `local` — batch-pulled to parquet, queried locally. Parquets live under
`/data/extracts/<source>/data/`. Synced via `agnes pull`.
- `remote` — queried against the upstream (e.g., BigQuery) at query time.
No parquet on disk. Requires a `_remote_attach` row in `extract.duckdb`.
- `materialized` — admin-registered SQL run by the scheduler. Result lands as
a parquet under `/data/extracts/<source>/data/`. Distributed like `local`.
## `_remote_attach` mechanism
For `query_mode='remote'` tables, the extractor writes a `_remote_attach`
table in `extract.duckdb` with columns:
| column | meaning |
|---|---|
| `alias` | name used in the ATTACH statement |
| `extension` | DuckDB extension to install + load |
| `url` | upstream connection URL |
| `token_env` | env var holding the auth token (`''` if extension-specific auth, e.g., BigQuery's GCE metadata server) |
At query time the orchestrator installs/loads the extension, resolves the
token, creates a session-scoped SECRET when required, and ATTACHes the
source so views like `kbc."bucket"."table"` resolve.
## Master DB locations
- System DB: `${DATA_DIR}/state/system.duckdb` (sync_state, table_registry, users, RBAC).
- Analytics DB: `${DATA_DIR}/analytics/server.duckdb` (master views).
## Schema migrations
`src/db.py` auto-migrates from `v1 → vN` on startup. Per-version notes live
in `CHANGELOG.md`. Adding a schema version means:
1. Bumping the version constant in `src/db.py`.
2. Adding the `vN-1 → vN` migration step.
3. Adding a CHANGELOG bullet that names the version.
4. Updating documentation that references the schema version (search for
"schema v" in `docs/` + `CLAUDE.md`).
## Files NOT to modify
`connectors/jira/file_lock.py` and `services/ws_gateway/` — stable
infrastructure. (`connectors/jira/transform.py` was previously off-limits
but as of 0.54.19 is no longer; it remains sensitive — touch only with
end-to-end understanding of the JSON-overlay / parquet-rewrite pipeline.)