Four knowledge skills auto-load into the main agent's context when their description matches the work; invokable explicitly via Skill(<name>): - agnes-orchestrator — extract.duckdb ATTACH flow, query_mode semantics, _remote_attach, rebuild lock - agnes-rbac — require_admin vs require_resource_access, ResourceType registration - agnes-connectors — _meta contract, three connector shapes, new-connector checklist - agnes-release-process — CHANGELOG discipline, release-cut, version bump, post-merge auto-rollback Three reviewer subagents fire in parallel at end of PR work; one releaser subagent handles pre-merge release-cut + post-merge tag / GitHub Release: - agnes-reviewer-rules — CHANGELOG bullet, vendor-agnostic scan, AI attribution, commit hygiene (always fires) - agnes-reviewer-rbac — endpoint gates, ResourceType registration (fires on app/api/, app/auth/ diffs) - agnes-reviewer-architecture — extract.duckdb invariants, schema migrations, rebuild lock (fires on src/, connectors/ diffs) - agnes-releaser — Phase 1 pre-merge release-cut commit; Phase 2 post-merge tag + GitHub Release .gitignore un-ignores .claude/agents/ and .claude/skills/ while keeping the rest of .claude/ local-only. CLAUDE.md gets a new 'Specialized agents and skills' section pointing at the two directories. Source of truth for the rules these encode remains CLAUDE.md + docs/RELEASING.md — skills explicitly defer to the master docs on conflict. Design rationale: docs/superpowers/specs/2026-05-15-agnes-agents-design.md Implementation plan: docs/superpowers/plans/2026-05-15-agnes-agents.md
75 lines
3.1 KiB
Markdown
75 lines
3.1 KiB
Markdown
---
|
|
name: agnes-orchestrator
|
|
description: Rules for the SyncOrchestrator, the extract.duckdb ATTACH flow, query_mode semantics (local / remote / materialized), and when to call rebuild() vs rebuild_source(). Use when editing src/orchestrator.py, src/db.py, or anything that produces extract.duckdb in connectors/.
|
|
---
|
|
|
|
# Agnes orchestrator
|
|
|
|
Source of truth for orchestrator invariants. See `CLAUDE.md § Architecture`
|
|
and `docs/architecture.md` for the canonical description.
|
|
|
|
## ATTACH flow
|
|
|
|
`SyncOrchestrator.rebuild()` scans `/data/extracts/*/extract.duckdb`,
|
|
ATTACHes each into the master `analytics.duckdb`, creates views like
|
|
`<source>."<bucket>"."<table>"`, and updates `sync_state`.
|
|
|
|
Per-source rebuild is `rebuild_source(name)` — used after Jira webhooks where
|
|
only one source changed. Full `rebuild()` is the fallback when scope is
|
|
unclear.
|
|
|
|
## Thread safety
|
|
|
|
All write paths take `self._rebuild_lock` (a `threading.Lock`). New write
|
|
paths — anything that DETACHes / re-ATTACHes / updates `sync_state` — MUST
|
|
hold the lock. Read paths must not hold it.
|
|
|
|
## query_mode
|
|
|
|
Every table has a `query_mode` in its `_meta` row:
|
|
|
|
- `local` — batch-pulled to parquet, queried locally. Parquets live under
|
|
`/data/extracts/<source>/data/`. Synced via `agnes pull`.
|
|
- `remote` — queried against the upstream (e.g., BigQuery) at query time.
|
|
No parquet on disk. Requires a `_remote_attach` row in `extract.duckdb`.
|
|
- `materialized` — admin-registered SQL run by the scheduler. Result lands as
|
|
a parquet under `/data/extracts/<source>/data/`. Distributed like `local`.
|
|
|
|
## `_remote_attach` mechanism
|
|
|
|
For `query_mode='remote'` tables, the extractor writes a `_remote_attach`
|
|
table in `extract.duckdb` with columns:
|
|
|
|
| column | meaning |
|
|
|---|---|
|
|
| `alias` | name used in the ATTACH statement |
|
|
| `extension` | DuckDB extension to install + load |
|
|
| `url` | upstream connection URL |
|
|
| `token_env` | env var holding the auth token (`''` if extension-specific auth, e.g., BigQuery's GCE metadata server) |
|
|
|
|
At query time the orchestrator installs/loads the extension, resolves the
|
|
token, creates a session-scoped SECRET when required, and ATTACHes the
|
|
source so views like `kbc."bucket"."table"` resolve.
|
|
|
|
## Master DB locations
|
|
|
|
- System DB: `${DATA_DIR}/state/system.duckdb` (sync_state, table_registry, users, RBAC).
|
|
- Analytics DB: `${DATA_DIR}/analytics/server.duckdb` (master views).
|
|
|
|
## Schema migrations
|
|
|
|
`src/db.py` auto-migrates from `v1 → vN` on startup. Per-version notes live
|
|
in `CHANGELOG.md`. Adding a schema version means:
|
|
|
|
1. Bumping the version constant in `src/db.py`.
|
|
2. Adding the `vN-1 → vN` migration step.
|
|
3. Adding a CHANGELOG bullet that names the version.
|
|
4. Updating documentation that references the schema version (search for
|
|
"schema v" in `docs/` + `CLAUDE.md`).
|
|
|
|
## Files NOT to modify
|
|
|
|
`connectors/jira/file_lock.py` and `services/ws_gateway/` — stable
|
|
infrastructure. (`connectors/jira/transform.py` was previously off-limits
|
|
but as of 0.54.19 is no longer; it remains sensitive — touch only with
|
|
end-to-end understanding of the JSON-overlay / parquet-rewrite pipeline.)
|