agnes-the-ai-analyst/.claude/skills/agnes-orchestrator.md
ZdenekSrotyr 650ea3c804
feat: Agnes specialist agents and skills under .claude/ (#328) (#328)
Four knowledge skills auto-load into the main agent's context when
their description matches the work; invokable explicitly via
Skill(<name>):

- agnes-orchestrator — extract.duckdb ATTACH flow, query_mode
  semantics, _remote_attach, rebuild lock
- agnes-rbac — require_admin vs require_resource_access,
  ResourceType registration
- agnes-connectors — _meta contract, three connector shapes,
  new-connector checklist
- agnes-release-process — CHANGELOG discipline, release-cut,
  version bump, post-merge auto-rollback

Three reviewer subagents fire in parallel at end of PR work; one
releaser subagent handles pre-merge release-cut + post-merge tag /
GitHub Release:

- agnes-reviewer-rules — CHANGELOG bullet, vendor-agnostic scan,
  AI attribution, commit hygiene (always fires)
- agnes-reviewer-rbac — endpoint gates, ResourceType registration
  (fires on app/api/, app/auth/ diffs)
- agnes-reviewer-architecture — extract.duckdb invariants, schema
  migrations, rebuild lock (fires on src/, connectors/ diffs)
- agnes-releaser — Phase 1 pre-merge release-cut commit; Phase 2
  post-merge tag + GitHub Release

.gitignore un-ignores .claude/agents/ and .claude/skills/ while
keeping the rest of .claude/ local-only. CLAUDE.md gets a new
'Specialized agents and skills' section pointing at the two
directories.

Source of truth for the rules these encode remains CLAUDE.md +
docs/RELEASING.md — skills explicitly defer to the master docs on
conflict.

Design rationale: docs/superpowers/specs/2026-05-15-agnes-agents-design.md
Implementation plan: docs/superpowers/plans/2026-05-15-agnes-agents.md
2026-05-15 20:39:11 +02:00

name: agnes-orchestrator
description: Rules for the SyncOrchestrator, the extract.duckdb ATTACH flow, query_mode semantics (local / remote / materialized), and when to call rebuild() vs rebuild_source(). Use when editing src/orchestrator.py, src/db.py, or anything that produces extract.duckdb in connectors/.

Agnes orchestrator

This skill summarizes the orchestrator invariants. The source of truth is CLAUDE.md § Architecture and docs/architecture.md, which hold the canonical description and win on conflict.

ATTACH flow

SyncOrchestrator.rebuild() scans /data/extracts/*/extract.duckdb, ATTACHes each file into the master analytics DB (see Master DB locations), creates views such as <source>."<bucket>"."<table>", and updates sync_state.

rebuild_source(name) rebuilds a single source. It is used after Jira webhooks, where only one source changed. A full rebuild() is the fallback when the scope of the change is unclear.
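The scan-and-ATTACH step can be sketched as below. This is a minimal illustration, not the code in src/orchestrator.py: plan_attach and quote_ident are hypothetical helpers, the directory name is assumed to double as the source alias, and the READ_ONLY option is an assumption. The sketch only builds the statements; it does not open a DuckDB connection.

```python
from pathlib import Path


def quote_ident(name: str) -> str:
    """Double-quote a SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def plan_attach(extracts_root: Path) -> list[str]:
    """Return one ATTACH statement per <source>/extract.duckdb under the root."""
    statements = []
    for db_path in sorted(extracts_root.glob("*/extract.duckdb")):
        # Assumption: the directory name is used as the source alias.
        source = db_path.parent.name
        escaped_path = str(db_path).replace("'", "''")
        # Assumption: extracts are attached read-only.
        statements.append(
            f"ATTACH '{escaped_path}' AS {quote_ident(source)} (READ_ONLY)"
        )
    return statements
```

Directories without an extract.duckdb are skipped by the glob, so a half-written source directory does not produce an ATTACH statement.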

Thread safety

All write paths take self._rebuild_lock (a threading.Lock). New write paths — anything that DETACHes / re-ATTACHes / updates sync_state — MUST hold the lock. Read paths must not hold it.
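The locking rule can be sketched as follows. OrchestratorSketch is a hypothetical stand-in, and the dict-based sync_state is an assumption made to keep the example self-contained; only the _rebuild_lock name and its type come from the text above.

```python
import threading


class OrchestratorSketch:
    """Hypothetical stand-in showing which paths take the rebuild lock."""

    def __init__(self) -> None:
        self._rebuild_lock = threading.Lock()
        # Assumption: sync_state modelled as an in-memory counter per source.
        self._sync_state: dict[str, int] = {}

    def rebuild_source(self, name: str) -> None:
        # Write path: DETACH / re-ATTACH / sync_state updates go under the lock.
        with self._rebuild_lock:
            self._sync_state[name] = self._sync_state.get(name, 0) + 1

    def rebuild_count(self, name: str) -> int:
        # Read path: deliberately does not take the lock.
        return self._sync_state.get(name, 0)
```

threading.Lock is not reentrant, so a write path that already holds the lock must not call another method that acquires it again.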

query_mode

Every table has a query_mode in its _meta row:

  • local — batch-pulled to parquet, queried locally. Parquets live under /data/extracts/<source>/data/. Synced via agnes pull.
  • remote — queried against the upstream (e.g., BigQuery) at query time. No parquet on disk. Requires a _remote_attach row in extract.duckdb.
  • materialized — admin-registered SQL run by the scheduler. Result lands as a parquet under /data/extracts/<source>/data/. Distributed like local.
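The three modes and what they imply can be captured in a small enum. This is an illustrative sketch: QueryMode, has_local_parquet, and needs_remote_attach are hypothetical names, not identifiers from the codebase.

```python
from enum import Enum


class QueryMode(str, Enum):
    """The three query_mode values stored in a table's _meta row."""

    LOCAL = "local"
    REMOTE = "remote"
    MATERIALIZED = "materialized"


def has_local_parquet(mode: QueryMode) -> bool:
    """local and materialized tables both land as parquet on disk."""
    return mode in (QueryMode.LOCAL, QueryMode.MATERIALIZED)


def needs_remote_attach(mode: QueryMode) -> bool:
    """Only remote tables require a _remote_attach row in extract.duckdb."""
    return mode is QueryMode.REMOTE
```

Parsing with QueryMode(value) raises ValueError on an unknown string, which surfaces a malformed _meta row early instead of silently treating it as local.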

_remote_attach mechanism

For query_mode='remote' tables, the extractor writes a _remote_attach table in extract.duckdb with columns:

| column | meaning |
| --- | --- |
| alias | name used in the ATTACH statement |
| extension | DuckDB extension to install + load |
| url | upstream connection URL |
| token_env | env var holding the auth token ('' if extension-specific auth, e.g., BigQuery's GCE metadata server) |

At query time the orchestrator installs/loads the extension, resolves the token, creates a session-scoped SECRET when required, and ATTACHes the source so views like kbc."bucket"."table" resolve.
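The query-time sequence might look roughly like this. RemoteAttachRow, resolve_token, and attach_statements are hypothetical names. The sketch assumes the ATTACH TYPE matches the extension name and that a plain INSTALL suffices (community extensions may need a different INSTALL form). SECRET creation is left out because its shape is extension-specific.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RemoteAttachRow:
    """One row of the _remote_attach table."""

    alias: str
    extension: str
    url: str
    token_env: str  # '' means the extension handles auth itself


def resolve_token(
    row: RemoteAttachRow, env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Return the token named by token_env, or None when token_env is ''."""
    if not row.token_env:
        return None
    if row.token_env not in env:
        raise RuntimeError(
            f"env var {row.token_env!r} is not set for alias {row.alias!r}"
        )
    return env[row.token_env]


def attach_statements(row: RemoteAttachRow) -> list[str]:
    """Build the INSTALL / LOAD / ATTACH statements for one row."""
    # Names are interpolated unquoted, so reject anything that is not a
    # plain identifier.
    for value in (row.alias, row.extension):
        if not value.isidentifier():
            raise ValueError(f"not a plain identifier: {value!r}")
    url = row.url.replace("'", "''")
    return [
        f"INSTALL {row.extension}",
        f"LOAD {row.extension}",
        # Assumption: the ATTACH TYPE equals the extension name.
        f"ATTACH '{url}' AS {row.alias} (TYPE {row.extension})",
    ]
```

Raising on a missing env var, rather than attaching without a token, keeps a misconfigured deployment from failing later with a less specific upstream auth error.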

Master DB locations

  • System DB: ${DATA_DIR}/state/system.duckdb (sync_state, table_registry, users, RBAC).
  • Analytics DB: ${DATA_DIR}/analytics/server.duckdb (master views).
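As a sketch, the two locations can be derived from DATA_DIR like this. The helper names are hypothetical, and the /data fallback when DATA_DIR is unset is an assumption, not something the text above states.

```python
import os
from pathlib import Path
from typing import Mapping


def data_dir(env: Mapping[str, str] = os.environ) -> Path:
    # Assumption: fall back to /data when DATA_DIR is unset.
    return Path(env.get("DATA_DIR", "/data"))


def system_db_path(env: Mapping[str, str] = os.environ) -> Path:
    """System DB: sync_state, table_registry, users, RBAC."""
    return data_dir(env) / "state" / "system.duckdb"


def analytics_db_path(env: Mapping[str, str] = os.environ) -> Path:
    """Analytics DB: the master views."""
    return data_dir(env) / "analytics" / "server.duckdb"
```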

Schema migrations

src/db.py auto-migrates from v1 → vN on startup. Per-version notes live in CHANGELOG.md. Adding a schema version means:

  1. Bumping the version constant in src/db.py.
  2. Adding the vN-1 → vN migration step.
  3. Adding a CHANGELOG bullet that names the version.
  4. Updating documentation that references the schema version (search for "schema v" in docs/ + CLAUDE.md).
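Steps 1 and 2 can be illustrated with a stepwise migration registry. This is a sketch under stated assumptions, not the layout of src/db.py: SCHEMA_VERSION, MIGRATIONS, and migrate are hypothetical names, the version number is made up, and each step records a label instead of running SQL.

```python
from typing import Callable, Dict, List

# Step 1: the version constant (hypothetical value).
SCHEMA_VERSION = 3

Migration = Callable[[List[str]], None]


def _v1_to_v2(log: List[str]) -> None:
    log.append("v1->v2")


def _v2_to_v3(log: List[str]) -> None:
    log.append("v2->v3")


# Step 2: one entry per target version, keyed by the version it produces.
MIGRATIONS: Dict[int, Migration] = {2: _v1_to_v2, 3: _v2_to_v3}


def migrate(current: int, log: List[str]) -> int:
    """Apply every step from current + 1 up to SCHEMA_VERSION, in order."""
    if current > SCHEMA_VERSION:
        raise ValueError(f"database is at v{current}, code knows v{SCHEMA_VERSION}")
    for target in range(current + 1, SCHEMA_VERSION + 1):
        MIGRATIONS[target](log)
    return SCHEMA_VERSION
```

With this shape, bumping the constant without adding the matching step fails with a KeyError on the first startup, which makes a forgotten migration hard to miss.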

Files NOT to modify

connectors/jira/file_lock.py and services/ws_gateway/ — stable infrastructure. (connectors/jira/transform.py was previously off-limits but as of 0.54.19 is no longer; it remains sensitive — touch only with end-to-end understanding of the JSON-overlay / parquet-rewrite pipeline.)