agnes-the-ai-analyst/.claude/skills/agnes-connectors.md
ZdenekSrotyr 650ea3c804
feat: Agnes specialist agents and skills under .claude/ (#328) (#328)
Four knowledge skills auto-load into the main agent's context when
their description matches the work; invokable explicitly via
Skill(<name>):

- agnes-orchestrator — extract.duckdb ATTACH flow, query_mode
  semantics, _remote_attach, rebuild lock
- agnes-rbac — require_admin vs require_resource_access,
  ResourceType registration
- agnes-connectors — _meta contract, three connector shapes,
  new-connector checklist
- agnes-release-process — CHANGELOG discipline, release-cut,
  version bump, post-merge auto-rollback

Three reviewer subagents fire in parallel at end of PR work; one
releaser subagent handles pre-merge release-cut + post-merge tag /
GitHub Release:

- agnes-reviewer-rules — CHANGELOG bullet, vendor-agnostic scan,
  AI attribution, commit hygiene (always fires)
- agnes-reviewer-rbac — endpoint gates, ResourceType registration
  (fires on app/api/, app/auth/ diffs)
- agnes-reviewer-architecture — extract.duckdb invariants, schema
  migrations, rebuild lock (fires on src/, connectors/ diffs)
- agnes-releaser — Phase 1 pre-merge release-cut commit; Phase 2
  post-merge tag + GitHub Release

.gitignore un-ignores .claude/agents/ and .claude/skills/ while
keeping the rest of .claude/ local-only. CLAUDE.md gets a new
'Specialized agents and skills' section pointing at the two
directories.

Source of truth for the rules these encode remains CLAUDE.md +
docs/RELEASING.md — skills explicitly defer to the master docs on
conflict.

Design rationale: docs/superpowers/specs/2026-05-15-agnes-agents-design.md
Implementation plan: docs/superpowers/plans/2026-05-15-agnes-agents.md
2026-05-15 20:39:11 +02:00

3.2 KiB

name description
agnes-connectors Rules for the extract.duckdb contract every data source must produce — the _meta table, the _remote_attach mechanism for remote-mode tables, parquet layout, and the pattern for adding a new connector. Use when adding a new data source or modifying an existing extractor in connectors/.

Agnes connectors — the extract.duckdb contract

Every data source produces the same output:

/data/extracts/{source_name}/
├── extract.duckdb          ← _meta table + views
└── data/                   ← parquet files (local sources only)

See CLAUDE.md § Architecture: extract.duckdb Contract and docs/architecture.md.

Required _meta table

Every extract.duckdb MUST contain a _meta table with these columns:

column type meaning
table_name VARCHAR name used in views
description VARCHAR human-readable description
rows BIGINT row count at extraction time
size_bytes BIGINT parquet size for local mode, 0 for remote
extracted_at TIMESTAMP extraction time
query_mode VARCHAR one of local, remote, materialized

If _meta is missing or malformed, SyncOrchestrator.rebuild() skips the source with an error logged. Tests for new connectors MUST assert _meta is well-formed.

Four connector shapes

  • Batch pull (Keboola, query_mode='local') — DuckDB extension downloads data to parquet, scheduled. Extractor in connectors/<name>/extractor.py.
  • Remote attach (BigQuery, query_mode='remote') — DuckDB BQ extension, no download. Queries hit the upstream at query time. Requires _remote_attach.
  • Materialized SQL (query_mode='materialized') — scheduler runs admin-registered SQL through DuckDB and writes the result to a parquet under /data/extracts/<source>/data/. Distributed via the same manifest + agnes pull flow as local. BigQuery cost guardrail: data_source.bigquery.max_bytes_per_materialize (default 10 GiB; 0 disables).
  • Real-time push (Jira) — webhooks update parquets incrementally; the webhook handler triggers rebuild_source('jira').

_remote_attach table (remote mode only)

For each remote-mode table in _meta, the extractor writes a row in _remote_attach with alias, extension, url, token_env. See the agnes-orchestrator skill for how the orchestrator consumes it.

Adding a new connector — checklist

  1. Create connectors/<name>/extractor.py that emits extract.duckdb (+ data/*.parquet if local) into /data/extracts/<name>/.
  2. Populate _meta with one row per table.
  3. If any table is query_mode='remote', populate _remote_attach.
  4. Register the connector type in the catalog (search for existing source_type values to follow the pattern).
  5. Add a fixture-based test that runs the extractor against a fixture upstream and asserts _meta is complete.
  6. CHANGELOG bullet under Added per agnes-release-process.

Stable infrastructure — do NOT modify

connectors/jira/file_lock.py. (connectors/jira/transform.py was previously off-limits but as of 0.54.19 is no longer; it remains sensitive — touch only with end-to-end understanding of the JSON-overlay / parquet-rewrite pipeline.)