New src/rbac.py: Role enum, hierarchy, get_user_role(), has_role(), is_admin(), is_km_admin(), has_dataset_access(), set_user_role(). webapp/auth.py: admin_required + km_admin_required now use DuckDB roles instead of Linux groups (pwd.getpwnam + sudo/data-ops check). app/auth/dependencies.py: imports Role from src/rbac.py (single source). 11 RBAC tests passing.
Connectors — How to add a new data source
Existing Connectors
- Keboola (connectors/keboola/extractor.py): DuckDB Keboola extension, batch pull
- BigQuery (connectors/bigquery/extractor.py): DuckDB BQ extension, remote-only
- Jira (connectors/jira/): Webhook + incremental parquet transform
extract.duckdb Contract
Every connector produces the same output:
/data/extracts/{source_name}/
├── extract.duckdb ← _meta table + views
└── data/ ← parquet files (local sources only)
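Because every source follows this layout, a consumer can locate extracts from the directory structure alone. The following is a minimal sketch of that idea, not the SyncOrchestrator's actual code: the helper name discover_extracts and the shape of its return value are assumptions made for illustration.

```python
from pathlib import Path


def discover_extracts(extracts_root: str) -> dict[str, Path]:
    """Map each source name to the path of its extract.duckdb.

    Hypothetical helper: it relies only on the directory layout above.
    """
    found: dict[str, Path] = {}
    root = Path(extracts_root)
    if not root.is_dir():
        return found
    for source_dir in sorted(root.iterdir()):
        db_path = source_dir / "extract.duckdb"
        # A directory counts as a source only if it contains extract.duckdb.
        if source_dir.is_dir() and db_path.is_file():
            found[source_dir.name] = db_path
    return found
```

Calling discover_extracts("/data/extracts") would return one entry per source directory that contains an extract.duckdb file; directories without one are skipped.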
The _meta table must have columns:
- table_name VARCHAR: view name
- description VARCHAR
- rows BIGINT
- size_bytes BIGINT
- extracted_at TIMESTAMP
- query_mode VARCHAR: 'local' (data here) or 'remote' (query on demand)
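One way to keep connectors aligned with this contract is to hold the column list in a single place and derive both the DDL and a row check from it. This is a sketch under assumptions: the names META_COLUMNS, meta_ddl and validate_meta_row are hypothetical, and quoting the identifiers is a defensive choice rather than something the contract requires.

```python
# Required _meta columns, in the order listed above.
META_COLUMNS = [
    ("table_name", "VARCHAR"),
    ("description", "VARCHAR"),
    ("rows", "BIGINT"),
    ("size_bytes", "BIGINT"),
    ("extracted_at", "TIMESTAMP"),
    ("query_mode", "VARCHAR"),
]
VALID_QUERY_MODES = {"local", "remote"}


def meta_ddl() -> str:
    """Build a CREATE TABLE statement for _meta (identifiers quoted)."""
    cols = ", ".join(f'"{name}" {sql_type}' for name, sql_type in META_COLUMNS)
    return f"CREATE TABLE IF NOT EXISTS _meta ({cols})"


def validate_meta_row(row: dict) -> list[str]:
    """Return a list of problems; an empty list means the row fits the contract."""
    expected = [name for name, _ in META_COLUMNS]
    problems = [f"missing column: {c}" for c in expected if c not in row]
    problems += [f"unexpected column: {c}" for c in row if c not in expected]
    if "query_mode" in row and row["query_mode"] not in VALID_QUERY_MODES:
        problems.append(f"invalid query_mode: {row['query_mode']!r}")
    return problems
```

A connector could pass the result of meta_ddl() to conn.execute() and run validate_meta_row() on each row before inserting it.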
Adding a New Connector
- Create connectors/<name>/extractor.py:

      import duckdb
      from pathlib import Path

      def run(output_dir: str, table_configs: list[dict], **kwargs):
          output = Path(output_dir)
          data_dir = output / "data"
          data_dir.mkdir(parents=True, exist_ok=True)
          conn = duckdb.connect(str(output / "extract.duckdb"))
          # Create _meta table
          # For each table: COPY TO parquet, create view, insert _meta row
          conn.close()

- Register tables in the DuckDB table_registry via the admin API or a migration script. Set source_type to your connector name.
- Add required env vars to .env and config/.env.template.
- The SyncOrchestrator (src/orchestrator.py) will auto-discover your extract.duckdb.
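For a local source, the per-table work named in the extractor comments (COPY to parquet, create view, insert _meta row) could look roughly like the sketch below. It only builds the SQL strings, which a connector would then pass to conn.execute(). The helper names, the source_query argument, and the use of ? placeholders for the _meta values are assumptions for illustration and do not come from the existing connectors.

```python
from pathlib import Path


def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_ident(name: str) -> str:
    """Quote an identifier by doubling double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_table_statements(data_dir: str, table_name: str, source_query: str) -> list[str]:
    """Return the three statements for one locally stored table.

    source_query is a hypothetical SELECT that reads from the upstream source.
    """
    parquet_path = quote_literal(str(Path(data_dir) / f"{table_name}.parquet"))
    view = quote_ident(table_name)
    return [
        # 1. Materialize the source rows as a parquet file under data/.
        f"COPY ({source_query}) TO {parquet_path} (FORMAT PARQUET)",
        # 2. Expose the parquet file as a view inside extract.duckdb.
        f"CREATE OR REPLACE VIEW {view} AS SELECT * FROM read_parquet({parquet_path})",
        # 3. Record the table in _meta; the caller supplies table_name,
        #    description, rows, size_bytes and extracted_at as parameters.
        "INSERT INTO _meta VALUES (?, ?, ?, ?, ?, 'local')",
    ]
```

Inside run(), each statement would be executed in order for every entry in table_configs, with the _meta parameters computed after the parquet file has been written.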
Configuration
- Instance-level config: config/instance.yaml (connection details)
- Table definitions: DuckDB table_registry table
- Credentials: environment variables
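Since credentials come from environment variables, a connector can fail fast when one is missing instead of failing partway through an extract. A minimal sketch follows; the helper name load_credentials and the variable names MYSOURCE_URL and MYSOURCE_TOKEN are placeholders, not names used by this project.

```python
import os
from typing import Mapping, Optional


def load_credentials(required: list[str], env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return the required variables, or raise if any is unset or empty."""
    source = os.environ if env is None else env
    missing = [name for name in required if not source.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: source[name] for name in required}
```

A connector could call load_credentials(["MYSOURCE_URL", "MYSOURCE_TOKEN"]) at the start of run(), so a misconfigured environment is reported before any files are written.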