Cache freshness
Loading…
Show log
Register BigQuery Table
{# Two orthogonal questions: (1) live vs synced, (2) when synced,
whole table vs custom SQL. Visibility classes:
bq-access-live — only when accessMode='live'
bq-access-synced — only when accessMode='synced'
bq-source-table — only when accessMode='live' OR
(accessMode='synced' AND syncMode='whole')
bq-source-custom — only when accessMode='synced' AND syncMode='custom'
Backend payload: live → query_mode='remote'; synced/whole →
query_mode='materialized' with auto-built SELECT *; synced/custom
→ query_mode='materialized' with admin SQL. Server auto-detects
BASE TABLE vs VIEW at register time, so the UI doesn't ask. #}
BigQuery dataset name (no project prefix — read from instance.yaml).
Click Discover to populate the autocomplete from the BQ project's dataset list.
Table or view name within the dataset. Click
List tables after filling Dataset to populate autocomplete.
Live access: BASE TABLEs query via
Synced access: handles both table and view transparently — the scheduler runs
Live access: BASE TABLEs query via
bq."dataset"."table" (Storage Read API; predicate pushdown).
VIEWs and MATERIALIZED_VIEWs query via the BQ jobs API (full-scan estimate;
cost-guarded by bq_max_scan_bytes).
agnes query --remote works for both.
Synced access: handles both table and view transparently — the scheduler runs
SELECT * through the jobs API and writes a
parquet.
SELECT statement, no trailing semicolon. Native BQ identifiers
(
`project.dataset.table`) recommended — DuckDB three-part
names like bq."ds"."t" work for the COPY but disable the
cost guardrail's BQ dry-run.
Name analysts use to query the data (e.g.
SELECT * FROM orders_90d). Required for Custom query; defaults
to the source table for the other modes.Logical grouping for catalog organization
How often Agnes refreshes the local copy. Examples:
every 15m, every 6h,
daily 03:00, daily 07:00,13:00,18:00 (UTC).
Source check
Edit BigQuery Table
Slugified id, immutable. Source type:
bigquery
Table or view name within the dataset.
Live access: BASE TABLEs query via
Synced access: handles both transparently — the scheduler runs
Live access: BASE TABLEs query via
bq."dataset"."table" (Storage Read API; predicate pushdown).
VIEWs and MATERIALIZED_VIEWs query via the BQ jobs API (full-scan estimate;
cost-guarded by bq_max_scan_bytes).
agnes query --remote works for both.
Synced access: handles both transparently — the scheduler runs
SELECT * through the jobs API and writes a
parquet.SELECT statement, no trailing semicolon. Native BQ
identifiers recommended for the cost guardrail to engage.
How often Agnes refreshes the local copy.
every 15m, every 6h,
daily 03:00 (UTC).Logical grouping for catalog organization (does not affect storage).
Register Keboola Table
{# Three sync-mode radios:
- whole / custom → query_mode='materialized' (DuckDB Keboola
extension; whole synthesizes SELECT *, custom uses admin SQL)
- direct → query_mode='local' (Storage API SDK, supports
v26 sync strategies: incremental/partitioned + where_filters) #}
{# Discover/List tables backend currently routes by instance's data_source.type
ignoring the `source` query param. Hiding the buttons on non-Keboola instances
prevents wrong-shape responses; inputs stay for manual entry. Future fix: make
/api/admin/discover-tables accept ?source=keboola and remove this guard. #}
SELECT against
kbc."bucket"."table".
Result is materialized to parquet and distributed via
agnes pull.
How often Agnes refreshes the local copy. Examples:
every 15m, every 6h,
daily 03:00, daily 07:00,13:00,18:00 (UTC).
Advanced (optional)
Comma-separated list. Required for
Direct extract → Incremental (used as the dedup key on
delta merge). Auto-filled from the Keboola source when
available.
Direct extract — sync strategy
Backtrack window applied to last_sync timestamp on each tick.
Higher = more reliable on late-arriving rows; lower = less data per tick.
Cap on how far back the first-ever sync goes. Multi-year tables
without this can OOM at write — set 90/180/365 for safety.
Date / timestamp column whose value drives the partition key.
Rows with NULL or unparseable values are dropped (logged warning).
First-sync chunked load step. Smaller = more API calls, less
memory per chunk. Larger = fewer calls, more memory.
Server-side row filter. Operators:
eq, ne, gt, ge, lt, le.
Date placeholders resolved at sync time:
{{ '{{today}}' }},
{{ '{{last_week}}' }},
{{ '{{last_month}}' }},
{{ '{{last_2_months}}' }},
{{ '{{last_3_months}}' }},
{{ '{{last_6_months}}' }},
{{ '{{last_year}}' }},
{{ '{{last_2_years}}' }},
{{ '{{start_of_3_months_ago}}' }}.
Not compatible with Incremental strategy.
Edit Keboola Table
Slugified id, immutable.
Advanced (optional)
Comma-separated list. Required for
Direct extract → Incremental.
Direct extract — sync strategy
Operators:
eq, ne, gt, ge, lt, le. Date placeholders
({{ '{{today}}' }}, {{ '{{last_3_months}}' }}, etc.) resolved at
sync time. Not compatible with Incremental.
Jira tables are populated by webhooks.
To register a new Jira webhook integration, see
docs/connectors/jira.md.