agnes-the-ai-analyst/config/data_description.md.example
Petr c56905d34f Initial commit: OSS data distribution platform
Open-source AI data analyst platform extracted from internal repo.
Includes data sync engine, Keboola adapter, Flask web portal,
server deployment scripts, and configuration templates.
2026-03-08 23:31:28 +01:00

# Data Description
This file defines the tables available for synchronization and analysis.
Copy this file to `data_description.md` and customize for your data sources.
## Tables
```yaml
# Folder mapping: data source bucket -> local folder name
folder_mapping:
  "in.c-example": "example"

tables:
  - id: "in.c-example.customers"
    name: "customers"
    description: "Customer master data"
    primary_key: "id"
    sync_strategy: "full_refresh"

  - id: "in.c-example.orders"
    name: "orders"
    description: "Order transactions with line items"
    primary_key: "id"
    sync_strategy: "incremental"
    incremental_window_days: 7
    partition_by: "created_at"
    partition_granularity: "month"
```
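As a rough illustration of how `folder_mapping` relates to table ids, the sketch below resolves each table to its local folder. It works on a plain dictionary that mirrors the YAML above. The helper `local_folder` is hypothetical, and it assumes the bucket is everything before the last dot of the table `id`; the platform's actual lookup may differ.

```python
# Hypothetical in-memory form of the YAML above (e.g. after parsing it).
config = {
    "folder_mapping": {"in.c-example": "example"},
    "tables": [
        {"id": "in.c-example.customers", "name": "customers"},
        {"id": "in.c-example.orders", "name": "orders"},
    ],
}


def local_folder(config: dict, table_id: str) -> str:
    """Return the local folder name for a table id (hypothetical helper)."""
    # Assumption: the bucket is everything before the last dot in the id.
    bucket, _, _ = table_id.rpartition(".")
    try:
        return config["folder_mapping"][bucket]
    except KeyError:
        raise KeyError(f"no folder_mapping entry for bucket {bucket!r}") from None


for table in config["tables"]:
    print(table["name"], "->", local_folder(config, table["id"]))
```

A table whose bucket has no `folder_mapping` entry raises a `KeyError` here, which makes a missing mapping easy to spot before a sync runs.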
## Sync Strategies
- **full_refresh**: Downloads entire table on each sync. Best for small reference tables.
- **incremental**: Downloads only new/changed rows based on a date column. Best for large transactional tables.
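The difference between the two strategies can be sketched as a date cutoff. The example assumes that `incremental_window_days` is a look-back window counted in days from the sync date; that reading is a guess from the key name, not something this file confirms. The function `incremental_cutoff` is a hypothetical name.

```python
from datetime import date, timedelta
from typing import Optional


def incremental_cutoff(table: dict, today: date) -> Optional[date]:
    """Return the earliest date to download, or None to download everything."""
    strategy = table["sync_strategy"]
    if strategy == "full_refresh":
        # The whole table is downloaded, so there is no cutoff.
        return None
    if strategy == "incremental":
        # Assumption: the window is a look-back period in days.
        return today - timedelta(days=table["incremental_window_days"])
    raise ValueError(f"unknown sync_strategy: {strategy!r}")


orders = {"sync_strategy": "incremental", "incremental_window_days": 7}
print(incremental_cutoff(orders, date(2024, 1, 15)))
```

Under this assumption, rows whose date column falls on or after the cutoff would be fetched again on each sync.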
## Partition Granularity
When `partition_by` is set, `partition_granularity` controls how data is split into separate Parquet files by time period:
- **month**: One file per month (e.g., `orders/2024-01.parquet`)
- **day**: One file per day (e.g., `events/2024-01-15.parquet`)
- **none**: Single file (default)
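
The file naming above can be sketched as a small function that maps a date to a partition path. The `month` and `day` patterns follow the examples in the list. The file name used for `none` is an assumption, since this file does not say what the single file is called, and `partition_path` is a hypothetical helper.

```python
from datetime import date


def partition_path(table_name: str, day: date, granularity: str = "none") -> str:
    """Return the Parquet path that holds rows dated `day` (hypothetical helper)."""
    if granularity == "month":
        return f"{table_name}/{day:%Y-%m}.parquet"
    if granularity == "day":
        return f"{table_name}/{day:%Y-%m-%d}.parquet"
    if granularity == "none":
        # Assumption: the single-file name is not specified in this document.
        return f"{table_name}/{table_name}.parquet"
    raise ValueError(f"unknown partition_granularity: {granularity!r}")


print(partition_path("orders", date(2024, 1, 15), "month"))
print(partition_path("events", date(2024, 1, 15), "day"))
```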