* feat(home): Getting Started + Overview + Usage modes sections
Three new content cards rendered between the install-hero and the
existing connector tiles on /home. Order: Getting Started → Overview
→ Usage modes → connectors.
- Getting Started — dismissible card with two clickable rows linking
to /setup (install flow) and /setup-advanced (deeper reference).
Subsumes the legacy `.advanced-pointer` row that sat above the news
section. Per-device dismiss via a generic localStorage handler:
`.home-card-close[data-dismiss-key="..."]` inside a <section> wires
itself up at page load — drop in any future dismissible card without
per-card JS.
- Overview — operator-owned HTML body sourced from the new
`instance.overview` yaml field (env override
`AGNES_INSTANCE_OVERVIEW`). HTML in, HTML out via the same `| safe`
filter as news_intro. Empty default hides the section entirely,
keeping the OSS vendor-neutral; operators paste their product
framing / privacy posture into instance.yaml. New helper
`get_instance_overview()` in app/instance_config.py mirrors
`get_instance_logo_svg()`.
- Usage modes — three OSS-shipped tiles (Terminal / VS Code / Claude
Desktop · claude.ai) explaining each surface and linking to the
matching /setup-advanced anchors. Closes the gap for users
wondering "where do I actually run this".
Supporting changes:
- setup_advanced.html gains a new `#claude-app` section between
#vscode and #workspace, anchored by the Usage modes Claude Desktop
tile. Covers the marketplace registration paths and when to prefer
the terminal. Added to the table of contents.
- Three new tests in test_web_home_page.py pin the Getting Started
card markup, the Overview-on-when-yaml-set path, and the
Overview-off-by-default path. All 13 tests in the file pass.
Operator follow-up (separate infra PR — NOT this PR): paste the
Foundry-specific Overview body into instance.yaml's
`instance.overview` field. OSS ships with an empty default.
* fix(home): Overview is operator-owned content — drop dismiss button
Earlier iteration added a close X to the Overview section to match
the Getting Started card's dismiss UX. Wrong call: Overview is
operator-authored reference content (privacy posture, telemetry
policy, project framing) and a per-device localStorage hide means
returning users who want to re-read the policy can't recover it
without clearing storage.
Reverts the close button + the data-dismiss-key on the Overview
section. Test inverted to assert the dismiss key is absent (defends
against a future drive-by adding it back). Getting Started still
dismisses — that's procedural getting-started content users
legitimately stop needing once they've finished setup. Overview is
always reachable; whole section is still opt-in at the operator
level via the empty-yaml default.
* fix(home): Terminal usage-mode tile is informational (no click-through)
The setup hero above /home's Usage modes already walks the user
through the Claude Code CLI install — the Terminal tile click-through
to /setup just round-trips back to content the user already scrolled
past. Switch Terminal to a non-anchor <div> and scope the hover
affordance to a.home-usage-item so VS Code + Claude Desktop tiles
keep their click-through (those legitimately deep-link into
/setup-advanced anchors).
* fix(home): point Usage modes guidance at ~/{workspace}/Projects/ subfolder
The bundled plugin scopes the session-analysis loop and the
central-catalog sync to ~/<workspace>/Projects/, not the workspace
root itself — that convention already appears in the install hero's
Step 4 manual-fallback note ('Don't create ~/<workspace>/Projects/
manually — the bundled plugin offers to set it up after install').
Usage modes' footer guidance now matches: 'create every project
under ~/<workspace>/Projects/'. Also calls out that the
session-analysis loop is scoped to that root so users understand
why working outside the workspace dir is invisible to the platform.
|
||
|---|---|---|
| .github | ||
| app | ||
| cli | ||
| config | ||
| connectors | ||
| dev_docs | ||
| docs | ||
| infra | ||
| scripts | ||
| services | ||
| src | ||
| tests | ||
| .dockerignore | ||
| .gitignore | ||
| .pre-commit-config.yaml | ||
| ARCHITECTURE.md | ||
| Caddyfile | ||
| CHANGELOG.md | ||
| CLAUDE.md | ||
| docker-compose.ci.yml | ||
| docker-compose.dev.yml | ||
| docker-compose.flat-mount.yml | ||
| docker-compose.host-mount.yml | ||
| docker-compose.local-dev.yml | ||
| docker-compose.prod.yml | ||
| docker-compose.test.yml | ||
| docker-compose.tls.yml | ||
| docker-compose.yml | ||
| Dockerfile | ||
| LICENSE | ||
| Makefile | ||
| pyproject.toml | ||
| pytest.ini | ||
| README.md | ||
| uv.lock | ||
Agnes — AI Data Analyst
Agnes is an open-source data distribution platform for AI analytical systems. It extracts data from configured sources into DuckDB, serves it via a FastAPI backend, and distributes Parquet files to analysts who query them locally using Claude Code and DuckDB.
Each data source produces a self-describing extract.duckdb file. The SyncOrchestrator attaches all extract databases into a master analytics.duckdb, making every table available through a unified view layer without copying data unnecessarily.
Architecture: extract.duckdb Contract
Every connector produces the same output structure:
/data/extracts/{source_name}/
├── extract.duckdb ← _meta table + views
└── data/ ← parquet files (local sources only)
The orchestrator scans /data/extracts/*/extract.duckdb, attaches each into analytics.duckdb, and creates master views.
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Keboola │ │ BigQuery │ │ Jira │
│ extractor │ │ extractor │ │ webhooks │
│ (DuckDB ext) │ │ (remote BQ) │ │ (incremental)│
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
▼ ▼ ▼
extract.duckdb extract.duckdb extract.duckdb
+ data/*.parquet (views → BQ) + data/*.parquet
│ │ │
└─────────────────┼─────────────────┘
▼
SyncOrchestrator.rebuild()
ATTACH → master views in analytics.duckdb
│
┌──────────┼──────────┐
▼ ▼ ▼
FastAPI CLI
(serve) (agnes pull)
Supported Data Sources
| Mode | Distribution | Sources | Use when |
|---|---|---|---|
Batch pull (local) |
Parquet on disk, scheduled | Keboola | Source has a native bulk-export and the table fits on disk |
Materialized SQL (materialized) |
Parquet on disk, scheduled query | BigQuery, Keboola | Source table is too large to mirror as-is; you want a curated subset / aggregate on disk |
Remote attach (remote) |
View only, no download | BigQuery | Table is too large to materialize; latency cost of remote query is acceptable |
| Real-time push | Incremental parquet | Jira | Source is event-driven and you need sub-minute freshness |
The first three modes are what agnes pull distributes to analysts. The fourth is server-side only — analysts query Jira data through the same agnes pull-distributed parquets.
Admins manage per-source registrations through the /admin/tables UI (per-connector tabs for BigQuery / Keboola / Jira) or the agnes admin register-table CLI; per-row "Manage access" deep-links to /admin/access for granting tables to user groups via resource_grants(group, ResourceType.TABLE, table_id).
Analysts get a closed loop with Claude Code: agnes init writes <workspace>/.claude/settings.json with SessionStart (agnes pull --quiet) and SessionEnd (agnes push --quiet) hooks so every Claude Code session starts with fresh RBAC-filtered parquets and ends with the session log uploaded back.
Adding a new source means creating connectors/<name>/extractor.py that produces extract.duckdb with a _meta table (table_name, description, rows, size_bytes, extracted_at, query_mode). The orchestrator attaches it automatically.
Quick Start with Docker
# Clone the repository
git clone https://github.com/keboola/agnes-the-ai-analyst.git
cd agnes-the-ai-analyst
# Copy and edit configuration
cp config/instance.yaml.example config/instance.yaml
cp config/.env.template .env
# Edit both files for your environment
# Start the app and scheduler
docker compose up
# Start with all optional services (Telegram bot, etc.)
docker compose --profile full up
# Start with TLS (Caddy on :443 with corporate-CA certs from /data/state/certs)
docker compose -f docker-compose.yml -f docker-compose.prod.yml -f docker-compose.tls.yml \
--profile tls up -d
Once running, the FastAPI app is available at http://localhost:8000 (or https://$DOMAIN in TLS mode). See docs/DEPLOYMENT.md for cert provisioning + auto-rotation via scripts/ops/agnes-tls-rotate.sh. Trigger a manual sync:
curl -X POST http://localhost:8000/api/sync/trigger
Local sync & auto-update
Analysts run Claude Code against a local DuckDB built from RBAC-filtered parquets pulled from the server. agnes pull is the distribution path:
agnes pull # delta-pull: manifest → MD5 compare → download changed → rebuild views
agnes pull --quiet # same, no progress output (for hooks/cron)
agnes push # push session jsonl + CLAUDE.local.md back to the server
agnes init writes Claude Code lifecycle hooks into <workspace>/.claude/settings.json:
SessionStart→agnes pull --quiet— fresh data on every sessionSessionEnd→agnes push --quiet— uploads notes and session log
Hooks live at workspace level so they only fire in this analyst workspace, not in unrelated Claude Code sessions on the same machine.
Admin: which tables auto-sync to whom
The auto-sync set per analyst is the intersection of:
- Tables with
query_mode IN ('local', 'materialized')— these have parquets on disk and end up in the manifest - Tables granted to one of the analyst's groups via
resource_grants(group, ResourceType.TABLE, table_id)(seedocs/RBAC.md)
To enroll a new table for auto-sync, register it (or update its query_mode) and grant it to the relevant groups in /admin/access. New analysts get the same set on their next agnes pull.
For BigQuery, register a query_mode='materialized' table with a SQL body:
agnes admin register-table orders_90d \
--source-type bigquery \
--query-mode materialized \
--query @docs/queries/orders_90d.sql \
--schedule "every 6h"
The scheduler runs the query through the DuckDB BigQuery extension on each tick that's due, writes the result as a parquet, and the analyst picks it up on the next agnes pull. Cost guardrail: data_source.bigquery.max_bytes_per_materialize (default 10 GiB) — operations exceeding the BQ dry-run estimate are skipped.
Development Setup
# Create and activate virtual environment
python3 -m venv .venv && source .venv/bin/activate
# Install dependencies
uv pip install ".[dev]"
# Run FastAPI locally with hot reload
uvicorn app.main:app --reload
# Run the test suite
pytest tests/ -v
Project Structure
├── src/ # Core engine
│ ├── db.py # DuckDB schema (system.duckdb, analytics.duckdb)
│ ├── orchestrator.py # SyncOrchestrator — ATTACHes extract.duckdb files
│ ├── repositories/ # DuckDB-backed CRUD (sync_state, table_registry, users, etc.)
│ ├── profiler.py # Data profiling
│ └── catalog_export.py # OpenMetadata catalog export
├── app/ # FastAPI application
│ ├── main.py # App setup, router registration
│ ├── api/ # REST API (sync, data, catalog, admin, auth)
│ ├── auth/ # Auth providers (Google OAuth, email magic link, desktop JWT)
│ └── web/ # HTML dashboard routes
├── connectors/ # Data source connectors (extract.duckdb contract)
│ ├── keboola/ # Keboola: extractor.py (DuckDB extension) + client.py (fallback)
│ ├── bigquery/ # BigQuery: extractor.py (remote-only via DuckDB BQ extension)
│ └── jira/ # Jira: webhook + incremental parquet → extract.duckdb
├── cli/ # CLI tool (`agnes pull`, `agnes query`, `agnes admin`)
├── services/ # Standalone services (scheduler, telegram_bot, ws_gateway, etc.)
├── scripts/ # Utility + migration scripts
├── config/ # Configuration templates (instance.yaml.example)
├── docs/ # Documentation + metric YAML definitions
└── tests/ # Test suite (633 tests)
Configuration
| File | Purpose |
|---|---|
config/instance.yaml |
Instance-specific settings: branding, data source type, auth provider, Google domain |
.env |
Secrets and environment variables — never committed |
system.duckdb table_registry table |
Table definitions managed via POST /api/admin/register-table (or PUT /api/admin/registry/{id} to update) or the web UI |
Copy the example to get started:
cp config/instance.yaml.example config/instance.yaml
See config/instance.yaml.example for all available options.
Documentation
- Hackathon TL;DR — condensed deploy + dev playbooks (for both humans and AI agents)
- Onboarding Guide — end-to-end Terraform deployment into a GCP project (recommended for production)
- Deployment Guide — chooses between Terraform and Docker Compose; covers OSS self-host
- Configuration Reference —
instance.yaml, env vars, per-instance options - Architecture — orchestrator, extractors, DB layout
- Quickstart — local development
Contributing
- Fork the repository and create a feature branch.
- Run
pytest tests/ -vto verify all tests pass before opening a pull request. - Keep commits focused and messages concise.
- Open a pull request against
mainwith a clear description of the change.
For bugs and feature requests, open a GitHub issue.
License
This project is licensed under the MIT License.