Fork of keboola/agnes-the-ai-analyst (via manana2520 GitHub fork). Develop here, push to GitHub fork to open upstream PRs.
Find a file
Vojtech 2e2e1a1eca
feat(home): state-aware /home + /setup-advanced + schema v26 (#228)
* feat(home+news): state-aware /home + /news + admin-edited news section

Squash of the vr/home-page feature work for clean rebase onto main.
Original 18-commit history preserved in branch backup/vr-home-page-pre-rebase.

What's in this PR:

**State-aware /home page**
- New `/home` route with hero + auto-mode + connectors (Asana / GWS /
  Atlassian) + lookarounds. Onboarded vs not-onboarded state-machine
  branches a single template (`home_not_onboarded.html`); the install
  steps, "Setup a new Claude Code" CTA (90-day PAT mint), and per-
  connector setup prompts hide once `users.onboarded=TRUE`. A
  completion badge replaces them.
- "Mark me as offboarded" button reverses the flag without an SQL UPDATE.
- `users.onboarded BOOLEAN` column added; default FALSE; flipped by the
  CLI's `agnes init` post-success POST and the `/admin/users` API.
- Connector setup prompts pre-check whether the tool is already
  installed/connected before re-running setup.
- GWS scope set widened to include Google Chat (`chat.spaces`,
  `chat.messages`).

**Single template + design tokens**
- `dashboard.html` now extends `base.html` via the new
  `{% block layout %}` opt-out (full-width pages skip the 800px
  `.container`). Net: every page shares one shell.
- `style-custom.css` `:root` extended with `--space-{7,9,10,12}`,
  `--radius-2xl`, `--shadow-{card,elevated}`, `--text-{muted,disabled}`,
  `--focus-ring`, `--transition-*`, `--width-{narrow,app,wide}` so
  inline page styles can migrate incrementally.

**Auth redirects honor AGNES_HOME_ROUTE**
- `safe_next_path` resolves the configured home route when no `default=`
  is passed; OAuth callbacks, magic-link clicks, password form, and
  LOCAL_DEV_MODE shortcuts now land on `/home` (or whatever the operator
  picked) instead of always /dashboard.

**News section + /news permalink + /admin/news editor**
- Schema-bumped `news_template` table (single versioned entity, draft +
  publish gate). `published BOOLEAN` distinguishes draft from public;
  monotonically-increasing `version` per save; rows >30d pruned on
  save except the currently-displayed published version.
- `/home` bottom-of-page renders the latest published intro with a
  "Read more →" link to `/news` (which renders the full body).
- `/admin/news` editor with sandboxed live preview, versions table,
  per-row Unpublish, Format-help cheatsheet.
- `agnes admin news show / draft / edit / publish / unpublish /
  versions / export` (CLI). Talks to the live server via the
  `/api/admin/news/*` endpoints (PAT-authed) — no direct DB access
  so it coexists with a running uvicorn.
- **Optimistic-lock guard**: `agnes admin news publish --version N` and
  PUT/PATCH endpoints accept `expected_version` and 409 with structured
  `{error: "version_conflict", expected, actual, actual_by}` when a
  concurrent admin replaced the draft. Edit refuses to overwrite a
  draft authored by someone else without `--force` or
  `--expect-version`.
- nh3 (Rust-backed ammonia) HTML sanitizer; iframe pre-pass strips
  any iframe whose src is not on the YouTube/Vimeo/Loom allowlist;
  javascript:/data: schemes blocked everywhere.
- Author CSS vocabulary: `.news-hero` (blue gradient hero block),
  `.callout`/`.callout-{info,warn,success,danger}`,
  `.video-embed`, `.news-section`, `.news-grid-{2,3}`, `.news-cta` —
  all consolidated in `style-custom.css` under "News content
  vocabulary (shared)" so /home perex, /news body, and /admin/news
  preview share one source of styling.
- Code-inside-`<pre>` contrast fix (was unreadable amber-on-silver).
- `.news-content` table styling (border, header band, row-hover).

**`scripts/dev/run-local.sh`** — local uvicorn launcher. Pulls Google
OAuth client id/secret from GCP Secret Manager
(`AGNES_OAUTH_GCP_PROJECT`-driven, no vendor defaults), points
`AGNES_CLI_DIST_DIR` at `./dist` so the wheel endpoint resolves, and
`--dev` flips `LOCAL_DEV_MODE=1` + `AGNES_HOME_ROUTE=/home` for one-
command iteration. `LOCAL_DEV_MODE=1` also enables the FastAPI debug
toolbar.

**CLAUDE.md "Run tests before every push" section** codifies
`pytest tests/ -n auto -q` as non-negotiable before each push.

**Tests**: 51 + 14 + 8 = 73 new tests across news-template repo,
sanitizer, API, web, CLI; plus updated home/auth/template tests for
the new shared-shell architecture.

Origin docs (gitignored, customer-fork content):
docs/brainstorms/home-page-requirements.md,
docs/plans/2026-05-07-001-feat-home-page-plan.md.

* feat(cli): agnes onboarded {on,off,status} — self-scoped flag toggle

User-facing equivalent of the in-page "Mark me as (off)boarded" button
on /home. POSTs /api/me/onboarded with {onboarded, source}; --source
overrides the audit-log marker so flips made from the CLI vs the web
button vs agnes init automation stay distinguishable.

`status` reads via /api/me/profile (when present); falls back to a
quick body-marker scan of /home so the read path doesn't write an
audit_log row. PAT-authed via cli.client.api_post — same convention
as agnes admin news / agnes admin add-user etc.

Tests: 5 covering on/off/status round-trip, idempotency, and
audit-log source recording. Full suite holds at 12 pre-existing
failures (same set as before).

* ui(nav+home): primary nav reorg + green What's new band + /marketplace link fix

Primary nav (post-rebase audit + per-user feedback):

- Items: Home → Marketplace → Data Packages → Memory. Admin dropdown
  for admins only. The "Dashboard" label was renamed Home — point still
  resolves through `home_route` so customer instances on /dashboard
  still land there.
- Activity Center moved into the Admin dropdown. Per-team adoption
  analytics is admin-consumed in practice; the route still allows
  any authed user for direct deep-links so existing /home tile +
  bookmarks keep working.
- Memory link added (→ /corporate-memory) — was previously buried in
  the /home "Look around" tiles.
- Setup local agent + My Stack dropped from main nav. Setup is the
  /home install flow's home now; My Stack lives as a tab inside
  /marketplace.

/home tweaks:

- Plugin marketplace tile now points at /marketplace (was /store —
  legacy from before the marketplace rebrand landed in #230).
- "What's new" section header gets a green band (success-flavored
  D1FAE5 background, A7F3D0 border, darker green title) so the
  bottom-of-page news block visibly distinguishes from the blue
  install-hero at the top. Header strip only — body stays white.

Test fix: test_home_route_resolution renamed `dashboard_link_uses_home_route`
→ `home_link_uses_home_route` and asserts `href="/home">Home` instead
of `href="/home">Dashboard` after the label change.

* fix(home): decouple Step 3 + Connect-tools collapse from server onboarded flag

The server-side `users.onboarded` flip happens through two paths:

1. Explicit user click on "Mark me as onboarded" or `agnes onboarded on`.
2. Implicit `agnes init` POST → /api/me/onboarded on success.

Path 2 produced a UX surprise: an analyst running `agnes init` mid-flow
reloaded /home and saw Step 3 (auto-mode) + Connect-your-tools auto-
collapse to summary bars. They were actively working through those
sections — the install POST never signalled "I'm done with the rest
of setup", just "Agnes itself is installed".

Decouple the section-collapse decision from the server flag:

- Step 1 + Step 2 install blocks: still hidden on `onboarded=TRUE`
  (their completion is a hard server signal — Agnes IS installed).
- Step 3 + Connect-your-tools: render flat by default in BOTH states.
  Wrapped in `<details class="setup-collapsible" open>` so the
  browser's native disclosure handles per-section toggle without JS,
  but the `<summary>` is CSS-hidden until the page-level
  `data-setup-minimized="1"` attribute is set on `.home-mock`.
- New "Minimize setup view" toggle inside the blue install-hero,
  rendered only when onboarded. Click flips the data-attr on
  `.home-mock` AND removes the `open` attribute from each
  `<details>`. State persists in `localStorage["agnes_home_setup_minimized"]`
  so the choice survives reloads but is per-device.
- "Show full setup view" (the same button when minimized) re-opens
  both `<details>` and clears localStorage.

When minimized, each `<details>` still has its own native expand/
collapse — click the gray summary bar to peek at one section without
toggling the page-level minimize off.

Tests:
- test_step3_and_connectors_render_flat_when_onboarded_by_default —
  asserts `<details class="setup-collapsible" ... open>` for both
  sections post-onboarding and the absence of any server-rendered
  `data-setup-minimized` attribute on the `.home-mock` root.
- test_minimize_toggle_visible_only_when_onboarded — toggle button
  rendered only when onboarded.

Full pytest holds at 12 pre-existing failures (same set).
2026-05-08 18:28:47 +02:00
.github fix(ci): smoke-test stale route + rollback ghcr auth + issues:write (#140) 2026-04-30 09:42:27 +02:00
app feat(home): state-aware /home + /setup-advanced + schema v26 (#228) 2026-05-08 18:28:47 +02:00
cli feat(home): state-aware /home + /setup-advanced + schema v26 (#228) 2026-05-08 18:28:47 +02:00
config feat(observability): optional PostHog integration (#231) 2026-05-08 17:57:10 +04:00
connectors feat(observability): optional PostHog integration (#231) 2026-05-08 17:57:10 +04:00
dev_docs chore(docs): replace stale da verbs and vendor-specific install paths 2026-05-04 21:22:19 +02:00
docs feat(home): state-aware /home + /setup-advanced + schema v26 (#228) 2026-05-08 18:28:47 +02:00
infra infra(customer-instance): preserve operator AGNES_TAG / AGNES_TEMP_DIR (#214) 2026-05-07 11:36:36 +02:00
scripts feat(home): state-aware /home + /setup-advanced + schema v26 (#228) 2026-05-08 18:28:47 +02:00
services feat(observability): optional PostHog integration (#231) 2026-05-08 17:57:10 +04:00
src feat(home): state-aware /home + /setup-advanced + schema v26 (#228) 2026-05-08 18:28:47 +02:00
tests feat(home): state-aware /home + /setup-advanced + schema v26 (#228) 2026-05-08 18:28:47 +02:00
.dockerignore refactor: consolidate deps into pyproject.toml, remove requirements.txt 2026-04-09 13:17:59 +02:00
.gitignore feat(home): state-aware /home + /setup-advanced + schema v26 (#228) 2026-05-08 18:28:47 +02:00
.pre-commit-config.yaml feat(ci+tests): deploy safety audit — linting, rollback, smoke tests, 50+ new tests (#120) 2026-04-29 09:18:55 +02:00
ARCHITECTURE.md fix: address Devin Review findings — incomplete renames + estimate guard 2026-05-04 20:05:06 +02:00
Caddyfile fix: Devin Review on #188 — try_files fallback + auto-upgrade ordering 2026-05-05 17:24:42 +02:00
CHANGELOG.md feat(home): state-aware /home + /setup-advanced + schema v26 (#228) 2026-05-08 18:28:47 +02:00
CLAUDE.md feat(home): state-aware /home + /setup-advanced + schema v26 (#228) 2026-05-08 18:28:47 +02:00
docker-compose.ci.yml feat: multi-instance deployment — all 14 must-have items from spec 2026-04-10 11:57:42 +02:00
docker-compose.dev.yml fix(security+ops) + release(0.12.1): #82 #85 #87 hardening + cut 0.12.1 (#104) 2026-04-28 19:57:30 +02:00
docker-compose.flat-mount.yml fix: Devin Review on #194 round 2 — 3 BUG-class findings 2026-05-05 20:02:50 +02:00
docker-compose.host-mount.yml fix: Devin Review on #194 round 2 — 3 BUG-class findings 2026-05-05 20:02:50 +02:00
docker-compose.local-dev.yml release(0.11.2): LOCAL_DEV_GROUPS dev mock + Makefile defaults + docs/local-development.md (#70) 2026-04-26 16:48:55 +02:00
docker-compose.prod.yml fix(compose): drop corporate-memory + session-collector services (#176) 2026-05-04 23:59:44 +02:00
docker-compose.test.yml chore(deploy): trust proxy headers + document HTTPS env vars (#48) 2026-04-24 08:52:53 +02:00
docker-compose.tls.yml feat(tls): corporate-CA HTTPS with URL-driven rotation, on-VM CSR gen, self-signed fallback (#51) 2026-04-25 19:51:25 +00:00
docker-compose.yml release: 0.47.4 — Docker collector skip + FIFO session-pipeline check (#229) 2026-05-08 09:38:21 +02:00
Dockerfile refactor(ops): bake all host artifacts into image, drop every curl-from-main (#149) 2026-04-30 21:40:25 +02:00
LICENSE OSS cleanup: remove internal references, harden deployment, add config env interpolation 2026-03-09 07:59:57 +01:00
Makefile fix(security+ops) + release(0.12.1): #82 #85 #87 hardening + cut 0.12.1 (#104) 2026-04-28 19:57:30 +02:00
pyproject.toml feat(home): state-aware /home + /setup-advanced + schema v26 (#228) 2026-05-08 18:28:47 +02:00
pytest.ini feat(rbac+marketplace): RBAC v13 + Claude Code marketplace + #81/#83/#44 hardening 2026-04-28 14:25:04 +02:00
README.md fix: address Devin Review findings — incomplete renames + estimate guard 2026-05-04 20:05:06 +02:00
uv.lock feat(home): state-aware /home + /setup-advanced + schema v26 (#228) 2026-05-08 18:28:47 +02:00

Agnes — AI Data Analyst

Agnes is an open-source data distribution platform for AI analytical systems. It extracts data from configured sources into DuckDB, serves it via a FastAPI backend, and distributes Parquet files to analysts who query them locally using Claude Code and DuckDB.

Each data source produces a self-describing extract.duckdb file. The SyncOrchestrator attaches all extract databases into a master analytics.duckdb, making every table available through a unified view layer without copying data unnecessarily.

Architecture: extract.duckdb Contract

Every connector produces the same output structure:

/data/extracts/{source_name}/
├── extract.duckdb          ← _meta table + views
└── data/                   ← parquet files (local sources only)

The orchestrator scans /data/extracts/*/extract.duckdb, attaches each into analytics.duckdb, and creates master views.

┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   Keboola    │  │   BigQuery   │  │   Jira       │
│  extractor   │  │  extractor   │  │  webhooks    │
│ (DuckDB ext) │  │ (remote BQ)  │  │ (incremental)│
└──────┬───────┘  └──────┬───────┘  └──────┬───────┘
       │                 │                 │
       ▼                 ▼                 ▼
   extract.duckdb    extract.duckdb    extract.duckdb
   + data/*.parquet  (views → BQ)      + data/*.parquet
       │                 │                 │
       └─────────────────┼─────────────────┘
                         ▼
              SyncOrchestrator.rebuild()
              ATTACH → master views in analytics.duckdb
                         │
              ┌──────────┼──────────┐
              ▼          ▼          ▼
          FastAPI      CLI
          (serve)    (agnes pull)

Supported Data Sources

Mode Distribution Sources Use when
Batch pull (local) Parquet on disk, scheduled Keboola Source has a native bulk-export and the table fits on disk
Materialized SQL (materialized) Parquet on disk, scheduled query BigQuery, Keboola Source table is too large to mirror as-is; you want a curated subset / aggregate on disk
Remote attach (remote) View only, no download BigQuery Table is too large to materialize; latency cost of remote query is acceptable
Real-time push Incremental parquet Jira Source is event-driven and you need sub-minute freshness

The first three modes are what agnes pull distributes to analysts. The fourth is server-side only — analysts query Jira data through the same agnes pull-distributed parquets.

Admins manage per-source registrations through the /admin/tables UI (per-connector tabs for BigQuery / Keboola / Jira) or the agnes admin register-table CLI; per-row "Manage access" deep-links to /admin/access for granting tables to user groups via resource_grants(group, ResourceType.TABLE, table_id).

Analysts get a closed loop with Claude Code: agnes init writes <workspace>/.claude/settings.json with SessionStart (agnes pull --quiet) and SessionEnd (agnes push --quiet) hooks so every Claude Code session starts with fresh RBAC-filtered parquets and ends with the session log uploaded back.

Adding a new source means creating connectors/<name>/extractor.py that produces extract.duckdb with a _meta table (table_name, description, rows, size_bytes, extracted_at, query_mode). The orchestrator attaches it automatically.

Quick Start with Docker

# Clone the repository
git clone https://github.com/keboola/agnes-the-ai-analyst.git
cd agnes-the-ai-analyst

# Copy and edit configuration
cp config/instance.yaml.example config/instance.yaml
cp config/.env.template .env
# Edit both files for your environment

# Start the app and scheduler
docker compose up

# Start with all optional services (Telegram bot, etc.)
docker compose --profile full up

# Start with TLS (Caddy on :443 with corporate-CA certs from /data/state/certs)
docker compose -f docker-compose.yml -f docker-compose.prod.yml -f docker-compose.tls.yml \
    --profile tls up -d

Once running, the FastAPI app is available at http://localhost:8000 (or https://$DOMAIN in TLS mode). See docs/DEPLOYMENT.md for cert provisioning + auto-rotation via scripts/ops/agnes-tls-rotate.sh. Trigger a manual sync:

curl -X POST http://localhost:8000/api/sync/trigger

Local sync & auto-update

Analysts run Claude Code against a local DuckDB built from RBAC-filtered parquets pulled from the server. agnes pull is the distribution path:

agnes pull             # delta-pull: manifest → MD5 compare → download changed → rebuild views
agnes pull --quiet     # same, no progress output (for hooks/cron)
agnes push  # push session jsonl + CLAUDE.local.md back to the server

agnes init writes Claude Code lifecycle hooks into <workspace>/.claude/settings.json:

  • SessionStartagnes pull --quiet — fresh data on every session
  • SessionEndagnes push --quiet — uploads notes and session log

Hooks live at workspace level so they only fire in this analyst workspace, not in unrelated Claude Code sessions on the same machine.

Admin: which tables auto-sync to whom

The auto-sync set per analyst is the intersection of:

  1. Tables with query_mode IN ('local', 'materialized') — these have parquets on disk and end up in the manifest
  2. Tables granted to one of the analyst's groups via resource_grants(group, ResourceType.TABLE, table_id) (see docs/RBAC.md)

To enroll a new table for auto-sync, register it (or update its query_mode) and grant it to the relevant groups in /admin/access. New analysts get the same set on their next agnes pull.

For BigQuery, register a query_mode='materialized' table with a SQL body:

agnes admin register-table orders_90d \
    --source-type bigquery \
    --query-mode materialized \
    --query @docs/queries/orders_90d.sql \
    --schedule "every 6h"

The scheduler runs the query through the DuckDB BigQuery extension on each tick that's due, writes the result as a parquet, and the analyst picks it up on the next agnes pull. Cost guardrail: data_source.bigquery.max_bytes_per_materialize (default 10 GiB) — operations exceeding the BQ dry-run estimate are skipped.

Development Setup

# Create and activate virtual environment
python3 -m venv .venv && source .venv/bin/activate

# Install dependencies
uv pip install ".[dev]"

# Run FastAPI locally with hot reload
uvicorn app.main:app --reload

# Run the test suite
pytest tests/ -v

Project Structure

├── src/                    # Core engine
│   ├── db.py               # DuckDB schema (system.duckdb, analytics.duckdb)
│   ├── orchestrator.py     # SyncOrchestrator — ATTACHes extract.duckdb files
│   ├── repositories/       # DuckDB-backed CRUD (sync_state, table_registry, users, etc.)
│   ├── profiler.py         # Data profiling
│   └── catalog_export.py   # OpenMetadata catalog export
├── app/                    # FastAPI application
│   ├── main.py             # App setup, router registration
│   ├── api/                # REST API (sync, data, catalog, admin, auth)
│   ├── auth/               # Auth providers (Google OAuth, email magic link, desktop JWT)
│   └── web/                # HTML dashboard routes
├── connectors/             # Data source connectors (extract.duckdb contract)
│   ├── keboola/            # Keboola: extractor.py (DuckDB extension) + client.py (fallback)
│   ├── bigquery/           # BigQuery: extractor.py (remote-only via DuckDB BQ extension)
│   └── jira/               # Jira: webhook + incremental parquet → extract.duckdb
├── cli/                    # CLI tool (`agnes pull`, `agnes query`, `agnes admin`)
├── services/               # Standalone services (scheduler, telegram_bot, ws_gateway, etc.)
├── scripts/                # Utility + migration scripts
├── config/                 # Configuration templates (instance.yaml.example)
├── docs/                   # Documentation + metric YAML definitions
└── tests/                  # Test suite (633 tests)

Configuration

File Purpose
config/instance.yaml Instance-specific settings: branding, data source type, auth provider, Google domain
.env Secrets and environment variables — never committed
system.duckdb table_registry table Table definitions managed via POST /api/admin/register-table (or PUT /api/admin/registry/{id} to update) or the web UI

Copy the example to get started:

cp config/instance.yaml.example config/instance.yaml

See config/instance.yaml.example for all available options.

Documentation

Contributing

  1. Fork the repository and create a feature branch.
  2. Run pytest tests/ -v to verify all tests pass before opening a pull request.
  3. Keep commits focused and messages concise.
  4. Open a pull request against main with a clear description of the change.

For bugs and feature requests, open a GitHub issue.

License

This project is licensed under the MIT License.