AI-Cognitive-Leap/agnes-the-ai-analyst

Fork of keboola/agnes-the-ai-analyst (via the manana2520 GitHub fork). Develop here, then push to the GitHub fork to open upstream PRs.

ZdenekSrotyr 7e4ddf0b01 feat(auth): password reset & invite flows for web + admin (#34) (#37)	2026-04-22 17:43:57 +02:00

* feat(auth): password reset & invite flows for web + admin (#34)

  Wires end-to-end the previously orphaned password_reset.html and password_setup.html templates, adds the missing POST /auth/password/reset handler (closes #34), and restores the Reset action in the admin user UI (which origin/main had removed precisely because the flow was broken).

  Web flow
  - GET /auth/password/reset: renders the set-new-password form
  - POST /auth/password/reset: 'Forgot Password?' request; emails link, anti-enumeration (same response for unknown email)
  - POST /auth/password/reset/confirm: validates token + 24h TTL, sets new password, clears token, logs user in
  - GET /auth/password/setup: renders the setup form (invite link landing)
  - POST /auth/password/setup/request: signup-tab 'Request Access' (email-only)
  - POST /auth/password/setup/confirm: 7-day TTL, sets password + name, logs in
  - Reuses LOCAL_DEV_MODE pattern from email.py: logs the link loudly so developers can use the flow without an SMTP/SendGrid transport

  Admin flow
  - POST /api/users accepts send_invite → returns invite_url + invite_email_sent
  - POST /api/users/{id}/reset-password now returns a full reset_url pointing at the dedicated password-reset endpoint (NOT the magic-link verifier, which would log the user in without prompting for a new password)
  - admin_users.html: restored Reset row action, copyable reset/invite link modals, invite checkbox on create, reworded 'magic-link not wired' notes

  Backward compat
  - JSON POST /auth/password/setup kept unchanged (existing tests pass)
  - Active-account gate applied to reset/setup flows (matches password_login)

  Tests: 21 new cases (tests/test_password_flows.py) covering GET renders, request/confirm happy + error paths, TTLs, anti-enumeration, and admin invite/reset URL responses. Full suite: 1309 passed.

  Closes #34

* fix(admin-users): allow horizontal scroll when actions overflow

  Four action buttons (Tokens, Reset, Set pwd, Delete) can exceed the viewport on narrow screens. Switch .users-table-wrap from overflow: hidden to overflow-x: auto so the table scrolls instead of clipping, and lock row-actions buttons to a single nowrap line.

* fix(admin-users): override base 800px container so table can use full width

  The base layout caps .container at 800px, so the table was always being clipped regardless of viewport. Unclamp the container on this page and widen the inner page cap to 1400px.

* fix(auth): address Devin review (harden JSON setup, anti-enumeration, preserve email case)

  Addresses findings from Devin review on PR #37:

  1. JSON POST /auth/password/setup now enforces the same SETUP_TOKEN_TTL (7 days) and active-account check as the web flow. An expired token or a deactivated user can no longer bypass the gate by posting JSON. Existing test fixture seeds setup_token_created=now so backward-compat tests continue to pass.
  2. GET /auth/password/setup no longer looks up the user to pre-fill name. The form renders identically regardless of whether the email exists, consistent with anti-enumeration in POST /setup/request.
  3. reset_request / setup_request no longer lowercase the submitted email. The rest of the codebase (password_login, magic-link, admin create) uses case-sensitive lookups, so normalizing only here would silently fail for mixed-case accounts.

  Tests: 6 new cases covering expired-JSON-setup, missing-created-timestamp, deactivated-user-rejection, mixed-case email preservation, and the anti-enumeration property of GET /setup.

.github/workflows	ci(release): push dev-<user-prefix>-latest alias for <user>/* branches (#31)	2026-04-22 14:02:59 +02:00
app	feat(auth): password reset & invite flows for web + admin (#34) (#37)	2026-04-22 17:43:57 +02:00
cli	User management + PAT + CLI distribution + HTML auth redirect (#9 #10 #11 #12) (#28)	2026-04-22 14:24:28 +02:00
config	ci(release): build image for all branches, not just feature/** (#19)	2026-04-21 21:33:57 +02:00
connectors	fix: strip HTML from table and column descriptions in OpenMetadata enricher	2026-04-09 18:42:37 +02:00
dev_docs	docs: update stale v1 docs to v2 Docker/FastAPI/DuckDB architecture	2026-04-09 18:44:25 +02:00
docs	docs(security): add padak-security.md audit report (#35)	2026-04-22 16:31:13 +02:00
infra	ci(release): build image for all branches, not just feature/** (#19)	2026-04-21 21:33:57 +02:00
scripts	feat(dev): LOCAL_DEV_MODE for one-command local dev + magic-link fixes (#32)	2026-04-22 14:47:33 +02:00
services	fix: make bot.py FileHandler resilient to missing log directory	2026-04-13 13:28:59 +02:00
src	User management + PAT + CLI distribution + HTML auth redirect (#9 #10 #11 #12) (#28)	2026-04-22 14:24:28 +02:00
tests	feat(auth): password reset & invite flows for web + admin (#34) (#37)	2026-04-22 17:43:57 +02:00
.dockerignore	refactor: consolidate deps into pyproject.toml, remove requirements.txt	2026-04-09 13:17:59 +02:00
.gitignore	infra: add bootstrap-gcp.sh for per-customer GCP setup	2026-04-21 16:18:35 +02:00
ARCHITECTURE.md	Update docs for modular architecture (auth/, services/, scripts/)	2026-03-09 13:11:40 +01:00
Caddyfile	feat: add Caddy HTTPS reverse proxy and production compose override	2026-04-09 16:39:23 +02:00
CHANGELOG.md	feat: multi-instance deployment — all 14 must-have items from spec	2026-04-10 11:57:42 +02:00
CLAUDE.md	User management + PAT + CLI distribution + HTML auth redirect (#9 #10 #11 #12) (#28)	2026-04-22 14:24:28 +02:00
docker-compose.ci.yml	feat: multi-instance deployment — all 14 must-have items from spec	2026-04-10 11:57:42 +02:00
docker-compose.host-mount.yml	fix(ci): move bind-mount of /data to separate overlay, fix CI smoke test	2026-04-21 16:54:18 +02:00
docker-compose.local-dev.yml	feat(dev): LOCAL_DEV_MODE for one-command local dev + magic-link fixes (#32)	2026-04-22 14:47:33 +02:00
docker-compose.override.yml	chore: Docker prod config (Python 3.13, no reload), fix utcnow deprecation, update docs	2026-04-08 12:10:47 +02:00
docker-compose.prod.yml	fix(ci): move bind-mount of /data to separate overlay, fix CI smoke test	2026-04-21 16:54:18 +02:00
docker-compose.test.yml	feat: add SEED_ADMIN_EMAIL for Docker test environments	2026-03-31 09:48:12 +02:00
docker-compose.yml	feat: multi-instance deployment — all 14 must-have items from spec	2026-04-10 11:57:42 +02:00
Dockerfile	User management + PAT + CLI distribution + HTML auth redirect (#9 #10 #11 #12) (#28)	2026-04-22 14:24:28 +02:00
LICENSE	OSS cleanup: remove internal references, harden deployment, add config env interpolation	2026-03-09 07:59:57 +01:00
Makefile	feat(dev): make local-dev targets for one-keystroke LOCAL_DEV_MODE startup (#33)	2026-04-22 14:57:10 +02:00
pyproject.toml	chore(deps): bump python-multipart from 0.0.24 to 0.0.26	2026-04-21 13:26:19 +00:00
pytest.ini	test: add shared test infrastructure (fixtures, factories, assertions, mocks)	2026-04-12 11:05:35 +02:00
README.md	docs: add HACKATHON.md — condensed deploy + dev playbooks (#21)	2026-04-21 21:33:06 +02:00
uv.lock	chore(deps): bump python-multipart from 0.0.24 to 0.0.26	2026-04-21 13:26:19 +00:00

README.md

Agnes — AI Data Analyst

Agnes is an open-source data distribution platform for AI analytical systems. It extracts data from configured sources into DuckDB, serves it via a FastAPI backend, and distributes Parquet files to analysts who query them locally using Claude Code and DuckDB.

Each data source produces a self-describing extract.duckdb file. The SyncOrchestrator attaches all extract databases into a master analytics.duckdb, making every table available through a unified view layer without copying data unnecessarily.

Architecture: extract.duckdb Contract

Every connector produces the same output structure:

/data/extracts/{source_name}/
├── extract.duckdb          ← _meta table + views
└── data/                   ← parquet files (local sources only)

The orchestrator scans /data/extracts/*/extract.duckdb, attaches each into analytics.duckdb, and creates master views.

┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   Keboola    │  │   BigQuery   │  │   Jira       │
│  extractor   │  │  extractor   │  │  webhooks    │
│ (DuckDB ext) │  │ (remote BQ)  │  │ (incremental)│
└──────┬───────┘  └──────┬───────┘  └──────┬───────┘
       │                 │                 │
       ▼                 ▼                 ▼
   extract.duckdb    extract.duckdb    extract.duckdb
   + data/*.parquet  (views → BQ)      + data/*.parquet
       │                 │                 │
       └─────────────────┼─────────────────┘
                         ▼
              SyncOrchestrator.rebuild()
              ATTACH → master views in analytics.duckdb
                         │
              ┌──────────┼──────────┐
              ▼          ▼          ▼
          FastAPI      CLI
          (serve)    (da sync)
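
The scan-and-attach step can be illustrated with a small, standard-library-only sketch. It only builds the SQL text an orchestrator might run; it does not execute anything against DuckDB. The function names (`quote_ident`, `plan_attach_statements`, `master_view_statement`) are hypothetical and are not taken from the actual SyncOrchestrator, whose implementation may differ. It also assumes that each attached database is aliased by its source directory name.

```python
from pathlib import Path


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def plan_attach_statements(extracts_root: Path) -> list[str]:
    """Return one ATTACH statement per discovered extract.duckdb file.

    Hypothetical helper: the alias is assumed to be the source directory name.
    """
    statements = []
    # Layout from the README: <root>/<source_name>/extract.duckdb
    for db_path in sorted(extracts_root.glob("*/extract.duckdb")):
        source_name = db_path.parent.name
        # Escape single quotes so the path is a valid SQL string literal.
        path_literal = db_path.as_posix().replace("'", "''")
        statements.append(
            f"ATTACH '{path_literal}' AS {quote_ident(source_name)} (READ_ONLY)"
        )
    return statements


def master_view_statement(source_name: str, table_name: str) -> str:
    """Return a CREATE VIEW statement exposing one attached table.

    Hypothetical helper: assumes the master view keeps the table's own name.
    """
    return (
        f"CREATE OR REPLACE VIEW {quote_ident(table_name)} AS "
        f"SELECT * FROM {quote_ident(source_name)}.{quote_ident(table_name)}"
    )
```

Because the views only select from the attached databases, no rows are copied into analytics.duckdb, which matches the "without copying data unnecessarily" goal stated above.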

Supported Data Sources

Source	Mode	Description
Keboola	Batch pull	DuckDB Keboola extension downloads tables to Parquet on a schedule
BigQuery	Remote attach	DuckDB BQ extension; queries execute in BigQuery, no local download
Jira	Real-time push	Webhook receiver updates Parquet files incrementally

Adding a new source means creating connectors/<name>/extractor.py that produces extract.duckdb with a _meta table (table_name, description, rows, size_bytes, extracted_at, query_mode). The orchestrator attaches it automatically.
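
When writing a new extractor, a quick self-check of the _meta columns can catch contract violations early. The sketch below is illustrative only: the column names come from the list above, while the helper name `missing_meta_columns` and the case-insensitive comparison are assumptions, not part of the project.

```python
# Column names as listed in the README; column types are deliberately not assumed.
REQUIRED_META_COLUMNS = (
    "table_name",
    "description",
    "rows",
    "size_bytes",
    "extracted_at",
    "query_mode",
)


def missing_meta_columns(columns):
    """Return the required _meta columns absent from `columns`, in contract order.

    Hypothetical helper: names are compared case-insensitively.
    """
    present = {column.lower() for column in columns}
    return [name for name in REQUIRED_META_COLUMNS if name not in present]
```

An extractor test could pass in the column names it reads back from its own _meta table and assert that the result is an empty list.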

Quick Start with Docker

# Clone the repository
git clone https://github.com/keboola/agnes-the-ai-analyst.git
cd agnes-the-ai-analyst

# Copy and edit configuration
cp config/instance.yaml.example config/instance.yaml
cp config/.env.template .env
# Edit both files for your environment

# Start the app and scheduler
docker compose up

# Start with all optional services (Telegram bot, etc.)
docker compose --profile full up

Once running, the FastAPI app is available at http://localhost:8000. Trigger a manual sync:

curl -X POST http://localhost:8000/api/sync/trigger
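
The same request can be sent from Python using only the standard library. This is a minimal sketch that mirrors the curl command: the function names are hypothetical, and it assumes the endpoint accepts an unauthenticated POST with no body, as the curl example does. A secured instance may require additional headers.

```python
import urllib.request


def build_sync_trigger_request(
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build (but do not send) the POST request for a manual sync."""
    url = base_url.rstrip("/") + "/api/sync/trigger"
    return urllib.request.Request(url, method="POST")


def trigger_sync(base_url: str = "http://localhost:8000", timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = build_sync_trigger_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `trigger_sync()` against a running instance is equivalent to the curl command above.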

Development Setup

# Create and activate virtual environment
python3 -m venv .venv && source .venv/bin/activate

# Install dependencies
uv pip install ".[dev]"

# Run FastAPI locally with hot reload
uvicorn app.main:app --reload

# Run the test suite
pytest tests/ -v

Project Structure

├── src/                    # Core engine
│   ├── db.py               # DuckDB schema (system.duckdb, analytics.duckdb)
│   ├── orchestrator.py     # SyncOrchestrator — ATTACHes extract.duckdb files
│   ├── repositories/       # DuckDB-backed CRUD (sync_state, table_registry, users, etc.)
│   ├── profiler.py         # Data profiling
│   └── catalog_export.py   # OpenMetadata catalog export
├── app/                    # FastAPI application
│   ├── main.py             # App setup, router registration
│   ├── api/                # REST API (sync, data, catalog, admin, auth)
│   ├── auth/               # Auth providers (Google OAuth, email magic link, desktop JWT)
│   └── web/                # HTML dashboard routes
├── connectors/             # Data source connectors (extract.duckdb contract)
│   ├── keboola/            # Keboola: extractor.py (DuckDB extension) + client.py (fallback)
│   ├── bigquery/           # BigQuery: extractor.py (remote-only via DuckDB BQ extension)
│   └── jira/               # Jira: webhook + incremental parquet → extract.duckdb
├── cli/                    # CLI tool (`da sync`, `da query`, `da admin`)
├── services/               # Standalone services (scheduler, telegram_bot, ws_gateway, etc.)
├── scripts/                # Utility + migration scripts
├── config/                 # Configuration templates (instance.yaml.example)
├── docs/                   # Documentation + metric YAML definitions
└── tests/                  # Test suite (run with pytest tests/ -v)

Configuration

File	Purpose
`config/instance.yaml`	Instance-specific settings: branding, data source type, auth provider, Google domain
`.env`	Secrets and environment variables — never committed
`system.duckdb` `table_registry` table	Table definitions managed via `POST /api/admin/tables/{id}` or the web UI

Copy the example to get started:

cp config/instance.yaml.example config/instance.yaml

See config/instance.yaml.example for all available options.

Documentation

Hackathon TL;DR — condensed deploy + dev playbooks (for both humans and AI agents)
Onboarding Guide — end-to-end Terraform deployment into a GCP project (recommended for production)
Deployment Guide — chooses between Terraform and Docker Compose; covers OSS self-host
Configuration Reference — instance.yaml, env vars, per-instance options
Architecture — orchestrator, extractors, DB layout
Quickstart — local development

Contributing

Fork the repository and create a feature branch.
Run pytest tests/ -v to verify all tests pass before opening a pull request.
Keep commits focused and messages concise.
Open a pull request against main with a clear description of the change.

For bugs and feature requests, open a GitHub issue.

License

This project is licensed under the MIT License.