Agnes — AI Data Analyst
Agnes is an open-source data distribution platform for AI analytical systems. It extracts data from configured sources into DuckDB, serves it via a FastAPI backend, and distributes Parquet files to analysts who query them locally using Claude Code and DuckDB.
Each data source produces a self-describing extract.duckdb file. The SyncOrchestrator attaches all extract databases into a master analytics.duckdb, making every table available through a unified view layer without copying data unnecessarily.
Architecture: extract.duckdb Contract
Every connector produces the same output structure:
/data/extracts/{source_name}/
├── extract.duckdb ← _meta table + views
└── data/ ← parquet files (local sources only)
The orchestrator scans /data/extracts/*/extract.duckdb, attaches each into analytics.duckdb, and creates master views.
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   Keboola    │  │   BigQuery   │  │     Jira     │
│  extractor   │  │  extractor   │  │   webhooks   │
│ (DuckDB ext) │  │ (remote BQ)  │  │ (incremental)│
└──────┬───────┘  └──────┬───────┘  └──────┬───────┘
       │                 │                 │
       ▼                 ▼                 ▼
 extract.duckdb    extract.duckdb    extract.duckdb
 + data/*.parquet  (views → BQ)      + data/*.parquet
       │                 │                 │
       └─────────────────┼─────────────────┘
                         ▼
            SyncOrchestrator.rebuild()
      ATTACH → master views in analytics.duckdb
                         │
              ┌──────────┼──────────┐
              ▼          ▼          ▼
           FastAPI      CLI
           (serve)   (da sync)
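The scan-and-attach step can be pictured with a small sketch. It is not the actual SyncOrchestrator code: the function name `build_attach_statements` and the choice to use the directory name as the database alias are assumptions made for illustration. It only builds DuckDB `ATTACH` statements for every `extract.duckdb` found one level below the extracts root; creating the master views is left out.

```python
from pathlib import Path


def build_attach_statements(extracts_root: Path) -> list[str]:
    """Return one DuckDB ATTACH statement per discovered extract.duckdb file.

    Hypothetical helper: the real orchestrator may name aliases differently.
    """
    statements = []
    # Mirrors the documented scan of /data/extracts/*/extract.duckdb.
    for db_path in sorted(extracts_root.glob("*/extract.duckdb")):
        source_name = db_path.parent.name  # assumption: directory name is the alias
        statements.append(
            f"ATTACH '{db_path.as_posix()}' AS \"{source_name}\" (READ_ONLY)"
        )
    return statements
```

Attaching read-only fits the description above: the master database only needs to see the extract tables and views, not modify them.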
Supported Data Sources
| Source | Mode | Description |
|---|---|---|
| Keboola | Batch pull | DuckDB Keboola extension downloads tables to Parquet on a schedule |
| BigQuery | Remote attach | DuckDB BQ extension; queries execute in BigQuery, no local download |
| Jira | Real-time push | Webhook receiver updates Parquet files incrementally |
Adding a new source means creating connectors/<name>/extractor.py that produces extract.duckdb with a _meta table (table_name, description, rows, size_bytes, extracted_at, query_mode). The orchestrator attaches it automatically.
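As a rough illustration of the `_meta` contract, the sketch below builds the SQL a new extractor might run against its own `extract.duckdb`. The column names come from the list above; the SQL types and the helper names `meta_ddl` and `meta_insert_sql` are assumptions, so check an existing connector before relying on them.

```python
# Column names follow the README; the SQL types are assumptions.
META_COLUMNS = {
    "table_name": "VARCHAR",
    "description": "VARCHAR",
    "rows": "BIGINT",
    "size_bytes": "BIGINT",
    "extracted_at": "TIMESTAMP",
    "query_mode": "VARCHAR",
}


def meta_ddl() -> str:
    """Build a CREATE TABLE statement for the _meta table."""
    # Identifiers are quoted so that names such as "rows" cannot clash
    # with SQL keywords.
    cols = ", ".join(f'"{name}" {sql_type}' for name, sql_type in META_COLUMNS.items())
    return f"CREATE TABLE IF NOT EXISTS _meta ({cols})"


def meta_insert_sql() -> str:
    """Build a parameterized INSERT with one placeholder per _meta column."""
    placeholders = ", ".join("?" for _ in META_COLUMNS)
    return f"INSERT INTO _meta VALUES ({placeholders})"
```

An extractor would execute `meta_ddl()` once, then `meta_insert_sql()` with one six-value tuple per table it exposes.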
Quick Start with Docker
# Clone the repository
git clone https://github.com/keboola/agnes-the-ai-analyst.git
cd agnes-the-ai-analyst
# Copy and edit configuration
cp config/instance.yaml.example config/instance.yaml
cp config/.env.template .env
# Edit both files for your environment
# Start the app and scheduler
docker compose up
# Start with all optional services (Telegram bot, etc.)
docker compose --profile full up
Once running, the FastAPI app is available at http://localhost:8000. Trigger a manual sync:
curl -X POST http://localhost:8000/api/sync/trigger
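The same trigger can be issued from Python using only the standard library. This is a minimal sketch: `build_sync_request` is a hypothetical helper, and it assumes the endpoint accepts an empty POST body with no authentication, exactly as the curl command above does.

```python
import urllib.request


def build_sync_request(base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for the manual sync endpoint shown above."""
    url = base_url.rstrip("/") + "/api/sync/trigger"
    return urllib.request.Request(url, method="POST")


# To send it (requires the app to be running):
#   with urllib.request.urlopen(build_sync_request(), timeout=30) as response:
#       print(response.status)
```

Separating request construction from sending keeps the URL logic easy to check without a running instance.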
Development Setup
# Create and activate virtual environment
python3 -m venv .venv && source .venv/bin/activate
# Install dependencies
uv pip install ".[dev]"
# Run FastAPI locally with hot reload
uvicorn app.main:app --reload
# Run the test suite
pytest tests/ -v
Project Structure
├── src/ # Core engine
│ ├── db.py # DuckDB schema (system.duckdb, analytics.duckdb)
│ ├── orchestrator.py # SyncOrchestrator — ATTACHes extract.duckdb files
│ ├── repositories/ # DuckDB-backed CRUD (sync_state, table_registry, users, etc.)
│ ├── profiler.py # Data profiling
│ └── catalog_export.py # OpenMetadata catalog export
├── app/ # FastAPI application
│ ├── main.py # App setup, router registration
│ ├── api/ # REST API (sync, data, catalog, admin, auth)
│ ├── auth/ # Auth providers (Google OAuth, email magic link, desktop JWT)
│ └── web/ # HTML dashboard routes
├── connectors/ # Data source connectors (extract.duckdb contract)
│ ├── keboola/ # Keboola: extractor.py (DuckDB extension) + client.py (fallback)
│ ├── bigquery/ # BigQuery: extractor.py (remote-only via DuckDB BQ extension)
│ └── jira/ # Jira: webhook + incremental parquet → extract.duckdb
├── cli/ # CLI tool (`da sync`, `da query`, `da admin`)
├── services/ # Standalone services (scheduler, telegram_bot, ws_gateway, etc.)
├── scripts/ # Utility + migration scripts
├── config/ # Configuration templates (instance.yaml.example)
├── docs/ # Documentation + metric YAML definitions
└── tests/ # Test suite (633 tests)
Configuration
| File | Purpose |
|---|---|
| config/instance.yaml | Instance-specific settings: branding, data source type, auth provider, Google domain |
| .env | Secrets and environment variables (never committed) |
| system.duckdb table_registry table | Table definitions managed via POST /api/admin/tables/{id} or the web UI |
Copy the example to get started:
cp config/instance.yaml.example config/instance.yaml
See config/instance.yaml.example for all available options.
Documentation
- Hackathon TL;DR: condensed deploy + dev playbooks (for both humans and AI agents)
- Onboarding Guide: end-to-end Terraform deployment into a GCP project (recommended for production)
- Deployment Guide: chooses between Terraform and Docker Compose; covers OSS self-host
- Configuration Reference: instance.yaml, env vars, per-instance options
- Architecture: orchestrator, extractors, DB layout
- Quickstart: local development
Contributing
- Fork the repository and create a feature branch.
- Run `pytest tests/ -v` to verify all tests pass before opening a pull request.
- Keep commits focused and messages concise.
- Open a pull request against `main` with a clear description of the change.
For bugs and feature requests, open a GitHub issue.
License
This project is licensed under the MIT License.