* fix(cta): fall back to textarea+execCommand when Clipboard API rejects
The "Setup a new Claude Code" CTA fetches /auth/tokens, parses the JSON
response, renders the setup script, THEN calls
`navigator.clipboard.writeText()`. Modern browsers (Safari, Firefox, and
Chrome on stricter configurations) reject `writeText` with
NotAllowedError when transient user activation has been consumed by an
intervening `await` — which is exactly the case here. Users perceived
this as "the browser blocked the copy" and got the manual-paste fallback
modal even though the textarea + `document.execCommand('copy')` path
WOULD have worked synchronously without needing fresh user activation.
`copyToClipboard` now:
- prefers the modern Clipboard API (unchanged for the happy path)
- on writeText rejection, falls back to `copyViaTextarea` instead of
surfacing the rejection to the caller's catch block.
`copyViaTextarea` is the previously-inline textarea fallback factored
out into a named helper, with two small hardening touches:
- `readonly` + `tabindex=-1` so the hidden textarea doesn't steal
focus or pop the virtual keyboard on mobile.
- explicit `setSelectionRange(0, text.length)` to belt-and-braces the
selection on iOS Safari (where `.select()` alone sometimes selects
zero chars on touch-focused textareas).
Only the CTA button needed this — the Step-1 install-command and the
connector-copy buttons all call `writeText` synchronously inside the
click handler (no awaits in between), so they keep their existing
user-gesture context and didn't hit the same rejection. No template
changes there.
* refactor(home): fold Atlassian MCP registration into connectors block
The standalone "Register the Atlassian MCP server" step (was step 6 in
the unified setup script) moves INTO the Atlassian connector's prompt
body so all Atlassian-related setup lives in one logical group. Same
intent that #247 carried for connectors, applied one level deeper:
the hosted Remote MCP registration is part of "set up Atlassian", not
its own ungrouped step.
What changed:
- `app/web/connector_prompts.py` — the Atlassian prompt's step 5
replaces the speculative "Register the on-demand Atlassian MCP under
.claude/mcp/atlassian" line with the actual hosted Remote MCP
registration: `claude mcp add --transport sse atlassian
https://mcp.atlassian.com/v1/sse || true`. The `|| true` keeps re-runs
idempotent and the body explains the OAuth-on-first-use contract.
Both /home's Atlassian tile and the inlined setup-script Atlassian
sub-block emit this line — single source of truth holds.
- `app/web/setup_instructions.py` — `_mcp_servers_block` deleted; the
`mcp_servers` step is removed from `_step_numbers`; resolve_lines no
longer calls it.
- Renumbering: install (1), init (2), catalog (3), preflight (4),
marketplace (5), diagnose (6), connectors (7), confirm (8). Was:
6 = mcp_servers, 7 = diagnose, 8 = connectors, 9 = confirm.
- `tests/test_setup_instructions.py` — Confirm step 9→8, Connect 8→7,
diagnose 7→6, mcp_servers references dropped.
`test_step_numbering_with_connectors_step` now asserts
`"mcp_servers" not in steps`. Stray-Confirm assertion lists shift
by one position.
- `tests/test_setup_page_unified.py` + `tests/test_web_ui.py` — same
step-number shifts in the rendered /setup preview assertions.
The `claude mcp add` line is still the Atlassian Remote-MCP path that
the 2026-05-10 init-report Fix C added — only its position in the
flow changes. /home Atlassian tile copying continues to install the
MCP too (the prompt body the tile pastes contains the same line).
112 tests pass.
* feat(atlassian): operator-overrideable base URL via AGNES_ATLASSIAN_BASE_URL
Adds an env var / YAML key the operator (Terraform module, customer-VM
template, OSS instance.yaml) can set to bake the Atlassian Cloud site
root into the connector prompt — so end users don't have to guess /
paste their org's `https://<myorg>.atlassian.net`.
When set, the Atlassian connector prompt (rendered on both /home tile
and inlined into the setup-script step 7 Atlassian sub-block) replaces
step 1's "Ask me for my Atlassian Cloud site URL and email" with a
one-line note that the URL is already provisioned by the operator and
asks only for the email. Step 4's helper-script body has the
`BASE_URL='<the site URL I gave you>'` placeholder substituted with
the literal value. When unset (empty), the existing "ask the user"
flow remains — no regression for OSS instances.
Resolution + normalization in `get_atlassian_base_url()`:
- env `AGNES_ATLASSIAN_BASE_URL` > yaml `instance.atlassian.base_url` > ""
- strips trailing slash + trailing `/wiki` so the canonical value is
the bare site root. Matches the per-user helper script's
normalization at storage time (atlassian_prompt step 4 guard 2), so
the literal baked in by the operator stays consistent with what the
user's helper script would have computed from their input.
Plumbing:
- `app/instance_config.py`: new `get_atlassian_base_url()` resolver.
- `app/web/connector_prompts.py`:
- `atlassian_prompt(*, base_url: str = "")` — string-replace two
explicit placeholder phrases when base_url is truthy; otherwise
return the prompt unchanged.
- `all_connector_prompts(..., atlassian_base_url: str = "")` —
forwards the kwarg.
- `app/web/router.py` (`_build_context`): reads
`get_atlassian_base_url()` and passes it through to
`all_connector_prompts(...)` so both the /home tile context AND the
inlined-script `resolve_lines(...)` call use the same value.
- `src/welcome_template.py` (`compute_default_agent_prompt`): same
threading via the existing import-on-demand path.
Tests (`tests/test_home_route_resolution.py`):
- `get_atlassian_base_url` resolver: default empty, env override,
trailing-slash strip, trailing-`/wiki` strip.
- `atlassian_prompt(base_url=...)`: literal URL baked in, ask-step
removed, placeholder replaced, operator-baked-in copy appears.
- `atlassian_prompt(base_url="")`: existing ask-the-user flow
unchanged.
- `all_connector_prompts(atlassian_base_url=...)`: kwarg threads
through to the rendered atlassian prompt.
135 tests pass.
* feat(asana): register hosted Asana Remote MCP in connector prompt
The Asana connector prompt only stored a PAT in the OS keychain + ran
a curl verify against /api/1.0/users/me. That set Claude Code up for
direct `curl` calls but didn't actually wire Asana into Claude's tool
list — so the user couldn't ask Claude to "find my open Asana tasks"
and have it work. Symmetric oversight to the Atlassian connector's
original speculative `.claude/mcp/atlassian` line that this branch
already replaced with `claude mcp add --transport sse atlassian
https://mcp.atlassian.com/v1/sse`.
Adds a new step 5 that registers Asana's hosted Remote MCP:
claude mcp add --transport http asana https://mcp.asana.com/mcp || true
This is the V2 endpoint (streamable HTTP transport, launched February
2026). The V1 SSE endpoint at https://mcp.asana.com/sse was deprecated
2026-05-11 (today) and must NOT be used — calling it out explicitly
in the prompt body so a future operator who finds an old reference
doesn't paste the dead URL. OAuth is handled by Claude Code at first
use, same model as the Atlassian MCP step.
The PAT stored in step 3 stays for direct `curl` calls (precheck +
ad-hoc scripts) — the MCP path uses its own OAuth grant, not the PAT.
Old step 5 (revoke instructions) renumbers to step 6 and adds the
`claude mcp remove asana` cleanup hint.
Same single-source-of-truth invariant holds: /home Asana tile + the
inlined Asana sub-block in the setup script (step 7 connectors) both
emit identical text from `asana_prompt()`.
71 tests pass.
* feat(asana): drive MCP OAuth login + end-to-end validation post-register
`claude mcp add --transport http asana ...` only registers the
server in Claude Code's local config — it does NOT trigger OAuth.
The browser tab opens the first time any `mcp__asana__*` tool gets
invoked. So the previous step 5 left a user looking at a "registered"
MCP that, in practice, hadn't authed yet and would fail on first
real use. Same blind spot Atlassian's prompt also has, but Asana was
the one called out in the latest review pass.
Adds a new step 6 between MCP registration (step 5) and the revoke
instructions (now step 7):
a. Tell the user verbatim what's about to happen — a low-impact
read through the MCP will pop the OAuth browser tab; sign in
with the same account whose PAT they stored in step 3 and
approve. Frames the OAuth as one-time so users don't wait
for it on every later call.
b. Drive an actual MCP read. Don't prescribe the exact tool name
because the Asana MCP's exposed surface (`mcp__asana__*`) is
versioned upstream and we don't want to pin to a name that
gets renamed. Instead: tell Claude to pick the lightest read
from its surfaced tool list (users-me / list-workspaces /
equivalent). Document the recovery path when Claude Code
times out waiting for the OAuth tool use: `claude mcp list`
to confirm registration before retrying.
c. Print a single one-line proof that combines wiring + auth:
"Asana MCP connected as <name> — <N> workspace(s) visible."
Explicit anti-echo callout for tokens, task content, comments.
On failure, surface the exact Claude-Code error and stop —
no silent pass.
d. Sanity-check that the MCP OAuth identity and the PAT identity
reference the same Asana account. Easy mistake to make when
the user has multiple Asana accounts — flag only on mismatch,
keep quiet when they match. Recovery: `claude mcp remove asana
&& claude logout asana` then redo step 5.
Step 7 (revoke) absorbs both the keychain delete + the
`claude logout asana` line so users have a single place to undo
everything.
43 tests pass.
* fix(init): clear stale CA env vars on Windows before any TLS handshake
Reported by the 2026-05-11 Windows test pass: after `agnes init` the
gws connector failed with `UnknownIssuer` TLS errors because
`SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE` were still set in Windows
User scope pointing at `C:\Users\localadmin\.config\agnes\ca-bundle.pem`
— a file that did not exist on the test host. Past Agnes installs
(the setup-prompt trust block + older bootstrap helpers) write those
pointers when they materialize a combined Agnes-CA bundle; when the
bundle file later disappears (re-init on a new VM, machine swap, the
~/.agnes dir wiped), the pointers go stale and every native Windows
TLS handshake fails before Agnes itself runs. SSL_CERT_FILE in
particular REPLACES (not appends to) the trust store, so a stale
pointer is silently catastrophic.
`agnes init` now clears stale pointers in two layers before the first
server roundtrip:
1. Current-process env (os.environ) — what the immediately-following
`api_get` to /api/catalog/tables actually reads. Without this, init
itself blows up before it gets to step 2.
2. Windows User-scope env via PowerShell
`[Environment]::SetEnvironmentVariable(name, $null, 'User')` — what
every future shell + every native tool (gws, claude.exe, pip, uv)
inherits. The 2026-05-11 reporter expected this exact cleanup
("init was supposed to clear these but they persisted").
The cleanup is best-effort and conservative:
- Only deletes a var when its value points at a path that does NOT
exist on disk. Intentional operator config (e.g. SSL_CERT_FILE
pointing at a corp certifi bundle) stays put.
- PowerShell missing / restricted execution policy / WSL-without-pwsh:
swallowed silently. The current-process leg still runs, which
unblocks init even on hosts where the User-scope leg cannot fire.
Tests (`tests/test_init_ca_cleanup.py`, 6 cases):
- Stale pointers → removed from process env.
- Real-path pointers → preserved.
- Non-Windows hosts: PowerShell is not invoked.
- Windows hosts: PowerShell IS invoked with a script that checks
all three vars + uses Test-Path + SetEnvironmentVariable.
- PowerShell FileNotFoundError: cleanup swallows it, does not raise.
- `_is_windows_host()` reflects sys.platform.
* refactor(asana): MCP-first flow — drop PAT storage, precheck via `claude mcp list`
The Asana hosted MCP at https://mcp.asana.com/mcp authenticates via
OAuth (Claude Code holds the grant; browser tab pops on first tool
use). The earlier prompt walked the user through creating + keychain-
storing an Asana Personal Access Token AND registering the MCP — two
parallel auth surfaces for one connector. Once the MCP works, the PAT
has no consumer: the precheck/verify steps that used `curl
$BASE/api/1.0/users/me` are just redundant proof that Asana itself is
reachable, which the OAuth handshake already establishes.
Removed:
- Step 0 keychain probe + curl verify against /users/me with PAT.
- Step 1 open developer-console / create PAT.
- Step 2 click "+ New access token", warn shown-ONCE.
- Step 3 helper-script for keychain-storage (per-OS bodies: macOS
`security add-generic-password`, Linux `secret-tool store`, Windows
`cmdkey /generic`).
- Step 4 PAT-side `users/me` verify.
- Step 5's split that kept the PAT around for direct curl scripts.
- Step 6d's "MCP vs PAT identity sanity check" — there is no PAT
anymore, nothing to mismatch against.
New flow (3 steps total):
- Step 0 precheck: `claude mcp list | grep ^asana` — if found, the
server is registered AND Claude Code is holding its OAuth grant
(otherwise prior failure would have removed it); print
"Asana MCP already registered — skipping setup" and stop. Tells the
user the explicit reset command (`claude mcp remove asana && claude
logout asana`) so a re-register stays one paste.
- Step 1: `claude mcp add --transport http asana
https://mcp.asana.com/mcp` — no `|| true` because step 0 should have
caught the "already exists" case. Step explains the V2-vs-V1
endpoint distinction (V1 SSE deprecated 2026-05-11) and the
abort-clean recovery if the precheck somehow missed the existing
server.
- Step 2: same OAuth + low-impact-read validation pattern as before.
- Step 3: revoke instructions (mcp remove + logout + Asana-side app
revoke at app.asana.com/Settings → Apps).
Both surfaces (the /home Asana tile and the inlined Asana sub-block
in the setup script's step 7) emit the new text from the same
asana_prompt() — single-source-of-truth invariant intact.
77 tests pass.
|
||
|---|---|---|
| .github | ||
| app | ||
| cli | ||
| config | ||
| connectors | ||
| dev_docs | ||
| docs | ||
| infra | ||
| scripts | ||
| services | ||
| src | ||
| tests | ||
| .dockerignore | ||
| .gitignore | ||
| .pre-commit-config.yaml | ||
| ARCHITECTURE.md | ||
| Caddyfile | ||
| CHANGELOG.md | ||
| CLAUDE.md | ||
| docker-compose.ci.yml | ||
| docker-compose.dev.yml | ||
| docker-compose.flat-mount.yml | ||
| docker-compose.host-mount.yml | ||
| docker-compose.local-dev.yml | ||
| docker-compose.prod.yml | ||
| docker-compose.test.yml | ||
| docker-compose.tls.yml | ||
| docker-compose.yml | ||
| Dockerfile | ||
| LICENSE | ||
| Makefile | ||
| pyproject.toml | ||
| pytest.ini | ||
| README.md | ||
| uv.lock | ||
Agnes — AI Data Analyst
Agnes is an open-source data distribution platform for AI analytical systems. It extracts data from configured sources into DuckDB, serves it via a FastAPI backend, and distributes Parquet files to analysts who query them locally using Claude Code and DuckDB.
Each data source produces a self-describing extract.duckdb file. The SyncOrchestrator attaches all extract databases into a master analytics.duckdb, making every table available through a unified view layer without copying data unnecessarily.
Architecture: extract.duckdb Contract
Every connector produces the same output structure:
/data/extracts/{source_name}/
├── extract.duckdb ← _meta table + views
└── data/ ← parquet files (local sources only)
The orchestrator scans /data/extracts/*/extract.duckdb, attaches each into analytics.duckdb, and creates master views.
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Keboola │ │ BigQuery │ │ Jira │
│ extractor │ │ extractor │ │ webhooks │
│ (DuckDB ext) │ │ (remote BQ) │ │ (incremental)│
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
▼ ▼ ▼
extract.duckdb extract.duckdb extract.duckdb
+ data/*.parquet (views → BQ) + data/*.parquet
│ │ │
└─────────────────┼─────────────────┘
▼
SyncOrchestrator.rebuild()
ATTACH → master views in analytics.duckdb
│
┌──────────┼──────────┐
▼ ▼ ▼
FastAPI CLI
(serve) (agnes pull)
Supported Data Sources
| Mode | Distribution | Sources | Use when |
|---|---|---|---|
Batch pull (local) |
Parquet on disk, scheduled | Keboola | Source has a native bulk-export and the table fits on disk |
Materialized SQL (materialized) |
Parquet on disk, scheduled query | BigQuery, Keboola | Source table is too large to mirror as-is; you want a curated subset / aggregate on disk |
Remote attach (remote) |
View only, no download | BigQuery | Table is too large to materialize; latency cost of remote query is acceptable |
| Real-time push | Incremental parquet | Jira | Source is event-driven and you need sub-minute freshness |
The first three modes are what agnes pull distributes to analysts. The fourth is server-side only — analysts query Jira data through the same agnes pull-distributed parquets.
Admins manage per-source registrations through the /admin/tables UI (per-connector tabs for BigQuery / Keboola / Jira) or the agnes admin register-table CLI; per-row "Manage access" deep-links to /admin/access for granting tables to user groups via resource_grants(group, ResourceType.TABLE, table_id).
Analysts get a closed loop with Claude Code: agnes init writes <workspace>/.claude/settings.json with SessionStart (agnes pull --quiet) and SessionEnd (agnes push --quiet) hooks so every Claude Code session starts with fresh RBAC-filtered parquets and ends with the session log uploaded back.
Adding a new source means creating connectors/<name>/extractor.py that produces extract.duckdb with a _meta table (table_name, description, rows, size_bytes, extracted_at, query_mode). The orchestrator attaches it automatically.
Quick Start with Docker
# Clone the repository
git clone https://github.com/keboola/agnes-the-ai-analyst.git
cd agnes-the-ai-analyst
# Copy and edit configuration
cp config/instance.yaml.example config/instance.yaml
cp config/.env.template .env
# Edit both files for your environment
# Start the app and scheduler
docker compose up
# Start with all optional services (Telegram bot, etc.)
docker compose --profile full up
# Start with TLS (Caddy on :443 with corporate-CA certs from /data/state/certs)
docker compose -f docker-compose.yml -f docker-compose.prod.yml -f docker-compose.tls.yml \
--profile tls up -d
Once running, the FastAPI app is available at http://localhost:8000 (or https://$DOMAIN in TLS mode). See docs/DEPLOYMENT.md for cert provisioning + auto-rotation via scripts/ops/agnes-tls-rotate.sh. Trigger a manual sync:
curl -X POST http://localhost:8000/api/sync/trigger
Local sync & auto-update
Analysts run Claude Code against a local DuckDB built from RBAC-filtered parquets pulled from the server. agnes pull is the distribution path:
agnes pull # delta-pull: manifest → MD5 compare → download changed → rebuild views
agnes pull --quiet # same, no progress output (for hooks/cron)
agnes push # push session jsonl + CLAUDE.local.md back to the server
agnes init writes Claude Code lifecycle hooks into <workspace>/.claude/settings.json:
SessionStart→agnes pull --quiet— fresh data on every sessionSessionEnd→agnes push --quiet— uploads notes and session log
Hooks live at workspace level so they only fire in this analyst workspace, not in unrelated Claude Code sessions on the same machine.
Admin: which tables auto-sync to whom
The auto-sync set per analyst is the intersection of:
- Tables with
query_mode IN ('local', 'materialized')— these have parquets on disk and end up in the manifest - Tables granted to one of the analyst's groups via
resource_grants(group, ResourceType.TABLE, table_id)(seedocs/RBAC.md)
To enroll a new table for auto-sync, register it (or update its query_mode) and grant it to the relevant groups in /admin/access. New analysts get the same set on their next agnes pull.
For BigQuery, register a query_mode='materialized' table with a SQL body:
agnes admin register-table orders_90d \
--source-type bigquery \
--query-mode materialized \
--query @docs/queries/orders_90d.sql \
--schedule "every 6h"
The scheduler runs the query through the DuckDB BigQuery extension on each tick that's due, writes the result as a parquet, and the analyst picks it up on the next agnes pull. Cost guardrail: data_source.bigquery.max_bytes_per_materialize (default 10 GiB) — operations exceeding the BQ dry-run estimate are skipped.
Development Setup
# Create and activate virtual environment
python3 -m venv .venv && source .venv/bin/activate
# Install dependencies
uv pip install ".[dev]"
# Run FastAPI locally with hot reload
uvicorn app.main:app --reload
# Run the test suite
pytest tests/ -v
Project Structure
├── src/ # Core engine
│ ├── db.py # DuckDB schema (system.duckdb, analytics.duckdb)
│ ├── orchestrator.py # SyncOrchestrator — ATTACHes extract.duckdb files
│ ├── repositories/ # DuckDB-backed CRUD (sync_state, table_registry, users, etc.)
│ ├── profiler.py # Data profiling
│ └── catalog_export.py # OpenMetadata catalog export
├── app/ # FastAPI application
│ ├── main.py # App setup, router registration
│ ├── api/ # REST API (sync, data, catalog, admin, auth)
│ ├── auth/ # Auth providers (Google OAuth, email magic link, desktop JWT)
│ └── web/ # HTML dashboard routes
├── connectors/ # Data source connectors (extract.duckdb contract)
│ ├── keboola/ # Keboola: extractor.py (DuckDB extension) + client.py (fallback)
│ ├── bigquery/ # BigQuery: extractor.py (remote-only via DuckDB BQ extension)
│ └── jira/ # Jira: webhook + incremental parquet → extract.duckdb
├── cli/ # CLI tool (`agnes pull`, `agnes query`, `agnes admin`)
├── services/ # Standalone services (scheduler, telegram_bot, ws_gateway, etc.)
├── scripts/ # Utility + migration scripts
├── config/ # Configuration templates (instance.yaml.example)
├── docs/ # Documentation + metric YAML definitions
└── tests/ # Test suite (633 tests)
Configuration
| File | Purpose |
|---|---|
config/instance.yaml |
Instance-specific settings: branding, data source type, auth provider, Google domain |
.env |
Secrets and environment variables — never committed |
system.duckdb table_registry table |
Table definitions managed via POST /api/admin/register-table (or PUT /api/admin/registry/{id} to update) or the web UI |
Copy the example to get started:
cp config/instance.yaml.example config/instance.yaml
See config/instance.yaml.example for all available options.
Documentation
- Hackathon TL;DR — condensed deploy + dev playbooks (for both humans and AI agents)
- Onboarding Guide — end-to-end Terraform deployment into a GCP project (recommended for production)
- Deployment Guide — chooses between Terraform and Docker Compose; covers OSS self-host
- Configuration Reference —
instance.yaml, env vars, per-instance options - Architecture — orchestrator, extractors, DB layout
- Quickstart — local development
Contributing
- Fork the repository and create a feature branch.
- Run
pytest tests/ -vto verify all tests pass before opening a pull request. - Keep commits focused and messages concise.
- Open a pull request against
mainwith a clear description of the change.
For bugs and feature requests, open a GitHub issue.
License
This project is licensed under the MIT License.