agnes-the-ai-analyst

Author	SHA1	Message	Date
ZdenekSrotyr	61ef0d0eed	fix(devin-review): address 4 findings on PR #167 - Fix #1: _detect_existing_project now checks .claude/settings.json for "da sync" marker instead of deleted CLAUDE.md; update tests accordingly. - Fix #2: preview endpoint uses autoescape=False to match /setup rendering; align render_agent_prompt_banner in welcome_template.py to the same. - Fix #3: apply _sanitize_banner_html to override render path in setup_page so all render paths sanitize consistently. - Fix #4: move .setup-link-banner into the existing-user branch where account_details.last_sync_display is reachable; remove dead copy from new-user branch.	2026-05-03 21:15:01 +02:00
ZdenekSrotyr	97e72c3f1c	test(web-ui): update dashboard CTA link assertion after copy edit	2026-05-03 19:35:59 +02:00
ZdenekSrotyr	dc931a6556	feat(admin-prompt): default = live setup script; override replaces /setup content The /admin/agent-prompt editor now pre-fills with the full bash bootstrap script from setup_instructions.resolve_lines() instead of being empty. When an admin saves an override it replaces the default everywhere — the /setup page display and the dashboard clipboard CTA — rather than adding a banner above the auto-generated commands. GET /api/admin/welcome-template now returns a `default` field with the live computed script so the editor always shows meaningful starting content. {server_url} and {token} single-brace placeholders survive Jinja2 rendering and are substituted by JavaScript at clipboard-copy time as before. Preview pane switches to textContent (not innerHTML) since content is bash.	2026-05-03 16:31:35 +02:00
ZdenekSrotyr	d7705b5aa3	chore(openapi): regenerate snapshot after /api/welcome removal	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	8db4c1645b	feat(admin-prompt): variant C — banner on /setup, drop CLAUDE.md generation - src/welcome_template.py: rewrite as HTML banner renderer (render_agent_prompt_banner); drop _list_tables, _metrics_summary, _marketplaces_for_user, render_welcome, _load_default_template. build_context now exposes only instance/server/user/now/today. _sanitize_banner_html strips script/iframe/on*/javascript: post-render. - app/api/welcome.py: drop get_welcome handler, WelcomeResponse, old _VALIDATION_STUB_CONTEXT. Admin endpoints stay at same URLs; validation stub updated to match new slim context. Preview now uses autoescape=True. - app/web/router.py: setup_page calls render_agent_prompt_banner and passes banner_html to install.html; admin_agent_prompt_page drops _load_default_template. - app/web/templates/install.html: add .setup-banner CSS + banner block above hero. - cli/commands/analyst.py: replace _generate_claude_md with _init_claude_workspace; no CLAUDE.md written, only .claude/CLAUDE.local.md placeholder + settings.json hooks. - tests: delete test_cli_analyst_welcome.py (tests deleted endpoint/function); rewrite TestGenerateClaudeMd → TestInitClaudeWorkspace; update api test to assert /api/welcome returns 404 and remove welcome-fetch tests.	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	60386b9c3c	polish: drop dead CSS, fix docstring drift, add agent-prompt route test	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	ecb6c35ad5	feat(admin): rename /admin/welcome to /admin/agent-prompt (Agent Setup Prompt) Rename the welcome prompt editor from /admin/welcome to /admin/agent-prompt and update all UI labels to "Agent Setup Prompt". API endpoint URLs are unchanged (PUT/GET/DELETE /api/admin/welcome-template, GET /api/welcome). - Nav menu: "Welcome prompt" → "Agent Setup Prompt", href updated - Page title and h2 updated in admin_welcome.html - Error message hint in app/api/welcome.py updated to /admin/agent-prompt - Dashboard: replace inline <details> preview of _claude_setup_instructions with a simple link to /setup (Task C) - docs/welcome-template.md renamed to docs/agent-setup-prompt.md; internal references to /admin/welcome updated - OpenAPI snapshot path updated - Tests updated to reflect new route and removed inline preview	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	c7b14fb120	feat(admin): drop setup_banner feature; consolidate into single editor Remove the setup_banner feature (admin-editable /setup page banner) and all associated code: API router, repository, renderer, admin template, tests, and docs. The setup_page handler no longer calls render_setup_banner; the install.html template no longer renders banner_html. The setup_banner DuckDB table (v22) is kept intact for forward-compat with already-migrated instances — only the application code is removed. CHANGELOG updated: setup_banner bullets removed; Agent Setup Prompt (welcome-template feature) now stands alone as the single editable prompt.	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	0ee22f8fb0	docs: add setup-banner.md + rename migration test to test_db_schema_version.py - Add docs/setup-banner.md: placeholder table, autoescape semantics, security note on post-render stripping, diff table vs welcome-template (M-9). - Update CHANGELOG.md to reference docs/setup-banner.md. - Rename tests/test_db_migration_v20.py → tests/test_db_schema_version.py (file tested SCHEMA_VERSION==22, not just the v20 step; clearer name) (M-10).	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	5bfd8997ea	test: RBAC marketplace render test + validation stub drift detectors - test_render_marketplaces_filtered_by_rbac: seeds 2 marketplaces, 2 groups, grants, 2 users; asserts each user's rendered output references only their group's marketplace/plugins, not the other's (I-3). - test_validation_stub_matches_build_context_shape in test_welcome_template_api.py: asserts _VALIDATION_STUB_CONTEXT top-level and nested keys (instance, server, user) match build_context() output so stub drift is caught in CI (I-4). - test_validation_stub_matches_build_context_shape in test_setup_banner_api.py: same shape check against build_setup_banner_context() (I-4).	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	b3ffc98e9f	fix(security): XSS hardening for setup banner + cleanup unused imports - Add _sanitize_banner_html() to src/setup_banner.py: strips <script>/ <iframe> blocks, on* event-handler attributes, and javascript:/data: URI schemes post-render (I-2). Defense-in-depth — /setup is partly anonymous so malformed admin content must not execute in visitors' browsers. - Apply sanitizer in render_setup_banner() before returning rendered HTML. - Add 3 unit tests: test_render_strips_script_tags, test_render_strips_event_handlers, test_render_strips_javascript_uri. - Drop unused Optional import from src/repositories/welcome_template.py and src/repositories/setup_banner.py (M-6).	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	8ec194cbe4	test(db): bump v20 migration test assertions to v22	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	39146288e1	feat: admin-editable setup_banner on /setup page (schema v22) Adds an optional Jinja2/HTML banner displayed above the bootstrap commands on /setup. Empty by default; admin authors it at /admin/setup-banner. autoescape=True — safe for HTML context. Render failures return "" so a broken banner never breaks /setup. Schema v22: setup_banner singleton table, auto-migration v21→v22.	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	85967e14ca	feat(web): rename /install → /setup; nav label 'Setup local agent' - Add GET /setup serving install.html (CLI + Claude Code setup page) - Add GET /install → 301 redirect to /setup for backwards compat - Move first-time setup wizard from /setup to /first-time-setup - Update nav link: href=/setup, label 'Setup local agent', active on both /setup and /install paths - Update page <title> to 'Setup local agent — …' - Update /dashboard and /setup comment in _claude_setup_instructions.jinja - Update tests and OpenAPI snapshot accordingly	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	1eb03405c7	test(db): bump v20 migration test assertions to v21	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	517e63d217	fix(cli): warn on welcome-fetch failures; expand test coverage	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	c604dad9cf	feat(cli): da analyst setup fetches rendered welcome from /api/welcome	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	ecaa113c68	fix(admin-welcome): credentials: include, real-content preview, refresh after mutate	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	93b713900b	fix(api): validate template render on PUT; broaden render-time catch	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	0d1ecd235d	feat(api): /api/welcome + /api/admin/welcome-template endpoints	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	4449623af8	fix(renderer): tolerate missing optional tables; document tzinfo	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	51f287a81a	feat: server-side jinja2 renderer for welcome prompt	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	19f1795350	feat(repo): WelcomeTemplateRepository singleton CRUD	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	33e7107637	feat(db): schema v15 — welcome_template singleton table	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	13ab464ac5	Merge branch 'main' into zs/fix-health-e2e-tests	2026-05-03 15:55:02 +02:00
ZdenekSrotyr	c54917fc50	fix(tests): drop stale 'healthy' from /api/health status assert Per Devin review on #166: /api/health returns 'ok' or 'unhealthy'; 'healthy' is the detailed endpoint's vocabulary (app/api/health.py:180). The pre-existing OR-tuple was dead code and inconsistent with the rest of this PR's alignment work.	2026-05-03 15:40:41 +02:00
ZdenekSrotyr	f348296685	fix(tests): align docker-e2e health asserts with current /api/health shape `/api/health` is the auth-free LB probe — returns `status` + `db_schema` only. `version` lives in `/api/version` and the richer `services.duckdb_state` lives in `/api/health/detailed` (auth-gated). The two e2e asserts had drifted and broke nightly on main.	2026-05-03 11:21:19 +02:00
ZdenekSrotyr	91caefaca9	security(auth): per-IP rate limit + last-admin guard (#165 ) * security(auth): per-IP rate limit on auth endpoints + generalize last-admin guard Closes #45 and #151. #45 — every auth endpoint was unthrottled (login, magic-link, token, bootstrap), leaving us open to password brute-force and SMTP email-bombing. Wires slowapi (new dep) into the middleware chain with per-route limits: 10/min on login + token, 5/min on send-link, 3/min on bootstrap. Returns 429 with Retry-After: 60 once exceeded. Per-IP key respects the leftmost X-Forwarded-For hop (Caddy in front of the app strips client-supplied XFF). Operator escape hatch: AGNES_AUTH_RATELIMIT_ENABLED=0. Test suite disables the limiter via autouse conftest fixture so existing auth tests that hammer endpoints in tight loops are unaffected. #151 — DELETE /api/admin/users/{id}/memberships/{group_id} and the mirror DELETE /api/admin/groups/{group_id}/members/{user_id} only guarded against self-removal as last admin. Generalizes to refuse removing anyone from the seeded Admin group when they are the only remaining active admin (mirrors the existing count_admins(active_only=True) <= 1 check on delete_user / update_user). Recovery from zero admins requires direct DB access, so this closes a path where a scheduler/bootstrap actor that bypasses normal admin checks could otherwise empty the group. * security(auth): throttle remaining email-bombing + token-confirm endpoints Address code-review gap on PR #165 — the first commit covered /send-link but missed two endpoints with the IDENTICAL email-bombing surface: - POST /auth/password/reset — sends reset mail, anti-enum response - POST /auth/password/setup/request — sends setup mail, anti-enum response Both now share the 5/min limit with /send-link. Also add 10/min to the token-confirm surfaces — high-entropy tokens but partial leaks via logs / referer have surfaced before, and unbounded guess rate would let an attacker exhaust the keyspace adjacent to a leaked prefix: - POST /auth/email/verify - GET /auth/email/verify — closes the click-through bypass - POST /auth/password/reset/confirm - POST /auth/password/setup/confirm Doc fix: rate_limit.py module docstring + CHANGELOG entry no longer claim "disable without a redeploy" (misleading). The Limiter constructor freezes `enabled` from env at import time, matching every other Agnes env knob — operators set the flag and bounce the container. Tests: 4 new cases in test_auth_rate_limit.py covering /reset, /setup/request, /reset/confirm, GET /verify. Full suite: 2583 passed, 32 skipped, 0 failed. * security(auth): throttle JSON /auth/password/setup — closes form-throttle bypass Second code-review pass on PR #165 caught a fifth gap: POST /auth/password/setup (JSON variant, kept for backward compat) consumes the same setup_token as the web form /setup/confirm but was unthrottled — an attacker brute-forcing the token just switches from the form path to the JSON path and resumes at unbounded RPS. Apply the same 10/min limit and signature shape used on /setup/confirm. Also extend CHANGELOG note about the JSON-variant bypass for future operators reading the security entry. Test: 1 new case (test_password_setup_json_rate_limited_after_10_requests), 9 rate-limit tests + 28 password-flow tests + 41 auth-provider tests pass, no regressions. * chore(release): cut 0.30.1 — auth security hardening (rate limit + last-admin guard)	2026-05-02 21:08:33 +02:00
ZdenekSrotyr	07c7bd4c8b	fix(test): reset instance_config cache in TestRebuildFromRegistry leakage repair CI on `dc03837a` showed test_missing_project_returns_error failing with 'ok-project' instead of '' — config-cache leak from the sibling test_returns_skipped_when_no_bq_rows that ran first under pytest-xdist. Pre-existing flake (cache lives in app.instance_config; monkeypatch restores the loader patch but doesn't invalidate the cached return). Earlier CI runs (`a4339ce6`) got lucky on test ordering. Adding an explicit reset_cache() at the top of the test removes the dependency on ordering.	2026-05-01 23:27:59 +02:00
ZdenekSrotyr	dc03837a7b	feat(query-api): better error message when --remote query references a materialized-but-not-rebuilt id E2E sub-agent finding: `da query --remote "SELECT * FROM <id>"` against a materialized table that hasn't yet been rebuilt in the server's analytics.duckdb returns a confusing DuckDB "Table does not exist" message even though the table is in the registry. Materialized rows produce parquets at `${DATA_DIR}/extracts/<source>/data/<id>.parquet`, but the orchestrator's master-view creation is `_meta`-driven — fresh instances or pre-tick states have the registry row without a corresponding view, so analysts hit the bare "does not exist" with no path forward. Improve the error rendering in `app/api/query.py:execute_query`. When DuckDB raises a "table does not exist" error, scan the registry for any `query_mode='materialized'` row whose id or name appears in the failed SQL. On a hit, return a 400 whose detail names the table, explains the materialize state, and offers two concrete next steps: 1. Run `da sync` (or wait for the scheduler tick / hit POST /api/sync/trigger) to materialize the parquet, OR 2. Query the source directly via the catalog alias when the registry row carries bucket+source_table (e.g. `bq."dataset"."table"` for BigQuery, `kbc."bucket"."table"` for Keboola). Detection is bounded — the registry round-trip only fires when DuckDB's error mentions a missing table, so happy-path queries pay no cost. Non-materialized unknowns fall through to DuckDB's raw error. 2 new tests: materialized id surfaces the hint with the bucket+source_table payload; unknown table falls back to the generic error path with no false positive on the new hint.	2026-05-01 23:09:52 +02:00
ZdenekSrotyr	8030a867ec	fix(admin-api): keep source_type validator permissive when primary is 'local' (bootstrap) The strict source_type-availability validator from the prior commit broke ~12 existing tests that register tables on the default test instance (where `data_source.type` resolves to 'local' since no instance.yaml is loaded). The intent of the validator is to catch explicit misconfig: `type=bigquery` instance + `source_type=keboola` payload with no `data_source.keboola.*` block. The bootstrap workflow — admin sets up a fresh instance and registers a few tables before pointing at a real source — should not be gated here. Loosen the check: when `get_data_source_type()` returns 'local' (the fallback when no `data_source.type` is set), skip the rejection. The explicit mismatch case still 422s because that path resolves `configured_primary` to a real source type. Also adds an autouse keboola_instance fixture to test_journey_sync_query.py which exercises Keboola registrations through the full sync→query flow — the fixture documents the test's data-source assumption rather than relying on the bootstrap escape hatch.	2026-05-01 23:09:15 +02:00
ZdenekSrotyr	bc3ba0d43d	feat(admin-api): reject register-table for source_type not configured on instance E2E sub-agent finding: instance configured with `data_source.type='bigquery'` and no `data_source.keboola.` block. Admin POSTs `{source_type: 'keboola'}` to /api/admin/register-table → returns 201, row lands in the registry, but never syncs because the scheduler has no Keboola URL/token to ATTACH against. Operator only notices the gap when `da catalog` keeps showing nothing. The new `_validate_source_type_configured` helper runs immediately after the id/view-name collision checks in `register_table`. A source_type is considered configured when: - it matches `get_data_source_type()` (the instance's primary), OR - a non-empty `data_source.<source_type>` block exists in the effective `instance.yaml` (multi-source instance), OR - it's in `_SOURCE_TYPES_INDEPENDENT_OF_DATA_SOURCE` (Jira / local — both get data through paths that don't involve `data_source.`). Returns 422 with a message that names the configured primary source and points at `/admin/server-config` for enabling a secondary one. None / empty source_type is still tolerated for backward compat with legacy CLI scripts that don't set the field — the route resolves it later. 5 new tests cover: keboola-on-bq rejected, bq-on-keboola rejected, matching source_type still works, jira allowed regardless, omitted source_type passes through. Existing tests that registered Keboola rows on the unconfigured default test instance now opt into a `keboola_instance` fixture to satisfy the new validator (tests/test_admin_bq_register.py + .keboola_materialized + .unregister_cleanup; the multi-source PUT test in test_admin_bq_register adds a `keboola` block to its synthetic config). Pre-existing test_missing_project_returns_error failure in TestRebuildFromRegistry is unrelated (config-cache leakage from a previous test in the same class) — confirmed pre-existing on the prior commit via `git stash` reproduction.	2026-05-01 23:04:51 +02:00
ZdenekSrotyr	dd46461c6c	fix(admin+orchestrator): DELETE registry drops parquet + sync_state; rebuild skips orphan parquets E2E sub-agent finding: register a materialized BQ row → sync to materialize the parquet at `/data/extracts/bigquery/data/<id>.parquet` → DELETE the registry row. The DB row goes away but: - the parquet file stays on disk forever, AND - the sync_state row stays, so `/api/sync/manifest` keeps advertising the dropped table to `da sync`, AND - the orchestrator's next rebuild can resurrect a master view by picking up the leftover parquet. Two-part fix in `unregister_table`: 1. For materialized rows on bigquery/keboola, remove `${DATA_DIR}/extracts/<source_type>/data/<name>.parquet` (and any stale `<name>.parquet.tmp` from a crashed prior materialize). Filename is keyed on `table_registry.name` to match sync_state bookkeeping. File-removal errors are logged but don't fail the DELETE — the registry row is already gone, and an orphan parquet won't get a master view at next rebuild because the orchestrator's _meta-driven scan never picks up bare parquet files. 2. Always clear `sync_state` + `sync_history` rows for the dropped table_id so the manifest stops advertising the table — applies to all source types and modes, not just materialized, since any synced row had a sync_state entry. Orchestrator-side defensive guard (Finding 2b) is a no-op in the current implementation: `_attach_and_create_views` only creates master views from `_meta` rows in each connector's `extract.duckdb`, so a parquet without a matching `_meta` entry is already invisible to the rebuild. The new test `test_orchestrator_skips_orphan_parquet_in_extracts` is kept as a regression guard for that contract. 5 tests cover: BQ + Keboola materialized DELETE removes parquet, remote DELETE doesn't error trying to remove a non-existent file, sync_state cleared on DELETE, orchestrator orphan-skip invariant.	2026-05-01 22:54:11 +02:00
ZdenekSrotyr	f0979f997a	fix(admin-api): reject backtick BQ-native source_query at register; surface materialize errors per-row E2E testing showed admin POSTs of materialized BQ rows whose source_query uses BigQuery-native backtick identifiers (`prj.ds.t`) silently no-op'd at the next sync tick — the materialize path runs the SQL through the DuckDB BQ extension's COPY which uses DuckDB's parser; backticks aren't recognized and the query either parse-errors or matches zero rows. No parquet lands at the canonical path and no error reaches an operator-visible surface. Two-part fix: 1. RegisterTableRequest's _check_mode_query_coherence model_validator now rejects any source_query containing a backtick with a 422 + actionable message pointing at the DuckDB equivalent (bq."dataset"."table"). Same check is applied in update_table on the merged record so PATCHes that flip a stored source_query to backtick form are also caught. Covers BQ AND Keboola materialized rows since both connectors funnel source_query through DuckDB's COPY. 2. _run_materialized_pass now persists per-row failures via the new SyncStateRepository.set_error / clear_error methods (existing sync_state.error / status columns — no schema migration). GET /api/admin/registry enriches each row with `last_sync_error` from a single batched SELECT against sync_state, so the admin UI / da admin status can show "this table failed last sync because: X" instead of operators having to trawl scheduler logs. Recovered rows have the error cleared automatically — update_sync's success path resets status='ok' / error=NULL on the upsert. The materialized-path test fixture's _materialized_payload helper is updated to use DuckDB-flavor SQL (the prior backtick example pre-dated the fix). 6 new tests cover register/update rejection on BQ + Keboola, the sync_state error persistence, and the registry response surface.	2026-05-01 22:51:02 +02:00
ZdenekSrotyr	a4339ce679	fix(admin+diagnose): address 2 additional Devin Review findings on PR #152 Devin's second review pass on commit `16938ae7` surfaced 2 more issues: BUG_pr-review-job-58ae3148_0001 — non-BQ materialized via PUT bypasses source_query check app/api/admin.py update_table only enforces 'query_mode=materialized requires source_query' for source_type='bigquery' rows (via the synthetic RegisterTableRequest at line 2129+). Non-BQ source types (Keboola) skip the check — admin could PUT {query_mode: materialized} on a Keboola local row without source_query, persist successfully, then crash at the next sync tick when kb_materialize_query received sql=None and DuckDB rejected COPY (None) TO '...'. Fix: generic coherence guard before the BQ-specific block — for ALL source types, query_mode='materialized' requires non-empty source_query in the merged record. Returns 422 with a hint about reverting via query_mode='local'/'remote'. ANALYSIS_pr-review-job-642ff90f_0007 — diagnose returns 'ok' on BQ resolution failure app/api/health.py:_check_bq_billing_project caught get_bq_access() exceptions and returned status='ok' with a 'could not resolve' detail. Automated alerting keyed on status != 'ok' would silently miss missing google-cloud-bigquery, auth failures, or malformed config. Fix: return status='unknown' on resolution failure — surfaces it on operator dashboards without promoting the overall health to 'degraded' (which 'warning' does, intentionally for the billing==project case). Tests: - test_update_keboola_to_materialized_without_source_query_rejected: PUT {query_mode: materialized} on a Keboola local row returns 422 with 'source_query' in the detail - test_diagnose_returns_unknown_status_when_bq_resolution_fails: when get_bq_access raises, the bq_config service entry surfaces status='unknown' (not 'ok') Full sweep: 2507 passed, 25 skipped, 0 failed (+2 from previous sweep because of the 2 new regression tests; 8 pre-existing internal_roles schema-migration failures still ignored per task brief).	2026-05-01 21:21:23 +02:00
ZdenekSrotyr	16938ae7cb	fix(materialized): address 4 Devin Review findings on PR #152 Devin Review on commit `7052a235` flagged 4 real bugs in the Keboola materialized path. All four are fixed; 3 new regression tests pin the behavior so future refactors can't quietly regress. BUG_pr-review-job-3fbd31c9_0001 — _run_materialized_pass gated behind 'if bq_project:' app/api/sync.py:444-466 wrapped the entire materialized pass (which dispatches BOTH BigQuery AND Keboola rows by source_type) in a check for data_source.bigquery.project being non-empty. On Keboola-only instances this short-circuited and Keboola materialized rows sat in table_registry forever without their SQL being evaluated — the feature CHANGELOG advertised was dead code on the most common deployment shape. Fix: always run the materialized pass; the BQ branch's per-row try/except catches the typed BqAccessError(not_configured) the sentinel raises when no BQ project is set, so non-BQ instances incur a per-row error for any (hypothetical) BQ-tagged row but the Keboola path runs cleanly. Log line renamed 'Materialized BQ' → 'Materialized SQL' to match. BUG_pr-review-job-3fbd31c9_0004 — wrong config key 'url' instead of 'stack_url' app/api/sync.py:149 read get_value('data_source', 'keboola', 'url'), but the canonical config key documented in instance.yaml.example:111 and used by app/api/admin.py:1503 + 2359 is 'stack_url'. Production Keboola instances would always see an empty URL and fail with the 'not configured' error. The pre-existing test patched the wrong key too, so it passed without catching the mismatch. Fix: use stack_url in both sync.py and the test fixture. BUG_pr-review-job-3fbd31c9_0003 — no atomic write in Keboola materialize_query connectors/keboola/extractor.py wrote COPY directly to the final '<id>.parquet' path. A mid-COPY failure (network, disk full, extension crash) left a partial parquet that the orchestrator rebuild would later pick up and serve to analysts. BQ's materialize_query already uses a '<id>.parquet.tmp' staging path + os.replace() atomic swap (connectors/bigquery/extractor.py:370-445); Keboola now mirrors that pattern with the same try/except cleanup on COPY failure. BUG_pr-review-job-3fbd31c9_0002 — full file read into memory for MD5 Same file:60-62 used parquet_path.read_bytes() for the MD5 hash. Multi-GB Keboola materialized results would OOM on memory-constrained containers. BQ's version uses streaming 8 KiB-chunk hashing (connectors/bigquery/extractor.py:438-442); Keboola now mirrors it. Tests: - test_run_sync_runs_materialized_pass_on_keboola_only_instance — pins BUG_0001's fix; setting bigquery.project='' must NOT skip Keboola materialized dispatch - test_keboola_materialize_atomic_write_on_failure — pins BUG_0003; a mid-COPY RuntimeError leaves no .parquet AND no .parquet.tmp at the canonical path - test_keboola_materialize_uses_tmp_path_during_copy — documents the atomic-write contract: COPY targets .parquet.tmp, final swap to .parquet (no .tmp suffix on the result['path']) - existing test_run_materialized_pass_dispatches_keboola_to_keboola_extractor fixture updated: stack_url instead of url Full sweep: 2505 passed, 25 skipped, 0 failed (modulo 8 pre-existing internal_roles schema-migration failures called out in the task brief).	2026-05-01 20:58:17 +02:00
ZdenekSrotyr	b627de8344	feat(diagnose) + docs: warn on USER_PROJECT_DENIED footgun + document all newly-exposed knobs Diagnostic + operator-facing documentation that closes the loop on the work in this PR. `da diagnose` (via /api/health/detailed): - New _check_bq_billing_project() helper. When data_source.type='bigquery' and BqProjects.billing == .data, surface a yellow warning: 'BigQuery billing project equals data project'. Hint includes the YAML field path + the /admin/server-config UI shortcut. Diagnose's overall status promotes warning → degraded so the CLI echoes it. - Non-BQ instances (Keboola-only, etc.) skip the check. - Implementation hooks into the existing /api/health/detailed surface — no new endpoint, no CLI changes. config/instance.yaml.example documentation: - data_source.bigquery.billing_project: USER_PROJECT_DENIED hint, /admin/server-config UI reference - data_source.bigquery.legacy_wrap_views: analyst-side discipline note (use `da fetch` / `da query --remote`), issue #101 history, view-heavy deployment guidance - data_source.bigquery.max_bytes_per_materialize: cost guardrail block (NEW — wasn't documented in .example before) - ai.base_url: provider list + UI hint - openmetadata + desktop: 'configurable via /admin/server-config UI' headers - corporate_memory: leading note that the schema is editable via UI Other docs: - CHANGELOG.md: comprehensive Unreleased section - CLAUDE.md: schema chain → v20 + Materialized SQL connector mode + per-connector tab UI mention - README.md: mode-first source table summary - docs/architecture.md: per-connector tab UI mention - cli/skills/connectors.md: bootstrap rails (parallel to #154) - docs/superpowers/plans/2026-05-01-admin-tables-form-cleanup.md: implementation plan archive (2515 lines) - scripts/seed_dummy_tables.py: drop is_public after #150 RBAC migration (column gone) Tests: - test_diagnose_billing.py — 3 cases (BQ with billing==data warns, BQ with billing!=data clean, non-BQ skips)	2026-05-01 20:27:24 +02:00
ZdenekSrotyr	df7f5b1d9a	feat(admin-ui): /admin/server-config known-fields registry + structured nested editor Today /admin/server-config renders fields by iterating Object.keys(payload) on the YAML value — if a key isn't in instance.yaml, the operator can't see it. They have to know to type it via the JSON-patch textarea (which only renders for empty sections) or SSH and edit YAML. Adds a known-fields registry (`_KNOWN_FIELDS` in app/api/admin.py) the UI consumes alongside the YAML payload. Renderer shows BOTH: - existing fields (from YAML) with current value - known-but-unset fields with dashed-border placeholder + hint, ready to fill in Renderer (`renderField`, `renderSection`, `collectSection`): - kind="string"\|"secret"\|"bool"\|"int"\|"select"\|"object"\|"array"\|"map" — picks input type - kind="object" with `fields` — recursive structured form, arbitrary depth (corporate_memory needs 3-4 levels) - kind="array" with `item_kind` — vertical stack of typed inputs + add/remove buttons - kind="map" with `key_kind` + `value_kind` — key:value rows + add/remove (used for confidence.base, domain_owners, entity_resolution.entities) - data-path encoded as JSON segment array so map keys with embedded dots (e.g. 'user_verification.correction') survive collect → patch round-trip - .cfg-field.is-unset CSS — dashed border, muted label, italic hint Sections newly exposed (added to _EDITABLE_SECTIONS): - openmetadata: url, token (secret), cache_ttl_seconds, verify_ssl - desktop: jwt_issuer, jwt_secret (secret), url_scheme Known fields populated for existing sections: - data_source.bigquery: billing_project (the cause of the 403 USER_PROJECT_DENIED footgun when SA can read but not bill the data project), legacy_wrap_views (bigquery_query() wrap for VIEWs — issue #101 default off, ON for view-heavy deployments), max_bytes_per_materialize (cost guardrail) - data_source.keboola: stack_url, project_id (hints; values already populated) - ai: base_url (required for openai_compat), structured_output (select) - corporate_memory: full schema from instance.yaml.example — distribution_mode, approval_mode, review_period_months, notify_on_new_items, sources.{claude_local_md,session_transcripts}, extraction.{model,sensitivity_check,contradiction_check}, confidence.{base,modifiers,decay.{mode,half_life_months,decay_rate_monthly,floor}}, contradiction_detection.{enabled,max_candidates}, entity_resolution.{enabled,entities}, domain_owners, domains - Known partial: confidence.modifiers is map<string, map<string, float>> — falls through to JSON-textarea with TODO; structured editor for that one shape needs more renderer work Tests: - test_admin_server_config_known_fields — registry envelope shape, smoke fixture - test_admin_server_config_renderer_depth — 4-level nested objects, arrays of strings, maps of floats, dotted-key safety - test_admin_server_config_corp_memory — full corporate_memory schema, 12 fields incl. nested - test_admin_server_config — existing tests adjusted for new shape	2026-05-01 20:27:01 +02:00
ZdenekSrotyr	c63f54d643	feat(admin-ui): /admin/tables per-connector tabs + Keboola materialized parity + form cleanup + Manage access deep link Replaces the single mixed Jinja-branched form at /admin/tables with a per-connector tab interface and brings Keboola to capability parity with BigQuery. Tab structure: - BigQuery tab: Register modal with two-question radio model (Q1 Live \| Synced × Q2 Whole \| Custom SQL), Discover datasets / List tables / Use-table-as-base autocomplete buttons, table-vs-view auto-detection hint, per-tab listing filter - Keboola tab: same two-question radio (Q2 only — no Live mode for Keboola), Custom SQL textarea against kbc."bucket"."table" for materialized rows - Jira tab: read-only listing (Jira is webhook-driven; no Register form) - Active tab persists in window.location.hash so refresh keeps the operator in place Form cleanup (within tabs): - Drops the misleading 'Sync Strategy' dropdown — runtime never read it (only profiler.is_partitioned() consumes the value for parquet-layout detection); kept in DB for back-compat (Pydantic deprecated) - Adds Sync Schedule input to Keboola Register/Edit (was missing — scheduler honored per-table cron via is_table_due() for every source but the Keboola UI had no surface) - Hides Primary Key under <details>Advanced with clarifying hint that it's catalog-metadata only (Agnes does not perform upsert/dedup; every sync is a full overwrite) - Drops the Strategy column from the registry listing (every Keboola row defaulted to full_refresh after Strategy was hidden — column was noise) - Removes the legacy out-of-tab #registerModal + the legacy global Discovery panel; each tab now owns its own header + Register button + listing div Edit modal: - BigQuery Edit modal physically relocated into <section id="tab-content-bigquery"> (mirrors Phase E Register placement) - Keboola Edit modal mirrors Register (same Q2 radio, Discover/List buttons via parameterized helpers) - openEditModal(table) dispatches by source_type to the right modal — fixes a quiet bug where Phase F's openEditKeboolaModal was never wired up and Keboola edits silently used the legacy modal Per-row Manage access deep link: - Each row in the per-tab listing has a lock-icon button between Edit and Delete that navigates to /admin/access#table:<table_id> - admin_access.html bootstrap reads window.location.hash and pre-fills the resource filter, mirroring the existing ?group=<id> deep-link pattern Tests: - test_admin_tables_tab_ui.py — tab nav, hash persistence, register-button-per-tab, listing partition by source_type, Manage access deep link - test_admin_tables_ui_materialized.py — two-question radio (BQ + Keboola), Discover/List/Use-as-base buttons, Edit modal parity, Jira read-only	2026-05-01 20:26:29 +02:00
ZdenekSrotyr	85d3810535	feat(materialized): query_mode='materialized' for BigQuery + Keboola — admin SELECT → parquet → analyst Closes the 'admin pre-stages a curated table/view for analysts' use case end-to-end across both supported source connectors. Backend (BigQuery + Keboola, schema v20): - schema v20 adds source_query TEXT to table_registry (renumbered from v19 after main's #150 RBAC migration also bumped to v19) - connectors/bigquery/extractor.py adds materialize_query(table_id, sql, , bq, output_dir, max_bytes=...) — BqAccess session, dry-run cost guardrail (default 10 GiB, configurable via data_source.bigquery.max_bytes_per_materialize), idempotent ATTACH, rows/bytes/md5 metadata for sync_state - connectors/keboola/access.py — new KeboolaAccess facade (parallel of BqAccess) wrapping ATTACH 'keboola://...' AS kbc - connectors/keboola/extractor.py adds materialize_query — same shape, no dry-run analog (Keboola Storage API has different cost model); legacy bucket-download path skips query_mode='materialized' rows - app/api/sync.py:_run_materialized_pass dispatches by source_type to the right materialize_query - app/api/admin.py: RegisterTableRequest accepts source_query; model_validator coheres mode↔source_query↔bucket; PUT preserves omitted fields; deprecation marks (Field(deprecated=True)) on sync_strategy + profile_after_sync (no extractor reads them; profile_after_sync becomes inert — bug from earlier work where /api/sync/trigger never honored the flag); _BQ_OPTIONAL_FIELD_DEFAULTS injects defaults into GET /server-config payload Operator + CLI surface: - da admin register-table --query / --query-mode materialized - scripts/smoke-test-materialized-bq.sh — end-to-end smoke for operators Tests (incl. spike + integration + regression): - test_db_migration_v20, test_table_registry_source_query - test_bq_materialize, test_bq_cost_guardrail, test_bq_init_extract_skips - test_keboola_access, test_keboola_extension_query_passthrough (lock-in for the DuckDB extension capability), test_keboola_materialize, test_keboola_init_extract_skips, test_keboola_materialized_e2e (skipped without KBC_TEST_ creds) - test_sync_trigger_materialized, test_sync_trigger_keboola_materialized - test_api_admin_materialized, test_cli_admin_materialized - test_admin_bq_register, test_admin_discover_bigquery, test_admin_keboola_materialized, test_admin_phase_c_deprecation, test_admin_put_preservation, test_materialized_e2e Cost: BQ uses bigquery_query() (jobs API, view-aware) — works on tables, views, materialized views uniformly. Keboola uses ATTACH+COPY parquet through the DuckDB extension.	2026-05-01 20:25:56 +02:00
ZdenekSrotyr	d0b7e122d6	feat(cli): smart local sync — Claude Code SessionStart/SessionEnd hooks + da sync --quiet The analyst flow becomes a closed loop with the server-curated table catalog: - `da analyst setup` writes `<workspace>/.claude/settings.json` with two hooks: SessionStart → `da sync --quiet \|\| true` — pulls fresh RBAC-filtered parquets at session start SessionEnd → `da sync --upload-only --quiet \|\| true` — uploads session jsonl + CLAUDE.local.md - `\|\| true` keeps Claude Code unblocked when the server is down. - Workspace-level (not user-home) so the hooks fire only when Claude Code opens this analyst workspace. - `da sync --quiet` rewrites the CLI output for hook consumption — 0 stdout on success, single-line error on failure. - Existing settings.json is patched (deep-merged), not overwritten; malformed JSON is reported, not silently overwritten. Tests cover: workspace bootstrap, hook insertion, malformed-json safety, quiet-mode output shape.	2026-05-01 20:25:27 +02:00
minasarustamyan	d4ac84dd46	feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150 ) * feat(rbac): drop dataset_permissions + access_requests + users.role + is_public; v19 migration BREAKING. Sjednocení datové RBAC vrstvy do per-group resource_grants modelu. Před PR byla legacy data RBAC vrstva (dataset_permissions + is_public bypass) de-facto neaktivní — is_public neměl API/UI/CLI surface, default true znamenal že can_access_table vždycky bypassl. Dnes každý non-admin přístup vyžaduje explicitní resource_grants(group, "table", id) řádek. Schema v18 → v19 (src/db.py:_v18_to_v19_finalize): - DROP TABLE dataset_permissions, access_requests - DROP COLUMN users.role (NULL artifact since v13) - DROP COLUMN table_registry.is_public - Drops přes table-rebuild idiom (rename → create new → INSERT … SELECT → drop old) kvůli DuckDB ALTER DROP COLUMN limitacím na tabulkách s historic FK constraints. INSERT picks intersection sloupců, takže test fixtures s minimal pre-v19 schemou migrate cleanly. Runtime: - src/rbac.py:can_access_table → deleguje na app.auth.access.can_access - DatasetPermissionRepository, AccessRequestRepository smazány - AGNES_ENABLE_TABLE_GRANTS env-gate v app/resource_types.py odstraněn (TABLE je unconditionally enabled) API drop: - app/api/permissions.py, app/api/access_requests.py celé soubory - /admin/permissions web route + admin_permissions.html - "Request Access" modal v catalog.html + locked-row UI - ~10 if user.get("role") != "admin" checků nahrazeno (admin shortcut je uvnitř can_access_table) - /api/settings: drop permissions field z GET; PUT /api/settings/dataset gate přepnut na can_access(user_id, "table", dataset, conn) Auth: - app/auth/jwt.py:create_access_token: drop role parametr (claim zmizí z nově vydávaných JWT; staré tokeny zůstávají valid, claim ignored) - app/api/users.py: drop role z CreateUserRequest / UpdateUserRequest (admin promotion = explicit add to Admin group via memberships API) - src/repositories/users.py: drop role z create() / update() CLI: - da admin set-role smazán → hard-fail s replacement command - da admin add-user --role flag pryč - da auth import-token --role flag pryč - da auth whoami: drop "Role:" výpis - cli/config.py:save_token: role parametr now optional, no longer written (back-compat se starými token.json soubory zachována — pole se ignoruje) Tests: - DELETE: test_permissions.py, test_permissions_api.py, test_access_requests_api.py - REWRITE: test_access_control.py (resource_grants flow), test_rbac.py (can_access_table over resource_grants), test_journey_rbac.py (drop access-request flow), test_resource_types.py (drop env-gate tests, drop is_public from helpers), test_v2_.py (drop role-based user dicts in favor of id-based + Admin group membership), test_settings_api.py (no permissions field, can_access gate) - TRIVIAL: ~30 souborů — drop role="admin" arg z UserRepository.create a 3rd positional role z create_access_token - NEW: test_v18_to_v19 migration test (test_db.py), test_can_access_table_no_implicit_public (test_rbac.py), test_admin_set_role_returns_hardfail (test_cli_admin.py) - OpenAPI snapshot regenerated Docs: - CHANGELOG: BREAKING entry pod [Unreleased] - CLAUDE.md: schema v18 → v19 - docs/architecture.md: schema table + RBAC sekce přepsána - docs/auth-google-oauth.md: admin promotion přes da admin break-glass - cli/skills/security.md: kompletně přepsáno na group-based model - docs/TODO-rbac-data-enforcement.md: smazáno (TODO splněn) Test results: 2363 passed, 19 failed. Zbývající failures jsou pre-existing Windows-specific issues (fcntl, charset) nesouvisející s tímto PR — ověřeno git stash pop. Plan: ~/.claude/plans/floofy-coalescing-parnas.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> chore(release): cut 0.27.0 --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-04-30 22:02:16 +02:00
minasarustamyan	fb1573766a	feat(admin): users/groups UI polish + SSO lock + v18 migration (#142 ) Cuts release 0.24.0. ## Highlights - SSO-managed accounts read-only for password / delete operations (UI + API). New `is_sso_user` flag derived from group memberships. - Admin/Everyone system rows show `google_sync` chip + Workspace email subtitle when env-mapped. - Origin pill vocabulary unified across `/admin/groups`, `/admin/access`, `/admin/users`, `/admin/users/{id}`, `/profile` (Admin yellow, Everyone gray, google_sync green, custom purple). - Effective-access readout no longer short-circuits for admin users — always renders per-resource breakdown. - Schema migration v18 drops stranded non-google memberships in env-mapped Admin/Everyone groups (cleans up v13's blanket Everyone backfill). ## Devin findings addressed - _is_sso_user requires source='google_sync' on system-group branches (so v13 system_seed memberships in env-mapped Everyone don't lock out the admin). - POST add-to-group returns correct origin via _derive_origin (matching GET). - 8 customer-specific token instances (groupon.com / foundryai) replaced with vendor-neutral placeholders across templates, tests, and CHANGELOG. - deriveDisplayName name-skip for canonical "Admin"/"Everyone" so an overlapping AGNES_GOOGLE_GROUP_PREFIX doesn't mangle the chip text. See CHANGELOG [0.24.0] for full notes.	2026-04-30 15:16:04 +02:00
ZdenekSrotyr	70672204fe	feat(memory): admin Edit + MEMORY_DOMAIN RBAC + ai-section UI (#141 ) Cuts release 0.23.0. ## Highlights - Single-item Edit button on every memory item card (modal hits PATCH /api/memory/admin/{id}). - MEMORY_DOMAIN RBAC resource type — admins grant user_groups access to specific domains via /admin/access. Composes with existing audience filter (OR semantics, no-op when no grants). - ai: section editable in /admin/server-config — admins can set ANTHROPIC_API_KEY / model / provider / base_url for the corporate-memory extractor without editing instance.yaml directly. api_key auto-masked. ## Devin findings addressed - Modal NULL→empty fix (audience visibility wouldn't break). - Stats endpoint granted_domains parity with list endpoint. - Documented intentional MEMORY_DOMAIN→audience bypass. - Documented conscious ai.base_url SSRF exclusion (legit internal LiteLLM/vLLM proxies). See CHANGELOG [0.23.0] for full notes.	2026-04-30 11:04:41 +02:00
ZdenekSrotyr	83adf01bde	fix(v2): #134 BigQuery cross-project errors return structured 502/400 + BqAccess facade (#138 ) * docs(spec): #134 unify BigQuery access behind BqAccess facade Brainstorm output for issue #134. Captures: - root cause (incl. correction of the issue's hypothesis about commit 33a9964) - BqAccess facade API + project resolution rules - error contract — typed BqAccessError mapped to HTTP 502 for upstream BQ failures, 500 for deployment/config bugs - migration plan for v2_scan, v2_sample, RemoteQueryEngine - test rewrite eliminating _bq_client_factory injection point - E2E verification protocol on agnes-development as success criterion * docs(spec): #134 revise after first review Incorporates code-reviewer findings: Must-fix: - Add v2_schema (2 copies of INSTALL/LOAD/SECRET dance) to migration scope. - Reframe v2_scan headline: missing try/except around BQ calls is the actual cause of bare 500s, not project resolution (which 33a9964 fixed). - List two more deferred call sites (extractor.py, register_bq_table) with explicit rationale. Important: - Drop billing != data clause from cross_project_forbidden heuristic; rely only on 'serviceusage' substring. billing != data is normal for cross-project setup, was over-classifying. - Split bq_bad_request into _user (400) and _server (502) variants; add sql_origin parameter to translate_bq_error so call sites declare whether SQL contains user input. - Add @functools.cache to BqAccess.from_config; document tests bypass via dependency_overrides. - Replace monkey-patched-classmethod test pattern with BqAccess(client_factory=...) injection at construction time. Cleaner than today's _bq_client_factory and 1:1 migration shape. - Keep BqProjects.data (reviewer assumed registry has source_project; it doesn't). Multi-project explicitly listed as non-goal with note. Nice-to-have: - Add 'Implementation strategy' section: 2 staged commits (bug fix alone is revertable; refactor follows). - Extend E2E protocol to cover all three endpoints, not just /sample. - Note removal of stale docstring at src/remote_query.py:204. * docs(spec): #134 revision 3 — incorporates second-round review Must-fix from second review: - v2_schema split into two migration cases: _fetch_bq_schema translates errors via translate_bq_error; _fetch_bq_table_options preserves its swallow-all 'except Exception → return {}' so /schema doesn't 502 on partition-info failures. - RemoteQueryEngine.__init__ now resolves BqAccess lazily (in _get_bq_client, not in __init__). Without this, ~7 DuckDB-only tests in test_remote_query.py would suddenly fail with not_configured. - translate_bq_error pass-through for BqAccessError is now load-bearing (clause 1, before any Google-API branch). bq.client() raises BqAccessError for bq_lib_missing/auth_failed; without explicit pass-through those fall to 'unknown' and re-raise as bare 500. - Commit 1 now emits the SAME structured response shape as commit 2 to avoid contract churn between commits. - BIGQUERY_PROJECT env-var precedence is BREAKING for env-only deployments — flagged in CHANGELOG ### Changed. Editorial: - sql_origin renamed to bad_request_status with values 'client_error' / 'upstream_error' (clearer about what the parameter actually decides). bq_bad_request_user/_server kinds collapsed to bq_bad_request (400) and bq_upstream_error (502). - CLI (cli/commands/query.py) noted as external RemoteQueryEngine caller; unaffected because new bq_access kwarg has default None. - Added unit/integration tests for the new contracts: test_translate_passes_through_BqAccessError, test_v2_scan_returns_500_on_bq_lib_missing, test_v2_schema_returns_200_with_empty_partition_on_bq_failure, test_resolve_succeeds_after_config_set. - E2E protocol now covers /schema as the fourth endpoint. - Documented functools.cache-doesn't-cache-exceptions semantics and fixture nullcontext-doesn't-close caveat for nested sessions. * docs(spec): #134 revision 4 — incorporates third-round review Third reviewer verdict: 'implementation-ready with two trivial edits'; explicitly noted prior rounds did the heavy lifting. Edits: 1. get_bq_access() module-level function instead of @classmethod @functools.cache from_config. Removes the classmethod-cache stacking footgun (different Python versions wrap differently) and gives FastAPI's dependency introspection a clean function signature. Drops the 'Do not subclass BqAccess' caveat that no longer applies. 2. Commit 1 strategy explicitly: wrap _fetch_bq_sample (v2_sample), _bq_dry_run_bytes + _run_bq_scan (v2_scan), and _fetch_bq_schema (v2_schema strict block). Do NOT touch _fetch_bq_table_options swallow-all in commit 1 — preserved as-is, then migrated (still preserved) in commit 2. All three endpoints emit the same structured body shape so client parsers see one consistent contract throughout the staged rollout. No more half-rolled-out window where /sample is bare 500 while /scan is structured 502. * docs(plan): #134 implementation plan — Phase 1 (atomic bug fix) + Phase 2 (BqAccess refactor) + Phase 3 (verification) Bite-sized TDD tasks. 3 phases, 16 tasks total: Phase 1 (Commit 1) — atomic bug fix across all four v2 endpoints: Tasks 1.1-1.5 wrap _fetch_bq_sample, _bq_dry_run_bytes, _run_bq_scan, _fetch_bq_schema with structured 502/400 try/except. _fetch_bq_table_options preserved untouched. CHANGELOG Fixed entries. Phase 2 (Commit 2) — BqAccess facade extraction + migration: Tasks 2.1-2.5 build connectors/bigquery/access.py bottom-up (BqProjects, BqAccessError, translate_bq_error, default factories, BqAccess class, get_bq_access module-level cached). Task 2.6 adds conftest.py fixture. Tasks 2.7-2.9 migrate v2_scan, v2_sample, v2_schema to BqAccess. Tasks 2.10-2.11 migrate RemoteQueryEngine + tests (lazy bq_access, drop _bq_client_factory). Task 2.12 CHANGELOG Changed BREAKING + Internal. Phase 3 — Verification: 3.1 full pytest. 3.2 squash into two PR-shape commits. 3.3 manual E2E on agnes-development per spec protocol → close #134. Self-review table maps spec sections to implementing tasks; no gaps. * fix(v2): #134 structured 502/400 on BQ errors across /scan, /scan/estimate, /sample, /schema Wraps the BigQuery call sites in v2_scan, v2_sample, and v2_schema (strict block only) with try/except for google.api_core exceptions, translating to HTTPException with a structured body shape: {error, message, details}. Fixes Pavel's report (#134) where these endpoints returned bare HTTP 500 with no body when the SA on agnes-development hit cross-project Forbidden on serviceusage.services.use. Also fixes /sample's missing billing_project fallback (the bug 33a9964 fixed for /scan never landed here). Status code split: - /scan, /scan/estimate: BadRequest -> 400 (bq_bad_request) since SQL is user-derived from req.select/where/order_by. - /sample, /schema: BadRequest -> 502 (bq_upstream_error) since SQL is server-constructed from validated identifiers. - All Forbidden -> 502 with cross_project_forbidden if 'serviceusage' in error message (with hint pointing at data_source.bigquery.billing_project), else bq_forbidden. Body shape matches what the upcoming BqAccess refactor (next commit) will produce, so client-side parsers see one consistent contract throughout the staged rollout. _fetch_bq_table_options preserved exactly as-is — its swallow-all-and-return-empty contract is intentional and survives into the refactor; /schema continues to return 200 with empty partition info when partition queries fail. Outer wraps in scan_endpoint, scan_estimate_endpoint, sample, and schema endpoints exist only to make the test pattern (monkeypatching whole _fetch_* functions) work, and are tagged TODO(#134 Phase 2) for removal once BqAccess centralizes translation. * refactor(bq): #134 BqAccess facade — unify v2_scan, v2_sample, v2_schema, RemoteQueryEngine Extracts the duplicated BigQuery-access pattern (project resolution + client construction + DuckDB-extension session + Google-API error translation) into connectors/bigquery/access.py. Migrates four call sites to use it: - app/api/v2_scan.py — _bq_dry_run_bytes, _run_bq_scan - app/api/v2_sample.py — _fetch_bq_sample - app/api/v2_schema.py — _fetch_bq_schema (strict translation), _fetch_bq_table_options (preserves swallow-all best-effort contract) - src/remote_query.py — RemoteQueryEngine, lazy bq_access kwarg The new module exposes: - BqProjects (frozen dataclass: billing + data project IDs) - BqAccessError (typed exception with HTTP_STATUS class mapping) - BqAccess (facade with injectable client_factory/duckdb_session_factory for tests; defaults call the real google-cloud-bigquery + DuckDB extension) - get_bq_access (module-level @functools.cache; FastAPI Depends target) - translate_bq_error (Google API exception → BqAccessError mapper, with BqAccessError pass-through, 'serviceusage'-substring heuristic for cross_project_forbidden, and bad_request_status param distinguishing user-derived (400) from server-constructed (502) SQL) - _default_client_factory, _default_duckdb_session_factory RemoteQueryEngine.__init__ no longer accepts _bq_client_factory; tests migrate to bq_access=BqAccess(projects, client_factory=...). DuckDB-only RemoteQueryEngine tests need no changes — bq_access defaults to None and get_bq_access() is only invoked on first BQ call (lazy resolution). BqAccessError raised internally is translated to RemoteQueryError( error_type="bq_error") in _get_bq_client to preserve the engine's existing public contract — CLI and /api/query/hybrid callers see no change. Endpoint tests (test_v2_scan, test_v2_scan_estimate, test_v2_sample, test_v2_schema) migrate from monkey-patching whole _fetch_* functions to using the new bq_access fixture in tests/conftest.py — which exercises the REAL translation path through BqAccess + translate_bq_error, closing the test gap flagged in Task 1.1's review. Side-effect behavior change: v2_sample's FROM clause now uses the data project (instance.yaml data_source.bigquery.project), not the conflated billing_project from Phase 1. Documented in CHANGELOG ### Internal. BREAKING for deployments combining BIGQUERY_PROJECT env var with data_source.bigquery.project in instance.yaml — env var now overrides data project too. See CHANGELOG ### Changed. Two known-duplicate BQ-access sites (connectors/bigquery/extractor.py, scripts/duckdb_manager.register_bq_table) explicitly out of scope; tracked as follow-up. Removed stale docstring at the previous src/remote_query.py:204 that referenced scripts.duckdb_manager._create_bq_client as the default BQ client factory (RemoteQueryEngine never actually used that function). Test counts: tests/test_bq_access.py +27 (new), tests/test_v2_.py + tests/test_remote_query.py migrated to bq_access fixture (counts unchanged or +1-2 per file). Full suite: 2086 passed, 8 pre-existing failures (DB migration tests with unrelated internal_roles DependencyException — not introduced by this PR). fix(bq_access): translate DefaultCredentialsError to BqAccessError(auth_failed) CI on PR #138 caught: bigquery.Client(...) resolves Application Default Credentials at construction time; without ADC (CI without SA key, dev laptop without 'gcloud auth application-default login') it raises google.auth.exceptions.DefaultCredentialsError synchronously. Pre-fix _default_client_factory only caught ImportError, so DefaultCredentialsError propagated as raw exception — and from production endpoints would surface as bare 500 (the exact failure mode #134 sets out to fix). Now translates to BqAccessError(kind='auth_failed', details.hint='Run gcloud auth application-default login...'). Endpoint catch chain returns HTTP 502 with structured body. Adds unit test test_raises_auth_failed_on_default_credentials_error. Third-round spec review flagged this case in passing; the fix didn't land. CI's auth-less environment surfaced it. * fix(bq_access): get_bq_access() returns sentinel instead of raising when not configured Devin BUG_0001 on PR #138 review: 'get_bq_access() as FastAPI Depends breaks all v2 endpoints for non-BigQuery instances'. Pre-fix: get_bq_access() raised BqAccessError(not_configured) when neither BIGQUERY_PROJECT env nor data_source.bigquery.project was set. Because FastAPI resolves Depends() BEFORE the endpoint body runs, this exception fires during dep-injection — the endpoint's try/except BqAccessError clause never gets a chance to catch it. Result: every v2 request on Keboola-only or CSV-only instances returned bare HTTP 500, even for local-source tables that never touch BigQuery. Fix: get_bq_access() now returns a sentinel BqAccess with empty BqProjects and factories that raise BqAccessError(not_configured) on actual use. Construction succeeds, FastAPI's dep-injection cleanly yields the sentinel, the endpoint runs. The local-source code path in build_sample / build_schema / etc. never calls bq.client() or bq.duckdb_session() (it reads parquet directly), so non-BQ tables return 200 as before. Only when an endpoint actually tries to query BQ (source_type == 'bigquery') does the sentinel raise — and the endpoint's existing except BqAccessError catches it normally, returning structured 502 with hint. Test get_bq_access::test_raises_not_configured_when_neither_set renamed and rewritten to test_returns_sentinel_when_neither_set: asserts BqAccess is returned, then asserts client() and duckdb_session() each raise BqAccessError(not_configured) on call. Test test_does_not_cache_exceptions removed (no longer applicable) and replaced with test_sentinel_is_cached_per_process documenting the operator-restart-on-config-change contract. * docs(spec+plan): #134 genericize customer-specific tokens (CLAUDE.md OSS rule) Devin BUG_0001/0002 round 3 on PR #138: spec and plan docs contained customer-specific deployment hostnames, deployment names, and a GCP project ID that violated CLAUDE.md's vendor-agnostic OSS rule ('Nothing customer-specific belongs in code, configuration defaults, comments, docs, commit messages, PR titles, or PR bodies'). Replacements: agnes-development.groupondev.com -> <your-agnes-host> agnes-development -> <your-dev-instance> prj-grp-dataview-prod-1ff9 -> <your-data-project> s1_session_landings -> <bq_table_id> E2E verification semantics unchanged — operators still run the same four curls + config flip + retry, just substituting their own host / deployment name / project / table. * fix(bq_access): hook get_bq_access.cache_clear into instance_config.reset_cache Devin ANALYSIS_0004 on PR #138: get_bq_access is @functools.cache'd at process level, so it captures BigQuery project IDs at first call and ignores subsequent instance.yaml changes. Pre-Phase-2 the v2 endpoints re-read get_value() on every request, so admin /api/admin/server-config saves (which call instance_config.reset_cache()) hot-reloaded the BQ project. Without this fix, my refactor silently regresses that contract — operators editing instance.yaml via the admin UI would see no effect on v2 endpoints until container restart. instance_config.reset_cache() now also calls connectors.bigquery.access.get_bq_access.cache_clear() (lazy import, swallowed if connectors module isn't loaded — keeps instance_config usable in isolated unit tests). Adds test_instance_config_reset_cache_invalidates_get_bq_access as regression guard. Updates CHANGELOG Internal entry to mention the hot-reload contract + the not-configured sentinel behavior (round-3 fix from Devin BUG_0001 was previously only in commit message). * fix(bq_access): surface not_configured before identifier validation + plan path genericize Devin BUG_0001 + BUG_0002 round 5 on PR #138. BUG_0001 (plan doc): personal filesystem path violated CLAUDE.md vendor-agnostic rule. Replaced with '<worktree-root>' placeholder. BUG_0002 (sentinel error path): when get_bq_access() returns the sentinel BqAccess (BQ not configured), the empty bq.projects.data was reaching validate_quoted_identifier first and raising ValueError -> endpoint mapped to HTTP 400 'unsafe_identifier' instead of structured 500 'not_configured' with hint. Each fetch helper now checks 'if not bq.projects.data: bq.client()' as the first step, which triggers the sentinel's BqAccessError(not_configured). Endpoint catches the typed error and returns HTTP 500 with hint pointing at data_source.bigquery.project. Best-effort _fetch_bq_table_options returns {} silently in this case (preserves the swallow-all contract). * fix(bq_access): classify DuckDB-native exceptions from bigquery_query() via string match Devin ANALYSIS on PR #138 review (latest round). The DuckDB bigquery extension is a C++ plugin making its own HTTP calls — when BQ returns 403, it throws duckdb.IOException with the BQ error embedded as text, not gax.Forbidden. translate_bq_error's isinstance checks would miss these, falling to case 7 → bare 500 in production for v2_scan, v2_sample, and v2_schema (the bigquery_query() paths). Fix: last-resort string-match heuristic before the re-raise. 'Forbidden' / '403' / 'Bad Request' / '400' in the lowercased message classifies via the same kind hierarchy. The 'serviceusage' substring still distinguishes cross_project_forbidden from bq_forbidden. Specific enough that random exceptions without HTTP-error keywords still re-raise. Adds 4 unit tests covering the new heuristic + the 'don't swallow random exceptions' invariant. * chore(release): cut 0.22.0 PR #138 contains issue #134 user-visible behavior changes: - BREAKING: BIGQUERY_PROJECT env var now overrides instance.yaml data_source.bigquery.project for v2 endpoints (previously RemoteQueryEngine billing only). - Fixed: structured 502/400 on /api/v2/sample, /scan, /scan/estimate, /schema when BigQuery raises Forbidden/BadRequest (was bare 500). - Internal: BqAccess facade refactor unifying four duplicate BQ-access call sites; instance_config.reset_cache() now invalidates BqAccess cache too so admin server-config saves hot-reload BQ project IDs. Bumps to 0.22.0 because PR #137 merged first and took 0.21.0.	2026-04-30 10:11:20 +02:00
minasarustamyan	4ec5ff44dd	feat(setup): cross-platform TLS bootstrap + marketplace plugin install (#137 ) Bootstraps the Agnes Claude Code marketplace + RBAC-allowed plugins from the dashboard CTA, and inlines the server's TLS cert when the chain isn't publicly trusted (self-signed / private CA). Cross-platform setup prompt covers Windows Git Bash, macOS, Linux. Includes Bun-compiled `claude` fix (macOS goes via git-clone fallback, same as Windows), PAT stripping after clone, explicit error handling, and four rounds of Devin Review fixes (phantom step references, $PLATFORM re-detection, heredoc/awk line-count sync). Cuts 0.21.0. See CHANGELOG.md [0.21.0] section for details.	2026-04-30 08:56:45 +02:00
Vojtech	38f6b639d2	feat(observability): request_id end-to-end + dev debug toolbar + centralized logging (#136 ) Cuts release 0.20.0. ## Highlights - X-Request-ID header on every response + sanitized to [A-Za-z0-9_-] (CRLF log-forging mitigation) - Error pages (HTML + JSON 500) surface request_id for support tickets - Dev debug toolbar gated by DEBUG=1 — fastapi-debug-toolbar with custom DuckDBPanel - Centralized app.logging_config.setup_logging() replaces 23 scattered basicConfig calls - Telegram bot drops bot.log file — stdout only (BREAKING) ## Devin findings addressed - BUG_0001: .env.template no longer claims FastAPI debug=True - BUG_0002: subprocess extractor logs INFO to stderr again - ANALYSIS_0003: _wants_html no longer matches Accept: / (curl gets JSON as before) - BUG on b1c6ee9: HTML 500 page no longer leaks str(exc) in production - BUG on b13d2fe: 2 CLAUDE.md compliance flags (transform.py + ws_gateway) accepted as scope-limited logging refactor — follow-up to update CLAUDE.md if needed See CHANGELOG [0.20.0] for full notes.	2026-04-29 22:54:21 +02:00
ZdenekSrotyr	b7a1795834	feat(scheduler): re-wire sync_schedule + script.schedule; tune via env; OpenMetadata TLS (#135 ) Bundles 4 issues: - #79 — table_registry.sync_schedule honored at runtime (API-side filter + Pydantic validators) - #78 — script_registry.schedule honored via new POST /api/scripts/run-due (atomic claim, BackgroundTask exec, deploy-time safety validation) - #77 — sidecar JOBS env-driven (SCHEDULER_DATA_REFRESH_INTERVAL/HEALTH_CHECK_INTERVAL/SCRIPT_RUN_INTERVAL/TICK_SECONDS) - #89 — OpenMetadataClient verify=True default (BREAKING for self-signed) Cuts release 0.19.0. See CHANGELOG for full notes incl. Known Limitations.	2026-04-29 22:06:30 +02:00
minasarustamyan	953bd9d250	fix(marketplace): use plugin.json name in synth marketplace.json (#133 ) Closes the /plugin UI 'Plugin <X> not found in marketplace' bug. Synth marketplace.json catalog 'name' now reads from <plugin_dir>/.claude-plugin/plugin.json (with fallback to upstream marketplace.json name). On-disk plugins/<slug>-<plugin>/ layout preserved so cross-marketplace files don't collide. /marketplace/info exposes both name and prefixed_name (BREAKING — downstream tooling parsing 'name' for the slug-prefixed form must switch to prefixed_name).	2026-04-29 19:25:57 +02:00
minasarustamyan	c940593a90	feat(auth): Google Workspace group prefix filter + system mapping (#131 ) Three new env vars wire the Google OAuth callback to a configurable Workspace prefix and route admin/everyone Workspace groups onto the seeded system rows: AGNES_GOOGLE_GROUP_PREFIX, AGNES_GROUP_ADMIN_EMAIL, AGNES_GROUP_EVERYONE_EMAIL. Login gate redirects users with no prefix-matching group to /login?error=not_in_allowed_group. BREAKING: auto-Everyone membership for new users removed. Admin UI/API are read-only on Google-managed groups. See docs/auth-groups.md.	2026-04-29 14:08:04 +02:00

1 2 3 4

198 commits