agnes-the-ai-analyst

Author	SHA1	Message	Date
ZdenekSrotyr	8f71af6c22	docs(changelog): variant C — banner-on-setup model Update [Unreleased] to reflect the actual shipped behavior: - banner on /setup replaces CLAUDE.md generation - BREAKING: da analyst setup no longer writes CLAUDE.md - BREAKING: GET /api/welcome removed - schema v21/v22 notes corrected - drop sync_interval bullet (not part of this feature set)	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	c4d23cf235	feat(admin-prompt): update editor UX + docs for banner context - admin_welcome.html: update subtitle, description, placeholder cheatsheet (drop tables/metrics/marketplaces/sync_interval; add user-null note and security note). Textarea initial value is now empty (no default template to show). Preview pane uses innerHTML (HTML output). refreshStatus sets editor to empty when no override. Preview pane styled as light surface. Reset modal copy updated (no banner shown, not "OSS-shipped template"). - config/claude_md_template.txt: deleted (markdown template is gone; default is now no banner). - docs/agent-setup-prompt.md: rewritten for variant C — describes the /setup banner, smaller placeholder table, security/sanitization notes, anonymous-user guard, example HTML snippet.	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	8db4c1645b	feat(admin-prompt): variant C — banner on /setup, drop CLAUDE.md generation - src/welcome_template.py: rewrite as HTML banner renderer (render_agent_prompt_banner); drop _list_tables, _metrics_summary, _marketplaces_for_user, render_welcome, _load_default_template. build_context now exposes only instance/server/user/now/today. _sanitize_banner_html strips script/iframe/on*/javascript: post-render. - app/api/welcome.py: drop get_welcome handler, WelcomeResponse, old _VALIDATION_STUB_CONTEXT. Admin endpoints stay at same URLs; validation stub updated to match new slim context. Preview now uses autoescape=True. - app/web/router.py: setup_page calls render_agent_prompt_banner and passes banner_html to install.html; admin_agent_prompt_page drops _load_default_template. - app/web/templates/install.html: add .setup-banner CSS + banner block above hero. - cli/commands/analyst.py: replace _generate_claude_md with _init_claude_workspace; no CLAUDE.md written, only .claude/CLAUDE.local.md placeholder + settings.json hooks. - tests: delete test_cli_analyst_welcome.py (tests deleted endpoint/function); rewrite TestGenerateClaudeMd → TestInitClaudeWorkspace; update api test to assert /api/welcome returns 404 and remove welcome-fetch tests.	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	60386b9c3c	polish: drop dead CSS, fix docstring drift, add agent-prompt route test	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	ecb6c35ad5	feat(admin): rename /admin/welcome to /admin/agent-prompt (Agent Setup Prompt) Rename the welcome prompt editor from /admin/welcome to /admin/agent-prompt and update all UI labels to "Agent Setup Prompt". API endpoint URLs are unchanged (PUT/GET/DELETE /api/admin/welcome-template, GET /api/welcome). - Nav menu: "Welcome prompt" → "Agent Setup Prompt", href updated - Page title and h2 updated in admin_welcome.html - Error message hint in app/api/welcome.py updated to /admin/agent-prompt - Dashboard: replace inline <details> preview of _claude_setup_instructions with a simple link to /setup (Task C) - docs/welcome-template.md renamed to docs/agent-setup-prompt.md; internal references to /admin/welcome updated - OpenAPI snapshot path updated - Tests updated to reflect new route and removed inline preview	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	c7b14fb120	feat(admin): drop setup_banner feature; consolidate into single editor Remove the setup_banner feature (admin-editable /setup page banner) and all associated code: API router, repository, renderer, admin template, tests, and docs. The setup_page handler no longer calls render_setup_banner; the install.html template no longer renders banner_html. The setup_banner DuckDB table (v22) is kept intact for forward-compat with already-migrated instances — only the application code is removed. CHANGELOG updated: setup_banner bullets removed; Agent Setup Prompt (welcome-template feature) now stands alone as the single editable prompt.	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	0ee22f8fb0	docs: add setup-banner.md + rename migration test to test_db_schema_version.py - Add docs/setup-banner.md: placeholder table, autoescape semantics, security note on post-render stripping, diff table vs welcome-template (M-9). - Update CHANGELOG.md to reference docs/setup-banner.md. - Rename tests/test_db_migration_v20.py → tests/test_db_schema_version.py (file tested SCHEMA_VERSION==22, not just the v20 step; clearer name) (M-10).	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	5bfd8997ea	test: RBAC marketplace render test + validation stub drift detectors - test_render_marketplaces_filtered_by_rbac: seeds 2 marketplaces, 2 groups, grants, 2 users; asserts each user's rendered output references only their group's marketplace/plugins, not the other's (I-3). - test_validation_stub_matches_build_context_shape in test_welcome_template_api.py: asserts _VALIDATION_STUB_CONTEXT top-level and nested keys (instance, server, user) match build_context() output so stub drift is caught in CI (I-4). - test_validation_stub_matches_build_context_shape in test_setup_banner_api.py: same shape check against build_setup_banner_context() (I-4).	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	b3ffc98e9f	fix(security): XSS hardening for setup banner + cleanup unused imports - Add _sanitize_banner_html() to src/setup_banner.py: strips <script>/ <iframe> blocks, on* event-handler attributes, and javascript:/data: URI schemes post-render (I-2). Defense-in-depth — /setup is partly anonymous so malformed admin content must not execute in visitors' browsers. - Apply sanitizer in render_setup_banner() before returning rendered HTML. - Add 3 unit tests: test_render_strips_script_tags, test_render_strips_event_handlers, test_render_strips_javascript_uri. - Drop unused Optional import from src/repositories/welcome_template.py and src/repositories/setup_banner.py (M-6).	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	b0ec842804	feat(admin-ui): SRI + CDN fallback for CodeMirror, 301→302 on /install, error sanitization - Add integrity= + crossorigin= to all 4 cdnjs tags in admin_welcome.html and admin_setup_banner.html (I-1) - Add graceful CDN fallback: when CodeMirror is undefined (SRI mismatch or CDN down), degrade to styled plain textarea with polyfill editor interface so save/reset/preview still work (I-1) - Replace fixed 480px editor height with calc(100vh - 320px) for viewport-relative sizing; add min-height: 480px to .welcome-editor-col (M-8) - Change /install redirect from 301 to 302 to prevent indefinite browser caching (I-5) - Sanitize Jinja2 error detail in /api/welcome 500 response: log full error server-side, return generic detail pointing at /admin/welcome (M-7) - Hoist build_context import to module level in app/api/welcome.py (M-11)	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	8ec194cbe4	test(db): bump v20 migration test assertions to v22	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	39146288e1	feat: admin-editable setup_banner on /setup page (schema v22) Adds an optional Jinja2/HTML banner displayed above the bootstrap commands on /setup. Empty by default; admin authors it at /admin/setup-banner. autoescape=True — safe for HTML context. Render failures return "" so a broken banner never breaks /setup. Schema v22: setup_banner singleton table, auto-migration v21→v22.	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	40d221f20a	feat(admin-welcome): CodeMirror editor + live preview pane	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	4bcdc4e7d7	feat(dashboard): link Claude Code setup CTA to /setup page	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	85967e14ca	feat(web): rename /install → /setup; nav label 'Setup local agent' - Add GET /setup serving install.html (CLI + Claude Code setup page) - Add GET /install → 301 redirect to /setup for backwards compat - Move first-time setup wizard from /setup to /first-time-setup - Update nav link: href=/setup, label 'Setup local agent', active on both /setup and /install paths - Update page <title> to 'Setup local agent — …' - Update /dashboard and /setup comment in _claude_setup_instructions.jinja - Update tests and OpenAPI snapshot accordingly	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	92fd78cfb4	fix(admin-welcome): redesign with peer chrome, toast, btn-copy	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	1eb03405c7	test(db): bump v20 migration test assertions to v21	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	b579f119b5	docs(changelog): customizable welcome prompt	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	1c07977d84	docs: welcome-template customization reference	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	517e63d217	fix(cli): warn on welcome-fetch failures; expand test coverage	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	c604dad9cf	feat(cli): da analyst setup fetches rendered welcome from /api/welcome	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	ecaa113c68	fix(admin-welcome): credentials: include, real-content preview, refresh after mutate	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	2b3048f77f	feat(web): /admin/welcome editor page	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	93b713900b	fix(api): validate template render on PUT; broaden render-time catch	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	0d1ecd235d	feat(api): /api/welcome + /api/admin/welcome-template endpoints	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	4449623af8	fix(renderer): tolerate missing optional tables; document tzinfo	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	51f287a81a	feat: server-side jinja2 renderer for welcome prompt	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	d055417377	feat(config): default welcome template in jinja2 + sync_interval	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	19f1795350	feat(repo): WelcomeTemplateRepository singleton CRUD	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	33e7107637	feat(db): schema v15 — welcome_template singleton table	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	96281f884c	docs: implementation plan for customizable welcome prompt	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	cec7605c02	chore: ignore .worktrees/ for local isolated workspaces	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	214793b635	Merge pull request #166 from keboola/zs/fix-health-e2e-tests fix(tests): align docker-e2e health asserts with current /api/health shape	2026-05-03 16:09:30 +02:00
ZdenekSrotyr	13ab464ac5	Merge branch 'main' into zs/fix-health-e2e-tests	2026-05-03 15:55:02 +02:00
ZdenekSrotyr	c54917fc50	fix(tests): drop stale 'healthy' from /api/health status assert Per Devin review on #166: /api/health returns 'ok' or 'unhealthy'; 'healthy' is the detailed endpoint's vocabulary (app/api/health.py:180). The pre-existing OR-tuple was dead code and inconsistent with the rest of this PR's alignment work.	2026-05-03 15:40:41 +02:00
ZdenekSrotyr	f348296685	fix(tests): align docker-e2e health asserts with current /api/health shape `/api/health` is the auth-free LB probe — returns `status` + `db_schema` only. `version` lives in `/api/version` and the richer `services.duckdb_state` lives in `/api/health/detailed` (auth-gated). The two e2e asserts had drifted and broke nightly on main.	2026-05-03 11:21:19 +02:00
ZdenekSrotyr	91caefaca9	security(auth): per-IP rate limit + last-admin guard (#165 ) * security(auth): per-IP rate limit on auth endpoints + generalize last-admin guard Closes #45 and #151. #45 — every auth endpoint was unthrottled (login, magic-link, token, bootstrap), leaving us open to password brute-force and SMTP email-bombing. Wires slowapi (new dep) into the middleware chain with per-route limits: 10/min on login + token, 5/min on send-link, 3/min on bootstrap. Returns 429 with Retry-After: 60 once exceeded. Per-IP key respects the leftmost X-Forwarded-For hop (Caddy in front of the app strips client-supplied XFF). Operator escape hatch: AGNES_AUTH_RATELIMIT_ENABLED=0. Test suite disables the limiter via autouse conftest fixture so existing auth tests that hammer endpoints in tight loops are unaffected. #151 — DELETE /api/admin/users/{id}/memberships/{group_id} and the mirror DELETE /api/admin/groups/{group_id}/members/{user_id} only guarded against self-removal as last admin. Generalizes to refuse removing anyone from the seeded Admin group when they are the only remaining active admin (mirrors the existing count_admins(active_only=True) <= 1 check on delete_user / update_user). Recovery from zero admins requires direct DB access, so this closes a path where a scheduler/bootstrap actor that bypasses normal admin checks could otherwise empty the group. * security(auth): throttle remaining email-bombing + token-confirm endpoints Address code-review gap on PR #165 — the first commit covered /send-link but missed two endpoints with the IDENTICAL email-bombing surface: - POST /auth/password/reset — sends reset mail, anti-enum response - POST /auth/password/setup/request — sends setup mail, anti-enum response Both now share the 5/min limit with /send-link. Also add 10/min to the token-confirm surfaces — high-entropy tokens but partial leaks via logs / referer have surfaced before, and unbounded guess rate would let an attacker exhaust the keyspace adjacent to a leaked prefix: - POST /auth/email/verify - GET /auth/email/verify — closes the click-through bypass - POST /auth/password/reset/confirm - POST /auth/password/setup/confirm Doc fix: rate_limit.py module docstring + CHANGELOG entry no longer claim "disable without a redeploy" (misleading). The Limiter constructor freezes `enabled` from env at import time, matching every other Agnes env knob — operators set the flag and bounce the container. Tests: 4 new cases in test_auth_rate_limit.py covering /reset, /setup/request, /reset/confirm, GET /verify. Full suite: 2583 passed, 32 skipped, 0 failed. * security(auth): throttle JSON /auth/password/setup — closes form-throttle bypass Second code-review pass on PR #165 caught a fifth gap: POST /auth/password/setup (JSON variant, kept for backward compat) consumes the same setup_token as the web form /setup/confirm but was unthrottled — an attacker brute-forcing the token just switches from the form path to the JSON path and resumes at unbounded RPS. Apply the same 10/min limit and signature shape used on /setup/confirm. Also extend CHANGELOG note about the JSON-variant bypass for future operators reading the security entry. Test: 1 new case (test_password_setup_json_rate_limited_after_10_requests), 9 rate-limit tests + 28 password-flow tests + 41 auth-provider tests pass, no regressions. * chore(release): cut 0.30.1 — auth security hardening (rate limit + last-admin guard)	2026-05-02 21:08:33 +02:00
ZdenekSrotyr	916d0cb4c6	Merge pull request #161 from keboola/zs/readme-030 docs(readme): 0.30.0 highlights	2026-05-02 09:00:26 +02:00
ZdenekSrotyr	6c2040ac13	docs(readme): reflect 0.30.0 — Keboola materialized parity + tab UI + analyst hooks - Source-mode table: 'Materialized SQL' row now lists both BigQuery AND Keboola (Keboola gained materialized parity in 0.30.0). - Two-paragraph operator/analyst overview: admin path through /admin/tables tabs + RBAC deep-link; analyst path through da analyst setup hooks. Detail in CHANGELOG.md [0.30.0] and the GitHub Release prose.	2026-05-02 08:46:12 +02:00
ZdenekSrotyr	a887931339	Merge pull request #152 from keboola/zs/admin-tables-tabs-cleanup /admin/tables tab UI + Keboola materialized + form cleanup	2026-05-02 08:43:38 +02:00
ZdenekSrotyr	07c7bd4c8b	fix(test): reset instance_config cache in TestRebuildFromRegistry leakage repair CI on `dc03837a` showed test_missing_project_returns_error failing with 'ok-project' instead of '' — config-cache leak from the sibling test_returns_skipped_when_no_bq_rows that ran first under pytest-xdist. Pre-existing flake (cache lives in app.instance_config; monkeypatch restores the loader patch but doesn't invalidate the cached return). Earlier CI runs (`a4339ce6`) got lucky on test ordering. Adding an explicit reset_cache() at the top of the test removes the dependency on ordering.	2026-05-01 23:27:59 +02:00
ZdenekSrotyr	dc03837a7b	feat(query-api): better error message when --remote query references a materialized-but-not-rebuilt id E2E sub-agent finding: `da query --remote "SELECT * FROM <id>"` against a materialized table that hasn't yet been rebuilt in the server's analytics.duckdb returns a confusing DuckDB "Table does not exist" message even though the table is in the registry. Materialized rows produce parquets at `${DATA_DIR}/extracts/<source>/data/<id>.parquet`, but the orchestrator's master-view creation is `_meta`-driven — fresh instances or pre-tick states have the registry row without a corresponding view, so analysts hit the bare "does not exist" with no path forward. Improve the error rendering in `app/api/query.py:execute_query`. When DuckDB raises a "table does not exist" error, scan the registry for any `query_mode='materialized'` row whose id or name appears in the failed SQL. On a hit, return a 400 whose detail names the table, explains the materialize state, and offers two concrete next steps: 1. Run `da sync` (or wait for the scheduler tick / hit POST /api/sync/trigger) to materialize the parquet, OR 2. Query the source directly via the catalog alias when the registry row carries bucket+source_table (e.g. `bq."dataset"."table"` for BigQuery, `kbc."bucket"."table"` for Keboola). Detection is bounded — the registry round-trip only fires when DuckDB's error mentions a missing table, so happy-path queries pay no cost. Non-materialized unknowns fall through to DuckDB's raw error. 2 new tests: materialized id surfaces the hint with the bucket+source_table payload; unknown table falls back to the generic error path with no false positive on the new hint.	2026-05-01 23:09:52 +02:00
ZdenekSrotyr	8030a867ec	fix(admin-api): keep source_type validator permissive when primary is 'local' (bootstrap) The strict source_type-availability validator from the prior commit broke ~12 existing tests that register tables on the default test instance (where `data_source.type` resolves to 'local' since no instance.yaml is loaded). The intent of the validator is to catch explicit misconfig: `type=bigquery` instance + `source_type=keboola` payload with no `data_source.keboola.*` block. The bootstrap workflow — admin sets up a fresh instance and registers a few tables before pointing at a real source — should not be gated here. Loosen the check: when `get_data_source_type()` returns 'local' (the fallback when no `data_source.type` is set), skip the rejection. The explicit mismatch case still 422s because that path resolves `configured_primary` to a real source type. Also adds an autouse keboola_instance fixture to test_journey_sync_query.py which exercises Keboola registrations through the full sync→query flow — the fixture documents the test's data-source assumption rather than relying on the bootstrap escape hatch.	2026-05-01 23:09:15 +02:00
ZdenekSrotyr	bc3ba0d43d	feat(admin-api): reject register-table for source_type not configured on instance E2E sub-agent finding: instance configured with `data_source.type='bigquery'` and no `data_source.keboola.` block. Admin POSTs `{source_type: 'keboola'}` to /api/admin/register-table → returns 201, row lands in the registry, but never syncs because the scheduler has no Keboola URL/token to ATTACH against. Operator only notices the gap when `da catalog` keeps showing nothing. The new `_validate_source_type_configured` helper runs immediately after the id/view-name collision checks in `register_table`. A source_type is considered configured when: - it matches `get_data_source_type()` (the instance's primary), OR - a non-empty `data_source.<source_type>` block exists in the effective `instance.yaml` (multi-source instance), OR - it's in `_SOURCE_TYPES_INDEPENDENT_OF_DATA_SOURCE` (Jira / local — both get data through paths that don't involve `data_source.`). Returns 422 with a message that names the configured primary source and points at `/admin/server-config` for enabling a secondary one. None / empty source_type is still tolerated for backward compat with legacy CLI scripts that don't set the field — the route resolves it later. 5 new tests cover: keboola-on-bq rejected, bq-on-keboola rejected, matching source_type still works, jira allowed regardless, omitted source_type passes through. Existing tests that registered Keboola rows on the unconfigured default test instance now opt into a `keboola_instance` fixture to satisfy the new validator (tests/test_admin_bq_register.py + .keboola_materialized + .unregister_cleanup; the multi-source PUT test in test_admin_bq_register adds a `keboola` block to its synthetic config). Pre-existing test_missing_project_returns_error failure in TestRebuildFromRegistry is unrelated (config-cache leakage from a previous test in the same class) — confirmed pre-existing on the prior commit via `git stash` reproduction.	2026-05-01 23:04:51 +02:00
ZdenekSrotyr	dd46461c6c	fix(admin+orchestrator): DELETE registry drops parquet + sync_state; rebuild skips orphan parquets E2E sub-agent finding: register a materialized BQ row → sync to materialize the parquet at `/data/extracts/bigquery/data/<id>.parquet` → DELETE the registry row. The DB row goes away but: - the parquet file stays on disk forever, AND - the sync_state row stays, so `/api/sync/manifest` keeps advertising the dropped table to `da sync`, AND - the orchestrator's next rebuild can resurrect a master view by picking up the leftover parquet. Two-part fix in `unregister_table`: 1. For materialized rows on bigquery/keboola, remove `${DATA_DIR}/extracts/<source_type>/data/<name>.parquet` (and any stale `<name>.parquet.tmp` from a crashed prior materialize). Filename is keyed on `table_registry.name` to match sync_state bookkeeping. File-removal errors are logged but don't fail the DELETE — the registry row is already gone, and an orphan parquet won't get a master view at next rebuild because the orchestrator's _meta-driven scan never picks up bare parquet files. 2. Always clear `sync_state` + `sync_history` rows for the dropped table_id so the manifest stops advertising the table — applies to all source types and modes, not just materialized, since any synced row had a sync_state entry. Orchestrator-side defensive guard (Finding 2b) is a no-op in the current implementation: `_attach_and_create_views` only creates master views from `_meta` rows in each connector's `extract.duckdb`, so a parquet without a matching `_meta` entry is already invisible to the rebuild. The new test `test_orchestrator_skips_orphan_parquet_in_extracts` is kept as a regression guard for that contract. 5 tests cover: BQ + Keboola materialized DELETE removes parquet, remote DELETE doesn't error trying to remove a non-existent file, sync_state cleared on DELETE, orchestrator orphan-skip invariant.	2026-05-01 22:54:11 +02:00
ZdenekSrotyr	f0979f997a	fix(admin-api): reject backtick BQ-native source_query at register; surface materialize errors per-row E2E testing showed admin POSTs of materialized BQ rows whose source_query uses BigQuery-native backtick identifiers (`prj.ds.t`) silently no-op'd at the next sync tick — the materialize path runs the SQL through the DuckDB BQ extension's COPY which uses DuckDB's parser; backticks aren't recognized and the query either parse-errors or matches zero rows. No parquet lands at the canonical path and no error reaches an operator-visible surface. Two-part fix: 1. RegisterTableRequest's _check_mode_query_coherence model_validator now rejects any source_query containing a backtick with a 422 + actionable message pointing at the DuckDB equivalent (bq."dataset"."table"). Same check is applied in update_table on the merged record so PATCHes that flip a stored source_query to backtick form are also caught. Covers BQ AND Keboola materialized rows since both connectors funnel source_query through DuckDB's COPY. 2. _run_materialized_pass now persists per-row failures via the new SyncStateRepository.set_error / clear_error methods (existing sync_state.error / status columns — no schema migration). GET /api/admin/registry enriches each row with `last_sync_error` from a single batched SELECT against sync_state, so the admin UI / da admin status can show "this table failed last sync because: X" instead of operators having to trawl scheduler logs. Recovered rows have the error cleared automatically — update_sync's success path resets status='ok' / error=NULL on the upsert. The materialized-path test fixture's _materialized_payload helper is updated to use DuckDB-flavor SQL (the prior backtick example pre-dated the fix). 6 new tests cover register/update rejection on BQ + Keboola, the sync_state error persistence, and the registry response surface.	2026-05-01 22:51:02 +02:00
ZdenekSrotyr	a4339ce679	fix(admin+diagnose): address 2 additional Devin Review findings on PR #152 Devin's second review pass on commit `16938ae7` surfaced 2 more issues: BUG_pr-review-job-58ae3148_0001 — non-BQ materialized via PUT bypasses source_query check app/api/admin.py update_table only enforces 'query_mode=materialized requires source_query' for source_type='bigquery' rows (via the synthetic RegisterTableRequest at line 2129+). Non-BQ source types (Keboola) skip the check — admin could PUT {query_mode: materialized} on a Keboola local row without source_query, persist successfully, then crash at the next sync tick when kb_materialize_query received sql=None and DuckDB rejected COPY (None) TO '...'. Fix: generic coherence guard before the BQ-specific block — for ALL source types, query_mode='materialized' requires non-empty source_query in the merged record. Returns 422 with a hint about reverting via query_mode='local'/'remote'. ANALYSIS_pr-review-job-642ff90f_0007 — diagnose returns 'ok' on BQ resolution failure app/api/health.py:_check_bq_billing_project caught get_bq_access() exceptions and returned status='ok' with a 'could not resolve' detail. Automated alerting keyed on status != 'ok' would silently miss missing google-cloud-bigquery, auth failures, or malformed config. Fix: return status='unknown' on resolution failure — surfaces it on operator dashboards without promoting the overall health to 'degraded' (which 'warning' does, intentionally for the billing==project case). Tests: - test_update_keboola_to_materialized_without_source_query_rejected: PUT {query_mode: materialized} on a Keboola local row returns 422 with 'source_query' in the detail - test_diagnose_returns_unknown_status_when_bq_resolution_fails: when get_bq_access raises, the bq_config service entry surfaces status='unknown' (not 'ok') Full sweep: 2507 passed, 25 skipped, 0 failed (+2 from previous sweep because of the 2 new regression tests; 8 pre-existing internal_roles schema-migration failures still ignored per task brief).	2026-05-01 21:21:23 +02:00
ZdenekSrotyr	16938ae7cb	fix(materialized): address 4 Devin Review findings on PR #152 Devin Review on commit `7052a235` flagged 4 real bugs in the Keboola materialized path. All four are fixed; 3 new regression tests pin the behavior so future refactors can't quietly regress. BUG_pr-review-job-3fbd31c9_0001 — _run_materialized_pass gated behind 'if bq_project:' app/api/sync.py:444-466 wrapped the entire materialized pass (which dispatches BOTH BigQuery AND Keboola rows by source_type) in a check for data_source.bigquery.project being non-empty. On Keboola-only instances this short-circuited and Keboola materialized rows sat in table_registry forever without their SQL being evaluated — the feature CHANGELOG advertised was dead code on the most common deployment shape. Fix: always run the materialized pass; the BQ branch's per-row try/except catches the typed BqAccessError(not_configured) the sentinel raises when no BQ project is set, so non-BQ instances incur a per-row error for any (hypothetical) BQ-tagged row but the Keboola path runs cleanly. Log line renamed 'Materialized BQ' → 'Materialized SQL' to match. BUG_pr-review-job-3fbd31c9_0004 — wrong config key 'url' instead of 'stack_url' app/api/sync.py:149 read get_value('data_source', 'keboola', 'url'), but the canonical config key documented in instance.yaml.example:111 and used by app/api/admin.py:1503 + 2359 is 'stack_url'. Production Keboola instances would always see an empty URL and fail with the 'not configured' error. The pre-existing test patched the wrong key too, so it passed without catching the mismatch. Fix: use stack_url in both sync.py and the test fixture. BUG_pr-review-job-3fbd31c9_0003 — no atomic write in Keboola materialize_query connectors/keboola/extractor.py wrote COPY directly to the final '<id>.parquet' path. A mid-COPY failure (network, disk full, extension crash) left a partial parquet that the orchestrator rebuild would later pick up and serve to analysts. BQ's materialize_query already uses a '<id>.parquet.tmp' staging path + os.replace() atomic swap (connectors/bigquery/extractor.py:370-445); Keboola now mirrors that pattern with the same try/except cleanup on COPY failure. BUG_pr-review-job-3fbd31c9_0002 — full file read into memory for MD5 Same file:60-62 used parquet_path.read_bytes() for the MD5 hash. Multi-GB Keboola materialized results would OOM on memory-constrained containers. BQ's version uses streaming 8 KiB-chunk hashing (connectors/bigquery/extractor.py:438-442); Keboola now mirrors it. Tests: - test_run_sync_runs_materialized_pass_on_keboola_only_instance — pins BUG_0001's fix; setting bigquery.project='' must NOT skip Keboola materialized dispatch - test_keboola_materialize_atomic_write_on_failure — pins BUG_0003; a mid-COPY RuntimeError leaves no .parquet AND no .parquet.tmp at the canonical path - test_keboola_materialize_uses_tmp_path_during_copy — documents the atomic-write contract: COPY targets .parquet.tmp, final swap to .parquet (no .tmp suffix on the result['path']) - existing test_run_materialized_pass_dispatches_keboola_to_keboola_extractor fixture updated: stack_url instead of url Full sweep: 2505 passed, 25 skipped, 0 failed (modulo 8 pre-existing internal_roles schema-migration failures called out in the task brief).	2026-05-01 20:58:17 +02:00
ZdenekSrotyr	7052a23552	release(0.30.0): per-connector tab UI + Keboola materialized parity + /admin/server-config full exposure Highlights (full prose in CHANGELOG.md [0.30.0]): - Smart local sync — Claude Code SessionStart/SessionEnd hooks via 'da analyst setup' + 'da sync --quiet' for hook-friendly output - query_mode='materialized' end-to-end for BigQuery + Keboola — admin SELECT (against bq.dataset.x or kbc.bucket.table) → scheduler runs through DuckDB extension → parquet → da sync distribution - /admin/tables per-connector tabs (BigQuery / Keboola / Jira), full Keboola Custom-SQL parity, form cleanup, per-row Manage access deep link - /admin/server-config known-fields registry + structured nested editor: surfaces BQ optional knobs (billing_project, legacy_wrap_views, max_bytes_per_materialize), ai.base_url, new openmetadata + desktop sections, full corporate_memory governance schema - da diagnose warns on USER_PROJECT_DENIED-prone billing_project=project config - Schema v20 — adds source_query TEXT to table_registry	2026-05-01 20:38:34 +02:00
ZdenekSrotyr	b627de8344	feat(diagnose) + docs: warn on USER_PROJECT_DENIED footgun + document all newly-exposed knobs Diagnostic + operator-facing documentation that closes the loop on the work in this PR. `da diagnose` (via /api/health/detailed): - New _check_bq_billing_project() helper. When data_source.type='bigquery' and BqProjects.billing == .data, surface a yellow warning: 'BigQuery billing project equals data project'. Hint includes the YAML field path + the /admin/server-config UI shortcut. Diagnose's overall status promotes warning → degraded so the CLI echoes it. - Non-BQ instances (Keboola-only, etc.) skip the check. - Implementation hooks into the existing /api/health/detailed surface — no new endpoint, no CLI changes. config/instance.yaml.example documentation: - data_source.bigquery.billing_project: USER_PROJECT_DENIED hint, /admin/server-config UI reference - data_source.bigquery.legacy_wrap_views: analyst-side discipline note (use `da fetch` / `da query --remote`), issue #101 history, view-heavy deployment guidance - data_source.bigquery.max_bytes_per_materialize: cost guardrail block (NEW — wasn't documented in .example before) - ai.base_url: provider list + UI hint - openmetadata + desktop: 'configurable via /admin/server-config UI' headers - corporate_memory: leading note that the schema is editable via UI Other docs: - CHANGELOG.md: comprehensive Unreleased section - CLAUDE.md: schema chain → v20 + Materialized SQL connector mode + per-connector tab UI mention - README.md: mode-first source table summary - docs/architecture.md: per-connector tab UI mention - cli/skills/connectors.md: bootstrap rails (parallel to #154) - docs/superpowers/plans/2026-05-01-admin-tables-form-cleanup.md: implementation plan archive (2515 lines) - scripts/seed_dummy_tables.py: drop is_public after #150 RBAC migration (column gone) Tests: - test_diagnose_billing.py — 3 cases (BQ with billing==data warns, BQ with billing!=data clean, non-BQ skips)	2026-05-01 20:27:24 +02:00

... 3 4 5 6 7 ...

709 commits