Commit graph

104 commits

Author SHA1 Message Date
ZdenekSrotyr
4bd1919f77 fix(query): #168 review iter 5 — forbidden-table check uses registry IDs
Devin Review iter #5 flagged a pre-existing class of name/id mismatch
in app/api/query.py:131-136 — the SAME root cause as the bq.* RBAC
issue I fixed in iter #3 (line 332/362). Devin called it out as
"NOT introduced by this PR" / "might merit follow-up", but it's
exactly the same security-boundary pattern this PR is hardening, so
fixing here keeps the RBAC story consistent across the handler.

The `forbidden = all_views - set(allowed)` comparison mixed types:
- `all_views` carries DuckDB master view names (= registry display
  `name` from the orchestrator's CREATE VIEW)
- `set(allowed)` carries registry IDs (resource_grants.resource_id)

When `id != name` (e.g. id="bq.finance.ue", name="ue"), authorized
users got spurious 403s — the view name landed in `forbidden` even
though the caller had a valid grant on the registry id.

Build a name->id map from the registry, then the forbidden check
compares apples to apples:
    allowed_view_names = {r["name"] for r in registry_rows
                          if r.get("name") and r.get("id") in allowed_ids}
    forbidden = all_views - allowed_view_names

107 affected tests pass; 487 pass in wider RBAC/query/access/admin
domain — no regressions.
2026-05-04 14:18:43 +02:00
ZdenekSrotyr
28aba4c1f9 fix(query): #168 review iter 3 — RBAC name-vs-id, placeholder dead code
Devin Review iter #3 found 3 new real bugs after iter #2's fixes landed.

🔴 RBAC check at app/api/query.py:362 used `row["name"]` against
`accessible_set`, but `accessible_set` is keyed by registry IDs
(`get_accessible_tables` returns `resource_grants.resource_id` —
table IDs, not display names). Confirmed by `_table_blocks` projection
at `app/resource_types.py:157-158`. When `id != name` (e.g.
`id="bq.finance.ue", name="ue"`), non-admin users with valid grants
got 403 `bq_path_access_denied`. Switch to `row["id"]`.

🚩 Bare-name pass at app/api/query.py:332 had the same name-vs-id
mismatch (different impact): legitimate accessible rows were skipped
from `dry_run_set`, so the cost guardrail under-counted scan bytes
for non-admin users. Could let an over-cap query through and
under-bill quota. Switch to `row_id` comparison.

🟡 `placeholder_from` for billing_project was dead code.
`_BQ_OPTIONAL_FIELD_DEFAULTS["billing_project"] = ""` seeded an empty
string into every GET payload via `_ensure_bq_optional_fields`. JS
`isUnset = (value === undefined)` evaluated False, so the
`(defaults to <project>)` placeholder NEVER rendered. Drop the seed —
field stays in `known_fields` (UI sees it) but routes through the
unset rendering path on GET, where placeholder_from fires.

Tests: test_get_surfaces_bq_fields_even_when_unset assertion flipped
from "billing_project IS present" to "billing_project NOT auto-seeded"
to lock in the new shape. 67 affected tests pass.
2026-05-04 13:51:36 +02:00
ZdenekSrotyr
5eaa449fcc fix(query): #168 review iter 2 — quota user_id parity + concurrent-slot 429
Devin Review iter #2 found 2 new issues (after iter #1's 5 fixes
landed). Both real, both addressed.

🔴 Quota user_id key mismatch defeated shared daily budget. /api/query
computed `user.get("id") or user.get("email")` while /api/v2/scan uses
`user.get("email") or "anon"` (app/api/v2_scan.py:327). Same user → two
different keys in the singleton QuotaTracker. BQ bytes consumed via
/api/query were tracked under UUID; via /api/v2/scan under email; the
`check_daily_budget` pre-flight on either endpoint never saw the
other's recorded bytes — per-user cap was effectively doubled. Match
v2/scan's email-first ordering.

🟡 QuotaExceededError(KIND_CONCURRENT) → 400 instead of 429.
`quota.acquire(user_id)` raises this from __enter__ when the per-user
concurrent-scan slot is at cap. The exception propagated through the
@contextlib.contextmanager generator, the caller's `with guard:`
block, and was caught by execute_query's generic `except Exception`
handler → mapped to 400 with a flattened "Query error: concurrent_scans:
N/M" string, dropping the typed retry_after_seconds field. Wrap the
`with quota.acquire(...)` in a try/except QuotaExceededError that maps
to 429 with the same typed-detail shape used for the daily-budget
rejection — consistent with /api/v2/scan:392-402.

Tests: test_api_query_quota.py user_id strings updated to
"admin@test.com" (the seeded_app admin's email) to match the new
email-first ordering. 40 affected tests pass.
2026-05-04 13:38:31 +02:00
ZdenekSrotyr
1263b80726 fix(query): #168 review — concurrent-slot wraps execute, doc/JS fixes
Devin Review on PR #168 found 5 issues — all real, all addressed.

🚩 ANALYSIS_001 (architectural): concurrent-slot guard didn't protect
actual BQ query execution. Earlier `_enforce_remote_bq_quota_and_cap`
ran dry-run + cap check inside `with quota.acquire(user_id):`, then
returned — releasing the slot BEFORE `analytics.execute(...)` ran. Spec
§4.3.3 explicitly designs the slot to wrap execute so the per-user
concurrent cap limits BQ scans, not just dry-runs.

Refactor to a context manager `_bq_quota_and_cap_guard`. Caller's `with`
block now holds the slot through dry-run, cap check, the actual
`analytics.execute(...)` (which is what triggers the BQ scan when DuckDB
resolves the master view), AND the post-flight record_bytes. Slot
released only when caller's `with` body exits.

🟡 BUG_001: placeholder JS walked `original` (full GET payload root)
instead of `original.sections`. `placeholder_from: ["data_source",
"bigquery", "project"]` is a section-relative path, so billing_project
placeholder NEVER rendered. Fix: walk `original.sections` (with fallback
to `original` for safety).

🟡 BUG_002 + BUG_003: admin_tables.html register and edit modals'
operator help text referenced `max_bytes_per_remote_query` (the old
name from the spec) but the actual config key is `bq_max_scan_bytes`
after the fix-up commit `6423888d` moved it. Replace both occurrences.

🟡 BUG_004: CHANGELOG entry said `api.query.bq_max_scan_bytes` (the
old path) but the read at app/api/query.py:53 is
`get_value("data_source", "bigquery", "bq_max_scan_bytes", ...)`. An
operator who set it under `api.query` in their yaml would have no
effect. Correct path in CHANGELOG.

All 95 #160-affected tests pass after the changes.
2026-05-04 13:28:03 +02:00
ZdenekSrotyr
6423888d02 fix(query): #160 move bq_max_scan_bytes to data_source.bigquery (UI editable)
E2E test on dev VM revealed: spec said "configurable via /admin/server-config"
for the cost guardrail cap, but the underlying read path was
`api.query.bq_max_scan_bytes` and `api` is NOT in `_EDITABLE_SECTIONS`. POST
to /admin/server-config rejected `{"sections":{"api":...}}` as "unknown
section(s): api" — the cap was only adjustable via direct YAML edit.

Move to `data_source.bigquery.bq_max_scan_bytes`:
- `_default_remote_query_cap_bytes()` reads from the new path.
- Add to `_OPTIONAL_FIELDS["data_source"]["bigquery"]["fields"]` with the
  same shape as `max_bytes_per_materialize` (kind=int, default 5 GiB, hint).
- Add to `_BQ_OPTIONAL_FIELD_DEFAULTS` so it surfaces in the GET payload
  even when YAML omits it.

Convention now mirrors `max_bytes_per_materialize` — both BQ cost
guardrails live under `data_source.bigquery`, both editable in the UI.
2026-05-04 12:46:38 +02:00
ZdenekSrotyr
39bdc1ff45 feat(admin): #160 BQ test-connection endpoint + billing_project placeholder UI
Closes the operator-side half of the reporter's loop. The CLI fix in
the previous commit makes USER_PROJECT_DENIED errors readable to
analysts; this commit lets admins verify reachability proactively
from /admin/server-config without waiting for analyst reports.

New endpoint POST /api/admin/bigquery/test-connection
(app/api/admin_bigquery_test.py, ~110 LOC):
- Depends(require_admin); registered in app/main.py.
- Builds BqAccess via existing get_bq_access(), runs `SELECT 1 AS ok`
  with a 10s polling timeout.
- 200 with {ok, billing_project, data_project, elapsed_ms} on success.
- 400 for `BqAccessError(not_configured)` (operator config issue).
- 502 for any other typed BqAccessError or unknown upstream exception.
- 504 for concurrent.futures.TimeoutError; best-effort cancel_job
  invoked (BQ-side cancel may still run; documented caveat).

Server-config placeholder (app/api/admin.py + admin_server_config.html):
- `data_source.bigquery.billing_project` field-spec gains
  `placeholder_from: ["data_source", "bigquery", "project"]`.
- renderLeafInput's text branch reads `opts.spec.placeholder_from`,
  walks the loaded `original` config dict, injects
  `placeholder="(defaults to <project>)"` into the input HTML at
  construction time. Admin sees the access.py:339-340 fallback rule
  visible directly in the UI without reading source.

UI button:
- "Test BigQuery connection" button next to data_source's Save button.
- onTestBigQuery() POSTs to the endpoint, renders structured result
  inline (green check + elapsed_ms on success; red kind + hint on
  failure).

Tests: 6 endpoint cases + 1 placeholder payload test = 7 GREEN. 62
total across the affected admin server-config test files.
2026-05-04 10:31:35 +02:00
ZdenekSrotyr
77cdb65f76 sec(query): #160 BQ_PATH catches quoted "bq" catalog token (Phase 3 review)
Phase 3 review identified an RBAC + cost-cap bypass: `SELECT * FROM
"bq"."ds"."tbl"` (catalog token quoted as a DuckDB identifier) was NOT
matched by the BQ_PATH regex, so direct quoted-form references skipped
both the registry check and the cost-cap dry-run. DuckDB resolves
`"bq"` to the same ATTACHed BQ catalog, so the bypass is real.

Widen the catalog-token alternation: `(?:"bq"|bq)` matches both forms.
Negative lookbehind `(?<![\w.])` still rejects look-alike prefixes
(`other_bq`, `my_bq`); the new "my_bq".ds.tbl negative test locks that
in alongside `other_bq.ds.tbl`.

Tests:
- 2 new positive cases in tests/test_query_bq_regex.py for the quoted
  form (`"bq"."finance"."ue"` and uppercase `"BQ"."ds"."tbl"`).
- 1 new negative case rejecting `"my_bq".ds.tbl` so the quoted-form
  widening doesn't open a different evasion.
- 1 new RBAC test in tests/test_api_query_rbac_bq_path.py: admin
  hitting an unregistered quoted path returns the same
  bq_path_not_registered 403 as the unquoted form.

All 33 Phase 3 tests pass after the fix.
2026-05-04 10:31:35 +02:00
ZdenekSrotyr
896c43c7a2 feat(query): #160 cost guardrail + bq.* RBAC + quota integration on /api/query
The headline implementation for issue #160. POST /api/query now gates
direct `bq."<dataset>"."<source_table>"` references behind the registry
and bounds the BQ scan cost behind a configurable cap. Wired through
the same singleton QuotaTracker as /api/v2/scan so daily-byte budgets
are shared across both BQ-touching paths.

Changes in app/api/query.py:

- Add module-level `BQ_PATH` regex matching the 16 syntax variants
  verified empirically (fully-quoted, unquoted, mixed quoting,
  case-insensitive, inside CTE bodies, multi-path, …).
- Add `bigquery_query` to the SQL keyword blocklist. Closes the
  pre-existing function-call backdoor where a user could run an
  arbitrary BQ jobs API call against any reachable dataset, bypassing
  the registry and RBAC. Wrap views internal to the BQ extractor still
  use bigquery_query() — but those run via DuckDB view resolution at
  query time, not via user-submitted SQL, so the blocklist doesn't
  break them.
- Add `_bq_guardrail_inputs` helper: walks user SQL twice — once for
  bare-name matches against accessible registered remote-BQ names
  (contributes to dry_run_set), once for direct `bq.X.Y` matches
  (gated against `find_by_bq_path` lookups, returns 403 with
  structured detail on miss or grant violation).
- Add `_enforce_remote_bq_quota_and_cap` helper: pre-flight
  `check_daily_budget` (over-cap → 429), then `with quota.acquire(...)`
  wraps a per-path BQ dry-run, sums bytes, raises 400
  `remote_scan_too_large` when total > cap.
- Cap default 5 GiB; configurable via `api.query.bq_max_scan_bytes`
  in /admin/server-config (next phase wires the UI).
- Post-flight `record_bytes` against the user's daily counter.
- Module-level imports of `_bq_dry_run_bytes`, `_build_quota_tracker`,
  `get_bq_access` so tests can monkeypatch via `app.api.query.<name>`.

Tests:
- All 23 RED tests from the previous commit now pass (regex matrix,
  blocklist with detail-string assertion, RBAC unregistered/admin-bypass,
  guardrail dry-run-called/over-cap-rejected, quota pre-flight 429).
- mock_dry_run fixture stubs both `_bq_dry_run_bytes` and `get_bq_access`
  so guardrail tests don't require a live BQ project.
- Quota test uses `admin1` (the seeded_app fixture's actual user id, not
  `admin`).

Smoke: 887 passed across query/bq/admin/extractor/registry/quota
domains. No regressions.
2026-05-04 10:31:35 +02:00
ZdenekSrotyr
e44d2280e5 refactor(quota): #160 relocate _build_quota_tracker to v2_quota.py
The /api/query cost guardrail (next phase) needs the same singleton
QuotaTracker so its daily-byte and concurrent-slot caps accumulate
across both /api/v2/scan and /api/query BQ-touching paths.

Move `_build_quota_tracker`, `_quota_singleton`, and `_quota_init_lock`
from app/api/v2_scan.py to app/api/v2_quota.py (the natural home; the
factory uses QuotaTracker which already lives there). Re-export the
function from v2_scan.py so the 7 test sites at tests/test_v2_scan.py
(lines 77, 118, 143, 160, 186, 208, 250) keep working without edits.

Crucially do NOT re-export `_quota_singleton` from v2_scan.py — Python
`from X import var` copies the binding at import time, so a re-exported
singleton would freeze at the initial None and never observe the
in-place mutation done inside `_build_quota_tracker()`. Re-export only
the function (which always reads the live module-global through `global`).

Mechanical refactor; no behavior change. 30 quota-related tests pass.
2026-05-04 10:31:35 +02:00
ZdenekSrotyr
9d0e4e687d refactor(bq): #160 remove legacy_wrap_views config knob (always-wrap)
Now that VIEW/MATERIALIZED_VIEW always wrap via bigquery_query() (the
prior `legacy_wrap_views=True` branch behavior, made unconditional in
the previous commit), the toggle has no semantic meaning and is removed
across the codebase.

Production code:
- app/api/admin.py: drop the field from _OPTIONAL_FIELDS["data_source"]
  ["bigquery"]["fields"] and from _BQ_OPTIONAL_FIELD_DEFAULTS, plus the
  comment block above the defaults dict.
- config/instance.yaml.example: drop the example snippet.
- src/orchestrator.py: update the inner-objects skip-branch comment to
  reflect the new BQ behavior (the skip itself stays — keboola
  use_extension=False still inserts _meta rows without inner views).
- app/web/templates/admin_tables.html: rewrite operator copy in the
  register and edit forms to reflect always-wrap.

Tests:
- tests/test_admin_server_config.py (TestServerConfigBigQueryFields):
  flip assertions from "field IS present" to "field NOT present" on
  legacy_wrap_views. Drop the test_post_persists_legacy_wrap_views test
  since the field no longer exists.
- tests/test_admin_server_config_known_fields.py: same flip on the
  known-fields registry assertion.
- tests/test_bigquery_extractor.py: drop the obsolete
  test_view_entity_does_not_create_master_view_by_default (asserted the
  bug we fixed) and test_legacy_wrap_views_toggle_restores_old_behavior
  (toggle no longer meaningful). Update remaining test docstrings.

Operators with `legacy_wrap_views: true` set in their overlay get the
new (equivalent) behavior automatically — the unrecognized key is
silently ignored by the YAML loader. Operators with `false` get the
issue-#160 fix as a behavior change, not a regression.

Spec gate updated: production code grep gate
  grep -rn 'legacy_wrap_views' connectors app src config cli
must return zero. tests/ excluded — historical "removed in #160"
breadcrumbs and `assert "X" not in fields` regression guards retained
as anti-regression signals.
2026-05-04 10:31:35 +02:00
ZdenekSrotyr
955b56608d feat(api,web,cli): /admin/workspace-prompt + /api/welcome restored + da analyst writes CLAUDE.md
- app/api/claude_md.py: GET /api/welcome (analyst, auth required); GET/PUT/DELETE
  /api/admin/workspace-prompt-template; POST …/preview; two-pass Jinja2 validation
  on PUT; validation stub mirrors build_claude_md_context() shape
- app/main.py: register claude_md_router
- app/web/router.py: GET /admin/workspace-prompt → admin_workspace_prompt.html
- app/web/templates/admin_workspace_prompt.html: CodeMirror editor + live preview +
  status chip + reset modal; mirrors admin_welcome.html for Agent Setup Prompt
- app/web/templates/_app_header.html: add "Agent Workspace Prompt" nav item next to
  "Agent Setup Prompt"; extend _admin_active to cover /admin/workspace-prompt
- cli/commands/analyst.py: _init_claude_workspace now accepts server_url + token;
  _write_claude_md fetches GET /api/welcome, writes CLAUDE.md, graceful 404/5xx;
  setup command adds --no-claude-md flag to opt out; default = write CLAUDE.md
- tests: test_claude_md_api.py (16 tests); test_analyst_bootstrap.py updated with
  4 new CLAUDE.md bootstrap tests; test_welcome_template_api.py: update stale
  assertion about /api/welcome being removed (endpoint restored)
- tests/snapshots/openapi.json: regenerated
2026-05-03 22:44:14 +02:00
ZdenekSrotyr
9ad7856f72 fix(devin-review): dashboard CTA respects override; PUT validates anon path
Finding #1: _build_context now routes through render_agent_prompt_banner when
a DB connection is available, so both /setup and the /dashboard clipboard CTA
always reflect the admin override (or the live default when no override is set).
Previously _build_context unconditionally used resolve_lines(), ignoring the
welcome_template override for the dashboard JS array.

Finding #2: PUT /api/admin/welcome-template now performs a second render pass
with user=None (anonymous stub) after the authenticated-user pass. Templates
that reference user.* fields without an {% if user %} guard are rejected with
a clear 400 error explaining the anon-visitor breakage.
2026-05-03 21:45:32 +02:00
ZdenekSrotyr
d18bc4c8f7 fix(api): align PUT validation autoescape with runtime (False); docs match 2026-05-03 21:30:24 +02:00
ZdenekSrotyr
61ef0d0eed fix(devin-review): address 4 findings on PR #167
- Fix #1: _detect_existing_project now checks .claude/settings.json for
  "da sync" marker instead of deleted CLAUDE.md; update tests accordingly.
- Fix #2: preview endpoint uses autoescape=False to match /setup rendering;
  align render_agent_prompt_banner in welcome_template.py to the same.
- Fix #3: apply _sanitize_banner_html to override render path in setup_page
  so all render paths sanitize consistently.
- Fix #4: move .setup-link-banner into the existing-user branch where
  account_details.last_sync_display is reachable; remove dead copy from
  new-user branch.
2026-05-03 21:15:01 +02:00
ZdenekSrotyr
dc931a6556 feat(admin-prompt): default = live setup script; override replaces /setup content
The /admin/agent-prompt editor now pre-fills with the full bash bootstrap
script from setup_instructions.resolve_lines() instead of being empty.
When an admin saves an override it replaces the default everywhere — the
/setup page display and the dashboard clipboard CTA — rather than adding a
banner above the auto-generated commands.

GET /api/admin/welcome-template now returns a `default` field with the live
computed script so the editor always shows meaningful starting content.

{server_url} and {token} single-brace placeholders survive Jinja2 rendering
and are substituted by JavaScript at clipboard-copy time as before.

Preview pane switches to textContent (not innerHTML) since content is bash.
2026-05-03 16:31:35 +02:00
ZdenekSrotyr
8db4c1645b feat(admin-prompt): variant C — banner on /setup, drop CLAUDE.md generation
- src/welcome_template.py: rewrite as HTML banner renderer
  (render_agent_prompt_banner); drop _list_tables, _metrics_summary,
  _marketplaces_for_user, render_welcome, _load_default_template.
  build_context now exposes only instance/server/user/now/today.
  _sanitize_banner_html strips script/iframe/on*/javascript: post-render.
- app/api/welcome.py: drop get_welcome handler, WelcomeResponse, old
  _VALIDATION_STUB_CONTEXT. Admin endpoints stay at same URLs; validation
  stub updated to match new slim context. Preview now uses autoescape=True.
- app/web/router.py: setup_page calls render_agent_prompt_banner and passes
  banner_html to install.html; admin_agent_prompt_page drops _load_default_template.
- app/web/templates/install.html: add .setup-banner CSS + banner block above hero.
- cli/commands/analyst.py: replace _generate_claude_md with _init_claude_workspace;
  no CLAUDE.md written, only .claude/CLAUDE.local.md placeholder + settings.json hooks.
- tests: delete test_cli_analyst_welcome.py (tests deleted endpoint/function);
  rewrite TestGenerateClaudeMd → TestInitClaudeWorkspace; update api test to
  assert /api/welcome returns 404 and remove welcome-fetch tests.
2026-05-03 16:12:13 +02:00
ZdenekSrotyr
60386b9c3c polish: drop dead CSS, fix docstring drift, add agent-prompt route test 2026-05-03 16:12:13 +02:00
ZdenekSrotyr
ecb6c35ad5 feat(admin): rename /admin/welcome to /admin/agent-prompt (Agent Setup Prompt)
Rename the welcome prompt editor from /admin/welcome to /admin/agent-prompt
and update all UI labels to "Agent Setup Prompt". API endpoint URLs are
unchanged (PUT/GET/DELETE /api/admin/welcome-template, GET /api/welcome).

- Nav menu: "Welcome prompt" → "Agent Setup Prompt", href updated
- Page title and h2 updated in admin_welcome.html
- Error message hint in app/api/welcome.py updated to /admin/agent-prompt
- Dashboard: replace inline <details> preview of _claude_setup_instructions
  with a simple link to /setup (Task C)
- docs/welcome-template.md renamed to docs/agent-setup-prompt.md; internal
  references to /admin/welcome updated
- OpenAPI snapshot path updated
- Tests updated to reflect new route and removed inline preview
2026-05-03 16:12:13 +02:00
ZdenekSrotyr
c7b14fb120 feat(admin): drop setup_banner feature; consolidate into single editor
Remove the setup_banner feature (admin-editable /setup page banner) and
all associated code: API router, repository, renderer, admin template,
tests, and docs. The setup_page handler no longer calls render_setup_banner;
the install.html template no longer renders banner_html. The setup_banner
DuckDB table (v22) is kept intact for forward-compat with already-migrated
instances — only the application code is removed.

CHANGELOG updated: setup_banner bullets removed; Agent Setup Prompt
(welcome-template feature) now stands alone as the single editable prompt.
2026-05-03 16:12:13 +02:00
ZdenekSrotyr
b0ec842804 feat(admin-ui): SRI + CDN fallback for CodeMirror, 301→302 on /install, error sanitization
- Add integrity= + crossorigin= to all 4 cdnjs tags in admin_welcome.html
  and admin_setup_banner.html (I-1)
- Add graceful CDN fallback: when CodeMirror is undefined (SRI mismatch or
  CDN down), degrade to styled plain textarea with polyfill editor interface
  so save/reset/preview still work (I-1)
- Replace fixed 480px editor height with calc(100vh - 320px) for
  viewport-relative sizing; add min-height: 480px to .welcome-editor-col (M-8)
- Change /install redirect from 301 to 302 to prevent indefinite browser
  caching (I-5)
- Sanitize Jinja2 error detail in /api/welcome 500 response: log full error
  server-side, return generic detail pointing at /admin/welcome (M-7)
- Hoist build_context import to module level in app/api/welcome.py (M-11)
2026-05-03 16:12:13 +02:00
ZdenekSrotyr
39146288e1 feat: admin-editable setup_banner on /setup page (schema v22)
Adds an optional Jinja2/HTML banner displayed above the bootstrap
commands on /setup. Empty by default; admin authors it at
/admin/setup-banner. autoescape=True — safe for HTML context.
Render failures return "" so a broken banner never breaks /setup.

Schema v22: setup_banner singleton table, auto-migration v21→v22.
2026-05-03 16:12:13 +02:00
ZdenekSrotyr
ecaa113c68 fix(admin-welcome): credentials: include, real-content preview, refresh after mutate 2026-05-03 16:10:48 +02:00
ZdenekSrotyr
93b713900b fix(api): validate template render on PUT; broaden render-time catch 2026-05-03 16:10:48 +02:00
ZdenekSrotyr
0d1ecd235d feat(api): /api/welcome + /api/admin/welcome-template endpoints 2026-05-03 16:10:48 +02:00
ZdenekSrotyr
91caefaca9
security(auth): per-IP rate limit + last-admin guard (#165)
* security(auth): per-IP rate limit on auth endpoints + generalize last-admin guard

Closes #45 and #151.

#45 — every auth endpoint was unthrottled (login, magic-link, token,
bootstrap), leaving us open to password brute-force and SMTP
email-bombing. Wires slowapi (new dep) into the middleware chain with
per-route limits: 10/min on login + token, 5/min on send-link, 3/min on
bootstrap. Returns 429 with Retry-After: 60 once exceeded. Per-IP key
respects the leftmost X-Forwarded-For hop (Caddy in front of the app
strips client-supplied XFF). Operator escape hatch:
AGNES_AUTH_RATELIMIT_ENABLED=0. Test suite disables the limiter via
autouse conftest fixture so existing auth tests that hammer endpoints
in tight loops are unaffected.

#151 — DELETE /api/admin/users/{id}/memberships/{group_id} and the
mirror DELETE /api/admin/groups/{group_id}/members/{user_id} only
guarded against self-removal as last admin. Generalizes to refuse
removing anyone from the seeded Admin group when they are the only
remaining active admin (mirrors the existing
count_admins(active_only=True) <= 1 check on delete_user / update_user).
Recovery from zero admins requires direct DB access, so this closes
a path where a scheduler/bootstrap actor that bypasses normal admin
checks could otherwise empty the group.

* security(auth): throttle remaining email-bombing + token-confirm endpoints

Address code-review gap on PR #165 — the first commit covered /send-link
but missed two endpoints with the IDENTICAL email-bombing surface:

- POST /auth/password/reset       — sends reset mail, anti-enum response
- POST /auth/password/setup/request — sends setup mail, anti-enum response

Both now share the 5/min limit with /send-link.

Also add 10/min to the token-confirm surfaces — high-entropy tokens but
partial leaks via logs / referer have surfaced before, and unbounded
guess rate would let an attacker exhaust the keyspace adjacent to a
leaked prefix:

- POST /auth/email/verify
- GET  /auth/email/verify         — closes the click-through bypass
- POST /auth/password/reset/confirm
- POST /auth/password/setup/confirm

Doc fix: rate_limit.py module docstring + CHANGELOG entry no longer
claim "disable without a redeploy" (misleading). The Limiter constructor
freezes `enabled` from env at import time, matching every other Agnes
env knob — operators set the flag and bounce the container.

Tests: 4 new cases in test_auth_rate_limit.py covering
/reset, /setup/request, /reset/confirm, GET /verify. Full suite:
2583 passed, 32 skipped, 0 failed.

* security(auth): throttle JSON /auth/password/setup — closes form-throttle bypass

Second code-review pass on PR #165 caught a fifth gap: POST /auth/password/setup
(JSON variant, kept for backward compat) consumes the same setup_token as
the web form /setup/confirm but was unthrottled — an attacker brute-forcing
the token just switches from the form path to the JSON path and resumes
at unbounded RPS. Apply the same 10/min limit and signature shape used
on /setup/confirm.

Also extend CHANGELOG note about the JSON-variant bypass for future
operators reading the security entry.

Test: 1 new case (test_password_setup_json_rate_limited_after_10_requests),
9 rate-limit tests + 28 password-flow tests + 41 auth-provider tests pass,
no regressions.

* chore(release): cut 0.30.1 — auth security hardening (rate limit + last-admin guard)
2026-05-02 21:08:33 +02:00
ZdenekSrotyr
dc03837a7b feat(query-api): better error message when --remote query references a materialized-but-not-rebuilt id
E2E sub-agent finding: `da query --remote "SELECT * FROM <id>"` against a
materialized table that hasn't yet been rebuilt in the server's
analytics.duckdb returns a confusing DuckDB "Table does not exist"
message even though the table is in the registry. Materialized rows
produce parquets at `${DATA_DIR}/extracts/<source>/data/<id>.parquet`,
but the orchestrator's master-view creation is `_meta`-driven — fresh
instances or pre-tick states have the registry row without a
corresponding view, so analysts hit the bare "does not exist" with no
path forward.

Improve the error rendering in `app/api/query.py:execute_query`. When
DuckDB raises a "table does not exist" error, scan the registry for any
`query_mode='materialized'` row whose id or name appears in the failed
SQL. On a hit, return a 400 whose detail names the table, explains the
materialize state, and offers two concrete next steps:

1. Run `da sync` (or wait for the scheduler tick / hit
   POST /api/sync/trigger) to materialize the parquet, OR
2. Query the source directly via the catalog alias when the registry row
   carries bucket+source_table (e.g. `bq."dataset"."table"` for BigQuery,
   `kbc."bucket"."table"` for Keboola).

Detection is bounded — the registry round-trip only fires when DuckDB's
error mentions a missing table, so happy-path queries pay no cost.
Non-materialized unknowns fall through to DuckDB's raw error.

2 new tests: materialized id surfaces the hint with the bucket+source_table
payload; unknown table falls back to the generic error path with no false
positive on the new hint.
2026-05-01 23:09:52 +02:00
ZdenekSrotyr
8030a867ec fix(admin-api): keep source_type validator permissive when primary is 'local' (bootstrap)
The strict source_type-availability validator from the prior commit
broke ~12 existing tests that register tables on the default test
instance (where `data_source.type` resolves to 'local' since no
instance.yaml is loaded).

The intent of the validator is to catch *explicit* misconfig:
`type=bigquery` instance + `source_type=keboola` payload with no
`data_source.keboola.*` block. The bootstrap workflow — admin sets up
a fresh instance and registers a few tables before pointing at a real
source — should not be gated here.

Loosen the check: when `get_data_source_type()` returns 'local' (the
fallback when no `data_source.type` is set), skip the rejection. The
explicit mismatch case still 422s because that path resolves
`configured_primary` to a real source type.

Also adds an autouse keboola_instance fixture to test_journey_sync_query.py
which exercises Keboola registrations through the full sync→query
flow — the fixture documents the test's data-source assumption rather
than relying on the bootstrap escape hatch.
2026-05-01 23:09:15 +02:00
ZdenekSrotyr
bc3ba0d43d feat(admin-api): reject register-table for source_type not configured on instance
E2E sub-agent finding: instance configured with `data_source.type='bigquery'`
and no `data_source.keboola.*` block. Admin POSTs `{source_type: 'keboola'}`
to /api/admin/register-table → returns 201, row lands in the registry, but
never syncs because the scheduler has no Keboola URL/token to ATTACH
against. Operator only notices the gap when `da catalog` keeps showing
nothing.

The new `_validate_source_type_configured` helper runs immediately after
the id/view-name collision checks in `register_table`. A source_type is
considered configured when:

- it matches `get_data_source_type()` (the instance's primary), OR
- a non-empty `data_source.<source_type>` block exists in the effective
  `instance.yaml` (multi-source instance), OR
- it's in `_SOURCE_TYPES_INDEPENDENT_OF_DATA_SOURCE` (Jira / local — both
  get data through paths that don't involve `data_source.*`).

Returns 422 with a message that names the configured primary source and
points at `/admin/server-config` for enabling a secondary one. None /
empty source_type is still tolerated for backward compat with legacy CLI
scripts that don't set the field — the route resolves it later.

5 new tests cover: keboola-on-bq rejected, bq-on-keboola rejected,
matching source_type still works, jira allowed regardless, omitted
source_type passes through.

Existing tests that registered Keboola rows on the unconfigured default
test instance now opt into a `keboola_instance` fixture to satisfy the
new validator (tests/test_admin_bq_register.py + .keboola_materialized
+ .unregister_cleanup; the multi-source PUT test in test_admin_bq_register
adds a `keboola` block to its synthetic config).

Pre-existing test_missing_project_returns_error failure in
TestRebuildFromRegistry is unrelated (config-cache leakage from a
previous test in the same class) — confirmed pre-existing on the prior
commit via `git stash` reproduction.
2026-05-01 23:04:51 +02:00
ZdenekSrotyr
dd46461c6c fix(admin+orchestrator): DELETE registry drops parquet + sync_state; rebuild skips orphan parquets
E2E sub-agent finding: register a materialized BQ row → sync to materialize
the parquet at `/data/extracts/bigquery/data/<id>.parquet` → DELETE the
registry row. The DB row goes away but:

- the parquet file stays on disk forever, AND
- the sync_state row stays, so `/api/sync/manifest` keeps advertising the
  dropped table to `da sync`, AND
- the orchestrator's next rebuild can resurrect a master view by picking
  up the leftover parquet.

Two-part fix in `unregister_table`:

1. For materialized rows on bigquery/keboola, remove
   `${DATA_DIR}/extracts/<source_type>/data/<name>.parquet` (and any stale
   `<name>.parquet.tmp` from a crashed prior materialize). Filename is
   keyed on `table_registry.name` to match sync_state bookkeeping.
   File-removal errors are logged but don't fail the DELETE — the registry
   row is already gone, and an orphan parquet won't get a master view at
   next rebuild because the orchestrator's _meta-driven scan never picks
   up bare parquet files.

2. Always clear `sync_state` + `sync_history` rows for the dropped table_id
   so the manifest stops advertising the table — applies to all source
   types and modes, not just materialized, since any synced row had a
   sync_state entry.

Orchestrator-side defensive guard (Finding 2b) is a no-op in the current
implementation: `_attach_and_create_views` only creates master views from
`_meta` rows in each connector's `extract.duckdb`, so a parquet without a
matching `_meta` entry is already invisible to the rebuild. The new
test `test_orchestrator_skips_orphan_parquet_in_extracts` is kept as a
regression guard for that contract.

5 tests cover: BQ + Keboola materialized DELETE removes parquet, remote
DELETE doesn't error trying to remove a non-existent file, sync_state
cleared on DELETE, orchestrator orphan-skip invariant.
2026-05-01 22:54:11 +02:00
ZdenekSrotyr
f0979f997a fix(admin-api): reject backtick BQ-native source_query at register; surface materialize errors per-row
E2E testing showed admin POSTs of materialized BQ rows whose source_query
uses BigQuery-native backtick identifiers (`prj.ds.t`) silently no-op'd at
the next sync tick — the materialize path runs the SQL through the DuckDB
BQ extension's COPY which uses DuckDB's parser; backticks aren't recognized
and the query either parse-errors or matches zero rows. No parquet lands at
the canonical path and no error reaches an operator-visible surface.

Two-part fix:

1. RegisterTableRequest's _check_mode_query_coherence model_validator now
   rejects any source_query containing a backtick with a 422 + actionable
   message pointing at the DuckDB equivalent (bq."dataset"."table"). Same
   check is applied in update_table on the merged record so PATCHes that
   flip a stored source_query to backtick form are also caught. Covers BQ
   AND Keboola materialized rows since both connectors funnel source_query
   through DuckDB's COPY.

2. _run_materialized_pass now persists per-row failures via the new
   SyncStateRepository.set_error / clear_error methods (existing
   sync_state.error / status columns — no schema migration). GET
   /api/admin/registry enriches each row with `last_sync_error` from a
   single batched SELECT against sync_state, so the admin UI / da admin
   status can show "this table failed last sync because: X" instead of
   operators having to trawl scheduler logs. Recovered rows have the
   error cleared automatically — update_sync's success path resets
   status='ok' / error=NULL on the upsert.

The materialized-path test fixture's _materialized_payload helper is
updated to use DuckDB-flavor SQL (the prior backtick example pre-dated the
fix). 6 new tests cover register/update rejection on BQ + Keboola, the
sync_state error persistence, and the registry response surface.
2026-05-01 22:51:02 +02:00
ZdenekSrotyr
a4339ce679 fix(admin+diagnose): address 2 additional Devin Review findings on PR #152
Devin's second review pass on commit 16938ae7 surfaced 2 more issues:

BUG_pr-review-job-58ae3148_0001 — non-BQ materialized via PUT bypasses source_query check
  app/api/admin.py update_table only enforces 'query_mode=materialized
  requires source_query' for source_type='bigquery' rows (via the
  synthetic RegisterTableRequest at line 2129+). Non-BQ source types
  (Keboola) skip the check — admin could PUT {query_mode: materialized}
  on a Keboola local row without source_query, persist successfully,
  then crash at the next sync tick when kb_materialize_query received
  sql=None and DuckDB rejected COPY (None) TO '...'.
  Fix: generic coherence guard before the BQ-specific block — for ALL
  source types, query_mode='materialized' requires non-empty source_query
  in the merged record. Returns 422 with a hint about reverting via
  query_mode='local'/'remote'.

ANALYSIS_pr-review-job-642ff90f_0007 — diagnose returns 'ok' on BQ resolution failure
  app/api/health.py:_check_bq_billing_project caught get_bq_access()
  exceptions and returned status='ok' with a 'could not resolve' detail.
  Automated alerting keyed on status != 'ok' would silently miss missing
  google-cloud-bigquery, auth failures, or malformed config. Fix: return
  status='unknown' on resolution failure — surfaces it on operator
  dashboards without promoting the overall health to 'degraded' (which
  'warning' does, intentionally for the billing==project case).

Tests:
  - test_update_keboola_to_materialized_without_source_query_rejected:
    PUT {query_mode: materialized} on a Keboola local row returns 422
    with 'source_query' in the detail
  - test_diagnose_returns_unknown_status_when_bq_resolution_fails:
    when get_bq_access raises, the bq_config service entry surfaces
    status='unknown' (not 'ok')

Full sweep: 2507 passed, 25 skipped, 0 failed (+2 from previous sweep
because of the 2 new regression tests; 8 pre-existing internal_roles
schema-migration failures still ignored per task brief).
2026-05-01 21:21:23 +02:00
ZdenekSrotyr
16938ae7cb fix(materialized): address 4 Devin Review findings on PR #152
Devin Review on commit 7052a235 flagged 4 real bugs in the Keboola
materialized path. All four are fixed; 3 new regression tests pin the
behavior so future refactors can't quietly regress.

BUG_pr-review-job-3fbd31c9_0001 — _run_materialized_pass gated behind 'if bq_project:'
  app/api/sync.py:444-466 wrapped the entire materialized pass (which
  dispatches BOTH BigQuery AND Keboola rows by source_type) in a check
  for data_source.bigquery.project being non-empty. On Keboola-only
  instances this short-circuited and Keboola materialized rows sat in
  table_registry forever without their SQL being evaluated — the feature
  CHANGELOG advertised was dead code on the most common deployment shape.
  Fix: always run the materialized pass; the BQ branch's per-row try/except
  catches the typed BqAccessError(not_configured) the sentinel raises
  when no BQ project is set, so non-BQ instances incur a per-row error
  for any (hypothetical) BQ-tagged row but the Keboola path runs cleanly.
  Log line renamed 'Materialized BQ' → 'Materialized SQL' to match.

BUG_pr-review-job-3fbd31c9_0004 — wrong config key 'url' instead of 'stack_url'
  app/api/sync.py:149 read get_value('data_source', 'keboola', 'url'),
  but the canonical config key documented in instance.yaml.example:111
  and used by app/api/admin.py:1503 + 2359 is 'stack_url'. Production
  Keboola instances would always see an empty URL and fail with the
  'not configured' error. The pre-existing test patched the wrong key
  too, so it passed without catching the mismatch. Fix: use stack_url
  in both sync.py and the test fixture.

BUG_pr-review-job-3fbd31c9_0003 — no atomic write in Keboola materialize_query
  connectors/keboola/extractor.py wrote COPY directly to the final
  '<id>.parquet' path. A mid-COPY failure (network, disk full, extension
  crash) left a partial parquet that the orchestrator rebuild would
  later pick up and serve to analysts. BQ's materialize_query already
  uses a '<id>.parquet.tmp' staging path + os.replace() atomic swap
  (connectors/bigquery/extractor.py:370-445); Keboola now mirrors that
  pattern with the same try/except cleanup on COPY failure.

BUG_pr-review-job-3fbd31c9_0002 — full file read into memory for MD5
  Same file:60-62 used parquet_path.read_bytes() for the MD5 hash.
  Multi-GB Keboola materialized results would OOM on memory-constrained
  containers. BQ's version uses streaming 8 KiB-chunk hashing
  (connectors/bigquery/extractor.py:438-442); Keboola now mirrors it.

Tests:
  - test_run_sync_runs_materialized_pass_on_keboola_only_instance —
    pins BUG_0001's fix; setting bigquery.project='' must NOT skip
    Keboola materialized dispatch
  - test_keboola_materialize_atomic_write_on_failure — pins BUG_0003;
    a mid-COPY RuntimeError leaves no .parquet AND no .parquet.tmp at
    the canonical path
  - test_keboola_materialize_uses_tmp_path_during_copy — documents the
    atomic-write contract: COPY targets .parquet.tmp, final swap to
    .parquet (no .tmp suffix on the result['path'])
  - existing test_run_materialized_pass_dispatches_keboola_to_keboola_extractor
    fixture updated: stack_url instead of url

Full sweep: 2505 passed, 25 skipped, 0 failed (modulo 8 pre-existing
internal_roles schema-migration failures called out in the task brief).
2026-05-01 20:58:17 +02:00
ZdenekSrotyr
b627de8344 feat(diagnose) + docs: warn on USER_PROJECT_DENIED footgun + document all newly-exposed knobs
Diagnostic + operator-facing documentation that closes the loop on the work in this PR.

`da diagnose` (via /api/health/detailed):
  - New _check_bq_billing_project() helper. When data_source.type='bigquery' and BqProjects.billing == .data, surface a yellow warning: 'BigQuery billing project equals data project'. Hint includes the YAML field path + the /admin/server-config UI shortcut. Diagnose's overall status promotes warning → degraded so the CLI echoes it.
  - Non-BQ instances (Keboola-only, etc.) skip the check.
  - Implementation hooks into the existing /api/health/detailed surface — no new endpoint, no CLI changes.

config/instance.yaml.example documentation:
  - data_source.bigquery.billing_project: USER_PROJECT_DENIED hint, /admin/server-config UI reference
  - data_source.bigquery.legacy_wrap_views: analyst-side discipline note (use `da fetch` / `da query --remote`), issue #101 history, view-heavy deployment guidance
  - data_source.bigquery.max_bytes_per_materialize: cost guardrail block (NEW — wasn't documented in .example before)
  - ai.base_url: provider list + UI hint
  - openmetadata + desktop: 'configurable via /admin/server-config UI' headers
  - corporate_memory: leading note that the schema is editable via UI

Other docs:
  - CHANGELOG.md: comprehensive Unreleased section
  - CLAUDE.md: schema chain → v20 + Materialized SQL connector mode + per-connector tab UI mention
  - README.md: mode-first source table summary
  - docs/architecture.md: per-connector tab UI mention
  - cli/skills/connectors.md: bootstrap rails (parallel to #154)
  - docs/superpowers/plans/2026-05-01-admin-tables-form-cleanup.md: implementation plan archive (2515 lines)
  - scripts/seed_dummy_tables.py: drop is_public after #150 RBAC migration (column gone)

Tests:
  - test_diagnose_billing.py — 3 cases (BQ with billing==data warns, BQ with billing!=data clean, non-BQ skips)
2026-05-01 20:27:24 +02:00
ZdenekSrotyr
85d3810535 feat(materialized): query_mode='materialized' for BigQuery + Keboola — admin SELECT → parquet → analyst
Closes the 'admin pre-stages a curated table/view for analysts' use case end-to-end across both supported source connectors.

Backend (BigQuery + Keboola, schema v20):
  - schema v20 adds source_query TEXT to table_registry (renumbered from v19 after main's #150 RBAC migration also bumped to v19)
  - connectors/bigquery/extractor.py adds materialize_query(table_id, sql, *, bq, output_dir, max_bytes=...) — BqAccess session, dry-run cost guardrail (default 10 GiB, configurable via data_source.bigquery.max_bytes_per_materialize), idempotent ATTACH, rows/bytes/md5 metadata for sync_state
  - connectors/keboola/access.py — new KeboolaAccess facade (parallel of BqAccess) wrapping ATTACH 'keboola://...' AS kbc
  - connectors/keboola/extractor.py adds materialize_query — same shape, no dry-run analog (Keboola Storage API has different cost model); legacy bucket-download path skips query_mode='materialized' rows
  - app/api/sync.py:_run_materialized_pass dispatches by source_type to the right materialize_query
  - app/api/admin.py: RegisterTableRequest accepts source_query; model_validator coheres mode↔source_query↔bucket; PUT preserves omitted fields; deprecation marks (Field(deprecated=True)) on sync_strategy + profile_after_sync (no extractor reads them; profile_after_sync becomes inert — bug from earlier work where /api/sync/trigger never honored the flag); _BQ_OPTIONAL_FIELD_DEFAULTS injects defaults into GET /server-config payload

Operator + CLI surface:
  - da admin register-table --query / --query-mode materialized
  - scripts/smoke-test-materialized-bq.sh — end-to-end smoke for operators

Tests (incl. spike + integration + regression):
  - test_db_migration_v20, test_table_registry_source_query
  - test_bq_materialize, test_bq_cost_guardrail, test_bq_init_extract_skips
  - test_keboola_access, test_keboola_extension_query_passthrough (lock-in for the DuckDB extension capability), test_keboola_materialize, test_keboola_init_extract_skips, test_keboola_materialized_e2e (skipped without KBC_TEST_* creds)
  - test_sync_trigger_materialized, test_sync_trigger_keboola_materialized
  - test_api_admin_materialized, test_cli_admin_materialized
  - test_admin_bq_register, test_admin_discover_bigquery, test_admin_keboola_materialized, test_admin_phase_c_deprecation, test_admin_put_preservation, test_materialized_e2e

Cost: BQ uses bigquery_query() (jobs API, view-aware) — works on tables, views, materialized views uniformly. Keboola uses ATTACH+COPY parquet through the DuckDB extension.
2026-05-01 20:25:56 +02:00
minasarustamyan
d4ac84dd46
feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150)
* feat(rbac): drop dataset_permissions + access_requests + users.role + is_public; v19 migration

BREAKING. Sjednocení datové RBAC vrstvy do per-group resource_grants modelu.
Před PR byla legacy data RBAC vrstva (dataset_permissions + is_public bypass)
de-facto neaktivní — is_public neměl API/UI/CLI surface, default true znamenal
že can_access_table vždycky bypassl. Dnes každý non-admin přístup vyžaduje
explicitní resource_grants(group, "table", id) řádek.

Schema v18 → v19 (src/db.py:_v18_to_v19_finalize):
- DROP TABLE dataset_permissions, access_requests
- DROP COLUMN users.role (NULL artifact since v13)
- DROP COLUMN table_registry.is_public
- Drops přes table-rebuild idiom (rename → create new → INSERT … SELECT
  → drop old) kvůli DuckDB ALTER DROP COLUMN limitacím na tabulkách
  s historic FK constraints. INSERT picks intersection sloupců, takže
  test fixtures s minimal pre-v19 schemou migrate cleanly.

Runtime:
- src/rbac.py:can_access_table → deleguje na app.auth.access.can_access
- DatasetPermissionRepository, AccessRequestRepository smazány
- AGNES_ENABLE_TABLE_GRANTS env-gate v app/resource_types.py odstraněn
  (TABLE je unconditionally enabled)

API drop:
- app/api/permissions.py, app/api/access_requests.py celé soubory
- /admin/permissions web route + admin_permissions.html
- "Request Access" modal v catalog.html + locked-row UI
- ~10 if user.get("role") != "admin" checků nahrazeno (admin shortcut
  je uvnitř can_access_table)
- /api/settings: drop permissions field z GET; PUT /api/settings/dataset
  gate přepnut na can_access(user_id, "table", dataset, conn)

Auth:
- app/auth/jwt.py:create_access_token: drop role parametr (claim zmizí
  z nově vydávaných JWT; staré tokeny zůstávají valid, claim ignored)
- app/api/users.py: drop role z CreateUserRequest / UpdateUserRequest
  (admin promotion = explicit add to Admin group via memberships API)
- src/repositories/users.py: drop role z create() / update()

CLI:
- da admin set-role smazán → hard-fail s replacement command
- da admin add-user --role flag pryč
- da auth import-token --role flag pryč
- da auth whoami: drop "Role:" výpis
- cli/config.py:save_token: role parametr now optional, no longer written
  (back-compat se starými token.json soubory zachována — pole se ignoruje)

Tests:
- DELETE: test_permissions.py, test_permissions_api.py, test_access_requests_api.py
- REWRITE: test_access_control.py (resource_grants flow), test_rbac.py
  (can_access_table over resource_grants), test_journey_rbac.py
  (drop access-request flow), test_resource_types.py (drop env-gate
  tests, drop is_public from helpers), test_v2_*.py (drop role-based
  user dicts in favor of id-based + Admin group membership),
  test_settings_api.py (no permissions field, can_access gate)
- TRIVIAL: ~30 souborů — drop role="admin" arg z UserRepository.create
  a 3rd positional role z create_access_token
- NEW: test_v18_to_v19 migration test (test_db.py),
  test_can_access_table_no_implicit_public (test_rbac.py),
  test_admin_set_role_returns_hardfail (test_cli_admin.py)
- OpenAPI snapshot regenerated

Docs:
- CHANGELOG: BREAKING entry pod [Unreleased]
- CLAUDE.md: schema v18 → v19
- docs/architecture.md: schema table + RBAC sekce přepsána
- docs/auth-google-oauth.md: admin promotion přes da admin break-glass
- cli/skills/security.md: kompletně přepsáno na group-based model
- docs/TODO-rbac-data-enforcement.md: smazáno (TODO splněn)

Test results: 2363 passed, 19 failed. Zbývající failures jsou pre-existing
Windows-specific issues (fcntl, charset) nesouvisející s tímto PR —
ověřeno git stash pop.

Plan: ~/.claude/plans/floofy-coalescing-parnas.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): cut 0.27.0

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-04-30 22:02:16 +02:00
minasarustamyan
fb1573766a
feat(admin): users/groups UI polish + SSO lock + v18 migration (#142)
Cuts release 0.24.0.

## Highlights
- SSO-managed accounts read-only for password / delete operations (UI + API). New `is_sso_user` flag derived from group memberships.
- Admin/Everyone system rows show `google_sync` chip + Workspace email subtitle when env-mapped.
- Origin pill vocabulary unified across `/admin/groups`, `/admin/access`, `/admin/users`, `/admin/users/{id}`, `/profile` (Admin yellow, Everyone gray, google_sync green, custom purple).
- Effective-access readout no longer short-circuits for admin users — always renders per-resource breakdown.
- Schema migration v18 drops stranded non-google memberships in env-mapped Admin/Everyone groups (cleans up v13's blanket Everyone backfill).

## Devin findings addressed
- _is_sso_user requires source='google_sync' on system-group branches (so v13 system_seed memberships in env-mapped Everyone don't lock out the admin).
- POST add-to-group returns correct origin via _derive_origin (matching GET).
- 8 customer-specific token instances (groupon.com / foundryai) replaced with vendor-neutral placeholders across templates, tests, and CHANGELOG.
- deriveDisplayName name-skip for canonical "Admin"/"Everyone" so an overlapping AGNES_GOOGLE_GROUP_PREFIX doesn't mangle the chip text.

See CHANGELOG [0.24.0] for full notes.
2026-04-30 15:16:04 +02:00
ZdenekSrotyr
70672204fe
feat(memory): admin Edit + MEMORY_DOMAIN RBAC + ai-section UI (#141)
Cuts release 0.23.0.

## Highlights
- Single-item Edit button on every memory item card (modal hits PATCH /api/memory/admin/{id}).
- MEMORY_DOMAIN RBAC resource type — admins grant user_groups access to specific domains via /admin/access. Composes with existing audience filter (OR semantics, no-op when no grants).
- ai: section editable in /admin/server-config — admins can set ANTHROPIC_API_KEY / model / provider / base_url for the corporate-memory extractor without editing instance.yaml directly. api_key auto-masked.

## Devin findings addressed
- Modal NULL→empty fix (audience visibility wouldn't break).
- Stats endpoint granted_domains parity with list endpoint.
- Documented intentional MEMORY_DOMAIN→audience bypass.
- Documented conscious ai.base_url SSRF exclusion (legit internal LiteLLM/vLLM proxies).

See CHANGELOG [0.23.0] for full notes.
2026-04-30 11:04:41 +02:00
ZdenekSrotyr
83adf01bde
fix(v2): #134 BigQuery cross-project errors return structured 502/400 + BqAccess facade (#138)
* docs(spec): #134 unify BigQuery access behind BqAccess facade

Brainstorm output for issue #134. Captures:
- root cause (incl. correction of the issue's hypothesis about commit 33a9964)
- BqAccess facade API + project resolution rules
- error contract — typed BqAccessError mapped to HTTP 502 for upstream
  BQ failures, 500 for deployment/config bugs
- migration plan for v2_scan, v2_sample, RemoteQueryEngine
- test rewrite eliminating _bq_client_factory injection point
- E2E verification protocol on agnes-development as success criterion

* docs(spec): #134 revise after first review

Incorporates code-reviewer findings:

Must-fix:
- Add v2_schema (2 copies of INSTALL/LOAD/SECRET dance) to migration scope.
- Reframe v2_scan headline: missing try/except around BQ calls is the
  actual cause of bare 500s, not project resolution (which 33a9964 fixed).
- List two more deferred call sites (extractor.py, register_bq_table)
  with explicit rationale.

Important:
- Drop billing != data clause from cross_project_forbidden heuristic;
  rely only on 'serviceusage' substring. billing != data is normal
  for cross-project setup, was over-classifying.
- Split bq_bad_request into _user (400) and _server (502) variants;
  add sql_origin parameter to translate_bq_error so call sites declare
  whether SQL contains user input.
- Add @functools.cache to BqAccess.from_config; document tests bypass
  via dependency_overrides.
- Replace monkey-patched-classmethod test pattern with
  BqAccess(client_factory=...) injection at construction time. Cleaner
  than today's _bq_client_factory and 1:1 migration shape.
- Keep BqProjects.data (reviewer assumed registry has source_project;
  it doesn't). Multi-project explicitly listed as non-goal with note.

Nice-to-have:
- Add 'Implementation strategy' section: 2 staged commits (bug fix
  alone is revertable; refactor follows).
- Extend E2E protocol to cover all three endpoints, not just /sample.
- Note removal of stale docstring at src/remote_query.py:204.

* docs(spec): #134 revision 3 — incorporates second-round review

Must-fix from second review:
- v2_schema split into two migration cases: _fetch_bq_schema translates
  errors via translate_bq_error; _fetch_bq_table_options preserves its
  swallow-all 'except Exception → return {}' so /schema doesn't 502 on
  partition-info failures.
- RemoteQueryEngine.__init__ now resolves BqAccess lazily (in
  _get_bq_client, not in __init__). Without this, ~7 DuckDB-only tests
  in test_remote_query.py would suddenly fail with not_configured.
- translate_bq_error pass-through for BqAccessError is now load-bearing
  (clause 1, before any Google-API branch). bq.client() raises BqAccessError
  for bq_lib_missing/auth_failed; without explicit pass-through those
  fall to 'unknown' and re-raise as bare 500.
- Commit 1 now emits the SAME structured response shape as commit 2 to
  avoid contract churn between commits.
- BIGQUERY_PROJECT env-var precedence is BREAKING for env-only deployments
  — flagged in CHANGELOG ### Changed.

Editorial:
- sql_origin renamed to bad_request_status with values 'client_error' /
  'upstream_error' (clearer about what the parameter actually decides).
  bq_bad_request_user/_server kinds collapsed to bq_bad_request (400)
  and bq_upstream_error (502).
- CLI (cli/commands/query.py) noted as external RemoteQueryEngine caller;
  unaffected because new bq_access kwarg has default None.
- Added unit/integration tests for the new contracts:
  test_translate_passes_through_BqAccessError,
  test_v2_scan_returns_500_on_bq_lib_missing,
  test_v2_schema_returns_200_with_empty_partition_on_bq_failure,
  test_resolve_succeeds_after_config_set.
- E2E protocol now covers /schema as the fourth endpoint.
- Documented functools.cache-doesn't-cache-exceptions semantics and
  fixture nullcontext-doesn't-close caveat for nested sessions.

* docs(spec): #134 revision 4 — incorporates third-round review

Third reviewer verdict: 'implementation-ready with two trivial edits';
explicitly noted prior rounds did the heavy lifting.

Edits:
1. get_bq_access() module-level function instead of @classmethod
   @functools.cache from_config. Removes the classmethod-cache stacking
   footgun (different Python versions wrap differently) and gives FastAPI's
   dependency introspection a clean function signature. Drops the
   'Do not subclass BqAccess' caveat that no longer applies.

2. Commit 1 strategy explicitly: wrap _fetch_bq_sample (v2_sample),
   _bq_dry_run_bytes + _run_bq_scan (v2_scan), and _fetch_bq_schema
   (v2_schema strict block). Do NOT touch _fetch_bq_table_options swallow-all
   in commit 1 — preserved as-is, then migrated (still preserved) in commit 2.
   All three endpoints emit the same structured body shape so client parsers
   see one consistent contract throughout the staged rollout. No more
   half-rolled-out window where /sample is bare 500 while /scan is
   structured 502.

* docs(plan): #134 implementation plan — Phase 1 (atomic bug fix) + Phase 2 (BqAccess refactor) + Phase 3 (verification)

Bite-sized TDD tasks. 3 phases, 16 tasks total:

Phase 1 (Commit 1) — atomic bug fix across all four v2 endpoints:
  Tasks 1.1-1.5 wrap _fetch_bq_sample, _bq_dry_run_bytes, _run_bq_scan,
  _fetch_bq_schema with structured 502/400 try/except. _fetch_bq_table_options
  preserved untouched. CHANGELOG Fixed entries.

Phase 2 (Commit 2) — BqAccess facade extraction + migration:
  Tasks 2.1-2.5 build connectors/bigquery/access.py bottom-up
  (BqProjects, BqAccessError, translate_bq_error, default factories,
  BqAccess class, get_bq_access module-level cached). Task 2.6 adds
  conftest.py fixture. Tasks 2.7-2.9 migrate v2_scan, v2_sample, v2_schema
  to BqAccess. Tasks 2.10-2.11 migrate RemoteQueryEngine + tests
  (lazy bq_access, drop _bq_client_factory). Task 2.12 CHANGELOG
  Changed BREAKING + Internal.

Phase 3 — Verification:
  3.1 full pytest. 3.2 squash into two PR-shape commits. 3.3 manual
  E2E on agnes-development per spec protocol → close #134.

Self-review table maps spec sections to implementing tasks; no gaps.

* fix(v2): #134 structured 502/400 on BQ errors across /scan, /scan/estimate, /sample, /schema

Wraps the BigQuery call sites in v2_scan, v2_sample, and v2_schema (strict
block only) with try/except for google.api_core exceptions, translating to
HTTPException with a structured body shape: {error, message, details}.

Fixes Pavel's report (#134) where these endpoints returned bare HTTP 500
with no body when the SA on agnes-development hit cross-project Forbidden
on serviceusage.services.use.

Also fixes /sample's missing billing_project fallback (the bug 33a9964
fixed for /scan never landed here).

Status code split:
  - /scan, /scan/estimate: BadRequest -> 400 (bq_bad_request) since SQL is
    user-derived from req.select/where/order_by.
  - /sample, /schema: BadRequest -> 502 (bq_upstream_error) since SQL is
    server-constructed from validated identifiers.
  - All Forbidden -> 502 with cross_project_forbidden if 'serviceusage' in
    error message (with hint pointing at data_source.bigquery.billing_project),
    else bq_forbidden.

Body shape matches what the upcoming BqAccess refactor (next commit) will
produce, so client-side parsers see one consistent contract throughout
the staged rollout.

_fetch_bq_table_options preserved exactly as-is — its swallow-all-and-return-empty
contract is intentional and survives into the refactor; /schema continues to
return 200 with empty partition info when partition queries fail.

Outer wraps in scan_endpoint, scan_estimate_endpoint, sample, and schema
endpoints exist only to make the test pattern (monkeypatching whole
_fetch_* functions) work, and are tagged TODO(#134 Phase 2) for removal
once BqAccess centralizes translation.

* refactor(bq): #134 BqAccess facade — unify v2_scan, v2_sample, v2_schema, RemoteQueryEngine

Extracts the duplicated BigQuery-access pattern (project resolution +
client construction + DuckDB-extension session + Google-API error
translation) into connectors/bigquery/access.py. Migrates four
call sites to use it:

- app/api/v2_scan.py — _bq_dry_run_bytes, _run_bq_scan
- app/api/v2_sample.py — _fetch_bq_sample
- app/api/v2_schema.py — _fetch_bq_schema (strict translation),
  _fetch_bq_table_options (preserves swallow-all best-effort contract)
- src/remote_query.py — RemoteQueryEngine, lazy bq_access kwarg

The new module exposes:
- BqProjects (frozen dataclass: billing + data project IDs)
- BqAccessError (typed exception with HTTP_STATUS class mapping)
- BqAccess (facade with injectable client_factory/duckdb_session_factory
  for tests; defaults call the real google-cloud-bigquery + DuckDB extension)
- get_bq_access (module-level @functools.cache; FastAPI Depends target)
- translate_bq_error (Google API exception → BqAccessError mapper, with
  BqAccessError pass-through, 'serviceusage'-substring heuristic for
  cross_project_forbidden, and bad_request_status param distinguishing
  user-derived (400) from server-constructed (502) SQL)
- _default_client_factory, _default_duckdb_session_factory

RemoteQueryEngine.__init__ no longer accepts _bq_client_factory; tests
migrate to bq_access=BqAccess(projects, client_factory=...). DuckDB-only
RemoteQueryEngine tests need no changes — bq_access defaults to None and
get_bq_access() is only invoked on first BQ call (lazy resolution).
BqAccessError raised internally is translated to RemoteQueryError(
error_type="bq_error") in _get_bq_client to preserve the engine's
existing public contract — CLI and /api/query/hybrid callers see no change.

Endpoint tests (test_v2_scan, test_v2_scan_estimate, test_v2_sample,
test_v2_schema) migrate from monkey-patching whole _fetch_* functions
to using the new bq_access fixture in tests/conftest.py — which
exercises the REAL translation path through BqAccess + translate_bq_error,
closing the test gap flagged in Task 1.1's review.

Side-effect behavior change: v2_sample's FROM clause now uses the data
project (instance.yaml data_source.bigquery.project), not the conflated
billing_project from Phase 1. Documented in CHANGELOG ### Internal.

BREAKING for deployments combining BIGQUERY_PROJECT env var with
data_source.bigquery.project in instance.yaml — env var now overrides
data project too. See CHANGELOG ### Changed.

Two known-duplicate BQ-access sites (connectors/bigquery/extractor.py,
scripts/duckdb_manager.register_bq_table) explicitly out of scope;
tracked as follow-up.

Removed stale docstring at the previous src/remote_query.py:204
that referenced scripts.duckdb_manager._create_bq_client as the default
BQ client factory (RemoteQueryEngine never actually used that function).

Test counts: tests/test_bq_access.py +27 (new), tests/test_v2_*.py +
tests/test_remote_query.py migrated to bq_access fixture (counts unchanged
or +1-2 per file). Full suite: 2086 passed, 8 pre-existing failures
(DB migration tests with unrelated internal_roles DependencyException —
not introduced by this PR).

* fix(bq_access): translate DefaultCredentialsError to BqAccessError(auth_failed)

CI on PR #138 caught: bigquery.Client(...) resolves Application Default
Credentials at construction time; without ADC (CI without SA key, dev
laptop without 'gcloud auth application-default login') it raises
google.auth.exceptions.DefaultCredentialsError synchronously.

Pre-fix _default_client_factory only caught ImportError, so DefaultCredentialsError
propagated as raw exception — and from production endpoints would surface
as bare 500 (the exact failure mode #134 sets out to fix).

Now translates to BqAccessError(kind='auth_failed', details.hint='Run
gcloud auth application-default login...'). Endpoint catch chain returns
HTTP 502 with structured body. Adds unit test
test_raises_auth_failed_on_default_credentials_error.

Third-round spec review flagged this case in passing; the fix didn't land.
CI's auth-less environment surfaced it.

* fix(bq_access): get_bq_access() returns sentinel instead of raising when not configured

Devin BUG_0001 on PR #138 review: 'get_bq_access() as FastAPI Depends
breaks all v2 endpoints for non-BigQuery instances'.

Pre-fix: get_bq_access() raised BqAccessError(not_configured) when
neither BIGQUERY_PROJECT env nor data_source.bigquery.project was set.
Because FastAPI resolves Depends() BEFORE the endpoint body runs, this
exception fires during dep-injection — the endpoint's try/except
BqAccessError clause never gets a chance to catch it. Result: every
v2 request on Keboola-only or CSV-only instances returned bare HTTP
500, even for local-source tables that never touch BigQuery.

Fix: get_bq_access() now returns a sentinel BqAccess with empty
BqProjects and factories that raise BqAccessError(not_configured)
on actual use. Construction succeeds, FastAPI's dep-injection cleanly
yields the sentinel, the endpoint runs. The local-source code path
in build_sample / build_schema / etc. never calls bq.client() or
bq.duckdb_session() (it reads parquet directly), so non-BQ tables
return 200 as before. Only when an endpoint actually tries to query
BQ (source_type == 'bigquery') does the sentinel raise — and the
endpoint's existing except BqAccessError catches it normally,
returning structured 502 with hint.

Test get_bq_access::test_raises_not_configured_when_neither_set
renamed and rewritten to test_returns_sentinel_when_neither_set:
asserts BqAccess is returned, then asserts client() and
duckdb_session() each raise BqAccessError(not_configured) on call.

Test test_does_not_cache_exceptions removed (no longer applicable)
and replaced with test_sentinel_is_cached_per_process documenting
the operator-restart-on-config-change contract.

* docs(spec+plan): #134 genericize customer-specific tokens (CLAUDE.md OSS rule)

Devin BUG_0001/0002 round 3 on PR #138: spec and plan docs contained
customer-specific deployment hostnames, deployment names, and a GCP
project ID that violated CLAUDE.md's vendor-agnostic OSS rule
('Nothing customer-specific belongs in code, configuration defaults,
comments, docs, commit messages, PR titles, or PR bodies').

Replacements:
  agnes-development.groupondev.com -> <your-agnes-host>
  agnes-development                -> <your-dev-instance>
  prj-grp-dataview-prod-1ff9       -> <your-data-project>
  s1_session_landings              -> <bq_table_id>

E2E verification semantics unchanged — operators still run the same
four curls + config flip + retry, just substituting their own host /
deployment name / project / table.

* fix(bq_access): hook get_bq_access.cache_clear into instance_config.reset_cache

Devin ANALYSIS_0004 on PR #138: get_bq_access is @functools.cache'd at
process level, so it captures BigQuery project IDs at first call and
ignores subsequent instance.yaml changes. Pre-Phase-2 the v2 endpoints
re-read get_value() on every request, so admin /api/admin/server-config
saves (which call instance_config.reset_cache()) hot-reloaded the BQ
project. Without this fix, my refactor silently regresses that contract
— operators editing instance.yaml via the admin UI would see no effect
on v2 endpoints until container restart.

instance_config.reset_cache() now also calls
connectors.bigquery.access.get_bq_access.cache_clear() (lazy import,
swallowed if connectors module isn't loaded — keeps instance_config
usable in isolated unit tests).

Adds test_instance_config_reset_cache_invalidates_get_bq_access as
regression guard. Updates CHANGELOG Internal entry to mention the
hot-reload contract + the not-configured sentinel behavior (round-3
fix from Devin BUG_0001 was previously only in commit message).

* fix(bq_access): surface not_configured before identifier validation + plan path genericize

Devin BUG_0001 + BUG_0002 round 5 on PR #138.

BUG_0001 (plan doc): personal filesystem path violated CLAUDE.md
vendor-agnostic rule. Replaced with '<worktree-root>' placeholder.

BUG_0002 (sentinel error path): when get_bq_access() returns the sentinel
BqAccess (BQ not configured), the empty bq.projects.data was reaching
validate_quoted_identifier first and raising ValueError -> endpoint
mapped to HTTP 400 'unsafe_identifier' instead of structured 500
'not_configured' with hint.

Each fetch helper now checks 'if not bq.projects.data: bq.client()' as
the first step, which triggers the sentinel's BqAccessError(not_configured).
Endpoint catches the typed error and returns HTTP 500 with hint pointing
at data_source.bigquery.project. Best-effort _fetch_bq_table_options
returns {} silently in this case (preserves the swallow-all contract).

* fix(bq_access): classify DuckDB-native exceptions from bigquery_query() via string match

Devin ANALYSIS on PR #138 review (latest round). The DuckDB bigquery
extension is a C++ plugin making its own HTTP calls — when BQ returns
403, it throws duckdb.IOException with the BQ error embedded as text,
not gax.Forbidden. translate_bq_error's isinstance checks would miss
these, falling to case 7 → bare 500 in production for v2_scan, v2_sample,
and v2_schema (the bigquery_query() paths).

Fix: last-resort string-match heuristic before the re-raise. 'Forbidden'
/ '403' / 'Bad Request' / '400' in the lowercased message classifies via
the same kind hierarchy. The 'serviceusage' substring still distinguishes
cross_project_forbidden from bq_forbidden. Specific enough that random
exceptions without HTTP-error keywords still re-raise.

Adds 4 unit tests covering the new heuristic + the 'don't swallow random
exceptions' invariant.

* chore(release): cut 0.22.0

PR #138 contains issue #134 user-visible behavior changes:
- BREAKING: BIGQUERY_PROJECT env var now overrides instance.yaml
  data_source.bigquery.project for v2 endpoints (previously
  RemoteQueryEngine billing only).
- Fixed: structured 502/400 on /api/v2/sample, /scan, /scan/estimate,
  /schema when BigQuery raises Forbidden/BadRequest (was bare 500).
- Internal: BqAccess facade refactor unifying four duplicate BQ-access
  call sites; instance_config.reset_cache() now invalidates BqAccess
  cache too so admin server-config saves hot-reload BQ project IDs.

Bumps to 0.22.0 because PR #137 merged first and took 0.21.0.
2026-04-30 10:11:20 +02:00
Vojtech
38f6b639d2
feat(observability): request_id end-to-end + dev debug toolbar + centralized logging (#136)
Cuts release 0.20.0.

## Highlights
- X-Request-ID header on every response + sanitized to [A-Za-z0-9_-] (CRLF log-forging mitigation)
- Error pages (HTML + JSON 500) surface request_id for support tickets
- Dev debug toolbar gated by DEBUG=1 — fastapi-debug-toolbar with custom DuckDBPanel
- Centralized app.logging_config.setup_logging() replaces 23 scattered basicConfig calls
- Telegram bot drops bot.log file — stdout only (BREAKING)

## Devin findings addressed
- BUG_0001: .env.template no longer claims FastAPI debug=True
- BUG_0002: subprocess extractor logs INFO to stderr again
- ANALYSIS_0003: _wants_html no longer matches Accept: */* (curl gets JSON as before)
- BUG on b1c6ee9: HTML 500 page no longer leaks str(exc) in production
- BUG on b13d2fe: 2 CLAUDE.md compliance flags (transform.py + ws_gateway) accepted as scope-limited logging refactor — follow-up to update CLAUDE.md if needed

See CHANGELOG [0.20.0] for full notes.
2026-04-29 22:54:21 +02:00
ZdenekSrotyr
b7a1795834
feat(scheduler): re-wire sync_schedule + script.schedule; tune via env; OpenMetadata TLS (#135)
Bundles 4 issues:
- #79 — table_registry.sync_schedule honored at runtime (API-side filter + Pydantic validators)
- #78 — script_registry.schedule honored via new POST /api/scripts/run-due (atomic claim, BackgroundTask exec, deploy-time safety validation)
- #77 — sidecar JOBS env-driven (SCHEDULER_DATA_REFRESH_INTERVAL/HEALTH_CHECK_INTERVAL/SCRIPT_RUN_INTERVAL/TICK_SECONDS)
- #89 — OpenMetadataClient verify=True default (BREAKING for self-signed)

Cuts release 0.19.0. See CHANGELOG for full notes incl. Known Limitations.
2026-04-29 22:06:30 +02:00
minasarustamyan
c940593a90
feat(auth): Google Workspace group prefix filter + system mapping (#131)
Three new env vars wire the Google OAuth callback to a configurable Workspace prefix and route admin/everyone Workspace groups onto the seeded system rows: AGNES_GOOGLE_GROUP_PREFIX, AGNES_GROUP_ADMIN_EMAIL, AGNES_GROUP_EVERYONE_EMAIL. Login gate redirects users with no prefix-matching group to /login?error=not_in_allowed_group. BREAKING: auto-Everyone membership for new users removed. Admin UI/API are read-only on Google-managed groups. See docs/auth-groups.md.
2026-04-29 14:08:04 +02:00
ZdenekSrotyr
82c5d71d63
feat(memory): #62 — duplicate hints + tree-view + bulk-edit (#126)
Issue #62. Tree view with cross-axis filtering, duplicate-candidate hints (Jaccard score on entity overlap), bulk-edit endpoints (PATCH /api/memory/admin/{id} + POST /api/memory/admin/bulk-update), schema v17 (knowledge_item_relations), full CLI parity (da admin memory tree/edit/bulk-edit/duplicates list/resolve).
2026-04-29 13:55:15 +02:00
ZdenekSrotyr
1824b9dd9c
feat(admin): #108 M1 — BigQuery table registration in UI + CLI (#119)
Issue #108 Milestone 1. Adds BigQuery table registration via /admin/tables UI and `da admin register-table` CLI without hand-editing table_registry. POST /api/admin/register-table/precheck for round-trip validation. --dry-run flag on CLI. Audit-log entries on register/update/unregister. PUT /api/admin/registry/{id} now preserves registered_at (closes #130).
2026-04-29 13:18:31 +02:00
ZdenekSrotyr
995e4cd366
fix(scheduler): HTTP marketplaces job + SCHEDULER_API_TOKEN shared secret (#127)
* fix(scheduler): HTTP marketplaces job + SCHEDULER_API_TOKEN shared secret

Two scheduler-reliability bugs surfaced after the v0.12.1 USER-agnes flip:

1. The marketplaces job called src.marketplace.sync_marketplaces() in-process
   from the scheduler container, racing the app's long-lived system.duckdb
   handle. DuckDB rejects cross-process writers — every cron tick 500-ed on
   "Could not set lock on file ... PID 0".

2. The data-refresh + new marketplaces jobs both 401-ed on the API because
   SCHEDULER_API_TOKEN was never propagated by the Terraform startup script.
   The scheduler had no credential to authenticate with.

Fix:
- New POST /api/marketplaces/sync-all (admin-only) drives the nightly refresh
  through the app process so it inherits the existing DB connection.
- Scheduler swaps fn->http for marketplaces; all jobs are now plain HTTP and
  the scheduler is reduced to a cron clock.
- New app/auth/scheduler_token.py adds a shared-secret auth path. The
  startup script generates a 256-bit secret on first boot, persists it
  across reboots, and writes it to /opt/agnes/.env. Both containers source
  the same .env. The app validates incoming Bearer tokens against the env
  var (constant-time, length-floored) and resolves matches to a synthetic
  scheduler@system.local user that's a member of the Admin system group.
  Audit-log entries from the scheduler are attributed to this user.
- app/main.py seeds the synthetic user at startup so the first cron tick
  has a valid actor; lazy seed in get_scheduler_user covers token rotation
  before the next app restart.

Tests: 5 new in tests/test_auth_scheduler_token.py covering empty/short
secret rejection, exact-match comparison, idempotent user seeding, and
lazy provisioning. 142 marketplace + scheduler tests + 96 auth tests
remain green.

Existing VMs with .env from before this change need a one-time
re-provisioning (re-run startup-script or rotate via openssl rand);
documented in CHANGELOG.

* fix(audit): use '_all' sentinel for bulk marketplace sync — Devin review #127

Avoids the literal string 'marketplace:None' in the audit_log resource
column when the bulk sync endpoint writes its summary row.

* fix(scheduler): unblock event loop + per-job timeouts — Devin review #127

Two findings from Devin re-review on commit 5fbad15:

1. BUG: trigger_sync_all was async def, so FastAPI ran it on the asyncio
   event loop. sync_marketplaces() does blocking I/O (subprocess git
   clones up to GIT_TIMEOUT_SEC=300 each, threading.Lock, DuckDB writes)
   and would freeze every concurrent request for the duration of a bulk
   sync. Switched to plain def so FastAPI auto-routes to the thread pool.

2. ANALYSIS: scheduler used a fixed 120s httpx timeout for every POST.
   Bulk marketplace sync iterates the registry under a single lock with
   up to 300s per repo — easily exceeds 120s on 2-3 slow repos. The
   scheduler then sees a timeout, doesn't update last_run, and re-fires
   on the next 30s tick, queueing redundant work. Per-job timeout
   override added to the JOBS tuple; marketplaces gets 900s (15 min),
   data-refresh keeps 120s, health-check 30s.

* fix(auth): require_session_token rejects scheduler shared secret — Devin review #127

require_session_token gates /auth/tokens (PAT minting). Pre-fix it only
rejected JWTs with typ=pat — but the scheduler shared secret is an opaque
string, so verify_token() returns None, payload becomes {}, and the
PAT-claim check silently passed. A caller bearing SCHEDULER_API_TOKEN
could mint persistent PATs that survive a secret rotation.

Added explicit is_scheduler_token() check before the PAT-claim check;
new regression test in tests/test_auth_scheduler_token.py.

Devin's other note (pre-existing async def trigger_sync at marketplaces.py:392
also calls blocking sync_one) — Devin flagged it as out-of-scope for this PR
and I agree; tracking separately.

* release(0.17.0): cut + clean up CHANGELOG duplicates

Cuts 0.17.0 (minor: scheduler shared-secret auth + sync-all endpoint
plus the deploy-shape fixes that landed since the last release tag).

Bumps pyproject from 0.15.0 — also corrects the missed bump from PR #120
(v0.16.0 was tagged on GitHub and shipped as :stable, but pyproject
stayed at 0.15.0, so /api/version, /cli/latest, and `da --version` had
been under-reporting the running release).

Removes the long-form duplicate entries for 0.13.0 / 0.14.0 / 0.15.0
above [0.16.0] — the canonical short summaries (with GitHub-release
links) already exist below 0.16.0, the long forms were leftover state
from before those versions were cut and have been silently shadowed
ever since.
2026-04-29 11:44:00 +02:00
ZdenekSrotyr
61f6b8d2d5
feat(ci+tests): deploy safety audit — linting, rollback, smoke tests, 50+ new tests (#120)
Comprehensive deploy safety audit implementing 19 improvements across CI/CD pipeline, test coverage, and source code.

### CI/CD Pipeline
- ruff + mypy added to both release.yml and keboola-deploy.yml (continue-on-error)
- Smoke test added to keboola-deploy.yml (was missing)
- Automatic rollback on smoke test failure in release.yml
- Expanded smoke-test.sh with catalog, admin/tables, marketplace.zip, metrics
- Required status checks via .github/settings.yml
- Dependabot + CODEOWNERS + pre-commit hooks + ruff config

### Source Code
- DB schema version check in /api/health (db_schema: ok/mismatch/unhealthy)
- Config versioning (config_version: 1 in instance.yaml, non-blocking validation)
- BigQuery extractor ATTACH error handling (try/except around INSTALL+ATTACH)
- Post-deploy smoke test script for prod VM validation

### Test Coverage (~50 new tests)
- v13->v14 migration, Email magic link TTL, PAT, Marketplace ZIP/Git,
  Jira webhooks, Hybrid Query BQ, Keboola/BQ extractor failure modes,
  Orchestrator failure modes

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-04-29 09:18:55 +02:00
PavelDo
e1108b6112
feat(memory): corporate memory v1+v1.5 + 0.15.0 (#72)
Adds corporate memory v1 (verification flywheel + contradiction detection + confidence scoring) and v1.5 (audience-based distribution + per-item privacy + admin curation). Server: GET /api/memory/bundle returns mandatory + ranked-approved items within a token budget; POST /api/memory/admin/mandate accepts an audience field gated against user_group_members; /api/memory/stats uses SQL aggregation. CLI: da sync writes received items to .claude/rules/km_*.md. Verification detector extracts knowledge candidates from session JSONL files. Auto-tagging via Haiku when ai: is configured. Adapted from the v9-era branch onto v13/v14 RBAC: _is_privileged_viewer + _effective_groups now query user_group_members JOIN user_groups; require_role(Role.KM_ADMIN) replaced with require_admin (km_admin collapsed into admin). Schema v15: knowledge_items context-engineering columns + knowledge_contradictions + session_extraction_state. Schema v16: verification_evidence. Cuts release v0.15.0 (also bundles #116 /me/debug page).
2026-04-29 07:16:22 +02:00
minasarustamyan
7a06f1a585
feat(auth): /me/debug self-only auth diagnostic page (#116)
Adds /me/debug HTML page rendering the logged-in user's own session state — decoded JWT claims (no raw token, sha256[:12] fingerprint for log correlation), group memberships with sources and bound external_id when present, resource grants effective via those memberships, and a Refetch from Google (dry-run) button that diffs a fresh fetch_user_groups call against the cached user_group_members snapshot. Gated by AGNES_DEBUG_AUTH env var (default off → 404, route existence undetectable in production). Self-only by construction: user_id is read from the validated session, never echoes raw JWT / password hash / full PAT. Tolerates v13 + v14 schemas via information_schema check on users.external_id.
2026-04-29 06:36:28 +02:00
ZdenekSrotyr
2e1dfb7553
feat(v2): claude-driven fetch primitives + 0.14.0 (#102)
Replaces the BigQuery wrap-view pattern with a discovery + scoped-fetch toolkit driven by the analyst's Claude session. Adds /api/v2/{catalog,schema,sample,scan,scan/estimate}, da catalog/schema/describe/fetch/snapshot/disk-info CLI commands, sqlglot-backed WHERE validator, process-local quota tracker, agent rails skill (cli/skills/agnes-data-querying.md). BREAKING: BQ wrap views off by default — set data_source.bigquery.legacy_wrap_views=true for one cycle. Backward-compat field_validator on primary_key. Catalog cache now matches documented 300s TTL with RBAC fresh per request. Cuts release v0.14.0.
2026-04-29 01:07:19 +02:00
ZdenekSrotyr
a222f92e70
feat(admin): server configuration editor + 0.13.0 (#107)
Adds /admin/server-config UI for editing instance.yaml from the web. Hardening: SSRF gate on data_source URLs, narrow-overlay write strategy, atomic writes, audit log with secret masking on shape changes, threading lock on read-modify-write, corrupt-overlay refusal on write side + louder log on read side, modal Promise resolution on backdrop dismiss, sentinel scrub on save (defense-in-depth client+server). Bundles Windows PowerShell wrapper from #80. Cuts release v0.13.0.
2026-04-29 00:47:23 +02:00
ZdenekSrotyr
5f6bb7a4b2
fix(security+ops) + release(0.12.1): #82 #85 #87 hardening + cut 0.12.1 (#104)
* fix(security+ops): #82 #85 #87 — auth hardening, API validation, deploy posture

Security and operational hardening across three issue groups:

- M23: docker-compose.override.yml → docker-compose.dev.yml (BREAKING, prod foot-gun)
- C13: Container runs as non-root user 'agnes' (USER directive in Dockerfile)
- M21: Docker resource limits (mem_limit, cpus) on app + scheduler
- M22: Caddyfile security headers (X-Frame-Options, X-Content-Type-Options, Referrer-Policy, -Server)
- M17: /api/health split into minimal (unauth) + /api/health/detailed (auth) (BREAKING)
- M26: release.yml restricts build-and-push to main + workflow_dispatch; paths-ignore for docs

- C2: table_id traversal validation on /api/data/{table_id}/download
- M4: Upload streaming (chunk-read + temp file) instead of full-buffer; /local-md hashed filename

- C5: reset_token removed from POST /api/users/{id}/reset-password response
- C8: Startup WARNING when no user has password_hash (bootstrap window visible)
- M9: Audit log on failed web form login (mirrors /auth/token endpoint)
- M10: Atomic magic-link consume via compare-and-swap (CONSUMED: marker + DuckDB conflict catch)

Also: SSRF protection on /api/admin/configure (#46), memory stats SQL aggregation (#90)

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* fix(review): SSRF 169.254.x.x + IPv6 multicast; M10 marker cleanup safety

Review fixes:
- Add 169.254.0.0/16 (link-local, cloud metadata) to SSRF regex — was
  missing, allowing requests to AWS/GCP/Azure metadata endpoints
- Add ff[0-9a-f]{2}: (IPv6 multicast) to SSRF regex
- M10: wrap Step 3 (CONSUMED marker cleanup) in try-except with
  warning log — prevents unhandled exception if DB write fails after
  successful token consumption
- Add test for 169.254.169.254 SSRF rejection

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* fix(review): SSRF IPv6 bypass, CLI health endpoint, upload FD leak

Address Devin Review findings on PR #104:

1. SSRF IPv6 bypass: Replace hostname regex with DNS resolution +
   ipaddress module checks. The old regex patterns like `fe80:` only
   matched up to the first colon, missing real IPv6 addresses like
   `fe80::1`, `fc00::1`, `ff02::1`. The new approach resolves the
   hostname via getaddrinfo and checks each resulting IP against
   ipaddress.is_private/is_loopback/is_link_local/is_reserved/is_multicast.

2. CLI commands broken: `da setup test-connection`, `da setup verify`,
   `da diagnose`, `da status` all called /api/health expecting the old
   format (status=="healthy", services dict). Now they call
   /api/health/detailed for service-level checks (with graceful fallback
   to the minimal endpoint when auth is not configured).

3. Temp file handle leak: _stream_to_temp returns an open
   NamedTemporaryFile; callers now close it before shutil.move() to
   prevent FD leaks until GC.

Also adds IPv6 SSRF test cases (loopback, link-local, unique-local,
multicast) with mocked DNS resolution for test environment independence.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* fix(review): download regex blocks hyphenated IDs; document health split

Address Devin Review round-3 findings on PR #104:

1. _SAFE_IDENTIFIER regex blocked hyphenated table IDs: The download
   endpoint used the strict SQL-identifier regex which does not allow
   dots or hyphens, but Keboola table IDs like in.c-crm.orders
   contain both. Switched to _SAFE_QUOTED_IDENTIFIER which allows dots
   and hyphens while still blocking path-traversal chars (/, .., \)
   and quote/control characters. Added test for hyphenated/dotted IDs.

2. Documented health endpoint split in DEPLOYMENT.md: Added Health
   checks & external monitoring section explaining both endpoints
   (minimal unauth /api/health vs authenticated /api/health/detailed)
   and how to wire external monitoring tools to the detailed endpoint
   with a PAT.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* release(0.12.1): cut hotfix for snapshot integrity + #82/#85/#87 hardening

* fix(security): apply CAS pattern to password reset confirm (#82/M10 follow-up)

Devin review on the rebased PR flagged the asymmetry: magic-link verify
got the atomic compare-and-swap pattern in the original M10 fix, but
password reset confirm at /auth/password/reset/confirm was still using
read-validate-clear. Two concurrent POSTs with the same valid reset
token could both succeed in setting different new passwords (last-write-
wins). Lower severity than the magic-link race because the attacker
would need the reset token AND to race the legitimate user, but the
asymmetry was a polish gap.

Mirrors app/auth/providers/email.py::_consume_token CAS exactly: write
unique CONSUMED:<random> marker via UPDATE...WHERE token=old_token, then
SELECT to verify our marker won, then proceed. Only the winner clears
the marker and applies the password change.

New regression test_concurrent_reset_only_one_wins in
tests/test_password_flows.py::TestResetConfirm pins the contract: two
ThreadPoolExecutor workers + Barrier hit /reset/confirm with the same
token; exactly one gets 302 (password applied), the other gets 200 with
'Invalid or expired'. Sanity-checked against the pre-CAS code — both
POSTs got 302 (race confirmed).

---------

Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
2026-04-28 19:57:30 +02:00