agnes-the-ai-analyst

Author	SHA1	Message	Date
ZdenekSrotyr	7e4ddf0b01	feat(auth): password reset & invite flows for web + admin (#34 ) (#37 ) * feat(auth): password reset & invite flows for web + admin (#34) Wires end-to-end the previously orphaned password_reset.html and password_setup.html templates, adds the missing POST /auth/password/reset handler (closes #34), and restores the Reset action in the admin user UI (which origin/main had removed precisely because the flow was broken). Web flow - GET /auth/password/reset — renders the set-new-password form - POST /auth/password/reset — 'Forgot Password?' request; emails link, anti-enumeration (same response for unknown email) - POST /auth/password/reset/confirm — validates token + 24h TTL, sets new password, clears token, logs user in - GET /auth/password/setup — renders the setup form (invite link landing) - POST /auth/password/setup/request — signup-tab 'Request Access' (email-only) - POST /auth/password/setup/confirm — 7-day TTL, sets password + name, logs in - Reuses LOCAL_DEV_MODE pattern from email.py: logs the link loudly so developers can use the flow without an SMTP/SendGrid transport Admin flow - POST /api/users accepts send_invite → returns invite_url + invite_email_sent - POST /api/users/{id}/reset-password now returns a full reset_url pointing at the dedicated password-reset endpoint (NOT the magic-link verifier, which would log the user in without prompting for a new password) - admin_users.html: restored Reset row action, copyable reset/invite link modals, invite checkbox on create, reworded 'magic-link not wired' notes Backward compat - JSON POST /auth/password/setup kept unchanged (existing tests pass) - Active-account gate applied to reset/setup flows (matches password_login) Tests: 21 new cases (tests/test_password_flows.py) covering GET renders, request/confirm happy + error paths, TTLs, anti-enumeration, and admin invite/reset URL responses. Full suite: 1309 passed. Closes #34 * fix(admin-users): allow horizontal scroll when actions overflow Four action buttons (Tokens, Reset, Set pwd, Delete) can exceed the viewport on narrow screens. Switch .users-table-wrap from overflow: hidden to overflow-x: auto so the table scrolls instead of clipping, and lock row-actions buttons to a single nowrap line. * fix(admin-users): override base 800px container so table can use full width The base layout caps .container at 800px, so the table was always being clipped regardless of viewport. Unclamp the container on this page and widen the inner page cap to 1400px. * fix(auth): address Devin review — harden JSON setup, anti-enumeration, preserve email case Addresses findings from Devin review on PR #37: 1. JSON POST /auth/password/setup now enforces the same SETUP_TOKEN_TTL (7 days) and active-account check as the web flow. An expired token or a deactivated user can no longer bypass the gate by posting JSON. Existing test fixture seeds setup_token_created=now so backward-compat tests continue to pass. 2. GET /auth/password/setup no longer looks up the user to pre-fill name. The form renders identically regardless of whether the email exists, consistent with anti-enumeration in POST /setup/request. 3. reset_request / setup_request no longer lowercase the submitted email. The rest of the codebase (password_login, magic-link, admin create) uses case-sensitive lookups, so normalizing only here would silently fail for mixed-case accounts. Tests: 6 new cases covering expired-JSON-setup, missing-created-timestamp, deactivated-user-rejection, mixed-case email preservation, and the anti-enumeration property of GET /setup.	2026-04-22 17:43:57 +02:00
ZdenekSrotyr	d2c76cb221	User management + PAT + CLI distribution + HTML auth redirect (#9 #10 #11 #12 ) (#28 ) * fix: redirect unauthenticated HTML routes to /login (#10) * docs(plan): user mgmt + PAT + CLI distribution implementation plan (#9 #10 #11 #12) * build(docker): produce wheel artifact for /cli/download (#9) * feat(db): schema v5 — users.active + deactivated_at/by (#11) * feat(api): /cli/download wheel + /cli/install.sh with baked server URL (#9) * feat(users): repository supports active flag + count_admins (#11) * feat(ui): /install page with per-deployment install instructions (#9) * feat(api): user PATCH/reset-password/set-password/activate/deactivate (#11) * fix(cli): da login prompts for password and sends it in body (#9) * test(api): safeguard tests for self-deactivate and last admin (#11) * feat(auth): reject requests from deactivated users (#11) * fixup(#10): propagate next through /login buttons + lock down sanitizer tests * feat(cli): da admin set-role/activate/deactivate/reset-password/set-password (#11) * feat(ui): /admin/users management page (#11) * feat(db): schema v6 — personal_access_tokens (#12) * feat(users): access_tokens repository (#12) * feat(auth): JWT carries typ (session\|pat) and explicit jti (#12) * feat(auth): reject revoked/expired PATs; update last_used_at (#12) * feat(api): /auth/tokens CRUD + admin revoke; session-only guard (#12) * feat(cli): da auth token create/list/revoke (#12) * feat(ui): /profile page with PAT create/list/revoke (#12) * docs: PAT usage and session/PAT TTL clarification (#12) * feat(auth): PAT first-use-from-new-IP audit + last_used_ip (schema v7) (#12) Closes remaining acceptance gap from issue #12: audit_log entry on first use of a PAT from an IP that differs from the recorded last_used_ip. - schema v7: personal_access_tokens.last_used_ip column - AccessTokenRepository.mark_used now stores the client IP - get_current_user extracts client IP (X-Forwarded-For first hop, fallback to request.client.host) and emits a token.first_use_new_ip audit when the IP changes on a subsequent use (not the very first use) - tests: new-ip audit, same-ip no-op, first-ever-use no-op, schema v7 column * fix: address Devin review findings on PR #28 - app/main.py: exclude /auth/* from HTML redirect handler so JSON endpoints under /auth/ (PAT CRUD used by `da auth token` CLI) keep their 401 JSON contract (Devin #1, bug) - app/api/tokens.py: reject expires_in_days <= 0 explicitly; use `is not None` so 0 no longer silently creates a non-expiring token (Devin #2) - app/api/users.py: validate role against Role enum in create_user to match update_user and prevent 500 on role-protected requests later (Devin #3) - app/web/templates/admin_users.html: escape user-supplied strings before innerHTML; move onclick handlers to addEventListener via data attributes so emails with quotes / HTML no longer break the UI or enable stored XSS (Devin #4) - app/auth/router.py, app/auth/providers/{password,google}.py: reject deactivated users at login instead of issuing a JWT that would then fail on the next request — removes the confusing redirect loop (Devin #5) - CLAUDE.md: document schema v7 instead of stale v4 (Devin #6) - tests/test_web_ui.py: regression test for the /auth/* JSON 401 * feat(web): add /profile and /admin/users links to dashboard nav * feat(web): point setup banner at /install page * chore(web): drop unused setup_instructions context * fix: address Devin review round 2 on PR #28 - app/api/tokens.py: when expires_in_days is None (the "never" option), use a ~100-year JWT expiry so the token doesn't silently die in 24h via the session-default fallback in create_access_token. The real expiry enforcement stays in verify_token's DB-level check (Devin 🔴) - app/web/templates/profile.html: escape t.name and other user-supplied strings via esc() helper before innerHTML, same pattern as admin_users.html. Move revoke onclick to data-attribute + addEventListener (Devin 🟡) - app/api/cli_artifacts.py: use `mktemp -d` with X's at end of template for GNU/BSD portability, place wheel inside the temp dir and clean up with rm -rf (Devin 🚩) * feat(web): redesign /install page; make curl one-liner primary, collapse manual Rebuild the public /install page using the dashboard visual language (shared header, card layout, gradient hero, design tokens from style-custom.css). The page is now anchored on the one-liner install path: curl -fsSL <server>/cli/install.sh \| bash is rendered as the primary, prominent step 1, while the old manual wheel-download flow is tucked behind a closed-by-default <details> block for users in restricted/offline environments. Information architecture: hero (server URL + version) -> step 1: quick install (one-liner, big Copy button) -> step 2: create PAT on /profile + export DA_TOKEN / da auth whoami -> step 3: Claude Code / MCP via ~/.config/da/token.json -> collapsed "Manual install" details for download-wheel flow -> footer link to docs/HEADLESS_USAGE.md Every shell snippet has a vanilla-JS "Copy" button that confirms visually ("Copied!" for 1.5s) and falls back to textarea+execCommand on non-secure contexts. No new dependencies, no bundler. The route now also pulls an optional user so the header shows the same nav (Dashboard / Profile / Logout) as dashboard.html when a session exists, while staying fully public when signed out. * fix(cli): use real wheel filename in install.sh (broken pip/uv install) The installer wrote the downloaded wheel as agnes_cli.whl, which lacks a PEP-427 version component — both pip and uv tool install reject it and abort the one-liner. Use curl -OJ so Content-Disposition determines the on-disk filename, then resolve it via glob. Install an EXIT trap to remove the tmpdir even when install fails. * fix(web): correct manual install wheel glob and add PEP 668 / PATH hints - Wheel glob is agnes_the_ai_analyst-.whl (not agnes-.whl) — the old pattern never matched the real artefact name from the build. - Add — or — separator between uv tool install and pip install. - Warn that pip install --user is blocked on macOS Homebrew / modern Debian (PEP 668) and recommend uv tool install as the default path. - Both flows now show the ~/.local/bin PATH hint so a fresh shell can find the da binary after install. * fix(web): consistent session.user reference in install header The avatar-letter fallback inside {% if session.user %} was reading user.name / user.email directly, but the route dependency can pass user=None — those references resolved to an empty FlexDict and produced an empty avatar circle. Read everything through session.user to match the guard and the dashboard pattern. * fix(web): point headless usage link at GitHub source /docs/HEADLESS_USAGE.md 404s — no static route serves repo docs. Point the footer link at the rendered markdown on GitHub instead of adding a dedicated docs serving route just for one file. * feat(web): /install hero size, anon sign-in banner, step 2 copy polish - Bump hero h1 from 26px to 30px to match dashboard primary scale. - Anonymous visitors see a small sign-in banner above Step 2 (creating a token requires auth; without the banner the flow appears stuck). - Add an 'After generating your token' section label inside Step 2 so the /profile CTA button no longer looks wedged mid-sentence between adjacent paragraphs. * chore(web): /install a11y + version pill polish - aria-live='polite' on copy buttons so screen readers announce the 'Copied!' state change. - Replace redundant INSTANCE_NAME eyebrow (already in the header logo) with 'Getting started'. - Hide the version pill when AGNES_VERSION is unset/'dev' — avoids the misleading 'vdev' label in local/unbuilt runs. - Manual summary focus-visible outline-offset +2px (was -2px which clipped inside the card), and mark the chevron as decorative. * fix(web): use session.user in dashboard avatar fallback Inside {% if session.user %} guard, the avatar fallback referenced (user.name or user.email). If user is None the block crashes when the profile picture is absent. Align with the guard variable. * fix: address Devin review round 3 on PR #28 - app/api/users.py: stop auto-sending email from reset_password. The magic-link sender would deliver a "Login Link" that — when clicked — consumes the reset_token via verify_magic_link and logs the user in WITHOUT prompting for a new password. Admins now share the raw reset_token from the API response manually, or use set-password directly. email_sent is always False. Documented inline. (Devin 🟡) - app/api/cli_artifacts.py: harden /cli/install.sh generation against shell injection via Host header or AGNES_VERSION. base_url is validated against a strict scheme+host+port regex; version against an alnum + dot/dash/underscore allowlist. Both values are also piped through shlex.quote() as defense in depth. (Devin 🟡) The shared users.reset_token column between magic-link and password- reset flows (Devin 🚩) remains an architectural gap; splitting into separate columns needs schema v8 and is tracked for a follow-up PR. * docs, chore(grpn): manual-deploy helpers + hackathon deploy learnings Adds scripts/grpn/ — Makefile + agnes-auto-upgrade.sh + README for operating Agnes on GRPN's existing foundryai-development VM when the full Terraform flow is blocked by org policies: - iam.disableServiceAccountKeyCreation (org constraint) forbids SA JSON keys, so GCP_SA_KEY-based CI is unavailable - No projectIamAdmin delegation → bootstrap-gcp.sh can't grant roles - Secret Manager IAM bindings require setIamPolicy which editor lacks Helper targets: deploy, deploy-tag, recreate, restart, stop, start, status, version, logs, ps, env, ssh, tunnel, open, bootstrap-admin, set-data-source, install-cron, uninstall-cron. docs/superpowers/plans/2026-04-22-grpn-deploy-learnings.md — running log of all org-policy constraints hit during the hackathon deploy, with workarounds and derived follow-ups (WIF support, external_ip variable, customer onboarding IAM checklist). Not a replacement for the TF flow — stopgap until WIF lands. * fix(web): make header logos clickable links to home * feat(web): one-click "Setup a new Claude Code" button Adds a single-button flow on the dashboard and /install page that generates a fresh personal access token via POST /auth/tokens and copies a complete, paste-ready setup script (server URL, token, install/verify commands) to the clipboard. Falls back to a modal textarea when the clipboard is blocked; redirects to /login on 401; surfaces backend errors inline. - dashboard.html: replaces the top "Set up your local environment" anchor with a real button wired to setupNewClaude(). Removes the duplicate bottom setup banner to keep a single entry point. - install.html: for signed-in users, Step 1 leads with the one-click button and demotes the curl one-liner into a collapsible "Or run manually" aside. Anonymous visitors still see the curl flow plus a sign-in hint. - No new deps. Vanilla JS. Token lives in memory/clipboard only — never rendered into persistent DOM. * feat(cli): add "da auth import-token" for non-interactive PAT login Writes a provided JWT into ~/.config/da/token.json using the canonical {access_token, email, role} shape expected by save_token(). Decodes the token locally to pull email/role claims, verifies it against the server via GET /api/catalog/tables, and refuses to overwrite an existing token file if the server returns 401. --email / --role overrides exist for tokens missing those claims; --skip-verify bypasses the server round-trip for offline / CI scenarios. * test(cli): cover da auth import-token success + 401 + claim-fallback paths Three new tests in TestAuthImportToken: - valid JWT + 200 -> canonical token.json written - 401 from /api/catalog/tables -> exit 1, existing token file untouched - JWT without email/role claims -> refused without overrides, accepted with --email / --role flags * feat(web): update one-click Claude setup instructions — explicit uv install, import-token, skills question Replaces the fragile `cat > token.json <<EOF` clipboard payload with an explicit, auditable sequence: 1. `curl -fsSL /cli/download` + `uv tool install --force` (no opaque `curl \| bash`). 2. `da auth import-token --token ...` instead of hand-written JSON. 3. Explicit PATH persistence for zsh/bash. 4. A required question to the user about whether to copy the bundled skills into ~/.claude/skills/agnes/ or pull them on-demand via `da skills show`. 5. A final confirmation step with whoami + version output. Factored both pages to include a shared partial (app/web/templates/_claude_setup_instructions.jinja) so dashboard.html and install.html can never drift apart again. {server_url} and {token} stay as runtime placeholders substituted by renderSetupInstructions(). * feat(ui): modernize /admin/users + unify header nav across pages - New shared partial app/web/templates/_app_header.html — single source of truth for the top navigation. Used by base.html and dashboard.html (which doesn't extend base.html). Active page highlighted via request.url.path. Admin "Users" link gated by session.user.role. - style-custom.css: add .app-header / .app-nav-link / .app-btn-logout / .app-avatar styles (mirrors dashboard's previous inline copy under app-* prefix). Mobile-friendly fallback at <720px. - base.html: include the new partial so every page extending base (admin_users, profile, login_email, error, …) gets the same chrome the dashboard has. - dashboard.html: replace its inline <header class="header"> markup with the shared partial. Inline .header CSS left in place as harmless dead code (separate cleanup PR). - admin_users.html: rewritten with avatars, role pills (color-coded per role), toggle switch for active, search/filter input, toast notifications, modal dialogs replacing alert/confirm/prompt, one-click copy for the reset token, empty / loading states. All XSS-safe via the existing esc() helper + data-attribute event delegation. - tests/test_web_ui.py: smoke test that /admin/users renders the new shared header chrome and the modernized markup. * feat(api): serve CLI wheel at /cli/agnes.whl for direct uv install uv tool install inspects the URL path suffix to recognise a wheel, so /cli/download (which has no .whl suffix) cannot be installed directly. Expose a stable /cli/agnes.whl alias over the same wheel lookup so users can run: uv tool install --force https://<server>/cli/agnes.whl * test(cli): cover da auth import-token --server persisting to config.yaml The server persistence was already implemented in the import-token command (save_config({server}) call) but not covered by tests. Add an explicit test so the one-step setup contract — single import-token call writes both token and server — cannot regress. * feat(web): simpler Claude setup — single uv install URL, single import-token call User feedback: the prior clipboard payload repeated the server URL and token across multiple steps (curl + tmpfile + install + rm + separate seed-config + import-token). Collapse to: 1. uv tool install --force {server_url}/cli/agnes.whl (single URL, direct) 2. da auth import-token --token ... --server ... (one call, persists both) 3. da auth whoami 4. skills (ask user first) 5. confirm uv accepts HTTPS URLs that end in .whl and installs them directly, so the tmpfile dance is unnecessary. import-token --server already persists the server to config.yaml, so no separate printf > config.yaml step. * fix(tests): update admin users heading assertion after template rename The admin_users.html template now uses <h2 class="users-title">Users</h2> instead of <h2>User management</h2>. Update the assertion to match. * feat(ui): unify header across remaining 7 standalone pages These 7 pages render their own full <html> and don't extend base.html, so the previous unification commit only covered base + dashboard. Each had its own ad-hoc <header> markup with inconsistent classes (.top-header / .header / .page-header), inconsistent nav-link sets, and inconsistent avatar/email styling. Replace each inline <header>...</header> block with the shared {% include '_app_header.html' %} so /activity-center, /admin/permissions, /admin/tables, /catalog, /corporate-memory, /corporate-memory/admin, and /install all show the same chrome (Dashboard / Install CLI / Profile / Users / email + avatar / Logout) with the active page highlighted via request.url.path. Old inline header CSS (.header, .top-header, .page-header, .nav-link, etc.) is left in place as harmless dead code; it can be cleaned up in a follow-up sweep. * feat(web): add readable preview of Claude setup payload on dashboard + /install Move the line-by-line setup instructions into app/web/setup_instructions.py as the single source of truth, then render them in two modes from the existing _claude_setup_instructions.jinja partial: - preview_mode=True → visible, read-only <pre><code> block with the real server URL and a clearly-styled placeholder token (never a real one). - preview_mode=False → the JS SETUP_INSTRUCTIONS_TEMPLATE used by the one-click flow (unchanged behaviour). Both /dashboard (env-setup-cta card) and /install (Step 1 card) now show the preview directly under the 'Setup a new Claude Code' button so users can see exactly what will land in their clipboard before they click. * feat(web): update setup instructions — `da diagnose` step, explicit section titles Rework the Claude Code setup payload to: - Give every numbered step an unambiguous verb header ("1) Install the CLI", "2) Log in", "3) Verify the login", "4) Run diagnostics", "5) Skills (ask the user first)", "6) Confirm"). - Add step 4 `da diagnose` as the post-login health check. The CLI already ships this command (cli/commands/diagnose.py); it prints "Overall: healthy" and a list of green checks that map cleanly to next actions. - Ask the skills copy-vs-on-demand question verbatim so Claude Code always prompts the user the same way. - Replace the terse "Confirm" line with a 4-bullet summary (version, whoami, skills choice, diagnose status) so the return message is structured and comparable across setups. * chore(web): remove stale MCP card from /install (no MCP server today) The 'Use with Claude Code / MCP' card (Step 3 on /install) referenced an MCP integration Agnes does not ship. Remove the whole card. The one-click 'Setup a new Claude Code' flow in Step 1 already covers the long-lived client use case and is less confusing than dangling persistence tips for a non-existent integration. * feat(api): include user_email + last_used_ip + user_id in admin tokens list response Adds AdminTokenItem response model (superset of TokenListItem) and AccessTokenRepository.list_all_with_user() joining personal_access_tokens with users to denormalize user_email. Needed for /admin/tokens UI where admins triage tokens across all users. * feat(web): /admin/tokens page — list, filter, search, revoke across all users Adds a new admin-only page with client-side filtering (status, user email, last-used window), column sorting, counts bar (active/revoked/expired), and an inline revoke action. Mirrors the /admin/users visual language. * feat(web): add Tokens nav link for admins + deep-link from admin/users row Admin-only nav entry to /admin/tokens, and a per-row Tokens button on /admin/users that prefills the token page's user filter via ?user=<email>. * test(admin): cover /admin/tokens rendering, filter state, non-admin denial, revoke Verifies admin can render the page (title + JS hooks present), a non-admin is blocked, unauthenticated users are redirected, the admin list response includes user_email / user_id / last_used_ip, and admin can revoke another user's token. * feat(web): modern redesign of /admin/tokens — hero, stat strip, refined table, responsive cards, a11y * feat(web): ditch the table — /admin/tokens as a card stack, modern GitHub-style list Replaces the table-based layout with a stack of self-contained token cards inside a <ul role=list>. Each card is a flex row: avatar + name/meta on the left, last-used block in the middle, status pill + outlined 'Revoke' button on the right. Status and sort controls are pill-shaped toggle chips; user email search has an inline search icon. No <table>/<tr>/<th>/<td> anywhere. Responsive below 720px (card stacks vertically) and 480px (stat chips 2x2). Preserves filter IDs (flt-status, flt-user, flt-last-used) and data-revoke for existing tests. * feat(web): add /tokens (role-aware) — single page for both user PAT CRUD and admin overview - Rename admin_tokens.html -> tokens.html with a new is_admin context flag. - New route GET /tokens: renders the same card-stack UI for everyone. * Admins: loads /auth/admin/tokens, shows owner column + stat strip, keeps the owner-email search box and sort-by-owner chip. * Non-admins: loads /auth/tokens (own tokens only), hides owner column + stat chips, adds a 'New token' CTA in the hero that opens a modal (name + expires_in_days) calling POST /auth/tokens. The raw token is revealed once in a dismissable banner and cleared from the DOM on Hide. - GET /admin/tokens now 302-redirects to /tokens, preserving query string (so the /admin/users deep-link ?user=foo still works). * feat(web): /tokens full-bleed layout to match dashboard width The hero, toolbar, and card list used to sit inside base.html's .container (max-width 800px). Break out with negative horizontal margins so the page spans the viewport like /dashboard does, capped at 1440px for readability on very wide screens with a 24px gutter on each side. - No change to base.html itself. The override is scoped to .tokens-page. - body { overflow-x: hidden; } guards against rare horizontal scrollbars. - < 808px viewport: reset to natural flow (mobile already narrower). - ≥ 1488px viewport: cap to 1440px and re-center. * chore(web): remove /profile template + nav link (redirect /profile -> /tokens) The old /profile PAT CRUD page is now redundant — the modern /tokens page covers both user and admin flows. Delete the template; the router's /profile handler already 302-redirects to /tokens. Nav cleanup: - Remove the 'Profile' link. - Show a single 'Tokens' link to every signed-in user (previously only admins saw it). - Active-state matches /tokens, /admin/tokens, and /profile so the highlight survives the redirect chain. /install CTA now points at /tokens instead of /profile. * test: cover /tokens for admin + non-admin flows, /profile redirect, nav update tests/test_admin_tokens_ui.py - Point admin rendering test at /tokens directly and tighten assertions (admin-only stat strip + owner search, non-admin CTA absent). - Add test_non_admin_can_render_tokens_page: personal body, New-token CTA, create-modal, reveal banner; stat strip + owner search absent. - Add test_admin_tokens_redirects_to_tokens: 302 to /tokens, query string (?user=...) preserved for the /admin/users deep-link. - Add test_profile_redirects_to_tokens: 302 to /tokens. - Add test_non_admin_can_create_pat_via_tokens_page_api: exercises the POST /auth/tokens call that the non-admin create-modal submits. tests/test_pat.py - test_profile_page_renders -> test_profile_page_redirects_to_tokens: assert the 302 + that /tokens lands on the unified non-admin body. tests/test_web_ui.py - admin_users nav assertion: 'Tokens' link present, 'Profile' link absent. - Add test_nav_shows_tokens_link_for_non_admin: non-admins see the same 'Tokens' link (previously only admins did). - Add test_profile_redirects_to_tokens back-compat check. * feat(web): collapse 'What Claude Code will receive' by default The preview block on /dashboard and /install now uses <details>/<summary> so it is hidden by default. Click the chevron/title to expand and review the clipboard payload. Markup stays in the DOM so existing tests that assert on content continue to pass. * fix(web): /tokens width — override .container to 1280px like dashboard The negative-margin full-bleed trick was fragile and pushed content past the right edge on deployed viewports. Replace with a simple max-width override of base.html's .container on this page only, matching /dashboard's 1280px center-column layout. * feat(web): split role-aware /tokens into my_tokens.html + admin_tokens.html * feat(web): router — separate handlers for /tokens (own) and /admin/tokens (all) * feat(web): nav — show Tokens for all, add All tokens for admins * test: cover split token pages (own vs all) + admin access gating * feat(web): move 'My tokens' into a user dropdown menu Replaces the separate Tokens/email/Logout nav trio with a rounded avatar trigger that opens a dropdown containing the user's email, role, a 'My tokens' link, and Logout. Admin-only 'All tokens' stays as a top-level nav item since it's an admin function, not a personal one. Click-outside and Escape close the panel; chevron rotates on open. * fix(api): allow PATs to list/get/revoke their own tokens (CLI flow) The documented 'da auth token list/revoke' CLI flow in docs/HEADLESS_USAGE.md uses a PAT, but the previous dependency (require_session_token) returned 403. Only create_token must be session-only to prevent PAT-spawning-PAT chains; listing and revoking your own tokens is safe with a PAT. * fix(api): cap expires_in_days at 3650 to avoid datetime overflow (500 to 400) Values above ~11 million days overflowed datetime.max in datetime.now(utc) + timedelta(days=...) and surfaced as an unhandled OverflowError → 500. Cap at 10 years with a clear 400 instead; the no-expiry code path is unaffected. * fix(api): relax _SAFE_URL_RE to allow path prefixes, underscores, and IPv6 The previous regex rejected legitimate reverse-proxy base_url values (https://host/agnes/), underscores in Docker Compose hostnames, and IPv6 literals (http://[::1]:8000). Widen the charset and allow an optional trailing path. shlex.quote continues to provide defense-in-depth against any metacharacter that slips through. * fix(web): /login/email and Google OAuth propagate next_path Previously, /login/email silently dropped the ?next=<path> query param so the hidden form field rendered empty and login always landed on /dashboard. Google's button was hard-coded to /auth/google/login, ignoring next entirely. - /login page now appends ?next to the Google button URL - /login/email reads + sanitizes next, passes as template context - google_login stashes sanitized next_path in session['login_next'] - google_callback pops + re-sanitizes and redirects there Sanitization factored into app/auth/_common.safe_next_path. * fix(auth): differentiate argon2 VerifyMismatchError from internal errors in web login The previous except (VerifyMismatchError, Exception) collapsed both cases into the generic 'invalid credentials' redirect, silently hiding corrupted-hash / library errors from ops. Split the two: bad password still gets ?error=invalid; anything else logs via logger.exception and redirects with ?err=auth_internal so ops have a visible signal and users don't retry forever against a broken password_hash column. * docs: correct CLAUDE.md table name (personal_access_tokens) v7 note referenced 'access_tokens.last_used_ip' but the real table is personal_access_tokens (as mentioned two tokens earlier in the same bullet). Same-file consistency fix. * chore(web): clarify admin user-reset UI — encourage Set password over the unused reset_token POST /api/users/{id}/reset-password stores and returns a token but no endpoint consumes it — the magic-link sender would log the user in without prompting for a new password, defeating the reset. - Drop the 'Reset' row action from admin_users so admins aren't pointed at a dead end. - Rewrite the reveal-modal copy to tell admins to use Set password and explicitly note that the magic-link flow isn't available for reset tokens in this build. The API endpoint stays for API-level future use. * test: cover PAT CLI flow, expires_in_days overflow, proxy base_url, next propagation - tests/test_pat.py: PAT can list own tokens (200, was 403); PAT can revoke own tokens (204); create_token returns 400 for expires_in_days > 3650 (was 500 via datetime overflow). - tests/test_cli_artifacts.py: _SAFE_URL_RE accepts reverse-proxy path prefixes, underscores, and IPv6 literals; end-to-end check of cli_install_script with a stubbed base_url that includes a path prefix (Agnes behind /agnes/). - tests/test_web_ui.py: /login propagates ?next to the Google button URL; /login/email renders next in the hidden form field and strips hostile values; unit coverage of safe_next_path. * fix(security): use \Z instead of $ in URL/version allowlists (trailing-\n bypass) Python regex `$` also matches just before a trailing newline, so a Host header or AGNES_VERSION value like "good.example.com\n$(rm -rf /)" would slip past the allowlist. `\Z` anchors to strict end-of-string. shlex.quote downstream remains as defense-in-depth, but the allowlist is now the tight gate it claims to be. * fix(auth): PAT with null expiry omits JWT exp claim (DB is the source of truth) Previously a PAT created with `expires_in_days=null` (user-requested "never expires") set the DB `expires_at` to NULL (correct) but still baked a ~100y `exp` claim into the JWT. That is misleading: the PAT silently did expire eventually, despite the UI and API promising "no expiry". `create_access_token` now accepts `omit_exp=True` to skip the `exp` claim entirely. `app/api/tokens.py` passes that when `expires_in_days is None`. The authoritative expiry check lives in `app/auth/dependencies.py`, which reads `expires_at` from the DB row — unchanged. PyJWT accepts claim-less JWTs indefinitely. * test: cover trailing-newline regex bypass + no-exp JWT for unbounded PAT - test_safe_url_re_rejects_trailing_newline_bypass: asserts both `_SAFE_URL_RE` and `_SAFE_VERSION_RE` reject values with a trailing `\n` (previously accepted because Python `$` matches before `\n`). - test_pat_null_expiry_jwt_has_no_exp_claim: POST /auth/tokens with `expires_in_days=null`, decode the returned JWT, assert `exp` is absent while `typ=pat`, `sub`, and `jti` are still present. - test_pat_with_null_expiry_is_accepted_by_verify_token: verify_token round-trips a claim-less JWT without ExpiredSignatureError. - test_pat_null_expiry_end_to_end_allows_authenticated_request: use the null-expiry PAT against /auth/tokens and confirm it authenticates. * docs(auth): document X-Forwarded-For trust model in _client_ip Deployment runs behind Caddy which strips incoming X-Forwarded-For and sets its own, so the leftmost hop is trustworthy. Clarify that the stored last_used_ip is audit-only and never used for access control — if the app is ever exposed directly, this value becomes client-settable. * docs: /profile → /tokens in install.sh next-steps, CLI error, HEADLESS_USAGE, security skill After splitting PAT management to /tokens (with /profile as a back-compat 302), stale references remained in user-facing text. Update them to the canonical /tokens URL so shell scripts, CLI error hints, docs, and the bundled security skill are all consistent.	2026-04-22 14:24:28 +02:00
ZdenekSrotyr	2b17973796	fix(auth): /auth/bootstrap activates seed users, disabled only by real password Bug: SEED_ADMIN_EMAIL creates a password-less user at app startup, which made /auth/bootstrap return 403 '1 users already exist' on a fresh deployment — leaving the operator no way to log in (the seed user has no password, and /auth/token requires one). Fix: bootstrap is now disabled only when at least one user has a password_hash set. On a fresh deploy with a seed user: - POST /auth/bootstrap { email: <matches seed>, password: X } → sets the password on the seed user, promotes to admin, returns token. - With a non-matching email, a new admin is created alongside the seed user. Lock semantics: bootstrap self-deactivates as soon as any password is set. Tests: 8 passing, including new test_bootstrap_activates_seed_user and test_bootstrap_disabled_when_password_user_exists covering the two halves.	2026-04-21 20:01:20 +02:00
ZdenekSrotyr	bd6921c4d5	docs,tests: anonymize customer references Replace identifying customer names and infrastructure URLs in documentation and test fixtures with generic placeholders. Test semantics preserved.	2026-04-21 11:56:19 +02:00
ZdenekSrotyr	5bbd82bacd	fix: address Devin review — docker-e2e .env, jira webhook test isolation - Create empty .env before docker compose up in CI (env_file: .env is required) - Mock get_jira_service in webhook HMAC test to isolate signature check from Jira API availability — strict assert 200 instead of permissive 500	2026-04-13 14:36:31 +02:00
ZdenekSrotyr	863453b2e2	fix: address code review findings — duplicate fixture, JWT key length, async deprecation - Remove duplicate mock_extract_factory fixture in conftest.py - Use 32+ char JWT_SECRET_KEY everywhere (was 15 chars, triggered warnings) - Replace deprecated asyncio.get_event_loop() with asyncio.run() - Unify WebhookEventFactory sign methods (consistent json.dumps)	2026-04-13 13:47:51 +02:00
ZdenekSrotyr	12480b8c35	fix: graceful skip for telegram bot tests when log dir unavailable in CI	2026-04-13 13:31:51 +02:00
ZdenekSrotyr	0045f5d324	fix: ensure DATA_DIR and notifications dir exist before bot.py import in CI	2026-04-13 13:26:18 +02:00
ZdenekSrotyr	1a68decd4e	fix: patch BOT_LOG_FILE at import time for CI/xdist compatibility	2026-04-13 13:21:04 +02:00
ZdenekSrotyr	9a144f8291	fix: unify JWT_SECRET_KEY across all test modules for xdist stability	2026-04-12 14:28:17 +02:00
ZdenekSrotyr	ed58075419	Merge branch 'worktree-agent-a417e289' into feature/v2-fastapi-duckdb-docker-cli	2026-04-12 14:24:39 +02:00
ZdenekSrotyr	325f785ef4	fix: get_instance_name reads nested instance.name from YAML	2026-04-12 14:23:54 +02:00
ZdenekSrotyr	209643becb	fix: return filename instead of absolute path in upload responses	2026-04-12 14:23:51 +02:00
ZdenekSrotyr	31e210c7e3	fix: require admin/km_admin role for web admin pages	2026-04-12 14:23:47 +02:00
ZdenekSrotyr	01b5f80ef9	fix: restrict script deploy/execute to analyst role, undeploy to admin	2026-04-12 14:23:44 +02:00
ZdenekSrotyr	2ec50b4e4f	test: add telegram API endpoint tests (verify, unlink, status)	2026-04-12 14:12:28 +02:00
ZdenekSrotyr	e25a7aba7d	fix: resolve JWT secret key test isolation issue Replace module-level SECRET_KEY cache with lazy _get_cached_secret_key() that re-reads env vars in test mode. This fixes 20 test failures caused by JWT secret mismatch when test modules load in different orders.	2026-04-12 14:05:41 +02:00
ZdenekSrotyr	833de96cd7	merge: resolve Block E conflicts in pytest.ini and conftest.py	2026-04-12 11:17:26 +02:00
ZdenekSrotyr	d70d645902	Merge branch 'worktree-agent-afb2461f' into feature/v2-fastapi-duckdb-docker-cli	2026-04-12 11:15:35 +02:00
ZdenekSrotyr	8e22eed669	Merge branch 'worktree-agent-aaa8db4c' into feature/v2-fastapi-duckdb-docker-cli	2026-04-12 11:15:34 +02:00
ZdenekSrotyr	44317a86c6	merge: resolve factories.py conflict — keep Faker factories + add Block D convenience methods	2026-04-12 11:15:15 +02:00
ZdenekSrotyr	7967279181	test: add E2E journey tests (J1-J8) covering full user flows 40 tests across 8 files covering bootstrap/auth, sync+query, hybrid queries, RBAC+access-requests, Jira webhooks, corporate memory, analyst uploads, and multi-source orchestration. Adds mock_extract_factory and admin_user fixtures to conftest, and journey marker to pytest.ini.	2026-04-12 11:13:51 +02:00
ZdenekSrotyr	9c2bd3ff25	test: add 132 API gap tests across 8 endpoint modules Covers upload (sessions, artifacts, local-md), scripts (deploy/run/delete), settings (get/dataset), memory (CRUD, voting, admin governance), access-requests (create, approve, deny), permissions (grant/revoke/list), metadata (get/save/push), and admin configure+registry endpoints. Each file tests happy path, auth required (401), role enforcement (403), and input validation (422) independently using the seeded_app fixture.	2026-04-12 11:13:24 +02:00
ZdenekSrotyr	cef1310b8f	test: add CLI gap tests for all 9 command groups 81 tests covering auth login/logout/whoami, admin user/table/metadata CRUD, sync download/upload/skip-unchanged, query local/remote/formats, analyst setup/status freshness, server subprocess delegation, diagnose health checks, explore local/remote, and metrics list/show.	2026-04-12 11:13:15 +02:00
ZdenekSrotyr	3c653b6dc2	test: add connector test suite (Block D) — 5 files, 58 tests Tests cover Keboola extractor (extension + legacy fallback, _remote_attach), BigQuery extractor (remote views, contract validation), Jira service (webhook processing, HMAC verification, HTTP mocking), Jira incremental transform (upsert/delete, monthly parquet partitioning), and LLM providers (factory, AnthropicExtractor retry/auth, OpenAICompatExtractor strategy cascade, JSON extraction helpers). Also adds tests/helpers/factories.py with WebhookEventFactory.	2026-04-12 11:12:50 +02:00
ZdenekSrotyr	b6ace1e09a	Merge branch 'worktree-agent-af11156d' into feature/v2-fastapi-duckdb-docker-cli	2026-04-12 11:12:14 +02:00
ZdenekSrotyr	5a651ca59c	test: add Block C services tests (68 tests across 6 files) Cover ws_gateway JWT auth, telegram storage user linking and verification codes, telegram bot handlers, scheduler pure functions, corporate memory collector hash detection and governance, and session file collection.	2026-04-12 11:11:48 +02:00
ZdenekSrotyr	ba61eb5f44	Merge branch 'worktree-agent-ab9a9016' into feature/v2-fastapi-duckdb-docker-cli	2026-04-12 11:10:27 +02:00
ZdenekSrotyr	4d8de9c3b7	test: add Docker E2E and live connector test files Adds test_docker_full.py (4 docker-marked tests against a running stack), test_live_keboola.py, test_live_bigquery.py, and test_live_jira.py (live-marked, read-only, skipped when credentials are absent).	2026-04-12 11:10:06 +02:00
ZdenekSrotyr	510608813c	test: add shared test infrastructure (fixtures, factories, assertions, mocks) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-12 11:05:35 +02:00
ZdenekSrotyr	e351c38368	test: add correctness test for _reattach_remote_extensions Verifies that _remote_attach table is actually found via table_catalog and contains expected extension data (not just resilience). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 08:40:12 +02:00
ZdenekSrotyr	35df940e5c	fix: BQ COUNT subquery alias, wrap ImportError in RemoteQueryError - Add AS _cnt alias to COUNT(*) subquery (BQ Standard SQL requires it) - Catch ImportError in _get_bq_client() and raise RemoteQueryError so API endpoint returns proper 400 instead of 500 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 20:29:03 +02:00
ZdenekSrotyr	77d369e311	fix: CLI help test handles ANSI escape codes in Typer output Rich/Typer may insert ANSI codes within option names like --register-bq, breaking exact string matching in CI. Check parts separately. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 19:58:01 +02:00
ZdenekSrotyr	2ad8828f8c	fix: stdin register_bq parsing, separate BQ SQL validation - cli/commands/query.py: --stdin mode now reads register_bq from the JSON payload and merges it into the register_bq option list, matching the documented {"register_bq": {...}, "sql": "..."} contract. - src/remote_query.py: add _validate_bq_sql() with a narrower blocklist (writes only); register_bq() now calls _validate_bq_sql() so legitimate BQ operations like INFORMATION_SCHEMA, CALL, IMPORT are not blocked. The final DuckDB execute() path still uses the full _validate_sql(). - tests/test_remote_query.py: add TestValidateBqSql covering allowed INFORMATION_SCHEMA queries and blocked write operations.	2026-04-11 19:31:39 +02:00
ZdenekSrotyr	f4129dc87d	fix: alias validation, url escaping, read-only CLI, blocklist comment Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-11 11:28:27 +02:00
ZdenekSrotyr	ed43feb4e6	feat: add POST /api/query/hybrid endpoint for two-phase BQ+DuckDB queries	2026-04-11 11:09:42 +02:00
ZdenekSrotyr	d605e7d95f	feat: add --register-bq and --stdin to da query for hybrid BQ+local queries Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-11 11:09:11 +02:00
ZdenekSrotyr	86bbb8fce4	feat: add RemoteQueryEngine with BQ registration and safety limits Two-phase query engine: Phase 1 registers BQ query results as DuckDB Arrow views (with COUNT pre-check, row/memory limits, Storage API fallback); Phase 2 executes validated SQL against DuckDB with result serialization and truncation. 25 tests covering all branches.	2026-04-11 11:07:08 +02:00
ZdenekSrotyr	0a69814fca	fix: re-attach remote extensions in get_analytics_db_readonly() Add _reattach_remote_extensions() helper that reads _remote_attach tables from attached extract.duckdb files and LOADs the corresponding DuckDB extensions, so BigQuery and other remote views resolve correctly in read-only analytics connections.	2026-04-11 11:04:04 +02:00
ZdenekSrotyr	4b4c071959	fix: async httpx in metadata push, guard access_token, add push test - Replace synchronous httpx.post() with async httpx.AsyncClient in push_metadata_to_source endpoint to avoid blocking the event loop - Guard data["access_token"] in CLI analyst setup with .get() and a clear error message on missing key - Add test_push_non_keboola_table_fails and test_push_keboola_table to TestMetadataAPI, covering 400/404 path and the happy path with mocked async httpx	2026-04-11 08:33:10 +02:00
ZdenekSrotyr	126d151413	fix: address code review — path injection, multi-table search, metrics import API, error handling - Validate view names with _SAFE_IDENTIFIER regex and check path traversal in _initialize_duckdb() - find_by_table() and get_table_map() now also search the tables[] array field - Add POST /api/admin/metrics/import endpoint for YAML file upload - Replace generic except in _connect_to_instance() with specific HTTPStatusError/TimeoutException handlers - Generate .claude/settings.json in _generate_claude_md() bootstrap - Update test_find_by_table and test_get_table_map to cover tables[] array lookups - Add test_import_metrics_yaml in TestMetricsAPI	2026-04-10 19:56:00 +02:00
ZdenekSrotyr	847b48f3af	feat: add da analyst status and returning-session freshness check	2026-04-10 19:44:07 +02:00
ZdenekSrotyr	d0cdfcf8c1	feat: add column metadata API with Keboola push support	2026-04-10 19:44:03 +02:00
ZdenekSrotyr	a3531f0ead	feat: add da analyst setup command with bootstrap flow	2026-04-10 19:43:36 +02:00
ZdenekSrotyr	89552a00f0	feat: add da admin metadata-show and metadata-apply commands	2026-04-10 19:42:47 +02:00
ZdenekSrotyr	bf90f06774	feat: add ColumnMetadataRepository with CRUD and proposal import	2026-04-10 19:41:53 +02:00
ZdenekSrotyr	488b79f4d1	fix: use SCHEMA_VERSION constant in e2e migration test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 19:39:19 +02:00
ZdenekSrotyr	344d744089	feat: add 10 starter pack metrics (revenue, usage, sales, operations)	2026-04-10 19:35:28 +02:00
ZdenekSrotyr	5cf0df77fc	feat: add Metrics API endpoints (GET/POST/DELETE) with admin auth - New app/api/metrics.py: GET /api/metrics, GET /api/metrics/{id:path}, POST /api/admin/metrics (201), DELETE /api/admin/metrics/{id:path} - Add require_admin dependency to app/auth/dependencies.py - Register metrics_router in app/main.py before web_router - Deprecate GET /api/catalog/metrics/{path} with 301 redirect to new endpoint - 7 new tests in TestMetricsAPI covering CRUD, 404, RBAC, category filter	2026-04-10 19:32:13 +02:00
ZdenekSrotyr	5cddb5573a	feat: add da metrics CLI subcommand (list, show, import, export, validate) Implements Task 4 — five Typer commands under `da metrics`: - list/show use api_get() to query the server API - import/export/validate access DuckDB directly via MetricRepository and TableRegistryRepository (no server required) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-10 19:28:51 +02:00
ZdenekSrotyr	a65de8574e	feat: add import_from_yaml and export_to_yaml to MetricRepository Adds YAML-based bulk import/export to MetricRepository, supporting list-wrapped and plain-dict YAML formats, table→table_name field mapping, and sql_by_* → sql_variants collection (and reverse on export). All 24 tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-10 19:25:11 +02:00
ZdenekSrotyr	88d536ca29	feat: add MetricRepository with full CRUD and search for metric_definitions Implements MetricRepository following the table_registry pattern — raw SQL, dict returns, ON CONFLICT upsert, and json.dumps for sql_variants/validation. Includes 18 tests covering create, read, list, update, delete, find_by_table, find_by_synonym, and get_table_map.	2026-04-10 19:21:25 +02:00
ZdenekSrotyr	cc1445f7ed	fix: use SCHEMA_VERSION constant in v3-to-v4 migration test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 19:18:57 +02:00
ZdenekSrotyr	bc394bd266	feat: schema migration v3→v4 with metric_definitions and column_metadata tables Add SCHEMA_VERSION = 4, _V3_TO_V4_MIGRATIONS list, and if current < 4 block in _ensure_schema(). Both new tables are also added to _SYSTEM_SCHEMA for fresh installs. Tests cover fresh install, all columns, and v3→v4 migration path.	2026-04-10 19:14:32 +02:00
ZdenekSrotyr	6c53082295	feat: multi-instance deployment — all 14 must-have items from spec CalVer CI (release.yml) with stable/dev channels, health endpoint with version/channel/schema_version, JWT secret auto-generation with file persistence, smoke test script + Docker-in-CI, pre-migration snapshot, /api/admin/configure for headless setup, /api/admin/ discover-and-register, /setup wizard, OpenAPI snapshot test, custom connector mount support, CHANGELOG, migration safety tests, startup banner. 663 tests pass (6 new migration safety + 3 OpenAPI snapshot + 1 updated JWT test).	2026-04-10 11:57:42 +02:00
ZdenekSrotyr	cf59abe6dd	fix: update tests to provide password after OAuth token bypass fix	2026-04-09 16:35:15 +02:00
ZdenekSrotyr	2043594670	fix: restrict script execution endpoints to analyst/admin roles deploy, run, and run-deployed require analyst; undeploy requires admin. Update test to use admin token for undeploy.	2026-04-09 16:31:42 +02:00
ZdenekSrotyr	449053bf8a	fix: enforce per-table access control on catalog profile endpoints Add can_access_table check to GET /api/catalog/profile/{table_name} and POST /api/catalog/profile/{table_name}/refresh, returning 403 for unauthorized tables. Update test_api_complete to cover new 403 behaviour and fix the existing 404 test to use admin token.	2026-04-09 16:30:24 +02:00
ZdenekSrotyr	ad6b3a96e4	fix: enforce role guards on admin web pages Add require_role(Role.ADMIN) to /admin/tables and /admin/permissions, and require_role(Role.KM_ADMIN) to /corporate-memory/admin so that non-admin users receive 403 instead of being served the page. Fix admin_cookie test fixture to supply a password_hash (required since the /auth/token endpoint blocks passwordless requests). Add analyst fixture and TestAdminRoleGuards tests verifying analysts get 403 and admins get 200 on the protected routes.	2026-04-09 16:30:13 +02:00
ZdenekSrotyr	3205a8d300	fix: block /auth/token for OAuth-only users without password_hash Users without a password_hash (Google OAuth / magic-link accounts) could obtain a JWT by simply posting their email to /auth/token. Add an else clause that rejects such requests with 401, directing them to their configured auth provider. Update and extend tests accordingly.	2026-04-09 16:29:47 +02:00
ZdenekSrotyr	55515266ea	fix: block DuckDB metadata functions and relative paths in query endpoint Add information_schema, duckdb_* introspection functions, pragma_* functions, and relative path traversal patterns to the SQL blocklist so users cannot enumerate schema metadata regardless of RBAC. Add six corresponding tests.	2026-04-09 16:29:11 +02:00
ZdenekSrotyr	afa84f6585	fix: web UI smoke tests — reset DuckDB singleton, get token via API	2026-04-09 07:18:17 +02:00
ZdenekSrotyr	5131816a5b	test: add missing coverage for web UI, Jira extract, instance config, and concurrent rebuild - tests/test_web_ui.py: smoke tests for all authenticated web pages (login, dashboard, catalog, corporate-memory, activity-center, admin/tables, admin/permissions) - tests/test_jira_service.py: unit tests for extract_init and update_meta in the Jira connector - tests/test_instance_config.py: verifies get_instance_name() returns a string when config file is absent - tests/test_orchestrator.py: concurrent rebuild test asserting rebuild succeeds while a read-only connection holds the analytics DB	2026-04-09 07:15:14 +02:00
ZdenekSrotyr	8df8183a9f	feat: add 50 MB upload size limit to session and artifact endpoints Rejects files exceeding MAX_UPLOAD_SIZE with HTTP 413 before writing to disk.	2026-04-09 07:14:16 +02:00
ZdenekSrotyr	8e9a0c367a	fix: replace os.environ direct assignment with monkeypatch.setenv in test fixtures Prevents environment variable leaking between tests. All DATA_DIR, JWT_SECRET_KEY, and SCRIPT_TIMEOUT assignments in fixtures now use monkeypatch.setenv() which auto-reverts after each test. Removes manual os.environ.pop() cleanup lines.	2026-04-09 07:11:36 +02:00
ZdenekSrotyr	53a9e838f9	feat: add graceful shutdown handler - Add close_system_db() function in src/db.py to cleanly close shared DB connection - Add lifespan context manager in app/main.py to trigger shutdown on app exit - Integrate lifespan into FastAPI app initialization - All API tests pass (77/77)	2026-04-09 07:03:45 +02:00
ZdenekSrotyr	1b3acce7e9	fix: replace substring table access check with word-boundary regex Replace substring matching with word-boundary regex in query endpoint's table access validation. Prevents false positives where short table names like 'id' would block any query containing the word. Uses re.escape() to safely handle special characters in table names. - Import re module at top - Use regex pattern with word boundaries (\b) for matching - Add tests to verify no false positives and proper blocking	2026-04-09 07:00:48 +02:00
ZdenekSrotyr	3e9c347cf1	fix: validate extract dir name in get_analytics_db_readonly to prevent SQL injection Adds _SAFE_IDENTIFIER regex guard before ATTACHing extract.duckdb files in the read-only analytics connection, matching the same fix already applied in the orchestrator. Adds test coverage for malicious directory names.	2026-04-09 06:57:31 +02:00
ZdenekSrotyr	e425d4baa5	fix: handle WAL files in atomic swap to prevent DB corruption Add _atomic_swap_db helper that removes stale WAL files before and after moving the temp DuckDB into place. Apply CHECKPOINT before close in both orchestrator and Keboola extractor so DuckDB flushes WAL before the swap.	2026-04-09 06:57:29 +02:00
ZdenekSrotyr	3321d2e266	security: reduce JWT expiry to 24h and add jti claim Tokens previously lasted 30 days with no revocation path. Expiry is now 24 hours and every token carries a unique jti (UUID hex) to support future revocation checks.	2026-04-09 06:57:23 +02:00
ZdenekSrotyr	23ae6a602c	security: harden query endpoint SQL blocklist and disable external access Expand blocked keywords to cover parquet_scan, read_csv_auto, query_table, iceberg_scan, delta_scan, call, URL schemes (http/https/s3/gcs), and additional file-scan functions. Set enable_external_access=false on the non-read-only analytics connection path. Add three new tests covering parquet_scan, read_csv_auto, and query_table blocking.	2026-04-09 06:54:58 +02:00
ZdenekSrotyr	4aa97c23d2	fix: raise RuntimeError on missing JWT_SECRET_KEY in non-test environments Prevents production deployments from silently using a hardcoded default secret. TESTING=1 still resolves to a built-in test key so the existing test suite is unaffected. Adds a test that verifies the RuntimeError is raised when neither JWT_SECRET_KEY nor TESTING is set.	2026-04-09 06:54:29 +02:00
ZdenekSrotyr	0d3ab5060c	fix: reject unsafe SQL identifiers in orchestrator Adds _validate_identifier() with ^[a-zA-Z_][a-zA-Z0-9_]{0,63}$ regex and applies it to source_name (directory names), table_name (_meta rows), and alias/extension (_remote_attach rows) before any SQL interpolation. Adds two tests covering SQL-injection directory names and malicious _meta entries.	2026-04-09 06:51:07 +02:00
ZdenekSrotyr	cb9c566d07	fix: rebuild_source delegates to full rebuild to preserve all source views _do_rebuild_source was creating a fresh temp DB with only one source, then atomically replacing analytics.duckdb — wiping views from every other source. Now it delegates to _do_rebuild so all extract dirs are re-attached in a single pass. Adds test_rebuild_source_preserves_other_sources to guard the regression.	2026-04-09 06:48:25 +02:00
ZdenekSrotyr	94c6b0f839	fix: require password verification when user has password_hash in /auth/token Previously the password check was gated on both user.password_hash and request.password being truthy, so an attacker could omit the password field (which defaults to "") and receive a valid JWT. Now any user with a stored hash must supply a non-empty password that passes argon2 verification. Adds six TestTokenEndpoint tests covering empty, missing, wrong, and correct password, plus no-hash user and unknown user cases.	2026-04-09 06:44:31 +02:00
ZdenekSrotyr	3ba207a7f8	feat: add _remote_attach to BigQuery extractor, support token-less ATTACH in orchestrator BigQuery extension handles auth via GOOGLE_APPLICATION_CREDENTIALS env var, so _remote_attach uses empty token_env. Orchestrator now supports both token-based (Keboola) and env-based (BigQuery) authentication modes.	2026-04-08 18:13:31 +02:00
ZdenekSrotyr	06e1cf0a8d	feat: generic _remote_attach contract for remote DuckDB extension views Extractors with remote tables now write a _remote_attach table into extract.duckdb so the orchestrator can re-ATTACH external extensions at query time. The mechanism is source-agnostic — any connector can use it. - Keboola extractor writes _remote_attach + creates views on kbc.* - Orchestrator reads _remote_attach, installs extension, reads token from env - Graceful degradation: missing token → warning, local tables still work	2026-04-08 18:10:12 +02:00
ZdenekSrotyr	05a1b452e9	security: harden query (read-only DB), uploads (path sanitization), scripts (AST validation)	2026-04-08 12:09:19 +02:00
ZdenekSrotyr	67a1e0bb45	feat: Jira webhook FastAPI adapter — replaces Flask Blueprint	2026-04-08 07:04:50 +02:00
ZdenekSrotyr	4d1acd014a	refactor: remove legacy webapp + add missing tests + housekeeping Phase A: Close fixed issues (#7, #8, #9), add server/ user/ to .gitignore, increase extractor timeout to 30 min. Phase B: Add 10 new tests — access request lifecycle (4), CLI admin commands (5), sync subprocess trigger (1). 578 tests passing. Phase C: Delete entire webapp/ directory (24,800 lines) — legacy Flask app fully replaced by FastAPI app/. Fix auth providers to use app.instance_config instead of webapp.config. Update CLAUDE.md. Delete 6 webapp-only test files. Fix Jira service config imports.	2026-03-31 13:44:06 +02:00
ZdenekSrotyr	1074d5ec49	feat: implement data access control — table-level permissions Schema v3: add is_public column to table_registry (default true). src/rbac.py: can_access_table() checks admin bypass, public flag, explicit permissions, wildcard bucket permissions. API enforcement: - manifest: filters tables by user access - download: 403 if no access - catalog: filters table list - query: validates referenced tables against allowed list New admin permissions API (/api/admin/permissions) for grant/revoke. 28 access control tests + 733 total tests passing.	2026-03-31 12:33:31 +02:00
ZdenekSrotyr	617e724d21	feat: add E2E test suite — API, extractor, Docker tests/conftest.py: shared fixtures (e2e_env, seeded_app, create_mock_extract) tests/test_e2e_api.py: 11 tests — full sync flow, RBAC, table lifecycle tests/test_e2e_extract.py: 6 tests — Keboola/BQ/Jira pipelines, multi-source, corrupt handling tests/test_e2e_docker.py: 3 tests — Docker health + full flow (opt-in via -m docker) Fix admin update route (duplicate id kwarg, .dict() → .model_dump()). 705 tests passing.	2026-03-31 08:18:54 +02:00
ZdenekSrotyr	b0eaef88cc	refactor: delete old server infra — 4,200 lines removed Remove all legacy deployment infrastructure replaced by Docker + Kamal: - server/ directory (deploy.sh, setup.sh, webapp-setup.sh, sudoers, nginx config, systemd units, bin scripts) - scripts/sync_data.sh (replaced by da sync + API) - All services/*/systemd/ files (replaced by docker-compose) - tests/test_deploy_guard.py and tests/test_sync_data.py 688 tests passing.	2026-03-31 08:06:41 +02:00
ZdenekSrotyr	caa60a507d	feat: add centralized RBAC module — replace Linux group auth New src/rbac.py: Role enum, hierarchy, get_user_role(), has_role(), is_admin(), is_km_admin(), has_dataset_access(), set_user_role(). webapp/auth.py: admin_required + km_admin_required now use DuckDB roles instead of Linux groups (pwd.getpwnam + sudo/data-ops check). app/auth/dependencies.py: imports Role from src/rbac.py (single source). 11 RBAC tests passing.	2026-03-31 08:04:35 +02:00
ZdenekSrotyr	b502bd8bdd	refactor: delete old sync pipeline — 9,500 lines removed Phase 5 cleanup: remove all code replaced by extract.duckdb architecture. Deleted modules: - src/config.py (653) — replaced by DuckDB table_registry - src/parquet_manager.py (755) — replaced by DuckDB COPY TO - src/data_sync.py (734) — replaced by SyncOrchestrator - src/remote_query.py (636) — replaced by DuckDB BigQuery ATTACH - src/table_registry.py (464) — replaced by DuckDB repository - connectors/keboola/adapter.py (820) — replaced by extractor.py - connectors/bigquery/adapter.py (665) — replaced by extractor.py - connectors/bigquery/client.py (644) — replaced by DuckDB BQ extension Updated all imports in webapp, catalog_export, enricher, router, sync_settings_service, generate_sample_data. Kept keboola/client.py as fallback (removed src.config dependency). 704 tests passing.	2026-03-31 07:50:37 +02:00
ZdenekSrotyr	9f20529f10	fix: resolve 7 preexisting test failures - Remove iCloud duplicate files (test_db 2.py, src/db 2.py) - Fix metrics expression fallback to top-level field in transformer + webapp - Fix sync_data.sh rsync exception pattern for $SSH_HOST variable - Fix deploy_guard cp regex to skip shell variable expansions - Update sudoers-deploy with missing root:data-ops rules - Update CRITICAL_DIRS ownership expectations to match deploy.sh reality 913 tests passing, 0 failures.	2026-03-30 20:36:00 +02:00
ZdenekSrotyr	18e5f0b6e8	feat: implement extract.duckdb contract — orchestrator + extractors Phase 0: extend table_registry schema (v1→v2 migration), add source_type/bucket/source_table/query_mode columns. Phase 1: SyncOrchestrator ATTACHes extract.duckdb files into master analytics.duckdb. Keboola extractor uses DuckDB extension with legacy client fallback. BigQuery extractor is remote-only via DuckDB BQ extension (no data download). 62 tests passing.	2026-03-30 20:12:56 +02:00
ZdenekSrotyr	bca5e91826	feat: add bootstrap endpoint + deploy skill for AI agents - POST /auth/bootstrap — creates first admin, self-deactivates after - da setup bootstrap — CLI command for agent-driven setup - da setup verify — structured health check (JSON output for agents) - cli/skills/deploy.md — complete deployment guide for AI agents - 6 bootstrap tests including full agent deployment flow simulation - 156 total tests passing	2026-03-30 14:01:01 +02:00
ZdenekSrotyr	1a7939c594	feat: add auth providers (Google OAuth, Password, Email magic link) + web UI fixes - Google OAuth with authlib + auto user creation + cookie-based JWT - Password auth with argon2 hash + setup token flow - Email magic link with SMTP/SendGrid support - Cookie-based auth for web UI (after OAuth redirect) - Dashboard template compatibility (user_info, activity, desktop status) - 150 tests passing	2026-03-27 17:07:59 +01:00
ZdenekSrotyr	1287e63ed9	feat: complete system — web UI, all API endpoints, governance, admin, CLI commands Major additions: - Web UI: Jinja2 templates in FastAPI (login, dashboard, catalog, corporate memory, admin) - API: catalog profiles/metrics, telegram verify/unlink/status, admin table registry CRUD - Corporate memory governance: approve/reject/mandate/revoke/edit/batch + audit log - Sync: real DataSyncManager trigger, sync-settings, table-subscriptions - CLI: setup (init/test/deploy/verify), server (logs/restart/deploy/backup), explore - Instance config integration (instance.yaml loaded at startup) - 140 tests passing (25 new)	2026-03-27 16:52:22 +01:00
ZdenekSrotyr	c5527ec153	fix: harden script sandbox and SQL query security Fixes found by E2E QA agent: - Script sandbox: block os, sys, socket, eval, exec, open, __import__, getattr, pathlib and 20+ other dangerous patterns - SQL query: block COPY, ATTACH, read_csv, semicolons, non-SELECT - Added 24 security tests covering all attack vectors	2026-03-27 16:11:05 +01:00
ZdenekSrotyr	e0ce91ddb9	feat: add dataset permissions, script execution, Kamal config, CI/CD - SyncSettingsRepository + DatasetPermissionRepository with RBAC - Script deploy/run/undeploy API with import sandboxing - User sync settings API with permission checks - 4 CLI skills (connectors, security, notifications, corporate-memory) - Kamal production + staging configs - GitHub Actions CI + deploy workflows - 91 total tests passing	2026-03-27 15:40:11 +01:00
ZdenekSrotyr	3701130a11	feat: add Docker, CLI tool, scheduler, and agent skills - Dockerfile (uv-based) + docker-compose.yml (3 services) - CLI tool 'da' with commands: auth, sync, query, status, admin, diagnose, skills - Scheduler sidecar service (replaces systemd timers) - pyproject.toml for uv distribution - Built-in skills (setup, troubleshoot) for AI agents - 17 CLI tests, 75 total tests passing	2026-03-27 15:30:03 +01:00
ZdenekSrotyr	a3918d3833	feat: add FastAPI server with auth, RBAC, and all API endpoints - JWT auth with role-based access control (viewer/analyst/admin/km_admin) - Endpoints: health, sync manifest, data download, query, users CRUD, corporate memory, session/artifact upload - 18 API tests covering auth, RBAC, all endpoints	2026-03-27 15:19:18 +01:00
ZdenekSrotyr	64acc8d731	feat: add JSON to DuckDB migration script with tests	2026-03-27 15:09:06 +01:00
ZdenekSrotyr	79b0b66f2e	feat: add DuckDB state layer with all repository classes - src/db.py: schema with 14 tables matching design spec - 7 repository classes: SyncState, Users, Knowledge, Audit, Telegram, PendingCode, Script, TableRegistry, Profiles - 37 tests covering all CRUD operations	2026-03-27 15:06:55 +01:00
ZdenekSrotyr	f76411c603	feat: add DuckDB state layer with schema management Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 13:55:54 +01:00
Petr	1318b74ff1	Add Corporate Memory governance — Phase 1 (data model + admin API) Add admin curation layer between AI extraction and knowledge distribution. Admins (km_admin flag in instance.yaml) can approve, reject, mandate, and revoke knowledge items. Mandatory items distribute to all targeted users automatically. Three governance modes (configurable per instance): - mandatory_only: admin controls everything, no user voting - admin_curated: admin controls, users vote as feedback signal - hybrid: mandatory from admin + optional from user voting Three approval workflows: - review_queue: nothing published without admin approval - auto_publish: items go live immediately, admin intervenes retroactively - threshold: confidence-based auto-publish (Phase 5) Includes: - 9 admin action functions (approve/reject/mandate/revoke/edit/batch/...) - 11 new admin API endpoints under /api/corporate-memory/admin/ - Immutable audit log (audit.jsonl) - Audience targeting via groups - Automatic migration of existing items to "approved" status - km_admin_required auth decorator - 69 tests covering all governance logic - Backward compatible: no config = legacy wiki behavior	2026-03-23 19:15:33 +01:00
Petr	95358448e6	Add modular LLM connector for Corporate Memory Replace hardwired Anthropic API calls with a pluggable provider system. Each deployment configures its AI provider in instance.yaml — switching between Anthropic, LiteLLM, OpenRouter, or any OpenAI-compatible proxy is a config change, not a code change. New connectors/llm/ module: - StructuredExtractor Protocol with extract_json() interface - AnthropicExtractor: direct Anthropic SDK with retry + backoff - OpenAICompatExtractor: any OpenAI-compatible proxy with three-layer structured output fallback (json_schema -> json_object -> prompt) - Configurable structured_output policy (strict/json/auto) - Custom exception hierarchy (auth/rate_limit/timeout/format/refusal) - Zero secrets in logs: no API keys, prompts, or responses logged Reviewed by: Google Gemini, Claude Sonnet, OpenAI GPT-5.4. Security audit passed with all critical findings resolved.	2026-03-23 12:08:33 +01:00
Petr	8c6c162417	Fix: --sql not required when --stdin is used argparse was rejecting --stdin mode because --sql was required=True. Changed to required=False with runtime validation in main().	2026-03-21 12:17:02 +01:00
Petr	d180b2014e	Step 28: Remote query architecture for local+remote table JOINs Add src/remote_query.py CLI module enabling the AI agent to run SQL queries spanning local Parquet tables and remote BigQuery tables in a single DuckDB session on the server. Two-phase protocol: BQ sub-queries (--register-bq) fetch filtered/aggregated data, then DuckDB SQL (--sql) joins everything. Safety: COUNT(*) pre-check, memory estimation (2GB cap), row limits (500K per BQ sub-query, 100K final result). Changes: - New src/remote_query.py with CLI, BQ registration, output formatting - Add bq_entity_type field to TableConfig (view vs table routing) - Extract create_local_views() from duckdb_manager.py for reuse - Update claude_md_template.txt with remote query agent instructions - Update example configs with remote_query section and docs - 52 new tests (42 remote_query + 10 bq_entity_type), all passing	2026-03-21 11:39:15 +01:00
Petr	ab99f0af92	Fix sync_schedule validation to accept multi-time daily format The scheduler.py already supported "daily HH:MM,HH:MM,HH:MM" format (commit `5f27d05`), but config.py validation regex only accepted single time "daily HH:MM", causing data-refresh to crash on startup. Also adds: - tests/test_config_sync_schedule.py (16 test cases) - Makefile with validate-config target for CI/CD integration	2026-03-17 13:21:14 +01:00
Petr	5f27d05894	Support multiple daily sync times (e.g., "daily 07:00,13:00,18:00") Scheduler now accepts comma-separated HH:MM times in daily schedules. Each time slot is independently evaluated - if any slot has passed and last_sync is before it, the table is marked as due. This lets tables sync multiple times per day to pick up data refreshes that happen throughout the day (e.g., Keboola pipelines running 3x/day).	2026-03-16 23:09:48 +01:00
Petr	ad525a96aa	Filter catalog metrics by configurable tag (e.g., AIAgent.FoundryAI) Add filter_tag support to catalog_export and webapp so only metrics with the required tag are exported to YAML and displayed in UI. Previously all 19+ metrics were exported regardless of relevance. - Add has_tag() helper to transformer module - catalog_export.py: filter_tag parameter from instance.yaml openmetadata config - webapp/app.py: filter metrics in _load_metrics_from_catalog() - 7 new tests (has_tag, filter_tag export, stale cleanup)	2026-03-16 22:03:53 +01:00
Petr	80c5b902e0	Add scheduled data sync and catalog refresh with systemd timers - New sync_schedule and profile_after_sync fields in TableConfig (formats: "every 15m", "every 1h", "daily 05:00") - New src/scheduler.py with schedule evaluation logic (is_table_due) - New --scheduled mode in data_sync.py: only syncs tables that are due, respects profile_after_sync flag, auto-restarts webapp after profiling - Systemd timer+service for data-refresh (every 15 min) - Systemd timer+service for catalog-refresh (every 15 min) - deploy.sh enables new timers automatically - Complete table config reference in data_description.md.example - 58 new scheduler tests	2026-03-15 02:16:31 +01:00
Petr	ab1a93ed67	Strip HTML tags from OpenMetadata descriptions in YAML export OpenMetadata stores descriptions as rich HTML (<p>, <strong>,  , etc.). Add strip_html() to transformer that converts to clean plain text for YAML files consumed by Claude Code agent. Applied to metric descriptions, table descriptions, and column descriptions. Webapp display dict keeps raw HTML since the modal renders it correctly.	2026-03-15 01:57:04 +01:00
Petr	985f47cdb7	Add catalog export: generate YAML metrics and tables from OpenMetadata - New `connectors/openmetadata/transformer.py` with shared parsing logic for extracting categories, grain, dimensions, expressions from OM tags - New `src/catalog_export.py` script (python -m src.catalog_export) that fetches metrics/tables from OpenMetadata API and writes YAML files to /data/docs/metrics/ and /data/docs/tables/ for agent consumption - Refactor webapp/app.py to delegate to transformer (with inline fallback) - Add `fields` parameter to client.get_metrics() and get_metric_by_fqn() for fetching tags+owners in a single API call - Fix pre-existing mock bug in test_openmetadata_enricher (base_url) - 101 new tests (80 transformer + 21 export), all passing	2026-03-15 01:15:30 +01:00
Petr	5fc9526627	Phase 2: Replace demo YAML metrics with OpenMetadata catalog data - Add get_metric_by_fqn() to OpenMetadataClient - Add get_metrics() to CatalogEnricher with TTL caching - Implement _parse_om_metric() to extract category/grain from OpenMetadata tags - Implement _load_metrics_from_catalog() to fetch and categorize metrics - Implement _build_om_metric_detail() to convert OpenMetadata format to MetricParser JSON - Add /api/catalog/metrics/<fqn> endpoint for metric detail modal - Update _load_metrics_data() to prefer catalog over YAML fallback - Update metric_modal.js to route catalog:{fqn} to catalog API endpoint - Delete 10 demo YAML files from docs/metrics/ - Replace metric tests with new unit tests for catalog parsing functions (19 tests) Catalog metrics provide single source of truth vs maintaining demo YAML files. UI remains unchanged - only data source changes from YAML to OpenMetadata catalog.	2026-03-12 15:10:42 +01:00
Petr	14d75d6229	Fix: correct OpenMetadata catalog URL path and add debug logging - Change catalog URL from /explore/{fqn} to /table/{fqn} - Add debug logging to see parsed tags, owners, tier from API response	2026-03-12 14:34:12 +01:00
Petr	c5c24cb45b	Implement OpenMetadata catalog integration (Phase 1) Add OpenMetadata REST API connector and enricher to merge table/column metadata from OpenMetadata catalog at sync and query time. Changes: - connectors/openmetadata/client.py: HTTP client for OM API - connectors/openmetadata/enricher.py: Data enrichment with TTL cache - tests/test_openmetadata_*: Unit tests for client and enricher - src/config.py: Add catalog_fqn field to TableConfig - src/data_sync.py: Use enricher in _generate_schema_yaml (catalog > BQ API > data_description.md) - webapp/app.py: Initialize enricher, enrich catalog data with tags/tier/owners/url - config/instance.yaml.example: Document openmetadata section Features: - FQN auto-derivation: bigquery.{table.id} - TTL cache (default 1h) to avoid repeated API calls - Graceful degradation: disabled if token missing, silent on HTTP errors - Column description priority: catalog > BQ API > (none) - Table description priority: catalog > data_description.md	2026-03-12 14:07:13 +01:00
Petr	8bb46a9e0a	Add per-partition streaming sync and hybrid query architecture Partitioned sync: iterates day-by-day instead of loading full dataset. Each partition: query BQ -> stream to disk -> free RAM. Peak ~50 MB. New helpers: _sync_single_partition, _cleanup_old_partitions, _generate_partition_dates. Config: added partition_column_type (DATE/TIMESTAMP/DATETIME), query_mode (local/remote/hybrid). DuckDB manager: hybrid architecture support (local Parquet + remote BQ tables). Data sync: skips remote tables, filters by query_mode. Tests: 113 passing (adapter, client, config, data_sync, duckdb_manager).	2026-03-12 13:20:41 +01:00
Petr	ee70da86c3	Stream BQ results to Parquet instead of loading into memory Replace to_arrow() (loads entire result into RAM) with to_arrow_iterable() (streams RecordBatches). Each batch is written directly to disk via ParquetWriter - constant memory regardless of table size. Prevents OOM on 8GB server for multi-million row tables.	2026-03-11 20:13:03 +01:00
Petr	a191ede28c	Add columns and row_filter to TableConfig for selective BQ export Propagate column selection and row filtering from data_description.md through the BigQuery adapter to the BQ client. This enables exporting only needed columns and applying date range filters at the SQL level, critical for large DataView tables (e.g., 412-col unit_economics).	2026-03-11 19:37:04 +01:00
Petr	758910463b	Add BigQuery data source adapter BigQuery connector that syncs BQ tables to local Parquet files via PyArrow (no CSV intermediate step). Supports full refresh, timestamp-based incremental (via incremental_column), and partition-based sync strategies. - connectors/bigquery/client.py: BQ API wrapper with ADC auth, parameterized queries, metadata cache, cross-project support (job project != data project) - connectors/bigquery/adapter.py: DataSource implementation with merge/dedup - src/config.py: Add incremental_column field to TableConfig - 72 unit tests (mocked, no GCP SDK required)	2026-03-11 13:56:12 +01:00
Petr	5a84473213	Add dynamic Business Metrics with sample e-commerce definitions Replace hardcoded Keboola-specific metrics card in Data Catalog with dynamic Jinja template that renders whatever metric YAMLs exist in docs/metrics/. Add 10 sample e-commerce metric definitions across 4 categories (revenue, customers, marketing, support) that align with the sample data generator tables. Key changes: - MetricParser: new category colors + dynamic sql_* field discovery - _load_metrics_data(): scans docs/metrics//.yml with prod fallback - catalog.html: 240 lines hardcoded HTML -> 35 lines Jinja loop - metric_modal.js: regex-based category class removal, new categories - 21 tests validating YAML schema, parser, and loader	2026-03-10 22:38:44 +01:00
Petr	302494b632	Add --format parquet using project's ParquetManager Generator now supports --format {csv,parquet,both}. Parquet mode uses src.parquet_manager.ParquetManager for snappy compression, proper column types (DATE, TIMESTAMP, DOUBLE), and metadata. No more ad-hoc pandas conversion needed on the server.	2026-03-10 21:46:20 +01:00
Petr	44bf43535b	Add sample data generator with 9 e-commerce tables Synthetic data generator for demo/testing without real data adapter: - 9 tables: customers, products, campaigns, web_sessions, web_leads, orders, order_items, payments, support_tickets - 4 size presets: xs (1MB), s (15MB), m (150MB), l (1.5GB) - Realistic patterns: seasonality, Pareto customer distribution, segment-based behavior, referential integrity - Deterministic output via --seed parameter Also: docs/sample-data.md, updated auto-install.md with Step 6, updated CLAUDE.md (email auth provider, dual-repo architecture)	2026-03-10 12:31:14 +01:00
Petr	f635195c80	Add multi-domain support and full-email username generation - Support comma-separated domains in auth.allowed_domain config - Use full email as system username (user@domain.com -> user_domain_com) to avoid collisions with reserved names and across domains - Update both auth providers (google, email) for multi-domain display - Add tests for username generation and update email auth tests	2026-03-10 10:50:01 +01:00
Petr	e2ab219171	Add email magic link authentication provider New pluggable auth provider that sends passwordless sign-in links. Works with domain restriction (same as Google OAuth). Falls back to showing the link in browser when SMTP is not configured (dev mode).	2026-03-10 10:39:19 +01:00
Petr	b99ec576ca	Add self-service data onboarding system Table Registry as central source of truth (JSON) with atomic writes, optimistic locking, audit logging, and data_description.md generation. Existing readers (config.py, profiler.py) need zero changes. Phase 1 - Discovery API: - discover_tables() on DataSource ABC + Keboola implementation - admin_required decorator with server-side recomputation - GET /api/admin/discover-tables endpoint Phase 2 - Table Registry: - src/table_registry.py with CRUD, validation, migration from MD - Admin API: register/update/unregister with version locking - DELETE cascade cleans up per-user subscriptions Phase 3 - Auto-Profiling: - profile_changed_tables() for incremental profiling - Non-fatal hook in sync_all() after successful sync Phase 4 - Per-Table Subscriptions: - table_mode (all/explicit) with per-table toggles - GET/POST /api/table-subscriptions endpoints - Subscription status in catalog and dashboard views Phase 5 - Smart Sync: - Python-generated rsync filter files (not shell YAML parsing) - sync_data.sh uses --filter="merge ..." for explicit mode Phase 6 - Admin UI: - /admin/tables with discovery, registration modal, registry mgmt - Vanilla JS, matching existing design system	2026-03-09 14:25:37 +01:00
Petr	86edd27655	Extract Jira into connectors/jira module Move all Jira-specific code into a self-contained connector module: - 22 files moved via git mv (transform, service, webhook, scripts, systemd units, tests, docs, bin helper) - All imports updated to use connectors.jira.* paths - Jira is now conditional: auto-detected via JIRA_DOMAIN env var - Webapp registers Jira blueprint only when available - Health service monitors Jira timers only when enabled - Profiler loads Jira tables dynamically from filesystem - Sync settings uses config-driven dependency validation - Renamed keboola_platform_url -> custom_url in transform - Updated deploy.sh, sudoers-deploy, backfill_gap.sh paths - Fixed pytest.ini to skip live tests by default	2026-03-09 11:17:50 +01:00
Petr	26c4e0934d	OSS cleanup: remove internal references, harden deployment, add config env interpolation Phase 1 - Internal reference cleanup: - Delete dev_docs/meetings/ (internal meeting notes/transcripts) - Replace hardcoded usernames (padak/matejkys/dasa) with deploy/generic - Replace "Internal AI Data Analyst" with "AI Data Analyst" - Replace keboola/internal_ai_data_analyst URLs with your-org/ai-data-analyst - Replace /tmp/keboola_load/ with /tmp/data_analyst_staging/ in dev_docs Phase 2 - Deployment hardening: - Tighten sudoers wildcards to explicit paths (visudo, sudoers cp) - setup.sh creates all groups (data-ops, dataread, data-private) and deploy user - webapp-setup.sh copies sudoers-webapp from repo instead of inline definition - deploy.sh conditional copy for data_description.md (not in git for OSS) - deploy.sh ownership changed to deploy:data-ops for /data/{scripts,docs,examples} Phase 3 - Config and misc: - Add ${ENV_VAR} interpolation to config/loader.py - Expand config/instance.yaml.example with all sections (admins, deployment, auth, etc.) - Create config/.env.template for secret values - Add MIT LICENSE - Fix .gitignore: add .venv/, docs/data_description.md - Fix README.md: CSV status Planned, remove metrics/, update license text - Translate Czech comments in requirements.txt to English - Fix test_account_service.py: mock username mapping instead of relying on instance config All 118 tests pass.	2026-03-09 07:59:57 +01:00
Petr	c56905d34f	Initial commit: OSS data distribution platform Open-source AI data analyst platform extracted from internal repo. Includes data sync engine, Keboola adapter, Flask web portal, server deployment scripts, and configuration templates.	2026-03-08 23:31:28 +01:00

... 5 6 7 8 9

423 commits