agnes-the-ai-analyst

Author	SHA1	Message	Date
ZdenekSrotyr	caa60a507d	feat: add centralized RBAC module — replace Linux group auth New src/rbac.py: Role enum, hierarchy, get_user_role(), has_role(), is_admin(), is_km_admin(), has_dataset_access(), set_user_role(). webapp/auth.py: admin_required + km_admin_required now use DuckDB roles instead of Linux groups (pwd.getpwnam + sudo/data-ops check). app/auth/dependencies.py: imports Role from src/rbac.py (single source). 11 RBAC tests passing.	2026-03-31 08:04:35 +02:00
ZdenekSrotyr	b502bd8bdd	refactor: delete old sync pipeline — 9,500 lines removed Phase 5 cleanup: remove all code replaced by extract.duckdb architecture. Deleted modules: - src/config.py (653) — replaced by DuckDB table_registry - src/parquet_manager.py (755) — replaced by DuckDB COPY TO - src/data_sync.py (734) — replaced by SyncOrchestrator - src/remote_query.py (636) — replaced by DuckDB BigQuery ATTACH - src/table_registry.py (464) — replaced by DuckDB repository - connectors/keboola/adapter.py (820) — replaced by extractor.py - connectors/bigquery/adapter.py (665) — replaced by extractor.py - connectors/bigquery/client.py (644) — replaced by DuckDB BQ extension Updated all imports in webapp, catalog_export, enricher, router, sync_settings_service, generate_sample_data. Kept keboola/client.py as fallback (removed src.config dependency). 704 tests passing.	2026-03-31 07:50:37 +02:00
ZdenekSrotyr	9f20529f10	fix: resolve 7 preexisting test failures - Remove iCloud duplicate files (test_db 2.py, src/db 2.py) - Fix metrics expression fallback to top-level field in transformer + webapp - Fix sync_data.sh rsync exception pattern for $SSH_HOST variable - Fix deploy_guard cp regex to skip shell variable expansions - Update sudoers-deploy with missing root:data-ops rules - Update CRITICAL_DIRS ownership expectations to match deploy.sh reality 913 tests passing, 0 failures.	2026-03-30 20:36:00 +02:00
Petr	eb7e5bdf8f	Add data freshness indicators and remote table visibility to UI - Fix sync_state.json parsing: derive last_updated from table last_sync timestamps when root-level field is missing (flat format support) - Parse ALL YAML blocks from data_description.md (was only first block) - Show remote tables (daily_deal_traffic) in catalog with "Live" badge - Show per-table sync timestamps and Local/Live query mode badges - Add data freshness note to Business Metrics section - Dashboard: fix "Not yet synced" bug, show local/live table breakdown	2026-03-25 16:24:26 +01:00
Petr	0560bbc127	Rename Mandate button to Make Mandatory	2026-03-23 19:44:08 +01:00
Petr	e85d296b0a	Add Corporate Memory admin review queue UI (Phase 2) Admin page at /corporate-memory/admin with three tabs: - Review Queue: pending items with approve/mandate/reject + batch ops - All Items: status filter, promote/demote/revoke actions - Audit Log: filterable action history table Features: - Keyboard shortcuts (j/k navigate, a/r/m = approve/reject/mandate) - Inline mandate form (mandatory reason + audience targeting) - Toast notifications on action success/error - Pending count badge on main Corporate Memory page - Matches existing visual design (CSS variables, card styles)	2026-03-23 19:32:33 +01:00
Petr	1318b74ff1	Add Corporate Memory governance — Phase 1 (data model + admin API) Add admin curation layer between AI extraction and knowledge distribution. Admins (km_admin flag in instance.yaml) can approve, reject, mandate, and revoke knowledge items. Mandatory items distribute to all targeted users automatically. Three governance modes (configurable per instance): - mandatory_only: admin controls everything, no user voting - admin_curated: admin controls, users vote as feedback signal - hybrid: mandatory from admin + optional from user voting Three approval workflows: - review_queue: nothing published without admin approval - auto_publish: items go live immediately, admin intervenes retroactively - threshold: confidence-based auto-publish (Phase 5) Includes: - 9 admin action functions (approve/reject/mandate/revoke/edit/batch/...) - 11 new admin API endpoints under /api/corporate-memory/admin/ - Immutable audit log (audit.jsonl) - Audience targeting via groups - Automatic migration of existing items to "approved" status - km_admin_required auth decorator - 69 tests covering all governance logic - Backward compatible: no config = legacy wiki behavior	2026-03-23 19:15:33 +01:00
Petr	ed16122994	Use data_product config for metric discovery instead of filter_tag in webapp	2026-03-18 16:10:15 +01:00
Petr	6c0abf275b	Add cache busting to metric_modal.css include	2026-03-16 22:16:37 +01:00
Petr	9be22fdc82	Fix metric display: use displayName in list, render HTML in modal List view: - Show display_name ("M1 + VFM Operational") instead of name ("M1PlusVFMOperational") - Strip HTML and truncate description for clean list excerpts Modal detail: - Render original HTML from catalog instead of stripped plain text - Add .om-description CSS class for structured HTML (bold labels, lists, code) - Pass description_html alongside plain text description for backwards compat	2026-03-16 22:11:58 +01:00
Petr	ad525a96aa	Filter catalog metrics by configurable tag (e.g., AIAgent.FoundryAI) Add filter_tag support to catalog_export and webapp so only metrics with the required tag are exported to YAML and displayed in UI. Previously all 19+ metrics were exported regardless of relevance. - Add has_tag() helper to transformer module - catalog_export.py: filter_tag parameter from instance.yaml openmetadata config - webapp/app.py: filter metrics in _load_metrics_from_catalog() - 7 new tests (has_tag, filter_tag export, stale cleanup)	2026-03-16 22:03:53 +01:00
Petr	985f47cdb7	Add catalog export: generate YAML metrics and tables from OpenMetadata - New `connectors/openmetadata/transformer.py` with shared parsing logic for extracting categories, grain, dimensions, expressions from OM tags - New `src/catalog_export.py` script (python -m src.catalog_export) that fetches metrics/tables from OpenMetadata API and writes YAML files to /data/docs/metrics/ and /data/docs/tables/ for agent consumption - Refactor webapp/app.py to delegate to transformer (with inline fallback) - Add `fields` parameter to client.get_metrics() and get_metric_by_fqn() for fetching tags+owners in a single API call - Fix pre-existing mock bug in test_openmetadata_enricher (base_url) - 101 new tests (80 transformer + 21 export), all passing	2026-03-15 01:15:30 +01:00
Petr	508d92771f	Generate setup instructions from bootstrap.yaml (single source of truth) - Rewrite bootstrap.yaml as clean structured YAML with steps, commands, descriptions, conditions, and notes - Add _generate_setup_instructions() in app.py that reads YAML, substitutes placeholders, and produces clipboard-ready plain text - Replace 50-line hardcoded JS string builder with single tojson variable - All setup instructions now editable in one YAML file	2026-03-15 00:37:19 +01:00
Petr	85c07732b2	Fix dashboard stats: support flat sync_state.json format (no 'tables' wrapper) BigQuery adapter writes table entries at top level, not nested under 'tables'. Detect flat format by checking if values contain 'rows' key.	2026-03-15 00:26:10 +01:00
Petr	3ebb15cbab	Make project_dir, ssh_key configurable in Get Started UI Read server.project_dir from instance.yaml (default: 'data-analyst'). Replace hardcoded 'data-analyst' folder name and 'data_analyst_server' SSH key name in dashboard template with Jinja variables.	2026-03-15 00:12:46 +01:00
Petr	021c453ea6	Auto-create .sync_connection via printf command in bootstrap Replace 'Save this to .sync_connection' prose with actual printf command that Claude/user executes. Fix heredoc indentation in bootstrap.yaml.	2026-03-15 00:05:42 +01:00
Petr	2237334b05	Make CLAUDE.md template generic and instance-aware - Remove all Keboola-specific content (metric categories, MRR/ARR refs, corporate memory, hardcoded server IP) - Add {ssh_alias}, {server_host}, {webapp_url} placeholders - Bootstrap saves .sync_connection file with instance details - sync_data.sh reads .sync_connection to substitute all placeholders - Text instructions in dashboard include .sync_connection step	2026-03-14 23:57:58 +01:00
Petr	6728da63fb	Use ssh_alias and ssh_key from config in bootstrap text instructions Replace hardcoded 'data-analyst' and '~/.ssh/data_analyst_server' in the copyBootstrapInstructions JS function with values from instance config. Pass ssh_alias and ssh_key to dashboard template context.	2026-03-14 21:06:37 +01:00
Petr	13938bf72f	Read ssh_alias and ssh_key from instance.yaml for bootstrap instructions Config reads server.ssh_alias and server.ssh_key from instance.yaml (defaults: 'data-analyst' and '~/.ssh/data_analyst_server' for backward compat). App.py substitutes {ssh_alias} and {ssh_key} in bootstrap.yaml template.	2026-03-14 21:04:51 +01:00
Petr	c2681ccc86	Add cache-busting with git commit hash for static assets Flask will now include git commit hash as URL parameter (v=abc1234) for metric_modal.js and other static assets. This ensures browser doesn't cache stale JavaScript when code changes. Cache invalidation based on actual git history rather than timestamps.	2026-03-12 15:37:29 +01:00
Petr	f6000cc867	Fix: URL-encode metric FQN in catalog modal request When metric FQN contains spaces (e.g. 'Active2 Customers'), JavaScript was creating invalid URLs with literal spaces. Now properly encoding FQN with encodeURIComponent() to convert spaces to %20 before sending to API. Flask automatically decodes the path parameter back to original FQN.	2026-03-12 15:20:10 +01:00
Petr	1bcd7e4080	Fix: URL-decode metric FQN in catalog endpoint FQN can contain spaces (e.g., 'Active2 Customers') which get URL-encoded as 'Active2%20Customers' in the path parameter. Need to decode before passing to OpenMetadata API.	2026-03-12 15:18:08 +01:00
Petr	268fe07f91	Fix: Use correct OpenMetadata API field names for metrics OpenMetadata uses different field names than expected: - metricExpression instead of expression - metricType instead of type - unitOfMeasurement instead of unit - granularity instead of grain Remove 'fields' query parameter from /api/v1/metrics - returns 400 Bad Request when invalid field names are specified. Let API return full metric objects. Update parsing to extract metadata from proper OpenMetadata fields instead of relying on tags (tags are optional, fields are always present).	2026-03-12 15:16:24 +01:00
Petr	5fc9526627	Phase 2: Replace demo YAML metrics with OpenMetadata catalog data - Add get_metric_by_fqn() to OpenMetadataClient - Add get_metrics() to CatalogEnricher with TTL caching - Implement _parse_om_metric() to extract category/grain from OpenMetadata tags - Implement _load_metrics_from_catalog() to fetch and categorize metrics - Implement _build_om_metric_detail() to convert OpenMetadata format to MetricParser JSON - Add /api/catalog/metrics/<fqn> endpoint for metric detail modal - Update _load_metrics_data() to prefer catalog over YAML fallback - Update metric_modal.js to route catalog:{fqn} to catalog API endpoint - Delete 10 demo YAML files from docs/metrics/ - Replace metric tests with new unit tests for catalog parsing functions (19 tests) Catalog metrics provide single source of truth vs maintaining demo YAML files. UI remains unchanged - only data source changes from YAML to OpenMetadata catalog.	2026-03-12 15:10:42 +01:00
Petr	14d75d6229	Fix: correct OpenMetadata catalog URL path and add debug logging - Change catalog URL from /explore/{fqn} to /table/{fqn} - Add debug logging to see parsed tags, owners, tier from API response	2026-03-12 14:34:12 +01:00
Petr	de66f6dd55	Fix: include partition fields in TableConfig for catalog enrichment Pass partition_by, partition_granularity, partition_column_type, and incremental_window_days from YAML to TableConfig to avoid validation errors when sync_strategy='partitioned'	2026-03-12 14:29:05 +01:00
Petr	2d03a9b557	Display OpenMetadata catalog enrichment in table profile overview - API endpoint /api/catalog/profile/ enriches response with catalog metadata (tier, owners, tags, url) - renderOverview() template function displays 'Data Catalog' section with tier, owners, tags, and catalog link - Graceful degradation: section only shown if catalog enrichment available	2026-03-12 14:28:02 +01:00
Petr	c5c24cb45b	Implement OpenMetadata catalog integration (Phase 1) Add OpenMetadata REST API connector and enricher to merge table/column metadata from OpenMetadata catalog at sync and query time. Changes: - connectors/openmetadata/client.py: HTTP client for OM API - connectors/openmetadata/enricher.py: Data enrichment with TTL cache - tests/test_openmetadata_*: Unit tests for client and enricher - src/config.py: Add catalog_fqn field to TableConfig - src/data_sync.py: Use enricher in _generate_schema_yaml (catalog > BQ API > data_description.md) - webapp/app.py: Initialize enricher, enrich catalog data with tags/tier/owners/url - config/instance.yaml.example: Document openmetadata section Features: - FQN auto-derivation: bigquery.{table.id} - TTL cache (default 1h) to avoid repeated API calls - Graceful degradation: disabled if token missing, silent on HTTP errors - Column description priority: catalog > BQ API > (none) - Table description priority: catalog > data_description.md	2026-03-12 14:07:13 +01:00
Petr	d438438e33	Add configurable white-label theming via instance.yaml Extend theming from 3 CSS variables (primary colors only) to 14 configurable properties covering colors, fonts, borders, and shape. All values are optional with sensible defaults. - New _theme.html include replaces duplicated inline injection - Wire theme include into all 7 templates (base, login, dashboard, catalog, admin_tables, activity_center, corporate_memory) - Conditional font loading: skip default Inter when custom font_url set - Config.theme_overrides() classmethod generates CSS variable dict - Visual theme-reference.html guide for instance configurators - Document all theme keys in instance.yaml.example	2026-03-11 13:58:58 +01:00
Petr	eb5264b903	Make header logo configurable via instance.yaml logo_svg Move hardcoded Keboola SVG logo from 4 templates into config. Templates now use {{ config.LOGO_SVG | safe }}. Default falls back to Keboola logo when not configured.	2026-03-11 13:08:26 +01:00
Petr	d3c7f7feea	Fix activity-center 500 error: provide default data structure The activity_center view was passing an empty dict but the template expected nested keys (executive_summary, maturity_roadmap, etc). Added _build_activity_data() that returns properly structured defaults.	2026-03-11 12:59:40 +01:00
Petr	91a05a2c2b	Add auth.disabled_providers config to skip auth providers Reads disabled_providers list from instance.yaml auth section. Listed providers are skipped during auto-discovery.	2026-03-11 12:54:23 +01:00
Petr	954aa0f17e	Add theme color support via instance.yaml Allow instances to override primary CSS color variables through theme section in instance.yaml config.	2026-03-11 00:42:10 +01:00
Petr	ad3b94c168	Add Business Metrics card to dashboard	2026-03-10 22:52:48 +01:00
Petr	34fde746e7	Remove hardcoded Jira and Telemetry cards from dashboard	2026-03-10 22:50:22 +01:00
Petr	49559fba1b	Remove hardcoded Jira and Telemetry cards from catalog These Keboola-specific data source cards don't belong in the OSS repo. The catalog now shows only dynamic content: Core Business Data (from data_description.md) and Business Metrics (from docs/metrics/*.yml). Also update auto-install.md with Business Metrics documentation, pipeline diagram, and expanded checklist.	2026-03-10 22:48:07 +01:00
Petr	5a84473213	Add dynamic Business Metrics with sample e-commerce definitions Replace hardcoded Keboola-specific metrics card in Data Catalog with dynamic Jinja template that renders whatever metric YAMLs exist in docs/metrics/. Add 10 sample e-commerce metric definitions across 4 categories (revenue, customers, marketing, support) that align with the sample data generator tables. Key changes: - MetricParser: new category colors + dynamic sql_* field discovery - _load_metrics_data(): scans docs/metrics//.yml with prod fallback - catalog.html: 240 lines hardcoded HTML -> 35 lines Jinja loop - metric_modal.js: regex-based category class removal, new categories - 21 tests validating YAML schema, parser, and loader	2026-03-10 22:38:44 +01:00
Petr	28543d98b1	Fix profiler file_size and catalog stats fallback - Profiler computes file_size_mb from actual parquet files when sync_state.json is absent (sample data / no-sync deployments) - Catalog header falls back to profiles.json for aggregate stats (tables count, total rows) when sync_state.json is missing	2026-03-10 22:12:46 +01:00
Petr	1ac868d787	Improve setup instructions for robustness - Check for existing SSH config entry before overwriting - Use --no-perms --no-group in rsync (fixes macOS permission errors) - Explicit mkdir instead of brace expansion (Claude Code compatibility) - Gracefully handle missing server directories (empty server is OK) - Conditional steps for setup_views.sh and CLAUDE.md template	2026-03-10 11:29:31 +01:00
Petr	fde1d6fc01	Move Claude Code setup to dashboard, remove step 5 from onboarding - Get Started page now has 4 steps (folder, SSH key, pubkey, register) - After account creation, dashboard shows prominent "Set up your local environment" CTA with claude command and Copy Setup Instructions - CTA only visible when user hasn't synced yet (last_sync is empty) - Bottom banner demoted to subtle secondary style for returning users	2026-03-10 11:18:56 +01:00
Petr	45454ab86a	Redesign onboarding: compact single-screen layout with terminal block - Merge steps 1-3 into a single dark terminal block with copy buttons - Inline registration form with single-row layout for step 4 - Compact step 5 with Claude Code command and copy button on one line - Full-width layout (960px) instead of narrow 640px column - Everything fits on one screen without scrolling	2026-03-10 11:10:19 +01:00
Petr	9c4208bb89	Unify onboarding into single-column stepper with inline registration Merge the two-column layout (setup steps + registration form) into one unified flow. Step 4 now contains the registration form inline, creating a natural top-to-bottom progression through the setup process.	2026-03-10 11:06:11 +01:00
Petr	21af1abb6e	Fix setup instructions: add SSH key steps, fix clipboard on HTTP - Add steps 2-4 (SSH key generation, copy pubkey, create account) - Fix clipboard copy using textarea fallback for non-HTTPS contexts - Generate simple plain-text Claude Code prompt instead of full YAML - Show what Claude will do (SSH, rsync, DuckDB, CLAUDE.md)	2026-03-10 11:00:48 +01:00
Petr	f635195c80	Add multi-domain support and full-email username generation - Support comma-separated domains in auth.allowed_domain config - Use full email as system username (user@domain.com -> user_domain_com) to avoid collisions with reserved names and across domains - Update both auth providers (google, email) for multi-domain display - Add tests for username generation and update email auth tests	2026-03-10 10:50:01 +01:00
Petr	e2ab219171	Add email magic link authentication provider New pluggable auth provider that sends passwordless sign-in links. Works with domain restriction (same as Google OAuth). Falls back to showing the link in browser when SMTP is not configured (dev mode).	2026-03-10 10:39:19 +01:00
Petr	b99ec576ca	Add self-service data onboarding system Table Registry as central source of truth (JSON) with atomic writes, optimistic locking, audit logging, and data_description.md generation. Existing readers (config.py, profiler.py) need zero changes. Phase 1 - Discovery API: - discover_tables() on DataSource ABC + Keboola implementation - admin_required decorator with server-side recomputation - GET /api/admin/discover-tables endpoint Phase 2 - Table Registry: - src/table_registry.py with CRUD, validation, migration from MD - Admin API: register/update/unregister with version locking - DELETE cascade cleans up per-user subscriptions Phase 3 - Auto-Profiling: - profile_changed_tables() for incremental profiling - Non-fatal hook in sync_all() after successful sync Phase 4 - Per-Table Subscriptions: - table_mode (all/explicit) with per-table toggles - GET/POST /api/table-subscriptions endpoints - Subscription status in catalog and dashboard views Phase 5 - Smart Sync: - Python-generated rsync filter files (not shell YAML parsing) - sync_data.sh uses --filter="merge ..." for explicit mode Phase 6 - Admin UI: - /admin/tables with discovery, registration modal, registry mgmt - Vanilla JS, matching existing design system	2026-03-09 14:25:37 +01:00
Petr	c6a711aa27	Extract pluggable auth provider system into auth/ package Replace hardcoded Google OAuth + password auth registration with auto-discovered auth providers. Each provider in auth/<name>/provider.py implements AuthProvider ABC and is automatically registered at startup. - auth/__init__.py: AuthProvider ABC + discover_providers() scanner - auth/google/: Google OAuth provider (extracted from webapp/auth.py) - auth/password/: Email/password provider (delegates to webapp/password_auth) - auth/desktop/: Desktop JWT auth (API-only, not visible on login page) - webapp/auth.py: stripped to core infra (login_required, /login, /logout) - webapp/app.py: auto-discovery loop replaces manual blueprint registration - login.html: dynamic provider buttons via Jinja loop	2026-03-09 13:02:08 +01:00
Petr	f2d3d156e3	Move standalone services from server/ to services/ Extract 4 self-contained services into services/ module: - server/telegram_bot/ -> services/telegram_bot/ - server/ws_gateway/ -> services/ws_gateway/ - server/corporate_memory/ -> services/corporate_memory/ - server/session_collector.py -> services/session_collector/ Each service now has its own systemd/ directory with .service and .timer files. deploy.sh updated to auto-discover service units from services//systemd/. server/ now contains only deployment infrastructure (deploy.sh, setup scripts, bin/ management tools, sudoers, nginx config). All imports updated: webapp/app.py, server/bin/ scripts, systemd ExecStart paths.	2026-03-09 12:54:30 +01:00
Petr	86edd27655	Extract Jira into connectors/jira module Move all Jira-specific code into a self-contained connector module: - 22 files moved via git mv (transform, service, webhook, scripts, systemd units, tests, docs, bin helper) - All imports updated to use connectors.jira.* paths - Jira is now conditional: auto-detected via JIRA_DOMAIN env var - Webapp registers Jira blueprint only when available - Health service monitors Jira timers only when enabled - Profiler loads Jira tables dynamically from filesystem - Sync settings uses config-driven dependency validation - Renamed keboola_platform_url -> custom_url in transform - Updated deploy.sh, sudoers-deploy, backfill_gap.sh paths - Fixed pytest.ini to skip live tests by default	2026-03-09 11:17:50 +01:00
Petr	26c4e0934d	OSS cleanup: remove internal references, harden deployment, add config env interpolation Phase 1 - Internal reference cleanup: - Delete dev_docs/meetings/ (internal meeting notes/transcripts) - Replace hardcoded usernames (padak/matejkys/dasa) with deploy/generic - Replace "Internal AI Data Analyst" with "AI Data Analyst" - Replace keboola/internal_ai_data_analyst URLs with your-org/ai-data-analyst - Replace /tmp/keboola_load/ with /tmp/data_analyst_staging/ in dev_docs Phase 2 - Deployment hardening: - Tighten sudoers wildcards to explicit paths (visudo, sudoers cp) - setup.sh creates all groups (data-ops, dataread, data-private) and deploy user - webapp-setup.sh copies sudoers-webapp from repo instead of inline definition - deploy.sh conditional copy for data_description.md (not in git for OSS) - deploy.sh ownership changed to deploy:data-ops for /data/{scripts,docs,examples} Phase 3 - Config and misc: - Add ${ENV_VAR} interpolation to config/loader.py - Expand config/instance.yaml.example with all sections (admins, deployment, auth, etc.) - Create config/.env.template for secret values - Add MIT LICENSE - Fix .gitignore: add .venv/, docs/data_description.md - Fix README.md: CSV status Planned, remove metrics/, update license text - Translate Czech comments in requirements.txt to English - Fix test_account_service.py: mock username mapping instead of relying on instance config All 118 tests pass.	2026-03-09 07:59:57 +01:00
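
Commit caa60a507d describes a Role enum with a hierarchy and helpers such as has_role(), is_admin(), is_km_admin() and get_user_role(). Below is a minimal sketch of a hierarchy check. It assumes hypothetical role names (VIEWER, ANALYST, KM_ADMIN, ADMIN) and an integer ordering. The real src/rbac.py reads roles from DuckDB, and its roles may differ.

```python
from enum import IntEnum


class Role(IntEnum):
    # Hypothetical roles; a higher value means more privilege.
    VIEWER = 1
    ANALYST = 2
    KM_ADMIN = 3
    ADMIN = 4


def has_role(user_role: Role, required: Role) -> bool:
    """True when the user's role sits at or above the required role."""
    return user_role >= required


def is_admin(user_role: Role) -> bool:
    return has_role(user_role, Role.ADMIN)


def is_km_admin(user_role: Role) -> bool:
    # Because of the ordering, ADMIN also passes this check (an assumption).
    return has_role(user_role, Role.KM_ADMIN)


def get_user_role(roles: dict, email: str) -> Role:
    # Stand-in for the DuckDB lookup: unknown users fall back to VIEWER (assumed default).
    return roles.get(email, Role.VIEWER)
```

With IntEnum, the hierarchy reduces to a plain comparison. Adding a role only means choosing its position in the order.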
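
Commit f635195c80 maps a full email to a system username (user@domain.com -> user_domain_com) and accepts comma-separated domains in auth.allowed_domain. Below is a sketch of both rules. The helper names are hypothetical, and replacing every non-alphanumeric character with an underscore is an assumption about the exact mapping.

```python
import re


def parse_allowed_domains(value: str) -> list:
    """Split a comma-separated auth.allowed_domain value into normalized domains."""
    return [d.strip().lower() for d in value.split(",") if d.strip()]


def is_allowed_email(email: str, allowed: list) -> bool:
    # Compare only the part after the last "@", case-insensitively.
    domain = email.rsplit("@", 1)[-1].lower()
    return domain in allowed


def email_to_username(email: str) -> str:
    """Map a full email to a system username, e.g. user@domain.com -> user_domain_com."""
    # Assumption: every character outside [a-z0-9] becomes an underscore.
    return re.sub(r"[^a-z0-9]", "_", email.strip().lower())
```

Because the domain is kept in the username, the same local part on two allowed domains yields two distinct accounts.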
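
Commits 85c07732b2 and eb7e5bdf8f handle two sync_state.json layouts. In the nested layout, entries sit under a 'tables' key. In the flat layout, table entries sit at the top level and are detected by a 'rows' key. When the root-level last_updated field is missing, it is derived from per-table last_sync values. Below is a sketch of that detection. The function names are hypothetical, and the sketch assumes last_sync values are ISO 8601 strings with UTC offsets.

```python
from datetime import datetime
from typing import Optional


def extract_tables(state: dict) -> dict:
    """Return per-table entries from sync_state.json in either layout."""
    if isinstance(state.get("tables"), dict):
        return state["tables"]  # nested layout: {"tables": {...}}
    # Flat layout: table entries are top-level dicts that carry a "rows" key.
    return {k: v for k, v in state.items() if isinstance(v, dict) and "rows" in v}


def last_updated(state: dict) -> Optional[str]:
    """Prefer a root-level last_updated; otherwise use the newest per-table last_sync."""
    if state.get("last_updated"):
        return state["last_updated"]
    stamps = [t["last_sync"] for t in extract_tables(state).values() if t.get("last_sync")]
    if not stamps:
        return None  # nothing synced yet
    # Compare parsed datetimes rather than raw strings so differing offsets order correctly.
    return max(stamps, key=datetime.fromisoformat)
```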
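
Commit 26c4e0934d adds ${ENV_VAR} interpolation to config/loader.py. Below is a sketch of recursive interpolation over a loaded YAML structure. The function name is hypothetical. Leaving unknown variables as the literal placeholder is also an assumption: the real loader may raise an error or substitute an empty string instead.

```python
import os
import re
from typing import Any, Mapping, Optional

# Matches ${NAME} where NAME is a conventional environment variable identifier.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively replace ${VAR} in strings inside dicts and lists."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        # Assumption: unknown variables are left as the literal ${VAR} text.
        return _ENV_PATTERN.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {k: interpolate(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate(v, env) for v in value]
    return value  # numbers, booleans and None pass through unchanged
```

Interpolating after YAML parsing keeps secrets out of instance.yaml. The values can live in config/.env, and only their names are referenced in the config.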
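
Commit c5c24cb45b describes FQN auto-derivation as bigquery.{table.id} and a TTL cache (default 1h) in the OpenMetadata enricher. Below is a sketch of both, assuming a simple in-memory dict keyed by FQN. The class and function names are hypothetical and are not taken from connectors/openmetadata/enricher.py.

```python
import time
from typing import Any, Callable, Dict, Tuple


def derive_fqn(table_id: str) -> str:
    """Derive the catalog FQN from a table id, following the bigquery.{table.id} rule."""
    return f"bigquery.{table_id}"


class TTLCache:
    """Cache fetched values per key and refetch them once they are older than ttl_seconds."""

    def __init__(self, ttl_seconds: float = 3600.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._data.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the API call
        value = fetch()
        self._data[key] = (now, value)
        return value
```

A monotonic clock keeps expiry correct even if the system time is adjusted while the webapp is running.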

51 commits