agnes-the-ai-analyst

Author	SHA1	Message	Date
Petr	c2681ccc86	Add cache-busting with git commit hash for static assets Flask will now include git commit hash as URL parameter (v=abc1234) for metric_modal.js and other static assets. This ensures browser doesn't cache stale JavaScript when code changes. Cache invalidation based on actual git history rather than timestamps.	2026-03-12 15:37:29 +01:00
Petr	1bcd7e4080	Fix: URL-decode metric FQN in catalog endpoint FQN can contain spaces (e.g., 'Active2 Customers') which get URL-encoded as 'Active2%20Customers' in the path parameter. Need to decode before passing to OpenMetadata API.	2026-03-12 15:18:08 +01:00
Petr	268fe07f91	Fix: Use correct OpenMetadata API field names for metrics OpenMetadata uses different field names than expected: - metricExpression instead of expression - metricType instead of type - unitOfMeasurement instead of unit - granularity instead of grain Remove 'fields' query parameter from /api/v1/metrics - returns 400 Bad Request when invalid field names are specified. Let API return full metric objects. Update parsing to extract metadata from proper OpenMetadata fields instead of relying on tags (tags are optional, fields are always present).	2026-03-12 15:16:24 +01:00
Petr	5fc9526627	Phase 2: Replace demo YAML metrics with OpenMetadata catalog data - Add get_metric_by_fqn() to OpenMetadataClient - Add get_metrics() to CatalogEnricher with TTL caching - Implement _parse_om_metric() to extract category/grain from OpenMetadata tags - Implement _load_metrics_from_catalog() to fetch and categorize metrics - Implement _build_om_metric_detail() to convert OpenMetadata format to MetricParser JSON - Add /api/catalog/metrics/<fqn> endpoint for metric detail modal - Update _load_metrics_data() to prefer catalog over YAML fallback - Update metric_modal.js to route catalog:{fqn} to catalog API endpoint - Delete 10 demo YAML files from docs/metrics/ - Replace metric tests with new unit tests for catalog parsing functions (19 tests) Catalog metrics provide single source of truth vs maintaining demo YAML files. UI remains unchanged - only data source changes from YAML to OpenMetadata catalog.	2026-03-12 15:10:42 +01:00
Petr	14d75d6229	Fix: correct OpenMetadata catalog URL path and add debug logging - Change catalog URL from /explore/{fqn} to /table/{fqn} - Add debug logging to see parsed tags, owners, tier from API response	2026-03-12 14:34:12 +01:00
Petr	de66f6dd55	Fix: include partition fields in TableConfig for catalog enrichment Pass partition_by, partition_granularity, partition_column_type, and incremental_window_days from YAML to TableConfig to avoid validation errors when sync_strategy='partitioned'	2026-03-12 14:29:05 +01:00
Petr	2d03a9b557	Display OpenMetadata catalog enrichment in table profile overview - API endpoint /api/catalog/profile/ enriches response with catalog metadata (tier, owners, tags, url) - renderOverview() template function displays 'Data Catalog' section with tier, owners, tags, and catalog link - Graceful degradation: section only shown if catalog enrichment available	2026-03-12 14:28:02 +01:00
Petr	c5c24cb45b	Implement OpenMetadata catalog integration (Phase 1) Add OpenMetadata REST API connector and enricher to merge table/column metadata from OpenMetadata catalog at sync and query time. Changes: - connectors/openmetadata/client.py: HTTP client for OM API - connectors/openmetadata/enricher.py: Data enrichment with TTL cache - tests/test_openmetadata_*: Unit tests for client and enricher - src/config.py: Add catalog_fqn field to TableConfig - src/data_sync.py: Use enricher in _generate_schema_yaml (catalog > BQ API > data_description.md) - webapp/app.py: Initialize enricher, enrich catalog data with tags/tier/owners/url - config/instance.yaml.example: Document openmetadata section Features: - FQN auto-derivation: bigquery.{table.id} - TTL cache (default 1h) to avoid repeated API calls - Graceful degradation: disabled if token missing, silent on HTTP errors - Column description priority: catalog > BQ API > (none) - Table description priority: catalog > data_description.md	2026-03-12 14:07:13 +01:00
Petr	d3c7f7feea	Fix activity-center 500 error: provide default data structure The activity_center view was passing an empty dict but the template expected nested keys (executive_summary, maturity_roadmap, etc). Added _build_activity_data() that returns properly structured defaults.	2026-03-11 12:59:40 +01:00
Petr	ad3b94c168	Add Business Metrics card to dashboard	2026-03-10 22:52:48 +01:00
Petr	5a84473213	Add dynamic Business Metrics with sample e-commerce definitions Replace hardcoded Keboola-specific metrics card in Data Catalog with dynamic Jinja template that renders whatever metric YAMLs exist in docs/metrics/. Add 10 sample e-commerce metric definitions across 4 categories (revenue, customers, marketing, support) that align with the sample data generator tables. Key changes: - MetricParser: new category colors + dynamic sql_* field discovery - _load_metrics_data(): scans docs/metrics//.yml with prod fallback - catalog.html: 240 lines hardcoded HTML -> 35 lines Jinja loop - metric_modal.js: regex-based category class removal, new categories - 21 tests validating YAML schema, parser, and loader	2026-03-10 22:38:44 +01:00
Petr	28543d98b1	Fix profiler file_size and catalog stats fallback - Profiler computes file_size_mb from actual parquet files when sync_state.json is absent (sample data / no-sync deployments) - Catalog header falls back to profiles.json for aggregate stats (tables count, total rows) when sync_state.json is missing	2026-03-10 22:12:46 +01:00
Petr	b99ec576ca	Add self-service data onboarding system Table Registry as central source of truth (JSON) with atomic writes, optimistic locking, audit logging, and data_description.md generation. Existing readers (config.py, profiler.py) need zero changes. Phase 1 - Discovery API: - discover_tables() on DataSource ABC + Keboola implementation - admin_required decorator with server-side recomputation - GET /api/admin/discover-tables endpoint Phase 2 - Table Registry: - src/table_registry.py with CRUD, validation, migration from MD - Admin API: register/update/unregister with version locking - DELETE cascade cleans up per-user subscriptions Phase 3 - Auto-Profiling: - profile_changed_tables() for incremental profiling - Non-fatal hook in sync_all() after successful sync Phase 4 - Per-Table Subscriptions: - table_mode (all/explicit) with per-table toggles - GET/POST /api/table-subscriptions endpoints - Subscription status in catalog and dashboard views Phase 5 - Smart Sync: - Python-generated rsync filter files (not shell YAML parsing) - sync_data.sh uses --filter="merge ..." for explicit mode Phase 6 - Admin UI: - /admin/tables with discovery, registration modal, registry mgmt - Vanilla JS, matching existing design system	2026-03-09 14:25:37 +01:00
Petr	c6a711aa27	Extract pluggable auth provider system into auth/ package Replace hardcoded Google OAuth + password auth registration with auto-discovered auth providers. Each provider in auth/<name>/provider.py implements AuthProvider ABC and is automatically registered at startup. - auth/__init__.py: AuthProvider ABC + discover_providers() scanner - auth/google/: Google OAuth provider (extracted from webapp/auth.py) - auth/password/: Email/password provider (delegates to webapp/password_auth) - auth/desktop/: Desktop JWT auth (API-only, not visible on login page) - webapp/auth.py: stripped to core infra (login_required, /login, /logout) - webapp/app.py: auto-discovery loop replaces manual blueprint registration - login.html: dynamic provider buttons via Jinja loop	2026-03-09 13:02:08 +01:00
Petr	f2d3d156e3	Move standalone services from server/ to services/ Extract 4 self-contained services into services/ module: - server/telegram_bot/ -> services/telegram_bot/ - server/ws_gateway/ -> services/ws_gateway/ - server/corporate_memory/ -> services/corporate_memory/ - server/session_collector.py -> services/session_collector/ Each service now has its own systemd/ directory with .service and .timer files. deploy.sh updated to auto-discover service units from services//systemd/. server/ now contains only deployment infrastructure (deploy.sh, setup scripts, bin/ management tools, sudoers, nginx config). All imports updated: webapp/app.py, server/bin/ scripts, systemd ExecStart paths.	2026-03-09 12:54:30 +01:00
Petr	86edd27655	Extract Jira into connectors/jira module Move all Jira-specific code into a self-contained connector module: - 22 files moved via git mv (transform, service, webhook, scripts, systemd units, tests, docs, bin helper) - All imports updated to use connectors.jira.* paths - Jira is now conditional: auto-detected via JIRA_DOMAIN env var - Webapp registers Jira blueprint only when available - Health service monitors Jira timers only when enabled - Profiler loads Jira tables dynamically from filesystem - Sync settings uses config-driven dependency validation - Renamed keboola_platform_url -> custom_url in transform - Updated deploy.sh, sudoers-deploy, backfill_gap.sh paths - Fixed pytest.ini to skip live tests by default	2026-03-09 11:17:50 +01:00
Petr	c56905d34f	Initial commit: OSS data distribution platform Open-source AI data analyst platform extracted from internal repo. Includes data sync engine, Keboola adapter, Flask web portal, server deployment scripts, and configuration templates.	2026-03-08 23:31:28 +01:00

17 commits
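
Commit c5c24cb45b describes a TTL cache (default 1h), silent handling of HTTP errors, and a description priority of catalog > BQ API > (none). The sketch below illustrates that behavior in isolation. It is not the code from connectors/openmetadata/enricher.py: the names TTLCache, get_or_fetch, and pick_description, and the injectable clock, are hypothetical and chosen only for this example.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Small time-based cache: an entry is reused until ttl_seconds have passed."""

    def __init__(
        self,
        ttl_seconds: float = 3600.0,  # 1h default, as stated in the commit message
        clock: Callable[[], float] = time.monotonic,  # injectable for testing (assumption)
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[str], Any]) -> Optional[Any]:
        """Return the cached value for key, calling fetch(key) only on a miss or expiry."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        try:
            value = fetch(key)
        except Exception:
            # Graceful degradation: a failed lookup yields None and is not cached,
            # so the next call tries the fetch again.
            return None
        self._entries[key] = (now, value)
        return value


def pick_description(catalog: Optional[str], bq_api: Optional[str]) -> Optional[str]:
    """Apply the column description priority: catalog, then BQ API, then nothing."""
    for candidate in (catalog, bq_api):
        if candidate:
            return candidate
    return None
```

A caller would pass the real API lookup as fetch, for example a function that requests table metadata by FQN. Keeping the clock injectable lets expiry be tested without sleeping.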