agnes-the-ai-analyst

Author	SHA1	Message	Date
Petr	5fc9526627	Phase 2: Replace demo YAML metrics with OpenMetadata catalog data - Add get_metric_by_fqn() to OpenMetadataClient - Add get_metrics() to CatalogEnricher with TTL caching - Implement _parse_om_metric() to extract category/grain from OpenMetadata tags - Implement _load_metrics_from_catalog() to fetch and categorize metrics - Implement _build_om_metric_detail() to convert OpenMetadata format to MetricParser JSON - Add /api/catalog/metrics/<fqn> endpoint for metric detail modal - Update _load_metrics_data() to prefer catalog over YAML fallback - Update metric_modal.js to route catalog:{fqn} to catalog API endpoint - Delete 10 demo YAML files from docs/metrics/ - Replace metric tests with new unit tests for catalog parsing functions (19 tests) Catalog metrics provide single source of truth vs maintaining demo YAML files. UI remains unchanged - only data source changes from YAML to OpenMetadata catalog.	2026-03-12 15:10:42 +01:00
Petr	14d75d6229	Fix: correct OpenMetadata catalog URL path and add debug logging - Change catalog URL from /explore/{fqn} to /table/{fqn} - Add debug logging to see parsed tags, owners, tier from API response	2026-03-12 14:34:12 +01:00
Petr	de66f6dd55	Fix: include partition fields in TableConfig for catalog enrichment Pass partition_by, partition_granularity, partition_column_type, and incremental_window_days from YAML to TableConfig to avoid validation errors when sync_strategy='partitioned'	2026-03-12 14:29:05 +01:00
Petr	2d03a9b557	Display OpenMetadata catalog enrichment in table profile overview - API endpoint /api/catalog/profile/ enriches response with catalog metadata (tier, owners, tags, url) - renderOverview() template function displays 'Data Catalog' section with tier, owners, tags, and catalog link - Graceful degradation: section only shown if catalog enrichment available	2026-03-12 14:28:02 +01:00
Petr	c5c24cb45b	Implement OpenMetadata catalog integration (Phase 1) Add OpenMetadata REST API connector and enricher to merge table/column metadata from OpenMetadata catalog at sync and query time. Changes: - connectors/openmetadata/client.py: HTTP client for OM API - connectors/openmetadata/enricher.py: Data enrichment with TTL cache - tests/test_openmetadata_*: Unit tests for client and enricher - src/config.py: Add catalog_fqn field to TableConfig - src/data_sync.py: Use enricher in _generate_schema_yaml (catalog > BQ API > data_description.md) - webapp/app.py: Initialize enricher, enrich catalog data with tags/tier/owners/url - config/instance.yaml.example: Document openmetadata section Features: - FQN auto-derivation: bigquery.{table.id} - TTL cache (default 1h) to avoid repeated API calls - Graceful degradation: disabled if token missing, silent on HTTP errors - Column description priority: catalog > BQ API > (none) - Table description priority: catalog > data_description.md	2026-03-12 14:07:13 +01:00
Petr	d3c7f7feea	Fix activity-center 500 error: provide default data structure The activity_center view was passing an empty dict but the template expected nested keys (executive_summary, maturity_roadmap, etc). Added _build_activity_data() that returns properly structured defaults.	2026-03-11 12:59:40 +01:00
Petr	ad3b94c168	Add Business Metrics card to dashboard	2026-03-10 22:52:48 +01:00
Petr	5a84473213	Add dynamic Business Metrics with sample e-commerce definitions Replace hardcoded Keboola-specific metrics card in Data Catalog with dynamic Jinja template that renders whatever metric YAMLs exist in docs/metrics/. Add 10 sample e-commerce metric definitions across 4 categories (revenue, customers, marketing, support) that align with the sample data generator tables. Key changes: - MetricParser: new category colors + dynamic sql_* field discovery - _load_metrics_data(): scans docs/metrics//.yml with prod fallback - catalog.html: 240 lines hardcoded HTML -> 35 lines Jinja loop - metric_modal.js: regex-based category class removal, new categories - 21 tests validating YAML schema, parser, and loader	2026-03-10 22:38:44 +01:00
Petr	28543d98b1	Fix profiler file_size and catalog stats fallback - Profiler computes file_size_mb from actual parquet files when sync_state.json is absent (sample data / no-sync deployments) - Catalog header falls back to profiles.json for aggregate stats (tables count, total rows) when sync_state.json is missing	2026-03-10 22:12:46 +01:00
Petr	b99ec576ca	Add self-service data onboarding system Table Registry as central source of truth (JSON) with atomic writes, optimistic locking, audit logging, and data_description.md generation. Existing readers (config.py, profiler.py) need zero changes. Phase 1 - Discovery API: - discover_tables() on DataSource ABC + Keboola implementation - admin_required decorator with server-side recomputation - GET /api/admin/discover-tables endpoint Phase 2 - Table Registry: - src/table_registry.py with CRUD, validation, migration from MD - Admin API: register/update/unregister with version locking - DELETE cascade cleans up per-user subscriptions Phase 3 - Auto-Profiling: - profile_changed_tables() for incremental profiling - Non-fatal hook in sync_all() after successful sync Phase 4 - Per-Table Subscriptions: - table_mode (all/explicit) with per-table toggles - GET/POST /api/table-subscriptions endpoints - Subscription status in catalog and dashboard views Phase 5 - Smart Sync: - Python-generated rsync filter files (not shell YAML parsing) - sync_data.sh uses --filter="merge ..." for explicit mode Phase 6 - Admin UI: - /admin/tables with discovery, registration modal, registry mgmt - Vanilla JS, matching existing design system	2026-03-09 14:25:37 +01:00
Petr	c6a711aa27	Extract pluggable auth provider system into auth/ package Replace hardcoded Google OAuth + password auth registration with auto-discovered auth providers. Each provider in auth/<name>/provider.py implements AuthProvider ABC and is automatically registered at startup. - auth/__init__.py: AuthProvider ABC + discover_providers() scanner - auth/google/: Google OAuth provider (extracted from webapp/auth.py) - auth/password/: Email/password provider (delegates to webapp/password_auth) - auth/desktop/: Desktop JWT auth (API-only, not visible on login page) - webapp/auth.py: stripped to core infra (login_required, /login, /logout) - webapp/app.py: auto-discovery loop replaces manual blueprint registration - login.html: dynamic provider buttons via Jinja loop	2026-03-09 13:02:08 +01:00
Petr	f2d3d156e3	Move standalone services from server/ to services/ Extract 4 self-contained services into services/ module: - server/telegram_bot/ -> services/telegram_bot/ - server/ws_gateway/ -> services/ws_gateway/ - server/corporate_memory/ -> services/corporate_memory/ - server/session_collector.py -> services/session_collector/ Each service now has its own systemd/ directory with .service and .timer files. deploy.sh updated to auto-discover service units from services//systemd/. server/ now contains only deployment infrastructure (deploy.sh, setup scripts, bin/ management tools, sudoers, nginx config). All imports updated: webapp/app.py, server/bin/ scripts, systemd ExecStart paths.	2026-03-09 12:54:30 +01:00
Petr	86edd27655	Extract Jira into connectors/jira module Move all Jira-specific code into a self-contained connector module: - 22 files moved via git mv (transform, service, webhook, scripts, systemd units, tests, docs, bin helper) - All imports updated to use connectors.jira.* paths - Jira is now conditional: auto-detected via JIRA_DOMAIN env var - Webapp registers Jira blueprint only when available - Health service monitors Jira timers only when enabled - Profiler loads Jira tables dynamically from filesystem - Sync settings uses config-driven dependency validation - Renamed keboola_platform_url -> custom_url in transform - Updated deploy.sh, sudoers-deploy, backfill_gap.sh paths - Fixed pytest.ini to skip live tests by default	2026-03-09 11:17:50 +01:00
Petr	c56905d34f	Initial commit: OSS data distribution platform Open-source AI data analyst platform extracted from internal repo. Includes data sync engine, Keboola adapter, Flask web portal, server deployment scripts, and configuration templates.	2026-03-08 23:31:28 +01:00

14 commits