- Fix sync_state.json parsing: derive last_updated from table last_sync
timestamps when root-level field is missing (flat format support)
- Parse ALL YAML blocks from data_description.md (was only first block)
- Show remote tables (daily_deal_traffic) in catalog with "Live" badge
- Show per-table sync timestamps and Local/Live query mode badges
- Add data freshness note to Business Metrics section
- Dashboard: fix "Not yet synced" bug, show local/live table breakdown
Add admin curation layer between AI extraction and knowledge distribution.
Admins (km_admin flag in instance.yaml) can approve, reject, mandate, and
revoke knowledge items. Mandatory items distribute to all targeted users
automatically.
Three governance modes (configurable per instance):
- mandatory_only: admin controls everything, no user voting
- admin_curated: admin controls, users vote as feedback signal
- hybrid: mandatory from admin + optional from user voting
Three approval workflows:
- review_queue: nothing published without admin approval
- auto_publish: items go live immediately, admin intervenes retroactively
- threshold: confidence-based auto-publish (Phase 5)
Includes:
- 9 admin action functions (approve/reject/mandate/revoke/edit/batch/...)
- 11 new admin API endpoints under /api/corporate-memory/admin/
- Immutable audit log (audit.jsonl)
- Audience targeting via groups
- Automatic migration of existing items to "approved" status
- km_admin_required auth decorator
- 69 tests covering all governance logic
- Backward compatible: no config = legacy wiki behavior
List view:
- Show display_name ("M1 + VFM Operational") instead of name ("M1PlusVFMOperational")
- Strip HTML and truncate description for clean list excerpts
Modal detail:
- Render original HTML from catalog instead of stripped plain text
- Add .om-description CSS class for structured HTML (bold labels, lists, code)
- Pass description_html alongside plain text description for backwards compat
Add filter_tag support to catalog_export and webapp so only metrics
with the required tag are exported to YAML and displayed in UI.
Previously all 19+ metrics were exported regardless of relevance.
- Add has_tag() helper to transformer module
- catalog_export.py: filter_tag parameter from instance.yaml openmetadata config
- webapp/app.py: filter metrics in _load_metrics_from_catalog()
- 7 new tests (has_tag, filter_tag export, stale cleanup)
- New `connectors/openmetadata/transformer.py` with shared parsing logic
for extracting categories, grain, dimensions, expressions from OM tags
- New `src/catalog_export.py` script (python -m src.catalog_export) that
fetches metrics/tables from OpenMetadata API and writes YAML files to
/data/docs/metrics/ and /data/docs/tables/ for agent consumption
- Refactor webapp/app.py to delegate to transformer (with inline fallback)
- Add `fields` parameter to client.get_metrics() and get_metric_by_fqn()
for fetching tags+owners in a single API call
- Fix pre-existing mock bug in test_openmetadata_enricher (base_url)
- 101 new tests (80 transformer + 21 export), all passing
- Rewrite bootstrap.yaml as clean structured YAML with steps, commands,
descriptions, conditions, and notes
- Add _generate_setup_instructions() in app.py that reads YAML, substitutes
placeholders, and produces clipboard-ready plain text
- Replace 50-line hardcoded JS string builder with single tojson variable
- All setup instructions now editable in one YAML file
Read server.project_dir from instance.yaml (default: 'data-analyst').
Replace hardcoded 'data-analyst' folder name and 'data_analyst_server'
SSH key name in dashboard template with Jinja variables.
Replace hardcoded 'data-analyst' and '~/.ssh/data_analyst_server' in
the copyBootstrapInstructions JS function with values from instance config.
Pass ssh_alias and ssh_key to dashboard template context.
Config reads server.ssh_alias and server.ssh_key from instance.yaml
(defaults: 'data-analyst' and '~/.ssh/data_analyst_server' for backward compat).
App.py substitutes {ssh_alias} and {ssh_key} in bootstrap.yaml template.
Flask will now include git commit hash as URL parameter (v=abc1234)
for metric_modal.js and other static assets. This ensures browser
doesn't cache stale JavaScript when code changes.
Cache invalidation based on actual git history rather than timestamps.
FQN can contain spaces (e.g., 'Active2 Customers') which get URL-encoded
as 'Active2%20Customers' in the path parameter. Need to decode before
passing to OpenMetadata API.
OpenMetadata uses different field names than expected:
- metricExpression instead of expression
- metricType instead of type
- unitOfMeasurement instead of unit
- granularity instead of grain
Remove 'fields' query parameter from /api/v1/metrics - returns 400 Bad Request
when invalid field names are specified. Let API return full metric objects.
Update parsing to extract metadata from proper OpenMetadata fields instead
of relying on tags (tags are optional, fields are always present).
- Add get_metric_by_fqn() to OpenMetadataClient
- Add get_metrics() to CatalogEnricher with TTL caching
- Implement _parse_om_metric() to extract category/grain from OpenMetadata tags
- Implement _load_metrics_from_catalog() to fetch and categorize metrics
- Implement _build_om_metric_detail() to convert OpenMetadata format to MetricParser JSON
- Add /api/catalog/metrics/<fqn> endpoint for metric detail modal
- Update _load_metrics_data() to prefer catalog over YAML fallback
- Update metric_modal.js to route catalog:{fqn} to catalog API endpoint
- Delete 10 demo YAML files from docs/metrics/
- Replace metric tests with new unit tests for catalog parsing functions (19 tests)
Catalog metrics provide single source of truth vs maintaining demo YAML files.
UI remains unchanged - only data source changes from YAML to OpenMetadata catalog.
Pass partition_by, partition_granularity, partition_column_type, and
incremental_window_days from YAML to TableConfig to avoid validation errors
when sync_strategy='partitioned'
- API endpoint /api/catalog/profile/ enriches response with catalog metadata (tier, owners, tags, url)
- renderOverview() template function displays 'Data Catalog' section with tier, owners, tags, and catalog link
- Graceful degradation: section only shown if catalog enrichment available
Add OpenMetadata REST API connector and enricher to merge table/column metadata
from OpenMetadata catalog at sync and query time.
Changes:
- connectors/openmetadata/client.py: HTTP client for OM API
- connectors/openmetadata/enricher.py: Data enrichment with TTL cache
- tests/test_openmetadata_*: Unit tests for client and enricher
- src/config.py: Add catalog_fqn field to TableConfig
- src/data_sync.py: Use enricher in _generate_schema_yaml (catalog > BQ API > data_description.md)
- webapp/app.py: Initialize enricher, enrich catalog data with tags/tier/owners/url
- config/instance.yaml.example: Document openmetadata section
Features:
- FQN auto-derivation: bigquery.{table.id}
- TTL cache (default 1h) to avoid repeated API calls
- Graceful degradation: disabled if token missing, silent on HTTP errors
- Column description priority: catalog > BQ API > (none)
- Table description priority: catalog > data_description.md
The activity_center view was passing an empty dict but the template
expected nested keys (executive_summary, maturity_roadmap, etc).
Added _build_activity_data() that returns properly structured defaults.
- Profiler computes file_size_mb from actual parquet files when
sync_state.json is absent (sample data / no-sync deployments)
- Catalog header falls back to profiles.json for aggregate stats
(tables count, total rows) when sync_state.json is missing
Extract 4 self-contained services into services/ module:
- server/telegram_bot/ -> services/telegram_bot/
- server/ws_gateway/ -> services/ws_gateway/
- server/corporate_memory/ -> services/corporate_memory/
- server/session_collector.py -> services/session_collector/
Each service now has its own systemd/ directory with .service and .timer files.
deploy.sh updated to auto-discover service units from services/*/systemd/*.
server/ now contains only deployment infrastructure (deploy.sh, setup scripts,
bin/ management tools, sudoers, nginx config).
All imports updated: webapp/app.py, server/bin/ scripts, systemd ExecStart paths.
Move all Jira-specific code into a self-contained connector module:
- 22 files moved via git mv (transform, service, webhook, scripts,
systemd units, tests, docs, bin helper)
- All imports updated to use connectors.jira.* paths
- Jira is now conditional: auto-detected via JIRA_DOMAIN env var
- Webapp registers Jira blueprint only when available
- Health service monitors Jira timers only when enabled
- Profiler loads Jira tables dynamically from filesystem
- Sync settings uses config-driven dependency validation
- Renamed keboola_platform_url -> custom_url in transform
- Updated deploy.sh, sudoers-deploy, backfill_gap.sh paths
- Fixed pytest.ini to skip live tests by default
Open-source AI data analyst platform extracted from internal repo.
Includes data sync engine, Keboola adapter, Flask web portal,
server deployment scripts, and configuration templates.