72 commits

Author SHA1 Message Date
Petr
508d92771f Generate setup instructions from bootstrap.yaml (single source of truth)
- Rewrite bootstrap.yaml as clean structured YAML with steps, commands,
  descriptions, conditions, and notes
- Add _generate_setup_instructions() in app.py that reads YAML, substitutes
  placeholders, and produces clipboard-ready plain text
- Replace the 50-line hardcoded JS string builder with a single tojson variable
- All setup instructions now editable in one YAML file
2026-03-15 00:37:19 +01:00
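The commit above describes rendering structured steps into clipboard-ready text. A minimal sketch of that idea follows, assuming the YAML has already been parsed into a dict. The key names (`steps`, `description`, `condition`, `commands`, `notes`) and the function name `generate_setup_instructions` are hypothetical; the actual schema in bootstrap.yaml and the code in app.py may differ.

```python
def generate_setup_instructions(bootstrap: dict, values: dict) -> str:
    """Render parsed bootstrap steps as numbered plain text (hypothetical schema)."""
    lines = []
    for number, step in enumerate(bootstrap.get("steps", []), start=1):
        lines.append(f"{number}. {step['description']}")
        if step.get("condition"):
            lines.append(f"   (only if {step['condition']})")
        for command in step.get("commands", []):
            lines.append(f"   $ {command}")
        if step.get("notes"):
            lines.append(f"   Note: {step['notes']}")
    text = "\n".join(lines)
    # Replace only the known placeholders; str.format() would fail on
    # unrelated braces such as ${HOME} inside shell commands.
    for key, value in values.items():
        text = text.replace("{" + key + "}", value)
    return text


example = {
    "steps": [
        {
            "description": "Connect to {ssh_alias}",
            "commands": ["ssh {ssh_alias} echo ok"],
        }
    ]
}
print(generate_setup_instructions(example, {"ssh_alias": "analyst"}))
```

Targeted `str.replace` calls are used here so that any brace syntax the template does not own is passed through untouched.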
Petr
021c453ea6 Auto-create .sync_connection via printf command in bootstrap
Replace the 'Save this to .sync_connection' prose with an actual printf command
that Claude or the user executes. Fix heredoc indentation in bootstrap.yaml.
2026-03-15 00:05:42 +01:00
Petr
2237334b05 Make CLAUDE.md template generic and instance-aware
- Remove all Keboola-specific content (metric categories, MRR/ARR refs,
  corporate memory, hardcoded server IP)
- Add {ssh_alias}, {server_host}, {webapp_url} placeholders
- Bootstrap saves .sync_connection file with instance details
- sync_data.sh reads .sync_connection to substitute all placeholders
- Text instructions in dashboard include .sync_connection step
2026-03-14 23:57:58 +01:00
Petr
140cbb3cee Make bootstrap.yaml instance-agnostic with configurable SSH alias
Add {ssh_alias} and {ssh_key} placeholders so each instance can use
its own SSH config name (avoids conflicts when user has multiple instances).
Remove Keboola-specific sync_settings and dataset references.
Simplify to single download_server_data step (rsync with scp fallback).
Handle SSH alias conflicts gracefully.
2026-03-14 20:58:26 +01:00
Petr
da6d605ae0 Add sample metric YAML as fallback when OpenMetadata metrics unavailable
The /api/v1/metrics endpoint may not be available in all OpenMetadata instances.
This sample metric provides a fallback for demonstration purposes.
2026-03-12 15:14:04 +01:00
Petr
5fc9526627 Phase 2: Replace demo YAML metrics with OpenMetadata catalog data
- Add get_metric_by_fqn() to OpenMetadataClient
- Add get_metrics() to CatalogEnricher with TTL caching
- Implement _parse_om_metric() to extract category/grain from OpenMetadata tags
- Implement _load_metrics_from_catalog() to fetch and categorize metrics
- Implement _build_om_metric_detail() to convert OpenMetadata format to MetricParser JSON
- Add /api/catalog/metrics/<fqn> endpoint for metric detail modal
- Update _load_metrics_data() to prefer catalog over YAML fallback
- Update metric_modal.js to route catalog:{fqn} to catalog API endpoint
- Delete 10 demo YAML files from docs/metrics/
- Replace metric tests with new unit tests for catalog parsing functions (19 tests)

Catalog metrics provide a single source of truth, replacing the hand-maintained demo YAML files.
The UI remains unchanged; only the data source changes from YAML to the OpenMetadata catalog.
2026-03-12 15:10:42 +01:00
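The TTL caching mentioned for get_metrics() can be illustrated with a small wrapper. This is a sketch under stated assumptions, not the CatalogEnricher code: the class name `TTLCache`, the 300-second default, and the injectable `clock` parameter are all hypothetical.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Cache one fetched value and refresh it after ttl_seconds (hypothetical helper)."""

    def __init__(
        self,
        fetch: Callable[[], Any],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests do not need to sleep
        self._value: Any = None
        self._expires_at: Optional[float] = None

    def get(self) -> Any:
        now = self._clock()
        # Fetch on first use and whenever the cached value has expired.
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

A caller would wrap the expensive catalog request once, for example `TTLCache(lambda: client.list_metrics())` with a hypothetical `client`, and call `get()` on every page load.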
Petr
c77a6f6c2e Fix clipped annotation badges in theme-reference.html
Remove overflow:hidden from mockup containers and reposition
surface/text_primary badges that were cut off at edges.
2026-03-11 14:09:04 +01:00
Petr
d438438e33 Add configurable white-label theming via instance.yaml
Extend theming from 3 CSS variables (primary colors only) to 14
configurable properties covering colors, fonts, borders, and shape.
All values are optional with sensible defaults.

- New _theme.html include replaces duplicated inline injection
- Wire theme include into all 7 templates (base, login, dashboard,
  catalog, admin_tables, activity_center, corporate_memory)
- Conditional font loading: skip default Inter when custom font_url set
- Config.theme_overrides() classmethod generates CSS variable dict
- Visual theme-reference.html guide for instance configurators
- Document all theme keys in instance.yaml.example
2026-03-11 13:58:58 +01:00
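One way to read "all values are optional with sensible defaults" is a merge of configured keys over a defaults table, emitted as CSS custom properties. The sketch below assumes that reading. The key names, default values, and the standalone function form are hypothetical; the real Config.theme_overrides() classmethod and the 14 documented keys may look different.

```python
from typing import Dict, Optional

# Hypothetical subset of theme keys with placeholder defaults.
DEFAULT_THEME: Dict[str, str] = {
    "primary": "#2563eb",
    "font_family": "Inter, sans-serif",
    "border_radius": "8px",
}


def theme_overrides(theme_config: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Merge configured theme values over defaults and return CSS variables."""
    configured = {
        key: value
        for key, value in (theme_config or {}).items()
        if key in DEFAULT_THEME and value  # ignore unknown or empty keys
    }
    merged = {**DEFAULT_THEME, **configured}
    # snake_case config keys become --kebab-case CSS custom properties.
    return {f"--{key.replace('_', '-')}": value for key, value in merged.items()}
```

A template include could then loop over the returned dict and write each pair into a `:root { ... }` rule.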
Petr
49559fba1b Remove hardcoded Jira and Telemetry cards from catalog
These Keboola-specific data source cards don't belong in the OSS repo.
The catalog now shows only dynamic content: Core Business Data (from
data_description.md) and Business Metrics (from docs/metrics/*.yml).

Also update auto-install.md with Business Metrics documentation,
pipeline diagram, and expanded checklist.
2026-03-10 22:48:07 +01:00
Petr
5a84473213 Add dynamic Business Metrics with sample e-commerce definitions
Replace hardcoded Keboola-specific metrics card in Data Catalog with
dynamic Jinja template that renders whatever metric YAMLs exist in
docs/metrics/. Add 10 sample e-commerce metric definitions across
4 categories (revenue, customers, marketing, support) that align
with the sample data generator tables.

Key changes:
- MetricParser: new category colors + dynamic sql_* field discovery
- _load_metrics_data(): scans docs/metrics/*/*.yml with prod fallback
- catalog.html: 240 lines hardcoded HTML -> 35 lines Jinja loop
- metric_modal.js: regex-based category class removal, new categories
- 21 tests validating YAML schema, parser, and loader
2026-03-10 22:38:44 +01:00
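"Dynamic sql_* field discovery" suggests the parser collects every key with an `sql_` prefix instead of checking a fixed list. A minimal sketch of that technique, assuming a metric definition already loaded into a dict; the function name and the sample keys are hypothetical.

```python
from typing import Any, Dict


def discover_sql_fields(metric: Dict[str, Any]) -> Dict[str, str]:
    """Return every string-valued key that starts with 'sql_'."""
    return {
        key: value
        for key, value in metric.items()
        if key.startswith("sql_") and isinstance(value, str)
    }


metric = {
    "name": "revenue",
    "sql_daily": "SELECT 1",
    "sql_by_segment": "SELECT 2",
}
print(sorted(discover_sql_fields(metric)))
```

With this approach a new query variant only needs a new `sql_` key in the YAML file; no parser change is required.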
Petr
f685dc357f Document Data Catalog and Profiler pipeline in auto-install guide
- Add architecture diagram showing data flow from instance config
  through profiler to webapp
- Explain folder_mapping dual purpose (catalog categories + file paths)
- Add Step 6c for running the profiler
- Document foreign_keys for relationship diagrams
- Explain profiles.json fallback for catalog header stats
- Expand checklist with profiler verification steps
2026-03-10 22:14:45 +01:00
Petr
7f61ae8772 Update auto-install docs with Data Catalog setup
- Split Step 6 into 6a (Generate Parquet) and 6b (Configure Data Catalog)
- Document data_description.md + instance.yaml catalog categories
- Uncomment data_description.md symlink in Step 3c
- Add Data Catalog verification to Step 6 checklist
2026-03-10 22:00:28 +01:00
Petr
302494b632 Add --format parquet using project's ParquetManager
Generator now supports --format {csv,parquet,both}. Parquet mode
uses src.parquet_manager.ParquetManager for snappy compression,
proper column types (DATE, TIMESTAMP, DOUBLE), and metadata.
No more ad-hoc pandas conversion needed on the server.
2026-03-10 21:46:20 +01:00
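The `--format {csv,parquet,both}` option maps naturally onto an argparse `choices` list. The sketch below shows only the argument handling; the `csv` default and the helper names are assumptions, and the ParquetManager call itself is left out because its signature is not given in the commit.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sample data generator (sketch)")
    parser.add_argument(
        "--format",
        choices=["csv", "parquet", "both"],
        default="csv",  # assumed default; the real generator may differ
        help="output format(s) to write",
    )
    return parser


def formats_to_write(choice: str) -> List[str]:
    """Expand 'both' into the two concrete formats."""
    return ["csv", "parquet"] if choice == "both" else [choice]


args = build_parser().parse_args(["--format", "both"])
print(formats_to_write(args.format))
```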
Petr
44bf43535b Add sample data generator with 9 e-commerce tables
Synthetic data generator for demo/testing without a real data adapter:
- 9 tables: customers, products, campaigns, web_sessions, web_leads,
  orders, order_items, payments, support_tickets
- 4 size presets: xs (1MB), s (15MB), m (150MB), l (1.5GB)
- Realistic patterns: seasonality, Pareto customer distribution,
  segment-based behavior, referential integrity
- Deterministic output via --seed parameter

Also: docs/sample-data.md, updated auto-install.md with Step 6,
updated CLAUDE.md (email auth provider, dual-repo architecture)
2026-03-10 12:31:14 +01:00
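Deterministic output and a Pareto customer distribution can be combined by drawing everything from one seeded random generator. The following is an illustrative sketch only: the function name, the shape parameter 1.16 (roughly the classic 80/20 split), and the weighting scheme are assumptions, not the generator's actual logic.

```python
import random
from typing import List


def sample_order_customers(num_customers: int, num_orders: int, seed: int) -> List[int]:
    """Assign each order to a customer index, with heavy-tailed customer activity."""
    rng = random.Random(seed)  # private generator: same seed, same output
    # One Pareto-distributed activity weight per customer.
    weights = [rng.paretovariate(1.16) for _ in range(num_customers)]
    # Customers with larger weights receive proportionally more orders.
    return rng.choices(range(num_customers), weights=weights, k=num_orders)


first = sample_order_customers(100, 1000, seed=42)
second = sample_order_customers(100, 1000, seed=42)
print(first == second)
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the output reproducible even if other code also draws random numbers.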
Petr
879bc6c44f docs
2026-03-10 11:43:11 +01:00
Petr
495940d6b8 Rewrite auto-install guide with dual-repo architecture
Document the full end-to-end workflow: OSS repo (code) + private
instance repo (config/secrets). Covers SSH key isolation per repo,
symlink bridging, and ongoing deployment workflow.
2026-03-10 11:38:41 +01:00
Petr
a8a9efeb60 Update auto-install docs with steps 3-4 (config + email auth)
2026-03-10 10:43:39 +01:00
Petr
f2d3d156e3 Move standalone services from server/ to services/
Extract 4 self-contained services into services/ module:
- server/telegram_bot/ -> services/telegram_bot/
- server/ws_gateway/ -> services/ws_gateway/
- server/corporate_memory/ -> services/corporate_memory/
- server/session_collector.py -> services/session_collector/

Each service now has its own systemd/ directory with .service and .timer files.
deploy.sh updated to auto-discover service units from services/*/systemd/*.

server/ now contains only deployment infrastructure (deploy.sh, setup scripts,
bin/ management tools, sudoers, nginx config).

All imports updated: webapp/app.py, server/bin/ scripts, systemd ExecStart paths.
2026-03-09 12:54:30 +01:00
Petr
38b86127ed Branding cleanup: remove Keboola-specific references from docs and config
- server/deploy.sh: KEBOOLA_ENV_FILE -> SYNC_ENV_FILE
- server/ws-gateway.service, notify-bot.service: remove Keboola from descriptions
- .gitignore: generic comment for data directory
- CLAUDE.md, README.md, ARCHITECTURE.md: update paths from src/adapters to connectors/
- docs/DATA_SOURCES.md: update custom connector guide to connectors/ pattern
- connectors/jira/README.md: keboola-analyst -> data-analyst in config paths
- dev_docs/desktop-app.md: KeboolaAnalyst -> DataAnalyst branding
2026-03-09 12:22:27 +01:00
Petr
d8226c6641 Restructure docs for OSS readability
Remove redundant docs (GETTING_STARTED, README index, jira_schema),
add ARCHITECTURE.md and llms.txt for AI-era discoverability,
move notifications.md to docs/future/NOTIFICATIONS.md.
2026-03-09 10:42:45 +01:00
Petr
26c4e0934d OSS cleanup: remove internal references, harden deployment, add config env interpolation
Phase 1 - Internal reference cleanup:
- Delete dev_docs/meetings/ (internal meeting notes/transcripts)
- Replace hardcoded usernames (padak/matejkys/dasa) with deploy/generic
- Replace "Internal AI Data Analyst" with "AI Data Analyst"
- Replace keboola/internal_ai_data_analyst URLs with your-org/ai-data-analyst
- Replace /tmp/keboola_load/ with /tmp/data_analyst_staging/ in dev_docs

Phase 2 - Deployment hardening:
- Tighten sudoers wildcards to explicit paths (visudo, sudoers cp)
- setup.sh creates all groups (data-ops, dataread, data-private) and deploy user
- webapp-setup.sh copies sudoers-webapp from repo instead of inline definition
- deploy.sh conditional copy for data_description.md (not in git for OSS)
- deploy.sh ownership changed to deploy:data-ops for /data/{scripts,docs,examples}

Phase 3 - Config and misc:
- Add ${ENV_VAR} interpolation to config/loader.py
- Expand config/instance.yaml.example with all sections (admins, deployment, auth, etc.)
- Create config/.env.template for secret values
- Add MIT LICENSE
- Fix .gitignore: add .venv/, docs/data_description.md
- Fix README.md: mark CSV status as Planned, remove metrics/, update license text
- Translate Czech comments in requirements.txt to English
- Fix test_account_service.py: mock username mapping instead of relying on instance config

All 118 tests pass.
2026-03-09 07:59:57 +01:00
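The `${ENV_VAR}` interpolation added to config/loader.py could work roughly as follows. This is a sketch, not the loader's code: the function name is hypothetical, and leaving unknown variables untouched is an assumed policy (the real loader might raise an error or substitute an empty string instead).

```python
import os
import re
from typing import Any, Mapping, Optional

_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate_env(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively replace ${NAME} in strings with values from env."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        # Unknown variables are left as-is (assumed behaviour).
        return _ENV_PATTERN.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {key: interpolate_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [interpolate_env(item, env) for item in value]
    return value  # numbers, booleans, None pass through unchanged


config = {"smtp": {"password": "${SMTP_PASSWORD}"}, "port": 8080}
print(interpolate_env(config, {"SMTP_PASSWORD": "s3cret"}))
```

Walking the parsed structure, rather than substituting in the raw YAML text, avoids a secret containing YAML syntax changing how the file is parsed.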
Petr
c56905d34f Initial commit: OSS data distribution platform
Open-source AI data analyst platform extracted from internal repo.
Includes data sync engine, Keboola adapter, Flask web portal,
server deployment scripts, and configuration templates.
2026-03-08 23:31:28 +01:00