Commit graph

918 commits

Author SHA1 Message Date
Petr
f635195c80 Add multi-domain support and full-email username generation
- Support comma-separated domains in auth.allowed_domain config
- Use full email as system username (user@domain.com -> user_domain_com)
  to avoid collisions with reserved names and across domains
- Update both auth providers (google, email) for multi-domain display
- Add tests for username generation and update email auth tests
2026-03-10 10:50:01 +01:00
Petr
a8a9efeb60 Update auto-install docs with steps 3-4 (config + email auth) 2026-03-10 10:43:39 +01:00
Petr
e2ab219171 Add email magic link authentication provider
New pluggable auth provider that sends passwordless sign-in links.
Works with domain restriction (same as Google OAuth). Falls back to
showing the link in browser when SMTP is not configured (dev mode).
2026-03-10 10:39:19 +01:00
Petr
b99ec576ca Add self-service data onboarding system
Table Registry as central source of truth (JSON) with atomic writes,
optimistic locking, audit logging, and data_description.md generation.
Existing readers (config.py, profiler.py) need zero changes.

Phase 1 - Discovery API:
  - discover_tables() on DataSource ABC + Keboola implementation
  - admin_required decorator with server-side recomputation
  - GET /api/admin/discover-tables endpoint

Phase 2 - Table Registry:
  - src/table_registry.py with CRUD, validation, migration from MD
  - Admin API: register/update/unregister with version locking
  - DELETE cascade cleans up per-user subscriptions

Phase 3 - Auto-Profiling:
  - profile_changed_tables() for incremental profiling
  - Non-fatal hook in sync_all() after successful sync

Phase 4 - Per-Table Subscriptions:
  - table_mode (all/explicit) with per-table toggles
  - GET/POST /api/table-subscriptions endpoints
  - Subscription status in catalog and dashboard views

Phase 5 - Smart Sync:
  - Python-generated rsync filter files (not shell YAML parsing)
  - sync_data.sh uses --filter="merge ..." for explicit mode

Phase 6 - Admin UI:
  - /admin/tables with discovery, registration modal, registry mgmt
  - Vanilla JS, matching existing design system
2026-03-09 14:25:37 +01:00
Petr
7c9007a8f9 Update docs for modular architecture (auth/, services/, scripts/)
Add auth providers, standalone services, and service patterns
to project structure in README, ARCHITECTURE, and CLAUDE.md.
Reflects the completed extraction of auth, telegram bot,
ws gateway, corporate memory, and session collector.
2026-03-09 13:11:40 +01:00
Petr
15b513266d Merge dev_scripts/ into scripts/
Move dev_run.py and test_sync.sh from dev_scripts/ to scripts/,
eliminating the separate dev_scripts directory. Update scripts
README with development scripts section.
2026-03-09 13:11:36 +01:00
Petr
2d3f127e58 Update paths in docs and sudoers after services/ extraction
All references to server/telegram_bot/, server/ws_gateway/,
server/corporate_memory/, server/session_collector* updated
to their new locations under services/.
2026-03-09 13:02:13 +01:00
Petr
c6a711aa27 Extract pluggable auth provider system into auth/ package
Replace hardcoded Google OAuth + password auth registration with
auto-discovered auth providers. Each provider in auth/<name>/provider.py
implements AuthProvider ABC and is automatically registered at startup.

- auth/__init__.py: AuthProvider ABC + discover_providers() scanner
- auth/google/: Google OAuth provider (extracted from webapp/auth.py)
- auth/password/: Email/password provider (delegates to webapp/password_auth)
- auth/desktop/: Desktop JWT auth (API-only, not visible on login page)
- webapp/auth.py: stripped to core infra (login_required, /login, /logout)
- webapp/app.py: auto-discovery loop replaces manual blueprint registration
- login.html: dynamic provider buttons via Jinja loop
2026-03-09 13:02:08 +01:00
Petr
f2d3d156e3 Move standalone services from server/ to services/
Extract 4 self-contained services into services/ module:
- server/telegram_bot/ -> services/telegram_bot/
- server/ws_gateway/ -> services/ws_gateway/
- server/corporate_memory/ -> services/corporate_memory/
- server/session_collector.py -> services/session_collector/

Each service now has its own systemd/ directory with .service and .timer files.
deploy.sh updated to auto-discover service units from services/*/systemd/*.

server/ now contains only deployment infrastructure (deploy.sh, setup scripts,
bin/ management tools, sudoers, nginx config).

All imports updated: webapp/app.py, server/bin/ scripts, systemd ExecStart paths.
2026-03-09 12:54:30 +01:00
Petr
38b86127ed Branding cleanup: remove Keboola-specific references from docs and config
- server/deploy.sh: KEBOOLA_ENV_FILE -> SYNC_ENV_FILE
- server/ws-gateway.service, notify-bot.service: remove Keboola from descriptions
- .gitignore: generic comment for data directory
- CLAUDE.md, README.md, ARCHITECTURE.md: update paths from src/adapters to connectors/
- docs/DATA_SOURCES.md: update custom connector guide to connectors/ pattern
- connectors/jira/README.md: keboola-analyst -> data-analyst in config paths
- dev_docs/desktop-app.md: KeboolaAnalyst -> DataAnalyst branding
2026-03-09 12:22:27 +01:00
Petr
266e8573d3 Extract Keboola into connectors/keboola module
Move all Keboola-specific code out of src/ into connectors/keboola/:
- git mv src/keboola_client.py -> connectors/keboola/client.py
- Extract LocalKeboolaSource (855 lines) from data_sync.py -> connectors/keboola/adapter.py
- Rename to KeboolaDataSource with full env var validation
- Extend DataSource ABC with get_column_metadata() and get_source_name()
- Add dynamic connector registry via importlib in create_data_source()
- Refactor _generate_schema_yaml to use ABC methods (source_type, _schema_version: 2)
- Remove src/adapters/ (redundant facade layer)
- Remove Keboola validation from src/config.py (connector validates itself)
- Add 14 tests for factory, ABC defaults, env validation, dynamic lookup
2026-03-09 12:22:16 +01:00
Petr
e3b741210e Merge: extract Jira into connectors/jira module 2026-03-09 11:18:03 +01:00
Petr
86edd27655 Extract Jira into connectors/jira module
Move all Jira-specific code into a self-contained connector module:
- 22 files moved via git mv (transform, service, webhook, scripts,
  systemd units, tests, docs, bin helper)
- All imports updated to use connectors.jira.* paths
- Jira is now conditional: auto-detected via JIRA_DOMAIN env var
- Webapp registers Jira blueprint only when available
- Health service monitors Jira timers only when enabled
- Profiler loads Jira tables dynamically from filesystem
- Sync settings uses config-driven dependency validation
- Renamed keboola_platform_url -> custom_url in transform
- Updated deploy.sh, sudoers-deploy, backfill_gap.sh paths
- Fixed pytest.ini to skip live tests by default
2026-03-09 11:17:50 +01:00
Petr
d8226c6641 Restructure docs for OSS readability
Remove redundant docs (GETTING_STARTED, README index, jira_schema),
add ARCHITECTURE.md and llms.txt for AI-era discoverability,
move notifications.md to docs/future/NOTIFICATIONS.md.
2026-03-09 10:42:45 +01:00
Petr
1471b8addf Add agent-generated artifacts to .gitignore 2026-03-09 08:33:31 +01:00
Petr
485ac0a742 Security fixes: sanitize dev_docs, harden sudoers and config validation
H1 - Sanitize dev_docs/ for public release:
  - Replace all real employee names with generic placeholders
    (padak->admin1, matejkys->admin2, dasa->admin3, petr->john, etc.)
  - Replace GCP project ID (kids-ai-data-analysis -> your-gcp-project)
  - Replace server hostname (data-broker-for-claude -> your-server)
  - Replace real IP address (34.88.8.46 -> YOUR_SERVER_IP)
  - Replace internal FQDN with placeholder
  - Covers: security.md, server.md, disaster-recovery.md, desktop-app.md,
    session_explore.md, plan-rsync-fix.md, draft/*.md

H3 - webapp-setup.sh: validate sudoers syntax BEFORE copying to /etc/sudoers.d
  - Prevents broken sudo if syntax is invalid
  - Uses install -m 440 for atomic copy with correct permissions

M1 - setup.sh: deploy user created with /usr/sbin/nologin instead of /bin/bash
  - CI/CD service account does not need interactive shell

M2 - config/loader.py: warn on missing env vars, validate webapp_secret_key
  - _resolve_env_refs now logs warnings for unset ${ENV_VAR} references
  - _validate_config checks auth.webapp_secret_key is non-empty
  - Prevents Flask signing sessions with empty secret key

All 118 tests pass.
2026-03-09 08:06:45 +01:00
Petr
26c4e0934d OSS cleanup: remove internal references, harden deployment, add config env interpolation
Phase 1 - Internal reference cleanup:
- Delete dev_docs/meetings/ (internal meeting notes/transcripts)
- Replace hardcoded usernames (padak/matejkys/dasa) with deploy/generic
- Replace "Internal AI Data Analyst" with "AI Data Analyst"
- Replace keboola/internal_ai_data_analyst URLs with your-org/ai-data-analyst
- Replace /tmp/keboola_load/ with /tmp/data_analyst_staging/ in dev_docs

Phase 2 - Deployment hardening:
- Tighten sudoers wildcards to explicit paths (visudo, sudoers cp)
- setup.sh creates all groups (data-ops, dataread, data-private) and deploy user
- webapp-setup.sh copies sudoers-webapp from repo instead of inline definition
- deploy.sh conditional copy for data_description.md (not in git for OSS)
- deploy.sh ownership changed to deploy:data-ops for /data/{scripts,docs,examples}

Phase 3 - Config and misc:
- Add ${ENV_VAR} interpolation to config/loader.py
- Expand config/instance.yaml.example with all sections (admins, deployment, auth, etc.)
- Create config/.env.template for secret values
- Add MIT LICENSE
- Fix .gitignore: add .venv/, docs/data_description.md
- Fix README.md: CSV status Planned, remove metrics/, update license text
- Translate Czech comments in requirements.txt to English
- Fix test_account_service.py: mock username mapping instead of relying on instance config

All 118 tests pass.
2026-03-09 07:59:57 +01:00
Petr
c56905d34f Initial commit: OSS data distribution platform
Open-source AI data analyst platform extracted from internal repo.
Includes data sync engine, Keboola adapter, Flask web portal,
server deployment scripts, and configuration templates.
2026-03-08 23:31:28 +01:00