agnes-the-ai-analyst

Author	SHA1	Message	Date
Petr	7f61ae8772	Update auto-install docs with Data Catalog setup - Split Step 6 into 6a (Generate Parquet) and 6b (Configure Data Catalog) - Document data_description.md + instance.yaml catalog categories - Uncomment data_description.md symlink in Step 3c - Add Data Catalog verification to Step 6 checklist	2026-03-10 22:00:28 +01:00
Petr	302494b632	Add --format parquet using project's ParquetManager Generator now supports --format {csv,parquet,both}. Parquet mode uses src.parquet_manager.ParquetManager for snappy compression, proper column types (DATE, TIMESTAMP, DOUBLE), and metadata. No more ad-hoc pandas conversion needed on the server.	2026-03-10 21:46:20 +01:00
Petr	44bf43535b	Add sample data generator with 9 e-commerce tables Synthetic data generator for demo/testing without real data adapter: - 9 tables: customers, products, campaigns, web_sessions, web_leads, orders, order_items, payments, support_tickets - 4 size presets: xs (1MB), s (15MB), m (150MB), l (1.5GB) - Realistic patterns: seasonality, Pareto customer distribution, segment-based behavior, referential integrity - Deterministic output via --seed parameter Also: docs/sample-data.md, updated auto-install.md with Step 6, updated CLAUDE.md (email auth provider, dual-repo architecture)	2026-03-10 12:31:14 +01:00
Petr	879bc6c44f	docs	2026-03-10 11:43:11 +01:00
Petr	495940d6b8	Rewrite auto-install guide with dual-repo architecture Document the full end-to-end workflow: OSS repo (code) + private instance repo (config/secrets). Covers SSH key isolation per repo, symlink bridging, and ongoing deployment workflow.	2026-03-10 11:38:41 +01:00
Petr	1ac868d787	Improve setup instructions for robustness - Check for existing SSH config entry before overwriting - Use --no-perms --no-group in rsync (fixes macOS permission errors) - Explicit mkdir instead of brace expansion (Claude Code compatibility) - Gracefully handle missing server directories (empty server is OK) - Conditional steps for setup_views.sh and CLAUDE.md template	2026-03-10 11:29:31 +01:00
Petr	fde1d6fc01	Move Claude Code setup to dashboard, remove step 5 from onboarding - Get Started page now has 4 steps (folder, SSH key, pubkey, register) - After account creation, dashboard shows prominent "Set up your local environment" CTA with claude command and Copy Setup Instructions - CTA only visible when user hasn't synced yet (last_sync is empty) - Bottom banner demoted to subtle secondary style for returning users	2026-03-10 11:18:56 +01:00
Petr	45454ab86a	Redesign onboarding: compact single-screen layout with terminal block - Merge steps 1-3 into a single dark terminal block with copy buttons - Inline registration form with single-row layout for step 4 - Compact step 5 with Claude Code command and copy button on one line - Full-width layout (960px) instead of narrow 640px column - Everything fits on one screen without scrolling	2026-03-10 11:10:19 +01:00
Petr	9c4208bb89	Unify onboarding into single-column stepper with inline registration Merge the two-column layout (setup steps + registration form) into one unified flow. Step 4 now contains the registration form inline, creating a natural top-to-bottom progression through the setup process.	2026-03-10 11:06:11 +01:00
Petr	21af1abb6e	Fix setup instructions: add SSH key steps, fix clipboard on HTTP - Add steps 2-4 (SSH key generation, copy pubkey, create account) - Fix clipboard copy using textarea fallback for non-HTTPS contexts - Generate simple plain-text Claude Code prompt instead of full YAML - Show what Claude will do (SSH, rsync, DuckDB, CLAUDE.md)	2026-03-10 11:00:48 +01:00
Petr	f635195c80	Add multi-domain support and full-email username generation - Support comma-separated domains in auth.allowed_domain config - Use full email as system username (user@domain.com -> user_domain_com) to avoid collisions with reserved names and across domains - Update both auth providers (google, email) for multi-domain display - Add tests for username generation and update email auth tests	2026-03-10 10:50:01 +01:00
Petr	a8a9efeb60	Update auto-install docs with steps 3-4 (config + email auth)	2026-03-10 10:43:39 +01:00
Petr	e2ab219171	Add email magic link authentication provider New pluggable auth provider that sends passwordless sign-in links. Works with domain restriction (same as Google OAuth). Falls back to showing the link in browser when SMTP is not configured (dev mode).	2026-03-10 10:39:19 +01:00
Petr	b99ec576ca	Add self-service data onboarding system Table Registry as central source of truth (JSON) with atomic writes, optimistic locking, audit logging, and data_description.md generation. Existing readers (config.py, profiler.py) need zero changes. Phase 1 - Discovery API: - discover_tables() on DataSource ABC + Keboola implementation - admin_required decorator with server-side recomputation - GET /api/admin/discover-tables endpoint Phase 2 - Table Registry: - src/table_registry.py with CRUD, validation, migration from MD - Admin API: register/update/unregister with version locking - DELETE cascade cleans up per-user subscriptions Phase 3 - Auto-Profiling: - profile_changed_tables() for incremental profiling - Non-fatal hook in sync_all() after successful sync Phase 4 - Per-Table Subscriptions: - table_mode (all/explicit) with per-table toggles - GET/POST /api/table-subscriptions endpoints - Subscription status in catalog and dashboard views Phase 5 - Smart Sync: - Python-generated rsync filter files (not shell YAML parsing) - sync_data.sh uses --filter="merge ..." for explicit mode Phase 6 - Admin UI: - /admin/tables with discovery, registration modal, registry mgmt - Vanilla JS, matching existing design system	2026-03-09 14:25:37 +01:00
Petr	7c9007a8f9	Update docs for modular architecture (auth/, services/, scripts/) Add auth providers, standalone services, and service patterns to project structure in README, ARCHITECTURE, and CLAUDE.md. Reflects the completed extraction of auth, telegram bot, ws gateway, corporate memory, and session collector.	2026-03-09 13:11:40 +01:00
Petr	15b513266d	Merge dev_scripts/ into scripts/ Move dev_run.py and test_sync.sh from dev_scripts/ to scripts/, eliminating the separate dev_scripts directory. Update scripts README with development scripts section.	2026-03-09 13:11:36 +01:00
Petr	2d3f127e58	Update paths in docs and sudoers after services/ extraction All references to server/telegram_bot/, server/ws_gateway/, server/corporate_memory/, server/session_collector* updated to their new locations under services/.	2026-03-09 13:02:13 +01:00
Petr	c6a711aa27	Extract pluggable auth provider system into auth/ package Replace hardcoded Google OAuth + password auth registration with auto-discovered auth providers. Each provider in auth/<name>/provider.py implements AuthProvider ABC and is automatically registered at startup. - auth/__init__.py: AuthProvider ABC + discover_providers() scanner - auth/google/: Google OAuth provider (extracted from webapp/auth.py) - auth/password/: Email/password provider (delegates to webapp/password_auth) - auth/desktop/: Desktop JWT auth (API-only, not visible on login page) - webapp/auth.py: stripped to core infra (login_required, /login, /logout) - webapp/app.py: auto-discovery loop replaces manual blueprint registration - login.html: dynamic provider buttons via Jinja loop	2026-03-09 13:02:08 +01:00
Petr	f2d3d156e3	Move standalone services from server/ to services/ Extract 4 self-contained services into services/ module: - server/telegram_bot/ -> services/telegram_bot/ - server/ws_gateway/ -> services/ws_gateway/ - server/corporate_memory/ -> services/corporate_memory/ - server/session_collector.py -> services/session_collector/ Each service now has its own systemd/ directory with .service and .timer files. deploy.sh updated to auto-discover service units from services//systemd/. server/ now contains only deployment infrastructure (deploy.sh, setup scripts, bin/ management tools, sudoers, nginx config). All imports updated: webapp/app.py, server/bin/ scripts, systemd ExecStart paths.	2026-03-09 12:54:30 +01:00
Petr	38b86127ed	Branding cleanup: remove Keboola-specific references from docs and config - server/deploy.sh: KEBOOLA_ENV_FILE -> SYNC_ENV_FILE - server/ws-gateway.service, notify-bot.service: remove Keboola from descriptions - .gitignore: generic comment for data directory - CLAUDE.md, README.md, ARCHITECTURE.md: update paths from src/adapters to connectors/ - docs/DATA_SOURCES.md: update custom connector guide to connectors/ pattern - connectors/jira/README.md: keboola-analyst -> data-analyst in config paths - dev_docs/desktop-app.md: KeboolaAnalyst -> DataAnalyst branding	2026-03-09 12:22:27 +01:00
Petr	266e8573d3	Extract Keboola into connectors/keboola module Move all Keboola-specific code out of src/ into connectors/keboola/: - git mv src/keboola_client.py -> connectors/keboola/client.py - Extract LocalKeboolaSource (855 lines) from data_sync.py -> connectors/keboola/adapter.py - Rename to KeboolaDataSource with full env var validation - Extend DataSource ABC with get_column_metadata() and get_source_name() - Add dynamic connector registry via importlib in create_data_source() - Refactor _generate_schema_yaml to use ABC methods (source_type, _schema_version: 2) - Remove src/adapters/ (redundant facade layer) - Remove Keboola validation from src/config.py (connector validates itself) - Add 14 tests for factory, ABC defaults, env validation, dynamic lookup	2026-03-09 12:22:16 +01:00
Petr	e3b741210e	Merge: extract Jira into connectors/jira module	2026-03-09 11:18:03 +01:00
Petr	86edd27655	Extract Jira into connectors/jira module Move all Jira-specific code into a self-contained connector module: - 22 files moved via git mv (transform, service, webhook, scripts, systemd units, tests, docs, bin helper) - All imports updated to use connectors.jira.* paths - Jira is now conditional: auto-detected via JIRA_DOMAIN env var - Webapp registers Jira blueprint only when available - Health service monitors Jira timers only when enabled - Profiler loads Jira tables dynamically from filesystem - Sync settings uses config-driven dependency validation - Renamed keboola_platform_url -> custom_url in transform - Updated deploy.sh, sudoers-deploy, backfill_gap.sh paths - Fixed pytest.ini to skip live tests by default	2026-03-09 11:17:50 +01:00
Petr	d8226c6641	Restructure docs for OSS readability Remove redundant docs (GETTING_STARTED, README index, jira_schema), add ARCHITECTURE.md and llms.txt for AI-era discoverability, move notifications.md to docs/future/NOTIFICATIONS.md.	2026-03-09 10:42:45 +01:00
Petr	1471b8addf	Add agent-generated artifacts to .gitignore	2026-03-09 08:33:31 +01:00
Petr	485ac0a742	Security fixes: sanitize dev_docs, harden sudoers and config validation H1 - Sanitize dev_docs/ for public release: - Replace all real employee names with generic placeholders (padak->admin1, matejkys->admin2, dasa->admin3, petr->john, etc.) - Replace GCP project ID (kids-ai-data-analysis -> your-gcp-project) - Replace server hostname (data-broker-for-claude -> your-server) - Replace real IP address (34.88.8.46 -> YOUR_SERVER_IP) - Replace internal FQDN with placeholder - Covers: security.md, server.md, disaster-recovery.md, desktop-app.md, session_explore.md, plan-rsync-fix.md, draft/*.md H3 - webapp-setup.sh: validate sudoers syntax BEFORE copying to /etc/sudoers.d - Prevents broken sudo if syntax is invalid - Uses install -m 440 for atomic copy with correct permissions M1 - setup.sh: deploy user created with /usr/sbin/nologin instead of /bin/bash - CI/CD service account does not need interactive shell M2 - config/loader.py: warn on missing env vars, validate webapp_secret_key - _resolve_env_refs now logs warnings for unset ${ENV_VAR} references - _validate_config checks auth.webapp_secret_key is non-empty - Prevents Flask signing sessions with empty secret key All 118 tests pass.	2026-03-09 08:06:45 +01:00
Petr	26c4e0934d	OSS cleanup: remove internal references, harden deployment, add config env interpolation Phase 1 - Internal reference cleanup: - Delete dev_docs/meetings/ (internal meeting notes/transcripts) - Replace hardcoded usernames (padak/matejkys/dasa) with deploy/generic - Replace "Internal AI Data Analyst" with "AI Data Analyst" - Replace keboola/internal_ai_data_analyst URLs with your-org/ai-data-analyst - Replace /tmp/keboola_load/ with /tmp/data_analyst_staging/ in dev_docs Phase 2 - Deployment hardening: - Tighten sudoers wildcards to explicit paths (visudo, sudoers cp) - setup.sh creates all groups (data-ops, dataread, data-private) and deploy user - webapp-setup.sh copies sudoers-webapp from repo instead of inline definition - deploy.sh conditional copy for data_description.md (not in git for OSS) - deploy.sh ownership changed to deploy:data-ops for /data/{scripts,docs,examples} Phase 3 - Config and misc: - Add ${ENV_VAR} interpolation to config/loader.py - Expand config/instance.yaml.example with all sections (admins, deployment, auth, etc.) - Create config/.env.template for secret values - Add MIT LICENSE - Fix .gitignore: add .venv/, docs/data_description.md - Fix README.md: CSV status Planned, remove metrics/, update license text - Translate Czech comments in requirements.txt to English - Fix test_account_service.py: mock username mapping instead of relying on instance config All 118 tests pass.	2026-03-09 07:59:57 +01:00
Petr	c56905d34f	Initial commit: OSS data distribution platform Open-source AI data analyst platform extracted from internal repo. Includes data sync engine, Keboola adapter, Flask web portal, server deployment scripts, and configuration templates.	2026-03-08 23:31:28 +01:00

28 commits
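
Note on commit f635195c80: its message gives one mapping for full-email username generation, user@domain.com -> user_domain_com. The sketch below is one way such a mapping could be written; it is not taken from the repository. The function name email_to_username is hypothetical, and lowercasing the input and collapsing runs of non-alphanumeric characters into a single underscore are assumptions. Only the example mapping comes from the log.

```python
import re


def email_to_username(email: str) -> str:
    """Turn a full email address into a system-username-style string.

    Hypothetical helper: the real project may normalize differently.
    """
    # Assumption: usernames are lowercase and ignore surrounding whitespace.
    normalized = email.strip().lower()
    # Replace every run of characters outside [a-z0-9] with one underscore,
    # so both "@" and "." become separators.
    username = re.sub(r"[^a-z0-9]+", "_", normalized)
    # Drop any leading or trailing underscore left by the replacement.
    return username.strip("_")


if __name__ == "__main__":
    print(email_to_username("user@domain.com"))  # user_domain_com
```

Keeping the domain in the username means the same local part on two allowed domains yields two different usernames, which matches the collision-avoidance goal stated in the commit message.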