agnes-the-ai-analyst

Author	SHA1	Message	Date
Petr	95358448e6	Add modular LLM connector for Corporate Memory Replace hardwired Anthropic API calls with a pluggable provider system. Each deployment configures its AI provider in instance.yaml — switching between Anthropic, LiteLLM, OpenRouter, or any OpenAI-compatible proxy is a config change, not a code change. New connectors/llm/ module: - StructuredExtractor Protocol with extract_json() interface - AnthropicExtractor: direct Anthropic SDK with retry + backoff - OpenAICompatExtractor: any OpenAI-compatible proxy with three-layer structured output fallback (json_schema -> json_object -> prompt) - Configurable structured_output policy (strict/json/auto) - Custom exception hierarchy (auth/rate_limit/timeout/format/refusal) - Zero secrets in logs: no API keys, prompts, or responses logged Reviewed by: Google Gemini, Claude Sonnet, OpenAI GPT-5.4. Security audit passed with all critical findings resolved.	2026-03-23 12:08:33 +01:00
Petr	fb63a72a98	Add data product discovery, fix remove-analyst script - client.py: add search_by_data_product() for OpenMetadata search API - catalog_export.py: prefer data product discovery over tag filtering (finds all 16 metrics in FoundryAIDataModel vs 3 with tag filter) - remove-analyst: fix GROUPS bash variable collision, improve messaging	2026-03-18 12:52:41 +01:00
Petr	440662c8fe	Fix remove-analyst silent failure caused by set -e + pipefail The script was exiting silently on the GROUPS=$(groups ... | cut ...) line — set -eo pipefail caused bash to terminate the script before any echo output, making it appear to do nothing. Replace set -euo pipefail with set -u and explicit error handling. Admin scripts must always report what happened, never exit silently. Also: use id -nG instead of groups|cut pipe, add verification step after userdel, and log each operation for visibility.	2026-03-15 14:17:39 +01:00
Petr	2181d490e9	Fix systemd NAMESPACE failures caused by missing ReadWritePaths dirs data-refresh.service: use /tmp instead of /tmp/data_analyst_staging in ReadWritePaths — the subdirectory may not exist at service start, causing mount namespace setup to fail before any Exec* directive runs. deploy.sh: fix typo services/corporate-memory -> services/corporate_memory so the mkdir conditional actually matches the repo directory name. deploy.sh: add ReadWritePaths validation loop that auto-creates any missing directories listed in installed .service files before daemon-reload. This acts as a safety net against future NAMESPACE failures from new services.	2026-03-15 11:40:11 +01:00
Petr	80c5b902e0	Add scheduled data sync and catalog refresh with systemd timers - New sync_schedule and profile_after_sync fields in TableConfig (formats: "every 15m", "every 1h", "daily 05:00") - New src/scheduler.py with schedule evaluation logic (is_table_due) - New --scheduled mode in data_sync.py: only syncs tables that are due, respects profile_after_sync flag, auto-restarts webapp after profiling - Systemd timer+service for data-refresh (every 15 min) - Systemd timer+service for catalog-refresh (every 15 min) - deploy.sh enables new timers automatically - Complete table config reference in data_description.md.example - 58 new scheduler tests	2026-03-15 02:16:31 +01:00
Petr	4206b06d92	Make deploy.sh data-source agnostic with --scripts-only flag - Add --scripts-only flag for quick script/docs deployment without restart - Replace hardcoded Keboola env vars with generic loop over all known vars (supports Keboola, BigQuery, OpenMetadata, and optional services) - Make data directories conditional (Jira, notifications, corporate memory created only when relevant code/config exists) - Enable timers only when their .timer files exist on disk - Use root:data-ops ownership (works without deploy user)	2026-03-14 20:38:43 +01:00
Petr	2d3f127e58	Update paths in docs and sudoers after services/ extraction All references to server/telegram_bot/, server/ws_gateway/, server/corporate_memory/, server/session_collector* updated to their new locations under services/.	2026-03-09 13:02:13 +01:00
Petr	f2d3d156e3	Move standalone services from server/ to services/ Extract 4 self-contained services into services/ module: - server/telegram_bot/ -> services/telegram_bot/ - server/ws_gateway/ -> services/ws_gateway/ - server/corporate_memory/ -> services/corporate_memory/ - server/session_collector.py -> services/session_collector/ Each service now has its own systemd/ directory with .service and .timer files. deploy.sh updated to auto-discover service units from services//systemd/. server/ now contains only deployment infrastructure (deploy.sh, setup scripts, bin/ management tools, sudoers, nginx config). All imports updated: webapp/app.py, server/bin/ scripts, systemd ExecStart paths.	2026-03-09 12:54:30 +01:00
Petr	38b86127ed	Branding cleanup: remove Keboola-specific references from docs and config - server/deploy.sh: KEBOOLA_ENV_FILE -> SYNC_ENV_FILE - server/ws-gateway.service, notify-bot.service: remove Keboola from descriptions - .gitignore: generic comment for data directory - CLAUDE.md, README.md, ARCHITECTURE.md: update paths from src/adapters to connectors/ - docs/DATA_SOURCES.md: update custom connector guide to connectors/ pattern - connectors/jira/README.md: keboola-analyst -> data-analyst in config paths - dev_docs/desktop-app.md: KeboolaAnalyst -> DataAnalyst branding	2026-03-09 12:22:27 +01:00
Petr	86edd27655	Extract Jira into connectors/jira module Move all Jira-specific code into a self-contained connector module: - 22 files moved via git mv (transform, service, webhook, scripts, systemd units, tests, docs, bin helper) - All imports updated to use connectors.jira.* paths - Jira is now conditional: auto-detected via JIRA_DOMAIN env var - Webapp registers Jira blueprint only when available - Health service monitors Jira timers only when enabled - Profiler loads Jira tables dynamically from filesystem - Sync settings uses config-driven dependency validation - Renamed keboola_platform_url -> custom_url in transform - Updated deploy.sh, sudoers-deploy, backfill_gap.sh paths - Fixed pytest.ini to skip live tests by default	2026-03-09 11:17:50 +01:00
Petr	485ac0a742	Security fixes: sanitize dev_docs, harden sudoers and config validation H1 - Sanitize dev_docs/ for public release: - Replace all real employee names with generic placeholders (padak->admin1, matejkys->admin2, dasa->admin3, petr->john, etc.) - Replace GCP project ID (kids-ai-data-analysis -> your-gcp-project) - Replace server hostname (data-broker-for-claude -> your-server) - Replace real IP address (34.88.8.46 -> YOUR_SERVER_IP) - Replace internal FQDN with placeholder - Covers: security.md, server.md, disaster-recovery.md, desktop-app.md, session_explore.md, plan-rsync-fix.md, draft/*.md H3 - webapp-setup.sh: validate sudoers syntax BEFORE copying to /etc/sudoers.d - Prevents broken sudo if syntax is invalid - Uses install -m 440 for atomic copy with correct permissions M1 - setup.sh: deploy user created with /usr/sbin/nologin instead of /bin/bash - CI/CD service account does not need interactive shell M2 - config/loader.py: warn on missing env vars, validate webapp_secret_key - _resolve_env_refs now logs warnings for unset ${ENV_VAR} references - _validate_config checks auth.webapp_secret_key is non-empty - Prevents Flask signing sessions with empty secret key All 118 tests pass.	2026-03-09 08:06:45 +01:00
Petr	26c4e0934d	OSS cleanup: remove internal references, harden deployment, add config env interpolation Phase 1 - Internal reference cleanup: - Delete dev_docs/meetings/ (internal meeting notes/transcripts) - Replace hardcoded usernames (padak/matejkys/dasa) with deploy/generic - Replace "Internal AI Data Analyst" with "AI Data Analyst" - Replace keboola/internal_ai_data_analyst URLs with your-org/ai-data-analyst - Replace /tmp/keboola_load/ with /tmp/data_analyst_staging/ in dev_docs Phase 2 - Deployment hardening: - Tighten sudoers wildcards to explicit paths (visudo, sudoers cp) - setup.sh creates all groups (data-ops, dataread, data-private) and deploy user - webapp-setup.sh copies sudoers-webapp from repo instead of inline definition - deploy.sh conditional copy for data_description.md (not in git for OSS) - deploy.sh ownership changed to deploy:data-ops for /data/{scripts,docs,examples} Phase 3 - Config and misc: - Add ${ENV_VAR} interpolation to config/loader.py - Expand config/instance.yaml.example with all sections (admins, deployment, auth, etc.) - Create config/.env.template for secret values - Add MIT LICENSE - Fix .gitignore: add .venv/, docs/data_description.md - Fix README.md: CSV status Planned, remove metrics/, update license text - Translate Czech comments in requirements.txt to English - Fix test_account_service.py: mock username mapping instead of relying on instance config All 118 tests pass.	2026-03-09 07:59:57 +01:00
Petr	c56905d34f	Initial commit: OSS data distribution platform Open-source AI data analyst platform extracted from internal repo. Includes data sync engine, Keboola adapter, Flask web portal, server deployment scripts, and configuration templates.	2026-03-08 23:31:28 +01:00
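
Commit 80c5b902e0 above names three sync_schedule formats ("every 15m", "every 1h", "daily 05:00") and an is_table_due check in src/scheduler.py. The log does not show that function's signature or rules, so the following is only a minimal sketch of how such an evaluation could work. The parameters (schedule, last_sync, now), the "never synced means due" rule, and the "daily runs once after the target time" rule are all assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Patterns for the three formats quoted in the commit message.
_EVERY = re.compile(r"^every (\d+)([mh])$")
_DAILY = re.compile(r"^daily (\d{2}):(\d{2})$")


def is_table_due(schedule: str, last_sync: Optional[datetime], now: datetime) -> bool:
    """Hypothetical signature: decide whether a table should sync at `now`."""
    if last_sync is None:
        # Assumption: a table that has never been synced is always due.
        return True

    match = _EVERY.match(schedule)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        interval = timedelta(minutes=amount) if unit == "m" else timedelta(hours=amount)
        return now - last_sync >= interval

    match = _DAILY.match(schedule)
    if match:
        hour, minute = int(match.group(1)), int(match.group(2))
        if hour > 23 or minute > 59:
            raise ValueError(f"invalid time in schedule: {schedule!r}")
        target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
        # Assumption: due once today's target time has passed and the last
        # sync happened before that target.
        return now >= target and last_sync < target

    raise ValueError(f"unsupported schedule: {schedule!r}")
```

With a timer firing every 15 minutes, a caller would pass the current time and each table's last sync time, and sync only the tables for which the function returns True.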

13 commits
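
Commits 26c4e0934d and 485ac0a742 describe ${ENV_VAR} interpolation in config/loader.py, a warning for unset variables, and a check that auth.webapp_secret_key is non-empty. The actual code is not part of this log, so the sketch below only illustrates the idea. The function names resolve_env_refs and validate_config, the env parameter, and the choice to substitute an empty string for an unset variable are assumptions, not the repository's implementation.

```python
import logging
import os
import re
from typing import Any, Mapping

logger = logging.getLogger("config_loader_sketch")  # hypothetical logger name

# Matches ${NAME} where NAME is a conventional environment variable name.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env_refs(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Recursively replace ${ENV_VAR} references in a parsed config tree."""
    if isinstance(value, dict):
        return {key: resolve_env_refs(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env_refs(item, env) for item in value]
    if isinstance(value, str):
        def substitute(match: "re.Match[str]") -> str:
            name = match.group(1)
            if name not in env:
                # Log only the variable name, never a value.
                logger.warning("Environment variable %s is not set", name)
                return ""  # assumption: unset references resolve to empty
            return env[name]

        return _ENV_REF.sub(substitute, value)
    return value  # numbers, booleans, None pass through unchanged


def validate_config(config: Mapping[str, Any]) -> None:
    """Reject a config whose auth.webapp_secret_key is missing or empty."""
    auth = config.get("auth") or {}
    if not auth.get("webapp_secret_key"):
        raise ValueError("auth.webapp_secret_key must be non-empty")
```

Running validation after interpolation means an unset secret variable surfaces twice: once as a warning naming the variable, and once as a hard failure before the web app can start with an empty signing key.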