agnes-the-ai-analyst

Author	SHA1	Message	Date
Petr	67df4acd73	Add --stdin JSON mode to avoid shell escaping nightmare Agent was failing 3x on SSH commands due to backticks (BQ table names) and single quotes (SQL string literals) getting mangled by nested shell interpretation (local -> SSH -> bash -> Python). New --stdin mode reads query spec as JSON from stdin via heredoc: cat <<'QUERY' \| ssh alias 'bash remote_query.sh --stdin' {"register_bq": {"alias": "SELECT ... FROM \`table\` ..."}, "sql": "..."} QUERY Heredoc with <<'QUERY' (quoted) passes everything literally -- no escaping needed for backticks, quotes, or parentheses. Updated claude_md_template.txt to use --stdin as the primary method.	2026-03-21 12:15:50 +01:00
Petr	dce8454894	Add remote_query.sh wrapper, fix analyst SSH permissions Analyst user (foundry_e_psimecek) couldn't access /opt/data-analyst/. Added to data-ops group on server. New scripts/remote_query.sh wrapper handles env setup (PYTHONPATH, CONFIG_DIR, .env) so agents use simple: ssh alias 'bash ~/server/scripts/remote_query.sh --sql "..." --format table' Updated claude_md_template.txt to use wrapper instead of raw commands.	2026-03-21 11:58:04 +01:00
Petr	d180b2014e	Step 28: Remote query architecture for local+remote table JOINs Add src/remote_query.py CLI module enabling the AI agent to run SQL queries spanning local Parquet tables and remote BigQuery tables in a single DuckDB session on the server. Two-phase protocol: BQ sub-queries (--register-bq) fetch filtered/aggregated data, then DuckDB SQL (--sql) joins everything. Safety: COUNT(*) pre-check, memory estimation (2GB cap), row limits (500K per BQ sub-query, 100K final result). Changes: - New src/remote_query.py with CLI, BQ registration, output formatting - Add bq_entity_type field to TableConfig (view vs table routing) - Extract create_local_views() from duckdb_manager.py for reuse - Update claude_md_template.txt with remote query agent instructions - Update example configs with remote_query section and docs - 52 new tests (42 remote_query + 10 bq_entity_type), all passing	2026-03-21 11:39:15 +01:00
Petr	49adbe26ec	Move server venv setup to bootstrap, remove cross-platform pip freeze sync Server venv is created during bootstrap via SSH (same package list, installed natively on Linux). Removes sync_data.sh section that copied pip freeze output across platforms (Windows/macOS freeze is incompatible with Linux).	2026-03-15 00:59:45 +01:00
Petr	508d92771f	Generate setup instructions from bootstrap.yaml (single source of truth) - Rewrite bootstrap.yaml as clean structured YAML with steps, commands, descriptions, conditions, and notes - Add _generate_setup_instructions() in app.py that reads YAML, substitutes placeholders, and produces clipboard-ready plain text - Replace 50-line hardcoded JS string builder with single tojson variable - All setup instructions now editable in one YAML file	2026-03-15 00:37:19 +01:00
Petr	021c453ea6	Auto-create .sync_connection via printf command in bootstrap Replace 'Save this to .sync_connection' prose with actual printf command that Claude/user executes. Fix heredoc indentation in bootstrap.yaml.	2026-03-15 00:05:42 +01:00
Petr	2237334b05	Make CLAUDE.md template generic and instance-aware - Remove all Keboola-specific content (metric categories, MRR/ARR refs, corporate memory, hardcoded server IP) - Add {ssh_alias}, {server_host}, {webapp_url} placeholders - Bootstrap saves .sync_connection file with instance details - sync_data.sh reads .sync_connection to substitute all placeholders - Text instructions in dashboard include .sync_connection step	2026-03-14 23:57:58 +01:00
Petr	140cbb3cee	Make bootstrap.yaml instance-agnostic with configurable SSH alias Add {ssh_alias} and {ssh_key} placeholders so each instance can use its own SSH config name (avoids conflicts when user has multiple instances). Remove Keboola-specific sync_settings and dataset references. Simplify to single download_server_data step (rsync with scp fallback). Handle SSH alias conflicts gracefully.	2026-03-14 20:58:26 +01:00
Petr	26c4e0934d	OSS cleanup: remove internal references, harden deployment, add config env interpolation Phase 1 - Internal reference cleanup: - Delete dev_docs/meetings/ (internal meeting notes/transcripts) - Replace hardcoded usernames (padak/matejkys/dasa) with deploy/generic - Replace "Internal AI Data Analyst" with "AI Data Analyst" - Replace keboola/internal_ai_data_analyst URLs with your-org/ai-data-analyst - Replace /tmp/keboola_load/ with /tmp/data_analyst_staging/ in dev_docs Phase 2 - Deployment hardening: - Tighten sudoers wildcards to explicit paths (visudo, sudoers cp) - setup.sh creates all groups (data-ops, dataread, data-private) and deploy user - webapp-setup.sh copies sudoers-webapp from repo instead of inline definition - deploy.sh conditional copy for data_description.md (not in git for OSS) - deploy.sh ownership changed to deploy:data-ops for /data/{scripts,docs,examples} Phase 3 - Config and misc: - Add ${ENV_VAR} interpolation to config/loader.py - Expand config/instance.yaml.example with all sections (admins, deployment, auth, etc.) - Create config/.env.template for secret values - Add MIT LICENSE - Fix .gitignore: add .venv/, docs/data_description.md - Fix README.md: CSV status Planned, remove metrics/, update license text - Translate Czech comments in requirements.txt to English - Fix test_account_service.py: mock username mapping instead of relying on instance config All 118 tests pass.	2026-03-09 07:59:57 +01:00
Petr	c56905d34f	Initial commit: OSS data distribution platform Open-source AI data analyst platform extracted from internal repo. Includes data sync engine, Keboola adapter, Flask web portal, server deployment scripts, and configuration templates.	2026-03-08 23:31:28 +01:00

10 commits