Agent was failing 3x on SSH commands due to backticks (BQ table names)
and single quotes (SQL string literals) getting mangled by nested shell
interpretation (local -> SSH -> bash -> Python).
New --stdin mode reads query spec as JSON from stdin via heredoc:
cat <<'QUERY' | ssh alias 'bash remote_query.sh --stdin'
{"register_bq": {"alias": "SELECT ... FROM \`table\` ..."}, "sql": "..."}
QUERY
Heredoc with <<'QUERY' (quoted) passes everything literally -- no
escaping needed for backticks, quotes, or parentheses.
Updated claude_md_template.txt to use --stdin as the primary method.
Analyst user (foundry_e_psimecek) couldn't access /opt/data-analyst/.
Added to data-ops group on server.
New scripts/remote_query.sh wrapper handles env setup (PYTHONPATH,
CONFIG_DIR, .env) so agents use simple:
ssh alias 'bash ~/server/scripts/remote_query.sh --sql "..." --format table'
Updated claude_md_template.txt to use wrapper instead of raw commands.
Add src/remote_query.py CLI module enabling the AI agent to run SQL
queries spanning local Parquet tables and remote BigQuery tables in a
single DuckDB session on the server. Two-phase protocol: BQ sub-queries
(--register-bq) fetch filtered/aggregated data, then DuckDB SQL (--sql)
joins everything.
Safety: COUNT(*) pre-check, memory estimation (2GB cap), row limits
(500K per BQ sub-query, 100K final result).
Changes:
- New src/remote_query.py with CLI, BQ registration, output formatting
- Add bq_entity_type field to TableConfig (view vs table routing)
- Extract create_local_views() from duckdb_manager.py for reuse
- Update claude_md_template.txt with remote query agent instructions
- Update example configs with remote_query section and docs
- 52 new tests (42 remote_query + 10 bq_entity_type), all passing
Server venv is created during bootstrap via SSH (same package list,
installed natively on Linux). Removes sync_data.sh section that copied
pip freeze output across platforms (Windows/macOS freeze is incompatible
with Linux).
- Rewrite bootstrap.yaml as clean structured YAML with steps, commands,
descriptions, conditions, and notes
- Add _generate_setup_instructions() in app.py that reads YAML, substitutes
placeholders, and produces clipboard-ready plain text
- Replace 50-line hardcoded JS string builder with single tojson variable
- All setup instructions now editable in one YAML file
Add {ssh_alias} and {ssh_key} placeholders so each instance can use
its own SSH config name (avoids conflicts when user has multiple instances).
Remove Keboola-specific sync_settings and dataset references.
Simplify to single download_server_data step (rsync with scp fallback).
Handle SSH alias conflicts gracefully.
- Add get_metric_by_fqn() to OpenMetadataClient
- Add get_metrics() to CatalogEnricher with TTL caching
- Implement _parse_om_metric() to extract category/grain from OpenMetadata tags
- Implement _load_metrics_from_catalog() to fetch and categorize metrics
- Implement _build_om_metric_detail() to convert OpenMetadata format to MetricParser JSON
- Add /api/catalog/metrics/<fqn> endpoint for metric detail modal
- Update _load_metrics_data() to prefer catalog over YAML fallback
- Update metric_modal.js to route catalog:{fqn} to catalog API endpoint
- Delete 10 demo YAML files from docs/metrics/
- Replace metric tests with new unit tests for catalog parsing functions (19 tests)
Catalog metrics provide single source of truth vs maintaining demo YAML files.
UI remains unchanged - only data source changes from YAML to OpenMetadata catalog.
Extend theming from 3 CSS variables (primary colors only) to 14
configurable properties covering colors, fonts, borders, and shape.
All values are optional with sensible defaults.
- New _theme.html include replaces duplicated inline injection
- Wire theme include into all 7 templates (base, login, dashboard,
catalog, admin_tables, activity_center, corporate_memory)
- Conditional font loading: skip default Inter when custom font_url set
- Config.theme_overrides() classmethod generates CSS variable dict
- Visual theme-reference.html guide for instance configurators
- Document all theme keys in instance.yaml.example
These Keboola-specific data source cards don't belong in the OSS repo.
The catalog now shows only dynamic content: Core Business Data (from
data_description.md) and Business Metrics (from docs/metrics/*.yml).
Also update auto-install.md with Business Metrics documentation,
pipeline diagram, and expanded checklist.
Generator now supports --format {csv,parquet,both}. Parquet mode
uses src.parquet_manager.ParquetManager for snappy compression,
proper column types (DATE, TIMESTAMP, DOUBLE), and metadata.
No more ad-hoc pandas conversion needed on the server.
Extract 4 self-contained services into services/ module:
- server/telegram_bot/ -> services/telegram_bot/
- server/ws_gateway/ -> services/ws_gateway/
- server/corporate_memory/ -> services/corporate_memory/
- server/session_collector.py -> services/session_collector/
Each service now has its own systemd/ directory with .service and .timer files.
deploy.sh updated to auto-discover service units from services/*/systemd/*.
server/ now contains only deployment infrastructure (deploy.sh, setup scripts,
bin/ management tools, sudoers, nginx config).
All imports updated: webapp/app.py, server/bin/ scripts, systemd ExecStart paths.
Phase 1 - Internal reference cleanup:
- Delete dev_docs/meetings/ (internal meeting notes/transcripts)
- Replace hardcoded usernames (padak/matejkys/dasa) with deploy/generic
- Replace "Internal AI Data Analyst" with "AI Data Analyst"
- Replace keboola/internal_ai_data_analyst URLs with your-org/ai-data-analyst
- Replace /tmp/keboola_load/ with /tmp/data_analyst_staging/ in dev_docs
Phase 2 - Deployment hardening:
- Tighten sudoers wildcards to explicit paths (visudo, sudoers cp)
- setup.sh creates all groups (data-ops, dataread, data-private) and deploy user
- webapp-setup.sh copies sudoers-webapp from repo instead of inline definition
- deploy.sh conditional copy for data_description.md (not in git for OSS)
- deploy.sh ownership changed to deploy:data-ops for /data/{scripts,docs,examples}
Phase 3 - Config and misc:
- Add ${ENV_VAR} interpolation to config/loader.py
- Expand config/instance.yaml.example with all sections (admins, deployment, auth, etc.)
- Create config/.env.template for secret values
- Add MIT LICENSE
- Fix .gitignore: add .venv/, docs/data_description.md
- Fix README.md: CSV status Planned, remove metrics/, update license text
- Translate Czech comments in requirements.txt to English
- Fix test_account_service.py: mock username mapping instead of relying on instance config
All 118 tests pass.
Open-source AI data analyst platform extracted from internal repo.
Includes data sync engine, Keboola adapter, Flask web portal,
server deployment scripts, and configuration templates.