Session testing revealed 3 issues with remote queries:
1. CLAUDE.md template recommended `cat <<HEREDOC | ssh ...` but
claude_settings.json had `cat` in deny list, causing 2-3 failed
attempts per query. Replaced with file-based approach: Write tool
creates JSON file, then `ssh ... < file` avoids the cat deny.
2. ssh/scp commands were not in the allow list, requiring manual
approval for every remote query. Added both to allow list.
3. DuckDB fetch_arrow_table() emitted DeprecationWarning on every
parquet export. Replaced with .arrow().read_all().
Also added instruction for proactive hybrid analysis when remote
tables are available (agent was only using local data until asked).
Agent was failing 3x on SSH commands due to backticks (BQ table names)
and single quotes (SQL string literals) getting mangled by nested shell
interpretation (local -> SSH -> bash -> Python).
New --stdin mode reads query spec as JSON from stdin via heredoc:
cat <<'QUERY' | ssh alias 'bash remote_query.sh --stdin'
{"register_bq": {"alias": "SELECT ... FROM \`table\` ..."}, "sql": "..."}
QUERY
Heredoc with <<'QUERY' (quoted) passes everything literally -- no
escaping needed for backticks, quotes, or parentheses.
Updated claude_md_template.txt to use --stdin as the primary method.
Analyst user (foundry_e_psimecek) couldn't access /opt/data-analyst/.
Added to data-ops group on server.
New scripts/remote_query.sh wrapper handles env setup (PYTHONPATH,
CONFIG_DIR, .env) so agents use simple:
ssh alias 'bash ~/server/scripts/remote_query.sh --sql "..." --format table'
Updated claude_md_template.txt to use wrapper instead of raw commands.
Add src/remote_query.py CLI module enabling the AI agent to run SQL
queries spanning local Parquet tables and remote BigQuery tables in a
single DuckDB session on the server. Two-phase protocol: BQ sub-queries
(--register-bq) fetch filtered/aggregated data, then DuckDB SQL (--sql)
joins everything.
Safety: COUNT(*) pre-check, memory estimation (2GB cap), row limits
(500K per BQ sub-query, 100K final result).
Changes:
- New src/remote_query.py with CLI, BQ registration, output formatting
- Add bq_entity_type field to TableConfig (view vs table routing)
- Extract create_local_views() from duckdb_manager.py for reuse
- Update claude_md_template.txt with remote query agent instructions
- Update example configs with remote_query section and docs
- 52 new tests (42 remote_query + 10 bq_entity_type), all passing
Open-source AI data analyst platform extracted from internal repo.
Includes data sync engine, Keboola adapter, Flask web portal,
server deployment scripts, and configuration templates.