agnes-the-ai-analyst/connectors/keboola
ZdenekSrotyr 569cd90d75
fix(security): #81 Group D — extractor-side identifier validation (squashed) (#97)
Closes M15 from issue #81 — SQL injection via attacker-controlled
identifiers in connectors/keboola/extractor.py and
connectors/bigquery/extractor.py.

Lifted _validate_identifier from src/orchestrator.py into a new
src/identifier_validation.py shared module (single source of truth for
both layers). Two validator policies:

- validate_identifier (strict, ^[a-zA-Z_][a-zA-Z0-9_]{0,63}$) for
  table_name — matches the orchestrator's rebuild-time check, so dashed
  names fail fast at extraction rather than being silently dropped.
- validate_quoted_identifier (relaxed, accepts dashes/dots) for
  bucket/dataset/source_table — Keboola in.c-foo and BigQuery
  my-dataset are legitimate, just need to be safe inside `"..."`.

Both extractors skip-and-continue on unsafe rows (logged + counted in
failure stats); _extract_via_extension re-validates as defense-in-depth.

71/71 extractor + orchestrator tests pass.
Refs #81 Group D.
2026-04-27 21:46:17 +02:00
..
tests refactor: delete old sync pipeline — 9,500 lines removed 2026-03-31 07:50:37 +02:00
__init__.py Extract Keboola into connectors/keboola module 2026-03-09 12:22:16 +01:00
client.py fix: keboola client metadata_cache_path uses DATA_DIR instead of deleted config 2026-03-31 11:57:57 +02:00
extractor.py fix(security): #81 Group D — extractor-side identifier validation (squashed) (#97) 2026-04-27 21:46:17 +02:00