agnes-the-ai-analyst/connectors
ZdenekSrotyr 569cd90d75
fix(security): #81 Group D — extractor-side identifier validation (squashed) (#97)
Closes M15 from issue #81 — SQL injection via attacker-controlled
identifiers in connectors/keboola/extractor.py and
connectors/bigquery/extractor.py.

Lifted _validate_identifier from src/orchestrator.py into a new
src/identifier_validation.py shared module (single source of truth for
both layers). Two validator policies:

- validate_identifier (strict, ^[a-zA-Z_][a-zA-Z0-9_]{0,63}$) for
  table_name — matches the orchestrator's rebuild-time check, so dashed
  names fail fast at extraction rather than being silently dropped.
- validate_quoted_identifier (relaxed, accepts dashes/dots) for
  bucket/dataset/source_table. Keboola in.c-foo and BigQuery
  my-dataset are legitimate names; they only need to be safe
  inside `"..."`.
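
A minimal sketch of the two policies, for illustration only. The strict
pattern is the one quoted above. The relaxed pattern, the length cap of
128, and the bool return type are assumptions; the real
src/identifier_validation.py may differ (for example, it may raise
instead of returning False).

```python
import re

# Strict policy: same character class and length as the pattern quoted in
# the commit message. fullmatch() makes the ^...$ anchors unnecessary.
_STRICT_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]{0,63}")

# Relaxed policy: ASSUMED pattern. The commit only states that dashes and
# dots are accepted; the allowed first character and the 128-character cap
# are illustrative choices, not taken from the source.
_QUOTED_RE = re.compile(r"[a-zA-Z0-9_][a-zA-Z0-9_.\-]{0,127}")


def validate_identifier(name) -> bool:
    """Return True if `name` is safe to use as a bare table_name."""
    return isinstance(name, str) and _STRICT_RE.fullmatch(name) is not None


def validate_quoted_identifier(name) -> bool:
    """Return True if `name` is safe to embed inside double quotes."""
    return isinstance(name, str) and _QUOTED_RE.fullmatch(name) is not None
```

Under these assumptions, `in.c-foo` and `my-dataset` pass the relaxed
check but fail the strict one, and any name containing a double quote,
whitespace, or a semicolon fails both.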

Both extractors skip-and-continue on unsafe rows (logged + counted in
failure stats); _extract_via_extension re-validates as defense-in-depth.
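
The skip-and-continue behaviour could look roughly like the sketch
below. Everything here is hypothetical: the function name
`plan_extraction`, the config keys, the stats keys, and the generated
SQL are illustrative stand-ins, not the extractors' actual code.

```python
import logging
import re

logger = logging.getLogger(__name__)

_STRICT_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]{0,63}")
# ASSUMED relaxed pattern: only "accepts dashes/dots" comes from the commit.
_QUOTED_RE = re.compile(r"[a-zA-Z0-9_][a-zA-Z0-9_.\-]{0,127}")


def _matches(pattern, value) -> bool:
    return isinstance(value, str) and pattern.fullmatch(value) is not None


def plan_extraction(table_configs):
    """Build one statement per safe row; skip and count unsafe rows."""
    stats = {"extracted": 0, "failed": 0}
    statements = []
    for cfg in table_configs:
        table_name = cfg.get("table_name")
        bucket = cfg.get("bucket")
        source_table = cfg.get("source_table")
        safe = (
            _matches(_STRICT_RE, table_name)
            and _matches(_QUOTED_RE, bucket)
            and _matches(_QUOTED_RE, source_table)
        )
        if not safe:
            # Log, count the failure, and move on to the next row.
            logger.warning("Skipping row with unsafe identifier: %r", cfg)
            stats["failed"] += 1
            continue
        # Identifiers are interpolated only after validation has passed.
        statements.append(
            f'CREATE OR REPLACE VIEW "{table_name}" AS '
            f'SELECT * FROM "{bucket}"."{source_table}"'
        )
        stats["extracted"] += 1
    return statements, stats
```

One unsafe row therefore costs only that row: the remaining tables are
still processed, and the failure shows up in the returned stats.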

71/71 extractor + orchestrator tests pass.
Refs #81 Group D.
2026-04-27 21:46:17 +02:00
bigquery fix(security): #81 Group D — extractor-side identifier validation (squashed) (#97) 2026-04-27 21:46:17 +02:00
jira fix(security): close Jira webhook fail-open + path traversal (#83) (#93) 2026-04-27 19:53:55 +02:00
keboola fix(security): #81 Group D — extractor-side identifier validation (squashed) (#97) 2026-04-27 21:46:17 +02:00
llm Suppress httpcore debug logging in LLM connector 2026-03-23 12:57:35 +01:00
openmetadata chore(oss): isolate customer-specific deploy bits from scripts/grpn/ (#88, wave 1) (#94) 2026-04-27 20:24:34 +02:00
__init__.py Extract Jira into connectors/jira module 2026-03-09 11:17:50 +01:00