ZdenekSrotyr 55d11920ef docs: add comprehensive test strategy spec (6 parallel blocks, 4 layers)

Covers gap analysis, 8 critical E2E journeys, shared test infrastructure,
Docker E2E and live test design for full project coverage.

2026-04-12 10:33:26 +02:00


Comprehensive Test Strategy — AI Data Analyst

Date: 2026-04-12

Approach: Hybrid (gap analysis + critical journeys + parallel sub-agents)

Goal: Full test coverage across unit, integration, Docker E2E, and live layers: repeatable, parallelizable, non-blocking to development.

1. Test Taxonomy

| Layer | Marker | Runs in CI | What it tests | Isolation |
| --- | --- | --- | --- | --- |
| Unit | (none) | Every PR | Isolated functions, business logic, parsers, validators | `tmp_path`, mocks |
| Integration | `@pytest.mark.integration` | Every PR | FastAPI TestClient, repository+DuckDB, CLI with mock server | `tmp_path`, `seeded_app` fixture |
| Docker E2E | `@pytest.mark.docker` | Nightly | Full docker-compose stack, HTTP from outside | docker compose up/down |
| Live | `@pytest.mark.live` | Manual/weekly | Real Keboola, BigQuery, Jira credentials | Read-only against real sources |

CI matrix

# PR check (fast, <3 min)
pytest tests/ -x --timeout=60 -n auto  # unit + integration, parallel

# Nightly (docker, ~10 min)
pytest tests/ -m docker --timeout=120

# Weekly/manual (live, ~5 min)
pytest tests/ -m live --timeout=300

Repeatability guarantees

Every test uses tmp_path + monkeypatch — no global state leakage
Faker factories use deterministic seeds — same data on every run
Docker tests are idempotent — compose up → test → compose down, clean start
Live tests are read-only — they never mutate real data sources
CI uses pinned dependencies — no version drift between runs

2. Gap Analysis — Current vs. Target

| Module | Current tests | Gap | Priority |
| --- | --- | --- | --- |
| WebSocket gateway | 0 | Auth, connection mgmt, heartbeat, multi-client | High |
| Corporate memory service | ~0 | Collector, hash detection, LLM mock, API CRUD+voting | High |
| Telegram bot | 1 integration | Storage, sender, dispatch, verify/unlink flow | Medium |
| Upload API | 0 | Upload limits, directory traversal protection, session/artifact upload | High |
| Scripts API | 0 | Deploy, run, undeploy, ad-hoc execution | High |
| Settings API | 0 | Get/update settings | Medium |
| Memory API | 0 | CRUD, voting, admin approve/reject/mandate | High |
| Access requests API | 0 | Request→approve→verify flow, deny flow | High |
| Permissions API | unit ok, API weak | Grant→query→revoke integration flow | Medium |
| Metadata API | weak | Get/save/push metadata | Medium |
| Admin configure API | weak | Configure flow, credential validation | High |
| Admin discover-and-register | weak | Discovery + registration in one call | Medium |
| CLI commands | 27 for ~15 cmds | Per-command coverage, error handling, output formats | High |
| Web UI routes | 11 | Auth redirects, dashboard render, setup wizard | Medium |
| Jira service | 2 | Incremental transform, webhook→rebuild pipeline | High |
| Scheduler edge cases | few | All parse_interval formats, is_table_due edge cases | Medium |

3. Critical E2E Journeys

Eight user flows tested end-to-end:

J1: Bootstrap → Auth → Dashboard

da setup init → da setup bootstrap (admin user)
Password login → JWT token
Google OAuth mock → callback → session
GET /dashboard with valid session → 200
GET /dashboard without session → redirect to /login

J2: Table Registration → Sync → Query

POST /api/admin/register-table (name, folder, sync_strategy)
POST /api/sync/trigger → background sync with mock extractor
Orchestrator rebuild → views created in analytics.duckdb
POST /api/query SELECT * FROM registered_table → data returned
GET /api/catalog/tables → table appears in catalog

J3: Hybrid BQ + Local Query

Register local table via sync
POST /api/query/hybrid with register_bq → BQ subquery mocked + local join
CLI: da query --register-bq "alias=SELECT ..." --sql "SELECT ..."
CLI: stdin mode with JSON input
Live variant: real BigQuery credentials

J4: RBAC & Permissions

Create admin + analyst users
Admin grants permission on dataset → analyst can query
Admin revokes → analyst gets 403
Analyst creates access request → admin approves → analyst can query again
Wildcard bucket permissions tested
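
The wildcard case in J4 can be illustrated with the standard library. This is a minimal sketch, assuming grants are stored per user as glob patterns over dataset names; `Grant`, `can_query`, and the pattern syntax are hypothetical and not taken from the project.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Grant:
    """Hypothetical permission record: `user` may read datasets matching `pattern`."""
    user: str
    pattern: str  # assumed glob syntax, e.g. "in.c-sales.*"


def can_query(grants: list[Grant], user: str, dataset: str) -> bool:
    """Return True if any grant for `user` matches `dataset` (case-sensitive)."""
    return any(
        g.user == user and fnmatchcase(dataset, g.pattern)
        for g in grants
    )


grants = [Grant("analyst@example.com", "in.c-sales.*")]
print(can_query(grants, "analyst@example.com", "in.c-sales.orders"))    # True
print(can_query(grants, "analyst@example.com", "in.c-finance.ledger"))  # False
```

A journey test would assert the same two outcomes through the API instead: 200 for a dataset covered by the wildcard, 403 for one outside it.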

J5: Jira Webhook Pipeline

POST /webhooks/jira with valid HMAC signature → 200
POST /webhooks/jira with invalid signature → 401
Verify incremental_transform called → parquet updated
Verify rebuild_source("jira") called → views refreshed
POST /api/query on Jira data → results returned
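
The valid/invalid signature steps in J5 rely on an HMAC over the request body. The sketch below shows one way a test helper could sign and verify payloads. It assumes HMAC-SHA256 with a hex digest; the spec only says "HMAC", so the algorithm, the encoding, and the helper names `sign_payload` and `is_valid_signature` are assumptions.

```python
import hashlib
import hmac
import json


def sign_payload(secret: bytes, body: bytes) -> str:
    """Return the hex HMAC-SHA256 digest of the raw request body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Compare expected and received signatures in constant time."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, signature)


# Hypothetical test secret and payload; a journey test would send `body` with
# the signature in whatever header the webhook endpoint expects.
secret = b"test-webhook-secret"
body = json.dumps({"webhookEvent": "jira:issue_updated"}).encode()
signature = sign_payload(secret, body)
```

Signing the exact bytes that are sent matters: re-serializing the JSON on either side can change whitespace or key order and invalidate the digest.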

J6: Corporate Memory Lifecycle

POST /api/upload/local-md → CLAUDE.local.md stored
Corporate memory collector runs (mocked LLM) → knowledge items created
GET /api/memory → items listed with filtering
POST /api/memory/{id}/vote → vote recorded
POST /api/memory/admin/approve → status changed
CLI sync picks up mandated items

J7: Analyst Workflow

da analyst setup → workspace created, data downloaded
da query --local "SELECT ..." → local DuckDB query works
POST /api/upload/sessions → session transcript stored
POST /api/upload/artifacts → artifact stored
da analyst status → freshness check passes

J8: Multi-source Orchestration

Create Keboola extract.duckdb (mock) + Jira extract.duckdb (mock) + BQ remote attach
SyncOrchestrator.rebuild() → all sources attached
Query across sources: SELECT * FROM keboola_table UNION SELECT * FROM jira_issues
Verify _remote_attach extensions loaded correctly
Live variant: real multi-source with actual credentials

4. Parallel Work Blocks (6 agents)

Each block writes to its own files — no conflicts. All blocks can run simultaneously.

Block A: API Gaps (Agent 1)

New test files:

tests/test_upload_api.py — session upload, artifact upload, 50MB limit, directory traversal reject, invalid content type
tests/test_scripts_api.py — deploy script, run deployed, run ad-hoc, undeploy, invalid script
tests/test_settings_api.py — get settings, update dataset settings, invalid input
tests/test_memory_api.py — CRUD, pagination, search, filtering, voting, admin approve/reject/mandate/revoke
tests/test_access_requests_api.py — create request, list my requests, pending (admin), approve, deny, duplicate request
tests/test_permissions_api.py — grant, revoke, list per-user, list all, wildcard bucket, query enforcement
tests/test_metadata_api.py — get metadata, save metadata, push to source (mock)
tests/test_admin_configure_api.py — configure data source, credential validation, discover-and-register

Estimated: ~60-80 tests
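
For the directory traversal case in `tests/test_upload_api.py`, it helps to be explicit about what "reject" means. The following is a minimal sketch of the kind of check such a test would exercise, assuming uploads are written under a single root directory; `safe_upload_path` is a hypothetical helper, not the project's implementation.

```python
from pathlib import Path


def safe_upload_path(upload_root: Path, filename: str) -> Path:
    """Resolve `filename` under `upload_root`, rejecting paths that escape the root."""
    root = upload_root.resolve()
    # Joining with an absolute filename discards the root, so that case is
    # caught by the same containment check as "../" sequences.
    candidate = (root / filename).resolve()
    if not candidate.is_relative_to(root):  # requires Python 3.9+
        raise ValueError(f"path escapes upload root: {filename!r}")
    return candidate
```

Test inputs worth covering include `../escape.txt`, a nested `a/../../escape.txt`, and an absolute path, each expected to produce an error response and never a file outside the root.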

Block B: CLI Gaps (Agent 2)

New test files:

tests/test_cli_auth.py — login, logout, whoami, token storage, invalid credentials
tests/test_cli_admin.py — add-user, list-users, remove-user, register-table, discover-and-register, list-tables, metadata show/apply
tests/test_cli_sync.py — sync (--table, --upload-only, --docs-only, --json), progress reporting
tests/test_cli_query.py — query (--remote, --local, --hybrid, --limit, --format json/csv/table), error cases
tests/test_cli_analyst.py — analyst setup, analyst status, freshness check
tests/test_cli_server.py — server status, logs, restart, deploy, rollback, backup
tests/test_cli_diagnose.py — diagnose output collection, error formatting
tests/test_cli_explore.py — explore (--table, --limit, --json)
tests/test_cli_metrics.py — metrics list, create, update, delete

Testing pattern: Each CLI test uses CliRunner (Typer) + mock_http_server fixture for API calls.

Estimated: ~40-50 tests

Block C: Services (Agent 3)

New test files:

tests/test_ws_gateway.py — connection lifecycle, JWT auth on connect, heartbeat timeout, multi-client per user, connection limit, message routing, disconnect cleanup
tests/test_telegram_bot.py — /start flow, verification code generation, code verification, /help response, message dispatch, get_updates polling, callback query handling
tests/test_telegram_storage.py — SQLite storage: create code, get chat_id, expiry, duplicate codes
tests/test_scheduler_full.py — all parse_interval formats ("every 5m", "every 2h", "daily 05:00"), is_table_due with edge cases (never synced, just synced, overdue, future schedule), poll loop mock
tests/test_corporate_memory_collector.py — MD5 hash change detection, full refresh trigger, LLM extraction mock, knowledge merge, vote/ID preservation, governance field preservation
tests/test_session_collector.py — CLAUDE.local.md processing, session transcript parsing, artifact collection

Testing pattern: Services use mock sockets, mock HTTP clients, mock LLM responses. No real network.
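
To make the scheduler edge cases concrete, here is a minimal sketch of the two functions named above, covering only the three formats listed ("every 5m", "every 2h", "daily 05:00"). The return types, the `Daily` class, and the use of naive datetimes are assumptions; the real `parse_interval` and `is_table_due` may differ.

```python
import re
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Optional, Union


@dataclass(frozen=True)
class Daily:
    """Hypothetical schedule type: run once per day at a fixed wall-clock time."""
    at: time


Schedule = Union[timedelta, Daily]

_EVERY = re.compile(r"^every (\d+)([mh])$")
_DAILY = re.compile(r"^daily (\d{2}):(\d{2})$")


def parse_interval(spec: str) -> Schedule:
    """Parse "every Nm", "every Nh", or "daily HH:MM" into a schedule object."""
    spec = spec.strip().lower()
    if m := _EVERY.match(spec):
        amount = int(m.group(1))
        if amount <= 0:
            raise ValueError(f"interval must be positive: {spec!r}")
        return timedelta(minutes=amount) if m.group(2) == "m" else timedelta(hours=amount)
    if m := _DAILY.match(spec):
        # time() itself raises ValueError for out-of-range hours or minutes.
        return Daily(time(int(m.group(1)), int(m.group(2))))
    raise ValueError(f"unsupported schedule: {spec!r}")


def is_table_due(schedule: Schedule, last_sync: Optional[datetime], now: datetime) -> bool:
    """Decide whether a table should sync at `now`."""
    if last_sync is None:
        return True  # never synced
    if isinstance(schedule, timedelta):
        return now - last_sync >= schedule
    # Daily: find the most recent scheduled slot at or before `now`.
    slot = datetime.combine(now.date(), schedule.at)
    if slot > now:
        slot -= timedelta(days=1)
    return last_sync < slot
```

Passing `now` as a parameter keeps the edge cases (never synced, just synced, overdue, future schedule) testable without patching the clock.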

Estimated: ~40-50 tests
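
The MD5 hash change detection listed for `tests/test_corporate_memory_collector.py` can be pictured as follows. This is a sketch under the assumption that the collector keeps a mapping of file path to last-seen digest; `file_md5`, `changed_files`, and the in-memory dict are illustrative stand-ins for whatever storage the collector actually uses.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the MD5 hex digest of a file's contents (change detection only)."""
    return hashlib.md5(path.read_bytes(), usedforsecurity=False).hexdigest()


def changed_files(paths: list[Path], known_hashes: dict[str, str]) -> list[Path]:
    """Return files whose digest differs from `known_hashes`, updating it in place."""
    changed = []
    for path in paths:
        digest = file_md5(path)
        if known_hashes.get(str(path)) != digest:
            known_hashes[str(path)] = digest
            changed.append(path)
    return changed
```

A collector test would then check three states with `tmp_path`: a new file is reported, an untouched file is not, and an edited file is reported again.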

Block D: Connectors (Agent 4)

New/expanded test files:

tests/test_keboola_extractor_full.py — DuckDB extension path, legacy client fallback, _meta creation, _remote_attach creation, multi-table extraction, error recovery, partial extraction
tests/test_bigquery_extractor_full.py — remote-only extraction, _remote_attach table, BQ extension mock, credential handling, query timeout
tests/test_jira_service_full.py — process_webhook_event (create/update/delete), trigger_incremental_transform, rebuild_source, concurrent webhook handling, malformed events
tests/test_jira_incremental.py — monthly parquet update, issue insert/update/delete in parquet, concurrent file access (file_lock)
tests/test_llm_providers_full.py — factory selection, OpenAI provider, Anthropic provider, retry logic, rate limit handling, structured output parsing

Testing pattern: Mock DuckDB extensions, mock API clients. Test the connector logic, not the external services.

Estimated: ~20-30 tests
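
Retry and rate-limit tests in `tests/test_llm_providers_full.py` are easier to keep fast when the sleep function is injectable. The sketch below assumes exponential backoff and a single retryable exception type; `RateLimitError` and `call_with_retry` are hypothetical names, and the real providers may use a different policy.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit exception."""


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on RateLimitError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

In a test, passing `sleep=delays.append` records the backoff schedule without waiting, so the assertion can be on the exact delays and the number of calls.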

Block E: E2E Journeys (Agent 5)

New test files:

tests/test_journey_bootstrap_auth.py — J1
tests/test_journey_sync_query.py — J2
tests/test_journey_hybrid.py — J3
tests/test_journey_rbac.py — J4
tests/test_journey_jira.py — J5
tests/test_journey_memory.py — J6
tests/test_journey_analyst.py — J7
tests/test_journey_multisource.py — J8

Testing pattern: Each journey uses seeded_app fixture + mock_extract_factory. Multi-step flows with assertions at each stage. Marked @pytest.mark.journey for selective running.

Estimated: ~30-40 tests

Block F: Docker & Live (Agent 6)

New/expanded test files:

tests/test_docker_full.py — extend existing docker E2E: full bootstrap flow, sync trigger, query via HTTP, multi-service health (app + scheduler + ws-gateway), profile=full (telegram + corporate memory)
tests/test_live_keboola.py — real Keboola extraction, table discovery, data validation (read-only)
tests/test_live_bigquery.py — real BQ query, hybrid query with real BQ source (read-only)
tests/test_live_jira.py — real Jira API read, webhook signature validation with real secret

Testing pattern: Docker tests use docker compose up with health wait. Live tests use env vars for credentials, skip if not set. All read-only.
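
The "skip if not set" rule for live tests can be reduced to one small helper. This is a sketch only: the environment variable names are placeholders, and `missing_env` is a hypothetical helper.

```python
import os
from typing import Iterable, Mapping

# Placeholder names; the real live tests would list their own variables.
REQUIRED_KEBOOLA_ENV = ("KEBOOLA_URL", "KEBOOLA_TOKEN")


def missing_env(names: Iterable[str], environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the variables that are unset or empty, in the order given."""
    return [name for name in names if not environ.get(name)]


# Typical use at the top of a live test module (shown as a comment so the
# sketch does not depend on pytest being importable):
#
#   _missing = missing_env(REQUIRED_KEBOOLA_ENV)
#   pytestmark = [
#       pytest.mark.live,
#       pytest.mark.skipif(bool(_missing), reason=f"missing env: {_missing}"),
#   ]
```

Reporting the missing names in the skip reason makes it obvious in CI output why a live test did not run.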

Estimated: ~15-20 tests

5. Shared Test Infrastructure

Prepared before agents start — agents consume but don't modify these.

`tests/conftest.py` (extend existing)

New fixtures:

mock_extract_factory(source_name, tables, query_mode) — creates extract.duckdb with _meta, _remote_attach, and parquet data in tmp_path
mock_http_server(responses) — lightweight HTTP server on random port, returns configured responses, for CLI tests
analyst_user(seeded_app) — pre-created analyst user with limited permissions
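
As an illustration of the `mock_http_server(responses)` idea, the sketch below serves canned JSON on a random free port using only the standard library. The shape of `responses` (a dict keyed by `(method, path)` with `(status, payload)` values) is an assumption, and it is written as a context manager that a pytest fixture could wrap.

```python
import json
import threading
from contextlib import contextmanager
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen


@contextmanager
def mock_http_server(responses: dict):
    """Serve canned JSON keyed by (method, path); yield the server's base URL."""

    class Handler(BaseHTTPRequestHandler):
        def _reply(self):
            # Drain any request body so the client never sees a reset.
            self.rfile.read(int(self.headers.get("Content-Length", 0)))
            status, payload = responses.get(
                (self.command, self.path), (404, {"detail": "not found"})
            )
            body = json.dumps(payload).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        do_GET = do_POST = _reply

        def log_message(self, *args):  # keep test output quiet
            pass

    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield f"http://127.0.0.1:{server.server_port}"
    finally:
        server.shutdown()
        server.server_close()
        thread.join()


if __name__ == "__main__":
    with mock_http_server({("GET", "/api/health"): (200, {"status": "ok"})}) as base:
        with urlopen(base + "/api/health") as resp:
            print(json.loads(resp.read()))
```

Binding to port 0 avoids hardcoded ports, which keeps the fixture safe under `-n auto`.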

`tests/helpers/factories.py` (new)

Faker-based factories with deterministic seeds:

UserFactory — email, name, role, hashed password
TableRegistryFactory — name, source_type, bucket, source_table, query_mode, sync_schedule
KnowledgeItemFactory — title, content, category, status
WebhookEventFactory — Jira webhook payloads with valid/invalid HMAC
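
The determinism requirement is the important part of these factories. The sketch below demonstrates it with `random.Random` from the standard library in place of Faker, so that it runs without extra dependencies; the `User` fields shown are a subset, and the class layout is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    email: str
    name: str
    role: str


class UserFactory:
    """Builds reproducible users from a private, seeded RNG (no global state)."""

    ROLES = ("admin", "analyst")
    NAMES = ("Alice", "Bob", "Carol", "Dave")

    def __init__(self, seed: int = 1234) -> None:
        self._rng = random.Random(seed)
        self._counter = 0

    def build(self, **overrides) -> User:
        """Return the next user in the sequence; keyword overrides win."""
        self._counter += 1
        name = self._rng.choice(self.NAMES)
        fields = {
            "email": f"{name.lower()}{self._counter}@example.com",
            "name": name,
            "role": self._rng.choice(self.ROLES),
        }
        fields.update(overrides)
        return User(**fields)
```

Two factories created with the same seed yield the same sequence of users, which is the property the repeatability guarantee depends on. With Faker the equivalent would be seeding each Faker instance instead of relying on a shared global generator.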

`tests/helpers/assertions.py` (new)

assert_api_error(response, status, detail_contains) — validate error response shape
assert_parquet_schema(path, expected_columns) — validate parquet file structure
assert_extract_contract(extract_dir) — validate extract.duckdb has _meta + correct schema
assert_duckdb_table_exists(db_path, table_name) — check table in DuckDB
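
A possible shape for `assert_api_error` is sketched below. It assumes error bodies follow FastAPI's default `{"detail": ...}` layout and that the response object exposes `status_code` and `json()`, as the TestClient response does; the matching rules (case-insensitive substring) are an assumption.

```python
from typing import Any


def assert_api_error(response: Any, status: int, detail_contains: str) -> None:
    """Assert that `response` is an error with the given status and detail text."""
    assert response.status_code == status, (
        f"expected HTTP {status}, got {response.status_code}"
    )
    body = response.json()
    assert isinstance(body, dict) and "detail" in body, (
        f"error body has no 'detail' field: {body!r}"
    )
    detail = body["detail"]
    # Validation errors carry a list of dicts, so fall back to repr() for matching.
    text = detail if isinstance(detail, str) else repr(detail)
    assert detail_contains.lower() in text.lower(), (
        f"{detail_contains!r} not found in detail: {text!r}"
    )
```

Centralizing the check supports the quality gate that assertions be specific: a test calls `assert_api_error(resp, 403, "permission")` instead of comparing the status code alone.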

`tests/helpers/mocks.py` (new)

MockKeboolaExtension — simulates DuckDB Keboola extension behavior
MockBigQueryExtension — simulates DuckDB BQ extension behavior
MockJiraWebhook(valid_signature=True) — generates webhook payloads with correct HMAC
MockLLMProvider — returns configured responses for corporate memory tests

`tests/helpers/docker.py` (new)

wait_for_healthy(url, timeout=30) — poll health endpoint until ready
docker_compose_up(profile="default") — start services, return cleanup function
docker_exec(service, cmd) — run command inside container
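
The health wait is the piece most likely to cause flaky Docker runs, so a sketch may be useful. The version below polls with `urllib` from the standard library; the `interval` parameter and the choice to return a bool are assumptions on top of the `wait_for_healthy(url, timeout=30)` signature listed above.

```python
import time
import urllib.error
import urllib.request


def wait_for_healthy(url: str, timeout: float = 30, interval: float = 0.5) -> bool:
    """Poll `url` until it answers with a 2xx status or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval + 1) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # connection refused, timeout, or 4xx/5xx: service not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Using `time.monotonic()` keeps the deadline immune to wall-clock adjustments on the CI host. A caller would typically fail the test session with a clear message when the function returns `False`.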

pytest configuration

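Add to pytest.ini. Registering a marker only documents it, so the "deselected by default" behaviour also needs a default selection such as addopts = -m "not live and not docker"; an explicit -m docker or -m live on the command line is expected to take precedence over that default.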

markers =
    live: requires real credentials (deselected by default)
    docker: requires docker-compose (deselected by default)
    integration: FastAPI TestClient tests
    journey: end-to-end user flow tests

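Add to pyproject.toml dev dependencies (pytest-xdist provides -n auto; the --timeout flag used in the CI commands comes from pytest-timeout, which must also be listed if it is not already a dependency):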

pytest-xdist>=3.0.0

6. Quality Gates & Review Checkpoints

Per-agent review

After each agent completes its block, a code-review sub-agent verifies:

All tests pass (pytest <block_files> -v)
No test relies on global state or execution order
Each test has a descriptive name and tests ONE thing
Negative cases covered (auth failures, invalid input, missing data, edge cases)
Assertions are specific (not just status code checks)
No hardcoded paths, ports, or credentials
Proper cleanup via fixtures

Post-merge validation

After all 6 blocks are merged:

Full suite run: pytest tests/ -v --timeout=60
Parallel run: pytest tests/ -n auto — verify no ordering dependencies
Docker run: pytest tests/ -m docker
Check no test file naming collisions
Verify total test count matches expectations (~205-270 new tests, the sum of the six block estimates, plus ~204 existing)

Ongoing

PR CI runs unit + integration on every push
Nightly CI adds docker tests
Weekly manual run includes live tests
Test count tracked — regressions flagged in PR review
