Comprehensive Test Strategy — AI Data Analyst
Date: 2026-04-12
Approach: Hybrid (gap analysis + critical journeys + parallel sub-agents)
Goal: Full test coverage across unit, integration, Docker E2E, and live layers: repeatable, parallelizable, and non-blocking to development.
1. Test Taxonomy
| Layer | Marker | Runs in CI | What it tests | Isolation |
|---|---|---|---|---|
| Unit | (none) | Every PR | Isolated functions, business logic, parsers, validators | tmp_path, mocks |
| Integration | @pytest.mark.integration | Every PR | FastAPI TestClient, repository+DuckDB, CLI with mock server | tmp_path, seeded_app fixture |
| Docker E2E | @pytest.mark.docker | Nightly | Full docker-compose stack, HTTP from outside | docker compose up/down |
| Live | @pytest.mark.live | Manual/weekly | Real Keboola, BigQuery, Jira credentials | Read-only against real sources |
CI matrix
# PR check (fast, <3 min)
pytest tests/ -x --timeout=60 -n auto # unit + integration, parallel
# Nightly (docker, ~10 min)
pytest tests/ -m docker --timeout=120
# Weekly/manual (live, ~5 min)
pytest tests/ -m live --timeout=300
Repeatability guarantees
- Every test uses tmp_path + monkeypatch, so no global state leaks between tests
- Faker factories use deterministic seeds, so every run sees the same data
- Docker tests are idempotent — compose up → test → compose down, clean start
- Live tests are read-only — they never mutate real data sources
- CI uses pinned dependencies — no version drift between runs
2. Gap Analysis — Current vs. Target
| Module | Current tests | Gap | Priority |
|---|---|---|---|
| WebSocket gateway | 0 | Auth, connection mgmt, heartbeat, multi-client | High |
| Corporate memory service | ~0 | Collector, hash detection, LLM mock, API CRUD+voting | High |
| Telegram bot | 1 integration | Storage, sender, dispatch, verify/unlink flow | Medium |
| Upload API | 0 | Upload limits, directory traversal protection, session/artifact upload | High |
| Scripts API | 0 | Deploy, run, undeploy, ad-hoc execution | High |
| Settings API | 0 | Get/update settings | Medium |
| Memory API | 0 | CRUD, voting, admin approve/reject/mandate | High |
| Access requests API | 0 | Request→approve→verify flow, deny flow | High |
| Permissions API | unit ok, API weak | Grant→query→revoke integration flow | Medium |
| Metadata API | weak | Get/save/push metadata | Medium |
| Admin configure API | weak | Configure flow, credential validation | High |
| Admin discover-and-register | weak | Discovery + registration in one call | Medium |
| CLI commands | 27 for ~15 cmds | Per-command coverage, error handling, output formats | High |
| Web UI routes | 11 | Auth redirects, dashboard render, setup wizard | Medium |
| Jira service | 2 | Incremental transform, webhook→rebuild pipeline | High |
| Scheduler edge cases | few | All parse_interval formats, is_table_due edge cases | Medium |
3. Critical E2E Journeys
Eight user flows tested end-to-end:
J1: Bootstrap → Auth → Dashboard
- da setup init → da setup bootstrap (admin user)
- Password login → JWT token
- Google OAuth mock → callback → session
- GET /dashboard with valid session → 200
- GET /dashboard without session → redirect to /login
J2: Table Registration → Sync → Query
- POST /api/admin/register-table (name, folder, sync_strategy)
- POST /api/sync/trigger → background sync with mock extractor
- Orchestrator rebuild → views created in analytics.duckdb
- POST /api/query with SELECT * FROM registered_table → data returned
- GET /api/catalog/tables → table appears in catalog
J3: Hybrid BQ + Local Query
- Register local table via sync
- POST /api/query/hybrid with register_bq → BQ subquery mocked + local join
- CLI: da query --register-bq "alias=SELECT ..." --sql "SELECT ..."
- CLI: stdin mode with JSON input
- Live variant: real BigQuery credentials
J4: RBAC & Permissions
- Create admin + analyst users
- Admin grants permission on dataset → analyst can query
- Admin revokes → analyst gets 403
- Analyst creates access request → admin approves → analyst can query again
- Wildcard bucket permissions tested
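The wildcard case in J4 can be pinned down with a small matching function before writing the journey test. The sketch below is only an illustration. It assumes grants are stored as glob-style patterns over dotted table identifiers. The `can_query` helper and the `in.c-sales.*` pattern are hypothetical, not taken from the project's permission model.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def can_query(granted_patterns: Iterable[str], table_id: str) -> bool:
    """Return True if any granted pattern matches the table identifier.

    Assumption: a grant such as "in.c-sales.*" covers every table in that bucket.
    """
    return any(fnmatchcase(table_id, pattern) for pattern in granted_patterns)


grants = ["in.c-sales.*"]  # hypothetical wildcard grant on one bucket

assert can_query(grants, "in.c-sales.orders")        # inside the bucket
assert not can_query(grants, "in.c-finance.ledger")  # other bucket, expect 403
assert not can_query([], "in.c-sales.orders")        # after revoke, expect 403
```

A journey test would exercise the same three cases through the API and assert on the HTTP status instead of a boolean.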
J5: Jira Webhook Pipeline
- POST /webhooks/jira with valid HMAC signature → 200
- POST /webhooks/jira with invalid signature → 401
- Verify incremental_transform called → parquet updated
- Verify rebuild_source("jira") called → views refreshed
- POST /api/query on Jira data → results returned
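The valid and invalid signature cases in J5 both reduce to an HMAC comparison over the raw request body. Here is a minimal sketch, assuming HMAC-SHA256 with a hex digest. The actual algorithm, header name, and secret handling used by the webhook endpoint are not stated here, so the names below are placeholders.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Return the hex HMAC-SHA256 digest of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, body: bytes, received: str) -> bool:
    """Compare the received signature with the expected one in constant time."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, received)


secret = b"test-webhook-secret"  # hypothetical test secret
body = b'{"webhookEvent": "jira:issue_updated"}'

good_signature = sign_payload(secret, body)
assert is_valid_signature(secret, body, good_signature)  # handler should answer 200
assert not is_valid_signature(secret, body, "0" * 64)    # handler should answer 401
```

`hmac.compare_digest` is used instead of `==` so the comparison time does not depend on how many leading characters match.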
J6: Corporate Memory Lifecycle
- POST /api/upload/local-md → CLAUDE.local.md stored
- Corporate memory collector runs (mocked LLM) → knowledge items created
- GET /api/memory → items listed with filtering
- POST /api/memory/{id}/vote → vote recorded
- POST /api/memory/admin/approve → status changed
- CLI sync picks up mandated items
J7: Analyst Workflow
- da analyst setup → workspace created, data downloaded
- da query --local "SELECT ..." → local DuckDB query works
- POST /api/upload/sessions → session transcript stored
- POST /api/upload/artifacts → artifact stored
- da analyst status → freshness check passes
J8: Multi-source Orchestration
- Create Keboola extract.duckdb (mock) + Jira extract.duckdb (mock) + BQ remote attach
- SyncOrchestrator.rebuild() → all sources attached
- Query across sources: SELECT * FROM keboola_table UNION SELECT * FROM jira_issues
- Verify _remote_attach extensions loaded correctly
- Live variant: real multi-source with actual credentials
4. Parallel Work Blocks (6 agents)
Each block writes to its own files — no conflicts. All blocks can run simultaneously.
Block A: API Gaps (Agent 1)
New test files:
- tests/test_upload_api.py: session upload, artifact upload, 50MB limit, directory traversal reject, invalid content type
- tests/test_scripts_api.py: deploy script, run deployed, run ad-hoc, undeploy, invalid script
- tests/test_settings_api.py: get settings, update dataset settings, invalid input
- tests/test_memory_api.py: CRUD, pagination, search, filtering, voting, admin approve/reject/mandate/revoke
- tests/test_access_requests_api.py: create request, list my requests, pending (admin), approve, deny, duplicate request
- tests/test_permissions_api.py: grant, revoke, list per-user, list all, wildcard bucket, query enforcement
- tests/test_metadata_api.py: get metadata, save metadata, push to source (mock)
- tests/test_admin_configure_api.py: configure data source, credential validation, discover-and-register
Estimated: ~60-80 tests
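For the directory traversal case in test_upload_api.py, it helps to be explicit about what "reject" means. The sketch below shows one common check: resolve the requested path and confirm it is still inside the upload directory. `resolve_upload_path` is a hypothetical helper, not the project's actual implementation, and it needs Python 3.9+ for `Path.is_relative_to`.

```python
import tempfile
from pathlib import Path


def resolve_upload_path(base_dir: Path, user_supplied: str) -> Path:
    """Resolve a user-supplied relative path, refusing anything outside base_dir."""
    base = base_dir.resolve()
    candidate = (base / user_supplied).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes upload directory: {user_supplied!r}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    uploads = Path(tmp)

    # A normal nested path stays inside the upload directory.
    ok = resolve_upload_path(uploads, "sessions/2026-04-12.json")
    assert ok.is_relative_to(uploads.resolve())

    # Traversal attempts are rejected before any file is touched.
    for attack in ["../secrets.env", "a/../../secrets.env"]:
        try:
            resolve_upload_path(uploads, attack)
        except ValueError:
            pass
        else:
            raise AssertionError(f"{attack!r} should have been rejected")
```

In the API test the same inputs would be sent as upload filenames, with the assertion on the error status and on the absence of any file written outside tmp_path.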
Block B: CLI Gaps (Agent 2)
New test files:
- tests/test_cli_auth.py: login, logout, whoami, token storage, invalid credentials
- tests/test_cli_admin.py: add-user, list-users, remove-user, register-table, discover-and-register, list-tables, metadata show/apply
- tests/test_cli_sync.py: sync (--table, --upload-only, --docs-only, --json), progress reporting
- tests/test_cli_query.py: query (--remote, --local, --hybrid, --limit, --format json/csv/table), error cases
- tests/test_cli_analyst.py: analyst setup, analyst status, freshness check
- tests/test_cli_server.py: server status, logs, restart, deploy, rollback, backup
- tests/test_cli_diagnose.py: diagnose output collection, error formatting
- tests/test_cli_explore.py: explore (--table, --limit, --json)
- tests/test_cli_metrics.py: metrics list, create, update, delete
Testing pattern: Each CLI test uses CliRunner (Typer) + mock_http_server fixture for API calls.
Estimated: ~40-50 tests
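The mock server that the CLI tests depend on can be built from the standard library alone. This is a sketch of the idea as a context manager rather than a pytest fixture. It handles GET only, and the `/api/health` path and the `(status, payload)` response shape are assumptions for illustration.

```python
import json
import threading
import urllib.request
from contextlib import contextmanager
from http.server import BaseHTTPRequestHandler, HTTPServer


@contextmanager
def mock_http_server(responses):
    """Serve canned JSON on a random free port.

    `responses` maps a request path to a (status, payload) tuple.
    """

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, payload = responses.get(self.path, (404, {"detail": "not found"}))
            body = json.dumps(payload).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep test output quiet

    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 lets the OS pick a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield f"http://127.0.0.1:{server.server_port}"
    finally:
        server.shutdown()
        server.server_close()
        thread.join()


with mock_http_server({"/api/health": (200, {"status": "ok"})}) as base_url:
    with urllib.request.urlopen(f"{base_url}/api/health") as resp:
        assert json.load(resp) == {"status": "ok"}
```

Binding to port 0 avoids hardcoded ports, which keeps the tests safe to run in parallel under `-n auto`. Wrapping the context manager in a fixture that yields `base_url` would give each test its own server.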
Block C: Services (Agent 3)
New test files:
- tests/test_ws_gateway.py: connection lifecycle, JWT auth on connect, heartbeat timeout, multi-client per user, connection limit, message routing, disconnect cleanup
- tests/test_telegram_bot.py: /start flow, verification code generation, code verification, /help response, message dispatch, get_updates polling, callback query handling
- tests/test_telegram_storage.py: SQLite storage (create code, get chat_id, expiry, duplicate codes)
- tests/test_scheduler_full.py: all parse_interval formats ("every 5m", "every 2h", "daily 05:00"), is_table_due with edge cases (never synced, just synced, overdue, future schedule), poll loop mock
- tests/test_corporate_memory_collector.py: MD5 hash change detection, full refresh trigger, LLM extraction mock, knowledge merge, vote/ID preservation, governance field preservation
- tests/test_session_collector.py: CLAUDE.local.md processing, session transcript parsing, artifact collection
Testing pattern: Services use mock sockets, mock HTTP clients, mock LLM responses. No real network.
Estimated: ~40-50 tests
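The scheduler edge cases listed for test_scheduler_full.py are easier to reason about against a concrete model. The sketch below is a hypothetical reimplementation for illustration only. The real parse_interval and is_table_due signatures and return types may differ. It covers the three formats named above and the four edge cases.

```python
import re
from datetime import datetime, time, timedelta
from typing import Optional, Union

Schedule = Union[timedelta, time]


def parse_interval(spec: str) -> Schedule:
    """Parse "every <N>m", "every <N>h", or "daily HH:MM"."""
    spec = spec.strip()
    match = re.fullmatch(r"every (\d+)([mh])", spec)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        return timedelta(minutes=amount) if unit == "m" else timedelta(hours=amount)
    match = re.fullmatch(r"daily (\d{2}):(\d{2})", spec)
    if match:
        return time(int(match.group(1)), int(match.group(2)))
    raise ValueError(f"unsupported schedule: {spec!r}")


def is_table_due(spec: str, last_sync: Optional[datetime], now: datetime) -> bool:
    """Decide whether a table should sync at `now`."""
    if last_sync is None:
        return True  # never synced
    schedule = parse_interval(spec)
    if isinstance(schedule, timedelta):
        return now - last_sync >= schedule
    # Daily schedule: find the most recent scheduled moment at or before `now`.
    latest = datetime.combine(now.date(), schedule)
    if latest > now:
        latest -= timedelta(days=1)
    return last_sync < latest


now = datetime(2026, 4, 12, 6, 0)
assert is_table_due("every 5m", None, now)                               # never synced
assert not is_table_due("every 2h", now - timedelta(minutes=1), now)     # just synced
assert is_table_due("daily 05:00", datetime(2026, 4, 11, 5, 10), now)    # overdue
assert not is_table_due("daily 07:00", datetime(2026, 4, 11, 7, 5), now) # future schedule
```

Passing `now` explicitly, instead of calling `datetime.now()` inside the function, lets each edge case be tested without freezing the clock.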
Block D: Connectors (Agent 4)
New/expanded test files:
- tests/test_keboola_extractor_full.py: DuckDB extension path, legacy client fallback, _meta creation, _remote_attach creation, multi-table extraction, error recovery, partial extraction
- tests/test_bigquery_extractor_full.py: remote-only extraction, _remote_attach table, BQ extension mock, credential handling, query timeout
- tests/test_jira_service_full.py: process_webhook_event (create/update/delete), trigger_incremental_transform, rebuild_source, concurrent webhook handling, malformed events
- tests/test_jira_incremental.py: monthly parquet update, issue insert/update/delete in parquet, concurrent file access (file_lock)
- tests/test_llm_providers_full.py: factory selection, OpenAI provider, Anthropic provider, retry logic, rate limit handling, structured output parsing
Testing pattern: Mock DuckDB extensions, mock API clients. Test the connector logic, not the external services.
Estimated: ~20-30 tests
Block E: E2E Journeys (Agent 5)
New test files:
- tests/test_journey_bootstrap_auth.py: J1
- tests/test_journey_sync_query.py: J2
- tests/test_journey_hybrid.py: J3
- tests/test_journey_rbac.py: J4
- tests/test_journey_jira.py: J5
- tests/test_journey_memory.py: J6
- tests/test_journey_analyst.py: J7
- tests/test_journey_multisource.py: J8
Testing pattern: Each journey uses seeded_app fixture + mock_extract_factory. Multi-step flows with assertions at each stage. Marked @pytest.mark.journey for selective running.
Estimated: ~30-40 tests
Block F: Docker & Live (Agent 6)
New/expanded test files:
- tests/test_docker_full.py: extend existing docker E2E with full bootstrap flow, sync trigger, query via HTTP, multi-service health (app + scheduler + ws-gateway), profile=full (telegram + corporate memory)
- tests/test_live_keboola.py: real Keboola extraction, table discovery, data validation (read-only)
- tests/test_live_bigquery.py: real BQ query, hybrid query with real BQ source (read-only)
- tests/test_live_jira.py: real Jira API read, webhook signature validation with real secret
Testing pattern: Docker tests use docker compose up with health wait. Live tests use env vars for credentials, skip if not set. All read-only.
Estimated: ~15-20 tests
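The "skip if not set" rule for live tests can be kept in one small helper so that every live module skips for the same reason. This is a sketch under assumptions: `missing_credentials` and the variable names in `KEBOOLA_VARS` are hypothetical, not the project's real configuration keys.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_credentials(
    required: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names in `required` that are unset or empty in the environment."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Hypothetical variable names, for illustration only.
KEBOOLA_VARS = ["EXAMPLE_KEBOOLA_URL", "EXAMPLE_KEBOOLA_TOKEN"]

# In a live test module this could drive a module-level skip (requires pytest):
#
#   pytestmark = [
#       pytest.mark.live,
#       pytest.mark.skipif(
#           bool(missing_credentials(KEBOOLA_VARS)),
#           reason="Keboola credentials not set",
#       ),
#   ]

assert missing_credentials(KEBOOLA_VARS, env={}) == KEBOOLA_VARS
assert missing_credentials(
    KEBOOLA_VARS,
    env={"EXAMPLE_KEBOOLA_URL": "u", "EXAMPLE_KEBOOLA_TOKEN": "t"},
) == []
```

Treating an empty string as missing avoids running live tests when CI defines the variable but leaves the secret blank.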
5. Shared Test Infrastructure
Prepared before agents start — agents consume but don't modify these.
tests/conftest.py (extend existing)
New fixtures:
- mock_extract_factory(source_name, tables, query_mode): creates extract.duckdb with _meta, _remote_attach, and parquet data in tmp_path
- mock_http_server(responses): lightweight HTTP server on a random port that returns configured responses, for CLI tests
- analyst_user(seeded_app): pre-created analyst user with limited permissions
tests/helpers/factories.py (new)
Faker-based factories with deterministic seeds:
- UserFactory: email, name, role, hashed password
- TableRegistryFactory: name, source_type, bucket, source_table, query_mode, sync_schedule
- KnowledgeItemFactory: title, content, category, status
- WebhookEventFactory: Jira webhook payloads with valid/invalid HMAC
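The point of deterministic seeds is that a factory built twice with the same seed yields the same objects. The sketch below shows that property using the standard library's `random.Random` as a stand-in for Faker, so that it runs without extra dependencies. The `User` fields and name pool are illustrative, and the hashed password is omitted. With Faker the equivalent step would be seeding the generator, for example with `Faker.seed(...)` or `seed_instance(...)`.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class User:
    email: str
    name: str
    role: str


class UserFactory:
    """Builds pseudo-random users from a private, seeded RNG."""

    _NAMES = ["Ada", "Grace", "Linus", "Margaret", "Alan"]
    _ROLES = ["admin", "analyst"]

    def __init__(self, seed: int = 1234):
        # A private Random instance leaves the global random state untouched.
        self._rng = random.Random(seed)

    def build(self) -> User:
        name = self._rng.choice(self._NAMES)
        suffix = self._rng.randrange(10_000)
        role = self._rng.choice(self._ROLES)
        return User(email=f"{name.lower()}{suffix}@example.com", name=name, role=role)


def sample(seed: int, count: int = 3) -> List[User]:
    factory = UserFactory(seed)
    return [factory.build() for _ in range(count)]


# Same seed, same sequence: this is what makes failures reproducible.
assert sample(1234) == sample(1234)
```

Keeping the RNG on the factory instance, not at module level, also matters for parallel runs, since each xdist worker then produces the same data regardless of test order.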
tests/helpers/assertions.py (new)
- assert_api_error(response, status, detail_contains): validate error response shape
- assert_parquet_schema(path, expected_columns): validate parquet file structure
- assert_extract_contract(extract_dir): validate extract.duckdb has _meta + correct schema
- assert_duckdb_table_exists(db_path, table_name): check table in DuckDB
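As an illustration of the first helper, here is a minimal sketch of assert_api_error. It assumes errors follow FastAPI's default `{"detail": ...}` body and that the response object exposes `status_code` and `json()`, as TestClient responses do. The fake response at the end exists only to make the example runnable without the app.

```python
from types import SimpleNamespace


def assert_api_error(response, status: int, detail_contains: str) -> None:
    """Check both the status code and the shape/content of the error body."""
    assert response.status_code == status, (
        f"expected HTTP {status}, got {response.status_code}"
    )
    body = response.json()
    assert isinstance(body, dict) and "detail" in body, f"no 'detail' in {body!r}"
    assert detail_contains in str(body["detail"]), (
        f"{detail_contains!r} not found in detail {body['detail']!r}"
    )


# Stand-in for a TestClient response, for illustration only.
fake_response = SimpleNamespace(
    status_code=403,
    json=lambda: {"detail": "Permission denied for dataset sales"},
)
assert_api_error(fake_response, 403, "Permission denied")
```

Checking the detail text as well as the status code is what satisfies the "assertions are specific" review criterion.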
tests/helpers/mocks.py (new)
- MockKeboolaExtension: simulates DuckDB Keboola extension behavior
- MockBigQueryExtension: simulates DuckDB BQ extension behavior
- MockJiraWebhook(valid_signature=True): generates webhook payloads with correct HMAC
- MockLLMProvider: returns configured responses for corporate memory tests
tests/helpers/docker.py (new)
- wait_for_healthy(url, timeout=30): poll health endpoint until ready
- docker_compose_up(profile="default"): start services, return cleanup function
- docker_exec(service, cmd): run command inside container
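The health wait is the part of the Docker helpers most likely to cause flaky tests, so a sketch of one possible implementation follows. It uses only the standard library. The `interval` parameter and the choice to treat only HTTP 200 as healthy are assumptions, not part of the signature listed above.

```python
import time
import urllib.error
import urllib.request


def wait_for_healthy(url: str, timeout: float = 30, interval: float = 0.5) -> None:
    """Poll `url` until it answers HTTP 200, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    last_error = None
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval + 1) as resp:
                if resp.status == 200:
                    return
        except (urllib.error.URLError, OSError) as exc:
            # Connection refused or a non-2xx answer: the service is not ready yet.
            last_error = exc
        time.sleep(interval)
    raise TimeoutError(f"{url} not healthy after {timeout}s (last error: {last_error})")
```

`time.monotonic()` is used for the deadline so that wall-clock adjustments during a long compose startup cannot shorten or extend the wait. A Docker test would call this once per service after bringing the stack up and before sending the first real request.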
pytest configuration
Add to pytest.ini:
markers =
    live: requires real credentials (deselected by default)
    docker: requires docker-compose (deselected by default)
    integration: FastAPI TestClient tests
    journey: end-to-end user flow tests
Add to pyproject.toml dev dependencies:
pytest-xdist>=3.0.0
6. Quality Gates & Review Checkpoints
Per-agent review
After each agent completes its block, a code-review sub-agent verifies:
- All tests pass (pytest <block_files> -v)
- No test relies on global state or execution order
- Each test has a descriptive name and tests ONE thing
- Negative cases covered (auth failures, invalid input, missing data, edge cases)
- Assertions are specific (not just status code checks)
- No hardcoded paths, ports, or credentials
- Proper cleanup via fixtures
Post-merge validation
After all 6 blocks are merged:
- Full suite run: pytest tests/ -v --timeout=60
- Parallel run: pytest tests/ -n auto (verify no ordering dependencies)
- Docker run: pytest tests/ -m docker
- Check no test file naming collisions
- Verify total test count matches expectations (~205-270 new tests, the sum of the per-block estimates, + ~204 existing)
Ongoing
- PR CI runs unit + integration on every push
- Nightly CI adds docker tests
- Weekly manual run includes live tests
- Test count tracked — regressions flagged in PR review