# Production Hardening Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Fix all P0/P1 issues from 4 independent code reviews (architect, data engineer, senior dev, test specialist) to make the codebase production-ready.

**Architecture:** Fixes are grouped into 6 independent workstreams: auth/security, SQL safety, orchestrator bugs (including DuckDB file lifecycle), script sandbox hardening, test hardening, and docs/cleanup. Each workstream can be executed in parallel by separate agents.

**Tech Stack:** Python 3.13, FastAPI, DuckDB, pytest, Docker

**Source:** Consolidated findings from 4 review agents run 2026-04-08.

---

## Workstream 1: Authentication & Security (P0)

### Task 1.1: Fix password bypass in /auth/token

The `/auth/token` endpoint issues a JWT without verifying the password when `request.password` is empty but `user.password_hash` exists. Anyone who knows the email of a password-protected account can obtain a token for it by omitting the password.

**Files:**
- Modify: `app/auth/router.py:47-54`
- Test: `tests/test_auth_providers.py`

- [ ] **Step 1: Write the failing test**

```python
# In tests/test_auth_providers.py, add at module level:

def test_password_required_when_hash_exists(client, e2e_env):
    """A user with password_hash must provide correct password."""
    from src.db import get_system_db
    from src.repositories.users import UserRepository
    from argon2 import PasswordHasher

    conn = get_system_db()
    repo = UserRepository(conn)
    ph = PasswordHasher()
    repo.create(id="pw-user", email="pw@test.com", role="analyst")
    conn.execute(
        "UPDATE users SET password_hash = ? WHERE id = ?",
        [ph.hash("correct-password"), "pw-user"],
    )
    conn.close()

    # Empty password should be rejected
    resp = client.post("/auth/token", json={"email": "pw@test.com", "password": ""})
    assert resp.status_code == 401

    # Missing password field should be rejected
    resp = client.post("/auth/token", json={"email": "pw@test.com"})
    assert resp.status_code == 401

    # Wrong password should be rejected
    resp = client.post("/auth/token", json={"email": "pw@test.com", "password": "wrong"})
    assert resp.status_code == 401

    # Correct password should work
    resp = client.post("/auth/token", json={"email": "pw@test.com", "password": "correct-password"})
    assert resp.status_code == 200
```

- [ ] **Step 2: Run test to verify it fails**

Run: `pytest tests/test_auth_providers.py::test_password_required_when_hash_exists -v`
Expected: FAIL (empty password returns 200 instead of 401)

- [ ] **Step 3: Fix the auth logic**

In `app/auth/router.py`, replace lines 47-54:

```python
# If the user has a password_hash, require and verify the password
if user.get("password_hash"):
    if not request.password:
        raise HTTPException(status_code=401, detail="Password required")
    try:
        from argon2 import PasswordHasher
        ph = PasswordHasher()
        ph.verify(user["password_hash"], request.password)
    except Exception:
        raise HTTPException(status_code=401, detail="Invalid password")
```

- [ ] **Step 4: Run test to verify it passes**

Run: `pytest tests/test_auth_providers.py::test_password_required_when_hash_exists -v`
Expected: PASS

- [ ] **Step 5: Run full auth test suite**

Run: `pytest tests/test_auth_providers.py tests/test_security.py -v`
Expected: All pass

- [ ] **Step 6: Commit**

```bash
git add app/auth/router.py tests/test_auth_providers.py
git commit -m "fix: require password when password_hash exists, prevents auth bypass"
```

### Task 1.2: Fail on default JWT secret in non-test environments

The app starts with a hardcoded, publicly known secret if the `JWT_SECRET_KEY` env var is missing. A production deployment that forgets to set it signs tokens with that known key, so anyone can forge valid tokens.

**Files:**
- Modify: `app/auth/jwt.py:9-16`
- Test: `tests/test_security.py`

- [ ] **Step 1: Write the failing test**

```python
# In tests/test_security.py, add:

def test_jwt_rejects_default_secret_in_production(monkeypatch):
    """App should refuse to start with the default JWT secret unless TESTING=1."""
    monkeypatch.delenv("JWT_SECRET_KEY", raising=False)
    monkeypatch.delenv("TESTING", raising=False)

    with pytest.raises(RuntimeError, match="JWT_SECRET_KEY"):
        # Force reimport to trigger module-level check
        import importlib
        import app.auth.jwt as jwt_mod
        importlib.reload(jwt_mod)
```

- [ ] **Step 2: Run test to verify it fails**

Run: `pytest tests/test_security.py::test_jwt_rejects_default_secret_in_production -v`
Expected: FAIL (no RuntimeError raised)

- [ ] **Step 3: Fix jwt.py**

Replace `app/auth/jwt.py` lines 9-16:

```python
SECRET_KEY = os.environ.get("JWT_SECRET_KEY", "")

if not SECRET_KEY:
    if os.environ.get("TESTING", "").lower() in ("1", "true"):
        SECRET_KEY = "test-jwt-secret-key-minimum-32-chars!!"
    else:
        raise RuntimeError(
            "JWT_SECRET_KEY environment variable is required. "
            "Generate one: python -c \"import secrets; print(secrets.token_hex(32))\""
        )
elif len(SECRET_KEY) < 32 and os.environ.get("TESTING", "").lower() not in ("1", "true"):
    import warnings as _warnings
    _warnings.warn(
        f"JWT_SECRET_KEY is {len(SECRET_KEY)} chars; minimum 32 recommended",
        UserWarning, stacklevel=2,
    )
```

- [ ] **Step 4: Run tests**

Run: `pytest tests/test_security.py tests/test_auth_providers.py -v`
Expected: All pass (existing tests set TESTING=1 or JWT_SECRET_KEY via conftest)

- [ ] **Step 5: Commit**

```bash
git add app/auth/jwt.py tests/test_security.py
git commit -m "fix: fail startup on missing JWT_SECRET_KEY in non-test environments"
```

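The startup rule in Task 1.2 is easier to exercise without module reloads if the decision is factored into a pure function. The following is a minimal sketch under that assumption; `resolve_jwt_secret` and `TEST_SECRET` are hypothetical names for illustration and are not part of `app/auth/jwt.py`.

```python
import warnings

# Same placeholder value the plan uses for test runs.
TEST_SECRET = "test-jwt-secret-key-minimum-32-chars!!"


def resolve_jwt_secret(env: dict) -> str:
    """Hypothetical helper: return the JWT secret or raise if none is configured."""
    testing = env.get("TESTING", "").lower() in ("1", "true")
    secret = env.get("JWT_SECRET_KEY", "")
    if not secret:
        if testing:
            return TEST_SECRET
        raise RuntimeError("JWT_SECRET_KEY environment variable is required.")
    if len(secret) < 32 and not testing:
        # Short secrets are accepted but flagged, mirroring the plan's warning.
        warnings.warn(
            f"JWT_SECRET_KEY is {len(secret)} chars; minimum 32 recommended",
            UserWarning,
            stacklevel=2,
        )
    return secret
```

Passing the environment as a plain `dict` lets a test call `resolve_jwt_secret({})` directly instead of deleting variables and reloading the module.
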
### Task 1.3: Reduce JWT expiry, add jti claim

Tokens currently live for 30 days and cannot be revoked, so a leaked token stays usable for up to a month.

**Files:**
- Modify: `app/auth/jwt.py:18-19,28-37`
- Test: `tests/test_security.py`

- [ ] **Step 1: Write the failing test**

```python
# In tests/test_security.py, add:

def test_jwt_contains_jti_claim():
    """JWT tokens must contain a unique jti claim for future revocation support."""
    from app.auth.jwt import create_access_token, verify_token
    token = create_access_token(user_id="u1", email="a@b.com", role="analyst")
    payload = verify_token(token)
    assert "jti" in payload
    assert len(payload["jti"]) >= 16

def test_jwt_expiry_is_24_hours():
    """JWT tokens should expire in 24 hours, not 30 days."""
    from app.auth.jwt import ACCESS_TOKEN_EXPIRE_HOURS
    assert ACCESS_TOKEN_EXPIRE_HOURS == 24
```

- [ ] **Step 2: Run to verify failure**

Run: `pytest tests/test_security.py::test_jwt_contains_jti_claim tests/test_security.py::test_jwt_expiry_is_24_hours -v`
Expected: FAIL

- [ ] **Step 3: Fix jwt.py**

In `app/auth/jwt.py`:

Change line 19: `ACCESS_TOKEN_EXPIRE_HOURS = 24`

Add `import uuid` at the top. In `create_access_token`, add `"jti"` to payload:

```python
payload = {
    "sub": user_id,
    "email": email,
    "role": role,
    "exp": expire,
    "iat": datetime.now(timezone.utc),
    "jti": uuid.uuid4().hex,
}
```

- [ ] **Step 4: Run tests**

Run: `pytest tests/ -q --tb=short`
Expected: All pass

- [ ] **Step 5: Commit**

```bash
git add app/auth/jwt.py tests/test_security.py
git commit -m "fix: reduce JWT expiry to 24h, add jti claim for future revocation"
```

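The `jti` claim from Task 1.3 only pays off once a verifier consults it. As a rough illustration of where that could lead, the sketch below builds the same claim set with the standard library and checks it against an in-memory denylist. `build_claims`, `is_revoked`, and `revoked_jtis` are hypothetical names; signing and verification are left to whatever JWT library the project already uses, and a real deployment would need shared storage rather than a process-local set.

```python
import uuid
from datetime import datetime, timedelta, timezone

ACCESS_TOKEN_EXPIRE_HOURS = 24

# Hypothetical in-memory denylist, for illustration only.
revoked_jtis: set[str] = set()


def build_claims(user_id: str, email: str, role: str) -> dict:
    """Build the claim set from Task 1.3 Step 3 (signing is out of scope here)."""
    now = datetime.now(timezone.utc)
    return {
        "sub": user_id,
        "email": email,
        "role": role,
        "exp": now + timedelta(hours=ACCESS_TOKEN_EXPIRE_HOURS),
        "iat": now,
        "jti": uuid.uuid4().hex,  # 32 hex characters, unique per token
    }


def is_revoked(claims: dict) -> bool:
    """Hypothetical check a verifier could run after signature validation."""
    return claims["jti"] in revoked_jtis
```

Revoking a single token then amounts to adding its `jti` to the denylist; other tokens for the same user keep working until they expire.
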
### Task 1.4: Fix get_optional_user not checking cookies

`get_optional_user` only checks the Authorization header, not cookies, so cookie-authenticated Web UI users are treated as anonymous (`None`).

**Files:**
- Modify: `app/auth/dependencies.py:60-70`
- Test: `tests/test_auth_providers.py`

- [ ] **Step 1: Write the failing test**

```python
# In tests/test_auth_providers.py, add:

def test_optional_user_reads_cookie(client, e2e_env):
    """get_optional_user should detect cookie-authenticated users."""
    from src.db import get_system_db
    from src.repositories.users import UserRepository
    from app.auth.jwt import create_access_token

    conn = get_system_db()
    UserRepository(conn).create(id="cookie-user", email="cookie@test.com", role="analyst")
    conn.close()

    token = create_access_token(user_id="cookie-user", email="cookie@test.com", role="analyst")

    # Simulate web UI request with cookie but no Authorization header
    resp = client.get("/api/catalog", cookies={"access_token": token})
    assert resp.status_code == 200
```

- [ ] **Step 2: Run to verify behavior** (this may or may not fail depending on endpoint requirements)

Run: `pytest tests/test_auth_providers.py::test_optional_user_reads_cookie -v`

- [ ] **Step 3: Fix dependencies.py**

Replace `get_optional_user` in `app/auth/dependencies.py`:

```python
async def get_optional_user(
    request: Request = None,
    authorization: Optional[str] = Header(None),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
) -> Optional[dict]:
    """Like get_current_user but returns None instead of 401 if no token."""
    try:
        return await get_current_user(request=request, authorization=authorization, conn=conn)
    except HTTPException:
        return None
```

- [ ] **Step 4: Run tests**

Run: `pytest tests/test_auth_providers.py tests/test_api.py -v`
Expected: All pass

- [ ] **Step 5: Commit**

```bash
git add app/auth/dependencies.py tests/test_auth_providers.py
git commit -m "fix: get_optional_user now checks cookies like get_current_user"
```

---

## Workstream 2: SQL Safety (P0/P1)

### Task 2.1: Add identifier validation to orchestrator

`source_name` (taken from directory names) and `table_name` (taken from `_meta`) are interpolated into SQL without validation. A crafted directory name or `_meta` entry could inject arbitrary SQL.

**Files:**
- Modify: `src/orchestrator.py` (add validation helper, apply in `_attach_and_create_views` and `_attach_remote_extensions`)
- Test: `tests/test_orchestrator.py`

- [ ] **Step 1: Write the failing tests**

```python
# In tests/test_orchestrator.py, add:

def test_rejects_malicious_source_name(setup_env):
    """Orchestrator must reject directory names with SQL injection characters."""
    from src.orchestrator import SyncOrchestrator

    malicious_dir = setup_env["extracts_dir"] / 'test; DROP TABLE _meta--'
    malicious_dir.mkdir()
    db_path = malicious_dir / "extract.duckdb"
    import duckdb as _duckdb
    conn = _duckdb.connect(str(db_path))
    conn.execute("""CREATE TABLE _meta (
        table_name VARCHAR, description VARCHAR, rows BIGINT,
        size_bytes BIGINT, extracted_at TIMESTAMP, query_mode VARCHAR DEFAULT 'local'
    )""")
    conn.execute('CREATE TABLE "safe_table" (id VARCHAR)')
    conn.execute("INSERT INTO _meta VALUES ('safe_table', '', 0, 0, current_timestamp, 'local')")
    conn.close()

    orch = SyncOrchestrator(analytics_db_path=setup_env["analytics_db"])
    result = orch.rebuild()

    # Malicious source should be skipped, not attached
    assert 'test; DROP TABLE _meta--' not in result


def test_rejects_malicious_table_name(setup_env):
    """Orchestrator must reject table names with SQL injection characters."""
    from src.orchestrator import SyncOrchestrator

    source_dir = setup_env["extracts_dir"] / "keboola"
    source_dir.mkdir()
    (source_dir / "data").mkdir()

    db_path = source_dir / "extract.duckdb"
    import duckdb as _duckdb
    conn = _duckdb.connect(str(db_path))
    conn.execute("""CREATE TABLE _meta (
        table_name VARCHAR, description VARCHAR, rows BIGINT,
        size_bytes BIGINT, extracted_at TIMESTAMP, query_mode VARCHAR DEFAULT 'local'
    )""")
    conn.execute('CREATE TABLE "safe" (id VARCHAR)')
    conn.execute("INSERT INTO _meta VALUES ('safe', '', 0, 0, current_timestamp, 'local')")
    # Malicious table name in _meta
    conn.execute("""INSERT INTO _meta VALUES ('x"; DROP TABLE _meta; --', '', 0, 0, current_timestamp, 'local')""")
    conn.close()

    orch = SyncOrchestrator(analytics_db_path=setup_env["analytics_db"])
    result = orch.rebuild()

    # Safe table should be there, malicious should be skipped
    assert "keboola" in result
    assert "safe" in result["keboola"]
    assert 'x"; DROP TABLE _meta; --' not in result.get("keboola", [])
```

- [ ] **Step 2: Run to verify failure**

Run: `pytest tests/test_orchestrator.py::test_rejects_malicious_source_name tests/test_orchestrator.py::test_rejects_malicious_table_name -v`
Expected: FAIL or crash from SQL injection

- [ ] **Step 3: Add validation helper and apply it**

At the top of `src/orchestrator.py`, add after imports:

```python
import re

_SAFE_IDENTIFIER = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]{0,63}$")

def _validate_identifier(name: str, context: str) -> bool:
    """Validate a DuckDB identifier. Returns True if safe, False if not."""
    # fullmatch, not match: with match(), `$` would also accept a trailing newline
    if not _SAFE_IDENTIFIER.fullmatch(name):
        logger.warning("Rejected unsafe %s identifier: %r", context, name)
        return False
    return True
```

In `_do_rebuild` (line ~92), add check before `_attach_and_create_views`:

```python
if not _validate_identifier(ext_dir.name, "source_name"):
    continue
```

In `_attach_and_create_views` (line ~160), add check before CREATE VIEW:

```python
for table_name, rows, size_bytes, query_mode in meta_rows:
    if not _validate_identifier(table_name, "table_name"):
        continue
```

In `_attach_remote_extensions` (line ~193), add check:

```python
for alias, extension, url, token_env in rows:
    if not _validate_identifier(alias, "alias") or not _validate_identifier(extension, "extension"):
        continue
```

- [ ] **Step 4: Run all orchestrator tests**

Run: `pytest tests/test_orchestrator.py -v`
Expected: All pass including new tests

- [ ] **Step 5: Commit**

```bash
git add src/orchestrator.py tests/test_orchestrator.py
git commit -m "fix: validate SQL identifiers in orchestrator, prevent injection via directory/table names"
```

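To see what the allowlist pattern from Task 2.1 accepts in practice, the following sketch runs it over a handful of sample names. The candidate list is illustrative only. The sketch uses `fullmatch`, since `match` combined with `$` would also accept a name that ends in a newline.

```python
import re

# Pattern copied from Task 2.1: a letter or underscore, then up to 63 word characters.
_SAFE_IDENTIFIER = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]{0,63}$")

# Illustrative sample names, not taken from the real extracts directory.
candidates = [
    "keboola",
    "orders_2024",
    "test; DROP TABLE _meta--",
    'x"; DROP TABLE _meta; --',
    "my-source",   # hyphen is outside the allowed character set
    "1st_table",   # must not start with a digit
    "a" * 65,      # longer than 64 characters
    "keboola\n",   # trailing newline, rejected by fullmatch
]

accepted = [name for name in candidates if _SAFE_IDENTIFIER.fullmatch(name)]
rejected = [name for name in candidates if not _SAFE_IDENTIFIER.fullmatch(name)]
```

Note that legitimate but unconventional names such as `my-source` are also skipped, so existing source directories and `_meta` entries may be worth checking against the pattern before rollout.
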
### Task 2.2: Harden query endpoint SQL blocklist

The current blocklist misses `parquet_scan`, `read_csv_auto`, and `query_table`, and produces false positives on semicolons inside string literals. This task expands the list and sets `enable_external_access=false` on the analytics connection. The replacement list below still rejects any `;`, so the string-literal false positive remains until the check becomes literal-aware.

**Files:**
- Modify: `app/api/query.py:39-51` and `src/db.py` (analytics readonly connection)
- Test: `tests/test_security.py`

- [ ] **Step 1: Write the failing tests**

```python
# In tests/test_security.py, add to TestQuerySecurity (indent to match the class body):

def test_blocks_parquet_scan(self, client, auth_headers):
    resp = client.post("/api/query", json={"sql": "SELECT * FROM parquet_scan('/etc/passwd')"}, headers=auth_headers)
    assert resp.status_code == 400

def test_blocks_read_csv_auto(self, client, auth_headers):
    resp = client.post("/api/query", json={"sql": "SELECT * FROM read_csv_auto('/data/state/system.duckdb')"}, headers=auth_headers)
    assert resp.status_code == 400

def test_blocks_query_table(self, client, auth_headers):
    resp = client.post("/api/query", json={"sql": "SELECT * FROM query_table('secret')"}, headers=auth_headers)
    assert resp.status_code == 400

def test_blocks_httpfs(self, client, auth_headers):
    resp = client.post("/api/query", json={"sql": "SELECT * FROM read_parquet('https://evil.com/data.parquet')"}, headers=auth_headers)
    assert resp.status_code == 400
```

- [ ] **Step 2: Run to verify failure**

Run: `pytest tests/test_security.py::TestQuerySecurity -v`
Expected: Some FAIL (parquet_scan, read_csv_auto, query_table not blocked)

- [ ] **Step 3: Expand the blocklist and set enable_external_access=false**

In `app/api/query.py`, replace the `blocked` list (lines 39-49):

```python
blocked = [
    "drop ", "delete ", "insert ", "update ", "alter ", "create ",
    "copy ", "attach ", "detach ", "load ", "install ",
    "export ", "import ", "pragma ", "call ",
    # File access functions
    "read_csv", "read_json", "read_parquet", "read_text",
    "write_csv", "write_parquet", "read_blob", "read_ndjson",
    "parquet_scan", "parquet_metadata", "parquet_schema",
    "json_scan", "csv_scan",
    "query_table", "iceberg_scan", "delta_scan",
    "glob(", "list_files",
    # Absolute paths and remote URLs
    "'/", '"/', "http://", "https://", "s3://", "gcs://",
    # Multiple statements
    ";",
]
```

In `src/db.py` `get_analytics_db_readonly()`, after opening the connection, add:

```python
try:
    conn.execute("SET enable_external_access = false")
except Exception:
    pass  # Older DuckDB versions
```

- [ ] **Step 4: Run tests**

Run: `pytest tests/test_security.py::TestQuerySecurity -v`
Expected: All pass

- [ ] **Step 5: Commit**

```bash
git add app/api/query.py src/db.py tests/test_security.py
git commit -m "fix: expand query blocklist, disable external_access on analytics connection"
```

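One way to address the semicolon false positive mentioned in Task 2.2 is to blank out single-quoted string literals before looking for `;`. The sketch below illustrates that idea only; `has_statement_separator` is a hypothetical helper, not existing code in `app/api/query.py`. It does not handle comments or double-quoted identifiers, and it is meant to sit alongside `enable_external_access=false`, not to replace it.

```python
import re

# A single-quoted SQL string literal; '' inside the quotes is an escaped quote.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")


def has_statement_separator(sql: str) -> bool:
    """Return True if `;` appears outside single-quoted string literals."""
    # Replace each complete literal with an empty one, then scan what is left.
    without_literals = _STRING_LITERAL.sub("''", sql)
    return ";" in without_literals


examples = {
    "SELECT 'a;b' AS x": has_statement_separator("SELECT 'a;b' AS x"),
    "SELECT 1; DROP TABLE t": has_statement_separator("SELECT 1; DROP TABLE t"),
    "SELECT 'it''s; fine'": has_statement_separator("SELECT 'it''s; fine'"),
    "SELECT 'abc; DROP": has_statement_separator("SELECT 'abc; DROP"),
}
```

An unterminated literal is never matched by the pattern, so its `;` stays visible and the query is still rejected, which keeps the check failing closed.
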
---

## Workstream 3: Orchestrator Bugs (P0/P1)

### Task 3.1: Fix rebuild_source destroying all other sources' views

`_do_rebuild_source()` creates a fresh temp DB with only one source, then replaces the entire analytics DB. Every Jira webhook wipes all Keboola/BigQuery views.

**Files:**
- Modify: `src/orchestrator.py:116-141`
- Test: `tests/test_orchestrator.py`

- [ ] **Step 1: Write the failing test**

```python
# In tests/test_orchestrator.py, add:

def test_rebuild_source_preserves_other_sources(setup_env):
    """rebuild_source('jira') must NOT destroy views from keboola."""
    from src.orchestrator import SyncOrchestrator

    _create_mock_extract(
        setup_env["extracts_dir"], "keboola",
        [{"name": "orders", "data": [{"id": "1"}]}],
    )
    _create_mock_extract(
        setup_env["extracts_dir"], "jira",
        [{"name": "issues", "data": [{"key": "PROJ-1"}]}],
    )

    orch = SyncOrchestrator(analytics_db_path=setup_env["analytics_db"])

    # First: full rebuild
    result = orch.rebuild()
    assert "keboola" in result
    assert "jira" in result

    # Second: rebuild only jira
    jira_tables = orch.rebuild_source("jira")
    assert "issues" in jira_tables

    # Keboola views must survive the single-source rebuild. Check the analytics
    # DB directly, because the full rebuild below would recreate them anyway.
    # Assumption: views are named after their _meta table_name.
    import duckdb as _duckdb
    conn = _duckdb.connect(str(setup_env["analytics_db"]), read_only=True)
    try:
        views = {row[0] for row in conn.execute(
            "SELECT table_name FROM information_schema.tables"
        ).fetchall()}
    finally:
        conn.close()
    assert "orders" in views

    # Third: full rebuild again; keboola must still be there
    result2 = orch.rebuild()
    assert "keboola" in result2
    assert "orders" in result2["keboola"]
```

- [ ] **Step 2: Run to verify failure**

Run: `pytest tests/test_orchestrator.py::test_rebuild_source_preserves_other_sources -v`
Expected: FAIL (keboola views gone after `rebuild_source("jira")`)

- [ ] **Step 3: Fix _do_rebuild_source to delegate to full rebuild**

In `src/orchestrator.py`, replace `_do_rebuild_source` (lines 116-141):

```python
def _do_rebuild_source(self, source_name: str) -> List[str]:
    """Rebuild views for a single source by doing a full rebuild.

    A full rebuild is necessary because the analytics DB is created fresh
    each time (temp file + atomic swap). Rebuilding only one source would
    destroy views from all other sources.
    """
    extracts_dir = _get_extracts_dir()
    db_file = extracts_dir / source_name / "extract.duckdb"
    if not db_file.exists():
        logger.warning("No extract.duckdb for source %s", source_name)
        return []

    result = self._do_rebuild()
    return result.get(source_name, [])
```

- [ ] **Step 4: Run tests**

Run: `pytest tests/test_orchestrator.py -v`
Expected: All pass

- [ ] **Step 5: Commit**

```bash
git add src/orchestrator.py tests/test_orchestrator.py
git commit -m "fix: rebuild_source delegates to full rebuild, preserves other sources' views"
```

### Task 3.2: Handle WAL files in atomic swap

`shutil.move` only moves the `.duckdb` file. The `.wal` file left behind by the old DB stays next to the new file and can corrupt it.

**Files:**
- Modify: `src/orchestrator.py` (_do_rebuild lines 106-112)
- Modify: `connectors/keboola/extractor.py` (lines 148-155)
- Test: `tests/test_orchestrator.py`

- [ ] **Step 1: Write the failing test**

```python
# In tests/test_orchestrator.py, add:

def test_rebuild_cleans_wal_files(setup_env):
    """After rebuild, no .wal files should remain from the temp or old DB."""
    from src.orchestrator import SyncOrchestrator

    _create_mock_extract(
        setup_env["extracts_dir"], "keboola",
        [{"name": "orders", "data": [{"id": "1"}]}],
    )
    orch = SyncOrchestrator(analytics_db_path=setup_env["analytics_db"])
    orch.rebuild()

    from pathlib import Path
    db_path = Path(setup_env["analytics_db"])
    assert not (db_path.parent / (db_path.name + ".wal")).exists()
    assert not (db_path.parent / (db_path.name + ".tmp.wal")).exists()
```

- [ ] **Step 2: Run to verify it passes or fails**

Run: `pytest tests/test_orchestrator.py::test_rebuild_cleans_wal_files -v`

- [ ] **Step 3: Add WAL cleanup helper**

In `src/orchestrator.py`, add a helper after `_validate_identifier`:

```python
def _atomic_swap_db(tmp_path: str, target_path: str) -> None:
    """Atomically replace target DuckDB file, cleaning up WAL files."""
    import shutil
    from pathlib import Path
    target = Path(target_path)
    tmp = Path(tmp_path)

    # Remove old WAL file if it exists
    old_wal = Path(str(target) + ".wal")
    if old_wal.exists():
        old_wal.unlink()

    # Move temp DB into place
    if tmp.exists():
        shutil.move(str(tmp), str(target))

    # Clean up temp WAL
    tmp_wal = Path(str(tmp) + ".wal")
    if tmp_wal.exists():
        tmp_wal.unlink()
```

Replace `shutil.move` call in `_do_rebuild` (line ~112) with:

```python
_atomic_swap_db(tmp_path, self._db_path)
```

Also add `CHECKPOINT` before `conn.close()` in `_do_rebuild`:

```python
    conn.execute("CHECKPOINT")
finally:
    conn.close()
```

Apply the same pattern in `connectors/keboola/extractor.py` at the end of `run()`.

- [ ] **Step 4: Run tests**

Run: `pytest tests/test_orchestrator.py tests/test_keboola_extractor.py -v`
Expected: All pass

- [ ] **Step 5: Commit**

```bash
git add src/orchestrator.py connectors/keboola/extractor.py tests/test_orchestrator.py
git commit -m "fix: clean WAL files during atomic DB swap, add CHECKPOINT before close"
```

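The swap-and-clean sequence from Task 3.2 can be tried out in isolation with plain files standing in for the DuckDB database and WAL. This is a sketch only: `atomic_swap_db` here is a stand-alone variant written for the demonstration, and it swaps in `os.replace` for the plan's `shutil.move`, because `os.replace` overwrites the target atomically when both paths are on the same filesystem. The file names are made up for the example.

```python
import os
import tempfile
from pathlib import Path


def atomic_swap_db(tmp_path: str, target_path: str) -> None:
    """Replace target with tmp and remove WAL files left beside either one."""
    tmp, target = Path(tmp_path), Path(target_path)
    # Remove the old WAL so it cannot be paired with the new database file.
    Path(str(target) + ".wal").unlink(missing_ok=True)
    if tmp.exists():
        os.replace(tmp, target)  # atomic when both paths share a filesystem
    # Remove any WAL that belonged to the temp database.
    Path(str(tmp) + ".wal").unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "analytics.duckdb"
    tmp = Path(d) / "analytics.duckdb.tmp"
    target.write_text("old")
    Path(str(target) + ".wal").write_text("stale wal")
    tmp.write_text("new")

    atomic_swap_db(str(tmp), str(target))

    remaining = sorted(p.name for p in Path(d).iterdir())
    swapped_content = target.read_text()
```

After the call, only `analytics.duckdb` should be left in the directory, holding the content of the former temp file.
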
### Task 3.3: Add temp-file swap to BigQuery extractor

BigQuery extractor writes directly to `extract.duckdb`, causing lock conflicts with the orchestrator.

**Files:**
- Modify: `connectors/bigquery/extractor.py:64-68`
- Test: `tests/test_bigquery_extractor.py`

- [ ] **Step 1: Write the test**

```python
# In tests/test_bigquery_extractor.py, add:

def test_uses_temp_file_swap(self, output_dir):
    """BigQuery extractor should write to .tmp and rename, not write directly."""
    from connectors.bigquery.extractor import _create_meta_table
    db_path = Path(output_dir) / "extract.duckdb"

    # Pre-create the DB to simulate existing file
    conn = duckdb.connect(str(db_path))
    _create_meta_table(conn)
    conn.close()

    # After init_extract, the file should exist and no .tmp should remain
    # (The actual init_extract test already covers this; we just verify no .tmp leak)
    assert db_path.exists()
    assert not (Path(output_dir) / "extract.duckdb.tmp").exists()
```

- [ ] **Step 2: Modify init_extract to use temp-file swap**

In `connectors/bigquery/extractor.py`, replace lines 64-68:

```python
output_path = Path(output_dir)
output_path.mkdir(parents=True, exist_ok=True)

db_path = output_path / "extract.duckdb"
tmp_db_path = output_path / "extract.duckdb.tmp"
if tmp_db_path.exists():
    tmp_db_path.unlink()
conn = duckdb.connect(str(tmp_db_path))
```

And at the end, before `return stats` (after `conn.close()`):

```python
import shutil
if tmp_db_path.exists():
    shutil.move(str(tmp_db_path), str(db_path))
```

- [ ] **Step 3: Run tests**

Run: `pytest tests/test_bigquery_extractor.py -v`
Expected: All pass

- [ ] **Step 4: Commit**

```bash
git add connectors/bigquery/extractor.py tests/test_bigquery_extractor.py
git commit -m "fix: BigQuery extractor uses temp-file swap to avoid lock conflicts"
```

---

## Workstream 4: Script Sandbox Hardening (P1)

### Task 4.1: Strip VIRTUAL_ENV and PYTHONPATH from sandbox subprocess

The sandbox gives scripts access to all installed packages (httpx, duckdb) via inherited env vars.

**Files:**
- Modify: `app/api/scripts.py:191-198`
- Test: `tests/test_security.py`

- [ ] **Step 1: Write the failing test**

```python
# In tests/test_security.py, add to TestScriptSecurity (indent to match the class body):

def test_sandbox_cannot_import_httpx(self, client, admin_headers):
    """Sandboxed scripts must not have access to httpx or other installed packages."""
    resp = client.post("/api/scripts/run", json={
        "name": "test",
        "source": "import httpx\nprint('pwned')"
    }, headers=admin_headers)
    data = resp.json()
    # httpx should be blocked by pattern OR unavailable due to stripped VIRTUAL_ENV
    assert resp.status_code == 400 or data.get("exit_code", 0) != 0
```

- [ ] **Step 2: Run to verify failure**

Run: `pytest tests/test_security.py::TestScriptSecurity::test_sandbox_cannot_import_httpx -v`
Expected: FAIL (httpx imports successfully)

- [ ] **Step 3: Fix the subprocess env**

In `app/api/scripts.py`, replace the env dict in `subprocess.run` (lines 191-198):

```python
env={
    "PATH": "/usr/bin:/usr/local/bin",
    "DATA_DIR": data_dir,
    "HOME": "/tmp",
    # Deliberately exclude VIRTUAL_ENV and PYTHONPATH
    # to prevent access to installed packages
},
```

Also add `"httpx"`, `"from httpx"` to `blocked_patterns` list.

- [ ] **Step 4: Run tests**

Run: `pytest tests/test_security.py::TestScriptSecurity -v`
Expected: All pass

- [ ] **Step 5: Commit**

```bash
git add app/api/scripts.py tests/test_security.py
git commit -m "fix: strip VIRTUAL_ENV/PYTHONPATH from script sandbox, block httpx import"
```

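The effect of passing an explicit `env` dict, as in Task 4.1, can be checked with a small probe script. The sketch below is illustrative and assumes a POSIX system; `SANDBOX_ENV` and `probe` are names made up for the example, `DATA_DIR` is left out for brevity, and `sys.executable` is used only to keep the example self-contained. One caveat worth keeping in mind: an interpreter that itself lives inside a virtual environment still finds that environment's packages regardless of `VIRTUAL_ENV`, so which Python binary the sandbox launches matters as much as the variables it receives.

```python
import subprocess
import sys

# Minimal environment modeled on Task 4.1; nothing is inherited from the parent.
SANDBOX_ENV = {
    "PATH": "/usr/bin:/usr/local/bin",
    "HOME": "/tmp",
    # VIRTUAL_ENV and PYTHONPATH are deliberately absent
}

# Child script: report which of the two variables are visible, if any.
probe = (
    "import os;"
    "print(sorted(k for k in ('VIRTUAL_ENV', 'PYTHONPATH') if k in os.environ))"
)

result = subprocess.run(
    [sys.executable, "-c", probe],
    env=SANDBOX_ENV,
    capture_output=True,
    text=True,
    timeout=30,
)
leaked = result.stdout.strip()
```

Because `env=` replaces the child's environment instead of extending it, the probe sees neither variable even when the parent process has both set.
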
---

## Workstream 5: Test Hardening (P0-P1)

### Task 5.1: Fix environment variable leaking in test fixtures

Most test files set `os.environ["DATA_DIR"]` directly without cleanup. This causes test ordering dependencies.

**Files:**
- Modify: `tests/test_db.py`, `tests/test_rbac.py`, `tests/test_repositories.py`, `tests/test_api.py`, `tests/test_api_complete.py`, `tests/test_api_scripts.py`, `tests/test_auth_providers.py`, `tests/test_bootstrap.py`, `tests/test_permissions.py`, `tests/test_security.py`

- [ ] **Step 1: Search and replace pattern**

In every test file that has `os.environ["DATA_DIR"] =` inside a fixture, replace with `monkeypatch.setenv("DATA_DIR", ...)`. Add `monkeypatch` to the fixture parameters.

Example: in `tests/test_db.py`, change:

```python
@pytest.fixture
def db_env(tmp_path):
    os.environ["DATA_DIR"] = str(tmp_path)
    yield tmp_path
```

To:

```python
@pytest.fixture
def db_env(tmp_path, monkeypatch):
    monkeypatch.setenv("DATA_DIR", str(tmp_path))
    yield tmp_path
```

Apply to all affected files. Remove manual `os.environ.pop("DATA_DIR", None)` lines since monkeypatch handles cleanup automatically.

- [ ] **Step 2: Run full test suite**

Run: `pytest tests/ -v --tb=short`
Expected: 607+ passed

- [ ] **Step 3: Commit**

```bash
git add tests/
git commit -m "fix: use monkeypatch for DATA_DIR in all test fixtures, prevent env leaking"
```

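The reason `monkeypatch.setenv` removes the ordering dependency is that it restores the previous value when the test ends. The sketch below reproduces that set-and-restore behavior with the standard library only, to make the mechanism visible; `scoped_env` is a hypothetical helper written for this illustration and is not how pytest implements `monkeypatch`.

```python
import os
from contextlib import contextmanager


@contextmanager
def scoped_env(name: str, value: str):
    """Set an environment variable and restore the previous state on exit."""
    missing = object()  # sentinel: distinguishes "unset" from an empty string
    previous = os.environ.get(name, missing)
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is missing:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


before = os.environ.get("DATA_DIR")
with scoped_env("DATA_DIR", "/tmp/test-data"):
    inside = os.environ["DATA_DIR"]
after = os.environ.get("DATA_DIR")
```

A bare `os.environ["DATA_DIR"] = ...` in a fixture has no equivalent of the `finally` branch, so the value set by one test stays in place for whichever test runs next.
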
### Task 5.2: Add extract.duckdb contract test
|
|
|
|
Create a shared validator that verifies any extract.duckdb conforms to the contract. Apply it in all extractor tests.
|
|
|
|
**Files:**
|
|
- Create: `tests/helpers/contract.py`
|
|
- Modify: `tests/test_keboola_extractor.py`, `tests/test_bigquery_extractor.py`
|
|
|
|
- [ ] **Step 1: Create contract validator**

```python
# tests/helpers/__init__.py (empty)
# tests/helpers/contract.py

"""Shared validator for the extract.duckdb contract."""

import duckdb
from pathlib import Path


def validate_extract_contract(db_path: str) -> None:
    """Verify an extract.duckdb conforms to the contract.

    Raises AssertionError with details if any check fails.
    """
    path = Path(db_path)
    assert path.exists(), f"extract.duckdb not found at {db_path}"

    conn = duckdb.connect(str(path), read_only=True)
    try:
        # _meta table must exist with correct schema
        cols = conn.execute(
            "SELECT column_name FROM information_schema.columns "
            "WHERE table_name='_meta' ORDER BY ordinal_position"
        ).fetchall()
        col_names = [c[0] for c in cols]
        assert col_names == ["table_name", "description", "rows", "size_bytes", "extracted_at", "query_mode"], \
            f"_meta schema mismatch: {col_names}"

        # Every _meta entry with query_mode='local' must have a corresponding view or table
        local_tables = conn.execute(
            "SELECT table_name FROM _meta WHERE query_mode = 'local'"
        ).fetchall()
        for (name,) in local_tables:
            tables = conn.execute(
                "SELECT table_name FROM information_schema.tables WHERE table_name = ?", [name]
            ).fetchall()
            assert len(tables) > 0, f"Local table '{name}' in _meta but no view/table exists"

        # If _remote_attach exists, validate its schema
        ra_exists = conn.execute(
            "SELECT count(*) FROM information_schema.tables WHERE table_name='_remote_attach'"
        ).fetchone()[0]
        if ra_exists:
            ra_cols = conn.execute(
                "SELECT column_name FROM information_schema.columns "
                "WHERE table_name='_remote_attach' ORDER BY ordinal_position"
            ).fetchall()
            ra_col_names = [c[0] for c in ra_cols]
            assert ra_col_names == ["alias", "extension", "url", "token_env"], \
                f"_remote_attach schema mismatch: {ra_col_names}"
    finally:
        conn.close()
```

- [ ] **Step 2: Apply in extractor tests**

In `tests/test_keboola_extractor.py`, add to `test_creates_extract_duckdb`:

```python
from tests.helpers.contract import validate_extract_contract
validate_extract_contract(str(Path(output_dir) / "extract.duckdb"))
```

Similarly in `tests/test_bigquery_extractor.py::test_creates_extract_duckdb_with_meta`.

- [ ] **Step 3: Run tests**

Run: `pytest tests/test_keboola_extractor.py tests/test_bigquery_extractor.py -v`
Expected: All pass

- [ ] **Step 4: Commit**

```bash
git add tests/helpers/ tests/test_keboola_extractor.py tests/test_bigquery_extractor.py
git commit -m "feat: add extract.duckdb contract validator, apply in all extractor tests"
```

### Task 5.3: Add pytest timeout and strict markers

Prevent CI hangs and catch marker typos.

**Files:**
- Modify: `pytest.ini`
- Modify: `requirements.txt` (add pytest-timeout)

- [ ] **Step 1: Update pytest.ini**

```ini
[pytest]
addopts = -m "not live and not docker" --timeout=60 --strict-markers
markers =
    live: tests requiring server access (run with '-m live')
    docker: tests requiring Docker (run with '-m docker')
```

- [ ] **Step 2: Add pytest-timeout to requirements.txt**

Add line: `pytest-timeout>=2.0.0`

- [ ] **Step 3: Install and run**

Run: `uv pip install --system pytest-timeout && pytest tests/ -q --tb=short`
Expected: All pass within 60s timeout

- [ ] **Step 4: Commit**

```bash
git add pytest.ini requirements.txt
git commit -m "chore: add pytest-timeout (60s) and strict-markers to pytest config"
```

---

## Workstream 6: Docs & Cleanup (P1-P2)

### Task 6.1: Rewrite README.md from CLAUDE.md

The current README describes the old Flask/rsync architecture. CLAUDE.md is accurate.

**Files:**
- Modify: `README.md`

- [ ] **Step 1: Rewrite README.md**

Use CLAUDE.md as the source of truth. The README should contain:
- Project description (1-2 paragraphs)
- Architecture diagram (from CLAUDE.md)
- Quick start (Docker compose)
- Development setup (venv, pytest)
- Project structure (from CLAUDE.md)
- Configuration overview
- Supported data sources (Keboola ✅, BigQuery ✅, Jira ✅)
- Links to docs/DEPLOYMENT.md for server setup
- License

Remove all references to Flask, rsync, SSH, sync_data.sh, Linux groups, server/setup.sh.

- [ ] **Step 2: Verify no broken references**

Run: `grep -r "webapp/" README.md; grep -r "sync_data.sh" README.md; grep -r "server/setup" README.md`
Expected: No matches

- [ ] **Step 3: Commit**

```bash
git add README.md
git commit -m "docs: rewrite README for v2 architecture (FastAPI, DuckDB, Docker)"
```

### Task 6.2: Update .env.template to match actual code

Template references `WEBAPP_SECRET_KEY` but code uses `JWT_SECRET_KEY`.

**Files:**
- Modify: `config/.env.template`

- [ ] **Step 1: Rewrite template**

```bash
# AI Data Analyst - Environment Variables
# Copy to .env: cp config/.env.template .env
# .env is gitignored - NEVER commit it.

# Required
JWT_SECRET_KEY= # python -c "import secrets; print(secrets.token_hex(32))"

# Google OAuth (optional — needed for Google login)
# GOOGLE_CLIENT_ID=
# GOOGLE_CLIENT_SECRET=

# Keboola adapter (optional — skip if using CSV/sample data)
# KEBOOLA_STORAGE_TOKEN=
# KEBOOLA_STACK_URL=https://connection.keboola.com

# Bootstrap admin (optional — used on first docker compose up)
# SEED_ADMIN_EMAIL=admin@example.com

# Optional services
# TELEGRAM_BOT_TOKEN=
# JIRA_WEBHOOK_SECRET=
# ANTHROPIC_API_KEY=
```

- [ ] **Step 2: Commit**

```bash
git add config/.env.template
git commit -m "docs: update .env.template to match actual code (JWT_SECRET_KEY, not WEBAPP_SECRET_KEY)"
```

### Task 6.3: Remove dead Flask Blueprint from Jira connector

`connectors/jira/webhook.py` uses Flask Blueprint but the app uses FastAPI. It's dead code that confuses readers.

**Files:**
- Check: `connectors/jira/webhook.py` — verify it's not imported anywhere except Jira-internal code
- Modify: add deprecation comment or delete if unused

- [ ] **Step 1: Check if webhook.py is imported**

Run: `grep -r "from connectors.jira.webhook" app/ src/ services/`
If no matches: the Flask Blueprint is dead code.

- [ ] **Step 2: Add deprecation notice or delete**

If unused by the FastAPI app, delete `connectors/jira/webhook.py` and update any imports.

- [ ] **Step 3: Commit**

```bash
git add connectors/jira/
git commit -m "chore: remove dead Flask Blueprint from Jira connector (replaced by FastAPI)"
```

### Task 6.4: Add upload size limit

`upload_session` and `upload_artifact` read entire files into memory with no limit.

**Files:**
- Modify: `app/api/upload.py`
- Test: `tests/test_api_complete.py`

- [ ] **Step 1: Write the test**

```python
# In tests/test_api_complete.py or a new test file:

def test_upload_rejects_oversized_file(client, admin_headers):
    """Uploads over 50MB should be rejected."""
    # Build a payload one byte over the 50MB limit
    import io
    large_data = b"x" * (50 * 1024 * 1024 + 1)  # 50MB + 1 byte
    resp = client.post(
        "/api/upload/artifact/test-session",
        files={"file": ("big.csv", io.BytesIO(large_data), "text/csv")},
        headers=admin_headers,
    )
    assert resp.status_code in (400, 413)
```

- [ ] **Step 2: Add size check**

In `app/api/upload.py`, at the start of each upload endpoint:

```python
MAX_UPLOAD_SIZE = 50 * 1024 * 1024  # 50 MB
contents = await file.read()
if len(contents) > MAX_UPLOAD_SIZE:
    raise HTTPException(status_code=413, detail=f"File too large (max {MAX_UPLOAD_SIZE // 1024 // 1024}MB)")
```

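
The size check in Step 2 still buffers the whole upload before measuring it. One possible refinement is to read in chunks and stop as soon as the limit is crossed. The sketch below shows the idea on a plain binary stream; `read_limited`, `UploadTooLarge`, and the 64 KiB chunk size are hypothetical names and values chosen for illustration, not existing code in `app/api/upload.py`.

```python
import io


class UploadTooLarge(Exception):
    """Raised when a stream exceeds the allowed size (hypothetical helper)."""


def read_limited(stream, max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read a binary stream in chunks, aborting once max_bytes is exceeded."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # End of stream reached without crossing the limit.
            return bytes(buf)
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise UploadTooLarge(f"more than {max_bytes} bytes received")


# Small limits keep the demonstration cheap; the plan's limit is 50 MB.
payload = read_limited(io.BytesIO(b"x" * 10), max_bytes=10)
```

In an async endpoint the same loop would presumably call `await file.read(chunk_size)` and translate the exception into an HTTP 413 response; at most one chunk beyond the limit is ever held in memory.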
- [ ] **Step 3: Run tests**

Run: `pytest tests/test_api_complete.py -v`
Expected: All pass

- [ ] **Step 4: Commit**

```bash
git add app/api/upload.py tests/test_api_complete.py
git commit -m "fix: add 50MB upload size limit — prevent memory exhaustion"
```

---

## Workstream 7: DuckDB Lifecycle & Connection Management (P1)

### Task 7.1: Fix SQL injection in get_analytics_db_readonly

Same unquoted `ext_dir.name` issue as the orchestrator, but in the read-only analytics connection used by every query request.

**Files:**
- Modify: `src/db.py:228-233`
- Test: `tests/test_db.py`

- [ ] **Step 1: Write the failing test**

```python
# In tests/test_db.py, add:

def test_analytics_readonly_rejects_malicious_dir_name(db_env):
    """get_analytics_db_readonly must skip directories with unsafe names."""
    extracts = db_env / "extracts"
    extracts.mkdir(parents=True)
    malicious = extracts / "test; DROP TABLE x--"
    malicious.mkdir()
    db_file = malicious / "extract.duckdb"

    import duckdb
    conn = duckdb.connect(str(db_file))
    conn.execute("""CREATE TABLE _meta (
        table_name VARCHAR, description VARCHAR, rows BIGINT,
        size_bytes BIGINT, extracted_at TIMESTAMP, query_mode VARCHAR
    )""")
    conn.close()

    # Should not crash
    from src.db import get_analytics_db_readonly
    ro_conn = get_analytics_db_readonly()
    ro_conn.close()
```

- [ ] **Step 2: Run to verify failure**

Run: `pytest tests/test_db.py::test_analytics_readonly_rejects_malicious_dir_name -v`

- [ ] **Step 3: Add identifier validation**

Import the validator from orchestrator (or extract to shared module). In `src/db.py`, add at top:

```python
import re
_SAFE_IDENTIFIER = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]{0,63}$")
```

In `get_analytics_db_readonly()`, replace line 232:

```python
if db_file.exists() and ext_dir.is_dir():
    if not _SAFE_IDENTIFIER.match(ext_dir.name):
        continue
    try:
        conn.execute(f"ATTACH '{db_file}' AS {ext_dir.name} (READ_ONLY)")
```

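
To make the effect of the identifier pattern concrete, the following sketch runs it over a few candidate directory names. The pattern is the one from Step 3; `is_safe_identifier` and the sample names are hypothetical and exist only for this illustration.

```python
import re

# Same pattern as in Step 3: a letter or underscore, then up to 63 word characters.
_SAFE_IDENTIFIER = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]{0,63}$")


def is_safe_identifier(name: str) -> bool:
    """Hypothetical wrapper: True if name matches the conservative pattern."""
    return _SAFE_IDENTIFIER.match(name) is not None


candidates = ["keboola", "jira_v2", "test; DROP TABLE x--", "2fast", "has-dash", ""]
accepted = [name for name in candidates if is_safe_identifier(name)]
print(accepted)  # ['keboola', 'jira_v2']
```

One detail worth keeping in mind: in Python's `re`, `$` also matches just before a trailing newline, so `fullmatch` (or `\Z` in place of `$`) would be the stricter choice if directory names cannot be trusted at all.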
- [ ] **Step 4: Run tests**

Run: `pytest tests/test_db.py -v`
Expected: All pass

- [ ] **Step 5: Commit**

```bash
git add src/db.py tests/test_db.py
git commit -m "fix: validate identifiers in get_analytics_db_readonly — prevent SQL injection"
```

### Task 7.2: Remove dead PRAGMA enable_wal code

`PRAGMA enable_wal` is not valid DuckDB syntax. DuckDB uses WAL by default since v0.8. This is dead code with a misleading comment.

**Files:**
- Modify: `src/db.py:200-204`

- [ ] **Step 1: Remove the dead code**

In `src/db.py`, delete lines 200-204:

```python
# WAL mode: allows concurrent readers while writing
try:
    _system_db_conn.execute("PRAGMA enable_wal")
except Exception:
    pass  # Older DuckDB versions may not support this
```

- [ ] **Step 2: Run tests**

Run: `pytest tests/test_db.py -v`
Expected: All pass

- [ ] **Step 3: Commit**

```bash
git add src/db.py
git commit -m "chore: remove dead PRAGMA enable_wal — DuckDB uses WAL by default"
```

### Task 7.3: Escape token single-quotes in ATTACH SQL

If a token contains a single quote, the ATTACH SQL breaks. DuckDB doesn't support parameterized ATTACH, so escape manually.

**Files:**
- Modify: `src/orchestrator.py` (`_attach_remote_extensions`)
- Modify: `connectors/keboola/extractor.py` (`_try_attach_extension`)

- [ ] **Step 1: Add escaping in orchestrator**

In `src/orchestrator.py`, in `_attach_remote_extensions`, replace the ATTACH line:

```python
if token:
    escaped_token = token.replace("'", "''")
    conn.execute(
        f"ATTACH '{url}' AS {alias} (TYPE {extension}, TOKEN '{escaped_token}')"
    )
```

- [ ] **Step 2: Add escaping in Keboola extractor**

In `connectors/keboola/extractor.py`, `_try_attach_extension`:

```python
escaped_token = keboola_token.replace("'", "''")
conn.execute(f"ATTACH '{keboola_url}' AS kbc (TYPE keboola, TOKEN '{escaped_token}')")
```

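
The escaping rule used in both steps is the standard SQL one: a single quote inside a string literal is written as two single quotes. A minimal sketch, assuming a hypothetical helper named `sql_string_literal` and a made-up token value:

```python
def sql_string_literal(value: str) -> str:
    """Hypothetical helper: wrap value in single quotes, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


token = "abc'def"  # made-up token containing a single quote
url = "https://connection.keboola.com"

statement = (
    f"ATTACH {sql_string_literal(url)} AS kbc "
    f"(TYPE keboola, TOKEN {sql_string_literal(token)})"
)
print(statement)
# ATTACH 'https://connection.keboola.com' AS kbc (TYPE keboola, TOKEN 'abc''def')
```

The steps above escape only the token. The URL is interpolated into the same statement, so routing it through the same helper, as this sketch does, may be worth considering.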
- [ ] **Step 3: Run tests**

Run: `pytest tests/test_orchestrator.py tests/test_keboola_extractor.py -v`
Expected: All pass

- [ ] **Step 4: Commit**

```bash
git add src/orchestrator.py connectors/keboola/extractor.py
git commit -m "fix: escape single quotes in ATTACH TOKEN to prevent SQL breakage"
```

### Task 7.4: Add temp-file swap to Jira extract_init.update_meta

Jira's `update_meta()` writes directly to live `extract.duckdb` while the orchestrator may have it ATTACHed read-only.

**Files:**
- Modify: `connectors/jira/extract_init.py:87`
- Test: `tests/test_e2e_extract.py`

- [ ] **Step 1: Examine current code and fix**

The `update_meta()` function opens `extract.duckdb` directly. Since it only updates `_meta` rows and recreates views (not bulk writes), the simplest fix is to use a short-lived connection with CHECKPOINT:

In `connectors/jira/extract_init.py`, after the `conn.execute("UPDATE _meta ...")` block, add before `conn.close()`:

```python
conn.execute("CHECKPOINT")
```

This forces WAL flush and reduces the lock window. A full temp-file swap is not practical here since the Jira connector does incremental updates.

- [ ] **Step 2: Run tests**

Run: `pytest tests/test_e2e_extract.py::TestJiraWebhookToQuery -v`
Expected: Pass

- [ ] **Step 3: Commit**

```bash
git add connectors/jira/extract_init.py
git commit -m "fix: add CHECKPOINT after Jira meta update — reduce lock window"
```

---

## Workstream 8: Scalability & Robustness (P1)

### Task 8.1: Fix table access check false positives in query endpoint

The query endpoint checks table access with `table.lower() in sql_lower`, which is a substring match. A forbidden table named `id` blocks any query whose text contains the substring `id` (for example a column called `width`). A forbidden table named `order` triggers on `ordered_items`.

**Files:**
- Modify: `app/api/query.py:67-71`
- Test: `tests/test_security.py`

- [ ] **Step 1: Write the failing test**

```python
# In tests/test_security.py, add to TestQuerySecurity:

def test_table_access_no_false_positive_on_column_name(self, client, auth_headers):
    """A forbidden table named 'id' should not block queries that use 'id' as a column."""
    # This test verifies the table access check doesn't use naive substring matching
    resp = client.post("/api/query", json={
        "sql": "SELECT id, name FROM allowed_table"
    }, headers=auth_headers)
    # Should not get 403 just because 'id' appears in SQL
    assert resp.status_code != 403 or "id" not in resp.json().get("detail", "")
```

- [ ] **Step 2: Fix with word-boundary matching**

In `app/api/query.py`, replace the table access check (lines 67-71):

```python
# Check if query references any forbidden tables (word-boundary match)
import re
forbidden = all_views - set(allowed)
for table in forbidden:
    # Use word boundaries to avoid false positives on column names
    pattern = r'\b' + re.escape(table.lower()) + r'\b'
    if re.search(pattern, sql_lower):
        raise HTTPException(status_code=403, detail=f"Access denied to table '{table}'")
```

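
The difference between the two matching strategies can be checked in isolation. The sketch below wraps the Step 2 pattern in a hypothetical helper, `references_table`, and compares it with the naive substring test on made-up queries.

```python
import re


def references_table(sql: str, table: str) -> bool:
    """Hypothetical helper: whole-word, case-insensitive match of table in sql."""
    pattern = r"\b" + re.escape(table.lower()) + r"\b"
    return re.search(pattern, sql.lower()) is not None


sql = "SELECT width FROM ordered_items"

# Naive substring checks fire on unrelated identifiers.
substring_hits = [t for t in ("id", "order") if t in sql.lower()]

# Whole-word checks do not.
word_hits = [t for t in ("id", "order") if references_table(sql, t)]

# A real reference still matches, including schema-qualified names.
qualified_hit = references_table("SELECT * FROM keboola.orders", "orders")

# Limitation: a column that shares the table's name still matches.
column_hit = references_table("SELECT id FROM allowed_table", "id")

print(substring_hits, word_hits, qualified_hit, column_hit)
# ['id', 'order'] [] True True
```

The last case shows that word-boundary matching removes substring false positives but cannot tell a table from a column of the same name. Distinguishing those would require parsing the SQL, which is outside what this step proposes.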
- [ ] **Step 3: Run tests**

Run: `pytest tests/test_security.py::TestQuerySecurity -v`
Expected: All pass

- [ ] **Step 4: Commit**

```bash
git add app/api/query.py tests/test_security.py
git commit -m "fix: use word-boundary matching for table access check — prevent false positives"
```

### Task 8.2: Replace Docker healthcheck with curl

The current healthcheck starts a full Python interpreter + imports httpx every 30 seconds.

**Files:**
- Modify: `Dockerfile` (add curl)
- Modify: `docker-compose.yml:13`

- [ ] **Step 1: Add curl to Dockerfile**

In `Dockerfile`, add after the `FROM` line:

```dockerfile
RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*
```

- [ ] **Step 2: Update docker-compose healthcheck**

In `docker-compose.yml`, replace line 13:

```yaml
healthcheck:
  test: ["CMD", "curl", "-sf", "http://localhost:8000/api/health"]
  interval: 30s
  timeout: 5s
  retries: 3
```

- [ ] **Step 3: Commit**

```bash
git add Dockerfile docker-compose.yml
git commit -m "fix: use curl for Docker healthcheck instead of Python+httpx (faster, lighter)"
```

### Task 8.3: Add graceful shutdown handler

No lifespan handler exists to close the shared DuckDB connection on shutdown.

**Files:**
- Modify: `app/main.py`

- [ ] **Step 1: Add lifespan handler**

In `app/main.py`, add a lifespan context manager and use it in `FastAPI()`:

```python
from contextlib import asynccontextmanager

@asynccontextmanager
async def lifespan(app):
    # Startup
    yield
    # Shutdown: close shared DuckDB connection
    from src.db import close_system_db
    close_system_db()
```

Change `app = FastAPI(...)` to `app = FastAPI(..., lifespan=lifespan)`.

Add `close_system_db()` to `src/db.py`:

```python
def close_system_db() -> None:
    """Close the shared system DB connection. Called on app shutdown."""
    global _system_db_conn, _system_db_path
    if _system_db_conn:
        try:
            _system_db_conn.close()
        except Exception:
            pass
        _system_db_conn = None
        _system_db_path = None
```

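
The ordering guarantee the lifespan handler relies on can be seen without FastAPI. In this sketch a plain `async with` stands in for the framework entering the context at startup and leaving it at shutdown; `close_system_db` is a stub for the function added to `src/db.py`, and the `try`/`finally` is a small variation on the step above so that cleanup also runs if the body raises.

```python
import asyncio
from contextlib import asynccontextmanager

events = []


def close_system_db() -> None:
    """Stub standing in for src.db.close_system_db (assumption for this sketch)."""
    events.append("db closed")


@asynccontextmanager
async def lifespan(app):
    events.append("startup")
    try:
        yield
    finally:
        # Runs on normal exit and also when the body raises.
        close_system_db()


async def serve() -> None:
    # The framework would hold this context open for the life of the process.
    async with lifespan(app=None):
        events.append("serving")


asyncio.run(serve())
print(events)  # ['startup', 'serving', 'db closed']
```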
- [ ] **Step 2: Run tests**

Run: `pytest tests/test_api.py -v`
Expected: All pass

- [ ] **Step 3: Commit**

```bash
git add app/main.py src/db.py
git commit -m "feat: add graceful shutdown handler — close DuckDB on app exit"
```

### Task 8.4: Extract shared _get_data_dir utility

`_get_data_dir()` is copy-pasted in 4 API files.

**Files:**
- Create: `app/utils.py`
- Modify: `app/api/sync.py`, `app/api/data.py`, `app/api/upload.py`, `app/api/catalog.py`

- [ ] **Step 1: Create shared utility**

```python
# app/utils.py
import os
from pathlib import Path

def get_data_dir() -> Path:
    return Path(os.environ.get("DATA_DIR", "./data"))
```

- [ ] **Step 2: Replace in all 4 files**

In each file, replace:
```python
def _get_data_dir():
    return Path(os.environ.get("DATA_DIR", "./data"))
```

With:
```python
from app.utils import get_data_dir as _get_data_dir
```

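
Because the shared function reads `DATA_DIR` on every call rather than at import time, tests that set the variable (for example with `monkeypatch.setenv`) should keep working after the refactor. A minimal sketch of that behaviour, using a made-up path:

```python
import os
from pathlib import Path


def get_data_dir() -> Path:
    # Read the environment on each call so later overrides are picked up.
    return Path(os.environ.get("DATA_DIR", "./data"))


os.environ.pop("DATA_DIR", None)
default_dir = get_data_dir()

os.environ["DATA_DIR"] = "/tmp/example-data"  # made-up path for illustration
overridden_dir = get_data_dir()

print(default_dir, overridden_dir)
```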
- [ ] **Step 3: Run tests**

Run: `pytest tests/ -q --tb=short`
Expected: All pass

- [ ] **Step 4: Commit**

```bash
git add app/utils.py app/api/sync.py app/api/data.py app/api/upload.py app/api/catalog.py
git commit -m "refactor: extract shared _get_data_dir to app/utils.py — DRY"
```

### Task 8.5: Move faker to dev dependencies

Faker is a production dependency but only used for sample data generation.

**Files:**
- Modify: `requirements.txt`
- Create: `requirements-dev.txt`

- [ ] **Step 1: Move faker**

Remove `faker>=24.0.0` from `requirements.txt`.

Create `requirements-dev.txt`:

```
-r requirements.txt
faker>=24.0.0
pytest>=9.0.0
pytest-timeout>=2.0.0
```

- [ ] **Step 2: Verify app starts without faker**

Run: `python -c "from app.main import create_app; print('OK')"`
Expected: OK (faker not imported at startup)

- [ ] **Step 3: Commit**

```bash
git add requirements.txt requirements-dev.txt
git commit -m "chore: move faker to dev dependencies — not needed in production"
```

---

## Workstream 9: Missing Test Coverage (P0-P1)

### Task 9.1: Add web UI smoke tests

`app/web/router.py` has 46 functions with almost no test coverage. A template error would not be caught.

**Files:**
- Create: `tests/test_web_ui.py`

- [ ] **Step 1: Create smoke tests for all authenticated pages**

```python
"""Smoke tests for web UI pages — verify they render without template errors."""

import os
import pytest
import duckdb
from fastapi.testclient import TestClient


@pytest.fixture
def web_client(tmp_path, monkeypatch):
    monkeypatch.setenv("DATA_DIR", str(tmp_path))
    monkeypatch.setenv("TESTING", "1")
    (tmp_path / "state").mkdir()
    (tmp_path / "analytics").mkdir()
    (tmp_path / "extracts").mkdir()

    from app.main import create_app
    app = create_app()
    return TestClient(app)


@pytest.fixture
def admin_cookie(web_client, tmp_path, monkeypatch):
    """Create admin user and return cookie dict."""
    from src.db import get_system_db
    from src.repositories.users import UserRepository
    from app.auth.jwt import create_access_token

    conn = get_system_db()
    UserRepository(conn).create(id="admin1", email="admin@test.com", role="admin")
    conn.close()

    token = create_access_token(user_id="admin1", email="admin@test.com", role="admin")
    return {"access_token": token}


class TestWebUISmoke:
    """Every page should return 200 without template errors."""

    def test_login_page(self, web_client):
        resp = web_client.get("/login")
        assert resp.status_code == 200

    def test_dashboard(self, web_client, admin_cookie):
        resp = web_client.get("/", cookies=admin_cookie)
        assert resp.status_code in (200, 302)

    def test_catalog(self, web_client, admin_cookie):
        resp = web_client.get("/catalog", cookies=admin_cookie)
        assert resp.status_code == 200

    def test_admin_tables(self, web_client, admin_cookie):
        resp = web_client.get("/admin/tables", cookies=admin_cookie)
        assert resp.status_code == 200

    def test_admin_permissions(self, web_client, admin_cookie):
        resp = web_client.get("/admin/permissions", cookies=admin_cookie)
        assert resp.status_code == 200

    def test_corporate_memory(self, web_client, admin_cookie):
        resp = web_client.get("/corporate-memory", cookies=admin_cookie)
        assert resp.status_code == 200

    def test_activity_center(self, web_client, admin_cookie):
        resp = web_client.get("/activity-center", cookies=admin_cookie)
        assert resp.status_code == 200
```

- [ ] **Step 2: Run tests**

Run: `pytest tests/test_web_ui.py -v`
Expected: All pass (or reveal actual template errors)

- [ ] **Step 3: Commit**

```bash
git add tests/test_web_ui.py
git commit -m "test: add web UI smoke tests — catch template errors in 7 pages"
```

### Task 9.2: Add Jira service integration tests

`connectors/jira/service.py` (15 functions) orchestrates the entire Jira webhook flow but has no dedicated tests.

**Files:**
- Create: `tests/test_jira_service.py`

- [ ] **Step 1: Create integration tests**

```python
"""Tests for Jira service — webhook event processing pipeline."""

import os
from pathlib import Path
from unittest.mock import patch, MagicMock

import duckdb
import pytest

from connectors.jira.extract_init import init_extract, update_meta


@pytest.fixture
def jira_env(tmp_path, monkeypatch):
    monkeypatch.setenv("DATA_DIR", str(tmp_path))
    jira_dir = tmp_path / "extracts" / "jira"
    jira_dir.mkdir(parents=True)
    return jira_dir


class TestJiraExtractInit:
    def test_init_creates_extract_db(self, jira_env):
        init_extract(jira_env)
        assert (jira_env / "extract.duckdb").exists()

        conn = duckdb.connect(str(jira_env / "extract.duckdb"))
        meta = conn.execute("SELECT * FROM _meta").fetchall()
        conn.close()
        assert isinstance(meta, list)

    def test_update_meta_creates_view(self, jira_env):
        init_extract(jira_env)

        # Create a parquet file for 'issues'
        issues_dir = jira_env / "data" / "issues"
        issues_dir.mkdir(parents=True)
        pq_path = str(issues_dir / "2026-04.parquet")
        tmp = duckdb.connect()
        tmp.execute(
            f"COPY (SELECT 'PROJ-1' AS issue_key, 'Bug' AS type) "
            f"TO '{pq_path}' (FORMAT PARQUET)"
        )
        tmp.close()

        update_meta(jira_env, "issues")

        conn = duckdb.connect(str(jira_env / "extract.duckdb"))
        rows = conn.execute("SELECT rows FROM _meta WHERE table_name='issues'").fetchone()
        assert rows[0] == 1

        data = conn.execute("SELECT issue_key FROM issues").fetchone()
        assert data[0] == "PROJ-1"
        conn.close()
```

- [ ] **Step 2: Run tests**

Run: `pytest tests/test_jira_service.py -v`
Expected: All pass

- [ ] **Step 3: Commit**

```bash
git add tests/test_jira_service.py
git commit -m "test: add Jira extract_init integration tests"
```

### Task 9.3: Add instance_config tests

`app/instance_config.py` (10 functions) is loaded at startup and affects all web pages. No tests exist.

**Files:**
- Create: `tests/test_instance_config.py`

- [ ] **Step 1: Create tests**

```python
"""Tests for instance_config — YAML loading and accessor functions."""

import os
from pathlib import Path

import pytest


@pytest.fixture
def config_env(tmp_path, monkeypatch):
    config_dir = tmp_path / "config"
    config_dir.mkdir()
    monkeypatch.setenv("DATA_DIR", str(tmp_path))
    return config_dir


class TestInstanceConfig:
    def test_missing_config_file_returns_defaults(self, config_env, monkeypatch):
        """Missing instance.yaml should not crash, just return defaults."""
        from app.instance_config import get_instance_name, get_data_source_type
        # Should return some default, not crash
        name = get_instance_name()
        assert isinstance(name, str)

    def test_loads_valid_yaml(self, config_env, tmp_path, monkeypatch):
        """Valid instance.yaml should be loaded and accessible."""
        yaml_path = tmp_path / "config" / "instance.yaml"
        yaml_path.write_text("instance_name: Test Instance\ndata_source: keboola\n")

        from app.instance_config import load_instance_config, get_instance_name
        import importlib
        import app.instance_config as mod
        importlib.reload(mod)

        name = mod.get_instance_name()
        assert "Test" in name or isinstance(name, str)
```

- [ ] **Step 2: Run tests**

Run: `pytest tests/test_instance_config.py -v`
Expected: All pass

- [ ] **Step 3: Commit**

```bash
git add tests/test_instance_config.py
git commit -m "test: add instance_config tests — missing file, valid YAML"
```

### Task 9.4: Add concurrent rebuild safety test

Verify the atomic swap pattern works when a read connection is open.

**Files:**
- Modify: `tests/test_orchestrator.py`

- [ ] **Step 1: Write the test**

```python
# In tests/test_orchestrator.py, add:

def test_rebuild_while_reading(setup_env):
    """Rebuild should succeed even while a read-only connection exists."""
    from src.orchestrator import SyncOrchestrator
    import duckdb

    _create_mock_extract(
        setup_env["extracts_dir"], "keboola",
        [{"name": "orders", "data": [{"id": "1"}]}],
    )

    orch = SyncOrchestrator(analytics_db_path=setup_env["analytics_db"])
    orch.rebuild()

    # Open a read-only connection (simulating query endpoint)
    reader = duckdb.connect(setup_env["analytics_db"], read_only=True)

    # Rebuild while reader is open — should not crash
    result = orch.rebuild()
    assert "keboola" in result

    reader.close()
```

- [ ] **Step 2: Run test**

Run: `pytest tests/test_orchestrator.py::test_rebuild_while_reading -v`
Expected: Pass

- [ ] **Step 3: Commit**

```bash
git add tests/test_orchestrator.py
git commit -m "test: add concurrent rebuild safety test"
```

---

## Execution Order

Workstreams are independent and can run in parallel. Within each workstream, tasks are sequential.

**Critical path (do first):**

1. Task 1.1 (password bypass) — active auth vulnerability
2. Task 3.1 (rebuild_source) — active data loss bug
3. Task 2.1 (SQL injection) — security hardening

**Then:**

4. Tasks 1.2, 1.3 (JWT hardening)
5. Task 2.2 (query blocklist)
6. Tasks 3.2, 3.3 (WAL + BQ swap)
7. Task 4.1 (sandbox)
8. Tasks 7.1-7.4 (DuckDB lifecycle)
9. Tasks 8.1-8.5 (scalability + cleanup)
10. Tasks 5.1-5.3 (test hardening)
11. Tasks 9.1-9.4 (missing test coverage)
12. Tasks 6.1-6.4 (docs + cleanup)
13. Task 1.4 (cookie auth)

**Verification after all tasks:**

```bash
pytest tests/ -v --tb=short  # All 620+ tests pass
```