feat: multi-instance deployment — all 14 must-have items from spec

- CalVer CI (release.yml) with stable/dev channels
- health endpoint with version/channel/schema_version
- JWT secret auto-generation with file persistence
- smoke test script (scripts/smoke-test.sh)
- Docker-in-CI smoke test job
- pre-migration snapshot
- /api/admin/configure for headless setup
- /api/admin/discover-and-register
- /setup wizard
- OpenAPI snapshot test
- custom connector mount support
- CHANGELOG
- migration safety tests
- startup banner

663 tests pass (6 new migration safety + 3 OpenAPI snapshot + 1
updated JWT test).
ZdenekSrotyr 2026-04-10 11:57:42 +02:00
parent cce179f114
commit 6c53082295
23 changed files with 6394 additions and 51 deletions

@@ -1,4 +1,7 @@
-name: Build & Push
+# SUPERSEDED by release.yml — CalVer tagging with stable/dev channels.
+# This workflow is kept for backward compatibility but only runs tests.
+# Image build and push is handled by release.yml.
+name: Build & Push (legacy)
 on:
   push:
@@ -24,27 +27,3 @@ jobs:
         run: pytest tests/ -v --tb=short
         env:
           TESTING: "1"
-  build-and-push:
-    needs: test
-    runs-on: ubuntu-latest
-    permissions:
-      packages: write
-      contents: read
-    steps:
-      - uses: actions/checkout@v5
-      - name: Log in to GHCR
-        uses: docker/login-action@v4
-        with:
-          registry: ghcr.io
-          username: ${{ github.actor }}
-          password: ${{ secrets.GITHUB_TOKEN }}
-      - name: Build and push
-        uses: docker/build-push-action@v7
-        with:
-          push: true
-          tags: |
-            ghcr.io/${{ github.repository }}:latest
-            ghcr.io/${{ github.repository }}:${{ github.sha }}

.github/workflows/release.yml (new file)

@@ -0,0 +1,130 @@
name: Release

on:
  push:
    branches: [main, "feature/**"]

permissions:
  contents: write
  packages: write

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - uses: actions/setup-python@v6
        with:
          python-version: "3.13"
      - name: Install uv
        uses: astral-sh/setup-uv@v7
      - name: Install dependencies
        run: uv pip install --system ".[dev]"
      - name: Run tests
        run: pytest tests/ -v --tb=short
        env:
          TESTING: "1"

  build-and-push:
    needs: test
    runs-on: ubuntu-latest
    outputs:
      image_tag: ${{ steps.meta.outputs.versioned_tag }}
      version: ${{ steps.meta.outputs.version }}
      channel: ${{ steps.meta.outputs.channel }}
    steps:
      - uses: actions/checkout@v5
        with:
          fetch-depth: 0
          fetch-tags: true
      - name: Determine channel and version
        id: meta
        run: |
          YEAR_MONTH=$(date +%Y.%m)
          if [[ "${{ github.ref }}" == "refs/heads/main" ]]; then
            CHANNEL="stable"
          else
            CHANNEL="dev"
          fi
          # Count existing tags for this channel+month to get next N
          EXISTING=$(git tag -l "${CHANNEL}-${YEAR_MONTH}.*" | wc -l | tr -d ' ')
          N=$((EXISTING + 1))
          VERSION="${YEAR_MONTH}.${N}"
          SHORT_SHA=$(echo "${{ github.sha }}" | cut -c1-7)
          echo "channel=${CHANNEL}" >> "$GITHUB_OUTPUT"
          echo "version=${VERSION}" >> "$GITHUB_OUTPUT"
          echo "versioned_tag=${CHANNEL}-${VERSION}" >> "$GITHUB_OUTPUT"
          echo "short_sha=${SHORT_SHA}" >> "$GITHUB_OUTPUT"
          echo "Channel: ${CHANNEL}"
          echo "Version: ${VERSION}"
          echo "Versioned tag: ${CHANNEL}-${VERSION}"
      - name: Log in to GHCR
        uses: docker/login-action@v4
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push
        uses: docker/build-push-action@v7
        with:
          push: true
          build-args: |
            AGNES_VERSION=${{ steps.meta.outputs.version }}
            RELEASE_CHANNEL=${{ steps.meta.outputs.channel }}
          tags: |
            ghcr.io/${{ github.repository }}:${{ steps.meta.outputs.channel }}
            ghcr.io/${{ github.repository }}:${{ steps.meta.outputs.versioned_tag }}
            ghcr.io/${{ github.repository }}:sha-${{ steps.meta.outputs.short_sha }}
      - name: Create git tag
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          TAG="${{ steps.meta.outputs.versioned_tag }}"
          git tag -a "$TAG" -m "Release $TAG"
          git push origin "$TAG" || echo "Tag $TAG already exists, skipping"

  smoke-test:
    needs: build-and-push
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - name: Start Agnes from built image
        run: |
          # Override image to use the just-built version
          export AGNES_IMAGE="ghcr.io/${{ github.repository }}:${{ needs.build-and-push.outputs.image_tag }}"
          sed -i "s|build: \.|image: ${AGNES_IMAGE}|g" docker-compose.yml
          docker compose -f docker-compose.yml -f docker-compose.ci.yml up -d
          # Wait for healthy (max 60s)
          timeout 60 bash -c 'until curl -sf http://localhost:8000/api/health | python3 -c "import sys,json; d=json.load(sys.stdin); sys.exit(0 if d[\"status\"]!=\"unhealthy\" else 1)"; do sleep 3; done'
      - name: Run smoke tests
        run: bash scripts/smoke-test.sh http://localhost:8000
      - name: Collect logs on failure
        if: failure()
        run: docker compose -f docker-compose.yml -f docker-compose.ci.yml logs > smoke-test-logs.txt
      - name: Upload logs
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: smoke-test-logs
          path: smoke-test-logs.txt
      - name: Teardown
        if: always()
        run: docker compose -f docker-compose.yml -f docker-compose.ci.yml down -v
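The `Determine channel and version` step derives `N` by counting existing `<channel>-YYYY.MM.*` tags. The Python sketch below reproduces that scheme for illustration, with one deliberate difference: it takes the highest existing `N` plus one instead of the tag count, so a missing tag (for example `stable-2026.04.1` and `stable-2026.04.3` present, `.2` deleted) cannot produce a number that is already taken. The function name `next_version` is hypothetical.

```python
import re
from datetime import date


def next_version(existing_tags: list[str], channel: str, today: date) -> str:
    """Return the next YYYY.MM.N version for `channel`, given existing git tags."""
    year_month = today.strftime("%Y.%m")  # same format as `date +%Y.%m` in the workflow
    pattern = re.compile(rf"^{re.escape(channel)}-{re.escape(year_month)}\.(\d+)$")
    # Collect N from every tag of this channel and month, e.g. "stable-2026.04.2" -> 2
    counters = [int(m.group(1)) for tag in existing_tags if (m := pattern.match(tag))]
    n = max(counters, default=0) + 1  # max+1 rather than count+1, robust to gaps
    return f"{year_month}.{n}"


if __name__ == "__main__":
    tags = ["stable-2026.04.1", "stable-2026.04.2", "dev-2026.04.1", "stable-2026.03.7"]
    print(next_version(tags, "stable", date(2026, 4, 10)))  # 2026.04.3
    print(next_version(tags, "dev", date(2026, 4, 10)))     # 2026.04.2
    print(next_version(tags, "stable", date(2026, 5, 1)))   # 2026.05.1
```

Neither variant protects against two pipelines computing a version at the same moment; the tag push in `Create git tag` is the only point where such a collision would be detected, and its `|| echo` fallback lets the run continue.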

CHANGELOG.md (new file)

@@ -0,0 +1,33 @@
# Changelog

All notable changes to Agnes AI Data Analyst are documented in this file.

Format: [CalVer](https://calver.org/) `YYYY.MM.N` with channels `stable` and `dev`.

---

## stable-2026.04.1 (unreleased)

Multi-instance deployment and self-service setup.

### Added

- CalVer versioning with `stable` and `dev` release channels
- `/api/health` now returns `version`, `channel`, and `schema_version`
- Auto-generated JWT and session secrets with file persistence (`/data/state/.jwt_secret`, `/data/state/.session_secret`)
- Pre-migration snapshot of `system.duckdb` before schema upgrades
- `POST /api/admin/configure` for headless data source configuration
- `POST /api/admin/discover-and-register` for combined table discovery and registration
- `/setup` web wizard for first-time instance setup
- `scripts/smoke-test.sh` for post-deploy verification
- Smoke test job in CI (Docker-in-CI after every `stable` release, i.e. pushes to `main`)
- OpenAPI snapshot test for breaking change detection
- Custom connector mount support (`connectors/custom/`)
- Startup banner logging version, channel, and schema version
- Schema migration safety tests (idempotency, data preservation, snapshot)
- `CHANGELOG.md` and release notes template

### Breaking Changes

None.

### Migration Guide

No manual steps are required. `JWT_SECRET_KEY` is no longer mandatory; a value already set in the environment still takes precedence over the generated file. If `SESSION_SECRET` was not set, the session secret previously fell back to `JWT_SECRET_KEY` and is now read from `/data/state/.session_secret`, so existing session cookies (used for OAuth state) are invalidated once after the upgrade.
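The changelog lists a pre-migration snapshot of `system.duckdb`, but that code is not part of the hunks shown here. As a rough sketch of the idea, assuming the schema version is an integer and that no process has the database open during the copy, a snapshot step could look like this; `snapshot_before_migration` and the `.pre-v<N>` suffix are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Optional


def snapshot_before_migration(db_path: Path, current: int, target: int) -> Optional[Path]:
    """Copy the database aside if a schema upgrade is pending; return the copy's path."""
    if current >= target or not db_path.exists():
        return None  # nothing to migrate, or a fresh install without a database yet
    # Hypothetical naming: system.duckdb -> system.duckdb.pre-v3
    snapshot = db_path.with_name(f"{db_path.name}.pre-v{target}")
    shutil.copy2(db_path, snapshot)  # copies contents and metadata; the original is untouched
    return snapshot
```

Copying while a connection is open risks an inconsistent file, and DuckDB can leave a `system.duckdb.wal` write-ahead log next to the database after an unclean shutdown, which a plain file copy of the main file alone would not include.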

@@ -6,6 +6,11 @@ RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf
 # Install uv for fast dependency management
 COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
+ARG AGNES_VERSION=dev
+ARG RELEASE_CHANNEL=dev
+ENV AGNES_VERSION=${AGNES_VERSION}
+ENV RELEASE_CHANNEL=${RELEASE_CHANNEL}
 WORKDIR /app
 # Copy application code

@@ -1,6 +1,6 @@
 # Agnes AI Data Analyst — Development Makefile
-.PHONY: help test lint dev docker
+.PHONY: help test lint dev docker update-openapi-snapshot
 help:
 	@echo "Available targets:"
@@ -20,3 +20,7 @@ docker:
 lint:
 	@ruff check . 2>/dev/null || echo "ruff not installed: pip install ruff"
+update-openapi-snapshot:
+	TESTING=1 python scripts/generate_openapi.py > tests/snapshots/openapi.json
+	@echo "Snapshot updated. Review diff and commit."
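The `update-openapi-snapshot` target regenerates `tests/snapshots/openapi.json`; the three snapshot tests themselves are not shown. One plausible shape for a breaking-change check, assuming both schemas are loaded as plain dicts, is to report every operation in the snapshot that no longer exists in the live schema. `removed_operations` is a hypothetical helper, and it deliberately ignores additions, which are not breaking.

```python
# Keys of an OpenAPI path item that denote operations (others are e.g. "parameters")
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def removed_operations(snapshot: dict, current: dict) -> list[str]:
    """List "METHOD /path" entries present in `snapshot` but missing from `current`."""
    removed = []
    current_paths = current.get("paths", {})
    for path, item in snapshot.get("paths", {}).items():
        for method in item:
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            if method not in current_paths.get(path, {}):
                removed.append(f"{method.upper()} {path}")
    return sorted(removed)
```

In a pytest test this could be called with the parsed snapshot file and FastAPI's `app.openapi()`, failing when the returned list is non-empty.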

@@ -1,7 +1,9 @@
-"""Admin endpoints — table discovery, registry management."""
+"""Admin endpoints — table discovery, registry management, instance configuration."""
 import logging
+import os
 import uuid
+from pathlib import Path
 from fastapi import APIRouter, Depends, HTTPException
 from pydantic import BaseModel
@@ -42,6 +44,16 @@ class UpdateTableRequest(BaseModel):
     profile_after_sync: Optional[bool] = None
+
+class ConfigureRequest(BaseModel):
+    data_source: str  # "keboola" | "bigquery" | "local"
+    keboola_token: Optional[str] = None
+    keboola_url: Optional[str] = None
+    bigquery_project: Optional[str] = None
+    bigquery_location: Optional[str] = None
+    instance_name: Optional[str] = None
+    allowed_domain: Optional[str] = None
+
 @router.get("/discover-tables")
 async def discover_tables(
     user: dict = Depends(require_role(Role.ADMIN)),
@@ -144,3 +156,196 @@ async def unregister_table(
     if not repo.get(table_id):
         raise HTTPException(status_code=404, detail="Table not found")
     repo.unregister(table_id)
+
+@router.post("/configure")
+async def configure_instance(
+    request: ConfigureRequest,
+    user: dict = Depends(require_role(Role.ADMIN)),
+):
+    """Configure data source and instance settings via API.
+
+    Writes config to instance.yaml and persists secrets to .env_overlay.
+    AI agents and the /setup wizard use this instead of manual file editing.
+    """
+    import yaml
+
+    if request.data_source not in ("keboola", "bigquery", "local"):
+        raise HTTPException(status_code=400, detail="data_source must be 'keboola', 'bigquery', or 'local'")
+
+    # Validate required fields; Keboola credentials are also checked against the API
+    if request.data_source == "keboola":
+        if not request.keboola_token or not request.keboola_url:
+            raise HTTPException(status_code=400, detail="keboola_token and keboola_url are required for Keboola data source")
+        try:
+            from connectors.keboola.client import KeboolaClient
+            client = KeboolaClient(token=request.keboola_token, url=request.keboola_url)
+            client.verify_token()
+        except Exception as e:
+            raise HTTPException(status_code=400, detail=f"Keboola connection failed: {e}")
+    elif request.data_source == "bigquery":
+        if not request.bigquery_project:
+            raise HTTPException(status_code=400, detail="bigquery_project is required for BigQuery data source")
+
+    # Build instance.yaml config (secrets are referenced by env var name, never stored inline)
+    config_dir = Path(os.environ.get("CONFIG_DIR", "./config"))
+    config_path = config_dir / "instance.yaml"
+
+    # Load existing config or start fresh
+    existing = {}
+    if config_path.exists():
+        try:
+            existing = yaml.safe_load(config_path.read_text()) or {}
+        except Exception:
+            existing = {}
+
+    # Merge instance settings
+    if request.instance_name:
+        existing.setdefault("instance", {})["name"] = request.instance_name
+    if request.allowed_domain:
+        existing.setdefault("auth", {})["allowed_domain"] = request.allowed_domain
+
+    # Replace the data source section (the Keboola token is referenced via token_env)
+    existing["data_source"] = {"type": request.data_source}
+    if request.data_source == "keboola":
+        existing["data_source"]["keboola"] = {
+            "url": request.keboola_url,
+            "token_env": "KEBOOLA_STORAGE_TOKEN",
+        }
+    elif request.data_source == "bigquery":
+        existing["data_source"]["bigquery"] = {
+            "project": request.bigquery_project,
+            "location": request.bigquery_location or "us",
+        }
+
+    # Write instance.yaml
+    config_dir.mkdir(parents=True, exist_ok=True)
+    config_path.write_text(yaml.dump(existing, default_flow_style=False, sort_keys=False))
+    logger.info("Wrote instance config to %s", config_path)
+
+    # Persist secrets to .env_overlay (in data volume, never in git)
+    secrets_to_persist = {}
+    if request.keboola_token:
+        secrets_to_persist["KEBOOLA_STORAGE_TOKEN"] = request.keboola_token
+    if request.keboola_url:
+        secrets_to_persist["KEBOOLA_STACK_URL"] = request.keboola_url
+    if secrets_to_persist:
+        data_dir = Path(os.environ.get("DATA_DIR", "./data"))
+        overlay_path = data_dir / "state" / ".env_overlay"
+        overlay_path.parent.mkdir(parents=True, exist_ok=True)
+        # Merge with existing overlay
+        existing_overlay = {}
+        if overlay_path.exists():
+            for line in overlay_path.read_text().splitlines():
+                if "=" in line and not line.startswith("#"):
+                    k, v = line.split("=", 1)
+                    existing_overlay[k.strip()] = v.strip()
+        existing_overlay.update(secrets_to_persist)
+        overlay_path.write_text(
+            "\n".join(f"{k}={v}" for k, v in existing_overlay.items()) + "\n"
+        )
+        try:
+            overlay_path.chmod(0o600)
+        except OSError:
+            pass
+        logger.info("Persisted %d secrets to .env_overlay", len(secrets_to_persist))
+        # Inject into current process environment
+        for k, v in secrets_to_persist.items():
+            os.environ[k] = v
+
+    # Invalidate cached instance config so next read picks up changes
+    import app.instance_config as ic
+    ic._instance_config = None
+
+    return {
+        "status": "ok",
+        "data_source": request.data_source,
+        # Only Keboola credentials are verified above; BigQuery is only checked for a project ID
+        "connection": "verified" if request.data_source == "keboola" else (
+            "unverified" if request.data_source == "bigquery" else "local"
+        ),
+    }
+
+def _discover_and_register_tables(conn: duckdb.DuckDBPyConnection, user_email: str) -> dict:
+    """Discover tables from configured source and register them. Shared logic for API and sync."""
+    from app.instance_config import get_data_source_type, get_value
+
+    source_type = get_data_source_type()
+    if source_type != "keboola":
+        return {"registered": 0, "skipped": 0, "errors": 0, "tables": [], "source": source_type}
+
+    from connectors.keboola.client import KeboolaClient
+
+    url = get_value("keboola", "url", default="")
+    token = os.environ.get(get_value("keboola", "token_env", default="KEBOOLA_STORAGE_TOKEN"), "")
+    if not token:
+        token = os.environ.get("KEBOOLA_STORAGE_TOKEN", "")
+    client = KeboolaClient(token=token, url=url)
+    discovered = client.discover_all_tables()
+
+    repo = TableRegistryRepository(conn)
+    registered = 0
+    skipped = 0
+    errors = 0
+    table_names = []
+    for table in discovered:
+        table_id = table.get("id", "").strip().lower().replace(".", "_").replace(" ", "_")
+        if not table_id:
+            errors += 1
+            continue
+        if repo.get(table_id):
+            skipped += 1
+            continue
+        try:
+            # Parse bucket from table ID (format: in.c-bucket.table_name)
+            parts = table.get("id", "").split(".")
+            bucket = parts[1] if len(parts) > 1 else ""
+            source_table = parts[2] if len(parts) > 2 else table.get("name", "")
+            repo.register(
+                id=table_id,
+                name=table.get("name", table_id),
+                source_type="keboola",
+                bucket=bucket,
+                source_table=source_table,
+                query_mode="local",
+                registered_by=user_email,
+                description=f"Auto-discovered from Keboola: {table.get('id', '')}",
+            )
+            registered += 1
+            table_names.append(table_id)
+        except Exception as e:
+            logger.warning("Failed to register %s: %s", table_id, e)
+            errors += 1
+
+    return {
+        "registered": registered,
+        "skipped": skipped,
+        "errors": errors,
+        "tables": table_names,
+        "source": "keboola",
+    }
+
+@router.post("/discover-and-register")
+async def discover_and_register(
+    user: dict = Depends(require_role(Role.ADMIN)),
+    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
+):
+    """Discover tables from configured source and auto-register them.
+
+    Combines discover-tables + register-table into one call.
+    Skips already-registered tables. Used by /setup wizard and AI agents.
+    """
+    try:
+        result = _discover_and_register_tables(conn, user.get("email", "admin"))
+        return result
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=f"Discovery and registration failed: {e}")
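`/api/admin/configure` and `/api/admin/discover-and-register` are meant to be usable without a browser. The sketch below replays the `/setup` wizard's sequence from Python: `/auth/bootstrap`, then configure, then discover-and-register (skipped for `local`, as the wizard does), then `/api/sync/trigger`. It assumes every endpoint answers with JSON and that the bootstrap response carries `access_token`, as the wizard's JavaScript expects; `http_post` and `headless_setup` are hypothetical names.

```python
import json
import urllib.request
from typing import Callable, Optional

# post(path, json_body_or_None, bearer_token_or_None) -> parsed JSON response
Post = Callable[[str, Optional[dict], Optional[str]], dict]


def http_post(base_url: str) -> Post:
    def post(path: str, body: Optional[dict], token: Optional[str]) -> dict:
        data = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(base_url + path, data=data, method="POST")
        req.add_header("Content-Type", "application/json")
        if token:
            req.add_header("Authorization", f"Bearer {token}")
        with urllib.request.urlopen(req) as resp:  # raises HTTPError on 4xx/5xx
            return json.loads(resp.read() or b"{}")
    return post


def headless_setup(post: Post, email: str, password: str, source: dict) -> Optional[dict]:
    """Run the wizard's steps in order; return the discovery result, if any."""
    token = post("/auth/bootstrap", {"email": email, "password": password}, None)["access_token"]
    post("/api/admin/configure", source, token)
    discovered = None
    if source.get("data_source") != "local":  # the wizard jumps straight to sync for local
        discovered = post("/api/admin/discover-and-register", None, token)
    post("/api/sync/trigger", None, token)
    return discovered
```

Injecting `post` keeps the sequence testable without a running instance; against a live one it would be called as `headless_setup(http_post("http://localhost:8000"), email, password, {"data_source": "local"})`.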

@@ -1,11 +1,13 @@
 """Health check endpoint — structured diagnostics for AI agents."""
+import os
 from datetime import datetime, timezone
 from fastapi import APIRouter, Depends
 import duckdb
 from app.auth.dependencies import _get_db
+from src.db import SCHEMA_VERSION
 from src.repositories.sync_state import SyncStateRepository
 router = APIRouter(tags=["health"])
@@ -69,6 +71,9 @@ async def health_check(conn: duckdb.DuckDBPyConnection = Depends(_get_db)):
     return {
         "status": overall,
+        "version": os.environ.get("AGNES_VERSION", "dev"),
+        "channel": os.environ.get("RELEASE_CHANNEL", "dev"),
+        "schema_version": SCHEMA_VERSION,
         "timestamp": datetime.now(timezone.utc).isoformat(),
         "services": checks,
     }
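With `version`, `channel` and `schema_version` in the health payload, a deploy script can confirm which build is actually running. The sketch mirrors the smoke-test wait loop in `release.yml`: poll until `status` is anything other than `"unhealthy"`, and give up after 60 s. The function names, URL default and injected `fetch` are assumptions made for illustration and testability.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(url: str = "http://localhost:8000/api/health") -> dict:
    # Plain GET; raises URLError/HTTPError (both subclasses of OSError) while the server is down
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_until_healthy(fetch: Callable[[], dict] = fetch_health,
                       timeout: float = 60.0, interval: float = 3.0) -> dict:
    deadline = time.monotonic() + timeout
    while True:
        try:
            health = fetch()
            # Same rule as the CI loop: any status except "unhealthy" counts as up
            if health.get("status") != "unhealthy":
                return health  # carries version, channel and schema_version
        except (OSError, ValueError):
            pass  # not reachable yet, or a non-JSON response
        if time.monotonic() >= deadline:
            raise TimeoutError(f"/api/health not ready after {timeout} s")
        time.sleep(interval)
```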

@@ -63,6 +63,27 @@ def _run_sync(tables: Optional[List[str]] = None):
     finally:
         sys_conn.close()
+    if not table_configs:
+        # Auto-discover tables on first sync when registry is empty
+        if source_type == "keboola" and os.environ.get("KEBOOLA_STORAGE_TOKEN"):
+            logger.info("No tables registered — running auto-discovery from Keboola")
+            try:
+                from app.api.admin import _discover_and_register_tables
+                auto_conn = get_system_db()
+                try:
+                    result = _discover_and_register_tables(auto_conn, "auto-discovery")
+                    logger.info("Auto-discovered %d tables, skipped %d", result["registered"], result["skipped"])
+                finally:
+                    auto_conn.close()
+                # Re-read table configs after auto-registration
+                sys_conn2 = get_system_db()
+                try:
+                    table_configs = TableRegistryRepository(sys_conn2).list_local(source_type)
+                finally:
+                    sys_conn2.close()
+            except Exception as e:
+                logger.warning("Auto-discovery failed: %s", e)
     if not table_configs:
         logger.warning("No tables to sync for source_type=%s", source_type)
         return
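When the registry is empty, the first sync calls `_discover_and_register_tables`. Its ID handling can be isolated as two pure functions, shown here for illustration: the function names are hypothetical, the transformations are copied from the commit. One consequence worth noting is that IDs differing only in case, or in `.` versus `_` or a space, normalize to the same registry ID, so the second one is counted as `skipped` rather than registered.

```python
def normalize_table_id(source_id: str) -> str:
    # Same transformation as the commit: trim, lowercase, "." and " " become "_"
    return source_id.strip().lower().replace(".", "_").replace(" ", "_")


def split_source_id(source_id: str, name: str = "") -> tuple[str, str]:
    # Keboola IDs look like "in.c-bucket.table_name"; the commit stores part [1] as the bucket
    parts = source_id.split(".")
    bucket = parts[1] if len(parts) > 1 else ""
    source_table = parts[2] if len(parts) > 2 else name  # fall back to the table's name
    return bucket, source_table
```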
@@ -113,6 +134,29 @@ print(json.dumps(result))
     else:
         print(f"[SYNC] Extractor OK", file=_sys.stderr, flush=True)
+    # Run custom connectors (Tier A: local mount)
+    connectors_dir = Path(os.environ.get("CONNECTORS_DIR", str(Path(__file__).parent.parent.parent / "connectors" / "custom")))
+    if connectors_dir.exists():
+        for connector_dir in sorted(connectors_dir.iterdir()):
+            if not connector_dir.is_dir():
+                continue
+            extractor = connector_dir / "extractor.py"
+            if not extractor.exists():
+                continue
+            logger.info("Running custom connector: %s", connector_dir.name)
+            try:
+                custom_result = subprocess.run(
+                    [sys.executable, str(extractor)],
+                    env=env, capture_output=True, text=True, timeout=600,
+                    cwd=str(Path(__file__).parent.parent.parent),
+                )
+                if custom_result.returncode != 0:
+                    logger.error("Custom connector %s failed: %s", connector_dir.name, custom_result.stderr[-500:])
+                else:
+                    logger.info("Custom connector %s completed", connector_dir.name)
+            except subprocess.TimeoutExpired:
+                logger.error("Custom connector %s timed out", connector_dir.name)
     # Rebuild master views (reads extract.duckdb files, no write conflict)
     from src.orchestrator import SyncOrchestrator
     orch = SyncOrchestrator()
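Custom connectors follow a simple convention: each subdirectory of `CONNECTORS_DIR` (default `connectors/custom/`) that contains an `extractor.py` is run in sorted order with a 600-second timeout, and one connector's failure does not stop the others. A standalone sketch of that loop follows; unlike the commit it does not pass the sync `env` or set `cwd`, and it returns a per-connector status dict, which is an assumption made for testability.

```python
import subprocess
import sys
from pathlib import Path


def run_custom_connectors(connectors_dir: Path, timeout: float = 600) -> dict[str, str]:
    """Run every <dir>/extractor.py in sorted order; map connector name to a status string."""
    results: dict[str, str] = {}
    if not connectors_dir.is_dir():
        return results
    for connector_dir in sorted(p for p in connectors_dir.iterdir() if p.is_dir()):
        extractor = connector_dir / "extractor.py"
        if not extractor.exists():
            continue  # directories without extractor.py are ignored
        try:
            proc = subprocess.run(
                [sys.executable, str(extractor)],
                capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            results[connector_dir.name] = "timeout"
            continue
        # Keep only the stderr tail, as the commit does when logging failures
        results[connector_dir.name] = "ok" if proc.returncode == 0 else "failed: " + proc.stderr[-500:]
    return results
```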

@@ -7,22 +7,22 @@ from typing import Optional
 import jwt
-SECRET_KEY = os.environ.get("JWT_SECRET_KEY", "")
-if not SECRET_KEY:
+def _get_secret_key() -> str:
+    """Load JWT secret - from env, file, or auto-generated."""
     if os.environ.get("TESTING", "").lower() in ("1", "true"):
-        SECRET_KEY = "test-jwt-secret-key-minimum-32-chars!!"
-    else:
-        raise RuntimeError(
-            "JWT_SECRET_KEY environment variable is required. "
-            "Generate one: python -c \"import secrets; print(secrets.token_hex(32))\""
-        )
-elif len(SECRET_KEY) < 32 and os.environ.get("TESTING", "").lower() not in ("1", "true"):
-    import warnings as _warnings
-    _warnings.warn(
-        f"JWT_SECRET_KEY is {len(SECRET_KEY)} chars — minimum 32 recommended",
-        UserWarning, stacklevel=2,
-    )
+        return os.environ.get("JWT_SECRET_KEY", "test-jwt-secret-key-minimum-32-chars!!")
+    from app.secrets import get_jwt_secret
+    key = get_jwt_secret()
+    if len(key) < 32:
+        import warnings as _warnings
+        _warnings.warn(
+            f"JWT_SECRET_KEY is {len(key)} chars — minimum 32 recommended",
+            UserWarning, stacklevel=2,
+        )
+    return key
+SECRET_KEY = _get_secret_key()
 ALGORITHM = "HS256"
 ACCESS_TOKEN_EXPIRE_HOURS = 24  # 24 hours
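`SECRET_KEY` is now resolved once at import time through `app.secrets`. Persisting it matters because an HS256 signature only verifies under the exact key that produced it: if a restarted container generated a fresh key, every previously issued token would be rejected. The stdlib-only sketch below illustrates that property; it is not how the application signs tokens (it uses PyJWT via `import jwt`), and it omits claim checks such as `exp`. The helper names are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _sign(signing_input: str, key: str) -> str:
    mac = hmac.new(key.encode(), signing_input.encode(), hashlib.sha256)
    return _b64url(mac.digest())


def sign_hs256(payload: dict, key: str) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    return f"{signing_input}.{_sign(signing_input, key)}"


def verify_hs256(token: str, key: str) -> bool:
    signing_input, _, signature = token.rpartition(".")
    # Constant-time comparison; only the signature is checked, no claims
    return hmac.compare_digest(signature, _sign(signing_input, key))
```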

@@ -48,8 +48,8 @@ def create_app() -> FastAPI:
     )
     # Session middleware (required for OAuth state)
-    import secrets as _secrets
-    session_secret = os.environ.get("SESSION_SECRET", os.environ.get("JWT_SECRET_KEY", _secrets.token_hex(32)))
+    from app.secrets import get_session_secret
+    session_secret = get_session_secret()
     app.add_middleware(SessionMiddleware, secret_key=session_secret)
     # CORS for CLI and external clients
@@ -62,6 +62,14 @@ def create_app() -> FastAPI:
         allow_headers=["*"],
     )
+    # Load .env_overlay (persisted by /api/admin/configure)
+    _overlay = Path(os.environ.get("DATA_DIR", "./data")) / "state" / ".env_overlay"
+    if _overlay.exists():
+        for line in _overlay.read_text().splitlines():
+            if "=" in line and not line.startswith("#"):
+                k, v = line.split("=", 1)
+                os.environ.setdefault(k.strip(), v.strip())
     # Load instance config on startup
     try:
         from app.instance_config import load_instance_config
@@ -70,6 +78,15 @@ def create_app() -> FastAPI:
     except Exception as e:
         logger.warning(f"Could not load instance config: {e}")
+    # Startup banner
+    from src.db import SCHEMA_VERSION
+    logger.info(
+        "Agnes %s | channel: %s | schema v%s",
+        os.environ.get("AGNES_VERSION", "dev"),
+        os.environ.get("RELEASE_CHANNEL", "dev"),
+        SCHEMA_VERSION,
+    )
     # Seed admin user for testing/CI (when SEED_ADMIN_EMAIL is set)
     seed_email = os.environ.get("SEED_ADMIN_EMAIL")
     if seed_email:
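`/api/admin/configure` writes `.env_overlay` and `create_app()` reads it back at startup. The format is minimal: one `KEY=VALUE` per line, lines starting with `#` skipped, split on the first `=`, no quoting or escaping. The sketch extracts that round trip into hypothetical helpers. It also reproduces an asymmetry in the commit: at startup the overlay is applied with `setdefault`, so a variable already set in the container environment wins, whereas `configure` assigns `os.environ[k] = v` directly for the running process.

```python
import os
from typing import MutableMapping


def parse_overlay(text: str) -> dict[str, str]:
    values: dict[str, str] = {}
    for line in text.splitlines():
        if "=" in line and not line.startswith("#"):
            key, value = line.split("=", 1)  # values may themselves contain "="
            values[key.strip()] = value.strip()
    return values


def render_overlay(values: dict[str, str]) -> str:
    return "".join(f"{k}={v}\n" for k, v in values.items())


def apply_overlay(values: dict[str, str], environ: MutableMapping[str, str] = os.environ) -> None:
    for key, value in values.items():
        environ.setdefault(key, value)  # existing environment variables win, as in create_app()
```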

app/secrets.py (new file)

@@ -0,0 +1,37 @@
"""Auto-generate and persist secrets that survive container restarts."""
import logging
import os
import secrets
from pathlib import Path

logger = logging.getLogger(__name__)


def _load_or_generate(env_var: str, file_name: str) -> str:
    """Load secret from env var, or from file, or generate and persist."""
    val = os.environ.get(env_var, "")
    if val:
        return val
    data_dir = Path(os.environ.get("DATA_DIR", "./data"))
    secret_path = data_dir / "state" / file_name
    if secret_path.exists():
        return secret_path.read_text().strip()
    secret_path.parent.mkdir(parents=True, exist_ok=True)
    val = secrets.token_hex(32)
    secret_path.write_text(val)
    secret_path.chmod(0o600)
    logger.info(
        "Auto-generated %s -> %s (set %s in .env to use a fixed value)",
        file_name, secret_path, env_var,
    )
    return val


def get_jwt_secret() -> str:
    """Get JWT secret key from env, file, or auto-generate."""
    return _load_or_generate("JWT_SECRET_KEY", ".jwt_secret")


def get_session_secret() -> str:
    """Get session secret from env, file, or auto-generate."""
    return _load_or_generate("SESSION_SECRET", ".session_secret")
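`_load_or_generate` checks for the file and then writes it, so two workers starting against an empty `DATA_DIR` could each generate and keep a different secret, and the file is briefly readable with default permissions before `chmod(0o600)`. A variant that avoids both, assuming a POSIX filesystem where `O_EXCL` is honored, creates the file atomically with mode `0o600` and falls back to reading the winner's value. `load_or_generate` here is a hypothetical standalone version that takes the full path instead of reading `DATA_DIR`.

```python
import os
import secrets
import time
from pathlib import Path


def load_or_generate(env_var: str, secret_path: Path) -> str:
    val = os.environ.get(env_var, "")
    if val:
        return val  # an explicit environment value always wins
    secret_path.parent.mkdir(parents=True, exist_ok=True)
    candidate = secrets.token_hex(32)
    try:
        # Exactly one process can create the file; it is born with mode 0600 (minus umask)
        fd = os.open(secret_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        # Another worker won the race, or an earlier start persisted a secret: use theirs
        for _ in range(50):
            existing = secret_path.read_text().strip()
            if existing:
                return existing
            time.sleep(0.01)  # the creating process may not have written yet
        raise RuntimeError(f"{secret_path} exists but is empty")
    with os.fdopen(fd, "w") as fh:
        fh.write(candidate)
    return candidate
```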

@@ -120,6 +120,7 @@ _URL_MAP = {
     "email_auth.login_email_form": "/login/email",
     "email_auth.send_magic_link": "/auth/email/send-link",
     "register": "/auth/password/setup",
+    "setup": "/setup",
 }
@@ -177,6 +178,18 @@ async def index(request: Request, user: Optional[dict] = Depends(get_optional_us
     return RedirectResponse(url="/login", status_code=302)
+@router.get("/setup", response_class=HTMLResponse)
+async def setup_wizard(request: Request, conn: duckdb.DuckDBPyConnection = Depends(_get_db)):
+    """First-time setup wizard. Redirects to dashboard if users already exist."""
+    try:
+        user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
+        if user_count > 0:
+            return RedirectResponse(url="/dashboard", status_code=302)
+    except Exception:
+        pass  # No users table yet — show setup
+    return templates.TemplateResponse(request, "setup.html", _build_context(request))
 @router.get("/login", response_class=HTMLResponse)
 async def login_page(request: Request):
     providers = []

@@ -0,0 +1,261 @@
{% extends "base_login.html" %}
{% block title %}Setup - Agnes AI Data Analyst{% endblock %}
{% block content %}
<div class="login-page">
<div class="login-card-wrapper" style="max-width: 520px; margin: 40px auto; padding: 0 20px;">
<div class="login-card" style="max-width: 520px;">
<h2 id="wizard-title">Setup Agnes</h2>
<p class="login-description" id="wizard-description">
Create your admin account to get started.
</p>
<!-- Progress -->
<div style="display: flex; gap: 8px; margin-bottom: 24px;">
<div id="step-dot-1" style="flex: 1; height: 4px; border-radius: 2px; background: var(--primary, #2563eb);"></div>
<div id="step-dot-2" style="flex: 1; height: 4px; border-radius: 2px; background: #e5e7eb;"></div>
<div id="step-dot-3" style="flex: 1; height: 4px; border-radius: 2px; background: #e5e7eb;"></div>
<div id="step-dot-4" style="flex: 1; height: 4px; border-radius: 2px; background: #e5e7eb;"></div>
</div>
<!-- Status message -->
<div id="status-msg" style="display: none; padding: 10px 14px; border-radius: 6px; margin-bottom: 16px; font-size: 14px;"></div>
<!-- Step 1: Create Admin -->
<div id="step-1">
<form id="admin-form" onsubmit="return createAdmin(event)">
<label style="display: block; margin-bottom: 4px; font-size: 14px; font-weight: 500;">Email</label>
<input type="email" id="admin-email" required placeholder="admin@company.com"
style="width: 100%; padding: 10px 12px; border: 1px solid #d1d5db; border-radius: 6px; margin-bottom: 12px; font-size: 14px; box-sizing: border-box;">
<label style="display: block; margin-bottom: 4px; font-size: 14px; font-weight: 500;">Password</label>
<input type="password" id="admin-password" required minlength="8" placeholder="Min. 8 characters"
style="width: 100%; padding: 10px 12px; border: 1px solid #d1d5db; border-radius: 6px; margin-bottom: 16px; font-size: 14px; box-sizing: border-box;">
<button type="submit" class="btn btn-primary" style="width: 100%;" id="btn-admin">
Create Admin Account
</button>
</form>
</div>
<!-- Step 2: Data Source -->
<div id="step-2" style="display: none;">
<form id="source-form" onsubmit="return configureSource(event)">
<label style="display: block; margin-bottom: 4px; font-size: 14px; font-weight: 500;">Data Source</label>
<select id="data-source" onchange="toggleSourceFields()"
style="width: 100%; padding: 10px 12px; border: 1px solid #d1d5db; border-radius: 6px; margin-bottom: 12px; font-size: 14px; box-sizing: border-box;">
<option value="keboola">Keboola</option>
<option value="bigquery">BigQuery</option>
<option value="local">Local / CSV</option>
</select>
<div id="keboola-fields">
<label style="display: block; margin-bottom: 4px; font-size: 14px; font-weight: 500;">Keboola URL</label>
<input type="url" id="keboola-url" placeholder="https://connection.keboola.com"
style="width: 100%; padding: 10px 12px; border: 1px solid #d1d5db; border-radius: 6px; margin-bottom: 12px; font-size: 14px; box-sizing: border-box;">
<label style="display: block; margin-bottom: 4px; font-size: 14px; font-weight: 500;">Storage API Token</label>
<input type="password" id="keboola-token" placeholder="Your Keboola storage token"
style="width: 100%; padding: 10px 12px; border: 1px solid #d1d5db; border-radius: 6px; margin-bottom: 16px; font-size: 14px; box-sizing: border-box;">
</div>
<div id="bigquery-fields" style="display: none;">
<label style="display: block; margin-bottom: 4px; font-size: 14px; font-weight: 500;">GCP Project</label>
<input type="text" id="bq-project" placeholder="my-gcp-project"
style="width: 100%; padding: 10px 12px; border: 1px solid #d1d5db; border-radius: 6px; margin-bottom: 12px; font-size: 14px; box-sizing: border-box;">
<label style="display: block; margin-bottom: 4px; font-size: 14px; font-weight: 500;">Location</label>
<input type="text" id="bq-location" value="us" placeholder="us"
style="width: 100%; padding: 10px 12px; border: 1px solid #d1d5db; border-radius: 6px; margin-bottom: 16px; font-size: 14px; box-sizing: border-box;">
</div>
<button type="submit" class="btn btn-primary" style="width: 100%;" id="btn-source">
Configure Data Source
</button>
<button type="button" onclick="skipToStep(4)" class="btn btn-secondary" style="width: 100%; margin-top: 8px;" id="btn-skip-source">
Skip (configure later)
</button>
</form>
</div>
<!-- Step 3: Discover Tables -->
<div id="step-3" style="display: none;">
<p style="font-size: 14px; color: #6b7280; margin-bottom: 16px;">
Discover and register tables from your data source.
</p>
<button onclick="discoverTables()" class="btn btn-primary" style="width: 100%;" id="btn-discover">
Discover Tables
</button>
<div id="discover-result" style="display: none; margin-top: 12px; padding: 12px; background: #f0fdf4; border-radius: 6px; font-size: 14px;"></div>
<button onclick="goToStep(4)" class="btn btn-primary" style="width: 100%; margin-top: 12px; display: none;" id="btn-next-sync">
Continue
</button>
</div>
<!-- Step 4: First Sync & Done -->
<div id="step-4" style="display: none;">
<p style="font-size: 14px; color: #6b7280; margin-bottom: 16px;">
Start the first data sync and go to your dashboard.
</p>
<button onclick="triggerSync()" class="btn btn-primary" style="width: 100%;" id="btn-sync">
Start First Sync
</button>
<a href="/dashboard" class="btn btn-primary" style="width: 100%; margin-top: 12px; display: none; text-align: center; text-decoration: none;" id="btn-dashboard">
Go to Dashboard
</a>
</div>
</div>
</div>
</div>
<script>
let token = '';
const steps = {
1: { title: 'Setup Agnes', desc: 'Create your admin account to get started.' },
2: { title: 'Data Source', desc: 'Connect to your data source.' },
3: { title: 'Discover Tables', desc: 'Find and register tables from your data source.' },
4: { title: 'Almost Done', desc: 'Start syncing data and open your dashboard.' },
};
function showStatus(msg, type) {
const el = document.getElementById('status-msg');
el.textContent = msg;
el.style.display = 'block';
el.style.background = type === 'error' ? '#fef2f2' : '#f0fdf4';
el.style.color = type === 'error' ? '#dc2626' : '#16a34a';
}
function hideStatus() {
document.getElementById('status-msg').style.display = 'none';
}
function goToStep(n) {
hideStatus();
for (let i = 1; i <= 4; i++) {
document.getElementById('step-' + i).style.display = i === n ? 'block' : 'none';
document.getElementById('step-dot-' + i).style.background = i <= n ? 'var(--primary, #2563eb)' : '#e5e7eb';
}
document.getElementById('wizard-title').textContent = steps[n].title;
document.getElementById('wizard-description').textContent = steps[n].desc;
}
function skipToStep(n) {
goToStep(n);
}
function toggleSourceFields() {
const src = document.getElementById('data-source').value;
document.getElementById('keboola-fields').style.display = src === 'keboola' ? 'block' : 'none';
document.getElementById('bigquery-fields').style.display = src === 'bigquery' ? 'block' : 'none';
}
async function apiCall(url, body) {
const headers = { 'Content-Type': 'application/json' };
if (token) headers['Authorization'] = 'Bearer ' + token;
const resp = await fetch(url, { method: 'POST', headers, body: JSON.stringify(body) });
const data = await resp.json();
if (!resp.ok) throw new Error(data.detail || 'Request failed');
return data;
}
async function createAdmin(e) {
e.preventDefault();
const btn = document.getElementById('btn-admin');
btn.disabled = true;
btn.textContent = 'Creating...';
try {
const data = await apiCall('/auth/bootstrap', {
email: document.getElementById('admin-email').value,
password: document.getElementById('admin-password').value,
});
token = data.access_token;
sessionStorage.setItem('setup_token', token);
goToStep(2);
} catch (err) {
showStatus(err.message, 'error');
} finally {
btn.disabled = false;
btn.textContent = 'Create Admin Account';
}
return false;
}
async function configureSource(e) {
e.preventDefault();
const btn = document.getElementById('btn-source');
btn.disabled = true;
btn.textContent = 'Verifying...';
try {
const src = document.getElementById('data-source').value;
const body = { data_source: src };
if (src === 'keboola') {
body.keboola_url = document.getElementById('keboola-url').value;
body.keboola_token = document.getElementById('keboola-token').value;
} else if (src === 'bigquery') {
body.bigquery_project = document.getElementById('bq-project').value;
body.bigquery_location = document.getElementById('bq-location').value;
}
await apiCall('/api/admin/configure', body);
// goToStep() calls hideStatus(), so switch steps first and then show the message
goToStep(src === 'local' ? 4 : 3);
showStatus('Connection verified!', 'success');
} catch (err) {
showStatus(err.message, 'error');
} finally {
btn.disabled = false;
btn.textContent = 'Configure Data Source';
}
return false;
}
async function discoverTables() {
const btn = document.getElementById('btn-discover');
btn.disabled = true;
btn.textContent = 'Discovering...';
try {
const headers = { 'Content-Type': 'application/json' };
if (token) headers['Authorization'] = 'Bearer ' + token;
const resp = await fetch('/api/admin/discover-and-register', { method: 'POST', headers });
const data = await resp.json();
if (!resp.ok) throw new Error(data.detail || 'Discovery failed');
const el = document.getElementById('discover-result');
el.style.display = 'block';
el.textContent = `Registered ${data.registered} tables, skipped ${data.skipped}.`;
document.getElementById('btn-next-sync').style.display = 'block';
btn.style.display = 'none';
} catch (err) {
showStatus(err.message, 'error');
} finally {
btn.disabled = false;
btn.textContent = 'Discover Tables';
}
}
async function triggerSync() {
const btn = document.getElementById('btn-sync');
btn.disabled = true;
btn.textContent = 'Starting sync...';
try {
const headers = {};
if (token) headers['Authorization'] = 'Bearer ' + token;
const resp = await fetch('/api/sync/trigger', { method: 'POST', headers });
if (!resp.ok) throw new Error('Sync trigger failed (HTTP ' + resp.status + ')');
btn.style.display = 'none';
document.getElementById('btn-dashboard').style.display = 'block';
showStatus('Sync started! You can now go to your dashboard.', 'success');
} catch (err) {
showStatus(err.message, 'error');
btn.disabled = false;
btn.textContent = 'Start First Sync';
}
}
// Restore token from sessionStorage (in case of page reload)
const savedToken = sessionStorage.getItem('setup_token');
if (savedToken) token = savedToken;
</script>
{% endblock %}
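The wizard's steps reduce to a few JSON POSTs whose bodies depend on the chosen data source. Below is a minimal Python sketch of those request bodies and the step routing, mirroring `configureSource()` and the header handling in `apiCall()` above. The helper names (`build_configure_body`, `next_step_after_configure`, `auth_headers`) are hypothetical and not part of the app.

```python
from typing import Any, Optional


def build_configure_body(data_source: str, **fields: str) -> dict[str, Any]:
    """JSON body for POST /api/admin/configure, with the field names the wizard sends."""
    body: dict[str, Any] = {"data_source": data_source}
    if data_source == "keboola":
        body["keboola_url"] = fields["keboola_url"]
        body["keboola_token"] = fields["keboola_token"]
    elif data_source == "bigquery":
        body["bigquery_project"] = fields["bigquery_project"]
        body["bigquery_location"] = fields["bigquery_location"]
    # Any other source (e.g. "local") sends only data_source, as the wizard does.
    return body


def next_step_after_configure(data_source: str) -> int:
    """Local sources skip table discovery (step 3) and go straight to the sync step (4)."""
    return 4 if data_source == "local" else 3


def auth_headers(token: Optional[str], json_body: bool = True) -> dict[str, str]:
    """Headers for wizard calls: the bearer token is attached only once bootstrap returned one."""
    headers: dict[str, str] = {}
    if json_body:
        headers["Content-Type"] = "application/json"
    if token:
        headers["Authorization"] = "Bearer " + token
    return headers
```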

docker-compose.ci.yml Normal file

@ -0,0 +1,11 @@
# CI smoke test overlay: minimal config for testing in GitHub Actions.
# Usage: docker compose -f docker-compose.yml -f docker-compose.ci.yml up -d
services:
  app:
    environment:
      - JWT_SECRET_KEY=smoke-test-ci-key-minimum-32-chars-xx
      - SESSION_SECRET=smoke-test-session-key-32-chars-min-x
      - DATA_DIR=/data
      - TESTING=0
    ports:
      - "8000:8000"
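This overlay pins `JWT_SECRET_KEY` explicitly. When the variable is absent and `TESTING` is unset, the commit instead auto-generates a secret and persists it; the updated JWT test expects a 64-hex-character value in `$DATA_DIR/state/.jwt_secret`. A sketch of that load-or-create logic follows, assuming the environment variable takes precedence over the file. `load_or_create_jwt_secret` is a hypothetical name, not the actual `app.secrets` code.

```python
import os
import secrets
from pathlib import Path
from typing import Optional


def load_or_create_jwt_secret(data_dir: Optional[str] = None) -> str:
    """Return JWT_SECRET_KEY if set, else a secret persisted in <data_dir>/state/.jwt_secret."""
    env_secret = os.environ.get("JWT_SECRET_KEY")
    if env_secret:
        return env_secret  # assumption: an explicit env var always wins over the file

    base = Path(data_dir or os.environ.get("DATA_DIR", "./data"))
    secret_file = base / "state" / ".jwt_secret"
    if secret_file.exists():
        return secret_file.read_text().strip()

    secret_file.parent.mkdir(parents=True, exist_ok=True)
    secret = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
    try:
        # O_EXCL fails if another process created the file first; 0o600 keeps it owner-only.
        fd = os.open(secret_file, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return secret_file.read_text().strip()
    with os.fdopen(fd, "w") as fh:
        fh.write(secret)
    return secret
```

Persisting the secret matters because a fresh random value on every restart would invalidate all previously issued tokens.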


@ -7,6 +7,7 @@ services:
    volumes:
      - data:/data
      - ./config:/app/config:ro
      # - ./custom-connectors:/app/connectors/custom:ro  # Tier A: AI-generated connectors
    env_file: .env
    environment:
      - DATA_DIR=/data

docs/RELEASE_TEMPLATE.md Normal file

@ -0,0 +1,37 @@
# Release Notes Template

Use this template when adding a new entry to `CHANGELOG.md`.

---

## stable-YYYY.MM.N

**Image:** `ghcr.io/keboola/agnes-the-ai-analyst:stable-YYYY.MM.N`

**Digest:** `sha256:...` (from `docker inspect --format='{{index .RepoDigests 0}}' <image>`)

**Date:** YYYY-MM-DD

### Added

- Feature description

### Changed

- Change description

### Fixed

- Bug fix description

### Breaking Changes

- Description of breaking change
- **Migration guide:** Steps to upgrade from the previous version

### Deprecated

- Description of deprecated feature (will be removed in YYYY.MM.N)

---

## Guidelines

- Every merge to `main` creates a new `stable-YYYY.MM.N` release
- Include the image digest for verification with `cosign verify`
- Breaking changes require a `BREAKING:` prefix in the commit message
- Migration guides must include exact commands or config changes
- If a release deprecates the previous stable, note it explicitly
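The template's `stable-YYYY.MM.N` scheme leaves the counter rule implicit. Below is a sketch of computing the next tag from existing ones, assuming N restarts at 1 each month and increments per release, and assuming dev-channel tags share the same shape. `release.yml` may number releases differently; `next_calver_tag` is hypothetical.

```python
import re
from datetime import date
from typing import Iterable

# Matches e.g. "stable-2026.04.2"; the dev-channel tag shape is assumed to be analogous.
_CALVER_TAG = re.compile(r"^(?P<channel>[a-z]+)-(?P<year>\d{4})\.(?P<month>\d{2})\.(?P<n>\d+)$")


def next_calver_tag(existing_tags: Iterable[str], today: date, channel: str = "stable") -> str:
    """Next <channel>-YYYY.MM.N tag, assuming N counts releases within the current month."""
    highest = 0
    for tag in existing_tags:
        m = _CALVER_TAG.match(tag)
        if m is None or m["channel"] != channel:
            continue  # ignore other channels and non-CalVer tags such as "latest"
        if int(m["year"]) == today.year and int(m["month"]) == today.month:
            highest = max(highest, int(m["n"]))
    return f"{channel}-{today.year:04d}.{today.month:02d}.{highest + 1}"
```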


@ -0,0 +1,16 @@
"""Generate OpenAPI snapshot from the current FastAPI app."""
import json
import os
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
os.environ.setdefault("TESTING", "1")
os.environ.setdefault("JWT_SECRET_KEY", "snapshot-generation-key-32-chars-min!!")
from app.main import create_app # noqa: E402
app = create_app()
schema = app.openapi()
json.dump(schema, sys.stdout, indent=2, sort_keys=True)
sys.stdout.write("\n")

scripts/smoke-test.sh Executable file

@ -0,0 +1,97 @@
#!/usr/bin/env bash
# Agnes smoke test — verifies a running instance is functional.
# Usage: ./scripts/smoke-test.sh [base-url]
# Default base URL: http://localhost:8000
set -euo pipefail
HOST="${1:-http://localhost:8000}"
PASS=0
FAIL=0
TOKEN=""
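check() {
local name="$1" ok="$2"
# Not ((PASS++)): it returns exit status 1 when the counter is 0, which aborts the script under set -e
if [ "$ok" = "true" ]; then
echo " PASS $name"
PASS=$((PASS + 1))
else
echo " FAIL $name"
FAIL=$((FAIL + 1))
fi
}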
echo "Smoke test: $HOST"
echo "---"
# 1. Health check
HEALTH=$(curl -sf "$HOST/api/health" | python3 -c "import sys,json; print(json.load(sys.stdin)['status'])" 2>/dev/null || echo "unreachable")
if [ "$HEALTH" = "unhealthy" ] || [ "$HEALTH" = "unreachable" ]; then
echo " FATAL: health=$HEALTH"
exit 1
fi
check "health ($HEALTH)" "true"
# 2. Health has version fields
HAS_VERSION=$(curl -sf "$HOST/api/health" | python3 -c "
import sys,json
d=json.load(sys.stdin)
print('true' if 'version' in d and 'channel' in d and 'schema_version' in d else 'false')
" 2>/dev/null || echo "false")
check "health version fields" "$HAS_VERSION"
# 3. Bootstrap (only works on fresh DB; 403 means users exist)
BOOT_HTTP=$(curl -s -o /tmp/smoke_boot.json -w "%{http_code}" -X POST "$HOST/auth/bootstrap" \
-H "Content-Type: application/json" \
-d '{"email":"smoke@test.local","name":"Smoke Test","password":"SmokeTest123!"}' 2>/dev/null || true)
if [ "$BOOT_HTTP" = "200" ]; then
TOKEN=$(python3 -c "import json; print(json.load(open('/tmp/smoke_boot.json'))['access_token'])" 2>/dev/null || echo "")
check "bootstrap (new admin)" "true"
elif [ "$BOOT_HTTP" = "403" ]; then
TOKEN="${SMOKE_TOKEN:-}"
echo " SKIP bootstrap (users exist)"
else
check "bootstrap (HTTP $BOOT_HTTP)" "false"
fi
# 4. Query SELECT 1 (requires auth)
if [ -n "$TOKEN" ]; then
QUERY_OK=$(curl -sf -X POST "$HOST/api/query" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"sql":"SELECT 1 as test"}' | python3 -c "
import sys,json
d=json.load(sys.stdin)
print('true' if len(d.get('rows',[])) > 0 else 'false')
" 2>/dev/null || echo "false")
check "query SELECT 1" "$QUERY_OK"
else
echo " SKIP query (no token)"
fi
# 5. Sync trigger
if [ -n "$TOKEN" ]; then
SYNC_HTTP=$(curl -s -o /dev/null -w "%{http_code}" -X POST "$HOST/api/sync/trigger" \
-H "Authorization: Bearer $TOKEN" 2>/dev/null || true)
if [[ "$SYNC_HTTP" =~ ^(200|202)$ ]]; then
check "sync trigger" "true"
else
check "sync trigger (HTTP $SYNC_HTTP)" "false"
fi
else
echo " SKIP sync (no token)"
fi
# 6. Post-sync health (wait briefly)
sleep 5
HEALTH2=$(curl -sf "$HOST/api/health" | python3 -c "import sys,json; print(json.load(sys.stdin)['status'])" 2>/dev/null || echo "unreachable")
if [ "$HEALTH2" = "unhealthy" ] || [ "$HEALTH2" = "unreachable" ]; then
check "post-sync health ($HEALTH2)" "false"
else
check "post-sync health ($HEALTH2)" "true"
fi
# Results
echo ""
echo "Results: $PASS passed, $FAIL failed"
[ "$FAIL" -eq 0 ] || exit 1
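Steps 1, 2 and 6 of the script pipe `/api/health` into inline `python3` snippets. The same decision logic as a standalone Python sketch is shown below, which may be handy if these checks ever move into pytest. The function names are hypothetical, and `None` stands in for a failed request (the `|| echo "unreachable"` branch).

```python
import json
from typing import Optional, Tuple

REQUIRED_HEALTH_FIELDS = ("version", "channel", "schema_version")
FATAL_STATUSES = ("unhealthy", "unreachable")


def evaluate_health(raw_body: Optional[str]) -> Tuple[str, bool]:
    """Return (status, has_version_fields) for a /api/health response body."""
    if raw_body is None:
        return "unreachable", False
    try:
        payload = json.loads(raw_body)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return "unreachable", False
    if not isinstance(payload, dict):
        return "unreachable", False
    has_fields = all(field in payload for field in REQUIRED_HEALTH_FIELDS)
    status = payload.get("status")
    # A body without "status" is treated like the script's failed-parse case.
    return (str(status) if status is not None else "unreachable"), has_fields


def is_fatal(status: str) -> bool:
    """Anything other than these two statuses counts as a pass, as in the script."""
    return status in FATAL_STATUSES
```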


@ -4,12 +4,16 @@ Provides get_system_db() for the system state database
and get_analytics_db() for the analytics database with parquet views.
"""

import logging
import os
import re
import shutil
from pathlib import Path

import duckdb

logger = logging.getLogger(__name__)

_SAFE_IDENTIFIER = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]{0,63}$")
SCHEMA_VERSION = 3
@ -260,6 +264,16 @@ def _ensure_schema(conn: duckdb.DuckDBPyConnection) -> None:
    """Create tables if they don't exist. Apply migrations if schema version changed."""
    current = get_schema_version(conn)
    if current < SCHEMA_VERSION:
        # Snapshot before migration for rollback support
        if current > 0:
            try:
                db_path = Path(os.environ.get("DATA_DIR", "./data")) / "state" / "system.duckdb"
                if db_path.exists():
                    snapshot = db_path.parent / "system.duckdb.pre-migrate"
                    shutil.copy2(str(db_path), str(snapshot))
                    logger.info("Pre-migration snapshot saved: %s", snapshot)
            except Exception as e:
                logger.warning("Could not create pre-migration snapshot: %s", e)
        conn.execute(_SYSTEM_SCHEMA)
        if current == 0:
            conn.execute(
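The snapshot rule in this hunk is: copy the file aside only when an existing (non-zero) schema version is about to be migrated, and log rather than fail if the copy does not work. Below is a standalone sketch of that rule, plus a hypothetical `restore_from_snapshot` helper to show what "rollback support" could look like; the commit itself adds no restore code. This sketch, like the hunk, does not address changes that DuckDB may still hold in its write-ahead log while a connection is open.

```python
import logging
import shutil
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)

SNAPSHOT_SUFFIX = ".pre-migrate"


def snapshot_path(db_path: Path) -> Path:
    return db_path.with_name(db_path.name + SNAPSHOT_SUFFIX)


def snapshot_before_migration(db_path: Path, current: int, target: int) -> Optional[Path]:
    """Copy db_path aside before migrating an existing DB; return the snapshot path or None."""
    if current == 0 or current >= target:
        return None  # fresh database, or nothing to migrate
    if not db_path.exists():
        return None
    snap = snapshot_path(db_path)
    try:
        shutil.copy2(db_path, snap)
    except OSError as exc:
        # As in the commit, a failed snapshot is logged, not fatal
        logger.warning("Could not create pre-migration snapshot: %s", exc)
        return None
    logger.info("Pre-migration snapshot saved: %s", snap)
    return snap


def restore_from_snapshot(db_path: Path) -> bool:
    """Hypothetical rollback helper: copy the snapshot back over the database file."""
    snap = snapshot_path(db_path)
    if not snap.exists():
        return False
    shutil.copy2(snap, db_path)
    return True
```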

tests/snapshots/openapi.json Normal file

File diff suppressed because it is too large.


@ -144,6 +144,205 @@ class TestGetAnalyticsDb:
        conn.close()


class TestMigrationSafety:
    """Tests for schema migration correctness, idempotency, and safety snapshots."""

    # Minimal v2 table_registry (no is_public column; that comes in v3)
    _V2_TABLE_REGISTRY = """
        CREATE TABLE table_registry (
            id VARCHAR PRIMARY KEY,
            name VARCHAR NOT NULL,
            source_type VARCHAR,
            bucket VARCHAR,
            source_table VARCHAR,
            sync_strategy VARCHAR DEFAULT 'full_refresh',
            query_mode VARCHAR DEFAULT 'local',
            sync_schedule VARCHAR,
            profile_after_sync BOOLEAN DEFAULT true,
            primary_key VARCHAR,
            folder VARCHAR,
            description TEXT,
            registered_by VARCHAR,
            registered_at TIMESTAMP DEFAULT current_timestamp
        );
    """

    def _create_v2_db(self, db_path):
        """Create a minimal v2-schema DuckDB file at db_path."""
        import duckdb as _duckdb

        db_path.parent.mkdir(parents=True, exist_ok=True)
        conn = _duckdb.connect(str(db_path))
        try:
            conn.execute(
                "CREATE TABLE schema_version (version INTEGER, applied_at TIMESTAMP DEFAULT current_timestamp);"
                "INSERT INTO schema_version (version) VALUES (2);"
            )
            conn.execute(self._V2_TABLE_REGISTRY)
            # Stub out remaining tables so _ensure_schema doesn't fail
            for ddl in [
                "CREATE TABLE IF NOT EXISTS users (id VARCHAR PRIMARY KEY, email VARCHAR)",
                "CREATE TABLE IF NOT EXISTS sync_state (table_id VARCHAR PRIMARY KEY)",
                "CREATE TABLE IF NOT EXISTS sync_history (id VARCHAR PRIMARY KEY, table_id VARCHAR)",
                "CREATE TABLE IF NOT EXISTS user_sync_settings (user_id VARCHAR, dataset VARCHAR, PRIMARY KEY(user_id, dataset))",
                "CREATE TABLE IF NOT EXISTS knowledge_items (id VARCHAR PRIMARY KEY, title VARCHAR)",
                "CREATE TABLE IF NOT EXISTS knowledge_votes (item_id VARCHAR, user_id VARCHAR, PRIMARY KEY(item_id, user_id))",
                "CREATE TABLE IF NOT EXISTS audit_log (id VARCHAR PRIMARY KEY, action VARCHAR)",
                "CREATE TABLE IF NOT EXISTS telegram_links (user_id VARCHAR PRIMARY KEY, chat_id BIGINT)",
                "CREATE TABLE IF NOT EXISTS pending_codes (code VARCHAR PRIMARY KEY, chat_id BIGINT)",
                "CREATE TABLE IF NOT EXISTS script_registry (id VARCHAR PRIMARY KEY, name VARCHAR, source TEXT)",
                "CREATE TABLE IF NOT EXISTS table_profiles (table_id VARCHAR PRIMARY KEY, profile JSON)",
                "CREATE TABLE IF NOT EXISTS dataset_permissions (user_id VARCHAR, dataset VARCHAR, PRIMARY KEY(user_id, dataset))",
            ]:
                conn.execute(ddl)
        finally:
            conn.close()

    def test_v2_to_v3_migration(self, tmp_path, monkeypatch):
        """v2 DB migrated to v3: schema_version=3 and is_public column added."""
        monkeypatch.setenv("DATA_DIR", str(tmp_path))
        import duckdb as _duckdb
        from src.db import _ensure_schema, get_schema_version

        db_path = tmp_path / "state" / "system.duckdb"
        self._create_v2_db(db_path)
        conn = _duckdb.connect(str(db_path))
        try:
            _ensure_schema(conn)
            assert get_schema_version(conn) == 3
            cols = {
                r[0]
                for r in conn.execute(
                    "SELECT column_name FROM information_schema.columns WHERE table_name='table_registry'"
                ).fetchall()
            }
            assert "is_public" in cols
        finally:
            conn.close()

    def test_migration_idempotency(self, tmp_path, monkeypatch):
        """Calling _ensure_schema twice on a fresh DB raises no error and leaves version at 3."""
        monkeypatch.setenv("DATA_DIR", str(tmp_path))
        import duckdb as _duckdb
        from src.db import _ensure_schema, get_schema_version, SCHEMA_VERSION

        db_path = tmp_path / "state" / "system.duckdb"
        db_path.parent.mkdir(parents=True, exist_ok=True)
        conn = _duckdb.connect(str(db_path))
        try:
            _ensure_schema(conn)
            _ensure_schema(conn)
            assert get_schema_version(conn) == SCHEMA_VERSION
        finally:
            conn.close()

    def test_migration_preserves_data(self, tmp_path, monkeypatch):
        """Data inserted before migration is preserved after migration runs."""
        monkeypatch.setenv("DATA_DIR", str(tmp_path))
        import duckdb as _duckdb
        from src.db import _ensure_schema, get_schema_version, _SYSTEM_SCHEMA

        db_path = tmp_path / "state" / "system.duckdb"
        db_path.parent.mkdir(parents=True, exist_ok=True)
        conn = _duckdb.connect(str(db_path))
        try:
            # Build a v1 schema manually
            conn.execute(
                "CREATE TABLE schema_version (version INTEGER, applied_at TIMESTAMP DEFAULT current_timestamp);"
                "INSERT INTO schema_version (version) VALUES (1);"
            )
            conn.execute("""
                CREATE TABLE table_registry (
                    id VARCHAR PRIMARY KEY,
                    name VARCHAR NOT NULL,
                    folder VARCHAR,
                    sync_strategy VARCHAR,
                    primary_key VARCHAR,
                    description TEXT,
                    registered_by VARCHAR,
                    registered_at TIMESTAMP DEFAULT current_timestamp
                );
            """)
            conn.execute(
                "INSERT INTO table_registry (id, name, description) VALUES ('row1', 'MyTable', 'kept')"
            )
            # Stub remaining tables
            for ddl in [
                "CREATE TABLE IF NOT EXISTS users (id VARCHAR PRIMARY KEY, email VARCHAR)",
                "CREATE TABLE IF NOT EXISTS sync_state (table_id VARCHAR PRIMARY KEY)",
                "CREATE TABLE IF NOT EXISTS sync_history (id VARCHAR PRIMARY KEY, table_id VARCHAR)",
                "CREATE TABLE IF NOT EXISTS user_sync_settings (user_id VARCHAR, dataset VARCHAR, PRIMARY KEY(user_id, dataset))",
                "CREATE TABLE IF NOT EXISTS knowledge_items (id VARCHAR PRIMARY KEY, title VARCHAR)",
                "CREATE TABLE IF NOT EXISTS knowledge_votes (item_id VARCHAR, user_id VARCHAR, PRIMARY KEY(item_id, user_id))",
                "CREATE TABLE IF NOT EXISTS audit_log (id VARCHAR PRIMARY KEY, action VARCHAR)",
                "CREATE TABLE IF NOT EXISTS telegram_links (user_id VARCHAR PRIMARY KEY, chat_id BIGINT)",
                "CREATE TABLE IF NOT EXISTS pending_codes (code VARCHAR PRIMARY KEY, chat_id BIGINT)",
                "CREATE TABLE IF NOT EXISTS script_registry (id VARCHAR PRIMARY KEY, name VARCHAR, source TEXT)",
                "CREATE TABLE IF NOT EXISTS table_profiles (table_id VARCHAR PRIMARY KEY, profile JSON)",
                "CREATE TABLE IF NOT EXISTS dataset_permissions (user_id VARCHAR, dataset VARCHAR, PRIMARY KEY(user_id, dataset))",
            ]:
                conn.execute(ddl)
            _ensure_schema(conn)
            assert get_schema_version(conn) == 3
            row = conn.execute(
                "SELECT name, description FROM table_registry WHERE id='row1'"
            ).fetchone()
            assert row is not None, "Pre-migration row was lost"
            assert row[0] == "MyTable"
            assert row[1] == "kept"
        finally:
            conn.close()

    def test_pre_migration_snapshot_created(self, tmp_path, monkeypatch):
        """A pre-migrate snapshot is written when migrating an existing (non-fresh) DB."""
        monkeypatch.setenv("DATA_DIR", str(tmp_path))
        from src.db import get_system_db

        # Create a v2 DB at the expected path before calling get_system_db
        db_path = tmp_path / "state" / "system.duckdb"
        self._create_v2_db(db_path)
        conn = get_system_db()
        try:
            snapshot = tmp_path / "state" / "system.duckdb.pre-migrate"
            assert snapshot.exists(), "Pre-migration snapshot was not created"
        finally:
            conn.close()

    def test_no_snapshot_on_fresh_db(self, tmp_path, monkeypatch):
        """No pre-migrate snapshot is created when initialising a brand-new DB."""
        monkeypatch.setenv("DATA_DIR", str(tmp_path))
        from src.db import get_system_db

        conn = get_system_db()
        try:
            snapshot = tmp_path / "state" / "system.duckdb.pre-migrate"
            assert not snapshot.exists(), "Snapshot should not exist for a fresh DB"
        finally:
            conn.close()

    def test_future_version_is_noop(self, tmp_path, monkeypatch):
        """_ensure_schema does nothing when schema_version > SCHEMA_VERSION."""
        monkeypatch.setenv("DATA_DIR", str(tmp_path))
        import duckdb as _duckdb
        from src.db import _ensure_schema, get_schema_version

        db_path = tmp_path / "state" / "system.duckdb"
        db_path.parent.mkdir(parents=True, exist_ok=True)
        conn = _duckdb.connect(str(db_path))
        try:
            conn.execute(
                "CREATE TABLE schema_version (version INTEGER, applied_at TIMESTAMP DEFAULT current_timestamp);"
                "INSERT INTO schema_version (version) VALUES (99);"
            )
            _ensure_schema(conn)
            assert get_schema_version(conn) == 99
        finally:
            conn.close()


class TestGetAnalyticsDbReadonly:
    def test_analytics_readonly_rejects_malicious_dir_name(self, tmp_path, monkeypatch):
        """Directories with SQL-injection chars in their name are skipped."""

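The six migration tests pin down four properties of `_ensure_schema`: a v2 database gains `is_public`, repeated calls are harmless, existing rows survive, and a newer-than-known version is left alone. Below is a compact illustration of a version-gated migration with those properties. It swaps in the standard-library `sqlite3` module for DuckDB so it runs anywhere, and the table layout and `MIGRATIONS` mapping are simplified and hypothetical, not the project's `_SYSTEM_SCHEMA`.

```python
import sqlite3

SCHEMA_VERSION = 3

# Hypothetical upgrade steps keyed by the version they produce.
MIGRATIONS = {
    3: ["ALTER TABLE table_registry ADD COLUMN is_public BOOLEAN DEFAULT 1"],
}

FRESH_SCHEMA = (
    "CREATE TABLE IF NOT EXISTS table_registry ("
    " id TEXT PRIMARY KEY, name TEXT NOT NULL, is_public BOOLEAN DEFAULT 1)"
)


def get_schema_version(conn: sqlite3.Connection) -> int:
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0  # MAX() over an empty table is NULL, i.e. a fresh DB


def ensure_schema(conn: sqlite3.Connection) -> int:
    """Bring the DB to SCHEMA_VERSION; safe to call repeatedly. Returns the resulting version."""
    current = get_schema_version(conn)
    if current >= SCHEMA_VERSION:
        # Up to date, or written by a newer release: never downgrade or touch it.
        return current
    if current == 0:
        conn.execute(FRESH_SCHEMA)  # a fresh DB gets the latest layout directly
    else:
        for version in range(current + 1, SCHEMA_VERSION + 1):
            for statement in MIGRATIONS.get(version, []):
                conn.execute(statement)  # ALTER TABLE keeps existing rows intact
    conn.execute("INSERT INTO schema_version (version) VALUES (?)", (SCHEMA_VERSION,))
    conn.commit()
    return SCHEMA_VERSION
```

Idempotency comes from gating every step on the recorded version: after the first successful run the early return makes later calls no-ops.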

@ -0,0 +1,73 @@
"""OpenAPI snapshot test — detect breaking API changes.

Compares the current app's OpenAPI schema against a committed snapshot.
Fails if any path or HTTP method has been removed (breaking change).

To update the snapshot after an intentional change:

    make update-openapi-snapshot
"""
import json
import os
from pathlib import Path

import pytest

SNAPSHOT_PATH = Path(__file__).parent / "snapshots" / "openapi.json"

# OpenAPI path-item keys that are operations; other keys (e.g. 'parameters') are ignored
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


@pytest.fixture(scope="module")
def current_schema():
    os.environ.setdefault("TESTING", "1")
    from app.main import create_app

    app = create_app()
    return app.openapi()


def test_snapshot_exists():
    """Committed OpenAPI snapshot must exist."""
    assert SNAPSHOT_PATH.exists(), (
        "No OpenAPI snapshot found. Generate one with: make update-openapi-snapshot"
    )


def test_no_removed_paths(current_schema):
    """No API paths should be removed compared to the snapshot."""
    if not SNAPSHOT_PATH.exists():
        pytest.skip("No snapshot to compare against")
    snapshot = json.loads(SNAPSHOT_PATH.read_text())
    current_paths = set(current_schema.get("paths", {}))
    snapshot_paths = set(snapshot.get("paths", {}))
    removed = snapshot_paths - current_paths
    assert not removed, (
        f"BREAKING: {len(removed)} API path(s) removed: {sorted(removed)}\n"
        "If intentional, run: make update-openapi-snapshot"
    )


def test_no_removed_methods(current_schema):
    """No HTTP methods should be removed from existing paths."""
    if not SNAPSHOT_PATH.exists():
        pytest.skip("No snapshot to compare against")
    snapshot = json.loads(SNAPSHOT_PATH.read_text())
    current_paths = current_schema.get("paths", {})
    snapshot_paths = snapshot.get("paths", {})
    breaking = []
    # Sorted so the failure message is deterministic
    for path in sorted(set(snapshot_paths) & set(current_paths)):
        removed_http = (set(snapshot_paths[path]) - set(current_paths[path])) & HTTP_METHODS
        if removed_http:
            breaking.append(f"  {path}: {sorted(removed_http)}")
    assert not breaking, (
        f"BREAKING: HTTP methods removed from {len(breaking)} path(s):\n"
        + "\n".join(breaking)
        + "\nIf intentional, run: make update-openapi-snapshot"
    )
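Both comparison tests reduce to one set difference per level (paths, then operations within shared paths). Below is a sketch of that logic as a single reusable function that returns human-readable breaking changes. `find_breaking_changes` is hypothetical; the test file keeps two separate tests. Like the tests, it only detects removals: a new required parameter or a changed response schema would pass unnoticed.

```python
from typing import Any, Dict, List

# OpenAPI path-item keys that denote operations; keys such as "parameters" are ignored.
HTTP_METHODS = frozenset({"get", "put", "post", "delete", "options", "head", "patch", "trace"})


def find_breaking_changes(snapshot: Dict[str, Any], current: Dict[str, Any]) -> List[str]:
    """List paths and operations present in the snapshot but missing from the current schema."""
    old_paths = snapshot.get("paths", {})
    new_paths = current.get("paths", {})
    problems = [f"removed path {path}" for path in sorted(set(old_paths) - set(new_paths))]
    for path in sorted(set(old_paths) & set(new_paths)):
        removed = (set(old_paths[path]) - set(new_paths[path])) & HTTP_METHODS
        problems.extend(f"removed {method.upper()} {path}" for method in sorted(removed))
    return problems
```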


@ -304,26 +304,37 @@ class TestJwtClaims:
# ---- JWT Secret Hardening ---- # ---- JWT Secret Hardening ----
class TestJwtSecretHardening: class TestJwtSecretHardening:
def test_raises_without_jwt_secret_in_non_test_env(self): def test_auto_generates_jwt_secret_when_absent(self, tmp_path):
"""Module-level code must raise RuntimeError when JWT_SECRET_KEY is absent """When JWT_SECRET_KEY is absent and TESTING is not set,
and TESTING is not set, preventing accidental production deploys with no secret.""" the secret is auto-generated and persisted to a file."""
saved_key = os.environ.pop("JWT_SECRET_KEY", None) saved_key = os.environ.pop("JWT_SECRET_KEY", None)
saved_testing = os.environ.pop("TESTING", None) saved_testing = os.environ.pop("TESTING", None)
# Eject any cached module so the re-import re-executes module-level code saved_data_dir = os.environ.get("DATA_DIR")
os.environ["DATA_DIR"] = str(tmp_path)
# Eject cached modules so the re-import re-executes module-level code
sys.modules.pop("app.auth.jwt", None) sys.modules.pop("app.auth.jwt", None)
sys.modules.pop("app.secrets", None)
try: try:
with pytest.raises(RuntimeError, match="JWT_SECRET_KEY environment variable is required"):
importlib.import_module("app.auth.jwt") importlib.import_module("app.auth.jwt")
secret_file = tmp_path / "state" / ".jwt_secret"
assert secret_file.exists(), "JWT secret file should be auto-generated"
secret = secret_file.read_text().strip()
assert len(secret) == 64, "Auto-generated secret should be 64 hex chars (32 bytes)"
finally: finally:
# Restore environment before re-importing so the module loads cleanly # Restore environment before re-importing so the module loads cleanly
if saved_key is not None: if saved_key is not None:
os.environ["JWT_SECRET_KEY"] = saved_key os.environ["JWT_SECRET_KEY"] = saved_key
if saved_testing is not None: if saved_testing is not None:
os.environ["TESTING"] = saved_testing os.environ["TESTING"] = saved_testing
if saved_data_dir is not None:
os.environ["DATA_DIR"] = saved_data_dir
else:
os.environ.pop("DATA_DIR", None)
# If neither was set (bare test run), use TESTING flag so reload works # If neither was set (bare test run), use TESTING flag so reload works
if saved_key is None and saved_testing is None: if saved_key is None and saved_testing is None:
os.environ["TESTING"] = "1" os.environ["TESTING"] = "1"
sys.modules.pop("app.auth.jwt", None) sys.modules.pop("app.auth.jwt", None)
sys.modules.pop("app.secrets", None)
importlib.import_module("app.auth.jwt") importlib.import_module("app.auth.jwt")
# Clean up the temporary TESTING flag if we added it # Clean up the temporary TESTING flag if we added it
if saved_key is None and saved_testing is None: if saved_key is None and saved_testing is None: