* fix(ops): fail-fast guard in agnes-auto-upgrade: refuse to start containers if config disk not mounted

Companion to keboola/agnes-the-ai-analyst-infra#62. Same incident: foundryai-development 2026-04-30, marketplaces / DuckDB / session secret written to /data (sdb) instead of the config disk (sdc), wiped on next container recreate.

## Why an app-side guard

agnes-auto-upgrade.sh fires every 5 min on every VM. If `/data/state` is not on the config disk (because of the propagation regression fixed by the infra PR, or the boot-time udev race fixed by infra #58, or any future mount-loss path), this script previously ran `docker compose up -d` anyway, and the app silently wrote state onto the wrong disk. Next recreate, that state was gone.

The boot-time fixes in infra are preventive. This is the runtime backstop.

## Behavior

Before the existing pull/up logic, when /dev/disk/by-id/google-config-disk exists on the VM:

1. Up to 3 mount-and-verify attempts with backoff (2s, 4s, 6s).
   - Mount the config disk if /data/state is not a mountpoint.
   - Detect mismatch: if /data/state is mounted from the wrong source, umount and retry.
2. After the loop, assert findmnt source matches the config disk.
   - On mismatch: `logger -t agnes-auto-upgrade FATAL` + exit 1. systemd marks the service failed; no docker compose action runs; existing containers (if any) keep running on stale state, but no new write lands on the wrong disk.
3. Once verified mounted: re-apply `mount --make-rprivate /data /data/state` on every run. Idempotent. Guards against propagation regressions sneaking back in via future docker / kernel changes.

VMs without a config disk (foundryai-poc, single-disk legacy) skip the whole block via the `if [ -e $CONFIG_DEVICE ]` guard.

## Tested

Patched script installed on foundryai-development as a hotfix; manual run post-migration was a no-op (digest unchanged); /data/state stayed on sdc across a full `docker compose down + up -d` cycle.
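The mount-and-verify flow from the Behavior section can be sketched roughly as follows. This is a minimal illustration, not the contents of agnes-auto-upgrade.sh: the function names (`backoff_seconds`, `source_matches`, `current_source`, `ensure_config_mount`), the `STATE_DIR` variable, and the FATAL message text are assumptions made for the example, and the `--make-rprivate` step is applied to `STATE_DIR` only as a simplification.

```shell
# Sketch only: names and message text are illustrative assumptions.
CONFIG_DEVICE="${CONFIG_DEVICE:-/dev/disk/by-id/google-config-disk}"
STATE_DIR="${STATE_DIR:-/data/state}"

# Linear backoff: attempt 1 -> 2s, attempt 2 -> 4s, attempt 3 -> 6s.
backoff_seconds() {
  echo $(( $1 * 2 ))
}

# Succeeds when the reported mount source is the expected device.
# Both sides are resolved so a by-id symlink compares equal to /dev/sdX.
source_matches() {
  local actual="$1" expected="$2"
  [ -n "$actual" ] && [ "$(readlink -f "$actual")" = "$(readlink -f "$expected")" ]
}

# Prints the mount source of STATE_DIR, or nothing if it is not a mountpoint.
current_source() {
  findmnt -n -o SOURCE --mountpoint "$STATE_DIR" 2>/dev/null
}

ensure_config_mount() {
  local attempt src
  for attempt in 1 2 3; do
    src="$(current_source)"
    source_matches "$src" "$CONFIG_DEVICE" && break
    # Mounted from the wrong source: unmount before trying again.
    if [ -n "$src" ]; then umount "$STATE_DIR" 2>/dev/null; fi
    mount "$CONFIG_DEVICE" "$STATE_DIR" 2>/dev/null
    sleep "$(backoff_seconds "$attempt")"
  done

  # Final assertion: refuse to continue if the source is still wrong.
  src="$(current_source)"
  if ! source_matches "$src" "$CONFIG_DEVICE"; then
    logger -t agnes-auto-upgrade "FATAL: $STATE_DIR is not mounted from $CONFIG_DEVICE"
    return 1
  fi

  # Idempotent; safe to re-apply on every run.
  mount --make-rprivate "$STATE_DIR"
}

# Intended call site, before any docker compose action (needs root):
#   if [ -e "$CONFIG_DEVICE" ]; then ensure_config_mount || exit 1; fi
```

Comparing resolved paths rather than raw strings matters here because `findmnt` typically reports the kernel device name (for example `/dev/sdc`), while the guard is keyed on the stable by-id symlink.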
## Rollout

- This file is fetched by infra startup.sh from raw.githubusercontent.com/keboola/agnes-the-ai-analyst/main on every boot. Once merged to main, all VMs pick up the new script on their next boot; no infra recreate needed.
- For immediate rollout to running VMs without waiting for next boot: `scp scripts/ops/agnes-auto-upgrade.sh <vm>:/tmp/ && ssh <vm> sudo install -m755 -o root -g root /tmp/agnes-auto-upgrade.sh /usr/local/bin/agnes-auto-upgrade.sh` (already done on foundryai-development).

* chore: vendor-agnostic comment + changelog text

Drop customer-specific VM names from the script comment and CHANGELOG entry. The OSS distribution should not name a particular operator's hosts; the technical description already conveys why the guard exists.

* fix(ops): suppress mount stderr in retry loop

Match the rest of the script's error-tolerant idiom (2>/dev/null). Mount failures in the cold-boot udev race the loop is designed to handle gracefully should not flow to stdout; cron would mail on every transient retry.

Devin BUG_0001 on PR #146.

* fix(changelog): move auto-upgrade entry to [Unreleased]

Entry landed under v0.20.0 because that section was [Unreleased] when this branch first opened; releases v0.21–v0.24 cut in the meantime stranded it inside an already-released section. Move it back where new entries belong.

Devin BUG_0001 on PR #146.

* fix(infra): single-source agnes-auto-upgrade.sh via curl from main

Replace the inline heredoc copy of the auto-upgrade script in the customer-instance Terraform startup template with a curl fetch from raw.githubusercontent.com on every boot. The inline copy had drifted several iterations behind canonical scripts/ops/agnes-auto-upgrade.sh (missing TLS overlay detection, array-form COMPOSE_FILES, and now the config-disk fail-fast guard from this PR).

Devin ANALYSIS_0001 on PR #146.
* fix(infra): fetch docker-compose.tls.yml unconditionally + document coupling

The canonical agnes-auto-upgrade.sh from main detects TLS at runtime via cert files on disk, regardless of the TLS_MODE Terraform variable. Certs can appear after boot via agnes-tls-rotate.sh or manual provisioning, and the cron job would then fail every 5 min under 'set -euo pipefail' because docker-compose.tls.yml was never fetched.

Also document the main-vs-COMPOSE_REF coupling: when the canonical script references a new compose file, the fetch list above must be updated to match; pinned-ref VMs would otherwise break.

Devin BUG_0001 + ANALYSIS_0001 on PR #146.

* fix(ops,infra): unconditional Caddyfile + skip tls overlay if missing

Caddyfile fetch now matches docker-compose.tls.yml: unconditional in startup-script.sh.tpl. Without it, Docker would auto-create an empty directory at the bind-mount target and Caddy would crash-loop while the tls overlay has already closed :8000, making the app unreachable on any non-caddy VM where certs land via rotate or manual provisioning.

Defensive layer: agnes-auto-upgrade.sh now also requires Caddyfile to exist (size > 0) before activating the tls profile, with a WARN log if it's missing. Belt-and-suspenders so the failure mode is contained even when the script is deployed by some other path (not just the customer-instance TF module).

Devin BUG_0001 on PR #146.

* chore(release): cut 0.25.0

---------

Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
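As an illustration of the Caddyfile defensive layer described in the `fix(ops,infra)` entry above, the check might look something like the sketch below. The function name `tls_overlay_ready`, its parameters, and the warning text are hypothetical; the actual cert paths and log format used by agnes-auto-upgrade.sh are not stated in the commit message and are not assumed here.

```shell
# Sketch only: tls_overlay_ready and its arguments are hypothetical names.
# $1 = directory holding the compose files and Caddyfile
# $2 = path to a TLS certificate file whose presence signals TLS mode
tls_overlay_ready() {
  local dir="$1" cert="$2" f

  # No certificate on disk: stay on the plain (non-TLS) compose setup.
  [ -f "$cert" ] && [ -s "$cert" ] || return 1

  for f in docker-compose.tls.yml Caddyfile; do
    # -f rejects the empty directory Docker creates for a missing
    # bind-mount source; -s rejects a zero-byte file.
    if [ ! -f "$dir/$f" ] || [ ! -s "$dir/$f" ]; then
      echo "WARN: cert present but $dir/$f is missing or empty; skipping tls overlay" >&2
      return 1
    fi
  done
  return 0
}

# Possible call site in a bash script that keeps compose flags in an array:
#   COMPOSE_FILES=(-f docker-compose.yml)
#   if tls_overlay_ready "$PWD" "$CERT_PATH"; then
#     COMPOSE_FILES+=(-f docker-compose.tls.yml)
#   fi
```

The explicit `-f` test is the part worth noting: a directory usually reports a non-zero size, so `-s` alone would accept the auto-created directory that the commit message identifies as the cause of the crash-loop.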
100 lines
3.1 KiB
TOML
[project]
name = "agnes-the-ai-analyst"
version = "0.25.0"
description = "Agnes — AI Data Analyst platform for AI analytical systems"
requires-python = ">=3.11,<3.14"
license = "MIT"
readme = "README.md"

dependencies = [
    # Core database
    "duckdb>=0.9.0",
    # Web framework (FastAPI)
    "fastapi>=0.115.0",
    "uvicorn[standard]>=0.32.0",
    "python-multipart>=0.0.26",
    "jinja2>=3.1.0",
    "starlette>=0.41.0",
    # Authentication
    "PyJWT>=2.8.0",
    "itsdangerous>=2.1.0",
    "authlib>=1.6.11",
    "argon2-cffi>=23.1.0",
    # HTTP client
    "httpx>=0.27.0",
    # CLI
    "typer>=0.12.0",
    "rich>=13.0.0",
    # Configuration
    "python-dotenv>=1.0.0",
    "pyyaml>=6.0",
    # Data processing
    "pandas>=2.0.0",
    "pyarrow>=12.0.0",
    "pytz>=2024.1",
    # SQL parsing: server-side WHERE validator for /api/v2/scan (app/api/where_validator.py)
    # Minimum 30.x: older versions had walk() yielding (node, parent, key)
    # tuples instead of expression nodes, which would silently bypass the
    # WHERE-validator structural checks (isinstance(tuple, exp.Subquery)
    # is always False). 30.x yields nodes directly.
    "sqlglot>=30.0.0",
    # Data source connectors
    "google-cloud-bigquery>=3.0.0",
    "google-cloud-bigquery-storage>=2.0.0",
    # Google Workspace Cloud Identity / Admin SDK (Workspace group membership sync)
    "google-api-python-client>=2.0.0",
    # Profiler visualizations
    "matplotlib>=3.8.0",
    "numpy>=1.24.0",
    # Claude Code marketplace endpoint: pure-Python git server mounted in FastAPI
    "dulwich>=0.22.0",
    "a2wsgi>=1.10.0",
    # In-process TTL cache for marketplace etag (transitively present via
    # google-auth, declared explicitly here because we depend on it directly).
    "cachetools>=5.3.0",
]

[project.optional-dependencies]
# keboola-legacy: install kbcstorage>=0.9.0 manually if you need the legacy
# Keboola client fallback (primary path uses DuckDB Keboola extension)
dev = [
    "pytest>=9.0.0",
    "pytest-timeout>=2.0.0",
    "pytest-xdist>=3.0.0",
    "faker>=24.0.0",
    "anthropic>=0.30.0",
    "openai>=1.30.0",
    # jsonschema validates the corporate-memory extraction-tool golden fixtures
    # under tests/test_corporate_memory_v1.py (extraction.json, correction.json,
    # confidence_calibration.json). Production code does not depend on it.
    "jsonschema>=4.0.0",
    # FastAPI debug toolbar: gated behind DEBUG=1 env var in app/main.py.
    # Provides per-request panels (headers, routes, timer, profiling, etc.)
    # for local development. Never loaded in production (no DEBUG=1 there).
    "fastapi-debug-toolbar>=0.6.3",
]

[project.scripts]
da = "cli.main:app"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["app", "src", "connectors", "cli", "services", "config"]

[tool.ruff]
line-length = 120
target-version = "py313"

[tool.uv]
dev-dependencies = [
    "pytest>=9.0.0",
    "pytest-timeout>=2.0.0",
    "pytest-xdist>=3.0.0",
    "faker>=24.0.0",
    "anthropic>=0.30.0",
    "openai>=1.30.0",
    "fastapi-debug-toolbar>=0.6.3",
]