From f593a151fc94b7abf48fc7ecf38443c6015b18ec Mon Sep 17 00:00:00 2001 From: Petr Simecek Date: Wed, 22 Apr 2026 16:31:13 +0200 Subject: [PATCH] docs(security): add padak-security.md audit report (#35) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * docs(security): add padak-security.md — full audit report from 2026-04-22 Four-agent audit (secrets/SQLi/authz/SSRF, auth flows, UI wiring, data layer) deduped into one document. Top 5 to fix first, second/third/fourth tier by real exploitability, verified non-issues so we don't re-open them, and coverage gaps where automated scanners / pytest / Jira connector / infra were not touched. Missing /auth/password/reset is already tracked in padak/keboola_agent_cli#206; other top items (script sandbox RCE, rate-limit, backslash open-redirect, SSRF) still need their own issues. * docs(security): rephrase methodology description Replace "four parallel agents" with "parallel review passes over four scope areas" — same meaning, removes the overlap with agentic-AI terminology. --- docs/padak-security.md | 188 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 188 insertions(+) create mode 100644 docs/padak-security.md diff --git a/docs/padak-security.md b/docs/padak-security.md new file mode 100644 index 0000000..8b2fafa --- /dev/null +++ b/docs/padak-security.md @@ -0,0 +1,188 @@ +# Security audit — Agnes AI Data Analyst + +**Date:** 2026-04-22 +**Branch audited:** `main` at commit `cbb7733` +**Method:** parallel review passes over four scope areas — (1) secrets/SQLi/authz/SSRF, (2) auth flows & route wiring, (3) templates & UI wiring/XSS, (4) data layer & config & dead code. Findings deduped across the passes, severities adjusted to real-world exploitability. + +Known issues already in flight are marked with their tracking links so we do not re-open them. + +--- + +## Top 5 — fix first + +### 1. `[CRITICAL]` Script-API sandbox escape → RCE for the `analyst` role + +- **File:** `app/api/scripts.py:116–180` +- **Required role:** `Role.ANALYST` (not admin) +- **Trigger:** `POST /api/scripts/run` with body: + ```python + ().__class__.__base__.__subclasses__()[N].__init__.__globals__["system"]("id") + ``` +- **Why existing guards miss it:** the AST walker and the string allowlist block direct `import os` / `exec`, but neither stops attribute traversal through Python's class hierarchy (`__class__ → __base__ → __subclasses__() → __globals__`). +- **Impact:** arbitrary OS commands under the FastAPI process uid. Gives access to `DATA_DIR` (DuckDB files, cached parquet), `.jwt_secret`, env vars, and any credentials mounted into the container. +- **Fix (minimum):** add to the string-pattern blocklist: `__subclasses__`, `__globals__`, `__mro__`, `__bases__`, `__class__`, `__dict__`, `__code__`. In the AST walker, reject any `ast.Attribute` whose `attr` starts and ends with `__`. +- **Fix (correct):** do not run untrusted Python in-process. Either drop server-side script execution entirely, or run the sandbox in `nsjail`/gVisor/Pyodide in isolation, or gate the endpoint behind `Role.ADMIN` if it must stay. +- **Confidence:** broken (verified analytically). + +### 2. `[HIGH]` `/auth/password/reset` endpoint missing — "Forgot Password?" returns 404 + +- **Template reference:** `app/web/templates/login_email.html:47` — `
` +- **URL map:** `app/web/router.py:119` — `"password_auth.reset_request": "/auth/password/reset"` +- **Backend:** `app/auth/providers/password.py` only registers `/login`, `/login/web`, `/setup`. No `/reset` handler is wired. +- **Related dead code:** templates `password_reset.html` and `password_setup.html` exist but no route renders them — indicates an abandoned reset flow. +- **Tracking:** [padak/keboola_agent_cli#206](https://github.com/padak/keboola_agent_cli/issues/206) +- **Confidence:** broken. + +### 3. `[HIGH]` No rate limiting on any auth endpoint + +- **Files:** `app/auth/providers/password.py:36`, `app/auth/providers/email.py:53`, `app/auth/router.py:58`, `app/main.py` (middleware). +- **Evidence:** `grep -rn "slowapi\|limiter\|throttle\|ratelimit"` — zero hits in `app/`. +- **Impact:** + - Password brute-force against `POST /auth/password/login` and `POST /auth/token`. + - Email bombing on `POST /auth/email/send-link` — attacker floods SMTP/SendGrid quota by looping with random recipients. + - Enumeration of bootstrap state via `POST /auth/bootstrap`. +- **Fix:** add `slowapi` with `@limiter.limit("10/minute")` on the four endpoints above, `get_remote_address` as key with proxy-aware client-IP extraction (project already has `_client_ip` helper in `app/auth/dependencies.py`). +- **Confidence:** broken. + +### 4. `[HIGH]` Open-redirect bypass via backslash in `safe_next_path` + +- **Files:** `app/auth/_common.py:10-24`, `app/auth/providers/password.py:95-96`, `app/web/router.py:218-219`. +- **Trigger:** `https://agnes/login?next=/\evil.com` +- **Why the current check fails:** Python sees `/\` (not `//`) and the guard `startswith("//")` does not fire. Every major browser normalizes `\` to `/` in URL paths, so `Location: /\evil.com` resolves as `//evil.com` — a cross-origin redirect. +- **Impact:** phishing — attacker crafts a link on the victim's Agnes URL, lands them on a lookalike after login. +- **Fix:** + ```python + if len(candidate) > 1 and candidate[1] in ("/", "\\"): + return default + ``` + Existing tests (`tests/test_web_ui.py:270-296`) cover `//evil.example/` but not `/\evil.com` — add the case. +- **Confidence:** broken. + +### 5. `[HIGH]` SSRF in `/api/admin/configure` — `keboola_url` accepted as-is + +- **File:** `app/api/admin.py:163–282`; the URL is passed straight to `KeboolaClient.test_connection()` which issues a GET request. +- **Trigger:** a compromised (or insider-threat) admin sends `{"keboola_url": "http://169.254.169.254/latest/meta-data/"}` or `http://localhost:5432/`. +- **Impact:** server as SSRF proxy to the private network — GCP/AWS instance metadata service (IAM tokens), internal databases, LAN services. +- **Why a domain allowlist is wrong:** `keboola_url` is the URL of the Keboola stack the Agnes instance connects to, **not** the Agnes host. Valid values include `connection.keboola.com`, `connection.eu-central-1.keboola.com`, `connection.europe-west3.gcp.keboola.com`, plus potentially self-hosted private Keboola stacks. A `.keboola.com` suffix check would break legitimate deployments. +- **Correct fix:** enforce `https://` scheme, then resolve the hostname and reject any result in a private / loopback / link-local / reserved block. Still allows any public HTTPS host. + ```python + from urllib.parse import urlparse + import ipaddress, socket + + def _validate_stack_url(url: str) -> str | None: + try: + p = urlparse(url) + except Exception: + return "not a valid URL" + if p.scheme != "https": + return "must use https" + if not p.hostname: + return "missing hostname" + try: + infos = socket.getaddrinfo(p.hostname, None) + except socket.gaierror: + return f"cannot resolve {p.hostname}" + for fam, _, _, _, sa in infos: + try: + ip = ipaddress.ip_address(sa[0]) + except ValueError: + continue + if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved: + return f"{p.hostname} resolves to a private/loopback address ({ip})" + return None + ``` +- **Residual risk:** DNS rebinding (hostname resolves to a public IP at validation time, then to loopback at request time). Out of scope for this fix — would need an outbound egress proxy with an IP-level ACL. +- **Confidence:** broken. + +--- + +## Second tier (HIGH) + +| # | Category | File:line | Summary | Confidence | +|---|---|---|---|---| +| 6 | race | `app/auth/providers/email.py:106-128` | `_consume_token` read-validate-clear is not atomic. Two concurrent clicks on the same magic link can both issue JWTs. Fix: `UPDATE users SET reset_token=NULL WHERE id=? AND reset_token=?` and check `rowcount == 1` before issuing the JWT. | broken | +| 7 | bootstrap | `app/auth/router.py:103-158` | Check is "any user with `password_hash`". If a seed admin exists without a password (e.g. created by `LOCAL_DEV_MODE=1` then redeployed without it, or a `SEED_ADMIN_EMAIL` without `SEED_ADMIN_PASSWORD`), `/auth/bootstrap` stays open and any caller can register a new admin account. Fix: check "any user at all" or require an explicit admin token for bootstrap. | broken | +| 8 | cookie | `app/main.py:61` | `SessionMiddleware(secret_key=...)` without `max_age` → OAuth session cookie expires only when the browser closes. Also `https_only` is not forced in production. Fix: `max_age=3600` and `https_only=bool(os.environ.get("DOMAIN"))`. | broken | +| 9 | sqli-adjacent | `connectors/keboola/extractor.py:104-106, 128` | `CREATE OR REPLACE VIEW "{table_name}" AS SELECT * FROM kbc."{bucket}"."{source_table}"` — f-string DDL with identifier quotes. Inputs come from `table_registry` (admin-controlled), but relying on quote-escaping for DDL is fragile. Fix: validate each identifier against `^[a-zA-Z0-9_]{1,63}$` before interpolation. | suspicious | +| 10 | datetime | `connectors/keboola/client.py:183` | Same offset-naive vs. offset-aware bug as the one we just fixed in `app/auth/providers/email.py`. `datetime.now() - cached_time` crashes or treats every cache entry as stale, causing needless API calls. Fix: use `datetime.now(timezone.utc)` on both sides. | broken | + +--- + +## Third tier (MEDIUM) + +| # | Category | File:line | Summary | +|---|---|---|---| +| 11 | timing | `email.py:113`, `password.py:117`, `dependencies.py:148` | Token / `token_hash` compared with `!=` instead of `hmac.compare_digest`. 32-byte entropy mitigates the brute side, but constant-time comparison is a zero-cost hardening. | +| 12 | xss-latent | `app/web/templates/_theme.html:12` | `{{ var }}: {{ val }};` inside a `