agnes-the-ai-analyst

Author	SHA1	Message	Date
ZdenekSrotyr	5f6bb7a4b2	fix(security+ops) + release(0.12.1): #82 #85 #87 hardening + cut 0.12.1 (#104 ) * fix(security+ops): #82 #85 #87 — auth hardening, API validation, deploy posture Security and operational hardening across three issue groups: - M23: docker-compose.override.yml → docker-compose.dev.yml (BREAKING, prod foot-gun) - C13: Container runs as non-root user 'agnes' (USER directive in Dockerfile) - M21: Docker resource limits (mem_limit, cpus) on app + scheduler - M22: Caddyfile security headers (X-Frame-Options, X-Content-Type-Options, Referrer-Policy, -Server) - M17: /api/health split into minimal (unauth) + /api/health/detailed (auth) (BREAKING) - M26: release.yml restricts build-and-push to main + workflow_dispatch; paths-ignore for docs - C2: table_id traversal validation on /api/data/{table_id}/download - M4: Upload streaming (chunk-read + temp file) instead of full-buffer; /local-md hashed filename - C5: reset_token removed from POST /api/users/{id}/reset-password response - C8: Startup WARNING when no user has password_hash (bootstrap window visible) - M9: Audit log on failed web form login (mirrors /auth/token endpoint) - M10: Atomic magic-link consume via compare-and-swap (CONSUMED: marker + DuckDB conflict catch) Also: SSRF protection on /api/admin/configure (#46), memory stats SQL aggregation (#90) Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix(review): SSRF 169.254.x.x + IPv6 multicast; M10 marker cleanup safety Review fixes: - Add 169.254.0.0/16 (link-local, cloud metadata) to SSRF regex — was missing, allowing requests to AWS/GCP/Azure metadata endpoints - Add ff[0-9a-f]{2}: (IPv6 multicast) to SSRF regex - M10: wrap Step 3 (CONSUMED marker cleanup) in try-except with warning log — prevents unhandled exception if DB write fails after successful token consumption - Add test for 169.254.169.254 SSRF rejection Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix(review): SSRF IPv6 bypass, CLI health endpoint, upload FD leak Address Devin Review findings on PR #104: 1. SSRF IPv6 bypass: Replace hostname regex with DNS resolution + ipaddress module checks. The old regex patterns like `fe80:` only matched up to the first colon, missing real IPv6 addresses like `fe80::1`, `fc00::1`, `ff02::1`. The new approach resolves the hostname via getaddrinfo and checks each resulting IP against ipaddress.is_private/is_loopback/is_link_local/is_reserved/is_multicast. 2. CLI commands broken: `da setup test-connection`, `da setup verify`, `da diagnose`, `da status` all called /api/health expecting the old format (status=="healthy", services dict). Now they call /api/health/detailed for service-level checks (with graceful fallback to the minimal endpoint when auth is not configured). 3. Temp file handle leak: _stream_to_temp returns an open NamedTemporaryFile; callers now close it before shutil.move() to prevent FD leaks until GC. Also adds IPv6 SSRF test cases (loopback, link-local, unique-local, multicast) with mocked DNS resolution for test environment independence. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * fix(review): download regex blocks hyphenated IDs; document health split Address Devin Review round-3 findings on PR #104: 1. _SAFE_IDENTIFIER regex blocked hyphenated table IDs: The download endpoint used the strict SQL-identifier regex which does not allow dots or hyphens, but Keboola table IDs like in.c-crm.orders contain both. Switched to _SAFE_QUOTED_IDENTIFIER which allows dots and hyphens while still blocking path-traversal chars (/, .., \) and quote/control characters. Added test for hyphenated/dotted IDs. 2. Documented health endpoint split in DEPLOYMENT.md: Added Health checks & external monitoring section explaining both endpoints (minimal unauth /api/health vs authenticated /api/health/detailed) and how to wire external monitoring tools to the detailed endpoint with a PAT. Generated with [Devin](https://cli.devin.ai/docs) Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com> * release(0.12.1): cut hotfix for snapshot integrity + #82/#85/#87 hardening * fix(security): apply CAS pattern to password reset confirm (#82/M10 follow-up) Devin review on the rebased PR flagged the asymmetry: magic-link verify got the atomic compare-and-swap pattern in the original M10 fix, but password reset confirm at /auth/password/reset/confirm was still using read-validate-clear. Two concurrent POSTs with the same valid reset token could both succeed in setting different new passwords (last-write- wins). Lower severity than the magic-link race because the attacker would need the reset token AND to race the legitimate user, but the asymmetry was a polish gap. Mirrors app/auth/providers/email.py::_consume_token CAS exactly: write unique CONSUMED:<random> marker via UPDATE...WHERE token=old_token, then SELECT to verify our marker won, then proceed. Only the winner clears the marker and applies the password change. New regression test_concurrent_reset_only_one_wins in tests/test_password_flows.py::TestResetConfirm pins the contract: two ThreadPoolExecutor workers + Barrier hit /reset/confirm with the same token; exactly one gets 302 (password applied), the other gets 200 with 'Invalid or expired'. Sanity-checked against the pre-CAS code — both POSTs got 302 (race confirmed). --------- Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>	2026-04-28 19:57:30 +02:00
ZdenekSrotyr	4e4d2a39e6	chore(oss): isolate customer-specific deploy bits from scripts/grpn/ (#88 , wave 1) (#94 ) * chore(oss): isolate customer-specific deploy bits from scripts/grpn/ (#88) Vendor-neutralization step before public release. The directory mixed two concerns: (1) generic ops scripts referenced from mainline OSS infrastructure (TLS rotation, auto-upgrade cron) and (2) one operator's hackathon manual-deploy helper with hardcoded GCP project IDs, VM names, and admin emails. Splitting them per concern. Moved (still in OSS, just under a vendor-neutral name): - scripts/grpn/agnes-tls-rotate.sh → scripts/ops/agnes-tls-rotate.sh - scripts/grpn/agnes-auto-upgrade.sh → scripts/ops/agnes-auto-upgrade.sh Removed (belongs in private consumer infra repos, not upstream OSS): - scripts/grpn/Makefile (hardcoded prj-grp-foundryai-dev-7c37, foundryai-development VM name, e_zsrotyr@groupon.com bootstrap email) - scripts/grpn/README.md (GRPN hackathon deploy walkthrough) - docs/superpowers/plans/2026-04-22-grpn-deploy-learnings.md (org-specific deploy log) Cross-refs updated in README.md, CLAUDE.md, docs/DEPLOYMENT.md, docker-compose.yml. CHANGELOG entry flags BREAKING (ops) for any consumer infra repo that installs these scripts via path-based systemd timers. This is the first wave of #88 — the remaining leaks (test data with prj-grp-dataview-prod-1ff9, AIAgent.FoundryAI tags in OpenMetadata test fixtures, docstrings in connectors/openmetadata/enricher.py) will be a separate, smaller PR. Refs #88. * chore(oss): comprehensive vendor-neutralization (#88 wave 2 + review fixes) PR #94 review found that the original wave-1 grep was scoped wrong and many leaks survived. This commit closes wave 1 properly AND folds in all wave-2 anonymization in a single pass — easier to review than two PRs. Wave-1 review-fix corrections: - Caddyfile: scripts/grpn/agnes-tls-rotate.sh → scripts/ops/ (the original wave-1 grep filter excluded extensionless files like Caddyfile). - CHANGELOG bullet rewritten — original wording implied an in-repo migration for infra/modules/customer-instance/, which is wrong (the TF module embeds the script inline via heredoc, never sourced from scripts/grpn/). Now flags downstream consumer infra repos only. - infra/modules/customer-instance/variables.tf: Czech docstring with `grpn` example → English description with `acme, example` placeholders. Wave-2 anonymization: - Code docstrings (connectors/openmetadata/{client,transformer,enricher}.py, src/catalog_export.py, scripts/duckdb_manager.py): prj-grp-… → my-bq-project / prj-example-1234, AIAgent.FoundryAI → AIAgent.MyAgent, FoundryAIDataModel → AnalyticsDataModel. - Test fixtures (4 files): same set of replacements — 157 tests still pass. - .github/workflows/keboola-deploy.yml: "Groupon-side dev VMs" comment → generic "per-developer dev VMs". - docs/auth-groups.md + scripts/debug/probe_google_groups.py: kids-ai-data-analysis project name → acme-internal-prod placeholder. - 5 planning/spec docs under docs/superpowers/{plans,specs}/2026-04-21-: hardcoded IPs (34.77.94.14, 34.77.102.61) → <dev-vm-ip>/<prod-vm-ip>; GRPN/Groupon → Acme/another-customer; prj-grp-… → prj-example-…. - scripts/switch-dev-vm.sh deleted — hackathon-era helper hardcoded to a specific shared dev VM. Per-developer dev VMs are the supported pattern. Final grep `groupon\|grpn\|foundryai\|prj-grp\|groupondev\|34\.77\.(94\|102)\.…\|kids-ai-data` returns zero hits (excluding CHANGELOG.md historical entries). CHANGELOG entry expanded to document both waves under one bullet, with the BREAKING (ops) clarification about the TF module being unaffected. Refs review of #94, closes #88. fix(oss): close remaining #94 review-2 findings (Czech, padak refs, CHANGELOG) Reviewer of PR #94 round 2 caught 4 remaining items the wave-2 pass missed: 1. infra/modules/customer-instance/variables.tf had Czech descriptions on 8 more variables. Previous review only flagged line 19; this round audited the rest. Translated lines 2, 28, 42-46 (heredoc), 60, 65, 71, 78, 84 to English. Same review concern: a Terraform module that is the customer-facing API surface in Czech is unfit for OSS distribution. 2. infra/modules/customer-instance/outputs.tf had Czech descriptions on four outputs. Same fix. 3. docs/padak-security.md referenced a private repo (padak/keboola_agent_cli#206) in two places. Replaced with generic 'tracked upstream in the auth-CLI repo' per CLAUDE.md vendor-agnostic rule (no cross-refs to private repos). 4. scripts/fetch-env-from-secrets.sh:41 had a Czech comment. Translated. 5. CHANGELOG cosmetic: bullet said 'AIAgent.FoundryAI -> AIAgent.MyAgent' but the actual code uses both MyAgent (in docstrings) and Example (in test fixtures). Reworded to mention both targets. Final grep across all shipping file types (.md, .py, .yml, .yaml, .sh, Makefile, .json, .tf, .tpl, Caddyfile, .toml) for groupon\|grpn\|foundryai\| prj-grp\|groupondev\|34.77.94.14\|34.77.102.61\|kids-ai-data\|padak/keboola_agent_cli returns ZERO hits (excluding CHANGELOG.md). Czech-diacritic grep across .tf/.toml/Caddyfile/Makefile/.yml returns ZERO hits. 157/157 OpenMetadata + DuckDB tests still pass. * fix(oss): close #94 round-3 leaks (env.template, instance.yaml.example, padak typo) Round-3 reviewer caught two MUST-FIX leaks the round-2 grep missed (grep was scoped to extensions that did not include .template / .example suffixes — the audit was right, the previous grep was not paranoid enough): 1. config/instance.yaml.example:114 — '(optional - Groupon-specific)' brand leak in a shipping config example. Replaced with '(optional)'. 2. config/.env.template:68 — stale path 'scripts/grpn/agnes-tls-rotate.sh' in operator-facing env-template comment. The script lives at scripts/ops/ now (commit 16a85cc); this comment had been pointing operators at a non-existent path. 3. docs/padak-security.md:188 — phrase duplication 'tracked in tracked upstream' from a sloppy substitution in round-2. Trivial wording fix. Final paranoid grep across .md/.py/.yml/.yaml/.sh/Makefile/.json/.tf/.tpl/ Caddyfile/.toml/.template/.example/.env* with the full token set (groupon\|grpn\|foundryai\|prj-grp\|groupondev\|34\.77\.94\.14\|34\.77\.102\.61\| kids-ai-data\|padak/keboola_agent_cli) returns ZERO hits, excluding CHANGELOG.md historical entries. * fix(oss): #94 round-4 — QUICKSTART.md + rename padak-security.md Devin Review caught two findings on the latest round-3 commit: 1. docs/QUICKSTART.md:67 still pointed users at the deleted scripts/switch-dev-vm.sh. A Quickstart user following step-by-step would hit a missing-file error at the final step. Replaced with the inline gcloud-ssh equivalent that the Removed bullet documents. 2. docs/padak-security.md filename retains the personal identifier 'padak'. The PR fixed the body content (replaced padak/keboola_agent_cli#206 references with generic wording) but missed the filename. Renamed to docs/security-audit-2026-04.md (date-anchored, vendor-neutral). Updated the historical CHANGELOG link to point at the new path with an inline note about the rename. * fix(oss): redact remaining hardcoded IPs from planning docs + remove default email Devin Review caught two more leaks: 1. scripts/fetch-env-from-secrets.sh line 16 had a hardcoded personal-email default (zdenek.srotyr@keboola.com). Replaced with ':?' bash error so SEED_ADMIN_EMAIL must be explicitly set — safer than carrying any specific identity. 2. Planning docs still had 35.195.96.98 and 34.62.223.189 (legacy prod/dev IPs) that the round-1 IP-replace pattern missed (it only targeted 34.77.x.x). Generic regex redaction across all five planning docs replaces every public IP with <redacted-ip>, preserving private/loopback/IAP ranges.	2026-04-27 20:24:34 +02:00
Petr Simecek	4799119c81	feat(deploy): keboola-deploy tag-triggered workflow + Caddyfile LE/internal modes + dev_instances TLS support (#52 ) * feat(deploy): keboola-deploy tag-triggered workflow + Caddyfile LE/internal modes + dev_instances TLS support Three coordinated changes that together unblock Keboola's internal Agnes deployment from the foot-gun where the dev VM tracks `:dev` (= last push from anyone in the upstream repo). 1. .github/workflows/keboola-deploy.yml — new workflow Triggered ONLY on `keboola-deploy-` git tag pushes (not on every branch push like release.yml). Builds an image and publishes two GHCR tags: ghcr.io/keboola/agnes-the-ai-analyst:keboola-deploy-<git-tag-suffix> ghcr.io/keboola/agnes-the-ai-analyst:keboola-deploy-latest The Keboola dev VM pins to `keboola-deploy-latest`; an operator deploys by `git tag keboola-deploy-foo && git push origin keboola-deploy-foo`. Audit trail lives in git tags (immutable, who-tagged-what-when), no PR-cycle needed for each deploy. Doesn't touch Vojta/Minas/David workflow — release.yml still builds `:dev-<slug>` for every branch push as before. 2. Caddyfile — parametrize TLS directive via $CADDY_TLS env var PR #51 hardcoded cert-file mode (`tls /certs/fullchain.pem ...`) for Groupon's corporate CA flow. That broke the Let's Encrypt path the module previously supported. Now: CADDY_TLS unset (default) → cert-file mode (Groupon corp PKI) CADDY_TLS="tls user@x.com" → Let's Encrypt auto-issue CADDY_TLS="tls internal" → Caddy-managed self-signed (lab/dev) Single Caddyfile, three regimes, no per-deployment fork. Validated with `caddy validate` in all three modes. 3. customer-instance module — dev_instances TLS + auto-set CADDY_TLS - variables.tf: dev_instances object schema gains optional tls_mode + domain (mirroring prod_instance). Defaults to "none" + "" so existing callers without those fields keep current behavior. - startup-script.sh.tpl: when tls_mode="caddy" and DOMAIN is set, write CADDY_TLS=tls <ACME_EMAIL> (or "tls internal" when ACME_EMAIL empty) into /opt/agnes/.env. Caddy then picks it up and the Caddyfile substitution flips the cert source. For an LE deploy: set tls_mode="caddy", domain="agnes-dev.example.com", ensure DNS A-record points at the VM, and acme_email is set on the module (or seed_admin_email is, since acme_email defaults to it). After this lands, tag as infra-v1.6.0 so downstream infra repos can bump their module ref without needing the upstream change tracking. feat(deploy): fetch optional Google OAuth credentials from Secret Manager Mirrors the existing keboola-storage-token / agnes-<customer>-jwt-secret pattern: VM SA reads google-oauth-client-{id,secret} secrets at boot (if they exist + IAM is wired by caller via runtime_secrets) and writes them into /opt/agnes/.env. Empty / missing / 403 → silent fallback to "" so password and email auth keep working untouched. Pairs with downstream change in agnes-infra-keboola which adds the two secret names to runtime_secrets, granting the Keboola VM SA secretAccessor on them. Operator pre-creates the SM containers via gcloud secrets create google-oauth-client-{id,secret} (one-time, out of band) — values stay in SM forever; rotation = `gcloud secrets versions add`. This unblocks the Keboola agnes-dev deploy from PR #3 (infra) — without GOOGLE_CLIENT_{ID,SECRET} in .env, app/auth/providers/google.is_available() returns False and the Google sign-in button never even appears.	2026-04-25 23:19:00 +02:00
Petr Simecek	1bbbe58ea0	release(2.1.0): durable sync, CLI auto-update, versioned wheel URL, version unification (#43 ) * fix(cli): versioned wheel URL in setup instructions; drop broken /cli/agnes.whl alias (#36) * fix(cli): inline PEP 427 wheel filename in setup instructions `uv tool install <server>/cli/agnes.whl` fails with error: The wheel filename "agnes.whl" is invalid: Must have a version because uv validates the filename in the URL path before fetching — so the server-side Content-Disposition header (which has the real versioned filename) is never consulted, and an HTTP redirect does not help either: uv resolves the filename from the initial URL. Fix the root cause by inlining the real PEP 427 filename into the setup snippet the dashboard copies to the clipboard. The wheel filename is resolved server-side via `_find_wheel()` and substituted into the lines returned from `setup_instructions.resolve_lines()`, so both the read-only HTML preview and the JS clipboard renderer get byte-identical output. Also added `/cli/wheel/{filename}` to serve wheels at their PEP 427 path, and kept `/cli/agnes.whl` as a 302 redirect for manual/legacy callers — though that redirect alone is NOT sufficient for `uv tool install` (uv validates before following redirects) and is there only as defense-in-depth. Verified locally: - `uv tool install <server>/cli/wheel/agnes_the_ai_analyst-2.0.0-py3-none-any.whl` succeeds - `/install` HTML now renders the versioned URL; `/cli/agnes.whl` no longer appears in the rendered snippet * fix(cli): remove /cli/agnes.whl alias entirely — it only confused users The bareword alias was never actually usable: - `uv tool install <server>/cli/agnes.whl` fails at filename validation before any HTTP fetch, so neither the Content-Disposition header nor a 302 redirect rescued it. - The 302-to-versioned-path fallback left a visibly "working" URL in browser / curl -L contexts, which is exactly how the original bug got reported in the first place ("the URL loads, why doesn't install work?"). Remove the endpoint and scrub all remaining references. The only CLI wheel URL is now `/cli/wheel/{filename}` with the real PEP 427 filename, which the setup-instructions template already generates server-side. Existing tests that referenced /cli/agnes.whl become negative tests ("must not appear") so we don't regress. * feat(cli): --version flag; sync --dry-run + progress indicator (#38) * feat(cli): add --version / -V flag Prints `da <version>` from package metadata (importlib.metadata). Falls back to "unknown" when the package is not installed (e.g. running from a source checkout without `uv pip install -e .`), instead of crashing. Eager typer callback, so `da --version` exits before subcommand resolution and does not require any auth/config. * feat(cli): da sync --dry-run + X/N progress indicator --dry-run reports what would be downloaded/uploaded without hitting the API or writing local state. Supports the full flag set (--table, --json, --upload-only); JSON shape is {"dry_run": true, "would_download": [...], "summary": {...}}. Progress bar now shows "[X/N] Downloading <table>..." with a Rich BarColumn + TaskProgressColumn + TimeElapsedColumn instead of a bare spinner — makes long syncs visible. * feat(cli): durable sync + server gzip + auto-update check (#41) * fix(sync): atomic writes + manifest hash verification + retry on transient errors Three durability hooks around stream_download and the sync command: 1. Atomic writes. stream_download now streams into `<target>.tmp` and calls os.replace() on success, so the real target file never exists in a half-written state. On failure the tmp is unlinked — no cleanup leftovers, no guard needed at read time. 2. Retry with backoff. Transient errors (ConnectError, ReadError, WriteError, RemoteProtocolError, TimeoutException, 5xx) are retried up to 3× with 0.3s / 1s / 3s backoff. 4xx (auth, 404) surfaces immediately — retrying those is pointless. 3. Manifest-hash verification. After download, sync.py computes MD5 of the target (same 8KiB chunking as app/api/sync.py:_file_hash) and compares against `server_tables[tid]["hash"]`. Mismatch ⇒ unlink, record error, skip state commit. The PAR1 structural check survives as a fallback for legacy manifests without a hash. Also makes _rebuild_duckdb_views tolerant: single broken parquet is skipped with a stderr warning instead of killing the whole rebuild. Supersedes #40 — this commit is a strict super-set (hash check + PAR1 fallback + atomic write + retry). #40 can be closed without merging. * perf(server): enable GZipMiddleware for JSON / HTML responses GZipMiddleware at minimum_size=1024 shaves bandwidth on manifest-style JSON endpoints (/api/sync/manifest, /api/version, …) and the /install HTML preview. Parquet file downloads are already columnar-compressed so the middleware sees limited benefit there — but it doesn't hurt, httpx on the client side decompresses transparently. Placed after session middleware so gzip wraps the session-Set-Cookie response too, and before CORSMiddleware so compression is applied to both cross-origin and same-origin responses. * feat(cli): auto-check for newer CLI version on startup Server side - GET /cli/latest returns {version, wheel_filename, download_url_path} for whatever wheel is currently in AGNES_CLI_DIST_DIR. Public, cacheable, no secrets — consumed by the CLI auto-update probe. Client side - New cli/update_check.py: reads /cli/latest with a 3s timeout, caches the result in $DA_CONFIG_DIR/update_check.json for 24h. Cache is invalidated when the installed version changes (e.g. after a fresh `uv tool install`) so stale "you're behind" warnings don't linger. - Root typer callback fires the probe before subcommand dispatch; any failure is swallowed so a bad network never blocks a working command. - Outdated → one-line stderr warning: [update] da 2.0.0 is out of date — latest on this server is 2.1.0. Upgrade: uv tool install --force <server>/cli/wheel/<…>.whl - Disable with DA_NO_UPDATE_CHECK=1. * fix(pr-review): None-guard the upgrade line + skip gzip on parquet paths Two follow-ups from Devin review on #41. 1. format_outdated_notice(UpdateInfo(download_url=None)) emitted literal "uv tool install --force None" — copy-pasting that fails. Drop the upgrade snippet when the URL is absent and keep only the version line. 2. GZipMiddleware compressed everything over 1024 bytes, including the parquet FileResponses served by /api/data/{tid}/download, /cli/wheel/{name}, and /cli/download. Parquet is already columnar- compressed — gzip there is pure CPU + latency with no size win, and /api/data bodies can reach hundreds of MB. Wrap GZipMiddleware in a small _SelectiveGZipMiddleware that skips those path prefixes and delegates the rest to the stock middleware. JSON / HTML endpoints (manifest, /install, /api/version, …) still get compressed. * release: bump to 2.1.0 — unify AGNES_VERSION with pyproject.toml version (#42) Before: two independent version systems. pyproject.toml carried semver (2.0.0 → wheel filename → `da --version`) while release.yml injected CalVer into AGNES_VERSION (e.g. 2026.04.155 → /api/version). Users saw different strings in the CLI vs. the /install page, and the CLI auto- update check couldn't tell "new deploy, same package version" apart from "new package version". Make pyproject.toml [project].version the single product-version source of truth. release.yml extracts it and feeds AGNES_VERSION, so every surface (/api/version, /api/health, /cli/latest, `da --version`) agrees on one number. The CalVer tag keeps doing what CalVer is for: release identity on the git tag and Docker image tag (versioned_tag). Also wires AGNES_TAG through the build: release.yml → Dockerfile ARG → env, so /api/version.image_tag finally reports the actual image tag instead of the "unknown" fallback. Bump to 2.1.0 to reflect the PRs shipped on ps/wheel-name-fix: durable sync (atomic writes + manifest MD5 + retry), server GZip, CLI auto- update probe, setup snippet PEP 427 URL. * fix(pr-review): directional version compare in is_outdated() UpdateInfo.is_outdated() used `self.latest != self.installed`, which fires in both directions. If the server is rolled back or the user connects to an older deployment, the CLI would warn "out of date" and — worse — the formatted notice would prompt uv tool install --force <older-version>.whl i.e. an unintended downgrade. Compare with packaging.version.Version (PEP 440 aware, handles pre- release tags). Fall back to dotted-int tuple compare if packaging is somehow missing, and return False on unparseable strings — better to miss an upgrade hint than to silently suggest a downgrade. Adds 4 test cases: installed older (True), installed newer (False), 10.0.0 vs 2.1.0 lexical-compare trap (correct), unparseable strings (False). Addresses Devin review on #43. * fix(pr-review): read FastAPI app version from package metadata app/main.py:80 hardcoded `version="2.0.0"` in the FastAPI constructor. After #42 bumped pyproject.toml to 2.1.0, /api/version, /cli/latest, and `da --version` all reported 2.1.0 while /openapi.json and the /docs UI still advertised 2.0.0. Read `agnes-the-ai-analyst` version via importlib.metadata (same pattern cli/main.py:_cli_version already uses), with a `"dev"` fallback when the package is not installed (source checkout). This way pyproject.toml stays the single source of truth across every version surface — /openapi.json now tracks the bump automatically. Adds a dedicated test file to pin this behavior so a future regression to a hardcoded literal fails at CI. Addresses second Devin finding on #43. * fix(pr-review): _fmt_bytes PiB label + negative cache in update_check Two more follow-ups from Devin review on #43. 1. _fmt_bytes off-by-unit. The old loop exited at TiB but the fallback labelled PiB, so 1 PiB rendered as "1024.0 PiB". Restructure: put every unit inside the loop (KiB through EiB) so the division count always matches the label. Covers up to 1 ZiB cleanly; anything beyond renders as "<big>.0 EiB" rather than crashing. 2. Negative cache for failed /cli/latest probes. On a corporate firewall / VPN that silently drops packets, the 3s HTTP timeout fired on every `da` invocation. Writing a `latest=None` cache entry with a 5-minute TTL caps that at one probe per 5min. Successful probes still use the 24h TTL. Reading logic branches on whether the cached `latest` is None. Adds TestFmtBytes (2 cases: small/medium sizes and the PiB/EiB fallback regression), plus two TestSync update-check cases covering negative- cache reuse and TTL expiry.	2026-04-22 21:18:18 +02:00
ZdenekSrotyr	963db420fe	ci(release): push dev-<user-prefix>-latest alias for <user>/* branches (#31 ) Adds a second tag to dev-channel image builds: when a branch is in the form <prefix>/<whatever>, the image is also pushed as ghcr.io/keboola/agnes-the-ai-analyst:dev-<prefix>-latest. Enables per-developer dev VMs on GRPN (and elsewhere) to auto-deploy without knowing the specific branch slug. Each VM pins its .env to AGNES_TAG=dev-<prefix>-latest, and the auto-upgrade cron (5 min tick) picks up the newly pushed image on the next run. Common Git Flow prefixes are deliberately skipped so feature/, fix/, hotfix/* etc. don't create noise tags. Matched list: feature, fix, hotfix, bugfix, docs, chore, test, ci, ops, refactor, perf, style, build. Verified locally against several branch names: zs/my-feature -> dev-zs-latest vr/foo -> dev-vr-latest pc/bar-baz -> dev-pc-latest feature/xyz -> (skipped) fix/bug -> (skipped) main -> (no-op, stable channel) test-no-slash -> (no-op, no slash)	2026-04-22 14:02:59 +02:00
ZdenekSrotyr	4f381dc103	fix(ci): propagate-infra-tag fail-soft on branch push / missing secret (#24 ) Job-level 'if: secrets.X != ""' did not prevent workflow from being scheduled on branch pushes (GitHub reports failure with 0 jobs in that case). Refactored: first step is a guard that checks both the tag ref pattern and the secret presence; downstream steps skip when the guard says no. Result: workflow now reports success with a clear warning annotation on branch pushes or when the secret is absent; only real infra-v* tag pushes with the secret set perform the bump.	2026-04-21 21:59:10 +02:00
ZdenekSrotyr	e2eb51f657	ci(release): build image for all branches, not just feature/** (#19 ) * dryrun: intentional failing test (will be reverted) * feat(auth): optional SEED_ADMIN_PASSWORD to pre-hash seed admin (dev helper) Terraform gains enable_seed_password + seed_admin_password (sensitive) vars on the customer-instance module; when enabled the password is piped via startup-script into /opt/agnes/.env as SEED_ADMIN_PASSWORD. On first boot app/main.py argon2-hashes it onto the seed user so the admin can log in immediately without going through /auth/bootstrap. Never overwrites an existing password_hash — safe against accidental reset on terraform apply. * ci(release): build :dev-<slug> on any branch, not just feature/ Before: only 'feature/' branches triggered release.yml, so pushing 'zs/my-edit' or 'fix/bug' did not publish an image. dev_instances entry pinning image_tag = 'dev-zs-my-edit' then crashed VM startup with 'image not found'. Now: any branch push (except main, which produces :stable) publishes :dev-<slug>. Slug strips a leading 'feature/' and replaces non-[a-z0-9-] with '-', keeping existing feature/** behavior identical. * Revert "dryrun: intentional failing test (will be reverted)" This reverts commit cf9cc06a7884bb401ff29fc5cb6d8baf84dc3daa.	2026-04-21 21:33:57 +02:00
ZdenekSrotyr	2cbffce85f	ci: propagate infra-v* tags to template repo + auto-merge rules (#17 ) * dryrun: verify per-branch GHCR tag * ci: propagate infra-v* tag bumps to template repo On push of any infra-v* tag, opens a PR in keboola/agnes-infra-template that bumps the module ref in terraform/main.tf. Auto-merge rules in the template (Renovate + CI validate + GitHub native auto-merge) land it without manual work on patch/minor bumps. Requires repo secret TEMPLATE_REPO_TOKEN (fine-grained PAT with Contents:write + Pull requests:write on keboola/agnes-infra-template). Fail-soft: if secret is missing the job is skipped and Renovate on the template repo picks up the new tag on its next cycle as a fallback. * docs(onboarding): 'Keeping the template up-to-date' maintainer section Documents the two mechanisms (upstream release hook + Renovate), the required repo settings (allow_auto_merge, validate.yml gate), the TOKEN secret setup, and the one-time setup checklist. Notes the difference between template repo (auto-merge on) and customer infra repos (human approval).	2026-04-21 21:32:58 +02:00
ZdenekSrotyr	1c7cc8aa29	fix(image): add AGNES_COMMIT_SHA build-arg to Dockerfile + release.yml Completes the previous commit — bakes the full git SHA into the image ENV at build time so the UI badge shows a real commit, not a sha256 digest (which was the floating manifest digest and unhelpful for debugging).	2026-04-21 21:00:30 +02:00
ZdenekSrotyr	5188bd9127	ci: add per-branch image tag :dev-<slug> for branch-aware dev deploys Extracts branch name from GITHUB_REF, slugifies it, and adds as extra tag on feature branch builds. Main branch is unaffected (no branch_slug output). Enables dev_instances tfvar with image_tag pinning specific feature branches.	2026-04-21 18:47:01 +02:00
ZdenekSrotyr	5bbd82bacd	fix: address Devin review — docker-e2e .env, jira webhook test isolation - Create empty .env before docker compose up in CI (env_file: .env is required) - Mock get_jira_service in webhook HMAC test to isolate signature check from Jira API availability — strict assert 200 instead of permissive 500	2026-04-13 14:36:31 +02:00
ZdenekSrotyr	5bfff6616c	ci: add parallel test execution and nightly Docker E2E job	2026-04-12 14:15:46 +02:00
ZdenekSrotyr	44b99f25ca	fix: address Devin review round 5 — empty secret file, CI .env - secrets.py: validate file content is non-empty before using it; regenerate if file exists but is empty/corrupted - release.yml: touch .env before docker compose in smoke test (env_file: .env in docker-compose.yml requires the file to exist) 663 tests pass.	2026-04-10 14:55:31 +02:00
ZdenekSrotyr	40cca627be	fix: address Devin review round 4 — bash arithmetic, CalVer max, docs - smoke-test.sh: replace ((PASS++)) with PASS=$((PASS + 1)) to avoid set -e abort when counter is 0 (bash returns exit 1 for ((0))) - CalVer: use max(N) from existing tags instead of count, safe when tags are deleted (e.g. deprecated version cleanup) - CLAUDE.md: update schema version from v2 to v3 663 tests pass.	2026-04-10 14:39:16 +02:00
ZdenekSrotyr	dc8a9275e6	fix: address Devin review round 3 — retry exhaustion, discover path, WAL snapshot - CalVer retry loop now exits with error if all 5 attempts fail (prevents pushing Docker image with unclaimed version tag) - discover_tables endpoint reads data_source.keboola.url (consistent with configure_instance and _discover_and_register_tables) - Pre-migration snapshot flushes WAL via CHECKPOINT before copying and copies .wal file if it still exists after flush 663 tests pass.	2026-04-10 14:11:17 +02:00
ZdenekSrotyr	c79d85f87c	fix: config path mismatch + CalVer race condition (Devin review round 2) - _discover_and_register_tables reads from data_source.keboola.url (matches what /api/admin/configure writes) instead of top-level keboola.url which doesn't exist - CalVer: claim git tag BEFORE Docker build with retry loop (up to 5 attempts). Prevents race where two concurrent CI runs get same N. Git tag acts as a distributed lock for version uniqueness. 663 tests pass.	2026-04-10 13:30:05 +02:00
ZdenekSrotyr	49f109bf73	fix: address PR review findings — config write, CalVer, error handling - Config writes to DATA_DIR/state/instance.yaml (writable) instead of CONFIG_DIR (read-only :ro in Docker) - instance_config.py checks DATA_DIR/state/ first, then falls back to CONFIG_DIR for backward compat - CalVer counter is now global across channels (-YYYY.MM.) per spec - Keboola error messages sanitized — log full error, return generic msg - chmod in secrets.py wrapped in try/except for Windows compat - Setup wizard JS handles 401 (expired JWT) with user-facing message - deploy.yml changed to workflow_dispatch only (no duplicate test runs) - Smoke test uses docker-compose.prod.yml + AGNES_TAG instead of sed - docker-compose.prod.yml uses ${AGNES_TAG:-stable} env var 663 tests pass. 8 E2E verification tests pass.	2026-04-10 13:16:40 +02:00
ZdenekSrotyr	6c53082295	feat: multi-instance deployment — all 14 must-have items from spec CalVer CI (release.yml) with stable/dev channels, health endpoint with version/channel/schema_version, JWT secret auto-generation with file persistence, smoke test script + Docker-in-CI, pre-migration snapshot, /api/admin/configure for headless setup, /api/admin/ discover-and-register, /setup wizard, OpenAPI snapshot test, custom connector mount support, CHANGELOG, migration safety tests, startup banner. 663 tests pass (6 new migration safety + 3 OpenAPI snapshot + 1 updated JWT test).	2026-04-10 11:57:42 +02:00
ZdenekSrotyr	816f217d8e	feat: add commit SHA tag to Docker image push for rollback capability	2026-04-09 16:38:38 +02:00
ZdenekSrotyr	22b4d830e5	chore: upgrade docker actions to Node.js 24 (login-action@v4, build-push-action@v7)	2026-04-09 14:22:11 +02:00
ZdenekSrotyr	6ebfc15010	fix: setup-uv@v7 (v8 major tag doesn't exist yet)	2026-04-09 14:19:32 +02:00
ZdenekSrotyr	1ebf50bd78	fix: upgrade setup-uv@v5 → v8 (Node.js 24 native), add uv.lock - setup-uv@v8 runs on Node.js 24 natively — no more deprecation warnings - Removed FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 workaround (no longer needed) - Added uv.lock for reproducible dependency resolution	2026-04-09 14:16:55 +02:00
ZdenekSrotyr	554ba0d9f2	fix: remove Kamal deploy job (no server configured), force Node.js 24 in CI - Removed deploy-production job — Kamal config has placeholder IPs, no secrets - Renamed workflow to "Build & Push" — test + Docker image to GHCR - Added FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true to suppress Node.js 20 warnings	2026-04-09 14:10:37 +02:00
ZdenekSrotyr	0279cc06fa	refactor: consolidate deps into pyproject.toml, remove requirements.txt - All dependencies now in pyproject.toml [project.dependencies] - Dev/test deps in [project.optional-dependencies] dev and [tool.uv] - Dockerfile uses uv pip install . from pyproject.toml - CI uses uv pip install ".[dev]" - Deleted requirements.txt and requirements-dev.txt - Updated README, CLAUDE.md install instructions - Enhanced .dockerignore (exclude tests, docs, infra from image)	2026-04-09 13:17:59 +02:00
ZdenekSrotyr	fa3aef652f	chore: update GitHub Actions to Node.js 24 compatible versions (checkout@v5, setup-python@v6, setup-uv@v5)	2026-04-09 12:48:14 +02:00
ZdenekSrotyr	f9fae6e895	fix: CI installs requirements-dev.txt (faker needed for tests), set TESTING=1	2026-04-09 09:10:29 +02:00
ZdenekSrotyr	2635f77974	ci: add CI test suite + deploy pipeline - ci.yml: runs 607 tests + Docker build on push/PR - deploy.yml: tests → build → GHCR push → Kamal deploy on main	2026-04-08 18:24:05 +02:00
ZdenekSrotyr	cfa08c4b4c	chore: remove obsolete CI workflows (deploy-guard, deploy.yml.example) deploy-guard.yml referenced deleted tests and sudoers files. deploy.yml.example used legacy SSH-based deployment. Updated ci.yml and deploy.yml are in .gitignore (need workflow scope to push).	2026-04-08 18:16:48 +02:00
ZdenekSrotyr	a74f69d6b1	chore: exclude CI workflow from push (needs workflow scope)	2026-03-27 17:41:27 +01:00
ZdenekSrotyr	e0ce91ddb9	feat: add dataset permissions, script execution, Kamal config, CI/CD - SyncSettingsRepository + DatasetPermissionRepository with RBAC - Script deploy/run/undeploy API with import sandboxing - User sync settings API with permission checks - 4 CLI skills (connectors, security, notifications, corporate-memory) - Kamal production + staging configs - GitHub Actions CI + deploy workflows - 91 total tests passing	2026-03-27 15:40:11 +01:00
Petr	c56905d34f	Initial commit: OSS data distribution platform Open-source AI data analyst platform extracted from internal repo. Includes data sync engine, Keboola adapter, Flask web portal, server deployment scripts, and configuration templates.	2026-03-08 23:31:28 +01:00

31 commits