* feat(auth): internal roles + external→internal group mapping (foundation)
Two-layer authorization model: external Cloud Identity groups (org-managed)
get mapped onto internal Agnes-defined capabilities (app-managed) via an
admin-curated many-to-many table. Per-request permission checks read off
the session — no DB hit. Refresh requires re-login.
Schema v8 — new tables:
- internal_roles (id, key UNIQUE, display_name, description, owner_module, …)
— app-defined capabilities like 'context_admin'. Modules self-register at
import; the startup hook syncs the registry into this table (idempotent).
- group_mappings (id, external_group_id, internal_role_id FK, …)
— admin-managed bindings, UNIQUE(external_group_id, internal_role_id).
app/auth/role_resolver.py — new module:
- register_internal_role(key, display_name, description, owner_module)
Module-author entry point. lower_snake_case key, immutable, validated.
Same key + same fields = no-op (re-import safe); same key + different
fields = ValueError so two modules can't silently overwrite each other.
- sync_registered_roles_to_db(conn) — startup reconciliation. Inserts new
keys, updates drifted metadata, never deletes (preserves mappings).
- resolve_internal_roles(external_groups, conn) — joins group_mappings.
Sorted, deduplicated role-key list. Plugged into google_callback +
dev-bypass branch in get_current_user.
- require_internal_role('key') — FastAPI dependency factory; reads
session.internal_roles; 403 with explicit message when missing.
Resolution runs at sign-in only (Google callback + LOCAL_DEV_GROUPS change
in dev-bypass) — same semantics as session.google_groups. No admin UI yet;
mappings created via repository directly until follow-up PR ships UI.
21 new tests in tests/test_role_resolver.py: register/list, idempotency,
collision detection, key-format validation; sync insert/update/no-delete;
resolve empty/single/many-to-many/malformed-input; e2e via
LOCAL_DEV_GROUPS — gated endpoint allowed/denied + direct session-cookie
inspection. Full sweep: 178/178 passed across auth + db + repo tests.
(Two pre-existing test_catalog_export.py failures verified unrelated.)
* fix(auth): polish review feedback — first-request dev populate + PAT doc
Two follow-ups from a code-reviewer pass on the foundation commit before
opening the PR:
- Dev-bypass populates session["internal_roles"] on the first request
after sign-in, not just when external groups change. The previous
guard only resolved when groups_changed=True, which left a hole for
the LOCAL_DEV_GROUPS=`""` (explicit empty) flow: target=[],
current=None, neither write branch fires, internal_roles stays
unset, and require_internal_role then 403s with no roles to check
against. The OAuth callback writes session["internal_roles"]
unconditionally on sign-in (even []); dev-bypass now matches that
semantics. Adds a single-pass populate gated on the key being
absent from the session, so subsequent same-state requests still
no-op (cheap session lookup, no resolver call).
- Document that internal roles are session-scoped and PAT/headless
clients will get 403 from any require_internal_role(...) endpoint.
Same constraint already applies to session.google_groups (PAT JWTs
deliberately don't snapshot group memberships — they could change
after issuance with no way to re-sign), but the doc didn't surface
this — an operator pointing a CLI at a role-gated endpoint would
see 403 with no clue why. New "PAT and headless requests" section
spells out the constraint, the rationale, and the three escape
valves (use users.role for the gate; route through OAuth; wait for
the planned `da admin grant-role` CLI helper).
54 auth tests still pass locally (21 role-resolver + 33 existing
auth-provider).
* release(0.11.3): cut release for the internal-roles foundation
Bumps pyproject.toml 0.11.2 → 0.11.3 and renames CHANGELOG's
[Unreleased] section to [0.11.3] — 2026-04-26 (with a fresh
empty [Unreleased] skeleton appended). Adds the matching
[0.11.3] link reference at the bottom of CHANGELOG so the
section heading renders as a hyperlink to the GitHub release
page once the tag lands.
The bullet itself is unchanged content; the rephrasing of
"dev-bypass when external groups change" → "dev-bypass —
populates on first request and whenever external groups
change, mirroring the OAuth callback's always-write
semantics" reflects the polish committed in d590579, plus
the appended PAT/headless caveat pointing at the doc
section that landed in the same polish pass.
* fix(auth): address review feedback from Pavel — PAT-specific 403, audit logs, hardening
Round-2 polish over the internal-roles foundation, addressing Pavel's review
on PR #71. No behavior change for the happy path; tightens the safety rails
and makes the failure modes self-explanatory.
User-visible:
- require_internal_role now distinguishes "no session" (Bearer/PAT caller)
from "signed in but missing role" and surfaces a PAT-specific 403 detail
in the first case ("This endpoint needs an interactive (OAuth) session
— Bearer/PAT tokens do not carry session-resolved roles by design").
- docs/internal-roles.md documents deactivate+reactivate as the supported
"force re-resolve now" lever for users that can't be made to log out.
Internal hardening:
- INFO-level audit log on every successful resolve (OAuth callback +
dev-bypass) so a wrong-role complaint is debuggable from the log alone.
- Startup warning when SESSION_SECRET is shorter than 32 chars, matching
the existing JWT_SECRET_KEY gate — both HMAC surfaces sign trust-laden
state (session.internal_roles, session.google_groups, JWTs).
- _clear_registry_for_tests() now refuses to run unless TESTING=1 so a
stray import path in production can't drop the registered capabilities.
Tests:
- 4 new tests in tests/test_role_resolver.py covering: stale-session
contract after a mid-session mapping revoke (pin the documented
limitation), PAT 403 detail wording, OAuth pipeline data flow from
external groups to internal_roles, and the dev-bypass empty-list
fallback when the resolver raises.
CHANGELOG.md updated under [0.11.3] (### Changed + ### Internal).
CLAUDE.md schema doc bumped from v7 to v8.
---------
Co-authored-by: Claude <noreply@anthropic.com>
* feat(auth): mock session.google_groups in LOCAL_DEV_MODE via LOCAL_DEV_GROUPS
LOCAL_DEV_MODE auto-logged-in the dev user but left session.google_groups
empty, so group-aware UI/code paths can't be exercised on localhost without
a real Google OAuth round-trip. New LOCAL_DEV_GROUPS env var (JSON array
matching the production {id, name} shape) populates the session on every
dev-bypass request — same structure the OAuth callback writes, so mock and
prod stay in lockstep. Compare-then-write avoids spurious Set-Cookie noise
on PAT/CLI requests; malformed input falls back to [] with a WARNING so
the dev mock never breaks the dev flow.
* refactor(auth): fail-fast LOCAL_DEV_GROUPS at startup + cache + no-mutate
Three small follow-ups on the same dev-mock vector before merge:
- Validate LOCAL_DEV_GROUPS at app startup and report the parsed group IDs
in the LOCAL_DEV_MODE banner. A malformed value now warns loudly at boot
instead of silently logging on the first authenticated request, where
it's easy to miss.
- Cache the parsed result single-slot, keyed by the raw env-string. Avoids
re-parsing JSON on every authenticated request without test-isolation
surprises — when the env value changes, the key changes and the cache
transparently rebuilds.
- Stop mutating the parsed-input dicts (item.setdefault → spread-merge)
so the cached list stays a fresh value on every rebuild.
- Replace the try/except guard around request.session with hasattr —
SessionMiddleware is always registered, the silent except was paranoid.
Tests grow by a direct session-cookie inspection (decoupled from the
profile template) and three startup-banner log assertions.
* fix(auth): drop fragile session-decoder test + actually skip empty-target write
Two follow-ups on the LOCAL_DEV_GROUPS feature before merge:
- Drop test_session_holds_mocked_groups_directly. It manually decoded the
signed session cookie via TimestampSigner + base64, hardcoding both the
Starlette session-cookie format and the 14-day max_age. Starlette has
changed its session encoding before (URLSafeTimedSerializer pre-0.20)
and would do so again silently — the test would fail with a cryptic
BadSignature, not a clear "mock is broken" signal. The remaining
test_dev_user_sees_mocked_groups_on_profile already covers the same
observable signal (mocked groups in /profile body) without coupling to
Starlette internals.
- Actually skip the session write when target_groups is empty. The previous
comment claimed compare-then-write avoided spurious Set-Cookie noise on
PAT/CLI requests, but on those requests session.get("google_groups") is
None and target is [], so None != [] always evaluates True and the write
fired anyway, marking the session dirty and re-issuing Set-Cookie on
every request. Adding `target_groups and ...` to the guard makes the
comment honest: empty mock now genuinely no-ops, stable browser sessions
still skip via value-equality, and the only remaining write is the one
that actually changes state.
33 auth tests still pass locally.
* fix(auth): match production's always-write semantics for stale dev groups
Devin code-review finding on PR #70: my earlier `target_groups and ...`
short-circuit silently diverged from the production OAuth callback. In
app/auth/providers/google.py:189-194 the callback always writes
session.google_groups on each login — including [] on failure or empty
token — so the session always reflects authoritative current state. The
mock should match.
Failure mode the previous guard left open: a developer sets
LOCAL_DEV_GROUPS=[{...}] for a session, the groups land in the signed
cookie, then the developer unsets the env var and reloads. target → [],
session.get → [{...}], `if target_groups and ...` is False, no write,
stale groups stay in the browser session indefinitely. Mock now lies
about state until logout.
Fix splits the guard:
- target_groups truthy + value-changed → write the new mock (existing path)
- target_groups falsy + non-empty stored → write [] to clear stale state
- otherwise no-op (target [] + stored None/[]: no transition to record)
PAT/CLI requests with no prior session still take the no-op path
(target=[], session.get → None which is falsy), so the original goal of
suppressing spurious Set-Cookie noise on token traffic is preserved.
Tests already cover the populated and unset paths; the new clear-stale
branch is correct by construction (production has the same shape) and
the rare manual reset workflow.
* release(0.11.2): default mocked groups in make local-dev + docs/local-development.md
Cuts 0.11.2 around the LOCAL_DEV_GROUPS work plus a small dev-experience
follow-up: every `make local-dev` now boots with two sensible default
mocked groups (Local Dev Engineers + Local Dev Admins on example.com),
so /profile and group-aware code paths render something realistic
without the operator having to discover and set LOCAL_DEV_GROUPS.
Layered so the default lives in the workflow, not the contract:
- scripts/run-local-dev.sh seeds LOCAL_DEV_GROUPS via shell ":="
syntax — only sets the var when the operator hasn't already.
Override: LOCAL_DEV_GROUPS='[...]' make local-dev. Disable:
LOCAL_DEV_GROUPS= make local-dev.
- docker-compose.local-dev.yml swaps the commented JSON example for
a bare `- LOCAL_DEV_GROUPS` passthrough — the value comes from the
shell, the compose file just propagates it. Operators running
`docker compose up` directly without the wrapper script get an
empty mock (correct: they didn't opt into the make-driven defaults).
- Makefile help line mentions the mocked groups so the behavior is
visible without grepping.
New docs/local-development.md consolidates dev-onboarding instructions
that were previously scattered across docker-compose.local-dev.yml
inline comments, docs/auth-groups.md "Local-dev mock" section, the
Makefile help text, and CLAUDE.md "First-Time Setup". Single page now
covers TL;DR, what LOCAL_DEV_MODE actually bypasses, group mocking
controls + verification, what is *not* mocked (Cloud Identity, real
OAuth, admin Workspace permissions), and the safety rails that keep
the dev shortcuts off production.
Version bump 0.11.1 → 0.11.2 in pyproject.toml, CHANGELOG cuts
[Unreleased] → [0.11.2] — 2026-04-26 with a fresh empty [Unreleased]
skeleton.
* fix(local-dev): default LOCAL_DEV_GROUPS truncated by shell parameter expansion
Reported by an operator running `make local-dev` against the freshly
released 0.11.2 — the LOCAL_DEV_MODE banner showed:
LOCAL_DEV_GROUPS is not valid JSON, ignoring:
Expecting ',' delimiter: line 1 column 70 (char 69)
LOCAL_DEV_GROUPS is set but produced no valid groups —
check the WARNING above for the parse error.
Cause: the default value lived inside `${LOCAL_DEV_GROUPS:=…}` parameter
expansion. Bash matches `}` to close the expansion at the *first* `}`
encountered in the body, regardless of context — even one inside a
nested JSON object literal. The two-element JSON array was therefore
truncated to the first group's closing brace, leaving an unparseable
fragment:
[{"id":"local-dev-engineers@example.com","name":"Local Dev Engineers"
There is no escaping syntax for `}` inside parameter expansion (the
backslash escapes I had only escaped the quotes — `}` reaches bash
literally). Fix: hold the default in a single-quoted variable and
reference it through `${LOCAL_DEV_GROUPS:-$DEFAULT_LOCAL_DEV_GROUPS}`.
The variable's value is opaque to the expansion — no `}` matching
inside it — so the JSON survives intact. Verified with `python -m json`:
parsed OK: 2 groups: ['local-dev-engineers@example.com',
'local-dev-admins@example.com']
Operators on a running 0.11.2 stack: `make local-dev-down && make
local-dev` to pick up the corrected default.
* fix(local-dev): respect LOCAL_DEV_GROUPS= disable path + add 0.11.2 changelog link
Two follow-ups from a Devin code-review pass on PR #70:
- run-local-dev.sh: switch ${LOCAL_DEV_GROUPS:-$DEFAULT} to
${LOCAL_DEV_GROUPS-$DEFAULT} (no leading colon). The :- form
substitutes the default when the variable is unset OR set-but-empty,
silently overwriting the documented disable knob. Three places
promise this works — docs/local-development.md, the CHANGELOG entry,
and the script's own comment — so the bug was an operator-facing
lie, not just an implementation detail. The bare - form only
substitutes on unset, so `LOCAL_DEV_GROUPS= make local-dev` now
reaches the Python parser as "" and short-circuits to []. Verified
with both empty and unset shells.
- CHANGELOG.md: add the [0.11.2] link reference at the bottom.
Keep-a-Changelog convention is to mirror every version heading
with a release-tag link in the footer; the 0.11.2 heading was
missing its counterpart, breaking the Markdown link rendering on
GitHub.
---------
Co-authored-by: Claude <noreply@anthropic.com>
Patch release containing the two follow-up changes from 0.11.0:
- Caddy CADDY_TLS env passthrough in docker-compose.yml (#55) — should
have shipped with #52 but the first PR got accidentally closed before
merge. Without this fix Caddy ignores .env CADDY_TLS and crash-loops
on any LE / internal-CA deployment.
- CLAUDE.md changelog discipline (#59) — every PR touching user-visible
behavior must update CHANGELOG.md under [Unreleased] in the same PR.
The discipline rule itself caused this release to exist: writing the
[Unreleased] entry made the missed fix obvious, which is exactly the
feedback loop the rule is supposed to create.
The version = "2.x" strings in earlier pyproject.toml snapshots were
arbitrary placeholders from the initial scaffold (cookiecutter default),
not a reflection of API maturity. Resetting to 0.11.0 to signal pre-1.0
status: public surface (CLI flags, REST endpoints, instance.yaml schema,
extract.duckdb contract) may still shift between minor versions.
CalVer image tags (stable-YYYY.MM.N, dev-YYYY.MM.N) continue from CI;
semver tags (v0.X.Y) are cut at release boundaries and reference the
same commit as a stable-* tag from the same day.
CHANGELOG.md replaces the old CalVer draft format with Keep a Changelog
+ semver. The 0.11.0 entry curates everything currently in main:
- Auth: Workspace groups, password reset, PAT, magic-link, seed admin pwd
- Deploy: keboola-deploy workflow, Caddy/LE/cert-file TLS, dev_instances
TLS, optional Google OAuth from SM, LOCAL_DEV_MODE, /setup wizard
- CLI: wheel distribution, auto-update, --version, --dry-run, gzip
- Data: remote query (BQ+DuckDB), business metrics, OpenAPI snapshot test
- Security: padak-security.md audit batch + urllib3 + argon2-cffi
- Two BREAKING items called out (Caddy profile rename, Caddyfile default
cert mode flipped to cert-file)
* fix(cli): versioned wheel URL in setup instructions; drop broken /cli/agnes.whl alias (#36)
* fix(cli): inline PEP 427 wheel filename in setup instructions
`uv tool install <server>/cli/agnes.whl` fails with
error: The wheel filename "agnes.whl" is invalid: Must have a version
because uv validates the filename in the URL path *before* fetching — so
the server-side Content-Disposition header (which has the real versioned
filename) is never consulted, and an HTTP redirect does not help either:
uv resolves the filename from the initial URL.
Fix the root cause by inlining the real PEP 427 filename into the setup
snippet the dashboard copies to the clipboard. The wheel filename is
resolved server-side via `_find_wheel()` and substituted into the lines
returned from `setup_instructions.resolve_lines()`, so both the read-only
HTML preview and the JS clipboard renderer get byte-identical output.
Also added `/cli/wheel/{filename}` to serve wheels at their PEP 427 path,
and kept `/cli/agnes.whl` as a 302 redirect for manual/legacy callers —
though that redirect alone is NOT sufficient for `uv tool install` (uv
validates before following redirects) and is there only as defense-in-depth.
Verified locally:
- `uv tool install <server>/cli/wheel/agnes_the_ai_analyst-2.0.0-py3-none-any.whl` succeeds
- `/install` HTML now renders the versioned URL; `/cli/agnes.whl` no longer appears in the rendered snippet
* fix(cli): remove /cli/agnes.whl alias entirely — it only confused users
The bareword alias was never actually usable:
- `uv tool install <server>/cli/agnes.whl` fails at filename validation
before any HTTP fetch, so neither the Content-Disposition header nor a
302 redirect rescued it.
- The 302-to-versioned-path fallback left a visibly "working" URL in
browser / curl -L contexts, which is exactly how the original bug got
reported in the first place ("the URL loads, why doesn't install work?").
Remove the endpoint and scrub all remaining references. The only CLI wheel
URL is now `/cli/wheel/{filename}` with the real PEP 427 filename, which
the setup-instructions template already generates server-side.
Existing tests that referenced /cli/agnes.whl become negative tests
("must not appear") so we don't regress.
* feat(cli): --version flag; sync --dry-run + progress indicator (#38)
* feat(cli): add --version / -V flag
Prints `da <version>` from package metadata (importlib.metadata). Falls
back to "unknown" when the package is not installed (e.g. running from a
source checkout without `uv pip install -e .`), instead of crashing.
Eager typer callback, so `da --version` exits before subcommand
resolution and does not require any auth/config.
* feat(cli): da sync --dry-run + X/N progress indicator
--dry-run reports what would be downloaded/uploaded without hitting the
API or writing local state. Supports the full flag set (--table, --json,
--upload-only); JSON shape is {"dry_run": true, "would_download": [...],
"summary": {...}}.
Progress bar now shows "[X/N] Downloading <table>..." with a Rich
BarColumn + TaskProgressColumn + TimeElapsedColumn instead of a bare
spinner — makes long syncs visible.
* feat(cli): durable sync + server gzip + auto-update check (#41)
* fix(sync): atomic writes + manifest hash verification + retry on transient errors
Three durability hooks around stream_download and the sync command:
1. Atomic writes. stream_download now streams into `<target>.tmp` and
calls os.replace() on success, so the real target file never exists
in a half-written state. On failure the tmp is unlinked — no cleanup
leftovers, no guard needed at read time.
2. Retry with backoff. Transient errors (ConnectError, ReadError,
WriteError, RemoteProtocolError, TimeoutException, 5xx) are retried
up to 3× with 0.3s / 1s / 3s backoff. 4xx (auth, 404) surfaces
immediately — retrying those is pointless.
3. Manifest-hash verification. After download, sync.py computes MD5 of
the target (same 8KiB chunking as app/api/sync.py:_file_hash) and
compares against `server_tables[tid]["hash"]`. Mismatch ⇒ unlink,
record error, skip state commit. The PAR1 structural check survives
as a fallback for legacy manifests without a hash.
Also makes _rebuild_duckdb_views tolerant: single broken parquet is
skipped with a stderr warning instead of killing the whole rebuild.
Supersedes #40 — this commit is a strict super-set (hash check + PAR1
fallback + atomic write + retry). #40 can be closed without merging.
* perf(server): enable GZipMiddleware for JSON / HTML responses
GZipMiddleware at minimum_size=1024 shaves bandwidth on manifest-style
JSON endpoints (/api/sync/manifest, /api/version, …) and the /install
HTML preview. Parquet file downloads are already columnar-compressed so
the middleware sees limited benefit there — but it doesn't hurt, httpx
on the client side decompresses transparently.
Placed after session middleware so gzip wraps the session-Set-Cookie
response too, and before CORSMiddleware so compression is applied to
both cross-origin and same-origin responses.
* feat(cli): auto-check for newer CLI version on startup
Server side
- GET /cli/latest returns {version, wheel_filename, download_url_path}
for whatever wheel is currently in AGNES_CLI_DIST_DIR. Public,
cacheable, no secrets — consumed by the CLI auto-update probe.
Client side
- New cli/update_check.py: reads /cli/latest with a 3s timeout, caches
the result in $DA_CONFIG_DIR/update_check.json for 24h. Cache is
invalidated when the installed version changes (e.g. after a fresh
`uv tool install`) so stale "you're behind" warnings don't linger.
- Root typer callback fires the probe before subcommand dispatch; any
failure is swallowed so a bad network never blocks a working command.
- Outdated → one-line stderr warning:
[update] da 2.0.0 is out of date — latest on this server is 2.1.0.
Upgrade: uv tool install --force <server>/cli/wheel/<…>.whl
- Disable with DA_NO_UPDATE_CHECK=1.
* fix(pr-review): None-guard the upgrade line + skip gzip on parquet paths
Two follow-ups from Devin review on #41.
1. format_outdated_notice(UpdateInfo(download_url=None)) emitted literal
"uv tool install --force None" — copy-pasting that fails. Drop the
upgrade snippet when the URL is absent and keep only the version line.
2. GZipMiddleware compressed everything over 1024 bytes, including the
parquet FileResponses served by /api/data/{tid}/download,
/cli/wheel/{name}, and /cli/download. Parquet is already columnar-
compressed — gzip there is pure CPU + latency with no size win, and
/api/data bodies can reach hundreds of MB. Wrap GZipMiddleware in a
small _SelectiveGZipMiddleware that skips those path prefixes and
delegates the rest to the stock middleware. JSON / HTML endpoints
(manifest, /install, /api/version, …) still get compressed.
* release: bump to 2.1.0 — unify AGNES_VERSION with pyproject.toml version (#42)
Before: two independent version systems. pyproject.toml carried semver
(2.0.0 → wheel filename → `da --version`) while release.yml injected
CalVer into AGNES_VERSION (e.g. 2026.04.155 → /api/version). Users saw
different strings in the CLI vs. the /install page, and the CLI auto-
update check couldn't tell "new deploy, same package version" apart
from "new package version".
Make pyproject.toml [project].version the single product-version source
of truth. release.yml extracts it and feeds AGNES_VERSION, so every
surface (/api/version, /api/health, /cli/latest, `da --version`) agrees
on one number. The CalVer tag keeps doing what CalVer is for: release
identity on the git tag and Docker image tag (versioned_tag).
Also wires AGNES_TAG through the build: release.yml → Dockerfile ARG →
env, so /api/version.image_tag finally reports the actual image tag
instead of the "unknown" fallback.
Bump to 2.1.0 to reflect the PRs shipped on ps/wheel-name-fix: durable
sync (atomic writes + manifest MD5 + retry), server GZip, CLI auto-
update probe, setup snippet PEP 427 URL.
* fix(pr-review): directional version compare in is_outdated()
UpdateInfo.is_outdated() used `self.latest != self.installed`, which
fires in both directions. If the server is rolled back or the user
connects to an older deployment, the CLI would warn "out of date"
and — worse — the formatted notice would prompt
uv tool install --force <older-version>.whl
i.e. an unintended downgrade.
Compare with packaging.version.Version (PEP 440 aware, handles pre-
release tags). Fall back to dotted-int tuple compare if packaging is
somehow missing, and return False on unparseable strings — better to
miss an upgrade hint than to silently suggest a downgrade.
Adds 4 test cases: installed older (True), installed newer (False),
10.0.0 vs 2.1.0 lexical-compare trap (correct), unparseable strings
(False).
Addresses Devin review on #43.
* fix(pr-review): read FastAPI app version from package metadata
app/main.py:80 hardcoded `version="2.0.0"` in the FastAPI constructor.
After #42 bumped pyproject.toml to 2.1.0, /api/version, /cli/latest,
and `da --version` all reported 2.1.0 while /openapi.json and the
/docs UI still advertised 2.0.0.
Read `agnes-the-ai-analyst` version via importlib.metadata (same
pattern cli/main.py:_cli_version already uses), with a `"dev"`
fallback when the package is not installed (source checkout). This
way pyproject.toml stays the single source of truth across every
version surface — /openapi.json now tracks the bump automatically.
Adds a dedicated test file to pin this behavior so a future
regression to a hardcoded literal fails at CI.
Addresses second Devin finding on #43.
* fix(pr-review): _fmt_bytes PiB label + negative cache in update_check
Two more follow-ups from Devin review on #43.
1. _fmt_bytes off-by-unit. The old loop exited at TiB but the fallback
labelled PiB, so 1 PiB rendered as "1024.0 PiB". Restructure: put
every unit inside the loop (KiB through EiB) so the division count
always matches the label. Covers up to 1 ZiB cleanly; anything
beyond renders as "<big>.0 EiB" rather than crashing.
2. Negative cache for failed /cli/latest probes. On a corporate
firewall / VPN that silently drops packets, the 3s HTTP timeout
fired on *every* `da` invocation. Writing a `latest=None` cache
entry with a 5-minute TTL caps that at one probe per 5min. Successful
probes still use the 24h TTL. Reading logic branches on whether the
cached `latest` is None.
Adds TestFmtBytes (2 cases: small/medium sizes and the PiB/EiB fallback
regression), plus two TestSync update-check cases covering negative-
cache reuse and TTL expiry.
Removed kbcstorage from all dependency groups (optional + dev) so urllib3
is no longer pinned to <2.0. Legacy Keboola client is available via
manual install: pip install kbcstorage
kbcstorage pins urllib3<2.0.0 which blocks Dependabot security patches.
Moved to [project.optional-dependencies] keboola-legacy since the primary
extraction path uses the DuckDB Keboola extension, not kbcstorage.
Legacy fallback uses lazy import — app works without it installed.
- All dependencies now in pyproject.toml [project.dependencies]
- Dev/test deps in [project.optional-dependencies] dev and [tool.uv]
- Dockerfile uses uv pip install . from pyproject.toml
- CI uses uv pip install ".[dev]"
- Deleted requirements.txt and requirements-dev.txt
- Updated README, CLAUDE.md install instructions
- Enhanced .dockerignore (exclude tests, docs, infra from image)