agnes-the-ai-analyst

Author	SHA1	Message	Date
ZdenekSrotyr	28430ced09	Keboola cutover: native parquet path + sync correctness + auto-discover protection (#190) * fix: cutover regressions + parallel Keboola legacy fallback Bundled fixes from a fresh-deploy run on a Keboola Storage backend with the block-shared-snowflake-access feature flag: the DuckDB Keboola extension's per-table scan can't access bucket schemas, so the legacy kbcstorage Storage-API client is the only working path. CUTOVER REGRESSIONS - agnes pull hash mismatch on every Keboola local-mode table: src/orchestrator.py:_update_sync_state stored md5(mtime+size)[:12] while the CLI compares against the full 32-char content MD5. Now stores the same content MD5 the materialized SQL path already used. - Trailing-slash sanitization in connectors/keboola/access.py and extractor.py: the DuckDB Keboola extension's ATTACH fails when the URL ends in / (canonical form). - src/profiler.py:TableInfo.description becomes optional; two call sites instantiated TableInfo without it, crashing the profiler pass. - scripts/ops/agnes-auto-upgrade.sh: chown on UID change. Older images ran as root; the current image runs as agnes (uid 999). Reads the target uid:gid from /etc/passwd inside the new image and chowns ${STATE_DIR}, /data/extracts, /data/analytics when the digest moves. - POST /api/sync/trigger is now a singleton per process. Two near-simultaneous trigger calls each forked an extractor subprocess, fought for extract.duckdb's file lock, starved uvicorn, and flipped the container to unhealthy. Trigger now returns 409 (sync_already_in_progress) while the sync lock is held; _run_sync acquires the lock non-blocking. PARALLEL LEGACY FALLBACK - Process pool fan-out for the _extract_via_legacy queue (default 8 workers, override via AGNES_KEBOOLA_PARALLELISM). Process pool, not thread pool, because connectors/keboola/client.py:export_table does os.chdir(temp_dir), which is process-global, so threads raced and slice files landed in the wrong directory ("[Errno 2] No such file or directory: '<job_id>.csv_X_Y_Z.csv'").
- Extractor subprocess timeout 1800s -> 3600s (configurable via AGNES_EXTRACTOR_TIMEOUT_SEC). 28+ tables × multi-minute Keboola export jobs need the headroom on telemetry-class projects. - Process group cleanup on timeout: Popen(start_new_session=True) puts the extractor in its own process group. On timeout the parent SIGTERMs the group (10s grace), then SIGKILLs stragglers. Without this, the pool workers were reparented to PID 1 and kept Keboola Storage export jobs open. The inline extractor script also installs a SIGTERM -> sys.exit(143) handler so the `with ProcessPoolExecutor(...)` block's __exit__ runs cleanly. Tests: existing tests that patched subprocess.run were updated to patch subprocess.Popen with a _FakePopen stand-in (same exit-code-injection contract). Two tests that exercise the parallel path force AGNES_KEBOOLA_PARALLELISM=1 to keep mocks alive (mocks don't carry over into ProcessPoolExecutor subprocesses). Squashed onto current main (was 7 commits + multi-commit CHANGELOG + agnes-auto-upgrade.sh conflicts; squashing avoids per-commit conflict resolution against main's flat-mount STATE_DIR refactor and the 0.38.0 release cut). * feat(keboola): Storage API direct extract path; drop extension data path The DuckDB Keboola extension's COPY routes through Keboola QueryService, which is unreliable on linked-bucket projects (extension v0.1.6 fixes that case but isn't yet in the community CDN, and pre-fix any project with the block-shared-snowflake-access feature flag couldn't see bucket schemas at all). Move the extract path off the extension entirely and talk to the Storage API directly via signed-URL download, which works on any project regardless of extension state. connectors/keboola/storage_api.py (NEW) Lightweight client built on requests.Session.
Three Storage API endpoints plus the signed-URL download: - POST /v2/storage/tables/{id}/export-async (kicks off job) - GET /v2/storage/jobs/{id} (poll until done) - GET /v2/storage/files/{id}?federationToken=1 (signed URL detail) - GET <signed_url> (download bytes) Supports sliced exports (manifest + per-slice signed URLs) and gzipped payloads. ExportFilter dataclass mirrors the Keboola filter spec (whereFilters / columns / changedSince / limit) and handles JSON round-trip with the registry's source_query column. Token redaction in error messages. Bounded exponential backoff on job polling. No cloud-SDK dependency on the data path; thread-safe. connectors/keboola/extractor.py - materialize_query() rewritten: takes bucket/source_table/source_query (JSON filter spec), exports via KeboolaStorageClient, converts CSV to parquet via DuckDB, atomic os.replace. Same return shape so sync.py downstream code stays uniform with the BQ branch. - _extract_via_legacy() also moved to Storage API direct (kept the name for caller compatibility with _legacy_worker / the parallel batch extractor). Per-call temp directories: no os.chdir, so threads don't race. app/api/sync.py _run_materialized_pass for source_type='keboola' rows now constructs a KeboolaStorageClient (replaces KeboolaAccess) and passes bucket/source_table/source_query to materialize_query. Reuses one client across rows for HTTP keep-alive. Also reads the Keboola URL from env (KEBOOLA_STACK_URL) when instance.yaml has no stack_url configured. cli/commands/admin.py discover-and-register defaults Keboola rows to query_mode='materialized' (NULL source_query = full table), matching the v26 migration's unification of the local/materialized split for Keboola. BigQuery and Jira keep their per-source defaults. src/db.py Schema bump 25 → 26. Migration: UPDATE table_registry SET query_mode='materialized' WHERE source_type='keboola' AND query_mode='local'.
NULL source_query on those rows means "full table export" — same effective behavior the local mode provided, but now via Storage API instead of the extension. pyproject.toml kbcstorage dep stays (admin-side bucket/table list still uses the SDK in app/api/admin.py / connectors/keboola/client.py); only the data path is migrated off the SDK. Comment updated to reflect the new boundary. tests - test_keboola_storage_api.py (NEW, 19 tests): ExportFilter parsing, HTTP client (token redaction, retry logic, polling), download_file (single, gzipped, sliced), end-to-end export_table_to_csv. - test_keboola_materialize.py rewritten: mocks KeboolaStorageClient instead of FakeAccess; same atomic-write + zero-rows + unsafe-id contracts. - test_sync_trigger_keboola_materialized.py: registry rows now carry bucket+source_table+JSON-shape source_query. 114+ Keboola-impacted tests green locally. * test: schema version assertion bumped to 26 alongside the keboola query_mode migration * fix(keboola): cutover hot-patches surfaced on agnes-dev Five small fixes that were applied as in-container hot-patches during agnes-dev cutover and need to be on the source-of-truth image so a fresh upgrade does not undo them. - app/api/sync.py: auto-discover gate considers the WHOLE registry (any source, any mode), not just rows where source matches and query_mode is local. After the v25→v26 keboola materialized migration an instance can have 30 materialized rows and zero local rows; the previous gate kept re-firing _discover_and_register_tables every scheduler tick, creating duplicate auto-discovered rows with the wrong bucket prefix every time. - app/api/admin.py: _discover_and_register_tables reassembles the bucket as <stage>.<bucket-id> (e.g. in.c-finance) instead of dropping the stage prefix; default query_mode for keboola is now materialized (the v26 contract); validator allows NULL source_query for keboola materialized rows (full-table export via Storage API export-async, no SQL needed). 
- cli/commands/admin.py: register-table mirrors the server validator (NULL source_query allowed for source_type=keboola); --bucket help text generalized to cover both BQ dataset and Keboola bucket id. - connectors/keboola/extractor.py: max_line_size=64 MiB on read_csv_auto so embedded JSON / SQL cells (kbc_component_configuration in particular) do not trip the default 2 MiB ceiling. - connectors/keboola/storage_api.py: GCP backend support — when the Storage API returns a manifest whose slice URLs are gs:// references with a gcsCredentials block, rewrite to the JSON REST download endpoint and authenticate with the issued OAuth bearer token; redact tokens in any surfaced error string. * test: align with new keboola materialized + auto-discover-gate contracts - test_admin_keboola_materialized: rename test_register_keboola_materialized_rejects_missing_source_query → test_register_keboola_materialized_accepts_missing_source_query. v25→v26 introduced 'keboola materialized with NULL source_query means full-table export via Storage API export-async' as the default registration shape; the rejection case is no longer the contract. - test_sync_filter: add list_all() to _StubRegistry. The auto-discover gate in _run_sync now keys off the WHOLE registry (not just local rows) so materialized-only Keboola instances do not re-trigger discovery on every tick. * feat(keboola): native parquet export — skip CSV roundtrip Storage API export-async accepts fileType={csv,parquet}. Switching the materialized sync to parquet eliminates the CSV → DuckDB COPY → parquet roundtrip that pinned a single uvicorn worker over 4 GiB on multi-GB tables (read_csv with all_varchar + max_line_size=64MB has to materialize the whole CSV in memory before COPY can stream out a parquet). Snowflake UNLOAD on Keboola's side already produces typed, self-contained parquet files; the extractor downloads them and renames into place. 
Two cases: - Single-file export (small table): file_info.url points at one signed URL; download_file streams chunks straight to .parquet.tmp and we're done. No DuckDB. - Sliced export (Snowflake UNLOAD respects MAX_FILE_SIZE — 16 MiB default — so anything larger arrives as N parquet slices): each slice is a complete parquet file with its own footer; naive concat would corrupt them. download_file_slices keeps the slices as separate files in a tempdir, then DuckDB COPY (SELECT * FROM read_parquet([slice0, slice1, ...])) merges them into one consolidated parquet. DuckDB streams row groups during this — peak memory bounded to one row group (~1 MiB) regardless of source size. The legacy CSV path stays as the explicit opt-in via source_query= '{"file_type":"csv"}' for projects whose backend can't UNLOAD parquet (none known today; cheap escape hatch). Backward-compat alias KeboolaStorageClient.export_table_to_csv kept. Also fixes a latent bug in download_file's gzip detection: previous heuristic flagged any unencrypted file as gzipped, which would have corrupted parquet downloads at gunzip time. Name-suffix-only now. * fix: tempdir leak cleanup, every 0m schedule, /sync/trigger body shapes Three small self-contained fixes uncovered during agnes-dev cutover. - connectors/keboola/extractor.py: tempfile.TemporaryDirectory now uses ignore_cleanup_errors=True so a worker death mid-write doesn't leave multi-GiB stale slice trees on the boot disk. (12 GiB seen after a disk-full crash where TemporaryDirectory's own cleanup also raised and got swallowed.) - src/scheduler.py: is_valid_schedule accepts 'every 0m' (interval=0 = always due). Force-resync of an errored row no longer requires waiting out the default 'every 1h' interval — admin can flip the schedule, trigger, then flip back. 
- app/api/sync.py: POST /api/sync/trigger accepts both ['table_id'] (legacy bare-array body) and {'tables': ['table_id']} (matches the response payload shape, more discoverable for clients building requests by hand). Malformed bodies return 422 with a structured detail; null/missing means 'sync everything' as before. Tests cover: tempdir cleanup on raise (sliced parquet path), is_valid_schedule + is_table_due 'every 0m' acceptance, and trigger body parametrized matrix (8 valid shapes + 6 rejection cases). * fix: targeted-trigger filter in materialized pass + auto-upgrade defer Two operational gaps observed during agnes-dev cutover, in the same sync-routing area. - _run_materialized_pass now takes a 'tables' arg and skips rows not in the target set with reason='not_in_target'. POST /api/sync/trigger with a body of tables previously only scoped the legacy extractor subprocess — the materialized pass kept iterating every due materialized row, so an admin asking to re-sync kbc_job re-ran every other due materialized row alongside it. Match on registry id OR name (admins commonly pass either form). tables=None preserves the no-filter behavior. - New GET /api/sync/status (public, no auth) returns {locked: bool} off _sync_lock.locked(). agnes-auto-upgrade.sh probes this before docker compose up -d and exits 0 with a 'deferred recreate' log line if a sync is in flight — the next 5-min cron tick retries. Pre-fix, an auto-upgrade triggered mid-sync would recreate the uvicorn worker and kill the in-flight extractor / Snowflake-UNLOAD download (observed when kbc_job's first 7-day retry got SIGKILLed). Connection failures in the probe fall through to the upgrade — being stuck on a wedged image is worse than interrupting a hypothetical sync. * fix: auto-discover protects admin overrides + surfaces drift Two real-world incidents on agnes-dev drove this: 1. kbc_job was registered manually with the correct (in.c-kbc_telemetry, kbc_job) coordinates. 
A naive auto-discover re-run would have inserted a SECOND kbc_job row at the slugified id 'in_c-keboola-storage_kbc_job' (where Keboola's discovery places it) — and that row's Storage API export-async 404s. 2. An earlier auto-discover bug stripped the stage prefix from bucket ids ('c-finance' instead of 'in.c-finance'), inserting 137 rows whose syncs all failed. Fix: - _discover_and_register_tables now builds a plan first (_build_keboola_discovery_plan) classifying each discovered table into one of new / existing_match / existing_drift / invalid, then executes only the 'new' bucket. Drift rows are reported with both sides of the disagreement plus drift_kind: - same_id_diff_coords: registry has the same id but different bucket / source_table (admin migrated coords inline). - name_collision: discovery's slugified id differs from any registry id, but the discovered .name matches an existing row's .name (case-insensitive). Catches the kbc_job case. - Bucket detection now prefers the API's authoritative bucket_id field (separate field on the Keboola tables.list response, normalised by KeboolaClient.discover_all_tables). Falls back to id-string parsing only when bucket_id is missing (older fallback path inside discover_all_tables). - Endpoint POST /api/admin/discover-and-register?dry_run=true returns the plan without writing — would_register, drift, invalid lists. Lets an operator audit before merging discovery with a registry that has admin overrides. Removed 'every 0m' from test_register_request_rejects_malformed_sync_schedule — the runtime started accepting it in the previous commit (force-resync override) and the validator follows suit. * feat(keboola): AGNES_TEMP_DIR routes tempfiles off overlayfs /tmp The container's /tmp lives on the boot disk's overlayfs (29 GiB on agnes-dev, shared with /var). Snowflake UNLOAD of a wide table writes slices into per-call /tmp tempdirs that fill multi-GiB / many-slice exports long before the dedicated data disk fills. 
agnes-dev hit 100% boot-disk while the 20 GiB data disk had 15 GiB free. connectors.keboola.storage_api.get_temp_root() reads AGNES_TEMP_DIR; mkdirs the target on first use; unset / empty / unwritable falls back to None (system tempdir, OSS-pre-fix behaviour). Both materialize_query (parquet path) and _extract_via_legacy (CSV fallback) and the sliced-CSV concat path in storage_api use the helper now. docker-compose.yml defaults AGNES_TEMP_DIR=/data/tmp on app, scheduler, and extract services. The data volume is the dedicated disk in production layouts and a plain docker volume in single-disk dev/laptop setups — same blast radius as the previous /tmp default on the latter, no regression.	2026-05-07 12:12:14 +02:00
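The hash-mismatch fix in 28430ced09 comes down to comparing like with like: a fingerprint of file metadata can never equal a digest of file contents. A minimal sketch, assuming the CLI hashes the whole local parquet file with MD5 and that the pre-fix fingerprint concatenated mtime and size as strings (the exact pre-fix input format is an assumption). `legacy_fingerprint` and `content_md5` are hypothetical helper names, not the functions in src/orchestrator.py.

```python
import hashlib
import os


def legacy_fingerprint(path: str) -> str:
    """Pre-fix shape (assumed input format): 12 hex chars derived from mtime + size, not the bytes."""
    st = os.stat(path)
    return hashlib.md5(f"{st.st_mtime}{st.st_size}".encode()).hexdigest()[:12]


def content_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Post-fix shape: full 32-char MD5 of the file contents, streamed in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps memory flat even for multi-GB parquet files.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the two values differ in both length (12 vs 32 hex characters) and input, every local-mode table compared as changed on each `agnes pull` until both sides used the content digest.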
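The two /api/sync/trigger changes in 28430ced09 (a per-process singleton that answers 409, and accepting both the bare-array and `{"tables": [...]}` body shapes) can be sketched without a web framework. This is an illustration under stated assumptions, not the handler in app/api/sync.py: `normalize_tables`, `trigger` and `MalformedBody` are hypothetical names, the 202 success status and the rejection of unknown keys are assumptions, and in the real service the lock would be released when the background sync finishes rather than left to the caller.

```python
import threading
from typing import Any, Optional

_sync_lock = threading.Lock()


class MalformedBody(ValueError):
    """Raised for bodies that should map to HTTP 422."""


def normalize_tables(body: Any) -> Optional[list[str]]:
    """Accept None, ["id", ...] or {"tables": ["id", ...]}; None means 'sync everything'."""
    if body is None:
        return None
    if isinstance(body, dict):
        extra = set(body) - {"tables"}
        if extra:  # assumption: unknown keys are rejected
            raise MalformedBody(f"unexpected keys: {sorted(extra)}")
        body = body.get("tables")
        if body is None:
            return None
    if not isinstance(body, list) or not all(isinstance(t, str) and t for t in body):
        raise MalformedBody("expected a list of non-empty table id strings")
    return body


def trigger(body: Any, lock: threading.Lock = _sync_lock) -> tuple[int, dict]:
    try:
        tables = normalize_tables(body)
    except MalformedBody as exc:
        return 422, {"detail": str(exc)}
    # Non-blocking acquire: a second concurrent trigger gets 409 instead of
    # forking a competing extractor that fights for extract.duckdb's lock.
    if not lock.acquire(blocking=False):
        return 409, {"detail": "sync_already_in_progress"}
    # The caller (the background sync in a real service) owns the release.
    return 202, {"tables": tables}
```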
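The timeout cleanup in 28430ced09 relies on the extractor being a process-group leader, so one signal reaches the extractor and every pool worker it forked. A POSIX-only sketch around a hypothetical `run_with_group_timeout` wrapper; the 10 s default grace mirrors the commit, everything else is illustrative:

```python
import os
import signal
import subprocess


def _signal_group(pgid: int, sig: int) -> None:
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass  # nothing left in the group


def run_with_group_timeout(argv: list[str], timeout: float, grace: float = 10.0) -> int:
    """Run argv as a new session leader; on timeout, SIGTERM then SIGKILL its whole group."""
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # start_new_session=True makes the child a session and process-group leader,
        # so its pid is also the pgid shared by any workers it forks.
        pgid = proc.pid
        _signal_group(pgid, signal.SIGTERM)
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            pass
        # Anything still alive after the grace period (leader or workers) is killed.
        _signal_group(pgid, signal.SIGKILL)
        return proc.wait()
```

Signalling the group rather than `proc.pid` alone is the point: a plain `proc.terminate()` would reach only the leader and leave ProcessPoolExecutor workers reparented to PID 1, which is the leak the commit describes.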
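The auto-discover protection in 28430ced09 classifies before it writes. The sketch below mirrors the four buckets and two drift kinds named in the commit. The `Row` shape, the `build_plan` name and the rule that a row missing its bucket or source table is `invalid` are assumptions for illustration, not the real `_build_keboola_discovery_plan`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Row:
    id: str            # registry id (slugified for discovered rows)
    name: str          # human-facing table name
    bucket: str        # e.g. "in.c-finance"
    source_table: str  # table name inside the bucket


@dataclass
class DiscoveryPlan:
    new: list = field(default_factory=list)
    existing_match: list = field(default_factory=list)
    existing_drift: list = field(default_factory=list)  # (discovered, registered, drift_kind)
    invalid: list = field(default_factory=list)


def build_plan(discovered: list[Row], registry: list[Row]) -> DiscoveryPlan:
    """Classify every discovered table; nothing is written here."""
    by_id = {r.id: r for r in registry}
    by_name = {r.name.lower(): r for r in registry}  # name match is case-insensitive
    plan = DiscoveryPlan()
    for d in discovered:
        if not d.bucket or not d.source_table:
            plan.invalid.append(d)  # assumption: missing coordinates means invalid
        elif d.id in by_id:
            reg = by_id[d.id]
            if (reg.bucket, reg.source_table) == (d.bucket, d.source_table):
                plan.existing_match.append(d)
            else:
                plan.existing_drift.append((d, reg, "same_id_diff_coords"))
        elif d.name.lower() in by_name:
            plan.existing_drift.append((d, by_name[d.name.lower()], "name_collision"))
        else:
            plan.new.append(d)
    return plan
```

Only `plan.new` would be registered; a `dry_run=true` call would return the whole plan instead. With a registry holding the manually registered kbc_job row, a discovered `in_c-keboola-storage_kbc_job` row carrying the same name lands in `existing_drift` as `name_collision` instead of being inserted.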
Vojtech Rysanek	32c8ea601a	fix(bigquery): apply bq_query_timeout_ms on every BQ-extension attach + surface silent failures The DuckDB BigQuery extension defaults bq_query_timeout_ms to 90 s, which is too tight for analyst-scale queries against view-backed BQ datasets. Agnes already has apply_bq_session_settings() that bumps it to 600 s (configurable via data_source.bigquery.query_timeout_ms), but two regressions let the 90 s default leak through to live queries: 1. apply_bq_session_settings() swallowed every Exception silently. If the BigQuery extension wasn't loaded on the connection yet, or the installed extension version didn't recognise the setting, the SET would fail and the function would return without surfacing the problem. Operators saw 90 s timeouts on 'agnes query --remote' with no log line explaining why. 2. The call sites in src/db.py:_reattach_remote_extensions and src/orchestrator.py:_remote_attach only invoked apply_bq_session_settings on the metadata-token branch (token_env empty, the BqAccess contract). The token-based and no-auth branches ran ATTACH against the BigQuery extension without ever applying the timeout setting — so any BQ source registered with an explicit token_env, or with no auth env at all, fell back to the 90 s default. Fix: - apply_bq_session_settings now logs WARNING on each failure path (instance_config import error, non-numeric value, SET execution failure, readback error). It also verifies the setting actually landed via SELECT current_setting('bq_query_timeout_ms') and logs WARNING when the readback disagrees with the requested value, which catches the silent-ignore case some extension versions exhibit. - Both _reattach_remote_extensions (src/db.py) and _remote_attach (src/orchestrator.py) now call apply_bq_session_settings on every branch that ATTACHes a BigQuery alias, not only the metadata-token branch. Idempotent: calling it twice on the metadata-token path is a no-op SET. 
Tests: - Extended the _RecordingConn fixture to support .fetchone() so the readback assertion path works. Updated existing call-shape assertions to expect the SELECT current_setting readback alongside the SET. Added two new tests covering the WARNING surfaces for SET failure and readback mismatch, as regression guards for the silent-fallback bug this PR addresses. - Full BQ-touching suite (398 tests) passes.	2026-05-06 11:24:14 +04:00
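The fix in 32c8ea601a replaces a silent `except Exception` with logged failures plus a readback. A minimal sketch of that pattern, assuming a DuckDB-style connection whose `execute()` returns an object with `fetchone()`. `apply_bq_timeout` is a hypothetical stand-in for apply_bq_session_settings and takes the timeout directly instead of reading instance config:

```python
import logging

log = logging.getLogger("bq_settings_sketch")


def apply_bq_timeout(conn, timeout_ms) -> bool:
    """SET bq_query_timeout_ms and verify it landed; log WARNING instead of swallowing."""
    try:
        value = int(timeout_ms)
    except (TypeError, ValueError):
        log.warning("bq query timeout %r is not numeric; keeping extension default", timeout_ms)
        return False
    try:
        # value is an int, so interpolating it into the SQL is safe.
        conn.execute(f"SET bq_query_timeout_ms = {value}")
    except Exception as exc:  # e.g. extension not loaded, setting unknown to this version
        log.warning("SET bq_query_timeout_ms failed: %s", exc)
        return False
    try:
        row = conn.execute("SELECT current_setting('bq_query_timeout_ms')").fetchone()
    except Exception as exc:
        log.warning("readback of bq_query_timeout_ms failed: %s", exc)
        return False
    # Catches extension versions that accept the SET but silently ignore it.
    if row is None or str(row[0]) != str(value):
        log.warning("bq_query_timeout_ms readback %r != requested %d", row, value)
        return False
    return True
```

Comparing the readback as a string tolerates the setting coming back as either an integer or text; how the extension actually reports it is an assumption here.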
ZdenekSrotyr	6c94d2cbce	Merge remote-tracking branch 'origin/main' into pr180-review # Conflicts: # CHANGELOG.md # pyproject.toml	2026-05-06 07:27:25 +02:00
ZdenekSrotyr	4a1916a4b0	fix: v24 migration error message points to actual snapshot path The pre-migration snapshot was correctly migrated to STATE_DIR-aware path in src/db.py:1832 (`_get_state_dir() / 'system.duckdb.pre-migrate'`), but the error message in _migrate_v24_bq_source_queries still hardcoded the old `{DATA_DIR}/state/...` shape. Under flat-mount layout (STATE_DIR=/data-state), an operator hitting the v24 migration error would look in /data/state/ for a rollback snapshot that lives in /data-state/. Devin Review on PR #194 round 3.	2026-05-05 20:13:08 +02:00
Vojtech Rysanek	a303de0372	feat: STATE_DIR env var + flat-mount overlay (parallel disks) Introduces STATE_DIR as the single source of truth for the writable state directory path, with backward-compatible default of ${DATA_DIR}/state. Pairs with a new docker-compose.flat-mount.yml overlay that mounts the state disk in PARALLEL to the data disk (rather than nested under it). Why --- The default deployment topology nests state under data: sdb at /data, sdc at /data/state. That layout has known fragility documented in docs/state-dir.md — bind-propagation gotchas, two-writer collisions on the same prefix, mount-order coupling. The 2026-05-05 incident in the Groupon FoundryAI deployment was a manifestation of the propagation gotcha. The flat layout (sdb at /data, sdc at /data-state — parallel, not nested) eliminates the nested-mount class entirely. Each disk is its own bind mount, recursive by default in modern Docker. No volume options to forget. No two-writer collision (host scripts and container app share /data-state at the same path, single namespace). What changes ------------ App code (Python): - src/db.py: new _get_state_dir() helper. get_system_db() and schema migration snapshot use it. - app/secrets.py: new _state_dir() helper. _load_or_generate() uses it for .session_secret and .jwt_secret. - app/main.py: .env_overlay loaded from _state_dir(). Host scripts: - scripts/ops/agnes-auto-upgrade.sh: STATE_DIR drives mount-sanity check and cert detection. Defaults preserve existing behavior. - scripts/ops/agnes-tls-rotate.sh: STATE_DIR drives CERT_DIR. New compose overlay: - docker-compose.flat-mount.yml: parallel /data and /data-state binds per service. Mutually exclusive with docker-compose.host-mount.yml; pick one based on disk topology. Documentation: - docs/state-dir.md: layout choice (A nested vs B flat), pros/cons, migration steps, and which code paths read STATE_DIR. 
Backward compatibility ---------------------- STATE_DIR defaults to ${DATA_DIR}/state — current behavior. Existing deployers that don't set the var see no behavior change. Migration to flat layout is opt-in per the runbook in docs/state-dir.md. Validation ---------- - bash -n on both host scripts: pass - docker compose config -f docker-compose.flat-mount.yml: resolves cleanly with all 6 services binding /data and /data-state directly - python3 import + helper exercise: STATE_DIR override works, default falls back to ${DATA_DIR}/state Companion to PR #191 (drop named-volume driver_opts in host-mount.yml). That PR fixes the immutability footgun for Layout A; this PR offers Layout B as the architectural alternative.	2026-05-05 19:28:07 +02:00
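The STATE_DIR contract in a303de0372, and the later fix in 4a1916a4b0 that builds the v24 error message from the same path, reduce to one helper that every caller shares. A sketch, assuming `DATA_DIR` falls back to `/data` and that an empty STATE_DIR counts as unset; `get_state_dir` and `pre_migrate_snapshot_path` are hypothetical names modelled on `_get_state_dir()`:

```python
import os
from pathlib import Path


def get_state_dir(env=None) -> Path:
    """STATE_DIR if set and non-empty, else ${DATA_DIR}/state (backward-compatible default)."""
    env = os.environ if env is None else env
    explicit = env.get("STATE_DIR", "").strip()
    if explicit:
        return Path(explicit)
    return Path(env.get("DATA_DIR", "/data")) / "state"


def pre_migrate_snapshot_path(env=None) -> Path:
    # Build every path, including the one quoted in error messages, from the
    # helper so flat-mount layouts never point operators at /data/state.
    return get_state_dir(env) / "system.duckdb.pre-migrate"
```

Deriving the error-message path from the helper, rather than formatting `{DATA_DIR}/state` by hand, is what keeps a flat-mount operator pointed at /data-state.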
ZdenekSrotyr	025a2b5c0e	fix(db): apply bq_query_timeout_ms to read-only reattach path Devin Review on PR #181: caught that the original PR plumbed the new SET into the orchestrator's _remote_attach (rebuild path), the BqAccess factory (materialize path), and the standalone extractor — but missed the actual primary `agnes query --remote` request path: every read-only analytics-DB connection runs `_reattach_remote_extensions` in `src/db.py` on open, and that LOAD bigquery + ATTACH cycle was unconfigured. Without this commit, the very flow the PR was meant to fix — analyst queries hitting BQ views > 90s — would still 400 with the same Binder Error / Job ID wording, because the runtime LOAD bigquery happens here not in the orchestrator's rebuild path. Apply apply_bq_session_settings(conn) right after the BQ secret is created and before ATTACH, mirroring what every other PR site does.	2026-05-05 16:40:40 +02:00
Minas Arustamyan	9d53efc6e1	fix(schema-v25): drop FK refs from store tables Past migration finalize steps run RENAME / DROP COLUMN / ALTER against the `users` table (e.g. _v12_to_v13_finalize, _v13_to_v14_finalize, _v17_to_v18_finalize, the v5 backfill). DuckDB rejects an ALTER on a table that any other table references via FOREIGN KEY, so the new store_entities / user_store_installs / user_plugin_optouts entries, which the self-heal pass writes to _SYSTEM_SCHEMA before the migration ladder runs, broke 6 legacy-migration tests with: Cannot alter entry "users" because there are entries that depend on it. Pre-existing convention (see personal_access_tokens at v6) is to omit FK constraints to `users` and validate user existence at the app layer. Sync the three v25 tables with that convention. Same edit in both _SYSTEM_SCHEMA and _V24_TO_V25_MIGRATIONS so fresh installs and upgraded installs land in the same shape. App-level cascade behavior is unchanged: store entity DELETE explicitly deletes user_store_installs rows in app/api/store.py, and the admin grant-deletion hook explicitly deletes user_plugin_optouts rows for the plugin. The dropped FK constraints were defense-in-depth, not the only guard.	2026-05-05 03:15:09 +02:00
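With the FOREIGN KEY clauses gone, cascade deletes become the application's job, as 9d53efc6e1 notes for app/api/store.py. A sketch using Python's built-in sqlite3 as a stand-in for DuckDB, with simplified table shapes; `SCHEMA` and `delete_store_entity` are hypothetical:

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (id TEXT PRIMARY KEY, username TEXT);
-- No REFERENCES users(id): later migrations stay free to ALTER users.
CREATE TABLE store_entities (id TEXT PRIMARY KEY, owner_id TEXT NOT NULL, name TEXT);
CREATE TABLE user_store_installs (user_id TEXT NOT NULL, entity_id TEXT NOT NULL,
                                  PRIMARY KEY (user_id, entity_id));
"""


def delete_store_entity(conn: sqlite3.Connection, entity_id: str) -> None:
    """App-level cascade: remove installs explicitly, then the entity, in one transaction."""
    with conn:  # commits on success, rolls back if either DELETE raises
        conn.execute("DELETE FROM user_store_installs WHERE entity_id = ?", (entity_id,))
        conn.execute("DELETE FROM store_entities WHERE id = ?", (entity_id,))
```

Running both DELETEs in one transaction means a failure cannot leave installs pointing at a deleted entity.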
Minas Arustamyan	d5a7c9ad79	feat(store): /store + /my-ai-stack — community marketplace + per-user composition Adds a community-driven Store where any authenticated user uploads skills/agents/plugins as ZIPs, plus /my-ai-stack as the per-user composition view. The served Claude Code marketplace is now: (admin_granted ∖ opt_outs) ∪ store_installs Skill + agent installs are merged into a single `agnes-store-bundle` plugin in the served marketplace; type=plugin uploads stay standalone. Names are suffixed with `-by-<owner-username>` at upload time so two owners can use the same display name without colliding in Claude Code's flat skill/agent namespace. Schema v23 → v24 adds three tables: - store_entities — community-uploaded skills/agents/plugins - user_store_installs — what each user has chosen to install - user_plugin_optouts — opt-out overlay on top of admin grants Admin grant-delete drops every user's opt-out for that plugin so re-grant resets cleanly to enabled (no sticky personal preference). UI: - /store — e-commerce-style listing with type/category/owner filters, search, pagination, owner-aware [Install] buttons, clickable cards - /store/new — 2-step upload wizard with drag & drop, preview validation (POST /api/store/entities/preview), docs multi-upload, photo + video URL - /store/{id} — detail page with hero, file list, docs, owner actions (Edit/Delete) for the uploader - /my-ai-stack — Granted plugins (toggle opt-out) + From the Store (uninstall) sections - Admin nav: Marketplaces moved into Admin dropdown, renamed to "Curated Marketplaces" Validation hardening: type-mismatch guards reject skill ZIP uploaded as agent (or vice versa), and plugin ZIPs masquerading as skills/agents. Human-readable error messages mapped client-side from machine codes. Cross-source naming: Store entity-id-prefixed dirs (`plugins/store-<id>/`) plus the bundle (`plugins/store-bundle/`) avoid collisions with admin marketplaces (whose `store` slug is reserved by `is_valid_slug`). 
Bundle composition is content-hashed at serve time — install/uninstall or owner re-upload bumps the bundle's plugin.json `version`, so Claude Code's auto-update toggle picks up changes. Tests: 50+ new tests across naming, repositories, filter (admin ∪ store ∪ bundle), API (upload/install/uninstall/delete/preview/docs), end-to-end marketplace.zip with bundle merging.	2026-05-05 02:53:49 +02:00
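The served-marketplace rule in d5a7c9ad79 is plain set algebra, and the bundle version only has to change whenever the bundle's contents do. A sketch under assumptions: plugin identifiers are strings, the bundle is modelled as a path-to-bytes mapping, and SHA-256 plus the `0.0.0+<hash>` version shape are illustrative choices, not the format the commit uses. All function names are hypothetical.

```python
import hashlib


def served_plugins(admin_granted: set[str], opt_outs: set[str], store_installs: set[str]) -> set[str]:
    """(admin_granted minus opt_outs) union store_installs."""
    return (admin_granted - opt_outs) | store_installs


def suffixed_name(display_name: str, owner_username: str) -> str:
    # Owner suffix keeps Claude Code's flat skill/agent namespace collision-free.
    return f"{display_name}-by-{owner_username}"


def bundle_version(members: dict[str, bytes]) -> str:
    """Content hash over the bundle's members; install/uninstall/re-upload all change it."""
    h = hashlib.sha256()
    for name in sorted(members):  # sorted, so member order does not matter
        h.update(name.encode())
        h.update(b"\0")  # separator so name/content boundaries cannot blur
        h.update(hashlib.sha256(members[name]).digest())
    return "0.0.0+" + h.hexdigest()[:12]
```

Note that opt-outs only subtract from admin grants: a plugin the user installed from the Store stays served even if it also appears in their opt-out set.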
ZdenekSrotyr	0612c1e1a1	fix(schema-v24): raise on deferred migration so retry path actually runs (Devin Review on db.py:1757) Pre-fix: when v24 migration found rows to migrate but data_source.bigquery.project was empty, it logged a warning per row and returned normally. Schema_version then bumped to 24 unconditionally → next start's 'if current < 24:' gate skipped _v23_to_v24_finalize forever, leaving rows in DuckDB-flavor SQL that the new _wrap_admin_sql_for_jobs_api wrapping path rejects. Devin escalated this from advisory ("idempotent retry") to critical on rescan after my reply. The reply was wrong — the LIKE filter inside the function gives idempotency IF the function is called again, but the schema-version gate prevents that call from happening. Fix (Devin's recommended Approach 1): raise RuntimeError BEFORE the schema-version bump when rows need migration but project_id is empty. The schema_version stays at 23, so on next start the 'if current < 24:' gate fires and the migration runs again — this time with project_id configured. Side effect: a BQ-using deployment that hasn't set the project blocks startup until they do. That's the right call for a config error that would otherwise silently break all materialized tables. The error message points at the right knob (data_source.bigquery.project + restart). No-rows-no-block invariant preserved: the early 'if not rows: return' at the top of _v23_to_v24_finalize means non-BQ deployments are unaffected. Tests: - test_v24_raises_when_project_not_configured_and_rows_need_migration: asserts raise + schema_version stays at 23 (the load-bearing invariant for retry-on-next-start to work) - test_v24_skips_clean_when_no_rows_match_even_without_project: asserts non-BQ deployments don't block startup - Existing 3 tests still pass	2026-05-04 23:11:34 +02:00
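The ordering argument in 0612c1e1a1 is easiest to see with the version gate spelled out. A sketch with an in-memory dict standing in for the schema_version table; `migrate_to_v24` and `MigrationDeferred` are hypothetical names, and the loop body is a placeholder for the real SQL rewrite:

```python
class MigrationDeferred(RuntimeError):
    """Config is missing; leave schema_version untouched so the next start retries."""


def migrate_to_v24(state: dict, rows_needing_migration: list, project_id: str) -> None:
    if state["schema_version"] >= 24:  # the 'if current < 24:' gate
        return
    if rows_needing_migration:
        if not project_id:
            # Raise BEFORE the bump: schema_version stays at 23, so the gate fires again.
            raise MigrationDeferred(
                "v24 migration needs data_source.bigquery.project; set it and restart"
            )
        for row in rows_needing_migration:
            row["migrated"] = True  # placeholder for the real rewrite
    # No rows to migrate (e.g. non-BQ deployments) falls straight through to the bump.
    state["schema_version"] = 24
```

If the raise came after the `schema_version = 24` assignment, the gate would skip the function on every later start, which is exactly the stuck state the commit describes.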
ZdenekSrotyr	103efb69f0	chore(cli-rename): replace stale `da` verbs in active code paths Bring admin UI, audit-log messages, code comments, and analyst-facing skill docs in line with the post-bootstrap CLI surface (`agnes pull`, `agnes push`, `agnes init`, `agnes snapshot create`). The legacy `_LEGACY_STRINGS` detection tuple in `app/api/claude_md.py` and the hook upgrade markers in `cli/lib/hooks.py` are intentionally left as-is — they exist precisely to flag pre-rewrite content for re-authoring. Strip "(folded from `da metrics list`)" / "(lifted from `da metrics show`)" / "Replaces the old `da analyst status`" docstring noise — the rename history is in CHANGELOG.md, not in module docstrings.	2026-05-04 21:10:43 +02:00
ZdenekSrotyr	ce108d4c6d	fix(schema): code-review follow-ups for `fac10b29` - _v23_to_v24_finalize: wrap row-update loop in BEGIN/COMMIT/ROLLBACK to match the project's transactional-finalizer pattern (compare _v12_to_v13_finalize, _v17_to_v18_finalize, _v18_to_v19_finalize). Pre-fix a process crash mid-loop left the schema_version unchanged but partially-converted rows persisted across restart — idempotent overall but inconsistent with project convention. - _v23_to_v24_finalize: re.sub replacement now uses a function-form (lambda) instead of an f-string, so any future project_id with a backslash sequence isn't misinterpreted as a group reference. - tests: add a Keboola-source materialized row case asserting the SELECT's source_type filter prevents non-BQ rewrites.	2026-05-04 19:32:24 +02:00
ZdenekSrotyr	fac10b29e4	feat(schema): v24 — rewrite materialized BQ source_query to BQ-native Materialize now wraps admin SQL into bigquery_query('<billing>', '<inner>') which requires the inner SQL to be BigQuery-flavor (backticked identifiers, native function syntax). v24 migrates existing rows from DuckDB-flavor (bq."ds"."tbl") to (`<project>.ds.tbl`) using the configured BQ project. Idempotent on already-converted rows; logs a warning and skips when the project isn't configured (operator can configure + restart for retry).	2026-05-04 19:15:54 +02:00
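The v24 rewrite in fac10b29e4, together with the function-form replacement adopted in the follow-up ce108d4c6d, can be sketched with the standard `re` module. The pattern is an assumption about how the DuckDB-flavor reference `bq."ds"."tbl"` is matched; `to_bq_native` is a hypothetical name:

```python
import re

# Hypothetical pattern for the DuckDB-flavor reference bq."dataset"."table".
_BQ_REF = re.compile(r'bq\."([^"]+)"\."([^"]+)"')


def to_bq_native(sql: str, project_id: str) -> str:
    """Rewrite bq."ds"."tbl" into `project.ds.tbl` (BigQuery-native backticked form)."""
    # Function-form replacement: the returned string is inserted literally, so a
    # backslash in project_id can never be read as a group reference such as \1.
    return _BQ_REF.sub(lambda m: f"`{project_id}.{m.group(1)}.{m.group(2)}`", sql)
```

Rows already rewritten to backticked form no longer match the pattern, so running the rewrite twice is a no-op, which is the idempotency the commit relies on.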
ZdenekSrotyr	f01eb4143d	feat(db,repo,renderer): schema v23 + claude_md_template + ClaudeMd renderer - Bump SCHEMA_VERSION 22 → 23; add claude_md_template singleton table to _SYSTEM_SCHEMA and _V22_TO_V23_MIGRATIONS; wire migration + fresh-install seed - src/repositories/claude_md_template.py: ClaudeMdTemplateRepository (get/set/reset) mirroring WelcomeTemplateRepository; defensive re-seed in get() - src/claude_md.py: compute_default_claude_md / render_claude_md / build_claude_md_context — rich renderer with RBAC-filtered tables, metrics, and marketplaces; reads override from claude_md_template or falls back to config/claude_md_template.txt; raises TemplateError on broken override - config/claude_md_template.txt: default Jinja2 markdown template restored from PR #167 history (tables, metrics, marketplaces, BQ guidance, corporate memory, directory structure, per-user footer)	2026-05-03 22:43:56 +02:00
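The get/set/reset shape of ClaudeMdTemplateRepository, with its defensive re-seed in `get()`, can be sketched as a singleton-row table. This uses sqlite3 instead of DuckDB, and `TemplateRepoSketch` and `effective_template` are hypothetical. Treating a NULL `content` as "no override, fall back to config/claude_md_template.txt" follows the commit's description but not necessarily its column layout.

```python
import sqlite3
from typing import Optional


class TemplateRepoSketch:
    """Singleton-row repository; get() re-seeds the row if it has gone missing."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS claude_md_template "
            "(id INTEGER PRIMARY KEY CHECK (id = 1), content TEXT)"
        )
        self._seed()

    def _seed(self) -> None:
        # NULL content means "no override"; the CHECK keeps it a single row.
        self.conn.execute("INSERT OR IGNORE INTO claude_md_template (id, content) VALUES (1, NULL)")

    def get(self) -> Optional[str]:
        row = self.conn.execute("SELECT content FROM claude_md_template WHERE id = 1").fetchone()
        if row is None:  # defensive re-seed, e.g. after a manual DELETE
            self._seed()
            return None
        return row[0]

    def set(self, content: Optional[str]) -> None:
        self._seed()
        self.conn.execute("UPDATE claude_md_template SET content = ? WHERE id = 1", (content,))

    def reset(self) -> None:
        self.set(None)


def effective_template(repo: TemplateRepoSketch, default_text: str) -> str:
    """Admin override if present, otherwise the shipped default template."""
    override = repo.get()
    return override if override else default_text
```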
ZdenekSrotyr	c7b14fb120	feat(admin): drop setup_banner feature; consolidate into single editor Remove the setup_banner feature (admin-editable /setup page banner) and all associated code: API router, repository, renderer, admin template, tests, and docs. The setup_page handler no longer calls render_setup_banner; the install.html template no longer renders banner_html. The setup_banner DuckDB table (v22) is kept intact for forward-compat with already-migrated instances — only the application code is removed. CHANGELOG updated: setup_banner bullets removed; Agent Setup Prompt (welcome-template feature) now stands alone as the single editable prompt.	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	39146288e1	feat: admin-editable setup_banner on /setup page (schema v22) Adds an optional Jinja2/HTML banner displayed above the bootstrap commands on /setup. Empty by default; admin authors it at /admin/setup-banner. autoescape=True — safe for HTML context. Render failures return "" so a broken banner never breaks /setup. Schema v22: setup_banner singleton table, auto-migration v21→v22.	2026-05-03 16:12:13 +02:00
ZdenekSrotyr	33e7107637	feat(db): schema v15 — welcome_template singleton table	2026-05-03 16:10:48 +02:00
ZdenekSrotyr	85d3810535	feat(materialized): query_mode='materialized' for BigQuery + Keboola — admin SELECT → parquet → analyst Closes the 'admin pre-stages a curated table/view for analysts' use case end-to-end across both supported source connectors. Backend (BigQuery + Keboola, schema v20): - schema v20 adds source_query TEXT to table_registry (renumbered from v19 after main's #150 RBAC migration also bumped to v19) - connectors/bigquery/extractor.py adds materialize_query(table_id, sql, *, bq, output_dir, max_bytes=...) — BqAccess session, dry-run cost guardrail (default 10 GiB, configurable via data_source.bigquery.max_bytes_per_materialize), idempotent ATTACH, rows/bytes/md5 metadata for sync_state - connectors/keboola/access.py — new KeboolaAccess facade (parallel to BqAccess) wrapping ATTACH 'keboola://...' AS kbc - connectors/keboola/extractor.py adds materialize_query — same shape, no dry-run analog (the Keboola Storage API has a different cost model); legacy bucket-download path skips query_mode='materialized' rows - app/api/sync.py:_run_materialized_pass dispatches by source_type to the right materialize_query - app/api/admin.py: RegisterTableRequest accepts source_query; model_validator enforces consistency across mode↔source_query↔bucket; PUT preserves omitted fields; deprecation marks (Field(deprecated=True)) on sync_strategy + profile_after_sync (no extractor reads them; profile_after_sync becomes inert — bug from earlier work where /api/sync/trigger never honored the flag); _BQ_OPTIONAL_FIELD_DEFAULTS injects defaults into GET /server-config payload Operator + CLI surface: - da admin register-table --query / --query-mode materialized - scripts/smoke-test-materialized-bq.sh — end-to-end smoke test for operators Tests (incl.
spike + integration + regression): - test_db_migration_v20, test_table_registry_source_query - test_bq_materialize, test_bq_cost_guardrail, test_bq_init_extract_skips - test_keboola_access, test_keboola_extension_query_passthrough (lock-in for the DuckDB extension capability), test_keboola_materialize, test_keboola_init_extract_skips, test_keboola_materialized_e2e (skipped without KBC_TEST_* creds) - test_sync_trigger_materialized, test_sync_trigger_keboola_materialized - test_api_admin_materialized, test_cli_admin_materialized - test_admin_bq_register, test_admin_discover_bigquery, test_admin_keboola_materialized, test_admin_phase_c_deprecation, test_admin_put_preservation, test_materialized_e2e Cost: BQ uses bigquery_query() (jobs API, view-aware), which works uniformly on tables, views, and materialized views. Keboola uses ATTACH+COPY parquet through the DuckDB extension.	2026-05-01 20:25:56 +02:00
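The BigQuery dry-run cost guardrail can be pictured as a pure check run before any bytes are scanned. This is a minimal sketch, not the connector's code: `check_materialize_cost` and `CostGuardrailError` are hypothetical names, and `estimated_bytes` is assumed to be the bytes-processed figure a dry run reports. The 10 GiB default mirrors the documented default for `data_source.bigquery.max_bytes_per_materialize`.

```python
from typing import Optional

# Hypothetical sketch of a dry-run cost guardrail; not the actual connector code.
DEFAULT_MAX_BYTES = 10 * 1024**3  # 10 GiB, mirroring the documented default


class CostGuardrailError(RuntimeError):
    """Raised when a dry run estimates more bytes than the configured ceiling."""


def check_materialize_cost(estimated_bytes: int, max_bytes: Optional[int] = None) -> int:
    """Return the estimate if it is within budget, otherwise raise.

    `estimated_bytes` is assumed to come from a BigQuery dry run; `max_bytes`
    stands in for data_source.bigquery.max_bytes_per_materialize (None = default).
    """
    limit = DEFAULT_MAX_BYTES if max_bytes is None else max_bytes
    if estimated_bytes < 0:
        raise ValueError("estimated_bytes must be non-negative")
    if estimated_bytes > limit:
        raise CostGuardrailError(
            f"dry run estimates {estimated_bytes} bytes, above the {limit}-byte limit"
        )
    return estimated_bytes
```

Failing before the real query runs is the point of the design: an over-budget SELECT costs only a dry run, never a full scan.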
minasarustamyan	d4ac84dd46	feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150 ) * feat(rbac): drop dataset_permissions + access_requests + users.role + is_public; v19 migration BREAKING. Unifies the data RBAC layer into the per-group resource_grants model. Before this PR, the legacy data RBAC layer (dataset_permissions + is_public bypass) was de facto inactive: is_public had no API/UI/CLI surface, and its default of true meant can_access_table always bypassed the check. Now every non-admin access requires an explicit resource_grants(group, "table", id) row. Schema v18 → v19 (src/db.py:_v18_to_v19_finalize): - DROP TABLE dataset_permissions, access_requests - DROP COLUMN users.role (NULL artifact since v13) - DROP COLUMN table_registry.is_public - Drops use the table-rebuild idiom (rename → create new → INSERT … SELECT → drop old) because of DuckDB ALTER DROP COLUMN limitations on tables with historic FK constraints. INSERT picks the intersection of columns, so test fixtures with a minimal pre-v19 schema migrate cleanly.
Runtime: - src/rbac.py:can_access_table → delegates to app.auth.access.can_access - DatasetPermissionRepository, AccessRequestRepository deleted - AGNES_ENABLE_TABLE_GRANTS env-gate in app/resource_types.py removed (TABLE is unconditionally enabled) API drop: - app/api/permissions.py, app/api/access_requests.py (entire files) - /admin/permissions web route + admin_permissions.html - "Request Access" modal in catalog.html + locked-row UI - ~10 if user.get("role") != "admin" checks replaced (the admin shortcut lives inside can_access_table) - /api/settings: drop permissions field from GET; PUT /api/settings/dataset gate switched to can_access(user_id, "table", dataset, conn) Auth: - app/auth/jwt.py:create_access_token: drop role parameter (the claim disappears from newly issued JWTs; old tokens remain valid, claim ignored) - app/api/users.py: drop role from CreateUserRequest / UpdateUserRequest (admin promotion = explicit add to Admin group via memberships API) - src/repositories/users.py: drop role from create() / update() CLI: - da admin set-role removed → hard-fail with replacement command - da admin add-user --role flag removed - da auth import-token --role flag removed - da auth whoami: drop "Role:" output - cli/config.py:save_token: role parameter now optional, no longer written (back-compat with old token.json files preserved; the field is ignored) Tests: - DELETE: test_permissions.py, test_permissions_api.py, test_access_requests_api.py - REWRITE: test_access_control.py (resource_grants flow), test_rbac.py (can_access_table over resource_grants), test_journey_rbac.py (drop access-request flow), test_resource_types.py (drop env-gate tests, drop is_public from helpers), test_v2_.py (drop role-based user dicts in favor of id-based + Admin group membership), test_settings_api.py (no permissions field, can_access gate) - TRIVIAL: ~30 files: drop role="admin" arg from UserRepository.create and the 3rd positional role from create_access_token - NEW: test_v18_to_v19 migration test (test_db.py),
test_can_access_table_no_implicit_public (test_rbac.py), test_admin_set_role_returns_hardfail (test_cli_admin.py) - OpenAPI snapshot regenerated Docs: - CHANGELOG: BREAKING entry under [Unreleased] - CLAUDE.md: schema v18 → v19 - docs/architecture.md: schema table + RBAC section rewritten - docs/auth-google-oauth.md: admin promotion via da admin break-glass - cli/skills/security.md: completely rewritten for the group-based model - docs/TODO-rbac-data-enforcement.md: deleted (TODO completed) Test results: 2363 passed, 19 failed. The remaining failures are pre-existing Windows-specific issues (fcntl, charset) unrelated to this PR, verified with git stash pop. Plan: ~/.claude/plans/floofy-coalescing-parnas.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> chore(release): cut 0.27.0 --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-04-30 22:02:16 +02:00
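After v19, table access reduces to one rule: admins pass, and everyone else needs an explicit `resource_grants(group, "table", id)` row for one of their groups, with no `is_public` fallback. The sketch below models that rule in memory; `Grant` and this `can_access_table` signature are hypothetical, and the real check runs as a DuckDB query inside `app.auth.access.can_access`, whose exact shape this does not reproduce.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Grant:
    """One resource_grants row: a group may use one resource."""
    group: str
    resource_type: str
    resource_id: str


def can_access_table(user_groups: Set[str], table_id: str, grants: Iterable[Grant],
                     admin_group: str = "Admin") -> bool:
    # The admin shortcut lives inside the check, so callers need no separate role test.
    if admin_group in user_groups:
        return True
    # No implicit public access: an explicit table grant for one of the user's groups is required.
    return any(
        g.resource_type == "table" and g.resource_id == table_id and g.group in user_groups
        for g in grants
    )
```

Putting the admin shortcut inside the function is what lets the ~10 scattered `role != "admin"` checks disappear from callers.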
minasarustamyan	fb1573766a	feat(admin): users/groups UI polish + SSO lock + v18 migration (#142 ) Cuts release 0.24.0. ## Highlights - SSO-managed accounts read-only for password / delete operations (UI + API). New `is_sso_user` flag derived from group memberships. - Admin/Everyone system rows show `google_sync` chip + Workspace email subtitle when env-mapped. - Origin pill vocabulary unified across `/admin/groups`, `/admin/access`, `/admin/users`, `/admin/users/{id}`, `/profile` (Admin yellow, Everyone gray, google_sync green, custom purple). - Effective-access readout no longer short-circuits for admin users — always renders per-resource breakdown. - Schema migration v18 drops stranded non-google memberships in env-mapped Admin/Everyone groups (cleans up v13's blanket Everyone backfill). ## Devin findings addressed - _is_sso_user requires source='google_sync' on system-group branches (so v13 system_seed memberships in env-mapped Everyone don't lock out the admin). - POST add-to-group returns correct origin via _derive_origin (matching GET). - 8 customer-specific token instances (groupon.com / foundryai) replaced with vendor-neutral placeholders across templates, tests, and CHANGELOG. - deriveDisplayName name-skip for canonical "Admin"/"Everyone" so an overlapping AGNES_GOOGLE_GROUP_PREFIX doesn't mangle the chip text. See CHANGELOG [0.24.0] for full notes.	2026-04-30 15:16:04 +02:00
Vojtech	38f6b639d2	feat(observability): request_id end-to-end + dev debug toolbar + centralized logging (#136 ) Cuts release 0.20.0. ## Highlights - X-Request-ID header on every response + sanitized to [A-Za-z0-9_-] (CRLF log-forging mitigation) - Error pages (HTML + JSON 500) surface request_id for support tickets - Dev debug toolbar gated by DEBUG=1 — fastapi-debug-toolbar with custom DuckDBPanel - Centralized app.logging_config.setup_logging() replaces 23 scattered basicConfig calls - Telegram bot drops bot.log file — stdout only (BREAKING) ## Devin findings addressed - BUG_0001: .env.template no longer claims FastAPI debug=True - BUG_0002: subprocess extractor logs INFO to stderr again - ANALYSIS_0003: _wants_html no longer matches Accept: */* (curl gets JSON as before) - BUG on b1c6ee9: HTML 500 page no longer leaks str(exc) in production - BUG on b13d2fe: 2 CLAUDE.md compliance flags (transform.py + ws_gateway) accepted as scope-limited logging refactor — follow-up to update CLAUDE.md if needed See CHANGELOG [0.20.0] for full notes.	2026-04-29 22:54:21 +02:00
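Restricting `X-Request-ID` to `[A-Za-z0-9_-]` means a client-supplied header can never smuggle CR/LF into log lines and forge entries. A minimal sketch follows; the function name, the 64-character cap, and generating a fresh UUID when nothing usable survives are assumptions, not necessarily what the middleware does.

```python
import re
import uuid
from typing import Optional

_UNSAFE = re.compile(r"[^A-Za-z0-9_-]")
MAX_LEN = 64  # assumed cap; keeps attacker-controlled IDs from bloating log lines


def sanitize_request_id(raw: Optional[str]) -> str:
    """Strip every character outside [A-Za-z0-9_-]; fall back to a new ID if empty."""
    cleaned = _UNSAFE.sub("", raw or "")[:MAX_LEN]
    # uuid4().hex is 32 hex characters, which already satisfies the allowed alphabet.
    return cleaned or uuid.uuid4().hex
```

For example, `"evil\r\nINFO forged"` collapses to `"evilINFOforged"`, so the injected newline never reaches the log formatter.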
ZdenekSrotyr	82c5d71d63	feat(memory): #62 — duplicate hints + tree-view + bulk-edit (#126 ) Issue #62. Tree view with cross-axis filtering, duplicate-candidate hints (Jaccard score on entity overlap), bulk-edit endpoints (PATCH /api/memory/admin/{id} + POST /api/memory/admin/bulk-update), schema v17 (knowledge_item_relations), full CLI parity (da admin memory tree/edit/bulk-edit/duplicates list/resolve).	2026-04-29 13:55:15 +02:00
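The duplicate-candidate hints score pairs of knowledge items by Jaccard similarity over their extracted entities, |A ∩ B| / |A ∪ B|. A sketch with hypothetical names; the 0.5 threshold is an illustrative assumption, not the shipped value.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets score 0.0 here."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def duplicate_candidates(items: Dict[str, Set[str]], threshold: float = 0.5
                         ) -> List[Tuple[str, str, float]]:
    """Return (id_a, id_b, score) pairs at or above threshold, highest score first."""
    pairs = []
    # Sorting by item id keeps pair order deterministic; ids are unique, so sets are never compared.
    for (id_a, ents_a), (id_b, ents_b) in combinations(sorted(items.items()), 2):
        score = jaccard(ents_a, ents_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

The all-pairs loop is quadratic in the number of items, which is acceptable for a curation hint but would need blocking or indexing at larger scale.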
PavelDo	e1108b6112	feat(memory): corporate memory v1+v1.5 + 0.15.0 (#72 ) Adds corporate memory v1 (verification flywheel + contradiction detection + confidence scoring) and v1.5 (audience-based distribution + per-item privacy + admin curation). Server: GET /api/memory/bundle returns mandatory + ranked-approved items within a token budget; POST /api/memory/admin/mandate accepts an audience field gated against user_group_members; /api/memory/stats uses SQL aggregation. CLI: da sync writes received items to .claude/rules/km_*.md. Verification detector extracts knowledge candidates from session JSONL files. Auto-tagging via Haiku when ai: is configured. Adapted from the v9-era branch onto v13/v14 RBAC: _is_privileged_viewer + _effective_groups now query user_group_members JOIN user_groups; require_role(Role.KM_ADMIN) replaced with require_admin (km_admin collapsed into admin). Schema v15: knowledge_items context-engineering columns + knowledge_contradictions + session_extraction_state. Schema v16: verification_evidence. Cuts release v0.15.0 (also bundles #116 /me/debug page).	2026-04-29 07:16:22 +02:00
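`GET /api/memory/bundle` returns mandatory items plus ranked approved items within a token budget. One plausible greedy reading is sketched below with hypothetical names, assuming that mandatory items are always included and count against the budget, and that each item's token cost is precomputed; the server's actual ranking and budgeting may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemoryItem:
    item_id: str
    tokens: int          # assumed precomputed token cost
    mandatory: bool = False
    score: float = 0.0   # assumed ranking score for approved items


def build_bundle(items: List[MemoryItem], budget: int) -> List[str]:
    """Mandatory items first, then approved items by descending score while they fit."""
    chosen = [i for i in items if i.mandatory]
    used = sum(i.tokens for i in chosen)
    ranked = sorted((i for i in items if not i.mandatory), key=lambda i: i.score, reverse=True)
    for item in ranked:
        # Skip (rather than stop at) an item that does not fit; a smaller one may still fit.
        if used + item.tokens <= budget:
            chosen.append(item)
            used += item.tokens
    return [i.item_id for i in chosen]
```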
ZdenekSrotyr	2e1dfb7553	feat(v2): claude-driven fetch primitives + 0.14.0 (#102 ) Replaces the BigQuery wrap-view pattern with a discovery + scoped-fetch toolkit driven by the analyst's Claude session. Adds /api/v2/{catalog,schema,sample,scan,scan/estimate}, da catalog/schema/describe/fetch/snapshot/disk-info CLI commands, sqlglot-backed WHERE validator, process-local quota tracker, agent rails skill (cli/skills/agnes-data-querying.md). BREAKING: BQ wrap views off by default — set data_source.bigquery.legacy_wrap_views=true for one cycle. Backward-compat field_validator on primary_key. Catalog cache now matches documented 300s TTL with RBAC fresh per request. Cuts release v0.14.0.	2026-04-29 01:07:19 +02:00
Petr Simecek	3047f310b9	fix(db): self-heal missing tables on future-version DBs (agnes-dev incident) (#75 ) Discovered when 0.11.5 deployed onto agnes-dev whose system DB had been bumped to schema_version=10 during local experimentation with a parallel WIP branch (PR #72-style Context Engineering work). The lab v10 migration laid down its own table set without including v9's role tables — so the v9 binary saw `current=10 > SCHEMA_VERSION=9`, correctly treated it as a future-version-rollback and skipped its migration ladder, but ALSO skipped the table-creation step. Every query against user_role_grants (`_hydrate_legacy_role`, /profile, require_internal_role's DB fallback, every admin-gated request) then crashed with `_duckdb.CatalogException: Table with name user_role_grants does not exist`. Symptom on agnes-dev: HTTP 500 on /profile, admin nav vanished, /admin/* returned 403. Fix: hoist `conn.execute(_SYSTEM_SCHEMA)` to the TOP of _ensure_schema, unconditional. _SYSTEM_SCHEMA is all `CREATE TABLE IF NOT EXISTS`, so existing tables stay untouched (columns + data preserved); missing tables get created. Idempotent, near-zero cost (a few dozen no-op DDLs per process start). The migration block below still calls _SYSTEM_SCHEMA when migrating; that's now the redundant-but-cheap follow-up — left in place so the migration ladder reads chronologically. Concrete coverage of the rebase scenario the user asked about — a contributor switching FROM a lab future-schema branch BACK to a released binary now boots cleanly: - Forward rebase (older → current): unchanged, ladder runs as before. - Same-version rebase: unchanged, _seed_core_roles tail call still drives doc-tweak refresh. - Backward "lab" rebase (this fix): tables get re-materialized; if the DB is still on a future schema_version, _seed_core_roles tail call remains gated so we don't accidentally write data into a schema shape this binary doesn't understand. 
Operator can drop the v9 schema_version manually to trigger a clean ladder re-run if they want the full v8→v9 backfill (what we did to recover agnes-dev). Test: new test_split_brain_future_version_with_missing_tables_self_heals in tests/test_db.py::TestMigrationSafety. Synthesizes a v99 DB whose only existing table is schema_version, runs _ensure_schema, asserts both user_role_grants AND internal_roles AND group_mappings AND users exist after the call, and that the schema_version row stays at 99 (future-version contract). test_future_version_is_noop docstring updated to reflect the new self-heal pass — its only assertion (the version-row contract) still holds unchanged. pyproject.toml: 0.11.5 → 0.11.6. CHANGELOG.md: new [0.11.6] section under [Unreleased] skeleton.	2026-04-28 15:51:33 +02:00
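The fix amounts to running the idempotent `CREATE TABLE IF NOT EXISTS` block before any version check. The sketch below uses Python's built-in `sqlite3` as a stand-in for DuckDB so it runs anywhere, and trims the schema to a few illustrative tables; `ensure_schema` and `SYSTEM_SCHEMA` are simplified stand-ins for `_ensure_schema` and `_SYSTEM_SCHEMA`, not their real contents.

```python
import sqlite3

SCHEMA_VERSION = 9

# Every statement is IF NOT EXISTS, so re-running it never touches existing tables.
SYSTEM_SCHEMA = """
CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL);
CREATE TABLE IF NOT EXISTS users (id TEXT PRIMARY KEY, email TEXT);
CREATE TABLE IF NOT EXISTS internal_roles (key TEXT PRIMARY KEY);
CREATE TABLE IF NOT EXISTS user_role_grants (user_id TEXT, role_key TEXT);
"""


def ensure_schema(conn: sqlite3.Connection) -> int:
    # Self-heal first, unconditionally: missing tables are created, present ones untouched.
    conn.executescript(SYSTEM_SCHEMA)
    current = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0] or 0
    if current > SCHEMA_VERSION:
        # Future-version DB: tables now exist, but the version row and data stay as-is.
        return current
    if current < SCHEMA_VERSION:
        # The migration ladder would run here; this sketch only records the new version.
        conn.execute("DELETE FROM schema_version")
        conn.execute("INSERT INTO schema_version VALUES (?)", (SCHEMA_VERSION,))
        conn.commit()
    return SCHEMA_VERSION
```

This mirrors the described regression test: a v99 database holding only `schema_version` ends up with the missing tables created while its version row stays at 99.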
Vojtech Rysanek	7147bac079	feat(rbac+marketplace): schema v14 FK + AGNES_ENABLE_TABLE_GRANTS + break-glass CLI Follow-up to the RBAC v13 + marketplace work in the parent commit. Addresses deferred Devin findings, gemini-flagged blockers, and adds three guard rails. == Schema v14 — FK constraints on user_group_members + resource_grants == Adds DuckDB foreign-key constraints so cascade deletes can no longer leave orphaned member / grant rows pointing at a deleted group_id (which were relying on application-level cascades up to v13). Migration is RENAME → CREATE-with-FK → INSERT → DROP, wrapped in BEGIN TRANSACTION so a partial failure rolls back without leaving the DB at a half-applied schema. == AGNES_ENABLE_TABLE_GRANTS feature flag (default off) == ResourceType.TABLE was shipped in the parent commit as listing-only — admins can record grants but runtime enforcement still flows through legacy dataset_permissions. To avoid the misleading-UX surface area, the chip is hidden from /admin/access and POST /api/admin/grants returns 422 with the env-var name in detail until the operator opts in. Existing TABLE rows in resource_grants stay listable + deletable so cleanup is never blocked. Helpers: is_resource_type_enabled(rt), enabled_resource_types(). == Break-glass admin CLI == `da admin break-glass <user>` adds the user to the Admin user_group with source='system_seed' regardless of RBAC state. Bypasses authentication — relies on filesystem access to ${DATA_DIR}/state/system.duckdb implying host-level trust. Recovery path when the operator has locked themselves out of /admin/access. == Devin round-2 fixes (deferred on b4ec4c4) == - src/repositories/user_groups.py — narrow update() guard from blocking any mutation on system groups to blocking name change only. Description edits now pass through. Endpoint pre-check stays as defense-in-depth. Prior behavior surfaced as a misleading 409 'Cannot rename a system group' on description-only PATCH. 
- app/api/access.py:delete_group — wrap cascade DELETEs + repo.delete in BEGIN TRANSACTION / COMMIT / ROLLBACK. Prevents orphan rows if any DELETE fails after the user_groups row is gone. - app/marketplace_server/{packager,router}.py — split compute_etag_for_user() from build_zip(); router resolves etag first and 304-shorts before any file read or ZIP_DEFLATED. In-process cachetools.TTLCache (default 120s, env-tunable via AGNES_MARKETPLACE_ETAG_TTL, set 0 to disable). invalidate_etag_cache() called by sync to force re-hash on content drift. == Tests == - TestTableGrantsFeatureFlag (4 cases) — endpoint exclude/include, grant rejection/acceptance under the flag. - test_v12_to_v13_finalize_rollback_on_failure — destructive: monkeypatches _seed_system_groups to raise mid-transaction, asserts schema_version stays at 12, legacy tables intact, new tables empty (rollback fired). Then restores the real function and asserts the retry succeeds. - test_update_system_group_description_allowed, test_update_system_group_same_name_no_op — repo-level coverage of the narrowed guard.	2026-04-28 14:25:13 +02:00
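Splitting `compute_etag_for_user()` from `build_zip()` lets the router answer a conditional request before reading any file. The sketch below shows that ordering with a small hand-rolled TTL cache standing in for `cachetools.TTLCache`, so it has no third-party dependency. `serve_marketplace_zip`, `EtagCache`, and the injected callables are hypothetical, and ETag quoting and weak validators are ignored.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class EtagCache:
    """Per-user ETag cache with a fixed TTL; ttl <= 0 disables caching."""

    def __init__(self, ttl: float = 120.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # user -> (expires_at, etag)

    def get_or_compute(self, user_id: str, compute: Callable[[str], str]) -> str:
        now = self.clock()
        hit = self._entries.get(user_id)
        if self.ttl > 0 and hit is not None and hit[0] > now:
            return hit[1]
        etag = compute(user_id)
        if self.ttl > 0:
            self._entries[user_id] = (now + self.ttl, etag)
        return etag

    def invalidate(self) -> None:
        # Called after a sync so content drift forces a re-hash.
        self._entries.clear()


def serve_marketplace_zip(user_id: str, if_none_match: Optional[str], cache: EtagCache,
                          compute_etag: Callable[[str], str],
                          build_zip: Callable[[str], bytes]) -> Tuple[int, str, bytes]:
    etag = cache.get_or_compute(user_id, compute_etag)
    if if_none_match == etag:
        return 304, etag, b""  # short-circuit: no file read, no compression
    return 200, etag, build_zip(user_id)
```

Injecting the clock keeps expiry testable without sleeping.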
ZdenekSrotyr	e9d7af3cce	feat(rbac+marketplace): RBAC v13 + Claude Code marketplace + #81/#83/#44 hardening This squashes 13 commits from ma/staging plus a small docstring translation into a single coherent unit. Three workstreams. == RBAC v13 redesign == - Drops core.viewer/analyst/km_admin/admin hierarchy and the internal_roles / group_mappings / user_role_grants / plugin_access tables. - Replaced by user_group_members + resource_grants. Atomic v12→v13 backfill wrapped in BEGIN/COMMIT; ROLLBACK leaves schema_version at 12 for retry. - Two authorization primitives in app.auth.access: require_admin (Admin-group god-mode) and require_resource_access(rt, "{path}") (entity-scoped grants). Single DB lookup per request; no session cache; no implies BFS. - /admin/access UI (single page) replaces /admin/role-mapping + /admin/plugin-access. CLI `da admin group/grant ` replaces `da admin role/mapping/grant-role/revoke-role/effective-roles`. - ResourceType.TABLE listing-only — admins can record table grants, runtime enforcement still flows through legacy dataset_permissions (migration plan in docs/TODO-rbac-data-enforcement.md). == Claude Code marketplace == - Aggregated /marketplace.zip + /marketplace.git/ (PAT-gated, RBAC-filtered, content-addressed cache via dulwich). - Admin god-mode dropped on the marketplace surface — admins curate their own view via grants like everyone else. - Bare-repo cache materializes per RBAC-filtered ETag; stale entries not pruned in this iteration (disclaimed in git_backend.py docstring). == #81 #83 #44 security/ops hardening == - #81 Group A — orchestrator ATTACH allow-listing (extension/url/alias). - #81 Group B — Keboola extractor 3-state exit codes: 0 success / 1 total fail / 2 PARTIAL fail. The Sync API logs a PARTIAL FAILURE alert on exit 2. Operators with binary alerting must teach it the new partial signal. - #81 Group C — schema v10 view_ownership; rejects silent overwrite of a prior connector's view name on collision.
- #81 Group D — extractor-side identifier validation. - #83 — Jira webhook fail-closed when JIRA_WEBHOOK_SECRET unset + path-traversal fix. - #44 — entire /api/scripts/* surface is admin-only (planted-script + sandbox-bypass risk closed). == Web UI polish + deploy fix == - /admin/access: live grant-count badges (no stale snapshot revert), shared-header CSS link added to /catalog and /admin/{tables,permissions}, per-resource-type colored stripes. - docker-compose.host-mount.yml: bind,rbind so dual-disk hosts don't silently shadow sub-mounts and write state to the wrong disk. == OSS vendor-neutralization (waves 1+2) == - scripts/grpn/ → scripts/ops/. Customer-specific identifiers (project IDs, internal hostnames, dev/prod VM IPs, brand names) replaced with placeholders across code, docs, Terraform, Caddyfile, OAuth probe, and planning docs. Downstream infra repos that copied scripts/grpn/agnes-tls-rotate.sh or agnes-auto-upgrade.sh must update the path. == Translation == - src/repositories/user_groups.py::ensure_system docstring translated from Czech to English for codebase consistency. Co-authored-by: Mina Rustamyan <mina@keboola.com>	2026-04-28 14:25:04 +02:00
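The 3-state exit code from #81 Group B can be derived from per-table outcomes. A sketch with a hypothetical helper; treating an empty table list as success is an assumption.

```python
from typing import Mapping

EXIT_OK, EXIT_TOTAL_FAIL, EXIT_PARTIAL = 0, 1, 2


def extractor_exit_code(results: Mapping[str, bool]) -> int:
    """Map {table_id: succeeded} to 0 (all ok), 1 (all failed), or 2 (partial)."""
    failed = sum(1 for ok in results.values() if not ok)
    if failed == 0:
        return EXIT_OK  # includes the empty case: nothing to extract, nothing failed
    if failed == len(results):
        return EXIT_TOTAL_FAIL
    return EXIT_PARTIAL
```

A caller such as the sync API can then branch on exit code 2 to raise the PARTIAL FAILURE alert, while binary alerting that only checks for non-zero still fires on both failure modes.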
ZdenekSrotyr	72230c3b51	fix: #81 Group C — view-name collision detection (schema v10, squashed) (#100 ) Schema v10 + view_ownership table. Cross-connector view name collisions are detected and refused with an actionable ERROR rather than silently last-write-wins. Pre-scan reconcile releases stale ownerships in the same rebuild as a rename — but only when ALL sources' pre-scans succeed (transient-IO defense; partial pre-scan skips reconcile to avoid silently stealing a name). 26/26 view collision + orchestrator tests pass. Refs #81 Group C.	2026-04-27 22:09:49 +02:00
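Group C replaces last-write-wins with an ownership check. A view name belongs to the connector that first registered it, a different connector claiming it is refused, and stale ownerships are released only when every source's pre-scan succeeded. The in-memory sketch below is simplified; `ViewOwnership`, `ViewCollisionError`, and the method names are hypothetical, and the real state lives in the `view_ownership` table.

```python
from typing import Dict, Set


class ViewCollisionError(RuntimeError):
    pass


class ViewOwnership:
    def __init__(self) -> None:
        self.owner: Dict[str, str] = {}  # view name -> owning connector/source

    def claim(self, view: str, source: str) -> None:
        current = self.owner.get(view)
        if current is not None and current != source:
            # Refuse loudly instead of silently overwriting another connector's view.
            raise ViewCollisionError(
                f"view {view!r} is owned by {current!r}; {source!r} cannot claim it"
            )
        self.owner[view] = source

    def reconcile(self, prescans: Dict[str, Set[str]], all_ok: bool) -> None:
        """Release views their owner no longer produces, only after a fully successful pre-scan."""
        if not all_ok:
            return  # a transient pre-scan failure could make a live view look stale
        stale = [v for v, src in self.owner.items() if v not in prescans.get(src, set())]
        for view in stale:
            del self.owner[view]
```

Skipping reconcile on any partial pre-scan trades a delayed rename for never handing a live view name to another connector.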
ZdenekSrotyr	23be8ad46f	fix(security): #81 Group A — orchestrator attach hardening (squashed) (#95 ) Closes the C1 findings from issue #81 plus the round-3/4 follow-ups on the read-only query path. Both _attach_remote_extensions (rebuild path) and _reattach_remote_extensions (query path) now apply the same hard allowlists for extensions and token-env names, single-quote-escape the URL, and split built-in vs community install. The CHANGELOG bullet documents the full scope including the table_schema → table_catalog fix that made the rebuild path a silent no-op for every connector. New module src/orchestrator_security.py centralises the policy. Tests in tests/test_orchestrator_remote_attach_security.py — 28/28 pass. Refs #81.	2026-04-27 21:34:04 +02:00
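The attach hardening combines allowlists with SQL string-literal escaping of the URL. The sketch below uses illustrative allowlist contents, a hypothetical function name, and a simplified ATTACH shape modelled on the `ATTACH 'keboola://...' AS kbc` form; none of these are the actual values in `src/orchestrator_security.py`. Doubling embedded single quotes is the standard way to escape a SQL string literal.

```python
import re
from typing import FrozenSet

# Illustrative allowlists; the real ones live in src/orchestrator_security.py.
ALLOWED_EXTENSIONS: FrozenSet[str] = frozenset({"keboola", "bigquery"})
ALLOWED_TOKEN_ENVS: FrozenSet[str] = frozenset({"KBC_TOKEN"})
_ALIAS = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_sql_literal(value: str) -> str:
    """Wrap value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def build_attach_sql(extension: str, url: str, alias: str, token_env: str) -> str:
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"extension {extension!r} is not allowlisted")
    if token_env not in ALLOWED_TOKEN_ENVS:
        raise ValueError(f"token env {token_env!r} is not allowlisted")
    if not _ALIAS.fullmatch(alias):
        raise ValueError(f"alias {alias!r} is not a plain identifier")
    # The URL is the only free-form input, so it is emitted strictly as a quoted literal.
    return f"ATTACH {quote_sql_literal(url)} AS {alias}"
```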
Petr Simecek	83ced81966	feat(auth): unified role management — UI + REST API + CLI + schema v9 (v0.11.4) (#73 ) * feat(auth): v9 schema — unified role management foundation (WIP) Tasks 1-5, 10 of the role-management-complete plan. Foundation only, follow-up commits add REST API, CLI, UI, and tests. Schema v9: - user_role_grants table: direct user → internal_role mapping (complementary to group_mappings). Drives PAT/headless auth and persists across sessions. Source field tracks 'direct' vs auto-seed. - internal_roles.implies (JSON): transitive role hierarchy. core.admin implies core.km_admin → core.analyst → core.viewer. Resolver does a BFS expansion at lookup time. - internal_roles.is_core (BOOL): distinguishes seeded core.* hierarchy from module-registered roles. UI renders them differently. - v8→v9 migration: ADD COLUMN, CREATE TABLE, _seed_core_roles + _backfill_users_role_to_grants, then NULL legacy users.role values. DuckDB FK constraint blocks DROP COLUMN, so the column remains as a deprecated artifact (ignored by UserRepository); the physical drop is deferred. Resolver: - Regex extended to allow dotted namespace (core.admin, context_engineering.admin), max 64 chars total. - expand_implies(role_keys, conn): BFS over implies JSON column. - resolve_internal_roles signature gains optional user_id parameter; unions group-mapping resolution with user_role_grants direct grants before implies expansion. require_internal_role: - Two-path resolution: session cache (OAuth) → DB grants (PAT/headless fallback). PAT clients now legitimately satisfy gates without the OAuth round-trip, fixing the v8 limitation where every PAT-callable admin endpoint needed require_role(Role.ADMIN) instead of require_internal_role(...). Backward-compat: - require_role(Role.X) and require_admin become thin wrappers over require_internal_role(f"core.{role}"). Implies hierarchy preserves the legacy "at least this level" semantics automatically — no per-level comparison code needed.
- src/rbac.py helpers (is_admin, has_role, get_user_role, set_user_role, can_access_table, get_accessible_tables) all read from the resolver via _get_internal_role_keys. - UserRepository.create() and update() now mirror role changes into user_role_grants via _grant_core_role helper. Preserves API while making the new table the source of truth. - UserRepository.delete() pre-deletes user_role_grants rows (FK cascade — DuckDB doesn't auto-cascade). - count_admins() reads user_role_grants ⨝ internal_roles instead of the now-NULL users.role column. First consumer: - app/api/admin.py module-level docstring documents the v9 pattern for future module authors. Existing require_role(Role.ADMIN) callsites flow through the wrapper; no behavior change for OAuth callers, and PAT callers gain access via direct grants. Tests: full suite green (1396 passed, 6 skipped). Existing tests exercise the new pathway transparently because UserRepository.create auto-grants. New test_pat_caller_with_direct_grant_passes pins the PAT-aware contract. Schema: v9 (was v8). pyproject.toml + CHANGELOG bump deferred to the final PR-prep commit. * feat(auth): role management complete — REST API + CLI + UI + docs (v0.11.4) Unifies the legacy users.role enum with the v8 internal-roles foundation under a single model with an implies hierarchy, ships admin UI + REST API + CLI for managing both group mappings and direct user grants, and makes require_internal_role PAT-aware so that admin endpoints work uniformly across OAuth and headless callers. REST API (app/api/role_management.py, +496 LOC): - 8 endpoints under /api/admin: internal-roles list, group-mappings CRUD, users/{id}/role-grants CRUD, users/{id}/effective-roles debug. - All gated by require_internal_role("core.admin"). Audit log on every mutation (role_mapping.created/deleted, role_grant.created/deleted). - Last-admin protection: refuse to delete the final core.admin grant (mirrors users.py:count_admins protection).
- New UserRoleGrantsRepository in src/repositories/user_role_grants.py. CLI (cli/commands/admin.py extension, +258 LOC): - da admin role list / show <key> - da admin mapping list / create <group-id> <role-key> / delete <id> - da admin grant-role <email> <role-key> - da admin revoke-role <email> <role-key> - da admin effective-roles <email> - Everything via typer + PAT auth, --json flag, tolerant of response-shape drift. UI (admin_role_mapping.html + admin_user_detail.html + nav + user list): - New page /admin/role-mapping: internal_roles read-only table + group_mappings table with create/delete forms. - New page /admin/users/{id}: core role single-select + capabilities multi-checkbox + effective-roles debug (direct + group + expanded). - Existing user list gets a "Detail" link to the new page. - Nav link to /admin/role-mapping. Tests: +85 new tests across 4 new files: - test_schema_v9_migration.py (8) — fresh install + v8→v9 backfill + legacy column NULL semantics + unknown-role fallback + invariants. - test_api_role_management.py (33) — all 8 endpoints, happy + error paths, audit-log assertions, last-admin protection. - test_cli_admin_role.py (25 + 1 conditional) — typer subcommands, text + json output, PAT integration smoke. - test_admin_role_mapping_ui.py (9) + test_admin_user_capabilities_ui.py (10) — page rendering, auth gating, form contracts, JS hooks. Full suite: 1482 passed, 6 skipped (was 1396 → +86, no regressions). Docs: - docs/internal-roles.md complete rewrite: removed "no UI yet", added hierarchy diagram, dual-path resolution, dotted-namespace convention, admin workflow via UI/CLI/REST, refresh semantics for group mappings vs direct grants, migration notes. - CLAUDE.md schema v8 → v9. - CHANGELOG.md [0.11.4] with a BREAKING marker for users.role NULL semantics + complete Added/Changed/Removed/Internal sections. - pyproject.toml: 0.11.3 → 0.11.4.
Sequencing: after this PR merges, Pabu rebases pabu/local-dev (PR #72) onto main, and its schema migrations shift from v9/v10/v11 to v10/v11/v12. Implementation breakdown: - Sequential (me): foundation tasks — schema v9, resolver, PAT-aware require_internal_role, backward-compat wrappers, rbac refactor, UserRepository auto-grant. - Parallel sub-agents (3 worktrees, ~10 min): REST API, CLI, UI. - Sequential (me): integration, docs/CHANGELOG/version, schema tests, full-suite verification. * fix(auth): address Devin review on PR #73 — three regressions Three concrete bugs caught in Devin's PR review, all fixed in this commit. 1. users.role hydration on read (the big one): v8→v9 migration NULLs users.role for every existing user, but a long tail of read sites still inspect user["role"] directly: - app/web/templates/_app_header.html:15 — admin nav gate - app/web/templates/_app_header.html:36-37 — role badge in dropdown - app/web/router.py:319-321 — UserInfo.is_admin/is_analyst/is_privileged - app/web/router.py:489 — corporate memory is_km_admin - app/api/catalog.py:54 — admin "see all tables" bypass - app/api/sync.py:215 — admin "see all sync states" bypass Without a fix, every existing admin loses the entire admin nav (and API admin bypasses) immediately after upgrade — a serious regression. Fix: new helper _hydrate_legacy_role() in app/auth/dependencies.py maps the highest-level core.* grant back into user["role"] as the legacy enum string. Called from get_current_user() on both auth paths (LOCAL_DEV_MODE + JWT/PAT). Idempotent — skips when role is already populated. Net effect: every pre-v9 callsite keeps working transparently for both OAuth and PAT callers, with one extra DB round-trip per authenticated request (same cost as the existing PAT-aware require_internal_role fallback).
3 regression tests in tests/test_schema_v9_migration.py: - test_hydration_recovers_role_from_user_role_grants - test_hydration_returns_highest_grant (multi-grant → highest wins) - test_hydration_falls_back_to_viewer_when_no_grants (safe fallback) 2. CLI effective-roles TypeError: API returns direct/group as List[Dict] (RoleGrantResponse-shaped), but the CLI did ', '.join(direct) which raises TypeError on dicts. Tests masked it because mocks used bare string lists. Replaced raw .join() with a _names() helper that extracts role_key from each item, falling back to str() for legacy mock shapes. 3. UI template field-name mismatch: admin_user_detail.html JS reads data.groups but the API serializes the field as group (singular, per EffectiveRolesResponse pydantic). Currently benign because the API always returns group:[], but the field would silently disappear once the group-derived view is wired up. Added data.group as the primary lookup, kept the legacy aliases for shape-drift tolerance. Full suite: 1485 passed (was 1482, +3 hydration tests), 6 skipped, no regressions. * fix(auth): Devin review #2 + UX self-service + RBAC docs rename Three threads landed in one commit because they share the same auth/role surface and CHANGELOG entry. Devin review #73 second round (2 actionable findings): - _hydrate_legacy_role no longer short-circuits on truthy users.role. The role-management endpoints (POST/DELETE /api/admin/users/{id}/ role-grants + the changeCoreRole UI flow) only mutate user_role_grants — they don't update the legacy column. The early return trusted that stale value, so a user downgraded via the new REST/UI kept role="admin" in their dict on subsequent requests, which fooled _is_admin_user_dict (src/rbac.py) and the catalog/sync admin-bypass short-circuits into retaining elevated table access even though require_internal_role correctly denied the API gates. Always re-resolves now, making user_role_grants the single source of truth on every authenticated request. 
Cost: one DB round-trip per request — same as the existing PAT-aware fallback. Pinned by test_hydration_ignores_stale_legacy_role_after_grant_revoke. - Dev-bypass (app/auth/dependencies.py) and OAuth callback (app/auth/providers/google.py) now pass user_id to resolve_internal_roles so direct grants land in session["internal_roles"] alongside group-mapped roles. Pre-fix, every admin-gated request fell through to the per-request DB fallback inside require_internal_role and the dev-bypass log line read "resolved 0 internal role(s)" for an obviously-admin user. test_session_internal_roles_populated updated to assert union. User-visible UX (also addresses local-test feedback): - HTTP 500 on /admin/users post-v8→v9 migration — UserResponse.role is required str, but legacy users.role was NULL-ed by the migration. _to_response in app/api/users.py now routes every dict through _hydrate_legacy_role; same fix lifts the silent no-op of last-admin protection in update_user/delete_user (the role-equality short-circuits would skip the count_admins guard for migrated admins). Three regression tests under TestAPIUsersPostMigration. - /profile is now a real self-service detail page for every signed-in user (not just admins). Three new server-side sections: Effective roles (resolver output as chip cloud), Direct grants (rows in user_role_grants with source label), Roles via groups (which Cloud Identity / dev group grants which role for the current user). Non-admins finally see why a feature is or isn't accessible. Admins additionally see a deep-link to /admin/users/{id} for editing their own grants. - /admin/role-mapping group-id picker. New "Known groups" panel above the create form: clickable chips for the calling admin's own session.google_groups (tagged "your group") merged with external_group_ids already used in existing mappings (tagged "already mapped"). Click a chip → fills the form. 
Empty-state copy points operators at LOCAL_DEV_GROUPS / Google sign-in instead of leaving them to guess Cloud Identity opaque IDs from memory. Operational fixes: - Scheduler log-noise: every cron tick produced a POST /auth/token 401 because the auto-fetch fallback called the endpoint with just an email (no password) and silently fell through. Removed the broken path entirely. Operators set SCHEDULER_API_TOKEN (long-lived PAT) in production; in LOCAL_DEV_MODE the dev-bypass auto-authenticates the un-tokenized request, so jobs continue to work. Docs: - docs/internal-roles.md → docs/RBAC.md (git mv preserves history). Standard industry term, more discoverable for engineers grepping for RBAC in a new repo. Restructured: Quickstart-by-role (operator / end-user / module author), step-by-step Module-author workflow with code examples (register key, gate endpoint, declare implies, write contract test), naming pitfalls, refresh semantics. CLAUDE.md gets a new "Extensibility → RBAC" section pointing contributors at the doc before they add gated endpoints. Cross-refs in app/api/admin.py + tests/test_role_resolver.py updated. Tests: 293 in the auth/role/scheduler/UI test set passed, 0 regressions. * fix(auth): Devin review #3 — login flows + RBAC docs Two new findings on commit 7d1c048, both real and addressed. Finding 1 (BUG, HTTP 500): every auth login flow loaded users via UserRepository.get_by_email and passed user["role"] straight to create_access_token, Pydantic response models, and _set_login_cookie without going through _hydrate_legacy_role. Post-v9 the legacy column is NULL for migrated users, and TokenResponse.role is a required str — so POST /auth/token raised ValidationError → HTTP 500 for any v8-admin trying to log in via password. Same root cause produced non-crashing but semantically wrong JWTs (role: null) from Google OAuth, password web flows, and email magic-link verification. 
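Finding 1 comes down to reading a possibly-NULL legacy column instead of re-deriving the role from grants. The following is a minimal sketch of the "always re-resolve, highest grant wins, fall back to viewer" rule. It assumes the four legacy levels are ordered viewer < analyst < km_admin < admin and that grants arrive as a plain list of role keys. The helper name `hydrate_legacy_role` and the `LEGACY_LEVELS` table are illustrative and not the project's actual code.

```python
from typing import Iterable

# Assumed ordering of the four legacy levels, lowest to highest.
# Illustrative only: the real hierarchy is seeded by the project, not here.
LEGACY_LEVELS = ("viewer", "analyst", "km_admin", "admin")


def hydrate_legacy_role(user: dict, granted_role_keys: Iterable[str]) -> dict:
    """Return a copy of `user` whose "role" is re-derived from grants.

    The stored user["role"] is deliberately ignored: after the v8 -> v9
    migration it may be NULL, and after a grant revoke it may be stale.
    """
    ranks = [
        LEGACY_LEVELS.index(key)
        for key in granted_role_keys
        if key in LEGACY_LEVELS  # non-legacy grants (e.g. module roles) are skipped
    ]
    role = LEGACY_LEVELS[max(ranks)] if ranks else "viewer"  # safe fallback
    return {**user, "role": role}
```

If every login flow and every authenticated request calls the same function, a NULL or stale column can no longer reach `create_access_token` or a required `str` field such as `TokenResponse.role`.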
Fix: hydrate inline in every login flow before reading user["role"]: - app/auth/router.py — POST /auth/token (the crash site) - app/auth/providers/google.py — OAuth callback (was just stale JWT) - app/auth/providers/password.py — 5 flows: JSON login, web login, JSON setup, web reset confirm, web setup confirm - app/auth/providers/email.py — centralized in _consume_token, covers both /verify endpoints New regression class TestAuthLoginFlowsPostMigration pins both the no-crash and the correct-role contracts for all four legacy levels (viewer/analyst/km_admin/admin) on POST /auth/token. Finding 2 (DOCS): docs/RBAC.md showed register_internal_role() being called with implies=[...], but the function signature is (key, *, display_name, description, owner_module). A module author copying the example would hit a TypeError at import time. The implies field on internal_roles IS honored at runtime by expand_implies, but the registry-side write path (register_internal_role + InternalRoleSpec + sync_registered_roles_to_db) doesn't exist yet — implies is currently seeded only for the core.* hierarchy via _seed_core_roles in src/db.py. Rewrote the Implies hierarchy and Module-author workflow sections to document what's actually supported in 0.11.4 and what a future change would need to add. The "for cross-module hierarchies, register each level + grant both" pattern works today. Tests: 322 in the auth/role/scheduler/UI/password test set passed, 0 regressions. * fix(db): _seed_core_roles actually runs on every connect (Devin review #4) Devin flagged that the docstring on `_seed_core_roles` promised per-connect execution as a safety net for accidental DELETEs and in-code seed changes, but the only call sites lived inside `if current < SCHEMA_VERSION:` — so once a DB was on v9 the function never ran again, and the docstring lied.
Picked option (b) from the review (actually call it on every startup) over option (a) (fix the docstring) because the safety net is genuinely useful: - recovery from accidental admin DELETE on internal_roles, - in-code _CORE_ROLES_SEED tweaks (display_name/description/implies) ship without a manual SQL deploy, - fresh installs and migrations stop needing their own seed call sites. Tail call gated by `get_schema_version(conn) <= SCHEMA_VERSION` so the future-version-is-noop rollback contract still holds — a v9 binary won't touch a DB that's been upgraded past v9. Test coverage: new TestSeedCoreRolesSafetyNet class (3 tests) pins the three contracts — deleted row re-seeds, mutated display_name re-syncs from in-code seed, applied_at on schema_version doesn't churn on already-current DBs. Existing TestMigrationSafety::test_future_version_is_noop still passes (verified against the gating logic).	2026-04-27 02:23:01 +02:00
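The following is a minimal sketch of the per-connect safety net. It uses the stdlib `sqlite3` module as a stand-in for the project's database layer. The table shapes, the `SCHEMA_VERSION` value, the `seed_core_roles` name and the two seed rows are assumptions made for illustration. Only two details come from the commit message: the gating rule (run only when the stored version is at most the binary's version) and the re-insert/re-sync behavior.

```python
import sqlite3

SCHEMA_VERSION = 9  # schema version this (hypothetical) binary understands

# Hypothetical in-code seed: key -> (display_name, description)
CORE_ROLES_SEED = {
    "viewer": ("Viewer", "Read-only access"),
    "admin": ("Admin", "Full access"),
}


def get_schema_version(conn: sqlite3.Connection) -> int:
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def seed_core_roles(conn: sqlite3.Connection) -> None:
    """Re-insert deleted core roles and re-sync drifted metadata on every connect."""
    if get_schema_version(conn) > SCHEMA_VERSION:
        return  # DB upgraded past this binary: rollback contract says hands off
    for key, (display_name, description) in CORE_ROLES_SEED.items():
        # Re-sync metadata for rows that still exist...
        conn.execute(
            "UPDATE internal_roles SET display_name = ?, description = ? WHERE key = ?",
            (display_name, description, key),
        )
        # ...and recreate rows that were deleted by accident.
        conn.execute(
            "INSERT OR IGNORE INTO internal_roles (key, display_name, description) "
            "VALUES (?, ?, ?)",
            (key, display_name, description),
        )
    conn.commit()
```

Because both statements are idempotent, calling this on every connect is cheap on an already-current DB. It also never touches `schema_version` itself.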
Petr Simecek	6c36b26979	release(0.11.3): internal roles + external→internal group mapping (foundation) (#71 ) * feat(auth): internal roles + external→internal group mapping (foundation) Two-layer authorization model: external Cloud Identity groups (org-managed) get mapped onto internal Agnes-defined capabilities (app-managed) via an admin-curated many-to-many table. Per-request permission checks read off the session — no DB hit. Refresh requires re-login. Schema v8 — new tables: - internal_roles (id, key UNIQUE, display_name, description, owner_module, …) — app-defined capabilities like 'context_admin'. Modules self-register at import; the startup hook syncs the registry into this table (idempotent). - group_mappings (id, external_group_id, internal_role_id FK, …) — admin-managed bindings, UNIQUE(external_group_id, internal_role_id). app/auth/role_resolver.py — new module: - register_internal_role(key, display_name, description, owner_module) Module-author entry point. lower_snake_case key, immutable, validated. Same key + same fields = no-op (re-import safe); same key + different fields = ValueError so two modules can't silently overwrite each other. - sync_registered_roles_to_db(conn) — startup reconciliation. Inserts new keys, updates drifted metadata, never deletes (preserves mappings). - resolve_internal_roles(external_groups, conn) — joins group_mappings. Sorted, deduplicated role-key list. Plugged into google_callback + dev-bypass branch in get_current_user. - require_internal_role('key') — FastAPI dependency factory; reads session.internal_roles; 403 with explicit message when missing. Resolution runs at sign-in only (Google callback + LOCAL_DEV_GROUPS change in dev-bypass) — same semantics as session.google_groups. No admin UI yet; mappings created via repository directly until follow-up PR ships UI. 
21 new tests in tests/test_role_resolver.py: register/list, idempotency, collision detection, key-format validation; sync insert/update/no-delete; resolve empty/single/many-to-many/malformed-input; e2e via LOCAL_DEV_GROUPS — gated endpoint allowed/denied + direct session-cookie inspection. Full sweep: 178/178 passed across auth + db + repo tests. (Two pre-existing test_catalog_export.py failures verified unrelated.) * fix(auth): polish review feedback — first-request dev populate + PAT doc Two follow-ups from a code-reviewer pass on the foundation commit before opening the PR: - Dev-bypass populates session["internal_roles"] on the first request after sign-in, not just when external groups change. The previous guard only resolved when groups_changed=True, which left a hole for the LOCAL_DEV_GROUPS=`""` (explicit empty) flow: target=[], current=None, neither write branch fires, internal_roles stays unset, and require_internal_role then 403s with no roles to check against. The OAuth callback writes session["internal_roles"] unconditionally on sign-in (even []); dev-bypass now matches that semantics. Adds a single-pass populate gated on the key being absent from the session, so subsequent same-state requests still no-op (cheap session lookup, no resolver call). - Document that internal roles are session-scoped and PAT/headless clients will get 403 from any require_internal_role(...) endpoint. Same constraint already applies to session.google_groups (PAT JWTs deliberately don't snapshot group memberships — they could change after issuance with no way to re-sign), but the doc didn't surface this — an operator pointing a CLI at a role-gated endpoint would see 403 with no clue why. New "PAT and headless requests" section spells out the constraint, the rationale, and the three escape valves (use users.role for the gate; route through OAuth; wait for the planned `da admin grant-role` CLI helper). 
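The dev-bypass fix amounts to this rule: populate once whenever the key is absent, even with an empty list, and afterwards re-resolve only when the groups change. The following is a minimal sketch with a plain dict standing in for the session. `sync_dev_session` and the toy `make_resolver` (a sorted, de-duplicated lookup over a hypothetical mapping dict) are illustrative names, not the project's functions.

```python
from typing import Callable

Resolver = Callable[[list[str]], list[str]]


def make_resolver(mappings: dict[str, set[str]]) -> Resolver:
    """Toy stand-in for resolve_internal_roles: sorted, de-duplicated keys."""
    def resolve(groups: list[str]) -> list[str]:
        return sorted({role for g in groups for role in mappings.get(g, set())})
    return resolve


def sync_dev_session(session: dict, target_groups: list[str], resolve: Resolver) -> bool:
    """Return True when the resolver ran for this request."""
    current = session.get("google_groups") or []
    groups_changed = current != target_groups
    if groups_changed:
        session["google_groups"] = target_groups
    # Always populate once (even with []), mirroring the OAuth callback; after
    # that, only re-resolve when the external groups actually change.
    if groups_changed or "internal_roles" not in session:
        session["internal_roles"] = resolve(target_groups)
        return True
    return False
```

With `LOCAL_DEV_GROUPS=""`, the first call writes `internal_roles = []` and later identical requests skip the resolver.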
54 auth tests still pass locally (21 role-resolver + 33 existing auth-provider). * release(0.11.3): cut release for the internal-roles foundation Bumps pyproject.toml 0.11.2 → 0.11.3 and renames CHANGELOG's [Unreleased] section to [0.11.3] — 2026-04-26 (with a fresh empty [Unreleased] skeleton appended). Adds the matching [0.11.3] link reference at the bottom of CHANGELOG so the section heading renders as a hyperlink to the GitHub release page once the tag lands. The bullet itself is unchanged content; the rephrasing of "dev-bypass when external groups change" → "dev-bypass — populates on first request and whenever external groups change, mirroring the OAuth callback's always-write semantics" reflects the polish committed in d590579, plus the appended PAT/headless caveat pointing at the doc section that landed in the same polish pass. * fix(auth): address review feedback from Pavel — PAT-specific 403, audit logs, hardening Round-2 polish over the internal-roles foundation, addressing Pavel's review on PR #71. No behavior change for the happy path; tightens the safety rails and makes the failure modes self-explanatory. User-visible: - require_internal_role now distinguishes "no session" (Bearer/PAT caller) from "signed in but missing role" and surfaces a PAT-specific 403 detail in the first case ("This endpoint needs an interactive (OAuth) session — Bearer/PAT tokens do not carry session-resolved roles by design"). - docs/internal-roles.md documents deactivate+reactivate as the supported "force re-resolve now" lever for users that can't be made to log out. Internal hardening: - INFO-level audit log on every successful resolve (OAuth callback + dev-bypass) so a wrong-role complaint is debuggable from the log alone. - Startup warning when SESSION_SECRET is shorter than 32 chars, matching the existing JWT_SECRET_KEY gate — both HMAC surfaces sign trust-laden state (session.internal_roles, session.google_groups, JWTs). 
- _clear_registry_for_tests() now refuses to run unless TESTING=1 so a stray import path in production can't drop the registered capabilities. Tests: - 4 new tests in tests/test_role_resolver.py covering: stale-session contract after a mid-session mapping revoke (pin the documented limitation), PAT 403 detail wording, OAuth pipeline data flow from external groups to internal_roles, and the dev-bypass empty-list fallback when the resolver raises. CHANGELOG.md updated under [0.11.3] (### Changed + ### Internal). CLAUDE.md schema doc bumped from v7 to v8. --------- Co-authored-by: Claude <noreply@anthropic.com>	2026-04-26 23:49:10 +02:00
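The PAT-specific 403 requires telling "no session at all" apart from "signed in but missing the role". The following is a framework-free sketch of that decision. It assumes a Bearer/PAT request arrives with an empty session and that a signed-in session carries a `"user"` entry. `RoleDenied` stands in for the HTTP 403 a FastAPI dependency would raise, and its reason strings are hypothetical.

```python
from typing import Optional


class RoleDenied(Exception):
    """Stand-in for the HTTP 403 that a FastAPI dependency would raise."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.status_code = 403
        self.reason = reason


def check_internal_role(session: Optional[dict], role_key: str) -> None:
    # No interactive session: a Bearer/PAT caller. Its JWT carries no
    # session-resolved roles by design, so report that case distinctly.
    if not session or "user" not in session:
        raise RoleDenied("interactive_session_required")
    # Signed in, but the resolved roles for this session lack the capability.
    if role_key not in session.get("internal_roles", []):
        raise RoleDenied(f"missing_role:{role_key}")
```

Splitting the two reasons is what lets the error detail point a CLI user at OAuth instead of at a missing group mapping.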
ZdenekSrotyr	d2c76cb221	User management + PAT + CLI distribution + HTML auth redirect (#9 #10 #11 #12 ) (#28 ) * fix: redirect unauthenticated HTML routes to /login (#10) * docs(plan): user mgmt + PAT + CLI distribution implementation plan (#9 #10 #11 #12) * build(docker): produce wheel artifact for /cli/download (#9) * feat(db): schema v5 — users.active + deactivated_at/by (#11) * feat(api): /cli/download wheel + /cli/install.sh with baked server URL (#9) * feat(users): repository supports active flag + count_admins (#11) * feat(ui): /install page with per-deployment install instructions (#9) * feat(api): user PATCH/reset-password/set-password/activate/deactivate (#11) * fix(cli): da login prompts for password and sends it in body (#9) * test(api): safeguard tests for self-deactivate and last admin (#11) * feat(auth): reject requests from deactivated users (#11) * fixup(#10): propagate next through /login buttons + lock down sanitizer tests * feat(cli): da admin set-role/activate/deactivate/reset-password/set-password (#11) * feat(ui): /admin/users management page (#11) * feat(db): schema v6 — personal_access_tokens (#12) * feat(users): access_tokens repository (#12) * feat(auth): JWT carries typ (session|pat) and explicit jti (#12) * feat(auth): reject revoked/expired PATs; update last_used_at (#12) * feat(api): /auth/tokens CRUD + admin revoke; session-only guard (#12) * feat(cli): da auth token create/list/revoke (#12) * feat(ui): /profile page with PAT create/list/revoke (#12) * docs: PAT usage and session/PAT TTL clarification (#12) * feat(auth): PAT first-use-from-new-IP audit + last_used_ip (schema v7) (#12) Closes remaining acceptance gap from issue #12: audit_log entry on first use of a PAT from an IP that differs from the recorded last_used_ip.
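The audit rule for issue #12 has two parts: derive the client IP (first hop of X-Forwarded-For, else the socket peer), then audit only an IP change on a subsequent use, never the very first use. The following is a minimal sketch. It assumes header names are already lower-cased. `client_ip` and `should_audit_new_ip` are hypothetical helper names.

```python
from typing import Optional


def client_ip(headers: dict, peer_host: Optional[str]) -> Optional[str]:
    """First hop of X-Forwarded-For if present, else the socket peer address."""
    first_hop = headers.get("x-forwarded-for", "").split(",")[0].strip()
    return first_hop or peer_host


def should_audit_new_ip(last_used_ip: Optional[str], current_ip: Optional[str]) -> bool:
    """Emit token.first_use_new_ip only when a previously seen IP changes."""
    if last_used_ip is None:  # very first use: nothing to compare against
        return False
    return current_ip is not None and current_ip != last_used_ip
```

X-Forwarded-For is client-supplied. Its first hop is only trustworthy when a proxy you control rewrites the header, which is usually an acceptable trade-off for an audit signal as opposed to an access-control decision.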
- schema v7: personal_access_tokens.last_used_ip column - AccessTokenRepository.mark_used now stores the client IP - get_current_user extracts client IP (X-Forwarded-For first hop, fallback to request.client.host) and emits a token.first_use_new_ip audit when the IP changes on a subsequent use (not the very first use) - tests: new-ip audit, same-ip no-op, first-ever-use no-op, schema v7 column * fix: address Devin review findings on PR #28 - app/main.py: exclude /auth/* from HTML redirect handler so JSON endpoints under /auth/ (PAT CRUD used by `da auth token` CLI) keep their 401 JSON contract (Devin #1, bug) - app/api/tokens.py: reject expires_in_days <= 0 explicitly; use `is not None` so 0 no longer silently creates a non-expiring token (Devin #2) - app/api/users.py: validate role against Role enum in create_user to match update_user and prevent 500 on role-protected requests later (Devin #3) - app/web/templates/admin_users.html: escape user-supplied strings before innerHTML; move onclick handlers to addEventListener via data attributes so emails with quotes / HTML no longer break the UI or enable stored XSS (Devin #4) - app/auth/router.py, app/auth/providers/{password,google}.py: reject deactivated users at login instead of issuing a JWT that would then fail on the next request — removes the confusing redirect loop (Devin #5) - CLAUDE.md: document schema v7 instead of stale v4 (Devin #6) - tests/test_web_ui.py: regression test for the /auth/* JSON 401 * feat(web): add /profile and /admin/users links to dashboard nav * feat(web): point setup banner at /install page * chore(web): drop unused setup_instructions context * fix: address Devin review round 2 on PR #28 - app/api/tokens.py: when expires_in_days is None (the "never" option), use a ~100-year JWT expiry so the token doesn't silently die in 24h via the session-default fallback in create_access_token. 
The real expiry enforcement stays in verify_token's DB-level check (Devin 🔴) - app/web/templates/profile.html: escape t.name and other user-supplied strings via esc() helper before innerHTML, same pattern as admin_users.html. Move revoke onclick to data-attribute + addEventListener (Devin 🟡) - app/api/cli_artifacts.py: use `mktemp -d` with X's at end of template for GNU/BSD portability, place wheel inside the temp dir and clean up with rm -rf (Devin 🚩) * feat(web): redesign /install page; make curl one-liner primary, collapse manual Rebuild the public /install page using the dashboard visual language (shared header, card layout, gradient hero, design tokens from style-custom.css). The page is now anchored on the one-liner install path: curl -fsSL <server>/cli/install.sh | bash is rendered as the primary, prominent step 1, while the old manual wheel-download flow is tucked behind a closed-by-default <details> block for users in restricted/offline environments. Information architecture: hero (server URL + version) -> step 1: quick install (one-liner, big Copy button) -> step 2: create PAT on /profile + export DA_TOKEN / da auth whoami -> step 3: Claude Code / MCP via ~/.config/da/token.json -> collapsed "Manual install" details for download-wheel flow -> footer link to docs/HEADLESS_USAGE.md Every shell snippet has a vanilla-JS "Copy" button that confirms visually ("Copied!" for 1.5s) and falls back to textarea+execCommand on non-secure contexts. No new dependencies, no bundler. The route now also pulls an optional user so the header shows the same nav (Dashboard / Profile / Logout) as dashboard.html when a session exists, while staying fully public when signed out. * fix(cli): use real wheel filename in install.sh (broken pip/uv install) The installer wrote the downloaded wheel as agnes_cli.whl, which lacks a PEP-427 version component — both pip and uv tool install reject it and abort the one-liner.
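The PEP 427 naming rule can be checked before a file is handed to pip or uv. The following is a minimal sketch. It assumes the standard layout `name-version[-build]-python-abi-platform.whl`, that is, five or six dash-separated parts. The helper name and the example version string in the test are hypothetical.

```python
def has_pep427_components(filename: str) -> bool:
    """Rough check that a wheel filename has PEP 427's dash-separated fields."""
    if not filename.endswith(".whl"):
        return False
    parts = filename[: -len(".whl")].split("-")
    # name-version-python-abi-platform (5 parts), optionally plus a build tag (6)
    return len(parts) in (5, 6) and all(parts)
```

A name like `agnes_cli.whl` has a single part, so the check fails just as the installers did. For production code, `packaging.utils.parse_wheel_filename` from the `packaging` library performs a stricter version of this check and raises on malformed names.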
Use curl -OJ so Content-Disposition determines the on-disk filename, then resolve it via glob. Install an EXIT trap to remove the tmpdir even when install fails. * fix(web): correct manual install wheel glob and add PEP 668 / PATH hints - Wheel glob is agnes_the_ai_analyst-*.whl (not agnes-*.whl) — the old pattern never matched the real artefact name from the build. - Add an "— or —" separator between uv tool install and pip install. - Warn that pip install --user is blocked on macOS Homebrew / modern Debian (PEP 668) and recommend uv tool install as the default path. - Both flows now show the ~/.local/bin PATH hint so a fresh shell can find the da binary after install. * fix(web): consistent session.user reference in install header The avatar-letter fallback inside {% if session.user %} was reading user.name / user.email directly, but the route dependency can pass user=None — those references resolved to an empty FlexDict and produced an empty avatar circle. Read everything through session.user to match the guard and the dashboard pattern. * fix(web): point headless usage link at GitHub source /docs/HEADLESS_USAGE.md 404s — no static route serves repo docs. Point the footer link at the rendered markdown on GitHub instead of adding a dedicated docs serving route just for one file. * feat(web): /install hero size, anon sign-in banner, step 2 copy polish - Bump hero h1 from 26px to 30px to match dashboard primary scale. - Anonymous visitors see a small sign-in banner above Step 2 (creating a token requires auth; without the banner the flow appears stuck). - Add an 'After generating your token' section label inside Step 2 so the /profile CTA button no longer looks wedged mid-sentence between adjacent paragraphs. * chore(web): /install a11y + version pill polish - aria-live='polite' on copy buttons so screen readers announce the 'Copied!' state change. - Replace redundant INSTANCE_NAME eyebrow (already in the header logo) with 'Getting started'.
- Hide the version pill when AGNES_VERSION is unset/'dev' — avoids the misleading 'vdev' label in local/unbuilt runs. - Manual summary focus-visible outline-offset +2px (was -2px which clipped inside the card), and mark the chevron as decorative. * fix(web): use session.user in dashboard avatar fallback Inside the {% if session.user %} guard, the avatar fallback referenced (user.name or user.email). If user is None the block crashes when the profile picture is absent. Align with the guard variable. * fix: address Devin review round 3 on PR #28 - app/api/users.py: stop auto-sending email from reset_password. The magic-link sender would deliver a "Login Link" that — when clicked — consumes the reset_token via verify_magic_link and logs the user in WITHOUT prompting for a new password. Admins now share the raw reset_token from the API response manually, or use set-password directly. email_sent is always False. Documented inline. (Devin 🟡) - app/api/cli_artifacts.py: harden /cli/install.sh generation against shell injection via Host header or AGNES_VERSION. base_url is validated against a strict scheme+host+port regex; version against an alnum + dot/dash/underscore allowlist. Both values are also piped through shlex.quote() as defense in depth. (Devin 🟡) The shared users.reset_token column between magic-link and password-reset flows (Devin 🚩) remains an architectural gap; splitting into separate columns needs schema v8 and is tracked for a follow-up PR.
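The install.sh hardening combines an allowlist check with `shlex.quote()` as a second layer. The following is a minimal sketch of that combination. The two regexes mirror the description (strict scheme+host+port; alnum plus dot/dash/underscore) but are assumptions, not the project's `_SAFE_URL_RE`. `render_install_sh` and the variable names in the emitted script are hypothetical.

```python
import re
import shlex

# Illustrative allowlists; the real patterns in cli_artifacts.py may differ.
_BASE_URL_RE = re.compile(r"https?://[A-Za-z0-9.-]+(?::[0-9]{1,5})?")
_VERSION_RE = re.compile(r"[A-Za-z0-9._-]+")


def render_install_sh(base_url: str, version: str) -> str:
    # fullmatch, not match with "$": "$" would also accept a trailing newline.
    if not _BASE_URL_RE.fullmatch(base_url):
        raise ValueError("refusing to render install.sh: unsafe base_url")
    if not _VERSION_RE.fullmatch(version):
        raise ValueError("refusing to render install.sh: unsafe version")
    # Defense in depth: quote even values that already passed validation.
    return (
        "#!/bin/sh\n"
        f"SERVER_URL={shlex.quote(base_url)}\n"
        f"AGNES_VERSION={shlex.quote(version)}\n"
    )
```

`shlex.quote` leaves values made only of "safe" characters unquoted and single-quotes anything else, so a value that slips past the regex still reaches the shell as one literal word.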
* docs, chore(grpn): manual-deploy helpers + hackathon deploy learnings Adds scripts/grpn/ — Makefile + agnes-auto-upgrade.sh + README for operating Agnes on GRPN's existing foundryai-development VM when the full Terraform flow is blocked by org policies: - iam.disableServiceAccountKeyCreation (org constraint) forbids SA JSON keys, so GCP_SA_KEY-based CI is unavailable - No projectIamAdmin delegation → bootstrap-gcp.sh can't grant roles - Secret Manager IAM bindings require setIamPolicy which editor lacks Helper targets: deploy, deploy-tag, recreate, restart, stop, start, status, version, logs, ps, env, ssh, tunnel, open, bootstrap-admin, set-data-source, install-cron, uninstall-cron. docs/superpowers/plans/2026-04-22-grpn-deploy-learnings.md — running log of all org-policy constraints hit during the hackathon deploy, with workarounds and derived follow-ups (WIF support, external_ip variable, customer onboarding IAM checklist). Not a replacement for the TF flow — stopgap until WIF lands. * fix(web): make header logos clickable links to home * feat(web): one-click "Setup a new Claude Code" button Adds a single-button flow on the dashboard and /install page that generates a fresh personal access token via POST /auth/tokens and copies a complete, paste-ready setup script (server URL, token, install/verify commands) to the clipboard. Falls back to a modal textarea when the clipboard is blocked; redirects to /login on 401; surfaces backend errors inline. - dashboard.html: replaces the top "Set up your local environment" anchor with a real button wired to setupNewClaude(). Removes the duplicate bottom setup banner to keep a single entry point. - install.html: for signed-in users, Step 1 leads with the one-click button and demotes the curl one-liner into a collapsible "Or run manually" aside. Anonymous visitors still see the curl flow plus a sign-in hint. - No new deps. Vanilla JS. Token lives in memory/clipboard only — never rendered into persistent DOM. 
* feat(cli): add "da auth import-token" for non-interactive PAT login Writes a provided JWT into ~/.config/da/token.json using the canonical {access_token, email, role} shape expected by save_token(). Decodes the token locally to pull email/role claims, verifies it against the server via GET /api/catalog/tables, and refuses to overwrite an existing token file if the server returns 401. --email / --role overrides exist for tokens missing those claims; --skip-verify bypasses the server round-trip for offline / CI scenarios. * test(cli): cover da auth import-token success + 401 + claim-fallback paths Three new tests in TestAuthImportToken: - valid JWT + 200 -> canonical token.json written - 401 from /api/catalog/tables -> exit 1, existing token file untouched - JWT without email/role claims -> refused without overrides, accepted with --email / --role flags * feat(web): update one-click Claude setup instructions — explicit uv install, import-token, skills question Replaces the fragile `cat > token.json <<EOF` clipboard payload with an explicit, auditable sequence: 1. `curl -fsSL /cli/download` + `uv tool install --force` (no opaque `curl | bash`). 2. `da auth import-token --token ...` instead of hand-written JSON. 3. Explicit PATH persistence for zsh/bash. 4. A required question to the user about whether to copy the bundled skills into ~/.claude/skills/agnes/ or pull them on-demand via `da skills show`. 5. A final confirmation step with whoami + version output. Factored both pages to include a shared partial (app/web/templates/_claude_setup_instructions.jinja) so dashboard.html and install.html can never drift apart again. {server_url} and {token} stay as runtime placeholders substituted by renderSetupInstructions(). * feat(ui): modernize /admin/users + unify header nav across pages - New shared partial app/web/templates/_app_header.html — single source of truth for the top navigation. Used by base.html and dashboard.html (which doesn't extend base.html).
Active page highlighted via request.url.path. Admin "Users" link gated by session.user.role. - style-custom.css: add .app-header / .app-nav-link / .app-btn-logout / .app-avatar styles (mirrors dashboard's previous inline copy under app-* prefix). Mobile-friendly fallback at <720px. - base.html: include the new partial so every page extending base (admin_users, profile, login_email, error, …) gets the same chrome the dashboard has. - dashboard.html: replace its inline <header class="header"> markup with the shared partial. Inline .header CSS left in place as harmless dead code (separate cleanup PR). - admin_users.html: rewritten with avatars, role pills (color-coded per role), toggle switch for active, search/filter input, toast notifications, modal dialogs replacing alert/confirm/prompt, one-click copy for the reset token, empty / loading states. All XSS-safe via the existing esc() helper + data-attribute event delegation. - tests/test_web_ui.py: smoke test that /admin/users renders the new shared header chrome and the modernized markup. * feat(api): serve CLI wheel at /cli/agnes.whl for direct uv install uv tool install inspects the URL path suffix to recognise a wheel, so /cli/download (which has no .whl suffix) cannot be installed directly. Expose a stable /cli/agnes.whl alias over the same wheel lookup so users can run: uv tool install --force https://<server>/cli/agnes.whl * test(cli): cover da auth import-token --server persisting to config.yaml The server persistence was already implemented in the import-token command (save_config({server}) call) but not covered by tests. Add an explicit test so the one-step setup contract — single import-token call writes both token and server — cannot regress. * feat(web): simpler Claude setup — single uv install URL, single import-token call User feedback: the prior clipboard payload repeated the server URL and token across multiple steps (curl + tmpfile + install + rm + separate seed-config + import-token). 
Collapse to: 1. uv tool install --force {server_url}/cli/agnes.whl (single URL, direct) 2. da auth import-token --token ... --server ... (one call, persists both) 3. da auth whoami 4. skills (ask user first) 5. confirm. uv accepts HTTPS URLs that end in .whl and installs them directly, so the tmpfile dance is unnecessary. import-token --server already persists the server to config.yaml, so no separate printf > config.yaml step is needed. * fix(tests): update admin users heading assertion after template rename The admin_users.html template now uses <h2 class="users-title">Users</h2> instead of <h2>User management</h2>. Update the assertion to match. * feat(ui): unify header across remaining 7 standalone pages These 7 pages render their own full <html> and don't extend base.html, so the previous unification commit only covered base + dashboard. Each had its own ad-hoc <header> markup with inconsistent classes (.top-header / .header / .page-header), inconsistent nav-link sets, and inconsistent avatar/email styling. Replace each inline <header>...</header> block with the shared {% include '_app_header.html' %} so /activity-center, /admin/permissions, /admin/tables, /catalog, /corporate-memory, /corporate-memory/admin, and /install all show the same chrome (Dashboard / Install CLI / Profile / Users / email + avatar / Logout) with the active page highlighted via request.url.path. Old inline header CSS (.header, .top-header, .page-header, .nav-link, etc.) is left in place as harmless dead code; it can be cleaned up in a follow-up sweep. * feat(web): add readable preview of Claude setup payload on dashboard + /install Move the line-by-line setup instructions into app/web/setup_instructions.py as the single source of truth, then render them in two modes from the existing _claude_setup_instructions.jinja partial: - preview_mode=True → visible, read-only <pre><code> block with the real server URL and a clearly-styled placeholder token (never a real one).
- preview_mode=False → the JS SETUP_INSTRUCTIONS_TEMPLATE used by the one-click flow (unchanged behaviour). Both /dashboard (env-setup-cta card) and /install (Step 1 card) now show the preview directly under the 'Setup a new Claude Code' button so users can see exactly what will land in their clipboard before they click. * feat(web): update setup instructions — `da diagnose` step, explicit section titles Rework the Claude Code setup payload to: - Give every numbered step an unambiguous verb header ("1) Install the CLI", "2) Log in", "3) Verify the login", "4) Run diagnostics", "5) Skills (ask the user first)", "6) Confirm"). - Add step 4 `da diagnose` as the post-login health check. The CLI already ships this command (cli/commands/diagnose.py); it prints "Overall: healthy" and a list of green checks that map cleanly to next actions. - Ask the skills copy-vs-on-demand question verbatim so Claude Code always prompts the user the same way. - Replace the terse "Confirm" line with a 4-bullet summary (version, whoami, skills choice, diagnose status) so the return message is structured and comparable across setups. * chore(web): remove stale MCP card from /install (no MCP server today) The 'Use with Claude Code / MCP' card (Step 3 on /install) referenced an MCP integration Agnes does not ship. Remove the whole card. The one-click 'Setup a new Claude Code' flow in Step 1 already covers the long-lived client use case and is less confusing than dangling persistence tips for a non-existent integration. * feat(api): include user_email + last_used_ip + user_id in admin tokens list response Adds AdminTokenItem response model (superset of TokenListItem) and AccessTokenRepository.list_all_with_user() joining personal_access_tokens with users to denormalize user_email. Needed for /admin/tokens UI where admins triage tokens across all users. 
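The admin listing denormalizes the owner's email by joining `personal_access_tokens` with `users`. The following is a minimal sketch with the stdlib `sqlite3` module. The column set and ordering are assumptions based on the fields named above (user_email, user_id, last_used_ip). `list_all_with_user` here is a free function, not the repository method.

```python
import sqlite3


def list_all_with_user(conn: sqlite3.Connection) -> list[dict]:
    """Every token joined with its owner's email, for an admin-wide view."""
    cur = conn.execute(
        """
        SELECT t.id           AS id,
               t.name         AS name,
               t.user_id      AS user_id,
               u.email        AS user_email,
               t.last_used_ip AS last_used_ip
        FROM personal_access_tokens AS t
        JOIN users AS u ON u.id = t.user_id
        ORDER BY u.email, t.name
        """
    )
    # Build dicts from cursor metadata so callers get named fields.
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]
```

Doing the join in SQL keeps the admin page to a single query instead of one user lookup per token.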
* feat(web): /admin/tokens page — list, filter, search, revoke across all users Adds a new admin-only page with client-side filtering (status, user email, last-used window), column sorting, counts bar (active/revoked/expired), and an inline revoke action. Mirrors the /admin/users visual language. * feat(web): add Tokens nav link for admins + deep-link from admin/users row Admin-only nav entry to /admin/tokens, and a per-row Tokens button on /admin/users that prefills the token page's user filter via ?user=<email>. * test(admin): cover /admin/tokens rendering, filter state, non-admin denial, revoke Verifies admin can render the page (title + JS hooks present), a non-admin is blocked, unauthenticated users are redirected, the admin list response includes user_email / user_id / last_used_ip, and admin can revoke another user's token. * feat(web): modern redesign of /admin/tokens — hero, stat strip, refined table, responsive cards, a11y * feat(web): ditch the table — /admin/tokens as a card stack, modern GitHub-style list Replaces the table-based layout with a stack of self-contained token cards inside a <ul role=list>. Each card is a flex row: avatar + name/meta on the left, last-used block in the middle, status pill + outlined 'Revoke' button on the right. Status and sort controls are pill-shaped toggle chips; user email search has an inline search icon. No <table>/<tr>/<th>/<td> anywhere. Responsive below 720px (card stacks vertically) and 480px (stat chips 2x2). Preserves filter IDs (flt-status, flt-user, flt-last-used) and data-revoke for existing tests. * feat(web): add /tokens (role-aware) — single page for both user PAT CRUD and admin overview - Rename admin_tokens.html -> tokens.html with a new is_admin context flag. - New route GET /tokens: renders the same card-stack UI for everyone. * Admins: loads /auth/admin/tokens, shows owner column + stat strip, keeps the owner-email search box and sort-by-owner chip. 
* Non-admins: loads /auth/tokens (own tokens only), hides owner column + stat chips, adds a 'New token' CTA in the hero that opens a modal (name + expires_in_days) calling POST /auth/tokens. The raw token is revealed once in a dismissable banner and cleared from the DOM on Hide. - GET /admin/tokens now 302-redirects to /tokens, preserving query string (so the /admin/users deep-link ?user=foo still works). * feat(web): /tokens full-bleed layout to match dashboard width The hero, toolbar, and card list used to sit inside base.html's .container (max-width 800px). Break out with negative horizontal margins so the page spans the viewport like /dashboard does, capped at 1440px for readability on very wide screens with a 24px gutter on each side. - No change to base.html itself. The override is scoped to .tokens-page. - body { overflow-x: hidden; } guards against rare horizontal scrollbars. - < 808px viewport: reset to natural flow (mobile already narrower). - ≥ 1488px viewport: cap to 1440px and re-center. * chore(web): remove /profile template + nav link (redirect /profile -> /tokens) The old /profile PAT CRUD page is now redundant — the modern /tokens page covers both user and admin flows. Delete the template; the router's /profile handler already 302-redirects to /tokens. Nav cleanup: - Remove the 'Profile' link. - Show a single 'Tokens' link to every signed-in user (previously only admins saw it). - Active-state matches /tokens, /admin/tokens, and /profile so the highlight survives the redirect chain. /install CTA now points at /tokens instead of /profile. * test: cover /tokens for admin + non-admin flows, /profile redirect, nav update tests/test_admin_tokens_ui.py - Point admin rendering test at /tokens directly and tighten assertions (admin-only stat strip + owner search, non-admin CTA absent). - Add test_non_admin_can_render_tokens_page: personal body, New-token CTA, create-modal, reveal banner; stat strip + owner search absent. 
- Add test_admin_tokens_redirects_to_tokens: 302 to /tokens, query string (?user=...) preserved for the /admin/users deep-link. - Add test_profile_redirects_to_tokens: 302 to /tokens. - Add test_non_admin_can_create_pat_via_tokens_page_api: exercises the POST /auth/tokens call that the non-admin create-modal submits. tests/test_pat.py - test_profile_page_renders -> test_profile_page_redirects_to_tokens: assert the 302 + that /tokens lands on the unified non-admin body. tests/test_web_ui.py - admin_users nav assertion: 'Tokens' link present, 'Profile' link absent. - Add test_nav_shows_tokens_link_for_non_admin: non-admins see the same 'Tokens' link (previously only admins did). - Add test_profile_redirects_to_tokens back-compat check. * feat(web): collapse 'What Claude Code will receive' by default The preview block on /dashboard and /install now uses <details>/<summary> so it is hidden by default. Click the chevron/title to expand and review the clipboard payload. Markup stays in the DOM so existing tests that assert on content continue to pass. * fix(web): /tokens width — override .container to 1280px like dashboard The negative-margin full-bleed trick was fragile and pushed content past the right edge on deployed viewports. Replace with a simple max-width override of base.html's .container on this page only, matching /dashboard's 1280px center-column layout. * feat(web): split role-aware /tokens into my_tokens.html + admin_tokens.html * feat(web): router — separate handlers for /tokens (own) and /admin/tokens (all) * feat(web): nav — show Tokens for all, add All tokens for admins * test: cover split token pages (own vs all) + admin access gating * feat(web): move 'My tokens' into a user dropdown menu Replaces the separate Tokens/email/Logout nav trio with a rounded avatar trigger that opens a dropdown containing the user's email, role, a 'My tokens' link, and Logout. 
Admin-only 'All tokens' stays as a top-level nav item since it's an admin function, not a personal one. Click-outside and Escape close the panel; chevron rotates on open. * fix(api): allow PATs to list/get/revoke their own tokens (CLI flow) The documented 'da auth token list/revoke' CLI flow in docs/HEADLESS_USAGE.md uses a PAT, but the previous dependency (require_session_token) returned 403. Only create_token must be session-only to prevent PAT-spawning-PAT chains; listing and revoking your own tokens is safe with a PAT. * fix(api): cap expires_in_days at 3650 to avoid datetime overflow (500 to 400) Values above ~2.9 million days pushed datetime.now(utc) + timedelta(days=...) past datetime.max and surfaced as an unhandled OverflowError → 500. Cap at 10 years with a clear 400 instead; the no-expiry code path is unaffected. * fix(api): relax _SAFE_URL_RE to allow path prefixes, underscores, and IPv6 The previous regex rejected legitimate reverse-proxy base_url values (https://host/agnes/), underscores in Docker Compose hostnames, and IPv6 literals (http://[::1]:8000). Widen the charset and allow an optional trailing path. shlex.quote continues to provide defense-in-depth against any metacharacter that slips through. * fix(web): /login/email and Google OAuth propagate next_path Previously, /login/email silently dropped the ?next=<path> query param so the hidden form field rendered empty and login always landed on /dashboard. Google's button was hard-coded to /auth/google/login, ignoring next entirely. - /login page now appends ?next to the Google button URL - /login/email reads + sanitizes next, passes as template context - google_login stashes sanitized next_path in session['login_next'] - google_callback pops + re-sanitizes and redirects there Sanitization factored into app/auth/_common.safe_next_path.
* fix(auth): differentiate argon2 VerifyMismatchError from internal errors in web login The previous except (VerifyMismatchError, Exception) collapsed both cases into the generic 'invalid credentials' redirect, silently hiding corrupted-hash / library errors from ops. Split the two: bad password still gets ?error=invalid; anything else logs via logger.exception and redirects with ?err=auth_internal so ops have a visible signal and users don't retry forever against a broken password_hash column. * docs: correct CLAUDE.md table name (personal_access_tokens) v7 note referenced 'access_tokens.last_used_ip' but the real table is personal_access_tokens (as mentioned two tokens earlier in the same bullet). Same-file consistency fix. * chore(web): clarify admin user-reset UI — encourage Set password over the unused reset_token POST /api/users/{id}/reset-password stores and returns a token but no endpoint consumes it — the magic-link sender would log the user in without prompting for a new password, defeating the reset. - Drop the 'Reset' row action from admin_users so admins aren't pointed at a dead end. - Rewrite the reveal-modal copy to tell admins to use Set password and explicitly note that the magic-link flow isn't available for reset tokens in this build. The API endpoint stays for API-level future use. * test: cover PAT CLI flow, expires_in_days overflow, proxy base_url, next propagation - tests/test_pat.py: PAT can list own tokens (200, was 403); PAT can revoke own tokens (204); create_token returns 400 for expires_in_days > 3650 (was 500 via datetime overflow). - tests/test_cli_artifacts.py: _SAFE_URL_RE accepts reverse-proxy path prefixes, underscores, and IPv6 literals; end-to-end check of cli_install_script with a stubbed base_url that includes a path prefix (Agnes behind /agnes/). 
- tests/test_web_ui.py: /login propagates ?next to the Google button URL; /login/email renders next in the hidden form field and strips hostile values; unit coverage of safe_next_path. * fix(security): use \Z instead of $ in URL/version allowlists (trailing-\n bypass) Python regex `$` also matches just before a trailing newline, so a Host header or AGNES_VERSION value ending in a newline, such as "good.example.com\n", would slip past the allowlist. `\Z` anchors to strict end-of-string. shlex.quote downstream remains as defense-in-depth, but the allowlist is now the tight gate it claims to be. * fix(auth): PAT with null expiry omits JWT exp claim (DB is the source of truth) Previously a PAT created with `expires_in_days=null` (user-requested "never expires") set the DB `expires_at` to NULL (correct) but still baked a ~100y `exp` claim into the JWT. That is misleading: the PAT silently did expire eventually, despite the UI and API promising "no expiry". `create_access_token` now accepts `omit_exp=True` to skip the `exp` claim entirely. `app/api/tokens.py` passes that when `expires_in_days is None`. The authoritative expiry check lives in `app/auth/dependencies.py`, which reads `expires_at` from the DB row; that check is unchanged. PyJWT accepts JWTs without an `exp` claim indefinitely. * test: cover trailing-newline regex bypass + no-exp JWT for unbounded PAT - test_safe_url_re_rejects_trailing_newline_bypass: asserts both `_SAFE_URL_RE` and `_SAFE_VERSION_RE` reject values with a trailing `\n` (previously accepted because Python `$` matches before `\n`). - test_pat_null_expiry_jwt_has_no_exp_claim: POST /auth/tokens with `expires_in_days=null`, decode the returned JWT, assert `exp` is absent while `typ=pat`, `sub`, and `jti` are still present. - test_pat_with_null_expiry_is_accepted_by_verify_token: verify_token round-trips a claim-less JWT without ExpiredSignatureError.
- test_pat_null_expiry_end_to_end_allows_authenticated_request: use the null-expiry PAT against /auth/tokens and confirm it authenticates. * docs(auth): document X-Forwarded-For trust model in _client_ip Deployment runs behind Caddy which strips incoming X-Forwarded-For and sets its own, so the leftmost hop is trustworthy. Clarify that the stored last_used_ip is audit-only and never used for access control — if the app is ever exposed directly, this value becomes client-settable. * docs: /profile → /tokens in install.sh next-steps, CLI error, HEADLESS_USAGE, security skill After splitting PAT management to /tokens (with /profile as a back-compat 302), stale references remained in user-facing text. Update them to the canonical /tokens URL so shell scripts, CLI error hints, docs, and the bundled security skill are all consistent.	2026-04-22 14:24:28 +02:00
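The allowlist and expiry-cap fixes in the commit above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: `LOOSE_HOST_RE`, `STRICT_HOST_RE`, and `token_expiry` are hypothetical names, the host character class is assumed, and the 1-day lower bound is an assumption (the commit only states the 3650-day cap). It shows that `$` accepts a value with a trailing newline while `\Z` rejects it, and that validating `expires_in_days` up front keeps `datetime + timedelta` from ever raising `OverflowError`.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical host allowlists; the real _SAFE_URL_RE pattern is not shown in the log.
LOOSE_HOST_RE = re.compile(r"^[A-Za-z0-9._-]+$")     # `$` also matches before a final "\n"
STRICT_HOST_RE = re.compile(r"\A[A-Za-z0-9._-]+\Z")  # `\Z` matches only at the true end

MAX_EXPIRES_IN_DAYS = 3650  # the 10-year cap described in the commit


def token_expiry(expires_in_days: Optional[int],
                 now: Optional[datetime] = None) -> Optional[datetime]:
    """Return the PAT expiry timestamp, or None for a never-expiring token."""
    if expires_in_days is None:
        return None  # no-expiry path: the cap does not apply
    # Assumed lower bound of 1; the commit only states the upper cap.
    if not 1 <= expires_in_days <= MAX_EXPIRES_IN_DAYS:
        # Rejecting here (a 400 in the API) means the addition below cannot overflow.
        raise ValueError(
            f"expires_in_days must be between 1 and {MAX_EXPIRES_IN_DAYS}")
    now = now or datetime.now(timezone.utc)
    return now + timedelta(days=expires_in_days)
```

In an API handler the `ValueError` would be mapped to a 400 response, while `expires_in_days=None` still yields no expiry at all.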
ZdenekSrotyr	618385e7e4	fix: table_catalog in re-attach query, --limit in hybrid CLI - _reattach_remote_extensions: query table_catalog instead of table_schema (DuckDB ATTACHed databases use table_catalog for the alias) - _query_hybrid: forward --limit flag to RemoteQueryEngine.max_result_rows Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 20:13:35 +02:00
ZdenekSrotyr	f4129dc87d	fix: alias validation, url escaping, read-only CLI, blocklist comment Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-11 11:28:27 +02:00
ZdenekSrotyr	0a69814fca	fix: re-attach remote extensions in get_analytics_db_readonly() Add _reattach_remote_extensions() helper that reads _remote_attach tables from attached extract.duckdb files and LOADs the corresponding DuckDB extensions, so BigQuery and other remote views resolve correctly in read-only analytics connections.	2026-04-11 11:04:04 +02:00
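A minimal sketch of the re-attach step, assuming each `_remote_attach` row yields an `(alias, extension)` pair. The row shape, the `load_statements` name, and the extension-name pattern are hypothetical; only the idea of issuing one `LOAD` per required DuckDB extension comes from the commit. The name is validated because it is spliced directly into SQL.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical: extension names are assumed to be lowercase identifiers.
_SAFE_EXTENSION = re.compile(r"\A[a-z][a-z0-9_]*\Z")


def load_statements(remote_rows: Iterable[Tuple[str, str]]) -> List[str]:
    """Build one LOAD per distinct extension, keeping first-seen order."""
    seen = set()
    statements: List[str] = []
    for _alias, extension in remote_rows:
        ext = extension.strip().lower()
        # The name is spliced into SQL, so refuse anything but a plain identifier.
        if not _SAFE_EXTENSION.match(ext):
            raise ValueError(f"refusing unsafe extension name: {extension!r}")
        if ext not in seen:
            seen.add(ext)
            statements.append(f"LOAD {ext};")
    return statements
```

Each returned statement would then be executed on the read-only analytics connection before any remote view is queried.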
ZdenekSrotyr	bc394bd266	feat: schema migration v3→v4 with metric_definitions and column_metadata tables Add SCHEMA_VERSION = 4, _V3_TO_V4_MIGRATIONS list, and if current < 4 block in _ensure_schema(). Both new tables are also added to _SYSTEM_SCHEMA for fresh installs. Tests cover fresh install, all columns, and v3→v4 migration path.	2026-04-10 19:14:32 +02:00
ZdenekSrotyr	dc8a9275e6	fix: address Devin review round 3 — retry exhaustion, discover path, WAL snapshot - CalVer retry loop now exits with error if all 5 attempts fail (prevents pushing Docker image with unclaimed version tag) - discover_tables endpoint reads data_source.keboola.url (consistent with configure_instance and _discover_and_register_tables) - Pre-migration snapshot flushes WAL via CHECKPOINT before copying and copies .wal file if it still exists after flush 663 tests pass.	2026-04-10 14:11:17 +02:00
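The retry-exhaustion fix can be illustrated with a small Python sketch. The real loop lives in CI (`release.yml`) and its tag format is not shown in this log, so the `YYYY.MM.N` scheme, the `try_claim` callback, and `claim_calver_tag` are all assumptions. The important part is the final `raise`: after five failed claims the function errors out instead of returning a tag nobody owns.

```python
from datetime import date
from typing import Callable

MAX_ATTEMPTS = 5


def claim_calver_tag(today: date, try_claim: Callable[[str], bool],
                     start_patch: int = 0) -> str:
    """Claim the first free YYYY.MM.N tag, or fail instead of publishing unclaimed."""
    for attempt in range(MAX_ATTEMPTS):
        tag = f"{today.year}.{today.month:02d}.{start_patch + attempt}"
        # try_claim might push a git tag; False means another run won the race.
        if try_claim(tag):
            return tag
    # Exhausted: the caller must stop before building or pushing an image.
    raise RuntimeError(f"could not claim a CalVer tag after {MAX_ATTEMPTS} attempts")
```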
ZdenekSrotyr	6c53082295	feat: multi-instance deployment — all 14 must-have items from spec CalVer CI (release.yml) with stable/dev channels, health endpoint with version/channel/schema_version, JWT secret auto-generation with file persistence, smoke test script + Docker-in-CI, pre-migration snapshot, /api/admin/configure for headless setup, /api/admin/discover-and-register, /setup wizard, OpenAPI snapshot test, custom connector mount support, CHANGELOG, migration safety tests, startup banner. 663 tests pass (6 new migration safety + 3 OpenAPI snapshot + 1 updated JWT test).	2026-04-10 11:57:42 +02:00
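One way to implement "JWT secret auto-generation with file persistence" is sketched below. The function name, the 48-byte secret length, and the 0o600 file mode are assumptions for illustration, not details taken from the commit. Persisting the secret means previously issued tokens stay valid across container restarts.

```python
import os
import secrets
from pathlib import Path


def load_or_create_jwt_secret(path: Path) -> str:
    """Reuse a persisted secret so tokens survive restarts; otherwise create one."""
    if path.exists():
        existing = path.read_text(encoding="utf-8").strip()
        if existing:
            return existing
    path.parent.mkdir(parents=True, exist_ok=True)
    secret = secrets.token_urlsafe(48)  # 48 random bytes, about 384 bits
    # A newly created file gets mode 0o600 (owner read/write) rather than umask defaults.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(secret)
    return secret
```

Two workers starting at the same moment could both generate a secret in this sketch; a production version may want an exclusive create (`os.O_EXCL`) and a re-read when the file already exists.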
ZdenekSrotyr	53a9e838f9	feat: add graceful shutdown handler - Add close_system_db() function in src/db.py to cleanly close the shared DB connection - Add lifespan context manager in app/main.py to trigger shutdown on app exit - Integrate lifespan into FastAPI app initialization - All API tests pass (77/77)	2026-04-09 07:03:45 +02:00
ZdenekSrotyr	1b219cabe9	fix: remove dead PRAGMA enable_wal code DuckDB has used WAL by default since v0.8, and PRAGMA enable_wal is not valid DuckDB syntax. Removed the obsolete try-except block that attempted to enable WAL on system database initialization.	2026-04-09 06:59:57 +02:00
ZdenekSrotyr	3e9c347cf1	fix: validate extract dir name in get_analytics_db_readonly to prevent SQL injection Adds _SAFE_IDENTIFIER regex guard before ATTACHing extract.duckdb files in the read-only analytics connection, matching the same fix already applied in the orchestrator. Adds test coverage for malicious directory names.	2026-04-09 06:57:31 +02:00
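A sketch of the identifier guard, assuming the extract directory name becomes the unquoted ATTACH alias. The regex, the `attach_statement` helper, and the `READ_ONLY` option are illustrative assumptions; the commit only says a `_SAFE_IDENTIFIER` check runs before ATTACH. With the guard in place, a directory named `x; DROP TABLE t` is rejected instead of being spliced into SQL.

```python
import re
from pathlib import PurePosixPath

# Hypothetical pattern: the log names _SAFE_IDENTIFIER but does not show it.
_SAFE_IDENTIFIER = re.compile(r"\A[A-Za-z_][A-Za-z0-9_]*\Z")


def attach_statement(extract_dir: PurePosixPath) -> str:
    """Build a read-only ATTACH for <extract_dir>/extract.duckdb, aliased by dir name."""
    alias = extract_dir.name
    # The alias is spliced in unquoted, so it must be a plain identifier.
    if not _SAFE_IDENTIFIER.match(alias):
        raise ValueError(f"unsafe extract directory name: {alias!r}")
    # Double any single quotes so the path is a valid SQL string literal.
    db_path = str(extract_dir / "extract.duckdb").replace("'", "''")
    return f"ATTACH '{db_path}' AS {alias} (READ_ONLY)"
```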
ZdenekSrotyr	23ae6a602c	security: harden query endpoint SQL blocklist and disable external access Expand blocked keywords to cover parquet_scan, read_csv_auto, query_table, iceberg_scan, delta_scan, call, URL schemes (http/https/s3/gcs), and additional file-scan functions. Set enable_external_access=false on the non-read-only analytics connection path. Add three new tests covering parquet_scan, read_csv_auto, and query_table blocking.	2026-04-09 06:54:58 +02:00
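A keyword blocklist along these lines could look like the sketch below. The word list is only the subset named in the commit, and `check_query` is a hypothetical name. Word-boundary matching keeps identifiers such as `call_count` usable, but a regex blocklist is a heuristic that can also reject harmless text inside string literals, so it is best treated as one layer alongside the read-only connection.

```python
import re

# Only the functions, keyword, and URL schemes named in the commit; the real list is longer.
_BLOCKED_WORDS = ("parquet_scan", "read_csv_auto", "query_table",
                  "iceberg_scan", "delta_scan", "call")
_BLOCKED_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, _BLOCKED_WORDS)) + r")\b", re.IGNORECASE)
_URL_SCHEME_RE = re.compile(r"\b(?:https?|s3|gcs)://", re.IGNORECASE)


def check_query(sql: str) -> None:
    """Raise ValueError if the SQL mentions a blocked function, keyword, or URL scheme."""
    match = _BLOCKED_RE.search(sql) or _URL_SCHEME_RE.search(sql)
    if match:
        raise ValueError(f"blocked SQL construct: {match.group(0)!r}")
```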
ZdenekSrotyr	ee7d5630ef	fix: keep external_access enabled — views need read_parquet on local files File access attacks blocked by SQL blocklist instead of DuckDB pragma (pragma also blocks legitimate view resolution via read_parquet).	2026-04-08 12:33:05 +02:00
ZdenekSrotyr	f2f9a62803	fix: set enable_external_access=false AFTER ATTACHing extracts	2026-04-08 12:29:27 +02:00
ZdenekSrotyr	6efdf4ca64	fix: read-only analytics DB ATTACHes extract.duckdb files for view resolution	2026-04-08 12:27:12 +02:00
ZdenekSrotyr	05a1b452e9	security: harden query (read-only DB), uploads (path sanitization), scripts (AST validation)	2026-04-08 12:09:19 +02:00
ZdenekSrotyr	2d6a94fb6f	fix: DuckDB concurrency — WAL mode, subprocess sync, temp+rename Three-pronged fix for DuckDB lock conflicts: 1. WAL mode on system.duckdb — enables concurrent readers + writer 2. Sync trigger runs extractor as subprocess (not background task) — separate process = separate DuckDB connections, no lock conflict 3. Both extractor and orchestrator write to .tmp then atomic rename — avoids lock conflict with API reads on extract.duckdb/analytics.duckdb Fixes #9 permanently.	2026-03-31 13:19:57 +02:00
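Item 3 (write to `.tmp`, then rename) can be sketched with `os.replace`, which atomically replaces the target when both paths are on the same filesystem. `write_atomically` and its `build` callback are hypothetical names; in the extractor, `build` would create the DuckDB file and close its connection before the swap.

```python
import os
from pathlib import Path
from typing import Callable


def write_atomically(target: Path, build: Callable[[Path], None]) -> None:
    """Build the new file at <target>.tmp, then swap it into place in one step."""
    tmp = target.with_name(target.name + ".tmp")
    tmp.unlink(missing_ok=True)  # discard leftovers from a crashed run
    try:
        # build() must finish writing and close the file before the swap.
        build(tmp)
        # Atomic on the same filesystem: readers see the old file or the new one,
        # never a half-written file.
        os.replace(tmp, target)
    except BaseException:
        tmp.unlink(missing_ok=True)
        raise
```

On POSIX systems a reader that already has the old file open keeps reading it, while new opens see the complete new file.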
ZdenekSrotyr	675a29c1c7	fix: DuckDB connection pool — shared connection avoids lock conflicts Fixes #9 — background sync tasks could not access system.duckdb because FastAPI held an exclusive lock. Now uses a single shared connection per DATA_DIR, with cursor() for thread safety.	2026-03-31 13:01:04 +02:00
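The "single shared connection per DATA_DIR" idea can be sketched as a lock-protected cache. `get_shared_connection` and the injected `connect` factory are hypothetical. In the project the factory would open `system.duckdb`, and each thread or request would call `.cursor()` on the shared connection, since DuckDB's Python `cursor()` returns a duplicate connection that can be used from another thread.

```python
import os
import threading
from typing import Any, Callable, Dict

_lock = threading.Lock()
_connections: Dict[str, Any] = {}


def get_shared_connection(data_dir: str, connect: Callable[[str], Any]) -> Any:
    """Return the single connection for data_dir, opening it on first use."""
    key = os.path.abspath(data_dir)  # normalize so equivalent paths share one entry
    with _lock:  # two threads must not both open the same database file
        conn = _connections.get(key)
        if conn is None:
            conn = connect(key)
            _connections[key] = conn
        return conn
```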
ZdenekSrotyr	2e7d5d1fe9	feat: access request UI — catalog badges, request modal, admin approval page Backend: - access_requests table in DuckDB schema - AccessRequestRepository with create/approve/deny/list - API: POST/GET /api/access-requests (submit, my requests, pending, approve, deny) UI: - Catalog: lock icon on private tables, "Request Access" button + modal - Catalog: "Pending" badge for tables with pending requests - Admin permissions page (/admin/permissions): approve/deny requests, grant/revoke permissions, view all user permissions - Cross-navigation between admin/tables and admin/permissions 733 tests passing.	2026-03-31 12:45:29 +02:00
ZdenekSrotyr	1074d5ec49	feat: implement data access control — table-level permissions Schema v3: add is_public column to table_registry (default true). src/rbac.py: can_access_table() checks admin bypass, public flag, explicit permissions, wildcard bucket permissions. API enforcement: - manifest: filters tables by user access - download: 403 if no access - catalog: filters table list - query: validates referenced tables against allowed list New admin permissions API (/api/admin/permissions) for grant/revoke. 28 access control tests + 733 total tests passing.	2026-03-31 12:33:31 +02:00
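The check order described for `can_access_table()` can be sketched as below. The `User` and `Table` dataclasses and the `bucket.table` / `bucket.*` grant strings are hypothetical stand-ins for the real permission rows. The order (admin bypass, public flag, explicit grant, wildcard bucket grant) follows the commit message.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class User:
    email: str
    is_admin: bool = False
    # Hypothetical grant format: "bucket.table", or "bucket.*" for a whole bucket.
    grants: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Table:
    bucket: str
    name: str
    is_public: bool = True  # schema v3 default


def can_access_table(user: User, table: Table) -> bool:
    if user.is_admin:  # admin bypass
        return True
    if table.is_public:
        return True
    if f"{table.bucket}.{table.name}" in user.grants:  # explicit table grant
        return True
    return f"{table.bucket}.*" in user.grants  # wildcard bucket grant
```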
ZdenekSrotyr	18e5f0b6e8	feat: implement extract.duckdb contract — orchestrator + extractors Phase 0: extend table_registry schema (v1→v2 migration), add source_type/bucket/source_table/query_mode columns. Phase 1: SyncOrchestrator ATTACHes extract.duckdb files into master analytics.duckdb. Keboola extractor uses DuckDB extension with legacy client fallback. BigQuery extractor is remote-only via DuckDB BQ extension (no data download). 62 tests passing.	2026-03-30 20:12:56 +02:00


52 commits