Two Devin-flagged regressions on the squashed PR #106 head:
1) PATCH /api/admin/groups/{id} blanket-rejected on system groups.
The repository guard at src/repositories/user_groups.py was already
narrowed to "rename only" by 7147bac (PR #110 follow-up), but the
endpoint at app/api/access.py:331-343 still short-circuited with
409 "System groups are immutable" for any mutation. A description-only
payload like {"description": "..."} returned 409 instead of 200 even
though the repo would have accepted it. CHANGELOG entry promised the
fix but the code didn't match.
Endpoint now mirrors the repo contract: 409 only when payload.name
is set AND differs from existing name. Same-name no-op renames are
dropped before the repo call. Description-only updates flow through.
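
A minimal sketch of the narrowed endpoint guard from (1). The names
GroupUpdate and plan_group_patch and the (status, fields) return shape are
hypothetical, not the code in app/api/access.py; only the 409-on-rename rule,
the dropped no-op rename, and the description pass-through come from the
description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroupUpdate:
    """Hypothetical PATCH payload; None means 'field not supplied'."""
    name: Optional[str] = None
    description: Optional[str] = None


def plan_group_patch(existing_name: str, is_system: bool, payload: GroupUpdate):
    """Return (status_code, fields_to_forward_to_repo)."""
    fields = {}
    if payload.name is not None and payload.name != existing_name:
        if is_system:
            # Only a real rename of a system group is rejected.
            return 409, {}
        fields["name"] = payload.name
    # A same-name rename falls through here and is never forwarded.
    if payload.description is not None:
        fields["description"] = payload.description
    return 200, fields
```

With this shape, a description-only payload on a system group yields 200 and
forwards only the description.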
2) Google OAuth callback wiped google_sync memberships on transient
API failure.
fetch_user_groups is fail-soft and returns [] for both "user has no
groups" and "Cloud Identity API error". The callback fed that empty
list into replace_google_sync_groups, which DELETEs all rows with
source='google_sync' for the user then INSERTs zero — silently
wiping every Workspace-synced membership on a hiccup.
Callback now skips replace_google_sync_groups when group_names is
empty and logs "preserving existing memberships". Trade-off: a user
whose Workspace groups were genuinely cleared keeps stale memberships
until the next non-empty sync. Admin-added rows (source='admin') were
already protected by source-scope and are unaffected. The previous
guard against this exact regression was test_callback_empty_groups_
does_not_overwrite_existing in tests/test_auth_providers.py — that
test class has been skipped since v12 (asserts users.groups JSON,
needs rewrite for user_group_members).
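
A minimal sketch of the callback-side skip from (2). The wrapper name
sync_google_groups is hypothetical, and replace_google_sync_groups is passed in
as a callable with an assumed (user_id, group_names) signature so the example
stays self-contained.

```python
import logging

logger = logging.getLogger("oauth_callback_sketch")


def sync_google_groups(user_id, group_names, replace_google_sync_groups):
    """Apply fetched groups unless the fetch came back empty.

    Returns True when the replace ran, False when it was skipped.
    """
    if not group_names:
        # [] can mean "no groups" or "API error"; the two cannot be told
        # apart here, so existing google_sync rows are left untouched.
        logger.warning(
            "empty group list for %s; preserving existing memberships", user_id
        )
        return False
    replace_google_sync_groups(user_id, group_names)
    return True
```

The stale-membership trade-off described above is visible here: a genuinely
empty result takes the same early-return branch as a failed fetch.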
Changelog
All notable changes to Agnes AI Data Analyst.
Format: Keep a Changelog. Versions follow Semantic Versioning, pre-1.0 — public surface (CLI flags, REST endpoints, instance.yaml schema, extract.duckdb contract) may shift between minor versions; breaking changes called out under Changed or Removed with the BREAKING marker.
CalVer image tags (stable-YYYY.MM.N, dev-YYYY.MM.N) are produced for every CI build; semver tags (v0.X.Y) are cut at release boundaries and reference the same commit as a stable-* tag from the same day.
[Unreleased]
[0.12.0] — 2026-04-28
Changed
- /admin/access resource tree now visually separates the three-level hierarchy (resource type → block/bucket → item). Each resource-type section gets a colored left stripe and a faint tinted banner; sections are separated by an 8px neutral gap. Stripe colors cycle 4-wide via nth-child so adding new resource types to app/resource_types.py works without touching CSS. The first-position color is the project primary blue (#0073D1), avoiding the violet (#6366f1) reserved for granted items.
Added
- ResourceType.TABLE: admins can grant table-level access per user_group via the /admin/access page. Tables registered in table_registry are listed grouped by bucket, with the existing per-block "Grant all" / "Revoke all" bulk actions. Listing and grant storage only; runtime enforcement still flows through legacy dataset_permissions, and the migration plan lives in docs/TODO-rbac-data-enforcement.md.
- AGNES_ENABLE_TABLE_GRANTS env var (default off) gates the half-built ResourceType.TABLE chip. While disabled the chip is hidden from /admin/access and POST /api/admin/grants returns 422 with the env-var name in detail on a TABLE grant attempt. Existing TABLE rows in resource_grants stay listable and deletable: the flag controls UI exposure and new-grant acceptance only, never blocks cleanup.
- da admin break-glass <user> CLI: recovery path when the operator has locked themselves out of /admin/access. Adds the user to the Admin user_group with source='system_seed' regardless of RBAC state. Bypasses authentication; relies on filesystem access to ${DATA_DIR}/state/system.duckdb implying host-level trust. Document this in deployment runbooks alongside SEED_ADMIN_EMAIL.
Internal
- scripts/seed_dummy_tables.py: populates table_registry with 12 dummy tables across 3 buckets (in.c-finance, in.c-marketing, in.c-product), each with is_public=False, for exercising the new /admin/access Tables section without a configured data source.
- /marketplace.zip short-circuits to 304 before any file IO or ZIP compression on a matching If-None-Match. Hot path on every Claude Code SessionStart hook. Backed by an in-process cachetools.TTLCache over the resolved-plugins → ETag map (default 120s, env-tunable via AGNES_MARKETPLACE_ETAG_TTL, set 0 to disable). invalidate_etag_cache() is called by marketplace sync after refresh so the next request re-hashes against new on-disk content instead of waiting for TTL expiry. New explicit dependency: cachetools>=5.3.0.
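
A minimal sketch of the /marketplace.zip short-circuit, assuming a cache keyed by the resolved plugin set. To stay dependency-free it uses a small stdlib stand-in instead of cachetools.TTLCache; EtagCache, respond, and their parameters are hypothetical names, not the shipped implementation.

```python
import time


class EtagCache:
    """Tiny TTL map from a plugin-set key to its ETag (stand-in for TTLCache)."""

    def __init__(self, ttl_seconds=120.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (etag, expires_at)

    def get(self, key):
        if self.ttl <= 0:
            return None  # a TTL of 0 disables caching
        hit = self._entries.get(key)
        if hit is None or hit[1] <= self.clock():
            self._entries.pop(key, None)
            return None
        return hit[0]

    def put(self, key, etag):
        if self.ttl > 0:
            self._entries[key] = (etag, self.clock() + self.ttl)

    def invalidate(self):
        # Called after a marketplace sync so the next request re-hashes.
        self._entries.clear()


def respond(cache, key, if_none_match, compute_etag, build_zip):
    """Return (status, body); on a cache hit a 304 needs no hashing or zipping."""
    etag = cache.get(key)
    if etag is None:
        etag = compute_etag()
        cache.put(key, etag)
    if if_none_match == etag:
        return 304, b""
    return 200, build_zip()
```

The injectable clock is only there to make expiry testable; a real handler would also set the ETag response header, which this sketch omits.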
Fixed
- /admin/access group sidebar grant-count badges no longer revert to a stale value when switching between groups. The badge was reading state.groups[i].grant_count, a snapshot populated once at /access-overview load; toggling a grant only updated the DOM (via refreshCounts), not that field, so the next renderGroups call (triggered by selectGroup) would clobber the live count with the original snapshot. renderGroups now derives the count live from state.grants, the array that toggleGrant/bulkSet keep in sync. Server data was always correct; only the in-page badge drifted until refresh.
- /catalog, /admin/tables, and /admin/permissions pages now render the shared top header correctly. The pages include _app_header.html (which uses .app-* CSS classes) but were not linking style-custom.css where those classes are defined; only dashboard.html and base.html did. Without the stylesheet the nav links, dropdowns, and user menu rendered as unstyled inline text. Added the missing <link> to all three templates.
- PATCH /api/admin/groups/{id} on a system group now correctly accepts description-only updates while still rejecting renames. The endpoint guard previously short-circuited with 409 "System groups are immutable" for any mutation, which contradicted the repository layer's narrowed contract (rename-only rejection): a description-only payload like {"description": "..."} would hit the endpoint short-circuit and never reach the repo. The endpoint now 409s only when payload.name differs from the existing name; a no-op rename (same name in payload) is dropped from the update before reaching the repo.
- Google OAuth callback no longer wipes a user's google_sync group memberships on a transient Workspace API failure. fetch_user_groups is fail-soft and returns [] for both "no groups" and "API error"; the callback used to feed that empty list into replace_google_sync_groups, which deletes all source='google_sync' rows for the user and then inserts zero. A login during a transient Cloud Identity hiccup would silently drop every Workspace-synced membership the user had built up. Admin-added memberships (source='admin') were already protected. The callback now skips replace_google_sync_groups when the fetch returns empty and logs "preserving existing memberships" instead. Trade-off: a user whose Workspace groups were genuinely cleared keeps stale memberships until the next non-empty sync, accepted until fetch_user_groups learns to distinguish empty-success from empty-failure.
- docker-compose.host-mount.yml now uses o: bind,rbind instead of o: bind for the data volume. With a plain bind, sub-mounts under /data on the host (e.g. the dual-disk layout where sdc is mounted on /data/state) are silently shadowed inside the container by an empty subdirectory on the parent disk. The container then writes system.duckdb and other state to the wrong disk; the dedicated state disk receives no writes and accumulates only the snapshot left by the migration script. Recursive bind propagates existing sub-mounts at container start, so the container sees the same filesystem the host does. Operators on dual-disk VMs need to copy the live DB from /var/lib/docker/volumes/agnes_data/_data/state/ (sdb's empty subdir) onto /data/state/ (sdc) before redeploying with the fix, or the next start will surface the stale snapshot.
Changed
- BREAKING Marketplace endpoint (/marketplace.zip, /marketplace.git/*) no longer god-modes for Admin members. src.marketplace_filter.resolve_allowed_plugins now filters every caller, admins included, through resource_grants. Admins curate their own marketplace view by granting plugins to the Admin group (or any group they belong to). Existing installs where the only membership on Admin is the admin themselves will see an empty marketplace until grants are added in /admin/access. App-level authorization (require_admin, can_access for non-marketplace types) is unaffected: Admin is still god mode there.
- BREAKING RBAC redesigned around two layers: app-level access via the Admin user-group (god mode short-circuit) and resource-level access via a generic (group, resource_type, resource_id) grant model. The four-value core.viewer/analyst/km_admin/admin hierarchy with implies BFS expansion is gone; every protected endpoint now uses either require_admin or require_resource_access(ResourceType.X, "{path}") from the new app.auth.access module. Authorization is decided per-request via a single DB lookup; no session cache, no dual-path resolver, no _hydrate_legacy_role shim. See docs/RBAC.md.
- BREAKING internal_roles, group_mappings, user_role_grants, and plugin_access tables removed. Replaced by user_group_members (binds users to user_groups with a source enum: admin/google_sync/system_seed) and resource_grants (group → (resource_type, resource_id)). Schema v13; the migration backfills from v12 atomically: users.groups JSON is converted into user_group_members rows with source='google_sync', core.admin grants become Admin-group memberships with source='system_seed', and plugin_access rows become resource_grants of type marketplace_plugin. The users.groups JSON column is dropped; the deprecated users.role column is kept NULL as a legacy artifact.
- BREAKING Schema v14: user_group_members and resource_grants now declare DuckDB foreign-key constraints on group_id (referencing user_groups.id). Cascade deletes can no longer leave orphaned member / grant rows pointing at a deleted group. Migration is RENAME → CREATE-with-FK → INSERT → DROP, wrapped in BEGIN TRANSACTION so a partial failure rolls back without leaving the DB at a half-applied schema. Forks that touched these tables outside the documented repository APIs need to verify the FK direction matches their writes.
- BREAKING Admin REST surface unified under /api/admin/groups, /api/admin/groups/{id}/members, /api/admin/grants, /api/admin/resource-types. app.api.role_management and app.api.plugin_access removed. The web UI routes /admin/role-mapping and /admin/plugin-access are replaced by a single /admin/access page; the _app_header.html link is renamed to "Access".
- BREAKING CLI subcommands da admin role *, da admin mapping *, da admin grant-role, da admin revoke-role, da admin effective-roles removed. New subcommands: da admin group {list,create,delete,members,add-member,remove-member} and da admin grant {list,create,delete,resource-types}. da admin set-role <user> admin still works as a thin wrapper that toggles Admin-group membership.
- Module authors no longer call register_internal_role(...). Resource types are an app.resource_types.ResourceType StrEnum paired with a ResourceTypeSpec registered in RESOURCE_TYPES; adding a new resource type means adding one enum member, one list_blocks(conn) projection delegate, and one spec entry, all in app/resource_types.py. The registry drives both /api/admin/resource-types and /api/admin/access-overview, so there's no second wiring step. No DB migration, no startup hook.
- Google OAuth callback writes Cloud Identity group memberships into user_group_members (source='google_sync') instead of users.groups JSON. Manual admin-added memberships (source='admin') survive subsequent logins.
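
A minimal in-memory sketch of the two-layer model described in the entries above: Admin membership as an app-level short-circuit, then explicit (group, resource_type, resource_id) grants. The function name can_access_sketch, the set-based inputs, and the admin_god_mode flag are hypothetical; the real check lives in app.auth.access and reads from the database.

```python
ADMIN_GROUP = "Admin"


def can_access_sketch(user_groups, grants, resource_type, resource_id,
                      admin_god_mode=True):
    """Two-layer check: Admin membership first, then explicit grants.

    user_groups: set of group names the user belongs to.
    grants: set of (group, resource_type, resource_id) tuples.
    admin_god_mode: False models the marketplace filter, where admins
    are filtered through grants like every other caller.
    """
    if admin_god_mode and ADMIN_GROUP in user_groups:
        return True
    return any((g, resource_type, resource_id) in grants for g in user_groups)
```

Passing admin_god_mode=False reproduces the "empty marketplace until grants are added" behaviour noted above for an Admin with no plugin grants.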
Removed
- app/auth/role_resolver.py, app/api/role_management.py, app/api/plugin_access.py.
- src/repositories/internal_roles.py, src/repositories/group_mappings.py, src/repositories/user_role_grants.py.
- app/web/templates/admin_role_mapping.html, app/web/templates/admin_plugin_access.html.
- Role enum + has_role, is_admin, is_km_admin, is_analyst, _is_admin_user_dict, set_user_role, get_user_role from src/rbac.py. Dataset-access helpers (can_access_table, get_accessible_tables, has_dataset_access) preserved.
- Test files: test_role_resolver.py, test_api_role_management.py, test_admin_role_mapping_ui.py, test_cli_admin_role.py, test_schema_v9_migration.py, test_plugin_access_api.py.
Internal
- src/db.py schema bumped to v13. New helpers _seed_system_groups (idempotent Admin/Everyone seed, runs on every connect) and _v12_to_v13_finalize (one-shot backfill + DROP cascade) replace _seed_core_roles and _backfill_users_role_to_grants.
- app.auth.access is the new authorization vocabulary: _user_group_ids, is_user_admin, can_access, require_admin, require_resource_access. Lives in its own module to avoid the circular import that would happen if it sat in app.auth.dependencies (the dependency factory needs get_current_user from there).
- New tests/helpers/auth.py::grant_admin(conn, user_id): adds a user to the Admin system group so require_admin resolves to True. Updated test fixtures across test_admin_tokens_ui, test_password_flows, test_pat, test_api, test_api_complete, test_api_scripts, test_web_ui to call it after UserRepository.create(role="admin"). The legacy users.role column alone is no longer the admin marker.
- Skipped at module level (rewrite required for v13): test_admin_user_capabilities_ui (asserts the gone v9 capabilities UI), test_marketplace_server_zip and test_marketplace_server_git (depend on the removed PluginAccessRepository).
- Skipped individually as v13 behavior changes: TestScriptRBAC in test_security (scripts are now any-signed-in-user, not analyst+), profile-page tests in test_web_ui that asserted core.analyst / Direct grants / Effective roles markers from the dropped role hierarchy.
Added
- Schema v10 introduces view_ownership to detect cross-connector view-name collisions in the master analytics DB (issue #81 Group C). When two connectors register the same _meta.table_name, the orchestrator now refuses to silently overwrite the prior owner's view: it logs a view_ownership collision ERROR identifying both sources and the colliding name, and the second source's view is NOT created. Previously this was last-write-wins, which depended on directory iteration order and could change deployment-to-deployment. Operators resolve a collision by renaming name in table_registry on one side (registry-side aliasing: source_table stays unchanged, only the view name changes). The orchestrator pre-scans every connector's _meta at the start of each rebuild and releases stale ownerships immediately (when ALL pre-scans succeed; if any fail, reconcile is skipped to avoid silently stealing a transient-IO source's name), so a renamed table frees its name in the SAME rebuild that introduces the rename, with no two-step wait needed. New module src/repositories/view_ownership.py exposes the repository.
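
A minimal sketch of the ownership rules in the entry above, using plain dicts in place of the view_ownership table. The function name reconcile_ownership and its inputs are hypothetical; only the three rules (prior owner wins, stale names are released in the same rebuild, release is skipped when any pre-scan failed) come from the entry.

```python
def reconcile_ownership(existing, scanned, all_scans_ok=True):
    """Return (owners, collisions) after one rebuild's pre-scan.

    existing: {view_name: source} carried over from the previous rebuild.
    scanned: {source: [view_name, ...]} from this rebuild's pre-scan.
    collisions: list of (view_name, owner, rejected_source).
    """
    owners = dict(existing)
    if all_scans_ok:
        # Release names their owner no longer registers.
        live = {(src, name) for src, names in scanned.items() for name in names}
        owners = {n: s for n, s in owners.items() if (s, n) in live}
    collisions = []
    for src, names in scanned.items():
        for name in names:
            owner = owners.setdefault(name, src)
            if owner != src:
                # The prior owner keeps the view; the newcomer is refused.
                collisions.append((name, owner, src))
    return owners, collisions
```

Because release happens before claiming, a source that renames its table frees the old name for another source within the same call.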
Changed
- BREAKING (ops): Keboola extractor now exits with three distinct codes instead of two (issue #81 Group B / M14): 0 = full success, 1 = full failure, 2 = partial failure (some tables succeeded, some failed). Previously exit(0) fired even when 9 of 10 tables failed, masking partial failures from the sync API and any operator alerting hooked to non-zero exit codes. The sync API (POST /api/sync/trigger) now logs PARTIAL FAILURE (exit 2) as a data-quality alert (distinct from FAILED (exit 1)) and continues to the orchestrator rebuild step; successful tables from this run plus unchanged tables from previous runs stay queryable. Operators whose alerting treated any non-zero exit as a hard error must teach it that exit 2 is a partial-failure signal, not a deploy failure.
- BREAKING (security): The entire Script API is now admin-only (issue #44). GET /api/scripts, POST /api/scripts/deploy, POST /api/scripts/run, and POST /api/scripts/{id}/run all require the admin role; previously the list endpoint was open to any authenticated user and deploy/run were analyst-accessible. Two reasons: (1) the AST + string-blocklist sandbox in _execute_script is defense-in-depth and known to be bypassable through introspection chains (__class__.__base__.__subclasses__(), __globals__['__builtins__'], __mro__ traversal; the dunder pattern list was tightened in this PR but the policy is "the role gate is the trust boundary, not the blocklist"); (2) gating only /run left a planted-script attack open, since an analyst could deploy a malicious script and wait for an admin to run it. Operators who need scripted workflows for non-admin users should run them on the user's behalf or expose the relevant data via the read-only /api/data surface instead. Migration for cron / scheduler PATs: if a non-admin PAT is wired into a scheduler that hits /api/scripts/{id}/run or /api/scripts/run, the request now returns 403. Add the PAT user to the Admin group via /admin/access or da admin group add-member Admin <pat-user-email>. PATs themselves do not need re-issuing because group membership is read at request time.
- BREAKING (ops): Generic ops scripts moved out of the customer-named scripts/grpn/ directory into scripts/ops/ as part of the OSS vendor-neutralization (issue #88):
  - scripts/grpn/agnes-tls-rotate.sh → scripts/ops/agnes-tls-rotate.sh
  - scripts/grpn/agnes-auto-upgrade.sh → scripts/ops/agnes-auto-upgrade.sh

  Downstream consumer infra repos that copy these scripts onto VMs (e.g. via their own startup.sh) must update the source path. The OSS-shipped infra/modules/customer-instance/ Terraform module is unaffected: it embeds equivalent logic inline via heredoc and does not source-by-path from scripts/. Script behaviour and env vars are unchanged. Cross-refs in README.md, CLAUDE.md, docs/DEPLOYMENT.md, Caddyfile, and docker-compose.yml were updated.
- OSS neutralization (wave 2: code, tests, planning docs). Customer identifiers replaced with placeholders across the codebase to ready the repo for public release (issue #88):
  - Code docstrings: connectors/openmetadata/{client,transformer,enricher}.py, src/catalog_export.py, scripts/duckdb_manager.py: prj-grp-… → my-bq-project / prj-example-1234, AIAgent.FoundryAI → AIAgent.MyAgent (in docstrings) / AIAgent.Example (in test fixtures), FoundryAIDataModel → AnalyticsDataModel.
  - Test fixtures in tests/test_openmetadata_enricher.py, tests/test_duckdb_manager.py, tests/test_catalog_export.py, tests/test_openmetadata_transformer.py: same set of replacements, behaviour-preserving (157 tests still green).
  - Terraform module infra/modules/customer-instance/variables.tf: customer_name description rewritten in English, examples switched from keboola, grpn to acme, example.
  - Workflow .github/workflows/keboola-deploy.yml: comment "Groupon-side dev VMs" → generic "per-developer dev VMs".
  - Caddyfile: TLS-rotation cross-ref updated to scripts/ops/… and Keboola-specific aside removed.
  - Auth docs docs/auth-groups.md and the OAuth probe in scripts/debug/probe_google_groups.py: GCP project name kids-ai-data-analysis replaced with placeholder acme-internal-prod.
  - Planning docs under docs/superpowers/plans/ and …/specs/: the five hackathon-era documents (2026-04-21-deployment-log.md, …-multi-customer-deployment.md, …-issues-14-and-10.md, …-hackathon-dry-run.md, the spec) had 34.77.94.14 / 34.77.102.61 replaced with <dev-vm-ip> / <prod-vm-ip>, Groupon / GRPN / grpn with Acme / another-customer, and prj-grp-… with prj-example-….
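
The three-way exit-code contract from the Keboola extractor entry in this list can be sketched as follows. Both function names are hypothetical, and treating a run with zero tables as success is an assumption the entry does not cover.

```python
def extractor_exit_code(succeeded: int, failed: int) -> int:
    """0 = full success, 1 = full failure, 2 = partial failure."""
    if failed == 0:
        return 0  # assumption: an empty run also counts as success
    if succeeded == 0:
        return 1
    return 2


def is_hard_failure(exit_code: int) -> bool:
    """Alerting helper: exit 2 is a data-quality signal, not a hard failure."""
    return exit_code == 1
```

Alerting that previously keyed on "any non-zero exit" would use a check like is_hard_failure for paging and route exit 2 to a lower-severity channel.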
Fixed
- BREAKING (security CRITICAL): Jira webhook handler is now fail-closed (issue #83). Previously, if JIRA_WEBHOOK_SECRET was unset, _verify_signature returned True and any unauthenticated POST to /webhooks/jira could trigger the full ingest pipeline. The handler now returns 503 when the secret is missing (operator-misconfiguration signal, distinct from 401 wrong-signature). Operators relying on the no-secret = accept-everything mode (don't; it was never documented) must set JIRA_WEBHOOK_SECRET before this merges.
- Security (CRITICAL): Jira issue keys arriving via webhooks are now validated against the canonical ^[A-Z][A-Z0-9]{0,31}-[0-9]{1,12}\Z format ([0-9] not \d to refuse non-ASCII Unicode digits, \Z not $ to refuse trailing newlines that $ would tolerate) before any filesystem operation (issue #83). Previously, issue_key flowed unsanitized into connectors/jira/service.py (save_issue, download_attachment, _handle_deletion, process_webhook_event) and connectors/jira/incremental_transform.py, enabling path traversal (../../etc/passwd style writes outside the Jira data dir). New module connectors/jira/validation.py provides is_valid_issue_key (regex whitelist; underscore deliberately excluded because Atlassian rejects underscores in real project keys) and safe_join_under (Path.resolve() containment check). Both are enforced at every filesystem boundary, defense-in-depth.
- Security (CRITICAL): webhookEvent (the second attacker-controlled field in Jira webhook payloads) was used as a filename component in _log_webhook_event without sanitization (issue #83 reviewer follow-up). A payload with webhookEvent: "../../tmp/pwn" could write a JSON dump outside WEBHOOK_LOG_DIR. The handler now strips everything that isn't [A-Za-z0-9_-] (dot deliberately excluded to defeat .. survival), clips length to 64 chars, and routes the final filename through safe_join_under.
- Security (CRITICAL): hardened the connector → orchestrator trust boundary on BOTH the rebuild path (src/orchestrator.py::_attach_remote_extensions) AND the read-only query path (src/db.py::_reattach_remote_extensions, called by get_analytics_db_readonly() on every request); issue #81 Group A. Three fixes: (1) DuckDB extensions referenced by _remote_attach are matched against a hard allowlist (default: keboola, bigquery; override via AGNES_REMOTE_ATTACH_EXTENSIONS). Install path splits built-in (LOAD only) from community (INSTALL FROM community; LOAD on rebuild path; LOAD only on the read-only query path, which must not touch the network). (2) token_env names are matched against a hard allowlist (default: KBC_TOKEN, KBC_STORAGE_TOKEN, KEBOOLA_STORAGE_TOKEN, GOOGLE_APPLICATION_CREDENTIALS; override via AGNES_REMOTE_ATTACH_TOKEN_ENVS). Names must additionally match ^[A-Z][A-Z0-9_]{0,63}$. A malicious connector cannot ask the orchestrator to read JWT_SECRET_KEY / SESSION_SECRET / OPENAI_API_KEY and exfiltrate them via ATTACH ... TOKEN. (3) The URL passed to ATTACH is now single-quote-escaped on both paths. Also fixed a table_schema vs table_catalog mismatch that silently no-op'd _attach_remote_extensions for every connector (the rebuild-path hardening would have been moot in production without this fix). New module src/orchestrator_security.py centralises the policy and exposes log_effective_policy(), called from app startup so an operator's typo in AGNES_REMOTE_ATTACH_EXTENSIONS (which replaces the default rather than extending it; a setting of httpfs would silently lock out keboola, bigquery) is visible at boot rather than at the next failed attach. See docs/superpowers/plans/2026-04-27-issue-81-trust-boundary.md.
- Security (MEDIUM): extractor-side identifier validation (issue #81 Group D / M15). The Keboola and BigQuery extractors interpolate table_name, bucket/dataset, and source_table from table_registry directly into CREATE OR REPLACE VIEW, INSERT INTO _meta, and COPY ... TO SQL. Anyone with write access to table_registry (admin, registry-write API) could inject SQL via these identifiers. New shared module src/identifier_validation.py exposes a strict validate_identifier (for our own view names: ^[a-zA-Z_][a-zA-Z0-9_]{0,63}$, used for table_name so it matches the orchestrator's rebuild-time check and dashed names fail fast at extraction rather than being silently dropped at rebuild) and a relaxed validate_quoted_identifier (for upstream-typed names like Keboola in.c-foo / BigQuery my-dataset: [a-zA-Z0-9_][a-zA-Z0-9_.\-]*, refusing any character that could close a "..." identifier literal). The orchestrator's existing _validate_identifier was lifted into the new module so both layers share a single source of truth; both extractors skip-and-continue on unsafe rows (logged + counted in failure stats; the rest of the registry still processes).
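
A minimal sketch of the two Jira-side checks named above. The regex is quoted from the entry; the function bodies and signatures are assumptions for illustration and may differ from connectors/jira/validation.py.

```python
import re
from pathlib import Path

# Pattern quoted from the changelog entry: ASCII digits only, \Z instead of $.
ISSUE_KEY_RE = re.compile(r"^[A-Z][A-Z0-9]{0,31}-[0-9]{1,12}\Z")


def is_valid_issue_key(key) -> bool:
    """True only for keys shaped like PROJ-123."""
    return isinstance(key, str) and ISSUE_KEY_RE.match(key) is not None


def safe_join_under(base: Path, *parts: str) -> Path:
    """Join parts under base and refuse any result that escapes it."""
    root = base.resolve()
    candidate = root.joinpath(*parts).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes {root}")
    return candidate
```

The two checks are complementary: the regex rejects malformed keys early, and the containment check still holds if a caller forgets to validate.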
Removed
- Customer-specific manual-deploy helper scripts/grpn/Makefile and its README, plus the corresponding hackathon deploy log under docs/superpowers/plans/2026-04-22-grpn-deploy-learnings.md. These documented one operator's hand-rolled stopgap for an org-policy-blocked Terraform flow and do not belong in vendor-neutral OSS.
- scripts/switch-dev-vm.sh: hackathon-era helper hardcoded to a specific shared dev VM. Per-developer dev VMs are the supported pattern now; operators who need an equivalent should use gcloud compute ssh <vm> --command "sed -i …/.env && sudo /usr/local/bin/agnes-auto-upgrade.sh" with their own VM details.
Internal
- Sandbox blocklist now flags introspection-chain dunders explicitly: __subclasses__, __globals__, __class__, __base__, __bases__, __mro__, __dict__, __code__, __builtins__. __init__ and __getattribute__ are intentionally not in the list, because substring match would flag every legitimate def __init__(self):. The chain breaks at the next link anyway.
- New regression test test_run_pwn_payload_blocked parametrized over the exact PoC from issue #44 plus two equivalent variants (lambda + __globals__, __mro__ traversal). If the dunder list is silently weakened in a future refactor, the test fails. New test_*_requires_admin tests parametrized over all three non-admin core roles (analyst, viewer, km_admin).
- tests/conftest.py::seeded_app extended with viewer_token and km_admin_token so role-gating tests cover all four core roles.
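
A minimal sketch of the substring blocklist described in the first entry above. The dunder list is quoted from that entry; the function name find_blocked_dunders is hypothetical and the real check in _execute_script may combine this with AST inspection.

```python
BLOCKED_DUNDERS = (
    "__subclasses__", "__globals__", "__class__", "__base__", "__bases__",
    "__mro__", "__dict__", "__code__", "__builtins__",
)


def find_blocked_dunders(source: str):
    """Return the blocked dunders that occur anywhere in the script text."""
    return [d for d in BLOCKED_DUNDERS if d in source]
```

Leaving __init__ off the list keeps ordinary class definitions from being flagged, while an introspection chain still trips on its other links.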
Migrated
- Schema bumped from v9 to v10. Auto-migration applies on next start (creates the view_ownership table; data on disk is unaffected). The pre-migration snapshot machinery (added at v8→v9) covers v9→v10 too: if anything goes wrong during the migration, the snapshot at <DATA_DIR>/state/system.duckdb.pre-migrate lets you roll back.
[0.11.5] — 2026-04-27
Follow-up release for PR #73: addresses four rounds of Devin AI review on the role-management-complete branch. No new public-API surface; the user-visible payoff is that v8→v9-migrated installations now work end-to-end (login flows, user list, admin nav, privilege revocation), and make local-dev startup is finally quiet.
Fixed
- Privilege retention after grant revocation via the new REST API (Devin review #73). _hydrate_legacy_role previously short-circuited on a truthy user.get("role"). The role-management endpoints (POST/DELETE /api/admin/users/{id}/role-grants, plus the changeCoreRole UI flow) only mutate user_role_grants; they don't touch the legacy users.role column. After a downgrade-via-API, the stale legacy value would keep user["role"] = "admin" in memory; _is_admin_user_dict and the catalog/sync admin-bypass short-circuits then silently retained elevated table access even though require_internal_role correctly denied the API gates. Fix: always re-resolve from user_role_grants regardless of the legacy column, making the grants table the single source of truth on every authenticated request. Cost: one DB round-trip per request (same as the existing PAT-aware fallback).
- Dev-bypass + OAuth callback dropped direct grants from the session cache (Devin review #73). Both call sites passed external_groups only to resolve_internal_roles, never the user's id, so user_role_grants rows were resolved on the per-request DB-fallback path inside require_internal_role instead of the cache. Functionally correct, but every admin-gated request paid a DB round-trip and the dev-bypass log line read "resolved 0 internal role(s)" for an obviously-admin user, which was confusing during debugging. Fix: pass user_id so the cache reflects the union at sign-in.
- GET /api/users returned HTTP 500 for any v8→v9-migrated installation. The migration NULL-s legacy users.role (kept as a deprecated artifact because DuckDB FK blocks DROP COLUMN), but UserResponse.role is a required str Pydantic field, so every user listing failed validation. /admin/users showed only "Failed to load users" and the new /admin/users/{id} Detail link was unreachable. Fix: route every user dict returned by the API through _hydrate_legacy_role (same shim already used by get_current_user), which derives the legacy enum value from user_role_grants for migrated users. Also fixes a quieter dual of the same bug: target["role"] == "admin" short-circuits in update_user/delete_user would silently no-op on migrated admins, letting the operator demote/delete the last admin against the documented protection.
- Scheduler log-noise: every cron tick produced a POST /auth/token 401 Unauthorized access-log line because the scheduler's auto-fetch fallback was always broken (it called /auth/token with just an email, but the endpoint requires email + password). Fix: removed the auto-fetch path entirely. Operators set SCHEDULER_API_TOKEN (a long-lived PAT) in production; in LOCAL_DEV_MODE the dev-bypass auto-authenticates the un-tokenized request, so jobs continue to work.
- HTTP 500 on POST /auth/token for v8-migrated users (Devin review #73 round 3). TokenResponse.role is a required str Pydantic field, but the v8→v9 migration NULL-s the legacy users.role column for every existing user. The login endpoint passed the raw NULL through to Pydantic, raising ValidationError → 500. Same root cause produced semantically wrong (but non-crashing) JWTs from Google OAuth, password, and email-magic-link flows: they wrote role: null into the issued token; downstream _hydrate_legacy_role in get_current_user would correct the per-request view, but the token payload itself stayed misleading. Fix: hydrate inline in each login flow before reading user["role"], in app/auth/router.py (POST /auth/token), app/auth/providers/google.py (OAuth callback), app/auth/providers/password.py (5 flows: JSON login, web login, JSON setup, web reset, web setup), and app/auth/providers/email.py (centralized in _consume_token, covers both magic-link /verify endpoints). New regression class TestAuthLoginFlowsPostMigration in tests/test_schema_v9_migration.py pins both the no-crash and the correct-role contracts for all four legacy levels (viewer/analyst/km_admin/admin).
- docs/RBAC.md documented an implies=[…] keyword on register_internal_role() that the function doesn't accept (Devin review #73 round 3). A module author copying the example would hit TypeError: got an unexpected keyword argument 'implies' at import time. Reality: implies is currently seeded only for the core.* hierarchy via _seed_core_roles in src/db.py; the registry-side write path doesn't exist yet. Rewrote the Implies hierarchy and Module-author workflow sections to document what's actually supported in 0.11.4 and what a future change would need to add.
- _seed_core_roles was advertised as a per-connect safety net but only ran during fresh installs and the v8→v9 migration (Devin review #73 round 4). The docstring promised "called from _ensure_schema on every connect" so an accidental DELETE FROM internal_roles WHERE key = 'core.admin' (or a doc-tweak release that updated _CORE_ROLES_SEED without bumping the schema version) would self-heal on the next process start. In reality both call sites lived inside if current < SCHEMA_VERSION:, so once the DB was on v9 the seed function never ran again, leaving any deletion permanent and any in-code display_name/description/implies change requiring a manual SQL deploy. Fix: added an unconditional tail call to _seed_core_roles(conn) at the bottom of _ensure_schema, gated only by current <= SCHEMA_VERSION so the future-version-rollback contract still holds. New regression class TestSeedCoreRolesSafetyNet in tests/test_schema_v9_migration.py pins all three contracts (deleted row re-seeds, mutated display_name re-syncs from code, applied_at doesn't churn on already-current DBs).
- make local-dev startup spammed an AuthlibDeprecationWarning from upstream's own _joserfc_helpers.py every time app/auth/providers/google.py triggered the from authlib.integrations.starlette_client import OAuth import chain. The warning is upstream-internal (authlib telling itself to migrate from authlib.jose to joserfc before its 2.0 cut) and isn't actionable on our side until either authlib ships the fix or we rewrite OAuth on top of joserfc directly. Filtered the specific warning class at the top of app/main.py (with a message-based fallback if the class moves in a future authlib release) so the warning no longer pollutes operator-facing stdout. Other DeprecationWarnings remain visible.
Added
- /profile now self-services every user's role situation. Three new sections rendered server-side for all signed-in users (not just admins): Effective roles (the full resolver output as chip cloud: direct grants ∪ group-derived ∪ implies-expanded), Direct grants (rows in user_role_grants with source label: auto-seed from v8 backfill vs. direct admin grant), and Roles via groups (which Cloud Identity / dev group grants which role for the current user). Non-admins finally see why a particular feature is or isn't accessible without asking an admin to read the DB. Admins additionally see a deep-link to /admin/users/{id} for editing their own grants in place.
- /admin/role-mapping group ID picker. A new "Known groups" panel above the create-mapping form surfaces clickable chips of group IDs known to the system: the calling admin's own session.google_groups (with human-readable names + a "your group" tag) merged with distinct external_group_ids already used in existing mappings (tagged "already mapped"). Click a chip → fills the form's external-group-id input and focuses the role select. Empty-state copy points the operator at LOCAL_DEV_GROUPS / Google sign-in when the picker is empty, instead of leaving them to guess Cloud Identity opaque IDs from memory.
Changed
- Renamed docs/internal-roles.md → docs/RBAC.md. Standard industry term, more discoverable for engineers grepping for "RBAC" in a new repo. Added Quickstart-by-role sections (operator / end-user / module author) and a step-by-step Module-author workflow with code examples for registering a key, gating endpoints, declaring implies hierarchies, and writing a contract test against the gate. Cross-references in code (app/api/admin.py, tests/test_role_resolver.py) updated. CLAUDE.md now points contributors at the new doc from the Extensibility → RBAC section. Historical CHANGELOG entries ([0.11.3] / [0.11.4] body) keep the original internal-roles.md filename; they describe what shipped at that version and aren't retro-edited.
[0.11.4] — 2026-04-27
Role-management complete release. Unifies the legacy users.role enum (viewer/analyst/km_admin/admin) with the v8 internal-roles foundation under one model with an implies hierarchy, ships admin UI + REST API + CLI for managing both group mappings and direct user grants, and wires require_internal_role for PAT-aware resolution so admin endpoints work uniformly across OAuth and headless callers.
Added
- Schema v9: unified role model. New user_role_grants(user_id, internal_role_id, granted_by, source) table for direct user→role assignments (complementary to group_mappings which assigns via Cloud Identity group). Two new columns on internal_roles: implies (JSON array of role keys this role transitively grants) and is_core (BOOL, distinguishes seeded core.* hierarchy from module-registered roles). Migration v8→v9 seeds four core.* rows (core.viewer/analyst/km_admin/admin) with the legacy hierarchy as implies (core.admin → core.km_admin → core.analyst → core.viewer), backfills one user_role_grants row per existing user mirroring their pre-v9 users.role value (source='auto-seed'), and NULLs the legacy column.
- PAT-aware require_internal_role. Two-path resolution: session cache first (OAuth flow), DB-backed user_role_grants fallback (PAT/headless flow). Admin CLI scripts now hit gated endpoints uniformly without an OAuth round-trip. The PAT-specific 403 message from 0.11.3 is removed; PAT now legitimately resolves through direct grants.
- Implies expansion at resolve time. New expand_implies(role_keys, conn) helper in app.auth.role_resolver does BFS over the implies graph; resolve_internal_roles calls it at the end so a single core.admin grant expands to the full four-level hierarchy automatically.
- Dotted role-key namespace. Regex extended to allow core.admin, context_engineering.admin, corporate_memory.curator style keys (max 64 chars, lower-snake-case segments separated by dots). The owner_module column should match the prefix before the first dot.
- REST API for role management. New router app/api/role_management.py under /api/admin: GET/POST/DELETE on group-mappings, users/{id}/role-grants, plus GET internal-roles and GET users/{id}/effective-roles (debug). All gated by require_internal_role("core.admin"), which works for both OAuth admins (cookie) and admin PATs.
- Admin UI /admin/role-mapping. Browse internal roles, manage Cloud Identity group → role mappings (table view + create/delete forms). User detail page extended with three sections: Core role (single-select for core.*), Additional capabilities (multi-checkbox for module roles), Effective roles (debug view of direct + group-derived + expanded set).
- da admin CLI subcommands. role list, role show <key>, mapping list/create/delete, grant-role <email> <key>, revoke-role <email> <key>, effective-roles <email>. All run over PAT; use them in CI scripts to grant/revoke roles without going through the browser.
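
The implies expansion mentioned in this list is a breadth-first walk over the role graph. The sketch below illustrates the idea only: the real expand_implies(role_keys, conn) reads the graph from the database, whereas this hypothetical expand_implies_sketch takes an in-memory dict, and the IMPLIES mapping is rebuilt by hand from the hierarchy quoted above.

```python
from collections import deque

# Hand-written copy of the core.* hierarchy described in the changelog.
IMPLIES = {
    "core.admin": ["core.km_admin"],
    "core.km_admin": ["core.analyst"],
    "core.analyst": ["core.viewer"],
    "core.viewer": [],
}


def expand_implies_sketch(role_keys, implies_graph):
    """Return role_keys plus every role reachable through implies edges."""
    seen = set(role_keys)
    queue = deque(role_keys)
    while queue:
        key = queue.popleft()
        for implied in implies_graph.get(key, []):
            # The seen-set also makes the walk safe against accidental cycles.
            if implied not in seen:
                seen.add(implied)
                queue.append(implied)
    return seen
```

With this graph, a single core.admin grant expands to all four core levels, while core.analyst expands only to itself and core.viewer.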
Changed
- BREAKING (semantics, not API). users.role column NULL-ed during v8→v9 migration. Reads via UserRepository.get_by_* still return the column but the value is always NULL after upgrade; code reading user["role"] directly in business logic gets None. The legacy Role enum (Role.VIEWER/ANALYST/KM_ADMIN/ADMIN) and convenience helpers (is_admin, has_role, etc. in src/rbac.py) continue to work; they now read from user_role_grants via the resolver. Sweeping user.get("role") == "admin" checks were rewritten to the new helper. The column itself is preserved physically because DuckDB rejects DROP COLUMN while a FK references the table; physical drop is deferred to a future schema-rebuild migration.
- require_role(Role.X) and require_admin are now thin wrappers over require_internal_role(f"core.{role}"). Behavior identical for OAuth users (admin role from group_mappings); PAT users now succeed when they hold a direct core.admin grant.
- UserRepository.create() and update() mirror role changes into user_role_grants automatically (_grant_core_role helper); existing setup code keeps working without changes.
- UserRepository.delete() pre-deletes user_role_grants rows (DuckDB FK doesn't auto-cascade).
- UserRepository.count_admins() reads user_role_grants ⨝ internal_roles WHERE key='core.admin'; the legacy users.role = 'admin' count would always return 0 after backfill.
- app/api/admin.py module-level docstring documents the v9 pattern for module authors who want to add their own capability gates.
- docs/internal-roles.md rewritten to remove the v8 "no UI yet" caveat, document the implies hierarchy, the dual session/DB resolution pathway, and the dotted-namespace key convention.
Removed
- require_internal_role's session-only enforcement (the v8 "This endpoint needs an interactive (OAuth) session — Bearer/PAT tokens do not carry session-resolved roles" error message). PAT clients with a matching user_role_grants row now pass the gate uniformly.
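
The two-path resolution that replaces the session-only check can be pictured as below. This is a simplified sketch under stated assumptions: resolve_role_keys and passes_gate are hypothetical names, a plain dict stands in for the user_role_grants table, implies expansion is left out, and the sketch assumes the DB path is consulted only when no session cache is present.

```python
def resolve_role_keys(session, user_id, grants_by_user):
    """Session cache first (OAuth flow), stored grants as fallback (PAT flow)."""
    if session is not None and "internal_roles" in session:
        return set(session["internal_roles"])
    # No session cache: fall back to the stand-in for user_role_grants.
    return set(grants_by_user.get(user_id, ()))


def passes_gate(required_key, session, user_id, grants_by_user):
    """True when the caller holds the required role key on either path."""
    return required_key in resolve_role_keys(session, user_id, grants_by_user)


# Example data: one PAT user with a direct core.admin grant.
GRANTS = {"u-pat": ["core.admin"]}
```

An OAuth caller passes via session["internal_roles"], a PAT caller passes via the stored grant, and a caller with neither is rejected.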
Internal
- New UserRoleGrantsRepository in src/repositories/user_role_grants.py mirrors the style of GroupMappingsRepository (list/get/create/delete + per-user / per-role indices).
- INFO-level audit log on grant + mapping mutations (action strings: role_mapping.created/deleted, role_grant.created/deleted, resource mapping:<id> / grant:<id>).
- "Last admin protection" on DELETE /api/admin/users/{id}/role-grants/{grant_id}: refuses to delete the final core.admin grant in the system (mirrors existing count_admins protection on user deletion / deactivation).
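
A rough sketch of the last-admin guard follows, assuming an in-memory mapping of grant ID to role key instead of the real repository. LastAdminError and delete_role_grant are hypothetical names chosen for illustration; the actual endpoint presumably returns an HTTP error rather than raising this exception.

```python
class LastAdminError(Exception):
    """Raised when a delete would remove the final core.admin grant."""


def delete_role_grant(grants: dict[int, str], grant_id: int) -> None:
    """Delete a grant unless it is the only remaining core.admin grant."""
    role_key = grants[grant_id]
    if role_key == "core.admin":
        admin_grants = sum(1 for key in grants.values() if key == "core.admin")
        if admin_grants <= 1:
            raise LastAdminError("refusing to delete the last core.admin grant")
    del grants[grant_id]
```

Deleting one of two admin grants succeeds; deleting the remaining one raises and leaves the mapping unchanged.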
0.11.3 — 2026-04-26
Authorization-foundation release — adds the internal-roles layer between Cloud Identity groups and per-module capability checks. Schema v8 migration; no admin UI yet (follow-up).
Added
- Internal roles + group mapping (foundation). Schema v8 adds two tables: internal_roles (app-defined capabilities like context_admin, agent_operator, registered by Agnes modules at import time) and group_mappings (many-to-many bindings of Cloud Identity group IDs to internal role keys, managed by admins). New app.auth.role_resolver module exposes register_internal_role(...) for module authors, sync_registered_roles_to_db(...) (run once at startup, idempotent), resolve_internal_roles(external_groups, conn) (called at sign-in, writes resolved keys into session["internal_roles"]), and a require_internal_role("…") FastAPI dependency factory for permission checks. Resolution runs at sign-in (Google OAuth callback + dev-bypass; populates on first request and whenever external groups change, mirroring the OAuth callback's always-write semantics). No DB hit per request. Refresh requires re-login, same semantics as session.google_groups. No admin UI yet; mapping rows must be created via the repository directly until the management UI ships in a follow-up. PAT/headless clients carry no session and therefore cannot pass require_internal_role gates by design: require_internal_role distinguishes "signed-in but missing role" from "no session at all" and surfaces a PAT-specific 403 detail in the second case so an API consumer hitting the wall sees what to fix. See docs/internal-roles.md → PAT and headless requests.
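
The sign-in resolution step can be illustrated with a small sketch. It assumes group mappings are available as (external_group_id, role_key) pairs and that groups use the {"id", "name"} shape; resolve_roles_from_groups and sign_in are hypothetical helpers, not the real resolve_internal_roles(external_groups, conn).

```python
def resolve_roles_from_groups(external_groups, mappings):
    """Map the user's external group IDs to internal role keys."""
    group_ids = {group["id"] for group in external_groups}
    return sorted({role for group_id, role in mappings if group_id in group_ids})


def sign_in(session, external_groups, mappings):
    """Always write the resolved keys, even when the result is empty."""
    session["internal_roles"] = resolve_roles_from_groups(external_groups, mappings)
    return session


# Example mappings: one group may grant several roles.
MAPPINGS = [
    ("grp-eng", "context_admin"),
    ("grp-ops", "agent_operator"),
    ("grp-eng", "agent_operator"),
]
```

Writing the key unconditionally means a user who loses a group also loses the cached role at the next sign-in, rather than keeping a stale value.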
Changed
- docs/internal-roles.md documents Admin → Users → deactivate then reactivate as the supported "force re-resolve now" lever for users you can't get to log out (long-lived sessions, automated clients); it invalidates the existing session and forces a fresh sign-in on the next request.
Internal
- INFO-level audit log on every successful resolve (OAuth callback + dev-bypass) so a "wrong role" complaint is debuggable from the log alone — admin can correlate "user X claims they lost access" with the resolver output without replaying the request.
- Startup warning when SESSION_SECRET is shorter than 32 chars, matching the existing JWT_SECRET_KEY gate. Both HMAC surfaces sign trust-laden state (session.internal_roles, session.google_groups, JWTs); keeping the two gates consistent so a weak secret gets surfaced at boot, not after a quiet downgrade.
- _clear_registry_for_tests() now refuses to run unless TESTING=1 so a stray import path in production can't drop the registered capabilities.
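
The boot-time length gate on both secrets might look like the sketch below. weak_secrets and warn_on_weak_secrets are hypothetical names, and skipping unset variables is an assumption made here (the changelog elsewhere mentions auto-generated secrets), not confirmed behavior.

```python
import logging

MIN_SECRET_LENGTH = 32  # threshold stated in the changelog entry


def weak_secrets(env, names=("SESSION_SECRET", "JWT_SECRET_KEY")):
    """Return the names of configured secrets shorter than the minimum."""
    # Assumption: unset secrets are skipped rather than reported.
    return [n for n in names if n in env and len(env[n]) < MIN_SECRET_LENGTH]


def warn_on_weak_secrets(env, logger=logging.getLogger("startup")):
    """Log one warning per weak secret and return the offending names."""
    offenders = weak_secrets(env)
    for name in offenders:
        logger.warning("%s is shorter than %d chars", name, MIN_SECRET_LENGTH)
    return offenders
```

Running both checks through one helper keeps the two gates consistent, which is the point the entry makes.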
0.11.2 — 2026-04-26
Dev-experience patch release — make LOCAL_DEV_MODE realistic enough to actually exercise group-aware code paths on localhost, and consolidate scattered dev-onboarding instructions into a single docs/local-development.md.
Added
- LOCAL_DEV_GROUPS env var mocks session.google_groups for the auto-logged-in dev user when LOCAL_DEV_MODE=1. JSON array matching the production shape ([{"id":"…","name":"…"}]) so group-aware UI and access-control code paths can be exercised on localhost without a Google OAuth round-trip. Honored only under LOCAL_DEV_MODE=1. The startup banner reports the parsed group IDs (or warns loudly when the value is set but malformed), so a typo gets surfaced at boot rather than silently on the first authenticated request. Session injection mirrors the production OAuth callback's "always-write" semantics, including clearing stale groups when the operator unsets LOCAL_DEV_GROUPS mid-session. See docs/auth-groups.md → Local-dev mock.
- make local-dev now seeds two default mocked groups (Local Dev Engineers + Local Dev Admins on example.com) via scripts/run-local-dev.sh, so first-boot /profile is non-empty out of the box. Override with LOCAL_DEV_GROUPS='[…]' make local-dev; disable with LOCAL_DEV_GROUPS= make local-dev.
- docs/local-development.md is a single onboarding doc for working on Agnes locally: TL;DR, what LOCAL_DEV_MODE actually bypasses, group mocking, what isn't mocked, and the security-rails reminder that dev mode must never reach a production deploy.
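
Parsing and validating a LOCAL_DEV_GROUPS value could be sketched as below. parse_local_dev_groups is a hypothetical helper; the error strings and the choice to reject the whole value on the first malformed entry are assumptions for illustration.

```python
import json


def parse_local_dev_groups(raw):
    """Return (groups, error); error is None when the value is usable."""
    if raw is None or raw.strip() == "":
        # Unset or empty means "no mocked groups", not an error.
        return [], None
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return [], "not valid JSON"
    if not isinstance(data, list):
        return [], "expected a JSON array"
    groups = []
    for item in data:
        valid = (
            isinstance(item, dict)
            and isinstance(item.get("id"), str)
            and isinstance(item.get("name"), str)
        )
        if not valid:
            return [], "each entry needs string 'id' and 'name'"
        groups.append({"id": item["id"], "name": item["name"]})
    return groups, None
```

Returning the error alongside an empty list lets a startup banner warn loudly about a typo while the session still receives a well-formed (empty) group list.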
Internal
- Fix nightly docker-e2e CI failures: refresh two stale assertions that had drifted from the live API. tests/test_docker_full.py::test_app_returns_html_on_root now expects the auth-aware 302 → /login (root has redirected since the auth middleware landed); tests/test_e2e_docker.py::TestDockerHealth::test_health_has_duckdb now reads services["duckdb_state"] (current health-payload shape, already validated by tests/test_api.py). No application behavior change; these only ran in the scheduled nightly job, so the drift went unnoticed for several PRs.
0.11.1 — 2026-04-26
Patch release — hotfix the missed Caddy env passthrough that should have shipped with 0.11.0, plus codify changelog discipline so this kind of drift gets caught at PR review time next time.
Fixed
- docker-compose.yml caddy service now passes CADDY_TLS through to the container (- CADDY_TLS bare-form passthrough). Without it the Caddyfile {$CADDY_TLS:default} substitution always falls back to cert-file mode regardless of what the operator wrote into .env, and Caddy crash-loops on Let's Encrypt / internal-CA deployments. Should have shipped with #52; first attempt was #55, accidentally closed before merging.
Internal
- CLAUDE.md: non-negotiable changelog discipline. Every PR touching user-visible behavior must update CHANGELOG.md under ## [Unreleased] in the same PR.
0.11.0 — 2026-04-26
First tagged semver release. The version = "2.x" strings that appeared in earlier pyproject.toml snapshots were arbitrary placeholders from the initial scaffold and never reflected actual API maturity — resetting to pre-1.0 to signal that things may still shift.
Added — Auth
- Google Workspace groups on /profile. OAuth callback fetches the signed-in user's group memberships via Cloud Identity (searchTransitiveGroups with the security label; see docs/auth-groups.md for the GCP setup checklist and the security-vs-discussion_forum gotcha). Profile link added to the user dropdown.
- Password reset + invite flows for web and admin (/auth/password/reset, /admin/users/invite).
- Personal access tokens (PAT) with separate :typ=pat JWT claim, per-token revoke, last-used IP tracking, "My tokens" + admin "All tokens" UI.
- Email magic-link provider (itsdangerous-signed token).
- Optional SEED_ADMIN_PASSWORD to pre-hash the seed admin (dev convenience).
Added — Deploy
- keboola-deploy.yml workflow. Tag-triggered alternative to release.yml for shared dev VMs that want explicit "deploy when I tag" semantics. Publishes immutable :keboola-deploy-<tag> + floating :keboola-deploy-latest alias.
- Caddy + Let's Encrypt + corporate-CA TLS. Caddyfile parametrized via $CADDY_TLS env var so a single file serves three regimes: cert-file (corp PKI), Let's Encrypt auto-issue, Caddy-internal-CA. URL-driven cert rotation with self-signed fallback (scripts/grpn/agnes-tls-rotate.sh). docker-compose.tls.yml overlay closes host :8000 when Caddy fronts.
- dev_instances schema in customer-instance Terraform module gains optional tls_mode + domain (mirrors prod_instance). infra-v1.6.0 tag.
- Optional Google OAuth credentials from Secret Manager. Module reads google-oauth-client-{id,secret} at boot if present; graceful fallback so non-Google deployments aren't affected.
- LOCAL_DEV_MODE + make local-dev-up / local-dev-down for one-keystroke local stack with magic-link auth pre-wired.
- Per-developer dev-<prefix>-latest GHCR alias for branches matching <prefix>/<branch>: push-to-deploy on personal dev VMs.
- /setup web wizard for first-time instance setup, plus headless POST /api/admin/configure and POST /api/admin/discover-and-register.
- Smoke-test job in CI (Docker-in-CI after every release) + scripts/smoke-test.sh for post-deploy verification.
Added — CLI
- Wheel distribution + auto-update check on startup.
- --version flag, --dry-run + X/N progress on da sync, durable sync (atomic writes + manifest hash + retry on transient errors).
- gzip on JSON/HTML responses (server-side).
Added — Data
- Remote query engine. Two-phase BigQuery + DuckDB engine for tables too large to sync locally (--register-bq flag).
- Business metrics. Standardized metric_definitions table in DuckDB with starter pack importer (da metrics import).
- /api/health returns version, channel, commit_sha, image_tag, schema_version.
- Custom connector mount support (connectors/custom/).
- OpenAPI snapshot test for breaking-change detection.
Added — Docs / tooling
- docs/auth-groups.md, docs/DEPLOYMENT.md, docs/HACKATHON.md, docs/ONBOARDING.md runbooks.
- scripts/debug/probe_google_groups.py: stdlib-only probe for diagnosing Cloud Identity API issues without a deploy cycle.
- Schema migration safety tests (idempotency, data preservation, snapshot).
- Pre-migration snapshot of system.duckdb before schema upgrades.
- Auto-generated JWT and session secrets with file persistence (/data/state/.jwt_secret).
- Startup banner logging version, channel, and schema version.
Changed
- BREAKING (deployment): Caddy compose profile renamed production → tls. Existing docker compose --profile production up -d invocations need to switch.
- BREAKING (deployment): Default Caddyfile mode is now cert-file (tls /certs/fullchain.pem /certs/privkey.pem); for the previous Let's Encrypt auto-issue behaviour set CADDY_TLS=tls <ops-email> in .env. See docs/auth-groups.md and Caddyfile inline docs.
- Schema migration v5→v6→v7: adds users.active, personal_access_tokens table, personal_access_tokens.last_used_ip. Auto-applied at boot.
- Image-level AGNES_VERSION now sourced from pyproject.toml at build time (no more drift between da --version and the package metadata).
- Vendor-agnostic OSS rule codified in CLAUDE.md: customer-specific names, hostnames, project IDs belong in consumer infra repos, not in this OSS distribution.
Fixed — Security
- Open-redirect guard for backslash in safe_next_path.
- SessionMiddleware max_age=3600 + https_only (was browser-session forever, plain-HTTP-OK).
- Timezone-aware datetimes in Keboola metadata cache.
- Atomic magic-link token consumption (closes double-use race under concurrent clicks).
- Bootstrap backdoor closed when passwordless seed admin exists.
- urllib3 1.26→2.6.3 (resolves 4 Dependabot security alerts).
- argon2-cffi adopted for password hashing.
- See docs/security-audit-2026-04.md for the full audit (renamed from docs/padak-security.md in #94).
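
The open-redirect guard listed above can be illustrated with a short sketch. It is not the project's safe_next_path; safe_next_path_sketch is a hypothetical function showing why a backslash matters: some browsers treat "\" like "/", so "/\evil.example" can behave like the protocol-relative "//evil.example".

```python
def safe_next_path_sketch(candidate, default="/"):
    """Return candidate only if it is a plain same-site path."""
    if not candidate or not candidate.startswith("/"):
        # Rejects absolute URLs such as "https://evil.example".
        return default
    if candidate.startswith("//"):
        # Protocol-relative URL: would leave the site.
        return default
    if "\\" in candidate:
        # Backslashes may be normalized to slashes by the browser.
        return default
    return candidate
```

Rejecting any backslash is deliberately conservative; legitimate in-app paths are assumed not to contain one.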
Fixed — Other
- uvicorn --proxy-headers --forwarded-allow-ips='*' so OAuth callbacks resolve to https when behind a TLS terminator.
- scripts/grpn/agnes-tls-rotate.sh hardened: --max-redirs 0 + --proto '=https' on cert fetch, post-fetch PEM validation (rejects HTML error pages from corp portals), ulimit -c 0 to suppress coredumps that could leak the unencrypted privkey, POSIX-safe ${arr[@]+"${arr[@]}"} array expansion.
- scripts/tls-fetch.sh: generic URL fetcher (sm://, gs://, https://, file://) with redirect refusal + PEM validation.
- kbcstorage moved to optional dep, which unblocks urllib3 security updates; primary Keboola path now uses the DuckDB Keboola extension.
- Dependencies consolidated into pyproject.toml (no more requirements.txt).
Internal
- Test suite expanded to 1357+ tests (4 layers — unit, integration, web smoke, journey).