This squashes 13 commits from ma/staging plus a small docstring translation
into a single coherent unit. Three workstreams.
== RBAC v13 redesign ==
- Drops core.viewer/analyst/km_admin/admin hierarchy and the
internal_roles / group_mappings / user_role_grants / plugin_access tables.
- Replaced by user_group_members + resource_grants. Atomic v12→v13 backfill
wrapped in BEGIN/COMMIT; ROLLBACK leaves schema_version at 12 for retry.
- Two authorization primitives in app.auth.access:
require_admin — Admin-group god-mode
require_resource_access(rt, "{path}") — entity-scoped grants
Single DB lookup per request; no session cache; no implies BFS.
- /admin/access UI (single page) replaces /admin/role-mapping +
/admin/plugin-access. CLI `da admin group/grant *` replaces
`da admin role/mapping/grant-role/revoke-role/effective-roles`.
- ResourceType.TABLE listing-only — admins can record table grants,
runtime enforcement still flows through legacy dataset_permissions
(migration plan in docs/TODO-rbac-data-enforcement.md).
== Claude Code marketplace ==
- Aggregated /marketplace.zip + /marketplace.git/* (PAT-gated,
RBAC-filtered, content-addressed cache via dulwich).
- Admin god-mode dropped on the marketplace surface — admins curate
their own view via grants like everyone else.
- Bare-repo cache materializes per RBAC-filtered ETag; stale entries
not pruned in this iteration (disclaimed in git_backend.py docstring).
== #81 #83 #44 security/ops hardening ==
- #81 Group A — orchestrator ATTACH allow-listing (extension/url/alias).
- #81 Group B — Keboola extractor 3-state exit codes:
0 success / 1 total fail / 2 PARTIAL fail
Sync API logs PARTIAL FAILURE alert on exit 2. Operators with binary
alerting must teach it the new partial signal.
- #81 Group C — schema v10 view_ownership; rejects silent overwrite
of a prior connector's view name on collision.
- #81 Group D — extractor-side identifier validation.
- #83 — Jira webhook fail-closed when JIRA_WEBHOOK_SECRET unset
+ path-traversal fix.
- #44 — entire /api/scripts/* surface is admin-only (planted-script +
sandbox-bypass risk closed).
== Web UI polish + deploy fix ==
- /admin/access: live grant-count badges (no stale snapshot revert),
shared-header CSS link added to /catalog and /admin/{tables,permissions},
per-resource-type colored stripes.
- docker-compose.host-mount.yml: bind,rbind so dual-disk hosts don't
silently shadow sub-mounts and write state to the wrong disk.
== OSS vendor-neutralization (waves 1+2) ==
- scripts/grpn/ → scripts/ops/. Customer-specific identifiers
(project IDs, internal hostnames, dev/prod VM IPs, brand names)
replaced with placeholders across code, docs, Terraform, Caddyfile,
OAuth probe, and planning docs. Downstream infra repos that copied
scripts/grpn/agnes-tls-rotate.sh or agnes-auto-upgrade.sh must
update the path.
== Translation ==
- src/repositories/user_groups.py::ensure_system docstring translated
from Czech to English for codebase consistency.
Co-authored-by: Mina Rustamyan <mina@keboola.com>
7.7 KiB
TODO — RBAC data enforcement (drop dataset_permissions)
Goal
Switch table-access authorization from the legacy per-user
dataset_permissions table to the unified per-group resource_grants
model (via ResourceType.TABLE), then drop dataset_permissions
entirely.
The first step — surfacing tables on the /admin/access page as a new
resource type — has already shipped. Admins can grant/revoke per-group,
but the runtime check still walks the legacy code path. This document
describes the follow-up work that closes the loop.
Background
Two parallel access models exist today:
- v13 unified model —
user_groups←user_group_members←resource_grants(group_id, resource_type, resource_id). Used byMARKETPLACE_PLUGINand (for listing only)TABLE. - Legacy per-user model —
dataset_permissions(user_id, dataset)withbucket.*wildcards and anis_publicbypass ontable_registry. Used at runtime bysrc/rbac.py:can_access_table, which is the call site referenced byapp/api/catalog.py,app/api/sync.py, and any other endpoint that filters tables by caller.
The two layers are mutually exclusive — dataset_permissions is
per-user, resource_grants is per-group. v13 already pivoted marketplace
plugins to groups; tables are the last holdout.
Steps
-
Refactor
src/rbac.py:can_access_table(line 63). Preserve the external signature — callers stay untouched. New check order:- Admin bypass (already in place).
is_public=Trueontable_registryrow → allow.- Delegate to
app.auth.access.can_access(user_id, ResourceType.TABLE.value, table_id, conn).
Remove the
DatasetPermissionRepository.has_access(user_id, table_id)branch and thebucket.*wildcard branch. The bucket-level "grant all" UX is now expressed via the per-table bulk action in/admin/access, not via a wildcard token. -
Audit callers of
can_access_table. Rungrep -rn 'can_access_table\|get_accessible_tables' --include='*.py'and confirm each site still works with the new semantics. Known sites:app/api/catalog.py:24,app/api/catalog.py:90,app/api/sync.py. Either keep delegating throughcan_access_table(cheapest), or migrate the endpoint toDepends(require_resource_access(ResourceType.TABLE, "{table_name}"))for endpoints that have a clean{table_name}path param. -
Schema v14 migration in
src/db.py.- Decide on backfill strategy for existing
dataset_permissionsrows. Two options:- (a) Backfill — for each
(user_id, dataset)row, ensure a personal groupuser-{email}exists, add the user as a member (sourcesystem_seed), and insertresource_grants(group_id, "table", dataset). Wildcard rows (bucket.*) need expansion to all matchingtable_registry.idvalues at migration time. - (b) Drop without backfill — log a CHANGELOG warning,
require admins to re-grant via
/admin/accesspost-deploy.
- (a) Backfill — for each
- Recommended: (b), contingent on an ops check that
dataset_permissionsis not heavily populated in any active deployment. (b) is much simpler and avoids creating "groups of one" that clutter the access UI. - Drop the
dataset_permissionstable. - Decide whether
access_requests(src/db.py:183) goes too — it implements the "I'd like access to X" flow that pairs withdataset_permissions. If retained, it should write toresource_grantson approval; if not used in practice, drop it in the same migration.
- Decide on backfill strategy for existing
-
Remove
DatasetPermissionRepositoryfromsrc/repositories/sync_settings.py:47. Grep for any remaining importers and clean them up. -
Update
tests/test_access_control.py. The current suite seedsdataset_permissionsdirectly. Rewrite the relevant cases to seedresource_grantsvia the access API (or directly viaResourceGrantsRepository). Most assertions should carry over verbatim because the public surface (HTTP status codes, manifest content) does not change. -
CHANGELOG entry under
## [Unreleased]:### Changed - **BREAKING** Table access moved from per-user `dataset_permissions` to per-group `resource_grants` (via `ResourceType.TABLE`). Existing `dataset_permissions` entries are dropped on upgrade — admins must re-grant via /admin/access. -
Update
docs/RBAC.md. It already calls out v13's removal ofplugin_access; add the analogous note fordataset_permissionsand document theis_publicbypass alongside the admin bypass. -
Delete the dummy seed script. Once enforcement is wired up and verified end-to-end against a real data source, remove
scripts/seed_dummy_tables.pyand the corresponding### Internalentry fromCHANGELOG.md. The script exists only to make the/admin/accessTables section testable while no real connector is configured; once production-style tables flow throughtable_registryvia Keboola/BigQuery discovery, the dummy rows just clutter the UI. Also drop any leftover dummy rows from deployments that ran the script (DELETE FROM table_registry WHERE source_type = 'dummy').
Open questions
- Production usage of
dataset_permissions. Determines whether step 3 needs a backfill. Check with ops or query a representative deployment. If essentially empty, prefer (b) and skip the backfill. - Fate of
access_requests. Does anyone use the request-access workflow today? If not, drop it together withdataset_permissions. If yes, it needs an updated approval path that writes toresource_grants. is_publicflag. Recommended to keep — semantics "default visible to everyone, no grant needed" is independent of the group-grant model and useful for shared reference tables. Confirm before locking in.- Bucket-level grants. v1 deliberately models the bucket as a
UI-only grouping (mirroring how
marketplaceis not a grantable resource — only individual plugins are). If, after living with the per-table bulk-action UX, admins still want first-class bucket grants that auto-cover newly registered tables, addResourceType.BUCKETin a later iteration. - Legacy role-based remnants. Several places in the code still
carry role-based artifacts (API contracts, CLI gestures, helper
functions, DB column, JWT claim). Audit the codebase end-to-end and
replace each with the v13 group-based mechanism. Gated on Google
Workspace group sync (
app/auth/group_sync.py) being verified reliable end-to-end in production — until then,users.roleserves as a defensive fallback for admin elevation when Cloud Identity calls silently return[]. The asymmetry flagged in PR #106 (Devin review on_set_admin_membership) falls under this cleanup and will be resolved together with the rest. - v9 admins via group_mapping not carried to v13. The v12→v13
finalize backfill at
src/db.py:851matchescore.admingrants inuser_role_grantsonly — direct grants. Any v9 admin who heldcore.adminexclusively via agroup_mappingsrow (Google group → internal role mapping) is silently dropped on upgrade and gets 403 on admin-gated endpoints post-deploy. In this deploy the set is empty (Google Workspace group sync was never functional →group_mappingstable was never populated → no admin promotions came through that path), so the loss is theoretical here. For forked installs that wired up Workspace groups under v9, those admins need to be re-promoted via/admin/accesspost-upgrade. Falls under the broader role-based cleanup pass once Workspace sync is verified end-to-end. Flagged in PR #106 (cvrysanek review onsrc/db.py:851).