agnes-the-ai-analyst/app/api/metadata.py
ZdenekSrotyr e9d7af3cce feat(rbac+marketplace): RBAC v13 + Claude Code marketplace + #81/#83/#44 hardening
This squashes 13 commits from ma/staging plus a small docstring translation
into a single coherent unit. Three main workstreams, followed by smaller
UI, deploy, vendor-neutralization, and translation changes.

== RBAC v13 redesign ==
- Drops core.viewer/analyst/km_admin/admin hierarchy and the
  internal_roles / group_mappings / user_role_grants / plugin_access tables.
- Replaced by user_group_members + resource_grants. Atomic v12→v13 backfill
  wrapped in BEGIN/COMMIT; ROLLBACK leaves schema_version at 12 for retry.
- Two authorization primitives in app.auth.access:
    require_admin                        — Admin-group god-mode
    require_resource_access(rt, "{path}") — entity-scoped grants
  Single DB lookup per request; no session cache; no implies BFS.
- /admin/access UI (single page) replaces /admin/role-mapping +
  /admin/plugin-access. CLI `da admin group/grant *` replaces
  `da admin role/mapping/grant-role/revoke-role/effective-roles`.
- ResourceType.TABLE is listing-only: admins can record table grants, but
  runtime enforcement still flows through legacy dataset_permissions
  (migration plan in docs/TODO-rbac-data-enforcement.md).
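
The group-to-grant lookup described above can be sketched as a single join. This is a minimal illustration, not the project's code: the column names, the `has_access` helper, and the sample resource type and path are assumptions, and SQLite stands in for the project's database so the snippet runs with the standard library alone.

```python
import sqlite3

# Hypothetical columns: the commit names the two tables but not their schema.
SCHEMA = """
CREATE TABLE user_group_members (user_id TEXT, group_id TEXT);
CREATE TABLE resource_grants (group_id TEXT, resource_type TEXT, resource_id TEXT);
"""


def has_access(conn, user_id, resource_type, resource_id):
    """Return True if any group the user belongs to holds a matching grant."""
    row = conn.execute(
        """
        SELECT 1
        FROM user_group_members AS m
        JOIN resource_grants AS g ON g.group_id = m.group_id
        WHERE m.user_id = ? AND g.resource_type = ? AND g.resource_id = ?
        LIMIT 1
        """,
        (user_id, resource_type, resource_id),
    ).fetchone()
    return row is not None


# Tiny in-memory fixture with one member and one grant (sample values only).
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO user_group_members VALUES ('alice', 'analysts')")
conn.execute(
    "INSERT INTO resource_grants VALUES ('analysts', 'marketplace_plugin', 'acme/tools')"
)
```

One query per check, with no cached role graph to invalidate, is the property the commit calls out; the trade-off is a database round trip on every request.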

== Claude Code marketplace ==
- Aggregated /marketplace.zip + /marketplace.git/* (PAT-gated,
  RBAC-filtered, content-addressed cache via dulwich).
- Admin god-mode dropped on the marketplace surface — admins curate
  their own view via grants like everyone else.
- Bare-repo cache materializes per RBAC-filtered ETag; stale entries
  not pruned in this iteration (disclaimed in git_backend.py docstring).
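
The commit does not say how the RBAC-filtered ETag is derived. One plausible approach, sketched here under that assumption, is to hash the sorted set of plugins a caller may see together with each plugin's content hash, so two callers with the same visible set share a cache entry. The `filtered_view_etag` name and the shape of its input are hypothetical.

```python
import hashlib


def filtered_view_etag(visible_plugins: dict) -> str:
    """Hash a mapping of plugin id -> content hash into a stable cache key.

    Sorting the ids makes the result independent of dict insertion order.
    """
    digest = hashlib.sha256()
    for plugin_id in sorted(visible_plugins):
        digest.update(plugin_id.encode("utf-8"))
        digest.update(b"\0")  # separator between id and content hash
        digest.update(visible_plugins[plugin_id].encode("utf-8"))
        digest.update(b"\n")  # separator between entries
    return digest.hexdigest()
```

Because the key changes whenever a grant adds or removes a plugin from the caller's view, older keys simply stop being requested, which is consistent with stale entries accumulating until something prunes them.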

== #81 #83 #44 security/ops hardening ==
- #81 Group A — orchestrator ATTACH allow-listing (extension/url/alias).
- #81 Group B — Keboola extractor 3-state exit codes:
    0 success / 1 total fail / 2 PARTIAL fail
  Sync API logs PARTIAL FAILURE alert on exit 2. Operators with binary
  alerting must teach it the new partial signal.
- #81 Group C — schema v10 view_ownership; rejects silent overwrite
  of a prior connector's view name on collision.
- #81 Group D — extractor-side identifier validation.
- #83 — Jira webhook fail-closed when JIRA_WEBHOOK_SECRET unset
  + path-traversal fix.
- #44 — entire /api/scripts/* surface is admin-only (planted-script +
  sandbox-bypass risk closed).
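
The three-state exit code contract from #81 Group B can be sketched as follows. Only the 0/1/2 meanings come from the commit; the function names, the per-table counters, and the alert levels are assumptions for illustration, as is treating an empty run as success.

```python
EXIT_SUCCESS = 0
EXIT_TOTAL_FAILURE = 1
EXIT_PARTIAL_FAILURE = 2


def extractor_exit_code(succeeded: int, failed: int) -> int:
    """Map per-table outcomes to the three-state exit code."""
    if failed == 0:
        return EXIT_SUCCESS  # assumption: an empty run also counts as success
    if succeeded == 0:
        return EXIT_TOTAL_FAILURE
    return EXIT_PARTIAL_FAILURE


def alert_level(exit_code: int) -> str:
    """Hypothetical alert mapping: exit 2 must not be treated as success."""
    return {EXIT_SUCCESS: "ok", EXIT_PARTIAL_FAILURE: "warning"}.get(
        exit_code, "critical"
    )
```

Alerting that only checks for a non-zero exit will already fire on exit 2; the point of a mapping like `alert_level` is to let operators distinguish a partial failure from a total one.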

== Web UI polish + deploy fix ==
- /admin/access: live grant-count badges (no stale snapshot revert),
  shared-header CSS link added to /catalog and /admin/{tables,permissions},
  per-resource-type colored stripes.
- docker-compose.host-mount.yml: bind,rbind so dual-disk hosts don't
  silently shadow sub-mounts and write state to the wrong disk.

== OSS vendor-neutralization (waves 1+2) ==
- scripts/grpn/ → scripts/ops/. Customer-specific identifiers
  (project IDs, internal hostnames, dev/prod VM IPs, brand names)
  replaced with placeholders across code, docs, Terraform, Caddyfile,
  OAuth probe, and planning docs. Downstream infra repos that copied
  scripts/grpn/agnes-tls-rotate.sh or agnes-auto-upgrade.sh must
  update the path.

== Translation ==
- src/repositories/user_groups.py::ensure_system docstring translated
  from Czech to English for codebase consistency.

Co-authored-by: Mina Rustamyan <mina@keboola.com>
2026-04-28 14:25:04 +02:00

132 lines
4.3 KiB
Python

"""Column metadata API endpoints — CRUD and Keboola push."""

import logging
import os
from typing import List, Optional

import duckdb
import httpx
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel

from app.auth.access import require_admin
from app.auth.dependencies import get_current_user, _get_db
from src.repositories.column_metadata import ColumnMetadataRepository
from src.repositories.table_registry import TableRegistryRepository

logger = logging.getLogger(__name__)
router = APIRouter(tags=["metadata"])


class ColumnMetadataItem(BaseModel):
    column_name: str
    basetype: Optional[str] = None
    description: Optional[str] = None
    confidence: str = "manual"


class ColumnMetadataSave(BaseModel):
    columns: List[ColumnMetadataItem]


@router.get("/api/admin/metadata/{table_id}")
async def get_table_metadata(
    table_id: str,
    user: dict = Depends(get_current_user),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Return column metadata for a table."""
    repo = ColumnMetadataRepository(conn)
    columns = repo.list_for_table(table_id)
    return {"table_id": table_id, "columns": columns}


@router.post("/api/admin/metadata/{table_id}")
async def save_table_metadata(
    table_id: str,
    body: ColumnMetadataSave,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Save column metadata for a table. Admin only."""
    repo = ColumnMetadataRepository(conn)
    for item in body.columns:
        repo.save(
            table_id=table_id,
            column_name=item.column_name,
            basetype=item.basetype,
            description=item.description,
            confidence=item.confidence,
        )
    return {"status": "ok", "count": len(body.columns)}


@router.post("/api/admin/metadata/{table_id}/push")
async def push_metadata_to_source(
    table_id: str,
    user: dict = Depends(require_admin),
    conn: duckdb.DuckDBPyConnection = Depends(_get_db),
):
    """Push column metadata to Keboola Storage API. Admin only."""
    registry_repo = TableRegistryRepository(conn)
    table = registry_repo.get(table_id)
    if not table:
        raise HTTPException(status_code=404, detail=f"Table not found: {table_id}")

    source_type = table.get("source_type", "")
    if source_type != "keboola":
        raise HTTPException(
            status_code=400,
            detail=f"Push is only supported for keboola tables (table source_type={source_type!r})",
        )

    source_table = table.get("source_table") or table_id

    stack_url = os.environ.get("KBC_STACK_URL", "").rstrip("/")
    token = os.environ.get("KBC_STORAGE_TOKEN", "")
    if not stack_url or not token:
        raise HTTPException(
            status_code=500,
            detail="KBC_STACK_URL and KBC_STORAGE_TOKEN must be set",
        )

    metadata_repo = ColumnMetadataRepository(conn)
    columns = metadata_repo.list_for_table(table_id)
    if not columns:
        return {"status": "ok", "pushed": 0, "message": "No column metadata to push"}

    pushed = 0
    errors = []

    async with httpx.AsyncClient() as client:
        for col in columns:
            column_name = col["column_name"]
            metadata_payload = []
            if col.get("basetype"):
                metadata_payload.append({"key": "KBC.datatype.basetype", "value": col["basetype"]})
            if col.get("description"):
                metadata_payload.append({"key": "KBC.description", "value": col["description"]})
            if not metadata_payload:
                continue

            endpoint = f"{stack_url}/v2/storage/tables/{source_table}/columns/{column_name}/metadata"
            try:
                resp = await client.post(
                    endpoint,
                    headers={"X-StorageApi-Token": token},
                    json={"provider": "ai-metadata-enrichment", "metadata": metadata_payload},
                    timeout=30,
                )
                if resp.status_code in (200, 201):
                    pushed += 1
                else:
                    errors.append(f"{column_name}: {resp.status_code} {resp.text[:200]}")
            except httpx.RequestError as e:
                errors.append(f"{column_name}: request error — {e}")

    result = {"status": "ok", "pushed": pushed}
    if errors:
        result["errors"] = errors
    return result
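
The per-column logic inside `push_metadata_to_source` can be pulled into two pure helpers, which makes it testable without a Keboola stack. This is a sketch rather than part of the file: `build_metadata_payload` and `column_metadata_url` are hypothetical names, and percent-encoding the path segments is an added precaution that the handler above does not take.

```python
from urllib.parse import quote


def build_metadata_payload(col: dict) -> list:
    """Build the Keboola metadata entries for one column, skipping empty fields."""
    payload = []
    if col.get("basetype"):
        payload.append({"key": "KBC.datatype.basetype", "value": col["basetype"]})
    if col.get("description"):
        payload.append({"key": "KBC.description", "value": col["description"]})
    return payload


def column_metadata_url(stack_url: str, source_table: str, column_name: str) -> str:
    """Build the column metadata endpoint, percent-encoding both path segments."""
    table = quote(source_table, safe="")
    column = quote(column_name, safe="")
    return f"{stack_url.rstrip('/')}/v2/storage/tables/{table}/columns/{column}/metadata"
```

An empty payload list corresponds to the `continue` branch in the handler: a column with neither a basetype nor a description is skipped and counts as neither pushed nor failed.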