agnes-the-ai-analyst/app/api
ZdenekSrotyr dd46461c6c fix(admin+orchestrator): DELETE registry drops parquet + sync_state; rebuild skips orphan parquets
E2E sub-agent finding: register a materialized BQ row → sync to materialize
the parquet at `/data/extracts/bigquery/data/<id>.parquet` → DELETE the
registry row. The DB row goes away but:

- the parquet file stays on disk forever, AND
- the sync_state row stays, so `/api/sync/manifest` keeps advertising the
  dropped table to `da sync`, AND
- the orchestrator's next rebuild can resurrect a master view by picking
  up the leftover parquet.

Two-part fix in `unregister_table`:

1. For materialized rows on bigquery/keboola, remove
   `${DATA_DIR}/extracts/<source_type>/data/<name>.parquet` (and any stale
   `<name>.parquet.tmp` from a crashed prior materialize). Filename is
   keyed on `table_registry.name` to match sync_state bookkeeping.
   File-removal errors are logged but don't fail the DELETE — the registry
   row is already gone, and an orphan parquet won't get a master view at
   next rebuild because the orchestrator's _meta-driven scan never picks
   up bare parquet files.

2. Always clear `sync_state` + `sync_history` rows for the dropped table_id
   so the manifest stops advertising the table — applies to all source
   types and modes, not just materialized, since any synced row had a
   sync_state entry.

Orchestrator-side defensive guard (Finding 2b) is a no-op in the current
implementation: `_attach_and_create_views` only creates master views from
`_meta` rows in each connector's `extract.duckdb`, so a parquet without a
matching `_meta` entry is already invisible to the rebuild. The new
test `test_orchestrator_skips_orphan_parquet_in_extracts` is kept as a
regression guard for that contract.

5 tests cover: BQ + Keboola materialized DELETE removes parquet, remote
DELETE doesn't error trying to remove a non-existent file, sync_state
cleared on DELETE, orchestrator orphan-skip invariant.
2026-05-01 22:54:11 +02:00
..
__init__.py feat: add FastAPI server with auth, RBAC, and all API endpoints 2026-03-27 15:19:18 +01:00
access.py feat(admin): users/groups UI polish + SSO lock + v18 migration (#142) 2026-04-30 15:16:04 +02:00
admin.py fix(admin+orchestrator): DELETE registry drops parquet + sync_state; rebuild skips orphan parquets 2026-05-01 22:54:11 +02:00
catalog.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
cli_artifacts.py release(2.1.0): durable sync, CLI auto-update, versioned wheel URL, version unification (#43) 2026-04-22 21:18:18 +02:00
data.py fix(security+ops) + release(0.12.1): #82 #85 #87 hardening + cut 0.12.1 (#104) 2026-04-28 19:57:30 +02:00
health.py fix(admin+diagnose): address 2 additional Devin Review findings on PR #152 2026-05-01 21:21:23 +02:00
jira_webhooks.py fix(security): close Jira webhook fail-open + path traversal (#83) (#93) 2026-04-27 19:53:55 +02:00
marketplaces.py fix(scheduler): HTTP marketplaces job + SCHEDULER_API_TOKEN shared secret (#127) 2026-04-29 11:44:00 +02:00
me_debug.py feat(auth): /me/debug self-only auth diagnostic page (#116) 2026-04-29 06:36:28 +02:00
memory.py feat(memory): admin Edit + MEMORY_DOMAIN RBAC + ai-section UI (#141) 2026-04-30 11:04:41 +02:00
metadata.py feat(rbac+marketplace): RBAC v13 + Claude Code marketplace + #81/#83/#44 hardening 2026-04-28 14:25:04 +02:00
metrics.py feat(rbac+marketplace): RBAC v13 + Claude Code marketplace + #81/#83/#44 hardening 2026-04-28 14:25:04 +02:00
query.py feat(v2): claude-driven fetch primitives + 0.14.0 (#102) 2026-04-29 01:07:19 +02:00
query_hybrid.py feat(rbac+marketplace): RBAC v13 + Claude Code marketplace + #81/#83/#44 hardening 2026-04-28 14:25:04 +02:00
scripts.py feat(scheduler): re-wire sync_schedule + script.schedule; tune via env; OpenMetadata TLS (#135) 2026-04-29 22:06:30 +02:00
settings.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
sync.py fix(admin-api): reject backtick BQ-native source_query at register; surface materialize errors per-row 2026-05-01 22:51:02 +02:00
telegram.py feat: complete system — web UI, all API endpoints, governance, admin, CLI commands 2026-03-27 16:52:22 +01:00
tokens.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
upload.py fix(security+ops) + release(0.12.1): #82 #85 #87 hardening + cut 0.12.1 (#104) 2026-04-28 19:57:30 +02:00
users.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
v2_arrow.py feat(v2): claude-driven fetch primitives + 0.14.0 (#102) 2026-04-29 01:07:19 +02:00
v2_cache.py feat(v2): claude-driven fetch primitives + 0.14.0 (#102) 2026-04-29 01:07:19 +02:00
v2_catalog.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
v2_quota.py feat(v2): claude-driven fetch primitives + 0.14.0 (#102) 2026-04-29 01:07:19 +02:00
v2_sample.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
v2_scan.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
v2_schema.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
where_validator.py feat(v2): claude-driven fetch primitives + 0.14.0 (#102) 2026-04-29 01:07:19 +02:00