agnes-the-ai-analyst/app
ZdenekSrotyr dd46461c6c fix(admin+orchestrator): DELETE registry drops parquet + sync_state; rebuild skips orphan parquets
E2E sub-agent finding: register a materialized BQ row → sync to materialize
the parquet at `/data/extracts/bigquery/data/<id>.parquet` → DELETE the
registry row. The DB row goes away but:

- the parquet file stays on disk forever, AND
- the sync_state row stays, so `/api/sync/manifest` keeps advertising the
  dropped table to `da sync`, AND
- the orchestrator's next rebuild can resurrect a master view by picking
  up the leftover parquet.

Two-part fix in `unregister_table`:

1. For materialized rows on bigquery/keboola, remove
   `${DATA_DIR}/extracts/<source_type>/data/<name>.parquet` (and any stale
   `<name>.parquet.tmp` from a crashed prior materialize). Filename is
   keyed on `table_registry.name` to match sync_state bookkeeping.
   File-removal errors are logged but don't fail the DELETE — the registry
   row is already gone, and an orphan parquet won't get a master view at
   next rebuild because the orchestrator's _meta-driven scan never picks
   up bare parquet files.

2. Always clear `sync_state` + `sync_history` rows for the dropped table_id
   so the manifest stops advertising the table — applies to all source
   types and modes, not just materialized, since any synced row had a
   sync_state entry.

Orchestrator-side defensive guard (Finding 2b) is a no-op in the current
implementation: `_attach_and_create_views` only creates master views from
`_meta` rows in each connector's `extract.duckdb`, so a parquet without a
matching `_meta` entry is already invisible to the rebuild. The new
test `test_orchestrator_skips_orphan_parquet_in_extracts` is kept as a
regression guard for that contract.

5 tests cover: BQ + Keboola materialized DELETE removes parquet, remote
DELETE doesn't error trying to remove a non-existent file, sync_state
cleared on DELETE, orchestrator orphan-skip invariant.
2026-05-01 22:54:11 +02:00
..
api fix(admin+orchestrator): DELETE registry drops parquet + sync_state; rebuild skips orphan parquets 2026-05-01 22:54:11 +02:00
auth feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
debug feat(observability): request_id end-to-end + dev debug toolbar + centralized logging (#136) 2026-04-29 22:54:21 +02:00
marketplace_server fix(marketplace): use plugin.json name in synth marketplace.json (#133) 2026-04-29 19:25:57 +02:00
middleware feat(observability): request_id end-to-end + dev debug toolbar + centralized logging (#136) 2026-04-29 22:54:21 +02:00
web feat(admin-ui): /admin/server-config known-fields registry + structured nested editor 2026-05-01 20:27:01 +02:00
__init__.py feat: add FastAPI server with auth, RBAC, and all API endpoints 2026-03-27 15:19:18 +01:00
instance_config.py fix(v2): #134 BigQuery cross-project errors return structured 502/400 + BqAccess facade (#138) 2026-04-30 10:11:20 +02:00
logging_config.py feat(observability): request_id end-to-end + dev debug toolbar + centralized logging (#136) 2026-04-29 22:54:21 +02:00
main.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
resource_types.py feat(rbac): drop dataset_permissions + users.role + is_public; v19 migration (#150) 2026-04-30 22:02:16 +02:00
secrets.py fix: address Devin review round 5 — empty secret file, CI .env 2026-04-10 14:55:31 +02:00
utils.py feat(rbac+marketplace): RBAC v13 + Claude Code marketplace + #81/#83/#44 hardening 2026-04-28 14:25:04 +02:00