13 Devin findings across 10 files:
🔴 Critical:
- app/api/v2_catalog.py:42 — `_fetch_hint` returns `da fetch` in /api/v2/catalog
responses (user-visible in every catalog list)
- cli/skills/agnes-data-querying.md — 11 stale `da fetch`/`da sync` refs in the
bundled skill markdown
- config/claude_md_template.txt:38 — referenced `agnes pull --docs-only` flag
that does NOT exist in agnes pull (removed; spec only ships --quiet/--json/
--dry-run)
🟡 Important:
- app/api/admin.py:252 — `da fetch` in bq_max_scan_bytes hint
- cli/commands/auth.py:119 — `da sync` in import-token docstring (--help text)
- cli/commands/tokens.py:48 — "Export it so `da` can use it" prose
- ARCHITECTURE.md — 4 stale rows in CLI commands table
- README.md — stale paragraphs for analysts (da sync, da analyst setup)
🚩 Substantive observations addressed:
- app/api/query.py:249,302,489 — server-side error/help strings still said
`da sync`/`da fetch` (returned in API responses to clients)
- cli/commands/snapshot.py:235-241 — DuckDB existence guard incorrectly
blocked `--estimate` (server-side dry-run that never opens local DB).
Added test ensuring estimate path skips the guard.
Skipped (intentionally historical):
- app/api/admin.py:2377,2429,2437 — historical comments describing past
manifest-vs-sync_state bug; past tense, accurate to keep as `da sync`.
Adds three sections to the E2E plan:
- "Coverage honesty" — explicit list of what the plan reveals (✅) and
what it does NOT (❌, with reasoning per gap)
- "Recommended additional coverage layers" — Tier 1/2/3 with realistic
coverage estimates (~70 % / ~80 % / ~95 % / ~98 %)
- "Prerequisites" table — what's needed on the VM, with fallback
behavior per missing item
The plan is intentionally not exhaustive. Goal is to surface the worst
contract violations fast, not to prove correctness across all real-world
environments. Documenting the gap explicitly so operators don't ship
on a false sense of "tests passed = production-ready."
Typer/rich emits ANSI styling in CI's --help output (e.g. `--metrics`
becomes `-\x1b[0m\x1b[1;36m-metrics`), so literal substring asserts
like `assert "--metrics" in result.output` fail. Locally the test runner
auto-detects no-TTY and produces plain text, masking the issue.
Add a small `_clean()` helper per test file that strips ANSI escape
codes (`\x1b\[[0-9;]*m`) before substring containment checks.
- _v23_to_v24_finalize: wrap row-update loop in BEGIN/COMMIT/ROLLBACK
to match the project's transactional-finalizer pattern (compare
_v12_to_v13_finalize, _v17_to_v18_finalize, _v18_to_v19_finalize).
Pre-fix a process crash mid-loop left the schema_version unchanged
but partially-converted rows persisted across restart — idempotent
overall but inconsistent with project convention.
- _v23_to_v24_finalize: re.sub replacement now uses a function-form
(lambda) instead of an f-string, so any future project_id with a
backslash sequence isn't misinterpreted as a group reference.
- tests: add a Keboola-source materialized row case asserting the
SELECT's source_type filter prevents non-BQ rewrites.
Materialize now wraps admin SQL into bigquery_query('<billing>', '<inner>')
which requires the inner SQL to be BigQuery-flavor (backticked
identifiers, native function syntax). v24 migrates existing rows from
DuckDB-flavor (bq."ds"."tbl") to (`<project>.ds.tbl`) using the
configured BQ project. Idempotent on already-converted rows; logs a
warning and skips when the project isn't configured (operator can
configure + restart for retry).
Task 20: reusable pytest fixtures for the clean-bootstrap test suite.
Tasks 21 and 22 (reader smoke matrix + init smoke matrix) consume them.
- fastapi_test_server boots a real uvicorn subprocess against a tmp DATA_DIR,
pre-seeded with admin@example.com (Admin group), analyst@example.com
(Everyone group), and three tables (one per query_mode: local /
materialized / remote).
- web_session: cookie-authenticated httpx.Client for the admin user.
- test_pat: minted JWT for the analyst with table grants on local +
materialized.
- test_pat_no_grants: same shape, zero resource_grants.
- zero_grants_workspace: subprocess invocation of `agnes init` against the
no-grants PAT; returns the bootstrapped workspace path.
- NONEXISTENT_TABLE: module-level sentinel for the upcoming reader matrix.
Subprocess uvicorn (mirrors tests/test_e2e_corporate_memory.py) instead of
in-thread so DATA_DIR + module-level singletons in src.db don't bleed
across tests. agnes CLI invoked via `python -m cli.main` instead of the
.venv/bin/agnes shim, which depends on .pth file visibility that iCloud
Drive intermittently re-hides on macOS.
Spec-review note on 6c0846fd: every other section in the example file
with a default appears as live YAML, not commented. Match that
convention so operators see the documented default rendered.
New top-level 'materialize' section, single field (lock_ttl_seconds).
Default 86400 (24h). Backs the file-lock TTL reclaim added in the
per-table-mutex change. Editable via PUT /api/admin/server-config and
the /admin/server-config UI.
When admin registers a materialized BQ row with bucket+source_table but
no source_query, the server generates 'SELECT * FROM `<project>.<ds>.<tbl>`'
from instance.yaml's configured BQ project. Same fallback fires on PUT
when flipping to materialized. The backtick rejection guard, which was
appropriate for DuckDB-flavor source_query, is relaxed for materialized
rows since the new wrapping path (Task 2) runs admin SQL through BQ
jobs API which uses BQ-native syntax (backticks for dashed identifiers).
_run_materialized_pass distinguishes due-check skips from in-flight
skips and never calls state.set_error for either. summary['skipped']
becomes a list of {table, reason} dicts; the end-of-pass log line
breaks out the in_flight subcount.
Hoists is_table_due to module-level import so test monkeypatching of
the symbol intercepts the call (the previous local import made
patches a no-op).
- extractor._try_acquire_file_lock: close fd and re-raise on non-
BlockingIOError from fcntl.flock (read-only fs, unsupported flock,
fd exhaustion). Pre-fix the fd leaked silently and the underlying
OSError still propagated past the caller.
- extractor: reorder module-level layout so logger is bound before
the new lock-related helpers reference it. Deferred import of
app.instance_config inside _get_lock_ttl_seconds documented inline.
- extractor: comment _table_locks unbounded-by-design rationale.
- tests: docstring + monkeypatch-target rationale for the two
concurrency tests where the contract isn't obvious from the body.
- Verb renames (da X -> agnes X for surviving verbs; legacy verbs already
absent from this default template — admin overrides with legacy verbs are
caught by Task 2's _LEGACY_STRINGS scan + Task 5's admin banner).
- Path renames: data/parquet/ -> server/parquet/, data/duckdb/ ->
user/duckdb/, data/metadata/ removed entirely (no longer exists per spec).
- Drop user/artifacts/ from directory structure (spec workspace layout
drops it; surviving paths: server/parquet/, user/duckdb/, user/snapshots/,
user/sessions/).
- Add AGNES_WORKSPACE.md pointer near top-of-template so analysts know
where to find human-readable docs.
Cleans Task 0.5's missed sweep on this file (was not in cli/ tree but is
user-visible via /api/welcome).
81 claude_md/welcome_template tests pass.
Two Task 4 review fixes for app/web/templates/install.html:
1. JSON-escape `ROLE` JS const via `{{ role | tojson }}` (defense in
depth — removes the dependency on Jinja autoescape semantics for JS
contexts; FastAPI's Literal validator already constrains role values).
2. Verify the analyst tile's clipboard payload is the analyst layout.
The pre-existing role-aware plumbing (compute_default_agent_prompt
threading role into setup_instructions_lines, picked up by the JS
SETUP_INSTRUCTIONS_TEMPLATE array) was correct; adding regression tests
that pin to the JS clipboard block specifically so a future inversion
would fail loudly.
Tests: analyst clipboard contains `agnes init` + `agnes catalog` and
NOT `agnes auth import-token` / `agnes skills`; admin clipboard is the
inverse. Plus an explicit assertion that ROLE is rendered via tojson.
Two layers of concurrency control. Layer 1 is a per-table_id
threading.Lock keyed on table_id; Layer 2 is fcntl.flock on a sibling
<id>.parquet.lock file. Overlapping calls for the same id raise
MaterializeInFlightError, which the caller treats as 'skipped,
in_flight' instead of a hard error. Stale file locks (mtime older
than materialize.lock_ttl_seconds, default 86400) are reclaimed on
the next attempt — covers the rare case where a holder was hard-killed
before kernel-level flock release.
Pre-fix, when a materialize ran longer than the scheduler tick interval
(15 min), the next tick called materialize_query for the same id, hit
the unconditional tmp_path.unlink() at function entry, and started a
second COPY against the same path. Both writers interleaved bytes;
the original COPY's read_parquet validation then failed with
'No magic bytes found at end of file'.