Closes#171. The /api/query cost guardrail used to dry-run a synthetic
`SELECT * FROM <table>` for each registered remote-BQ row referenced
by the user SQL — which made BigQuery estimate a full table scan, with
column projection, predicate pushdown, and partition pruning all
disabled. Narrow queries on big partitioned/clustered tables (the
documented happy path for `agnes query --remote`) hit ~30,000×
over-estimates and got rejected with 400 `remote_scan_too_large` even
when BQ's own dry-run reported single-digit MB.
Pavel's report on #171 traced the root cause and proposed the fix:
rewrite the user SQL to BQ-native syntax and dry-run it as a single
job, exactly the way `bq query --dry_run` works.
Implementation:
- New helper _rewrite_user_sql_for_bq_dry_run rewrites bare registered
names (word-boundary, case-insensitive, longest-first to avoid prefix
collisions) + bq."<ds>"."<tbl>" forms to backticked
`<project>.<ds>.<tbl>` paths.
- _bq_quota_and_cap_guard runs ONE dry-run on the rewritten SQL. Cap
check uses the real estimate.
- Fallback path: if BQ rejects with bq_bad_request (e.g. DuckDB-only
syntax like ::INT casts), the guard falls back to the pre-fix
per-table SELECT * approach so non-portable queries still get a
(loose) cap estimate instead of fail-opening. Non-parse BQ errors
(forbidden, upstream) still propagate as 502.
- _bq_guardrail_inputs now also returns name_lookups so the rewriter
has the (registered_name, bucket, source_table) mapping it needs.
- Per-table breakdown is unavailable from a composite dry-run; total
bytes are pinned to dry_run_set[0] for the post-flight
record_bytes(sum(...)) call to keep returning the right total.
Tests (7 new, 3 existing still pass):
- dry-run receives rewritten user SQL with WHERE clause intact (the
load-bearing assertion for #171)
- single dry-run per request even with multiple registered tables
(JOIN, UNION) referenced
- fallback to per-table SELECT * on bq_bad_request
- non-parse BQ errors (forbidden) still 502
- rewriter unit tests: bare + bq.path in same SQL, longest-name-wins
on prefix collision, case-insensitive bare-name match
The renderer no longer emits the legacy "da analyst setup" verb (the analyst
flow uses `agnes init`, the admin flow uses `agnes auth import-token`). The
disjunction assertions ("da analyst setup" OR "agnes auth" OR "curl") were
permissive and would have silently kept passing even if the renderer
regressed. Replace them with role-aware assertions that match the actual
emitted markers and explicitly check that no legacy verb survives.
Four call sites where #174 (branched from main before the agnes rename
fully landed in some files) emitted or referenced `da fetch`. None are
operator-visible runtime crashes — but `extractor.py` logs a stale
verb to the operator log and `DATA_SOURCES.md` is current docs:
- connectors/bigquery/extractor.py:431,434 (operator-facing log line on
unverified BQ entity_type — was suggesting `da fetch`).
- docs/DATA_SOURCES.md:77,85 (current public docs, two refs to
`da fetch` in the workflow + the BQ scope description).
- tests/test_cli_query_render.py:7 (module docstring listed
`da fetch / agnes schema / etc.` — now `agnes snapshot create / agnes
schema / etc.`).
- tests/test_cli_snapshot_create.py:1 (docstring referenced `(folded
from `da fetch`)` — historical, removed; no value once the rename
landed).
Pre-existing stale `da` references elsewhere in the branch (templates,
operator runbooks, internal comments) are not touched by this commit —
they live outside the merge surface and are a separate cleanup task.
Verified: 10/10 across the affected test files pass.
Ports Minas's PR #172 (against pre-rename `da` CLI on main) and applies
the principle to the post-rename `agnes` CLI. Two distinct failure modes
on Windows consoles whose default codepage is cp1250 (cs-CZ) / cp1252
(en-US):
1. `agnes pull` and other Rich-progress codepaths
UnicodeEncodeError on Braille spinner glyphs. Fix: `cli/main.py`
reconfigures stdout/stderr to UTF-8 with errors='replace' at import
time on `sys.platform == 'win32'` so Rich's legacy-Windows render
path emits decodable bytes. Wrapped in try/except so pytest's
captured streams (which aren't TextIOWrapper) don't break.
2. `agnes skills list` and `agnes skills show`
UnicodeDecodeError when reading skill markdown containing em-dashes /
accented chars. Default `Path.read_text()` uses
locale.getpreferredencoding(False), which is the broken codepage on
Windows. Fix: every call site passes encoding='utf-8' explicitly.
Broader scope than #172 because:
- The bootstrap rewrite renamed/removed several files Minas's PR
patched (`cli/commands/analyst.py` -> rolled into init.py;
`cli/commands/sync.py` -> split into pull/push). Those targets no
longer exist; the equivalent code lives in init.py.
- Other call sites Minas didn't touch (still bare in his branch) are
patched here too — config.py / update_check.py / snapshot_meta.py /
setup.py / skills.py — so the codebase has zero locale-default text
I/O in cli/.
Side cleanup: stale `Run `da`` reference in snapshot_meta.py:88 fixed
to `agnes` while touching the file.
The dashed identifier is what the test exercises (backticks required for
dashed BQ project IDs); the literal string can be any synthetic value.
'prj-grp' is too close to a real customer-prefix pattern that the OSS
vendor-scrub regex flags. 'my-project' matches placeholders used elsewhere
in the project.
CI failure: test_readers_in_pre_init_dir asserted no Traceback in stderr
when running `agnes snapshot create x --as y --estimate` in a folder
that never saw `agnes init`. The estimate-guard fix in 3d587681 let
`--estimate` skip the local_db check and reach `api_post_json`, but the
existing `except V2ClientError` doesn't cover transport-layer failures.
With no server configured the URL defaults to http://localhost:8000;
httpx raises ConnectError → ConnectError isn't a V2ClientError → the
exception bubbles up through Typer/rich as a full traceback.
Add `except httpx.HTTPError` next to V2ClientError so connection /
DNS / TLS / timeout failures all render the friendly hint
`Run `agnes init …` first` instead of leaking transport noise.
Real bug: `agnes push` was reading `<workspace>/user/sessions/`, but
Claude Code writes session jsonls to `~/.claude/projects/<encoded-cwd>/`
and nothing on the analyst side ever copies them across. The SessionEnd
hook ran `agnes push` happily and uploaded zero sessions every time.
`cli/lib/claude_sessions.py` probes both Claude Code encoding variants
(older `/`→`-` keeping spaces+tildes; newer all-non-alphanumeric→`-`
with collapsed runs) and unions whichever exist. Users who upgraded
Claude Code mid-project end up with both encoded dirs side-by-side on
disk; the union ensures no session is left behind. Same-named jsonl in
both dirs → newest mtime wins. `<workspace>/user/sessions/` survives as
a fallback for any setup that explicitly mirrors sessions there.
Verified on real disk: helper returns 2 dirs + 8 unioned session files
for the Agnes-test workspace where the previous code returned 0.
Three coupled UX fixes for the analyst-onboarding flow:
1. Dashboard "Setup a new Claude Code" CTA was rendering admin paste
prompt for everyone (analysts couldn't actually execute the marketplace
plugin install / skills setup steps). render_agent_prompt_banner now
picks role based on user.is_admin — analysts get the analyst flow.
2. /setup default role changed from admin to analyst. Most visitors are
analysts; admin layout is opt-in via the admin tile or ?role=admin.
3. Admin tile is admin-only on the role-tile nav. Non-admins see only
the analyst tile. Server-side: non-admin requesting ?role=admin is
silently downgraded to analyst (otherwise they'd see admin paste
prompt despite no tile).
Tests:
- New: test_setup_page_admin_tile_hidden_for_non_admin (anonymous client
can't see "Admin CLI" or role=admin link)
- New: test_setup_page_admin_role_downgraded_for_non_admin (anonymous
?role=admin → analyst layout, no marketplace step in clipboard)
- New: test_install_preview_default_role_is_analyst (admin signing in to
bare /setup gets analyst clipboard by default)
- Renamed: test_setup_page_default_role_is_admin → ..._is_analyst
- Updated: test_setup_page_admin_clipboard_renders_admin_layout uses
FastAPI dependency_overrides to inject admin user (admin layout is
now admin-gated)
- Updated: test_install_preview_visible_for_signed_in_user explicitly
passes ?role=admin to exercise admin layout
Caught by my own broader test scope after Devin fixes — three test files
asserted on user-visible strings that were renamed by the bootstrap PR
but the assertions weren't updated:
- tests/test_api_query_guardrail.py:110 — asserted `da fetch in suggestion`
on /api/query 400 response. Renamed to `agnes snapshot create`.
- tests/test_query_materialized_error_message.py:56 — asserted `da sync`
in materialized-not-yet error detail. Renamed to `agnes pull`.
- tests/test_cli_error_render.py:71 — fixture data + assertion both
carried `da fetch`. Updated to `agnes snapshot create`.
Plus an actual content miss: docs/setup/claude_settings.json (a template
shipped to operators) still installed `da sync` / `da sync --upload-only`
hooks. The companion test file (tests/test_setup_hooks_template.py) was
asserting that legacy state. Updated both:
- Template hooks: `agnes pull --quiet` / `agnes push --quiet`
- Test assertions + function name match the new commands
13 Devin findings across 10 files:
🔴 Critical:
- app/api/v2_catalog.py:42 — `_fetch_hint` returns `da fetch` in /api/v2/catalog
responses (user-visible in every catalog list)
- cli/skills/agnes-data-querying.md — 11 stale `da fetch`/`da sync` refs in the
bundled skill markdown
- config/claude_md_template.txt:38 — referenced `agnes pull --docs-only` flag
that does NOT exist in agnes pull (removed; spec only ships --quiet/--json/
--dry-run)
🟡 Important:
- app/api/admin.py:252 — `da fetch` in bq_max_scan_bytes hint
- cli/commands/auth.py:119 — `da sync` in import-token docstring (--help text)
- cli/commands/tokens.py:48 — "Export it so `da` can use it" prose
- ARCHITECTURE.md — 4 stale rows in CLI commands table
- README.md — stale paragraphs for analysts (da sync, da analyst setup)
🚩 Substantive observations addressed:
- app/api/query.py:249,302,489 — server-side error/help strings still said
`da sync`/`da fetch` (returned in API responses to clients)
- cli/commands/snapshot.py:235-241 — DuckDB existence guard incorrectly
blocked `--estimate` (server-side dry-run that never opens local DB).
Added test ensuring estimate path skips the guard.
Skipped (intentionally historical):
- app/api/admin.py:2377,2429,2437 — historical comments describing past
manifest-vs-sync_state bug; past tense, accurate to keep as `da sync`.
Adds three sections to the E2E plan:
- "Coverage honesty" — explicit list of what the plan reveals (✅) and
what it does NOT (❌, with reasoning per gap)
- "Recommended additional coverage layers" — Tier 1/2/3 with realistic
coverage estimates (~70 % / ~80 % / ~95 % / ~98 %)
- "Prerequisites" table — what's needed on the VM, with fallback
behavior per missing item
The plan is intentionally not exhaustive. Goal is to surface the worst
contract violations fast, not to prove correctness across all real-world
environments. Documenting the gap explicitly so operators don't ship
on a false sense of "tests passed = production-ready."
Typer/rich emits ANSI styling in CI's --help output (e.g. `--metrics`
becomes `-\x1b[0m\x1b[1;36m-metrics`), so literal substring asserts
like `assert "--metrics" in result.output` fail. Locally the test runner
auto-detects no-TTY and produces plain text, masking the issue.
Add a small `_clean()` helper per test file that strips ANSI escape
codes (`\x1b\[[0-9;]*m`) before substring containment checks.
- _v23_to_v24_finalize: wrap row-update loop in BEGIN/COMMIT/ROLLBACK
to match the project's transactional-finalizer pattern (compare
_v12_to_v13_finalize, _v17_to_v18_finalize, _v18_to_v19_finalize).
Pre-fix a process crash mid-loop left the schema_version unchanged
but partially-converted rows persisted across restart — idempotent
overall but inconsistent with project convention.
- _v23_to_v24_finalize: re.sub replacement now uses a function-form
(lambda) instead of an f-string, so any future project_id with a
backslash sequence isn't misinterpreted as a group reference.
- tests: add a Keboola-source materialized row case asserting the
SELECT's source_type filter prevents non-BQ rewrites.
Materialize now wraps admin SQL into bigquery_query('<billing>', '<inner>')
which requires the inner SQL to be BigQuery-flavor (backticked
identifiers, native function syntax). v24 migrates existing rows from
DuckDB-flavor (bq."ds"."tbl") to (`<project>.ds.tbl`) using the
configured BQ project. Idempotent on already-converted rows; logs a
warning and skips when the project isn't configured (operator can
configure + restart for retry).
Task 20: reusable pytest fixtures for the clean-bootstrap test suite.
Tasks 21 and 22 (reader smoke matrix + init smoke matrix) consume them.
- fastapi_test_server boots a real uvicorn subprocess against a tmp DATA_DIR,
pre-seeded with admin@example.com (Admin group), analyst@example.com
(Everyone group), and three tables (one per query_mode: local /
materialized / remote).
- web_session: cookie-authenticated httpx.Client for the admin user.
- test_pat: minted JWT for the analyst with table grants on local +
materialized.
- test_pat_no_grants: same shape, zero resource_grants.
- zero_grants_workspace: subprocess invocation of `agnes init` against the
no-grants PAT; returns the bootstrapped workspace path.
- NONEXISTENT_TABLE: module-level sentinel for the upcoming reader matrix.
Subprocess uvicorn (mirrors tests/test_e2e_corporate_memory.py) instead of
in-thread so DATA_DIR + module-level singletons in src.db don't bleed
across tests. agnes CLI invoked via `python -m cli.main` instead of the
.venv/bin/agnes shim, which depends on .pth file visibility that iCloud
Drive intermittently re-hides on macOS.
Spec-review note on 6c0846fd: every other section in the example file
with a default appears as live YAML, not commented. Match that
convention so operators see the documented default rendered.
New top-level 'materialize' section, single field (lock_ttl_seconds).
Default 86400 (24h). Backs the file-lock TTL reclaim added in the
per-table-mutex change. Editable via PUT /api/admin/server-config and
the /admin/server-config UI.
When admin registers a materialized BQ row with bucket+source_table but
no source_query, the server generates 'SELECT * FROM `<project>.<ds>.<tbl>`'
from instance.yaml's configured BQ project. Same fallback fires on PUT
when flipping to materialized. The backtick rejection guard, which was
appropriate for DuckDB-flavor source_query, is relaxed for materialized
rows since the new wrapping path (Task 2) runs admin SQL through BQ
jobs API which uses BQ-native syntax (backticks for dashed identifiers).
_run_materialized_pass distinguishes due-check skips from in-flight
skips and never calls state.set_error for either. summary['skipped']
becomes a list of {table, reason} dicts; the end-of-pass log line
breaks out the in_flight subcount.
Hoists is_table_due to module-level import so test monkeypatching of
the symbol intercepts the call (the previous local import made
patches a no-op).
- extractor._try_acquire_file_lock: close fd and re-raise on non-
BlockingIOError from fcntl.flock (read-only fs, unsupported flock,
fd exhaustion). Pre-fix the fd leaked silently and the underlying
OSError still propagated past the caller.
- extractor: reorder module-level layout so logger is bound before
the new lock-related helpers reference it. Deferred import of
app.instance_config inside _get_lock_ttl_seconds documented inline.
- extractor: comment _table_locks unbounded-by-design rationale.
- tests: docstring + monkeypatch-target rationale for the two
concurrency tests where the contract isn't obvious from the body.
- Verb renames (da X -> agnes X for surviving verbs; legacy verbs already
absent from this default template — admin overrides with legacy verbs are
caught by Task 2's _LEGACY_STRINGS scan + Task 5's admin banner).
- Path renames: data/parquet/ -> server/parquet/, data/duckdb/ ->
user/duckdb/, data/metadata/ removed entirely (no longer exists per spec).
- Drop user/artifacts/ from directory structure (spec workspace layout
drops it; surviving paths: server/parquet/, user/duckdb/, user/snapshots/,
user/sessions/).
- Add AGNES_WORKSPACE.md pointer near top-of-template so analysts know
where to find human-readable docs.
Cleans Task 0.5's missed sweep on this file (was not in cli/ tree but is
user-visible via /api/welcome).
81 claude_md/welcome_template tests pass.
Two Task 4 review fixes for app/web/templates/install.html:
1. JSON-escape `ROLE` JS const via `{{ role | tojson }}` (defense in
depth — removes the dependency on Jinja autoescape semantics for JS
contexts; FastAPI's Literal validator already constrains role values).
2. Verify the analyst tile's clipboard payload is the analyst layout.
The pre-existing role-aware plumbing (compute_default_agent_prompt
threading role into setup_instructions_lines, picked up by the JS
SETUP_INSTRUCTIONS_TEMPLATE array) was correct; adding regression tests
that pin to the JS clipboard block specifically so a future inversion
would fail loudly.
Tests: analyst clipboard contains `agnes init` + `agnes catalog` and
NOT `agnes auth import-token` / `agnes skills`; admin clipboard is the
inverse. Plus an explicit assertion that ROLE is rendered via tojson.