Commit graph

599 commits

Author SHA1 Message Date
ZdenekSrotyr
8403529fcd test: clean-install integration suite (minimal/zero grants, force, pre-init) 2026-05-04 19:22:24 +02:00
ZdenekSrotyr
42e108ae5e test: reader smoke matrix on zero-grants workspace 2026-05-04 19:15:39 +02:00
ZdenekSrotyr
a47c2be282 test: clean-bootstrap fixtures (fastapi_test_server, test_pat, zero_grants_workspace)
Task 20: reusable pytest fixtures for the clean-bootstrap test suite.
Tasks 21 and 22 (reader smoke matrix + init smoke matrix) consume them.

- fastapi_test_server boots a real uvicorn subprocess against a tmp DATA_DIR,
  pre-seeded with admin@example.com (Admin group), analyst@example.com
  (Everyone group), and three tables (one per query_mode: local /
  materialized / remote).
- web_session: cookie-authenticated httpx.Client for the admin user.
- test_pat: minted JWT for the analyst with table grants on local +
  materialized.
- test_pat_no_grants: same shape, zero resource_grants.
- zero_grants_workspace: subprocess invocation of `agnes init` against the
  no-grants PAT; returns the bootstrapped workspace path.
- NONEXISTENT_TABLE: module-level sentinel for the upcoming reader matrix.

Subprocess uvicorn (mirrors tests/test_e2e_corporate_memory.py) instead of
in-thread so DATA_DIR + module-level singletons in src.db don't bleed
across tests. agnes CLI invoked via `python -m cli.main` instead of the
.venv/bin/agnes shim, which depends on .pth file visibility that iCloud
Drive intermittently re-hides on macOS.
2026-05-04 19:11:54 +02:00
ZdenekSrotyr
8d9323c99e docs(claude-md): sweep surviving-verb da X references (Task 19 follow-up) 2026-05-04 19:01:27 +02:00
ZdenekSrotyr
3990fb0d85 docs(claude-md): rewrite verbs + paths for new CLI surface 2026-05-04 19:00:31 +02:00
ZdenekSrotyr
7e1dd1adba refactor(cli): drop sync/fetch/analyst/metrics; register init/pull/push (BREAKING) 2026-05-04 18:59:51 +02:00
ZdenekSrotyr
5551f12bb0 fix(cli): hint text 'Run: da sync' → 'Run: agnes pull' 2026-05-04 18:42:21 +02:00
ZdenekSrotyr
ff5da0af90 feat(cli): agnes admin metrics {import,export,validate} 2026-05-04 18:39:05 +02:00
ZdenekSrotyr
42b8d0309b feat(cli): agnes catalog --metrics replaces da metrics list/show 2026-05-04 18:33:17 +02:00
ZdenekSrotyr
8309141705 feat(cli): agnes snapshot create (folded from da fetch); friendly exit if no DuckDB 2026-05-04 18:32:30 +02:00
ZdenekSrotyr
5e1e8c4e14 feat(cli): agnes status = workspace state; old health check moves to agnes diagnose system 2026-05-04 18:29:15 +02:00
ZdenekSrotyr
b799aa534a fix(cli): I1+I2 review — surface manifest_unauthorized + add 3 typed-error tests 2026-05-04 18:19:35 +02:00
ZdenekSrotyr
9b70ca3069 feat(cli): agnes init orchestrator + AGNES_WORKSPACE.md template 2026-05-04 18:15:08 +02:00
ZdenekSrotyr
60b6fbed97 feat(cli): agnes push command (extracted from sync --upload-only) 2026-05-04 18:09:57 +02:00
ZdenekSrotyr
7f89e1d594 feat(cli): agnes pull command (Typer wrapper around lib.pull.run_pull) 2026-05-04 18:07:28 +02:00
ZdenekSrotyr
15004126de fix(cli-lib): I1+I2+I3 review fixes — token-precedence note, sync-state TODO, dry-run hermeticity test 2026-05-04 18:04:56 +02:00
ZdenekSrotyr
37da602060 feat(cli-lib): cli/lib/pull.py:run_pull primitive with lazy mkdir 2026-05-04 18:00:57 +02:00
ZdenekSrotyr
2b3d62fbf5 chore(.gitignore): allowlist cli/lib/ from generic lib/ rule (Task 7 follow-up) 2026-05-04 17:54:00 +02:00
ZdenekSrotyr
5aebeabf23 feat(cli-lib): cli/lib/hooks.py:install_claude_hooks 2026-05-04 17:53:20 +02:00
ZdenekSrotyr
d25d075ed2 docs(claude-md-template): rewrite verbs + paths for new CLI surface (Task 6)
- Verb renames (da X -> agnes X for surviving verbs; legacy verbs already
  absent from this default template — admin overrides with legacy verbs are
  caught by Task 2's _LEGACY_STRINGS scan + Task 5's admin banner).
- Path renames: data/parquet/ -> server/parquet/, data/duckdb/ ->
  user/duckdb/, data/metadata/ removed entirely (no longer exists per spec).
- Drop user/artifacts/ from directory structure (spec workspace layout
  drops it; surviving paths: server/parquet/, user/duckdb/, user/snapshots/,
  user/sessions/).
- Add AGNES_WORKSPACE.md pointer near top-of-template so analysts know
  where to find human-readable docs.

Cleans Task 0.5's missed sweep on this file (was not in cli/ tree but is
user-visible via /api/welcome).

81 claude_md/welcome_template tests pass.
2026-05-04 17:51:14 +02:00
ZdenekSrotyr
a92c624dba feat(admin): yellow banner for legacy CLI verbs in workspace-prompt override 2026-05-04 17:46:50 +02:00
ZdenekSrotyr
8091620d33 fix(setup): role-aware clipboard render + JSON-escape ROLE injection
Two Task 4 review fixes for app/web/templates/install.html:

1. JSON-escape `ROLE` JS const via `{{ role | tojson }}` (defense in
   depth — removes the dependency on Jinja autoescape semantics for JS
   contexts; FastAPI's Literal validator already constrains role values).

2. Verify the analyst tile's clipboard payload is the analyst layout.
   The pre-existing role-aware plumbing (compute_default_agent_prompt
   threading role into setup_instructions_lines, picked up by the JS
   SETUP_INSTRUCTIONS_TEMPLATE array) was correct; adding regression tests
   that pin to the JS clipboard block specifically so a future inversion
   would fail loudly.

Tests: analyst clipboard contains `agnes init` + `agnes catalog` and
NOT `agnes auth import-token` / `agnes skills`; admin clipboard is the
inverse. Plus an explicit assertion that ROLE is rendered via tojson.
2026-05-04 17:43:46 +02:00
ZdenekSrotyr
44234ba3ae test(setup): add mutation-resistant ternary-direction assertion (Task 4 polish) 2026-05-04 17:37:54 +02:00
ZdenekSrotyr
7965f8021d fix(setup): role-aware PAT scope+TTL in setupNewClaude JS (Task 4 spec fix) 2026-05-04 17:34:30 +02:00
ZdenekSrotyr
f731ee7897 feat(setup): /setup?role=analyst|admin branching with role tiles 2026-05-04 17:28:47 +02:00
ZdenekSrotyr
54f83c281c test(setup): I1+I2 review fixes — AGNES_WORKSPACE.md alignment + step-number pin 2026-05-04 17:23:15 +02:00
ZdenekSrotyr
ae00945cbf fix(setup): clean stale 'da' refs in setup_instructions.py (Task 0.5 missed sweep) 2026-05-04 17:19:55 +02:00
ZdenekSrotyr
29e28ccbd3 feat(setup): add analyst role to install-prompt renderer 2026-05-04 17:17:59 +02:00
ZdenekSrotyr
59324f9361 feat(admin): scan CLAUDE.md override for legacy strings 2026-05-04 17:10:58 +02:00
ZdenekSrotyr
68639e54cf test(tokens): tighten scope-default + add precedence + audit + reserved-key tests 2026-05-04 17:07:02 +02:00
ZdenekSrotyr
4ee7323436 feat(tokens): add scope + ttl_seconds fields with bootstrap-analyst clamp 2026-05-04 17:00:54 +02:00
ZdenekSrotyr
8fbf4c7873 refactor: Task 0.5 amendments — README/ARCHITECTURE sweep + main.py install hint + drop dead AGNES_SERVER_URL 2026-05-04 16:55:55 +02:00
ZdenekSrotyr
ed371c84d1 refactor(docs): sweep DA_* env vars + surviving da-verbs in docs/*.md (Task 0.5 fix) 2026-05-04 16:43:15 +02:00
ZdenekSrotyr
1563b05f2e refactor(cli): hard-cutover env vars + config dir to AGNES_*
Task 0.5 of clean-analyst-bootstrap. Greenfield rewrite — no fallback,
no aliases. Existing dev environments lose their cached PAT and must
re-authenticate.

Env var renames (hard cutover):
- DA_CONFIG_DIR    -> AGNES_CONFIG_DIR
- DA_SERVER        -> AGNES_SERVER
- DA_SERVER_URL    -> AGNES_SERVER_URL  (test-only stale ref, not in spec)
- DA_NO_UPDATE_CHECK -> AGNES_NO_UPDATE_CHECK
- DA_LOCAL_DIR     -> AGNES_LOCAL_DIR
- DA_TOKEN         -> AGNES_TOKEN
- DA_STREAM_RETRIES -> AGNES_STREAM_RETRIES

Config dir rename: ~/.config/da/ -> ~/.config/agnes/ (across code,
comments, docstrings, error messages, install templates, dev scripts).

Stale `da X` references in CLI source (and adjacent app/, tests/):
swept docstrings, comments, help text, and error messages where the
verb survives the rewrite (init, pull, push, catalog, status, diagnose,
auth, admin, skills, query, schema, describe, explore, disk-info,
snapshot, login, logout, whoami, server, setup) and replaced `da X`
with `agnes X`. Intentionally kept `da sync`, `da fetch`, `da analyst`,
`da metrics` — those verbs are removed in later tasks; the legacy
strings will be detected by `_LEGACY_STRINGS` (added in Task 2).

Test fixes:
- TestCLIVersion now asserts output starts with `agnes ` (was `da `).

Test results: 2675 passed, 25 skipped (full pytest run, excluding 9
pre-existing test_db.py / test_user_management.py / test_e2e_extract.py
/ test_cli_binary_rename.py failures unrelated to this rename).
2026-05-04 16:35:44 +02:00
ZdenekSrotyr
8c8cdf6a6a feat(cli): rename binary from da to agnes (BREAKING) 2026-05-04 16:05:14 +02:00
ZdenekSrotyr
841dcc8447 docs(spec+plan): round-4 review fixes — rename hygiene
5 must-fixes from rename-hygiene review:

- B1: test command arrays ["da", ...] -> ["agnes", ...] in spec lines
  433-446, 466, 502 and plan reader smoke matrix + clean-install
  integration tests + readers-in-pre-init test (39 occurrences)
- B2: ~/.local/bin/da -> ~/.local/bin/agnes (binary path string in
  spec data flow + plan AGNES_WORKSPACE.md uninstall table)
- B3: CHANGELOG missing BREAKING bullet for binary rename — added in
  both spec and plan CHANGELOG drafts
- B4: plan CHANGELOG typo "previous agnes status" -> "previous da
  status" (server-health overview was historically da status)
- B5: spec Components table missing row for the binary rename — added
  Client-side row mapping pyproject.toml + cli/main.py changes to
  plan Task 0
2026-05-04 15:57:07 +02:00
ZdenekSrotyr
5e7fa418d1 docs(spec+plan): rename CLI binary from da to agnes (BREAKING)
- Spec rev 5: branding consistency. New CLI verbs use agnes prefix
  (agnes init, agnes pull, agnes push, agnes catalog, agnes status,
  agnes snapshot create, agnes admin, …).
- Plan: add Phase 0 / Task 0 — pyproject.toml [project.scripts] entry
  rename to "agnes = cli.main:app" + Typer(name="agnes") in cli/main.py.
- Legacy command references (da sync, da fetch, da analyst setup,
  da metrics) keep their da prefix throughout — they're historical
  artifacts being removed (preserved in CHANGELOG Removed section,
  _LEGACY_STRINGS constant for admin override scan, etc.).

Bulk rename via Python regex with verb whitelist: 286 verb refs
rewritten in plan, 265 in spec; 104+72 legacy refs restored to "da"
post-pass (false positives where the doc was describing the legacy
flow being replaced).
2026-05-04 15:50:44 +02:00
ZdenekSrotyr
fb8f55c335 docs(plan): clean-analyst-bootstrap implementation plan
25 tasks across 6 phases:
- Phase 1: server-side foundation (PAT scope/TTL, legacy-strings scan,
  /setup?role= branching, claude_md_template)
- Phase 2: cli/lib/ shared-library tree (hooks.py, pull.py)
- Phase 3: new commands (init, pull, push, status, diagnose system,
  snapshot create, catalog --metrics, admin metrics)
- Phase 4: wiring + delete obsolete (sync, fetch, analyst, metrics)
- Phase 5: test fixtures + reader smoke matrix + clean-install integration
- Phase 6: CHANGELOG + final verification

TDD discipline throughout: test → fail → implement → pass → commit per task.
2026-05-04 15:22:10 +02:00
ZdenekSrotyr
b2cc6517aa docs(spec): rev 4 — round-3 review fixes; cleared for implementation
Round-3 review (2 load-bearing + 5 quality-of-life):
- web_session fixture endpoint: POST /auth/password/login/web (form),
  not POST /auth/token (which doesn't exist); PAT mint requires session
  cookie via require_session_token
- Renderer wording: "adopt" not "reuse" — da init/da pull are first-time
  adopters; synthetic {"detail": {"kind": ..., "hint": ...}} pattern
  documented (cli/commands/query.py:152, 165)
- cli/lib/__init__.py mention for Hatchling wheel inclusion
- _LEGACY_STRINGS constant + _scan_legacy_strings helper home named in
  app/api/claude_md.py
- /api/welcome?server_url= query param explicit in data flow
- ttl_seconds upper bound 315_360_000 (matches existing 3650 d cap)
- Conftest line range 50-82 -> 50-83
2026-05-04 15:10:40 +02:00
ZdenekSrotyr
b52a37b680 docs(spec): rev 3 — round-2 review fixes + main-sync (post-0.32.0)
Round-2 review (N1-N10):
- N1 Token CLI keeps current `da auth token` location (not top-level)
- N2 `da catalog --metrics --show <id>` decided in Components, dropped
  from Open questions
- N3 `_install_claude_hooks` migrated to new `cli/lib/hooks.py` module
- N4 Test sentinel `__nonexistent__` documented in fixtures
- N5 `web_session` fixture uses real `POST /auth/token` with seeded password
- N6 `AGNES_WORKSPACE.md` content asserts (PAT not leaked, placeholders
  substituted) added to clean-install integration test
- N7 Admin UI legacy-strings banner concretized: `legacy_strings_detected`
  field + yellow banner in `/admin/workspace-prompt` editor
- N8 `da metrics export/validate` relocate to `da admin metrics …`
  alongside `import`
- N9 Bootstrap PAT verify endpoint switched from `/api/health` (unauth)
  to `/api/catalog/tables` (PAT-validating, matches `da auth import-token`)
- N10 New `cli/lib/pull.py` and `cli/lib/hooks.py` modules inventoried

Main-sync (rebased on 0.32.0 / #160):
- Reconsidered: keep `da skills list / show` as analyst-side discovery
  (skill content was strengthened by #160 cost-guardrail/registry rails)
- Bigger CLAUDE.md (repo-root) rewrite scope acknowledges new sections
- `cli/error_render.py` (added in 0.32.0) reused by `da init` and
  `da pull` for consistent typed-error UX
- Test fixtures piggyback on autouse `_reset_module_caches` from
  tests/conftest.py:50-82 (added in 0.32.0)
2026-05-04 15:05:44 +02:00
ZdenekSrotyr
973e96e6af docs(spec): apply round-1 review — orphan/wiring fixes
11 minimum edits + minor cleanups from first review pass:

- TTL field: add `ttl_seconds` alongside existing `expires_in_days` in
  PAT request, force-clamp to 1h for scope=bootstrap-analyst
- Drop layered per-workspace config (no defined producer; multi-instance
  is edge case) — moved to Open questions
- CLAUDE.md producer: explicit `GET /api/welcome` flow, address admin
  DB-override migration
- URL: `/install` → `/setup?role=...` (matches existing routing; legacy
  `/install` keeps 302 redirect)
- `da metrics import` → relocated under `da admin metrics import`
- Repo-root CLAUDE.md added to rewritten files list
- CLI surface: enumerate `da explore`, `da disk-info`, `da snapshot
  refresh/prune`, full `da snapshot create` flag list, `da push` flags
- Test fixtures section: contracts for `fastapi_test_server`, `test_pat`,
  `test_pat_no_grants`, `zero_grants_workspace`, `web_session`, `client`
- Cleanups: line ref for `_install_claude_hooks`, admin-namespace
  disclaimer, `da init`-doesn't-call-`da auth login` note, audit-log
  consumer note for PAT scope, `{workspace_path}` placeholder usage
2026-05-04 14:59:40 +02:00
ZdenekSrotyr
8cbf0aa818 docs(spec): clean analyst bootstrap — greenfield redesign
Single-paste web→Claude Code workspace bootstrap. PAT-only auth,
non-interactive `da init`, slim CLI surface (init/pull/push/snapshot),
lazy-mkdir contract for empty-folder discipline, AGNES_WORKSPACE.md
human docs, layered per-workspace config, comprehensive clean-install
verification protocol.

Greenfield rewrite — drops `da analyst setup`, `da sync`, `da fetch`,
`da metrics`, `da skills`. Removes legacy `data/{parquet,duckdb,metadata}/`
and `user/artifacts/` workspace dirs.
2026-05-04 14:59:40 +02:00
ZdenekSrotyr
baa2efe57f
Merge pull request #168 from keboola/zs/fix-160-remote-query-view-entities
fix: #160 da query --remote works for BQ VIEW/MATERIALIZED_VIEW entities + cost guardrail + closing pre-existing RBAC bypass
2026-05-04 14:48:05 +02:00
ZdenekSrotyr
cf8930b593 chore(release): cut 0.32.0 — #160 da query --remote on VIEW + 4 reinforcing fixes
CHANGELOG: rename [Unreleased] → [0.32.0] — 2026-05-04, prepend a new
empty [Unreleased] for next-PR landing zone.
pyproject.toml: version 0.31.0 → 0.32.0.

Per repo discipline (memory: feedback_release_cut_with_pr.md), the
release-cut commit lands as the FINAL commit of the PR that contained
the user-visible behavior change — it does not get a separate PR.

After merge: tag v0.32.0 on the merge commit + create a GitHub Release
(memory: feedback_github_release_per_tag.md — the tag alone isn't
enough; the Release prose is the operator-visible artifact).

Headline: closes #160. da query --remote now resolves query_mode='remote'
BQ rows whose entity is VIEW or MATERIALIZED_VIEW (the bug Pavel hit).
Plus 4 reinforcing fixes — server-side cost guardrail (bq_max_scan_bytes,
default 5 GiB), registry-gating of direct bq.* paths, bigquery_query()
function-call backdoor closed, structured CLI render of typed BQ errors —
and one operator-side admin convenience (BQ test-connection endpoint +
billing_project placeholder UI).

14 issues caught and addressed across 6 iterations of Devin Review.
E2E verified on agnes-zsrotyr.groupondev.com (commit 7f743d03):
  - VIEW path resolves (count=23 from active_inventory_view)
  - VIEW aggregate parity vs filtered BASE TABLE
  - cost guardrail rejects with structured 400 detail
  - bq_path_not_registered 403 (incl. quoted "bq" variant)
  - bigquery_query() blocklist 400
  - test-connection endpoint 200 with elapsed_ms
2026-05-04 14:37:52 +02:00
ZdenekSrotyr
7f743d0392 fix(cli): #168 review iter 6 — render empty-string diagnostics
Devin Review iter #6 found 2 issues.

🟡 BUG: cli/error_render.py filtered out empty-string values via
`detail[key] not in (None, "")` and `value not in (None, "")` before
they could reach `_kv_line`. But `_kv_line` was specifically designed
to render empty strings as `(empty)` — the filter shadowed that
branch. The hidden field happens to be the most operator-actionable
one in `cross_project_forbidden`: `billing_project: ""` is the exact
diagnostic confirming WHY USER_PROJECT_DENIED fires.

Change filter to `is not None`. Empty strings now flow through
`_kv_line` and render as `billing_project: (empty)`.

📝 ANALYSIS: CHANGELOG wording for the test-connection endpoint said
"the saved data_source.bigquery config", which Devin flagged as
slightly misleading because `get_bq_access` is `@functools.cache`d —
"Test connection" tests the config in the running process, not the
just-saved YAML overlay. The save flow already returns
`restart_required: True` and the UI shows a banner, so the behavior
is documented; only the CHANGELOG wording was loose. Tightened to
"the **process-cached** BqAccess... Tests the config active in the
running process — after a save the response includes restart_required;
click Test AFTER restart to validate the freshly-saved values."

New test: test_renders_empty_string_as_empty_marker locks in the
empty-string-as-(empty) rendering for the cross_project_forbidden
case so a future filter change won't silently drop the diagnostic
again. 9 affected render tests pass.
2026-05-04 14:30:43 +02:00
ZdenekSrotyr
4bd1919f77 fix(query): #168 review iter 5 — forbidden-table check uses registry IDs
Devin Review iter #5 flagged a pre-existing class of name/id mismatch
in app/api/query.py:131-136 — the SAME root cause as the bq.* RBAC
issue I fixed in iter #3 (line 332/362). Devin called it out as
"NOT introduced by this PR" / "might merit follow-up", but it's
exactly the same security-boundary pattern this PR is hardening, so
fixing here keeps the RBAC story consistent across the handler.

The `forbidden = all_views - set(allowed)` comparison mixed types:
- `all_views` carries DuckDB master view names (= registry display
  `name` from the orchestrator's CREATE VIEW)
- `set(allowed)` carries registry IDs (resource_grants.resource_id)

When `id != name` (e.g. id="bq.finance.ue", name="ue"), authorized
users got spurious 403s — the view name landed in `forbidden` even
though the caller had a valid grant on the registry id.

Build a name->id map from the registry, then the forbidden check
compares apples to apples:
    allowed_view_names = {r["name"] for r in registry_rows
                          if r.get("name") and r.get("id") in allowed_ids}
    forbidden = all_views - allowed_view_names

107 affected tests pass; 487 pass in wider RBAC/query/access/admin
domain — no regressions.
2026-05-04 14:18:43 +02:00
ZdenekSrotyr
a43533ca44 fix(cli): #168 review iter 4 — render reason when both kind+reason set
Devin Review iter #4 caught: `_format_dict` in cli/error_render.py
seeded `seen = {"kind", "reason"}` to keep both out of the kv block.
But the label line uses only ONE of them (`kind or reason or "error"`),
so the other was silently dropped.

Quota rejections at app/api/query.py:423 (daily-budget) and 488
(concurrent-slot) emit BOTH keys: `{reason: "daily_byte_cap_exceeded",
kind: "daily_bytes", ...}` and `{reason: "concurrent_slot_exceeded",
kind: "concurrent_scans", ...}`. Operator only saw `kind` in the label
and never the more specific `reason` value.

Fix: track which key actually went into the label and skip only that
one. The other appears in the kv section.

Verified output:
  Error: daily_bytes (HTTP 429)
    reason: daily_byte_cap_exceeded
    current: 99999
    ...

8 affected render tests pass.
2026-05-04 14:04:16 +02:00
ZdenekSrotyr
28aba4c1f9 fix(query): #168 review iter 3 — RBAC name-vs-id, placeholder dead code
Devin Review iter #3 found 3 new real bugs after iter #2's fixes landed.

🔴 RBAC check at app/api/query.py:362 used `row["name"]` against
`accessible_set`, but `accessible_set` is keyed by registry IDs
(`get_accessible_tables` returns `resource_grants.resource_id` —
table IDs, not display names). Confirmed by `_table_blocks` projection
at `app/resource_types.py:157-158`. When `id != name` (e.g.
`id="bq.finance.ue", name="ue"`), non-admin users with valid grants
got 403 `bq_path_access_denied`. Switch to `row["id"]`.

🚩 Bare-name pass at app/api/query.py:332 had the same name-vs-id
mismatch (different impact): legitimate accessible rows were skipped
from `dry_run_set`, so the cost guardrail under-counted scan bytes
for non-admin users. Could let an over-cap query through and
under-bill quota. Switch to `row_id` comparison.

🟡 `placeholder_from` for billing_project was dead code.
`_BQ_OPTIONAL_FIELD_DEFAULTS["billing_project"] = ""` seeded an empty
string into every GET payload via `_ensure_bq_optional_fields`. JS
`isUnset = (value === undefined)` evaluated False, so the
`(defaults to <project>)` placeholder NEVER rendered. Drop the seed —
field stays in `known_fields` (UI sees it) but routes through the
unset rendering path on GET, where placeholder_from fires.

Tests: test_get_surfaces_bq_fields_even_when_unset assertion flipped
from "billing_project IS present" to "billing_project NOT auto-seeded"
to lock in the new shape. 67 affected tests pass.
2026-05-04 13:51:36 +02:00
ZdenekSrotyr
5eaa449fcc fix(query): #168 review iter 2 — quota user_id parity + concurrent-slot 429
Devin Review iter #2 found 2 new issues (after iter #1's 5 fixes
landed). Both real, both addressed.

🔴 Quota user_id key mismatch defeated shared daily budget. /api/query
computed `user.get("id") or user.get("email")` while /api/v2/scan uses
`user.get("email") or "anon"` (app/api/v2_scan.py:327). Same user → two
different keys in the singleton QuotaTracker. BQ bytes consumed via
/api/query were tracked under UUID; via /api/v2/scan under email; the
`check_daily_budget` pre-flight on either endpoint never saw the
other's recorded bytes — per-user cap was effectively doubled. Match
v2/scan's email-first ordering.

🟡 QuotaExceededError(KIND_CONCURRENT) → 400 instead of 429.
`quota.acquire(user_id)` raises this from __enter__ when the per-user
concurrent-scan slot is at cap. The exception propagated through the
@contextlib.contextmanager generator, the caller's `with guard:`
block, and was caught by execute_query's generic `except Exception`
handler → mapped to 400 with a flattened "Query error: concurrent_scans:
N/M" string, dropping the typed retry_after_seconds field. Wrap the
`with quota.acquire(...)` in a try/except QuotaExceededError that maps
to 429 with the same typed-detail shape used for the daily-budget
rejection — consistent with /api/v2/scan:392-402.

Tests: test_api_query_quota.py user_id strings updated to
"admin@test.com" (the seeded_app admin's email) to match the new
email-first ordering. 40 affected tests pass.
2026-05-04 13:38:31 +02:00
ZdenekSrotyr
1263b80726 fix(query): #168 review — concurrent-slot wraps execute, doc/JS fixes
Devin Review on PR #168 found 5 issues — all real, all addressed.

🚩 ANALYSIS_001 (architectural): concurrent-slot guard didn't protect
actual BQ query execution. Earlier `_enforce_remote_bq_quota_and_cap`
ran dry-run + cap check inside `with quota.acquire(user_id):`, then
returned — releasing the slot BEFORE `analytics.execute(...)` ran. Spec
§4.3.3 explicitly designs the slot to wrap execute so the per-user
concurrent cap limits BQ scans, not just dry-runs.

Refactor to a context manager `_bq_quota_and_cap_guard`. Caller's `with`
block now holds the slot through dry-run, cap check, the actual
`analytics.execute(...)` (which is what triggers the BQ scan when DuckDB
resolves the master view), AND the post-flight record_bytes. Slot
released only when caller's `with` body exits.

🟡 BUG_001: placeholder JS walked `original` (full GET payload root)
instead of `original.sections`. `placeholder_from: ["data_source",
"bigquery", "project"]` is a section-relative path, so billing_project
placeholder NEVER rendered. Fix: walk `original.sections` (with fallback
to `original` for safety).

🟡 BUG_002 + BUG_003: admin_tables.html register and edit modals'
operator help text referenced `max_bytes_per_remote_query` (the old
name from the spec) but the actual config key is `bq_max_scan_bytes`
after the fix-up commit `6423888d` moved it. Replace both occurrences.

🟡 BUG_004: CHANGELOG entry said `api.query.bq_max_scan_bytes` (the
old path) but the read at app/api/query.py:53 is
`get_value("data_source", "bigquery", "bq_max_scan_bytes", ...)`. An
operator who set it under `api.query` in their yaml would have no
effect. Correct path in CHANGELOG.

All 95 #160-affected tests pass after the changes.
2026-05-04 13:28:03 +02:00