Adds a community-driven Store where any authenticated user uploads
skills/agents/plugins as ZIPs, plus /my-ai-stack as the per-user
composition view. The served Claude Code marketplace is now:
(admin_granted ∖ opt_outs) ∪ store_installs
Skill + agent installs are merged into a single `agnes-store-bundle`
plugin in the served marketplace; type=plugin uploads stay standalone.
Names are suffixed with `-by-<owner-username>` at upload time so two
owners can use the same display name without colliding in Claude Code's
flat skill/agent namespace.
Schema v23 → v24 adds three tables:
- store_entities — community-uploaded skills/agents/plugins
- user_store_installs — what each user has chosen to install
- user_plugin_optouts — opt-out overlay on top of admin grants
Admin grant-delete drops every user's opt-out for that plugin so
re-grant resets cleanly to enabled (no sticky personal preference).
UI:
- /store — e-commerce-style listing with type/category/owner
filters, search, pagination, owner-aware [Install]
buttons, clickable cards
- /store/new — 2-step upload wizard with drag & drop, preview
validation (POST /api/store/entities/preview), docs
multi-upload, photo + video URL
- /store/{id} — detail page with hero, file list, docs, owner
actions (Edit/Delete) for the uploader
- /my-ai-stack — Granted plugins (toggle opt-out) + From the Store
(uninstall) sections
- Admin nav: Marketplaces moved into Admin dropdown, renamed to
"Curated Marketplaces"
Validation hardening: type-mismatch guards reject skill ZIP uploaded as
agent (or vice versa), and plugin ZIPs masquerading as skills/agents.
Human-readable error messages mapped client-side from machine codes.
Cross-source naming: Store entity-id-prefixed dirs (`plugins/store-<id>/`)
plus the bundle (`plugins/store-bundle/`) avoid collisions with admin
marketplaces (whose `store` slug is reserved by `is_valid_slug`).
Bundle composition is content-hashed at serve time — install/uninstall
or owner re-upload bumps the bundle's plugin.json `version`, so Claude
Code's auto-update toggle picks up changes.
Tests: 50+ new tests across naming, repositories, filter (admin ∪ store
∪ bundle), API (upload/install/uninstall/delete/preview/docs), end-to-end
marketplace.zip with bundle merging.
GET /api/health/detailed now returns a session_pipeline service entry.
Heuristic:
max(mtime of /data/user_sessions/**/*.jsonl) <=
max(processed_at in session_extraction_state) + grace_seconds
grace_seconds = 2 × verification-detector cadence (default 30 min;
configurable via SCHEDULER_VERIFICATION_DETECTOR_INTERVAL).
When the assert fails, status='warning' (never 'error') with an
actionable detail pointing at the verification-detector scheduler job.
A warning bubbles up to the existing overall='degraded' aggregation —
operators querying /api/health/detailed (or /agnes diagnose system)
get a clear breadcrumb instead of a silently-broken pipeline.
Cold-start case (no session files, or files newer than the grace
window with empty state table) is handled explicitly to avoid noise
on a fresh deploy.
Tests: tests/test_health_session_pipeline.py.
The /corporate-memory page filters status IN ('approved','mandatory')
and showed no hint that pending items exist. With approval_mode set to
'review_queue' (the default in instance.yaml.example), every collection
run would silently funnel new items into the pending bucket where no
operator ever saw them.
For admins (is_km_admin), the page now renders a banner above the
stats bar:
N pending items awaiting review — review them at /corporate-memory/admin
Non-admins see no change (the route zeroes the count server-side
before passing to the template, so the hint is never leaked).
Tests: tests/test_corporate_memory_page.py.
**BREAKING** for operators using `COMPOSE_PROFILES=full` or custom
Compose overrides that referenced these stanzas — they're gone in
docker-compose.yml and docker-compose.prod.yml. The scheduler-v2 model
(previous commit) is now the sole driver: every cadence is a job in
services/scheduler/__main__.py:JOBS hitting an admin HTTP endpoint.
Why drop instead of keep behind `profiles: [full]`:
- The previous stanzas were tight `restart: unless-stopped` boot loops.
When the scheduled run ended (every cycle), Docker re-spawned the
container, defeating any cadence the service intended.
- The whole point of #176 is that there's now exactly one driver. Two
drivers (scheduler HTTP + standalone container loop) would race on
the same /data/user_sessions and knowledge_items writes.
- Removing the stanzas is a louder signal than commenting them out —
operators upgrading get a clean failure mode (no stale containers),
not a silently double-driven pipeline.
The Python entry points (services/{corporate_memory, session_collector,
verification_detector}/__main__.py) stay — they're still callable from
the CLI for manual one-shot runs and from the new admin endpoints.
docs/architecture.md updated to reflect the new schedule table.
tests/test_docker_compose.py pins the contract: the two services must
not reappear under either Compose file.
The session-collector, verification-detector, and corporate-memory
services now run on the same scheduler-v2 model that already drives
data-refresh, health-check, script-runner, and marketplaces:
- New admin endpoints in app/api/admin.py:
POST /api/admin/run-session-collector
POST /api/admin/run-verification-detector
POST /api/admin/run-corporate-memory
All admin-gated, sync-def (FastAPI thread pool), with one audit row
per invocation. Same single-writer-of-system.duckdb pattern as the
existing /api/marketplaces/sync-all job.
- services/scheduler/__main__.py JOBS gains three entries with offset
cadences (10m / 15m / 17m, all coprime modulo the 30s tick) so the
three LLM-backed jobs don't fire on the same tick and stack their
API + DB load.
- The verification-detector endpoint surfaces the LLM factory's
fail-fast ValueError as HTTP 500 with the actionable message,
preserving the no-silent-skip contract from the previous commit.
Tests:
- tests/test_admin_run_endpoints.py covers admin gating + scheduler
registration + endpoint contract.
- tests/test_scheduler_sidecar.py existing tests continue to pass.
POST /api/admin/configure now writes a default ai: block into the
instance.yaml overlay when the request leaves it untouched and either
ANTHROPIC_API_KEY or LLM_API_KEY is set in the environment. The block
references the env var via ${VAR} syntax — secrets never land in YAML.
connectors.llm.factory grows create_extractor_from_env_or_config which
falls back to ANTHROPIC_API_KEY / LLM_API_KEY when ai_config is empty
and raises a clear ValueError when neither is available. Both
services/corporate_memory and services/verification_detector switch to
the new helper, replacing the old 'silently skip when ai: missing'
path that was the silent-failure root cause.
Tests:
- tests/test_setup_ai_block.py — overlay seeding contract.
- tests/test_llm_provider_env_fallback.py — fallback + fail-fast.
LLM provider SDKs are imported by services/corporate_memory and
services/verification_detector — both production code paths. Listing
them only in [project.optional-dependencies].dev caused the scheduler
container to boot-loop with ModuleNotFoundError on default
`docker compose up` deploys, because the Dockerfile installs core
deps only (`uv pip install --system --no-cache .`).
Adds tests/test_packaging.py to lock the contract: anthropic + openai
must live in [project].dependencies, not in dev extras.
Pre-fix: when v24 migration found rows to migrate but
data_source.bigquery.project was empty, it logged a warning per row
and returned normally. Schema_version then bumped to 24 unconditionally
→ next start's 'if current < 24:' gate skipped _v23_to_v24_finalize
forever, leaving rows in DuckDB-flavor SQL that the new
_wrap_admin_sql_for_jobs_api wrapping path rejects.
Devin escalated this from advisory ("idempotent retry") to critical
on rescan after my reply. The reply was wrong — the LIKE filter inside
the function gives idempotency IF the function is called again, but
the schema-version gate prevents that call from happening.
Fix (Devin's recommended Approach 1): raise RuntimeError BEFORE the
schema-version bump when rows need migration but project_id is empty.
The schema_version stays at 23, so on next start the 'if current < 24:'
gate fires and the migration runs again — this time with project_id
configured.
Side effect: a BQ-using deployment that hasn't set the project blocks
startup until they do. That's the right call for a config error that
would otherwise silently break all materialized tables. The error
message points at the right knob (data_source.bigquery.project +
restart).
No-rows-no-block invariant preserved: the early 'if not rows: return'
at the top of _v23_to_v24_finalize means non-BQ deployments are
unaffected.
Tests:
- test_v24_raises_when_project_not_configured_and_rows_need_migration:
asserts raise + schema_version stays at 23 (the load-bearing
invariant for retry-on-next-start to work)
- test_v24_skips_clean_when_no_rows_match_even_without_project:
asserts non-BQ deployments don't block startup
- Existing 3 tests still pass
Three items from operator feedback after running the actual flow:
(1) Help docstring lied: "--bucket / --source-table ignored" for
materialized rows. Reality: --bucket is load-bearing because
`agnes schema <name>` builds the BQ identifier as
`bq.<bucket>.<source_table>`. An empty bucket registered the row but
broke schema/describe with HTTP 400 "unsafe BQ identifier in
registry". Fix: docstring rewritten to reflect reality, plus
client-side validation rejects materialized + empty bucket with a
clear error pointing at the right knob.
(2) Post-register UX cliff: `agnes pull` after register-table reports
"Updated 0 tables (1 total)" because registration adds a registry
row but does NOT trigger a parquet build. Operators routinely
assume something's broken when they need to run
`agnes setup first-sync` to kick off the materialization. Hint
emitted on success now points at first-sync.
(3) RBAC gotcha: `agnes catalog` is RBAC-filtered via
`resource_grants`, so non-admin users don't see freshly-registered
rows until a grant is created. Hint emitted on success now points at
`agnes admin grant create <group> table <name>`.
Tests: 8/8 in test_cli_admin_materialized.py, including two new
regression tests for the validation + the hint output.
The iterative bare-name rewriter (one re.sub per name, longest-first)
was vulnerable to cross-contamination when the GCP project ID contained
a registered table name as a hyphen-delimited word.
Concrete repro:
project = 'my-ue-project'
registered = ['orders', 'ue']
user SQL = 'SELECT * FROM orders JOIN ue ON ...'
iter 1 (orders): produces 'FROM `my-ue-project.fin.orders` JOIN ue ...'
iter 2 (ue): '\bue\b' matches 'ue' INSIDE 'my-ue-project' (hyphen
creates word boundary on both sides) — corrupts
the iter-1 path
Fallback at query.py:576 caught the resulting BQ parse error and fell
back to per-table SELECT * estimate, so impact was over-estimation,
not fail-open — but the #171 partition-pruning fix silently degraded
to pre-fix behavior whenever a project name shared a hyphen-segment
with a registered table.
Fix: single re.sub call with an alternation regex sorted longest-first.
Single-pass means each source position is processed exactly once, so
freshly-inserted backticked text from one match isn't re-scanned by
later names in the alternation.
Regression test
test_rewrite_helper_does_not_corrupt_when_project_id_contains_registered_name
covers the exact Devin repro.
`_try_acquire_file_lock` opened the lock file with `open(mode='w')`
BEFORE the mtime check, which truncated the file and refreshed mtime
to now. The subsequent age check always saw ~0, so the TTL reclaim
branch was never reachable and `materialize.lock_ttl_seconds` was
a silently no-op config knob.
Repro:
before open(w): mtime age = 100000s
after open(w): mtime age = 0s
Fix: stat the lock path BEFORE any open(). If pre-probe mtime is
older than TTL, unlink (forcing a fresh inode for the open + flock
that follows). Order is now stat-then-decide-then-probe, not
probe-then-stat-then-decide.
Two regression tests added in tests/test_bq_materialize_concurrency.py:
- test_stale_held_lock_is_reclaimed_despite_live_holder — exercises
the full reclaim path with a still-living fcntl holder. Pre-fix
this returned None (in_flight forever); post-fix returns a holder
fd on a new inode.
- test_failed_probe_does_not_self_refresh_lock_mtime — sister test
pins that a failed acquisition's mode='w' truncate doesn't
pathologically loop.
Residual cross-process risk (genuinely overrunning materialize past
TTL races a fresh attempt — both write to the same parquet.tmp,
inode-level flock independence means new acquisition succeeds while
old holder is still alive) stays documented in the helper docstring.
In-process threading.Lock keyed on table_id blocks the single-process
race; cross-process protection relies on TTL being well above
longest plausible COPY (24h default).
Adds `test_unified_flow_uses_only_agnes_verbs` that asserts no `da `
substring (with trailing space, to dodge false positives on `Darwin` /
`database` / `adapter`) appears in any of the four
`resolve_lines()` shapes:
- bare (no plugins, no ca)
- plugins only
- ca only
- plugins + ca
Also pins the `agnes init --server-url … --token …` shape — commit
8784f10a's stale-on-disk-token fix relies on `init` receiving an
explicit `--token` argument; if a future refactor drops the flag from
the emitted command the test fails loudly instead of silently
regressing to 401-on-stale-token in production.
Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 10.
The `/setup` route no longer accepts `?role=analyst|admin`. The route
signature drops the `Literal[...] = Query(...)` parameter and the
silent admin-downgrade block (`if role == "admin" and not is_admin:
role = "analyst"`). The `role` ctx variable threaded into install.html
also goes away — Task 6 cleans up the template's role-tile UI and the
JS PAT-mint ternary.
`?role=` is silently ignored by FastAPI for unknown query params, so
existing bookmarks (none in production — the param was added in this
PR and never shipped) just degrade to the unified layout. No
RedirectResponse shim needed.
Tests: drop the entire `tests/test_setup_page_roles.py` file (eight
role-branching tests that no longer apply) and add
`tests/test_setup_page_unified.py` with three tests:
- `test_setup_page_renders_unified_layout`
- `test_setup_page_ignores_role_query_param`
- `test_setup_page_renders_marketplace_for_user_with_grants`
- `test_install_legacy_path_redirects_to_setup`
Also replace the role-aware `test_install_preview_*` tests in
test_web_ui.py with unified-layout assertions.
Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 5.
Removes the `role: Literal["analyst", "admin"] = "admin"` parameter from
`compute_default_agent_prompt`. The same RBAC pass
(`marketplace_filter.resolve_allowed_plugins`) now runs for every user —
admin or not. Users with no `resource_grants` rows get the
no-marketplace layout; users with grants get the marketplace block
inserted. Admin-vs-analyst is no longer a layout branch.
`render_agent_prompt_banner` no longer derives a `role` from
`user.is_admin`; it just delegates to `compute_default_agent_prompt`.
Two `compute_default_agent_prompt(...role=role)` call sites in
`app/web/router.py::setup_page` are updated to drop the keyword so the
route keeps rendering — Task 5 will remove the `?role=` query
parameter and the silent admin-downgrade block from the route signature
itself.
Tests: drop role-aware assertions from test_welcome_template_renderer
and test_welcome_template_api. Both files now assert the unified
default contains `agnes init` + `uv tool install` and bans the legacy
`agnes auth import-token` / `agnes auth whoami` verbs.
Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 4.
Renames `_git_check_block` to `_preflight_block` and adds a
`claude --version` check beside `git --version`. Both binaries are
required by the marketplace step — git for the clone fallback,
claude for `claude plugin marketplace add` / `claude plugin install` —
so checking them together gives one clear failure instead of two
confusing downstream errors.
Install hints: `npm i -g @anthropic-ai/claude-code` for Linux / WSL
plus a doc URL (https://docs.claude.com/claude-code) for the native
macOS / Windows installers. We don't try to one-line a native
installer; the canonical instructions live upstream.
Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 3.
Adds `_step_numbers(*, has_marketplace, has_skills)` so step numbering
lives in one place instead of being split across three branches in
`resolve_lines`. Pins the unified layout in the tests:
No plugins: 1 install, 2 init, 3 catalog, 4 diagnose, 5 skills, 6 confirm
With plugins: 1, 2, 3, 4 preflight, 5 marketplace, 6 diagnose, 7 skills, 8 confirm
`agnes auth import-token` / `agnes auth whoami` are now banned from the
rendered prompt — `agnes init` subsumes them. The renamed
`test_resolve_lines_no_plugins_unified_six_step_layout` asserts those
strings are absent and that the new step headers (`Bootstrap your Agnes
workspace`, `Verify the data is queryable`) are present.
Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 2.
Removes the `role: Literal["analyst", "admin"]` parameter from
`resolve_lines` / `render_setup_instructions` and deletes the
`_resolve_analyst_lines`, `_analyst_init_lines`, `_analyst_finale_lines`
helpers. The unified flow now always emits `agnes init` (the
workspace-rails delivery mechanism) in place of the legacy
`agnes auth import-token` + `agnes auth whoami` pair, and uses
`agnes catalog` as the smoke-verify step.
`agnes init` already verifies the PAT internally, and `agnes catalog`
doubles as a data-plane smoke check, so dropping `agnes auth whoami`
costs no signal.
Drops the now-redundant `tests/test_setup_instructions_analyst.py` and
patches the one ordering test in `tests/test_setup_instructions.py` that
referenced the old "Log in" / "Verify the login" headers. Also strips
the `role=role` kwarg from `compute_default_agent_prompt`'s call into
`resolve_lines` so the welcome-template render path keeps working;
welcome_template.py's own role param is removed in a follow-up task.
Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 1.
Three Devin Review findings on PR #173 addressed in one commit since
they're in adjacent code paths:
1. cli/commands/init.py:99 (\u{1F534}): `agnes init --token NEW` ran
step 2 verify against the OLD on-disk token because `get_token()`
read `~/.config/agnes/token.json` before the env var, and
`_override_server_env` only set the env var. So `agnes init --force`
on a machine with a stale token.json failed 401 with a confusing
'token expired' even though the --token arg was valid.
Fix: ContextVar-based override in `cli.config._token_override`
checked by `get_token()` BEFORE the on-disk read.
`_with_token_override` context manager scopes the override.
`_override_server_env` now also sets the contextvar via
`_with_token_override(token)`, so both env var and contextvar
carry the override (env for back-compat with anything bypassing
get_token; contextvar is the authoritative source).
Async-safe (each task sees its own override) and leak-proof
(resets on context exit).
2 new tests: regression on stale-disk-token + scope leak guard.
2. cli/commands/status.py:43 (\u{1F7E1}): sessions_pending_upload only
checked legacy `<workspace>/user/sessions/` and always reported 0
in workspaces bootstrapped with `agnes init` (Claude Code writes
to `~/.claude/projects/`, not the legacy path). Same bug we fixed
for `agnes push` in 08e49591.
Fix: route through `cli.lib.claude_sessions.list_session_files()`
so status and push agree on what counts as a pending session.
3. connectors/bigquery/extractor.py:111 (\u{1F7E1}): docstring claimed
"a live holder still wins the second flock attempt" — incorrect on
Linux. After `unlink()` + `open()`, the new file is a new inode;
fcntl.flock keys per-inode, so the old holder's lock does NOT block
the new acquisition. In a genuine TTL-overrun scenario two writers
CAN race the parquet.tmp.
Fix: documentation only. Comment now honestly describes the
inode-recreation behavior, names the threading.Lock as the actual
in-process guard, and flags pid-gating as the next-iteration fix
if real corruption surfaces. The 24h default TTL is well above
typical COPY durations so the practical risk is low.
Tests: 17/17 across test_cli_init.py + test_lib_pull.py + the broader
regression set.
Sweep operator runbooks (docs/QUICKSTART, docs/HEADLESS_USAGE,
docs/architecture, docs/sample-data, docs/agent-workspace-prompt,
docs/metrics/metrics.yml, dev_docs/server, dev_docs/disaster-recovery),
the corporate-memory service README, the jira connector README + backfill
scripts, the deploy skill, and test docstrings. Replaces `da sync` →
`agnes pull`, `da analyst setup` → `agnes init`, `da metrics ...` →
`agnes catalog --metrics` / `agnes admin metrics ...`, `da fetch` →
`agnes snapshot create`, plus the matching docker-compose admin
invocations.
Vendor-specific `/opt/data-analyst/` install paths in jira backfill /
consistency scripts and operator docs are replaced with the
placeholder `<install-dir>` and a new `AGNES_ENV_FILE` env-var override
that lets a deployment inject its actual install path without a code
change. Aligns with the OSS vendor-agnostic policy in CLAUDE.md.
CHANGELOG `### Internal` entry summarizes the audit and reaffirms the
intentional stale-marker tuples (`_LEGACY_STRINGS`, `_OUR_COMMAND_MARKERS`)
that must keep referencing `da sync` / `da fetch` / etc. for hook upgrade
and override-detection logic.
Pre-fix `agnes pull` decided what to download from sync_state hash
equality alone:
if server_hash != local_hash or tid not in local_tables or not server_hash:
to_download.append(tid)
If the recorded local hash matched server but the actual parquet had
been deleted from disk, the download was skipped. The next DuckDB
view rebuild then fails on a missing file. Repro: `rm
server/parquet/X.parquet && agnes pull` → 'Updated 0 tables', X
still missing.
Failure modes that produce hash-equal-but-file-missing:
- manual `rm` of a single parquet
- operator-side cleanup of `server/parquet/`
- two workspaces sharing one user's
`~/.config/agnes/sync_state.json` (TODO(workspace-scoped-sync-state)
in pull.py): one workspace writes its parquets, the other reads
sync_state and concludes 'I already have these'
- disk corruption / partial restore from backup
Fix: existence check runs alongside the hash compare. Missing file
forces a re-download regardless of hash equality. `parquet_dir` is
hoisted above the loop so the existence check is in scope when the
download set is built.
Tests: regression test for the hash-equal-but-missing-file case +
counterpart for the fast-path (hash-equal-and-file-present must
still skip).
Closes#171. The /api/query cost guardrail used to dry-run a synthetic
`SELECT * FROM <table>` for each registered remote-BQ row referenced
by the user SQL — which made BigQuery estimate a full table scan, with
column projection, predicate pushdown, and partition pruning all
disabled. Narrow queries on big partitioned/clustered tables (the
documented happy path for `agnes query --remote`) hit ~30,000×
over-estimates and got rejected with 400 `remote_scan_too_large` even
when BQ's own dry-run reported single-digit MB.
Pavel's report on #171 traced the root cause and proposed the fix:
rewrite the user SQL to BQ-native syntax and dry-run it as a single
job, exactly the way `bq query --dry_run` works.
Implementation:
- New helper _rewrite_user_sql_for_bq_dry_run rewrites bare registered
names (word-boundary, case-insensitive, longest-first to avoid prefix
collisions) + bq."<ds>"."<tbl>" forms to backticked
`<project>.<ds>.<tbl>` paths.
- _bq_quota_and_cap_guard runs ONE dry-run on the rewritten SQL. Cap
check uses the real estimate.
- Fallback path: if BQ rejects with bq_bad_request (e.g. DuckDB-only
syntax like ::INT casts), the guard falls back to the pre-fix
per-table SELECT * approach so non-portable queries still get a
(loose) cap estimate instead of fail-opening. Non-parse BQ errors
(forbidden, upstream) still propagate as 502.
- _bq_guardrail_inputs now also returns name_lookups so the rewriter
has the (registered_name, bucket, source_table) mapping it needs.
- Per-table breakdown is unavailable from a composite dry-run; total
bytes are pinned to dry_run_set[0] for the post-flight
record_bytes(sum(...)) call to keep returning the right total.
Tests (7 new, 3 existing still pass):
- dry-run receives rewritten user SQL with WHERE clause intact (the
load-bearing assertion for #171)
- single dry-run per request even with multiple registered tables
(JOIN, UNION) referenced
- fallback to per-table SELECT * on bq_bad_request
- non-parse BQ errors (forbidden) still 502
- rewriter unit tests: bare + bq.path in same SQL, longest-name-wins
on prefix collision, case-insensitive bare-name match
The renderer no longer emits the legacy "da analyst setup" verb (the analyst
flow uses `agnes init`, the admin flow uses `agnes auth import-token`). The
disjunction assertions ("da analyst setup" OR "agnes auth" OR "curl") were
permissive and would have silently kept passing even if the renderer
regressed. Replace them with role-aware assertions that match the actual
emitted markers and explicitly check that no legacy verb survives.
Four call sites where #174 (branched from main before the agnes rename
fully landed in some files) emitted or referenced `da fetch`. None are
operator-visible runtime crashes — but `extractor.py` logs a stale
verb to the operator log and `DATA_SOURCES.md` is current docs:
- connectors/bigquery/extractor.py:431,434 (operator-facing log line on
unverified BQ entity_type — was suggesting `da fetch`).
- docs/DATA_SOURCES.md:77,85 (current public docs, two refs to
`da fetch` in the workflow + the BQ scope description).
- tests/test_cli_query_render.py:7 (module docstring listed
`da fetch / agnes schema / etc.` — now `agnes snapshot create / agnes
schema / etc.`).
- tests/test_cli_snapshot_create.py:1 (docstring referenced `(folded
from `da fetch`)` — historical, removed; no value once the rename
landed).
Pre-existing stale `da` references elsewhere in the branch (templates,
operator runbooks, internal comments) are not touched by this commit —
they live outside the merge surface and are a separate cleanup task.
Verified: 10/10 across the affected test files pass.
The dashed identifier is what the test exercises (backticks required for
dashed BQ project IDs); the literal string can be any synthetic value.
'prj-grp' is too close to a real customer-prefix pattern that the OSS
vendor-scrub regex flags. 'my-project' matches placeholders used elsewhere
in the project.
Real bug: `agnes push` was reading `<workspace>/user/sessions/`, but
Claude Code writes session jsonls to `~/.claude/projects/<encoded-cwd>/`
and nothing on the analyst side ever copies them across. The SessionEnd
hook ran `agnes push` happily and uploaded zero sessions every time.
`cli/lib/claude_sessions.py` probes both Claude Code encoding variants
(older `/`→`-` keeping spaces+tildes; newer all-non-alphanumeric→`-`
with collapsed runs) and unions whichever exist. Users who upgraded
Claude Code mid-project end up with both encoded dirs side-by-side on
disk; the union ensures no session is left behind. Same-named jsonl in
both dirs → newest mtime wins. `<workspace>/user/sessions/` survives as
a fallback for any setup that explicitly mirrors sessions there.
Verified on real disk: helper returns 2 dirs + 8 unioned session files
for the Agnes-test workspace where the previous code returned 0.
Three coupled UX fixes for the analyst-onboarding flow:
1. Dashboard "Setup a new Claude Code" CTA was rendering admin paste
prompt for everyone (analysts couldn't actually execute the marketplace
plugin install / skills setup steps). render_agent_prompt_banner now
picks role based on user.is_admin — analysts get the analyst flow.
2. /setup default role changed from admin to analyst. Most visitors are
analysts; admin layout is opt-in via the admin tile or ?role=admin.
3. Admin tile is admin-only on the role-tile nav. Non-admins see only
the analyst tile. Server-side: non-admin requesting ?role=admin is
silently downgraded to analyst (otherwise they'd see admin paste
prompt despite no tile).
Tests:
- New: test_setup_page_admin_tile_hidden_for_non_admin (anonymous client
can't see "Admin CLI" or role=admin link)
- New: test_setup_page_admin_role_downgraded_for_non_admin (anonymous
?role=admin → analyst layout, no marketplace step in clipboard)
- New: test_install_preview_default_role_is_analyst (admin signing in to
bare /setup gets analyst clipboard by default)
- Renamed: test_setup_page_default_role_is_admin → ..._is_analyst
- Updated: test_setup_page_admin_clipboard_renders_admin_layout uses
FastAPI dependency_overrides to inject admin user (admin layout is
now admin-gated)
- Updated: test_install_preview_visible_for_signed_in_user explicitly
passes ?role=admin to exercise admin layout
Caught by my own broader test scope after Devin fixes — three test files
asserted on user-visible strings that were renamed by the bootstrap PR
but the assertions weren't updated:
- tests/test_api_query_guardrail.py:110 — asserted `da fetch in suggestion`
on /api/query 400 response. Renamed to `agnes snapshot create`.
- tests/test_query_materialized_error_message.py:56 — asserted `da sync`
in materialized-not-yet error detail. Renamed to `agnes pull`.
- tests/test_cli_error_render.py:71 — fixture data + assertion both
carried `da fetch`. Updated to `agnes snapshot create`.
Plus an actual content miss: docs/setup/claude_settings.json (a template
shipped to operators) still installed `da sync` / `da sync --upload-only`
hooks. The companion test file (tests/test_setup_hooks_template.py) was
asserting that legacy state. Updated both:
- Template hooks: `agnes pull --quiet` / `agnes push --quiet`
- Test assertions + function name match the new commands
13 Devin findings across 10 files:
🔴 Critical:
- app/api/v2_catalog.py:42 — `_fetch_hint` returns `da fetch` in /api/v2/catalog
responses (user-visible in every catalog list)
- cli/skills/agnes-data-querying.md — 11 stale `da fetch`/`da sync` refs in the
bundled skill markdown
- config/claude_md_template.txt:38 — referenced `agnes pull --docs-only` flag
that does NOT exist in agnes pull (removed; spec only ships --quiet/--json/
--dry-run)
🟡 Important:
- app/api/admin.py:252 — `da fetch` in bq_max_scan_bytes hint
- cli/commands/auth.py:119 — `da sync` in import-token docstring (--help text)
- cli/commands/tokens.py:48 — "Export it so `da` can use it" prose
- ARCHITECTURE.md — 4 stale rows in CLI commands table
- README.md — stale paragraphs for analysts (da sync, da analyst setup)
🚩 Substantive observations addressed:
- app/api/query.py:249,302,489 — server-side error/help strings still said
`da sync`/`da fetch` (returned in API responses to clients)
- cli/commands/snapshot.py:235-241 — DuckDB existence guard incorrectly
blocked `--estimate` (server-side dry-run that never opens local DB).
Added test ensuring estimate path skips the guard.
Skipped (intentionally historical):
- app/api/admin.py:2377,2429,2437 — historical comments describing past
manifest-vs-sync_state bug; past tense, accurate to keep as `da sync`.
Typer/rich emits ANSI styling in CI's --help output (e.g. `--metrics`
becomes `-\x1b[0m\x1b[1;36m-metrics`), so literal substring asserts
like `assert "--metrics" in result.output` fail. Locally the test runner
auto-detects no-TTY and produces plain text, masking the issue.
Add a small `_clean()` helper per test file that strips ANSI escape
codes (`\x1b\[[0-9;]*m`) before substring containment checks.
- _v23_to_v24_finalize: wrap row-update loop in BEGIN/COMMIT/ROLLBACK
to match the project's transactional-finalizer pattern (compare
_v12_to_v13_finalize, _v17_to_v18_finalize, _v18_to_v19_finalize).
Pre-fix a process crash mid-loop left the schema_version unchanged
but partially-converted rows persisted across restart — idempotent
overall but inconsistent with project convention.
- _v23_to_v24_finalize: re.sub replacement now uses a function-form
(lambda) instead of an f-string, so any future project_id with a
backslash sequence isn't misinterpreted as a group reference.
- tests: add a Keboola-source materialized row case asserting the
SELECT's source_type filter prevents non-BQ rewrites.
Materialize now wraps admin SQL into bigquery_query('<billing>', '<inner>')
which requires the inner SQL to be BigQuery-flavor (backticked
identifiers, native function syntax). v24 migrates existing rows from
DuckDB-flavor (bq."ds"."tbl") to (`<project>.ds.tbl`) using the
configured BQ project. Idempotent on already-converted rows; logs a
warning and skips when the project isn't configured (operator can
configure + restart for retry).
Task 20: reusable pytest fixtures for the clean-bootstrap test suite.
Tasks 21 and 22 (reader smoke matrix + init smoke matrix) consume them.
- fastapi_test_server boots a real uvicorn subprocess against a tmp DATA_DIR,
pre-seeded with admin@example.com (Admin group), analyst@example.com
(Everyone group), and three tables (one per query_mode: local /
materialized / remote).
- web_session: cookie-authenticated httpx.Client for the admin user.
- test_pat: minted JWT for the analyst with table grants on local +
materialized.
- test_pat_no_grants: same shape, zero resource_grants.
- zero_grants_workspace: subprocess invocation of `agnes init` against the
no-grants PAT; returns the bootstrapped workspace path.
- NONEXISTENT_TABLE: module-level sentinel for the upcoming reader matrix.
Subprocess uvicorn (mirrors tests/test_e2e_corporate_memory.py) instead of
in-thread so DATA_DIR + module-level singletons in src.db don't bleed
across tests. agnes CLI invoked via `python -m cli.main` instead of the
.venv/bin/agnes shim, which depends on .pth file visibility that iCloud
Drive intermittently re-hides on macOS.
New top-level 'materialize' section, single field (lock_ttl_seconds).
Default 86400 (24h). Backs the file-lock TTL reclaim added in the
per-table-mutex change. Editable via PUT /api/admin/server-config and
the /admin/server-config UI.
When admin registers a materialized BQ row with bucket+source_table but
no source_query, the server generates 'SELECT * FROM `<project>.<ds>.<tbl>`'
from instance.yaml's configured BQ project. Same fallback fires on PUT
when flipping to materialized. The backtick rejection guard, which was
appropriate for DuckDB-flavor source_query, is relaxed for materialized
rows since the new wrapping path (Task 2) runs admin SQL through BQ
jobs API which uses BQ-native syntax (backticks for dashed identifiers).
_run_materialized_pass distinguishes due-check skips from in-flight
skips and never calls state.set_error for either. summary['skipped']
becomes a list of {table, reason} dicts; the end-of-pass log line
breaks out the in_flight subcount.
Hoists is_table_due to module-level import so test monkeypatching of
the symbol intercepts the call (the previous local import made
patches a no-op).
- extractor._try_acquire_file_lock: close fd and re-raise on non-
BlockingIOError from fcntl.flock (read-only fs, unsupported flock,
fd exhaustion). Pre-fix the fd leaked silently and the underlying
OSError still propagated past the caller.
- extractor: reorder module-level layout so logger is bound before
the new lock-related helpers reference it. Deferred import of
app.instance_config inside _get_lock_ttl_seconds documented inline.
- extractor: comment _table_locks unbounded-by-design rationale.
- tests: docstring + monkeypatch-target rationale for the two
concurrency tests where the contract isn't obvious from the body.
Two Task 4 review fixes for app/web/templates/install.html:
1. JSON-escape `ROLE` JS const via `{{ role | tojson }}` (defense in
depth — removes the dependency on Jinja autoescape semantics for JS
contexts; FastAPI's Literal validator already constrains role values).
2. Verify the analyst tile's clipboard payload is the analyst layout.
The pre-existing role-aware plumbing (compute_default_agent_prompt
threading role into setup_instructions_lines, picked up by the JS
SETUP_INSTRUCTIONS_TEMPLATE array) was correct; adding regression tests
that pin to the JS clipboard block specifically so a future inversion
would fail loudly.
Tests: analyst clipboard contains `agnes init` + `agnes catalog` and
NOT `agnes auth import-token` / `agnes skills`; admin clipboard is the
inverse. Plus an explicit assertion that ROLE is rendered via tojson.
Two layers of concurrency control. Layer 1 is a per-table_id
threading.Lock keyed on table_id; Layer 2 is fcntl.flock on a sibling
<id>.parquet.lock file. Overlapping calls for the same id raise
MaterializeInFlightError, which the caller treats as 'skipped,
in_flight' instead of a hard error. Stale file locks (mtime older
than materialize.lock_ttl_seconds, default 86400) are reclaimed on
the next attempt — covers the rare case where a holder was hard-killed
before kernel-level flock release.
Pre-fix, when a materialize ran longer than the scheduler tick interval
(15 min), the next tick called materialize_query for the same id, hit
the unconditional tmp_path.unlink() at function entry, and started a
second COPY against the same path. Both writers interleaved bytes;
the original COPY's read_parquet validation then failed with
'No magic bytes found at end of file'.
- tests/test_bq_cost_guardrail.py: assert fail-open warning is logged
(test previously only proved fail-open doesn't crash; review note:
warning is the only operator-visible signal of the silent failure).
- extractor._wrap_admin_sql_for_jobs_api: docstring no longer claims
DuckDB-flavor SQL is rejected — the function performs no inner-SQL
validation; the v24 migration + register-time validator are the
real enforcement points.
- extractor.materialize_query: safe_path uses _escape_sql_string_literal
instead of inline replace, for one-place-to-update consistency.
- extractor: import hashlib hoisted to module-level imports.
Pre-fix, materialize ran the admin source_query as 'COPY (sql) TO parquet'
through the DuckDB BQ extension session. The extension defaults to the
BQ Storage Read API for bq.<ds>.<tbl> references, which rejects views
('non-table entities cannot be read with the storage API'). The fix
always wraps admin SQL into bigquery_query('<billing>', '<inner>') so
COPY uses the BQ jobs API uniformly for tables and views.
Cost guardrail dry-run now operates on the inner SQL (BQ-native), so
the BQ Python client parses it and the cap engages — pre-fix the dry-run
hit 'Table-valued function not found: bigquery_query' and fail-opened.
Pre-fix the fixtures lived inside tests/test_api_admin_materialized.py.
Upcoming test files in this branch need them too; conftest is the
canonical home so they resolve via pytest's auto-discovery.
Devin Review iter #6 found 2 issues.
🟡 BUG: cli/error_render.py filtered out empty-string values via
`detail[key] not in (None, "")` and `value not in (None, "")` before
they could reach `_kv_line`. But `_kv_line` was specifically designed
to render empty strings as `(empty)` — the filter shadowed that
branch. The hidden field happens to be the most operator-actionable
one in `cross_project_forbidden`: `billing_project: ""` is the exact
diagnostic confirming WHY USER_PROJECT_DENIED fires.
Change filter to `is not None`. Empty strings now flow through
`_kv_line` and render as `billing_project: (empty)`.
📝 ANALYSIS: CHANGELOG wording for the test-connection endpoint said
"the saved data_source.bigquery config", which Devin flagged as
slightly misleading because `get_bq_access` is `@functools.cache`d —
"Test connection" tests the config in the running process, not the
just-saved YAML overlay. The save flow already returns
`restart_required: True` and the UI shows a banner, so the behavior
is documented; only the CHANGELOG wording was loose. Tightened to
"the **process-cached** BqAccess... Tests the config active in the
running process — after a save the response includes restart_required;
click Test AFTER restart to validate the freshly-saved values."
New test: test_renders_empty_string_as_empty_marker locks in the
empty-string-as-(empty) rendering for the cross_project_forbidden
case so a future filter change won't silently drop the diagnostic
again. 9 affected render tests pass.
Devin Review iter #3 found 3 new real bugs after iter #2's fixes landed.
🔴 RBAC check at app/api/query.py:362 used `row["name"]` against
`accessible_set`, but `accessible_set` is keyed by registry IDs
(`get_accessible_tables` returns `resource_grants.resource_id` —
table IDs, not display names). Confirmed by `_table_blocks` projection
at `app/resource_types.py:157-158`. When `id != name` (e.g.
`id="bq.finance.ue", name="ue"`), non-admin users with valid grants
got 403 `bq_path_access_denied`. Switch to `row["id"]`.
🚩 Bare-name pass at app/api/query.py:332 had the same name-vs-id
mismatch (different impact): legitimate accessible rows were skipped
from `dry_run_set`, so the cost guardrail under-counted scan bytes
for non-admin users. Could let an over-cap query through and
under-bill quota. Switch to `row_id` comparison.
🟡 `placeholder_from` for billing_project was dead code.
`_BQ_OPTIONAL_FIELD_DEFAULTS["billing_project"] = ""` seeded an empty
string into every GET payload via `_ensure_bq_optional_fields`. JS
`isUnset = (value === undefined)` evaluated False, so the
`(defaults to <project>)` placeholder NEVER rendered. Drop the seed —
field stays in `known_fields` (UI sees it) but routes through the
unset rendering path on GET, where placeholder_from fires.
Tests: test_get_surfaces_bq_fields_even_when_unset assertion flipped
from "billing_project IS present" to "billing_project NOT auto-seeded"
to lock in the new shape. 67 affected tests pass.
Devin Review iter #2 found 2 new issues (after iter #1's 5 fixes
landed). Both real, both addressed.
🔴 Quota user_id key mismatch defeated shared daily budget. /api/query
computed `user.get("id") or user.get("email")` while /api/v2/scan uses
`user.get("email") or "anon"` (app/api/v2_scan.py:327). Same user → two
different keys in the singleton QuotaTracker. BQ bytes consumed via
/api/query were tracked under UUID; via /api/v2/scan under email; the
`check_daily_budget` pre-flight on either endpoint never saw the
other's recorded bytes — per-user cap was effectively doubled. Match
v2/scan's email-first ordering.
🟡 QuotaExceededError(KIND_CONCURRENT) → 400 instead of 429.
`quota.acquire(user_id)` raises this from __enter__ when the per-user
concurrent-scan slot is at cap. The exception propagated through the
@contextlib.contextmanager generator, the caller's `with guard:`
block, and was caught by execute_query's generic `except Exception`
handler → mapped to 400 with a flattened "Query error: concurrent_scans:
N/M" string, dropping the typed retry_after_seconds field. Wrap the
`with quota.acquire(...)` in a try/except QuotaExceededError that maps
to 429 with the same typed-detail shape used for the daily-budget
rejection — consistent with /api/v2/scan:392-402.
Tests: test_api_query_quota.py user_id strings updated to
"admin@test.com" (the seeded_app admin's email) to match the new
email-first ordering. 40 affected tests pass.
CI failures on PR #168 after rebasing onto main + PR #169/#170:
gw2 worker bucket reproducibly fails test_admin_can_list_registry +
test_three_sources_catalog_count with `assert "X" in set()` — the
register-table POST landed but list/catalog endpoints returned empty.
Root cause: pre-existing module-level cache leak across tests on the
same xdist worker process. `app.instance_config._instance_config`,
`connectors.bigquery.access.get_bq_access` (functools.cache), and
`app.api.v2_quota._quota_singleton` all survive across function-scoped
fixtures, so a prior test that read instance.yaml against an old
DATA_DIR poisons the next test's env even after `monkeypatch.setenv`
resets DATA_DIR.
Pre-existing on main — surfaced now because #160's new tests changed
the xdist test bucket distribution and dropped a different mix of
tests onto gw2 that hit the leak. Direct cause is unchanged; my T1a
fix in test_main_exits_when_project_missing addressed one symptom of
the same pollution but didn't generalize.
Add an autouse fixture in conftest.py that resets all three caches
before every test. Generic fix; helps any future test that reads
instance.yaml or BqAccess and would otherwise be order-dependent on
the worker.
Two new test files driving the next commit's admin UI work.
tests/test_admin_bigquery_test_connection.py — POST
/api/admin/bigquery/test-connection (admin-only health probe). 6 cases:
- success → 200 with ok=true + resolved billing_project / data_project
/ elapsed_ms
- not_configured → 400 with the typed BqAccessError detail surface
- cross_project_forbidden (USER_PROJECT_DENIED simulation) → 502
- 10s timeout → 504 with kind="timeout" (best-effort cancel_job)
- non-admin caller → 403
- unauthenticated → 401
The endpoint matters for the operator side of the reporter's loop —
admin saves data_source.bigquery in /admin/server-config, clicks
"Test connection", gets typed structured feedback BEFORE any analyst
hits a query failure.
tests/test_admin_server_config_placeholder.py — `billing_project`
field-spec must carry `placeholder_from: ["data_source", "bigquery",
"project"]` so the JS template can resolve and inject
"(defaults to <project>)" greyed under the input when the operator
hasn't set billing_project explicitly. This makes the existing
"billing falls back to data" rule (connectors/bigquery/access.py:
339-340) visible in the UI.
7 RED on the current branch (endpoint and placeholder_from key both
absent). GREEN landing in the next commit.
The reporter (#160) saw `USER_PROJECT_DENIED` raw in the CLI because
all three CLI error-rendering paths flatten typed BqAccessError /
guardrail / RBAC dicts to a truncated single-line string, hiding the
structured `hint` field that explains how to fix the misconfig.
Fix: shared `cli/error_render.py:render_error(status_code, body)` that
recognizes the canonical typed shapes and pretty-prints them. Falls
back to truncated-and-flattened form for unrecognized bodies, so the
renderer never makes worse-than-status-quo output.
Recognized shapes:
- {detail: {kind: ..., hint?, billing_project?, data_project?}}
— typed BqAccessError responses from /api/v2/scan, /sample, /schema,
/api/query (when /api/query escalates a BQ failure)
- {detail: {reason: 'remote_scan_too_large', scan_bytes, limit_bytes,
tables, suggestion}} — new /api/query cost-guardrail rejection
- {detail: {reason: 'bq_path_not_registered'/'bq_path_access_denied',
path, hint?, registered_as?}} — new /api/query RBAC patch
- {detail: '...'} — string detail (legacy endpoints)
Wired through 3 CLI paths:
- cli/v2_client.py: V2ClientError.__str__ delegates to render_error;
pre-truncation removed from V2ClientError.message (was hiding hints
past 200 chars).
- cli/commands/query.py:_query_remote: parse JSON body, call renderer
on error.
- cli/commands/query.py:_query_hybrid: catch RemoteQueryError, build
synthetic `{detail: {kind: error_type, **details}}` payload, render.
tests/test_cli_query.py:test_remote_query_failure: assertion updated
from `"Query failed"` (no longer printed) to `HTTP 400` + `bad SQL`
(what the renderer surfaces for string detail).
Sample output for cross_project_forbidden:
Error: cross_project_forbidden (HTTP 502)
billing_project: (empty)
data_project: prj-example-data-001
message: USER_PROJECT_DENIED on bigquery.googleapis.com
hint: Set data_source.bigquery.billing_project in
/admin/server-config to a project where the SA has
serviceusage.services.use, or grant the SA that role on the
data project.
19 tests pass — 10 from T4a now GREEN + 3 prior cli_query tests still
green + 6 ancillary.
Phase 3 review identified an RBAC + cost-cap bypass: `SELECT * FROM
"bq"."ds"."tbl"` (catalog token quoted as a DuckDB identifier) was NOT
matched by the BQ_PATH regex, so direct quoted-form references skipped
both the registry check and the cost-cap dry-run. DuckDB resolves
`"bq"` to the same ATTACHed BQ catalog, so the bypass is real.
Widen the catalog-token alternation: `(?:"bq"|bq)` matches both forms.
Negative lookbehind `(?<![\w.])` still rejects look-alike prefixes
(`other_bq`, `my_bq`); the new "my_bq".ds.tbl negative test locks that
in alongside `other_bq.ds.tbl`.
Tests:
- 2 new positive cases in tests/test_query_bq_regex.py for the quoted
form (`"bq"."finance"."ue"` and uppercase `"BQ"."ds"."tbl"`).
- 1 new negative case rejecting `"my_bq".ds.tbl` so the quoted-form
widening doesn't open a different evasion.
- 1 new RBAC test in tests/test_api_query_rbac_bq_path.py: admin
hitting an unregistered quoted path returns the same
bq_path_not_registered 403 as the unquoted form.
All 33 Phase 3 tests pass after the fix.
3 new test files that drive the upcoming cli/error_render.py module
and the V2ClientError refactor.
tests/test_cli_error_render.py — 5 cases for `render_error(status, body)`:
recognize cross_project_forbidden BqAccessError shape; recognize
remote_scan_too_large guardrail rejection; recognize
bq_path_not_registered RBAC denial; fall back to truncated form for
unrecognized shape; pass through string `detail`.
tests/test_cli_query_render.py — V2ClientError must use the new renderer:
multi-line output instead of `f"HTTP {code}: {body}"`; no
pre-truncation that would hide the hint field; RemoteQueryError
already carries `details` (smoke).
tests/test_remote_query_error_details.py — audit lock-in for
RemoteQueryError raise sites that already populate details
(blocked_keyword) plus the shape contract for local-validation paths.
Run: 5 errors (cli.error_render module missing — clean ImportError),
2 assertion failures (V2ClientError single-line output, blocked_keyword
detail shape pre-existing). 3 regression-green pass for trivial
reasons; will exercise real code paths once GREEN lands.
The headline implementation for issue #160. POST /api/query now gates
direct `bq."<dataset>"."<source_table>"` references behind the registry
and bounds the BQ scan cost behind a configurable cap. Wired through
the same singleton QuotaTracker as /api/v2/scan so daily-byte budgets
are shared across both BQ-touching paths.
Changes in app/api/query.py:
- Add module-level `BQ_PATH` regex matching the 16 syntax variants
verified empirically (fully-quoted, unquoted, mixed quoting,
case-insensitive, inside CTE bodies, multi-path, …).
- Add `bigquery_query` to the SQL keyword blocklist. Closes the
pre-existing function-call backdoor where a user could run an
arbitrary BQ jobs API call against any reachable dataset, bypassing
the registry and RBAC. Wrap views internal to the BQ extractor still
use bigquery_query() — but those run via DuckDB view resolution at
query time, not via user-submitted SQL, so the blocklist doesn't
break them.
- Add `_bq_guardrail_inputs` helper: walks user SQL twice — once for
bare-name matches against accessible registered remote-BQ names
(contributes to dry_run_set), once for direct `bq.X.Y` matches
(gated against `find_by_bq_path` lookups, returns 403 with
structured detail on miss or grant violation).
- Add `_enforce_remote_bq_quota_and_cap` helper: pre-flight
`check_daily_budget` (over-cap → 429), then `with quota.acquire(...)`
wraps a per-path BQ dry-run, sums bytes, raises 400
`remote_scan_too_large` when total > cap.
- Cap default 5 GiB; configurable via `api.query.bq_max_scan_bytes`
in /admin/server-config (next phase wires the UI).
- Post-flight `record_bytes` against the user's daily counter.
- Module-level imports of `_bq_dry_run_bytes`, `_build_quota_tracker`,
`get_bq_access` so tests can monkeypatch via `app.api.query.<name>`.
Tests:
- All 23 RED tests from the previous commit now pass (regex matrix,
blocklist with detail-string assertion, RBAC unregistered/admin-bypass,
guardrail dry-run-called/over-cap-rejected, quota pre-flight 429).
- mock_dry_run fixture stubs both `_bq_dry_run_bytes` and `get_bq_access`
so guardrail tests don't require a live BQ project.
- Quota test uses `admin1` (the seeded_app fixture's actual user id, not
`admin`).
Smoke: 887 passed across query/bq/admin/extractor/registry/quota
domains. No regressions.
5 new test files for the upcoming /api/query pre-flight block (next
commit). All failing for the right reason on the current codebase:
tests/test_query_bq_regex.py (8 + 1 + 7 + 1 = 17 cases)
Pure unit test of `BQ_PATH` regex constant (not yet imported from
app.api.query). Verifies the 16-case matrix from spec §4.3.1:
positive matches for fully-quoted / unquoted / mixed quoting / case
variants / inside CTE bodies / multiple paths in one statement;
negative for bare registered names / 2-part bq.col / prefix that
contains bq / middle-component bq / quoted bare names; documented
string-literal false-positive accepted.
tests/test_query_bigquery_query_blocked.py (3 cases)
POST /api/query with bigquery_query() function call must hit the
canonical blocklist rejection ("Only single SELECT queries are
allowed"). Today the blocklist passes all 3 — confirmed RED via
detail-string assertion.
tests/test_api_query_rbac_bq_path.py (4 cases)
Direct bq."<ds>"."<tbl>" references must be registry-gated:
unregistered → 403 bq_path_not_registered; registered + admin →
bypass per-name grant; case-insensitive lookup; string-literal
containing bq.X.Y → 403 (strict-deny).
tests/test_api_query_guardrail.py (3 cases)
Cost guardrail: SQL referencing a registered remote BQ row invokes
_bq_dry_run_bytes (verified via call-counter side effect); over-cap
dry-run returns 400 remote_scan_too_large with bytes/tables/suggestion
in detail; non-BQ queries skip the dry-run entirely.
tests/test_api_query_quota.py (3 cases)
Daily-byte quota check_daily_budget pre-flight (over-cap → 429
before dry-run); record_bytes post-flight on the shared singleton
v2_quota tracker; non-BQ queries leave the counter alone.
RED breakdown: 16 ImportError (BQ_PATH not yet defined) + 7 assertion
failures = 23 fully-RED. 6 tests pass for regression-green reasons
(use `if r.status_code == 403:` patterns where current code returns
400 for unrelated reasons). They serve as anti-regression guards once
the implementation lands and remain green throughout — documented per
spec §6 Phase 1 RED-discipline notes.
The upcoming /api/query RBAC patch (next phase) gates direct
`bq."<dataset>"."<source_table>"` references in user SQL — every such path
must point at a registered query_mode='remote' BigQuery row, otherwise the
caller has stepped around the registry and around RBAC.
Add `TableRegistryRepository.find_by_bq_path(bucket, source_table)` to
support that lookup. Returns None if no row matches, the row dict if
exactly one matches, or the oldest-by-`registered_at` row when 2+ match
(no UNIQUE constraint on `(source_type, bucket, source_table)` — admins
can in principle register a BQ table twice with different ids/names).
Match is case-insensitive on bucket+source_table so user SQL `SELECT FROM
bq.Finance.UE` resolves to a `(finance, ue)` registry row. NULL values in
either column are excluded so a legacy NULL-bucket row never masks a
legitimate non-NULL lookup.
5 RED tests cover: empty registry, non-BQ source rejected, single match,
oldest-of-many tie-breaker, case-insensitive match, NULL-column exclusion.
All initially failed with AttributeError; pass after the ~30 LOC method
addition.
Now that VIEW/MATERIALIZED_VIEW always wrap via bigquery_query() (the
prior `legacy_wrap_views=True` branch behavior, made unconditional in
the previous commit), the toggle has no semantic meaning and is removed
across the codebase.
Production code:
- app/api/admin.py: drop the field from _OPTIONAL_FIELDS["data_source"]
["bigquery"]["fields"] and from _BQ_OPTIONAL_FIELD_DEFAULTS, plus the
comment block above the defaults dict.
- config/instance.yaml.example: drop the example snippet.
- src/orchestrator.py: update the inner-objects skip-branch comment to
reflect the new BQ behavior (the skip itself stays — keboola
use_extension=False still inserts _meta rows without inner views).
- app/web/templates/admin_tables.html: rewrite operator copy in the
register and edit forms to reflect always-wrap.
Tests:
- tests/test_admin_server_config.py (TestServerConfigBigQueryFields):
flip assertions from "field IS present" to "field NOT present" on
legacy_wrap_views. Drop the test_post_persists_legacy_wrap_views test
since the field no longer exists.
- tests/test_admin_server_config_known_fields.py: same flip on the
known-fields registry assertion.
- tests/test_bigquery_extractor.py: drop the obsolete
test_view_entity_does_not_create_master_view_by_default (asserted the
bug we fixed) and test_legacy_wrap_views_toggle_restores_old_behavior
(toggle no longer meaningful). Update remaining test docstrings.
Operators with `legacy_wrap_views: true` set in their overlay get the
new (equivalent) behavior automatically — the unrecognized key is
silently ignored by the YAML loader. Operators with `false` get the
issue-#160 fix as a behavior change, not a regression.
Spec gate updated: production code grep gate
grep -rn 'legacy_wrap_views' connectors app src config cli
must return zero. tests/ excluded — historical "removed in #160"
breadcrumbs and `assert "X" not in fields` regression guards retained
as anti-regression signals.
Issue #160: da query --remote against query_mode='remote' BQ rows whose
underlying entity is a VIEW or MATERIALIZED_VIEW returned a DuckDB catalog
error because the extractor (with legacy_wrap_views=False default since
the v2 fetch primitives release) skipped master-view creation for those
entity types — but kept inserting the _meta row, leaving operators with a
registered name that resolves to nothing.
Always create a master view for entity types we have proven runtime support
for in this codebase:
BASE TABLE → bq."<dataset>"."<source_table>"
(Storage Read API path; predicate pushdown)
VIEW / MAT_VIEW → bigquery_query('<project>', 'SELECT * FROM `proj.ds.tbl`')
(jobs API path; no pushdown — the upcoming /api/query
cost guardrail bounds the scan; was the legacy
legacy_wrap_views=True branch SQL form, just always-on)
For other entity types (EXTERNAL, SNAPSHOT, CLONE, future), log a warning
and SKIP both the master view AND the _meta row. The registry row remains
intact so /api/v2/scan still works for `da fetch`; we just don't expose a
stale _meta entry that the orchestrator would later strand.
The legacy_wrap_views config knob is still readable in this commit (read
returns the value, which is then ignored). Removal across the rest of
the codebase happens in the follow-up REFACTOR commit.
tests/test_bigquery_extractor.py:
- Add 3 RED tests covering the new always-wrap behavior:
test_view_creates_wrap_view_with_default_config,
test_materialized_view_creates_wrap_view_with_default_config,
test_unsupported_entity_type_skips_meta_and_view.
- Fix pre-existing flakiness in test_main_exits_when_project_missing
by resetting app.instance_config cache before the no-project mock —
the prior test populates the cache with a project, and removing the
legacy_wrap_views get_value() call surfaced this latent ordering bug.
- _list_tables now accepts a user param and delegates to
get_accessible_tables: admins see all, non-admins see only tables
covered by their resource_grants. Fixes silent leak of table names
to unauthorised analysts.
- today derived from now.date() (UTC) instead of date.today()
(server-local TZ), so today and now are always consistent.
- Updated test_render_override_tables_list to seed an admin user so
RBAC filtering doesn't hide the table; added three new tests covering
per-user table isolation, admin sees-all, and no-grants-empty.
Finding #1: _build_context now routes through render_agent_prompt_banner when
a DB connection is available, so both /setup and the /dashboard clipboard CTA
always reflect the admin override (or the live default when no override is set).
Previously _build_context unconditionally used resolve_lines(), ignoring the
welcome_template override for the dashboard JS array.
Finding #2: PUT /api/admin/welcome-template now performs a second render pass
with user=None (anonymous stub) after the authenticated-user pass. Templates
that reference user.* fields without an {% if user %} guard are rejected with
a clear 400 error explaining the anon-visitor breakage.
- Fix#1: _detect_existing_project now checks .claude/settings.json for
"da sync" marker instead of deleted CLAUDE.md; update tests accordingly.
- Fix#2: preview endpoint uses autoescape=False to match /setup rendering;
align render_agent_prompt_banner in welcome_template.py to the same.
- Fix#3: apply _sanitize_banner_html to override render path in setup_page
so all render paths sanitize consistently.
- Fix#4: move .setup-link-banner into the existing-user branch where
account_details.last_sync_display is reachable; remove dead copy from
new-user branch.
The /admin/agent-prompt editor now pre-fills with the full bash bootstrap
script from setup_instructions.resolve_lines() instead of being empty.
When an admin saves an override it replaces the default everywhere — the
/setup page display and the dashboard clipboard CTA — rather than adding a
banner above the auto-generated commands.
GET /api/admin/welcome-template now returns a `default` field with the live
computed script so the editor always shows meaningful starting content.
{server_url} and {token} single-brace placeholders survive Jinja2 rendering
and are substituted by JavaScript at clipboard-copy time as before.
Preview pane switches to textContent (not innerHTML) since content is bash.
- src/welcome_template.py: rewrite as HTML banner renderer
(render_agent_prompt_banner); drop _list_tables, _metrics_summary,
_marketplaces_for_user, render_welcome, _load_default_template.
build_context now exposes only instance/server/user/now/today.
_sanitize_banner_html strips script/iframe/on*/javascript: post-render.
- app/api/welcome.py: drop get_welcome handler, WelcomeResponse, old
_VALIDATION_STUB_CONTEXT. Admin endpoints stay at same URLs; validation
stub updated to match new slim context. Preview now uses autoescape=True.
- app/web/router.py: setup_page calls render_agent_prompt_banner and passes
banner_html to install.html; admin_agent_prompt_page drops _load_default_template.
- app/web/templates/install.html: add .setup-banner CSS + banner block above hero.
- cli/commands/analyst.py: replace _generate_claude_md with _init_claude_workspace;
no CLAUDE.md written, only .claude/CLAUDE.local.md placeholder + settings.json hooks.
- tests: delete test_cli_analyst_welcome.py (tests deleted endpoint/function);
rewrite TestGenerateClaudeMd → TestInitClaudeWorkspace; update api test to
assert /api/welcome returns 404 and remove welcome-fetch tests.
Rename the welcome prompt editor from /admin/welcome to /admin/agent-prompt
and update all UI labels to "Agent Setup Prompt". API endpoint URLs are
unchanged (PUT/GET/DELETE /api/admin/welcome-template, GET /api/welcome).
- Nav menu: "Welcome prompt" → "Agent Setup Prompt", href updated
- Page title and h2 updated in admin_welcome.html
- Error message hint in app/api/welcome.py updated to /admin/agent-prompt
- Dashboard: replace inline <details> preview of _claude_setup_instructions
with a simple link to /setup (Task C)
- docs/welcome-template.md renamed to docs/agent-setup-prompt.md; internal
references to /admin/welcome updated
- OpenAPI snapshot path updated
- Tests updated to reflect new route and removed inline preview
Remove the setup_banner feature (admin-editable /setup page banner) and
all associated code: API router, repository, renderer, admin template,
tests, and docs. The setup_page handler no longer calls render_setup_banner;
the install.html template no longer renders banner_html. The setup_banner
DuckDB table (v22) is kept intact for forward-compat with already-migrated
instances — only the application code is removed.
CHANGELOG updated: setup_banner bullets removed; Agent Setup Prompt
(welcome-template feature) now stands alone as the single editable prompt.
- test_render_marketplaces_filtered_by_rbac: seeds 2 marketplaces, 2 groups,
grants, 2 users; asserts each user's rendered output references only their
group's marketplace/plugins, not the other's (I-3).
- test_validation_stub_matches_build_context_shape in test_welcome_template_api.py:
asserts _VALIDATION_STUB_CONTEXT top-level and nested keys (instance, server,
user) match build_context() output so stub drift is caught in CI (I-4).
- test_validation_stub_matches_build_context_shape in test_setup_banner_api.py:
same shape check against build_setup_banner_context() (I-4).
- Add _sanitize_banner_html() to src/setup_banner.py: strips <script>/
<iframe> blocks, on* event-handler attributes, and javascript:/data:
URI schemes post-render (I-2). Defense-in-depth — /setup is partly
anonymous so malformed admin content must not execute in visitors'
browsers.
- Apply sanitizer in render_setup_banner() before returning rendered HTML.
- Add 3 unit tests: test_render_strips_script_tags,
test_render_strips_event_handlers, test_render_strips_javascript_uri.
- Drop unused Optional import from src/repositories/welcome_template.py
and src/repositories/setup_banner.py (M-6).
Adds an optional Jinja2/HTML banner displayed above the bootstrap
commands on /setup. Empty by default; admin authors it at
/admin/setup-banner. autoescape=True — safe for HTML context.
Render failures return "" so a broken banner never breaks /setup.
Schema v22: setup_banner singleton table, auto-migration v21→v22.
- Add GET /setup serving install.html (CLI + Claude Code setup page)
- Add GET /install → 301 redirect to /setup for backwards compat
- Move first-time setup wizard from /setup to /first-time-setup
- Update nav link: href=/setup, label 'Setup local agent', active on both /setup and /install paths
- Update page <title> to 'Setup local agent — …'
- Update /dashboard and /setup comment in _claude_setup_instructions.jinja
- Update tests and OpenAPI snapshot accordingly
Per Devin review on #166: /api/health returns 'ok' or 'unhealthy';
'healthy' is the detailed endpoint's vocabulary (app/api/health.py:180).
The pre-existing OR-tuple was dead code and inconsistent with the rest
of this PR's alignment work.
`/api/health` is the auth-free LB probe — returns `status` + `db_schema`
only. `version` lives in `/api/version` and the richer
`services.duckdb_state` lives in `/api/health/detailed` (auth-gated).
The two e2e asserts had drifted and broke nightly on main.
* security(auth): per-IP rate limit on auth endpoints + generalize last-admin guard
Closes#45 and #151.
#45 — every auth endpoint was unthrottled (login, magic-link, token,
bootstrap), leaving us open to password brute-force and SMTP
email-bombing. Wires slowapi (new dep) into the middleware chain with
per-route limits: 10/min on login + token, 5/min on send-link, 3/min on
bootstrap. Returns 429 with Retry-After: 60 once exceeded. Per-IP key
respects the leftmost X-Forwarded-For hop (Caddy in front of the app
strips client-supplied XFF). Operator escape hatch:
AGNES_AUTH_RATELIMIT_ENABLED=0. Test suite disables the limiter via
autouse conftest fixture so existing auth tests that hammer endpoints
in tight loops are unaffected.
#151 — DELETE /api/admin/users/{id}/memberships/{group_id} and the
mirror DELETE /api/admin/groups/{group_id}/members/{user_id} only
guarded against self-removal as last admin. Generalizes to refuse
removing anyone from the seeded Admin group when they are the only
remaining active admin (mirrors the existing
count_admins(active_only=True) <= 1 check on delete_user / update_user).
Recovery from zero admins requires direct DB access, so this closes
a path where a scheduler/bootstrap actor that bypasses normal admin
checks could otherwise empty the group.
* security(auth): throttle remaining email-bombing + token-confirm endpoints
Address code-review gap on PR #165 — the first commit covered /send-link
but missed two endpoints with the IDENTICAL email-bombing surface:
- POST /auth/password/reset — sends reset mail, anti-enum response
- POST /auth/password/setup/request — sends setup mail, anti-enum response
Both now share the 5/min limit with /send-link.
Also add 10/min to the token-confirm surfaces — high-entropy tokens but
partial leaks via logs / referer have surfaced before, and unbounded
guess rate would let an attacker exhaust the keyspace adjacent to a
leaked prefix:
- POST /auth/email/verify
- GET /auth/email/verify — closes the click-through bypass
- POST /auth/password/reset/confirm
- POST /auth/password/setup/confirm
Doc fix: rate_limit.py module docstring + CHANGELOG entry no longer
claim "disable without a redeploy" (misleading). The Limiter constructor
freezes `enabled` from env at import time, matching every other Agnes
env knob — operators set the flag and bounce the container.
Tests: 4 new cases in test_auth_rate_limit.py covering
/reset, /setup/request, /reset/confirm, GET /verify. Full suite:
2583 passed, 32 skipped, 0 failed.
* security(auth): throttle JSON /auth/password/setup — closes form-throttle bypass
Second code-review pass on PR #165 caught a fifth gap: POST /auth/password/setup
(JSON variant, kept for backward compat) consumes the same setup_token as
the web form /setup/confirm but was unthrottled — an attacker brute-forcing
the token just switches from the form path to the JSON path and resumes
at unbounded RPS. Apply the same 10/min limit and signature shape used
on /setup/confirm.
Also extend CHANGELOG note about the JSON-variant bypass for future
operators reading the security entry.
Test: 1 new case (test_password_setup_json_rate_limited_after_10_requests),
9 rate-limit tests + 28 password-flow tests + 41 auth-provider tests pass,
no regressions.
* chore(release): cut 0.30.1 — auth security hardening (rate limit + last-admin guard)
CI on dc03837a showed test_missing_project_returns_error failing with
'ok-project' instead of '' — config-cache leak from the sibling
test_returns_skipped_when_no_bq_rows that ran first under pytest-xdist.
Pre-existing flake (cache lives in app.instance_config; monkeypatch
restores the loader patch but doesn't invalidate the cached return).
Earlier CI runs (a4339ce6) got lucky on test ordering. Adding an
explicit reset_cache() at the top of the test removes the dependency
on ordering.
E2E sub-agent finding: `da query --remote "SELECT * FROM <id>"` against a
materialized table that hasn't yet been rebuilt in the server's
analytics.duckdb returns a confusing DuckDB "Table does not exist"
message even though the table is in the registry. Materialized rows
produce parquets at `${DATA_DIR}/extracts/<source>/data/<id>.parquet`,
but the orchestrator's master-view creation is `_meta`-driven — fresh
instances or pre-tick states have the registry row without a
corresponding view, so analysts hit the bare "does not exist" with no
path forward.
Improve the error rendering in `app/api/query.py:execute_query`. When
DuckDB raises a "table does not exist" error, scan the registry for any
`query_mode='materialized'` row whose id or name appears in the failed
SQL. On a hit, return a 400 whose detail names the table, explains the
materialize state, and offers two concrete next steps:
1. Run `da sync` (or wait for the scheduler tick / hit
POST /api/sync/trigger) to materialize the parquet, OR
2. Query the source directly via the catalog alias when the registry row
carries bucket+source_table (e.g. `bq."dataset"."table"` for BigQuery,
`kbc."bucket"."table"` for Keboola).
Detection is bounded — the registry round-trip only fires when DuckDB's
error mentions a missing table, so happy-path queries pay no cost.
Non-materialized unknowns fall through to DuckDB's raw error.
2 new tests: materialized id surfaces the hint with the bucket+source_table
payload; unknown table falls back to the generic error path with no false
positive on the new hint.
The strict source_type-availability validator from the prior commit
broke ~12 existing tests that register tables on the default test
instance (where `data_source.type` resolves to 'local' since no
instance.yaml is loaded).
The intent of the validator is to catch *explicit* misconfig:
`type=bigquery` instance + `source_type=keboola` payload with no
`data_source.keboola.*` block. The bootstrap workflow — admin sets up
a fresh instance and registers a few tables before pointing at a real
source — should not be gated here.
Loosen the check: when `get_data_source_type()` returns 'local' (the
fallback when no `data_source.type` is set), skip the rejection. The
explicit mismatch case still 422s because that path resolves
`configured_primary` to a real source type.
Also adds an autouse keboola_instance fixture to test_journey_sync_query.py
which exercises Keboola registrations through the full sync→query
flow — the fixture documents the test's data-source assumption rather
than relying on the bootstrap escape hatch.
E2E sub-agent finding: instance configured with `data_source.type='bigquery'`
and no `data_source.keboola.*` block. Admin POSTs `{source_type: 'keboola'}`
to /api/admin/register-table → returns 201, row lands in the registry, but
never syncs because the scheduler has no Keboola URL/token to ATTACH
against. Operator only notices the gap when `da catalog` keeps showing
nothing.
The new `_validate_source_type_configured` helper runs immediately after
the id/view-name collision checks in `register_table`. A source_type is
considered configured when:
- it matches `get_data_source_type()` (the instance's primary), OR
- a non-empty `data_source.<source_type>` block exists in the effective
`instance.yaml` (multi-source instance), OR
- it's in `_SOURCE_TYPES_INDEPENDENT_OF_DATA_SOURCE` (Jira / local — both
get data through paths that don't involve `data_source.*`).
Returns 422 with a message that names the configured primary source and
points at `/admin/server-config` for enabling a secondary one. None /
empty source_type is still tolerated for backward compat with legacy CLI
scripts that don't set the field — the route resolves it later.
5 new tests cover: keboola-on-bq rejected, bq-on-keboola rejected,
matching source_type still works, jira allowed regardless, omitted
source_type passes through.
Existing tests that registered Keboola rows on the unconfigured default
test instance now opt into a `keboola_instance` fixture to satisfy the
new validator (tests/test_admin_bq_register.py + .keboola_materialized
+ .unregister_cleanup; the multi-source PUT test in test_admin_bq_register
adds a `keboola` block to its synthetic config).
Pre-existing test_missing_project_returns_error failure in
TestRebuildFromRegistry is unrelated (config-cache leakage from a
previous test in the same class) — confirmed pre-existing on the prior
commit via `git stash` reproduction.
E2E sub-agent finding: register a materialized BQ row → sync to materialize
the parquet at `/data/extracts/bigquery/data/<id>.parquet` → DELETE the
registry row. The DB row goes away but:
- the parquet file stays on disk forever, AND
- the sync_state row stays, so `/api/sync/manifest` keeps advertising the
dropped table to `da sync`, AND
- the orchestrator's next rebuild can resurrect a master view by picking
up the leftover parquet.
Two-part fix in `unregister_table`:
1. For materialized rows on bigquery/keboola, remove
`${DATA_DIR}/extracts/<source_type>/data/<name>.parquet` (and any stale
`<name>.parquet.tmp` from a crashed prior materialize). Filename is
keyed on `table_registry.name` to match sync_state bookkeeping.
File-removal errors are logged but don't fail the DELETE — the registry
row is already gone, and an orphan parquet won't get a master view at
next rebuild because the orchestrator's _meta-driven scan never picks
up bare parquet files.
2. Always clear `sync_state` + `sync_history` rows for the dropped table_id
so the manifest stops advertising the table — applies to all source
types and modes, not just materialized, since any synced row had a
sync_state entry.
Orchestrator-side defensive guard (Finding 2b) is a no-op in the current
implementation: `_attach_and_create_views` only creates master views from
`_meta` rows in each connector's `extract.duckdb`, so a parquet without a
matching `_meta` entry is already invisible to the rebuild. The new
test `test_orchestrator_skips_orphan_parquet_in_extracts` is kept as a
regression guard for that contract.
5 tests cover: BQ + Keboola materialized DELETE removes parquet, remote
DELETE doesn't error trying to remove a non-existent file, sync_state
cleared on DELETE, orchestrator orphan-skip invariant.
E2E testing showed admin POSTs of materialized BQ rows whose source_query
uses BigQuery-native backtick identifiers (`prj.ds.t`) silently no-op'd at
the next sync tick — the materialize path runs the SQL through the DuckDB
BQ extension's COPY which uses DuckDB's parser; backticks aren't recognized
and the query either parse-errors or matches zero rows. No parquet lands at
the canonical path and no error reaches an operator-visible surface.
Two-part fix:
1. RegisterTableRequest's _check_mode_query_coherence model_validator now
rejects any source_query containing a backtick with a 422 + actionable
message pointing at the DuckDB equivalent (bq."dataset"."table"). Same
check is applied in update_table on the merged record so PATCHes that
flip a stored source_query to backtick form are also caught. Covers BQ
AND Keboola materialized rows since both connectors funnel source_query
through DuckDB's COPY.
2. _run_materialized_pass now persists per-row failures via the new
SyncStateRepository.set_error / clear_error methods (existing
sync_state.error / status columns — no schema migration). GET
/api/admin/registry enriches each row with `last_sync_error` from a
single batched SELECT against sync_state, so the admin UI / da admin
status can show "this table failed last sync because: X" instead of
operators having to trawl scheduler logs. Recovered rows have the
error cleared automatically — update_sync's success path resets
status='ok' / error=NULL on the upsert.
The materialized-path test fixture's _materialized_payload helper is
updated to use DuckDB-flavor SQL (the prior backtick example pre-dated the
fix). 6 new tests cover register/update rejection on BQ + Keboola, the
sync_state error persistence, and the registry response surface.
Devin's second review pass on commit 16938ae7 surfaced 2 more issues:
BUG_pr-review-job-58ae3148_0001 — non-BQ materialized via PUT bypasses source_query check
app/api/admin.py update_table only enforces 'query_mode=materialized
requires source_query' for source_type='bigquery' rows (via the
synthetic RegisterTableRequest at line 2129+). Non-BQ source types
(Keboola) skip the check — admin could PUT {query_mode: materialized}
on a Keboola local row without source_query, persist successfully,
then crash at the next sync tick when kb_materialize_query received
sql=None and DuckDB rejected COPY (None) TO '...'.
Fix: generic coherence guard before the BQ-specific block — for ALL
source types, query_mode='materialized' requires non-empty source_query
in the merged record. Returns 422 with a hint about reverting via
query_mode='local'/'remote'.
ANALYSIS_pr-review-job-642ff90f_0007 — diagnose returns 'ok' on BQ resolution failure
app/api/health.py:_check_bq_billing_project caught get_bq_access()
exceptions and returned status='ok' with a 'could not resolve' detail.
Automated alerting keyed on status != 'ok' would silently miss missing
google-cloud-bigquery, auth failures, or malformed config. Fix: return
status='unknown' on resolution failure — surfaces it on operator
dashboards without promoting the overall health to 'degraded' (which
'warning' does, intentionally for the billing==project case).
Tests:
- test_update_keboola_to_materialized_without_source_query_rejected:
PUT {query_mode: materialized} on a Keboola local row returns 422
with 'source_query' in the detail
- test_diagnose_returns_unknown_status_when_bq_resolution_fails:
when get_bq_access raises, the bq_config service entry surfaces
status='unknown' (not 'ok')
Full sweep: 2507 passed, 25 skipped, 0 failed (+2 from previous sweep
because of the 2 new regression tests; 8 pre-existing internal_roles
schema-migration failures still ignored per task brief).
Devin Review on commit 7052a235 flagged 4 real bugs in the Keboola
materialized path. All four are fixed; 3 new regression tests pin the
behavior so future refactors can't quietly regress.
BUG_pr-review-job-3fbd31c9_0001 — _run_materialized_pass gated behind 'if bq_project:'
app/api/sync.py:444-466 wrapped the entire materialized pass (which
dispatches BOTH BigQuery AND Keboola rows by source_type) in a check
for data_source.bigquery.project being non-empty. On Keboola-only
instances this short-circuited and Keboola materialized rows sat in
table_registry forever without their SQL being evaluated — the feature
CHANGELOG advertised was dead code on the most common deployment shape.
Fix: always run the materialized pass; the BQ branch's per-row try/except
catches the typed BqAccessError(not_configured) the sentinel raises
when no BQ project is set, so non-BQ instances incur a per-row error
for any (hypothetical) BQ-tagged row but the Keboola path runs cleanly.
Log line renamed 'Materialized BQ' → 'Materialized SQL' to match.
BUG_pr-review-job-3fbd31c9_0004 — wrong config key 'url' instead of 'stack_url'
app/api/sync.py:149 read get_value('data_source', 'keboola', 'url'),
but the canonical config key documented in instance.yaml.example:111
and used by app/api/admin.py:1503 + 2359 is 'stack_url'. Production
Keboola instances would always see an empty URL and fail with the
'not configured' error. The pre-existing test patched the wrong key
too, so it passed without catching the mismatch. Fix: use stack_url
in both sync.py and the test fixture.
BUG_pr-review-job-3fbd31c9_0003 — no atomic write in Keboola materialize_query
connectors/keboola/extractor.py wrote COPY directly to the final
'<id>.parquet' path. A mid-COPY failure (network, disk full, extension
crash) left a partial parquet that the orchestrator rebuild would
later pick up and serve to analysts. BQ's materialize_query already
uses a '<id>.parquet.tmp' staging path + os.replace() atomic swap
(connectors/bigquery/extractor.py:370-445); Keboola now mirrors that
pattern with the same try/except cleanup on COPY failure.
BUG_pr-review-job-3fbd31c9_0002 — full file read into memory for MD5
Same file:60-62 used parquet_path.read_bytes() for the MD5 hash.
Multi-GB Keboola materialized results would OOM on memory-constrained
containers. BQ's version uses streaming 8 KiB-chunk hashing
(connectors/bigquery/extractor.py:438-442); Keboola now mirrors it.
Tests:
- test_run_sync_runs_materialized_pass_on_keboola_only_instance —
pins BUG_0001's fix; setting bigquery.project='' must NOT skip
Keboola materialized dispatch
- test_keboola_materialize_atomic_write_on_failure — pins BUG_0003;
a mid-COPY RuntimeError leaves no .parquet AND no .parquet.tmp at
the canonical path
- test_keboola_materialize_uses_tmp_path_during_copy — documents the
atomic-write contract: COPY targets .parquet.tmp, final swap to
.parquet (no .tmp suffix on the result['path'])
- existing test_run_materialized_pass_dispatches_keboola_to_keboola_extractor
fixture updated: stack_url instead of url
Full sweep: 2505 passed, 25 skipped, 0 failed (modulo 8 pre-existing
internal_roles schema-migration failures called out in the task brief).
Diagnostic + operator-facing documentation that closes the loop on the work in this PR.
`da diagnose` (via /api/health/detailed):
- New _check_bq_billing_project() helper. When data_source.type='bigquery' and BqProjects.billing == .data, surface a yellow warning: 'BigQuery billing project equals data project'. Hint includes the YAML field path + the /admin/server-config UI shortcut. Diagnose's overall status promotes warning → degraded so the CLI echoes it.
- Non-BQ instances (Keboola-only, etc.) skip the check.
- Implementation hooks into the existing /api/health/detailed surface — no new endpoint, no CLI changes.
config/instance.yaml.example documentation:
- data_source.bigquery.billing_project: USER_PROJECT_DENIED hint, /admin/server-config UI reference
- data_source.bigquery.legacy_wrap_views: analyst-side discipline note (use `da fetch` / `da query --remote`), issue #101 history, view-heavy deployment guidance
- data_source.bigquery.max_bytes_per_materialize: cost guardrail block (NEW — wasn't documented in .example before)
- ai.base_url: provider list + UI hint
- openmetadata + desktop: 'configurable via /admin/server-config UI' headers
- corporate_memory: leading note that the schema is editable via UI
Other docs:
- CHANGELOG.md: comprehensive Unreleased section
- CLAUDE.md: schema chain → v20 + Materialized SQL connector mode + per-connector tab UI mention
- README.md: mode-first source table summary
- docs/architecture.md: per-connector tab UI mention
- cli/skills/connectors.md: bootstrap rails (parallel to #154)
- docs/superpowers/plans/2026-05-01-admin-tables-form-cleanup.md: implementation plan archive (2515 lines)
- scripts/seed_dummy_tables.py: drop is_public after #150 RBAC migration (column gone)
Tests:
- test_diagnose_billing.py — 3 cases (BQ with billing==data warns, BQ with billing!=data clean, non-BQ skips)
Today /admin/server-config renders fields by iterating Object.keys(payload) on the YAML value — if a key isn't in instance.yaml, the operator can't see it. They have to know to type it via the JSON-patch textarea (which only renders for empty sections) or SSH and edit YAML.
Adds a known-fields registry (`_KNOWN_FIELDS` in app/api/admin.py) the UI consumes alongside the YAML payload. Renderer shows BOTH:
- existing fields (from YAML) with current value
- known-but-unset fields with dashed-border placeholder + hint, ready to fill in
Renderer (`renderField`, `renderSection`, `collectSection`):
- kind="string"|"secret"|"bool"|"int"|"select"|"object"|"array"|"map" — picks input type
- kind="object" with `fields` — recursive structured form, arbitrary depth (corporate_memory needs 3-4 levels)
- kind="array" with `item_kind` — vertical stack of typed inputs + add/remove buttons
- kind="map" with `key_kind` + `value_kind` — key:value rows + add/remove (used for confidence.base, domain_owners, entity_resolution.entities)
- data-path encoded as JSON segment array so map keys with embedded dots (e.g. 'user_verification.correction') survive collect → patch round-trip
- .cfg-field.is-unset CSS — dashed border, muted label, italic hint
Sections newly exposed (added to _EDITABLE_SECTIONS):
- openmetadata: url, token (secret), cache_ttl_seconds, verify_ssl
- desktop: jwt_issuer, jwt_secret (secret), url_scheme
Known fields populated for existing sections:
- data_source.bigquery: billing_project (the cause of the 403 USER_PROJECT_DENIED footgun when SA can read but not bill the data project), legacy_wrap_views (bigquery_query() wrap for VIEWs — issue #101 default off, ON for view-heavy deployments), max_bytes_per_materialize (cost guardrail)
- data_source.keboola: stack_url, project_id (hints; values already populated)
- ai: base_url (required for openai_compat), structured_output (select)
- corporate_memory: full schema from instance.yaml.example — distribution_mode, approval_mode, review_period_months, notify_on_new_items, sources.{claude_local_md,session_transcripts}, extraction.{model,sensitivity_check,contradiction_check}, confidence.{base,modifiers,decay.{mode,half_life_months,decay_rate_monthly,floor}}, contradiction_detection.{enabled,max_candidates}, entity_resolution.{enabled,entities}, domain_owners, domains
- Known partial: confidence.modifiers is map<string, map<string, float>> — falls through to JSON-textarea with TODO; structured editor for that one shape needs more renderer work
Tests:
- test_admin_server_config_known_fields — registry envelope shape, smoke fixture
- test_admin_server_config_renderer_depth — 4-level nested objects, arrays of strings, maps of floats, dotted-key safety
- test_admin_server_config_corp_memory — full corporate_memory schema, 12 fields incl. nested
- test_admin_server_config — existing tests adjusted for new shape
Replaces the single mixed Jinja-branched form at /admin/tables with a per-connector tab interface and brings Keboola to capability parity with BigQuery.
Tab structure:
- BigQuery tab: Register modal with two-question radio model (Q1 Live | Synced × Q2 Whole | Custom SQL), Discover datasets / List tables / Use-table-as-base autocomplete buttons, table-vs-view auto-detection hint, per-tab listing filter
- Keboola tab: same two-question radio (Q2 only — no Live mode for Keboola), Custom SQL textarea against kbc."bucket"."table" for materialized rows
- Jira tab: read-only listing (Jira is webhook-driven; no Register form)
- Active tab persists in window.location.hash so refresh keeps the operator in place
Form cleanup (within tabs):
- Drops the misleading 'Sync Strategy' dropdown — runtime never read it (only profiler.is_partitioned() consumes the value for parquet-layout detection); kept in DB for back-compat (Pydantic deprecated)
- Adds Sync Schedule input to Keboola Register/Edit (was missing — scheduler honored per-table cron via is_table_due() for every source but the Keboola UI had no surface)
- Hides Primary Key under <details>Advanced with clarifying hint that it's catalog-metadata only (Agnes does not perform upsert/dedup; every sync is a full overwrite)
- Drops the Strategy column from the registry listing (every Keboola row defaulted to full_refresh after Strategy was hidden — column was noise)
- Removes the legacy out-of-tab #registerModal + the legacy global Discovery panel; each tab now owns its own header + Register button + listing div
Edit modal:
- BigQuery Edit modal physically relocated into <section id="tab-content-bigquery"> (mirrors Phase E Register placement)
- Keboola Edit modal mirrors Register (same Q2 radio, Discover/List buttons via parameterized helpers)
- openEditModal(table) dispatches by source_type to the right modal — fixes a quiet bug where Phase F's openEditKeboolaModal was never wired up and Keboola edits silently used the legacy modal
Per-row Manage access deep link:
- Each row in the per-tab listing has a lock-icon button between Edit and Delete that navigates to /admin/access#table:<table_id>
- admin_access.html bootstrap reads window.location.hash and pre-fills the resource filter, mirroring the existing ?group=<id> deep-link pattern
Tests:
- test_admin_tables_tab_ui.py — tab nav, hash persistence, register-button-per-tab, listing partition by source_type, Manage access deep link
- test_admin_tables_ui_materialized.py — two-question radio (BQ + Keboola), Discover/List/Use-as-base buttons, Edit modal parity, Jira read-only
The analyst flow becomes a closed loop with the server-curated table catalog:
- `da analyst setup` writes `<workspace>/.claude/settings.json` with two hooks:
SessionStart → `da sync --quiet || true` — pulls fresh RBAC-filtered parquets at session start
SessionEnd → `da sync --upload-only --quiet || true` — uploads session jsonl + CLAUDE.local.md
- `|| true` keeps Claude Code unblocked when the server is down.
- Workspace-level (not user-home) so the hooks fire only when Claude Code opens this analyst workspace.
- `da sync --quiet` rewrites the CLI output for hook consumption — 0 stdout on success, single-line error on failure.
- Existing settings.json is patched (deep-merged), not overwritten; malformed JSON is reported, not silently overwritten.
Tests cover: workspace bootstrap, hook insertion, malformed-json safety, quiet-mode output shape.
* feat(rbac): drop dataset_permissions + access_requests + users.role + is_public; v19 migration
BREAKING. Sjednocení datové RBAC vrstvy do per-group resource_grants modelu.
Před PR byla legacy data RBAC vrstva (dataset_permissions + is_public bypass)
de-facto neaktivní — is_public neměl API/UI/CLI surface, default true znamenal
že can_access_table vždycky bypassl. Dnes každý non-admin přístup vyžaduje
explicitní resource_grants(group, "table", id) řádek.
Schema v18 → v19 (src/db.py:_v18_to_v19_finalize):
- DROP TABLE dataset_permissions, access_requests
- DROP COLUMN users.role (NULL artifact since v13)
- DROP COLUMN table_registry.is_public
- Drops přes table-rebuild idiom (rename → create new → INSERT … SELECT
→ drop old) kvůli DuckDB ALTER DROP COLUMN limitacím na tabulkách
s historic FK constraints. INSERT picks intersection sloupců, takže
test fixtures s minimal pre-v19 schemou migrate cleanly.
Runtime:
- src/rbac.py:can_access_table → deleguje na app.auth.access.can_access
- DatasetPermissionRepository, AccessRequestRepository smazány
- AGNES_ENABLE_TABLE_GRANTS env-gate v app/resource_types.py odstraněn
(TABLE je unconditionally enabled)
API drop:
- app/api/permissions.py, app/api/access_requests.py celé soubory
- /admin/permissions web route + admin_permissions.html
- "Request Access" modal v catalog.html + locked-row UI
- ~10 if user.get("role") != "admin" checků nahrazeno (admin shortcut
je uvnitř can_access_table)
- /api/settings: drop permissions field z GET; PUT /api/settings/dataset
gate přepnut na can_access(user_id, "table", dataset, conn)
Auth:
- app/auth/jwt.py:create_access_token: drop role parametr (claim zmizí
z nově vydávaných JWT; staré tokeny zůstávají valid, claim ignored)
- app/api/users.py: drop role z CreateUserRequest / UpdateUserRequest
(admin promotion = explicit add to Admin group via memberships API)
- src/repositories/users.py: drop role z create() / update()
CLI:
- da admin set-role smazán → hard-fail s replacement command
- da admin add-user --role flag pryč
- da auth import-token --role flag pryč
- da auth whoami: drop "Role:" výpis
- cli/config.py:save_token: role parametr now optional, no longer written
(back-compat se starými token.json soubory zachována — pole se ignoruje)
Tests:
- DELETE: test_permissions.py, test_permissions_api.py, test_access_requests_api.py
- REWRITE: test_access_control.py (resource_grants flow), test_rbac.py
(can_access_table over resource_grants), test_journey_rbac.py
(drop access-request flow), test_resource_types.py (drop env-gate
tests, drop is_public from helpers), test_v2_*.py (drop role-based
user dicts in favor of id-based + Admin group membership),
test_settings_api.py (no permissions field, can_access gate)
- TRIVIAL: ~30 souborů — drop role="admin" arg z UserRepository.create
a 3rd positional role z create_access_token
- NEW: test_v18_to_v19 migration test (test_db.py),
test_can_access_table_no_implicit_public (test_rbac.py),
test_admin_set_role_returns_hardfail (test_cli_admin.py)
- OpenAPI snapshot regenerated
Docs:
- CHANGELOG: BREAKING entry pod [Unreleased]
- CLAUDE.md: schema v18 → v19
- docs/architecture.md: schema table + RBAC sekce přepsána
- docs/auth-google-oauth.md: admin promotion přes da admin break-glass
- cli/skills/security.md: kompletně přepsáno na group-based model
- docs/TODO-rbac-data-enforcement.md: smazáno (TODO splněn)
Test results: 2363 passed, 19 failed. Zbývající failures jsou pre-existing
Windows-specific issues (fcntl, charset) nesouvisející s tímto PR —
ověřeno git stash pop.
Plan: ~/.claude/plans/floofy-coalescing-parnas.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(release): cut 0.27.0
---------
Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
* docs(spec): #134 unify BigQuery access behind BqAccess facade
Brainstorm output for issue #134. Captures:
- root cause (incl. correction of the issue's hypothesis about commit 33a9964)
- BqAccess facade API + project resolution rules
- error contract — typed BqAccessError mapped to HTTP 502 for upstream
BQ failures, 500 for deployment/config bugs
- migration plan for v2_scan, v2_sample, RemoteQueryEngine
- test rewrite eliminating _bq_client_factory injection point
- E2E verification protocol on agnes-development as success criterion
* docs(spec): #134 revise after first review
Incorporates code-reviewer findings:
Must-fix:
- Add v2_schema (2 copies of INSTALL/LOAD/SECRET dance) to migration scope.
- Reframe v2_scan headline: missing try/except around BQ calls is the
actual cause of bare 500s, not project resolution (which 33a9964 fixed).
- List two more deferred call sites (extractor.py, register_bq_table)
with explicit rationale.
Important:
- Drop billing != data clause from cross_project_forbidden heuristic;
rely only on 'serviceusage' substring. billing != data is normal
for cross-project setup, was over-classifying.
- Split bq_bad_request into _user (400) and _server (502) variants;
add sql_origin parameter to translate_bq_error so call sites declare
whether SQL contains user input.
- Add @functools.cache to BqAccess.from_config; document tests bypass
via dependency_overrides.
- Replace monkey-patched-classmethod test pattern with
BqAccess(client_factory=...) injection at construction time. Cleaner
than today's _bq_client_factory and 1:1 migration shape.
- Keep BqProjects.data (reviewer assumed registry has source_project;
it doesn't). Multi-project explicitly listed as non-goal with note.
Nice-to-have:
- Add 'Implementation strategy' section: 2 staged commits (bug fix
alone is revertable; refactor follows).
- Extend E2E protocol to cover all three endpoints, not just /sample.
- Note removal of stale docstring at src/remote_query.py:204.
* docs(spec): #134 revision 3 — incorporates second-round review
Must-fix from second review:
- v2_schema split into two migration cases: _fetch_bq_schema translates
errors via translate_bq_error; _fetch_bq_table_options preserves its
swallow-all 'except Exception → return {}' so /schema doesn't 502 on
partition-info failures.
- RemoteQueryEngine.__init__ now resolves BqAccess lazily (in
_get_bq_client, not in __init__). Without this, ~7 DuckDB-only tests
in test_remote_query.py would suddenly fail with not_configured.
- translate_bq_error pass-through for BqAccessError is now load-bearing
(clause 1, before any Google-API branch). bq.client() raises BqAccessError
for bq_lib_missing/auth_failed; without explicit pass-through those
fall to 'unknown' and re-raise as bare 500.
- Commit 1 now emits the SAME structured response shape as commit 2 to
avoid contract churn between commits.
- BIGQUERY_PROJECT env-var precedence is BREAKING for env-only deployments
— flagged in CHANGELOG ### Changed.
Editorial:
- sql_origin renamed to bad_request_status with values 'client_error' /
'upstream_error' (clearer about what the parameter actually decides).
bq_bad_request_user/_server kinds collapsed to bq_bad_request (400)
and bq_upstream_error (502).
- CLI (cli/commands/query.py) noted as external RemoteQueryEngine caller;
unaffected because new bq_access kwarg has default None.
- Added unit/integration tests for the new contracts:
test_translate_passes_through_BqAccessError,
test_v2_scan_returns_500_on_bq_lib_missing,
test_v2_schema_returns_200_with_empty_partition_on_bq_failure,
test_resolve_succeeds_after_config_set.
- E2E protocol now covers /schema as the fourth endpoint.
- Documented functools.cache-doesn't-cache-exceptions semantics and
fixture nullcontext-doesn't-close caveat for nested sessions.
* docs(spec): #134 revision 4 — incorporates third-round review
Third reviewer verdict: 'implementation-ready with two trivial edits';
explicitly noted prior rounds did the heavy lifting.
Edits:
1. get_bq_access() module-level function instead of @classmethod
@functools.cache from_config. Removes the classmethod-cache stacking
footgun (different Python versions wrap differently) and gives FastAPI's
dependency introspection a clean function signature. Drops the
'Do not subclass BqAccess' caveat that no longer applies.
2. Commit 1 strategy explicitly: wrap _fetch_bq_sample (v2_sample),
_bq_dry_run_bytes + _run_bq_scan (v2_scan), and _fetch_bq_schema
(v2_schema strict block). Do NOT touch _fetch_bq_table_options swallow-all
in commit 1 — preserved as-is, then migrated (still preserved) in commit 2.
All three endpoints emit the same structured body shape so client parsers
see one consistent contract throughout the staged rollout. No more
half-rolled-out window where /sample is bare 500 while /scan is
structured 502.
* docs(plan): #134 implementation plan — Phase 1 (atomic bug fix) + Phase 2 (BqAccess refactor) + Phase 3 (verification)
Bite-sized TDD tasks. 3 phases, 16 tasks total:
Phase 1 (Commit 1) — atomic bug fix across all four v2 endpoints:
Tasks 1.1-1.5 wrap _fetch_bq_sample, _bq_dry_run_bytes, _run_bq_scan,
_fetch_bq_schema with structured 502/400 try/except. _fetch_bq_table_options
preserved untouched. CHANGELOG Fixed entries.
Phase 2 (Commit 2) — BqAccess facade extraction + migration:
Tasks 2.1-2.5 build connectors/bigquery/access.py bottom-up
(BqProjects, BqAccessError, translate_bq_error, default factories,
BqAccess class, get_bq_access module-level cached). Task 2.6 adds
conftest.py fixture. Tasks 2.7-2.9 migrate v2_scan, v2_sample, v2_schema
to BqAccess. Tasks 2.10-2.11 migrate RemoteQueryEngine + tests
(lazy bq_access, drop _bq_client_factory). Task 2.12 CHANGELOG
Changed BREAKING + Internal.
Phase 3 — Verification:
3.1 full pytest. 3.2 squash into two PR-shape commits. 3.3 manual
E2E on agnes-development per spec protocol → close#134.
Self-review table maps spec sections to implementing tasks; no gaps.
* fix(v2): #134 structured 502/400 on BQ errors across /scan, /scan/estimate, /sample, /schema
Wraps the BigQuery call sites in v2_scan, v2_sample, and v2_schema (strict
block only) with try/except for google.api_core exceptions, translating to
HTTPException with a structured body shape: {error, message, details}.
Fixes Pavel's report (#134) where these endpoints returned bare HTTP 500
with no body when the SA on agnes-development hit cross-project Forbidden
on serviceusage.services.use.
Also fixes /sample's missing billing_project fallback (the bug 33a9964
fixed for /scan never landed here).
Status code split:
- /scan, /scan/estimate: BadRequest -> 400 (bq_bad_request) since SQL is
user-derived from req.select/where/order_by.
- /sample, /schema: BadRequest -> 502 (bq_upstream_error) since SQL is
server-constructed from validated identifiers.
- All Forbidden -> 502 with cross_project_forbidden if 'serviceusage' in
error message (with hint pointing at data_source.bigquery.billing_project),
else bq_forbidden.
Body shape matches what the upcoming BqAccess refactor (next commit) will
produce, so client-side parsers see one consistent contract throughout
the staged rollout.
_fetch_bq_table_options preserved exactly as-is — its swallow-all-and-return-empty
contract is intentional and survives into the refactor; /schema continues to
return 200 with empty partition info when partition queries fail.
Outer wraps in scan_endpoint, scan_estimate_endpoint, sample, and schema
endpoints exist only to make the test pattern (monkeypatching whole
_fetch_* functions) work, and are tagged TODO(#134 Phase 2) for removal
once BqAccess centralizes translation.
* refactor(bq): #134 BqAccess facade — unify v2_scan, v2_sample, v2_schema, RemoteQueryEngine
Extracts the duplicated BigQuery-access pattern (project resolution +
client construction + DuckDB-extension session + Google-API error
translation) into connectors/bigquery/access.py. Migrates four
call sites to use it:
- app/api/v2_scan.py — _bq_dry_run_bytes, _run_bq_scan
- app/api/v2_sample.py — _fetch_bq_sample
- app/api/v2_schema.py — _fetch_bq_schema (strict translation),
_fetch_bq_table_options (preserves swallow-all best-effort contract)
- src/remote_query.py — RemoteQueryEngine, lazy bq_access kwarg
The new module exposes:
- BqProjects (frozen dataclass: billing + data project IDs)
- BqAccessError (typed exception with HTTP_STATUS class mapping)
- BqAccess (facade with injectable client_factory/duckdb_session_factory
for tests; defaults call the real google-cloud-bigquery + DuckDB extension)
- get_bq_access (module-level @functools.cache; FastAPI Depends target)
- translate_bq_error (Google API exception → BqAccessError mapper, with
BqAccessError pass-through, 'serviceusage'-substring heuristic for
cross_project_forbidden, and bad_request_status param distinguishing
user-derived (400) from server-constructed (502) SQL)
- _default_client_factory, _default_duckdb_session_factory
RemoteQueryEngine.__init__ no longer accepts _bq_client_factory; tests
migrate to bq_access=BqAccess(projects, client_factory=...). DuckDB-only
RemoteQueryEngine tests need no changes — bq_access defaults to None and
get_bq_access() is only invoked on first BQ call (lazy resolution).
BqAccessError raised internally is translated to RemoteQueryError(
error_type="bq_error") in _get_bq_client to preserve the engine's
existing public contract — CLI and /api/query/hybrid callers see no change.
Endpoint tests (test_v2_scan, test_v2_scan_estimate, test_v2_sample,
test_v2_schema) migrate from monkey-patching whole _fetch_* functions
to using the new bq_access fixture in tests/conftest.py — which
exercises the REAL translation path through BqAccess + translate_bq_error,
closing the test gap flagged in Task 1.1's review.
Side-effect behavior change: v2_sample's FROM clause now uses the data
project (instance.yaml data_source.bigquery.project), not the conflated
billing_project from Phase 1. Documented in CHANGELOG ### Internal.
BREAKING for deployments combining BIGQUERY_PROJECT env var with
data_source.bigquery.project in instance.yaml — env var now overrides
data project too. See CHANGELOG ### Changed.
Two known-duplicate BQ-access sites (connectors/bigquery/extractor.py,
scripts/duckdb_manager.register_bq_table) explicitly out of scope;
tracked as follow-up.
Removed stale docstring at the previous src/remote_query.py:204
that referenced scripts.duckdb_manager._create_bq_client as the default
BQ client factory (RemoteQueryEngine never actually used that function).
Test counts: tests/test_bq_access.py +27 (new), tests/test_v2_*.py +
tests/test_remote_query.py migrated to bq_access fixture (counts unchanged
or +1-2 per file). Full suite: 2086 passed, 8 pre-existing failures
(DB migration tests with unrelated internal_roles DependencyException —
not introduced by this PR).
* fix(bq_access): translate DefaultCredentialsError to BqAccessError(auth_failed)
CI on PR #138 caught: bigquery.Client(...) resolves Application Default
Credentials at construction time; without ADC (CI without SA key, dev
laptop without 'gcloud auth application-default login') it raises
google.auth.exceptions.DefaultCredentialsError synchronously.
Pre-fix _default_client_factory only caught ImportError, so DefaultCredentialsError
propagated as raw exception — and from production endpoints would surface
as bare 500 (the exact failure mode #134 sets out to fix).
Now translates to BqAccessError(kind='auth_failed', details.hint='Run
gcloud auth application-default login...'). Endpoint catch chain returns
HTTP 502 with structured body. Adds unit test
test_raises_auth_failed_on_default_credentials_error.
Third-round spec review flagged this case in passing; the fix didn't land.
CI's auth-less environment surfaced it.
* fix(bq_access): get_bq_access() returns sentinel instead of raising when not configured
Devin BUG_0001 on PR #138 review: 'get_bq_access() as FastAPI Depends
breaks all v2 endpoints for non-BigQuery instances'.
Pre-fix: get_bq_access() raised BqAccessError(not_configured) when
neither BIGQUERY_PROJECT env nor data_source.bigquery.project was set.
Because FastAPI resolves Depends() BEFORE the endpoint body runs, this
exception fires during dep-injection — the endpoint's try/except
BqAccessError clause never gets a chance to catch it. Result: every
v2 request on Keboola-only or CSV-only instances returned bare HTTP
500, even for local-source tables that never touch BigQuery.
Fix: get_bq_access() now returns a sentinel BqAccess with empty
BqProjects and factories that raise BqAccessError(not_configured)
on actual use. Construction succeeds, FastAPI's dep-injection cleanly
yields the sentinel, the endpoint runs. The local-source code path
in build_sample / build_schema / etc. never calls bq.client() or
bq.duckdb_session() (it reads parquet directly), so non-BQ tables
return 200 as before. Only when an endpoint actually tries to query
BQ (source_type == 'bigquery') does the sentinel raise — and the
endpoint's existing except BqAccessError catches it normally,
returning structured 502 with hint.
Test get_bq_access::test_raises_not_configured_when_neither_set
renamed and rewritten to test_returns_sentinel_when_neither_set:
asserts BqAccess is returned, then asserts client() and
duckdb_session() each raise BqAccessError(not_configured) on call.
Test test_does_not_cache_exceptions removed (no longer applicable)
and replaced with test_sentinel_is_cached_per_process documenting
the operator-restart-on-config-change contract.
* docs(spec+plan): #134 genericize customer-specific tokens (CLAUDE.md OSS rule)
Devin BUG_0001/0002 round 3 on PR #138: spec and plan docs contained
customer-specific deployment hostnames, deployment names, and a GCP
project ID that violated CLAUDE.md's vendor-agnostic OSS rule
('Nothing customer-specific belongs in code, configuration defaults,
comments, docs, commit messages, PR titles, or PR bodies').
Replacements:
agnes-development.groupondev.com -> <your-agnes-host>
agnes-development -> <your-dev-instance>
prj-grp-dataview-prod-1ff9 -> <your-data-project>
s1_session_landings -> <bq_table_id>
E2E verification semantics unchanged — operators still run the same
four curls + config flip + retry, just substituting their own host /
deployment name / project / table.
* fix(bq_access): hook get_bq_access.cache_clear into instance_config.reset_cache
Devin ANALYSIS_0004 on PR #138: get_bq_access is @functools.cache'd at
process level, so it captures BigQuery project IDs at first call and
ignores subsequent instance.yaml changes. Pre-Phase-2 the v2 endpoints
re-read get_value() on every request, so admin /api/admin/server-config
saves (which call instance_config.reset_cache()) hot-reloaded the BQ
project. Without this fix, my refactor silently regresses that contract
— operators editing instance.yaml via the admin UI would see no effect
on v2 endpoints until container restart.
instance_config.reset_cache() now also calls
connectors.bigquery.access.get_bq_access.cache_clear() (lazy import,
swallowed if connectors module isn't loaded — keeps instance_config
usable in isolated unit tests).
Adds test_instance_config_reset_cache_invalidates_get_bq_access as
regression guard. Updates CHANGELOG Internal entry to mention the
hot-reload contract + the not-configured sentinel behavior (round-3
fix from Devin BUG_0001 was previously only in commit message).
* fix(bq_access): surface not_configured before identifier validation + plan path genericize
Devin BUG_0001 + BUG_0002 round 5 on PR #138.
BUG_0001 (plan doc): personal filesystem path violated CLAUDE.md
vendor-agnostic rule. Replaced with '<worktree-root>' placeholder.
BUG_0002 (sentinel error path): when get_bq_access() returns the sentinel
BqAccess (BQ not configured), the empty bq.projects.data was reaching
validate_quoted_identifier first and raising ValueError -> endpoint
mapped to HTTP 400 'unsafe_identifier' instead of structured 500
'not_configured' with hint.
Each fetch helper now checks 'if not bq.projects.data: bq.client()' as
the first step, which triggers the sentinel's BqAccessError(not_configured).
Endpoint catches the typed error and returns HTTP 500 with hint pointing
at data_source.bigquery.project. Best-effort _fetch_bq_table_options
returns {} silently in this case (preserves the swallow-all contract).
* fix(bq_access): classify DuckDB-native exceptions from bigquery_query() via string match
Devin ANALYSIS on PR #138 review (latest round). The DuckDB bigquery
extension is a C++ plugin making its own HTTP calls — when BQ returns
403, it throws duckdb.IOException with the BQ error embedded as text,
not gax.Forbidden. translate_bq_error's isinstance checks would miss
these, falling to case 7 → bare 500 in production for v2_scan, v2_sample,
and v2_schema (the bigquery_query() paths).
Fix: last-resort string-match heuristic before the re-raise. 'Forbidden'
/ '403' / 'Bad Request' / '400' in the lowercased message classifies via
the same kind hierarchy. The 'serviceusage' substring still distinguishes
cross_project_forbidden from bq_forbidden. Specific enough that random
exceptions without HTTP-error keywords still re-raise.
Adds 4 unit tests covering the new heuristic + the 'don't swallow random
exceptions' invariant.
* chore(release): cut 0.22.0
PR #138 contains issue #134 user-visible behavior changes:
- BREAKING: BIGQUERY_PROJECT env var now overrides instance.yaml
data_source.bigquery.project for v2 endpoints (previously
RemoteQueryEngine billing only).
- Fixed: structured 502/400 on /api/v2/sample, /scan, /scan/estimate,
/schema when BigQuery raises Forbidden/BadRequest (was bare 500).
- Internal: BqAccess facade refactor unifying four duplicate BQ-access
call sites; instance_config.reset_cache() now invalidates BqAccess
cache too so admin server-config saves hot-reload BQ project IDs.
Bumps to 0.22.0 because PR #137 merged first and took 0.21.0.
Bootstraps the Agnes Claude Code marketplace + RBAC-allowed plugins from
the dashboard CTA, and inlines the server's TLS cert when the chain isn't
publicly trusted (self-signed / private CA). Cross-platform setup prompt
covers Windows Git Bash, macOS, Linux. Includes Bun-compiled `claude` fix
(macOS goes via git-clone fallback, same as Windows), PAT stripping after
clone, explicit error handling, and four rounds of Devin Review fixes
(phantom step references, $PLATFORM re-detection, heredoc/awk line-count
sync). Cuts 0.21.0.
See CHANGELOG.md [0.21.0] section for details.
Closes the /plugin UI 'Plugin <X> not found in marketplace' bug. Synth marketplace.json catalog 'name' now reads from <plugin_dir>/.claude-plugin/plugin.json (with fallback to upstream marketplace.json name). On-disk plugins/<slug>-<plugin>/ layout preserved so cross-marketplace files don't collide. /marketplace/info exposes both name and prefixed_name (BREAKING — downstream tooling parsing 'name' for the slug-prefixed form must switch to prefixed_name).
Three new env vars wire the Google OAuth callback to a configurable Workspace prefix and route admin/everyone Workspace groups onto the seeded system rows: AGNES_GOOGLE_GROUP_PREFIX, AGNES_GROUP_ADMIN_EMAIL, AGNES_GROUP_EVERYONE_EMAIL. Login gate redirects users with no prefix-matching group to /login?error=not_in_allowed_group. BREAKING: auto-Everyone membership for new users removed. Admin UI/API are read-only on Google-managed groups. See docs/auth-groups.md.
Issue #108 Milestone 1. Adds BigQuery table registration via /admin/tables UI and `da admin register-table` CLI without hand-editing table_registry. POST /api/admin/register-table/precheck for round-trip validation. --dry-run flag on CLI. Audit-log entries on register/update/unregister. PUT /api/admin/registry/{id} now preserves registered_at (closes#130).
* fix(scheduler): HTTP marketplaces job + SCHEDULER_API_TOKEN shared secret
Two scheduler-reliability bugs surfaced after the v0.12.1 USER-agnes flip:
1. The marketplaces job called src.marketplace.sync_marketplaces() in-process
from the scheduler container, racing the app's long-lived system.duckdb
handle. DuckDB rejects cross-process writers — every cron tick 500-ed on
"Could not set lock on file ... PID 0".
2. The data-refresh + new marketplaces jobs both 401-ed on the API because
SCHEDULER_API_TOKEN was never propagated by the Terraform startup script.
The scheduler had no credential to authenticate with.
Fix:
- New POST /api/marketplaces/sync-all (admin-only) drives the nightly refresh
through the app process so it inherits the existing DB connection.
- Scheduler swaps fn->http for marketplaces; all jobs are now plain HTTP and
the scheduler is reduced to a cron clock.
- New app/auth/scheduler_token.py adds a shared-secret auth path. The
startup script generates a 256-bit secret on first boot, persists it
across reboots, and writes it to /opt/agnes/.env. Both containers source
the same .env. The app validates incoming Bearer tokens against the env
var (constant-time, length-floored) and resolves matches to a synthetic
scheduler@system.local user that's a member of the Admin system group.
Audit-log entries from the scheduler are attributed to this user.
- app/main.py seeds the synthetic user at startup so the first cron tick
has a valid actor; lazy seed in get_scheduler_user covers token rotation
before the next app restart.
Tests: 5 new in tests/test_auth_scheduler_token.py covering empty/short
secret rejection, exact-match comparison, idempotent user seeding, and
lazy provisioning. 142 marketplace + scheduler tests + 96 auth tests
remain green.
Existing VMs with .env from before this change need a one-time
re-provisioning (re-run startup-script or rotate via openssl rand);
documented in CHANGELOG.
* fix(audit): use '_all' sentinel for bulk marketplace sync — Devin review #127
Avoids the literal string 'marketplace:None' in the audit_log resource
column when the bulk sync endpoint writes its summary row.
* fix(scheduler): unblock event loop + per-job timeouts — Devin review #127
Two findings from Devin re-review on commit 5fbad15:
1. BUG: trigger_sync_all was async def, so FastAPI ran it on the asyncio
event loop. sync_marketplaces() does blocking I/O (subprocess git
clones up to GIT_TIMEOUT_SEC=300 each, threading.Lock, DuckDB writes)
and would freeze every concurrent request for the duration of a bulk
sync. Switched to plain def so FastAPI auto-routes to the thread pool.
2. ANALYSIS: scheduler used a fixed 120s httpx timeout for every POST.
Bulk marketplace sync iterates the registry under a single lock with
up to 300s per repo — easily exceeds 120s on 2-3 slow repos. The
scheduler then sees a timeout, doesn't update last_run, and re-fires
on the next 30s tick, queueing redundant work. Per-job timeout
override added to the JOBS tuple; marketplaces gets 900s (15 min),
data-refresh keeps 120s, health-check 30s.
* fix(auth): require_session_token rejects scheduler shared secret — Devin review #127
require_session_token gates /auth/tokens (PAT minting). Pre-fix it only
rejected JWTs with typ=pat — but the scheduler shared secret is an opaque
string, so verify_token() returns None, payload becomes {}, and the
PAT-claim check silently passed. A caller bearing SCHEDULER_API_TOKEN
could mint persistent PATs that survive a secret rotation.
Added explicit is_scheduler_token() check before the PAT-claim check;
new regression test in tests/test_auth_scheduler_token.py.
Devin's other note (pre-existing async def trigger_sync at marketplaces.py:392
also calls blocking sync_one) — Devin flagged it as out-of-scope for this PR
and I agree; tracking separately.
* release(0.17.0): cut + clean up CHANGELOG duplicates
Cuts 0.17.0 (minor: scheduler shared-secret auth + sync-all endpoint
plus the deploy-shape fixes that landed since the last release tag).
Bumps pyproject from 0.15.0 — also corrects the missed bump from PR #120
(v0.16.0 was tagged on GitHub and shipped as :stable, but pyproject
stayed at 0.15.0, so /api/version, /cli/latest, and `da --version` had
been under-reporting the running release).
Removes the long-form duplicate entries for 0.13.0 / 0.14.0 / 0.15.0
above [0.16.0] — the canonical short summaries (with GitHub-release
links) already exist below 0.16.0, the long forms were leftover state
from before those versions were cut and have been silently shadowed
ever since.
Adds /me/debug HTML page rendering the logged-in user's own session state — decoded JWT claims (no raw token, sha256[:12] fingerprint for log correlation), group memberships with sources and bound external_id when present, resource grants effective via those memberships, and a Refetch from Google (dry-run) button that diffs a fresh fetch_user_groups call against the cached user_group_members snapshot. Gated by AGNES_DEBUG_AUTH env var (default off → 404, route existence undetectable in production). Self-only by construction: user_id is read from the validated session, never echoes raw JWT / password hash / full PAT. Tolerates v13 + v14 schemas via information_schema check on users.external_id.
Replaces the BigQuery wrap-view pattern with a discovery + scoped-fetch toolkit driven by the analyst's Claude session. Adds /api/v2/{catalog,schema,sample,scan,scan/estimate}, da catalog/schema/describe/fetch/snapshot/disk-info CLI commands, sqlglot-backed WHERE validator, process-local quota tracker, agent rails skill (cli/skills/agnes-data-querying.md). BREAKING: BQ wrap views off by default — set data_source.bigquery.legacy_wrap_views=true for one cycle. Backward-compat field_validator on primary_key. Catalog cache now matches documented 300s TTL with RBAC fresh per request. Cuts release v0.14.0.
Adds /admin/server-config UI for editing instance.yaml from the web. Hardening: SSRF gate on data_source URLs, narrow-overlay write strategy, atomic writes, audit log with secret masking on shape changes, threading lock on read-modify-write, corrupt-overlay refusal on write side + louder log on read side, modal Promise resolution on backdrop dismiss, sentinel scrub on save (defense-in-depth client+server). Bundles Windows PowerShell wrapper from #80. Cuts release v0.13.0.
* fix(security+ops): #82#85#87 — auth hardening, API validation, deploy posture
Security and operational hardening across three issue groups:
- M23: docker-compose.override.yml → docker-compose.dev.yml (BREAKING, prod foot-gun)
- C13: Container runs as non-root user 'agnes' (USER directive in Dockerfile)
- M21: Docker resource limits (mem_limit, cpus) on app + scheduler
- M22: Caddyfile security headers (X-Frame-Options, X-Content-Type-Options, Referrer-Policy, -Server)
- M17: /api/health split into minimal (unauth) + /api/health/detailed (auth) (BREAKING)
- M26: release.yml restricts build-and-push to main + workflow_dispatch; paths-ignore for docs
- C2: table_id traversal validation on /api/data/{table_id}/download
- M4: Upload streaming (chunk-read + temp file) instead of full-buffer; /local-md hashed filename
- C5: reset_token removed from POST /api/users/{id}/reset-password response
- C8: Startup WARNING when no user has password_hash (bootstrap window visible)
- M9: Audit log on failed web form login (mirrors /auth/token endpoint)
- M10: Atomic magic-link consume via compare-and-swap (CONSUMED: marker + DuckDB conflict catch)
Also: SSRF protection on /api/admin/configure (#46), memory stats SQL aggregation (#90)
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
* fix(review): SSRF 169.254.x.x + IPv6 multicast; M10 marker cleanup safety
Review fixes:
- Add 169.254.0.0/16 (link-local, cloud metadata) to SSRF regex — was
missing, allowing requests to AWS/GCP/Azure metadata endpoints
- Add ff[0-9a-f]{2}: (IPv6 multicast) to SSRF regex
- M10: wrap Step 3 (CONSUMED marker cleanup) in try-except with
warning log — prevents unhandled exception if DB write fails after
successful token consumption
- Add test for 169.254.169.254 SSRF rejection
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
* fix(review): SSRF IPv6 bypass, CLI health endpoint, upload FD leak
Address Devin Review findings on PR #104:
1. SSRF IPv6 bypass: Replace hostname regex with DNS resolution +
ipaddress module checks. The old regex patterns like `fe80:` only
matched up to the first colon, missing real IPv6 addresses like
`fe80::1`, `fc00::1`, `ff02::1`. The new approach resolves the
hostname via getaddrinfo and checks each resulting IP against
ipaddress.is_private/is_loopback/is_link_local/is_reserved/is_multicast.
2. CLI commands broken: `da setup test-connection`, `da setup verify`,
`da diagnose`, `da status` all called /api/health expecting the old
format (status=="healthy", services dict). Now they call
/api/health/detailed for service-level checks (with graceful fallback
to the minimal endpoint when auth is not configured).
3. Temp file handle leak: _stream_to_temp returns an open
NamedTemporaryFile; callers now close it before shutil.move() to
prevent FD leaks until GC.
Also adds IPv6 SSRF test cases (loopback, link-local, unique-local,
multicast) with mocked DNS resolution for test environment independence.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
* fix(review): download regex blocks hyphenated IDs; document health split
Address Devin Review round-3 findings on PR #104:
1. _SAFE_IDENTIFIER regex blocked hyphenated table IDs: The download
endpoint used the strict SQL-identifier regex which does not allow
dots or hyphens, but Keboola table IDs like in.c-crm.orders
contain both. Switched to _SAFE_QUOTED_IDENTIFIER which allows dots
and hyphens while still blocking path-traversal chars (/, .., \)
and quote/control characters. Added test for hyphenated/dotted IDs.
2. Documented health endpoint split in DEPLOYMENT.md: Added Health
checks & external monitoring section explaining both endpoints
(minimal unauth /api/health vs authenticated /api/health/detailed)
and how to wire external monitoring tools to the detailed endpoint
with a PAT.
Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
* release(0.12.1): cut hotfix for snapshot integrity + #82/#85/#87 hardening
* fix(security): apply CAS pattern to password reset confirm (#82/M10 follow-up)
Devin review on the rebased PR flagged the asymmetry: magic-link verify
got the atomic compare-and-swap pattern in the original M10 fix, but
password reset confirm at /auth/password/reset/confirm was still using
read-validate-clear. Two concurrent POSTs with the same valid reset
token could both succeed in setting different new passwords (last-write-
wins). Lower severity than the magic-link race because the attacker
would need the reset token AND to race the legitimate user, but the
asymmetry was a polish gap.
Mirrors app/auth/providers/email.py::_consume_token CAS exactly: write
unique CONSUMED:<random> marker via UPDATE...WHERE token=old_token, then
SELECT to verify our marker won, then proceed. Only the winner clears
the marker and applies the password change.
New regression test_concurrent_reset_only_one_wins in
tests/test_password_flows.py::TestResetConfirm pins the contract: two
ThreadPoolExecutor workers + Barrier hit /reset/confirm with the same
token; exactly one gets 302 (password applied), the other gets 200 with
'Invalid or expired'. Sanity-checked against the pre-CAS code — both
POSTs got 302 (race confirmed).
---------
Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Discovered when 0.11.5 deployed onto agnes-dev whose system DB had been
bumped to schema_version=10 during local experimentation with a parallel
WIP branch (PR #72-style Context Engineering work). The lab v10 migration
laid down its own table set without including v9's role tables — so the
v9 binary saw `current=10 > SCHEMA_VERSION=9`, correctly treated it as a
future-version-rollback and skipped its migration ladder, but ALSO
skipped the table-creation step. Every query against user_role_grants
(`_hydrate_legacy_role`, /profile, require_internal_role's DB fallback,
every admin-gated request) then crashed with `_duckdb.CatalogException:
Table with name user_role_grants does not exist`. Symptom on agnes-dev:
HTTP 500 on /profile, admin nav vanished, /admin/* returned 403.
Fix: hoist `conn.execute(_SYSTEM_SCHEMA)` to the TOP of _ensure_schema,
unconditional. _SYSTEM_SCHEMA is all `CREATE TABLE IF NOT EXISTS`, so
existing tables stay untouched (columns + data preserved); missing
tables get created. Idempotent, near-zero cost (a few dozen no-op DDLs
per process start). The migration block below still calls
_SYSTEM_SCHEMA when migrating; that's now the redundant-but-cheap
follow-up — left in place so the migration ladder reads chronologically.
Concrete coverage of the rebase scenario the user asked about — a
contributor switching FROM a lab future-schema branch BACK to a
released binary now boots cleanly:
- Forward rebase (older → current): unchanged, ladder runs as before.
- Same-version rebase: unchanged, _seed_core_roles tail call still
drives doc-tweak refresh.
- Backward "lab" rebase (this fix): tables get re-materialized; if the
DB is still on a future schema_version, _seed_core_roles tail call
remains gated so we don't accidentally write data into a schema
shape this binary doesn't understand. Operator can drop the v9
schema_version manually to trigger a clean ladder re-run if they
want the full v8→v9 backfill (what we did to recover agnes-dev).
Test: new test_split_brain_future_version_with_missing_tables_self_heals
in tests/test_db.py::TestMigrationSafety. Synthesizes a v99 DB whose
only existing table is schema_version, runs _ensure_schema, asserts
both user_role_grants AND internal_roles AND group_mappings AND users
exist after the call, and that the schema_version row stays at 99
(future-version contract). test_future_version_is_noop docstring
updated to reflect the new self-heal pass — its only assertion (the
version-row contract) still holds unchanged.
pyproject.toml: 0.11.5 → 0.11.6.
CHANGELOG.md: new [0.11.6] section under [Unreleased] skeleton.
Follow-up to the RBAC v13 + marketplace work in the parent commit. Addresses
deferred Devin findings, gemini-flagged blockers, and adds three guard rails.
== Schema v14 — FK constraints on user_group_members + resource_grants ==
Adds DuckDB foreign-key constraints so cascade deletes can no longer leave
orphaned member / grant rows pointing at a deleted group_id (which were
relying on application-level cascades up to v13). Migration is RENAME →
CREATE-with-FK → INSERT → DROP, wrapped in BEGIN TRANSACTION so a partial
failure rolls back without leaving the DB at a half-applied schema.
== AGNES_ENABLE_TABLE_GRANTS feature flag (default off) ==
ResourceType.TABLE was shipped in the parent commit as listing-only — admins
can record grants but runtime enforcement still flows through legacy
dataset_permissions. To avoid the misleading-UX surface area, the chip is
hidden from /admin/access and POST /api/admin/grants returns 422 with the
env-var name in detail until the operator opts in. Existing TABLE rows in
resource_grants stay listable + deletable so cleanup is never blocked.
Helpers: is_resource_type_enabled(rt), enabled_resource_types().
== Break-glass admin CLI ==
`da admin break-glass <user>` adds the user to the Admin user_group with
source='system_seed' regardless of RBAC state. Bypasses authentication —
relies on filesystem access to ${DATA_DIR}/state/system.duckdb implying
host-level trust. Recovery path when the operator has locked themselves
out of /admin/access.
== Devin round-2 fixes (deferred on b4ec4c4) ==
- src/repositories/user_groups.py — narrow update() guard from blocking any
mutation on system groups to blocking name change only. Description edits
now pass through. Endpoint pre-check stays as defense-in-depth. Prior
behavior surfaced as a misleading 409 'Cannot rename a system group' on
description-only PATCH.
- app/api/access.py:delete_group — wrap cascade DELETEs + repo.delete in
BEGIN TRANSACTION / COMMIT / ROLLBACK. Prevents orphan rows if any
DELETE fails after the user_groups row is gone.
- app/marketplace_server/{packager,router}.py — split compute_etag_for_user()
from build_zip(); router resolves etag first and 304-shorts before any
file read or ZIP_DEFLATED. In-process cachetools.TTLCache (default 120s,
env-tunable via AGNES_MARKETPLACE_ETAG_TTL, set 0 to disable).
invalidate_etag_cache() called by sync to force re-hash on content drift.
== Tests ==
- TestTableGrantsFeatureFlag (4 cases) — endpoint exclude/include, grant
rejection/acceptance under the flag.
- test_v12_to_v13_finalize_rollback_on_failure — destructive: monkeypatches
_seed_system_groups to raise mid-transaction, asserts schema_version stays
at 12, legacy tables intact, new tables empty (rollback fired). Then
restores the real function and asserts the retry succeeds.
- test_update_system_group_description_allowed,
test_update_system_group_same_name_no_op — repo-level coverage of the
narrowed guard.
This squashes 13 commits from ma/staging plus a small docstring translation
into a single coherent unit. Three workstreams.
== RBAC v13 redesign ==
- Drops core.viewer/analyst/km_admin/admin hierarchy and the
internal_roles / group_mappings / user_role_grants / plugin_access tables.
- Replaced by user_group_members + resource_grants. Atomic v12→v13 backfill
wrapped in BEGIN/COMMIT; ROLLBACK leaves schema_version at 12 for retry.
- Two authorization primitives in app.auth.access:
require_admin — Admin-group god-mode
require_resource_access(rt, "{path}") — entity-scoped grants
Single DB lookup per request; no session cache; no implies BFS.
- /admin/access UI (single page) replaces /admin/role-mapping +
/admin/plugin-access. CLI `da admin group/grant *` replaces
`da admin role/mapping/grant-role/revoke-role/effective-roles`.
- ResourceType.TABLE listing-only — admins can record table grants,
runtime enforcement still flows through legacy dataset_permissions
(migration plan in docs/TODO-rbac-data-enforcement.md).
== Claude Code marketplace ==
- Aggregated /marketplace.zip + /marketplace.git/* (PAT-gated,
RBAC-filtered, content-addressed cache via dulwich).
- Admin god-mode dropped on the marketplace surface — admins curate
their own view via grants like everyone else.
- Bare-repo cache materializes per RBAC-filtered ETag; stale entries
not pruned in this iteration (disclaimed in git_backend.py docstring).
== #81#83#44 security/ops hardening ==
- #81 Group A — orchestrator ATTACH allow-listing (extension/url/alias).
- #81 Group B — Keboola extractor 3-state exit codes:
0 success / 1 total fail / 2 PARTIAL fail
Sync API logs PARTIAL FAILURE alert on exit 2. Operators with binary
alerting must teach it the new partial signal.
- #81 Group C — schema v10 view_ownership; rejects silent overwrite
of a prior connector's view name on collision.
- #81 Group D — extractor-side identifier validation.
- #83 — Jira webhook fail-closed when JIRA_WEBHOOK_SECRET unset
+ path-traversal fix.
- #44 — entire /api/scripts/* surface is admin-only (planted-script +
sandbox-bypass risk closed).
== Web UI polish + deploy fix ==
- /admin/access: live grant-count badges (no stale snapshot revert),
shared-header CSS link added to /catalog and /admin/{tables,permissions},
per-resource-type colored stripes.
- docker-compose.host-mount.yml: bind,rbind so dual-disk hosts don't
silently shadow sub-mounts and write state to the wrong disk.
== OSS vendor-neutralization (waves 1+2) ==
- scripts/grpn/ → scripts/ops/. Customer-specific identifiers
(project IDs, internal hostnames, dev/prod VM IPs, brand names)
replaced with placeholders across code, docs, Terraform, Caddyfile,
OAuth probe, and planning docs. Downstream infra repos that copied
scripts/grpn/agnes-tls-rotate.sh or agnes-auto-upgrade.sh must
update the path.
== Translation ==
- src/repositories/user_groups.py::ensure_system docstring translated
from Czech to English for codebase consistency.
Co-authored-by: Mina Rustamyan <mina@keboola.com>