* fix(store): surface review failures + harden publish gate
Four independent fixes to the flea-market submission pipeline, all surfaced
by an admin upload that landed at status='approved' without an LLM review.
1. LLM truncation no longer pins submissions in review_error.
- Raised MAX_RESPONSE_TOKENS 2500 → 6000 in llm_review.py
- Added one-shot retry-with-doubled-budget in anthropic_provider.py
(capped at 4× initial)
2. Flea detail page surfaces the latest submission's failure verdict even
when a previously-approved version is still serving (deferred-promotion
path). The _quarantine_banner gate widened from `visibility != approved`
to also fire on `blocked_inline / blocked_llm / review_error`, with copy
that distinguishes the v2+ edit case ("Latest edit failed review —
previously approved version (vN) keeps serving") from the initial-upload
quarantine wording.
3. Restore button + endpoint no longer allow restoring a version that was
never approved. Added StoreEntitiesRepository.get_with_version_approvals
joining store_submissions, gated the UI button on submission_status in
('approved', None), rendered status pills for non-restorable rows, and
added a 400 version_not_approved guard in POST /restore.
4. **BREAKING (operator-facing)**: publish gate is now fail-CLOSED on
misconfig. The previous get_guardrails_enabled() silently fell back to
"disabled, auto-approve everything" when guardrails.enabled=true in YAML
but no ANTHROPIC_API_KEY was in env. Split into:
- get_guardrails_enabled() (intent — YAML)
- get_guardrails_llm_provider_ready() (readiness — env)
Three-state matrix:
enabled=false → auto-approve (unchanged)
enabled=true + ready=true → normal pipeline (unchanged)
enabled=true + ready=false (NEW) → submissions hold at pending_llm
awaiting admin retry or override
(was: silent auto-approve)
Admin "Retry review" eligibility broadened to include pending_llm.
Boot-time WARNING banner surfaces the misconfig in app/main.py.
docs/STORE_GUARDRAILS.md updated with the three-state matrix.
Operators relying on the auto-fallback for local-dev no-LLM setups must
now explicitly set `guardrails.enabled: false` in instance.yaml.
Tests: 4623 passed. Added TestPublishGateFailClosed (4 tests) and
TestRestoreVersion::test_restore_rejects_* (3 tests). conftest.py adds an
autouse fixture defaulting guardrails OFF so legacy tests don't need to
know about the new toggle.
* fix(store): admin override promotes v2+ edits to current
The override handler at app/api/admin.py:3708 only flipped submission
status → 'overridden' and entity visibility → 'approved'. Under the v37+
deferred-promotion model that's insufficient for v2+ edits / restores:
the new bundle sits in versions/v<N>/plugin/ and the entity row stays at
the prior approved version_no + hash + on-disk live bundle. Installers
kept getting the OLD bytes the admin had just intended to replace.
Mirror the runner.run_llm_review auto-approval branch: look up the
submission's version_hash in entity.version_history, and if its `n`
differs from entity.version_no, promote_version + _swap_live_to_version.
Initial v1 overrides are unaffected — the loop finds n=1 == version_no
and skips promotion.
Tests:
- test_override_v2_edit_promotes_to_current: stage v1 approved + v2
blocked_llm; override the v2 sub; assert entity.version_no=2,
entity.version flips off the v1 hash, and the live plugin/ dir
mirrors versions/v2/plugin/.
- test_override_v1_initial_upload_no_promote: regression guard so the
promote loop doesn't accidentally bump a v1 override.
Audit log gains a promoted_to_version_no field on the override action.
* fix(store): retry/rescan review staged bundle; override forward-only
Two adversarial-review findings from a Codex pass on the publish-gate
work.
C1. Admin retry + rescan were passing live `plugin/` to the LLM. For a
v2+ submission held at `pending_llm` / `blocked_llm` / `review_error`,
live still holds the prior approved version's bytes — so the LLM
reviewed the WRONG bytes, and the runner's hash-match promotion in
`run_llm_review` would then advance the entity to staged bytes that
were never actually reviewed. Resolve the staged
`<entity>/versions/v<N>/plugin/` from the submission's
`version_history` entry, with a fall-back to live for legacy pre-v37
rows that never seeded a versions/ dir. Helpers
`_submission_plugin_dir` and `_version_no_for_submission` added to
`app/api/store.py` so override / retry / rescan share one path.
H1. Override's promote loop used `target != current`, which would
silently demote the live bundle when admin overrode a stale v2
submission while v3 was already approved + live. Changed to
`target > current` so override flips status + visibility on the row
regardless, but on-disk promotion only fires forward. Same `>`
defensive guard applied in `runner.run_llm_review` so a late LLM
verdict racing with a newer approval can't demote either.
Tests:
- TestAdminRetryReviewsStagedBundle::test_retry_v2_blocked_passes_staged_dir_not_live
- TestAdminRetryReviewsStagedBundle::test_rescan_v2_blocked_passes_staged_dir_not_live
- TestOverrideForwardOnly::test_override_stale_v2_does_not_demote_when_v3_current
* review polish: CHANGELOG drift, override eligibility, defensive copy
Three small additions on top of the retry/rescan staged-bundle fix:
1. CHANGELOG: the PR's bullets had drifted into the released
[0.54.17] section during rebase (context-match landed them next
to already-released content). Moved them up to [Unreleased] where
they belong; [0.54.17] now holds only what was actually released
(refresh-marketplace ls-remote, /me/activity hero, CI sharding +
workflow polish).
2. app/api/admin.py: admin override eligibility now accepts
pending_llm alongside blocked_inline + blocked_llm + review_error.
Closes a UX gap from the new fail-CLOSED behavior: under
enabled-but-not-ready, a known-good submission would otherwise
sit indefinitely until the admin set credentials AND clicked
Retry. Override already routes through version_history (and is
now forward-only on promote), so it stays safe for v2+ deferred-
promotion submissions.
3. src/repositories/store_entities.py: get_with_version_approvals
defensively copies each version_history entry before annotating
with submission_status. self.get() re-parses JSON each call today
so this is belt-and-suspenders against any future caching layer
leaking the annotated key into a subsequent plain get() call.
Tests: 112 passed (focused on test_store_entity_versions +
test_admin_store_submissions, covering the retry/rescan staged-
bundle fix the author shipped + this polish).
---------
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
* feat(home): status frame on /home — last sync, sessions, prompts, tokens, projects
Adds the homepage status frame: a 5-card row above the install-hero /
offboard-strip on /home showing the calling user's Last sync (their
last `agnes pull`), Sessions, Prompts, Tokens used, and Projects worked
on, with a 24h/7d pill toggle.
Backed by `GET /api/me/home-stats?window=` (one DuckDB CTE joining
`users` + `usage_session_summary` + `usage_events`) and SSR'd from the
same `compute_home_stats` helper on initial paint so there's no
spinner. The window toggle is the only JS-driven path.
Side surfaces:
- `GET /api/sync/manifest` now stamps `users.last_pull_at` so
`agnes pull` (and the Claude Code SessionStart hook that wraps it)
imprints the analyst's last sync time for the new card.
- `usage_session_summary` gains four BIGINT token counters
(input_tokens, output_tokens, cache_read_tokens, cache_creation_tokens)
summed from JSONL `message.usage.*` per assistant turn.
- `USAGE_PROCESSOR_VERSION` bumps 1 → 2 so the session-pipeline
reprocess loop invalidates stale summaries and backfills tokens
on the next tick.
Schema migration v43 → v44 is idempotent ALTERs (last_pull_at +
4 token columns) — fresh installs receive them from `_SYSTEM_SCHEMA`,
upgrade path runs `_v43_to_v44`. Defaults (NULL / 0) backfill
existing rows cleanly.
9 new tests in tests/test_home_stats.py cover the migration,
endpoint shapes (24h/7d/unknown/empty/missing-user), and the
manifest-side last_pull_at bump.
* docs(CHANGELOG): homepage status frame entries under [Unreleased]
The post-rebase release-cut now belongs to whichever PR lands next
after main rolled to 0.54.9. This PR logs its bullets under
[Unreleased] (Added: homepage status frame, per-user pull tracking,
token counters; Changed: schema v43 → v44 migration) so they ride
out with the next release-cut.
* fix(tests): bump test_schema_v42_migration asserts to v44
CI failed because tests/test_schema_v42_migration.py hardcoded
`assert SCHEMA_VERSION == 43` and `assert v == 43` after init.
v44 (homepage stats frame backing columns) was introduced in the
preceding feat commit; this aligns the existing v42-era migration
tests with the new schema version.
* feat(home): gate status frame on operator flag + user.onboarded
Two gates on the homepage status frame:
1. **Operator master switch** — `get_home_status_frame_visibility()` in
app/instance_config.py mirrors the existing `get_home_automode_visibility()`
shape: env var `AGNES_HOME_SHOW_STATUS_FRAME` > yaml
`instance.home.show_status_frame` > default `True`. Cautious-rollout
instances can disable the frame without forking; the yaml example
documents both knobs.
2. **Onboarded gate** — the template only renders the frame when the
caller's `users.onboarded` is true. First-day users see a clean
install-hero before all-zero stat cards; the frame appears
automatically on the next render after `agnes init` POSTs
`/api/me/onboarded`.
Router skips the `compute_home_stats` DB read entirely when either
gate is closed; `home_stats` arrives at the template as None in that
branch and the `{% if %}` shortcuts the include.
Why both gates: PostHog feature flags evaluated and rejected — this
codebase uses PostHog for analytics capture only, not feature gating;
adding a per-user feature_enabled() call on the /home critical path
would couple the homepage render to a remote eval and still require
an admin master switch. The onboarded gate is a UX coherence rule
layered on top of the operator switch, not an A/B test signal.
3 new tests in test_home_stats.py cover the env-var resolution
(falsey values + default-true). The yaml example gets a `home:`
block documenting both `show_automode` (pre-existing flag, was
undocumented in the example) and `show_status_frame`.
Addresses post-merge review findings on #290:
- Admin Rescan is the only post-v30 producer of status='blocked_inline'.
Re-add it to admin queue 'Needs review' filter chip and to
TERMINAL_BLOCKED_STATUSES in the bundle-purge job so rescan-produced
rows surface in the default operator view and bundles get TTL-swept
instead of lingering indefinitely.
- Update three doc-drift sites still referring to the pre-#290 spam
counter scope (counted blocked_inline). The counter now narrows to
blocked_llm + review_error; fix the comment in app/api/store.py,
the docstring in get_guardrails_blocked_quota_per_day(), and the
operator-facing hint rendered on /admin/server-config.
- Add positive test for _reject_inline_or_continue validation branch
(code='validation_failed', checks payload shape, no-DB-write
contract). Locks the frontend wizard's detail.checks contract.
- Tighten test_quota_disabled_with_zero — assert (200, 201) explicitly
instead of !=429 so a 500 regression no longer passes.
- _reject_inline_or_continue takes plugin_dir and lazy-computes
bundle_meta only on the security branch. Validation rejects no
longer pay for a SHA256 walk on the bundle.
- Surface store.upload.security_blocked audit-log write failures via
logger.exception instead of swallowing — that audit row is the only
forensic trace by design.
* feat(home): Getting Started + Overview + Usage modes sections
Three new content cards rendered between the install-hero and the
existing connector tiles on /home. Order: Getting Started → Overview
→ Usage modes → connectors.
- Getting Started — dismissible card with two clickable rows linking
to /setup (install flow) and /setup-advanced (deeper reference).
Subsumes the legacy `.advanced-pointer` row that sat above the news
section. Per-device dismiss via a generic localStorage handler:
`.home-card-close[data-dismiss-key="..."]` inside a <section> wires
itself up at page load — drop in any future dismissible card without
per-card JS.
- Overview — operator-owned HTML body sourced from the new
`instance.overview` yaml field (env override
`AGNES_INSTANCE_OVERVIEW`). HTML in, HTML out via the same `| safe`
filter as news_intro. Empty default hides the section entirely,
keeping the OSS vendor-neutral; operators paste their product
framing / privacy posture into instance.yaml. New helper
`get_instance_overview()` in app/instance_config.py mirrors
`get_instance_logo_svg()`.
- Usage modes — three OSS-shipped tiles (Terminal / VS Code / Claude
Desktop · claude.ai) explaining each surface and linking to the
matching /setup-advanced anchors. Closes the gap for users
wondering "where do I actually run this".
Supporting changes:
- setup_advanced.html gains a new `#claude-app` section between
#vscode and #workspace, anchored by the Usage modes Claude Desktop
tile. Covers the marketplace registration paths and when to prefer
the terminal. Added to the table of contents.
- Three new tests in test_web_home_page.py pin the Getting Started
card markup, the Overview-on-when-yaml-set path, and the
Overview-off-by-default path. All 13 tests in the file pass.
Operator follow-up (separate infra PR — NOT this PR): paste the
Foundry-specific Overview body into instance.yaml's
`instance.overview` field. OSS ships with an empty default.
* fix(home): Overview is operator-owned content — drop dismiss button
Earlier iteration added a close X to the Overview section to match
the Getting Started card's dismiss UX. Wrong call: Overview is
operator-authored reference content (privacy posture, telemetry
policy, project framing) and a per-device localStorage hide means
returning users who want to re-read the policy can't recover it
without clearing storage.
Reverts the close button + the data-dismiss-key on the Overview
section. Test inverted to assert the dismiss key is absent (defends
against a future drive-by adding it back). Getting Started still
dismisses — that's procedural getting-started content users
legitimately stop needing once they've finished setup. Overview is
always reachable; whole section is still opt-in at the operator
level via the empty-yaml default.
* fix(home): Terminal usage-mode tile is informational (no click-through)
The setup hero above /home's Usage modes already walks the user
through the Claude Code CLI install — the Terminal tile click-through
to /setup just round-trips back to content the user already scrolled
past. Switch Terminal to a non-anchor <div> and scope the hover
affordance to a.home-usage-item so VS Code + Claude Desktop tiles
keep their click-through (those legitimately deep-link into
/setup-advanced anchors).
* fix(home): point Usage modes guidance at ~/{workspace}/Projects/ subfolder
The bundled plugin scopes the session-analysis loop and the
central-catalog sync to ~/<workspace>/Projects/, not the workspace
root itself — that convention already appears in the install hero's
Step 4 manual-fallback note ('Don't create ~/<workspace>/Projects/
manually — the bundled plugin offers to set it up after install').
Usage modes' footer guidance now matches: 'create every project
under ~/<workspace>/Projects/'. Also calls out that the
session-analysis loop is scoped to that root so users understand
why working outside the workspace dir is invisible to the platform.
* feat(brand): inline operator SVG logo + drop header subtitle (release 0.54.6)
Three header tweaks, one PR:
1. _app_header.html drops the small uppercase subtitle line below the
brand. instance.subtitle still flows into the CLAUDE.md preamble +
init welcome template ("Operated by …"); only the web header chrome
loses it.
2. get_instance_logo_svg() in app/instance_config.py reads
instance.logo_svg (yaml) / AGNES_INSTANCE_LOGO_SVG (env). The
yaml field was already documented in instance.yaml.example and the
template already supported inline <svg> via {{ config.LOGO_SVG |
safe }}, but router.py:344 hard-coded LOGO_SVG = "" — the middle
wire was missing. Now operators can paste a lockup directly into
their instance.yaml under instance.logo_svg: | and have it render
in the header. Resolution mirrors get_instance_brand (env > yaml >
""). instance.name remains independent: drives browser <title>
tags + page h1s + CLAUDE.md heading; the SVG is the web-header
visual only.
3. .app-header-logo svg gains max-height: 40px; width: auto; so any
operator's lockup scales via its viewBox to fit the 72px header
without per-asset width/height edits. Pairs with #2 — without the
clamp, raw artwork (e.g. a 1600x430 lockup) overflows the chrome.
Release-cut included per the same-PR rule (Unreleased contained only
these bullets after rebase onto 0.54.5).
* revert: keep app-header-subtitle span — out of scope for this PR
Initial commit dropped the subtitle line on the assumption that
the user wanted both the secondary header line AND the future-SVG
brand cleaned up. The actual ask was narrower: drop the hostname
suffix that renders inside instance.name ("Foundry AI (hostname)"),
which is a startup.sh concern, not a template one. Restore the
subtitle span and the CHANGELOG bullet that announced its removal.
PR scope narrows to LOGO_SVG wiring + CSS clamp only.
* fix(header): hide subtitle span when instance.subtitle is empty
Pre-fix the template fell back to the literal string 'Data Analyst
Portal' when INSTANCE_SUBTITLE was unset, so operators who left the
field empty saw a stray hardcoded label below their brand. Switched
to a Jinja {% if %} guard around the whole <span class="app-header-
subtitle"> so an empty subtitle produces no element at all — clean
header chrome instead of placeholder leak.
* feat(home): hide install-hero once onboarded + X close button
- Wrap the entire install-hero in `{% if not onboarded %}` so once
`users.onboarded=true` (auto-flipped by `agnes init` POSTing
/api/me/onboarded, or by the new X / existing fallback button) the
blue hero disappears entirely. Pre-PR the onboarded branch reused
the same shell with a "Welcome back" header + "Steps 1–4 done" badge
+ minimize toggle, which visually outweighed the actual nav hub.
- Add a circular × close button (top-right of the hero, rendered only
when not-onboarded). Click → window.confirm() asking the user to
acknowledge onboarding → POST /api/me/onboarded → reload. The
confirm string intentionally avoids the literal phrase
"Mark me as offboarded" because cli/commands/onboarded.py::status
scans /home's rendered HTML for that exact marker as a fallback for
the api/me/profile check.
- Lift the offboard escape hatch out of the hero into a discrete
`.offboard-strip` rendered below, gated `{% if onboarded %}`. Lets
the analyst flip back to the install view after wiping their
workspace folder.
- Centralize the /api/me/onboarded POST into a `postOnboarded()` JS
helper reused by the hero X, the existing "Mark me as onboarded"
fallback button, and the new offboard button.
Tests updated to match the new behavior:
- `test_home_onboarded_user_sees_nav_hub` — asserts the hero is gone
and the offboard strip is the only setup-flow remnant.
- `test_minimize_toggle_no_longer_rendered` (renamed) — asserts the
minimize toggle is absent in both states (was previously rendered
inside the now-hidden onboarded branch of the hero).
- `test_home_no_auto_transition_after_post_until_reload` — checks
offboard-strip presence post-flip instead of the removed
"Welcome back" hero copy.
* fix(home): X-close button used invalid source enum, hit 422
The X button's data-target-source was 'self_acknowledged_x' to give
audit_log a separate marker for X-vs-button-driven flips. But
app/api/me.py:38's OnboardedRequest pins source to a Literal of
['agnes_init', 'self_acknowledged', 'self_unmark'] — pydantic
returned 422 on every X click.
Confusing side effect: both buttons share self-mark-status as the
status element, so the failed X click rendered 'Failed (422)' next
to the still-functional 'Mark me as onboarded' button. Looked like
the button itself broke.
Fix: drop the _x suffix. Both surfaces now POST source='self_acknowledged'.
Distinction in audit_log is not load-bearing — the source field
captures user intent ('I'm onboarded'), not the specific UI affordance.
* feat(store-guardrails): admin-configurable content thresholds
Adds the flea-market content guardrail floors to the /admin/server-config
editor so operators can tune the bar without code changes. Defaults are
unchanged (60 chars description, 25 chars command, 5 distinct words, 200
chars body) — patching guardrails.* in instance.yaml or via the admin UI
overrides any of them and the next inline check picks up the new value.
src/store_guardrails/content_check.py now resolves the four floors via
helper functions (_min_desc_chars / _min_command_desc_chars /
_min_distinct_words / _min_body_chars) that read app.instance_config at
call time. Module-level _DEFAULT_* constants remain as fallbacks if
the import fails (defensive — keeps the guardrail module loadable
without the app package on its path).
app/instance_config.py grows four matching getters returning the live
value with sane defaults + integer coercion.
app/api/admin.py registers 'guardrails' as an editable section + ships
nine known-fields entries (min_description_chars,
min_command_description_chars, min_distinct_words, min_body_chars,
enabled, review_model, blocked_quota_per_day, blocked_bundle_ttl_days,
stuck_review_grace_seconds) with operator-facing hint copy explaining
what each knob does.
app/web/templates/admin_server_config.html gets a SECTION_META entry
so the section renders as 'Flea-market guardrails' with a help string
instead of a bare section ID.
app/web/router.py threads the live thresholds into /store/new and
/store/examples via a small _guardrail_thresholds() helper so the
disclosure copy, char counter, and "Why these limits" table render
the configured value (not a hardcoded 60). End-to-end smoke verified:
PATCH guardrails.min_description_chars=90 → /store/new immediately
renders "90 characters" + JS DESC_MIN=90 on the next request, no
restart required (helpers read live config per call).
* chore(store-guardrails): address PR review safe-fix findings
Code-review safe_auto findings on PR #281 (review run
20260513-100126-64052520):
- CHANGELOG: add Unreleased entry covering the new
/admin/server-config Flea-market guardrails section, the four live
threshold getters, and the route-helper rendering knobs. Required by
the project's non-negotiable "Changelog discipline" rule.
- content_check.py: narrow `except Exception` to `except ImportError`
on the four `_min_*()` resolver helpers. Surface-level TypeError /
ValueError on a malformed YAML value belongs to the
instance_config getters' own try/except — the resolvers should only
defend against the in-tree import itself failing, not silently
swallow real bugs in the getters.
- store_upload.html: refresh the stale "30-char threshold" comment to
reflect the configurable floor (default 60), and add `|default(60)`
/ `|default(25)` / `|default(5)` filters to the disclosure-copy
bindings so the upload form matches store_examples.html's
belt-and-suspenders rendering if a future route ever renders the
template without populating the `guardrail` context.
- router.py: tighten `_guardrail_thresholds()` return annotation
from bare `dict` to `dict[str, int]`.
Residual work (left for separate change after operator direction):
- Add round-trip test (PATCH guardrails -> next inline check uses
new value) — primary testing gap.
- Decide policy on `min_*=0` (currently coerced to 1 via
`max(1, int(val))`) vs treating 0 as a disable sentinel like
neighbour getters (`blocked_quota_per_day`,
`blocked_bundle_ttl_days`).
- Add POST-time integer validation for `guardrails.*` so a typo'd
YAML value (bool / string / float) errors loudly instead of
silently falling back to the default.
* test(store-guardrails): cover admin-configurable thresholds + PATCH round-trip
Closes the "primary testing gap" Vojta noted in the safe-fix commit
on PR #281 — the four new `get_guardrails_min_*` getters and the
PATCH-takes-effect-on-next-check live-config flow had no direct
coverage.
10 new tests in `tests/test_store_guardrails_admin_config.py`:
- TestGuardrailGetterDefaults (4 tests) — each new getter returns the
documented default (60 / 25 / 5 / 200) when nothing is configured.
- TestGuardrailGetterOverlay (5 tests) — overlay-driven overrides win,
string values that look numeric coerce via int(), garbage strings
fall back to default via the (TypeError, ValueError) branch, and the
`max(1, int(val))` floor pins zero/negative inputs to 1.
- TestPatchRoundTrip (1 test) — PATCH `/api/admin/server-config`
`guardrails.min_description_chars=90`, then call content_check
against a 75-char description that previously passed: must now fail
with `too_short`. Then PATCH back to 60 and verify the next check
passes again. Closes the cache-invalidation contract Vojta relies on
for the "no app restart" claim — broken without the
reset_cache() bracket in /api/admin/server-config.
The TestGuardrailGetterOverlay.test_zero_or_negative_floored_to_one
test pins the current `max(1, int(val))` policy. Vojta's safe-fix
commit explicitly left "policy on min_*=0 vs disable-sentinel" as
residual work — pinning the current behavior here ensures any future
change to use 0 as a disable sentinel must update this test (and the
reviewer sees the policy decision).
Verified: 4509 tests pass locally (4499 existing + 10 new).
* release: 0.54.2 — admin-configurable flea-market guardrail thresholds + tests
Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.1 →
0.54.2) bundling Vojta's admin-configurable thresholds for the
flea-market content guardrail (9 knobs in /admin/server-config) plus
the test coverage closing the "primary testing gap" he punted in the
safe-fix commit.
No DB migration; defaults unchanged from PR #276 — instances that
don't set `guardrails.*` keep the original bar transparently.
---------
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
Co-authored-by: ZdenekSrotyr <139972147+ZdenekSrotyr@users.noreply.github.com>
- instance.brand (env AGNES_INSTANCE_BRAND, default "Agnes") +
instance.workspace_dir replace hard-coded "Agnes" / "~/Agnes" across
/home, /setup, /setup-advanced, /login, /install, /me/debug, and the
Claude Code clipboard setup script. Terraform-friendly env override;
defaults preserve existing Agnes branding.
- Explicit "create workspace folder" step on /home (OS-tabbed mkdir+cd)
+ same step baked into the clipboard script as step 2. Drops the
implicit assumption that `agnes init --workspace .` lands in a
sensibly-cd'd shell.
- Final "Restart Claude Code" step in the setup script (unconditional,
between connectors and Confirm) so freshly-installed plugins, MCP
servers, and SessionStart hooks load on the next Claude Code session.
- Asana reverted from hosted Remote MCP back to PAT + raw REST against
app.asana.com/api/1.0. MCP envelope shape consumed ~5x tokens per
call; the PAT path lets the agent read flat REST fields. Existing
MCP registration is detected and the user is asked whether to remove
it (default Y, with benefits listed: token cost, no third-party hop,
no OAuth refresh dance, deterministic envelope shape).
- Atlassian connector instructs picking the longest API-token expiry
(today "1 year") to cut re-mint friction. No public query-parameter
hook exists on id.atlassian.com to pre-select expiry, so the prompt
documents the manual click and acknowledges that limitation.
- Uniform ✅ / ❌ per-connector marker contract (Asana, GWS, Atlassian)
for the Confirm summary to grep. Each connector now ends with a
Claude-driven end-to-end test that uses Claude Code's own bash to
exercise the stored credential and prints
"✅ <Connector> integration verified — ..." (or the failure variant).
* fix(cta): fall back to textarea+execCommand when Clipboard API rejects
The "Setup a new Claude Code" CTA fetches /auth/tokens, parses the JSON
response, renders the setup script, THEN calls
`navigator.clipboard.writeText()`. Modern browsers (Safari, Firefox, and
Chrome on stricter configurations) reject `writeText` with
NotAllowedError when transient user activation has been consumed by an
intervening `await` — which is exactly the case here. Users perceived
this as "the browser blocked the copy" and got the manual-paste fallback
modal even though the textarea + `document.execCommand('copy')` path
WOULD have worked synchronously without needing fresh user activation.
`copyToClipboard` now:
- prefers the modern Clipboard API (unchanged for the happy path)
- on writeText rejection, falls back to `copyViaTextarea` instead of
surfacing the rejection to the caller's catch block.
`copyViaTextarea` is the previously-inline textarea fallback factored
out into a named helper, with two small hardening touches:
- `readonly` + `tabindex=-1` so the hidden textarea doesn't steal
focus or pop the virtual keyboard on mobile.
- explicit `setSelectionRange(0, text.length)` to belt-and-braces the
selection on iOS Safari (where `.select()` alone sometimes selects
zero chars on touch-focused textareas).
Only the CTA button needed this — the Step-1 install-command and the
connector-copy buttons all call `writeText` synchronously inside the
click handler (no awaits in between), so they keep their existing
user-gesture context and didn't hit the same rejection. No template
changes there.
* refactor(home): fold Atlassian MCP registration into connectors block
The standalone "Register the Atlassian MCP server" step (was step 6 in
the unified setup script) moves INTO the Atlassian connector's prompt
body so all Atlassian-related setup lives in one logical group. Same
intent that #247 carried for connectors, applied one level deeper:
the hosted Remote MCP registration is part of "set up Atlassian", not
its own ungrouped step.
What changed:
- `app/web/connector_prompts.py` — the Atlassian prompt's step 5
replaces the speculative "Register the on-demand Atlassian MCP under
.claude/mcp/atlassian" line with the actual hosted Remote MCP
registration: `claude mcp add --transport sse atlassian
https://mcp.atlassian.com/v1/sse || true`. The `|| true` keeps re-runs
idempotent and the body explains the OAuth-on-first-use contract.
Both /home's Atlassian tile and the inlined setup-script Atlassian
sub-block emit this line — single source of truth holds.
- `app/web/setup_instructions.py` — `_mcp_servers_block` deleted; the
`mcp_servers` step is removed from `_step_numbers`; resolve_lines no
longer calls it.
- Renumbering: install (1), init (2), catalog (3), preflight (4),
marketplace (5), diagnose (6), connectors (7), confirm (8). Was:
6 = mcp_servers, 7 = diagnose, 8 = connectors, 9 = confirm.
- `tests/test_setup_instructions.py` — Confirm step 9→8, Connect 8→7,
diagnose 7→6, mcp_servers references dropped.
`test_step_numbering_with_connectors_step` now asserts
`"mcp_servers" not in steps`. Stray-Confirm assertion lists shift
by one position.
- `tests/test_setup_page_unified.py` + `tests/test_web_ui.py` — same
step-number shifts in the rendered /setup preview assertions.
The `claude mcp add` line is still the Atlassian Remote-MCP path that
the 2026-05-10 init-report Fix C added — only its position in the
flow changes. /home Atlassian tile copying continues to install the
MCP too (the prompt body the tile pastes contains the same line).
112 tests pass.
* feat(atlassian): operator-overrideable base URL via AGNES_ATLASSIAN_BASE_URL
Adds an env var / YAML key the operator (Terraform module, customer-VM
template, OSS instance.yaml) can set to bake the Atlassian Cloud site
root into the connector prompt — so end users don't have to guess /
paste their org's `https://<myorg>.atlassian.net`.
When set, the Atlassian connector prompt (rendered on both /home tile
and inlined into the setup-script step 7 Atlassian sub-block) replaces
step 1's "Ask me for my Atlassian Cloud site URL and email" with a
one-line note that the URL is already provisioned by the operator and
asks only for the email. Step 4's helper-script body has the
`BASE_URL='<the site URL I gave you>'` placeholder substituted with
the literal value. When unset (empty), the existing "ask the user"
flow remains — no regression for OSS instances.
Resolution + normalization in `get_atlassian_base_url()`:
- env `AGNES_ATLASSIAN_BASE_URL` > yaml `instance.atlassian.base_url` > ""
- strips trailing slash + trailing `/wiki` so the canonical value is
the bare site root. Matches the per-user helper script's
normalization at storage time (atlassian_prompt step 4 guard 2), so
the literal baked in by the operator stays consistent with what the
user's helper script would have computed from their input.
Plumbing:
- `app/instance_config.py`: new `get_atlassian_base_url()` resolver.
- `app/web/connector_prompts.py`:
- `atlassian_prompt(*, base_url: str = "")` — string-replace two
explicit placeholder phrases when base_url is truthy; otherwise
return the prompt unchanged.
- `all_connector_prompts(..., atlassian_base_url: str = "")` —
forwards the kwarg.
- `app/web/router.py` (`_build_context`): reads
`get_atlassian_base_url()` and passes it through to
`all_connector_prompts(...)` so both the /home tile context AND the
inlined-script `resolve_lines(...)` call use the same value.
- `src/welcome_template.py` (`compute_default_agent_prompt`): same
threading via the existing import-on-demand path.
Tests (`tests/test_home_route_resolution.py`):
- `get_atlassian_base_url` resolver: default empty, env override,
trailing-slash strip, trailing-`/wiki` strip.
- `atlassian_prompt(base_url=...)`: literal URL baked in, ask-step
removed, placeholder replaced, operator-baked-in copy appears.
- `atlassian_prompt(base_url="")`: existing ask-the-user flow
unchanged.
- `all_connector_prompts(atlassian_base_url=...)`: kwarg threads
through to the rendered atlassian prompt.
135 tests pass.
* feat(asana): register hosted Asana Remote MCP in connector prompt
The Asana connector prompt only stored a PAT in the OS keychain + ran
a curl verify against /api/1.0/users/me. That set Claude Code up for
direct `curl` calls but didn't actually wire Asana into Claude's tool
list — so the user couldn't ask Claude to "find my open Asana tasks"
and have it work. Symmetric oversight to the Atlassian connector's
original speculative `.claude/mcp/atlassian` line that this branch
already replaced with `claude mcp add --transport sse atlassian
https://mcp.atlassian.com/v1/sse`.
Adds a new step 5 that registers Asana's hosted Remote MCP:
claude mcp add --transport http asana https://mcp.asana.com/mcp || true
This is the V2 endpoint (streamable HTTP transport, launched February
2026). The V1 SSE endpoint at https://mcp.asana.com/sse was deprecated
2026-05-11 (today) and must NOT be used — calling it out explicitly
in the prompt body so a future operator who finds an old reference
doesn't paste the dead URL. OAuth is handled by Claude Code at first
use, same model as the Atlassian MCP step.
The PAT stored in step 3 stays for direct `curl` calls (precheck +
ad-hoc scripts) — the MCP path uses its own OAuth grant, not the PAT.
Old step 5 (revoke instructions) renumbers to step 6 and adds the
`claude mcp remove asana` cleanup hint.
Same single-source-of-truth invariant holds: /home Asana tile + the
inlined Asana sub-block in the setup script (step 7 connectors) both
emit identical text from `asana_prompt()`.
71 tests pass.
* feat(asana): drive MCP OAuth login + end-to-end validation post-register
`claude mcp add --transport http asana ...` only registers the
server in Claude Code's local config — it does NOT trigger OAuth.
The browser tab opens the first time any `mcp__asana__*` tool gets
invoked. So the previous step 5 left a user looking at a "registered"
MCP that, in practice, hadn't authed yet and would fail on first
real use. Same blind spot Atlassian's prompt also has, but Asana was
the one called out in the latest review pass.
Adds a new step 6 between MCP registration (step 5) and the revoke
instructions (now step 7):
a. Tell the user verbatim what's about to happen — a low-impact
read through the MCP will pop the OAuth browser tab; sign in
with the same account whose PAT they stored in step 3 and
approve. Frames the OAuth as one-time so users don't wait
for it on every later call.
b. Drive an actual MCP read. Don't prescribe the exact tool name
because the Asana MCP's exposed surface (`mcp__asana__*`) is
versioned upstream and we don't want to pin to a name that
gets renamed. Instead: tell Claude to pick the lightest read
from its surfaced tool list (users-me / list-workspaces /
equivalent). Document the recovery path when Claude Code
times out waiting for the OAuth tool use: `claude mcp list`
to confirm registration before retrying.
c. Print a single one-line proof that combines wiring + auth:
"Asana MCP connected as <name> — <N> workspace(s) visible."
Explicit anti-echo callout for tokens, task content, comments.
On failure, surface the exact Claude-Code error and stop —
no silent pass.
d. Sanity-check that the MCP OAuth identity and the PAT identity
reference the same Asana account. Easy mistake to make when
the user has multiple Asana accounts — flag only on mismatch,
keep quiet when they match. Recovery: `claude mcp remove asana
&& claude logout asana` then redo step 5.
Step 7 (revoke) absorbs both the keychain delete + the
`claude logout asana` line so users have a single place to undo
everything.
43 tests pass.
* fix(init): clear stale CA env vars on Windows before any TLS handshake
Reported by the 2026-05-11 Windows test pass: after `agnes init` the
gws connector failed with `UnknownIssuer` TLS errors because
`SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE` were still set in Windows
User scope pointing at `C:\Users\localadmin\.config\agnes\ca-bundle.pem`
— a file that did not exist on the test host. Past Agnes installs
(the setup-prompt trust block + older bootstrap helpers) write those
pointers when they materialize a combined Agnes-CA bundle; when the
bundle file later disappears (re-init on a new VM, machine swap, the
~/.agnes dir wiped), the pointers go stale and every native Windows
TLS handshake fails before Agnes itself runs. SSL_CERT_FILE in
particular REPLACES (not appends to) the trust store, so a stale
pointer is silently catastrophic.
`agnes init` now clears stale pointers in two layers before the first
server roundtrip:
1. Current-process env (os.environ) — what the immediately-following
`api_get` to /api/catalog/tables actually reads. Without this, init
itself blows up before it gets to step 2.
2. Windows User-scope env via PowerShell
`[Environment]::SetEnvironmentVariable(name, $null, 'User')` — what
every future shell + every native tool (gws, claude.exe, pip, uv)
inherits. The 2026-05-11 reporter expected this exact cleanup
("init was supposed to clear these but they persisted").
The cleanup is best-effort and conservative:
- Only deletes a var when its value points at a path that does NOT
exist on disk. Intentional operator config (e.g. SSL_CERT_FILE
pointing at a corp certifi bundle) stays put.
- PowerShell missing / restricted execution policy / WSL-without-pwsh:
swallowed silently. The current-process leg still runs, which
unblocks init even on hosts where the User-scope leg cannot fire.
Tests (`tests/test_init_ca_cleanup.py`, 6 cases):
- Stale pointers → removed from process env.
- Real-path pointers → preserved.
- Non-Windows hosts: PowerShell is not invoked.
- Windows hosts: PowerShell IS invoked with a script that checks
all three vars + uses Test-Path + SetEnvironmentVariable.
- PowerShell FileNotFoundError: cleanup swallows it, does not raise.
- `_is_windows_host()` reflects sys.platform.
* refactor(asana): MCP-first flow — drop PAT storage, precheck via `claude mcp list`
The Asana hosted MCP at https://mcp.asana.com/mcp authenticates via
OAuth (Claude Code holds the grant; browser tab pops on first tool
use). The earlier prompt walked the user through creating + keychain-
storing an Asana Personal Access Token AND registering the MCP — two
parallel auth surfaces for one connector. Once the MCP works, the PAT
has no consumer: the precheck/verify steps that used `curl
$BASE/api/1.0/users/me` are just redundant proof that Asana itself is
reachable, which the OAuth handshake already establishes.
Removed:
- Step 0 keychain probe + curl verify against /users/me with PAT.
- Step 1 open developer-console / create PAT.
- Step 2 click "+ New access token", warn shown-ONCE.
- Step 3 helper-script for keychain-storage (per-OS bodies: macOS
`security add-generic-password`, Linux `secret-tool store`, Windows
`cmdkey /generic`).
- Step 4 PAT-side `users/me` verify.
- Step 5's split that kept the PAT around for direct curl scripts.
- Step 6d's "MCP vs PAT identity sanity check" — there is no PAT
anymore, nothing to mismatch against.
New flow (3 steps total):
- Step 0 precheck: `claude mcp list | grep ^asana` — if found, the
server is registered AND Claude Code is holding its OAuth grant
(otherwise prior failure would have removed it); print
"Asana MCP already registered — skipping setup" and stop. Tells the
user the explicit reset command (`claude mcp remove asana && claude
logout asana`) so a re-register stays one paste.
- Step 1: `claude mcp add --transport http asana
https://mcp.asana.com/mcp` — no `|| true` because step 0 should have
caught the "already exists" case. Step explains the V2-vs-V1
endpoint distinction (V1 SSE deprecated 2026-05-11) and the
abort-clean recovery if the precheck somehow missed the existing
server.
- Step 2: same OAuth + low-impact-read validation pattern as before.
- Step 3: revoke instructions (mcp remove + logout + Asana-side app
revoke at app.asana.com/Settings → Apps).
Both surfaces (the /home Asana tile and the inlined Asana sub-block
in the setup script's step 7) emit the new text from the same
asana_prompt() — single-source-of-truth invariant intact.
77 tests pass.
* Make /home install-hero links readable against blue background
The Claude license-options link added in the previous commit inherited
the default `<a>` style (`var(--hp-primary)` blue), which renders as
blue-on-blue and is unreadable inside the blue install-hero. Add a
scoped `.install-hero a` rule that uses white with an underline
(matching the existing lead-paragraph contrast pattern) so any link
nested in the hero stays legible.
* Reorder /home install flow: auto-mode is now Step 2, Agnes install becomes Step 3
Step 3 (was Step 2) pastes a ~20-command bash bootstrap into a fresh
Claude Code session. Without auto-mode enabled first, each Bash/edit
command needs a manual approve click — bad UX for first-time users.
Move auto-mode from the outside-hero `<details>` reference block into
the install-hero as a real Step 2, between "install Claude Code" and
"install Agnes". Content is the persistent `acceptEdits` snippet
(write to ~/.claude/settings.json) plus a one-liner pointing at
Shift+Tab for users who are already inside a running Claude Code
session. YOLO mode for full Bash auto-approve stays on
/setup-advanced behind the existing link.
The outside-hero `setup-collapsible[data-section="step3"]` block is
dropped — auto-mode is no longer reference content, it's a real
install step, and duplicating it would just diverge over time.
Onboarded users no longer see the auto-mode block at all (consistent
with Steps 1 + 3 also hiding post-onboarding).
Completion banner copy updated: "Step 1, 2 & 3 done — Claude Code
installed, auto-mode set, Agnes ready". Dashboard CTA partial and
other templates don't reference step numbers for this flow, so no
adaptation needed there.
* Simplify /home Step 2 to Shift+Tab only — drop the JSON snippet
Operator pointed out two issues with the prior Step 2:
1. The settings.json snippet is redundant. Claude Code's first
Shift+Tab cycle to auto-accept mode already prompts the user
whether to persist it as default — Claude writes the config
itself, no manual file edit needed.
2. The snippet only showed the POSIX path `~/.claude/settings.json`,
which doesn't translate to native Windows.
Replace the snippet + copy button with a plain Shift+Tab instruction,
explicitly call out the first-time "make this the default?" prompt,
and note that Claude handles the config write itself — same flow on
macOS / Linux / WSL / Windows. Adds a fallback line for users who
already closed the post-OAuth session.
* Tighten /home Step 2 install-note to two paragraphs
Operator: drop the 'Claude writes the setting itself, so this works
the same on macOS / Linux / WSL / Windows...' line plus the
'auto-approves file edits going forward; Bash commands stay gated
— that's the safe default' line. Both were filler — the make-default
prompt already implies persistence, and gated Bash is the obvious
default users won't be surprised by.
Result: paragraph 1 carries Shift+Tab + first-time make-default
say-yes + closed-session fallback in one breath; paragraph 2 keeps
the verbatim YOLO link. Same affordances, less vertical space.
* feat(store): flea-market upload guardrails + soft delete + JOIN-based admin queue
Adds an end-to-end guardrails pipeline for store uploads (manifest +
static-security + LLM review), persists blocked bundles for forensics,
introduces soft-delete (Archive) semantics, consolidates the legacy
/store/{id} surface into /marketplace/flea/{id}, and reworks the admin
queue so lifecycle filters read live entity visibility via LEFT JOIN
rather than a denormalized submission column.
Schema v29 → v35:
* v29 store_submissions table + store_entities.visibility_status
* v30 file_size, bundle_sha256, bundle_purged_at on submissions
* v31 reshape store_submissions (drop legacy unique on entity_id)
* v32 store_entities.archived_at/by + 'archived' visibility value
* v33 drop store_submissions.retry_count (unused)
* v34 ensure idx_store_submissions_entity exists post column-drop
* v35 broaden visibility_status enum + JOIN architecture cutover
Pipeline (src/store_guardrails/):
* Inline checks: manifest_check, static_scan, quality_check
* LLM review configurable haiku|sonnet|opus (default haiku)
* BackgroundTasks-driven async path with structured-output JSON
* Per-submitter daily quota (default 50)
* 30-day TTL purge job (POST /api/admin/run-blocked-purge)
* Bundle SHA256 + size persisted; sha256 survives purge for forensics
Visibility model:
* pending | approved | hidden | archived
* _enforce_visibility returns 404 (no leak) for non-owner non-admin
* Owner sees own non-approved entries via include_owner_id widening
* Install refused with 409 entity_not_approved when not approved
Soft-delete (DELETE /api/store/entities/{id}):
* Default = soft (visibility_status='archived'); existing installs
keep getting served the bundle so users don't lose the plugin
* ?hard=true admin-only: drops bundle + cascades user_store_installs
* Hard-delete preserves entity_id on submission as tombstone so
audit_log linkage survives for the activity timeline
Admin queue lifecycle (the JOIN refactor):
* Verdict (store_submissions.status) is immutable forensic record
* Lifecycle (store_entities.visibility_status) is live state
* /admin/store/submissions Archived chip translates to
`e.visibility_status='archived'` via LEFT JOIN — any path that
flips visibility surfaces in the queue immediately
* Detail page renders Status (verdict) and Entity lifecycle side by
side so admins see "approved at review, now archived" at a glance
URL consolidation:
* /store/{id} deleted (no redirect, stale bookmarks 404)
* /marketplace/flea/{id} is the canonical detail surface
* Three in-tree callers (upload-success, my-stack card, store
listing card) updated to point at the new URL
* Quarantine banner extracted to _quarantine_banner.html partial,
self-guarded, included from both flea detail templates
* Banner JS auto-refreshes when the verdict lands by polling
/api/marketplace/flea/{id}/detail (visibility_status +
submission_status — the latter is needed because blocked_llm
keeps the entity at visibility_status='pending')
Audit log resource format:
* runner.py emits prefixed `store_submission:{id}` (post-fix)
* Detail-page timeline query handles three patterns: prefixed
submission, helper-emitted `store_entity:{sub_id}`, and bare-id
legacy rows — all surface in the activity timeline
UX fixes:
* Owner sees Under review / Quarantined / Hidden banner with status
* Install button gray-disabled (not blue) when non-approved
* Owner cannot delete quarantined entries (403); admin can
* Admin queue: filter chips, sortable columns, paging, page-size
* Auto-refresh queue every 5s while pending rows are visible
* Store upload page file picker no longer opens twice (label →
input default action collided with explicit JS handler)
Tests: 168 passed across the guardrails suites (admin submissions,
store API, inline / LLM / purge guardrails, store repositories,
marketplace filter, schema version). New regression coverage
includes: archive surfaces via JOIN even when API path is bypassed;
deleted submission renders activity timeline (tombstone); flea
detail surfaces submission_status only for owner/admin; detail page
renders Entity lifecycle row; audit log resource format covers both
helper and runner paths.
* fix(store-guardrails): PR #233 follow-up — prompt injection, atomic PUT, BG race, schema, reaper, sort whitelist
Addresses 9 of the 23 findings from the PR #233 review (spec at
docs/superpowers/specs/2026-05-09-pr233-guardrails-fixes-spec.md).
Merge-gate items #1-#6 plus high-value mediums #7, #9-#12, #23.
Architectural items (#8 enum split, #14 factory) and pure
maintainability (#15-#22) deferred to follow-ups.
Security:
* #1 prompt injection — SYSTEM_PROMPT now passed via the SDK's
dedicated system= parameter; bundle wrapped in <bundle>...</bundle>
sentinels declared data-only by the system prompt; literal
sentinel strings in user content are escaped so an adversarial
README can't forge a close tag.
* #6 static scan honesty — module docstring + admin copy + docs
declare static scan as signal not gate; .md/.txt/.rst/.html/.json/
.yaml/.yml/.toml skipped to avoid false positives on prose.
AST mode for Python deferred (separate flag, FP comparison work).
Correctness:
* #2 PUT atomicity — bundles bake into plugin.staging-<rand>/
alongside live, atomic-rename on success; failed checks leave
live tree byte-for-byte intact.
* #3 BG-task race — set_visibility_if_pending guards verdict flips
to the (pending, hidden) review window; admin archives during
review survive; skipped flips audit-logged.
* #4 v35 NOT NULL/DEFAULT — schema v35→v36 re-applies them on
store_entities.visibility_status. CHECK constraint enforced
application-side (DuckDB ADD CHECK on existing column unsupported).
* #7 stuck-review reaper — reap_stuck_llm_reviews flips pending_llm
rows older than guardrails.stuck_review_grace_seconds (default
1800) to review_error. Scheduler runs every 15 min via new
/api/admin/run-reap-stuck-reviews. Set knob to 0 to disable.
* #9 quota counter — count_blocked_for_submitter_since now counts
blocked_inline + blocked_llm + review_error so a submitter
triggering only LLM-blocked verdicts is bounded.
* #10 missing risk_level — surfaces as review_error with
error='missing_risk_level' instead of silently defaulting to
'medium' (which looked like a model-decided block).
* #11 archived_at clear — set_visibility nulls archived_at +
archived_by when transitioning out of 'archived' so a future
read doesn't show stale archive forensics on an approved row.
Maintainability:
* #12 FSM doc comment — accurate insert/transition/lifecycle
description in src/db.py near store_submissions schema.
* #23 sort-key whitelist — admin queue rejects unknown sort keys
with 400 invalid_sort_key; substring-replace footgun removed.
Deferred (separate PRs):
* #5 quota race — proper fix requires asyncio.Lock spanning the
full pipeline; threading.Lock blocks event loop, DuckDB MVCC
doesn't help. API-level slowapi bounds worst case for now.
* #6 part 3 (AST static scan), #8 (enum split), #13 (import
bundle docs), #14 (factory consolidation), #15-#22 (maint).
Tests:
* New: tests/test_store_guardrails_prompt_injection.py (corpus +
trust-boundary invariants), tests/test_store_put_atomic.py,
tests/test_store_guardrails_reaper.py.
* Extended: test_store_guardrails_llm.py (system param, missing
risk_level, BG race), test_admin_store_submissions.py (quota
counter widening, sort whitelist 400), test_store_repositories.py
(un-archive metadata clear), test_db_schema_version.py (v36).
* Full suite: 3738 passed; 17 pre-existing baseline failures
unchanged (db migration tests, cli binary rename, catalog export,
user mgmt v5 backfill — confirmed by stash + rerun on clean tree).
* feat(home+news): state-aware /home + /news + admin-edited news section
Squash of the vr/home-page feature work for clean rebase onto main.
Original 18-commit history preserved in branch backup/vr-home-page-pre-rebase.
What's in this PR:
**State-aware /home page**
- New `/home` route with hero + auto-mode + connectors (Asana / GWS /
Atlassian) + lookarounds. Onboarded vs not-onboarded state-machine
branches a single template (`home_not_onboarded.html`); the install
steps, "Setup a new Claude Code" CTA (90-day PAT mint), and per-
connector setup prompts hide once `users.onboarded=TRUE`. A
completion badge replaces them.
- "Mark me as offboarded" button reverses the flag without an SQL UPDATE.
- `users.onboarded BOOLEAN` column added; default FALSE; flipped by the
CLI's `agnes init` post-success POST and the `/admin/users` API.
- Connector setup prompts pre-check whether the tool is already
installed/connected before re-running setup.
- GWS scope set widened to include Google Chat (`chat.spaces`,
`chat.messages`).
**Single template + design tokens**
- `dashboard.html` now extends `base.html` via the new
`{% block layout %}` opt-out (full-width pages skip the 800px
`.container`). Net: every page shares one shell.
- `style-custom.css` `:root` extended with `--space-{7,9,10,12}`,
`--radius-2xl`, `--shadow-{card,elevated}`, `--text-{muted,disabled}`,
`--focus-ring`, `--transition-*`, `--width-{narrow,app,wide}` so
inline page styles can migrate incrementally.
**Auth redirects honor AGNES_HOME_ROUTE**
- `safe_next_path` resolves the configured home route when no `default=`
is passed; OAuth callbacks, magic-link clicks, password form, and
LOCAL_DEV_MODE shortcuts now land on `/home` (or whatever the operator
picked) instead of always /dashboard.
**News section + /news permalink + /admin/news editor**
- Schema-bumped `news_template` table (single versioned entity, draft +
publish gate). `published BOOLEAN` distinguishes draft from public;
monotonically-increasing `version` per save; rows >30d pruned on
save except the currently-displayed published version.
- `/home` bottom-of-page renders the latest published intro with a
"Read more →" link to `/news` (which renders the full body).
- `/admin/news` editor with sandboxed live preview, versions table,
per-row Unpublish, Format-help cheatsheet.
- `agnes admin news show / draft / edit / publish / unpublish /
versions / export` (CLI). Talks to the live server via the
`/api/admin/news/*` endpoints (PAT-authed) — no direct DB access
so it coexists with a running uvicorn.
- **Optimistic-lock guard**: `agnes admin news publish --version N` and
PUT/PATCH endpoints accept `expected_version` and 409 with structured
`{error: "version_conflict", expected, actual, actual_by}` when a
concurrent admin replaced the draft. Edit refuses to overwrite a
draft authored by someone else without `--force` or
`--expect-version`.
- nh3 (Rust-backed ammonia) HTML sanitizer; iframe pre-pass strips
any iframe whose src is not on the YouTube/Vimeo/Loom allowlist;
javascript:/data: schemes blocked everywhere.
- Author CSS vocabulary: `.news-hero` (blue gradient hero block),
`.callout`/`.callout-{info,warn,success,danger}`,
`.video-embed`, `.news-section`, `.news-grid-{2,3}`, `.news-cta` —
all consolidated in `style-custom.css` under "News content
vocabulary (shared)" so /home perex, /news body, and /admin/news
preview share one source of styling.
- Code-inside-`<pre>` contrast fix (was unreadable amber-on-silver).
- `.news-content` table styling (border, header band, row-hover).
**`scripts/dev/run-local.sh`** — local uvicorn launcher. Pulls Google
OAuth client id/secret from GCP Secret Manager
(`AGNES_OAUTH_GCP_PROJECT`-driven, no vendor defaults), points
`AGNES_CLI_DIST_DIR` at `./dist` so the wheel endpoint resolves, and
`--dev` flips `LOCAL_DEV_MODE=1` + `AGNES_HOME_ROUTE=/home` for one-
command iteration. `LOCAL_DEV_MODE=1` also enables the FastAPI debug
toolbar.
**CLAUDE.md "Run tests before every push" section** codifies
`pytest tests/ -n auto -q` as non-negotiable before each push.
**Tests**: 51 + 14 + 8 = 73 new tests across news-template repo,
sanitizer, API, web, CLI; plus updated home/auth/template tests for
the new shared-shell architecture.
Origin docs (gitignored, customer-fork content):
docs/brainstorms/home-page-requirements.md,
docs/plans/2026-05-07-001-feat-home-page-plan.md.
* feat(cli): agnes onboarded {on,off,status} — self-scoped flag toggle
User-facing equivalent of the in-page "Mark me as (off)boarded" button
on /home. POSTs /api/me/onboarded with {onboarded, source}; --source
overrides the audit-log marker so flips made from the CLI vs the web
button vs agnes init automation stay distinguishable.
`status` reads via /api/me/profile (when present); falls back to a
quick body-marker scan of /home so the read path doesn't write an
audit_log row. PAT-authed via cli.client.api_post — same convention
as agnes admin news / agnes admin add-user etc.
Tests: 5 covering on/off/status round-trip, idempotency, and
audit-log source recording. Full suite holds at 12 pre-existing
failures (same set as before).
* ui(nav+home): primary nav reorg + green What's new band + /marketplace link fix
Primary nav (post-rebase audit + per-user feedback):
- Items: Home → Marketplace → Data Packages → Memory. Admin dropdown
for admins only. The "Dashboard" label was renamed Home — point still
resolves through `home_route` so customer instances on /dashboard
still land there.
- Activity Center moved into the Admin dropdown. Per-team adoption
analytics is admin-consumed in practice; the route still allows
any authed user for direct deep-links so existing /home tile +
bookmarks keep working.
- Memory link added (→ /corporate-memory) — was previously buried in
the /home "Look around" tiles.
- Setup local agent + My Stack dropped from main nav. Setup is the
/home install flow's home now; My Stack lives as a tab inside
/marketplace.
/home tweaks:
- Plugin marketplace tile now points at /marketplace (was /store —
legacy from before the marketplace rebrand landed in #230).
- "What's new" section header gets a green band (success-flavored
D1FAE5 background, A7F3D0 border, darker green title) so the
bottom-of-page news block visibly distinguishes from the blue
install-hero at the top. Header strip only — body stays white.
Test fix: test_home_route_resolution renamed `dashboard_link_uses_home_route`
→ `home_link_uses_home_route` and asserts `href="/home">Home` instead
of `href="/home">Dashboard` after the label change.
* fix(home): decouple Step 3 + Connect-tools collapse from server onboarded flag
The server-side `users.onboarded` flip happens through two paths:
1. Explicit user click on "Mark me as onboarded" or `agnes onboarded on`.
2. Implicit `agnes init` POST → /api/me/onboarded on success.
Path 2 produced a UX surprise: an analyst running `agnes init` mid-flow
reloaded /home and saw Step 3 (auto-mode) + Connect-your-tools auto-
collapse to summary bars. They were actively working through those
sections — the install POST never signalled "I'm done with the rest
of setup", just "Agnes itself is installed".
Decouple the section-collapse decision from the server flag:
- Step 1 + Step 2 install blocks: still hidden on `onboarded=TRUE`
(their completion is a hard server signal — Agnes IS installed).
- Step 3 + Connect-your-tools: render flat by default in BOTH states.
Wrapped in `<details class="setup-collapsible" open>` so the
browser's native disclosure handles per-section toggle without JS,
but the `<summary>` is CSS-hidden until the page-level
`data-setup-minimized="1"` attribute is set on `.home-mock`.
- New "Minimize setup view" toggle inside the blue install-hero,
rendered only when onboarded. Click flips the data-attr on
`.home-mock` AND removes the `open` attribute from each
`<details>`. State persists in `localStorage["agnes_home_setup_minimized"]`
so the choice survives reloads but is per-device.
- "Show full setup view" (the same button when minimized) re-opens
both `<details>` and clears localStorage.
When minimized, each `<details>` still has its own native expand/
collapse — click the gray summary bar to peek at one section without
toggling the page-level minimize off.
Tests:
- test_step3_and_connectors_render_flat_when_onboarded_by_default —
asserts `<details class="setup-collapsible" ... open>` for both
sections post-onboarding and the absence of any server-rendered
`data-setup-minimized` attribute on the `.home-mock` root.
- test_minimize_toggle_visible_only_when_onboarded — toggle button
rendered only when onboarded.
Full pytest holds at 12 pre-existing failures (same set).
1. instance.yaml overlay path now matches read site under STATE_DIR.
Three sites updated:
- app/api/admin.py:1005 (server-config endpoint writer)
- app/api/admin.py:2610 (configure endpoint writer)
- app/instance_config.py:106 (overlay reader)
All three now go through _state_dir() so under flat-mount layout
(STATE_DIR=/data-state) the irreplaceable instance.yaml overlay
lands on the state disk (sdc) instead of the regenerable data
disk (sdb). Without this fix, .env_overlay correctly went to the
state disk while instance.yaml went to the data disk — config
would be lost if an operator wiped sdb.
2. Strip customer-specific tokens from OSS repo per CLAUDE.md
vendor-agnostic rule:
- docker-compose.host-mount.yml: 'a deployer (Groupon FoundryAI)'
→ 'a deployer in production'
- docker-compose.flat-mount.yml: 'caused 2026-05-05 in the
Groupon FoundryAI deployment' → generic 'production failure
mode'
- docs/state-dir.md: rewrote the incident reference to describe
the failure mode abstractly without naming the deployment;
updated the recommendation table to say 'shadow-mount class'
instead of dating the specific incident.
3. Updated docs/state-dir.md 'What reads STATE_DIR' to list all
read/write sites including the three migrated in this round
(admin.py, instance_config.py, marketplaces.py).
ANALYSIS finding (tls-rotate.sh hardcoded host-mount.yml) deferred
— same operator-side class as auto-upgrade.sh hardcoded host-mount,
documented limitation per the PR body.
Devin BUG: /api/admin/configure seeds an ai: block to the writable
overlay at DATA_DIR/state/instance.yaml, but the three LLM consumers
imported from config.loader.load_instance_config — which reads the
static config dir only. Even if they had read the overlay, the loader
ran yaml.safe_load directly without passing through _resolve_env_refs,
so '${ANTHROPIC_API_KEY}' would have stayed a literal placeholder. The
pipeline appeared to work because the factory falls back to the env
var directly, but the overlay path itself was dead code.
Two fixes, both required:
1. Switched the three LLM consumers to app.instance_config.load_instance_config:
- services/corporate_memory/collector.py:collect_all
- services/verification_detector/__main__.py:main
- app/api/admin.py:run_verification_detector
2. app/instance_config.py runs the loaded overlay through
config.loader._resolve_env_refs *before* the deep-merge, so
'${ANTHROPIC_API_KEY}' resolves at config-load time.
New regression suite tests/test_instance_config_overlay.py pins:
- env-ref resolution against the overlay (resolved when env set,
empty when env missing — never the literal placeholder)
- deep-merge still preserves static-only sections
- the three consumers reach app.instance_config (inspected via
inspect.getsource so a future refactor that reverts the import
fails the test)
- end-to-end: a seeded overlay + ANTHROPIC_API_KEY env reaches the
factory with a resolved api_key
* docs(spec): #134 unify BigQuery access behind BqAccess facade
Brainstorm output for issue #134. Captures:
- root cause (incl. correction of the issue's hypothesis about commit 33a9964)
- BqAccess facade API + project resolution rules
- error contract — typed BqAccessError mapped to HTTP 502 for upstream
BQ failures, 500 for deployment/config bugs
- migration plan for v2_scan, v2_sample, RemoteQueryEngine
- test rewrite eliminating _bq_client_factory injection point
- E2E verification protocol on agnes-development as success criterion
* docs(spec): #134 revise after first review
Incorporates code-reviewer findings:
Must-fix:
- Add v2_schema (2 copies of INSTALL/LOAD/SECRET dance) to migration scope.
- Reframe v2_scan headline: missing try/except around BQ calls is the
actual cause of bare 500s, not project resolution (which 33a9964 fixed).
- List two more deferred call sites (extractor.py, register_bq_table)
with explicit rationale.
Important:
- Drop billing != data clause from cross_project_forbidden heuristic;
rely only on 'serviceusage' substring. billing != data is normal
for cross-project setup, was over-classifying.
- Split bq_bad_request into _user (400) and _server (502) variants;
add sql_origin parameter to translate_bq_error so call sites declare
whether SQL contains user input.
- Add @functools.cache to BqAccess.from_config; document tests bypass
via dependency_overrides.
- Replace monkey-patched-classmethod test pattern with
BqAccess(client_factory=...) injection at construction time. Cleaner
than today's _bq_client_factory and 1:1 migration shape.
- Keep BqProjects.data (reviewer assumed registry has source_project;
it doesn't). Multi-project explicitly listed as non-goal with note.
Nice-to-have:
- Add 'Implementation strategy' section: 2 staged commits (bug fix
alone is revertable; refactor follows).
- Extend E2E protocol to cover all three endpoints, not just /sample.
- Note removal of stale docstring at src/remote_query.py:204.
* docs(spec): #134 revision 3 — incorporates second-round review
Must-fix from second review:
- v2_schema split into two migration cases: _fetch_bq_schema translates
errors via translate_bq_error; _fetch_bq_table_options preserves its
swallow-all 'except Exception → return {}' so /schema doesn't 502 on
partition-info failures.
- RemoteQueryEngine.__init__ now resolves BqAccess lazily (in
_get_bq_client, not in __init__). Without this, ~7 DuckDB-only tests
in test_remote_query.py would suddenly fail with not_configured.
- translate_bq_error pass-through for BqAccessError is now load-bearing
(clause 1, before any Google-API branch). bq.client() raises BqAccessError
for bq_lib_missing/auth_failed; without explicit pass-through those
fall to 'unknown' and re-raise as bare 500.
- Commit 1 now emits the SAME structured response shape as commit 2 to
avoid contract churn between commits.
- BIGQUERY_PROJECT env-var precedence is BREAKING for env-only deployments
— flagged in CHANGELOG ### Changed.
Editorial:
- sql_origin renamed to bad_request_status with values 'client_error' /
'upstream_error' (clearer about what the parameter actually decides).
bq_bad_request_user/_server kinds collapsed to bq_bad_request (400)
and bq_upstream_error (502).
- CLI (cli/commands/query.py) noted as external RemoteQueryEngine caller;
unaffected because new bq_access kwarg has default None.
- Added unit/integration tests for the new contracts:
test_translate_passes_through_BqAccessError,
test_v2_scan_returns_500_on_bq_lib_missing,
test_v2_schema_returns_200_with_empty_partition_on_bq_failure,
test_resolve_succeeds_after_config_set.
- E2E protocol now covers /schema as the fourth endpoint.
- Documented functools.cache-doesn't-cache-exceptions semantics and
fixture nullcontext-doesn't-close caveat for nested sessions.
* docs(spec): #134 revision 4 — incorporates third-round review
Third reviewer verdict: 'implementation-ready with two trivial edits';
explicitly noted prior rounds did the heavy lifting.
Edits:
1. get_bq_access() module-level function instead of @classmethod
@functools.cache from_config. Removes the classmethod-cache stacking
footgun (different Python versions wrap differently) and gives FastAPI's
dependency introspection a clean function signature. Drops the
'Do not subclass BqAccess' caveat that no longer applies.
2. Commit 1 strategy explicitly: wrap _fetch_bq_sample (v2_sample),
_bq_dry_run_bytes + _run_bq_scan (v2_scan), and _fetch_bq_schema
(v2_schema strict block). Do NOT touch _fetch_bq_table_options swallow-all
in commit 1 — preserved as-is, then migrated (still preserved) in commit 2.
All three endpoints emit the same structured body shape so client parsers
see one consistent contract throughout the staged rollout. No more
half-rolled-out window where /sample is bare 500 while /scan is
structured 502.
* docs(plan): #134 implementation plan — Phase 1 (atomic bug fix) + Phase 2 (BqAccess refactor) + Phase 3 (verification)
Bite-sized TDD tasks. 3 phases, 16 tasks total:
Phase 1 (Commit 1) — atomic bug fix across all four v2 endpoints:
Tasks 1.1-1.5 wrap _fetch_bq_sample, _bq_dry_run_bytes, _run_bq_scan,
_fetch_bq_schema with structured 502/400 try/except. _fetch_bq_table_options
preserved untouched. CHANGELOG Fixed entries.
Phase 2 (Commit 2) — BqAccess facade extraction + migration:
Tasks 2.1-2.5 build connectors/bigquery/access.py bottom-up
(BqProjects, BqAccessError, translate_bq_error, default factories,
BqAccess class, get_bq_access module-level cached). Task 2.6 adds
conftest.py fixture. Tasks 2.7-2.9 migrate v2_scan, v2_sample, v2_schema
to BqAccess. Tasks 2.10-2.11 migrate RemoteQueryEngine + tests
(lazy bq_access, drop _bq_client_factory). Task 2.12 CHANGELOG
Changed BREAKING + Internal.
Phase 3 — Verification:
3.1 full pytest. 3.2 squash into two PR-shape commits. 3.3 manual
E2E on agnes-development per spec protocol → close#134.
Self-review table maps spec sections to implementing tasks; no gaps.
* fix(v2): #134 structured 502/400 on BQ errors across /scan, /scan/estimate, /sample, /schema
Wraps the BigQuery call sites in v2_scan, v2_sample, and v2_schema (strict
block only) with try/except for google.api_core exceptions, translating to
HTTPException with a structured body shape: {error, message, details}.
Fixes Pavel's report (#134) where these endpoints returned bare HTTP 500
with no body when the SA on agnes-development hit cross-project Forbidden
on serviceusage.services.use.
Also fixes /sample's missing billing_project fallback (the bug 33a9964
fixed for /scan never landed here).
Status code split:
- /scan, /scan/estimate: BadRequest -> 400 (bq_bad_request) since SQL is
user-derived from req.select/where/order_by.
- /sample, /schema: BadRequest -> 502 (bq_upstream_error) since SQL is
server-constructed from validated identifiers.
- All Forbidden -> 502 with cross_project_forbidden if 'serviceusage' in
error message (with hint pointing at data_source.bigquery.billing_project),
else bq_forbidden.
Body shape matches what the upcoming BqAccess refactor (next commit) will
produce, so client-side parsers see one consistent contract throughout
the staged rollout.
_fetch_bq_table_options preserved exactly as-is — its swallow-all-and-return-empty
contract is intentional and survives into the refactor; /schema continues to
return 200 with empty partition info when partition queries fail.
Outer wraps in scan_endpoint, scan_estimate_endpoint, sample, and schema
endpoints exist only to make the test pattern (monkeypatching whole
_fetch_* functions) work, and are tagged TODO(#134 Phase 2) for removal
once BqAccess centralizes translation.
* refactor(bq): #134 BqAccess facade — unify v2_scan, v2_sample, v2_schema, RemoteQueryEngine
Extracts the duplicated BigQuery-access pattern (project resolution +
client construction + DuckDB-extension session + Google-API error
translation) into connectors/bigquery/access.py. Migrates four
call sites to use it:
- app/api/v2_scan.py — _bq_dry_run_bytes, _run_bq_scan
- app/api/v2_sample.py — _fetch_bq_sample
- app/api/v2_schema.py — _fetch_bq_schema (strict translation),
_fetch_bq_table_options (preserves swallow-all best-effort contract)
- src/remote_query.py — RemoteQueryEngine, lazy bq_access kwarg
The new module exposes:
- BqProjects (frozen dataclass: billing + data project IDs)
- BqAccessError (typed exception with HTTP_STATUS class mapping)
- BqAccess (facade with injectable client_factory/duckdb_session_factory
for tests; defaults call the real google-cloud-bigquery + DuckDB extension)
- get_bq_access (module-level @functools.cache; FastAPI Depends target)
- translate_bq_error (Google API exception → BqAccessError mapper, with
BqAccessError pass-through, 'serviceusage'-substring heuristic for
cross_project_forbidden, and bad_request_status param distinguishing
user-derived (400) from server-constructed (502) SQL)
- _default_client_factory, _default_duckdb_session_factory
RemoteQueryEngine.__init__ no longer accepts _bq_client_factory; tests
migrate to bq_access=BqAccess(projects, client_factory=...). DuckDB-only
RemoteQueryEngine tests need no changes — bq_access defaults to None and
get_bq_access() is only invoked on first BQ call (lazy resolution).
BqAccessError raised internally is translated to RemoteQueryError(
error_type="bq_error") in _get_bq_client to preserve the engine's
existing public contract — CLI and /api/query/hybrid callers see no change.
Endpoint tests (test_v2_scan, test_v2_scan_estimate, test_v2_sample,
test_v2_schema) migrate from monkey-patching whole _fetch_* functions
to using the new bq_access fixture in tests/conftest.py — which
exercises the REAL translation path through BqAccess + translate_bq_error,
closing the test gap flagged in Task 1.1's review.
Side-effect behavior change: v2_sample's FROM clause now uses the data
project (instance.yaml data_source.bigquery.project), not the conflated
billing_project from Phase 1. Documented in CHANGELOG ### Internal.
BREAKING for deployments combining BIGQUERY_PROJECT env var with
data_source.bigquery.project in instance.yaml — env var now overrides
data project too. See CHANGELOG ### Changed.
Two known-duplicate BQ-access sites (connectors/bigquery/extractor.py,
scripts/duckdb_manager.register_bq_table) explicitly out of scope;
tracked as follow-up.
Removed stale docstring at the previous src/remote_query.py:204
that referenced scripts.duckdb_manager._create_bq_client as the default
BQ client factory (RemoteQueryEngine never actually used that function).
Test counts: tests/test_bq_access.py +27 (new), tests/test_v2_*.py +
tests/test_remote_query.py migrated to bq_access fixture (counts unchanged
or +1-2 per file). Full suite: 2086 passed, 8 pre-existing failures
(DB migration tests with unrelated internal_roles DependencyException —
not introduced by this PR).
* fix(bq_access): translate DefaultCredentialsError to BqAccessError(auth_failed)
CI on PR #138 caught: bigquery.Client(...) resolves Application Default
Credentials at construction time; without ADC (CI without SA key, dev
laptop without 'gcloud auth application-default login') it raises
google.auth.exceptions.DefaultCredentialsError synchronously.
Pre-fix _default_client_factory only caught ImportError, so DefaultCredentialsError
propagated as raw exception — and from production endpoints would surface
as bare 500 (the exact failure mode #134 sets out to fix).
Now translates to BqAccessError(kind='auth_failed', details.hint='Run
gcloud auth application-default login...'). Endpoint catch chain returns
HTTP 502 with structured body. Adds unit test
test_raises_auth_failed_on_default_credentials_error.
Third-round spec review flagged this case in passing; the fix didn't land.
CI's auth-less environment surfaced it.
* fix(bq_access): get_bq_access() returns sentinel instead of raising when not configured
Devin BUG_0001 on PR #138 review: 'get_bq_access() as FastAPI Depends
breaks all v2 endpoints for non-BigQuery instances'.
Pre-fix: get_bq_access() raised BqAccessError(not_configured) when
neither BIGQUERY_PROJECT env nor data_source.bigquery.project was set.
Because FastAPI resolves Depends() BEFORE the endpoint body runs, this
exception fires during dep-injection — the endpoint's try/except
BqAccessError clause never gets a chance to catch it. Result: every
v2 request on Keboola-only or CSV-only instances returned bare HTTP
500, even for local-source tables that never touch BigQuery.
Fix: get_bq_access() now returns a sentinel BqAccess with empty
BqProjects and factories that raise BqAccessError(not_configured)
on actual use. Construction succeeds, FastAPI's dep-injection cleanly
yields the sentinel, the endpoint runs. The local-source code path
in build_sample / build_schema / etc. never calls bq.client() or
bq.duckdb_session() (it reads parquet directly), so non-BQ tables
return 200 as before. Only when an endpoint actually tries to query
BQ (source_type == 'bigquery') does the sentinel raise — and the
endpoint's existing except BqAccessError catches it normally,
returning structured 502 with hint.
Test get_bq_access::test_raises_not_configured_when_neither_set
renamed and rewritten to test_returns_sentinel_when_neither_set:
asserts BqAccess is returned, then asserts client() and
duckdb_session() each raise BqAccessError(not_configured) on call.
Test test_does_not_cache_exceptions removed (no longer applicable)
and replaced with test_sentinel_is_cached_per_process documenting
the operator-restart-on-config-change contract.
* docs(spec+plan): #134 genericize customer-specific tokens (CLAUDE.md OSS rule)
Devin BUG_0001/0002 round 3 on PR #138: spec and plan docs contained
customer-specific deployment hostnames, deployment names, and a GCP
project ID that violated CLAUDE.md's vendor-agnostic OSS rule
('Nothing customer-specific belongs in code, configuration defaults,
comments, docs, commit messages, PR titles, or PR bodies').
Replacements:
agnes-development.groupondev.com -> <your-agnes-host>
agnes-development -> <your-dev-instance>
prj-grp-dataview-prod-1ff9 -> <your-data-project>
s1_session_landings -> <bq_table_id>
E2E verification semantics unchanged — operators still run the same
four curls + config flip + retry, just substituting their own host /
deployment name / project / table.
* fix(bq_access): hook get_bq_access.cache_clear into instance_config.reset_cache
Devin ANALYSIS_0004 on PR #138: get_bq_access is @functools.cache'd at
process level, so it captures BigQuery project IDs at first call and
ignores subsequent instance.yaml changes. Pre-Phase-2 the v2 endpoints
re-read get_value() on every request, so admin /api/admin/server-config
saves (which call instance_config.reset_cache()) hot-reloaded the BQ
project. Without this fix, my refactor silently regresses that contract
— operators editing instance.yaml via the admin UI would see no effect
on v2 endpoints until container restart.
instance_config.reset_cache() now also calls
connectors.bigquery.access.get_bq_access.cache_clear() (lazy import,
swallowed if connectors module isn't loaded — keeps instance_config
usable in isolated unit tests).
Adds test_instance_config_reset_cache_invalidates_get_bq_access as
regression guard. Updates CHANGELOG Internal entry to mention the
hot-reload contract + the not-configured sentinel behavior (round-3
fix from Devin BUG_0001 was previously only in commit message).
* fix(bq_access): surface not_configured before identifier validation + plan path genericize
Devin BUG_0001 + BUG_0002 round 5 on PR #138.
BUG_0001 (plan doc): personal filesystem path violated CLAUDE.md
vendor-agnostic rule. Replaced with '<worktree-root>' placeholder.
BUG_0002 (sentinel error path): when get_bq_access() returns the sentinel
BqAccess (BQ not configured), the empty bq.projects.data was reaching
validate_quoted_identifier first and raising ValueError -> endpoint
mapped to HTTP 400 'unsafe_identifier' instead of structured 500
'not_configured' with hint.
Each fetch helper now checks 'if not bq.projects.data: bq.client()' as
the first step, which triggers the sentinel's BqAccessError(not_configured).
Endpoint catches the typed error and returns HTTP 500 with hint pointing
at data_source.bigquery.project. Best-effort _fetch_bq_table_options
returns {} silently in this case (preserves the swallow-all contract).
* fix(bq_access): classify DuckDB-native exceptions from bigquery_query() via string match
Devin ANALYSIS on PR #138 review (latest round). The DuckDB bigquery
extension is a C++ plugin making its own HTTP calls — when BQ returns
403, it throws duckdb.IOException with the BQ error embedded as text,
not gax.Forbidden. translate_bq_error's isinstance checks would miss
these, falling to case 7 → bare 500 in production for v2_scan, v2_sample,
and v2_schema (the bigquery_query() paths).
Fix: last-resort string-match heuristic before the re-raise. 'Forbidden'
/ '403' / 'Bad Request' / '400' in the lowercased message classifies via
the same kind hierarchy. The 'serviceusage' substring still distinguishes
cross_project_forbidden from bq_forbidden. Specific enough that random
exceptions without HTTP-error keywords still re-raise.
Adds 4 unit tests covering the new heuristic + the 'don't swallow random
exceptions' invariant.
* chore(release): cut 0.22.0
PR #138 contains issue #134 user-visible behavior changes:
- BREAKING: BIGQUERY_PROJECT env var now overrides instance.yaml
data_source.bigquery.project for v2 endpoints (previously
RemoteQueryEngine billing only).
- Fixed: structured 502/400 on /api/v2/sample, /scan, /scan/estimate,
/schema when BigQuery raises Forbidden/BadRequest (was bare 500).
- Internal: BqAccess facade refactor unifying four duplicate BQ-access
call sites; instance_config.reset_cache() now invalidates BqAccess
cache too so admin server-config saves hot-reload BQ project IDs.
Bumps to 0.22.0 because PR #137 merged first and took 0.21.0.
Adds /admin/server-config UI for editing instance.yaml from the web. Hardening: SSRF gate on data_source URLs, narrow-overlay write strategy, atomic writes, audit log with secret masking on shape changes, threading lock on read-modify-write, corrupt-overlay refusal on write side + louder log on read side, modal Promise resolution on backdrop dismiss, sentinel scrub on save (defense-in-depth client+server). Bundles Windows PowerShell wrapper from #80. Cuts release v0.13.0.
- Config writes to DATA_DIR/state/instance.yaml (writable) instead of
CONFIG_DIR (read-only :ro in Docker)
- instance_config.py checks DATA_DIR/state/ first, then falls back to
CONFIG_DIR for backward compat
- CalVer counter is now global across channels (*-YYYY.MM.*) per spec
- Keboola error messages sanitized — log full error, return generic msg
- chmod in secrets.py wrapped in try/except for Windows compat
- Setup wizard JS handles 401 (expired JWT) with user-facing message
- deploy.yml changed to workflow_dispatch only (no duplicate test runs)
- Smoke test uses docker-compose.prod.yml + AGNES_TAG instead of sed
- docker-compose.prod.yml uses ${AGNES_TAG:-stable} env var
663 tests pass. 8 E2E verification tests pass.