agnes-the-ai-analyst

Author	SHA1	Message	Date
Vojtech	79a958ec26	feat(setup): configurable instance brand + connector setup overhaul (#268) - instance.brand (env AGNES_INSTANCE_BRAND, default "Agnes") + instance.workspace_dir replace hard-coded "Agnes" / "~/Agnes" across /home, /setup, /setup-advanced, /login, /install, /me/debug, and the Claude Code clipboard setup script. Terraform-friendly env override; defaults preserve existing Agnes branding. - Explicit "create workspace folder" step on /home (OS-tabbed mkdir+cd) + same step baked into the clipboard script as step 2. Drops the implicit assumption that `agnes init --workspace .` lands in a sensibly-cd'd shell. - Final "Restart Claude Code" step in the setup script (unconditional, between connectors and Confirm) so freshly-installed plugins, MCP servers, and SessionStart hooks load on the next Claude Code session. - Asana reverted from hosted Remote MCP back to PAT + raw REST against app.asana.com/api/1.0. MCP envelope shape consumed ~5x tokens per call; the PAT path lets the agent read flat REST fields. Existing MCP registration is detected and the user is asked whether to remove it (default Y, with benefits listed: token cost, no third-party hop, no OAuth refresh dance, deterministic envelope shape). - Atlassian connector instructs picking the longest API-token expiry (today "1 year") to cut re-mint friction. No public query-parameter hook exists on id.atlassian.com to pre-select expiry, so the prompt documents the manual click and acknowledges that limitation. - Uniform ✅ / ❌ per-connector marker contract (Asana, GWS, Atlassian) for the Confirm summary to grep. Each connector now ends with a Claude-driven end-to-end test that uses Claude Code's own bash to exercise the stored credential and prints "✅ <Connector> integration verified — ..." (or the failure variant).	2026-05-12 17:10:08 +02:00
ZdenekSrotyr	99b9379ba3	Merge remote-tracking branch 'origin/main' into worktree-catalog-bq-hotfix # Conflicts: # CHANGELOG.md	2026-05-12 11:56:49 +02:00
minasarustamyan	dc5e0e0d11	Marketplace UX overhaul: rich plugin/skill/agent detail + filename rename (#251 ) * Rename agnes-metadata.json to marketplace-metadata.json Curated marketplace enrichment file (.claude-plugin/agnes-metadata.json) becomes marketplace-metadata.json. Clean cut, no fallback — curators of upstream marketplace repos must rename the file on their side. Python API renames mirror the file rename: read_agnes_metadata → read_marketplace_metadata, AGNES_METADATA_REL → MARKETPLACE_METADATA_REL, AGNES_METADATA_MAX_BYTES → MARKETPLACE_METADATA_MAX_BYTES. Synth Claude Code marketplace strip rule (.agnes/** + the metadata file) follows the new filename. * Marketplace detail polish: window cover + 715:310 aspect + helper alignment - Plugin & item (skill/agent) detail hero: 160x160 square cover replaced with a macOS-style window frame (3 traffic-light dots + titlebar label showing the entity name). Body is constrained to 715:310 so curator- uploaded covers no longer crop to a square. Window is 380px wide; meta column and absolutely-positioned top-right install/remove actions stay put. Fallback when no cover_photo_url (translucent gradient + PL/SK/AG initials) is unchanged, just inside the window body. - Inner skill/agent cards in the plugin detail's Internal structure section adopt the same 715:310 aspect (was fixed 78px tall). No window chrome on inner cards — just the matching proportions so covers read consistently across hero, grid tiles, and listing cards. - Curated nested item helper text ("This skill is part of ... — add the bundle to your stack to use it") now stacks UNDER the "Open parent plugin" button instead of being a side-by-side flex sibling in the actions-row. Added align-self: flex-end so the 260px helper box anchors at the right edge of the 300px actions column, matching the button's right edge. 
* Marketplace My tab: surface the same category + type filters as Flea - Frontend: mp-cat-row and mp-type-row now show on tab=my (previously hidden — type was flea-only, category was flea/curated-only). Curated browse stays plugin-only and continues to hide the type pills. fetchOne() sends the `type` param for tab=my too, so the items endpoint's existing my-branch filter actually receives it. - Backend categories endpoint, tab=my branch: when the type filter is set to skill/agent, skip counting curated subscriptions. Curated plugins are always type='plugin', so they wouldn't survive the items endpoint's type filter; including them in the category counts made the pill numbers overstate what users could actually see in the grid. type=None or type='plugin' keeps the previous behaviour. - CHANGELOG entry under [Unreleased]. * Marketplace plugin detail: render rich content from marketplace-metadata.json Adds five optional plugin-level fields to marketplace-metadata.json and renders them on the curated plugin detail page + listing card: * display_name — friendly h1 / listing-card name / mac-window titlebar label (overrides the technical plugin id) * tagline — punchy 1-line value prop for the hero subtitle and the listing card description (replacing the verbose marketplace.json description on cards) * description — multi-paragraph markdown body, server-side rendered through markdown-it-py and sanitized through nh3 with a description-scoped allowlist (no iframes / no raw HTML / no javascript: links). Powers the "What it does" panel. * use_cases[] — {title, description, prompt} entries that render as a 3-column "When to use it" card grid; each card shows the literal prompt as a code chip so users can copy-paste into Claude Code. 
* sample_interaction — {user, assistant} dialog rendered in a Claude Code-style dark Catppuccin Mocha transcript panel: monospace user row with a green ">" prompt indicator + sans-serif assistant body with markdown formatting (peach bold, yellow italic, pink inline code, mantle-dark fenced code blocks). All five fields are optional; UI sections only render when populated, so plugins without enrichment look identical to before. Fields are read on-demand from the working tree (cached by mtime per marketplace slug) so curator edits land at the next request without waiting for a sync cycle — same pattern as the existing inner-skill/agent enrichment path. No DB schema bump. Skill / agent rich-content rendering is deferred to a later phase (needs a source-of-truth decision: extend plugin.yml? LLM-generate from SKILL.md / agent.md?). The schema accepts the same fields at skill/agent level today for forward compatibility but the UI ignores them for now. Also: stripped a stale `background-color: var(--bg)` from the global `code` rule in style.css (was making inline code visually disappear on the page background). * Skill / agent detail: render rich content from marketplace-metadata.json Brings the skill/agent detail pages to parity with the plugin detail page. Same rich-content schema (display_name, tagline, description as markdown, use_cases[], sample_interaction) plus two per-item additions: * invocation — curator-provided literal command string. When set, overrides the computed "<manifest_name>:<inner_name>" chip and cleanly supports both "/" skill prefix and "@" agent prefix (the hardcoded "/" in the chip markup is hidden when the curator provides the invocation, so /grpn-eng:query <q> and @grpn-eng:cto-architect both render correctly). * when_to_use — markdown disambiguation block ("Use this for X. For similar Y, see /other-skill") rendered into a new "When to use this" panel below the Example section. 
Skill / agent category is now per-item overridable in marketplace-metadata.json. When absent, the API keeps the parent plugin's category as the badge so existing items don't lose their category until curators opt in to per-item categorization. The new "Example" Q&A panel uses the same Claude Code-style dark Catppuccin Mocha transcript treatment as the plugin detail: monospace user row with a green ">" prompt indicator + sans-serif assistant body with markdown formatting. All new fields are optional and read on-demand from the working tree. Skills / agents whose marketplace-metadata.json doesn't carry rich content render exactly the same way they did before (frontmatter description + computed slash command + cover from existing v32 enrichment). No DB schema bump. * Fix TypeError in skill / agent detail when curator sets per-item category `curated_skill_detail` and `curated_agent_detail` were unpacking both `parent` (from `_curated_inner_parent_fields`, which returns the parent plugin's category as a fallback) and `enrichment` (from `_curated_inner_enrichment`, which returns the per-item category override when the curator set one) into `InnerDetailResponse(...)`. Python function-call kwargs unpacking with overlapping keys raises `TypeError: got multiple values for keyword argument 'category'`; it doesn't merge like a literal dict does. The bug only surfaced when the marketplace-metadata.json carried a `category` field at skill / agent level (curator opting into per-item categorization); items without that override hit the endpoint cleanly because only parent provided the key. Fix: build `merged = {**parent, **enrichment}` first (literal-dict syntax DOES merge, with the right-hand side winning) and unpack the merged dict. Curator override still wins via the merge order, and the same pattern is future-proof for any other field that lands in both layers later. 
Plus a regression test in test_marketplace_metadata.py asserting that the inner-resolver carries `category` for downstream merging. * Marketplace detail: tolerate partial curator JSON Server constructed UseCase / SampleInteraction via raw dict indexing (uc["title"], sample["assistant"]), so a curator commit missing any required Pydantic field crashed the whole plugin / skill / agent detail endpoint with a 500. Route both constructions through _safe_use_case / _safe_sample_interaction helpers — partial input silently drops the malformed card / section instead of breaking the page. Regression test in test_marketplace_api.py covers the three shapes: use_case missing a key, use_case with an empty string, and sample_interaction with only user (no assistant). Sibling rich fields still render. * Address PR-251 review (must-fixes + S2/S3 polish) + release-cut 0.50.0 Five must-fixes from the review pass (3 from @cvrysanek's two-stage review, 2 from my independent pass), plus the 0.50.0 release-cut as the last commit on this PR per CLAUDE.md (CLAUDE.md "Release-cut belongs to the PR" rule added in v0.49.1). Must-fixes ---------- 1. Cache eviction: bounded LRU instead of per-marketplace predicate. The previous predicate (`k[0] == marketplace_id and k[1] != mtime_ns`) only swept stale entries for the CURRENT marketplace; with N>100 distinct marketplaces each holding one mtime key, the cap silently failed and memory grew linearly. Replaced with OrderedDict-backed bounded LRU at cap=256, drop oldest insert on overflow. Cache stress test pinned in test_marketplace_metadata.py. 2. Render CPU cap: per-field byte cap on description / when_to_use / sample_interaction.assistant via MARKETPLACE_METADATA_FIELD_MAX_BYTES (= 64 KiB). Without this, a 1 MiB curator markdown body × QPS = curator-controlled CPU burn through pure-Python markdown-it-py. Truncation respects UTF-8 boundaries and logs a warning so the curator sees the cap fire on the next sync. 
Test for cap + UTF-8-boundary preservation. 3. Inner-detail bypassed the metadata cache. _curated_inner_enrichment, _curated_inner_cover, and curated_detail all called read_marketplace_metadata directly, defeating the mtime cache the plugin listing already shared. Routed all three through _read_metadata_cached so skill/agent detail hits are O(1) re-parses per marketplace per mtime instead of O(QPS). 4. Truthy-vs-presence trap in plugin/inner enrichment merge. API-layer writers used `if resolved.get(k):` which silently dropped any future falsy-but-valid resolver field (bool featured=False, int priority=0, str category=''). Switched to presence check (`if k in resolved`) so the resolver is the authority on field presence; the `{**parent, **enrichment}` merge respects whatever the resolver decided to ship. 5. Vendor-agnostic OSS cleanup. Removed operator-specific token references (/grpn-eng:, @grpn-eng:, .foundryai/) from src/marketplace_metadata.py docstring, app/web/templates/marketplace_item_detail.html JS comment, docs/curated-marketplace-format.md, and tests/test_marketplace_metadata.py fixtures. Replaced with generic /my-plugin:tool / @my-agent:role / .example/ placeholders. CHANGELOG --------- - New "### Fixed (PR #251 follow-ups)" section documenting all 4 code-side must-fixes - New "### Internal" section noting the vendor cleanup + new tests - BREAKING bullet for the file rename now covers operator-side migration: running instances see plugin enrichment disappear from the UI until upstream curator renames + nightly sync overwrites the working tree; POST /api/marketplaces/{id}/sync forces refresh sooner - Stripped /grpn-eng: leaks from the existing skill/agent rich-content bullet Tests ----- 128 targeted tests pass (test_marketplace_metadata, test_marketplace_api, test_marketplace, test_markdown_render, test_marketplace_synth_strip, test_marketplace_filter). 
New tests added: - 6 XSS regression tests on render_safe (javascript:/data:/vbscript: schemes via autolink, reference link, and mixed-case + positive http/https/mailto + noopener noreferrer rel) - 3 byte-cap tests (truncation + UTF-8 boundary + under-cap pass-through) - 1 cache eviction stress test (>256 marketplaces -> bounded at cap) - 1 truthy-vs-presence resolver-contract test Release-cut ----------- - pyproject.toml 0.49.1 -> 0.50.0 (minor; BREAKING file rename per pre-1.0 CHANGELOG note: "breaking changes called out under Changed or Removed with the BREAKING marker") - CHANGELOG [Unreleased] -> [0.50.0] - 2026-05-12, new empty [Unreleased] on top. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com> Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-12 08:38:39 +00:00
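The TypeError fix in the commit above rests on a difference between two kinds of unpacking: passing two mappings with a shared key straight into a call fails, while merging them in a dict display first lets the right-hand mapping win. A minimal sketch of that behaviour follows; `InnerDetail` and the sample values are hypothetical stand-ins, not the project's `InnerDetailResponse` or its data.

```python
from dataclasses import dataclass


@dataclass
class InnerDetail:
    """Hypothetical stand-in for the response model (assumption)."""
    name: str
    category: str


# Parent layer: supplies the plugin's category as a fallback.
parent = {"name": "query", "category": "analytics"}
# Enrichment layer: carries the curator's per-item override.
enrichment = {"category": "engineering"}

# Unpacking both mappings into the call fails on the shared key.
try:
    InnerDetail(**parent, **enrichment)
    error = None
except TypeError as exc:
    error = str(exc)

# A dict display merges first; the right-hand mapping wins on overlap.
merged = {**parent, **enrichment}
detail = InnerDetail(**merged)
```

Merge order is the whole policy here: swapping the two operands would make the parent fallback override the curator's value.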
ZdenekSrotyr	b6cdd68e8d	feat(catalog): entity_type + validated where_examples + view-aware cost-guard + scheduler hygiene Three behavioural improvements driven by the sub-agent end-to-end test findings, plus scheduler tweaks to prevent the post-deploy contention burst we measured. CATALOG (catalog-side bugs the test agents tripped on): - new entity_type field per remote row (BASE TABLE / VIEW / MATERIALIZED VIEW). For views, rows + size_bytes return null instead of the misleading 0 that __TABLES__ reports. - where_examples now validates against the table's actual schema (cached known_columns from refresh). The pre-fix behavior blindly advertised `country_code = 'CZ'` on tables with no country_code column — the sub-agent tests reliably hit this on unit_economics. - new known_columns + entity_type columns on bq_metadata_cache; populated by bq_metadata_refresh.refresh_one from the same fetch_bq_columns_full call (no extra BQ roundtrip) plus a cheap INFORMATION_SCHEMA.TABLES lookup for table_type. QUERY COST-GUARD: - remote_scan_too_large suggestion now names views explicitly: `Target(s) <ids> are VIEW or MATERIALIZED VIEW. BigQuery does not push LIMIT into the view body — SELECT * FROM <view> LIMIT 1 still runs the full underlying scan.` Programmatic consumers get a new view_targets field on the error detail. SCHEDULER HYGIENE (the post-deploy 1-minute window where concurrent parquet downloads dropped to ~1 MB/s): - SCHEDULER_STARTUP_GRACE_SECONDS (default 60) holds the first tick so the burst doesn't overlap cache_warmup writes. - SCHEDULER_BQ_METADATA_INITIAL_OFFSET_MAX_SECONDS (default 900) randomises bq-metadata-refresh's first-fire offset. 
TESTS: - test_bq_metadata_cache_repo: entity_type + known_columns round-trip - test_v2_catalog_remote_metadata: where_examples validation, views return null rows/size_bytes, cold rows have empty examples - test_api_query_guardrail: VIEW-aware suggestion text + view_targets - test_connectors_bigquery_metadata: entity_type lookup mock + new fields in TableMetadata expectations - test_scheduler_sidecar: grace + jitter env-var resolution	2026-05-12 10:37:35 +02:00
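The where_examples validation described in the commit above amounts to filtering candidate predicates against the columns cached at refresh time. The sketch below illustrates the idea only: the function name, the `(column, predicate)` pair shape, and the case-insensitive comparison are assumptions, not the project's code.

```python
def validated_where_examples(candidates, known_columns):
    """Keep only example predicates whose column exists on the table.

    candidates: list of (column, predicate) pairs (hypothetical shape).
    known_columns: column names cached at refresh time; None or empty
    when the row has never been refreshed.
    """
    if not known_columns:
        # Cold row: advertise no examples rather than guess.
        return []
    known = {name.lower() for name in known_columns}
    return [pred for col, pred in candidates if col.lower() in known]


candidates = [
    ("country_code", "country_code = 'CZ'"),
    ("order_date", "order_date >= '2026-01-01'"),
]
warm = validated_where_examples(candidates, ["order_date", "revenue"])
cold = validated_where_examples(candidates, None)
```

With these inputs the `country_code` example is dropped for a table that has no such column, and a never-refreshed table gets an empty list.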
Vojtech	c09c85d13a	fix(cta): clipboard fallback + fold Atlassian MCP into connectors (#249 ) * fix(cta): fall back to textarea+execCommand when Clipboard API rejects The "Setup a new Claude Code" CTA fetches /auth/tokens, parses the JSON response, renders the setup script, THEN calls `navigator.clipboard.writeText()`. Modern browsers (Safari, Firefox, and Chrome on stricter configurations) reject `writeText` with NotAllowedError when transient user activation has been consumed by an intervening `await` — which is exactly the case here. Users perceived this as "the browser blocked the copy" and got the manual-paste fallback modal even though the textarea + `document.execCommand('copy')` path WOULD have worked synchronously without needing fresh user activation. `copyToClipboard` now: - prefers the modern Clipboard API (unchanged for the happy path) - on writeText rejection, falls back to `copyViaTextarea` instead of surfacing the rejection to the caller's catch block. `copyViaTextarea` is the previously-inline textarea fallback factored out into a named helper, with two small hardening touches: - `readonly` + `tabindex=-1` so the hidden textarea doesn't steal focus or pop the virtual keyboard on mobile. - explicit `setSelectionRange(0, text.length)` to belt-and-braces the selection on iOS Safari (where `.select()` alone sometimes selects zero chars on touch-focused textareas). Only the CTA button needed this — the Step-1 install-command and the connector-copy buttons all call `writeText` synchronously inside the click handler (no awaits in between), so they keep their existing user-gesture context and didn't hit the same rejection. No template changes there. * refactor(home): fold Atlassian MCP registration into connectors block The standalone "Register the Atlassian MCP server" step (was step 6 in the unified setup script) moves INTO the Atlassian connector's prompt body so all Atlassian-related setup lives in one logical group. 
Same intent that #247 carried for connectors, applied one level deeper: the hosted Remote MCP registration is part of "set up Atlassian", not its own ungrouped step. What changed: - `app/web/connector_prompts.py` — the Atlassian prompt's step 5 replaces the speculative "Register the on-demand Atlassian MCP under .claude/mcp/atlassian" line with the actual hosted Remote MCP registration: `claude mcp add --transport sse atlassian https://mcp.atlassian.com/v1/sse \|\| true`. The `\|\| true` keeps re-runs idempotent and the body explains the OAuth-on-first-use contract. Both /home's Atlassian tile and the inlined setup-script Atlassian sub-block emit this line — single source of truth holds. - `app/web/setup_instructions.py` — `_mcp_servers_block` deleted; the `mcp_servers` step is removed from `_step_numbers`; resolve_lines no longer calls it. - Renumbering: install (1), init (2), catalog (3), preflight (4), marketplace (5), diagnose (6), connectors (7), confirm (8). Was: 6 = mcp_servers, 7 = diagnose, 8 = connectors, 9 = confirm. - `tests/test_setup_instructions.py` — Confirm step 9→8, Connect 8→7, diagnose 7→6, mcp_servers references dropped. `test_step_numbering_with_connectors_step` now asserts `"mcp_servers" not in steps`. Stray-Confirm assertion lists shift by one position. - `tests/test_setup_page_unified.py` + `tests/test_web_ui.py` — same step-number shifts in the rendered /setup preview assertions. The `claude mcp add` line is still the Atlassian Remote-MCP path that the 2026-05-10 init-report Fix C added — only its position in the flow changes. /home Atlassian tile copying continues to install the MCP too (the prompt body the tile pastes contains the same line). 112 tests pass. 
* feat(atlassian): operator-overrideable base URL via AGNES_ATLASSIAN_BASE_URL Adds an env var / YAML key the operator (Terraform module, customer-VM template, OSS instance.yaml) can set to bake the Atlassian Cloud site root into the connector prompt, so end users don't have to guess / paste their org's `https://<myorg>.atlassian.net`. When set, the Atlassian connector prompt (rendered on both /home tile and inlined into the setup-script step 7 Atlassian sub-block) replaces step 1's "Ask me for my Atlassian Cloud site URL and email" with a one-line note that the URL is already provisioned by the operator and asks only for the email. Step 4's helper-script body has the `BASE_URL='<the site URL I gave you>'` placeholder substituted with the literal value. When unset (empty), the existing "ask the user" flow remains, so there is no regression for OSS instances. Resolution + normalization in `get_atlassian_base_url()`: - env `AGNES_ATLASSIAN_BASE_URL` > yaml `instance.atlassian.base_url` > "" - strips trailing slash + trailing `/wiki` so the canonical value is the bare site root. Matches the per-user helper script's normalization at storage time (atlassian_prompt step 4 guard 2), so the literal baked in by the operator stays consistent with what the user's helper script would have computed from their input. Plumbing: - `app/instance_config.py`: new `get_atlassian_base_url()` resolver. - `app/web/connector_prompts.py`: - `atlassian_prompt(..., base_url: str = "")`: string-replace two explicit placeholder phrases when base_url is truthy; otherwise return the prompt unchanged. - `all_connector_prompts(..., atlassian_base_url: str = "")`: forwards the kwarg. - `app/web/router.py` (`_build_context`): reads `get_atlassian_base_url()` and passes it through to `all_connector_prompts(...)` so both the /home tile context AND the inlined-script `resolve_lines(...)` call use the same value. 
- `src/welcome_template.py` (`compute_default_agent_prompt`): same threading via the existing import-on-demand path. Tests (`tests/test_home_route_resolution.py`): - `get_atlassian_base_url` resolver: default empty, env override, trailing-slash strip, trailing-`/wiki` strip. - `atlassian_prompt(base_url=...)`: literal URL baked in, ask-step removed, placeholder replaced, operator-baked-in copy appears. - `atlassian_prompt(base_url="")`: existing ask-the-user flow unchanged. - `all_connector_prompts(atlassian_base_url=...)`: kwarg threads through to the rendered atlassian prompt. 135 tests pass. * feat(asana): register hosted Asana Remote MCP in connector prompt The Asana connector prompt only stored a PAT in the OS keychain + ran a curl verify against /api/1.0/users/me. That set Claude Code up for direct `curl` calls but didn't actually wire Asana into Claude's tool list, so the user couldn't ask Claude to "find my open Asana tasks" and have it work. Symmetric oversight to the Atlassian connector's original speculative `.claude/mcp/atlassian` line that this branch already replaced with `claude mcp add --transport sse atlassian https://mcp.atlassian.com/v1/sse`. Adds a new step 5 that registers Asana's hosted Remote MCP: claude mcp add --transport http asana https://mcp.asana.com/mcp \|\| true This is the V2 endpoint (streamable HTTP transport, launched February 2026). The V1 SSE endpoint at https://mcp.asana.com/sse was deprecated 2026-05-11 (today) and must NOT be used; the prompt body calls this out explicitly so a future operator who finds an old reference doesn't paste the dead URL. OAuth is handled by Claude Code at first use, same model as the Atlassian MCP step. The PAT stored in step 3 stays for direct `curl` calls (precheck + ad-hoc scripts); the MCP path uses its own OAuth grant, not the PAT. Old step 5 (revoke instructions) renumbers to step 6 and adds the `claude mcp remove asana` cleanup hint. 
Same single-source-of-truth invariant holds: /home Asana tile + the inlined Asana sub-block in the setup script (step 7 connectors) both emit identical text from `asana_prompt()`. 71 tests pass. * feat(asana): drive MCP OAuth login + end-to-end validation post-register `claude mcp add --transport http asana ...` only registers the server in Claude Code's local config — it does NOT trigger OAuth. The browser tab opens the first time any `mcp__asana__` tool gets invoked. So the previous step 5 left a user looking at a "registered" MCP that, in practice, hadn't authed yet and would fail on first real use. Same blind spot Atlassian's prompt also has, but Asana was the one called out in the latest review pass. Adds a new step 6 between MCP registration (step 5) and the revoke instructions (now step 7): a. Tell the user verbatim what's about to happen — a low-impact read through the MCP will pop the OAuth browser tab; sign in with the same account whose PAT they stored in step 3 and approve. Frames the OAuth as one-time so users don't wait for it on every later call. b. Drive an actual MCP read. Don't prescribe the exact tool name because the Asana MCP's exposed surface (`mcp__asana__`) is versioned upstream and we don't want to pin to a name that gets renamed. Instead: tell Claude to pick the lightest read from its surfaced tool list (users-me / list-workspaces / equivalent). Document the recovery path when Claude Code times out waiting for the OAuth tool use: `claude mcp list` to confirm registration before retrying. c. Print a single one-line proof that combines wiring + auth: "Asana MCP connected as <name> — <N> workspace(s) visible." Explicit anti-echo callout for tokens, task content, comments. On failure, surface the exact Claude-Code error and stop — no silent pass. d. Sanity-check that the MCP OAuth identity and the PAT identity reference the same Asana account. 
Easy mistake to make when the user has multiple Asana accounts — flag only on mismatch, keep quiet when they match. Recovery: `claude mcp remove asana && claude logout asana` then redo step 5. Step 7 (revoke) absorbs both the keychain delete + the `claude logout asana` line so users have a single place to undo everything. 43 tests pass. * fix(init): clear stale CA env vars on Windows before any TLS handshake Reported by the 2026-05-11 Windows test pass: after `agnes init` the gws connector failed with `UnknownIssuer` TLS errors because `SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE` were still set in Windows User scope pointing at `C:\Users\localadmin\.config\agnes\ca-bundle.pem` — a file that did not exist on the test host. Past Agnes installs (the setup-prompt trust block + older bootstrap helpers) write those pointers when they materialize a combined Agnes-CA bundle; when the bundle file later disappears (re-init on a new VM, machine swap, the ~/.agnes dir wiped), the pointers go stale and every native Windows TLS handshake fails before Agnes itself runs. SSL_CERT_FILE in particular REPLACES (not appends to) the trust store, so a stale pointer is silently catastrophic. `agnes init` now clears stale pointers in two layers before the first server roundtrip: 1. Current-process env (os.environ) — what the immediately-following `api_get` to /api/catalog/tables actually reads. Without this, init itself blows up before it gets to step 2. 2. Windows User-scope env via PowerShell `[Environment]::SetEnvironmentVariable(name, $null, 'User')` — what every future shell + every native tool (gws, claude.exe, pip, uv) inherits. The 2026-05-11 reporter expected this exact cleanup ("init was supposed to clear these but they persisted"). The cleanup is best-effort and conservative: - Only deletes a var when its value points at a path that does NOT exist on disk. Intentional operator config (e.g. SSL_CERT_FILE pointing at a corp certifi bundle) stays put. 
- PowerShell missing / restricted execution policy / WSL-without-pwsh: swallowed silently. The current-process leg still runs, which unblocks init even on hosts where the User-scope leg cannot fire. Tests (`tests/test_init_ca_cleanup.py`, 6 cases): - Stale pointers → removed from process env. - Real-path pointers → preserved. - Non-Windows hosts: PowerShell is not invoked. - Windows hosts: PowerShell IS invoked with a script that checks all three vars + uses Test-Path + SetEnvironmentVariable. - PowerShell FileNotFoundError: cleanup swallows it, does not raise. - `_is_windows_host()` reflects sys.platform. * refactor(asana): MCP-first flow — drop PAT storage, precheck via `claude mcp list` The Asana hosted MCP at https://mcp.asana.com/mcp authenticates via OAuth (Claude Code holds the grant; browser tab pops on first tool use). The earlier prompt walked the user through creating + keychain- storing an Asana Personal Access Token AND registering the MCP — two parallel auth surfaces for one connector. Once the MCP works, the PAT has no consumer: the precheck/verify steps that used `curl $BASE/api/1.0/users/me` are just redundant proof that Asana itself is reachable, which the OAuth handshake already establishes. Removed: - Step 0 keychain probe + curl verify against /users/me with PAT. - Step 1 open developer-console / create PAT. - Step 2 click "+ New access token", warn shown-ONCE. - Step 3 helper-script for keychain-storage (per-OS bodies: macOS `security add-generic-password`, Linux `secret-tool store`, Windows `cmdkey /generic`). - Step 4 PAT-side `users/me` verify. - Step 5's split that kept the PAT around for direct curl scripts. - Step 6d's "MCP vs PAT identity sanity check" — there is no PAT anymore, nothing to mismatch against. 
New flow (3 steps total): - Step 0 precheck: `claude mcp list \| grep ^asana` — if found, the server is registered AND Claude Code is holding its OAuth grant (otherwise prior failure would have removed it); print "Asana MCP already registered — skipping setup" and stop. Tells the user the explicit reset command (`claude mcp remove asana && claude logout asana`) so a re-register stays one paste. - Step 1: `claude mcp add --transport http asana https://mcp.asana.com/mcp` — no `\|\| true` because step 0 should have caught the "already exists" case. Step explains the V2-vs-V1 endpoint distinction (V1 SSE deprecated 2026-05-11) and the abort-clean recovery if the precheck somehow missed the existing server. - Step 2: same OAuth + low-impact-read validation pattern as before. - Step 3: revoke instructions (mcp remove + logout + Asana-side app revoke at app.asana.com/Settings → Apps). Both surfaces (the /home Asana tile and the inlined Asana sub-block in the setup script's step 7) emit the new text from the same asana_prompt() — single-source-of-truth invariant intact. 77 tests pass.	2026-05-11 21:54:51 +02:00
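The feat(atlassian) sub-commit above specifies a resolution order (env, then YAML, then empty string) and a normalization that strips a trailing slash and a trailing `/wiki`. A minimal sketch of that rule is below; the function names, the `env` / `yaml_config` parameters, and the nested-dict shape of the YAML are illustrative assumptions, not the project's `get_atlassian_base_url()`.

```python
import os


def normalize_site_root(raw: str) -> str:
    """Reduce a user- or operator-supplied URL to the bare site root."""
    value = raw.strip().rstrip("/")
    if value.endswith("/wiki"):
        value = value[: -len("/wiki")]
    return value.rstrip("/")


def resolve_atlassian_base_url(env=None, yaml_config=None) -> str:
    """Resolve env > yaml > "" (hypothetical helper, for illustration)."""
    env = os.environ if env is None else env
    yaml_config = yaml_config or {}
    from_env = env.get("AGNES_ATLASSIAN_BASE_URL", "")
    from_yaml = (
        yaml_config.get("instance", {}).get("atlassian", {}).get("base_url", "")
    )
    return normalize_site_root(from_env or from_yaml or "")


from_env = resolve_atlassian_base_url(
    env={"AGNES_ATLASSIAN_BASE_URL": "https://myorg.atlassian.net/wiki/"},
    yaml_config={"instance": {"atlassian": {"base_url": "https://other.atlassian.net"}}},
)
from_yaml = resolve_atlassian_base_url(
    env={},
    yaml_config={"instance": {"atlassian": {"base_url": "https://other.atlassian.net/"}}},
)
unset = resolve_atlassian_base_url(env={}, yaml_config={})
```

Returning an empty string when nothing is configured keeps the caller's check a plain truthiness test, which matches the "ask the user" fallback the commit describes.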
ZdenekSrotyr	b3841f5b6c	release: 0.50.0 — persistent BQ metadata cache + scheduled refresh; catalog never blocks on BigQuery Since 0.47.0 GET /api/v2/catalog enriched each remote BigQuery row by fetching INFORMATION_SCHEMA.TABLE_STORAGE + COLUMNS through the DuckDB BigQuery extension inside the request. On cold caches that fanned out to O(N) sequential BQ jobs-API roundtrips — easily 90 s+ on partitioned / view-backed tables — and reliably blew the CLI's 30 s httpx ReadTimeout. Reproduced with py-spy: three AnyIO worker threads stuck inside connectors/bigquery/metadata._fetch_via_legacy_tables. Refactor: enrichment is read exclusively from a new persistent bq_metadata_cache DuckDB table (schema v40), populated by a scheduler- driven refresh job at SCHEDULER_BQ_METADATA_REFRESH_INTERVAL (default 4 h). Cold catalog response on a fresh container is now tens of milliseconds with metadata_freshness=never_fetched for unwarmed rows. New surface: - POST /api/admin/run-bq-metadata-refresh (scheduler-driven, full) - POST /api/v2/metadata-cache/refresh?table=<id> (admin, single) - GET /api/v2/metadata-cache/status (auth, non-admin) - metadata_freshness field per catalog row Removed (internal API): v2_catalog._size_hint_for_row, _resolve_remote_metadata, _metadata_provider_for, _build_metadata_request, _materialized_size_hint, in-memory _metadata_cache. Response shape unchanged for external consumers. 991 tests passing; 2 pre-existing failures (test_db v3→v4 ladder, test_cli_binary_rename) unrelated to this change.	2026-05-11 20:37:17 +02:00
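The release commit above moves catalog enrichment to a cache-only read: the request path never calls BigQuery, and rows that have not been refreshed yet report `metadata_freshness=never_fetched`. The sketch below shows that shape with a plain dict standing in for the persistent table. The function name, the row fields, and the `"cached"` label for warmed rows are assumptions; only `never_fetched` comes from the commit message.

```python
def enrich_row(table_id, cache):
    """Attach cached metadata to a catalog row without any remote call.

    cache: mapping of table id -> metadata dict, standing in for the
    persistent cache table (assumption for illustration).
    """
    cached = cache.get(table_id)
    if cached is None:
        # Unwarmed row: answer immediately and mark it as never fetched.
        return {
            "table": table_id,
            "size_bytes": None,
            "metadata_freshness": "never_fetched",
        }
    # "cached" is a hypothetical label for rows the refresh job has filled.
    return {"table": table_id, **cached, "metadata_freshness": "cached"}


cache = {"sales.orders": {"size_bytes": 1024}}
warm_row = enrich_row("sales.orders", cache)
cold_row = enrich_row("sales.unit_economics", cache)
```

The request handler stays fast whatever the cache state, and filling the cache is left entirely to the scheduled refresh job.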
Vojtech	a46b9dc928	/home install-hero polish: license link contrast, auto-mode reorder, Shift+Tab guidance (#243 ) * Make /home install-hero links readable against blue background The Claude license-options link added in the previous commit inherited the default `<a>` style (`var(--hp-primary)` blue), which renders as blue-on-blue and is unreadable inside the blue install-hero. Add a scoped `.install-hero a` rule that uses white with an underline (matching the existing lead-paragraph contrast pattern) so any link nested in the hero stays legible. * Reorder /home install flow: auto-mode is now Step 2, Agnes install becomes Step 3 Step 3 (was Step 2) pastes a ~20-command bash bootstrap into a fresh Claude Code session. Without auto-mode enabled first, each Bash/edit command needs a manual approve click — bad UX for first-time users. Move auto-mode from the outside-hero `<details>` reference block into the install-hero as a real Step 2, between "install Claude Code" and "install Agnes". Content is the persistent `acceptEdits` snippet (write to ~/.claude/settings.json) plus a one-liner pointing at Shift+Tab for users who are already inside a running Claude Code session. YOLO mode for full Bash auto-approve stays on /setup-advanced behind the existing link. The outside-hero `setup-collapsible[data-section="step3"]` block is dropped — auto-mode is no longer reference content, it's a real install step, and duplicating it would just diverge over time. Onboarded users no longer see the auto-mode block at all (consistent with Steps 1 + 3 also hiding post-onboarding). Completion banner copy updated: "Step 1, 2 & 3 done — Claude Code installed, auto-mode set, Agnes ready". Dashboard CTA partial and other templates don't reference step numbers for this flow, so no adaptation needed there. * Simplify /home Step 2 to Shift+Tab only — drop the JSON snippet Operator pointed out two issues with the prior Step 2: 1. The settings.json snippet is redundant. 
Claude Code's first Shift+Tab cycle to auto-accept mode already prompts the user whether to persist it as default — Claude writes the config itself, no manual file edit needed. 2. The snippet only showed the POSIX path `~/.claude/settings.json`, which doesn't translate to native Windows. Replace the snippet + copy button with a plain Shift+Tab instruction, explicitly call out the first-time "make this the default?" prompt, and note that Claude handles the config write itself — same flow on macOS / Linux / WSL / Windows. Adds a fallback line for users who already closed the post-OAuth session. * Tighten /home Step 2 install-note to two paragraphs Operator: drop the 'Claude writes the setting itself, so this works the same on macOS / Linux / WSL / Windows...' line plus the 'auto-approves file edits going forward; Bash commands stay gated — that's the safe default' line. Both were filler — the make-default prompt already implies persistence, and gated Bash is the obvious default users won't be surprised by. Result: paragraph 1 carries Shift+Tab + first-time make-default say-yes + closed-session fallback in one breath; paragraph 2 keeps the verbatim YOLO link. Same affordances, less vertical space.	2026-05-11 16:46:58 +00:00
minasarustamyan	9de679c714	System plugins (schema v39) + marketplace UX polish + drop legacy pages (#241 ) * System plugin tier with mark/unmark fanout (schema v39) Adds a mandatory plugin tier so admins can pin a small set of curated plugins into every user's stack from day one. Marking a plugin via the new toggle on /admin/marketplaces materializes resource_grants for every group and user_plugin_optouts subscriptions for every user, so the existing resolver pulls the plugin into every served set without a new filter layer. Hooks on user-create (Google OAuth, magic-link, admin POST, scheduler) and group-create propagate the same materialization to new principals. UI locks: /admin/access disables the checkbox with a SYSTEM pill; /marketplace cards swap the "In stack" green pill for an amber "Required" badge with shield icon; the plugin detail install button reads "Required by your org"; /my-ai-stack toggle is disabled. Bypass paths return 409 (DELETE /api/admin/grants for system grants, PUT /api/my-stack/curated/.../{enabled:false}, DELETE /api/marketplace/curated/.../install). Unmark only flips the flag — materialized rows persist so admins curate cleanup at their leisure through the now-unlocked /admin/access checkboxes. * Marketplace UX polish + drop legacy /store and /my-ai-stack pages Two-part cleanup post-v39: (1) Page deletion. /store and /my-ai-stack were already replaced by /marketplace?tab=flea and /marketplace?tab=my respectively, but the standalone routes lingered. Hard delete in dev mode — no redirects, stale bookmarks 404. The /store/new upload wizard, the flea detail/edit pages, the admin queue, and all /api/store/* + /api/my-stack endpoints (CLI consumers) stay. Internal hardcoded hrefs in the upload wizard's Cancel button and the advanced-setup page repointed to the marketplace tabs. (2) Detail-page install button rework. The single button that morphed between "+ Add to my stack" and "✓ In your stack" did not communicate uninstall affordance. 
The installed state now renders an inline white status label before a separate red-bordered "✕ Remove from stack" button on the same row, both at identical height to avoid layout shift. System plugins keep their locked amber "✓ Required by your org" pill (no Remove button — API refuses 409). The post-action hint panel now fires on remove too with the title flipped to "✓ Removed from your stack" — Claude Code needs the same /update-agnes-plugins refresh either way. Also: /admin/marketplaces Details modal "Mark as system" toggle redesigned. The button was near-invisible (matched neutral row metadata). It's now a balanced amber-toned chip with shield icon and a structured confirm modal replacing the native confirm() dialog that summarizes fanout consequences before commit. * Move stack-hint inside hero with glass-on-gradient styling The post-action hint card ("✓ Added to your stack" with the /update-agnes-plugins recipe) used to live below the hero in panel-what (gray card on white page body). Clicking add/remove inserted/removed it between the hero and content, shifting the panels below — a noticeable scroll jump. The hint is now anchored inside the hero's top-right corner alongside the install/remove buttons, both as flex children of an absolutely positioned .actions container. The card uses a translucent white-on-glass treatment that adopts the hero's kind color (blue for plugin, green for skill, purple for agent) without per-kind branching. Hero is always tall enough (160px photo) to contain the action+hint stack without overflow, so toggling the hint visibility doesn't grow the hero or shift body content. The hero-head grid reserves a third 300px column for the absolute actions overlay so meta gets the proper 1fr free space instead of being squeezed by a padding-right hack. Responsive breakpoint at 1100px reflows the actions stack below hero-head when the viewport isn't wide enough to keep meta + actions side-by-side comfortably. 
* Add optional -DataPath bind mount to run-local-dev.ps1 When the operator wants to inspect DuckDB files (system.duckdb, extracts, marketplaces, store/, …) directly from Windows Explorer, the named volume inside the Docker Desktop WSL VM isn't reachable. The new -DataPath param generates a transient compose override that rebinds /data on app, scheduler, extract (and Caddy's /srv:ro mirror) to a Windows host folder. Fully additive — when -DataPath is omitted everything behaves exactly as before: no override file is generated, $composeFiles array is unchanged, finally cleanup is a no-op. Existing positional invocations (.\run-local-dev.ps1 up | down | logs) keep binding to $Action because $DataPath is a named-only parameter with no Position attribute. The override is written via [System.IO.File]::WriteAllText so the YAML is BOM-less across PS 5.1 / 7+ — Compose rejects BOM-prefixed YAML on Windows. The override file is unique per PID and removed in the script's finally block so concurrent invocations and crashes don't leak files. * factor mark_system fanout into UserCuratedSubscriptionsRepository The endpoint imported UserCuratedSubscriptionsRepository, ignored it (noqa: F841), then duplicated the user-side fanout SQL inline. Adds fanout_system_for_plugin() symmetric to the existing fanout_system_for_user() and routes mark_plugin_system through it — removes the dead import + 14 lines of inline SQL, returns the same `affected_users` delta count, no behavior change. * drop customer-specific path from .ps1 example Per CLAUDE.md vendor-agnostic OSS rule: replaced C:\\Business\\Groupon\\Agnes\\agnes-data with the generic C:\\Users\\<you>\\agnes-data placeholder so the docstring example reads cleanly on any reviewer's box. * release: 0.48.0 + parallelize Release-workflow pytest Cuts the release shipped via #228 #230 #231 #232 #233 #234 #236 #237 #238 #239 #240 plus this PR (#241). 
Major changes: - System plugin tier (schema v39) — admins mark a plugin mandatory; fans out RBAC grants + subscriptions to every existing user/group plus hooks for new principals - BREAKING: removed standalone /store + /my-ai-stack page routes (replaced by /marketplace?tab=flea + /marketplace?tab=my) - Setup-prompt + bootstrap recovery fixes (#240) - DuckDB CHECKPOINT-on-shutdown + 60s compose grace (#235) - Marketplace + flea-market UX polish, agnes-metadata.json enrichment Bonus: switch release.yml test step to `-n auto` (matches ci.yml). Single-threaded was 15-20 min and frequently the bottleneck on PR mergeability — now ~6 min. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com> Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-10 19:15:41 +00:00
Vojtech	8b2f6620a8	fix(duckdb): CHECKPOINT on shutdown + 60s compose grace to prevent WAL corruption (#235 ) * fix(duckdb): CHECKPOINT on shutdown + 60s compose grace to prevent WAL corruption Default 10s stop_grace_period + missing CHECKPOINT on shutdown produced a class of WAL-replay failures during agnes-auto-upgrade recreates. Sequence: 1. New image digest detected → docker compose up -d → SIGTERM to app 2. App's lifespan close_system_db() called .close() but never CHECKPOINT, so any uncheckpointed ops stayed in system.duckdb.wal 3. Container didn't exit within 10s → dockerd SIGKILL (verified in journal: "Container failed to exit within 10s of signal 15 - using the force") 4. New container started with possibly-different DuckDB version, replay hit "Failure while replaying WAL ... GetDefaultDatabase with no default database set" assertion → 500 on every authed request Observed on foundryai-dev-vrysanek 2026-05-05; recovered by removing the WAL manually. _try_open_system_db already exists as a recovery net but requires a system.duckdb.pre-migrate snapshot, which doesn't exist outside migration windows. Two-part prevention: - src/db.py::close_system_db: execute CHECKPOINT before .close() so the WAL is empty when the file is released. Best-effort (try/except), so a locked or full-disk CHECKPOINT does not block close. - docker-compose.yml: stop_grace_period: 60s on app + scheduler, gives uvicorn + lifespan room to run shutdown handlers under load before Docker's SIGKILL fires. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * log CHECKPOINT outcome on system DB close Silent except: pass on both CHECKPOINT and close() left operators without any signal when the WAL-flush safety net actually saved them (or didn't). Add logger.warning on CHECKPOINT failure (operator-actionable - recovery via _try_open_system_db kicks in next start) and debug-level trace on success / close exception. 
* drop customer-specific token + add CHANGELOG entries Per CLAUDE.md vendor-agnostic OSS rule: nothing customer-specific in shipped code/comments. Replaced "foundryai-dev-vrysanek 2026-05-05" references in docker-compose.yml and src/db.py docstring with generic "Docker image upgrade window where DuckDB versions differ" framing. The original incident date + host live in the commit history / PR body, not in the tree. Adds CHANGELOG entries under Unreleased: - Fixed: close_system_db CHECKPOINT-on-shutdown semantics + WAL-replay failure mode the fix protects against - Changed: docker-compose stop_grace_period 60s on app + scheduler --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-10 19:02:30 +00:00
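The best-effort CHECKPOINT-then-close ordering from the 8b2f6620a8 entry above might look roughly like the sketch below. It is not the project's `close_system_db`; `close_with_checkpoint` and `FakeConnection` are hypothetical names, and the function only assumes a connection object exposing `execute(sql)` and `close()`, which is the shape DuckDB's Python connection offers.

```python
import logging
from typing import Any, List

logger = logging.getLogger(__name__)


def close_with_checkpoint(conn: Any) -> None:
    """Flush the WAL with CHECKPOINT, then close; neither step may block the other."""
    try:
        conn.execute("CHECKPOINT")
        logger.debug("CHECKPOINT succeeded before close")
    except Exception as exc:  # best-effort: a locked or full disk must not block close
        logger.warning("CHECKPOINT failed before close: %s", exc)
    try:
        conn.close()
    except Exception as exc:
        logger.debug("close() raised: %s", exc)


class FakeConnection:
    """Stand-in used only to demonstrate call ordering (not a real DuckDB connection)."""

    def __init__(self, fail_checkpoint: bool = False) -> None:
        self.fail_checkpoint = fail_checkpoint
        self.calls: List[str] = []

    def execute(self, sql: str) -> None:
        self.calls.append(sql)
        if self.fail_checkpoint:
            raise RuntimeError("simulated CHECKPOINT failure")

    def close(self) -> None:
        self.calls.append("close")
```

Logging the failure at warning level, rather than swallowing it, matches the follow-up commit's reasoning: the operator gets a signal that the next start may need the recovery path.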
Vojtech	929520f5e1	Flea-market edit feature with version history (schema v37) (#239 ) * feat(store): flea-market entity edit feature with version history (schema v38) Owner + admin can now edit a store entity from a real Edit page at /marketplace/flea/{id}/edit, replacing the prior "coming soon" placeholder. Editable: display name, description, category, video URL, cover photo, and an optional new bundle. Type is locked (400 type_locked). Display-name change renames the on-disk slug for both live plugin/ and version dirs (reuses rename-on-archive helper). Schema v38 (originally drafted as v37; renumbered after rebase onto main where v37 was taken by the curated marketplace enrichment). Versioning model: * Each bundle update bakes into ${DATA_DIR}/store/<id>/versions/v<N+1>/plugin/ and runs the standard guardrails pipeline. * DEFERRED PROMOTION: live plugin/ + entity.version_no stay at the prior approved version through the LLM review window so existing installers keep receiving the previously approved bundle. Live swap + version_no/version/file_size bump happen only on LLM approval. Blocked verdicts leave the prior version serving forever. * store_entities gains version_no INTEGER + version_history JSON. Each version_history entry carries hash, sha256, size, submission_id, created_at, created_by. * Existing entities backfill to v1 with a single-entry history seeded from the row's current `version` hash. Initial create also seeds versions/v1/plugin/ so future restore can copy v1 bytes forward. Concurrency: * Block-while-pending: an in-flight LLM review blocks any further edit with 409 prior_version_pending. Owner waits 5-30s; Edit button on detail page renders disabled in the same window via the new edit_in_flight flag (decoupled from quarantine_sub since the deferred-promotion flow keeps visibility='approved'). Rollback: * New endpoint POST /api/store/entities/{id}/versions/{n}/restore (owner + admin). 
Copies vN bundle forward as v<max+1> and re-runs guardrails (rules tighten over time; pre-approved bundles re-validate). Forward-only history. Same deferred-promotion semantics — live stays at prior version until LLM approves the restored copy. UI: * New /marketplace/flea/{id}/edit page (owner + admin gated). * Versions card on plugin + item detail templates (owner/admin only) via shared _flea_versions.html partial. * Admin queue gains v# column with current badge + separate Hash column. Submission detail surfaces Version + Bundle hash rows. * Activity timeline split into per-submission + entity-wide cards; entity-wide rows render vN chips when audit row params reference a specific version. * Section headers (Manifest / Static / Quality / LLM review) tag with vN chip via shared macro. * Reviewed-by-model field surfaces explanatory text per status. * Banner upload-failure now redirects to detail page on submission_blocked instead of staying stuck. Tests: 24 in tests/test_store_entity_versions.py covering metadata- only edit, bundle-edit version bump, type lock, block-while-pending, name change disk rename, restore flow + 404/400/403 paths, edit page 404 for non-owner, versions card visibility gating, admin queue v# column, admin detail Version/Hash rows, deferred-promotion installer contract (pending review doesn't break installer / blocked verdict keeps prior / approved promotes), admin can edit/restore non-owned, restore deferred promotion, audit log per-version params. 214 tests green across guardrails + edit + admin + repo + schema suites. * docs(store): refresh update_entity docstring to match deferred-promotion + submission-status gate Bring the docstring in sync with the actual fixes from the prior commit. The pre-fix wording said the gate read visibility_status='pending' AND submission status — under deferred promotion that would never fire for v2+ edits. 
Now describes: - Block-while-pending gates on submission.status DIRECTLY, independent of visibility (so v2+ deferred-promotion edits don't slip through). - Display-name + bundle change defers the live rename to promotion; metadata-only renames stay immediate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 00:14:33 +04:00
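The block-while-pending and deferred-promotion rules in the 929520f5e1 entry above can be modelled as a small state machine. The sketch below is an illustration under stated assumptions, not the store's implementation: `Entity`, `submit_bundle`, `apply_verdict`, and `PriorVersionPending` are hypothetical names, and keeping a blocked version number in the history is an assumption drawn from the "forward-only history" wording.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class PriorVersionPending(Exception):
    """Raised while an earlier submission is still under review (the 409 case)."""


@dataclass
class Entity:
    version_no: int = 1                       # live version served to installers
    history: List[int] = field(default_factory=lambda: [1])
    pending_version: Optional[int] = None     # version awaiting a review verdict


def submit_bundle(entity: Entity) -> int:
    """Bake a new version without promoting it; refuse if a review is in flight."""
    if entity.pending_version is not None:
        raise PriorVersionPending("prior_version_pending")
    new_no = max(entity.history) + 1
    entity.history.append(new_no)
    entity.pending_version = new_no           # the live version_no is untouched here
    return new_no


def apply_verdict(entity: Entity, approved: bool) -> None:
    """Promote on approval; on a blocked verdict the prior version keeps serving."""
    if entity.pending_version is None:
        raise ValueError("no submission is pending")
    if approved:
        entity.version_no = entity.pending_version
    entity.pending_version = None
```

Under this model a restore is just another `submit_bundle` call whose bytes happen to come from an older version, which is why it inherits the same deferred promotion.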
minasarustamyan	d269c69359	Drop legacy sslVerify=false fallback from install setup prompt (#238 ) The marketplace step (step 5) emitted `git config --global http.<host>/.sslVerify false` on AGNES_DEBUG_AUTH=1 instances when no ca_pem was readable from AGNES_TLS_FULLCHAIN_PATH. Two problems: 1. Claude Code auto-mode classifiers ("do not disable TLS verification" rule) block the line, breaking hands-free setup. 2. It silently masked operator misconfiguration — a debug-auth instance without a fullchain on disk fell through to a TLS-disabled clone instead of surfacing the missing cert. After the cross-platform trust block (#137), self-signed and private-CA servers are fully covered by step 0 reading the fullchain via _read_agnes_ca_pem; publicly-trusted certs need no bootstrap at all. The legacy fallback no longer covers a real scenario — verified by running step 5 without sslVerify=false against a self-signed instance. BREAKING: drops `self_signed_tls` parameter from app.web.setup_instructions.resolve_lines and render_setup_instructions (only consumed by the deleted block). The AGNES_DEBUG_AUTH env var itself is unchanged — still gates /api/me_debug and the dropdown link. Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>	2026-05-09 20:10:01 +02:00
minasarustamyan	6fe67d5279	Curated marketplace enrichment via agnes-metadata.json + curator metadata (#234 ) * Curated marketplace enrichment via agnes-metadata.json + curator metadata Adds a second well-defined metadata file `.claude-plugin/agnes-metadata.json` that upstream marketplace repos can opt into, providing per-plugin (and per-skill / per-agent) cover photo, demo video URL, doc links, and category override. The Claude Code marketplace contract is untouched — agnes-metadata.json + the convention `.agnes/` directory are stripped from the synthetic Claude Code marketplace served via /marketplace.zip and /marketplace.git/, so user instances see a clean Claude Code repo with no Agnes-only metadata. Highlights: - DB schema v32 — adds curator_name + curator_email on marketplace_registry, cover_photo_url + video_url + doc_links on marketplace_plugins. - Mandatory curator at marketplace registration, editable later through the admin UI; surfaces on cards + detail pages in place of owner_todo. - External-asset mirror cache at ${DATA_DIR}/marketplace-cache/<slug>/ with conditional GET, 60s timeout, 10 MB body cap, SSRF guards, and Wikipedia-policy-compliant User-Agent. - Strict drop semantics — anything Agnes can't deliver as a real PDF / Markdown / plain text doc, or a real PNG / JPEG / WebP cover, is dropped from the served metadata; UI looks identical to no-entry case (gradient placeholder for missing covers, no row in the doc list). - Doc allowlist + image allowlist enforced on both the curated mirror flow and the Flea upload flow (/store/new); shared module src/marketplace_assets.py. - New /api/marketplace/curated/{mp}/{plugin}/{asset,doc,mirrored}/... endpoints with path-traversal guards + RBAC + Content-Disposition attachment for docs. - Curator-focused format guide at /marketplace/format-guide; canonical source is docs/curated-marketplace-format.md, also linked from the admin /admin/marketplaces page next to + Add Marketplace. 
See CHANGELOG.md under [Unreleased] for the full breakdown. Fix format-guide test assertion to match shortened disclaimer The 'Flea Market' phrase was trimmed out of the disclaimer in docs/curated-marketplace-format.md after the curator-focused rewrite. Update the rendered-HTML test to assert the channel-scoping phrase that's actually present ('Curated Marketplace channel only') rather than the 'Flea Market' contrast that's no longer in the doc. * Drop unused 'version' field from agnes-metadata.json schema The parser never read it; it was a YAGNI placeholder for future schema evolution. Curators don't need to wonder what to put there when adding the file for the first time. Will be re-added if and when we actually introduce a backwards-incompatible schema change. * Harden asset mirror against SSRF via redirect + DNS rebinding The pre-flight _is_safe_url check validated only the initial URL; urllib.request.urlopen then followed redirects and re-resolved DNS for the actual connection — both bypassable. Attacker-controlled origin could 302 to http://169.254.169.254/... and exfil cloud metadata; attacker-controlled DNS could return public IP first / 127.0.0.1 second. Replace urlopen call with a shared OpenerDirector wired through three custom handlers: _SafeRedirectHandler re-runs SSRF allowlist on every redirect Location (max 5 hops, down from urllib's 10), and _PinnedHTTPHandler / _PinnedHTTPSHandler connect to the IP that passed validation rather than re-resolving the hostname. TLS SNI + cert verify stay bound to the original hostname. _resolve_safe returns the validated IP (the existing _is_safe_url 2-tuple wrapper stays for backwards compatibility) and rejects round- robin DNS that mixes a public + private record. _UnsafeRedirectError is a typed exception so _fetch_url can map redirect blocks to terminal 'rejected' status (not transient 'failed'). _http_open is the single call site so tests can mock at one well-defined seam. 
Tests cover redirect blocking (link-local, loopback), redirect-error unwrapping inside URLError, pinned-IP connection target, and the end-to-end DNS-rebinding scenario. Existing tests that mocked urllib.request.urlopen are migrated to mock _http_open. * Harden /asset/ endpoint against stored XSS The endpoint served any file in the cloned marketplace repo with stdlib-detected Content-Type, so a curator who landed evil.html (or a renamed evil.png carrying HTML bytes) in the working tree got a same-origin XSS — the response shares cookie scope with /admin and /api/me/. The asset endpoint is image-only by contract (cover photos referenced from agnes-metadata.json + inner skill / agent cards), so applying the same allowlist + magic-bytes pattern that /doc/ already uses closes the gap without breaking any legitimate use case. Three layered checks: extension in IMAGE_EXTENSIONS (.png/.jpg/.jpeg/.webp; SVG excluded — <script> inside SVG executes), validate_image_file magic bytes (defeats rename-extension attack), Content-Type pinned from the validated extension (never stdlib mimetypes). Defense-in-depth: X-Content-Type-Options: nosniff stops browser MIME sniffing; Content-Security-Policy: default-src 'none' blocks script / iframe execution even if a future regression let HTML through. Tests cover the .html extension reject, the renamed-HTML-as-PNG magic- bytes reject, the .svg reject, and the happy-path PNG with security headers attached. The pre-existing path-traversal test seeds a real PNG instead of ok.txt now that the endpoint is image-only. Enforce mandatory curator on marketplace PATCH The POST handler enforced curator_name + curator_email at create time, but PATCH treated empty / missing curator inputs as 'no change'. Legacy rows that pre-date v32 (curator_name=NULL) could be edited indefinitely without ever filling the curator gap, and OWNER_TODO_PLACEHOLDER lingered on every /marketplace card. 
Reject the PATCH with 400 when the post-merge row would persist with empty curator. The check fires after the existing field-merge logic, so once-filled rows that don't touch curator still pass through (their existing values fall through from the DB row). DB column stays nullable so untouched legacy rows continue to coexist — the gate fires only the moment an admin opens the edit modal. Existing PATCH semantics preserved: empty-string input still means 'leave existing value alone', and once-filled curator can't be cleared (those test cases pass unchanged). New test seeds a legacy row directly via the repository, then exercises url-only PATCH (rejected), partial-fill PATCH (rejected), and full-fill PATCH (succeeds); a follow-up no-curator PATCH on the now-formed row also passes. * Drop unused curated-marketplace helpers (PR #234 review) * build_db_payload — imported by src/marketplace.py but never called. The strict-drop semantics it would have implemented were re-written inline in _refresh_plugin_cache (see the comment block there). The standalone helper still carried the old fall-back-to-original-external- URL-on-mirror-failure behaviour, which contradicts the documented drop-when-can't-deliver contract — a future contributor who re-wired it would have introduced a silent regression. Delete with the helper + the import + the comment that referenced it. * _resolve_marketplace_name — one-line shim with no remaining call sites. Callers use _resolve_marketplace_meta which returns name + curator together, avoiding the double DB hit the shim exists to hide. * '# noqa: F401 Optional kept for forward-compat' was wrong — Optional IS used in src/marketplace.py (line 70 and line 238). Drop the noqa comment so a future ruff run doesn't try to remove a real import. Removing build_db_payload also drops the only remaining use of Optional in src/marketplace_metadata.py, so the import comes out there too. 
* Cap agnes-metadata.json size + catch RecursionError on parse The reader is invoked once per marketplace per sync and the file is curator-controlled. Two failure modes were unguarded: * Multi-GB JSON: path.read_text() pulled the whole file into memory before json.loads even ran. A curator with commit access to an upstream repo could OOM the sync worker. * Deeply-nested JSON under any size cap: cpython's recursive object / array parser raises RecursionError at ~1000 levels of depth. RecursionError is a RuntimeError, not ValueError, so the existing catch let it propagate up and abort the entire sync — every other marketplace in the same pass got skipped. Add AGNES_METADATA_MAX_BYTES = 1 MiB (a real metadata file with covers, docs, categories for ~50 plugins fits in <100 KB so the cap is generous) and gate the size check on path.stat().st_size before the body read. Broaden the parse except to (ValueError, RecursionError) with a unified log line. Both failure modes degrade to the same empty-dict fall-back the malformed-JSON path already used, so one bad upstream never aborts the rest of the sync. Tests cover the size cap firing before json.loads (whitespace-padded valid JSON exceeding the cap) and the recursion path (5000 nested arrays — past cpython's default recursion limit but well under the size cap). * Persist asset-mirror manifest per body write, before unlink sync_assets wrote each body atomically (tmp + rename) but persisted the manifest only at the end of the batch. A kill -9 mid-Phase 2 left on-disk files the manifest never referenced. Once a curator dropped that URL from agnes-metadata.json, Phase 3's cleanup had no record of the file and the orphan stayed forever — there's no GC pass walking the cache dir today, so disk would slowly bloat. Phase 2 (body-write iteration): after the in-memory manifest mutation, persist BEFORE unlinking the previous body. The crash window narrows from 'all of Phase 2' to 'between persist and unlink' (microseconds). 
A persist failure mid-batch keeps the previous body on disk — the on- disk manifest still references it, and a stale-but-existing file beats a 404. Cost: one extra tmp+rename per body write; manifest is a few KB so the overhead is negligible vs. the HTTP fetches. Phase 3 (curator-removed URLs): same discipline. Collect the to-delete relpaths, persist the manifest with the entries already gone, THEN unlink. A crash mid-cleanup leaves at most a microsecond window where files exist despite the manifest no longer naming them. The next sync reads the (correct) manifest and the orphan stays orphaned, but the served state is consistent. Tests cover per-body persist call count, the post-update on-disk manifest content, and Phase 3 ordering verified by reading the on-disk manifest from inside Path.unlink. * Consolidate marketplace video embeds + format-guide CSS The YouTube nocookie / Vimeo / <video> / link-fallback detection logic was duplicated verbatim in marketplace_plugin_detail.html and marketplace_item_detail.html (~40 JS lines each, with subtly-different inline styles). Both templates now {% include %} a single _marketplace_video_embed.html partial inside their IIFE so the regex, the nocookie attribute set, and the unknown-host link fallback live in ONE place — future tweaks (new host, new attribute, fixed sandbox flag) no longer need to be applied twice in lockstep. The .video-wrap selectors (one inline <style> rule in plugin_detail, one inline style='...' attribute in item_detail) are replaced by the existing .video-embed 16:9 wrapper in style-custom.css, with new .video-embed video / .video-embed a child rules added so the wrapper handles all four embed shapes uniformly without per-template positioning. The 60-line inline <style> block in marketplace_format_guide.html moves verbatim to style-custom.css under a new 'Marketplace format guide page' section, scoped to .format-guide so other pages aren't affected. 
No user-visible behaviour change: the rendered HTML for valid YouTube / Vimeo / mp4 / external links is byte-identical to before, and the format-guide page renders the same. * Maintainability cleanup batch (PR #234 review) #10: drop _path_under from app/api/marketplace.py — it was a byte- equivalent clone of _safe_join (same Path.resolve(strict=True) + relative_to() containment check). The three v32 endpoint handlers (/asset, /doc, /mirrored) now share the existing helper. #14: rename src/marketplace_assets.py → src/marketplace_asset_validation.py so the file's purpose is obvious from the name and the previous overlap with src/marketplace_asset_mirror.py is gone. Six call-site imports updated in lockstep; CHANGELOG references under [Unreleased] updated to track the new path. #11: consolidate the URL builders that resolve /api/marketplace/curated/<slug>/<plugin>/{asset,doc,mirrored}/... paths. _internal_asset_url / _internal_doc_url / _mirrored_asset_url lived in src/marketplace.py, while a copy named _mirrored_url lived in app/api/marketplace.py with a 'must stay aligned' comment. New module src/marketplace_urls.py is the single source of truth — both call sites import from it and a future URL-format tweak only needs to change one file. The _ROUTE_PREFIX constant collapses the per- function f-string repetition. The route-handler endpoints themselves still own the path string literals (keeping the builders identical to the route declarations remains a checklist item, not a runtime guarantee). * Re-key asset-mirror manifest by (plugin, url) + dedup HTTP fetches The manifest used to be keyed by URL alone, so two plugins in the same marketplace referencing the same external image (a shared CDN icon, a common cover) collided on entry.plugin_name — last writer won. 
The DB row for the losing plugin then stored a served URL pointing under the winning plugin's tree, and require_resource_access denied legitimate access on one side and let the other plugin's user reach the wrong asset. In-memory: Dict[Tuple[str, str], MirrorEntry] keyed (plugin_name, url). On disk: format flips from {url: entry} dict to [entry, ...] list of self-describing entries (each carries plugin_name + url + the previous fields). JSON keys can't be tuples; encoding 'plugin::url' would just shift the parsing burden. Phase 1 of sync_assets deduplicates fetches by URL — three plugins sharing one URL share one HTTP request. The conditional-GET prior is picked from any owning plugin's prior entry; if their etags diverge (rare) we miss one 304 and pay for a full re-download instead. Phase 2 still creates a per-(plugin, url) manifest entry pointing under the plugin's own subdir, and Phase 3 cleanup is keyed the same way so dropping a URL from one plugin's metadata doesn't disturb another plugin still referencing it. Body files stay per plugin (RBAC-clean isolation: deleting plugin A's cache can't strand plugin B). Bandwidth saved by fetch dedup. Consumer code re-keyed: src.marketplace._refresh_plugin_cache rebuilt served_url_for / mirror_status as composite-keyed maps; app.api.marketplace._resolve_external_via_mirror / _curated_inner_cover / _curated_inner_enrichment look up by (plugin_name, url). Tests cover per-plugin manifest entries with shared URL, the single HTTP fetch for N plugins, and Phase 3 drop-one-keep-other. All existing tests migrated to composite key access; v2 list format assertions verify on-disk shape. * Migrate asset mirror from urllib.request to httpx The asset mirror was the only HTTP call site in Agnes still using urllib.request; every other module (CLI, Jira / OpenMetadata / OpenAI connectors, scheduler, Telegram bot) already used httpx. 
The asset mirror was added in this PR's base commit, so this is the only chance to bring it into convention before someone copies it as 'the pattern for HTTP fetches in Agnes'. Three concrete benefits beyond consistency: * SSRF defence collapses from five urllib classes (_PinnedHTTPConnection, _PinnedHTTPSConnection, _PinnedHTTPHandler, _PinnedHTTPSHandler, _SafeRedirectHandler) into one _SSRFGuardTransport. httpx invokes handle_request() on every redirect hop, so re-validation is free — we don't need a custom redirect handler at all. * DNS-rebinding defence: the transport rewrites request.url.host to the SSRF-validated IP before delegating to super().handle_request(). httpcore connects to whatever URL.host says, so this pins the connection without subclassing HTTPSConnection. The original hostname goes into the Host header + the sni_hostname extension so TLS / vhost routing still bind to the curator-supplied hostname. * Error handling: one httpx.HTTPError catch-all for transport errors, plus specific httpx.TimeoutException / httpx.TooManyRedirects branches for clearer diagnostics. Matches the _translate_transport_error shape in cli/client.py. The shared httpx.Client is built lazily at module load (same pattern as cli/client.py:_get_shared_client) with follow_redirects=True, max_redirects=5, timeout=HTTP_TIMEOUT_SEC, and our custom transport. Externally observable behaviour is unchanged: same FetchOutcome statuses, same manifest format, same conditional GET semantics, same body-size cap. Tests migrated from urllib-shaped fakes to httpx-shaped (status_code, iter_bytes, context manager). Five urllib-specific tests replaced with httpx equivalents — three transport unit tests + one DNS-rebinding integration test that verifies host rewrite via monkey-patched super().handle_request. One test deleted without replacement (unwrap-URLError-wrapping-an-_UnsafeRedirectError — urllib-specific, not applicable to httpx). 
* Surface curated agnes-metadata enrichment on My Stack tab GET /api/marketplace/items?tab=my built each curated row from the on-disk marketplace.json by way of resolve_allowed_plugins, which doesn't carry the agnes-metadata enrichment columns (cover_photo_url, video_url, category override, doc_links). The handler then hard-coded cover_photo_url=None on the synthetic row. Result: once a user clicked '+ Add to my stack' on a curated card, the same plugin in tab=my rendered with the gradient placeholder instead of its cover photo — confusing parity break vs. the curated tab where the same row goes through MarketplacePluginsRepository and gets the enriched columns. Pre-load the enriched marketplace_plugins rows for every marketplace the user is subscribed to, then look each granted+subscribed plugin up by (marketplace_id, plugin_name). Fall back to the on-disk synthetic shape only when the DB row is missing — happens during the rare race where RBAC is granted before the first sync cycle ingests the plugin. RBAC gating (granted set from resolve_allowed_plugins) is unchanged so this fix can't widen visibility; it just upgrades the data shape behind cards the user was already going to see. Per-marketplace list_for_marketplace beats N gets — typical user is subscribed to <5 marketplaces, so this is at most a handful of queries vs. one per subscribed plugin. Regression test seeds a plugin with cover_photo_url + category override, subscribes the user, hits /api/marketplace/items?tab=my, and asserts photo_url + category come through. The misleading 'fall through to gradient until the user re-visits the curated tab' comment is gone. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>	2026-05-09 17:01:37 +02:00
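The manifest re-keying described in the asset-mirror commit above can be sketched as follows. This is a minimal illustration, not the repository's code: the `Entry` type is a hypothetical, simplified stand-in for `MirrorEntry`, and the helper names `dump_manifest`, `load_manifest`, and `urls_to_fetch` are invented for this example. It shows why the on-disk form is a list of self-describing entries (JSON object keys cannot be tuples) and how Phase 1 can collapse N plugins sharing one URL into a single fetch.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical, simplified stand-in for the real MirrorEntry fields.
Entry = Dict[str, str]
Manifest = Dict[Tuple[str, str], Entry]  # keyed by (plugin_name, url)


def dump_manifest(manifest: Manifest) -> str:
    """Serialize as a list of self-describing entries, one per (plugin, url)."""
    rows = [
        {"plugin_name": plugin, "url": url, **fields}
        for (plugin, url), fields in sorted(manifest.items())
    ]
    return json.dumps(rows)


def load_manifest(text: str) -> Manifest:
    """Rebuild the composite-keyed in-memory map from the list form."""
    manifest: Manifest = {}
    for row in json.loads(text):
        fields = {k: v for k, v in row.items() if k not in ("plugin_name", "url")}
        manifest[(row["plugin_name"], row["url"])] = fields
    return manifest


def urls_to_fetch(wanted: List[Tuple[str, str]]) -> List[str]:
    """Deduplicate fetches by URL, preserving first-seen order."""
    return list(dict.fromkeys(url for _plugin, url in wanted))
```

Two plugins that reference the same URL keep separate manifest entries after a round trip, while `urls_to_fetch` yields that URL only once.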
minasarustamyan	a6b647dda9	Make v34 to v35 store_entities migration idempotent (#236) The original list-form _V34_TO_V35_MIGRATIONS ran four ALTER statements in sequence: ADD _vis_v35 → UPDATE _vis_v35 = visibility_status → DROP visibility_status → RENAME _vis_v35 TO visibility_status. If the RENAME failed for any reason after the DROP succeeded — DuckDB lock contention at startup, scheduler-vs-app race opening system.duckdb, container kill mid-migration, etc. — the DB was stranded with _vis_v35 populated and visibility_status missing. The schema_version row never bumped because the UPDATE at the bottom of the migration ladder runs only when every step succeeded. Subsequent restarts then hit DROP visibility_status again with no IF EXISTS guard and looped on the same error; the only recovery was hand-editing the DB. Replace the list with a Python function _v34_to_v35_migrate that inspects the table's columns up front and dispatches into one of three paths: * clean v34 (visibility_status present, _vis_v35 absent) — run the full rebuild * partial v35 (_vis_v35 present, visibility_status absent) — finish the RENAME alone, data is already in _vis_v35 from the prior UPDATE * both columns present (rare; aborted before DROP) — drop the temp and keep visibility_status The audit columns (archived_at, archived_by) ship first behind IF NOT EXISTS so they're safe in all states. Operators stranded by the original bug now recover automatically on next startup. Tests cover the three direct paths plus an end-to-end scenario where _ensure_schema walks a schema_version=32 DB with the half-applied state up through to v36. Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>	2026-05-09 16:38:29 +02:00
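The three-path dispatch in the migration commit above can be sketched as a pure planning function. This is an illustration under stated assumptions, not the actual `_v34_to_v35_migrate`: the function name `plan_v34_to_v35` is hypothetical, the returned strings are the step labels from the commit message rather than literal DuckDB SQL, and raising when neither column exists is this sketch's own choice (the commit does not describe that state).

```python
from typing import Iterable, List


def plan_v34_to_v35(columns: Iterable[str]) -> List[str]:
    """Pick the migration steps from the columns currently on the table."""
    cols = set(columns)
    has_old = "visibility_status" in cols
    has_tmp = "_vis_v35" in cols

    if has_old and not has_tmp:
        # Clean v34: run the full rebuild.
        return [
            "ADD _vis_v35",
            "UPDATE _vis_v35 = visibility_status",
            "DROP visibility_status",
            "RENAME _vis_v35 TO visibility_status",
        ]
    if has_tmp and not has_old:
        # Partial v35: data already sits in _vis_v35, only the RENAME is left.
        return ["RENAME _vis_v35 TO visibility_status"]
    if has_old and has_tmp:
        # Aborted before the DROP: discard the temp column, keep the original.
        return ["DROP _vis_v35"]
    # Assumption of this sketch: neither column present is treated as an error.
    raise ValueError("store_entities has neither visibility_status nor _vis_v35")
```

Because the plan depends only on the observed columns, re-running it after a crash at any step picks up from the state the table is actually in.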
Vojtech	d6ad08f107	Flea-market upload guardrails + soft delete + JOIN-based admin queue (#233 ) * feat(store): flea-market upload guardrails + soft delete + JOIN-based admin queue Adds an end-to-end guardrails pipeline for store uploads (manifest + static-security + LLM review), persists blocked bundles for forensics, introduces soft-delete (Archive) semantics, consolidates the legacy /store/{id} surface into /marketplace/flea/{id}, and reworks the admin queue so lifecycle filters read live entity visibility via LEFT JOIN rather than a denormalized submission column. Schema v29 → v35: * v29 store_submissions table + store_entities.visibility_status * v30 file_size, bundle_sha256, bundle_purged_at on submissions * v31 reshape store_submissions (drop legacy unique on entity_id) * v32 store_entities.archived_at/by + 'archived' visibility value * v33 drop store_submissions.retry_count (unused) * v34 ensure idx_store_submissions_entity exists post column-drop * v35 broaden visibility_status enum + JOIN architecture cutover Pipeline (src/store_guardrails/): * Inline checks: manifest_check, static_scan, quality_check * LLM review configurable haiku\|sonnet\|opus (default haiku) * BackgroundTasks-driven async path with structured-output JSON * Per-submitter daily quota (default 50) * 30-day TTL purge job (POST /api/admin/run-blocked-purge) * Bundle SHA256 + size persisted; sha256 survives purge for forensics Visibility model: * pending \| approved \| hidden \| archived * _enforce_visibility returns 404 (no leak) for non-owner non-admin * Owner sees own non-approved entries via include_owner_id widening * Install refused with 409 entity_not_approved when not approved Soft-delete (DELETE /api/store/entities/{id}): * Default = soft (visibility_status='archived'); existing installs keep getting served the bundle so users don't lose the plugin * ?hard=true admin-only: drops bundle + cascades user_store_installs * Hard-delete preserves entity_id on submission as tombstone so 
audit_log linkage survives for the activity timeline Admin queue lifecycle (the JOIN refactor): * Verdict (store_submissions.status) is immutable forensic record * Lifecycle (store_entities.visibility_status) is live state * /admin/store/submissions Archived chip translates to `e.visibility_status='archived'` via LEFT JOIN — any path that flips visibility surfaces in the queue immediately * Detail page renders Status (verdict) and Entity lifecycle side by side so admins see "approved at review, now archived" at a glance URL consolidation: * /store/{id} deleted (no redirect, stale bookmarks 404) * /marketplace/flea/{id} is the canonical detail surface * Three in-tree callers (upload-success, my-stack card, store listing card) updated to point at the new URL * Quarantine banner extracted to _quarantine_banner.html partial, self-guarded, included from both flea detail templates * Banner JS auto-refreshes when the verdict lands by polling /api/marketplace/flea/{id}/detail (visibility_status + submission_status — the latter is needed because blocked_llm keeps the entity at visibility_status='pending') Audit log resource format: * runner.py emits prefixed `store_submission:{id}` (post-fix) * Detail-page timeline query handles three patterns: prefixed submission, helper-emitted `store_entity:{sub_id}`, and bare-id legacy rows — all surface in the activity timeline UX fixes: * Owner sees Under review / Quarantined / Hidden banner with status * Install button gray-disabled (not blue) when non-approved * Owner cannot delete quarantined entries (403); admin can * Admin queue: filter chips, sortable columns, paging, page-size * Auto-refresh queue every 5s while pending rows are visible * Store upload page file picker no longer opens twice (label → input default action collided with explicit JS handler) Tests: 168 passed across the guardrails suites (admin submissions, store API, inline / LLM / purge guardrails, store repositories, marketplace filter, schema version). 
New regression coverage includes: archive surfaces via JOIN even when API path is bypassed; deleted submission renders activity timeline (tombstone); flea detail surfaces submission_status only for owner/admin; detail page renders Entity lifecycle row; audit log resource format covers both helper and runner paths. * fix(store-guardrails): PR #233 follow-up — prompt injection, atomic PUT, BG race, schema, reaper, sort whitelist Addresses 9 of the 23 findings from the PR #233 review (spec at docs/superpowers/specs/2026-05-09-pr233-guardrails-fixes-spec.md). Merge-gate items #1-#6 plus high-value mediums #7, #9-#12, #23. Architectural items (#8 enum split, #14 factory) and pure maintainability (#15-#22) deferred to follow-ups. Security: * #1 prompt injection — SYSTEM_PROMPT now passed via the SDK's dedicated system= parameter; bundle wrapped in <bundle>...</bundle> sentinels declared data-only by the system prompt; literal sentinel strings in user content are escaped so an adversarial README can't forge a close tag. * #6 static scan honesty — module docstring + admin copy + docs declare static scan as signal not gate; .md/.txt/.rst/.html/.json/ .yaml/.yml/.toml skipped to avoid false positives on prose. AST mode for Python deferred (separate flag, FP comparison work). Correctness: * #2 PUT atomicity — bundles bake into plugin.staging-<rand>/ alongside live, atomic-rename on success; failed checks leave live tree byte-for-byte intact. * #3 BG-task race — set_visibility_if_pending guards verdict flips to the (pending, hidden) review window; admin archives during review survive; skipped flips audit-logged. * #4 v35 NOT NULL/DEFAULT — schema v35→v36 re-applies them on store_entities.visibility_status. CHECK constraint enforced application-side (DuckDB ADD CHECK on existing column unsupported). * #7 stuck-review reaper — reap_stuck_llm_reviews flips pending_llm rows older than guardrails.stuck_review_grace_seconds (default 1800) to review_error. 
Scheduler runs every 15 min via new /api/admin/run-reap-stuck-reviews. Set knob to 0 to disable. * #9 quota counter — count_blocked_for_submitter_since now counts blocked_inline + blocked_llm + review_error so a submitter triggering only LLM-blocked verdicts is bounded. * #10 missing risk_level — surfaces as review_error with error='missing_risk_level' instead of silently defaulting to 'medium' (which looked like a model-decided block). * #11 archived_at clear — set_visibility nulls archived_at + archived_by when transitioning out of 'archived' so a future read doesn't show stale archive forensics on an approved row. Maintainability: * #12 FSM doc comment — accurate insert/transition/lifecycle description in src/db.py near store_submissions schema. * #23 sort-key whitelist — admin queue rejects unknown sort keys with 400 invalid_sort_key; substring-replace footgun removed. Deferred (separate PRs): * #5 quota race — proper fix requires asyncio.Lock spanning the full pipeline; threading.Lock blocks event loop, DuckDB MVCC doesn't help. API-level slowapi bounds worst case for now. * #6 part 3 (AST static scan), #8 (enum split), #13 (import bundle docs), #14 (factory consolidation), #15-#22 (maint). Tests: * New: tests/test_store_guardrails_prompt_injection.py (corpus + trust-boundary invariants), tests/test_store_put_atomic.py, tests/test_store_guardrails_reaper.py. * Extended: test_store_guardrails_llm.py (system param, missing risk_level, BG race), test_admin_store_submissions.py (quota counter widening, sort whitelist 400), test_store_repositories.py (un-archive metadata clear), test_db_schema_version.py (v36). * Full suite: 3738 passed; 17 pre-existing baseline failures unchanged (db migration tests, cli binary rename, catalog export, user mgmt v5 backfill — confirmed by stash + rerun on clean tree).	2026-05-09 17:32:53 +04:00
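The sentinel handling from finding #1 in the commit above (bundle wrapped in `<bundle>...</bundle>`, literal sentinel strings in user content escaped) might look like the sketch below. The commit does not say how the escaping is done, so the HTML-entity replacement here is an assumption, as is the helper name `wrap_bundle`. Case variants such as `</BUNDLE>` are not handled in this sketch.

```python
OPEN, CLOSE = "<bundle>", "</bundle>"


def wrap_bundle(untrusted: str) -> str:
    """Wrap untrusted bundle text in sentinels after neutralizing forged ones.

    Assumption: entity-escaping is used here for illustration; the real
    pipeline may escape differently.
    """
    escaped = (
        untrusted
        .replace(OPEN, "&lt;bundle&gt;")
        .replace(CLOSE, "&lt;/bundle&gt;")
    )
    return f"{OPEN}\n{escaped}\n{CLOSE}"
```

An adversarial README containing a literal close tag then yields exactly one real close sentinel, at the very end of the wrapped text.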
minasarustamyan	e26236fdc1	Extract session-pipeline framework + UsageProcessor skeleton (#232 ) * Extract session pipeline framework, refactor verification, add UsageProcessor skeleton Pluggable framework under services/session_pipeline/ (contract + lib + per-processor runner) so multiple processors can read /data/user_sessions/<key>/.jsonl on their own cadence with full failure isolation. Verification flow becomes the first plugin; a no-op UsageProcessor reserves the second slot pending a separate brainstorm on extraction logic + storage shape. Schema v28→v29: rename session_extraction_state → session_processor_state with composite PK (processor_name, session_file). Existing rows copied over with processor_name='verification'; legacy table dropped. Migration is idempotent and no-ops the copy step on fresh installs that came up at the new schema. Endpoint: /api/admin/run-verification-detector replaced by parametrized /api/admin/run-session-processor?processor=<name>. Audit action format follows. Scheduler JOBS: verification-detector entry split into session-processor:verification + session-processor:usage. SCHEDULER_VERIFICATION_DETECTOR_INTERVAL retained for operator compatibility (drives both cadence and health-check grace window); SCHEDULER_USAGE_PROCESSOR_INTERVAL added. Address PR #232 review: scan dead branch + per-processor lock - `SessionProcessorStateRepository.scan_unprocessed_for` dead else: both branches surfaced every jsonl, the SELECT was unused, runner MD5-rehashed every stable session per tick. Replaced with an mtime precheck — stable sessions (mtime <= processed_at) are filtered at scan; modified files still surface for the runner's authoritative `file_hash` invalidation. Naive-local comparison matches the existing health-check idiom (DuckDB TIMESTAMP strips tz on storage). - Per-processor advisory lock around `_run_processor` in `/api/admin/run-session-processor`. 
Scheduler tick + manual admin POST could otherwise both run, both call create_evidence on overlapping detections, and accumulate duplicate verification_evidence rows (the dedup short-circuit only covers create+contradiction, not evidence per ADR Decision 3). Non-blocking acquire → 409 Conflict on concurrent invocation; release in finally so a runner exception doesn't wedge the processor. Tests: two new scan unit tests (mtime filter + post-mark mtime bump), 409 endpoint test, lock-released-on-exception test. Two existing tests updated for the new "filtered at scan" stat shape (previously asserted skipped == 1, now scanned == 0). * Address PR #232 review #2: parallel scheduler tick + last_run on terminal state Two pre-existing scaffold bugs in services/scheduler/__main__.py amplified by adding more session-pipeline jobs: 1. Serial for-loop over jobs with synchronous httpx.post(timeout=900) — a 10-minute verification run blocked every other job (data-refresh, health-check, usage, corporate-memory) for the whole window. The PR's stated isolation guarantee held inside the runner but broke at the scheduler dispatch layer. 2. last_run advanced only when _call_api returned True. Permanent-failure jobs hot-looped on every tick (30s) instead of cadence (15min). Fix: ThreadPoolExecutor.submit per due job + per-job in_flight set so a long-running job can't be re-launched on subsequent ticks. last_run advances unconditionally in finally; errors still surface via _call_api logging + audit_log on the receiving side. _run_job extracted to module-level for unit testing. New tests: - TestRunJobBookkeeping: advances on success / failure / unhandled raise - TestRunLoopParallelism: in_flight protection prevents duplicate launches across ticks for a single slow job --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>	2026-05-08 19:47:46 +02:00
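The scheduler fix described above (one `ThreadPoolExecutor.submit` per due job, an in-flight set, and `last_run` advanced in `finally`) can be sketched as follows. The names `run_job` and `tick` and their signatures are hypothetical; the real `_run_job` in services/scheduler/__main__.py may differ, and a production version would likely guard the shared dict and set with a lock rather than rely on the simple operations used here.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Set


def run_job(name: str, call: Callable[[], object],
            last_run: Dict[str, float], in_flight: Set[str],
            now: Callable[[], float] = time.monotonic) -> None:
    """Run one job; bookkeeping happens whether it succeeds, fails, or raises."""
    try:
        call()
    finally:
        last_run[name] = now()    # advance unconditionally so failures keep cadence
        in_flight.discard(name)   # allow the job to be scheduled again


def tick(jobs: Dict[str, Callable[[], object]], intervals: Dict[str, float],
         last_run: Dict[str, float], in_flight: Set[str],
         pool: ThreadPoolExecutor,
         now: Callable[[], float] = time.monotonic) -> List[Future]:
    """Submit every due job that is not already running."""
    launched: List[Future] = []
    for name, call in jobs.items():
        due = now() - last_run.get(name, float("-inf")) >= intervals[name]
        if due and name not in in_flight:
            in_flight.add(name)   # mark before submit so the next tick skips it
            launched.append(pool.submit(run_job, name, call, last_run, in_flight, now))
    return launched
```

With this shape, a slow job occupies only its own worker thread, and a permanently failing job waits for its interval instead of being retried on every tick.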
Vojtech	2e2e1a1eca	feat(home): state-aware /home + /setup-advanced + schema v26 (#228 ) * feat(home+news): state-aware /home + /news + admin-edited news section Squash of the vr/home-page feature work for clean rebase onto main. Original 18-commit history preserved in branch backup/vr-home-page-pre-rebase. What's in this PR: State-aware /home page - New `/home` route with hero + auto-mode + connectors (Asana / GWS / Atlassian) + lookarounds. Onboarded vs not-onboarded state-machine branches a single template (`home_not_onboarded.html`); the install steps, "Setup a new Claude Code" CTA (90-day PAT mint), and per- connector setup prompts hide once `users.onboarded=TRUE`. A completion badge replaces them. - "Mark me as offboarded" button reverses the flag without an SQL UPDATE. - `users.onboarded BOOLEAN` column added; default FALSE; flipped by the CLI's `agnes init` post-success POST and the `/admin/users` API. - Connector setup prompts pre-check whether the tool is already installed/connected before re-running setup. - GWS scope set widened to include Google Chat (`chat.spaces`, `chat.messages`). Single template + design tokens - `dashboard.html` now extends `base.html` via the new `{% block layout %}` opt-out (full-width pages skip the 800px `.container`). Net: every page shares one shell. - `style-custom.css` `:root` extended with `--space-{7,9,10,12}`, `--radius-2xl`, `--shadow-{card,elevated}`, `--text-{muted,disabled}`, `--focus-ring`, `--transition-`, `--width-{narrow,app,wide}` so inline page styles can migrate incrementally. Auth redirects honor AGNES_HOME_ROUTE* - `safe_next_path` resolves the configured home route when no `default=` is passed; OAuth callbacks, magic-link clicks, password form, and LOCAL_DEV_MODE shortcuts now land on `/home` (or whatever the operator picked) instead of always /dashboard. News section + /news permalink + /admin/news editor - Schema-bumped `news_template` table (single versioned entity, draft + publish gate). 
`published BOOLEAN` distinguishes draft from public; monotonically-increasing `version` per save; rows >30d pruned on save except the currently-displayed published version. - `/home` bottom-of-page renders the latest published intro with a "Read more →" link to `/news` (which renders the full body). - `/admin/news` editor with sandboxed live preview, versions table, per-row Unpublish, Format-help cheatsheet. - `agnes admin news show / draft / edit / publish / unpublish / versions / export` (CLI). Talks to the live server via the `/api/admin/news/` endpoints (PAT-authed) — no direct DB access so it coexists with a running uvicorn. - Optimistic-lock guard: `agnes admin news publish --version N` and PUT/PATCH endpoints accept `expected_version` and 409 with structured `{error: "version_conflict", expected, actual, actual_by}` when a concurrent admin replaced the draft. Edit refuses to overwrite a draft authored by someone else without `--force` or `--expect-version`. - nh3 (Rust-backed ammonia) HTML sanitizer; iframe pre-pass strips any iframe whose src is not on the YouTube/Vimeo/Loom allowlist; javascript:/data: schemes blocked everywhere. - Author CSS vocabulary: `.news-hero` (blue gradient hero block), `.callout`/`.callout-{info,warn,success,danger}`, `.video-embed`, `.news-section`, `.news-grid-{2,3}`, `.news-cta` — all consolidated in `style-custom.css` under "News content vocabulary (shared)" so /home perex, /news body, and /admin/news preview share one source of styling. - Code-inside-`<pre>` contrast fix (was unreadable amber-on-silver). - `.news-content` table styling (border, header band, row-hover). `scripts/dev/run-local.sh`* — local uvicorn launcher. Pulls Google OAuth client id/secret from GCP Secret Manager (`AGNES_OAUTH_GCP_PROJECT`-driven, no vendor defaults), points `AGNES_CLI_DIST_DIR` at `./dist` so the wheel endpoint resolves, and `--dev` flips `LOCAL_DEV_MODE=1` + `AGNES_HOME_ROUTE=/home` for one- command iteration. 
`LOCAL_DEV_MODE=1` also enables the FastAPI debug toolbar. CLAUDE.md "Run tests before every push" section codifies `pytest tests/ -n auto -q` as non-negotiable before each push. Tests: 51 + 14 + 8 = 73 new tests across news-template repo, sanitizer, API, web, CLI; plus updated home/auth/template tests for the new shared-shell architecture. Origin docs (gitignored, customer-fork content): docs/brainstorms/home-page-requirements.md, docs/plans/2026-05-07-001-feat-home-page-plan.md. * feat(cli): agnes onboarded {on,off,status} — self-scoped flag toggle User-facing equivalent of the in-page "Mark me as (off)boarded" button on /home. POSTs /api/me/onboarded with {onboarded, source}; --source overrides the audit-log marker so flips made from the CLI vs the web button vs agnes init automation stay distinguishable. `status` reads via /api/me/profile (when present); falls back to a quick body-marker scan of /home so the read path doesn't write an audit_log row. PAT-authed via cli.client.api_post — same convention as agnes admin news / agnes admin add-user etc. Tests: 5 covering on/off/status round-trip, idempotency, and audit-log source recording. Full suite holds at 12 pre-existing failures (same set as before). * ui(nav+home): primary nav reorg + green What's new band + /marketplace link fix Primary nav (post-rebase audit + per-user feedback): - Items: Home → Marketplace → Data Packages → Memory. Admin dropdown for admins only. The "Dashboard" label was renamed Home — point still resolves through `home_route` so customer instances on /dashboard still land there. - Activity Center moved into the Admin dropdown. Per-team adoption analytics is admin-consumed in practice; the route still allows any authed user for direct deep-links so existing /home tile + bookmarks keep working. - Memory link added (→ /corporate-memory) — was previously buried in the /home "Look around" tiles. - Setup local agent + My Stack dropped from main nav. 
Setup is the /home install flow's home now; My Stack lives as a tab inside /marketplace. /home tweaks: - Plugin marketplace tile now points at /marketplace (was /store — legacy from before the marketplace rebrand landed in #230). - "What's new" section header gets a green band (success-flavored D1FAE5 background, A7F3D0 border, darker green title) so the bottom-of-page news block visibly distinguishes from the blue install-hero at the top. Header strip only — body stays white. Test fix: test_home_route_resolution renamed `dashboard_link_uses_home_route` → `home_link_uses_home_route` and asserts `href="/home">Home` instead of `href="/home">Dashboard` after the label change. * fix(home): decouple Step 3 + Connect-tools collapse from server onboarded flag The server-side `users.onboarded` flip happens through two paths: 1. Explicit user click on "Mark me as onboarded" or `agnes onboarded on`. 2. Implicit `agnes init` POST → /api/me/onboarded on success. Path 2 produced a UX surprise: an analyst running `agnes init` mid-flow reloaded /home and saw Step 3 (auto-mode) + Connect-your-tools auto- collapse to summary bars. They were actively working through those sections — the install POST never signalled "I'm done with the rest of setup", just "Agnes itself is installed". Decouple the section-collapse decision from the server flag: - Step 1 + Step 2 install blocks: still hidden on `onboarded=TRUE` (their completion is a hard server signal — Agnes IS installed). - Step 3 + Connect-your-tools: render flat by default in BOTH states. Wrapped in `<details class="setup-collapsible" open>` so the browser's native disclosure handles per-section toggle without JS, but the `<summary>` is CSS-hidden until the page-level `data-setup-minimized="1"` attribute is set on `.home-mock`. - New "Minimize setup view" toggle inside the blue install-hero, rendered only when onboarded. Click flips the data-attr on `.home-mock` AND removes the `open` attribute from each `<details>`. 
State persists in `localStorage["agnes_home_setup_minimized"]` so the choice survives reloads but is per-device. - "Show full setup view" (the same button when minimized) re-opens both `<details>` and clears localStorage. When minimized, each `<details>` still has its own native expand/ collapse — click the gray summary bar to peek at one section without toggling the page-level minimize off. Tests: - test_step3_and_connectors_render_flat_when_onboarded_by_default — asserts `<details class="setup-collapsible" ... open>` for both sections post-onboarding and the absence of any server-rendered `data-setup-minimized` attribute on the `.home-mock` root. - test_minimize_toggle_visible_only_when_onboarded — toggle button rendered only when onboarded. Full pytest holds at 12 pre-existing failures (same set).	2026-05-08 18:28:47 +02:00
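The optimistic-lock guard described for the news endpoints in the commit above returns a structured `version_conflict` payload when `expected_version` no longer matches. A minimal sketch of that check follows; the function name `check_expected_version` is hypothetical, and treating a missing `expected_version` as "no check requested" is an assumption of this example, not something the commit states.

```python
from typing import Any, Dict, Optional


def check_expected_version(expected: Optional[int], actual: int,
                           actual_by: str) -> Optional[Dict[str, Any]]:
    """Return a 409-style conflict body, or None when the write may proceed."""
    if expected is None or expected == actual:
        # Assumption: callers that omit expected_version skip the guard.
        return None
    return {
        "error": "version_conflict",
        "expected": expected,
        "actual": actual,
        "actual_by": actual_by,
    }
```

A caller would map a non-`None` result to HTTP 409 and leave the stored draft untouched.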
Vojtech	107195730d	feat(observability): optional PostHog integration (#231 ) * feat(observability): optional PostHog integration (errors, LLM traces, replay, flags) Off by default. Activates when POSTHOG_API_KEY is set in env. Defaults to PostHog Cloud EU; override host for US Cloud or self-hosted. Coverage: - FastAPI 500 handler captures unhandled exceptions - src/orchestrator.py rebuild + rebuild_source failures - services/scheduler/ HTTP-job failures - cli/main.py uncaught CLI errors (Typer.Exit/SystemExit/KeyboardInterrupt skipped; flushes before re-raise so short-lived CLI invocations don't drop events) - connectors/llm/anthropic_provider.py + openai_compat.py emit $ai_generation events with provider, model, latency, token counts (prompt/completion bodies stay off unless POSTHOG_LLM_PAYLOADS=1 because LLM prompts here routinely include customer SQL/data) - Browser snippet injected into every text/html response by PosthogInjectionMiddleware — registered inside the GZip layer so it sees uncompressed HTML before compression. Many templates are standalone (their own DOCTYPE) and never extend base.html, so a per-template include would miss them. - Frontend: $pageview, $pageleave, JS error capture via window.error and unhandledrejection handlers, masked session replay (maskAllInputs: true plus CSS-selector mask for known data surfaces), feature flags (browser posthog.isFeatureEnabled + server-side feature_enabled with fallback for older SDKs). Identification mode operator-configurable: none / id / email / full. Default email ships user.id + email but never name. CLI entry point moves from cli.main:app to cli.main:main (Typer wrapper). 
Files: - src/observability/posthog_client.py — lazy singleton, no network when disabled, single-process flush on shutdown - src/observability/llm_tracing.py — trace_generation context manager - app/middleware/posthog_inject.py — HTML rewrite middleware - app/web/templates/_posthog.html — browser snippet template - docs/observability.md — operator guide - config/.env.template — documented POSTHOG_* knobs - tests/test_posthog_disabled.py + tests/test_posthog_client.py + tests/test_llm_tracing.py — 18 tests covering disabled state, identify-mode payloads, $ai_generation shape, error variant. CHANGELOG entry under [Unreleased] Added. * feat(observability): tag every PostHog event with environment + release Splits PostHog dashboards cleanly between localhost / dev / staging / production without manual tagging on every capture call. - POSTHOG_ENVIRONMENT explicit override; auto-resolves to "local" when LOCAL_DEV_MODE=1, else RELEASE_CHANNEL, else AGNES_DEPLOYMENT_ENV, else "unknown". - AGNES_VERSION → RELEASE_CHANNEL fallback feeds the `release` property for "is this error new in this release?" cohorting. - Backend gets both via the PostHog SDK's super_properties constructor arg (every captured event picks them up automatically). - Browser snippet calls posthog.register({environment, release}) inside the loaded callback so $pageview, $exception, autocapture, etc. all carry the same labels. - request.state.user now populated by auth dependencies so the snippet can actually call posthog.identify(user_id, {email}) for logged-in users (previously the user block always resolved to None because nothing wrote to request.state.user). 4 new tests cover env resolution: explicit > LOCAL_DEV_MODE > channel > unknown, plus super-properties forwarding into the SDK constructor. 
* feat(observability): inline user attrs on every PostHog event + debug throw route PostHog's UI shows person properties on the Person profile page, not inline on each event — so a reviewer triaging an exception couldn't tell which user hit the bug without clicking through. Fix it on both sides. - Backend capture_exception merges user_id / user_email / user_name into the event properties (gated by POSTHOG_IDENTIFY_PII: none/id/email/full). Backed by a new _user_props_for_event helper on PosthogClient. - Browser snippet registers user_id + user_email + user_name as super-properties via posthog.register({...}) so every $exception, $pageview, and custom event coming from posthog.captureException() carries them inline. Mirrors the backend so cross-referencing client/server events doesn't require a person-profile lookup. - /api/debug/throw — debug-only endpoint gated by DEBUG=1 (404 in prod). Runs Depends(get_current_user) first so request.state.user is set when the unhandled-exception handler captures the event. Lets operators exercise the full observability path end-to-end without hand-rolling a TestClient script. Configurable via ?kind=ValueError&msg=... 7 new tests cover: backend user-attr merge across identify modes, anonymous request fall-through, browser snippet super-prop emission for logged-in / anonymous / id-only / full-name cases. * fix(observability): address minasarustamyan PR #231 review Two bugs caught in review. 1. PosthogInjectionMiddleware dropped Response.background on every return path. BaseHTTPMiddleware materialises the body and asks subclasses to return a fresh Response — three paths in dispatch() omitted background=, silently cancelling any BackgroundTask / BackgroundTasks the route attached (audit logging, async webhooks, email sends) with no log line. Fix: route every return through a _passthrough() helper that forwards background. Also adds a _MAX_BUFFER_BYTES (4 MB) cap so a streamed-HTML response can't balloon RSS during buffering. 
Bigger bodies short-circuit through with a warning rather than being injected. Regression tests in tests/test_posthog_inject_middleware.py exercise four return paths (snippet present, render-fail, double-injection guard, non-HTML passthrough) plus the streaming-guard short-circuit. 2. $ai_input / $ai_output_choices were emitted without truncation, so POSTHOG_LLM_PAYLOADS=1 silently dropped events past PostHog's ~32 KB per-event ingest limit — exactly the calls (large prompts with schemas / sample rows / SQL) an operator would want to inspect. Fix: clip both at POSTHOG_LLM_PAYLOAD_MAX_CHARS (default 30000) with an explicit "…[truncated N chars]" marker so readers don't mistake truncated captures for complete ones. Metadata (provider, model, tokens, latency, error) flows regardless. Three new tests cover default-cap clipping, env-override, and pass-through under the cap. 37 PostHog tests pass.	2026-05-08 17:57:10 +04:00
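The payload clipping from the second review fix above can be sketched in a few lines. The helper name `clip_payload` is hypothetical, and reading "N" in the marker as the number of dropped characters is an assumption; the default of 30000 mirrors the stated `POSTHOG_LLM_PAYLOAD_MAX_CHARS` default.

```python
def clip_payload(text: str, max_chars: int = 30000) -> str:
    """Clip an LLM payload and append an explicit truncation marker."""
    if len(text) <= max_chars:
        return text  # under the cap: pass through unchanged
    dropped = len(text) - max_chars
    # Assumption: N reports how many characters were cut off.
    return f"{text[:max_chars]}…[truncated {dropped} chars]"
```

The marker makes a clipped capture distinguishable from a complete one when reading events later.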
minasarustamyan	4fb2818a19	Add /marketplace browse page + Model B opt-in stack composition (#230 ) * Add /marketplace browse page + Model B opt-in stack composition New /marketplace browse surface unifies the curated marketplaces (admin-managed git mirrors) and the community Flea Market behind three tabs — Curated / Flea / My Stack — with per-tab category filter, search across both sources with scope checkboxes, and numeric pagination, all driven by URL query state. Plugin detail at /marketplace/curated/<slug>/<plugin> and /marketplace/flea/<id>; nested skill / agent detail at /marketplace/curated/<slug>/<plugin>/ {skill,agent}/<name> and the flea-side single-page detail. Model B opt-in: an RBAC grant on a curated plugin is now only eligibility. The user must click "Add to my stack" for it to enter their served Claude Code marketplace. Composition flips from (rbac ∖ opt_outs) ∪ store_installs to (rbac ∩ subscriptions) ∪ store_installs. The legacy user_plugin_optouts table is renamed user_curated_subscriptions (schema v27) — same table shape, inverted semantic, repository methods become subscribe / unsubscribe / is_subscribed. UX vocabulary: Install → Add to my stack, Installed → In your stack, card "Installed" badge → "In stack" (amber pill), tab "My Subscriptions" → "My Stack". Bridges the two-step model (server-side bookmark vs. on-laptop install) the previous label hid. Click triggers an inline post-add hint panel under the description with the agnes refresh-marketplace recipe + Copy chip, dismissible per-browser via localStorage. Per-tab info blocks above the filter row: - Curated: trust signal — "Each plugin here has a named curator accountable for it." (blue accent + See-all-curators link) - Flea: open-shelf signal — "Anyone in the company can upload here." (purple accent + Tips-for-sharing link) - My Stack: personal-shelf orientation — "Your AI stack — everything you've added." 
(slate accent, no link) Tabs carry per-tab Heroicons (shield-check / building-storefront / rectangle-stack) tinted to match each tab's accent; flips white when the tab is active for contrast. Hero illustration anchored to the right of the blue hero panel (absolute, 47% wide, behind the search row content). Hidden under 900px viewport. Action-row CTAs realigned to publication intent: curated "How to add new content" → "Submit a plugin" (links to the guide page); flea button removed since +Upload sits next to it. Empty-state CTAs match. /marketplace/guide/{curated,flea} routes now host publication-flow guide pages with placeholder ledes — full copy to be authored separately. Categories: Heroicons-based icons mapped per category in src/category_icons.py (zero new dependencies; SVG path strings inlined). Marketplace cards, filter pills, and detail pages read from the same source. API endpoints under /api/marketplace: - GET /items per-tab listing (curated / flea / my) - GET /categories per-tab non-zero counts - GET /curated/{slug}/{plugin} plugin detail - POST/DELETE /curated/{slug}/{plugin}/install subscribe toggle - GET /curated/{slug}/{plugin}/{skill,agent}/{name} inner item The tab=my branch reads directly from user_curated_subscriptions ∪ user_store_installs (not resolve_user_marketplace, which bundles flea skills/agents into a single store-bundle synthetic entry useful for serving the Claude Code marketplace ZIP/git but wrong for browsing where each item should appear as its own card). Detail pages: plugin detail surfaces inner skills/agents as clickable nested cards; commands/hooks/MCPs render as plain name lists. Skill/agent detail mirrors the plugin layout with kind-tinted accents (skill = green, agent = purple), Description + Details sidebar, Files + Docs sections, and the "How to call it" copy-able invocation chip showing /<plugin>:<inner-name> exactly as Claude Code namespaces it post-install. 
Curated nested has no install button — links back to the parent plugin. Navbar: standalone "My AI Stack" relabelled "My Stack" and points at /marketplace?tab=my; "Store" link removed (Store flow is reachable via the Flea Market tab's +Upload button). The standalone /my-ai-stack and /store routes still work for old bookmarks. Tests cover the new browse / categories / install / RBAC paths under tests/test_marketplace_api.py; existing marketplace and store tests updated for Model B (explicit subscribe in fixtures). Schema bumped v26 → v27 with idempotent migration that wipes existing user_plugin_optouts rows on flip and adds marketplace_plugins.created_at with registered_at backfill. * Fix v28 migration + post-rebase test fallout v28 ALTER TABLE marketplace_plugins ADD COLUMN created_at conflicted with _SYSTEM_SCHEMA's earlier CREATE that already includes the column on fresh installs (test fixtures starting at any pre-v28 version trip on it). Switch to ADD COLUMN IF NOT EXISTS — same idiom as the upstream v27 Keboola sync-strategy migration on the same ladder. Two test patches needed after the rebase bumped SCHEMA_VERSION 27 → 28: - test_keboola_v27_migration.py: test_schema_version_constant_is_27 was pinning ==27. Loosened to >=27 (the test's purpose is to verify the v27 Keboola migration, not to pin the current SCHEMA_VERSION). - test_setup_page_unified.py: was monkeypatching resolve_allowed_plugins but compute_default_agent_prompt now reads from resolve_user_marketplace (Model B-aware). Stub the right function so the test exercises the v28 served-set path. * Harden curated skill/agent inner endpoints against path traversal `_read_inner`, the `skill_dir` walk in `curated_skill_detail`, and the `agent_path.stat` in `curated_agent_detail` joined URL path-params onto `plugin_root` without verifying the resolved candidate stayed inside it. 
Starlette's `[^/]+` on `{skill_name}` / `{agent_name}` blocks the direct URL exploit (encoded `/` 404s before the handler), but a curator-planted symlink inside a curated marketplace's git mirror could still dereference outside the plugin tree on read. Adds `_safe_join(plugin_root, *parts)` doing `Path.resolve(strict=True)` + `relative_to(plugin_root.resolve())`, used by all three call sites so the boundary is enforced once and consistently. Tests cover the helper directly (normal path resolves, escaping `..` returns None, escaping symlink returns None, missing file returns None) plus an end-to-end check that the symlink case actually 404s on the HTTP endpoint. Symlink tests skip on Windows where symlink creation needs elevated permissions; they run on Linux CI. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>	2026-05-08 14:22:19 +02:00
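The containment check described for `_safe_join` can be sketched as below. This is an illustration under assumptions, not the repository's helper: the name `safe_join` and the exact set of caught exceptions are choices made here. Only the `Path.resolve(strict=True)` plus `relative_to(plugin_root.resolve())` combination and the return-None contract come from the commit message.

```python
from pathlib import Path
from typing import Optional


def safe_join(plugin_root: Path, *parts: str) -> Optional[Path]:
    """Join parts onto plugin_root; return None if the result escapes the root.

    resolve(strict=True) follows symlinks and raises if the target is missing,
    so a planted symlink pointing outside the tree is caught by relative_to().
    """
    try:
        candidate = plugin_root.joinpath(*parts).resolve(strict=True)
        candidate.relative_to(plugin_root.resolve())
    except (OSError, ValueError, RuntimeError):
        # OSError: missing file; ValueError: outside the root;
        # RuntimeError: symlink loop on older Python versions.
        return None
    return candidate
```

Resolving before comparing is what makes the check symlink-aware; comparing the unresolved strings would accept a link named `skills/notes.md` that points at a file elsewhere on disk.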
ZdenekSrotyr	506a378c3a	release: 0.47.1 — Keboola connector v27 (incremental, partitioned, where_filters, typed parquet) (#217 ) ## Summary Brings the Keboola connector to feature parity with the legacy internal data-analyst's per-table sync strategies. Closes the four documented gaps from the spec branch (`zs/keboola-connector-specs`): - Typed parquet in the legacy SDK extraction path — column types from Keboola Storage metadata (provider cascade `user > ai-metadata-enrichment > keboola.snowflake-transformation`) survive the CSV → parquet roundtrip; invalid date strings (`'0000-00-00'`) and invalid numeric strings (`'Non-Manager'`) become NULL while keeping the column's typed schema. Pre-fix everything was VARCHAR. - Incremental sync via Storage API `changedSince` — opt-in per table; pulls only delta rows, merges into the existing parquet by `primary_key` (drop_duplicates with keep='last'). Cuts daily extraction from O(full table) to O(delta). - Partitioned sync — flat per-partition layout `data/<table>/<key>.parquet` (e.g. `2026_05.parquet`), per-affected-partition merge for daily updates, chunked initial load with 1-day overlap and 2-empty-chunk stop heuristic. - `where_filters` — server-side row filter with date placeholders (`{{today}}`, `{{last_3_months}}`, `{{start_of_3_months_ago}}`, etc.) resolved at sync time. Force the SDK path; reject `incremental + where_filters` combination at API layer (changedSince already filters temporally). ## Architecture - Schema migration v25 → v26: 7 new columns on `table_registry`. Existing `sync_strategy` column reused (pre-v26 it was inert catalog metadata; post-v26 the extractor dispatches off it). - Per-table dispatcher in `extractor.run()` routes to one of `_extract_via_extension` (full_refresh + extension), `_extract_via_legacy` (full_refresh + filters or extension fallback), `extract_incremental`, or `extract_partitioned`. 
- API conflict policy: `incremental + where_filters` → 422; `partitioned + query_mode='remote'` → 422; `partitioned ⇒ partition_by required`. - Admin UI: third "Direct extract (Storage API)" radio in the Keboola Register / Edit modals, alongside existing "Whole table (extension)" and "Custom SQL". When selected, exposes a v26 sync-strategy panel with conditional fields per strategy. ## Test plan - [x] Unit + module — 134 v26 tests covering migration, repo, parquet_io, where_filters, incremental (compute_changed_since + merge_parquet + extract_incremental E2E), partitioned (key derivation + merge_partition + chunked windows + extract_partitioned E2E), extractor dispatcher, admin API validators, PUT field clearing, registry-shape → dispatcher bridge - [x] HTML form structure — all v26 inputs + visibility classes + JS payload fields verified in rendered template - [x] Real Keboola roundtrip — registered a small test table as `sync_strategy='incremental'` against a test Storage project, triggered two syncs: - Sync 1: `changedSince=None` → full pull → 9 rows typed parquet - Sync 2: `changedSince=last_sync - 1d window` → 9 delta rows merged with 9 existing → 9 after dedup on primary_key (PK merge confirmed) - [x] Browser UX — agent-browser session against a local uvicorn: login → admin/tables → register modal → switch radios → verify field visibility per strategy → submit → edit existing row → switch to Direct/Incremental → save → confirm DB persistence - [x] Regression — no regressions in the broader 3252-test suite (3 pre-v26 tests updated for the deprecation-marker removal + schema-version bump; 2 pre-existing environment-sensitive test failures unrelated to this change) ## Bugs caught + fixed during E2E The browser + real-Keboola roundtrip exposed four bugs the unit tests missed: 1. 
JS visibility race — two competing `forEach` loops set `display=''` then `display='none'` on form elements sharing `kb-strategy-incremental kb-strategy-partitioned` classes (window_days + max_history_days are reused across strategies). Fix: single-pass selector with class-based visibility resolver. 2. PUT cannot clear field — pre-v26 `updates = {k: v ... if v is not None}` collapsed "omitted from body" and "sent as null" into the same case, so admin couldn't switch a partitioned row back to full_refresh and have stale `partition_by` clear. Fix: `model_dump(exclude_unset=True)`. 3. Subprocess DB lock conflict — `_read_last_sync` reopened `system.duckdb` while the parent server held the write lock (subprocess contract at `app/api/sync.py:_run_sync` line 260). Fix: parent injects `__last_sync__` into table_config before subprocess spawn. 4. Wrong KBC table_id — `extract_incremental` / `extract_partitioned` built the Storage API table_id from the registry row's slugified `id` (`circle_inc`) instead of `bucket.source_table` (`in.c-finance.circle`), producing 404s. Fix: prefer `bucket+source_table`; fall back to `id` only when bucket empty. ## Operator notes - Existing tables stay on `full_refresh` after migration; admins opt individual tables in via `agnes admin register-table --sync-strategy ...`, the Keboola Edit modal, or `POST/PUT /api/admin/registry`. - `merge_parquet` and `merge_partition` use `pd.concat + drop_duplicates`, loading both existing and delta into pandas RAM. For tables in the multi-million-row range this may OOM — switch to `partitioned` strategy for those (per-partition merge keeps memory bounded). Documented in `### Internal` of the changelog entry. - Date placeholders are resolved at sync time, not register time — a typo'd `{{lasst_week}}` is accepted at register and surfaces only when the next sync runs. By design (rolling windows need late-binding). 
## Spec source The four corresponding plans on the `zs/keboola-connector-specs` branch under `docs/superpowers/plans/2026-05-07-0[1-4]-*.md` capture the design rationale and link back to internal repo references for each subsystem.	2026-05-07 19:01:27 +02:00
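The primary-key merge used by the incremental strategy in the commit above (existing rows plus delta rows, last occurrence wins) can be illustrated without pandas. The sketch below is a plain-Python stand-in for the `pd.concat + drop_duplicates(keep='last')` approach the commit message names; `merge_by_primary_key` is a hypothetical name and rows are modelled as dicts purely for illustration.

```python
Row = dict[str, object]


def merge_by_primary_key(
    existing: list[Row], delta: list[Row], primary_key: list[str]
) -> list[Row]:
    """Concatenate existing and delta rows, keeping the last row per key."""
    merged: dict[tuple, Row] = {}
    for row in existing + delta:
        key = tuple(row[col] for col in primary_key)
        # Drop any earlier row with this key so the surviving row sits at the
        # position of its last occurrence, as keep='last' does in pandas.
        merged.pop(key, None)
        merged[key] = row
    return list(merged.values())
```

Like the pandas version, this holds both inputs in memory at once, which is the limitation the operator notes call out for multi-million-row tables.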
ZdenekSrotyr	28430ced09	Keboola cutover: native parquet path + sync correctness + auto-discover protection (#190 ) * fix: cutover regressions + parallel Keboola legacy fallback Bundled fixes from a fresh-deploy run on a Keboola Storage backend with the block-shared-snowflake-access feature flag — DuckDB Keboola extension's per-table scan can't access bucket schemas, so the legacy kbcstorage Storage-API client is the only working path. CUTOVER REGRESSIONS - agnes pull hash mismatch on every Keboola local-mode table — src/orchestrator.py:_update_sync_state stored md5(mtime+size)[:12] while the CLI compares against full 32-char content MD5. Now stores the same content MD5 the materialized SQL path already used. - Trailing-slash sanitization in connectors/keboola/access.py and extractor.py — DuckDB Keboola extension's ATTACH fails when the URL ends in / (canonical form). - src/profiler.py:TableInfo.description becomes optional — two call sites instantiated without it, crashing the profiler pass. - scripts/ops/agnes-auto-upgrade.sh: chown on UID change — older images ran as root, current runs as agnes (uid 999). Reads target uid:gid from /etc/passwd inside the new image and chowns ${STATE_DIR}, /data/extracts, /data/analytics when the digest moves. - POST /api/sync/trigger is now singleton per process — two near-simultaneous trigger calls each forked an extractor subprocess, fought for extract.duckdb's file lock, starved uvicorn, flipped the container to unhealthy. Trigger now returns 409 (sync_already_in_progress) when held; _run_sync acquires non-blocking. PARALLEL LEGACY FALLBACK - Process pool fan-out for the _extract_via_legacy queue (default 8 workers, override via AGNES_KEBOOLA_PARALLELISM). Process pool, not thread pool, because connectors/keboola/client.py:export_table does os.chdir(temp_dir) — process-global, so threads raced and slice files landed in the wrong directory ("[Errno 2] No such file or directory: '<job_id>.csv_X_Y_Z.csv'"). 
- Extractor subprocess timeout 1800s -> 3600s (configurable via AGNES_EXTRACTOR_TIMEOUT_SEC). 28+ tables × multi-minute Keboola export jobs need the headroom on telemetry-class projects. - Process group cleanup on timeout — Popen(start_new_session=True) puts the extractor in its own group. On timeout the parent SIGTERMs the group (10s grace) then SIGKILLs stragglers. Without this, the pool workers were reparented to PID 1 and continued holding open Keboola Storage export jobs. Inline extractor script also installs a SIGTERM -> sys.exit(143) handler so the with ProcessPoolExecutor(...) block __exit__ runs cleanly. Tests: existing tests that patched subprocess.run updated to patch subprocess.Popen with a _FakePopen stand-in (same exit-code-injection contract). Two tests that exercised the parallel path forced AGNES_KEBOOLA_PARALLELISM=1 to keep mocks alive (mocks don't ride into ProcessPoolExecutor subprocesses). Squashed onto current main (was 7 commits + multi-commit CHANGELOG + agnes-auto-upgrade.sh conflicts; squash avoids per-commit conflict resolution against main's flat-mount STATE_DIR refactor and 0.38.0 release cut). * feat(keboola): Storage API direct extract path; drop extension data path The DuckDB Keboola extension's COPY routes through Keboola QueryService, which is unreliable on linked-bucket projects (extension v0.1.6 fixes that case but isn't yet in the community CDN, and pre-fix any project with the block-shared-snowflake-access feature flag couldn't see bucket schemas at all). Move the extract path off the extension entirely and talk to the Storage API directly via signed-URL download — works on any project, regardless of extension state. connectors/keboola/storage_api.py (NEW) Lightweight client built on requests.Session. 
Three endpoints: - POST /v2/storage/tables/{id}/export-async (kicks off job) - GET /v2/storage/jobs/{id} (poll until done) - GET /v2/storage/files/{id}?federationToken=1 (signed URL detail) - GET <signed_url> (download bytes) Supports sliced exports (manifest + per-slice signed URLs) and gzipped payloads. ExportFilter dataclass mirrors the Keboola filter spec (whereFilters / columns / changedSince / limit) and handles JSON round-trip with the registry's source_query column. Token redaction in error messages. Bounded exponential backoff on job polling. No cloud-SDK dependency on the data path; thread-safe. connectors/keboola/extractor.py - materialize_query() rewritten: takes bucket/source_table/source_query (JSON filter spec), exports via KeboolaStorageClient, converts CSV to parquet via DuckDB, atomic os.replace. Same return shape so sync.py downstream code stays uniform with the BQ branch. - _extract_via_legacy() also moved to Storage API direct (kept the name for caller compatibility with _legacy_worker / the parallel batch extractor). Per-call temp directories — no os.chdir, threads don't race. app/api/sync.py _run_materialized_pass for source_type='keboola' rows now constructs a KeboolaStorageClient (replaces KeboolaAccess) and passes bucket/source_table/source_query to materialize_query. Reuses one client across rows for HTTP keep-alive. Sources keboola URL from env too (KEBOOLA_STACK_URL) when instance.yaml doesn't have stack_url configured. cli/commands/admin.py discover-and-register defaults Keboola rows to query_mode='materialized' (NULL source_query = full table), matching the v26 migration's unification of the local/materialized split for Keboola. BigQuery and Jira keep their per-source defaults. src/db.py Schema bump 25 → 26. Migration: UPDATE table_registry SET query_mode='materialized' WHERE source_type='keboola' AND query_mode='local'. 
NULL source_query on those rows means "full table export" — same effective behavior the local mode provided, but now via Storage API instead of the extension. pyproject.toml kbcstorage dep stays (admin-side bucket/table list still uses the SDK in app/api/admin.py / connectors/keboola/client.py); only the data path is migrated off the SDK. Comment updated to reflect the new boundary. tests - test_keboola_storage_api.py (NEW, 19 tests): ExportFilter parsing, HTTP client (token redaction, retry logic, polling), download_file (single, gzipped, sliced), end-to-end export_table_to_csv. - test_keboola_materialize.py rewritten: mocks KeboolaStorageClient instead of FakeAccess; same atomic-write + zero-rows + unsafe-id contracts. - test_sync_trigger_keboola_materialized.py: registry rows now carry bucket+source_table+JSON-shape source_query. 114+ Keboola-impacted tests green locally. * test: schema version assertion bumped to 26 alongside the keboola query_mode migration * fix(keboola): cutover hot-patches surfaced on agnes-dev Five small fixes that were applied as in-container hot-patches during agnes-dev cutover and need to be on the source-of-truth image so a fresh upgrade does not undo them. - app/api/sync.py: auto-discover gate considers the WHOLE registry (any source, any mode), not just rows where source matches and query_mode is local. After the v25→v26 keboola materialized migration an instance can have 30 materialized rows and zero local rows; the previous gate kept re-firing _discover_and_register_tables every scheduler tick, creating duplicate auto-discovered rows with the wrong bucket prefix every time. - app/api/admin.py: _discover_and_register_tables reassembles the bucket as <stage>.<bucket-id> (e.g. in.c-finance) instead of dropping the stage prefix; default query_mode for keboola is now materialized (the v26 contract); validator allows NULL source_query for keboola materialized rows (full-table export via Storage API export-async, no SQL needed). 
- cli/commands/admin.py: register-table mirrors the server validator (NULL source_query allowed for source_type=keboola); --bucket help text generalized to cover both BQ dataset and Keboola bucket id. - connectors/keboola/extractor.py: max_line_size=64 MiB on read_csv_auto so embedded JSON / SQL cells (kbc_component_configuration in particular) do not trip the default 2 MiB ceiling. - connectors/keboola/storage_api.py: GCP backend support — when the Storage API returns a manifest whose slice URLs are gs:// references with a gcsCredentials block, rewrite to the JSON REST download endpoint and authenticate with the issued OAuth bearer token; redact tokens in any surfaced error string. * test: align with new keboola materialized + auto-discover-gate contracts - test_admin_keboola_materialized: rename test_register_keboola_materialized_rejects_missing_source_query → test_register_keboola_materialized_accepts_missing_source_query. v25→v26 introduced 'keboola materialized with NULL source_query means full-table export via Storage API export-async' as the default registration shape; the rejection case is no longer the contract. - test_sync_filter: add list_all() to _StubRegistry. The auto-discover gate in _run_sync now keys off the WHOLE registry (not just local rows) so materialized-only Keboola instances do not re-trigger discovery on every tick. * feat(keboola): native parquet export — skip CSV roundtrip Storage API export-async accepts fileType={csv,parquet}. Switching the materialized sync to parquet eliminates the CSV → DuckDB COPY → parquet roundtrip that pinned a single uvicorn worker over 4 GiB on multi-GB tables (read_csv with all_varchar + max_line_size=64MB has to materialize the whole CSV in memory before COPY can stream out a parquet). Snowflake UNLOAD on Keboola's side already produces typed, self-contained parquet files; the extractor downloads them and renames into place. 
Two cases: - Single-file export (small table): file_info.url points at one signed URL; download_file streams chunks straight to .parquet.tmp and we're done. No DuckDB. - Sliced export (Snowflake UNLOAD respects MAX_FILE_SIZE — 16 MiB default — so anything larger arrives as N parquet slices): each slice is a complete parquet file with its own footer; naive concat would corrupt them. download_file_slices keeps the slices as separate files in a tempdir, then DuckDB COPY (SELECT * FROM read_parquet([slice0, slice1, ...])) merges them into one consolidated parquet. DuckDB streams row groups during this — peak memory bounded to one row group (~1 MiB) regardless of source size. The legacy CSV path stays as the explicit opt-in via source_query= '{"file_type":"csv"}' for projects whose backend can't UNLOAD parquet (none known today; cheap escape hatch). Backward-compat alias KeboolaStorageClient.export_table_to_csv kept. Also fixes a latent bug in download_file's gzip detection: previous heuristic flagged any unencrypted file as gzipped, which would have corrupted parquet downloads at gunzip time. Name-suffix-only now. * fix: tempdir leak cleanup, every 0m schedule, /sync/trigger body shapes Three small self-contained fixes uncovered during agnes-dev cutover. - connectors/keboola/extractor.py: tempfile.TemporaryDirectory now uses ignore_cleanup_errors=True so a worker death mid-write doesn't leave multi-GiB stale slice trees on the boot disk. (12 GiB seen after a disk-full crash where TemporaryDirectory's own cleanup also raised and got swallowed.) - src/scheduler.py: is_valid_schedule accepts 'every 0m' (interval=0 = always due). Force-resync of an errored row no longer requires waiting out the default 'every 1h' interval — admin can flip the schedule, trigger, then flip back. 
- app/api/sync.py: POST /api/sync/trigger accepts both ['table_id'] (legacy bare-array body) and {'tables': ['table_id']} (matches the response payload shape, more discoverable for clients building requests by hand). Malformed bodies return 422 with a structured detail; null/missing means 'sync everything' as before. Tests cover: tempdir cleanup on raise (sliced parquet path), is_valid_schedule + is_table_due 'every 0m' acceptance, and trigger body parametrized matrix (8 valid shapes + 6 rejection cases). * fix: targeted-trigger filter in materialized pass + auto-upgrade defer Two operational gaps observed during agnes-dev cutover, in the same sync-routing area. - _run_materialized_pass now takes a 'tables' arg and skips rows not in the target set with reason='not_in_target'. POST /api/sync/trigger with a body of tables previously only scoped the legacy extractor subprocess — the materialized pass kept iterating every due materialized row, so an admin asking to re-sync kbc_job re-ran every other due materialized row alongside it. Match on registry id OR name (admins commonly pass either form). tables=None preserves the no-filter behavior. - New GET /api/sync/status (public, no auth) returns {locked: bool} off _sync_lock.locked(). agnes-auto-upgrade.sh probes this before docker compose up -d and exits 0 with a 'deferred recreate' log line if a sync is in flight — the next 5-min cron tick retries. Pre-fix, an auto-upgrade triggered mid-sync would recreate the uvicorn worker and kill the in-flight extractor / Snowflake-UNLOAD download (observed when kbc_job's first 7-day retry got SIGKILLed). Connection failures in the probe fall through to the upgrade — being stuck on a wedged image is worse than interrupting a hypothetical sync. * fix: auto-discover protects admin overrides + surfaces drift Two real-world incidents on agnes-dev drove this: 1. kbc_job was registered manually with the correct (in.c-kbc_telemetry, kbc_job) coordinates. 
A naive auto-discover re-run would have inserted a SECOND kbc_job row at the slugified id 'in_c-keboola-storage_kbc_job' (where Keboola's discovery places it) — and that row's Storage API export-async 404s. 2. An earlier auto-discover bug stripped the stage prefix from bucket ids ('c-finance' instead of 'in.c-finance'), inserting 137 rows whose syncs all failed. Fix: - _discover_and_register_tables now builds a plan first (_build_keboola_discovery_plan) classifying each discovered table into one of new / existing_match / existing_drift / invalid, then executes only the 'new' bucket. Drift rows are reported with both sides of the disagreement plus drift_kind: - same_id_diff_coords: registry has the same id but different bucket / source_table (admin migrated coords inline). - name_collision: discovery's slugified id differs from any registry id, but the discovered .name matches an existing row's .name (case-insensitive). Catches the kbc_job case. - Bucket detection now prefers the API's authoritative bucket_id field (separate field on the Keboola tables.list response, normalised by KeboolaClient.discover_all_tables). Falls back to id-string parsing only when bucket_id is missing (older fallback path inside discover_all_tables). - Endpoint POST /api/admin/discover-and-register?dry_run=true returns the plan without writing — would_register, drift, invalid lists. Lets an operator audit before merging discovery with a registry that has admin overrides. Removed 'every 0m' from test_register_request_rejects_malformed_sync_schedule — the runtime started accepting it in the previous commit (force-resync override) and the validator follows suit. * feat(keboola): AGNES_TEMP_DIR routes tempfiles off overlayfs /tmp The container's /tmp lives on the boot disk's overlayfs (29 GiB on agnes-dev, shared with /var). Snowflake UNLOAD of a wide table writes slices into per-call /tmp tempdirs that fill multi-GiB / many-slice exports long before the dedicated data disk fills. 
agnes-dev hit 100% boot-disk while the 20 GiB data disk had 15 GiB free. connectors.keboola.storage_api.get_temp_root() reads AGNES_TEMP_DIR; mkdirs the target on first use; unset / empty / unwritable falls back to None (system tempdir, OSS-pre-fix behaviour). Both materialize_query (parquet path) and _extract_via_legacy (CSV fallback) and the sliced-CSV concat path in storage_api use the helper now. docker-compose.yml defaults AGNES_TEMP_DIR=/data/tmp on app, scheduler, and extract services. The data volume is the dedicated disk in production layouts and a plain docker volume in single-disk dev/laptop setups — same blast radius as the previous /tmp default on the latter, no regression.	2026-05-07 12:12:14 +02:00
ZdenekSrotyr	7781c3f331	fix(0.41.0): orphan parquet skip in filesystem fallback (CI regression) Pre-existing test_orchestrator_skips_orphan_parquet_in_extracts caught the regression: my filesystem fallback created master views for ANY parquet on disk, including orphans where DELETE /api/admin/registry removed the registry row but the parquet wasn't fully cleaned up. Fix: load the set of registered materialized table_ids for THIS source from table_registry before the scan, and skip any parquet whose stem isn't in that set. If the registry read fails (test fixture, transient DB error), skip the fallback entirely — orphan exposure is worse than missing master view recovery. Pre-existing test now passes. New regression test pins the orphan-skip contract specifically for the filesystem-fallback path.	2026-05-06 17:06:20 +02:00
ZdenekSrotyr	dfb7f25e76	release: 0.41.0 — orchestrator filesystem fallback for missing _meta materialized rows 0.40.0 added _persist_materialized_inner_view in materialize_query, which tried to open extract.duckdb from a fresh DuckDB handle to write the _meta row + inner view. In production this conflicts with the same uvicorn process's existing read-only ATTACH (orchestrator's analytics conn holds extract.duckdb ATTACHed as <source_name> alias), and DuckDB single-process file-handle uniqueness rejects with: Binder Error: Unique file handle conflict: Cannot attach "extract" — already attached by database "<source>" The helper logs WARNING fail-soft, parquet stays canonical, but the master view never appears via the meta path. Fix: at the end of _attach_and_create_views, scan <extract_dir>/data/*.parquet and CREATE OR REPLACE VIEW <id> AS SELECT * FROM read_parquet('<path>') for any parquet whose <id> is not already in the per-source tables list (= meta path didn't pick it up). Decoupled from materialize_query open-handle race. Honors the same view_ownership cross-connector collision rules as the meta path (first-come-first-served via view_repo.claim). Tests: - filesystem-fallback fires when _meta row missing - skipped when meta path already created the view (no shadow) - skips invalid identifiers (e.g. parquet stem starting with a digit) - doesn't crash when source has no data/ subdir	2026-05-06 16:58:18 +02:00
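The filesystem fallback in the 0.41.0 commit above amounts to turning parquet files on disk into view definitions. The sketch below only builds the SQL strings and does not touch DuckDB. `fallback_view_statements` is a hypothetical name, the identifier regex is an assumed rule (the commit only says stems starting with a digit are skipped), and the `view_repo.claim` ownership step is left out.

```python
import re
from pathlib import Path

# Assumed identifier rule: a letter or underscore, then letters, digits, underscores.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def fallback_view_statements(extract_dir: Path, known_tables: set[str]) -> list[str]:
    """Build CREATE VIEW statements for parquet files the meta path missed."""
    data_dir = extract_dir / "data"
    if not data_dir.is_dir():
        return []  # source has no data/ subdir: nothing to do
    statements = []
    for parquet in sorted(data_dir.glob("*.parquet")):
        table_id = parquet.stem
        if table_id in known_tables:
            continue  # the meta path already created this view
        if not _IDENT.match(table_id):
            continue  # e.g. a stem starting with a digit
        path_sql = str(parquet).replace("'", "''")  # escape quotes for SQL
        statements.append(
            f'CREATE OR REPLACE VIEW "{table_id}" AS '
            f"SELECT * FROM read_parquet('{path_sql}')"
        )
    return statements
```

Validating the stem before interpolating it keeps file names from being used to smuggle SQL into the view definition.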
Vojtech Rysanek	32c8ea601a	fix(bigquery): apply bq_query_timeout_ms on every BQ-extension attach + surface silent failures The DuckDB BigQuery extension defaults bq_query_timeout_ms to 90 s, which is too tight for analyst-scale queries against view-backed BQ datasets. Agnes already has apply_bq_session_settings() that bumps it to 600 s (configurable via data_source.bigquery.query_timeout_ms), but two regressions let the 90 s default leak through to live queries: 1. apply_bq_session_settings() swallowed every Exception silently. If the BigQuery extension wasn't loaded on the connection yet, or the installed extension version didn't recognise the setting, the SET would fail and the function would return without surfacing the problem. Operators saw 90 s timeouts on 'agnes query --remote' with no log line explaining why. 2. The call sites in src/db.py:_reattach_remote_extensions and src/orchestrator.py:_remote_attach only invoked apply_bq_session_settings on the metadata-token branch (token_env empty, the BqAccess contract). The token-based and no-auth branches ran ATTACH against the BigQuery extension without ever applying the timeout setting — so any BQ source registered with an explicit token_env, or with no auth env at all, fell back to the 90 s default. Fix: - apply_bq_session_settings now logs WARNING on each failure path (instance_config import error, non-numeric value, SET execution failure, readback error). It also verifies the setting actually landed via SELECT current_setting('bq_query_timeout_ms') and logs WARNING when the readback disagrees with the requested value, which catches the silent-ignore case some extension versions exhibit. - Both _reattach_remote_extensions (src/db.py) and _remote_attach (src/orchestrator.py) now call apply_bq_session_settings on every branch that ATTACHes a BigQuery alias, not only the metadata-token branch. Idempotent: calling it twice on the metadata-token path is a no-op SET. 
Tests: - Extended the _RecordingConn fixture to support .fetchone() so the readback assertion path works. Updated existing call-shape assertions to expect the SELECT current_setting readback alongside the SET. Added two new tests covering the WARNING surfaces for SET failure and readback mismatch — regression guards for the silent-fallback bug this PR addresses. - Full BQ-touching suite (398 tests) passes.	2026-05-06 11:24:14 +04:00
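The SET-then-readback pattern from 32c8ea601a might look like the sketch below. The function name and the `FakeConn` stand-in are hypothetical; the only details taken from the commit message are the setting name, the `SELECT current_setting(...)` readback, and the rule that every failure path logs a WARNING instead of passing silently.

```python
import logging

logger = logging.getLogger(__name__)


def apply_timeout_with_readback(conn, timeout_ms: int) -> bool:
    """SET the timeout, then read it back; log WARNING on any failure path."""
    wanted = str(int(timeout_ms))
    try:
        conn.execute(f"SET bq_query_timeout_ms = {wanted}")
    except Exception as exc:
        logger.warning("SET bq_query_timeout_ms failed: %s", exc)
        return False
    try:
        row = conn.execute(
            "SELECT current_setting('bq_query_timeout_ms')"
        ).fetchone()
    except Exception as exc:
        logger.warning("bq_query_timeout_ms readback failed: %s", exc)
        return False
    if row is None or str(row[0]) != wanted:
        # Catches extension versions that accept the SET but ignore it.
        logger.warning("readback mismatch: wanted %s, got %r", wanted, row)
        return False
    return True


class FakeConn:
    """Illustrative stand-in for a DuckDB connection (not the real fixture)."""

    def __init__(self, ignore_set: bool = False):
        self.ignore_set = ignore_set
        self.value = "90000"  # the 90 s extension default

    def execute(self, sql: str):
        if sql.startswith("SET") and not self.ignore_set:
            self.value = sql.rsplit("=", 1)[1].strip()
        return self

    def fetchone(self):
        return (self.value,)
```

With `FakeConn(ignore_set=True)` the SET appears to succeed but the readback still reports the default, which is the silent-ignore case the commit guards against.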
ZdenekSrotyr	6c94d2cbce	Merge remote-tracking branch 'origin/main' into pr180-review # Conflicts: # CHANGELOG.md # pyproject.toml	2026-05-06 07:27:25 +02:00
ZdenekSrotyr	4a1916a4b0	fix: v24 migration error message points to actual snapshot path The pre-migration snapshot was correctly migrated to STATE_DIR-aware path in src/db.py:1832 (`_get_state_dir() / 'system.duckdb.pre-migrate'`), but the error message in _migrate_v24_bq_source_queries still hardcoded the old `{DATA_DIR}/state/...` shape. Under flat-mount layout (STATE_DIR=/data-state), an operator hitting the v24 migration error would look in /data/state/ for a rollback snapshot that lives in /data-state/. Devin Review on PR #194 round 3.	2026-05-05 20:13:08 +02:00
Vojtech Rysanek	a303de0372	feat: STATE_DIR env var + flat-mount overlay (parallel disks) Introduces STATE_DIR as the single source of truth for the writable state directory path, with backward-compatible default of ${DATA_DIR}/state. Pairs with a new docker-compose.flat-mount.yml overlay that mounts the state disk in PARALLEL to the data disk (rather than nested under it). Why --- The default deployment topology nests state under data: sdb at /data, sdc at /data/state. That layout has known fragility documented in docs/state-dir.md — bind-propagation gotchas, two-writer collisions on the same prefix, mount-order coupling. The 2026-05-05 incident in the Groupon FoundryAI deployment was a manifestation of the propagation gotcha. The flat layout (sdb at /data, sdc at /data-state — parallel, not nested) eliminates the nested-mount class entirely. Each disk is its own bind mount, recursive by default in modern Docker. No volume options to forget. No two-writer collision (host scripts and container app share /data-state at the same path, single namespace). What changes ------------ App code (Python): - src/db.py: new _get_state_dir() helper. get_system_db() and schema migration snapshot use it. - app/secrets.py: new _state_dir() helper. _load_or_generate() uses it for .session_secret and .jwt_secret. - app/main.py: .env_overlay loaded from _state_dir(). Host scripts: - scripts/ops/agnes-auto-upgrade.sh: STATE_DIR drives mount-sanity check and cert detection. Defaults preserve existing behavior. - scripts/ops/agnes-tls-rotate.sh: STATE_DIR drives CERT_DIR. New compose overlay: - docker-compose.flat-mount.yml: parallel /data and /data-state binds per service. Mutually exclusive with docker-compose.host-mount.yml; pick one based on disk topology. Documentation: - docs/state-dir.md: layout choice (A nested vs B flat), pros/cons, migration steps, and which code paths read STATE_DIR. 
Backward compatibility ---------------------- STATE_DIR defaults to ${DATA_DIR}/state — current behavior. Existing deployers that don't set the var see no behavior change. Migration to flat layout is opt-in per the runbook in docs/state-dir.md. Validation ---------- - bash -n on both host scripts: pass - docker compose config -f docker-compose.flat-mount.yml: resolves cleanly with all 6 services binding /data and /data-state directly - python3 import + helper exercise: STATE_DIR override works, default falls back to ${DATA_DIR}/state Companion to PR #191 (drop named-volume driver_opts in host-mount.yml). That PR fixes the immutability footgun for Layout A; this PR offers Layout B as the architectural alternative.	2026-05-05 19:28:07 +02:00
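The backward-compatible default in a303de0372 comes down to a small resolution rule. The sketch below is an assumption about its shape, not the code in src/db.py: the `env` parameter exists only to make the example testable, and the `/data` fallback for `DATA_DIR` is a placeholder.

```python
import os
from pathlib import Path


def get_state_dir(env=None) -> Path:
    """Resolve the writable state directory.

    STATE_DIR wins when set; otherwise fall back to ${DATA_DIR}/state,
    which is the pre-existing nested layout.
    """
    env = os.environ if env is None else env
    explicit = env.get("STATE_DIR")
    if explicit:
        return Path(explicit)
    # "/data" is an assumed default for DATA_DIR, used here for illustration.
    return Path(env.get("DATA_DIR", "/data")) / "state"
```

Under the flat-mount overlay an operator would set `STATE_DIR=/data-state`; deployments that leave it unset keep resolving to the nested path.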
ZdenekSrotyr	025a2b5c0e	fix(db): apply bq_query_timeout_ms to read-only reattach path Devin Review on PR #181: caught that the original PR plumbed the new SET into the orchestrator's _remote_attach (rebuild path), the BqAccess factory (materialize path), and the standalone extractor — but missed the actual primary `agnes query --remote` request path: every read-only analytics-DB connection runs `_reattach_remote_extensions` in `src/db.py` on open, and that LOAD bigquery + ATTACH cycle was unconfigured. Without this commit, the very flow the PR was meant to fix — analyst queries hitting BQ views > 90s — would still 400 with the same Binder Error / Job ID wording, because the runtime LOAD bigquery happens here not in the orchestrator's rebuild path. Apply apply_bq_session_settings(conn) right after the BQ secret is created and before ATTACH, mirroring what every other PR site does.	2026-05-05 16:40:40 +02:00
ZdenekSrotyr	4f04235502	feat(bigquery): bq_query_timeout_ms knob; default 600s (was 90s) DuckDB BigQuery extension defaults `bq_query_timeout_ms` to 90 s, which is too tight for analyst-scale queries against view-backed BQ datasets. `agnes query --remote` HTTP 400'd with `Binder Error: Query execution exceeded the timeout. Job ID: ...` whenever the underlying BQ job ran longer than 90 s, even though the job itself was healthy. Add `data_source.bigquery.query_timeout_ms` (default 600 000 ms = 10 min, sentinel 0 falls through to the extension default). Applied via `SET bq_query_timeout_ms` after every `LOAD bigquery` on every BQ-touching DuckDB session: orchestrator's `_remote_attach` ATTACH path, BqAccess session factory, and the standalone extractor. Configurable via `/admin/server-config` UI. Fail-soft: extension versions that don't recognise the setting silently keep the default rather than poisoning the session.	2026-05-05 16:40:40 +02:00
ZdenekSrotyr	3d63965a67	Merge remote-tracking branch 'origin/main' into pr180-review # Conflicts: # CHANGELOG.md # app/web/templates/_app_header.html	2026-05-05 12:05:50 +02:00
ZdenekSrotyr	fd3c76d21b	fix(store): security + correctness blockers found in PR review (F1, F2, F4, F5) Three independent reviews of PR #180 surfaced four real defects in the new Store / my-ai-stack surface. CHANGELOG entries detail each; one-liners: - F1 video_url XSS: any authenticated user could upload a Store entity with `video_url=javascript:...` and pop XSS in any viewer's session via the `<a href=...>` "Watch video" link in store_detail.html. Jinja2 autoescape doesn't block URI schemes inside attribute values. Fixed by scheme-validating to http(s) only on create + update; 400 invalid_video_url. - F2 ZIP decompression bomb: _safe_zip_extract checked path-traversal but not declared file_size totals — a 50 MB compressed upload at 1:1000 ratio decompresses to 50 GB and DOS the host disk. Fixed by summing zinfo.file_size across infolist() and refusing > 200 MB before extractall touches disk. 413 zip_too_large_uncompressed. - F4 admin authz parity: PUT /api/store/entities/{id} was owner-only while DELETE allowed owner OR admin; the store-detail page hid Edit/Delete buttons from admin even though DELETE was permitted. Fixed by allowing admin on PUT and passing is_admin to the template; gate is now is_owner OR is_admin everywhere. - F5 cross-owner suffix collision: sanitize_username is many-to-one (alice.smith / alice_smith both → alice-smith). Two such users uploading entities with the same display name produced identical `<name>-by-<username>` suffixes, silently colliding in the served agnes-store-bundle on-disk paths AND the manifest catalog (Claude Code dedupes by plugin.json `name`). Fixed by enforcing global uniqueness on the suffixed value at create_entity; 409 conflict_global_suffix. F3 (ZIP symlink members) was investigated and confirmed to be a false-positive — Python's stdlib ZipFile.extractall does not honor symlink mode bits, so no exploit exists. 9 new regression tests in tests/test_store_api.py::TestStoreSecurityFixes covering all four. 
Test run locally: 60/60 store-related tests pass.	2026-05-05 08:18:02 +02:00
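Two of the fixes in fd3c76d21b (F1 and F2) reduce to small checks that are easy to sketch. The helper names below are hypothetical, and the real handlers also map failures to the 400 and 413 responses named in the commit; only the http(s) scheme allow-list, the `file_size` summation over `infolist()`, and the 200 MB cap come from the message.

```python
import io
import zipfile
from urllib.parse import urlparse

MAX_UNCOMPRESSED_BYTES = 200 * 1024 * 1024  # 200 MB cap from the commit message


def is_safe_video_url(url: str) -> bool:
    """Allow only http(s) URLs, so javascript: and similar schemes are refused."""
    return urlparse(url.strip()).scheme in ("http", "https")


def declared_uncompressed_size(zf: zipfile.ZipFile) -> int:
    """Sum the declared uncompressed sizes without extracting anything."""
    return sum(info.file_size for info in zf.infolist())


def zip_within_limit(data: bytes, limit: int = MAX_UNCOMPRESSED_BYTES) -> bool:
    """Check the declared total before extractall would touch the disk."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return declared_uncompressed_size(zf) <= limit
```

The size check relies on the sizes declared in the archive's central directory, which is what the commit describes; it is cheap because no member is decompressed.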
ZdenekSrotyr	e86dd5edc5	fix(anthropic): strict json_schema (additionalProperties=false) + add /admin/scheduler-runs UI E2E test on a real BQ deploy showed every verification-extraction call fails with HTTP 400 invalid_request_error: "output_config.format.schema: For 'object' type, 'additionalProperties' must be explicitly set to false". The Anthropic structured-output API now requires the field on every object node in the json_schema. Fix: connectors/llm/anthropic_provider.py wraps the caller-supplied schema through a recursive _strict_json_schema() walker that adds the field where missing (preserving any explicit override), then passes the strict variant to the API. Six unit tests in TestStrictJsonSchema pin the recursion across nested objects, array items, and the no-mutation invariant. Adds /admin/scheduler-runs — a read-only admin page that surfaces the last 200 audit-log entries from scheduler-driven actions. New AuditRepository.query_actions(actions, limit) helper, new admin nav entry. Failed scheduler ticks (HTTP 401, network errors) don't reach the audit_log; the page calls that out with a hint to set SCHEDULER_API_TOKEN if no rows show up.	2026-05-05 08:00:57 +02:00
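A recursive walker of the kind e86dd5edc5 describes could be written as below. This is a sketch under assumptions: the real `_strict_json_schema()` may handle more cases (for example `type` given as a list), and the public name here is illustrative.

```python
import copy


def strict_json_schema(schema: dict) -> dict:
    """Return a copy with additionalProperties=False on every object node.

    An explicit additionalProperties value is preserved, and the caller's
    schema is never mutated.
    """

    def walk(node):
        if isinstance(node, dict):
            if node.get("type") == "object":
                node.setdefault("additionalProperties", False)
            for value in node.values():
                walk(value)  # covers properties, items, nested definitions
        elif isinstance(node, list):
            for item in node:
                walk(item)

    result = copy.deepcopy(schema)
    walk(result)
    return result
```

Copying first and mutating the copy keeps the no-mutation invariant simple to reason about, at the cost of one deep copy per call.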
Minas Arustamyan	537ea7662b	chore(store): genericize email examples in docstring + test Per CLAUDE.md vendor-agnostic OSS guidance — replace the real groupon.com email used as a sanitize_username() example with a placeholder (alice_smith@example.com).	2026-05-05 05:48:32 +02:00
Minas Arustamyan	5372d65b26	fix(setup): install list reflects opt-outs + Store bundle `compute_default_agent_prompt` (which renders the install commands in the setup prompt's marketplace block) was calling `resolve_allowed_plugins` — the admin-only feed that predates the v25 Store/opt-out layer. Result: a user with 2 opted-out curated plugins + 2 Store skills saw the original 4 admin grants in the install list (including the opted-out ones, with cross-marketplace duplicates), and no `agnes-store-bundle` install line for the skills. Now we call `resolve_user_marketplace` — the same resolver that `/marketplace.zip` + `/marketplace.git/` serve from. The install commands now match the served catalog exactly: admin grants minus the user's opt-outs, plus the `agnes-store-bundle` synth plugin (which wraps every installed Store skill + agent into one plugin entry) and any standalone Store plugin uploads. Dedup by `manifest_name` because two upstream marketplaces shipping a plugin with the same name collide in the synth marketplace.json by design (CLAUDE.md "Same-named plugins ... collide in the catalog by design"). A duplicate `claude plugin install <name>@agnes` would be a no-op anyway, so it's just visual noise to keep emitting both.	2026-05-05 05:17:05 +02:00
Minas Arustamyan	9d53efc6e1	fix(schema-v25): drop FK refs from store tables Past migration finalize steps RENAME / DROP COLUMN / ALTER on the `users` table (e.g. _v12_to_v13_finalize, _v13_to_v14_finalize, _v17_to_v18_finalize, the v5 backfill). DuckDB rejects an ALTER on a table that any other table references via FOREIGN KEY, so the new store_entities / user_store_installs / user_plugin_optouts entries — which the self-heal pass writes to _SYSTEM_SCHEMA before the migration ladder runs — broke 6 legacy-migration tests with: Cannot alter entry "users" because there are entries that depend on it Pre-existing convention (see personal_access_tokens at v6) is to omit FK constraints to `users` and validate user existence at the app layer. Sync the three v25 tables with that convention. Same edit in both _SYSTEM_SCHEMA and _V24_TO_V25_MIGRATIONS so fresh installs and upgraded installs land in the same shape. App-level cascade behavior is unchanged: store entity DELETE explicitly deletes user_store_installs rows in app/api/store.py, and the admin grant-deletion hook explicitly deletes user_plugin_optouts rows for the plugin. The dropped FK constraints were defense-in-depth, not the only guard.	2026-05-05 03:15:09 +02:00
Minas Arustamyan	d5a7c9ad79	feat(store): /store + /my-ai-stack — community marketplace + per-user composition Adds a community-driven Store where any authenticated user uploads skills/agents/plugins as ZIPs, plus /my-ai-stack as the per-user composition view. The served Claude Code marketplace is now: (admin_granted ∖ opt_outs) ∪ store_installs Skill + agent installs are merged into a single `agnes-store-bundle` plugin in the served marketplace; type=plugin uploads stay standalone. Names are suffixed with `-by-<owner-username>` at upload time so two owners can use the same display name without colliding in Claude Code's flat skill/agent namespace. Schema v23 → v24 adds three tables: - store_entities — community-uploaded skills/agents/plugins - user_store_installs — what each user has chosen to install - user_plugin_optouts — opt-out overlay on top of admin grants Admin grant-delete drops every user's opt-out for that plugin so re-grant resets cleanly to enabled (no sticky personal preference). UI: - /store — e-commerce-style listing with type/category/owner filters, search, pagination, owner-aware [Install] buttons, clickable cards - /store/new — 2-step upload wizard with drag & drop, preview validation (POST /api/store/entities/preview), docs multi-upload, photo + video URL - /store/{id} — detail page with hero, file list, docs, owner actions (Edit/Delete) for the uploader - /my-ai-stack — Granted plugins (toggle opt-out) + From the Store (uninstall) sections - Admin nav: Marketplaces moved into Admin dropdown, renamed to "Curated Marketplaces" Validation hardening: type-mismatch guards reject skill ZIP uploaded as agent (or vice versa), and plugin ZIPs masquerading as skills/agents. Human-readable error messages mapped client-side from machine codes. Cross-source naming: Store entity-id-prefixed dirs (`plugins/store-<id>/`) plus the bundle (`plugins/store-bundle/`) avoid collisions with admin marketplaces (whose `store` slug is reserved by `is_valid_slug`). 
Bundle composition is content-hashed at serve time — install/uninstall or owner re-upload bumps the bundle's plugin.json `version`, so Claude Code's auto-update toggle picks up changes. Tests: 50+ new tests across naming, repositories, filter (admin ∪ store ∪ bundle), API (upload/install/uninstall/delete/preview/docs), end-to-end marketplace.zip with bundle merging.	2026-05-05 02:53:49 +02:00
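The composition rule in d5a7c9ad79, (admin_granted ∖ opt_outs) ∪ store_installs, and the content-hashed bundle version can be illustrated with plain sets and a digest. Both functions are hypothetical sketches: the commit does not say which hash is used or how the version string is formatted, so SHA-256 and the `0.0.0+<digest>` shape are assumptions.

```python
import hashlib


def resolve_marketplace(admin_granted, opt_outs, store_installs) -> set:
    """(admin_granted minus opt_outs) union store_installs."""
    return (set(admin_granted) - set(opt_outs)) | set(store_installs)


def bundle_version(files: dict) -> str:
    """Derive a version string from bundle contents (name -> bytes).

    Any install, uninstall, or re-upload changes the digest, so the
    version changes with it. The format is an assumed placeholder.
    """
    digest = hashlib.sha256()
    for name in sorted(files):  # sorted so the result is order-independent
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(files[name])
        digest.update(b"\0")
    return "0.0.0+" + digest.hexdigest()[:12]
```

Because the version is a pure function of the contents, two users with the same installed set would get the same bundle version.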
ZdenekSrotyr	0612c1e1a1	fix(schema-v24): raise on deferred migration so retry path actually runs (Devin Review on db.py:1757) Pre-fix: when v24 migration found rows to migrate but data_source.bigquery.project was empty, it logged a warning per row and returned normally. Schema_version then bumped to 24 unconditionally → next start's 'if current < 24:' gate skipped _v23_to_v24_finalize forever, leaving rows in DuckDB-flavor SQL that the new _wrap_admin_sql_for_jobs_api wrapping path rejects. Devin escalated this from advisory ("idempotent retry") to critical on rescan after my reply. The reply was wrong — the LIKE filter inside the function gives idempotency IF the function is called again, but the schema-version gate prevents that call from happening. Fix (Devin's recommended Approach 1): raise RuntimeError BEFORE the schema-version bump when rows need migration but project_id is empty. The schema_version stays at 23, so on next start the 'if current < 24:' gate fires and the migration runs again — this time with project_id configured. Side effect: a BQ-using deployment that hasn't set the project blocks startup until they do. That's the right call for a config error that would otherwise silently break all materialized tables. The error message points at the right knob (data_source.bigquery.project + restart). No-rows-no-block invariant preserved: the early 'if not rows: return' at the top of _v23_to_v24_finalize means non-BQ deployments are unaffected. Tests: - test_v24_raises_when_project_not_configured_and_rows_need_migration: asserts raise + schema_version stays at 23 (the load-bearing invariant for retry-on-next-start to work) - test_v24_skips_clean_when_no_rows_match_even_without_project: asserts non-BQ deployments don't block startup - Existing 3 tests still pass	2026-05-04 23:11:34 +02:00
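The ordering that 0612c1e1a1 fixes, raising before the version bump so the gate fires again on the next start, can be shown with a toy model. Here `state` is a hypothetical stand-in for the system database and the row update is a placeholder; only the control flow mirrors the commit.

```python
def migrate_to_v24(state: dict, rows: list, project_id: str) -> None:
    """Run the v24 step only when the schema-version gate allows it."""
    if state["schema_version"] >= 24:
        return  # the gate: an already-bumped schema never re-runs the step
    if rows and not project_id:
        # Raise BEFORE the bump so schema_version stays at 23 and the
        # migration is retried on the next start.
        raise RuntimeError(
            "rows need migration but data_source.bigquery.project is empty; "
            "set it and restart"
        )
    for row in rows:
        row["migrated"] = True  # placeholder for the real source_query rewrite
    state["schema_version"] = 24
```

With no rows to migrate the function bumps the version without checking the project, which is the no-rows-no-block invariant for non-BQ deployments.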
ZdenekSrotyr	291079b1d2	refactor(welcome-template): drop role param; resolve plugins per-user unconditionally Removes the `role: Literal["analyst", "admin"] = "admin"` parameter from `compute_default_agent_prompt`. The same RBAC pass (`marketplace_filter.resolve_allowed_plugins`) now runs for every user — admin or not. Users with no `resource_grants` rows get the no-marketplace layout; users with grants get the marketplace block inserted. Admin-vs-analyst is no longer a layout branch. `render_agent_prompt_banner` no longer derives a `role` from `user.is_admin`; it just delegates to `compute_default_agent_prompt`. Two `compute_default_agent_prompt(...role=role)` call sites in `app/web/router.py::setup_page` are updated to drop the keyword so the route keeps rendering — Task 5 will remove the `?role=` query parameter and the silent admin-downgrade block from the route signature itself. Tests: drop role-aware assertions from test_welcome_template_renderer and test_welcome_template_api. Both files now assert the unified default contains `agnes init` + `uv tool install` and bans the legacy `agnes auth import-token` / `agnes auth whoami` verbs. Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 4.	2026-05-04 22:13:46 +02:00
ZdenekSrotyr	9334beed15	refactor(setup-instructions): drop role param; collapse analyst/admin into one layout Removes the `role: Literal["analyst", "admin"]` parameter from `resolve_lines` / `render_setup_instructions` and deletes the `_resolve_analyst_lines`, `_analyst_init_lines`, `_analyst_finale_lines` helpers. The unified flow now always emits `agnes init` (the workspace-rails delivery mechanism) in place of the legacy `agnes auth import-token` + `agnes auth whoami` pair, and uses `agnes catalog` as the smoke-verify step. `agnes init` already verifies the PAT internally, and `agnes catalog` doubles as a data-plane smoke check, so dropping `agnes auth whoami` costs no signal. Drops the now-redundant `tests/test_setup_instructions_analyst.py` and patches the one ordering test in `tests/test_setup_instructions.py` that referenced the old "Log in" / "Verify the login" headers. Also strips the `role=role` kwarg from `compute_default_agent_prompt`'s call into `resolve_lines` so the welcome-template render path keeps working; welcome_template.py's own role param is removed in a follow-up task. Plan: docs/superpowers/plans/2026-05-04-unified-setup-prompt.md task 1.	2026-05-04 22:08:48 +02:00
ZdenekSrotyr	103efb69f0	chore(cli-rename): replace stale `da` verbs in active code paths Bring admin UI, audit-log messages, code comments, and analyst-facing skill docs in line with the post-bootstrap CLI surface (`agnes pull`, `agnes push`, `agnes init`, `agnes snapshot create`). The legacy `_LEGACY_STRINGS` detection tuple in `app/api/claude_md.py` and the hook upgrade markers in `cli/lib/hooks.py` are intentionally left as-is — they exist precisely to flag pre-rewrite content for re-authoring. Strip "(folded from `da metrics list`)" / "(lifted from `da metrics show`)" / "Replaces the old `da analyst status`" docstring noise — the rename history is in CHANGELOG.md, not in module docstrings.	2026-05-04 21:10:43 +02:00
ZdenekSrotyr	e438170ade	merge: pull #174 (BQ materialize view fix + concurrency, 0.33.0) into bootstrap branch Brings in zs/materialize-sync-fix (PR #174): - BigQuery view materialize works (wrap admin SQL in bigquery_query()) - Per-table mutex + fcntl.flock for concurrent COPY corruption - Cost guardrail dry-run engages on materialized rows - Schema v23 -> v24 migration: rewrite source_query to BQ-native - Server-generated trivial source_query from bucket+source_table - Validator backtick relaxation for materialized rows - 0.33.0 release cut Conflict resolution: - CHANGELOG.md: keep our [Unreleased] (bootstrap rewrite content) ABOVE the new [0.33.0] section from #174. The bootstrap rewrite remains unreleased; it'll cut 0.34.0 (or later) when this PR merges to main. - tests/conftest.py: union — keep our analyst-bootstrap fixture re-export AND #174's bq_instance / stub_bq_extractor fixtures. - pyproject.toml auto-merged to 0.33.0 (matches the cut), correct. - src/db.py auto-merged: SCHEMA_VERSION = 24, _v23_to_v24_finalize added — no overlap with our work which left schema at v23. - CLAUDE.md auto-merged: schema-history paragraph extended with v24. Verified: 79/79 across CLI bootstrap suite + materialize suite + schema v24 migration tests pass locally on Python 3.13/macOS.	2026-05-04 20:53:00 +02:00
ZdenekSrotyr	92d477e422	fix(setup): default /setup to analyst, hide admin tile from non-admins Three coupled UX fixes for the analyst-onboarding flow: 1. Dashboard "Setup a new Claude Code" CTA was rendering admin paste prompt for everyone (analysts couldn't actually execute the marketplace plugin install / skills setup steps). render_agent_prompt_banner now picks role based on user.is_admin — analysts get the analyst flow. 2. /setup default role changed from admin to analyst. Most visitors are analysts; admin layout is opt-in via the admin tile or ?role=admin. 3. Admin tile is admin-only on the role-tile nav. Non-admins see only the analyst tile. Server-side: non-admin requesting ?role=admin is silently downgraded to analyst (otherwise they'd see admin paste prompt despite no tile). Tests: - New: test_setup_page_admin_tile_hidden_for_non_admin (anonymous client can't see "Admin CLI" or role=admin link) - New: test_setup_page_admin_role_downgraded_for_non_admin (anonymous ?role=admin → analyst layout, no marketplace step in clipboard) - New: test_install_preview_default_role_is_analyst (admin signing in to bare /setup gets analyst clipboard by default) - Renamed: test_setup_page_default_role_is_admin → ..._is_analyst - Updated: test_setup_page_admin_clipboard_renders_admin_layout uses FastAPI dependency_overrides to inject admin user (admin layout is now admin-gated) - Updated: test_install_preview_visible_for_signed_in_user explicitly passes ?role=admin to exercise admin layout	2026-05-04 20:20:37 +02:00
ZdenekSrotyr	ce108d4c6d	fix(schema): code-review follow-ups for `fac10b29` - _v23_to_v24_finalize: wrap row-update loop in BEGIN/COMMIT/ROLLBACK to match the project's transactional-finalizer pattern (compare _v12_to_v13_finalize, _v17_to_v18_finalize, _v18_to_v19_finalize). Pre-fix a process crash mid-loop left the schema_version unchanged but partially-converted rows persisted across restart — idempotent overall but inconsistent with project convention. - _v23_to_v24_finalize: re.sub replacement now uses a function-form (lambda) instead of an f-string, so any future project_id with a backslash sequence isn't misinterpreted as a group reference. - tests: add a Keboola-source materialized row case asserting the SELECT's source_type filter prevents non-BQ rewrites.	2026-05-04 19:32:24 +02:00
ZdenekSrotyr	fac10b29e4	feat(schema): v24 — rewrite materialized BQ source_query to BQ-native Materialize now wraps admin SQL into bigquery_query('<billing>', '<inner>') which requires the inner SQL to be BigQuery-flavor (backticked identifiers, native function syntax). v24 migrates existing rows from DuckDB-flavor (bq."ds"."tbl") to (`<project>.ds.tbl`) using the configured BQ project. Idempotent on already-converted rows; logs a warning and skips when the project isn't configured (operator can configure + restart for retry).	2026-05-04 19:15:54 +02:00
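The rewrite from fac10b29e4 and the function-form replacement from ce108d4c6d can be combined in one sketch. The regex is an assumption about what a DuckDB-flavor reference looks like (`bq."ds"."tbl"`); the project's actual pattern is not shown in the log.

```python
import re

# Assumed pattern for a DuckDB-flavor reference such as bq."ds"."tbl".
_BQ_REF = re.compile(r'bq\."([^"]+)"\."([^"]+)"')


def to_bq_native(sql: str, project_id: str) -> str:
    """Rewrite bq."ds"."tbl" to `project.ds.tbl`.

    The replacement is a function, not a template string, so a backslash
    sequence inside project_id is inserted literally instead of being
    parsed as a group reference.
    """
    return _BQ_REF.sub(
        lambda m: f"`{project_id}.{m.group(1)}.{m.group(2)}`", sql
    )
```

Already-converted SQL contains no `bq."..."."..."` reference, so running the function a second time leaves it unchanged, which matches the idempotency the commit claims.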
ZdenekSrotyr	f731ee7897	feat(setup): /setup?role=analyst\|admin branching with role tiles	2026-05-04 17:28:47 +02:00
ZdenekSrotyr	74c4047567	docs(orchestrator): #160 reviewer-flagged comment polish on _meta-without-inner-object path	2026-05-04 10:31:35 +02:00
ZdenekSrotyr	91aaeb9194	feat(repo): #160 add find_by_bq_path lookup for direct bq.* RBAC enforcement The upcoming /api/query RBAC patch (next phase) gates direct `bq."<dataset>"."<source_table>"` references in user SQL — every such path must point at a registered query_mode='remote' BigQuery row, otherwise the caller has stepped around the registry and around RBAC. Add `TableRegistryRepository.find_by_bq_path(bucket, source_table)` to support that lookup. Returns None if no row matches, the row dict if exactly one matches, or the oldest-by-`registered_at` row when 2+ match (no UNIQUE constraint on `(source_type, bucket, source_table)` — admins can in principle register a BQ table twice with different ids/names). Match is case-insensitive on bucket+source_table so user SQL `SELECT * FROM bq.Finance.UE` resolves to a `(finance, ue)` registry row. NULL values in either column are excluded so a legacy NULL-bucket row never masks a legitimate non-NULL lookup. 5 RED tests cover: empty registry, non-BQ source rejected, single match, oldest-of-many tie-breaker, case-insensitive match, NULL-column exclusion. All initially failed with AttributeError; pass after the ~30 LOC method addition.	2026-05-04 10:31:35 +02:00
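The lookup semantics of `find_by_bq_path` in 91aaeb9194 can be modelled over an in-memory list of row dicts. This is an illustration only: the real method queries the registry table, and the `"bigquery"` value for `source_type` is an assumption.

```python
def find_by_bq_path(rows, bucket: str, source_table: str):
    """Return the matching registry row, or None.

    Matching is case-insensitive on bucket + source_table, rows with a
    NULL (None) in either column never match, and when several rows
    match the oldest by registered_at wins.
    """
    wanted = (bucket.lower(), source_table.lower())
    matches = [
        row
        for row in rows
        if row.get("source_type") == "bigquery"  # assumed type label
        and row.get("bucket") is not None
        and row.get("source_table") is not None
        and (row["bucket"].lower(), row["source_table"].lower()) == wanted
    ]
    if not matches:
        return None
    return min(matches, key=lambda row: row["registered_at"])
```

Picking the oldest row gives a deterministic answer when an admin has registered the same BigQuery table twice, since nothing in the schema prevents that.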
ZdenekSrotyr	9d0e4e687d	refactor(bq): #160 remove legacy_wrap_views config knob (always-wrap) Now that VIEW/MATERIALIZED_VIEW always wrap via bigquery_query() (the prior `legacy_wrap_views=True` branch behavior, made unconditional in the previous commit), the toggle has no semantic meaning and is removed across the codebase. Production code: - app/api/admin.py: drop the field from _OPTIONAL_FIELDS["data_source"] ["bigquery"]["fields"] and from _BQ_OPTIONAL_FIELD_DEFAULTS, plus the comment block above the defaults dict. - config/instance.yaml.example: drop the example snippet. - src/orchestrator.py: update the inner-objects skip-branch comment to reflect the new BQ behavior (the skip itself stays — keboola use_extension=False still inserts _meta rows without inner views). - app/web/templates/admin_tables.html: rewrite operator copy in the register and edit forms to reflect always-wrap. Tests: - tests/test_admin_server_config.py (TestServerConfigBigQueryFields): flip assertions from "field IS present" to "field NOT present" on legacy_wrap_views. Drop the test_post_persists_legacy_wrap_views test since the field no longer exists. - tests/test_admin_server_config_known_fields.py: same flip on the known-fields registry assertion. - tests/test_bigquery_extractor.py: drop the obsolete test_view_entity_does_not_create_master_view_by_default (asserted the bug we fixed) and test_legacy_wrap_views_toggle_restores_old_behavior (toggle no longer meaningful). Update remaining test docstrings. Operators with `legacy_wrap_views: true` set in their overlay get the new (equivalent) behavior automatically — the unrecognized key is silently ignored by the YAML loader. Operators with `false` get the issue-#160 fix as a behavior change, not a regression. Spec gate updated: production code grep gate grep -rn 'legacy_wrap_views' connectors app src config cli must return zero. 
tests/ excluded — historical "removed in #160" breadcrumbs and `assert "X" not in fields` regression guards retained as anti-regression signals.	2026-05-04 10:31:35 +02:00
ZdenekSrotyr	8cb6fdc546	fix(claude_md): load default via importlib.resources — survives /app/config bind-mount	2026-05-04 06:53:47 +02:00
ZdenekSrotyr	93fdea3461	fix(claude_md): RBAC-filter tables; align today with now (UTC) - _list_tables now accepts a user param and delegates to get_accessible_tables: admins see all, non-admins see only tables covered by their resource_grants. Fixes silent leak of table names to unauthorised analysts. - today derived from now.date() (UTC) instead of date.today() (server-local TZ), so today and now are always consistent. - Updated test_render_override_tables_list to seed an admin user so RBAC filtering doesn't hide the table; added three new tests covering per-user table isolation, admin sees-all, and no-grants-empty.	2026-05-04 05:57:22 +02:00
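The date fix in 93fdea3461 amounts to deriving both values from a single UTC timestamp. A minimal sketch follows, with a hypothetical function name and context shape:

```python
from datetime import datetime, timezone


def build_time_context(now=None) -> dict:
    """Derive `today` from `now` so the two can never disagree.

    Calling date.today() separately would use the server's local time
    zone and can land on a different calendar day than a UTC `now`.
    """
    now = now or datetime.now(timezone.utc)
    return {"now": now, "today": now.date()}
```

Accepting `now` as a parameter also makes the renderer testable at a fixed instant, for example just before midnight UTC.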
ZdenekSrotyr	f01eb4143d	feat(db,repo,renderer): schema v23 + claude_md_template + ClaudeMd renderer - Bump SCHEMA_VERSION 22 → 23; add claude_md_template singleton table to _SYSTEM_SCHEMA and _V22_TO_V23_MIGRATIONS; wire migration + fresh-install seed - src/repositories/claude_md_template.py: ClaudeMdTemplateRepository (get/set/reset) mirroring WelcomeTemplateRepository; defensive re-seed in get() - src/claude_md.py: compute_default_claude_md / render_claude_md / build_claude_md_context — rich renderer with RBAC-filtered tables, metrics, and marketplaces; reads override from claude_md_template or falls back to config/claude_md_template.txt; raises TemplateError on broken override - config/claude_md_template.txt: default Jinja2 markdown template restored from PR #167 history (tables, metrics, marketplaces, BQ guidance, corporate memory, directory structure, per-user footer)	2026-05-03 22:43:56 +02:00


156 commits