agnes-the-ai-analyst

Author	SHA1	Message	Date
ZdenekSrotyr	c910281df1	feat(cli): --json alias for --format json + remote-query exit-code regression test (#345 C+D) (#353) D — `agnes query --json` shortcut for `--format json`. Paste-prompts and LLM-assisted analysts routinely reach for `--json` first; the typer "Did you mean --stdin?" suggestion that the missing flag previously produced was actively misleading. `--json --format <other>` is rejected as mutually exclusive. C — Verified that `_query_remote` already raises `typer.Exit(1)` on any non-200 response (cli/commands/query.py:116). Added an explicit 502-path regression test alongside the existing 400 case so any future regression in the `raise typer.Exit(1)` line gets caught. Tests: 3 new in test_cli_query.py (5xx exit code, --json alias works, --json + --format csv mutually exclusive). 15/15 cli_query tests pass.	2026-05-19 16:39:46 +02:00
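The alias rule in c910281df1 can be sketched as a small pure function. This is a stdlib-only illustration, not the repository's code: the name `resolve_output_format`, the `"table"` default, and the use of `ValueError` are all hypothetical, and the real command is a typer command that this sketch does not reproduce. It also assumes that `--json --format json` is accepted, since the message only says `--json --format <other>` is rejected.

```python
from typing import Optional


def resolve_output_format(
    json_flag: bool,
    fmt: Optional[str],
    default: str = "table",  # assumption: the real default format is not stated in the log
) -> str:
    """Resolve the effective output format from --json and --format."""
    if json_flag:
        # --json is shorthand for --format json, so any other explicit
        # --format value contradicts it and is rejected.
        if fmt is not None and fmt != "json":
            raise ValueError("--json and --format are mutually exclusive")
        return "json"
    # Without --json, fall back to the explicit format or the default.
    return fmt if fmt is not None else default
```

Keeping the rule in one function means the conflict check can be unit-tested without invoking the CLI.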
Vojtech	318802854c	fix(marketplace): chmod +x .sh files after fetch+reset, not just bootstrap (#352) Devin Review #350 caught a coverage gap: the chmod +x pass only ran in _bootstrap_clone (initial install), not in _git_fetch_and_reset (every subsequent `agnes refresh-marketplace` and `--check` follow-up). On core.filemode=false setups, a `git reset --hard FETCH_HEAD` overwrites the working tree without restoring the +x bit, so a hook plugin version bump would silently re-strip the bit and Permission-denied breakage would return on the next SessionStart. Extracted _chmod_clone_sh_files() helper; both _bootstrap_clone and _git_fetch_and_reset now call it. Best-effort, no-op on Windows NTFS.	2026-05-19 14:10:38 +00:00
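A best-effort chmod pass of the kind 318802854c describes might look like the sketch below. The body of `_chmod_clone_sh_files()` is not shown in this log, so the function name `chmod_sh_files`, its return value, and the choice to set the execute bit for user, group, and other are assumptions made for illustration.

```python
import stat
from pathlib import Path


def chmod_sh_files(root: Path) -> int:
    """Add execute bits to every .sh file under root; return how many changed."""
    changed = 0
    for path in root.rglob("*.sh"):
        try:
            if not path.is_file():
                continue
            mode = path.stat().st_mode
            new_mode = mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
            if new_mode != mode:
                path.chmod(new_mode)
                changed += 1
        except OSError:
            # Best-effort: a file we cannot stat or chmod is skipped, not fatal.
            continue
    return changed
```

Calling the same helper after both the initial clone and every fetch-and-reset is what closes the gap the commit describes; a second call on an already-fixed tree changes nothing.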
Vojtech	ae67c40a81	fix(onboarding): /home install flow + agnes init UX hardening (#350)
* fix(web): /home Step 2 recommends --dangerously-skip-permissions for setup The Step 4 paste runs ~20 shell commands (CLI install, workspace bootstrap, marketplace clone, MCP register, connector logins). Previous Step 2 recommended auto-accept-edits via Shift + Tab, which covers file edits but not Bash — users still clicked ~20 Yes prompts during setup. Step 2 now leads with `claude --dangerously-skip-permissions` as the recommended session flag (Bash + edits both skip). Session-scoped, drops on next plain `claude` — safe here because the pasted script is generated by this server and ends after a fixed sequence; the flag does not weaken future Claude sessions. Auto-accept-edits via Shift + Tab kept as the strict-review fallback; persistent YOLO allowlist link to /setup-advanced#yolo unchanged.
* fix(web): swap /home Steps 2↔3, claude --yolo as copy-button command Folder creation moves to Step 2; Step 3 launches Claude from that directory with `claude --dangerously-skip-permissions`. The YOLO flag is rendered through the standard .install-cmd + copy-button affordance (matching Step 1 + Step 2), not inline prose. Step 4 paste runs ~20 shell commands that auto-accept-edits would not cover (Bash still prompts), so the YOLO flag is the default recommendation; session-scoped, drops on next plain `claude`. Setup script's pwd-check warning copy refreshed to reference "/home Step 2" (the new folder-creation step number).
* fix(web): open YOLO setup-advanced link in new tab Step 3 install-hero's persistent-YOLO link now opens /setup-advanced#yolo in a new window so users don't lose their /home install context mid-setup. target="_blank" + rel="noopener" (no reverse-tabnabbing).
* fix(web): merge /home Step 3 fallback prose into prior paragraph Drop the <br><br> between the 'Session-scoped' line and the 'Prefer reviewing each command' line so the strict-review fallback flows on the same paragraph — less vertical space in the install-hero block.
* docs(web): add "What leaves your machine" privacy callout on /home Install-hero lead now includes a short privacy paragraph: explains that session telemetry (prompts / tool-calls / tool-responses) flows back to the central catalog for failure-pattern analysis while raw data rows the user queries locally stay on their machine. Points at /agnes-private as the per-session opt-out. Also collapses leftover cherry-pick conflict markers in CHANGELOG.md into one clean [Unreleased] section.
* fix(init): harden agnes init UX — 5 issues from David's report
1. chmod +x hooks. agnes init + agnes refresh-marketplace --bootstrap now set the execute bit on every .sh they land on disk (`<workspace>/.claude/hooks/.sh` after init; every `.sh` under the `~/.agnes/marketplace` clone after a bootstrap/pull). Git checkout doesn't always preserve filemode (filemode=false repos, ZIP extractions), so hooks were firing with "Permission denied" — silent SessionStart / PreToolUse breakage. Best-effort, no-op on Windows.
2. --token-file + AGNES_TOKEN. agnes init now accepts `--token-file <path>` and an `AGNES_TOKEN` env fallback alongside `--token`. Precedence: --token > --token-file > AGNES_TOKEN. The file / env-var paths dodge Claude Code's auto-classifier, which sometimes flags a long bearer token in a `--token "eyJ..."` command line as a credential-exfil pattern. The pasted setup script now uses `--token-file ~/.agnes/token` (token written via single-quoted heredoc, umask 077) for the same reason.
3. Bash(agnes ) in allow. Default `.claude/settings.json` permissions.allow seeded by agnes init now includes `Bash(agnes )` alongside the bare `Bash` entry, so Claude Code's classifier sees an explicit allow for subsequent `agnes <verb>` calls inside the workspace it just bootstrapped.
4. .zshrc PATH dedup. Setup-script step 1's PATH-persist snippet (no-CA install path) replaced with a `grep -qF + ||` idiom so a re-run doesn't append a duplicate `export PATH=...` line. Fixed-string match (not regex) per the dedup-bug report.
5. `!` prefix doc note. Setup-script step 3 now explicitly tells the user: if Claude Code blocks an `agnes` command, prefix it with `!` (e.g. `! agnes init …`) to run the command directly in the shell, bypassing the auto-classifier.
release: 0.55.1 — /home onboarding install-hero rework + agnes init UX hardening
--------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-19 15:26:35 +02:00
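The token precedence stated in ae67c40a81 (`--token` > `--token-file` > `AGNES_TOKEN`) can be illustrated with a short resolver. This is a sketch under assumptions: the function name `resolve_token`, the injectable `env` mapping, and the stripping of surrounding whitespace from the file contents are hypothetical and not confirmed by the log.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_token(
    token: Optional[str],
    token_file: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick the bearer token using --token > --token-file > AGNES_TOKEN."""
    if token:
        # An explicit --token value always wins.
        return token
    if token_file:
        # Assumption: a trailing newline left by a heredoc is not part of the token.
        return Path(token_file).expanduser().read_text(encoding="utf-8").strip()
    # Last resort: the environment variable; empty or missing means no token.
    return env.get("AGNES_TOKEN") or None
```

Passing `env` explicitly keeps the precedence testable without mutating the process environment.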
ZdenekSrotyr	64cf78860d	feat(stack): unified Browse + My Stack for Data Packages and Memory (v49 schema) (#333 ) * feat(unified-stack): Browse + My Stack + Recipes + RBAC matrix (v49–v55) Squash of 94 commits spanning the v49 → v55 unified-stack rewrite. Full per-feature breakdown lives in CHANGELOG.md under [Unreleased]. Major buckets: * v49 schema — first-class user_groups + user_group_members + resource_grants; admin can CRUD groups and grants; Google Workspace nightly sync writes into the new tables. * v49 data_packages — admin-curated bundles of tables, RBAC-gated, first-class section on /catalog Browse + My Stack. * v49 memory_domains — row-backed (replaces hardcoded VALID_DOMAINS enum); admin can CRUD; grants follow the same shape as tables and packages. * v50 cover_image_url + admin sidebar collapsibles + per-row Mode tooltip + admin queue domain badges + admin "+ New Item" seed flow. * v51 lifecycle status (prod/poc/coming-soon/draft) + category + palette swatches on admin modals. * v52 per-table detail page /catalog/t/<id>. * v53 Recipes — admin-curated SQL templates as a second tab on /catalog with full Edit/Delete admin affordances. * v54 soft-delete (deleted_at) + Undo toast for packages, memory domains, and recipes; hard_delete() retained as escape hatch. * v55 Recipes RBAC — ResourceType.RECIPE registered, inline Group Access matrix on Create + Edit Recipe modals (mirrors the Memory Domain pattern). * Activity Center per-resource filter (resource_prefix LIKE-anchored on audit_log.resource); admin nav g+letter keyboard shortcuts; loadAdminTablesLayout N+1 → single endpoint; /api/memory 30s page-level cache. * CI hardening — Keboola legacy tests pytest.importorskip; perf- smoke threshold widened to stop cold-cache flake. 5002 tests passing, 35 skipped. * feat(p2 backlog): Cmd-K palette + suggest-a-domain + nightly E2E + v55 schema 10-item P2 sweep on top of the unified-stack squash. 
New behaviour: * Cmd-K admin command palette (base.html) — fuzzy-search overlay over admin + user-facing routes. Arrows/Enter to navigate, Esc to close. * Stack-tabs digit shortcuts — 1/2/3 switch Browse / My Stack / Recipes on /catalog + /corporate-memory. * Friendlier non-admin empty state on /corporate-memory, plus a "Suggest a domain" CTA → POST /api/memory-domain-suggestions, admin queue with approve/reject. Backed by a new memory_domain_suggestions table (schema v55). * /admin/corporate-memory 7-tab strip grouped under Moderation / Catalog parent labels. * Bulk-assign table → package dropdown annotates each option with "(N of M tables already in)" so the existing distribution is visible before picking a target. * GET /api/memory + /tree accept is_required filter; admin status dropdowns route the "Required" sentinel onto it (status no longer holds 'mandatory' post-v49, so the old dropdown returned nothing). * chip-input.js is now opt-in per template via {% block extra_scripts %} instead of loaded globally on every page from base.html. * Edit-modal close helpers consolidated onto _closeEditModalById(); docs the per-source-type modal architecture decision. * New .github/workflows/e2e-nightly.yml runs agent-browser smoke scripts (scripts/e2e/smoke_.sh) against a docker-compose stack nightly at 04:30 UTC; failures open an agent-browser-nightly issue. 5012 tests passing, 35 skipped. fix(visual audit): 6 page regressions on memory + data-package surfaces agent-browser walkthrough of every memory + data-package page in the PR turned up 6 real bugs. Fixes: 1. Admin memory modals were dead. Duplicate `let _cmdNewDomainId` declarations from the deprecated step-2 RBAC stubs in admin_corporate_memory.html collided with the live state vars declared earlier in the same <script> → SyntaxError on parse → the entire second script block silently failed → every inline onclick= handler defined there (`+ New Memory Domain`, Edit, etc.) was a no-op. Removed the duplicate stubs. 2. 
/catalog/t/<table_id> + /catalog/r/<slug> rendered unstyled. Both templates injected their CSS via {% block head %} but base.html exposes {% block head_extra %} — wrong block name meant <style> rules never reached the rendered HTML. Renamed to head_extra. Hero card, section cards, dark SQL block, proper full-width inputs all now render as designed. 3. L49 leak — "MANDATORY" KPI label + "Make Mandatory" row buttons on /admin/corporate-memory still used the old word. Renamed to "Required" / "Mark as Required" so UI matches the data model (v49 split moved the Required tier onto the orthogonal is_required boolean; status no longer holds 'mandatory'). 4. Activity Center Resource dropdown didn't know the v55 `memory_domain_suggestion:` namespace — added it. 5. Tab strip on /admin/corporate-memory wrapped text 2× per button on narrow viewports after the L50 MODERATION/CATALOG group labels pushed total width past most viewports. Switched the strip to flex-wrap:nowrap + overflow-x:auto with white-space:nowrap + flex-shrink:0 on every direct child so the tabs stay one row and slide horizontally when they overflow. 5012 tests passing, 35 skipped. * rebase-cleanup: align with main's 0.54.25-27 API design + comment fix Three follow-on fixes after rebasing onto origin/main (0.54.27): * admin_tables.html: dropped a stray nested ``{% if data_source_type == 'keboola' %}`` around ``prefillFromKeboolaTable`` (main never had it; the outer Phase F2 guard already covers it) and reworded a JS comment that contained literal ``{% %}`` tokens which Jinja was parsing as a real tag → unbalanced if/endif → 30 template render failures across the suite. * /api/stack/subscription/{type}/{id}: DELETE now returns 204 instead of 200 per the 0.54.26 design rules. CLI client + parity tests updated to accept 2xx / assert 204. 
* Memory-domain suggestion approve/reject paths added to ``_VERB_PATH_ALLOWLIST`` — they are pending → approved/rejected state-machine transitions (approve also creates the real memory_domains row as a side effect), so the RPC shape is intentional rather than a missed PATCH refactor. 5035 tests passing, 35 skipped. * fix(catalog_table_detail): real polish pass — hero glyph, dedup pills, rows/size meta, scoped sync CTA The previous fix only got the block-name typo so the existing CSS rendered. The actual layout was still wireframe-tier on close inspection: * No cover glyph in the hero (a flat white card with title + meta line); data-package + memory-domain detail pages both have a colored icon square. Restored parity — table.icon emoji if set, otherwise initials on a colored square using table.color. * "INTERNAL" pill rendered twice for agnes_audit etc. — the mode pill and the source-type pill happened to be identical strings. Now skip the source pill when it matches the mode (`internal == internal`). * Bucket / source_table code chip showed `Agnes Internal.audit_log` for internal rows — meaningless to a user. Hidden when source_type is internal. * `pairs_well_with` admin input was a comma-separated `<input>` always visible. Wrapped all 4 sections in an Edit-on-demand toggle: read- only display by default, "+ Add" / "Edit" button on the right edge of each section header reveals the inline form, Cancel hides it. * "Trigger sync now" was a cramped link squashed into the empty-state flex row (visible as `Tr…` overflow before). Promoted to a proper btn-primary button under the empty-state copy. Hidden entirely for internal tables (which are server-managed — no upstream to pull). * Hero meta now surfaces row count + payload size (when sync_state has them) + last sync timestamp on a single line — was missing from the original. 
* Mode pills colored by tier (local=green, remote=amber, materialized= blue, internal=gray) so the basic fact about a table reads at a glance, not from upper-cased ALL-CAPS text alone. * tests(v56): TDD baseline for extended data-packages content + per-table docs 68 failing tests across 8 files spec the v56 surface before any implementation lands: * test_schema_v55_to_v56_migration.py — schema bump, additive ALTERs on data_packages + table_registry, idempotency, sequential-upgrade preservation * test_data_packages_repo_v56.py — repo create/update/get/list for owner_name, owner_team, tags, long_description, when_to_use, when_not_to_use, example_questions (JSON list round-trip, empty defaults, partial-update preservation) * test_table_registry_v56_docs.py — update_docs for grain, platforms, partition_col, history, gotchas; preserves v52 docs columns * test_api_data_packages_v56.py — PUT/POST/GET for all new fields, field-level validation (tag count, bullet length, description size), virtual badge derivation (curated/new) * test_api_registry_docs_v56.py — PATCH /api/admin/registry/{id}/docs for v56 fields, validation, RBAC unchanged * test_web_catalog_package_detail_v56.py — /catalog/p/<slug> rewrite asserts on rendered owner line, tag pills, badges, What it is, Use it when, Skip it when, Example questions, per-table extended detail in collapsible row, key-gotcha distinctness, admin-only Edit * test_web_stack_card_v56_metadata.py — Browse-grid card additions (owner chip, tag chips, badges) without breaking back-compat for rows missing the new fields * test_data_packages_no_vendor_content.py — CI guard: scans app/ + src/ + cli/ + config/ + scripts/ for Groupon-specific tokens from the colleague's spec MD; fails if any leak into OSS surfaces * test_db_schema_version.py — bumped 55 → 56 with rationale Plus updates schema-version assertion to 56. Implementation lands in subsequent commits (schema migration → repo → API → templates). 
* feat(v56): schema + repo for extended data-packages content Schema additions (ALTER ADD COLUMN IF NOT EXISTS — additive + idempotent): * data_packages: owner_name, owner_team, tags, long_description, when_to_use, when_not_to_use, example_questions (JSON-as-VARCHAR for the lists) * table_registry: grain, platforms, partition_col, history, gotchas (extends the v52 sample_questions / things_to_know / pairs_well_with docs surface with structured per-table content) Repo extensions: * DataPackagesRepository.create + update accept the new fields with the same Optional-is-no-op contract as v51 (pass an empty list to clear a JSON column) * _decode_row decodes the new JSON-list columns to Python lists; NULL rounds back to [] so callers don't branch * TableRegistryRepository.update_docs grew the v56 fields alongside the existing v52 ones — single PATCH can write either tier atomically * TableRegistryRepository._decode_row picks up platforms + gotchas in the same NULL-tolerant decoder 22 repo + migration tests passing. API + UI land in subsequent commits. * feat(v56): API surface for extended data-packages + per-table docs CreateDataPackageRequest + UpdateDataPackageRequest grew the v56 fields (owner_name, owner_team, tags, long_description, when_to_use, when_not_to_use, example_questions) with per-field validators that match the Foundry spec checklist: * tags: ≤8 entries × ≤30 chars * long_description: ≤4000 chars * use/skip: ≤8 bullets × ≤200 chars * example_questions: ≤12 × ≤200 chars _serialize emits all v56 fields plus a virtual ``badges`` list derived server-side at render time (no DB column needed): "curated" when the creator is in the Admin group, "new" within 30 days of created_at. Backdating created_at or admin-status changes pick up automatically. PATCH /api/admin/registry/{id}/docs extended with v56 structured per-table fields (grain, platforms, partition_col, history, gotchas). 
gotchas: list of {key: bool, body: str} Pydantic models with the same ≤8 cap; first key=true entry becomes the Key gotcha on the rendered package detail page. PATCH echoes the fresh state so callers can re-render without a second GET. 26 API tests passing (16 data-packages + 10 registry-docs). * feat(v56): /catalog/p/<slug> rewrite + Browse-grid card augmentation The third (and final) v56 commit lights up the UI surfaces backed by the schema + API commits earlier in this PR: * /catalog/p/<slug> template rebuilt around the Foundry spec's section ladder — hero (icon + name + badges + owner + tags + description + meta + Add-to-stack), "What it is" markdown body, paired "Use it when / Skip it when" panels, "Tables in this package" with collapsible per-table extended detail (grain / platforms / partition_col / history / gotchas + sample questions), and an "Example questions you can ask Claude" prompt panel. Each section guarded by ``{% if pkg.<field> %}`` — empty content fields hide the section entirely (no "No X yet" placeholder noise on the public-facing drilldown). * router catalog_package_detail hydrates per-table v56 fields onto the tables list + derives the virtual badges (curated / new) server-side from creator-in-Admin + 30-day created_at. * StackResolver.ResourceEntry grew owner_name / owner_team / tags / badges; _fetch_entries pulls the v56 columns + computes badges once per fetch using a single Admin-group SELECT. * _data_package_entry_dict adapter passes the new fields through to the macro; tags are merged source-type pills + admin-authored category tags per the spec convention. * _stack_card.html renders the v56 badges (top-left, data-badge= hooks) + the owner chip (data-card-owner hook) without breaking back-compat — pre-v56 rows render unchanged. * Admin PUT handler strips the v56 docs fields from the read-modify-write merged dict so register() doesn't blow up with the now-larger row shape (same pattern as the v52 docs fields stripping). 
5115 tests passing (+98 v56 + 18 fixed regressions from the merged- register PUT path), 35 skipped. * fix(rbac): Edit-on-package + Group-access 'required' persistence + CI vendor guard Three related bugs reported on the merged-with-main branch: 1. Clicking Edit on a Data Package card landed on /admin/tables with a `#<pkg.id>` hash that nothing listened to — admin saw the global table listing, not the editor for that specific package. Added a `?edit_package=<pkg_id>` query-param handler in admin_tables.html (analog to the existing `?edit=<table_id>` and `?assign_to=<pkg_id>` patterns) that calls openEditDataPackageModal on DOMContentLoaded after a 250ms layout settle. Updated the package-detail Edit link to use the new query param. 2. Setting Group Access to 'required' didn't persist — re-opening the modal showed 'available'. Root cause was the v49 ``resource_grants.requirement`` enum existing in the DB but the POST /api/admin/grants endpoint not surfacing it: ``CreateGrantRequest`` declared only group_id + resource_type + resource_id, so Pydantic silently dropped the matrix's ``requirement: 'required'`` payload and the new row landed at the DB column default ('available'). Plumbed ``requirement`` through ``CreateGrantRequest`` → ``ResourceGrantsRepository.create`` so the value persists in one round-trip. Plus a UNIQUE-constraint race in the matrix diff-apply: DELETE-old + POST-new ran in parallel via ``Promise.allSettled``, so POST could fire first and trip the unique check before DELETE freed the slot. Switched to sequential (await all deletes; then await all writes) across all three matrices (Edit Data Package, Edit Memory Domain, Edit Recipe). 3. CI vendor-content guard ``test_no_groupon_specific_strings_in_oss`` tripped on two of my own docstrings: a "Foundry Data team" mention in two src/db.py comments + an ``s1_session_landings`` example in cli/skills/agnes-table-registration.md. 
Rephrased the comments to "extended-descriptions admin spec" and replaced the example with a generic ``events_daily`` table name. 5164 tests passing, 35 skipped (+4 regression tests pinning the POST /api/admin/grants requirement contract). Vendor guard back to green. * fix(catalog): admin Browse path drops v58 card fields The /catalog and /memory admin god-mode branch built ResourceEntry instances inline from pkg_repo.list() / domains_repo.list() and skipped owner_name, owner_team, tags, and derived badges (curated/new). Visible symptom: a package with an owner + tags rendered with the v56 chrome for non-admin viewers but as a bare card for admins. Adds StackResolver.browse_admin(user_id, resource_type) — admin god-mode Browse that walks the full table but routes through the same _fetch_entries enrichment pass as browse(), so admin + non-admin Browse stay visually consistent. Both /catalog and /corporate-memory routes switch to it. Regression test in tests/test_stack_resolver_browse_admin.py covers: owner/tags propagation, new/curated badge derivation, in_stack from admin subscriptions, all-packages-regardless-of-grants, and the ValueError for unsupported resource types. * fix(catalog): three /catalog tab-strip UX bugs 1. Required Remove → red toast browse_admin passed empty required_ids to _fetch_entries, so the admin's own required grants surfaced as 'available' and the macro rendered an actionable Remove button that POST /unsubscribe 400'd on. Now derives required_ids from the admin's own groups so Required packages render with the disabled "In stack (required)" button. Regression test in test_stack_resolver_browse_admin.py. 2. Remove green-toasts but card stays until refresh The My-Stack empty-state placeholder was only emitted server-side when stack_entries was empty at render time. Removing the last card left the tab completely blank — users read that as "Remove didn't work, let me refresh". 
Both grid + empty-state are now always rendered with one of them initially hidden; the JS swaps visibility on add/remove instead of injecting DOM. Same fix in /corporate-memory. 3. "What are Recipes?" + ambiguous (admin) suffix Recipes tab now carries its own curator-block explainer (the shared one was moved inside Browse view so it doesn't bleed across tabs). The grey "(admin)" suffix becomes a yellow .admin-only-hint chip with a title tooltip — visibility hint is now unambiguous: yellow chip = "only you see this", non-admins don't see the affordance at all. * schema: renumber v51..v58 → v52..v59 to make room for main's v51 Main 0.54.29 introduced a NEW v51 (table_registry.bq_fqn — issue #343) that releases ahead of this branch. The unified-stack chain v51..v58 shifts up by one so main's v51 stays as the released schema and ours become v52..v59. Function names, internal version bumps, dispatch ladder thresholds, and the migration-test references all move together. Subsequent merge with main lands the bq_fqn column at the freed v51 slot. * fix(seed): seed admin lands in BOTH Admin AND Everyone groups The LOCAL_DEV_MODE / SEED_ADMIN_EMAIL bootstrap only added the seed user to Admin. Everyone-scoped grants — the canonical "every-user- sees-this" pattern for Required onboarding — didn't surface for the seed admin's own /catalog because they weren't in Everyone. Symptom: admin grants a Required-tier package to Everyone, then sees it on /catalog still rendered with an "Add to stack" button (because the admin's resolved required_ids was empty for that package). The dual-membership keeps Admin (authorization) and Everyone (default-grant target) intentionally separate per the design comment on UserRepository.create — every membership remains traceable to a concrete row, just now with a system_seed row in Everyone too. 
Both INSERTs go through UserGroupMembersRepository.add_member which is idempotent on (user_id, group_id), so re-fires on every lifespan startup don't duplicate rows. Regression test in test_main_seed_admin_everyone.py. * style: unify admin-only hints across marketplace + memory detail pages Replaces three stale ``(admin)`` parentheticals with the same yellow ``admin-only`` chip introduced for /catalog tab actions. Same tooltip copy ("Visible only to admins — analysts won't see this …") so the visibility hint is unmistakable wherever it appears: - Hard delete on marketplace_plugin_detail (admin-only destructive action — same gating as the original suffix conveyed). - Hard delete on marketplace_item_detail (same). - Edit link on memory_domain_detail (title-attr only before; now a visible chip too). Non-admin viewers never saw these affordances — the gates are unchanged. Pure styling pass for consistency. * fix(catalog): exclude soft-deleted data packages + memory domains from Browse ``StackResolver._fetch_entries`` and ``browse_admin`` were querying data_packages / memory_domains without a ``deleted_at IS NULL`` guard. A package soft-deleted via /admin/* (v54 soft-delete contract) stayed visible on /catalog and /memory until either an Undo or a hard delete — directly contradicting the soft-delete UX which is supposed to remove the affordance immediately and only retain the row for the Undo window. The repository accessors (DataPackagesRepository.list, MemoryDomainsRepository.list, list_packages_of_table, etc.) already filter deleted rows; this commit brings the resolver's direct SQL in line with that contract. Regression test in test_stack_resolver_browse_admin.py. * fix(catalog): Add/Remove updates full card chrome, not just button The previous _applyStackChange flipped only the footer button label — the card border (.is-in-stack class), top-right "In stack" badge, and button color class (--add / --remove) stayed at their server-rendered state. 
After Add the user saw the button checkmark but the rest of the card still looked like "available, not in stack". They read this as "the change didn't take — let me refresh". This commit makes the optimistic update mirror what the server-side macro renders for the new state: * ``c.classList.toggle('is-in-stack', becameInStack)`` — flips the border + visual state class. * Top-right ``.stack-card__req-badge--instack`` badge is injected on Add, removed on Remove (skipped when ``data-requirement='required'`` — that slot is owned by the Required badge). * Button text is "Remove" / "+ Add to stack" matching the macro (was "✓ In stack" which was visually nice but inconsistent). * Button color class --add / --remove swaps so the destructive Remove tint kicks in immediately. The clone-into-My-Stack path applies the same updates so the new card in My Stack reads identically to a server-rendered in_stack card. Mirrored in /corporate-memory. * fix(memory): four Devin-review bugs on /memory drill-down + manifest PR #333 Devin review surfaced four real bugs that ship a broken /memory experience even though the unit tests passed. 1. Manifest md5 omits is_required + content (app/api/sync.py:836-840) _build_memory_domains_section hashed only (id\|title\|status) per item. _build_per_domain_markdown routes items between "## Required" and "## Approved" by is_required and embeds full content — so an admin edit of either dimension left the manifest md5 unchanged, `agnes pull` skipped the re-fetch, and the analyst kept a stale bundle.md. Now both fields participate in the hash. 2. required_count always 0 (src/repositories/memory_domains.py) list_items_of_domain only SELECTed (id, title, status) so the `it.get("is_required")` in the manifest builder always evaluated to None → required_count = 0 regardless of actual state. The manifest builder advertised a count it could never compute. Now projects is_required + content too (required by fix 1 anyway). 3. 
Vote URL 404 (memory_domain_detail.html:289-290) Constructed `/api/memory/items/{id}/vote` but the route is `/api/memory/{id}/vote`. Every upvote/downvote button was a silent no-op. 4. Dismiss/undismiss URL + method both wrong (memory_domain_detail.html:296-305) Constructed `/api/memory/items/{id}/dismiss` (extra /items/) and /undismiss (no such route — undismiss is DELETE on /dismiss). Both buttons silently 404'd. Now POST + DELETE on `/api/memory/{id}/dismiss` per app/api/memory.py:635/675. * fix: multi-agent reviewer findings — vendor-token scrubs + manifest md5 predicate + soft-delete filter Three reviewer findings from the multi-agent review on PR #333, fixed in-place per CLAUDE.md issue-economy rule. Reviewer-rules (Important — vendor-agnostic OSS): - app/main.py:218 comment: replaced 'foundryai-prod' with generic 'a customer prod instance' phrasing. Public OSS repo must not carry customer-specific tokens (CLAUDE.md § Project conventions). - tests/test_table_registry_v56_docs.py:70 fixture string: replaced "user_brand_affiliation = 'groupon'" with 'acme' on the same rule. Reviewer-architecture (closes still-unresolved Devin 🚩 ANALYSIS): - app/api/sync.py _build_memory_domains_section: md5 hash loop now filters items to the SAME predicate the bundle renderer uses (is_required OR status='approved'). Pre-fix the hash iterated ALL items but _build_per_domain_markdown only rendered the union of required items + approved-non-required items — so an admin edit to a pending/rejected non-required item flipped the md5 against an identical-bytes bundle, triggering a wasteful re-fetch on every analyst's next 'agnes pull'. The earlier commit fixed the hash-input fields (is_required + content); this closes the set-of-items asymmetry Devin separately flagged. 
Reviewer-RBAC (minor cleanup): - app/resource_types.py _data_package_blocks and _memory_domain_blocks now filter 'WHERE deleted_at IS NULL' (v54 soft-delete column) so the /admin/access UI doesn't surface soft-deleted entities as grantable. Mirrors the existing filter on _recipe_blocks. No security leak pre-fix (resolver double-filters and re-checks at serve time), just UI cleanliness. - app/services/stack_resolver.py add_to_stack: docstring note added explaining that authorization is enforced at the API layer (app/api/stack.py can_access gate), not at the resolver. The initial review suggested adding a defensive 403 here, but that broke 5 existing tests that legitimately call add_to_stack directly without setting up grants first; the docstring captures the contract instead. stack() already intersects subscriptions with current available_ids on every read, so a 'zombie' row from a misuse never leaks into the user-facing manifest. * release: 0.55.0 — unified Browse + My Stack (Data Packages + Memory), schema v48→v59, 3 BREAKING	2026-05-19 15:00:15 +02:00
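The two manifest md5 fixes in 64cf78860d (hash `is_required` and `content`, and hash only the items the bundle actually renders) can be sketched together. This is not the code in app/api/sync.py: the function name `rendered_items_md5`, the field order, the `|` separator, and the sort by id are assumptions chosen for illustration; only the predicate (`is_required` OR `status == 'approved'`) and the set of hashed fields come from the commit message.

```python
import hashlib
from typing import Any, Iterable, Mapping


def rendered_items_md5(items: Iterable[Mapping[str, Any]]) -> str:
    """Hash only the items a bundle would render, including every rendered field."""
    # Same predicate the message gives for the renderer: required or approved.
    rendered = [
        it for it in items
        if it.get("is_required") or it.get("status") == "approved"
    ]
    digest = hashlib.md5()
    # Assumption: a stable order so the hash does not depend on query order.
    for it in sorted(rendered, key=lambda it: str(it["id"])):
        fields = (
            it["id"],
            it["title"],
            it["status"],
            bool(it.get("is_required")),
            it.get("content") or "",
        )
        digest.update("|".join(str(f) for f in fields).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()
```

With this shape, editing a pending non-required item leaves the hash unchanged (no wasted re-fetch), while editing content or flipping `is_required` on a rendered item changes it.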
minasarustamyan	c6c72b9c00	feat(flea): marketplace refactor — data model, attribution, UI unification (#342 ) * feat(flea): phase-1 — title, tagline, synthetic_name columns + upload UX Schema v49 adds three user-facing metadata columns to store_entities: - title (NOT NULL) — humanized display name shown on marketplace surfaces in later phases. Acronym-aware humanizer in src/store_naming.py (27 entries: MCP, API, OAuth, S3, …) shared with the frontend via Jinja-injected dict so JS pre-fill and Python backfill produce identical output. - tagline (NULL, ≤200 chars) — optional short description for card listings. Long-form `description` stays. - synthetic_name (NOT NULL) — deterministic `<name>-by-<owner_username>` stored as a column for indexing and as the single source of truth for attribution lookups in later phases. Today's bundle bake still uses suffixed_name() at the same call sites. Migration (_v48_to_v49_migrate, Python function — humanize has no SQL equivalent) backfills existing rows: title from humanize_name(strip_archive_suffix(name)), synthetic from the concat formula; tagline stays NULL. Idempotent (ADD COLUMN IF NOT EXISTS + SET NOT NULL no-op on re-run). Upload form (store_upload.html step 2) reorders fields: Title (pre-filled from server-side humanize, JS keeps it in sync until the user edits manually) → Name + dark synthetic preview on one row (matches marketplace_item_detail.html dark code styling, no copy button — preview only) → Short description with character counter → Description (unchanged). Edit form (store_edit.html) mirrors the layout with pre-filled values from the entity row. API: - POST /api/store/entities/preview returns `title` (humanized fallback) for upload form pre-fill. - POST + PUT /api/store/entities accept `title` and `tagline` form fields with 100/200-char validation; PUT recomputes synthetic_name when `name` changes (caller responsibility per repo contract). - StoreEntityResponse exposes all three new fields. 
Repository: - create() takes title + tagline + synthetic_name as optional kwargs with derived defaults (humanize_name(name) / concat) so existing test fixtures don't need to thread them. - update() supports partial updates on all three; tagline empty string clears via NULL sentinel. - archive() recomputes synthetic_name on rename to the archived slug so the column stays consistent with name. Tests: - New test_schema_v48_to_v49_migration.py: fresh install, populated-row backfill (incl. archived row strip), idempotence, NOT NULL constraint verification. - test_store_naming.py: 14 humanize parametrize cases + acronym dict invariants. - test_store_api.py::TestStoreV49Metadata: preview humanize, POST with explicit + fallback title, 100/200-char rejects, PUT partial update + synthetic recompute on rename. - Schema version assertion bumps (48 → 49) in test_db_schema_version, test_home_stats, test_schema_v42_migration, test_schema_v46_migration. Phase 1 only — surface rendering on cards / detail pages and Claude Code bundle propagation come in later phases. * feat(flea): phase-2 — wire title/tagline/owner through marketplace cards + detail pages Phase 1 (7f4cfcbb) populated the three new columns on store_entities; phase 2 surfaces them across the web presentation layer so the kebab-case slug + bare username no longer leak into user-facing copy. API: - `_flea_to_item` now takes `conn` (both callsites updated) and sets `display_name=entity.title`, `tagline=entity.tagline`, `owner=_resolve_owner_display(conn, owner_user_id, owner_username)` — matches the chain the curated path already uses (users.name → users.email → fallback). The card JS chain `it.display_name || it.name` then renders the friendly form; `name` stays at the suffixed slug as the technical identifier JS uses for fallbacks. 
- `flea_detail` adds `display_name` + `tagline` to PluginDetailResponse so the standalone skill/agent + plugin detail heroes pick them up through the existing `d.display_name` / `d.tagline` chains. - `_flea_inner_parent_fields` swaps `parent_display_name` from `strip_archive_suffix(name)` to `entity.title or strip_archive_suffix(name)`. Drives parent-plugin label in four surfaces at once: breadcrumb 3rd segment, hero "part of <plugin>" meta-row, helper "This skill is part of <plugin>" panel, and the Details sidebar's "Parent plugin" row. Templates — `marketplace_item_detail.html`: - Pre-render: browser title, hero h1, and hero-window-label read `(entity.title if entity else None) or inner_name or item_name or plugin_name` so the SSR shell shows the friendly title before the JS fetch lands (no flash of kebab-case). - Breadcrumb last segment for flea standalone drops the `d.manifest_name || heroTitle` fallback in favour of just `heroTitle` — manifest_name is the suffixed slug and users explicitly didn't want it in the path. - Hero meta-row for flea standalone is now hidden. The prior "by <author> · N installed · <size>" line duplicated install count (hero telemetry chip below), owner + bundle size (Details sidebar). Templates — `marketplace_plugin_detail.html`: - Same SSR pre-render swap (title, h1, window-label, crumb-name). - Hero tagline element starts hidden; JS shows it only when `d.tagline` is truthy. Pre-fix it fell back to `d.description` (long-form text), which read awkwardly under the h1 and pulled the hero too tall. Description still renders in the "What it does" panel below the hero. - Initial "Loading…" placeholder removed so entities without a tagline don't flash that text mid-fetch. 
Tests: - New `TestFleaPhase2Presentation` class in test_marketplace_api.py (6 cases): card title + tagline + full-name owner, owner fallback chain when users.name is NULL, flea_detail exposes title + tagline, tagline null when omitted, inner skill parent_display_name uses entity.title (explicit + humanize-fallback variants). - Updated `TestListItems.test_flea_lists_uploads` to assert both `display_name == "Alpha"` (humanized) and `name == "alpha-by-alice"` (suffixed slug compat). - Updated `TestWebPages.test_marketplace_flea_detail_page_renders` to look for the humanized title ("Page Skill") in the SSR shell instead of the kebab-case `page-skill`. * feat(flea): phase-3 — read synthetic_name from DB, suffixed_name() only on write Phase 1 added the column + backfill, repo write paths keep it in sync. Phase 3 routes every READ callsite through `store_entities.synthetic_name` directly instead of recomputing `<name>-by-<owner_username>` on the fly, and switches the collision query off the inline string concat. The `suffixed_name()` primitive now lives exclusively in write flows. Read callsites updated (all read `entity["synthetic_name"]` directly, no fallback — the column is NOT NULL and a missing value would be a real bug worth surfacing as KeyError): - app/api/marketplace.py:_flea_to_item — card MarketplaceItem.name. - app/api/marketplace.py:flea_detail — PluginDetailResponse.manifest_name. - app/api/store.py:_entity_to_response — StoreEntityResponse.invocation_name. - app/api/store.py PUT bundle re-bake — `suffixed` passed to `_bake_plugin_tree`; entity is loaded pre-rename, so its synthetic_name is the OLD value `_bake_plugin_tree` expects. - app/api/store.py PUT rename — `old_suffix` for `_rename_baked_tree`. - app/api/my_stack.py — StoreInstallEntry.invocation_name. - src/marketplace_filter.py — manifest_name in served plugin entry. `suffixed_name` imports removed from marketplace.py, my_stack.py, and marketplace_filter.py (no remaining callsites). 
store.py keeps the import for its write paths: - POST create (`suffixed = suffixed_name(final_name, username)` → passed to `_bake_plugin_tree` and `repo.create(synthetic_name=...)`). - PUT rename collision check (`new_suffixed`). - PUT rename `new_suffix` for `_rename_baked_tree` (proposed value). - PUT rename `new_synthetic` for `repo.update(synthetic_name=...)`. - Archive `old_suffix` + `new_suffix` for `_rename_baked_tree` (retro-compute pre-archive value after `repo.archive` already overwrote the DB row with the post-archive synthetic). Collision SQL — `_suffixed_already_taken`: WHERE name || '-by-' || owner_username = ? (before) WHERE synthetic_name = ? (after) Both predicates match the same rows today (phase 1 backfill + NOT NULL invariant + write paths in sync); the new form is indexable and a single source of truth going forward. Repository: - UserStoreInstallsRepository.list_for_user explicit SELECT extended with `se.title`, `se.tagline`, `se.synthetic_name` so my_stack and marketplace_filter callers can read them off the joined row. Tests: - test_store_api.py::test_invocation_name_reads_from_synthetic_column — upload entity, manually override the column with a non-canonical value, verify GET response returns the override (proves the read path consumes the column rather than recomputing it). - test_marketplace_api.py::test_flea_card_and_detail_read_synthetic_name_from_db — same proof for `MarketplaceItem.name` (card) and `PluginDetailResponse.manifest_name` (detail). * feat(flea): phase-4 — rename agnes-store-bundle → flea (synthetic plugin) The synthetic plugin that wraps loose flea-market skills + agents into one Claude Code plugin is renamed from `agnes-store-bundle` to `flea`. Plugin-type flea uploads (their own standalone plugin entry) are unaffected. 
Constants: - src/marketplace_filter.py: - BUNDLE_PLUGIN_NAME: "agnes-store-bundle" → "flea" (Claude Code plugin manifest name + .claude-plugin/plugin.json name) - BUNDLE_PREFIXED_NAME: "store-bundle" → "flea" (on-disk ZIP / git tree path, now plugins/flea/...) Attribution layer (services/session_processors/usage_lib.py): - FLEA_BUNDLE_PREFIX: "agnes-store-bundle" → "flea". The JSONL invocation identifier going forward is `flea:<skill-name>`. - New `_LEGACY_FLEA_BUNDLE_PREFIXES = ("agnes-store-bundle",)`. `MarketplaceItemLookup.resolve()` + `_attribute_event()` accept BOTH the new and the legacy prefix so historic usage_events (~90-day retention) continue attributing to source='flea'. The tuple becomes a no-op once the rename has been live past the retention window — a follow-up commit can drop it then. - USAGE_PROCESSOR_VERSION bumped 6 → 7 so the session-pipeline reprocess loop re-runs attribution with the new + legacy prefix branches. User-facing copy: - /api/store/bundle.zip Content-Disposition filename: agnes-store-bundle.zip → flea.zip - `agnes admin store pull` default --out: agnes-store-bundle.zip → flea.zip - Docstrings + JS comment + welcome template comment updated. Tests: - skill_flea.jsonl fixture identifier updated to flea:flea-skill. - New skill_flea_legacy.jsonl with the legacy prefix for backward-compat coverage. - New test `test_legacy_agnes_store_bundle_prefix_resolves` replays the legacy fixture and asserts source='flea' attribution still lands. - All other test assertions / mocks substituted mechanically: test_session_processor_usage.py, test_usage_rollups.py, test_marketplace_filter_store.py, test_store_api.py, test_cli_refresh_marketplace.py. - `_seed_flea_entity` (test_usage_rollups.py) + `_seed_attribution` (test_session_processor_usage.py) helpers now supply the NOT NULL `title` + `synthetic_name` columns from phase 1, since they INSERT directly bypassing the repo's create() fallback. 
Client rollover note (CHANGELOG): `agnes refresh-marketplace` will install the new `flea@agnes` plugin, and the local marketplace clone's `plugins/store-bundle/` source folder is removed via `git reset --hard`. Whether Claude Code itself auto-prunes the orphan `agnes-store-bundle@agnes` registry entry is undocumented and still needs to be verified empirically on the dev VM. If the orphan entry lingers, a follow-up will add targeted cleanup; until then users can manually run `claude plugin uninstall agnes-store-bundle@agnes`. Verified locally: 98 passed (session_processor_usage + usage_rollups + marketplace_filter_store + cli_refresh_marketplace) + 228 passed/2 skipped (store_api + marketplace_api + admin_store_submissions + store_entity_versions + store_repositories). * fix(flea): phase-5 — attribution keyspace mismatch (closes #335) Pre-fix every flea skill/agent invocation silently fell through to `usage_events.source = 'builtin'`. Root cause: lookup tables in `services/session_processors/usage_lib.py` keyed `_flea_entities` (and the derived `_flea_plugins` set) by `store_entities.name` — the un-suffixed display name. Claude Code writes invocations as `flea:<synthetic_name>` (e.g. `flea:xlsx-by-c-marustamyan`), so `dict.get(local)` always missed and the resolver fell through to builtin. Result: marketplace cards, detail telemetry chips, admin group-by-source all showed 0 flea invocations even when the raw JSONL stream was correct. Phase 1 added the `synthetic_name` column + backfill; phase 4 renamed the bundle prefix to `flea`; phase 5 finally flips the lookup keyspace to match what JSONL writes. usage_lib.py: - `MarketplaceItemLookup.__init__` preload: `SELECT synthetic_name, type FROM store_entities` (was `SELECT name, type`). `_flea_plugins` set derived from those keys, so it now carries synthetic_names too — matches what Claude Code writes when invoking a skill nested inside a flea plugin (`<synthetic>:<inner>`). 
- `rebuild_rollups` preload: same SELECT change; also derives `flea_plugins` and threads it through `_aggregate_events` / `_rebuild_window`. - `_attribute_event`: signature extended with `flea_plugins`; new branch `if prefix in flea_plugins: return ("flea", default_type, prefix, local)` for flea-plugin-nested skills/agents. This branch was added to `MarketplaceItemLookup.resolve()` in v6 (commit e076ebbe) but the rollup builder's helper was never updated to match, so nested skills inside flea plugins silently dropped out of the daily/window fact tables. - `USAGE_PROCESSOR_VERSION`: 7 → 8. Forces the session-pipeline reprocess loop to re-attribute existing usage_events rows with the corrected lookup so rollup tables fill correctly on the next tick. marketplace.py — 4 API stats lookup callsites switched from `entity["name"]` to `entity["synthetic_name"]`: - `_flea_to_item` (card stats lookup) - `flea_detail` (`_build_telemetry` + `_load_inner_items_stats_by_parent`) - `flea_skill_detail` (inner detail `parent_plugin` key) - `flea_agent_detail` (inner detail `parent_plugin` key) Tests: - `skill_flea.jsonl` invocation: `flea:flea-skill` → `flea:flea-skill-by-alice` (mirrors what Claude Code writes after phase 1/4 — the suffixed synthetic_name). - `test_flea_skill_attributed_with_empty_parent` assertion: rollup `name` column now carries the synthetic_name. No legacy `agnes-store-bundle` prefix backward compat — clean cut per user direction (dev phase, no production data worth preserving). Verified locally: 53 passed targeted (session_processor_usage + usage_rollups + marketplace_filter_store) + 215 passed/2 skipped broader (store_api + marketplace_api + admin_store_submissions + store_entity_versions). * fix(flea): phase-6 — plugin-level rollup aggregation parity for flea Flea plugin entity cards + detail pages showed 0 invocations even though nested skills had correct rollup rows. 
Root cause: the plugin-level aggregation pass in `_aggregate_events` was hardcoded to `source='curated'` only: if source != "curated" or not parent: continue if group_by_day: pkey = (day, "curated", "plugin", "", parent) else: pkey = ("curated", "plugin", "", parent) So flea plugin entities never got a synthetic `(source='flea', type='plugin', parent_plugin='', name=<synth>)` row aggregating nested invocations. `_load_invocation_stats('flea')` filters `parent_plugin = ''` and returned no row for flea plugin entity cards, so `stats.get(entity["synthetic_name"])` missed and the API exposed 0/0. Triggered by empirical observation on the dev VM — `codex-second-opinion-by-c-marustamyan` plugin showed 0 calls in the listing card while its three inner skills (codex-setup ×3, codex-review ×1, codex-second-opinion ×1) had the expected child rollup rows. Fix: - Extend the guard to `source in ("curated", "flea")`. - Replace the hardcoded `"curated"` in the `pkey` tuple with the loop's `source` variable, so flea aggregation lands as `source= 'flea'` and curated aggregation continues landing as `source='curated'`. API path unchanged — `_load_invocation_stats('flea')` filters `parent_plugin = ''` already picks up the new aggregated row alongside standalone skill/agent rows. Rollup `name` field carries the synthetic_name keyspace; no collision between standalone entity synthetic and plugin entity synthetic (global suffix uniqueness enforced by `_suffixed_already_taken`). `USAGE_PROCESSOR_VERSION` bumped 8 → 9 to force a reprocess pass so historic nested-invocation data fills the new plugin-level rows on the next tick (instead of waiting for the next live invocation). 
Tests: - New `test_flea_plugin_row_aggregates_children` mirrors the existing `test_curated_plugin_row_aggregates_children`: seeds a flea plugin entity, three nested events (one user invoking two skills, a second user invoking one) → asserts the aggregated plugin row carries count=3, distinct_users=2 (union, not sum), plus the child rows survive alongside. Verified locally: 43 passed (session_processor_usage + usage_rollups) + 82 passed/2 skipped broader (+ marketplace_filter_store + marketplace_api). * refactor(marketplace): phase-7 — unify Details sidebar across detail surfaces Five marketplace detail surfaces (curated plugin, flea plugin, curated inner skill/agent, flea inner skill/agent, flea standalone skill/agent) had drifted on which Details rows they show and what order — the same field landed in different positions, some fields duplicated hero info, and the flea plugin Owner row leaked the kebab-case `owner_username` slug instead of the user's real name. This commit aligns all five surfaces on a single scan order driven by UX priority: identity → life-stage → telemetry → debug-tier Concretely: 1. Curator / Owner (first scan signal — trust) 2. Parent plugin (inner skill/agent only) 3. Released (top-level only — plugins + flea standalone) 4. Last used (recency) 5. Active days (engagement consistency) 6. Version (flea standalone only — content hash) 7. Bundle size (debug-tier) Dropped: - Slug field on plugin detail surfaces (`marketplace_id` for curated, `entity_id` for flea). Pure debug info, never user-relevant; URL already carries it. - Category + Installs on flea standalone skill/agent detail. Category is already shown as a hero badge; install count is in the hero telemetry chip — sidebar duplication added noise. 
Owner display: - Flea plugin Owner row now reads `d.owner_display` (resolved through `users.name → users.email → owner_username` by `_resolve_owner_display` in `app/api/marketplace.py:1491`) instead of the raw `d.author_name` (which is `owner_username`, the kebab-case slug). API field already populated from phase 2; templates just consume it. - Curated Curator row continues to read `d.author_name` from marketplace-metadata.json; `owner_todo` placeholder behavior preserved. Files: - app/web/templates/marketplace_plugin_detail.html — rewrote the Details render loop (lines 1364-1427 area). Slug row removed, rows reordered, Owner branch reads `d.owner_display`. - app/web/templates/marketplace_item_detail.html — both branches of the Details sidebar (inner skill/agent + flea standalone) re-laid around the same scan order. Telemetry helper unchanged, just repositioned. Category + Installs rows removed from the standalone branch. No new tests — no existing test asserts the precise order of Details rows or references the dropped fields in a sidebar context (grep confirmed). API surface unchanged. Verified locally: 84 passed / 2 skipped on `test_marketplace_api.py` + `test_store_api.py`. * fix(flea): post-review hardening — N+1, v50 UNIQUE, docs, test cleanup Addresses 5 critical findings from PR #342 code review: 1. N+1 query in `_flea_to_item` — owner-display resolution previously ran one `SELECT … FROM users WHERE id = ?` per item in the listing comprehension. Now batched via `_load_users_display` IN-query prefetch; 50 items drops 51 user queries to 2. Regression-guarded by `TestFleaOwnerDisplayBatched` (spies `_resolve_owner_display` and asserts it's not called inside the list path). 2. Misleading comment in `src/marketplace_filter.py` claimed the attribution layer accepts both `agnes-store-bundle` and `flea` prefixes — it doesn't (clean cut per CHANGELOG). Rewrote to match reality. 3. CHANGELOG `[Unreleased]` had two `### Changed` blocks. 
Merged into one (BREAKING bullet first). 4. New v49→v50 migration adds `UNIQUE INDEX idx_store_entities_synthetic_name`. v49 made `synthetic_name` the canonical attribution key but uniqueness was only app-enforced; v50 promotes the invariant to the DB layer. Migration pre-checks for existing duplicates and raises `RuntimeError` listing them rather than letting `CREATE UNIQUE INDEX` fail mid-way. v48→v49 migration gained an `is_nullable='YES'` guard on its `SET NOT NULL` ALTERs so re-runs on a fully-migrated DB don't trip DuckDB's "cannot alter entry … entries depend on it" block (the new index counts as such an entry). Index is created by the migration only — keeping it out of `_SYSTEM_SCHEMA` preserves fresh-install ordering (CREATE TABLE → v49 ALTERs → v50 CREATE INDEX). 5. Deleted three redundant version-pinned schema asserts whose names lied about their bodies (`test_schema_version_is_42` asserting `== 49`, etc.). Canonical assert lives in `test_db_schema_version.py`, renamed to `test_schema_version_matches_constant`. * fix(db): gate v34→v38 store_entities ALTER COLUMN steps on column state CI on Linux failed `test_v17_to_v18_drops_*` after the v50 UNIQUE INDEX landed. Root cause: those tests open a DB at the full target version, seed fixtures, then reset `schema_version` to 17 and reopen — forcing the ladder to re-run from 17 → current. With the v50 index now in place, DuckDB blocks intermediate `ALTER COLUMN` steps on `store_entities` ("Cannot drop this column: an index depends on a column after it!" / "Cannot alter entry because there are entries that depend on it"), because `synthetic_name` (the indexed column) sits positionally after the columns those steps touch. Fix: convert the three SQL-list migrations that hit store_entities into defensive Python functions: - `_v34_to_v35_migrate` short-circuits when `synthetic_name` already exists (post-v49 shape — the visibility_status rebuild is moot and the DROP COLUMN would be blocked by the index). 
- `_v35_to_v36_migrate` gates the `visibility_status SET NOT NULL` + `SET DEFAULT` on `is_nullable='YES'` so it's a true no-op when the column is already constrained. - `_v37_to_v38_migrate` gates the `version_no SET NOT NULL` step the same way. Forward-roll path (real installs that never reset schema_version) is unchanged: the gates fire `YES` → ALTERs run. The fix only changes behavior for the "DB is already at v50 shape but version row says 17" scenario the tests construct. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>	2026-05-19 02:32:41 +02:00
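The phase-6 plugin-level aggregation described in the commit above can be sketched as follows. This is a simplified illustration, not the real `_aggregate_events`: it assumes events are dicts with `source`, `parent_plugin`, and `user_id` keys, ignores the `group_by_day` variant, and `aggregate_plugin_rows` is a hypothetical name. The two points it shows are the widened source guard and the use of the loop's `source` in the key instead of a hardcoded `"curated"`.

```python
from collections import defaultdict

# Sources whose nested invocations roll up into a plugin-level row.
PLUGIN_SOURCES = ("curated", "flea")


def aggregate_plugin_rows(events):
    """Build plugin-level rollup rows keyed by (source, 'plugin', '', parent)."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for ev in events:
        source, parent = ev["source"], ev["parent_plugin"]
        if source not in PLUGIN_SOURCES or not parent:
            continue  # standalone items and other sources have no plugin row
        pkey = (source, "plugin", "", parent)  # source comes from the event
        counts[pkey] += 1
        users[pkey].add(ev["user_id"])  # set union, so users are not double-counted
    return {
        key: {"count": counts[key], "distinct_users": len(users[key])}
        for key in counts
    }
```

Tracking users in a set per key is what makes `distinct_users` a union rather than a sum across child skills.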
minasarustamyan	7907b8082e	perf(cli): use git ls-remote in refresh-marketplace --check (~8s -> ~1s) (#313) * perf(cli): use git ls-remote in refresh-marketplace --check (~8s -> ~1s) The SessionStart hook fires agnes refresh-marketplace --check on every Claude Code session in every workspace. The detector used to run git fetch origin against the per-user marketplace bare repo just to update FETCH_HEAD for a HEAD-vs-FETCH_HEAD comparison — paying the full git object/metadata download (~8s) even on the (overwhelming) no-change path. Replace with git ls-remote origin HEAD: one HTTPS round-trip, one line of text, no objects transferred. Compare the returned SHA against local HEAD via the new _remote_head_sha and _local_head_sha helpers; emit the /update-agnes-plugins hook JSON on mismatch, stay silent on match. Same PAT wiring (AGNES_TOKEN in env, never on argv), same exit-code contract (remote-read failure -> exit 1 so the hook's `|| true` swallows it). The default and --bootstrap paths still do real git fetch + reset --hard — they need the objects. * docs(cli): update --check help text to mention git ls-remote, not git fetch Followup to the previous commit — the --check flag's user-facing help string still described the old git fetch + FETCH_HEAD comparison. Updated to match the new ls-remote-based implementation. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>	2026-05-15 06:24:27 +02:00
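A rough sketch of the ls-remote comparison described in the commit above, assuming a plain `git` binary on PATH. The function names here are illustrative stand-ins, not the actual `_remote_head_sha` / `_local_head_sha` helpers, and token wiring is omitted.

```python
import subprocess
from typing import Optional


def parse_ls_remote_head(output: str) -> Optional[str]:
    """Extract the SHA from `git ls-remote origin HEAD` output ("<sha>\\tHEAD")."""
    for line in output.splitlines():
        sha, _, ref = line.partition("\t")
        if ref.strip() == "HEAD" and sha.strip():
            return sha.strip()
    return None


def remote_head_sha(repo_dir: str) -> Optional[str]:
    """One round-trip to the remote; no objects are fetched."""
    result = subprocess.run(
        ["git", "ls-remote", "origin", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_ls_remote_head(result.stdout)


def local_head_sha(repo_dir: str) -> str:
    """SHA of the local HEAD commit."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def needs_update(local_sha: str, remote_sha: Optional[str]) -> bool:
    """True only when the remote answered and its HEAD differs from ours."""
    return remote_sha is not None and remote_sha != local_sha
```

In this sketch a failed remote read surfaces as `subprocess.CalledProcessError`, which a caller could map to the non-zero exit the commit message describes.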
ZdenekSrotyr	d55c8a3c33	feat(web): consolidate the personal /me/* surface — /me/activity + /me/profile (#304 ) Consolidates the scattered per-analyst pages into /me/activity (usage analytics) and /me/profile (account hub). /me/stats and /profile/sessions 301-redirect; /profile, /me/debug, /tokens are removed with every internal link repointed. Includes an XSS fix in the /me/activity page hero, the user_id-keyed session-lookup alignment, and the v0.54.15 release cut. Co-developed by @ZdenekSrotyr and @cvrysanek.	2026-05-14 21:29:51 +02:00
minasarustamyan	17159bfad9	fix: refresh-marketplace enables stack plugins; override sentinel is init-time only (#307 ) * fix(refresh-marketplace): also enable stack plugins in workspace settings Reconcile previously stopped at `claude plugin install --scope project`, which only writes the global plugin registry. Without an entry in the workspace `.claude/settings.json` `enabledPlugins` map, Claude Code treats every plugin as disabled — `/plugins` doesn't list them and their slash commands, skills, and agents are unreachable. Refresh now writes the enable map after install/update, treating the user's marketplace stack as the source of truth (re-enables anything a prior `claude plugin disable` locally turned off). Override workspaces are skipped via `is_override_workspace`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(override): sentinel governs init only, not runtime CLI Sentinel `.claude/init-complete` with `override: true` was meant to let admins ship INITIAL workspace content. The implementation was over-scoped — `is_override_workspace` check sat inside every Agnes writer (`install_claude_hooks`, `install_claude_commands`, `maybe_refresh_claude_hooks`, `_enable_plugins_in_workspace_settings`), which blocked runtime commands too. Operators on override workspaces got trapped at the template snapshot: no `enabledPlugins` map from `agnes refresh-marketplace`, no hook auto-migration from `agnes self-upgrade`. Move the check to the init-time call site (cli/commands/init.py, `if not override_active:`) — the single place where init-time skip is the right behavior. Writers themselves become unconditional; runtime CLI now updates `.claude/` regardless of the sentinel. Admin custom hooks survive — refresh only rewrites entries matching `_OUR_COMMAND_MARKERS` (foreign commands fall through unchanged, same contract as default workspaces). Existing override workspaces auto-converge on next `agnes self-upgrade` (fires from every SessionStart). 
No manual migration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 18:43:32 +02:00
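The enable-map write described in the commit above might look roughly like this. The sketch assumes `enabledPlugins` in the workspace `.claude/settings.json` is a mapping from plugin identifiers such as `flea@agnes` to booleans; `enable_stack_plugins` is a hypothetical helper, not the actual `_enable_plugins_in_workspace_settings`.

```python
import copy


def enable_stack_plugins(settings: dict, stack: list) -> dict:
    """Return a copy of the settings with every stack plugin enabled.

    The user's marketplace stack is treated as the source of truth, so a
    plugin that was locally disabled is re-enabled; unrelated settings and
    plugins outside the stack are left untouched.
    """
    merged = copy.deepcopy(settings)
    enabled = dict(merged.get("enabledPlugins") or {})
    for plugin in stack:
        enabled[plugin] = True
    merged["enabledPlugins"] = enabled
    return merged
```

Returning a new dict rather than mutating in place keeps the read, merge, and write steps separable when the caller serializes the result back to disk.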
minasarustamyan	69a1e22cf5	feat(initial-workspace): per-instance agnes init override (#292) * feat(initial-workspace): per-instance agnes init override Adds Initial Workspace Template — an admin-configurable per-instance override for the agnes init analyst workspace. When configured, agnes init downloads a server-rendered zip from a Git repo the admin registered and extracts it into the analyst's workspace, fully bypassing Agnes-default CLAUDE.md / settings.json / hooks / slash commands / AGNES_WORKSPACE.md. Repo layout convention: only the contents of a top-level `workspace/` subdirectory ship to analysts; admin docs (README, CI configs) at the repo root stay in the repo and never reach an analyst. Sync rejects repos without `workspace/` at root. Server side: - src/initial_workspace.py — clone (or fetch+reset), validate, build zip with strict path checks and reserved-path rejection (workspace/.claude/init-complete reserved by Agnes) - app/api/initial_workspace.py — admin CRUD + sync endpoint + analyst-facing status/zip/applied endpoints; config persists to instance.yaml overlay, PAT to .env_overlay - app/secrets.py — refactor: persist_overlay_token shared helper with threading.Lock for .env_overlay writes (closes pre-existing race between concurrent marketplaces saves) - app/web/templates/admin_server_config.html — new "Initial Workspace Template" section + modal + Sync/Edit/Delete/Download buttons (matches existing cfg-section visual language) CLI side: - cli/lib/override.py — single source of truth for is_override_workspace sentinel detection - cli/lib/initial_workspace.py — probe status, safe zip extraction with ../absolute/symlink rejection, typed-YES force confirmation - cli/commands/init.py — override branch (skips Agnes-default workspace writes); extended sentinel with override:true, template_source, template_sha so future agnes self-upgrade does not auto-refresh hooks - cli/lib/hooks.py + cli/lib/commands.py — short-circuit on override workspaces 
(install_claude_hooks, install_claude_commands, maybe_refresh_claude_hooks) Audit-event strategy: server writes initial_workspace.fetch_started inside GET /api/initial-workspace.zip (cannot be spoofed by PAT-holder); CLI POST /applied writes initial_workspace.applied as best-effort confirmation. Admin mutations log via the existing _audit pattern. Tests: 27 server (clone/validate/zip + workspace-subdir convention + concurrent persist_overlay_token + endpoint shapes + audit rows) + 29 CLI (override sentinel parse + probe fall-through + safe extraction + YES strictness + hook guards + e2e mocked init). Risk acceptance — documented in docs/initial-workspace-override.md + CHANGELOG Internal section so AI reviewers understand the deviations from defaults are intentional: - maybe_refresh_claude_hooks deliberately no-ops on override workspaces - --force on override does NOT back up CLAUDE.md (admin's repo is the source of truth) - .claude/CLAUDE.local.md IS overwritten by override extraction when admin's repo ships one * test+vendor-agnostic: drop Groupon tokens from #292 fixtures + extend admin-gate coverage Two fixes from the takeover review on #292: 1. Vendor-agnostic OSS rule: Replace `Groupon` / `groupon/template` tokens in test fixtures with `Acme` / `acme/template` (8 sites in test_cli_init_override.py + 1 in test_initial_workspace_api.py). Per CLAUDE.md "Vendor-agnostic OSS — no customer-specific content" rule: customer-specific tokens don't belong in shipped artifacts, even in test fixtures. The pre-existing FoundryAI mentions in test_instance_config.py + test_setup_instructions.py are out of scope for this PR (didn't introduce them). 2. Admin-gate coverage gap: `test_admin_endpoints_require_admin` only covered GET /api/admin/initial-workspace + POST .../sync. The register-write (POST .../initial-workspace) and delete (DELETE .../initial-workspace) endpoints used the same `Depends(require_admin)` wiring but had no regression test. 
Loop now covers all 4 verbs so a future refactor that drops the dependency from one endpoint fails here instead of silently exposing the write/delete paths to any analyst with a PAT. * release: 0.54.9 — Initial Workspace Template (per-instance agnes init override) Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.8 → 0.54.9) for Mina's Initial Workspace Template feature. No DB migration (config lives in instance.yaml overlay). No mandatory operator action — empty default keeps OSS-default agnes init behavior. Operators wanting full template control link a Git repo on /admin/server-config → "Initial Workspace Template". See docs/initial-workspace-override.md for the full responsibility-transfer contract. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com> Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-13 20:35:01 +00:00
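The "safe zip extraction with ../absolute/symlink rejection" mentioned in the commit above could be approximated as below. This is a sketch using only the standard library; `check_member`, `safe_extract`, and `UnsafeZipError` are hypothetical names, and the real checks in cli/lib/initial_workspace.py may differ.

```python
import stat
import zipfile
from pathlib import PurePosixPath


class UnsafeZipError(ValueError):
    """Raised when an archive member could escape the destination."""


def check_member(info: zipfile.ZipInfo) -> None:
    """Reject absolute paths, parent-directory traversal, and symlinks."""
    name = info.filename
    path = PurePosixPath(name)
    if path.is_absolute() or "\\" in name or name[1:2] == ":":
        raise UnsafeZipError(f"absolute or non-POSIX path: {name!r}")
    if ".." in path.parts:
        raise UnsafeZipError(f"parent traversal: {name!r}")
    # The upper 16 bits of external_attr carry the Unix mode when present.
    if stat.S_ISLNK(info.external_attr >> 16):
        raise UnsafeZipError(f"symlink entry: {name!r}")


def safe_extract(archive: zipfile.ZipFile, dest: str) -> None:
    """Validate every member first, then extract, so a bad entry aborts early."""
    for info in archive.infolist():
        check_member(info)
    archive.extractall(dest)
```

Validating the whole member list before writing anything means a rejected archive leaves the workspace untouched.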
minasarustamyan	efc607f3ee	feat(cli): agnes marketplace search/detail/add/remove + retire stale subcommands (#280 ) * feat(cli): agnes marketplace search/detail/add/remove + retire stale subcommands Unified CLI surface for the v28+ marketplace: search across Curated and Flea Market (RBAC-filtered server-side), drill into a single item's detail, add/remove from your stack. Replaces opt-out era commands that no longer reflect how users compose their stack. CLI changes: - Added: agnes marketplace {search,detail,add,remove} - Removed: agnes my-stack toggle (opt-out semantics, curated-only) - Removed: agnes store {list,show,install,uninstall} (consumer-side ops moved under marketplace; store now covers only creator-side upload, update, delete, mine) ID format unifies curated and flea: marketplace_id/plugin_name (slash) routes to /api/marketplace/curated/..., bare UUID routes to /api/store/entities/... (flea bundles skills/agents into a synthetic plugin server-side, so the analyst sees a single add/remove surface). Templates: - claude_md_template.txt: rewritten marketplace section as operational guidance for Claude Code (discovery, stack management, behaviour notes). Dropped the static {% if marketplaces %} listing — the CLI is the source of truth for what's in the stack at any moment, so a snapshot rendered at init time would lie the moment the user runs agnes marketplace add/remove. Same discipline already applied to tables and metrics. - agnes_workspace_template.txt: cheat sheet adds 5 marketplace one-liners; keeps the file's reference-doc tone (the original commit's intent: 'what is this thing, how does it work, how do I uninstall it'). Docs: HOWTO/05-customizing-skills.md rewritten around the new CLI flow; the opt-out section is replaced by 'Removing items from your stack'. Tests: new test_cli_marketplace.py covers all four subcommands incl. 
RBAC/409 paths (system plugin guard, not-approved flea entity); test_cli_store.py trimmed to the retained creator-side commands. * release: 0.54.1 — agnes marketplace CLI redesign + retire stale subcommands Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.0 → 0.54.1) bundling the BREAKING removals of `agnes my-stack toggle` and `agnes store {list,show,install,uninstall}` plus the new unified `agnes marketplace {search,detail,add,remove}` surface. No DB migration; no operator-facing config change. Operators on floating tags (`:stable`) auto-upgrade transparently. Analyst CLI upgrade prompt fires on next `agnes pull`; users invoking the retired commands get "No such command" with the new `agnes marketplace` substitution called out in the BREAKING bullets. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com> Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-13 05:20:56 +00:00
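A minimal sketch of the ID routing described in the #280 commit above: a slash-separated `marketplace_id/plugin_name` goes to the curated API, a bare UUID goes to the flea/store API. The function name `route_prefix` and the error messages are hypothetical; only the two path prefixes come from the commit message, and the elided path tails are deliberately left out.

```python
import uuid

# Path prefixes quoted from the commit message; the rest of each path is elided there.
CURATED_PREFIX = "/api/marketplace/curated/"
FLEA_PREFIX = "/api/store/entities/"


def route_prefix(item_id: str) -> str:
    """Pick the API prefix for a marketplace item id (hypothetical helper)."""
    if "/" in item_id:
        # Curated form: marketplace_id/plugin_name
        marketplace_id, _, plugin_name = item_id.partition("/")
        if not marketplace_id or not plugin_name:
            raise ValueError(f"expected marketplace_id/plugin_name, got {item_id!r}")
        return CURATED_PREFIX
    try:
        # Flea form: a bare UUID
        uuid.UUID(item_id)
    except ValueError:
        raise ValueError(f"not a curated id or a UUID: {item_id!r}") from None
    return FLEA_PREFIX
```

Rejecting anything that is neither form up front keeps a typo from being sent to the wrong endpoint.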
ZdenekSrotyr	b4d3c576af	Activity Center: audit log + telemetry + sessions + agnes_* tables (#278 ) * docs(spec): admin observability spec + Activity Center MVP plan Parent spec (480 lines) + executable plan (2295 lines, 14 TDD tasks). Covers Activity Center rebuild (/admin/activity), with /admin/sessions and /admin/feedback deferred to follow-up plans. Already incorporates reviewer-pass revisions across three angles (security, production resilience, code architecture): - _get_db import path corrected to app.auth.dependencies - Test fixtures aligned with seeded_app / admin_user / get_system_db - All new audit writes wrapped in try/except + logger.exception - Filename sanitization on session uploads - DuckDB DESC index behavior documented; upgrade window flagged - Migration idempotency + evolved-DB test cases - reveal_raw + shared-cache multi-worker explicitly deferred Targets schema v40 (audit_log gains params_before, client_ip, client_kind, correlation_id + 3 indices). * feat(db): schema v40 — audit_log gains params_before, client_ip, client_kind, correlation_id + 3 indices * chore(test): clean up Task 1 — drop unused import, rename stale test * feat(audit): AuditRepository.log() accepts params_before/client_ip/client_kind/correlation_id * test(audit): strengthen params_before assertion to round-trip JSON content * feat(audit): AuditRepository.query() rich filters + keyset cursor pagination * feat(sync): SyncStateRepository.list_recent() cross-table feed * feat(audit): POST /api/sync/trigger writes audit_log row * feat(audit): POST /api/scripts/run-due writes audit_log row * feat(audit): POST /api/upload/sessions writes audit_log row + sanitizes filename * feat(audit): GET /api/data/{table_id}/download writes audit_log row * feat(activity): /api/admin/activity timeline + /health + /sync endpoints * feat(ui): /admin/activity rebuilt — health pulse, timeline, sync grid; /activity-center → 308 redirect BREAKING: removed demo executive-pulse / maturity-roadmap content 
from activity_center.html. The page now reflects real audit_log + sync_history data. * feat(ui): admin nav + dashboard widget point at /admin/activity * feat(activity): recursive-audit suppression for AC read endpoints (60s window per actor+filter) * feat(activity): emit PostHog events when integration enabled (no-op default) * fix(audit): move v40 indices out of _SYSTEM_SCHEMA + update test_repositories to unpack query() tuple _SYSTEM_SCHEMA CREATE INDEX on audit_log(timestamp) failed when migration tests hand-roll a bare audit_log (id, action) without the timestamp column. Fix: remove indices from _SYSTEM_SCHEMA; add ADD COLUMN IF NOT EXISTS guards for timestamp and other pre-v40 columns in _v39_to_v40() so the upgrade path is safe on any hand-rolled schema; call _v39_to_v40 explicitly in the fresh-install (current==0) path to restore index creation there. Also unpack the (rows, next_cursor) tuple from AuditRepository.query() in the three TestAuditRepository tests that still treated it as a list. * docs: CHANGELOG entry for Activity Center MVP * chore: refresh stale module docstring in app/api/activity.py * feat(cli): agnes admin activity — terminal access to Activity Center (timeline + health + sync) * fix(db): _v39_to_v40 — add IF NOT EXISTS guard for 'action' column The v39→v40 ladder step adds defensive ADD COLUMN IF NOT EXISTS for every audit_log column so a hand-rolled bare audit_log (id only) is safe through the ladder. 'action' was missing from the guard list, causing CREATE INDEX idx_audit_action_time to fail on tests that stub audit_log with only an id column (tests/test_e2e_extract.py:: TestSchemaMigration::test_migration_preserves_and_extends). Local 6/6 schema tests + the previously-failing CI test pass. 
* docs(spec): platform telemetry epic — Boss directive + Activity Monitoring plan rebased onto v40 (stacked on zs/spec-activity-center) * feat(db): schema v41 — 7 usage_* tables for telemetry (events, summary, rollups, attribution) * chore(db): tighten v41 — usage_session_summary.session_id NOT NULL + upgrade test asserts all 7 tables * feat(usage): UsageAttributionRepository — replace/delete/lookup over usage_attribution_* tables * refactor(marketplace): extract list_inner_skills/agents/commands to src/marketplace_listing.py for reuse * feat(usage): explode plugin attribution on marketplace sync + store entity write; backfill script * refactor(marketplace): finish src/marketplace_listing.py extraction — drop duplicate _list_inner_* + _parse_frontmatter from app/api/marketplace.py * feat(usage): promote attribution helpers to src/usage_attribution_helpers.py; hook update_entity rename + bundle-swap; clarify best-effort semantics * feat(usage): UsageProcessor real extraction + rollup rebuild + 10 fixture-driven tests * fix(usage): include tool_id in event hash + executemany + rollup transaction (critical multi-tool-turn drop fix) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(marketplace): popularity stats — invocations_30d + trend + sort=most_used\|trending + Most Popular section * feat(admin): /admin/users/<id> Sessions section — list + single-file + bulk-zip downloads (audit-logged) * feat(usage): admin export endpoint + CLI — csv/json/parquet streaming, filters, audit-logged * feat(usage): agnes admin ask — LLM Text-to-SQL over usage_events with SELECT-only validator (audit-logged) * feat(usage): reprocess + prune endpoints + scheduler daily prune job + CLI * docs: PLATFORM_SETUP.md operator playbook + HOWTO/ cookbook (5 guides + index) Adds docs/PLATFORM_SETUP.md as a consolidated operator playbook covering bootstrap, TLS, marketplaces (curated + flea), scheduler env vars, telemetry extraction/export/ask/prune, privacy posture, and daily 
routine. Adds docs/HOWTO/ with 5 analyst cookbook guides: first query, snapshots for remote tables, private sessions, feedback + admin ask, and customizing skills. Existing setup docs (QUICKSTART, DEPLOYMENT, ONBOARDING, HEADLESS_USAGE) get a one-line cross-reference at the top pointing to PLATFORM_SETUP.md. * docs(changelog): platform telemetry epic — usage_* foundation + surfaces + admin access + docs Comprehensive [Unreleased] entry covering: usage_events/session_summary/ tool_daily/plugin_daily tables (v41), attribution lookup tables, backfill script, marketplace Most Popular + invocation chips + sort, admin Sessions section, export/ask/reprocess/prune endpoints + CLI mirrors, Activity Center (v40), PLATFORM_SETUP.md + HOWTO/ docs, and operations notes for v41 upgrade. * fix(security): block DuckDB read_/http_/glob functions in usage_ask validator + symlink escape guard in session zip + clarify mark-private semantics * fix(admin): parquet export tempfile cleanup on COPY failure + correct processed-first sort on /admin/users/<id>/sessions * feat(audit): close 8 production audit gaps — query (local/remote/hybrid), catalog/schema/sample, snapshot estimate/create, check-access * feat(ui): /admin/usage summary dashboard + per-user activity tab on /admin/users/<id> * fix(audit): cap error messages at 200 chars + audit user_activity reads + recursion guard on usage.summary * fix(audit): catalog.list audits on error path + clean up deferred json import * fix(ux): client_kind=cli for PAT auth + timeline empty state + email-instead-of-uuid + nav reorder + help text + loading indicators + ask doc * feat(observability): unify /admin/activity into single page with saved views - KPI cards (events, users, error rate, p95) clickable as quick-filters - Faceted filter dropdowns populated from audit_log in the current window - Sortable audit table, cursor pagination, per-row JSON side panel - Saved views (schema v43: user_observability_views) — per-user state - Top bar: window 
selector + 30s Live toggle + saved views dropdown - /admin/scheduler-runs → 308 redirect (source=scheduler filter) - New endpoints: /api/admin/observability/{facets,kpis,views} * test: update activity + scheduler-runs tests for unified page - test_admin_activity_page_renders asserts new structural anchors - test_admin_scheduler_runs_page_admin_only asserts 308 redirect * fix(observability): respect [hidden] on modal + side panel CSS `display: flex` on .obs-modal beat the [hidden] attribute's UA display:none, so the save-view modal rendered on page load and Cancel clicks couldn't dismiss it. Gate the modal's flex layout on :not([hidden]); add the same display:none guard prophylactically to .obs-panel and .obs-views-panel. * feat(observability): user enrichment in audit + interactive /admin/usage Activity: - /api/admin/activity now joins users for user_email + user_name per row - User column renders "name (id-prefix)" or "email (id-prefix)" instead of an opaque truncated UUID; falls back to id when the user record is missing Usage: - /admin/usage rewritten as the same filter/group-by/search pattern as /admin/activity. 
Faceted dropdowns (User / Tool / Source / Event type) populated from usage_events; debounced free-text search across tool_name / skill_name / subagent_type / command_name - New endpoints /api/admin/usage/{facets,kpis,query}; the query endpoint supports group_by in {day, username, tool_name, source, ref_id} with sort + offset pagination, plus an ungrouped raw-events mode - 4 KPI cards (events, distinct users, distinct tools, error rate) are clickable quick-filters; clicking a grouped row applies the bucket as a filter - Old static `?window=7d\|30d\|all` server preload removed; all state is client-side via since_minutes + group_by + filters in the URL * fix(observability): clearer labels, all-column sort, drop saved views UI - Rename page titles: "Activity" → "Server activity", "Usage" → "Tool usage" with a one-line subtitle on each explaining what the page covers and linking the other one. The two pages source different data (audit_log vs usage_events) and the previous labels conflated them. - Drop the saved-views dropdown + save modal from /admin/activity. The modal pop-open bug was the trigger; the value wasn't there yet. The /api/admin/observability/views CRUD + DuckDB table stay in place. - Rename "Live (30s)" to "Auto-refresh (30s)" with a tooltip clarifying that it's the re-fetch rate, not the time range. Time range now labeled "Time range" instead of "Window". - All audit-table columns are sortable (User, Source, Action, Resource, Result added); sort is page-local with a Jinja comment explaining the trade-off. Same for raw usage rows. - Fix duplicate sort-arrow bug — the literal "▼" in the Time th HTML was rendering alongside the CSS ::before arrow. Removed the literal; CSS is the single source of truth. * feat(observability): global Sessions browser + transcript viewer + CLI Web: - /admin/sessions — list every collected session JSONL across all users with time-range, user, model, errors-only and free-text filters. 
Default sort surfaces error-heavy sessions first. KPI cards (sessions, distinct users, sessions w/ errors, tool error rate) clickable as quick-filters. - /admin/sessions/<username>/<file> — transcript viewer rendering the JSONL chronologically: user prompts, assistant text, tool calls (with JSON input) and tool results (with flattened output). Errors get a red border + chip and a "Next error" navigation button at the top. - Admin dropdown gains a "Sessions" link. API: - GET /api/admin/sessions/{list,kpis,facets} — filtered cross-user reads off usage_session_summary - GET /api/admin/sessions/{username}/{file}/transcript — parses JSONL via the existing services.session_pipeline.lib, returns chronological events - GET /api/admin/sessions/{username}/{file}/download — JSONL stream, same path-safety guards as the per-user endpoint, audit-logged CLI: - `agnes admin sessions list [--user X] [--errors] [--since 7d]` — table output with `!` prefix on rows that hit a tool error - `agnes admin sessions show <username> <file>` — transcript dump, with `--errors` to print only the failed tool_result blocks - `agnes admin sessions download <username> <file> [-o path]` - `agnes admin sessions kpis` — top-level numbers * feat(internal): expose telemetry tables to agnes query with row-level RBAC Three new registered tables backed by system.duckdb, queryable through the same /api/query plumbing analysts use for Keboola / BigQuery / local sources: agnes_sessions → usage_session_summary (filter: username) agnes_usage → usage_events (filter: username) agnes_audit → audit_log (filter: user_id) RBAC is per-row, not per-table: admins see every user's rows; non-admins see only their own. The filter is built server-side from the auth user dict; non-admin filter values are regex-validated before SQL interpolation. 
Implementation: - new connector connectors/internal/ with access (filter+exec) + registry (idempotent table_registry seed at startup) - /api/query detects internal table refs and short-circuits to a CTE wrapper that prepends "WITH agnes_x AS (SELECT * FROM <src> WHERE …), …" then "SELECT * FROM (<user_sql>) AS _q". DuckDB cursor on the shared system.duckdb handle — opening parallel handles / ATTACH on the same file is blocked process-wide. - mixing internal + BQ / registered local tables in one SELECT is rejected (v1 limitation) - src.rbac.can_access_table waves internal tables through for all authenticated users; row scoping is the actual security control - /api/v2/schema and /api/v2/sample gained internal branches; sample intentionally skips its cache because rows are RBAC-scoped per caller - audit row written as action='query.internal' with is_admin flag Tests: connectors/internal/access — RBAC, filter clause, schema, CTE wrapper coexistence with user-supplied aggregations, unsafe-username rejection. 16/16 passing. Motivating queries this enables: SELECT tool_name, COUNT(*) FROM agnes_usage WHERE is_error GROUP BY 1 ORDER BY 2 DESC -- analyst self-introspection: which tools fail for me? SELECT user_id, COUNT(*) FROM agnes_audit WHERE action = 'session.transcript_view' GROUP BY 1 -- admin: who's been looking at whose session transcripts? * feat(admin): group dropdown into 5 named sections + internal tables in /catalog Admin dropdown gains section headers so admins can land on the right page without re-reading the full menu: Activity Center Server activity / Tool usage / Sessions Users & Access Users / Groups / Resource access / Tokens Data Tables Agent Experience Curated Marketplaces / Flea Submissions / Agent Setup Prompt / Agent Workspace Prompt Server Server config "Agent Experience" frames the curated content + prompts as one cluster — it's all admin-controlled material that shapes what an analyst's AI agent encounters. 
"Configuration" → "Server" since only one item lives there now. Renamed the section's first two items: "Activity" → "Server activity" (matches page H1) "Usage" → "Tool usage" Also fixes /catalog visibility of the internal tables (agnes_sessions / _usage / _audit) for non-admin users: ``app.auth.access.can_access`` short-circuits to True for resource_type='table' + an internal-table id. Without this, non-admins saw the tables in /api/v2/catalog (which uses the same RBAC bypass) but not on the /catalog HTML page (which calls can_access directly, requiring a resource_grants row internal tables don't have). CSS for `.app-nav-menu-section`: small caps, muted, non-clickable; first section trims top padding so the panel doesn't open with an awkward gap. * refactor(admin): move corporate memory into Admin > Agent Experience Memory link was the only admin-only entry in the primary nav (gated by session.user.is_admin). Moves it into the Admin dropdown under Agent Experience, alongside Curated Marketplaces / Flea Submissions / Prompts — all admin-curated content that shapes what an analyst's AI agent encounters. Renamed the nav label to "Shared Knowledge" to match what the page actually is (admin-curated organisational knowledge from session verification, surfaced to agents). URL stays at /corporate-memory; the route still gates on require_admin per the existing comment. Side effect: primary nav (Home / Marketplace / Data Packages) is now uniform for every authenticated user — no conditional admin-only entry. 
* ui: rename admin entries to Curated Knowledge / Init Prompt / Workspace Prompt - "Shared Knowledge" → "Curated Knowledge" (parallel with "Curated Marketplaces" in the same Agent Experience section; "curated" tells the admin what they do there — review + approve) - "Agent Setup Prompt" → "Init Prompt" (matches the `agnes init` flow it actually drives) - "Agent Workspace Prompt" → "Workspace Prompt" (the "Agent" prefix was redundant — every item in the section is agent-facing) Renames page titles + H1s on /admin/agent-prompt and /admin/workspace-prompt to match. * refactor: rename Usage → Telemetry across user-facing surfaces External surfaces all switch; internal Python module / file names and the physical DB tables (usage_events, usage_session_summary, usage_tool_daily, usage_plugin_daily) stay — renaming them would force a schema migration + a redo of the LLM Text-to-SQL prompt for no analyst-visible win. Changes: - Admin dropdown: "Tool usage" → "Telemetry" - Page H1 / <title>: same - URL: /admin/usage → /admin/telemetry; old URL 308-redirects - API prefix: /api/admin/usage/* → /api/admin/telemetry/* - CLI: primary command `agnes admin telemetry …`; `agnes admin usage` kept as a deprecated alias so existing operator scripts keep working - Internal data-source table id: agnes_usage → agnes_telemetry. The registry seed now evicts any stale internal-source row whose id no longer matches INTERNAL_TABLES, so the old `agnes_usage` row is removed from table_registry on next app boot - All tests + JS endpoint paths updated * test(rbac): include auto-appended internal tables in expectations get_accessible_tables now appends agnes_sessions / agnes_telemetry / agnes_audit to every authenticated user's accessible-tables list so the internal data source shows up in /catalog. The two existing rbac tests asserted hardcoded list shapes that pre-dated the change. 
Rewritten to assert "granted tables + the canonical internal-table set" instead of literal lists, so the test stays correct if the internal table roster changes again later. * ui: visual dividers between admin-dropdown sections Adds a 1px top border + 6px top margin to every section header except the first, so the five named groups (Activity Center, Users & Access, Data, Agent Experience, Server) read as visually separated clusters. The header itself stays small-caps + muted as before — the border is additive. * ui(memory): match obs-topbar visual on /corporate-memory The Curated Knowledge page (linked from the admin dropdown's Agent Experience section) opened straight into the stats bar — no title, no subtitle, no shared chrome with the other admin pages. Adds an obs-topbar-style header at the top of .container-memory: - H1 "Curated Knowledge" - subtitle explaining what the page is + how AI agents pull from it The `.ck-` class set duplicates the inline obs- styles from /admin/activity etc. for this one page; promoting the obs-* class set to style-custom.css for shared reuse is the obvious next step (4 pages already inline the same CSS), tracked as a follow-up. Page <title> also renamed from "Corporate Memory" → "Curated Knowledge". * ui(tables): list Agnes internal tables in /admin/tables + group in /catalog /admin/tables previously rendered three per-source-type listings (BQ / Keboola / Jira) and dropped any row whose source_type didn't match — so the agnes_sessions / agnes_telemetry / agnes_audit rows seeded into table_registry were invisible. Adds a fourth read-only section "Agnes internal tables" that filters source_type === 'internal' and renders the same registry-table layout the other sections use, with two changes: - no Register button (these rows are seeded on every app boot from connectors/internal/registry.py) - Edit + Delete actions hidden (any change would be reverted on the next start). Manage access stays so admins can still inspect. 
Mode badge picks up a new mode-internal CSS class (teal accent) so the display doesn't lie and call it "local". In /catalog, internal tables now group under an "agnes" accordion section (bucket="agnes" on seed) instead of falling into the catch-all "default". Single source of truth for which tables exist; admins find them where they expect. * ui(tables): Agnes internal as a 4th tab next to BQ/Keboola/Jira Previous iteration mounted the internal-table listing as a separate standalone card under the tab strip. Reshapes it to a proper tab-content section so admins switch between data sources via one consistent nav (BigQuery / Keboola / Jira / Agnes internal). - New tab button "Agnes internal" in the tab-nav. - The listing card becomes <section id="tab-content-internal" class="tab-content">; switchTab() already routes by id so no JS change beyond extending the hash allowlist for direct #internal links. - Tab content keeps the read-only treatment from the previous commit (no Register button, no Edit / Delete in renderRegistryListing). * ui: rename Curated Knowledge → Curated Memory Settles the naming back on "Curated Memory" — parallel structure with "Curated Marketplaces" in the same Agent Experience section, and zero rename ripple: URL (/corporate-memory), API (/api/memory/), CLI (agnes admin memory), and Python modules all stay on "memory" so the admin label finally lines up with the underlying surfaces. The "Curated" prefix still tells admins what they do on the page (review pending → approve / mandate / reject) and reads as a sibling of "Curated Marketplaces" right next to it in the dropdown. Touches: admin dropdown label, page <title>, page H1. DB tables stay on knowledge_ (already the canonical naming for the data shape). * ui: rename "Server activity" → "Audit log" "Audit log" is what the page actually is — server-side audit_log table rendered with KPI cards + filter bar + sortable table. 
The "Server activity" label confused the term with Claude Code session telemetry (Telemetry page) and didn't make the source/concept clear. Touches: - Admin dropdown nav label - /admin/activity page H1 + subtitle - /admin/telemetry subtitle cross-link - test_activity_api page-renders assertion URL (/admin/activity) and API (/api/admin/activity/) stay — the "activity" name has stuck at the route layer for a year; rerouting those would churn dashboards/bookmarks for zero analyst-visible win. ui(admin-nav): gray band on each section header for clearer separation Previous iteration used a 1px top border between section labels — the labels still blended into the items above/below at a glance. Switches to a light gray background band per section header, extended edge-to- edge inside the panel via negative horizontal margins. Bolder font-weight (700) reinforces the separation; bumping the font color isn't needed because the band itself does the work. First section's header tucks into the panel's top border-radius so the band reaches the corners without a gap. * ui(catalog): rename internal-table category to "Agnes Internal" `bucket` is what /catalog renders as the accordion category header verbatim — "agnes" lowercase didn't read as a real category name and got confused with a system identifier. Bumps to "Agnes Internal". Seed re-applies on every app boot so existing rows pick up the new bucket value via `ON CONFLICT (id) DO UPDATE`. * ui(catalog): split Agnes Internal into its own card on /catalog Previously the three internal tables landed inside the "Core Business Data" card under an "Agnes Internal" accordion alongside Keboola / BQ buckets — readers conflated system telemetry with business datasets, and the data_stats header counter ("3 tables · ~X rows total") only ever counted synced rows so internal tables looked invisible. Split the catalog page into two cards: - Core Business Data: only non-internal source_types (Keboola, BQ, Jira). 
Accordions group by bucket as before. Stats counter reflects this card's tables. - Agnes Internal: a dedicated card with its own visual treatment (teal accent matching the mode-internal badge in /admin/tables). Flat list (no accordion — only 3 rows, never grows here), each row carries the canonical `agnes query` snippet. Read-only — no profiler click, no In-stack toggle, no sync metadata. Route adds `internal_card` context object; template renders the new card only when it's non-None. * fix(rbac): hide internal tables from /admin/access + drop "my" framing Two related cleanups for the Agnes-internal tables: 1. /admin/access (resource grants) no longer lists them. The `can_access` check has a hardcoded internal-table bypass — security is row-level (per-request view filter), so a table-grain `resource_grants` row would do nothing. Surfacing them in the UI let admins set up grants that silently no-op. Filter at the `_table_blocks` projection so the UI tree never sees them. 2. Display names drop the analyst-perspective "my" framing: "Agnes — my sessions" → "Agnes sessions" "Agnes — my telemetry events" → "Agnes telemetry events" "Agnes — my audit log" → "Agnes audit log" The "my" only makes sense from the querying analyst's seat (`SELECT … FROM agnes_sessions` returns their rows); on /admin/* pages where admin sees / configures them across users, the pronoun was misleading. Description text now spells out the row-level RBAC contract explicitly. Display names update via TableRegistryRepository.register's ON CONFLICT UPDATE on next app boot; no manual cleanup needed. * ui: subtitle notes about agnes_* tables on each Activity Center page The recursive observability story — Agnes serves its own audit / telemetry / session data through the same `agnes query` plumbing analysts use for business data — wasn't surfaced anywhere on the admin pages that show that data. 
Three pages get a one-liner with the canonical `agnes query` snippet + the RBAC contract (analysts see their own rows, admin sees all): - /admin/activity (Audit log) → agnes_audit - /admin/telemetry (Tool usage) → agnes_telemetry - /admin/sessions → agnes_sessions Sets up the discovery moment for admins: they're reading the page, they see "you can query this from Claude Code", they remember it when an analyst asks "how do I find my own failed tool calls?". * ui(tables): explain "Show log" empty-state on /admin/tables Cache warmup log <pre> renders with a dark background and is only populated by the SSE stream during a Re-warm all run. Opening the page cold + clicking Show log just revealed a black bar with no context — admins couldn't tell what they were looking at. Adds an inline paragraph above the <pre> explaining what the log is, the row format, when it fills in, and where to find the historical audit trail (/admin/activity). The actual <pre> stays empty until SSE events arrive, but the surrounding copy carries the meaning. * ui(tables): auto-open cache-warmup log on Re-warm all click A Re-warm all run takes ~24s per remote BQ row. With the <details> collapsed by default, operators saw the button disable, watched a quiet ~24s pass, and assumed nothing had happened — the streaming log was hidden behind a closed disclosure. Two small JS tweaks: - cacheWarmupRun() opens the details on click, so streamed lines appear without an extra interaction - cacheWarmupOnStart() hides the inline hint paragraph the moment real log content lands, so the dark log block isn't competing with redundant context Hint paragraph also clarifies that only `query_mode='remote'` BQ rows are warmed — operators with only materialized/internal tables would see total=0 and the page would "do nothing" by spec. 
* ui: trim Agnes internal copy across surfaces Descriptions had grown to explain the extraction pipeline ("parsed out of session JSONLs"), the underlying table ("Backed by usage_session_summary"), the RBAC mechanic ("row-level RBAC at query time — analysts see their own; admin sees all"), and the SQL snippet. Every implementation detail meant another rewrite on the next iter. Strips to one stable line per surface: what the data is, plus "Also available locally for analysis". Mechanics live in code + docs; the page copy says what the user needs to know. Touched: - connectors/internal/access.py: INTERNAL_TABLES descriptions - activity_center.html / admin_usage.html / admin_sessions.html subtitles - catalog.html Agnes Internal card description + row strip - admin_tables.html "Agnes internal" tab hint * fix(internal): is_user_admin arity bugs + + saved-view payload cap Round-1 code review (PR #278) caught two blocking bugs and three nits. Blocking — both `is_user_admin(user)` (single dict arg) calls raised TypeError. is_user_admin signature is `(user_id, conn)`. Affected: - app/api/query.py:_run_internal_query — every POST /api/query that references agnes_sessions / agnes_telemetry / agnes_audit blew up with a 500. The headline analyst-facing feature of this PR was unusable through the API. - app/api/v2_sample.py — same shape; `GET /api/v2/sample/agnes_` returned 500. Both fixed to call `is_user_admin(user.get("id"), conn)`. Added two FastAPI-level tests in test_internal_data_source.py that go through the TestClient — the existing unit tests on `execute_internal_query` and `build_filter_clause` skipped the request-handler layer where the bugs lived, which is why this landed. Nits also closed: - connectors/internal/access.py: `+` allowed in _USERNAME_RE / _USER_ID_RE so RFC 5321 email local-parts (alice+test@x) resolve correctly without hitting InternalAccessError. 
- app/api/observability.py: saved-view payload capped at 64 KiB to prevent an admin from bloating system.duckdb with a malformed save. fix(security): close non-admin data-leak via underlying-table refs PR #278 R2 review surfaced a non-admin-exploitable bypass: SQL whose string literal contains 'agnes_sessions' routed into the privileged internal-query path, then queried the underlying physical table (usage_session_summary / usage_events / audit_log) directly, escaping the CTE wrapper's row filter. Two reinforcing defenses: 1. find_internal_refs() now strips single-quoted string literals before scanning for alias names — a literal alone no longer routes the request into the privileged code path. 2. execute_internal_query() rejects non-admin SQL that references the underlying physical tables (usage_*, audit_log). The CTE wrapper only scopes the agnes_* aliases; a direct FROM on the base table — or a shadowing inner WITH that still has to read the base table — bypasses RBAC. Block before execution with an actionable error pointing to the agnes_* alias. Admins are unaffected (god-mode short-circuit on the filter clause). 3. tests/test_internal_data_source.py — three new negative tests covering literal-only matches, direct-table refs, and CTE shadow attempts. Also tightens usage_ask.py's SELECT-only validator: pragma_table_info, pragma_storage_info, pragma_database_, and duckdb_tables / columns / views / indexes / schemas are reflection functions that leak metadata the analyst question shouldn't reach. \bPRAGMA\b in _FORBIDDEN never matched the function-call form (there is no word boundary between `A` and `_`). 
fix(security): dynamic denylist for non-admin internal queries R3 review (PR #278) caught a wider data-leak than R2: the underlying-physical-table guard listed only the 7 usage_* + audit_log tables, but system.duckdb has 30+ other sensitive tables — users (emails + ids), personal_access_tokens, resource_grants, user_groups, user_observability_views, store_*, marketplace_*, knowledge_*, etc. A non-admin SQL like SELECT * FROM agnes_sessions UNION ALL SELECT email, id, … FROM users LIMIT 1 would leak every user's row. Replaces the hardcoded denylist with a dynamic allowlist — non-admin SQL may reference ONLY the registered agnes_* aliases. Every other table in `information_schema.tables` (main schema) is rejected. Future migrations that add a new sensitive table are automatically covered without re-editing this module. Also strips SQL comments (`/* */` and `--`) before the identifier scan so a comment-wrapped table name (`/**/users/**/`) can't slip past the regex. Four new negative tests pin: `users`, `personal_access_tokens`, block-comment wrap, line-comment wrap. Plus: per-user view-count cap (100) on /api/admin/observability/views so an admin can't fill system.duckdb with thousands of saved views. release: 0.54.0 — Activity Center + Telemetry + Sessions + internal datasource Cuts the work shipped across this PR (Activity Center build, recursive internal data source) into a versioned release. Bumps pyproject.toml to 0.54.0; renames the top of CHANGELOG.md from [Unreleased] to [0.54.0] — 2026-05-12 with a header summary; opens a fresh [Unreleased] section for the next round. --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-12 22:41:19 +02:00
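The scrub-then-scan idea from the two security fixes above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `scrub`, `forbidden_refs`, `mentions_pragma`, and the `ALLOWED_ALIASES` set are hypothetical names, and the table list is assumed to come from `information_schema.tables`. The scan is token-level and therefore conservative: a column that shares a name with a denied table would also be flagged.

```python
import re

# Hypothetical alias registry; the real one lives in the project.
ALLOWED_ALIASES = {"agnes_sessions", "agnes_telemetry", "agnes_audit"}

# One pass over literals and comments, so a quote inside a comment (or a
# comment marker inside a literal) is not misread as the other construct.
_SCRUB = re.compile(r"'(?:[^']|'')*'|/\*.*?\*/|--[^\n]*", re.DOTALL)
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def scrub(sql: str) -> str:
    """Blank out single-quoted literals and SQL comments."""
    def _blank(match: "re.Match[str]") -> str:
        return "''" if match.group(0).startswith("'") else " "
    return _SCRUB.sub(_blank, sql)


def forbidden_refs(sql, all_tables, allowed=ALLOWED_ALIASES):
    """Return denied table names that appear as identifiers in the SQL.

    `all_tables` is assumed to be the list of physical table names;
    everything not in `allowed` is treated as denied.
    """
    denied = {t.lower() for t in all_tables} - {a.lower() for a in allowed}
    tokens = {t.lower() for t in _IDENT.findall(scrub(sql))}
    return sorted(tokens & denied)


def mentions_pragma(sql: str) -> bool:
    """Match both `PRAGMA x` and the function-call form `pragma_x(...)`.

    A plain \\bPRAGMA\\b misses `pragma_table_info(...)` because `A` and `_`
    are both word characters, so no boundary exists between them.
    """
    return re.search(r"\bPRAGMA(?:\b|_)", sql, re.IGNORECASE) is not None
```

For example, `forbidden_refs("SELECT * FROM agnes_sessions UNION ALL SELECT email, id FROM /**/users/**/", ["users", "audit_log"])` reports `users`, while a query that only mentions `users` inside a string literal or a comment reports nothing.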
ZdenekSrotyr	5458ccc41b	hygiene: BQ error hint dispatch + catalog ENTITY column (#274 ) Two analyst-UX papercuts surfaced by the v0.53.4 onboarding smoke test. 1) /api/query remote_estimate_failed hint now branches on the BigQuery error class instead of always claiming a column doesn't exist. The previous hardcoded "Most often this means a column referenced … doesn't exist" misled analysts whenever BigQuery actually rejected on syntax — concretely, `SELECT COUNT(*) AS rows FROM …` fails with `Syntax error: Unexpected keyword ROWS at [1:20]` (`rows` is a BQ reserved word) and the hint pointed at non-existent columns. New _hint_for_bq_bad_request() helper dispatches: - "Syntax error" / "Unexpected keyword" → reserved-keyword alias hint with `AS row_count` workaround - "Unrecognized name" / "not found inside" → `agnes schema <id>` - "Table not found" → `agnes catalog` - fallback → enumerate all three 4 unit tests in TestHintForBqBadRequest pin each branch. Existing guardrail tests (test_fallback_fails_fast_on_pure_duckdb_syntax, test_remote_estimate_failed_surfaces_first_error_when_attempts_differ) continue to pass — both hint substrings they assert on still appear in the relevant branches. 2) `agnes catalog` replaces the FLAVOR column with ENTITY. FLAVOR rendered t['sql_flavor'] which duplicated SOURCE for any catalog dominated by one source type — analysts saw `SOURCE=bigquery FLAVOR=bigquery` on every row. ENTITY instead surfaces the upstream BigQuery entity_type (BASE TABLE / VIEW / MATERIALIZED_VIEW) for remote rows; non-remote rows render `-`. The distinction matters operationally: views don't support predicate pushdown, so `agnes query --remote` against a view trips the cost guardrail where the same query against a BASE TABLE pushes down cleanly. The entity_type field has been in the v2 catalog response since 0.51.0; this PR just stops hiding it behind a column header that conveyed no information. 
JSON output (`agnes catalog --json`) is unchanged — only the human-readable column changed. No DB migration; no API change. Verified: 4161 tests pass locally; 25 in test_api_query_guardrail.py green; the 4 new TestHintForBqBadRequest cases pin each branch.	2026-05-12 18:32:29 +00:00
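The four-branch hint dispatch described in item 1 of this commit could look something like the sketch below. The substrings being matched are the ones the commit message lists; the function name (without the leading underscore) and the hint wording are illustrative assumptions, not the text the server actually returns.

```python
def hint_for_bq_bad_request(message: str) -> str:
    """Pick a hint from the BigQuery error text (hypothetical wording)."""
    text = message.lower()
    if "syntax error" in text or "unexpected keyword" in text:
        return (
            "BigQuery rejected the SQL syntax. If an alias is a reserved word "
            "(for example `rows`), rename it, e.g. `AS row_count`."
        )
    if "unrecognized name" in text or "not found inside" in text:
        return "A column was not recognized. Check it with `agnes schema <id>`."
    if "table not found" in text:
        return "The table was not found. List registered tables with `agnes catalog`."
    # Fallback: the error class is unknown, so enumerate all three checks.
    return (
        "Check the SQL syntax (reserved-word aliases), column names "
        "(`agnes schema <id>`), and table ids (`agnes catalog`)."
    )
```

With the error quoted in the commit, `Syntax error: Unexpected keyword ROWS at [1:20]`, this returns the reserved-keyword hint rather than a claim about missing columns.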
ZdenekSrotyr	c8de0e0f64	release: 0.53.2 — diagnose silent-capture check + urllib3 2.7.0 + flaky-test fix (#270 ) Three bundled improvements: - #244 — new `agnes diagnose` check compares SessionStart events (~/.claude/projects/<encoded>/*.jsonl) against agnes-push uploaded log entries inside a 7-day window. Surfaces a warning when the gap exceeds 3, hinting at silently-broken capture-session — previously detectable only weeks after the fact. - Dependabot — bumps transitive urllib3 from 1.26.20 to 2.7.0 to close 5 advisories (4 high, 1 medium). kbcstorage 0.9.5 still pins urllib3<2.0.0 upstream; overridden via [tool.uv] override-dependencies since the SDK works fine against 2.x in practice (Client + Tables both flow through requests, which supports both lines). - #252 — fix flaky test_scratch_dir_cleaned_up_after_failed_extraction by redirecting tempfile.tempdir to a per-test tmp_path. Pre-#252 the test scanned the shared system tmp dir and a sibling store test in another pytest-xdist worker could trip the assertion mid-window. Closes #244. Closes #252.	2026-05-12 18:28:04 +02:00
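As a rough illustration of the #244 check, the comparison can be reduced to counting timestamps inside the window. This is a sketch under assumptions: the function name, the plain lists of `datetime` values, and the inclusive window bounds are hypothetical; only the 7-day window and the "gap exceeds 3" threshold come from the commit message.

```python
from datetime import datetime, timedelta, timezone


def capture_gap(session_starts, uploads, now, window_days=7, max_gap=3):
    """Compare SessionStart events with uploaded entries inside a window.

    Returns (gap, warn): gap is started-minus-uploaded within the window,
    warn is True only when the gap exceeds max_gap.
    """
    cutoff = now - timedelta(days=window_days)
    started = sum(1 for t in session_starts if cutoff <= t <= now)
    uploaded = sum(1 for t in uploads if cutoff <= t <= now)
    gap = started - uploaded
    return gap, gap > max_gap


if __name__ == "__main__":
    now = datetime(2026, 5, 12, tzinfo=timezone.utc)
    starts = [now - timedelta(days=d) for d in range(10)]
    print(capture_gap(starts, [now - timedelta(days=1)], now))
```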
ZdenekSrotyr	7d868159f2	address Devin Review on PR #264 BUG_0001 (red): config/claude_md_template.txt is the Jinja2 source for every analyst workspace's CLAUDE.md served via /api/welcome (src/claude_md.py). It still instructed the agent to use the removed --register-bq flag in 6 places — defeating the point of the PR for anyone who ran agnes init after merge. Rewritten: - ASCII routing diagram: "join with a local table" now points to "agnes snapshot create the remote side, then join locally" - "Three patterns" table → "Two patterns" (snapshot create + --remote) - "Hybrid query example" rewritten as snapshot-create + local join, with --remote called out as the escape hatch when the remote side is too big to snapshot - "When the table isn't in agnes catalog" — drop the ad-hoc --register-bq path; admins register, no analyst-side workaround - Footer cross-ref drops "hybrid-query examples" BUG_0002 (yellow): cli/error_render.py docstring line 7 said "All three previously flattened..." after I had already reduced "Three CLI paths" → "Two CLI paths" on line 3. "All three" → "Both".	2026-05-12 18:18:13 +02:00
ZdenekSrotyr	79b55b6ff3	remove agnes query --register-bq from client CLI The flag ran RemoteQueryEngine in-process on the caller's machine and required local BigQuery credentials (BIGQUERY_PROJECT + ADC). Analysts don't have those, so calling --register-bq from an analyst workspace surfaced as a confusing not_configured error chain ("Could not load static instance.yaml" + "BigQuery project not configured"). An agent following CLAUDE.md's hybrid-queries guidance would land in exactly that trap. The underlying engine was originally designed server-side (commit `d180b201`, "Step 28: Remote query architecture"); the CLI port (commit `d605e7d9`) silently assumed parity with the server. Server-side hybrid already exists as an admin-only POST /api/query/hybrid endpoint (app/api/query_hybrid.py) and is untouched here. Analysts combining local + remote data now have two documented paths: agnes snapshot create a filtered slice and join locally, or run the join server-side via agnes query --remote. CLAUDE.md, the agent skill, docs/DATA_SOURCES.md, and connectors.md updated accordingly.	2026-05-12 18:18:13 +02:00
ZdenekSrotyr	12db59127b	release: 0.53.0 — close Tier B trackers (#259-#261) + admin UI fix (#265 ) (#267 ) * release: 0.53.0 — Tier B trackers + admin UI bugfix Closes #259 (init resume sentinel), #260 (startup parquet-lock sweep), #261 (materialized schema uses local parquet, not BQ), #265 (admin tables apostrophe → HTML-entity escape). Tracker notes: #262 closed as obsolete (pre-empted by 0.51.0 changes), #266 left open pending UX clarification. * fix(init): move resume sentinel from .agnes/ to .claude/ The clean-install integration test (test_clean_install_integration.py) forbids creating .agnes/ in the workspace root via its forbidden_unconditional list — that path is reserved for ~/.agnes/ in the user's HOME (marketplace clone, CA bundle). .claude/ is already created by agnes init for settings.json + hooks, so dropping init-complete next to those keeps the resume sentinel consistent with the rest of Claude Code's workspace surface and lets the clean-install assertions pass. Issue #259. * docs(changelog): point #259 entry at new .claude/init-complete path Follows the sentinel move from .agnes/ → .claude/ to keep the changelog in sync with what 0.53.0 actually ships.	2026-05-12 16:28:41 +02:00
ZdenekSrotyr	48755b9864	release: 0.52.0 — UX/hygiene round (5 fixes from 0.51.0 retro) Closes #254 (agnes sample alias), #255 (wide-table render), #256 (single-flight on bq-metadata-refresh + run_id), #257 (init wording), #258 (progress bar clamp). Tier B trackers left open: #259 (init resume), #260 (stale .lock), #261 (schema cold-start), #262 (docker disk).	2026-05-12 15:09:14 +02:00
Vojtech	c09c85d13a	fix(cta): clipboard fallback + fold Atlassian MCP into connectors (#249 ) * fix(cta): fall back to textarea+execCommand when Clipboard API rejects The "Setup a new Claude Code" CTA fetches /auth/tokens, parses the JSON response, renders the setup script, THEN calls `navigator.clipboard.writeText()`. Modern browsers (Safari, Firefox, and Chrome on stricter configurations) reject `writeText` with NotAllowedError when transient user activation has been consumed by an intervening `await` — which is exactly the case here. Users perceived this as "the browser blocked the copy" and got the manual-paste fallback modal even though the textarea + `document.execCommand('copy')` path WOULD have worked synchronously without needing fresh user activation. `copyToClipboard` now: - prefers the modern Clipboard API (unchanged for the happy path) - on writeText rejection, falls back to `copyViaTextarea` instead of surfacing the rejection to the caller's catch block. `copyViaTextarea` is the previously-inline textarea fallback factored out into a named helper, with two small hardening touches: - `readonly` + `tabindex=-1` so the hidden textarea doesn't steal focus or pop the virtual keyboard on mobile. - explicit `setSelectionRange(0, text.length)` to belt-and-braces the selection on iOS Safari (where `.select()` alone sometimes selects zero chars on touch-focused textareas). Only the CTA button needed this — the Step-1 install-command and the connector-copy buttons all call `writeText` synchronously inside the click handler (no awaits in between), so they keep their existing user-gesture context and didn't hit the same rejection. No template changes there. * refactor(home): fold Atlassian MCP registration into connectors block The standalone "Register the Atlassian MCP server" step (was step 6 in the unified setup script) moves INTO the Atlassian connector's prompt body so all Atlassian-related setup lives in one logical group. 
Same intent that #247 carried for connectors, applied one level deeper: the hosted Remote MCP registration is part of "set up Atlassian", not its own ungrouped step. What changed: - `app/web/connector_prompts.py` — the Atlassian prompt's step 5 replaces the speculative "Register the on-demand Atlassian MCP under .claude/mcp/atlassian" line with the actual hosted Remote MCP registration: `claude mcp add --transport sse atlassian https://mcp.atlassian.com/v1/sse \|\| true`. The `\|\| true` keeps re-runs idempotent and the body explains the OAuth-on-first-use contract. Both /home's Atlassian tile and the inlined setup-script Atlassian sub-block emit this line — single source of truth holds. - `app/web/setup_instructions.py` — `_mcp_servers_block` deleted; the `mcp_servers` step is removed from `_step_numbers`; resolve_lines no longer calls it. - Renumbering: install (1), init (2), catalog (3), preflight (4), marketplace (5), diagnose (6), connectors (7), confirm (8). Was: 6 = mcp_servers, 7 = diagnose, 8 = connectors, 9 = confirm. - `tests/test_setup_instructions.py` — Confirm step 9→8, Connect 8→7, diagnose 7→6, mcp_servers references dropped. `test_step_numbering_with_connectors_step` now asserts `"mcp_servers" not in steps`. Stray-Confirm assertion lists shift by one position. - `tests/test_setup_page_unified.py` + `tests/test_web_ui.py` — same step-number shifts in the rendered /setup preview assertions. The `claude mcp add` line is still the Atlassian Remote-MCP path that the 2026-05-10 init-report Fix C added — only its position in the flow changes. /home Atlassian tile copying continues to install the MCP too (the prompt body the tile pastes contains the same line). 112 tests pass. 
* feat(atlassian): operator-overrideable base URL via AGNES_ATLASSIAN_BASE_URL Adds an env var / YAML key the operator (Terraform module, customer-VM template, OSS instance.yaml) can set to bake the Atlassian Cloud site root into the connector prompt — so end users don't have to guess / paste their org's `https://<myorg>.atlassian.net`. When set, the Atlassian connector prompt (rendered on both /home tile and inlined into the setup-script step 7 Atlassian sub-block) replaces step 1's "Ask me for my Atlassian Cloud site URL and email" with a one-line note that the URL is already provisioned by the operator and asks only for the email. Step 4's helper-script body has the `BASE_URL='<the site URL I gave you>'` placeholder substituted with the literal value. When unset (empty), the existing "ask the user" flow remains — no regression for OSS instances. Resolution + normalization in `get_atlassian_base_url()`: - env `AGNES_ATLASSIAN_BASE_URL` > yaml `instance.atlassian.base_url` > "" - strips trailing slash + trailing `/wiki` so the canonical value is the bare site root. Matches the per-user helper script's normalization at storage time (atlassian_prompt step 4 guard 2), so the literal baked in by the operator stays consistent with what the user's helper script would have computed from their input. Plumbing: - `app/instance_config.py`: new `get_atlassian_base_url()` resolver. - `app/web/connector_prompts.py`: - `atlassian_prompt(, base_url: str = "")` — string-replace two explicit placeholder phrases when base_url is truthy; otherwise return the prompt unchanged. - `all_connector_prompts(..., atlassian_base_url: str = "")` — forwards the kwarg. - `app/web/router.py` (`_build_context`): reads `get_atlassian_base_url()` and passes it through to `all_connector_prompts(...)` so both the /home tile context AND the inlined-script `resolve_lines(...)` call use the same value. 
- `src/welcome_template.py` (`compute_default_agent_prompt`): same threading via the existing import-on-demand path. Tests (`tests/test_home_route_resolution.py`): - `get_atlassian_base_url` resolver: default empty, env override, trailing-slash strip, trailing-`/wiki` strip. - `atlassian_prompt(base_url=...)`: literal URL baked in, ask-step removed, placeholder replaced, operator-baked-in copy appears. - `atlassian_prompt(base_url="")`: existing ask-the-user flow unchanged. - `all_connector_prompts(atlassian_base_url=...)`: kwarg threads through to the rendered atlassian prompt. 135 tests pass. feat(asana): register hosted Asana Remote MCP in connector prompt The Asana connector prompt only stored a PAT in the OS keychain + ran a curl verify against /api/1.0/users/me. That set Claude Code up for direct `curl` calls but didn't actually wire Asana into Claude's tool list — so the user couldn't ask Claude to "find my open Asana tasks" and have it work. Symmetric oversight to the Atlassian connector's original speculative `.claude/mcp/atlassian` line that this branch already replaced with `claude mcp add --transport sse atlassian https://mcp.atlassian.com/v1/sse`. Adds a new step 5 that registers Asana's hosted Remote MCP: claude mcp add --transport http asana https://mcp.asana.com/mcp \|\| true This is the V2 endpoint (streamable HTTP transport, launched February 2026). The V1 SSE endpoint at https://mcp.asana.com/sse was deprecated 2026-05-11 (today) and must NOT be used — calling it out explicitly in the prompt body so a future operator who finds an old reference doesn't paste the dead URL. OAuth is handled by Claude Code at first use, same model as the Atlassian MCP step. The PAT stored in step 3 stays for direct `curl` calls (precheck + ad-hoc scripts) — the MCP path uses its own OAuth grant, not the PAT. Old step 5 (revoke instructions) renumbers to step 6 and adds the `claude mcp remove asana` cleanup hint. 
Same single-source-of-truth invariant holds: /home Asana tile + the inlined Asana sub-block in the setup script (step 7 connectors) both emit identical text from `asana_prompt()`. 71 tests pass. * feat(asana): drive MCP OAuth login + end-to-end validation post-register `claude mcp add --transport http asana ...` only registers the server in Claude Code's local config — it does NOT trigger OAuth. The browser tab opens the first time any `mcp__asana__` tool gets invoked. So the previous step 5 left a user looking at a "registered" MCP that, in practice, hadn't authed yet and would fail on first real use. Same blind spot Atlassian's prompt also has, but Asana was the one called out in the latest review pass. Adds a new step 6 between MCP registration (step 5) and the revoke instructions (now step 7): a. Tell the user verbatim what's about to happen — a low-impact read through the MCP will pop the OAuth browser tab; sign in with the same account whose PAT they stored in step 3 and approve. Frames the OAuth as one-time so users don't wait for it on every later call. b. Drive an actual MCP read. Don't prescribe the exact tool name because the Asana MCP's exposed surface (`mcp__asana__`) is versioned upstream and we don't want to pin to a name that gets renamed. Instead: tell Claude to pick the lightest read from its surfaced tool list (users-me / list-workspaces / equivalent). Document the recovery path when Claude Code times out waiting for the OAuth tool use: `claude mcp list` to confirm registration before retrying. c. Print a single one-line proof that combines wiring + auth: "Asana MCP connected as <name> — <N> workspace(s) visible." Explicit anti-echo callout for tokens, task content, comments. On failure, surface the exact Claude-Code error and stop — no silent pass. d. Sanity-check that the MCP OAuth identity and the PAT identity reference the same Asana account. 
Easy mistake to make when the user has multiple Asana accounts — flag only on mismatch, keep quiet when they match. Recovery: `claude mcp remove asana && claude logout asana` then redo step 5. Step 7 (revoke) absorbs both the keychain delete + the `claude logout asana` line so users have a single place to undo everything. 43 tests pass. * fix(init): clear stale CA env vars on Windows before any TLS handshake Reported by the 2026-05-11 Windows test pass: after `agnes init` the gws connector failed with `UnknownIssuer` TLS errors because `SSL_CERT_FILE` and `REQUESTS_CA_BUNDLE` were still set in Windows User scope pointing at `C:\Users\localadmin\.config\agnes\ca-bundle.pem` — a file that did not exist on the test host. Past Agnes installs (the setup-prompt trust block + older bootstrap helpers) write those pointers when they materialize a combined Agnes-CA bundle; when the bundle file later disappears (re-init on a new VM, machine swap, the ~/.agnes dir wiped), the pointers go stale and every native Windows TLS handshake fails before Agnes itself runs. SSL_CERT_FILE in particular REPLACES (not appends to) the trust store, so a stale pointer is silently catastrophic. `agnes init` now clears stale pointers in two layers before the first server roundtrip: 1. Current-process env (os.environ) — what the immediately-following `api_get` to /api/catalog/tables actually reads. Without this, init itself blows up before it gets to step 2. 2. Windows User-scope env via PowerShell `[Environment]::SetEnvironmentVariable(name, $null, 'User')` — what every future shell + every native tool (gws, claude.exe, pip, uv) inherits. The 2026-05-11 reporter expected this exact cleanup ("init was supposed to clear these but they persisted"). The cleanup is best-effort and conservative: - Only deletes a var when its value points at a path that does NOT exist on disk. Intentional operator config (e.g. SSL_CERT_FILE pointing at a corp certifi bundle) stays put. 
- PowerShell missing / restricted execution policy / WSL-without-pwsh: swallowed silently. The current-process leg still runs, which unblocks init even on hosts where the User-scope leg cannot fire. Tests (`tests/test_init_ca_cleanup.py`, 6 cases): - Stale pointers → removed from process env. - Real-path pointers → preserved. - Non-Windows hosts: PowerShell is not invoked. - Windows hosts: PowerShell IS invoked with a script that checks all three vars + uses Test-Path + SetEnvironmentVariable. - PowerShell FileNotFoundError: cleanup swallows it, does not raise. - `_is_windows_host()` reflects sys.platform. * refactor(asana): MCP-first flow — drop PAT storage, precheck via `claude mcp list` The Asana hosted MCP at https://mcp.asana.com/mcp authenticates via OAuth (Claude Code holds the grant; browser tab pops on first tool use). The earlier prompt walked the user through creating + keychain- storing an Asana Personal Access Token AND registering the MCP — two parallel auth surfaces for one connector. Once the MCP works, the PAT has no consumer: the precheck/verify steps that used `curl $BASE/api/1.0/users/me` are just redundant proof that Asana itself is reachable, which the OAuth handshake already establishes. Removed: - Step 0 keychain probe + curl verify against /users/me with PAT. - Step 1 open developer-console / create PAT. - Step 2 click "+ New access token", warn shown-ONCE. - Step 3 helper-script for keychain-storage (per-OS bodies: macOS `security add-generic-password`, Linux `secret-tool store`, Windows `cmdkey /generic`). - Step 4 PAT-side `users/me` verify. - Step 5's split that kept the PAT around for direct curl scripts. - Step 6d's "MCP vs PAT identity sanity check" — there is no PAT anymore, nothing to mismatch against. 
New flow (3 steps total): - Step 0 precheck: `claude mcp list \| grep ^asana` — if found, the server is registered AND Claude Code is holding its OAuth grant (otherwise prior failure would have removed it); print "Asana MCP already registered — skipping setup" and stop. Tells the user the explicit reset command (`claude mcp remove asana && claude logout asana`) so a re-register stays one paste. - Step 1: `claude mcp add --transport http asana https://mcp.asana.com/mcp` — no `\|\| true` because step 0 should have caught the "already exists" case. Step explains the V2-vs-V1 endpoint distinction (V1 SSE deprecated 2026-05-11) and the abort-clean recovery if the precheck somehow missed the existing server. - Step 2: same OAuth + low-impact-read validation pattern as before. - Step 3: revoke instructions (mcp remove + logout + Asana-side app revoke at app.asana.com/Settings → Apps). Both surfaces (the /home Asana tile and the inlined Asana sub-block in the setup script's step 7) emit the new text from the same asana_prompt() — single-source-of-truth invariant intact. 77 tests pass.	2026-05-11 21:54:51 +02:00
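The resolution order and normalization described for `get_atlassian_base_url()` in this commit (env var, then YAML key, then empty string; trailing slash and trailing `/wiki` stripped) can be sketched as below. The helper names and the nested-dict shape assumed for `instance.atlassian.base_url` are illustrative, not the project's actual signatures.

```python
import os
from typing import Mapping, Optional


def normalize_site_root(url: str) -> str:
    """Strip a trailing slash and a trailing /wiki so only the site root remains."""
    url = url.strip().rstrip("/")
    if url.endswith("/wiki"):
        url = url[: -len("/wiki")].rstrip("/")
    return url


def resolve_atlassian_base_url(
    config: Optional[Mapping] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Resolve env AGNES_ATLASSIAN_BASE_URL > yaml instance.atlassian.base_url > ""."""
    env = os.environ if env is None else env
    from_env = env.get("AGNES_ATLASSIAN_BASE_URL", "")
    # Assumed layout: {"instance": {"atlassian": {"base_url": "..."}}}
    instance = (config or {}).get("instance") or {}
    from_yaml = (instance.get("atlassian") or {}).get("base_url") or ""
    return normalize_site_root(from_env or from_yaml)
```

An empty result keeps the existing "ask the user" flow; any non-empty result is the bare site root that would be substituted into the prompt.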
minasarustamyan	19c5a7592a	Session capture queue, private session, and setup-prompt fixes (#242 ) * Capture session paths via SessionStart hook + lock parallel pushes Replace the encoding-based scan of ~/.claude/projects/<encoded-cwd>/ with a queue file populated by a new `agnes capture-session` SessionStart hook. The hook reads the documented `transcript_path` field from Claude Code's hook stdin JSON, sidestepping the cwd-to-folder encoding (which is an internal implementation detail and varies by Claude Code version). - New `agnes capture-session` subcommand appends transcript_path to <workspace>/.claude/agnes-sessions.txt. Silent on all malformed input so a hook chain failure doesn't clutter Claude Code startup. - `agnes push` now consumes the queue: atomic snapshot rename guards against hooks writing during the push window, successful uploads land in agnes-sessions-uploaded.txt (TSV: timestamp + path), failed paths are requeued. - Cross-platform single-instance lock via the filelock package (fcntl on POSIX, msvcrt on Windows). Concurrent SessionEnd hooks — common when the user closes several sessions at once — silent-exit on the losing side instead of all racing the upload. - Recovery: pre-existing snapshot files from a crashed push are picked up and processed before the live queue. - The SessionStart `agnes push` self-heal entry is dropped — it became redundant once the queue persists across runs (orphans from headless / crashed sessions ship out on the next interactive SessionEnd push). Existing workspaces auto-migrate via the marker-based replace logic. - Legacy encoding scan stays available behind `--legacy-scan` for one- off backfills of sessions predating the queue. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add /agnes-private + statusLine indicator for private sessions Users handling sensitive data inside Claude Code can now opt a session out of the Agnes upload pipeline, either proactively (right after session start) or reactively (mid-session). The `/agnes-private` slash command runs `agnes mark-private` deterministically via `!`-prefix direct bash — no AI in the loop. A workspace-installed statusLine surfaces a `🔒 agnes-private` indicator in Claude Code's status bar so the user sees the state at a glance. Authoritative source of "do not upload" is a separate file `<workspace>/.claude/agnes-sessions-private.txt` (one session_id per line). Both `capture-session` (queue writer) and `push` (queue reader) consult the list. This makes the slash-command / SessionStart-hook race impossible by construction: whichever runs first, the session is correctly filtered out. - `agnes mark-private` reads `CLAUDE_CODE_SESSION_ID` from env (set by Claude Code in every bash subprocess it spawns — stable documented API) and appends to the private list. - `agnes statusline` reads the session JSON Claude Code pipes on stdin, checks the private list, and emits the indicator or nothing. Optimized for the high call frequency of statusLine renders. - `capture-session` extracts session_id from hook stdin and skips queue write when the ID is already on the private list (race protection). - `push` filters snapshot entries by the private list and appends to a per-workspace audit log `agnes-sessions-private-skipped.txt`. - Queue format migrated from `<path>` to `<session_id>\t<path>`; legacy one-column lines still parse (empty session_id, still upload, can't be marked private retroactively — fine, they pre-date the feature). - `install_claude_hooks` writes a workspace statusLine unless the user already has a custom one (warn + preserve). Idempotent re-init. 
- `install_claude_commands` ships `agnes-private.md` alongside `update-agnes-plugins.md`. Per-template fallback so a missing template doesn't get clobbered with the wrong content. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fix setup-prompt + CLAUDE.md marketplace copy + drop skills step Three issues against the post-PR-#240 / post-PR-#237 state: 1. Setup prompt's marketplace block trailer (both has-stack and empty-stack variants) claimed the SessionStart hook keeps the marketplace clone in sync via `agnes refresh-marketplace --quiet` on every session and that admin grants land automatically — both false since PR #237 (0.47.x) moved the install/update path out of the hook into the `/update-agnes-plugins` slash command. The hook is `--check`-only: detects server-side changes, prompts the user to run the slash command, which does the full reconcile interactively with output visible in the transcript. 2. The empty-stack variant framed composition as "admin grants only", missing the actual three-source served stack: (admin RBAC ∩ /marketplace subscriptions) ∪ system-mandatory plugins (admin-pinned, auto-applied) ∪ Flea market installs (skills/agents bundled, plugins standalone) Updated copy spells out all three sources so analysts know where their stack picks live, and what the SessionStart hook actually does on change detection. 3. CLAUDE.md template's "Agnes Marketplace" section conflated eligibility (`resolve_allowed_plugins` — what's listed) with served stack (`resolve_user_marketplace` — what actually reaches Claude Code). The two are different: a user can be RBAC-eligible for a plugin without having subscribed to it on /marketplace. Rewrote the section to distinguish the eligibility set from the served stack and to describe the `--check`-only hook accurately. Plus: deleted the setup prompt's interactive Skills step (final step before Confirm). 
The named-opinion question — "do you want me to bulk-copy every skill into ~/.claude/skills/agnes/ or pull on-demand via `agnes skills show <name>`?" — had no obvious right answer for new users at the tail end of a wall of technical steps. On-demand lookup is the one-size-fits-all default; `agnes skills list/show` remain discoverable and the CLAUDE.md template references specific skills inline (e.g. agnes-data-querying in the BigQuery section) where they're relevant. Layout: Confirm shifts from step 9 to step 8. Tests updated, full setup/marketplace/welcome surface green (115 passed). Remaining full-suite failures are pre-existing (BQ/Keboola fixtures, Windows charmap collection error in test_v26_keboola_e2e) — verified against a clean stash, unrelated to this diff. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fix session-queue race + snapshot PID-reuse data loss Two blocker fixes from the PR #242 review: 1. Concurrent SessionStart hooks could corrupt the queue file on Windows. Python's `open(path, "a")` is not atomic there — the CRT does not pass FILE_APPEND_DATA to CreateFile, so concurrent appenders (user opening several Claude Code windows simultaneously) could interleave bytes mid-line. The malformed lines then silently fail the parser and the entries are dropped. Fix: wrap append_to_queue, requeue_failed, and snapshot_queue in a short-lived FileLock on a dedicated `agnes-queue.lock`. Separate from `agnes-push.lock` so capture-session hooks don't block on the push command. New test_append_concurrent_threads_no_corruption reproduces the race with 4 threads x 50 appends. 2. Snapshot filenames embedded only the PID (`agnes-sessions.snapshot.<PID>.txt`). After a crashed push left a snapshot on disk and the OS recycled the PID for a new push, `os.rename` would atomically overwrite the recovery snapshot — every entry in it lost, silently. Fix: append a uuid8 hex tail (`agnes-sessions.snapshot.<PID>.<uuid8>.txt`). 
find_recovery_snapshots already globs the prefix so it picks up both old and new format. New test_snapshot_filename_is_unique_per_call asserts two consecutive snapshots under the same PID don't collide. Targeted tests green (47/47 in session_queue/capture_session/cli_push). Full suite failures unchanged from baseline (pre-existing BQ/Keboola fixture issues per CLAUDE.md). * Auto-refresh workspace hooks + bash-wrap all hook entries (Windows) Fixes from PR #242 second review (ZdenekSrotyr): 1. `uv.lock` regenerated to include `filelock 3.29.0` (declared in pyproject.toml but missing from the lock file — CI's lockfile-consistency check would fail; `uv pip install` on a clean cache would silently miss the dep). 2. `agnes self-upgrade` now auto-refreshes the workspace Claude Code hooks via the new `cli.lib.hooks.maybe_refresh_claude_hooks`. Closes the silent-stop migration gap: a v0.48 workspace would auto-upgrade the CLI from its existing SessionStart self-upgrade entry but never pick up the new `agnes capture-session` SessionStart hook, leaving the queue empty and `agnes push` uploading nothing. The refresh fires on both the "info is None" fast path (CLI already current — catches the second SessionStart after a prior upgrade) and the install-success path. Guarded by `workspace_has_agnes_hooks` so it never writes `.claude/settings.json` into directories that aren't Agnes workspaces (e.g. `agnes self-upgrade` invoked from `~/`). Errors are surfaced on stderr but never flip the upgrade exit code. 3. All Agnes-managed hooks are now wrapped in `bash -c "..."`. The self-upgrade+pull chained SessionStart entry was the only one still shipping unwrapped — Claude Code on Windows runs hook commands directly without a shell, so the `;` chain + `2>/dev/null` + `|| true` shell syntax silently no-op'd on native Windows installs without Git Bash on PATH. Workspaces still on the old form auto-upgrade via the refresh path above.
Tests: +12 in test_lib_hooks.py (guard semantics, v0.48→v0.49 migration end-to-end, third-party-hook preservation, bash-wrap invariant). +5 in test_self_upgrade.py (refresh fires on info=None, fires on install success, skipped on failure, skipped on --check-only, refresh failure never flips exit code). 130 targeted tests green. The 2 pre-existing Windows path-separator failures in `test_smoke_test_detects_version_mismatch[uv|pip]` are unrelated (path mismatch `\fake\uv\bin\agnes` vs `/fake/uv/bin/agnes` in test asserts, pre-PR baseline). * CHANGELOG: document PR-242 main features Closes ZdenekSrotyr #4: the [Unreleased] block was missing entries for the PR's primary surface — only the post-merge fix bullets and the unrelated setup-prompt copy change were captured. Adds: - ### Added: 6 bullets covering the session capture queue + new `agnes capture-session` subcommand, `/agnes-private` slash + `agnes mark-private`, `agnes statusline` + statusLine wiring, `--legacy-scan` opt-in fallback, single-instance push lock, and the new `filelock` runtime dep. - ### Changed: BREAKING bullet on the SessionStart / SessionEnd hook wire format change (capture-session as first SessionStart entry, push self-heal removed, SessionEnd push detached via nohup, all entries bash-wrapped). Folds the prior standalone bash-wrap bullet into this consolidated entry — Z's review flagged the layout shift as BREAKING, and grouping the related sub-changes makes the migration story readable in one place. - Operator migration is auto-handled by `maybe_refresh_claude_hooks` invoked from `agnes self-upgrade` (separate Changed entry below). No `agnes init` re-run required. Pre-queue session jsonls on upgrading workspaces still need a one-off `agnes push --legacy-scan` — flagged in the BREAKING bullet. No code change; doc only. * Drop permanent 4xx uploads instead of requeueing forever Closes ZdenekSrotyr #5.
Previously the push retry path requeued any non-200 response except the literal "file not found on disk", so 401 (token expired), 403 (RBAC denial), 413 (payload too large), 400 (server-side validation) cycled through every push run forever — the queue grew without bound and each run re-bombarded the server with the same deterministically-failing upload. Now 4xx (except 408 Request Timeout + 429 Too Many Requests, which the HTTP spec marks as transient) is dropped and audit-logged to `<workspace>/.claude/agnes-sessions-failed.txt`: <iso_ts>\t<session_id>\t<status>\t<transcript_path> 5xx and network errors continue to requeue — those reflect server / transport state that can change between runs, so retry is the right behavior. The audit log piggybacks on the push single-instance lock (agnes-push.lock) — push is the only writer to this file, same as the existing `mark_uploaded` and `mark_private_skipped` paths, so no separate filelock is needed. `agnes push --json` surfaces a new `dropped_permanent` counter; non-quiet stdout mentions the audit-log path so operators tailing the output have a pointer to the forensic trail. Tests: +7 in test_cli_push.py (401/400/403/413 → drop; 408/429 → requeue; 500/502/503 → requeue; network exception → requeue; --json `dropped_permanent` counter; stdout audit-log pointer). +1 in test_session_queue.py (mark_failed_permanent TSV format). 127/129 targeted tests green. The 2 pre-existing Windows path-separator failures in `test_smoke_test_detects_version_mismatch[uv|pip]` are unrelated (path mismatch `\fake\uv\bin\agnes` vs `/fake/uv/bin/agnes` in test asserts, pre-PR baseline). * Catch OSError in push lock acquisition Closes ZdenekSrotyr #8. `acquire_or_skip` in `cli/lib/push_lock.py` previously caught only `filelock.Timeout`. Any `OSError` from `FileLock.acquire` — read-only filesystem, permission denied on `.claude/`, disk full, hardware I/O failure — propagated as an unhandled traceback.
Two visible failure modes: - SessionEnd hook: `|| true` in the wrapper swallowed the error, so daily pushes silently never ran. Operator had no signal. - Manual `agnes push`: ugly Python traceback dumped to the terminal instead of a clean exit. Now `OSError` is treated the same as `Timeout` — yield `None`, caller returns cleanly with rc=0. The operator's environment in these scenarios has bigger problems than missing session uploads, so we swallow rather than retry-loop or surface a noisy warning. Test: `test_push_silent_exit_when_filelock_raises_oserror` patches the `FileLock` used inside `push_lock` to raise OSError on acquire, verifies push exits 0 with no traceback and the queue is preserved for the next attempt. * Address remaining S2 items from PR-242 review Four items from ZdenekSrotyr's S2 list: S2.10 — `_install_statusline` truthy check (cli/lib/hooks.py): replace `if existing:` with explicit `if existing is None or existing == "":`. Documents and tests the behavior for both edge cases (explicit-null and empty-string `statusLine`) — both treated as "not configured" rather than "explicit user opt-out", so we install ours. Two new tests in test_lib_hooks.py pin the contract. S2.6 — onboarding docs for /agnes-private. New "Private sessions" subsection in `config/claude_md_template.txt` (next to Data Sync) covering the slash command, statusbar indicator, and audit-log location. One-line tip in `app/web/setup_instructions.py` so the feature is discoverable at onboarding. S2.9 — e2e privacy test (tests/test_e2e_privacy.py). Wires capture_session → mark_private → push against a recording fake api_post and asserts zero session uploads for the marked one. Three cases: mark-before-capture (queue write skipped), mark-after-capture (push-side filter catches it + audit-logs), control (unmarked sessions upload normally). David #8 — `--legacy-scan` help text now documents the private-list gap (legacy entries carry empty session_id, so the filter is not consulted).
The practical impact is bounded — pre-queue sessions cannot have been marked private since the private list is a queue-era feature — but the disclaimer in the help text means an operator running a backfill is not surprised. 68 targeted tests green (3 new e2e + 2 new truthy edge tests + existing). 2 pre-existing Windows path-separator failures in test_smoke_test_detects_version_mismatch[uv|pip] unchanged. Remaining S2 items (statusline mkdir push-back, capture-session silent-fail follow-up) handled in PR comment + follow-up issue respectively. * Address remaining S2 follow-ups (David #8, S2.7, David #11) Three items left over from Mina's bbf63472 batch — that commit addressed S2.6/S2.9/S2.10 + documented David #8 in help text but deferred the actual implementations of S2.7, David #11, and the real David #8 fix to follow-ups. This commit closes them. David #8 — `agnes push --legacy-scan` now consults the private list. Claude Code names jsonls `<session-id>.jsonl`, so the file stem IS the session id; the legacy-scan path can apply the same private filter the queue path uses. Both the dry-run and live-upload code paths fixed. Help text updated (no longer warns the filter is bypassed). Two new tests in test_cli_push.py cover the upload-skip path + the dry-run `would_skip_private` segregation. S2.7 — `statusline`/`is_private` no longer mkdir-pollutes arbitrary workdirs. Split `_claude_dir` into `_claude_dir_writable` (used only from `add_private`) and `_claude_dir_readonly` (no mkdir). The read-only public helpers (`private_list_path`, `read_all_private`, `is_private`) compose the no-mkdir variant by default; `add_private` opts in via `writable=True`. Added a process-local mtime-keyed cache around `read_all_private` so in-process callers (push doing one stat per upload candidate, future `agnes diagnose`) don't re-parse the file on every check. Cache eviction on `add_private` so a sub-second write+read sequence doesn't see stale data even on coarse-mtime filesystems.
Two new tests pin the no-mkdir contract + the in-same-second add+read consistency. David #11 — `agnes capture-session` writes a breadcrumb log on every invocation. New `<workspace>/.claude/agnes-capture-session.log` TSV: `<iso_ts>\t<outcome>\t<detail>` where outcome covers every silent-exit path (`ok`, `private_skip`, `empty_stdin`, `bad_json`, `not_object`, `no_transcript_path`, `stdin_read_error`, `write_error`). Gives operators a signal to detect "hook fires but queue stays empty" — without it, an upstream Claude Code stdin-contract change is invisible because the hook always exits 0. Log rolls at 256 KiB so it doesn't grow unbounded on long-lived workspaces. Best-effort: a breadcrumb-write failure is itself swallowed so the hook contract stays "exit 0 always". Skipped in non-Agnes workdirs (no `.claude/` exists) so opening Claude Code in `~/` doesn't pollute it. Five new tests in test_capture_session.py cover the success / bad_json / no_transcript_path / private_skip / no-pollute paths. 115 targeted tests green (test_cli_push, test_capture_session, test_private_list, test_session_queue, test_e2e_privacy, test_lib_hooks, test_statusline, test_mark_private). --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-11 13:31:16 +00:00
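Note on the "Drop permanent 4xx uploads" change above: the retry rule it describes can be sketched as a small classifier. This is only an illustration under stated assumptions, not the code in `cli/commands/push.py`; the names `Disposition`, `classify_upload_result` and `format_failed_entry` are hypothetical, and treating other non-200 codes (for example 3xx) as retryable is a guess the commit message does not confirm.

```python
from enum import Enum
from typing import Optional


class Disposition(Enum):
    """Hypothetical outcome labels for one upload attempt."""
    UPLOADED = "uploaded"
    REQUEUE = "requeue"
    DROP_PERMANENT = "drop_permanent"


# 408 and 429 are the two 4xx codes the commit message treats as transient.
TRANSIENT_4XX = frozenset({408, 429})


def classify_upload_result(status: Optional[int]) -> Disposition:
    """Map an HTTP status (or None for a network error) to a disposition."""
    if status is None:
        # Network errors reflect transport state that can change between runs.
        return Disposition.REQUEUE
    if status == 200:
        return Disposition.UPLOADED
    if 400 <= status < 500 and status not in TRANSIENT_4XX:
        # Deterministic client-side failure: retrying would fail the same way.
        return Disposition.DROP_PERMANENT
    # 5xx, 408, 429 and (assumption) any other non-200 code are retried.
    return Disposition.REQUEUE


def format_failed_entry(iso_ts: str, session_id: str, status: int,
                        transcript_path: str) -> str:
    """Build one audit-log line in the TSV layout quoted in the commit message."""
    return f"{iso_ts}\t{session_id}\t{status}\t{transcript_path}"
```

Keeping the classification in a pure function like this would make the 401/408/500 cases cheap to test without a fake server, which matches the kind of cases the commit's test list enumerates.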
Vojtech	41829e8a45	Setup-prompt + bootstrap fixes from 2026-05-10 init report (#240 ) * Setup-prompt + bootstrap fixes from David's 2026-05-10 init report Three issues from clean-machine bootstrap evidence: 1. `agnes refresh-marketplace --bootstrap` failed to recover when the local clone existed but Claude Code's marketplace registry had lost the `agnes` entry. Bootstrap path now parses `claude plugin marketplace list`, re-runs `claude plugin marketplace add ~/.agnes/marketplace` when missing, and treats `add` failures as fatal (was warn-and-continue, root cause of the cascade into "Marketplace 'agnes' not found" plugin install errors). 2. Setup prompt now always emits the marketplace-registration block, even when the operator has zero plugin grants. Pre-wires the SessionStart hook so future admin grants land automatically without re-running setup. Block copy adapts: empty list shows "no plugins granted yet", populated list shows "install plugins". 3. Setup prompt registers the Atlassian Remote MCP server unattended (`claude mcp add --transport sse atlassian https://mcp.atlassian.com/v1/sse`). Hosted Remote MCP, OAuth handled automatically by Claude Code on first use. Asana / GWS stay on the /home connector cards (PAT/keychain flows don't fit unattended bootstrap). Confirm step nudges the user toward the /home connector cards for the PAT-flow services. CLAUDE.md template renames the marketplace section to "Agnes Marketplace" and documents that all plugins are addressed as `<plugin>@agnes` regardless of upstream slug. Layout: Confirm shifts from step 6/8 to step 9 across all variants (preflight, marketplace, MCP all unconditional). Tests updated. * Link Claude license options from /home install pane Step-1 Claude install on /home pointed users to OAuth without explaining what to do if they don't have a Pro/Max subscription. 
Add a one-line follow-up link to the plan-tier section on /setup-advanced (new `#claude-plan` anchor) so first-time users discover the subscription tiers rather than bouncing on the OAuth screen. * Add idempotent + no-TLS-bypass guardrails to /home connector prompts The Asana / Google Workspace / Atlassian connector prompts on /home already shipped a precheck step that short-circuits when the service is already wired, but they didn't carry the same idempotency + surface-errors-verbatim + don't-disable-TLS-verification guardrails the bash bootstrap prompt has. Add a one-paragraph 'Ground rules' block at the top of each prompt so a connector failure doesn't tempt the model into bypass workarounds, matching the same posture David's 2026-05-10 init report flagged for the bash flow. * skip Source: lines in marketplace registry detector `claude plugin marketplace list` prints a `Source: <local path>` line under each registered marketplace; the local clone almost always lives under a path containing the marketplace name itself (`~/.agnes/marketplace`). A naive \\bagnes\\b match over the full stdout therefore false-positives whenever ANY unrelated marketplace sits under `~/.agnes-…/` or similar. Filter Source: lines out before matching so the recovery path actually re-adds when needed instead of silently falling through to a broken `marketplace update agnes`. Adds regression test covering the substring-only case. * drop customer-specific tokens from CHANGELOG entries Per CLAUDE.md vendor-agnostic OSS rule ("nothing customer-specific ... in changelogs"): - "agnes-vrysanek.groupondev.com" -> "a private-CA Agnes deployment" - "Groupon Marketplace / groupon-marketplace" -> "<Org> Marketplace / <org>-marketplace" (placeholder example) - Removed "David flagged" attribution language; init-report context stays intact, just stripped of the named host + brand --------- Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>	2026-05-10 20:24:00 +02:00
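Note on the "skip Source: lines" fix above: the detector change can be illustrated with a short sketch. It assumes only what the commit message states, namely that `claude plugin marketplace list` prints a `Source: <local path>` line under each registered marketplace. The function name `marketplace_is_registered` and the sample listings are hypothetical, and the real output layout may differ.

```python
import re


def marketplace_is_registered(list_stdout: str, name: str = "agnes") -> bool:
    """Return True if `name` appears as a whole word outside any Source: line."""
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    for line in list_stdout.splitlines():
        # Source: lines carry local clone paths, which often contain the
        # marketplace name as a path component, so they are ignored.
        if line.strip().startswith("Source:"):
            continue
        if pattern.search(line):
            return True
    return False


# Hypothetical listing: an unrelated marketplace cloned under ~/.agnes-other/.
unrelated_only = "other-tools\n  Source: /home/u/.agnes-other/marketplace\n"
# Hypothetical listing: the agnes marketplace itself is registered.
with_agnes = "agnes\n  Source: /home/u/.agnes/marketplace\n"

print(marketplace_is_registered(unrelated_only))  # False
print(marketplace_is_registered(with_agnes))      # True
```

Without the `Source:` filter, the word-boundary match would succeed on `/home/u/.agnes-other/marketplace`, because `.` and `-` both count as word boundaries around `agnes`.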
minasarustamyan	d9405a6888	Move marketplace plugin updates from hook to /update-agnes-plugins skill (#237 ) * Move marketplace plugin updates from hook to /update-agnes-plugins skill The SessionStart hook used to run `agnes refresh-marketplace --quiet`, which performed a full fetch+reset+install cycle on every Claude Code session start. That work was invisible to the user, slowed session startup, and was unrecoverable interactively when something failed. Split the responsibility: - `agnes refresh-marketplace --check` is a new lightweight detector: `git fetch` only, compares local HEAD with remote FETCH_HEAD, emits a Claude Code hook JSON message pointing the user at `/update-agnes-plugins` when the marketplace has changes. No reset, no plugin install/update side effects. - `/update-agnes-plugins` is a new slash command (installed by `agnes init` into `<workspace>/.claude/commands/`) that runs `agnes refresh-marketplace` (default chatty path). Output streams into the Claude Code transcript so the user sees install/update progress and can react to errors interactively. - The SessionStart hook now runs `--check`. Existing workspaces auto-upgrade on next `agnes init` (substring marker matches both the old `--quiet` entry and the new `--check` one). BREAKING: `agnes refresh-marketplace --quiet` is removed. Old hooks calling it silent-noop after the CLI upgrade (the hook's `|| true` swallows the unknown-flag error) until re-init rewrites them. * Point marketplace 'Added to your stack' hint at /update-agnes-plugins The post-install green panel on plugin and skill/agent detail pages referenced the SessionStart auto-install path and a shell-prompt `agnes refresh-marketplace` invocation. With the hook now being detect-only, that copy was misleading — the actual install path is the new slash command. Condensed to a single instruction: "Open a new Claude Code session and run:" followed by `/update-agnes-plugins` in a copy-chip. JS clipboard string updated to match.
--------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>	2026-05-09 21:10:39 +02:00
Vojtech	2e2e1a1eca	feat(home): state-aware /home + /setup-advanced + schema v26 (#228 ) * feat(home+news): state-aware /home + /news + admin-edited news section Squash of the vr/home-page feature work for clean rebase onto main. Original 18-commit history preserved in branch backup/vr-home-page-pre-rebase. What's in this PR: State-aware /home page - New `/home` route with hero + auto-mode + connectors (Asana / GWS / Atlassian) + lookarounds. Onboarded vs not-onboarded state-machine branches a single template (`home_not_onboarded.html`); the install steps, "Setup a new Claude Code" CTA (90-day PAT mint), and per- connector setup prompts hide once `users.onboarded=TRUE`. A completion badge replaces them. - "Mark me as offboarded" button reverses the flag without an SQL UPDATE. - `users.onboarded BOOLEAN` column added; default FALSE; flipped by the CLI's `agnes init` post-success POST and the `/admin/users` API. - Connector setup prompts pre-check whether the tool is already installed/connected before re-running setup. - GWS scope set widened to include Google Chat (`chat.spaces`, `chat.messages`). Single template + design tokens - `dashboard.html` now extends `base.html` via the new `{% block layout %}` opt-out (full-width pages skip the 800px `.container`). Net: every page shares one shell. - `style-custom.css` `:root` extended with `--space-{7,9,10,12}`, `--radius-2xl`, `--shadow-{card,elevated}`, `--text-{muted,disabled}`, `--focus-ring`, `--transition-`, `--width-{narrow,app,wide}` so inline page styles can migrate incrementally. Auth redirects honor AGNES_HOME_ROUTE* - `safe_next_path` resolves the configured home route when no `default=` is passed; OAuth callbacks, magic-link clicks, password form, and LOCAL_DEV_MODE shortcuts now land on `/home` (or whatever the operator picked) instead of always /dashboard. News section + /news permalink + /admin/news editor - Schema-bumped `news_template` table (single versioned entity, draft + publish gate). 
`published BOOLEAN` distinguishes draft from public; monotonically-increasing `version` per save; rows >30d pruned on save except the currently-displayed published version. - `/home` bottom-of-page renders the latest published intro with a "Read more →" link to `/news` (which renders the full body). - `/admin/news` editor with sandboxed live preview, versions table, per-row Unpublish, Format-help cheatsheet. - `agnes admin news show / draft / edit / publish / unpublish / versions / export` (CLI). Talks to the live server via the `/api/admin/news/` endpoints (PAT-authed) — no direct DB access so it coexists with a running uvicorn. - Optimistic-lock guard: `agnes admin news publish --version N` and PUT/PATCH endpoints accept `expected_version` and 409 with structured `{error: "version_conflict", expected, actual, actual_by}` when a concurrent admin replaced the draft. Edit refuses to overwrite a draft authored by someone else without `--force` or `--expect-version`. - nh3 (Rust-backed ammonia) HTML sanitizer; iframe pre-pass strips any iframe whose src is not on the YouTube/Vimeo/Loom allowlist; javascript:/data: schemes blocked everywhere. - Author CSS vocabulary: `.news-hero` (blue gradient hero block), `.callout`/`.callout-{info,warn,success,danger}`, `.video-embed`, `.news-section`, `.news-grid-{2,3}`, `.news-cta` — all consolidated in `style-custom.css` under "News content vocabulary (shared)" so /home perex, /news body, and /admin/news preview share one source of styling. - Code-inside-`<pre>` contrast fix (was unreadable amber-on-silver). - `.news-content` table styling (border, header band, row-hover). `scripts/dev/run-local.sh`* — local uvicorn launcher. Pulls Google OAuth client id/secret from GCP Secret Manager (`AGNES_OAUTH_GCP_PROJECT`-driven, no vendor defaults), points `AGNES_CLI_DIST_DIR` at `./dist` so the wheel endpoint resolves, and `--dev` flips `LOCAL_DEV_MODE=1` + `AGNES_HOME_ROUTE=/home` for one- command iteration. 
`LOCAL_DEV_MODE=1` also enables the FastAPI debug toolbar. CLAUDE.md "Run tests before every push" section codifies `pytest tests/ -n auto -q` as non-negotiable before each push. Tests: 51 + 14 + 8 = 73 new tests across news-template repo, sanitizer, API, web, CLI; plus updated home/auth/template tests for the new shared-shell architecture. Origin docs (gitignored, customer-fork content): docs/brainstorms/home-page-requirements.md, docs/plans/2026-05-07-001-feat-home-page-plan.md. * feat(cli): agnes onboarded {on,off,status} — self-scoped flag toggle User-facing equivalent of the in-page "Mark me as (off)boarded" button on /home. POSTs /api/me/onboarded with {onboarded, source}; --source overrides the audit-log marker so flips made from the CLI vs the web button vs agnes init automation stay distinguishable. `status` reads via /api/me/profile (when present); falls back to a quick body-marker scan of /home so the read path doesn't write an audit_log row. PAT-authed via cli.client.api_post — same convention as agnes admin news / agnes admin add-user etc. Tests: 5 covering on/off/status round-trip, idempotency, and audit-log source recording. Full suite holds at 12 pre-existing failures (same set as before). * ui(nav+home): primary nav reorg + green What's new band + /marketplace link fix Primary nav (post-rebase audit + per-user feedback): - Items: Home → Marketplace → Data Packages → Memory. Admin dropdown for admins only. The "Dashboard" label was renamed Home — point still resolves through `home_route` so customer instances on /dashboard still land there. - Activity Center moved into the Admin dropdown. Per-team adoption analytics is admin-consumed in practice; the route still allows any authed user for direct deep-links so existing /home tile + bookmarks keep working. - Memory link added (→ /corporate-memory) — was previously buried in the /home "Look around" tiles. - Setup local agent + My Stack dropped from main nav. 
Setup is the /home install flow's home now; My Stack lives as a tab inside /marketplace. /home tweaks: - Plugin marketplace tile now points at /marketplace (was /store — legacy from before the marketplace rebrand landed in #230). - "What's new" section header gets a green band (success-flavored D1FAE5 background, A7F3D0 border, darker green title) so the bottom-of-page news block visibly distinguishes from the blue install-hero at the top. Header strip only — body stays white. Test fix: test_home_route_resolution renamed `dashboard_link_uses_home_route` → `home_link_uses_home_route` and asserts `href="/home">Home` instead of `href="/home">Dashboard` after the label change. * fix(home): decouple Step 3 + Connect-tools collapse from server onboarded flag The server-side `users.onboarded` flip happens through two paths: 1. Explicit user click on "Mark me as onboarded" or `agnes onboarded on`. 2. Implicit `agnes init` POST → /api/me/onboarded on success. Path 2 produced a UX surprise: an analyst running `agnes init` mid-flow reloaded /home and saw Step 3 (auto-mode) + Connect-your-tools auto- collapse to summary bars. They were actively working through those sections — the install POST never signalled "I'm done with the rest of setup", just "Agnes itself is installed". Decouple the section-collapse decision from the server flag: - Step 1 + Step 2 install blocks: still hidden on `onboarded=TRUE` (their completion is a hard server signal — Agnes IS installed). - Step 3 + Connect-your-tools: render flat by default in BOTH states. Wrapped in `<details class="setup-collapsible" open>` so the browser's native disclosure handles per-section toggle without JS, but the `<summary>` is CSS-hidden until the page-level `data-setup-minimized="1"` attribute is set on `.home-mock`. - New "Minimize setup view" toggle inside the blue install-hero, rendered only when onboarded. Click flips the data-attr on `.home-mock` AND removes the `open` attribute from each `<details>`. 
State persists in `localStorage["agnes_home_setup_minimized"]` so the choice survives reloads but is per-device. - "Show full setup view" (the same button when minimized) re-opens both `<details>` and clears localStorage. When minimized, each `<details>` still has its own native expand/ collapse — click the gray summary bar to peek at one section without toggling the page-level minimize off. Tests: - test_step3_and_connectors_render_flat_when_onboarded_by_default — asserts `<details class="setup-collapsible" ... open>` for both sections post-onboarding and the absence of any server-rendered `data-setup-minimized` attribute on the `.home-mock` root. - test_minimize_toggle_visible_only_when_onboarded — toggle button rendered only when onboarded. Full pytest holds at 12 pre-existing failures (same set).	2026-05-08 18:28:47 +02:00
Vojtech	107195730d	feat(observability): optional PostHog integration (#231 ) * feat(observability): optional PostHog integration (errors, LLM traces, replay, flags) Off by default. Activates when POSTHOG_API_KEY is set in env. Defaults to PostHog Cloud EU; override host for US Cloud or self-hosted. Coverage: - FastAPI 500 handler captures unhandled exceptions - src/orchestrator.py rebuild + rebuild_source failures - services/scheduler/ HTTP-job failures - cli/main.py uncaught CLI errors (Typer.Exit/SystemExit/KeyboardInterrupt skipped; flushes before re-raise so short-lived CLI invocations don't drop events) - connectors/llm/anthropic_provider.py + openai_compat.py emit $ai_generation events with provider, model, latency, token counts (prompt/completion bodies stay off unless POSTHOG_LLM_PAYLOADS=1 because LLM prompts here routinely include customer SQL/data) - Browser snippet injected into every text/html response by PosthogInjectionMiddleware — registered inside the GZip layer so it sees uncompressed HTML before compression. Many templates are standalone (their own DOCTYPE) and never extend base.html, so a per-template include would miss them. - Frontend: $pageview, $pageleave, JS error capture via window.error and unhandledrejection handlers, masked session replay (maskAllInputs: true plus CSS-selector mask for known data surfaces), feature flags (browser posthog.isFeatureEnabled + server-side feature_enabled with fallback for older SDKs). Identification mode operator-configurable: none / id / email / full. Default email ships user.id + email but never name. CLI entry point moves from cli.main:app to cli.main:main (Typer wrapper). 
Files: - src/observability/posthog_client.py — lazy singleton, no network when disabled, single-process flush on shutdown - src/observability/llm_tracing.py — trace_generation context manager - app/middleware/posthog_inject.py — HTML rewrite middleware - app/web/templates/_posthog.html — browser snippet template - docs/observability.md — operator guide - config/.env.template — documented POSTHOG_* knobs - tests/test_posthog_disabled.py + tests/test_posthog_client.py + tests/test_llm_tracing.py — 18 tests covering disabled state, identify-mode payloads, $ai_generation shape, error variant. CHANGELOG entry under [Unreleased] Added. * feat(observability): tag every PostHog event with environment + release Splits PostHog dashboards cleanly between localhost / dev / staging / production without manual tagging on every capture call. - POSTHOG_ENVIRONMENT explicit override; auto-resolves to "local" when LOCAL_DEV_MODE=1, else RELEASE_CHANNEL, else AGNES_DEPLOYMENT_ENV, else "unknown". - AGNES_VERSION → RELEASE_CHANNEL fallback feeds the `release` property for "is this error new in this release?" cohorting. - Backend gets both via the PostHog SDK's super_properties constructor arg (every captured event picks them up automatically). - Browser snippet calls posthog.register({environment, release}) inside the loaded callback so $pageview, $exception, autocapture, etc. all carry the same labels. - request.state.user now populated by auth dependencies so the snippet can actually call posthog.identify(user_id, {email}) for logged-in users (previously the user block always resolved to None because nothing wrote to request.state.user). 4 new tests cover env resolution: explicit > LOCAL_DEV_MODE > channel > unknown, plus super-properties forwarding into the SDK constructor. 
* feat(observability): inline user attrs on every PostHog event + debug throw route PostHog's UI shows person properties on the Person profile page, not inline on each event — so a reviewer triaging an exception couldn't tell which user hit the bug without clicking through. Fix it on both sides. - Backend capture_exception merges user_id / user_email / user_name into the event properties (gated by POSTHOG_IDENTIFY_PII: none/id/email/full). Backed by a new _user_props_for_event helper on PosthogClient. - Browser snippet registers user_id + user_email + user_name as super- properties via posthog.register({...}) so every $exception, $pageview, and custom event coming from posthog.captureException() carries them inline. Mirrors the backend so cross-referencing client/server events doesn't require a person-profile lookup. - /api/debug/throw — debug-only endpoint gated by DEBUG=1 (404 in prod). Runs Depends(get_current_user) first so request.state.user is set when the unhandled-exception handler captures the event. Lets operators exercise the full observability path end-to-end without hand-rolling a TestClient script. Configurable via ?kind=ValueError&msg=... 7 new tests cover: backend user-attr merge across identify modes, anonymous request fall-through, browser snippet super-prop emission for logged-in / anonymous / id-only / full-name cases. * fix(observability): address minasarustamyan PR #231 review Two bugs caught in review. 1. PosthogInjectionMiddleware dropped Response.background on every return path. BaseHTTPMiddleware materialises the body and asks subclasses to return a fresh Response — three paths in dispatch() omitted background=, silently cancelling any BackgroundTask / BackgroundTasks the route attached (audit logging, async webhooks, email sends) with no log line. Fix: route every return through a _passthrough() helper that forwards background. Also adds a _MAX_BUFFER_BYTES (4 MB) cap so a streamed-HTML response can't balloon RSS during buffering. 
Bigger bodies short-circuit through with a warning rather than being injected. Regression tests in tests/test_posthog_inject_middleware.py exercise four return paths (snippet present, render-fail, double-injection guard, non-HTML passthrough) plus the streaming-guard short-circuit. 2. $ai_input / $ai_output_choices were emitted without truncation, so POSTHOG_LLM_PAYLOADS=1 silently dropped events past PostHog's ~32 KB per-event ingest limit — exactly the calls (large prompts with schemas / sample rows / SQL) an operator would want to inspect. Fix: clip both at POSTHOG_LLM_PAYLOAD_MAX_CHARS (default 30000) with an explicit "…[truncated N chars]" marker so readers don't mistake truncated captures for complete ones. Metadata (provider, model, tokens, latency, error) flows regardless. Three new tests cover default-cap clipping, env-override, and pass-through under the cap. 37 PostHog tests pass.	2026-05-08 17:57:10 +04:00
minasarustamyan	4fb2818a19	Add /marketplace browse page + Model B opt-in stack composition (#230 ) * Add /marketplace browse page + Model B opt-in stack composition New /marketplace browse surface unifies the curated marketplaces (admin-managed git mirrors) and the community Flea Market behind three tabs — Curated / Flea / My Stack — with per-tab category filter, search across both sources with scope checkboxes, and numeric pagination, all driven by URL query state. Plugin detail at /marketplace/curated/<slug>/<plugin> and /marketplace/flea/<id>; nested skill / agent detail at /marketplace/curated/<slug>/<plugin>/ {skill,agent}/<name> and the flea-side single-page detail. Model B opt-in: an RBAC grant on a curated plugin is now only eligibility. The user must click "Add to my stack" for it to enter their served Claude Code marketplace. Composition flips from (rbac ∖ opt_outs) ∪ store_installs to (rbac ∩ subscriptions) ∪ store_installs. The legacy user_plugin_optouts table is renamed user_curated_subscriptions (schema v27) — same table shape, inverted semantic, repository methods become subscribe / unsubscribe / is_subscribed. UX vocabulary: Install → Add to my stack, Installed → In your stack, card "Installed" badge → "In stack" (amber pill), tab "My Subscriptions" → "My Stack". Bridges the two-step model (server-side bookmark vs. on-laptop install) the previous label hid. Click triggers an inline post-add hint panel under the description with the agnes refresh-marketplace recipe + Copy chip, dismissible per-browser via localStorage. Per-tab info blocks above the filter row: - Curated: trust signal — "Each plugin here has a named curator accountable for it." (blue accent + See-all-curators link) - Flea: open-shelf signal — "Anyone in the company can upload here." (purple accent + Tips-for-sharing link) - My Stack: personal-shelf orientation — "Your AI stack — everything you've added." 
(slate accent, no link) Tabs carry per-tab Heroicons (shield-check / building-storefront / rectangle-stack) tinted to match each tab's accent; flips white when the tab is active for contrast. Hero illustration anchored to the right of the blue hero panel (absolute, 47% wide, behind the search row content). Hidden under 900px viewport. Action-row CTAs realigned to publication intent: curated "How to add new content" → "Submit a plugin" (links to the guide page); flea button removed since +Upload sits next to it. Empty-state CTAs match. /marketplace/guide/{curated,flea} routes now host publication-flow guide pages with placeholder ledes — full copy to be authored separately. Categories: Heroicons-based icons mapped per category in src/category_icons.py (zero new dependencies; SVG path strings inlined). Marketplace cards, filter pills, and detail pages read from the same source. API endpoints under /api/marketplace: - GET /items per-tab listing (curated / flea / my) - GET /categories per-tab non-zero counts - GET /curated/{slug}/{plugin} plugin detail - POST/DELETE /curated/{slug}/{plugin}/install subscribe toggle - GET /curated/{slug}/{plugin}/{skill,agent}/{name} inner item The tab=my branch reads directly from user_curated_subscriptions ∪ user_store_installs (not resolve_user_marketplace, which bundles flea skills/agents into a single store-bundle synthetic entry useful for serving the Claude Code marketplace ZIP/git but wrong for browsing where each item should appear as its own card). Detail pages: plugin detail surfaces inner skills/agents as clickable nested cards; commands/hooks/MCPs render as plain name lists. Skill/agent detail mirrors the plugin layout with kind-tinted accents (skill = green, agent = purple), Description + Details sidebar, Files + Docs sections, and the "How to call it" copy-able invocation chip showing /<plugin>:<inner-name> exactly as Claude Code namespaces it post-install. 
Curated nested has no install button — links back to the parent plugin. Navbar: standalone "My AI Stack" relabelled "My Stack" and points at /marketplace?tab=my; "Store" link removed (Store flow is reachable via the Flea Market tab's +Upload button). The standalone /my-ai-stack and /store routes still work for old bookmarks. Tests cover the new browse / categories / install / RBAC paths under tests/test_marketplace_api.py; existing marketplace and store tests updated for Model B (explicit subscribe in fixtures). Schema bumped v26 → v27 with idempotent migration that wipes existing user_plugin_optouts rows on flip and adds marketplace_plugins.created_at with registered_at backfill. * Fix v28 migration + post-rebase test fallout v28 ALTER TABLE marketplace_plugins ADD COLUMN created_at conflicted with _SYSTEM_SCHEMA's earlier CREATE that already includes the column on fresh installs (test fixtures starting at any pre-v28 version trip on it). Switch to ADD COLUMN IF NOT EXISTS — same idiom as the upstream v27 Keboola sync-strategy migration on the same ladder. Two test patches needed after the rebase bumped SCHEMA_VERSION 27 → 28: - test_keboola_v27_migration.py: test_schema_version_constant_is_27 was pinning ==27. Loosened to >=27 (the test's purpose is to verify the v27 Keboola migration, not to pin the current SCHEMA_VERSION). - test_setup_page_unified.py: was monkeypatching resolve_allowed_plugins but compute_default_agent_prompt now reads from resolve_user_marketplace (Model B-aware). Stub the right function so the test exercises the v28 served-set path. * Harden curated skill/agent inner endpoints against path traversal `_read_inner`, the `skill_dir` walk in `curated_skill_detail`, and the `agent_path.stat` in `curated_agent_detail` joined URL path-params onto `plugin_root` without verifying the resolved candidate stayed inside it. 
Starlette's `[^/]+` on `{skill_name}` / `{agent_name}` blocks the direct URL exploit (encoded `/` 404s before the handler), but a curator-planted symlink inside a curated marketplace's git mirror could still dereference outside the plugin tree on read. Adds `_safe_join(plugin_root, *parts)` doing `Path.resolve(strict=True)` + `relative_to(plugin_root.resolve())`, used by all three call sites so the boundary is enforced once and consistently. Tests cover the helper directly (normal path resolves, escaping `..` returns None, escaping symlink returns None, missing file returns None) plus an end-to-end check that the symlink case actually 404s on the HTTP endpoint. Symlink tests skip on Windows where symlink creation needs elevated permissions; they run on Linux CI. --------- Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>	2026-05-08 14:22:19 +02:00
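The boundary check described for `_safe_join` can be sketched as below. This is a reconstruction from the commit message only (strict resolve, then `relative_to` against the resolved root, returning `None` on any escape or missing file); the actual helper in the repository may differ in name, signature, and error handling.

```python
from pathlib import Path
from typing import Optional


def safe_join(plugin_root: Path, *parts: str) -> Optional[Path]:
    """Join parts onto plugin_root; return None if the result escapes the root.

    resolve(strict=True) follows symlinks and requires the target to exist,
    so a symlink pointing outside the tree is caught by relative_to().
    """
    root = plugin_root.resolve()
    try:
        candidate = root.joinpath(*parts).resolve(strict=True)
        # Raises ValueError when candidate is not located under root.
        candidate.relative_to(root)
    except (OSError, RuntimeError, ValueError):
        # OSError: missing file; RuntimeError: symlink loop on older Pythons;
        # ValueError: the path escaped the root (or contained a null byte).
        return None
    return candidate
```

Resolving before comparing is what closes the symlink case: a purely lexical check on the joined string would accept a symlink whose target lies outside `plugin_root`.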
ZdenekSrotyr	6fe9135cb5	release: 0.47.3 — self-upgrade ignores 24h cache, always re-probes /cli/latest (#227 ) ## Summary `agnes self-upgrade` without `--force` previously short-circuited on the local 24h `update_check.json` cache. After a server-side version bump within that window, the explicit command exited silently as a no-op — empirically observed today when prod 0.47.1 → 0.47.2 didn't propagate. Fix: always invalidate the cache in `_resolve_info`. The cache still gates the implicit warning loop in the root callback (correctly — that runs on every `agnes <anything>` and can't hammer `/cli/latest`). ## Test plan - [x] New `test_self_upgrade_bypasses_24h_cache_without_force` — stale cache claims current; mocked server reports newer; assert UpdateInfo carries the newer version, not the cached one. - [x] Existing self-upgrade tests pass (including `--force` semantics — force is now downstream-only, behavior preserved).	2026-05-07 22:08:21 +02:00
ZdenekSrotyr	917f9aaef0	release: 0.47.2 — restore #218 + #219 fixes silently reverted by #217 (#225 ) ## Summary Smoke-testing the just-shipped 0.47.1 against production exposed two regressions: 1. `agnes query --remote "SELECT FROM unit_economics WHERE bad_col=1"` returned `Table "unit_economics" must be qualified` (the OLD error) instead of `Unrecognized name: bad_col` (the #218 fix's intended behavior). 2. `agnes query "DESCRIBE unit_economics"` showed only DuckDB's misleading `Did you mean order_economics?` with no Agnes hint paragraph (the #219 fix is missing). Root cause: PR #217's squash merge (`506a378c`) carried stale snapshots of `app/api/query.py` and `cli/commands/query.py` from before #218 and #219 merged. The rebase-and-merge auto-merged those files cleanly (no conflict markers) but the result silently reverted both fixes. Restore the two changes verbatim. Tests for both fixes already on main and continue to pass against the restored code. ## Test plan - [x] `pytest tests/test_api_query_guardrail.py tests/test_cli_query.py` — clean - [x] Manual repro against prod after deploy: both flows now surface the intended diagnostic.	2026-05-07 19:57:18 +02:00
ZdenekSrotyr	506a378c3a	release: 0.47.1 — Keboola connector v27 (incremental, partitioned, where_filters, typed parquet) (#217 ) ## Summary Brings the Keboola connector to feature parity with the legacy internal data-analyst's per-table sync strategies. Closes the four documented gaps from the spec branch (`zs/keboola-connector-specs`): - Typed parquet in the legacy SDK extraction path — column types from Keboola Storage metadata (provider cascade `user > ai-metadata-enrichment > keboola.snowflake-transformation`) survive the CSV → parquet roundtrip; invalid date strings (`'0000-00-00'`) and invalid numeric strings (`'Non-Manager'`) become NULL while keeping the column's typed schema. Pre-fix everything was VARCHAR. - Incremental sync via Storage API `changedSince` — opt-in per table; pulls only delta rows, merges into the existing parquet by `primary_key` (drop_duplicates with keep='last'). Cuts daily extraction from O(full table) to O(delta). - Partitioned sync — flat per-partition layout `data/<table>/<key>.parquet` (e.g. `2026_05.parquet`), per-affected-partition merge for daily updates, chunked initial load with 1-day overlap and 2-empty-chunk stop heuristic. - `where_filters` — server-side row filter with date placeholders (`{{today}}`, `{{last_3_months}}`, `{{start_of_3_months_ago}}`, etc.) resolved at sync time. Force the SDK path; reject `incremental + where_filters` combination at API layer (changedSince already filters temporally). ## Architecture - Schema migration v25 → v26: 7 new columns on `table_registry`. Existing `sync_strategy` column reused (pre-v26 it was inert catalog metadata; post-v26 the extractor dispatches off it). - Per-table dispatcher in `extractor.run()` routes to one of `_extract_via_extension` (full_refresh + extension), `_extract_via_legacy` (full_refresh + filters or extension fallback), `extract_incremental`, or `extract_partitioned`. 
- API conflict policy: `incremental + where_filters` → 422; `partitioned + query_mode='remote'` → 422; `partitioned ⇒ partition_by required`. - Admin UI: third "Direct extract (Storage API)" radio in the Keboola Register / Edit modals, alongside existing "Whole table (extension)" and "Custom SQL". When selected, exposes a v26 sync-strategy panel with conditional fields per strategy. ## Test plan - [x] Unit + module — 134 v26 tests covering migration, repo, parquet_io, where_filters, incremental (compute_changed_since + merge_parquet + extract_incremental E2E), partitioned (key derivation + merge_partition + chunked windows + extract_partitioned E2E), extractor dispatcher, admin API validators, PUT field clearing, registry-shape → dispatcher bridge - [x] HTML form structure — all v26 inputs + visibility classes + JS payload fields verified in rendered template - [x] Real Keboola roundtrip — registered a small test table as `sync_strategy='incremental'` against a test Storage project, triggered two syncs: - Sync 1: `changedSince=None` → full pull → 9 rows typed parquet - Sync 2: `changedSince=last_sync - 1d window` → 9 delta rows merged with 9 existing → 9 after dedup on primary_key (PK merge confirmed) - [x] Browser UX — agent-browser session against a local uvicorn: login → admin/tables → register modal → switch radios → verify field visibility per strategy → submit → edit existing row → switch to Direct/Incremental → save → confirm DB persistence - [x] Regression — no regressions in the broader 3252-test suite (3 pre-v26 tests updated for the deprecation-marker removal + schema-version bump; 2 pre-existing environment-sensitive test failures unrelated to this change) ## Bugs caught + fixed during E2E The browser + real-Keboola roundtrip exposed four bugs the unit tests missed: 1. 
JS visibility race — two competing `forEach` loops set `display=''` then `display='none'` on form elements sharing `kb-strategy-incremental kb-strategy-partitioned` classes (window_days + max_history_days are reused across strategies). Fix: single-pass selector with class-based visibility resolver. 2. PUT cannot clear field — pre-v26 `updates = {k: v ... if v is not None}` collapsed "omitted from body" and "sent as null" into the same case, so admin couldn't switch a partitioned row back to full_refresh and have stale `partition_by` clear. Fix: `model_dump(exclude_unset=True)`. 3. Subprocess DB lock conflict — `_read_last_sync` reopened `system.duckdb` while the parent server held the write lock (subprocess contract at `app/api/sync.py:_run_sync` line 260). Fix: parent injects `__last_sync__` into table_config before subprocess spawn. 4. Wrong KBC table_id — `extract_incremental` / `extract_partitioned` built the Storage API table_id from the registry row's slugified `id` (`circle_inc`) instead of `bucket.source_table` (`in.c-finance.circle`), producing 404s. Fix: prefer `bucket+source_table`; fall back to `id` only when bucket empty. ## Operator notes - Existing tables stay on `full_refresh` after migration; admins opt individual tables in via `agnes admin register-table --sync-strategy ...`, the Keboola Edit modal, or `POST/PUT /api/admin/registry`. - `merge_parquet` and `merge_partition` use `pd.concat + drop_duplicates`, loading both existing and delta into pandas RAM. For tables in the multi-million-row range this may OOM — switch to `partitioned` strategy for those (per-partition merge keeps memory bounded). Documented in `### Internal` of the changelog entry. - Date placeholders are resolved at sync time, not register time — a typo'd `{{lasst_week}}` is accepted at register and surfaces only when the next sync runs. By design (rolling windows need late-binding). 
## Spec source The four corresponding plans on the `zs/keboola-connector-specs` branch under `docs/superpowers/plans/2026-05-07-0[1-4]-*.md` capture the design rationale and link back to internal repo references for each subsystem.	2026-05-07 19:01:27 +02:00
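The incremental strategy in this commit (a `changedSince` window, then a merge on `primary_key` that keeps the last occurrence) can be sketched in plain Python. This is a stdlib stand-in for illustration only: the commit states that the real `merge_parquet` uses `pd.concat + drop_duplicates` on parquet data, and the signatures below, including `merge_rows` and the `window_days` default, are assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional, Sequence


def compute_changed_since(
    last_sync: Optional[datetime], window_days: int = 1
) -> Optional[datetime]:
    """Return the changedSince cutoff, or None to request a full pull."""
    if last_sync is None:
        # First sync: no cutoff, so the whole table is exported.
        return None
    # Overlap the previous sync by window_days to catch late-arriving rows.
    return last_sync - timedelta(days=window_days)


def merge_rows(
    existing: Iterable[Dict], delta: Iterable[Dict], primary_key: Sequence[str]
) -> List[Dict]:
    """Merge delta into existing, keeping the last row seen per primary key."""
    merged: Dict[tuple, Dict] = {}
    for row in list(existing) + list(delta):
        key = tuple(row[col] for col in primary_key)
        merged[key] = row  # later rows overwrite earlier ones
    return list(merged.values())
```

Because the overlap window re-delivers rows that are already stored, the key-based overwrite is what keeps the row count stable, as in the roundtrip above where 9 delta rows merged with 9 existing rows still yield 9. Row order in this sketch follows first appearance of each key, which may differ from the pandas result.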
ZdenekSrotyr	aa5921da67	release: 0.47.0 — source-agnostic catalog metadata + cache discipline (#223 ) ## Summary - Catalog enrichment for `query_mode='remote'` rows: `rows`, `size_bytes`, `partition_by`, `clustered_by` per table (BQ + Keboola providers). - `/api/v2/schema/{id}` cache miss: 2 BQ jobs → 1 (-50%) via shared `fetch_bq_columns_full`. - All four catalog/schema/sample/metadata caches flush on registry change; single-row re-warm scheduled. - Automatic cache warmup at server startup (bounded concurrency, opt-out via `AGNES_SKIP_CACHE_WARMUP=1`). - SSE-driven freshness toolbar on `/admin/tables` with progress bar, log, and per-row badge. - New admin doc `docs/admin/query-modes.md` — single source of truth on `local` / `remote` / `materialized` choice. Closes #155. Closes #156. ## Test plan - [x] 65+ targeted tests pass across 11 new test modules + 3 modified ones. - [x] No DB migration; no wire-break; `MIN_COMPAT_CLI_VERSION` unchanged. - [ ] Reviewer: register a remote BQ table via `/admin/tables`, observe the toolbar populates within ~2 s and the per-row badge transitions warming → fresh. - [ ] Reviewer: trigger `Re-warm all`, verify SSE log scrolls and `cacheWarmupBar` progresses. - [ ] Reviewer: edit a registered row's bucket, verify `agnes schema <id>` returns updated columns immediately (no 1-hour staleness). - [ ] Reviewer: confirm `agnes admin register-table --query-mode remote` prints the new IAM-smoke-check hint. ## Notable design decisions - BigQuery `INFORMATION_SCHEMA.TABLE_STORAGE` is the only valid scope for size+rows (verified live 2026-05-07; dataset-scoped doesn't exist). Region resolved from `instance.yaml.data_source.bigquery.location` → `bq.client().get_dataset(...)` → fall back to legacy `__TABLES__`. - VIEW handling: TABLE_STORAGE returns no rows for views, fall through to `__TABLES__` (also empty) → `TableMetadata(rows=None, size_bytes=None, partition_by=..., clustered_by=...)`. 
Null size signals analyst Claude to apply existing CLAUDE.md guidance. - `size_bytes` is `active_logical_bytes + long_term_logical_bytes` — full BQ scan reads both; reporting only active undercounts aged partitioned tables. - Source-agnostic provider seam: per-source `connectors/<source>/metadata.py:fetch(MetadataRequest)`; dispatcher in `app/api/v2_catalog.py:_metadata_provider_for` lazily imports per source_type so a Keboola-only deployment doesn't pay the BQ-extension import cost. - Warmup non-blocking: FastAPI `lifespan` schedules `asyncio.create_task(_warm_catalog_caches_bg)` before `yield`. Per-row failures isolated. ## Out of scope - Profile / column histograms / dimension cardinality for remote tables (separate issue). - Onboarding nudge ("you have 0 remote tables, consider registering some BQ ones") — separate UX call. - Provider plug-in registration via entry-points (the dispatch table is a hardcoded if-tree today; one line per future source). ## Release Bumps `pyproject.toml` 0.46.1 → 0.47.0 (main shipped 0.46.0 + 0.46.1 during this PR — see commit `d98976ec`). New CHANGELOG section under `## [0.47.0] — 2026-05-07`. 🤖 Generated with [Claude Code](https://claude.com/claude-code)	2026-05-07 18:33:55 +02:00
ZdenekSrotyr	751cc25327	release: 0.46.5 — agnes describe -n parses, server sanitizes NaN (#224 ) ## Summary Two bugs in `agnes describe` surfaced from a real analyst session following the CLAUDE.md agent-rails discovery workflow. Together they break `agnes describe` end-to-end for any analyst (or analyst-AI) who follows the documented form. ### A) CLI parsing `agnes describe TABLE -n 5` failed with `Missing argument 'TABLE_ID'`. Root cause: the command was registered as a `Typer.Typer` subcommand group via `app.add_typer(describe_app, name="describe")` + `@describe_app.callback(invoke_without_command=True)`, and that pattern mis-parses positional + short-int option in some orderings. Same pattern in `cli/commands/schema.py` works only because schema has no INTEGER short option. Fix: switch to flat `@app.command("describe")`. ### B) Server NaN `/api/v2/sample/<id>` (called by `agnes describe`) returned HTTP 500 with `ValueError: Out of range float values are not JSON compliant: nan` whenever a row contained NaN. Fix: sanitize NaN/±inf to None before JSON serialization. ## Test plan - [x] `pytest tests/test_cli_describe.py` — added regression tests pinning `-n` parsing on either side of the positional. - [x] `pytest tests/test_api_v2_sample.py` — added regression test for NaN row → JSON `null` (not 500).	2026-05-07 18:16:21 +02:00
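The NaN fix in part B above can be illustrated with a small recursive sanitizer. This is a sketch of the general technique, assuming rows arrive as nested dicts and lists of Python values; the helper name `sanitize_for_json` is hypothetical and the server's actual implementation is not shown in the commit message.

```python
import json
import math
from typing import Any


def sanitize_for_json(value: Any) -> Any:
    """Replace NaN and +/-inf floats with None, recursing into containers."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: sanitize_for_json(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [sanitize_for_json(item) for item in value]
    return value


# allow_nan=False makes json.dumps raise on any NaN that slipped through,
# which is the strict behaviour that produced the original HTTP 500.
row = {"revenue": float("nan"), "ratios": [1.5, float("inf")]}
encoded = json.dumps(sanitize_for_json(row), allow_nan=False)
```

Mapping non-finite floats to `None` yields JSON `null`, which strict parsers accept, whereas the bare `NaN` token emitted by a permissive encoder is not valid JSON.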
ZdenekSrotyr	8d0bb43b06	release: 0.46.4 — detach SessionEnd push so it survives claude -p SIGTERM (#222 ) ## Summary `claude -p` (headless mode) gives SessionEnd hook subprocesses ~1 second before SIGTERM, regardless of work in progress. `agnes push` for a typical workspace takes 5-30s. The current synchronous SessionEnd hook (`agnes push --quiet 2>/dev/null || true`) was therefore being killed mid-first-upload — `|| true` masks the SIGTERM as exit 0, so this regression was invisible until I traced it via a wrapper script and Claude's `~/.claude/debug/<sid>.txt` log. Fix: wrap SessionEnd push in `bash -c "( nohup agnes push --quiet </dev/null >/dev/null 2>&1 & ) ; true"`. The subshell exits immediately, orphaning the upload child to init so it survives the hook subprocess kill. Same `bash -c` pattern as the existing `refresh-marketplace` SessionStart entry (for Windows compatibility). End-to-end verified against production: claude exited in 5s, detached child completed the upload, file `491e3a23-...jsonl` landed on the server within 30s with mtime 14:30 UTC. ## Test plan - [x] `pytest tests/test_lib_hooks.py` — added `test_session_end_push_is_detached` regression test asserting `nohup`, `&`, `</dev/null` are all present. - [x] `pytest tests/test_setup_hooks_template.py` — assertions loosened from `==` to `in` where necessary. - [x] Verified end-to-end against production with the detached wrapper before opening this PR (manual probe).	2026-05-07 17:59:27 +02:00
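The detached wrapper quoted in this commit can be generated by a small helper, sketched below. The function name `detached_hook_command` is hypothetical and not taken from `cli/lib/hooks.py`; the sketch also assumes the wrapped command contains no double quotes, since it performs no escaping.

```python
def detached_hook_command(command: str) -> str:
    """Wrap a shell command so it outlives the hook subprocess.

    The subshell backgrounds the command under nohup with all standard
    streams redirected, then exits at once; the trailing `true` keeps the
    hook's exit status at 0.
    """
    inner = f"( nohup {command} </dev/null >/dev/null 2>&1 & ) ; true"
    # Assumption: command has no embedded double quotes (no escaping is done).
    return f'bash -c "{inner}"'


session_end_hook = detached_hook_command("agnes push --quiet")
# Mirrors the regression test's idea: the detach tokens must all be present.
required_tokens = ["nohup", "&", "</dev/null"]
missing = [token for token in required_tokens if token not in session_end_hook]
```

Checking for the tokens rather than comparing the whole string matches the loosened `in` assertions mentioned in the test plan, so cosmetic changes to the wrapper do not break the test.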
ZdenekSrotyr	7fc5365891	release: 0.46.3 — self-heal session pipeline + clearer diagnose (#220 ) ## Summary Verified against production: `claude -p` headless mode doesn't fire SessionEnd hooks (proven via `--output-format stream-json --include-hook-events`: zero `SessionEnd` events), so any session JSONLs from `-p` invocations stay orphaned locally and never reach the server. Fix: add `agnes push --quiet` as a third SessionStart entry — symmetric self-heal alongside the existing `agnes pull` entry. Existing workspaces pick this up on their next `agnes init` via the marker-based migration already in `cli/lib/hooks.py`. Separately: a colleague's fresh install showed `agnes diagnose` warning "uploads are not being processed", which led them to suspect their `agnes push` was broken. The warning is actually about the LLM-based `verification-detector` backlog (uploads themselves were arriving fine — confirmed by 23+3 JSONLs landed on the server while the warning was firing). Reword the warning to "verification-detector backlog" + add `last_processed` to the diagnose dict so operators don't have to grep logs to confirm. ## Test plan - [x] `pytest tests/test_lib_hooks.py` — updated count + added `agnes push in SessionStart` assertion. - [x] `pytest tests/test_setup_hooks_template.py` — updated. - [x] `pytest tests/test_clean_install_integration.py` — updated. - [x] `pytest tests/test_health_session_pipeline.py` — updated warning text + asserted `last_processed` field.	2026-05-07 17:41:22 +02:00
ZdenekSrotyr	50d10443d1	release: 0.46.2 — friendlier hint on missing-table errors for remote tables (#219 ) ## Summary `agnes query "DESCRIBE unit_economics"` (where `unit_economics` is `query_mode='remote'`) previously returned DuckDB's nearest-name suggestion (`Did you mean "order_economics"`?), sending users down the wrong path. Now appends a friendly hint about remote tables. Reproduced from a real analyst session — colleague spent ~30s diagnosing what was actually "this is a remote table, not materialized locally". ## Test plan - [x] New test: `_query_local("DESCRIBE unit_economics", ...)` against an empty local DuckDB triggers the new hint, original DuckDB error still echoed. - [x] Negative test: a syntax-error query does NOT trigger the hint (regex only matches "Table with name X does not exist"). - [x] `pytest tests/test_cli_query*.py` clean.	2026-05-07 17:24:10 +02:00
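The hint logic described in this commit (match only "Table with name X does not exist", keep the original DuckDB error, append guidance) might look like the sketch below. The function name, the exact regex, and the hint wording are assumptions for illustration; only the matched phrase comes from the commit message.

```python
import re

# Matches the phrase named in the commit message; the capture group is the
# table name. The precise pattern used in cli/commands/query.py is not shown.
_MISSING_TABLE = re.compile(r"Table with name (\S+) does not exist")


def add_remote_table_hint(error_message: str) -> str:
    """Append a remote-table hint to DuckDB missing-table errors only."""
    match = _MISSING_TABLE.search(error_message)
    if match is None:
        # Syntax errors and other failures pass through untouched.
        return error_message
    table = match.group(1)
    # Hypothetical hint text; the real wording may differ.
    hint = (
        f"Hint: '{table}' may be a remote table that is not materialized "
        "locally. Try `agnes query --remote`."
    )
    # The original error is echoed first so no diagnostic detail is lost.
    return f"{error_message}\n{hint}"
```

Anchoring on the specific phrase keeps the hint from firing on unrelated errors, which is what the negative test in the test plan pins down.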
ZdenekSrotyr	28430ced09	Keboola cutover: native parquet path + sync correctness + auto-discover protection (#190 ) * fix: cutover regressions + parallel Keboola legacy fallback Bundled fixes from a fresh-deploy run on a Keboola Storage backend with the block-shared-snowflake-access feature flag — DuckDB Keboola extension's per-table scan can't access bucket schemas, so the legacy kbcstorage Storage-API client is the only working path. CUTOVER REGRESSIONS - agnes pull hash mismatch on every Keboola local-mode table — src/orchestrator.py:_update_sync_state stored md5(mtime+size)[:12] while the CLI compares against full 32-char content MD5. Now stores the same content MD5 the materialized SQL path already used. - Trailing-slash sanitization in connectors/keboola/access.py and extractor.py — DuckDB Keboola extension's ATTACH fails when the URL ends in / (canonical form). - src/profiler.py:TableInfo.description becomes optional — two call sites instantiated without it, crashing the profiler pass. - scripts/ops/agnes-auto-upgrade.sh: chown on UID change — older images ran as root, current runs as agnes (uid 999). Reads target uid:gid from /etc/passwd inside the new image and chowns ${STATE_DIR}, /data/extracts, /data/analytics when the digest moves. - POST /api/sync/trigger is now singleton per process — two near-simultaneous trigger calls each forked an extractor subprocess, fought for extract.duckdb's file lock, starved uvicorn, flipped the container to unhealthy. Trigger now returns 409 (sync_already_in_progress) when held; _run_sync acquires non-blocking. PARALLEL LEGACY FALLBACK - Process pool fan-out for the _extract_via_legacy queue (default 8 workers, override via AGNES_KEBOOLA_PARALLELISM). Process pool, not thread pool, because connectors/keboola/client.py:export_table does os.chdir(temp_dir) — process-global, so threads raced and slice files landed in the wrong directory ("[Errno 2] No such file or directory: '<job_id>.csv_X_Y_Z.csv'"). 
- Extractor subprocess timeout 1800s -> 3600s (configurable via AGNES_EXTRACTOR_TIMEOUT_SEC). 28+ tables × multi-minute Keboola export jobs need the headroom on telemetry-class projects. - Process group cleanup on timeout — Popen(start_new_session=True) puts the extractor in its own group. On timeout the parent SIGTERMs the group (10s grace) then SIGKILLs stragglers. Without this, the pool workers were reparented to PID 1 and continued holding open Keboola Storage export jobs. Inline extractor script also installs a SIGTERM -> sys.exit(143) handler so the with ProcessPoolExecutor(...) block __exit__ runs cleanly. Tests: existing tests that patched subprocess.run updated to patch subprocess.Popen with a _FakePopen stand-in (same exit-code-injection contract). Two tests that exercised the parallel path forced AGNES_KEBOOLA_PARALLELISM=1 to keep mocks alive (mocks don't ride into ProcessPoolExecutor subprocesses). Squashed onto current main (was 7 commits + multi-commit CHANGELOG + agnes-auto-upgrade.sh conflicts; squash avoids per-commit conflict resolution against main's flat-mount STATE_DIR refactor and 0.38.0 release cut). * feat(keboola): Storage API direct extract path; drop extension data path The DuckDB Keboola extension's COPY routes through Keboola QueryService, which is unreliable on linked-bucket projects (extension v0.1.6 fixes that case but isn't yet in the community CDN, and pre-fix any project with the block-shared-snowflake-access feature flag couldn't see bucket schemas at all). Move the extract path off the extension entirely and talk to the Storage API directly via signed-URL download — works on any project, regardless of extension state. connectors/keboola/storage_api.py (NEW) Lightweight client built on requests.Session. 
Three endpoints: - POST /v2/storage/tables/{id}/export-async (kicks off job) - GET /v2/storage/jobs/{id} (poll until done) - GET /v2/storage/files/{id}?federationToken=1 (signed URL detail) - GET <signed_url> (download bytes) Supports sliced exports (manifest + per-slice signed URLs) and gzipped payloads. ExportFilter dataclass mirrors the Keboola filter spec (whereFilters / columns / changedSince / limit) and handles JSON round-trip with the registry's source_query column. Token redaction in error messages. Bounded exponential backoff on job polling. No cloud-SDK dependency on the data path; thread-safe. connectors/keboola/extractor.py - materialize_query() rewritten: takes bucket/source_table/source_query (JSON filter spec), exports via KeboolaStorageClient, converts CSV to parquet via DuckDB, atomic os.replace. Same return shape so sync.py downstream code stays uniform with the BQ branch. - _extract_via_legacy() also moved to Storage API direct (kept the name for caller compatibility with _legacy_worker / the parallel batch extractor). Per-call temp directories — no os.chdir, threads don't race. app/api/sync.py _run_materialized_pass for source_type='keboola' rows now constructs a KeboolaStorageClient (replaces KeboolaAccess) and passes bucket/source_table/source_query to materialize_query. Reuses one client across rows for HTTP keep-alive. Sources keboola URL from env too (KEBOOLA_STACK_URL) when instance.yaml doesn't have stack_url configured. cli/commands/admin.py discover-and-register defaults Keboola rows to query_mode='materialized' (NULL source_query = full table), matching the v26 migration's unification of the local/materialized split for Keboola. BigQuery and Jira keep their per-source defaults. src/db.py Schema bump 25 → 26. Migration: UPDATE table_registry SET query_mode='materialized' WHERE source_type='keboola' AND query_mode='local'. 
NULL source_query on those rows means "full table export" — same effective behavior the local mode provided, but now via Storage API instead of the extension. pyproject.toml kbcstorage dep stays (admin-side bucket/table list still uses the SDK in app/api/admin.py / connectors/keboola/client.py); only the data path is migrated off the SDK. Comment updated to reflect the new boundary. tests - test_keboola_storage_api.py (NEW, 19 tests): ExportFilter parsing, HTTP client (token redaction, retry logic, polling), download_file (single, gzipped, sliced), end-to-end export_table_to_csv. - test_keboola_materialize.py rewritten: mocks KeboolaStorageClient instead of FakeAccess; same atomic-write + zero-rows + unsafe-id contracts. - test_sync_trigger_keboola_materialized.py: registry rows now carry bucket+source_table+JSON-shape source_query. 114+ Keboola-impacted tests green locally. * test: schema version assertion bumped to 26 alongside the keboola query_mode migration * fix(keboola): cutover hot-patches surfaced on agnes-dev Five small fixes that were applied as in-container hot-patches during agnes-dev cutover and need to be on the source-of-truth image so a fresh upgrade does not undo them. - app/api/sync.py: auto-discover gate considers the WHOLE registry (any source, any mode), not just rows where source matches and query_mode is local. After the v25→v26 keboola materialized migration an instance can have 30 materialized rows and zero local rows; the previous gate kept re-firing _discover_and_register_tables every scheduler tick, creating duplicate auto-discovered rows with the wrong bucket prefix every time. - app/api/admin.py: _discover_and_register_tables reassembles the bucket as <stage>.<bucket-id> (e.g. in.c-finance) instead of dropping the stage prefix; default query_mode for keboola is now materialized (the v26 contract); validator allows NULL source_query for keboola materialized rows (full-table export via Storage API export-async, no SQL needed). 
- cli/commands/admin.py: register-table mirrors the server validator (NULL source_query allowed for source_type=keboola); --bucket help text generalized to cover both BQ dataset and Keboola bucket id. - connectors/keboola/extractor.py: max_line_size=64 MiB on read_csv_auto so embedded JSON / SQL cells (kbc_component_configuration in particular) do not trip the default 2 MiB ceiling. - connectors/keboola/storage_api.py: GCP backend support — when the Storage API returns a manifest whose slice URLs are gs:// references with a gcsCredentials block, rewrite to the JSON REST download endpoint and authenticate with the issued OAuth bearer token; redact tokens in any surfaced error string. * test: align with new keboola materialized + auto-discover-gate contracts - test_admin_keboola_materialized: rename test_register_keboola_materialized_rejects_missing_source_query → test_register_keboola_materialized_accepts_missing_source_query. v25→v26 introduced 'keboola materialized with NULL source_query means full-table export via Storage API export-async' as the default registration shape; the rejection case is no longer the contract. - test_sync_filter: add list_all() to _StubRegistry. The auto-discover gate in _run_sync now keys off the WHOLE registry (not just local rows) so materialized-only Keboola instances do not re-trigger discovery on every tick. * feat(keboola): native parquet export — skip CSV roundtrip Storage API export-async accepts fileType={csv,parquet}. Switching the materialized sync to parquet eliminates the CSV → DuckDB COPY → parquet roundtrip that pinned a single uvicorn worker over 4 GiB on multi-GB tables (read_csv with all_varchar + max_line_size=64MB has to materialize the whole CSV in memory before COPY can stream out a parquet). Snowflake UNLOAD on Keboola's side already produces typed, self-contained parquet files; the extractor downloads them and renames into place. 
Two cases: - Single-file export (small table): file_info.url points at one signed URL; download_file streams chunks straight to .parquet.tmp and we're done. No DuckDB. - Sliced export (Snowflake UNLOAD respects MAX_FILE_SIZE — 16 MiB default — so anything larger arrives as N parquet slices): each slice is a complete parquet file with its own footer; naive concat would corrupt them. download_file_slices keeps the slices as separate files in a tempdir, then DuckDB COPY (SELECT * FROM read_parquet([slice0, slice1, ...])) merges them into one consolidated parquet. DuckDB streams row groups during this — peak memory bounded to one row group (~1 MiB) regardless of source size. The legacy CSV path stays as the explicit opt-in via source_query= '{"file_type":"csv"}' for projects whose backend can't UNLOAD parquet (none known today; cheap escape hatch). Backward-compat alias KeboolaStorageClient.export_table_to_csv kept. Also fixes a latent bug in download_file's gzip detection: previous heuristic flagged any unencrypted file as gzipped, which would have corrupted parquet downloads at gunzip time. Name-suffix-only now. * fix: tempdir leak cleanup, every 0m schedule, /sync/trigger body shapes Three small self-contained fixes uncovered during agnes-dev cutover. - connectors/keboola/extractor.py: tempfile.TemporaryDirectory now uses ignore_cleanup_errors=True so a worker death mid-write doesn't leave multi-GiB stale slice trees on the boot disk. (12 GiB seen after a disk-full crash where TemporaryDirectory's own cleanup also raised and got swallowed.) - src/scheduler.py: is_valid_schedule accepts 'every 0m' (interval=0 = always due). Force-resync of an errored row no longer requires waiting out the default 'every 1h' interval — admin can flip the schedule, trigger, then flip back. 
- app/api/sync.py: POST /api/sync/trigger accepts both ['table_id'] (legacy bare-array body) and {'tables': ['table_id']} (matches the response payload shape, more discoverable for clients building requests by hand). Malformed bodies return 422 with a structured detail; null/missing means 'sync everything' as before. Tests cover: tempdir cleanup on raise (sliced parquet path), is_valid_schedule + is_table_due 'every 0m' acceptance, and trigger body parametrized matrix (8 valid shapes + 6 rejection cases). * fix: targeted-trigger filter in materialized pass + auto-upgrade defer Two operational gaps observed during agnes-dev cutover, in the same sync-routing area. - _run_materialized_pass now takes a 'tables' arg and skips rows not in the target set with reason='not_in_target'. POST /api/sync/trigger with a body of tables previously only scoped the legacy extractor subprocess — the materialized pass kept iterating every due materialized row, so an admin asking to re-sync kbc_job re-ran every other due materialized row alongside it. Match on registry id OR name (admins commonly pass either form). tables=None preserves the no-filter behavior. - New GET /api/sync/status (public, no auth) returns {locked: bool} off _sync_lock.locked(). agnes-auto-upgrade.sh probes this before docker compose up -d and exits 0 with a 'deferred recreate' log line if a sync is in flight — the next 5-min cron tick retries. Pre-fix, an auto-upgrade triggered mid-sync would recreate the uvicorn worker and kill the in-flight extractor / Snowflake-UNLOAD download (observed when kbc_job's first 7-day retry got SIGKILLed). Connection failures in the probe fall through to the upgrade — being stuck on a wedged image is worse than interrupting a hypothetical sync. * fix: auto-discover protects admin overrides + surfaces drift Two real-world incidents on agnes-dev drove this: 1. kbc_job was registered manually with the correct (in.c-kbc_telemetry, kbc_job) coordinates. 
A naive auto-discover re-run would have inserted a SECOND kbc_job row at the slugified id 'in_c-keboola-storage_kbc_job' (where Keboola's discovery places it) — and that row's Storage API export-async 404s. 2. An earlier auto-discover bug stripped the stage prefix from bucket ids ('c-finance' instead of 'in.c-finance'), inserting 137 rows whose syncs all failed. Fix: - _discover_and_register_tables now builds a plan first (_build_keboola_discovery_plan) classifying each discovered table into one of new / existing_match / existing_drift / invalid, then executes only the 'new' bucket. Drift rows are reported with both sides of the disagreement plus drift_kind: - same_id_diff_coords: registry has the same id but different bucket / source_table (admin migrated coords inline). - name_collision: discovery's slugified id differs from any registry id, but the discovered .name matches an existing row's .name (case-insensitive). Catches the kbc_job case. - Bucket detection now prefers the API's authoritative bucket_id field (separate field on the Keboola tables.list response, normalised by KeboolaClient.discover_all_tables). Falls back to id-string parsing only when bucket_id is missing (older fallback path inside discover_all_tables). - Endpoint POST /api/admin/discover-and-register?dry_run=true returns the plan without writing — would_register, drift, invalid lists. Lets an operator audit before merging discovery with a registry that has admin overrides. Removed 'every 0m' from test_register_request_rejects_malformed_sync_schedule — the runtime started accepting it in the previous commit (force-resync override) and the validator follows suit. * feat(keboola): AGNES_TEMP_DIR routes tempfiles off overlayfs /tmp The container's /tmp lives on the boot disk's overlayfs (29 GiB on agnes-dev, shared with /var). Snowflake UNLOAD of a wide table writes slices into per-call /tmp tempdirs that fill multi-GiB / many-slice exports long before the dedicated data disk fills. 
agnes-dev hit 100% boot-disk while the 20 GiB data disk had 15 GiB free. connectors.keboola.storage_api.get_temp_root() reads AGNES_TEMP_DIR; mkdirs the target on first use; unset / empty / unwritable falls back to None (system tempdir, OSS-pre-fix behaviour). Both materialize_query (parquet path) and _extract_via_legacy (CSV fallback) and the sliced-CSV concat path in storage_api use the helper now. docker-compose.yml defaults AGNES_TEMP_DIR=/data/tmp on app, scheduler, and extract services. The data volume is the dedicated disk in production layouts and a plain docker volume in single-disk dev/laptop setups — same blast radius as the previous /tmp default on the latter, no regression.	2026-05-07 12:12:14 +02:00
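The AGNES_TEMP_DIR fallback described at the end of the entry above can be sketched roughly as follows. This is an illustrative reconstruction from the commit message, not the code in connectors/keboola/storage_api.py; the `os.access` writability check and the `make_workdir` helper are assumptions.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def get_temp_root(env_var: str = "AGNES_TEMP_DIR") -> Optional[str]:
    """Return a usable temp root, or None to fall back to the system tempdir."""
    raw = os.environ.get(env_var, "").strip()
    if not raw:
        return None  # unset or empty
    target = Path(raw)
    try:
        target.mkdir(parents=True, exist_ok=True)  # create on first use
    except OSError:
        return None  # cannot create (e.g. the path is a regular file)
    # Assumption: "unwritable" is detected with a plain access check.
    if not os.access(target, os.W_OK | os.X_OK):
        return None
    return str(target)


def make_workdir() -> tempfile.TemporaryDirectory:
    # Hypothetical helper. dir=None selects the system default;
    # ignore_cleanup_errors requires Python 3.10 or newer.
    return tempfile.TemporaryDirectory(dir=get_temp_root(), ignore_cleanup_errors=True)
```

Returning None rather than raising keeps the caller's `dir=` argument valid in every case, which matches the "falls back to system tempdir" behaviour the message describes.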
ZdenekSrotyr	c97fd504c5	release: 0.45.0 — easy-wins bundle (#84 #164 #177 #178 #203 #204 ) Operator-and-analyst quality bundle: a security fix for the optional Telegram bot, two CLI gaps closed, and three rounds of UX polish on `agnes diagnose` and `agnes pull` so non-TTY consumers (CI runners, Claude Code SessionStart hooks, sub-agent watchdogs) get readable, actionable signal. - Pairing-code RNG: random.choices -> secrets.choice (CSPRNG). - Telegram script runner: refuse out-of-shape usernames before sudo -u. CLAUDE.md.bak.<ISO-timestamp> before regenerating. - agnes admin unregister-table <id> -> DELETE /api/admin/registry/{id} - agnes admin update-table <id> --field=value ... -> PUT /api/admin/registry/{id} response but never promotes the headline. BQ billing-equals-data check downgraded warning -> info. default (5 s / 1 MiB vs 30 s / 10%) so sub-agent watchdogs don't kill the pull as a hung process. New env knobs: AGNES_PULL_PROGRESS_INTERVAL_{SECONDS,BYTES}. --include-schema (or ?include=schema) to opt back in. Tests: 120 passed across the touched modules, including new tests for each fix. Pre-existing failures on main (DB migration v1->v9, binary rename) are unrelated and not introduced here.	2026-05-07 11:43:16 +02:00
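The pairing-code change in the 0.45.0 entry swaps `random.choices` for `secrets.choice`. A minimal sketch of that pattern is below; the function name, code length, and alphabet are assumptions for illustration and are not taken from the Telegram bot's source.

```python
import secrets
import string

# Assumed alphabet: the real code shape is not stated in the log.
PAIRING_ALPHABET = string.ascii_uppercase + string.digits


def new_pairing_code(length: int = 8) -> str:
    """Draw each character from the OS CSPRNG rather than the seeded `random` module."""
    if length <= 0:
        raise ValueError("length must be positive")
    return "".join(secrets.choice(PAIRING_ALPHABET) for _ in range(length))
```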
Minas Arustamyan	cd10aefdbd	fix(refresh-marketplace): align manual-mode hint with hook JSON Hook JSON path uses /reload-plugins (no restart needed); manual-mode echo path was still telling the operator to /exit + restart. Both now say /reload-plugins. Tests renamed to _reload_hint_ to match the new wording.	2026-05-07 06:59:13 +02:00
Minas Arustamyan	3aeb0f2fbd	fix(refresh-marketplace): use /reload-plugins instead of /exit + restart Claude Code's `/reload-plugins` slash command picks up newly installed plugins into the running session without forcing the user to /exit and restart Claude Code. The hook JSON `systemMessage` and `additionalContext` both now point at it. Tests updated to pin the new hint shape.	2026-05-07 06:59:13 +02:00
Minas Arustamyan	166c1c0752	fix(refresh-marketplace): pass --scope project to `claude plugin update` Without `--scope project`, `claude plugin update <name>@agnes` operated at user scope (the default) instead of updating the project-scoped install, so version bumps in the served manifest never propagated to the workspace, even though `claude plugin install` correctly used `--scope project` for the missing-plugin path. Mirrors the install line in the same function. Any change refresh-marketplace makes to a plugin must now stay in project scope, consistent with the SessionStart hook firing per-workspace.	2026-05-07 06:59:13 +02:00
Minas Arustamyan	50e0463501	feat(marketplace): clone-based plugin setup + auto-refresh SessionStart hook Adds end-to-end flow for installing and keeping the per-user filtered Claude Code marketplace in sync with the user's Agnes stack (admin RBAC grants \ MyAIStack opt-outs U /store installs). Setup (one-liner in install prompt step 5): `agnes refresh-marketplace --bootstrap` clones the per-user marketplace bare repo to ~/.agnes/marketplace, strips PAT from the cloned origin URL, registers the local path with Claude Code, and installs every plugin in the served manifest at --scope project. Replaces a 15-line inline shell sequence that tripped Claude Code's agent-driven `rm -rf` permission gate. Auto-refresh (SessionStart hook installed by `agnes init`): `agnes refresh-marketplace --quiet` runs every Claude Code session, fetches+resets the clone (server rebuilds as orphan commits, so pull --ff-only is impossible), and version-aware reconciles: - missing in workspace -> claude plugin install <name>@agnes --scope project - version differs -> claude plugin update <name>@agnes - matches -> skip Don't auto-uninstall plugins that disappeared from the manifest -- a transient empty manifest from the server would wipe the stack. Hook output: when --quiet AND something actually changed, emits Claude Code hook JSON on stdout -- `systemMessage` (transient toast) and `hookSpecificOutput.additionalContext` (model-side system reminder), both carrying the change summary plus a "/exit + restart Claude Code" instruction (Claude only scans plugins at session start). Windows hook compatibility: the refresh-marketplace hook command is wrapped in `bash -c "..."` because Claude Code on Windows runs hook commands directly without invoking a shell, so `2>/dev/null \|\| true` would otherwise be passed as literal argv tokens. Cross-cutting: - cli/lib/marketplace.py: shared CLONE_DIR + MARKETPLACE_NAME constants. 
- cli/lib/hooks.py: SessionStart now has two independent entries (pull + refresh-marketplace) so a failure in one doesn't suppress the other; legacy `da sync` and prior single-pull layouts upgrade cleanly on re-init. - PAT injection on every git fetch via per-invocation credential helper (token in \$AGNES_TOKEN env, never in argv or .git/config). - Pre-snapshot of installed plugins captured BEFORE `claude plugin marketplace update` so silent auto-applied version bumps still fire notifications. - scripts/dev/agnes-client-reset.sh: cleans ~/.claude/plugins/marketplaces/agnes, ~/.claude/plugins/cache/agnes, drops uv build cache, documents workspace-scoped residue that can't be enumerated from the script. - app/web/setup_instructions.py: legacy AGNES_DEBUG_AUTH path also uses clone (direct HTTPS marketplace add is broken end-to-end on every Claude Code distribution -- stores response as single file, plugin source paths then 404). 28 new tests (test_cli_refresh_marketplace.py) + extended hook + setup template tests cover bootstrap, fetch+reset ordering, version-aware reconcile, project-path filtering, hook JSON shape, and the bash-c Windows wrapper invariant.	2026-05-07 06:59:13 +02:00
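The version-aware reconcile rules in the entry above (install when missing, update when the version differs, skip when it matches, never auto-uninstall) can be sketched as a pure planning function. The names `plan_reconcile` and `to_argv` are hypothetical; the command shape follows the `claude plugin install <name>@agnes --scope project` form quoted in the log, with `--scope project` also applied to updates as the later fix requires.

```python
from typing import Dict, List, Tuple


def plan_reconcile(manifest: Dict[str, str], installed: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (action, plugin) pairs; plugins absent from the manifest are left alone."""
    actions: List[Tuple[str, str]] = []
    for name, version in sorted(manifest.items()):
        if name not in installed:
            actions.append(("install", name))
        elif installed[name] != version:
            actions.append(("update", name))
        # equal versions: nothing to do
    return actions


def to_argv(action: str, name: str, marketplace: str = "agnes") -> List[str]:
    """Build the argv for one planned action, always at project scope."""
    return ["claude", "plugin", action, f"{name}@{marketplace}", "--scope", "project"]
```

Because the plan only iterates over the manifest, an empty manifest yields an empty plan, which is the property that protects the installed stack from a transient empty response.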
ZdenekSrotyr	df896816d8	chore: rename stale 'da' references to 'agnes' + CHANGELOG Drive-by docstring/comment cleanup in cli_artifacts.py and update_check.py. CHANGELOG entry for the auto-upgrade feature shipped in this branch.	2026-05-06 23:23:59 +02:00
ZdenekSrotyr	73d2896fa6	docs(hooks): update install_claude_hooks docstring for chained SessionStart	2026-05-06 23:23:23 +02:00
ZdenekSrotyr	be62ce61b8	feat(cli): install SessionStart hook chaining self-upgrade then pull Single hook entry: 'agnes self-upgrade --quiet ... || true; agnes pull --quiet ... || true'. Shell semicolon guarantees ordering across every Claude Code version (no reliance on undocumented multi-hook execution semantics); each segment's || true preserves the original property that an upgrade failure does not abort the pull.	2026-05-06 23:23:23 +02:00
ZdenekSrotyr	630e224578	feat(cli): add agnes self-upgrade with smoke test + rollback Reuses cli.update_check.check() for the version probe — extended with bypass_disabled=True so explicit user-typed self-upgrade is not silenced by AGNES_NO_UPDATE_CHECK (which is for the implicit warning loop). Install path: uv tool install --force when uv is on PATH; otherwise curl + pip via sys.executable (NOT system python3, NOT --user — both would land outside the agnes venv and silently no-op the upgrade). Smoke test execs the binary at the install-resolved path (uv tool dir joined with agnes-the-ai-analyst/bin/agnes, or sys.executable's sibling agnes for pip) — never via shutil.which, which can resolve a stale shadow on PATH and produce a false-positive smoke pass on the OLD version. Smoke also asserts --version output contains info.latest via PEP 440 Version() equality (so 0.40.0 does not falsely match 0.40.10). On smoke fail: rollback to last_known_good.json (written only after a previous run's smoke passed). Rollback rc is captured and surfaced on stderr if it also fails. First-ever upgrade or unrecoverable rollback prints the canonical bootstrap recovery: curl -fsSL <server>/cli/install.sh \| bash. AGNES_SELF_UPGRADE_IN_PROGRESS=1 is set for the duration of the run and propagated to the smoke-test subprocess. Layer B's _check_version_headers honors the sentinel and skips the < min hard-stop, so an in-flight upgrade can never sys.exit(2) itself. --force invalidates the update_check cache BEFORE probing. --force + offline = exit 1 with explicit stderr (without --force, offline is silent). --quiet suppresses progress output but never gags failure stderr.	2026-05-06 23:23:23 +02:00
ZdenekSrotyr	d93eda7de3	perf+test(cli): cache User-Agent at module scope; pin local==min boundary	2026-05-06 23:23:23 +02:00
ZdenekSrotyr	2680a6724b	feat(cli): hard-stop on incompatible-version response header Every API response is inspected via httpx event_hooks. When the server reports X-Agnes-Min-Version > local, CLI prints a remediation message and exits 2. Latest-version drift continues to be handled by the update_check warning loop — no double-warning on every API call.	2026-05-06 23:23:23 +02:00
ZdenekSrotyr	77d88014df	fix: devil's advocate R3 - reap PID-suffixed leftovers from dead processes R3 final pass surfaced one issue, addressed: R2#2 introduced PID-suffixed <target>.{pid}.tmp / .{pid}.partN to prevent concurrent agnes pull invocations from yanking each other's in-progress writes. The pre-clean inside _download_chunked / _download_single_stream only deletes leftovers from the CURRENT process's PID, so files from a SIGKILL'd or crashed prior pull (any other PID) are never touched and accumulate on disk forever. Add _reap_dead_pid_leftovers(target_path) called at the start of both download paths. Globs <target>.*.tmp / <target>.*.partN, extracts the embedded PID, calls os.kill(pid, 0) to test liveness (POSIX standard no-op probe), and unlinks files whose process no longer exists. Permission-denied = process is alive but owned by another user → keep the file (conservative). Windows users get the conservative 'keep' default. Two new tests pin the behavior: live-PID file preserved, dead-PID .tmp + .partN reaped, bare-name (legacy) untouched, garbage filenames skipped without raise.	2026-05-06 14:04:47 +02:00
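The reaping logic described in the entry above can be sketched as follows. It is a reconstruction from the commit message only; the function names, the regular expression, and the best-effort unlink are assumptions, not the contents of cli/client.py.

```python
import glob
import os
import re
from pathlib import Path
from typing import List

# Matches ".<pid>.tmp" or ".<pid>.partN" following the target's file name.
_PID_SUFFIX = re.compile(r"\.(\d+)\.(?:tmp|part\d+)")


def pid_is_alive(pid: int) -> bool:
    """Probe liveness with signal 0, which delivers nothing on POSIX."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process: its leftovers can be reaped
    except PermissionError:
        return True   # alive but owned by another user: keep the file
    except (OverflowError, OSError):
        return True   # anything unexpected: stay conservative
    return True


def reap_dead_pid_leftovers(target: Path) -> List[Path]:
    """Unlink <target>.<pid>.tmp / <target>.<pid>.partN files whose PID is gone."""
    removed: List[Path] = []
    if os.name != "posix":
        return removed  # conservative "keep" default off POSIX
    for candidate in target.parent.glob(glob.escape(target.name) + ".*"):
        match = _PID_SUFFIX.fullmatch(candidate.name[len(target.name):])
        if match is None:
            continue  # bare legacy names and garbage file names are left alone
        if pid_is_alive(int(match.group(1))):
            continue
        try:
            candidate.unlink()
            removed.append(candidate)
        except OSError:
            pass  # best effort: another process may have removed it first
    return removed
```

The early return on non-POSIX platforms matters because `os.kill` with an arbitrary signal number terminates the target process on Windows instead of probing it.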
ZdenekSrotyr	aee585fac6	fix: devil's advocate R2 — narrow shared-client try, PID tmp suffix, Syntax error anchor R2 adversarial review surfaced 3 issues, all addressed: #1 cli/client.py:572-577 outer try/except wrapped both _get_shared_client() AND the actual download. A 401/403/404/5xx from the server triggered a full second download attempt with a fresh client — wasted bandwidth on hard failures, no fail-fast on revoked PAT. Narrowed the try to only the shared-client construction; the download itself is no longer retried under the fallback except. #2 concurrent agnes pull invocations (e.g. SessionStart hook + manual run) collided on bare <target>.tmp / <target>.partN paths — one process's in-progress write got yanked by the other's cleanup, manifest hash check then failed spuriously. Per-process suffix (<target>.{pid}.tmp, <target>.{pid}.partN) makes intermediate files disjoint; the final os.replace to the bare target is atomic so last-writer-wins. #3 _looks_like_bq_rewrite_parse_error patterns 'Syntax error' could false-positive on a query like WHERE log_msg = 'Syntax error in foo' that fails for an unrelated reason (quota, network) and has the literal substring echoed in the error text. Anchored to 'Syntax error: ' (with trailing colon) — BQ always emits the colon in this error format, user SQL string literals normally don't.	2026-05-06 13:57:29 +02:00
ZdenekSrotyr	e5645fd280	fix: devil's advocate R1 - chunked probe, parse-error heuristic narrow, pool settings refresh, content-length sanity, multi-project skip R1 adversarial review surfaced 5 issues, all addressed: #1 chunked download silently disabled in non-Caddy deployments (HEAD on GET-only FastAPI route returns 405). _probe_range_support now falls back to GET with Range: bytes=0-0 when HEAD fails, which works against both Caddy file_server (HEAD-friendly) and dev FastAPI direct (GET-only). #2 parse-error fallback heuristic too broad: matched on Unrecognized name / Function not found / No matching signature / Invalid cast, which BQ surfaces for ordinary user-column typos. That triggered slow ATTACH-catalog retry on every typo (2× latency tax). Narrowed to just 'Syntax error' / 'syntax error' which are the genuine DuckDB-vs-BQ dialect mismatch markers. #3 apply_bq_session_settings was only run on fresh-built pool entries, not on reuse. An operator's /admin/server-config change to bq_query_timeout_ms wouldn't propagate to long-lived pooled sessions until restart. Fixed: re-apply on every pool acquire (idempotent + fail-soft). #4 content-length sanity bound: a misconfigured proxy returning a wildly inflated Content-Length would cause overlapping chunked Range requests against the actual file → corrupt assembled output (caught by manifest hash check, but only after wasted bandwidth). Cap at 100 GiB; above that, drop to single-stream. #5 rewriter assumed every BQ row resolves under the single bq.projects.data project. Bucket containing '.' suggests a project-qualified bucket (multi-project deployment); rewriter would silently target the wrong project. Conservative skip with regression test.	2026-05-06 13:50:46 +02:00
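Items #1 and #4 above combine into one decision: does a `Range: bytes=0-0` GET prove range support, and is the reported size plausible? The sketch below interprets such a probe response without doing any HTTP itself. The function name and the choice to read the total from `Content-Range` are assumptions; only the 206-versus-200 distinction and the 100 GiB cap come from the log.

```python
import re
from typing import Mapping, Optional

MAX_SANE_CONTENT_LENGTH = 100 * 1024**3  # the 100 GiB cap named in the log

_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+)")


def size_from_range_probe(status_code: int, headers: Mapping[str, str]) -> Optional[int]:
    """Return the total size if the probe proves range support, else None."""
    if status_code != 206:
        return None  # a 200 means the server ignored the Range header
    lowered = {key.lower(): value for key, value in headers.items()}
    match = _CONTENT_RANGE.fullmatch(lowered.get("content-range", "").strip())
    if match is None:
        return None  # missing header or unknown total ("bytes 0-0/*")
    total = int(match.group(3))
    if total <= 0 or total > MAX_SANE_CONTENT_LENGTH:
        return None  # implausible size: the caller should use single-stream
    return total
```

A None result maps onto the single-stream fallback in every case, so the caller needs only one branch.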
ZdenekSrotyr	e72ff259f9	feat(pull): aggregated progress + non-TTY textual fallback Two improvements to `agnes pull` progress reporting: 1. Aggregated per-file progress across chunked downloads: the existing Rich progress bar already used one task per file, but the chunked-download contract (one file = N parallel chunk callbacks summing to file size) meant we needed to verify that all chunk threads advance the same task. They do: the per-file callback is constructed once per tid and routes every chunk's byte delta to the same task / textual entry, so the bar shows one aggregated bytes-downloaded total rather than N separate sub-bars. 2. Textual fallback for non-TTY stderr: when stderr is not a terminal (SessionStart hook, CI runner, Docker log capture), Rich either suppresses output (silent multi-minute pull on a 5 GB parquet) or emits raw control sequences. The new `_TextualProgress` helper instead emits one plain-text line per file at most every 10%-of-total-bytes or 30 s, plus a final `100% done` line per file. Format: `[N/T files] <tid>: 25% (16 MB / 66 MB) at 1.5 MB/s`. The TTY path is unchanged. Detection uses `sys.stderr.isatty()`: `show_progress=True` flips into the textual fallback when that returns False. `show_progress=False` (the SessionStart hook) still emits no progress text in either mode.	2026-05-06 13:09:37 +02:00
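A throttled text reporter of the kind described above might look like this. It is a sketch under stated assumptions, not the `_TextualProgress` helper itself: the class shape, the injectable `emit` and `clock` parameters, decimal megabytes with floor rounding, and the exact wording of the final line are all guesses beyond the quoted line format.

```python
import threading
import time
from typing import Callable


class TextualProgress:
    """Emit one line per 10% of bytes or per 30 s, plus a final 'done' line."""

    def __init__(self, tid: str, total_bytes: int, index: int, file_count: int,
                 emit: Callable[[str], None] = print,
                 clock: Callable[[], float] = time.monotonic,
                 step_fraction: float = 0.10, interval_seconds: float = 30.0):
        self._prefix = f"[{index}/{file_count} files] {tid}"
        self._total = total_bytes
        self._emit, self._clock = emit, clock
        self._step, self._interval = step_fraction, interval_seconds
        self._lock = threading.Lock()  # chunk threads share one reporter
        self._done = 0
        self._start = self._last_time = clock()
        self._last_bytes = 0
        self._finished = False

    def advance(self, delta: int) -> None:
        with self._lock:
            if self._finished:
                return
            self._done += delta
            now = self._clock()
            if self._done >= self._total:
                self._finished = True
                self._emit(f"{self._prefix}: 100% done")
                return
            bytes_due = self._done - self._last_bytes >= self._step * self._total
            time_due = now - self._last_time >= self._interval
            if bytes_due or time_due:
                rate = self._done / max(now - self._start, 1e-9) / 1_000_000
                pct = int(100 * self._done / self._total)
                self._emit(f"{self._prefix}: {pct}% ({self._done // 1_000_000} MB / "
                           f"{self._total // 1_000_000} MB) at {rate:.1f} MB/s")
                self._last_time, self._last_bytes = now, self._done
```

Routing every chunk's byte delta through the same `advance` call is what produces a single aggregated total per file rather than one line stream per chunk.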
ZdenekSrotyr	bd1b5ad444	perf(cli): persistent HTTP/2 client across pull invocation Pool the httpx.Client used by `stream_download` so N parquet downloads share a single TLS handshake instead of one handshake each. With the optional `h2` package installed, HTTP/2 multiplexing further lets all chunk Range requests share a single TCP connection — synergizes with the range-chunked download path added in the previous commit. The shared client is created lazily on first stream-download call, kept alive for the duration of the process via a module-level slot, and closed at exit via `atexit.register`. Construction wraps in a try/except: when `h2` is unavailable (slim install), httpx raises ImportError on `http2=True` and we transparently fall back to an HTTP/1.1 client — pooling alone still amortizes TLS handshakes. `agnes pull` must never crash on a missing optional package, so the fallback path is non-negotiable. `h2>=4.1.0` is added to the core dependency set; downstream slim installs that drop it lose the HTTP/2 benefit but keep correctness.	2026-05-06 13:06:36 +02:00
ZdenekSrotyr	dee33fe25b	feat(pull): range-chunked parallel download for single large files When the server advertises `accept-ranges: bytes` and a parquet exceeds `AGNES_PULL_CHUNK_THRESHOLD_BYTES` (default 50 MB), `stream_download` now splits the file into N parallel HTTP Range requests (`AGNES_PULL_CHUNK_PARALLELISM`, default 4, capped 1..16) and assembles the parts into the destination atomically. Targets the per-flow-shaped network (corp VPN with per-TCP-connection rate-limiting) where single-stream throughput is throttled but N parallel streams over the same connection scale roughly linearly. Manifests with 1 large materialized parquet + N remote tables previously left the existing across-files `AGNES_PULL_PARALLELISM=4` pool with 1 active worker = single-stream throughput; this fixes that. Falls back to single-stream when: - HEAD doesn't advertise `accept-ranges: bytes` - Server returns 200 instead of 206 to a Range probe - File size below the threshold Cleanup discipline: every part file removed before return (success or failure); destination written via `<target>.tmp` and renamed atomically. Per-chunk retry on transient network blips (bounded by AGNES_STREAM_RETRIES).	2026-05-06 13:04:53 +02:00
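The chunk planning described above (threshold check, parallelism clamped to 1..16, contiguous byte ranges) can be sketched as below. The helper names are hypothetical, and reading "50 MB" as 50 MiB is an assumption; the environment variable name and the default of 4 come from the log.

```python
import os
from typing import List, Mapping, Tuple

DEFAULT_THRESHOLD_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" read as 50 MiB


def chunk_parallelism(env: Mapping[str, str] = os.environ) -> int:
    """Read AGNES_PULL_CHUNK_PARALLELISM (default 4) and clamp it to 1..16."""
    try:
        value = int(env.get("AGNES_PULL_CHUNK_PARALLELISM", "4"))
    except ValueError:
        value = 4
    return max(1, min(16, value))


def plan_ranges(size: int, parts: int) -> List[Tuple[int, int]]:
    """Split [0, size) into contiguous, inclusive (start, end) byte ranges."""
    if size <= 0:
        return []
    parts = max(1, min(parts, size))
    base, extra = divmod(size, parts)
    ranges: List[Tuple[int, int]] = []
    start = 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first parts
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_headers(size: int, accepts_ranges: bool,
                  threshold: int = DEFAULT_THRESHOLD_BYTES, parts: int = 4) -> List[str]:
    """Return Range header values, or [] when single-stream should be used."""
    if not accepts_ranges or size <= threshold:
        return []
    return [f"bytes={start}-{end}" for start, end in plan_ranges(size, parts)]
```

Ranges are inclusive on both ends because that is how the HTTP `Range` header counts bytes, so adjacent parts never overlap and their lengths sum to the file size.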


130 commits