Commit graph

22 commits

Author SHA1 Message Date
ZdenekSrotyr
64cf78860d
feat(stack): unified Browse + My Stack for Data Packages and Memory (v49 schema) (#333)
* feat(unified-stack): Browse + My Stack + Recipes + RBAC matrix (v49–v55)

Squash of 94 commits spanning the v49 → v55 unified-stack rewrite.
Full per-feature breakdown lives in CHANGELOG.md under [Unreleased].
Major buckets:

* v49 schema — first-class user_groups + user_group_members +
  resource_grants; admin can CRUD groups and grants; Google
  Workspace nightly sync writes into the new tables.
* v49 data_packages — admin-curated bundles of tables, RBAC-gated,
  first-class section on /catalog Browse + My Stack.
* v49 memory_domains — row-backed (replaces hardcoded VALID_DOMAINS
  enum); admin can CRUD; grants follow the same shape as tables and
  packages.
* v50 cover_image_url + admin sidebar collapsibles + per-row Mode
  tooltip + admin queue domain badges + admin "+ New Item" seed flow.
* v51 lifecycle status (prod/poc/coming-soon/draft) + category +
  palette swatches on admin modals.
* v52 per-table detail page /catalog/t/<id>.
* v53 Recipes — admin-curated SQL templates as a second tab on
  /catalog with full Edit/Delete admin affordances.
* v54 soft-delete (deleted_at) + Undo toast for packages, memory
  domains, and recipes; hard_delete() retained as escape hatch.
* v55 Recipes RBAC — ResourceType.RECIPE registered, inline Group
  Access matrix on Create + Edit Recipe modals (mirrors the Memory
  Domain pattern).
* Activity Center per-resource filter (resource_prefix LIKE-anchored
  on audit_log.resource); admin nav g+letter keyboard shortcuts;
  loadAdminTablesLayout N+1 → single endpoint; /api/memory 30s
  page-level cache.
* CI hardening — Keboola legacy tests pytest.importorskip; perf-
  smoke threshold widened to stop cold-cache flake.
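
The LIKE-anchored resource_prefix filter in the Activity Center bullet above can be sketched as follows. This is a minimal illustration, not the shipped query: it assumes a SQLite stand-in for the real database, a one-column audit_log table, and hypothetical helper names (escape_like, filter_by_resource_prefix).

```python
import sqlite3


def escape_like(value: str) -> str:
    """Escape LIKE wildcards so the prefix is matched literally."""
    return value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def filter_by_resource_prefix(conn: sqlite3.Connection, prefix: str) -> list[str]:
    """Return audit_log.resource values that start with `prefix`."""
    pattern = escape_like(prefix) + "%"  # anchored at the start, open at the end
    rows = conn.execute(
        "SELECT resource FROM audit_log WHERE resource LIKE ? ESCAPE '\\' "
        "ORDER BY resource",
        (pattern,),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical sample rows; only the namespace prefixes come from the log above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_log (resource TEXT)")
conn.executemany(
    "INSERT INTO audit_log (resource) VALUES (?)",
    [
        ("data_package:sales",),
        ("memory_domain:finance",),
        ("memory_domain_suggestion:42",),
    ],
)

print(filter_by_resource_prefix(conn, "memory_domain:"))
```

Including the trailing colon in the prefix keeps `memory_domain:` from also matching the `memory_domain_suggestion:` namespace, and escaping `_` stops it from acting as a single-character wildcard.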

5002 tests passing, 35 skipped.

* feat(p2 backlog): Cmd-K palette + suggest-a-domain + nightly E2E + v55 schema

10-item P2 sweep on top of the unified-stack squash. New behaviour:

* Cmd-K admin command palette (base.html) — fuzzy-search overlay over
  admin + user-facing routes. Arrows/Enter to navigate, Esc to close.
* Stack-tabs digit shortcuts — 1/2/3 switch Browse / My Stack /
  Recipes on /catalog + /corporate-memory.
* Friendlier non-admin empty state on /corporate-memory, plus a
  "Suggest a domain" CTA → POST /api/memory-domain-suggestions, admin
  queue with approve/reject. Backed by a new memory_domain_suggestions
  table (schema v55).
* /admin/corporate-memory 7-tab strip grouped under Moderation /
  Catalog parent labels.
* Bulk-assign table → package dropdown annotates each option with
  "(N of M tables already in)" so the existing distribution is visible
  before picking a target.
* GET /api/memory + /tree accept is_required filter; admin status
  dropdowns route the "Required" sentinel onto it (status no longer
  holds 'mandatory' post-v49, so the old dropdown returned nothing).
* chip-input.js is now opt-in per template via {% block extra_scripts %}
  instead of loaded globally on every page from base.html.
* Edit-modal close helpers consolidated onto _closeEditModalById();
  documents the per-source-type modal architecture decision.
* New .github/workflows/e2e-nightly.yml runs agent-browser smoke
  scripts (scripts/e2e/smoke_*.sh) against a docker-compose stack
  nightly at 04:30 UTC; failures open an agent-browser-nightly issue.

5012 tests passing, 35 skipped.

* fix(visual audit): 6 page regressions on memory + data-package surfaces

agent-browser walkthrough of every memory + data-package page in the PR
turned up 6 real bugs. Fixes:

1. Admin memory modals were dead. Duplicate `let _cmdNewDomainId`
   declarations from the deprecated step-2 RBAC stubs in
   admin_corporate_memory.html collided with the live state vars
   declared earlier in the same <script> → SyntaxError on parse →
   the entire second script block silently failed → every inline
   onclick= handler defined there (`+ New Memory Domain`, Edit, etc.)
   was a no-op. Removed the duplicate stubs.

2. /catalog/t/<table_id> + /catalog/r/<slug> rendered unstyled.
   Both templates injected their CSS via {% block head %} but
   base.html exposes {% block head_extra %} — wrong block name
   meant <style> rules never reached the rendered HTML. Renamed
   to head_extra. Hero card, section cards, dark SQL block, proper
   full-width inputs all now render as designed.

3. L49 leak — "MANDATORY" KPI label + "Make Mandatory" row buttons
   on /admin/corporate-memory still used the old word. Renamed to
   "Required" / "Mark as Required" so UI matches the data model
   (v49 split moved the Required tier onto the orthogonal
   is_required boolean; status no longer holds 'mandatory').

4. Activity Center Resource dropdown didn't know the v55
   `memory_domain_suggestion:` namespace — added it.

5. Tab strip on /admin/corporate-memory wrapped text 2× per button
   on narrow viewports after the L50 MODERATION/CATALOG group
   labels pushed total width past most viewports. Switched the
   strip to flex-wrap:nowrap + overflow-x:auto with
   white-space:nowrap + flex-shrink:0 on every direct child so the
   tabs stay one row and slide horizontally when they overflow.

5012 tests passing, 35 skipped.

* rebase-cleanup: align with main's 0.54.25-27 API design + comment fix

Three follow-on fixes after rebasing onto origin/main (0.54.27):

* admin_tables.html: dropped a stray nested ``{% if data_source_type
  == 'keboola' %}`` around ``prefillFromKeboolaTable`` (main never had
  it; the outer Phase F2 guard already covers it) and reworded a JS
  comment that contained literal ``{% %}`` tokens which Jinja was
  parsing as a real tag → unbalanced if/endif → 30 template render
  failures across the suite.
* /api/stack/subscription/{type}/{id}: DELETE now returns 204 instead
  of 200 per the 0.54.26 design rules. CLI client + parity tests
  updated to accept 2xx / assert 204.
* Memory-domain suggestion approve/reject paths added to
  ``_VERB_PATH_ALLOWLIST`` — they are pending → approved/rejected
  state-machine transitions (approve also creates the real
  memory_domains row as a side effect), so the RPC shape is
  intentional rather than a missed PATCH refactor.

5035 tests passing, 35 skipped.

* fix(catalog_table_detail): real polish pass — hero glyph, dedup pills, rows/size meta, scoped sync CTA

The previous fix only corrected the block-name typo so that the existing CSS rendered.
The actual layout was still wireframe-tier on close inspection:

* No cover glyph in the hero (a flat white card with title + meta line);
  data-package + memory-domain detail pages both have a colored icon
  square. Restored parity — table.icon emoji if set, otherwise initials
  on a colored square using table.color.
* "INTERNAL" pill rendered twice for agnes_audit etc. — the mode pill
  and the source-type pill happened to be identical strings. Now skip
  the source pill when it matches the mode (`internal == internal`).
* Bucket / source_table code chip showed `Agnes Internal.audit_log` for
  internal rows — meaningless to a user. Hidden when source_type is
  internal.
* `pairs_well_with` admin input was a comma-separated `<input>` always
  visible. Wrapped all 4 sections in an Edit-on-demand toggle: read-
  only display by default, "+ Add" / "Edit" button on the right edge
  of each section header reveals the inline form, Cancel hides it.
* "Trigger sync now" was a cramped link squashed into the empty-state
  flex row (visible as `Tr…` overflow before). Promoted to a proper
  btn-primary button under the empty-state copy. Hidden entirely for
  internal tables (which are server-managed — no upstream to pull).
* Hero meta now surfaces row count + payload size (when sync_state has
  them) + last sync timestamp on a single line — was missing from the
  original.
* Mode pills colored by tier (local=green, remote=amber, materialized=
  blue, internal=gray) so the basic fact about a table reads at a
  glance, not from upper-cased text alone.

* tests(v56): TDD baseline for extended data-packages content + per-table docs

68 failing tests across 8 files spec the v56 surface before any
implementation lands:

* test_schema_v55_to_v56_migration.py — schema bump, additive ALTERs
  on data_packages + table_registry, idempotency, sequential-upgrade
  preservation
* test_data_packages_repo_v56.py — repo create/update/get/list for
  owner_name, owner_team, tags, long_description, when_to_use,
  when_not_to_use, example_questions (JSON list round-trip, empty
  defaults, partial-update preservation)
* test_table_registry_v56_docs.py — update_docs for grain, platforms,
  partition_col, history, gotchas; preserves v52 docs columns
* test_api_data_packages_v56.py — PUT/POST/GET for all new fields,
  field-level validation (tag count, bullet length, description size),
  virtual badge derivation (curated/new)
* test_api_registry_docs_v56.py — PATCH /api/admin/registry/{id}/docs
  for v56 fields, validation, RBAC unchanged
* test_web_catalog_package_detail_v56.py — /catalog/p/<slug> rewrite
  asserts on rendered owner line, tag pills, badges, What it is,
  Use it when, Skip it when, Example questions, per-table extended
  detail in collapsible row, key-gotcha distinctness, admin-only Edit
* test_web_stack_card_v56_metadata.py — Browse-grid card additions
  (owner chip, tag chips, badges) without breaking back-compat for
  rows missing the new fields
* test_data_packages_no_vendor_content.py — CI guard: scans app/ +
  src/ + cli/ + config/ + scripts/ for Groupon-specific tokens from
  the colleague's spec MD; fails if any leak into OSS surfaces
* test_db_schema_version.py — bumped 55 → 56 with rationale

Plus updates schema-version assertion to 56. Implementation lands in
subsequent commits (schema migration → repo → API → templates).

* feat(v56): schema + repo for extended data-packages content

Schema additions (ALTER ADD COLUMN IF NOT EXISTS — additive + idempotent):

* data_packages: owner_name, owner_team, tags, long_description,
  when_to_use, when_not_to_use, example_questions (JSON-as-VARCHAR for
  the lists)
* table_registry: grain, platforms, partition_col, history, gotchas
  (extends the v52 sample_questions / things_to_know / pairs_well_with
  docs surface with structured per-table content)

Repo extensions:

* DataPackagesRepository.create + update accept the new fields with
  the same Optional-is-no-op contract as v51 (pass an empty list to
  clear a JSON column)
* _decode_row decodes the new JSON-list columns to Python lists; NULL
  decodes to [] so callers don't branch
* TableRegistryRepository.update_docs grew the v56 fields alongside
  the existing v52 ones — single PATCH can write either tier
  atomically
* TableRegistryRepository._decode_row picks up platforms + gotchas in
  the same NULL-tolerant decoder
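
The NULL-tolerant decoding described in the repo bullets above can be sketched like this. It assumes rows arrive as plain dicts and that the listed columns are the JSON-as-VARCHAR ones; decode_row and JSON_LIST_COLUMNS are illustrative names, not the repository's actual code.

```python
import json

# Assumption: these are the columns stored as JSON-encoded lists.
JSON_LIST_COLUMNS = ("tags", "when_to_use", "when_not_to_use", "example_questions")


def decode_row(row: dict) -> dict:
    """Decode JSON-list columns to Python lists; NULL or empty becomes []."""
    decoded = dict(row)  # copy so the caller's row is left untouched
    for column in JSON_LIST_COLUMNS:
        raw = decoded.get(column)
        decoded[column] = json.loads(raw) if raw else []
    return decoded


row = {"id": "pkg-1", "tags": '["sales", "daily"]', "when_to_use": None}
print(decode_row(row)["tags"], decode_row(row)["when_to_use"])
```

Callers can then iterate over any list field without first checking for None.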

22 repo + migration tests passing. API + UI land in subsequent commits.

* feat(v56): API surface for extended data-packages + per-table docs

CreateDataPackageRequest + UpdateDataPackageRequest grew the v56 fields
(owner_name, owner_team, tags, long_description, when_to_use,
when_not_to_use, example_questions) with per-field validators that
match the Foundry spec checklist:

  * tags: ≤8 entries × ≤30 chars
  * long_description: ≤4000 chars
  * use/skip: ≤8 bullets × ≤200 chars
  * example_questions: ≤12 × ≤200 chars
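
The limits above could be checked along these lines. The commit uses per-field Pydantic validators; this sketch swaps in a plain-Python function so it stays dependency-free. The mapping of "use/skip" to when_to_use / when_not_to_use and the error wording are assumptions.

```python
# (max entries, max characters per entry) for each list field.
LIST_LIMITS = {
    "tags": (8, 30),
    "when_to_use": (8, 200),
    "when_not_to_use": (8, 200),
    "example_questions": (12, 200),
}
MAX_LONG_DESCRIPTION = 4000


def validate_package_fields(payload: dict) -> list[str]:
    """Return a list of human-readable violations (empty when valid)."""
    errors = []
    description = payload.get("long_description") or ""
    if len(description) > MAX_LONG_DESCRIPTION:
        errors.append(f"long_description: more than {MAX_LONG_DESCRIPTION} chars")
    for field, (max_items, max_chars) in LIST_LIMITS.items():
        items = payload.get(field) or []
        if len(items) > max_items:
            errors.append(f"{field}: more than {max_items} entries")
        if any(len(item) > max_chars for item in items):
            errors.append(f"{field}: entry longer than {max_chars} chars")
    return errors


print(validate_package_fields({"tags": ["x"] * 9}))
```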

_serialize emits all v56 fields plus a virtual ``badges`` list derived
server-side at render time (no DB column needed): "curated" when the
creator is in the Admin group, "new" within 30 days of created_at.
Backdated created_at values and admin-status changes are picked up automatically.
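
A minimal sketch of the badge derivation described above, assuming the set of Admin-group user ids is already loaded and treating "within 30 days" as inclusive. derive_badges and its parameters are illustrative names.

```python
from datetime import datetime, timedelta, timezone

NEW_WINDOW = timedelta(days=30)


def derive_badges(created_by, created_at, admin_user_ids, now=None):
    """Compute virtual badges at render time; nothing is stored."""
    now = now or datetime.now(timezone.utc)
    badges = []
    if created_by in admin_user_ids:
        badges.append("curated")  # creator currently belongs to the Admin group
    if now - created_at <= NEW_WINDOW:
        badges.append("new")  # inclusive boundary is an assumption
    return badges


now = datetime(2026, 5, 19, tzinfo=timezone.utc)
print(derive_badges("u1", now - timedelta(days=10), {"u1"}, now=now))
```

Because the badges are recomputed on every call, removing a creator from the Admin set or moving created_at outside the window changes the result without any write.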

PATCH /api/admin/registry/{id}/docs extended with v56 structured
per-table fields (grain, platforms, partition_col, history, gotchas).
gotchas: list of {key: bool, body: str} Pydantic models with the same
≤8 cap; first key=true entry becomes the Key gotcha on the rendered
package detail page. PATCH echoes the fresh state so callers can
re-render without a second GET.

26 API tests passing (16 data-packages + 10 registry-docs).

* feat(v56): /catalog/p/<slug> rewrite + Browse-grid card augmentation

The third (and final) v56 commit lights up the UI surfaces backed by
the schema + API commits earlier in this PR:

* /catalog/p/<slug> template rebuilt around the Foundry spec's
  section ladder — hero (icon + name + badges + owner + tags +
  description + meta + Add-to-stack), "What it is" markdown body,
  paired "Use it when / Skip it when" panels, "Tables in this
  package" with collapsible per-table extended detail (grain /
  platforms / partition_col / history / gotchas + sample questions),
  and an "Example questions you can ask Claude" prompt panel. Each
  section guarded by ``{% if pkg.<field> %}`` — empty content fields
  hide the section entirely (no "No X yet" placeholder noise on the
  public-facing drilldown).
* router catalog_package_detail hydrates per-table v56 fields onto
  the tables list + derives the virtual badges (curated / new)
  server-side from creator-in-Admin + 30-day created_at.
* StackResolver.ResourceEntry grew owner_name / owner_team / tags /
  badges; _fetch_entries pulls the v56 columns + computes badges
  once per fetch using a single Admin-group SELECT.
* _data_package_entry_dict adapter passes the new fields through to
  the macro; tags are merged source-type pills + admin-authored
  category tags per the spec convention.
* _stack_card.html renders the v56 badges (top-left, data-badge=
  hooks) + the owner chip (data-card-owner hook) without breaking
  back-compat — pre-v56 rows render unchanged.
* Admin PUT handler strips the v56 docs fields from the
  read-modify-write merged dict so register() doesn't blow up
  with the now-larger row shape (same pattern as the v52 docs
  fields stripping).

5115 tests passing (+98 v56 + 18 fixed regressions from the merged-
register PUT path), 35 skipped.

* fix(rbac): Edit-on-package + Group-access 'required' persistence + CI vendor guard

Three related bugs reported on the merged-with-main branch:

1. Clicking Edit on a Data Package card landed on /admin/tables with
   a `#<pkg.id>` hash that nothing listened to — admin saw the global
   table listing, not the editor for that specific package. Added a
   `?edit_package=<pkg_id>` query-param handler in admin_tables.html
   (analog to the existing `?edit=<table_id>` and `?assign_to=<pkg_id>`
   patterns) that calls openEditDataPackageModal on DOMContentLoaded
   after a 250ms layout settle. Updated the package-detail Edit link
   to use the new query param.

2. Setting Group Access to 'required' didn't persist — re-opening
   the modal showed 'available'. Root cause was the v49
   ``resource_grants.requirement`` enum existing in the DB but the
   POST /api/admin/grants endpoint not surfacing it: ``CreateGrantRequest``
   declared only group_id + resource_type + resource_id, so Pydantic
   silently dropped the matrix's ``requirement: 'required'`` payload
   and the new row landed at the DB column default ('available').
   Plumbed ``requirement`` through ``CreateGrantRequest`` →
   ``ResourceGrantsRepository.create`` so the value persists in one
   round-trip. Plus a UNIQUE-constraint race in the matrix
   diff-apply: DELETE-old + POST-new ran in parallel via
   ``Promise.allSettled``, so POST could fire first and trip the
   unique check before DELETE freed the slot. Switched to sequential
   (await all deletes; then await all writes) across all three
   matrices (Edit Data Package, Edit Memory Domain, Edit Recipe).

3. CI vendor-content guard ``test_no_groupon_specific_strings_in_oss``
   tripped on two of my own docstrings: a "Foundry Data team" mention
   in two src/db.py comments + an ``s1_session_landings`` example in
   cli/skills/agnes-table-registration.md. Rephrased the comments to
   "extended-descriptions admin spec" and replaced the example with
   a generic ``events_daily`` table name.

5164 tests passing, 35 skipped (+4 regression tests pinning the POST
/api/admin/grants requirement contract). Vendor guard back to green.
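
The ordering problem from bug 2 can be reproduced in isolation. This sketch assumes a SQLite stand-in and a UNIQUE key over (group_id, resource_type, resource_id); the real constraint columns and the helper names are assumptions. It runs the write before the delete (one ordering the parallel version allowed), then the sequential delete-then-write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE resource_grants (
        group_id      TEXT NOT NULL,
        resource_type TEXT NOT NULL,
        resource_id   TEXT NOT NULL,
        requirement   TEXT NOT NULL DEFAULT 'available',
        UNIQUE (group_id, resource_type, resource_id)
    )
    """
)
KEY = ("g1", "data_package", "pkg-1")  # hypothetical grant identity
conn.execute(
    "INSERT INTO resource_grants (group_id, resource_type, resource_id) "
    "VALUES (?, ?, ?)",
    KEY,
)


def create_grant(requirement: str) -> None:
    """Insert the grant with an explicit requirement value."""
    conn.execute("INSERT INTO resource_grants VALUES (?, ?, ?, ?)", KEY + (requirement,))


def delete_grant() -> None:
    conn.execute(
        "DELETE FROM resource_grants "
        "WHERE group_id = ? AND resource_type = ? AND resource_id = ?",
        KEY,
    )


# Write first: the old row still occupies the unique slot.
try:
    create_grant("required")
    write_first_failed = False
except sqlite3.IntegrityError:
    write_first_failed = True

# Sequential: free the slot, then write the new row.
delete_grant()
create_grant("required")
stored = conn.execute("SELECT requirement FROM resource_grants").fetchall()
print(write_first_failed, stored)
```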

* fix(catalog): admin Browse path drops v56 card fields

The /catalog and /memory admin god-mode branch built ResourceEntry
instances inline from pkg_repo.list() / domains_repo.list() and skipped
owner_name, owner_team, tags, and derived badges (curated/new). Visible
symptom: a package with an owner + tags rendered with the v56 chrome
for non-admin viewers but as a bare card for admins.

Adds StackResolver.browse_admin(user_id, resource_type) — admin god-mode
Browse that walks the full table but routes through the same
_fetch_entries enrichment pass as browse(), so admin + non-admin Browse
stay visually consistent. Both /catalog and /corporate-memory routes
switch to it.

Regression test in tests/test_stack_resolver_browse_admin.py covers:
owner/tags propagation, new/curated badge derivation, in_stack from
admin subscriptions, all-packages-regardless-of-grants, and the
ValueError for unsupported resource types.

* fix(catalog): three /catalog tab-strip UX bugs

1. Required Remove → red toast
   browse_admin passed empty required_ids to _fetch_entries, so the
   admin's own required grants surfaced as 'available' and the macro
   rendered an actionable Remove button that POST /unsubscribe 400'd
   on. Now derives required_ids from the admin's own groups so
   Required packages render with the disabled "In stack (required)"
   button. Regression test in test_stack_resolver_browse_admin.py.

2. Remove green-toasts but card stays until refresh
   The My-Stack empty-state placeholder was only emitted server-side
   when stack_entries was empty at render time. Removing the last
   card left the tab completely blank — users read that as "Remove
   didn't work, let me refresh". Both grid + empty-state are now
   always rendered with one of them initially hidden; the JS swaps
   visibility on add/remove instead of injecting DOM. Same fix in
   /corporate-memory.

3. "What are Recipes?" + ambiguous (admin) suffix
   Recipes tab now carries its own curator-block explainer (the
   shared one was moved inside Browse view so it doesn't bleed
   across tabs). The grey "(admin)" suffix becomes a yellow
   .admin-only-hint chip with a title tooltip — visibility hint is
   now unambiguous: yellow chip = "only you see this", non-admins
   don't see the affordance at all.

* schema: renumber v51..v58 → v52..v59 to make room for main's v51

Main 0.54.29 introduced a NEW v51 (table_registry.bq_fqn — issue #343)
that releases ahead of this branch. The unified-stack chain v51..v58
shifts up by one so main's v51 stays as the released schema and ours
become v52..v59. Function names, internal version bumps, dispatch
ladder thresholds, and the migration-test references all move
together. Subsequent merge with main lands the bq_fqn column at the
freed v51 slot.

* fix(seed): seed admin lands in BOTH Admin AND Everyone groups

The LOCAL_DEV_MODE / SEED_ADMIN_EMAIL bootstrap only added the seed
user to Admin. Everyone-scoped grants — the canonical "every-user-
sees-this" pattern for Required onboarding — didn't surface for the
seed admin's own /catalog because they weren't in Everyone. Symptom:
admin grants a Required-tier package to Everyone, then sees it on
/catalog still rendered with an "Add to stack" button (because the
admin's resolved required_ids was empty for that package).

The dual-membership keeps Admin (authorization) and Everyone
(default-grant target) intentionally separate per the design comment
on UserRepository.create — every membership remains traceable to a
concrete row, just now with a system_seed row in Everyone too. Both
INSERTs go through UserGroupMembersRepository.add_member which is
idempotent on (user_id, group_id), so re-fires on every lifespan
startup don't duplicate rows.
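
The idempotent dual-membership seeding can be sketched as below, assuming a SQLite stand-in with a primary key on (user_id, group_id). The table shape, the source column, and the function names are illustrative; only the idempotency contract comes from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_group_members ("
    "user_id TEXT, group_id TEXT, source TEXT, "
    "PRIMARY KEY (user_id, group_id))"
)


def add_member(user_id: str, group_id: str, source: str) -> None:
    """Insert a membership row; a repeat call for the same pair is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO user_group_members VALUES (?, ?, ?)",
        (user_id, group_id, source),
    )


def seed_admin(user_id: str) -> None:
    """Put the seed user in both Admin and Everyone."""
    for group_id in ("Admin", "Everyone"):
        add_member(user_id, group_id, "system_seed")


# Simulate two application startups.
seed_admin("seed-user")
seed_admin("seed-user")
rows = conn.execute(
    "SELECT group_id FROM user_group_members WHERE user_id = ? ORDER BY group_id",
    ("seed-user",),
).fetchall()
print(rows)
```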

Regression test in test_main_seed_admin_everyone.py.

* style: unify admin-only hints across marketplace + memory detail pages

Replaces three stale ``(admin)`` parentheticals with the same yellow
``admin-only`` chip introduced for /catalog tab actions. Same tooltip
copy ("Visible only to admins — analysts won't see this …") so the
visibility hint is unmistakable wherever it appears:

- Hard delete on marketplace_plugin_detail (admin-only destructive
  action — same gating as the original suffix conveyed).
- Hard delete on marketplace_item_detail (same).
- Edit link on memory_domain_detail (title-attr only before; now a
  visible chip too).

Non-admin viewers never saw these affordances — the gates are
unchanged. Pure styling pass for consistency.

* fix(catalog): exclude soft-deleted data packages + memory domains from Browse

``StackResolver._fetch_entries`` and ``browse_admin`` were querying
data_packages / memory_domains without a ``deleted_at IS NULL`` guard.
A package soft-deleted via /admin/* (v54 soft-delete contract) stayed
visible on /catalog and /memory until either an Undo or a hard delete
— directly contradicting the soft-delete UX which is supposed to
remove the affordance immediately and only retain the row for the
Undo window.

The repository accessors (DataPackagesRepository.list,
MemoryDomainsRepository.list, list_packages_of_table, etc.) already
filter deleted rows; this commit brings the resolver's direct SQL in
line with that contract.
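
A minimal sketch of the soft-delete guard, assuming a SQLite stand-in and a trimmed data_packages table. browse_packages and undo_delete are hypothetical names; the point is only the `deleted_at IS NULL` predicate on the Browse query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_packages (id TEXT PRIMARY KEY, deleted_at TEXT)")
conn.executemany(
    "INSERT INTO data_packages VALUES (?, ?)",
    [("pkg-live", None), ("pkg-gone", "2026-05-01T00:00:00Z")],
)


def browse_packages(conn: sqlite3.Connection) -> list[str]:
    """List packages for Browse, hiding soft-deleted rows."""
    rows = conn.execute(
        "SELECT id FROM data_packages WHERE deleted_at IS NULL ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]


def undo_delete(conn: sqlite3.Connection, package_id: str) -> None:
    """Undo clears deleted_at, so the row reappears in Browse."""
    conn.execute("UPDATE data_packages SET deleted_at = NULL WHERE id = ?", (package_id,))


visible_before_undo = browse_packages(conn)
undo_delete(conn, "pkg-gone")
visible_after_undo = browse_packages(conn)
print(visible_before_undo, visible_after_undo)
```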

Regression test in test_stack_resolver_browse_admin.py.

* fix(catalog): Add/Remove updates full card chrome, not just button

The previous _applyStackChange flipped only the footer button label —
the card border (.is-in-stack class), top-right "In stack" badge, and
button color class (--add / --remove) stayed at their server-rendered
state. After Add the user saw the button checkmark but the rest of
the card still looked like "available, not in stack". They read this
as "the change didn't take — let me refresh".

This commit makes the optimistic update mirror what the server-side
macro renders for the new state:

* ``c.classList.toggle('is-in-stack', becameInStack)`` — flips the
  border + visual state class.
* Top-right ``.stack-card__req-badge--instack`` badge is injected on
  Add, removed on Remove (skipped when ``data-requirement='required'``
  — that slot is owned by the Required badge).
* Button text is "Remove" / "+ Add to stack" matching the macro
  (was "✓ In stack" which was visually nice but inconsistent).
* Button color class --add / --remove swaps so the destructive Remove
  tint kicks in immediately.

The clone-into-My-Stack path applies the same updates so the new card
in My Stack reads identically to a server-rendered in_stack card.
Mirrored in /corporate-memory.

* fix(memory): four Devin-review bugs on /memory drill-down + manifest

PR #333 Devin review surfaced four real bugs that ship a broken
/memory experience even though the unit tests passed.

1. Manifest md5 omits is_required + content (app/api/sync.py:836-840)
   _build_memory_domains_section hashed only (id|title|status) per
   item. _build_per_domain_markdown routes items between "## Required"
   and "## Approved" by is_required and embeds full content — so an
   admin edit of either dimension left the manifest md5 unchanged,
   `agnes pull` skipped the re-fetch, and the analyst kept a stale
   bundle.md. Now both fields participate in the hash.

2. required_count always 0 (src/repositories/memory_domains.py)
   list_items_of_domain only SELECTed (id, title, status) so the
   `it.get("is_required")` in the manifest builder always evaluated
   to None → required_count = 0 regardless of actual state. The
   manifest builder advertised a count it could never compute. Now
   projects is_required + content too (required by fix 1 anyway).

3. Vote URL 404 (memory_domain_detail.html:289-290)
   Constructed `/api/memory/items/{id}/vote` but the route is
   `/api/memory/{id}/vote`. Every upvote/downvote button was a
   silent no-op.

4. Dismiss/undismiss URL + method both wrong (memory_domain_detail.html:296-305)
   Constructed `/api/memory/items/{id}/dismiss` (extra /items/) and
   /undismiss (no such route — undismiss is DELETE on /dismiss).
   Both buttons silently 404'd. Now POST + DELETE on
   `/api/memory/{id}/dismiss` per app/api/memory.py:635/675.
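
Bug 1 can be illustrated with a small hash comparison. This is a sketch under assumptions: items are plain dicts, fields are joined with `|` as in the description, and manifest_md5 is a hypothetical stand-in for the hashing loop in _build_memory_domains_section.

```python
import hashlib


def manifest_md5(items: list[dict], fields: tuple[str, ...]) -> str:
    """Hash the chosen fields of every item, in a stable id order."""
    digest = hashlib.md5()
    for item in sorted(items, key=lambda it: it["id"]):
        line = "|".join(str(item.get(name)) for name in fields)
        digest.update(line.encode("utf-8") + b"\n")
    return digest.hexdigest()


OLD_FIELDS = ("id", "title", "status")
NEW_FIELDS = OLD_FIELDS + ("is_required", "content")

before = [
    {"id": 1, "title": "Refund policy", "status": "approved",
     "is_required": False, "content": "v1"}
]
# An admin flips is_required and edits the content; id/title/status are unchanged.
after = [{**before[0], "is_required": True, "content": "v2"}]

old_hash_changed = manifest_md5(before, OLD_FIELDS) != manifest_md5(after, OLD_FIELDS)
new_hash_changed = manifest_md5(before, NEW_FIELDS) != manifest_md5(after, NEW_FIELDS)
print(old_hash_changed, new_hash_changed)
```

With the old field set the digest does not move, which is why a client comparing md5 values would skip the re-fetch.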

* fix: multi-agent reviewer findings — vendor-token scrubs + manifest md5 predicate + soft-delete filter

Three reviewer findings from the multi-agent review on PR #333,
fixed in-place per CLAUDE.md issue-economy rule.

Reviewer-rules (Important — vendor-agnostic OSS):
- app/main.py:218 comment: replaced 'foundryai-prod' with generic
  'a customer prod instance' phrasing. Public OSS repo must not
  carry customer-specific tokens (CLAUDE.md § Project conventions).
- tests/test_table_registry_v56_docs.py:70 fixture string:
  replaced "user_brand_affiliation = 'groupon'" with 'acme' on
  the same rule.

Reviewer-architecture (closes still-unresolved Devin 🚩 ANALYSIS):
- app/api/sync.py _build_memory_domains_section: md5 hash loop now
  filters items to the SAME predicate the bundle renderer uses
  (is_required OR status='approved'). Pre-fix the hash iterated ALL
  items but _build_per_domain_markdown only rendered the union of
  required items + approved-non-required items — so an admin edit
  to a pending/rejected non-required item flipped the md5 against
  an identical-bytes bundle, triggering a wasteful re-fetch on
  every analyst's next 'agnes pull'. The earlier commit fixed the
  hash-input fields (is_required + content); this closes the
  set-of-items asymmetry Devin separately flagged.
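
The set-of-items fix can be sketched by filtering the hash input with the same predicate the renderer uses. The predicate (is_required OR status='approved') is taken from the description above; is_rendered and bundle_md5 are illustrative names and the item shape is assumed.

```python
import hashlib


def is_rendered(item: dict) -> bool:
    """Predicate shared by the renderer and the hash loop."""
    return bool(item.get("is_required")) or item.get("status") == "approved"


def bundle_md5(items: list[dict]) -> str:
    """Hash only the items that would appear in the rendered bundle."""
    digest = hashlib.md5()
    for item in sorted(filter(is_rendered, items), key=lambda it: it["id"]):
        fields = (item["id"], item["title"], item["status"],
                  item.get("is_required"), item.get("content"))
        digest.update("|".join(map(str, fields)).encode("utf-8") + b"\n")
    return digest.hexdigest()


approved = {"id": 1, "title": "A", "status": "approved", "is_required": False, "content": "x"}
pending = {"id": 2, "title": "B", "status": "pending", "is_required": False, "content": "y"}

baseline = bundle_md5([approved, pending])
pending_edited = bundle_md5([approved, {**pending, "content": "y2"}])
approved_edited = bundle_md5([{**approved, "content": "x2"}, pending])
print(baseline == pending_edited, baseline == approved_edited)
```

Editing an item that never reaches the bundle leaves the digest alone, so no client re-fetches identical bytes.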

Reviewer-RBAC (minor cleanup):
- app/resource_types.py _data_package_blocks and _memory_domain_blocks
  now filter 'WHERE deleted_at IS NULL' (v54 soft-delete column) so
  the /admin/access UI doesn't surface soft-deleted entities as
  grantable. Mirrors the existing filter on _recipe_blocks. No
  security leak pre-fix (resolver double-filters and re-checks at
  serve time), just UI cleanliness.
- app/services/stack_resolver.py add_to_stack: docstring note
  added explaining that authorization is enforced at the API layer
  (app/api/stack.py can_access gate), not at the resolver. The
  initial review suggested adding a defensive 403 here, but that
  broke 5 existing tests that legitimately call add_to_stack
  directly without setting up grants first; the docstring captures
  the contract instead. stack() already intersects subscriptions
  with current available_ids on every read, so a 'zombie' row from
  a misuse never leaks into the user-facing manifest.

* release: 0.55.0 — unified Browse + My Stack (Data Packages + Memory), schema v48→v59, 3 BREAKING
2026-05-19 15:00:15 +02:00
minasarustamyan
17159bfad9
fix: refresh-marketplace enables stack plugins; override sentinel is init-time only (#307)
* fix(refresh-marketplace): also enable stack plugins in workspace settings

Reconcile previously stopped at `claude plugin install --scope project`,
which only writes the global plugin registry. Without an entry in the
workspace `.claude/settings.json` `enabledPlugins` map, Claude Code
treats every plugin as disabled — `/plugins` doesn't list them and
their slash commands, skills, and agents are unreachable.

Refresh now writes the enable map after install/update, treating the
user's marketplace stack as the source of truth (re-enables anything a
prior `claude plugin disable` locally turned off). Override workspaces
are skipped via `is_override_workspace`.
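
A rough sketch of the enable-map write, operating on an in-memory settings dict rather than the real `.claude/settings.json` file. The function name, the plugin ids, and the use of `false` to represent a locally disabled plugin are assumptions made for illustration.

```python
import json


def enable_stack_plugins(settings: dict, stack_plugin_ids: list[str]) -> dict:
    """Return a copy of settings with every stack plugin enabled."""
    updated = dict(settings)
    enabled = dict(settings.get("enabledPlugins") or {})
    for plugin_id in stack_plugin_ids:
        enabled[plugin_id] = True  # the stack is the source of truth
    updated["enabledPlugins"] = enabled
    return updated


# Hypothetical workspace settings: one plugin was disabled locally.
settings = {
    "permissions": {"allow": []},
    "enabledPlugins": {"sql-helper@agnes": False},
}
result = enable_stack_plugins(settings, ["sql-helper@agnes", "charts@agnes"])
print(json.dumps(result["enabledPlugins"], sort_keys=True))
```

Unrelated keys in the settings dict pass through untouched, and entries for plugins outside the stack are left as they were.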

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(override): sentinel governs init only, not runtime CLI

Sentinel `.claude/init-complete` with `override: true` was meant to
let admins ship INITIAL workspace content. The implementation was
over-scoped — `is_override_workspace` check sat inside every Agnes
writer (`install_claude_hooks`, `install_claude_commands`,
`maybe_refresh_claude_hooks`, `_enable_plugins_in_workspace_settings`),
which blocked runtime commands too. Operators on override workspaces
got trapped at the template snapshot: no `enabledPlugins` map from
`agnes refresh-marketplace`, no hook auto-migration from
`agnes self-upgrade`.

Move the check to the init-time call site (cli/commands/init.py,
`if not override_active:`) — the single place where init-time skip
is the right behavior. Writers themselves become unconditional;
runtime CLI now updates `.claude/` regardless of the sentinel.

Admin custom hooks survive — refresh only rewrites entries matching
`_OUR_COMMAND_MARKERS` (foreign commands fall through unchanged,
same contract as default workspaces).
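
The marker-based rewrite can be sketched on a flat list of hook command strings. The real settings structure is nested and the actual contents of `_OUR_COMMAND_MARKERS` are not shown in this log, so the marker values, commands, and function names below are hypothetical.

```python
# Hypothetical marker substrings identifying Agnes-owned hook commands.
OUR_COMMAND_MARKERS = ("agnes self-upgrade", "agnes push")


def is_ours(command: str) -> bool:
    return any(marker in command for marker in OUR_COMMAND_MARKERS)


def refresh_hook_commands(existing: list[str], desired: list[str]) -> list[str]:
    """Replace Agnes-owned entries; foreign commands fall through unchanged."""
    foreign = [command for command in existing if not is_ours(command)]
    return foreign + list(desired)


existing = ["agnes self-upgrade --old-flag", "./scripts/team-lint.sh"]
desired = ["agnes self-upgrade --quiet"]
print(refresh_hook_commands(existing, desired))
```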

Existing override workspaces auto-converge on next
`agnes self-upgrade` (fires from every SessionStart). No manual
migration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 18:43:32 +02:00
minasarustamyan
69a1e22cf5
feat(initial-workspace): per-instance agnes init override (#292)
* feat(initial-workspace): per-instance agnes init override

Adds Initial Workspace Template — an admin-configurable per-instance
override for the agnes init analyst workspace. When configured, agnes
init downloads a server-rendered zip from a Git repo the admin registered
and extracts it into the analyst's workspace, fully bypassing Agnes-default
CLAUDE.md / settings.json / hooks / slash commands / AGNES_WORKSPACE.md.

Repo layout convention: only the contents of a top-level `workspace/`
subdirectory ship to analysts; admin docs (README, CI configs) at the
repo root stay in the repo and never reach an analyst. Sync rejects
repos without `workspace/` at root.

Server side:
- src/initial_workspace.py — clone (or fetch+reset), validate, build zip
  with strict path checks and reserved-path rejection
  (workspace/.claude/init-complete reserved by Agnes)
- app/api/initial_workspace.py — admin CRUD + sync endpoint + analyst-
  facing status/zip/applied endpoints; config persists to instance.yaml
  overlay, PAT to .env_overlay
- app/secrets.py — refactor: persist_overlay_token shared helper with
  threading.Lock for .env_overlay writes (closes pre-existing race
  between concurrent marketplaces saves)
- app/web/templates/admin_server_config.html — new "Initial Workspace
  Template" section + modal + Sync/Edit/Delete/Download buttons (matches
  existing cfg-section visual language)
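
The locking idea behind persist_overlay_token in the app/secrets.py bullet can be sketched as a serialized read-modify-write. The signature, the KEY=VALUE file format, and the temp-file demo are assumptions; only the helper name and the use of threading.Lock come from the description.

```python
import tempfile
import threading
from pathlib import Path

_overlay_lock = threading.Lock()


def persist_overlay_token(path: Path, key: str, value: str) -> None:
    """Set KEY=VALUE in the overlay file, one writer at a time."""
    with _overlay_lock:  # serialize the whole read-modify-write cycle
        lines = path.read_text().splitlines() if path.exists() else []
        kept = [line for line in lines if not line.startswith(key + "=")]
        kept.append(f"{key}={value}")
        path.write_text("\n".join(kept) + "\n")


# Twenty concurrent saves of distinct keys into one overlay file.
overlay = Path(tempfile.mkdtemp()) / ".env_overlay"
threads = [
    threading.Thread(target=persist_overlay_token, args=(overlay, f"TOKEN_{i}", str(i)))
    for i in range(20)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

saved = dict(line.split("=", 1) for line in overlay.read_text().splitlines())
print(len(saved))
```

Without the lock, two writers could read the same snapshot and the later write would drop the earlier writer's key.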

CLI side:
- cli/lib/override.py — single source of truth for is_override_workspace
  sentinel detection
- cli/lib/initial_workspace.py — probe status, safe zip extraction with
  ../absolute/symlink rejection, typed-YES force confirmation
- cli/commands/init.py — override branch (skips Agnes-default workspace
  writes); extended sentinel with override:true, template_source,
  template_sha so future agnes self-upgrade does not auto-refresh hooks
- cli/lib/hooks.py + cli/lib/commands.py — short-circuit on override
  workspaces (install_claude_hooks, install_claude_commands,
  maybe_refresh_claude_hooks)
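
The safe zip extraction mentioned for cli/lib/initial_workspace.py can be sketched as a validate-then-extract pass. This is an illustration under assumptions, not the shipped code: the function names, the exception type, and the conservative drive-letter check are all hypothetical.

```python
import io
import stat
import tempfile
import zipfile
from pathlib import Path, PurePosixPath


class UnsafeArchiveError(ValueError):
    """Raised when a zip member could escape the destination directory."""


def check_member(info: zipfile.ZipInfo) -> None:
    name = info.filename
    parts = PurePosixPath(name).parts
    # Absolute paths: leading slash, backslashes, or a drive-like first part.
    if name.startswith("/") or "\\" in name or (parts and ":" in parts[0]):
        raise UnsafeArchiveError(f"absolute path: {name}")
    if ".." in parts:
        raise UnsafeArchiveError(f"parent traversal: {name}")
    # The Unix mode lives in the high 16 bits of external_attr.
    if stat.S_ISLNK(info.external_attr >> 16):
        raise UnsafeArchiveError(f"symlink: {name}")


def safe_extract(archive_bytes: bytes, dest: Path) -> list[str]:
    """Validate every member first, then extract; returns member names."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        members = zf.infolist()
        for info in members:
            check_member(info)
        zf.extractall(dest)
        return [info.filename for info in members]


def make_zip(entries: dict) -> bytes:
    """Build an in-memory zip for the demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name, body in entries.items():
            zf.writestr(name, body)
    return buffer.getvalue()


def is_rejected(archive_bytes: bytes) -> bool:
    try:
        safe_extract(archive_bytes, Path(tempfile.mkdtemp()))
        return False
    except UnsafeArchiveError:
        return True


print(is_rejected(make_zip({"CLAUDE.md": "ok"})), is_rejected(make_zip({"../evil.txt": "x"})))
```

Checking all members before calling extractall means a single bad entry leaves the workspace untouched instead of half-written.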

Audit-event strategy: server writes initial_workspace.fetch_started
inside GET /api/initial-workspace.zip (cannot be spoofed by PAT-holder);
CLI POST /applied writes initial_workspace.applied as best-effort
confirmation. Admin mutations log via the existing _audit pattern.

Tests: 27 server (clone/validate/zip + workspace-subdir convention +
concurrent persist_overlay_token + endpoint shapes + audit rows) + 29
CLI (override sentinel parse + probe fall-through + safe extraction +
YES strictness + hook guards + e2e mocked init).

Risk acceptance — documented in docs/initial-workspace-override.md +
CHANGELOG Internal section so AI reviewers understand the deviations
from defaults are intentional:
- maybe_refresh_claude_hooks deliberately no-ops on override workspaces
- --force on override does NOT back up CLAUDE.md (admin's repo is the
  source of truth)
- .claude/CLAUDE.local.md IS overwritten by override extraction when
  admin's repo ships one

* test+vendor-agnostic: drop Groupon tokens from #292 fixtures + extend admin-gate coverage

Two fixes from the takeover review on #292:

1. **Vendor-agnostic OSS rule**: Replace `Groupon` / `groupon/template`
   tokens in test fixtures with `Acme` / `acme/template` (8 sites in
   test_cli_init_override.py + 1 in test_initial_workspace_api.py).
   Per CLAUDE.md "Vendor-agnostic OSS — no customer-specific content"
   rule: customer-specific tokens don't belong in shipped artifacts,
   even in test fixtures. The pre-existing FoundryAI mentions in
   test_instance_config.py + test_setup_instructions.py are out of
   scope for this PR (it didn't introduce them).

2. **Admin-gate coverage gap**: `test_admin_endpoints_require_admin`
   only covered GET /api/admin/initial-workspace + POST .../sync. The
   register-write (POST .../initial-workspace) and delete (DELETE
   .../initial-workspace) endpoints used the same `Depends(require_admin)`
   wiring but had no regression test. Loop now covers all 4 verbs so
   a future refactor that drops the dependency from one endpoint
   fails here instead of silently exposing the write/delete paths to
   any analyst with a PAT.

* release: 0.54.9 — Initial Workspace Template (per-instance agnes init override)

Last commit on the PR per CLAUDE.md hard rule. Patch bump (0.54.8 →
0.54.9) for Mina's Initial Workspace Template feature.

No DB migration (config lives in instance.yaml overlay). No
mandatory operator action — empty default keeps OSS-default
agnes init behavior. Operators wanting full template control link a
Git repo on /admin/server-config → "Initial Workspace Template".
See docs/initial-workspace-override.md for the full
responsibility-transfer contract.

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-13 20:35:01 +00:00
ZdenekSrotyr
c8de0e0f64
release: 0.53.2 — diagnose silent-capture check + urllib3 2.7.0 + flaky-test fix (#270)
Three bundled improvements:

- #244 — new `agnes diagnose` check compares SessionStart events
  (~/.claude/projects/<encoded>/*.jsonl) against agnes-push uploaded
  log entries inside a 7-day window. Surfaces a warning when the gap
  exceeds 3, hinting at silently-broken capture-session — previously
  detectable only weeks after the fact.

- Dependabot — bumps transitive urllib3 from 1.26.20 to 2.7.0 to close
  5 advisories (4 high, 1 medium). kbcstorage 0.9.5 still pins
  urllib3<2.0.0 upstream; overridden via [tool.uv] override-dependencies
  since the SDK works fine against 2.x in practice (Client + Tables
  both flow through requests, which supports both lines).

- #252 — fix flaky test_scratch_dir_cleaned_up_after_failed_extraction
  by redirecting tempfile.tempdir to a per-test tmp_path. Pre-#252 the
  test scanned the shared system tmp dir and a sibling store test in
  another pytest-xdist worker could trip the assertion mid-window.
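
For #244, the comparison could look roughly like this sketch. It assumes both sides have already been reduced to lists of timestamps; the function name, parameters, and message text are hypothetical, and only the 7-day window and the gap threshold of 3 come from the description above:

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def silent_capture_warning(
    session_starts: Iterable[datetime],
    uploads: Iterable[datetime],
    now: datetime,
    window: timedelta = timedelta(days=7),
    max_gap: int = 3,
) -> Optional[str]:
    """Return a warning string when far more sessions started than uploaded."""
    cutoff = now - window
    started = sum(1 for t in session_starts if t >= cutoff)
    uploaded = sum(1 for t in uploads if t >= cutoff)
    gap = started - uploaded
    if gap > max_gap:
        return f"{started} sessions started but only {uploaded} uploaded (gap {gap})"
    return None
```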

Closes #244. Closes #252.
2026-05-12 18:28:04 +02:00
ZdenekSrotyr
48755b9864 release: 0.52.0 — UX/hygiene round (5 fixes from 0.51.0 retro)
Closes #254 (agnes sample alias), #255 (wide-table render), #256
(single-flight on bq-metadata-refresh + run_id), #257 (init wording),
#258 (progress bar clamp).

Tier B trackers left open: #259 (init resume), #260 (stale .lock),
#261 (schema cold-start), #262 (docker disk).
2026-05-12 15:09:14 +02:00
minasarustamyan
19c5a7592a
Session capture queue, private session, and setup-prompt fixes (#242)
* Capture session paths via SessionStart hook + lock parallel pushes

Replace the encoding-based scan of ~/.claude/projects/<encoded-cwd>/ with
a queue file populated by a new `agnes capture-session` SessionStart hook.
The hook reads the documented `transcript_path` field from Claude Code's
hook stdin JSON, sidestepping the cwd-to-folder encoding (which is an
internal implementation detail and varies by Claude Code version).

- New `agnes capture-session` subcommand appends transcript_path to
  <workspace>/.claude/agnes-sessions.txt. Silent on all malformed input
  so a hook chain failure doesn't clutter Claude Code startup.
- `agnes push` now consumes the queue: atomic snapshot rename guards
  against hooks writing during the push window, successful uploads land
  in agnes-sessions-uploaded.txt (TSV: timestamp + path), failed paths
  are requeued.
- Cross-platform single-instance lock via the filelock package (fcntl
  on POSIX, msvcrt on Windows). Concurrent SessionEnd hooks — common
  when the user closes several sessions at once — silent-exit on the
  losing side instead of all racing the upload.
- Recovery: pre-existing snapshot files from a crashed push are picked
  up and processed before the live queue.
- The SessionStart `agnes push` self-heal entry is dropped — it became
  redundant once the queue persists across runs (orphans from headless /
  crashed sessions ship out on the next interactive SessionEnd push).
  Existing workspaces auto-migrate via the marker-based replace logic.
- Legacy encoding scan stays available behind `--legacy-scan` for one-
  off backfills of sessions predating the queue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add /agnes-private + statusLine indicator for private sessions

Users handling sensitive data inside Claude Code can now opt a session
out of the Agnes upload pipeline, either proactively (right after session
start) or reactively (mid-session). The `/agnes-private` slash command
runs `agnes mark-private` deterministically via `!`-prefix direct bash —
no AI in the loop. A workspace-installed statusLine surfaces a
`🔒 agnes-private` indicator in Claude Code's status bar so the user
sees the state at a glance.

Authoritative source of "do not upload" is a separate file
`<workspace>/.claude/agnes-sessions-private.txt` (one session_id per
line). Both `capture-session` (queue writer) and `push` (queue reader)
consult the list. This makes the slash-command / SessionStart-hook race
impossible by construction: whichever runs first, the session is correctly
filtered out.

- `agnes mark-private` reads `CLAUDE_CODE_SESSION_ID` from env (set by
  Claude Code in every bash subprocess it spawns — stable documented API)
  and appends to the private list.
- `agnes statusline` reads the session JSON Claude Code pipes on stdin,
  checks the private list, and emits the indicator or nothing. Optimized
  for the high call frequency of statusLine renders.
- `capture-session` extracts session_id from hook stdin and skips queue
  write when the ID is already on the private list (race protection).
- `push` filters snapshot entries by the private list and appends to a
  per-workspace audit log `agnes-sessions-private-skipped.txt`.
- Queue format migrated from `<path>` to `<session_id>\t<path>`; legacy
  one-column lines still parse (empty session_id, still upload, can't be
  marked private retroactively — fine, they pre-date the feature).
- `install_claude_hooks` writes a workspace statusLine unless the user
  already has a custom one (warn + preserve). Idempotent re-init.
- `install_claude_commands` ships `agnes-private.md` alongside
  `update-agnes-plugins.md`. Per-template fallback so a missing template
  doesn't get clobbered with the wrong content.
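
The two-column queue format and its legacy fallback can be illustrated with a short parser. This is a sketch under the format stated above (`<session_id>\t<path>`, legacy lines carrying only `<path>`); `parse_queue_line` and `should_upload` are hypothetical helper names:

```python
from typing import Optional, Set, Tuple


def parse_queue_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (session_id, transcript_path), or None for a blank line."""
    line = line.rstrip("\r\n")
    if not line.strip():
        return None
    if "\t" in line:
        session_id, path = line.split("\t", 1)
        return session_id, path
    # Legacy one-column line: no session_id was recorded.
    return "", line


def should_upload(entry: Tuple[str, str], private_ids: Set[str]) -> bool:
    session_id, _path = entry
    # Legacy entries have an empty id, so they can never match the private list.
    return session_id == "" or session_id not in private_ids
```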

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix setup-prompt + CLAUDE.md marketplace copy + drop skills step

Three issues against the post-PR-#240 / post-PR-#237 state:

1. Setup prompt's marketplace block trailer (both has-stack and
   empty-stack variants) claimed the SessionStart hook keeps the
   marketplace clone in sync via `agnes refresh-marketplace --quiet`
   on every session and that admin grants land automatically — both
   false since PR #237 (0.47.x) moved the install/update path out of
   the hook into the `/update-agnes-plugins` slash command. The hook
   is `--check`-only: detects server-side changes, prompts the user
   to run the slash command, which does the full reconcile
   interactively with output visible in the transcript.

2. The empty-stack variant framed composition as "admin grants only",
   missing the actual three-source served stack:
     (admin RBAC ∩ /marketplace subscriptions)
       ∪ system-mandatory plugins (admin-pinned, auto-applied)
       ∪ Flea market installs (skills/agents bundled, plugins standalone)
   Updated copy spells out all three sources so analysts know where
   their stack picks live, and what the SessionStart hook actually
   does on change detection.

3. CLAUDE.md template's "Agnes Marketplace" section conflated
   eligibility (`resolve_allowed_plugins` — what's listed) with served
   stack (`resolve_user_marketplace` — what actually reaches Claude
   Code). The two are different: a user can be RBAC-eligible for a
   plugin without having subscribed to it on /marketplace. Rewrote
   the section to distinguish the eligibility set from the served
   stack and to describe the `--check`-only hook accurately.

Plus: deleted the setup prompt's interactive Skills step (final step
before Confirm). The named-opinion question — "do you want me to
bulk-copy every skill into ~/.claude/skills/agnes/ or pull on-demand
via `agnes skills show <name>`?" — had no obvious right answer for
new users at the tail end of a wall of technical steps. On-demand
lookup is the one-size-fits-all default; `agnes skills list/show`
remain discoverable and the CLAUDE.md template references specific
skills inline (e.g. agnes-data-querying in the BigQuery section)
where they're relevant. Layout: Confirm shifts from step 9 to step 8.

Tests updated, full setup/marketplace/welcome surface green (115
passed). Remaining full-suite failures are pre-existing (BQ/Keboola
fixtures, Windows charmap collection error in test_v26_keboola_e2e)
— verified against a clean stash, unrelated to this diff.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix session-queue race + snapshot PID-reuse data loss

Two blocker fixes from the PR #242 review:

1. Concurrent SessionStart hooks could corrupt the queue file on
   Windows. Python's `open(path, "a")` is not atomic there — the CRT
   does not pass FILE_APPEND_DATA to CreateFile, so concurrent
   appenders (user opening several Claude Code windows simultaneously)
   could interleave bytes mid-line. The malformed lines then silently
   fail the parser and the entries are dropped.

   Fix: wrap append_to_queue, requeue_failed, and snapshot_queue in a
   short-lived FileLock on a dedicated `agnes-queue.lock`. Separate
   from `agnes-push.lock` so capture-session hooks don't block on the
   push command. New test_append_concurrent_threads_no_corruption
   reproduces the race with 4 threads x 50 appends.

2. Snapshot filenames embedded only the PID (`agnes-sessions.snapshot.
   <PID>.txt`). After a crashed push left a snapshot on disk and the
   OS recycled the PID for a new push, `os.rename` would atomically
   overwrite the recovery snapshot — every entry in it lost, silently.

   Fix: append a uuid8 hex tail (`agnes-sessions.snapshot.<PID>.
   <uuid8>.txt`). find_recovery_snapshots already globs the prefix
   so it picks up both old and new format. New
   test_snapshot_filename_is_unique_per_call asserts two consecutive
   snapshots under the same PID don't collide.
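
A sketch of the snapshot naming from fix 2, assuming "uuid8" means the first 8 hex characters of a random UUID. The function bodies are illustrative only, and the `agnes-queue.lock` FileLock from fix 1 is left out to keep the example stdlib-only:

```python
import os
import uuid
from pathlib import Path
from typing import List, Optional

SNAPSHOT_PREFIX = "agnes-sessions.snapshot."


def snapshot_name(pid: Optional[int] = None) -> str:
    pid = os.getpid() if pid is None else pid
    # The random tail keeps names unique even when the OS recycles a PID.
    return f"{SNAPSHOT_PREFIX}{pid}.{uuid.uuid4().hex[:8]}.txt"


def snapshot_queue(claude_dir: Path) -> Optional[Path]:
    """Move the live queue aside so hooks can keep appending to a fresh file."""
    queue = claude_dir / "agnes-sessions.txt"
    if not queue.exists():
        return None
    target = claude_dir / snapshot_name()
    os.rename(queue, target)
    return target


def find_recovery_snapshots(claude_dir: Path) -> List[Path]:
    # Globbing on the prefix matches both <PID>.txt and <PID>.<uuid8>.txt.
    return sorted(claude_dir.glob(SNAPSHOT_PREFIX + "*.txt"))
```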

Targeted tests green (47/47 in session_queue/capture_session/cli_push).
Full suite failures unchanged from baseline (pre-existing BQ/Keboola
fixture issues per CLAUDE.md).

* Auto-refresh workspace hooks + bash-wrap all hook entries (Windows)

Fixes from PR #242 second review (ZdenekSrotyr):

1. `uv.lock` regenerated to include `filelock 3.29.0` (declared in
   pyproject.toml but missing from the lock file — CI's
   lockfile-consistency check would fail; `uv pip install` on a clean
   cache would silently miss the dep).

2. `agnes self-upgrade` now auto-refreshes the workspace Claude Code
   hooks via the new `cli.lib.hooks.maybe_refresh_claude_hooks`. Closes
   the silent-stop migration gap: a v0.48 workspace would auto-upgrade
   the CLI from its existing SessionStart self-upgrade entry but never
   pick up the new `agnes capture-session` SessionStart hook, leaving
   the queue empty and `agnes push` uploading nothing.

   The refresh fires on both the "info is None" fast path (CLI already
   current — catches the second SessionStart after a prior upgrade)
   and the install-success path. Guarded by `workspace_has_agnes_hooks`
   so it never writes `.claude/settings.json` into directories that
   aren't Agnes workspaces (e.g. `agnes self-upgrade` invoked from
   `~/`). Errors are surfaced on stderr but never flip the upgrade exit
   code.

3. All Agnes-managed hooks are now wrapped in `bash -c "..."`. The
   self-upgrade+pull chained SessionStart entry was the only one still
   shipping unwrapped — Claude Code on Windows runs hook commands
   directly without a shell, so the `;` chain + `2>/dev/null` +
   `|| true` shell syntax silently no-op'd on native Windows installs
   without Git Bash on PATH. Workspaces still on the old form
   auto-upgrade via the refresh path above.

Tests: +12 in test_lib_hooks.py (guard semantics, v0.48→v0.49
migration end-to-end, third-party-hook preservation, bash-wrap
invariant). +5 in test_self_upgrade.py (refresh fires on info=None,
fires on install success, skipped on failure, skipped on --check-only,
refresh failure never flips exit code).

130 targeted tests green. The 2 pre-existing Windows path-separator
failures in `test_smoke_test_detects_version_mismatch[uv|pip]` are
unrelated (path mismatch `\fake\uv\bin\agnes` vs `/fake/uv/bin/agnes`
in test asserts, pre-PR baseline).

* CHANGELOG: document PR-242 main features

Closes ZdenekSrotyr #4: the [Unreleased] block was missing entries for
the PR's primary surface — only the post-merge fix bullets and the
unrelated setup-prompt copy change were captured. Adds:

- ### Added: 6 bullets covering the session capture queue + new
  `agnes capture-session` subcommand, `/agnes-private` slash + `agnes
  mark-private`, `agnes statusline` + statusLine wiring, `--legacy-scan`
  opt-in fallback, single-instance push lock, and the new `filelock`
  runtime dep.

- ### Changed: BREAKING bullet on the SessionStart / SessionEnd hook
  wire format change (capture-session as first SessionStart entry,
  push self-heal removed, SessionEnd push detached via nohup, all
  entries bash-wrapped). Folds the prior standalone bash-wrap bullet
  into this consolidated entry — Z's review flagged the layout shift
  as BREAKING, and grouping the related sub-changes makes the
  migration story readable in one place.

- Operator migration is auto-handled by `maybe_refresh_claude_hooks`
  invoked from `agnes self-upgrade` (separate Changed entry below).
  No `agnes init` re-run required. Pre-queue session jsonls on
  upgrading workspaces still need a one-off `agnes push --legacy-scan`
  — flagged in the BREAKING bullet.

No code change; doc only.

* Drop permanent 4xx uploads instead of requeueing forever

Closes ZdenekSrotyr #5. Previously the push retry path requeued any
non-200 response except the literal "file not found on disk", so 401
(token expired), 403 (RBAC denial), 413 (payload too large), 400
(server-side validation) cycled through every push run forever — the
queue grew without bound and each run re-bombarded the server with the
same deterministically-failing upload.

Now 4xx (except 408 Request Timeout + 429 Too Many Requests, which the
HTTP spec marks as transient) is dropped and audit-logged to
`<workspace>/.claude/agnes-sessions-failed.txt`:

    <iso_ts>\t<session_id>\t<status>\t<transcript_path>

5xx and network errors continue to requeue — those reflect server /
transport state that can change between runs, so retry is the right
behavior.
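
The retry policy above boils down to a small classifier. The sketch below is illustrative: `classify` and `failed_line` are hypothetical names, `None` stands in for a network error, and treating every 2xx as success is an assumption (the text only mentions 200):

```python
from typing import Optional

TRANSIENT_4XX = {408, 429}  # Request Timeout, Too Many Requests


def classify(status: Optional[int]) -> str:
    """Map an upload outcome to 'uploaded', 'drop', or 'requeue'."""
    if status is None:  # network / transport error
        return "requeue"
    if 200 <= status < 300:
        return "uploaded"
    if 400 <= status < 500 and status not in TRANSIENT_4XX:
        return "drop"  # permanent: retrying would fail the same way
    return "requeue"  # 5xx and transient 4xx can change between runs


def failed_line(iso_ts: str, session_id: str, status: int, path: str) -> str:
    # TSV row for agnes-sessions-failed.txt, in the column order shown above.
    return f"{iso_ts}\t{session_id}\t{status}\t{path}\n"
```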

The audit log piggybacks on the push single-instance lock
(agnes-push.lock) — push is the only writer to this file, same as the
existing `mark_uploaded` and `mark_private_skipped` paths, so no
separate filelock is needed.

`agnes push --json` surfaces a new `dropped_permanent` counter; non-
quiet stdout mentions the audit-log path so operators tailing the
output have a pointer to the forensic trail.

Tests: +7 in test_cli_push.py (401/400/403/413 → drop; 408/429 →
requeue; 500/502/503 → requeue; network exception → requeue;
--json `dropped_permanent` counter; stdout audit-log pointer). +1 in
test_session_queue.py (mark_failed_permanent TSV format).

127/129 targeted tests green. The 2 pre-existing Windows
path-separator failures in `test_smoke_test_detects_version_mismatch
[uv|pip]` are unrelated (path mismatch `\fake\uv\bin\agnes` vs
`/fake/uv/bin/agnes` in test asserts, pre-PR baseline).

* Catch OSError in push lock acquisition

Closes ZdenekSrotyr #8. `acquire_or_skip` in `cli/lib/push_lock.py`
previously caught only `filelock.Timeout`. Any `OSError` from
`FileLock.acquire` — read-only filesystem, permission denied on
`.claude/`, disk full, hardware I/O failure — propagated as an
unhandled traceback.

Two visible failure modes:
- SessionEnd hook: `|| true` in the wrapper swallowed the error, so
  daily pushes silently never ran. Operator had no signal.
- Manual `agnes push`: ugly Python traceback dumped to the terminal
  instead of a clean exit.

Now `OSError` is treated the same as `Timeout` — yield `None`, caller
returns cleanly with rc=0. The operator's environment in these
scenarios has bigger problems than missing session uploads, so we
swallow rather than retry-loop or surface a noisy warning.

Test: `test_push_silent_exit_when_filelock_raises_oserror` patches
the `FileLock` used inside `push_lock` to raise OSError on acquire,
verifies push exits 0 with no traceback and the queue is preserved
for the next attempt.

* Address remaining S2 items from PR-242 review

Four items from ZdenekSrotyr's S2 list:

S2.10 — `_install_statusline` truthy check (cli/lib/hooks.py): replace
`if existing:` with explicit `if existing is None or existing == "":`.
Documents and tests the behavior for both edge cases (explicit-null
and empty-string `statusLine`) — both treated as "not configured"
rather than "explicit user opt-out", so we install ours. Two new
tests in test_lib_hooks.py pin the contract.

S2.6 — onboarding docs for /agnes-private. New "Private sessions"
subsection in `config/claude_md_template.txt` (next to Data Sync)
covering the slash command, statusbar indicator, and audit-log
location. One-line tip in `app/web/setup_instructions.py` so the
feature is discoverable at onboarding.

S2.9 — e2e privacy test (tests/test_e2e_privacy.py). Wires
capture_session → mark_private → push against a recording fake
api_post and asserts zero session uploads for the marked one.
Three cases: mark-before-capture (queue write skipped),
mark-after-capture (push-side filter catches it + audit-logs),
control (unmarked sessions upload normally).

David #8 — `--legacy-scan` help text now documents the
private-list gap (legacy entries carry empty session_id, so
the filter is not consulted). The practical impact is bounded —
pre-queue sessions cannot have been marked private since the
private list is a queue-era feature — but the disclaimer in the
help text means an operator running a backfill is not surprised.

68 targeted tests green (3 new e2e + 2 new truthy edge tests +
existing). 2 pre-existing Windows path-separator failures in
test_smoke_test_detects_version_mismatch[uv|pip] unchanged.

Remaining S2 items (statusline mkdir push-back, capture-session
silent-fail follow-up) handled in PR comment + follow-up issue
respectively.

* Address remaining S2 follow-ups (David #8, S2.7, David #11)

Three items left over from Mina's bbf63472 batch — that commit
addressed S2.6/S2.9/S2.10 + documented David #8 in help text but
deferred the actual implementations of S2.7, David #11, and the real
David #8 fix to follow-ups. This commit closes them.

David #8 — `agnes push --legacy-scan` now consults the private list.
Claude Code names jsonls `<session-id>.jsonl`, so the file stem IS
the session id; the legacy-scan path can apply the same private filter
the queue path uses. Both the dry-run and live-upload code paths fixed.
Help text updated (no longer warns the filter is bypassed). Two new
tests in test_cli_push.py cover the upload-skip path + the dry-run
`would_skip_private` segregation.

S2.7 — `statusline`/`is_private` no longer mkdir-pollutes arbitrary
workdirs. Split `_claude_dir` into `_claude_dir_writable` (used only
from `add_private`) and `_claude_dir_readonly` (no mkdir). The
read-only public helpers (`private_list_path`, `read_all_private`,
`is_private`) compose the no-mkdir variant by default; `add_private`
opts in via `writable=True`. Added a process-local mtime-keyed cache
around `read_all_private` so in-process callers (push doing one stat
per upload candidate, future `agnes diagnose`) don't re-parse the
file on every check. Cache eviction on `add_private` so a sub-second
write+read sequence doesn't see stale data even on coarse-mtime
filesystems. Two new tests pin the no-mkdir contract + the
in-same-second add+read consistency.
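
The S2.7 read path can be sketched as follows. This is a simplified illustration, not the real private-list module: the functions take the list path directly, and the cache is a plain module-level dict keyed by path:

```python
from pathlib import Path
from typing import Dict, FrozenSet, Tuple

# path -> (mtime in ns, parsed ids); process-local only.
_cache: Dict[str, Tuple[int, FrozenSet[str]]] = {}


def read_all_private(path: Path) -> FrozenSet[str]:
    try:
        mtime = path.stat().st_mtime_ns  # a stat only, never a mkdir
    except FileNotFoundError:
        return frozenset()
    hit = _cache.get(str(path))
    if hit is not None and hit[0] == mtime:
        return hit[1]
    text = path.read_text(encoding="utf-8")
    ids = frozenset(ln.strip() for ln in text.splitlines() if ln.strip())
    _cache[str(path)] = (mtime, ids)
    return ids


def is_private(path: Path, session_id: str) -> bool:
    return session_id in read_all_private(path)


def add_private(path: Path, session_id: str) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)  # only the writer creates dirs
    with path.open("a", encoding="utf-8") as fh:
        fh.write(session_id + "\n")
    # Evict so a read in the same mtime tick cannot return stale data.
    _cache.pop(str(path), None)
```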

David #11 — `agnes capture-session` writes a breadcrumb log on every
invocation. New `<workspace>/.claude/agnes-capture-session.log` TSV:
`<iso_ts>\t<outcome>\t<detail>` where outcome covers every silent-
exit path (`ok`, `private_skip`, `empty_stdin`, `bad_json`,
`not_object`, `no_transcript_path`, `stdin_read_error`,
`write_error`). Gives operators a signal to detect "hook fires but
queue stays empty" — without it, an upstream Claude Code stdin-
contract change is invisible because the hook always exits 0. Log
rolls at 256 KiB so it doesn't grow unbounded on long-lived
workspaces. Best-effort: a breadcrumb-write failure is itself
swallowed so the hook contract stays "exit 0 always". Skipped in
non-Agnes workdirs (no `.claude/` exists) so opening Claude Code
in `~/` doesn't pollute it. Five new tests in test_capture_session.py
cover the success / bad_json / no_transcript_path / private_skip /
no-pollute paths.
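
A sketch of the breadcrumb writer for David #11. The TSV shape, the 256 KiB limit, the skip in non-Agnes workdirs, and the swallow-all-errors contract come from the description above; the rename-to-`.1` rolling strategy and the name `write_breadcrumb` are assumptions:

```python
import os
from datetime import datetime, timezone
from pathlib import Path

LOG_NAME = "agnes-capture-session.log"
MAX_BYTES = 256 * 1024


def write_breadcrumb(claude_dir: Path, outcome: str, detail: str = "") -> bool:
    """Append '<iso_ts>\\t<outcome>\\t<detail>'; never raise, never create dirs."""
    try:
        if not claude_dir.is_dir():  # not an Agnes workspace: leave it alone
            return False
        log = claude_dir / LOG_NAME
        if log.exists() and log.stat().st_size >= MAX_BYTES:
            # Assumed rolling scheme: keep one previous generation.
            os.replace(log, claude_dir / (LOG_NAME + ".1"))
        ts = datetime.now(timezone.utc).isoformat()
        # Keep the row single-line so the TSV stays parseable.
        clean = detail.replace("\t", " ").replace("\n", " ")
        with log.open("a", encoding="utf-8") as fh:
            fh.write(f"{ts}\t{outcome}\t{clean}\n")
        return True
    except OSError:
        return False  # best-effort: the hook must still exit 0
```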

115 targeted tests green (test_cli_push, test_capture_session,
test_private_list, test_session_queue, test_e2e_privacy,
test_lib_hooks, test_statusline, test_mark_private).

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: ZdenekSrotyr <zdenek.srotyr@keboola.com>
2026-05-11 13:31:16 +00:00
minasarustamyan
d9405a6888
Move marketplace plugin updates from hook to /update-agnes-plugins skill (#237)
* Move marketplace plugin updates from hook to /update-agnes-plugins skill

The SessionStart hook used to run `agnes refresh-marketplace --quiet`,
which performed a full fetch+reset+install cycle on every Claude Code
session start. That work was invisible to the user, slowed session
startup, and was unrecoverable interactively when something failed.

Split the responsibility:

- `agnes refresh-marketplace --check` is a new lightweight detector:
  `git fetch` only, compares local HEAD with remote FETCH_HEAD, emits
  a Claude Code hook JSON message pointing the user at
  `/update-agnes-plugins` when the marketplace has changes. No reset,
  no plugin install/update side effects.
- `/update-agnes-plugins` is a new slash command (installed by
  `agnes init` into `<workspace>/.claude/commands/`) that runs
  `agnes refresh-marketplace` (default chatty path). Output streams
  into the Claude Code transcript so the user sees install/update
  progress and can react to errors interactively.
- The SessionStart hook now runs `--check`. Existing workspaces
  auto-upgrade on next `agnes init` (substring marker matches both
  the old `--quiet` entry and the new `--check` one).

BREAKING: `agnes refresh-marketplace --quiet` is removed. Old hooks
calling it silent-noop after the CLI upgrade (the hook's `|| true`
swallows the unknown-flag error) until re-init rewrites them.

* Point marketplace 'Added to your stack' hint at /update-agnes-plugins

The post-install green panel on plugin and skill/agent detail pages
referenced the SessionStart auto-install path and a shell-prompt
`agnes refresh-marketplace` invocation. With the hook now being
detect-only, that copy was misleading — the actual install path is
the new slash command.

Condensed to a single instruction: "Open a new Claude Code session
and run:" followed by `/update-agnes-plugins` in a copy-chip.
JS clipboard string updated to match.

---------

Co-authored-by: Minas Arustamyan <arustamyan.minas@gmail.com>
2026-05-09 21:10:39 +02:00
ZdenekSrotyr
8d0bb43b06
release: 0.46.4 — detach SessionEnd push so it survives claude -p SIGTERM (#222)
## Summary

`claude -p` (headless mode) gives SessionEnd hook subprocesses ~1 second before SIGTERM, regardless of work in progress. `agnes push` for a typical workspace takes 5-30s. The current synchronous SessionEnd hook (`agnes push --quiet 2>/dev/null || true`) was therefore being killed mid-first-upload — `|| true` masks the SIGTERM as exit 0, so this regression was invisible until I traced it via a wrapper script and Claude's `~/.claude/debug/<sid>.txt` log.

Fix: wrap SessionEnd push in `bash -c "( nohup agnes push --quiet </dev/null >/dev/null 2>&1 & ) ; true"`. The subshell exits immediately, orphaning the upload child to init so it survives the hook subprocess kill. Same `bash -c` pattern as the existing `refresh-marketplace` SessionStart entry (for Windows compatibility).

End-to-end verified against production: claude exited in 5s, detached child completed the upload, file `491e3a23-...jsonl` landed on the server within 30s with mtime 14:30 UTC.

## Test plan

- [x] `pytest tests/test_lib_hooks.py` — added `test_session_end_push_is_detached` regression test asserting `nohup`, `&`, `</dev/null` are all present.
- [x] `pytest tests/test_setup_hooks_template.py` — assertions loosened from `==` to `in` where necessary.
- [x] Verified end-to-end against production with the detached wrapper before opening this PR (manual probe).
2026-05-07 17:59:27 +02:00
ZdenekSrotyr
7fc5365891
release: 0.46.3 — self-heal session pipeline + clearer diagnose (#220)
## Summary

Verified against production: `claude -p` headless mode doesn't fire SessionEnd hooks (proven via `--output-format stream-json --include-hook-events`: zero `SessionEnd` events), so any session JSONLs from `-p` invocations stay orphaned locally and never reach the server. Fix: add `agnes push --quiet` as a third SessionStart entry — symmetric self-heal alongside the existing `agnes pull` entry. Existing workspaces pick this up on their next `agnes init` via the marker-based migration already in `cli/lib/hooks.py`.

Separately: a colleague's fresh install showed `agnes diagnose` warning "uploads are not being processed", which led them to suspect their `agnes push` was broken. The warning is actually about the LLM-based `verification-detector` backlog (uploads themselves were arriving fine — confirmed by 23+3 JSONLs landed on the server while the warning was firing). Reword the warning to "verification-detector backlog" + add `last_processed` to the diagnose dict so operators don't have to grep logs to confirm.

## Test plan

- [x] `pytest tests/test_lib_hooks.py` — updated count + added `agnes push in SessionStart` assertion.
- [x] `pytest tests/test_setup_hooks_template.py` — updated.
- [x] `pytest tests/test_clean_install_integration.py` — updated.
- [x] `pytest tests/test_health_session_pipeline.py` — updated warning text + asserted `last_processed` field.
2026-05-07 17:41:22 +02:00
ZdenekSrotyr
c97fd504c5 release: 0.45.0 — easy-wins bundle (#84 #164 #177 #178 #203 #204)
Operator-and-analyst quality bundle: a security fix for the optional
Telegram bot, two CLI gaps closed, and three rounds of UX polish on
`agnes diagnose` and `agnes pull` so non-TTY consumers (CI runners,
Claude Code SessionStart hooks, sub-agent watchdogs) get readable,
actionable signal.

- Pairing-code RNG: random.choices -> secrets.choice (CSPRNG).
- Telegram script runner: refuse out-of-shape usernames before sudo -u.

CLAUDE.md.bak.<ISO-timestamp> before regenerating.

- agnes admin unregister-table <id> -> DELETE /api/admin/registry/{id}
- agnes admin update-table <id> --field=value ...  -> PUT /api/admin/registry/{id}

response but never promotes the headline. BQ billing-equals-data check
downgraded warning -> info.

default (5 s / 1 MiB vs 30 s / 10%) so sub-agent watchdogs don't kill
the pull as a hung process. New env knobs:
AGNES_PULL_PROGRESS_INTERVAL_{SECONDS,BYTES}.

--include-schema (or ?include=schema) to opt back in.

Tests: 120 passed across the touched modules, including new tests for
each fix. Pre-existing failures on main (DB migration v1->v9, binary
rename) are unrelated and not introduced here.
2026-05-07 11:43:16 +02:00
Minas Arustamyan
50e0463501 feat(marketplace): clone-based plugin setup + auto-refresh SessionStart hook
Adds end-to-end flow for installing and keeping the per-user filtered
Claude Code marketplace in sync with the user's Agnes stack
(admin RBAC grants ∖ MyAIStack opt-outs ∪ /store installs).

Setup (one-liner in install prompt step 5):
  `agnes refresh-marketplace --bootstrap` clones the per-user marketplace
  bare repo to ~/.agnes/marketplace, strips PAT from the cloned origin
  URL, registers the local path with Claude Code, and installs every
  plugin in the served manifest at --scope project. Replaces a 15-line
  inline shell sequence that tripped Claude Code's agent-driven `rm -rf`
  permission gate.

Auto-refresh (SessionStart hook installed by `agnes init`):
  `agnes refresh-marketplace --quiet` runs every Claude Code session,
  fetches+resets the clone (server rebuilds as orphan commits, so
  pull --ff-only is impossible), and version-aware reconciles:
    - missing in workspace -> claude plugin install <name>@agnes --scope project
    - version differs       -> claude plugin update <name>@agnes
    - matches               -> skip
  Don't auto-uninstall plugins that disappeared from the manifest --
  a transient empty manifest from the server would wipe the stack.
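
The reconcile rules above can be expressed as a pure planning step. In this sketch the manifest and the installed set are assumed to be simple name-to-version dicts, and `plan_reconcile` is a hypothetical name; the `claude plugin` argument lists mirror the commands quoted above:

```python
from typing import Dict, List


def plan_reconcile(manifest: Dict[str, str], installed: Dict[str, str]) -> List[List[str]]:
    """Return the claude CLI invocations needed to match the served manifest."""
    commands: List[List[str]] = []
    for name, version in sorted(manifest.items()):
        if name not in installed:
            commands.append(
                ["claude", "plugin", "install", f"{name}@agnes", "--scope", "project"]
            )
        elif installed[name] != version:
            commands.append(["claude", "plugin", "update", f"{name}@agnes"])
        # Matching versions: nothing to do.
    # Plugins installed locally but absent from the manifest are deliberately
    # left alone, so a transient empty manifest yields an empty plan.
    return commands
```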

Hook output: when --quiet AND something actually changed, emits Claude
Code hook JSON on stdout -- `systemMessage` (transient toast) and
`hookSpecificOutput.additionalContext` (model-side system reminder),
both carrying the change summary plus a "/exit + restart Claude Code"
instruction (Claude only scans plugins at session start).

Windows hook compatibility: the refresh-marketplace hook command is
wrapped in `bash -c "..."` because Claude Code on Windows runs hook
commands directly without invoking a shell, so `2>/dev/null || true`
would otherwise be passed as literal argv tokens.

Cross-cutting:
  - cli/lib/marketplace.py: shared CLONE_DIR + MARKETPLACE_NAME constants.
  - cli/lib/hooks.py: SessionStart now has two independent entries
    (pull + refresh-marketplace) so a failure in one doesn't suppress
    the other; legacy `da sync` and prior single-pull layouts upgrade
    cleanly on re-init.
  - PAT injection on every git fetch via per-invocation credential
    helper (token in $AGNES_TOKEN env, never in argv or .git/config).
  - Pre-snapshot of installed plugins captured BEFORE
    `claude plugin marketplace update` so silent auto-applied version
    bumps still fire notifications.
  - scripts/dev/agnes-client-reset.sh: cleans ~/.claude/plugins/marketplaces/agnes,
    ~/.claude/plugins/cache/agnes, drops uv build cache, documents
    workspace-scoped residue that can't be enumerated from the script.
  - app/web/setup_instructions.py: legacy AGNES_DEBUG_AUTH path also
    uses clone (direct HTTPS marketplace add is broken end-to-end on
    every Claude Code distribution -- stores response as single file,
    plugin source paths then 404).
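
The per-invocation credential helper from the PAT bullet above might look roughly like this. It is a sketch under assumptions: `build_fetch`, the `x-access-token` username, and the `origin` remote name are illustrative, not taken from the commit. The helper is a shell snippet that git runs on demand, so the token is expanded from the environment at that point and never appears in argv.

```python
import os
from typing import Dict, List, Tuple

# Shell function executed by git; "$AGNES_TOKEN" is expanded by the shell,
# not by Python, so the secret stays out of the argument list.
# The username value is an assumption for illustration.
HELPER = (
    '!f() { test "$1" = get && echo "username=x-access-token" '
    '&& echo "password=$AGNES_TOKEN"; }; f'
)


def build_fetch(clone_dir: str, token: str) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for a fetch that authenticates via the env var only."""
    argv = [
        "git", "-C", clone_dir,
        "-c", "credential.helper=",           # clear any inherited helpers
        "-c", f"credential.helper={HELPER}",  # one-shot helper for this call
        "fetch", "origin",
    ]
    env = dict(os.environ, AGNES_TOKEN=token)
    return argv, env
```

A caller would pass both values to `subprocess.run(argv, env=env, check=True)`; nothing is written to `.git/config` because the helper is supplied with `-c` for that invocation only.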

28 new tests (test_cli_refresh_marketplace.py) + extended hook + setup
template tests cover bootstrap, fetch+reset ordering, version-aware
reconcile, project-path filtering, hook JSON shape, and the bash-c
Windows wrapper invariant.
2026-05-07 06:59:13 +02:00
ZdenekSrotyr
73d2896fa6 docs(hooks): update install_claude_hooks docstring for chained SessionStart
2026-05-06 23:23:23 +02:00
ZdenekSrotyr
be62ce61b8 feat(cli): install SessionStart hook chaining self-upgrade then pull
Single hook entry: 'agnes self-upgrade --quiet ... || true; agnes pull
--quiet ... || true'. Shell semicolon guarantees ordering across every
Claude Code version (no reliance on undocumented multi-hook execution
semantics); each segment's || true preserves the original property
that an upgrade failure does not abort the pull.
2026-05-06 23:23:23 +02:00
ZdenekSrotyr
e72ff259f9 feat(pull): aggregated progress + non-TTY textual fallback
Two improvements to `agnes pull` progress reporting:

1. **Aggregated per-file progress across chunked downloads**: the
   existing Rich progress bar already used one task per file, but the
   chunked-download contract (one file = N parallel chunk callbacks
   summing to file size) meant we needed to verify that all chunk
   threads advance the same task. They do — the per-file callback is
   constructed once per tid and routes every chunk's byte delta to the
   same task / textual entry, so the bar shows one aggregated bytes-
   downloaded total rather than N separate sub-bars.

2. **Textual fallback for non-TTY stderr**: when stderr is not a
   terminal (SessionStart hook, CI runner, Docker log capture), Rich
   either suppresses output (silent multi-minute pull on a 5 GB
   parquet) or emits raw control sequences. The new `_TextualProgress`
   helper instead emits one plain-text line per file at most every
   10%-of-total-bytes or 30 s, plus a final `100% done` line per file.
   Format: `[N/T files] <tid>: 25% (16 MB / 66 MB) at 1.5 MB/s`.

The TTY path is unchanged. Detection uses `sys.stderr.isatty()` —
`show_progress=True` flips into the textual fallback when that returns
False. `show_progress=False` (the SessionStart hook) still emits no
progress text in either mode.
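
The throttling rule for the textual fallback can be sketched as below. This is not the actual `_TextualProgress` helper: the class name, constructor arguments, and injectable `clock` are assumptions made so the example is testable. Only the 10% / 30 s thresholds and the line format follow the description above.

```python
import sys
import threading
import time
from typing import Callable, TextIO

MB = 1024 * 1024


class TextualProgress:
    """Plain-text progress for one file; callable from several chunk threads."""

    def __init__(self, tid: str, total_bytes: int, index: int, file_count: int,
                 out: TextIO = sys.stderr,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.tid, self.total = tid, max(total_bytes, 1)
        self.index, self.file_count = index, file_count
        self.out, self.clock = out, clock
        self.done = 0
        self.finished = False
        self.start = self.last_time = clock()
        self.last_bytes = 0
        self.lock = threading.Lock()

    def advance(self, delta: int) -> None:
        """Add a chunk's byte delta; emit a line only when a threshold is hit."""
        with self.lock:
            if self.finished:
                return
            self.done += delta
            now = self.clock()
            if self.done >= self.total:
                self.finished = True
                self._emit(now, " done")
            elif (self.done - self.last_bytes) * 10 >= self.total or now - self.last_time >= 30:
                # At least 10% of the file, or 30 s, since the last line.
                self._emit(now, "")

    def _emit(self, now: float, suffix: str) -> None:
        rate = self.done / max(now - self.start, 1e-9) / MB
        pct = min(100, self.done * 100 // self.total)
        print(f"[{self.index}/{self.file_count} files] {self.tid}: {pct}%{suffix} "
              f"({self.done / MB:.0f} MB / {self.total / MB:.0f} MB) at {rate:.1f} MB/s",
              file=self.out)
        self.last_time, self.last_bytes = now, self.done
```

A caller would pick this path when `sys.stderr.isatty()` returns False and keep the Rich bar otherwise.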
2026-05-06 13:09:37 +02:00
ZdenekSrotyr
28423907fd feat: clean CLI errors + init progress + skip-materialize + claude.md catalog pointer
Three first-try-failure-surface fixes from Pavel's #185 trace + the
template guidance question, all under PR #188's umbrella so they land
together with the file_server / parallel pull / Tier 1 work.

1. CLI clean-error wrapper — new AgnesTransportError raised by the
   api_*/stream_download helpers when httpx times out / drops /
   refuses, plus a top-level Typer wrapper (cli/main.py) that prints
   one-line "Error: …" + actionable hint and exits non-zero. Full
   traceback goes to ~/.config/agnes/last-error.log for support
   forwarding. Unhandled Exceptions are caught at the same boundary
   so no Python traceback ever leaks to the analyst's terminal.

   Pavel's #185 Phase 3B: a 30-frame httpx traceback from a slow BQ
   --remote query made it look like a CLI bug. Now: clean message +
   hint pointing at `agnes snapshot create` / partition-column
   guidance.

   Entry point in pyproject.toml flipped from `cli.main:app` →
   `cli.main:_run_with_clean_errors` so the wrapper actually runs
   under the installed `agnes` binary.

2. agnes init / agnes pull --skip-materialize + progress bar.
   --skip-materialize omits query_mode='materialized' rows from the
   download set so a first init doesn't spend 44 minutes silently
   pulling a single 6 GB parquet (Pavel's #185 Phase 1). Rich-driven
   per-file progress bar with label/bytes/rate/ETA renders to stderr
   when not --quiet and not --json. Aggregates across the parallel
   ThreadPoolExecutor workers added earlier in this PR.

3. config/claude_md_template.txt: explicit one-line snippet pointing
   at `agnes catalog --json | jq '.tables[] | select(.id=="<id>")'`
   for per-table descriptions + restated invariant: "the description
   field on each catalog row is the authoritative business-rules
   text — re-read live, never copy into this file." Resolves the
   regression-or-feature debate between Pavel (wants annotations)
   and the user feedback that landed in the prior commit (don't
   embed table-specific content; tables change). Catalog command
   stays the source of truth.
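
The clean-error boundary from item 1 could be sketched as follows. The exception's `hint` attribute, the `run_with_clean_errors` signature, and the `Hint:` prefix are assumptions for illustration; the real wrapper lives in cli/main.py and may differ.

```python
import sys
import traceback
from pathlib import Path
from typing import Callable, TextIO


class AgnesTransportError(Exception):
    """Transport failure with an optional actionable hint (hypothetical shape)."""

    def __init__(self, message: str, hint: str = "") -> None:
        super().__init__(message)
        self.hint = hint


def run_with_clean_errors(app: Callable[[], None], log_path: Path,
                          err: TextIO = sys.stderr) -> int:
    """Run the CLI; on failure print one line and log the full traceback."""
    try:
        app()
        return 0
    except Exception as exc:
        # The full traceback goes to a file for support forwarding.
        log_path.parent.mkdir(parents=True, exist_ok=True)
        log_path.write_text(traceback.format_exc())
        print(f"Error: {exc}", file=err)
        hint = getattr(exc, "hint", "")
        if hint:
            print(f"Hint: {hint}", file=err)
        return 1
```

Catching `Exception` rather than `BaseException` lets `SystemExit` and `KeyboardInterrupt` propagate as usual.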
2026-05-05 18:11:59 +02:00
ZdenekSrotyr
2ae486bc5d feat(pull): parallel parquet downloads (AGNES_PULL_PARALLELISM=4 default)
The download loop in cli/lib/pull.py was strictly serial — N tables took
Σ stream_download(t_i). With the Caddy file_server change in this PR,
the server can now sustain many parallel sendfile transfers without
blocking app workers, so the client-side serialization became the new
bottleneck.

Switch to ThreadPoolExecutor capped by AGNES_PULL_PARALLELISM (default 4;
set it to 1 to restore the pre-PR serial behavior). 4 matches typical
home-broadband saturation without over-subscribing the analyst's NIC.
Falls back to serial when len(to_download) <= 1 to avoid executor
overhead in the common single-table case.

Per-table error semantics preserved via (tid, entry, err) tuple — a
failure on one parquet doesn't abort the rest of the batch.
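
A minimal sketch of the bounded pool with per-table error tuples. `download_all` and the `fetch` callback are hypothetical stand-ins for the loop in cli/lib/pull.py and for `stream_download`.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Tuple

Result = Tuple[str, dict, Optional[Exception]]


def download_all(to_download: List[str], entries: Dict[str, dict],
                 fetch: Callable[[str, dict], None]) -> List[Result]:
    """Download every table, returning (tid, entry, err) per table."""
    parallelism = max(1, int(os.environ.get("AGNES_PULL_PARALLELISM", "4")))

    def one(tid: str) -> Result:
        entry = entries[tid]
        try:
            fetch(tid, entry)
            return (tid, entry, None)
        except Exception as err:
            # Capture the failure so the rest of the batch keeps going.
            return (tid, entry, err)

    if len(to_download) <= 1 or parallelism == 1:
        # Serial path: no executor overhead for the single-table case.
        return [one(tid) for tid in to_download]
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # map() preserves input order, so results line up with to_download.
        return list(pool.map(one, to_download))
```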

Verified end-to-end against a dev VM with the new Caddy file_server
deployed: 2-table pull through agnes CLI works under the new concurrency.
2026-05-05 16:42:55 +02:00
ZdenekSrotyr
8784f10a6b fix(devin-review): stale-token override + status sessions counter + lock comment
Three Devin Review findings on PR #173 addressed in one commit since
they're in adjacent code paths:

1. cli/commands/init.py:99 (🔴): `agnes init --token NEW` ran
   step 2 verify against the OLD on-disk token because `get_token()`
   read `~/.config/agnes/token.json` before the env var, and
   `_override_server_env` only set the env var. So `agnes init --force`
   on a machine with a stale token.json failed 401 with a confusing
   'token expired' even though the --token arg was valid.

   Fix: ContextVar-based override in `cli.config._token_override`
   checked by `get_token()` BEFORE the on-disk read.
   `_with_token_override` context manager scopes the override.
   `_override_server_env` now also sets the contextvar via
   `_with_token_override(token)`, so both env var and contextvar
   carry the override (env for back-compat with anything bypassing
   get_token; contextvar is the authoritative source).
   Async-safe (each task sees its own override) and leak-proof
   (resets on context exit).
   2 new tests: regression on stale-disk-token + scope leak guard.

2. cli/commands/status.py:43 (🟡): sessions_pending_upload only
   checked legacy `<workspace>/user/sessions/` and always reported 0
   in workspaces bootstrapped with `agnes init` (Claude Code writes
   to `~/.claude/projects/`, not the legacy path). Same bug we fixed
   for `agnes push` in 08e49591.

   Fix: route through `cli.lib.claude_sessions.list_session_files()`
   so status and push agree on what counts as a pending session.

3. connectors/bigquery/extractor.py:111 (🟡): docstring claimed
   "a live holder still wins the second flock attempt" — incorrect on
   Linux. After `unlink()` + `open()`, the new file is a new inode;
   fcntl.flock keys per-inode, so the old holder's lock does NOT block
   the new acquisition. In a genuine TTL-overrun scenario two writers
   CAN race the parquet.tmp.

   Fix: documentation only. Comment now honestly describes the
   inode-recreation behavior, names the threading.Lock as the actual
   in-process guard, and flags pid-gating as the next-iteration fix
   if real corruption surfaces. The 24h default TTL is well above
   typical COPY durations so the practical risk is low.
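
The ContextVar override from finding 1 can be sketched like this. The function names mirror the ones mentioned above, but the signatures, the explicit `token_file` argument, and the `"token"` JSON key are assumptions.

```python
import contextvars
import json
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Optional

_token_override: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "_token_override", default=None
)


@contextmanager
def with_token_override(token: str) -> Iterator[None]:
    """Scope an explicit token so it wins over the on-disk one."""
    reset_handle = _token_override.set(token)
    try:
        yield
    finally:
        # Restores the previous value even if the body raised.
        _token_override.reset(reset_handle)


def get_token(token_file: Path) -> Optional[str]:
    """Check the override BEFORE reading the (possibly stale) token file."""
    override = _token_override.get()
    if override is not None:
        return override
    if token_file.exists():
        # The "token" key is an assumed file layout.
        return json.loads(token_file.read_text()).get("token")
    return None
```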

Tests: 17/17 across test_cli_init.py + test_lib_pull.py + the broader
regression set.
2026-05-04 21:26:30 +02:00
ZdenekSrotyr
976d0c7160 fix(pull): re-download parquet when file missing despite matching hash
Pre-fix `agnes pull` decided what to download from sync_state hash
equality alone:

    if server_hash != local_hash or tid not in local_tables or not server_hash:
        to_download.append(tid)

If the recorded local hash matched the server's but the actual parquet
had been deleted from disk, the download was skipped, and the next
DuckDB view rebuild then failed on the missing file. Repro: `rm
server/parquet/X.parquet && agnes pull` → 'Updated 0 tables', X
still missing.

Failure modes that produce hash-equal-but-file-missing:
- manual `rm` of a single parquet
- operator-side cleanup of `server/parquet/`
- two workspaces sharing one user's
  `~/.config/agnes/sync_state.json` (TODO(workspace-scoped-sync-state)
  in pull.py): one workspace writes its parquets, the other reads
  sync_state and concludes 'I already have these'
- disk corruption / partial restore from backup

Fix: existence check runs alongside the hash compare. Missing file
forces a re-download regardless of hash equality. `parquet_dir` is
hoisted above the loop so the existence check is in scope when the
download set is built.
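
A sketch of the fixed selection logic, assuming each table lives at `<parquet_dir>/<tid>.parquet` and that hashes are held in plain dicts. `build_download_set` is a hypothetical name; the real loop in pull.py tracks more state.

```python
from pathlib import Path
from typing import Dict, List


def build_download_set(server_hashes: Dict[str, str],
                       local_hashes: Dict[str, str],
                       parquet_dir: Path) -> List[str]:
    """Pick tables to download: hash mismatch OR parquet missing on disk."""
    to_download: List[str] = []
    for tid, server_hash in server_hashes.items():
        local_hash = local_hashes.get(tid)
        file_missing = not (parquet_dir / f"{tid}.parquet").exists()
        if (server_hash != local_hash or tid not in local_hashes
                or not server_hash or file_missing):
            to_download.append(tid)
    return to_download
```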

Tests: regression test for the hash-equal-but-missing-file case +
counterpart for the fast-path (hash-equal-and-file-present must
still skip).
2026-05-04 21:12:06 +02:00
ZdenekSrotyr
08e4959185 fix(push): read sessions from ~/.claude/projects/<encoded-cwd>/
Real bug: `agnes push` was reading `<workspace>/user/sessions/`, but
Claude Code writes session jsonls to `~/.claude/projects/<encoded-cwd>/`
and nothing on the analyst side ever copies them across. The SessionEnd
hook ran `agnes push` happily and uploaded zero sessions every time.

`cli/lib/claude_sessions.py` probes both Claude Code encoding variants
(older `/`→`-` keeping spaces+tildes; newer all-non-alphanumeric→`-`
with collapsed runs) and unions whichever exist. Users who upgraded
Claude Code mid-project end up with both encoded dirs side-by-side on
disk; the union ensures no session is left behind. Same-named jsonl in
both dirs → newest mtime wins. `<workspace>/user/sessions/` survives as
a fallback for any setup that explicitly mirrors sessions there.
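
The two-variant probe and union can be sketched as below. The exact encoding rules are approximated from the description above and may not match Claude Code byte for byte; `encode_old`, `encode_new`, and this `list_session_files` signature are illustrative.

```python
import re
from pathlib import Path
from typing import Dict, List


def encode_old(cwd: str) -> str:
    """Older variant: only '/' becomes '-'; spaces and tildes are kept."""
    return cwd.replace("/", "-")


def encode_new(cwd: str) -> str:
    """Newer variant: each run of non-alphanumerics collapses to one '-'."""
    return re.sub(r"[^A-Za-z0-9]+", "-", cwd)


def list_session_files(projects_root: Path, cwd: str) -> List[Path]:
    """Union the jsonl files of whichever encoded dirs exist; newest mtime wins."""
    newest: Dict[str, Path] = {}
    for encoded in {encode_old(cwd), encode_new(cwd)}:
        session_dir = projects_root / encoded
        if not session_dir.is_dir():
            continue
        for path in session_dir.glob("*.jsonl"):
            current = newest.get(path.name)
            if current is None or path.stat().st_mtime > current.stat().st_mtime:
                newest[path.name] = path
    return sorted(newest.values(), key=lambda p: p.name)
```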

Verified on real disk: helper returns 2 dirs + 8 unioned session files
for the Agnes-test workspace where the previous code returned 0.
2026-05-04 20:29:59 +02:00
ZdenekSrotyr
15004126de fix(cli-lib): I1+I2+I3 review fixes — token-precedence note, sync-state TODO, dry-run hermeticity test
2026-05-04 18:04:56 +02:00
ZdenekSrotyr
37da602060 feat(cli-lib): cli/lib/pull.py:run_pull primitive with lazy mkdir
2026-05-04 18:00:57 +02:00
ZdenekSrotyr
5aebeabf23 feat(cli-lib): cli/lib/hooks.py:install_claude_hooks
2026-05-04 17:53:20 +02:00